Project offerings 2015

Click on the supervisor's name for a description of the projects they are offering.

Projects will be added over the coming weeks.

Vera Chung:
- Interactive mobile app and system for on-site visitors
- Object tracking for visiting applications
- Deep Learning for object detection

Vincent Gramoli:
- USyd Malware Traffic Analysis
- Benchmarking Concurrent Data Structures on Multi-core Machines
- Evaluating Consensus Protocols in Distributed Systems

Seok-Hee Hong:
- Scalable visual analytics
- Visualization and analysis of large and complex biological networks and social networks
- 2.5D Graph navigation and interaction techniques
- Drawing algorithms for Almost Planar graphs
- MultiPlane graph embedding (2.5D Graph embeddability)

Bryn Jeffries and Sanjay Chawla:
- Estimating fatigue of subjects from multimedia sources

Kevin Kuan:
- Wisdom of the Crowd in E-Commerce
- Information Presentation in Group-Buying
- Opinion Mining in Online Consumer Review

David Lowe:
- Augmented reality remotely-accessed labs
- Architecture for collaborative remotely-accessed labs
- Remote control of physical systems: Understanding Situational Awareness
- Using VR glasses and a Wii controller to interact with remotely-accessed labs

Josiah Poon:
- Extracting numerical information from literature for meta-analysis
- What are the important things in the Terms and Conditions?

Javid Taheri:
- Data/Network/Speed-Aware Job Scheduling for Distributed/Cloud Computing
- Modelling a Real Distributed Environment
Projects supervised by Vera Chung

Interactive mobile app and system for on-site visitors
This project is to design and implement an iPhone app for remote visiting applications. For example, in a zoo-visiting scenario, the app should allow mobile users to remotely select an online zoo and choose their favourite animals for viewing (video cameras have already been deployed in the zoo). The app should also support richer interaction between the visitor and the zoo: for example, a user can subscribe to a favourite animal and receive a push notification once real-time video of that animal becomes available. Other functions include posting comments, sharing picture or video captures, ticket booking, etc.

Examples of digital zoos:
http://hektor.digitaldjurpark.se/wp/wordpress/
http://baike.baidu.com/zoo

Object tracking for visiting applications
This project is to study object tracking algorithms using swarm intelligence approaches, e.g. Particle Swarm Optimization, Simplified Swarm Optimization, and Genetic Algorithms. You will apply these algorithms to video tracking in visiting applications. We have real-world animal video data captured from a zoo. You will implement the object tracking algorithms on these videos for animal tracking, compare their performance, and customize or improve the tracking results for this kind of application. It is also possible to improve the tracking results by combining the video data with the simultaneous RFID data available (offline). Examples of digital zoos:
http://hektor.digitaldjurpark.se/wp/wordpress/
http://baike.baidu.com/zoo
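
To make the approach concrete, here is a minimal sketch of PSO-based template tracking: each particle is a candidate window position, and fitness is the sum of squared differences against a fixed template. Python and numpy are an assumption (the project may well use C++/OpenCV or MATLAB), and a real tracker would also update its appearance model as the animal moves.

```python
import numpy as np

def ssd(frame, template, x, y):
    """Sum of squared differences between the template and the window at (x, y)."""
    h, w = template.shape
    if x < 0 or y < 0 or y + h > frame.shape[0] or x + w > frame.shape[1]:
        return np.inf                      # window falls outside the frame
    win = frame[y:y+h, x:x+w].astype(float)
    return float(np.sum((win - template.astype(float)) ** 2))

def pso_track(frame, template, centre, n=30, iters=25,
              inertia=0.7, c1=1.5, c2=1.5, radius=20.0):
    """Locate the template in one frame with a basic PSO over (x, y) positions."""
    rng = np.random.default_rng()
    p = centre + rng.uniform(-radius, radius, size=(n, 2))   # particle positions
    v = rng.uniform(-2, 2, size=(n, 2))                      # particle velocities
    pbest = p.copy()
    pcost = np.array([ssd(frame, template, int(x), int(y)) for x, y in p])
    g = pbest[np.argmin(pcost)].copy()                       # global best
    for _ in range(iters):
        r1, r2 = rng.random((n, 1)), rng.random((n, 1))
        v = inertia * v + c1 * r1 * (pbest - p) + c2 * r2 * (g - p)
        p = p + v
        cost = np.array([ssd(frame, template, int(x), int(y)) for x, y in p])
        better = cost < pcost
        pbest[better], pcost[better] = p[better], cost[better]
        g = pbest[np.argmin(pcost)].copy()
    return g  # estimated top-left corner of the target in this frame

# Usage: cut the target's box from frame 0, then track it frame to frame.
# template = frames[0][y0:y0+h, x0:x0+w]
# pos = np.array([x0, y0], dtype=float)
# for f in frames[1:]:
#     pos = pso_track(f, template, pos)
```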

Deep Learning for object detection
As we move towards image understanding, more precise and detailed object recognition information becomes important. In this context, we care not only about classifying images, but also about precisely estimating the class and location of the objects contained within them. In this project, you will use Deep Neural Networks to solve the problem of object detection: first classify the objects, then precisely localize them. The problem you will solve is challenging, as you need to detect a potentially large number of object instances with varying sizes in the same image using a limited amount of computing resources.
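
As a rough sketch of the classify-then-localize pipeline (not the project's prescribed method), the Python below enumerates multi-scale candidate windows, scores each with a classifier stub `cnn_score` standing in for a trained deep network, and merges overlapping detections with non-maximum suppression:

```python
import numpy as np

def candidate_windows(img_h, img_w, sizes=(64, 128, 256), stride_frac=0.25):
    """Enumerate square windows at several scales (a crude proposal stage)."""
    for s in sizes:
        step = max(1, int(s * stride_frac))
        for y in range(0, img_h - s + 1, step):
            for x in range(0, img_w - s + 1, step):
                yield (x, y, x + s, y + s)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the best box, drop its overlaps."""
    boxes, scores = np.asarray(boxes, float), np.asarray(scores, float)
    order, keep = np.argsort(-scores), []
    while order.size:
        i = order[0]
        keep.append(int(i))
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou < iou_thresh]
    return keep

def detect(image, cnn_score, score_thresh=0.8):
    """cnn_score(crop) -> (class_id, confidence); stands in for a trained DNN."""
    boxes, scores, classes = [], [], []
    for (x1, y1, x2, y2) in candidate_windows(*image.shape[:2]):
        cls, conf = cnn_score(image[y1:y2, x1:x2])
        if conf >= score_thresh:
            boxes.append((x1, y1, x2, y2)); scores.append(conf); classes.append(cls)
    if not boxes:
        return []
    keep = nms(boxes, scores)
    return [(classes[i], scores[i], boxes[i]) for i in keep]

# Usage (hypothetical trained model): detections = detect(frame, my_model)
```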

Projects supervised by Vincent Gramoli

USyd Malware Traffic Analysis
In 2014, McAfee estimated the annual cost to the global economy from cybercrime as more than $400 billion. This includes the cost related to the stolen personal information of hundreds of millions of people.
In a 2012 cybercrime and security survey report commissioned by Australia's National CERT, 20% of the surveyed companies identified cybersecurity incidents in the previous year, and 21% of these incidents involved trojan/rootkit malware.

The goal of this project is to analyze the traffic at the University of Sydney using powerful multi/many-core servers connected through a 10Gbps network running Intrusion Detection and Prevention Systems to learn about the usage of malware by the machines accessing the Internet from the University campus.

The first phase of the project consists of deploying software components at the servers located between the university network and the Internet to help identify malware threats. The second phase consists of gathering accesses to infected websites. The third phase consists of quantifying the threats by analyzing the collected data and drawing conclusions.

The project requires knowledge of some network technologies, like tcpdump, Wireshark, pcap files or Suricata.
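
As one hypothetical starting point, the Python sketch below uses scapy to scan a pcap capture for DNS lookups of blacklisted domains; the file names are placeholders, and the real pipeline would more likely rely on Suricata rule sets:

```python
from collections import Counter
from scapy.all import rdpcap, DNSQR   # scapy can parse pcap files directly

def suspicious_lookups(pcap_path, blacklist_path):
    """Count DNS queries in the capture whose name appears in the blacklist."""
    with open(blacklist_path) as f:
        bad = {line.strip().lower() for line in f if line.strip()}
    hits = Counter()
    for pkt in rdpcap(pcap_path):
        if pkt.haslayer(DNSQR):                      # DNS question record
            name = pkt[DNSQR].qname.decode(errors="replace").rstrip(".").lower()
            if name in bad:
                hits[name] += 1
    return hits

if __name__ == "__main__":
    # placeholder paths: a capture taken at the border servers and a
    # domain blacklist, one domain per line
    for domain, n in suspicious_lookups("capture.pcap", "blacklist.txt").most_common():
        print(f"{n:6d}  {domain}")
```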

Benchmarking Concurrent Data Structures on Multi-core Machines
For the last decade, manufacturers have increased the number of processors, or cores, in most computational devices rather than their frequency. This trend led to the advent of chip multiprocessors that nowadays offer between tens and a thousand cores on the same chip. Concurrent programming, the art of dividing a program into subroutines that cores execute simultaneously, is the only way for developers to increase the performance of their software.

These multicore machines adopt a concurrent execution model where, typically, multiple threads synchronize with each other to exploit the cores while accessing in-memory shared data. To sustain the pace of increasing software efficiency, performance has to scale with the number of concurrent threads accessing shared data structures. The key is for new synchronization paradigms not only to leverage concurrent resources to achieve scalable performance, but also to simplify concurrent programming so that most programmers can develop efficient software.

We have developed Synchrobench in C/C++ and Java [1], the most comprehensive set of synchronization tools and concurrent data structure algorithms. Meanwhile, the implementations of concurrent data structures and synchronization techniques keep multiplying. The goal of this project is to extend the Synchrobench benchmark suite with new concurrent data structures and to compare their performance against the 30+ existing algorithms on our multi-core machines, in order to draw conclusions about the algorithm design choices that maximize performance in upcoming concurrent data structures.
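
For orientation, the sketch below mimics the Synchrobench measurement methodology (thread count, update ratio, fixed duration, aggregate throughput) on a toy lock-based set. It is written in Python purely for brevity; Synchrobench itself is C/C++ and Java, and Python's global interpreter lock prevents genuine parallel speedup, so treat this as an illustration of the harness, not of scalability:

```python
import random, threading, time

class CoarseLockSet:
    """Baseline structure: one lock around a built-in set."""
    def __init__(self):
        self._s, self._lock = set(), threading.Lock()
    def contains(self, k):
        with self._lock: return k in self._s
    def add(self, k):
        with self._lock: self._s.add(k)
    def remove(self, k):
        with self._lock: self._s.discard(k)

def worker(ds, update_ratio, key_range, stop, counts, i):
    """Mix reads and updates on random keys until told to stop."""
    ops = 0
    while not stop.is_set():
        k, r = random.randrange(key_range), random.random()
        if r < update_ratio / 2: ds.add(k)
        elif r < update_ratio:   ds.remove(k)
        else:                    ds.contains(k)
        ops += 1
    counts[i] = ops

def benchmark(ds, n_threads=8, update_ratio=0.1, key_range=2**14, secs=2.0):
    stop, counts = threading.Event(), [0] * n_threads
    ts = [threading.Thread(target=worker,
                           args=(ds, update_ratio, key_range, stop, counts, i))
          for i in range(n_threads)]
    for t in ts: t.start()
    time.sleep(secs); stop.set()
    for t in ts: t.join()
    return sum(counts) / secs   # operations per second, all threads combined

if __name__ == "__main__":
    print(f"{benchmark(CoarseLockSet()):.0f} ops/s")
```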

[1] Synchrobench
[2] Vincent Gramoli. More Than You Ever Wanted to Know about Synchronization: Synchrobench, Measuring the Impact of the Synchronization on Concurrent Algorithms. PPoPP 2015.

Evaluating Consensus Protocols in Distributed Systems
Distributed system solutions, like CoreOS used by Facebook, Google and Twitter, exploit a key-value store abstraction to replicate the state and a consensus protocol to totally order the state machine configurations. Unfortunately, there is no way to reconfigure this key-value store service, to include new servers or exclude failed ones, without disruption.

The Paxos consensus algorithm, in which candidate leaders exchange messages with majorities of acceptors, could be used to reconfigure a key-value store as well [4]. To circumvent the impossibility of implementing consensus over asynchronous communication, Paxos guarantees termination under partial synchrony, while always guaranteeing validity and agreement even when competing candidate leaders propose configurations.

Due to the intricacy of the protocol [1], the tendency has been to switch to an alternative algorithm in which requests are centralized at a primary. Zab, a primary-based atomic broadcast protocol, was used in Zookeeper [2], a distributed coordination service. Raft [1] reused the centralization concept of Zookeeper to solve consensus. The resulting simplification led to the development of various implementations of Raft in many programming languages.
The goal of this project is to compare a Raft-based implementation against Paxos-based implementations [3] to confirm that Paxos can be better suited than Raft in the case of leader failures, and to explore cases where Raft could be preferable.
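
To fix intuition about what is being compared, here is a toy, single-process sketch of one single-decree Paxos round (prepare/promise, accept/accepted, majority quorums). It omits networking, persistence and failure handling, which are exactly where the evaluated implementations differ:

```python
class Acceptor:
    def __init__(self):
        self.promised = -1          # highest ballot promised so far
        self.accepted = None        # (ballot, value) accepted so far

    def prepare(self, ballot):
        """Phase 1b: promise to ignore smaller ballots; report any past vote."""
        if ballot > self.promised:
            self.promised = ballot
            return ("promise", self.accepted)
        return ("nack", None)

    def accept(self, ballot, value):
        """Phase 2b: accept unless a higher ballot was promised meanwhile."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return "accepted"
        return "nack"

def propose(acceptors, ballot, value):
    majority = len(acceptors) // 2 + 1
    # Phase 1: collect promises from a majority.
    promises = [a.prepare(ballot) for a in acceptors]
    granted = [past for (kind, past) in promises if kind == "promise"]
    if len(granted) < majority:
        return None                 # outpaced by a higher ballot; retry higher
    # If any acceptor already voted, we must adopt its highest-ballot value.
    voted = [p for p in granted if p is not None]
    if voted:
        value = max(voted)[1]
    # Phase 2: ask the acceptors to accept (ballot, value).
    acks = sum(1 for a in acceptors if a.accept(ballot, value) == "accepted")
    return value if acks >= majority else None

if __name__ == "__main__":
    acceptors = [Acceptor() for _ in range(5)]
    print(propose(acceptors, ballot=1, value="config-A"))  # -> config-A
    print(propose(acceptors, ballot=2, value="config-B"))  # -> config-A (already chosen)
```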

[1] Diego Ongaro and John Ousterhout. In search of an understandable consensus algorithm. In ATC, pages 305–319, Philadelphia, PA, 2014. USENIX.
[2] Flavio Junqueira and Benjamin Reed. ZooKeeper: Distributed Process Coordination. O’Reilly Media, Nov. 2013.
[3] Vincent Gramoli, Len Bass, Alan Fekete, Daniel Sun. Rollup: Non-Disruptive Rolling Upgrade. USyd Technical Report 699.

Projects supervised by Seok-Hee Hong

Scalable Visual Analytics

Visualization and Analysis of Large and Complex Biological Networks and Social Networks

2.5D Graph Navigation and Interaction Techniques

Drawing Algorithms for Almost Planar Graphs

MultiPlane Graph Embedding (2.5D Graph Embeddability)

Projects supervised by Bryn Jeffries and Sanjay Chawla

Estimating fatigue of subjects from multimedia sources
The CRC for Alertness, Safety and Productivity is a collaboration between industrial and university partners to develop products that improve alertness in the workplace and during commutes. The Alertness Database project, led by Dr Bryn Jeffries in the School of Information Technologies, is developing a cloud-based database suitable for collecting and sharing the research data collected by clinical research partners, and will be the foundation for data analysis and data mining activities.

The proposed project is to develop tools to perform feature extraction from several sources stored in the database, including EEG recordings, recorded speech, and photography and videography of subjects. The student chosen for this project would need to familiarise themselves with the accepted protocols for extracting features from one or more of these sources, and refine processes to obtain markers of fatigue. They would also be expected to liaise with clinical research partners to properly understand the domain and make use of the available research data.
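
As an indication of the kind of feature extraction involved, the sketch below computes one commonly used EEG drowsiness marker, the (theta + alpha) / beta band-power ratio, with Welch's method. The channel, sampling rate and choice of ratio are assumptions; the actual markers must follow the protocols accepted by the clinical partners:

```python
import numpy as np
from scipy.signal import welch

def band_power(freqs, psd, lo, hi):
    """Integrate the power spectral density over [lo, hi) Hz."""
    mask = (freqs >= lo) & (freqs < hi)
    return np.trapz(psd[mask], freqs[mask])

def fatigue_ratio(eeg, fs=256.0):
    """eeg: 1-D signal from one channel; returns (theta + alpha) / beta."""
    freqs, psd = welch(eeg, fs=fs, nperseg=int(4 * fs))   # 4-second windows
    theta = band_power(freqs, psd, 4, 8)
    alpha = band_power(freqs, psd, 8, 13)
    beta  = band_power(freqs, psd, 13, 30)
    return (theta + alpha) / beta

if __name__ == "__main__":
    fs, t = 256.0, np.arange(0, 60, 1 / 256.0)
    # synthetic "drowsy" signal: strong alpha, weak beta, plus noise
    x = np.sin(2 * np.pi * 10 * t) + 0.2 * np.sin(2 * np.pi * 20 * t) \
        + 0.5 * np.random.randn(t.size)
    print(f"fatigue ratio: {fatigue_ratio(x, fs):.2f}")
```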

Semester: Preferably S1

Projects supervised by Kevin Kuan

Wisdom of the Crowd in E-Commerce
Description: The wisdom of the crowd (WoC) effect refers to the phenomenon that the aggregate of many people's opinions tends to be more accurate than separate individual or even expert opinions. The WoC effect has been demonstrated in different contexts such as World Cup 2014, flu trends, stock markets, political elections, quiz shows, etc. However, little is known about the underlying nature of the WoC effect (i.e. how it works, why it works, when it works, etc.). For example, there is evidence that the WoC effect does not always work and can be severely undermined, depending on factors such as task characteristics, social situations, etc. This project aims to study the WoC effect in various e-commerce contexts (e.g. crowdsourcing, electronic word-of-mouth, social shopping, etc.). Students are expected to identify a WoC phenomenon in e-commerce that they are interested in, and to collect data from Twitter and external sources to test their hypotheses. The findings will provide insights into how companies can use information systems to take advantage of the WoC effect in e-commerce.
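
The simulation below illustrates the basic effect with made-up numbers: the mean of many independent noisy estimates lands much closer to the truth than a typical individual estimate does:

```python
import numpy as np

rng = np.random.default_rng(42)
truth = 100.0                                     # the quantity being estimated
estimates = truth + rng.normal(0, 25, size=500)   # 500 noisy individual guesses

crowd = estimates.mean()
individual_err = np.abs(estimates - truth).mean()
print(f"crowd estimate: {crowd:.1f} (error {abs(crowd - truth):.1f})")
print(f"average individual error: {individual_err:.1f}")
# Typical output: crowd error around 1, individual error around 20.
# Aggregation wins provided errors are independent; correlated (herded)
# opinions break this, which is one way the effect can be undermined.
```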

Requirements: Knowledgeable in web programming (e.g. PHP); familiar with (or prepared to learn) basic text/opinion mining (e.g. by taking COMP5318); interested in statistical analysis.

Information Presentation in Group-Buying
Description: Group-buying (e.g. Groupon, LivingSocial, Spreets, etc.) is an emerging e-commerce model that has become very popular in recent years. Apart from offering products with deep discounts, a prominent feature of group-buying is the presentation of social information (e.g. the number of people who have bought a deal, Facebook friends who “like” the deal, etc.) to influence purchase decisions. This project aims to study the effects of information presentation on purchase decisions in group-buying. Students are expected to identify the various types of information and how they affect purchase decisions in the context of group-buying. The effects of information presentation on purchase decisions will be examined by collecting data from external sources (e.g. Groupon, Spreets, Facebook, Twitter, etc.). Controlled experiments may also be conducted to test the hypotheses. The findings will provide insights into how group-buying sites should display deals so that they are more likely to be purchased by customers.

Requirements: Knowledgeable in web programming (e.g. PHP); interested in conducting controlled experiment and statistical analysis.

Opinion Mining in Online Consumer Review
Description: Consumers increasingly rely on online consumer reviews to guide purchases. Given the large number of reviews available for a particular product, people are unable and unlikely to go through every review manually, and the ability of online review sites (e.g. Urbanspoon, Yelp, etc.) to help users identify useful reviews becomes increasingly important. This project aims to study the characteristics (length, tone, style, reviewer credibility, etc.) of a review that make it helpful. Students are expected to identify characteristics that potentially distinguish helpful reviews from unhelpful ones, and to examine these characteristics using review data collected from external sources (e.g. Urbanspoon, Yelp, etc.). The findings will provide insights to online review sites on how to automatically identify useful reviews for users among a vast number of reviews.
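
A minimal sketch of the prediction task, with hand-crafted features and toy data that are purely illustrative (a real study would use features and labels derived from the collected review data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def features(text, star_rating):
    """Crude proxies for length, vocabulary, rating extremity and tone."""
    words = text.split()
    return [
        len(words),                                          # review length
        float(np.mean([len(w) for w in words])) if words else 0.0,
        abs(star_rating - 3),                                # extremity on a 1-5 scale
        text.count("!"),                                     # tone proxy
    ]

# toy labelled data: (text, stars, was_voted_helpful)
data = [
    ("Great food and friendly staff, would come back for the dumplings.", 5, 1),
    ("ok", 3, 0),
    ("Terrible. Waited an hour, food arrived cold, never again!!!", 1, 1),
    ("nice", 4, 0),
    ("Decent value lunch special; portions are large but service is slow.", 4, 1),
    ("meh!!", 2, 0),
]
X = np.array([features(t, s) for t, s, _ in data])
y = np.array([h for _, _, h in data])

clf = LogisticRegression().fit(X, y)
print(clf.predict([features("The pad thai was fresh and generous.", 5)]))
```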

Requirements: Knowledgeable in web programming (e.g. PHP); familiar with (or prepared to learn) basic text/opinion mining (e.g. by taking COMP5318); interested in statistical analysis.

Projects supervised by David Lowe

Augmented reality remotely-accessed labs
Existing remote labs largely duplicate conventional labs, but the computer mediation provides an opportunity to enrich the experience of interacting with the equipment by using augmented reality approaches (imagine a magnetics experiment where the video image is overlaid with the magnetic field lines). This project involves developing the software interfaces for an existing remote laboratory in order to provide an illustrative prototype. The prototype will demonstrate the benefits that can be achieved through the use of augmented reality technologies.

Requirements:
Strong coding skills, particularly in Java, JavaScript and PHP. Labview would be of benefit but not essential. Some understanding of physical systems interfacing would also be useful.

References:
Labshare

Architecture for collaborative remotely-accessed labs
The leading remote labs management system – Sahara – has been designed to be consistent with multi-student distributed collaboration, but this functionality has not yet been fully explored or implemented. This project will investigate extending Sahara to incorporate distributed student collaboration within an experiment session.

Requirements:
Strong coding skills, particularly in Java, JavaScript and PHP. Labview would be of benefit but not essential. Some understanding of physical systems interfacing would also be useful.

References:
Labshare

Remote control of physical systems: Understanding Situational Awareness
It is becoming increasingly common to use remote access to control physical systems. For example, researchers within the Faculty have been exploring remote and autonomous control of Mars Rovers, mining equipment, teaching laboratory apparatus and fruit-picking robots. This project will focus on using eye tracking systems to understand the impact of remote access on the extent to which users' attention can be directed, and hence whether it is feasible to improve users' awareness of relevant aspects of the system under control.

Requirements:
Strong design and coding skills. Some understanding of physical systems interfacing would be beneficial.

Using VR glasses and a Wii controller to interact with remotely-accessed labs
Existing remote labs largely utilise conventional web-based or Labview interfaces. This project will explore the feasibility of using alternative technologies (such as VR glasses and/or game controllers) to interact with remote physical laboratory apparatus.

Requirements:
Strong coding skills, particularly in Java, JavaScript and PHP. Experience in coding for games devices would be beneficial but not essential.

References:
Labshare

Projects supervised by Josiah Poon

Extracting numerical information from literature for meta-analysis
A systematic review is a meta-analysis of the literature that is generally applied in the health area to support evidence-based medicine. It is a process of searching for and selecting relevant research papers, then extracting and synthesizing appropriate information from these papers to answer a research question. In the past, these reviews mostly concerned Randomized Controlled Trials (RCTs), but new areas, such as Diagnostic Test Accuracy (DTA), have emerged recently. Regardless of the focal area, researchers currently have to extract numerical information manually from these reports for meta-analysis. The information may be embedded in paragraphs, or it may appear as a table in the article. The aim of this project is to explore how this task of extracting numbers can be automated to minimize researchers' workload.
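
A naive regular-expression baseline gives a feel for the task; the patterns and example sentence below are illustrative, and a usable system would also need table parsing and enough context to know which arm and outcome each number belongs to:

```python
import re

PATTERNS = {
    "sample_size":   re.compile(r"\bn\s*=\s*(\d+)", re.I),
    "percentage":    re.compile(r"(\d+(?:\.\d+)?)\s*%"),
    "conf_interval": re.compile(
        r"95%\s*CI[:\s]*(\d+(?:\.\d+)?)\s*(?:-|to)\s*(\d+(?:\.\d+)?)", re.I),
}

def extract_numbers(sentence):
    """Return the first match for each pattern found in the sentence."""
    found = {}
    for name, pat in PATTERNS.items():
        m = pat.search(sentence)
        if m:
            found[name] = m.groups() if len(m.groups()) > 1 else m.group(1)
    return found

text = ("Of the treatment group (n = 120), 37.5% responded, "
        "with a relative risk of 1.8 (95% CI 1.2 to 2.7).")
print(extract_numbers(text))
# {'sample_size': '120', 'percentage': '37.5', 'conf_interval': ('1.2', '2.7')}
```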

References:
Tong, M. Hsu, W. and Taira R. (2013). A Formal Representation for Numerical Data Presented in Published Clinical Trial Reports, in Studies in Health Technology and Informatics, 192, MEDINFO, 2013, pp.856-60. (DOI: 10.3233/978-1-61499-289-9-856)

Pre-requisites: It is anticipated that the student will have completed, or will take, COMP5318 Knowledge Discovery and Data Mining.

What are the important things in the Terms and Conditions?
Terms and conditions (T&C) are the fine details in a binding contract. In the past, we would find a lawyer to help us go through them so that we could make the right decision. Nowadays, we are flooded with ad hoc contracts to consider whenever we buy services or items over the internet. These T&C can run to so many pages that we eventually give up reading and simply press the “agree” button without realising the consequences. The aim of this project is to study and identify the important points in a T&C document, and to use text mining techniques to extract this important information. Hopefully, we can develop a tool to help us make the right decision. This project is also closely related to document summarisation.
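
One simple extractive baseline, sketched below, scores each sentence by the sum of its TF-IDF weights and keeps the top few; a real tool would add legal-domain cues (obligations, fees, waivers) on top of this:

```python
import re
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def top_sentences(document, k=3):
    """Rank sentences by total TF-IDF weight and return the top k in order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    scores = np.asarray(tfidf.sum(axis=1)).ravel()   # sentence importance
    best = np.argsort(-scores)[:k]
    return [sentences[i] for i in sorted(best)]      # keep original order

terms = ("The service is provided as is. We may terminate your account at any "
         "time without notice. You waive any right to a class action. A monthly "
         "fee of $9.99 applies after the trial. Contact support for questions.")
for s in top_sentences(terms):
    print("-", s)
```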

References:
De Araujo, D.A.; Rigo, S.J.; Muller, C.; Chishman, R., "Automatic Information Extraction from Texts with Inference and Linguistic Knowledge Acquisition Rules," Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2013 IEEE/WIC/ACM International Joint Conferences on, vol.3, no., pp.151,154, 17-20 Nov. 2013 (doi: 10.1109/WI-IAT.2013.171)

Tin Tin Cheng; Cua, J.L.; Tan, M.D.; Yao, K.G.; Roxas, R.E., "Information extraction from legal documents," Natural Language Processing, 2009. SNLP '09. Eighth International Symposium on , vol., no., pp.157,162, 20-22 Oct. 2009 (doi: 10.1109/SNLP.2009.5340925)

Pre-requisites: It is anticipated that the student will have completed, or will take, COMP5318 Knowledge Discovery and Data Mining.

Projects supervised by Javid Taheri

Data/Network/Speed-Aware Job Scheduling for Distributed/Cloud Computing
Distributed computing has changed significantly since the arrival of cloud computing and its abundance of cores. Regardless of the number of available cores, the fact that these cores should be used optimally has not changed. This project aims to schedule jobs among heterogeneous distributed/cloud resources so as to optimize multiple objectives (such as the makespan of jobs and the transfer time of data). The work starts with simulation and then moves on to deploying the designed algorithm on the real distributed computing environment we have in the school.
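
To illustrate the flavour of the problem, the sketch below implements a greedy heuristic that assigns each job to the machine with the earliest estimated finish time, accounting for both compute speed and data transfer; all figures are invented:

```python
def schedule(jobs, machines):
    """jobs: [(name, work_units, data_mb)]; machines: [(name, speed, bw_mbps)]."""
    ready = {m[0]: 0.0 for m in machines}        # time each machine frees up
    plan = []
    # placing the longest jobs first tends to reduce makespan for this heuristic
    for name, work, data in sorted(jobs, key=lambda j: -j[1]):
        # estimated finish = machine availability + transfer time + compute time
        best = min(machines,
                   key=lambda m: ready[m[0]] + data / m[2] + work / m[1])
        finish = ready[best[0]] + data / best[2] + work / best[1]
        ready[best[0]] = finish
        plan.append((name, best[0], finish))
    return plan, max(ready.values())             # assignment and makespan

jobs = [("j1", 100, 500), ("j2", 40, 2000), ("j3", 80, 100), ("j4", 60, 50)]
machines = [("fast-far", 10.0, 10.0), ("slow-near", 4.0, 100.0)]
plan, makespan = schedule(jobs, machines)
for job, m, t in plan:
    print(f"{job} -> {m} (done at t={t:.1f})")
print(f"makespan: {makespan:.1f}")
```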

Modelling a Real Distributed Environment
Distributed computing has changed significantly since the arrival of cloud computing and its abundance of cores. Having so many cores distributed over many corners of the globe also introduces new challenges in closing the gap between what we expect to see (in simulation) and what actually happens when scheduled jobs are executed on several machines. This project aims to close this gap by conducting a series of job scheduling allocations in different real distributed environments, followed by appropriate measurements. The output of this project would be a model that takes the scheduling decisions of a system and predicts how long they would actually take if deployed in a real environment.