Project Offerings

Project Offerings for Semester 2, 2014

For 2014, I have projects on offer in three areas: large-scale distributed data management, next-generation data analytics (Big Data), and data management for bio-data.

If you are interested in any of these projects, please contact me by email or in person.


Projects on Large-Scale Distributed Data Management:

Data Processing on Multi-Core Machines: Building a Virtual Database Cluster (Honours project)

Multi-core computers are becoming increasingly common for large servers. At the end of this year, server CPUs with 128 cores will become available. This poses a real challenge to database engines, as they are optimised for concurrent workloads that share resources and hide latency, rather than for large numbers of parallel cores that can run many queries completely independently. There is a body of work on optimising databases for distributed systems such as a cluster of databases. In this project, we are interested in learning how those techniques perform when applied to a single multi-core machine that is configured as a 'virtual' cluster by deploying several virtual machines on the same hardware. To this end, we have both a large multi-core machine and a small research cluster available as hardware platforms. The project student shall compare the performance of an open source DBMS on both platforms for a given workload and develop a new load distribution technique that optimises the performance of the virtual database cluster.
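As a point of reference for the load distribution part, a simple baseline policy could route each incoming query to the virtual machine with the fewest queries currently running. The sketch below is only an illustration of such a baseline, not the technique the project is meant to develop; the class names and the node names `vm1` to `vm3` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One virtual database node (hypothetical model, not a real DBMS handle)."""
    name: str
    active: int = 0  # number of queries currently running on this node


class LeastLoadedDispatcher:
    """Baseline policy: send each query to the node with the fewest active queries."""

    def __init__(self, node_names):
        self.nodes = [Node(name) for name in node_names]

    def dispatch(self):
        # min() returns the first node among ties, so the order is deterministic.
        node = min(self.nodes, key=lambda n: n.active)
        node.active += 1
        return node

    def finish(self, node):
        # Called when a query completes on the given node.
        node.active -= 1


dispatcher = LeastLoadedDispatcher(["vm1", "vm2", "vm3"])
first = dispatcher.dispatch()
second = dispatcher.dispatch()
dispatcher.finish(first)
third = dispatcher.dispatch()
print(first.name, second.name, third.name)
```

A baseline like this ignores query cost and data placement, which is where a new technique would have room to improve on it.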

Robust Snapshot Replication with PostgreSQL 9 (18cp MIT Project or Honours project)

This project will implement our group's Robust Snapshot Replication protocol [ADC2013] in the current PostgreSQL 9 database engine and evaluate it.

Database Cluster Management Tool (Software Development: MIT 12cp / TSP / Engineering Project / Undergraduate project for 1-2 students)

We have a small database cluster of 8 nodes which we use for several research projects. It is a multi-boot cluster (Linux and Windows 2003 Server) that can run different database engines, both commercial and open source, such as Oracle, Microsoft SQL Server, and PostgreSQL. We need a platform-independent monitoring tool with a GUI that helps us (a) keep track of the current cluster state and (b) reboot cluster nodes into different configurations. Ideally, it would also include a cluster allocation component to manage our research projects and allow us to use subsets of the cluster concurrently in different projects. This project shall review existing database cluster management tools and set up a suitable solution, possibly extended with self-developed software components.

Skills needed: Some experience with programming and databases; a system administration background is an advantage

Suitable majors: Databases, Software Engineering, Networking


Projects on Next-Generation Data Analytics:

Building a NoSQL Store with a Speculation-friendly Tree Index

NoSQL presents several advantages over classical Relational Database Management Systems (RDBMS), such as the efficiency and readability of the code accessing it. The drawback, however, is that it guarantees weaker properties than an RDBMS. More precisely, a NoSQL store typically provides basically available, soft state, eventually consistent (BASE) semantics, whereas an RDBMS guarantees the ACID properties, including the atomicity of transactions, for which the 'A' in ACID stands.

At the same time, current hardware developments offer high degrees of parallelism for pure in-memory data processing. Server CPUs with 64 cores are already available today, and main memory sizes at the terabyte scale are already feasible. The challenge for software designed for those systems is to avoid synchronisation bottlenecks that would waste these hardware advances.

The goal of this project is to combine the benefits of a NoSQL store (in-memory efficiency) with the advantage of a traditional RDBMS (atomicity) through a new speculation-friendly data structure. A speculation-friendly algorithm is appealing for leveraging many-core memory transactions without experiencing contention hot spots. The key idea lies in allowing multiple threads (or processes) to modify the store concurrently while requiring a single thread to adapt the underlying structure in the background. More precisely, the project would consist of implementing a transactional NoSQL store by means of several instances of a speculation-friendly tree, each indexing a different column of the same data row. The resulting multiple-key-single-value store will be accessible through efficient atomic transactions that will be evaluated under realistic workloads.
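To make the multiple-key-single-value interface concrete, here is a minimal sketch of a store in which one row is reachable through several indexes and every update touches all indexes atomically. It is explicitly not the speculation-friendly tree: plain Python dictionaries stand in for the tree indexes and a single coarse lock stands in for memory transactions, which is exactly the kind of bottleneck the project aims to remove. The class name, method names and column names are hypothetical.

```python
import threading


class MultiKeyStore:
    """Rows reachable through one index per indexed column.

    Illustration only: dicts replace the tree indexes and one global lock
    replaces transactions, so this shows the interface, not the technique.
    """

    def __init__(self, indexed_columns):
        self._lock = threading.Lock()
        self._indexes = {col: {} for col in indexed_columns}

    def put(self, row):
        """Insert the row into every index, or into none of them."""
        with self._lock:
            # Validate against all indexes first so a failure changes nothing.
            for col, index in self._indexes.items():
                if row[col] in index:
                    raise KeyError(f"duplicate {col}={row[col]!r}")
            for col, index in self._indexes.items():
                index[row[col]] = row

    def get(self, column, key):
        """Look a row up by any indexed column; returns None if absent."""
        with self._lock:
            return self._indexes[column].get(key)

    def delete(self, column, key):
        """Remove the row from every index; returns False if it was absent."""
        with self._lock:
            row = self._indexes[column].pop(key, None)
            if row is None:
                return False
            for col, index in self._indexes.items():
                if col != column:
                    del index[row[col]]
            return True


store = MultiKeyStore(["id", "email"])
store.put({"id": 1, "email": "a@example.org", "name": "Ada"})
print(store.get("email", "a@example.org")["name"])
```

In the project itself, the global lock would give way to transactions over speculation-friendly trees, so that concurrent updates to different rows do not serialise on one lock.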

This will be a technical project that needs good programming skills in either C or Java, and some experience with concurrent programming (as taught in COMP2129) and database structures (as taught in INFO3404).

The project will be co-supervised by Dr Gramoli and Dr Roehm.

A Touch Interface for SQL Databases (Honours project)

More and more computing systems are produced with touch interfaces, from smartphones and tablets to the latest versions of desktop operating systems (Windows 8 and Mac OS X). At the same time, the basic interface to database systems is still SQL, a text-based query language that requires keyboard input and is hard for novice users to learn.

In our TouchQL project, we aim to develop a query 'language' that is based purely on a graphical schema representation and input gestures, and that allows users to query a relational database using a tablet computer.

An initial prototype of TouchQL already exists for Android devices; it supports basic selections, projections and natural joins over local databases.

The goal of this Honours project is to extend this system with a mechanism for grouping and aggregation, and also to support querying remote databases. The challenge in the latter part is to provide timely feedback to the user on the intended operations, because in TouchQL there is no separation between query formulation and query execution: users shall get immediate feedback on their intended actions on the actual data set. It would be an additional benefit if the student were able to port TouchQL from the Java-based Android to the Objective-C based iOS.


Projects on Database Support for Bioinformatics:

Bio-Data Processing using Map/Reduce (Honours project or 18cp MIT Research Project)

This project will investigate the suitability of a map/reduce framework for the parallel processing of DNA fragment data (so-called 'short reads'). The student shall implement a short-read comparison algorithm on the database research cluster of the DBRG using the open source Hadoop system.
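To give a feel for how a comparison of short reads can be phrased as map and reduce steps, the sketch below counts how many k-mers (substrings of length k) each pair of reads has in common. This is an assumed example task chosen for illustration, not the comparison algorithm of the project, and it runs the phases in a single Python process rather than on Hadoop. All function names and the sample reads are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_read(read_id, sequence, k):
    """Map step: emit (k-mer, read_id) for every distinct k-mer in the read."""
    kmers = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
    return [(kmer, read_id) for kmer in kmers]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_kmer(kmer, read_ids):
    """Reduce step: emit every pair of reads that contain this k-mer."""
    return list(combinations(sorted(set(read_ids)), 2))


def shared_kmer_counts(reads, k):
    """Run map, shuffle and reduce, then count shared k-mers per read pair."""
    emitted = [pair for rid, seq in reads.items() for pair in map_read(rid, seq, k)]
    counts = defaultdict(int)
    for kmer, read_ids in shuffle(emitted).items():
        for pair in reduce_kmer(kmer, read_ids):
            counts[pair] += 1
    return dict(counts)


reads = {"r1": "ACGTAC", "r2": "CGTACG", "r3": "TTTTTT"}
print(shared_kmer_counts(reads, k=4))
```

With these sample reads and k = 4, `r1` and `r2` share the k-mers `CGTA` and `GTAC`, while `r3` shares none with either. On a real cluster, the map and reduce functions would run in parallel over partitions of the read set.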

Skills needed: INFO2x20, COMP5138 or equivalent database course (INFO3404 would be perfect); good programming skills; Bioinformatics background not necessary

Suitable MIT majors: Databases, Software Engineering, Computer Science

I also maintain a list of former projects supervised by me in recent years.