Proposals for Bachelor's/Master's theses

List of topics

A distributed backup system

The system would be peer-to-peer or community-based as opposed to cloud-based.
The system could be developed on top of BitTorrent, for example.
The particular challenge lies in making this system procedurally robust with regard to loss of backups.
Additional challenges include security and efficiency (with regard to extra space and bandwidth needed for backups).
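
To make the robustness question more tangible, here is a minimal sketch of replicated chunk placement across peers. Everything in it is an assumption for illustration: the names `chunk`, `place`, and `recoverable`, the round-robin placement, and the peer labels are hypothetical and are not tied to BitTorrent or any existing system.

```python
import hashlib

def chunk(data, size):
    """Split a byte string into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def place(chunks, peers, replicas):
    """Assign each chunk to `replicas` distinct peers in round-robin order.

    Returns a list of (chunk_id, holders) pairs. The placement strategy is a
    hypothetical stand-in; a real system would also weigh peer reliability.
    """
    if replicas > len(peers):
        raise ValueError("need at least as many peers as replicas")
    placement = []
    for i, c in enumerate(chunks):
        chunk_id = hashlib.sha256(c).hexdigest()
        holders = [peers[(i + r) % len(peers)] for r in range(replicas)]
        placement.append((chunk_id, holders))
    return placement

def recoverable(placement, failed_peers):
    """A backup survives if every chunk still has at least one live holder."""
    return all(
        any(peer not in failed_peers for peer in holders)
        for _, holders in placement
    )

peers = ["p0", "p1", "p2", "p3"]
placement = place(chunk(b"abcdefghij", 4), peers, replicas=2)
print(recoverable(placement, {"p1"}))
print(recoverable(placement, {"p0", "p1"}))
```

The sketch also shows the trade-off named above: a higher `replicas` value tolerates more peer losses but costs proportionally more space and bandwidth.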

Integration of MapReduce and RDBMS-like indexing

The initial MapReduce model does not support any clever indexing à la RDBMS. However, one could think of MapReduce computations as a particular means of creating indexes. Hence, two natural questions arise: how can such indexes best be created in a distributed file system, and how can the MapReduce programming model be extended effectively so that it takes appropriate advantage of them? See for some related discussion and, in fact, products. Which scenarios actually require indexes in the first place?
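
The idea of a MapReduce computation as an index builder can be sketched in a few lines. This is an in-memory toy under stated assumptions: `map_phase`, `reduce_phase`, `build_index`, `lookup`, and the sample records are hypothetical names and data, and record offsets stand in for file positions in a distributed file system.

```python
from collections import defaultdict

def map_phase(records, key_of):
    """Emit (key, offset) pairs; key_of is a hypothetical key extractor."""
    for offset, record in enumerate(records):
        yield key_of(record), offset

def reduce_phase(pairs):
    """Group offsets by key, which yields a simple secondary index."""
    index = defaultdict(list)
    for key, offset in pairs:
        index[key].append(offset)
    return dict(index)

def build_index(records, key_of):
    """Run the map and reduce steps back to back."""
    return reduce_phase(map_phase(records, key_of))

def lookup(records, index, key):
    """Answer a point query via the index instead of a full scan."""
    return [records[i] for i in index.get(key, [])]

records = [
    {"user": "alice", "bytes": 120},
    {"user": "bob", "bytes": 300},
    {"user": "alice", "bytes": 80},
]
index = build_index(records, key_of=lambda r: r["user"])
print(lookup(records, index, "alice"))
```

A later job that only needs the records of one user could consult `index` first and read just the matching offsets, which is the kind of advantage an extended programming model would have to expose.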

Programming file-system population and transformation

The focus of the MapReduce programming model is on queries, say data extraction over large sets of homogeneous records. An orthogonal question is the actual population of a distributed file system. Think, for example, of a web crawler that is supposed to produce files that are to be processed via MapReduce. What is the programming model for such a crawler? In particular, how is parallelization to be achieved effectively, and how do we distribute the data usefully? (To what extent does the distributed file system underneath support this task?) Now, also consider the problem of transforming the MapReduce-like input data. As a more concrete example, think of sorting, grouping, or incorporating a delta. Again, what programming model is appropriate here?
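
One possible answer to the data distribution question is to partition crawled URLs by host, so that each crawler worker owns a disjoint set of hosts and writes its own output files. The sketch below assumes this strategy purely for illustration; `partition_for`, `assign`, and the example URLs are hypothetical.

```python
import hashlib
from urllib.parse import urlparse

def partition_for(url, num_partitions):
    """Map a URL to a partition by hashing its host name.

    A cryptographic hash is used only because it is stable across runs
    and machines, unlike Python's built-in hash() for strings.
    """
    host = urlparse(url).netloc
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

def assign(urls, num_partitions):
    """Distribute URLs over partitions; one crawler worker per partition."""
    partitions = {p: [] for p in range(num_partitions)}
    for url in urls:
        partitions[partition_for(url, num_partitions)].append(url)
    return partitions

urls = [
    "http://example.org/a",
    "http://example.org/b",
    "http://example.com/index.html",
]
partitions = assign(urls, num_partitions=4)
print(partitions)
```

Keeping all pages of one host in the same partition is one way to group related data; whether this is a useful distribution for the subsequent MapReduce jobs is exactly the open question.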

Extending the MapReduce model with a delta facility

Using the Diff format, for example, one could represent a sequence of versions of input data. Some initial investigation is needed to substantiate the usefulness of such a feature. Abstractly, there are these use cases: i) reasoning explicitly about version history; ii) efficient representation of the latest version in terms of an existing version; iii) support for efficient MapReduce computations if they can essentially focus on the delta.
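
Use case iii) can be illustrated with a word count that is updated from a delta instead of being recomputed. This is a sketch under assumptions: it uses `difflib.ndiff` from the Python standard library as a stand-in for the Diff format, and `word_count` and `apply_delta` are hypothetical helper names.

```python
from collections import Counter
import difflib

def word_count(lines):
    """Full computation: count words over all lines of one version."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def apply_delta(old_counts, old_lines, new_lines):
    """Update an existing result by processing only the changed lines."""
    counts = Counter(old_counts)
    for entry in difflib.ndiff(old_lines, new_lines):
        tag, line = entry[:2], entry[2:]
        if tag == "+ ":    # line added in the new version
            counts.update(line.split())
        elif tag == "- ":  # line removed from the old version
            counts.subtract(line.split())
        # "  " (unchanged) and "? " (hint) entries need no work
    return +counts  # unary plus drops words whose count fell to zero

v1 = ["a b", "c d", "a a"]
v2 = ["a b", "c e", "a a", "f"]
updated = apply_delta(word_count(v1), v1, v2)
print(updated == word_count(v2))
```

This only works so directly because word counting is additive over lines; computations without such a structure would need a different treatment of the delta.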

Reconciliation of MapReduce and Algorithmic Skeletons

Algorithmic Skeletons