Title: Typing MapReduce

Speaker: Jens Dörre (University of Passau, Germany)

Host: Ralf Lämmel, Inst. for Software Technology and CS

Date/Time: 14 Sep 2011 (Wednesday), 10am (ct)

Room: B 132


MapReduce is a framework for processing large, distributed data sets. It is based on the combinators map and reduce as used in functional programming, where they can be strongly typed using parametric polymorphism. Companies with large data centers, such as Google, Yahoo, and Amazon, have made the combination of map and reduce practical for processing very large data sets, typically spread across a cluster of computers. Their MapReduce frameworks are implemented in imperative languages and, in the interest of efficiency, are quite complex. The price is a partial loss of structure, which increases the risk of undetected programming errors, in particular type errors. The goal of the project MapReduceFoundation is to make MapReduce type-safe in an imperative setting and more efficient in a functional setting (where type safety is a given).

As a first step, we look into Hadoop, the open-source Java MapReduce framework developed largely at Yahoo. In Hadoop, the connection between the two phases of a MapReduce computation, map and reduce, is unsafe: the generic type parameters of the map output and of the reduce input are not checked statically against each other. We provide such a static check for Hadoop programs. To this end, we use strongly typed higher-order functions, which the standard Java 5 type checker checks together with the Hadoop program.
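To illustrate the general idea of checking the map/reduce connection statically, here is a minimal in-memory sketch. It is not Hadoop's API and not the project's actual implementation. The interfaces `MapFn` and `ReduceFn` and the combinator `run` are hypothetical names chosen for this example. Because `run` requires the mapper's output types `K2`, `V2` to be the same type variables as the reducer's input types, the Java compiler rejects mismatched phases. Lambdas (Java 8) are used only for brevity; the type-level argument relies solely on generics, which have been available since Java 5.

```java
import java.util.*;
import java.util.function.BiConsumer;

// Hypothetical, simplified sketch of a statically typed MapReduce combinator.
// This is NOT Hadoop's API; it only illustrates the typing idea.
public class TypedMapReduce {

    // Map phase: one input pair (K1, V1) yields zero or more intermediate pairs (K2, V2).
    interface MapFn<K1, V1, K2, V2> {
        void map(K1 key, V1 value, BiConsumer<K2, V2> emit);
    }

    // Reduce phase: one intermediate key with all its values yields output pairs (K3, V3).
    interface ReduceFn<K2, V2, K3, V3> {
        void reduce(K2 key, List<V2> values, BiConsumer<K3, V3> emit);
    }

    // K2 and V2 are shared by both parameters, so the compiler rejects a reducer
    // whose input types differ from the mapper's output types.
    static <K1, V1, K2, V2, K3, V3> Map<K3, V3> run(
            Map<K1, V1> input,
            MapFn<K1, V1, K2, V2> mapper,
            ReduceFn<K2, V2, K3, V3> reducer) {
        // Map and shuffle: collect intermediate values grouped by key, in first-seen order.
        Map<K2, List<V2>> groups = new LinkedHashMap<>();
        input.forEach((k, v) -> mapper.map(k, v,
                (k2, v2) -> groups.computeIfAbsent(k2, x -> new ArrayList<>()).add(v2)));
        // Reduce: apply the reducer to each group and store the emitted pairs.
        Map<K3, V3> output = new LinkedHashMap<>();
        groups.forEach((k2, vs) -> reducer.reduce(k2, vs, output::put));
        return output;
    }

    // Word count mapper: (line number, line text) -> (word, 1) for every word.
    static final MapFn<Integer, String, String, Integer> TOKENIZE = (lineNo, line, emit) -> {
        for (String w : line.split("\\s+")) {
            if (!w.isEmpty()) {
                emit.accept(w, 1);
            }
        }
    };

    // Word count reducer: (word, [1, 1, ...]) -> (word, total).
    static final ReduceFn<String, Integer, String, Integer> SUM = (word, counts, emit) -> {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        emit.accept(word, total);
    };

    public static void main(String[] args) {
        Map<Integer, String> lines = new LinkedHashMap<>();
        lines.put(1, "to be or not");
        lines.put(2, "to be");
        System.out.println(run(lines, TOKENIZE, SUM)); // prints {to=2, be=2, or=1, not=1}

        // A reducer expecting Long values cannot be combined with TOKENIZE,
        // because V2 cannot be both Integer and Long:
        //   ReduceFn<String, Long, String, Long> sumLong = ...;
        //   run(lines, TOKENIZE, sumLong); // compile-time type error
    }
}
```

In this sketch, a mismatch between the two phases surfaces as a compile-time error at the call to `run`, rather than as a runtime failure while the job executes.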

In the future, we plan to provide domain-specific debugging and type-checking facilities. To this end, we will design a DSL for MapReduce that will ultimately be able to model workflows of multiple MapReduce computations interconnected by data flow.

Biography of the Presenter

Jens Dörre has been a PhD student at the Department of Informatics and Mathematics at the University of Passau, Germany, since 2009. Since June 2010, he has been investigating typing issues in MapReduce programs with Christian Lengauer (Programming Group) and Sven Apel (Software Product-Line Group). This work is funded by the DFG (German Research Foundation) grant "MapReduceFoundation".

He studied Computer Science at the University of Passau, Germany, and Laval University, Canada, and obtained his diploma (master's degree) in Computer Science from the University of Passau in 2009 with a thesis on "Feature-Oriented Composition of XML Artifacts".

He is interested in applying concepts from the field of programming languages to the construction of parallel and distributed systems. These concepts include the construction of languages and their processors, functional programming, type systems, and abstract interpretation. With their help, he aims to construct systems that are easy to write, efficient, portable, and scalable.