Pydoop is a Python interface to Hadoop that allows you to write MapReduce applications in pure Python:

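import pydoop.mapreduce.api as api


class Mapper(api.Mapper):

    def map(self, context):
        # Emit a count of 1 for every whitespace-separated word in the input value.
        for w in context.value.split():
            context.emit(w, 1)


class Reducer(api.Reducer):

    def reduce(self, context):
        # Sum the counts collected for this word.
        context.emit(context.key, sum(context.values))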
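
To see what the mapper and reducer above compute without a Hadoop cluster, the word count can be simulated locally. The sketch below does not use Pydoop at all: MapContext, ReduceContext, map_record, reduce_group and run_local are hypothetical stand-ins written for illustration, and they mimic only the context.value, context.key, context.values and context.emit names that appear in the example.

```python
from collections import defaultdict


class MapContext:
    """Hypothetical stand-in for the context a mapper receives (not Pydoop's class)."""

    def __init__(self, value, sink):
        self.value = value
        self._sink = sink

    def emit(self, key, value):
        self._sink.append((key, value))


class ReduceContext:
    """Hypothetical stand-in for the context a reducer receives (not Pydoop's class)."""

    def __init__(self, key, values, sink):
        self.key = key
        self.values = values
        self._sink = sink

    def emit(self, key, value):
        self._sink.append((key, value))


def map_record(context):
    # Same logic as Mapper.map in the example above.
    for w in context.value.split():
        context.emit(w, 1)


def reduce_group(context):
    # Same logic as Reducer.reduce in the example above.
    context.emit(context.key, sum(context.values))


def run_local(lines):
    # Map phase: each input line yields (word, 1) pairs.
    mapped = []
    for line in lines:
        map_record(MapContext(line, mapped))

    # Shuffle phase: group the emitted values by key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Reduce phase: one call per distinct key, in sorted key order.
    reduced = []
    for key in sorted(groups):
        reduce_group(ReduceContext(key, groups[key], reduced))
    return dict(reduced)


if __name__ == "__main__":
    print(run_local(["the quick brown fox", "the lazy dog"]))
```

In a real job, Hadoop performs the grouping step between the two phases and distributes the calls across the cluster; the simulation only shows the data flow on a single machine.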

Feature highlights:

Pydoop enables MapReduce programming through a Python client for Hadoop Pipes, written in pure Python except for a performance-critical serialization section, and provides HDFS access through an extension module based on libhdfs.

To get started, read the tutorial. Full docs, including installation instructions, are listed below.
