Free Hadoop distribution comparison (Cloudera, Hortonworks, MapR)

Cloudera
Cloudera was first to market with its Cloudera Distribution including Apache Hadoop (CDH), which helped it gain valuable experience and establish a solid customer base. Besides the core Hadoop platform (HDFS, MapReduce, Hadoop Common), CDH integrates 10 open source projects, including HBase, Mahout, Pig, and ZooKeeper. Cloudera offers CDH, which is 100% open source, as a free download, along with a free edition of its Cloudera Manager console for administering and managing Hadoop clusters of up to 50 nodes. The enterprise version, on the other hand, combines CDH with a more sophisticated Manager and an enterprise support package. The latest release, CDH4, is the only distro of the three that is built on Hadoop 2.0.
Recently, Cloudera announced two significant partnerships. IBM announced that BigInsights will run on CDH in addition to IBM's own Hadoop distribution. This was closely followed by a partnership with HP.
Hortonworks
Hortonworks announced the general availability of its Hortonworks Data Platform (HDP) in June. The HDP distro is 100% Apache open source code, with no proprietary layers, so there is no vendor lock-in. The major difference from Cloudera and MapR is that HDP uses Apache Ambari for cluster management and monitoring. In its current 0.9 version, Ambari is certainly not as mature as Cloudera Manager or MapR's Heatmap. Perhaps surprisingly to some, HDP will be based only on the original Hadoop 1.0 codebase.
At the Hadoop Summit 2012, Microsoft announced that it had partnered with Hortonworks to put Hadoop on Azure.
MapR
The major difference from CDH and HDP is that MapR uses its proprietary file system, MapR-FS, instead of HDFS. The reason for this Unix-based file system is that MapR considers HDFS a single point of failure. The current version (v2.0) of the product is based on Apache Hadoop 0.20.2 and comes in two editions, M3 and M5. The fundamental difference between the free community edition M3 and the enterprise edition M5 is the extra high-availability features in M5. There is a MapR 2.0 Beta available, which I suppose will be built on Hadoop 2.0.
The company announced two prominent partnerships in June. First, both editions (M3 and M5) are now offered on Amazon's Elastic MapReduce service in addition to Amazon's own version of Hadoop (version 0.20.205). Second, MapR is now available on Google Compute Engine.
More Detail




Cloudera Hortonworks MapR M3
Open source
x

Has enterprise version
x

x




Automatic installation
    Cloudera:     run script → web admin → install via web GUI
    Hortonworks:  run script → web admin → install via web GUI
    MapR M3:      install via command line
Web UI manager
    Cloudera:     Cloudera Manager
    Hortonworks:  Ambari
    MapR M3:      MapR

Service and function                        Cloudera  Hortonworks  MapR M3
Host monitoring                             x         x            x
Service monitoring                          x         x            x
Activity monitoring for user jobs           x         x            x
Cluster-wide log search (Solr integration)  x         -            -
Events management (alerts)                  x         x            x
Enhanced cluster statistics                 x         x            x
File browsing                               x         x            x

Sub-frameworks available                    Cloudera  Hortonworks  MapR M3
Ambari (web UI management)                  -         x            -
Hue (Hadoop User Experience)                x         x            -
Oozie (workflow management)                 x         x            x
HDFS                                        x         x            x
WebHDFS                                     x         x            -
NFS                                         -         x            x
MapReduce                                   x         x            x
YARN (resource management)                  x         x            x
ZooKeeper (coordination)                    x         x            x
HBase                                       x         x            x
Hive & HCatalog                             x         x            x
Pig                                         x         x            x
Impala (high-performance query)             x         -            -
Tez (high-performance query)                -         -            -
Sqoop (RDBMS integration)                   x         x            x
Flume (log transfer)                        -         -            -
Whirr (cloud services)                      -         -            x

Security                                    Cloudera  Hortonworks  MapR M3
Kerberos                                    x         x            x
LDAP authentication                         x         x            x
Knox (secure Hadoop clusters)               -         x            -
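
When shortlisting a distribution against a list of required components, it can help to keep part of the comparison as data. The sketch below copies a handful of rows from the comparison above. The dictionary layout and the helper name distros_with are illustrative assumptions, not part of any vendor tooling.

```python
# A few rows copied from the comparison above: feature -> distributions marked "x".
# The names SUPPORT, ALL_DISTROS and distros_with are hypothetical, for illustration only.
ALL_DISTROS = {"Cloudera", "Hortonworks", "MapR M3"}

SUPPORT = {
    "Ambari": {"Hortonworks"},
    "Hue": {"Cloudera", "Hortonworks"},
    "Oozie": {"Cloudera", "Hortonworks", "MapR M3"},
    "WebHDFS": {"Cloudera", "Hortonworks"},
    "NFS": {"Hortonworks", "MapR M3"},
    "Impala": {"Cloudera"},
    "Whirr": {"MapR M3"},
    "Knox": {"Hortonworks"},
}


def distros_with(*features):
    """Return the distributions marked as supporting every requested feature."""
    result = set(ALL_DISTROS)
    for feature in features:
        # Intersect with the set of distributions that list this feature.
        result &= SUPPORT[feature]
    return sorted(result)


print(distros_with("Oozie", "NFS"))    # ['Hortonworks', 'MapR M3']
print(distros_with("Impala", "Knox"))  # []
```

An empty result, as in the second call, means that no single free distribution in the comparison covers all of the requested components.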