Last week, HPE announced its acquisition of MapR assets. As a MapR partner, we at Kyligence are happy to see that MapR’s great technology has found a new home.
MapR has a great file system that is compatible not only with HDFS but also with the Linux file system. With MapR-FS, developers can mount a MapR cluster as a Linux directory and run standard Linux commands in it. Behind the scenes, files are replicated across the cluster just as in a Hadoop file system, and they can be processed by standard Hadoop frameworks such as Hive, YARN, and more.
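As a rough illustration of what a Linux-compatible mount means in practice, the sketch below writes and lists files using nothing but ordinary file operations. It assumes the cluster is already mounted on the local machine; the `MAPR_MOUNT` environment variable, the `landing/events` directory, and the `write_and_list` helper are hypothetical names chosen for this example, not part of any MapR API. When the variable is unset, the script falls back to a temporary local directory so it can run anywhere.

```python
import os
import tempfile
from pathlib import Path


def write_and_list(mount_root, relative_dir, filename, text):
    """Write a text file under a mounted directory and return the sorted
    names of the entries in that directory, using only standard file I/O."""
    target_dir = Path(mount_root) / relative_dir
    target_dir.mkdir(parents=True, exist_ok=True)
    (target_dir / filename).write_text(text, encoding="utf-8")
    return sorted(p.name for p in target_dir.iterdir())


if __name__ == "__main__":
    # MAPR_MOUNT is a hypothetical variable pointing at the mounted cluster
    # directory; a temporary local directory stands in when it is not set.
    root = os.environ.get("MAPR_MOUNT") or tempfile.mkdtemp()
    print(write_and_list(root, "landing/events", "sample.csv", "id,value\n1,42\n"))
```

The point of the sketch is that the application code contains nothing specific to the storage layer: replication and distribution happen below the mount point, so the same script works against a local disk or a mounted cluster.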
On top of this file system, MapR has also built a database that is compatible with HBase (but doesn’t have the stability issues of HBase) and an event store that is compatible with Kafka. MapR’s data platform has been serving some of the world’s largest enterprises in some of the most challenging scenarios (e.g. a biometric database for India’s population of 1.3 billion).
What makes the HPE acquisition interesting is that HPE also acquired BlueData nine months ago. BlueData enables Big Data applications to operate in container environments. Container technologies such as Docker are well suited to stateless applications, but they are not a great deployment option for stateful applications such as Hadoop and Spark. BlueData EPIC makes it easier to run Hadoop and Spark on Docker, and it also works, to a degree, with Kubernetes to orchestrate container clusters.
MapR can further enhance BlueData’s capability to containerize Big Data applications. MapR’s Persistent Application Client Containers (PACCs) are designed to be the storage foundation for containers. Containers can be spun up or shut down freely, because the data itself is stored in the MapR cluster, replicated there, and dynamically mounted to each container.
Users don’t have to worry about the location of the data. All they need to know is that when the container is ready, the data is ready. This is critical for running a stateful application in a container environment. HPE now has the two key technologies needed to bring Big Data applications into the containerized environment.
Kyligence has traditionally run on Hadoop; Cloudera, Hortonworks, and MapR are all officially supported enterprise platforms. Earlier this month we announced the GA of Kyligence 4.0, which is fully built on top of Apache Spark and supports both on-YARN and standalone deployments. We are excited by the potential that HPE’s acquisition of MapR assets brings.
Running Kyligence with Spark in a containerized environment will greatly reduce customers’ operations overhead and increase the flexibility of their analytics capabilities. We look forward to hearing more from the joint HPE/MapR team.