CDH (Cloudera's Distribution including Apache Hadoop)
Updated: Sep 10, 2019
Cloudera provides a scalable, flexible, integrated platform that makes it easy to manage the rapidly increasing volumes and varieties of data in your enterprise. Cloudera products and solutions enable you to deploy and manage Apache Hadoop and related projects, manipulate and analyze your data, and keep that data secure and protected.
CDH is the most complete, tested, and popular distribution of Apache Hadoop and related projects. CDH delivers the core elements of Hadoop - scalable storage and distributed computing - along with a Web-based user interface and vital enterprise capabilities. CDH is Apache-licensed open source and is the only Hadoop solution to offer unified batch processing, interactive SQL and interactive search, and role-based access controls.
Flexibility: Store any type of data and manipulate it with a variety of computation frameworks, including batch processing, interactive SQL, free text search, machine learning and statistical computation.
Integration: Get up and running quickly on a complete Hadoop platform that works with a broad range of hardware and software solutions.
Security: Process and control sensitive data.
Scalability: Enable a broad range of applications, then scale and extend them to suit your requirements.
High Availability: Perform mission-critical business tasks with confidence.
Compatibility: Leverage your existing IT infrastructure and investment.
What is Hadoop?
"an open source software platform for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware" - Hortonworks.
Data's too darn big (terabytes per day)
Vertical scaling doesn't cut it
Disk seek times
Horizontal scaling is linear
Hadoop: It's not just for batch processing anymore
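To put rough numbers on the scaling points above, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function name `scan_hours`, the 100 MB/s per-disk read rate, and the 10 TB data size are made up, and the model is idealized (data split perfectly evenly, no network, coordination, or replication overhead). Real clusters will not scale this cleanly.

```python
def scan_hours(data_tb, disks, mb_per_sec_per_disk=100.0):
    """Hours to read `data_tb` terabytes split evenly across `disks` drives.

    Idealized model: every disk reads its share in parallel at the same
    sequential rate. The 100 MB/s default is an assumed, illustrative figure.
    """
    total_mb = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    seconds = total_mb / (disks * mb_per_sec_per_disk)
    return seconds / 3600


if __name__ == "__main__":
    # Same 10 TB of data, read by 1, 10, and 100 disks working in parallel.
    for disks in (1, 10, 100):
        print(f"{disks:>3} disk(s): {scan_hours(10, disks):.2f} hours")
```

Under these assumptions a single disk needs more than a day to read 10 TB once, while 100 disks reading in parallel finish in well under an hour. That is the intuition behind adding machines (horizontal scaling) instead of buying one bigger machine (vertical scaling).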
The spoiler for HDFS (Hadoop Distributed File System)
See you in the next blog :)