Thanks to the broad adoption of Apache Hadoop across many industries, a set of best practices has emerged for building applications on top of it. However, many developers, particularly those relatively new to Hadoop, find it challenging to learn what those best practices are and how to apply them.
Cloudera has created a new open source project, the Cloudera Development Kit (CDK), to help these developers get new projects off the ground more easily. The CDK is both a framework and a long-term initiative for documenting proven development practices and providing helpful documentation and APIs that make Hadoop application development as easy as possible.
This on-demand webinar covers:
- The current CDK release and its target use cases
- How the CDK will be managed and extended over time
- Why the CDK will have a long-term impact on Hadoop adoption
Cloudera Development Kit (CDK): Hadoop Application Development Made Easier
Cloudera Development Kit:
Hadoop Application Development Made Easier
E. Sammer | Engineering Manager
May 2013
“[I]t’s not enough to just build a scalable and stable system; the system also has to be easy enough for thousands of internal developers of all types and all skill levels to use.”
http://gigaom.com/data/how-disney-built-a-big-data-platform-on-a-startup-budget/
Infrastructure details
Serialization, file formats, and compression
Metadata capture and maintenance
Dataset organization and partitioning
Durability and delivery guarantees
Well-defined failure semantics
Performance and health instrumentation
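Several of these infrastructure details, durability and well-defined failure semantics in particular, show up even in something as basic as writing a file. A common Hadoop convention is to write output under a temporary, hidden name and rename it into place only once it is complete, so readers never observe a half-written file. The following is a minimal sketch of that convention, assuming a local disk accessed through java.nio.file as a stand-in for HDFS; the class and method names are hypothetical and this is not CDK code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical sketch (not CDK code): write a data file so that readers
 * see either no file or the complete file, never a partial one.
 * Local java.nio.file paths stand in for HDFS here.
 */
class AtomicFileWriter {

    /** Writes content to a hidden temp file beside target, then renames it into place. */
    static void writeAtomically(Path target, String content) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // Leading '.' keeps the in-progress file hidden from tools that skip dotfiles.
        Path tmp = Files.createTempFile(dir, "." + target.getFileName(), ".tmp");
        try {
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
            // The rename is the commit point; a crash before it leaves target untouched.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // If anything failed before the move, remove the partial temp file.
            // After a successful move this is a no-op.
            Files.deleteIfExists(tmp);
        }
    }
}
```

The temp file is created in the same directory as the target so that the move stays within one file system, which is what allows it to be atomic. On HDFS, `FileSystem.rename` plays the same commit role.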
Cloudera Development Kit
Make Hadoop accessible to the enterprise developer
Codify expert patterns and practices
Make the “right thing” easy and obvious
Address the most common cases
Let developers focus on business logic, not infrastructure
Cloudera Development Kit
An open source set of libraries, guides, and examples for
building data-oriented systems and applications
Provides higher level APIs atop existing components of CDH
Supports piecemeal adoption via loosely coupled modules
CDK Data Module
High level APIs for interacting with datasets in HDFS
Configuration-based format and schema management
Consistent data model and serialization semantics
Metadata system integration and support
Automatic dataset partitioning and file management
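To make "automatic dataset partitioning" concrete: the idea is that a dataset's configuration, rather than application code, decides which directory each record lands in. The following is a minimal sketch of that mapping under stated assumptions. The record fields (a user id and an event timestamp), the 16 hash buckets, and the `user_hash=/year=/month=` directory layout are all hypothetical choices for illustration, not the CDK Data Module's API or on-disk format.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Locale;

/**
 * Hypothetical sketch (not the CDK API): derive the partition directory
 * a record belongs in from its field values, so application code never
 * assembles HDFS paths by hand.
 */
class PartitionPathSketch {

    /** Number of hash buckets for the user id field (an assumed setting). */
    static final int USER_BUCKETS = 16;

    /** Maps a record's user id and event time (epoch millis) to a relative partition path. */
    static String partitionPath(String userId, long epochMillis) {
        // Hash partitioning spreads users evenly; floorMod keeps negative hash codes in range.
        int bucket = Math.floorMod(userId.hashCode(), USER_BUCKETS);
        // Time partitioning lets date-range queries skip whole directories.
        ZonedDateTime t = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC);
        // Locale.ROOT guarantees ASCII digits regardless of the JVM's default locale.
        return String.format(Locale.ROOT, "user_hash=%d/year=%04d/month=%02d",
                bucket, t.getYear(), t.getMonthValue());
    }
}
```

For example, an event for user "alice" at 1367366400000 ms (2013-05-01T00:00Z) always lands under the same `user_hash=` bucket and under `year=2013/month=05`. Because the layout lives in one place, changing the partitioning scheme does not require touching every writer.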
Under development
Configuration-based record transformation and filtering engine
Data pipeline deployment, discovery, and management
Working with customers, partners, and the community on new modules and features
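The "configuration-based record transformation and filtering engine" listed above is still under development, so its design is not described here. The following is only a minimal sketch of the general idea, assuming filter rules are declared in configuration instead of compiled code. The `filter.<field>=<value>` property convention and the class name are invented for illustration and are not the planned CDK engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.function.Predicate;

/**
 * Hypothetical sketch (not the planned CDK engine): build a record filter
 * from configuration entries of the invented form "filter.<field>=<value>".
 */
class ConfigFilterSketch {

    /** Returns a predicate that keeps a record only if every configured field matches exactly. */
    static Predicate<Map<String, String>> fromConfig(Properties conf) {
        List<Predicate<Map<String, String>>> rules = new ArrayList<>();
        for (String key : conf.stringPropertyNames()) {
            if (key.startsWith("filter.")) {
                String field = key.substring("filter.".length());
                String expected = conf.getProperty(key);
                // A record missing the field yields null, which never equals the expected value.
                rules.add(record -> expected.equals(record.get(field)));
            }
        }
        // Keys without the "filter." prefix are ignored; with no rules, every record passes.
        return record -> rules.stream().allMatch(rule -> rule.test(record));
    }
}
```

With a properties file containing `filter.country=US`, only records whose `country` field is exactly `US` pass; changing the rule means editing configuration, not recompiling the application.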
Getting started
CDK code repo: github.com/cloudera/cdk
CDK example repo: github.com/cloudera/cdk-examples
Binary artifacts available from Cloudera’s Maven repository
Mailing list: groups.google.com/a/cloudera.org/d/forum/cdk-dev
• Submit questions in the Q&A panel
• Watch this webinar on-demand at http://cloudera.com
• Follow Cloudera @Cloudera
• Follow Cloudera Engineering @ClouderaEng
• Thank you for attending!
Learn more about the CDK
http://cloudera.com/cdk
CDK on GitHub
http://cloudera.github.io/cdk/docs/0.2.0/