Talend Big Data Capabilities - 2014
- 2. 2
About the Presenter
Rajan Kanitkar
• Senior Solutions Engineer
• Rajan Kanitkar is a Pre-Sales Consultant with Talend. He
has been active in the broader Data Integration space for
the past 15 years and has experience with several
leading-edge software companies in these areas. His areas of
specialty at Talend include Data Integration (DI), Big
Data (BD), Data Quality (DQ), and Master Data
Management (MDM).
• Contact: rkanitkar@talend.com
© Talend 2014
- 3. 3
Talend Big Data Platform
Hadoop, MapReduce, NoSQL capabilities …
- 4. 4
The Big Data Ecosystem
• Hadoop: the core project
• HDFS: the Hadoop Distributed File System
• MapReduce: the software framework for distributed
processing of large data sets
• Hive: a data warehouse infrastructure that provides
data summarization and a querying language
• Pig: a high-level data-flow language and execution
framework for parallel computation
• HBase: the Hadoop database. Use it when
you need random, real-time read/write access to
your Big Data
• And many more: Sqoop, HCatalog,
ZooKeeper, Oozie, Cassandra, MongoDB, Flume,
Impala, Stinger, Neo4j, etc.
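To make the MapReduce bullet concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases, written in plain Python. It is an illustration of the programming model only: it does not use Hadoop, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in records:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle: group emitted values by key (done by the framework on a real cluster)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records):
    """Run the three phases in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    lines = ["Hadoop stores data in HDFS", "MapReduce processes data in Hadoop"]
    print(word_count(lines))
```

On a real cluster the map and reduce functions run in parallel on many nodes and the input lines come from HDFS; only the shape of the computation is the same.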
- 6. 6
Key differentiator of Our Next Gen Architecture…
• JAVA: ETL, day-to-day integration, run everywhere
• SQL: ELT, DW appliance (Teradata, Netezza…)
• MapReduce + PIG + HiveQL + Sqoop + …: Hadoop, highly scalable, Hadoop grid
• CAMEL: message transformation, high frequency
• No black-box engine: enables lightweight, distributed,
customizable and parallelizable run time
• Standards-based code generator
- 8. 8
Talend Big Data – “pure Hadoop”
Design visually in MapReduce and optimize before
deploying on Hadoop
to this…
- 9. 9
Native Map/Reduce Jobs
• Create classic ETL patterns using native Map/Reduce
- Only data management solution on the market to generate native
Map/Reduce code
• Reduce the need for big
data coding skills
• Zero pre-installation on
the Hadoop cluster
• Hadoop is the “engine”
for data processing
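As an illustration of what a "classic ETL pattern" looks like when expressed as map and reduce steps, the sketch below filters rows and aggregates an amount per key in plain Python. It is not the code Talend generates; the field names (`customer`, `status`, `amount`) and the `run_job` helper are hypothetical and exist only for this example.

```python
from itertools import groupby
from operator import itemgetter


def mapper(rows):
    """Map step: filter (keep shipped orders) and emit (customer, amount) pairs."""
    for row in rows:
        if row["status"] == "shipped":
            yield row["customer"], row["amount"]


def reducer(pairs):
    """Reduce step: sort by key, then sum the amounts for each customer."""
    ordered = sorted(pairs, key=itemgetter(0))
    for customer, group in groupby(ordered, key=itemgetter(0)):
        yield customer, sum(amount for _, amount in group)


def run_job(rows):
    """Chain the two steps locally; a cluster would distribute them across nodes."""
    return dict(reducer(mapper(rows)))


if __name__ == "__main__":
    orders = [
        {"customer": "acme", "status": "shipped", "amount": 10},
        {"customer": "globex", "status": "shipped", "amount": 5},
        {"customer": "acme", "status": "cancelled", "amount": 99},
        {"customer": "acme", "status": "shipped", "amount": 20},
    ]
    print(run_job(orders))
```

The filter lives in the mapper and the aggregation in the reducer, which is the usual way a filter-then-group-by ETL step is split across the two phases.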
- 10. 10
MapReduce 2.0, YARN, Storm, Spark
• YARN: Ensures predictable performance & QoS for all apps
• Enables apps to run “IN” Hadoop rather than “ON”
• In Labs: Streaming with Apache Storm
• In Labs: Mini-batch and in-memory processing with Apache Spark
Applications Run Natively IN Hadoop
• BATCH (MapReduce)
• INTERACTIVE (Tez)
• STREAMING (Storm, Spark)
• GRAPH (Giraph)
• NoSQL (MongoDB)
• EVENTS (Falcon)
• ONLINE (HBase)
• OTHER (Search)
• YARN (Cluster Resource Management)
• HDFS2 (Redundant, Reliable Storage)
Source: Hortonworks
- 11. 11
Talend: Ingest – Transform – Deliver
• INGEST (Ingestion): SQOOP, FLUME, HDFS API, HBase API
• TRANSFORM (Data Refinement): MAP, PROFILE, PARSE, CLEANSE, CDC,
STANDARDIZE, MACHINE LEARNING, MATCH
• DELIVER (as an API): Karaf, ActiveMQ
• Other diagram labels: iPaaS, MDM, HA, Govern, Security, Meta,
Storm, Kafka, CXF, Camel, 800+, HIVE
• Hadoop applications: BATCH (MapReduce), INTERACTIVE (Tez),
STREAMING (Storm, Spark), GRAPH (Giraph), NoSQL (MongoDB),
EVENTS (Falcon), ONLINE (HBase), OTHER (Search)
• YARN (Cluster Resource Management)
• HDFS2 (Redundant, Reliable Storage)
- 12. 12
Talend Big Data Sandbox &
Talend Big Data Jumpstart
Delivering instant value from all your data
- 13. 13
BIG DATA CHALLENGES
The Big Data Customer Discussion
- 14. 14
Top Big Data Challenges
Talend Directly Addresses these Challenges
Source: Gartner - Survey Analysis: Big Data Adoption in 2013 Shows Substance
Behind the Hype - 12 September 2013 - G00255160
- 16. 16
TALEND BIG DATA SANDBOX
30-day customer trial
- 17. 17
Cookbook Step-by-Step Directions
• Completely Self-contained Demo Sandbox
• Key Scenarios:
- Twitter Analysis
- Clickstream Analysis
- Web Log Analysis
- ETL Offload
• Scenario Summaries
- Social Media insights
- Channel optimization
- Customer insights
- Data Warehouse Cost Reduction
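For a sense of what the Web Log Analysis scenario involves at its simplest, the sketch below parses Apache access-log lines in Common Log Format and counts successful hits per path. It is a plain-Python illustration, not the Sandbox job itself; the regular expression, the function names, and the "2xx means success" rule are assumptions made for this example.

```python
import re
from collections import Counter

# Assumed layout: Apache Common Log Format, e.g.
# 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def hits_per_path(lines):
    """Count requests per path, keeping only 2xx responses and skipping bad lines."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record and record["status"].startswith("2"):
            counts[record["path"]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
        '127.0.0.1 - - [10/Oct/2000:13:55:40 -0700] "GET /missing HTTP/1.0" 404 209',
    ]
    print(hits_per_path(sample).most_common(5))
```

At cluster scale the same parse-then-count logic would typically run as a Pig, Hive, or MapReduce job over files in HDFS rather than over an in-memory list.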
- 18. 18
Ready for Launch
• Announcements
- Public announcement Tuesday 15th
- Newsletter was sent 9th July
• Customer Nurture campaign
- Scenario reminders, videos & links
- Reminder to Talend AE
• Two Routes for 5.5
- Sandbox Download publicly available – 15th July
- Jumpstart and AE ‘access’ – 15th July
• Links for the 15th (Sandbox download)
- Public: http://www.talend.com/talend-big-data-sandbox
- Account Exec: send download link for customer to fill in:
• https://info.talend.com/prodevaltpbdsandbox
- 19. 19
TALEND BIG DATA JUMPSTART
A ‘guided tour’ of the Sandbox
- 20. 20
Why the ‘Jumpstart’?
Practical
Guided Tour
• Led by a Talend Solutions Engineer
• Learn about the Talend Studio
• See how to execute Hadoop processes
- Map/Reduce with YARN
- Pig
- HDFS
• See NoSQL Examples
- Hive
- HBase
- MongoDB
- Cassandra
- 21. 21
Key benefits
• NO Configuration/Development
• INSTANT results now, for the Future
• Valuable prototypes for FREE
• Working on the top THREE Hadoop Distributions
- 22. 22
3 Simple Messages
• Sandbox is customer-led, Jumpstart is sales-led
• Jumpstart is the best way to ‘get Talend’
- Google: Talend Jumpstart
• Work to get the best conversation & involve pre-sales
- 23. 23
Sandbox
- Talend Jumpstart Sandbox - virtual image installed with:
• Apache Hadoop distribution provided by Hortonworks, Cloudera & MapR
• Pre-configured Talend Platform for Big Data 5.5*
• Four scenarios for you to try:
– Clickstream data
– Twitter sentiment
– Apache weblogs
– ETL Offload
• Demonstrations of several NoSQL databases
*Includes Talend Studio (graphical IDE), team working,
management, data quality and advanced big data features.
www.talend.com/products/platform-for-big-data