SlideShare une entreprise Scribd logo
1  sur  81
Big data beyond Hadoop –
How to integrate ALL your data
Kai Wähner
kwaehner@talend.com
@KaiWaehner
www.kai-waehner.de
9/24/2013
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Consulting
Developing
Coaching
Speaking
Writing
Main Tasks
Requirements Engineering
Enterprise Architecture Management
Business Process Management
Architecture and Development of Applications
Service-oriented Architecture
Integration of Legacy Applications
Cloud Computing
Big Data
Contact
Email: kontakt@kai-waehner.de
Blog: www.kai-waehner.de/blog
Twitter: @KaiWaehner
Social Networks: Xing, LinkedIn
Kai Wähner
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Key messages
You have to care about big data to be competitive in the future!
You have to integrate different sources to get most value out of it!
Big data integration is no (longer) rocket science!
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
• Big data paradigm shift
• Challenges of big data
• Big data from a technology perspective
• Integration with an open source framework
• Integration with an open source suite
• Custom big data components
Agenda
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
• Big data paradigm shift
• Challenges of big data
• Big data from a technology perspective
• Integration with an open source framework
• Integration with an open source suite
• Custom big data components
Agenda
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
William Edwards Deming
(1900 –1993)
American statistician, professor,
author, lecturer and consultant
“If you can't measure it,
you can't manage it.”
Why should you care about big data?
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
 „Silence the HiPPOs“ (highest-paid person‘s opinion)
 Being able to interpret unimaginable large data
stream, the gut feeling is no longer justified!
Why should you care about big data?
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
What is big data? The Vs of big data
Volume
(terabytes,
petabytes)
Variety
(social networks,
blog posts, logs,
sensors, etc.)
Velocity
(realtime or near-
realtime)
Value
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Big Data Integration
– Land data in a Big Data cluster
– Implement or generate parallel processes
Big Data Manipulation
– Simplify manipulation, such as sort and filter
– Computational expensive functions
Big Data Quality & Governance
– Identify linkages and duplicates, validate big data
– Match component, execute basic quality features
Big Data Project Management
– Place frameworks around big data projects
– Common Repository, scheduling, monitoring
Big data tasks to solve - before analysis
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
“The advantage of their new system is that they can now look at their data
[from their log processing system] in anyway they want:
➜ Nightly MapReduce jobs collect statistics about their mail system such as
spam counts by domain, bytes transferred and number of logins.
➜ When they wanted to find out which part of the world their customers
logged in from, a quick [ad hoc] MapReduce job was created and they had
the answer within a few hours. Not really possible in your typical ETL
system.”
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
Use case: Clickstream Analysis
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
http://hkotadia.com/archives/5021
Deduce
Customer
Defections
Use case: Risk management
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
➜ With revenue of almost USD 30 billion and a network of
800 locations, Macy's is considered the largest store operator in the
USA
➜ Daily price check analysis of its 10,000 articles in less than two hours
➜ Whenever a neighboring competitor anywhere between New York
and Los Angeles goes for aggressive price reductions, Macy's follows
its example
➜ If there is no market competitor, the prices remain unchanged
http://www.t-systems.com/about-t-systems/examples-of-successes-companies-analyze-big-data-in-record-time-l-t-systems/1029702
Use case: Flexible pricing
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
➜ A lot of data must be stored „forever“
➜ Numbers increase exponentially
➜ Goal: As cheap as possible
➜ Problem: (Fast) queries must still be possible
➜ Solution: Commodity servers and „Hadoop querying“
Global Parcel Service
http://archive.org/stream/BigDataImPraxiseinsatz-SzenarienBeispieleEffekte/Big_Data_BITKOM-Leitfaden_Sept.2012#page/n0/mode/2up
Storage: Compliance
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
• Big data paradigm shift
• Challenges of big data
• Big data from a technology perspective
• Integration with an open source framework
• Integration with an open source suite
• Custom big data components
Agenda
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
This is your
company
Big Data Geek
Limited big data experts
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Big Data + Poor Data Quality = Big Problems
Data quality
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
➜ Wanna buy a big data solution for your industry?
➜ Maybe a competitor has a big data solution which
adds business value?
➜ The competitor will never publish it (rat-race)!
Big data tool selection (business perspective)
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Looking for ‚your‘ required big data product?
Support your data from scratch?
Good luck! 
Big data tool selection (technical perspective)
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
How to solve these big data challenges?
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
 “*Often+ simple models and
big data trump more-elaborate
[and complex] analytics approaches”
 “Often someone coming from
outside an industry can spot
a better way to use big data
than an insider”
Erik Brynjolfsson / Lynn Wu
http://alfredopassos.tumblr.com/post/32461599327/big-data-the-management-revolution-by-andrew-mcafee
Be no expert! Be simple!
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
 Look at use cases of others
(SMU, but also large companies)
 How can you do something similar
with your data?
 You have different data sources?
Use it! Combine it! Play with it!
Be creative!
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
1) Do not begin with the data, think about business opportunities
2) Choose the right data (combine different data sources)
3) Use easy tooling
http://hbr.org/2012/10/making-advanced-analytics-work-for-you
What is your Big Data process?
Step 1 Step 2 Step 3
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
• Big data paradigm shift
• Challenges of big data
• Big data from a technology perspective
• Integration with an open source framework
• Integration with an open source suite
• Custom big data components
Agenda
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Technology perspective
How to process big data?
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
The critical flaw in parallel ETL tools is the fact that the data is almost never local to the processing
nodes. This means that every time a large job is run, the data has to first be read from the source,
split N ways and then delivered to the individual nodes. Worse, if the partition key of the source
doesn’t match the partition key of the target, data has to be constantly exchanged among the
nodes. In essence, parallel ETL treats the network as if it were a physical I/O subsystem. The
network, which is always the slowest part of the process, becomes the weakest link in the
performance chain.
http://blog.syncsort.com/2012/08/parallel-etl-tools-are-dead
How to process big data?
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Slides: http://www.slideshare.net/pavlobaron/100-big-data-0-hadoop-0-java
Video: http://www.infoq.com/presentations/Big-Data-Hadoop-Java
How to process big data?
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
The defacto standard for big data processing
How to process big data?
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Even Microsoft (the .NET house) relies on Hadoop since 2011
How to process big data?
“A big part of [the
company’s strategy+
includes wiring SQL Server
2012 (formerly known by
the codename “Denali”) to
the Hadoop distributed
computing platform, and
bringing Hadoop to
Windows Server and Azure”
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Apache Hadoop, an open-source software library, is a
framework that allows for the distributed processing of
large data sets across clusters of commodity hardware
using simple programming models. It is designed to scale
up from single servers to thousands of machines, each
offering local computation and storage.
What is Hadoop?
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Simple example
• Input: (very large) text files with lists of strings, such as:
„318, 0043012650999991949032412004...0500001N9+01111+99999999999...“
• We are interested just in some content: year and temperate (marked in red)
• The Map Reduce function has to compute the maximum temperature for every year
Example from the book “Hadoop: The Definitive Guide, 3rd Edition”
Map (Shuffle) Reduce
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
How to process big data?
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
• Big data paradigm shift
• Challenges of big data
• Big data from a technology perspective
• Integration with an open source framework
• Integration with an open source suite
• Custom big data components
Agenda
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Connectivity
Routing
Transformation
Complexity
of Integration
Enterprise
Service Bus
Integration Suite
Low High
Integration
Framework
INTEGRATION
Tooling
Monitoring
Support
+
BUSINESS PROCESS MGT.
BIG DATA / MDM
REGISTRY / REPOSITORY
RULES ENGINE
„YOU NAME IT“
+
Alternatives for systems integration
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Complexity
of Integration
Enterprise
Service Bus
Integration Suite
Low High
Integration
Framework
Alternatives for systems integration
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
More details about integration frameworks...
http://www.kai-waehner.de/blog/2012/12/20/showdown-integration-framework-
spring-integration-apache-camel-vs-enterprise-service-bus-esb/
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
More details about integration frameworks...
... or you come to my JavaOne session tomorrow
and see an updated version of the slides!
Wednesday, 11:30 – 12:30 PM
CON1934: Which Integration Framework to Choose?
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Enterprise Integration Patterns (EIP)
Apache Camel
Implements the EIPs
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Enterprise Integration Patterns (EIP)
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Enterprise Integration Patterns (EIP)
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Architecture
http://java.dzone.com/articles/apache-camel-integration
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
HTTP
FTP
File
XSLT
MQ
JDBC
Akka
TCP
SMTP
RSS
Quartz
Log
LDAP
JMS
EJB
AMQP
Atom
AWS-S3
Bean-Validation
CXF
IRC
Jetty
JMX
Lucene
Netty
RMI
SQL
Many many more Custom Components
Choose your required components
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Choose your favorite DSL
XML
(not production-ready yet)
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Deploy it wherever you need
Standalone
OSGi
Application Server
Web Container
Spring Container
Cloud
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Enterprise-ready
• Open Source
• Scalability
• Error Handling
• Transaction
• Monitoring
• Tooling
• Commercial Support
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Example: Camel integration route
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Hadoop Integration with Apache Camel
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
camel-hdfs component
// Producer
from("ftp://user@myServer?password=secret")
.to(“hdfs:///myDirectory/myFile.txt?append=true");
// Consumer
from(“hdfs:///myDirectory/myBigDataAnalysis.csv")
.to(“file:target/reports/report.csv");
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
camel-hbase component
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
➜ A lot of data must be stored „forever“
➜ Numbers increase exponentially
➜ Goal: As cheap as possible
➜ Problem: (Fast) queries must still be possible
➜ Solution: Commodity servers and „Hadoop querying“
Global Parcel Service
http://archive.org/stream/BigDataImPraxiseinsatz-SzenarienBeispieleEffekte/Big_Data_BITKOM-Leitfaden_Sept.2012#page/n0/mode/2up
Real World Use Case: Storage: Compliance
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Real world use case: Storage: Compliance
Orders
(Server 1)
Log Files
(Server 3)
Log Files
(Server 100)
ETL
QueryStorage
Payments
(Server 2)
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Live demo
Apache Camel in action...
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
camel-pig? camel-hive? camel-hcatalog?
Not available yet (current Camel version: 2.12)  Workarounds:
• Use Pig / Hive-Query scripts (via camel-exec component or any
scripting language)
• Build your own component (more details later ...)
• Use Hive-Hbase-Integration and store data in HBase ( „ugly“)
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
camel-avro component
... Avro, a data serialization system used in Apache Hadoop. camel-avro component
provides a dataformat for Avro, which allows serialization and deserialization of
messages using Apache Avro's binary data format. Moreover, it provides support for
Apache Avro's RPC, by providing producers and consumers endpoint for using Avro
over Netty or HTTP.
Camel is not just about connectors ...
Camel supports a pluggable DataFormat to allow messages to be marshalled to and
unmarshalled from binary or text formats, e.g. CSV, JSON, SOAP, EDI, ZIP, or ...
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
• Big data paradigm shift
• Challenges of big data
• Big data from a technology perspective
• Integration with an open source framework
• Integration with an open source suite
• Custom big data components
Agenda
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Connectivity
Routing
Transformation
Complexity
of Integration
Enterprise
Service Bus
Integration Suite
Low High
Integration
Framework
INTEGRATION
Tooling
Monitoring
Support
+
BUSINESS PROCESS MGT.
BIG DATA / MDM
REGISTRY / REPOSITORY
RULES ENGINE
„YOU NAME IT“
+
Alternatives for systems integration
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Complexity
of Integration
Enterprise
Service Bus
Integration Suite
Low High
Integration
Framework
Alternatives for systems integration
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
More details about ESBs and suites...
http://www.kai-waehner.de/blog/2013/01/23/spoilt-for-choice-
how-to-choose-the-right-enterprise-service-bus-esb/
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Hadoop Integration with Talend Open Studio
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
…an open source
ecosystem
Talend Open Studio for Big Data
• Improves efficiency of big data job design with
graphic interface
• Generates Hadoop code and run transforms
inside Hadoop
• Native support for HDFS, Pig, Hbase, Hcatalog,
Sqoop and Hive
• 100% open source under an Apache License
• Standards based
Pig
Vision: Democratize big data
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
…an open source
ecosystem
Talend Platform for Big Data
• Builds on Talend Open Studio for Big Data
• Adds data quality, advanced scalability and
management functions
• MapReduce massively parallel data
processing
• Shared Repository and remote deployment
• Data quality and profiling
• Data cleansing
• Reporting and dashboards
• Commercial support, warranty/IP indemnity
under a subscription license
Pig
Vision: Democratize big data
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Talend Open Studio for Big Data
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
“The advantage of their new system is that they can now look at their data
[from their log processing system] in anyway they want:
➜ Nightly MapReduce jobs collect statistics about their mail system such as
spam counts by domain, bytes transferred and number of logins.
➜ When they wanted to find out which part of the world their customers
logged in from, a quick [ad hoc] MapReduce job was created and they had
the answer within a few hours. Not really possible in your typical ETL
system.”
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
Real world Use case: Clickstream Analysis
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Real world use case: Clickstream Analysis
Log Files
(Server 1)
Log Files
(Server 2)
Log Files
(Server 100)
ETL
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
One of the original uses of Hadoop at Yahoo was to store and process their massive volume of
clickstream data. Now enterprises of all types can use Hadoop to refine and analyze
clickstream data. They can then answer business questions such as:
• What is the most efficient path for a site visitor to research a product, and then buy it?
• What products do visitors tend to buy together, and what are they most likely to buy in
the future?
• Where should I spend resources on fixing or enhancing the user experience on my
website?
Goal: Data visualization can help you optimize your website and
convert more visits into sales and revenue.
Potential Uses of Clickstream Data
Source: for Clickstream Example: „Hortonworks Hadoop Tutorials - Real Life Use Cases”
http://hortonworks.com/blog/hadoop-tutorials-real-life-use-cases-in-the-sandbox
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Example: A semi-structured log file
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Example: ETL Job
„... using Talend’s HDFS and Hive Components”
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Example: ETL Job
„... using Talend’s Map Reduce Components*”
* Not available in open source version of Talend Studio
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
„Talend Open Studio for Big Data“ in action...
Live demo
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Example: Analysis with Microsoft Excel
We can see that the largest number of page hits in Florida were for
clothing, followed by shoes.
Source: for Clickstream Example: „Hortonworks Hadoop Tutorials - Real Life Use Cases”
http://hortonworks.com/blog/hadoop-tutorials-real-life-use-cases-in-the-sandbox
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Example: Analysis with Microsoft Excel
The chart shows that the majority of men shopping for clothing on our
website are between the ages of 22 and 30. With this information, we can
optimize our content for this market segment.
Source: for Clickstream Example: „Hortonworks Hadoop Tutorials - Real Life Use Cases”
http://hortonworks.com/blog/hadoop-tutorials-real-life-use-cases-in-the-sandbox
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Example: Analysis with Tableau
Spoilt for Choice  Use your preferred BI or Analysis tool!
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
• Big data paradigm shift
• Challenges of big data
• Big data from a technology perspective
• Integration with an open source framework
• Integration with an open source suite
• Custom big data components
Agenda
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Custom components
Easy to realize for all
integration alternatives *
• Integration Framework
• Enterprise Service Bus
• Integration Suite
* At least for open source solutions
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Custom components
You might need a ...
• ... Hive component for Camel
• ... Impala component for Talend
• ... custom component for your
internal data format
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Live demo (Example: Apache Camel)
Custom components in action...
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Alternative for custom components
• SOAP
• REST
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Code example: REST API for Salesforce object store
// Salesforce Query (SOQL) via REST API
from("direct:salesforceViaHttpLIST")
.setHeader("X-PrettyPrint", 1)
.setHeader("Authorization", accessToken)
.setHeader(Exchange.CONTENT_TYPE, "application/json")
.to("https://na14.salesforce.com/services/data/v20.0/query?q=SELECT+name+from
+Article__c")
// Salesforce CREATE via REST API
from("direct:salesforceViaHttpCREATE")
.setHeader("X-PrettyPrint", 1)
.setHeader("Authorization", accessToken)
.setHeader(Exchange.CONTENT_TYPE, "application/json“)
.to("https://na14.salesforce.com/services/data/v20.0/sobjects/Article__c")
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Did you get the key message?
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Key messages
You have to care about big data to be competitive in the future!
You have to integrate different sources to get most value out of it!
Big data integration is no (longer) rocket science!
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Did you get the key message?
Thank you for your attention. Questions?
kwaehner@talend.com
www.kai-waehner.de
LinkedIn / Xing
@KaiWaehner

Contenu connexe

Tendances

A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...
A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...
A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...
DataWorks Summit
 
Why Migrate from MySQL to Cassandra
Why Migrate from MySQL to CassandraWhy Migrate from MySQL to Cassandra
Why Migrate from MySQL to Cassandra
DATAVERSITY
 
Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...
Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...
Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...
Kai Wähner
 

Tendances (20)

VisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case studyVisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case study
 
A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...
A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...
A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...
 
Why Migrate from MySQL to Cassandra
Why Migrate from MySQL to CassandraWhy Migrate from MySQL to Cassandra
Why Migrate from MySQL to Cassandra
 
Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...
Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...
Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...
 
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming AnalyticsR, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
 
Driving Business Insights with a Modern Data Architecture AWS Summit SG 2017
Driving Business Insights with a Modern Data Architecture  AWS Summit SG 2017Driving Business Insights with a Modern Data Architecture  AWS Summit SG 2017
Driving Business Insights with a Modern Data Architecture AWS Summit SG 2017
 
An Introduction to Talend Integration Cloud
An Introduction to Talend Integration CloudAn Introduction to Talend Integration Cloud
An Introduction to Talend Integration Cloud
 
The API Lie
The API LieThe API Lie
The API Lie
 
Hadoop for Humans: Introducing SnapReduce 2.0
Hadoop for Humans: Introducing SnapReduce 2.0Hadoop for Humans: Introducing SnapReduce 2.0
Hadoop for Humans: Introducing SnapReduce 2.0
 
Beyond Batch: Is ETL still relevant in the API economy?
Beyond Batch: Is ETL still relevant in the API economy?Beyond Batch: Is ETL still relevant in the API economy?
Beyond Batch: Is ETL still relevant in the API economy?
 
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
 
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS
 
The Impact of SMACT on the Data Management Stack
The Impact of SMACT on the Data Management StackThe Impact of SMACT on the Data Management Stack
The Impact of SMACT on the Data Management Stack
 
Trivadis TechEvent 2016 DWH Modernization – in the Age of Big Data by Gregor ...
Trivadis TechEvent 2016 DWH Modernization – in the Age of Big Data by Gregor ...Trivadis TechEvent 2016 DWH Modernization – in the Age of Big Data by Gregor ...
Trivadis TechEvent 2016 DWH Modernization – in the Age of Big Data by Gregor ...
 
Cloudera, Azure and Big Data at Cloudera Meetup '17
Cloudera, Azure and Big Data at Cloudera Meetup '17Cloudera, Azure and Big Data at Cloudera Meetup '17
Cloudera, Azure and Big Data at Cloudera Meetup '17
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
 

En vedette

081024 Com Part Ges Fin Cochabamba
081024 Com Part Ges Fin Cochabamba081024 Com Part Ges Fin Cochabamba
081024 Com Part Ges Fin Cochabamba
ICCO Cooperation
 
XTIANA OMOKHOSE ISEDU
XTIANA OMOKHOSE ISEDUXTIANA OMOKHOSE ISEDU
XTIANA OMOKHOSE ISEDU
isedu xtiana
 
Aprendiendo A Ver La Escultura
Aprendiendo A Ver La EsculturaAprendiendo A Ver La Escultura
Aprendiendo A Ver La Escultura
carolinaperez_76
 
Skills Portfolio 2010
Skills Portfolio 2010Skills Portfolio 2010
Skills Portfolio 2010
JacquiBIUK
 

En vedette (20)

Tobias Ahl - Rala - Sweden Rural & Municipality Broadband
Tobias Ahl - Rala - Sweden Rural & Municipality BroadbandTobias Ahl - Rala - Sweden Rural & Municipality Broadband
Tobias Ahl - Rala - Sweden Rural & Municipality Broadband
 
Clase 1 desordenes emocionales afectivos y conductuales
Clase 1 desordenes emocionales afectivos y conductualesClase 1 desordenes emocionales afectivos y conductuales
Clase 1 desordenes emocionales afectivos y conductuales
 
081024 Com Part Ges Fin Cochabamba
081024 Com Part Ges Fin Cochabamba081024 Com Part Ges Fin Cochabamba
081024 Com Part Ges Fin Cochabamba
 
XTIANA OMOKHOSE ISEDU
XTIANA OMOKHOSE ISEDUXTIANA OMOKHOSE ISEDU
XTIANA OMOKHOSE ISEDU
 
Logmatic at ElasticSearch November Paris meetup
Logmatic at ElasticSearch November Paris meetupLogmatic at ElasticSearch November Paris meetup
Logmatic at ElasticSearch November Paris meetup
 
Presentación puravera foro fehispor
Presentación puravera  foro fehisporPresentación puravera  foro fehispor
Presentación puravera foro fehispor
 
Pilares ATLANTIS™ de titanio dorado. Para una mejor salud periimplantaria.
Pilares ATLANTIS™ de titanio dorado. Para una mejor salud periimplantaria.Pilares ATLANTIS™ de titanio dorado. Para una mejor salud periimplantaria.
Pilares ATLANTIS™ de titanio dorado. Para una mejor salud periimplantaria.
 
Global Connections in Long-term care - IAHSA 2012
Global Connections in Long-term care - IAHSA 2012Global Connections in Long-term care - IAHSA 2012
Global Connections in Long-term care - IAHSA 2012
 
Mito de prometeo
Mito de prometeoMito de prometeo
Mito de prometeo
 
Eventi GEOWEB
Eventi GEOWEBEventi GEOWEB
Eventi GEOWEB
 
Serious Games und Social Media: Ein Zukunftsmarkt
Serious Games und Social Media: Ein ZukunftsmarktSerious Games und Social Media: Ein Zukunftsmarkt
Serious Games und Social Media: Ein Zukunftsmarkt
 
Madres y blogs
Madres y blogsMadres y blogs
Madres y blogs
 
Aacte Junio 2008
Aacte Junio 2008Aacte Junio 2008
Aacte Junio 2008
 
CFW Domestic Sprinkler Regulations - Bafsa Fire Sprinkler Wales
CFW Domestic Sprinkler Regulations - Bafsa Fire Sprinkler WalesCFW Domestic Sprinkler Regulations - Bafsa Fire Sprinkler Wales
CFW Domestic Sprinkler Regulations - Bafsa Fire Sprinkler Wales
 
Aprendiendo A Ver La Escultura
Aprendiendo A Ver La EsculturaAprendiendo A Ver La Escultura
Aprendiendo A Ver La Escultura
 
Diagrama de Classe: Relacionamento de Composição
Diagrama de Classe: Relacionamento de ComposiçãoDiagrama de Classe: Relacionamento de Composição
Diagrama de Classe: Relacionamento de Composição
 
Syllabus propedéutica y terapéutica ocular ciclo 2 2015
Syllabus propedéutica y terapéutica ocular ciclo 2 2015Syllabus propedéutica y terapéutica ocular ciclo 2 2015
Syllabus propedéutica y terapéutica ocular ciclo 2 2015
 
Hoja de afiliación
Hoja de afiliaciónHoja de afiliación
Hoja de afiliación
 
Skills Portfolio 2010
Skills Portfolio 2010Skills Portfolio 2010
Skills Portfolio 2010
 
EXPEDIA.ES
EXPEDIA.ESEXPEDIA.ES
EXPEDIA.ES
 

Similaire à "Big Data beyond Apache Hadoop - How to Integrate ALL your Data" - JavaOne 2013

JAZOON'13 - Kai Waehner - Hadoop Integration
JAZOON'13 - Kai Waehner - Hadoop IntegrationJAZOON'13 - Kai Waehner - Hadoop Integration
JAZOON'13 - Kai Waehner - Hadoop Integration
jazoon13
 
Next-Generation BPM - How to create intelligent Business Processes thanks to ...
Next-Generation BPM - How to create intelligent Business Processes thanks to ...Next-Generation BPM - How to create intelligent Business Processes thanks to ...
Next-Generation BPM - How to create intelligent Business Processes thanks to ...
Kai Wähner
 
SIMPosium presentation_Bardess Qlik
SIMPosium presentation_Bardess QlikSIMPosium presentation_Bardess Qlik
SIMPosium presentation_Bardess Qlik
Bardess Group
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud Platform
ConnectaDigital
 

Similaire à "Big Data beyond Apache Hadoop - How to Integrate ALL your Data" - JavaOne 2013 (20)

JAZOON'13 - Kai Waehner - Hadoop Integration
JAZOON'13 - Kai Waehner - Hadoop IntegrationJAZOON'13 - Kai Waehner - Hadoop Integration
JAZOON'13 - Kai Waehner - Hadoop Integration
 
Big Data and Enterprise Data - Oracle -1663869
Big Data and Enterprise Data - Oracle -1663869Big Data and Enterprise Data - Oracle -1663869
Big Data and Enterprise Data - Oracle -1663869
 
Next-Generation BPM - How to create intelligent Business Processes thanks to ...
Next-Generation BPM - How to create intelligent Business Processes thanks to ...Next-Generation BPM - How to create intelligent Business Processes thanks to ...
Next-Generation BPM - How to create intelligent Business Processes thanks to ...
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
 
Revolution in Business Analytics-Zika Virus Example
Revolution in Business Analytics-Zika Virus ExampleRevolution in Business Analytics-Zika Virus Example
Revolution in Business Analytics-Zika Virus Example
 
SIMPosium presentation_Bardess Qlik
SIMPosium presentation_Bardess QlikSIMPosium presentation_Bardess Qlik
SIMPosium presentation_Bardess Qlik
 
Getting Started with Big Data for Business Managers
Getting Started with Big Data for Business ManagersGetting Started with Big Data for Business Managers
Getting Started with Big Data for Business Managers
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in details
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud Platform
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
 
Manipulating Data with Talend.
Manipulating Data with Talend.Manipulating Data with Talend.
Manipulating Data with Talend.
 
Manipulating data with Talend. Learn how?
Manipulating data with Talend. Learn how?Manipulating data with Talend. Learn how?
Manipulating data with Talend. Learn how?
 
BIG Data & Hadoop Applications in Social Media
BIG Data & Hadoop Applications in Social MediaBIG Data & Hadoop Applications in Social Media
BIG Data & Hadoop Applications in Social Media
 
Architecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment OptionsArchitecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment Options
 
John Glendenning - Real time data driven services in the Cloud
John Glendenning - Real time data driven services in the CloudJohn Glendenning - Real time data driven services in the Cloud
John Glendenning - Real time data driven services in the Cloud
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
 
Big Data
Big DataBig Data
Big Data
 
How to implement Hadoop successfully
How to implement Hadoop successfullyHow to implement Hadoop successfully
How to implement Hadoop successfully
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, Sydney
 

Plus de Kai Wähner

Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
Kai Wähner
 
The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022
Kai Wähner
 

Plus de Kai Wähner (20)

Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
 
When NOT to use Apache Kafka?
When NOT to use Apache Kafka?When NOT to use Apache Kafka?
When NOT to use Apache Kafka?
 
Kafka for Live Commerce to Transform the Retail and Shopping Metaverse
Kafka for Live Commerce to Transform the Retail and Shopping MetaverseKafka for Live Commerce to Transform the Retail and Shopping Metaverse
Kafka for Live Commerce to Transform the Retail and Shopping Metaverse
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
 
Apache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform MiddlewareApache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
 
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
 
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureServerless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
 
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
 
Data Streaming with Apache Kafka in the Defence and Cybersecurity Industry
Data Streaming with Apache Kafka in the Defence and Cybersecurity IndustryData Streaming with Apache Kafka in the Defence and Cybersecurity Industry
Data Streaming with Apache Kafka in the Defence and Cybersecurity Industry
 
Apache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryApache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare Industry
 
Apache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryApache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare Industry
 
Apache Kafka for Real-time Supply Chain in the Food and Retail Industry
Apache Kafka for Real-time Supply Chainin the Food and Retail IndustryApache Kafka for Real-time Supply Chainin the Food and Retail Industry
Apache Kafka for Real-time Supply Chain in the Food and Retail Industry
 
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid Cloud
 
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
 
Apache Kafka Landscape for Automotive and Manufacturing
Apache Kafka Landscape for Automotive and ManufacturingApache Kafka Landscape for Automotive and Manufacturing
Apache Kafka Landscape for Automotive and Manufacturing
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
 
The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022
 
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka ArchitecturesEvent Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
 
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
 
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
 

Dernier

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Dernier (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 

"Big Data beyond Apache Hadoop - How to Integrate ALL your Data" - JavaOne 2013

  • 1. Big data beyond Hadoop – How to integrate ALL your data Kai Wähner kwaehner@talend.com @KaiWaehner www.kai-waehner.de 9/24/2013
  • 2. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Consulting Developing Coaching Speaking Writing Main Tasks Requirements Engineering Enterprise Architecture Management Business Process Management Architecture and Development of Applications Service-oriented Architecture Integration of Legacy Applications Cloud Computing Big Data Contact Email: kontakt@kai-waehner.de Blog: www.kai-waehner.de/blog Twitter: @KaiWaehner Social Networks: Xing, LinkedIn Kai Wähner
  • 3. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Key messages You have to care about big data to be competitive in the future! You have to integrate different sources to get most value out of it! Big data integration is no (longer) rocket science!
  • 4. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner • Big data paradigm shift • Challenges of big data • Big data from a technology perspective • Integration with an open source framework • Integration with an open source suite • Custom big data components Agenda
  • 5. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner • Big data paradigm shift • Challenges of big data • Big data from a technology perspective • Integration with an open source framework • Integration with an open source suite • Custom big data components Agenda
  • 6. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner William Edwards Deming (1900 –1993) American statistician, professor, author, lecturer and consultant “If you can't measure it, you can't manage it.” Why should you care about big data?
  • 7. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner  „Silence the HiPPOs“ (highest-paid person‘s opinion)  Being able to interpret unimaginable large data stream, the gut feeling is no longer justified! Why should you care about big data?
  • 8. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner What is big data? The Vs of big data Volume (terabytes, petabytes) Variety (social networks, blog posts, logs, sensors, etc.) Velocity (realtime or near- realtime) Value
  • 9. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Big Data Integration – Land data in a Big Data cluster – Implement or generate parallel processes Big Data Manipulation – Simplify manipulation, such as sort and filter – Computational expensive functions Big Data Quality & Governance – Identify linkages and duplicates, validate big data – Match component, execute basic quality features Big Data Project Management – Place frameworks around big data projects – Common Repository, scheduling, monitoring Big data tasks to solve - before analysis
  • 10. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner “The advantage of their new system is that they can now look at their data [from their log processing system] in anyway they want: ➜ Nightly MapReduce jobs collect statistics about their mail system such as spam counts by domain, bytes transferred and number of logins. ➜ When they wanted to find out which part of the world their customers logged in from, a quick [ad hoc] MapReduce job was created and they had the answer within a few hours. Not really possible in your typical ETL system.” http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data Use case: Clickstream Analysis
  • 11. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner http://hkotadia.com/archives/5021 Deduce Customer Defections Use case: Risk management
  • 12. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner ➜ With revenue of almost USD 30 billion and a network of 800 locations, Macy's is considered the largest store operator in the USA ➜ Daily price check analysis of its 10,000 articles in less than two hours ➜ Whenever a neighboring competitor anywhere between New York and Los Angeles goes for aggressive price reductions, Macy's follows its example ➜ If there is no market competitor, the prices remain unchanged http://www.t-systems.com/about-t-systems/examples-of-successes-companies-analyze-big-data-in-record-time-l-t-systems/1029702 Use case: Flexible pricing
  • 13. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner ➜ A lot of data must be stored „forever“ ➜ Numbers increase exponentially ➜ Goal: As cheap as possible ➜ Problem: (Fast) queries must still be possible ➜ Solution: Commodity servers and „Hadoop querying“ Global Parcel Service http://archive.org/stream/BigDataImPraxiseinsatz-SzenarienBeispieleEffekte/Big_Data_BITKOM-Leitfaden_Sept.2012#page/n0/mode/2up Storage: Compliance
  • 14. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner • Big data paradigm shift • Challenges of big data • Big data from a technology perspective • Integration with an open source framework • Integration with an open source suite • Custom big data components Agenda
  • 15. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner This is your company Big Data Geek Limited big data experts
  • 16. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Big Data + Poor Data Quality = Big Problems Data quality
  • 17. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner ➜ Wanna buy a big data solution for your industry? ➜ Maybe a competitor has a big data solution which adds business value? ➜ The competitor will never publish it (rat-race)! Big data tool selection (business perspective)
  • 18. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Looking for ‚your‘ required big data product? Support your data from scratch? Good luck!  Big data tool selection (technical perspective)
  • 19. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner How to solve these big data challenges?
  • 20. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner  “*Often+ simple models and big data trump more-elaborate [and complex] analytics approaches”  “Often someone coming from outside an industry can spot a better way to use big data than an insider” Erik Brynjolfsson / Lynn Wu http://alfredopassos.tumblr.com/post/32461599327/big-data-the-management-revolution-by-andrew-mcafee Be no expert! Be simple!
  • 21. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner  Look at use cases of others (SMU, but also large companies)  How can you do something similar with your data?  You have different data sources? Use it! Combine it! Play with it! Be creative!
  • 22. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner 1) Do not begin with the data, think about business opportunities 2) Choose the right data (combine different data sources) 3) Use easy tooling http://hbr.org/2012/10/making-advanced-analytics-work-for-you What is your Big Data process? Step 1 Step 2 Step 3
  • 23. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner • Big data paradigm shift • Challenges of big data • Big data from a technology perspective • Integration with an open source framework • Integration with an open source suite • Custom big data components Agenda
  • 24. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Technology perspective How to process big data?
  • 25. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner The critical flaw in parallel ETL tools is the fact that the data is almost never local to the processing nodes. This means that every time a large job is run, the data has to first be read from the source, split N ways and then delivered to the individual nodes. Worse, if the partition key of the source doesn’t match the partition key of the target, data has to be constantly exchanged among the nodes. In essence, parallel ETL treats the network as if it were a physical I/O subsystem. The network, which is always the slowest part of the process, becomes the weakest link in the performance chain. http://blog.syncsort.com/2012/08/parallel-etl-tools-are-dead How to process big data?
  • 26. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Slides: http://www.slideshare.net/pavlobaron/100-big-data-0-hadoop-0-java Video: http://www.infoq.com/presentations/Big-Data-Hadoop-Java How to process big data?
  • 27. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner The defacto standard for big data processing How to process big data?
  • 28. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Even Microsoft (the .NET house) relies on Hadoop since 2011 How to process big data? “A big part of [the company’s strategy+ includes wiring SQL Server 2012 (formerly known by the codename “Denali”) to the Hadoop distributed computing platform, and bringing Hadoop to Windows Server and Azure”
  • 29. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Apache Hadoop, an open-source software library, is a framework that allows for the distributed processing of large data sets across clusters of commodity hardware using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. What is Hadoop?
  • 30. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Simple example • Input: (very large) text files with lists of strings, such as: „318, 0043012650999991949032412004...0500001N9+01111+99999999999...“ • We are interested just in some content: year and temperate (marked in red) • The Map Reduce function has to compute the maximum temperature for every year Example from the book “Hadoop: The Definitive Guide, 3rd Edition” Map (Shuffle) Reduce
  • 31. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner How to process big data?
  • 32. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner • Big data paradigm shift • Challenges of big data • Big data from a technology perspective • Integration with an open source framework • Integration with an open source suite • Custom big data components Agenda
  • 33. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Connectivity Routing Transformation Complexity of Integration Enterprise Service Bus Integration Suite Low High Integration Framework INTEGRATION Tooling Monitoring Support + BUSINESS PROCESS MGT. BIG DATA / MDM REGISTRY / REPOSITORY RULES ENGINE „YOU NAME IT“ + Alternatives for systems integration
  • 34. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Complexity of Integration Enterprise Service Bus Integration Suite Low High Integration Framework Alternatives for systems integration
  • 35. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner More details about integration frameworks... http://www.kai-waehner.de/blog/2012/12/20/showdown-integration-framework- spring-integration-apache-camel-vs-enterprise-service-bus-esb/
  • 36. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner More details about integration frameworks... ... or you come to my JavaOne session tomorrow and see an updated version of the slides! Wednesday, 11:30 – 12:30 PM CON1934: Which Integration Framework to Choose?
  • 37. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Enterprise Integration Patterns (EIP) Apache Camel Implements the EIPs
  • 38. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Enterprise Integration Patterns (EIP)
  • 39. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Enterprise Integration Patterns (EIP)
  • 40. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Architecture http://java.dzone.com/articles/apache-camel-integration
  • 41. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner HTTP FTP File XSLT MQ JDBC Akka TCP SMTP RSS Quartz Log LDAP JMS EJB AMQP Atom AWS-S3 Bean-Validation CXF IRC Jetty JMX Lucene Netty RMI SQL Many many more Custom Components Choose your required components
  • 42. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Choose your favorite DSL XML (not production-ready yet)
  • 43. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Deploy it wherever you need Standalone OSGi Application Server Web Container Spring Container Cloud
  • 44. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Enterprise-ready • Open Source • Scalability • Error Handling • Transaction • Monitoring • Tooling • Commercial Support
  • 45. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Example: Camel integration route
  • 46. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Hadoop Integration with Apache Camel
  • 47. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner camel-hdfs component // Producer from("ftp://user@myServer?password=secret") .to(“hdfs:///myDirectory/myFile.txt?append=true"); // Consumer from(“hdfs:///myDirectory/myBigDataAnalysis.csv") .to(“file:target/reports/report.csv");
  • 48. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner camel-hbase component
  • 49. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner ➜ A lot of data must be stored „forever“ ➜ Numbers increase exponentially ➜ Goal: As cheap as possible ➜ Problem: (Fast) queries must still be possible ➜ Solution: Commodity servers and „Hadoop querying“ Global Parcel Service http://archive.org/stream/BigDataImPraxiseinsatz-SzenarienBeispieleEffekte/Big_Data_BITKOM-Leitfaden_Sept.2012#page/n0/mode/2up Real World Use Case: Storage: Compliance
  • 50. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Real world use case: Storage: Compliance Orders (Server 1) Log Files (Server 3) Log Files (Server 100) ETL QueryStorage Payments (Server 2)
  • 51. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Live demo Apache Camel in action...
  • 52. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner camel-pig? camel-hive? camel-hcatalog? Not available yet (current Camel version: 2.12)  Workarounds: • Use Pig / Hive-Query scripts (via camel-exec component or any scripting language) • Build your own component (more details later ...) • Use Hive-Hbase-Integration and store data in HBase ( „ugly“)
  • 53. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner camel-avro component ... Avro, a data serialization system used in Apache Hadoop. camel-avro component provides a dataformat for Avro, which allows serialization and deserialization of messages using Apache Avro's binary data format. Moreover, it provides support for Apache Avro's RPC, by providing producers and consumers endpoint for using Avro over Netty or HTTP. Camel is not just about connectors ... Camel supports a pluggable DataFormat to allow messages to be marshalled to and unmarshalled from binary or text formats, e.g. CSV, JSON, SOAP, EDI, ZIP, or ...
  • 54. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner • Big data paradigm shift • Challenges of big data • Big data from a technology perspective • Integration with an open source framework • Integration with an open source suite • Custom big data components Agenda
  • 55. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Connectivity Routing Transformation Complexity of Integration Enterprise Service Bus Integration Suite Low High Integration Framework INTEGRATION Tooling Monitoring Support + BUSINESS PROCESS MGT. BIG DATA / MDM REGISTRY / REPOSITORY RULES ENGINE „YOU NAME IT“ + Alternatives for systems integration
  • 56. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Complexity of Integration Enterprise Service Bus Integration Suite Low High Integration Framework Alternatives for systems integration
  • 57. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner More details about ESBs and suites... http://www.kai-waehner.de/blog/2013/01/23/spoilt-for-choice- how-to-choose-the-right-enterprise-service-bus-esb/
  • 58. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Hadoop Integration with Talend Open Studio
  • 59. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner …an open source ecosystem Talend Open Studio for Big Data • Improves efficiency of big data job design with graphic interface • Generates Hadoop code and run transforms inside Hadoop • Native support for HDFS, Pig, Hbase, Hcatalog, Sqoop and Hive • 100% open source under an Apache License • Standards based Pig Vision: Democratize big data
  • 60. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner …an open source ecosystem Talend Platform for Big Data • Builds on Talend Open Studio for Big Data • Adds data quality, advanced scalability and management functions • MapReduce massively parallel data processing • Shared Repository and remote deployment • Data quality and profiling • Data cleansing • Reporting and dashboards • Commercial support, warranty/IP indemnity under a subscription license Pig Vision: Democratize big data
  • 61. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Talend Open Studio for Big Data
  • 62. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner “The advantage of their new system is that they can now look at their data [from their log processing system] in anyway they want: ➜ Nightly MapReduce jobs collect statistics about their mail system such as spam counts by domain, bytes transferred and number of logins. ➜ When they wanted to find out which part of the world their customers logged in from, a quick [ad hoc] MapReduce job was created and they had the answer within a few hours. Not really possible in your typical ETL system.” http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data Real world Use case: Clickstream Analysis
  • 63. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Real world use case: Clickstream Analysis Log Files (Server 1) Log Files (Server 2) Log Files (Server 100) ETL
  • 64. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner One of the original uses of Hadoop at Yahoo was to store and process their massive volume of clickstream data. Now enterprises of all types can use Hadoop to refine and analyze clickstream data. They can then answer business questions such as: • What is the most efficient path for a site visitor to research a product, and then buy it? • What products do visitors tend to buy together, and what are they most likely to buy in the future? • Where should I spend resources on fixing or enhancing the user experience on my website? Goal: Data visualization can help you optimize your website and convert more visits into sales and revenue. Potential Uses of Clickstream Data Source: for Clickstream Example: „Hortonworks Hadoop Tutorials - Real Life Use Cases” http://hortonworks.com/blog/hadoop-tutorials-real-life-use-cases-in-the-sandbox
  • 65. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Example: A semi-structured log file
  • 66. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Example: ETL Job „... using Talend’s HDFS and Hive Components”
  • 67. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Example: ETL Job „... using Talend’s Map Reduce Components*” * Not available in open source version of Talend Studio
  • 68. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner „Talend Open Studio for Big Data“ in action... Live demo
  • 69. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Example: Analysis with Microsoft Excel We can see that the largest number of page hits in Florida were for clothing, followed by shoes. Source: for Clickstream Example: „Hortonworks Hadoop Tutorials - Real Life Use Cases” http://hortonworks.com/blog/hadoop-tutorials-real-life-use-cases-in-the-sandbox
  • 70. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Example: Analysis with Microsoft Excel The chart shows that the majority of men shopping for clothing on our website are between the ages of 22 and 30. With this information, we can optimize our content for this market segment. Source: for Clickstream Example: „Hortonworks Hadoop Tutorials - Real Life Use Cases” http://hortonworks.com/blog/hadoop-tutorials-real-life-use-cases-in-the-sandbox
  • 71. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Example: Analysis with Tableau Spoilt for Choice  Use your preferred BI or Analysis tool!
  • 72. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner • Big data paradigm shift • Challenges of big data • Big data from a technology perspective • Integration with an open source framework • Integration with an open source suite • Custom big data components Agenda
  • 73. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Custom components Easy to realize for all integration alternatives * • Integration Framework • Enterprise Service Bus • Integration Suite * At least for open source solutions
  • 74. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Custom components You might need a ... • ... Hive component for Camel • ... Impala component for Talend • ... custom component for your internal data format
  • 75. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Live demo (Example: Apache Camel) Custom components in action...
  • 76. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Alternative for custom components • SOAP • REST
  • 77. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Code example: REST API for Salesforce object store // Salesforce Query (SOQL) via REST API from("direct:salesforceViaHttpLIST") .setHeader("X-PrettyPrint", 1) .setHeader("Authorization", accessToken) .setHeader(Exchange.CONTENT_TYPE, "application/json") .to("https://na14.salesforce.com/services/data/v20.0/query?q=SELECT+name+from +Article__c") // Salesforce CREATE via REST API from("direct:salesforceViaHttpCREATE") .setHeader("X-PrettyPrint", 1) .setHeader("Authorization", accessToken) .setHeader(Exchange.CONTENT_TYPE, "application/json“) .to("https://na14.salesforce.com/services/data/v20.0/sobjects/Article__c")
  • 78. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Did you get the key message?
  • 79. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Key messages You have to care about big data to be competitive in the future! You have to integrate different sources to get most value out of it! Big data integration is no (longer) rocket science!
  • 80. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Did you get the key message?
  • 81. Thank you for your attention. Questions? kwaehner@talend.com www.kai-waehner.de LinkedIn / Xing @KaiWaehner