Big Data: Talend presentation to STLHUG
Greg Meimers
Steve Biernbaum
©2017 Talend Inc
Demo
Setup
• Open your mobile phone’s browser and navigate to http://snowflake.talend.live
• Enter the session code only and click Submit; do not continue
To participate:
• Open your mobile phone’s browser and navigate to http://devicemotion.xyz
• Enter the session code only and click Submit; do not continue
Setup
• Enter your first name only (no spaces or special characters)
• Don’t click Submit until instructed
Today’s Goal
Collect, aggregate, and categorize sensor data in real time from your mobile phone
How Are We Collecting?
• JavaScript reads devicemotion events
• Stream micro-batches to a REST service
• The REST service sends the data to Kafka
• Spark Streaming reads from Kafka
• Apply machine learning to classify activity
• Load into a data warehouse
• Visualization: data obtained from the REST service
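The "stream micro-batches to a REST service" step amounts to buffering readings and sending them in groups. In the demo this runs as JavaScript in the phone's browser. The Python below is only a stand-in that shows the buffering logic. It assumes the payload shape and the 250-event batch size described on the devicemotion slide. The `MicroBatcher` class and its `add` method are hypothetical names, and the actual HTTP POST is left out.

```python
import json

BATCH_SIZE = 250  # the slides send one payload per 250 devicemotion events


class MicroBatcher:
    """Buffers sensor readings and emits one JSON payload per full batch.

    Hypothetical stand-in for the browser-side JavaScript; the field names
    mirror the sample payload on the devicemotion slide.
    """

    def __init__(self, user_name, client_ip, batch_size=BATCH_SIZE):
        self.user_name = user_name
        self.client_ip = client_ip
        self.batch_size = batch_size
        self.buffer = []

    def add(self, timestamp, ax, ay, az):
        """Buffer one reading; return a JSON payload when the batch is full, else None."""
        # The sample payload sends every value as a string, so do the same here.
        self.buffer.append({
            "client_ip": self.client_ip,
            "timestamp": str(timestamp),
            "aX": str(ax),
            "aY": str(ay),
            "aZ": str(az),
            "user_name": self.user_name,
        })
        if len(self.buffer) < self.batch_size:
            return None
        payload = json.dumps({"motionData": self.buffer})
        self.buffer = []  # start the next micro-batch
        return payload
```

In the browser, each payload that is not `None` would be POSTed to the REST service. Batching trades a few seconds of latency for far fewer HTTP requests per phone.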
Kafka Background
Distributed Streaming Platform
• It lets you publish and subscribe to streams of records. In this respect it is similar to a message queue or enterprise messaging system.
• It lets you store streams of records in a fault-tolerant way.
• It lets you process streams of records as they occur.
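The toy below makes the three capabilities concrete. It is an in-memory model of one Kafka topic partition: an append-only log that publishers write to and that each consumer group reads from at its own offset. `ToyLog` is a hypothetical class, not the Kafka client API. It has none of Kafka's partitioning, replication, or durability.

```python
from collections import defaultdict


class ToyLog:
    """Hypothetical in-memory model of one Kafka topic partition (not the Kafka API)."""

    def __init__(self):
        self.records = []                # append-only: records are kept after being read
        self.offsets = defaultdict(int)  # next offset to read, per consumer group

    def publish(self, record):
        """Append a record and return its offset."""
        self.records.append(record)
        return len(self.records) - 1

    def poll(self, group, max_records=10):
        """Return up to max_records unread records for a group and advance its offset."""
        start = self.offsets[group]
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch
```

For example, a "dashboard" group and an "ml" group can each read every published record at their own pace. Reading advances only that group's offset and never removes anything from the log.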
Spark Background
• Fast and general engine for large-scale data processing
• Developed in response to the processing limitations of MapReduce
• Up to 10x faster than MapReduce on disk
• Up to 100x faster than MapReduce in memory
• Has a stack of libraries, including Spark Streaming and MLlib (machine learning)
• Runs everywhere: on Hadoop or standalone
Biometric Gait Signature
• A university study of gait (walking) characteristics based on smartphone sensors proposed that each individual has a unique walking signature [1]
• A heat trace of three individuals reveals their unique signatures [2]
[1] http://www.mdpi.com/2073-8994/8/10/100
[2] http://kyrandale.com/viz/d3-smartphone-walking.html
A Single Sensor
InvenSense MPU-6500 (Galaxy S6)
• A single chip (3 mm × 3 mm × 0.9 mm) integrates a 3-axis accelerometer and a 3-axis gyroscope
• For comparison: [size-comparison image, 18 mm vs. 3 mm]
What Are We Collecting?
Linear Acceleration
• Shows the forces measured by the accelerometer, excluding the force of gravity
• The x, y, and z axes show the direction of the force
• When you hold the phone and look at the screen:
  o x runs between the left and right sides
  o y runs between the top and bottom edges
  o z runs between the front and back
• If the phone is still, the linear acceleration values should all be close to 0
• If you move it around, it shows in real time how much force is applied to it, in the form of acceleration
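The devicemotion event can report acceleration both with and without gravity (`acceleration` and `accelerationIncludingGravity`). When only the gravity-inclusive reading is available, a common approach is to estimate gravity with a low-pass filter and subtract it. The Android sensor documentation describes this method. The sketch below assumes samples arrive as (x, y, z) tuples in m/s² and uses a smoothing factor of 0.8. `split_gravity` is a hypothetical helper name.

```python
def split_gravity(samples, alpha=0.8):
    """Return linear acceleration for each (x, y, z) sample in m/s^2.

    Gravity is estimated with an exponential low-pass filter and subtracted.
    An alpha closer to 1 gives a slower, smoother gravity estimate.
    """
    gravity = [0.0, 0.0, 0.0]  # starts at zero, so the first outputs are a warm-up transient
    linear = []
    for sample in samples:
        gravity = [alpha * g + (1 - alpha) * s for g, s in zip(gravity, sample)]
        linear.append(tuple(s - g for s, g in zip(sample, gravity)))
    return linear


# A phone lying flat and still: the raw reading is gravity alone along z.
still = [(0.0, 0.0, 9.81)] * 50
print(split_gravity(still)[-1])  # the z component has decayed to nearly 0
```

Because the gravity estimate starts at zero, the first few outputs are dominated by the filter's warm-up. For a phone lying still, the values settle toward 0 within a few dozen samples, which matches the "close to 0 when still" behavior described above.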
JavaScript devicemotion Events
• The devicemotion event fires at a regular interval and reports the acceleration the device is experiencing at that moment
• The data is sent in JSON payloads every 250 events (~5 seconds):
{
  "motionData": [
    {
      "client_ip": "127.0.0.1",
      "timestamp": "1723452955",
      "aX": "1.4",
      "aY": "0.9",
      "aZ": "3.1",
      "user_name": "Name"
    },
    ...
  ]
}
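On the receiving side, the REST service has to turn each payload's string values into numbers before the data is useful downstream. The sketch below is a minimal version of that step. Field names follow the sample payload. Treating `timestamp` as Unix seconds is an assumption based on its 10-digit value, and `parse_payload` is a hypothetical name.

```python
import json


def parse_payload(body):
    """Convert one micro-batch payload into records with numeric fields.

    Field names follow the sample payload; treating timestamp as Unix
    seconds is an assumption based on its 10-digit value.
    """
    records = []
    for event in json.loads(body)["motionData"]:
        records.append({
            "user_name": event["user_name"],
            "client_ip": event["client_ip"],
            "timestamp": int(event["timestamp"]),
            "aX": float(event["aX"]),
            "aY": float(event["aY"]),
            "aZ": float(event["aZ"]),
        })
    return records
```

Converting the values once, at ingestion, lets later stages (Kafka consumers, the classifier) work with numbers directly. A missing field raises `KeyError` here, and a real service would need to decide how to handle it.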
Data Quality with Machine Learning
Deduplication & Matching Using Machine Learning to Scale to Big Data
• Sample a single data set that contains duplicates
• Label the sample manually ("is this a duplicate?" yes/no) to build the training set
• Train the model (Random Forests)
• Run the model on all data to predict potential duplicates
• Continuous learning: the more data, the better the system learns
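The slide describes this workflow only at the diagram level. The standard-library sketch below illustrates its first steps: sample candidate pairs from one data set and compute similarity features for each pair. A person would then label those pairs, and a model such as a random forest would be trained on them. The record fields (`name`, `city`), the feature set, and the function names are all hypothetical. This is not Talend's implementation.

```python
import random
from difflib import SequenceMatcher
from itertools import combinations


def pair_features(a, b):
    """Similarity features for one candidate pair (hypothetical feature set)."""
    return {
        "name_sim": SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio(),
        "same_city": float(a["city"].lower() == b["city"].lower()),
    }


def sample_pairs_for_labeling(records, k, seed=0):
    """Randomly pick up to k record pairs, with features, for manual yes/no labeling."""
    pairs = list(combinations(range(len(records)), 2))  # all pairs: O(n^2)
    chosen = random.Random(seed).sample(pairs, min(k, len(pairs)))
    return [(i, j, pair_features(records[i], records[j])) for i, j in chosen]
```

Enumerating every pair grows quadratically, so at Big Data scale a real system would first group records into candidate blocks (for example, by city) and only compare pairs within a block.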
Training Data
• Linear acceleration on the x, y, z axes (m/s²)
• Data classified into 3 categories:
  o Resting
  o Walking
  o Running
• Approximately 450 events
aX,aY,aZ,label
-4.1,8.07,-16.36,running
-2.34,9.69,-0.33,running
0.0,0.01,-0.01,resting
-2.38,-0.54,0.65,walking
-0.7,12.93,-4.91,running
-3.3,-0.89,5.27,walking
1.85,-1.37,-0.73,walking
0.01,0.0,0.0,resting
…
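Here is a short sketch of loading this CSV and deriving one extra feature. Adding the magnitude of the acceleration vector is an illustrative choice, not something the slides specify. It is useful because it does not depend on how the phone is held. The `SAMPLE_CSV` constant contains only the eight rows shown, and the function names are hypothetical.

```python
import csv
import io
import math
from collections import Counter

# The eight sample rows shown on the slide (the full set has ~450).
SAMPLE_CSV = """aX,aY,aZ,label
-4.1,8.07,-16.36,running
-2.34,9.69,-0.33,running
0.0,0.01,-0.01,resting
-2.38,-0.54,0.65,walking
-0.7,12.93,-4.91,running
-3.3,-0.89,5.27,walking
1.85,-1.37,-0.73,walking
0.01,0.0,0.0,resting
"""


def load_training_data(text):
    """Parse labeled rows and add the overall acceleration magnitude as a feature."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        ax, ay, az = float(row["aX"]), float(row["aY"]), float(row["aZ"])
        rows.append({
            "aX": ax, "aY": ay, "aZ": az,
            "magnitude": math.sqrt(ax * ax + ay * ay + az * az),  # m/s^2, independent of phone rotation
            "label": row["label"],
        })
    return rows


def label_counts(rows):
    """Count examples per class, a quick check for class imbalance."""
    return Counter(row["label"] for row in rows)
```

On these eight rows, resting, walking, and running already separate cleanly by magnitude. The full ~450-event set would be needed to say anything reliable.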
Encoding and Validating a Model
• Encode the model using the previously hand-classified dataset
• Choose an appropriate algorithm for classification:
  o Logistic Regression, Naïve Bayes, Decision Tree, Random Forest
• Validate the algorithm using K-Fold Cross-Validation
aX,aY,aZ,label
-4.1,8.07,-16.36,running
-2.34,9.69,-0.33,running
0.0,0.01,-0.01,resting
-2.38,-0.54,0.65,walking
-0.7,12.93,-4.91,running
-3.3,-0.89,5.27,walking
1.85,-1.37,-0.73,walking
0.01,0.0,0.0,resting
…
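K-fold cross-validation can be sketched with the standard library alone. To stay dependency-free, the sketch swaps in a deliberately simple nearest-centroid classifier on acceleration magnitude. It does not use any of the four algorithms listed. The data are only the eight sample rows shown, and `k=4`, the seed, and the function names are illustrative choices.

```python
import math
import random
from collections import defaultdict

# The eight sample rows from the slide as (aX, aY, aZ, label).
DATA = [
    (-4.1, 8.07, -16.36, "running"),
    (-2.34, 9.69, -0.33, "running"),
    (0.0, 0.01, -0.01, "resting"),
    (-2.38, -0.54, 0.65, "walking"),
    (-0.7, 12.93, -4.91, "running"),
    (-3.3, -0.89, 5.27, "walking"),
    (1.85, -1.37, -0.73, "walking"),
    (0.01, 0.0, 0.0, "resting"),
]


def magnitude(row):
    """Length of the acceleration vector in m/s^2."""
    return math.sqrt(row[0] ** 2 + row[1] ** 2 + row[2] ** 2)


def fit_centroids(rows):
    """'Train' a nearest-centroid model: the mean magnitude of each label."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        sums[row[3]] += magnitude(row)
        counts[row[3]] += 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, row):
    """Pick the label whose mean magnitude is closest to this row's magnitude."""
    m = magnitude(row)
    return min(centroids, key=lambda label: abs(centroids[label] - m))


def k_fold_accuracy(rows, k=4, seed=0):
    """Shuffle, split into k folds, and score each fold with a model trained on the others."""
    if not 2 <= k <= len(rows):
        raise ValueError("k must be between 2 and the number of rows")
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        centroids = fit_centroids(train)
        correct = sum(predict(centroids, row) == row[3] for row in test)
        scores.append(correct / len(test))
    return scores


print(k_fold_accuracy(DATA))  # one accuracy per fold; their mean estimates generalization
```

With so few rows, plain k-fold can put every example of one class into the same test fold. That leaves the model no centroid for that class, so a stratified split is the safer choice on small or imbalanced data.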
5 Ways to Exploit Your Big Data
• Spark
• Streaming
• Batch & Real-Time
• In Memory
• Machine Learning
1-click code migration; Analyze before acting; Turn data into decisions, prescriptions & actions; Leverage the latest technology; Remove latency; Exploit data as it arrives
[Diagram: Suppliers, Customers, Cloud, Sensors, Premise]
A Modern Big Data and Cloud Integration Platform
Data Fabric
• Application Integration
• Cloud Integration
• Metadata Management
• Data Preparation
• Big Data Integration
• Master Data Management
Big Data Architecture
Diagram components:
• Check authorization
• Get software updates & publish artifacts
• Store metadata
• Store users, rights, roles, projects, activity, monitoring
• Send & request artifacts/jobs
• Set up deployment
• The Job Server can be inside or outside the cluster
Big Data
Big Data Integration
• Unified platform: batch, streaming, Hadoop, Spark, MapReduce
• Ingest, profile, cleanse, parse, complex data mapping
• Data governance: data quality, metadata management, data lineage
• Continuous delivery: design, deploy, manage
• Deployment: on-premises, public cloud, private cloud
Talend Development Environment
• Talend Studio
  o Eclipse-based design environment
  o Drag-and-drop UI
  o Distributed teamwork / collaboration
  o Rich palette of connectors: 800+
• N-Tier Architecture
  o Client: Talend Studio
  o Project Server: Talend Administration Center
  o ETL Server: Talend Runtime
• Talend Administration Center
  o Define users and projects (LDAP-enabled)
  o Deploy
  o Schedule
  o Recover job execution
  o Monitor
Create High-Quality Information
• Data Quality and Profiling
  o Explore, profile, and monitor data
  o Parse, cleanse, standardize, and reconcile data
  o Match, enrich, and certify data, then share it widely and securely
  o Map any data source to your business context (customers, products, organizations, locations…)
• Data Masking
• Key Benefits
  o More accurate information
  o Regulatory compliance
Talend Data Preparation
The first unified integration platform for governed, self-service data preparation
• Self-service data access & cleansing
+ Enterprise scale through Talend Data Fabric
+ Collaboration and sharing across teams
+ IT governs data usage with role-based security
+ Turn ad-hoc data prep into fully managed DI processes
+ Ready for Big Data
…and more
Talend Data Stewardship App
The First Self-Service Data Quality Tool
Establish accountability and perfect data through teamwork
+ Engage everyone in data quality, not just data stewards
+ Point-and-click approach to curation and certification
+ Orchestrate data stewardship tasks as campaigns
+ Audit and track data error resolution actions
Talend Data Preparation
Data cleaning and transformation for data analysts. Simple and powerful.
TIC Architecture: Connecting SaaS & Cloud Platforms
Diagram components:
• Talend Studio
• Templates and integration flows
• Multi-tenant web application
• Cloud Engines
• SaaS apps and cloud platforms
• On-premises apps & databases
• Firewalls
• Metadata in transit (HTTPS) and customer data in transit
TIC Architecture: Hybrid Integration
Diagram components:
• Talend Studio
• Templates and integration flows
• Multi-tenant web application
• Remote Engines
• SaaS apps and cloud platforms
• On-premises apps & databases
• Firewalls
• Metadata in transit (HTTPS), customer data in transit, and status and logs (HTTPS)
Q&A

 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 

Dernier (20)

20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 

Big data - Talend presentation to STLHUG

  • 1. Big Data: Greg Meimers and Steve Biernbaum, ©2017 Talend Inc
  • 3. Setup
      o Open your mobile phone’s browser and navigate to http://snowflake.talend.live
      o Enter the session code only and click Submit; do not continue
  • 4. To participate:
      o Open your mobile phone’s browser and navigate to http://devicemotion.xyz
      o Enter the session code only and click Submit; do not continue
  • 5. Setup
      o Enter your first name only (no spaces or special characters)
      o Don’t click Submit until instructed
  • 6. Today’s Goal
      o Collect, aggregate, and categorize sensor data from your mobile phone in real time
  • 7. How Are We Collecting?
      o JavaScript reads devicemotion events
      o Micro-batches are streamed to a REST service
      o The REST service sends the data to Kafka
      o Spark Streaming reads from Kafka
      o Machine learning is applied to classify the activity
      o Results are loaded into a data warehouse
      o Visualization data is obtained from a REST service
  • 8. Kafka Background: Distributed Streaming Platform
      o It lets you publish and subscribe to streams of records. In this respect it is similar to a message queue or enterprise messaging system.
      o It lets you store streams of records in a fault-tolerant way.
      o It lets you process streams of records as they occur.
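To make the publish/subscribe and storage bullets concrete, here is a toy, in-memory model of a single Kafka topic partition in Python. It is not Kafka's API: the `ToyTopic` class, its `publish`/`poll` methods and the consumer-group names are hypothetical. It only illustrates the log model behind those bullets: records are appended to an ordered log and retained after being read, and each consumer group tracks its own read offset. Real Kafka adds partitioning, replication across brokers (the fault-tolerance part) and retention policies, none of which this sketch attempts.

```python
from collections import defaultdict


class ToyTopic:
    """Hypothetical in-memory stand-in for one Kafka topic partition (not Kafka's API)."""

    def __init__(self):
        self.log = []                    # retained records, in append order
        self.offsets = defaultdict(int)  # consumer group -> next offset to read

    def publish(self, record):
        """Append a record to the end of the log and return its offset."""
        self.log.append(record)
        return len(self.log) - 1

    def poll(self, group, max_records=10):
        """Return up to max_records unread records for a group and advance its offset.

        Records are not deleted when read, so other groups can still consume them.
        """
        start = self.offsets[group]
        batch = self.log[start:start + max_records]
        self.offsets[group] += len(batch)
        return batch


if __name__ == "__main__":
    topic = ToyTopic()
    for i in range(5):
        topic.publish({"user_name": "Name", "aX": str(i)})
    print(len(topic.poll("spark-streaming", max_records=3)))  # 3
    print(len(topic.poll("spark-streaming")))                 # 2 (the rest)
    print(len(topic.poll("dashboard")))                       # 5: a new group starts at offset 0
```

Keeping records after they are read is what lets a streaming job and a dashboard consume the same sensor stream independently.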
  • 9. Spark Background
      o Fast, general-purpose engine for large-scale data processing
      o Developed in response to the processing limitations of MapReduce
      o Up to 10x faster than MapReduce on disk
      o Up to 100x faster than MapReduce in memory
      o Includes a stack of libraries, including Spark Streaming and MLlib (machine learning)
      o Runs everywhere: on Hadoop or standalone
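Spark Streaming treats a live stream as a sequence of small batches and runs ordinary batch operations on each one. The standard-library Python sketch below illustrates only that micro-batch idea; it does not use Spark, and the function names, the one-second interval and the activity labels in the usage are assumptions chosen for illustration. Events must arrive sorted by timestamp, and intervals with no events simply produce no batch.

```python
from collections import Counter
from itertools import groupby


def micro_batches(events, interval_s):
    """Split (timestamp_s, value) events, sorted by time, into fixed-width time batches.

    Yields (batch_start_s, values) for every interval that contains at least one event.
    """
    for index, group in groupby(events, key=lambda event: int(event[0] // interval_s)):
        yield index * interval_s, [value for _, value in group]


def count_per_batch(events, interval_s):
    """Run the same batch operation (a label count) on each micro-batch in turn."""
    return [(start, Counter(values)) for start, values in micro_batches(events, interval_s)]
```

For example, `count_per_batch([(0.1, "resting"), (0.4, "walking"), (1.2, "walking")], 1.0)` produces one count for the interval starting at 0.0 and another for the interval starting at 1.0. A shorter interval lowers latency at the cost of more, smaller batches.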
  • 10. Biometric Gait Signature
      o A university study of gait (walking) characteristics based on smartphone sensors proposed that each individual has a unique walking signature
      o A heat trace of three individuals' walking data shows their distinct signatures
      o 1 http://www.mdpi.com/2073-8994/8/10/100
      o 2 http://kyrandale.com/viz/d3-smartphone-walking.html
  • 11. A Single Sensor: InvenSense MPU-6500 (Galaxy S6)
      o A single chip (3 mm × 3 mm × 0.9 mm) integrates a 3-axis accelerometer and a 3-axis gyroscope
      o [Figure: size comparison, 18 mm vs. 3 mm]
  • 12. What Are We Collecting? Linear Acceleration
      o Shows the acceleration measured by the accelerometer with the effect of gravity removed
      o The x, y and z axes show the direction of the force
      o As you hold the phone looking at the screen:
          o x runs between the left and right sides
          o y runs between the top and bottom
          o z runs between the front and back
      o If the phone is still, the linear acceleration values should all be close to 0
      o If you move the phone around, it shows in real time how much acceleration is being applied to it
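Browsers expose both `acceleration` (gravity already removed, the quantity this slide describes) and `accelerationIncludingGravity` on `DeviceMotionEvent`, and `acceleration` may be missing when the hardware cannot remove gravity. A common way to recover linear acceleration in that case, described in Android's sensor documentation, is to estimate gravity with a low-pass filter and subtract it. The Python sketch below applies that technique to (x, y, z) readings; the `alpha = 0.8` smoothing factor is an assumed starting value, not something the slides specify.

```python
def linear_acceleration(samples, alpha=0.8):
    """Estimate gravity with a low-pass filter and subtract it from each sample.

    samples: iterable of (x, y, z) readings in m/s^2 that include gravity.
    alpha:   smoothing factor in [0, 1); higher values make the gravity estimate change more slowly.
    Returns a list of (x, y, z) linear-acceleration estimates.
    """
    gravity = [0.0, 0.0, 0.0]
    result = []
    for sample in samples:
        # Low-pass filter: gravity changes slowly, so keep most of the previous estimate.
        gravity = [alpha * g + (1 - alpha) * s for g, s in zip(gravity, sample)]
        # What remains after removing the gravity estimate is the linear acceleration.
        result.append(tuple(s - g for s, g in zip(sample, gravity)))
    return result
```

For a phone lying still face up (every reading is about `(0, 0, 9.81)`), the z value starts near 7.85 m/s² while the gravity estimate converges, then decays toward 0, matching the slide's "close to 0 when still" behaviour. Seeding `gravity` with the first sample is one way to skip that start-up transient.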
  • 13. JavaScript devicemotion Events
      o The devicemotion event fires at a regular interval and reports the acceleration the device is experiencing at that moment
      o The readings are sent in JSON payloads every 250 events (~5 seconds):
          "motionData":[
            {
              "client_ip":"127.0.0.1",
              "timestamp":"1723452955",
              "aX":"1.4",
              "aY":"0.9",
              "aZ":"3.1",
              "user_name":"Name"
            },
            ...
          ]
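The slides' client is JavaScript running in the browser; the Python sketch below only mirrors its batching step so the logic can be checked on its own. It groups raw readings into payloads of 250 events using the field names from the slide's example. At the slide's figures, 250 events in about 5 seconds implies roughly 250 / 5 = 50 events per second, though the actual devicemotion rate varies by device. Wrapping `motionData` in an outer JSON object, emitting the final partial batch, and the `build_payloads` name are assumptions.

```python
import json


def build_payloads(readings, user_name, client_ip, batch_size=250):
    """Group (timestamp, aX, aY, aZ) readings into JSON payloads shaped like the slide's example.

    Values are serialized as strings, as in the example payload. The final,
    partial batch is also emitted so that no readings are dropped.
    """
    payloads = []
    for start in range(0, len(readings), batch_size):
        batch = readings[start:start + batch_size]
        motion_data = [
            {
                "client_ip": client_ip,
                "timestamp": str(ts),
                "aX": str(ax),
                "aY": str(ay),
                "aZ": str(az),
                "user_name": user_name,
            }
            for ts, ax, ay, az in batch
        ]
        payloads.append(json.dumps({"motionData": motion_data}))
    return payloads
```

Sending fixed-size micro-batches instead of one request per event keeps the number of HTTP calls to the REST service manageable at tens of events per second per phone.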
  • 14. Data Quality with Machine Learning: Deduplication & Matching Using Machine Learning to Scale to Big Data
      o Sample a training set from a single data set that contains duplicates
      o Manual labeling: "Is this a duplicate?" yes/no
      o Train the model (Random Forests) on the labeled sample
      o Run the model on all the data to predict potential duplicates
      o Continuous learning: the more data is labeled, the better the system learns
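The sample, label, train, run-on-all-data loop can be sketched without any libraries. To stay dependency-free, this Python sketch swaps the slide's Random Forest for a much simpler learned model: a single string-similarity cut-off (from `difflib.SequenceMatcher`) chosen to best agree with the hand-labeled pairs. The function names and example records are hypothetical; only the overall workflow comes from the slide.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def train_threshold(labeled_pairs):
    """Pick the similarity cut-off that best agrees with hand-labeled pairs.

    labeled_pairs: list of ((record_a, record_b), is_duplicate) from manual labeling.
    """
    scored = [(similarity(a, b), label) for (a, b), label in labeled_pairs]
    candidates = sorted({score for score, _ in scored})

    def accuracy(threshold):
        # Count labeled pairs whose predicted status matches the human label.
        return sum((score >= threshold) == label for score, label in scored)

    return max(candidates, key=accuracy)


def predict_duplicates(records, threshold):
    """Apply the trained cut-off to every pair in the full data set."""
    return [(a, b) for a, b in combinations(records, 2) if similarity(a, b) >= threshold]
```

Continuous learning then amounts to adding newly labeled pairs and re-running `train_threshold`. Note that comparing every pair is quadratic, which is why production matching at big-data scale typically groups records into blocks first.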
  • 15. Training Data
      o Linear acceleration on the x, y, z axes (m/s²)
      o Data classified into 3 categories: resting, walking, running
      o Approximately 450 events
          aX,aY,aZ,label
          -4.1,8.07,-16.36,running
          -2.34,9.69,-0.33,running
          0.0,0.01,-0.01,resting
          -2.38,-0.54,0.65,walking
          -0.7,12.93,-4.91,running
          -3.3,-0.89,5.27,walking
          1.85,-1.37,-0.73,walking
          0.01,0.0,0.0,resting
          …
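The eight rows shown on the slide (not the full set of about 450 events, which is not reproduced here) already hint at why three axes are enough to separate the classes. This Python sketch parses them and computes the mean magnitude sqrt(aX² + aY² + aZ²) for each label; using magnitude as a summary feature is an illustration, not something the slides say the demo did.

```python
import csv
import io
import math
from collections import defaultdict

# The rows visible on the training-data slide.
SAMPLE = """aX,aY,aZ,label
-4.1,8.07,-16.36,running
-2.34,9.69,-0.33,running
0.0,0.01,-0.01,resting
-2.38,-0.54,0.65,walking
-0.7,12.93,-4.91,running
-3.3,-0.89,5.27,walking
1.85,-1.37,-0.73,walking
0.01,0.0,0.0,resting
"""


def load_rows(text):
    """Parse labeled CSV text into ((aX, aY, aZ), label) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [((float(r["aX"]), float(r["aY"]), float(r["aZ"])), r["label"]) for r in reader]


def mean_magnitude_by_label(rows):
    """Average Euclidean norm of the acceleration vector for each activity label."""
    magnitudes = defaultdict(list)
    for (x, y, z), label in rows:
        magnitudes[label].append(math.sqrt(x * x + y * y + z * z))
    return {label: sum(values) / len(values) for label, values in magnitudes.items()}
```

On these rows the averages come out at roughly 0.01 m/s² for resting, 3.7 m/s² for walking and 14 m/s² for running, so even a single magnitude feature orders the three activities.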
  • 16. Encoding and Validating a Model
      o Encode (train) the model using the hand-classified dataset from the previous slide
      o Choose an appropriate classification algorithm: Logistic Regression, Naïve Bayes, Decision Tree, or Random Forest
      o Validate the algorithm using k-fold cross-validation
      o (The slide repeats the training sample shown on slide 15.)
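Of the algorithms listed, Gaussian Naive Bayes is the easiest to write without libraries, so the Python sketch below uses it to show both steps: fitting a model on labeled (aX, aY, aZ) rows and estimating accuracy with k-fold cross-validation. The slides do not say which algorithm the demo finally used. The variance floor, the round-robin fold assignment and the function names are assumptions, the rows are the eight visible on the training-data slide, and eight rows are far too few for a meaningful accuracy figure.

```python
import math
from collections import defaultdict

# Rows copied from the training-data slide: ((aX, aY, aZ), label), in m/s^2.
ROWS = [
    ((-4.1, 8.07, -16.36), "running"),
    ((-2.34, 9.69, -0.33), "running"),
    ((0.0, 0.01, -0.01), "resting"),
    ((-2.38, -0.54, 0.65), "walking"),
    ((-0.7, 12.93, -4.91), "running"),
    ((-3.3, -0.89, 5.27), "walking"),
    ((1.85, -1.37, -0.73), "walking"),
    ((0.01, 0.0, 0.0), "resting"),
]


def fit_gaussian_nb(rows, var_floor=1e-2):
    """Estimate class priors plus per-feature means and variances for each label."""
    by_label = defaultdict(list)
    for features, label in rows:
        by_label[label].append(features)
    model = {}
    for label, vectors in by_label.items():
        n = len(vectors)
        columns = list(zip(*vectors))  # one tuple of values per axis
        means = [sum(col) / n for col in columns]
        # Floor the variance so a class with near-identical samples cannot divide by zero.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, var_floor)
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def predict(model, features):
    """Return the label with the highest log-posterior for one (aX, aY, aZ) vector."""
    def log_posterior(label):
        prior, means, variances = model[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(features, means, variances)
        )
        return math.log(prior) + log_likelihood
    return max(model, key=log_posterior)


def k_fold_accuracy(rows, k):
    """Mean held-out accuracy over k folds; row i goes to fold i % k (no shuffling)."""
    if not 2 <= k <= len(rows):
        raise ValueError("k must be between 2 and the number of rows")
    scores = []
    for fold in range(k):
        test = [r for i, r in enumerate(rows) if i % k == fold]
        train = [r for i, r in enumerate(rows) if i % k != fold]
        model = fit_gaussian_nb(train)
        correct = sum(predict(model, x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / k
```

Each row is held out exactly once, so the k-fold score estimates how the model does on readings it was not trained on. Comparing that score across the candidate algorithms is how the "choose an appropriate algorithm" step can be made concrete.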
  • 17. 5 Ways to Exploit Your Big Data
      o Spark Streaming; Batch & Real-Time; In Memory; Machine Learning; 1-click code migration
      o Analyze before acting; Turn data into decisions, prescriptions & actions; Leverage the latest technology; Remove latency; Exploit data as it arrives
  • 19. A Modern Big Data and Cloud Integration Platform: Data Fabric
      o Application integration
      o Cloud integration
      o Metadata management
      o Data preparation
      o Big data integration
      o Master data management
  • 20. Big Data Architecture: Setup Deployment
      o Check authorization
      o Get software updates & publish artifacts
      o Store metadata
      o Store users, rights, roles, projects, activity, monitoring
      o Send & request artifacts/jobs
      o The Job Server can be inside or outside the cluster
  • 21. Big Data: Unified Platform for Big Data Integration
      o Batch; streaming
      o Hadoop; Spark; MapReduce
      o Ingest; profile; cleanse; parse complex data; mapping; data quality; metadata management; data lineage
      o Design; deploy; manage
      o On-premises; public cloud; private cloud
      o Data governance; continuous delivery; deployment
  • 22. Talend Development Environment
      o Talend Studio
          o Eclipse-based design environment
          o Drag-and-drop UI
          o Distributed teamwork / collaboration
          o Rich palette of 800+ connectors
      o N-tier architecture
          o Client: Talend Studio
          o Project server: Talend Administration Center
          o ETL server: Talend Runtime
      o Talend Administration Center
          o Define users and projects (LDAP enabled)
          o Deploy
          o Schedule
          o Recover job execution
          o Monitor
  • 23. Create High Quality Information
      o Data quality and profiling
          o Explore, profile and monitor data
          o Parse, cleanse, standardize and reconcile data
          o Match, enrich and certify data, then share it widely and securely
          o Map any data source to your business context (customers, products, organizations, locations…)
      o Data masking
      o Key benefits
          o More accurate information
          o Regulatory compliance
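Data masking replaces sensitive values with realistic-looking but non-sensitive ones. The slide does not describe how Talend's masking functions work, so this Python sketch shows only one generic technique: hiding every letter or digit except the last few while keeping separators, so masked values keep the shape of the originals. The `mask_value` name and its defaults are hypothetical.

```python
def mask_value(value, keep_last=4, mask_char="*"):
    """Mask every letter or digit except the last `keep_last` ones.

    Separators such as spaces and dashes are left in place, so the masked
    value keeps its original format.
    """
    total = sum(ch.isalnum() for ch in value)
    seen = 0
    out = []
    for ch in value:
        if ch.isalnum():
            seen += 1
            # Keep only the final `keep_last` alphanumeric characters.
            out.append(ch if seen > total - keep_last else mask_char)
        else:
            out.append(ch)
    return "".join(out)
```

For example, `mask_value("4111-1111-1111-1234")` returns `"****-****-****-1234"`. Preserving the format lets downstream validation and test systems keep working on masked data.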
  • 24. Talend Data Preparation
      o The first unified integration platform for governed, self-service data preparation
      o Self-service data access & cleansing
      o + Enterprise scale through Talend Data Fabric
      o + Collaboration and sharing across teams
      o + IT governs data usage with role-based security
      o + Turn ad-hoc data prep into fully managed DI processes
      o + Ready for Big Data
      o …and more
      o [Screenshot: live data set]
  • 25. Talend Data Stewardship App: The First Self-Service Data Quality Tool
      o Establish accountability and perfect data through teamwork
      o + Engage everyone in data quality, not just data stewards
      o + Point-and-click approach for curation and certification
      o + Orchestrate data stewardship tasks as campaigns
      o + Audit and track data error resolution actions
  • 26. Talend Data Preparation
      o Data cleaning and transformation for data analysts. Simple and powerful.
  • 27. TIC Architecture: Connecting SaaS & Cloud Platforms
      o [Diagram labels: Talend Studio; multi-tenant web application; templates; integration flows; cloud engines; cloud platforms; SaaS app; on-premises apps & databases; firewalls; metadata in transit (HTTPS); customer data in transit]
  • 28. TIC Architecture: Hybrid Integration
      o [Diagram labels: Talend Studio; multi-tenant web application; templates; integration flows; remote engines; cloud platforms; SaaS app; on-premises apps & databases; firewalls; metadata in transit (HTTPS); status and logs (HTTPS); customer data in transit]