1SYNCHRONOSS PROPRIETARY
Bryan Quinn
ETL Use Cases
Company Snapshot
(Q3’2014 revenue)
Market Leader
•  Synchronoss provides Personal Cloud and Activation Platforms to Tier One Operators, MSOs and Enterprises around the globe
Business Model Highlights
•  Monthly subscription fee per active Personal Cloud subscriber (SaaS)
•  Revenue model includes a transaction fee for every activation
Tier-One, Blue Chip Customers
Proven Scale
•  130+ million cloud subscribers connected to our Personal Cloud around the globe
•  Activating millions of devices each week
Strong Financial Position
•  Strong, consistent growth in revenue scale and profitability since IPO in 2006
•  Healthy balance sheet and cash flow
Cloud
Synchronoss is driving the acceleration of the Personal Cloud market with strong growth across its platform and technology.
2011 vs. Today
Customers
Data Classes Supported
Personal Cloud Usage
Ingest Rate
Subscriber Growth
75+ Leading global mobile carriers
20M Contacts
30 Billion Entities (Photos, videos, call logs, contacts, music, documents, messages)
1 Terabyte per month
+215 Terabytes per day
A few thousand subs per month
400K-500K New Subs per Week
130M+ Cloud Subscribers
3.5 Billion Addressable Market
Current Hadoop Landscape
•  CDH 5.5
•  7 Hadoop clusters in production (0-4 years)
•  80 nodes
•  4 billion log events processed daily for 1 customer
•  Smallest - 4 nodes, largest - 20 nodes
•  Bare metal, VMs
•  Single/Multi tenant
•  Multi-cluster single tenant
•  MapReduce Reporting & HBase clusters
•  YARN, Hive, Hue, Oozie, MapReduce, Sqoop, HBase, HDFS, Spark, Flume
ETL Use cases
•  HDFS client
•  Sqoop
•  MongoDB connector
•  Hive-HBase integration
Writing to HDFS
•  hdfs dfs -put <file> <path_on_hdfs>
•  hdfs dfs -text <filename.txt|gz|snappy>
•  HDFS is good for large files but poor at handling many small files (pack them into sequence files)
•  Log files: hdfs porter, retries, parallelise, corrupted files. File size should match the block size (128 MB block size, ~2.5M rows per file)
Other options:
•  NFS mount
•  MapR proprietary file system
•  Flume
•  Camus/Gobblin
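The advice that a log file should roughly fill one 128 MB block can be sketched as a small sizing calculation. This is only an illustration: the function names are made up for this example, and the 53-byte average row size is a hypothetical figure chosen because it yields about 2.5M rows per block-sized file.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB HDFS block size, as stated on the slide


def rows_per_file(avg_row_bytes, block_size=BLOCK_SIZE):
    """Number of whole rows that fit into one block-sized file."""
    if avg_row_bytes <= 0:
        raise ValueError("avg_row_bytes must be positive")
    return block_size // avg_row_bytes


def files_needed(total_rows, avg_row_bytes, block_size=BLOCK_SIZE):
    """Number of block-sized files needed to hold total_rows rows."""
    per_file = rows_per_file(avg_row_bytes, block_size)
    return -(-total_rows // per_file)  # ceiling division


# Hypothetical average row size; real log rows will vary.
print(rows_per_file(53))
print(files_needed(10_000_000, 53))
```

Batching rows this way before the `hdfs dfs -put` keeps the number of small files down, which is the problem the slide warns about.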
Oozie
Sqoop
<action name='importACSUserData' retry-max="15" retry-interval="3">
  <sqoop xmlns="uri:oozie:sqoop-action:0.2">
    <job-tracker>ont-dc2-master-hadoop01:8032</job-tracker>
    <name-node>hdfs://ont-dc2-master-hadoop01:8020</name-node>
    <!-- clear the staging directory so a retry starts from a clean target -->
    <prepare>
      <delete path="hdfs://ont-dc2-master-hadoop01:8020/data/vmm/user/staging/acsuser/"/>
    </prepare>
    <arg>import</arg>
    <arg>--connect</arg>
    <arg>jdbc:oracle:thin:@10.102.40.44:1521:PRDC</arg>
    <arg>--table</arg>
    <arg>acs_user_account</arg>
    <arg>--target-dir</arg>
    <arg>hdfs://ont-dc2-master-hadoop01:8020/data/vmm/user/staging/acsuser/</arg>
    <arg>--username</arg>
    <arg>A_USERNAME</arg>
    <arg>--password</arg>
    <arg>A_PASSWORD</arg>
    <arg>--columns</arg>
    <arg>ID,LCID,CID,INSERT_TIME,ACCOUNT_STATUS,TENANT_ID,ACCOUNT_TYPE,EMAIL</arg>
    <!-- split the import across mappers by ranges of ID -->
    <arg>--split-by</arg>
    <arg>ID</arg>
    <!-- tab-separated output -->
    <arg>--fields-terminated-by</arg>
    <arg>\t</arg>
    <arg>--compress</arg>
    <arg>--num-mappers</arg>
    <arg>10</arg>
  </sqoop>
  <ok to="joining"/>
  <error to="errorNotification"/>
</action>
Sqoop Import
INFO org.apache.sqoop.mapreduce.ImportJobBase - Beginning import of acs_user_account
INFO org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM acs_user_account
INFO org.apache.sqoop.mapreduce.ImportJobBase - Transferred 142.0739 MB in 35.429 seconds (4.0101 MB/sec)
INFO org.apache.sqoop.mapreduce.ImportJobBase - Retrieved 3966156 records.
$ hdfs dfs -ls /data/vmm/user/staging/acsuser
-rwxr-xr-x 3 admin 16159780 2015-11-19 00:11 /data/vmm/user/staging/acsuser/000000_0.gz
-rwxr-xr-x 3 admin 15973159 2015-11-19 00:11 /data/vmm/user/staging/acsuser/000001_0.gz
-rwxr-xr-x 3 admin 15742979 2015-11-19 00:11 /data/vmm/user/staging/acsuser/000002_0.gz
-rwxr-xr-x 3 admin 15626649 2015-11-19 00:11 /data/vmm/user/staging/acsuser/000003_0.gz
-rwxr-xr-x 3 admin 15555272 2015-11-19 00:11 /data/vmm/user/staging/acsuser/000004_0.gz
-rwxr-xr-x 3 admin 15536504 2015-11-19 00:11 /data/vmm/user/staging/acsuser/000005_0.gz
-rwxr-xr-x 3 admin 15463208 2015-11-19 00:11 /data/vmm/user/staging/acsuser/000006_0.gz
-rwxr-xr-x 3 admin 15450095 2015-11-19 00:11 /data/vmm/user/staging/acsuser/000007_0.gz
-rwxr-xr-x 3 admin 14894144 2015-11-19 00:11 /data/vmm/user/staging/acsuser/000008_0.gz
-rwxr-xr-x 3 admin 8573426 2015-11-19 00:11 /data/vmm/user/staging/acsuser/000009_0.gz
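The BoundingValsQuery in the log fetches the minimum and maximum of the `--split-by` column, and the range is then divided among the mappers, which is why ten output files appear. A simplified sketch of that idea follows. It is an even integer split written for illustration, not Sqoop's actual splitter code, and the ID bounds used in the example are hypothetical.

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the inclusive range lo..hi into up to num_mappers contiguous ranges."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    total = hi - lo + 1
    base, extra = divmod(total, num_mappers)
    ranges = []
    start = lo
    for i in range(num_mappers):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # fewer IDs than mappers: skip empty ranges
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


# Hypothetical MIN(ID) and MAX(ID); each tuple would become one mapper's WHERE clause.
for start, end in split_ranges(1, 4_000_000, 10):
    print(f"WHERE ID >= {start} AND ID <= {end}")
```

Ranges of equal width do not guarantee equal row counts. If IDs are sparse in part of the range, one mapper may write a noticeably smaller file, which may be what happened with the last file in the listing.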
MongoDB Hadoop Integration
Mongo Document
{
  "did": "eebd8f8becfdcae81cad3d24f920c273638e8df7",
  "ts": 1423237590000,
  "cd": ISODate("2015-02-06T15:49:07.534Z"),
  "sg": {
    "lcid": "2e2ee2www454t88776",
    "action": "UploadingPhotos",
    "type": "r"
  },
  "_id": ObjectId("54d4e273134dfc570d00b10e")
}
Set up Hive to Mongo
-- create Hive table that points to MongoDB collection view
CREATE EXTERNAL TABLE 10_mongo_handset_state (
  id STRING,
  segment STRUCT<lcid:STRING,
                 action:STRING,
                 type:STRING>,
  ts STRING,
  cd STRING)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","segment":"sg"}')
TBLPROPERTIES('mongo.uri'='mongodb://ec2-52-55.eu-west-1.compute.amazonaws.com:27017/db.fab09d7f52d3fe1278?readPreference=secondary',
  'mongo.input.query'='{"cd" : { "$gte" : {"$date":1447927200000}, "$lt" : {"$date":1447930800000} }}',
  'mongo.input.split.create_input_splits'='false');
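The `mongo.input.query` value restricts the read to a one-hour window expressed in epoch milliseconds (1447927200000 is 2015-11-19 10:00 UTC). A minimal sketch of how such a window could be generated per run is shown below. The function name is hypothetical, and it only builds the JSON string; wiring it into the table properties is left out.

```python
import json
from datetime import datetime, timedelta, timezone


def hour_window_query(start, field="cd"):
    """Build a Mongo query string matching one hour starting at `start`."""
    start_ms = int(start.timestamp() * 1000)
    end_ms = int((start + timedelta(hours=1)).timestamp() * 1000)
    query = {field: {"$gte": {"$date": start_ms}, "$lt": {"$date": end_ms}}}
    return json.dumps(query)


# Reproduces the window used on the slide.
print(hour_window_query(datetime(2015, 11, 19, 10, tzinfo=timezone.utc)))
```

Using a half-open window (`$gte` start, `$lt` end) means consecutive hourly runs neither overlap nor leave gaps.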
Mongo load to Hive
Now load the MongoDB data into Hive/HDFS
INSERT OVERWRITE TABLE 10_handset_state PARTITION (pdate, phour)
select
  c,
  IF(segment.lcid IS NULL, '', segment.lcid),
  IF(segment.action IS NULL, '', UPPER(segment.action)),
  IF(segment.type IS NULL, '', LOWER(segment.type)),
  '20151119',
  lpad(CAST(hour(from_unixtime(unix_timestamp(cd,"EEE MMM dd HH:mm:ss z yyyy"))) as STRING), 2, '0')
from 10_mongo_handset_state;
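The select above does two things per row: it replaces NULL struct fields with empty strings (normalising case on the way), and it derives a zero-padded hour partition from the `cd` string. A rough Python equivalent is sketched here for illustration. The helper names are hypothetical, the sample timestamp is invented, and unlike the slide (which hardcodes `pdate`) this sketch also derives the date. It ignores the session time zone conversion that Hive's `from_unixtime` applies.

```python
from datetime import datetime

# Python counterpart of the Java pattern "EEE MMM dd HH:mm:ss z yyyy"
DATE_FORMAT = "%a %b %d %H:%M:%S %Z %Y"


def or_empty(value, transform=str):
    """Mirror IF(x IS NULL, '', f(x)): empty string for None, else transform."""
    return "" if value is None else transform(value)


def partition_values(cd):
    """Return (pdate, phour) strings such as ('20151119', '10')."""
    parsed = datetime.strptime(cd, DATE_FORMAT)
    return parsed.strftime("%Y%m%d"), f"{parsed.hour:02d}"


# Invented sample value in the same layout as the `cd` column.
print(partition_values("Thu Nov 19 10:15:00 UTC 2015"))
print(or_empty("UploadingPhotos", str.upper), repr(or_empty(None)))
```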
Mongo load to Hive
INFO : number of splits:1
INFO : 2015-12-09 02:03:50,020 Stage-1 map = 0%, reduce = 0%
INFO : 2015-12-09 02:05:33,136 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 118.25 sec
INFO : MapReduce Total cumulative CPU time: 1 minutes 58 seconds 250 msec
INFO : Ended Job = job_1449567915620_2552
INFO : Loading partition {pdate=20151209, phour=01}
INFO : Time taken for adding to write entity : 0
INFO : Partition default.10001_pc_handset_event{pdate=20151209, phour=01} stats: [numFiles=1, numRows=1797013, totalSize=313391267, rawDataSize=311594254]
HBase Overview
•  NoSQL distributed, scalable database modelled on Google’s BigTable
•  Key/Value store
•  Data persisted to HDFS
•  Resilient, HA
•  Sparse
•  Automatic sharding
HBase shell
hbase shell> create 'user_profile_uploads',
{NAME => 'ul',
VERSIONS => 1,
COMPRESSION=>'gz',
TTL => '31536000'}
Oozie-Hive-HBase
<!-- Oozie workflow action -->
<action name="loadHBaseData">
  <hive xmlns="uri:oozie:hive-action:0.2">
    <job-tracker>yarnRM</job-tracker>
    <name-node>hdfs://nameservice1</name-node>
    <job-xml>/user/hive/conf/hive-site.xml</job-xml>
    <configuration>
      <property>
        <name>oozie.hive.defaults</name>
        <value>/user/hive/conf/hive-default.xml</value>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>hadoop-master01,hadoop-slave01,hadoop-slave02</value>
      </property>
    </configuration>
    <script>script.q</script>
  </hive>
  <ok to="nextStep"/>
  <error to="errorNotification"/>
</action>
Hive-HBase
-- HBase managed table
CREATE EXTERNAL TABLE IF NOT EXISTS hbase_user_profile_uploads
  (key string, size BIGINT, number int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,ul:size,ul:num")
TBLPROPERTIES ("hbase.table.name" = "user_profile_uploads");
-- sample key '0ab94b27311b468186f5d!20130604!HANDSET!APPLE/IPHONE!image/jpeg'
INSERT OVERWRITE TABLE hbase_user_profile_uploads
SELECT concat(userid,'!',pdate,'!',platform,'!',device,'!',fileType), fileSize, number
FROM 10_user_profile_uploads
WHERE pdate=20151118;
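The row key is a composite of five fields joined with `!`, with the user id first so that all of a user's rows sort together. A small sketch of building and splitting such a key, assuming none of the fields ever contains `!` (the function names are illustrative only):

```python
SEPARATOR = "!"
FIELDS = ("userid", "pdate", "platform", "device", "file_type")


def make_row_key(userid, pdate, platform, device, file_type):
    """Join the key fields the same way the Hive concat() does."""
    parts = (userid, pdate, platform, device, file_type)
    if any(SEPARATOR in part for part in parts):
        raise ValueError("key fields must not contain the separator")
    return SEPARATOR.join(parts)


def parse_row_key(key):
    """Split a row key back into a dict of its five named fields."""
    parts = key.split(SEPARATOR)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


key = make_row_key("0ab94b27311b468186f5d", "20130604", "HANDSET",
                   "APPLE/IPHONE", "image/jpeg")
print(key)
print(parse_row_key(key)["device"])
```

Leading with the user id is what makes the prefix lookups on the HBase queries slide possible.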
HBase queries
hbase shell> get 'user_profile_uploads', '0ab94b27311b468186f5d!20140513!HANDSET!SAMSUNG/SCH-I545!image/jpeg'
# exact key search: very quick, returns 1 row
PrefixFilter is very fast when the key is near the start of the table
scan 'user_profile_uploads', {FILTER => "PrefixFilter ('0ab94b27311b468186f5d')"}
# 3 row(s) in 0.0630 seconds
However, if the key is near the end of the table, the scan takes a long time
scan 'user_profile_uploads', {FILTER => "PrefixFilter ('zb94b27311b468186f5d')"}
# 12 row(s) in 16 seconds
The optimum solution is to use STARTROW along with the filter
scan 'user_profile_uploads', {STARTROW => 'zb94b27311b468186f5d', FILTER => "PrefixFilter ('zb94b27311b468186f5d')"}
# 12 row(s) in 0.1560 seconds
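The timings above follow from HBase storing rows sorted by key: a prefix filter on its own still walks the table from the first row until it reaches the prefix, whereas STARTROW seeks straight to it. The toy model below illustrates this with a sorted Python list. It is a simplification for intuition rather than HBase's implementation, and the keys are invented.

```python
from bisect import bisect_left


def prefix_scan(sorted_keys, prefix, start_row=None):
    """Return (matching keys, number of rows examined) for a prefix scan."""
    # With a start row, seek directly to it; otherwise begin at the first row.
    first = bisect_left(sorted_keys, start_row) if start_row is not None else 0
    examined = 0
    hits = []
    for key in sorted_keys[first:]:
        examined += 1
        if key.startswith(prefix):
            hits.append(key)
        elif key > prefix:
            break  # sorted order: no later key can match the prefix
    return hits, examined


# Invented keys: four users, three days each, kept in sorted order.
keys = sorted(f"{user}b94!{day}" for user in "0abz"
              for day in ("20151117", "20151118", "20151119"))

print(prefix_scan(keys, "0b94"))                    # prefix at the start of the table
print(prefix_scan(keys, "zb94"))                    # prefix at the end, no start row
print(prefix_scan(keys, "zb94", start_row="zb94"))  # same prefix with a start row
```

All three calls return the same kind of result set, but the number of rows examined differs, which is the effect the 16 second versus 0.156 second timings show at table scale.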
Hadoop
•  Linear scalability
•  Predictable reporting
•  Reproducible and reliable reports
•  Democratized data
•  Applications were black boxes – no longer so. Out of the darkness…
•  Enables data-driven decision making
•  Jump In!
Thank you
Email: bryan.quinn@synchronoss.com
@bryantquinn

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 

HUG_Ireland_BryanQuinnPresentation_20160111

  • 2. Company Snapshot (Q3’2014 revenue)
      Market Leader
        •  Synchronoss provides Personal Cloud and Activation Platforms to Tier One Operators, MSOs and Enterprises around the globe
      Business Model Highlights
        •  Monthly subscription fee per active Personal Cloud subscriber (SaaS)
        •  Revenue model consists of a transaction fee for every activation
      Tier-One, Blue Chip Customers
      Proven Scale
        •  130+ million Cloud subscribers connected in our Personal Cloud around the globe
        •  Activating millions of devices each week
      Strong Financial Position
        •  Strong, consistent growth in revenue scale and profitability since IPO in 2006
        •  Healthy balance sheet and cash flow
  • 3. Cloud
      Synchronoss is driving the acceleration of the Personal Cloud market with strong growth across its platform and technology.
      Columns: 2011, Today
      Rows: Customers, Data Classes Supported, Personal Cloud Usage, Ingest Rate, Subscriber Growth
      75+ leading global mobile carriers
      20M Contacts
      30 Billion Entities (photos, videos, call logs, contacts, music, documents, messages)
      1 Terabyte per month
      +215 Terabytes per day
      A few thousand subs per month
      400K-500K new subs per week
      130M+ Cloud Subscribers
      3.5 Billion Addressable Market
  • 4. Current Hadoop Landscape
        •  CDH 5.5
        •  7 Hadoop clusters in production (0-4 years)
        •  80 nodes
        •  4 billion log events processed daily for 1 customer
        •  Smallest - 4 nodes, largest - 20 nodes
        •  Bare metal, VMs
        •  Single/Multi tenant
        •  Multi-cluster single tenant
        •  MapReduce Reporting & HBase clusters
        •  YARN, Hive, Hue, Oozie, MapReduce, Sqoop, HBase, HDFS, Spark, Flume
  • 5. ETL Use cases
        •  HDFS client
        •  Sqoop
        •  MongoDB connector
        •  Hive-HBase integration
  • 6. Writing to HDFS
        •  hdfs dfs -put <file> <path_on_hdfs>
        •  hdfs dfs -text <filename.txt|gz|snappy>
        •  HDFS is good for large files. It is not good at dealing with small files (sequence files)
        •  Log files - hdfs porter, retries, parallelise, corrupted files, file size should match block size. 128MB block size. ~2.5m rows/file
      Other options:
        •  NFS mount
        •  MapR proprietary file system
        •  Flume
        •  Camus/Gobblin
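The advice that file size should match the 128MB block size can be illustrated with a minimal sketch. It assumes the log files are known as (name, size in bytes) pairs and groups them so each batch stays at or under one block before being merged and uploaded. The function name `batch_for_upload` and the sample file names are hypothetical; this is not the "hdfs porter" tool mentioned on the slide.

```python
from typing import Iterable, List, Tuple

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the block size quoted on the slide


def batch_for_upload(files: Iterable[Tuple[str, int]],
                     target: int = BLOCK_SIZE) -> List[List[str]]:
    """Group (name, size_in_bytes) pairs so each batch totals at most `target`.

    Files are taken in the order given. A single file larger than `target`
    ends up in a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        # Close the current batch if adding this file would overflow it.
        if current and current_size + size > target:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


MB = 1024 * 1024
# Hypothetical small log files; sizes are illustrative only.
sample_logs = [("a.log", 60 * MB), ("b.log", 60 * MB),
               ("c.log", 30 * MB), ("d.log", 200 * MB)]
sample_batches = batch_for_upload(sample_logs)
```

Each resulting batch could then be concatenated locally and written with a single `hdfs dfs -put`, which keeps the number of small files on the cluster down.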
  • 8. Sqoop
      <action name='importACSUserData' retry-max="15" retry-interval="3">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
          <job-tracker>ont-dc2-master-hadoop01:8032</job-tracker>
          <name-node>hdfs://ont-dc2-master-hadoop01:8020</name-node>
          <prepare>
            <delete path="hdfs://ont-dc2-master-hadoop01:8020/data/vmm/user/staging/acsuser/"/>
          </prepare>
          <arg>import</arg>
          <arg>--connect</arg>
          <arg>jdbc:oracle:thin:@10.102.40.44:1521:PRDC</arg>
          <arg>--table</arg>
          <arg>acs_user_account</arg>
          <arg>--target-dir</arg>
          <arg>hdfs://ont-dc2-master-hadoop01:8020/data/vmm/user/staging/acsuser/</arg>
          <arg>--username</arg>
          <arg>A_USERNAME</arg>
          <arg>--password</arg>
          <arg>A_PASSWORD</arg>
          <arg>--columns</arg>
          <arg>ID,LCID,CID,INSERT_TIME,ACCOUNT_STATUS,TENANT_ID,ACCOUNT_TYPE,EMAIL</arg>
          <arg>--split-by</arg>
          <arg>ID</arg>
          <arg>--fields-terminated-by</arg>
          <arg>\t</arg>
          <arg>--compress</arg>
          <arg>--num-mappers</arg>
          <arg>10</arg>
        </sqoop>
        <ok to="joining"/>
        <error to="errorNotification"/>
      </action>
  • 9. Sqoop Import
      INFO org.apache.sqoop.mapreduce.ImportJobBase - Beginning import of acs_user_account
      INFO org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - BoundingValsQuery: SELECT MIN(ID), MAX(ID) FROM acs_user_account
      INFO org.apache.sqoop.mapreduce.ImportJobBase - Transferred 142.0739 MB in 35.429 seconds (4.0101 MB/sec)
      INFO org.apache.sqoop.mapreduce.ImportJobBase - Retrieved 3966156 records.
      $ hdfs dfs -ls /data/vmm/user/staging/acsuser
      -rwxr-xr-x 3 admin 16159780 2015-11-19 00:11 /data/vmm/user/staging/acsuser/000000_0.gz
      -rwxr-xr-x 3 admin 15973159 2015-11-19 00:11 /data/vmm/user/staging/acsuser/000001_0.gz
      -rwxr-xr-x 3 admin 15742979 2015-11-19 00:11 /data/vmm/user/staging/acsuser/000002_0.gz
      -rwxr-xr-x 3 admin 15626649 2015-11-19 00:11 /data/vmm/user/staging/acsuser/000003_0.gz
      -rwxr-xr-x 3 admin 15555272 2015-11-19 00:11 /data/vmm/user/staging/acsuser/000004_0.gz
      -rwxr-xr-x 3 admin 15536504 2015-11-19 00:11 /data/vmm/user/staging/acsuser/000005_0.gz
      -rwxr-xr-x 3 admin 15463208 2015-11-19 00:11 /data/vmm/user/staging/acsuser/000006_0.gz
      -rwxr-xr-x 3 admin 15450095 2015-11-19 00:11 /data/vmm/user/staging/acsuser/000007_0.gz
      -rwxr-xr-x 3 admin 14894144 2015-11-19 00:11 /data/vmm/user/staging/acsuser/000008_0.gz
      -rwxr-xr-x 3 admin 8573426 2015-11-19 00:11 /data/vmm/user/staging/acsuser/000009_0.gz
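The log shows Sqoop issuing a bounding query (SELECT MIN(ID), MAX(ID)) for the --split-by column, and the import ran with --num-mappers 10, which matches the ten output files. The sketch below shows the general idea of dividing an integer key range into near-equal sub-ranges, one per mapper. It is a simplified illustration, not Sqoop's actual splitter code, and the names `split_ranges` and `throughput_mb_per_sec` are hypothetical.

```python
from typing import List, Tuple


def split_ranges(min_id: int, max_id: int, num_mappers: int) -> List[Tuple[int, int]]:
    """Divide the inclusive range [min_id, max_id] into at most `num_mappers`
    contiguous, non-overlapping inclusive sub-ranges of near-equal width."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_id < min_id:
        raise ValueError("max_id must not be smaller than min_id")
    total = max_id - min_id + 1
    parts = min(num_mappers, total)  # never create empty ranges
    base, extra = divmod(total, parts)
    ranges: List[Tuple[int, int]] = []
    start = min_id
    for i in range(parts):
        # The first `extra` ranges absorb the remainder, one value each.
        width = base + (1 if i < extra else 0)
        ranges.append((start, start + width - 1))
        start += width
    return ranges


def throughput_mb_per_sec(megabytes: float, seconds: float) -> float:
    """Transfer rate as reported at the end of an import."""
    return megabytes / seconds


# Figures taken from the log line on the slide.
rate = round(throughput_mb_per_sec(142.0739, 35.429), 4)
```

Because the ranges are equal in key width rather than in row count, gaps or skew in the ID values can leave some mappers with less data than others.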
  • 11. Mongo Document
      {
        "did": "eebd8f8becfdcae81cad3d24f920c273638e8df7",
        "ts": 1423237590000,
        "cd": ISODate("2015-02-06T15:49:07.534Z"),
        "sg": {
          "lcid": "2e2ee2www454t88776",
          "action": "UploadingPhotos",
          "type": "r"
        },
        "_id": ObjectId("54d4e273134dfc570d00b10e")
      }
  • 12. Set up Hive to Mongo
      -- create hive table that points to MongoDB collection view
      CREATE EXTERNAL TABLE 10_mongo_handset_state (
        id STRING,
        segment STRUCT<lcid:STRING, action:STRING, type:STRING>,
        ts STRING,
        cd STRING)
      STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
      WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","segment":"sg"}')
      TBLPROPERTIES(
        'mongo.uri'='mongodb://ec2-52-55.eu-west-1.compute.amazonaws.com:27017/db.fab09d7f52d3fe1278?readPreference=secondary',
        'mongo.input.query'='{"cd" : { "$gte" : {"$date":1447927200000}, "$lt" : {"$date":1447930800000} }}',
        'mongo.input.split.create_input_splits'='false');
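The two $date values in mongo.input.query are epoch milliseconds that bound a one-hour window (2015-11-19 10:00 to 11:00 UTC). A minimal sketch of how such a filter string could be generated for any hour follows. The function names `to_epoch_millis` and `hourly_input_query` are hypothetical, and the slide does not say how the author produced these values.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt: datetime) -> int:
    """Milliseconds since the Unix epoch, computed with integer arithmetic."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def hourly_input_query(hour_start: datetime, field: str = "cd") -> str:
    """Build a JSON range filter covering [hour_start, hour_start + 1h)."""
    if hour_start.tzinfo is None:
        raise ValueError("hour_start must be timezone-aware")
    # Truncate to the top of the hour so the window is always aligned.
    start = hour_start.replace(minute=0, second=0, microsecond=0)
    end = start + timedelta(hours=1)
    query = {field: {"$gte": {"$date": to_epoch_millis(start)},
                     "$lt": {"$date": to_epoch_millis(end)}}}
    return json.dumps(query)


# Reproduces the window used in the table definition on the slide.
window = hourly_input_query(datetime(2015, 11, 19, 10, tzinfo=timezone.utc))
```

The half-open range ($gte on the start, $lt on the end) means consecutive hourly loads neither overlap nor miss documents on the boundary.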
  • 13. Mongo load to Hive
      -- Now load the MongoDB data into Hive/HDFS
      INSERT OVERWRITE TABLE 10_handset_state PARTITION (pdate, phour)
      select c,
        IF(segment.lcid IS NULL, '', segment.lcid),
        IF(segment.action IS NULL, '', UPPER(segment.action)),
        IF(segment.type IS NULL, '', LOWER(segment.type)),
        '20151119',
        lpad(CAST(hour(from_unixtime(unix_timestamp(cd,"EEE MMM dd HH:mm:ss z yyyy"))) as STRING), 2, '0')
      from 10_mongo_handset_state;
  • 14. Mongo load to Hive
      INFO : number of splits:1
      INFO : 2015-12-09 02:03:50,020 Stage-1 map = 0%, reduce = 0%
      INFO : 2015-12-09 02:05:33,136 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 118.25 sec
      INFO : MapReduce Total cumulative CPU time: 1 minutes 58 seconds 250 msec
      INFO : Ended Job = job_1449567915620_2552
      INFO : Loading partition {pdate=20151209, phour=01}
      INFO : Time taken for adding to write entity : 0
      INFO : Partition default.10001_pc_handset_event{pdate=20151209, phour=01} stats: [numFiles=1, numRows=1797013, totalSize=313391267, rawDataSize=311594254]
  • 15. HBase Overview
        •  NoSQL distributed, scalable database modelled on Google’s BigTable
        •  Key/Value store
        •  Data persisted to HDFS
        •  Resilient, HA
        •  Sparse
        •  Automatic sharding
  • 16. HBase shell
      hbase shell> create 'user_profile_uploads', {NAME => 'ul', VERSIONS => 1, COMPRESSION => 'gz', TTL => '31536000'}
  • 17. Oozie-Hive-HBase
      // Oozie workflow action
      <action name="loadHBaseData">
        <hive xmlns="uri:oozie:hive-action:0.2">
          <job-tracker>yarnRM</job-tracker>
          <name-node>hdfs://nameservice1</name-node>
          <job-xml>/user/hive/conf/hive-site.xml</job-xml>
          <configuration>
            <property>
              <name>oozie.hive.defaults</name>
              <value>/user/hive/conf/hive-default.xml</value>
            </property>
            <property>
              <name>hbase.zookeeper.quorum</name>
              <value>hadoop-master01,hadoop-slave01,hadoop-slave02</value>
            </property>
          </configuration>
          <script>script.q</script>
        </hive>
        <ok to="nextStep"/>
        <error to="errorNotification"/>
      </action>
  • 18. Hive-HBase
      -- HBase managed table
      CREATE EXTERNAL TABLE IF NOT EXISTS hbase_user_profile_uploads (key string, size BIGINT, number int)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,ul:size,ul:num")
      TBLPROPERTIES("hbase.table.name" = "user_profile_uploads");
      -- sample key '0ab94b27311b468186f5d!20130604!HANDSET!APPLE/IPHONE!image/jpeg'
      INSERT OVERWRITE TABLE hbase_user_profile_uploads
      SELECT concat(userid,'!',pdate,'!',platform,'!',device,'!',fileType), fileSize, number
      FROM 10_user_profile_uploads
      where pdate=20151118;
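The concat(...) expression builds a composite row key from five fields joined with '!'. A minimal sketch of the same construction outside Hive, plus the reverse parse, is shown here. The helper names `build_row_key` and `parse_row_key` are hypothetical, and the check that no field contains the separator is an added assumption, not something the slide states.

```python
from typing import Dict

SEPARATOR = "!"
# Field order follows the concat(...) call in the Hive statement.
FIELDS = ("userid", "pdate", "platform", "device", "fileType")


def build_row_key(userid: str, pdate: str, platform: str,
                  device: str, file_type: str) -> str:
    """Join the key parts with '!' in the same order as the Hive query."""
    parts = (userid, pdate, platform, device, file_type)
    for part in parts:
        if SEPARATOR in part:
            # A separator inside a field would make the key ambiguous to parse.
            raise ValueError(f"field {part!r} contains the separator {SEPARATOR!r}")
    return SEPARATOR.join(parts)


def parse_row_key(key: str) -> Dict[str, str]:
    """Split a composite key back into its named fields."""
    parts = key.split(SEPARATOR)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} parts, got {len(parts)}")
    return dict(zip(FIELDS, parts))


# The sample key from the slide, with the wrapped line joined.
sample_key = build_row_key("0ab94b27311b468186f5d", "20130604",
                           "HANDSET", "APPLE/IPHONE", "image/jpeg")
```

Putting the user id first means all rows for one user sort next to each other, which is what makes prefix lookups by user id possible.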
  • 19. HBase queries
      hbase shell> get 'user_profile_uploads', '0ab94b27311b468186f5d!20140513!HANDSET!SAMSUNG/SCH-I545!image/jpeg'
      // exact key search, very quick, returns 1 row
      PrefixFilter is very fast:
      scan 'user_profile_uploads', {FILTER => "PrefixFilter ('0ab94b27311b468186f5d')"}
      // 3 row(s) in 0.0630 seconds
      However, if the key is at the end of the table, the scan takes a long time:
      scan 'user_profile_uploads', {FILTER => "PrefixFilter ('zb94b27311b468186f5d')"}
      // 12 row(s) in 16 seconds
      The optimum solution is to use STARTROW along with the filter:
      scan 'user_profile_uploads', {STARTROW => 'zb94b27311b468186f5d', FILTER => "PrefixFilter ('zb94b27311b468186f5d')"}
      // 12 row(s) in 0.1560 seconds
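The timings above (16 seconds versus 0.156 seconds) follow from HBase keeping rows sorted by key: a filter-only scan starts at the first row of the table, while STARTROW lets the scan begin at the first candidate row. The sketch below models the table as a sorted Python list to count how many rows each approach examines. It is a toy model, not HBase client code; `prefix_scan` and the generated keys are hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def prefix_scan(sorted_keys: List[str], prefix: str,
                start_row: Optional[str] = None) -> Tuple[List[str], int]:
    """Return (matching keys, number of rows examined).

    Without `start_row` the scan walks from the first key, as a filter-only
    scan does. With `start_row` it seeks to the first key >= start_row.
    """
    start = 0 if start_row is None else bisect_left(sorted_keys, start_row)
    matches: List[str] = []
    examined = 0
    for key in sorted_keys[start:]:
        examined += 1
        if key.startswith(prefix):
            matches.append(key)
        elif key > prefix:
            break  # keys are sorted, so nothing after this can match
    return matches, examined


# Toy table: 1000 rows sorting before the prefix, then 3 matching rows.
table = sorted([f"a{i:04d}!x" for i in range(1000)] +
               ["zb94!1", "zb94!2", "zb94!3"])
full_matches, full_examined = prefix_scan(table, "zb94")
seek_matches, seek_examined = prefix_scan(table, "zb94", start_row="zb94")
```

Both calls return the same three rows, but the seek variant examines only those three, while the filter-only variant examines every row that sorts before the prefix as well. Python compares strings by code point, which agrees with HBase's byte ordering for the ASCII keys used here.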
  • 20. Hadoop
        •  Linear scalability
        •  Predictable reporting
        •  Reproducible and reliable reports
        •  Democratized data
        •  Applications were black boxes, no longer so. Out of the darkness…
        •  Enables data-driven decision making
        •  Jump In!
  • 21. Thank you
      Email: bryan.quinn@synchronoss.com
      @bryantquinn