Big data at zulily
1. Big Data @ Zulily
By Echo Li, Data Engineer,
eli@zulily.com
Data Services, BI and Big Data Analytics, Zulily
2. Where we are
Powerful and Flexible
BIDS Data Platform
Functional areas: Customer Interaction Points, Webstore, Member Engagement, Event Management, Vendor Management, Supply Chain, ERP & Back Office
Components: Site, Mobile, Orders & Payments, Content Mgmt., CRM, Relevancy (personalization), Messaging, Offers & Promotions, Item Master, Catalog & Event Workflow Mgmt., Planning, Portal, EDI / Data Exchange, Purchase Orders, Workflow & Tools, Order Mgmt., Fulfillment Mgmt., Transportation, Warehouse & Inventory Mgmt., Financial (SAP Enterprise), Business Intelligence, HRIS, Warehouse Automation
Initiatives:
• Capacity & Scale
• Data-driven decision making
– Data for Everyone
• Better customer experience through Personalization & Targeting
3. How We Do It
…powered by Hortonworks Data Platform & Google Cloud
Tableau (Visualization & Reporting), Data Services (ZATA API)
Google BigQuery
Big Data Platform - Google Compute Engine
Hortonworks Data Platform 2.1 on Google Cloud
HDFS, YARN, Hive/Tez, Ambari
Google Cloud Storage
Platform Tools (zulily Build)
ZuSync (ETL), ZuScheduler (Scheduling), ZuMon (Data Monitoring)
Customer Data Mart, Merch Data Mart, Supply Chain Data Mart
Clickstream/Web Analytics
4. Data Processing Pipeline & Analytics
2014 zulily Proprietary and Confidential
Sources: Operational Systems, External APIs (Google, FB, Yahoo, Bing, etc.)
Ingestion: Real Time, ZuSync
Hadoop Processing in Cloud: Landing Zone (LZ), Staging (stg), Atomic Data Store (ADS), Aggregated Dataset
ETL workflows: Tier 1 ETL WF, Tier 2 ETL WF
ADS: Cust ADS, Order ADS, Clickstream
BigQuery Tables
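The stages named on this slide (landing zone, staging, atomic data store, aggregated dataset) can be illustrated with a minimal in-memory sketch. This is not zulily's ZuSync or ETL code: the record fields, the function names, and the validation and dedup rules are all assumptions chosen only to show how data might be refined from one tier to the next.

```python
from collections import defaultdict

# Hypothetical raw order records as they might arrive in a landing zone.
landing_zone = [
    {"order_id": "1", "customer_id": "c1", "amount": "20.00"},
    {"order_id": "1", "customer_id": "c1", "amount": "20.00"},  # duplicate delivery
    {"order_id": "2", "customer_id": "c2", "amount": "bad"},    # malformed amount
    {"order_id": "3", "customer_id": "c1", "amount": "5.50"},
]

def to_staging(raw_rows):
    """Staging (assumed rule): parse types and drop rows that fail validation."""
    staged = []
    for row in raw_rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        staged.append({
            "order_id": row["order_id"],
            "customer_id": row["customer_id"],
            "amount": amount,
        })
    return staged

def to_atomic(staged_rows):
    """Atomic data store (assumed rule): one row per order_id, last one wins."""
    by_id = {}
    for row in staged_rows:
        by_id[row["order_id"]] = row
    return list(by_id.values())

def to_aggregate(atomic_rows):
    """Aggregated dataset (assumed metric): total spend per customer."""
    totals = defaultdict(float)
    for row in atomic_rows:
        totals[row["customer_id"]] += row["amount"]
    return dict(totals)

staged = to_staging(landing_zone)
atomic = to_atomic(staged)
spend_by_customer = to_aggregate(atomic)
print(spend_by_customer)
```

Keeping each tier as a separate function mirrors the idea of tiered ETL workflows: each step can be rerun or monitored on its own.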
5. Our Journey…
Data Platform V1.0
Technology Stack:
• SQL Server
Challenges:
• Limited scale; supported only structured relational data
Advantages:
• Simple
• All data in the same data store
• Makes visualization, analytics and reporting easy
Data Platform V2.0
Technology Stack:
• SQL Server, Apache Hadoop
Challenges:
• Lack of a single data store
• Unable to mash up structured and unstructured data
• Difficult to scale visualization with large-scale data
Advantages:
• Ability to process unstructured data at scale
• Tableau allows us to have a single visualization layer on top of all data
Modern Data Platform V3.0
Technology Stack:
• Hadoop, Google Cloud Platform, BigQuery
Challenges:
• New pricing model, which is both good and bad
• Requires a new data processing methodology (especially for structured data)
Advantages:
• Supports scale and high speed
• Single data platform for structured and unstructured data
• Enables scenarios that were difficult to achieve in V1.0 or V2.0
• Enterprise Hadoop capabilities enable management, monitoring and workflow definition, which are critical
6. Use Cases…
Use Case #1: Site & Event Funnel Analysis
Components: Web Servers, zulily data API, Google Cloud Storage, Hadoop/GCE, BigQuery, Funnel Analysis, ZATA (Data API), Reporting & Analysis (Powered by Tableau)
Benefits:
• Increase revenue
• Improve marketing strategy and targeting
• Improve business decisions
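As a rough illustration of what a site and event funnel analysis computes, the sketch below counts how many sessions reach each step of a funnel. The step names, the event tuples, and the rule that a session must have hit every earlier step are assumptions for illustration; the slide does not describe the actual queries run in BigQuery.

```python
# Hypothetical funnel steps, in order; not taken from the slide.
FUNNEL_STEPS = ["site_visit", "event_view", "product_view", "add_to_cart", "purchase"]

# Hypothetical clickstream events as (session_id, step) pairs.
events = [
    ("s1", "site_visit"), ("s1", "event_view"), ("s1", "product_view"),
    ("s1", "add_to_cart"), ("s1", "purchase"),
    ("s2", "site_visit"), ("s2", "event_view"),
    ("s3", "site_visit"), ("s3", "event_view"), ("s3", "product_view"),
    ("s4", "site_visit"),
]

def funnel_counts(events, steps):
    """Count sessions that reached each step and every step before it."""
    seen = {}
    for session_id, step in events:
        seen.setdefault(session_id, set()).add(step)
    counts = []
    for i in range(len(steps)):
        required = set(steps[: i + 1])
        counts.append(sum(1 for s in seen.values() if required <= s))
    return counts

counts = funnel_counts(events, FUNNEL_STEPS)
for step, n in zip(FUNNEL_STEPS, counts):
    print(f"{step}: {n} sessions")
```

The drop between adjacent counts shows where sessions leave the funnel, which is the kind of signal that could inform the marketing and targeting decisions listed under Benefits.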
7. Use Case #3: Supply Chain Visibility
Components: Vendor Data Exch., EDI, Flat File, Carriers, Others, zulily SCS, zulily Sync, Google Cloud Storage, Hadoop/GCE, BigQuery
Data: PO, PO Shipment, In Transit, Shipment, Order Visibility
Benefits:
• End-to-end order visibility
• Manage by exception
• Reduce shipping costs
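"Manage by exception" can be read as surfacing only the shipments that need attention. The sketch below is one possible interpretation, not zulily's implementation: the record fields, the `expected_days` threshold, and the lateness rule are all hypothetical.

```python
from datetime import date

# Hypothetical shipment records; field names are assumptions.
shipments = [
    {"po": "PO-1", "shipped": date(2014, 6, 1), "expected_days": 5, "delivered": date(2014, 6, 4)},
    {"po": "PO-2", "shipped": date(2014, 6, 1), "expected_days": 5, "delivered": None},
    {"po": "PO-3", "shipped": date(2014, 6, 8), "expected_days": 5, "delivered": None},
]

def late_shipments(shipments, today):
    """Return PO numbers whose transit time exceeds the expected number of days.

    Undelivered shipments are measured up to `today`.
    """
    late = []
    for s in shipments:
        end = s["delivered"] or today
        if (end - s["shipped"]).days > s["expected_days"]:
            late.append(s["po"])
    return late

late = late_shipments(shipments, today=date(2014, 6, 10))
print(late)
```

Only the exceptions are returned, so on-time shipments never need to be reviewed by hand.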
8. As our Journey Continues… we need more talent!
Please check out our career page:
http://www.zulily.com/careers
Editor's Notes
Relational data is doubling every few months.
Non-relational data is growing even faster.
It's not about clicks; it's about impressions.
It's not about who visited the site but who did not.
Data was fragmented across different stores, limiting analytics.
More people, more need for faster data.