eBay EDW Metadata Management and Applications
1. eBay EDW Metadata Management and Applications, Dec 2011. 熊家治, eBay Data Analytics Platform Architect, [email_address]
3. The Birth of eBay... Started with a broken laser pointer... sold for $14.83 USD. AuctionWeb was born on the Labor Day weekend in September 1995. Pierre Omidyar, eBay Founder. $30
4. The Birth of eBay... A free service running off a home server... $240 USD/month. Pierre Omidyar
5. The Birth of eBay... Requesting donations... Coins, personal checks, bills, money orders, coupons, movie tickets
7. The Birth of eBay... Initial business model and target users... Build an equitable electronic marketplace for Americans to buy and sell their stuff.
8. eBay Facts, 16 Years After... 450+ million registered users; over 2 billion photos; 220+ million active item listings for sale; 50,000 categories; 2 petabytes stored; 25 petabytes processed daily; 300+ features per quarter; 100,000 lines of code rolled out every 2 weeks; 48 billion SQL calls per day; 5.5 billion API calls per month; > 4.4 GB of source code; global presence in 33 international markets; 10+ million new items added per day; $2,000+ USD trading value per second.
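The daily and per-second figures above can be cross-checked with simple unit conversion. A minimal sketch, using only the slide's numbers and assuming an 86,400-second day and a 365-day year; since the slide gives "+" values, the results are lower bounds:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(per_day: float) -> float:
    """Convert a per-day rate into a per-second rate."""
    return per_day / SECONDS_PER_DAY


def per_day(rate_per_second: float) -> float:
    """Convert a per-second rate into a per-day rate."""
    return rate_per_second * SECONDS_PER_DAY


sql_calls_per_sec = per_second(48e9)   # 48 billion SQL calls per day
new_items_per_sec = per_second(10e6)   # 10+ million new items per day
trade_per_day = per_day(2_000)         # $2,000+ USD traded per second
trade_per_year = trade_per_day * 365   # assumes a 365-day year

print(f"SQL calls/sec:     {sql_calls_per_sec:,.0f}")
print(f"New items/sec:     {new_items_per_sec:,.0f}")
print(f"Trade value/day:   ${trade_per_day:,.0f}")
print(f"Trade value/year:  ${trade_per_year:,.0f}")
```

Read this way, 48 billion SQL calls per day is roughly 556,000 calls per second, and $2,000 per second is about $172.8 million per day.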
9. Analytical Data Platforms. EDW/ODW (primary & secondary): enterprise-class system, 500+ concurrent users; analyze & report. Use cases: operational analytics, transactional analytics, high-volume ad hoc queries, production data warehousing, large concurrent user base. Singularity: low-end enterprise-class system, 20-50 concurrent users; discover & explore. Use cases: "Compare user activity against last year", trending and forecast analysis (large history), contextual-complex analytics, deep, seasonal, consumable data sets. Hadoop: developer system, >5 concurrent users; structure the unstructured, detect patterns. Use cases: image fingerprinting, image classification, pattern recognition, detecting counterfeits and SNADs (items significantly not as described).
14. APD: Resource Distribution. Chennai, India: Cognizant Technology Solutions (onshore/offshore model). Shanghai, CN: DW Core Team; APD Ops anchor point for China-based outsourcers (HP, DX); core competencies in DW development, business system analysis, quality assurance, architecture, project management office, and production support. Seattle, WA: DW Core Team and anchor point for India-based outsourcing; core competencies in VLDB and highly efficient, scalable architecture (Next Gen). San Jose, CA: BU dedicated teams (IMS, DMS, MRM, UBI), DW Core, and Arch & Ops; core competencies in rapid development, VLDB, MPP, business analysis, and DW development.
25. How to Read a DFD. Step number (e.g., Step 2): steps are numbered in order of job start time. Job start/end time (HH:MM:SS). Job name: the script (job) that populates the table in that step. Intermediate table: the output table of step 1, which is also the input table of step 2. Rounded-corner rectangle: an upstream table from another subject area. Blue line: the process critical path. Gray background: highlights the target table of the diagram.
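The two rules on this slide (steps numbered by job start time, and a highlighted critical path leading to the target table) can be sketched in a few lines. This is a minimal illustration, not JobTrack's actual logic: the `Job` record, the job and table names, and the definition of the critical path (walking back from the target table and always following the upstream job that finished last, i.e. the one the step actually waited on) are all assumptions. Inputs with no producing job in the diagram correspond to the rounded-corner upstream tables from other subject areas.

```python
from dataclasses import dataclass, field
from datetime import time


@dataclass
class Job:
    name: str                 # script (job) name that populates the output table
    start: time               # job start time (HH:MM:SS)
    end: time                 # job end time (HH:MM:SS)
    inputs: list[str] = field(default_factory=list)
    output: str = ""


def number_steps(jobs: list[Job]) -> dict[str, int]:
    """Assign step numbers 1..N in order of job start time."""
    ordered = sorted(jobs, key=lambda j: j.start)
    return {job.name: step for step, job in enumerate(ordered, start=1)}


def critical_path(jobs: list[Job], target_table: str) -> list[str]:
    """Walk back from the job loading target_table, at each step following the
    upstream job that finished last. Assumes the job graph has no cycles."""
    producer = {j.output: j for j in jobs}
    path: list[str] = []
    job = producer.get(target_table)
    while job is not None:
        path.append(job.name)
        # Tables with no producer here come from other subject areas.
        upstream = [producer[t] for t in job.inputs if t in producer]
        job = max(upstream, key=lambda j: j.end, default=None)
    return list(reversed(path))


# Hypothetical jobs and tables, for illustration only.
jobs = [
    Job("load_src_a.sh", time(1, 0, 0), time(1, 30, 0), [], "stg_a"),
    Job("load_src_b.sh", time(1, 5, 0), time(2, 10, 0), [], "stg_b"),
    Job("build_fact.sh", time(2, 15, 0), time(3, 0, 0), ["stg_a", "stg_b"], "fact_sales"),
]

print(number_steps(jobs))
print(critical_path(jobs, "fact_sales"))
```

Here `build_fact.sh` waits on `load_src_b.sh`, which ends later than `load_src_a.sh`, so the critical path is `load_src_b.sh` then `build_fact.sh`.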
34. JobTrack Overview. Data sources: ETL job runtime info from all ETL servers; UC4; Teradata query logs from TD1/TD2/TD3/TD5. The JobTrack repository derives table usage, master data flow, table dependency, query patterns, and query user behavior. Applications: user query/batch job enhancement, MDR table usage info, ETL job status, …
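Table dependency, one of the outputs listed above, amounts to a directed graph built from job metadata. A minimal sketch, assuming the ETL runtime info and query logs have already been parsed into hypothetical `(job_name, source_tables, target_table)` records (the parsing itself is out of scope, and all names below are invented for illustration). `upstream` answers "what does this table depend on?" and `downstream` answers "what is affected if this table is late?":

```python
from collections import defaultdict, deque

# Hypothetical parsed records: (job_name, source_tables, target_table).
RUNS = [
    ("stg_user.sh", ["src.user"], "stg.user"),
    ("stg_item.sh", ["src.item"], "stg.item"),
    ("dw_listing.sh", ["stg.user", "stg.item"], "dw.listing"),
    ("rpt_daily.sh", ["dw.listing"], "rpt.daily_listing"),
]


def build_dependency(runs):
    """Map each target table to the set of tables it is loaded from."""
    deps = defaultdict(set)
    for _job, sources, target in runs:
        deps[target].update(sources)
    return deps


def upstream(deps, table):
    """All tables that `table` transitively depends on (breadth-first search)."""
    seen, queue = set(), deque([table])
    while queue:
        for src in deps.get(queue.popleft(), ()):
            if src not in seen:
                seen.add(src)
                queue.append(src)
    return seen


def downstream(deps, table):
    """All tables that transitively depend on `table` (impact analysis)."""
    reverse = defaultdict(set)
    for target, sources in deps.items():
        for src in sources:
            reverse[src].add(target)
    return upstream(reverse, table)


deps = build_dependency(RUNS)
print(sorted(upstream(deps, "rpt.daily_listing")))
print(sorted(downstream(deps, "stg.user")))
```

Using `dict.get` rather than indexing avoids silently adding empty entries to the `defaultdict` while traversing.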
35. JobTrack Features: automation for any table and any ETL job; real-time, history, and forecast; all information on one page; not only data flow: you can get all the data information you need.
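The slide does not say how JobTrack produces its forecast. As one plausible approach, offered only as an assumption, a job's expected end time could be estimated from its actual start time plus the median of its most recent historical runtimes. The function name `forecast_end` and the 7-run window are hypothetical:

```python
from datetime import datetime, timedelta
from statistics import median


def forecast_end(start: datetime, history: list[timedelta], window: int = 7) -> datetime:
    """Predict a job's end time as its start time plus the median of the
    most recent `window` historical runtimes (an assumed heuristic)."""
    if not history:
        raise ValueError("need at least one historical runtime")
    recent = history[-window:]
    return start + median(recent)


# Hypothetical runtime history in minutes; the 90-minute run is an outlier.
history = [timedelta(minutes=m) for m in (40, 42, 45, 41, 90, 43, 44)]
start = datetime(2011, 12, 1, 2, 0, 0)
print(forecast_end(start, history))
```

A median is used instead of a mean so that a single slow run (here, 90 minutes) does not drag the forecast late; the median of these seven runs is 43 minutes, giving a forecast end of 02:43.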