SlideShare une entreprise Scribd logo
1  sur  38
eBay EDW  Metadata Management and Applications Dec 2011 熊家治  eBay 数据分析平台架构师 [email_address]
Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
The Birth of eBay . . .  . . . sold for $14.83 USD Started with a  Broken  Laser Pointer . . . AuctionWeb was born on the Labor Day weekend in September 1995 Pierre Omidyar $30 eBay Founder
The Birth of eBay . . .  FREE Service Running Off from a Home Server . . . $240 USD/month Pierre Omidyar
The Birth of eBay . . .  Requesting for donations . . . Coins Personal Check Bills Money Order Coupons Movie Tickets
The Birth of eBay . . .  Start Profitable . . .
The Birth of eBay . . .  Initial Business Model and Target Users . . . Build  equitable electronic marketplace for Americans to buy and sell their stuff
eBay Facts 450+ Million Registered Users Over 2 Billion Photos 220+ Million Active Item Listing for sale 50,000 Categories 2 Petabytes  Stored 25 Petabytes Processed daily 300+ Features per quarter 100,000 lines  of code rolled out every 2 weeks 48 Billion SQL Calls Per day 5.5 Billion API Calls Per month > 4.4 GB Source Code - 16 Years After . . . Global Presents In 33 International Markets 10+ Million New Items Added Per Day $2,000+ USD Trading Value Per Second
Analytical Data Platforms Singularity EDW Low End Enterprise-class System Discover & Explore Analyze & Report 20-50  concurrent users 500+  concurrent users Enterprise-class System >5  concurrent users Structure the Unstructured Detect Patterns Hadoop Developer System EDW/ODW (Primary& Secondary) “ Compare User Activity against last year” Trending and Forecast Analysis (large history) Operational Analytics Transactional Analytics High volume ad hoc queries Contextual-Complex Analytics Deep, Seasonal, Consumable Data Sets Production Data Warehousing Large Concurrent User-base Image Fingerprinting Image Classification Pattern Recognition Detect Counterfeits & SNADs
eBay EDW ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
EDW Architecture
Closed loop, active Data Warehouse Site Databases Analytical Reporting Enterprise DW Raw data:  daily, hourly feeds Knowledge:  Integrated, aggregated, augmented www.ebay.com Trust & Safety Customer Support ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Wisdom:  informed, fact based actions Marketing
APD– Resource Distribution Chennai, India Cognizant Technology Services (on shore / off shore model) Shanghai, CN DW Core Team, APD Ops  anchor point for China based outsourcers (HP, DX).  Core competencies DW Development, Business System Analysis, Quality Assurance, Architecture, Project Management Office and Production Support. Seattle, WA DW Core Team & anchor point for India based outsourcing.  Core competencies in VLDB and highly efficient / scaleable arch (Next Gen). San Jose, CA BU Dedicated Teams (IMS, DMS, MRM, UBI), DW Core, and Arch & Ops.  Core competencies in rapid development, VLDB, MPP, business analysis, DW Dev.
Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
EDW Metadata ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],We designed a tool to collect up-to-date ETL metadata automatically.
Subject Areas and Tables
Domains
Data Model Management
ETL Metadata Collecting Automation Meta Data Analysis Engine Meta Data Repository DBQL ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],TUM ,[object Object],[object Object],[object Object]
Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object],Killer App 1: Data Flow Diagram(DFD)
BEFORE/AFTER ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
DFD Tool Architecture DBQL Table Usage Info ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],DFD Repository  is the collection of DFDs, we organize and display online Data Flow Analysis Engine DFD Meta Data DFD Repository
How to Read DFD? Step2: the step number is ordered by the job start time  Job Start/End Time(HH:MM:SS) The script(job) name to populate the table in the step The output table of step1, also, it is the input table of step2 Round Corner Rectangle: The upstream tables from other subject area Blue line:  Stands for the process  critical path  Set Background as gray to highlight the target table of the diagram
DFD’s   Homepage
Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],[object Object],[object Object],Killer App 2: Data Rationalization ,[object Object]
Table Usage Metrics Platform ,[object Object]
Table Usage Metrics ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],2011-07-20/2011-07-26 Primary Secondary TableName  Table Size(GB)  Table Skew Factor Daily CPU Cost Batch Hits End User Hits Loading Strategy  Finish Time  Table Size(GB)  Table Skew Factor Daily CPU Cost Batch Hits End User Hits Loading Strategy  Finish Time  Table 1 3,990 5 308,778 0 15 Daily Batch 15 3,969 2 323,822 0 26 Daily Batch 14 Table2  3,125 4 76,274 24,837 28,338 Daily Batch 2 3,071 2 124,307 13,710 21,126 Daily Batch 3
Data Rationalization Process
Typical Approaches ,[object Object],[object Object],[object Object],[object Object]
Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
ETL JOB RUNTIME INFO from all ETL SERVER UC4 TABLE USAGE MASTER DATA FLOW JOBTRACK REPOSITORY TERADATA QUERYLOG from TD1/TD2/TD3/TD5  TABLE DEPENDECY QUERY PATTERN QUERY USER BEHAVIOR  USER QUERY/BATCTH JOB ENHANCEMENT MDR TABLE USAGE INRO ETL JOB STATUS JOB TRACK REPOSITORY DATA SOURCE Applications … JOBTRACK   OVERVIEW
JOBTRACK FEATURES AUTOMATION for  Any  Table +  Any  ETL JOB REALTIME + HISTORY + FORECAST ALL  INFORMATION IN  ONE   PAGE NOT ONLY Dataflow, you can get all data about Data  info you need
Agenda ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Other Applications
Q &A

Contenu connexe

Tendances

Dataware house introduction by InformaticaTrainingClasses
Dataware house introduction by InformaticaTrainingClassesDataware house introduction by InformaticaTrainingClasses
Dataware house introduction by InformaticaTrainingClassesInformaticaTrainingClasses
 
Dimensional Modeling Basic Concept with Example
Dimensional Modeling Basic Concept with ExampleDimensional Modeling Basic Concept with Example
Dimensional Modeling Basic Concept with ExampleSajjad Zaheer
 
02. Data Warehouse and OLAP
02. Data Warehouse and OLAP02. Data Warehouse and OLAP
02. Data Warehouse and OLAPAchmad Solichin
 
OLAP & Data Warehouse
OLAP & Data WarehouseOLAP & Data Warehouse
OLAP & Data WarehouseZalpa Rathod
 
An overview of data warehousing and OLAP technology
An overview of  data warehousing and OLAP technology An overview of  data warehousing and OLAP technology
An overview of data warehousing and OLAP technology Nikhatfatima16
 
Analysing data analytics use cases to understand big data platform
Analysing data analytics use cases  to understand big data platformAnalysing data analytics use cases  to understand big data platform
Analysing data analytics use cases to understand big data platformdataeaze systems
 
Introducing to Datamining vs. OLAP - مقدمه و مقایسه ای بر داده کاوی و تحلیل ...
Introducing to Datamining vs. OLAP -  مقدمه و مقایسه ای بر داده کاوی و تحلیل ...Introducing to Datamining vs. OLAP -  مقدمه و مقایسه ای بر داده کاوی و تحلیل ...
Introducing to Datamining vs. OLAP - مقدمه و مقایسه ای بر داده کاوی و تحلیل ...y-asgari
 
Business Intelligence: Data Warehouses
Business Intelligence: Data WarehousesBusiness Intelligence: Data Warehouses
Business Intelligence: Data WarehousesMichael Lamont
 
SAP BW vs Teradat; A White Paper
SAP BW vs Teradat; A White PaperSAP BW vs Teradat; A White Paper
SAP BW vs Teradat; A White PaperVipul Neema
 
Module 02 teradata basics
Module 02 teradata basicsModule 02 teradata basics
Module 02 teradata basicsMd. Noor Alam
 
USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem
USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem
USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem ChemAxon
 

Tendances (20)

Dataware house introduction by InformaticaTrainingClasses
Dataware house introduction by InformaticaTrainingClassesDataware house introduction by InformaticaTrainingClasses
Dataware house introduction by InformaticaTrainingClasses
 
User 2013-oracle-big-data-analytics-1971985
User 2013-oracle-big-data-analytics-1971985User 2013-oracle-big-data-analytics-1971985
User 2013-oracle-big-data-analytics-1971985
 
Datawarehouse
DatawarehouseDatawarehouse
Datawarehouse
 
Dimensional Modeling Basic Concept with Example
Dimensional Modeling Basic Concept with ExampleDimensional Modeling Basic Concept with Example
Dimensional Modeling Basic Concept with Example
 
02. Data Warehouse and OLAP
02. Data Warehouse and OLAP02. Data Warehouse and OLAP
02. Data Warehouse and OLAP
 
OLAP & Data Warehouse
OLAP & Data WarehouseOLAP & Data Warehouse
OLAP & Data Warehouse
 
Star schema
Star schemaStar schema
Star schema
 
An overview of data warehousing and OLAP technology
An overview of  data warehousing and OLAP technology An overview of  data warehousing and OLAP technology
An overview of data warehousing and OLAP technology
 
Analysing data analytics use cases to understand big data platform
Analysing data analytics use cases  to understand big data platformAnalysing data analytics use cases  to understand big data platform
Analysing data analytics use cases to understand big data platform
 
Star schema PPT
Star schema PPTStar schema PPT
Star schema PPT
 
Introducing to Datamining vs. OLAP - مقدمه و مقایسه ای بر داده کاوی و تحلیل ...
Introducing to Datamining vs. OLAP -  مقدمه و مقایسه ای بر داده کاوی و تحلیل ...Introducing to Datamining vs. OLAP -  مقدمه و مقایسه ای بر داده کاوی و تحلیل ...
Introducing to Datamining vs. OLAP - مقدمه و مقایسه ای بر داده کاوی و تحلیل ...
 
Business Intelligence: Data Warehouses
Business Intelligence: Data WarehousesBusiness Intelligence: Data Warehouses
Business Intelligence: Data Warehouses
 
SAP BW vs Teradat; A White Paper
SAP BW vs Teradat; A White PaperSAP BW vs Teradat; A White Paper
SAP BW vs Teradat; A White Paper
 
ETL QA
ETL QAETL QA
ETL QA
 
Module 02 teradata basics
Module 02 teradata basicsModule 02 teradata basics
Module 02 teradata basics
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Datawarehouse and OLAP
Datawarehouse and OLAPDatawarehouse and OLAP
Datawarehouse and OLAP
 
Oracle: DW Design
Oracle: DW DesignOracle: DW Design
Oracle: DW Design
 
OLAP technology
OLAP technologyOLAP technology
OLAP technology
 
USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem
USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem
USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem
 

En vedette

Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...Erik Fransen
 
Business Intelligence Release Management Best Practices
Business Intelligence Release Management Best PracticesBusiness Intelligence Release Management Best Practices
Business Intelligence Release Management Best PracticesJohn Heaton
 
Business intelligence 3.0 and the data lake
Business intelligence 3.0 and the data lakeBusiness intelligence 3.0 and the data lake
Business intelligence 3.0 and the data lakeData Science Thailand
 
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsBest Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsCloudera, Inc.
 
85 business analyst interview questions and answers
85 business analyst interview questions and answers85 business analyst interview questions and answers
85 business analyst interview questions and answersBusinessAnalyst247
 
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 MillionHow One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 MillionDataWorks Summit
 

En vedette (6)

Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
 
Business Intelligence Release Management Best Practices
Business Intelligence Release Management Best PracticesBusiness Intelligence Release Management Best Practices
Business Intelligence Release Management Best Practices
 
Business intelligence 3.0 and the data lake
Business intelligence 3.0 and the data lakeBusiness intelligence 3.0 and the data lake
Business intelligence 3.0 and the data lake
 
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsBest Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
 
85 business analyst interview questions and answers
85 business analyst interview questions and answers85 business analyst interview questions and answers
85 business analyst interview questions and answers
 
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 MillionHow One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
 

Similaire à eBay EDW元数据管理及应用

Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in MotionRuhani Arora
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Serban Tanasa
 
SQL Server 2008 R2 StreamInsight
SQL Server 2008 R2 StreamInsightSQL Server 2008 R2 StreamInsight
SQL Server 2008 R2 StreamInsightEduardo Castro
 
“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools
“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools
“Lights Out”Configuration using Tivoli Netcool AutoDiscovery ToolsAntonio Rolle
 
Data Warehouse Modeling
Data Warehouse ModelingData Warehouse Modeling
Data Warehouse Modelingvivekjv
 
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
Hw09   Hadoop Based Data Mining Platform For The Telecom IndustryHw09   Hadoop Based Data Mining Platform For The Telecom Industry
Hw09 Hadoop Based Data Mining Platform For The Telecom IndustryCloudera, Inc.
 
Applying linear regression and predictive analytics
Applying linear regression and predictive analyticsApplying linear regression and predictive analytics
Applying linear regression and predictive analyticsMariaDB plc
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMark Kromer
 
Database Shootout: What's best for BI?
Database Shootout: What's best for BI?Database Shootout: What's best for BI?
Database Shootout: What's best for BI?Jos van Dongen
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkDatabricks
 
3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.pptBsMath3rdsem
 
Accelerating analytics on the Sensor and IoT Data.
Accelerating analytics on the Sensor and IoT Data. Accelerating analytics on the Sensor and IoT Data.
Accelerating analytics on the Sensor and IoT Data. Keshav Murthy
 
Big data analytics beyond beer and diapers
Big data analytics   beyond beer and diapersBig data analytics   beyond beer and diapers
Big data analytics beyond beer and diapersKai Zhao
 
MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra QUONTRASOLUTIONS
 
Analysing data analytics use cases to understand big data platform
Analysing data analytics use cases  to understand big data platformAnalysing data analytics use cases  to understand big data platform
Analysing data analytics use cases to understand big data platformdataeaze systems
 
[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...
[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...
[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...Vinu Charanya
 
Datawarehousing & DSS
Datawarehousing & DSSDatawarehousing & DSS
Datawarehousing & DSSDeepali Raut
 
Overview of business intelligence
Overview of business intelligenceOverview of business intelligence
Overview of business intelligenceAhsan Kabir
 

Similaire à eBay EDW元数据管理及应用 (20)

Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in Motion
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
 
SQL Server 2008 R2 StreamInsight
SQL Server 2008 R2 StreamInsightSQL Server 2008 R2 StreamInsight
SQL Server 2008 R2 StreamInsight
 
“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools
“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools
“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools
 
Data Warehouse Modeling
Data Warehouse ModelingData Warehouse Modeling
Data Warehouse Modeling
 
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
Hw09   Hadoop Based Data Mining Platform For The Telecom IndustryHw09   Hadoop Based Data Mining Platform For The Telecom Industry
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
 
Applying linear regression and predictive analytics
Applying linear regression and predictive analyticsApplying linear regression and predictive analytics
Applying linear regression and predictive analytics
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
 
Database Shootout: What's best for BI?
Database Shootout: What's best for BI?Database Shootout: What's best for BI?
Database Shootout: What's best for BI?
 
3dw
3dw3dw
3dw
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache Spark
 
3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt
 
Accelerating analytics on the Sensor and IoT Data.
Accelerating analytics on the Sensor and IoT Data. Accelerating analytics on the Sensor and IoT Data.
Accelerating analytics on the Sensor and IoT Data.
 
Big data analytics beyond beer and diapers
Big data analytics   beyond beer and diapersBig data analytics   beyond beer and diapers
Big data analytics beyond beer and diapers
 
3dw
3dw3dw
3dw
 
MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra
 
Analysing data analytics use cases to understand big data platform
Analysing data analytics use cases  to understand big data platformAnalysing data analytics use cases  to understand big data platform
Analysing data analytics use cases to understand big data platform
 
[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...
[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...
[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...
 
Datawarehousing & DSS
Datawarehousing & DSSDatawarehousing & DSS
Datawarehousing & DSS
 
Overview of business intelligence
Overview of business intelligenceOverview of business intelligence
Overview of business intelligence
 

Plus de mysqlops

The simplethebeautiful
The simplethebeautifulThe simplethebeautiful
The simplethebeautifulmysqlops
 
Oracle数据库分析函数详解
Oracle数据库分析函数详解Oracle数据库分析函数详解
Oracle数据库分析函数详解mysqlops
 
Percona Live 2012PPT:mysql-security-privileges-and-user-management
Percona Live 2012PPT:mysql-security-privileges-and-user-managementPercona Live 2012PPT:mysql-security-privileges-and-user-management
Percona Live 2012PPT:mysql-security-privileges-and-user-managementmysqlops
 
Percona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replicationPercona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replicationmysqlops
 
Percona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB ClusterPercona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB Clustermysqlops
 
Percona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimizationPercona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimizationmysqlops
 
Pldc2012 innodb architecture and internals
Pldc2012 innodb architecture and internalsPldc2012 innodb architecture and internals
Pldc2012 innodb architecture and internalsmysqlops
 
DBA新人的述职报告
DBA新人的述职报告DBA新人的述职报告
DBA新人的述职报告mysqlops
 
分布式爬虫
分布式爬虫分布式爬虫
分布式爬虫mysqlops
 
MySQL应用优化实践
MySQL应用优化实践MySQL应用优化实践
MySQL应用优化实践mysqlops
 
基于协程的网络开发框架的设计与实现
基于协程的网络开发框架的设计与实现基于协程的网络开发框架的设计与实现
基于协程的网络开发框架的设计与实现mysqlops
 
eBay基于Hadoop平台的用户邮件数据分析
eBay基于Hadoop平台的用户邮件数据分析eBay基于Hadoop平台的用户邮件数据分析
eBay基于Hadoop平台的用户邮件数据分析mysqlops
 
对MySQL DBA的一些思考
对MySQL DBA的一些思考对MySQL DBA的一些思考
对MySQL DBA的一些思考mysqlops
 
QQ聊天系统后台架构的演化与启示
QQ聊天系统后台架构的演化与启示QQ聊天系统后台架构的演化与启示
QQ聊天系统后台架构的演化与启示mysqlops
 
腾讯即时聊天IM1.4亿在线背后的故事
腾讯即时聊天IM1.4亿在线背后的故事腾讯即时聊天IM1.4亿在线背后的故事
腾讯即时聊天IM1.4亿在线背后的故事mysqlops
 
分布式存储与TDDL
分布式存储与TDDL分布式存储与TDDL
分布式存储与TDDLmysqlops
 
MySQL数据库生产环境维护
MySQL数据库生产环境维护MySQL数据库生产环境维护
MySQL数据库生产环境维护mysqlops
 
MySQL数据库开发的三十六条军规
MySQL数据库开发的三十六条军规MySQL数据库开发的三十六条军规
MySQL数据库开发的三十六条军规mysqlops
 

Plus de mysqlops (20)

The simplethebeautiful
The simplethebeautifulThe simplethebeautiful
The simplethebeautiful
 
Oracle数据库分析函数详解
Oracle数据库分析函数详解Oracle数据库分析函数详解
Oracle数据库分析函数详解
 
Percona Live 2012PPT:mysql-security-privileges-and-user-management
Percona Live 2012PPT:mysql-security-privileges-and-user-managementPercona Live 2012PPT:mysql-security-privileges-and-user-management
Percona Live 2012PPT:mysql-security-privileges-and-user-management
 
Percona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replicationPercona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replication
 
Percona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB ClusterPercona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB Cluster
 
Percona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimizationPercona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimization
 
Pldc2012 innodb architecture and internals
Pldc2012 innodb architecture and internalsPldc2012 innodb architecture and internals
Pldc2012 innodb architecture and internals
 
DBA新人的述职报告
DBA新人的述职报告DBA新人的述职报告
DBA新人的述职报告
 
分布式爬虫
分布式爬虫分布式爬虫
分布式爬虫
 
MySQL应用优化实践
MySQL应用优化实践MySQL应用优化实践
MySQL应用优化实践
 
基于协程的网络开发框架的设计与实现
基于协程的网络开发框架的设计与实现基于协程的网络开发框架的设计与实现
基于协程的网络开发框架的设计与实现
 
eBay基于Hadoop平台的用户邮件数据分析
eBay基于Hadoop平台的用户邮件数据分析eBay基于Hadoop平台的用户邮件数据分析
eBay基于Hadoop平台的用户邮件数据分析
 
对MySQL DBA的一些思考
对MySQL DBA的一些思考对MySQL DBA的一些思考
对MySQL DBA的一些思考
 
QQ聊天系统后台架构的演化与启示
QQ聊天系统后台架构的演化与启示QQ聊天系统后台架构的演化与启示
QQ聊天系统后台架构的演化与启示
 
腾讯即时聊天IM1.4亿在线背后的故事
腾讯即时聊天IM1.4亿在线背后的故事腾讯即时聊天IM1.4亿在线背后的故事
腾讯即时聊天IM1.4亿在线背后的故事
 
分布式存储与TDDL
分布式存储与TDDL分布式存储与TDDL
分布式存储与TDDL
 
MySQL数据库生产环境维护
MySQL数据库生产环境维护MySQL数据库生产环境维护
MySQL数据库生产环境维护
 
Memcached
MemcachedMemcached
Memcached
 
DevOPS
DevOPSDevOPS
DevOPS
 
MySQL数据库开发的三十六条军规
MySQL数据库开发的三十六条军规MySQL数据库开发的三十六条军规
MySQL数据库开发的三十六条军规
 

Dernier

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 

Dernier (20)

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 

eBay EDW元数据管理及应用

  • 1. eBay EDW Metadata Management and Applications Dec 2011 熊家治 eBay 数据分析平台架构师 [email_address]
  • 2.
  • 3. The Birth of eBay . . . . . . sold for $14.83 USD Started with a Broken Laser Pointer . . . AuctionWeb was born on the Labor Day weekend in September 1995 Pierre Omidyar $30 eBay Founder
  • 4. The Birth of eBay . . . FREE Service Running Off from a Home Server . . . $240 USD/month Pierre Omidyar
  • 5. The Birth of eBay . . . Requesting for donations . . . Coins Personal Check Bills Money Order Coupons Movie Tickets
  • 6. The Birth of eBay . . . Start Profitable . . .
  • 7. The Birth of eBay . . . Initial Business Model and Target Users . . . Build equitable electronic marketplace for Americans to buy and sell their stuff
  • 8. eBay Facts 450+ Million Registered Users Over 2 Billion Photos 220+ Million Active Item Listing for sale 50,000 Categories 2 Petabytes Stored 25 Petabytes Processed daily 300+ Features per quarter 100,000 lines of code rolled out every 2 weeks 48 Billion SQL Calls Per day 5.5 Billion API Calls Per month > 4.4 GB Source Code - 16 Years After . . . Global Presents In 33 International Markets 10+ Million New Items Added Per Day $2,000+ USD Trading Value Per Second
  • 9. Analytical Data Platforms Singularity EDW Low End Enterprise-class System Discover & Explore Analyze & Report 20-50 concurrent users 500+ concurrent users Enterprise-class System >5 concurrent users Structure the Unstructured Detect Patterns Hadoop Developer System EDW/ODW (Primary& Secondary) “ Compare User Activity against last year” Trending and Forecast Analysis (large history) Operational Analytics Transactional Analytics High volume ad hoc queries Contextual-Complex Analytics Deep, Seasonal, Consumable Data Sets Production Data Warehousing Large Concurrent User-base Image Fingerprinting Image Classification Pattern Recognition Detect Counterfeits & SNADs
  • 10.
  • 11.
  • 13.
  • 14. APD– Resource Distribution Chennai, India Cognizant Technology Services (on shore / off shore model) Shanghai, CN DW Core Team, APD Ops anchor point for China based outsourcers (HP, DX). Core competencies DW Development, Business System Analysis, Quality Assurance, Architecture, Project Management Office and Production Support. Seattle, WA DW Core Team & anchor point for India based outsourcing. Core competencies in VLDB and highly efficient / scaleable arch (Next Gen). San Jose, CA BU Dedicated Teams (IMS, DMS, MRM, UBI), DW Core, and Arch & Ops. Core competencies in rapid development, VLDB, MPP, business analysis, DW Dev.
  • 15.
  • 16.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25. How to Read DFD? Step2: the step number is ordered by the job start time Job Start/End Time(HH:MM:SS) The script(job) name to populate the table in the step The output table of step1, also, it is the input table of step2 Round Corner Rectangle: The upstream tables from other subject area Blue line: Stands for the process critical path Set Background as gray to highlight the target table of the diagram
  • 26. DFD’s Homepage
  • 27.
  • 28.
  • 29.
  • 30.
  • 32.
  • 33.
  • 34. ETL JOB RUNTIME INFO from all ETL SERVER UC4 TABLE USAGE MASTER DATA FLOW JOBTRACK REPOSITORY TERADATA QUERYLOG from TD1/TD2/TD3/TD5 TABLE DEPENDECY QUERY PATTERN QUERY USER BEHAVIOR USER QUERY/BATCTH JOB ENHANCEMENT MDR TABLE USAGE INRO ETL JOB STATUS JOB TRACK REPOSITORY DATA SOURCE Applications … JOBTRACK OVERVIEW
  • 35. JOBTRACK FEATURES AUTOMATION for Any Table + Any ETL JOB REALTIME + HISTORY + FORECAST ALL INFORMATION IN ONE PAGE NOT ONLY Dataflow, you can get all data about Data info you need
  • 36.
  • 37.