THINKING BIG TOGETHER
Demonstrating the Future of Data Science
                                                         Mike Maxey
                                 Office of Strategy — Greenplum, A Division of EMC



© Copyright 2012 EMC Corporation. All rights reserved.                               1
The New Normal
    [Diagram. Groups: DATA DEVICES, Data Aggregators, Data Users/Buyers.
     Sectors: MEDICAL, INTERNET, GOVERNMENT, RETAIL, PHONE/TV, FINANCIAL.
     Entities: Individuals, Law Enforcement, Employers, Analytic Services,
     Advertising, Information Brokers, Marketers, Websites, Catalog Co-ops,
     Media, Media Archives, Credit Bureaus, List Brokers, Private
     Investigators/Lawyers, Delivery Services, Government, Banks.]

Through 2015, organizations integrating high-value, diverse, new information
types and sources into a coherent information management infrastructure will
outperform their industry peers financially by more than 20%.



  Source: Gartner; Hype Cycle for Big Data, 2012; July 31, 2012



WHAT DOES IT TAKE?

1. New Applications




2. Data Science




data•science: art of mathematically sophisticated data engineers
delivering insights from data into business decisions and systems




10 Years Of Patient History


       Saving Lives and Money With Data Science


3. The Right Platform




Big Data Requires a Unified Platform

             3       People     COLLABORATION & PRODUCTIVITY
             2       Tools      RICH SQL & APPLICATION SUPPORT
             1       Data       STRUCTURED, UNSTRUCTURED


Big Data Requires a Unified Platform




             1       Data       STRUCTURED, UNSTRUCTURED


MPP Databases



         10-100x BETTER PERFORMANCE @ 1/10th THE EDW COST




"What used to take 24 hours on Oracle, I can do in less than 10 minutes on Greenplum."



Out-Of-The-Box Functionality




    Enterprise Data Warehouse | MPP Database | Hadoop



hadoop: programmatic batch processing at scale.




"We offloaded transformations to Hadoop and saved money on day one."

                                            —Top Telecommunications Company




IT TAKES MORE THAN ONE TOOL

Greenplum UAP Unifies MPP and Hadoop
    Access & Query: SQL, ODBC/JDBC, Java/Perl/Python, CLI, PigLatin, HQL, OTHER
    PARALLEL QUERY INTEGRATION
    SQL | PARALLEL IMPORT/EXPORT | HDFS
    GREENPLUM DATABASE | GREENPLUM HD
    Greenplum UAP

Big Data Requires a Unified Platform



             2       Tools      RICH SQL & APPLICATION SUPPORT
             1       Data       STRUCTURED, UNSTRUCTURED


Business Intelligence and Reporting

    Answering and enabling new questions
    Extending the reach of data and insights




Predictive Analytics

    End-to-end analytics in a single view
    Multiple levels of access, powerful and jargon-free




Powerful Partner Ecosystem
    Partner categories: ANALYTICS | BUSINESS INTELLIGENCE | DATA INTEGRATION | INDUSTRY | TECHNOLOGY
    Partners shown: Discovix




Big Data Requires a Unified Platform

             3       People     COLLABORATION & PRODUCTIVITY
             2       Tools      RICH SQL & APPLICATION SUPPORT
             1       Data       STRUCTURED, UNSTRUCTURED


High Cost of Knowledge Sharing

    Process breaks when organization structure changes
    Very difficult knowledge transfer
    No "insurance policy" for intellectual assets


Big Data Productivity


    Real-time collaboration for the entire team
    Shared data, shared models, shared insights




DEMONSTRATION


GREENPLUM CHORUS

    A Social Platform For Collaborative Data Science


Chorus Enables Collaborative Data Science
    Quickly deliver value from your data
    Share domain knowledge, content, and findings
    Keep teams productive as organizations change


OPEN SOURCE NOW AVAILABLE

Availability of the OpenChorus Project

    www.openchorus.org
    Chorus open source available on October 23rd, 2012
    Apache 2.0 license
    Promotes an ecosystem of data sources, applications, and data science community



The largest provider of social media data for enterprise use.




GNIP Twitter Access
    Access to historical Twitter feeds as Chorus data source through GNIP APIs
    Import Twitter into Chorus as sandbox data




Tableau 8: Think with your Data
   Visual Analytics | Business Integration | Fast | Any Data | Web & Mobile Authoring

Tableau Server Integration
    Provision Tableau Workbooks from Chorus data sources
    Link and co-author Tableau hosted work files
    Tag and annotate Tableau assets from within Chorus

Kaggle Top 27




Kaggle Data Scientist Resources
    Solicit data scientist resources from the Chorus interface
        – Access Kaggle data scientist profiles
        – Package Chorus workspace assets in project proposals
        – Solicit collaboration opportunities



THINKING BIG TOGETHER
greenplum.com/communities
#greenplum




TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 

Demonstrating the Future of Data Science

  • 1. THINKING BIG TOGETHER Demonstrating the Future of Data Science Mike Maxey Office of Strategy — Greenplum, A Division of EMC © Copyright 2012 EMC Corporation. All rights reserved. 1
  • 2. The New Normal [Diagram. Sectors: Medical, Internet, Government, Retail, Phone/TV, Financial. Labels: Data Devices, Individuals, Data Aggregators, Data Users/Buyers, Law Enforcement, Employers, Analytic Services, Advertising, Information Brokers, Marketers, Websites, Catalog Co-ops, Media, Credit Bureaus, Media Archives, List Brokers, Private Investigators/Lawyers, Delivery Services, Government, Banks]
  • 3. Through 2015, organizations integrating high-value, diverse, new information types and sources into a coherent information management infrastructure will outperform their industry peers financially by more than 20%. Source: Gartner; Hype Cycle for Big Data, 2012; July 31, 2012
  • 4. WHAT DOES IT TAKE?
  • 5. 1. New Applications
  • 6.
  • 7. 2. Data Science
  • 8. data•science art of mathematically sophisticated data engineers delivering insights from data into business decisions and systems
  • 9. 10 Years Of Patient History Saving Lives and Money With Data Science
  • 10. 3. The Right Platform
  • 11. Big Data Requires a Unified Platform. 3 People: Collaboration & Productivity; 2 Tools: Rich SQL & Application Support; 1 Data: Structured, Unstructured
  • 12. Big Data Requires a Unified Platform. 1 Data: Structured, Unstructured
  • 13. MPP Databases: 10-100x better performance @ 1/10th the EDW cost
  • 14. “What used to take 24 hours on Oracle, I can do in less than 10 minutes on Greenplum.”
  • 15. Out-Of-The-Box Functionality: Enterprise Data Warehouse, MPP Database, Hadoop
  • 16. hadoop: programmatic batch processing at scale.
  • 17. “We offloaded transformations to Hadoop and saved money on day one.” (Top Telecommunications Company)
  • 18. IT TAKES MORE THAN ONE TOOL
  • 19. Greenplum UAP Unifies MPP and Hadoop. Access & Query: SQL, ODBC/JDBC, Java/Perl/Python, CLI, PigLatin, HQL, Other. Integration: Parallel Query, SQL, Parallel HDFS Import/Export. Engines: Greenplum Database, Greenplum HD
  • 20. Big Data Requires a Unified Platform. 2 Tools: Rich SQL & Application Support; 1 Data: Structured, Unstructured
  • 21. Business Intelligence and Reporting: answering and enabling new questions; extending the reach of data and insights
  • 22. Predictive Analytics: end-to-end analytics in a single view; multiple levels of access, powerful and jargon-free
  • 23. Powerful Partner Ecosystem: Business Intelligence, Data Integration, Analytics, Industry, Technology [partner logos, including Discovix]
  • 24. Big Data Requires a Unified Platform. 3 People: Collaboration & Productivity; 2 Tools: Rich SQL & Application Support; 1 Data: Structured, Unstructured
  • 25. High Cost of Knowledge Sharing: process breaks when organization structure changes; very difficult knowledge transfer; no “insurance policy” for intellectual assets
  • 26. Big Data Productivity: real-time collaboration for the entire team; shared data, shared models, shared insights
  • 27. DEMONSTRATION
  • 28. GREENPLUM CHORUS: A Social Platform For Collaborative Data Science
  • 29. Chorus Enables Collaborative Data Science: quickly deliver value from your data; share domain knowledge, content, and findings; keep teams productive as organizations change
  • 30. OPEN SOURCE NOW AVAILABLE
  • 31. Availability of the OpenChorus Project (www.openchorus.org): Chorus open source available on October 23rd, 2012; Apache 2.0 license; promotes an ecosystem of data sources, applications, and data science community
  • 32. The largest provider of social media data for enterprise use.
  • 33.
  • 34. GNIP Twitter Access: access to historical Twitter feeds as a Chorus data source through GNIP APIs; import Twitter into Chorus as sandbox data
  • 35.
  • 36. Tableau 8: Think with your Data. Visual Analytics Business Integration Fast Any Data Web & Mobile Authoring
  • 37. Tableau Server Integration: provision Tableau workbooks from Chorus data sources; link and co-author Tableau hosted work files; tag and annotate Tableau assets from within Chorus
  • 38.
  • 39. Kaggle Top 27
  • 40. Kaggle Data Scientist Resources: solicit data scientist resources from the Chorus interface; access Kaggle data scientist profiles; package Chorus workspace assets in project proposals; solicit collaboration opportunities
  • 41. THINKING BIG TOGETHER greenplum.com/communities #greenplum

Editor's notes

  1. SCRIPT: “For many, the ability to move data between Hadoop and a SQL analytical database is the ultimate. Not at Greenplum. We’ve gone well beyond “connectors” to allow our SQL database to access data wherever it lives.” gNet permits not only bulk movement but also direct query access, as we’ll see later. gNet not only connects the engines; it extends the massively parallel engines with massively parallel communications between them and builds the software layers needed for rapid movement and direct query access across that high-performance integration. When deployed in Greenplum’s unique Modular DCA, both bulk data movement and direct data access perform better still, because DCAs include carefully designed switching infrastructure that keeps switching latency to a minimum as nodes in Greenplum Database communicate directly with nodes in Greenplum HD.
  2. Our expansive partner network ensures you protect your existing investments while having the opportunity to leverage the best available technology. Greenplum has deep partnerships with industry-leading organizations such as the SAS Institute, Informatica, and Alpine Data Labs. Finally, we are fortunate to work with a number of leading application providers, like Silver Spring Networks, who leverage Greenplum as a powerful backend technology. Greenplum is proud to work with this extraordinary partner ecosystem.
  3. What we are announcing
  4. So, we’ve solved for the platform, but remember you also need the Data Scientists. (BUILD: Add Chorus) We are also announcing that we’ve joined forces with Kaggle to solve for the supply of Data Scientists, by integrating Kaggle’s data science community with Chorus and creating a whole new data science marketplace. (BUILD: Add + Kaggle) Kaggle, as many of you know, is the leading platform for predictive modeling competitions; has over 57K participants from over 100 countries and 200 universities; and offers companies a cost-effective way to harness the “cognitive surplus” of the world’s best data scientists. I’d like to now invite Anthony, CEO of Kaggle, to give his perspective on this exciting new integration.
  5. And this is what we mean by thinking big together, with Chorus, the collaboration platform for Data Science, now open-sourced, and our partnership with Kaggle to deliver a new data science marketplace. This is big. This is solving the biggest problems facing Big Data and Data Science. This will enable organizations to reach their inner predictive enterprise. How do you learn more about this?