©2012 Sixth Sense Advisors, Inc. All Rights Reserved   1




INTEGRATING BIG DATA
Dataversity Webinar
Feb 7 2012




State of Data Today




A Growing Trend
•  Expectations for BI are changing without anyone telling us

  Requirement     Expectations                     Reality
  Speed           Speed of the Internet            Speed = Infra + Arch + Design
  Accessibility   Accessibility of a Smartphone    BI Tool licenses & security
  Usability       iPad - Mobility                  Web Enabled BI Tool
  Availability    Google Search                    Data & Report Metadata
  Delivery        Speed of questions               Methodology & Signoff
  Data            Access to everything             Structured Data
  Scalability     Cloud (Amazon)                   Existing Infrastructure
  Cost            Cell phone or free Wi-Fi         Millions



The Wisdom of Crowds


Data Deluge = Business Insights



BIG Data
[Figure: enterprise data sources arranged from Structured to Unstructured and from Current to New: ERP, CRM, SCM; Content Management Systems; Email, Call Center; Documents, Contracts]




What’s so Big about Big Data

            Velocity
            Volume
            Variety
           Complexity
           Ambiguity


So you are about to start the Big Data Project
[Figure: diagram relating Tools, Data and instructions to Output]




The Normal Way Results In ……
[Image. Source: Web]




Why Can Big Data Fail on the RDBMS?
[Figure: the current data management platform (RDBMS + ETL + BI), faced with new data types, new volume, new analytics, new workload and new metadata, leads to poor performance and failed programs. Issues called out: scalability, sharding, ACID.]




BIG Data
•  Workload Demands
   •  Process dynamic data content
   •  Process unstructured data
   •  Systems that can scale up and scale out with high-volume data
   •  Perform complex operations within reasonable response time
•  Infrastructure Requirements
   •  Scalable platform
   •  Database independence
   •  Fault-tolerant architectures
   •  Low cost of acquisition and storage
   •  Supported by standard toolsets




Hadoop
Design Goals
•  System Shall Manage and Heal Itself
•  Performance Shall Scale Linearly
•  Compute Shall Move to Data
•  Simple Core, Modular and Extensible


Hadoop Differentiators

Schema-on-Write: RDBMS
•  Schema must be created before data is loaded.
•  An explicit load operation has to take place, which transforms the data to the internal structure of the database.
•  New columns must be added explicitly before data for such columns can be loaded into the database.
•  Read is fast.
•  Standards/Governance.

Schema-on-Read: Hadoop
•  Data is simply copied to the file store; no special transformation is needed.
•  A SerDe (Serializer/Deserializer) is applied at read time to extract the required columns.
•  New data can start flowing at any time and will appear retroactively once the SerDe is updated to parse it.
•  Load is fast.
•  Evolving schemas/agility.
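A minimal Python sketch of the schema-on-read idea above, assuming comma-separated log lines. The record layout and the `make_deserializer` helper are hypothetical; the helper only plays the role of a SerDe and is not Hive's SerDe interface.

```python
import csv
import io

# Raw records are stored exactly as they arrive: no load-time transformation.
RAW_LOG = [
    "2012-02-07,alice,login",
    "2012-02-07,bob,search,hadoop",  # a newer record carries an extra field
    "2012-02-08,carol,logout",
]


def make_deserializer(columns):
    """Return a function that parses one raw line into a dict keyed by `columns`.
    Fields beyond the known columns are ignored; missing ones become None."""
    def deserialize(line):
        fields = next(csv.reader(io.StringIO(line)))
        return {name: (fields[i] if i < len(fields) else None)
                for i, name in enumerate(columns)}
    return deserialize


def query(raw_lines, deserialize):
    # The schema is applied only now, at read time.
    return [deserialize(line) for line in raw_lines]


v1 = make_deserializer(["date", "user", "action"])
v2 = make_deserializer(["date", "user", "action", "term"])  # "updated SerDe"

rows_v1 = query(RAW_LOG, v1)
rows_v2 = query(RAW_LOG, v2)
print(rows_v2[1])
```

Because the raw lines are never rewritten, switching from `v1` to `v2` makes the extra `term` field visible for records stored before the parser knew about it. The cost moves to read time: every query re-parses the raw data.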




Hadoop Known Limitations
•  Write-once model
•  A namespace with an extremely large number of files exceeds the NameNode's capacity to maintain it
•  Cannot be mounted by an existing OS
   •  Getting data in and out is tedious
   •  A virtual file system can solve this problem
•  HDFS does not implement / support
   •  User quotas
   •  Access permissions
   •  Hard or soft links
   •  Data balancing schemes
•  No periodic checkpoints
•  NameNode is a single point of failure
   •  Automatic restart and failover to another machine not yet supported
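The NameNode keeps the whole namespace in memory, which is why very large file counts become a limit. A back-of-the-envelope Python sketch, assuming the commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per file, directory or block object (an approximation, not an exact HDFS constant):

```python
# Rule-of-thumb estimate only; actual usage depends on the Hadoop version
# and on the namespace layout.
BYTES_PER_OBJECT = 150


def namenode_heap_estimate(num_files, blocks_per_file=1, num_dirs=0):
    """Approximate NameNode heap (bytes): one object per file, per block
    and per directory, each about BYTES_PER_OBJECT bytes."""
    objects = num_files + num_files * blocks_per_file + num_dirs
    return objects * BYTES_PER_OBJECT


# Ten million small files, each fitting in a single block.
small_files = namenode_heap_estimate(10_000_000)
print(f"{small_files / 1e9:.1f} GB")
```

Ten million single-block files work out to about 20 million objects, or roughly 3 GB of heap, before any data is read. Packing the same data into fewer, larger files shrinks the object count, which is the usual mitigation.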

Hadoop Tips
•  Hadoop is useful
   •  When you must process lots of unstructured data
   •  When running batch jobs is acceptable
   •  When you have access to lots of cheap hardware
•  Hadoop is not useful
   •  For intense calculations with little or no data
   •  When your data is not self-contained
   •  When you need interactive results
•  Implementation
   •  Think big, start small
   •  Build on agile cycles
   •  Focus on the data, as you will always develop schema on write.
•  Available Optimizations
   •  Input to Maps
   •  Map only jobs
   •  Combiner
   •  Compression
   •  Speculation
   •  Fault Tolerance
   •  Buffer Size
   •  Parallelism (threads)
   •  Partitioner
   •  Reporter
   •  DistributedCache
   •  Task child environment settings
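Two of the optimizations listed above, the Combiner and the Partitioner, are easiest to see in a word-count job. The following is a single-process Python simulation of the map, combine, partition and reduce steps, not Hadoop's Java API; the input splits and the character-sum partitioner are illustrative assumptions (Hadoop's default HashPartitioner uses the key's `hashCode()` instead).

```python
from collections import Counter, defaultdict


def map_phase(line):
    # Mapper: emit (word, 1) for every word in the input line.
    for word in line.lower().split():
        yield word, 1


def combine(pairs):
    # Combiner: pre-aggregate on the map side so fewer pairs are shuffled.
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return list(counts.items())


def partition(key, num_reducers):
    # Partitioner: choose the reducer for a key (deterministic character sum).
    return sum(ord(c) for c in key) % num_reducers


def run_job(splits, num_reducers=2, use_combiner=True):
    shuffled = defaultdict(list)  # reducer id -> list of (key, value)
    shuffled_pairs = 0
    for split in splits:  # one map task per input split
        pairs = [kv for line in split for kv in map_phase(line)]
        if use_combiner:
            pairs = combine(pairs)
        shuffled_pairs += len(pairs)
        for key, value in pairs:
            shuffled[partition(key, num_reducers)].append((key, value))
    # Reducers: sum the values for each key within their own partition.
    result = {}
    for reducer_id in range(num_reducers):
        totals = Counter()
        for key, value in shuffled[reducer_id]:
            totals[key] += value
        result.update(totals)
    return result, shuffled_pairs


splits = [["big data big insights"], ["big data"]]
counts, shuffled_with = run_job(splits, use_combiner=True)
_, shuffled_without = run_job(splits, use_combiner=False)
print(counts, shuffled_with, shuffled_without)
```

The final counts are identical either way; the combiner only reduces how many pairs cross the network (5 instead of 6 here), which matters far more at cluster scale.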




Hadoop Tips
•  Troubleshooting
   •  Are your partitions uniform?
   •  Can you combine records on the map side?
   •  Are maps reading off a DFS block's worth of data?
   •  Are you running a single reduce wave (unless the data size per reducer is too big)?
   •  Have you tried compressing intermediate data & final data?
   •  Are there buffer size issues?
   •  Do you see unexplained "long tails"?
   •  Are your CPU cores busy?
   •  Is at least one system resource being loaded?
•  Performance Tuning
   •  Increase the memory/buffer allocated to the tasks
   •  Increase the number of tasks that can be run in parallel
   •  Increase the number of threads that serve the map outputs
   •  Disable unnecessary logging
   •  Turn on speculation
   •  Run reducers in one wave, as they tend to get expensive
   •  Tune the usage of DistributedCache; it can increase efficiency
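Two of the troubleshooting questions above, uniform partitions and a single reduce wave, can be checked with simple arithmetic before a job runs. A hedged Python sketch: the key sample and the character-sum partitioner are hypothetical, and the wave count assumes the classic MapReduce (MRv1) model with a fixed number of reduce slots per node.

```python
import math
from collections import Counter


def partition_sizes(keys, num_reducers, partitioner):
    """Count how many records each reducer would receive."""
    sizes = Counter(partitioner(k, num_reducers) for k in keys)
    return [sizes.get(r, 0) for r in range(num_reducers)]


def skew_ratio(sizes):
    """Largest partition divided by the mean partition size (1.0 = uniform)."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0


def reduce_waves(num_reduce_tasks, nodes, reduce_slots_per_node):
    """Number of successive waves needed to run all reduce tasks."""
    return math.ceil(num_reduce_tasks / (nodes * reduce_slots_per_node))


def char_sum_partitioner(key, num_reducers):
    # Hypothetical deterministic partitioner used only for this example.
    return sum(ord(c) for c in key) % num_reducers


keys = ["us"] * 90 + ["uk"] * 5 + ["fr"] * 5  # one dominant key
sizes = partition_sizes(keys, 3, char_sum_partitioner)
print(sizes, round(skew_ratio(sizes), 2))
print(reduce_waves(20, nodes=5, reduce_slots_per_node=2))
```

A skew ratio well above 1.0 means one reducer does most of the work and shows up as a "long tail"; a wave count above 1 means some reduce tasks wait for a free slot.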




NoSQL
•  Stands for Not Only SQL
•  Design trade-offs are guided by the CAP theorem (Consistency, Availability, Partition tolerance)
•  Usually do not require a fixed table schema, nor do they use the concept of joins
•  All NoSQL offerings relax one or more of the ACID properties
•  NoSQL databases come in a variety of flavors
   •  XML (myXMLDB, Tamino, Sedna)
   •  Wide Column (Cassandra, HBase, Bigtable)
   •  Key/Value (Redis, Memcached with Berkeley DB)
   •  Graph (Neo4j, InfoGrid)
   •  Document store (CouchDB, MongoDB)
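To illustrate "no fixed table schema, no joins", a small Python sketch contrasting a join over normalized records with a document-store style where each document embeds what it needs. The data is invented, and plain dicts stand in for a real database; no particular NoSQL product's API is implied.

```python
import json

# Hypothetical relational-style data: listing orders with customer names
# needs a join between the two "tables" at query time.
customers = {1: {"name": "Acme"}, 2: {"name": "Globex"}}
orders = [
    {"order_id": 10, "customer_id": 1, "total": 250},
    {"order_id": 11, "customer_id": 2, "total": 75},
]


def join_orders(orders, customers):
    """Attach each order's customer name by looking it up (a hand-written join)."""
    return [{**o, "customer": customers[o["customer_id"]]["name"]} for o in orders]


# Document-store style: each document embeds the data it needs, so one key
# lookup answers the query with no join. Documents in the same collection
# can also carry different fields, since there is no fixed table schema.
order_docs = {
    10: {"customer": {"name": "Acme"}, "total": 250},
    11: {"customer": {"name": "Globex"}, "total": 75, "coupon": "WINTER"},
}

print(join_orders(orders, customers)[1]["customer"])
print(json.dumps(order_docs[11], sort_keys=True))
```

The price of embedding is duplication: if Acme is renamed, every order document that embeds it must be updated, a consistency job the relational join handled for free.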




NoSQL Footprint
[Figure: NoSQL categories plotted by Size against Complexity: Key Value, Big Table, Doc Database and Graph, with examples Amazon Dynamo, Voldemort, Google Big Table, HBase, Cassandra, Lotus Notes and Graph Theory]




NoSQL
•  Access and Query
   •  RESTful interfaces (HTTP as an access API)
   •  Query languages other than SQL
      •  SPARQL - query language for the Semantic Web
      •  Gremlin - the graph traversal language
      •  Sones Graph Query Language
   •  Data Manipulation / Query API
      •  The Google BigTable Datastore API
      •  The Neo4j Traversal API
   •  Serialization Formats
      •  JSON
      •  Thrift
      •  Protocol Buffers
      •  RDF
•  Best Practices
   •  Design for data collection
   •  Plan the data store
   •  Organize by type and semantics
   •  Partition for performance
      •  Access and query is run-time dependent
   •  Horizontal scaling
   •  Memory caching
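The "Partition for performance" and "Horizontal scaling" practices are often implemented with consistent hashing, the scheme described for Amazon Dynamo and used by Dynamo-style stores such as Cassandra and Voldemort. A simplified Python sketch (virtual nodes, no replication); node names and the virtual-node count are illustrative assumptions, and this is not any product's actual implementation.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Map keys to nodes so that adding a node moves only a fraction of keys."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes  # virtual nodes per physical node, for balance
        self._ring = []       # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        # md5 is stable across processes, unlike Python's built-in hash().
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash.
        h = self._hash(key)
        idx = bisect.bisect_left(self._ring, (h, ""))
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._ring[idx][1]


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

With four nodes, roughly a quarter of the keys move when `node-d` joins, and every moved key lands on the new node; a naive `hash(key) % n` scheme would instead remap most keys whenever `n` changes.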




Textual ETL Engine
Forest Rim Technology's Textual ETL Engine (TETLE) is an integration tool for turning text into a structure of data that can be analyzed by standard analytical tools.
•  Textual ETL Engine provides a robust user interface to define rules (or patterns / keywords) to process unstructured or semi-structured data.
•  The rules engine encapsulates all the complexity and lets the user define simple phrases and keywords.
•  Easy to implement and easy to realize ROI
•  Advantages
   •  Simple to use
   •  No MR or coding required for text analysis and mining
   •  Extensible by taxonomy integration
   •  Works on standard and new databases
   •  Produces a highly columnar key-value store, ready for metadata integration
•  Disadvantages
   •  Not integrated with Hadoop as a rules interface
   •  Currently uses Sqoop for metadata interchange with Hadoop or NoSQL interfaces
   •  Current GA does not handle distributed processing outside the Windows platform
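The rule-driven approach described above (keywords and patterns turning text into key-value structure) can be sketched in a few lines of Python. This is a hypothetical illustration of the general technique only; the rule format, labels and `textual_etl` function are invented and do not reflect TETLE's rule syntax or API.

```python
import re

# Hypothetical rules: each label maps to a keyword list or a regex pattern.
KEYWORD_RULES = {
    "complaint": ["refund", "broken", "cancel"],
    "praise": ["great", "love", "excellent"],
}
PATTERN_RULES = {
    "order_id": re.compile(r"\border\s+#?(\d+)", re.IGNORECASE),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
}


def textual_etl(doc_id, text):
    """Turn free text into (doc_id, key, value) rows that a standard
    database or analytics tool can load."""
    rows = []
    words = set(re.findall(r"[a-z']+", text.lower()))
    for label, keywords in KEYWORD_RULES.items():
        for kw in keywords:
            if kw in words:
                rows.append((doc_id, label, kw))
    for label, pattern in PATTERN_RULES.items():
        for match in pattern.finditer(text):
            # Use the capture group when the pattern defines one.
            value = match.group(1) if pattern.groups else match.group(0)
            rows.append((doc_id, label, value))
    return rows


rows = textual_etl("mail-42", "Order #1234 arrived broken, please refund. Contact jo@example.com")
for row in rows:
    print(row)
```

The output is a narrow key-value table, which is what makes it easy to join with warehouse data or attach to metadata afterwards.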




Integration
•  All major RDBMS vendors today support Hadoop or NoSQL as an integration or extension
   •  Oracle Exalytics / Big Data Appliance
   •  Teradata Aster Appliance
   •  EMC Greenplum Appliance
   •  IBM BigInsights
   •  Microsoft Windows Azure integration
•  There are multiple providers of Hadoop distributions
   •  Cloudera
   •  Hortonworks
   •  Zettaset
•  Adapters from vendors to interface with the Cloudera or Hortonworks distributions of Hadoop are available today. There are integration efforts to release Hadoop as an integral engine across the RDBMS vendor platforms.

Conceptual Solution Architecture
[Figure: OLTP sources feed the Data Warehouse and Data Marts through ETL, ELT or CDC; Big Data content (email, docs) is processed by Textual ETL and/or MR / Ruby / Java (Hadoop) into a Big Data DW with a taxonomy; Metadata and MDM span the architecture; outputs serve Reporting, Analytics, Search, OLAP, Text Mining, Content Analytics and Knowledge Analytics.]




Integration Tips
•  The key to the castle in integrating Big Data is metadata
•  Whatever the tool, technology and technique, if you do not know your metadata, your integration will fail
•  Semantic technologies and architectures will be the way to process and integrate Big Data, much akin to Web 2.0 models
•  Data quality for Big Data is a very questionable goal. To get some semblance of quality, taxonomies and ontologies can help
•  Third-party data providers also provide keywords, trending tags and scores; these can provide a lot of integration support
•  Writing business rules for Big Data can be very cumbersome, and not all programs can be written in MapReduce
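To make the taxonomy tip concrete, a small Python sketch that maps variant spellings onto canonical terms and reports how much of the data the taxonomy recognizes. The taxonomy content is invented for illustration, and the "conformance" share is an assumed, crude quality signal rather than an established metric.

```python
# Hypothetical taxonomy: canonical term -> known variants/synonyms.
TAXONOMY = {
    "mobile phone": {"cell phone", "cellphone", "smartphone", "mobile"},
    "laptop": {"notebook", "notebook computer", "laptop computer"},
}

# Invert once so each variant (and the canonical term itself) maps to its category.
VARIANT_TO_CANONICAL = {
    variant: canonical
    for canonical, variants in TAXONOMY.items()
    for variant in variants | {canonical}
}


def normalize(term):
    """Return the canonical term, or None if the taxonomy does not know it."""
    return VARIANT_TO_CANONICAL.get(" ".join(term.lower().split()))


def conformance(terms):
    """Share of terms the taxonomy recognizes."""
    known = [t for t in terms if normalize(t) is not None]
    return len(known) / len(terms) if terms else 0.0


mentions = ["Cell Phone", "smartphone", "Notebook", "tablet"]
normalized = [normalize(m) for m in mentions]
print(normalized, conformance(mentions))
```

Unrecognized terms such as "tablet" are candidates for extending the taxonomy, which is how the taxonomy and the data quality improve together.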


Which Tool

Application               Hadoop    NoSQL    Textual ETL
Machine Learning            x         x
Sentiments                  x         x           x
Text Processing             x         x           x
Image Processing            x         x
Video Analytics             x         x
Log Parsing                 x         x           x
Collaborative Filtering     x         x           x
Context Search                                    x
Email & Content                                   x

Success Stories
•  Machine learning & Recommendation Engines – Amazon, Orbitz
•  CRM – Consumer Analytics, Metrics, Social Network Analytics, Churn, Sentiment, Influencer, Proximity
•  Finance – Fraud, Compliance
•  Telco – CDR, Fraud
•  Healthcare – Provider / Patient analytics, fraud, proactive care
•  Life sciences – clinical analytics, physician outreach
•  Pharma – Pharmacovigilance, clinical trials
•  Insurance – fraud, geo-spatial
•  Manufacturing – warranty analytics, supplier quality metrics




Data Science
[Figure: a spectrum from Business Intelligence to Advanced Analytics across three bands, Data Analytics, Art & Science and Applied Science. Data Analytics: content, customer, product, behaviors, optimization, Big Data processing & ETL. Applied Science: user interest prediction, inventory prediction, machine learning, pattern mining, advanced regression analysis.]

Challenges
•  Resource availability
•  MR is hard to implement
•  Speech to text
   •  Conversation context is often missing
   •  Quality of recording
   •  Accent issues
•  Visual data tagging
   •  Images
   •  Text embedded within images
•  Metadata is not available
•  Data is not trusted
•  Content management platform capabilities
•  Ontology ambiguity
•  Taxonomy integration




Contact
•  Krish Krishnan
   rkrish1124@yahoo.com
       Twitter: @datagenius

SharePoint 2010 Managed Metadata vs SQL 2012 Master Data Services
 
Ibm db2update2019 icp4 data
Ibm db2update2019   icp4 dataIbm db2update2019   icp4 data
Ibm db2update2019 icp4 data
 
Practical introduction to hadoop
Practical introduction to hadoopPractical introduction to hadoop
Practical introduction to hadoop
 
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
 
The power of hadoop in cloud computing
The power of hadoop in cloud computingThe power of hadoop in cloud computing
The power of hadoop in cloud computing
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
 
Simplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the BusinessSimplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the Business
 
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive AdvantageFueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise Hadoop
 
Paris HUG - Agile Analytics Applications on Hadoop
Paris HUG - Agile Analytics Applications on HadoopParis HUG - Agile Analytics Applications on Hadoop
Paris HUG - Agile Analytics Applications on Hadoop
 
Utrecht NL-HUG/Data Science-NL - Agile Data Slides
Utrecht NL-HUG/Data Science-NL - Agile Data SlidesUtrecht NL-HUG/Data Science-NL - Agile Data Slides
Utrecht NL-HUG/Data Science-NL - Agile Data Slides
 

Plus de DATAVERSITY

Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...DATAVERSITY
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceDATAVERSITY
 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data LiteracyDATAVERSITY
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for YouDATAVERSITY
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?DATAVERSITY
 
Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?DATAVERSITY
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling FundamentalsDATAVERSITY
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectDATAVERSITY
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at ScaleDATAVERSITY
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?DATAVERSITY
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...DATAVERSITY
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsDATAVERSITY
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayDATAVERSITY
 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise AnalyticsDATAVERSITY
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best PracticesDATAVERSITY
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?DATAVERSITY
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best PracticesDATAVERSITY
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageDATAVERSITY
 

Plus de DATAVERSITY (20)

Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and Governance
 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data Literacy
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for You
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?
 
Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling Fundamentals
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic Project
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at Scale
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and Forwards
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement Today
 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best Practices
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive Advantage
 

Dernier

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 

Dernier (20)

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 

Integrating Big Data

  • 1. ©2012 Sixth Sense Advisors, Inc. All Rights Reserved 1 INTEGRATING BIG DATA Dataversity Webinar Feb 7 2012
  • 2. State of Data Today
  • 3. A Growing Trend: Expectations for BI are changing w/o anyone telling us
      Requirement | Expectations | Reality
      Speed | Speed of the Internet | Speed = Infra + Arch + Design
      Accessibility | Accessibility of a Smartphone | BI Tool licenses & security
      Usability | iPad - Mobility | Web Enabled BI Tool
      Availability | Google Search | Data & Report Metadata
      Delivery | Speed of questions | Methodology & Signoff
      Data | Access to everything | Structured Data
      Scalability | Cloud (Amazon) | Existing Infrastructure
      Cost | Cell phone or Free WiFi | Millions
  • 4. The Wisdom of Crowds
  • 5. Data Deluge = Business Insights
  • 6. BIG Data (chart from Structured to Unstructured, Current vs. New; current sources shown: ERP, CRM, SCM, Content Management Systems, Email, Call Center, Documents, Contracts)
  • 7. What's So Big about Big Data: Velocity, Volume, Variety, Complexity, Ambiguity
  • 8. So You Are About to Start the Big Data Project (diagram labels: Tools, Output, Data, Instructions)
  • 9. The Normal Way Results In …….. (Image Source: Web)
  • 10. Why Big Data Can Fail on the RDBMS
      New: Data Types, Volume, Analytics, Workload, Metadata
      Current Data Management Platform: RDBMS + ETL + BI
      Outcome: POOR Performance; Failed Programs
      Scalability; Sharding; ACID
  • 11. BIG Data
      Workload Demands
        • Process dynamic data content
        • Process unstructured data
        • Systems that can scale up and scale out with high volume data
        • Perform complex operations within reasonable response time
      Infrastructure Requirements
        • Scalable platform
        • Database independence
        • Fault tolerant architectures
        • Low cost of acquisition and storage
        • Supported by standard toolsets
  • 12. Hadoop Design Goals
      ✓ System Shall Manage and Heal Itself
      ✓ Performance Shall Scale Linearly
      ✓ Compute Shall Move to Data
      ✓ Simple Core, Modular and Extensible
  • 13. Hadoop Differentiators
      Schema-on-Write: RDBMS
        • Schema must be created before data is loaded.
        • An explicit load operation has to take place, which transforms the data to the internal structure of the database.
        • New columns must be added explicitly before data for such columns can be loaded into the database.
        • Read is fast.
        • Standards/Governance.
      Schema-on-Read: Hadoop
        • Data is simply copied to the file store; no special transformation is needed.
        • A SerDe (Serializer/Deserializer) is applied at read time to extract the required columns.
        • New data can start flowing anytime and will appear retroactively once the SerDe is updated to parse it.
        • Load is fast.
        • Evolving Schemas/Agility.
  • 14. Hadoop Known Limitations
      • Write-once model
      • A namespace with an extremely large number of files exceeds the NameNode's capacity to maintain it
      • Cannot be mounted by an existing OS
      • Getting data in and out is tedious
        • A virtual file system can solve this problem
      • HDFS does not implement / support:
        • User quotas
        • Access permissions
        • Hard or soft links
        • Data balancing schemes
      • No periodic checkpoints
      • NameNode is a single point of failure
        • Automatic restart and failover to another machine not yet supported
  • 15. Hadoop Tips
      • Hadoop is useful
        • When you must process lots of unstructured data
        • When running batch jobs is acceptable
        • When you have access to lots of cheap hardware
      • Hadoop is not useful
        • For intense calculations with little or no data
        • When your data is not self-contained
        • When you need interactive results
      • Implementation
        • Think big, start small
        • Build on agile cycles
        • Focus on the data, as you will always develop schema on write.
      • Available Optimizations
        • Input to Maps
        • Map-only jobs
        • Combiner
        • Compression
        • Speculation
        • Fault Tolerance
        • Buffer Size
        • Parallelism (threads)
        • Partitioner
        • Reporter
        • DistributedCache
        • Task child environment settings
  • 16. Hadoop Tips
      • Troubleshooting
        • Are your partitions uniform?
        • Can you combine records on the map side?
        • Are maps reading a full DFS block's worth of data?
        • Are you running a single reduce wave (unless the data size per reducer is too big)?
        • Have you tried compressing intermediate data and final data?
        • Are there buffer size issues?
        • Do you see unexplained "long tails"?
        • Are your CPU cores busy?
        • Is at least one system resource being loaded?
      • Performance Tuning
        • Increase the memory/buffer allocated to the tasks
        • Increase the number of tasks that can be run in parallel
        • Increase the number of threads that serve the map outputs
        • Disable unnecessary logging
        • Turn on speculation
        • Run reducers in one wave, as they tend to get expensive
        • Tune the usage of DistributedCache; it can increase efficiency
  • 17. NoSQL
      • Stands for Not Only SQL
      • Based on the CAP theorem
      • NoSQL stores usually do not require a fixed table schema, nor do they use joins
      • All NoSQL offerings relax one or more of the ACID properties
      • NoSQL databases come in a variety of flavors
        • XML (myXMLDB, Tamino, Sedna)
        • Wide Column (Cassandra, HBase, BigTable)
        • Key/Value (Redis, Memcached with BerkeleyDB)
        • Graph (Neo4j, InfoGrid)
        • Document store (CouchDB, MongoDB)
  • 18. NoSQL Footprint (chart of data size vs. complexity with four categories, Key/Value, BigTable, Document Database and Graph (graph theory); systems shown: Amazon Dynamo, Voldemort, Google BigTable, HBase, Cassandra, Lotus Notes)
  • 19. NoSQL
      • Access and Query
        • RESTful interfaces (HTTP as an access API)
        • Query languages other than SQL
          • SPARQL - query language for the Semantic Web
          • Gremlin - the graph traversal language
          • Sones Graph Query Language
        • Data Manipulation / Query API
          • The Google BigTable DataStore API
          • The Neo4j Traversal API
        • Serialization Formats
          • JSON
          • Thrift
          • Protocol Buffers
          • RDF
      • Best Practices
        • Design for data collection
        • Plan the data store
        • Organize by type and semantics
        • Partition for performance
        • Access and query is run-time dependent
        • Horizontal scaling
        • Memory caching
  • 20. Textual ETL Engine
      Forest Rim Technology's Textual ETL Engine (TETLE) is an integration tool for turning text into a structure of data that can be analyzed by standard analytical tools.
      • Textual ETL Engine provides a robust user interface to define rules (or patterns / keywords) to process unstructured or semi-structured data.
      • The rules engine encapsulates all the complexity and lets the user define simple phrases and keywords.
      • Easy to implement and easy to realize ROI
      • Advantages
        • Simple to use
        • No MR or coding required for text analysis and mining
        • Extensible by taxonomy integration
        • Works on standard and new databases
        • Produces a highly columnar key-value store, ready for metadata integration
      • Disadvantages
        • Not integrated with Hadoop as a rules interface
        • Currently uses Sqoop for metadata interchange with Hadoop or NoSQL interfaces
        • The current GA release does not handle distributed processing outside the Windows platform
  • 21. Integration
      • All RDBMS vendors today support Hadoop or NoSQL as an integration or extension
        • Oracle Exalytics / Big Data Appliance
        • Teradata Aster Appliance
        • EMC Greenplum Appliance
        • IBM BigInsights
        • Microsoft Windows Azure Integration
      • There are multiple providers of Hadoop distributions
        • Cloudera
        • Hortonworks
        • Zettaset
      • Vendor adapters that interface with the Cloudera or Hortonworks distributions of Hadoop are available today. There are integration efforts to release Hadoop as an integral engine across the RDBMS vendor platforms.
  • 22. Conceptual Solution Architecture (diagram; components shown: OLTP, ETL, ELT, CDC, Metadata, MDM, Data Warehouse, DataMarts, Reporting, Analytics, OLAP, Search, Text Mining, Big Data Content Analytics, Textual DW, Knowledge Analytics, Content ETL, Email, Docs, Taxonomy, and/or MR / Ruby / Java (Hadoop))
  • 23. Integration Tips
      • The key to the castle in integrating Big Data is metadata
      • Whatever the tool, technology and technique, if you do not know your metadata, your integration will fail
      • Semantic technologies and architectures will be the way to process and integrate Big Data, much akin to Web 2.0 models
      • Data quality for Big Data is a very questionable goal. To get some semblance of quality, taxonomies and ontologies can help
      • Third-party data providers also provide keywords, trending tags and scores; these can provide a lot of integration support
      • Writing business rules for Big Data can be very cumbersome, and not all programs can be written in MapReduce
  • 24. Which Tool (Application vs. Hadoop, NoSQL, Textual ETL)
      Machine Learning x x
      Sentiments x x x
      Text Processing x x x
      Image Processing x x
      Video Analytics x x
      Log Parsing x x x
      Collaborative Filtering x x x
      Context Search x
      Email & Content x
  • 25. Success Stories
      • Machine learning & Recommendation Engines – Amazon, Orbitz
      • CRM – Consumer Analytics, Metrics, Social Network Analytics, Churn, Sentiment, Influencer, Proximity
      • Finance – Fraud, Compliance
      • Telco – CDR, Fraud
      • Healthcare – Provider / Patient analytics, fraud, proactive care
      • Life sciences – clinical analytics, physician outreach
      • Pharma – Pharmacovigilance, clinical trials
      • Insurance – fraud, geo-spatial
      • Manufacturing – warranty analytics, supplier quality metrics
  • 26. Data Science (diagram spanning Data Analytics, Art & Science and Applied Science; topics shown: Content, User Interest Prediction, Customer Inventory Prediction, Product, Machine Learning, Behaviors, Pattern Mining, Optimization, Advanced Regression Analysis, Big Data Processing & ETL, Business Intelligence, Advanced Analytics)
  • 27. Challenges
      • Resources Availability
      • MR is hard to implement
      • Speech to text
        • Conversation context is often missing
        • Quality of recording
        • Accent issues
      • Visual data tagging
        • Images
        • Text embedded within images
      • Metadata is not available
      • Data is not trusted
      • Content management platform capabilities
      • Ontologies Ambiguity
      • Taxonomy Integration
  • 28. Contact: Krish Krishnan, rkrish1124@yahoo.com, Twitter: @datagenius
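Slide 10 names sharding among the pressure points when an RDBMS is stretched over Big Data volumes. The minimal Python sketch below assumes naive hash-modulo sharding (an illustrative choice; the slide does not say how shards are assigned, and `shard_for` and `fraction_moved` are hypothetical helpers). It shows one reason resharding is painful: adding a single shard relocates most existing rows.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Assign a key to a shard by hashing it and taking the remainder."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def fraction_moved(keys, old_shards: int, new_shards: int) -> float:
    """Share of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards))
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"customer-{i}" for i in range(10_000)]
    # Going from 4 to 5 shards keeps a key in place only when h % 4 == h % 5,
    # which holds for 4 of every 20 hash residues, so roughly 80% of rows move.
    print(f"moved: {fraction_moved(keys, 4, 5):.0%}")
```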
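Slide 12's goal that compute shall move to data means a task should run on a node that already stores its input block, rather than pulling the block over the network. Here is a greatly simplified greedy Python sketch. It assumes a static map of block replicas and free task slots. The names `assign_tasks`, `block_locations` and `free_slots` are hypothetical, and this is not Hadoop's actual scheduler.

```python
def assign_tasks(block_locations, free_slots):
    """Assign one task per block, preferring a node that stores a replica.

    block_locations: {block_id: [node, ...]} replica placement
    free_slots:      {node: number_of_free_task_slots}
    Returns {block_id: (node, is_data_local)}.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment = {}
    for block, replicas in block_locations.items():
        local = [n for n in replicas if slots.get(n, 0) > 0]
        if local:
            node = local[0]  # data-local: the block is read from local disk
        else:
            remote = [n for n, free in slots.items() if free > 0]
            if not remote:
                raise RuntimeError(f"no free slot for block {block}")
            node = max(remote, key=lambda n: slots[n])  # fall back to a remote read
        slots[node] -= 1
        assignment[block] = (node, node in replicas)
    return assignment


if __name__ == "__main__":
    blocks = {"b1": ["n1", "n2"], "b2": ["n2", "n3"], "b3": ["n1"]}
    print(assign_tasks(blocks, {"n1": 1, "n2": 1, "n3": 1}))
```

In this example, `b3` only has a replica on `n1`, whose single slot is already taken by `b1`. So `b3` is the one task that has to read its block remotely.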
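Slide 13 contrasts schema-on-write with schema-on-read. The minimal Python sketch below assumes comma-separated input. `make_serde` is a hypothetical stand-in for a SerDe, not the Hive SerDe interface. A new field breaks the explicit load but is simply ignored by the raw store until the deserializer is updated.

```python
def load_schema_on_write(lines):
    """Schema-on-write: validate and convert every row before it is stored.

    A row that does not match the declared (user_id, country) schema makes
    the whole load fail, mirroring an explicit RDBMS load step.
    """
    table = []
    for line in lines:
        user_id, country = line.split(",")  # ValueError if the column count differs
        table.append({"user_id": int(user_id), "country": country})
    return table


def store_raw(lines):
    """Schema-on-read: copy the raw lines into the store untouched."""
    return list(lines)


def make_serde(column_names):
    """Build a toy deserializer that maps comma-separated text to named columns."""
    def deserialize(line):
        return dict(zip(column_names, line.split(",")))
    return deserialize


def query(raw_lines, serde, columns):
    """Apply the deserializer at read time and project the requested columns."""
    return [{c: serde(line).get(c) for c in columns} for line in raw_lines]


if __name__ == "__main__":
    incoming = ["1,US", "2,FR", "3,DE,mobile"]  # the last row adds a new 'device' field

    try:
        load_schema_on_write(incoming)
    except ValueError as err:
        print("schema-on-write rejected the load:", err)

    store = store_raw(incoming)
    old_serde = make_serde(["user_id", "country"])
    new_serde = make_serde(["user_id", "country", "device"])
    print(query(store, old_serde, ["country"]))  # the extra field is ignored
    print(query(store, new_serde, ["device"]))   # visible once the SerDe is updated
```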
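Slide 15 lists the combiner and the partitioner among the available optimizations. The single-process Python sketch below imitates the MapReduce data flow for a word count so the combiner's effect on shuffle volume is visible. It is an illustration only, not Hadoop's Java `Mapper`/`Reducer` API, and `run_word_count` is a hypothetical name.

```python
from collections import Counter, defaultdict
from zlib import crc32


def map_words(line):
    """Mapper: emit (word, 1) for every word in an input line."""
    for word in line.lower().split():
        yield word, 1


def combine(pairs):
    """Combiner: pre-sum one map task's output before it is shuffled."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def partition(key, num_reducers):
    """Partitioner: send each key to a reducer using a stable hash.

    zlib.crc32 is used because Python's built-in hash() of a str varies between runs.
    """
    return crc32(key.encode("utf-8")) % num_reducers


def run_word_count(splits, num_reducers=2, use_combiner=True):
    """Run map -> (combine) -> partition/shuffle -> reduce in one process.

    Returns (word_counts, records_shuffled).
    """
    shuffled = defaultdict(list)  # reducer id -> [(word, count), ...]
    records_shuffled = 0
    for split in splits:  # each split plays the role of one map task's input
        pairs = [kv for line in split for kv in map_words(line)]
        if use_combiner:
            pairs = combine(pairs)
        records_shuffled += len(pairs)
        for word, count in pairs:
            shuffled[partition(word, num_reducers)].append((word, count))

    counts = Counter()
    for reducer_id in range(num_reducers):  # reducer: sum the values for each key
        for word, count in shuffled[reducer_id]:
            counts[word] += count
    return dict(counts), records_shuffled


if __name__ == "__main__":
    splits = [["big data big"], ["data is big"]]
    print(run_word_count(splits, use_combiner=False))
    print(run_word_count(splits, use_combiner=True))
```

With the combiner on, the first map task ships `big` once with a count of 2 instead of twice. That cuts the shuffle from 6 records to 5 while leaving the final counts unchanged. A combiner is only safe for operations such as summing that can be applied to partial results.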
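Slide 16's first troubleshooting question, whether partitions are uniform, is closely tied to its "long tails" question, because one overloaded reducer keeps the whole job waiting. Below is a small Python sketch that estimates reducer skew before a job runs. It assumes keys are hash-partitioned, and the helper names are hypothetical.

```python
import hashlib
from collections import Counter


def reducer_for(key, num_reducers):
    """Stable hash partitioning, standing in for a default hash partitioner."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % num_reducers


def partition_sizes(keys, num_reducers):
    """Number of records each reducer would receive."""
    sizes = Counter(reducer_for(k, num_reducers) for k in keys)
    return [sizes.get(r, 0) for r in range(num_reducers)]


def skew_ratio(sizes):
    """Largest partition divided by the mean partition size (1.0 = uniform)."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0


if __name__ == "__main__":
    uniform_keys = [f"order-{i}" for i in range(10_000)]
    hot_keys = ["US"] * 9_000 + [f"country-{i}" for i in range(1_000)]
    print(round(skew_ratio(partition_sizes(uniform_keys, 4)), 2))  # close to 1.0
    print(round(skew_ratio(partition_sizes(hot_keys, 4)), 2))      # well above 1.0
```

A single hot key like `"US"` always lands on one reducer, however many reducers are added. Skew of that kind usually calls for a custom partitioner or key salting rather than more parallelism.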
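Slide 17 notes that NoSQL stores usually need no fixed table schema and avoid joins. The minimal Python sketch below illustrates the document-store idea, with an in-memory dictionary standing in for a product such as CouchDB or MongoDB. This is not either product's API. The order is stored with its line items embedded, so a single key lookup replaces the join.

```python
import json

# Relational layout: an order header table plus a line-item table keyed by order_id.
ORDERS = [{"order_id": 1, "customer": "acme"}]
ORDER_LINES = [
    {"order_id": 1, "sku": "A-100", "qty": 2},
    {"order_id": 1, "sku": "B-200", "qty": 1},
]


def order_via_join(order_id):
    """Rebuild one order by joining the header row with its line items."""
    header = next(o for o in ORDERS if o["order_id"] == order_id)
    items = [
        {"sku": line["sku"], "qty": line["qty"]}
        for line in ORDER_LINES
        if line["order_id"] == order_id
    ]
    return {**header, "items": items}


class DocumentStore:
    """Toy document store: one key maps to one self-contained JSON document."""

    def __init__(self):
        self._docs = {}

    def put(self, key, doc):
        self._docs[key] = json.dumps(doc)  # no schema check: any JSON-serializable dict

    def get(self, key):
        return json.loads(self._docs[key])


if __name__ == "__main__":
    store = DocumentStore()
    store.put("order:1", order_via_join(1))  # items embedded: no join at read time
    store.put("order:2", {"order_id": 2, "customer": "globex", "items": [], "gift_wrap": True})
    print(store.get("order:1"))
    print(store.get("order:2")["gift_wrap"])  # a field the first document does not have
```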
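Slide 19 recommends partitioning for performance and scaling horizontally. One widely used way to do both is consistent hashing, the scheme described in the Amazon Dynamo design listed on slide 18. The minimal Python sketch below assumes MD5 ring positions and 100 virtual nodes per server (both arbitrary illustrative choices). `HashRing` is a hypothetical class, not a library API.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a position on the hash ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions of all virtual nodes
        self._owner = {}   # ring position -> physical node
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Place `vnodes` virtual copies of the node around the ring."""
        for i in range(self.vnodes):
            point = ring_position(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def node_for(self, key):
        """Walk clockwise from the key's position to the next virtual node."""
        if not self._points:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect(self._points, ring_position(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(10_000)]
    ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("node-e")
    moved = [k for k in keys if ring.node_for(k) != before[k]]
    # Only keys now claimed by the new node move; expect roughly 1/5 of them.
    print(len(moved) / len(keys))
```

Adding a node only inserts new points on the ring. A key's owner therefore either stays the same or becomes the new node, so growing the cluster moves a small slice of the data instead of most of it.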
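Slide 20 describes a rules engine in which the user defines patterns and keywords that turn unstructured text into key-value data. TETLE's own rule syntax is not shown in the slides, so what follows is only a hypothetical Python sketch of that general idea. The rules, attribute names and the `textual_etl` function are illustrative assumptions.

```python
import re

# Hypothetical pattern rules: output attribute -> regex whose first group is the value.
RULES = [
    ("contract_value", re.compile(r"\$\s?([\d,]+(?:\.\d{2})?)")),
    ("renewal_date", re.compile(r"renew(?:s|al)? on (\d{4}-\d{2}-\d{2})", re.IGNORECASE)),
]

# Hypothetical keyword rules: keyword -> output attribute.
KEYWORDS = {"termination": "clause_type", "indemnification": "clause_type"}


def textual_etl(doc_id, text):
    """Turn free text into (doc_id, attribute, value) rows that standard tools can query."""
    rows = []
    for attribute, pattern in RULES:
        for match in pattern.finditer(text):
            rows.append((doc_id, attribute, match.group(1)))
    lowered = text.lower()
    for keyword, attribute in KEYWORDS.items():
        if keyword in lowered:
            rows.append((doc_id, attribute, keyword))
    return rows


if __name__ == "__main__":
    text = "This agreement renews on 2012-12-31 for $25,000.00 and includes a Termination clause."
    for row in textual_etl("c-1", text):
        print(row)
```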
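Slide 23 suggests taxonomies and ontologies as the way to get some semblance of quality into Big Data, with metadata as the key to integration. The minimal Python sketch below uses a small hand-built taxonomy (the concepts and terms are hypothetical). It maps the many ways people write about the same thing onto one canonical concept that downstream tools can group or join on.

```python
import re

# Hypothetical taxonomy: canonical concept -> surface terms seen in raw text.
TAXONOMY = {
    "product/phone": ["iphone", "smartphone", "cell phone", "mobile"],
    "sentiment/negative": ["broken", "refund", "terrible"],
    "sentiment/positive": ["love", "great", "excellent"],
}

# One word-boundary pattern per term, so "mobile" does not match inside "automobile".
TERM_PATTERNS = [
    (re.compile(r"\b" + re.escape(term) + r"\b"), concept)
    for concept, terms in TAXONOMY.items()
    for term in terms
]


def tag_text(text):
    """Return the sorted taxonomy concepts whose terms occur in the text."""
    lowered = text.lower()
    return sorted({concept for pattern, concept in TERM_PATTERNS if pattern.search(lowered)})


if __name__ == "__main__":
    print(tag_text("My new smartphone arrived broken, I want a refund"))
    print(tag_text("Great cell phone, love it"))
    print(tag_text("Automobile insurance renewal"))
```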