copyright: Sixth Sense Advisors Inc @2012   1




BIG DATA & ANALYTICS
DAMA Chicago
April 18th 2012




The Buzz




A Growing Trend
  Expectations for BI are changing without anyone telling us
 Requirement     Expectations                    Reality
 Speed           Speed of the Internet           Speed = Infrastructure + Architecture + Design
 Accessibility   Accessibility of a smartphone   BI tool licenses & security
 Usability       iPad (mobility)                 Web-enabled BI tool
 Availability    Google Search                   Data & report metadata
 Delivery        Speed of questions              Methodology & sign-off
 Data            Access to everything            Structured data
 Scalability     Cloud (Amazon)                  Existing infrastructure
 Cost            Cell phone or free WiFi         Millions


    Data Disruptions




Porter Competitive Model




State of Data Today




Future of Data

  Big Data
Big Data can be defined as data that can grow in volume, velocity, variety and complexity at an
unprecedented pace. This growth and complexity present challenges in capturing, storing,
managing, analyzing and visualizing the data with the typical BI tool stack.


 Tapping into the data
 •  Business: structured data used today
    Infrastructure: today we do big or small compute with small and large structured data sets
 •  Business: Big Data existing across the enterprise that can be made available to the business
    Infrastructure: Big Data will mean big or small compute with big data sets, not always
    available in structured or semi-structured formats




Analytics
•  Analytics is the key technique to analyze, visualize and
   monetize Big Data
•  The field of analytics is resurging with the advent of Big
   Data
  •  Social Analytics
  •  Sensor Analytics
  •  Text Analytics
  •  Deep Data Mining
•  Analytics needs metadata for integration
•  Applications
   •  Fraud Detection
   •  Campaign Optimization
   •  Demand and Supply Optimization
   •  Forecast Optimization




Long Tail
Figure: The Old Way (Pareto principle, control, or 80/20 rule) versus The New Way (with a
bigger, longer tail), with the 20% mark separating head from tail.
Source: http://en.wikipedia.org/wiki/The_Long_Tail

           When Web 2.0 is applied…

2008 US Presidential Elections




     $32 million raised from 275,000 people
              who gave $100 or less




   Long Tail Example
Web 2.0 significantly increases total value contributed/received by aggregating the long
tail of smaller-value donors.
Figure: the head (20%) holds high $ value donors, a small constellation; the long tail holds
low $ value donors, a larger constellation, labeled BIG Data.
Source: http://en.wikipedia.org/wiki/The_Long_Tail
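The aggregation argument can be made concrete with a minimal Python sketch. It assumes a simple dollar threshold separates head from tail (the $100 cut-off echoes the 2008 campaign slide) and uses invented donation amounts purely for illustration; it totals each group so the two can be compared.

```python
def head_tail_totals(donations, threshold=100):
    """Split donation amounts at `threshold` and total each group.

    Head: high-value donors (amount > threshold), a small constellation.
    Tail: low-value donors (amount <= threshold), a larger constellation.
    Returns donor counts and dollar totals for both groups.
    """
    head = [d for d in donations if d > threshold]
    tail = [d for d in donations if d <= threshold]
    return {
        "head_donors": len(head),
        "head_total": sum(head),
        "tail_donors": len(tail),
        "tail_total": sum(tail),
    }


# Hypothetical data: 10 donors giving $2,300 each, 2,000 donors giving $25 each.
summary = head_tail_totals([2300] * 10 + [25] * 2000)
print(summary)
```

With these made-up numbers the tail contributes $50,000 against the head's $23,000, even though each tail gift is tiny.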

Brand Management




What’s so Big about Big Data
                                             Velocity
                                             Volume
                                             Variety
                                            Complexity
                                            Ambiguity



 What do we collect

•  Facebook has an average of 30 billion pieces of content added
   every month

•  YouTube receives 24 hours of video every minute

•  5 billion mobile phones were in use in 2010

•  A leading retailer in the UK collects 1.5 billion pieces of
   information to adjust prices and promotions

•  Amazon.com: 30% of sales come from its recommendation engine

•  A Boeing jet engine produces 20 TB/hour of data for engineers to
   examine in real time to make improvements
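The raw figures are easier to compare as rates. This minimal Python sketch converts the YouTube and jet-engine numbers above into per-minute and per-second terms, assuming decimal units (1 TB = 1,000 GB).

```python
MINUTES_PER_HOUR = 60
SECONDS_PER_HOUR = 3600
GB_PER_TB = 1000  # assumption: decimal units


def upload_ratio(hours_uploaded_per_minute: float) -> float:
    """Minutes of video received per minute of wall-clock time."""
    return hours_uploaded_per_minute * MINUTES_PER_HOUR


def tb_per_hour_to_gb_per_second(tb_per_hour: float) -> float:
    """Convert a TB/hour data rate to GB/second."""
    return tb_per_hour * GB_PER_TB / SECONDS_PER_HOUR


print(upload_ratio(24))                            # 1440
print(round(tb_per_hour_to_gb_per_second(20), 2))  # 5.56
```

Every minute of wall-clock time brings in 1,440 minutes (a full day) of video, and one engine's 20 TB/hour is a sustained stream of roughly 5.6 GB per second.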




Potential Business Insights
•  Trends
•  Brand Identity & Management
•  Consumer Education
•  Competitive Intelligence
•  Micro-Targeting
•  Leverage “Crowdsourcing”-driven innovation to better products
   and services (DELL, Innocentive (SAP, P&G))
•  eDiscovery (legal trends and patterns, financial fraud)
•  Pharmaceutical Companies
   •  Patient Education
   •  Physician Enriched Content Management
   •  Reduce Clinical Trial Cycles and Errors
   •  Pharmacovigilance
•  Financial
   •  Fraud
   •  Customer Management
•  Manufacturing
   •  Supply chain optimization
   •  Track & Trace
   •  Compliance

    Why DW/BI Fails Repeatedly
Figure: business value plotted against time. After a business situation occurs, value is
lost through three successive latencies that together make up the action time, or action
distance:
•  Data latency, until the data is ready
•  Analysis latency, until information is available
•  Decision latency, until a decision is made
Lost value = Sum(Latencies) + Opportunity Cost
Base graph courtesy of Dr. Richard Hackathorn
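The lost-value formula can be sketched in Python. The slide does not say how latencies translate into money, so this sketch assumes a flat hypothetical hourly rate; the class, function and figures are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ActionLatencies:
    """Hours spent in each latency after a business situation occurs."""
    data_hours: float      # until the data is ready
    analysis_hours: float  # until information is available
    decision_hours: float  # until a decision is made

    def action_time(self) -> float:
        """Action time (action distance): the sum of the three latencies."""
        return self.data_hours + self.analysis_hours + self.decision_hours


def lost_value(latencies: ActionLatencies, value_lost_per_hour: float,
               opportunity_cost: float) -> float:
    """Lost value = Sum(Latencies) + Opportunity Cost.

    ASSUMPTION: latencies are priced with a flat hourly rate;
    real value decay need not be linear.
    """
    return latencies.action_time() * value_lost_per_hour + opportunity_cost


# Hypothetical figures for illustration only.
lat = ActionLatencies(data_hours=24, analysis_hours=8, decision_hours=16)
print(lat.action_time())                 # 48
print(lost_value(lat, 500.0, 10_000.0))  # 34000.0
```

Under this linear assumption, trimming any one latency (for example, data latency through real-time loading) reduces lost value in direct proportion.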



    The Data Landscape
Figure: transactional systems, operational data stores (ODS), an enterprise data warehouse,
datamarts & analytical databases, and their consumers (reports, dashboards, analytic models,
other applications), with data transformation shown across the whole flow.




ACID Kills
•  Atomic – All of the work in a transaction completes
   (commit) or none of it completes
•  Consistent – A transaction transforms the database
   from one consistent state to another consistent state.
   Consistency is defined in terms of constraints.
•  Isolated – The results of any changes made during a
   transaction are not visible until the transaction has
   committed.
•  Durable – The results of a committed transaction
   survive failures
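To make atomicity and consistency concrete, here is a minimal sketch using Python's built-in sqlite3 module (chosen only for illustration; the slide names no database). The accounts table and the transfer function are hypothetical. A CHECK constraint defines consistency, and a transfer that would violate it is rolled back as a whole, so the credit that already ran is undone too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Consistency is defined by constraints: no account may go negative.
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit dst and debit src in one transaction: both commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK constraint; the earlier credit was rolled back as well.
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


print(transfer(conn, 1, 2, 30), balances(conn))   # True {1: 70, 2: 80}
print(transfer(conn, 1, 2, 500), balances(conn))  # False {1: 70, 2: 80}
```

The failed 500 transfer leaves both balances exactly as they were. That all-or-nothing bookkeeping is what a database must coordinate on every write.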


     BIG Data Scenarios: EXAMPLES
  
To: Bob.Collins@bankwithus.com

Dear Mr. Collins,

This email is in reference to my bank account which has
been efficiently handled by your bank for more than five
years. There has been no problem till date until last week
the situation went out of the hand.

I have deposited one of my high amount cheque to my
bank account no: 65656512 which was to be credited
same day but due to your staff carelessness it wasn’t
done and because of this negligence my reputation in the
market has been tarnished. Furthermore I had issued one
payment cheque to the party which was showing
bounced due to “Insufficient balance” just because my
cheque didn’t make on time.

My relationship with your bank has matured with the time
and it’s a shame to tell you about this kind of services are
not acceptable when it is question of somebody’s
reputation. I hope you got my point and I am attaching a
copy of the same for further rapid procedures and remit
into my account in a day.

Yours sincerely

Daniel Carter

Ph: 564-009-2311

       BIG Data Text Example
       •  We will often imply additional information in spoken language by the way we place
          stress on words.
       •  The sentence "I never said she stole my money" demonstrates the importance that stress
          can play in a sentence, and thus the inherent difficulty a natural language processor
          can have in parsing it (the stressed word is marked with asterisks):
            •  "*I* never said she stole my money" - Someone else said it, but I didn't.
            •  "I *never* said she stole my money" - I simply didn't ever say it.
            •  "I never *said* she stole my money" - I might have implied it in some way, but I
               never explicitly said it.
            •  "I never said *she* stole my money" - I said someone took it; I didn't say it was she.
            •  "I never said she *stole* my money" - I just said she probably borrowed it.
            •  "I never said she stole *my* money" - I said she stole someone else's money.
            •  "I never said she stole my *money*" - I said she stole something, but not my money.
       •  Depending on which word the speaker stresses, this sentence could have several
          distinct meanings.
  




Example Source: Wikipedia
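A short Python sketch shows why this is hard for a text processor. Stress is rendered here with asterisks, an assumption standing in for prosody. Once the stress marks are stripped, all seven readings collapse to the same token sequence, so plain text alone cannot tell them apart.

```python
SENTENCE = "I never said she stole my money"

# Stressed word -> intended meaning, taken from the example above.
READINGS = {
    "I": "Someone else said it, but I didn't.",
    "never": "I simply didn't ever say it.",
    "said": "I might have implied it in some way, but I never explicitly said it.",
    "she": "I said someone took it; I didn't say it was she.",
    "stole": "I just said she probably borrowed it.",
    "my": "I said she stole someone else's money.",
    "money": "I said she stole something, but not my money.",
}


def mark_stress(sentence, stressed_word):
    """Render a spoken reading by wrapping the stressed word in asterisks."""
    return " ".join(f"*{w}*" if w == stressed_word else w for w in sentence.split())


def text_only_view(reading):
    """What a processor sees once prosody is lost: plain lower-cased tokens."""
    return tuple(reading.replace("*", "").lower().split())


for word, meaning in READINGS.items():
    print(f"{mark_stress(SENTENCE, word):40} -> {meaning}")

distinct_inputs = {text_only_view(mark_stress(SENTENCE, w)) for w in READINGS}
print(len(READINGS), "meanings,", len(distinct_inputs), "distinct text input")
```

Seven meanings map to one input, which is why text analytics on transcribed speech usually needs context beyond the words themselves.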




   Pattern Detection
•  Clustering Techniques: K-Means, Maximin, Agglomerative, Divisive, Regression
•  Classification Techniques: Naive Bayes, Neural Networks (Back-Propagation,
   Recursive Splitting), K-Nearest Neighbor, Minimum Distance
•  Reduction Techniques: Backward Elimination, Forward Selection, Attribute Removal,
   Principal Components
•  Utilities: Accuracy Measures, Range Filters, K-Fold Cross Validation, Merge & Subset,
   Vector Magnitude
•  Examples
   •  Text: OCR, machine, digital
   •  Face recognition, verification, retrieval
   •  Fingerprint recognition
   •  Speech recognition
   •  Medical diagnosis: X-ray, EKG analysis
   •  Machine diagnostics data
   •  Geological data
   •  Automated Target Recognition (ATR)
   •  Image segmentation and analysis (recognition from aerial or satellite photographs)
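To illustrate the first clustering technique listed, here is a minimal pure-Python sketch of K-Means (Lloyd's algorithm) on 2-D points. It seeds the centroids with farthest-point, maximin-style selection so the run is deterministic. The seeding choice and the sample points are assumptions for illustration, and a production system would normally use a library implementation.

```python
import math


def maximin_seeds(points, k):
    """Farthest-point seeding: start from the first point, then repeatedly
    pick the point whose distance to its nearest seed is largest."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return seeds


def kmeans(points, k, max_iterations=100):
    """Lloyd's k-means on 2-D points; returns (centroids, clusters)."""
    centroids = maximin_seeds(points, k)
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:  # converged: assignments no longer change
            break
        centroids = updated
    return centroids, clusters


points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
for c, members in zip(centroids, clusters):
    print(c, members)
```

The two well-separated groups end up in separate clusters. On real data, the choice of k and of the seeding strategy matters much more than in this toy case.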


               So you are about to start the Big Data Project
Figure labels: Tools, Data, Instructions, Output




The Normal Way Results In ……..




 Performance
Re-engineering a Ferrari engine in a Yugo does not make the fastest race car.

Current data management platform (RDBMS + ETL + BI)
   + New data types
   + New volume
   + New analytics
   + New data retention
   + New data workloads
Result: POOR performance and failed programs


   Big Data and You

•  You need to write data quickly and reliably
   •  Incoming data streams differ in type, size and complexity
   •  But writing it to disk or memory is not the ultimate goal

•  You need to validate data in real time

•  You need to count and aggregate as you write

•  You need to analyze in real time, as later, even seconds later, is historical

•  You need to scale up and scale out on demand
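The write-path requirements above (validate in real time, count and aggregate as you write) can be sketched in a few lines of Python. This is a hypothetical, single-process illustration. The event shape (`key` and `value` fields) and the validation rule are assumptions, and persisting the event itself is left out.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class StreamAggregator:
    """Validates each incoming event and updates running aggregates on write,
    so counts and totals are current without a later batch pass."""
    accepted: int = 0
    rejected: int = 0
    totals: defaultdict = field(default_factory=lambda: defaultdict(float))

    def write(self, event: dict) -> bool:
        key, value = event.get("key"), event.get("value")
        # Real-time validation (assumed rule): string key, non-negative number.
        if not isinstance(key, str) or not isinstance(value, (int, float)) or value < 0:
            self.rejected += 1
            return False
        # Count and aggregate as part of the write itself.
        self.accepted += 1
        self.totals[key] += value
        return True


agg = StreamAggregator()
for event in [
    {"key": "sensor-a", "value": 2.5},
    {"key": "sensor-b", "value": 1},
    {"key": "sensor-a", "value": 4},
    {"key": "sensor-a", "value": "n/a"},  # rejected: not numeric
    {"value": 3},                         # rejected: no key
]:
    agg.write(event)
print(agg.accepted, agg.rejected, dict(agg.totals))
```

Because the aggregates are updated inside `write`, a query seconds later reads the current totals instead of waiting for a batch job.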
  




  BIG Data
•  Workload Demands
   •  Process dynamic data content
   •  Process unstructured data
   •  Systems that can scale up and scale out with high volume data
   •  Perform complex operations within reasonable response time
•  Infrastructure Needs
   •  Scalable platform
   •  Database independence
   •  Highly fault tolerant architectures
   •  Commodity platforms
   •  Supported by standard toolsets




  Data Warehouse Appliance
•  High Availability
•  Standard SQL Interface
•  Advanced Compression
•  MPP
•  Leverages existing BI, ETL and OLTP investments
•  Hadoop & MapReduce Interface / Embedded
•  Minimal disk I/O bottleneck; simultaneously load & query
•  Auto Database Management

•  A Data Warehouse (DW) Appliance is an integrated set of servers, storage,
   OS, database and interconnect specifically preconfigured and tuned for
   the rigors of data warehousing.
•  DW appliances offer an attractive price/performance value proposition
   and are frequently a fraction of the cost of traditional data warehouse
   solutions.




Hadoop
Design Goals
•  System Shall Manage and Heal Itself
•  Performance Shall Scale Linearly
•  Compute Shall Move to Data
•  Simple Core, Modular and Extensible


       Hadoop Differentiators

 Schema-on-Write: RDBMS
•    Schema must be created before data is loaded.
•    An explicit load operation has to take place which transforms the data to the
     internal structure of the database.
•    New columns must be added explicitly before data for such columns can be loaded
     into the database.
•    Read is fast.
•    Standards/Governance.

 Schema-on-Read: Hadoop
•    Data is simply copied to the file store; no special transformation is needed.
•    A SerDe (Serializer/Deserializer) is applied during read time to extract the
     required columns.
•    New data can start flowing anytime and will appear retroactively once the SerDe
     is updated to parse them.
•    Load is fast.
•    Evolving Schemas/Agility.
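The schema-on-read side can be illustrated with a small Python sketch. The raw lines are stored untouched, and a reader function plays the role a SerDe plays conceptually, applying column names only at read time. This is not Hive's SerDe interface; `make_reader` and the sample records are hypothetical.

```python
import csv

# Raw records are copied into the store as-is; no schema is applied on load.
RAW_LINES = [
    "2012-04-18,chicago,42",
    "2012-04-19,boston,17",
    "2012-04-20,chicago,8,mobile",  # newer record carrying an extra 'channel' field
]


def make_reader(columns):
    """Return a read-time deserializer mapping CSV fields onto `columns`.

    Fields beyond the known columns are ignored; missing fields become None.
    """
    def read(lines):
        for fields in csv.reader(lines):
            yield {name: fields[i] if i < len(fields) else None
                   for i, name in enumerate(columns)}
    return read


read_v1 = make_reader(["date", "city", "count"])
read_v2 = make_reader(["date", "city", "count", "channel"])  # "SerDe" updated later

print(list(read_v1(RAW_LINES))[2])  # the extra field is simply not projected
print(list(read_v2(RAW_LINES))[2])  # the same stored line now exposes 'channel'
```

Nothing in `RAW_LINES` changed between the two reads; only the reader did, which is what lets new columns appear retroactively.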



Hadoop & RDBMS Analogy
 RDBMS: Sports car
•    refined
•    has a lot of features
•    accelerates very fast
•    pricey
•    expensive to maintain

 Hadoop: Cargo train
•    rough
•    missing a lot of luxury
•    slow to accelerate
•    carries almost anything
•    moves a lot of stuff very efficiently
* Original slide author: Amr Awadallah, Cloudera




Hadoop Known Limitations
•  Note – All of these are being addressed by the committers this year and next

•  Write-once model
•  A namespace with an extremely large number of files
   exceeds the NameNode’s capacity
•  Cannot be mounted by an existing OS
   •  Getting data in and out is tedious
   •  A virtual file system can solve the problem
•  HDFS does not implement / support
   •  User quotas
   •  Access permissions
   •  Hard or soft links
   •  Data balancing schemes
•  No periodic checkpoints

   Hadoop Tips
•  Hadoop is useful
   •  When you must process lots of unstructured data
   •  When running batch jobs is acceptable
   •  When you have access to lots of cheap hardware
•  Hadoop is not useful
   •  For intense calculations with little or no data
   •  When your data is not self-contained
   •  When you need interactive results
•  Implementation
   •  Think big, start small
   •  Build on agile cycles
   •  Focus on the data, as you will always develop schema on write.
•  Available Optimizations
   •  Input to Maps
   •  Map only jobs
   •  Combiner
   •  Compression
   •  Speculation
   •  Fault Tolerance
   •  Buffer Size
   •  Parallelism (threads)
   •  Partitioner
   •  Reporter
   •  DistributedCache
   •  Task child environment settings




 Hadoop Tips
•  Troubleshooting
   •  Are your partitions uniform?
   •  Can you combine records at the map side?
   •  Are maps reading off a DFS block worth of data?
   •  Are you running a single reduce wave (unless the data size per reducer is too big)?
   •  Have you tried compressing intermediate data & final data?
   •  Are there buffer size issues?
   •  Do you see unexplained “long tails”?
   •  Are your CPU cores busy?
   •  Is at least one system resource being loaded?
•  Performance Tuning
   •  Increase the memory/buffer allocated to the tasks
   •  Increase the number of tasks that can be run in parallel
   •  Increase the number of threads that serve the map outputs
   •  Disable unnecessary logging
   •  Turn on speculation
   •  Run reducers in one wave, as they tend to get expensive
   •  Tune the usage of DistributedCache; it can increase efficiency
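The first troubleshooting question, whether partitions are uniform, can be checked before a job runs. Hadoop's default HashPartitioner assigns each key to a reducer by key hash modulo the number of reducers. The hypothetical Python sketch below mimics that idea, using CRC32 so results are stable across runs, and reports a max/mean skew ratio. Values well above 1.0 point to hot keys that produce the “long tails” mentioned above.

```python
import zlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Hash partitioning: stable hash of the key modulo the partition count."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_skew(keys, num_partitions):
    """Return (records per partition, max/mean ratio); 1.0 means perfectly uniform."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    loads = [counts.get(p, 0) for p in range(num_partitions)]
    mean = sum(loads) / num_partitions
    return loads, max(loads) / mean


# Hypothetical key sets: one hot key versus many distinct keys.
hot_keys = ["customer-42"] * 900 + [f"customer-{i}" for i in range(100)]
spread_keys = [f"customer-{i}" for i in range(1000)]
print(partition_skew(hot_keys, 4))
print(partition_skew(spread_keys, 4))
```

No hash function can split a single hot key, so the usual remedies are a combiner, a custom partitioner or salting the key.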




NoSQL
•  Stands for Not Only SQL
•  Based on CAP Theorem / BASE
•  Usually neither require a fixed table schema nor use
   joins
•  All NoSQL offerings relax one or more of the ACID properties
•  Scalable replication and distribution
    •  Potentially thousands of machines
    •  Potentially distributed around the world
•  Queries need to return answers quickly
•  Mostly query, few updates
•  Asynchronous Inserts & Updates
•  NoSQL databases come in a variety of flavors
  •    XML (myXMLDB, Tamino, Sedna)
  •    Wide Column (Cassandra, HBase, Big Table)
  •    Key/Value (Redis, Memcached with BerkeleyDB)
  •    Graph (neo4j, InfoGrid)
  •    Document store (CouchDB, MongoDB)




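       NoSQL Footprint
Figure: NoSQL families plotted by size (vertical axis) against complexity (horizontal axis):
key-value stores (Amazon Dynamo, Voldemort), Big Table-style stores (Google Big Table, HBase,
Cassandra), document databases (Lotus Notes) and graph databases (graph theory).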




    NoSQL
•  Access and Query
   •  RESTful interfaces (HTTP as an access API)
   •  Query languages other than SQL
      •  SPARQL - query language for the Semantic Web
      •  Gremlin - the graph traversal language
      •  Sones Graph Query Language
   •  Data Manipulation / Query API
      •  The Google BigTable Datastore API
      •  The Neo4j Traversal API
   •  Serialization Formats
      •  JSON
      •  Thrift
      •  Protocol Buffers
      •  RDF
•  Best Practices
   •  Design for data collection
   •  Plan the data store
   •  Organize by type and semantics
   •  Partition for performance
      •  Access and query is run-time dependent
   •  Horizontal scaling
   •  Memory caching




Map Reduce
•  Technique for indexing and searching large data volumes
•  Two Phases, Map and Reduce
   •  Map
      •  Extract sets of Key-Value pairs from underlying data
      •  Potentially in parallel on multiple machines
   •  Reduce
      •  Merge and sort sets of Key-Value pairs
      •  Results may be useful for other searches




    Textual ETL Engine
Forest Rim Technology's Textual ETL Engine (TETLE) is an integration tool for turning text into a
structure of data that can be analyzed by standard analytical tools.

•  Textual ETL Engine provides a robust user interface to define rules (or patterns /
   keywords) to process unstructured or semi-structured data.
•  The rules engine encapsulates all the complexity and lets the user define simple
   phrases and keywords.
•  Easy to implement and easy to realize ROI

•  Advantages
   •  Simple to use
   •  No MR or coding required for text analysis and mining
   •  Extensible by taxonomy integration
   •  Works on standard and new databases
   •  Produces a highly columnar key-value store, ready for metadata integration
•  Disadvantages
   •  Not integrated with Hadoop as a rules interface
   •  Currently uses Sqoop for metadata interchange with Hadoop or NoSQL interfaces
   •  Current GA does not handle distributed processing outside the Windows platform
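Turning text into rows with rules (patterns and keywords) can be sketched in Python. This is a hypothetical illustration of rule-based extraction, not TETLE's rule syntax or API. The rules, categories and document id are invented, and the sample text is taken from the customer email shown earlier.

```python
import re

# Hypothetical rules: regex pattern -> (category, normalized term).
# A term of None means "use the first captured group as the value".
RULES = [
    (re.compile(r"\bcheque\b", re.IGNORECASE), ("instrument", "cheque")),
    (re.compile(r"\bbounced\b|insufficient balance", re.IGNORECASE), ("issue", "bounced payment")),
    (re.compile(r"\breputation\b", re.IGNORECASE), ("sentiment", "reputational harm")),
    (re.compile(r"account no:?\s*(\d+)", re.IGNORECASE), ("account", None)),
]


def extract_rows(doc_id, text):
    """Apply every rule and return de-duplicated (doc_id, category, value) rows."""
    rows = set()
    for pattern, (category, term) in RULES:
        for match in pattern.finditer(text):
            rows.add((doc_id, category, term if term is not None else match.group(1)))
    return sorted(rows)


email = (
    "I have deposited one of my high amount cheque to my bank account no: 65656512 "
    "which was to be credited same day. "
    "Furthermore I had issued one payment cheque to the party which was showing "
    'bounced due to "Insufficient balance". '
    "My reputation in the market has been tarnished."
)
for row in extract_rows("email-001", email):
    print(row)
```

The output rows are structured enough to load into a table and join with account data, which is the point of textual ETL. Real rule sets would add taxonomies rather than hand-written patterns.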




Integration
•  All RDBMS vendors today support Hadoop or NoSQL as an
   integration or extension
   •  Oracle Exalytics / Big Data Appliance
   •  Teradata Aster Appliance
   •  EMC Greenplum Appliance
   •  IBM BigInsights
   •  Microsoft Windows Azure Integration
•  There are multiple providers of Hadoop distributions
   •  Cloudera
   •  Hortonworks
   •  Hadapt
   •  Zettaset
   •  IBM
•  Adapters from vendors to interface with the Cloudera or Hortonworks
   distributions of Hadoop are available today. There are integration
   efforts to release Hadoop as an integral engine across the RDBMS
   vendor platforms.

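   Conceptual Solution Architecture
Figure: OLTP sources load the data warehouse through ETL, ELT or CDC, and the warehouse
feeds the DataMarts, with Metadata and MDM components alongside. BIG Data content (email,
docs) is processed by Textual ETL and/or MR / Ruby / Java (Hadoop) into a Big Data DW,
organized by a taxonomy.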


Which Tool


  Application               Hadoop    NoSQL    Textual ETL
  Machine Learning            x         x
  Sentiments                  x         x          x
  Text Processing             x         x          x
  Image Processing            x         x
  Video Analytics             x         x
  Log Parsing                 x         x          x
  Collaborative Filtering     x         x          x
  Context Search                                   x
  Email & Content                                  x




Integration Tips
•  The key to the castle in integrating Big Data is metadata
•  Whatever the tool, technology and technique, if you do not
   know your metadata, your integration will fail
•  Semantic technologies and architectures will be the way to
   process and integrate the Big Data, much akin to Web 2.0
   models
•  Data quality for Big Data is a very questionable goal. To get
   some semblance of quality, taxonomies and ontologies can be
   of help
•  Third-party data providers also supply keywords, trending tags
   and scores; these can provide a lot of integration support
•  Writing business rules for Big Data can be very cumbersome
   and not all programs can be written in MapReduce

Success Stories
 •  Machine learning & recommendation engines – Amazon, Orbitz
 •  CRM – consumer analytics, metrics, social network analytics, churn, sentiment,
    influencer, proximity
 •  Finance – fraud, compliance
 •  Telco – CDR, fraud
 •  Healthcare – provider / patient analytics, fraud, proactive care
 •  Life sciences – clinical analytics, physician outreach
 •  Pharma – pharmacovigilance, clinical trials
 •  Insurance – fraud, geo-spatial
 •  Manufacturing – warranty analytics, supplier quality metrics




 Big Data Challenges
•  Integration to the EDW is still an open issue – Big Data
   reduces to small metrics, and this translates into the
   current state issues faced with EDW data
•  Big Data requires a lot of taxonomy processing, especially
   in content-related search
•  Several applications need high-performing memory
   architectures because they are compute intensive, for
   example image processing of brain scans
•  Technology is improving by the day, but integration and
   deployment are becoming equally complex.




Data Science
Art & Science

Data Analytics
•  Content, Customer, Product, Behaviors, Optimization,
   Big Data Processing & ETL

Applied Science
•  User Interest Prediction, Inventory Prediction, Machine
   Learning, Pattern Mining, Advanced Regression Analysis

Business Intelligence  →  Advanced Analytics

  Business Analysts, Data Analysts, Metadata Architects and
  Data Architects are all at some stage of evolving into Data Scientists
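As a toy illustration of regression applied to inventory prediction, the sketch below fits an ordinary least-squares line to hypothetical weekly unit sales and extrapolates one week ahead. Real demand forecasting uses far richer models (seasonality, promotions, multiple variables); `fit_line` and the sample data are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx           # slope: change in units per week
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical weekly unit sales for one product (week number, units sold).
weeks = [1, 2, 3, 4, 5]
units = [102, 108, 115, 119, 126]

a, b = fit_line(weeks, units)
forecast_week_6 = a + b * 6
print(round(b, 2), round(forecast_week_6, 1))
```

With this sample data the slope is 5.9 units per week and the week-6 forecast is about 131.7 units.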




Contact
          Krish Krishnan
          rkrish1124@yahoo.com
          Twitter - @datagenius

Big Data and Analytics

  • 1. copyright: Sixth Sense Advisors Inc @2012 1 BIG DATA & ANALYTICS DAMA Chicago April 18th 2012
  • 2. copyright: Sixth Sense Advisors Inc @2012 The Buzz
  • 3. ©2012 Sixth Sense Advisors, Inc. All Rights Reserved 3 A Growing Trend Expectations for BI are changing w/o anyone telling us Requirement Expectations Reality Speed Speed of the Internet Speed = Infra + Arch + Design Accessibility Accessibility of a BI Tool licenses & Smartphone security Usability IPAD - Mobility Web Enabled BI Tool Availability Google Search Data & Report Metadata Delivery Speed of questions Methodology & Signoff Data Access to everything Structured Data Scalability Cloud (Amazon) Existing Infrastructure Cost Cell phone or Free Millions WIFI
  • 4. copyright: Sixth Sense Advisors Inc @2012 4 Data Disruptions Porter Competitive Model
  • 5. ©2012 Sixth Sense Advisors, Inc. All Rights Reserved 5 State of Data Today
  • 6. copyright @Sixth Sense Advisors Inc 2012 6 Future of Data
  • 7. copyright: Sixth Sense Advisors Inc @2012 7 Big Data Big Data can be defined as data that can grow in volume, velocity, variety and complexity at unprecedented pace. The growth and complexity present challenges with the capture, storage, management, analysis and visualization using the typical BI tool stack
  • 8. copyright: Sixth Sense Advisors Inc @2012 8 Tapping into the data Business Infrastructure Today we do Big or Small Structured data compute with Small and used today Large structured data sets Big Data Big Data will mean Big or existing across Small compute with Big the enterprise data sets, not always that can be available in structured or made available semi-structured formats to business
  • 9. copyright: Sixth Sense Advisors Inc @2012 9 Analytics •  Analytics is the key visualization technique to analyze and monetize from Big Data •  The field of analytics is resurging from the advent of Big Data •  Social Analytics •  Sensor Analytics •  Text Analytics •  Deep Data Mining •  Analytics needs metadata for integration •  Applications •  Fraud Detection •  Campaign Optimization •  Demand and Supply Optimization •  Forecast Optimization
  • 10. copyright: Sixth Sense Advisors Inc @2012 Long Tail The New Way (with a bigger, longer tail) The Old Way (Pareto Principle, Control or 80/20 rule) Source: http://en.wikipedia.org/wiki/The_Long_Tail 20% When Web 2.0 is applied…
  • 11. copyright: Sixth Sense Advisors Inc @2012 2008 US Presidential Elections $32 million raised from 275,000 people who gave $100 or less
  • 12. copyright: Sixth Sense Advisors Inc @2012 Long Tail Example Web 2.0 significantly increases total value contributed/received by aggregating the long tail of smaller value donors. High $ value donors, Small constellation Source: http://en.wikipedia.org/wiki/The_Long_Tail 20% Low $ value donors, Larger constellation BIG Data
  • 13. copyright: Sixth Sense Advisors Inc @2012 Brand Management
  • 14. ©2012 Sixth Sense Advisors, Inc. All Rights Reserved 14 What’s so Big about Big Data Velocity Volume Variety Complexity Ambiguity
  • 15. copyright: Sixth Sense Advisors Inc @2012 15 What do we collect •  Facebook has an average of 30 billion pieces of content added every month •  YouTube receives 24hours of video, every minute •  5 Billion mobile phones in use in 2010 •  A leading retailer in the UK collects 1.5 billion pieces of information to adjust prices and promotions •  Amazon.com: 30% of sales is out of its recommendation engine •  A Boeing Jet Engine produces 20TB/Hour for engineers to examine in real time to make improvements
  • 16. copyright: Sixth Sense Advisors Inc @2012 Potential Business Insights •  Trends •  Pharmaceutical Companies •  Brand Identity & •  Patient Education Management •  Physician Enriched Content Management •  Consumer Education •  Reduce Clinical Trial Cycles •  Competitive Intelligence and Errors •  Micro-Targeting Leverage •  Pharmacovigilance “Crowdsourcing” driven •  Financial innovation to better products •  Fraud and services (DELL, •  Customer Management Innocentive (SAP, P&G)) •  Manufacturing •  eDiscovery (Legal trends •  Supply chain optimization and patterns, financial •  Track & Trace fraud) •  Compliance
  • 17. Why DWBI Fails Repeatedly: Lost value = Sum(Latencies) + Business Situation Opportunity Cost. [Figure: business value over time after a business situation: data latency until the data is ready, analysis latency until the information is available, and decision latency until the decision is made; together they form the action time or action distance, over which value is lost. Base graph courtesy of Dr. Richard Hackathorn.]
  • 18. The Data Landscape [Figure: multiple transactional systems, each with its own ODS, datamarts and analytical databases, feeding an enterprise data warehouse, reports, dashboards, analytic models and other applications, all linked by data transformation.]
  • 19. ACID Kills • Atomic: all of the work in a transaction completes (commit) or none of it completes • Consistent: a transaction transforms the database from one consistent state to another consistent state; consistency is defined in terms of constraints • Isolated: the results of any changes made during a transaction are not visible to other transactions until the transaction has committed • Durable: the results of a committed transaction survive failures
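Atomicity and constraint-based consistency can be seen in a few lines of Python's standard `sqlite3` module. This is a minimal sketch: the `accounts` table, its CHECK constraint and the `transfer` function are hypothetical, chosen to mirror the slide's definitions (consistency expressed as a constraint, atomicity as all-or-nothing commit). This per-transaction bookkeeping is the kind of overhead the slide's title alludes to when writes arrive at Big Data volumes.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    # Consistency rule expressed as a constraint: balances may not go negative.
    conn.execute(
        "CREATE TABLE accounts ("
        "id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Atomic transfer: both updates commit together or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes
        # Credit first, then debit: if the debit violates the CHECK constraint,
        # the credit that already ran must be undone as well.
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM accounts"))


if __name__ == "__main__":
    conn = make_db()
    transfer(conn, "alice", "bob", 30)
    try:
        transfer(conn, "alice", "bob", 500)  # debit fails; the credit is rolled back
    except sqlite3.IntegrityError:
        pass
    print(balances(conn))
```

The second transfer credits bob before the debit violates the constraint; the `with conn:` block rolls back both updates, so the balances stay at alice 70, bob 30.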
  • 20. BIG Data Scenarios: EXAMPLES To: Bob.Collins@bankwithus.com Dear Mr. Collins, This email is in reference to my bank account which has been efficiently handled by your bank for more than five years. There has been no problem till date until last week the situation went out of the hand. I have deposited one of my high amount cheque to my bank account no: 65656512 which was to be credited same day but due to your staff carelessness it wasn’t done and because of this negligence my reputation in the market has been tarnished. Furthermore I had issued one payment cheque to the party which was showing bounced due to “Insufficient balance” just because my cheque didn’t make on time. My relationship with your bank has matured with the time and it’s a shame to tell you about this kind of services are not acceptable when it is question of somebody’s reputation. I hope you got my point and I am attaching a copy of the same for further rapid procedures and remit into my account in a day. Yours sincerely Daniel Carter Ph: 564-009-2311
  • 21. BIG Data Text Example • We often imply additional information in spoken language by the way we place stress on words. • The sentence "I never said she stole my money" demonstrates the role stress can play in a sentence, and thus the inherent difficulty a natural language processor can have in parsing it. • "I never said she stole my money" (stress on "I") - Someone else said it, but I didn't. • "I never said she stole my money" (stress on "never") - I simply didn't ever say it. • "I never said she stole my money" (stress on "said") - I might have implied it in some way, but I never explicitly said it. • "I never said she stole my money" (stress on "she") - I said someone took it; I didn't say it was she. • "I never said she stole my money" (stress on "stole") - I just said she probably borrowed it. • "I never said she stole my money" (stress on "my") - I said she stole someone else's money. • "I never said she stole my money" (stress on "money") - I said she stole something, but not my money. • Depending on which word the speaker stresses, this sentence could have several distinct meanings. Example Source: Wikipedia
  • 22. Pattern Detection • Clustering Techniques: K-Means, Maximin, Agglomerative, Divisive • Classification Techniques: Naive Bayes, Neural Networks, Back Propagation, Recursive Splitting, K-Nearest Neighbor, Minimum Distance • Data Reduction Techniques: Backward Elimination, Forward Selection, Attribute Removal, Principal Components • Utilities: Accuracy Measures, Range Filters, K-Fold Cross Validation, Merge & Subset, Regression, Vector Magnitude • Examples: text (OCR, machine, digital); face recognition, verification and retrieval; fingerprint recognition; speech recognition; medical diagnosis (X-ray, EKG analysis); machine diagnostics; geological data; Automated Target Recognition (ATR); image segmentation and analysis (recognition from aerial or satellite photographs)
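K-Means, the first clustering technique listed, alternates two steps until nothing changes: assign each point to its nearest centroid, then move each centroid to the mean of its points. The sketch is a plain-Python version of Lloyd's algorithm for illustration only; the seeding strategy (k random input points) and the sample data are assumptions, and production work would typically use a library implementation.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels
```

For example, `kmeans([(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)], k=2)` puts the three points near the origin in one cluster and the three near (8.5, 8.5) in the other.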
  • 23. So You Are About to Start the Big Data Project [Figure: data, instructions, tools, output]
  • 24. The Normal Way Results In ……..
  • 25. Performance: Dropping a Ferrari engine into a Yugo does not make the fastest race car. Current data management platform (RDBMS + ETL programs + BI) + new data types + new volume + new analytics + new data retention + new data workloads = POOR performance and failure.
  • 26. Big Data and You • You need to write data quickly and reliably • Incoming data streams differ in type, size and complexity • But writing it to disk or memory is not the ultimate goal • You need to validate data in real time • You need to count and aggregate as you write • You need to analyze in real time, because later, even seconds later, is already historical • You need to scale up and scale out on demand
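Validating and aggregating "as you write" means doing both in the same pass as the write, rather than in a later batch. A minimal single-process sketch, assuming a hypothetical record shape with `type` and `payload` fields and a Python list standing in for durable storage:

```python
from collections import Counter
from dataclasses import dataclass, field

REQUIRED_FIELDS = {"type", "payload"}  # hypothetical record schema


@dataclass
class IngestStats:
    """Aggregates kept up to date on every write, not computed afterwards."""
    accepted: int = 0
    rejected: int = 0
    count_by_type: Counter = field(default_factory=Counter)
    payload_bytes: int = 0


def ingest(records, sink, stats):
    """Validate each record, write the good ones to `sink`, aggregate as we go."""
    for rec in records:
        # Validate in-line: malformed records never reach storage.
        if not REQUIRED_FIELDS.issubset(rec) or not isinstance(rec["payload"], str):
            stats.rejected += 1
            continue
        sink.append(rec)  # stand-in for the durable write
        # Count and aggregate in the same pass as the write.
        stats.accepted += 1
        stats.count_by_type[rec["type"]] += 1
        stats.payload_bytes += len(rec["payload"].encode("utf-8"))
    return stats
```

Because the counters are updated per record, totals are queryable the moment a record lands, which is what lets "seconds later" analysis avoid a separate aggregation job.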
  • 27. BIG Data. Workload Demands: • Process dynamic data content • Process unstructured data • Systems that can scale up and scale out with high volume data • Perform complex operations within reasonable response time. Infrastructure Needs: • Scalable platform • Database independence • Highly fault-tolerant architectures • Commodity platforms • Supported by standard toolsets
  • 28. Data Warehouse Appliance • A Data Warehouse (DW) Appliance is an integrated set of servers, storage, OS, database and interconnect specifically preconfigured and tuned for the rigors of data warehousing. • DW appliances offer an attractive price/performance value proposition and are frequently a fraction of the cost of traditional data warehouse solutions. Features: • High Availability • Standard SQL Interface • Advanced Compression • MPP • Leverages existing BI, ETL and OLTP investments • Hadoop & MapReduce Interface / Embedded • Minimal disk I/O bottleneck; simultaneously load & query • Auto Database Management
  • 29. Hadoop Design Goals • System shall manage and heal itself • Performance shall scale linearly • Compute shall move to data • Simple core, modular and extensible
  • 30. Hadoop Differentiators. Schema-on-Write (RDBMS): • Schema must be created before data is loaded. • An explicit load operation has to take place, which transforms the data to the internal structure of the database. • New columns must be added explicitly before data for such columns can be loaded into the database. • Read is fast. • Standards/governance. Schema-on-Read (Hadoop): • Data is simply copied to the file store; no special transformation is needed. • A SerDe (Serializer/Deserializer) is applied at read time to extract the required columns. • New data can start flowing anytime and will appear retroactively once the SerDe is updated to parse it. • Load is fast. • Evolving schemas/agility.
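The schema-on-read side can be illustrated without Hadoop: raw lines are stored untouched, and a deserializer turns them into columns only at query time. This is a plain-Python analogy to a SerDe, not Hive's SerDe interface; the record layout and function names are hypothetical.

```python
# Raw records are stored exactly as they arrived: no load-time transformation.
RAW_STORE = [
    "2012-04-18|alice|login",
    "2012-04-18|bob|purchase|42.50",
    "2012-04-19|carol|logout",
]


def deserialize_v1(line: str) -> dict:
    """First reader: knows three pipe-delimited fields."""
    date, user, action = line.split("|")[:3]
    return {"date": date, "user": user, "action": action}


def deserialize_v2(line: str) -> dict:
    """Updated reader: also parses an optional fourth field, `amount`."""
    row = deserialize_v1(line)
    parts = line.split("|")
    row["amount"] = float(parts[3]) if len(parts) > 3 else None
    return row


def query(store, deserialize, columns):
    """Apply the schema at read time, then project the requested columns."""
    rows = (deserialize(line) for line in store)
    return [{c: row[c] for c in columns} for row in rows]
```

Swapping `deserialize_v1` for `deserialize_v2` makes the `amount` column appear for records that were stored before the new reader existed, which is the retroactive behavior the slide describes; nothing in `RAW_STORE` has to be reloaded.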
  • 31. Hadoop & RDBMS Analogy. RDBMS is a sports car: • refined • has a lot of features • accelerates very fast • pricey • expensive to maintain. Hadoop is a cargo train: • rough • missing a lot of luxury • slow to accelerate • carries almost anything • moves a lot of stuff very efficiently. *Original slide author: Amr Awadallah, Cloudera
  • 32. Hadoop Known Limitations • Note: all of these are being addressed by the committers this year and next • Write-once model • A namespace with an extremely large number of files exceeds the NameNode's capacity • Cannot be mounted by an existing OS • Getting data in and out is tedious • A virtual file system can solve this problem • HDFS does not implement/support: user quotas, access permissions, hard or soft links, data balancing schemes • No periodic checkpoints
  • 33. Hadoop Tips. Hadoop is useful: • When you must process lots of unstructured data • When running batch jobs is acceptable • When you have access to lots of cheap hardware. Hadoop is not useful: • For intense calculations with little or no data • When your data is not self-contained • When you need interactive results. Implementation: • Think big, start small • Build on agile cycles • Focus on the data, as you will always develop the schema on read. Available optimizations: • Input to maps • Map-only jobs • Combiner • Compression • Speculation • Fault tolerance • Buffer size • Parallelism (threads) • Partitioner • Reporter • DistributedCache • Task child environment settings
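The Combiner entry in the optimization list is easiest to see by counting what gets shuffled. This single-process simulation of a word-count job is an illustration, not Hadoop's Java API; the splits and function names are made up. Pre-aggregation is safe here only because summing counts is associative and commutative, so combining on the map side cannot change the final result.

```python
from collections import Counter


def map_words(line):
    """Map: emit (word, 1) for every word in a line."""
    return [(word, 1) for word in line.lower().split()]


def combine(pairs):
    """Combiner: pre-sum counts on the map side, before anything is shuffled."""
    local = Counter()
    for word, n in pairs:
        local[word] += n
    return list(local.items())


def run(splits, use_combiner):
    """Simulate one job; returns (pairs_shuffled, final_counts)."""
    shuffled = 0
    totals = Counter()
    for split in splits:  # one map task per input split
        pairs = [pair for line in split for pair in map_words(line)]
        if use_combiner:
            pairs = combine(pairs)
        shuffled += len(pairs)  # pairs that would cross the network
        for word, n in pairs:  # reduce: sum the counts for each word
            totals[word] += n
    return shuffled, totals
```

For the splits `[["big data big", "data data"], ["big big big"]]`, the job shuffles 8 pairs without the combiner and 3 with it, and both runs end with big = 5, data = 3.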
  • 34. Hadoop Tips. Troubleshooting: • Are your partitions uniform? • Can you combine records on the map side? • Are maps reading a full DFS block's worth of data? • Are you running a single reduce wave (unless the data size per reducer is too big)? • Have you tried compressing intermediate and final data? • Are there buffer size issues? • Do you see unexplained "long tails"? • Are your CPU cores busy? • Is at least one system resource being loaded? Performance tuning: • Increase the memory/buffer allocated to the tasks • Increase the number of tasks that can be run in parallel • Increase the number of threads that serve the map outputs • Disable unnecessary logging • Turn on speculation • Run reducers in one wave, as they tend to get expensive • Tune the usage of DistributedCache; it can increase efficiency
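The first troubleshooting question, "Are your partitions uniform?", can be checked before a job runs by hashing a sample of keys the way a hash partitioner would. A minimal sketch, assuming string keys and a hash-mod-reducers scheme like Hadoop's default partitioner; CRC32 stands in for Java's `hashCode()`, so actual reducer assignments in Hadoop will differ.

```python
import zlib
from collections import Counter


def partition(key: str, num_reducers: int) -> int:
    """Hash partitioning: hash(key) mod number of reducers.
    CRC32 is used because Python's built-in str hash is randomized per process."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(keys, num_reducers):
    """Number of records each reducer would receive."""
    loads = Counter(partition(k, num_reducers) for k in keys)
    return [loads[r] for r in range(num_reducers)]


def skew_ratio(loads):
    """Max load / mean load: 1.0 is perfectly uniform; large values mean stragglers."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0
```

A single hot key (90 of 100 sample records sharing one key) gives a skew ratio of at least 3.6 on four reducers, since one reducer must take all 90; that reducer finishes last and shows up as a long tail in the job.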
  • 35. NoSQL • Stands for Not Only SQL • Based on the CAP theorem / BASE • Usually does not require a fixed table schema and does not use joins • All NoSQL offerings relax one or more of the ACID properties • Scalable replication and distribution: potentially thousands of machines, potentially distributed around the world • Queries need to return answers quickly • Mostly queries, few updates • Asynchronous inserts & updates • NoSQL databases come in a variety of flavors: XML (myXMLDB, Tamino, Sedna); Wide Column (Cassandra, HBase, BigTable); Key/Value (Redis, Memcached with BerkeleyDB); Graph (Neo4j, InfoGrid); Document store (CouchDB, MongoDB)
  • 36. NoSQL Footprint [Figure: NoSQL store types (Key/Value, BigTable, Document, Graph) positioned by size and complexity, with examples Amazon Dynamo, Voldemort, HBase, Google BigTable, Cassandra, Lotus Notes and graph database theory.]
  • 37. NoSQL. Access and Query: • RESTful interfaces (HTTP as an access API) • Query languages other than SQL: SPARQL, the query language for the Semantic Web; Gremlin, the graph traversal language; Sones Graph Query Language • Data manipulation/query APIs: the Google BigTable DataStore API; the Neo4j Traversal API • Serialization formats: JSON, Thrift, Protocol Buffers, RDF. Best Practices: • Design for data collection • Plan the data store • Organize by type and semantics • Partition for performance • Access and query are run-time dependent • Horizontal scaling • Memory caching
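"Partition for performance" and "horizontal scaling" are commonly implemented in key/value stores with consistent hashing, the scheme described in Amazon's Dynamo paper. The slide does not prescribe a technique, so this is one possible sketch in Python; the virtual-node count, node names and choice of BLAKE2 as the hash are assumptions.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Assigns keys to nodes; adding a node moves only the keys on its new arcs."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes  # virtual points per node, to even out the load
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _position(value: str) -> int:
        digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._position(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's position to the first virtual node."""
        idx = bisect.bisect(self._ring, (self._position(key),))
        return self._ring[idx % len(self._ring)][1]
```

Adding a fourth node to a three-node ring reassigns only the keys that now fall on the new node's arcs (roughly a quarter of them); every other key stays where it was, which is what makes scaling out cheap.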
  • 38. Map Reduce • Technique for processing large data volumes, such as indexing and searching • Two phases, Map and Reduce • Map: extract sets of key-value pairs from the underlying data, potentially in parallel on multiple machines • Reduce: merge the key-value pairs, which the framework has grouped and sorted by key, into results • Results may be useful for other searches
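Since the slide frames MapReduce as a technique for indexing and searching, an inverted index makes a natural illustration of the two phases and the grouping step between them. This is an in-memory, single-process Python sketch of the data flow, not Hadoop's API; the sample documents are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Map: one input record -> list of (word, doc_id) pairs."""
    return [(word, doc_id) for word in text.lower().split()]


def shuffle(pairs):
    """Group values by key and sort by key (the framework's job between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(word, doc_ids):
    """Reduce: merge all values for one key into a sorted posting list."""
    return word, sorted(set(doc_ids))


def build_index(docs):
    # Each map call is independent, so this step could run on many machines.
    mapped = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped))
```

For documents `d1` = "Big data analytics", `d2` = "Big data at DAMA" and `d3` = "analytics", the index maps `big` to `['d1', 'd2']` and `analytics` to `['d1', 'd3']`; a search is then a dictionary lookup.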
  • 39. Textual ETL Engine. Forest Rim Technology's Textual ETL Engine (TETLE) is an integration tool for turning text into a data structure that can be analyzed by standard analytical tools. • Textual ETL Engine provides a robust user interface to define rules (or patterns/keywords) to process unstructured or semi-structured data. • The rules engine encapsulates all the complexity and lets the user define simple phrases and keywords. • Easy to implement and easy to realize ROI. Advantages: • Simple to use • No MapReduce or coding required for text analysis and mining • Extensible by taxonomy integration • Works on standard and new databases • Produces a highly columnar key-value store, ready for metadata integration. Disadvantages: • Not integrated with Hadoop as a rules interface • Currently uses Sqoop for metadata interchange with Hadoop or NoSQL interfaces • The current GA release does not handle distributed processing outside the Windows platform
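The rules-to-structure idea can be sketched with regular expressions and keyword lists applied to text such as the bank email on the Big Data scenarios slide. This is not Forest Rim's engine or its rule syntax; the rule names, patterns and output shape are hypothetical, chosen only to show free text becoming key-value rows.

```python
import re

# Hypothetical rules of the kind a user might define: patterns and keywords.
PATTERN_RULES = {
    "account_number": re.compile(r"account no:?\s*(\d+)", re.IGNORECASE),
    "phone": re.compile(r"\bPh:\s*([\d-]+)"),
}
KEYWORD_RULES = {
    "complaint": ["carelessness", "negligence", "not acceptable"],
    "bounced_payment": ["bounced", "insufficient balance"],
}


def extract(doc_id, text):
    """Apply every rule; emit (doc_id, key, value) rows ready for a table."""
    rows = []
    for key, pattern in PATTERN_RULES.items():
        for match in pattern.finditer(text):
            rows.append((doc_id, key, match.group(1)))
    lowered = text.lower()
    for tag, keywords in KEYWORD_RULES.items():
        if any(keyword in lowered for keyword in keywords):
            rows.append((doc_id, "tag", tag))
    return rows
```

Run over that email, these rules emit one row each for the account number, the phone number and the tags `complaint` and `bounced_payment`.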
  • 40. Integration • The major RDBMS vendors today support Hadoop or NoSQL as an integration or extension: Oracle Exalytics / Big Data Appliance; Teradata Aster Appliance; EMC Greenplum Appliance; IBM BigInsights; Microsoft Windows Azure integration • There are multiple providers of Hadoop distributions: Cloudera, Hortonworks, Hadapt, Zettaset, IBM • Adapters from vendors to interface with the Cloudera or Hortonworks distributions of Hadoop are available today. There are integration efforts to release Hadoop as an integral engine across the RDBMS vendor platforms.
  • 41. Conceptual Solution Architecture [Figure: OLTP sources flow through ETL, ELT and CDC into the data warehouse and datamarts, with metadata and MDM alongside; Big Data content (email, docs) is processed by Textual ETL with a taxonomy and/or MR / Ruby / Java (Hadoop) into a BIG Data DW.]
  • 42. Which Tool (Application vs. Hadoop, NoSQL, Textual ETL): Machine Learning x x; Sentiments x x x; Text Processing x x x; Image Processing x x; Video Analytics x x; Log Parsing x x x; Collaborative Filtering x x x; Context Search x; Email & Content x
  • 43. Integration Tips • The key to the castle in integrating Big Data is metadata • Whatever the tool, technology or technique, if you do not know your metadata, your integration will fail • Semantic technologies and architectures will be the way to process and integrate Big Data, much akin to Web 2.0 models • Data quality for Big Data is a very questionable goal; to get some semblance of quality, taxonomies and ontologies can help • Third-party data providers also supply keywords, trending tags and scores, which can provide a lot of integration support • Writing business rules for Big Data can be very cumbersome, and not all programs can be written in MapReduce
  • 44. Success Stories • Machine learning & recommendation engines: Amazon, Orbitz • CRM: consumer analytics, metrics, social network analytics, churn, sentiment, influencer, proximity • Finance: fraud, compliance • Telco: CDR, fraud • Healthcare: provider/patient analytics, fraud, proactive care • Life sciences: clinical analytics, physician outreach • Pharma: pharmacovigilance, clinical trials • Insurance: fraud, geo-spatial • Manufacturing: warranty analytics, supplier quality metrics
  • 45. Big Data Challenges • Integration with the EDW is still an open issue: Big Data reduces to small metrics, and these inherit the current-state issues faced with EDW data • Big Data requires a lot of taxonomy processing, especially in content-related search • Several applications need high-performing memory architectures because their processing is compute-intensive, for example image processing of brain scans • Technology is improving by the day, but integration and deployment are becoming equally complex
  • 46. Data Science: Art & Science [Figure: Data Analytics to Applied Science, spanning Big Data processing & ETL, business intelligence, analysis, advanced analytics, advanced regression, optimization, pattern mining, machine learning, behaviors, content, product, user interest prediction and customer inventory prediction.] Business analysts, data analysts, metadata architects and data architects are all at some evolutionary stage of becoming a data scientist.
  • 47. Contact: Krish Krishnan, rkrish1124@yahoo.com, Twitter: @datagenius