CPG manufacturers need to understand what big data is and the value it offers. This presentation explains big data, how it evolved, and how it can be used.
2. JANET DORENKOTT, BIO
• Over 20 years of experience in information technology.
• Founded Relational Solutions in 1996 and co-owns it with Rob York.
• Focused on data warehousing, data integration & business intelligence solutions.
• Specializes in the complex issues associated with integrating point of sale and syndicated data
for the CPG industry. Developed applications including POSmart and BlueSky, designed for
handling data complexities unique to CPG companies.
• Member of RetailWire’s BrainTrust.
• Founder of the Demand Signal Repository Institute on LinkedIn.
• Participated in the implementation of over 200 data warehouse and BI projects for companies
that include Chrysler, Chase, Timken, Xerox, Glaxo, Smuckers, P&G and many others.
Property of Relational Solutions, Inc. By Janet Dorenkott June, 2013,
3. GOALS FOR TODAY
• TO DEFINE BIG DATA
• EXPLAIN HOW BIG DATA CAN IMPROVE BUSINESS
• EXPLAIN HOW TO USE IT
• SHOW THE IMPORTANCE OF LEVERAGING SOCIAL MEDIA
4. BIG DATA… IT’S IN OUR BLOOD!
[Timeline graphic covering 1996 - 98, 1999 - 01, 2002 – 04, 2005 - 06, 2007 - 08, 2009 - 10, 2011 – 12 and 2013. The position of each milestone on the timeline is not recoverable. Milestones shown:]
• Top 10 “Companies on the Move”
• BlueSky Integration Studio
• “Best at integrating POS with Internal data”
• Cleveland Weatherhead 100 Fastest Growing Businesses
• Oracle Developer of the Year
• Data Warehouse & BI Consulting
• “Data Warehouse of the Year!”
• BlueSky “Coolest New Technologies”
• DataStage ETL Best Implementors Award
• Informatica’s Partner of the Year
• Selects BIS to integrate POS & TradeEdge
• Selects POSmart to embed in DSR
• “Best Software” Finalist
5. BUSINESS INTELLIGENCE
• Leverages data to provide users with “Fact Based Decision” capability.
• Derived from an enterprise data warehouse for management decisions
• Reports are also derived from “stove pipe” solutions, ERP applications and homemade
integration processes.
• Operational reports are not the same as Analytical reports.
6. TRANSACTIONAL VS. ANALYTICAL REPORTING
TRANSACTIONAL SYSTEM
• DATABASE STRUCTURE DESIGNED FOR
DATA ENTRY, UPDATE, AND PROCESSING.
• OPERATIONAL REPORTS.
• REPORTING USERS CAN IMPACT
PROCESSING - QUICKLY BECOMES A SLOW
ENVIRONMENT
• PURCHASED APPLICATIONS CONTAIN
STANDARD REPORTS
• INCONSISTENT DUE TO “TWINKLING”
• NO ACCESS TO SOME INFO
• REPORTS CAN TAKE DAYS OR BE
IMPOSSIBLE TO GET
• NORMALIZED MODEL FOR FAST INPUT
DATA WAREHOUSE
• DATA MODEL DESIGNED FOR ANALYTICAL
REPORTING AND AD-HOC QUERIES, BOTH
FROM A CREATION AND A PERFORMANCE
STANDPOINT
• FREQUENTLY CONTAINS DETAIL DATA AND
PRE-AGGREGATED SUMMARIES FOR FAST
REPORTING
• TOOLS ALLOW END USERS TO INQUIRE,
DRILL FROM SUMMARY TO DETAIL
• REPORTING USERS DO NOT IMPACT THE
TRANSACTIONAL SYSTEM
• OFTEN COMBINES DATA FROM MULTIPLE
TRANSACTIONAL SYSTEMS
• CONSISTENT – BUSINESS RULES
• TYPICALLY DENORMALIZED
[Diagram: a Transactional System (e.g. SAP, JDE, Oracle Apps, JDA, Homegrown) sends Periodic Data Feeds to multiple Data Marts.]
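The contrast between the two columns above can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module, not a description of any particular ERP or warehouse product. The table names, column names and sample rows (product, store, order_line, sales_summary) are hypothetical and chosen only for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Transactional side: normalized tables designed for fast data entry and update.
cur.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE store (store_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE order_line (
    order_id INTEGER, product_id INTEGER, store_id INTEGER, units INTEGER
);
""")
cur.executemany("INSERT INTO product VALUES (?, ?)",
                [(1, "Jam"), (2, "Peanut Butter")])
cur.executemany("INSERT INTO store VALUES (?, ?)",
                [(10, "Midwest"), (20, "South")])
cur.executemany("INSERT INTO order_line VALUES (?, ?, ?, ?)",
                [(100, 1, 10, 5), (101, 1, 20, 3), (102, 2, 10, 4), (103, 1, 10, 2)])

# Periodic data feed: build a denormalized, pre-aggregated summary table
# so that reports no longer have to join the transactional tables.
cur.execute("""
CREATE TABLE sales_summary AS
SELECT s.region AS region, p.name AS product, SUM(o.units) AS units
FROM order_line o
JOIN product p ON p.product_id = o.product_id
JOIN store s ON s.store_id = o.store_id
GROUP BY s.region, p.name
""")

# Analytical report: reads only the summary table.
summary = cur.execute(
    "SELECT region, product, units FROM sales_summary ORDER BY region, product"
).fetchall()
print(summary)
```

In practice the summary would live in a separate database so that reporting users do not slow down the transactional system. A single in-memory database is used here only to keep the sketch self-contained.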
7. BIG DATA STARTED WITH ERP AND DATA WAREHOUSING
• DATA MART: FOCUSED COLLECTION OF SIMILAR DATA FOR REPORTING PURPOSES
• DATA WAREHOUSE: INTEGRATION OF MULTIPLE DATA MARTS INTO AN ENTERPRISE SOLUTION
[Diagram labels: Sales Data Mart, Finance Data Mart, Forecasting Data Mart, International Sales Data Mart, Vendor Information Data Mart, Marketing Data Mart, Common Reference Values.]
8. THE BIG DATA EXPLOSION!
[Diagram: data sources grouped under Transactional/ERP, Analytical and Big Data. Labels shown:]
Accounting, Shipments, Order Processing, Manufacturing, Currency Conversion, Weather Trends, SMS/MMS, Photos, Syndicated Data, Web & Outside Data Sources, EDW, CRM, Loyalty, Segmentation, Panel Data, Wholesaler, Distributor & Broker Data, Promotion Results, Web Logs, EDI, Retailer POS, 3rd Party Data, Click Stream, Audio, Textual Content, Video, Reputation Management, Social Media, Chatter, Blogs, Location Info, 3-D Content, Schematics, Geo-Spatial, Speech to Text, Demographics, Emerging Market
9. WHAT’S THE DIFFERENCE?
Un-Structured
• Social Media
• Chatter, Text
Analytics, Blogs,
Tweets, Comments,
Likes, Followers,
Social Authority,
Clicks, Tags, etc.
• Digital, Video
• Audio
• Geo-Spatial
Multi-Structured
/Hybrid
• Emerging Market Data
• Loyalty
• E-Commerce
• Other Third Party Data
• Weather
• Currency Conversion
• Demographic
• Panel
• POS, POL, IR, EDI, RFID, NFC, QR,
IRI, Rsi, Nielsen, Other
Syndicated, IMS, MSA, etc.
Structured
ERP & DW
• Mainframe
• SQL Server
• Oracle
• DB2
• Sybase
• Access, Excel, txt, etc.
• Teradata
• Netezza, Other MPP
• SAP, JDE, JDA, Other ERP
11. IT’S NOT JUST SIZE, IT’S VARIETY!
[Word cloud of data formats and sources:] EDI, RFID, SAP, DB2, Oracle, TXT, SQL, AS2, CRM, TPO, JDE, QR, Access, Mobile, Excel, NPD, IMS, TPM, E-Commerce
12. IT’S NOT JUST VOLUME & VARIETY!
VELOCITY MATTERS!
• Daily
• Weekly
• Monthly
• Quarterly
• Annually
• Every Hour
• Every Minute
• Every Second
• Every Nano-Second!
• Constantly Changing
• Constantly Growing!
13. IT’S NOT JUST VOLUME & VARIETY & VELOCITY.
COMPLEXITY!
• Aligning Hierarchies
• Integrating Internal Master Data with Retailer Master Data
• Applying Various Calendars
• Regional Territories
• Geographic alignment
• Currency Conversion
• Emerging Market
• Loyalty
• Market Basket
• Cleansing Issues
• Re-cast Data
• Slowly Changing Dimensions (how you want to handle
history, new stores, etc).
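One item from the list above, slowly changing dimensions, can be made concrete with a short sketch. The code below illustrates the common "Type 2" approach, in which a change closes the current row and adds a new one so that history is preserved. It is only one of several ways to handle history, and the StoreRow fields, the store and region values and the helper names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class StoreRow:
    store_id: int
    region: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_change(history: List[StoreRow], store_id: int,
                 new_region: str, as_of: date) -> None:
    """Type 2 change: close the current row and append a new version."""
    for row in history:
        if row.store_id == store_id and row.valid_to is None:
            if row.region == new_region:
                return  # nothing changed, keep the current row
            row.valid_to = as_of
    # Also covers brand-new stores, which have no current row to close.
    history.append(StoreRow(store_id, new_region, as_of))


def region_on(history: List[StoreRow], store_id: int, day: date) -> Optional[str]:
    """Return the region a store belonged to on a given day."""
    for row in history:
        if (row.store_id == store_id and row.valid_from <= day
                and (row.valid_to is None or day < row.valid_to)):
            return row.region
    return None


history = [StoreRow(10, "Midwest", date(2012, 1, 1))]
apply_change(history, 10, "Great Lakes", date(2013, 6, 1))

print(region_on(history, 10, date(2012, 12, 31)))  # Midwest
print(region_on(history, 10, date(2013, 6, 15)))   # Great Lakes
```

Keeping both rows means a 2012 sales report still rolls store 10 up under its old region, while current reports use the new one.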
14. WHAT IS HADOOP?
• HADOOP IS AN OPEN SOURCE SOFTWARE FRAMEWORK WITH 2 KEY COMPONENTS:
1. DISTRIBUTED FILE SYSTEM (HDFS) – FOR HIGH BANDWIDTH, CLUSTER BASED STORAGE
2. DATA PROCESSING FRAMEWORK – USES “MAPREDUCE” TO DISTRIBUTE/MAP LARGE DATA SETS ACROSS
MULTIPLE SERVERS. EACH SERVER CREATES A SUMMARY OF THE DATA THAT HAS BEEN ALLOCATED TO IT. FROM
THERE, DATA IS “REDUCED” OR “AGGREGATED.” SIMPLY PUT, IT IS MAPPED, THEN REDUCED.
“HADOOP LETS YOU DEAL WITH VOLUME, VELOCITY AND VARIETY OF DATA. IT TRANSFORMS COMMODITY
HARDWARE AND PROVIDES AUTOMATIC FAILOVER.”
OWEN O’MALLEY, ARCHITECT FOR MAPREDUCE & SECURITY.
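The "each server summarizes its share, then the summaries are aggregated" idea can be sketched without a cluster. The example below is plain Python and does not use Hadoop or any of its APIs. The records, the round-robin split and the function names are assumptions made only for illustration.

```python
from collections import Counter

# Hypothetical point-of-sale records: (product, units sold).
records = [
    ("Jam", 5), ("Jelly", 2), ("Jam", 1),
    ("Honey", 4), ("Jelly", 3), ("Jam", 2),
]


def split_into_partitions(rows, n_servers):
    """Deal rows out round-robin, standing in for data spread across servers."""
    return [rows[i::n_servers] for i in range(n_servers)]


def summarize(partition):
    """Work one 'server' does on its own share: a local total per product."""
    local = Counter()
    for product, units in partition:
        local[product] += units
    return local


def combine(summaries):
    """Aggregate the per-server summaries into one overall result."""
    total = Counter()
    for summary in summaries:
        total.update(summary)  # Counter.update adds counts together
    return total


partitions = split_into_partitions(records, 3)
totals = combine(summarize(p) for p in partitions)
print(dict(totals))
```

Because each partition is summarized independently, the summarize step could run on separate machines. Only the small summaries need to be brought together at the end.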
15. WHAT IS MAPREDUCE?
• A PARALLEL PROGRAMMING FRAMEWORK
• MADE POPULAR BY GOOGLE
• GENERATE SEARCH INDEXES
• WEB SCORING ALGORITHMS
• C++, JAVA, PYTHON, ETC.
• HARNESS 1000S OF CPUS
• MAPREDUCE PROVIDES
• AUTOMATIC PARALLELIZATION
• FAULT TOLERANCE
• MONITORING & STATUS UPDATES
“MAPREDUCE ALLOWS PROGRAMMERS
WITHOUT ANY EXPERIENCE WITH PARALLEL
AND DISTRIBUTED SYSTEMS TO EASILY
UTILIZE THE RESOURCES OF A LARGE
DISTRIBUTED SYSTEM.”
- JEFFREY DEAN AND SANJAY GHEMAWAT,
GOOGLE, INC., 2004
[Diagram labels: Map Function, Scheduler, Results. Stages: map, shuffle, reduce.]
16. MAPREDUCE MADE SIMPLE: WORD COUNT
Unstructured Data Input:
Boat Yacht Lake
House House Lake
Boat House Yacht
Fish Fish Fish

Stages: Splitting, Mapping, Shuffling, Reducing, Result
• Splitting: (Boat Yacht Lake) (House House Lake) (Boat House Yacht) (Fish Fish Fish)
• Mapping: (Boat, 1; Yacht, 1; Lake, 1) (House, 1; House, 1; Lake, 1) (Boat, 1; House, 1; Yacht, 1) (Fish, 1; Fish, 1; Fish, 1)
• Shuffling: (Boat, 1; Boat, 1) (Yacht, 1; Yacht, 1) (Lake, 1; Lake, 1) (House, 1; House, 1; House, 1) (Fish, 1; Fish, 1; Fish, 1)
• Reducing: (Boat, 2) (Yacht, 2) (Lake, 2) (House, 3) (Fish, 3)
• Result: Boat, 2; Yacht, 2; Lake, 2; House, 3; Fish, 3
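The word count walk-through above can be reproduced in a few lines of plain Python. This is a single-process sketch of the map, shuffle and reduce stages, not Hadoop code. The function names map_phase, shuffle_phase and reduce_phase are made up for the example.

```python
from collections import defaultdict

# The four input lines from the slide; each line is one "split".
lines = [
    "Boat Yacht Lake",
    "House House Lake",
    "Boat House Yacht",
    "Fish Fish Fish",
]


def map_phase(line):
    """Mapping: emit a (word, 1) pair for every word in one split."""
    return [(word, 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffling: group all the 1s emitted for the same word together."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups):
    """Reducing: add up the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


mapped = [pair for line in lines for pair in map_phase(line)]
shuffled = shuffle_phase(mapped)
result = reduce_phase(shuffled)
print(result)
# {'Boat': 2, 'Yacht': 2, 'Lake': 2, 'House': 3, 'Fish': 3}
```

On a real cluster the map calls would run on different servers and the shuffle would move data across the network, but the logic of each stage is the same.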
17. COMMON TERMINOLOGY
• PIG – HIGH LEVEL LANGUAGE WHOSE SCRIPTS ARE CONVERTED TO MAPREDUCE JOBS
• HIVE – CONVERTS SQL-LIKE QUERIES (HIVEQL) TO MAPREDUCE JOBS
• HBASE – SCALABLE, DISTRIBUTED DATABASE. PROVIDES A SIMPLE INTERFACE TO
DATA (E.G. FACEBOOK MESSAGES UTILIZES THIS)
• ZOOKEEPER – PROVIDES COORDINATION FOR SERVERS
• HCATALOG – TABLE AND METADATA LAYER BUILT ON THE HIVE METASTORE
• MAHOUT – MACHINE LEARNING LIBRARY
• SQOOP – TOOL THAT RUNS MAPREDUCE JOBS TO PULL DATA FROM OR PUSH DATA TO
RELATIONAL DATABASES SUCH AS SQL SERVER OR ORACLE
• CASCADING – JAVA FRAMEWORK WHOSE WORKFLOWS TRANSLATE DOWN INTO MAPREDUCE
• OOZIE – WORKFLOW COORDINATION TO RUN MAPREDUCE JOBS
• FUSE DFS – LETS HDFS BE MOUNTED AND ACCESSED LIKE A LINUX FILE SYSTEM
18. HOW CAN BIG DATA BE USED?
• BIG DATA CAN BE USED TO MICRO-SEGMENT
CUSTOMERS, ANALYZE SENTIMENT, PREDICT
BEHAVIOR, PERSONALIZE OFFERS, CROSS-SELL
AND UPSELL ACROSS CHANNELS, MANAGE
REPUTATION, AND INCREASE SALES AND PROFITS.
• COMPANIES NEED TO “WALK BEFORE THEY RUN.”
• THE “BUILD IT & THEY WILL COME” PHILOSOPHY
RARELY WORKS. IDENTIFY A BUSINESS NEED.
19. SOCIAL MEDIA REQUIRES YOU TO
LISTEN
ENGAGE
INFORM
OFFER
20. LEVERAGING THE DATA MEANS YOU NEED TO
ACCESS
ANALYZE
ACT
21. IS SOCIAL MEDIA REALLY WORTH
LEVERAGING?
ACCORDING TO THE PEW RESEARCH CENTER:
• TWITTER HAS 100 MILLION ACTIVE USERS
• 50 MILLION LOG ON TO TWITTER EVERY DAY
• 55% ARE MOBILE USERS
-------------------------------------------
• AVERAGE TWEETS SENT PER DAY:
• IN JANUARY 2010 – 50 MILLION TWEETS PER DAY
• IN FEBRUARY 2011 – 140 MILLION TWEETS PER DAY
• IN SEPTEMBER 2011 – 230 MILLION TWEETS PER DAY
• There were 2.5 million tweets regarding Steve Jobs’
death in the first 13 hours after it was reported, which is
about 53 tweets per second.
• 6,939 tweets per second in Japan on New Year’s Eve at
midnight
According to McKinsey Global Institute:
• Facebook – 700,000,000,000 minutes spent/month
• Google – 34,000 searches/sec
• Email – 838,000,000 messages in 2013
• Twitter – 500,000,000 tweets/day
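The per-second rates quoted on this slide are simple divisions, and working them out makes the totals easier to compare. The short sketch below rechecks the Steve Jobs figure (2.5 million tweets in 13 hours) and converts the 500,000,000 tweets/day figure to a per-second rate. The helper name per_second is made up for the example.

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def per_second(count, seconds):
    """Average events per second over a period of the given length."""
    return count / seconds


# 2.5 million tweets in the first 13 hours after the news was reported.
jobs_rate = per_second(2_500_000, 13 * SECONDS_PER_HOUR)

# 500,000,000 tweets per day, expressed as an average per second.
daily_rate = per_second(500_000_000, SECONDS_PER_DAY)

print(round(jobs_rate), round(daily_rate))  # 53 5787
```

These are averages, so peaks such as the 6,939 tweets per second recorded in Japan can sit well above them.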
22. IT’S ONLY JUST BEGUN!
• LINKEDIN
• FACEBOOK
• YOUTUBE
• SLIDESHARE
• BRIGHTTALK.COM
• SCRIBD
• NAYMZ
• JIGSAW
• SPOKE
• G+
• TWITTER
• VINE
• INSTAGRAM
• BING
24. KNOW YOUR SOCIAL REPUTATION
25. KNOW WHERE YOUR SENTIMENT IS COMING FROM
26. SEE WHERE YOUR CHAMPIONS ARE
27. UNDERSTAND WHERE YOU NEED DAMAGE CONTROL
28. WHAT ARE YOUR FOLLOWERS SAYING?
29. GOALS FOR TODAY – ACCOMPLISHED!
• TO DEFINE BIG DATA – VOLUME, VARIETY, VELOCITY & COMPLEXITY
• EXPLAIN HOW BIG DATA CAN IMPROVE BUSINESS – LISTEN, ENGAGE, INFORM & OFFER
• EXPLAIN HOW TO USE IT – LEVERAGING A FOUNDATION
• SHOW THE IMPORTANCE OF LEVERAGING SOCIAL MEDIA – INTEGRATE WITH OTHER DATA
30. THANK YOU & STAY TUNED!
• FOLLOW JANET DORENKOTT ON LINKEDIN, EMAIL JANETD@RELATIONALSOLUTIONS.COM
• CALL US AT 440-899-3296, JANET IS X225 / KAREN IS X232
• FOLLOW RELATIONAL SOLUTIONS ON LINKEDIN, TWITTER @POSMARTBLUESKY & ON
FACEBOOK
• JOIN OUR “DEMAND SIGNAL REPOSITORY INSTITUTE” & “BIG DATA ASSOCIATION” GROUPS ON
LINKEDIN
• SUBSCRIBE TO THE RELATIONAL SOLUTIONS CHANNEL ON YOUTUBE
• VISIT US AT WWW.RELATIONALSOLUTIONS.COM OR CALL 440-899-3296 X225
• LEARN MORE FROM OUR WEBINARS & DOWNLOAD OUR WHITEPAPERS
• SEE PRODUCT DEMOS & DOWNLOAD TRIALS FROM OUR WEBSITE