Showcasing Data Science Lab functionality
Welcome from Kognitio
www.kognitio.com
Today’s Web Seminar: Presenters, Format & Agenda
Host: Michael Hiskey, Vice President, Marketing & Business Development
Keynote Presenter: Dr. Sharon Kirkham, Data Scientist, Kognitio Analytics Center of Excellence
• Big Data and Complexity – the need for Data Scientists
Question Break #1
• Data Manipulation – functional demonstration
Question Break #2
• Product forecasting with parallel R – practical demonstration
Question Break #3
Kognitio
Kognitio is focused on providing the premier high-performance analytical platform to power business insight around the world
• Kognitio invented the in-memory analytical platform, first taking it to market in 1989
• Privately held
• Labs in the UK, HQ in New York, NY
The Data Science Lab
• Data Scientists & Staff
• Mathematical Algorithms
• MPP Computing
• BIG DATA
What do business users want to do?
• Find patterns
• Track lifetime journeys
• Predict behavior
• Forecast scenarios
• Allocate scarce resources
• Model value
• Characterize groups
• Visualize discovery
• Respond, trigger, manage, promote
I’m a data scientist! Are you?
Entry level skills and development - aspiration
Machine Learning
Graduates
I’m a data scientist! Are you?
Business Expertise
Machine Learning
Interpretation skills
= Insight
Graduates
Need guidance
Data Scientist
Supporting the data scientist
Typical process – traditionally…
Database
Supporting the data scientist
Typical process – direct data preparation
Database
SQL processing
Supporting the data scientist
Typical process – produces analytical data set
Database
SQL processing
Data Set
Supporting the data scientist
Typical process – run analytics from server
Database
SQL processing
Data Set
???
Supporting the data scientist
Typical process – data samples often used
Database
SQL processing
Data Set
???
Data Samples
Process run iteratively = slow
Supporting the data scientist
Typical process – modelling process is honed
Database
SQL processing
Data Set
???
Data Samples
Process run iteratively = slow
Supporting the data scientist
Typical process – model is complete
Database
Data Set
???

Supporting the data scientist
Typical process – score full data (Ouch!)
Database
Data Set
???
Full data to score
Supporting the data scientist
Push processes to DB – still produce analytical data set
Analytical Platform
SQL processing
Data Set
Supporting the data scientist
Push processes to DB – translate specific processes
Analytical Platform
SQL processing
Data Set
???
Translation
Supporting the data scientist
Push processes to DB – results passed back
Analytical Platform
SQL processing
Data Set
???
Translation
Result Data Set
Supporting the data scientist
Push processes to DB– modelling process is honed
Analytical Platform
SQL processing
Data Set
???
Translation
Result Data Set
Supporting the data scientist
Push processes to DB– model scoring done in DB
Analytical Platform
SQL processing
Data Set
???

Result Data Set
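The "Translation" step in the slides above, where a finished model is re-expressed so that scoring runs inside the database, can be sketched as follows. This is only an illustration under stated assumptions: the model is a plain linear one, SQLite (Python's built-in `sqlite3`) stands in for the analytical platform, and the table, column names, coefficients and the helper `linear_model_to_sql` are all hypothetical rather than taken from the webinar.

```python
import sqlite3


def linear_model_to_sql(intercept, coefficients, table):
    """Render a linear model as a SELECT that scores every row of `table`.

    Hypothetical helper: column names are trusted here, so it is not
    safe for untrusted input.
    """
    terms = [repr(float(intercept))]
    for column, weight in coefficients.items():
        terms.append(f"{float(weight)!r} * {column}")
    expression = " + ".join(terms)
    return f"SELECT id, {expression} AS score FROM {table} ORDER BY id"


# Made-up data and model, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, visits REAL, spend REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 2.0, 10.0), (2, 0.0, 4.0)],
)

sql = linear_model_to_sql(1.0, {"visits": 0.5, "spend": 0.25}, "customers")
scores = conn.execute(sql).fetchall()  # the full table is scored by the database
print(sql)
print(scores)
```

The point of the design is that only the small model definition moves; the full data set stays where it is and the database does the scoring.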
Supporting the data scientist
But we always want more! Complex data structure
Analytical Platform
Data Set
???

Result Data Set
SQL cannot handle data complexity.
How do I integrate into my model?
Supporting the data scientist
But we always want more! non-standard processes
Database
SQL processing
Data Set
???
Data Samples
Back where we started
Supporting the data scientist
Bring Analytics to data – still produce analytical data set
SQL processing
SQL processing
Supporting the data scientist
Bring Analytics to data – can use other code for data prep
SQL processing
Kognitio scripting
Code executed using MPP
Data held in memory. Fast access to CPUs
Supporting the data scientist
Bring Analytics to data – run analytics natively in Kognitio
SQL processing
Kognitio scripting
Code executed using MPP
Data held in memory. Fast access to CPUs
One platform, flexible working from data prep through analytical process
New! Kognitio version 8:
Enabling and extending the Analytical Platform
External Tables
External Functions
Not Only SQL
Hadoop Connector
Other Connectors
Kognitio Storage as an External table
General Availability: June 2013
External Scripting – Data Transformation
Examples where SQL is typically complex and extensive (e.g. using perl):
• Assembly: converting structured data into XML format, e.g. furnishing personalised content
• Disassembly: converting XML into structured data
• Parsing: extracting complex information from URLs; pulling words from large text fields, e.g. for sentiment analysis
• Transposition: converting row-based information into columns for data mining, e.g. supporting classification or segmentation
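Two of the transformations named on this slide, parsing URLs and transposing rows into columns, are short in a scripting language even though they are awkward in SQL. The sketch below is a minimal illustration only. It uses Python rather than the perl mentioned on the slide, and the function names, URL and sample rows are hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlparse


def parse_url(url):
    """Parsing: split a URL into host, path and its query parameters."""
    parts = urlparse(url)
    # parse_qs returns a list per key; keep the first value of each.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {"host": parts.netloc, "path": parts.path, **query}


def transpose(rows):
    """Transposition: pivot (key, attribute, value) rows into one record per key."""
    wide = defaultdict(dict)
    for key, attribute, value in rows:
        wide[key][attribute] = value
    return dict(wide)


# Made-up inputs, for illustration only.
print(parse_url("http://shop.example.com/search?q=shoes&page=2"))
print(transpose([("c1", "age", 34), ("c1", "region", "UK"), ("c2", "age", 51)]))
```

Each function works on one row or one key's rows at a time, so the work can be split across many independent script instances.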
Data Manipulation
Small Demo
Product Forecasting – with parallel R
Forecasting Requirements
Forecast Inputs
R running in an MPP environment
Persistence Layer
Analytical Platform Layer
R running in an MPP environment
Persistence Layer
Analytical Platform Layer
Kognitio platform specification:
• 16 servers
• 462GB Kognitio RAM
• 128 cores
• This is old kit
2.9 billion rows of EPOS
184-day time series for 12K products
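Collating raw EPOS rows into one fixed-length daily series per product can be sketched as below. This is an illustration of the idea only: the row layout `(product, date, quantity)`, the function name `collate` and the sample data are assumptions, and on the platform described this step would run in parallel across the data rather than in a single Python loop.

```python
from collections import defaultdict
from datetime import date


def collate(epos_rows, start, days):
    """Sum quantities per product per day; days with no sales stay at 0."""
    totals = defaultdict(lambda: [0] * days)
    for product, sold_on, quantity in epos_rows:
        offset = (sold_on - start).days
        if 0 <= offset < days:  # ignore rows outside the series window
            totals[product][offset] += quantity
    return dict(totals)


# Made-up rows and a 3-day window (the slide describes 184 days).
rows = [
    ("A", date(2013, 1, 1), 2),
    ("A", date(2013, 1, 1), 3),
    ("A", date(2013, 1, 3), 1),
    ("B", date(2013, 1, 2), 4),
]
series = collate(rows, start=date(2013, 1, 1), days=3)
print(series)
```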
R running in an MPP environment
Persistence Layer
Analytical Platform Layer
R running in an MPP environment
Persistence Layer
Analytical Platform Layer
1 output table in RAM
128 parallel instances of R
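The pattern on this slide, with each product's time series forecast independently by one of many parallel workers and the results gathered into a single output, can be sketched as follows. The stand-ins are assumptions, not the webinar's method: Python threads replace the 128 R instances, and simple exponential smoothing replaces whatever forecasting model the R code used. The function names and sample series are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth_forecast(series, alpha=0.5):
    """One-step-ahead forecast by simple exponential smoothing (stand-in model)."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def forecast_all(series_by_product, workers=4):
    """Forecast every product independently and gather one result table."""
    products = sorted(series_by_product)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(smooth_forecast, (series_by_product[p] for p in products))
        return dict(zip(products, results))


# Made-up series, for illustration only.
forecasts = forecast_all({"A": [4.0, 8.0], "B": [2.0, 2.0, 2.0]})
print(forecasts)
```

Because no product's forecast depends on another's, the work divides cleanly across however many workers are available.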
R running in an MPP environment
Persistence Layer
Analytical Platform Layer
Application & Client Layer
Excel
All BI Tools
R running in an MPP environment
Persistence Layer
Analytical Platform Layer
Application & Client Layer
Excel
All BI Tools
13 views of different analytical output
R running in an MPP environment
Persistence Layer
Analytical Platform Layer
Application & Client Layer
Excel
All BI Tools
Result set contained # rows
12K forecasts and stats calculated in # seconds
2.9B EPOS items collated into time series in # seconds
Product Forecasting using parallel R
Demo
Thank you for your participation today
• More information on today’s topic can be found at:
– kognitio.com/mpp_r
– kognitio.com/product-forecasting
• FREE TO USE – perpetual license
– www.kognitio.com/free
– Contact us for the pre-release version 8
• Analyst White Papers
– EMA Comparative Analysis
– In-memory database platforms
– www.kognitio.com/emacompinmem
• Today’s slides (and more): www.slideshare.net/Kognitio
Connect
www.kognitio.com
twitter.com/kognitio
linkedin.com/companies/kognitio
tinyurl.com/kognitio
youtube.com/kognitio
NA: +1 855 KOGNITIO
EMEA: +44 1344 300 770

Product forecastingwebinar 20130417
