Data Science Innovations:
Natural Language Generation, Systems of Insight & Deep Learning
August 2017
Suresh Sood, PhD
@soody
suresh.sood@uts.edu.au
linkedin.com/in/sureshsood
Areas for Conversation
Data Science
Data Science Innovation(s)
Democratisation of big data
Gartner & Forrester Trends
• Natural Language Generation
• Systems of Insight
• Deep Learning
Vignettes in the two-step arrival of the internet of things and its reshaping of marketing management’s service-dominant logic
Woodside & Sood
Journal of Marketing Management, Volume 33, 2017, Issue 1-2: The Internet of Things (IoT) and Marketing: The State of Play, Future Trends and the Implications for Marketing
Statistics, Data Mining or Data Science?
• Statistics
– precise, deterministic causal analysis over precisely collected data
• Data Mining
– deterministic causal analysis over re-purposed, carefully sampled data
• Data Science
– trending/correlation analysis over existing data covering the bulk of the population, i.e. big data
– extraction of actionable knowledge directly from data through a process of discovery, hypothesis, and hypothesis testing
Adapted from: NIST Big Data taxonomy draft report
(see http://bigdatawg.nist.gov/show_InputDoc.php)
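The data science row above pairs correlation analysis with hypothesis testing. As a minimal sketch of that loop, using toy numbers invented purely for illustration, the Python below computes a Pearson correlation and then runs an exact one-sided permutation test: reordering one variable destroys any real association, so the share of orderings that match or beat the observed correlation estimates how likely that correlation is under chance alone.

```python
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def exact_permutation_test(x, y):
    """One-sided test of H0 'no positive association' by enumerating every ordering of y.

    Only feasible for tiny samples (n! orderings); larger samples would use random shuffles.
    """
    observed = pearson(x, y)
    orderings = list(permutations(y))
    at_least_as_extreme = sum(1 for p in orderings if pearson(x, p) >= observed)
    return observed, at_least_as_extreme / len(orderings)


# Toy data (hypothetical): four paired observations.
x = [1.0, 2.0, 3.0, 4.0]
y = [150.0, 170.0, 185.0, 210.0]
r, p_value = exact_permutation_test(x, y)
print(f"r = {r:.3f}, p = {p_value:.3f}")
```

With only four points the smallest achievable p-value is 1/24 (about 0.042), which is a useful reminder that a very high correlation on a handful of observations is still weak evidence.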
Useful References Big Data
• NIST Big Data Interoperability Framework (NBDIF) V1.0 Final Version (September 2015)
Big Data Definitions: http://dx.doi.org/10.6028/NIST.SP.1500-1
Big Data Taxonomies: http://dx.doi.org/10.6028/NIST.SP.1500-2
Big Data Use Cases and Requirements: http://dx.doi.org/10.6028/NIST.SP.1500-3
Big Data Security and Privacy: http://dx.doi.org/10.6028/NIST.SP.1500-4
Big Data Architecture White Paper Survey: http://dx.doi.org/10.6028/NIST.SP.1500-5
Big Data Reference Architecture: http://dx.doi.org/10.6028/NIST.SP.1500-6
Big Data Standards Roadmap: http://dx.doi.org/10.6028/NIST.SP.1500-7
• Apache Spark 2.1.0 Documentation
Machine Learning Library (MLlib) Guide http://spark.apache.org/docs/latest/ml-guide.html
GraphX Programming Guide http://spark.apache.org/docs/latest/graphx-programming-guide.html
SparkR (R on Spark) http://spark.apache.org/docs/latest/sparkr.html#sparkdataframe
Spark SQL, DataFrames and Datasets Guide http://spark.apache.org/docs/latest/sql-programming-guide.html
Data Science Innovation
Data science innovation is something
an organization has not done before or
even something nobody anywhere has
done before. A data science innovation
focuses on discovering and using new
or untraditional data sources to solve
new problems.
Adapted from:
Franks, B. (2012) Taming the Big Data Tidal Wave, p. 255, John Wiley & Sons
Data Science Algorithms
Companies are reimagining business processes with algorithms, and there is “evidence of significant, even exponential, business gains in customer engagement, cost & revenue performance”
Wilson, H., Alter, A. and Shukla, P. (2016), Companies Are Reimagining Business Processes with Algorithms, Harvard Business Review, February
Variety of Data Types & Big Data Challenge
1. Astronomical
2. Documents
3. Earthquake
4. Email
5. Environmental sensors
6. Fingerprints
7. Health (personal) Images
8. Graph data (social network)
9. Location
10. Marine
11. Particle accelerator
12. Satellite
13. Scanned survey data
14. Sound
15. Text
16. Transactions
17. Video
Big Data consists of extensive datasets, primarily in the characteristics of volume, variety, velocity, and/or variability, that require a scalable architecture for efficient storage, manipulation, and analysis.
Computational portability is the movement of the computation to the location of the data.
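Computational portability can be illustrated with a minimal sketch, assuming hypothetical in-memory partitions that stand in for data stored on separate nodes: each node reduces its own data to a small summary where the data lives, and only those summaries travel to be combined. Frameworks such as Spark apply the same idea at scale; the function names here are illustrative, not any framework's API.

```python
# Each inner list stands in for the data held on one (hypothetical) storage node.
partitions = [
    [12.0, 15.5, 9.25],
    [20.0, 18.0],
    [7.5],
]


def local_summary(values):
    """Runs next to the data: reduce a partition to a (count, sum, sum of squares) triple."""
    return len(values), sum(values), sum(v * v for v in values)


def combine(summaries):
    """Runs centrally: merge the small triples into a global mean and population variance."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    total_sq = sum(s[2] for s in summaries)
    mean = total / n
    variance = total_sq / n - mean ** 2
    return mean, variance


summaries = [local_summary(p) for p in partitions]  # only these triples cross the network
mean, variance = combine(summaries)
print(f"mean = {mean:.3f}, variance = {variance:.3f}")
```

Shipping three numbers per node instead of every record is what makes moving the computation cheaper than moving the data. The sum-of-squares shortcut is numerically fragile for very large values; merging per-partition means and variances would be the safer choice there.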
• The data collected in a single day would take nearly two million years to play back on an MP3 player
• Generates enough raw data to fill 15 million 64 GB iPods every day
• The central computer has the processing power of about one hundred million PCs
• Uses enough optical fiber linking up all the radio telescopes to wrap twice around the Earth
• The dishes, when fully operational, will produce 10 times the global internet traffic as of 2013
• The supercomputer will perform 10^18 operations per second (equivalent to the number of stars in three million Milky Way galaxies) in order to process all the data produced
• Sensitive enough to detect an airport radar on a planet 50 light years away
• Thousands of antennas with a combined collecting area of 1,000,000 square meters (1 sq km)
• Previous mapping of the Centaurus A galaxy took a team 12,000 hours of observations and several years; the SKA is expected to take about 5 minutes
To the scientists involved, however, the SKA is no testbed, it’s a transformative instrument which,
according to Luijten, will lead to “fundamental discoveries of how life and planets and matter all came
into existence. As a scientist, this is a once in a lifetime opportunity.”
Sources: http://bit.ly/amazin-facts & http://bit.ly/astro-ska
Galileo
Square Kilometer Array construction (SKA1: 2018-23; SKA2: 2023-30)
Centaurus A
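The SKA figures above can be cross-checked with back-of-the-envelope arithmetic. This sketch assumes a 128 kbit/s MP3 bitrate, decimal gigabytes (10^9 bytes) and a 365.25-day year; none of these assumptions come from the source.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumed Julian year
MP3_BYTES_PER_SECOND = 128_000 / 8      # assumed 128 kbit/s MP3 bitrate
EXABYTE = 1e18                          # decimal exabyte

# Claim 1: one day of data takes ~2 million years to play back as MP3.
mp3_bytes_per_day = 2e6 * SECONDS_PER_YEAR * MP3_BYTES_PER_SECOND

# Claim 2: one day of data fills 15 million 64 GB iPods.
ipod_bytes_per_day = 15e6 * 64e9

# Claim 3: 10^18 operations per second ~ stars in three million Milky Way galaxies.
implied_stars_per_galaxy = 1e18 / 3e6

print(f"MP3 claim:  {mp3_bytes_per_day / EXABYTE:.2f} EB per day")
print(f"iPod claim: {ipod_bytes_per_day / EXABYTE:.2f} EB per day")
print(f"Implied stars per galaxy: {implied_stars_per_galaxy:.2e}")
```

Under these assumptions the two daily-volume claims agree at roughly one exabyte per day, and the galaxy comparison implies about 330 billion stars per galaxy, inside the commonly quoted range of roughly 100 to 400 billion for the Milky Way.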
The following BigQuery query (legacy SQL syntax) retrieves GDELT Global Knowledge Graph records that tag Islamic State together with terrorism or suicide-attack themes; the trailing wildcard on "TAX_WEAPONS_SUICIDE_" catches suicide vests, suicide bombers, suicide bombings, suicide jackets, and so on:
SELECT DATE, DocumentIdentifier, SourceCommonName, V2Themes, V2Locations,
       V2Tone, SharingImage, TranslationInfo
FROM [gdelt-bq:gdeltv2.gkg]
WHERE (V2Themes LIKE '%TAX_TERROR_GROUP_ISLAMIC_STATE%'
       OR V2Themes LIKE '%TAX_TERROR_GROUP_ISIL%'
       OR V2Themes LIKE '%TAX_TERROR_GROUP_ISIS%'
       OR V2Themes LIKE '%TAX_TERROR_GROUP_DAASH%')
  AND (V2Themes LIKE '%TERROR%TERROR%'
       OR V2Themes LIKE '%SUICIDE_ATTACK%'
       OR V2Themes LIKE '%TAX_WEAPONS_SUICIDE_%')
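To make the filter logic of the query concrete, here is a minimal Python sketch that re-implements LIKE matching and applies the same two groups of patterns to V2Themes strings. It follows standard SQL LIKE semantics, in which % matches any run of characters and _ matches exactly one character, so the underscores inside these patterns are themselves wildcards (harmless here). The sample theme strings are hypothetical, written in an assumed semicolon-separated "THEME,offset;" layout.

```python
import re

GROUP_PATTERNS = [
    "%TAX_TERROR_GROUP_ISLAMIC_STATE%",
    "%TAX_TERROR_GROUP_ISIL%",
    "%TAX_TERROR_GROUP_ISIS%",
    "%TAX_TERROR_GROUP_DAASH%",
]
ATTACK_PATTERNS = ["%TERROR%TERROR%", "%SUICIDE_ATTACK%", "%TAX_WEAPONS_SUICIDE_%"]


def like(value: str, pattern: str) -> bool:
    """Standard SQL LIKE: '%' = any run of characters, '_' = exactly one character."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def matches_query(v2themes: str) -> bool:
    """Mirror the WHERE clause: (any group pattern) AND (any attack pattern)."""
    return any(like(v2themes, p) for p in GROUP_PATTERNS) and any(
        like(v2themes, p) for p in ATTACK_PATTERNS
    )


# Hypothetical V2Themes values in the assumed "THEME,offset;" layout.
samples = {
    "vest": "TAX_TERROR_GROUP_ISLAMIC_STATE,102;TAX_WEAPONS_SUICIDE_VEST,340;",
    "markets": "TAX_TERROR_GROUP_ISLAMIC_STATE,15;ECON_STOCKMARKET,88;",
    "double_terror": "TAX_TERROR_GROUP_ISIS,10;TERROR,57;",
}
for name, themes in samples.items():
    print(name, matches_query(themes))
```

The third sample shows that, because every group theme already contains TERROR once, %TERROR%TERROR% effectively asks for at least one further terror-related theme.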
The GDELT Project pushes the boundaries of “big data,” weighing in at over a quarter-billion rows with 59 fields for each record,
spanning the geography of the entire planet, and covering a time horizon of more than 35 years. The GDELT Project is the largest
open-access database on human society in existence. Its archives contain nearly 400M latitude/longitude geographic coordinates
spanning over 12,900 days, making it one of the largest open-access spatio-temporal datasets as well.
GDELT + BigQuery = Query The Planet
Oil reserves shipment monitoring
Ras Tanura Najmah compound, Saudi Arabia
Source: http://www.skyboximaging.com/blog/monitoring-oil-reserves-from-space
https://nodexl.codeplex.com/
Sherman and Young (2016), When Financial Reporting Still Falls Short, Harvard Business Review, July-August
Sood (2015), Truth, Lies and Brand Trust: The Deceit Algorithm, http://datafication.com.au/
New Analytical Tools Can Help
Deception Algorithm
(1) Self words, e.g. “I” and “me”, decrease when someone distances themselves from the content
(2) Exclusive words, e.g. “but” and “or”, decrease in fabricated content owing to the complexity of maintaining a deception
(3) Negative emotion words, e.g. “hate”, increase owing to feelings of shame or guilt
(4) Motion verbs, e.g. “go” or “move”, increase as exclusive words go down, to keep the story on track
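A minimal sketch of how the four cues could be scored, assuming tiny illustrative word lists (published work of this kind relies on much larger, validated dictionaries, and the Sood (2015) algorithm itself is not reproduced here). Rates are per 100 words, and a text is scored against a baseline from the same writer by counting how many cues move in the direction listed above for fabricated content.

```python
import re

# Illustrative word lists only; they are NOT the dictionaries used by deception research.
CUES = {
    "self": {"i", "me", "my", "mine", "myself"},
    "exclusive": {"but", "or", "except", "without"},
    "negative_emotion": {"hate", "angry", "guilty", "ashamed", "sad"},
    "motion": {"go", "went", "move", "moved", "walk", "walked", "drive", "drove"},
}
# Direction each cue is expected to move in fabricated text: -1 = decrease, +1 = increase.
EXPECTED_SHIFT = {"self": -1, "exclusive": -1, "negative_emotion": 1, "motion": 1}


def cue_rates(text: str) -> dict:
    """Occurrences of each cue category per 100 words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {name: 0.0 for name in CUES}
    return {
        name: 100 * sum(token in words for token in tokens) / len(tokens)
        for name, words in CUES.items()
    }


def deception_signals(text: str, baseline: dict) -> int:
    """Number of cues (0-4) that moved in the 'fabrication' direction versus a baseline."""
    rates = cue_rates(text)
    return sum(
        1 for name, sign in EXPECTED_SHIFT.items() if sign * (rates[name] - baseline[name]) > 0
    )


baseline = cue_rates("I went to the store but I did not buy anything")
print(baseline)
print(deception_signals("We hate it. We went there, we moved on.", baseline))
```

A count of 4 means every cue shifted the way the slide predicts for fabricated content.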
Coronary Heart Disease, Psychological Science, January 2015
The findings show that expressions of negative emotions such as anger, stress, and fatigue in the tweets
from people in a given county were associated with higher heart disease risk in that county.
On the other hand, expressions of positive emotions like excitement and optimism were associated with
lower risk.
The results suggest that using Twitter as a window into a community’s collective mental state may provide a
useful tool in epidemiology…So predictions from Twitter can actually be more accurate than using a set of
traditional variables.
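The county-level idea can be sketched as an aggregation step, assuming a hypothetical list of (county, tweet text) pairs and a tiny illustrative negative-emotion lexicon; the published study used much larger validated dictionaries, which are not reproduced here. Pooling words per county yields one negativity rate per county, which could then be correlated with county heart-disease figures.

```python
import re
from collections import defaultdict

# Illustrative lexicon only (hypothetical), standing in for a validated emotion dictionary.
NEGATIVE_WORDS = {"hate", "angry", "stressed", "tired", "exhausted", "furious"}


def county_negativity(tweets):
    """tweets: iterable of (county, text). Returns negative-word share of all words, per county."""
    word_counts = defaultdict(int)
    negative_counts = defaultdict(int)
    for county, text in tweets:
        tokens = re.findall(r"[a-z']+", text.lower())
        word_counts[county] += len(tokens)
        negative_counts[county] += sum(token in NEGATIVE_WORDS for token in tokens)
    return {
        county: negative_counts[county] / total
        for county, total in word_counts.items()
        if total > 0
    }


tweets = [
    ("County A", "I hate this traffic"),
    ("County A", "so tired today"),
    ("County B", "great game tonight"),
]
print(county_negativity(tweets))
```

Pooling words rather than averaging per-tweet rates gives more weight to prolific posters; averaging per user first would be an alternative design.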
http://www.analyzewords.com
Gartner (2017), 2017 Hype Cycle for Data Science and Machine Learning, 29 July, http://www.gartner.com/document/3772081
Gartner, Strategic Predictions for 2017 and Beyond, research note, 14 October, http://www.gartner.com/document/3471568
By 2020-22:
• 100 million consumers shop in augmented reality
• 30% of web browsing sessions without a screen
• Algorithms positively alter behavior of over 1B
• Blockchain-based business worth $10B
• IoT will save consumers/businesses $1T a year
• 40% of employees cut healthcare costs via fitness trackers
Smart Data Discovery Will Enable New Class of Citizen Data Scientist
“With the addition of NLG [Natural Language Generation], smart data discovery platforms automatically present
a written or spoken context-based narrative of findings in the data that, alongside the visualization, inform the
user about what is most important for them to act on in the data.”
Gartner, 29 June, 2015
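Commercial NLG platforms are far richer, but the core step the quote describes, turning computed findings into a ranked written narrative, can be sketched with simple templates. The metric names, figures and period below are hypothetical.

```python
def describe_change(metric: str, previous: float, current: float, period: str) -> str:
    """Render one finding as a sentence using a fixed template."""
    if previous == 0:
        raise ValueError("previous value must be non-zero to express a percentage change")
    pct = (current - previous) / abs(previous) * 100
    if abs(pct) < 0.05:
        return f"{metric} was essentially unchanged in {period} at {current:,.0f}."
    verb = "rose" if pct > 0 else "fell"
    return f"{metric} {verb} {abs(pct):.1f}% in {period}, from {previous:,.0f} to {current:,.0f}."


def narrate(findings: dict, period: str) -> str:
    """Order findings by relative size of change so the most actionable one is told first."""
    def magnitude(item):
        _, (previous, current) = item
        return abs((current - previous) / previous) if previous else 0.0

    ranked = sorted(findings.items(), key=magnitude, reverse=True)
    return " ".join(describe_change(m, prev, cur, period) for m, (prev, cur) in ranked)


findings = {"Returns": (1_200, 1_140), "Revenue": (80_000, 90_000)}
print(narrate(findings, "Q2"))
```

Ranking by relative change is one simple stand-in for "what is most important for them to act on"; a real platform would weigh business context as well.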
Systems of Insight
• Automated pattern extraction
• Outlier detection
• Correlation
• Time series
• Analytics integration with process, app or IoT
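Automated pattern extraction and correlation can be sketched together: rather than an analyst choosing which variables to compare, the code scans every pair of columns and reports the strongest linear relationship. The table and column names are hypothetical, and Pearson correlation is only one of many measures a System of Insight might use.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def strongest_pair(table):
    """table: column name -> list of values. Returns (col_a, col_b, r) with the largest |r|."""
    best = None
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if best is None or abs(r) > abs(best[2]):
            best = (a, b, r)
    return best


table = {
    "ad_spend": [1, 2, 3, 4, 5],
    "site_visits": [10, 20, 30, 40, 50],
    "returns": [5, 3, 4, 1, 2],
}
print(strongest_pair(table))
```

Scanning all pairs grows quadratically with the number of columns, and with many columns some strong correlations will appear by chance, so flagged pairs are candidates for testing rather than findings.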
https://ubereats.com/melbourne/
© 2017 FORRESTER. REPRODUCTION PROHIBITED.
Forrester Research, 2016
Traditional Analytics: Slow & Expensive (led by data analyst or scientist; 80% of time sifting through data)
Data Aggregation → Reports & Analysis → Visualisation & Interpretation → Write Data/Business “Story” → Insights
System of Insight (SoI): Fast & Cost Effective (SME owner, machine learning and natural language generation; 80% of time in decision making with client)
Data Aggregation or Data Set → Detect & Extract Patterns and Relationships → Generate Insights & Story → Operationalise (Process, Application, IoT)
Fusion of data science, business knowledge & creativity for maximum ROI
Outlier detection techniques “allow detecting a significant fraction of fraudulent cases…different in nature from historical fraud…resulting in a novel fraud pattern”
Baesens, B., Van Vlasselaer, V., and Verbeke, W., 2015, Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection, Wiley
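Because outlier detection needs no labelled examples of past fraud, it can surface cases unlike anything seen before. As a minimal sketch (not the specific techniques in Baesens et al.), the robust modified z-score flags values far from the median in units of the median absolute deviation (MAD), so a single extreme transaction cannot hide by inflating the spread. The 0.6745 constant and 3.5 cut-off are the conventional choices of Iglewicz and Hoaglin; the amounts are hypothetical.

```python
from statistics import median


def modified_z_outliers(values, threshold=3.5):
    """Indices whose modified z-score 0.6745 * |x - median| / MAD exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median; the score is undefined, so flag nothing.
        return []
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]


amounts = [12.5, 14.0, 13.2, 15.1, 12.9, 13.7, 250.0]  # hypothetical card transactions
print(modified_z_outliers(amounts))
```

A plain mean and standard deviation would be pulled toward the 250.0 transaction itself, which is why the median-based version is preferred for this kind of screening.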
Online tenure leads to more spending per customer
High engagement leads to more orders, more
categories purchased, and more spend
https://www.quillengage.com
Better customer experiences . . . and half the inventory-carrying costs of other online fashion retailers.
Forrester, 2016
The ANZ Heavy Traffic Index comprises
flows of vehicles weighing more than 3.5
tonnes (primarily trucks) on 11 selected
roads around NZ. It is contemporaneous
with GDP growth.
The ANZ Light Traffic Index is made up of
light or total traffic flows (primarily cars and
vans) on 10 selected roads around the
country. It gives a six-month lead on GDP
growth in normal circumstances (but
cannot predict sudden adverse events such
as the Global Financial Crisis).
http://www.anz.co.nz/about-us/economic-markets-research/truckometer/
ANZ TRUCKOMETER
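The claim that one indicator leads another can be checked with lagged correlation: shift the candidate leading series forward by k periods and see which k correlates best with the target. The series below are synthetic and quarterly, constructed so that "light traffic" leads "GDP growth" by exactly two quarters (six months); ANZ's actual methodology is not described in the source.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lagged_correlation(leading, target, lag):
    """Correlate leading[t] with target[t + lag]."""
    if lag == 0:
        return pearson(leading, target)
    return pearson(leading[:-lag], target[lag:])


def best_lead(leading, target, max_lag):
    """Lag (in periods) at which the leading series best correlates with the target."""
    return max(range(max_lag + 1), key=lambda k: lagged_correlation(leading, target, k))


light_traffic = [1, 3, 2, 5, 4, 6, 8, 7, 9, 10]   # synthetic quarterly index
gdp_growth = [0, 0] + light_traffic[:-2]           # synthetic: trails traffic by 2 quarters
print(best_lead(light_traffic, gdp_growth, max_lag=3))
```

As the slide notes, a lead found this way holds only in normal circumstances; no historical lag would have anticipated a shock like the Global Financial Crisis.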
Systems of Insight
• Helps move away from “crisis levels” in talent
• Reduces the traditional five-step analytics process to two steps from data to action
• Reimagines business processes through “machine engineering”
• Minimises messy data issues and data preparation time
Deep Learning Libraries, Platforms, APIs and Hardware
Next Step
Start using Data Science Innovations
Systems of Insight and innovative data sources
Natural Language Generation
Deep Learning
Data Science Resources
The future is impossible to predict.
However, one thing is certain:
The company that can excite its customers’ dreams
Is out ahead in the race to business success
Selling Dreams, Gian Luigi Longinotti

Contenu connexe

Tendances

wireless sensor network
wireless sensor networkwireless sensor network
wireless sensor network
parry prabhu
 
Smart Data for you and me: Personalized and Actionable Physical Cyber Social ...
Smart Data for you and me: Personalized and Actionable Physical Cyber Social ...Smart Data for you and me: Personalized and Actionable Physical Cyber Social ...
Smart Data for you and me: Personalized and Actionable Physical Cyber Social ...
Amit Sheth
 
Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...
Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...
Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...
Amit Sheth
 

Tendances (20)

NG2S: A Study of Pro-Environmental Tipping Point via ABMs
NG2S: A Study of Pro-Environmental Tipping Point via ABMsNG2S: A Study of Pro-Environmental Tipping Point via ABMs
NG2S: A Study of Pro-Environmental Tipping Point via ABMs
 
When Big Data and Predictive Analytics Collide: Visual Magic Happens
When Big Data and Predictive Analytics Collide: Visual Magic HappensWhen Big Data and Predictive Analytics Collide: Visual Magic Happens
When Big Data and Predictive Analytics Collide: Visual Magic Happens
 
Analytics and Data Mining Industry Overview
Analytics and Data Mining Industry OverviewAnalytics and Data Mining Industry Overview
Analytics and Data Mining Industry Overview
 
4차 산업혁명 시대의 싱크탱크의 변화(kdi)
4차 산업혁명 시대의 싱크탱크의 변화(kdi)4차 산업혁명 시대의 싱크탱크의 변화(kdi)
4차 산업혁명 시대의 싱크탱크의 변화(kdi)
 
wireless sensor network
wireless sensor networkwireless sensor network
wireless sensor network
 
A Novel Framework for Big Data Processing in a Data-driven Society
A Novel Framework for Big Data Processing in a Data-driven SocietyA Novel Framework for Big Data Processing in a Data-driven Society
A Novel Framework for Big Data Processing in a Data-driven Society
 
Stanford DeepDive Framework
Stanford DeepDive FrameworkStanford DeepDive Framework
Stanford DeepDive Framework
 
Smart Data for you and me: Personalized and Actionable Physical Cyber Social ...
Smart Data for you and me: Personalized and Actionable Physical Cyber Social ...Smart Data for you and me: Personalized and Actionable Physical Cyber Social ...
Smart Data for you and me: Personalized and Actionable Physical Cyber Social ...
 
Big dataprocessing cts2015
Big dataprocessing cts2015Big dataprocessing cts2015
Big dataprocessing cts2015
 
Big data and Predictive Analytics By : Professor Lili Saghafi
Big data and Predictive Analytics By : Professor Lili SaghafiBig data and Predictive Analytics By : Professor Lili Saghafi
Big data and Predictive Analytics By : Professor Lili Saghafi
 
future2020
future2020future2020
future2020
 
The What, Why and How of Big Data
The What, Why and How of Big DataThe What, Why and How of Big Data
The What, Why and How of Big Data
 
Data Science: Not Just For Big Data
Data Science: Not Just For Big DataData Science: Not Just For Big Data
Data Science: Not Just For Big Data
 
Big Data Challenges faced by Organizations
Big Data Challenges faced by OrganizationsBig Data Challenges faced by Organizations
Big Data Challenges faced by Organizations
 
Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...
Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...
Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...
 
Introduction to data science intro,ch(1,2,3)
Introduction to data science intro,ch(1,2,3)Introduction to data science intro,ch(1,2,3)
Introduction to data science intro,ch(1,2,3)
 
Smart Data and real-world semantic web applications (2004)
Smart Data and real-world semantic web applications (2004)Smart Data and real-world semantic web applications (2004)
Smart Data and real-world semantic web applications (2004)
 
Big DataParadigm, Challenges, Analysis, and Application
Big DataParadigm, Challenges, Analysis, and ApplicationBig DataParadigm, Challenges, Analysis, and Application
Big DataParadigm, Challenges, Analysis, and Application
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
 
Big Data 101
Big Data 101Big Data 101
Big Data 101
 

Similaire à Data science innovations

A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
ijcseit
 
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
ijcseit
 
PatternLanguageOfData
PatternLanguageOfDataPatternLanguageOfData
PatternLanguageOfData
kimErwin
 

Similaire à Data science innovations (20)

Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science  Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science
 
Foresight Analytics
Foresight AnalyticsForesight Analytics
Foresight Analytics
 
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
 
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
 
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
 
Spark Social Media
Spark Social Media Spark Social Media
Spark Social Media
 
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCESBROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
 
Bigdatacooltools
BigdatacooltoolsBigdatacooltools
Bigdatacooltools
 
Identifying semantics characteristics of user’s interactions datasets through...
Identifying semantics characteristics of user’s interactions datasets through...Identifying semantics characteristics of user’s interactions datasets through...
Identifying semantics characteristics of user’s interactions datasets through...
 
PatternLanguageOfData
PatternLanguageOfDataPatternLanguageOfData
PatternLanguageOfData
 
What Data Can Do: A Typology of Mechanisms . Angèle Christin
What Data Can Do: A Typology of Mechanisms . Angèle Christin What Data Can Do: A Typology of Mechanisms . Angèle Christin
What Data Can Do: A Typology of Mechanisms . Angèle Christin
 
Big data Paper
Big data PaperBig data Paper
Big data Paper
 
Ayasdi Case Study
Ayasdi Case StudyAyasdi Case Study
Ayasdi Case Study
 
Ayasdi: Demystifying the Unknown
Ayasdi: Demystifying the UnknownAyasdi: Demystifying the Unknown
Ayasdi: Demystifying the Unknown
 
JIMS Rohini IT Flash Monthly Newsletter - October Issue
JIMS Rohini IT Flash Monthly Newsletter  - October IssueJIMS Rohini IT Flash Monthly Newsletter  - October Issue
JIMS Rohini IT Flash Monthly Newsletter - October Issue
 
FACT - A New Way to Get News
FACT - A New Way to Get NewsFACT - A New Way to Get News
FACT - A New Way to Get News
 
Supervised Multi Attribute Gene Manipulation For Cancer
Supervised Multi Attribute Gene Manipulation For CancerSupervised Multi Attribute Gene Manipulation For Cancer
Supervised Multi Attribute Gene Manipulation For Cancer
 
Data and science
Data and scienceData and science
Data and science
 
Overview of Big Data, Data Science and Statistics, along with Digitalisation,...
Overview of Big Data, Data Science and Statistics, along with Digitalisation,...Overview of Big Data, Data Science and Statistics, along with Digitalisation,...
Overview of Big Data, Data Science and Statistics, along with Digitalisation,...
 

Plus de suresh sood

Netnography online course part 1 of 3 17 november 2016
Netnography online course part 1 of 3 17 november 2016Netnography online course part 1 of 3 17 november 2016
Netnography online course part 1 of 3 17 november 2016
suresh sood
 
Australian Business Culture
Australian Business Culture Australian Business Culture
Australian Business Culture
suresh sood
 
Transforming instagram data into location intelligence
Transforming instagram data into location intelligenceTransforming instagram data into location intelligence
Transforming instagram data into location intelligence
suresh sood
 

Plus de suresh sood (20)

Getting to the Edge of the Future - Tools & Trends of Foresight to Nowcasting
Getting to the Edge of the Future - Tools & Trends of Foresight to NowcastingGetting to the Edge of the Future - Tools & Trends of Foresight to Nowcasting
Getting to the Edge of the Future - Tools & Trends of Foresight to Nowcasting
 
Bigdata AI
Bigdata AI Bigdata AI
Bigdata AI
 
Data Science Innovations
Data Science InnovationsData Science Innovations
Data Science Innovations
 
Swarm jobs
Swarm jobsSwarm jobs
Swarm jobs
 
Netnography online course part 1 of 3 17 november 2016
Netnography online course part 1 of 3 17 november 2016Netnography online course part 1 of 3 17 november 2016
Netnography online course part 1 of 3 17 november 2016
 
Beyond dashboards
Beyond dashboardsBeyond dashboards
Beyond dashboards
 
Systemof insight
Systemof insightSystemof insight
Systemof insight
 
TPA
TPATPA
TPA
 
Datapreneurs
DatapreneursDatapreneurs
Datapreneurs
 
Future of jobs, big data & innovation
Future of jobs, big data & innovation Future of jobs, big data & innovation
Future of jobs, big data & innovation
 
Jobs Complexity
Jobs ComplexityJobs Complexity
Jobs Complexity
 
Spark
SparkSpark
Spark
 
Datainnovation
DatainnovationDatainnovation
Datainnovation
 
Bigdatahuman
BigdatahumanBigdatahuman
Bigdatahuman
 
Bigdataforesight
BigdataforesightBigdataforesight
Bigdataforesight
 
DBIA
DBIADBIA
DBIA
 
Australian Business Culture
Australian Business Culture Australian Business Culture
Australian Business Culture
 
Cool Tools
Cool Tools Cool Tools
Cool Tools
 
Transforming instagram data into location intelligence
Transforming instagram data into location intelligenceTransforming instagram data into location intelligence
Transforming instagram data into location intelligence
 
Crowdsourcing Social Media
Crowdsourcing Social Media Crowdsourcing Social Media
Crowdsourcing Social Media
 

Dernier

Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
AnaAcapella
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
KarakKing
 

Dernier (20)

Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 

Data science innovations

  • 1. Data Science Innovations: Natural Language Generation, Systems of Insight & Deep Learning August 2017 Suresh Sood, PhD @soody, suresh.sood@uts.edu.au linkedin.com/in/sureshsood
  • 2. Areas for Conversation Data Science Data Science Innovation (s) Democratisation of big data Gartner & Forrester Trends  Natural Language Generation  Systems of Insight  Deep Learning
  • 3. Vignettes in the two-step arrival of the internet of things and its reshaping of marketing management’s service-dominant logic Woodside & Sood Journal of Marketing Management Volume 33, 2017 - Issue 1-2: The Internet of Things (IoT) and Marketing: The State of Play, Future Trends and the Implications for Marketing
  • 4.
  • 5. Statistics, Data Mining or Data Science ? • Statistics –precise deterministic causal analysis over precisely collected data • Data Mining –deterministic causal analysis over re-purposed data carefully sampled • Data Science –trending/correlation analysis over existing data using bulk of population i.e. big data –Extraction of actionable knowledge directly from data through a process of discovery, hypothesis, and hypothesis testing. Adapted from: NIST Big Data taxonomy draft report : (see http://bigdatawg.nist.gov /show_InputDoc.php)
  • 6. Useful References Big Data • NIST Big Data interoperability Framework (NBDIF) V1.0 Final Version (September 2015) Big Data Definitions: http://dx.doi.org/10.6028/NIST.SP.1500-1 Big Data Taxonomies: http://dx.doi.org/10.6028/NIST.SP.1500-2 Big Data Use Cases and Requirements: http://dx.doi.org/10.6028/NIST.SP.1500-3 Big Data Security and Privacy: http://dx.doi.org/10.6028/NIST.SP.1500-4 Big Data Architecture White Paper Survey: http://dx.doi.org/10.6028/NIST.SP.1500-5 Big Data Reference Architecture: http://dx.doi.org/10.6028/NIST.SP.1500-6 Big Data Standards Roadmap: http://dx.doi.org/10.6028/NIST.SP.1500-7 • Apache Spark 2.1.0 Documentation Machine Learning Library (MLlib) Guide http://spark.apache.org/docs/latest/ml-guide.html GraphX Programming Guide http://spark.apache.org/docs/latest/graphx-programming-guide.html SparkR (R on Spark) http://spark.apache.org/docs/latest/sparkr.html#sparkdataframe Spark SQL, DataFrames and Datasets Guide http://spark.apache.org/docs/latest/sql-programming-guide.html
  • 7. Data Science Innovation Data science innovation is something an organization has not done before or even something nobody anywhere has done before. A data science innovation focuses on discovering and using new or untraditional data sources to solve new problems. Adapted from: Franks, B. (2012) Taming the Big Data Tidal Wave, p. 255, John Wiley & Son Data Science Algorithms Companies are reimagining Business Processes with Algorithms and there is “evidence of significant, even exponential, business gains in customer’s customer engagement, cost & revenue performance” Wilson, H., Alter A. and Shukla, P. (2016), Companies Are Reimagining Business Processes with Algorithms, Harvard Business Review, February
  • 8. Variety of Data Types & Big Data Challenge 1.Astronomical 2.Documents 3.Earthquake 4.Email 5.Environmental sensors 6.Fingerprints 7.Health (personal) Images 8.Graph data (social network) 9.Location 10.Marine 11.Particle accelerator 12.Satellite 13.Scanned survey data 14.Sound 15.Text 16.Transactions 17.Video Big Data consists of extensive datasets primarily in the characteristics of volume, variety, velocity, and/or variability that require a scalable architecture for efficient storage, manipulation, and analysis. . Computational portability is the movement of the computation to the location of the data.
  • 9. 
– The data collected in a single day would take nearly two million years to play back on an MP3 player
– Generates enough raw data to fill 15 million 64 GB iPods every day
– The central computer has the processing power of about one hundred million PCs
– Uses enough optical fiber linking up all the radio telescopes to wrap twice around the Earth
– The dishes, when fully operational, will produce 10 times the global internet traffic as of 2013
– The supercomputer will perform 10^18 operations per second (equivalent to the number of stars in three million Milky Way galaxies) in order to process all the data produced
– Sensitivity to detect an airport radar on a planet 50 light years away
– Thousands of antennas with a combined collecting area of 1,000,000 square meters (1 square kilometer)
– Previous mapping of the Centaurus A galaxy took a team 12,000 hours of observations and several years; the SKA is expected to take 5 minutes!
To the scientists involved, however, the SKA is no testbed; it’s a transformative instrument which, according to Luijten, will lead to “fundamental discoveries of how life and planets and matter all came into existence. As a scientist, this is a once in a lifetime opportunity.”
Sources: http://bit.ly/amazin-facts & http://bit.ly/astro-ska
Square Kilometer Array Construction (SKA1: 2018-23; SKA2: 2023-30)
[Images: Galileo; Centaurus A]
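The first two SKA bullets can be cross-checked against each other with back-of-the-envelope arithmetic. This is a minimal Python sketch under two assumptions that do not come from the slide: decimal gigabytes (1 GB = 10^9 bytes) and a typical MP3 bitrate of 128 kbit/s.

```python
# Back-of-the-envelope check of the SKA data-volume claims.
# Assumptions (not from the source): 1 GB = 10**9 bytes; MP3 encoded at 128 kbit/s.

IPODS_PER_DAY = 15_000_000          # "15 million 64GB iPods every day"
IPOD_CAPACITY_BYTES = 64 * 10**9    # 64 GB in decimal units
MP3_BITRATE_BITS_PER_S = 128_000    # assumed typical MP3 bitrate
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def daily_bytes() -> int:
    """Raw data volume per day implied by the iPod comparison."""
    return IPODS_PER_DAY * IPOD_CAPACITY_BYTES


def playback_years(num_bytes: float) -> float:
    """Years needed to play num_bytes back as MP3 audio at the assumed bitrate."""
    bytes_per_second = MP3_BITRATE_BITS_PER_S / 8
    return num_bytes / bytes_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    total = daily_bytes()
    print(f"Daily volume: {total / 10**18:.2f} EB")                          # 0.96 EB
    print(f"MP3 playback: {playback_years(total) / 1e6:.1f} million years")  # 1.9
```

Under these assumptions the two bullets agree: roughly 0.96 exabytes per day, which would take about 1.9 million years to play back, consistent with "nearly two million years".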
  • 10. GDELT + BigQuery = Query The Planet
The following BigQuery query selects articles tagged with an Islamic State theme and with a terrorism or suicide-attack theme (note that the wildcard on "TAX_WEAPONS_SUICIDE_" catches suicide vests, suicide bombers, suicide bombings, suicide jackets, and so on):
  SELECT DATE, DocumentIdentifier, SourceCommonName, V2Themes, V2Locations, V2Tone, SharingImage, TranslationInfo
  FROM [gdelt-bq:gdeltv2.gkg]
  WHERE (V2Themes LIKE '%TAX_TERROR_GROUP_ISLAMIC_STATE%'
      OR V2Themes LIKE '%TAX_TERROR_GROUP_ISIL%'
      OR V2Themes LIKE '%TAX_TERROR_GROUP_ISIS%'
      OR V2Themes LIKE '%TAX_TERROR_GROUP_DAASH%')
    AND (V2Themes LIKE '%TERROR%TERROR%'
      OR V2Themes LIKE '%SUICIDE_ATTACK%'
      OR V2Themes LIKE '%TAX_WEAPONS_SUICIDE_%')
The GDELT Project pushes the boundaries of “big data,” weighing in at over a quarter-billion rows with 59 fields for each record, spanning the geography of the entire planet, and covering a time horizon of more than 35 years. The GDELT Project is the largest open-access database on human society in existence. Its archives contain nearly 400M latitude/longitude geographic coordinates spanning over 12,900 days, making it one of the largest open-access spatio-temporal datasets as well.
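To see which theme strings the WHERE clause above keeps, here is a minimal Python sketch that translates LIKE patterns into regular expressions and applies the same AND/OR logic. It assumes standard SQL LIKE semantics (`%` matches any run of characters, `_` matches exactly one character). The function names and the sample V2Themes values in the test are hypothetical, made up for illustration, not real GDELT records.

```python
from __future__ import annotations

import re

GROUP_PATTERNS = [
    "%TAX_TERROR_GROUP_ISLAMIC_STATE%",
    "%TAX_TERROR_GROUP_ISIL%",
    "%TAX_TERROR_GROUP_ISIS%",
    "%TAX_TERROR_GROUP_DAASH%",
]
ATTACK_PATTERNS = ["%TERROR%TERROR%", "%SUICIDE_ATTACK%", "%TAX_WEAPONS_SUICIDE_%"]


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate an SQL LIKE pattern (assumed: % = any run, _ = one char) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """LIKE must match the whole value, hence fullmatch rather than search."""
    return like_to_regex(pattern).fullmatch(value) is not None


def matches_query(v2themes: str) -> bool:
    """Mirror of the WHERE clause: (any group pattern) AND (any attack pattern)."""
    has_group = any(like(v2themes, p) for p in GROUP_PATTERNS)
    has_attack = any(like(v2themes, p) for p in ATTACK_PATTERNS)
    return has_group and has_attack
```

One consequence worth noting: because the group theme itself contains "TERROR", the `%TERROR%TERROR%` pattern only needs one additional TERROR-bearing theme in the same article to match.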
  • 11. Oil reserves shipment monitoring: Ras Tanura Najmah compound, Saudi Arabia
Source: http://www.skyboximaging.com/blog/monitoring-oil-reserves-from-space
  • 13. New Analytical Tools Can Help
Sherman and Young (2016), When Financial Reporting Still Falls Short, Harvard Business Review, July-August
Sood (2015), Truth, Lies and Brand Trust: The Deceit Algorithm, http://datafication.com.au/
  • 14. Deception Algorithm
(1) Self words, e.g. “I” and “me”, decrease when someone distances themselves from the content
(2) Exclusive words, e.g. “but” and “or”, decrease in fabricated content owing to the complexity of maintaining a deception
(3) Negative emotion words, e.g. “hate”, increase owing to feelings of shame or guilt
(4) Motion verbs, e.g. “go” or “move”, increase as exclusive words decrease, to keep the story on track
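The four cues lend themselves to simple word counting. Below is a minimal Python sketch, assuming tiny hand-picked word lists (hypothetical, far smaller than any validated lexicon) and a known-truthful baseline text from the same writer to compare against; neither the word lists nor the baseline comparison come from the slide. It computes each category's rate per 100 words and flags whether the rate moved in the direction the slide associates with deception.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical, deliberately tiny word lists for illustration only.
CUE_WORDS = {
    "self": {"i", "me", "my", "mine", "myself"},
    "exclusive": {"but", "or", "except", "without"},
    "negative_emotion": {"hate", "angry", "ashamed", "guilty", "sad"},
    "motion": {"go", "goes", "went", "move", "moved", "walk", "walked"},
}


def cue_rates(text: str) -> dict[str, float]:
    """Occurrences of each cue category per 100 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {name: 0.0 for name in CUE_WORDS}
    counts = Counter(words)
    return {
        name: 100.0 * sum(counts[w] for w in vocab) / len(words)
        for name, vocab in CUE_WORDS.items()
    }


def deception_flags(text: str, baseline_text: str) -> dict[str, bool]:
    """Compare text with an assumed known-truthful baseline by the same writer."""
    r, b = cue_rates(text), cue_rates(baseline_text)
    return {
        "fewer_self_words": r["self"] < b["self"],                 # cue (1)
        "fewer_exclusive_words": r["exclusive"] < b["exclusive"],  # cue (2)
        "more_negative_emotion": r["negative_emotion"] > b["negative_emotion"],  # cue (3)
        "more_motion_verbs": r["motion"] > b["motion"],            # cue (4)
    }
```

Treat the flags as signals worth investigating rather than as proof of deception; short texts in particular give noisy rates.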
  • 15. Coronary Heart Disease (Psychological Science, January 2015)
The findings show that expressions of negative emotions such as anger, stress, and fatigue in the tweets from people in a given county were associated with higher heart disease risk in that county. On the other hand, expressions of positive emotions like excitement and optimism were associated with lower risk. The results suggest that using Twitter as a window into a community’s collective mental state may provide a useful tool in epidemiology… So predictions from Twitter can actually be more accurate than using a set of traditional variables.
  • 17. 2017 Hype Cycle for Data Science and Machine Learning, 29 July, http://www.gartner.com/document/3772081
Gartner (2017) Strategic Predictions for 2017 and Beyond, research note, 14 October, http://www.gartner.com/document/3471568
By 2020-22:
– 100 million consumers shop in augmented reality
– 30% of web browsing sessions take place without a screen
– Algorithms positively alter the behavior of over 1 billion workers
– A blockchain-based business is worth $10B
– IoT saves consumers and businesses $1T a year
– 40% of employees cut healthcare costs by wearing a fitness tracker
  • 18. “With the addition of NLG [Natural Language Generation], smart data discovery platforms automatically present a written or spoken context-based narrative of findings in the data that, alongside the visualization, inform the user about what is most important for them to act on in the data.” Gartner, 29 June, 2015 Smart Data Discovery Will Enable New Class of Citizen Data Scientist
  • 19. Systems of Insight
– Automated pattern extraction
– Outlier detection
– Correlation
– Time series
– Analytics integration with process, app or IoT
https://ubereats.com/melbourne/
  • 20. © 2017 FORRESTER. REPRODUCTION PROHIBITED. Forrester Research, 2016
  • 21. Traditional Analytics: Slow & Expensive (80% of time sifting through data)
Data Aggregation → Reports & Analysis → Visualisation & Interpretation → Write Data/Business “Story” → Insights
Led by Data Analyst or Scientist
System of Insight (SoI): Fast & Cost Effective (80% of time in decision making with client)
Data Aggregation or Data Set → Detect & Extract Patterns and Relationships → Generate Insights & Story → Operationalise (Process, Application, IoT)
Led by SME owner, Machine Learning and Natural Language Generation
Fusion of data science, business knowledge & creativity for maximum ROI
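The two SoI steps ("Detect & Extract Patterns and Relationships", then "Generate Insights & Story") can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the "pattern" is simply the segment with the largest percentage change between two periods, and the "story" is a fixed sentence template, a deliberately simple stand-in for commercial natural language generation platforms. `Finding`, `detect_pattern` and `generate_story` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Finding:
    segment: str
    previous: float
    current: float

    @property
    def pct_change(self) -> float:
        return 100.0 * (self.current - self.previous) / self.previous


def detect_pattern(previous: dict[str, float], current: dict[str, float]) -> Finding:
    """Step 1 (detect & extract): the segment with the largest absolute % change."""
    findings = [
        Finding(seg, previous[seg], current[seg])
        for seg in previous
        if seg in current and previous[seg] != 0
    ]
    if not findings:
        raise ValueError("no comparable segments")
    return max(findings, key=lambda f: abs(f.pct_change))


def generate_story(f: Finding, metric: str = "revenue") -> str:
    """Step 2 (generate insight & story): a template-based narrative sentence."""
    if f.pct_change == 0:
        # The largest move is zero, so no segment changed at all.
        return f"No segment's {metric} changed."
    direction = "rose" if f.pct_change > 0 else "fell"
    return (
        f"{f.segment} {metric} {direction} {abs(f.pct_change):.1f}% "
        f"(from {f.previous:,.0f} to {f.current:,.0f}), "
        "the largest move of any segment."
    )


if __name__ == "__main__":
    finding = detect_pattern({"North": 100.0, "South": 200.0},
                             {"North": 110.0, "South": 150.0})
    print(generate_story(finding))
```

With the toy figures in the usage line, North rises 10% and South falls 25%, so the sketch reports South. Real systems would rank many candidate patterns and vary the wording.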
  • 22. Outlier detection: “allow detecting a significant fraction of fraudulent cases…different in nature from historical fraud…resulting in a novel fraud pattern”
Baesens, B., Van Vlasselaer, V., and Verbeke, W. (2015), Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection, Wiley
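As an illustration of unsupervised outlier detection on a single feature, here is a minimal Python sketch using the modified z-score (based on the median and the median absolute deviation), a common robust technique. It is a generic, swapped-in method, not the approach described by Baesens et al.; the 3.5 cut-off is a conventional rule of thumb, and the transaction amounts in the test are hypothetical.

```python
from __future__ import annotations

import statistics
from typing import Sequence


def modified_z_scores(values: Sequence[float]) -> list[float]:
    """Robust z-scores: 0.6745 * (x - median) / MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values are identical, so the score is undefined;
        # this simple sketch then reports no outliers at all.
        return [0.0] * len(values)
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values: Sequence[float], threshold: float = 3.5) -> list[int]:
    """Indices of values whose modified z-score exceeds the threshold."""
    return [i for i, z in enumerate(modified_z_scores(values)) if abs(z) > threshold]


if __name__ == "__main__":
    amounts = [52.0, 48.0, 50.0, 51.0, 49.0, 47.0, 53.0, 950.0]
    print(flag_outliers(amounts))
```

The median and MAD are used instead of the mean and standard deviation because a single extreme transaction would otherwise inflate the spread and partly hide itself.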
  • 23. Online tenure leads to more spending per customer. High engagement leads to more orders, more categories purchased, and more spend.
https://www.quillengage.com
  • 24. Better customer experiences... and half the inventory-carrying costs of other online fashion retailers.
Forrester, 2016
  • 25. ANZ TRUCKOMETER
The ANZ Heavy Traffic Index comprises flows of vehicles weighing more than 3.5 tonnes (primarily trucks) on 11 selected roads around NZ. It is contemporaneous with GDP growth.
The ANZ Light Traffic Index is made up of light or total traffic flows (primarily cars and vans) on 10 selected roads around the country. It gives a six-month lead on GDP growth in normal circumstances (but cannot predict sudden adverse events such as the Global Financial Crisis).
http://www.anz.co.nz/about-us/economic-markets-research/truckometer/
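One way to check a claimed lead such as "six months ahead of GDP growth" is to correlate the indicator with GDP at several lags and pick the lag with the strongest correlation. The minimal Python sketch below assumes two equal-length monthly series and uses a hand-rolled Pearson correlation. The toy data is synthetic, built so that the "GDP" series follows the "traffic" series by exactly six steps; the function names are hypothetical, and this is not ANZ's published methodology.

```python
from __future__ import annotations

import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(leading: Sequence[float], lagging: Sequence[float], lag: int) -> float:
    """Correlate leading[t] with lagging[t + lag]."""
    if len(leading) != len(lagging):
        raise ValueError("series must have equal length")
    return pearson(leading[: len(leading) - lag], lagging[lag:])


def best_lead(leading: Sequence[float], lagging: Sequence[float], max_lag: int) -> int:
    """Lag (0..max_lag) at which the leading series correlates best with the lagging one."""
    return max(range(max_lag + 1), key=lambda k: lagged_correlation(leading, lagging, k))


def make_toy_series(n: int = 48, lead: int = 6) -> tuple[list[float], list[float]]:
    """Synthetic data: gdp[t + lead] == traffic[t] by construction."""
    base = [math.sin(t / 3.0) + 0.05 * t for t in range(n + lead)]
    traffic = base[lead:]  # traffic[t] = base[t + lead]
    gdp = base[:n]         # gdp[t] = base[t]
    return traffic, gdp


if __name__ == "__main__":
    traffic, gdp = make_toy_series()
    print(best_lead(traffic, gdp, max_lag=12))
```

On real data the peak is far less sharp, and, as the slide notes, a historical lead says nothing about sudden shocks such as the Global Financial Crisis.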
  • 26. Systems of Insight
– Helps move away from “crisis levels” in talent
– Traditional 5-step analytics process reduced to 2 steps, from data to action
– Reimagine business processes through “machine engineering”
– Minimise messy data issues and data preparation time
  • 27. Deep Learning Libraries, Platforms, APIs and Hardware
  • 28. Next Step
Start using Data Science Innovations:
– Systems of Insight and innovative data sources
– Natural Language Generation
– Deep Learning
  • 30. The future is impossible to predict. However, one thing is certain: the company that can excite its customers’ dreams is out ahead in the race to business success.
Selling Dreams, Gian Luigi Longinotti