#TOSMAC
Toronto SMAC Meetup – Welcome!
An Intro to Text Analytics on Big Data with a use case
Toronto SMAC Team
Lucas Silva, Felipe Mosquetta, Marcos de Mello
| © 2014 IBM Corporation
Twitter's numbers
As you know:
- 500 million Tweets are sent per day.
- Twitter supports 35+ languages.
- 255 million monthly active users.
Huge amount of data!
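To put these numbers in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it uses the two figures quoted above and assumes tweets are spread evenly over the day, which real traffic is not.

```python
# Figures quoted on the slide (2014).
TWEETS_PER_DAY = 500_000_000
MONTHLY_ACTIVE_USERS = 255_000_000
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: an even spread over the day; real traffic has peaks and troughs.
tweets_per_second = TWEETS_PER_DAY / SECONDS_PER_DAY
tweets_per_user_per_day = TWEETS_PER_DAY / MONTHLY_ACTIVE_USERS

print(f"{tweets_per_second:,.0f} tweets per second on average")
print(f"{tweets_per_user_per_day:.2f} tweets per monthly active user per day")
```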
Overview
Section1 Section2 Section3 Section4 Section5
Let’s get started!
Input data
Section2
Demo
Next section
Extractor: used to extract structured information from unstructured and semi-structured data.
AQL: Annotation Query Language. Rule language with familiar SQL-like syntax.
Profiler: used for troubleshooting performance problems.
Main concepts
Types of extraction specifications:
- Dictionaries
- Regular expressions
- Part of speech
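As an illustration of the first specification type, dictionary matching can be sketched in a few lines of Python. This is not AQL and not the tooling shown in the demo; the function name `dictionary_matches` and the sample dictionary are hypothetical, chosen only for this example.

```python
import re

def dictionary_matches(text, entries):
    """Return (begin, end, matched_text) for every dictionary entry found in text.

    Matching is case-insensitive and restricted to whole words.
    """
    # Longest entries first so that multi-word entries win over their prefixes.
    ordered = sorted(entries, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(e) for e in ordered) + r")\b",
        re.IGNORECASE,
    )
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]

# Hypothetical dictionary of unit words.
UNITS = ["million", "thousand", "billion"]
sample = "IBM raised 4 thousand; Twitter raised 7.5 Million"
matches = dictionary_matches(sample, UNITS)
for begin, end, word in matches:
    print(begin, end, word)
```

Each match keeps its character offsets, so later steps can reason about where annotations sit in the text and whether they overlap.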
Example (regular expressions), matching numbers:
- 7.5
- 4
- 13
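A regular expression that matches the numbers above might look like the following Python sketch. The pattern is an assumption made for illustration (integers with an optional decimal part); the slides do not show the actual expression used in the demo.

```python
import re

# Assumed pattern: one or more digits, optionally followed by a decimal part.
NUMBER = re.compile(r"\d+(?:\.\d+)?")

def find_numbers(text):
    """Return every substring of text that looks like an integer or decimal."""
    return NUMBER.findall(text)

numbers = find_numbers("7.5 4 13")
print(numbers)
```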
Demo
AQL Guidelines
Basic feature AQL statements
- Develop the core building blocks of the extractor.
Candidate generation AQL statements
- Combine the results of basic feature AQL statements.
Example candidates:
- $7.5 million
- $4 thousand
- $ 7.5 million
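The idea of candidate generation, combining basic features into larger matches, can be sketched in Python as follows. This is a hedged illustration rather than the AQL from the demo: the three building blocks (currency symbol, number, unit dictionary) and the name `candidate_amounts` are assumptions made for this example.

```python
import re

# Basic features (assumed for illustration).
CURRENCY = r"\$"                           # currency symbol
NUMBER = r"\d+(?:\.\d+)?"                  # integer or decimal
UNITS = ["million", "thousand", "billion"]  # small unit dictionary
UNIT = "(?:" + "|".join(UNITS) + ")"

# Candidate generation: symbol, optional whitespace, number, whitespace, unit.
AMOUNT = re.compile(rf"{CURRENCY}\s?{NUMBER}\s{UNIT}", re.IGNORECASE)

def candidate_amounts(text):
    """Return every substring that combines the three basic features in order."""
    return [m.group(0) for m in AMOUNT.finditer(text)]

candidates = candidate_amounts(
    "Raised $7.5 million, then $4 thousand and $ 7.5 million more"
)
print(candidates)
```

Allowing optional whitespace after the currency symbol is what lets both `$7.5 million` and `$ 7.5 million` qualify as candidates.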
Filter and consolidate AQL statements
- Refine results
- Remove invalid annotations
- Resolve overlap between annotations.
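A minimal Python sketch of the filter and consolidate step is shown below. It assumes annotations are character spans, treats empty spans as invalid, and resolves overlap by keeping the longest span; these are only one possible set of choices, and the `Span`, `filter_valid` and `consolidate` names are hypothetical.

```python
from typing import Callable, List, NamedTuple

class Span(NamedTuple):
    begin: int  # inclusive character offset
    end: int    # exclusive character offset
    text: str

def filter_valid(spans: List[Span], is_valid: Callable[[Span], bool]) -> List[Span]:
    """Remove annotations that fail the validity check."""
    return [s for s in spans if is_valid(s)]

def consolidate(spans: List[Span]) -> List[Span]:
    """Resolve overlap by keeping the longest span in each overlapping group."""
    kept: List[Span] = []
    # Visit the longest spans first; keep a span only if it overlaps nothing kept.
    for s in sorted(spans, key=lambda s: (-(s.end - s.begin), s.begin)):
        if all(s.end <= k.begin or s.begin >= k.end for k in kept):
            kept.append(s)
    return sorted(kept, key=lambda s: s.begin)

# Annotations over the text "$7.5 million", including one empty (invalid) span.
raw = [
    Span(1, 4, "7.5"),
    Span(5, 12, "million"),
    Span(0, 12, "$7.5 million"),
    Span(20, 20, ""),
]
result = consolidate(filter_valid(raw, lambda s: s.end > s.begin))
print(result)
```

Here the number and unit spans are dropped because the full amount covers them, leaving a single annotation.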
Demo
Conclusion
Checkpoint
What we have done
Section1 Section2 Section3
What are we going to do?
Section4 Section5
Demo
Also using R
Demo
So what?
Companies
Exporting to you
Thank you!
Let's network!