IBM Insight 2015 - 1824 - Using Bluemix and dashDB for Twitter Analysis

© 2015 IBM Corporation
Using Bluemix and dashDB for Twitter Analysis
Session # 1824
Torsten Steinbach @torsstei

Please Note:
• IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.
• Information regarding potential future products is intended to outline our general product direction and it should not be relied on in
making a purchasing decision.
• The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any
material, code or functionality. Information about potential future products may not be incorporated into any contract.
• The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a
controlled environment. The actual throughput or performance that any user will experience will vary
depending upon many factors, including considerations such as the amount of multiprogramming in the
user’s job stream, the I/O configuration, the storage configuration, and the workload processed.
Therefore, no assurance can be given that an individual user will achieve results similar to those stated
here.

IBM Insights for Twitter Service in Bluemix

• Query exactly the data that your social application needs.
• Get IBM analytics enrichments in addition to base Twitter data.
• Whenever needed, check whether previously received Tweets are still valid (compliance).
• Ingest, enrich, curate, govern Decahose data over time.
• Receive & process compliance events.

Social Application using the IBM Insights for Twitter Service
[Diagram labels: Twitter GNIP APIs; IBM Insights for Twitter System on Softlayer (Twitter data enriched through IBM Analytics; store and index up to 2-year history of enriched Tweets, point-in-time compliant; PowerTrack collection rules & filters); IBM Insights for Twitter Service: search over enriched Decahose data; Social Application]

Queries
The query language mimics the Gnip PowerTrack query language; a subset of PowerTrack operators is available. See the documentation in Bluemix as we roll out more query operators.

Boolean Operators
Operator precedence: "-" is stronger than "AND", and "AND" is stronger than "OR". You can (and should) use parentheses to make operator precedence explicit.
Example: ibm twitter -(lame OR boring) searches for tweets that contain both the terms "ibm" and "twitter" but neither "lame" nor "boring".
• term1 AND term2
  Examples: cat dog; cat AND dog; #cutecat food
  Description: Returns tweets that contain both term1 and term2. Whitespace between two terms is treated as AND, so the operator can be omitted.
• term1 OR term2
  Example: #money OR broke
  Description: Returns tweets that contain either term1 or term2.
• -term1
  Example: ibm -apple
  Description: Returns tweets that do not contain term1.

Query terms
All of the following query terms can be freely combined with the boolean operators introduced above, e.g. ibm apple followers_count:500
• keyword
  Example: cat
  Description: Matches tweets that have "keyword" in their body. The search is case-insensitive.
• "exact phrase match"
  Example: "cats and dogs"
  Description: Matches tweets that contain the exact keyword sequence <"exact", "phrase", "match">.
• #hashtag
  Example: #insight2014
  Description: Matches tweets with the hashtag "#hashtag".
• from:twitterHandle
  Example: from:alexlang11
  Description: Returns tweets from authors with the preferredUsername twitterHandle. Must not contain the @ sign.
• followers_count:lower
  followers_count:lower,upper
  Example: followers_count:500
  Description: Matches tweets of authors that have at least "lower" followers. The upper bound is optional and both limits are inclusive.
• posted:startTime
  posted:startTime,endTime
  Example: posted:2014-12-01T00:00:00Z,2014-12-12T00:00:00Z
  Description: Matches tweets that have been posted at or after "startTime". The "endTime" bound is optional and inclusive. Timestamps have to be in one of the following two formats: "yyyy-mm-dd" or "yyyy-mm-dd'T'HH:MM:SS'Z'". Timezone is UTC.
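
The operators above compose as plain strings. The following is a minimal Python sketch of that composition, not part of the service: the helper names (`all_of`, `any_of`, `none_of`, `followers_count`, `posted`) are hypothetical, and the code only builds the query text. It always parenthesizes OR groups, following the advice to make precedence explicit, and it assumes timezone-aware `datetime` values.

```python
from datetime import datetime, timezone
from typing import Optional


def all_of(*terms: str) -> str:
    """AND: whitespace between terms is treated as AND."""
    return " ".join(terms)


def any_of(*terms: str) -> str:
    """OR group, parenthesized so that precedence is explicit."""
    return "(" + " OR ".join(terms) + ")"


def none_of(*terms: str) -> str:
    """Negated OR group, e.g. -(lame OR boring)."""
    return "-" + any_of(*terms)


def followers_count(lower: int, upper: Optional[int] = None) -> str:
    """followers_count:lower or followers_count:lower,upper (inclusive)."""
    if upper is None:
        return f"followers_count:{lower}"
    return f"followers_count:{lower},{upper}"


def _stamp(ts: datetime) -> str:
    # Assumes a timezone-aware datetime; the service expects UTC.
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def posted(start: datetime, end: Optional[datetime] = None) -> str:
    """posted:startTime or posted:startTime,endTime."""
    if end is None:
        return f"posted:{_stamp(start)}"
    return f"posted:{_stamp(start)},{_stamp(end)}"


query = all_of("ibm", "twitter", none_of("lame", "boring"), followers_count(500))
print(query)
```

Keeping the builders as pure string functions makes the resulting query easy to log and to paste into the count endpoint before running a full search.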

Count: /messages/count?q=QUERY
• Use to find out how many Tweets match a given query

HTTP code 200
• Number of results at json_path("search.results")
• URL to retrieve documents at json_path("related.search.href")
• Note: add your client_id and your client_secret to this URL
• Example response:
  {
    "search": { "results": 21695 },
    "related": { "search": { "href": "https://server.bluemix.net/api/v1/messages/search?q=ibm" } }
  }

HTTP code 4xx
• There was a problem with your query. Please have a look at json_path("error") to identify the problem.

HTTP code 5xx
• There was a problem with the service. Please have a look at json_path("error") and contact support.
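
As an illustration of reading the count response, here is a small Python sketch. It works on an already received status code and body, so no HTTP client or credentials are involved; the function name `parse_count_response` is hypothetical, and the only thing assumed about the error payload is that it carries an "error" field as the slide states.

```python
import json
from typing import Tuple


def parse_count_response(status: int, body: str) -> Tuple[int, str]:
    """Return (number of matching Tweets, URL of the related search)."""
    doc = json.loads(body)
    if status != 200:
        # 4xx: problem with the query; 5xx: problem with the service.
        raise RuntimeError(f"HTTP {status}: {doc.get('error')}")
    return doc["search"]["results"], doc["related"]["search"]["href"]


# Example response taken from the slide.
sample = """
{
  "search": { "results": 21695 },
  "related": { "search": { "href": "https://server.bluemix.net/api/v1/messages/search?q=ibm" } }
}
"""

count, href = parse_count_response(200, sample)
print(count, href)
```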

Search: /messages/search?q=QUERY&size=NUMBER
• Search & retrieve <= NUMBER Tweets matching QUERY

HTTP code 200
• Number of overall results at json_path("search.results")
• First batch of results at json_path("tweets")
• URL to retrieve the next batch of documents (if available) at json_path("related.next.href")
• Note: add your client_id and your client_secret to this URL
• Example response (abbreviated):
  { "search": { "results": 16283624 },
    "tweets": [ { "message": {
        …
        "body": "this is a nice tweet"
        …
        "actor": { "followersCount": 456,
                   "displayName": "IBM Tweeter"
        …
        "cde": {
          "sentiment": { "polarity": "POSITIVE" …
          "author": { "gender": "male" …
  }

HTTP code 4xx
• There was a problem with your query. Please have a look at json_path("error") to identify the problem.

HTTP code 5xx
• There was a problem with the service. Please have a look at json_path("error") and contact support.
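
Retrieving more than one batch means following json_path("related.next.href") until it is absent. The sketch below shows that loop in Python under stated assumptions: `iter_tweets` is a hypothetical name, and `fetch` is a caller-supplied function that performs the authenticated GET and returns the parsed JSON document, so the sketch itself makes no network calls.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_tweets(
    first_url: str,
    fetch: Callable[[str], Dict],
    limit: Optional[int] = None,
) -> Iterator[Dict]:
    """Yield tweets batch by batch, following related.next.href."""
    url: Optional[str] = first_url
    seen = 0
    while url:
        page = fetch(url)
        for tweet in page.get("tweets", []):
            yield tweet
            seen += 1
            if limit is not None and seen >= limit:
                return
        # The next-batch link is only present if more results are available.
        url = page.get("related", {}).get("next", {}).get("href")


# Stand-in for the service: two canned pages keyed by URL.
pages = {
    "page1": {
        "search": {"results": 3},
        "tweets": [{"message": {"body": "a"}}, {"message": {"body": "b"}}],
        "related": {"next": {"href": "page2"}},
    },
    "page2": {"tweets": [{"message": {"body": "c"}}]},
}

bodies = [t["message"]["body"] for t in iter_tweets("page1", pages.__getitem__)]
print(bodies)
```

Passing `fetch` in as a parameter keeps credential handling out of the loop and makes the pagination logic testable without the service.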

Example Queries
• Get Tweets about an upcoming movie for a given time frame to sense interest & reactions to trailer:
  search?q="posted:2015-02-01T00:00:00Z AND #starwars"&size=5
• Get Tweets with positive/negative sentiment about a product to learn what customers like / dislike about the product:
  search?q="IBM Bluemix sentiment:positive"
• Get Tweets about a product being marketed and compare over time to sense audience reaction to the campaign:
  search?q="posted:2015-02-01T00:00:00Z,2015-02-15T00:00:00Z AND #IBM"
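
When these queries are sent from code, characters such as "#", ":" and spaces need percent-encoding; an unencoded "#" would otherwise start a URL fragment and cut the query short. A minimal Python sketch follows. The function name `search_url` is hypothetical, the base URL is the placeholder host from the example response earlier in the deck, and it is an assumption that the double quotes around the queries above are slide notation and not part of the query.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def search_url(base: str, query: str, size: Optional[int] = None) -> str:
    """Build a /messages/search URL with a percent-encoded query."""
    params = {"q": query}
    if size is not None:
        params["size"] = size
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return f"{base}/messages/search?" + urlencode(params, quote_via=quote)


url = search_url(
    "https://server.bluemix.net/api/v1",
    "posted:2015-02-01T00:00:00Z AND #starwars",
    size=5,
)
print(url)
```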

Built-in Tool to load Tweets to dashDB

Predictive Analytics With R In dashDB 1/3
• Built-in R runtime & RStudio
• ibmdbR package
  ▪ Data frames logically representing data physically residing in dashDB tables
> library(ibmdbR)
> con <- idaConnect("BLUDB", "", "")
> idaInit(con)
> sysusage <- ida.data.frame('DB2INST1.SHOWCASE_SYSUSAGE')
> systems <- ida.data.frame('DB2INST1.SHOWCASE_SYSTEMS')
> systypes <- ida.data.frame('DB2INST1.SHOWCASE_SYSTYPES')
  ▪ Push down of R data preparation to dashDB
> sysusage2 <- sysusage[sysusage$MEMUSED>50000,c("MEMUSED","USERS")]
> mergedSys <- idaMerge(systems, systypes, by='TYPEID')
> mergedUsage <- idaMerge(sysusage2, mergedSys, by='SID')
  ▪ Push down of analytic algorithms to in-db execution
> lm1 <- idaLm(MEMUSED~USERS, mergedUsage)
[Diagram labels: dashDB, R Runtime, ibmdbR, RStudio, Browser, Any R Runtime, REST Client]
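
The idea behind a data frame that only logically represents a table can be shown with a toy example. The Python sketch below is not how ibmdbR is implemented; `LazyTable` is a hypothetical class that records a projection and a filter and turns them into one SQL statement, so that no rows leave the database until the statement is executed. The predicate is passed as raw SQL text for brevity, which a real library would not do.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LazyTable:
    """Logical handle on a database table; it holds no rows itself."""

    name: str
    columns: Optional[Tuple[str, ...]] = None  # None means all columns
    predicate: Optional[str] = None            # raw SQL condition (illustrative only)

    def select(self, *cols: str) -> "LazyTable":
        """Record a projection without touching the data."""
        return LazyTable(self.name, cols, self.predicate)

    def where(self, predicate: str) -> "LazyTable":
        """Record a filter, AND-ing it with any earlier one."""
        if self.predicate is not None:
            predicate = f"({self.predicate}) AND ({predicate})"
        return LazyTable(self.name, self.columns, predicate)

    def to_sql(self) -> str:
        """Render the recorded operations as one SQL statement."""
        cols = "*" if self.columns is None else ", ".join(f'"{c}"' for c in self.columns)
        sql = f"SELECT {cols} FROM {self.name}"
        if self.predicate is not None:
            sql += f" WHERE {self.predicate}"
        return sql


# Counterpart of: sysusage[sysusage$MEMUSED>50000, c("MEMUSED","USERS")]
sysusage2 = (
    LazyTable("DB2INST1.SHOWCASE_SYSUSAGE")
    .where('"MEMUSED" > 50000')
    .select("MEMUSED", "USERS")
)
print(sysusage2.to_sql())
```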
REST

▪ Dynamite-native implementation of statistical functions
  • colnames, cor, cov, dim, head, length, max, mean, min, names, print, sd, summary, var
▪ Logically derived columns pushed down to Dynamite
> myDF <- ida.data.frame('DB2INST1.SHOWCASE_SYSUSAGE')
> myDF$MemPerUser <- myDF$MEMUSED / myDF$USERS
▪ Sampling of tables in Dynamite
> idaSample(myDF, 3)
  SID                       DATE USERS MEMUSED ALERT MemPerUser
1   8 2014-02-14 23:39:00.000000    34    5015     f        147
2   5 2014-01-22 07:52:00.000000    96   11512     f        119
3   7 2013-09-12 05:17:00.000000    39    5592     t        143
▪ Statistics about tables in Dynamite
> summary(myDF)
     SID            USERS             MEMUSED            ALERT         MemPerUser
Min.   :0.000  Min.   :  3.000  Min.   :  350.000  f   :3655563  Min.   :105.000
1st Qu.:2.000  1st Qu.: 35.000  1st Qu.: 5113.000  t   :1344437  1st Qu.:135.000
Median :4.500  Median : 64.000  Median : 9455.000  NA's:     NA  Median :150.000
Mean   :   NA  Mean   :     NA  Mean   :       NA                Mean   :     NA
3rd Qu.:7.000  3rd Qu.:111.000  3rd Qu.:16517.000                3rd Qu.:165.000
Max.   :9.000  Max.   :347.000  Max.   :62379.000                Max.   :209.000
▪ Statistics about categorical values
> idaTable(myDF)
ALERT
      f       t
3655563 1344437
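
The MemPerUser values in the sampled rows can be checked by hand. They match MEMUSED divided by USERS with the fractional part dropped, which is what one would expect if the division runs in the database on integer columns. That explanation is an inference from the three printed rows, not something the slide states. A short Python check, with the hypothetical helper `mem_per_user`:

```python
def mem_per_user(memused: int, users: int) -> int:
    """Integer division, as the sampled output suggests."""
    return memused // users


# (USERS, MEMUSED, MemPerUser) copied from the idaSample output.
sample_rows = [(34, 5015, 147), (96, 11512, 119), (39, 5592, 143)]

for users, memused, shown in sample_rows:
    exact = memused / users
    print(users, memused, round(exact, 2), mem_per_user(memused, users), shown)
```

For the first row the exact quotient is 147.5, so rounding to the nearest integer would not reproduce the displayed 147, while truncation does.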

▪ Store R objects in Dynamite database
> myPrivateObjects <- ida.list(type='private')
> myPrivateObjects['series100'] <- 1:100
> x <- myPrivateObjects['series100']
> x
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
[23] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
[45] 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
[67] 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
[89] 89 90 91 92 93 94 95 96 97 98 99 100
> names(myPrivateObjects)
[1] "series100"
> myPrivateObjects['series100'] <- NULL
▪ Manage Dynamite tables
> idaExistTable('DB2INST1.SHOWCASE_SYSUSAGE')
[1] TRUE
> idaShowTables()
    Schema                   Name    Owner Type
1 BLUADMIN      R_OBJECTS_PRIVATE BLUADMIN    T
2 BLUADMIN R_OBJECTS_PRIVATE_META BLUADMIN    T
3 BLUADMIN       R_OBJECTS_PUBLIC BLUADMIN    T
4 BLUADMIN  R_OBJECTS_PUBLIC_META BLUADMIN    T
> myView <- idaCreateView(myDF)
> idaIsView(myView)
[1] TRUE
> idaDropView(myView)
> idaIsView(myView)
[1] FALSE

Running R in dashDB via REST API
▪ Create your R script with RStudio
  • Storing it in home dir inside dashDB
▪ POST <dashdb-server>/dashdb-api/rscript/<fileName>
  • Run the specified R script
▪ GET <dashdb-server>/dashdb-api/home
  • List all files under user home (recursively)
  – E.g. list the output written by your R script
▪ GET <dashdb-server>/dashdb-api/home/<fileName>
  • Download the specified file
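
The three REST calls can be prepared from Python with the standard library alone. The sketch below only constructs the request objects and does not send them; the function names and the host `https://dashdb.example.com` are hypothetical, and authentication is left out because the slide does not describe it.

```python
from urllib.parse import quote
from urllib.request import Request


def run_script_request(server: str, file_name: str) -> Request:
    """POST .../dashdb-api/rscript/<fileName>: run the specified R script."""
    return Request(f"{server}/dashdb-api/rscript/{quote(file_name)}", method="POST")


def list_home_request(server: str) -> Request:
    """GET .../dashdb-api/home: list all files under the user home."""
    return Request(f"{server}/dashdb-api/home", method="GET")


def download_request(server: str, file_name: str) -> Request:
    """GET .../dashdb-api/home/<fileName>: download the specified file."""
    return Request(f"{server}/dashdb-api/home/{quote(file_name)}", method="GET")


server = "https://dashdb.example.com"  # hypothetical placeholder for <dashdb-server>
for req in (
    run_script_request(server, "analysis.R"),
    list_home_request(server),
    download_request(server, "output/result.csv"),
):
    print(req.get_method(), req.full_url)
```

Each request could then be sent with `urllib.request.urlopen(req)` once the credentials required by the target server have been added.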

Predictive Analytics With Python In dashDB
• Bluemix Analytic Notebooks
• ibmdbPy package
  ▪ https://pypi.python.org/pypi/ibmdbpy
  ▪ Data frames logically representing data physically residing in dashDB tables
from ibmdbpy import IdaDataBase, IdaDataFrame
idadb = IdaDataBase(dsn="BLUDB")  # assumption: an ODBC data source named BLUDB is configured
idadf = IdaDataFrame(idadb, "IRIS", indexer="ID")
idadf = idadf[["ID", "sepal_length", "sepal_width"]]
idadf['new'] = idadf['sepal_width'] + idadf['sepal_length'].mean()
idadf.head()
  ▪ Push down of analytic algorithms to in-db execution
from ibmdbpy.learn import KMeans
kmeans = KMeans(3)  # clustering with 3 clusters
kmeans.fit_predict(idadf).head()
[Diagram labels: dashDB, Analytics for Spark Notebook in Bluemix, Browser, Any Python Runtime, ibmdbPy]

Loading Twitter Data to dashDB with Bluemix App
Show Case for box office analysis with Twitter:
www.youtube.com/watch?v=9yVNwOs9L4c
Twitter loader app for dashDB: hub.jazz.net/project/torsstei/Twitter-Loader/overview
(www.youtube.com/watch?v=ANakSSGM4zU)

Movie Analysis Show Case
https://hub.jazz.net/project/torsstei/movie-analysis
• Public map data for US counties: https://www.census.gov/geo/maps-data/data/tiger-line.html
• Box office stats from the-numbers.com
• Tweets about movies from Bluemix service
• In Bluemix: dashDB service for analytics and correlation between Tweets and box office data
• dashDB: analysis using built-in R & RStudio
• Interactive app for visualization using Node.js and D3.js library


Populating dashDB with Data
[Diagram: data sources feeding dashDB]
• Cloud Storage: S3, Swift
• On Premise Databases
• Geodata in Esri Shapefiles
• Mobile App Data in Cloudant
• GeoJSON
• CSVs
• Bluemix: Twitter, The Weather Company
• Open Data: data.gc.ca, data.gov, data.gov.uk, datahub.io, openAFRICA

The Weather Company Data Loader Bluemix App

dashDB: Key Use Cases
DR in the Cloud
• Minimize capital expense of DR solution

We Bring Netezza Compatible Analytic Platform to the Cloud
• Analytic Extension Framework: UDX C++ API; AE Framework (In-DB R, In-DB LUA, In-DB Python, In-DB Perl)
• Canned Analytics:
  ▪ OLAP Functions: ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, STDDEV, COVAR, …
  ▪ Linear Regression, Kmeans Clustering, Decision Tree, Association Rules, Naive Bayes
  ▪ Spatial Operators: Contains, Touches, Within, Intersects, Crosses, Overlaps
• Application Integration: R Wrapper, Watson Analytics, ESRI ArcGIS Connector, …
• Analytics Applications of ISVs and Customers

Analytics of Warehouse Data
This is where we start from: All analytic processing done on application side
• Data pulled out and processed in analytic application
[Diagram labels: Analytic Applications; legend: Analytic Code & Algorithms, Analytic Data]

Accelerate Analytics for Warehouse Data
Push Down Step 1: BLU tables only logically represented in analytic application
• Simple data lookup & massage operations pushed down as SQL operations
• Benefit: Acceleration with no SQL skills required
[Diagram labels: Analytic Applications, SQLs; legend: Analytic Code & Algorithms, Analytic Data]

Push Down Step 2: Typical and popular algorithms pushed down to canned UDFs in the db
• Call built-in functions via SQL to execute typical algorithms inside db
• Benefit: Bring Standard Analytics to the Data
[Diagram labels: Analytic Applications, Cloud Tooling, SQLs, Canned Algorithms; legend: Analytic Code & Algorithms, Analytic Data]

Push Down Step 3: Execute entire customer analytic programs inside the db
• Deploy customer code and call via special SQL function interfaces
• Benefit: Bring Custom Analytics to the Data
[Diagram labels: Analytic Applications, SQLs, Canned Algorithms, Language Framework (UDX & AE); legend: Analytic Code & Algorithms, Analytic Data]

We Value Your Feedback!
Don’t forget to submit your Insight session and speaker feedback! Your feedback is very important to us. We use it to continually improve the conference.
Access your surveys at insight2015survey.com to quickly submit your surveys from your smartphone, laptop or conference kiosk.

Notices and Disclaimers
Copyright © 2015 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form
without written permission from IBM.
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.
Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for
accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to
update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO
EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO,
LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted
according to the terms and conditions of the agreements under which they are provided.
Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.
Performance data contained herein was generally obtained in a controlled, isolated environments. Customer examples are presented as
illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other
results in other operating environments may vary.
References in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or services
available in all countries in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the
views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or
other guidance or advice to any individual participant or their specific situation.
It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the
identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the
customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will
ensure that the customer is in compliance with any law.

Notices and Disclaimers (con’t)
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly
available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance,
compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the
suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to
interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights,
trademarks or other intellectual property right.
•IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DB2® , DOORS®, Emptoris®, Enterprise
Document Management System™, FASP®, FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM
SmartCloud®, IBM Social Business®, IMS™, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON,
OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®,
pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, StoredIQ, Tealeaf®,
Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International
Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or
other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at:
www.ibm.com/legal/copytrade.shtml.
