2. Please Note:
• IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.
• Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.
• The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.
• The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a
controlled environment. The actual throughput or performance that any user will experience will vary
depending upon many factors, including considerations such as the amount of multiprogramming in the
user’s job stream, the I/O configuration, the storage configuration, and the workload processed.
Therefore, no assurance can be given that an individual user will achieve results similar to those stated
here.
5. Social Application using the IBM Insights for Twitter Service
[Architecture diagram connecting the Twitter GNIP APIs, the IBM Insights for Twitter System on SoftLayer, the IBM Insights for Twitter Service, and a Social Application]
• Twitter GNIP APIs: PowerTrack collection rules & filters.
• IBM Insights for Twitter System on SoftLayer: ingest, enrich, curate, and govern Decahose data over time; receive & process compliance events; store and index up to a 2-year history of enriched Tweets, point-in-time compliant.
• IBM Insights for Twitter Service: search over enriched Decahose data (Twitter data enriched through IBM Analytics).
• Social Application: query exactly the data that your social application needs; get IBM analytics enrichments in addition to base Twitter data; whenever needed, check whether previously received Tweets are still valid (compliance).
6. Queries
Boolean Operators
Operator precedence: "-" binds more tightly than "AND", and "AND" binds more tightly than "OR". You can (and should) use parentheses to make operator precedence explicit.
Example: ibm twitter -(lame OR boring) searches for tweets that contain both the terms "ibm" and "twitter" but neither "lame" nor "boring".
• term1 AND term2: Returns tweets that contain both term1 and term2. Whitespace between two terms is treated as AND, so the operator can be omitted. Examples: cat dog; cat AND dog; #cutecat food
• term1 OR term2: Returns tweets that contain term1, term2, or both. Example: #money OR broke
• -term1: Returns tweets that do not contain term1. Example: ibm -apple

Query terms
All of the following query terms can be freely combined with the boolean operators introduced above, e.g. ibm apple followers_count:500
• keyword: Matches tweets that have "keyword" in their body. The search is case-insensitive. Example: cat
• "exact phrase match": Matches tweets that contain the exact keyword sequence <"exact", "phrase", "match">. Example: "cats and dogs"
• #hashtag: Matches tweets with the hashtag "#hashtag". Example: #insight2014
• from:twitterHandle: Returns tweets from authors with the preferredUsername twitterHandle. The handle must not contain the @ sign. Example: from:alexlang11
• followers_count:lower or followers_count:lower,upper: Matches tweets of authors that have at least "lower" followers. The upper bound is optional, and both limits are inclusive. Example: followers_count:500
• posted:startTime or posted:startTime,endTime: Matches tweets that were posted at or after "startTime". The "endTime" bound is optional and inclusive. Timestamps are in UTC and must use one of two formats: "yyyy-mm-dd" or "yyyy-mm-dd'T'HH:MM:SS'Z'". Example: posted:2014-12-01T00:00:00Z,2014-12-12T00:00:00Z

The query language mimics the Gnip PowerTrack query language; a subset of the PowerTrack operators is available. See the documentation in Bluemix as we roll out more query operators.
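To make the precedence rules concrete, here is a minimal Python sketch of an evaluator for the text-matching part of this syntax: keywords, exact phrases, hashtags, "-", AND, OR, and parentheses. It is a hypothetical illustration, not the service's implementation. The names `Parser` and `matches` are made up, keywords are assumed to match whole words case-insensitively, and metadata operators such as from:, followers_count: and posted: are not handled (this sketch would simply treat them as words).

```python
import re

# Query tokens: parentheses, the "-" prefix, quoted phrases, or bare words.
TOKEN_RE = re.compile(r'\(|\)|-|"[^"]*"|[^\s()"]+')
# Tweet words: an optional leading "#" followed by word characters.
WORD_RE = re.compile(r"#?\w+")


def words(text):
    """Lower-cased word tokens of a tweet body (hashtags keep their '#')."""
    return [w.lower() for w in WORD_RE.findall(text)]


class Parser:
    """Recursive-descent parser: '-' binds tighter than AND, AND tighter than OR."""

    def __init__(self, query):
        self.tokens = TOKEN_RE.findall(query)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        node = self.or_expr()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return node

    def or_expr(self):
        # OR has the lowest precedence.
        nodes = [self.and_expr()]
        while self.peek() == "OR":
            self.take()
            nodes.append(self.and_expr())
        return ("OR", nodes) if len(nodes) > 1 else nodes[0]

    def and_expr(self):
        # An explicit AND and plain whitespace both mean AND.
        nodes = [self.unary()]
        while self.peek() not in (None, "OR", ")"):
            if self.peek() == "AND":
                self.take()
            nodes.append(self.unary())
        return ("AND", nodes) if len(nodes) > 1 else nodes[0]

    def unary(self):
        tok = self.take()
        if tok == "-":
            # Negation has the highest precedence.
            return ("NOT", self.unary())
        if tok == "(":
            node = self.or_expr()
            if self.take() != ")":
                raise ValueError("missing ')'")
            return node
        if tok in (None, "AND", "OR", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return ("TERM", tok)


def _eval(node, toks):
    kind, arg = node
    if kind == "AND":
        return all(_eval(n, toks) for n in arg)
    if kind == "OR":
        return any(_eval(n, toks) for n in arg)
    if kind == "NOT":
        return not _eval(arg, toks)
    if arg.startswith('"'):
        # Exact phrase: the word sequence must appear contiguously.
        phrase = words(arg.strip('"'))
        n = len(phrase)
        return any(toks[i:i + n] == phrase for i in range(len(toks) - n + 1))
    return arg.lower() in toks  # keyword or #hashtag, case-insensitive


def matches(node, text):
    """True if the tweet body `text` satisfies the parsed query `node`."""
    return _eval(node, words(text))
```

For example, `Parser("a OR b c").parse()` groups as `a OR (b AND c)`, because AND binds more tightly than OR; parentheses override that grouping exactly as the slide recommends.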
7. Count: /messages/count?q=QUERY
• Use to find out how many Tweets match a given query.
• HTTP 200: The number of results is at json_path("search.results"), and the URL to retrieve the matching documents is at json_path("related.search.href"). Note: add your client_id and your client_secret to this URL. Example response:
  {
    "search": { "results": 21695 },
    "related": {
      "search": { "href": "https://server.bluemix.net/api/v1/messages/search?q=ibm" }
    }
  }
• HTTP 4xx: There was a problem with your query. Please have a look at json_path("error") to identify the problem.
• HTTP 5xx: There was a problem with the service. Please have a look at json_path("error") and contact support.
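The status-code handling of the count call can be sketched in Python as follows. This is a minimal, hypothetical helper (`interpret_count_response` and `InsightsError` are made-up names): it only interprets an already received status code and JSON body, and leaves sending and authenticating the request to the caller.

```python
import json


class InsightsError(Exception):
    """Hypothetical error type for 4xx/5xx responses."""


def _error_text(body):
    # Error bodies are expected to carry an "error" field; fall back to raw text.
    try:
        return json.loads(body).get("error")
    except (ValueError, AttributeError):
        return body


def interpret_count_response(status, body):
    """Return (number_of_results, search_href) for a /messages/count response."""
    if status == 200:
        payload = json.loads(body)
        count = payload["search"]["results"]
        href = payload["related"]["search"]["href"]  # add client_id/client_secret before use
        return count, href
    if 400 <= status < 500:
        raise InsightsError(f"problem with the query: {_error_text(body)}")
    if 500 <= status < 600:
        raise InsightsError(f"problem with the service, contact support: {_error_text(body)}")
    raise InsightsError(f"unexpected HTTP status {status}")
```

Raising on 4xx and 5xx keeps the two failure classes from the slide distinct: a 4xx usually means the query must be fixed, a 5xx that retrying or contacting support is the right move.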
8. Search: /messages/search?q=QUERY&size=NUMBER
• Searches for and retrieves up to NUMBER Tweets matching QUERY.
• HTTP 200: The overall number of results is at json_path("search.results"), and the first batch of results is at json_path("tweets"). The URL to retrieve the next batch of documents (if available) is at json_path("related.next.href"). Note: add your client_id and your client_secret to this URL. Example response (excerpt):
  { "search": { "results": 16283624 },
  "tweets": [ { "message": {
  …
  "body": "this is a nice tweet"
  …
  "actor": { "followersCount": 456,
  "displayName": "IBM Tweeter"
  …
  "cde": {
  "sentiment": { "polarity": "POSITIVE" …
  "author": { "gender": "male" …
  }
• HTTP 4xx: There was a problem with your query. Please have a look at json_path("error") to identify the problem.
• HTTP 5xx: There was a problem with the service. Please have a look at json_path("error") and contact support.
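Paging through search results by following related.next.href can be sketched as a generator. This is a hypothetical Python illustration: `iter_tweets`, `polarity`, and the caller-supplied `fetch` function are made-up names, `fetch` is assumed to GET a URL (adding client_id and client_secret) and return parsed JSON, and each tweet entry is assumed to carry its enrichments under a "cde" key.

```python
from typing import Callable, Iterator, Optional


def iter_tweets(first_page: dict, fetch: Callable[[str], dict],
                max_pages: int = 100) -> Iterator[dict]:
    """Yield tweets from a /messages/search response, following
    related.next.href until no further batch is available."""
    page = first_page
    for _ in range(max_pages):  # guard against endless paging
        yield from page.get("tweets", [])
        next_href = page.get("related", {}).get("next", {}).get("href")
        if not next_href:
            return
        page = fetch(next_href)


def polarity(tweet: dict) -> Optional[str]:
    """Sentiment enrichment of one tweet, e.g. "POSITIVE", if present."""
    return tweet.get("cde", {}).get("sentiment", {}).get("polarity")
```

Injecting `fetch` keeps HTTP and credential handling out of the paging logic, which also makes the loop easy to exercise with canned pages.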
9. Example Queries
• Get Tweets about an upcoming movie for a given time frame to sense interest in and reactions to the trailer:
search?q=posted:2015-02-01T00:00:00Z AND #starwars&size=5
• Get Tweets with positive or negative sentiment about a product to learn what customers like or dislike about it:
search?q=IBM Bluemix sentiment:positive
• Get Tweets about a product being marketed and compare them over time to sense audience reaction to the campaign:
search?q=posted:2015-02-01T00:00:00Z,2015-02-15T00:00:00Z AND #IBM
The query must be URL-encoded before it is sent; in particular, "#" must be sent as %23, or it is treated as the start of a URL fragment.
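Building these request paths by hand is error-prone, mainly because of the timestamp format and URL encoding. The Python sketch below uses only the standard library; `posted_clause` and `search_path` are hypothetical helper names, naive datetimes are assumed to be UTC already, and the base URL of your service instance is left out because it depends on your Bluemix credentials.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import quote, urlencode

TIMESTAMP_FMT = "%Y-%m-%dT%H:%M:%SZ"  # the yyyy-mm-dd'T'HH:MM:SS'Z' form, in UTC


def _utc(dt: datetime) -> datetime:
    # Assumption: naive datetimes are already in UTC.
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def posted_clause(start: datetime, end: Optional[datetime] = None) -> str:
    """Build a posted:startTime[,endTime] query term."""
    clause = "posted:" + _utc(start).strftime(TIMESTAMP_FMT)
    if end is not None:
        clause += "," + _utc(end).strftime(TIMESTAMP_FMT)
    return clause


def search_path(query: str, size: Optional[int] = None) -> str:
    """Path and query string for /messages/search, percent-encoded."""
    params: dict = {"q": query}
    if size is not None:
        params["size"] = size
    # quote encodes ":", ",", "#" and spaces (as %20).
    return "/messages/search?" + urlencode(params, quote_via=quote)
```

For the campaign example, `search_path(posted_clause(datetime(2015, 2, 1), datetime(2015, 2, 15)) + " AND #IBM")` yields a path in which "#IBM" appears as %23IBM.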
12. Predictive Analytics With R In dashDB 1/3
• Built-in R runtime & RStudio
• ibmdbR package
Data frames logically representing data physically residing in dashDB tables
> con <- idaConnect("BLUDB", "", "")
> idaInit(con)
> sysusage<-ida.data.frame('DB2INST1.SHOWCASE_SYSUSAGE')
> systems<-ida.data.frame('DB2INST1.SHOWCASE_SYSTEMS')
> systypes<-ida.data.frame('DB2INST1.SHOWCASE_SYSTYPES')
Push down of R data preparation to dashDB
> sysusage2 <- sysusage[sysusage$MEMUSED>50000, c("SID","MEMUSED","USERS")]
> mergedSys<-idaMerge(systems, systypes, by='TYPEID')
> mergedUsage<-idaMerge(sysusage2, mergedSys, by='SID')
Push down of analytic algorithms to in-db execution
> lm1 <- idaLm(MEMUSED~USERS, mergedUsage)
[Diagram: RStudio in the browser, or any R runtime, uses ibmdbR to work with the R runtime in dashDB; a REST client can access it via REST.]
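The point of idaLm and the pushed-down filter is that the heavy lifting happens inside the database and only a small result comes back to R. The Python sketch below illustrates that idea with the standard-library sqlite3 module standing in for dashDB: an ordinary least-squares fit of MEMUSED on USERS is computed from five SQL aggregates, so a single summary row leaves the database. This is not how ibmdbR implements idaLm; `pushed_down_fit` and the SYSUSAGE table are hypothetical.

```python
import sqlite3


def pushed_down_fit(con, table, y, x, where=None):
    """Fit y = intercept + slope * x by ordinary least squares.

    Only aggregate sums are computed in the database. Table and column
    names are interpolated directly, so they must come from trusted code.
    """
    sql = (f"SELECT COUNT(*), SUM({x}), SUM({y}), SUM({x} * {x}), SUM({x} * {y}) "
           f"FROM {table}")
    if where:
        sql += f" WHERE {where}"  # the row filter is pushed down as well
    n, sx, sy, sxx, sxy = con.execute(sql).fetchone()
    if n < 2:
        raise ValueError("need at least two rows")
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x has no variance")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope
```

With a table where MEMUSED is exactly 100 * USERS + 50, `pushed_down_fit(con, "SYSUSAGE", "MEMUSED", "USERS", where="MEMUSED > 500")` returns an intercept of 50 and a slope of 100, while only one row crosses the connection.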
13. Predictive Analytics With R In dashDB 2/3
Dynamite-native implementation of statistical functions
• colnames, cor, cov, dim, head, length, max, mean, min, names, print, sd, summary, var
Logically derived columns pushed down to Dynamite
> myDF <- ida.data.frame('DB2INST1.SHOWCASE_SYSUSAGE')
> myDF$MemPerUser <- myDF$MEMUSED / myDF$USERS
Sampling of tables in Dynamite
> idaSample(myDF, 3)
SID DATE USERS MEMUSED ALERT MemPerUser
1 8 2014-02-14 23:39:00.000000 34 5015 f 147
2 5 2014-01-22 07:52:00.000000 96 11512 f 119
3 7 2013-09-12 05:17:00.000000 39 5592 t 143
Statistics about tables in Dynamite
> summary(myDF)
SID USERS MEMUSED ALERT MemPerUser
Min. :0.000 Min. : 3.000 Min. : 350.000 f :3655563 Min. :105.000
1st Qu.:2.000 1st Qu.: 35.000 1st Qu.: 5113.000 t :1344437 1st Qu.:135.000
Median :4.500 Median : 64.000 Median : 9455.000 NA's: NA Median :150.000
Mean : NA Mean : NA Mean : NA Mean : NA
3rd Qu.:7.000 3rd Qu.:111.000 3rd Qu.:16517.000 3rd Qu.:165.000
Max. :9.000 Max. :347.000 Max. :62379.000 Max. :209.000
Statistics about categorical values
> idaTable(myDF)
ALERT
f t
3655563 1344437
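A logically derived column such as `myDF$MemPerUser <- myDF$MEMUSED / myDF$USERS` can be pictured as a stored SQL expression that is only evaluated when data is actually requested. The Python sketch below shows that pattern with sqlite3 as a stand-in database; `LazyTable` is a hypothetical class for illustration, not part of ibmdbR or ibmdbPy.

```python
import sqlite3


class LazyTable:
    """Table handle that records derived columns as SQL expressions
    instead of computing them client-side."""

    def __init__(self, con, table):
        self.con = con
        self.table = table
        # sqlite3 sets cursor.description even when no rows match.
        cur = con.execute(f"SELECT * FROM {table} LIMIT 0")
        self.columns = {d[0]: d[0] for d in cur.description}

    def derive(self, name, expr):
        # Nothing is computed here; the expression is kept for later SQL.
        self.columns[name] = expr
        return self

    def sql(self, limit=None):
        cols = ", ".join(expr if expr == name else f"({expr}) AS {name}"
                         for name, expr in self.columns.items())
        query = f"SELECT {cols} FROM {self.table}"
        if limit is not None:
            query += f" LIMIT {int(limit)}"
        return query

    def head(self, n=5):
        # Only now does the database evaluate the derived column.
        return self.con.execute(self.sql(limit=n)).fetchall()
```

Deriving `MemPerUser` from `MEMUSED / USERS` only changes the generated SELECT list; the division runs in the database when `head` is called.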
15. Running R in dashDB via REST API
• Create your R script with RStudio, storing it in your home directory inside dashDB.
• POST <dashdb-server>/dashdb-api/rscript/<fileName>
  Runs the specified R script.
• GET <dashdb-server>/dashdb-api/home
  Lists all files under the user's home directory (recursively).
  – E.g. list the output written by your R script
• GET <dashdb-server>/dashdb-api/home/<fileName>
  Downloads the specified file.
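The three REST calls of this slide can be prepared with Python's standard urllib. This is a minimal sketch: `dashdb_api_requests` is a hypothetical helper, the requests are built but not sent, and because the slide does not say how to authenticate, any credentials are assumed to be passed in as HTTP headers by the caller.

```python
from urllib.parse import quote
from urllib.request import Request


def dashdb_api_requests(server, script_name, output_name, headers=None):
    """Build (but do not send) the run, list, and download requests.

    `server` is the <dashdb-server> base URL. Authentication is an
    assumption here: supply whatever headers your instance requires.
    """
    base = server.rstrip("/") + "/dashdb-api"
    hdrs = dict(headers or {})
    return {
        # Run the specified R script stored in the home directory.
        "run": Request(f"{base}/rscript/{quote(script_name)}", headers=hdrs, method="POST"),
        # List all files under the user's home directory (recursively).
        "list": Request(f"{base}/home", headers=hdrs, method="GET"),
        # Download one file, e.g. output written by the script.
        "download": Request(f"{base}/home/{quote(output_name)}", headers=hdrs, method="GET"),
    }
```

Each request can then be sent in order with `urllib.request.urlopen(req)`: run the script, list the home directory to find its output, and download the file of interest.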
16. Predictive Analytics With Python In dashDB
• Bluemix Analytic Notebooks
• ibmdbPy package
https://pypi.python.org/pypi/ibmdbpy
Data frames logically representing data physically residing in dashDB tables
from ibmdbpy import IdaDataBase, IdaDataFrame
idadb = IdaDataBase(dsn="BLUDB")  # DSN as in the R example; add uid/pwd if required
idadf = IdaDataFrame(idadb, "IRIS", indexer="ID")
idadf = idadf[["ID","sepal_length", "sepal_width"]]
idadf['new'] = idadf['sepal_width'] + idadf['sepal_length'].mean()
idadf.head()
Push down of analytic algorithms to in-db execution
from ibmdbpy.learn import KMeans
kmeans = KMeans(3) # clustering with 3 clusters
kmeans.fit_predict(idadf).head()
[Diagram: an Analytics for Spark notebook in Bluemix (used from the browser), or any Python runtime, uses ibmdbPy to work with data in dashDB.]
17. Loading Twitter Data to dashDB with Bluemix App
Show Case for box office analysis with Twitter:
www.youtube.com/watch?v=9yVNwOs9L4c
Twitter loader app for dashDB: hub.jazz.net/project/torsstei/Twitter-Loader/overview
(www.youtube.com/watch?v=ANakSSGM4zU)
18. Movie Analysis Show Case
[Diagram: movie analysis architecture in Bluemix]
• Inputs: Tweets about movies from a Bluemix service, box office stats from the-numbers.com, and public map data for US counties (https://www.census.gov/geo/maps-data/data/tiger-line.html)
• In Bluemix: the dashDB service for analytics and correlation between Tweets and box office data, with analysis using built-in R & RStudio
• Interactive app for visualization using Node.js and the D3.js library
https://hub.jazz.net/project/torsstei/movie-analysis
19. Movie Analysis Show Case https://hub.jazz.net/project/torsstei/movie-analysis
21. Populating dashDB with Data
[Diagram: data sources feeding dashDB]
• Cloud storage: S3, Swift
• On-premises databases
• Mobile app data in Cloudant
• Geodata in Esri shapefiles
• GeoJSON
• Bluemix: Twitter, The Weather Company
• CSVs
• Open data: data.gc.ca, data.gov, data.gov.uk, datahub.io, openAFRICA
25. dashDB: Key Use Cases
• DR in the Cloud: minimize the capital expense of a disaster recovery (DR) solution
26. We Bring a Netezza-Compatible Analytic Platform to the Cloud
• Analytics applications of ISVs and customers
• Application integration: R wrapper, Watson Analytics, ESRI ArcGIS connector, …
• Canned analytics:
  – OLAP functions: ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, STDDEV, COVAR, …
  – Linear regression, k-means clustering, decision tree, association rules, naive Bayes
  – Spatial operators: Contains, Touches, Within, Intersects, Crosses, Overlaps
• Analytic Extension Framework: UDX C++ API; AE framework with in-DB R, in-DB Lua, in-DB Python, in-DB Perl
27. Analytics of Warehouse Data
This is where we start from: all analytic processing is done on the application side.
• Analytic code & algorithms: in the analytic applications
• Analytic data: pulled out of the warehouse and processed in the analytic application
28. Accelerate Analytics for Warehouse Data
Push Down Step 1: BLU tables are only logically represented in the analytic application.
• Simple data lookup & massage operations are pushed down as SQL operations.
• Benefit: acceleration with no SQL skills required
29. Accelerate Analytics for Warehouse Data
Push Down Step 2: Typical and popular algorithms are pushed down to canned UDFs in the db.
• Call built-in functions via SQL to execute typical algorithms inside the db.
[Diagram: analytic applications and cloud tooling send SQL to canned algorithms in the db]
• Benefit: Bring standard analytics to the data
30. Accelerate Analytics for Warehouse Data
Push Down Step 3: Execute entire customer analytic programs inside the db.
• Deploy customer code in the language framework (UDX & AE) and call it via special SQL function interfaces.
• Benefit: Bring custom analytics to the data
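The idea behind Step 3, registering customer code in the database and calling it from SQL, can be illustrated with the standard sqlite3 module, whose `create_function` and `create_aggregate` register Python callables as SQL functions. This is only an analogy under that assumption: dashDB's UDX and AE frameworks have their own APIs, and the function names below are hypothetical.

```python
import math
import sqlite3


def mem_per_user(memused, users):
    """Scalar function: memory per user (NULL when an input is missing or USERS is 0)."""
    if memused is None or not users:
        return None
    return memused / users


class SampleStdDev:
    """Aggregate function: sample standard deviation via Welford's method."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def step(self, value):
        if value is None:
            return  # ignore NULLs, like built-in SQL aggregates
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def finalize(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else None


def connect_with_custom_code(path=":memory:"):
    """Open a connection and register the customer code as SQL functions."""
    con = sqlite3.connect(path)
    con.create_function("MEM_PER_USER", 2, mem_per_user)
    con.create_aggregate("SAMPLE_STDDEV", 1, SampleStdDev)
    return con
```

Once registered, the code runs where the data lives, e.g. `SELECT SAMPLE_STDDEV(MEMUSED) FROM SYSUSAGE` returns a single value instead of shipping every row to the application.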
31. Don’t forget to submit your Insight session and speaker feedback! Your
feedback is very important to us – we use it to continually improve the
conference.
Access your surveys at insight2015survey.com to quickly submit your surveys
from your smartphone, laptop or conference kiosk.
We Value Your Feedback!
33. Notices and Disclaimers (cont'd)
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly
available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance,
compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the
suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to
interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights,
trademarks or other intellectual property right.
•IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DB2® , DOORS®, Emptoris®, Enterprise
Document Management System™, FASP®, FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM
SmartCloud®, IBM Social Business®, IMS™, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON,
OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®,
pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, StoredIQ, Tealeaf®,
Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International
Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or
other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at:
www.ibm.com/legal/copytrade.shtml.