1
Copyright © 2014 Splunk Inc.
Tom LaGatta
Data Scientist, Splunk
Olivier de Garrigues
Sr Prof Services Consultant,
Splunk
Splunk for Data Science
2
Disclaimer
During the course of this presentation, we may make forward-looking statements regarding
future events or the expected performance of the company. We caution you that such
statements reflect our current expectations and estimates based on factors currently known to us
and that actual events or results could differ materially. For important factors that may cause
actual results to differ from those contained in our forward-looking statements, please review our
filings with the SEC. The forward-looking statements made in this presentation are being made
as of the time and date of its live presentation. If reviewed after its live presentation, this
presentation may not contain current or accurate information. We do not assume any obligation
to update any forward-looking statements we may make. In addition, any information about our
roadmap outlines our general product direction and is subject to change at any time without
notice. It is for informational purposes only and shall not be incorporated into any contract or
other commitment. Splunk undertakes no obligation either to develop the features or functionality
described or to include any such feature or functionality in a future release.
3
3 Key Takeaways
• Splunk is great for doing Data Science!
• Splunk complements other tools in the Data Science toolkit.
• Data Science is about extracting actionable insights from data.
4
About Us
• Tom LaGatta, Data Scientist – Tom joined Splunk in Spring 2014 as a Data Scientist specializing in
Probability and Statistics. Tom is an expert on the mathematics of inference, and he enjoys functional
programming in languages like Clojure, Haskell & R. At Splunk, Tom is helping to develop our internal
and external Data Science program and curriculum. Tom has a PhD in Mathematics from the
University of Arizona, and until recently was a Courant Instructor at the Courant Institute at New York
University. Tom is based in New York City.
• Olivier de Garrigues, Senior Professional Services Consultant – Olivier is based in London on the
EMEA Professional Services team and has helped out more than 40 customers in 10 countries on
various Splunk projects in the past year and a half. Prior to this, he worked as a quantitative analyst with
extensive use of MATLAB and R. He developed a keen interest in machine learning and enjoys
dreaming about how to make Splunk better for data scientists, and helped develop the R Project App.
Olivier holds an MS in Mathematics of Finance from Columbia University.
5
Splunk for Data Science
6
What is Data Science?
Data Science is about extracting actionable insights from data.
• Helps people make better decisions.
• Can be used for automated decision-making.
• Data Science is cross-functional, and blends techniques & theories from:
– CS / Programming
– Math and Statistics
– Machine Learning
– Data Mining / Databases
– Data Visualization
– Substantive / Domain Expertise
– Social Science
– Communication and Presentation
– Accounting, Finance and KPIs
– Business Analytics
• Don’t be afraid of Data Science!
7
Data Science & Analytics Teams
There is no “one size fits all” data scientist. Data Science & Analytics teams
are made up of people with complementary skill sets.
Source: Schutt & O’Neil. Doing Data Science. 2013
8
Splunk for Data Science
Splunk is great for doing Data Science!
• Integrate, query & visualize all the data:
– Platform for machine data
– Connects with any other data source
• Easy-to-use Analytics capabilities.
• Powerful algorithms out-of-the-box.
• Sharp visualizations and dashboards.
• Deliver results to both IT & Business users.
• Complements other Data Science tools (next slide).
9
Splunk and Data Science Tools
Splunk complements other tools in the Data Science toolkit:
• Hadoop: the workhorse of the Data Science world. Using Hunk, you can
integrate Hadoop & HDFS seamlessly into Splunk.
• R & Python: the preferred languages of Data Science. Execute R & Python
scripts in your Splunk queries using the R Project App & SDK for Python.
• SQL & other RDBMS: valuable stores for customer & product data. Use
Splunk’s DB Connect App to mash relational data up with machine data.
• External tools: export finalized data from Splunk using the ODBC Driver.
– Tip: do all your data processing in Splunk/Hunk, and export only the final results.
• D3 Custom Visualizations: sharp dashboards & reports using Splunk.
10
Splunk and Data Science Use Cases
Splunk is a powerful tool for lots of Data Science use cases:
Green Use Cases (easy out of the box)    Yellow Use Cases (needs tinkering)
Trend Forecasting                        D3 Custom Visualizations
A/B Testing                              Predictive Modeling
Root Cause Analysis                      Sentiment Analysis
Anomaly Detection                        Conversion Funnel/Pathing
Market Segmentation                      More Algorithms via R & Python
Topic Modeling
Capacity Planning
Correlate Data from 2+ Sources
Data Munging & Normalization
KPIs & Executive Dashboards
11
Data Science Use Cases
12
Use Case: Trend Forecasting
Trend Forecasting: Given past & realtime data, predict future values & events.
• Common applications:
– Forecast revenue & other KPIs
– Web server traffic & product downloads
– Customer conversion rates
– Estimate MTTR & server outages
– Resource & capacity planning (AWS App)
– Security threats (Enterprise Security App)
• The “true” course of events can (and will) take only one of many divergent
paths. But which one…?
• Be mindful of rare events & black swans!
13
Splunk Solution: predict!
predict command: forecast future trajectories of time series.
• Implements a Kalman filter
to identify seasonal trends.
• Gives an “uncertainty
envelope” as a buffer
around the trend.
• Tip: Always run the predict
command on LOTS of past
data. Capture low-frequency
and high-frequency trends.
• Remember: the future is always uncertain…
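To make the idea of a filtered trend with an "uncertainty envelope" concrete, here is a minimal Python sketch of a one-dimensional Kalman filter (a local-level model). It is not Splunk's implementation of predict and it does not model seasonality. The function name kalman_forecast, the noise variances q and r, and the 95% envelope are assumptions chosen for illustration.

```python
import math

def kalman_forecast(series, horizon, q=0.1, r=1.0):
    """Local-level Kalman filter: filter `series`, then forecast `horizon` steps.

    q = process-noise variance, r = measurement-noise variance (both assumed).
    Returns a list of (prediction, lower95, upper95) tuples.
    """
    x, p = series[0], 1.0          # initial state estimate and its variance
    for z in series[1:]:
        p = p + q                  # predict: uncertainty grows by process noise
        k = p / (p + r)            # Kalman gain
        x = x + k * (z - x)        # update: move the estimate toward the observation
        p = (1 - k) * p            # update: uncertainty shrinks
    forecasts = []
    for _ in range(horizon):
        p = p + q                  # no new observations: variance only grows
        half_width = 1.96 * math.sqrt(p + r)
        forecasts.append((x, x - half_width, x + half_width))
    return forecasts

history = [10, 12, 11, 13, 12, 14, 13, 15]
forecast = kalman_forecast(history, horizon=3)
for step, (pred, lo, hi) in enumerate(forecast, start=1):
    print(f"t+{step}: {pred:.2f}  [{lo:.2f}, {hi:.2f}]")
```

The envelope widens with every step ahead, which is the numerical form of "the future is always uncertain".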
14
Splunk Solution: Predict App
David Carasso’s Predict App: forecast future values of individual events.
– 8 minute walkthrough: https://www.youtube.com/watch?v=ROvaqJigNFg
• Implements a Naïve Bayes classifier.
• You have to train models!
• Train a model to predict any target
field using any reference field(s):
fields ref1, ref2, ..., target
| train my_model from target
• Guess target field for incoming events:
guess my_model into target
• Temporal or non-temporal prediction (include _time among reference fields).
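The train / guess workflow above can be sketched in Python as a small categorical Naïve Bayes classifier. This is an illustration of the technique only, not the Predict App's code: the Python functions train and guess, the add-one smoothing, and the toy events are all assumptions.

```python
import math
from collections import Counter, defaultdict

def train(events, target):
    """Count class frequencies and per-class field values from labeled events."""
    class_counts = Counter()
    value_counts = defaultdict(Counter)   # (class, field) -> Counter of values
    vocab = defaultdict(set)              # field -> set of values seen in training
    for event in events:
        label = event[target]
        class_counts[label] += 1
        for field, value in event.items():
            if field != target:
                value_counts[(label, field)][value] += 1
                vocab[field].add(value)
    return class_counts, value_counts, vocab

def guess(model, event):
    """Return the most probable class for an event that lacks the target field."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)               # log prior
        for field, value in event.items():
            if field not in vocab:
                continue                              # field never seen in training
            seen = value_counts[(label, field)][value]
            # Add-one smoothing keeps unseen values from zeroing the product
            score += math.log((seen + 1) / (count + len(vocab[field]) + 1))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training events (made up for illustration)
training = [
    {"genre": "comedy", "tag": "funny", "rating": "Like"},
    {"genre": "comedy", "tag": "funny", "rating": "Like"},
    {"genre": "horror", "tag": "gory", "rating": "Dislike"},
    {"genre": "horror", "tag": "slow", "rating": "Dislike"},
    {"genre": "comedy", "tag": "slow", "rating": "Dislike"},
]
model = train(training, target="rating")
print(guess(model, {"genre": "comedy", "tag": "funny"}))
print(guess(model, {"genre": "horror", "tag": "gory"}))
```

Working in log-probabilities avoids numerical underflow when many reference fields are multiplied together.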
15
Concept: Supervised Learning & Classification
Supervised learning: use observed training data to classify values of
unknown testing data.
• predict command (Kalman filter):
Training data = timechart of past & realtime values.
Testing data = time range for future values.
• Predict App (Naïve Bayes classifier):
Training data = events with reference & target fields.
Testing data = events with reference fields but not target field.
• Tip: only deploy models & algorithms after extensive testing & evaluation.
• More powerful learning algorithms using R Project App or SDK for Python.
16
Demo: Predict App
• Train a model to predict movie Rating based on MovieID, UserID, Genre, Tag
index=movielens Timestamp < 1199188800 UserID=593*
| eval original_rating = case(Rating<3,"Dislike", Rating=3,"Neutral", Rating>3,"Like")
| fields original_rating MovieID UserID Genre Tag
| train rating_model from original_rating
• Guess Rating for test data based on trained model
index=movielens Timestamp > 1199188800 UserID=593*
| guess rating_model into guessed_rating
| top original_rating guessed_rating
• Accuracy of model:
correct on 97.6% of values.
• Tip: always train on
LOTS of training data.
• Evaluate before deploying.
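The evaluation step can be sketched in Python as follows. The bucket function mirrors the case() expression in the demo, and evaluate builds the same kind of cross-tabulation that top original_rating guessed_rating produces. The held-out pairs below are made up for illustration and are not the MovieLens results quoted on the slide; the function names are hypothetical.

```python
from collections import Counter

def bucket(rating):
    """Mirror the demo's case(): <3 Dislike, =3 Neutral, >3 Like."""
    if rating < 3:
        return "Dislike"
    if rating == 3:
        return "Neutral"
    return "Like"

def evaluate(pairs):
    """pairs: list of (original, guessed) labels. Returns (accuracy, crosstab)."""
    crosstab = Counter(pairs)
    total = sum(crosstab.values())
    correct = sum(n for (orig, guessed), n in crosstab.items() if orig == guessed)
    return correct / total, crosstab

# Made-up held-out results, for illustration only
held_out = [(bucket(5), "Like"), (bucket(4), "Like"), (bucket(3), "Like"),
            (bucket(1), "Dislike"), (bucket(2), "Dislike")]
accuracy, crosstab = evaluate(held_out)
print(f"accuracy = {accuracy:.0%}")
for (orig, guessed), n in crosstab.most_common():
    print(orig, guessed, n)
```

Keeping the test events strictly separate from the training events (here, by timestamp) is what makes the accuracy figure meaningful.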
17
Use Case: Sentiment Analysis
Sentiment Analysis: the assignment of “emotional” labels to textual data.
• Can be simple +1 vs. -1, or more sophisticated: “happy”, “angry”, “sad”, etc.
• Analyze tweets, emails, news articles,
logs or any other textual data!
– Social data correlates with other factors.
• Typically done via supervised learning:
– Train a model on labeled corpus of text.
– Test the model on incoming text data.
• Read more about Sentiment Analysis:
–  Chapter 14 of Big Data Analytics Using Splunk (pp. 255-282).
–  Michael Wilde & David Carasso. Social Media & Sentiment Analysis. .conf2012
[Figure: 2011 Irish General Election. Rankings: 3rd, 8th, 4th, 1st, 2nd. Percentages: 17%, 1.8%, 10%, 36%, 19%. r = .79]
18
Splunk Solution: Sentiment Analysis App
David Carasso’s Sentiment Analysis App
assigns binary sentiment values to
textual data (logs, tweets, email, etc.).
• Naïve Bayes classifier under the hood.
• Twitter & IMDB models out of the box.
• Can guess language of authorship, and
“heat”, a measure of emotional charge.
• Tip: compare relative sentiment
changes across time & groups.
• How to train your own models: http://answers.splunk.com/answers/59743
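As a rough picture of what a binary Naïve Bayes sentiment model does with text, here is a minimal bag-of-words sketch in Python. It is not the Sentiment Analysis App's code or its Twitter/IMDB models; the function names, the tokenizer, and the four-line training corpus are assumptions for illustration, and the sketch assumes both labels appear in the training data.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train_sentiment(labeled_texts):
    """labeled_texts: list of (text, label) pairs with label +1 or -1."""
    word_counts = {+1: Counter(), -1: Counter()}
    doc_counts = Counter()
    for text, label in labeled_texts:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[+1]) | set(word_counts[-1])
    return word_counts, doc_counts, vocab

def sentiment(model, text):
    """Return +1 or -1: the label with the higher smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    scores = {}
    for label in (+1, -1):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / sum(doc_counts.values()))
        for word in tokenize(text):
            if word in vocab:  # skip words never seen in training
                score += math.log((word_counts[label][word] + 1)
                                  / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Tiny labeled corpus (made up for illustration)
corpus = [
    ("great movie, loved it", +1),
    ("what a wonderful day", +1),
    ("terrible service, hated it", -1),
    ("awful and slow", -1),
]
sentiment_model = train_sentiment(corpus)
print(sentiment(sentiment_model, "loved this wonderful movie"))
print(sentiment(sentiment_model, "awful, terrible day"))
```

A model trained on so little text is only a toy; as the tip above suggests, relative changes in sentiment across time and groups are usually more trustworthy than any single label.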
19
Demo: Sentiment Analysis App
20
Use Case: Anomaly Detection
• An anomaly (or outlier) is an event which is vastly dissimilar to other events.
• Anomaly Detection is one of Splunk’s most common use cases. Examples:
– Transactions which occur faster than humanly possible.
– DDoS attacks from IP address ranges.
– High-value customer purchase patterns.
• Quick techniques for finding statistical outliers:
– Non-average outliers: more than 2*stdev from the avg.
– Non-typical outliers: more than 1.5*IQR above perc75 or below perc25.
• Tip: save these as eventtypes for automated outlier detection.
• Once anomalies have been found, dig deeper to discover root causes.
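The two quick outlier rules above can be sketched in Python with the standard library. The function names and the sample durations are assumptions for illustration. Note that statistics.quantiles interpolates percentiles in its own way, so the cut-offs may differ slightly from what Splunk's perc25 and perc75 functions return on the same data.

```python
import statistics

def stdev_outliers(values, k=2.0):
    """'Non-average' outliers: more than k standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > k * sd]

def iqr_outliers(values, k=1.5):
    """'Non-typical' outliers: more than k*IQR above perc75 or below perc25."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v > q3 + k * iqr or v < q1 - k * iqr]

# Made-up transaction durations with one obvious anomaly
durations = [12, 11, 13, 12, 14, 11, 12, 13, 12, 95]
print(stdev_outliers(durations))
print(iqr_outliers(durations))
```

The IQR rule is less sensitive to the outliers themselves, because a single extreme value inflates the mean and standard deviation but barely moves the quartiles.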
21
Splunk Solution: cluster
• Anomalies are dissimilar to other events (by definition).
• We can use clustering algorithms to help us detect anomalies:
– Non-anomalous events typically form a few large clusters.
– Anomalous events typically form lots of small clusters.
• Cluster your data, sort ascending:
cluster showcount=true labelonly=true
| sort cluster_count cluster_label
• Remember: there is no “right way” to
find all anomalies. Explore your data!
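The "cluster, then sort ascending" idea can be sketched in Python. This is a simple greedy clustering on token overlap, not the algorithm behind Splunk's cluster command; the Jaccard similarity, the 0.5 threshold, and the sample log lines are assumptions for illustration.

```python
def jaccard(a, b):
    """Similarity of two token sets: |intersection| / |union|."""
    return len(a & b) / len(a | b) if a | b else 1.0

def cluster_events(events, threshold=0.5):
    """Greedy single-pass clustering: an event joins the first cluster whose
    representative is similar enough, otherwise it starts a new cluster."""
    clusters = []  # list of (representative_tokens, [member events])
    for event in events:
        tokens = set(event.lower().split())
        for rep, members in clusters:
            if jaccard(tokens, rep) >= threshold:
                members.append(event)
                break
        else:
            clusters.append((tokens, [event]))
    # Smallest clusters first: these are the anomaly candidates
    return sorted((members for _, members in clusters), key=len)

logs = [
    "user alice logged in",
    "user bob logged in",
    "user carol logged in",
    "disk failure on /dev/sda1",
    "user dave logged in",
]
clustered = cluster_events(logs)
for members in clustered:
    print(len(members), members[0])
```

Changing the similarity function or the threshold changes which events count as "dissimilar", which is why exploring the data matters more than any single setting.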
22
Concept: Unsupervised Learning & Clustering
• A clustering algorithm is any process which groups together similar things
(events, people, etc), and separates dissimilar things (events, people, etc).
• Clustering is unsupervised: choose labels based on patterns in the data.
• Clustering is in the eye of the beholder:
– Lots of different clustering algorithms.
– Lots of different similarity functions.
• Do not confuse with:
– Computer cluster: a group of computers
working together as a single system.
– Splunk cluster: a group of Splunk indexers
replicating indexes & external data.
23
Demo: cluster!
24
Splunk Solution: Other Commands
• anomalies:
– Assigns an “unexpectedness”
score to each event.
• anomalousvalue:
– Assigns an “anomaly score” to
events with anomalous values.
• outlier:
– Removes or truncates outliers.
• kmeans:
– Powerful clustering algorithm.
You choose k = # of clusters.
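For readers who have not met k-means before, here is a minimal Python sketch of the standard Lloyd iteration (assign each point to its nearest centroid, then move each centroid to the mean of its members). It is not Splunk's kmeans implementation. Seeding the centroids with the first k points is an assumption made so the example is reproducible; real implementations usually randomize the starting centroids.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm on tuples of numbers. Returns (centroids, labels).
    Initial centroids are the first k points (an assumption for reproducibility)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members)
                                           for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels

# Two made-up groups of points, interleaved
points = [(1, 1), (9, 9), (1, 2), (8, 9), (2, 1), (9, 8)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

The result depends on k and on the starting centroids, so it is worth trying several values of k before settling on a segmentation.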
25
Splunk Solution: Prelert (Partner App)
• Manages Anomaly Detection directly.
– Pre-built dashboards, alerts, API.
– Use cases: Security, IT Ops / APM, DevOps
– Godfrey Sullivan: “beautifully adjacent and
complementary to what Splunk does”
• Can download from Splunk Apps.
– May save you time with Anomaly Detection.
– Can also be good source of inspiration
for your own Anomaly Detection dashboards.
• Keep in mind Prelert is a paid app:
– Cost: $225/month @ 5GB
26
Use Case: Market Segmentation
• Market Segmentation: group customers according to common needs and
priorities, and develop strategies to target them.
– Market segments are internally homogeneous, and externally heterogeneous.
i.e., market segments are clusters of customers.
• Many reasons for Market Segmentation:
– Different market segments require different strategies.
– Customers in same segment have similar product
preferences. Different segments, different preferences.
– Segments should be reasonably stable, to allow for
historical analysis (good for Data Science).
• Use Splunk’s clustering algorithms to identify and label market segments!
27
Data Visualizations
28
Intro to Data Visualization
• Data Visualization is the creation and study of the visual representation of
data, and is a vital part of Data Science.
• The goal of data visualization is to
communicate information:
– Visualizations communicate complex
ideas with clarity, precision, and efficiency.
– Transmission speed of the optic nerve
is about 9Mb/sec – fast image processing.
– Pattern matching, edge detection.
– Visualizations pack lots of information
into small spaces. More than text alone!
29
Telling Stories with Data Visualizations
• We process data in linear narratives: even dashboards go top-to-bottom.
• Visualizations help pierce the monotony of text, number & data streams.
• Think about the story you’re telling:
– Empathize with the viewer.
– What’s their takeaway?
• A good visualization tells its own story:
“Island Nation Obtains Favourable Balance
of Trade; Goes On To Rule The World.”
• Weave multiple visualizations together
to tell more effective stories.
William Playfair (1786)
30
Source: New York Times. May 17, 2012
Splunk
31
Source: New York Times. May 17, 2012
Splunk
32
Tips for Effective Data Visualizations
• #1 tip: Plot the most important keys on x & y axes.
– You choose “most important.”
– You might need >1 visualization.
• Manipulate size, color and shape
to convey additional information.
• Annotate, label and add icons ✔︎
• Use chart overlay to correlate data sources. Mix histograms & line charts ↑↑↑
• Manipulate numerical scale: linear vs. log scales (previous 2 slides).
• Read more about Data Visualization:
– Tableau’s whitepaper, Visual Analysis Best Practices (2013).
– Edward Tufte’s The Visual Display of Quantitative Information (2001).
33
D3 Custom Visualizations in Splunk
• Splunk now supports D3 visualizations
with some minor customization.
• Satoshi’s talk: “I want that cool viz in Splunk!”
• Resources for Custom Visualizations:
– Splunk Web Framework Toolkit
https://apps.splunk.com/app/1613/
– Splunk 6.x Dashboard Examples
https://apps.splunk.com/app/1603/
– Custom SimpleXML Extensions
http://apps.splunk.com/app/1772/
– Lots more D3 visualizations for use:
https://github.com/mbostock/d3/wiki/Gallery
34
Demo: Sankey Chart
35
How-to for Sankey Charts
• Install the Custom SimpleXML Extensions app: http://apps.splunk.com/app/1772/
• Create your own app, and install Sankey chart components:
– Drop autodiscover.js in $SPLUNK_HOME/etc/apps/<YOURAPP>/appserver/static
– Copy & paste /sankeychart/ subfolder into $SPLUNK_HOME/etc/apps/<YOURAPP>/appserver/static/components
– Restart Splunk.
• In your dashboard:
– Include script="autodiscover.js" in <form> or <dashboard> opening tag
– Insert XML snippet from 2- or 3-node Sankey dashboard example
– Change 2 instances of “custom_simplexml_extensions” to <YOURAPP>.
– Update search and “data-options” parameters (nodes) in XML to reflect your data.
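A Sankey chart draws flows between nodes, so the search behind it has to produce one row per flow. As a sketch of that data shape, the Python below counts transitions between consecutive steps in user sessions and emits (source, target, count) rows. The row layout, the function name sankey_links, and the sample sessions are assumptions for illustration; check the Sankey dashboard example for the field names it actually expects.

```python
from collections import Counter

def sankey_links(sessions):
    """Count transitions between consecutive steps across all sessions.
    Returns rows of (source, target, count), largest flows first."""
    links = Counter()
    for steps in sessions:
        for source, target in zip(steps, steps[1:]):
            links[(source, target)] += 1
    return [(s, t, n) for (s, t), n in links.most_common()]

# Made-up page paths, one list per user session
sessions = [
    ["home", "search", "product", "checkout"],
    ["home", "product", "checkout"],
    ["home", "search", "product"],
]
rows = sankey_links(sessions)
for row in rows:
    print(row)
```

The same aggregation is the core of a conversion funnel or pathing analysis: each row is one edge of the chart, and its count sets the width of the band.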
36
Know Your Audience
• Finally, keep in mind your audience: who are they, what questions do they
care about, and how do they want to consume the data?
– Executive: KPIs, charts, tables with icons ✔︎
– Marketing Analyst: KPIs & metrics. Sharp
images for their own reports & decks. Tableau.
– Data Scientist: output clean data to organized
data stores (Hunk, HDFS, SQL, NoSQL).
– Sysadmin: sparklines, gauges for activity &
MTTR, tables with highlighted anomalies.
– Security Ops: maps with detailed overlays,
drill down on anomalous events.
• Bring it back to the business problem & use case!
37
3 Key Takeaways
• Splunk is great for doing Data Science!
• Splunk complements other tools in the Data Science toolkit.
• Data Science is about extracting actionable insights from data.
38
List of References
Good books on Data Science:
•  Schutt & O’Neil. Doing Data Science. O’Reilly 2013
•  Provost & Fawcett. Data Science for Business. O’Reilly 2013
•  Max Shron. Thinking With Data. O’Reilly 2014
•  Edward Tufte. The Visual Display of Quantitative Information. Graphics Press 2001
•  Zumel & Mount. Practical Data Science with R. Manning 2014
•  Hastie et al. Elements of Statistical Learning. Springer-Verlag 2009 (free PDF!)
Using Splunk for Data Science:
•  Zadrozny, Kodali (and Stout). Big Data Analytics Using Splunk. Apress 2013
•  David Carasso. Exploring Splunk. CITO Research 2012
•  David Carasso. Data Mining with Splunk. .conf2012
•  Michael Wilde & David Carasso. Social Media & Sentiment Analysis. .conf2012
Good free references:
•  Tableau. Visual Analysis Best Practices. Tableau 2013
•  King & Magoulas. 2013 Data Science Salary Survey. O’Reilly 2013
•  DJ Patil. Building Data Science Teams. O’Reilly 2013
•  Cathy O’Neil. On Being A Data Skeptic. O’Reilly 2013

Contenu connexe

Tendances

SANS CTI Summit 2016 - Data-Driven Threat Intelligence: Sharing
SANS CTI Summit 2016 - Data-Driven Threat Intelligence: SharingSANS CTI Summit 2016 - Data-Driven Threat Intelligence: Sharing
SANS CTI Summit 2016 - Data-Driven Threat Intelligence: SharingAlex Pinto
 
Biting into the Jawbreaker: Pushing the Boundaries of Threat Hunting Automation
Biting into the Jawbreaker: Pushing the Boundaries of Threat Hunting AutomationBiting into the Jawbreaker: Pushing the Boundaries of Threat Hunting Automation
Biting into the Jawbreaker: Pushing the Boundaries of Threat Hunting AutomationAlex Pinto
 
Splunk .conf2011: Splunk for Fraud and Forensics at Intuit
Splunk .conf2011: Splunk for Fraud and Forensics at IntuitSplunk .conf2011: Splunk for Fraud and Forensics at Intuit
Splunk .conf2011: Splunk for Fraud and Forensics at IntuitErin Sweeney
 
BSidesLV 2013 - Using Machine Learning to Support Information Security
BSidesLV 2013 - Using Machine Learning to Support Information SecurityBSidesLV 2013 - Using Machine Learning to Support Information Security
BSidesLV 2013 - Using Machine Learning to Support Information SecurityAlex Pinto
 
D1 1440 cesar wong next generation sequencing &amp; bio medical data analysis
D1 1440 cesar wong next generation sequencing &amp; bio medical data analysisD1 1440 cesar wong next generation sequencing &amp; bio medical data analysis
D1 1440 cesar wong next generation sequencing &amp; bio medical data analysisDr. Wilfred Lin (Ph.D.)
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataJames Sirota
 
Insider Threats Detection in Cloud using UEBA
Insider Threats Detection in Cloud using UEBAInsider Threats Detection in Cloud using UEBA
Insider Threats Detection in Cloud using UEBALucas Ko
 
Data-Driven Threat Intelligence: Metrics on Indicator Dissemination and Sharing
Data-Driven Threat Intelligence: Metrics on Indicator Dissemination and SharingData-Driven Threat Intelligence: Metrics on Indicator Dissemination and Sharing
Data-Driven Threat Intelligence: Metrics on Indicator Dissemination and SharingAlex Pinto
 
Using Splunk for Information Security
Using Splunk for Information SecurityUsing Splunk for Information Security
Using Splunk for Information SecurityShannon Cuthbertson
 
Software Analytics: Towards Software Mining that Matters
Software Analytics: Towards Software Mining that MattersSoftware Analytics: Towards Software Mining that Matters
Software Analytics: Towards Software Mining that MattersTao Xie
 
Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)
Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)
Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)Alex Pinto
 
Real time streaming analytics
Real time streaming analyticsReal time streaming analytics
Real time streaming analyticsAnirudh
 
SANS CTI Summit 2016 Borderless Threat Intelligence
SANS CTI Summit 2016 Borderless Threat IntelligenceSANS CTI Summit 2016 Borderless Threat Intelligence
SANS CTI Summit 2016 Borderless Threat IntelligenceJason Trost
 
Data-Driven Threat Intelligence: Useful Methods and Measurements for Handling...
Data-Driven Threat Intelligence: Useful Methods and Measurements for Handling...Data-Driven Threat Intelligence: Useful Methods and Measurements for Handling...
Data-Driven Threat Intelligence: Useful Methods and Measurements for Handling...Alex Pinto
 
SplunkLive! Splunk for Security
SplunkLive! Splunk for SecuritySplunkLive! Splunk for Security
SplunkLive! Splunk for SecuritySplunk
 
Databases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems ImmunologyDatabases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems ImmunologyYannick Pouliot
 
Making pentesting sexy ossams - BSidesQuebec2013
Making pentesting sexy ossams - BSidesQuebec2013Making pentesting sexy ossams - BSidesQuebec2013
Making pentesting sexy ossams - BSidesQuebec2013BSidesQuebec2013
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceMark West
 
Building an Analytics - Enabled SOC Breakout Session
Building an Analytics - Enabled SOC Breakout Session Building an Analytics - Enabled SOC Breakout Session
Building an Analytics - Enabled SOC Breakout Session Splunk
 

Tendances (20)

SANS CTI Summit 2016 - Data-Driven Threat Intelligence: Sharing
SANS CTI Summit 2016 - Data-Driven Threat Intelligence: SharingSANS CTI Summit 2016 - Data-Driven Threat Intelligence: Sharing
SANS CTI Summit 2016 - Data-Driven Threat Intelligence: Sharing
 
Biting into the Jawbreaker: Pushing the Boundaries of Threat Hunting Automation
Biting into the Jawbreaker: Pushing the Boundaries of Threat Hunting AutomationBiting into the Jawbreaker: Pushing the Boundaries of Threat Hunting Automation
Biting into the Jawbreaker: Pushing the Boundaries of Threat Hunting Automation
 
Splunk .conf2011: Splunk for Fraud and Forensics at Intuit
Splunk .conf2011: Splunk for Fraud and Forensics at IntuitSplunk .conf2011: Splunk for Fraud and Forensics at Intuit
Splunk .conf2011: Splunk for Fraud and Forensics at Intuit
 
BSidesLV 2013 - Using Machine Learning to Support Information Security
BSidesLV 2013 - Using Machine Learning to Support Information SecurityBSidesLV 2013 - Using Machine Learning to Support Information Security
BSidesLV 2013 - Using Machine Learning to Support Information Security
 
D1 1440 cesar wong next generation sequencing &amp; bio medical data analysis
D1 1440 cesar wong next generation sequencing &amp; bio medical data analysisD1 1440 cesar wong next generation sequencing &amp; bio medical data analysis
D1 1440 cesar wong next generation sequencing &amp; bio medical data analysis
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking Data
 
Insider Threats Detection in Cloud using UEBA
Insider Threats Detection in Cloud using UEBAInsider Threats Detection in Cloud using UEBA
Insider Threats Detection in Cloud using UEBA
 
Data-Driven Threat Intelligence: Metrics on Indicator Dissemination and Sharing
Data-Driven Threat Intelligence: Metrics on Indicator Dissemination and SharingData-Driven Threat Intelligence: Metrics on Indicator Dissemination and Sharing
Data-Driven Threat Intelligence: Metrics on Indicator Dissemination and Sharing
 
Using Splunk for Information Security
Using Splunk for Information SecurityUsing Splunk for Information Security
Using Splunk for Information Security
 
Software Analytics: Towards Software Mining that Matters
Software Analytics: Towards Software Mining that MattersSoftware Analytics: Towards Software Mining that Matters
Software Analytics: Towards Software Mining that Matters
 
Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)
Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)
Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)
 
Real time streaming analytics
Real time streaming analyticsReal time streaming analytics
Real time streaming analytics
 
SANS CTI Summit 2016 Borderless Threat Intelligence
SANS CTI Summit 2016 Borderless Threat IntelligenceSANS CTI Summit 2016 Borderless Threat Intelligence
SANS CTI Summit 2016 Borderless Threat Intelligence
 
Data-Driven Threat Intelligence: Useful Methods and Measurements for Handling...
Data-Driven Threat Intelligence: Useful Methods and Measurements for Handling...Data-Driven Threat Intelligence: Useful Methods and Measurements for Handling...
Data-Driven Threat Intelligence: Useful Methods and Measurements for Handling...
 
SplunkLive! Splunk for Security
SplunkLive! Splunk for SecuritySplunkLive! Splunk for Security
SplunkLive! Splunk for Security
 
Databases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems ImmunologyDatabases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems Immunology
 
Making pentesting sexy ossams - BSidesQuebec2013
Making pentesting sexy ossams - BSidesQuebec2013Making pentesting sexy ossams - BSidesQuebec2013
Making pentesting sexy ossams - BSidesQuebec2013
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data Science
 
Data Science for Cyber Risk
Data Science for Cyber RiskData Science for Cyber Risk
Data Science for Cyber Risk
 
Building an Analytics - Enabled SOC Breakout Session
Building an Analytics - Enabled SOC Breakout Session Building an Analytics - Enabled SOC Breakout Session
Building an Analytics - Enabled SOC Breakout Session
 

En vedette

SplunkLive! Developer Session
SplunkLive! Developer SessionSplunkLive! Developer Session
SplunkLive! Developer SessionSplunk
 
Splunk conf2014 - Splunk for Data Science
Splunk conf2014 - Splunk for Data ScienceSplunk conf2014 - Splunk for Data Science
Splunk conf2014 - Splunk for Data ScienceSplunk
 
Bảng các thông số trong hồi quy eview
Bảng các thông số trong hồi quy eviewBảng các thông số trong hồi quy eview
Bảng các thông số trong hồi quy eviewthewindcold
 
kinh tế lượng
kinh tế lượngkinh tế lượng
kinh tế lượngvanhuyqt
 
画像認識の初歩、SIFT,SURF特徴量
画像認識の初歩、SIFT,SURF特徴量画像認識の初歩、SIFT,SURF特徴量
画像認識の初歩、SIFT,SURF特徴量takaya imai
 
MIRU2013チュートリアル:SIFTとそれ以降のアプローチ
MIRU2013チュートリアル:SIFTとそれ以降のアプローチMIRU2013チュートリアル:SIFTとそれ以降のアプローチ
MIRU2013チュートリアル:SIFTとそれ以降のアプローチHironobu Fujiyoshi
 

En vedette (6)

SplunkLive! Developer Session
SplunkLive! Developer SessionSplunkLive! Developer Session
SplunkLive! Developer Session
 
Splunk conf2014 - Splunk for Data Science
Splunk conf2014 - Splunk for Data ScienceSplunk conf2014 - Splunk for Data Science
Splunk conf2014 - Splunk for Data Science
 
Bảng các thông số trong hồi quy eview
Bảng các thông số trong hồi quy eviewBảng các thông số trong hồi quy eview
Bảng các thông số trong hồi quy eview
 
kinh tế lượng
kinh tế lượngkinh tế lượng
kinh tế lượng
 
画像認識の初歩、SIFT,SURF特徴量
画像認識の初歩、SIFT,SURF特徴量画像認識の初歩、SIFT,SURF特徴量
画像認識の初歩、SIFT,SURF特徴量
 
MIRU2013チュートリアル:SIFTとそれ以降のアプローチ
MIRU2013チュートリアル:SIFTとそれ以降のアプローチMIRU2013チュートリアル:SIFTとそれ以降のアプローチ
MIRU2013チュートリアル:SIFTとそれ以降のアプローチ
 

Similaire à Splunk for DataScience (.conf2014)

Splunk for Machine Learning and Analytics
Splunk for Machine Learning and AnalyticsSplunk for Machine Learning and Analytics
Splunk for Machine Learning and AnalyticsSplunk
 
Machine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout SessionMachine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout SessionSplunk
 
Machine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout SessionMachine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout SessionSplunk
 
Machine Learning + Analytics in Splunk
Machine Learning + Analytics in Splunk Machine Learning + Analytics in Splunk
Machine Learning + Analytics in Splunk Splunk
 
Data Science Highlights
Data Science Highlights Data Science Highlights
Data Science Highlights Joe Lamantia
 
Machine Learning and Analytics in Splunk
Machine Learning and Analytics in SplunkMachine Learning and Analytics in Splunk
Machine Learning and Analytics in SplunkSplunk
 
Agile data science
Agile data scienceAgile data science
Agile data scienceJoel Horwitz
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data AnalyticsOsman Ali
 
Machine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout SessionMachine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout SessionSplunk
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...Ilkay Altintas, Ph.D.
 
Market Research Meets Big Data Analytics for Business Transformation
Market Research Meets Big Data Analytics  for Business Transformation Market Research Meets Big Data Analytics  for Business Transformation
Market Research Meets Big Data Analytics for Business Transformation Sally Sadosky
 
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera, Inc.
 
Bridging Big Data and Data Science Using Scalable Workflows
Bridging Big Data and Data Science Using Scalable WorkflowsBridging Big Data and Data Science Using Scalable Workflows
Bridging Big Data and Data Science Using Scalable WorkflowsIlkay Altintas, Ph.D.
 
Splunk for Machine Learning and Analytics
Splunk for Machine Learning and AnalyticsSplunk for Machine Learning and Analytics
Splunk for Machine Learning and AnalyticsShannon Cuthbertson
 
Splunk for Machine Learning and Analytics
Splunk for Machine Learning and AnalyticsSplunk for Machine Learning and Analytics
Splunk for Machine Learning and AnalyticsSplunk
 
SplunkLive! Hunk Technical Deep Dive
SplunkLive! Hunk Technical Deep DiveSplunkLive! Hunk Technical Deep Dive
SplunkLive! Hunk Technical Deep DiveSplunk
 
Machine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout SessionMachine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout SessionSplunk
 
Splunk for big_data
Splunk for big_dataSplunk for big_data
Splunk for big_dataGreg Hanchin
 


Splunk for DataScience (.conf2014)

  • 1. Splunk for Data Science. Tom LaGatta, Data Scientist, Splunk. Olivier de Garrigues, Sr Prof Services Consultant, Splunk. Copyright © 2014 Splunk Inc.
  • 2. Disclaimer. During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
  • 3. 3 Key Takeaways
      • Splunk is great for doing Data Science!
      • Splunk complements other tools in the Data Science toolkit.
      • Data Science is about extracting actionable insights from data.
  • 4. About Us
      • Tom LaGatta, Data Scientist – Tom joined Splunk in Spring 2014 as a Data Scientist specializing in Probability and Statistics. Tom is an expert on the mathematics of inference, and he enjoys functional programming in languages like Clojure, Haskell & R. At Splunk, Tom is helping to develop our internal and external Data Science program and curriculum. Tom has a PhD in Mathematics from the University of Arizona, and until recently was a Courant Instructor at the Courant Institute at New York University. Tom is based in New York City.
      • Olivier de Garrigues, Senior Professional Services Consultant – Olivier is based in London on the EMEA Professional Services team and has helped out more than 40 customers in 10 countries on various Splunk projects in the past year and a half. Prior to this, he worked as a quantitative analyst with extensive use of MATLAB and R. He developed a keen interest in machine learning and enjoys dreaming about how to make Splunk better for data scientists, and helped develop the R Project App. Olivier holds an MS in Mathematics of Finance from Columbia University.
  • 5. Splunk for Data Science
  • 6. What is Data Science?
      Data Science is about extracting actionable insights from data.
      • Helps people make better decisions.
      • Can be used for automated decision-making.
      • Data Science is cross-functional, and blends techniques & theories from:
          – CS / Programming
          – Math and Statistics
          – Machine Learning
          – Data Mining / Databases
          – Data Visualization
          – Substantive / Domain Expertise
          – Social Science
          – Communication and Presentation
          – Accounting, Finance and KPIs
          – Business Analytics
      • Don’t be afraid of Data Science!
  • 7. Data Science & Analytics Teams
      There is no “one size fits all” data scientist. Data Science & Analytics teams are made up of people with complementary skill sets.
      Source: Schutt & O’Neil. Doing Data Science. 2013
  • 8. Splunk for Data Science
      Splunk is great for doing Data Science!
      • Integrate, query & visualize all the data:
          – Platform for machine data
          – Connects with any other data source
      • Easy-to-use Analytics capabilities.
      • Powerful algorithms out-of-the-box.
      • Sharp visualizations and dashboards.
      • Deliver results to both IT & Business users.
      • Complements other Data Science tools (next slide).
  • 9. Splunk and Data Science Tools
      Splunk complements other tools in the Data Science toolkit:
      • Hadoop: the workhorse of the Data Science world. Using Hunk, you can integrate Hadoop & HDFS seamlessly into Splunk.
      • R & Python: the preferred languages of Data Science. Execute R & Python scripts in your Splunk queries using the R Project App & SDK for Python.
      • SQL & other RDBMS: valuable stores for customer & product data. Use Splunk’s DB Connect App to mash relational data up with machine data.
      • External tools: export finalized data from Splunk using the ODBC Driver.
          – Tip: do all your data processing in Splunk/Hunk, and export only the final results.
      • D3 Custom Visualizations: sharp dashboards & reports using Splunk.
  • 10. Splunk and Data Science Use Cases
      Splunk is a powerful tool for lots of Data Science use cases. The slide marks each one as either a Green Use Case (easy out of the box) or a Yellow Use Case (needs tinkering):
      • Trend Forecasting
      • D3 Custom Visualizations
      • A/B Testing
      • Predictive Modeling
      • Root Cause Analysis
      • Sentiment Analysis
      • Anomaly Detection
      • Conversion Funnel/Pathing
      • Market Segmentation
      • More Algorithms via R & Python
      • Topic Modeling
      • Capacity Planning
      • Correlate Data from 2+ Sources
      • Data Munging & Normalization
      • KPIs & Executive Dashboards
  • 12. Use Case: Trend Forecasting
      Trend Forecasting: Given past & realtime data, predict future values & events.
      • Common applications:
          – Forecast revenue & other KPIs
          – Web server traffic & product downloads
          – Customer conversion rates
          – Estimate MTTR & server outages
          – Resource & capacity planning (AWS App)
          – Security threats (Enterprise Security App)
      • The “true” course of events can (and will) take only one of many divergent paths. But which one…?
      • Be mindful of rare events & black swans!
  • 13. Splunk Solution: predict
      predict command: forecast future trajectories of time series.
      • Implements a Kalman filter to identify seasonal trends.
      • Gives an “uncertainty envelope” as a buffer around the trend.
      • Tip: Always run the predict command on LOTS of past data. Capture low-frequency and high-frequency trends.
      • Remember: the future is always uncertain…
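To make the Kalman filter and the "uncertainty envelope" concrete, here is a minimal Python sketch of the simplest state-space model, a local level (random walk plus noise). It is an illustration under stated assumptions, not Splunk's implementation. The function name `local_level_forecast` and the noise variances `q` and `r` are hypothetical choices, and this model does not capture seasonality.

```python
import math


def local_level_forecast(series, horizon, q=1.0, r=4.0):
    """Filter `series` with a local-level Kalman filter, then forecast.

    q: assumed process-noise variance (how fast the level drifts).
    r: assumed observation-noise variance (how noisy each reading is).
    Returns one (mean, lower95, upper95) tuple per future step.
    """
    level, var = float(series[0]), r      # initialise from the first observation
    for y in series[1:]:
        var += q                          # predict: the level is a random walk
        gain = var / (var + r)            # Kalman gain
        level += gain * (y - level)       # update with the new observation
        var *= 1.0 - gain                 # uncertainty shrinks after the update
    forecast = []
    for h in range(1, horizon + 1):
        # Forecast variance grows by q per step, so the envelope widens.
        spread = 1.96 * math.sqrt(var + h * q + r)
        forecast.append((level, level - spread, level + spread))
    return forecast


if __name__ == "__main__":
    for mean, lo, hi in local_level_forecast([10, 12, 11, 13, 12, 14], horizon=3):
        print(f"{mean:.2f}  [{lo:.2f}, {hi:.2f}]")
```

The envelope widens with every step into the future, which is the numerical form of "the future is always uncertain".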
  • 14. Splunk Solution: Predict App
      David Carasso’s Predict App: forecast future values of individual events.
          – 8 minute walkthrough: https://www.youtube.com/watch?v=ROvaqJigNFg
      • Implements a Naïve Bayes classifier.
      • You have to train models!
      • Train a model to predict any target field using any reference field(s):
            fields ref1, ref2, ..., target
            | train my_model from target
      • Guess target field for incoming events:
            guess my_model into target
      • Temporal or non-temporal prediction (include _time among reference fields).
  • 15. Concept: Supervised Learning & Classification
      Supervised learning: use observed training data to classify values of unknown testing data.
      • predict command (Kalman filter): Training data = timechart of past & realtime values. Testing data = time range for future values.
      • Predict App (Naïve Bayes classifier): Training data = events with reference & target fields. Testing data = events with reference fields but not target field.
      • Tip: only deploy models & algorithms after extensive testing & evaluation.
      • More powerful learning algorithms using R Project App or SDK for Python.
  • 16. Demo: Predict App
      • Train a model to predict movie Rating based on MovieID, UserID, Genre, Tag:
            index=movielens Timestamp < 1199188800 UserID=593*
            | eval original_rating = case(Rating<3,"Dislike", Rating=3,"Neutral", Rating>3,"Like")
            | fields original_rating MovieID UserID Genre Tag
            | train rating_model from original_rating
      • Guess Rating for test data based on trained model:
            index=movielens Timestamp > 1199188800 UserID=593*
            | guess rating_model into guessed_rating
            | top original_rating guessed_rating
      • Accuracy of model: correct on 97.6% of values.
      • Tip: always train on LOTS of training data.
      • Evaluate before deploying.
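The train/guess workflow above can be sketched outside Splunk with a small categorical Naïve Bayes classifier in plain Python. This is an illustration of the general technique with add-one (Laplace) smoothing, not the Predict App's code. The helper names `train` and `guess` only mirror the app's commands, and the toy rows are invented for the example.

```python
import math
from collections import Counter, defaultdict


def train(rows, target):
    """Count class and per-field value frequencies from labeled rows (dicts)."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)   # (field, class) -> Counter of values
    vocab = defaultdict(set)              # field -> distinct values seen
    for row in rows:
        for field, value in row.items():
            if field == target:
                continue
            value_counts[(field, row[target])][value] += 1
            vocab[field].add(value)
    return class_counts, value_counts, vocab


def guess(model, row):
    """Return the class with the highest smoothed log-posterior for `row`."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)                      # class prior
        for field, value in row.items():
            if field not in vocab:
                continue                                 # ignore unknown fields
            count = value_counts[(field, label)][value]
            # Add-one smoothing keeps unseen values from zeroing the score.
            score += math.log((count + 1) / (n + len(vocab[field])))
        if score > best_score:
            best, best_score = label, score
    return best


if __name__ == "__main__":
    # Hypothetical toy training data, not taken from the MovieLens index.
    rows = [
        {"Genre": "Comedy", "Tag": "funny", "rating": "Like"},
        {"Genre": "Comedy", "Tag": "funny", "rating": "Like"},
        {"Genre": "Horror", "Tag": "gory", "rating": "Dislike"},
        {"Genre": "Horror", "Tag": "slow", "rating": "Dislike"},
        {"Genre": "Comedy", "Tag": "slow", "rating": "Neutral"},
    ]
    model = train(rows, target="rating")
    print(guess(model, {"Genre": "Comedy", "Tag": "funny"}))
```

As on the slide, a model like this should be evaluated on held-out events before it is trusted.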
  • 17. Use Case: Sentiment Analysis
      Sentiment Analysis: the assignment of “emotional” labels to textual data.
      • Can be simple +1 vs. -1, or more sophisticated: “happy”, “angry”, “sad”, etc.
      • Analyze tweets, emails, news articles, logs or any other textual data!
          – Social data correlates with other factors.
      • Typically done via supervised learning:
          – Train a model on labeled corpus of text.
          – Test the model on incoming text data.
      • Read more about Sentiment Analysis:
          – Chapter 14 of Big Data Analytics Using Splunk (pp. 255-282).
          – Michael Wilde & David Carasso. Social Media & Sentiment Analysis. .conf2012
      [Figure: 2011 Irish General Election; r = .79]
  • 18. Splunk Solution: Sentiment Analysis App
      David Carasso’s Sentiment Analysis App assigns binary sentiment values to textual data (logs, tweets, email, etc.).
      • Naïve Bayes classifier under the hood.
      • Twitter & IMDB models out of the box.
      • Can guess language of authorship, and “heat”, a measure of emotional charge.
      • Tip: compare relative sentiment changes across time & groups.
      • How to train your own models: http://answers.splunk.com/answers/59743
  • 20. Use Case: Anomaly Detection
      • An anomaly (or outlier) is an event which is vastly dissimilar to other events.
      • Anomaly Detection is one of Splunk’s most common use cases. Examples:
          – Transactions which occur faster than humanly possible.
          – DDoS attacks from IP address ranges.
          – High-value customer purchase patterns.
      • Quick techniques for finding statistical outliers:
          – Non-average outliers: more than 2*stdev from the avg.
          – Non-typical outliers: more than 1.5*IQR above perc75 or below perc25.
      • Tip: save these as eventtypes for automated outlier detection.
      • Once anomalies have been found, dig deeper to discover root causes.
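The two quick outlier rules above can be written down in a few lines of Python. This is a sketch using only the standard library. The function names and the sample response times are hypothetical, and the quartiles use the `inclusive` interpolation method, which may differ slightly from how Splunk computes perc25 and perc75.

```python
import statistics


def stdev_outliers(values, k=2.0):
    """Non-average outliers: more than k sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > k * sd]


def iqr_outliers(values, k=1.5):
    """Non-typical outliers: more than k*IQR above the 75th or below the 25th percentile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


if __name__ == "__main__":
    # Hypothetical response times in milliseconds, with one obvious spike.
    response_ms = [120, 125, 130, 118, 122, 127, 119, 124, 121, 900]
    print(stdev_outliers(response_ms))
    print(iqr_outliers(response_ms))
```

On this sample both rules flag only the 900 ms value. The IQR rule is less sensitive to the outlier itself, because a single extreme value inflates the standard deviation but barely moves the quartiles.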
  • 21. Splunk Solution: cluster
      • Anomalies are dissimilar to other events (by definition).
      • We can use clustering algorithms to help us detect anomalies:
          – Non-anomalous events typically form a few large clusters.
          – Anomalous events typically form lots of small clusters.
      • Cluster your data, sort ascending:
            cluster showcount=true labelonly=true
            | sort cluster_count cluster_label
      • Remember: there is no “right way” to find all anomalies. Explore your data!
  • 22. Concept: Unsupervised Learning & Clustering
      • A clustering algorithm is any process which groups together similar things (events, people, etc.), and separates dissimilar things (events, people, etc.).
      • Clustering is unsupervised: choose labels based on patterns in the data.
      • Clustering is in the eye of the beholder:
          – Lots of different clustering algorithms.
          – Lots of different similarity functions.
      • Do not confuse with:
          – Computer cluster: a group of computers working together as a single system.
          – Splunk cluster: a group of Splunk indexers replicating indexes & external data.
  • 24. Splunk Solution: Other Commands
      • anomalies:
          – Assigns an “unexpectedness” score to each event.
      • anomalousvalue:
          – Assigns an “anomaly score” to events with anomalous values.
      • outlier:
          – Removes or truncates outliers.
      • kmeans:
          – Powerful clustering algorithm. You choose k = # of clusters.
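For readers who want to see what "you choose k" means in practice, here is a minimal Python sketch of the standard k-means iteration (Lloyd's algorithm). It illustrates the general technique and is not Splunk's kmeans implementation. The function name, the naive choice of the first k points as starting centroids, and the sample points are all assumptions made for the example.

```python
def kmeans(points, k, iterations=20):
    """Cluster equal-length numeric tuples into k groups.

    Returns (centroids, clusters). Initialisation is deliberately naive:
    the first k points are used as starting centroids.
    """
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:    # converged, assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    # Hypothetical 2-D data: two well-separated groups of three points each.
    points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (0.8, 1.1), (9.1, 8.8), (8.9, 9.2)]
    centroids, clusters = kmeans(points, k=2)
    for centroid, cluster in zip(centroids, clusters):
        print(centroid, len(cluster))
```

The result depends on k and on the starting centroids, which is one reason clustering is "in the eye of the beholder".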
  • 25. Splunk Solution: Prelert (Partner App)
      • Manages Anomaly Detection directly.
          – Pre-built dashboards, alerts, API.
          – Use cases: Security, IT Ops / APM, DevOps
          – Godfrey Sullivan: “beautifully adjacent and complimentary to what Splunk does”
      • Can download from Splunk Apps.
          – May save you time with Anomaly Detection.
          – Can also be good source of inspiration for your own Anomaly Detection dashboards.
      • Keep in mind Prelert is a paid app:
          – Cost: $225/month @ 5GB
  • 26. Use Case: Market Segmentation
      • Market Segmentation: group customers according to common needs and priorities, and develop strategies to target them.
          – Market segments are internally homogeneous, and externally heterogeneous, i.e., market segments are clusters of customers.
      • Many reasons for Market Segmentation:
          – Different market segments require different strategies.
          – Customers in same segment have similar product preferences. Different segments, different preferences.
          – Segments should be reasonably stable, to allow for historical analysis (good for Data Science).
      • Use Splunk’s clustering algorithms to identify and label market segments!
  • 28. Intro to Data Visualization
      • Data Visualization is the creation and study of the visual representation of data, and is a vital part of Data Science.
      • The goal of data visualization is to communicate information:
          – Visualizations communicate complex ideas with clarity, precision, and efficiency.
          – Transmission speed of the optic nerve is about 9Mb/sec – fast image processing.
          – Pattern matching, edge detection.
          – Visualizations pack lots of information into small spaces. More than text alone!
  • 29. Telling Stories with Data Visualizations
      • We process data in linear narratives: even dashboards go top-to-bottom.
      • Visualizations help pierce the monotony of text, number & data streams.
      • Think about the story you’re telling:
          – Empathize with the viewer.
          – What’s their takeaway?
      • A good visualization tells its own story: “Island Nation Obtains Favourable Balance of Trade; Goes On To Rule The World.”
      • Weave multiple visualizations together to tell more effective stories.
      [Figure: William Playfair (1786)]
  • 30. [Figure] Source: New York Times. May 17, 2012. Splunk
  • 31. [Figure] Source: New York Times. May 17, 2012. Splunk
  • 32. Tips for Effective Data Visualizations
      • #1 tip: Plot the most important keys on x & y axes.
          – You choose “most important.”
          – You might need >1 visualization.
      • Manipulate size, color and shape to convey additional information.
      • Annotate, label and add icons ✔︎
      • Use chart overlay to correlate data sources. Mix histograms & line charts ↑↑↑
      • Manipulate numerical scale: linear vs. log scales (previous 2 slides).
      • Read more about Data Visualization:
          – Tableau’s whitepaper, Visual Analysis Best Practices (2013).
          – Edward Tufte’s The Visual Display of Quantitative Information (2001).
  • 33. D3 Custom Visualizations in Splunk
      • Splunk now supports D3 visualizations with some minor customization.
      • Satoshi’s talk: “I want that cool viz in Splunk!”
      • Resources for Custom Visualizations:
          – Splunk Web Framework Toolkit https://apps.splunk.com/app/1613/
          – Splunk 6.x Dashboard Examples https://apps.splunk.com/app/1603/
          – Custom SimpleXML Extensions http://apps.splunk.com/app/1772/
          – Lots more D3 visualizations for use: https://github.com/mbostock/d3/wiki/Gallery
  • 35. How-to for Sankey Charts
      • Install the Custom SimpleXML Extensions app: http://apps.splunk.com/app/1772/
      • Create your own app, and install Sankey chart components:
          – Drop autodiscover.js in $SPLUNK_HOME/etc/apps/<YOURAPP>/appserver/static
          – Copy & paste /sankeychart/ subfolder into $SPLUNK_HOME/etc/apps/<YOURAPP>/appserver/static/components
          – Restart Splunk.
      • In your dashboard:
          – Include script="autodiscover.js" in <form> or <dashboard> opening tag
          – Insert XML snippet from 2- or 3-node Sankey dashboard example
          – Change 2 instances of “custom_simplexml_extensions” to <YOURAPP>.
          – Update search and “data-options” parameters (nodes) in XML to reflect your data.
  • 36. Know Your Audience
      • Finally, keep in mind your audience: who are they, what questions do they care about, and how do they want to consume the data?
          – Executive: KPIs, charts, tables with icons ✔︎
          – Marketing Analyst: KPIs & metrics. Sharp images for their own reports & decks. Tableau.
          – Data Scientist: output clean data to organized data stores (Hunk, HDFS, SQL, NoSQL).
          – Sysadmin: sparklines, gauges for activity & MTTR, tables with highlighted anomalies.
          – Security Ops: maps with detailed overlays, drill down on anomalous events.
      • Bring it back to the business problem & use case!
  • 37. 3 Key Takeaways
      • Splunk is great for doing Data Science!
      • Splunk complements other tools in the Data Science toolkit.
      • Data Science is about extracting actionable insights from data.
  • 38. List of References
      Good books on Data Science:
      • Schutt & O’Neil. Doing Data Science. O’Reilly 2013
      • Provost & Fawcett. Data Science for Business. O’Reilly 2013
      • Max Shron. Thinking With Data. O’Reilly 2014
      • Edward Tufte. The Visual Display of Quantitative Information. Graphics Press 2001
      • Zumel & Mount. Practical Data Science with R. Manning 2014
      • Hastie et al. Elements of Statistical Learning. Springer-Verlag 2009 (free PDF!)
      Using Splunk for Data Science:
      • Zadrozny, Kodali (and Stout). Big Data Analytics Using Splunk. Apress 2013
      • David Carasso. Exploring Splunk. CITO Research 2012
      • David Carasso. Data Mining with Splunk. .conf2012
      • Michael Wilde & David Carasso. Social Media & Sentiment Analysis. .conf2012
      Good free references:
      • Tableau. Visual Analysis Best Practices. Tableau 2013
      • King & Magoulas. 2013 Data Science Salary Survey. O’Reilly 2013
      • DJ Patil. Building Data Science Teams. O’Reilly 2013
      • Cathy O’Neil. On Being A Data Skeptic. O’Reilly 2013