3. Analytics Big Picture
Pivot
Enables non-technical users to build complex
reports without the search language
Data Model
Provides more meaningful representation
of underlying raw machine data
Analytics Store
Acceleration technology delivers up to 1000x
faster analytics over Splunk Enterprise 5
6. Splunk Search Language
search and filter | munge | report | clean-up
sourcetype=access_combined source = "/home/ssorkin/banner_access.log.2013.6.gz"
| eval unique=(uid + useragent) | stats dc(unique) by os_name
| rename dc(unique) as "Unique Visitors" os_name as "Operating System"
7. Hurdles
index=main source=*/banner_access* uri_path=/js/*/*/login/* guid=* useragent!=*KTXN* useragent!=*GomezAgent* clientip!=206.80.3.67
clientip!=198.144.207.62 clientip!=97.65.63.66 clientip!=175.45.37.78 clientip!=209.119.210.194 clientip!=212.36.37.138 clientip!=204.156.84.0/24
clientip!=216.221.226.0/24 clientip!=207.87.200.162
| rex field=uri_path "/js/(?<t>[^/]*)/(?<v>[^/]*)/login/(?<l>[^/]*)"
| eval license = case(l LIKE "prod%" AND t="pro", "enterprise", l LIKE "trial%" AND t="pro", "trial", t="free", "free")
| rex field=v "^(?<vers>\d.\d)"
| bin span=1d _time as day
| stats values(vers) as vers min(day) as min_day min(eval(if(vers=="5.0", _time, null()))) as min_day_50 dc(day) as days values(license) as license by guid
| eval type = if(match(vers,"4.*"), "upgrade", "not upgrade") + "/" + if(days > 1, "repeat", "not repeat")
| search license=enterprise
| eval _time = min_day_50
| timechart count by type
| streamstats sum(*) as *
• Simple searches easy… Multi-stage munging/reporting is hard!
• Need to understand data’s structure to construct search
• Non-technical users may not have data source domain knowledge
• Splunk admins do not have end-user search context
8. Data Model Goals
• Make it easy to share/reuse domain knowledge
• Admins/power users build data models
• Non-technical users interact with data via pivot UI
10. What is a Data Model?
A data model is a search-time mapping of data onto a hierarchical structure
Encapsulate the knowledge
needed to build a search
Pivot reports are built on top
of data models
Data-independent
Screenshot here
11. A Data Model is a Collection of Objects
Screenshot here
19. Object Attributes
Auto-extracted – default and predefined fields
Eval expression – a new field based
on an expression that you define
Lookup – leverage an existing lookup
table
Regular expression – extract a new
field based on regex
Geo IP – add geolocation fields such
as latitude, longitude, country, etc.
20. Object Attributes
Set field types
Configure various flags
Note: Child object configuration can differ from parent
21. Best Practices
Use event objects as often as possible
– Benefit from data model acceleration
Resist the urge to use search objects instead of event objects!!
– Event based searches can be optimized better
Minimize object hierarchy depth when possible
– Constraint based filtering is less efficient deeper down the tree
Put the event object with the deepest tree (and most matching results) first
– Model-wide acceleration only for first event object and its
descendants
22. Warnings!
Object constraints and attributes cannot contain pipes or subsearches
A transaction object requires at least one event or search object in the data model
Lookups used in attributes must be globally visible (or at least visible to the app
using the data model)
No versioning on data models (and objects)!
29. Using the Splunk Search Language
Object Search String
| datamodel <modelname> <objectID> search
Example:
| datamodel WebIntelligence HTTP_Request search
Behind the scenes:
sourcetype=access_* OR sourcetype=iis* uri=* uri_path=* status=* clientip=* referer=*
useragent=*
30. Under the hood: Pivot Search String Generation
Pivot search = object search + filters + reporting + formatting
Example:
(sourcetype=access_* OR sourcetype=iis*) status=2*
uri=* uri_path=* status=* clientip=* referer=* useragent=*
| stats count AS "Count of HTTP_Success" by "useragent"
| sort limit=0 "useragent" | fields - _span
| fields "useragent" "Count of HTTP_Success"
| fillnull "Count of HTTP_Success"
| fields "useragent" *
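The formula on this slide can be sketched in a few lines of Python. This is only an illustration of how the four parts fit together, not how Splunk generates pivot searches internally; the function `build_pivot_search` and its parameter names are hypothetical and do not come from any Splunk API.

```python
def build_pivot_search(object_search, filters, reporting, formatting):
    """Compose a pivot-style search string from its four parts (hypothetical helper)."""
    # Filters are extra search terms appended to the object's own search.
    base = " ".join([object_search, *filters])
    # Reporting and formatting steps are chained as pipeline stages.
    return " | ".join([base, *reporting, *formatting])


pivot_search = build_pivot_search(
    object_search="(sourcetype=access_* OR sourcetype=iis*) "
                  "uri=* uri_path=* status=* clientip=* referer=* useragent=*",
    filters=["status=2*"],
    reporting=['stats count AS "Count of HTTP_Success" by "useragent"'],
    formatting=[
        'sort limit=0 "useragent"',
        "fields - _span",
        'fillnull "Count of HTTP_Success"',
    ],
)
print(pivot_search)
```

The sketch appends the filter after the object search rather than interleaving it, which is a simplification; the order of plain search terms should not change which events match.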
31. Using the Splunk Search Language
Pivot Search String
| pivot <modelname> <objectID> [statsfns, rowsplit, colsplit, filters, …]
Example:
| pivot WebIntelligence HTTP_Request count(HTTP_Request) AS "Count of HTTP_Request" SPLITROW status
AS "status" SORT 0 status
Behind the scenes:
sourcetype=access_* OR sourcetype=iis* uri=* uri_path=* status=* clientip=* referer=* useragent=*
| stats count AS "Count of HTTP_Request" by "status"
| sort limit=0 "status" | fields - _span
| fields "status", "Count of HTTP_Request"
| fillnull "Count of HTTP_Request"
| fields "status" *
32. Warnings
• | datamodel and | pivot are generating commands
– They must be at the beginning of the search string
• Use objectIDs NOT user-visible object names
34. Data Model on Disk
Each data model is a separate JSON file
Lives in <myapp>/local/data/models
(or <myapp>/default/data/models for
pre-installed models)
Has associated conf stanzas
and metadata
35. Editing Data Model JSON
At your own risk!
Models edited via the UI are validated
Manually edited data models: NOT SUPPORTED
Exception: installing a new model by adding the file to
<myapp>/<local OR default>/data/models is probably okay
36. Deleting a Data Model
Use the UI for appropriate cleanup
Potential for bad state if manually deleting model on disk
37. Interacting With a Data Model
Use data model builder and pivot UI – safest option!
Use REST API – for developers (see docs for details)
Use | datamodel and | pivot Splunk search commands
39. Data Model Acceleration
Admin or power user
Backend magic
Acceleration
Non-technical user
Run search using on-disk acceleration
Run a pivot report
No acceleration
Kick off ad-hoc acceleration and run search
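The two paths in this flow can be sketched as a small Python function that picks the `from` clause of a `tstats`-based pivot search. It is a rough sketch that assumes the only difference between the two paths is the data source (`datamodel=` for on-disk acceleration, `sid=` for an ad-hoc acceleration job), as in the pivot search examples elsewhere in this deck. The function name and parameters are hypothetical.

```python
def tstats_from_clause(model_name, accelerated_on_disk, adhoc_sid=None):
    """Pick the data source for a tstats-based pivot search (hypothetical helper)."""
    if accelerated_on_disk:
        # Model-wide acceleration: read the summaries already built on disk.
        return f'from datamodel="{model_name}"'
    if adhoc_sid is None:
        raise ValueError("no on-disk acceleration: an ad-hoc acceleration job id is needed")
    # Ad-hoc acceleration: read from the job kicked off when the pivot page loaded.
    return f"from sid={adhoc_sid}"


on_disk = tstats_from_clause("WebIntelligence", accelerated_on_disk=True)
ad_hoc = tstats_from_clause("WebIntelligence", accelerated_on_disk=False,
                            adhoc_sid="1379116434.663")
print(on_disk)
print(ad_hoc)
```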
40. Model-Wide Acceleration
Only accelerates the first event-based object and its descendants
Does not accelerate search and
transaction-based objects
Pivot search:
| tstats count AS "Count of HTTP_Success" from datamodel="WebIntelligence" where
(nodename="HTTP_Request") (nodename="HTTP_Request.HTTP_Success") prestats=true | stats count AS
"Count of HTTP_Success"
41. Ad-Hoc Object Acceleration
Kick off acceleration on pivot page (re)load for non-accelerated models
and search/transaction objects
Amortize cost of ad-hoc acceleration over repeated pivoting on
same object
Pivot search:
| tstats count AS "Count of HTTP_Success" from sid=1379116434.663 prestats=true | stats count AS
"Count of HTTP_Success"
Splunk 6 takes large-scale machine data analytics to the next level by introducing three breakthrough innovations:
Pivot – opens up the power of Splunk search to non-technical users with an easy-to-use drag-and-drop interface to explore, manipulate and visualize data
Data Model – defines meaningful relationships in underlying machine data, making the data more useful to a broader base of non-technical users
Analytics Store – patent-pending technology that accelerates data models by delivering extremely high-performance data retrieval for analytical operations, up to 1000x faster than Splunk 5
Let’s dig into each of these new features in more detail.
What is a Data Model, and why do I care?
Building a Data Model
Management, Acceleration, and Beyond
The Future!
Q&A
- The Splunk search language is very expressive.
- It can perform a wide variety of tasks ranging from filtering to data munging and reporting.
- There are various search commands for complex transformations and statistics (e.g. correlation, prediction, etc.).
What does the search do?
Basically, first it normalizes the individual accesses, which should be representable as a model object.
Next it aggregates by guid to create an "instance" object, which should be representable in a DM.
It calculates a field on that instance object, "type".
Then it builds a timechart of those, using a special "_time" value.
Low overhead to start, but the learning curve quickly gets steep.
Obtaining website usage metrics should not require understanding Apache vs IIS format.
Admins won’t know a priori what questions are being asked of the data… so they can’t provide canned dashboards for all scenarios.
Backup search for example:
eventtype=pageview
| eval stage_2=if(searchmatch("uri=/download*"), _time, null())
| eval stage_1=if(searchmatch("uri=/product*"), _time, null())
| eval stage_3=if(searchmatch("uri=*download_track*"), _time, null())
| stats min(stage_*) as stage_* by cookie
| search stage_1=*
| where isnull(stage_2) OR stage_2 >= stage_1
| where isnull(stage_3) OR stage_3 >= stage_2
| eval stage = case(isnull(stage_2), "stage_1", isnull(stage_3), "stage_2", 1==1, "stage_3")
| stats count by stage
| reverse
| accum count as cumulative_count
| reverse
| streamstats current=f max(cumulative_count) as stage_1_count last(cumulative_count) as prev_count
What are the important “things” in your data?
E.g. WebIntelligence might have:
HTTPAccess
HTTPSuccess
User Session
How are they related?
There’s more than one “right” way to define your objects.
Constraints filter down to a set of data.
Attributes are the fields and knowledge associated with the object.
Both are inherited!
A child object is a type of its parent object: e.g. an HTTP_Success object is a type of HTTP_Access.
Adding a child object is essentially a way of adding a filter on the parent.
A parent-child relationship makes it easy to do queries like “What percentage of my HTTP_Access events are HTTP_Success events?”
Constraints are essentially the search broken down into a hierarchy, attributes are the associated fields and knowledge
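A minimal Python sketch of this inheritance, assuming that a child object's effective search is simply its ancestors' constraints plus its own. The `ModelObject` class, its methods, and the `is_success` attribute are hypothetical names made up for illustration. The `status=2*` constraint is borrowed from the pivot search example in this deck and is only assumed to be the HTTP_Success constraint.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelObject:
    """A data model object with its own constraint and attributes (illustrative only)."""
    name: str
    constraint: str
    attributes: List[str] = field(default_factory=list)
    parent: Optional["ModelObject"] = None

    def lineage(self) -> List["ModelObject"]:
        """Return the chain of objects from the root down to this one."""
        chain, node = [], self
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain[::-1]

    def inherited_constraints(self) -> str:
        """All constraints that apply: every ancestor's plus this object's."""
        return " ".join(f"({obj.constraint})" for obj in self.lineage())

    def inherited_attributes(self) -> List[str]:
        """All attributes that apply, parents first."""
        return [attr for obj in self.lineage() for attr in obj.attributes]


http_access = ModelObject(
    "HTTP_Access", "sourcetype=access_* OR sourcetype=iis*", ["uri", "status", "clientip"]
)
# "is_success" is a made-up attribute used only to show attribute inheritance.
http_success = ModelObject("HTTP_Success", "status=2*", ["is_success"], parent=http_access)

print(http_success.inherited_constraints())
print(http_success.inherited_attributes())
```

Each constraint is wrapped in parentheses so that an `OR` in a parent constraint does not swallow the child's filter.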
Arbitrary searches that include transforming commands to define the dataset that they represent
TODO: Fix example here?
Enable the creation of objects that represent transactions
Use fields that have already been added to the model via event or search objects
This is how we capture knowledge
Required: Only events that contain this field will be returned in Pivot
Optional: The field doesn't have to appear in every event
Hidden: The field will not be displayed to Pivot users when they select the object in Pivot. Use this for fields that are only being used to define another attribute, such as an eval expression
Hidden & Required: Only events that contain this field will be returned, and the field will be hidden from use in Pivot
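The effect of these flags can be sketched with plain Python dictionaries standing in for events. This only mirrors the behaviour described in these notes and is not Splunk's implementation; the function `apply_attribute_flags`, the sample events, and the `raw_bytes` field are all invented for the example.

```python
def apply_attribute_flags(events, flags):
    """Filter events and fields the way the Required and Hidden flags are described."""
    required = [name for name, f in flags.items() if "required" in f]
    hidden = {name for name, f in flags.items() if "hidden" in f}
    # Required: drop events that lack any required field.
    kept = [e for e in events if all(name in e for name in required)]
    # Hidden: keep the events but do not expose the hidden fields.
    return [{k: v for k, v in e.items() if k not in hidden} for e in kept]


events = [
    {"status": "200", "uri": "/a", "raw_bytes": "512"},
    {"uri": "/b", "raw_bytes": "128"},  # no status field, so it is filtered out
]
# An empty set means the field is optional and visible.
flags = {"status": {"required"}, "uri": set(), "raw_bytes": {"hidden"}}

visible = apply_attribute_flags(events, flags)
print(visible)
```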
Be careful about lookup permissions – must be available in the context where you want to use them