December 2013 HUG: Hunk - Splunk over Hadoop

© 2012 Splunk, Inc.

Inside Splunk Enterprise
and Hunk: Architecture,
Analytics and Use Cases
Todd Papaioannou, CTO
Ledion Bitincka, Principal Architect
Brett Sheppard, Big Data PMM Director
December 2013

Splunk is a Different Approach
Built by IT pros for IT pros

It’s all about the user from novice to guru

One code base

Laptop to datacenter, Unix to Windows, agent to server

Open architecture

Files versus database, scriptable, APIs, SDKs, standards

Flexible and extensible

Any data, any format, different views, built to be extended

Scales to big data

Not filtered, not “dumbed” down, not locked into a fixed schema

Transparent support

Public documentation, public roadmap, real engineers on IRC

Inside Search-time Knowledge Extraction
Automatically discovered fields
And user-defined fields

... enable statistics and precise search on specific fields:
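A minimal sketch of search-time field extraction, not Splunk's implementation. The sample events, the `key=value` discovery rule, and the `hour` field are assumptions made for illustration. Fields are computed only when a search runs, so the raw events stay untouched.

```python
import re
from collections import Counter

# Hypothetical raw events; nothing is parsed when they are stored.
RAW_EVENTS = [
    "2013-12-04 10:15:01 action=purchase status=200 user=alice",
    "2013-12-04 10:15:07 action=view status=404 user=bob",
    "2013-12-04 10:16:12 action=purchase status=500 user=alice",
]

# Automatic discovery (assumed rule): any key=value pair becomes a field.
KV_PATTERN = re.compile(r"(\w+)=(\S+)")

# User-defined field (hypothetical): a named group pulls out the hour.
USER_DEFINED = {"hour": re.compile(r"^\d{4}-\d{2}-\d{2} (?P<hour>\d{2}):")}


def extract_fields(raw):
    """Return the fields of one event, computed at search time."""
    fields = dict(KV_PATTERN.findall(raw))
    for name, pattern in USER_DEFINED.items():
        match = pattern.search(raw)
        if match:
            fields[name] = match.group(name)
    return fields


def search(events, **criteria):
    """Yield the fields of events whose extracted fields match all criteria."""
    for raw in events:
        fields = extract_fields(raw)
        if all(fields.get(key) == value for key, value in criteria.items()):
            yield fields


# Precise search on one field, then statistics on another.
purchases = list(search(RAW_EVENTS, action="purchase"))
status_counts = Counter(fields["status"] for fields in purchases)
```

Changing `USER_DEFINED` changes the result of the next search without touching stored data.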


Inside Search-time Knowledge Extraction
Searches saved as event types
Plus tagging of event types, hosts and other fields

... enable normalized reporting, knowledge
sharing and granular access control.

Powerful, Easy-to-use Analytics for Everyone
Data Models and Pivot
• Data models describe how underlying data is represented and accessed
• Drag-and-drop interface enables anyone to analyze raw, unstructured data
• Click to visualize any chart type; reports dynamically update when fields change

All chart types available in the chart toolbox

Save report to share

Time window

Add constraints to filter out events

Select fields from data model

Data models: hierarchical object view of underlying data

Visualize and Share Data with Role-based Security
Build and Personalize
• Rapidly build advanced graphs and charts on-the-fly
• Combine charts, views and external data in dashboards and reports
• View and edit on any desktop or mobile device
• Drill down to raw data
• Protect data with role-based access controls

Integration Methods
Dashboards and Views

UI Extensibility

• Simple XML, JavaScript, Django
• Interactive dashboards and user workflows
• REST API
• Custom styling, behavior & visuals
• iframe embed

• Integrate charts, dashboards and query results into other applications
• Create workflows that trigger an action in an external system or use REST endpoints
• ODBC driver (beta) to integrate with 3rd-party visualization software

Analytics Use Cases by Splunk Product
Real-time indexing
Real-time search
Ad hoc analytics of historical data in Hadoop

App Dev & App Mgmt.
IT Ops.
Digital Intelligence
Security & Compliance
Business Analytics

Product and Service Analytics
Complete 360° Customer View
Security Analytics

Developers building big data apps on top of Hadoop
Splunk Apps
Vibrant and passionate developer community

Splunk Hadoop Connect

Real-Time Analytics with Managed Forwarders

Data

Inputs
• Scripted Input
• TCP/UDP Input
• Monitor Input

Parsing Queue

Parsing Pipeline
• Source, event typing
• Character set normalization
• Line breaking
• Timestamp identification
• Regex transforms

Index Queue

Indexing Pipeline

Real-time Buffer

Real-time Search Process

Splunk Index
• Raw data
• Index Files
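The parsing pipeline steps named above can be sketched roughly as follows. This is an illustrative approximation, not Splunk's code: the timestamp format, the `latin-1` encoding, the masking transform, and the sample log are all assumptions. Only the field names `_time` and `_raw` follow Splunk's convention.

```python
import re
from datetime import datetime

# Assumed timestamp layout for this sample; real inputs vary.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
# Hypothetical regex transform: mask the value of a card= field.
CARD = re.compile(r"card=\d+")


def normalize_charset(raw_bytes, encoding="latin-1"):
    """Character set normalization: decode the input into Unicode text."""
    return raw_bytes.decode(encoding)


def break_lines(text):
    """Line breaking: a new event starts at each line that opens with a timestamp."""
    events = []
    for line in text.splitlines():
        if TIMESTAMP.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line  # continuation line, e.g. a stack trace
    return events


def identify_timestamp(event):
    """Timestamp identification: parse the leading timestamp, if there is one."""
    match = TIMESTAMP.match(event)
    if match is None:
        return None
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")


def apply_transforms(event):
    """Regex transforms: rewrite the event text before it is indexed."""
    return CARD.sub("card=XXXX", event)


def parse(raw_bytes, source):
    """Run the stages in order and tag each event with its source."""
    text = normalize_charset(raw_bytes)
    return [
        {"source": source, "_time": identify_timestamp(e), "_raw": apply_transforms(e)}
        for e in break_lines(text)
    ]


SAMPLE = (
    b"2013-12-04 10:15:01 ERROR payment failed card=12345678\n"
    b"  at Checkout.pay(Checkout.java:42)\n"
    b"2013-12-04 10:15:09 INFO caf\xe9 order accepted\n"
)
events = parse(SAMPLE, source="/var/log/app.log")
```

The three input lines become two events, because the indented stack-trace line is folded into the event before it.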

Hunk: Splunk Analytics for Hadoop


The Problem

Easy to get data in
Large amounts of data already in Hadoop
Hard to get value out


Data -> Value (today)

Collect

Prepare

Ask

Data -> Value (ideally)

Collect

Prepare

Ask


What if?

Hadoop + Splunk =


Free Download

Go now to splunk.com/download/hunk and download your 60-day free trial,
with no limit on the size of the Hadoop cluster

Process the data in place
Maintain support for Splunk Processing Language (SPL)
True schema on read

Interactive
Ease of setup & use


GOALS

Support SPL

Naturally suitable for MapReduce
Reduces adoption time
Challenge: Hadoop “apps” written in Java & all SPL code is in C++

Porting SPL to Java would be a daunting task (120+ commands)
Reuse the C++ code somehow
– JNI: neither easy nor stable
– use “splunkd” (the binary) to process the data
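The general idea of handing records to an external binary over stdin, rather than porting its logic, might look like the sketch below. The child process here is a stand-in (a small Python script that upper-cases its input); how Hunk actually invokes splunkd is not shown on the slide and is not modeled here.

```python
import subprocess
import sys

# Stand-in for the external binary: a tiny child program that upper-cases
# every line it reads on stdin. This is NOT how splunkd is invoked.
CHILD_CODE = "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())\n"


def process_with_external_binary(records, command):
    """Feed records to `command` on stdin and return the lines it writes to stdout."""
    completed = subprocess.run(
        command,
        input="".join(record + "\n" for record in records),
        capture_output=True,
        text=True,
        check=True,  # raise if the child exits with a non-zero status
    )
    return completed.stdout.splitlines()


stand_in = [sys.executable, "-c", CHILD_CODE]
results = process_with_external_binary(["error on host a", "ok on host b"], stand_in)
```

The caller needs no knowledge of the language the child is written in, which is the point of this design: the existing native code is reused unchanged behind a pipe.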


GOALS

Schema on read

Apply Splunk’s index-time schema at search time
– Event breaking, time stamping, etc.

Anything else would be brittle & a maintenance nightmare
Extremely flexible
Runtime overhead (manpower cost >> computation cost)
Challenge: Hadoop “apps” written in Java & all index-time schema logic is implemented in C++
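A small illustration of why schema on read is flexible, under assumed data: the same raw text is read twice with two different timestamp formats, and nothing has to be re-ingested when the format is corrected. The sample lines and the function `read_events` are hypothetical.

```python
from datetime import datetime

# Hypothetical raw data at rest; the date is ambiguous on purpose.
RAW = "04/12/2013 10:15:01 login alice\n04/12/2013 10:15:07 login bob\n"


def read_events(raw, time_format, time_width=19):
    """Break raw text into events and timestamp them while reading."""
    for line in raw.splitlines():
        stamp = datetime.strptime(line[:time_width], time_format)
        yield {"_time": stamp, "_raw": line}


# Two schemas applied to the same bytes; only the read-time setting differs.
day_first = list(read_events(RAW, "%d/%m/%Y %H:%M:%S"))
month_first = list(read_events(RAW, "%m/%d/%Y %H:%M:%S"))
```

The cost is that parsing is repeated on every read, which is the runtime overhead the slide accepts in exchange for less manual upkeep.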


GOALS

Interactive

No one likes to stare at a blank screen!
Challenge: Hadoop is designed for batch-like jobs


Hunk Uses Virtual Indexes

• Enables seamless use of the Splunk stack on data in Hadoop
• Automatically handles MapReduce
• Technology is patent pending

Examples of Virtual Indexes
Hunk Search Head >

External System 1
External System 2
External System 3

index = syslog (/home/syslog/…)
index = apache_logs
index = sensor_data
index = twitter
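One way to picture a virtual index is as a name that resolves to a location in an external system. The sketch below is a guess at the concept only: the registry, the provider names, and every path are placeholders, and the slide does not say which index lives in which system.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualIndex:
    name: str
    provider: str  # which external system holds the data (placeholder)
    path: str      # where the raw files live there (placeholder)


# Hypothetical registry; none of these mappings come from the slide.
REGISTRY = {
    index.name: index
    for index in [
        VirtualIndex("apache_logs", "external-system-1", "/data/apache"),
        VirtualIndex("sensor_data", "external-system-2", "/data/sensors"),
        VirtualIndex("twitter", "external-system-3", "/data/twitter"),
    ]
}


def resolve(search):
    """Map the `index = <name>` term of a search to its external location."""
    match = re.search(r"\bindex\s*=\s*(\w+)", search)
    if match is None:
        raise ValueError("search does not name an index")
    return REGISTRY[match.group(1)]  # KeyError if the index is unknown


target = resolve("index = twitter | stats count")
```

The search text stays the same whether the index is native or virtual; only the resolution step differs.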

Mixed-mode Search
Streaming
• Transfers first several blocks from HDFS to the Hunk Search Head for immediate processing

Reporting
• Pushes computation to the DataNodes and TaskTrackers for the complete search

• Hunk starts the streaming and reporting modes concurrently
• Streaming results show until the reporting results come in
• Allows users to search interactively by pausing and refining queries
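The concurrency described above can be approximated with a thread standing in for the MapReduce job. This is a toy model under stated assumptions: `BLOCKS`, `count_statuses`, and `mixed_mode_search` are hypothetical, and a local thread pool replaces the cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data blocks; each line is "<method> <status>".
BLOCKS = [
    ["GET 200", "GET 404"],
    ["POST 200", "GET 200"],
    ["GET 500", "GET 200"],
    ["POST 404", "GET 200"],
]


def count_statuses(block):
    """Count status codes in one block."""
    return Counter(line.split()[1] for line in block)


def reporting_mode(blocks):
    """Complete search over every block (stands in for the MapReduce job)."""
    total = Counter()
    for block in blocks:
        total += count_statuses(block)
    return total


def mixed_mode_search(blocks, preview_blocks=2):
    """Yield ("preview", partial) results, then one ("final", complete) result."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        final = pool.submit(reporting_mode, blocks)  # reporting starts right away
        preview = Counter()
        for block in blocks[:preview_blocks]:  # streaming: first blocks only
            if final.done():
                break  # the complete results are in, stop previewing
            preview += count_statuses(block)
            yield ("preview", dict(preview))
        yield ("final", dict(final.result()))


results = list(mixed_mode_search(BLOCKS))
```

How many previews appear depends on timing, since the reporting thread may finish first; the last item is always the complete result.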

Data Processing Pipeline
Raw data (HDFS)

Custom processing
You can plug in data preprocessors, e.g. Apache Avro or format readers

stdin

Indexing pipeline
• Event breaking
• Timestamping

Search pipeline
• Event typing
• Lookups
• Tagging
• Search processors

splunkd/C++

MapReduce/Java