© 2012 Splunk, Inc.

Inside Splunk Enterprise
and Hunk: Architecture,
Analytics and Use Cases
Todd Papaioannou, CTO
Ledion Bitincka, Principal Architect
Brett Sheppard, Big Data PMM Director
December 2013
Splunk is a Different Approach
Built by IT pros for IT pros

It’s all about the user from novice to guru

One code base

Laptop to datacenter, Unix to Windows, agent to server

Open architecture

Files versus database, scriptable, APIs, SDKs, standards

Flexible and extensible

Any data, any format, different views, built to be extended

Scales to big data

Not filtered, not “dumbed” down, not locked into a fixed schema

Transparent support

Public documentation, public roadmap, real engineers on IRC
2
Inside Search-time Knowledge Extraction
Automatically discovered fields
And user-defined fields

... enable statistics and precise search on specific fields:
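
As a rough illustration of search-time extraction, the sketch below pulls key=value pairs out of raw text when a query runs, adds one user-defined field via a regular expression, and then computes a statistic on a field. The sample events, the `hour` field and the helper names are hypothetical and are not taken from Splunk itself.

```python
import re
from collections import Counter

# Hypothetical raw events, stored as plain text with no schema applied.
RAW_EVENTS = [
    "2013-12-01 10:00:01 action=login user=alice status=200",
    "2013-12-01 10:00:05 action=purchase user=bob status=500",
    "2013-12-01 10:00:09 action=login user=bob status=200",
]

# "Automatically discovered" fields: any key=value pair in the raw text.
KV_PATTERN = re.compile(r"(\w+)=(\S+)")

# A user-defined field (hypothetical): the hour taken from the timestamp.
CUSTOM = {"hour": re.compile(r"^\S+ (\d{2}):")}


def extract_fields(raw, custom_patterns=None):
    """Return the fields found in one raw event at search time."""
    fields = dict(KV_PATTERN.findall(raw))
    for name, pattern in (custom_patterns or {}).items():
        match = pattern.search(raw)
        if match:
            fields[name] = match.group(1)
    return fields


def count_by(field, events, custom_patterns=None):
    """Count events per value of a field, extracting fields on the fly."""
    extracted = (extract_fields(e, custom_patterns) for e in events)
    return Counter(f[field] for f in extracted if field in f)
```

Because nothing is extracted until a search runs, adding a new entry to `CUSTOM` changes what later searches see without reprocessing the stored text.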

3
Inside Search-time Knowledge Extraction
Searches saved as event types
Plus tagging of event types, hosts and other fields

... enable normalized reporting, knowledge
sharing and granular access control.
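
A minimal sketch of the idea, assuming an event type is simply a named saved search (modelled here as a predicate) and tags are labels attached to event type names. The names `fw_deny` and `fw_allow`, and the field values, are hypothetical.

```python
# Hypothetical event types: a name mapped to a saved "search" (a predicate here).
EVENT_TYPES = {
    "fw_deny": lambda e: e.get("sourcetype") == "firewall" and e.get("action") == "blocked",
    "fw_allow": lambda e: e.get("sourcetype") == "firewall" and e.get("action") == "permitted",
}

# Tags are attached to event types, not to individual events.
TAGS = {
    "fw_deny": {"firewall", "deny"},
    "fw_allow": {"firewall", "allow"},
}


def classify(event):
    """Return the event types an event matches and the tags it inherits."""
    types = [name for name, matches in EVENT_TYPES.items() if matches(event)]
    tags = set().union(*(TAGS.get(t, set()) for t in types))
    return types, tags
```

A report can then filter on the tag `firewall` regardless of which vendor-specific field values produced the match, which is the normalization the slide refers to.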
4
Powerful, Easy-to-use Analytics for Everyone
Data Models and Pivot
• Data models describe how underlying data is represented and accessed
• Drag-and-drop interface enables anyone to analyze raw, unstructured data
• Click to visualize any chart type; reports dynamically update when fields change

All chart types available in the chart toolbox

Save report
to share

Time window

Add constraints to
filter out events

Select fields from
data model

Data models: hierarchical object view of underlying data
5
Visualize and Share Data with Role-based Security
Build and Personalize
• Rapidly build advanced graphs and charts on-the-fly
• Combine charts, views and external data in dashboards and reports
• View and edit on any desktop or mobile device
• Drill down to raw data
• Protect data with role-based access controls
6
Integration Methods
Dashboards and Views

UI Extensibility

• Simple XML, JavaScript, Django
• Interactive dashboards and user workflows
• REST API
• Custom styling, behavior & visuals
• iframe embed

• Integrate charts, dashboards and query results into other applications
• Create workflows that trigger an action in an external system or use REST endpoints
• ODBC driver (beta) to integrate with 3rd-party visualization software
7
Analytics Use Cases by Splunk Product
Real-time indexing
Real-time search
Ad hoc analytics of historical data in Hadoop
Developers building big data apps on top of Hadoop
Splunk Hadoop Connect
Splunk Apps
Vibrant and passionate developer community

Use cases:
• App Dev & App Mgmt.
• IT Ops.
• Digital Intelligence
• Security & Compliance
• Product and Service Analytics
• Business Analytics
• Complete 360° Customer View
• Security Analytics

8
Real-Time Analytics with Managed Forwarders

Data

Scripted Input
TCP/UDP Input
Monitor Input

Parsing Queue

Parsing Pipeline
• Source, event typing
• Character set normalization
• Line breaking
• Timestamp identification
• Regex transforms

Index Queue

Indexing Pipeline

Splunk Index
• Raw data
• Index Files

Real-time Buffer

Real-time Search Process

9
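
The flow on this slide can be sketched as two queues with a parsing stage between them. The stage order (inputs, parsing queue, parsing pipeline, index queue, index) is one reading of the diagram, and the function names and timestamp format are assumptions made for illustration only.

```python
import queue
import re

# Assumed timestamp format for the sample data.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def parse(chunk, source):
    """Parsing stage: normalize the character set, break lines, find timestamps."""
    text = chunk.decode("utf-8", errors="replace")
    for line in text.splitlines():
        if not line.strip():
            continue
        match = TIMESTAMP.match(line)
        yield {"source": source, "time": match.group(1) if match else None, "raw": line}


def run_pipeline(inputs):
    """Move chunks from the inputs through both queues into an in-memory index."""
    parsing_queue, index_queue, index = queue.Queue(), queue.Queue(), []
    for source, chunk in inputs:
        parsing_queue.put((source, chunk))
    while not parsing_queue.empty():
        source, chunk = parsing_queue.get()
        for event in parse(chunk, source):
            index_queue.put(event)
    while not index_queue.empty():
        index.append(index_queue.get())
    return index
```

In a real deployment the stages would run concurrently, with the queues absorbing bursts; the single-threaded drain here only shows the hand-offs.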
Hunk: Splunk Analytics for Hadoop

10
Inside Hunk
The Problem

Easy to get data in
Large amounts of data already in Hadoop
Hard to get value out

12
Data -> Value (today)

Collect

Prepare

Ask

13
Data -> Value (ideally)

Collect

Prepare

Ask

14
What if?

Hadoop + Splunk =

15
Hadoop + Splunk = Hunk

16
Free Download

Go now to splunk.com/download/hunk and download your 60-day free trial,
with no limit on the size of the Hadoop cluster
17
Goals

18
Process the data in place
Maintain support for Splunk Processing Language (SPL)
True schema on read

Interactive
Ease of setup & use

19
Challenges

20
GOALS

Support SPL

Naturally suitable for MapReduce
Reduces adoption time
Challenge: Hadoop “apps” written in Java & all SPL code is in C++

Porting SPL to Java would be a daunting task (120+ commands)
Reuse the C++ code somehow
– JNI: neither easy nor stable
– use “splunkd” (the binary) to process the data

21
GOALS

Schema on read

Apply Splunk’s index-time schema at search time
– Event breaking, time stamping, etc.

Anything else would be brittle & a maintenance nightmare
Extremely flexible
Runtime overhead (cost of manpower >> cost of computation)
Challenge: Hadoop “apps” written in Java & all index-time schema logic is implemented in C++
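
A minimal sketch of schema on read, assuming events start with a `YYYY-MM-DD HH:MM:SS` timestamp: the raw lines are stored untouched, and event breaking and time stamping happen only when the data is read. The field names `_time` and `_raw` follow Splunk's naming, but the functions are illustrative and not Hunk's implementation.

```python
import re
from datetime import datetime

# Assumed rule: a new event begins at any line that starts with a timestamp.
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def _finish(lines):
    """Build one event from its lines and parse the leading timestamp, if any."""
    raw = "\n".join(lines)
    match = EVENT_START.match(raw)
    ts = datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S") if match else None
    return {"_time": ts, "_raw": raw}


def read_events(lines):
    """Break raw lines into events at read time; continuation lines are kept."""
    current = []
    for line in lines:
        line = line.rstrip("\n")
        if EVENT_START.match(line) and current:
            yield _finish(current)
            current = []
        current.append(line)
    if current:
        yield _finish(current)
```

Changing `EVENT_START` changes how every past and future read interprets the same files, which is the flexibility the slide trades against runtime overhead.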

22
GOALS

Interactive

No one likes to stare at a blank screen!
Challenge: Hadoop is designed for batch-like jobs

23
Virtual Indexes

24
Hunk Uses Virtual Indexes

• Enables seamless use of the Splunk stack on data in Hadoop
• Automatically handles MapReduce
• Technology is patent pending
25
Examples of Virtual Indexes

Hunk Search Head >

index = syslog (/home/syslog/…)
index = apache_logs
index = sensor_data
index = twitter

External System 1
External System 2
External System 3

26
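
One way to picture a virtual index is as a name that resolves to a location in an external system rather than to local index files. The sketch below assumes a simple in-memory registry; the class, the provider names and the paths are hypothetical and do not reflect Hunk's configuration format.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualIndex:
    name: str
    provider: str  # which external system holds the data (hypothetical label)
    path: str      # where the data lives in that system (hypothetical path)


# Hypothetical registry; real paths are not given in the slides.
REGISTRY = {
    "apache_logs": VirtualIndex("apache_logs", "external-system-1", "/hypothetical/apache_logs"),
    "sensor_data": VirtualIndex("sensor_data", "external-system-2", "/hypothetical/sensor_data"),
}

INDEX_TERM = re.compile(r"\bindex\s*=\s*(\w+)")


def resolve(search, registry=REGISTRY):
    """Find the index named in a search string and return its external location."""
    match = INDEX_TERM.search(search)
    if match is None:
        raise ValueError("search does not name an index")
    name = match.group(1)
    if name not in registry:
        raise KeyError(name)
    return registry[name]
```

From the user's side the search still reads `index = apache_logs`; only the resolution step knows the data sits in another system.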
Deployment Overview

27
Data processing

28
Mixed-mode Search

Streaming
• Transfers first several blocks from HDFS to the Hunk Search Head for immediate processing

Reporting
• Pushes computation to the DataNodes and TaskTrackers for the complete search

• Hunk starts the streaming and reporting modes concurrently
• Streaming results show until the reporting results come in
• Allows users to search interactively by pausing and refining queries
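
The two modes can be sketched as two tasks started together: a fast pass over the first few blocks and a full pass over all of them, with the partial answer shown until the full one arrives. This is a toy model using threads and word counts; the block contents, the `first_n` cutoff and the callback are assumptions, and the full pass stands in for a MapReduce job.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def streaming_preview(blocks, first_n=2):
    """Process only the first few blocks locally for a fast, partial answer."""
    return Counter(word for block in blocks[:first_n] for word in block)


def full_report(blocks):
    """Stand-in for the distributed job that covers every block."""
    return Counter(word for block in blocks for word in block)


def mixed_mode_search(blocks, on_update):
    """Start both modes concurrently; show the preview until the report is done."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        preview = pool.submit(streaming_preview, blocks)
        report = pool.submit(full_report, blocks)
        on_update("preview", preview.result())
        final = report.result()
        on_update("final", final)
    return final
```

The preview is approximate by construction, since it sees only part of the data, but it gives the user something to refine a query against while the complete job runs.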
29
Data Processing Pipeline

Raw data (HDFS)

Custom processing
You can plug in data preprocessors, e.g. Apache Avro or format readers

stdin

Indexing pipeline
• Event breaking
• Timestamping

Search pipeline
• Event typing
• Lookups
• Tagging
• Search processors

MapReduce/Java
splunkd/C++

30
Demo
Thank You
splunk.com/hunk

December 2013 HUG: Hunk - Splunk over Hadoop

  • 1. © 2012 Splunk, Inc. Inside Splunk Enterprise and Hunk: Architecture, Analytics and Use Cases Todd Papaioannou, CTO Ledion Bitincka, Principal Architect Brett Sheppard, Big Data PMM Director December 2013
  • 2. Splunk is a Different Approach Built by IT pros for IT pros It’s all about the user from novice to guru One code base Laptop to datacenter, Unix to Windows, agent to server Open architecture Files versus database, scriptable. APIs. SDKs, standards Flexible and extensible Any data, any format, different views, built to be extended Scales to big data Not filtered, not “dumbed” down, not locked into a fixed schema Transparent support Public documentation, public roadmap, real engineers on IRC 2
  • 3. Inside Search-time Knowledge Extraction Automatically discovered fields And user-defined fields ... enable statistics and precise search on specific fields: 3
  • 4. Inside Search-time Knowledge Extraction Searches saved as event types Plus tagging of event types, hosts and other fields ... enable normalized reporting, knowledge sharing and granular access control. 4
  • 5. Powerful, Easy-to-use Analytics for Everyone Data Models and Pivot • Data models describes how underlying data is represented and accessed • Drag-and-drop interface enables anyone to analyze raw, unstructured data • Click to visualize any chart type; reports dynamically update when fields change All chart types available in the chart toolbox Save report to share Time window Add constraints to filter out events Select fields from data model Data models: hierarchical object view of underlying data 5
  • 6. Visualize and Share Data with Role-based Security Build and Personalize • Rapidly build advanced graphs • • • • and charts on-the-fly Combine charts, views and external data in dashboards and reports View and edit on any desktop or mobile device Drill down to raw data Protect data with role-based access controls 6
  • 7. Integration Methods Dashboards and Views UI Extensibility • Simple XML, JavaScript, Django • Interactive dashboards and user workflows • REST API • Custom styling, behavior & visuals • iframe embed • Integrate charts, dashboards and query results into other applications • Create workflows that trigger an action in an external system or use REST endpoints • ODBC driver (beta) to integrate with 3rd-party visualization software 7
  • 8. Analytics Use Cases by Splunk Product Real-time indexing Real-time search App Dev & App Mgmt. Ad hoc analytics of historical data in Hadoop IT Ops. Digital Intelligence Security & Compliance Product and Service Analytics Business Analytics Complete 3600 Customer Security Analytics View Developers building big data apps on top of Hadoop Splunk Apps Vibrant and passionate developer community 8 Splunk Hadoop Connect
  • 9. Real-Time Analytics with Managed Forwarders Data Scripted Input Parsing Pipeline • Source, event typing • Character set normalization • Line breaking • Timestamp identification • Regex transforms 9 Index Queue TCP/UDP Input Parsing Queue Monitor Input Real-time Buffer Indexing Pipeline Real-time Search Process Raw data Index Files Splunk Index
  • 10. Hunk: Splunk Analytics for Hadoop 10
  • 12. The Problem Easy to get data in Large amounts of data already in Hadoop Hard to get value out 12
  • 13. Data -> Value (today) Collect Prepare 13 Ask
  • 14. Data -> Value (ideally) Collect Prepare Ask 14
  • 15. What if? Hadoop + Splunk = 15
  • 16. Hadoop + Splunk = Hunk 16
  • 17. Free Download Go now to splunk.com/download/hunk and download your 60-day free trial, with no limit on the size of the Hadoop cluster 17
  • 19. Process the data in place Maintain support for Splunk Processing Language (SPL) True schema on read Interactive Ease of setup & use 19
  • 21. GOALS Support SPL Naturally suitable for MapReduce Reduces adoption time Challenge: Hadoop “apps” written in Java & all SPL code is in C++ Porting SPL to Java would be a daunting task (120+ commands) Reuse the C++ code somehow – JNI – not easy nor stable – use “splunkd” (the binary) to process the data 21
  • 22. GOALS Schema on read Apply Splunk’s index-time schema at search time – Event breaking, time stamping etc Anything else would be brittle & maintenance nightmare Extremely flexible Runtime overhead (manpower >>$ computation) Challenge: Hadoop “apps” written in Java & all index-time schema logic is implemented in C++ 22
  • 23. GOALS Interactive No one likes to stare at a blank screen! Challenge: Hadoop is designed for batch-like jobs 23
  • 25. Hunk Uses Virtual Indexes • Enables seamless use of the Splunk stack on data in Hadoop • Automatically handles MapReduce • Technology is patent pending 25
  • 26. Examples of Virtual Indexes External System 1 index = syslog (/home/syslog/…) Hunk Search Head > External System 2 External System 3 26 index = apache_logs index = sensor_data index = twitter
  • 29. Mixed-mode Search Streaming Reporting • Transfers first several blocks from • Pushes computation to the HDFS to the Hunk Search Head for immediate processing DataNodes and TaskTrackers for the complete search • Hunk starts the streaming and reporting modes concurrently • Streaming results show until the reporting results come in • Allows users to search interactively by pausing and refining queries 29
  • 30. Data Processing Pipeline Raw data (HDFS) Custom processing stdin You can plug in data preprocessors e.g. Apache Avro or format readers Indexing pipeline Event breaking Timestamping Search pipeline Event typing Lookups Tagging Search processors splunkd/C++ MapReduce/Java 30 30
  • 31. Demo
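
The mixed-mode search on slide 29 can be illustrated with a small Python simulation. This is only a sketch of the idea, not Hunk's implementation: `BLOCKS`, `streaming_preview`, `reporting_job` and `mixed_mode_search` are hypothetical names, and the "blocks" are in-memory lists standing in for HDFS blocks. Both modes start concurrently; the partial streaming result is shown first and is replaced once the full reporting result arrives.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in data: each "block" is a list of HTTP status codes.
BLOCKS = [[200, 404, 200], [500, 200, 200], [404, 404, 200], [200, 200, 500]]


def count_by_status(blocks):
    """Count events per status code across the given blocks."""
    counts = {}
    for block in blocks:
        for status in block:
            counts[status] = counts.get(status, 0) + 1
    return counts


def streaming_preview(blocks, first_n=2):
    """Streaming mode: process only the first few blocks for a fast partial answer."""
    return count_by_status(blocks[:first_n])


def reporting_job(blocks, delay=0.05):
    """Reporting mode: simulate a slower batch job over all blocks."""
    time.sleep(delay)
    return count_by_status(blocks)


def mixed_mode_search(blocks):
    """Start both modes concurrently; yield the preview first, then the final result."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        preview = pool.submit(streaming_preview, blocks)
        final = pool.submit(reporting_job, blocks)
        yield "preview", preview.result()
        yield "final", final.result()


if __name__ == "__main__":
    for kind, counts in mixed_mode_search(BLOCKS):
        print(kind, counts)
```

A caller that stops iterating after the preview mimics a user pausing to refine the query before the full job completes.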
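
Slides 22 and 30 describe applying event breaking and timestamping at search time rather than at index time. A minimal Python sketch of that schema-on-read step follows. The line format, the `EVENT_START` pattern and the function names are assumptions made for illustration; in Hunk this logic lives in splunkd, not in code like this.

```python
import re
from datetime import datetime

# Assumed format: a new event starts on a line beginning with "YYYY-MM-DD HH:MM:SS".
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def break_events(raw):
    """Split raw text into events; lines without a timestamp continue the previous event."""
    events, current = [], []
    for line in raw.splitlines():
        if EVENT_START.match(line) and current:
            events.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        events.append("\n".join(current))
    return events


def extract_timestamp(event):
    """Parse the leading timestamp of an event, or return None if there is none."""
    match = EVENT_START.match(event)
    if match is None:
        return None
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")


RAW = (
    "2013-12-01 10:00:00 INFO start\n"
    "2013-12-01 10:00:05 ERROR failed\n"
    "  at module.func\n"
    "2013-12-01 10:00:09 INFO done"
)

if __name__ == "__main__":
    for event in break_events(RAW):
        print(extract_timestamp(event), repr(event))
```

Because the raw text is left untouched, changing `EVENT_START` changes how the same data is interpreted on the next search, which is the flexibility the slides attribute to schema on read.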

Editor's notes

  1. Splunk is a different kind of company with a different kind of product. Our technology is built by IT pros for IT pros to be software people will want to use, from novice to guru. The product features one code base. Splunk software is standards-based and built on an open architecture. In addition, Splunk is flexible and extensible, allowing you to access any data in any format and provide it for viewing across an organization. The Splunk architecture was designed to scale from a single user to truly massive and distributed global deployments. Splunk software doesn’t dumb down or normalize data to fit into a database, potentially removing context. And finally, we are easy to work with and provide a transparent support environment. Our documentation and product roadmap are public, and we even have real engineers on our IRC channel.
  2. Splunk automatically extracts a set of default fields for each event it indexes. You can "create" more "custom" fields by defining additional index-time and search-time field extractions. You can accomplish this manual field extraction through the use of search commands, the Interactive Field Extractor, and configuration files.
  3. Using Splunk's Common Information Model as a guide, you can normalize field names in your IT data so that loading external applications like firewall reports will "just work" with your existing fields. Tag event types to add information to your data. Any event type can have multiple tags. For example, you can tag all firewall event types as firewall, tag a subset of firewall event types as deny and tag another subset as allow. Once an event type is tagged, any event matching the tagged pattern will also be tagged.
  4. Splunk software enables organizations to gain new insights from this data, and a key focus for Splunk 6 is to empower a broader base of users in the organization with this insight: users that extend beyond core IT users. The Pivot interface enables non-technical and technical users alike to quickly generate sophisticated charts, visualizations and dashboards using simple drag and drop. Users can access different chart types from the Splunk toolbox to easily visualize their data in different ways. Queries using the Pivot interface are powered by underlying data models, which are usually designed and implemented by users who understand the format and semantics of their indexed data, and who are familiar with the Splunk Search Processing Language (SPL). Unlike traditional BI visualization tools focused on structured data analytics, the Pivot interface enables both non-technical and technical users to easily explore, manipulate and visualize raw, unstructured and polystructured data. It complements existing BI technologies by providing relevant business insights from a rapidly exploding new class of data.
  5. Generate reports on the fly from hard-to-understand data. Create powerful, information-rich reports to do analysis, without an advanced knowledge of search commands. Schedule delivery of any report via PDF and share it with management, business users or other stakeholders. Combine multiple charts, views, reports and external data. View and edit on any desktop, tablet or mobile device.
  6. Dashboards and Views: Build interactive dashboards and user workflows with Simple XML, JavaScript and Django. Easily add custom styling, behavior and visualizations. One-click access to develop in the Splunk web framework. More Options for UI Extensibility: Integrate charts, dashboards and query results into other applications. Create workflow actions that trigger an action in an external system. Alert creates a change request in a help-desk system. External / scripted lookups from a database or other system. Applications or interfaces developed on Splunk's REST API and SDKs. ODBC driver in beta to integrate with third-party visualization software such as Tableau, QlikTech and TIBCO Spotfire.
  7. Splunk Enterprise is a standalone solution and the industry-leading platform for machine data with all of Splunk’s core use cases. For customers who are storing historical data in Hadoop, we offer Hunk to run analytics on data stored natively in Hadoop. Hunk targets new use cases, including: data analytics for new product and service launches; synthesis of data from all customer touch points; comprehensive security analytics for modern threats; and easier big data app development than in raw Hadoop. Furthermore, you can use Splunk Enterprise Hadoop Connect to send data between Splunk Enterprise and Hadoop. Many accounts may decide to buy both Splunk Enterprise for real-time monitoring and real-time search together with Hunk for exploratory analytics of historical data stored in Hadoop. With this combination, you can run searches across native indexes in Splunk Enterprise and Hunk virtual indexes for data in Hadoop.
  8. Splunk Enterprise enables real-time analytics with managed forwarders for data ingest. For the most part, you can use monitor to add nearly all your data sources from files and directories. However, you might want to use upload to add one-time inputs, such as an archive of historical data. You can enable Splunk to accept an input on any TCP or UDP port. Splunk consumes any data sent on these ports. Use this method for syslog (default port is UDP 514), or set up netcat and bind to a port. TCP is the protocol underlying Splunk's data distribution and is the recommended method for sending data from any remote machine to your Splunk server. Splunk can index remote data from syslog-ng or any other application that transmits via TCP. However, there are times when you want to use scripts to feed data to Splunk for indexing, or prepare data from a non-standard source so Splunk can properly parse events and extract fields. You can use shell scripts, Python scripts, Windows batch files, PowerShell, or any other utility that can format and stream the data that you want Splunk to index. You can stream the data to Splunk or write the data from a script to a file. All data that comes into Splunk enters through the parsing pipeline as large chunks. During parsing, Splunk breaks these chunks into events, which it hands off to the indexing pipeline, where final processing occurs. During both parsing and indexing, Splunk acts on the data, transforming it in various ways. Most of these processes are configurable, so you have the ability to adapt them to your needs. To kick off a real-time search in Splunk Web, use the time range menu to select a preset Real-time time range window, such as 30 seconds or 1 minute. You can also specify a sliding time range window to apply to your real-time search. This defines a real-time buffer. The Splunk Index is the repository for Splunk Enterprise data. Splunk Enterprise transforms incoming data into events, which it stores in indexes.
  9. Hunk brings Splunk software's big data analytics stack to your data in Hadoop. Explore, analyze and visualize data, create dashboards and share reports from one integrated platform that works with Apache Hadoop or the Hadoop distribution of your choice. The Splunk Virtual Index decouples the data storage tier from the data access and analytics tiers, so that Hunk can transparently route requests to different data stores. Hunk uses this foundational patent-pending technology to enable seamless interactive exploration, analysis and visualization for data stored in Hadoop. You can create multiple virtual indexes that extend across one or more Hadoop clusters. Virtual indexes contain pointers to the data, such as assigning all files in a directory as an index, so you can prune partitions for faster search performance. With Hunk, even time stamp extraction and event breaking are done at search time.
  10. One of the key innovations in this product is Splunk Virtual Index technology. This patent-pending capability enables the seamless use of almost the entire Splunk technology stack, including the Splunk Search Processing Language for interactive exploration, analysis and visualization of data stored anywhere, as if it were stored in a Splunk Index. Splunk Analytics for Hadoop uses this foundational technology and is the first product to come from this innovation. To configure the virtual index, specify the external resource provider the virtual index is serviced by and specify the data paths that belong to this virtual index.
  11. A virtual index is a search-time concept that allows a Splunk search to access data and optionally push computation to external systems. A virtual index behaves as an addressable data container that can be referenced by a search. Virtual indexes contain pointers to the data, such as "all files in this directory belong in this index". Since the data that resides in the external system is not under direct management of Splunk, retention policies cannot be applied to the datasets that make up virtual indexes. And data in external systems such as Hadoop will often not be optimized for search. Hunk is able to provide access to and perform analytics on data that resides in an external system by encapsulating the data into addressable units using virtual indexes, while utilizing external resource processes to handle the details of pushing down computations to the external system. There are several key reasons for having multiple indexes: to control user access, to organize how you search data across disparate data sets, and to speed searches. You can define a virtual index as the contents of an entire Hadoop cluster, or subsets of data in that cluster, such as by data type.
  12. Hunk starts the streaming and reporting modes concurrently. Streaming results show until the reporting results come in. This allows users to search interactively by pausing and refining queries. This is a major, unique advantage of Hunk compared to alternative approaches such as Hive or SQL on Hadoop, which require a fixed schema in an effort to speed up searches, while Hunk retains the combination of schema on the fly with results preview.
  13. Before data is processed by Hunk, you can plug in your own data preprocessor. The preprocessors have to be written in Java and can transform the data in some way before Hunk gets a chance to. Data preprocessors can vary in complexity from simple translators (say, Avro to JSON) to complex image, video or document processing. Hunk translates Avro to JSON. These translations happen on the fly and are not persisted.
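
Note 2 describes search-time field extractions that enable statistics on specific fields. The idea can be sketched in Python with a regular expression whose named groups become field names. The log format, the `ACCESS_LOG` pattern and the field names are assumptions for illustration, not Splunk's built-in extraction rules.

```python
import re
from collections import Counter

# Hypothetical extraction rule: each named group becomes a field on the event.
ACCESS_LOG = re.compile(
    r'(?P<clientip>\S+) .*"(?P<method>[A-Z]+) (?P<uri>\S+) [^"]*" (?P<status>\d{3})'
)


def extract_fields(event):
    """Return the fields found in a raw event, or an empty dict if the rule does not match."""
    match = ACCESS_LOG.search(event)
    return match.groupdict() if match else {}


EVENTS = [
    '10.0.0.1 - - [01/Dec/2013:10:00:00] "GET /cart HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Dec/2013:10:00:01] "POST /buy HTTP/1.1" 500 87',
    '10.0.0.1 - - [01/Dec/2013:10:00:02] "GET /cart HTTP/1.1" 200 512',
]

# Statistics on an extracted field: count events per status code.
STATUS_COUNTS = Counter(extract_fields(e).get("status") for e in EVENTS)

if __name__ == "__main__":
    print(extract_fields(EVENTS[0]))
    print(STATUS_COUNTS)
```

The raw events are never modified, so a different pattern can be applied to the same data later without re-ingesting it.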
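
Notes 9 and 11 describe a virtual index as a set of pointers to data (for example, all files in a directory) that lets a search prune partitions. The following Python sketch models that idea only. The `VirtualIndex` class, the provider label, the file paths and the date-partitioned directory layout are all hypothetical and do not reflect Hunk's actual configuration format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class VirtualIndex:
    name: str
    provider: str     # hypothetical label for the external resource provider
    path_prefix: str  # every file under this directory belongs to the index

    def contains(self, path):
        return path.startswith(self.path_prefix)


def partition_date(path, prefix):
    """Assumed layout: <prefix><YYYY-MM-DD>/<file>."""
    day = path[len(prefix):].strip("/").split("/")[0]
    return date.fromisoformat(day)


def files_for_search(index, all_files, earliest, latest):
    """Keep only files that belong to the index and fall inside the search time range."""
    return [
        p for p in all_files
        if index.contains(p)
        and earliest <= partition_date(p, index.path_prefix) <= latest
    ]


APACHE = VirtualIndex("apache_logs", "hadoop-cluster-1", "/data/apache_logs/")
FILES = [
    "/data/apache_logs/2013-12-01/access.log",
    "/data/apache_logs/2013-12-02/access.log",
    "/data/apache_logs/2013-12-03/access.log",
    "/data/sensor_data/2013-12-02/readings.csv",
]

if __name__ == "__main__":
    print(files_for_search(APACHE, FILES, date(2013, 12, 2), date(2013, 12, 3)))
```

Only the two matching partitions would need to be read for this search, which is the pruning benefit the notes mention.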
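
Note 13 describes preprocessors that translate data on the fly without persisting the result. Real Hunk preprocessors are written in Java, as the note says, so the Python below is only a conceptual sketch of such a translator. It uses CSV to JSON as a stand-in for Avro to JSON to stay within the standard library, and `csv_to_json_lines` is a hypothetical name.

```python
import csv
import io
import json


def csv_to_json_lines(stream):
    """Lazily translate CSV rows into JSON event lines; nothing is written to disk."""
    for row in csv.DictReader(stream):
        yield json.dumps(row, sort_keys=True)


# Hypothetical input standing in for a file read from the external system.
SOURCE = "host,status\nweb01,200\nweb02,500\n"

if __name__ == "__main__":
    for line in csv_to_json_lines(io.StringIO(SOURCE)):
        print(line)
```

Because the function is a generator, each record is translated only when the downstream consumer asks for it, matching the "on the fly, not persisted" behavior described above.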