2. Pragnya Dash
• Development Engineer, Web Operations
• Splunk administrator at Ancestry.com
• Set up, maintain, and support Splunk environments
“Our developers are very excited about using Splunk. The
product is very easy to use.”
5. Before Splunk: Limited Operational Visibility
Disparate Systems
• No central area to gather and analyze the logs
• Difficult to collect data
Problems Troubleshooting
• Cumbersome root cause analysis
• Long time to resolve issues
No Enterprise-wide Solution
• No single tool to identify functional and performance issues
• Custom, siloed tools have high support costs
6. Why Splunk?
Business risk of system downtime and slow issue identification and resolution
Enterprise-grade – standard platform across all teams and implementations
Troubleshoot both functionality and performance
Proactive performance monitoring
Need for Operational Excellence
7. Where We Are Today
• 20 stacks
• 200 hosts
• 750+ sourcetypes
Started with Splunk in late 2012
8. How Do We Use Splunk?
Network Operations
Application troubleshooting
Development
DevOps
19. Standardizing App Logs - Semantic Logging
• Custom appender for logging – compatible with Java and .NET apps
• Human-readable events
• Properly formatted timestamps
• Key-value pairs (JSON logging)
• Separate out multi-value events
• Unique transaction identifiers
Example pseudo-code:
void submitPurchase(purchaseId)
{
    log.info("action=submitPurchaseStart, purchaseId=%d", purchaseId)
    // These calls throw an exception on error
    submitToCreditCard(...)
    generateInvoice(...)
    generateFulfillmentOrder(...)
    log.info("action=submitPurchaseCompleted, purchaseId=%d", purchaseId)
}
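The semantic logging guidelines on this slide (timestamps, key-value pairs, unique transaction identifiers) can be illustrated with a minimal Python sketch. This is not the custom appender mentioned above, which is not shown in the deck; the names `format_event` and `new_transaction_id` and the `txnId` field are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone
import uuid


def _quote(value):
    """Quote values containing spaces, commas, or quotes so pairs stay unambiguous."""
    text = str(value)
    if " " in text or "," in text or '"' in text:
        return '"' + text.replace('"', '\\"') + '"'
    return text


def format_event(action, timestamp=None, **fields):
    """Render one event as an ISO-8601 timestamp followed by key=value pairs."""
    ts = (timestamp or datetime.now(timezone.utc)).isoformat()
    # Put the action first, then the remaining fields in a stable order.
    pairs = [("action", action)] + sorted(fields.items())
    rendered = ", ".join(f"{key}={_quote(value)}" for key, value in pairs)
    return f"{ts} {rendered}"


def new_transaction_id():
    """Hypothetical helper: a random ID shared by every event of one transaction."""
    return uuid.uuid4().hex


if __name__ == "__main__":
    txn = new_transaction_id()
    print(format_event("submitPurchaseStart", purchaseId=42, txnId=txn))
    print(format_event("submitPurchaseCompleted", purchaseId=42, txnId=txn))
```

Carrying the same identifier on the start and completion events is what lets a search pair them up later, and a missing completion event points at the call that threw.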
20. Splunk for Application Development
Identifying Defects
– Find and fix quickly to reduce time-to-market
Complying with SLAs
– Benchmarking application performance
– Monitoring API endpoints
Extracting data via the Splunk REST API
– In JSON format to put into HDFS for long-term storage and machine learning
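As a rough illustration of pulling events out over the REST API in JSON, the sketch below builds a request for Splunk's `search/jobs/export` endpoint with `output_mode=json` and converts the streamed response into JSON lines. It is an assumption-laden outline, not the team's actual integration: the host name and search string are hypothetical, authentication is omitted, and the copy into HDFS is left out entirely. It also assumes each response line is a JSON object whose `result` key holds one event.

```python
import json
import urllib.parse
import urllib.request


def build_export_request(base_url, search, earliest="-24h", latest="now"):
    """Build (but do not send) a POST to the streaming export endpoint."""
    body = urllib.parse.urlencode({
        "search": search,            # must start with a command such as "search"
        "earliest_time": earliest,
        "latest_time": latest,
        "output_mode": "json",
    }).encode("utf-8")
    url = base_url.rstrip("/") + "/services/search/jobs/export"
    # Authentication headers are deliberately omitted from this sketch.
    return urllib.request.Request(url, data=body, method="POST")


def iter_results(lines):
    """Yield the 'result' object from each newline-delimited JSON line."""
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue
        record = json.loads(raw)
        if "result" in record:  # skip lines that carry only metadata
            yield record["result"]


def to_json_lines(results):
    """Serialize events one per line, ready to be written to a file."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in results)


if __name__ == "__main__":
    # Hypothetical host and index; nothing is sent over the network here.
    req = build_export_request(
        "https://splunk.example.com:8089",
        "search index=web action=submitPurchaseCompleted",
    )
    print(req.get_method(), req.full_url)
```

Writing one JSON object per line keeps the output splittable, which suits batch jobs that later read the file from long-term storage.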
21. Reaching Operational Excellence
• Transaction visibility: Track transactions across multiple
components using common IDs
– Stitching transactions together across various machines by tracing
common user and session IDs in the logs
• Network monitoring: Real-time and historical view into
network activity
– When network switches reboot, they don't retain their logs locally,
but Splunk captures all the data for fast issue identification and
resolution
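The idea of stitching a transaction together from a common ID can be sketched outside Splunk in a few lines of Python. This is only a conceptual illustration, assuming key=value log lines; the `sessionId` field name, the host names, and the `stitch` function are hypothetical, and in practice the grouping would be done by a Splunk search rather than by application code.

```python
import re
from collections import defaultdict

# Matches key=value pairs; values may be quoted or run up to a comma or space.
PAIR = re.compile(r'(\w+)=("[^"]*"|[^,\s]+)')


def parse_pairs(line):
    """Extract key=value pairs from one log line into a dict."""
    return {key: value.strip('"') for key, value in PAIR.findall(line)}


def stitch(events, id_field="sessionId"):
    """Group (host, timestamp, line) events by a shared ID, ordered by time."""
    grouped = defaultdict(list)
    for host, timestamp, line in events:
        fields = parse_pairs(line)
        if id_field in fields:
            step = (timestamp, host, fields.get("action", ""))
            grouped[fields[id_field]].append(step)
    # ISO-8601 timestamps sort correctly as plain strings.
    return {key: sorted(steps) for key, steps in grouped.items()}


if __name__ == "__main__":
    events = [
        ("pay02", "2014-01-02T03:04:06", "action=chargeCard, sessionId=s1"),
        ("web01", "2014-01-02T03:04:05", "action=submitPurchaseStart, sessionId=s1"),
        ("inv03", "2014-01-02T03:04:07", "action=generateInvoice, sessionId=s1"),
    ]
    for session, steps in stitch(events).items():
        print(session, steps)
```

Because every component logs the same session ID, events from three different machines collapse into one ordered timeline per session.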
22. Future Goals
• Security use cases (just beginning work)
• Modifying the development process to maximize the value of Splunk
• Integration with Hadoop
– Hadoop team is currently leveraging the Splunk API
• Dashboards for product management and marketing for
insights into customer behavior and feature usage
23. Best Practice Recommendations
Communicate the value of Splunk to the executive team
– Enterprise-wide solution
– Opportunity cost of lost revenue brought on by downtime
– Reducing operational support resources
Ancestry.com Inc. is the world's largest online family history resource, with more than 2 million paying subscribers. More than 11 billion records have been added to the site in the past 15 years. Ancestry users have created more than 40 million family trees containing approximately 4 billion profiles. In addition to its flagship site, Ancestry.com offers several localized Web sites designed to empower people to discover, preserve and share their family history.