2. My Background and Role
Patrick Farrell, Sr. Software Engineer
– Resident Splunk Administrator and Champion
– Started using Splunk two years ago as a developer for our eCommerce platform
– Responsible for Splunk administration, maintenance, custom application development, and dashboards
– Splunk Community of Practice owner at Cardinal Health
3. Company Overview
• Founded in 1971
• Over 30,000 employees
• Headquarters in Dublin, Ohio
• Ranked #19 on the Fortune 500
• Cardinal Health helps pharmacies, hospitals, ambulatory surgery centers and physician offices focus on patient care while reducing costs, enhancing efficiency and improving quality
4. Before Splunk
Manual search on 30+ servers using Unix command-line programs (Awk, Grep, Tail)
Operational support and development groups spent hours on root cause analysis and problem resolution
No insight into customer usage of our applications
No ability to be proactive with customer support
5. Splunk at Cardinal Health
Data sources
– Application Logs
– Access Logs, System Out, System Err, GC, and other custom application logs
– 25 individual source types
– 250+ individual sources
Indexer, Search Head, Deployment Server, and License Master
60 GB per day
Splunk used in pre-production and production environments
More than thirty individuals actively using Splunk on a regular basis
30+ Forwarders (5 Server Classes)
6. Splunk Use Cases
“Splunk is our Swiss Army Knife”
Improving Root Cause Analysis
Gathering Customer Usage Statistics
Increasing Efficiency
Proactive Customer Support
7. Return on Investment
“One of the most important benefits of using Splunk from an application development standpoint is illustrated by how it has helped us clean up our logging code.”
8. Increased Efficiency
With 100+ developers on a single application, erroneous logging code is inevitable
– 1.2 million severe error messages / hour
Splunk is used to analyze application logs during performance/endurance testing
The punct command is your friend
Key benefit: Splunk helps us clean up our code
– Capacity savings (storage, license)
– Improved efficiency (speed)
– Reduced spam
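The punct tip above can be sketched as a search; the sourcetype and severity field names here are illustrative, not from the deck:

```spl
sourcetype=app_logs log_level=SEVERE
| stats count BY punct
| sort - count
| head 10
```

punct is a field Splunk extracts automatically, holding each event's punctuation pattern, so the noisiest log statement templates float to the top even when the variable parts of the message differ.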
9. Improved Systems Uptime and Performance
Writing Splunk-friendly code
– Inventory Manager
Splunk’s search processing language allowed us to easily perform analysis once considered impossible from the Unix prompt.
Analytics for:
– Most active accounts
– Most invoked operations
– SQL Database contention
– Longest running operations
– Exceptions encountered
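Each analytic above is a short search in Splunk's search processing language. For example, assuming Inventory Manager events carry operation and exec_time key-value pairs (hypothetical field names, not from the deck), the longest running operations fall out of:

```spl
sourcetype=inventory_manager
| stats count, avg(exec_time) AS avg_ms, max(exec_time) AS max_ms BY operation
| sort - avg_ms
| head 10
```

Swapping the BY clause to an account field gives the most active accounts, and swapping stats for a count of exception types covers the rest of the list.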
17. Improving Customer Satisfaction
• Splunk alerts us when customers see the contact help desk message on our site
– Reach out to customer immediately
• Immediate support = happier customers = more revenue
• Gathering customer usage data to identify which functionality should be enhanced or retired
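The alert described above can be sketched as a scheduled search; the sourcetype and message text are assumptions for illustration:

```spl
sourcetype=access_logs "contact the help desk"
| stats count BY customer_id
```

Saved as an alert that fires whenever results are returned, this hands the support team a list of affected customers while they are still on the site.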
18. Reducing Root Cause Analysis Time
Searching logs across many application servers can take hours. Remember, time is money!
Now an alert or search helps us identify most issues in seconds!
22. Results with Splunk
Reduced Downtime
The most important benefit to our large eCommerce application is reduced downtime. Every minute of downtime results in a significant loss of revenue.
Improved Customer Satisfaction
Increased Efficiencies
We were able to reduce our daily indexing volume by 3 GB by identifying and eliminating defects that produced in excess of 1.2 million severe events per hour. Thank you, punct!
Reduced MTTR
Searching and Reporting
Ability to drill down to specific areas and find issues in seconds instead of hours.
Application Enhancements
We can determine the focus of future enhancements by monitoring how our customers are using the site. Likewise, we can also identify unused functionality.
23. Best Practice Recommendations
Splunk is an amazing platform as long as you are prepared for it!
Create a roadmap that outlines how you intend to use Splunk and where you would like to take the product within your organization.
Plan your environment and account for future growth (users, searches, license volume, hardware capacity, storage, etc.).
24. Best Practice Recommendations
Generate a unique identifier for each transaction and write it to the log as part of each event so that you may easily identify all related events.
Take advantage of automatic field extraction using key-value pairs or use a logging format such as JSON that can provide automatic field extraction.
Capture execution time in log events for an added dimension.
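These recommendations work together. Suppose each event is written as key-value pairs, e.g. level=INFO txn_id=4f2c9a op=submitOrder exec_ms=142 (hypothetical field names and ID). Splunk extracts those fields automatically, so a single search recovers every event in the transaction, in order:

```spl
sourcetype=app_logs txn_id=4f2c9a
| sort _time
```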
25. Future Plans
Expanding use of Splunk to our Medical eCommerce Platform
Creation of additional operational and business dashboards
Evaluate the possibility of using Splunk in DEV and QA
And so, if a developer wanted to identify the root cause of a problem, they may have significant difficulty locating the log information. Originally, our environment was larger; there were more servers in the mix. So it would really be almost a never-ending process to try and find the place where the problem occurred, when we had to physically log in to each box, examine however many log files were on each box, and then move on to the next one to see where the problem originated.

Patrick Farrell: We would definitely increase our outage time, our downtime, if we were having it, because trying to locate a problem was quite a challenge. So, by adding Splunk to the mix, especially once we managed to stabilize our environment, we have definitely seen some benefits, not necessarily in support, but in reducing or cleaning up some issues that exist in our application.

Patrick Farrell: Correct. Those customers that are currently on the site that are experiencing difficulties, maybe they're presented with the "contact the help desk" type message. And so, I'll see that. With that information, our goal is that maybe we can make a proactive attempt to contact a customer. We haven't gone this far yet. But, for example, automatically pop up a message on the customer's screen saying, "Hey, would you like to speak with customer support about the issue you're experiencing?" That kind of push to the customer, saying, "Look, we're here for you. Come take advantage of the opportunity to speak with us about the issue that you're experiencing."
Patrick Farrell: There are 30-some forwarders; I think 33 or so right now. We collect log data from essentially custom application logs. We collect HTTP log data. We're bringing in JVM-type logs, verbose GC logs. Where else do we pull data from? System out, system error. So, we have a number of source files. And it's not just Order Express that uses it; we also have our EDI group. They use the same Splunk installation for their source types. They have well over 200 individual source files that are managed and indexed by Splunk, probably on the order of about 20 source types just for them.

By the way, I want to switch gears just for a second. (Cathy), I just pinged (Scott). He said that Splunk was really the only tool being considered. He did say they briefly looked at an IBM tool, but he said it was far more expensive and less functional than Splunk.

Patrick Farrell: Well, right now, we're basically consolidated onto a single virtual machine, and I'll tell you that it's an undersized virtual machine. Just our production server alone handles about 60 gigabytes a day of log volume, and that's going through a single virtual machine. It's a Linux operating system, and it's got the deployment server, license master, indexer, and search head all in one virtual machine.
Patrick Farrell: What we use it for in our stage environment, specifically, is to analyze performance testing results. This is probably one of our biggest benefits from Splunk from an application development standpoint: just cleaning up the code. Like I said, we have a large development team and everybody is off doing their thing. When you bring this whole thing together, put it out there, and look at the finished product, you see, "Wow, maybe there's a million severe error messages an hour in the production logs."

Patrick Farrell: And you look at that and say, "A million severe error messages an hour. Do I really need a million severe error messages an hour? My system is still functioning. I'm not getting alerted. Why is it doing this?" So, what we're using it for is to go back, almost retroactively, and find the places in the logs where people were printing worthless log statements. To give you one example, I found one that was printed 1.2 million times an hour in the log and it had nothing in it.

Patrick Farrell: Basically, as I said before, I was a developer, and the team that I was a developer for is called the Inventory Manager. Inventory Manager, that particular piece of Order Express, the larger application, is using these logs. As a developer, I was basically the one who wrote the logs, so I knew the most about what was going into them. I also had a lot of control over the information and how I was going to write it to the log. Ultimately, it ended up being very advantageous to me to change the way I was writing these logs so that they were naturally useful to Splunk.
And so, that information specifically allowed me to build some pretty interesting dashboards, first from an operational standpoint and then more from a business perspective. From an operational perspective, I show, for example, the top 10 accounts that are using the system.

Patrick Farrell: Correct. There is information like that in these logs that is completely dead; there's really no use for it. So, we're going back and taking these statements out, because when you add those statements up, we may have gigabytes' worth of data per day that comes from just one statement in our production logs. For us, there's monetary benefit in taking those nasty statements out, cleaning them up, and moving on. Not to mention, they shouldn't be there in the first place; our system will run faster if we don't have to write these silly statements to the log. So, it's that kind of retroactive work at the moment. We do it in our stage environment. We also use the stage environment to look for the most frequently occurring messages, for example with the punct command.
Patrick Farrell: I see the most frequently invoked operations and their accounts throughout the day. I see the longest running operations. I see the database SQL contention, whether or not there are any transaction timeouts at the database level, for example. I see the overall number of business and system exceptions individually. Technically, in production, we shouldn't have any of what are called the business exceptions; those are what we consider business rule violations, and those should be caught during testing. So, I should be able to look at this graph, a radial gauge in our case, and see it basically at zero all the time. If I don't see it at zero, then we have defects in production. The question is, where are those defects? Are they on the service layer? Are they on the front end? Is it a front-end validation that failed and the service layer caught it? Then, of course, there are system exceptions, which are completely unexpected cases that the system caught. So, I see those. I also see which users are having the most difficulty with the system.

Additionally, I had a couple of other notes. On the business side of things, I also track the types of transactions that are being executed on the system and when throughout the day they are executed. I track how certain pieces of functionality are being used. For example, a given report may have one particular input that has five options, and maybe the users are only using two of those options. And so, we're supporting functionality for three others that really nobody is using.
So, that knowledge gives us the ability to go back and say, "Well, if nobody's using this functionality over the course of a month, two months, three months, or more, do we really need to keep it? Should we retire it?"

Patrick Farrell: ... the more I thought about it, I was like, "OK, so I can extract the execution time field. Wow, that's really useful. Now I can do aggregated searches across all my boxes." And then I thought about it some more. Now I've got all these cool things like subsearches and this rich search language, the search processing language that Splunk has. And I quickly fell in love with it. I was like, "Wow, this is great. I can do amazing things that I could never do with my data before from the UNIX prompt. I can now do that in Splunk and take this to a whole new level." That's essentially what excited me about the product: the richness of that search processing language.
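Once execution time is an extracted field, the aggregated cross-box analysis described here becomes a one-line search (sourcetype and field names are assumed for illustration):

```spl
sourcetype=app_logs
| timechart span=1h avg(exec_time) BY host
```

The same data that once required grepping every server individually is summarized per host, per hour, in a single chart.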
show, let's see, the most frequently – you know, the most – the top 10 accounts for example that are using the system.Patrick Farrell: I see the most frequently invoked operations and their accounts throughout the day. I see the longest running operations. I see the database SQL contention, whether or not there are any transaction timeouts for example at the database level. What else? I see the overall number of business and system exceptions individually, of course, so that way I can – technically in production, we shouldn't have any what are called the business exceptions. Those are what we consider business role validations or violations, and those should be caught during testing. So, I should be able to look at this graph or this – what we have is a radio gauge. I should be able to see it basically at zero all the time. And if I don't see that at zero, then we have defects that are there in production. The question is how are those defects – you know, where are those defects? Are they on the service layer? Are they on the front end? Is it a front end validation that failed and the service layer caught it? That type of thing. So – then of course, there are system exceptions which are completely unexpected. You know, cases that the system caught. So, I see those. I also see which users are having the most difficulty with the system. A Additionally, I had a couple other notes. On the business side of things, I also track the type of transactions that are being executed on the system and (when) throughout the day they are executed. I track how certain pieces of functionality are being used. For example, a given report may have, let's say, five inputs or one – the report may have one particular input that has five options, and maybe the users are only using two of those options. And so, we're supporting functionality for three others that really nobody is using. 
So, that knowledge gives us the ability to go back and say, "Well, if nobody's using this functionality over the course of a month, two months, three months, et cetera, or more, do we really need to keep this functionality? Should we retire it?"Patrick Farrell: ... the more (I thought) about it, I was like, "Ooh, OK. So I can extract the execution time field." I'm like, "Wow, that's really useful. So now, I could kind of do aggregated searches across all my (boxes)." And then I though about it some more, I was like, "Ooh, OK." So now, I've got all these cool things like subsearches and, you know, this rich query or this rich search language, the search processing language that Splunk has. And I quickly fell in love it. I was like, "Wow, this is great. I can do some amazing things that I could never do with my data before from the UNIX prompt. I can now do that in Splunk. And take this, you know, basically to a whole new level." And that's essentially what excited me, you know, about the product. It was basically the richness of that search processing language.
show, let's see, the most frequently – you know, the most – the top 10 accounts for example that are using the system.Patrick Farrell: I see the most frequently invoked operations and their accounts throughout the day. I see the longest running operations. I see the database SQL contention, whether or not there are any transaction timeouts for example at the database level. What else? I see the overall number of business and system exceptions individually, of course, so that way I can – technically in production, we shouldn't have any what are called the business exceptions. Those are what we consider business role validations or violations, and those should be caught during testing. So, I should be able to look at this graph or this – what we have is a radio gauge. I should be able to see it basically at zero all the time. And if I don't see that at zero, then we have defects that are there in production. The question is how are those defects – you know, where are those defects? Are they on the service layer? Are they on the front end? Is it a front end validation that failed and the service layer caught it? That type of thing. So – then of course, there are system exceptions which are completely unexpected. You know, cases that the system caught. So, I see those. I also see which users are having the most difficulty with the system. A Additionally, I had a couple other notes. On the business side of things, I also track the type of transactions that are being executed on the system and (when) throughout the day they are executed. I track how certain pieces of functionality are being used. For example, a given report may have, let's say, five inputs or one – the report may have one particular input that has five options, and maybe the users are only using two of those options. And so, we're supporting functionality for three others that really nobody is using. 
So, that knowledge gives us the ability to go back and say, "Well, if nobody's using this functionality over the course of a month, two months, three months, et cetera, or more, do we really need to keep this functionality? Should we retire it?"Patrick Farrell: ... the more (I thought) about it, I was like, "Ooh, OK. So I can extract the execution time field." I'm like, "Wow, that's really useful. So now, I could kind of do aggregated searches across all my (boxes)." And then I though about it some more, I was like, "Ooh, OK." So now, I've got all these cool things like subsearches and, you know, this rich query or this rich search language, the search processing language that Splunk has. And I quickly fell in love it. I was like, "Wow, this is great. I can do some amazing things that I could never do with my data before from the UNIX prompt. I can now do that in Splunk. And take this, you know, basically to a whole new level." And that's essentially what excited me, you know, about the product. It was basically the richness of that search processing language.
show, let's see, the most frequently – you know, the most – the top 10 accounts for example that are using the system.Patrick Farrell: I see the most frequently invoked operations and their accounts throughout the day. I see the longest running operations. I see the database SQL contention, whether or not there are any transaction timeouts for example at the database level. What else? I see the overall number of business and system exceptions individually, of course, so that way I can – technically in production, we shouldn't have any what are called the business exceptions. Those are what we consider business role validations or violations, and those should be caught during testing. So, I should be able to look at this graph or this – what we have is a radio gauge. I should be able to see it basically at zero all the time. And if I don't see that at zero, then we have defects that are there in production. The question is how are those defects – you know, where are those defects? Are they on the service layer? Are they on the front end? Is it a front end validation that failed and the service layer caught it? That type of thing. So – then of course, there are system exceptions which are completely unexpected. You know, cases that the system caught. So, I see those. I also see which users are having the most difficulty with the system. A Additionally, I had a couple other notes. On the business side of things, I also track the type of transactions that are being executed on the system and (when) throughout the day they are executed. I track how certain pieces of functionality are being used. For example, a given report may have, let's say, five inputs or one – the report may have one particular input that has five options, and maybe the users are only using two of those options. And so, we're supporting functionality for three others that really nobody is using. 
So, that knowledge gives us the ability to go back and say, "Well, if nobody's using this functionality over the course of a month, two months, three months, et cetera, or more, do we really need to keep this functionality? Should we retire it?"Patrick Farrell: ... the more (I thought) about it, I was like, "Ooh, OK. So I can extract the execution time field." I'm like, "Wow, that's really useful. So now, I could kind of do aggregated searches across all my (boxes)." And then I though about it some more, I was like, "Ooh, OK." So now, I've got all these cool things like subsearches and, you know, this rich query or this rich search language, the search processing language that Splunk has. And I quickly fell in love it. I was like, "Wow, this is great. I can do some amazing things that I could never do with my data before from the UNIX prompt. I can now do that in Splunk. And take this, you know, basically to a whole new level." And that's essentially what excited me, you know, about the product. It was basically the richness of that search processing language.
show, let's see, the most frequently – you know, the most – the top 10 accounts for example that are using the system.Patrick Farrell: I see the most frequently invoked operations and their accounts throughout the day. I see the longest running operations. I see the database SQL contention, whether or not there are any transaction timeouts for example at the database level. What else? I see the overall number of business and system exceptions individually, of course, so that way I can – technically in production, we shouldn't have any what are called the business exceptions. Those are what we consider business role validations or violations, and those should be caught during testing. So, I should be able to look at this graph or this – what we have is a radio gauge. I should be able to see it basically at zero all the time. And if I don't see that at zero, then we have defects that are there in production. The question is how are those defects – you know, where are those defects? Are they on the service layer? Are they on the front end? Is it a front end validation that failed and the service layer caught it? That type of thing. So – then of course, there are system exceptions which are completely unexpected. You know, cases that the system caught. So, I see those. I also see which users are having the most difficulty with the system. A Additionally, I had a couple other notes. On the business side of things, I also track the type of transactions that are being executed on the system and (when) throughout the day they are executed. I track how certain pieces of functionality are being used. For example, a given report may have, let's say, five inputs or one – the report may have one particular input that has five options, and maybe the users are only using two of those options. And so, we're supporting functionality for three others that really nobody is using. 
So, that knowledge gives us the ability to go back and say, "Well, if nobody's using this functionality over the course of a month, two months, three months, et cetera, or more, do we really need to keep this functionality? Should we retire it?"Patrick Farrell: ... the more (I thought) about it, I was like, "Ooh, OK. So I can extract the execution time field." I'm like, "Wow, that's really useful. So now, I could kind of do aggregated searches across all my (boxes)." And then I though about it some more, I was like, "Ooh, OK." So now, I've got all these cool things like subsearches and, you know, this rich query or this rich search language, the search processing language that Splunk has. And I quickly fell in love it. I was like, "Wow, this is great. I can do some amazing things that I could never do with my data before from the UNIX prompt. I can now do that in Splunk. And take this, you know, basically to a whole new level." And that's essentially what excited me, you know, about the product. It was basically the richness of that search processing language.
show, let's see, the most frequently – you know, the most – the top 10 accounts for example that are using the system.Patrick Farrell: I see the most frequently invoked operations and their accounts throughout the day. I see the longest running operations. I see the database SQL contention, whether or not there are any transaction timeouts for example at the database level. What else? I see the overall number of business and system exceptions individually, of course, so that way I can – technically in production, we shouldn't have any what are called the business exceptions. Those are what we consider business role validations or violations, and those should be caught during testing. So, I should be able to look at this graph or this – what we have is a radio gauge. I should be able to see it basically at zero all the time. And if I don't see that at zero, then we have defects that are there in production. The question is how are those defects – you know, where are those defects? Are they on the service layer? Are they on the front end? Is it a front end validation that failed and the service layer caught it? That type of thing. So – then of course, there are system exceptions which are completely unexpected. You know, cases that the system caught. So, I see those. I also see which users are having the most difficulty with the system. A Additionally, I had a couple other notes. On the business side of things, I also track the type of transactions that are being executed on the system and (when) throughout the day they are executed. I track how certain pieces of functionality are being used. For example, a given report may have, let's say, five inputs or one – the report may have one particular input that has five options, and maybe the users are only using two of those options. And so, we're supporting functionality for three others that really nobody is using. 
So, that knowledge gives us the ability to go back and say, "Well, if nobody's using this functionality over the course of a month, two months, three months, et cetera, or more, do we really need to keep this functionality? Should we retire it?"Patrick Farrell: ... the more (I thought) about it, I was like, "Ooh, OK. So I can extract the execution time field." I'm like, "Wow, that's really useful. So now, I could kind of do aggregated searches across all my (boxes)." And then I though about it some more, I was like, "Ooh, OK." So now, I've got all these cool things like subsearches and, you know, this rich query or this rich search language, the search processing language that Splunk has. And I quickly fell in love it. I was like, "Wow, this is great. I can do some amazing things that I could never do with my data before from the UNIX prompt. I can now do that in Splunk. And take this, you know, basically to a whole new level." And that's essentially what excited me, you know, about the product. It was basically the richness of that search processing language.
show, let's see, the most frequently – you know, the most – the top 10 accounts for example that are using the system.Patrick Farrell: I see the most frequently invoked operations and their accounts throughout the day. I see the longest running operations. I see the database SQL contention, whether or not there are any transaction timeouts for example at the database level. What else? I see the overall number of business and system exceptions individually, of course, so that way I can – technically in production, we shouldn't have any what are called the business exceptions. Those are what we consider business role validations or violations, and those should be caught during testing. So, I should be able to look at this graph or this – what we have is a radio gauge. I should be able to see it basically at zero all the time. And if I don't see that at zero, then we have defects that are there in production. The question is how are those defects – you know, where are those defects? Are they on the service layer? Are they on the front end? Is it a front end validation that failed and the service layer caught it? That type of thing. So – then of course, there are system exceptions which are completely unexpected. You know, cases that the system caught. So, I see those. I also see which users are having the most difficulty with the system. A Additionally, I had a couple other notes. On the business side of things, I also track the type of transactions that are being executed on the system and (when) throughout the day they are executed. I track how certain pieces of functionality are being used. For example, a given report may have, let's say, five inputs or one – the report may have one particular input that has five options, and maybe the users are only using two of those options. And so, we're supporting functionality for three others that really nobody is using. 
So, that knowledge gives us the ability to go back and say, "Well, if nobody's using this functionality over the course of a month, two months, three months, et cetera, or more, do we really need to keep this functionality? Should we retire it?"Patrick Farrell: ... the more (I thought) about it, I was like, "Ooh, OK. So I can extract the execution time field." I'm like, "Wow, that's really useful. So now, I could kind of do aggregated searches across all my (boxes)." And then I though about it some more, I was like, "Ooh, OK." So now, I've got all these cool things like subsearches and, you know, this rich query or this rich search language, the search processing language that Splunk has. And I quickly fell in love it. I was like, "Wow, this is great. I can do some amazing things that I could never do with my data before from the UNIX prompt. I can now do that in Splunk. And take this, you know, basically to a whole new level." And that's essentially what excited me, you know, about the product. It was basically the richness of that search processing language.
I also see which users are having the most difficulty with the system. And so, the idea that the … Correct. Those customers that are currently on the site who are experiencing difficulties. Maybe they're presented with a "contact the help desk" type of message, and I'll see that. With that information, our goal is that maybe we can make a proactive attempt to contact the customer. We haven't gone this far yet, but for example, we could automatically pop up a message on the customer's screen saying, "Hey, would you like to speak with customer support about the issue you're experiencing?" That kind of push to the customer says, "Look, we're here for you. Come take advantage of the opportunity to speak with us about the issue you're experiencing."
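A search backing that proactive-support idea could be sketched as follows; the message text, threshold, and field names are hypothetical:

```
index=app_logs "contact the help desk"
| stats count AS failures latest(_time) AS last_seen BY user
| where failures > 3
```

Saved as a scheduled alert, a search like this could notify the support team, or eventually trigger the on-screen outreach message, whenever a user repeatedly hits the help-desk page.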
I would say the biggest business impact is the ability to identify issues in a complex environment quickly, which reduces outage time. That's probably our biggest benefit, because as a large e-commerce application doing as much business as we do on a daily basis, you don't really want to be down for long. Every second you're down is orders you're not receiving, and those customers will be happy to take their business somewhere else. So you really want to get your systems running: you want to identify problems quickly and get them resolved so that you're not alienating your customer base.