20. Definition of SMALL DATA
“Connects people with timely,
meaningful information, organised and
packaged for human consumption”
- Allen Bonde
21. A better definition of SMALL DATA
“Connects people with timely, meaningful
information, organised and packaged for
human consumption and empowers them
to take action”
- Ben Foster
26. Turning Data into Information
• Translation – “What language is this?”
• Detection – “What just happened?”
• Relevance – “Are you talking to me?”
• Importance – “Do I care?”
• Information – “I can make a decision!”
• Action – “Don’t just stand there, do something!”
I’m always willing to listen to the opinions of minds greater than my own. Often that’s quite a wide field.
Informatica are pretty big players in the data and information world, so they should know a thing or two about the topic, and they have this to say.
“……”
I would agree with that statement… mostly.
The way I see it, we have a problem. You could say it’s a *big* problem.
If we want to get data into the hands of the right people, we need to consider what we are going to give them.
In the ever more connected world of the ‘Internet of Things’, data is being produced at exponential rates – just about everything is connected and everything produces data, and it’s getting worse.
So those ‘right people’ are going to be faced with a tsunami of data. It’s just going to be a huge towering wall of data coming at them.
“Drinking from the firehose” is often used to characterise the rate and volume of data available… I’m not sure that’s a good thing. It doesn’t sound like a lot of fun to me.
Put quite simply, there is too much data for anyone to cope with.
Our problems don’t stop there. Not only do we have masses of data being served up to us in a continuous stream, what we do get is the wrong sort of data, and it’s wrong in several ways:
Unsuitable Granularity – often each piece of data is too small to be of any significance on its own.
No Context – often we don’t know what business process, domain object or real ‘thing’ the data relates to. We don’t know when it was recorded. In short, we have no context.
No Relevance – if we have no context, it’s hard to know whether this data is relevant to us. It could be somebody else’s data; it could be something totally unrelated to what we care about. It’s hard to know!
Too hard to understand – it’s often in a very raw format and very hard to understand without prior knowledge and some translation work.
Not fit for human consumption – basically, this isn’t for humans. In this form, it’s just a wave of difficult-to-understand, difficult-to-interpret data.
The kind of data that the IoT and other connected systems provide us with is the wrong sort of data.
This data is very fine grained; it has no context, no relevance, no meaning.
It’s too hard to understand what it’s trying to tell us, and we certainly can’t deal with it at high speed and high volume.
In short, this data isn’t for humans.
Given that we have this slightly meaningless, hard-to-understand data, we need to spend time actually making it make sense.
This processing and interpretation of data takes time; sometimes it takes lots of time. The delay between raw data and useful data that humans can deal with means that the data loses its real value.
Being told why something happened yesterday isn’t much use, really. Sure, we can change processes after the fact and we can move thresholds but, ultimately, the situation has passed; the crisis has either happened or been averted, no thanks to data.
When we process this raw data to turn it into something that makes any kind of sense to a human, the processing takes time.
Often, in this processing time, the data loses its value.
Thinking critically about the situation, because we perhaps don’t really know what we want from our data, we have a temptation to ‘record everything and figure it out later’ with an added splash of ‘throw some computing power at it’.
However, when we really think about it, our approach of recording everything actually gives us nothing of any real value… sure, we have a LOT of ‘no real value’. But lots of nothing useful is still nothing useful.
When we are overloaded with data, the wrong kind of data, at the wrong time, it’s very difficult to come to any kind of conclusion about anything.
And when you can’t reach a conclusion, you will struggle to make any kind of decision.
And if we can’t make a decision, we probably won’t be able to DO very much of anything.
And after all, that’s what most businesses rely on employees doing… making decisions… preferably, informed decisions.
So, we need to tame the data deluge; we need to make sense of it all.
Big Data seems a likely candidate!
Big data is a term used to describe the collection, processing and availability of huge volumes of streaming data in real time.
- More data at your disposal!
- Faster!
- Remove the silos, break down the walls!
Three Vs are used to characterise this change in approach
“volume, velocity, variety” - Doug Laney
Combine the streams to identify:
Correlation
Causation
Statistically Valid Models
And make more accurate decisions.
OK, so we have a way forward. It sounds like we can use big data to solve our problems.
… I’m pretty sure the Ghostbusters said crossing the streams was a bad idea!
Retitle to ‘Pick a Tool.. Any tool!’ or ‘Choose your Weapon!’
Many Tools
Crossed Purposes
Unclear application
There are perhaps as many big data tools as there are articles about big data.
The array is dizzying. Too many to choose from.
No one size fits all: some are complementary, some don’t play well together, some have overlapping features.
So what should we do?
When things get complicated, we used to call a consultant; now we need a Data Scientist. All the cool kids are doing it: add a splash of predictive analytics and good things happen.
If all goes well, your big data project *might* bear fruit… “Leverage the hidden connections in your data for new competitive advantages”
… Maybe
In scientific analysis, running ALL the tests WILL allow you to find SOMETHING… but it probably wasn’t what you were looking for, and you don’t understand why it’s significant. Running ALL the tests is generally frowned upon.
“It is not enough to do your best, you must know what to do, and then do your best”.
Mention the Data Science Twitter account – it spews out masses of tweets about big data, data science and many related things all day, every day. It’s like it’s trying to become its own source of big data. Lots of noise, very little signal… and if these are the people who are supposed to champion Big Data… what hope do we actually have? There is so much hyperbole, and a lot of it is created by the people we look to for clarity!
@BigDataScience – 95 tweets in 24 hours. 29000 tweets over all time…what!!!!!!
If you DO get a Data Scientist, get a good one!
Many things are needed for a successful big data project.
Time to understand the problem, and the potential uses for your business. No one size fits all!
Money to build the infrastructure and deploy the tooling
Skills to make use of the tooling you deploy and apply it to your data
And actually… you probably don’t have enough data to call it true ‘Big Data’…
And this reminds me of the Big Data strategy of many companies.
“Get the data and we’ll figure out what to do with it later.”
“Right Tools, Right Skills… what was the question again?”
Big Data is overhyped
It’s confusing
It’s generating more data than we know what to do with
It’s about tech and machines, not people and doing
Right now, it’s abstract, not practical
It can tell you WHY something is happening, but it’s usually after the fact. And for most people, after the fact is near useless. Most businesses are not that mature
Maybe it’s time to take a breath and think about an alternative…
In many ways, Small data is better defined by what we don’t want.
I don’t WANT “everything”.
I don’t want every possible piece of data about every conceivable thing.
I’m not dealing with “might be interesting”, or “may look at it later” or “could be related”.
My questions are much more specific – I’m not really concerned with hypothesis.
I don’t WANT to “interpret”.
I don’t want to go dot to dot
I don’t want to infer context, I don’t want to guess about what is happening
I WANT bigger bits of USEFUL data
I want it digestible
I want it readable
I want it meaningful
I WANT to focus on WHAT, NOT why.
Immediacy is important
I’m focussed on what is happening right now, I don’t want to consider why things are happening, at least not yet.
There is much more value in knowing where I am right now rather than in understanding how I got here and where I might be going in the long term.
I want to concentrate on the things that are important to me, the things that will support my decision making, now.
In actual fact, we want INFORMATION, not data.
We want to create a drinking fountain instead of a fire hose.
We want to reduce the pressure.
We want to control what we drink and how often we drink it.
We want it accessible, timely, manageable and meaningful.
Small Data, as maybe you would hope, started out as a couple of principles and practices that we found emerging as we worked on a couple of data related projects.
When we started, it didn’t have a name, it was just things we did that made our work more useful and our projects more meaningful.
After a while we started to look for some kind of formal definition because, you know, everyone likes to have a name for their ideas and everyone loves a bit of validation around the way they work.
After a bit of digging we found this definition by Allen Bonde, which seemed to fit the bill really nicely. It fits with a few things that I often mumble when I’m working.
… and of course, there is always that one guy that is never happy…
In my case, that guy, quite regularly, is Ben Foster.
When we are working, we often ask ourselves the question “So what?” We’ve got all this data, we’ve refined it, we’ve turned it into information. Now what? What’s the point?
The point, as Ben so ably pointed out when I showed him the first draft of this slide, is that information is only useful if you do something with it. So we extended the definition a little.
Being empowered to take action is very important to us. It’s the whole point actually.
Real-time
Tells us what is happening, now
Empowers us to make a difference, now
Traditional BI implementations have reduced the time it takes to ETL data, the time it takes to get from data to information, but they are still, essentially, ‘after the fact’ systems.
Big data originally sacrificed query speed for sheer richness of data. Query times are dropping all the time, even for ad-hoc queries, but there is still a way to go.
For many of us, we want to know about a significant business event ‘as it happens’. We need to know what has happened, when it has happened.
We need to know this so we can take action to exploit opportunities and tackle problems.
In this kind of business, 5 minutes ago is probably too late.
In a small data system we work to narrow the focus down to critical items, the things that we really need to know, so that we can see them happen in real-time with absolute clarity.
Sorting the wheat from the chaff
Narrowing the focus
Highlighting the important and ignoring the irrelevant.
Detect, Prioritise, Highlight
Being able to do the day job rather than spending time analysing and interpreting
Managing by exception
Meaningful
=======
More Signal, Less Noise
Understandable Information
Highlights Business Events
Includes Business Context
Relevant and Actionable
No interpretation needed
We are actively informed when things we say are important happen.
Each event contains enough business context for it to make sense to a human
Enough information is provided to allow us to take action on what it tells us.
Alerts and Notifications
Flexible Definitions of what is important
Presented for conclusions, not investigations
Quickly accessible by the people who need it, in a form they can understand, with enough supporting information for it to make sense quickly.
Organised & Packaged
Accessible
Contextual
Relevant
Packaged and Presented for Humans
Available when and where it’s needed
Intuitive
Less Analysis, More Action
Decide and do more
Move Faster
Prevent Problems
Close the decision loop faster
Decide… then DO!
Overview of the technical steps – Ben to discuss as he demos?
We will show one implementation but the process of translation, detection, filtering, prioritisation and action is generic and can be applied to any business.
Relevance – in many connected systems, we see data for other systems, which we can choose to take or ignore. We may have a better source, so we need to know if it’s relevant to us.
Importance – is the application of context. For example, we can have high-volume betting on popular events (FA Cup) but there IS a threshold that makes the volume important, even for that event.
Information – we put the bits together and present them as a human-readable piece of information – no digging – a straight-up lump of information, and we can make an informed decision.
Action – we have to do something about what we are told… otherwise, what’s the point?! The computer maybe can’t help us with this, but it can prompt us!
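The stages described in these notes could be sketched roughly as follows. This is a minimal illustration, not the system used in the demo: the feed line format, the source names, the thresholds and every function name are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical raw feed lines in the form "<source>|<event>|<bet_count>".
RAW_FEED = [
    "feedA|FA Cup Final|120",
    "feedB|FA Cup Final|9000",
    "feedA|Village Darts|15",
    "other-system|FA Cup Final|9000",
    "feedA|heartbeat|0",
]

TRUSTED_SOURCES = {"feedA", "feedB"}   # assumed: the only feeds meant for us
THRESHOLDS = {"FA Cup Final": 5000}    # assumed: popular events tolerate more volume
DEFAULT_THRESHOLD = 10                 # assumed limit for everything else


@dataclass
class BetVolume:
    source: str
    event: str
    bet_count: int


def translate(raw: str) -> BetVolume:
    """Translation: turn the raw line into a structured record."""
    source, event, count = raw.split("|")
    return BetVolume(source, event, int(count))


def detect(record: BetVolume) -> Optional[BetVolume]:
    """Detection: did something actually happen? Heartbeats are not events."""
    return None if record.event == "heartbeat" else record


def is_relevant(record: BetVolume) -> bool:
    """Relevance: ignore data that belongs to other systems."""
    return record.source in TRUSTED_SOURCES


def threshold_for(record: BetVolume) -> int:
    return THRESHOLDS.get(record.event, DEFAULT_THRESHOLD)


def is_important(record: BetVolume) -> bool:
    """Importance: apply context, here a per-event volume threshold."""
    return record.bet_count > threshold_for(record)


def to_information(record: BetVolume) -> str:
    """Information: one human-readable line that also prompts an action."""
    return (
        f"High betting volume on {record.event}: {record.bet_count} bets "
        f"(limit {threshold_for(record)}). Consider trading manually."
    )


def run_pipeline(feed: List[str]) -> List[str]:
    alerts = []
    for raw in feed:
        record = detect(translate(raw))
        if record is None or not is_relevant(record) or not is_important(record):
            continue  # everything dropped here is noise the human never sees
        alerts.append(to_information(record))
    return alerts


if __name__ == "__main__":
    for alert in run_pipeline(RAW_FEED):
        print(alert)
```

Of the five raw lines, only two reach a human. The action itself stays with the person; the sketch only prompts them.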
So, we’ve heard a lot of theory. Probably time to show this thing in action.
In the past, we’ve implemented ‘Small Data’ systems for a couple of international airports, a couple of international telecoms operators and one national rail infrastructure provider.
For reasons of international copyright I must say that what you are about to see… is none of those things….
The principles are the same, the technology is conceptually similar, but not the same
..and the names have been changed to protect the innocent…
High Betting Volume
High Betting Value
Marker player bet – multiple marker player bets and betting pattern can also be important.
Mention self exclusions.. Would be good to model, but they simply wouldn’t be allowed to place a bet. (When the Fun stops, stop)
In the modern bookmaking world an awful lot of ‘trading’ activity, as it’s known, is automated.
Data is supplied from a range of ‘feed providers’ who tell us what is happening at a sporting event in real time.
This data is translated and fed into an algorithmic model that determines the odds (also known as the prices) that we offer to customers.
These prices are then offered to sports books in a range of locations (shops in many countries, online and mobile) and customers can place their bets.
Most of the time, we let the computers do their thing and all is well with the world.
However, like most other businesses, we do prefer to make money rather than lose it. Our customers feel differently.
In some situations we would like our traders to override what the automated systems do:
If we have suspect betting behaviour such as high bet volumes or high bet amounts on any one event, we would want to trade manually
If we have significant amounts bet by ‘marker’ customers or more than one marker customer then we may want to trade manually to limit our exposure (Racing Post for example)
If we have simply taken a lot of legitimate bets and created a high liability for the company, we may want to trade manually, either altering the odds or suspending trading.
However, with some kind of sport happening in the world 24 hours a day and your average football match having approximately 200 ‘things’ you can bet on, keeping track of where the traders should focus their attention can be very difficult and the picture changes every second. This seems like a good candidate for Small Data.
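As a rough sketch of ‘managing by exception’ for the three situations above, the rules might look something like this. The field names and every limit are hypothetical values chosen for illustration; a real trading desk would set its own.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MarketSnapshot:
    market: str
    bet_count: int          # bets taken on this market in the current window
    total_staked: int       # total amount staked in the window
    marker_customers: int   # distinct 'marker' customers who have bet
    marker_staked: int      # amount staked by marker customers
    liability: int          # what the company pays out in the worst case


# Hypothetical limits, purely for illustration.
MAX_BET_COUNT = 500
MAX_TOTAL_STAKED = 50_000
MAX_MARKER_STAKED = 1_000
MAX_LIABILITY = 100_000


def reasons_to_trade_manually(snapshot: MarketSnapshot) -> List[str]:
    """Return the reasons a trader should look at this market, if any."""
    reasons = []
    if snapshot.bet_count > MAX_BET_COUNT:
        reasons.append("high betting volume")
    if snapshot.total_staked > MAX_TOTAL_STAKED:
        reasons.append("high betting value")
    # Either a significant marker stake or more than one marker customer.
    if snapshot.marker_staked > MAX_MARKER_STAKED or snapshot.marker_customers > 1:
        reasons.append("marker customer activity")
    if snapshot.liability > MAX_LIABILITY:
        reasons.append("high liability")
    return reasons


if __name__ == "__main__":
    busy = MarketSnapshot("Match Result", 900, 20_000, 2, 50, 500)
    print(busy.market, "->", reasons_to_trade_manually(busy))
```

A market that returns an empty list is left to the automated systems. Only the exceptions are put in front of a trader.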
It’s Mature Technology
You can make it happen today
Take baby steps, focus on the smallest thing that will deliver value to your business
It’s not Rocket (or Data) Science
Focus on Business Problems, not Technology Solutions
Go try it now!
Small Data and Big Data are not mutually exclusive
Small Data or Big Data is, in some regards, a question of maturity.
Collect the data you need *now* - grow it organically
Go Big… later, but I’m betting that will be much later.
You might want to consider a lambda architecture where data is processed according to need. There is a speed layer for analysing and presenting the things you absolutely must know *now* (although accuracy is compromised). There is a batch layer for things you don’t mind waiting for but that must be right and a serving layer that combines the speed and batch layers to present the most accurate, most up to date picture it can. It tries to be the best of both worlds but it has inherent complexity.
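A toy, in-memory sketch of the three layers may help to picture this. The class and method names are invented for the example, and a real lambda architecture would use separate batch and stream processing systems. In this toy the speed layer happens to be exact; in practice it usually trades accuracy for latency, as noted above.

```python
from collections import Counter
from typing import List, Tuple


class LambdaSketch:
    """Running total staked per market, served from batch and speed views."""

    def __init__(self) -> None:
        self.master: List[Tuple[str, int]] = []  # append-only log of all events
        self.batch_view: Counter = Counter()     # accurate, but only as fresh as the last batch run
        self.speed_view: Counter = Counter()     # events seen since the last batch run

    def ingest(self, market: str, stake: int) -> None:
        """New data goes to the master log and to the speed layer."""
        self.master.append((market, stake))
        self.speed_view[market] += stake

    def run_batch(self) -> None:
        """Batch layer: recompute the view from the whole log, then reset the speed layer."""
        view: Counter = Counter()
        for market, stake in self.master:
            view[market] += stake
        self.batch_view = view
        self.speed_view = Counter()

    def query(self, market: str) -> int:
        """Serving layer: combine the batch view with the recent speed view."""
        return self.batch_view[market] + self.speed_view[market]


if __name__ == "__main__":
    totals = LambdaSketch()
    totals.ingest("FA Cup Final", 10)
    totals.ingest("FA Cup Final", 5)
    totals.run_batch()
    totals.ingest("FA Cup Final", 1)   # arrives after the batch run
    print(totals.query("FA Cup Final"))
```

Even in this tiny form the cost is visible: the same aggregation logic lives in two places, which is the inherent complexity mentioned above.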
Earlier we saw the definition of small data. If we think a little bit bigger than a strict definition, I think there are some guiding principles for small data
and, taking a little inspiration from the Agile Manifesto, I think we can get close to some really good guiding principles.
We’re all people, we all want to think we can make a difference in the world.
By focussing on Information, not data, we can make a difference.
By focussing on interacting with people rather than slavishly following process, we can make a difference.
By spending our time on taking appropriate action, not performing deep analysis, we can make a decision.
Small Data can help you make informed decisions… try it.
Definition of Small Data
https://en.wikipedia.org/wiki/Small_data
Structure
======
A. Illustration of the Problem
-----------------------------------
Instead of introducing myself, provide a huge stream of data.
Who I am – DNA information
Where I’m From – Stream of GPS tracking data (journey to Belgium)
Where I’m Going – Stream of calendar information and GPS information about the coming weeks – meetings etc
What I do – Show a stream of code and build information
What I like – Stream of purchase information
*Maybe play that information as a sound at the start to get their attention*
Then ask – did you get that? All the data is there? Did you miss it? I could play it again?
So, despite having all the data, you missed the important stuff
Who I am
Where I’m From
Where I’m Going
What I do
What I like
You missed the *information* because you don’t have the skills, technology, time, money or, quite frankly, the motivation to do your own analysis
Where are we now – Data, Data Everywhere
----------------------------------------------------------
Stats about the number of devices, the amount of data (something nice and animated like the WW2 presentation) (they rarely fail due to technology)
The situation isn’t going to get any better… how fast can you analyse?
Big data tools are getting better but it’s an arms race.
Categorise data analysis, business intelligence and what we have.. What actually is that called??! https://www.promptcloud.com/blog/business-intelligence-Vs-data-analytics/
C. Big Data – Jam Tomorrow – Why it doesn’t work
--------------------------------------------------------------
It’s overhyped
It’s confusing
It’s generating more data than we know what to do with
It’s about tech and machines, not people and doing
It can tell you WHY something is happening, but it’s usually after the fact. And for most people, after the fact is near useless. Most businesses are not that mature
It’s abstract, not practical
(How many talks of big data in practical use vs talks about Big Data tech are there?) – we’re always talking, not doing.
Big Data appears to be generating its own big data – so many standards, so many products, so many articles, so much talk. But who is doing anything with it? The short answer is, some of the bigger companies are, but for your average company, Big Data is just a buzzword that has little practical impact.
(Smaller companies fall back on traditional BI, not ideal and usually backwards looking)
E. If not big data then what? - Small Data
--------------------------------------------------
Take Small Steps
Aim at the important stuff
Big Data later, potato
F. Just Do It – How to do it
--------------------------------
Demo – Flights, Football, Garden Centre (put our money where our mouth is… we need data and we need a problem and it needs to be dynamic)
Discuss the stages of the demo – use Data, Data Everywhere content – Acquire,
G. Summary
========
https://en.wikipedia.org/wiki/Small_data
"Small data connects people with timely, meaningful insights (derived from big data and/or “local” sources), organized and packaged
“Small Data is about people, Big data is about machines”
We have the means to do it now
You can make it happen
It’s not rocket science
It’s about business problems, not tech solutions
Doesn’t exclude Big Data later
Information, not Data
People, not Process
Action, not Analysis
Resources
======
https://www.promptcloud.com/blog/business-intelligence-Vs-data-analytics/