12. 1000 BC Era Math, Computation, Logic
12
We start to develop understanding through numbers
500 BC Pythagoras
300 BC Euclid
And how numbers can be used to compute data
250 BC Archimedes
100 BC Antikythera Mechanism
We start to classify objects and use logic to derive insights
350 BC Aristotle
13. 0 - 1600
13
78 Pliny : โallโ the worldโs knowledge captured
105 Paper : bulk recording of data
340 Codices : making data browseable with sections and indexes
1350 Nicole Oresme : turning data into picture
1453 Guttenberg : mass distribution of data
14. 1600 - 1900
14
1640 Napier : before logs there were logarithms
1662 Graunt : father of statistics
1796 Watt : recording data with a machine
1801 Jacquard : programming !!
1830 Babbage : the first mechanical and programmable computer
1844 Morse : data encodings
1850 Reuter : first โWANโ (the CSMA/CD was a bit messy)
1876 Dewey : data classification
15. 1900 - 2000
15
1930โs Fisher : modern statistics
1936 Turing : the universal computer
1950โs Programming Languages : Fortran et al
1962 Tomlinson : first standard for Geo Data
1963 ASCII : a standard for representing letters and numbers
1969 ARPANET and other Protocols
1970โs RDBMS : ETL , BI , Data Warehouses , I am your father
1970โs/ 80โs Personal Computing : the foundations for alot of todayโs data
1982 TCP/IP standardized and the Internet came to be
1989 The Web and HTML blink tags
1991 Unicode : all languages captured
16. To infinity and beyond
16
Web 2.0
Google
Social Networking
IOT
27. Data Stats Candy
27
Every day 2.5 quintillion bytes of data (1 followed by 18 zeros) are created
A full 90 percent of all the data in the world has been generated over the last two years
2.7 Zetabytes of data exist in the digital universe today.
Facebook stores, accesses, and analyzes 30+ Petabytes of user generated data
Akamai analyzes 75 million events per day to better target advertisements.
Decoding the human genome originally took 10 years to process; now it can be achieved in one
week.
Data production will be 44 times greater in 2020 than it was in 2009
35. New approaches to data platforms are needed
35
Traditional ETL / Data Warehouses = schema at write time
To cope with data today = schema at read time
A Data language , the new SQL
Platforms that support an ecosystem of developers , content creators , data knowledge sharing
Open and easily extensible to cope with a variety of data sources and use cases, API oriented
Elasticity
38. Platform for Machine Data
Any Machine Data
HA Indexes
and Storage
Search and
Investigation
Proactive
Monitoring
Operational
Visibility
Real-time
Business
Insights
Commodity
Servers
Online
Services Web
Services
Servers
Security GPS
Location
Storage
Desktops
Networks
Packaged
Applications
Custom
ApplicationsMessaging
Telecoms
Online
Shopping
Cart
Web
Clickstreams
Databases
Energy
Meters
Call Detail
Records
Smartphones
and Devices
RFID
39. Powerful Platform for Enterprise Developers
39
REST API
Build Splunk Apps Extend and Integrate Splunk
Simple XML
JavaScript
Django
Web
Framework
Java
JavaScript
Python
Ruby
C#
PHP
Data Models
Search Extensibility
Modular Inputs
SDKs
41. The Developer Opportunity in Data
41
Itโs fun to make cool things
Get a job , build a business , make money !
Promote yourself , Promote your company
Get involved in community projects
Do Good
Think of new data sources and tap into them
Democratize data
Discover new things & drive society forward
We talk alot about the how , what , where and who โฆ.. but what about the WHY
3 words to sum up what types of projects I am drawn to.
Heard all the sheep jokes , any original heckles are most welcome.
What am I talking about :Data Where it came from What it looks like todayHow are we going to do some useful things with itMarketing guys created itBut the underlying data , tools and technologies and fundamental core discoveries didnโt
Like hollywood , I like a good origin story
Shoutout to wolfram alpha.The invention of arithmetic provides a way to abstractly compute numbers of objects. The Ishango Bone, a baboon's fibula discovered in the 1970's, was a counting system, a list of prime numbers and even gives evidence of a multiplication tableImagine taking these to the bar ! No androids , iphones.
The Lascaux cave paintings record the first known narrative stories. Telling stories through visualization of eventsMike Bostock,D3 and the NewYork Times VizCSS had to come from somewhere
No good if they are just sitting on a wall.A central event in the emergence of civilization, written language provides a systematic way to record and transmit knowledge. Old Sumerian love poem. Oh would they be dissapointed in Bieber.
The first known calendar system is established, rounding the lunar month to 30 days to create a 360-day year. The โlike a boss Epochโ 1970 . Pfffft.
The Library at Thebes is the first known effort to gather and make many sources of knowledge available in one place. Inscription above the door โmedicine for the soulโ , havenโt heard people say that about folks trying to troubleshoot RDBMS scalability issues ๏
The Turin Papyrus is the first known topographic map. Geostats is going to be one of the most important โdimensions of dataโ in the future , and Iโll show a demo.
The Pythagoreans promote the idea that numbers can be used to systematically understand and compute aspects of nature, music, and the world. Euclid writes his Elements, systematically presenting theorems of geometry and arithmetic. Archimedes uses mathematics to create and understand technological devices and possibly builds gear-based, mechanical astronomical calculators. Antikythera Mechanism A gear-based device that survives today is created to compute calendrical computation. Aristotle tries to systematize knowledge, first, by classifying objects in the world, and second, by inventing the idea of logic as a way to formalize human reasoning. The father of OO and the if/then/else statement ? No disrespect to the Smalltalk team at Xerox PARC
Pliny creates an encyclopedia that claims to summarize all knowledge with references to its sources. Tsai Lun invents paper in China. French philosopher Nicole Oresme introduces the notion of drawing graphs of values. The printing press , Moveable type makes it economical to print many kinds of documents. Codices : A codex (Latincaudex for "trunk of a tree" or block of wood, book; plural codices) is a book made up of a number of sheets of paper, vellum, papyrus, or similar, with hand-written content,[1] usually stacked and bound by fixing one edge and with covers thicker than the sheets, but sometimes continuous and folded concertina-style. The alternative to paged codex format for a long document is the continuous scroll. Examples of folded codices are the Maya codices. Sometimes the term is used for a book-style format, including modern printed books but excluding folded books.Before codices came scrolls.
John Napier publishes the first tables of logarithms,fundamentialmathmatic tenets behind stats , geometry , cryptography etc..Leibniz promotes the idea of answering all human questions by converting them to a universal symbolic language, then applying logic using a machine. He also tries to organize the systematic collection of knowledge to use in such a system. Graunt and others start to systematically summarize demographic and economic data using statistical ideas based on mathematics. James Watt and John Southern create (but keep secret for 24 years) a device for automatically tracing variation of pressure with volume in a steam engine.The Jacquard loom weaves patterns specified by punched cards. Babbage constructed a mechanical computer to automate the creation of mathematical knowledge. Samuel Morse sends the first public telegraph message. Paul Julius Reuter uses pigeons to fly stock prices between Aachen and Brussels. Dewey invented the Dewey Decimal System for classifying the world's knowledge and specifying how to organize books in libraries. Indexing for performance, NOSQL and key value stores ideas came from somewhere.
Ronald Fisher and others lay the foundations for modern statistics. Turing, Bletchly Park, shows that any reasonable computation can be done by programming a fixed universal machineโand then speculated that such a machine could emulate the brain. Fortran, COBOL, and other early computer languages defines the concept of a precise formal representation for tasks to be performed by computers. Roger Tomlinson initiates the Canada Geographic Information System, creating the first GIS system.The first two nodes of what would become the ARPANET were interconnected between Leonard Kleinrock's Network Measurement Center at the UCLA's School of Engineering and Applied Science and Douglas Engelbart's NLS system at SRI International (SRI) in Menlo Park, California, on 29 October 1969.[1The Unicode standard assigns a numerical code to every glyph in every human language.
Where are we heading ?Where does it come from ?What does it look like ?Is it just stuff from computers ?Is it just humans and human creations that represent data ?
Log filesJVM JMXSNMPTapping the wire tcpdump, ngrep etcโฆAPIs (REST etcโฆ)
Data always been hereIf a bear shits in the woodsDoes data exist without someone there to observe it ?
Capturing data from the past
Data all around us , it may not necessarily look like data at first, until you think about it as data and how to capture it
Data is starting to permeate our daily lives and create a lot of opportunity for data driven solutionsAnd when you have the means to correlate data together , this can lead to many possibilitys
Even bearded hipsters can be useful.I aspire to have that amount of hairFitbit , humans are a source of data
A lot of data being createdA lot of data existsA lot more data will be created.We are analysing it and are getting more powerful at analysing it.
The data Vโs , useful for booth babing duty.VeracitySome data is inherently uncertain, for example: sentiment and truthfulness in humans; GPS sensors bouncing among the skyscrapers of Manhattan; weather condi- tions; economic factors; and the future. When dealing with these types of data, no amount of data cleansing can correct for it. Yet despite uncertainty, the data still contains valuable information. The need to acknowledge and embrace this uncertainty is a hallmark of big data.
Where are we heading ?Huge DataReally F***en astronomical dataNope
Just dataBut weโll be better at generating it more optima tallyData sources might increase , IOT etcโฆBut , volumes may taper out to a less exponential trajectoryYou need people who understand the data , to impart knowledge about the dataThat allows us to generate insightsAnd ultimately take actions , unless you just want to look at pretty charts all day.For me that is the data story.
Developers will be the kingmakers (to use a RedMonkism)
A few musings of several potential ones.
At Splunk, our mission is to make machine data accessible, usable and valuable to everyone. Andthis overarching mission is what drives our company and product priorities.
What is Splunk
Splunk is the leading platform for machine data analytics with over 5,200 organizations using Splunk (as of 7/1/13) โ from tens of GB to many tens of TBs of data PER DAY.Splunk software is optimized for real-time, low latency and interactivity.Splunk software reliably collects and indexes all the streaming data from IT systems and technology devices in real-time - tens of thousands of sources in unpredictable formats and types.The value from Splunking machine data is described as Operational Intelligence. This enables organizations to: 1. Find and fix problems dramatically faster2. Automatically monitor to identify issues, problems and attacks3. Gain end-to-end visibility to track and deliver on IT KPIs and make better-informed IT decisions4. Gain real-time insight from operational data to make better-informed business decisions
BUILD SPLUNK APPSThe Splunk Web Framework makes building a Splunk app looks and feels like building any modern web application. ย The Simple Dashboard Editor makes it easy to BUILD interactive dashboards and user workflows as well as add custom styling, behavior and visualizations. Simple XML is ideal for fast, lightweight app customization and building. Simple XML development requires minimal coding knowledge and is well-suited for Splunk power users in IT to get fast visualization and analytics from their machine data. Simple XML also lets the developer โescapeโ to HTML with one click to do more powerful customization and integration with JavaScript.ย Developers looking for more advanced functionality and capabilities can build Splunk apps from the ground up using popular, standards-based web technologies: JavaScript and Django. The Splunk Web Framework lets developers quickly create Splunk apps by using prebuilt components, styles, templates, and reusable samples as well as supporting the development of custom logic, interactions, components, and UI. Developers can choose to program their Splunk app using Simple XML, JavaScript or Django (or any combination thereof).EXTEND AND INTEGRATE SPLUNKSplunk Enterprise is a robust, fully-integrated platform that enables developers to INTEGRATE data and functionality from Splunk software into applications across the organization using Software Development Kits (SDKs) for Java, JavaScript, C#, Python, PHP and Ruby. These SDKs make it easier to code to the open REST API that sits on top of the Splunk Engine. With almost 200 endpoints, the REST API lets developers do programmatically what any end user can do in the UI and more. The Splunk SDKs include documentation, code samples, resources and tools to make it faster and more efficient to program against the Splunk REST API using constructs and syntax familiar to developers experienced with Java, Python, JavaScript, PHP, Ruby and C#. Developers can easily manage HTTP access, authentication and namespaces in just a few lines of code. ย Developers can use the Splunk SDKs to: - Run real-time searches and retrieve Splunk data from line-of-business systems like Customer Service applications - Integrate data and visualizations (charts, tables) from Splunk into BI tools and reporting dashboards- Build mobile applications with real-time KPI dashboards and alerts powered by Splunk - Log directly to Splunk from remote devices and applications via TCP, UDP and HTTP- Build customer-facing dashboards in your applications powered by user-specific data in Splunk - Manage a Splunk instance, including adding and removing users as well as creating data inputs from an application outside of Splunk- Programmatically extract data from Splunk for long-term data warehousingDevelopers can EXTEND the power of Splunk software with programmatic control over search commands, data sources and data enrichment. Splunk Enterprise offers search extensibility through: - Custom Search Commands - developers can add a custom search script (in Python) to Splunk to create own search commands. To build a search that runs recursively, developers need to make calls directly to the REST API- Scripted Lookups: developers can programmatically script lookups via Python.- Scripted Alerts: can trigger a shell script or batch file (we provide guidance for Python and PERL).- Search Macros: make chunks of a search reuseable in multiple places, including saved and ad hoc searches.ย ย Splunk also provides developers with other mechanisms to extend the power of the platform.-Data Models: allow developers to abstract away the search language syntax, making Splunk queries (and thus, functionality) more manageable and portable/shareable. - Modular Inputs: allow developers to extend Splunk to programmatically manage custom data input functionality via REST.
Social DataTwitterTwitter REST InputShow setup for qconlondon tags and searching over raw dataCreate on the fly dashboardCreate searchtime extraction and dashboardShow precanned full dashboardShow underlying HTML/JS sourceFoursquareFoursquare REST inputShow 3 pre canned searchesShow haversine search commandShow geostats and create simple map on the fly in a dashboardPublic Data / Open Data / Geo DataFirebase , SF Muni realtime transit dataNode Scripted InputShow dashboardInputlookup for data genShow html/js sourceMobile Device signal outages
New Apps that take advantage of all this new data and correlations and insightsMore data more accessible , the democritization of dataHelp people : More effcient ways to run your car your household etc..Socio economic dataCyberbullyingSplunk for good