Garbage In
Rainbows Out
Zach Briggs
New Developer
7 Years in Data Analytics




Mike Fidler
Systems Security Specialist
Ex Geologist
His Unix experience is old enough to drink
Amateur Inventor
Validates
This form contains one error.
Screenshot of column view
Screenshot of serial view
Application
Raw Data
            Database
Continuous   Application
Raw Data    Process      Database
Coffee
Why your coffee is shit.
Anything but drip
Thank You

Zach - briggszj@gmail.com
@theotherzach
Title of Record



Mike - rockmastermike@gmail.com
@rockmastermike
Unix Neck Beard
Available for hire

Garbage in, rainbows out

Editor's notes

  1. I wanted to call this talk “dirty inputs.”
  2.
  3. Gatekeeper: unexpected inputs fail and get pushed back to the user.
  4. Fault-tolerant systems. Model validations are the most obvious form of cleansing. They are the gatekeepers.
  5. How about bulk records? I'm using CSV uploads as my example, but it could be any source: consuming external JSON, sharing databases, anything outside of the black rectangle. What’s the downside to relying on validation when we get garbage? Best case is it fails, and we ask the user to fix their stuff and try again. Allow half-fails? “Fix lines X, Y and Z?”
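The “half-fail” idea can be sketched roughly as follows. This is a minimal illustration, not code from the talk: the `name` and `zip` columns, the validation rules, and the function names are all assumptions made up for the example.

```python
import csv
import io


def validate_row(row):
    """Return a list of problems found in one record (empty list means valid)."""
    problems = []
    # The column names and rules below are hypothetical.
    if not (row.get("name") or "").strip():
        problems.append("name is blank")
    if not (row.get("zip") or "").strip().isdigit():
        problems.append("zip is not numeric")
    return problems


def split_valid_invalid(csv_text):
    """Accept the good rows and report the bad ones by line number."""
    accepted, rejected = [], {}
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        problems = validate_row(row)
        if problems:
            rejected[line_no] = problems
        else:
            accepted.append(row)
    return accepted, rejected


sample = "name,zip\nAda,46201\n,46202\nGrace,4620x\n"
accepted, rejected = split_valid_invalid(sample)
```

Here `rejected` maps line numbers to problems, which is enough to tell the user “fix lines 3 and 4” while the valid rows go through.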
  6. Super basic example. Unravels a CSV file, turning a potentially wide table into a long one.
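One way to picture the unravelling step is the sketch below. The slide's actual code is not in the transcript, so the function name `unravel`, the `id`/`field`/`value` output shape, and the sample data are assumptions.

```python
import csv
import io


def unravel(csv_text, id_column):
    """Turn each non-id cell of a wide CSV into its own (id, field, value) record."""
    reader = csv.DictReader(io.StringIO(csv_text))
    long_rows = []
    for row in reader:
        for field, value in row.items():
            if field == id_column:
                continue
            long_rows.append({"id": row[id_column], "field": field, "value": value})
    return long_rows


# Made-up sample: one row per store, one column per month.
wide = "store,jan,feb\nA,10,12\nB,7,\n"
long_rows = unravel(wide, "store")
```

Every cell becomes one record, so a blank or malformed cell only affects its own record instead of sinking the whole row.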
  7. Typical data grid, once again from any source.
  8. And now we have a stream of data. That allows for more graceful failures. Since the entire input is in the system, we can prompt the user to fix the errors or devise filters to do it automatically. Is it possible we would get better filters in the future? Better methods of cleaning the data? I’m sure none of you have ever seen a database where the columns were shifted by 1 because of a boneheaded mistake that happened 2 months ago. Me neither.
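A rough sketch of why keeping the raw stream helps: when a better filter shows up later, the same raw records can simply be cleaned again. The filter names, the swap heuristic, and the sample records are hypothetical.

```python
def strip_whitespace(record):
    """Filter: trim stray whitespace from every value."""
    return {key: value.strip() for key, value in record.items()}


def fix_shifted_columns(record):
    """Hypothetical repair: if 'city' holds digits and 'zip' does not, swap them."""
    if record["city"].isdigit() and not record["zip"].isdigit():
        return {**record, "city": record["zip"], "zip": record["city"]}
    return record


def clean(raw_records, filters):
    """Run every filter over every raw record; the raw records are never modified."""
    cleaned = []
    for record in raw_records:
        for apply_filter in filters:
            record = apply_filter(record)
        cleaned.append(record)
    return cleaned


# Made-up raw input; the second record has its columns shifted.
raw = [
    {"city": " Indianapolis ", "zip": "46201"},
    {"city": "46202", "zip": "Carmel"},
]

first_pass = clean(raw, [strip_whitespace])
# Months later a better filter exists, so the same raw data is reprocessed.
second_pass = clean(raw, [strip_whitespace, fix_shifted_columns])
```

Because each filter returns a new dictionary, `raw` stays untouched and can be replayed through any future set of filters.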
  9. The schemaless store is just the landing area for the data, which is moved into our database in batches. The stream could be MongoDB, SQLite, or cave drawings with a webcam where your OCR software processes them into something usable. It doesn’t matter.
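A minimal sketch of moving records from a landing area into a database in batches, assuming the landing area is a list of JSON lines and the target is an in-memory SQLite table. The `places` table, its columns, and the batch size are invented for illustration.

```python
import json
import sqlite3


def flush(batch, connection):
    """Write one batch inside a single transaction and report how many rows it held."""
    with connection:
        connection.executemany(
            "INSERT OR REPLACE INTO places (zip, city) VALUES (?, ?)", batch
        )
    return len(batch)


def load_in_batches(landing_lines, connection, batch_size=2):
    """Move JSON lines from a schemaless landing area into a typed table in batches."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS places (zip TEXT PRIMARY KEY, city TEXT)"
    )
    batch, loaded = [], 0
    for line in landing_lines:
        record = json.loads(line)
        batch.append((record["zip"], record["city"]))
        if len(batch) == batch_size:
            loaded += flush(batch, connection)
            batch = []
    if batch:  # leftover records that did not fill a whole batch
        loaded += flush(batch, connection)
    return loaded


# Made-up landing data.
landing = [
    '{"zip": "46201", "city": "Indianapolis"}',
    '{"zip": "46032", "city": "Carmel"}',
    '{"zip": "47401", "city": "Bloomington"}',
]
connection = sqlite3.connect(":memory:")
loaded = load_in_batches(landing, connection)
row_count = connection.execute("SELECT COUNT(*) FROM places").fetchone()[0]
```

The loader only cares that each landed record can be parsed, which is the sense in which the choice of landing store doesn't matter.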
  10.
  11. What if it looked more like this? How many of you do fake deletes? Why? How is an update different from a delete? If we automate the input/filter process, why do it only once? Why throw out anything at all? How would that system be different? This is as far as I've gotten. Ish. That “All data” is a few hundred gigs in MySQL tables, and I have scripts that run when something updates. Add a ZIP and 56 minutes later it shows up in my Rails app.
  12. Nathan Marz had this idea first.
  13. How’s about this? Query is a function of all data. Capture is done in the rawest, most granular way possible, so speed wouldn’t be a consideration. Events rather than “stuff,” so it can be rewound to the beginning of time.
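“Query is a function of all data” can be illustrated with a tiny append-only event log. This is only a sketch of the idea under assumed names: the event types, fields, and sample log are made up.

```python
def current_zips(events):
    """Query as a pure function of the events: fold the log into the current view."""
    view = {}
    for event in events:
        if event["type"] == "zip_added":
            view[event["zip"]] = event["city"]
        elif event["type"] == "zip_removed":
            # A "delete" is just another event; nothing is erased from the log.
            view.pop(event["zip"], None)
    return view


# Made-up event log, oldest first.
log = [
    {"type": "zip_added", "zip": "46201", "city": "Indianapolis"},
    {"type": "zip_added", "zip": "46032", "city": "Carmel"},
    {"type": "zip_removed", "zip": "46201"},
]

now = current_zips(log)
# Rewinding is just running the same query over a shorter prefix of the log.
earlier = current_zips(log[:2])
```

Since the log itself never changes, any past state can be rebuilt by replaying a prefix of it.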
  14. What is coffee? It’s filthy ass water, that’s what it is. Coffeeologists (board certified ones) measure the quality of coffee using the same dimensions as clean drinking water. pH, dissolved solids, rat feces. The usual.
  15. Pre-ground grocery store beans have been sitting there for months and have lost their volatile flavor molecules. The drip machine sprays unfiltered water that is too hot into the center of the filter, over-extracting some grounds and leaving others under-extracted. The coffee hits the bottom of the hot glass carafe and is instantly burned. What about the coffee nerds here?
  16. Pour over fixes the water temp and the center over-extraction. Press pot goes further and allows for extraction fine-tuning.
  17. The issue is variables out of your control: bean age and water quality. Press pots can come close, but you’re brewing blind.
  18.
  19.