DATA LAKES ON AWS: GOOD, FAST, AND INEXPENSIVE
WELLINGTON AWS USER GROUP
Photo: Frank Kovalchek, http://www.flickr.com/people/72213316@N00
VOLUME, VARIETY, AND VELOCITY
Photo: Frank Kovalchek, http://www.flickr.com/people/72213316@N00
WHAT IS A DATA LAKE?
▸ A network file share full of spreadsheets is a (bad) data lake
▸ Focused on making it easy to collect large amounts of data
▸ A place to store data in its natural format for future analysis
▸ Instead of Big Design Up Front (BDUF), shifts governance right to remove barriers and empower users.
▸ Accepts in principle slightly inefficient compute costs in exchange for more productive data scientists.
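Storing data "in its natural format for future analysis" usually means landing raw records untouched and deciding on a schema only when the data is read. The sketch below assumes Hive-style `key=value` partition folders (a layout Athena and Hive can use to skip irrelevant data) and builds an S3 object key for a raw batch without transforming the payload. The helper names, the `source`/`dt` partition names and the bucket are hypothetical, chosen only for illustration.

```python
import json
from datetime import datetime, timezone


def raw_object_key(source: str, event_time: datetime, batch_id: int,
                   prefix: str = "raw") -> str:
    """Build a Hive-style partitioned key such as raw/source=web/dt=2017-05-01/part-00007.json.

    Expects a timezone-aware datetime. The 'source' and 'dt' partition
    names are illustrative assumptions, not a required convention.
    """
    dt = event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"{prefix}/source={source}/dt={dt}/part-{batch_id:05d}.json"


def to_json_lines(records: list[dict]) -> bytes:
    """Serialise records unchanged as newline-delimited JSON (no up-front schema)."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records).encode("utf-8")


if __name__ == "__main__":
    key = raw_object_key("web", datetime(2017, 5, 1, 9, 30, tzinfo=timezone.utc), 7)
    # Records with different shapes can land side by side; structure is applied on read.
    body = to_json_lines([{"user": "a", "clicks": 3}, {"user": "b", "extra": [1, 2]}])
    print(f"s3://example-bucket/{key}")  # hypothetical bucket name
    print(body.decode("utf-8"), end="")
    # An upload could then use boto3, e.g.:
    # boto3.client("s3").put_object(Bucket="example-bucket", Key=key, Body=body)
```

Keeping the payload as-is defers modelling decisions until someone has a question to answer, which is the trade-off the slide describes.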
2-MINUTE DATA LAKE DEMONSTRATION
Photo: Tim Evanson, https://www.flickr.com/photos/timevanson/
DATA WAREHOUSE PROBLEMS SOLVED BY S3
▸ Dropbox’s distributed storage system, as discussed on IEEE Software Engineering Radio
(masses of people, enormous capital, long timeframe)
▸ Running out of space, capacity planning
▸ Slow hardware, unable to drink from the firehose
▸ Significant developer cost and delay before data can be analyzed to determine if it is valuable.

DATA LAKE PROBLEMS SOLVED BY ATHENA
▸ Elastic scalability
▸ High availability
▸ Coupling storage to compute (HDFS)
▸ Hosting and admin cost of running EMR clusters
▸ No need to run your own data dictionary (Hive metastore)
and keep it highly available across cluster outages.
▸ No need to run your own security (Apache Ranger)
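Athena queries files where they already sit in S3, with table definitions kept in a managed catalog instead of a self-hosted Hive metastore. As a minimal sketch, the hypothetical helper below renders Hive-style `CREATE EXTERNAL TABLE` DDL for Parquet files partitioned by a `dt` column. The database, table, column names and S3 location are assumptions for illustration; the function only builds the statement and executes nothing.

```python
def external_table_ddl(database: str, table: str, columns: dict[str, str],
                       location: str, partition_col: str = "dt") -> str:
    """Render Hive-style DDL for a Parquet-backed, partitioned external table.

    All names passed in are illustrative; no query is executed here.
    """
    if not location.startswith("s3://"):
        raise ValueError("location must be an s3:// URI")
    cols = ",\n".join(f"  `{name}` {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{cols}\n"
        f")\n"
        f"PARTITIONED BY (`{partition_col}` string)\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location.rstrip('/')}/'"
    )


if __name__ == "__main__":
    ddl = external_table_ddl("lake", "clicks", {"user": "string", "clicks": "int"},
                             "s3://example-bucket/curated/clicks")
    print(ddl)
    # Submitting it could use boto3's Athena client, e.g.:
    # boto3.client("athena").start_query_execution(
    #     QueryString=ddl,
    #     ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"})
```

Because the table is external, dropping it leaves the S3 data in place. New `dt=` folders still have to be registered before they show up in queries, for example with `MSCK REPAIR TABLE lake.clicks` or `ALTER TABLE ... ADD PARTITION`.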
BUSINESS INTELLIGENCE PROBLEMS SOLVED BY QUICKSIGHT
▸ Performance at scale
▸ High Availability
▸ Hosting and admin cost of running servers
COMPETITORS
▸ Azure has similar offerings
▸ Power BI is good
▸ Azure Data Lake Analytics differences:
  ▸ Not elastic
  ▸ No optimized storage formats such as ORC or Parquet
  ▸ Uses an HDFS service, not Blob storage
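Optimized formats such as ORC and Parquet are columnar: each column is stored contiguously, so a query engine can read only the columns a query references, and Athena bills by the amount of data scanned. The toy model below is a rough sketch under loudly stated assumptions (uncompressed, fixed-width columns, no file metadata); real ORC/Parquet files add compression and encodings, so actual savings will differ. The schema and row count are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    width_bytes: int  # assumed fixed width, uncompressed


def bytes_scanned(columns: list[Column], rows: int, needed: set[str],
                  columnar: bool) -> int:
    """Toy estimate of bytes read for a query touching only the `needed` columns.

    Row layout: every row is read in full. Columnar layout: only the
    referenced columns are read. Ignores compression and file metadata.
    """
    unknown = needed - {c.name for c in columns}
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    if columnar:
        return rows * sum(c.width_bytes for c in columns if c.name in needed)
    return rows * sum(c.width_bytes for c in columns)


if __name__ == "__main__":
    # Hypothetical clickstream schema: two narrow columns, two wide ones.
    schema = [Column("user_id", 8), Column("ts", 8), Column("url", 200), Column("agent", 150)]
    n = 1_000_000
    row = bytes_scanned(schema, n, {"user_id", "ts"}, columnar=False)
    col = bytes_scanned(schema, n, {"user_id", "ts"}, columnar=True)
    print(row, col, f"{row / col:.1f}x less data")
```

The gap grows with the number of wide columns a query does not touch, which is why the absence of a columnar option is listed as a drawback.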
VISUALISATION DEMONSTRATION
Photo: Geo Swan, https://commons.wikimedia.org/wiki/User:Geo_Swan
UNEVEN COMPARISONS
VS
▸ On-premises performance will start slower and scale poorly
▸ AWS Enterprise Support vs. ticket logging
▸ High availability, disaster recovery, and backup costs included
▸ On-premises costs escalate rapidly with scale:
~$1,000,000,000 per petabyte every year
TRUE COSTS OF SERVERS
▸ Servers aren’t being patched
▸ Servers aren’t natively highly available
▸ Server backups need to be configured, and can be misconfigured
▸ Server configuration slows down development
▸ Server performance suffers before scaling
Photo: Micheal Filion, https://www.flickr.com/photos/mike9alive/
ARCHITECTURE: DATA INGESTION
OPERATIONAL ANALYTICS
TRACKING PERFORMANCE
RESEARCH
EXTENDED DATA LAKE
THE “PROJECT MANAGEMENT TRIANGLE”
Photo: Kevin Lim, https://www.flickr.com/photos/inju/
VICTIMS OF THE SYSTEM
CULTURE, AUTOMATION, LEAN, MEASUREMENT
© BrokenSphere / Wikimedia Commons
▸ No need to be tool specialists: people can focus elsewhere
▸ Tool “automates” the hard part of the task
▸ Tool only does the part of the job that has value
▸ Transparency: everyone can see the results
NO ONE WANTS A DRILL
▸ This presentation is about tools, but people want outcomes.
▸ Knowing your tools is good; making them the focus of your work is wrong.
▸ Providing value with a data lake is about asking the important
questions and answering those questions accurately.
▸ I strongly recommend asking the correct question over using
the correct tool.
▸ Thinking with Data by Max Shron
Photo: United States Marine Corps.
STEVEN ENSSLEN - AUTOMATION FOR BUSINESS INTELLIGENCE
▸ AWS Certified Solutions Architect - Professional
▸ Big data and business intelligence consulting

▸ http://stevenensslen.com
▸ steven@stevenensslen.com
