Data at Scale

Data problems and solutions with the
          connected world
Michael Peacock
Web Systems Developer
Telemetry Team
Smith Electric Vehicles

Lead Developer
Occasional conference speaker
Technical Author
• World's largest manufacturer of all-electric
  commercial vehicles
• Founded in 1920
• US facility opened 2009
• US buyout in 2011
Commercial electric vehicles?
Electric Vehicles
•   16,500 – 26,000 lbs gross vehicle weight
•   Commercial Electric Delivery Trucks
•   7,121 – 16,663 lbs payload
•   50 – 240km
•   Top Speed 80km/h
Electric Vehicles
• New, continually evolving, technology
• Viability evidence required
• Government research
EV Data
• Performance analysis and metrics
• Proving the technology: Government
  research
• Evaluating driver training conversions
• Diagnostics, Service and Warranty Issues
• Continuous Improvement
Current Status
• ~500 telemetry enabled vehicles
• Telemetry is now fitted as standard in our
  vehicles
• Our MySQL solution processes:
  – 1.5 billion inserts per day
  – Constant minimum of 4000 inserts per second
CANBus: 101
CANBus and Telemetry
• Sample the buses: once per second
• Only sample buses with useful
  performance and diagnostic information on
  them
Vehicle Data
• Drive train information:
  – Motor speed
  – Pedal positions
  – Temperatures
  – Fault Codes
• Battery information:
  – Current, Voltage & Power
  – Capacity
  – Temperatures
Connected World: The Problem
• Connected infrastructure
  – EV Charging stations
  – Utilities
• Home based telemetry
  – Smart Meters
  – Smart Homes
Our problem
• Hundreds of connected devices, each with
  numerous sensors giving us 2,500 pieces
  of data per second per vehicle
• Broadcast time we can’t plan for
• Vehicles rolling off the production line
• New requirements for more data
How it started
Issue 1: Availability
Issue 2: Capacity
Sometimes data is too
much to cope with




www.flickr.com/photos/eveofdiscovery/3149008295
Issue 2: Capacity
Option: Cloud Infrastructure
• Cloud based infrastructure gives:
  – More capacity
  – More failover
  – Higher availability
Cloud Infrastructure: Problem
• Huge volumes of MySQL inserts perform
  sub-optimally in virtualised
  environments
• Existing enterprise hardware investment
• Security and legal issues for us storing the
  data off-site
Cloud Infrastructure: Enabler
www.flickr.com/photos/gadl/89650415/in/photostream
AMQP
Advanced Message Queuing Protocol
Queuing
• Downtime
• Capacity
• Maintenance Windows
What if...
• Queuing allows us to cope with:
  – Downtime of our own systems
  – Capacity problems
• Queuing doesn't allow us to cope with:
  – An outage of a queuing infrastructure
Buffer




www.flickr.com/photos/brapps/403257780
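
The buffering idea can be sketched as a publisher that falls back to a local buffer whenever the queue infrastructure itself is down, draining the backlog once the broker returns. This is a minimal sketch, not the deck's actual implementation: the `publish` callable and its `ConnectionError` behaviour are illustrative stand-ins for a real AMQP client.

```python
from collections import deque

class BufferedPublisher:
    """Publish to a message queue, falling back to a local buffer
    when the queue infrastructure itself is unavailable."""

    def __init__(self, publish):
        self.publish = publish   # callable that raises ConnectionError on outage
        self.buffer = deque()    # local fallback buffer (disk-backed in practice)

    def send(self, message):
        self.flush()             # try to drain any backlog first
        try:
            self.publish(message)
        except ConnectionError:
            self.buffer.append(message)

    def flush(self):
        while self.buffer:
            try:
                self.publish(self.buffer[0])
            except ConnectionError:
                return           # broker still down; keep buffering
            self.buffer.popleft()
```

Messages sent during an outage accumulate locally and are replayed in order before any new message, so ordering is preserved per publisher.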
Cloud based infrastructure
• Use a Message Queue to ensure data is
  only processed when you have the
  resources to process it
SAN
• Backbone to most cloud-based systems
• Powers our MySQL solution
• Supports:
  – Huge volumes of data
  – Lots of processing
  – Fast connection to your servers
  – Backups and snapshots
SAN Tips
• When dealing with data on a huge scale,
  every aspect of your application and
  infrastructure needs to be optimised.
  This includes your SAN, something
  commonly overlooked.


• http://www.samlambert.com/2011/07/how-to-push-your-san-with-
  open-iscsi_13.html
New Architecture
Speed: Stream → Batch
• Streams of continuously flowing data can
  be difficult to process
• Turn the stream into small, quick batches

• MySQL: LOAD DATA INFILE
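
Turning the stream into small, quick batches might look like the sketch below: rows are grouped, written to a CSV file, and bulk-loaded. The batch size, file paths and table name are illustrative; the actual `LOAD DATA INFILE` statement is left as a comment since it needs a live MySQL connection.

```python
import csv
import os
import tempfile

def batch_stream(stream, batch_size=5000):
    """Group a continuous stream of rows into small, quick batches."""
    batch = []
    for row in stream:
        batch.append(row)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def write_batch(rows, directory):
    """Write one batch to a CSV file that MySQL can bulk-load with
    LOAD DATA INFILE, far faster than row-by-row INSERTs."""
    fd, path = tempfile.mkstemp(suffix=".csv", dir=directory)
    with os.fdopen(fd, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path

# Each file would then be loaded with something like:
#   LOAD DATA INFILE '/path/to/batch.csv'
#   INTO TABLE readings FIELDS TERMINATED BY ',';
```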
Shard 1: Hardware
• As the amount of data increased, we hit a
  huge performance problem. This was
  solved by sharding at a hardware level.
• Each data collection device was given its
  own database, which could be on any
  number of separate machines, with a
  single database acting as a registry
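
The registry pattern can be sketched as a lookup that resolves a device to its own database on whichever server hosts it. In practice the registry is itself a database queried with a SELECT; here it is just a dict, and all host, schema and key names are hypothetical.

```python
def shard_dsn(registry, device_id):
    """Resolve which server and schema hold a device's data,
    using the central registry database (modelled here as a dict)."""
    entry = registry[device_id]
    return "mysql://{host}/{database}".format(**entry)

registry = {
    "truck-0042": {"host": "db3.internal",
                   "database": "telemetry_truck_0042"},
}
```

Adding capacity then means pointing new devices' registry entries at a new machine; no existing data moves.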
Rationalisation & Extrapolation
• Remember the CANBus
  – Always telling us information, which we
    sample every second?
  – Do we always need that?
• Extrapolate and assume
Getting information from data
• Vehicle performance information involves:
  – Looking at 20 – 30 data points for each
    second of a vehicle's operation in a day
  – Analysing the data
  – Performing calculations, which vary
    depending on certain data points
• Getting this data was slow
  – How far did Customer A’s fleet travel last
    week?
Regular processing
• Instead of processing data on demand,
  process it regularly
• Nightly scheduled task to evaluate
  performance information
Regular Processing: Problems
You need to pull the data out faster than
            ever before!
Shard 2: Tables
• All our data has a timestamp associated
  with it
• Looking up data for a particular day was
  slow. Very slow.
• We sharded the data again, this time with
  a table per week within a vehicle's specific
  database
Sharding: Fallbacks and logic
• What about data before you implemented
  sharding?
• Which table do I need to look at?
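
The "which table?" logic, including the fallback for data that predates sharding, can be sketched as a resolver keyed on ISO week. The epoch date and table naming convention are illustrative, not the deck's actual scheme.

```python
from datetime import date

# Illustrative: the Monday on which weekly sharding went live.
SHARDING_EPOCH = date(2011, 1, 3)

def table_for(day):
    """Return the weekly shard table holding a given day's data,
    falling back to the pre-sharding legacy table for older data."""
    if day < SHARDING_EPOCH:
        return "readings_legacy"
    year, week, _ = day.isocalendar()
    return "readings_%d_w%02d" % (year, week)
```

A date-range query then becomes a loop over the (few) tables the range touches, rather than a scan of one enormous table.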
Aggregation
• With data segregated on a per vehicle and
  per week basis, lookups were much faster
• Performance calculations could be
  scheduled nightly, with a single record
  recorded for each vehicle for each day in a
  central database
• Allows for easy aggregation:
  – How far did my fleet travel last week?
  – How much energy did they use last month?
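
The nightly per-vehicle record can be sketched as a reduction of a day's per-second points into one summary row. The point shape and field names here are illustrative assumptions, not the real schema.

```python
def daily_summary(points):
    """Collapse one vehicle's day of per-second data points into the
    single record stored in the central database.

    Each point is (odometer_km, energy_kwh) — illustrative fields.
    """
    if not points:
        return {"distance_km": 0.0, "energy_kwh": 0.0}
    odometers = [p[0] for p in points]
    energies = [p[1] for p in points]
    return {
        "distance_km": max(odometers) - min(odometers),
        "energy_kwh": max(energies) - min(energies),
    }
```

Fleet-level questions ("how far did my fleet travel last week?") then reduce to a cheap SUM over these small summary rows instead of billions of raw points.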
Backups and Archives
• SAN backups and snapshots
• With date based sharding:
  – Dump a table
  – Copy it elsewhere
  – Drop it / Flush it (if archiving)
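
With date-based sharding the archive cycle is mechanical, so it can be scripted. A sketch of the command sequence — the host names, paths and use of `mysqldump`/`scp` are illustrative assumptions about tooling, not the deck's actual pipeline:

```python
def archive_commands(database, table, archive_host):
    """Build the shell commands to dump a weekly shard table,
    copy the dump off-box, and drop the table once archived."""
    dump = "%s.%s.sql.gz" % (database, table)
    return [
        "mysqldump %s %s | gzip > /tmp/%s" % (database, table, dump),
        "scp /tmp/%s %s:/archive/" % (dump, archive_host),
        'mysql %s -e "DROP TABLE %s"' % (database, table),
    ]
```

Because each week lives in its own table, the drop is instant; there is no costly `DELETE ... WHERE date < x` against a huge table.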
Outsource to the cloud
• Why waste resources doing things that
  cloud based services do better (where
  legal, security and privacy reasons allow?)

• Maps
• Email delivery
• Even phone integration
Data Type Optimization
• When prototyping a system and designing
  a database schema, it's easy to be sloppy
  with your data types and fields
• DON'T BE
• Use as little storage space as you can
  – Ensure the data type uses as little as you can
  – Use only the fields you need
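
At this insert volume, every wasted byte per row is measurable. A quick back-of-envelope, using the deck's own 1.5 billion rows per day (the column-type comparison is an illustrative example):

```python
def daily_overhead_gb(bytes_wasted_per_row, rows_per_day=1_500_000_000):
    """GB of storage wasted per day by oversized column types."""
    return bytes_wasted_per_row * rows_per_day / 1024 ** 3

# e.g. a BIGINT (8 bytes) where MEDIUMINT (3 bytes) would do wastes
# 5 bytes per row -- roughly 7 GB of extra storage every single day.
```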
Sharding: An excuse
• Sharding was a large project for us, and
  involved extensive re-architecting of the
  system.
• We had to make changes to every query
  in our code
• Gave us an excuse to:
  – Optimise the queries
  – Optimise the indexes
Query Optimization
• Run every query through EXPLAIN
  EXTENDED
• Check it hits the indexes
• Remove functions like CURDATE from
  queries, to ensure query cache is hit
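
MySQL's query cache is bypassed for any statement containing a non-deterministic function such as CURDATE() or NOW(), so computing the date in the application and interpolating a literal keeps identical requests cacheable. A sketch of the idea — the table and column names are illustrative:

```python
from datetime import date

def todays_distance_query(table, today=None):
    """Build a cache-friendly query: a literal date instead of CURDATE(),
    so repeated requests during the day can hit MySQL's query cache."""
    today = today or date.today()
    return (
        "SELECT SUM(distance_km) FROM %s "
        "WHERE reading_date = '%s'" % (table, today.isoformat())
    )
```

(In production the literal should be bound as a parameter rather than interpolated; it is inlined here only to show the cacheable shape of the statement.)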
Index Optimization
• Keep it small
• From our legacy days of one database on
  one server, we had a column that told us
  which vehicle the data related to
  – This was still there...as part of an
    index...despite the fact the application
    hadn’t required it for months
Live data: dashboard
Live data: Maps
Live data
• Original database design dictated:
  • Each type of data point required a separate
    query, sub-query or join to obtain
• Collection device and processing service
  dictated:
  • GPS Co-ordinates can be up to 6 separate
    data points, including: Longitude; Latitude;
    Altitude; Speed; Number of Satellites used to
    get location; Direction
Dashboards: Caching
• Don’t query if you don’t have to

• Cache what you can; access direct

• With message queuing it's possible to
  route messages to two or more places:
  one to be processed and another to
  display the latest information directly
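
That two-destination routing can be sketched as a simple fan-out: every message goes both to the processing queue and to a latest-value store the dashboard reads directly. In AMQP terms this would be a fanout exchange with two bound queues; the in-memory structures here are illustrative stand-ins.

```python
class FanOut:
    """Deliver every message to all subscribers: one queues it for
    batch processing, one keeps the latest value for the dashboard."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, message):
        for handler in self.subscribers:
            handler(message)

processing_queue = []   # stand-in for the real work queue
latest = {}             # latest reading per vehicle, read by the dashboard

bus = FanOut()
bus.subscribe(processing_queue.append)
bus.subscribe(lambda m: latest.update({m["vehicle"]: m}))
```

The dashboard never queries the database for "current speed"; it reads `latest`, while the full stream still reaches the processing side untouched.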
Exporting data: Group
• Where possible group exports and reports
  together by the same shard/table/index
Code considerations
• Race conditions
• Number of concurrent requests – group
  them
Application Quality
• When dealing with lots of data, quickly,
  you need to ensure:
  – You process it correctly
  – You can act fast if there is a bug
  – You can act fast when refactoring
Deployment
• When dealing with a stream of data, rolling
  out new code can mean pausing the
  processing work that is done
• Put deployment measures in place to
  make a deployment switch over
  instantaneous
Technical Tips
• Measure your application's performance,
  data throughput and so on
  – A data at scale problem itself
• Use as much RAM on your servers as is
  safe to do so
  – We give MySQL 80% of each DB server's
    100 – 140GB of RAM
What do we have now?
• Now we have a fast, stable, reliable system
• Pulling in millions of messages from a queue per
  day
• Decoding those messages into 1.5 billion data
  points per day
• Inserting 1.5 billion data points into MySQL per
  day
• Performance data generated, and grant
  authority reports exported daily
• More sleep at night than we used to get
Questions

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 

Data at Scale - Michael Peacock, Cloud Connect 2012

  • 1. Data at Scale Data problems and solutions with the connected world
  • 2. Michael Peacock Web Systems Developer Telemetry Team Smith Electric Vehicles Lead Developer Occasional conference speaker Technical Author
  • 3. • World's largest manufacturer of all-electric commercial vehicles • Founded in 1920 • US facility opened 2009 • US buyout in 2011
  • 5.
  • 6.
  • 7. Electric Vehicles • 16,500 – 26,000 lbs gross vehicle weight • Commercial Electric Delivery Trucks • 7,121 – 16,663 lbs payload • 50 – 240km • Top Speed 80km/h
  • 8. Electric Vehicles • New, continually evolving, technology • Viability evidence required • Government research
  • 9. EV Data • Performance analysis and metrics • Proving the technology: Government research • Evaluating driver training conversions • Diagnostics, Service and Warranty Issues • Continuous Improvement
  • 10.
  • 11.
  • 12. Current Status • ~500 telemetry enabled vehicles • Telemetry is now fitted as standard in our vehicles • Our MySQL solution processes: – 1.5 billion inserts per day – Constant minimum of 4000 inserts per second
  • 14. CANBus and Telemetry • Sample the buses: once per second • Only sample buses with useful performance and diagnostic information on them
  • 15.
  • 16. Vehicle Data • Drive train information: – Motor speed – Pedal positions – Temperatures – Fault Codes • Battery information: – Current, Voltage & Power – Capacity – Temperatures
  • 17. Connected World: The Problem • Connected infrastructure – EV Charging stations – Utilities • Home based telemetry – Smart Meters – Smart Homes
  • 18. Our problem • Hundreds of connected devices, each with numerous sensors giving us 2,500 pieces of data per second per vehicle • Broadcast time we can’t plan for • Vehicles rolling off the production line • New requirements for more data
  • 21. Issue 2: Capacity Sometimes data is too much to cope with www.flickr.com/photos/eveofdiscovery/3149008295
  • 23. Option: Cloud Infrastructure • Cloud based infrastructure gives: – More capacity – More failover – Higher availability
  • 24. Cloud Infrastructure: Problem • Huge volumes of data inserts into a MySQL solution: sub-optimal on virtualised environments • Existing enterprise hardware investment • Security and legal issues for us storing the data off-site
  • 29. What if... • Queuing allows us to cope with: – Downtime of our own systems – Capacity problems • Queuing doesn't allow us to cope with: – An outage of a queuing infrastructure
  • 31. Cloud based infrastructure • Use a Message Queue to ensure data is only processed when you have the resources to process it
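The "elastic" role the message queue plays between the vehicles and the processing service can be sketched as follows. This is a minimal stand-in using Python's stdlib `Queue`; in production this would be a hosted broker, and the message shape here is an assumption.

```python
from queue import Queue

def enqueue_burst(queue, messages):
    """Vehicles broadcast whenever they like; the queue absorbs the burst."""
    for msg in messages:
        queue.put(msg)

def drain(queue, batch_size):
    """The processing service pulls messages only when it has capacity."""
    batch = []
    while not queue.empty() and len(batch) < batch_size:
        batch.append(queue.get())
    return batch

broker = Queue()
enqueue_burst(broker, [{"vehicle": 1, "soc": 99}] * 10)
first = drain(broker, batch_size=4)   # process 4 now...
remaining = broker.qsize()            # ...the other 6 wait safely in the queue
```

The point of the pattern: producers and consumers never need matching speeds, so bursts from vehicles and maintenance windows on the processing side both become survivable.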
  • 32. SAN • Backbone to most cloud-based systems • Powers our MySQL solution • Supports: – Huge volumes of data – Lots of processing – Fast connection to your servers – Backups and snapshots
  • 33. SAN Tips • When dealing with data on a huge scale every aspect of your application and infrastructure needs to be optimised, this includes your SAN – something which is commonly overlooked. • http://www.samlambert.com/2011/07/how-to-push-your-san-with- open-iscsi_13.html
  • 35. Speed: Stream → Batch • Streams of continuously flowing data can be difficult to process • Turn the stream into small, quick batches • MySQL: LOAD DATA INFILE
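The stream-to-batch step above can be sketched like this: buffer rows, flush them to CSV, and bulk-load with `LOAD DATA INFILE` instead of issuing one INSERT per row. The column layout and table name here are illustrative assumptions, not the real schema.

```python
import csv
import io

def rows_to_csv(rows):
    """Flush a buffer of (vehicle_id, ts, signal, value) rows to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerows(rows)
    return buf.getvalue()

def load_statement(table, path):
    # One bulk load replaces thousands of single-row INSERTs.
    return (f"LOAD DATA INFILE '{path}' INTO TABLE {table} "
            "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'")

batch = [(42, "2012-02-13 09:00:01", "motor_speed", 1450),
         (42, "2012-02-13 09:00:02", "motor_speed", 1462)]
csv_text = rows_to_csv(batch)
sql = load_statement("telemetry_42_w07", "/tmp/batch.csv")
```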
  • 36. Shard 1: Hardware • As the amount of data increased, we hit a huge performance problem. This was solved by sharding at a hardware level. • Each data collection device was given its own database, which could be on any number of separate machines, with a single database acting as a registry
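The registry pattern from this slide — a central database mapping each collection device to the machine and schema holding its shard — can be sketched as below. Host and schema names are assumptions for illustration.

```python
# Central registry: device -> shard location. In production this is a
# single small database; here a dict stands in.
REGISTRY = {
    "device_001": {"host": "db1.internal", "schema": "telemetry_device_001"},
    "device_002": {"host": "db2.internal", "schema": "telemetry_device_002"},
}

def shard_for(device_id):
    """Look up which server and schema hold a device's data."""
    entry = REGISTRY.get(device_id)
    if entry is None:
        raise KeyError(f"device {device_id!r} not registered")
    return entry["host"], entry["schema"]

host, schema = shard_for("device_002")
```

Because the application only ever asks the registry, shards can be moved between machines by updating one row.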
  • 37. Rationalisation & Extrapolation • Remember the CANBus – Always telling us information, which we sample every second? – Do we always need that? • Extrapolate and assume
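The "sample on change, but at least once a minute" rule described here (and expanded in note 26 below) can be sketched as a small filter. The 60-second heartbeat matches the slide; the data shape is an assumption.

```python
def downsample(samples, heartbeat=60):
    """Keep a sample only when the value changes, or when `heartbeat`
    seconds have passed since the last stored sample.

    samples: list of (epoch_seconds, value), in time order."""
    kept = []
    last_ts = None
    last_value = object()  # sentinel: nothing stored yet
    for ts, value in samples:
        if value != last_value or (ts - last_ts) >= heartbeat:
            kept.append((ts, value))
            last_ts, last_value = ts, value
    return kept

# A battery sitting at 100% for a minute produces one stored row plus a
# heartbeat, not sixty rows; the drop to 99% is captured immediately.
stream = [(0, 100), (1, 100), (30, 100), (61, 100), (62, 99)]
stored = downsample(stream)
```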
  • 38. Getting information from data • Vehicle performance information involves: – Looking at 20 – 30 data points for each second of a vehicle's operation in a day – Analysing the data – Performing calculations, which vary depending on certain data points • Getting this data was slow – How far did Customer A’s fleet travel last week?
  • 39. Regular processing • Instead of processing data on demand, process it regularly • Nightly scheduled task to evaluate performance information
  • 40. Regular Processing: Problems You need to pull the data out faster and faster than before!
  • 41. Shard 2: Tables • All our data has a timestamp associated with it • Looking up data for a particular day was slow. Very slow. • We sharded the data again, this time with a table per week within a vehicle's specific database
  • 42.
  • 43. Sharding: Fallbacks and logic • What about data before you implemented sharding? • Which table do I need to look at?
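Both questions on this slide — which weekly table holds a given date, and what to do for data that predates sharding — can be sketched together. The table-naming scheme, legacy table name, and cut-over date below are hypothetical, chosen only to illustrate the fallback logic.

```python
from datetime import date

SHARDING_STARTED = date(2011, 6, 1)  # hypothetical cut-over date

def table_for(vehicle_id, day):
    """Route a lookup to the right table for this vehicle and date."""
    if day < SHARDING_STARTED:
        return "legacy_data"  # pre-shard data stayed in the original table
    year, week, _ = day.isocalendar()
    return f"data_{vehicle_id}_{year}w{week:02d}"  # one table per ISO week

old = table_for(7, date(2011, 1, 15))
new = table_for(7, date(2012, 2, 13))  # ISO week 7 of 2012
```

Centralising this routing in one function means every query in the codebase asks the same question the same way — which also made the sharding migration an auditable, single-point change.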
  • 44. Aggregation • With data segregated on a per vehicle and per week basis, lookups were much faster • Performance calculations could be scheduled nightly, with a single record recorded for each vehicle for each day in a central database • Allows for easy aggregation: – How far did my fleet travel last week? – How much energy did they use last month?
  • 45.
  • 46. Backups and Archives • SAN backups and snapshots • With date based sharding: – Dump a table – Copy it elsewhere – Drop it / Flush it (if archiving)
  • 47. Outsource to the cloud • Why waste resources doing things that cloud-based services do better (where legal, security and privacy reasons allow)? • Maps • Email delivery • Even phone integration
  • 48. Data Type Optimization • When prototyping a system and designing a database schema, it's easy to be sloppy with your data types and fields • DON'T BE • Use as little storage space as you can – Ensure the data type uses as little as you can – Use only the fields you need
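To make the slide's point concrete: MySQL's documented storage sizes for integer types mean that one over-wide column costs gigabytes per day at this insert rate. The example column (a 0–100 percentage) is illustrative.

```python
# MySQL's documented storage requirements for integer types, in bytes.
INT_SIZES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}

def daily_savings(rows_per_day, wide, narrow):
    """Bytes saved per day by narrowing one column from `wide` to `narrow`."""
    return rows_per_day * (INT_SIZES[wide] - INT_SIZES[narrow])

# A percentage (0-100) fits in TINYINT; storing it as INT wastes 3 bytes/row.
# At 1.5 billion rows/day that is 4.5 GB/day for a single column.
saved = daily_savings(1_500_000_000, "INT", "TINYINT")
```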
  • 49. Sharding: An excuse • Sharding was a large project for us, and involved extensive re-architecting of the system. • We had to make changes to every query we have in our code • Gave us an excuse to: – Optimise the queries – Optimise the indexes
  • 50. Query Optimization • Run every query through EXPLAIN EXTENDED • Check it hits the indexes • Remove functions like CURDATE from queries, to ensure query cache is hit
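The CURDATE point above can be sketched as follows: a function like CURDATE() makes the query non-deterministic, so MySQL's query cache refuses to cache it, whereas interpolating a literal date keeps the query text identical all day. The table and column names here are assumptions.

```python
from datetime import date

def daily_summary_query(vehicle_id, day=None):
    """Build the day's summary query with a literal date, not CURDATE()."""
    day = day or date.today()
    # Identical query text for every run within the day means repeated
    # executions can be served from the query cache.
    return (f"SELECT SUM(distance_m) FROM daily_summary "
            f"WHERE vehicle_id = {vehicle_id} AND day = '{day.isoformat()}'")

sql = daily_summary_query(42, date(2012, 2, 13))
```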
  • 51. Index Optimization • Keep it small • From our legacy days of one database on one server, we had a column that told us which vehicle the data related to – This was still there...as part of an index...despite the fact the application hadn’t required it for months
  • 54. Live data • Original database design dictated: • Each type of data point required a separate query, sub-query or join to obtain • Collection device and processing service dictated: • GPS Co-ordinates can be up to 6 separate data points, including: Longitude; Latitude; Altitude; Speed; Number of Satellites used to get location; Direction
  • 55. Dashboards: Caching • Don’t query if you don’t have to • Cache what you can; access direct • With message queuing it's possible to route messages to two or more places: one to be processed and another to display the latest information directly
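The dual-routing idea on this slide can be sketched like so: each incoming message goes both to the insert pipeline and into a latest-value cache that the live dashboard reads directly, so the dashboard never touches the big tables. The cache here is a dict standing in for whatever store is used in practice.

```python
latest = {}      # (vehicle_id, signal) -> newest value; the dashboard reads this
processed = []   # stands in for the database insert pipeline

def route(message):
    """Send one copy to the cache and one to the processing pipeline."""
    latest[(message["vehicle"], message["signal"])] = message["value"]
    processed.append(message)

route({"vehicle": 42, "signal": "soc", "value": 98})
route({"vehicle": 42, "signal": "soc", "value": 97})
current = latest[(42, "soc")]   # dashboard read: no database query needed
```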
  • 56. Exporting data: Group • Where possible group exports and reports together by the same shard/table/index
  • 57. Code considerations • Race conditions • Number of concurrent requests – group them
  • 58. Application Quality • When dealing with lots of data, quickly, you need to ensure: – You process it correctly – You can act fast if there is a bug – You can act fast when refactoring
  • 59. Deployment • When dealing with a stream of data, rolling out new code can mean pausing the processing work that is done • Put deployment measures in place to make a deployment switch over instantaneous
  • 60. Technical Tips • Measure your application's performance, data throughput and so on – A data at scale problem itself • Use as much RAM on your servers as is safe to do so – We give MySQL 80% of each DB server's 100 – 140GB
  • 61. What do we have now? • Now we have a fast, stable, reliable system • Pulling in millions of messages from a queue per day • Decoding those messages into 1.5 billion data points per day • Inserting 1.5 billion data points into MySQL per day • Performance data generated, and grant authority reports exported daily • More sleep on a night than we used to

Editor's notes

  1. Good morning, I’m here to talk about Data at Scale and the problems associated with receiving, processing and presenting data at scale in a connected world; in particular I’m going to use a case study of Electric Vehicle telematics from my own experiences of an extremely challenging, data intensive project. I’m going to be talking about the problem of data at scale both in terms of server resources and in terms of application design – because you need to be able to push data into your solution quickly, but you also need to process and export it quickly too.
  2. I’m Michael Peacock, the web systems developer on the telemetry team for Smith Electric Vehicles. We are a core team of three: myself, the only web developer on the project; a systems administrator who looks after our server infrastructure – and it's much of his work that I’ll be taking credit for today! – and a project manager. A very small team for a large amount of data – I’ll tell you exactly how much soon.
  3. Smith are the world's largest manufacturer of all-electric commercial vehicles. Founded over 90 years ago to build electric delivery vehicles – both battery based and cable based. In 2009 the company opened its doors in the US, and at the start of last year the US operation bought out the European company, which brings us to where we are today.
  4. When most people think of electric vehicles they tend to think of either hybrid vehicles or the likes of the Nissan Leaf or the Chevy Volt. When it comes to commercial electric vehicles, they think of the electric buggies in airports or, for any British members of the audience today, milk floats. However, we develop a different type of commercial vehicle:
  5. Large, fully electric, commercial delivery vehicles. Ranging from flatbed vehicles with military applications...
  6. Through to home delivery, depot delivery, utilities and school buses.
  7. These are 16 and a half thousand to 26 thousand pound delivery vehicles, capable of supporting up to a 16 thousand pound payload, with a top speed of 80km/h.
  8. As I’m sure you can appreciate, Electric Vehicles are a relatively new, and continually evolving technology. As a result the technologies are constantly being evaluated and improved.
  9. We use EV data to look at the performance analysis and metrics of the vehicle, to see how far, how fast and how efficiently the vehicles travelled; to prove the technology through research; ensuring that driver training to help drivers move from diesel vehicles to electric vehicles has been successful; to diagnose issues, help with service intervals and warranty issues, and of course help to continually improve the vehicles. The only way to prove the technology is to look at a large sample of data. The only way to look at specific routes, vehicles, performance, driver and service issues is to ensure the data is available on every vehicle. This means capturing a lot of data on the vehicles. Not only capturing the data – we must also effectively store the data, process the data, display the data and export it.
  10. We need to display real time information quickly to our users, so they can look and see in real time how a vehicle is functioning, where it is, where it is going and if it has raised any fault codes.
  11. Our users need to be able to look at detailed vehicle performance data over time, to see how much use a customer is getting from a vehicle or a fleet of vehicles, how well the vehicles are being driven, and pull out various other performance and driver style metrics.
  12. We currently have around 500 vehicles in service with telemetry installed and active. Each telemetry enabled vehicle collects between two and two and a half thousand data points on a per second basis while in drive mode, or on a per minute basis in charge mode. Telemetry is now a standard function of our vehicles, so going forward every new Smith vehicle will have it – which means we have a lot more data to process. As a result, our MySQL solution currently processes 1.5 billion inserts on a daily basis, with a constant minimum of 4000 inserts per second.
  13. As with many vehicles, we make use of a Controller Area Network to allow vehicle components to communicate with one another. The purpose behind a CANBus is to allow various nodes to communicate with one another without the need for a central host. Each component packages up a message which describes various aspects of its operation and sends these down the bus for the rest of the vehicle to pick up on. Because there isn’t a central host keeping an eye on things, the components don’t know who wants to know what when, so they constantly broadcast their information as messages onto the bus, hundreds of times per second.
  14. Obviously in a vehicle application, broadcasting and acting on data hundreds of times per second is essential. A driver wouldn’t want a delay when they apply the brake. From a monitoring and telematics perspective, this is somewhat excessive. As such, we only sample the CANBus once per second. Not all of the buses on the vehicle contain information which is relevant to performance, component analysis and diagnostics, and so they can be discounted.
  15. Mention about module level data on battery pods
  16. The problem that I am talking to you about today arises from the fact that we are becoming a more connected world. More and more devices are capturing data in larger and larger quantities, at greater frequencies. As passenger electric vehicles become more popular, a larger and larger charging infrastructure develops, with billing and usage data being collected; utilities companies collect data to monitor and evaluate their infrastructure, deal with issues, monitor electricity generation and route water supplies from reservoirs based on demand. Some homes now have smart meters which report billing information directly to the supplier, or which monitor and track energy consumption so that homeowners can save energy by turning appliances off. The connected world already gives us huge data problems. We are sure to get lots more over the years.
  17. When the project was first conceived, it had a single aim: to capture a small set of vehicle performance metrics for a small number of vehicles. Subsequently, the initial system design simply had the vehicles connecting directly to a single server for the information to be processed. This however causes two problems for us.
  18. The first of these issues was our availability. If our systems were down, vehicles couldn’t connect to us and deliver their data, and the data would be lost.
  19. The other problem is the capacity of our servers. With a large amount of data coming in, and a large number of collection devices giving us this data, we could find ourselves vulnerable to a Distributed Denial of Service attack that we ourselves authorised. This would lead to us being unable to process some or all data, some data being lost, and potentially, downtime. As more and more vehicles are used more and more regularly, our servers will run the risk of catching fire!
  20. One option when faced with problems like this is, of course, standard cloud based infrastructure. With the likes of Amazon's EC2, more machines could be powered on when demand was high, and different availability zones can help in the event of machine downtime or network problems.
  21. With other cloud based services, we were able to put cloud based services between our data collection devices and our enterprise infrastructure and internal systems. Allowing us to use the Cloud to Connect us.
  22. This cloud based middleware for us, was a Message Queue
  23. By using a dedicated message queuing infrastructure: our application can cope with downtime issues – when our service is down, messages are queued in the message queue until we are back online. Just because our application is online doesn't mean we can process incoming data; the queue acts as elastic between our computing power and our data streams. Since we can cope with downtime and capacity problems, we can perform maintenance on our system as and when we need to.
  24. While cloud based infrastructure and services often deliver massive reliability boosts, they can and sometimes do, fail.
  25. The solution there, was to ensure the remote collection devices themselves have a small buffer within them. Be careful, you don’t want to send all of the buffered data back in one go once you can connect to the service again!
  26. If you remember back to one of the earlier slides, I talked about how the vehicle components constantly tell us what they are up to, and that we sample that on a per second basis. Do we always care? With data which is reported at a low resolution, which doesn’t change as frequently as the sampling occurs – do we really need it? Clearly the answer is no. If a battery is 100% full, and remains that way for 5 minutes during drive, we don’t need to know that for every second of those 5 minutes the battery has 100% state of charge. We can simply log when it went down to 99%, and we can assume and extrapolate that if we are looking up data within those five minutes, the previous known value applies. Of course, this does bring problems of its own into the mix. What if the vehicle was turned off, and its shut-down sequence interrupted? We don’t know that its drive status is now off, nor do we know that its battery is now doing nothing. Should we assume it continues to draw charge? No. We need to make some assumptions. In effect, we put our own sampling in place, where we sample the data from the vehicle on a per minute basis, unless the data changes. If it changes, we sample; if it doesn’t change, we don’t sample until at least 60 seconds have passed.
  27. For example, the energy used relies on the current and the voltage values of the battery. The distance travelled and speeds (for top and average speeds) rely on the motor speed, the gearbox and other vehicle metrics.
  28. For any non-essential server tasks, we try to outsource these to the cloud. Services such as postmarkapp allow you to outsource your email delivery – so you don’t need to worry that your servers are under pressure dealing with critical data AND sending hundreds of reports based off the contents of the data. Get someone else to do the work. We make use of a vast number of cloud based services to do our work for us, including:
  29. With SQL based database systems, each data type available for a field uses a set amount of storage space. A good example is integers: MySQL offers a range of different integer fields, each type able to store a different range of values. The greater the range, the more storage space the field uses – regardless of whether the value of the field needs that range, as opposed to the range of the next field type down. If you know the data in a particular field is always within a specific range, use the data type with the smallest size which supports the range you require. When you need to store data at scale, an over eager data type can cost you dearly. Similarly, make sure the data type is optimised for the work you are doing on it. When it comes to ints, floats, doubles and decimals, some are more suited than others to arithmetic work because of the part of the CPU they use.
  30. Our vehicle live screen lists the current status of 27 different pieces of information on a given vehicle. Due to the way the data is stored together and managed, to pull out this information would require a separate query per data point, or a query with lots of subqueries. The page is refreshed every thirty seconds.
  31. We run a large number of daily exports and reports, where we look at the data held in a number of shards and pull that out. If we postpone this processing for maintenance work, or to deal with a built-up message queue, we find we have a large queue of data to be exported. The trick is to export the data differently. Normally, we would pull data for a single vehicle for a single day, do some processing, build a report, then move on to the next. When catching up on several days' worth of reports, it is faster to do each of those days for a single vehicle, then move on to the next, because the data from the indexes is often still held in memory.