Evolution of a “big data” project

Michael Peacock
@michaelpeacock

Head Developer @groundsix

Author of a number of web related books

Occasional conference / user group speaker
Ground Six

Tech company based in the North East of England

Specialise in developing web and mobile applications

Provide investment (financial and tech) to interesting app ideas

Got an idea? Looking for investment? www.groundsix.com
What's in store

Challenges, solutions and approaches when dealing with
billions of inserts per day

   Processing and storing the data

   Querying the data quickly

   Reporting against the data

   Keeping the application responsive

   Keeping the application running

   Legacy project, problems and code
Vehicle Telematics
Electric Vehicles: Need for Data

We need to receive all of the data

We need to keep all of the data

We need to be able to display data in real time

We need to transfer large chunks of data to
customers and government departments

We need to be able to calculate performance
metrics from the data
Some stats

500 (approx) telemetry-enabled vehicles
using the system

2,500 data points captured per vehicle per
second

> 1.5 billion MySQL inserts per day

World's largest vehicle telematics project
outside of Formula 1
More stats

Constant minimum of 4,000 inserts per
second within the application

Peaks: 3 million inserts per second
Processing and storing
      the data
Receiving continuous data streams

We need to be online

We need to have capacity to process the
data

We need to scale
Message Queue


Fast, secure, reliable and scalable

Hosted: they worry about the server
infrastructure and availability

We only have to process what we can
AMQP + PHP

php-amqplib (github.com/videlalvaro/php-amqplib)

  OR install it via Composer: videlalvaro/php-amqplib

Pure PHP implementation

Handles publishing and consuming messages
from a queue
AMQP: Consume
// connect to the AMQP server
$connection = new AMQPConnection($host,$port,$user,$password);

// create a channel; a logical stateful link to our physical connection
$channel = $connection->channel();


// link the channel to an exchange (where messages are sent)
$channel->exchange_declare($exchange, 'direct');

// bind the channel to the queue
$channel->queue_bind($queue, $exchange);

// consume by sending the message to our processing callback function
$channel->basic_consume($queue, $consumerTag, false, false, false,
$callbackFunctionName);

while(count($channel->callbacks))
{
  $channel->wait();
}
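For completeness, the publish side is a short sketch along the same lines. The variable names mirror the consume example; the exchange/queue declarations here are assumptions (your setup may declare them elsewhere), and note that newer php-amqplib releases call the connection class AMQPStreamConnection rather than AMQPConnection.

```php
<?php
// Publish-side sketch: $host, $port, $user, $password, $exchange,
// $queue and $telemetryReading are placeholders, as above.
use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Message\AMQPMessage;

$connection = new AMQPStreamConnection($host, $port, $user, $password);
$channel = $connection->channel();

// declare and bind so publishing works even before a consumer starts
$channel->exchange_declare($exchange, 'direct');
$channel->queue_declare($queue);
$channel->queue_bind($queue, $exchange);

// publish one telemetry reading as a message
$channel->basic_publish(new AMQPMessage(json_encode($telemetryReading)), $exchange);

$channel->close();
$connection->close();
```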
Buffers
Pulling in the data
Dedicated application and hardware to
consume from the Message Queue and
convert to MySQL Inserts

MySQL: LOAD DATA INFILE

  Very fast

  Due to high volumes of data, these “bulk
  operations” only cover a few seconds of
  time - still giving a live stream of data
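As a sketch, each bulk operation boils down to a statement like the following; the file path, table and column names are illustrative, not the project's actual schema:

```sql
-- Bulk-load a few seconds of buffered telemetry from a CSV file
LOAD DATA INFILE '/tmp/telemetry-buffer.csv'
INTO TABLE datavalue
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(vehicle_id, recorded_at, variable_id, value);
```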
Optimising MySQL
innodb_flush_method=O_DIRECT

    Lets the buffer pool bypass the OS cache

    InnoDB's buffer pools are more efficient than the OS cache

    Can have negative side effects

Improve write performance:

    innodb_flush_log_at_trx_commit=2

    Prevents per-commit log flushing

Query cache size (query_cache_size)

    Measure your application's usage and make a judgement

    Our data stream was too frequent to make use of the cache
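Pulled together, the settings above would look something like this in my.cnf; the values are illustrative, so measure against your own workload before copying:

```ini
[mysqld]
innodb_flush_method = O_DIRECT       # buffer pool bypasses the OS cache
innodb_flush_log_at_trx_commit = 2   # flush the log ~once a second, not per commit
query_cache_size = 0                 # our stream was too frequent for the cache to help
```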
Sharding (1)

Evaluate data, look for natural break points

Split the data so each data collection unit
(vehicle) had a separate database

Gives some support for horizontal scaling

  Provided the data per vehicle is a
  reasonable size
System architecture
But the MQ can store data...why
    do you have a problem?

  Message Queue isn’t designed for storage

  Messages are transferred in a compressed
  form

  Nature of vehicle data (CAN) means that a 16
  character string is actually 4 - 64 pieces of
  data
Sam Lambert
Solves big-data MySQL problems for
breakfast

Constantly tweaking the servers and
configuration to get more and more
performance

Pushing the capabilities of our SAN,
tweaking configs where no DBA has gone
before

www.samlambert.com

http://www.samlambert.com/2011/07/how-to-push-your-san-with-open-iscsi_13.html

http://www.samlambert.com/2011/07/diagnosing-and-fixing-mysql-io.html

Twitter: @isamlambert
Querying the data
      QUICKLY!
Graphs! Slow!
Long Running Queries
More and more vehicles came into service

Huge amount of data resulted in very slow
queries

  Page load

  Session locking

  Slow exports

  Slow backups
Real time information
Original database schema dictated all
information was accessed via a query, or a
separate subquery. Expensive.

Live information:

  Up to 30 data points

  Refreshing every 5 - 30 seconds via AJAX

Painful
Requests
Asynchronous requests let the page load before
the data

Number of these requests had to be monitored

Real time information used Fusion Charts

  1 AJAX call per chart

  10 - 30 charts per vehicle live screen

  Refresh every 5 - 30 seconds
Requests: Optimised
Single entry point

Multiple entry points make it difficult to
dynamically change the time out and
memory usage of key pages, as well as
dealing with session locking issues
effectively.

Single point of entry is essential

Check out the Symfony Routing component...
Symfony Routing
// load your routes
$locator = new FileLocator( array( __DIR__ . '/../../' ) );

// normalise the request URL and build the request context
$requestURL = isset( $_SERVER['REQUEST_URI'] ) ? $_SERVER['REQUEST_URI'] : '';
$requestURL = ( strlen( $requestURL ) > 1 ) ? rtrim( $requestURL, '/' ) : $requestURL;
$requestContext = new RequestContext( $requestURL );

// set up the router against the YAML route definitions
$router = new RoutingRouter( new YamlFileLoader( $locator ), 'routes.yml',
array( 'cache_dir' => null ), $requestContext );

// get the route for your request
$route = $router->match( $requestURL );

// act on the route
Sharding: split the data into
      smaller buckets
Sharding (2)

Data is very time relevant

  Only care about specific days

  Don’t care about comparing data too much

Split the data so that each week had a
separate table
Supporting Sharding
                  Simple PHP function to run all queries
                  through. Works out the table name. Link
                  with a sprintf to get the full query string
/**
 * Get the sharded table to use from a specific date
 * @param String $date YYYY-MM-DD
 * @return String
 */
public function getTableNameFromDate( $date )
{
    // ASSUMPTION: today's table is ALWAYS THERE
    // ASSUMPTION: you shouldn't be querying for data in the future
    $date = ( $date > date( 'Y-m-d' ) ) ? date( 'Y-m-d' ) : $date;
    $stt = strtotime( $date );
    if( $date >= $this->switchOver ) {
        // early January can still fall in ISO week 52 of the previous year
        $year = ( date( 'm', $stt ) == 1 && date( 'W', $stt ) == 52 ) ? date( 'Y', $stt ) - 1 : date( 'Y', $stt );
        return 'datavalue_' . $year . '_' . date( 'W', $stt );
    }
    else {
        return 'datavalue';
    }
}
Sharding: an excuse

Alterations to the database schema

Code to support smaller buckets of data



Take advantage of needing to touch queries
and code: improve them!
Index Optimisation
Two sharding projects left the schema as a
Frankenstein

Indexes still had data from before the first shard
(the vehicle ID)

   Wasting storage space

   Increasing the index size

   Increasing query time

   Makes the index harder to fit into memory
Schema Optimisation
MySQL provides a range of data-types

Varying storage implications

   Does that need to be a BIGINT?

   Do you really need DOUBLE PRECISION when a
   FLOAT will do?

Are those tables, fields or databases still required?

Perform regular schema audits
Query Optimisation

Run your queries through EXPLAIN
EXTENDED

  Check they hit the indexes

For big queries avoid functions such as
CURDATE - this helps ensure the cache is hit
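For example (illustrative table and column names): a query containing CURDATE() can never be served from the query cache, while the same query with a literal date interpolated by the application can be.

```sql
-- cache-hostile: non-deterministic function, never cached
SELECT AVG(value) FROM datavalue WHERE recorded_on = CURDATE();

-- cache-friendly: the application supplies the date as a literal
SELECT AVG(value) FROM datavalue WHERE recorded_on = '2012-06-08';
```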
Reporting against the
        data
Performance report
Reports & Intensive
     Queries
How far did the vehicle travel today

  Calculation involves looking at every single
  motor speed value for the day

How much energy did the vehicle use today

  Calculation involves looking at multiple
  variables for every second of the day

Lookup time + calculation time
Group the queries
Leverage indexes

  Perform related queries in succession

  Then perform calculations

Catching up on a backlog of calculations and
exports?

  Do a table of queries at a time

  Make use of indexes
Save the report
Automate the queries in dead time, grouped
together nicely

Save the results in a reports table

Only a single record per vehicle per day of
performance data

  Means users and management can run
  aggregate and comparison queries
  themselves quickly and easily
Enables date-range aggregation
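With one summary row per vehicle per day, a month-long fleet comparison becomes a cheap aggregate. The schema names here are a sketch, not the real reports table:

```sql
SELECT   vehicle_id,
         SUM(distance_travelled) AS total_distance,
         SUM(energy_used)        AS total_energy
FROM     daily_performance_report
WHERE    report_date BETWEEN '2012-01-01' AND '2012-01-31'
GROUP BY vehicle_id;
```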
Check for efficiency
      savings

Initial export scripts maintained a MySQLi
connection per database (500!)

Updated to maintain one per server and
simply switch to the database in question
Leverage your RAM

Intensive queries might only use X% of your
RAM

Safe to run more than one report / export
at a time

Add support for multiple exports / reports
within your scripts e.g.
$numberOfConcurrentReportsToRun = 2;
$reportInstance = 0; // this instance handles every Nth unit
$counter = 0;
foreach( $data as $unit ) {
    if( ( $counter % $numberOfConcurrentReportsToRun ) == $reportInstance ) {
        $dataToProcess[] = $unit;
    }
    $counter++;
}
Extrapolate & Assume

Data is only stored when it changes

Known assumptions are used to extrapolate
values for all seconds of the day

Saves MySQL but costs in RAM

“Interlation”
Interlation
  * Add an array to the interlation
public function addArray( $name, $array )

  * Get the time that we first receive data in one of our arrays
public function getFirst( $field )

  * Get the time that we last received data in any of our arrays
public function getLast( $field )

  * Generate the interlaced array
public function generate( $keyField, $valueField )

  * Break the interlaced array down into separate days
public function dayBreak( $interlationArray )

   * Generate an interlaced array and fill for all timestamps within the range
of     _first_ to _last_
public function generateAndFill( $keyField, $valueField )

  * Populate the new combined array with key fields using the common field
public function populateKeysFromField( $field, $valueField=null )

http://www.michaelpeacock.co.uk/interlation-library
Food for thought



Gearman

  Tool to schedule and run background jobs
Keeping the application
      responsive
Session Locking

Some queries were still (understandably, and
acceptably) slow

Sessions would lock and AJAX scripts would
enter race conditions

User would attempt to navigate to another
page: their session with the web server
wouldn’t respond
Session Locking:
      Resolution
Session locking is caused by how PHP handles
sessions:

  The session file stays locked until the request
  has finished executing

Potential solution: use another session handler,
e.g. the database

Our solution: manually close the session
Closing the session

session_write_close();

Caveats:

  If you need to write to sessions again in
  the execution cycle, you must call
  session_start() again

  Made problematic by the lack of template
  handling
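A minimal sketch of the pattern, caveat included; the session keys are illustrative:

```php
<?php
session_start();
$userID = $_SESSION['user_id'];   // read what this request needs

// release the session lock so concurrent AJAX requests aren't serialised
session_write_close();

// ... long-running query / report generation here ...

// caveat in action: re-open the session before writing to it again
session_start();
$_SESSION['last_report_at'] = time();
```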
Live real-time data
Request consolidation helped

Each data point on the live screen was still a
separate query due to original design
constraints

Live fleet information spanned multiple
databases e.g. a map of all vehicles
belonging to a customer

Solution: caching
Caching with memcached
      Fast, in-memory key-value store

         Used to keep a copy of the most recent
         data from each vehicle
$mc = new Memcache();
$mc->connect($memcacheServer, $memcachePort);
$realTimeData = $mc->get($vehicleID . '-' . $dataVariable);


      Failover: Moxi Memcached Proxy
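The write side (a sketch: performed by the data consumer as it processes the stream) uses the same key scheme, so the most recent value is always one get away:

```php
<?php
$mc = new Memcache();
$mc->connect($memcacheServer, $memcachePort);

// overwrite the latest value for this vehicle/variable pair;
// the 60-second TTL is an illustrative safety net, not the real config
$mc->set($vehicleID . '-' . $dataVariable, $latestValue, 0, 60);
```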
Caching enables large range of
 data to be looked up quickly
Legacy Project
Constraints, problems and code. Easing
         deployment anxiety.
Source Control
       Management

Initially SVN

Migrated to git

  Branch per feature strategy

Automated deployment
Dependencies
A Dependency Injection framework was missing
from the application, which caused problems with:

  Authentication

  Memcache

  Handling multiple concurrent database
  connections

  Access control
Autoloading



PSR-0
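A minimal PSR-0 style autoloader sketch; the src/ base path is an assumption, and strict PSR-0 only translates underscores in the class-name portion, so treat this as the shape rather than a full implementation:

```php
<?php
spl_autoload_register(function ($className) {
    // map Vendor\Package\Class_Name to src/Vendor/Package/Class/Name.php
    $className = ltrim($className, '\\');
    $fileName  = str_replace(array('\\', '_'), DIRECTORY_SEPARATOR, $className) . '.php';
    require __DIR__ . '/src/' . $fileName;
});
```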
Templates and sessions


Closing and opening sessions means you need
to know when data has been sent to the
browser

Separation of concerns and template systems
help with this
Database rollouts
              Specific database table defines how the data should
              be processed

              Log database deltas

              Automated process to roll out changes

                   Backup existing table first
DATE=`date +%H-%M-%d-%m-%y`
mysqldump -h HOST -u USER -pPASSWORD DATABASE TABLENAME > /backups/dictionary_$DATE.sql
cd /var/www/pdictionarypatcher/repo/
git pull origin master
cd src
php index.php


                   Rollout changes
private function applyNextPatch( $currentPatchID ) {
    $patchToTry = ++$currentPatchID;
    if( file_exists( FRAMEWORK_PATH . '../patches/' . $patchToTry . '.php' ) ) {
        $sql = file_get_contents( FRAMEWORK_PATH . '../patches/' . $patchToTry . '.php' );
        $this->database->multi_query( $sql );
        return $this->applyNextPatch( $patchToTry );
    }
    else {
        return $patchToTry - 1;
    }
}
The future
Tiered SAN hardware
NoSQL?
MySQL was used as a “golden hammer”

Original team of contractors who built the
system knew it

Easy to hire developers who know it

Not necessarily the best option

We had to introduce application-level
sharding for it to suit the growing needs
Rationalisation


Do we need all that data? Really?

  At the moment: probably

  In the future: probably not
Direct queue interaction


 Types of message queue could allow our live
 data to be streamed direct from a queue

 We could use this infrastructure to share
 the data with partners instead of providing
 them regular processed exports
More hardware



More vehicles + New components = Need for
more storage
Conclusions
So you need to work with a crap-load of data?
PHP needs lots of
        friends
PHP is a great tool for:

   Displaying the data

   Processing the data

   Exporting the data

   Binding business logic to the data

It needs friends to:

   Queue the data

   Insert the data

   Visualise the data
Continually Review


Your schema & indexes

Your queries

Efficiencies in your code

Number of AJAX requests
Message Queue: A
     safety net

Queue what you can

Lets you move data around while you process
it

Gives your hardware some breathing space
Code Considerations
Template engines

Dependency management

Abstraction

Autoloading

Session handling

Request management
Compile Data
Keep related data together

Look at storing summaries of data

   Approach used by analytics companies: granularity
   changes over time:

      This week: per second data

      Last week: Hourly summaries

      Last month: Daily summaries

      Last year: Monthly summaries
Thanks; Q+A


Michael Peacock

mkpeacock@gmail.com

@michaelpeacock

www.michaelpeacock.co.uk
Photo credits


flickr.com/photos/itmpa/4531956496/

flickr.com/photos/eveofdiscovery/3149008295

flickr.com/photos/gadl/89650415/

flickr.com/photos/brapps/403257780

Real Time Data Warehousing   Mastering Business Objects June 11Real Time Data Warehousing   Mastering Business Objects June 11
Real Time Data Warehousing Mastering Business Objects June 11
 
Elasticsearch in Netflix
Elasticsearch in NetflixElasticsearch in Netflix
Elasticsearch in Netflix
 
Deconstructing Lambda
Deconstructing LambdaDeconstructing Lambda
Deconstructing Lambda
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding OverviewSplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
 
Big Data Meetup #7
Big Data Meetup #7Big Data Meetup #7
Big Data Meetup #7
 
FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...
FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...
FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve...
 
Sql portfolio admin_practicals
Sql portfolio admin_practicalsSql portfolio admin_practicals
Sql portfolio admin_practicals
 
Gemfire
GemfireGemfire
Gemfire
 
Automating Big Data with the Automic Hadoop Agent
Automating Big Data with the Automic Hadoop AgentAutomating Big Data with the Automic Hadoop Agent
Automating Big Data with the Automic Hadoop Agent
 
Oracle Coherence: in-memory datagrid
Oracle Coherence: in-memory datagridOracle Coherence: in-memory datagrid
Oracle Coherence: in-memory datagrid
 
SplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding OverviewSplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding Overview
 
Cloud to hybrid edge cloud evolution Jun112020.pptx
Cloud to hybrid edge cloud evolution Jun112020.pptxCloud to hybrid edge cloud evolution Jun112020.pptx
Cloud to hybrid edge cloud evolution Jun112020.pptx
 
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
 

Plus de Michael Peacock

Immutable Infrastructure with Packer Ansible and Terraform
Immutable Infrastructure with Packer Ansible and TerraformImmutable Infrastructure with Packer Ansible and Terraform
Immutable Infrastructure with Packer Ansible and TerraformMichael Peacock
 
Test driven APIs with Laravel
Test driven APIs with LaravelTest driven APIs with Laravel
Test driven APIs with LaravelMichael Peacock
 
Symfony Workflow Component - Introductory Lightning Talk
Symfony Workflow Component - Introductory Lightning TalkSymfony Workflow Component - Introductory Lightning Talk
Symfony Workflow Component - Introductory Lightning TalkMichael Peacock
 
Alexa, lets make a skill
Alexa, lets make a skillAlexa, lets make a skill
Alexa, lets make a skillMichael Peacock
 
API Development with Laravel
API Development with LaravelAPI Development with Laravel
API Development with LaravelMichael Peacock
 
An introduction to Laravel Passport
An introduction to Laravel PassportAn introduction to Laravel Passport
An introduction to Laravel PassportMichael Peacock
 
Refactoring to symfony components
Refactoring to symfony componentsRefactoring to symfony components
Refactoring to symfony componentsMichael Peacock
 
Dance for the puppet master: G6 Tech Talk
Dance for the puppet master: G6 Tech TalkDance for the puppet master: G6 Tech Talk
Dance for the puppet master: G6 Tech TalkMichael Peacock
 
Powerful and flexible templates with Twig
Powerful and flexible templates with Twig Powerful and flexible templates with Twig
Powerful and flexible templates with Twig Michael Peacock
 
Introduction to OOP with PHP
Introduction to OOP with PHPIntroduction to OOP with PHP
Introduction to OOP with PHPMichael Peacock
 
Phpne august-2012-symfony-components-friends
Phpne august-2012-symfony-components-friendsPhpne august-2012-symfony-components-friends
Phpne august-2012-symfony-components-friendsMichael Peacock
 
Real time voice call integration - Confoo 2012
Real time voice call integration - Confoo 2012Real time voice call integration - Confoo 2012
Real time voice call integration - Confoo 2012Michael Peacock
 
Dealing with Continuous Data Processing, ConFoo 2012
Dealing with Continuous Data Processing, ConFoo 2012Dealing with Continuous Data Processing, ConFoo 2012
Dealing with Continuous Data Processing, ConFoo 2012Michael Peacock
 
Data at Scale - Michael Peacock, Cloud Connect 2012
Data at Scale - Michael Peacock, Cloud Connect 2012Data at Scale - Michael Peacock, Cloud Connect 2012
Data at Scale - Michael Peacock, Cloud Connect 2012Michael Peacock
 
PHP North East Registry Pattern
PHP North East Registry PatternPHP North East Registry Pattern
PHP North East Registry PatternMichael Peacock
 
PHP North East - Registry Design Pattern
PHP North East - Registry Design PatternPHP North East - Registry Design Pattern
PHP North East - Registry Design PatternMichael Peacock
 

Plus de Michael Peacock (20)

Immutable Infrastructure with Packer Ansible and Terraform
Immutable Infrastructure with Packer Ansible and TerraformImmutable Infrastructure with Packer Ansible and Terraform
Immutable Infrastructure with Packer Ansible and Terraform
 
Test driven APIs with Laravel
Test driven APIs with LaravelTest driven APIs with Laravel
Test driven APIs with Laravel
 
Symfony Workflow Component - Introductory Lightning Talk
Symfony Workflow Component - Introductory Lightning TalkSymfony Workflow Component - Introductory Lightning Talk
Symfony Workflow Component - Introductory Lightning Talk
 
Alexa, lets make a skill
Alexa, lets make a skillAlexa, lets make a skill
Alexa, lets make a skill
 
API Development with Laravel
API Development with LaravelAPI Development with Laravel
API Development with Laravel
 
An introduction to Laravel Passport
An introduction to Laravel PassportAn introduction to Laravel Passport
An introduction to Laravel Passport
 
Phinx talk
Phinx talkPhinx talk
Phinx talk
 
Refactoring to symfony components
Refactoring to symfony componentsRefactoring to symfony components
Refactoring to symfony components
 
Dance for the puppet master: G6 Tech Talk
Dance for the puppet master: G6 Tech TalkDance for the puppet master: G6 Tech Talk
Dance for the puppet master: G6 Tech Talk
 
Powerful and flexible templates with Twig
Powerful and flexible templates with Twig Powerful and flexible templates with Twig
Powerful and flexible templates with Twig
 
Introduction to OOP with PHP
Introduction to OOP with PHPIntroduction to OOP with PHP
Introduction to OOP with PHP
 
Vagrant
VagrantVagrant
Vagrant
 
Phpne august-2012-symfony-components-friends
Phpne august-2012-symfony-components-friendsPhpne august-2012-symfony-components-friends
Phpne august-2012-symfony-components-friends
 
Real time voice call integration - Confoo 2012
Real time voice call integration - Confoo 2012Real time voice call integration - Confoo 2012
Real time voice call integration - Confoo 2012
 
Dealing with Continuous Data Processing, ConFoo 2012
Dealing with Continuous Data Processing, ConFoo 2012Dealing with Continuous Data Processing, ConFoo 2012
Dealing with Continuous Data Processing, ConFoo 2012
 
Data at Scale - Michael Peacock, Cloud Connect 2012
Data at Scale - Michael Peacock, Cloud Connect 2012Data at Scale - Michael Peacock, Cloud Connect 2012
Data at Scale - Michael Peacock, Cloud Connect 2012
 
Supermondays twilio
Supermondays twilioSupermondays twilio
Supermondays twilio
 
PHP & Twilio
PHP & TwilioPHP & Twilio
PHP & Twilio
 
PHP North East Registry Pattern
PHP North East Registry PatternPHP North East Registry Pattern
PHP North East Registry Pattern
 
PHP North East - Registry Design Pattern
PHP North East - Registry Design PatternPHP North East - Registry Design Pattern
PHP North East - Registry Design Pattern
 

Dernier

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 

Dernier (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

Evolution of a big data project

  • 1. Evolution of a “big data” project Michael Peacock
  • 4. @michaelpeacock Head Developer @groundsix Author of a number of web related books
  • 5. @michaelpeacock Head Developer @groundsix Author of a number of web related books Occasional conference / user group speaker
  • 7. Ground Six Tech company based in the North East of England
  • 8. Ground Six Tech company based in the North East of England Specialise in developing web and mobile applications
  • 9. Ground Six Tech company based in the North East of England Specialise in developing web and mobile applications Provide investment (financial and tech) to interesting app ideas
  • 10. Ground Six Tech company based in the North East of England Specialise in developing web and mobile applications Provide investment (financial and tech) to interesting app ideas Got an idea? Looking for investment? www.groundsix.com
  • 12. What’s in store Challenges, solutions and approaches when dealing with billions of inserts per day
  • 13. What’s in store Challenges, solutions and approaches when dealing with billions of inserts per day Processing and storing the data
  • 14. What’s in store Challenges, solutions and approaches when dealing with billions of inserts per day Processing and storing the data Querying the data quickly
  • 15. What’s in store Challenges, solutions and approaches when dealing with billions of inserts per day Processing and storing the data Querying the data quickly Reporting against the data
  • 16. What’s in store Challenges, solutions and approaches when dealing with billions of inserts per day Processing and storing the data Querying the data quickly Reporting against the data Keeping the application responsive
  • 17. What’s in store Challenges, solutions and approaches when dealing with billions of inserts per day Processing and storing the data Querying the data quickly Reporting against the data Keeping the application responsive Keeping the application running
  • 18. What’s in store Challenges, solutions and approaches when dealing with billions of inserts per day Processing and storing the data Querying the data quickly Reporting against the data Keeping the application responsive Keeping the application running Legacy project, problems and code
  • 21. Electric Vehicles: Need for Data We need to receive all of the data
  • 22. Electric Vehicles: Need for Data We need to receive all of the data We need to keep all of the data
  • 23. Electric Vehicles: Need for Data We need to receive all of the data We need to keep all of the data We need to be able to display data in real time
  • 24. Electric Vehicles: Need for Data We need to receive all of the data We need to keep all of the data We need to be able to display data in real time We need to transfer large chunks of data to customers and government departments
  • 25. Electric Vehicles: Need for Data We need to receive all of the data We need to keep all of the data We need to be able to display data in real time We need to transfer large chunks of data to customers and government departments We need to be able to calculate performance metrics from the data
  • 27. Some stats 500 (approx) telemetry enabled vehicles using the system
  • 28. Some stats 500 (approx) telemetry enabled vehicles using the system 2500 data points captured per vehicle per second
  • 29. Some stats 500 (approx) telemetry enabled vehicles using the system 2500 data points captured per vehicle per second > 1.5 billion MySQL inserts per day
  • 30. Some stats 500 (approx) telemetry enabled vehicles using the system 2500 data points captured per vehicle per second > 1.5 billion MySQL inserts per day World’s largest vehicle telematics project outside of Formula 1
  • 31. More stats Constant minimum of 4000 inserts per second within the application Peaks: 3 million inserts per second
  • 33. Receiving continuous data streams
  • 34. Receiving continuous data streams We need to be online
  • 35. Receiving continuous data streams We need to be online We need to have capacity to process the data
  • 36. Receiving continuous data streams We need to be online We need to have capacity to process the data We need to scale
  • 37.
  • 38.
  • 39. Message Queue Fast, secure, reliable and scalable Hosted: they worry about the server infrastructure and availability We only have to process what we can
  • 40. AMQP + PHP php-amqplib (github.com/videlalvaro/php-amqplib) OR install it via composer: videlalvaro/php-amqplib Pure PHP implementation Handles publishing and consuming messages from a queue
  • 41. AMQP: Consume

      // connect to the AMQP server
      $connection = new AMQPConnection($host, $port, $user, $password);
      // create a channel; a logical stateful link to our physical connection
      $channel = $connection->channel();
      // declare the exchange (where messages are sent)
      $channel->exchange_declare($exchange, 'direct');
      // bind the queue to the exchange
      $channel->queue_bind($queue, $exchange);
      // consume by sending each message to our processing callback function
      // (flags: no_local, no_ack, exclusive, nowait)
      $channel->basic_consume($queue, $consumerTag, false, false, false, false, $callbackFunctionName);
      while (count($channel->callbacks)) {
          $channel->wait();
      }
  • 43. Pulling in the data Dedicated application and hardware to consume from the Message Queue and convert to MySQL Inserts MySQL: LOAD DATA INFILE Very fast Due to high volumes of data, these “bulk operations” only cover a few seconds of time - still giving a live stream of data
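The bulk-import step above can be sketched as follows. This is a Python sketch under assumptions: the row layout, table name and column names are hypothetical (the talk's actual consumer was PHP). The idea is to buffer a few seconds of messages into a tab-delimited file, then hand the whole file to MySQL in one fast LOAD DATA INFILE statement.

```python
import tempfile

def write_batch(rows):
    """Buffer a few seconds of telemetry rows to a tab-delimited
    temp file suitable for MySQL's LOAD DATA INFILE."""
    f = tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False)
    for vehicle_id, recorded_at, metric, value in rows:
        f.write(f"{vehicle_id}\t{recorded_at}\t{metric}\t{value}\n")
    f.close()
    return f.name

def load_statement(path, table="datavalue"):
    # The import itself is a single statement; MySQL parses the file
    # far faster than the equivalent stream of INSERTs.
    return (f"LOAD DATA INFILE '{path}' INTO TABLE {table} "
            "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n' "
            "(vehicle_id, recorded_at, metric, value)")

rows = [(42, "2013-05-01 10:00:00", "motor_speed", "1450")]
sql = load_statement(write_batch(rows))
```

Because each file only covers a few seconds of messages, the dashboard still sees a near-live stream while the database does cheap sequential bulk loads.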
  • 44. Optimising MySQL innodb_flush_method=O_DIRECT Lets the buffer pool bypass the OS cache The InnoDB buffer pool is more efficient than the OS cache Can have negative side effects Improve write performance: innodb_flush_log_at_trx_commit=2 Prevents per-commit log flushing Query cache size (query_cache_size) Measure your application’s usage and make a judgement Our data stream was too frequent to make use of the cache
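The settings from this slide, collected as a my.cnf fragment. Treat it as a sketch to benchmark against your own workload, not a drop-in config; the query-cache decision in particular was specific to this data stream.

```ini
[mysqld]
# Bypass the OS page cache; let the InnoDB buffer pool manage caching
innodb_flush_method = O_DIRECT

# Flush the log to disk once a second rather than per commit
# (trades up to ~1s of durability on crash for much better write throughput)
innodb_flush_log_at_trx_commit = 2

# Our stream changed too fast for the query cache to ever hit,
# so it was pure overhead here; measure before copying this
query_cache_size = 0
```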
  • 45. Sharding (1) Evaluate data, look for natural break points Split the data so each data collection unit (vehicle) had a separate database Gives some support for horizontal scaling Provided the data per vehicle is a reasonable size
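The per-vehicle split amounts to a tiny routing function in front of every query. A Python sketch of the idea; the `vehicle_NNNN` naming scheme is an assumption for illustration, not the project's actual convention:

```python
def database_for_vehicle(vehicle_id):
    """Route a vehicle's telemetry to its own database (shard).

    The natural break point in this data set is the vehicle itself:
    queries almost always concern a single vehicle, so each one gets
    its own schema, e.g. vehicle_0042 (hypothetical naming)."""
    return f"vehicle_{int(vehicle_id):04d}"

# Every read and write first resolves its shard, then connects:
shard = database_for_vehicle(42)
```

Because shards never need to be joined, they can be spread across servers as the fleet grows, which is what gives the horizontal-scaling headroom mentioned above.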
  • 47. But the MQ can store data...why do you have a problem? Message Queue isn’t designed for storage Messages are transferred in a compressed form Nature of vehicle data (CAN) means that a 16 character string is actually 4 - 64 pieces of data
  • 48. Sam Lambert Solves big-data MySQL problems for breakfast Constantly tweaking the servers and configuration to get more and more performance Pushing the capabilities of our SAN, tweaking configs where no DBA has gone before www.samlambert.com http://www.samlambert.com/2011/07/how-to-push-your-san-with-open-iscsi_13.html http://www.samlambert.com/2011/07/diagnosing-and-fixing-mysql-io.html Twitter: @isamlambert
  • 49. Querying the data QUICKLY!
  • 51. Long Running Queries More and more vehicles came into service Huge amount of data resulted in very slow queries Page load Session locking Slow exports Slow backups
  • 52. Real time information Original database schema dictated all information was accessed via a query, or a separate subquery. Expensive. Live information: Up to 30 data points Refreshing every 5 - 30 seconds via AJAX Painful
  • 53. Requests Asynchronous requests let the page load before the data Number of these requests had to be monitored Real time information used Fusion Charts 1 AJAX call per chart 10 - 30 charts per vehicle live screen Refresh every 5 - 30 seconds
  • 55. Single entry point Multiple entry points make it difficult to dynamically change the time out and memory usage of key pages, as well as dealing with session locking issues effectively. A single point of entry is essential Check out the Symfony Routing component...
  • 56. Symfony Routing

      // load your routes
      $locator = new FileLocator( array( __DIR__ . '/../../' ) );
      $loader = new YamlFileLoader( $locator );
      $loader->load( 'routes.yml' );

      // build the request context from the current URL
      $requestURL = isset( $_SERVER['REQUEST_URI'] ) ? $_SERVER['REQUEST_URI'] : '';
      $requestURL = ( strlen( $requestURL ) > 1 ) ? rtrim( $requestURL, '/' ) : $requestURL;
      $requestContext = new RequestContext( $requestURL );

      // set up the router
      $router = new RoutingRouter(
          new YamlFileLoader( $locator ),
          'routes.yml',
          array( 'cache_dir' => null ),
          $requestContext
      );

      // get the route for your request
      $route = $router->match( $requestURL );
      // act on the route
  • 57. Sharding: split the data into smaller buckets
  • 58. Sharding (2) Data is very time relevant Only care about specific days Don’t care about comparing data too much Split the data so that each week had a separate table
  • 59. Supporting Sharding Simple PHP function to run all queries through. Works out the table name. Link with a sprintf to get the full query string

      /**
       * Get the sharded table to use for a specific date
       * @param String $date YYYY-MM-DD
       * @return String
       */
      public function getTableNameFromDate( $date )
      {
          // ASSUMPTION: today's database is ALWAYS THERE
          // ASSUMPTION: you shouldn't be querying for data in the future
          $date = ( $date > date( 'Y-m-d' ) ) ? date( 'Y-m-d' ) : $date;
          $stt = strtotime( $date );
          if( $date >= $this->switchOver ) {
              // week 52 spilling into January belongs to the previous year
              $year = ( date( 'm', $stt ) == 01 && date( 'W', $stt ) == 52 )
                  ? date( 'Y', $stt ) - 1
                  : date( 'Y', $stt );
              return 'datavalue_' . $year . '_' . date( 'W', $stt );
          } else {
              return 'datavalue';
          }
      }
  • 60. Sharding: an excuse Alterations to the database schema Code to support smaller buckets of data Take advantage of needing to touch queries and code: improve them!
  • 61. Index Optimisation Two sharding projects left the schema as a Frankenstein’s monster Indexes still had data from before the first shard (the vehicle ID) Wasting storage space Increasing the index size Increasing query time Makes the index harder to fit into memory
  • 62. Schema Optimisation MySQL provides a range of data-types Varying storage implications Does that really need to be a BIGINT? Do you really need DOUBLE PRECISION when a FLOAT will do? Are those tables, fields or databases still required? Perform regular schema audits
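A schema audit of the kind described above typically ends in a few ALTER statements. An illustrative sketch; the table and column names are hypothetical, not the project's real schema:

```sql
-- Hypothetical audit findings: shrink over-sized types.
-- With ~500 vehicles, a BIGINT vehicle id wastes 6 bytes per row
-- (and per index entry); FLOAT halves the storage of DOUBLE.
ALTER TABLE datavalue
  MODIFY value FLOAT,
  MODIFY vehicle_id SMALLINT UNSIGNED;
```

Smaller rows mean smaller indexes, and smaller indexes are the ones that fit in memory.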
  • 63. Query Optimisation Run your queries through EXPLAIN EXTENDED Check they hit the indexes For big queries avoid functions such as CURDATE - this helps ensure the cache is hit
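The CURDATE point deserves a concrete illustration (table and column names hypothetical): non-deterministic functions make otherwise-identical queries uncacheable, so compute the date in application code and send a literal instead.

```sql
-- Cache-hostile: CURDATE() is non-deterministic, so MySQL's
-- query cache can never serve this result
SELECT metric, value FROM datavalue WHERE recorded_at >= CURDATE();

-- Cache-friendly: bind the date as a literal from application code;
-- identical requests within the day can now hit the cache
SELECT metric, value FROM datavalue WHERE recorded_at >= '2013-05-01';
```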
  • 66. Reports & Intensive Queries How far did the vehicle travel today Calculation involves looking at every single motor speed value for the day How much energy did the vehicle use today Calculation involves looking at multiple variables for every second of the day Lookup time + calculation time
  • 67. Group the queries Leverage indexes Perform related queries in succession Then perform calculations Catching up on a backlog of calculations and exports? Do a table of queries at a time Make use of indexes
  • 68. Save the report Automate the queries in dead time, grouped together nicely Save the results in a reports table Only a single record per vehicle per day of performance data Means users and management can run aggregate and comparison queries themselves quickly and easily
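The reports table might look something like this. The columns are a hypothetical sketch of the metrics mentioned in these slides (distance, energy), not the project's real schema; the point is the shape: one pre-computed row per vehicle per day, so dashboards aggregate hundreds of rows instead of billions.

```sql
CREATE TABLE reports (
  vehicle_id   SMALLINT UNSIGNED NOT NULL,
  report_date  DATE  NOT NULL,
  distance_km  FLOAT NOT NULL,   -- derived from every motor speed value
  energy_kwh   FLOAT NOT NULL,   -- derived from multiple per-second variables
  PRIMARY KEY (vehicle_id, report_date)
);
```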
  • 70. Check for efficiency savings Initial export scripts maintained a MySQLi connection per database (500!) Updated to maintain one per server and simply switch to the database in question
  • 71. Leverage your RAM Intensive queries might only use X% of your RAM Safe to run more than one report / export at a time Add support for multiple exports / reports within your scripts e.g.
• 72. $numberOfConcurrentReportsToRun = 2;
$reportInstance = 0; // this instance takes units 0, 2, 4, ...; a second instance set to 1 takes 1, 3, 5, ...
$counter = 0;
foreach ($data as $unit) {
    if (($counter % $numberOfConcurrentReportsToRun) == $reportInstance) {
        $dataToProcess[] = $unit;
    }
    $counter++;
}
  • 73. Extrapolate & Assume Data is only stored when it changes Known assumptions are used to extrapolate values for all seconds of the day Saves MySQL but costs in RAM “Interlation”
• 74. Interlation
* Add an array to the interlation: public function addArray( $name, $array )
* Get the time that we first receive data in one of our arrays: public function getFirst( $field )
* Get the time that we last received data in any of our arrays: public function getLast( $field )
* Generate the interlaced array: public function generate( $keyField, $valueField )
* Break the interlaced array down into separate days: public function dayBreak( $interlationArray )
* Generate an interlaced array and fill for all timestamps within the range of _first_ to _last_: public function generateAndFill( $keyField, $valueField )
* Populate the new combined array with key fields using the common field: public function populateKeysFromField( $field, $valueField=null )
http://www.michaelpeacock.co.uk/interlation-library
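The library itself isn't reproduced here, but the core "fill" idea can be sketched generically: given sparse samples recorded only when a value changed, carry the last known value forward for every second in the range. Function and variable names below are made up for illustration.

```php
<?php
// Generic forward-fill sketch of the "extrapolate and assume" idea.
// Input: sparse [timestamp => value] samples, stored only on change.
// Output: one value per second; the value is assumed to hold steady
// between samples. This trades MySQL lookups for RAM, as the slide says.
function forwardFill(array $samples, int $from, int $to): array
{
    $filled = [];
    $current = null; // seconds before the first sample stay null
    foreach (range($from, $to) as $second) {
        if (array_key_exists($second, $samples)) {
            $current = $samples[$second]; // a real reading arrived
        }
        $filled[$second] = $current;      // extrapolated (or real) value
    }
    return $filled;
}
```

A full day per variable is 86,400 entries, which is why the slide notes the RAM cost of this approach.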
  • 75. Food for thought Gearman Tool to schedule and run background jobs
  • 77. Session Locking Some queries were still (understandably, and acceptably) slow Sessions would lock and AJAX scripts would enter race conditions User would attempt to navigate to another page: their session with the web server wouldn’t respond
• 78. Session Locking: Resolution Session locking is caused by how PHP handles sessions: the session file is only closed once PHP has finished executing the request Potential solution: use another session handler e.g. the database Our solution: manually close the session
  • 79. Closing the session session_write_close(); Caveats: If you need to write to sessions again in the execution cycle, you must call session_start() again Made problematic by the lack of template handling
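A minimal sketch of the pattern, with illustrative session keys:

```php
<?php
// Sketch of the manual-close pattern; the session keys are illustrative.
// PHP holds an exclusive lock on the session file from session_start()
// until the request ends, which serialises concurrent AJAX requests from
// the same user. Closing the session early releases that lock.
session_start();
$_SESSION['last_report'] = 'daily-energy'; // do all session writes up front
session_write_close();                     // lock released: parallel AJAX
                                           // requests can now proceed

// ... long-running query or report generation happens here, lock-free ...

// Caveat from the slide: to write to the session again later in the same
// execution cycle, you must call session_start() again first.
session_start();
$_SESSION['report_finished'] = true;
session_write_close();
```

This is also why you need to know when output has already been sent to the browser: reopening the session after output is a problem, which is what the later slide on templates and sessions refers to.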
  • 80. Live real-time data Request consolidation helped Each data point on the live screen was still a separate query due to original design constraints Live fleet information spanned multiple databases e.g. a map of all vehicles belonging to a customer Solution: caching
• 81. Caching with memcached Fast, in-memory key-value store Used to keep a copy of the most recent data from each vehicle
$mc = new Memcache();
$mc->connect($memcacheServer, $memcachePort);
$realTimeData = $mc->get($vehicleID . '-' . $dataVariable);
Failover: Moxi Memcached Proxy
  • 82. Caching enables large range of data to be looked up quickly
  • 83. Legacy Project Constraints, problems and code. Easing deployment anxiety.
  • 84. Source Control Management Initially SVN Migrated to git Branch per feature strategy Automated deployment
  • 85. Dependencies Dependency Injection framework missing from the application, caused problems with: Authentication Memcache Handling multiple concurrent database connections Access control
  • 87. Templates and sessions Closing and opening sessions means you need to know when data has been sent to the browser Separation of concerns and template systems help with this
• 88. Database rollouts A specific database table defines how the data should be processed Log database deltas Automated process to roll out changes Backup the existing table first:
DATE=`date +%H-%M-%d-%m-%y`
mysqldump -h HOST -u USER -pPASSWORD DATABASE TABLENAME > /backups/dictionary_$DATE.sql
Then roll out the changes:
cd /var/www/pdictionarypatcher/repo/
git pull origin master
cd src
php index.php
• 89. private function applyNextPatch( $currentPatchID )
{
    $patchToTry = ++$currentPatchID;
    if ( file_exists( FRAMEWORK_PATH . '../patches/' . $patchToTry . '.php' ) ) {
        $sql = file_get_contents( FRAMEWORK_PATH . '../patches/' . $patchToTry . '.php' );
        $this->database->multi_query( $sql );
        return $this->applyNextPatch( $patchToTry );
    } else {
        return $patchToTry - 1;
    }
}
• 92. NoSQL? MySQL was used as a “golden hammer” The original team of contractors who built the system knew it Easy to hire developers who know it Not necessarily the best option We had to introduce application-level sharding for it to suit the growing needs
  • 93. Rationalisation Do we need all that data? Really? At the moment: probably In the future: probably not
  • 94. Direct queue interaction Types of message queue could allow our live data to be streamed direct from a queue We could use this infrastructure to share the data with partners instead of providing them regular processed exports
  • 95. More hardware More vehicles + New components = Need for more storage
  • 96. Conclusions So you need to work with a crap-load of data?
  • 97. PHP needs lots of friends PHP is a great tool for: Displaying the data Processing the data Exporting the data Binding business logic to the data It needs friends to: Queue the data Insert the data Visualise the data
  • 98. Continually Review Your schema & indexes Your queries Efficiencies in your code Number of AJAX requests
  • 99. Message Queue: A safety net Queue what you can Lets you move data around while you process it Gives your hardware some breathing space
  • 100. Code Considerations Template engines Dependency management Abstraction Autoloading Session handling Request management
  • 101. Compile Data Keep related data together Look at storing summaries of data Approach used by analytics companies: granularity changes over time: This week: per second data Last week: Hourly summaries Last month: Daily summaries Last year: Monthly summaries
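The granularity policy above can be sketched as a simple age-based lookup. The exact cut-offs (7 / 31 / 365 days) are assumptions for illustration; the tiers are the slide's own.

```php
<?php
// Sketch of the analytics-style granularity policy; the function name
// and day cut-offs are illustrative assumptions.
function granularityFor(DateTimeInterface $when, DateTimeInterface $now): string
{
    $ageDays = $when->diff($now)->days;
    if ($ageDays <= 7) {
        return 'per-second'; // this week: raw data
    }
    if ($ageDays <= 31) {
        return 'hourly';     // last week: hourly summaries
    }
    if ($ageDays <= 365) {
        return 'daily';      // last month: daily summaries
    }
    return 'monthly';        // last year: monthly summaries
}
```

A background job would then compact raw rows into the coarser summary tables as they age past each threshold, reclaiming the storage.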

Editor's notes

  1. Hello everyone; thanks for coming. I spent the last 12 months working on a large-scale, data-intensive project, focusing on the development of a PHP web application which had to support, display, process, report against and export a phenomenal amount of data each day.
  16. The project concerned dealing with vehicle telematics data from vehicles produced by Smith Electric Vehicles, one of the world's largest manufacturers of all-electric commercial vehicles. As a new and emerging industry, performance, efficiency and fault reporting data from these vehicles is very valuable. As I'm sure you can imagine, with electric vehicles the drive and battery systems generate a large amount of data, with batteries broken down into smaller cells, each giving us temperature, current, voltage and state-of-charge data.
  17. As the data may relate to performance and faults, we need to ensure we get the data. Telematics projects which offer safety features have this as an even more important issue. We also have government partners who subsidise the vehicle cost in exchange for some of this data; subsequently we need to be able to give this data to them, as well as receiving it ourselves. As EVs rely on chemistry and external factors, we need to keep data so we can compare data at different times of the year and different locations.
  22. What you will realise is that we in effect built a large-scale distributed-denial-of-service system, and pointed it directly at our own hardware, with the caveat of needing the data from the DDoS attack!
  27. Before we could do anything, we needed to be able to process the data and store it within the system. This includes actually transferring the data to our servers, inserting it into our database cluster and performing business logic on the data.
  28. In order for us to reliably receive the data, we need the system to be online so that data can be transferred. We also need to have the server capacity to process the data, and we need to be able to scale the system. Even though there are X number of data collection units out there, we don't know how many will be on at a given time, and we have to deal with more and more collection units being built and delivered.
  31. The biggest problem is dealing with the pressure of that data stream.
  34. There are a range of AMQP libraries for PHP, some of them based on the C library and other difficult dependencies. A couple of guys developed a pure PHP implementation of the library which is really easy to use and install, and can be installed directly via Composer. As it's a pure PHP implementation, it's really easy to get up and running on any platform. It provides support for both publishing and consuming messages from a queue. Great not only for dealing with streams of data but also for storing events and requests across multiple sessions, or dispatching jobs.
  36. A small buffer allows us to cope with the issue of connectivity problems to our message queue, or signal problems with the data collection devices.
  37. To give data import the resources it needs, the system had dedicated hardware to consume messages from the message queue, perform business logic and convert them to MySQL inserts. Although it's an obvious one, it's also easily overlooked: the data is bundled together into LOAD DATA INFILE statements with MySQL.
  77. With a project of this scale, dealing with business-critical data could lead to deployment anxiety: a bug in rolled-out code could cause problems with displaying real-time data, or cause exported data or processed data reports to be incorrect, requiring them to be re-run at a cost of CPU time, most of which was already in use generating that day's reports or dealing with that day's data imports. The architecture of the application also imposed constraints on maintenance.