If the Data Cannot Come to
the Algorithm...
many cores with java
session four
data locality
copyright 2013 Robert Burrell Donkin robertburrelldonkin.name
this work is licensed under a Creative Commons Attribution 3.0 Unported License
Take Away from Session One
Pre-emptive multi-tasking operating
systems use involuntary context switching
to provide the illusion of parallel processes
even when the hardware supports only a
single thread of execution.
Take Away from Session Two
Even on a single core,
there's no escaping parallelism.
Take Away from Session Three
Code executing on different cores uses copies held
in registers and caches, so shared memory is likely
to be incoherent unless the program plays by the
rules of the software platform.
Gustafson's Law
S(p) = p - a (p-1)
● S(p) is the speedup for p processors
● a is the non-parallelizable fraction
"in practice, the problem size scales with the number of
processors" John L. Gustafson
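The formula above can be tried out with a minimal sketch. The class name, the method name and the value a = 0.1 are assumptions chosen for illustration; only the formula S(p) = p - a (p-1) comes from the slide.

```java
// Illustrative sketch only: names and sample values are assumptions, not from the slides.
final class GustafsonSpeedup {

    /** Scaled speedup S(p) = p - a(p - 1) for p processors and non-parallelizable fraction a. */
    static double speedup(int processors, double serialFraction) {
        if (processors < 1 || serialFraction < 0.0 || serialFraction > 1.0) {
            throw new IllegalArgumentException("need p >= 1 and 0 <= a <= 1");
        }
        return processors - serialFraction * (processors - 1);
    }

    public static void main(String[] args) {
        // Assumed serial fraction of 10%, printed for a few processor counts.
        for (int p : new int[] {1, 2, 4, 8, 16}) {
            System.out.printf("p=%d S(p)=%.2f%n", p, speedup(p, 0.1));
        }
    }
}
```

With a = 0.1, eight processors give S(8) = 8 - 0.1 × 7 = 7.3, so the speedup stays close to linear as long as the problem grows with the processor count.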
Scales and Scaling
● Think about Gustafson's Law...
● The quantity of data processed...
● ...scales linearly as processors are added.
● Throwing processors at the problem
works...
● ...at least sometimes.
Divide and Conquer
● Back to the future
● Partition the data...
○ ...apply the same algorithm to each part and then
○ ...collate the answers.
● Natural to parallelise
● No contended shared memory
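One way to sketch partition, apply and collate in Java is the fork/join framework. This is an illustrative example rather than code from the session: the class name PartitionedSum, the choice of summation as the algorithm and the threshold of 1,000 elements are all assumptions.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustrative sketch only: sums an array by divide and conquer.
final class PartitionedSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // assumed cut-off for sequential work

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    PartitionedSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: apply the algorithm directly to this part.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        // Partition the data into two halves that share no mutable state.
        int mid = from + (to - from) / 2;
        PartitionedSum left = new PartitionedSum(data, from, mid);
        PartitionedSum right = new PartitionedSum(data, mid, to);
        left.fork();                     // left half may run on another core
        long rightSum = right.compute(); // right half runs on this thread
        return left.join() + rightSum;   // collate the answers
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new PartitionedSum(data, 0, data.length));
    }
}
```

Each task only reads its own slice of the array, so the parts never contend for shared memory; the only coordination is the final addition.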
Data Locality
● When the algorithm is small
○ it's more efficient
■ to bring the algorithm to the data
■ than the data to the algorithm
● Whether the data is in
○ caches on cores in a many core computer, or in
○ disc storage in a distributed data store
Map and Reduce
● Partition the data
● The map algorithm
○ works in parallel
○ on local data
○ independently
● The reduce algorithm
○ collates output from map algorithms
● More complex systems built from these blocks
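The two building blocks can be sketched with Java streams on a single machine. This is a minimal illustration, not a distributed platform: the class WordCount, the word-counting task and the use of one string per partition are assumptions made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: word count expressed as a map step and a reduce step.
final class WordCount {

    /** Map step: counts words in one partition, touching only that partition's data. */
    static Map<String, Long> map(String partition) {
        Map<String, Long> counts = new HashMap<>();
        for (String word : partition.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    /** Reduce step: collates two partial results into a new map without mutating either. */
    static Map<String, Long> reduce(Map<String, Long> a, Map<String, Long> b) {
        Map<String, Long> merged = new HashMap<>(a);
        b.forEach((word, n) -> merged.merge(word, n, Long::sum));
        return merged;
    }

    /** Runs the map step over the partitions in parallel, then reduces the outputs. */
    static Map<String, Long> count(List<String> partitions) {
        return partitions.parallelStream()
                .map(WordCount::map)
                .reduce(Collections.emptyMap(), WordCount::reduce);
    }
}
```

Because the reduce step is associative and never mutates its inputs, the partial results can be combined in any grouping, which is what lets the map outputs be collated safely after running independently.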
Map-Reduce
As a Query Language
● NoSQL
● A popular alternative to SQL
○ for distributed data stores
● Why...?
○ Easy to
■ read and write
■ parallelize
○ Rich and full programming model
Map-Reduce
Crunching Big Data
● Commodity hardware
● Scales up to terabytes and petabytes
○ smoothly by adding new nodes
● Map-Reduce platforms typically provide
○ fault tolerance, e.g. retry
○ orchestration
○ redundant data storage
● Statistical resilience
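Retry, the simplest of these fault-tolerance mechanisms, can be sketched in a few lines. This is a hypothetical helper, not the API of any particular Map-Reduce platform: the names RetryingTask and callWithRetry and the fixed attempt limit are assumptions, and a real platform would typically reschedule the task on a different node.

```java
import java.util.function.Supplier;

// Illustrative sketch only: re-runs a failed task up to a fixed number of attempts.
final class RetryingTask {

    static <T> T callWithRetry(Supplier<T> task, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.get(); // success: hand back the result
            } catch (RuntimeException e) {
                last = e;          // failure: remember it and try again
            }
        }
        throw last;                // every attempt failed
    }
}
```

Retrying is only safe when the task is independent and free of side effects, which is the property the map step already has.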
Take Away
When you want to be able to process big data
tomorrow by adding cores or computers, adopt
an appropriate architecture today.
