DynamoDB: Data Example
     userId    date          value    unlockedAchievements
     hadr-fb   18-07-2012    72       ['10 days', '2 levels day']
     hadr-fb   19-07-2012    1        None
     hadr-fb   20-07-2012    56789    ['top 10 progress']

                      Table: ‘Waldo-Scores’



        Id     platform     Name       JoinDate      Score
        hadr   fb           Hadrien    31-02-2011    10 457
        hadr   G+           Hadrien    18-07-2012    357
        pior   fb           Pior       12-12-2012    18 951

                          Table: ‘Players’
Data types (Lean...)

   Types

       single
             string (utf-8)
             number (between 10^-128 and 10^+126)
       set
             string (utf-8)
             number


   Constraints

       no “Embedded Documents”
       no complex types (dates, ...)
Dimensioning 1/2: Big picture


   Units

       accesses/s * roundUp(KB) * items
       provisioning
       updates are... constraining

   Storage

       tables are “elastic”
       64KB max per item
       overhead = 100 bytes per item
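
As a rough illustration of the sizing rules above, the sketch below computes the units to provision and the approximate storage used. The function names are hypothetical, and the 1 KB rounding, the 64 KB limit and the 100-byte overhead are simply the figures quoted on this slide.

```python
import math

ITEM_OVERHEAD_BYTES = 100      # per-item storage overhead quoted on the slide
MAX_ITEM_BYTES = 64 * 1024     # 64 KB item limit quoted on the slide


def required_units(accesses_per_second, item_size_bytes):
    """Units to provision: accesses/s * item size rounded up to whole KB."""
    if item_size_bytes > MAX_ITEM_BYTES:
        raise ValueError("item exceeds the 64 KB limit")
    size_kb = max(1, math.ceil(item_size_bytes / 1024))
    return accesses_per_second * size_kb


def stored_bytes(item_size_bytes, item_count):
    """Approximate table storage, including the per-item overhead."""
    return (item_size_bytes + ITEM_OVERHEAD_BYTES) * item_count


print(required_units(10, 1500))   # a 1500-byte item rounds up to 2 KB
print(stored_bytes(1500, 1000))
```

The rounding is the part that matters in practice: an item slightly over 1 KB costs as much as a 2 KB one.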
Dimensioning 2/2: Traps and constraints
   TRAPS:

      Units are divided among the partitions.
      Bigger tables often mean higher throughput. Divide tables?

   CONSTRAINTS for throughput:

      absolute
            min 5
            max 10 000
            1 single table in UPDATING state
      increase
            min 10%
            max 100%
      decrease
            min 10%
            max once a day
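
The throughput constraints above can be expressed as a small check run before requesting an update. This is only a sketch: the function name is hypothetical, the limits are the ones listed on this slide, and the "once a day" rule for decreases is not modelled.

```python
MIN_UNITS = 5
MAX_UNITS = 10000


def check_throughput_change(current, new):
    """Return None if the change respects the slide's rules, else a reason."""
    if not MIN_UNITS <= new <= MAX_UNITS:
        return "absolute limit: must stay between 5 and 10 000 units"
    # Integer arithmetic avoids floating-point surprises on the 10% bounds.
    if new > current:
        if new * 100 < current * 110:
            return "increase too small: at least 10%"
        if new > current * 2:
            return "increase too large: at most 100% per update"
    elif new < current:
        if new * 100 > current * 90:
            return "decrease too small: at least 10%"
    return None


for target in (105, 110, 200, 201, 90, 95):
    print(100, "->", target, check_throughput_change(100, target) or "ok")
```

Because an increase can at most double the current value, a large jump has to be spread over several successive updates.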
Integrated Service 1/3: IAM

        API level
        table level (except for “ListTables”)

   Example: “Fair” Scores table use

   {
       "Statement":[{
          "Effect":"Allow",
          "Action":["dynamodb:DeleteItem", "dynamodb:PutItem",
            "dynamodb:UpdateItem", "dynamodb:GetItem",
            "dynamodb:Query"],
          "Resource":
            "arn:aws:dynamodb:<region>:<account>:table/Scores"
       }]
   }
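
When several tables need the same kind of policy, it can be generated rather than written by hand. A minimal sketch follows: the helper name is hypothetical, and the region and account id in the usage line are placeholders.

```python
import json


def scores_policy(region, account, table="Scores"):
    """Build the item-level policy shown above for one table."""
    actions = ["DeleteItem", "PutItem", "UpdateItem", "GetItem", "Query"]
    return {
        "Statement": [{
            "Effect": "Allow",
            "Action": ["dynamodb:" + name for name in actions],
            "Resource": "arn:aws:dynamodb:%s:%s:table/%s"
                        % (region, account, table),
        }]
    }


# Placeholder region and account id, for illustration only.
print(json.dumps(scores_policy("eu-west-1", "123456789012"), indent=2))
```

Scan and the table-management calls are deliberately absent from the action list, which is presumably what "fair" use means here.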
Integrated Service 2/3: CloudWatch
   Metrics:

       SuccessfulRequestLatency
       UserErrors
       SystemErrors
       ThrottledRequests
       ConsumedReadCapacityUnits
       ConsumedWriteCapacityUnits
       ReturnedItemCount

   Metric’s context

       Table
       Operation ({Put, Delete, Update, Get, BatchGet}Item, Scan, Query)
Integrated Service 3/3: EMR



       out of the scope of this presentation
       basically, Hive integrated with DynamoDB => HiveQL

   use cases:

       custom index generation
       export to S3 (backup, data removal)
       data analysis / aggregation
Data access 1/3: GetItem


      Fastest: primary key(s)
      0-1 item
      Cost = 1 unit

  Example: 'Hadrien' Player of 'fb' platform

  import boto

  conn = boto.connect_dynamodb()   # boto 2.x; credentials from the boto config
  table = conn.get_table('Players')
  item = table.get_item(
          hash_key='hadr',
          range_key='fb'
      )
Data access 2/3: Query

       Fast
              primary key
              range key conditions =, <, >, <=, >=, startsWith
       0+ item(s)
       Cost = 1 unit per returned item

   Example: all 'Waldo-Scores' of 'hadr-fb' Player

   table = conn.get_table('Waldo-Scores')
   items = table.query(             # query(), not get_item(): 0+ items
           hash_key='hadr-fb',
           #range_key_condition=
       )
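
To make the difference between GetItem and Query concrete without an AWS account, here is a toy in-memory model of a hash + range table. It is not the boto API: the class and its methods are hypothetical, and the dates are rewritten as YYYY-MM-DD (an assumption) so that string order matches chronological order.

```python
from bisect import bisect_left, bisect_right


class ToyTable:
    """In-memory stand-in for a hash + range table (not the boto API)."""

    def __init__(self):
        self._rows = {}   # hash_key -> list of (range_key, item), kept sorted

    def put(self, hash_key, range_key, item):
        rows = self._rows.setdefault(hash_key, [])
        keys = [r for r, _ in rows]
        pos = bisect_left(keys, range_key)
        if pos < len(rows) and rows[pos][0] == range_key:
            rows[pos] = (range_key, item)       # replace the existing item
        else:
            rows.insert(pos, (range_key, item))

    def get_item(self, hash_key, range_key):
        """GetItem: exact primary key, returns 0 or 1 item."""
        for r, item in self._rows.get(hash_key, []):
            if r == range_key:
                return item
        return None

    def query(self, hash_key, low=None, high=None):
        """Query: one hash key, optional inclusive bounds on the range key."""
        rows = self._rows.get(hash_key, [])
        keys = [r for r, _ in rows]
        start = 0 if low is None else bisect_left(keys, low)
        stop = len(rows) if high is None else bisect_right(keys, high)
        return [item for _, item in rows[start:stop]]


scores = ToyTable()
scores.put("hadr-fb", "2012-07-18", {"value": 72})
scores.put("hadr-fb", "2012-07-19", {"value": 1})
scores.put("hadr-fb", "2012-07-20", {"value": 56789})

print(scores.get_item("hadr-fb", "2012-07-19"))
print(len(scores.query("hadr-fb")))
print(scores.query("hadr-fb", low="2012-07-19"))
```

Both operations start from the hash key, so neither has to look at items belonging to other players.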
Data access 3/3: Scan

       Slooooow
            filter on any attribute
            reads the WHOLE table!
       0+ item(s)
       Cost = 1/2 unit for each KB read! => starvation
       Use case: get a full (small) table. Ex: 'powerups'

   Example: all days where 'hadr-*' did better than 100

   from boto.dynamodb.condition import BEGINS_WITH, GT

   table = conn.get_table('Waldo-Scores')
   items = table.scan(              # scan(), not get_item()
           scan_filter={
               'userId': BEGINS_WITH('hadr-'),
               'value': GT(100)
           })
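
The cost rule for Scan is easy to underestimate, so here is a small sketch of it. It assumes the slide's figure of 1/2 unit per KB read and uses hypothetical helper names; the point is that the filter reduces what is returned, not what is read.

```python
import math


def scan_cost_units(item_sizes_bytes):
    """Units for a full scan: 1/2 unit per KB read, matched or not."""
    total_kb = math.ceil(sum(item_sizes_bytes) / 1024)
    return total_kb / 2


def scan(items, predicate):
    """Test every item; return the matches and the number of items read."""
    matches = [item for item in items if predicate(item)]
    return matches, len(items)


rows = [
    {"userId": "hadr-fb", "date": "18-07-2012", "value": 72},
    {"userId": "hadr-fb", "date": "19-07-2012", "value": 1},
    {"userId": "hadr-fb", "date": "20-07-2012", "value": 56789},
]
matches, read = scan(
    rows, lambda r: r["userId"].startswith("hadr-") and r["value"] > 100)
print(len(matches), "match out of", read, "items read")
print(scan_cost_units([200] * 10000), "units for 10 000 items of 200 bytes")
```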
Performance considerations: non indexed data 1/2




   De-normalisation

       Ex: Waldo-Scores and Players tables :)
       big picture: duplicate data to fit the
           viewpoint
           need
Performance considerations: non indexed data 2/2

   Scan

       sloooooow (sequential)
       (bad) unit consumption (sequential)


   EMR

       scales (less slow :p)
       (better) unit consumption (parallel)


   TL;DR
   Index your data !
Eventual vs strong consistency



      write => propagation ∼ 1s
      read => may not be up to date . . .


       Consistency   Applications   Cost (Units)   Performance
       strong        critical       1 per KB       good
       eventual      aware          1/2 per KB     maximal
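
The trade-off in the table can be illustrated with a toy model: one primary copy and one replica that lags behind. This does not describe how DynamoDB replicates data internally; the class, its `propagate()` step and the cost helper are all hypothetical.

```python
class ToyReplicatedStore:
    """Toy model: one primary copy and one replica that lags behind."""

    def __init__(self):
        self._primary = {}
        self._replica = {}
        self._pending = []      # writes not yet copied to the replica

    def write(self, key, value):
        self._primary[key] = value
        self._pending.append((key, value))

    def propagate(self):
        """Stands in for the ~1 s propagation delay."""
        for key, value in self._pending:
            self._replica[key] = value
        self._pending = []

    def read(self, key, consistent=False):
        source = self._primary if consistent else self._replica
        return source.get(key)


def read_units(size_kb, consistent=False):
    """Cost from the table above: 1 unit per KB strong, 1/2 eventual."""
    return size_kb * (1.0 if consistent else 0.5)


store = ToyReplicatedStore()
store.write("score", 72)
store.propagate()
store.write("score", 100)
stale = store.read("score")                    # replica not updated yet
fresh = store.read("score", consistent=True)   # always the latest write
print(stale, fresh)
print(read_units(4), read_units(4, consistent=True))
```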
Critical/specific applications


   Redundancy/backup

       managed => no need
       “∼ Snapshot” => EMR + S3

   ∼ Transactions

       conditional operations (idempotent)
       atomic counter (not idempotent, BUT strong consistency)
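
The two building blocks above can be sketched on a plain dictionary. This is a single-threaded toy with hypothetical function names; in the real service the condition is evaluated on the server, together with the write.

```python
class ConditionFailed(Exception):
    """Raised when the expected value does not match the stored one."""


def conditional_put(table, key, new_value, expected):
    """Write new_value only if the current value equals expected."""
    if table.get(key) != expected:
        raise ConditionFailed(key)
    table[key] = new_value


def add_to_counter(table, key, delta):
    """Increment in place, without reading the value first."""
    table[key] = table.get(key, 0) + delta
    return table[key]


scores = {"hadr-fb": 72}
conditional_put(scores, "hadr-fb", 80, expected=72)       # applied
try:
    conditional_put(scores, "hadr-fb", 80, expected=72)   # same request again
except ConditionFailed:
    print("retry rejected, value still", scores["hadr-fb"])
print(add_to_counter(scores, "levels", 1))
```

A repeated conditional write is rejected instead of being applied twice, which is what makes it safe to retry after a timeout.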
API 1/3: Read


   Method           Consistency        Description         Returns
   GetItem          eventual/strong    load by key         0-1 item
   BatchGetItem     eventual/strong    same, in parallel   0-100 items, 0-1MB
   Query            eventual/strong    rangeKey filter     0+ items, 0-1MB
   Scan             eventual           any key filter      0+ items, 0-1MB


      rule: 0-1 filter per eligible key
      unprocessed => 'UnprocessedKeys', 'LastEvaluatedKey'
      consumed units => 'ConsumedCapacityUnits'
      enforce strong consistency => 'ConsistentRead'
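
Because Query and Scan stop at 1MB, a caller has to loop on 'LastEvaluatedKey' to get everything. The sketch below shows that loop against a fake paged read. `fake_query`, the use of a list index as the key and the 0.5 unit per item are all made up for the illustration; only the response field names come from the slide.

```python
def fake_query(rows, limit, exclusive_start_key=None):
    """Hypothetical paged read: returns at most `limit` rows per call."""
    start = 0 if exclusive_start_key is None else exclusive_start_key + 1
    page = rows[start:start + limit]
    response = {"Items": page, "ConsumedCapacityUnits": 0.5 * len(page)}
    if start + limit < len(rows):
        # More data remains: tell the caller where to resume.
        response["LastEvaluatedKey"] = start + len(page) - 1
    return response


def fetch_all(rows, limit):
    """Follow LastEvaluatedKey until the response no longer carries one."""
    items, consumed, start_key = [], 0.0, None
    while True:
        response = fake_query(rows, limit, exclusive_start_key=start_key)
        items.extend(response["Items"])
        consumed += response["ConsumedCapacityUnits"]
        start_key = response.get("LastEvaluatedKey")
        if start_key is None:
            return items, consumed


items, consumed = fetch_all(list(range(7)), limit=3)
print(items, consumed)
```

Summing 'ConsumedCapacityUnits' across pages, as done here, gives the real cost of the whole read.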
API 2/3: Edit



    Method           Operation        Condition   Changes
    PutItem          create-replace   yes         1 item
    DeleteItem       delete           yes         1 item, 0-1MB
    BatchWriteItem   create-up-del    no          1-25 items
    UpdateItem       create-up-del    yes         1+ field, 1 item


      not processed / failure => ‘UnprocessedItems’
      condition failed => ‘ConditionalCheckFailed’
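
Two things follow from the BatchWriteItem row: requests must be cut into groups of at most 25 items, and whatever comes back in 'UnprocessedItems' must be sent again. A minimal sketch, where `send_batch` is a hypothetical callable standing in for the real call:

```python
MAX_BATCH = 25   # BatchWriteItem limit quoted in the table above


def chunks(items, size=MAX_BATCH):
    """Split items into lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def write_all(items, send_batch, max_rounds=5):
    """Send every item, resubmitting whatever comes back unprocessed.

    `send_batch` is a hypothetical callable taking a list of at most 25
    items and returning the list of items that were not processed.
    """
    pending = list(items)
    for _ in range(max_rounds):
        if not pending:
            return []
        unprocessed = []
        for batch in chunks(pending):
            unprocessed.extend(send_batch(batch))
        pending = unprocessed
    return pending   # still unprocessed after max_rounds


written = []


def flaky_sender(batch):
    """Fake sender that only accepts the first 20 items of each batch."""
    written.extend(batch[:20])
    return batch[20:]


left_over = write_all(list(range(60)), flaky_sender)
print(len(written), "written,", len(left_over), "left over")
```

Real code would normally wait between rounds, since unprocessed items usually mean the table is being throttled.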
API 3/3: Structure



    Method          Asynchronous   Description
    CreateTable     yes            Create table - provision units
    DeleteTable     yes            self-explanatory
    DescribeTable   no             Read size, status, throughput
    ListTables      no             Get tables starting with "..."
    UpdateTable     yes            Update provisioned throughput


      a table in the “DELETING” state might still answer requests until it is deleted
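
Since CreateTable and UpdateTable are asynchronous, callers usually poll the table status before using the table. A sketch of such a loop, where `describe_table` is a hypothetical callable returning the status string (for example the status field of a DescribeTable response):

```python
import time


def wait_until_active(describe_table, table_name, poll_seconds=1.0,
                      max_attempts=30):
    """Poll until the table reports ACTIVE; return the number of polls made.

    `describe_table` is a hypothetical callable that returns the table
    status string for `table_name` (e.g. "CREATING", "UPDATING", "ACTIVE").
    """
    for attempt in range(1, max_attempts + 1):
        if describe_table(table_name) == "ACTIVE":
            return attempt
        time.sleep(poll_seconds)
    raise RuntimeError("table %r not ACTIVE after %d polls"
                       % (table_name, max_attempts))


# Fake status source: two polls in CREATING, then ACTIVE.
statuses = iter(["CREATING", "CREATING", "ACTIVE"])
polls = wait_until_active(lambda name: next(statuses), "Players",
                          poll_seconds=0)
print("ACTIVE after", polls, "polls")
```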
TL;DR Let’s make it short :)




      Amazon
          scalable
          fully integrated
      Constraints
          throughput provisioning
          index matters
