SlideShare une entreprise Scribd logo
1  sur  40
Télécharger pour lire hors ligne
Data Migrations in the
App Engine Datastore
Ryan Morlok
Co-Founder at Docalytics
@rmorlok
https://plus.google.com/u/0/+RyanMorlok
http://www.linkedin.com/in/ryanmorlok/
This Talk is in Python
Examples Use NDB
from google.appengine.ext import ndb
!
from google.appengine.ext import db
The Datastore
• Schemaless
• Entities of the same type can have different
properties
• Most applications express an effective schema
using application code
• All queries are served by pre-built indexes
Problem
• No general framework for making mass updates
(“schema changes”) to entities in the Datastore or
helping the in-code model evolve
• Frameworks like Rails and Django (South) have
tools to help manage this and are built on top of
relational databases that allow SQL to be used to
help migrate data
• We will look at techniques for doing this on the non-
relational App Engine Datastore
Initial Data Model
from google.appengine.ext import ndb
!
!
class BlogPost(ndb.Model):
title = ndb.StringProperty(required=True)
blurb = ndb.StringProperty(required=True)
content = ndb.StringProperty(required=True, indexed=False)
published = ndb.DateTimeProperty(auto_now_add=True, required=True)
!
!
class Comment(ndb.Model):
blog_post = ndb.KeyProperty(kind=BlogPost, required=True)
content = ndb.StringProperty(required=True, indexed=False)
timestamp = ndb.DateTimeProperty(required=True)
Add Likes
• Each comment has a count of “likes”
• Displayed with each comment
• No searching/sorting/etc
• Don’t care who liked what
Just Add Property with
Default Value
class Comment(ndb.Model):
blog_post = ndb.KeyProperty(kind=BlogPost, required=True)
content = ndb.StringProperty(required=True, indexed=False)
timestamp = ndb.DateTimeProperty(required=True)
likes = ndb.IntegerProperty(default=0)
Adding a Post URL Slug
• Want to change URLs from

http://example.com/posts/1234

to

http://example.com/posts/this-is-my-post
• Need to be able to query Datastore to locate the right
post for a given slug
• All Posts need to have slugs populated before we
start using them for URL lookups
• Slugs are determined by the title of the post
Steps
Update
Code to Use
Slug &
Make
Required
Deploy Deploy
Update
Model &
Create
Batch Job
Run Batch
Job
URLs
Using
Slug
Add Slug to Model
class BlogPost(ndb.Model):
title = ndb.StringProperty(required=True)
blurb = ndb.StringProperty(required=True)
content = ndb.StringProperty(required=True, indexed=False)
published = ndb.DateTimeProperty(auto_now_add=True, required=True)
slug = ndb.StringProperty()
Create Slugs for All Posts
Deferred Tasks
BATCH_SIZE = 10
!
def add_slugs(cursor=None):
posts, next_cursor, is_there_more 
= BlogPost.query().fetch_page(BATCH_SIZE, start_cursor=cursor)
!
for p in posts:
p.slug = BlogPost.slug_from_title(p.title)
!
if len(posts) > 0:
ndb.put_multi(posts)
!
if is_there_more:
deferred.defer(add_slugs, cursor=next_cursor)
Create Slugs for All Posts
MapReduce
from mapreduce import base_handler, mapper_pipeline, operation
!
def create_slug_mapper(post):
post.slug = BlogPost.slug_from_title(post.title)
yield operation.db.Put(post)
!
class CreateSlugsPipeline(base_handler.PipelineBase):
def run(self):
yield mapper_pipeline.MapperPipeline(
job_name="create slug",
handler_spec="module.create_slug_mapper",
input_reader_spec=
"mapreduce.input_readers.DatastoreInputReader",
params={
"entity_kind": "models.BlogPost"
},
shards=16)
Create Slugs for All Posts
MapReduce
pipeline = module.CreateSlugsPipeline()
pipeline.start()
Add Number of Comments
for Post
• When displaying the preview of a blog post, show
the number of comments
• Cannot query for count for pages of multiple posts;
too slow
• Need to de-normalize the data
• Need to go through and do computation for all
existing posts
Add Comment Count to
Model
class BlogPost(ndb.Model):
title = ndb.StringProperty(required=True)
blurb = ndb.StringProperty(required=True)
content = ndb.StringProperty(required=True, indexed=False)
published = ndb.DateTimeProperty(auto_now_add=True, required=True)
slug = ndb.StringProperty(required=True)
number_of_comments = ndb.IntegerProperty(default=0)
Count Comments for All
Posts Deferred Tasks
BATCH_SIZE = 10
!
@ndb.toplevel
def count_comments(cursor=None):
posts, next_cursor, is_there_more 
= BlogPost.query().fetch_page(BATCH_SIZE, start_cursor=cursor)
!
for p in posts:
p.number_of_comments 
= Comment.query(Comment.blog_post == p.key).count(limit=1000)
p.put_async()
!
if is_there_more:
deferred.defer(count_comments, cursor=next_cursor)
Count Comments for All
Posts MapReduce
from mapreduce import base_handler, operation, 
mapreduce_pipeline
!
def count_comments_mapper(comment):
yield (comment.blog_post.urlsafe(), "")
!
!
def count_comments_reducer(keystring, values):
post = ndb.Key(urlsafe=keystring).get()
post.number_of_comments = len(values)
yield operation.db.Put(post)
Count Comments for All
Posts MapReduce
class CountCommentsPipeline(base_handler.PipelineBase):
def run(self):
yield mapreduce_pipeline.MapreducePipeline(
job_name="count comments",
mapper_spec="module.count_comments_mapper",
reducer_spec="module.count_comments_reducer",
input_reader_spec=
"mapreduce.input_readers.DatastoreInputReader",
mapper_params={
"entity_kind": "models.Comment"
},
shards=16)
Deleting Properties from
Models
• Removing a property from your model class doesn’t
change the data in the DataStore
• May want to delete data to reduce entity size, keep
entities consistent, etc
class Cat(ndb.Model):
tail = ndb.StringProperty()
wiskers = ndb.StringProperty()
skin = ndb.StringProperty()
Deleting a Property - NDB
c = Cat.get_by_id(1234)
!
if 'skin' in c._properties:
del c._properties['skin']
c.put()
• Delete the property from the _properties dictionary
• Note that del cat.skin would set the property to
None
Deleting a Property - DB
• Model objects have a _properties dictionary, but it
cannot be used to delete a properties
• Two approaches: switch model to Expando or
direct Datastore access
Deleting a Property - DB
Expando
class Cat(db.Expando):
pass
!
c = Cat.get_by_id(12345)
!
del c.skin
c.put()
Deleting a Property - DB
Expando
Restore
model base
class
Deploy Deploy
Change
model base
class to
Expando
Run Batch
Job
Done
Downsides:
• Must change base class; problematic if using custom
base class with logic
• Requires two deploys
Deleting a Property - DB
Direct Datastore Access
from google.appengine.api import datastore
from google.appengine.api import datastore_errors
!
def get_entities(keys):
rpc = datastore.GetRpcFromKwargs({})
keys, multiple = datastore.NormalizeAndTypeCheckKeys(keys)
entities = None
try:
entities = datastore.Get(keys, rpc=rpc)
except datastore_errors.EntityNotFoundError:
assert not multiple
!
return entities
!
def put_entities(entities):
rpc = datastore.GetRpcFromKwargs({})
keys = datastore.Put(entities, rpc=rpc)
return keys
Deleting a Property - DB
Direct Datastore Access
key = db.Key.from_path('Cat', 12345)
c = get_entities([key])[0]
!
if 'skin' in c:
del c['skin']
put_entities([c])
Deleting a Property - DB
Direct Datastore Access
Advantages:
• Does not require multiple deploys which can be hard to
automate across multiple environments & developers
• Does not interfere with the base class of the model
Deploy
Change
model &
create batch
job
Run Batch
Job
Done
Renaming Models
• Approach depends on your need
• If you just want to keep the code clean, alias the model in
code
• Be cautious: GQL and other textual references to the
model for queries will need to use name in Datastore
• If you want the underlying entities renamed, need to create
new entities
• Problematic if there are keys to the entity or child entities
Rename Blog Post to Article NDB
class Article(ndb.Model):
!
@classmethod
def _get_kind(cls):
return 'BlogPost'
!
title = ndb.StringProperty()
blurb = ndb.StringProperty()
content = ndb.StringProperty()
published = ndb.DateTimeProperty()
slug = ndb.StringProperty()
number_of_comments = ndb.IntegerProperty()
class Article(db.Model):
!
@classmethod
def kind(cls):
return 'BlogPost'
!
title = db.StringProperty()
blurb = db.StringProperty()
content = db.StringProperty()
published = db.DateTimeProperty()
slug = db.StringProperty()
number_of_comments = db.IntegerProperty()
Rename Blog Post to Article DB
Renaming Fields
• Approach depends on your need
• If you just want to keep the code clean, alias the
model in code
• Be cautious: GQL and other textual references to
the field for queries will need to use name in
Datastore
• If you want the underlying fields on the entities
renamed, need multi-step migration
NDB!
url_slug = ndb.StringProperty(name='slug')
!
!
DB!
url_slug = db.StringProperty(name='slug')
Alias url_slug to slug
Truly Rename Property
1. Create new model property
2. Create new @property on the Class that sets the
value to both the new and the old property, but
gets from the old property
3. Update all get/sets to use the @property. Queries
still use the old property
4. Create Batch migration to copy values from old to
new property
Truly Rename Property (2)
5. Deploy the code & run the migration
6. Update code, remove @property and old model
property, everything uses new property, including
queries
7. Deploy
8. (optional) Create batch job to delete old property
On-the-Fly Migrations
• Create your own base model class that derives
from ndb.Model or db.Model
• Have a class-level property that defines the current
version of that model
• Store the schema version for each specific entity
• When an entity loads, compare its schema version
to the latest schema version. Run migrations as
needed
On-the-Fly Migrations
Advantages:!
• Allows for more aggressive
code roll out
• Let’s you be running a batch
migration in the background
while live code expects the
migrated schema
• Only migrate entities that
get used; don’t need to
mess with historical data
Disadvantages:!
• Cannot use it for
migrations that will need
to support queries
• Not all entities in same
schema version unless
combined with batch job
• Potential for
performance issues
Code Structure
• Docalytics has a specific format
for how we lay out migration
logic
• Module for each entity, with
each migration in its own
module
• Well-known names for migration
functions that are called from
the on-the-fly and batch logic
• Allows us to write the batch
logic once rather than specific
to the migration
migrations
BlogPost
01.py
02.py
Comment
01.py
02.py
03.py
Final Thoughts
• Pay attention to indexes
• Make sure added/renamed fields are added
index.yaml
• Delete old Indexes that aren’t needed
• Watch out for caching as you migrate data
• Think about the processes for how you manage
migrations both as code rolls to deployed
environments and to other developer workstations
Questions?
https://github.com/Docalytics/app-engine-datastore-migrations

Contenu connexe

Tendances

Google App Engine With Java And Groovy
Google App Engine With Java And GroovyGoogle App Engine With Java And Groovy
Google App Engine With Java And GroovyKen Kousen
 
Create responsive websites with Django, REST and AngularJS
Create responsive websites with Django, REST and AngularJSCreate responsive websites with Django, REST and AngularJS
Create responsive websites with Django, REST and AngularJSHannes Hapke
 
Top100summit 谷歌-scott-improve your automated web application testing
Top100summit  谷歌-scott-improve your automated web application testingTop100summit  谷歌-scott-improve your automated web application testing
Top100summit 谷歌-scott-improve your automated web application testingdrewz lin
 
Aligning Ember.js with Web Standards
Aligning Ember.js with Web StandardsAligning Ember.js with Web Standards
Aligning Ember.js with Web StandardsMatthew Beale
 
The Django Web Application Framework 2
The Django Web Application Framework 2The Django Web Application Framework 2
The Django Web Application Framework 2fishwarter
 
06 integrate elasticsearch
06 integrate elasticsearch06 integrate elasticsearch
06 integrate elasticsearchErhwen Kuo
 
Implement react pagination with react hooks and react paginate
Implement react pagination with react hooks and react paginateImplement react pagination with react hooks and react paginate
Implement react pagination with react hooks and react paginateKaty Slemon
 
Django Introduction & Tutorial
Django Introduction & TutorialDjango Introduction & Tutorial
Django Introduction & Tutorial之宇 趙
 
Intoduction to Play Framework
Intoduction to Play FrameworkIntoduction to Play Framework
Intoduction to Play FrameworkKnoldus Inc.
 
04 integrate entityframework
04 integrate entityframework04 integrate entityframework
04 integrate entityframeworkErhwen Kuo
 
Django for Beginners
Django for BeginnersDjango for Beginners
Django for BeginnersJason Davies
 
Apache spark with akka couchbase code by bhawani
Apache spark with akka couchbase code by bhawaniApache spark with akka couchbase code by bhawani
Apache spark with akka couchbase code by bhawaniBhawani N Prasad
 
Web development with django - Basics Presentation
Web development with django - Basics PresentationWeb development with django - Basics Presentation
Web development with django - Basics PresentationShrinath Shenoy
 
Intro to Web Development Using Python and Django
Intro to Web Development Using Python and DjangoIntro to Web Development Using Python and Django
Intro to Web Development Using Python and DjangoChariza Pladin
 
03 integrate webapisignalr
03 integrate webapisignalr03 integrate webapisignalr
03 integrate webapisignalrErhwen Kuo
 
Google Cloud Endpoints: Building Third-Party APIs on Google AppEngine
Google Cloud Endpoints: Building Third-Party APIs on Google AppEngineGoogle Cloud Endpoints: Building Third-Party APIs on Google AppEngine
Google Cloud Endpoints: Building Third-Party APIs on Google AppEngineRoman Kirillov
 
Django app deployment in Azure By Saurabh Agarwal
Django app deployment in Azure By Saurabh AgarwalDjango app deployment in Azure By Saurabh Agarwal
Django app deployment in Azure By Saurabh Agarwalratneshsinghparihar
 

Tendances (20)

Google App Engine With Java And Groovy
Google App Engine With Java And GroovyGoogle App Engine With Java And Groovy
Google App Engine With Java And Groovy
 
Create responsive websites with Django, REST and AngularJS
Create responsive websites with Django, REST and AngularJSCreate responsive websites with Django, REST and AngularJS
Create responsive websites with Django, REST and AngularJS
 
Top100summit 谷歌-scott-improve your automated web application testing
Top100summit  谷歌-scott-improve your automated web application testingTop100summit  谷歌-scott-improve your automated web application testing
Top100summit 谷歌-scott-improve your automated web application testing
 
Aligning Ember.js with Web Standards
Aligning Ember.js with Web StandardsAligning Ember.js with Web Standards
Aligning Ember.js with Web Standards
 
Django Girls Tutorial
Django Girls TutorialDjango Girls Tutorial
Django Girls Tutorial
 
The Django Web Application Framework 2
The Django Web Application Framework 2The Django Web Application Framework 2
The Django Web Application Framework 2
 
2012: ql.io and Node.js
2012: ql.io and Node.js2012: ql.io and Node.js
2012: ql.io and Node.js
 
06 integrate elasticsearch
06 integrate elasticsearch06 integrate elasticsearch
06 integrate elasticsearch
 
Implement react pagination with react hooks and react paginate
Implement react pagination with react hooks and react paginateImplement react pagination with react hooks and react paginate
Implement react pagination with react hooks and react paginate
 
Django Introduction & Tutorial
Django Introduction & TutorialDjango Introduction & Tutorial
Django Introduction & Tutorial
 
Intoduction to Play Framework
Intoduction to Play FrameworkIntoduction to Play Framework
Intoduction to Play Framework
 
04 integrate entityframework
04 integrate entityframework04 integrate entityframework
04 integrate entityframework
 
Django for Beginners
Django for BeginnersDjango for Beginners
Django for Beginners
 
Apache spark with akka couchbase code by bhawani
Apache spark with akka couchbase code by bhawaniApache spark with akka couchbase code by bhawani
Apache spark with akka couchbase code by bhawani
 
Web development with django - Basics Presentation
Web development with django - Basics PresentationWeb development with django - Basics Presentation
Web development with django - Basics Presentation
 
Intro to Web Development Using Python and Django
Intro to Web Development Using Python and DjangoIntro to Web Development Using Python and Django
Intro to Web Development Using Python and Django
 
03 integrate webapisignalr
03 integrate webapisignalr03 integrate webapisignalr
03 integrate webapisignalr
 
Google Ajax APIs
Google Ajax APIsGoogle Ajax APIs
Google Ajax APIs
 
Google Cloud Endpoints: Building Third-Party APIs on Google AppEngine
Google Cloud Endpoints: Building Third-Party APIs on Google AppEngineGoogle Cloud Endpoints: Building Third-Party APIs on Google AppEngine
Google Cloud Endpoints: Building Third-Party APIs on Google AppEngine
 
Django app deployment in Azure By Saurabh Agarwal
Django app deployment in Azure By Saurabh AgarwalDjango app deployment in Azure By Saurabh Agarwal
Django app deployment in Azure By Saurabh Agarwal
 

En vedette (6)

2011 july-gtug-high-replication-datastore
2011 july-gtug-high-replication-datastore2011 july-gtug-high-replication-datastore
2011 july-gtug-high-replication-datastore
 
Webapp2 2.2
Webapp2 2.2Webapp2 2.2
Webapp2 2.2
 
Google cloud datastore driver for Google Apps Script DB abstraction
Google cloud datastore driver for Google Apps Script DB abstractionGoogle cloud datastore driver for Google Apps Script DB abstraction
Google cloud datastore driver for Google Apps Script DB abstraction
 
google Bigtable
google Bigtablegoogle Bigtable
google Bigtable
 
Google BigTable
Google BigTableGoogle BigTable
Google BigTable
 
The Google Bigtable
The Google BigtableThe Google Bigtable
The Google Bigtable
 

Similaire à Data Migrations in the App Engine Datastore

dotNet Miami - June 21, 2012: Richie Rump: Entity Framework: Code First and M...
dotNet Miami - June 21, 2012: Richie Rump: Entity Framework: Code First and M...dotNet Miami - June 21, 2012: Richie Rump: Entity Framework: Code First and M...
dotNet Miami - June 21, 2012: Richie Rump: Entity Framework: Code First and M...dotNet Miami
 
Entity Framework: Code First and Magic Unicorns
Entity Framework: Code First and Magic UnicornsEntity Framework: Code First and Magic Unicorns
Entity Framework: Code First and Magic UnicornsRichie Rump
 
Adding a modern twist to legacy web applications
Adding a modern twist to legacy web applicationsAdding a modern twist to legacy web applications
Adding a modern twist to legacy web applicationsJeff Durta
 
Angular or Backbone: Go Mobile!
Angular or Backbone: Go Mobile!Angular or Backbone: Go Mobile!
Angular or Backbone: Go Mobile!Doris Chen
 
Python Programming for ArcGIS: Part II
Python Programming for ArcGIS: Part IIPython Programming for ArcGIS: Part II
Python Programming for ArcGIS: Part IIDUSPviz
 
GDG Addis - An Introduction to Django and App Engine
GDG Addis - An Introduction to Django and App EngineGDG Addis - An Introduction to Django and App Engine
GDG Addis - An Introduction to Django and App EngineYared Ayalew
 
Drupal Best Practices
Drupal Best PracticesDrupal Best Practices
Drupal Best Practicesmanugoel2003
 
Django class based views (Dutch Django meeting presentation)
Django class based views (Dutch Django meeting presentation)Django class based views (Dutch Django meeting presentation)
Django class based views (Dutch Django meeting presentation)Reinout van Rees
 
The Django Book - Chapter 5: Models
The Django Book - Chapter 5: ModelsThe Django Book - Chapter 5: Models
The Django Book - Chapter 5: ModelsSharon Chen
 
Go faster with_native_compilation Part-2
Go faster with_native_compilation Part-2Go faster with_native_compilation Part-2
Go faster with_native_compilation Part-2Rajeev Rastogi (KRR)
 
Go Faster With Native Compilation
Go Faster With Native CompilationGo Faster With Native Compilation
Go Faster With Native CompilationPGConf APAC
 
Introduction Django
Introduction DjangoIntroduction Django
Introduction DjangoWade Austin
 

Similaire à Data Migrations in the App Engine Datastore (20)

Django Pro ORM
Django Pro ORMDjango Pro ORM
Django Pro ORM
 
dotNet Miami - June 21, 2012: Richie Rump: Entity Framework: Code First and M...
dotNet Miami - June 21, 2012: Richie Rump: Entity Framework: Code First and M...dotNet Miami - June 21, 2012: Richie Rump: Entity Framework: Code First and M...
dotNet Miami - June 21, 2012: Richie Rump: Entity Framework: Code First and M...
 
Entity Framework: Code First and Magic Unicorns
Entity Framework: Code First and Magic UnicornsEntity Framework: Code First and Magic Unicorns
Entity Framework: Code First and Magic Unicorns
 
Data herding
Data herdingData herding
Data herding
 
Data herding
Data herdingData herding
Data herding
 
Java scriptforjavadev part2a
Java scriptforjavadev part2aJava scriptforjavadev part2a
Java scriptforjavadev part2a
 
JS Essence
JS EssenceJS Essence
JS Essence
 
Adding a modern twist to legacy web applications
Adding a modern twist to legacy web applicationsAdding a modern twist to legacy web applications
Adding a modern twist to legacy web applications
 
Angular or Backbone: Go Mobile!
Angular or Backbone: Go Mobile!Angular or Backbone: Go Mobile!
Angular or Backbone: Go Mobile!
 
Group111
Group111Group111
Group111
 
Python Programming for ArcGIS: Part II
Python Programming for ArcGIS: Part IIPython Programming for ArcGIS: Part II
Python Programming for ArcGIS: Part II
 
Go Faster With Native Compilation
Go Faster With Native CompilationGo Faster With Native Compilation
Go Faster With Native Compilation
 
GDG Addis - An Introduction to Django and App Engine
GDG Addis - An Introduction to Django and App EngineGDG Addis - An Introduction to Django and App Engine
GDG Addis - An Introduction to Django and App Engine
 
Drupal Best Practices
Drupal Best PracticesDrupal Best Practices
Drupal Best Practices
 
Django class based views (Dutch Django meeting presentation)
Django class based views (Dutch Django meeting presentation)Django class based views (Dutch Django meeting presentation)
Django class based views (Dutch Django meeting presentation)
 
The Django Book - Chapter 5: Models
The Django Book - Chapter 5: ModelsThe Django Book - Chapter 5: Models
The Django Book - Chapter 5: Models
 
Go faster with_native_compilation Part-2
Go faster with_native_compilation Part-2Go faster with_native_compilation Part-2
Go faster with_native_compilation Part-2
 
Go Faster With Native Compilation
Go Faster With Native CompilationGo Faster With Native Compilation
Go Faster With Native Compilation
 
Django tricks (2)
Django tricks (2)Django tricks (2)
Django tricks (2)
 
Introduction Django
Introduction DjangoIntroduction Django
Introduction Django
 

Dernier

Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...OnePlan Solutions
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfkalichargn70th171
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identityteam-WIBU
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 

Dernier (20)

Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identity
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 

Data Migrations in the App Engine Datastore

  • 1. Data Migrations in the App Engine Datastore
  • 2. Ryan Morlok Co-Founder at Docalytics @rmorlok https://plus.google.com/u/0/+RyanMorlok http://www.linkedin.com/in/ryanmorlok/
  • 3. This Talk is in Python
  • 4. Examples Use NDB from google.appengine.ext import ndb ! from google.appengine.ext import db
  • 5. The Datastore • Schemaless • Entities of the same type can have different properties • Most applications express an effective schema using application code • All queries are served by pre-built indexes
  • 6. Problem • No general framework for making mass updates (“schema changes”) to entities in the Datastore or helping the in-code model evolve • Frameworks like Rails and Django (South) have tools to help manage this and are built on top of relational databases that allow SQL to be used to help migrate data • We will look at techniques for doing this on the non- relational App Engine Datastore
  • 7. Initial Data Model from google.appengine.ext import ndb ! ! class BlogPost(ndb.Model): title = ndb.StringProperty(required=True) blurb = ndb.StringProperty(required=True) content = ndb.StringProperty(required=True, indexed=False) published = ndb.DateTimeProperty(auto_now_add=True, required=True) ! ! class Comment(ndb.Model): blog_post = ndb.KeyProperty(kind=BlogPost, required=True) content = ndb.StringProperty(required=True, indexed=False) timestamp = ndb.DateTimeProperty(required=True)
  • 8. Add Likes • Each comment has a count of “likes” • Displayed with each comment • No searching/sorting/etc • Don’t care who liked what
  • 9. Just Add Property with Default Value class Comment(ndb.Model): blog_post = ndb.KeyProperty(kind=BlogPost, required=True) content = ndb.StringProperty(required=True, indexed=False) timestamp = ndb.DateTimeProperty(required=True) likes = ndb.IntegerProperty(default=0)
  • 10. Adding a Post URL Slug • Want to change URLs from
 http://example.com/posts/1234
 to
 http://example.com/posts/this-is-my-post • Need to be able to query Datastore to locate the right post for a given slug • All Posts need to have slugs populated before we start using them for URL lookups • Slugs are determined by the title of the post
  • 11. Steps Update Code to Use Slug & Make Required Deploy Deploy Update Model & Create Batch Job Run Batch Job URLs Using Slug
  • 12. Add Slug to Model class BlogPost(ndb.Model): title = ndb.StringProperty(required=True) blurb = ndb.StringProperty(required=True) content = ndb.StringProperty(required=True, indexed=False) published = ndb.DateTimeProperty(auto_now_add=True, required=True) slug = ndb.StringProperty()
  • 13. Create Slugs for All Posts Deferred Tasks BATCH_SIZE = 10 ! def add_slugs(cursor=None): posts, next_cursor, is_there_more = BlogPost.query().fetch_page(BATCH_SIZE, start_cursor=cursor) ! for p in posts: p.slug = BlogPost.slug_from_title(p.title) ! if len(posts) > 0: ndb.put_multi(posts) ! if is_there_more: deferred.defer(add_slugs, cursor=next_cursor)
  • 14. Create Slugs for All Posts MapReduce from mapreduce import base_handler, mapper_pipeline, operation ! def create_slug_mapper(post): post.slug = BlogPost.slug_from_title(post.title) yield operation.db.Put(post) ! class CreateSlugsPipeline(base_handler.PipelineBase): def run(self): yield mapper_pipeline.MapperPipeline( job_name="create slug", handler_spec="module.create_slug_mapper", input_reader_spec= "mapreduce.input_readers.DatastoreInputReader", params={ "entity_kind": "models.BlogPost" }, shards=16)
  • 15. Create Slugs for All Posts MapReduce pipeline = module.CreateSlugsPipeline() pipeline.start()
  • 16. Add Number of Comments for Post • When displaying the preview of a blog post, show the number of comments • Cannot query for count for pages of multiple posts; too slow • Need to de-normalize the data • Need to go through and do computation for all existing posts
  • 17. Add Comment Count to Model class BlogPost(ndb.Model): title = ndb.StringProperty(required=True) blurb = ndb.StringProperty(required=True) content = ndb.StringProperty(required=True, indexed=False) published = ndb.DateTimeProperty(auto_now_add=True, required=True) slug = ndb.StringProperty(required=True) number_of_comments = ndb.IntegerProperty(default=0)
  • 18. Count Comments for All Posts Deferred Tasks BATCH_SIZE = 10 ! @ndb.toplevel def count_comments(cursor=None): posts, next_cursor, is_there_more = BlogPost.query().fetch_page(BATCH_SIZE, start_cursor=cursor) ! for p in posts: p.number_of_comments = Comment.query(Comment.blog_post == p.key).count(limit=1000) p.put_async() ! if is_there_more: deferred.defer(count_comments, cursor=next_cursor)
  • 19. Count Comments for All Posts MapReduce from mapreduce import base_handler, operation, mapreduce_pipeline ! def count_comments_mapper(comment): yield (comment.blog_post.urlsafe(), "") ! ! def count_comments_reducer(keystring, values): post = ndb.Key(urlsafe=keystring).get() post.number_of_comments = len(values) yield operation.db.Put(post)
  • 20. Count Comments for All Posts MapReduce class CountCommentsPipeline(base_handler.PipelineBase): def run(self): yield mapreduce_pipeline.MapreducePipeline( job_name="count comments", mapper_spec="module.count_comments_mapper", reducer_spec="module.count_comments_reducer", input_reader_spec= "mapreduce.input_readers.DatastoreInputReader", mapper_params={ "entity_kind": "models.Comment" }, shards=16)
  • 21. Deleting Properties from Models • Removing a property from your model class doesn’t change the data in the DataStore • May want to delete data to reduce entity size, keep entities consistent, etc class Cat(ndb.Model): tail = ndb.StringProperty() wiskers = ndb.StringProperty() skin = ndb.StringProperty()
  • 22. Deleting a Property - NDB c = Cat.get_by_id(1234) ! if 'skin' in c._properties: del c._properties['skin'] c.put() • Delete the property from the _properties dictionary • Note that del cat.skin would set the property to None
  • 23. Deleting a Property - DB • Model objects have a _properties dictionary, but it cannot be used to delete a properties • Two approaches: switch model to Expando or direct Datastore access
  • 24. Deleting a Property - DB Expando class Cat(db.Expando): pass ! c = Cat.get_by_id(12345) ! del c.skin c.put()
  • 25. Deleting a Property - DB Expando Restore model base class Deploy Deploy Change model base class to Expando Run Batch Job Done Downsides: • Must change base class; problematic if using custom base class with logic • Requires two deploys
  • 26. Deleting a Property - DB Direct Datastore Access from google.appengine.api import datastore from google.appengine.api import datastore_errors ! def get_entities(keys): rpc = datastore.GetRpcFromKwargs({}) keys, multiple = datastore.NormalizeAndTypeCheckKeys(keys) entities = None try: entities = datastore.Get(keys, rpc=rpc) except datastore_errors.EntityNotFoundError: assert not multiple ! return entities ! def put_entities(entities): rpc = datastore.GetRpcFromKwargs({}) keys = datastore.Put(entities, rpc=rpc) return keys
  • 27. Deleting a Property - DB Direct Datastore Access key = db.Key.from_path('Cat', 12345) c = get_entities([key])[0] ! if 'skin' in c: del c['skin'] put_entities([c])
  • 28. Deleting a Property - DB Direct Datastore Access Advantages: • Does not require multiple deploys which can be hard to automate across multiple environments & developers • Does not interfere with the base class of the model Deploy Change model & create batch job Run Batch Job Done
  • 29. Renaming Models • Approach depends on your need • If you just want to keep the code clean, alias the model in code • Be cautious: GQL and other textual references to the model for queries will need to use name in Datastore • If you want the underlying entities renamed, need to create new entities • Problematic if there are keys to the entity or child entities
  • 30. Rename Blog Post to Article NDB class Article(ndb.Model): ! @classmethod def _get_kind(cls): return 'BlogPost' ! title = ndb.StringProperty() blurb = ndb.StringProperty() content = ndb.StringProperty() published = ndb.DateTimeProperty() slug = ndb.StringProperty() number_of_comments = ndb.IntegerProperty()
  • 31. class Article(db.Model): ! @classmethod def kind(cls): return 'BlogPost' ! title = db.StringProperty() blurb = db.StringProperty() content = db.StringProperty() published = db.DateTimeProperty() slug = db.StringProperty() number_of_comments = db.IntegerProperty() Rename Blog Post to Article DB
  • 32. Renaming Fields • Approach depends on your need • If you just want to keep the code clean, alias the model in code • Be cautious: GQL and other textual references to the field for queries will need to use name in Datastore • If you want the underlying fields on the entities renamed, need multi-step migration
  • 33. NDB! url_slug = ndb.StringProperty(name='slug') ! ! DB! url_slug = db.StringProperty(name='slug') Alias url_slug to slug
  • 34. Truly Rename Property 1. Create new model property 2. Create new @property on the Class that sets the value to both the new and the old property, but gets from the old property 3. Update all get/sets to use the @property. Queries still use the old property 4. Create Batch migration to copy values from old to new property
  • 35. Truly Rename Property (2) 5. Deploy the code & run the migration 6. Update code, remove @property and old model property, everything uses new property, including queries 7. Deploy 8. (optional) Create batch job to delete old property
  • 36. On-the-Fly Migrations • Create your own base model class that derives from ndb.Model or db.Model • Have a class-level property that defines the current version of that model • Store the schema version for each specific entity • When an entity loads, compare its schema version to the latest schema version. Run migrations as needed
  • 37. On-the-Fly Migrations Advantages:! • Allows for more aggressive code roll out • Let’s you be running a batch migration in the background while live code expects the migrated schema • Only migrate entities that get used; don’t need to mess with historical data Disadvantages:! • Cannot use it for migrations that will need to support queries • Not all entities in same schema version unless combined with batch job • Potential for performance issues
  • 38. Code Structure • Docalytics has a specific format for how we lay out migration logic • Module for each entity, with each migration in its own module • Well-known names for migration functions that are called from the on-the-fly and batch logic • Allows us to write the batch logic once rather than specific to the migration migrations BlogPost 01.py 02.py Comment 01.py 02.py 03.py
  • 39. Final Thoughts • Pay attention to indexes • Make sure added/renamed fields are added index.yaml • Delete old Indexes that aren’t needed • Watch out for caching as you migrate data • Think about the processes for how you manage migrations both as code rolls to deployed environments and to other developer workstations