This slide deck accompanied a talk I gave at Boston's Google Cloud Meetup group in June of 2016. It chronicles our story of building out the Meta Search product using Google Cloud Platform, particularly App Engine, and finishes with a short walkthrough of a demo application.
7. Talk Roadmap
• What problems we face at Meta
• How we are solving them using GCP
• How you can get started on GCP
8. Building a product
• No baggage, free to choose whatever stack we want
• Take advantage of latest technologies
• but not quite bleeding edge
9. Engineering Goals
• This will be a complex product, it needs to be
comprehensible to everyone on our team
• Keep the team as lean as possible
• Focus on product, not sysadmin and dev ops
10. Language Choices
• Go chosen as our primary language
• Python for NLP and data analysis
• enables easy experimentation, comfortable for data
scientists and developers
• Java/Scala interacting with Dataflow, Apache Tika, etc.
11. Our Hard Problems
• User onboarding load
• Heterogeneous (changing) data sources
• Unpredictable traffic from web hooks
• Compute loads for file content analysis
• Processing streaming data
12. User Onboarding
• Crawl multiple cloud
accounts at once
• Parallel computation
• In-process using Go
• Distributed using tasks
• App Engine
Taskqueues
13. Heterogeneous Data
• Remove complexity of
third-party services
• Detect changes/
breakages in APIs
• Distributed by nature
• Continuous Deployment
• Datastore
• BigQuery
14. Unpredictable Traffic
• Changes are pushed to
us through web hooks
• Dropping changes
generally unacceptable
• One user should not
negatively impact others
• App Engine autoscaling
• Asynchronous task
queues
16. Stream Processing
• Efficient handling of
high-volume changes
• Collate events in
succession, from
multiple users
• Google Cloud Pub/Sub
• Google Cloud Dataflow
17. How we started off
• App Engine is our entry point
• Service Oriented Architecture
• Currently ~37 different services
• Cloud Datastore is our persistence layer
• BigQuery as a data warehouse
18. Documentation
• Lots of information for getting started
• Quality resources for our growing team
• Onboarding new developers without GCP
experience has been a breeze
• Google is devoting lots of resources to this area
19. App Engine
• Don’t worry about servers
• Cache, task queues, cron, database, logging,
monitoring, and more all built in
• Powerful, configurable autoscaling
• Heavy compute on App Engine Flexible Runtimes
20. Development Process
• Build, run, and test services locally
• Continuous deployment to a development project
• Incremental releases go to production project
• Logging and monitoring easy to setup
21. Problems we faced
• Mantra of “don’t worry about scalability”didn’t take us
very far
• Users have lots and lots of files
• Datastore use optimizations
• Cost issues with App Engine
• Trimming auto-scaling parameters
• Migrated heavy compute to Flexible Runtimes
22. Outside GCP
• Algolia
• Hosts infrastructure for our search indices
• Pusher
• realtime socket connections
• Postmark/Mailchimp
• transactional and campaign-based email
23. Growth of the platform
• Rapid changes and improvements taking place
• Flexible Runtimes
• Container Engine
• Dataflow
• Investing in a documentation overhaul soon
• Support is generally quite responsive
24. Recent Developments
• Introduction of Pub/Sub to our system for all event
processing
• Experimenting with Kubernetes/Container Engine
• Dataflow stream processing jobs
• Splitting functionality into multiple projects
31. Datastoretype Greeting struct {
Author string
Content string
Date time.Time
}
// guestbookKey returns the key used for all guestbook entries.
func guestbookKey(c appengine.Context) *datastore.Key {
// The string "default_guestbook" here could be varied to have multiple guestbooks.
return datastore.NewKey(c, "Guestbook", "default_guestbook", 0, nil)
}
func root(w http.ResponseWriter, r *http.Request) {
c := appengine.NewContext(r)
// Ancestor queries, as shown here, are strongly consistent with the High
// Replication Datastore. Queries that span entity groups are eventually
// consistent. If we omitted the .Ancestor from this query there would be
// a slight chance that Greeting that had just been written would not
// show up in a query.
q := datastore.NewQuery("Greeting").Ancestor(guestbookKey(c)).Order("-Date").Limit(10)
greetings := make([]Greeting, 0, 10)
if _, err := q.GetAll(c, &greetings); err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
if err := guestbookTemplate.Execute(w, greetings); err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
}
}
33. Forms
func sign(w http.ResponseWriter, r *http.Request) {
c := appengine.NewContext(r)
g := Greeting{
Content: r.FormValue("content"),
Date: time.Now(),
}
if u := user.Current(c); u != nil {
g.Author = u.String()
}
// We set the same parent key on every Greeting entity to ensure each Greeting
// is in the same entity group. Queries across the single entity group
// will be consistent. However, the write rate to a single entity group
// should be limited to ~1/second.
key := datastore.NewIncompleteKey(c, "Greeting", guestbookKey(c))
_, err := datastore.Put(c, key, &g)
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
http.Redirect(w, r, "/", http.StatusFound)
}
34. Conclusions
• Google Cloud Platform has allowed us to build out
Meta in ways that wouldn’t otherwise be feasible
• Simplicity of App Engine allows us to focus on product
• Scalability/Availability are built in to the platform
35. access any file in seconds,
wherever it is.
www.meta.sc/careers
careers@meta.sc