Inside Wordnik's Architecture

          Tony Tam
          @fehguy
Who is Wordnik?

• Founded in 2008 by Erin McKean
• "Understand meaning of words
  automatically"
• Patented "Free-Range Definition"
  technology
• Constructed largest (known) English Word
  Graph
               We do Discovery
It's all about Data!
Data?

• Word Graph is built by data
• Runtime answers needed fast

80 S reads!
50M+ Nodes!
80M+ Edges!
What we do with Data

• Update the Graph constantly
• Augment our NLP pipeline
• "Reality-based Annotation" with
  current, real-world data
Language is NOT static
Next??? Twitter? Tumblr? Wordpress
Is a 20-year-old corpus good enough?
How we do it

• Amazon EC2-based deployment
• Efficiency through constraint-based
  architecture
  •   Small is Big!
• Horizontal scaling by adding servers!
  •   Yea, we can always go vertical
• Blah, blah, more details!
Micro Services

• Services are stand-alone building blocks
• Increase capacity through a "more like this"
  button
Micro Services

• Big application => micro services


Monolithic application

"Isn't this just SOA?"
Not PO-SOA
• This is different
  •   No proprietary message bus
  •   Decoupled objects
  •   Dedicated storage***
• Speak REST
  •   Develop your services in…
      •   Java
      •   Scala
      •   Ruby
      •   PHP
Speak REST?

• Sounds good but…
 •   REST semantics vary wildly
 •   HATEOAS vs. practical REST?
/api/pet.json/1?delete (GET)
/api/pet.json/1 (DELETE)
/api/pet.json/1 (POST empty)
All valid!


So…
Peer Review!
Better Docs!
API Council!
API Styleguide!
SOA makes new Challenges
• It's communication (not easy)
• Need a consumer & provider contract
• Driving force to create Swagger
What is Swagger?

• Swagger is…
  •   Spec for declaring and documenting an API
  •   A framework for auto-generating the spec
  •   A library for client library generation
  •   A JSON-based test framework
• It's open source!
  •   http://swagger.wordnik.com
How?

• Swagger Codegen
  •   Creates a client based on your Swagger Spec
scala src/main/scala/Codegen.scala ${swagger-spec-url}

Scala, Ruby
In the Wordnik Workflow
• Jenkins will…
 •   Build a service library
 •   Build a stand-alone application distro
 •   Build an installable image (RPM)
 •   Build a compatible client library
• Consumers will…
 •   Declare dependency on a service version
 •   Use a client for that version
 •   Be given a list of compatible services, by
     cluster, version
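
One way to picture the consumer side of this workflow: a consumer declares the service version it was built against and is handed the compatible instances in its cluster. The sketch below is only an illustration. The registry entries, host addresses, and the compatibility rule (same major version, provider at least as new as the declared version) are all assumptions, not a description of Wordnik's actual mechanism.

```javascript
// Hypothetical registry of running services; every entry is made up.
const registry = [
  { service: "word", cluster: "prod", version: "1.2.0", host: "10.0.0.1:8001" },
  { service: "word", cluster: "prod", version: "1.3.1", host: "10.0.0.2:8001" },
  { service: "word", cluster: "prod", version: "2.0.0", host: "10.0.0.3:8001" },
  { service: "word", cluster: "staging", version: "1.3.1", host: "10.0.1.1:8001" },
];

// "1.3.1" -> [1, 3, 1]
function parseVersion(v) {
  return v.split(".").map(Number);
}

// Assumed rule: same major version, and the offered version is not older
// than the version the consumer declared.
function isCompatible(declared, offered) {
  const [dMajor, dMinor, dPatch] = parseVersion(declared);
  const [oMajor, oMinor, oPatch] = parseVersion(offered);
  if (dMajor !== oMajor) return false;
  if (oMinor !== dMinor) return oMinor > dMinor;
  return oPatch >= dPatch;
}

// The list a consumer would be given, filtered by cluster and version.
function compatibleServices(entries, service, cluster, declaredVersion) {
  return entries.filter(
    (e) =>
      e.service === service &&
      e.cluster === cluster &&
      isCompatible(declaredVersion, e.version)
  );
}

const hosts = compatibleServices(registry, "word", "prod", "1.2.0").map((e) => e.host);
console.log(hosts);
```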
Back to Data

• Micro services have small(ish) databases
 •   Share nothing across services
 •   YES To replica sets
• Deployed to ephemeral storage
 •   (more in a bit)
 •   Small by design
• How to keep them small?
Keeping Databases Small

• Some easy tricks
 •   Schema-less => "schema per document"
 •   Keep field names short!
db.foo.save({user_name:"Tony"})
db.foo.save({un:"Tony"})
Repeat 10e9 times!
• Indexes
 •   They can get *huge*
 •   Make _id matter!
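
A rough sketch of why the field name matters: BSON stores every field name inside every document, so the extra characters are paid for on each save. The helper below is hypothetical and only sizes a document with a single string field. It ignores the _id that save() adds, as well as storage padding and indexes.

```javascript
// Approximate BSON size of a document shaped like {<fieldName>: "<value>"}.
// Layout: int32 document length (4) + element type byte (1)
//         + field name + NUL (1) + int32 string length (4)
//         + string bytes + NUL (1) + document terminator (1).
function bsonSizeOfSingleStringDoc(fieldName, value) {
  const nameBytes = Buffer.byteLength(fieldName, "utf8");
  const valueBytes = Buffer.byteLength(value, "utf8");
  return 4 + 1 + nameBytes + 1 + 4 + valueBytes + 1 + 1;
}

const longDoc = bsonSizeOfSingleStringDoc("user_name", "Tony");
const shortDoc = bsonSizeOfSingleStringDoc("un", "Tony");
const savedPerDoc = longDoc - shortDoc;

// "Repeat 10e9 times" from the slide.
const docCount = 10e9;
const savedBytes = savedPerDoc * docCount;

console.log({ longDoc, shortDoc, savedPerDoc, savedGB: savedBytes / 1e9 });
```

Under these assumptions the short name saves 7 bytes per document, which is about 70 GB over 10e9 documents.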
Keeping Databases Small
• Don't make _id just an "auto increment"
 You're stuck with it! Be smart
 •   User collection? Try _id: username
 •   Email collection? Try _id: email
 •   Date-driven collection? How about _id: "20120502"
     •   db.logins.find({_id:/^201205/})

Be lazy until you can't anymore!
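
The date-prefix query works because string _ids sort lexicographically, so an anchored prefix such as /^201205/ covers one contiguous range of the index. A minimal sketch of that idea, using a sorted in-memory array as a stand-in for the _id index (the ids and helper names are made up for illustration):

```javascript
// Sorted keys standing in for the _id index of a date-driven collection.
const ids = ["20120429", "20120430", "20120501", "20120502", "20120531", "20120601"];

// Binary search: first position whose key is >= target.
function lowerBound(sortedKeys, target) {
  let lo = 0;
  let hi = sortedKeys.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sortedKeys[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Seek to the start of the prefix range, then scan while the prefix matches.
function findByPrefix(sortedKeys, prefix) {
  const out = [];
  for (
    let i = lowerBound(sortedKeys, prefix);
    i < sortedKeys.length && sortedKeys[i].startsWith(prefix);
    i++
  ) {
    out.push(sortedKeys[i]);
  }
  return out;
}

const may2012 = findByPrefix(ids, "201205");
console.log(may2012);
```

Only the matching range is touched, which is the point of putting the date at the front of the _id.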
Keeping Databases Small

• DAO or die!
 •   Fancy index scheme => control access to collections

NO!!!!
Yes
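
A minimal sketch of the DAO idea: if the _id scheme is "fancy", keep it in one place so callers never build ids or touch the collection directly. Everything here is hypothetical, including the LoginDao name, the "YYYYMMDD:username" id format, and the in-memory Map that stands in for a real collection.

```javascript
// Hypothetical DAO: the only code that knows how login _ids are built.
class LoginDao {
  constructor() {
    // In-memory stand-in for a database collection.
    this.collection = new Map();
  }

  // Assumed _id scheme: "YYYYMMDD:username" (UTC date).
  static idFor(date, username) {
    const y = date.getUTCFullYear();
    const m = String(date.getUTCMonth() + 1).padStart(2, "0");
    const d = String(date.getUTCDate()).padStart(2, "0");
    return `${y}${m}${d}:${username}`;
  }

  recordLogin(username, date) {
    const _id = LoginDao.idFor(date, username);
    this.collection.set(_id, { _id, un: username });
    return _id;
  }

  // Callers ask for a month; the prefix logic stays inside the DAO.
  loginsInMonth(year, month) {
    const prefix = `${year}${String(month).padStart(2, "0")}`;
    return [...this.collection.values()].filter((doc) => doc._id.startsWith(prefix));
  }
}

const dao = new LoginDao();
dao.recordLogin("tony", new Date(Date.UTC(2012, 4, 2))); // May 2, 2012
dao.recordLogin("erin", new Date(Date.UTC(2012, 5, 1))); // June 1, 2012
const mayLogins = dao.loginsInMonth(2012, 5);
console.log(mayLogins);
```

If the id scheme changes later, only the DAO has to change.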
Keeping Databases Small

• If/when you need to shard…



Don't make your clients do this!
Keeping Databases Small

• Again, why keep them small?
• Starting a new replica
 •   Initial sync
 •   Index rebuilding
• Backups
• Index Compaction
• Speed
• TCO
Everything is easier
This can take DAYS
Ephemeral Storage?

• Every EC2 instance type has some
  (except micro)
• Only available via EC2 API
• Less prone to issues than EBS
• Faster ***
• Included in cost of server
But dies on host reboot!
Keeping Data Safe
Which Zone? Which Region?
Arbiter handles external connectivity issue detection
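
A sketch of the zone question, assuming a three-member replica set with two data nodes and one arbiter, each in its own zone. The config object follows the shape of a MongoDB replica set configuration document (`_id`, `members`, `host`, `arbiterOnly`). The set name, hostnames, and zone names are hypothetical, and the zone map is kept outside the config for this illustration.

```javascript
// Hypothetical replica set layout: two data nodes plus an arbiter.
const replicaSetConfig = {
  _id: "words",
  members: [
    { _id: 0, host: "db-a.example.com:27017" },
    { _id: 1, host: "db-b.example.com:27017" },
    { _id: 2, host: "arbiter-c.example.com:27017", arbiterOnly: true },
  ],
};

// Which zone each member runs in (assumed placement, not part of the config).
const zoneOf = {
  "db-a.example.com:27017": "zone-a",
  "db-b.example.com:27017": "zone-b",
  "arbiter-c.example.com:27017": "zone-c",
};

// A primary can only be elected while a strict majority of members is reachable.
// This assumes every member has one vote, which is the default.
function survivesZoneLoss(config, zones, lostZone) {
  const total = config.members.length;
  const remaining = config.members.filter((m) => zones[m.host] !== lostZone).length;
  return remaining > total / 2;
}

for (const zone of ["zone-a", "zone-b", "zone-c"]) {
  console.log(zone, survivesZoneLoss(replicaSetConfig, zoneOf, zone));
}
```

With one member per zone, any single zone can drop out and the other two still form a majority. Putting both data nodes in the same zone would lose that property.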
How does this really stack up?

• Tuned indexes & access, split with services
  •   Was: 3 DAS Devices w/18 TB disk
  •   Now: 21 M1.large + M1.xlarge instances
      •   3 Zones, 2 regions
• The Gory Details
blog.wordnik.com/with-software-small-is-the-new-big
As for Services

• ~1,000 requests/sec via Swagger-enabled
  micro services
• Direct to Consumer via SwaggerSocket
What's Next

• Migrating all services to SwaggerSocket
 •   OSS WebSocket subprotocol
https://github.com/wordnik/swaggersocket
 •   25%-100% speed increase (sync & async)
• Discovery via Wordnik
If you're Interested…
See more:
developer.wordnik.com
swagger.wordnik.com
github.com/wordnik

            Questions?

Advantages of Hiring UIUX Design Service Providers for Your Business
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

Inside Wordnik's Architecture

  • 1. Inside Wordnik's Architecture Tony Tam @fehguy
  • 2. Who is Wordnik? • Founded in 2008 by Erin McKean • "Understand meaning of words automatically" • Patented "Free-Range Definition" technology • Constructed largest (known) English Word Graph We do Discovery
  • 4. Data? • Word Graph is built by data • Runtime answers needed fast • 80 S reads! • 50M+ Nodes! • 80M+ Edges!
  • 5. What we do with Data • Update the Graph constantly • Augment our NLP pipeline • "Reality-based Annotation" with current, real-world data
  • 6. What we do with Data • Update the Graph constantly • Augment our NLP pipeline • "Reality-based Annotation" with current, real-world data Language is NOT static
  • 7. What we do with Data • Update the Graph constantly • Augment our NLP pipeline • "Reality-based Annotation" with current, real-world data • Language is NOT static • Next??? Twitter? Tumblr? Wordpress
  • 8. Is a 20 year-old corpus good enough?
  • 9. How we do it • Amazon EC2-based deployment • Efficiency through constraint-based architecture • Small is Big! • Horizontal scaling by adding servers! • Yea, we can always go vertical • Blah, blah, more details!
  • 10. Micro Services • Services are stand-alone building blocks • Increase capacity through a "more like this" button
  • 11. Micro Services • Big application => micro services Monolithic application "Isn't this just SOA?"
  • 15. Not PO-SOA • This is different • No proprietary message bus • Decoupled objects • Dedicated storage*** • Speak REST • Develop your services in… • Java • Scala • Ruby • PHP
  • 16. Speak REST? • Sounds good but… • REST semantics vary wildly • HATEOAS vs. practical REST? • /api/pet.json/1?delete (GET) • /api/pet.json/1 (DELETE) • /api/pet.json/1 (POST empty) • All valid! So…
  • 17. Speak REST? • Sounds good but… • REST semantics vary wildly • HATEOAS vs. practical REST? • /api/pet.json/1?delete (GET) • /api/pet.json/1 (DELETE) • /api/pet.json/1 (POST empty) • All valid! So… • Peer Review! • Better Docs! • API Styleguide! • API Council!
  • 18. SOA makes new Challenges • It's communication (not easy) • Need a consumer & provider contract • Driving force to create Swagger
  • 19. What is Swagger? • Swagger is… • Spec for declaring and documenting an API • A framework for auto-generating the spec • A library for client library generation • A JSON-based test framework • It's open source! • http://swagger.wordnik.com
  • 20. How? • Swagger Codegen • Creates a client based on your Swagger Spec • scala src/main/scala/Codegen.scala ${swagger-spec-url} • Scala • Ruby
  • 21. In the Wordnik Workflow • Jenkins will… • Build a service library • Build a stand-alone application distro • Build an installable image (RPM) • Build a compatible client library • Consumers will… • Declare dependency on a service version • Use a client for that version • Be given a list of compatible services, by cluster, version
  • 22. Back to Data • Micro services have small(ish) databases • Share nothing across services • YES To replica sets • Deployed to ephemeral storage • (more in a bit) • Small by design • How to keep them small?
  • 23. Keeping Databases Small • Some easy tricks • Schema-less => "schema per document" • Keep field names short! • db.foo.save({user_name:"Tony"}) • db.foo.save({un:"Tony"}) • Repeat 10e9 times! • Indexes • They can get *huge* • Make _id matter!
  • 25. Keeping Databases Small • Don't make _id just an "auto increment" • You're stuck with it! Be smart • User collection? Try _id: username • Email collection? Try _id: email • Date-driven collection? How about _id: "20120502" • db.logins.find({_id:/^201205/}) • Be lazy until you can't anymore!
  • 26. Keeping Databases Small • DAO or die! • Fancy index scheme => control access to collections NO!!!! Yes
  • 27. Keeping Databases Small • If/when you need to shard… Don't make your clients do this!
  • 28. Keeping Databases Small • Again, why keep them small? • Starting a new replica • Initial sync • Index rebuilding • Backups • Index Compaction • Speed • TCO
  • 29. Keeping Databases Small • Again, why keep them small? Everything is easier • Starting a new replica • Initial sync • Index rebuilding • Backups • Index Compaction • Speed • TCO • This can take DAYS
  • 30. Ephemeral Storage? • Every EC2 instance type has some (except micro) • Only available via EC2 API • Less prone to issues than EBS • Faster *** • Included in cost of server
  • 31. Ephemeral Storage? • Every EC2 instance type has some (except micro) • Only available via EC2 API • Less prone to issues than EBS • Faster *** • Included in cost of server But dies on host reboot!
  • 33. Which Zone? Which Region?
  • 34. Which Zone? Which Region? Arbiter handles external connectivity issue detection
  • 35. How does this really stack up? • Tuned indexes & access, split with services • Was: 3 DAS Devices w/18 TB disk • Now: 21 M1.large + M1.xlarge instances • 3 Zones, 2 regions • The Gory Details blog.wordnik.com/with-software-small-is-the-new-big
  • 36. As for Services • ~1,000 requests/sec via Swagger-enabled micro services • Direct to Consumer via SwaggerSocket
  • 37. What's Next • Migrating all services to SwaggerSocket • OSS WebSocket subprotocol https://github.com/wordnik/swaggersocket • 25%-100% speed increase (sync & async) • Discovery via Wordnik
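
Slide 23's "keep field names short" tip can be made concrete with some rough arithmetic. The sketch below assumes uncompressed BSON storage, where each document stores its own copy of every key as a NUL-terminated string; the helper names `keyBytes` and `savedBytes` are made up for illustration and are not from the talk.

```javascript
// Rough per-document cost of a field name in BSON:
// the key is stored as a C string in every single document.
function keyBytes(name) {
  // UTF-8 bytes of the key plus the terminating NUL byte
  return Buffer.byteLength(name, "utf8") + 1;
}

// Bytes saved across a collection by renaming one field (hypothetical helper)
function savedBytes(longName, shortName, docCount) {
  return (keyBytes(longName) - keyBytes(shortName)) * docCount;
}

const perDoc = keyBytes("user_name") - keyBytes("un");
const total = savedBytes("user_name", "un", 10e9); // "Repeat 10e9 times!"
console.log(`${perDoc} bytes per document, ${total / 1e9} GB in total`);
```

Under these assumptions the rename saves 7 bytes per document, or about 70 GB over 10e9 documents, before counting any replicas or backups of the same data.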

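Slides 25 to 27 combine three ideas: a meaningful _id, a DAO that owns the key scheme, and sharding that clients never see. The following is a minimal in-memory sketch of how those could fit together, not Wordnik's actual code. `LoginDao`, its methods, and the shard-by-month rule are all assumptions made for illustration; plain Maps stand in for MongoDB collections.

```javascript
// Hypothetical DAO over an in-memory stand-in for a sharded "logins" collection.
class LoginDao {
  constructor(shardCount) {
    // Each shard is a Map keyed by _id
    this.shards = Array.from({ length: shardCount }, () => new Map());
  }

  // _id such as "20120502": date first, so one month shares a key prefix
  static idFor(date) {
    return date.toISOString().slice(0, 10).replace(/-/g, "");
  }

  // The routing rule lives in the DAO, not in client code
  // (assumed scheme: shard by the yyyymm part of the _id)
  shardFor(id) {
    const month = Number(id.slice(0, 6));
    return this.shards[month % this.shards.length];
  }

  save(date, doc) {
    const id = LoginDao.idFor(date);
    this.shardFor(id).set(id, { _id: id, ...doc });
    return id;
  }

  // Similar in spirit to db.logins.find({_id:/^201205/})
  findByMonth(yyyymm) {
    const shard = this.shardFor(yyyymm + "01");
    return [...shard.values()]
      .filter((d) => d._id.startsWith(yyyymm))
      .sort((a, b) => a._id.localeCompare(b._id));
  }
}

const dao = new LoginDao(3);
dao.save(new Date(Date.UTC(2012, 4, 2)), { count: 17 });
dao.save(new Date(Date.UTC(2012, 4, 9)), { count: 4 });
dao.save(new Date(Date.UTC(2012, 5, 1)), { count: 9 });
console.log(dao.findByMonth("201205").map((d) => d._id));
```

Callers only use `save` and `findByMonth`, so the key format or the shard rule can change later without touching client code, which is the point of "DAO or die!".
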
Editor's notes

  1. list.foldLeft(0)((x, y) => x + y)