Coursera +
AWS CloudSearch
    Frank Chen
    Software Engineer
About
•    Ed-Tech startup providing MOOCs
     o    Massive Open Online Courses
•    New company -- launched 4/18/12
     o    Less than a year old.

•    215 free courses from 33 top universities
     o  Princeton, Stanford, Penn, Duke, etc...
     o  From Cryptography to Modern and Contemporary American
        Poetry
•  2.5+ million users
     o    We reached a million users faster than Facebook and
          Pinterest.
•  ~9 million course enrollments
Platform Scale
•    Moderate-sized (>10,000 concurrent users)
•    65 concurrent courses running now, each with tens of
     thousands of enrollments
•    >600 "pretty heavy" PHP/Python dynamic pages served
     per second sustained
     o    Might make backend calls to services (e.g. CloudSearch or SES -->
          want low latencies)
•    Various other services (70+ instances on EC2 running
     at the moment)
•    Spiky traffic
     o    People procrastinate on deadlines - spiky on the weekends
Stack
•    PHP / Python / Scala backed by MySQL
•    Runs on AWS completely
•    Utilizes lots of AWS services
     o    EC2 / ELB for servers
     o    MySQL RDS for databases
     o    S3 for video and static hosting
     o    CloudFront for video / asset hosting
     o    SES for emails (>1 million emails every day)
     o    SQS for long-running tasks (video encoding, gradebook generation,
          etc...)
     o    SNS for notification services
     o    Route 53 for DNS
     o    CloudSearch for forum search
Why CloudSearch?
•    Forum search was a big issue for us back in March / April.
     The solution we had then didn't work
     o    MySQL Full Text Search
          §  LIKE %x% AS NATURAL LANGUAGE?
          §  Really terrible results
          §  MyISAM (eww...)

•    Requirements:
     o    Fast searches (we call backend APIs - don't want to keep the users
          waiting too long)
     o    Good results (need to be relevant - don't waste the students' time)
     o    Low/no maintenance (we have enough instances to manage as is)
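
To illustrate the "LIKE %x%" complaint above, here is a minimal sketch of why raw substring matching returns poor results. It is not Coursera's code: SQLite stands in for MySQL so the example runs anywhere, and the table name, posts, and `like_search` helper are all hypothetical.

```python
import sqlite3

# Hypothetical forum posts; SQLite stands in for MySQL so the sketch is runnable.
POSTS = [
    (1, "How do I start the art history quiz?"),
    (2, "Party planning thread for week 3"),
    (3, "Is modern art covered in lecture 2?"),
]


def like_search(term):
    """Return ids of posts whose body contains `term` as a raw substring."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO posts VALUES (?, ?)", POSTS)
    rows = conn.execute(
        "SELECT id FROM posts WHERE body LIKE ? ORDER BY id",
        ("%" + term + "%",),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


matches = like_search("art")
```

Searching for "art" also returns post 2, which matches only because "Party" contains the letters "art". Nothing ranks the results by relevance either, which is the gap a dedicated search service is meant to fill.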
Why CloudSearch?
•  Alternatives we looked at:
   o  Apache Solr, Sphinx, fiddling with MySQL
•  Then CloudSearch was announced...
•  Early general adopter - we started using
  CloudSearch ~10 days after announcement
   o  We didn't get any heads-up about CS before the public
      announcement
   o  Wrote the code to use CloudSearch and import our
      existing forum posts / comments in 2 or 3 days.
       §  From decision to production!
       §  Easy to use and great documentation
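
The import step above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code Coursera wrote: it uses the JSON document batch shape from the CloudSearch 2013-01-01 API (a list of `add` operations with `id` and `fields`), which may differ from the API version the team used at the time. The field names (`forum_id`, `author`, `body`) and the post dict shape are hypothetical.

```python
import json

MAX_BATCH_BYTES = 5 * 1024 * 1024  # CloudSearch's documented per-batch upload limit


def to_add_operation(post):
    """Convert one forum post (hypothetical dict shape) into an 'add' operation."""
    return {
        "type": "add",
        "id": "post_{}".format(post["id"]),
        "fields": {
            "forum_id": post["forum_id"],
            "author": post["author"],
            "body": post["body"],
        },
    }


def build_batches(posts, max_bytes=MAX_BATCH_BYTES):
    """Yield JSON batch strings that stay within max_bytes.

    Assumes no single document is larger than max_bytes on its own.
    """
    current, size = [], 2  # 2 bytes for the enclosing "[" and "]"
    for post in posts:
        encoded = json.dumps(to_add_operation(post))
        cost = len(encoded.encode("utf-8")) + 1  # +1 for the separating comma
        if current and size + cost > max_bytes:
            yield "[" + ",".join(current) + "]"
            current, size = [], 2
        current.append(encoded)
        size += cost
    if current:
        yield "[" + ",".join(current) + "]"
```

Each yielded string would then be posted to the search domain's document endpoint. Splitting by size keeps a one-off import of every existing post and comment within the upload limit.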
CloudSearch Uses
      User facing forum search
CloudSearch Uses
•  Analytics
   o  Most frequent searches and other statistics about their courses
      §  Informing instructors about this so they can clarify
         information
   o  Finding posts across forums
      §  Easy for CloudSearch, hard normally because of sharded
         scatter-gather problems
         •  Old way: Querying 600 databases on 4 RDS servers? Not fun
      §  Usage analysis
      §  Unexpected use: Instructors often want to find all their own
         posts so they can save / archive common answers
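
For context on the scatter-gather problem mentioned above, here is a minimal sketch of what a cross-forum search looks like without a central index. Everything in it is hypothetical: each shard is modeled as an in-memory list of post dicts rather than a real database, and `search_shard` and `scatter_gather` are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, term):
    """Scan one shard (here a list of post dicts) for posts containing `term`."""
    return [p for p in shard if term.lower() in p["body"].lower()]


def scatter_gather(shards, term, limit=10):
    """Query every shard in parallel, then merge and keep the newest `limit` posts."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        per_shard = list(pool.map(lambda s: search_shard(s, term), shards))
    merged = [post for hits in per_shard for post in hits]
    merged.sort(key=lambda p: p["created"], reverse=True)
    return merged[:limit]
```

Every query fans out to all shards and waits for the slowest one before merging. With 600 databases that cost is paid on every search, whereas a single search index answers the same question in one request.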
CloudSearch Scale
•  Moderate scale
•  ~1.5 million documents indexed
   o    All forum posts and comments
•  50,000+ searches a day
   o    Spiky! Depends on when homework is due.
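
As a rough back-of-the-envelope check on the numbers above, the daily search volume can be converted to a per-second rate. The 50,000 figure comes from the slide; the peak factor is an assumed parameter for illustration, since the slides do not say how large the spikes are.

```python
SEARCHES_PER_DAY = 50_000
SECONDS_PER_DAY = 24 * 60 * 60

# Average rate if searches were spread evenly over the day (about 0.58 per second).
average_per_second = SEARCHES_PER_DAY / SECONDS_PER_DAY


def peak_per_second(peak_factor):
    """Rate when traffic runs at `peak_factor` times the daily average (assumed factor)."""
    return average_per_second * peak_factor
```

The average is under one search per second, so the load that matters for capacity is whatever multiple of it arrives just before a deadline.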
Experience
        GREAT!
We Want...
•  "Did you mean..."
  o    Lots of typos from non-native speakers
•  Multilingual Tokenization / Search
  o    We are starting to run courses in other languages...
•  Find Similar Documents
Thank You!
    Questions?
frank@coursera.org
