SlideShare une entreprise Scribd logo
1  sur  25
Lessons from building a search
engine with Amazon Web Services
            Chirayu Patel
     chirayu@snappyfingers.com
Chirayu Patel
                       Developer
                     SnappyFingers
           Question and Answer Search Engine
                   100% in the cloud



5-Apr-09                CloudCamp - Bangalore
My experiences
                 What worked?
                 What didn’t?
           What could be done better?
                What did I miss?




5-Apr-09           CloudCamp - Bangalore
Recap - AWS?
      •    EC2 – Elastic Compute Cloud
      •    S3 – Simple Storage Service
      •    SQS – Simple Queue Service
      •    SDB – SimpleDB




5-Apr-09                   CloudCamp - Bangalore
Why AWS?
      •    Computing Power requirements unknown
      •    Cheap
      •    Availability of multiple services
      •    Easy to implement SnappyFingers architecture
           using AWS services




5-Apr-09                   CloudCamp - Bangalore
SnappyFingers
      • Information Retrieval System (IRS)
      • FrontEnd
           – Nothing unique here




5-Apr-09                    CloudCamp - Bangalore
Three motivations (behind my decisions)
      • Reluctance to learn
      • Cost Conscious
      • I write buggy code




5-Apr-09                  CloudCamp - Bangalore
Architectural Requirements
      •    Loose Coupled
      •    Scalable
      •    Fault Tolerant
      •    Budget dependent




5-Apr-09                 CloudCamp - Bangalore
IRS Architecture
           Pipeline
                                                                                            SQS
                      Pipe Crawler           Pipe Parser            Pipe Indexer




                                                                                           EC2
                 Crawler                        Parser                     Indexer




                                                EC2 + S3                             SDB
                               Data Store                               Errors



5-Apr-09                                    CloudCamp - Bangalore
Pipes and Pipelines




5-Apr-09         CloudCamp - Bangalore
Pipes and Pipelines
      • Pipes contain jobs
      • Pipeline is a group of pipe
      • Easy to create pipelines and add pipes




5-Apr-09                 CloudCamp - Bangalore
Job ORM
                                                                SQS API
      class CrawlerJob (JobBase):
                                                                SDB API
          class SDBInterfaceConfig:
              domain_name = settings.CRAWLER_JOB_DOMAIN

           class SQSInterfaceConfig:
               queue_name = settings.CRAWLER_JOB_QUEUE
               timeout = settings.CRAWLER_JOB_TIMEOUT

           class AWSMetaData:
               action = CharField (...)
               url = CharField (...)
               ...
               ...
      Default attributes of each Job:
      • Pipeline Name
      • Status
      • Start Time
      • End Time
      • Id

5-Apr-09                                CloudCamp - Bangalore
Job Processing
      for i in range (num_of_jobs):
         try:
             job = cls.jobclass.sqs_get() # process job
             ...
         except Exception, e:
             job.job_processing_complete(…)
             fsdebug.mail_admins (..)
             end_transaction(rollback = True)
             job.sdb_save() # save in error store
          finally:
             job.sqs_del() # delete the job



5-Apr-09                   CloudCamp - Bangalore
The Good
      • Architecture easy to extend
      • ORM approach is a big time saver
      • Simple to add new services




5-Apr-09                CloudCamp - Bangalore
The Bad
      • Messages may be lost
           – Service Failure
           – SQS deletes messages after 4 days.


      Imp: System should be able to recreate jobs




5-Apr-09                     CloudCamp - Bangalore
Storage




5-Apr-09   CloudCamp - Bangalore
What do we store?
      • Crawler Data – Web Pages
      • Extracted Content – Questions/Answers
      • Backups




5-Apr-09                CloudCamp - Bangalore
Storage Structure

                        Meta Data                        Key + Value


           Postgres                                 S3




5-Apr-09                    CloudCamp - Bangalore
ORM
      • Extended Django ORM to support S3

      class S3WebPage (S3Model):
          _allowed_attrs = [quot;urlquot;, quot;contentquot;, ..]
          _name = quot;S3WebPage“
          ...
          ...




5-Apr-09               CloudCamp - Bangalore
The Good
      • Extremely scalable
      • Possible to store Python objects in S3
      • Latency issues can be solved by using a
        caching layer
      • No need to backup S3 data
      • Storage is cheap



5-Apr-09                 CloudCamp - Bangalore
The Bad
      • Postgres + S3 is not an elegant solution
           – Periodic syncing of Postgres and S3 required
      • High transaction costs
           – $.01 per 1000 PUT,COPY,POST or LIST Requests
           – $.01 per 10000 GET Requests




5-Apr-09                      CloudCamp - Bangalore
Computing




5-Apr-09    CloudCamp - Bangalore
EC2 – The Good
      • Computing needs are not constant
      • Data transfer to other AWS services is free
      • AMI’s per node type




5-Apr-09                 CloudCamp - Bangalore
The bad
      • Missed having a nerve center
           – Budget
           – Job Load
           – CPU load
      • Low cost 64bit severs are not available




5-Apr-09                 CloudCamp - Bangalore
Thank You

Contenu connexe

Tendances

Infrastructure as a code: a cloud approach
Infrastructure as a code: a cloud approachInfrastructure as a code: a cloud approach
Infrastructure as a code: a cloud approachThinkOpen
 
AWS September Webinar Series - Visual Effects Rendering in the AWS Cloud with...
AWS September Webinar Series - Visual Effects Rendering in the AWS Cloud with...AWS September Webinar Series - Visual Effects Rendering in the AWS Cloud with...
AWS September Webinar Series - Visual Effects Rendering in the AWS Cloud with...Amazon Web Services
 
Best practices to use aws in countryside.
Best practices to use aws in countryside.Best practices to use aws in countryside.
Best practices to use aws in countryside.Takuya Tachibana
 
AWS Public Sector Symposium 2014 Canberra | Managing Seasonal Workloads on AWS
AWS Public Sector Symposium 2014 Canberra | Managing Seasonal Workloads on AWS AWS Public Sector Symposium 2014 Canberra | Managing Seasonal Workloads on AWS
AWS Public Sector Symposium 2014 Canberra | Managing Seasonal Workloads on AWS Amazon Web Services
 
ShadowReader - Serverless load tests for replaying production traffic
ShadowReader - Serverless load tests for replaying production trafficShadowReader - Serverless load tests for replaying production traffic
ShadowReader - Serverless load tests for replaying production trafficYuki Sawa
 
Gpu accelerated BERT deployment on aws
Gpu accelerated BERT deployment on awsGpu accelerated BERT deployment on aws
Gpu accelerated BERT deployment on awsJohn Varghese
 
TensorFlow London 14: Ben Hall 'Machine Learning Workloads with Kubernetes an...
TensorFlow London 14: Ben Hall 'Machine Learning Workloads with Kubernetes an...TensorFlow London 14: Ben Hall 'Machine Learning Workloads with Kubernetes an...
TensorFlow London 14: Ben Hall 'Machine Learning Workloads with Kubernetes an...Seldon
 
Mvp skill saturday ep09 _06072019_azure updates - july 2019
Mvp skill saturday ep09 _06072019_azure updates - july 2019Mvp skill saturday ep09 _06072019_azure updates - july 2019
Mvp skill saturday ep09 _06072019_azure updates - july 2019Kumton Suttiraksiri
 
Siebel CRM in Production - What Now?
Siebel CRM in Production - What  Now?Siebel CRM in Production - What  Now?
Siebel CRM in Production - What Now?Frank
 
"Production-ready Serverless Java Applications in 3 weeks" at AWS Community D...
"Production-ready Serverless Java Applications in 3 weeks" at AWS Community D..."Production-ready Serverless Java Applications in 3 weeks" at AWS Community D...
"Production-ready Serverless Java Applications in 3 weeks" at AWS Community D...Vadym Kazulkin
 
Communication tool & Environment for Remote Worker
Communication tool & Environment for Remote WorkerCommunication tool & Environment for Remote Worker
Communication tool & Environment for Remote WorkerShotaro Sakamaki
 
How to improve lambda cold starts
How to improve lambda cold startsHow to improve lambda cold starts
How to improve lambda cold startsYan Cui
 
AWS Cloud Kata | Bangkok - Getting to Scale on AWS
AWS Cloud Kata | Bangkok - Getting to Scale on AWSAWS Cloud Kata | Bangkok - Getting to Scale on AWS
AWS Cloud Kata | Bangkok - Getting to Scale on AWSAmazon Web Services
 
Empowering Amazon EC2 with GigaSpaces XAP
Empowering Amazon EC2 with GigaSpaces XAPEmpowering Amazon EC2 with GigaSpaces XAP
Empowering Amazon EC2 with GigaSpaces XAPUri Cohen
 
Serverless Apps on Google Cloud: more dev, less ops
Serverless Apps on Google Cloud:  more dev, less opsServerless Apps on Google Cloud:  more dev, less ops
Serverless Apps on Google Cloud: more dev, less opsJoseph Lust
 
SpringOne Tour St. Louis - Serverless Spring
SpringOne Tour St. Louis - Serverless SpringSpringOne Tour St. Louis - Serverless Spring
SpringOne Tour St. Louis - Serverless SpringVMware Tanzu
 
Experiences sharing about Lambda, Kinesis, and Postgresql
Experiences sharing about Lambda, Kinesis, and PostgresqlExperiences sharing about Lambda, Kinesis, and Postgresql
Experiences sharing about Lambda, Kinesis, and PostgresqlOkis Chuang
 

Tendances (17)

Infrastructure as a code: a cloud approach
Infrastructure as a code: a cloud approachInfrastructure as a code: a cloud approach
Infrastructure as a code: a cloud approach
 
AWS September Webinar Series - Visual Effects Rendering in the AWS Cloud with...
AWS September Webinar Series - Visual Effects Rendering in the AWS Cloud with...AWS September Webinar Series - Visual Effects Rendering in the AWS Cloud with...
AWS September Webinar Series - Visual Effects Rendering in the AWS Cloud with...
 
Best practices to use aws in countryside.
Best practices to use aws in countryside.Best practices to use aws in countryside.
Best practices to use aws in countryside.
 
AWS Public Sector Symposium 2014 Canberra | Managing Seasonal Workloads on AWS
AWS Public Sector Symposium 2014 Canberra | Managing Seasonal Workloads on AWS AWS Public Sector Symposium 2014 Canberra | Managing Seasonal Workloads on AWS
AWS Public Sector Symposium 2014 Canberra | Managing Seasonal Workloads on AWS
 
ShadowReader - Serverless load tests for replaying production traffic
ShadowReader - Serverless load tests for replaying production trafficShadowReader - Serverless load tests for replaying production traffic
ShadowReader - Serverless load tests for replaying production traffic
 
Gpu accelerated BERT deployment on aws
Gpu accelerated BERT deployment on awsGpu accelerated BERT deployment on aws
Gpu accelerated BERT deployment on aws
 
TensorFlow London 14: Ben Hall 'Machine Learning Workloads with Kubernetes an...
TensorFlow London 14: Ben Hall 'Machine Learning Workloads with Kubernetes an...TensorFlow London 14: Ben Hall 'Machine Learning Workloads with Kubernetes an...
TensorFlow London 14: Ben Hall 'Machine Learning Workloads with Kubernetes an...
 
Mvp skill saturday ep09 _06072019_azure updates - july 2019
Mvp skill saturday ep09 _06072019_azure updates - july 2019Mvp skill saturday ep09 _06072019_azure updates - july 2019
Mvp skill saturday ep09 _06072019_azure updates - july 2019
 
Siebel CRM in Production - What Now?
Siebel CRM in Production - What  Now?Siebel CRM in Production - What  Now?
Siebel CRM in Production - What Now?
 
"Production-ready Serverless Java Applications in 3 weeks" at AWS Community D...
"Production-ready Serverless Java Applications in 3 weeks" at AWS Community D..."Production-ready Serverless Java Applications in 3 weeks" at AWS Community D...
"Production-ready Serverless Java Applications in 3 weeks" at AWS Community D...
 
Communication tool & Environment for Remote Worker
Communication tool & Environment for Remote WorkerCommunication tool & Environment for Remote Worker
Communication tool & Environment for Remote Worker
 
How to improve lambda cold starts
How to improve lambda cold startsHow to improve lambda cold starts
How to improve lambda cold starts
 
AWS Cloud Kata | Bangkok - Getting to Scale on AWS
AWS Cloud Kata | Bangkok - Getting to Scale on AWSAWS Cloud Kata | Bangkok - Getting to Scale on AWS
AWS Cloud Kata | Bangkok - Getting to Scale on AWS
 
Empowering Amazon EC2 with GigaSpaces XAP
Empowering Amazon EC2 with GigaSpaces XAPEmpowering Amazon EC2 with GigaSpaces XAP
Empowering Amazon EC2 with GigaSpaces XAP
 
Serverless Apps on Google Cloud: more dev, less ops
Serverless Apps on Google Cloud:  more dev, less opsServerless Apps on Google Cloud:  more dev, less ops
Serverless Apps on Google Cloud: more dev, less ops
 
SpringOne Tour St. Louis - Serverless Spring
SpringOne Tour St. Louis - Serverless SpringSpringOne Tour St. Louis - Serverless Spring
SpringOne Tour St. Louis - Serverless Spring
 
Experiences sharing about Lambda, Kinesis, and Postgresql
Experiences sharing about Lambda, Kinesis, and PostgresqlExperiences sharing about Lambda, Kinesis, and Postgresql
Experiences sharing about Lambda, Kinesis, and Postgresql
 

En vedette

ACM Bangalore Distinguished Speaker Program
ACM Bangalore Distinguished Speaker ProgramACM Bangalore Distinguished Speaker Program
ACM Bangalore Distinguished Speaker ProgramACMBangalore
 
Opening Remarks - Cloud Symposium
Opening Remarks - Cloud SymposiumOpening Remarks - Cloud Symposium
Opening Remarks - Cloud SymposiumACMBangalore
 
Overview of FreeBSD PMC Tools
Overview of FreeBSD PMC ToolsOverview of FreeBSD PMC Tools
Overview of FreeBSD PMC ToolsACMBangalore
 
The power of abstraction
The power of abstractionThe power of abstraction
The power of abstractionACMBangalore
 
Securing Wireless Cellular Systems
Securing Wireless Cellular SystemsSecuring Wireless Cellular Systems
Securing Wireless Cellular SystemsACMBangalore
 
cloud - internet rengineering
cloud - internet rengineeringcloud - internet rengineering
cloud - internet rengineeringACMBangalore
 
Automated Design of Digital Microfluids Lab-on-Chip
Automated Design of Digital Microfluids Lab-on-ChipAutomated Design of Digital Microfluids Lab-on-Chip
Automated Design of Digital Microfluids Lab-on-ChipACMBangalore
 
Social Network Analysis (SNA) and its implications for knowledge discovery in...
Social Network Analysis (SNA) and its implications for knowledge discovery in...Social Network Analysis (SNA) and its implications for knowledge discovery in...
Social Network Analysis (SNA) and its implications for knowledge discovery in...ACMBangalore
 

En vedette (9)

ACM Bangalore Distinguished Speaker Program
ACM Bangalore Distinguished Speaker ProgramACM Bangalore Distinguished Speaker Program
ACM Bangalore Distinguished Speaker Program
 
Opening Remarks - Cloud Symposium
Opening Remarks - Cloud SymposiumOpening Remarks - Cloud Symposium
Opening Remarks - Cloud Symposium
 
Overview of FreeBSD PMC Tools
Overview of FreeBSD PMC ToolsOverview of FreeBSD PMC Tools
Overview of FreeBSD PMC Tools
 
The power of abstraction
The power of abstractionThe power of abstraction
The power of abstraction
 
Securing Wireless Cellular Systems
Securing Wireless Cellular SystemsSecuring Wireless Cellular Systems
Securing Wireless Cellular Systems
 
In Home Safety Tips
In Home Safety TipsIn Home Safety Tips
In Home Safety Tips
 
cloud - internet rengineering
cloud - internet rengineeringcloud - internet rengineering
cloud - internet rengineering
 
Automated Design of Digital Microfluids Lab-on-Chip
Automated Design of Digital Microfluids Lab-on-ChipAutomated Design of Digital Microfluids Lab-on-Chip
Automated Design of Digital Microfluids Lab-on-Chip
 
Social Network Analysis (SNA) and its implications for knowledge discovery in...
Social Network Analysis (SNA) and its implications for knowledge discovery in...Social Network Analysis (SNA) and its implications for knowledge discovery in...
Social Network Analysis (SNA) and its implications for knowledge discovery in...
 

Similaire à Building a search engine with AWS - lessons from SnappyFingers

Cloud cost optimization (AWS, GCP)
Cloud cost optimization (AWS, GCP)Cloud cost optimization (AWS, GCP)
Cloud cost optimization (AWS, GCP)Szabolcs Zajdó
 
How to Train Your Classifier: Create a Serverless Machine Learning System wit...
How to Train Your Classifier: Create a Serverless Machine Learning System wit...How to Train Your Classifier: Create a Serverless Machine Learning System wit...
How to Train Your Classifier: Create a Serverless Machine Learning System wit...Stuart Myles
 
Deploy and Serve Model from Azure Databricks onto Azure Machine Learning
Deploy and Serve Model from Azure Databricks onto Azure Machine LearningDeploy and Serve Model from Azure Databricks onto Azure Machine Learning
Deploy and Serve Model from Azure Databricks onto Azure Machine LearningDatabricks
 
Cloud Computing in Practice
Cloud Computing in PracticeCloud Computing in Practice
Cloud Computing in PracticeKing Huang
 
2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with BlackfireMarko Mitranić
 
Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)Sid Anand
 
Scaling Mapufacture on Amazon Web Services
Scaling Mapufacture on Amazon Web ServicesScaling Mapufacture on Amazon Web Services
Scaling Mapufacture on Amazon Web ServicesAndrew Turner
 
Serverless Data Lake on AWS
Serverless Data Lake on AWSServerless Data Lake on AWS
Serverless Data Lake on AWSThanh Nguyen
 
Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...
Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...
Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...Edureka!
 
Serverless in production, an experience report (LNUG)
Serverless in production, an experience report (LNUG)Serverless in production, an experience report (LNUG)
Serverless in production, an experience report (LNUG)Yan Cui
 
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...SQUADEX
 
Metail and Elastic MapReduce
Metail and Elastic MapReduceMetail and Elastic MapReduce
Metail and Elastic MapReduceGareth Rogers
 
Serverless in Production, an experience report (cloudXchange)
Serverless in Production, an experience report (cloudXchange)Serverless in Production, an experience report (cloudXchange)
Serverless in Production, an experience report (cloudXchange)Yan Cui
 
Azure and/or AWS: How to Choose the best cloud platform for your project
Azure and/or AWS: How to Choose the best cloud platform for your projectAzure and/or AWS: How to Choose the best cloud platform for your project
Azure and/or AWS: How to Choose the best cloud platform for your projectEastBanc Tachnologies
 
Navigate Data Service using AWS
Navigate Data Service using AWSNavigate Data Service using AWS
Navigate Data Service using AWSArno Broekhof
 
AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)
AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)
AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)Amazon Web Services
 
AWS Lambda from the trenches
AWS Lambda from the trenchesAWS Lambda from the trenches
AWS Lambda from the trenchesYan Cui
 
Cloud Native Data Pipelines (in Eng & Japanese) - QCon Tokyo
Cloud Native Data Pipelines (in Eng & Japanese)  - QCon TokyoCloud Native Data Pipelines (in Eng & Japanese)  - QCon Tokyo
Cloud Native Data Pipelines (in Eng & Japanese) - QCon TokyoSid Anand
 
Aws Introduction, technology and $ sense
Aws Introduction, technology and $ senseAws Introduction, technology and $ sense
Aws Introduction, technology and $ senseSachin Dole
 
Big Data on EC2: Mashing Technology in the Cloud
Big Data on EC2: Mashing Technology in the CloudBig Data on EC2: Mashing Technology in the Cloud
Big Data on EC2: Mashing Technology in the CloudGeorge Ang
 

Similaire à Building a search engine with AWS - lessons from SnappyFingers (20)

Cloud cost optimization (AWS, GCP)
Cloud cost optimization (AWS, GCP)Cloud cost optimization (AWS, GCP)
Cloud cost optimization (AWS, GCP)
 
How to Train Your Classifier: Create a Serverless Machine Learning System wit...
How to Train Your Classifier: Create a Serverless Machine Learning System wit...How to Train Your Classifier: Create a Serverless Machine Learning System wit...
How to Train Your Classifier: Create a Serverless Machine Learning System wit...
 
Deploy and Serve Model from Azure Databricks onto Azure Machine Learning
Deploy and Serve Model from Azure Databricks onto Azure Machine LearningDeploy and Serve Model from Azure Databricks onto Azure Machine Learning
Deploy and Serve Model from Azure Databricks onto Azure Machine Learning
 
Cloud Computing in Practice
Cloud Computing in PracticeCloud Computing in Practice
Cloud Computing in Practice
 
2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire
 
Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)
 
Scaling Mapufacture on Amazon Web Services
Scaling Mapufacture on Amazon Web ServicesScaling Mapufacture on Amazon Web Services
Scaling Mapufacture on Amazon Web Services
 
Serverless Data Lake on AWS
Serverless Data Lake on AWSServerless Data Lake on AWS
Serverless Data Lake on AWS
 
Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...
Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...
Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...
 
Serverless in production, an experience report (LNUG)
Serverless in production, an experience report (LNUG)Serverless in production, an experience report (LNUG)
Serverless in production, an experience report (LNUG)
 
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
 
Metail and Elastic MapReduce
Metail and Elastic MapReduceMetail and Elastic MapReduce
Metail and Elastic MapReduce
 
Serverless in Production, an experience report (cloudXchange)
Serverless in Production, an experience report (cloudXchange)Serverless in Production, an experience report (cloudXchange)
Serverless in Production, an experience report (cloudXchange)
 
Azure and/or AWS: How to Choose the best cloud platform for your project
Azure and/or AWS: How to Choose the best cloud platform for your projectAzure and/or AWS: How to Choose the best cloud platform for your project
Azure and/or AWS: How to Choose the best cloud platform for your project
 
Navigate Data Service using AWS
Navigate Data Service using AWSNavigate Data Service using AWS
Navigate Data Service using AWS
 
AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)
AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)
AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)
 
AWS Lambda from the trenches
AWS Lambda from the trenchesAWS Lambda from the trenches
AWS Lambda from the trenches
 
Cloud Native Data Pipelines (in Eng & Japanese) - QCon Tokyo
Cloud Native Data Pipelines (in Eng & Japanese)  - QCon TokyoCloud Native Data Pipelines (in Eng & Japanese)  - QCon Tokyo
Cloud Native Data Pipelines (in Eng & Japanese) - QCon Tokyo
 
Aws Introduction, technology and $ sense
Aws Introduction, technology and $ senseAws Introduction, technology and $ sense
Aws Introduction, technology and $ sense
 
Big Data on EC2: Mashing Technology in the Cloud
Big Data on EC2: Mashing Technology in the CloudBig Data on EC2: Mashing Technology in the Cloud
Big Data on EC2: Mashing Technology in the Cloud
 

Plus de ACMBangalore

Clouds in emerging markets
Clouds in emerging marketsClouds in emerging markets
Clouds in emerging marketsACMBangalore
 
Opportunites and Challenges in Cloud COmputing
Opportunites and Challenges in Cloud COmputingOpportunites and Challenges in Cloud COmputing
Opportunites and Challenges in Cloud COmputingACMBangalore
 
Perspectives on Cloud COmputing - Google
Perspectives on Cloud COmputing - GooglePerspectives on Cloud COmputing - Google
Perspectives on Cloud COmputing - GoogleACMBangalore
 
Making of a Successful Cloud Business
Making of a Successful Cloud BusinessMaking of a Successful Cloud Business
Making of a Successful Cloud BusinessACMBangalore
 
Web Business Platforms on the Cloud
Web Business Platforms on the CloudWeb Business Platforms on the Cloud
Web Business Platforms on the CloudACMBangalore
 
Badrinath Ramamurthy Cloud Infrastructure
Badrinath Ramamurthy   Cloud InfrastructureBadrinath Ramamurthy   Cloud Infrastructure
Badrinath Ramamurthy Cloud InfrastructureACMBangalore
 
market oriented cloud
market oriented cloudmarket oriented cloud
market oriented cloudACMBangalore
 
Case study - SaaS Abs Experience Jan07 09
Case study - SaaS Abs Experience Jan07 09Case study - SaaS Abs Experience Jan07 09
Case study - SaaS Abs Experience Jan07 09ACMBangalore
 
virtualization tutorial at ACM bangalore Compute 2009
virtualization tutorial at ACM bangalore Compute 2009virtualization tutorial at ACM bangalore Compute 2009
virtualization tutorial at ACM bangalore Compute 2009ACMBangalore
 

Plus de ACMBangalore (9)

Clouds in emerging markets
Clouds in emerging marketsClouds in emerging markets
Clouds in emerging markets
 
Opportunites and Challenges in Cloud COmputing
Opportunites and Challenges in Cloud COmputingOpportunites and Challenges in Cloud COmputing
Opportunites and Challenges in Cloud COmputing
 
Perspectives on Cloud COmputing - Google
Perspectives on Cloud COmputing - GooglePerspectives on Cloud COmputing - Google
Perspectives on Cloud COmputing - Google
 
Making of a Successful Cloud Business
Making of a Successful Cloud BusinessMaking of a Successful Cloud Business
Making of a Successful Cloud Business
 
Web Business Platforms on the Cloud
Web Business Platforms on the CloudWeb Business Platforms on the Cloud
Web Business Platforms on the Cloud
 
Badrinath Ramamurthy Cloud Infrastructure
Badrinath Ramamurthy   Cloud InfrastructureBadrinath Ramamurthy   Cloud Infrastructure
Badrinath Ramamurthy Cloud Infrastructure
 
market oriented cloud
market oriented cloudmarket oriented cloud
market oriented cloud
 
Case study - SaaS Abs Experience Jan07 09
Case study - SaaS Abs Experience Jan07 09Case study - SaaS Abs Experience Jan07 09
Case study - SaaS Abs Experience Jan07 09
 
virtualization tutorial at ACM bangalore Compute 2009
virtualization tutorial at ACM bangalore Compute 2009virtualization tutorial at ACM bangalore Compute 2009
virtualization tutorial at ACM bangalore Compute 2009
 

Dernier

QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 

Dernier (20)

QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 

Building a search engine with AWS - lessons from SnappyFingers

  • 1. Lessons from building a search engine with Amazon Web Services Chirayu Patel chirayu@snappyfingers.com
  • 2. Chirayu Patel Developer SnappyFingers Question and Answer Search Engine 100% in the cloud 5-Apr-09 CloudCamp - Bangalore
  • 3. My experiences What worked? What didn’t? What could be done better? What did I miss? 5-Apr-09 CloudCamp - Bangalore
  • 4. Recap - AWS? • EC2 – Elastic Compute Cloud • S3 – Simple Storage Service • SQS – Simple Queue Service • SDB – SimpleDB 5-Apr-09 CloudCamp - Bangalore
  • 5. Why AWS? • Computing Power requirements unknown • Cheap • Availability of multiple services • Easy to implement SnappyFingers architecture using AWS services 5-Apr-09 CloudCamp - Bangalore
  • 6. SnappyFingers • Information Retrieval System (IRS) • FrontEnd – Nothing unique here 5-Apr-09 CloudCamp - Bangalore
  • 7. Three motivations (behind my decisions) • Reluctance to learn • Cost Conscious • I write buggy code 5-Apr-09 CloudCamp - Bangalore
  • 8. Architectural Requirements • Loose Coupled • Scalable • Fault Tolerant • Budget dependent 5-Apr-09 CloudCamp - Bangalore
  • 9. IRS Architecture Pipeline SQS Pipe Crawler Pipe Parser Pipe Indexer EC2 Crawler Parser Indexer EC2 + S3 SDB Data Store Errors 5-Apr-09 CloudCamp - Bangalore
  • 10. Pipes and Pipelines 5-Apr-09 CloudCamp - Bangalore
  • 11. Pipes and Pipelines • Pipes contain jobs • Pipeline is a group of pipe • Easy to create pipelines and add pipes 5-Apr-09 CloudCamp - Bangalore
  • 12. Job ORM SQS API class CrawlerJob (JobBase): SDB API class SDBInterfaceConfig: domain_name = settings.CRAWLER_JOB_DOMAIN class SQSInterfaceConfig: queue_name = settings.CRAWLER_JOB_QUEUE timeout = settings.CRAWLER_JOB_TIMEOUT class AWSMetaData: action = CharField (...) url = CharField (...) ... ... Default attributes of each Job: • Pipeline Name • Status • Start Time • End Time • Id 5-Apr-09 CloudCamp - Bangalore
  • 13. Job Processing for i in range (num_of_jobs): try: job = cls.jobclass.sqs_get() # process job ... except Exception, e: job.job_processing_complete(…) fsdebug.mail_admins (..) end_transaction(rollback = True) job.sdb_save() # save in error store finally: job.sqs_del() # delete the job 5-Apr-09 CloudCamp - Bangalore
  • 14. The Good • Architecture easy to extend • ORM approach is a big time saver • Simple to add new services 5-Apr-09 CloudCamp - Bangalore
  • 15. The Bad • Messages may be lost – Service Failure – SQS deletes messages after 4 days. Imp: System should be able to recreate jobs 5-Apr-09 CloudCamp - Bangalore
  • 16. Storage 5-Apr-09 CloudCamp - Bangalore
  • 17. What do we store? • Crawler Data – Web Pages • Extracted Content – Questions/Answers • Backups 5-Apr-09 CloudCamp - Bangalore
  • 18. Storage Structure Meta Data Key + Value Postgres S3 5-Apr-09 CloudCamp - Bangalore
  • 19. ORM • Extended Django ORM to support S3 class S3WebPage (S3Model): _allowed_attrs = [quot;urlquot;, quot;contentquot;, ..] _name = quot;S3WebPage“ ... ... 5-Apr-09 CloudCamp - Bangalore
  • 20. The Good • Extremely scalable • Possible to store Python objects in S3 • Latency issues can be solved by using a caching layer • No need to backup S3 data • Storage is cheap 5-Apr-09 CloudCamp - Bangalore
  • 21. The Bad • Postgres + S3 is not an elegant solution – Periodic syncing of Postgres and S3 required • High transaction costs – $.01 per 1000 PUT,COPY,POST or LIST Requests – $.01 per 10000 GET Requests 5-Apr-09 CloudCamp - Bangalore
  • 22. Computing 5-Apr-09 CloudCamp - Bangalore
  • 23. EC2 – The Good • Computing needs are not constant • Data transfer to other AWS services is free • AMI’s per node type 5-Apr-09 CloudCamp - Bangalore
  • 24. The bad • Missed having a nerve center – Budget – Job Load – CPU load • Low cost 64bit severs are not available 5-Apr-09 CloudCamp - Bangalore