Seravia in the Cloud
kidrane
MongoDB Beijing Meetup on May 7th.
1.
Seravia in the Cloud
Danny Yang, May 7, 2011
2.
3.
Sample of Data 2.0 Companies
4.
Our data sets
5.
6.
7.
8.
9.
10.
11.
12.
Seravia on AWS
WWW: ELB; EC2 (rails, mongo, mysql, sphinx); S3
Data: EC2 (parsing, pentaho, ETL); S3; EMR (hadoop, hive, BI)
Crawlware: EC2; S3
13.
WWW architecture
ELB; EC2 webserver; S3; EC2 webserver; EC2 mongo; EC2 sphinx; EC2 mysql; EC2 webserver, rails; EC2 mongo; EC2 sphinx
14.
Data Architecture
EC2 post-processing; EC2 Parsing, ETL; S3; EC2 Parsing, ETL; EMR Hadoop, hive, BI; EC2 post-processing
Data stages:
  1. Raw data – html, xml, text files
  2. Pre-processed – unrelated tsv files
  3. Analyzed – related tsv files and reports
  4. Post-processed – json documents
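The last step of the data stages on this slide, turning analyzed tsv files into json documents, could look roughly like the sketch below. This is only an illustration under assumptions: the slides do not show the actual post-processing code, and the column names (`company`, `year`, `filings`), the sample rows, and the function name `tsv_to_json_documents` are all hypothetical.

```python
import csv
import io
import json


def tsv_to_json_documents(tsv_text, key_field):
    """Group TSV rows by key_field and return one JSON string per key."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    grouped = {}
    for row in reader:
        key = row[key_field]
        # Create the document the first time a key is seen.
        doc = grouped.setdefault(key, {key_field: key, "records": []})
        # Store the remaining columns as one record of that document.
        doc["records"].append(
            {name: value for name, value in row.items() if name != key_field}
        )
    return [json.dumps(doc, sort_keys=True) for doc in grouped.values()]


# Hypothetical "analyzed" TSV; columns and values are made up for illustration.
SAMPLE_TSV = (
    "company\tyear\tfilings\n"
    "Acme\t2009\t3\n"
    "Acme\t2010\t5\n"
    "Globex\t2010\t1\n"
)

documents = tsv_to_json_documents(SAMPLE_TSV, "company")
for document in documents:
    print(document)
```

Grouping related rows under one key is one plausible reason to move from flat tsv files to json documents, since a document store such as MongoDB (named on the WWW slides) holds nested records directly.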
15.
Crawlware Architecture
EC2 Crawler; EC2 Crawler; EC2 Controller; S3; EC2 Crawler; EC2 Crawler; EC2 Controller
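The Crawlware slide names only the components: crawler and controller instances on EC2, plus S3. One common way such a controller/crawler split works is sketched below, purely as an assumption. The slides do not describe how work is actually distributed. Here a controller puts URLs on a queue, crawler workers pull from it, and a plain dictionary stands in for S3. The `fetch` function returns fake HTML so the sketch runs offline, and all names are hypothetical.

```python
import queue
import threading


def fetch(url):
    # Stand-in for a real HTTP request; returns fake HTML so this runs offline.
    return "<html><body>content of %s</body></html>" % url


def crawler(work_queue, store, lock):
    """Pull URLs from the controller's queue until a None sentinel arrives."""
    while True:
        url = work_queue.get()
        if url is None:
            break
        page = fetch(url)
        with lock:
            store[url] = page  # a dict stands in for S3 (assumption)


def run_controller(urls, num_crawlers=2):
    """Start crawler threads, hand out URLs, and return the raw-page store."""
    work_queue = queue.Queue()
    store = {}
    lock = threading.Lock()
    workers = [
        threading.Thread(target=crawler, args=(work_queue, store, lock))
        for _ in range(num_crawlers)
    ]
    for worker in workers:
        worker.start()
    for url in urls:
        work_queue.put(url)
    for _ in workers:
        work_queue.put(None)  # one stop sentinel per crawler
    for worker in workers:
        worker.join()
    return store


raw_store = run_controller(
    ["http://example.com/a", "http://example.com/b", "http://example.com/c"],
    num_crawlers=2,
)
print(sorted(raw_store))
```

In this sketch, the stored pages correspond to the "raw data" stage (html, xml, text files) that the data architecture slide takes as its input.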
16.
17.
Q & A