Scraping the Web with Scrapinghub
For Finance
We turn web content into useful data
About Scrapinghub
Scrapinghub specializes in data extraction. Our platform is
used to scrape over 4 billion web pages a month.
We offer:
● Professional Services to handle the web scraping for you
● Off-the-shelf datasets so you can get data hassle free
● A cloud-based platform that makes scraping a breeze
Founded in 2010, largest 100% remote company based outside of the US
We’re 134 teammates in 48 countries
“Getting information off the
Internet is like taking a drink
from a fire hydrant.”
– Mitchell Kapor
Scrapy
Scrapy is a web scraping framework that
gets the dirty work related to web crawling
out of your way.
Benefits
● No platform lock-in: Open Source
● Very popular (13k+ ★)
● Battle tested
● Highly extensible
● Great documentation
Portia
Portia is a Visual Scraping tool that lets you
get data without needing to write code.
Benefits
● No platform lock-in: Open Source
● JavaScript dynamic content generation
● Ideal for non-developers
● Extensible
● It’s as easy as annotating a page
Large Scale Infrastructure
Meet Scrapy Cloud, our PaaS for web crawlers:
● Scalable: Crawlers run on EC2 instances or dedicated servers
● Crawlera add-on
● Control your spiders: Command line, API or web UI
● Machine learning integration: BigML, MonkeyLearn
● No lock-in: scrapyd to run Scrapy spiders on your own infrastructure
Broad Crawls
Frontera allows us to build large scale web crawlers in Python:
● Scrapy support out of the box
● Distribute and scale custom web crawlers across servers
● Crawl Frontier Framework: large scale URL prioritization logic
● Aduana to prioritize URLs based on link analysis (PageRank, HITS)
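To make the link-analysis idea concrete, here is a minimal sketch of PageRank-based URL prioritization in plain Python. It does not use Frontera or Aduana and is not their API; the link graph, the function names and the damping value are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {url: [outgoing urls]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


def crawl_order(links):
    """Return URLs sorted from highest to lowest rank (ties by URL)."""
    ranks = pagerank(links)
    return sorted(ranks, key=lambda url: (-ranks[url], url))


# Hypothetical link graph discovered during a crawl.
LINKS = {
    "a.example/": ["b.example/", "c.example/"],
    "b.example/": ["c.example/"],
    "c.example/": ["a.example/"],
    "d.example/": ["c.example/"],
}

if __name__ == "__main__":
    for url in crawl_order(LINKS):
        print(url)
```

In a real broad crawl the graph grows as pages are fetched, so the ranks would have to be recomputed or updated incrementally rather than once up front.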
Web Scraping Use Cases
Competitive Pricing
Companies use web scraping to monitor the
pricing and the ratings of competitors:
● Scrape online retailers
● Structure the data in a search engine or DB
● Create an interface to search for products
● Sentiment analysis for product rankings
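The "structure the data" and "search for products" steps above can be sketched in a few lines of Python. This is an illustrative in-memory index only, not a Scrapinghub product; the records, field names and helper functions are all hypothetical, and a real deployment would more likely use a database or search engine.

```python
from collections import defaultdict

# Hypothetical records, shaped like the output of a retail scrape.
OFFERS = [
    {"retailer": "shop-a", "product": "laptop-x", "price": 999.00},
    {"retailer": "shop-b", "product": "laptop-x", "price": 949.50},
    {"retailer": "shop-a", "product": "laptop-y", "price": 1299.00},
    {"retailer": "shop-c", "product": "phone-z", "price": 599.00},
]


def index_by_product(offers):
    """Group offers by product, cheapest first within each product."""
    index = defaultdict(list)
    for offer in offers:
        index[offer["product"]].append(offer)
    for product_offers in index.values():
        product_offers.sort(key=lambda o: o["price"])
    return dict(index)


def cheapest(index, product):
    """Return the lowest-priced offer for a product, or None if unknown."""
    offers = index.get(product)
    return offers[0] if offers else None


def search(index, term):
    """Case-insensitive substring search over product names."""
    term = term.lower()
    return sorted(p for p in index if term in p.lower())


INDEX = index_by_product(OFFERS)

if __name__ == "__main__":
    for product in search(INDEX, "laptop"):
        best = cheapest(INDEX, product)
        print(product, best["retailer"], best["price"])
```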
Monitor Resellers
We help a leading IT manufacturer monitor the activities of their
resellers:
● Tracking and watching out for stolen goods
● Pricing agreement violations
● Customer support responses on complaints
● Product line quality checks
Lead Generation
Mine scraped data to identify who to target in a company for your
outbound sales campaigns:
● Locate possible leads in your target market
● Identify the right contacts within each one
● Augment the information you already have on them
Real Estate
Crawl property websites and use the data obtained in order to:
● Estimate house prices
● Estimate rental values
● Track housing stock movements
● Give insight into real estate agents and homeowners
Fraud Detection
Monitor for sellers that offer products violating the ToS of credit card
companies including:
● Drugs
● Weapons
● Gambling
Identify stolen cards and IDs on the Dark Web:
● Forums where hackers share ID numbers / PINs
Company Reputation
Sentiment analysis of a company or product through newsletters, social
networks and other natural language data sources.
● NLP to create an associated sentiment indicator.
● Tracking the relevant news supporting the indicator can lead to market
insights into long-term trends.
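As a rough illustration of a sentiment indicator, the sketch below scores text against a tiny word list and averages the scores. It is a toy, lexicon-based stand-in for real NLP; the word lists, function names and scoring rule are assumptions, not the method used in the deck.

```python
import re

# Hypothetical, deliberately tiny sentiment lexicons.
POSITIVE = {"good", "great", "excellent", "love", "reliable"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "broken"}


def sentiment_score(text):
    """Score one text in [-1.0, 1.0]; 0.0 when no lexicon word appears."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def sentiment_indicator(texts):
    """Average the per-text scores into a single indicator."""
    scores = [sentiment_score(t) for t in texts]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    posts = ["Great phone, I love it", "Terrible battery and poor support"]
    print(sentiment_indicator(posts))
```

Computed per day or per week over scraped posts, an indicator like this becomes a time series that can be set against the news items behind each move.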
Consumer Behavior
Extract data from forums and websites like Reddit to evaluate consumer
reviews and commentary:
● Volume of comments across brands
● Topics of discussion
● Comparisons with other brands and products
● Evaluate product launches and marketing tactics
Tracking Legislation
Monitor bills and regulations that are being discussed in Congress. Access
court judgments and opinions in order to:
● Follow discussions
● Try to forecast legislative outcomes
● Track regulations that impact different economic sectors
Hiring
Crawl and extract data from job boards and other
sources in order to:
● Understand hiring trends in different sectors or regions
● Find candidates for jobs, or future leaders
● Spot and retain employees who are shopping for a new job
Monitoring Corruption
Journalists and analysts can create Open Data by extracting information
from difficult-to-access government websites:
● Track the activities of lobbyists
● Find patterns in the behavior of government officials
● Identify disruptions in the economy due to corruption allegations
Thank you!
scrapinghub.com

Using Web Data for Finance
