Web scraping is a simple software technique for extracting information from websites. I’ve been creating a lot of (data-driven) creative content lately, and one of the things I like to do is gather as much data as I can from public sources. In some cases creating and running database queries even costs too much time and my own PHP scraper is faster, so I wanted to share some tools that could be helpful.
What is Web Scraping - Data Extraction Software
1. WHAT IS WEB SCRAPING
(WEB HARVESTING OR WEB DATA EXTRACTION)
Data Collected: Wikipedia, notprovided
2. Web Scraping
• Web scraping (web harvesting or web data
extraction) is a computer software
technique of extracting information from
websites. Usually, such software programs
simulate human exploration of the World
Wide Web by either implementing low-level
Hypertext Transfer Protocol (HTTP),
or embedding a fully-fledged web browser,
such as Internet Explorer or Mozilla
Firefox.
Definition
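As a minimal sketch of the low-level HTTP approach mentioned in the definition, the Python snippet below downloads a page with the standard library and pulls out its links. The names `LinkCollector`, `extract_links`, and `fetch` are made up for illustration, and a real scraper would need more care (error handling, character encodings, polite request rates).

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url):
    """Download a page over HTTP (assumes UTF-8 text for simplicity)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # A hard-coded sample keeps the example runnable without network access;
    # with a live site you would call extract_links(fetch(url)) instead.
    sample = '<p><a href="/a">A</a> and <a href="https://example.com/b">B</a></p>'
    print(extract_links(sample))
```

The other approach from the definition, embedding a full browser, is usually only needed when a page builds its content with JavaScript.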
3. Web Scraping
• Web scraping is the process of
automatically collecting information from
the World Wide Web. It is a field with
active developments sharing a common
goal with the semantic web vision, an
ambitious initiative that still requires
breakthroughs in text processing, semantic
understanding, artificial intelligence and
human-computer interactions. Web
scraping, instead, favors practical solutions
based on existing technologies that are
often entirely ad hoc.
Techniques
4. Web Scraping
• Kimono Labs
• Import.io
• Datatude Technologies
• Outwit Hub
• ScraperWiki
• Scraper (Chrome plugin)
Best Tools for Data Extraction
5. Kimono Labs
• Kimono has two easy ways to scrape
specific URLs: just paste the URL into their
website or use their bookmarklet. Once you
have pointed out the data you need, you
can set how often and when you want the
data to be collected. The data is saved in
their database. I like that the learning
curve is not that steep and that you don’t
need a PhD in engineering to use their
software. The disadvantage of this tool is
that you can’t upload multiple URLs at
once.
6. Import.io
• Import.io is a browser-based web scraping
tool. By following their easy step-by-step
plan you select the data you want to
scrape and the tool does the rest. It is a
more sophisticated tool than Kimono. I
like it because it shows a clear overview
of all the scrapers you have active and
you can scrape multiple URLs at once.
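To illustrate what scraping multiple URLs at once can look like in your own code, here is a rough Python sketch that runs one scraping function over a list of URLs in parallel. It is not how Import.io works internally; `scrape_many` and the `scrape_one` callback are hypothetical names, and the demo uses a stand-in function instead of real requests.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_many(urls, scrape_one, max_workers=4):
    """Run scrape_one(url) for every URL in parallel and map URL -> result."""
    urls = list(urls)  # allow generators, and keep the order for zip() below
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input URLs.
        results = list(pool.map(scrape_one, urls))
    return dict(zip(urls, results))


if __name__ == "__main__":
    # Stand-in for a real scraper so the example runs without network access.
    def fake_scrape(url):
        return len(url)

    print(scrape_many(["https://example.com/a", "https://example.com/bb"], fake_scrape))
```

Threads suit this job because a scraper spends most of its time waiting on the network; keep `max_workers` small so you do not hammer the target site.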
7. Datatude Technologies
• Collecting valid data from the Internet is
extremely valuable for any business but it
is a challenging process that tends to be
slow, highly error-prone, and wastes many
human and financial resources. Ficstar
solves this problem by providing data
mining solutions from the Deep Web for
collecting information relevant to your
business. With this custom-designed
search engine, you can quickly get the data
you want, and transform it into relevant
and usable results that meet your business
needs.
8. Outwit Hub
• I will start with the two biggest
differences compared to the previous
tool: it is a software package that runs
on your PC or laptop, and using its full
potential will cost you 75 USD. The
free version can only scrape 100 rows
of data. What I do like is the number of
preprogrammed scraping options,
which make it easy to start and learn
about web scraping.
9. ScraperWiki
• This tool is really for people wanting to
scrape on a massive scale. You can code
your own scrapers (in PHP, Ruby & Python)
and the pricing is really cheap for what
you get: 29 USD / month for 100
datasets. You are completely free in your
use of libraries and timers. And if your
programming skills are not good enough,
they can help you out (as a paid service,
though). Of the tools covering the basics
of web scraping, this is the most
advanced.
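When you code your own scraper, the scraped rows have to end up in some kind of dataset. The sketch below shows one common way to do that in Python with the built-in `sqlite3` module. It does not use ScraperWiki's own API; the `items` table, its columns, and the `save_rows` helper are assumptions made for this example.

```python
import sqlite3


def save_rows(conn, rows):
    """Insert scraped rows, replacing any earlier row with the same URL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items "
        "(url TEXT PRIMARY KEY, title TEXT, price REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO items (url, title, price) "
        "VALUES (:url, :title, :price)",
        rows,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file path to keep the data
    save_rows(conn, [{"url": "https://example.com/1", "title": "Widget", "price": 9.5}])
    # A later run with the same URL updates the row instead of duplicating it.
    save_rows(conn, [{"url": "https://example.com/1", "title": "Widget", "price": 8.0}])
    print(conn.execute("SELECT url, title, price FROM items").fetchall())
```

Keying rows by URL means a scraper that runs on a timer can be re-run safely without piling up duplicates.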
10. Scraper (Chrome plugin)
• You can select a specific data point, a price,
a rating, etc., and then use your browser
menu: click Scrape Similar and you will get
multiple options to export or copy your
data to Excel or Google Docs. This plugin is
really basic but does the job it is built for:
fast and easy screen scraping.
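The idea behind "scrape similar" (pick one data point, then grab everything on the page that looks like it) can be approximated in a few lines of Python. The sketch below is not the plugin's actual implementation: it assumes that similar items share a CSS class, and `ClassTextCollector` and `similar_to_csv` are illustrative names. The result is CSV text that opens in Excel or Google Docs.

```python
import csv
import io
from html.parser import HTMLParser

# Tags that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextCollector(HTMLParser):
    """Collects the text of every element carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0    # > 0 while inside a matching element
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1          # nested tag inside a match
        elif self.css_class in classes:
            self.depth = 1           # a new matching element starts here
            self.values.append("")

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.values[-1] += data


def similar_to_csv(html, css_class):
    """Return one CSV column holding the text of all matching elements."""
    parser = ClassTextCollector(css_class)
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([css_class])     # header row
    for value in parser.values:
        writer.writerow([value.strip()])
    return out.getvalue()


if __name__ == "__main__":
    page = '<ul><li class="price">$10</li><li class="price">$7</li></ul>'
    print(similar_to_csv(page, "price"))
```

The depth counter assumes reasonably well-formed HTML; for messy real-world pages a dedicated parsing library is usually the safer choice.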