Large scale crawling with Apache Nutch

by V.P., Apache Nutch at The Apache Software Foundation on Nov 07, 2012

This talk gives an overview of Apache Nutch: its main components, how it fits with other Apache projects such as Hadoop, Solr and Tika, and its latest developments.

Apache Nutch was started exactly 10 years ago and was the starting point for what later became Apache Hadoop and Apache Tika. Today Nutch is the reference tool for large-scale web crawling.

The second part of the presentation focuses on the latest developments in Nutch and the changes introduced by the 2.x branch, which uses Apache Gora as a front end to various NoSQL datastores.

Usage rights: CC Attribution License

Presentation Transcript