This document discusses using Nutch, an open-source web crawler, with Scala. It gives an overview of Nutch's architecture and shows how plugins can be written in Scala to extend its functionality. As an example, it describes a Scala plugin built for an aggregator application that crawls multiple suppliers, parses the content to extract details, and passes this data to an actor for processing. The solution crawled 5 suppliers and collected over 500k records using Nutch and 823 lines of Scala code.
2. about
CTO at Knoldus Software
Co-Founder at MyCellWasStolen.com
Community Editor at InfoQ.com
Dabbling with Scala – last 40 months
Enterprise grade implementations on Scala – 18 months
12. scala
I have Java!
concurrency, verbose, popular, strongly typed, JVM, OO, library
13. scala
Java:
class Person {
    private String firstName;
    private String lastName;
    private int age;

    public Person(String firstName, String lastName, int age) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.age = age;
    }

    // One getter/setter pair per field
    public void setFirstName(String firstName) { this.firstName = firstName; }
    public String getFirstName() { return this.firstName; }
    public void setLastName(String lastName) { this.lastName = lastName; }
    public String getLastName() { return this.lastName; }
    public void setAge(int age) { this.age = age; }
    public int getAge() { return this.age; }
}
Scala:
class Person(var firstName: String, var lastName: String, var age: Int)
Source: http://blog.objectmentor.com/articles/2008/08/03/the-seductions-of-scala-part-i
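To make the comparison concrete, here is a minimal sketch of how the one-line Scala class can be used. Declaring the constructor parameters as var makes the compiler generate a reader and a writer for each field, which is what replaces the hand-written getters and setters in the Java version. The PersonDemo object and the sample values are illustrative assumptions, not part of the original slides.

```scala
// The class exactly as shown on the slide: each `var` parameter becomes
// a field with a generated accessor (age) and mutator (age_=).
class Person(var firstName: String, var lastName: String, var age: Int)

// Hypothetical demo object, added only to exercise the class.
object PersonDemo {
  def main(args: Array[String]): Unit = {
    val p = new Person("Ada", "Lovelace", 36)

    p.age = 37        // assignment syntax calls the generated mutator
    p.age_=(38)       // the same mutator, called explicitly by name

    println(s"${p.firstName} ${p.lastName}, ${p.age}") // prints "Ada Lovelace, 38"
  }
}
```

If the fields were declared with val instead of var, only the accessors would be generated and the assignments above would not compile.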
14. scala
Java – everything is an object unless it is primitive
Scala – everything is an object. period.
Java – has operators (+, -, < ..) and methods
Scala – operators are methods
Java – statically typed – Thing thing = new Thing()
Scala – statically typed but uses type inference
val thing = new Thing
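The three contrasts above can be sketched in a few lines of Scala. The Thing and Money types and the LanguageDemo object are hypothetical names introduced for illustration; only the standard library is used.

```scala
object LanguageDemo {
  // Hypothetical placeholder class, mirroring the `Thing` on the slide.
  class Thing

  // Hypothetical value type: `+` and `<` are ordinary methods.
  final case class Money(cents: Long) {
    def +(other: Money): Money = Money(cents + other.cents)
    def <(other: Money): Boolean = cents < other.cents
  }

  def main(args: Array[String]): Unit = {
    // Everything is an object: methods can be called on an Int literal.
    val text = 42.toString
    val bigger = 1.max(3)

    // Operators are methods: infix form and explicit method call are equivalent.
    val infix = 1 + 2
    val explicit = (1).+(2)
    val total = Money(150) + Money(250)
    val cheaper = Money(150) < Money(250)

    // Type inference: no type annotations, yet every val is statically typed.
    val thing = new Thing          // inferred as Thing
    val names = List("a", "b")     // inferred as List[String]

    println(s"$text $bigger $infix $explicit $total $cheaper ${names.length}")
  }
}
```

Because the inferred types are checked at compile time, an assignment such as `val n: Int = names` would be rejected by the compiler, just as the equivalent mismatch would be in Java.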