Recommender Systems with Ruby (adding machine learning, statistics, etc)
1. Ruby in the world of
recommendations
(also machine learning, statistics and visualizations..)
Marcel Caraciolo
@marcelcaraciolo
Developer, Scientist, contributor to the Crab recsys project,
has worked with Python for 6 years, interested in mobile,
education, machine learning and dataaaaa!
Recife, Brazil - http://aimotion.blogspot.com
Saturday, September 14, 2013
2. MAKE BACKUPS!
NEVER:
find . -type f -not -name '*pyc' | xargs rm
4. Where is Ruby?
Presentation & Visualization
Experimentation
(Re-Design)
Data Acquisition
Data Analysis
8. Where is Ruby?
Python was released in 1991; Ruby was
released in 1995
Python was widely adopted and
promoted by most of Google's research and
development team
9. Where is Ruby?
Python was released in 1991; Ruby was released in 1995
Python became widely popular after much of Google's research
team officially adopted it
"Python has been an important
part of Google since the beginning,
and remains so as our infrastructure
grows; we are always
looking for more people with
skills in this language."
Peter Norvig, Google, Inc.
10. Where is Ruby?
Python was already showing up in some older
scientific articles
11. Where is Ruby?
Ruby’s popularity exploded in 2004,
with a focus on the web.
Python:
Django - 2005; NumPy - 2005;
BioPython - 2001; SAGE - 2005;
Matplotlib - 2003
12. Where is Ruby?
Programming comes second for researchers, not
first as it does for us. - “Ruby developer answer”
Python
[(x, x*x) for x in [1,2,3,4] if x != 3]
vs Ruby
`[1,2,3,4].map { |x| [x, x*x] if x != 3 }.compact`
Result
[(1,1), (2,4), (4,16)]
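A minimal sketch of how this comparison plays out in Ruby: `map` with a trailing `if` yields `nil` for the skipped element, so matching the Python comprehension takes a `compact` afterwards (or a `reject` before the `map`). The variable names here are illustrative only.

```ruby
numbers = [1, 2, 3, 4]

# map keeps one slot per input element, so the filtered-out 3 becomes nil
with_nil = numbers.map { |x| [x, x * x] if x != 3 }

# compact drops the nil and matches the Python comprehension
squares = with_nil.compact

# filtering first avoids creating the nil at all
squares_alt = numbers.reject { |x| x == 3 }.map { |x| [x, x * x] }

p with_nil     # => [[1, 1], [2, 4], nil, [4, 16]]
p squares      # => [[1, 1], [2, 4], [4, 16]]
p squares_alt  # => [[1, 1], [2, 4], [4, 16]]
```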
22. Data Visualization
require 'rsruby'
# R commands that write a box plot to a PDF file
cmd = %Q(
  pdf(file = "r_directly.pdf")
  boxplot(c(1,2,3,4),c(5,6,7,8))
  dev.off()
)
# evaluate the R commands through RSRuby
RSRuby.instance.eval_R(cmd)
# pipe plotting commands to the gnuplot executable
def gnuplot(commands)
  IO.popen("gnuplot", "w") { |io| io.puts commands }
end
commands = %Q(
  set terminal svg
  set output "curves.svg"
  plot [-10:10] sin(x), atan(x), cos(atan(x))
)
gnuplot(commands)
http://effectif.com/ruby/manor/data-visualisation-with-ruby
https://github.com/glejeune/Ruby-Graphviz/
23. Other tools
•BioRuby
#!/usr/bin/env ruby
require 'bio'
# create a DNA sequence object from a String
dna = Bio::Sequence::NA.new("atcggtcggctta")
# create a RNA sequence object from a String
rna = Bio::Sequence::NA.new("auugccuacauaggc")
# create a Protein sequence from a String
aa = Bio::Sequence::AA.new("AGFAVENDSA")
# you can check if the sequence contains illegal characters
# that is not an accepted IUB character for that symbol
# (should prepare a Bio::Sequence::AA#illegal_symbols method also)
puts dna.illegal_bases
# translate and concatenate a DNA sequence to Protein sequence
newseq = aa + dna.translate
puts newseq # => "AGFAVENDSAIGRL"
http://bioruby.org/
24. Other tools
•RubyDoop (uses JRuby)
module WordCount
  class Reducer
    def reduce(key, values, context)
      sum = 0
      values.each { |value| sum += value.get }
      context.write(key, Hadoop::Io::IntWritable.new(sum))
    end
  end
end
module WordCount
  class Mapper
    def map(key, value, context)
      value.to_s.split.each do |word|
        # strip non-word characters before counting
        word.gsub!(/\W/, '')
        word.downcase!
        unless word.empty?
          context.write(Hadoop::Io::Text.new(word), Hadoop::Io::IntWritable.new(1))
        end
      end
    end
  end
end
https://github.com/iconara/rubydoop
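To see what the mapper and reducer compute without a Hadoop cluster, here is a plain-Ruby sketch of the same word count. It does not use RubyDoop at all; `map_words`, `reduce_counts` and `word_count` are hypothetical helper names chosen for illustration.

```ruby
# Map step: emit a [word, 1] pair for every non-empty, normalized word.
def map_words(line)
  line.to_s.split
      .map { |word| word.gsub(/\W/, '').downcase }
      .reject(&:empty?)
      .map { |word| [word, 1] }
end

# Reduce step: sum the counts emitted for a single key.
def reduce_counts(values)
  values.inject(0) { |sum, value| sum + value }
end

# Driver: group the mapped pairs by word (the "shuffle"), then reduce each group.
def word_count(lines)
  pairs = lines.flat_map { |line| map_words(line) }
  grouped = pairs.group_by { |word, _count| word }
  grouped.each_with_object({}) do |(word, word_pairs), counts|
    counts[word] = reduce_counts(word_pairs.map { |_word, count| count })
  end
end

counts = word_count(["Ruby, ruby and Hadoop!", "hadoop loves RUBY"])
counts.each { |word, total| puts "#{word}: #{total}" }
```

In this sketch the grouping step stands in for the shuffle phase that Hadoop performs between the mapper and the reducer.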
25. Coming back to the
world of recommenders
The world is an over-crowded place
29. And how does it work?
30. What do recommenders really do?
1. Predict how much you may like a certain
product or service.
2. Suggest a list of N items ordered by your level
of interest.
3. Suggest a list of N users for a product or
service.
4. Explain why those items were
recommended to you.
5. Adjust predictions and recommendations
based on your feedback and that of others.
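As a rough illustration of the first two items (predict a score per item, then return the top N), here is a small Ruby sketch. The scores are invented numbers and `top_n` is a hypothetical helper; a real recommender would compute the predictions from user data.

```ruby
# Hypothetical predicted scores (0.0 to 1.0) for one user; the values are invented.
predicted = {
  "Die Hard"   => 0.91,
  "Toy Story"  => 0.42,
  "Armageddon" => 0.77,
  "Thor"       => 0.65
}

# Return the n items with the highest predicted score, best first.
def top_n(scores, n)
  scores.sort_by { |_item, score| -score }.first(n).map { |item, _score| item }
end

p top_n(predicted, 2)  # => ["Die Hard", "Armageddon"]
```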
31. Content Based Filtering
[Diagram] Users: Marcel. Items: Gone with the Wind, Die Hard, Armageddon, Toy Story.
Marcel likes some items; the system recommends items similar to them.
32. Problems with Content
Recommenders
1. Restricted data analysis
- Items and users are poorly described. Even worse for audio and images
2. Specialized data
- A person who has no experience with sushi never gets
a recommendation for the best sushi in town.
3. Portfolio effect
- Just because I watched one Xuxa movie as a child, it keeps
recommending all of her movies (só para baixinhos!)
33. Collaborative Filtering
[Diagram] Users: Marcel, Rafael, Amanda. Items: Gone with the Wind, Thor, Armageddon, Toy Story.
Users who like the same items are similar; items liked by similar users are recommended.
34. Problems with Collaborative Filtering
1. Scalability
- Amazon with 5M users, 50K items, 1.4B ratings
2. Sparse Data
- I only rated one book at Amazon!
3. Cold Start
- New users and items with no records
4. Popularity
- Everyone reads Harry Potter!
5. Hacking
- The person who reads ‘Harry Potter’ also reads ‘Kama Sutra’
35. How does it show up?
Highlights
More about this artist...
Listen to similar songs
Someone similar to you also liked this...
Since you listened to this, you may like this one...
These items go together...
The most popular in your group...
New Releases
36. Recommendable
Quickly add a recommender engine for Likes and
Dislikes to your Ruby app
http://davidcel.is/recommendable/
38. Recommendable
Add to your Gemfile:
gem 'recommendable'
39. Recommendable
Create a configuration initializer:
require 'redis'
Recommendable.configure do |config|
  # Recommendable's connection to Redis
  config.redis = Redis.new(:host => 'localhost', :port => 6379, :db => 0)
  # A prefix for all keys Recommendable uses
  config.redis_namespace = :recommendable
  # Whether or not to automatically enqueue users to have their recommendations
  # refreshed after they like/dislike an item
  config.auto_enqueue = true
  # The name of the queue that background jobs will be placed in
  config.queue_name = :recommendable
  # The number of nearest neighbors (k-NN) to check when updating
  # recommendations for a user. Set to `nil` if you want to check all
  # other users as opposed to a subset of the nearest ones.
  config.nearest_neighbors = nil
end
40. Recommendable
In your ONE model that will be receiving the
recommendations:
class User
  recommends :movies, :books, :minerals, :other_things
  # ...
end
43. Recommendable
You can also like your recommendable objects
>> user.like(movie)
=> true
>> user.likes?(movie)
=> true
>> user.rated?(movie)
=> true # also true if user.dislikes?(movie)
>> user.liked_movies
=> [#<Movie id: 23, name: "2001: A Space Odyssey">]
>> user.liked_movie_ids
=> ["23"]
>> user.like(book)
=> true
>> user.likes
=> [#<Movie id: 23, name: "2001: A Space Odyssey">, #<Book id: 42, title: "100 Years of Solitude">]
>> user.likes_count
=> 2
>> user.liked_movies_count
=> 1
>> user.likes_in_common_with(friend)
=> [#<Movie id: 23, name: "2001: A Space Odyssey">, #<Book id: 42, title: "100 Years of Solitude">]
>> user.liked_movies_in_common_with(friend)
=> [#<Movie id: 23, name: "2001: A Space Odyssey">]
>> movie.liked_by_count
=> 2
>> movie.liked_by
=> [#<User username: 'davidbowman'>, #<User username: 'frankpoole'>]
44. Recommendable
Obviously, you can also DISLIKE your recommendable
objects
>> user.dislike(movie)
>> user.dislikes?(movie)
>> user.disliked_movies
>> user.disliked_movie_ids
>> user.dislikes
>> user.dislikes_count
>> user.disliked_movies_count
>> user.dislikes_in_common_with(friend)
>> user.disliked_movies_in_common_with(friend)
>> movie.disliked_by_count
>> movie.disliked_by
45. Recommendable
Recommendations
>> friend.like(Movie.where(:name => "2001: A Space Odyssey").first)
>> friend.like(Book.where(:title => "A Clockwork Orange").first)
>> friend.like(Book.where(:title => "Brave New World").first)
>> friend.like(Book.where(:title => "One Flew Over the Cuckoo's Nest").first)
>> user.like(Book.where(:title => "A Clockwork Orange").first)
>> user.recommended_books # Defaults to 10 recommendations
=> [#<Book title: "Brave New World">, #<Book title: "One Flew Over the Cuckoo's Nest">]
>> user.similar_raters # Defaults to 10 similar users
=> [#<User username: "frankpoole">, #<User username: "davidbowman">, ...]
>> user.recommended_movies(10, 30) # 10 recommendations, offset by 30 (i.e. page 4)
=> [#<Movie name: "A Clockwork Orange">, #<Movie name: "Chinatown">, ...]
>> user.similar_raters(25, 50) # 25 similar users, offset by 50 (i.e. page 3)
=> [#<User username: "frankpoole">, #<User username: "davidbowman">, ...]
46. Recommendable
Jaccard Similarity
Marcel likes A, B, C and dislikes D
Amanda likes A, B and dislikes C
Guilherme likes C, D and dislikes A
Flavio likes B, C, E and dislikes D
J(Marcel, Amanda) =
([A,B].size + [].size - [C].size - [].size) / [A,B,C,D].size
J(Marcel, Amanda) =
(2 + 0 - 1 - 0) / 4 = 1/4 = 0.25
47. Recommendable
Jaccard Similarity
Marcel likes A, B, C and dislikes D
Amanda likes A, B and dislikes C
Guilherme likes C, D and dislikes A
Flavio likes B, C, E and dislikes D
J(Marcel, Guilherme) =
([C].size + [].size - [A].size - [D].size) / [A,B,C,D].size
J(Marcel, Guilherme) =
(1 + 0 - 1 - 1) / 4 = -1/4 = -0.25
48. Recommendable
Jaccard Similarity
Marcel likes A, B, C and dislikes D
Amanda likes A, B and dislikes C
Guilherme likes C, D and dislikes A
Flavio likes B, C, E and dislikes D
J(Marcel, Flavio) =
([B,C].size + [D].size - [].size - [].size) / [A,B,C,D,E].size
J(Marcel, Flavio) =
(2 + 1 - 0 - 0) / 5 = 3/5 = 0.6
49. Recommendable
Jaccard Similarity
MostSimilar(Marcel) = [ (Flavio, 0.6) , (Amanda, 0.25) , (Guilherme, -0.25)]
Marcel likes A, B, C and dislikes D
Amanda likes A, B and dislikes C
Guilherme likes C, D and dislikes A
Flavio likes B, C, E and dislikes D
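The similarity on these slides can be sketched in plain Ruby with sets: count the agreements (both like, both dislike), subtract the disagreements, and divide by the number of items either user has rated. Working it through for Flavio gives (2 + 1) / 5 = 0.6, since both he and Marcel dislike D. The `similarity` and `most_similar` helpers are illustrative names, and this is a sketch of the formula as written here, not Recommendable's actual code.

```ruby
require 'set'

# Likes and dislikes copied from the slides.
ratings = {
  "Marcel"    => { likes: Set[:A, :B, :C], dislikes: Set[:D] },
  "Amanda"    => { likes: Set[:A, :B],     dislikes: Set[:C] },
  "Guilherme" => { likes: Set[:C, :D],     dislikes: Set[:A] },
  "Flavio"    => { likes: Set[:B, :C, :E], dislikes: Set[:D] }
}

# Agreements (both like, both dislike) minus disagreements,
# divided by the number of items either user has rated.
def similarity(a, b)
  agreements    = (a[:likes] & b[:likes]).size + (a[:dislikes] & b[:dislikes]).size
  disagreements = (a[:likes] & b[:dislikes]).size + (a[:dislikes] & b[:likes]).size
  rated = a[:likes] | a[:dislikes] | b[:likes] | b[:dislikes]
  return 0.0 if rated.empty?
  (agreements - disagreements).to_f / rated.size
end

# Every other user paired with their similarity to `name`, most similar first.
def most_similar(name, ratings)
  ratings.reject { |other, _| other == name }
         .map { |other, rating| [other, similarity(ratings[name], rating)] }
         .sort_by { |_other, score| -score }
end

most_similar("Marcel", ratings).each { |other, score| puts "#{other}: #{score}" }
# Flavio: 0.6
# Amanda: 0.25
# Guilherme: -0.25
```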
50. Recommendable
Recommendations
The best of your recommendable models
Wilson score confidence - Reddit algorithm
>> Movie.top
=> #<Movie name: "2001: A Space Odyssey">
>> Movie.top(3)
=> [#<Movie name: "2001: A Space Odyssey">, #<Movie name: "A Clockwork Orange">, #<Movie name: "The Shining">]
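The Wilson score lower bound ranks items by a pessimistic estimate of their like ratio, so an item with 1 like and 0 dislikes does not outrank one with 95 likes and 5 dislikes. Below is a minimal sketch of the standard formula. The vote counts are invented, and `wilson_lower_bound` is an illustrative helper that is not necessarily how Recommendable implements `top`.

```ruby
# Lower bound of the Wilson score interval for a like/dislike ratio.
# z = 1.96 corresponds to roughly 95% confidence.
def wilson_lower_bound(likes, dislikes, z = 1.96)
  n = (likes + dislikes).to_f
  return 0.0 if n.zero?
  phat = likes / n
  centre = phat + z * z / (2 * n)
  margin = z * Math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
  (centre - margin) / (1 + z * z / n)
end

# Invented vote counts: [likes, dislikes]
votes = {
  "The Shining"           => [1, 0],
  "2001: A Space Odyssey" => [95, 5],
  "A Clockwork Orange"    => [10, 2]
}

# Highest lower bound first.
ranked = votes.sort_by { |_name, (up, down)| -wilson_lower_bound(up, down) }.map(&:first)
p ranked  # => ["2001: A Space Odyssey", "A Clockwork Orange", "The Shining"]
```

Sorting by the raw ratio instead would put "The Shining" first on the strength of a single vote.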
51. Recommendable
Callbacks
Uses apotonick/hooks to implement callbacks for liking,
disliking, etc.
class User < ActiveRecord::Base
  has_one :feed
  recommends :movies
  after_like :update_feed
  def update_feed(obj)
    feed.update "liked #{obj.name}"
  end
end
53. Redis makes the magic happen!
Manual recommendations
55. Recommendable
Recommendations over a queueing system
Put the workers to work! (Sidekiq, Resque, DelayedJob)
module Recommendable
  module Workers
    class Resque
      include ::Resque::Plugins::UniqueJob if defined?(::Resque::Plugins::UniqueJob)
      @queue = :recommendable
      def self.perform(user_id)
        Recommendable::Helpers::Calculations.update_similarities_for(user_id)
        Recommendable::Helpers::Calculations.update_recommendations_for(user_id)
      end
    end
  end
end