Big Data with Cassandra and Celery
#bbmnk November 2013

Santi Camps Taltavull
@santicamps
@socialvane
The Problem (Big Data)

➲ Large volume of information (terabytes)
➲ Unstructured information
➲ Low density of useful information
➲ Very high processing capacity required
➲ Small budget
The Solutions Applied

➲ Distributed database: Cassandra
➲ Distributed task manager: Celery
➲ Message broker: RabbitMQ
➲ Application --> RabbitMQ --> Celery <--> Cassandra
➲ 4 initial servers
➲ 12 TB of capacity
➲ 208 GB of RAM
➲ 44 CPU cores
➲ Fault tolerant
➲ Redundant
➲ Very easily scalable
➲ And cheap!
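As a rough worked example of the cluster figures above, the following sketch computes per-server averages and the usable disk capacity once every row is stored several times. The replication factor of 3 is an assumption for illustration (the slides do not state it), and the averages assume nothing about whether the four servers are identical.

```python
# Cluster totals quoted on the slide.
SERVERS = 4
DISK_TB = 12
RAM_GB = 208
CPU_CORES = 44


def per_server(total, servers=SERVERS):
    """Average share of a cluster-wide resource per server."""
    return total / servers


def usable_capacity_tb(raw_tb, replication_factor):
    """Raw capacity divided by the number of copies kept of each row."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    return raw_tb / replication_factor


if __name__ == "__main__":
    print(per_server(DISK_TB), per_server(RAM_GB), per_server(CPU_CORES))
    # replication_factor=3 is an assumed value, not taken from the slides.
    print(usable_capacity_tb(DISK_TB, 3))
```

The estimate ignores operational overhead such as free space needed for compaction, so real usable capacity would be lower.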
Cassandra

➲ Born inside Facebook and then released as open source
➲ Adopted by the Apache Software Foundation
➲ Twitter uses it too
➲ Written in Java
➲ A NoSQL database
➲ Data is stored as key -> value
Cassandra - Advantages

➲ Distributed database
➲ Configurable redundancy
➲ Fault tolerant
➲ WAN-ready
➲ Fully scalable
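To make "configurable redundancy" concrete, here is a simplified sketch of a token ring in which each row is stored on several consecutive nodes. It is not Cassandra's implementation: the node names are hypothetical, node tokens are derived by hashing the node name (real clusters assign tokens explicitly), and the placement rule is the simplest "walk the ring clockwise" strategy.

```python
import hashlib
from bisect import bisect_left


def token(key: str) -> int:
    """MD5-based token, in the spirit of RandomPartitioner (simplified)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: each row lives on `replication_factor` nodes."""

    def __init__(self, nodes, replication_factor):
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication_factor must be between 1 and len(nodes)")
        self.replication_factor = replication_factor
        # Assumption: node tokens come from hashing the node name.
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, row_key):
        """First node whose token is >= the key's token, then its successors."""
        start = bisect_left(self._tokens, token(row_key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


if __name__ == "__main__":
    ring = Ring(["node1", "node2", "node3", "node4"], replication_factor=3)
    print(ring.replicas("facebook.santi.camps.58"))
```

With a replication factor of 3 on 4 nodes, any single node can fail and every row still has two live replicas, which is the sense in which the cluster is fault tolerant.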
Cassandra - Disadvantages

➲ No transaction management
➲ Coordinated with timestamps
➲ With RandomPartitioner, ordering by row key is not possible
➲ With RandomPartitioner, filtering becomes difficult
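The "coordinated with timestamps" point can be illustrated with a last-write-wins merge: when replicas disagree about a column, the version with the highest timestamp is kept. This is a minimal sketch of the idea, not Cassandra's code; the second replica's values are invented for the example and ties between equal timestamps are not modelled.

```python
def merge_rows(*replica_rows):
    """Merge replica copies of one row, keeping the newest version of each column.

    Each row maps column name -> (value, timestamp in microseconds).
    """
    merged = {}
    for row in replica_rows:
        for name, (value, timestamp) in row.items():
            if name not in merged or timestamp > merged[name][1]:
                merged[name] = (value, timestamp)
    return merged


if __name__ == "__main__":
    replica_a = {"followers_count": ("0", 1383782405981374)}
    # Hypothetical newer write that only reached the second replica.
    replica_b = {
        "followers_count": ("5", 1383782405981999),
        "lang": ("en_GB", 1383782405981374),
    }
    print(merge_rows(replica_a, replica_b))
```

Because there are no transactions, two clients with skewed clocks can silently overwrite each other, so writers need reasonably synchronized clocks.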
Cassandra - Characteristics

➲ Keyspace = database
➲ Column family = table
➲ Each row can have different columns
➲ A row can have millions of columns
➲ Everything is stored as key -> value
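The data model above can be pictured as nested dictionaries: a column family maps row keys to rows, and each row maps column names to a value plus a write timestamp. The class below is an in-memory sketch of that picture only; its name and methods are illustrative and do not correspond to any Cassandra client API.

```python
import time


class ColumnFamily:
    """In-memory stand-in for a column family: row key -> {column -> (value, ts)}."""

    def __init__(self, name):
        self.name = name
        self._rows = {}

    def insert(self, row_key, columns, timestamp=None):
        """Write columns into a row; rows need not share the same columns."""
        if timestamp is None:
            timestamp = int(time.time() * 1_000_000)  # microseconds
        row = self._rows.setdefault(row_key, {})
        for name, value in columns.items():
            row[name] = (value, timestamp)

    def get(self, row_key):
        """Return (name, value, timestamp) tuples sorted by column name."""
        row = self._rows.get(row_key, {})
        return [(name, *row[name]) for name in sorted(row)]


if __name__ == "__main__":
    user_item = ColumnFamily("user_item")
    user_item.insert(
        "facebook.santi.camps.58",
        {"source": "facebook", "name": "Santi Camps"},
        timestamp=1383782405981374,
    )
    for column in user_item.get("facebook.santi.camps.58"):
        print(column)
```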
Cassandra - Example
create column family user_item
with key_validation_class = 'UTF8Type'
and comparator = 'UTF8Type'
and default_validation_class = 'UTF8Type'
and column_metadata=[
{column_name: source, validation_class: UTF8Type, index_type: KEYS},
{column_name: user_name, validation_class: UTF8Type, index_type: KEYS},
{column_name: type, validation_class: UTF8Type},
{column_name: last_update, validation_class: UTF8Type},
{column_name: id, validation_class: UTF8Type},
{column_name: profile_image_url, validation_class: UTF8Type},
{column_name: name, validation_class: UTF8Type},
{column_name: friends_count, validation_class: UTF8Type},
{column_name: followers_count, validation_class: UTF8Type},
{column_name: location, validation_class: UTF8Type},
{column_name: description, validation_class: UTF8Type},
{column_name: lang, validation_class: UTF8Type},
{column_name: geo_latitude, validation_class: FloatType, index_type: KEYS},
{column_name: geo_longitude, validation_class: FloatType, index_type: KEYS},
{column_name: geo_radious, validation_class: FloatType},
];
Cassandra - Example
get user_item['facebook.santi.camps.58']
... ;
=> (name=description, value=Me dedico a ..., timestamp=1383782405981374)
=> (name=followers_count, value=0, timestamp=1383782405981374)
=> (name=friends_count, value=, timestamp=1383782405981374)
=> (name=geo_latitude, value=4.264729, timestamp=1383782405981374)
=> (name=geo_longitude, value=39.88943, timestamp=1383782405981374)
=> (name=geo_radious, value=8.976159, timestamp=1383782405981374)
=> (name=id, value=100000444843078, timestamp=1383782405981374)
=> (name=lang, value=en_GB, timestamp=1383782405981374)
=> (name=last_update, value=2013-11-07T01:00:05.981352, timestamp=1383782405981374)
=> (name=location, value=Mahón, Islas Baleares, Spain, timestamp=1383782405981374)
=> (name=name, value=Santi Camps, timestamp=1383782405981374)
=> (name=profile_image_url, value=https://graph.facebook.com/santi.camps.58/picture,
timestamp=1383782405981374)
=> (name=profile_url, value=https://www.facebook.com/santi.camps.58,
timestamp=1383782405981374)
=> (name=source, value=facebook, timestamp=1383782405981374)
=> (name=type, value=user, timestamp=1383782405981374)
=> (name=user_name, value=santi.camps.58, timestamp=1383782405981374)
Cassandra - Indexing
get user_follower_index['santicamps58.facebook.current'];
=> (name=2013-10-29T11:09:01.979083, value=santicamps58.facebook.100000561127539,
timestamp=1381823950979106)
=> (name=2013-10-27T09:59:07.980314, value=santicamps58.facebook.1810751517,
timestamp=1381823950980330)
=> (name=2013-10-11T07:50:10.980547, value=santicamps58.facebook.100002326398873,
timestamp=1381823950980559)
...
get user_follower_item['santicamps58.facebook.100002326398873'];
=> (name=fetch_date, value=2013-10-15, timestamp=1381823950980662)
=> (name=friend_count, value=134, timestamp=1381823950980662)
=> (name=id, value=100002326398873, timestamp=1381823950980662)
=> (name=lang, value=, timestamp=1381823950980662)
=> (name=name, value=Diego Izquierdo Carranza, timestamp=1381823950980662)
=> (name=profile_image_url, value=https://graph.facebook.com/diego.izquierdocarranza/picture,
timestamp=1381823950980662)
=> (name=profile_url, value=https://www.facebook.com/diego.izquierdocarranza,
timestamp=1381823950980662)
=> (name=source, value=facebook, timestamp=1381823950980662)
=> (name=start_date, value=2013-10-15, timestamp=1381823950980662)
=> (name=user_name, value=diego.izquierdocarranza, timestamp=1381823950980662)
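The two queries above suggest a hand-built index: one wide row whose column names are timestamps and whose values are the row keys of the follower items. The sketch below models that pattern with plain dictionaries and shows a range read over the column names. The helper names are hypothetical, and it assumes the column names sort chronologically (true for ISO 8601 strings compared as text).

```python
from bisect import bisect_left, bisect_right

# Sample data copied from the slide output.
user_follower_index = {
    "santicamps58.facebook.current": {
        "2013-10-29T11:09:01.979083": "santicamps58.facebook.100000561127539",
        "2013-10-27T09:59:07.980314": "santicamps58.facebook.1810751517",
        "2013-10-11T07:50:10.980547": "santicamps58.facebook.100002326398873",
    }
}
user_follower_item = {
    "santicamps58.facebook.100002326398873": {
        "name": "Diego Izquierdo Carranza",
        "friend_count": "134",
    }
}


def column_slice(index_row, start, finish):
    """Return (name, value) pairs with start <= name <= finish, in name order."""
    names = sorted(index_row)
    lo = bisect_left(names, start)
    hi = bisect_right(names, finish)
    return [(name, index_row[name]) for name in names[lo:hi]]


def followers_between(index_key, start, finish):
    """Read the index row slice, then fetch each referenced item row."""
    columns = column_slice(user_follower_index.get(index_key, {}), start, finish)
    return [user_follower_item.get(item_key, {}) for _, item_key in columns]


if __name__ == "__main__":
    print(followers_between("santicamps58.facebook.current", "2013-10-01", "2013-10-15"))
```

The index read costs one row lookup plus one lookup per item, which is how ordered access is recovered even though the partitioner does not keep row keys in order.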
Cassandra - Indexing
get mention_tag_source_index['803.possitive'];
...
=> (name=2013-11-08T02:00:27.361445, value=803__-UzkY7psQTYJ,
timestamp=1383876396514768)
=> (name=2013-11-08T06:53:57, value=803__twitter.398704931630481408,
timestamp=1383894677856944)
=> (name=2013-11-08T06:54:38, value=803__twitter.398705100648382464,
timestamp=1383894677646453)
=> (name=2013-11-08T06:57:51, value=803__twitter.398705909511503872,
timestamp=1383894677313681)
...
get mention_tag_source_index['803.possitive.google'];
=> (name=2012-12-01T00:00:00.395260, value=803__YfOIKwVseDkJ,
timestamp=1381830781423739)
=> (name=2012-12-01T00:00:00.420936, value=803__YfOIKwVseDkJ,
timestamp=1381867147942586)
=> (name=2012-12-01T00:00:00.633055, value=803__YfOIKwVseDkJ,
timestamp=1381830436666804)
=> (name=2013-06-14T00:00:00.055140, value=803__5Bv2Eu9qk04J,
timestamp=1381867142254676)
Cassandra - Indexing
get mention_item['803__twitter.398705909511503872'];
=> (name=body, value=@SocialVane INTERESANTÍSIMA HERRAMIENTA DE ANÁLISIS PARA
REDES SOCIALES, timestamp=1383894677307778)
=> (name=body_norm, value=your_brand interesante herramienta analisis red your_brand,
timestamp=1383894677307778)
=> (name=brand, value=103, timestamp=1383894677307778)
=> (name=checked, value=false, timestamp=1383894677307778)
=> (name=emissor, value=SebastianCamps, timestamp=1383894677307778)
=> (name=emissor_id, value=234140801, timestamp=1383894677307778)
=> (name=emissor_name, value=Sebastián Camps , timestamp=1383894677307778)
=> (name=geo, value=None, timestamp=1383894677307778)
=> (name=id, value=398705909511503872, timestamp=1383894677307778)
=> (name=in_reply_to_id, value=, timestamp=1383894677307778)
=> (name=interest, value=, timestamp=1383894677307778)
=> (name=interest_checked, value=False, timestamp=1383894677307778)
=> (name=lang, value=es, timestamp=1383894677307778)
=> (name=like_action_count, value=0, timestamp=1383894677307778)
=> (name=probability, value=0.482361909795, timestamp=1383894677307778)
=> (name=query, value=803, timestamp=1383894677307778)
=> (name=reply_action_count, value=0, timestamp=1383894677307778)
=> (name=retweeted, value=False, timestamp=1383894677307778)
=> (name=share_action_count, value=0, timestamp=1383894677307778)
=> (name=source, value=twitter, timestamp=1383894677307778)
=> (name=tag, value=possitive, timestamp=1383894677307778)
=> (name=time, value=2013-11-08T06:57:51, timestamp=1383894677307778)
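Reading the index rows and the item row together, the write path appears to store each mention once and then add a pointer to it in every index row it belongs to. The sketch below reproduces that fan-out with dictionaries. The key layout (`query.tag` and `query.tag.source`) is inferred from the sample output, and the function names are hypothetical.

```python
from collections import defaultdict

mention_item = {}                             # item key -> columns
mention_tag_source_index = defaultdict(dict)  # index key -> {time -> item key}


def index_keys(query, tag, source):
    """Index rows a mention belongs to (layout inferred from the slides)."""
    return [f"{query}.{tag}", f"{query}.{tag}.{source}"]


def save_mention(item_key, columns):
    """Store the item, then add one index column per index row."""
    mention_item[item_key] = dict(columns)
    for key in index_keys(columns["query"], columns["tag"], columns["source"]):
        mention_tag_source_index[key][columns["time"]] = item_key


def mentions_for(index_key):
    """Items referenced by an index row, oldest first."""
    row = mention_tag_source_index.get(index_key, {})
    return [mention_item[row[when]] for when in sorted(row)]


if __name__ == "__main__":
    save_mention(
        "803__twitter.398705909511503872",
        {"query": "803", "tag": "possitive", "source": "twitter",
         "time": "2013-11-08T06:57:51", "lang": "es"},
    )
    print(mentions_for("803.possitive.twitter"))
```

Since there are no transactions, the item write and the index writes are separate operations; a failure between them can leave an index entry missing, so readers and writers should tolerate that.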
Celery

➲ Execution queues are configured
➲ N workers are started on M machines, listening on each queue
➲ Distributable tasks are marked in the code
➲ Each task is assigned its execution queue
➲ Tasks can be called synchronously or asynchronously

➲ Very simple to deploy
➲ Very easy to scale
➲ Concurrency must be watched carefully
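The queue-and-worker model described above can be imitated with the standard library alone. The sketch below is a stand-in, not Celery: `MiniBroker` plays the role of the message broker, threads play the role of workers, and a `Future` plays the role of the result handle. All names, including the `count_words` task, are hypothetical.

```python
import queue
import threading
from concurrent.futures import Future


class MiniBroker:
    """Named FIFO queues, standing in for the message broker."""

    def __init__(self, queue_names):
        self.queues = {name: queue.Queue() for name in queue_names}

    def send(self, queue_name, func, args):
        """Enqueue a call and return a handle to its eventual result."""
        result = Future()
        self.queues[queue_name].put((func, args, result))
        return result


def worker(task_queue):
    """Consume tasks forever; a None item stops the worker."""
    while True:
        item = task_queue.get()
        if item is None:
            break
        func, args, result = item
        try:
            result.set_result(func(*args))
        except Exception as exc:  # report the failure to the caller
            result.set_exception(exc)


def start_workers(broker, workers_per_queue):
    """Start N daemon worker threads for each named queue."""
    threads = []
    for name, count in workers_per_queue.items():
        for _ in range(count):
            thread = threading.Thread(
                target=worker, args=(broker.queues[name],), daemon=True
            )
            thread.start()
            threads.append(thread)
    return threads


def count_words(text):
    """Hypothetical task body."""
    return len(text.split())


broker = MiniBroker(["slow", "cpu"])
start_workers(broker, {"slow": 2, "cpu": 1})

# Asynchronous call: returns immediately with a handle.
pending = broker.send("slow", count_words, ("big data with cassandra",))
# Synchronous call: block until the worker has finished.
total = broker.send("cpu", count_words, ("celery",)).result(timeout=5)
```

With several workers on one queue, tasks may run in parallel and finish out of order, so any state they share (for example the same Cassandra row) needs the care the last bullet warns about.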
Celery - Example

# Celery 3.x-era import; the slide omits it (assumed)
from celery.task import task

# Route each task, by name, to a queue
CELERY_ROUTES = {
    "celeryutils.track_all_users_followers": {"queue": "slow", "routing_key": "slow_task"},
    "userfollowers.bulk_insert": {"queue": "slow", "routing_key": "slow_task"},
    "extract_mentions_from_website": {"queue": "slow", "routing_key": "slow_task"},
    "LeadsClassifier.classify_untagged": {"queue": "cpu", "routing_key": "cpu_task"},
    # ...
}

@task(name='extract_mentions_from_website', time_limit=300)
def extract_mentions_from_website(brand, query, ...):
    ...

# LOCAL CALL
extract_mentions_from_website(params)
# DISTRIBUTED ASYNCHRONOUS CALL
extract_mentions_from_website.delay(params)
# DISTRIBUTED SYNCHRONOUS CALL
extract_mentions_from_website.delay(params).get()
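To show what the routing table is for, the sketch below resolves a task name to its queue and groups tasks by queue. It only mimics the lookup with a dictionary and is not Celery's router. The fallback route uses "celery", which is Celery's documented default queue name; the helper names are hypothetical.

```python
# Two of the routes from the slide.
CELERY_ROUTES = {
    "extract_mentions_from_website": {"queue": "slow", "routing_key": "slow_task"},
    "LeadsClassifier.classify_untagged": {"queue": "cpu", "routing_key": "cpu_task"},
}

# Fallback for tasks without an explicit route.
DEFAULT_ROUTE = {"queue": "celery", "routing_key": "celery"}


def resolve_route(task_name, routes=CELERY_ROUTES, default=DEFAULT_ROUTE):
    """Return a copy of the route for a task name, or the default route."""
    return dict(routes.get(task_name, default))


def tasks_by_queue(routes=CELERY_ROUTES):
    """Group task names by the queue they are routed to."""
    grouped = {}
    for task_name, route in routes.items():
        grouped.setdefault(route["queue"], []).append(task_name)
    return grouped


if __name__ == "__main__":
    print(resolve_route("extract_mentions_from_website"))
    print(resolve_route("some.unrouted.task"))
    print(tasks_by_queue())
```

Separating slow, I/O-bound tasks from CPU-bound ones lets each queue get its own number of workers, so a backlog of slow tasks does not starve the classifier.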
