Some insights into the data curation processes at SpazioDati: how we use Big Data tools and Linked Data technologies to build our products, Dandelion API (dandelion.eu) and Atoka (atoka.io).
5. Data curation is the process of turning independently created data sources (structured and semi-structured data) into unified data sets ready for analytics, using domain experts to guide the process.
http://strataconf.com/stratany2014/public/schedule/detail/36021
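As a rough illustration of the definition above, the following minimal sketch merges two independently created sources (one CSV, one JSON) into a single data set. All records, field names, and the `normalize_vat` rule are hypothetical and stand in for the knowledge a domain expert would supply; this is not SpazioDati's actual pipeline.

```python
import csv
import io
import json

# Two hypothetical, independently created sources describing the same companies.
REGISTRY_CSV = """vat_number,name
IT 0123 4567 890,Acme S.r.l.
it09876543210,Beta SpA
"""

WEB_JSON = """[
  {"vat": "01234567890", "website": "acme.example"},
  {"vat": "IT-09876543210", "website": "beta.example"}
]"""


def normalize_vat(raw):
    """Domain rule (assumed): keep digits only so differently formatted keys match."""
    return "".join(ch for ch in raw if ch.isdigit())


def curate(registry_csv, web_json):
    """Merge both sources into one record per normalized VAT number."""
    unified = {}
    for row in csv.DictReader(io.StringIO(registry_csv)):
        key = normalize_vat(row["vat_number"])
        unified.setdefault(key, {"vat": key})["name"] = row["name"]
    for item in json.loads(web_json):
        key = normalize_vat(item["vat"])
        unified.setdefault(key, {"vat": key})["website"] = item["website"]
    return unified


dataset = curate(REGISTRY_CSV, WEB_JSON)
```

The only "expert" input here is the normalization rule; in a real curation process that guidance would typically cover many more fields and edge cases.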
6. A lot of things involved
ETL (Extract-Transform-Load) tools
Data Science tools
Linked Data tools
Big Data tools
Domain Knowledge
15. Our Entity Extraction API is based on a graph
[Figure: graph whose nodes are the entities Brussels, Paris, Berlin, Eiffel Tower, 2009 World Championships in Athletics, King Baudouin Stadium and Champ de Mars, connected by weighted edges; the edge weights shown range from 0.42 to 0.80.]
https://dandelion.eu/docs/api/datatxt/nex/v1/
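One way a weighted entity graph can support entity extraction is by choosing, among several candidate entities for a mention, the one most related to entities already found in the text. The sketch below shows that idea only; it is an assumption about how such a graph might be used, not a description of the Dandelion API internals. The entity names come from the slide, but the pairing of weights to edges is illustrative, since the slide text does not say which weight belongs to which edge.

```python
# Undirected relatedness graph. Entity names are from the slide; the
# weight-to-edge pairings below are illustrative assumptions.
EDGES = {
    ("Paris", "Eiffel Tower"): 0.80,
    ("Paris", "Champ de Mars"): 0.63,
    ("Eiffel Tower", "Champ de Mars"): 0.59,
    ("Brussels", "King Baudouin Stadium"): 0.53,
    ("Berlin", "2009 World Championships in Athletics"): 0.53,
    ("Brussels", "Paris"): 0.42,
}


def relatedness(a, b):
    """Look up the edge weight in either direction; unrelated pairs score 0."""
    return EDGES.get((a, b), EDGES.get((b, a), 0.0))


def disambiguate(candidates, context):
    """Return the candidate with the highest total relatedness to the context entities."""
    return max(candidates, key=lambda c: sum(relatedness(c, e) for e in context))


# Hypothetical case: a mention could refer to either of two places, and the
# surrounding text has already been linked to Paris and the Eiffel Tower.
best = disambiguate(
    ["King Baudouin Stadium", "Champ de Mars"],
    context=["Paris", "Eiffel Tower"],
)
```

Here `best` is "Champ de Mars", because it is the only candidate with edges to the context entities in this toy graph.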
28. Search: how it works
1. Direct search for a specific company by its name or "partita IVA" (VAT number)
2. Content search across company websites [*]
3. Keyword search among entities extracted and refined from company resources [*]
[*] Dandelion API is the extraction engine!
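The three search modes on this slide can be sketched over a tiny in-memory data set. Everything below is hypothetical: the company records, field names, and function names are made up for illustration, and the `entities` sets are hard-coded where a real system would fill them with the output of an extraction engine.

```python
# Hypothetical company records; "entities" stands in for extracted entities.
COMPANIES = [
    {
        "name": "Acme S.r.l.",
        "vat": "01234567890",
        "website_text": "We build industrial robots in Trento.",
        "entities": {"Robotics", "Trento"},
    },
    {
        "name": "Beta SpA",
        "vat": "09876543210",
        "website_text": "Organic wine from Tuscany.",
        "entities": {"Wine", "Tuscany"},
    },
]


def direct_search(query):
    """Mode 1: find a company by exact VAT number or by part of its name."""
    q = query.strip().lower()
    return [c["name"] for c in COMPANIES if q == c["vat"] or q in c["name"].lower()]


def content_search(term):
    """Mode 2: match a term against the text of company websites."""
    t = term.lower()
    return [c["name"] for c in COMPANIES if t in c["website_text"].lower()]


def keyword_search(entity):
    """Mode 3: match an entity against the entities attached to each company."""
    return [c["name"] for c in COMPANIES if entity in c["entities"]]
```

A production search service would use a proper full-text index rather than linear scans; the point here is only the difference between looking up an identifier, matching raw text, and matching extracted entities.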
38. References
1) From raw data to dataGEMs for developers - http://ceur-ws.org/Vol-1268/paper1.pdf
2) Knowledge Graph ovunque (Knowledge Graphs everywhere) - http://www.slideshare.net/dagoneye/knowledge-graphs-ovunque-un-quadro-di-insieme-e-le-implicazioni-per-uno-sviluppo-condiviso-del-web-of-data
3) Linking Enterprise Data - https://www.springer.com/it/book/9781441976642
4) Using OpenRefine - https://www.packtpub.com/big-data-and-business-intelligence/using-openrefine
5) Why Your Business Needs A Customer Data Knowledge Graph - http://www.dataversity.net/business-needs-customer-data-knowledge-graph/
6) Enabling parallel processing for OpenRefine: Spark vs Akka - http://refinepro.com/blog/enabling-parallel-processing-for-openrefine-spark-vs-akka/