
Making document search system slightly friendlier to the power user



Presentation from Search Solutions 2017, 2017.11.29, London, UK



  1. Making document search system slightly friendlier to the power user. Judgements search case study. Michał Łopuszyński, Search Solutions 2017, 2017.11.29, London, UK
  2. saos.org.pl
     • Before: judgements scattered between many search systems
     • Goal: unify access to Polish case law
     • We provide unified search, a REST API, and a WCAG-compliant service
     • Data volume: ~300k documents and growing
     • ~3k daily visits
     [Diagram labels: Constitutional Tribunal, Supreme Court, Common Courts, National Appeals Chamber; import, metadata extraction; http://saos.org.pl; API, Search, Analysis]
  3. saos.org.pl. Side goal: provide some non-mainstream approaches to exploring document collections
     • The analysis tool (the trender): in production
     • Creating maps of document collections: only in the lab
  4. The trender
  5. The trender – saos.org.pl/analysis
  6. Maps of document collections
  7. Maps of document collections – a caveat
     • All low-dimensional "embeddings" are wrong
     • Some are useful (perhaps)
     The graph is from Matti Lyra, PyData Berlin 2017, https://www.youtube.com/watch?v=UkmIljRIG_M
     For t-SNE, see also https://distill.pub/2016/misread-tsne/
  8. Maps of document collections – PCA vs t-SNE
     • 2000 judgements from the National Appeals Chamber, common courts, Supreme Court, and Constitutional Tribunal visualised
     [Figure panels: PCA, t-SNE]
     M. Jungiewicz, M. Łopuszyński, Towards Meaningful Maps of Polish Case Law, JURIX 2015, 185 (2015)
  9. Maps of document collections – PCA vs t-SNE
     • The previous picture coloured by issuing court (note, however, that the issuing court was not used directly in the map generation process)
     [Figure panels: PCA, t-SNE. Legend: National Appeals Chamber, common courts, Supreme Court, Constitutional Tribunal]
     M. Jungiewicz, M. Łopuszyński, Towards Meaningful Maps of Polish Case Law, JURIX 2015, 185 (2015)
  10. Maps of document collections – t-SNE example
      • 2000 judgements from common courts tagged with different keywords
      [Figure legend: granting pensions; military pensions; increase/recalculation of pensions; pension compensation; offence; agreement; personal rights]
      M. Jungiewicz, M. Łopuszyński, Towards Meaningful Maps of Polish Case Law, JURIX 2015, 185 (2015)
  11. Maps of document collections – in the wild
      • Demo by Andrej Karpathy: papers, t-SNE based, http://cs.stanford.edu/people/karpathy/scholaroctopus/
      • Paperscape: papers, based on citation networks, http://paperscape.org
  12. Acknowledgements
      • The team
        • Piotr Waglowski (the boss)
        • Data science team: Michał Jungiewicz, Michał Łopuszyński
        • Tech team: Łukasz Dumiszewski (tech lead), Aleksander Nowiński, Monika Maksymiuk, Krzysztof Mądry, Łukasz Pawełczak, Jan Pavtel
        • Network analysis team: Michał Bojanowski, Bartosz Chrol, Monika Pawluczuk
      • The funding
        • Grant from the National Centre for Research and Development (PL), within the Social Innovations programme
  13. Thank you for your attention! Questions? @lopusz http://slideshare.net/lopusz
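
The PCA maps in the slides on maps of document collections place each judgement at a point in two dimensions. The slides do not say how the documents were turned into vectors, so the following is only a minimal, dependency-free sketch under stated assumptions: TF-IDF weighting over whitespace tokens (an assumption), PCA computed by power iteration, and six toy English strings standing in for judgements (hypothetical data, not taken from SAOS). The function names `tfidf_rows` and `pca_2d` are illustrative. t-SNE is not shown; in practice a library implementation such as scikit-learn's `sklearn.manifold.TSNE` would likely be used.

```python
import math
import random
from collections import Counter


def tfidf_rows(docs):
    """Dense TF-IDF rows for whitespace-tokenised documents (assumed weighting)."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    doc_freq = Counter(tok for toks in tokenised for tok in set(toks))
    n_docs = len(docs)
    rows = []
    for toks in tokenised:
        counts = Counter(toks)
        rows.append([
            counts[t] / len(toks)
            * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)  # smoothed IDF
            for t in vocab
        ])
    return rows


def pca_2d(rows, n_iter=300, seed=0):
    """Project rows onto their two leading principal components."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centred = [[x - m for x, m in zip(row, means)] for row in rows]
    components = []
    for _ in range(2):
        v = [rng.uniform(-1, 1) for _ in range(d)]
        for _ in range(n_iter):
            # Keep v orthogonal to the components already found.
            for c in components:
                dot = sum(a * b for a, b in zip(v, c))
                v = [a - dot * b for a, b in zip(v, c)]
            # One power-iteration step: v <- X^T (X v), then normalise.
            scores = [sum(a * b for a, b in zip(row, v)) for row in centred]
            v = [sum(s * row[j] for s, row in zip(scores, centred))
                 for j in range(d)]
            norm = math.sqrt(sum(a * a for a in v)) or 1.0
            v = [a / norm for a in v]
        components.append(v)
    return [[sum(a * b for a, b in zip(row, c)) for c in components]
            for row in centred]


# Hypothetical toy corpus: three "pension" texts, three "offence" texts.
docs = [
    "pension increase granted",
    "pension recalculation granted",
    "pension increase recalculation",
    "offence theft sentence",
    "offence appeal sentence",
    "offence theft appeal",
]
coords = pca_2d(tfidf_rows(docs))
for doc, (x, y) in zip(docs, coords):
    print(f"{x:+.3f} {y:+.3f}  {doc}")
```

In this toy corpus the two topics share no words, so the first coordinate separates the pension texts from the offence texts. The second coordinate is less meaningful here, which is in line with the caveat in the slides that low-dimensional maps should be read with care.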
