Fast learning-to-rank financial entity resolution

- Alessio Russo Introito, M.Sc. student in Computer Science and Engineering
- Matteo Moreschini, M.Sc. student in Computer Science and Engineering

Entity resolution is increasingly important for banking institutions, which use it to detect suspicious activities and fake identities. It must be performed quickly and accurately, as a single mistake can be extremely expensive.
Traditional strategies rely heavily on string comparisons and reach good results only at a high computational cost. We present a novel entity resolution technique that speeds up this process by combining textual feature similarity with gradient-boosted decision trees and a de-duplication classifier. On a dataset of 700,000 financial records derived from the Panama Papers documents, we obtain 98% precision and recall while achieving real-time predictions and flexible online learning.
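For comparison with the string-comparison strategies the abstract mentions, here is a minimal Python sketch of the brute-force baseline, assuming plain Levenshtein edit distance: every pair of values is compared, so N records cost N·(N−1)/2 comparisons. The names `levenshtein`, `pairwise_matches` and the `max_distance` threshold are illustrative assumptions, not the authors' code; the sample addresses come from the slides.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def pairwise_matches(values: list[str], max_distance: int) -> list[tuple[int, int, int]]:
    """Compare every pair of values: N*(N-1)/2 edit-distance computations."""
    matches = []
    for i, j in combinations(range(len(values)), 2):
        d = levenshtein(values[i], values[j])
        if d <= max_distance:
            matches.append((i, j, d))
    return matches


addresses = ["Anfos Avenue", "150 West Capitol St NE", "150 East Capitol St NE"]
print(pairwise_matches(addresses, max_distance=3))  # → [(1, 2, 2)]
```

The quadratic pair loop, rather than the cost of any single comparison, is what makes this approach slow on hundreds of thousands of records.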

  1. FAST LEARNING-TO-RANK FINANCIAL ENTITY RESOLUTION. Matteo Moreschini (matteo1.moreschini@mail.polimi.it), Alessio Russo Introito (alessio2.russo@mail.polimi.it), Alberto Parravicini, Marco D. Santambrogio. vNGC, July 19th, 2020
  2. Bank frauds
  3. Similarity filtering
     Stored records (Name | Address | Phone):
       Vincent Pope | Anfos Avenue | +3944232111
       ...
       HHELMET COPMANY INC. | 150 West Capitol St NE | +2024586016
     Query record: HELMET INC, COP | 150 East Capitol St NE | +2024586016
     Edit distance: 3
     Match: HHELMET COPMANY INC. | 150 West Capitol St NE | +2024586016
     ● O(N²) string comparisons
  4. N-grams to the rescue
     [Records × N-grams] × [N-grams × Records] = [Records × Records] field similarity (cosine, Jaccard)
     ● Sparse matrix multiplication
     Example, Name "HELMET INC, COP" as a binary n-gram vector:
       HEL SRE SSE INC FRA MET LLE CRI LME ELM
        1   0   0   1   0   1   0   0   1   1
  5. Re-ranking
     1. Re-ranking: the top-K similar records are re-ranked (Top K → Re-ranked top K)
     2. Final prediction: a binary classifier detects duplicated records
     3. Online learning: fast live prediction for new records (Training → Training increment → Prediction)
  6. Results
     ● Throughput: achievable latency < 1 s per document
     ● 98% precision and recall
     ● 1 h training time (on 700k records)
  7. Why now?
  8. Why now?
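Slide 4 replaces pairwise string comparison with character n-grams. The following Python sketch assumes binary trigram vectors built per word (this reproduces the slide's vector for "HELMET INC, COP") and cosine similarity. It stores the sparse N-grams × Records matrix as an inverted index, so scoring a query only touches records that share at least one n-gram; this amounts to computing one row of the sparse Records × Records product. The class `NgramIndex` and its methods are hypothetical names, and a production system would more likely rely on a sparse-matrix library.

```python
import heapq
import math
import re
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """Character trigrams of each word (words shorter than 3 chars are kept whole)."""
    grams = set()
    for word in re.findall(r"[A-Z0-9]+", text.upper()):
        if len(word) < 3:
            grams.add(word)
        else:
            grams.update(word[i:i + 3] for i in range(len(word) - 2))
    return grams


class NgramIndex:
    """Binary n-gram vectors kept as an inverted index (a sparse N-grams x Records matrix)."""

    def __init__(self, records: list[str]):
        self.vectors = [trigrams(r) for r in records]
        self.postings = defaultdict(list)  # n-gram -> ids of records containing it
        for rid, grams in enumerate(self.vectors):
            for g in grams:
                self.postings[g].append(rid)

    def top_k(self, query: str, k: int) -> list[tuple[int, float]]:
        """Cosine similarity of the query against all records, touching only shared n-grams."""
        q = trigrams(query)
        dots = defaultdict(int)  # record id -> number of shared n-grams (binary dot product)
        for g in q:
            for rid in self.postings.get(g, ()):
                dots[rid] += 1
        scores = (
            (rid, dot / math.sqrt(len(q) * len(self.vectors[rid])))
            for rid, dot in dots.items()
        )
        return heapq.nlargest(k, scores, key=lambda item: item[1])


vocab = ["HEL", "SRE", "SSE", "INC", "FRA", "MET", "LLE", "CRI", "LME", "ELM"]
print([int(g in trigrams("HELMET INC, COP")) for g in vocab])  # → [1, 0, 0, 1, 0, 1, 0, 0, 1, 1]

index = NgramIndex(["Vincent Pope", "HHELMET COPMANY INC."])
print(index.top_k("HELMET INC, COP", k=2))
```

The Helmet record ranks first (cosine 6/√66 ≈ 0.74), while "Vincent Pope" is still retrieved as a weak candidate because VINCENT contains the trigram INC. That is the kind of false candidate a later re-ranking stage has to push down.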
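Slide 5 re-ranks the top-K candidates, classifies each pair as duplicate or not, and trains incrementally on new records. The Python sketch below swaps in a much simpler model than the gradient-boosted decision trees named in the abstract: an online logistic regression updated by stochastic gradient descent. Its predicted probability is used both to re-order the candidates and, with an assumed 0.5 threshold, as the duplicate decision. The per-field Jaccard features, the class `OnlineDedupClassifier`, and the two training pairs are hypothetical; only the query and candidate records come from the slides.

```python
import math
import re

FIELDS = ("name", "address", "phone")


def grams(text: str) -> set[str]:
    """Character trigrams of the upper-cased alphanumeric characters."""
    s = re.sub(r"[^A-Z0-9]", "", text.upper())
    return {s[i:i + 3] for i in range(len(s) - 2)} or {s}


def jaccard(a: str, b: str) -> float:
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb)


def pair_features(r1: dict, r2: dict) -> list[float]:
    """One similarity score per field: the input to the re-ranker/classifier."""
    return [jaccard(r1[f], r2[f]) for f in FIELDS]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineDedupClassifier:
    """Logistic regression trained by SGD, one labelled pair at a time (stand-in for GBDT)."""

    def __init__(self, n_features: int, lr: float = 0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x: list[float]) -> float:
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def partial_fit(self, x: list[float], y: int) -> None:
        # Gradient of the log-loss for a single example is (p - y) * x.
        err = self.predict_proba(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def rerank(query: dict, candidates: list[dict], model: OnlineDedupClassifier):
    """Re-order candidates by duplicate probability and flag those at or above 0.5."""
    scored = [(i, model.predict_proba(pair_features(query, c))) for i, c in enumerate(candidates)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(i, p, p >= 0.5) for i, p in scored]


# Hypothetical labelled pairs: one duplicate, one non-duplicate.
acme = {"name": "Acme Corp", "address": "1 Main St", "phone": "+100"}
training_pairs = [
    (acme, {"name": "ACME CORP.", "address": "1 Main Street", "phone": "+100"}, 1),
    (acme, {"name": "Vincent Pope", "address": "Anfos Avenue", "phone": "+3944232111"}, 0),
]
model = OnlineDedupClassifier(n_features=len(FIELDS))
for _ in range(50):
    for r1, r2, label in training_pairs:
        model.partial_fit(pair_features(r1, r2), label)

query = {"name": "HELMET INC, COP", "address": "150 East Capitol St NE", "phone": "+2024586016"}
top_k = [
    {"name": "Vincent Pope", "address": "Anfos Avenue", "phone": "+3944232111"},
    {"name": "HHELMET COPMANY INC.", "address": "150 West Capitol St NE", "phone": "+2024586016"},
]
for i, p, is_duplicate in rerank(query, top_k, model):
    print(i, round(p, 3), is_duplicate)
```

Because `partial_fit` consumes one labelled pair at a time, newly confirmed duplicates can be folded into the model without retraining from scratch, which mirrors the slide's "training increment" step.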
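The results slide reports 98% precision and recall, but the transcript does not say how these are computed. One common convention in entity resolution, assumed here, is to score unordered record pairs: precision is the share of predicted duplicate pairs that are true duplicates, and recall is the share of true duplicate pairs that were found. The function name and the record ids below are made up for illustration.

```python
def precision_recall(predicted, actual) -> tuple[float, float]:
    """Pairwise precision and recall; the pair (a, b) counts the same as (b, a)."""
    pred = {frozenset(p) for p in predicted}
    true = {frozenset(p) for p in actual}
    tp = len(pred & true)  # predicted pairs that are real duplicates
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    return precision, recall


print(precision_recall([(2, 1), (3, 4), (7, 8)], [(1, 2), (3, 4), (5, 6)]))
# → (0.6666666666666666, 0.6666666666666666)
```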

×