1. WHAT’S IN A NAME?
- IDENTITY & CULTURE
- GENDER (LE SEXE)
NamSor Applied Onomastics
2014-06-26
2. Where are the women?
Paris DataGeek gender gap: Male 86%, Female 14%
3. Where are the women? E.g. the movies
Mining 5M names to assess GENDER*
IMDB File THE CINEMATOGRAPHERS LIST
Name Origin | Male | Female | Unknown
France | 82% | 16% | 2%
Tunisia | 77% | 16% | 8%
Morocco | 80% | 15% | 5%
Algeria | 86% | 11% | 3%
Ireland | 89% | 10% | 1%
*Using NamSor GendRE API v0.0.13
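The "Unknown" column makes the rows hard to compare directly. A minimal sketch, assuming the percentages in the table above, of the female share among names whose gender was determined (the function name `female_share_of_known` is illustrative, not part of any NamSor API):

```python
# Percentages copied from the table above: (male, female, unknown).
GENDER_BY_ORIGIN = {
    "France": (82, 16, 2),
    "Tunisia": (77, 16, 8),
    "Morocco": (80, 15, 5),
    "Algeria": (86, 11, 3),
    "Ireland": (89, 10, 1),
}


def female_share_of_known(male, female, unknown):
    """Female share among names classified as male or female (unknowns excluded)."""
    return female / (male + female)


shares = {
    origin: female_share_of_known(*row)
    for origin, row in GENDER_BY_ORIGIN.items()
}

# Print origins from the highest to the lowest female share.
for origin, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{origin:8s} {share:.1%}")
```

Excluding the unknowns is a choice made for this sketch; the slide itself only reports the three raw percentages.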
4. What’s in a name? What’s a name?
Elena Rossini
@_Elena (Twitter)
Elian Carsenat
@ElianCarsenat (Twitter)
elian.carsenat@namsor.com
elian.carsenat@sfr.fr
tioulpanov (Skype)
NamSor.com
+ Social Network (LinkedIn, Twitter, FB …) : more names
Onomastics = the science of proper names
5. NamSor sociolinguistic algorithm
FN | LN
Mette | Andersen
Lene | Andersson
Eva | Arndt-Riise
Heidi | Astrup
Mie | Augustesen
Margot | Bærentzen
Louise | Bager Nørgaard
Marie | Bagger Rasmussen
Yutta | Barding
Ulla | Barding-Poulsen

FN | LN
Xian | Dongmei
Zheng | Dongmei
Jin | Dongxiang
Xu | Dongxiang
Li | Dongxiao
Qin | Dongya
Li | Dongying
Han | Duan
Li | Duihong
Jiang | Fan
Training set : Athletes
Step 1 – Learn stereotypes
bitao gong
biwang jiang
birgitta agerberth
birgitte l. eriksen
bitao gong
bitten thorengaard
biwang Jiang
birgitta agerberth
birgitte l. eriksen
bitten thorengaard
Data set : Actors
Step 2 – Classify
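The two steps on this slide (learn from labelled athlete names, then classify unseen actor names) can be illustrated with a generic text classifier. The sketch below is not NamSor's algorithm, which the slides do not describe; it swaps in a plain Naive Bayes model over character trigrams. The class `NameClassifier` and the labels "Nordic" and "Chinese" are assumptions introduced here to name the two athlete lists.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n=3):
    """Lower-case the name, pad it with boundary markers, return its n-grams."""
    padded = f"^{name.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NameClassifier:
    """Naive Bayes over character trigrams with add-one smoothing."""

    def __init__(self):
        self.gram_counts = defaultdict(Counter)  # label -> n-gram -> count
        self.gram_totals = Counter()             # label -> n-grams seen
        self.name_counts = Counter()             # label -> names seen
        self.vocabulary = set()

    def learn(self, name, label):
        """Step 1: accumulate trigram statistics for a labelled name."""
        grams = char_ngrams(name)
        self.gram_counts[label].update(grams)
        self.gram_totals[label] += len(grams)
        self.name_counts[label] += 1
        self.vocabulary.update(grams)

    def classify(self, name):
        """Step 2: return the label with the highest log-probability."""
        total_names = sum(self.name_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, n_names in self.name_counts.items():
            score = math.log(n_names / total_names)
            denom = self.gram_totals[label] + vocab_size
            for gram in char_ngrams(name):
                score += math.log((self.gram_counts[label][gram] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Training set: the athlete names from the slide; the labels are assumptions.
TRAINING = {
    "Nordic": [
        "Mette Andersen", "Lene Andersson", "Eva Arndt-Riise", "Heidi Astrup",
        "Mie Augustesen", "Margot Bærentzen", "Louise Bager Nørgaard",
        "Marie Bagger Rasmussen", "Yutta Barding", "Ulla Barding-Poulsen",
    ],
    "Chinese": [
        "Xian Dongmei", "Zheng Dongmei", "Jin Dongxiang", "Xu Dongxiang",
        "Li Dongxiao", "Qin Dongya", "Li Dongying", "Han Duan",
        "Li Duihong", "Jiang Fan",
    ],
}

classifier = NameClassifier()
for label, names in TRAINING.items():
    for name in names:
        classifier.learn(name, label)

# Data set: the actor names from the slide.
for actor in ["bitao gong", "biwang jiang", "birgitta agerberth",
              "birgitte l. eriksen", "bitten thorengaard"]:
    print(actor, "->", classifier.classify(actor))
```

With only twenty training names the model is fragile; it is meant to show the learn-then-classify shape, not realistic accuracy.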
7. Mining 3M Geo-Tweets to map FLOWS
Source | Target | Type | Id | Onoma | Weight
United Kingdom | France | Directed | 16 | Great Britain | 37
Spain | France | Directed | 55 | Spain | 14
United States | France | Directed | 75 | Great Britain | 12
Turkey | France | Directed | 79 | Turkey | 11
Brazil | France | Directed | 87 | Portugal | 10
United Kingdom | France | Directed | 112 | Ireland | 9
Italy | France | Directed | 152 | Italy | 7
Switzerland | France | Directed | 226 | France | 5
Belgium | France | Directed | 247 | France | 5
United Kingdom | France | Directed | 258 | France | 5
Mexico | France | Directed | 287 | Spain | 4
Ireland | France | Directed | 317 | Great Britain | 4
United Kingdom | France | Directed | 333 | Italy | 4
United States | France | Directed | 375 | France | 4
Source: Twitter
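The table is a weighted, directed edge list, so it can be aggregated per source country or per onomastic class. A minimal sketch using only the rows shown above (the helper `total_weight_by` is a hypothetical name):

```python
from collections import defaultdict

# (source, target, onoma, weight) rows copied from the table above.
FLOWS = [
    ("United Kingdom", "France", "Great Britain", 37),
    ("Spain", "France", "Spain", 14),
    ("United States", "France", "Great Britain", 12),
    ("Turkey", "France", "Turkey", 11),
    ("Brazil", "France", "Portugal", 10),
    ("United Kingdom", "France", "Ireland", 9),
    ("Italy", "France", "Italy", 7),
    ("Switzerland", "France", "France", 5),
    ("Belgium", "France", "France", 5),
    ("United Kingdom", "France", "France", 5),
    ("Mexico", "France", "Spain", 4),
    ("Ireland", "France", "Great Britain", 4),
    ("United Kingdom", "France", "Italy", 4),
    ("United States", "France", "France", 4),
]


def total_weight_by(column):
    """Sum edge weights grouped by column 0 (source) or column 2 (onoma)."""
    totals = defaultdict(int)
    for row in FLOWS:
        totals[row[column]] += row[3]
    return dict(totals)


by_source = total_weight_by(0)
by_onoma = total_weight_by(2)

print(by_source["United Kingdom"])  # 55
print(by_onoma["Great Britain"])    # 53
```

Grouping by onoma instead of source shows, for example, that names classified as Great Britain also arrive from the United States and Ireland, not only from the United Kingdom.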
8. Isn’t predicting gender SIMPLE?
Can you tell:
Andrea/Rossini vs. Andrea/Parker
O./Sokolova
Kjell/Bergqvist
声涛/周
בנימין/נתניהו
معين/المرعبي
Our target, globally for all countries/lang./cultures:
99% precision, 99% recall for both Male & Female
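For reference, precision and recall are measured per class, which is why the target is stated for both Male and Female. A minimal sketch, with made-up labels used purely to exercise the function (they are not NamSor results):

```python
def precision_recall(y_true, y_pred, label):
    """Return (precision, recall) for one class.

    precision = correct predictions of `label` / all predictions of `label`
    recall    = correct predictions of `label` / all true instances of `label`
    """
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    predicted = sum(1 for p in y_pred if p == label)
    actual = sum(1 for t in y_true if t == label)
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical toy data; None stands for a name left unclassified.
y_true = ["M", "M", "F", "F", "M", "F"]
y_pred = ["M", "F", "F", "F", "M", None]

for label in ("M", "F"):
    p, r = precision_recall(y_true, y_pred, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
```

An unclassified name lowers recall but not precision, so a classifier can trade one for the other by declining to answer on ambiguous names such as O./Sokolova.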
9. We’re getting there, combining
classic baby name statistics with our unique algorithm
100% of objectives reached for 10 countries
75% of objectives reached for 28 countries
Currently, each version brings
~30% improvement!
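One way to picture the combination is a lookup in per-culture first-name statistics, with a fallback to another model when the statistics have no answer. The sketch below is an assumption about how such a combination could look, not NamSor's implementation: the tables, the numeric values, the threshold, and `fallback_model` are all hypothetical placeholders. It reuses the Andrea/Rossini vs. Andrea/Parker example, since Andrea is typically a male name in Italy and a female name in English-speaking countries.

```python
# Hypothetical share of male bearers per (first name, culture).
# The values are illustrative placeholders, not real statistics.
MALE_SHARE = {
    ("andrea", "IT"): 0.97,
    ("andrea", "US"): 0.03,
}

# Hypothetical surname-to-culture hints standing in for an onomastic classifier.
SURNAME_CULTURE = {"rossini": "IT", "parker": "US"}


def fallback_model(first_name, last_name):
    """Placeholder for an algorithmic guess when no statistics are available."""
    return ("unknown", 0.0)


def guess_gender(first_name, last_name, threshold=0.9):
    """Return (gender, confidence) using statistics first, then the fallback."""
    culture = SURNAME_CULTURE.get(last_name.lower())
    share = MALE_SHARE.get((first_name.lower(), culture))
    if share is None:
        return fallback_model(first_name, last_name)
    if share >= threshold:
        return ("male", share)
    if share <= 1 - threshold:
        return ("female", 1 - share)
    return ("unknown", max(share, 1 - share))


print(guess_gender("Andrea", "Rossini")[0])   # male
print(guess_gender("Andrea", "Parker")[0])    # female
print(guess_gender("Kjell", "Bergqvist")[0])  # unknown (not in the toy tables)
```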
11.
Improve your targeting, increase your open and
click rates by saying "Hello Sir", "Hello Madam"
without mistakes in your emailing
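Avoiding mistakes in a salutation usually means falling back to a neutral greeting when the gender guess is uncertain. A minimal sketch, assuming a classifier that returns a gender and a confidence between 0 and 1 (the function `greeting` and the 0.95 cut-off are hypothetical choices):

```python
def greeting(gender, confidence, min_confidence=0.95):
    """Pick a salutation, staying neutral when the guess is not confident."""
    if confidence >= min_confidence:
        if gender == "male":
            return "Hello Sir"
        if gender == "female":
            return "Hello Madam"
    return "Hello"


print(greeting("male", 0.99))    # Hello Sir
print(greeting("female", 0.99))  # Hello Madam
print(greeting("female", 0.60))  # Hello
```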
12. Conclusions
We recognize names in any language, any place, any
database; we can classify and we can sort
An onomastic class is not a ‘hard fact’ like a place of birth or a
nationality, but it is accurate and fine-grained
Our sociolinguistic approach surpasses the traditional
geo-demographics or ‘dictionary’ approach used in the US/UK
Our unique capability to decrypt identity and gender in
high growth / emerging countries (Russia, Africa, India,
Indonesia…) can be put to work in a wide range of
applications
13. Elian Carsenat
http://fdimagnet.com/ http://namsor.com/
July 2013, Embassy of Lithuania in Paris
elian.carsenat@namsor.com
+33 6 52 77 99 07
Twitter @NamsSor_com