Ce diaporama a bien été signalé.
Nous utilisons votre profil LinkedIn et vos données d’activité pour vous proposer des publicités personnalisées et pertinentes. Vous pouvez changer vos préférences de publicités à tout moment.
LinkedIn Salary: A System for Secure
Collection and Presentation of Structured
Compensation Insights to Job Seekers
Krishn...
Outline
▪ LinkedIn Salary Overview
▪ Challenges: Privacy, Modeling
▪ System Design & Architecture
▪ Privacy vs. Modeling T...
LinkedIn Salary (launched in Nov, 2016)
Salary Collection Flow via Email Targeting
Current Reach (July 2017)
▪ A few million responses out of several millions of
members targeted
– Targeted via emails sinc...
▪ Minimize the risk of inferring any one individual’s
compensation data
▪ Protection against data breach
– No single point...
▪ Modeling on aggregated data
▪ Evaluation
▪ Outlier detection
▪ Robustness and stability
Modeling Challenges
K. Kenthapad...
Problem Statement
▪How do we design LinkedIn Salary system taking into
account the unique privacy and security challenges,...
Differential Privacy? [Dwork et al, 2006]
▪Rich privacy literature (Adam-Worthmann, Samarati-Sweeney, Agrawal-
Srikant, …,...
Title Region
$$
User Exp
Designer
SF Bay
Area
100K
User Exp
Designer
SF Bay
Area
115K
... ... ...
Title Region
$$
User Exp...
System
Architecture
Collection
&
Storage
Collection & Storage
▪Allow members to submit their compensation info
▪Extract member attributes
– E.g., canonical job tit...
De-identification
&
Grouping
De-identification & Grouping
▪Approach inspired by k-Anonymity [Samarati-Sweeney]
▪“Cohort” or “Slice”
– Defined by a comb...
▪Slicing service
– Access member attribute
info & submission identifiers
(no compensation data)
– Generate slices & track ...
Insights
&
Modeling
Insights & Modeling
▪Salary insight service
– Check whether the
member is eligible
▪ Give-to-get model
– If yes, show the ...
Security
Mechanisms
Security
Mechanisms
▪Encryption of
member
attributes &
compensation
data using
different sets of
keys
– Separation of
proc...
Security
Mechanisms
▪Key rotation
▪No single point
of failure
▪Infra security
Preventing Timestamp Join based Attacks
▪Inference attack by joining these on timestamp
– De-identified compensation data
...
Privacy vs Modeling Tradeoffs
▪Our system deployed in production for ~1.5 years
▪Study tradeoffs between privacy guarantee...
Amount of
data
available
vs.
threshold, k
Percent of
data
available vs.
batch size, k
Median
delay due to
batching vs.
batch size, k
Summary
▪ LinkedIn Salary: a new internet application
– Privacy and Modeling Challenges
– System Design & Architecture
– P...
Thanks & Pointers
▪ Related tech report: K. Kenthapadi, S. Ambler, L. Zhang, and D. Agarwal,
Bringing salary transparency ...
LinkedIn Salary: A System for Secure Collection and Presentation of Structured Compensation Insights to Job Seekers
Prochain SlideShare
Chargement dans…5
×

LinkedIn Salary: A System for Secure Collection and Presentation of Structured Compensation Insights to Job Seekers

Online professional social networks such as LinkedIn have enhanced the ability of job seekers to discover and assess career opportunities, and the ability of job providers to discover and assess potential candidates. For most job seekers, salary (or broadly compensation) is a crucial consideration in choosing a new job. At the same time, job seekers face challenges in learning the compensation associated with different jobs, given the sensitive nature of compensation data and the dearth of reliable sources containing compensation data. Towards the goal of helping the world’s professionals optimize their earning potential through salary transparency, we present LinkedIn Salary, a system for collecting compensation information from LinkedIn members and providing compensation insights to job seekers. We present the overall design and architecture, and describe the key components needed for the secure collection, de-identification, and processing of compensation data, focusing on the unique challenges associated with privacy and security. We perform an experimental study with more than one year of compensation submission history data collected from over 1.5 million LinkedIn members, thereby demonstrating the tradeoffs between privacy and modeling needs. We also highlight the lessons learned from the production deployment of this system at LinkedIn.

Presented at IEEE Symposium on Privacy-Aware Computing (IEEE PAC), 2017.

Corresponding paper: LinkedIn Salary: A System for Secure Collection and Presentation of Structured Compensation Insights to Job Seekers, IEEE Symposium on Privacy-Aware Computing, 2017 (available at https://arxiv.org/abs/1705.06976).

  • Soyez le premier à commenter

  • Soyez le premier à aimer ceci

LinkedIn Salary: A System for Secure Collection and Presentation of Structured Compensation Insights to Job Seekers

  1. 1. LinkedIn Salary: A System for Secure Collection and Presentation of Structured Compensation Insights to Job Seekers Krishnaram Kenthapadi Staff Software Engineer and Applied Researcher, LinkedIn (Joint work with Ahsan Chudhary, Stuart Ambler)
  2. 2. Outline ▪ LinkedIn Salary Overview ▪ Challenges: Privacy, Modeling ▪ System Design & Architecture ▪ Privacy vs. Modeling Tradeoffs
  3. 3. LinkedIn Salary (launched in Nov, 2016)
  4. 4. Salary Collection Flow via Email Targeting
  5. 5. Current Reach (July 2017) ▪ A few million responses out of several millions of members targeted – Targeted via emails since early 2016 ▪ Countries: US, CA, UK, DE ▪ Insights available for a large fraction of US monthly active users
  6. 6. ▪ Minimize the risk of inferring any one individual’s compensation data ▪ Protection against data breach – No single point of failure Data Privacy Challenges Achieved by a combination of techniques: encryption, access control, , aggregation, thresholding
  7. 7. ▪ Modeling on aggregated data ▪ Evaluation ▪ Outlier detection ▪ Robustness and stability Modeling Challenges K. Kenthapadi, S. Ambler, L. Zhang, and D. Agarwal, Bringing salary transparency to the world: Computing robust compensation insights via LinkedIn Salary, CIKM 2017 (arxiv.org/abs/1703.09845)
  8. 8. Problem Statement ▪How do we design LinkedIn Salary system taking into account the unique privacy and security challenges, while addressing the product requirements?
  9. 9. Differential Privacy? [Dwork et al, 2006] ▪Rich privacy literature (Adam-Worthmann, Samarati-Sweeney, Agrawal- Srikant, …, Kenthapadi et al, Machanavajjhala et al, Li et al, Dwork et al) – Limitation of anonymization techniques (Backstrom et al,…,Narayanan et al) ▪Worst case sensitivity of quantiles to any one user’s compensation data is large –  Large noise to be added, depriving reliability/usefulness ▪Need compensation insights on a continual basis – Theoretical work on applying differential privacy under continual observations ▪ No practical implementations / applications – Randomized response based approaches (Google’s RAPPOR; Apple) not applicable
  10. 10. Title Region $$ User Exp Designer SF Bay Area 100K User Exp Designer SF Bay Area 115K ... ... ... Title Region $$ User Exp Designer SF Bay Area 100K De-identification Example Title Region Company Industry Years of exp Degree FoS Skills $$ User Exp Designer SF Bay Area Google Internet 12 BS Interacti ve Media UX, Graphic s, ... 100K Title Region Industry $$ User Exp Designer SF Bay Area Internet 100K Title Region Years of exp $$ User Exp Designer SF Bay Area 10+ 100K Title Region Company Years of exp $$ User Exp Designer SF Bay Area Google 10+ 100K #data points > threshold? Yes ⇒ Copy to Hadoop (HDFS) Note: Original submission stored as encrypted objects.
  11. 11. System Architecture
  12. 12. Collection & Storage
  13. 13. Collection & Storage ▪Allow members to submit their compensation info ▪Extract member attributes – E.g., canonical job title, company, region, by invoking LinkedIn standardization services ▪Securely store member attributes & compensation data
  14. 14. De-identification & Grouping
  15. 15. De-identification & Grouping ▪Approach inspired by k-Anonymity [Samarati-Sweeney] ▪“Cohort” or “Slice” – Defined by a combination of attributes – E.g, “User experience designers in SF Bay Area” ▪ Contains aggregated compensation entries from corresponding individuals ▪ No user name, id or any attributes other than those that define the cohort – A cohort available for offline processing only if it has at least k entries – Apply LinkedIn standardization software (free-form attribute  canonical version) before grouping ▪ Analogous to the generalization step in k-Anonymity
  16. 16. ▪Slicing service – Access member attribute info & submission identifiers (no compensation data) – Generate slices & track # submissions for each slice ▪Preparation service – Fetch compensation data (using submission identifiers), associate with the slice data, copy to HDFS De-identification & Grouping
  17. 17. Insights & Modeling
  18. 18. Insights & Modeling ▪Salary insight service – Check whether the member is eligible ▪ Give-to-get model – If yes, show the insights ▪Offline workflow – Consume de-identified HDFS dataset – Compute robust compensation insights ▪ Outlier detection ▪ Bayesian smoothing – Populate the insight key- value stores
  19. 19. Security Mechanisms
  20. 20. Security Mechanisms ▪Encryption of member attributes & compensation data using different sets of keys – Separation of processing – Limiting access to the keys
  21. 21. Security Mechanisms ▪Key rotation ▪No single point of failure ▪Infra security
  22. 22. Preventing Timestamp Join based Attacks ▪Inference attack by joining these on timestamp – De-identified compensation data – Page view logs (when a member accessed compensation collection web interface) –  Not desirable to retain the exact timestamp ▪Perturb by adding random delay (say, up to 48 hours) ▪Modification based on k-Anonymity – Generalization using a hierarchy of timestamps – But, need to be incremental –  Process entries within a cohort in batches of size k ▪ Generalize to a common timestamp ▪ Make additional data available only in such incremental batches
  23. 23. Privacy vs Modeling Tradeoffs ▪Our system deployed in production for ~1.5 years ▪Study tradeoffs between privacy guarantees (‘k’) and data available for computing insights – Dataset: Compensation submission history from 1.5M LinkedIn members – Amount of data available vs. minimum threshold, k – Effect of processing entries in batches of size, k
  24. 24. Amount of data available vs. threshold, k
  25. 25. Percent of data available vs. batch size, k
  26. 26. Median delay due to batching vs. batch size, k
  27. 27. Summary ▪ LinkedIn Salary: a new internet application – Privacy and Modeling Challenges – System Design & Architecture – Privacy vs. Modeling Tradeoffs ▪ Potential directions – Privacy-preserving machine learning models in a practical setting [e.g., Chaudhuri et al, Papernot et al]
  28. 28. Thanks & Pointers ▪ Related tech report: K. Kenthapadi, S. Ambler, L. Zhang, and D. Agarwal, Bringing salary transparency to the world: Computing robust compensation insights via LinkedIn Salary, 2017 (arxiv.org/abs/1703.09845) ▪ Team: Careers Engineering  Ahsan Chudhary  Alan Yang  Alex Navasardyan  Brandyn Bennett  Hrishikesh S  Jim Tao  Juan Pablo Lomeli Diaz  Lu Zheng  Patrick Schutz  Ricky Yan  Stephanie Chou  Joseph Florencio  Santosh Kumar Kancha  Anthony Duerr Data Relevance Engineering  Krishnaram Kenthapadi, Stuart Ambler, Yiqun Liu, Parul Jain, Liang Zhang, Ganesh Venkataraman, Tim Converse, Deepak Agarwal Product Managers: Ryan Sandler, Keren Baruch UED: Julie Kuang Marketing: Phil Bunge Business Operations: Prateek Janardhan BA: Fiona Li Testing: Bharath Shetty ProdOps/VOM: Sunil Mahadeshwar Security: Cory Scott, Tushar Dalvi, and team linkedin.com/salary

×