Employees, Business Partners and Bad Guys:
What Web data reveals about persons of interest
Presenters: Gina Cerami, VP of Marketing, Connotate
Dave Danielson, VP of Marketing, Digital Reasoning
Claire Schmidt, Director of Programs,
Thorn: Digital Defenders of Children
(formerly DNA Foundation)
Date: November 28, 2012
Today’s Discussion
• What Web Data Reveals: The Fundamentals
The business case
Employee background check – business partner screening – persons of interest
• Collecting Good Data: Not That Easy
Where to start? Best practices
Differences in data sources – the automation process
• Analyzing Data: A Difficult Problem
Why advanced text analytics matters
Making sense of big data
• Automation and Advanced Analytics: A Powerful Combination
Background check accuracy enhanced with Entity Resolution
• Thorn: Working to End Child Sexual Exploitation
Combined solution applied to detecting child sex trafficking online
• Q&A
2
What Web Data Reveals:
The Fundamentals
3
The Business Case
news – blogs – social media
trillions of URLs
court records – registries – sanctions lists
4
What Web Data Reveals About Persons of Interest
Prospective Employees
• 3-minute screening using public records in 1,500 jurisdictions
• Eliminate human error
• Save time, money; hire right the first time
Business Partners
• Check sanctions lists
• Identify politically exposed persons
• Reduce business risk
• Avoid fines – comply with AML/KYC rules
Bad Guys
• Extract precise data from 10,000+ records on URLs linked to illegal activities
• Use advanced analytics to narrow the scope of investigations
Automated, precise data collection is key to success
5
The Cost of Not Knowing Your Employees
The cost of fraud in the workplace:
• $400 to $600 billion/year in the U.S. (Harvard)
• 5% of revenue on average (Association of Fraud Examiners)
• $3.2B in Canada in 2011 (Certified General Accountants Assoc. of Canada)
The cost of re-hiring (not getting it right the first time)
• From $3.5K (U.S. average cost-per-hire) to millions of dollars for CEOs
How does this happen?
• 50% of resumes have factual errors
• 1 in 5 job applications have a major lie or discrepancy (UK 2009 survey)
• Many background checks are manual (error prone) or incomplete
6
Solution: Comprehensive Search
Regular monitoring of all levels of government sites
• National, state, county and local
• If you outsource – make sure your screening service continually monitors
these sites for updates
• If you already do it yourself – consider automating the Web data collection
process to ensure accuracy and timeliness
Connotate’s software powers over
250,000 background checks per month;
3 million to date
7
The Cost of Not Knowing Business Partners
Recent Bank Secrecy Act (BSA) Penalties
• $1.2 B – Citibank, April 2012
• $7 M – Pacific National Bank, March 2011
BSA / Anti-Money Laundering (AML) Penalties
• $10.9 M – Ocean Bank (FL) August 2011
Reputation Risk – substantial
8
Solution: Comprehensive Search
Comprehensive searches by third-party services are
available for specific vertical industries
If you wish to conduct customized searches on a
regular basis, consider automated data collection
• Sanctions lists (Treasury.gov, ICE, EU Terrorism List, FBI Most Wanted,
OCC Shell Bank, etc.)
• PACER, national and state lists
• Social media may reveal that the person of interest is associating with others
on sanctions lists
9
Collecting Good Data: Not That Easy
10
Where to Start? Best Practices
• Narrow your search
• Scope the project
• Think about the long term
• Sources
11
Differences in Web Sources
12
Polling Question: Web Data Collection
Are you currently collecting background data
from the Web?
Yes – we are doing this using an automated process
Yes – however, we are collecting Web data using a manual process
No – we outsource background check to a third-party service
13
An Overview of the Automation Process
Collect Data – Transform – Deliver
Collect Data
Internal Sources
• Databases
• Interviews
• Resumes
External Sources
• Social Media
• Surface Web
• Hidden Web
• Secured Sites
Transform
• Structure
• Classify
• Prep for Analysis
Deliver
• Reports
• Dashboards
• Workflow
• BI Plug-ins
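The collect, transform and deliver stages can be illustrated with a small sketch. This is not Connotate's implementation: the Record type, the source labels, the keyword table and the function names are all assumptions made for illustration, and a real collector would fetch pages rather than take strings as input.

```python
import html
import re
from dataclasses import dataclass


@dataclass
class Record:
    source: str                    # illustrative label, e.g. "resume" or "surface_web"
    text: str
    category: str = "unclassified"


def collect(raw_items):
    """Collect: wrap raw (source, content) pairs from internal or external sources."""
    return [Record(source=s, text=t) for s, t in raw_items]


def classify(text, keywords):
    """Return the first category whose keywords appear in the text, else "other"."""
    lowered = text.lower()
    for category, words in keywords.items():
        if any(w in lowered for w in words):
            return category
    return "other"


def transform(records, keywords):
    """Transform: strip markup (structure), then classify each record."""
    cleaned = []
    for r in records:
        text = re.sub(r"<[^>]+>", " ", r.text)       # drop HTML tags
        text = html.unescape(text)                   # decode entities such as &amp;
        text = re.sub(r"\s+", " ", text).strip()     # normalize whitespace
        cleaned.append(Record(r.source, text, classify(text, keywords)))
    return cleaned


def deliver(records):
    """Deliver: a minimal 'report' counting records per category."""
    report = {}
    for r in records:
        report[r.category] = report.get(r.category, 0) + 1
    return report


# Hypothetical keyword table and inputs, for illustration only.
KEYWORDS = {"legal": ["convicted", "court"], "education": ["bsc", "degree"]}
raw = [
    ("surface_web", "<p>John Doe was  <b>convicted</b> of fraud</p>"),
    ("resume", "BSc in Accounting, 2005"),
]
report = deliver(transform(collect(raw), KEYWORDS))
```

Keeping the three stages as separate functions mirrors the slide: new sources only touch collect, and new report formats only touch deliver.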
14
Analyzing Data: A Difficult Problem
15
New Content Sources Require Advanced Analytics
Collect Data – Transform – Advanced Analytics – Outputs
Collect Data
• Unstructured and Structured Data
• Variety of Sources
• Scalable
• Automated
Transform
• Remove Formatting
• Text Only
Advanced Analytics
• Resolving the Unique Individual
• Associating Time and Geographic data
• Fact/Assertion Extraction
• Relationship Identification and Extraction
Outputs
• Reports
• Dashboards
• Workflow
• BI Plug-ins
16
Synthesys Overview:
A software platform for making sense of big data
DISPARATE DATA
• News
• Web
• Email
• Research
• Instant Messages
Synthesys Platform
• READ: Deep processing of unstructured data
• RESOLVE: Assemble, organize, and relate
• REASON: Uncover relationships, compare & correlate
APPLICATION-READY ANALYTICS
• App Integration
• Events/Alarms
• Network Analysis
Analytic Primitives
• Natural Language Processing
• Extraction
• Geocoding
• Time normalization
• Entity Resolution
• Synonym Generation
• Similarity Algorithms
• Connectivity
Machine Learning
Distributed Processing (Hadoop MapReduce)
Distributed Storage (HBase, Cassandra, Cloudbase)
Synthesys reads, resolves and reasons about entities and relationships in space and time.
17
Other solutions are flawed
and don’t make automated understanding possible
Historically, the market has built tools to help find reading material
Search
Google, Fast, Autonomy, Recommind, 
Lucene
Entity Extraction
Basis, Janya, Aerotext, Attensity, 
SAP/Inxight, Lexalytics, SRA NetOwl
Comprehensive Ontologies 
or Data Models
Clarabridge, Endeca, Expert Systems, IBM 
Entity Analytics, Informatica
Other text analytics solutions still require the human to read to understand
18
Synthesys turns data into “knowledge objects”
President Masayoshi Son wants to repeat the success he had while building Softbank into Japan’s third-largest wireless carrier. Son wants to take market share from entrenched giants and deliver more data to smartphones, tablets, cars and even bicycles.
[Figure: the text above annotated with part-of-speech tags (NNP, VBZ, NN, ...) and phrase chunks (NP, VP, PP)]
• PERSON – PROPER NOUN: President Masayoshi Son
• LOCATION: Japan
• ORGANIZATION: Softbank
• PREDICATE: Built, To Take, To Deliver
• NOUN – ENTITY: Market Share, More Data
• ENTITY: Smartphones, Tablets
NLP Extraction: EOS – TOK – POS – CHUNK – NER – SREX
19
Resolution makes “Concept” or “Semantic”
understanding possible
Concept: California-based Apple
References/Mentions:
Apple
Apple inc
Apple, inc.
California-based Apple
Secretive Apple
iPhone inventor
Steve Jobs’s Company
AAPL
Technology Innovator Apple
Synthesys resolves multiple, varied mentions across the entire data set
as being part of the same concept based on their usage in context.
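A minimal sketch of grouping mentions under one concept follows. It deliberately swaps in a much simpler technique than the one described above: a hand-built alias table plus string normalization, not resolution from usage in context. The alias table, the modifier list, the concept ID format and the function names are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical alias table: normalized mention -> concept ID.
ALIASES = {
    "apple": "ORG:Apple",
    "apple inc": "ORG:Apple",
    "aapl": "ORG:Apple",
    "iphone inventor": "ORG:Apple",
}

# Descriptive modifiers dropped before lookup (illustrative list).
MODIFIERS = {"california-based", "secretive", "technology", "innovator"}


def normalize(mention):
    """Lowercase, drop punctuation other than hyphens, and remove modifiers."""
    text = re.sub(r"[^\w\s-]", "", mention.lower())
    return " ".join(w for w in text.split() if w not in MODIFIERS)


def resolve(mentions):
    """Group raw mentions by the concept their normalized form maps to."""
    groups = defaultdict(list)
    for mention in mentions:
        concept = ALIASES.get(normalize(mention), "UNRESOLVED")
        groups[concept].append(mention)
    return dict(groups)


mentions = [
    "Apple", "Apple inc", "Apple, inc.", "California-based Apple",
    "Secretive Apple", "iPhone inventor", "AAPL",
    "Technology Innovator Apple", "Orange Ltd",
]
groups = resolve(mentions)
```

A lookup like this cannot tell Apple the company from apple the fruit, which is the gap that context-based resolution is meant to close.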
20
Synthesys is “Software that Learns”
new languages, patterns, categories, etc.
Supervised machine learning techniques and patent-pending workflows allow content
experts to train models and achieve quality improvement without any programming.
1. User uploads example of new document domain/language
2. Synthesys predicts annotation
3. Operator corrects annotation and adds categories
4. Completed annotation is submitted to server
5. Completed model training is submitted to Synthesys
21
Synthesys Powers Tools:
Providing a Common Global View
Leading Visualization Platforms
Platform (Data Organized & Application-Ready)
Relational Database Management System (RDBMS)
Data Sources: Structured Data, Unstructured Data
22
Polling Question: Data Analysis
Are you looking to use analytics on Web data to
resolve entities or understand relationships that might
help in background investigations?
Yes – we are analyzing Web data manually today
Yes – we are analyzing with text extractors or other text mining tools
Yes – we have a near-term project to analyze Web data
No – but we may have a need to analyze Web data in the future
No – we have no plans to analyze Web data
23
Web Data and Advanced Analytics:
A Powerful Combination
24
Employee Screening: A Delicate Balance
The cost of mistaken identity (incomplete screening)
• Class action suits have been filed over erroneous sex offender reporting
• Digital Reasoning’s Solution: Entity Resolution with Synthesys®
Positions of trust – Safe workplace – Right hire the first time
Employee privacy – Libel: Impact = job loss – EEOC / FCRA
25
Business Partner Screening:
Avoiding Legal and Reputation Risk
• Do we have the right person? (Nicknames, misspellings, etc.)
• Do we know who is connected with this company?
• What about Foreign Language Sources?
Anti-Money Laundering – Political Corruption – Foreign Corrupt Practices Act – Terrorist Financing – Suspicious Activity Report
Increasingly, the sources of the information you need
are in unstructured web content
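The "right person" question (nicknames, misspellings) can be sketched with the Python standard library. This is an illustrative approach, not the method used by either vendor: the nickname table, the 0.85 threshold and the function names are assumptions, and a production screen would need far richer name handling, including foreign-language sources.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; a real one would be much larger.
NICKNAMES = {"bill": "william", "bob": "robert", "liz": "elizabeth"}


def canonical(name):
    """Lowercase, drop periods, and expand known nicknames word by word."""
    parts = name.lower().replace(".", "").split()
    return " ".join(NICKNAMES.get(p, p) for p in parts)


def name_similarity(a, b):
    """Similarity in [0, 1] between two names after canonicalization."""
    return SequenceMatcher(None, canonical(a), canonical(b)).ratio()


def screen(candidate, watchlist, threshold=0.85):
    """Return watchlist names whose similarity to the candidate meets the threshold."""
    scored = [(name, name_similarity(candidate, name)) for name in watchlist]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# "Bill Smyth" combines a nickname with a misspelling of "William Smith".
matches = screen("Bill Smyth", ["William Smith", "Wilma Schmidt"])
```

The threshold trades missed matches against false alarms, so any hit from a screen like this should be treated as a lead for review rather than a confirmed identity.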
26
Thorn:
Working to End Child Sexual Exploitation
27
Thorn Overview
Thorn’s focus: The role technology plays in crimes
involving the sexual exploitation of children.
Thorn’s goal: To disrupt and deflate predatory behavior
in the fight to end child sexual exploitation.
Thorn creates tools, policies and programs to bring
an end to illicit activities that could harm children.
Technology Task Force consists of over 25 top tech
companies that collaborate on technology initiatives to
fight child sexual exploitation.
Works closely with law enforcement, NGOs, private
sector and its Technology Task Force
Part of the White House’s Office of Science and
Technology’s commitment to end trafficking
Claire Schmidt is the Director of Programs for Thorn
28
The Challenge
• The explosive growth of online media has made it more
difficult to monitor and identify illicit activities, including
child sex trafficking
• Traditional analytics tools do a poor job of monitoring these
forms of online media
• Data is “messy” and unstructured
• Data is often false
• Real age is difficult to determine from online data content
• Law enforcement has few, if any, tools to combat this
problem
29
Combating Sex Trafficking: Project Overview
• Thorn desired to determine the feasibility of using advanced text
analytics to detect child sex trafficking in online media.
• Connotate built a process to automatically download data from
selected websites.
• Digital Reasoning developed analytics to detect potential child sex
trafficking activity within the collected data.
Widely Varied Data Sources
Data Aggregation and Cleansing
Analytics, resolution, and pattern matching
Analytics results, reports, charts
30
Project Methodology
Interview Law Enforcement Experts
• Interviewed Law Enforcement officials and determined three major focal
points for automated understanding
Isolate and Map Semantic Features
• Interview results were mapped into semantic features (“signatures”)
Develop models for use in Synthesys
• Analytic models were created by training Synthesys on the semantic
signatures
Identify sources of Internet data
• Sources were then configured in Connotate for automated collection, cleansing and
transformation of data
31
Key Innovative Developments
• Accurate telephone number extractor
• Unique profiles for people posting ads
• Analytic assessment models for text
• Achieved High Level of Accuracy
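The slides do not describe how the telephone number extractor works. As a rough sketch only, one common approach is to normalize spelled-out digits and then match ten digits separated by optional punctuation. The word table, the ten-digit assumption and the function name below are illustrative, not the project's actual design.

```python
import re

# Spelled-out digits that posters may use in place of numerals (illustrative list).
WORD_DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

_WORD_RE = re.compile(r"\b(" + "|".join(WORD_DIGITS) + r")\b", re.IGNORECASE)

# Ten digits, each optionally followed by spaces, dots, dashes or parentheses,
# and not embedded in a longer run of digits.
_PHONE_RE = re.compile(r"(?<!\d)(?:\d[\s().\-]*){9}\d(?!\d)")


def extract_phone_numbers(text):
    """Return ten-digit numbers found in text, formatted as NNN-NNN-NNNN."""
    # Step 1: turn spelled-out digits into numerals ("five" -> "5").
    digits_text = _WORD_RE.sub(lambda m: WORD_DIGITS[m.group(1).lower()], text)
    # Step 2: find digit runs with separators and strip the separators.
    numbers = []
    for match in _PHONE_RE.finditer(digits_text):
        digits = re.sub(r"\D", "", match.group())
        numbers.append(f"{digits[:3]}-{digits[3:6]}-{digits[6:]}")
    return numbers


found = extract_phone_numbers("Call (555) 123-4567 or five five five 867 5309")
```

Emitting one canonical format matters more than the pattern itself, because a shared number is only useful for linking postings once every variant of it compares equal.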
32
Web Data Collection and Advanced Analytics
Collect Data – Transform – Advanced Analytics – Outputs
Connotate – Digital Reasoning
Collect Data
• Unstructured and Structured Data
• Variety of Sources
• Scalable
• Automated
Transform
• Remove Formatting
• Text Only
Outputs
• Reports
• Dashboards
• Workflow
• BI Plug-ins
Connotate provides precise quality data, formatted for delivery to your
analysis tools
Digital Reasoning applies advanced analytics to resolve identities, enrich
data and develop unique profiles of individuals targeted for investigation
33
Web Data Can Reveal Insights of
Tremendous Value
• Valid insights require precise, quality data
• Avoid mistaken identity with entity resolution
• Automation is the key to extracting precise, quality data
• Obtain a deeper understanding of partner operations and key relationships
34
Q & A
Connotate will email a link to this presentation as well as a
copy of the slides to you within 2 business days.
If you would like to use an advanced Web data collection solution
to support background checks of employees or business
partners in-house, please call (+1) 732-296-8844 or visit
www.connotate.com or www.connotate.co.uk
For more information about law enforcement applications and
advanced analytics, please visit www.digitalreasoning.com.
35
Thank You
If you have an immediate need and would like us to contact
you about a forthcoming project, please check the appropriate
box in the last polling question or call (+1) 732-296-8844.
For more information, visit
www.connotate.com or www.connotate.co.uk
and
www.digitalreasoning.com
36

More Related Content

What's hot

Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...
Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...
Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...Cloudera, Inc.
 
Startds9.19.17sd
Startds9.19.17sdStartds9.19.17sd
Startds9.19.17sdThinkful
 
Deck 92-146 (3)
Deck 92-146 (3)Deck 92-146 (3)
Deck 92-146 (3)Thinkful
 
What does it_takes_to_be_a_good_data_scientist_2019_aim_simplilearn
What does it_takes_to_be_a_good_data_scientist_2019_aim_simplilearnWhat does it_takes_to_be_a_good_data_scientist_2019_aim_simplilearn
What does it_takes_to_be_a_good_data_scientist_2019_aim_simplilearnPraj H
 
Big Data Management: Work Smarter Not Harder
Big Data Management: Work Smarter Not HarderBig Data Management: Work Smarter Not Harder
Big Data Management: Work Smarter Not HarderJennifer Walker
 
SIM IT Trends Study 2013 - SIMposium Session
SIM IT Trends Study 2013 - SIMposium SessionSIM IT Trends Study 2013 - SIMposium Session
SIM IT Trends Study 2013 - SIMposium SessionLeon Kappelman
 
Calin Constantinov - Neo4j - Keyboards and Mice - Craiova 2016
Calin Constantinov - Neo4j - Keyboards and Mice - Craiova 2016Calin Constantinov - Neo4j - Keyboards and Mice - Craiova 2016
Calin Constantinov - Neo4j - Keyboards and Mice - Craiova 2016Calin Constantinov
 
KDD 2019 IADSS Workshop - Skills to Master Machine Learning and Data Science ...
KDD 2019 IADSS Workshop - Skills to Master Machine Learning and Data Science ...KDD 2019 IADSS Workshop - Skills to Master Machine Learning and Data Science ...
KDD 2019 IADSS Workshop - Skills to Master Machine Learning and Data Science ...IADSS
 
iTrain Malaysia: Data Science by Tarun Sukhani
iTrain Malaysia: Data Science by Tarun SukhaniiTrain Malaysia: Data Science by Tarun Sukhani
iTrain Malaysia: Data Science by Tarun SukhaniiTrain
 
What is a Data Scientist
What is a Data Scientist What is a Data Scientist
What is a Data Scientist Experian_US
 
2016 Data Science Salary Survey
2016 Data Science Salary Survey2016 Data Science Salary Survey
2016 Data Science Salary SurveyTrieu Nguyen
 
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on TrackYour AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on TrackPrecisely
 
DataEd Slides: Getting Data Quality Right – Success Stories
DataEd Slides: Getting Data Quality Right – Success StoriesDataEd Slides: Getting Data Quality Right – Success Stories
DataEd Slides: Getting Data Quality Right – Success StoriesDATAVERSITY
 
Who is a data scientist
Who is a data scientist  Who is a data scientist
Who is a data scientist prateek kumar
 
2015 data-science-salary-survey
2015 data-science-salary-survey2015 data-science-salary-survey
2015 data-science-salary-surveyAdam Rabinovitch
 
Big data & data science challenges and opportunities
Big data & data science   challenges and opportunitiesBig data & data science   challenges and opportunities
Big data & data science challenges and opportunitiesJose Quesada
 
Future of data science as a profession
Future of data science as a professionFuture of data science as a profession
Future of data science as a professionJose Quesada
 
Artificial Intelligence Expert Session Webinar
Artificial Intelligence Expert Session Webinar Artificial Intelligence Expert Session Webinar
Artificial Intelligence Expert Session Webinar ibi
 

What's hot (20)

Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...
Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...
Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...
 
Startds9.19.17sd
Startds9.19.17sdStartds9.19.17sd
Startds9.19.17sd
 
Deck 92-146 (3)
Deck 92-146 (3)Deck 92-146 (3)
Deck 92-146 (3)
 
What does it_takes_to_be_a_good_data_scientist_2019_aim_simplilearn
What does it_takes_to_be_a_good_data_scientist_2019_aim_simplilearnWhat does it_takes_to_be_a_good_data_scientist_2019_aim_simplilearn
What does it_takes_to_be_a_good_data_scientist_2019_aim_simplilearn
 
Big Data Management: Work Smarter Not Harder
Big Data Management: Work Smarter Not HarderBig Data Management: Work Smarter Not Harder
Big Data Management: Work Smarter Not Harder
 
Big data careers
Big data careersBig data careers
Big data careers
 
SIM IT Trends Study 2013 - SIMposium Session
SIM IT Trends Study 2013 - SIMposium SessionSIM IT Trends Study 2013 - SIMposium Session
SIM IT Trends Study 2013 - SIMposium Session
 
Calin Constantinov - Neo4j - Keyboards and Mice - Craiova 2016
Calin Constantinov - Neo4j - Keyboards and Mice - Craiova 2016Calin Constantinov - Neo4j - Keyboards and Mice - Craiova 2016
Calin Constantinov - Neo4j - Keyboards and Mice - Craiova 2016
 
KDD 2019 IADSS Workshop - Skills to Master Machine Learning and Data Science ...
KDD 2019 IADSS Workshop - Skills to Master Machine Learning and Data Science ...KDD 2019 IADSS Workshop - Skills to Master Machine Learning and Data Science ...
KDD 2019 IADSS Workshop - Skills to Master Machine Learning and Data Science ...
 
iTrain Malaysia: Data Science by Tarun Sukhani
iTrain Malaysia: Data Science by Tarun SukhaniiTrain Malaysia: Data Science by Tarun Sukhani
iTrain Malaysia: Data Science by Tarun Sukhani
 
What is a Data Scientist
What is a Data Scientist What is a Data Scientist
What is a Data Scientist
 
2016 Data Science Salary Survey
2016 Data Science Salary Survey2016 Data Science Salary Survey
2016 Data Science Salary Survey
 
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on TrackYour AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
 
DataEd Slides: Getting Data Quality Right – Success Stories
DataEd Slides: Getting Data Quality Right – Success StoriesDataEd Slides: Getting Data Quality Right – Success Stories
DataEd Slides: Getting Data Quality Right – Success Stories
 
Who is a data scientist
Who is a data scientist  Who is a data scientist
Who is a data scientist
 
2015 data-science-salary-survey
2015 data-science-salary-survey2015 data-science-salary-survey
2015 data-science-salary-survey
 
Big data & data science challenges and opportunities
Big data & data science   challenges and opportunitiesBig data & data science   challenges and opportunities
Big data & data science challenges and opportunities
 
Future of data science as a profession
Future of data science as a professionFuture of data science as a profession
Future of data science as a profession
 
Artificial Intelligence Expert Session Webinar
Artificial Intelligence Expert Session Webinar Artificial Intelligence Expert Session Webinar
Artificial Intelligence Expert Session Webinar
 
Machine Learning in Big Data
Machine Learning in Big DataMachine Learning in Big Data
Machine Learning in Big Data
 

Similar to Employees, Business Partners and Bad Guys: What Web Data Reveals About Persons of Interest

Know Your Market - Know Your Customer: What Web Data Reveals if You Know Wher...
Know Your Market - Know Your Customer: What Web Data Reveals if You Know Wher...Know Your Market - Know Your Customer: What Web Data Reveals if You Know Wher...
Know Your Market - Know Your Customer: What Web Data Reveals if You Know Wher...Connotate
 
Foundational Strategies for Trust in Big Data Part 2: Understanding Your Data
Foundational Strategies for Trust in Big Data Part 2: Understanding Your DataFoundational Strategies for Trust in Big Data Part 2: Understanding Your Data
Foundational Strategies for Trust in Big Data Part 2: Understanding Your DataPrecisely
 
Introduction to Data Mining
Introduction to Data Mining Introduction to Data Mining
Introduction to Data Mining Sushil Kulkarni
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big dataRaul Chong
 
Big data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makersBig data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makersRuhollah Farchtchi
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryNeo4j
 
Intro to Data Science Big Data
Intro to Data Science Big DataIntro to Data Science Big Data
Intro to Data Science Big DataIndu Khemchandani
 
Disrupting Data Discovery
Disrupting Data DiscoveryDisrupting Data Discovery
Disrupting Data Discoverymarkgrover
 
Building Data Science Teams
Building Data Science TeamsBuilding Data Science Teams
Building Data Science TeamsEMC
 
Department of Commerce App Challenge: Big Data Dashboards
Department of Commerce App Challenge: Big Data DashboardsDepartment of Commerce App Challenge: Big Data Dashboards
Department of Commerce App Challenge: Big Data DashboardsBrand Niemann
 
Thinkful DC - Intro to Data Science
Thinkful DC - Intro to Data Science Thinkful DC - Intro to Data Science
Thinkful DC - Intro to Data Science TJ Stalcup
 
How Startups can leverage big data?
How Startups can leverage big data?How Startups can leverage big data?
How Startups can leverage big data?Rackspace
 
Why Big Data is Really about Small Data
Why Big Data is Really about Small DataWhy Big Data is Really about Small Data
Why Big Data is Really about Small DataHurwitz & Associates
 
Advanced Analytics and Data Science Expertise
Advanced Analytics and Data Science ExpertiseAdvanced Analytics and Data Science Expertise
Advanced Analytics and Data Science ExpertiseSoftServe
 
DataSpryng Overview
DataSpryng OverviewDataSpryng Overview
DataSpryng Overviewjkvr
 
2017 06-14-getting started with data science
2017 06-14-getting started with data science2017 06-14-getting started with data science
2017 06-14-getting started with data scienceThinkful
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityPrecisely
 
Data science.chapter-1,2,3
Data science.chapter-1,2,3Data science.chapter-1,2,3
Data science.chapter-1,2,3varshakumar21
 

Similar to Employees, Business Partners and Bad Guys: What Web Data Reveals About Persons of Interest (20)

Know Your Market - Know Your Customer: What Web Data Reveals if You Know Wher...
Know Your Market - Know Your Customer: What Web Data Reveals if You Know Wher...Know Your Market - Know Your Customer: What Web Data Reveals if You Know Wher...
Know Your Market - Know Your Customer: What Web Data Reveals if You Know Wher...
 
Foundational Strategies for Trust in Big Data Part 2: Understanding Your Data
Foundational Strategies for Trust in Big Data Part 2: Understanding Your DataFoundational Strategies for Trust in Big Data Part 2: Understanding Your Data
Foundational Strategies for Trust in Big Data Part 2: Understanding Your Data
 
Introduction to Data Mining
Introduction to Data Mining Introduction to Data Mining
Introduction to Data Mining
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big data
 
Big Data for HR
Big Data for HRBig Data for HR
Big Data for HR
 
Big data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makersBig data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makers
 
Working with data
Working with dataWorking with data
Working with data
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data Discovery
 
Intro to Data Science Big Data
Intro to Data Science Big DataIntro to Data Science Big Data
Intro to Data Science Big Data
 
Disrupting Data Discovery
Disrupting Data DiscoveryDisrupting Data Discovery
Disrupting Data Discovery
 
Building Data Science Teams
Building Data Science TeamsBuilding Data Science Teams
Building Data Science Teams
 
Department of Commerce App Challenge: Big Data Dashboards
Department of Commerce App Challenge: Big Data DashboardsDepartment of Commerce App Challenge: Big Data Dashboards
Department of Commerce App Challenge: Big Data Dashboards
 
Thinkful DC - Intro to Data Science
Thinkful DC - Intro to Data Science Thinkful DC - Intro to Data Science
Thinkful DC - Intro to Data Science
 
How Startups can leverage big data?
How Startups can leverage big data?How Startups can leverage big data?
How Startups can leverage big data?
 
Why Big Data is Really about Small Data
Why Big Data is Really about Small DataWhy Big Data is Really about Small Data
Why Big Data is Really about Small Data
 
Advanced Analytics and Data Science Expertise
Advanced Analytics and Data Science ExpertiseAdvanced Analytics and Data Science Expertise
Advanced Analytics and Data Science Expertise
 
DataSpryng Overview
DataSpryng OverviewDataSpryng Overview
DataSpryng Overview
 
2017 06-14-getting started with data science
2017 06-14-getting started with data science2017 06-14-getting started with data science
2017 06-14-getting started with data science
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data Quality
 
Data science.chapter-1,2,3
Data science.chapter-1,2,3Data science.chapter-1,2,3
Data science.chapter-1,2,3
 

More from Connotate

Using Web Data to Drive Revenue and Reduce Costs
Using Web Data to Drive Revenue and Reduce CostsUsing Web Data to Drive Revenue and Reduce Costs
Using Web Data to Drive Revenue and Reduce CostsConnotate
 
Three Steps to Accelerating Your Billing Reconciliation Process in Online Adv...
Three Steps to Accelerating Your Billing Reconciliation Process in Online Adv...Three Steps to Accelerating Your Billing Reconciliation Process in Online Adv...
Three Steps to Accelerating Your Billing Reconciliation Process in Online Adv...Connotate
 
Using Web Data to Fuel Dynamic Pricing
Using Web Data to Fuel Dynamic PricingUsing Web Data to Fuel Dynamic Pricing
Using Web Data to Fuel Dynamic PricingConnotate
 
Increase Profits with Better Vehicle Listing Data
Increase Profits with Better Vehicle Listing DataIncrease Profits with Better Vehicle Listing Data
Increase Profits with Better Vehicle Listing DataConnotate
 
Power Up Competitive Price Intelligence with Web Data
Power Up Competitive Price Intelligence with Web DataPower Up Competitive Price Intelligence with Web Data
Power Up Competitive Price Intelligence with Web DataConnotate
 
Power Up Your Competitive Price Intelligence With Web Data
Power Up Your Competitive Price Intelligence With Web DataPower Up Your Competitive Price Intelligence With Web Data
Power Up Your Competitive Price Intelligence With Web DataConnotate
 
Using Web Data to Drive Revenue and Reduce Costs
Using Web Data to Drive Revenue and Reduce CostsUsing Web Data to Drive Revenue and Reduce Costs
Using Web Data to Drive Revenue and Reduce CostsConnotate
 
Big Data and Competitive Intelligence
Big Data and Competitive Intelligence Big Data and Competitive Intelligence
Big Data and Competitive Intelligence Connotate
 


Employees, Business Partners and Bad Guys: What Web Data Reveals About Persons of Interest

  • 1. Employees, Business Partners and Bad Guys: What Web data reveals about persons of interest. Presenters: Gina Cerami, VP of Marketing, Connotate; Dave Danielson, VP of Marketing, Digital Reasoning; Claire Schmidt, Director of Programs, Thorn: Digital Defenders of Children (formerly DNA Foundation). Date: November 28, 2012
  • 2. Today’s Discussion • What Web Data Reveals: The Fundamentals. The business case; employee background check – business partner screening – persons of interest • Collecting Good Data: Not That Easy. Where to start? Best practices; differences in data sources – the automation process • Analyzing Data: A Difficult Problem. Why advanced text analytics matters; making sense of big data • Automation and Advanced Analytics: A Powerful Combination. Background check accuracy enhanced with Entity Resolution • Thorn: Working to End Child Sexual Exploitation. Combined solution applied to detecting child sex trafficking online • Q&A
  • 3. What Web Data Reveals: The Fundamentals
  • 4. The Business Case: news – blogs – social media; trillions of URLs; court records – registries – sanctions lists
  • 5. What Web Data Reveals About Persons of Interest. Prospective Employees: • 3-minute screening using public records in 1,500 jurisdictions • Eliminate human error • Save time, money; hire right the first time. Business Partners: • Check sanctions lists • Identify politically exposed persons • Reduce business risk • Avoid fines – comply with AML/KYC rules. Bad Guys: • Extract precise data from 10,000+ records on URLs linked to illegal activities • Use advanced analytics to narrow the scope of investigations. Automated, precise data collection is key to success
  • 6. The Cost of Not Knowing Your Employees. The cost of fraud in the workplace: • $400 to $600 billion/year in the U.S. (Harvard) • 5% of revenue on average (Association of Fraud Examiners) • $3.2B in Canada in 2011 (Certified General Accountants Assoc. of Canada). The cost of re-hiring (not getting it right the first time): • From $3.5K (U.S. average cost-per-hire) to millions of dollars for CEOs. How does this happen? • 50% of resumes have factual errors • 1 in 5 job applications has a major lie or discrepancy (UK 2009 survey) • Many background checks are manual (error prone) or incomplete
  • 7. Solution: Comprehensive Search. Regular monitoring of all levels of government sites • National, state, county and local • If you outsource – make sure your screening service continually monitors these sites for updates • If you already do it yourself – consider automating the Web data collection process to ensure accuracy and timeliness. Connotate’s software powers over 250,000 background checks per month; 3 million to date
  • 8. The Cost of Not Knowing Business Partners. Recent Bank Secrecy Act (BSA) Penalties: • $1.2 B – Citibank, April 2012 • $7 M – Pacific National Bank, March 2011. BSA / Anti-Money Laundering (AML) Penalties: • $10.9 M – Ocean Bank (FL), August 2011. Reputation Risk – substantial
  • 9. Solution: Comprehensive Search. Comprehensive searches by third-party services are available for specific vertical industries. If you wish to conduct customized searches on a regular basis, consider automated data collection • Sanctions lists (Treasury.gov, ICE, EU Terrorism List, FBI Most Wanted, OCC Shell Bank, etc.) • PACER, national and state lists • Social media may reveal that the person of interest is associating with others on sanctions lists
  • 10. Collecting Good Data: Not That Easy
  • 11. Where to Start? Best Practices • Narrow your search • Scope the project • Think about the long term • Sources
  • 12. Differences in Web Sources
  • 13. Polling Question: Web Data Collection. Are you currently collecting background data from the Web? • Yes – we are doing this using an automated process • Yes – however, we are collecting Web data using a manual process • No – we outsource background checks to a third-party service
  • 14. An Overview of the Automation Process. Collect Data: Internal Sources (• Databases • Interviews • Resumes); External Sources (• Social Media • Surface Web • Hidden Web • Secured Sites). Transform: • Structure • Classify • Prep for Analysis. Deliver: • Reports • Dashboards • Workflow • BI Plug-ins
  • 15. Analyzing Data: A Difficult Problem
  • 16. New Content Sources Require Advanced Analytics. Collect Data: • Unstructured and Structured Data • Variety of Sources • Scalable • Automated. Transform: • Remove Formatting • Text Only. Advanced Analytics: Resolving the Unique Individual; Associating Time and Geographic Data; Fact/Assertion Extraction; Relationship Identification and Extraction. Outputs: • Reports • Dashboards • Workflow • BI Plug-ins
  • 17. Synthesys Overview: A software platform for making sense of big data. Disparate data: News, Web, Email, Research, Instant Messages. Synthesys Platform: READ (deep processing of unstructured data), RESOLVE (assemble, organize, and relate), REASON (uncover relationships, compare & correlate). Application-ready analytics: App Integration, Events/Alarms, Network Analysis. Analytic Primitives: • Natural Language Processing • Extraction • Geocoding • Time normalization • Entity Resolution • Synonym Generation • Similarity Algorithms • Connectivity. Machine Learning. Distributed Processing (Hadoop MapReduce). Distributed Storage (HBase, Cassandra, Cloudbase). Synthesys reads, resolves and reasons about entities and relationships in space and time.
  • 18. Other solutions are flawed and don’t make automated understanding possible. Historically, the market has built tools to help find reading material. Search: Google, Fast, Autonomy, Recommind, Lucene. Entity Extraction: Basis, Janya, Aerotext, Attensity, SAP/Inxight, Lexalytics, SRA NetOwl. Comprehensive Ontologies or Data Models: Clarabridge, Endeca, Expert Systems, IBM Entity Analytics, Informatica. Other text analytics solutions still require the human to read to understand
  • 19. Synthesys turns data into “knowledge objects”. [Figure: an annotated example sentence passing through the NLP extraction stages EOS, TOK, POS, CHUNK, NER, SREX.] Example text: “President Masayoshi Son wants to repeat the success he had while building Softbank into Japan’s third-largest wireless carrier. Son wants to take market share from entrenched giants and deliver more data to smartphones, tablets, cars and even bicycles.” Extracted objects: PERSON – President Masayoshi Son; LOCATION – Japan; ORGANIZATION – Softbank; PREDICATE – Built, To Take, To Deliver; NOUN-ENTITY – Market Share, More Data; ENTITY – Smartphones, Tablets
  • 20. Resolution makes “Concept” or “Semantic” understanding possible. Concept: California-based Apple. References/Mentions: Apple; Apple inc; Apple, inc.; California-based Apple; Secretive Apple; iPhone inventor; Steve Jobs’s Company; AAPL; Technology Innovator Apple. Synthesys resolves multiple, varied mentions across the entire data set as being part of the same concept based on their usage in context.
  • 21. Synthesys is “Software that Learns” new languages, patterns, categories, etc. Supervised machine learning techniques and patent-pending workflows allow content experts to train models and achieve quality improvement without any programming. Step 1: User uploads example of new document domain/language. Step 2: Synthesys predicts annotation. Step 3: Operator corrects annotation and adds categories. Step 4: Completed annotation is submitted to server. Step 5: Completed model training is submitted to Synthesys
  • 22. Synthesys Powers Tools: Providing a Common Global View. Leading Visualization Platforms; Relational Database Management System (RDBMS); Platform (Data Organized & Application-Ready); Data Sources: Structured Data, Unstructured Data
  • 23. Polling Question: Data Analysis. Are you looking to use analytics on Web data to resolve entities or understand relationships that might help in background investigations? • Yes – we are analyzing Web data manually today • Yes – we are analyzing with text extractors or other text mining tools • Yes – we have a near-term project to analyze Web data • No – but we may have a need to analyze Web data in the future • No – we have no plans to analyze Web data
  • 24. Web Data and Advanced Analytics: A Powerful Combination
  • 25. Employee Screening: A Delicate Balance. The cost of mistaken identity (incomplete screening) • Class action suits have been filed over erroneous sex offender reporting • Digital Reasoning’s Solution: Entity Resolution with Synthesys®. Positions of trust; Employee privacy; Safe workplace; Right hire the first time; Libel: Impact = job loss; EEOC / FCRA
  • 26. Business Partner Screening: Avoiding Legal and Reputation Risk. Anti-Money Laundering; Political Corruption; Foreign Corrupt Practices Act; Terrorist Financing; Suspicious Activity Report. Do we have the right person? (Nicknames, misspellings, etc.) Do we know who is connected with this company? What about foreign language sources? Increasingly, the sources of the information you need are in unstructured web content
  • 27. Thorn: Working to End Child Sexual Exploitation
  • 28. Thorn Overview. Thorn’s focus: The role technology plays in crimes involving the sexual exploitation of children. Thorn’s goal: To disrupt and deflate predatory behavior in the fight to end child sexual exploitation. Thorn creates tools, policies and programs to bring an end to illicit activities that could harm children. Technology Task Force consists of over 25 top tech companies that collaborate on technology initiatives to fight child sexual exploitation. Works closely with law enforcement, NGOs, private sector and its Technology Task Force. Part of the White House’s Office of Science and Technology’s commitment to end trafficking. Claire Schmidt is the Director of Programs for Thorn
  • 29. The Challenge • The explosive growth of online media has made it more difficult to monitor and identify illicit activities, including child sex trafficking • Traditional analytics tools do a poor job of monitoring these forms of online media • Data is “messy” and unstructured • Data is often false • Real age is difficult to determine from online data content • Law enforcement has few, if any, tools to combat this problem
  • 30. Combating Sex Trafficking: Project Overview • Thorn desired to determine the feasibility of using advanced text analytics to detect child sex trafficking in online media. • Connotate built a process to automatically download data from selected websites. • Digital Reasoning developed analytics to detect potential child sex trafficking activity within the collected data. Process flow: Widely Varied Data Sources; Data Aggregation and Cleansing; Analytics, resolution, and pattern matching; Analytics results, reports, charts
  • 31. Project Methodology. Interview Law Enforcement Experts • Interviewed Law Enforcement officials and determined three major focal points for automated understanding. Isolate and Map Semantic Features • Interview results were mapped into semantic features (“signatures”). Develop models for use in Synthesys • Analytic models were created by training Synthesys on the semantic signatures. Identify sources of Internet data • Then configured into Connotate for automated collection, cleansing and transformation of data
  • 32. Key Innovative Developments • Accurate telephone number extractor • Unique profiles for people posting ads • Analytic assessment models for text • Achieved High Level of Accuracy
  • 33. Web Data Collection and Advanced Analytics. Connotate: Collect Data (• Unstructured and Structured Data • Variety of Sources • Scalable • Automated) and Transform (• Remove Formatting • Text Only). Digital Reasoning: Advanced Analytics. Outputs: • Reports • Dashboards • Workflow • BI Plug-ins. Connotate provides precise quality data, formatted for delivery to your analysis tools. Digital Reasoning applies advanced analytics to resolve identities, enrich data and develop unique profiles of individuals targeted for investigation
  • 34. Web Data Can Reveal Insights of Tremendous Value. Valid insights require precise, quality data. Avoid mistaken identity with entity resolution. Automation is the key to extracting precise, quality data. Obtain a deeper understanding of partner operations and key relationships
  • 35. Q & A. Connotate will email a link to this presentation as well as a copy of the slides to you within 2 business days. If you would like to use an advanced Web data collection solution to support background checks of employees or business partners in-house, please call (+1) 732-296-8844 or visit www.connotate.com or www.connotate.co.uk. For more information about law enforcement applications and advanced analytics, please visit www.digitalreasoning.com.
  • 36. Thank You. If you have an immediate need and would like us to contact you about a forthcoming project, please check the appropriate box in the last polling question or call (+1) 732-296-8844. For more information, visit www.connotate.com or www.connotate.co.uk and www.digitalreasoning.com
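
The mention-resolution idea on slide 20, and the nickname and misspelling question on slide 26, can be illustrated with a small Python sketch. It is not how Synthesys works: the slides describe resolution based on usage in context, whereas this sketch only compares normalized strings using the standard library's `difflib`. The function names, the suffix list, and the 0.75 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical list of corporate suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "co", "ltd"}


def normalize(mention: str) -> str:
    """Lowercase, drop punctuation, and strip common corporate suffixes."""
    cleaned = "".join(ch for ch in mention.lower() if ch.isalnum() or ch.isspace())
    tokens = [tok for tok in cleaned.split() if tok not in SUFFIXES]
    return " ".join(tokens)


def similar(a: str, b: str, threshold: float = 0.75) -> bool:
    """Return True when two normalized mentions are close enough as strings."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


def resolve(mentions, threshold: float = 0.75):
    """Greedily group mentions; each group is compared via its first member."""
    clusters = []
    for mention in mentions:
        for cluster in clusters:
            if similar(mention, cluster[0], threshold):
                cluster.append(mention)
                break
        else:
            # No existing group matched, so start a new one.
            clusters.append([mention])
    return clusters


if __name__ == "__main__":
    print(resolve(["Apple", "Apple, inc.", "Appel", "Softbank"]))
```

String similarity alone cannot link mentions such as “AAPL” or “iPhone inventor” to Apple, which is the gap that context-based entity resolution is meant to fill.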