Copyright © 2018 Talentica Software (I) Pvt Ltd. All rights reserved.
Fingerprinting Latent Structure in Data
MRITYUNJAY KUMAR & GUNTUR RAVINDRA
TECHNOLOGY EXCELLENCE GROUP
TALENTICA SOFTWARE
PRESENTED AT DAIR (DATA ANALYTICS AND INTELLIGENCE RESEARCH, INDIAN INSTITUTE OF TECHNOLOGY, DELHI)
Agenda
• Challenge with building data-driven algorithms
  • Small-data
• Introduction to data fingerprinting
• Two problem statements
  • Solving a question complexity problem
  • Solving an image recognition problem
• Fingerprinting the structure in data
  • Extracting structure
  • Representing structure as a signature
• Other complex problems
What is data fingerprinting
• A method to represent a block of data as an entity
• Applications: easy validation, proof of originality, tamper detection, data loss prevention (DLP)
• Classical techniques
  • Bloom filters, cryptographic hashes
• Main issues with fingerprinting
  • Classical fingerprints do not capture data semantics
  • Large number of fingerprints → complexity
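The point about semantics can be illustrated with a cryptographic hash. The sketch below is not from the talk: the helper names `sha256_fingerprint` and `hamming_fraction` and the two sample strings are assumptions chosen for illustration. It shows that dropping a single character from a question yields an unrelated fingerprint, even though the meaning is unchanged.

```python
import hashlib


def sha256_fingerprint(text: str) -> str:
    """Return the SHA-256 hex digest of a UTF-8 encoded string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def hamming_fraction(hex_a: str, hex_b: str) -> float:
    """Fraction of differing bits between two equal-length hex digests."""
    total_bits = len(hex_a) * 4
    differing = bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")
    return differing / total_bits


# Two strings a human would treat as the same question (illustrative data).
q1 = "How many students are in the class?"
q2 = "How many students are in the class"  # trailing '?' dropped

fp1 = sha256_fingerprint(q1)
fp2 = sha256_fingerprint(q2)

print(fp1 == fp2)                  # the fingerprints do not match
print(hamming_fraction(fp1, fp2))  # typically close to 0.5 for a good hash
```

A hash is designed so that nearby inputs map to unrelated outputs, which is useful for tamper detection but gives no notion of similarity between fingerprints.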
Two Problems
Recognizing question complexity
Recognizing structural deformation in cells
Data source: https://www.kaggle.com/c/data-science-bowl-2018/data
Data-driven algorithms with Small-Data
• Need for problem-specific data
• Rule-based approaches
  • Rule-based approaches are easy to implement
  • Not all data characteristics can be captured as rules
  • Rules do not automatically adapt to the data
• Machine learning approach
  • ML approaches need large amounts of data
  • Generic models and open-source data are not suitable for application-specific needs
  • Can build complex structures and designs
Architecting a solution
• Knowledge has a latent structure
  • Sequence, geometry
• There can be hierarchies of structures
• Convert structure to a computational representation
• Objective: the context of application capabilities influences the computational representation
Problem Formulation
A set of elements: images, questions, text messages
An objective
A subset of structures relevant to an objective
How do we define, and how do we find, these structures?
Transformation of elements into a structure and hence a computational entity
A human in the loop
Structures in Data
How many buses are plying in Mumbai on a route originating at Dadar and ending at Vashi?
How many students are in the class?
Structures in Data
Intensity Projections
Oriented gradients
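The slide names two image structures without giving their computation. As a minimal sketch, assuming NumPy, intensity projections can be taken as row and column sums, and oriented gradients as a magnitude-weighted histogram of gradient directions. The function names, the 8 bins, and the use of unsigned orientations are assumptions for illustration, not the authors' settings.

```python
import numpy as np


def intensity_projections(img: np.ndarray):
    """Sum pixel intensities along each axis: (row profile, column profile)."""
    return img.sum(axis=1), img.sum(axis=0)


def orientation_histogram(img: np.ndarray, bins: int = 8) -> np.ndarray:
    """Histogram of unsigned gradient orientations, weighted by magnitude."""
    # np.gradient returns the derivative along axis 0 (rows) first.
    gy, gx = np.gradient(img.astype(float))
    magnitude = np.hypot(gx, gy)
    # Fold directions into [0, pi) so opposite gradients share a bin.
    angle = np.mod(np.arctan2(gy, gx), np.pi)
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, np.pi),
                           weights=magnitude)
    total = hist.sum()
    return hist / total if total > 0 else hist


# Synthetic 8x8 image with a bright vertical bar in columns 3 and 4.
img = np.zeros((8, 8))
img[:, 3:5] = 1.0

rows, cols = intensity_projections(img)
hist = orientation_histogram(img)
print(rows)  # every row has the same total intensity
print(cols)  # only the bar's columns are non-zero
print(hist)  # all gradient energy falls in the horizontal-direction bin
```

Both outputs are short fixed-length vectors, which is what makes them candidates for a structural signature of the image.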
Problem Formulation
A function that maps a structure to a vector
For computational ease, we make the vector a binary bit-vector
The inverse of the function results in one of many structures
The goal is to find the function so as to satisfy the constraints
This is a constrained optimization formulation
Solution: Optimization formulation
• Based on the problem formulation
  • We have an optimization formulation that has an inverse that results in the variable itself or a subset of variables
  • A related function is a neural auto-encoder
• Solution boils down to
  • Training an auto-encoder with one class of data
  • Recognizing data class involves
    • Data clustering
    • Human intelligence/visual inspection to mark clusters
    • Data in clusters used to train the auto-encoder
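The idea of training an auto-encoder on a single class can be sketched without a deep-learning framework. The example below swaps in a linear auto-encoder solved in closed form with an SVD (equivalent to PCA) in place of the neural auto-encoder the slides refer to. The function names, the synthetic data, and the code size of 2 are assumptions for illustration only.

```python
import numpy as np


def fit_linear_autoencoder(X: np.ndarray, code_size: int):
    """Fit a linear auto-encoder in closed form (equivalent to PCA).

    Returns (mean, W) with encode(x) = (x - mean) @ W and
    decode(z) = z @ W.T + mean.
    """
    mean = X.mean(axis=0)
    # The leading right singular vectors of the centred data give the
    # linear code with the smallest squared reconstruction error.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    W = vt[:code_size].T
    return mean, W


def reconstruct(X: np.ndarray, mean: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Encode and then decode X with the fitted linear auto-encoder."""
    return (X - mean) @ W @ W.T + mean


rng = np.random.default_rng(0)

# Hypothetical "one class": 200 samples in 10-D lying on a 2-D subspace.
basis = rng.normal(size=(2, 10))
one_class = rng.normal(size=(200, 2)) @ basis

mean, W = fit_linear_autoencoder(one_class, code_size=2)

# Data from the training class is reconstructed almost perfectly...
in_class_error = np.mean((one_class - reconstruct(one_class, mean, W)) ** 2)
# ...while data without that structure is reconstructed poorly.
other = rng.normal(size=(200, 10))
out_class_error = np.mean((other - reconstruct(other, mean, W)) ** 2)

print(in_class_error, out_class_error)
```

In the workflow on the slide, `one_class` would be the members of a cluster that a human has inspected and marked as a single class.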
Recognition: Cell Structure
Recognition: Question Complexity
How much can the SP alter income tax in Scotland?
What is stage 1 in the life of a bill?
Who is the President of Egypt?
Why do some people purposely resist officers of the law?
Why is the need for acceptance of punishment needed?
Why would one plead guilty to a crime involving civil disobedience?
Why is giving a defiant speech sometimes more harmful for the individual?
Why did Harvard end its early admission program?
Solution: Recognition
• The auto-encoder output has distortions
• Detect the distortion
• Quantify the distortion
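The slides do not say how the distortion is measured. One common choice, assumed here, is the per-sample mean squared error between input and reconstruction, with a threshold taken from a high percentile of the scores on the training class. The function `toy_reconstruct` is only a stand-in for a trained auto-encoder, and all names and data are hypothetical.

```python
import numpy as np


def distortion(x: np.ndarray, x_hat: np.ndarray) -> np.ndarray:
    """Quantify distortion as per-sample mean squared reconstruction error."""
    return np.mean((x - x_hat) ** 2, axis=1)


def fit_threshold(train_scores: np.ndarray, percentile: float = 99.0) -> float:
    """Pick a cut-off from the distortion scores of the training class."""
    return float(np.percentile(train_scores, percentile))


def is_known_class(scores: np.ndarray, threshold: float) -> np.ndarray:
    """Detect distortion: a sample is accepted when its score is small."""
    return scores <= threshold


def toy_reconstruct(x: np.ndarray) -> np.ndarray:
    """Stand-in for a trained auto-encoder: keeps the first two features."""
    x_hat = np.zeros_like(x)
    x_hat[:, :2] = x[:, :2]
    return x_hat


rng = np.random.default_rng(1)
# Known class: almost all variation lives in the first two features.
known = np.hstack([rng.normal(size=(100, 2)),
                   0.01 * rng.normal(size=(100, 4))])
# Unknown class: every feature carries signal.
unknown = rng.normal(size=(20, 6)) + 3.0

threshold = fit_threshold(distortion(known, toy_reconstruct(known)))
flags = is_known_class(distortion(unknown, toy_reconstruct(unknown)), threshold)
print(threshold, flags.any())
```

The percentile trades missed in-class samples against accepted out-of-class samples, so it would normally be tuned on held-out data.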
Building Complexity
• Incremental addition of data classes
• Using stacking
• Unique binary code injected in each stacked layer
• Collapse stacked layers into a classification model → redeploy
Data type | Test cases | True positive | False positive | True negative | False negative
With classes like in training data | 1781 | 1774 | NA | NA | 7
With classes not like in training data | 8789 | NA | 13 | 8776 | NA
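The counts in the table can be turned into rates with a few lines of arithmetic. The sketch below assumes both rows were scored by the same model, which the slide does not state explicitly. The variable names are illustrative.

```python
# Counts copied from the results table.
tp, fn = 1774, 7    # test cases with classes like the training data
fp, tn = 13, 8776   # test cases with classes unlike the training data

recall = tp / (tp + fn)                      # share of known-class cases accepted
specificity = tn / (tn + fp)                 # share of unknown-class cases rejected
accuracy = (tp + tn) / (tp + fn + fp + tn)   # share of all cases handled correctly

print(f"recall={recall:.4f} specificity={specificity:.4f} accuracy={accuracy:.4f}")
# prints: recall=0.9961 specificity=0.9985 accuracy=0.9981
```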
Summary
• A large number of applications are still small-data applications
• Data has latent structure
  • Extraction is objective-based and data-specific
• We can harness data-hungry algorithms for small-data applications
  • Use structures instead of raw data
  • Auto-encoders are powerful tools
  • Build incremental complexity
Data Fingerprinting for Small Data Problems Using Autoencoders

  • 1. Copyright © 2018 Talentica Software (I) Pvt Ltd. All rights reserved. Fingerprinting Latent Structure in Data MRITYUNJAY KUMAR & GUNTUR RAVINDRA TECHNOLOGY EXCELLENCE GROUP TALENTICA SOFTWARE PRESENTED AT DAIR (DATA ANALYTICS AND INTELLIGENCE RESEARCH ,INDIAN INSTITUTE OF TECHNOLOGY, DELHI)
  • 2. Copyright © 2018 Talentica Software (I) Pvt Ltd. All rights reserved. Agenda  Challenge with building data-driven algorithms  Small-data  Introduction to data fingerprinting  Two problem statements  Solving a Question complexity problem  Solving an Image recognition problem  Fingerprinting the structure in data  Extracting structure  Representing structure as a signature  Other complex problems
  • 3. What is data fingerprinting
      • A method to represent a block of data as an entity
      • Applications: easy validation, proof of originality, tamper detection, DLP (data loss prevention)
      • Classical techniques: Bloom filters, cryptographic hashes
      • Main issues with fingerprinting
          • Do not capture data semantics
          • Large number of fingerprints → complexity
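The point that classical fingerprints "do not capture data semantics" can be illustrated with a cryptographic hash. The sketch below is only an illustration, not something taken from the slides: the helper name `fingerprint` and the choice of SHA-256 are assumptions, and the two inputs are the same question with and without its question mark.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Classical fingerprint: SHA-256 digest of a block of data (hypothetical helper)."""
    return hashlib.sha256(block).hexdigest()


# Two inputs with the same meaning, differing by one character.
with_mark = fingerprint(b"How many students are in the class?")
without_mark = fingerprint(b"How many students are in the class")

# The digests are unrelated, so nothing about the shared meaning survives.
print(with_mark)
print(without_mark)
print("identical fingerprints:", with_mark == without_mark)
```

The same input always yields the same digest, which is what makes hashes useful for tamper detection, but near-identical inputs yield unrelated digests, so the fingerprint says nothing about structure or meaning.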
  • 4. Two Problems
  • 5. Recognizing question complexity
  • 6. Recognizing question complexity
  • 7. Recognizing structural deformation in cells. Data source: https://www.kaggle.com/c/data-science-bowl-2018/data
  • 8. Data-driven algorithms with Small-Data
      • Need for problem-specific data
      • Rule-based approaches
          • Rule-based approaches are easy to implement
          • Not all data characteristics can be captured as rules
          • Do not automatically adapt to the data
      • Machine learning approach
          • ML approaches need large amounts of data
          • Generic models and open-source data are not suitable for application-specific needs
          • Can build complex structures and designs
  • 9. Architecting a solution
      • Knowledge has a latent structure
      • Sequence, geometry
      • There can be hierarchies of structures
      • Convert structure to a computational representation
      • Objective: the context of application capabilities influences the computational representation
  • 10. Problem Formulation
      • A set of elements: images, questions, text messages
      • An objective
      • A subset of structures relevant to an objective
      • How do we define and how do we find
      • Transformation of elements into a structure and hence a computational entity
      • A human in the loop
  • 11. Structures in Data
      • How many buses are plying in Mumbai on a route originating at Dadar and ending at Vashi?
      • How many students are in the class?
  • 12. Structures in Data
      • Intensity
      • Projections
      • Oriented gradients
  • 13. Problem Formulation
      • A function that maps a structure to a vector
      • The inverse of the function results in one of many structures
      • For computational ease we make the vector a binary bit-vector
      • Goal is to find the function so as to satisfy the constraints
      • This is a constrained optimization formulation
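The symbols on slide 13 did not survive extraction. One possible way to write the formulation down, borrowing the names F, INV and S′ from editor's note 3, is sketched below. Everything else is an assumption made for illustration: that S′ is the subset of structures relevant to the objective, the code length n, the distance d, and the exact form of the constraints. This is not the authors' formula.

```latex
% Hypothetical notation: S' = structures relevant to the objective,
% n = length of the binary code, d = any distance between structures.
F : S' \to \{0,1\}^{n}, \qquad \mathrm{INV} : \{0,1\}^{n} \to S'

\min_{F,\ \mathrm{INV}} \; \sum_{s \in S'} d\bigl(s,\ \mathrm{INV}(F(s))\bigr)
\quad \text{subject to} \quad F(s) \in \{0,1\}^{n}, \quad \mathrm{INV}(F(s)) \in S'
```

Read this way, the constraint that INV may return any element of S′ (not necessarily s itself) is what note 3 says distinguishes the formulation from a plain auto-encoder.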
  • 14. Solution: Optimization formulation
      • Based on the problem formulation
      • We have an optimization formulation that has an inverse that results in the variable itself or a subset of variables
      • A related function is a neural auto-encoder
      • Solution boils down to
          • Training an auto-encoder with one class of data
      • Recognizing data class involves
          • Data clustering
          • Human intelligence/visual inspection to mark clusters
          • Data in clusters used to train the auto-encoder
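The idea of "training an auto-encoder with one class of data" can be sketched with a deliberately tiny model. The code below is not the network used in the talk: it assumes a linear auto-encoder with a single latent unit and tied encoder/decoder weights, trained by plain gradient descent on made-up 2-D points. All names, the data, the learning rate and the epoch count are illustrative assumptions.

```python
def reconstruct(w, x):
    """Encode x to one latent value (w . x), then decode with the same weights."""
    code = sum(wi * xi for wi, xi in zip(w, x))
    return [wi * code for wi in w]


def reconstruction_error(w, x):
    """Squared distance between x and its reconstruction."""
    return sum((xi - ri) ** 2 for xi, ri in zip(x, reconstruct(w, x)))


def train(data, w, lr=0.05, epochs=300):
    """Batch gradient descent on the mean reconstruction error."""
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x in data:
            code = sum(wi * xi for wi, xi in zip(w, x))
            residual = [xi - wi * code for xi, wi in zip(x, w)]
            r_dot_w = sum(ri * wi for ri, wi in zip(residual, w))
            for j in range(len(w)):
                # Derivative of ||x - w (w . x)||^2 with respect to w[j].
                grad[j] += -2.0 * (residual[j] * code + r_dot_w * x[j])
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


# One class of data: points whose two coordinates are equal (assumed toy class).
in_class = [[-1.0, -1.0], [-0.5, -0.5], [0.5, 0.5], [1.0, 1.0]]
weights = train(in_class, [0.6, 0.2])

print("error on an unseen in-class point:", reconstruction_error(weights, [0.8, 0.8]))
print("error on an out-of-class point:   ", reconstruction_error(weights, [1.0, -1.0]))
```

After training on the single class, a new point with the same structure is reconstructed almost exactly, while a point with a different structure is reconstructed poorly. That gap is the signal the recognition step relies on.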
  • 15. Recognition: Cell Structure
  • 16. Recognition: Question Complexity
      • How much can the SP alter income tax in Scotland?
      • What is stage 1 in the life of a bill?
      • Who is the President of Egypt?
      • Why do some people purposely resist officers of the law?
      • Why is the need for acceptance of punishment needed?
      • Why would one plead guilty to a crime involving civil disobedience?
      • Why is giving a defiant speech sometimes more harmful for the individual?
      • Why did Harvard end its early admission program?
  • 17. Solution: Recognition
      • The auto-encoder output has distortions
      • Detect the distortion
      • Quantify the distortion
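The slide does not say how the distortion is detected or quantified. A minimal sketch of one common approach follows, under stated assumptions: the distortion is measured as the mean squared error between an input and its reconstruction, and an input is accepted as in-class when its distortion stays below the mean plus k standard deviations of the distortions seen on training data. The function names, the value of k and the sample numbers are all hypothetical.

```python
import statistics


def distortion(original, reconstructed):
    """Quantify distortion as the mean squared error (assumed measure)."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def fit_threshold(training_distortions, k=3.0):
    """Assumed rule: mean + k population standard deviations of in-class distortions."""
    mean = statistics.mean(training_distortions)
    spread = statistics.pstdev(training_distortions)
    return mean + k * spread


def is_in_class(original, reconstructed, threshold):
    """Detect distortion: accept the input only if it was reconstructed well enough."""
    return distortion(original, reconstructed) <= threshold


# Made-up distortions measured on data of the class the auto-encoder was trained on.
threshold = fit_threshold([0.010, 0.012, 0.008, 0.011, 0.009])

signal = [1.0, 0.0, 1.0, 0.0]
good_reconstruction = [0.9, 0.1, 1.0, 0.0]   # small distortion
poor_reconstruction = [0.5, 0.5, 0.5, 0.5]   # structure lost

print(is_in_class(signal, good_reconstruction, threshold))
print(is_in_class(signal, poor_reconstruction, threshold))
```

Other distortion measures (for example a structural similarity score for images) would slot into `distortion` without changing the rest of the flow.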
  • 18. Building Complexity
      • Incremental addition of data classes
      • Using stacking
      • Unique binary code injected in each stacked layer
      • Collapse stacked layers into a classification model
      • Redeploy
  • 19.
      Data Type                              | Test Cases | True Positive | False Positive | True Negative | False Negative
      With classes like in training data     | 1781       | 1774          | NA             | NA            | 7
      With classes not like in training data | 8789       | NA            | 13             | 8776          | NA
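The counts on slide 19 can be turned into rates with a few lines of arithmetic. The sketch below uses only the numbers from the slide. Combining the two rows into a single confusion matrix, and the choice of metrics (recall, false positive rate, accuracy), are assumptions made here; the slide itself reports only the raw counts.

```python
# Counts copied from the results on slide 19.
true_positive, false_negative = 1774, 7     # 1781 test cases with classes like the training data
false_positive, true_negative = 13, 8776    # 8789 test cases with classes unlike the training data

# Share of in-class cases that were recognized.
recall = true_positive / (true_positive + false_negative)

# Share of out-of-class cases that were wrongly accepted.
false_positive_rate = false_positive / (false_positive + true_negative)

# Share of all test cases that were handled correctly.
total = true_positive + false_negative + false_positive + true_negative
accuracy = (true_positive + true_negative) / total

print(f"recall: {recall:.4f}")
print(f"false positive rate: {false_positive_rate:.4f}")
print(f"accuracy: {accuracy:.4f}")
```

Under these assumptions the recall is about 99.6%, the false positive rate about 0.15%, and the overall accuracy about 99.8%.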
  • 20. Summary
      • A large number of applications are still small-data applications
      • Data has latent structure
      • Extraction is objective based and data specific
      • We can harness data-hungry algorithms for small-data applications
      • Use structures instead of raw data
      • Auto-encoders are powerful tools
      • Build incremental complexity

Editor's notes

  1. Sequence of system call executions → a computer program
     Sequence of words → a sentence
     Organization of pixel intensities in a 2D space → an image
     Sequence of images → a video
  2. Explain objective: an objective is to detect if a question can be answered by a trained API-based model. An objective can also be to detect if a cell is not deformed.
  3. Explain that this is similar to an auto-encoder's F and INV(F), except that INV can return the representation of any element in S'.