Converting High Dimensional Problems to Low Dimensional Ones
General Paradigm
                 Reduce and Conquer

• Large Problem  Small Problem

   – Break array into two parts

   – Consider odd and even elements

   – Sample edges in a graph to obtain a smaller graph

   – Represent a graph by a collection of trees

   – Take number modulo small prime

   – Multiply matrix by a random vector

   – Project high dimensional point sets into fewer dimensions
The Problem


• Given n points in D dimensional space

• Project them into d << D dimensions
   – so that the Euclidean distance between every pair of points is
     approximately preserved

• How does d compare to n?
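To make "approximately preserved" measurable, one option is the worst relative change in any pairwise distance. The helper names `euclidean` and `max_distortion` below are hypothetical, introduced only for illustration:

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_distortion(original, projected):
    """Largest |projected distance / original distance - 1| over all pairs.

    original[i] and projected[i] are the same point before and after
    dimension reduction (they may have different lengths).
    """
    worst = 0.0
    for i, j in combinations(range(len(original)), 2):
        d_orig = euclidean(original[i], original[j])
        if d_orig == 0:
            continue  # coincident points: relative change is undefined
        d_proj = euclidean(projected[i], projected[j])
        worst = max(worst, abs(d_proj / d_orig - 1))
    return worst


pts = [[0, 0, 0], [1, 0, 0], [0, 2, 0]]
dropped_z = [[0, 0], [1, 0], [0, 2]]           # z is always 0, so dropping it is exact
doubled = [[2 * x for x in p] for p in pts]    # every distance doubles
print(max_distortion(pts, dropped_z))  # 0.0
print(max_distortion(pts, doubled))    # 1.0
```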
Application


• Hierarchical Clustering

• Say ten thousand samples, each described by a few million
  SNPs

• A few million dimensions → a few hundred/thousand? And fast?
First Attempt


• Can we make d=n-1?

  – X axis through 2 of the points

  – Y axis so 3rd point is in the XY
    plane

  – Z axis so 4th point is in the XYZ
    3d space

  – And so on
First Attempt


• Time taken

  – Each new axis has to be made
    orthogonal to all previous axes

  – $O(n^2 D)$: up to n axes, each orthogonalized against up to n
    earlier axes at O(D) per dot product

  – Too slow
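A minimal Python sketch of this first attempt, assuming points are plain lists of floats. It runs (modified) Gram-Schmidt on the difference vectors p − p0, so every pairwise distance is preserved exactly; the name `exact_embedding` and the tolerance `tol` are illustrative choices, not from the slides. The two nested loops over axes, each doing length-D dot products, make the O(n^2 D) cost visible:

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def exact_embedding(points, tol=1e-12):
    """Map n points in D dimensions to at most n-1 coordinates, preserving
    all pairwise Euclidean distances (up to floating-point error)."""
    origin = points[0]
    axes = []  # orthonormal basis built so far
    for p in points[1:]:
        v = [a - b for a, b in zip(p, origin)]
        for e in axes:  # make v orthogonal to every previous axis
            c = dot(v, e)
            v = [vi - c * ei for vi, ei in zip(v, e)]
        norm = math.sqrt(dot(v, v))
        if norm > tol:  # a genuinely new direction becomes a new axis
            axes.append([vi / norm for vi in v])
    # Coordinates of each point in the new basis (origin maps to all zeros).
    return [[dot([a - b for a, b in zip(p, origin)], e) for e in axes]
            for p in points]


pts = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0], [2.0, 2.0, 2.0, 0.0]]
emb = exact_embedding(pts)
print(len(emb[0]))  # number of new coordinates, at most n - 1 = 2
```

The absolute tolerance is a simplification; points with very large or very small coordinates would call for a relative threshold.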
Second Attempt
          Use Random Projections


• Take d random vectors r1..rd

• For every point p, take the d dimensional point
      • [ p.r1 p.r2 .. p.rd ] * scaling-factor

• Do these d-dim points preserve inter-point
  distances approximately? How large should d be?
Random Projections
              Further Simplification


• Take any vector p in D dimensions

• Suppose we show
   – [ p.r1 p.r2 .. p.rd ] * scaling-factor has length ~ |p|
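   – Failure prob < $1/n^3$

• The map is linear, so the distance between the projections of two
  points p and q is the projected length of the difference vector p − q.
  Applying the single-vector result to each of the fewer than $n^2$
  difference vectors, the probability that even one length is not
  preserved is < $n^2/n^3 = 1/n$ (union bound)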
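The union-bound count written out; there are exactly $\binom{n}{2}$ pairs, slightly fewer than $n^2$:

```latex
\Pr[\text{some pair is distorted}]
  \le \sum_{i<j} \Pr[\text{pair } (i,j) \text{ is distorted}]
  < \binom{n}{2}\cdot\frac{1}{n^3}
  = \frac{n-1}{2n^2}
  < \frac{1}{2n}
  < \frac{1}{n}
```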
Random Projections
        What is a random vector?



• No directional bias
Normal Distributions

• Pr of being between x and x+dx
   – for N(0,1), proportional to $e^{-x^2/2}\,dx$
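For reference, the exact standard Normal density, including the normalizing constant $1/\sqrt{2\pi}$ that the "proportional to" hides:

```latex
\Pr[x \le X \le x + dx] = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx,
\qquad X \sim N(0,1)
```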
Generating Random Vectors without Directional Bias
• Take D numbers (X1...XD), each N(0,1), independently

• Distribution of each number X
   – Pr of being between a..a+da ~ $e^{-a^2/2}\,da$

• Pr[X1 in a1..a1+da1, X2 in a2..a2+da2, …, XD in aD..aD+daD]
   – ~ $e^{-a_1^2/2}\, e^{-a_2^2/2} \cdots e^{-a_D^2/2}\; da_1\, da_2 \cdots da_D$
   – $= e^{-(a_1^2+a_2^2+\cdots+a_D^2)/2}\; da_1\, da_2 \cdots da_D$
   – $= e^{-l^2/2}\; da_1\, da_2 \cdots da_D$, where $l^2 = a_1^2+\cdots+a_D^2$

   So the density has no dependence on direction, only on the length l!
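A small empirical sketch of the "no directional bias" property, using only Python's standard `random` module. The function names and the trial count are hypothetical choices for illustration. For any unit direction u, the projection v.u of a Gaussian vector v should have mean square 1, whether u is a coordinate axis or a diagonal:

```python
import math
import random


def gaussian_vector(D, rng):
    """D independent N(0,1) coordinates; the joint density depends only on length."""
    return [rng.gauss(0.0, 1.0) for _ in range(D)]


def random_unit_direction(D, rng):
    """Normalizing a Gaussian vector gives a direction uniform on the unit sphere."""
    v = gaussian_vector(D, rng)
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_square_along(direction, trials, rng):
    """Average of (v . direction)^2 over Gaussian v; about 1 for any unit direction."""
    total = 0.0
    for _ in range(trials):
        v = gaussian_vector(len(direction), rng)
        total += sum(a * b for a, b in zip(v, direction)) ** 2
    return total / trials


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
u = random_unit_direction(5, rng)
axis = [1.0, 0.0, 0.0]
diagonal = [1 / math.sqrt(3)] * 3
m_axis = mean_square_along(axis, 20000, rng)
m_diag = mean_square_along(diagonal, 20000, rng)
print(m_axis, m_diag)  # both should be close to 1
```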
The Algorithm

• Take d random vectors r1..rd
   – Each ri = [Xi1 Xi2 … XiD] where the X’s are chosen from
     N(0,1) independently


• For every point p, take the d dimensional point
      • [ p.r1 p.r2 .. p.rd ] * sqrt(1/d)

• Time: O(n·d·D), one length-D dot product per (point, random vector) pair
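A minimal pure-Python sketch of the algorithm above, assuming points are lists of floats. The function name `random_projection` and its `seed` parameter are illustrative, not from the slides. The same r1..rd are reused for every point, which is what makes the map linear:

```python
import math
import random


def random_projection(points, d, seed=0):
    """Map each D-dim point p to [p.r1, ..., p.rd] * sqrt(1/d),
    where every r_i has i.i.d. N(0,1) entries (shared by all points).
    Cost: n * d * D multiply-adds."""
    rng = random.Random(seed)
    D = len(points[0])
    rs = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(d)]
    scale = math.sqrt(1.0 / d)
    return [[scale * sum(pj * rj for pj, rj in zip(p, r)) for r in rs]
            for p in points]


# The third point is the sum of the first two, so its image should be too.
proj = random_projection([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [5.0, 7.0, 9.0]],
                         d=2, seed=1)
print(proj)
```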
Simplifying Further

• Take any vector p in D dimensions

• We need to show that
    • [ p.r1 p.r2 .. p.rd ] * sqrt(1/d) has length ~ |p|
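    • Failure prob < $1/n^3$

• We can assume p to be 1 0 0 0 0 0 …
   – first scale p to unit length: the map is linear, so the projected
     length scales by the same factor
   – then rotate it onto the first axis: random vectors have no directional bias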
   – Then [ p.r1 p.r2 .. p.rd ] * sqrt(1/d) = [X11 X21 … Xd1] * sqrt(1/d)
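The same reduction can be checked directly: for a fixed p, each coordinate p.ri is a weighted sum of independent N(0,1) entries, hence itself Normal, with a variance that depends only on the length of p:

```latex
p\cdot r_i = \sum_{j=1}^{D} p_j X_{ij}
  \;\sim\; N\Big(0,\ \sum_{j=1}^{D} p_j^2\Big)
  = N\big(0,\ \lVert p\rVert^2\big)
```

The d coordinates are independent because the ri are, so for |p| = 1 the vector [p.r1 … p.rd] has exactly the distribution of [X11 X21 … Xd1].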
Analysis

• We need to show that
       • [X1 X2 … Xd] * sqrt(1/d) has length ~ 1
       • Failure prob < $1/n^3$

• Equivalently, $(X_1^2+\cdots+X_d^2)/d \approx 1$, failure prob < $1/n^3$

• Equivalently, $X_1^2+\cdots+X_d^2 \approx d$, failure prob < $1/n^3$

• Note each $X_i^2$ has mean 1 and s.d. $\sqrt{2}$
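Where the mean 1 and s.d. √2 come from, using the standard Normal moments E[X²] = 1 and E[X⁴] = 3:

```latex
\mathbb{E}[X_i^2] = 1, \qquad
\operatorname{Var}(X_i^2) = \mathbb{E}[X_i^4] - \big(\mathbb{E}[X_i^2]\big)^2 = 3 - 1 = 2, \qquad
\operatorname{sd}(X_i^2) = \sqrt{2}
```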
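Central Limit Theorem

• Y1..Yd independent, each with any (decent) distribution with mean
  1 and s.d. sqrt(2) (here Yi = $X_i^2$)

• Then Y1+…+Yd tends to a Normal distribution with
  mean d and s.d. sqrt(2d) (for large d)

• Treating the sum as exactly Normal, Pr(Y1+…+Yd not in (1−∆)d..(1+∆)d) is roughly at most
      • $e^{-(\Delta d)^2/(2\cdot 2d)} = e^{-\Delta^2 d/4}$
• Choose $d = 12\ln n/\Delta^2$; then $e^{-\Delta^2 d/4} = e^{-3\ln n} = 1/n^3$, as needed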
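The dimension choice as arithmetic, in a short Python sketch. `target_dim` and `tail_estimate` are hypothetical names; `tail_estimate` is the slide's Normal-approximation estimate, not a rigorous bound. Note that d depends on n and ∆ only, never on D. The value n = 10,000 echoes the clustering application:

```python
import math


def target_dim(n, delta):
    """Smallest integer d with d >= 12 ln(n) / delta^2."""
    return math.ceil(12 * math.log(n) / delta ** 2)


def tail_estimate(d, delta):
    """Heuristic failure estimate e^(-delta^2 d / 4) from the slide."""
    return math.exp(-delta ** 2 * d / 4)


n = 10_000
for delta in (0.5, 0.2, 0.1):
    d = target_dim(n, delta)
    print(delta, d, tail_estimate(d, delta) <= n ** -3)
# prints: 0.5 443 True / 0.2 2764 True / 0.1 11053 True
```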
Conclusion


• n points in D dimensions

  – can be projected to $12\ln n/\Delta^2$ dimensions (independent of D)

  – all pairwise distances change by at most a factor of (1 ± ∆)

  – with probability > 1 − 1/n
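An end-to-end sketch tying the pieces together, under illustrative assumptions: n = 50 points drawn uniformly from [−1, 1]^D with D = 300, ∆ = 0.5, and a fixed seed. None of these values come from the slides, and since the guarantee is probabilistic, a different seed could in principle give a worse result:

```python
import math
import random
from itertools import combinations


def random_projection(points, d, rng):
    """[p.r1 ... p.rd] * sqrt(1/d) with r_i having i.i.d. N(0,1) entries."""
    D = len(points[0])
    rs = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(d)]
    scale = math.sqrt(1.0 / d)
    return [[scale * sum(pj * rj for pj, rj in zip(p, r)) for r in rs]
            for p in points]


def worst_stretch(original, projected):
    """Max |projected distance / original distance - 1| over all pairs."""
    return max(abs(math.dist(projected[i], projected[j])
                   / math.dist(original[i], original[j]) - 1)
               for i, j in combinations(range(len(original)), 2))


rng = random.Random(42)
n, D, delta = 50, 300, 0.5
d = math.ceil(12 * math.log(n) / delta ** 2)  # 188 for these values
points = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(n)]
projected = random_projection(points, d, rng)
print(d, worst_stretch(points, projected))  # expect a stretch well below delta
```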

Slides published by Strand Life Sciences Pvt Ltd.