Converting High Dimensional Problems to Low Dimensional Ones

Strand Life Sciences Pvt Ltd

General Paradigm
Reduce and Conquer

• Large Problem → Small Problem

– Break array into two parts

– Consider odd and even elements

– Sample edges in a graph to obtain a smaller graph

– Represent a graph by a collection of trees

– Take number modulo small prime

– Multiply matrix by a random vector

– Project high dimensional point sets into fewer dimensions

The Problem

• Given n points in D dimensional space

• Project them into d << D dimensions
– So that the (Euclidean) distance between every pair of points is (almost) preserved

• How does d compare to n?

Application

• Hierarchical Clustering

• Say ten thousand samples each over a few million SNPs

• Few million → few hundreds/thousands? And fast?

First Attempt

• Can we make d=n-1?

– X axis through 2 of the points

– Y axis so 3rd point is in the XY plane

– Z axis so 4th point is in the XYZ 3d space

– And so on

First Attempt

• Time taken

– Each new axis has to be made orthogonal to all previous axes

– O(n^2 D)

– Too slow
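
The first attempt can be sketched as follows. This is a minimal illustration, assuming points are plain Python lists and using Gram-Schmidt orthogonalization as one standard way to build the axes described above; the names `exact_embedding`, `dot`, `dist` and the sample points are hypothetical, not from the slides. Each new axis is made orthogonal to all earlier ones, which is where the O(n^2 D) cost comes from.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def exact_embedding(points, tol=1e-9):
    """Express points in an orthonormal basis built by Gram-Schmidt.

    The first point becomes the origin; each later point contributes at
    most one new axis, so n points need at most n - 1 coordinates.
    """
    origin = points[0]
    diffs = [[a - b for a, b in zip(p, origin)] for p in points]
    basis = []  # orthonormal axes found so far
    for v in diffs[1:]:
        # Remove the components of v along every earlier axis: O(n * D) work.
        residual = list(v)
        for axis in basis:
            c = dot(residual, axis)
            residual = [r - c * a for r, a in zip(residual, axis)]
        norm = math.sqrt(dot(residual, residual))
        if norm > tol:  # v points in a genuinely new direction
            basis.append([r / norm for r in residual])
    # Coordinates of every point in the new basis.
    return [[dot(v, axis) for axis in basis] for v in diffs]

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Four illustrative points in 5 dimensions.
points = [[1, 0, 0, 0, 2], [0, 3, 0, 1, 0], [2, 2, 2, 2, 2], [0, 0, 1, 0, 0]]
embedded = exact_embedding(points)
```

Here the distances are preserved exactly rather than approximately, but the orthogonalization loop is the bottleneck the slide points out.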

Second Attempt
Use Random Projections

• Take d random vectors r1..rd

• For every point p, take the d dimensional point
• [ p.r1 p.r2 .. p.rd ] * scaling-factor

• Do these d-dim points preserve inter-point
distances approximately? How large should d be?

Random Projections
Further Simplification

• Take any vector p in D dimensions

• Suppose we show
– [ p.r1 p.r2 .. p.rd ] * scaling-factor has length ~ |p|
– Failure prob < 1/n^3

• Then the prob that even one of the (fewer than n^2) difference vector lengths is not preserved is < n^2/n^3 = 1/n
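
The step from one vector to all pairs is a union bound, written out below as a sketch. It assumes the single-vector guarantee is applied to each difference vector p_i - p_j, which is legitimate because the projection is linear (the projection of a difference is the difference of the projections).

```latex
\Pr[\text{some pair is distorted}]
  \;\le\; \sum_{i<j} \Pr[\text{pair } (i,j) \text{ is distorted}]
  \;<\; \binom{n}{2}\cdot\frac{1}{n^{3}}
  \;<\; \frac{n^{2}}{n^{3}}
  \;=\; \frac{1}{n}.
```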

Random Projections
What is a random vector?

• No directional bias

Normal Distributions

• Pr of being between x and x+dx

For N(0,1), ~ e^{-x^2/2} dx

Generating Random Vectors without
Directional Bias
• Take D numbers (X_1 … X_D), each N(0,1), independently

• Distribution of each number X
– Pr of being between a..a+da ~ e^{-a^2/2} da

• Pr X_1 in a_1..a_1+da_1, X_2 in a_2..a_2+da_2, …, X_D in a_D..a_D+da_D
– ~ e^{-a_1^2/2} e^{-a_2^2/2} … e^{-a_D^2/2} da_1 da_2 … da_D
– = e^{-(a_1^2 + a_2^2 + … + a_D^2)/2} da_1 da_2 … da_D
– = e^{-l^2/2} da_1 da_2 … da_D, where l^2 = a_1^2 + … + a_D^2

So no dependence on direction, only on length l !
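
A small numerical check of this claim, as a sketch: the joint density of independent N(0,1) coordinates is evaluated at three points that have the same length but different directions. The function name `joint_density` and the sample points are illustrative assumptions.

```python
import math

def joint_density(a):
    """Density of independent N(0,1) coordinates at the point a."""
    return math.prod(math.exp(-x * x / 2) / math.sqrt(2 * math.pi) for x in a)

# Three points in 3 dimensions, all of length 5 but in different directions.
s = 5 / math.sqrt(3)
same_length = [[3.0, 4.0, 0.0], [0.0, 0.0, 5.0], [s, -s, s]]
densities = [joint_density(a) for a in same_length]

# A longer vector, for comparison: its density should be smaller.
longer = joint_density([3.0, 4.0, 1.0])
```

The three values in `densities` agree up to floating-point rounding, while `longer` is smaller, so the density changes with length but not with direction.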

The Algorithm

• Take d random vectors r1..rd
– Each ri = [Xi1 Xi2 … XiD] where the X’s are chosen from
N(0,1) independently

• For every point p, take the d dimensional point
• [ p.r1 p.r2 .. p.rd ] * sqrt(1/d)

• Time: n*d*D
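
The algorithm above can be sketched in a few lines of standard-library Python. The function name `random_projection`, the fixed seed, and the sizes n, D, d are illustrative assumptions chosen to keep the example quick; they are not from the slides.

```python
import math
import random

def random_projection(points, d, rng):
    """Project D-dimensional points to d dimensions using Gaussian vectors."""
    D = len(points[0])
    # d random vectors r_1..r_d, each with D independent N(0,1) entries.
    r = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(d)]
    scale = math.sqrt(1.0 / d)
    # Each output coordinate is a dot product p.r_i, scaled by sqrt(1/d).
    return [[scale * sum(x * y for x, y in zip(p, ri)) for ri in r]
            for p in points]

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n, D, d = 8, 300, 150
points = [[rng.uniform(-1.0, 1.0) for _ in range(D)] for _ in range(n)]
projected = random_projection(points, d, rng)

# Ratio of projected to original distance for every pair of points.
ratios = [dist(projected[i], projected[j]) / dist(points[i], points[j])
          for i in range(n) for j in range(i + 1, n)]
```

The nested comprehension performs n*d*D multiplications, matching the running time stated above; the entries of `ratios` should cluster around 1.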

Simplifying Further

• Take any vector p in D dimensions

• We need to show that
• [ p.r1 p.r2 .. p.rd ] * sqrt(1/d) has length ~ |p|
• Failure prob < 1/n^3

• We can assume p to be [1 0 0 0 0 0 …]
– because random vectors have no directional bias, and scaling p scales both lengths by the same factor
– Then [ p.r1 p.r2 .. p.rd ] * sqrt(1/d) = [X_11 X_21 … X_d1] * sqrt(1/d)

Analysis

• We need to show that
• [X_1 X_2 … X_d] * sqrt(1/d) has length ~ 1
• Failure prob < 1/n^3

• Or (X_1^2 + … + X_d^2)/d ~ 1, failure prob < 1/n^3

• Or (X_1^2 + … + X_d^2) ~ d, failure prob < 1/n^3

• Note X_i^2 has mean 1 and s.d. sqrt(2)
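
The mean 1 and standard deviation sqrt(2) quoted here belong to the squared variable X_i^2, where X_i is N(0,1). A short derivation, using the standard fact that the fourth moment of a standard normal is 3:

```latex
\begin{aligned}
\mathbb{E}[X^2] &= \operatorname{Var}(X) + (\mathbb{E}[X])^2 = 1 + 0 = 1, \\
\mathbb{E}[X^4] &= 3, \\
\operatorname{Var}(X^2) &= \mathbb{E}[X^4] - (\mathbb{E}[X^2])^2 = 3 - 1 = 2, \\
\operatorname{sd}(X^2) &= \sqrt{2}.
\end{aligned}
```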

Central Limit Theorem

• Y_1..Y_d independent, each with any (decent) distribution with mean 1 and s.d. sqrt(2)

• Then Y_1 + … + Y_d tends to a Normal distribution with mean d and s.d. sqrt(2d) (for large d)

• Pr (Y_1 + … + Y_d not in (1-∆)d..(1+∆)d) <
• e^{-(∆d)^2 / (2 · 2d)} = e^{-∆^2 d / 4}
• Choose d = 12 ln n / ∆^2; then this bound is e^{-3 ln n} = 1/n^3, as needed
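
To get a feel for the numbers, the choice of d can be evaluated directly. This is a sketch: the function names `target_dimension` and `failure_bound` are hypothetical, n = 10,000 echoes the "ten thousand samples" of the application slide, and ∆ = 0.5 is an assumed tolerance picked only for illustration.

```python
import math

def target_dimension(n, delta):
    """Smallest integer d with d >= 12 ln(n) / delta^2."""
    return math.ceil(12 * math.log(n) / delta ** 2)

def failure_bound(d, delta):
    """The single-vector tail bound e^{-delta^2 d / 4}."""
    return math.exp(-delta ** 2 * d / 4)

n, delta = 10_000, 0.5  # delta is an assumed tolerance, not from the slides
d = target_dimension(n, delta)
bound = failure_bound(d, delta)
```

Note that d depends only on n and ∆, not on the original dimension D, so a few million SNPs and a few thousand SNPs lead to the same target dimension.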

Conclusion

• n points in D dimensions

– can be projected to 12 ln n/∆^2 dimensions

– all distances stretch only by a factor of (1 ± ∆)

– with prob > 1-1/n
