SlideShare une entreprise Scribd logo
1  sur  24
Télécharger pour lire hors ligne
A Multidimensional
Distributed Array Abstraction
for PGAS
www.dash-project.org
Tobias Fuchs
fuchst@nm.ifi.lmu.de
Ludwig-Maximilians-Universität München, MNM-Team
| 2DASH
DASH - Overview
 DASH is a C++ template library that offers
– distributed data structures and parallel algorithms
– a complete PGAS (part. global address space) programming
system without a custom (pre-)compiler
 PGAS Terminology – SHMEM Analogy
Unit: The individual participants in a DASH
program, usually full OS processes.
Private
Shared
Unit 0 Unit 1 Unit N-1
int b;
int c;
dash::Array a(1000);
int a;
…
dash::Shared s;
10..190..9 ..999
Shared data:
managed by DASH
in a virtual global
address space
Private data:
managed by regular
C/C++ mechanisms
| 3DASH
DASH Project Structure
Phase I (2013-2015) Phase II (2016-2018)
LMU Munich
Project lead,
C++ template library
Project lead,
C++ template library,
data dock
TU Dresden
Libraries and
interfaces, tools
Smart data
structures, resilience
HLRS Stuttgart DART runtime DART runtime
KIT Karlsruhe Application studies
IHR Stuttgart
Smart deployment,
Application studies
| 4DASH
DASH - Partitioned Global Address Space
 Data affinity
– data has well-defined owner but can be accessed by any unit
– data locality important for performance
– support for the owner computes execution model
 DASH:
– unified access to
local and remote
data in global
memory space
| 5DASH
DASH - Partitioned Global Address Space
 Data affinity
– data has well-defined owner but can be accessed by any unit
– data locality important for performance
– support for the owner computes execution model
 DASH:
– unified access to
local and remote
data in global
memory space
– and explicit views
on local memory
space
| 6DASH
DASH Distributed Data Structures Overview
Container Description Data distribution
Array<T> 1D Array static, configurable
NArray<T, N> N-dim. Array static, configurable
Shared<T> Shared scalar fixed (at 0)
Directory(*)<T> Variable-size,
locally indexed
array
manual,
load-balanced
List<T> Variable-size
linked list
dynamic,
load-balanced
Map<T> Variable-size
associative map
dynamic, balanced
by hash function
(*) Under construction
| 7DASH
DASH Distributed Data Structures Overview
Container Description Data distribution
Array<T> 1D Array static, configurable
NArray<T, N> N-dim. Array static, configurable
Shared<T> Shared scalar fixed (at 0)
Directory(*)<T> Variable-size,
locally indexed
array
manual,
load-balanced
List<T> Variable-size
linked list
dynamic,
load-balanced
Map<T> Variable-size
associative map
dynamic, balanced
by hash function
(*) Under construction
| 8DASH
Multidimensional Data Distribution (1)
 dash::Pattern<N> specifies N-dim data distribution
– Blocked, cyclic, and block-cyclic in multiple dimensions
Pattern<2>(20, 15)
(BLOCKED,
NONE)
(NONE,
BLOCKCYCLIC(2))
(BLOCKED,
BLOCKCYCLIC(3))
Extent in first and
second dimension
Distribution in first and
second dimension
| 9DASH
Multidimensional Data Distribution (2)
 Example: tiled and tile-shifted data distribution
(TILE(4), TILE(3))
ShiftTilePattern<2>(32, 24)TilePattern<2, COL_MAJOR>(20, 15)
(TILE(5), TILE(5))
| 10DASH
Multidimensional Views
 Lightweight Multidimensional Views
// 8x8 2D array
dash::NArray<int, 2> mat(8,8);
// linear access using iterators
dash::distance(mat.begin(), mat.end()) == 64
// create 2x5 region view
auto reg = matrix.cols(2,5).rows(3,2);
// region can be used just like 2D array
cout << reg[1][2] << endl; // ‘7’
dash::distance(reg.begin(), reg.end()) == 10
| 11DASH
Multidimensional Views
 Lightweight Multidimensional Views
– Local and block views
dash::NArray<int, 2> mat(80000,17000);
// view to block element range
auto block = matrix.block(3,2);
// use view as Cartesian space:
auto elem = block[30][20];
dash::NArray<int, 2> mat(80000,17000);
if (dash::myid() == 1) {
// view to local element range
auto local_elems = matrix.local;
// use view as sequential range:
for (auto elem : local_elems) { … }
}
| 12DASH
Multidimensional Views
 Multidimensional Iterator Ranges
– Global iterators provide access to the
underlying view of their index space
– DASH global iterators on multi-dimensional regions can
still be passed to standard library algorithms
auto r = mat.sub(0, { 4,7 }) // rows
.sub(1, { 2,7 }); // cols
auto r_view = r.begin().view();
r_view.extents() == { 3,5 }
r_view.offsets() == { 4,2 }
// DASH algorithms use n-dim. view:
dash::summa(r.begin(), r.end(), …);
// multidimensional iterators still
// are sequential:
std::for_each(r.begin(), r.end(), …);
| 13DASH
Multidimensional Views
 Multidimensional Iterator Ranges
– Global iterators provide access to the
data distribution pattern of their iteration space
auto r = mat.sub(0, { 4,7 }) // rows
.sub(1, { 2,7 }); // cols
auto r_pattern = r.begin().pattern();
r_pattern.blocksize() == 4
r_pattern.blocks() == 16
r_pattern.blocks_at(dash::myid()) == 4
| 14DASH
DASH Algorithms
 Growing number of DASH equivalents to STL algorithms:
 Examples of STL algorithms ported to DASH, also work
for multidimensional ranges:
dash::GlobIter<T> dash::fill(GlobIter<T> begin,
GlobIter<T> end,
T val);
- dash::fill range[i] <- val
- dash::generate range[i] <- func()
- dash::for_each func(range[i])
- dash::transform range[i] = func(range2[i])
- dash::accumulate sum(range[i]) (0<=i<=n-1)
- dash::min_element min(range[i]) (0<=i<=n-1)
- dash::copy range[i] <- range2[i]
| 15DASH
DASH Algorithms
 Growing number of DASH equivalents to STL algorithms:
 Examples of STL algorithms ported to DASH, also work
for multidimensional ranges:
dash::GlobIter<T> dash::fill(GlobIter<T> begin,
GlobIter<T> end,
T val);
- dash::fill range[i] <- val
- dash::generate range[i] <- func()
- dash::for_each func(range[i])
- dash::transform range[i] = func(range2[i])
- dash::accumulate sum(range[i]) (0<=i<=n-1)
- dash::min_element min(range[i]) (0<=i<=n-1)
- dash::copy range[i] <- range2[i]
| 16DASH
Asynchronous Copying for Latency Hiding
 Asynchronous Operations
– Async. algorithm interface:
dash::copy_async()
– Launch policy:
dash::launch::async (in upcoming DASH release 0.3.0)
std::vector<int> lcopy(block.size());
// starts async. copy of global range to local memory
// … via algorithm interface:
auto fut = dash::copy_async(block.begin(), block.end(),
lcopy.begin());
// … or via launch policy:
auto fut = dash::copy(dash::launch::async,
block.begin(), block.end(),
lcopy.begin());
overlapping computation();
auto copy_end = fut.get(); // blocks until copy received
| 17DASH
 Block matrix-matrix multiplication with prefetching
while(!done) {
blk_a = matrixA.local.block(k); …
blk_b = matrixB.local.block(k); …
// prefetch
auto get_a = dash::copy_async(blk_a.begin(), blk_a.end(), lblk_a_get);
auto get_b = dash::copy_async(blk_b.begin(), blk_b.end(), lblk_b_get);
// local DGEMM
dash::multiply(lblk_a_comp, lblk_b_comp, lblk_c_comp);
// wait for transfer to finish
get_a.wait(); get_b.wait();
// swap buffers
swap(lblk_a_get, lblk_a_comp); swap(lblk_b_get, lblk_b_comp);
}
Case Study: S(R)UMMA Algorithm
| 18DASH
 Block matrix-matrix multiplication with prefetching
while(!done) {
blk_a = matrixA.local.block(k); …
blk_b = matrixB.local.block(k); …
// prefetch
auto get_a = dash::copy_async(blk_a.begin(), blk_a.end(), lblk_a_get);
auto get_b = dash::copy_async(blk_b.begin(), blk_b.end(), lblk_b_get);
// local DGEMM
dash::multiply(lblk_a_comp, lblk_b_comp, lblk_c_comp);
// wait for transfer to finish
get_a.wait(); get_b.wait();
// swap buffers
swap(lblk_a_get, lblk_a_comp); swap(lblk_b_get, lblk_b_comp);
}
Case Study: S(R)UMMA Algorithm
DISCLAIMER
This code is simplified for brevity.
Get the real source code here:
https://github.com/dash-project/dash/blob/development/dash/include/dash/algorithm/SUMMA.h
| 19DASH
 Block matrix-matrix multiplication with prefetching
while(!done) {
blk_a = matrixA.local.block(k); …
blk_b = matrixB.local.block(k); …
// prefetch
auto get_a = dash::copy_async(blk_a.begin(), blk_a.end(), lblk_a_get);
auto get_b = dash::copy_async(blk_b.begin(), blk_b.end(), lblk_b_get);
// local DGEMM
dash::multiply(lblk_a_comp, lblk_b_comp, lblk_c_comp);
// wait for transfer to finish
get_a.wait(); get_b.wait();
// swap buffers
swap(lblk_a_get, lblk_a_comp); swap(lblk_b_get, lblk_b_comp);
}
Case Study: S(R)UMMA Algorithm
Schedules block
transmissions to
minimize network
congestion
| 20DASH
 Block matrix-matrix multiplication with prefetching
while(!done) {
blk_a = matrixA.local.block(k); …
blk_b = matrixB.local.block(k); …
// prefetch
auto get_a = dash::copy_async(blk_a.begin(), blk_a.end(), lblk_a_get);
auto get_b = dash::copy_async(blk_b.begin(), blk_b.end(), lblk_b_get);
// local DGEMM
dash::multiply(lblk_a_comp, lblk_b_comp, lblk_c_comp);
// wait for transfer to finish
get_a.wait(); get_b.wait();
// swap buffers
swap(lblk_a_get, lblk_a_comp); swap(lblk_b_get, lblk_b_comp);
}
Case Study: S(R)UMMA Algorithm
Local submatrix
multiplication using
DGEMM from serial
Intel MKL
Schedules block
transmissions to
minimize network
congestion
| 21DASH
DASH vs. DGEMM: Intel MKL, PLASMA
| 22DASH
DASH vs. PDGEMM: ScaLAPACK
 Good.
But acing singular benchmarks
is not the actual point.
Most important:
 the NArray concept allows intuitive
design of efficient algorithms
 we achieved portable, robust
efficiency on different hardware
and system environments
| 23DASH
Summary
 NArray Concept
– Views simplify design of efficient algorithms
– First-class support for locality-based operations
– Complies to existing C++ standard library concepts
 DASH algorithms on n-dim. ranges
– SUMMA case study: straight-forward, compact
implementation
– Leveraged portable efficiency of Intel MKL
– Beats performance in (P)DGEMM compared to
Intel MKL, PLASMA, ScaLAPACK
– Robust scalability in a variety of node-level and
highly distributed benchmark scenarios
http://www.dash-project.org/
http://github.com/dash-project/
| 24DASH
Acknowledgements
DASH on GitHub:
https://github.com/dash-project/dash/
 Funding
 The DASH Team
T. Fuchs (LMU), R. Kowalewski (LMU), D. Hünich (TUD), A. Knüpfer
(TUD), J. Gracia (HLRS), C. Glass (HLRS), H. Zhou (HLRS), K. Idrees
(HLRS), J. Schuchart (HLRS), F. Mößbauer (LMU), K. Fürlinger (LMU)

Contenu connexe

Tendances

Bringing Algebraic Semantics to Mahout
Bringing Algebraic Semantics to MahoutBringing Algebraic Semantics to Mahout
Bringing Algebraic Semantics to Mahoutsscdotopen
 
0015.register allocation-graph-coloring
0015.register allocation-graph-coloring0015.register allocation-graph-coloring
0015.register allocation-graph-coloringsean chen
 
How Matlab Helps
How Matlab HelpsHow Matlab Helps
How Matlab Helpssiufu
 
R tools for HiC data visualization
R tools for HiC data visualizationR tools for HiC data visualization
R tools for HiC data visualizationtuxette
 
Eclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science ProjectEclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science ProjectMatthew Gerring
 
Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...NECST Lab @ Politecnico di Milano
 
Assembly language (addition and subtraction)
Assembly language (addition and subtraction)Assembly language (addition and subtraction)
Assembly language (addition and subtraction)Muhammad Umar Farooq
 
MPEG-4 BIFS and MPEG-2 TS: Latest developments for digital radio services
MPEG-4 BIFS and MPEG-2 TS: Latest developments for digital radio servicesMPEG-4 BIFS and MPEG-2 TS: Latest developments for digital radio services
MPEG-4 BIFS and MPEG-2 TS: Latest developments for digital radio servicesCyril Concolato
 
JGrass-NewAge probabilities backward component
JGrass-NewAge probabilities backward component JGrass-NewAge probabilities backward component
JGrass-NewAge probabilities backward component Marialaura Bancheri
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerankgothicane
 
Start From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize AlgorithmStart From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize AlgorithmYu Liu
 
JGrass-NewAge probabilities forward component
JGrass-NewAge probabilities forward component JGrass-NewAge probabilities forward component
JGrass-NewAge probabilities forward component Marialaura Bancheri
 

Tendances (20)

Bringing Algebraic Semantics to Mahout
Bringing Algebraic Semantics to MahoutBringing Algebraic Semantics to Mahout
Bringing Algebraic Semantics to Mahout
 
AU exam
AU examAU exam
AU exam
 
Matlab bode diagram_instructions
Matlab bode diagram_instructionsMatlab bode diagram_instructions
Matlab bode diagram_instructions
 
0015.register allocation-graph-coloring
0015.register allocation-graph-coloring0015.register allocation-graph-coloring
0015.register allocation-graph-coloring
 
How Matlab Helps
How Matlab HelpsHow Matlab Helps
How Matlab Helps
 
Hadoop map reduce concepts
Hadoop map reduce conceptsHadoop map reduce concepts
Hadoop map reduce concepts
 
Lecture set 5
Lecture set 5Lecture set 5
Lecture set 5
 
R tools for HiC data visualization
R tools for HiC data visualizationR tools for HiC data visualization
R tools for HiC data visualization
 
slides_v1
slides_v1slides_v1
slides_v1
 
Computer Science Assignment Help
Computer Science Assignment Help Computer Science Assignment Help
Computer Science Assignment Help
 
Raster package jacob
Raster package jacobRaster package jacob
Raster package jacob
 
Lecture13
Lecture13Lecture13
Lecture13
 
Eclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science ProjectEclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science Project
 
Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...
 
Assembly language (addition and subtraction)
Assembly language (addition and subtraction)Assembly language (addition and subtraction)
Assembly language (addition and subtraction)
 
MPEG-4 BIFS and MPEG-2 TS: Latest developments for digital radio services
MPEG-4 BIFS and MPEG-2 TS: Latest developments for digital radio servicesMPEG-4 BIFS and MPEG-2 TS: Latest developments for digital radio services
MPEG-4 BIFS and MPEG-2 TS: Latest developments for digital radio services
 
JGrass-NewAge probabilities backward component
JGrass-NewAge probabilities backward component JGrass-NewAge probabilities backward component
JGrass-NewAge probabilities backward component
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerank
 
Start From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize AlgorithmStart From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize Algorithm
 
JGrass-NewAge probabilities forward component
JGrass-NewAge probabilities forward component JGrass-NewAge probabilities forward component
JGrass-NewAge probabilities forward component
 

En vedette

Matrix transapose in c++
Matrix transapose in c++Matrix transapose in c++
Matrix transapose in c++imran khan
 
~ Project-student report-card.cpp[1]
~ Project-student report-card.cpp[1]~ Project-student report-card.cpp[1]
~ Project-student report-card.cpp[1]Sunny Rekhi
 
2. Linear Data Structure Using Arrays - Data Structures using C++ by Varsha P...
2. Linear Data Structure Using Arrays - Data Structures using C++ by Varsha P...2. Linear Data Structure Using Arrays - Data Structures using C++ by Varsha P...
2. Linear Data Structure Using Arrays - Data Structures using C++ by Varsha P...widespreadpromotion
 
1. Fundamental Concept - Data Structures using C++ by Varsha Patil
1. Fundamental Concept - Data Structures using C++ by Varsha Patil1. Fundamental Concept - Data Structures using C++ by Varsha Patil
1. Fundamental Concept - Data Structures using C++ by Varsha Patilwidespreadpromotion
 
C programming project by navin thapa
C programming project by navin thapaC programming project by navin thapa
C programming project by navin thapaNavinthp
 
Computer science project work
Computer science project workComputer science project work
Computer science project workrahulchamp2345
 

En vedette (9)

Pooja
PoojaPooja
Pooja
 
Matrix transapose in c++
Matrix transapose in c++Matrix transapose in c++
Matrix transapose in c++
 
~ Project-student report-card.cpp[1]
~ Project-student report-card.cpp[1]~ Project-student report-card.cpp[1]
~ Project-student report-card.cpp[1]
 
2. Linear Data Structure Using Arrays - Data Structures using C++ by Varsha P...
2. Linear Data Structure Using Arrays - Data Structures using C++ by Varsha P...2. Linear Data Structure Using Arrays - Data Structures using C++ by Varsha P...
2. Linear Data Structure Using Arrays - Data Structures using C++ by Varsha P...
 
Students report card for C++ project..
Students report card for C++ project..Students report card for C++ project..
Students report card for C++ project..
 
1. Fundamental Concept - Data Structures using C++ by Varsha Patil
1. Fundamental Concept - Data Structures using C++ by Varsha Patil1. Fundamental Concept - Data Structures using C++ by Varsha Patil
1. Fundamental Concept - Data Structures using C++ by Varsha Patil
 
C programming project by navin thapa
C programming project by navin thapaC programming project by navin thapa
C programming project by navin thapa
 
Project report
Project reportProject report
Project report
 
Computer science project work
Computer science project workComputer science project work
Computer science project work
 

Similaire à Multidimensional PGAS Abstraction

Big Data Analytics with Scala at SCALA.IO 2013
Big Data Analytics with Scala at SCALA.IO 2013Big Data Analytics with Scala at SCALA.IO 2013
Big Data Analytics with Scala at SCALA.IO 2013Samir Bessalah
 
Apache Spark: What? Why? When?
Apache Spark: What? Why? When?Apache Spark: What? Why? When?
Apache Spark: What? Why? When?Massimo Schenone
 
Spark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with SparkSpark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with Sparksamthemonad
 
Automatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELAutomatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELJoel Falcou
 
Transformations and actions a visual guide training
Transformations and actions a visual guide trainingTransformations and actions a visual guide training
Transformations and actions a visual guide trainingSpark Summit
 
Introduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with HadoopIntroduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with HadoopDilum Bandara
 
2015 03 27_ml_conf
2015 03 27_ml_conf2015 03 27_ml_conf
2015 03 27_ml_confSri Ambati
 
Mastering Hadoop Map Reduce - Custom Types and Other Optimizations
Mastering Hadoop Map Reduce - Custom Types and Other OptimizationsMastering Hadoop Map Reduce - Custom Types and Other Optimizations
Mastering Hadoop Map Reduce - Custom Types and Other Optimizationsscottcrespo
 
Michal Malohlava presents: Open Source H2O and Scala
Michal Malohlava presents: Open Source H2O and Scala Michal Malohlava presents: Open Source H2O and Scala
Michal Malohlava presents: Open Source H2O and Scala Sri Ambati
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalArvind Surve
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalArvind Surve
 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersXiao Qin
 
Presentation on use of r statistics
Presentation on use of r statisticsPresentation on use of r statistics
Presentation on use of r statisticsKrishna Dhakal
 
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLSandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLMLconf
 
Stratosphere Intro (Java and Scala Interface)
Stratosphere Intro (Java and Scala Interface)Stratosphere Intro (Java and Scala Interface)
Stratosphere Intro (Java and Scala Interface)Robert Metzger
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionChetan Khatri
 
Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache SparkIndicThreads
 
5 R Tutorial Data Visualization
5 R Tutorial Data Visualization5 R Tutorial Data Visualization
5 R Tutorial Data VisualizationSakthi Dasans
 

Similaire à Multidimensional PGAS Abstraction (20)

Big Data Analytics with Scala at SCALA.IO 2013
Big Data Analytics with Scala at SCALA.IO 2013Big Data Analytics with Scala at SCALA.IO 2013
Big Data Analytics with Scala at SCALA.IO 2013
 
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
 
Apache Spark: What? Why? When?
Apache Spark: What? Why? When?Apache Spark: What? Why? When?
Apache Spark: What? Why? When?
 
Spark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with SparkSpark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with Spark
 
Automatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELAutomatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSEL
 
Transformations and actions a visual guide training
Transformations and actions a visual guide trainingTransformations and actions a visual guide training
Transformations and actions a visual guide training
 
Introduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with HadoopIntroduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with Hadoop
 
2015 03 27_ml_conf
2015 03 27_ml_conf2015 03 27_ml_conf
2015 03 27_ml_conf
 
Mastering Hadoop Map Reduce - Custom Types and Other Optimizations
Mastering Hadoop Map Reduce - Custom Types and Other OptimizationsMastering Hadoop Map Reduce - Custom Types and Other Optimizations
Mastering Hadoop Map Reduce - Custom Types and Other Optimizations
 
Michal Malohlava presents: Open Source H2O and Scala
Michal Malohlava presents: Open Source H2O and Scala Michal Malohlava presents: Open Source H2O and Scala
Michal Malohlava presents: Open Source H2O and Scala
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
 
Presentation on use of r statistics
Presentation on use of r statisticsPresentation on use of r statistics
Presentation on use of r statistics
 
Spark training-in-bangalore
Spark training-in-bangaloreSpark training-in-bangalore
Spark training-in-bangalore
 
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLSandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
 
Stratosphere Intro (Java and Scala Interface)
Stratosphere Intro (Java and Scala Interface)Stratosphere Intro (Java and Scala Interface)
Stratosphere Intro (Java and Scala Interface)
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in production
 
Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache Spark
 
5 R Tutorial Data Visualization
5 R Tutorial Data Visualization5 R Tutorial Data Visualization
5 R Tutorial Data Visualization
 

Dernier

EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....kzayra69
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 

Dernier (20)

EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 

Multidimensional PGAS Abstraction

  • 1. A Multidimensional Distributed Array Abstraction for PGAS www.dash-project.org Tobias Fuchs fuchst@nm.ifi.lmu.de Ludwig-Maximilians-Universität München, MNM-Team
  • 2. | 2DASH DASH - Overview  DASH is a C++ template library that offers – distributed data structures and parallel algorithms – a complete PGAS (part. global address space) programming system without a custom (pre-)compiler  PGAS Terminology – SHMEM Analogy Unit: The individual participants in a DASH program, usually full OS processes. Private Shared Unit 0 Unit 1 Unit N-1 int b; int c; dash::Array a(1000); int a; … dash::Shared s; 10..190..9 ..999 Shared data: managed by DASH in a virtual global address space Private data: managed by regular C/C++ mechanisms
  • 3. | 3DASH DASH Project Structure Phase I (2013-2015) Phase II (2016-2018) LMU Munich Project lead, C++ template library Project lead, C++ template library, data dock TU Dresden Libraries and interfaces, tools Smart data structures, resilience HLRS Stuttgart DART runtime DART runtime KIT Karlsruhe Application studies IHR Stuttgart Smart deployment, Application studies
  • 4. | 4DASH DASH - Partitioned Global Address Space  Data affinity – data has well-defined owner but can be accessed by any unit – data locality important for performance – support for the owner computes execution model  DASH: – unified access to local and remote data in global memory space
  • 5. | 5DASH DASH - Partitioned Global Address Space  Data affinity – data has well-defined owner but can be accessed by any unit – data locality important for performance – support for the owner computes execution model  DASH: – unified access to local and remote data in global memory space – and explicit views on local memory space
  • 6. | 6DASH DASH Distributed Data Structures Overview Container Description Data distribution Array<T> 1D Array static, configurable NArray<T, N> N-dim. Array static, configurable Shared<T> Shared scalar fixed (at 0) Directory(*)<T> Variable-size, locally indexed array manual, load-balanced List<T> Variable-size linked list dynamic, load-balanced Map<T> Variable-size associative map dynamic, balanced by hash function (*) Under construction
  • 7. | 7DASH DASH Distributed Data Structures Overview Container Description Data distribution Array<T> 1D Array static, configurable NArray<T, N> N-dim. Array static, configurable Shared<T> Shared scalar fixed (at 0) Directory(*)<T> Variable-size, locally indexed array manual, load-balanced List<T> Variable-size linked list dynamic, load-balanced Map<T> Variable-size associative map dynamic, balanced by hash function (*) Under construction
  • 8. | 8DASH Multidimensional Data Distribution (1)  dash::Pattern<N> specifies N-dim data distribution – Blocked, cyclic, and block-cyclic in multiple dimensions Pattern<2>(20, 15) (BLOCKED, NONE) (NONE, BLOCKCYCLIC(2)) (BLOCKED, BLOCKCYCLIC(3)) Extent in first and second dimension Distribution in first and second dimension
  • 9. | 9DASH Multidimensional Data Distribution (2)  Example: tiled and tile-shifted data distribution (TILE(4), TILE(3)) ShiftTilePattern<2>(32, 24)TilePattern<2, COL_MAJOR>(20, 15) (TILE(5), TILE(5))
  • 10. | 10DASH Multidimensional Views  Lightweight Multidimensional Views // 8x8 2D array dash::NArray<int, 2> mat(8,8); // linear access using iterators dash::distance(mat.begin(), mat.end()) == 64 // create 2x5 region view auto reg = matrix.cols(2,5).rows(3,2); // region can be used just like 2D array cout << reg[1][2] << endl; // ‘7’ dash::distance(reg.begin(), reg.end()) == 10
  • 11. | 11DASH Multidimensional Views  Lightweight Multidimensional Views – Local and block views dash::NArray<int, 2> mat(80000,17000); // view to block element range auto block = matrix.block(3,2); // use view as Cartesian space: auto elem = block[30][20]; dash::NArray<int, 2> mat(80000,17000); if (dash::myid() == 1) { // view to local element range auto local_elems = matrix.local; // use view as sequential range: for (auto elem : local_elems) { … } }
  • 12. | 12DASH Multidimensional Views  Multidimensional Iterator Ranges – Global iterators provide access to the underlying view of their index space – DASH global iterators on multi-dimensional regions can still be passed to standard library algorithms auto r = mat.sub(0, { 4,7 }) // rows .sub(1, { 2,7 }); // cols auto r_view = r.begin().view(); r_view.extents() == { 3,5 } r_view.offsets() == { 4,2 } // DASH algorithms use n-dim. view: dash::summa(r.begin(), r.end(), …); // multidimensional iterators still // are sequential: std::for_each(r.begin(), r.end(), …);
  • 13. | 13DASH Multidimensional Views  Multidimensional Iterator Ranges – Global iterators provide access to the data distribution pattern of their iteration space auto r = mat.sub(0, { 4,7 }) // rows .sub(1, { 2,7 }); // cols auto r_pattern = r.begin().pattern(); r_pattern.blocksize() == 4 r_pattern.blocks() == 16 r_pattern.blocks_at(dash::myid()) == 4
  • 14. | 14DASH DASH Algorithms  Growing number of DASH equivalents to STL algorithms:  Examples of STL algorithms ported to DASH, also work for multidimensional ranges: dash::GlobIter<T> dash::fill(GlobIter<T> begin, GlobIter<T> end, T val); - dash::fill range[i] <- val - dash::generate range[i] <- func() - dash::for_each func(range[i]) - dash::transform range[i] = func(range2[i]) - dash::accumulate sum(range[i]) (0<=i<=n-1) - dash::min_element min(range[i]) (0<=i<=n-1) - dash::copy range[i] <- range2[i]
  • 15. | 15DASH DASH Algorithms  Growing number of DASH equivalents to STL algorithms:  Examples of STL algorithms ported to DASH, also work for multidimensional ranges: dash::GlobIter<T> dash::fill(GlobIter<T> begin, GlobIter<T> end, T val); - dash::fill range[i] <- val - dash::generate range[i] <- func() - dash::for_each func(range[i]) - dash::transform range[i] = func(range2[i]) - dash::accumulate sum(range[i]) (0<=i<=n-1) - dash::min_element min(range[i]) (0<=i<=n-1) - dash::copy range[i] <- range2[i]
  • 16. | 16DASH Asynchronous Copying for Latency Hiding  Asynchronous Operations – Async. algorithm interface: dash::copy_async() – Launch policy: dash::launch::async (in upcoming DASH release 0.3.0) std::vector<int> lcopy(block.size()); // starts async. copy of global range to local memory // … via algorithm interface: auto fut = dash::copy_async(block.begin(), block.end(), lcopy.begin()); // … or via launch policy: auto fut = dash::copy(dash::launch::async, block.begin(), block.end(), lcopy.begin()); overlapping computation(); auto copy_end = fut.get(); // blocks until copy received
  • 17. | 17DASH  Block matrix-matrix multiplication with prefetching while(!done) { blk_a = matrixA.local.block(k); … blk_b = matrixB.local.block(k); … // prefetch auto get_a = dash::copy_async(blk_a.begin(), blk_a.end(), lblk_a_get); auto get_b = dash::copy_async(blk_b.begin(), blk_b.end(), lblk_b_get); // local DGEMM dash::multiply(lblk_a_comp, lblk_b_comp, lblk_c_comp); // wait for transfer to finish get_a.wait(); get_b.wait(); // swap buffers swap(lblk_a_get, lblk_a_comp); swap(lblk_b_get, lblk_b_comp); } Case Study: S(R)UMMA Algorithm
  • 18. | 18DASH  Block matrix-matrix multiplication with prefetching while(!done) { blk_a = matrixA.local.block(k); … blk_b = matrixB.local.block(k); … // prefetch auto get_a = dash::copy_async(blk_a.begin(), blk_a.end(), lblk_a_get); auto get_b = dash::copy_async(blk_b.begin(), blk_b.end(), lblk_b_get); // local DGEMM dash::multiply(lblk_a_comp, lblk_b_comp, lblk_c_comp); // wait for transfer to finish get_a.wait(); get_b.wait(); // swap buffers swap(lblk_a_get, lblk_a_comp); swap(lblk_b_get, lblk_b_comp); } Case Study: S(R)UMMA Algorithm DISCLAIMER This code is simplified for brevity. Get the real source code here: https://github.com/dash-project/dash/blob/development/dash/include/dash/algorithm/SUMMA.h
  • 19. | 19DASH  Block matrix-matrix multiplication with prefetching while(!done) { blk_a = matrixA.local.block(k); … blk_b = matrixB.local.block(k); … // prefetch auto get_a = dash::copy_async(blk_a.begin(), blk_a.end(), lblk_a_get); auto get_b = dash::copy_async(blk_b.begin(), blk_b.end(), lblk_b_get); // local DGEMM dash::multiply(lblk_a_comp, lblk_b_comp, lblk_c_comp); // wait for transfer to finish get_a.wait(); get_b.wait(); // swap buffers swap(lblk_a_get, lblk_a_comp); swap(lblk_b_get, lblk_b_comp); } Case Study: S(R)UMMA Algorithm Schedules block transmissions to minimize network congestion
  • 20. | 20DASH  Block matrix-matrix multiplication with prefetching while(!done) { blk_a = matrixA.local.block(k); … blk_b = matrixB.local.block(k); … // prefetch auto get_a = dash::copy_async(blk_a.begin(), blk_a.end(), lblk_a_get); auto get_b = dash::copy_async(blk_b.begin(), blk_b.end(), lblk_b_get); // local DGEMM dash::multiply(lblk_a_comp, lblk_b_comp, lblk_c_comp); // wait for transfer to finish get_a.wait(); get_b.wait(); // swap buffers swap(lblk_a_get, lblk_a_comp); swap(lblk_b_get, lblk_b_comp); } Case Study: S(R)UMMA Algorithm Local submatrix multiplication using DGEMM from serial Intel MKL Schedules block transmissions to minimize network congestion
  • 21. | 21DASH DASH vs. DGEMM: Intel MKL, PLASMA
  • 22. | 22DASH DASH vs. PDGEMM: ScaLAPACK  Good. But acing singular benchmarks is not the actual point. Most important:  the NArray concept allows intuitive design of efficient algorithms  we achieved portable, robust efficiency on different hardware and system environments
  • 23. | 23DASH Summary  NArray Concept – Views simplify design of efficient algorithms – First-class support for locality-based operations – Complies to existing C++ standard library concepts  DASH algorithms on n-dim. ranges – SUMMA case study: straight-forward, compact implementation – Leveraged portable efficiency of Intel MKL – Beats performance in (P)DGEMM compared to Intel MKL, PLASMA, ScaLAPACK – Robust scalability in a variety of node-level and highly distributed benchmark scenarios http://www.dash-project.org/ http://github.com/dash-project/
  • 24. | 24DASH Acknowledgements DASH on GitHub: https://github.com/dash-project/dash/  Funding  The DASH Team T. Fuchs (LMU), R. Kowalewski (LMU), D. Hünich (TUD), A. Knüpfer (TUD), J. Gracia (HLRS), C. Glass (HLRS), H. Zhou (HLRS), K. Idrees (HLRS), J. Schuchart (HLRS), F. Mößbauer (LMU), K. Fürlinger (LMU)