Data structures &
representations
mquartulli@vicomtech.org
remote sensing data processing architectures
• Diagram components: Source Products, Ingestion, direct data import, Data Catalogue, Job Queue, Processing Workers, Analysis Workers, Auto Scaling, Annotations Catalogue, Exploitation, Load Balancer, User Application Servers, Data Processing / Data Intelligence Servers
• Roles and inputs: User, Domain Expert, Admin, Configuration
Hadoop Cluster computing
[Lisa Vaas 2016]
Spark cluster computing
[Hitesh Dharmdasani, “Python and Bigdata - An Introduction to Spark (PySpark)”]
spark + mongodb
The log data structure
• An append-only ordered sequence
of records.
• In DBs: log shipping protocols
transmit portions of the log to
replica databases
• In distributed systems: the State
Machine Replication Principle: If two
identical, deterministic processes
begin in the same state and get the
same inputs in the same order, they
will produce the same output and
end in the same state.
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
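The replication principle above can be illustrated with a minimal sketch. The names `Log`, `Replica`, `catch_up` and the `("set" | "delete", key, value)` record format are assumptions made for this example, not part of any particular system: two deterministic replicas that apply the same log in the same order end in the same state.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Append-only, ordered sequence of records (hypothetical toy class)."""
    records: list = field(default_factory=list)

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the newly appended record

    def read_from(self, offset):
        return self.records[offset:]


@dataclass
class Replica:
    """Deterministic state machine: a key-value store driven by log records."""
    state: dict = field(default_factory=dict)
    applied: int = 0  # number of log records applied so far

    def catch_up(self, log):
        # Apply every record not yet seen, strictly in log order.
        for op, key, value in log.read_from(self.applied):
            if op == "set":
                self.state[key] = value
            elif op == "delete":
                self.state.pop(key, None)
            self.applied += 1


log = Log()
log.append(("set", "tile_1", "ingested"))
log.append(("set", "tile_2", "ingested"))
log.append(("delete", "tile_1", None))

primary, replica = Replica(), Replica()
primary.catch_up(log)
replica.catch_up(log)  # same inputs, same order: same final state
```

Shipping only `log.read_from(replica.applied)` to a lagging replica is the toy equivalent of log shipping: the replica needs the missing suffix of the log, not a full copy of the state.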
Why database management systems
• An interface between the database and
application programs, ensuring that data
is consistently organized and remains
easily accessible.
• Manages: 1. the data 2. the engine that
allows data to be accessed, locked and
modified 3. the database schema, which
defines the database’s logical structure.
Why database management systems
• Provides:
• Data abstraction and independence; Data security; A locking mechanism for concurrent access;
• An efficient handler to balance the needs of multiple applications using the same data;
• The ability to swiftly recover from crashes and errors, including restartability and recoverability;
• Robust data integrity capabilities; Logging and auditing of activity;
• Simple access using a standard application programming interface (API);
• Uniform administration procedures for data.
• Question:
• Does your application need all this? Typically yes if it has concurrent insert/update accesses, distributed
reads…
History
• 1960s: Navigational DBs
• 1970s-1980s: SQL, normalisation and OLTP, transactions
• 1990s: object oriented DBs and OLAP, warehousing
• 2000s: NoSQL and the CAP theorem
• 2010s: NewSQL, graph DBs, Big Data
Databases: a user’s view
• SQL and ACID: Atomicity, Consistency, Isolation, and Durability
• Variety and NoSQL: MongoDB
• Volume and NoSQL: HBase, Cassandra
• Velocity and NoSQL: MonetDB, KairosDB
• NoSQL: bad at relationships -> Graph DBs: Neo4j
• The CAP theorem: Consistency, Availability, Partition tolerance
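As a small illustration of atomicity and consistency from the ACID bullet, the sketch below uses Python's standard `sqlite3` module. The `accounts` table, the `transfer` helper and the amounts are assumptions made for this example only. A transfer that would violate the `CHECK` constraint is rolled back as a whole, so the first half of the update does not survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # consistency rule enforced by the DB
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates happen or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "a", "b", 30)
failed = transfer(conn, "a", "b", 500)  # would drive a's balance negative
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After both calls the credit to `b` from the failed transfer is gone as well, which is the point of running the two updates inside one transaction.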
SQL vs NoSQL
[Lisa Vaas 2016]
DB indices
• Objective: sub-linear search (e.g. O(log N), O(1))
• E.g. bitmaps, keys/pointers to records, keys/pointers to blocks, reverse indices.
• Implementations in terms of (balanced) trees, hashes, B+trees.
• Types:
• non-clustered: logical order only, multiple indices possible
• clustered: physical order too for efficiency, a single clustered index per table.
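The two lookup costs named above can be sketched in a few lines. This is an illustrative toy, not how a real DBMS lays out its indices: `records`, `hash_index`, `sorted_index` and the helper names are assumptions. Both indices are non-clustered, since they point into `records` without changing its physical order.

```python
import bisect

# Records stored in arrival order as (key, payload); the indices point into this list.
records = [(17, "scene_q"), (3, "scene_a"), (42, "scene_z"), (8, "scene_c")]

# Hash index: key -> position, O(1) expected time for exact-match lookups.
hash_index = {key: pos for pos, (key, _) in enumerate(records)}

# Ordered index: sorted (key, position) pairs, O(log N) lookups by binary search.
# Unlike the hash index it also supports range queries.
sorted_index = sorted((key, pos) for pos, (key, _) in enumerate(records))
sorted_keys = [key for key, _ in sorted_index]


def lookup_hash(key):
    """Exact-match lookup through the hash index."""
    pos = hash_index.get(key)
    return None if pos is None else records[pos][1]


def lookup_range(lo, hi):
    """Payloads with lo <= key <= hi, returned in key order."""
    start = bisect.bisect_left(sorted_keys, lo)
    stop = bisect.bisect_right(sorted_keys, hi)
    return [records[pos][1] for _, pos in sorted_index[start:stop]]
```

For example, `lookup_hash(42)` follows one pointer, while `lookup_range(5, 20)` binary-searches both ends of the interval and then walks the matching slice of the ordered index.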
KD Trees
• Binary space-partitioning trees in D dimensions
• For every non-leaf node: generate a splitting hyperplane that divides
the space into two half-spaces.
• Canonical construction: given all input points
• Cycles through the axes
• Split by the medians with respect to the current axis
source: wikipedia
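The canonical construction above can be sketched as follows. This is a minimal version that re-sorts at every level for clarity rather than speed; `Node`, `build_kdtree` and `nearest` are names chosen for this example, and the nearest-neighbour search is added only to show how the splitting hyperplanes are used to prune branches.

```python
from collections import namedtuple

Node = namedtuple("Node", "point axis left right")


def build_kdtree(points, depth=0):
    """Cycle through the axes and split at the median along the current axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(
        point=pts[mid],
        axis=axis,
        left=build_kdtree(pts[:mid], depth + 1),
        right=build_kdtree(pts[mid + 1:], depth + 1),
    )


def nearest(node, target, best=None):
    """Nearest neighbour of `target` under squared Euclidean distance."""
    if node is None:
        return best

    def dist2(p):
        return sum((a - b) ** 2 for a, b in zip(p, target))

    if best is None or dist2(node.point) < dist2(best):
        best = node.point
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Visit the far half-space only if the splitting hyperplane is closer
    # to the target than the best point found so far.
    if diff ** 2 < dist2(best):
        best = nearest(far, target, best)
    return best


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
closest = nearest(tree, (9, 2))
```

With these six 2-D points the root splits on x at (7, 2), and the query (9, 2) is answered without descending into the left subtree.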
LSH
• Locality-Sensitive Hashing: maximize the probability of collision
for similar items, so that they end up in the same bucket.
• Applications in near-duplicate detection and “fingerprinting”,
similarity nearest neighbor search, hierarchical clustering.
• E.g. by random projection: use a random hyperplane to hash
vectors.
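A minimal sketch of the random-projection variant, using only the standard library. The helper names, the number of hyperplanes and the seed are assumptions made for this example. Each random hyperplane contributes one bit, namely which side of the plane the vector falls on, and vectors pointing in similar directions tend to share most or all bits.

```python
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random hyperplanes through the origin (Gaussian normals)."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]


def lsh_signature(vec, planes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        int(sum(w * x for w, x in zip(plane, vec)) >= 0) for plane in planes
    )


def build_buckets(vectors, planes):
    """Group item names by signature; each distinct signature is one bucket."""
    buckets = {}
    for name, vec in vectors.items():
        buckets.setdefault(lsh_signature(vec, planes), []).append(name)
    return buckets


planes = make_hyperplanes(n_bits=8, dim=3)
vectors = {
    "a": [1.0, 2.0, 3.0],
    "a_scaled": [2.0, 4.0, 6.0],     # same direction as "a"
    "opposite": [-1.0, -2.0, -3.0],  # opposite direction
}
buckets = build_buckets(vectors, planes)
```

Here `a` and `a_scaled` always collide because only the direction matters, while `opposite` lands on the other side of every hyperplane. A nearest-neighbour search then only compares the query against items in its own bucket.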
Inverted indices
• Forward index: document -> content
• Inverted index: content -> document / location
• Record level, “Word” level
• Allows fast search (increases insertion cost!): queries can be
resolved by jumping to the “word” id in the inverted index.
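A word-level inverted index can be sketched as a dictionary from each word to the documents and positions where it occurs. The three toy documents and the function names are assumptions made for this example. Building the index costs one pass over every document at insertion time, after which a query only touches the posting lists of its own words.

```python
from collections import defaultdict

# Forward index: document id -> content.
docs = {
    "d1": "sentinel radar image of the coast",
    "d2": "landsat optical image of the city",
    "d3": "radar image of the city",
}


def build_inverted_index(docs):
    """Word-level inverted index: word -> {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def search_all(index, *words):
    """Ids of documents containing every query word (record-level AND query)."""
    postings = [set(index.get(word, {})) for word in words]
    return sorted(set.intersection(*postings)) if postings else []


index = build_inverted_index(docs)
radar_city = search_all(index, "radar", "city")
```

Dropping the position lists and keeping only the document ids would turn this into the record-level variant.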
apache arrow
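import pandas as pd
import pyarrow.feather as feather  # Feather support now ships with pyarrow

df = pd.DataFrame({'x': [1, 2, 3]})  # example data
path = 'my_data.feather'
feather.write_feather(df, path)  # write the DataFrame to a Feather file
df = feather.read_feather(path)  # read it back as a pandas DataFrame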
Google Earth Engine
• A Short Intro by Kersten Clauss…
• Question: how would you replicate what’s under the hood?
data acquisition & management
GeoEuskadi satellite image services
• linked open data management infrastructure
• public sources: Landsat and Copernicus Sentinel 1-2
data processing
map update by distributed analysis of 25 cm regional-scale ortho-imagery
Lozano Silva, J.; Aginako Bengoa, N.; Quartulli, M.; Olaizola, I.G.; Zulueta, E., "Web-Based Supervised Thematic Mapping," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 8, no. 5, pp. 2165-2176, May 2015
data processing – example video
image search: iqcbm
• Components: Index management system, Data input, Column-based DB, User interface, Analysis & processing, Image analysis
• with DLR IMF BW
• search by compression in compressed streams
• dynamic taxonomies
• Corel 10k
iqcbm: geoeye
semantic label and these labels are stored into the database as part of the patch
information.
In the following, the results of different queries using the CBIR-FCD and TerraSAR-X
images are described.
TELEIOS FP7-257662
iqcbm: digitalglobe
normalized distance from query: .76, .69, .79, .90, .95, .99
iqcbm: terrasar-x
experiment: tsx
iqcbm: terrasar-x
D3.1 KDD concepts and methods proposal: report & design recommendations 96
The patches were annotated with a semantic label by using the Search Engine based on
SVM tool and user supervision previously presented in section 5.1. The semantic labels
associated to the selected classes were previously described in Table 5.
In the following, we present some examples of retrieving TerraSAR-X structures using
both images. Table 11 displays the query images and the 20 top retrieved images. Some
quality metrics (Precision and Recall) were computed from these results and they are
summarized in Table 12.
Query images (each shown with its retrieved images): Class9, Class6, Class7, Class36, Class20, Class31, Class28, Class32
Table 11: Results of the queries based on image content using CBIR-FCD as data
mining tool.
Table 12 shows the precision and recall for the classes and the query time in seconds
needed for searching and retrieving the results.
Table 12: Precision and recall of the semantic classes using query based on content
and the query time.
Class    Precision (%)   Recall (%)   Query time (sec)
Class1   5.36            5.17         0.32882
Class2   10.71           10.34        0.318238
Class3   5.36            5.17         0.235323
Class4   7.14            6.90         0.107209
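For reference, precision and recall for a single query can be computed as in the sketch below. It follows the standard definitions and is not necessarily the exact procedure used in the TELEIOS report; the function name and the patch ids are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Standard retrieval metrics for one query.

    precision = relevant items among the retrieved / number retrieved
    recall    = relevant items among the retrieved / number of relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: 4 patches retrieved, 5 patches truly in the class.
retrieved = ["p1", "p2", "p3", "p4"]
relevant = ["p2", "p4", "p7", "p8", "p9"]
precision, recall = precision_recall(retrieved, relevant)
```

In this made-up case two of the four retrieved patches are relevant (precision 0.5) and two of the five relevant patches are found (recall 0.4).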
TELEIOS KDD concepts and methods proposal: report & design recommendations
Corneliu Octavian Dumitru, Daniela Espinoza Molina, Shiyong Cui, Jagmal Singh, Marco Quartulli, Mihai Datcu
2011, FP7 TELEIOS Tech Report
interactive web-based data retrieval
web-based interactive classification of image content

Contenu connexe

Tendances

What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...Geoffrey Fox
 
Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...Geoffrey Fox
 
Scientific Application Development and Early results on Summit
Scientific Application Development and Early results on SummitScientific Application Development and Early results on Summit
Scientific Application Development and Early results on SummitGanesan Narayanasamy
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesGeoffrey Fox
 
Materials Data Facility: Streamlined and automated data sharing, discovery, ...
Materials Data Facility: Streamlined and automated data sharing,  discovery, ...Materials Data Facility: Streamlined and automated data sharing,  discovery, ...
Materials Data Facility: Streamlined and automated data sharing, discovery, ...Ian Foster
 
High Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC ClustersHigh Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC ClustersSaliya Ekanayake
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for ScienceIan Foster
 
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier TordoirShare and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier TordoirSpark Summit
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light SourcesIan Foster
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeGeoffrey Fox
 
Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Geoffrey Fox
 
Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009Ian Foster
 
Parallel Sequence Generator
Parallel Sequence GeneratorParallel Sequence Generator
Parallel Sequence GeneratorRim Moussa
 
Accelerating Discovery via Science Services
Accelerating Discovery via Science ServicesAccelerating Discovery via Science Services
Accelerating Discovery via Science ServicesIan Foster
 
Taming Big Data!
Taming Big Data!Taming Big Data!
Taming Big Data!Ian Foster
 
Don't Be Scared. Data Don't Bite. Introduction to Big Data.
Don't Be Scared. Data Don't Bite. Introduction to Big Data.Don't Be Scared. Data Don't Bite. Introduction to Big Data.
Don't Be Scared. Data Don't Bite. Introduction to Big Data.KGMGROUP
 
Astronomical Data Processing on the LSST Scale with Apache Spark
Astronomical Data Processing on the LSST Scale with Apache SparkAstronomical Data Processing on the LSST Scale with Apache Spark
Astronomical Data Processing on the LSST Scale with Apache SparkDatabricks
 
Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...
Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...
Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...Big Data Spain
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...Ian Foster
 

Tendances (20)

What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
 
Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...
 
Scientific Application Development and Early results on Summit
Scientific Application Development and Early results on SummitScientific Application Development and Early results on Summit
Scientific Application Development and Early results on Summit
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
 
Materials Data Facility: Streamlined and automated data sharing, discovery, ...
Materials Data Facility: Streamlined and automated data sharing,  discovery, ...Materials Data Facility: Streamlined and automated data sharing,  discovery, ...
Materials Data Facility: Streamlined and automated data sharing, discovery, ...
 
High Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC ClustersHigh Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC Clusters
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier TordoirShare and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light Sources
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run Time
 
Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel 
 
Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009
 
Parallel Sequence Generator
Parallel Sequence GeneratorParallel Sequence Generator
Parallel Sequence Generator
 
Accelerating Discovery via Science Services
Accelerating Discovery via Science ServicesAccelerating Discovery via Science Services
Accelerating Discovery via Science Services
 
Taming Big Data!
Taming Big Data!Taming Big Data!
Taming Big Data!
 
Don't Be Scared. Data Don't Bite. Introduction to Big Data.
Don't Be Scared. Data Don't Bite. Introduction to Big Data.Don't Be Scared. Data Don't Bite. Introduction to Big Data.
Don't Be Scared. Data Don't Bite. Introduction to Big Data.
 
Astronomical Data Processing on the LSST Scale with Apache Spark
Astronomical Data Processing on the LSST Scale with Apache SparkAstronomical Data Processing on the LSST Scale with Apache Spark
Astronomical Data Processing on the LSST Scale with Apache Spark
 
Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...
Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...
Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
 
Asd 2015
Asd 2015Asd 2015
Asd 2015
 

En vedette

08 visualisation seminar ver0.2
08 visualisation seminar   ver0.208 visualisation seminar   ver0.2
08 visualisation seminar ver0.2Marco Quartulli
 
08 distributed optimization
08 distributed optimization08 distributed optimization
08 distributed optimizationMarco Quartulli
 
07 dimensionality reduction
07 dimensionality reduction07 dimensionality reduction
07 dimensionality reductionMarco Quartulli
 
07 big skyearth_dlr_7_april_2016
07 big skyearth_dlr_7_april_201607 big skyearth_dlr_7_april_2016
07 big skyearth_dlr_7_april_2016Marco Quartulli
 
05 sensor signal_models_feature_extraction
05 sensor signal_models_feature_extraction05 sensor signal_models_feature_extraction
05 sensor signal_models_feature_extractionMarco Quartulli
 
04 bigdata and_cloud_computing
04 bigdata and_cloud_computing04 bigdata and_cloud_computing
04 bigdata and_cloud_computingMarco Quartulli
 

En vedette (10)

06 ashish mahabal bse2
06 ashish mahabal bse206 ashish mahabal bse2
06 ashish mahabal bse2
 
08 visualisation seminar ver0.2
08 visualisation seminar   ver0.208 visualisation seminar   ver0.2
08 visualisation seminar ver0.2
 
08 distributed optimization
08 distributed optimization08 distributed optimization
08 distributed optimization
 
07 dimensionality reduction
07 dimensionality reduction07 dimensionality reduction
07 dimensionality reduction
 
07 big skyearth_dlr_7_april_2016
07 big skyearth_dlr_7_april_201607 big skyearth_dlr_7_april_2016
07 big skyearth_dlr_7_april_2016
 
06 ashish mahabal bse1
06 ashish mahabal bse106 ashish mahabal bse1
06 ashish mahabal bse1
 
05 sensor signal_models_feature_extraction
05 sensor signal_models_feature_extraction05 sensor signal_models_feature_extraction
05 sensor signal_models_feature_extraction
 
05 astrostat feigelson
05 astrostat feigelson05 astrostat feigelson
05 astrostat feigelson
 
06 ashish mahabal bse3
06 ashish mahabal bse306 ashish mahabal bse3
06 ashish mahabal bse3
 
04 bigdata and_cloud_computing
04 bigdata and_cloud_computing04 bigdata and_cloud_computing
04 bigdata and_cloud_computing
 

Similaire à 07 data structures_and_representations

Dsd int 2014 - data science symposium - application 1 - point clouds, prof. p...
Dsd int 2014 - data science symposium - application 1 - point clouds, prof. p...Dsd int 2014 - data science symposium - application 1 - point clouds, prof. p...
Dsd int 2014 - data science symposium - application 1 - point clouds, prof. p...Deltares
 
Minimizing the Complexities of Machine Learning with Data Virtualization
Minimizing the Complexities of Machine Learning with Data VirtualizationMinimizing the Complexities of Machine Learning with Data Virtualization
Minimizing the Complexities of Machine Learning with Data VirtualizationDenodo
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data Geoffrey Fox
 
2021 04-20 apache arrow and its impact on the database industry.pptx
2021 04-20  apache arrow and its impact on the database industry.pptx2021 04-20  apache arrow and its impact on the database industry.pptx
2021 04-20 apache arrow and its impact on the database industry.pptxAndrew Lamb
 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationIan Foster
 
Spatial decision support and analytics on a campus scale: bringing GIS, CAD, ...
Spatial decision support and analytics on a campus scale: bringing GIS, CAD, ...Spatial decision support and analytics on a campus scale: bringing GIS, CAD, ...
Spatial decision support and analytics on a campus scale: bringing GIS, CAD, ...Safe Software
 
Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22marpierc
 
The Matrix: connecting and re-using digital records of archaeological investi...
The Matrix: connecting and re-using digital records of archaeological investi...The Matrix: connecting and re-using digital records of archaeological investi...
The Matrix: connecting and re-using digital records of archaeological investi...Keith.May
 
Technologies For Appraising and Managing Electronic Records
Technologies For Appraising and Managing Electronic RecordsTechnologies For Appraising and Managing Electronic Records
Technologies For Appraising and Managing Electronic Recordspbajcsy
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataPaco Nathan
 
MataNui - Building a Grid Data Infrastructure that "doesn't suck!"
MataNui - Building a Grid Data Infrastructure that "doesn't suck!"MataNui - Building a Grid Data Infrastructure that "doesn't suck!"
MataNui - Building a Grid Data Infrastructure that "doesn't suck!"Guy K. Kloss
 
P2P Resource Discovery for the Browser
P2P Resource Discovery for the BrowserP2P Resource Discovery for the Browser
P2P Resource Discovery for the BrowserDavid Dias
 
Toward universal information access on the digital object cloud
Toward universal information access on the digital object cloudToward universal information access on the digital object cloud
Toward universal information access on the digital object cloudNational Institute of Informatics
 
Benchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detectionBenchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detectionSymeon Papadopoulos
 
Scientific
Scientific Scientific
Scientific marpierc
 
Evaluation of graph databases
Evaluation of graph databasesEvaluation of graph databases
Evaluation of graph databasesijaia
 

07 data structures_and_representations

  • 2. remote sensing data processing architectures [architecture diagram; labels in reading order: Job Queue, Analysis Workers, Data Catalogue, Processing Workers, Auto Scaling, Ingestion, Data Catalogue, Exploitation, Annotations Catalogue, User Application Servers, Load Balancer, User, Source Products, Domain Expert, Configuration, Admin, Domain Expert, direct data import, Data Processing / Data Intelligence Servers]
  • 4. Spark cluster computing [Hitesh Dharmdasani, “Python and Bigdata - An Introduction to Spark (PySpark)”]
  • 6. The log data structure • An append-only ordered sequence of records. • In DBs: log shipping protocols to transmit portions of log to slave replica databases • In distributed systems: the State Machine Replication Principle: If two identical, deterministic processes begin in the same state and get the same inputs in the same order, they will produce the same output and end in the same state. https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
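The log and the State Machine Replication Principle on slide 6 can be illustrated with a minimal Python sketch. The `Log` and `Counter` classes and the `("add", n)` / `("mul", n)` record format are hypothetical names chosen for this illustration, not part of any real replication protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Append-only, ordered sequence of records."""
    records: list = field(default_factory=list)

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def read_from(self, offset):
        # "Log shipping": hand over the portion of the log after `offset`.
        return self.records[offset:]


@dataclass
class Counter:
    """A deterministic state machine whose state is a single integer."""
    value: int = 0
    applied: int = 0  # number of log records applied so far

    def catch_up(self, log):
        for op, amount in log.read_from(self.applied):
            if op == "add":
                self.value += amount
            elif op == "mul":
                self.value *= amount
            self.applied += 1


log = Log()
for record in [("add", 2), ("mul", 5), ("add", -3)]:
    log.append(record)

# Two identical processes start in the same state and read the same log.
primary, replica = Counter(), Counter()
primary.catch_up(log)
replica.catch_up(log)
```

Because both counters apply the same records in the same order, they end in the same state; a replica that falls behind only needs the tail of the log it has not yet applied.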
  • 7. Why database management systems • An interface between the database and application programs, ensuring that data is consistently organized and remains easily accessible. • Manages: 1. the data 2. the engine that allows data to be accessed, locked and modified 3. the database schema, which defines the database’s logical structure.
  • 8. Why database management systems • Provides: • Data abstraction and independence; Data security; A locking mechanism for concurrent access; • An efficient handler to balance the needs of multiple applications using the same data; • The ability to swiftly recover from crashes and errors, including restartability and recoverability; • Robust data integrity capabilities; Logging and auditing of activity; • Simple access using a standard application programming interface (API); • Uniform administration procedures for data. • Question: • Does your application need all this? Typically yes if concurrent insert/update accesses, distributed reads…
  • 9. History • 1960s: Navigational DBs • 1970s-1980s: SQL, normalisation and OLTP, transactions • 1990s: object oriented DBs and OLAP, warehousing • 2000s: NoSQL and the CAP theorem • 2010s: newSQL, graph DBs, Big Data
  • 10. Databases: a user’s view • SQL and ACID: Atomicity, Consistency, Isolation, and Durability • Variety and NoSQL: MongoDB • Volume and NoSQL: HBase, Cassandra • Velocity and NoSQL: MonetDB, KairosDB • NoSQL: bad at relationships -> Graph DBs: Neo4j • The CAP theorem: Consistency, Availability, Partition tolerance
  • 11. SQL vs NoSQL [Lisa Vaas 2016]
  • 12. DB indices • Objective: sub-linear search (e.g. O(log N), O(1)) • E.g. bitmaps, keys/pointers to records, keys/pointers to blocks, reverse indices. • Implementations in terms of (balanced) trees, hashes, B+trees. • Types: • non-clustered: logical order only, multiple indices possible • clustered: physical order too for efficiency, a single clustered index per table.
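As a rough sketch of the two complexity targets on slide 12, the Python below builds a hash index (expected O(1) lookup) and a sorted key index searched by bisection (O(log N), with range queries). The `rows` table and its field names are made up for the example; real engines typically use B+trees rather than a sorted list.

```python
import bisect

# Hypothetical table: rows stored in insertion (physical) order.
rows = [
    {"id": 17, "name": "scene_c"},
    {"id": 3, "name": "scene_a"},
    {"id": 42, "name": "scene_d"},
    {"id": 8, "name": "scene_b"},
]

# Hash index: key -> row position, O(1) expected lookup.
hash_index = {row["id"]: pos for pos, row in enumerate(rows)}

# Non-clustered sorted index: (key, row position) pairs in key order.
# The rows themselves keep their physical order.
sorted_index = sorted((row["id"], pos) for pos, row in enumerate(rows))
keys = [key for key, _ in sorted_index]


def lookup(key):
    """Point query by binary search, O(log N)."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return rows[sorted_index[i][1]]
    return None


def range_query(lo, hi):
    """Rows with lo <= id <= hi, returned in key order."""
    start = bisect.bisect_left(keys, lo)
    stop = bisect.bisect_right(keys, hi)
    return [rows[pos] for _, pos in sorted_index[start:stop]]
```

The hash index answers only exact-match queries, while the ordered index also supports ranges; that difference is one reason tree-based indices are the usual default.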
  • 13. KD Trees • Binary space-partitioning trees in D dimensions • For every non-leaf node: generate a splitting hyperplane that divides the space into two half-spaces. • Canonical construction: given all input points • Cycles through the axes • Split by the medians with respect to the current axis source: wikipedia
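The canonical construction on slide 13 (cycle through the axes, split at the median) can be sketched in a few lines of Python. The `Node`, `build`, and `nearest` names are illustrative, and the nearest-neighbour search is added here only to show how the splitting hyperplanes prune the search; it is not taken from the slide.

```python
from collections import namedtuple

Node = namedtuple("Node", "point axis left right")


def build(points, depth=0):
    """Median-split construction, cycling through the axes by depth."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, target, best=None):
    """Nearest neighbour of `target`, pruning half-spaces that cannot win."""
    if node is None:
        return best
    if best is None or sq_dist(node.point, target) < sq_dist(best, target):
        best = node.point
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Visit the far side only if the splitting hyperplane is closer
    # than the best point found so far.
    if diff ** 2 < sq_dist(best, target):
        best = nearest(far, target, best)
    return best


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(points)
```

Sorting at every level keeps the sketch short at the cost of O(N log² N) construction; a linear-time median selection would bring that down to O(N log N).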
  • 14. LSH • Locality-Sensitive Hashing: maximize probability of collision — similar items end up in same bucket. • Applications in near-duplicate detection and “fingerprinting”, similarity nearest neighbor search, hierarchical clustering. • E.g. by random projection: use a random hyperplane to hash vectors.
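A minimal sketch of the random-projection variant mentioned on slide 14: each random hyperplane contributes one bit (which side of the plane the vector falls on), and the bit tuple is the bucket key. This family is sensitive to the angle between vectors, so it suits cosine similarity. The function names, the number of bits, and the sample vectors are assumptions made for the example.

```python
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its positive side."""
    bits = []
    for normal in hyperplanes:
        dot = sum(n * v for n, v in zip(normal, vector))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)


planes = make_hyperplanes(n_bits=8, dim=3)

# Hypothetical feature vectors: two with the same direction, one opposite.
vectors = {
    "a": [1.0, 2.0, 3.0],
    "a_scaled": [2.0, 4.0, 6.0],
    "a_flipped": [-1.0, -2.0, -3.0],
}

buckets = {}
for name, vec in vectors.items():
    buckets.setdefault(signature(vec, planes), []).append(name)
```

Vectors pointing in the same direction always share a bucket, while nearly parallel vectors share one with high probability; using several independent hash tables is the usual way to trade memory for recall.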
  • 15. Inverted indices • Forward index: document —> content • Inverted index: content —> document / location • Record level, “Word” level • Allows fast search (increases insertion cost!): queries can be resolved by jumping to the “word” id in the inverted index.
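The forward and inverted indices of slide 15 can be sketched as follows, at both record level (word to documents) and word level (word to document and position). The three sample documents and the `search` helper are made up for the illustration.

```python
from collections import defaultdict

# Hypothetical document collection.
documents = {
    0: "sar image of the harbour",
    1: "optical image of the city",
    2: "sar time series of the city",
}

# Forward index: document -> content.
forward_index = {doc_id: text.split() for doc_id, text in documents.items()}

# Inverted indices: content -> document (record level)
# and content -> (document, position) (word level).
record_level = defaultdict(set)
word_level = defaultdict(list)
for doc_id, words in forward_index.items():
    for position, word in enumerate(words):
        record_level[word].add(doc_id)
        word_level[word].append((doc_id, position))


def search(*terms):
    """Documents containing all terms, found by intersecting posting sets."""
    postings = [record_level.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []
```

Every inserted document has to update one posting list per distinct word, which is the insertion cost the slide warns about; queries in turn touch only the posting lists of their terms.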
  • 16. apache arrow
      # Feather file round trip; df is assumed to be an existing pandas DataFrame
      import feather
      path = 'my_data.feather'
      feather.write_dataframe(df, path)
      df = feather.read_dataframe(path)
  • 17. Google Earth Engine • A Short Intro by Kersten Clauss… • Question: how would you replicate what’s under the hood?
  • 21. image search: iqcbm [diagram labels: Index management system, Data input, Column-based DB, User interface, Analysis & processing, Image analysis] • with DLR IMF BW • search by compression in compressed streams • dynamic taxonomies • corel 10k
  • 22. iqcbm: geoeye • "… semantic label and these labels are stored into the database as part of the patch information. In the following, the results of different queries using the CBIR-FCD and TerraSAR-X images are described." (TELEIOS FP7-257662)
    7. Applicable and Reference Documents, 7.1. Applicable Documents (Document Title, Internal Reference):
    [RI-1] Katrin Molch et al. (2010). "Naming Convention for Image Patches".
    [RI-2] N. Ritter and M. Ruth (1995). "GeoTIFF Format Specification", GeoTIFF Revision 1.0, Version 1.8.1.
    [RI-3] Robert M. Haralick, K. Shanmugam, Its'hak Dinstein (1973). "Textural Features for Image Classification". IEEE Transactions on Systems, Man, and Cybernetics, SMC-3 (6): 610–621.
    [RI-4] The GLCM Tutorial, http://www.fp.ucalgary.ca/mhallbey/tutorial.htm
    [RI-5] H.G. Feichtinger, Th. Strohmer (1998). "Gabor Analysis and Algorithms", Birkhäuser.
    [RI-6] B.S. Manjunath, W.Y. Ma (1996). "Texture features for browsing and retrieval of image data". IEEE Transactions on Pattern Analysis and Machine Intelligence, 18 (8): 837–842.
    [RI-7] A. Popescu, I. Gavat, M. Datcu (2008). "Complex SAR image characterization using space variant spectral analysis". IEEE Radar Conference, 1-4.
    [RI-8] T. Li, M. Ogihara (2006). "Towards Intelligent Music Information Retrieval". IEEE Transactions on Multimedia, 8 (3): 564-574.
    [RI-9] A. Croisier, D. Esteban, C. Galand (1976). "Perfect channel splitting by use of interpolation / decimation / tree decomposition techniques". International Conf. on Information Science and Systems, Patras, Greece: 443-446.
    [RI-10] P. P. Vaidyanathan (1987). "Quadrature Mirror Filter Banks, M-Band Extensions and Perfect-Reconstruction Techniques". IEEE ASSP Magazine, 4 (3): 4-20.
  • 23. iqcbm: digitalglobe
  • 24. [figure: retrieved results labelled with normalized distance from query: .76 .69 .79 .90 .95 .99]
  • 25. iqcbm: terrasar-x
  • 26. iqcbm: terrasar-x
  • 28. iqcbm: terrasar-x • The patches were annotated with a semantic label by using the Search Engine based on SVM tool and user supervision previously presented in section 5.1. The semantic labels associated to the selected classes were previously described in Table 5. In the following, we present some examples of retrieving TerraSAR-X structures using both images. Table 11 displays the query images and the 20 top retrieved images. Some quality metrics (Precision and Recall) were computed from these results and they are summarized in Table 12.
    [Table 11: Results of the queries based on image content using CBIR-FCD as data mining tool. Columns: query images, retrieved images. Classes shown: Class9, Class6, Class7, Class36, Class20, Class31, Class28, Class32.]
    Table 12 shows the precision and recall for the classes and the query time in seconds needed for searching and retrieving the results.
    Table 12: Precision and recall of the semantic classes using query based on content and the query time.
    Class    Precision (%)  Recall (%)  Query time (sec)
    Class1   5.36           5.17        0.32882
    Class2   10.71          10.34       0.318238
    Class3   5.36           5.17        0.235323
    Class4   7.14           6.90        0.107209
    Source: D3.1, TELEIOS KDD concepts and methods proposal: report & design recommendations. Corneliu Octavian Dumitru, Daniela Espinoza Molina, Shiyong Cui, Jagmal Singh, Marco Quartulli, Mihai Datcu, 2011, FP7 TELEIOS Tech Report (TELEIOS FP7-257662).
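For readers unfamiliar with the two quality metrics reported in Table 12, the sketch below shows the standard definitions for a single query. The patch identifiers are hypothetical and unrelated to the TELEIOS data; the report does not state how its figures were obtained beyond naming the metrics.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 patches returned, 5 patches truly in the class.
retrieved_patches = ["p01", "p02", "p07", "p09"]
relevant_patches = ["p01", "p03", "p04", "p07", "p11"]
precision, recall = precision_recall(retrieved_patches, relevant_patches)
```

Precision penalises wrong results in the returned list, recall penalises relevant patches that were missed, so the two are usually reported together.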