Roberto Trasarti PhD Thesis
1. University of Pisa
Mastering the Spatio-Temporal Knowledge Discovery Process
PhD candidate: Roberto Trasarti
PhD thesis discussion
2. Spatio-Temporal Context
Research on moving-object data analysis has recently been fostered by the widespread diffusion of new techniques and systems for monitoring, collecting and storing location-aware data, generated by a wealth of technological infrastructures, such as:
- Global Positioning System (GPS)
- Global System for Mobile Communications (GSM)
- sensor networks
3. Knowledge Discovery Process
Knowledge discovery is a multi-step process that involves data preprocessing, pattern mining and pattern post-processing.
4. Motivations
There is no unifying framework in which mining tools are specific components of the knowledge discovery process. Data and models belong to different worlds, and having elements from both worlds causes an impedance mismatch.
5. Related Works
The literature contains no proposals addressing the problem of a uniform framework. Moving-object databases such as Secondo and Hermes provide some primitives. The thesis work is inspired by well-known works on the inductive database vision.
8. Object Representation of Data and Models
Using the object-relational paradigm, we represent data and models as objects. The set of attribute types $A$ can be partitioned into three subsets, $A = A_s \cup A_d \cup A_m$:
- $A_d$, the data types of the Data World: spatial object, temporal object, moving object;
- $A_m$, the model types of the Model World: T-pattern object, cluster object, flock object;
- $A_s$, shared by both worlds.
9. Data Types
A spatial object is an object with a geometric shape and a position in space. A temporal object is an object with an absolute temporal reference and a duration. A moving object is an object that changes in time and space.
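The three data types can be sketched as plain Python classes. This is a minimal illustration under assumptions, not the thesis's object-relational types: the class names, the polygon-vertex representation of a shape, and linear interpolation between samples of a moving object are all choices made here for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SpatialObject:
    # Geometric shape with a position in space, simplified here to polygon vertices (x, y).
    vertices: tuple[tuple[float, float], ...]

    def vertex_mean(self) -> tuple[float, float]:
        """Mean of the vertices: a crude reference position for the shape."""
        xs = [x for x, _ in self.vertices]
        ys = [y for _, y in self.vertices]
        return sum(xs) / len(xs), sum(ys) / len(ys)


@dataclass(frozen=True)
class TemporalObject:
    start: datetime      # absolute temporal reference
    duration: timedelta

    @property
    def end(self) -> datetime:
        return self.start + self.duration

    def contains(self, t: datetime) -> bool:
        return self.start <= t <= self.end


@dataclass(frozen=True)
class MovingObject:
    # Time-ordered samples (t, x, y); t in seconds. At least two samples are assumed.
    samples: tuple[tuple[float, float, float], ...]

    def position_at(self, t: float) -> tuple[float, float]:
        """Linearly interpolate the position at time t between consecutive samples."""
        for (t0, x0, y0), (t1, x1, y1) in zip(self.samples, self.samples[1:]):
            if t0 <= t <= t1:
                if t1 == t0:
                    return x0, y0
                a = (t - t0) / (t1 - t0)
                return x0 + a * (x1 - x0), y0 + a * (y1 - y0)
        raise ValueError("t lies outside the sampled interval")
```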
10. Data-World
The D-World represents the entities to be analyzed, as well as their properties and mutual relationships. Intuitively, the D-World is the set of entities that describe the trajectory dataset, a set of regions, a partition of the day, or any combination of these. The D-World is a set of tables defined only by attributes in $A_d$ and $A_s$.
11. Model Types
A T-pattern is a concise description of frequent behaviors in terms of both space and time. A cluster captures the spatio-temporal affinity among a set of moving objects w.r.t. a distance function. A flock is the spatio-temporal coincidence of a set of moving objects that move together.
[Figure: example T-pattern over Region A, Region B and Region C, with transition times of 10 min and 5 min]
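The example figure suggests reading a T-pattern as a sequence of regions annotated with transition times. The sketch below checks whether a single trajectory, given as time-stamped region visits, follows such a pattern. The tolerance `tol`, the class `TPattern` and the function `follows` are hypothetical, and this is only a containment check, not the T-pattern mining algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TPattern:
    regions: tuple[str, ...]        # ordered regions, e.g. ("A", "B", "C")
    transitions: tuple[float, ...]  # transition time in minutes between consecutive regions


def follows(visits: list[tuple[str, float]], pattern: TPattern, tol: float) -> bool:
    """True if the time-ordered visits [(region, t_minutes), ...] contain the pattern's
    regions in order, with every transition time within +/- tol of the pattern's."""

    def search(start: int, k: int, prev_t: float) -> bool:
        if k == len(pattern.regions):
            return True  # every region of the pattern has been matched
        for i in range(start, len(visits)):
            region, t = visits[i]
            if region != pattern.regions[k]:
                continue
            if k > 0 and abs((t - prev_t) - pattern.transitions[k - 1]) > tol:
                continue
            if search(i + 1, k + 1, t):  # backtrack if the rest cannot be matched
                return True
        return False

    return search(0, 0, 0.0)
```

Backtracking matters: a greedy match on the first visit to region A would reject a trajectory whose later visit to A is the one that fits the pattern's timing.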
12. Model-World
The M-World contains all the movement patterns extracted from the data, with their properties and relationships, i.e., the collection of models unveiled at the different stages of the knowledge discovery process. The M-World is a set of tables defined only by attributes in $A_m$ and $A_s$.
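Since the D-World uses only attributes in $A_d \cup A_s$ and the M-World only those in $A_m \cup A_s$, the world a table can belong to is determined by its schema. The sketch below assumes hypothetical concrete type names (`integer`, `text`, `moving_object`, ...); the slides do not say which types make up $A_s$.

```python
from enum import Enum


class Subset(Enum):
    SHARED = "A_s"  # attribute types allowed in both worlds
    DATA = "A_d"
    MODEL = "A_m"


# Hypothetical mapping from concrete type names to the three subsets of A.
TYPE_SUBSET = {
    "integer": Subset.SHARED,
    "text": Subset.SHARED,
    "spatial_object": Subset.DATA,
    "temporal_object": Subset.DATA,
    "moving_object": Subset.DATA,
    "tpattern_object": Subset.MODEL,
    "cluster_object": Subset.MODEL,
    "flock_object": Subset.MODEL,
}


def worlds_of(schema: dict[str, str]) -> set[str]:
    """Return the worlds a table with this schema (column -> type name) may belong to."""
    used = {TYPE_SUBSET[t] for t in schema.values()}
    worlds = set()
    if used <= {Subset.SHARED, Subset.DATA}:
        worlds.add("D-World")
    if used <= {Subset.SHARED, Subset.MODEL}:
        worlds.add("M-World")
    return worlds  # empty if the table mixes A_d and A_m attributes
```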
13. Two-Worlds Operators
Operators can be intra-world or inter-world, and different classes of operators have been defined for each type.
14. Data Constructor Operators
The aim of this class of operators is to build objects in the D-World starting from the raw data; it realizes the data acquisition step of the knowledge discovery process. A generic data constructor operator is defined as $OP_{constructor}(T, p) \rightarrow T_d$, where $T$ holds the raw data, $p$ the operator parameters, and $T_d$ is the resulting D-World table.
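As a minimal sketch of a data constructor, the function below groups raw GPS rows into trajectories. It assumes, without confirmation from the slides, that parameters such as MAX_SPACE_GAP and MAX_TIME_GAP (used in a DMQL example later in the deck) act as split thresholds between consecutive fixes; `build_moving_points` and `haversine_m` are hypothetical names.

```python
import math


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters (spherical Earth, radius 6,371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * 6_371_000.0 * math.asin(math.sqrt(a))


def build_moving_points(rows, max_space_gap_m: float, max_time_gap_s: float):
    """rows: (userid, lon, lat, t_seconds) ordered by userid, then t.
    Returns a list of (userid, [(t, lon, lat), ...]) trajectories; a new trajectory
    starts when the user changes or a gap exceeds either threshold."""
    trajectories = []
    user_of_current, current = None, []
    for user, lon, lat, t in rows:
        if current:
            t_prev, lon_prev, lat_prev = current[-1]
            if (user != user_of_current
                    or t - t_prev > max_time_gap_s
                    or haversine_m(lon_prev, lat_prev, lon, lat) > max_space_gap_m):
                trajectories.append((user_of_current, current))
                current = []
        user_of_current = user
        current.append((t, lon, lat))
    if current:
        trajectories.append((user_of_current, current))
    return trajectories
```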
15. Model Constructor Operators
This kind of operator realizes the extraction of models from the D-World through data mining algorithms. A generic model constructor operator is defined as $OP_{mining}(T_d, p) \rightarrow T_m$.
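The signature $OP_{mining}(T_d, p) \rightarrow T_m$ treats the mining algorithm as a plug-in applied to a D-World table. The sketch below illustrates that shape with a deliberately simple, hypothetical algorithm (`frequent_regions`) that is neither the T-pattern nor the OPTICS miner; tables are modeled as lists of dictionaries, which is also an assumption.

```python
from collections import Counter
from typing import Callable

Table = list[dict]  # a table as a list of rows (column name -> value)


def op_mining(t_d: Table, algorithm: Callable[[Table, dict], Table], p: dict) -> Table:
    """OP_mining(T_d, p) -> T_m: run a mining algorithm on a D-World table."""
    return algorithm(t_d, p)


def frequent_regions(t_d: Table, p: dict) -> Table:
    """Toy mining algorithm: regions visited by at least a fraction p["support"] of trajectories."""
    counts = Counter()
    for row in t_d:
        counts.update(set(row["regions"]))  # count each region once per trajectory
    n = len(t_d)
    return [{"region": r, "support": c / n}
            for r, c in sorted(counts.items()) if c / n >= p["support"]]
```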
16. Transformation Operators
Transformation operators are intra-world tasks aimed at manipulating data and models; they are the means for expressing data pre-processing and post-processing tasks. A generic D-transformation operator is defined as $OP_{D\text{-}Transf}(T_d, p) \rightarrow T'_d$, and a generic M-transformation operator as $OP_{M\text{-}Transf}(T_m, p) \rightarrow T'_m$.
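A D-transformation maps $T_d$ to $T'_d$ without leaving the D-World. As an assumed example, the sketch resamples a trajectory at a fixed period using linear interpolation. The transformation library does include a resampling algorithm, but its actual method is not described here, so `resample` is a hypothetical stand-in.

```python
import bisect


def resample(samples: list[tuple[float, float, float]], period: float) -> list[tuple[float, float, float]]:
    """Re-sample a trajectory of time-ordered (t, x, y) points every `period` seconds
    (period must be positive), from its first to its last timestamp, by linear interpolation."""
    times = [s[0] for s in samples]
    out = []
    k = 0
    while times[0] + k * period <= times[-1]:
        t = times[0] + k * period              # computed from k to avoid accumulating rounding error
        i = bisect.bisect_right(times, t) - 1  # last sample with time <= t
        if i == len(samples) - 1:
            out.append(samples[i])             # t coincides with the last sample
        else:
            (ta, xa, ya), (tb, xb, yb) = samples[i], samples[i + 1]
            a = (t - ta) / (tb - ta)
            out.append((t, xa + a * (xb - xa), ya + a * (yb - ya)))
        k += 1
    return out
```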
17. Relation Operators
Relation operators include both intra-world and inter-world operations and aim to create relations between data, between models, and between data and models. The generic relation operators are defined as:
- DD-Relation: $OP_{DD\text{-}Relation}(T_{dd}, f) \rightarrow TR_{dd}$
- MM-Relation: $OP_{MM\text{-}Relation}(T_{mm}, f) \rightarrow TR_{mm}$
- DM-Relation: $OP_{DM\text{-}Relation}(T_{dm}, f) \rightarrow TR_{dm}$
18. Predicates of Relation Operators
The predicate $f$ can be chosen from a large variety of predicates. However, the semantics of these predicates depends on the type of the data (resp. model) objects to which they are applied.
[Table: predicates for DD, DM and MM relations]
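A relation operator can be read as keeping the pairs of objects that satisfy a predicate $f$. The sketch below assumes a DM relation in which $f$ checks that a trajectory visits a pattern's regions in order; both `op_relation` and the predicate `visits_in_order` are illustrative assumptions, not the thesis's predicate definitions.

```python
from itertools import product


def op_relation(pairs, f):
    """OP_Relation(T, f) -> TR: keep the id pairs whose objects satisfy predicate f.
    `pairs` holds ((id_a, obj_a), (id_b, obj_b)) tuples, e.g. data-model pairs for a DM relation."""
    return [(id_a, id_b) for (id_a, a), (id_b, b) in pairs if f(a, b)]


def visits_in_order(trajectory_regions, pattern_regions) -> bool:
    """Hypothetical DM predicate: the trajectory visits the pattern's regions in order."""
    it = iter(trajectory_regions)
    return all(r in it for r in pattern_regions)  # `in` consumes the iterator
```

For example, `op_relation(product(trajectories, patterns), visits_in_order)` builds a data-model relation table from every trajectory-pattern pair.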
22. The Design of the GeoPKDD System
The GeoPKDD system is an implementation of the Two-Worlds model and of the Data Mining Query Language (DMQL).
23. Object-Relational Database and Database Manager
As described above, the object-relational database contains the representation of both data and models and provides the full power of SQL. The database manager realizes a middle layer that, through translation libraries, decouples the system from the underlying database technology.
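The decoupling provided by the database manager is essentially an adapter layer. A minimal sketch, assuming SQLite as the backend and JSON text as the object encoding (neither is stated in the slides); `Backend`, `SQLiteBackend` and `DatabaseManager` are hypothetical names.

```python
import json
import sqlite3
from abc import ABC, abstractmethod


class Backend(ABC):
    """Translation layer: the rest of the system only talks to this interface."""

    @abstractmethod
    def execute(self, sql: str, params: tuple = ()) -> list[tuple]:
        ...


class SQLiteBackend(Backend):
    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)

    def execute(self, sql: str, params: tuple = ()) -> list[tuple]:
        cur = self.conn.execute(sql, params)
        self.conn.commit()
        return cur.fetchall()


class DatabaseManager:
    """Middle layer: stores and loads objects without knowing which backend is in use."""

    def __init__(self, backend: Backend):
        self.backend = backend

    def store(self, table: str, rows: list[tuple[int, dict]]) -> None:
        # Table names are trusted here; objects are serialized as JSON text (an assumption).
        self.backend.execute(f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER PRIMARY KEY, obj TEXT)")
        for rid, obj in rows:
            self.backend.execute(f"INSERT INTO {table} VALUES (?, ?)", (rid, json.dumps(obj)))

    def load(self, table: str) -> list[tuple[int, dict]]:
        rows = self.backend.execute(f"SELECT id, obj FROM {table} ORDER BY id")
        return [(rid, json.loads(obj)) for rid, obj in rows]
```

Swapping the database technology would then mean writing another `Backend` subclass while leaving `DatabaseManager` untouched.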
24. Language Parser and Controller
The parser identifies the various types of queries and builds an execution plan for each one as a sequence of actions for the controller.
Example:
CREATE MODELS ClusteringTable USING OPTICS FROM (SELECT t.id, t.trajobj FROM Trajectories t) SET OPTICS.distance_method = Route Similarity AND OPTICS.eps = 50 AND OPTICS.min_size = 100
Plan:
Retrieve[ SELECT t.id, t.trajobj FROM Trajectories t ]
Translate[ Data type: Moving point ]
Execute[ Mining algorithm: OPTICS algorithm, Parameters: ... ]
Translate[ Model type: Cluster ]
Store[ Table name: ClusteringTable ]
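The query-to-plan step for this example can be approximated with a small regular-expression parser. This sketch handles only the CREATE MODELS form shown above; the algorithm-to-model-type table and the fixed moving-point input type are assumptions, and `plan` is a hypothetical name rather than the system's parser.

```python
import re

QUERY = re.compile(
    r"CREATE\s+MODELS\s+(?P<table>\w+)\s+USING\s+(?P<alg>[\w-]+)\s+"
    r"FROM\s*\((?P<query>.*)\)\s*SET\s+(?P<params>.*)",
    re.IGNORECASE | re.DOTALL,
)

# Hypothetical mapping from algorithm to the model type it produces.
MODEL_TYPE = {"OPTICS": "cluster", "T-PATTERN": "T-pattern"}


def plan(statement: str) -> list[tuple]:
    """Turn a CREATE MODELS statement into the controller's sequence of actions."""
    m = QUERY.fullmatch(statement.strip())
    if m is None:
        raise ValueError("not a CREATE MODELS statement")
    params = {}
    # Naive split: a value that itself contains " AND " would break this.
    for assignment in re.split(r"\s+AND\s+", m["params"].strip(), flags=re.IGNORECASE):
        key, value = (s.strip() for s in assignment.split("=", 1))
        params[key.split(".", 1)[-1]] = value  # drop the "OPTICS." prefix
    alg = m["alg"].upper()
    return [
        ("Retrieve", m["query"].strip()),
        ("Translate", "moving point"),  # input data type assumed
        ("Execute", alg, params),
        ("Translate", MODEL_TYPE.get(alg, "unknown")),
        ("Store", m["table"]),
    ]
```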
25. Algorithms Manager
This component is a plug-in module capable of managing different sets of libraries. Each library realizes a different set of operators according to the proposed Two-Worlds framework.
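A plug-in manager of this kind can be sketched as a registry keyed by library and operator name. The class and method names below are hypothetical, and operator names are matched case-insensitively as an assumption that mirrors the upper-case keywords in the DMQL examples.

```python
from typing import Callable


class AlgorithmsManager:
    """Plug-in registry: each library maps operator names to callables."""

    def __init__(self) -> None:
        self._libraries: dict[str, dict[str, Callable]] = {}

    def register(self, library: str, name: str, fn: Callable) -> None:
        self._libraries.setdefault(library, {})[name.upper()] = fn

    def get(self, library: str, name: str) -> Callable:
        try:
            return self._libraries[library][name.upper()]
        except KeyError:
            raise LookupError(f"no operator {name!r} in library {library!r}") from None

    def libraries(self) -> list[str]:
        return sorted(self._libraries)
```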
26. Algorithms Libraries
- Data construction library: moving object reconstruction algorithm, spatial object builder algorithm, temporal object builder algorithm
- Model construction library: T-pattern algorithm, OPTICS algorithm, T-Flock algorithm
- Transformation library: resampling algorithm, intersection algorithm, object filtering, T-Anonymity algorithms
- Relation library: all the predicates
Example queries:
CREATE DATA MobilityData BUILDING MOVING_POINTS FROM (SELECT userid, lon, lat, datetime FROM MobilityRawData ORDER BY userid, datetime) SET MOVING_POINT.MAX_SPACE_GAP = 2000m AND MOVING_POINT.MAX_TIME_GAP = 1800 sec
CREATE MODELS Patterns USING T-PATTERN FROM (SELECT t.id, t.trajobj FROM Trajectories t) SET T-PATTERN.support = .02 AND T-PATTERN.time = 120 sec
CREATE TRANSFORMATION AnonimizedData USING NWA FROM (SELECT t.id, t.trajobj FROM Trajectories t) SET ANONYMIZATION.K = 10 AND ANONYMIZATION.TIME_SLOT = 600 sec
CREATE RELATION EntailmentTable USING ENTIAL FROM (SELECT t.id, t.trajobj, p.id, p.obj FROM Trajectories t, Patterns p)
30. Add-ons: Location Prediction
The goal is to construct a predictive model from the set of T-patterns extracted from a set of trajectories. Given a new trajectory, the predictive model can be used to predict its next location.
[Figure: trajectory dataset, local patterns, prediction tree]
CREATE TRANSFORMATION TPatternTree USING TPATTERN_TREE FROM (SELECT p.id, p.TpatternObj FROM PatternTable p)
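The prediction tree can be approximated by a prefix tree built over T-pattern region sequences. In this simplified, assumed version, each node accumulates the support of the patterns that continue to a given next region, and prediction uses the longest suffix of the trajectory found in the tree; the thesis's actual tree construction and scoring may differ, and `PredictionTree` is a hypothetical name.

```python
from collections import defaultdict


class PredictionTree:
    """Prefix tree over T-pattern region sequences; each node scores candidate next regions."""

    def __init__(self):
        self.children = {}                     # region -> PredictionTree
        self.next_scores = defaultdict(float)  # next region -> accumulated pattern support

    def insert(self, regions, support):
        node = self
        for i, region in enumerate(regions[:-1]):
            node = node.children.setdefault(region, PredictionTree())
            node.next_scores[regions[i + 1]] += support

    def predict(self, visited):
        """Next region for the longest suffix of `visited` found in the tree, or None."""
        for start in range(len(visited)):  # longest suffix first
            node = self
            for region in visited[start:]:
                node = node.children.get(region)
                if node is None:
                    break
            if node is not None and node.next_scores:
                return max(node.next_scores, key=node.next_scores.get)
        return None
```

Preferring the longest matching suffix uses as much of the trajectory's recent history as the extracted patterns can support before falling back to shorter contexts.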
31. Add-ons: K-Best Map Matching
A new way to perform map matching. In real cases the shortest-path assumption can be violated when external factors play a role (e.g., traffic congestion).
CREATE DATA K-MobilityData BUILDING K-MOVING_POINTS FROM (SELECT userid, lon, lat, datetime FROM MobilityRawData ORDER BY userid, datetime) SET K-MOVING_POINTS.K = 5 AND K-MOVING_POINTS.MAP = StreetMapFile.wkt
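K-best map matching keeps several candidate routes instead of only the shortest one. The sketch below covers only that ingredient: enumerating the K cheapest loop-free paths between two nodes of a small, hypothetical road graph. Reading K-MOVING_POINTS.K as this K is an assumption, and snapping GPS points to roads is not shown.

```python
import heapq


def k_shortest_simple_paths(graph: dict[str, list[tuple[str, float]]],
                            source: str, target: str, k: int) -> list[tuple[float, list[str]]]:
    """Up to k loop-free source->target paths in increasing cost order.
    Best-first search over partial paths: correct for non-negative weights,
    but exponential in the worst case, so only suitable for small graphs."""
    heap = [(0.0, [source])]  # ties on cost are broken by comparing node lists
    found = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            found.append((cost, path))
            continue
        for nxt, w in graph.get(node, []):
            if nxt not in path:  # keep paths loop-free
                heapq.heappush(heap, (cost + w, path + [nxt]))
    return found
```

For large street maps, an algorithm such as Yen's K shortest loopless paths would be the usual choice; the best-first version is kept here only for brevity.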
37. Demo
- GeoPKDD system: equipped with a very simple GUI that lets the user write DMQL queries and visualize the results.
- M-Atlas: the new generation of the GUI, in which DMQL is used to build complex analyses by creating scripts.
39. The definition of a DMQL that realizes the operators of the framework.