SlideShare a Scribd company logo
1 of 13
Download to read offline
How to search extracted data
© Copyright 2015 NowSecure, Inc.
Javier Collado
● It’s hard to decode data for each application with limited resources
○ There are a lot of applications
○ Each application version might change:
■ format (file type, database schema)
■ content (new and interesting data)
● Many applications store data in SQLite databases
Data extraction in mobile devices
© Copyright 2015 NowSecure, Inc.
● Libraries
○ Low level interface
○ Examples: lucene, xapian, whoosh
● Servers
○ High level interface
○ Examples: solr, elasticsearch, sphinx
Index and search
© Copyright 2015 NowSecure, Inc.
● Very flexible and permissive: each value has its own type
● Storage class: group of related datatypes (different lengths, encodings, …)
● Type affinity: preferred storage class for a column based on column type
● Not all the content should be indexed:
○ sqlite_master, sqlite_sequence
○ FTS tables
○ BLOBs
SQLite
© Copyright 2015 NowSecure, Inc.
sqlite> CREATE TABLE names (id INTEGER, name TEXT);
sqlite> INSERT INTO names VALUES (1, "Alice");
sqlite> INSERT INTO names VALUES ("Bob", 2);
sqlite> SELECT typeof(id), id, typeof(name), name FROM names;
integer|1|text|Alice
text|Bob|text|2
sqlite>
SQLite
© Copyright 2015 NowSecure, Inc.
sqlite> CREATE TABLE names (id INTEGER name TEXT);
sqlite> .schema names
CREATE TABLE names (id INTEGER name TEXT);
sqlite> INSERT INTO names VALUES (1, "Alice");
Error: table names has 1 columns but 2 values were supplied
SQLite
© Copyright 2015 NowSecure, Inc.
● Search server
● Document oriented (json)
● RESTful API
● Schema (mapping) not required, but needed to avoid errors due to SQLite flexibility
ElasticSearch
© Copyright 2015 NowSecure, Inc.
$ curl -XPOST 'http://localhost:9200/dfrws/names' -d '{id: 1, name: "Alice"}'
{"_index":"dfrws","_type":"names","_id":"AUxNeQ7-7Nsk22Tyod1W","_version":1,"created":true}
$ curl -XPOST 'http://localhost:9200/dfrws/names' -d '{id: "Bob", name: 2}'
{"error":"MapperParsingException[failed to parse [id]]; nested: NumberFormatException[For input string: "
Bob"]; ","status":400}
$ curl -XGET 'http://localhost:9200/dfrws/_mapping/names'
{"dfrws":{"mappings":{"names":{"properties":{"id":{"type":"long"},"name":{"type":"string"}}}}}}
ElasticSearch
© Copyright 2015 NowSecure, Inc.
$ curl -XPOST 'http://localhost:9200/dfrws/_names' -d '{id: 1, name: "Alice"}'
{"error":"InvalidTypeNameException[mapping type name [_names] can't start with '_']","status":400}
$ curl -XGET 'http://localhost:9200/dfrws/names/_search' -d '{query: {match: {name: "Alice"}}}'
{"took":27,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":
0.30685282,"hits":[{"_index":"dfrws","_type":"names","_id":"AUxNeQ7-7Nsk22Tyod1W","_score":0.30685282,"
_source":{id: 1, name: "Alice"}}]}}
ElasticSearch
© Copyright 2015 NowSecure, Inc.
● https://github.com/jcollado/esis
● Command line tool written in python
○ Ability to index every row in every table in every database file found under a given directory
○ Ability to search using simple queries
Example tool
© Copyright 2015 NowSecure, Inc.
● SQLite content can be indexed in elasticsearch but…
○ Types need to be consistent
○ Not relevant information needs to be discarded
Conclusions
© Copyright 2015 NowSecure, Inc.
● Index text information from other file types (Apache Tika)
● Regular expressions
● Highlight search results
● Search suggestions
● Language detection and custom analyzers
● Proximity matching (match vs. match_phrase)
Future work
© Copyright 2015 NowSecure, Inc.
© Copyright 2015 NowSecure, Inc.
Thanks

More Related Content

What's hot

SXML: S-expression eXtensible Markup Language
SXML: S-expression eXtensible Markup LanguageSXML: S-expression eXtensible Markup Language
SXML: S-expression eXtensible Markup Language
elliando dias
 
DBIx::Class walkthrough @ bangalore pm
DBIx::Class walkthrough @ bangalore pmDBIx::Class walkthrough @ bangalore pm
DBIx::Class walkthrough @ bangalore pm
Sheeju Alex
 

What's hot (20)

SXML: S-expression eXtensible Markup Language
SXML: S-expression eXtensible Markup LanguageSXML: S-expression eXtensible Markup Language
SXML: S-expression eXtensible Markup Language
 
Object Storage
Object StorageObject Storage
Object Storage
 
JS App Architecture
JS App ArchitectureJS App Architecture
JS App Architecture
 
DBIx::Class walkthrough @ bangalore pm
DBIx::Class walkthrough @ bangalore pmDBIx::Class walkthrough @ bangalore pm
DBIx::Class walkthrough @ bangalore pm
 
Introduction to CouchDB - LA Hacker News
Introduction to CouchDB - LA Hacker NewsIntroduction to CouchDB - LA Hacker News
Introduction to CouchDB - LA Hacker News
 
Hands On Spring Data
Hands On Spring DataHands On Spring Data
Hands On Spring Data
 
NoSQL Overview
NoSQL OverviewNoSQL Overview
NoSQL Overview
 
Data-Defined Typed Schema Generation in Accumulo
Data-Defined Typed Schema Generation in AccumuloData-Defined Typed Schema Generation in Accumulo
Data-Defined Typed Schema Generation in Accumulo
 
Json Persistence Framework
Json Persistence FrameworkJson Persistence Framework
Json Persistence Framework
 
An Evening with MongoDB - Orlando: Welcome and Keynote
An Evening with MongoDB - Orlando: Welcome and KeynoteAn Evening with MongoDB - Orlando: Welcome and Keynote
An Evening with MongoDB - Orlando: Welcome and Keynote
 
Mysql DBI
Mysql DBIMysql DBI
Mysql DBI
 
MongoDB
MongoDBMongoDB
MongoDB
 
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorialsMongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
 
Mongo DB
Mongo DB Mongo DB
Mongo DB
 
Php 2
Php 2Php 2
Php 2
 
Python Files
Python FilesPython Files
Python Files
 
Introduction to mongo db
Introduction to mongo dbIntroduction to mongo db
Introduction to mongo db
 
Introduction to mongo db
Introduction to mongo dbIntroduction to mongo db
Introduction to mongo db
 
PHP Training Session 6
PHP Training Session 6PHP Training Session 6
PHP Training Session 6
 
Azure Table Storage: The Good, the Bad, the Ugly (10 min. lightning talk)
Azure Table Storage: The Good, the Bad, the Ugly (10 min. lightning talk)Azure Table Storage: The Good, the Bad, the Ugly (10 min. lightning talk)
Azure Table Storage: The Good, the Bad, the Ugly (10 min. lightning talk)
 

Viewers also liked (9)

Class Inventions New
Class Inventions NewClass Inventions New
Class Inventions New
 
Famiglia Lavoro
Famiglia LavoroFamiglia Lavoro
Famiglia Lavoro
 
Bdi Linkedin
Bdi LinkedinBdi Linkedin
Bdi Linkedin
 
Invest.Acktual English April 2309 V 2.0
Invest.Acktual English April 2309 V 2.0Invest.Acktual English April 2309 V 2.0
Invest.Acktual English April 2309 V 2.0
 
Essere Madre, Essere Padre oggi
Essere Madre, Essere Padre oggiEssere Madre, Essere Padre oggi
Essere Madre, Essere Padre oggi
 
Heart diseases
Heart diseasesHeart diseases
Heart diseases
 
The gorgeous pearl
The gorgeous pearlThe gorgeous pearl
The gorgeous pearl
 
Pictures Of Zanzibar
Pictures Of ZanzibarPictures Of Zanzibar
Pictures Of Zanzibar
 
Zanzibar, The old days
Zanzibar, The old daysZanzibar, The old days
Zanzibar, The old days
 

Similar to How to search extracted data

Building Highly Flexible, High Performance Query Engines
Building Highly Flexible, High Performance Query EnginesBuilding Highly Flexible, High Performance Query Engines
Building Highly Flexible, High Performance Query Engines
MapR Technologies
 
How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life
琛琳 饶
 
MySQL Day Paris 2018 - MySQL JSON Document Store
MySQL Day Paris 2018 - MySQL JSON Document StoreMySQL Day Paris 2018 - MySQL JSON Document Store
MySQL Day Paris 2018 - MySQL JSON Document Store
Olivier DASINI
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
Erik Hatcher
 

Similar to How to search extracted data (20)

Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1
 
Managing Your Security Logs with Elasticsearch
Managing Your Security Logs with ElasticsearchManaging Your Security Logs with Elasticsearch
Managing Your Security Logs with Elasticsearch
 
OrientDB introduction - NoSQL
OrientDB introduction - NoSQLOrientDB introduction - NoSQL
OrientDB introduction - NoSQL
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)
 
Simon Elliston Ball – When to NoSQL and When to Know SQL - NoSQL matters Barc...
Simon Elliston Ball – When to NoSQL and When to Know SQL - NoSQL matters Barc...Simon Elliston Ball – When to NoSQL and When to Know SQL - NoSQL matters Barc...
Simon Elliston Ball – When to NoSQL and When to Know SQL - NoSQL matters Barc...
 
iOS & Drupal
iOS & DrupaliOS & Drupal
iOS & Drupal
 
Building Highly Flexible, High Performance Query Engines
Building Highly Flexible, High Performance Query EnginesBuilding Highly Flexible, High Performance Query Engines
Building Highly Flexible, High Performance Query Engines
 
Building APIs in an easy way using API Platform
Building APIs in an easy way using API PlatformBuilding APIs in an easy way using API Platform
Building APIs in an easy way using API Platform
 
How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life
 
REST easy with API Platform
REST easy with API PlatformREST easy with API Platform
REST easy with API Platform
 
War of the Indices- SQL vs. Oracle
War of the Indices-  SQL vs. OracleWar of the Indices-  SQL vs. Oracle
War of the Indices- SQL vs. Oracle
 
Elastic search intro-@lamper
Elastic search intro-@lamperElastic search intro-@lamper
Elastic search intro-@lamper
 
Json to hive_schema_generator
Json to hive_schema_generatorJson to hive_schema_generator
Json to hive_schema_generator
 
曾勇 Elastic search-intro
曾勇 Elastic search-intro曾勇 Elastic search-intro
曾勇 Elastic search-intro
 
RESTFul development with Apache sling
RESTFul development with Apache slingRESTFul development with Apache sling
RESTFul development with Apache sling
 
Discover the Power of the NoSQL + SQL with MySQL
Discover the Power of the NoSQL + SQL with MySQLDiscover the Power of the NoSQL + SQL with MySQL
Discover the Power of the NoSQL + SQL with MySQL
 
Discover The Power of NoSQL + MySQL with MySQL
Discover The Power of NoSQL + MySQL with MySQLDiscover The Power of NoSQL + MySQL with MySQL
Discover The Power of NoSQL + MySQL with MySQL
 
MySQL Day Paris 2018 - MySQL JSON Document Store
MySQL Day Paris 2018 - MySQL JSON Document StoreMySQL Day Paris 2018 - MySQL JSON Document Store
MySQL Day Paris 2018 - MySQL JSON Document Store
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
ElasticSearch - index server used as a document database
ElasticSearch - index server used as a document databaseElasticSearch - index server used as a document database
ElasticSearch - index server used as a document database
 

Recently uploaded

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 

How to search extracted data

  • 1. How to search extracted data © Copyright 2015 NowSecure, Inc. Javier Collado
  • 2. ● It’s hard to decode data for each application with limited resources ○ There are a lot of applications ○ Each application version might change: ■ format (file type, database schema) ■ content (new and interesting data) ● Many applications store data in SQLite databases Data extraction in mobile devices © Copyright 2015 NowSecure, Inc.
  • 3. ● Libraries ○ Low level interface ○ Examples: lucene, xapian, whoosh ● Servers ○ High level interface ○ Examples: solr, elasticsearch, sphinx Index and search © Copyright 2015 NowSecure, Inc.
  • 4. ● Very flexible and permissive: each value has its own type ● Storage class: group of related datatypes (different lengths, encodings, …) ● Type affinity: preferred storage class for a column based on column type ● Not all the content should be indexed: ○ sqlite_master, sqlite_sequence ○ FTS tables ○ BLOBs SQLite © Copyright 2015 NowSecure, Inc.
  • 5. sqlite> CREATE TABLE names (id INTEGER, name TEXT); sqlite> INSERT INTO names VALUES (1, "Alice"); sqlite> INSERT INTO names VALUES ("Bob", 2); sqlite> SELECT typeof(id), id, typeof(name), name FROM names; integer|1|text|Alice text|Bob|text|2 sqlite> SQLite © Copyright 2015 NowSecure, Inc.
  • 6. sqlite> CREATE TABLE names (id INTEGER name TEXT); sqlite> .schema names CREATE TABLE names (id INTEGER name TEXT); sqlite> INSERT INTO names VALUES (1, "Alice"); Error: table names has 1 columns but 2 values were supplied SQLite © Copyright 2015 NowSecure, Inc.
  • 7. ● Search server ● Document oriented (json) ● RESTful API ● Schema (mapping) not required, but needed to avoid errors due to SQLite flexibility ElasticSearch © Copyright 2015 NowSecure, Inc.
  • 8. $ curl -XPOST 'http://localhost:9200/dfrws/names' -d '{id: 1, name: "Alice"}' {"_index":"dfrws","_type":"names","_id":"AUxNeQ7-7Nsk22Tyod1W","_version":1,"created":true} $ curl -XPOST 'http://localhost:9200/dfrws/names' -d '{id: "Bob", name: 2}' {"error":"MapperParsingException[failed to parse [id]]; nested: NumberFormatException[For input string: " Bob"]; ","status":400} $ curl -XGET 'http://localhost:9200/dfrws/_mapping/names' {"dfrws":{"mappings":{"names":{"properties":{"id":{"type":"long"},"name":{"type":"string"}}}}}} ElasticSearch © Copyright 2015 NowSecure, Inc.
  • 9. $ curl -XPOST 'http://localhost:9200/dfrws/_names' -d '{id: 1, name: "Alice"}' {"error":"InvalidTypeNameException[mapping type name [_names] can't start with '_']","status":400} $ curl -XGET 'http://localhost:9200/dfrws/names/_search' -d '{query: {match: {name: "Alice"}}}' {"took":27,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score": 0.30685282,"hits":[{"_index":"dfrws","_type":"names","_id":"AUxNeQ7-7Nsk22Tyod1W","_score":0.30685282," _source":{id: 1, name: "Alice"}}]}} ElasticSearch © Copyright 2015 NowSecure, Inc.
  • 10. ● https://github.com/jcollado/esis ● Command line tool written in python ○ Ability to index every row in every table in every database file found under a given directory ○ Ability to search using simple queries Example tool © Copyright 2015 NowSecure, Inc.
  • 11. ● SQLite content can be indexed in elasticsearch but… ○ Types need to be consistent ○ Not relevant information needs to be discarded Conclusions © Copyright 2015 NowSecure, Inc.
  • 12. ● Index text information from other file types (Apache Tika) ● Regular expressions ● Highlight search results ● Search suggestions ● Language detection and custom analyzers ● Proximity matching (match vs. match_phrase) Future work © Copyright 2015 NowSecure, Inc.
  • 13. © Copyright 2015 NowSecure, Inc. Thanks