SlideShare a Scribd company logo
1 of 24
Download to read offline
Crime Analytics: Analysis of Crimes
Through Newspaper Articles
Indika Perera
Isuru Jayaweera
Chamath Sajeewa
Sampath Liyanage
Tharindu Wijewardane
Department of Computer Science and Engineering
Faculty of Engineering
University of Moratuwa
Sri Lanka
Adeesha Wijayasiri
Department of Computer and Information Science and Engineering
University of Florida
United States
Introduction
Why Crime Analysis?
• To identify general and specific crime trends, patterns and
series in an ongoing, timely manner.
• To utilize limited law enforcement resources.
• To be proactive in detecting and preventing crimes.
• To meet the law enforcement needs of changing society.
A major challenge faced by most of the law enforcement and intelligence
organizations is efficiently and accurately analyzing the growing volumes of crime
related data. The vast geographical diversity and the complexity of crime patterns
have made the analyzing and recording of crime data more difficult. Data mining is a
powerful tool that can be used effectively for analyzing large databases and deriving
important analytical results.
Crime Analysis Approaches Using in
SriLanka
• Police department uses manual crime recording and
analysis system.
• There is no free and open accessible crime analysis
system in Sri Lanka.
 Grave Crime Records
 Minor Crime Records
Usefulness of Crime Analysis System
● Police Department can use the system when they create
security plans.
● Police department can evaluate their existing plans.
● Investors can use the system when they want to find suitable
areas for investments.
● Tourists and tourist agents can use the system when they are
planning their tours.
Related Work
Crime Data Mining Techniques
• Entity extraction
• Association rule mining
• Deviation detection
• Classification
Existing systems that use data mining techniques for
crime investigation
• Regional crime analysis program
• Link analysis concepts
• Data mining framework for crime pattern identification
• Narcotics network in Tucson police department
• Clustering techniques
• Sequential pattern mining
• String comparison
Crime Analysis Steps
• Hotspot Detection
• Crime Pattern Visualization
• Nearest Police Station Detection
A framework has been proposed which includes relationships between the crime data
mining techniques and crime type characteristics. Framework has been developed by
using Tucson Police Department crime classification database. Using this framework,
investigator can determine the most suitable data mining technique for his/her task. As
given by the proposed framework, investigators can use neural network techniques in
crime entity extraction/ prediction, clustering techniques are effective in crime association/
prediction, and social network analysis can facilitate crime association/pattern
visualization.
• Crime Comparison
• Crime Clock
• Outbreaks Detection
• Web Crawling – multi threaded crawlers, preferential crawlers,
focused crawlers
• Document Classification – stop words removal,
lemmatization/stemming, tf-idf, syntactic and semantic
arrangement of words, support vector machines, word net/ ontology,
different error costs, sampling techniques, cross validation
• Entity Extraction – sentence splitting, tokenizing, POS tagging,
supervised/ semi- supervised/ unsupervised entity extraction
• Duplicate Detection – near duplicate detection, finger prints
(shingles, simhash), hamming distance
Preliminaries
The Proposed System
 This paper presents a web based crime analysis system.
 Sri Lankan English newspapers (Daily Mirror, The Island,
and Ceylon Today) are used as the source for details of
crime incidents.
 Newspaper articles are crawled using a focused crawler
and then they are classified.
The Proposed System
 Required entities are extracted from classified crime
articles and duplicate detection is performed.
 By using these preprocessed data, crime analysis
operations are performed and results are displayed
using a web based GUI.
 Unlike most systems, this system is open to anyone who
is interested in crime analysis.
The Proposed System (cont.)
Web Crawling
Crime Analysis and Prediction
Document Classification Entity Extraction Duplicate Detection
Our Solution
• Crawler – crawler4j, Jsoup, cookie handler
• Document Classifier – Weka, LibSVM, SMOTE, Different
Error Costs
• Entity Extractor – GATE, ANNIE, Stanford NLP, Google
Map API
• Duplicate Detector – 64 bit simhash calculator,
murmur hash calculator, hamming distances
• Web Interface – HighCharts, Java scripts, AJAX
Implementation Details
Results
Crime Hot Spot Analysis
Crime Comparison
Crime Pattern Visualization
Conclusion
• The proposed system performs crime analysis
operations such as hotspot detection, crime comparison
and crime pattern visualization.
• Graphical user interface of the system uses graphs and
diagrams to display the results which make crime
analysis a very simple task.
• Therefore law enforcement officers and other
interested users will be able to use this system
effectively and efficiently for crime analysis processes.
• Also this is a publicly accessible system, so that anyone
who is interested in this area will be able to use this
system freely.
Future Work
• Crime prediction is expected to be implemented in future
to enhance the functionality of the system.
• Comprehensiveness of the news article collection can be
further improved by extending the news article crawler to
crawl more news websites.
• Linguistic knowledge (WordNet, Ontology, etc.) can be
incorporated with the document classifier module in
order to improve the accuracy of the classification
process.
• Entity extraction module can be improved by
incorporating more rules which will improve accuracy
and comprehensiveness of the entity extraction process.
Questions?
Thank You!

More Related Content

What's hot

Crime Pattern Detection using K-Means Clustering
Crime Pattern Detection using K-Means ClusteringCrime Pattern Detection using K-Means Clustering
Crime Pattern Detection using K-Means ClusteringReuben George
 
Crime Data Analysis, Visualization and Prediction using Data Mining
Crime Data Analysis, Visualization and Prediction using Data MiningCrime Data Analysis, Visualization and Prediction using Data Mining
Crime Data Analysis, Visualization and Prediction using Data MiningAnavadya Shibu
 
06 analysis of crime
06 analysis of crime06 analysis of crime
06 analysis of crimeJim Gilmer
 
Predictive Policing - How Emerging Technologies Are Helping Prevent Crimes?
Predictive Policing - How Emerging Technologies Are Helping Prevent Crimes?Predictive Policing - How Emerging Technologies Are Helping Prevent Crimes?
Predictive Policing - How Emerging Technologies Are Helping Prevent Crimes?Sunil Jagani
 
04 Evidence Collection and Data Seizure - Notes
04 Evidence Collection and Data Seizure - Notes04 Evidence Collection and Data Seizure - Notes
04 Evidence Collection and Data Seizure - NotesKranthi
 
Crime Analysis & Prediction System
Crime Analysis & Prediction SystemCrime Analysis & Prediction System
Crime Analysis & Prediction SystemBigDataCloud
 
Computer forensics and its role
Computer forensics and its roleComputer forensics and its role
Computer forensics and its roleSudeshna Basak
 
Digital Image Forensics: camera fingerprint and its robustness
Digital Image Forensics: camera fingerprint and its robustness Digital Image Forensics: camera fingerprint and its robustness
Digital Image Forensics: camera fingerprint and its robustness Francesco Forestieri
 
Chapter 3 - Updated
Chapter 3 - UpdatedChapter 3 - Updated
Chapter 3 - Updatedglickauf
 
Crime sensing with big data - Singapore perspective
Crime sensing with big data - Singapore perspectiveCrime sensing with big data - Singapore perspective
Crime sensing with big data - Singapore perspectiveBenjamin Ang
 
Cybercrime And Cyber forensics
Cybercrime And  Cyber forensics Cybercrime And  Cyber forensics
Cybercrime And Cyber forensics sunanditaAnand
 
Criminology ppt by_waseem_i._khan
Criminology ppt by_waseem_i._khanCriminology ppt by_waseem_i._khan
Criminology ppt by_waseem_i._khanwaseemkhanpbn
 

What's hot (20)

Crime Pattern Detection using K-Means Clustering
Crime Pattern Detection using K-Means ClusteringCrime Pattern Detection using K-Means Clustering
Crime Pattern Detection using K-Means Clustering
 
Crime Data Analysis, Visualization and Prediction using Data Mining
Crime Data Analysis, Visualization and Prediction using Data MiningCrime Data Analysis, Visualization and Prediction using Data Mining
Crime Data Analysis, Visualization and Prediction using Data Mining
 
06 analysis of crime
06 analysis of crime06 analysis of crime
06 analysis of crime
 
Predictive Policing - How Emerging Technologies Are Helping Prevent Crimes?
Predictive Policing - How Emerging Technologies Are Helping Prevent Crimes?Predictive Policing - How Emerging Technologies Are Helping Prevent Crimes?
Predictive Policing - How Emerging Technologies Are Helping Prevent Crimes?
 
04 Evidence Collection and Data Seizure - Notes
04 Evidence Collection and Data Seizure - Notes04 Evidence Collection and Data Seizure - Notes
04 Evidence Collection and Data Seizure - Notes
 
Crime Analysis & Prediction System
Crime Analysis & Prediction SystemCrime Analysis & Prediction System
Crime Analysis & Prediction System
 
Criminal Investigations (Part One)
Criminal Investigations (Part One)Criminal Investigations (Part One)
Criminal Investigations (Part One)
 
Computer forensics and its role
Computer forensics and its roleComputer forensics and its role
Computer forensics and its role
 
Digital Image Forensics: camera fingerprint and its robustness
Digital Image Forensics: camera fingerprint and its robustness Digital Image Forensics: camera fingerprint and its robustness
Digital Image Forensics: camera fingerprint and its robustness
 
Cybercrime investigation
Cybercrime investigationCybercrime investigation
Cybercrime investigation
 
Chapter 3 - Updated
Chapter 3 - UpdatedChapter 3 - Updated
Chapter 3 - Updated
 
CS6004 Cyber Forensics
CS6004 Cyber ForensicsCS6004 Cyber Forensics
CS6004 Cyber Forensics
 
Crime sensing with big data - Singapore perspective
Crime sensing with big data - Singapore perspectiveCrime sensing with big data - Singapore perspective
Crime sensing with big data - Singapore perspective
 
First Responder Officer in Cyber Crime
First Responder Officer in Cyber CrimeFirst Responder Officer in Cyber Crime
First Responder Officer in Cyber Crime
 
Predictive Policing
Predictive PolicingPredictive Policing
Predictive Policing
 
Digital Forensics
Digital ForensicsDigital Forensics
Digital Forensics
 
Cybercrime And Cyber forensics
Cybercrime And  Cyber forensics Cybercrime And  Cyber forensics
Cybercrime And Cyber forensics
 
Criminology ppt by_waseem_i._khan
Criminology ppt by_waseem_i._khanCriminology ppt by_waseem_i._khan
Criminology ppt by_waseem_i._khan
 
Criminology powerpoint one
Criminology powerpoint oneCriminology powerpoint one
Criminology powerpoint one
 
Community Policing
Community PolicingCommunity Policing
Community Policing
 

Viewers also liked

2014 Chicago Crime Data Analysis
2014 Chicago Crime Data Analysis 2014 Chicago Crime Data Analysis
2014 Chicago Crime Data Analysis Yawen Li
 
Fundamentalsof Crime Mapping Tactical Analysis Concepts
Fundamentalsof Crime Mapping Tactical Analysis ConceptsFundamentalsof Crime Mapping Tactical Analysis Concepts
Fundamentalsof Crime Mapping Tactical Analysis ConceptsOsokop
 
Prevent the crime, don't just record it
Prevent the crime, don't just record itPrevent the crime, don't just record it
Prevent the crime, don't just record itVideoIQ
 
Micro processor programs
Micro processor programsMicro processor programs
Micro processor programssantosh kumar
 
Cloud GIS for Crime Mapping
Cloud GIS for Crime Mapping Cloud GIS for Crime Mapping
Cloud GIS for Crime Mapping IJORCS
 
WSO2 Big Data Platform and Applications
WSO2 Big Data Platform and ApplicationsWSO2 Big Data Platform and Applications
WSO2 Big Data Platform and ApplicationsSrinath Perera
 
Molecular forensics 2010
Molecular forensics 2010Molecular forensics 2010
Molecular forensics 2010Bruno Mmassy
 
Fundamentalsof Crime Mapping 6
Fundamentalsof Crime Mapping 6Fundamentalsof Crime Mapping 6
Fundamentalsof Crime Mapping 6Osokop
 
API Strategies for Big Data - If Data Were Oil
API Strategies for Big Data - If Data Were OilAPI Strategies for Big Data - If Data Were Oil
API Strategies for Big Data - If Data Were OilDrew Bartkiewicz
 
API and Big Data Solution Patterns
API and Big Data Solution Patterns API and Big Data Solution Patterns
API and Big Data Solution Patterns WSO2
 
Big data應用讓企業獲利翻倍
Big data應用讓企業獲利翻倍Big data應用讓企業獲利翻倍
Big data應用讓企業獲利翻倍Weng Wallace
 

Viewers also liked (15)

2014 Chicago Crime Data Analysis
2014 Chicago Crime Data Analysis 2014 Chicago Crime Data Analysis
2014 Chicago Crime Data Analysis
 
Fundamentalsof Crime Mapping Tactical Analysis Concepts
Fundamentalsof Crime Mapping Tactical Analysis ConceptsFundamentalsof Crime Mapping Tactical Analysis Concepts
Fundamentalsof Crime Mapping Tactical Analysis Concepts
 
Prevent the crime, don't just record it
Prevent the crime, don't just record itPrevent the crime, don't just record it
Prevent the crime, don't just record it
 
What is big data
What is big dataWhat is big data
What is big data
 
CCTNS Karnataka Overview
CCTNS Karnataka OverviewCCTNS Karnataka Overview
CCTNS Karnataka Overview
 
Micro processor programs
Micro processor programsMicro processor programs
Micro processor programs
 
Cloud GIS for Crime Mapping
Cloud GIS for Crime Mapping Cloud GIS for Crime Mapping
Cloud GIS for Crime Mapping
 
WSO2 Big Data Platform and Applications
WSO2 Big Data Platform and ApplicationsWSO2 Big Data Platform and Applications
WSO2 Big Data Platform and Applications
 
Molecular forensics 2010
Molecular forensics 2010Molecular forensics 2010
Molecular forensics 2010
 
Fundamentalsof Crime Mapping 6
Fundamentalsof Crime Mapping 6Fundamentalsof Crime Mapping 6
Fundamentalsof Crime Mapping 6
 
API Strategies for Big Data - If Data Were Oil
API Strategies for Big Data - If Data Were OilAPI Strategies for Big Data - If Data Were Oil
API Strategies for Big Data - If Data Were Oil
 
Process Maker Features
Process Maker FeaturesProcess Maker Features
Process Maker Features
 
CCTNS & Homeland Security
CCTNS & Homeland SecurityCCTNS & Homeland Security
CCTNS & Homeland Security
 
API and Big Data Solution Patterns
API and Big Data Solution Patterns API and Big Data Solution Patterns
API and Big Data Solution Patterns
 
Big data應用讓企業獲利翻倍
Big data應用讓企業獲利翻倍Big data應用讓企業獲利翻倍
Big data應用讓企業獲利翻倍
 

Similar to Crime Analytics: Analysis of crimes through news paper articles

Anomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live ImplementationAnomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live ImplementationImpetus Technologies
 
Crime forecasting system for soic final
Crime forecasting system for soic finalCrime forecasting system for soic final
Crime forecasting system for soic finalTwaha Minai
 
Physical and Cyber Crime Detection using Digital Forensic Approach: A Complet...
Physical and Cyber Crime Detection using Digital Forensic Approach: A Complet...Physical and Cyber Crime Detection using Digital Forensic Approach: A Complet...
Physical and Cyber Crime Detection using Digital Forensic Approach: A Complet...IJARIIT
 
Predictive Modeling for Topographical Analysis of Crime Rate
Predictive Modeling for Topographical Analysis of Crime RatePredictive Modeling for Topographical Analysis of Crime Rate
Predictive Modeling for Topographical Analysis of Crime RateIRJET Journal
 
Predictive Modeling for Topographical Analysis of Crime Rate
Predictive Modeling for Topographical Analysis of Crime RatePredictive Modeling for Topographical Analysis of Crime Rate
Predictive Modeling for Topographical Analysis of Crime RateIRJET Journal
 
Crime rate analysis using k nn in python
Crime rate analysis using k nn in python Crime rate analysis using k nn in python
Crime rate analysis using k nn in python CloudTechnologies
 
9th may net sci presentation (1)
9th may net sci presentation (1)9th may net sci presentation (1)
9th may net sci presentation (1)Rajath Mahesh
 
Anomaly Detection and Spark Implementation - Meetup Presentation.pptx
Anomaly Detection and Spark Implementation - Meetup Presentation.pptxAnomaly Detection and Spark Implementation - Meetup Presentation.pptx
Anomaly Detection and Spark Implementation - Meetup Presentation.pptxImpetus Technologies
 
Crime analysis mapping, intrusion detection using data mining
Crime analysis mapping, intrusion detection using data miningCrime analysis mapping, intrusion detection using data mining
Crime analysis mapping, intrusion detection using data miningVenkat Projects
 
Crime analysis mapping, intrusion detection using data mining
Crime analysis mapping, intrusion detection using data miningCrime analysis mapping, intrusion detection using data mining
Crime analysis mapping, intrusion detection using data miningVenkat Projects
 
Hello Criminals! Meet Big Data: Preventing Crime in San Francisco by Predicti...
Hello Criminals! Meet Big Data: Preventing Crime in San Francisco by Predicti...Hello Criminals! Meet Big Data: Preventing Crime in San Francisco by Predicti...
Hello Criminals! Meet Big Data: Preventing Crime in San Francisco by Predicti...Tarun Amarnath
 
Propose Data Mining AR-GA Model to Advance Crime analysis
Propose Data Mining AR-GA Model to Advance Crime analysisPropose Data Mining AR-GA Model to Advance Crime analysis
Propose Data Mining AR-GA Model to Advance Crime analysisIOSR Journals
 
A predictive model for mapping crime using big data analytics
A predictive model for mapping crime using big data analyticsA predictive model for mapping crime using big data analytics
A predictive model for mapping crime using big data analyticseSAT Journals
 
Malware Classification and Analysis
Malware Classification and AnalysisMalware Classification and Analysis
Malware Classification and AnalysisPrashant Chopra
 

Similar to Crime Analytics: Analysis of crimes through news paper articles (20)

PPT.pptx
PPT.pptxPPT.pptx
PPT.pptx
 
Case.pptx
Case.pptxCase.pptx
Case.pptx
 
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live ImplementationAnomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation
 
Crime forecasting system for soic final
Crime forecasting system for soic finalCrime forecasting system for soic final
Crime forecasting system for soic final
 
Physical and Cyber Crime Detection using Digital Forensic Approach: A Complet...
Physical and Cyber Crime Detection using Digital Forensic Approach: A Complet...Physical and Cyber Crime Detection using Digital Forensic Approach: A Complet...
Physical and Cyber Crime Detection using Digital Forensic Approach: A Complet...
 
Predictive Modeling for Topographical Analysis of Crime Rate
Predictive Modeling for Topographical Analysis of Crime RatePredictive Modeling for Topographical Analysis of Crime Rate
Predictive Modeling for Topographical Analysis of Crime Rate
 
Predictive Modeling for Topographical Analysis of Crime Rate
Predictive Modeling for Topographical Analysis of Crime RatePredictive Modeling for Topographical Analysis of Crime Rate
Predictive Modeling for Topographical Analysis of Crime Rate
 
Crime rate analysis using k nn in python
Crime rate analysis using k nn in python Crime rate analysis using k nn in python
Crime rate analysis using k nn in python
 
9th may net sci presentation (1)
9th may net sci presentation (1)9th may net sci presentation (1)
9th may net sci presentation (1)
 
Netsci
NetsciNetsci
Netsci
 
Ijarcce 6
Ijarcce 6Ijarcce 6
Ijarcce 6
 
Anomaly Detection and Spark Implementation - Meetup Presentation.pptx
Anomaly Detection and Spark Implementation - Meetup Presentation.pptxAnomaly Detection and Spark Implementation - Meetup Presentation.pptx
Anomaly Detection and Spark Implementation - Meetup Presentation.pptx
 
Crime analysis mapping, intrusion detection using data mining
Crime analysis mapping, intrusion detection using data miningCrime analysis mapping, intrusion detection using data mining
Crime analysis mapping, intrusion detection using data mining
 
Crime analysis mapping, intrusion detection using data mining
Crime analysis mapping, intrusion detection using data miningCrime analysis mapping, intrusion detection using data mining
Crime analysis mapping, intrusion detection using data mining
 
CRIME.pptx
CRIME.pptxCRIME.pptx
CRIME.pptx
 
Fraud analysis
Fraud analysisFraud analysis
Fraud analysis
 
Hello Criminals! Meet Big Data: Preventing Crime in San Francisco by Predicti...
Hello Criminals! Meet Big Data: Preventing Crime in San Francisco by Predicti...Hello Criminals! Meet Big Data: Preventing Crime in San Francisco by Predicti...
Hello Criminals! Meet Big Data: Preventing Crime in San Francisco by Predicti...
 
Propose Data Mining AR-GA Model to Advance Crime analysis
Propose Data Mining AR-GA Model to Advance Crime analysisPropose Data Mining AR-GA Model to Advance Crime analysis
Propose Data Mining AR-GA Model to Advance Crime analysis
 
A predictive model for mapping crime using big data analytics
A predictive model for mapping crime using big data analyticsA predictive model for mapping crime using big data analytics
A predictive model for mapping crime using big data analytics
 
Malware Classification and Analysis
Malware Classification and AnalysisMalware Classification and Analysis
Malware Classification and Analysis
 

Recently uploaded

YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.JasonViviers2
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introductionsanjaymuralee1
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptaigil2
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxVenkatasubramani13
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionajayrajaganeshkayala
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerPavel Šabatka
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Vladislav Solodkiy
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityAggregage
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructuresonikadigital1
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationGiorgio Carbone
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxDwiAyuSitiHartinah
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Guido X Jansen
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?sonikadigital1
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best PracticesDataArchiva
 
SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024Becky Burwell
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...PrithaVashisht1
 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)Data & Analytics Magazin
 

Recently uploaded (17)

YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introduction
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .ppt
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptx
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual intervention
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayer
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructure
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - Presentation
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices
 
SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...
 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)
 

Crime Analytics: Analysis of crimes through news paper articles

  • 1. Crime Analytics: Analysis of Crimes Through Newspaper Articles Indika Perera Isuru Jayaweera Chamath Sajeewa Sampath Liyanage Tharindu Wijewardane Department of Computer Science and Engineering Faculty of Engineering University of Moratuwa Sri Lanka Adeesha Wijayasiri Department of Computer and Information Science and Engineering University of Florida United States
  • 3. Why Crime Analysis? • To identify general and specific crime trends, patterns and series in an ongoing, timely manner. • To utilize limited law enforcement resources. • To be proactive in detecting and preventing crimes. • To meet the law enforcement needs of changing society. A major challenge faced by most of the law enforcement and intelligence organizations is efficiently and accurately analyzing the growing volumes of crime related data. The vast geographical diversity and the complexity of crime patterns have made the analyzing and recording of crime data more difficult. Data mining is a powerful tool that can be used effectively for analyzing large databases and deriving important analytical results.
  • 4. Crime Analysis Approaches Using in SriLanka • Police department uses manual crime recording and analysis system. • There is no free and open accessible crime analysis system in Sri Lanka.  Grave Crime Records  Minor Crime Records
  • 5. Usefulness of Crime Analysis System ● Police Department can use the system when they create security plans. ● Police department can evaluate their existing plans. ● Investors can use the system when they want to find suitable areas for investments. ● Tourists and tourist agents can use the system when they are planning their tours.
  • 7. Crime Data Mining Techniques • Entity extraction • Association rule mining • Deviation detection • Classification Existing systems that use data mining techniques for crime investigation • Regional crime analysis program • Link analysis concepts • Data mining framework for crime pattern identification • Narcotics network in Tucson police department • Clustering techniques • Sequential pattern mining • String comparison
  • 8. Crime Analysis Steps • Hotspot Detection • Crime Pattern Visualization • Nearest Police Station Detection A framework has been proposed which includes relationships between the crime data mining techniques and crime type characteristics. Framework has been developed by using Tucson Police Department crime classification database. Using this framework, investigator can determine the most suitable data mining technique for his/her task. As given by the proposed framework, investigators can use neural network techniques in crime entity extraction/ prediction, clustering techniques are effective in crime association/ prediction, and social network analysis can facilitate crime association/pattern visualization. • Crime Comparison • Crime Clock • Outbreaks Detection
  • 9. • Web Crawling – multi threaded crawlers, preferential crawlers, focused crawlers • Document Classification – stop words removal, lemmatization/stemming, tf-idf, syntactic and semantic arrangement of words, support vector machines, word net/ ontology, different error costs, sampling techniques, cross validation • Entity Extraction – sentence splitting, tokenizing, POS tagging, supervised/ semi- supervised/ unsupervised entity extraction • Duplicate Detection – near duplicate detection, finger prints (shingles, simhash), hamming distance Preliminaries
  • 11.  This paper presents a web based crime analysis system.  Sri Lankan English newspapers (Daily Mirror, The Island, and Ceylon Today) are used as the source for details of crime incidents.  Newspaper articles are crawled using a focused crawler and then they are classified. The Proposed System
  • 12.  Required entities are extracted from classified crime articles and duplicate detection is performed.  By using these preprocessed data, crime analysis operations are performed and results are displayed using a web based GUI.  Unlike most systems, this system is open to anyone who is interested in crime analysis. The Proposed System (cont.)
  • 13. Web Crawling Crime Analysis and Prediction Document Classification Entity Extraction Duplicate Detection Our Solution
  • 14. • Crawler – crawler4j, Jsoup, cookie handler • Document Classifier – Weka, LibSVM, SMOTE, Different Error Costs • Entity Extractor – GATE, ANNIE, Stanford NLP, Google Map API • Duplicate Detector – 64 bit simhash calculator, murmur hash calculator, hamming distances • Web Interface – HighCharts, Java scripts, AJAX Implementation Details
  • 16. Crime Hot Spot Analysis
  • 20. • The proposed system performs crime analysis operations such as hotspot detection, crime comparison and crime pattern visualization. • Graphical user interface of the system uses graphs and diagrams to display the results which make crime analysis a very simple task. • Therefore law enforcement officers and other interested users will be able to use this system effectively and efficiently for crime analysis processes. • Also this is a publicly accessible system, so that anyone who is interested in this area will be able to use this system freely.
  • 22. • Crime prediction is expected to be implemented in future to enhance the functionality of the system. • Comprehensiveness of the news article collection can be further improved by extending the news article crawler to crawl more news websites. • Linguistic knowledge (WordNet, Ontology, etc.) can be incorporated with the document classifier module in order to improve the accuracy of the classification process. • Entity extraction module can be improved by incorporating more rules which will improve accuracy and comprehensiveness of the entity extraction process.

Editor's Notes

  1. Crime data mining is the application of data mining techniques for crime analysis, Data mining is a powerful tool that can be used effectively for analyzing large databases quickly and efficiently.
  2. Crime data mining is the application of data mining techniques for crime analysis, Data mining is a powerful tool that can be used effectively for analyzing large databases quickly and efficiently.
  3. Crime data mining is the application of data mining techniques for crime analysis, Data mining is a powerful tool that can be used effectively for analyzing large databases quickly and efficiently.
  4. Crime data mining is the application of data mining techniques for crime analysis, Data mining is a powerful tool that can be used effectively for analyzing large databases quickly and efficiently.
  5. In addition to these steps, [1] has given some other analysis steps such as crime clock, outbreaks detection and nearest police station detection. In crime pattern visualization, a time series can be drawn between the crime frequency and the time and using it interesting crime trends can be identified. An intelligent crime identification system is described in [6] which can be used to predict possible suspects for given crime. They have used five types of agents namely, message space agent, gateway agent, prisoner agent, criminal agent and evidence agent.
  6. Web crawler is an Internet bot. It systematically browses the World Wide Web typically for the purpose of web indexing [10]. Crawlers can be selective about the pages they fetch (Ex- crawl only the pages of selected newspaper sites) and are then referred to as preferential or heuristic-based crawlers [11]. Preferential crawlers built to retrieve pages within a certain topic are called topical or focused crawlers. To perform an effective document classification, syntactic (arrangement of words) and semantic (meaning of words) aspects of the natural language have to be addressed [13]. There are a lot of algorithms available for text classification and among them SVM (support vector machine) is the best [14]. The most commonly used approach for document classification is the utilization of SVM in combination with other linguistic related resources such as ontology or word net [15] [16]. Various approaches such as different error costs and sampling techniques [17] [18] have been proposed in order to handle this problem. This is the main step of transforming the unstructured data into the structured format. Extraction of named entities involves identification of small chunks of appropriate texts and classification of them into one of such predefined categories of interest [19]. In order to do a successful entity extraction there is some amount of preprocessing needs to be done on the unstructured data (i.e. sentence splitting, tokenizing, POS tagging, etc.) [20]. Duplicates (i.e. documents that are exact duplicates of each other due to mirroring and plagiarism) are easy to identify by standard check summing techniques. A more difficult problem is the identification of near-duplicate documents. Two such documents are identical in terms of content but differ in a small portion of the document [21]. Document similarities are measured usually in high dimensional vector space [22], but it is really computationally expensive. Fingerprints such as simhash and shingles are used to produce sketches of the documents to process them efficiently [21].
  7. Hotspot detection tool aids in identifying the areas with higher crime density than others so that the law enforcement resources can be utilized appropriately. For example the police department can utilize the police patrols in an efficient manner so that the patrol services in areas with high crime density can be increased and the patrol services in other areas can be decreased accordingly. Also travelers and tourists can identify regions which are more suitable for travel and visiting. In addition to this investors can get assistance to find out suitable areas for investments.
  8. Crime comparison tool helps an analyzer to compare the crime frequencies within a particular time period (year). This uses a pie chart to represent frequencies of crimes for each year. By using this tool law enforcement officers can understand clearly what type of crimes needs more attention, so that necessary actions can be taken.
  9. Crime pattern visualizer tool supports the police officers and crime analyzers to identify gradual changes of crime frequencies with respect to time so that they can change security arrangements accordingly. This has an option to generate a time series plot for each crime type separately. In addition it also has an option to generate time series plots for all the crime types in the same graph making the comparison between graphs easier. Law enforcement officers can use this tool to easily identify the trends involved in different types of crimes. Also if they have followed a security plan, they can evaluate the progress of it.