Analysis on crimes in Atlanta
Undergraduate Research Team
Georgia Institute of Technology
ISYE 4699
December 7 2014
1 Abstract
In this report, we present the conclusions we drew from the Georgia Tech
and Atlanta crime data. We were interested in three areas: patrol analysis,
hot spots, and correlations among crimes. In the patrol analysis, we studied
how current patrol routes could be changed. Improvements would result in
shorter arrival times for police officers, lower crime rates, and less
economic waste, among other benefits. A hot spot is an area where crimes are
concentrated. We developed an algorithm for locating hot spots in Atlanta.
The program still needs improvement, but once it locates hot spots more
accurately, we will be able to compare the hot spots of different crime
types and find overlaps among them. Finally, we studied how one crime led
to other crimes, focusing on the relationship between auto theft and
robbery.
2 Patrol Analysis
After we received the data from the Georgia Tech Police Department, we
filtered out unnecessary variables for our preliminary research. We were
left with the 18 most useful variables, and our research used this
information to come up with an improved solution. The variables are listed
below for reference:
• OCANumber
• IncidentFromDate
• IncidentFromTime
• IncidentToDate
• IncidentToTime
• OffenseCode
• Offense Description
• CaseStatus CaseDisposition
• LocationCode
• PatrolZone
• Location
• Landmark
• LocationStreetNumber
• LocationDirectional
• LocationStreet
• LocationLatitude
• LocationLongitude
• CreatedSource
2.1 Overview
A total of 11578 crimes were recorded at Georgia Tech and in nearby areas.
Looking at the distribution by time, day, and month, we found that crimes
occurred most often at around 1 am, and the frequency gradually dropped
until 6 am, when crime was least likely to occur. From 8 am to 11 pm, the
count varied smoothly between 400 and 700 crimes for each hour of the day.
April and September, which fall in the Spring and Fall semesters, had the
highest number of crimes during the year. Offense codes 2700, 3657, and
2751 topped the list of crimes. Approximately three-fourths of the cases
were closed or cleared, and most of the remaining cases were inactive.
Among the four patrol zones defined by the Georgia Tech police, Zone 2 was
found to be the most dangerous: it had almost twice as many crimes as any
other zone. A detailed analysis by crime type is given later.
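The hour-of-day distribution described above can be tabulated in a few
lines. The sketch below is only illustrative: it assumes that
IncidentFromTime is stored as an "HH:MM" string, which the report does not
state, and the sample values are made up.

```python
from collections import Counter

def incidents_by_hour(times):
    """Count incidents per hour of day from 'HH:MM' strings (assumed format)."""
    counts = Counter(int(t.split(":")[0]) for t in times)
    # Return all 24 hours so that quiet hours show up as zero.
    return [counts.get(hour, 0) for hour in range(24)]

# Made-up values standing in for the IncidentFromTime column.
sample_times = ["01:15", "01:40", "06:05", "13:30", "23:59", "01:02"]
hourly = incidents_by_hour(sample_times)

# Hour of day with the largest count.
peak_hour = max(range(24), key=lambda hour: hourly[hour])
```

The same counting pattern applies to the day-of-week and month distributions
once the date field is parsed.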
2.2 Urban Police Patrol Model
Initially, the police department calculated patrol efficiency by
considering only the patrol time, and even that calculation contained many
errors. Patrol time was computed as the difference between the total work
time of police officers and the time they spent on other duties, such as
answering radio calls, handling traffic, or having a meal. This calculation
gave inaccurate efficiency values because many police officers patrolled
during the particular period when crimes were most likely to happen. As a
result, there were too few officers to patrol at other times. The goal was
to have the right number of patrol officers.
To fix these problems, in the 1960s Dr. Richard Larson developed a
systematic approach to studying police patrol efficiency. He cooperated
with the NYC police department to develop the first version of the Urban
Police Patrol Model. He used the same 18 variables we listed above and came
up with the most "accurate" model. The key idea of Dr. Larson's model was
as follows: given the pattern of crimes and a limited amount of preventive
patrol, how should the effort be allocated along the streets to achieve the
highest efficiency? Influenced by his study, we decided to adopt his model
to analyze the Atlanta crime data and optimize the patrol resource
allocation.
Figure 1: Patrol Time = Total Time − Time for other duties
2.3 Further Questions
Dr. Larson's model gave a rough approximation of the behavior of police
preventive patrol. Qualitatively, Koopman's method suggested that the
patrol effort should grow as the logarithm of the crime density and further
advised that areas with a low likelihood of crime should not be patrolled
at all. The model required refinement before it could be implemented by the
police, and Dr. Larson raised a few more questions:
1. To what extent is an optimal patrol coverage function realizable?
2. How closely does a unit have to approach the optimal coverage in order
to achieve a satisfactory result? Or, equivalently, what is the sensitivity
of the solution about the optimum?
3. To what extent is the crime distribution modified by patrol strategies?
4. How should each crime type be evaluated to reflect its relative serious-
ness?
5. What is the conditional probability that a crime will be detected given
its pattern?
We believe that these questions contain valuable insights and will continue
our research to answer questions in these areas.
3 Georgia Tech Crime data
One interesting phenomenon was that while the crime rate of Atlanta kept
decreasing at a rate of 5.324%, the crime rate of Georgia Tech fluctuated
over the years. As the graphs show, the number of crimes was higher in 2011
than in 2010, decreased in 2012, and reached its peak in 2013. We were
unsure whether the crime rates actually changed or whether a change in how
crimes were classified at Georgia Tech was the reason. For example, some
crimes in 2010 were categorized differently in later years.
Figure 2: Georgia Tech Crime – fluctuates (number of crimes per year, 2010 to 2014)
Figure 3: Atlanta Crime – decreases (number of crimes per year, 2010 to 2014)
We also observed the crime patterns using time series. To find the
relationship between Atlanta crimes and Georgia Tech crimes, we compared
the annual data and found that the two crime counts did not follow a
recognizably similar pattern. In fact, there were some notable differences
in their patterns.
Figure 4: Monthly number of crimes for Georgia Tech and Atlanta
Georgia Tech had fewer crimes in the summer, whereas Atlanta had more.
Georgia Tech's summer crime rate was most likely low because most students
left the campus for vacation. However, we did not have an easy explanation
for why the Atlanta data showed an increase in summer. Even when we
averaged the rates over the years and plotted GT and Atlanta together, we
could see that Georgia Tech was more dangerous during the semesters, while
the opposite was true for Atlanta. It was also interesting that the crime
rate at Georgia Tech generally decreased between late August and December.
We came up with a few possible reasons for this phenomenon. First, most
freshmen arrived at the end of August each year, were not yet
safety-conscious, and were thus much more vulnerable to crime. Second,
September was the pledge month for fraternities and sororities. Students
were asked to do reckless things and were at risk of being targeted,
especially when they were drunk or walking outside late at night.
Figure 5: Average number of crimes for Georgia Tech and Atlanta
3.1 Geographical Relationship
We analyzed crimes geographically using offense codes and patrol zones.
This was easily done by building a pivot table and inspecting the results.
The table showed the overall trend of the data and suggested which other
techniques to apply for better results.
Our objective was to prove or disprove that there is a clear relationship
between the GTPD patrol zones (Zone 1 - 4) and the offense codes used by
the NCIC. If the process proved to be effective, we could then apply the
same procedure to the Atlanta crime data.
We used all of the GTPD data from 2010 to 2014. To leave out unnecessary
information, we took only two variables into account: Patrol Zone and
Offense Code. Both the offense code and the location were given for every
crime, so we had enough data for analysis. We set up Excel to give the
output in the following form: Z1 = [22 : 24, 23 : 325, 29 : 84, . . . ]
• The number before each colon was the first two digits of the offense code
• The number after each colon was the number of such incidents
• For example, there were 24 crimes that were coded "22"
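The tabulation above was done in Excel. As a rough equivalent, the
following Python sketch counts incidents per patrol zone and two-digit
offense prefix. The function name and the sample records are hypothetical,
and it assumes offense codes are four-digit numbers whose first two digits
identify the category.

```python
from collections import Counter, defaultdict

def pivot_by_zone(records):
    """Map each patrol zone to a Counter of two-digit offense-code prefixes."""
    table = defaultdict(Counter)
    for zone, offense_code in records:
        prefix = str(offense_code)[:2]  # first two digits of the offense code
        table[zone][prefix] += 1
    return table

# Made-up (PatrolZone, OffenseCode) pairs for illustration only.
records = [("Z1", 2299), ("Z1", 2301), ("Z1", 2305), ("Z2", 2901), ("Z1", 2202)]
table = pivot_by_zone(records)

# Same shape as the Z1 = [22 : 24, 23 : 325, ...] output described above.
z1 = sorted(table["Z1"].items())
```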
We examined the NCIC code list before we proceeded with the test.
• There were many different types of offense on the offense code list, but
we could categorize them nicely based on their first two numbers
• We excluded some offenses from the data because they were student
conduct cases, public order crimes, juvenile cases, invalid entries, or
trivial to the overall data
Using the filtered data set, we generated a pivot table.
Figure 6: The pivot table (Location versus type of crimes)
Based on the table, we found that Zone 2 had the largest number of crimes.
In particular, Zone 2 had the most assaults, burglaries, property damage
incidents, and stolen vehicles of all the zones. However, we concluded that
we could not infer more about the relationship between location and type of
crime at Georgia Tech; our approach would have worked better with more
data. We will apply this method to the Atlanta data later, since we believe
there will be enough data.
Crime Type        Most frequent (# of crimes)   2nd most frequent (# of crimes)
Assault           Zone 2 (94)                   Zone 3 (24)
Burglary          Zone 2 (79)                   Zone 4 (32)
Damage Property   Zone 2 (171)                  Zone 1 (84)
Stolen Vehicle    Zone 2 (43)                   Zone 1 (27)
Based on this approach, we could conclude that Zone 2 was the most
dangerous zone. It was difficult to find a relationship between types of
crimes and patrol zones because Zone 2 had so many more crimes than the
other zones, and there was not enough information about crimes in the other
zones. There were several possible explanations for the lack of data.
First, the Georgia Tech campus was considered safe and did not have many
crimes to record. Second, many of the recorded crimes were minor, and after
we filtered them out, we were left with only a small amount of data. Last,
there were not enough variables to take into account. There could have been
more significant factors that contributed to the result.
3.2 Questions
We have come up with some questions that need to be answered in order to
continue our research. We list them here:
1. How are the 4 zones divided? Can we have a detailed description
of where each zone is?
2. There are 4 zones within Georgia Tech, and there are 2 more zones: off
campus and SAV. What does SAV mean?
3. There were many incidents that counted as "minor" crimes. Are they
really insignificant enough to be excluded from our research, or should
we give more attention to them?
4 Atlanta Crime data
4.1 Time series and Seasonality
A time study of the crime data helped reveal crime patterns over time.
With the 2011-2014 crime data, we grouped the entries by date (occur_date)
and crime type (UC2 Literal). We counted each crime type on every reported
date and performed a time series analysis on the counts.
In the time series plot of total crimes per day, we could observe a rough
seasonal pattern. We were unsure whether this seasonal pattern was present
in all crime types or only in a few that drove the total crime count. To
find out, we split the data by crime type and performed the time series
analysis on each. Two crime types returned notable patterns: aggravated
assault (AGG_ASSULT) and larceny (LARCENY). We therefore decided to
investigate these crime types further. Below are their time series plots.
Moreover, using the additive single exponential method, we were able to
smooth the data and come up with cleaner diagrams:
The smoothed data plots showed us the trend of the crime data. The
frequencies of both aggravated assault and larceny tended to peak around
September and slowly dropped to a low in March.
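For reference, single exponential smoothing can be written as a short
recursion. This is a minimal sketch, not the software actually used for the
plots; the smoothing weight alpha and the sample counts are assumptions,
since the report does not state the values used.

```python
def single_exponential_smoothing(series, alpha):
    """Smooth a series with s[0] = x[0], s[t] = alpha*x[t] + (1-alpha)*s[t-1]."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]
    for value in series[1:]:
        # Newer observations get weight alpha; the running estimate keeps the rest.
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

# Made-up daily counts; alpha = 0.5 is an arbitrary illustrative choice.
daily_counts = [10, 14, 9, 20, 12]
smooth = single_exponential_smoothing(daily_counts, alpha=0.5)
```

A smaller alpha gives a flatter curve, which makes a slow seasonal swing
easier to see at the cost of lagging behind sudden changes.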
Using the Holt-Winters' method, we were able to weight the data points,
and we came up with a model that fit the current data. The diagram below
shows the result of applying the Holt-Winters' method to the larceny data.
Red points represent the smoothed data points of our model.
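The additive Holt-Winters' recursion can be sketched as follows. This is an
illustration under stated assumptions, not the exact procedure behind the
diagram: the season length m, the weights alpha, beta, and gamma, and the
simple initialization from the first two seasons are all choices made here
for the example.

```python
def holt_winters_additive(x, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters: returns one-step-ahead fits and a forecast."""
    if len(x) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    first = sum(x[:m]) / m
    second = sum(x[m:2 * m]) / m
    level = first                      # initial level: mean of first season
    trend = (second - first) / m       # initial trend: per-step change in means
    season = [x[i] - first for i in range(m)]  # initial seasonal offsets
    fitted = []
    for t, obs in enumerate(x):
        s = season[t % m]
        fitted.append(level + trend + s)       # prediction made before seeing obs
        new_level = alpha * (obs - s) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[t % m] = gamma * (obs - new_level) + (1 - gamma) * s
        level = new_level
    n = len(x)
    forecast = [level + (h + 1) * trend + season[(n + h) % m]
                for h in range(horizon)]
    return fitted, forecast

# Toy series with a period of 4 and no trend, for illustration only.
series = [1, 2, 3, 4] * 3
fitted, forecast = holt_winters_additive(series, m=4, alpha=0.5, beta=0.5,
                                         gamma=0.5, horizon=4)
```

On a perfectly periodic series like this one the forecast simply repeats the
seasonal pattern; on real daily counts the three weights would need to be
tuned.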
With the smoothed model, we were able to make a prediction on future
data points. We used this method on the larceny data points and made
a prediction on 100 more data points with a 95% prediction interval. The
residual plots are shown below.
In the following diagram, blue points represent the actual data, red
points show the smoothed data points (with lower weight on older data and
higher weight on more recent data), green points give the prediction for
the next 100 data points, and purple points mark the upper and lower bounds
of the 95% prediction interval for the green points. The residual analysis
plot of this method was as follows. The p-value of the Anderson-Darling
test was 0.009. Because this is below the usual 0.05 significance level,
the test rejects normality of the residuals, so the normality assumption,
and with it the 95% prediction interval, should be treated with caution.
The residuals versus fits plot showed that the residuals were randomly
distributed, which supported our constant variance assumption. The
Holt-Winters' method had a mean absolute percentage error (MAPE) of
14.4879, a mean absolute deviation (MAD) of 6.1235, and a mean squared
deviation (MSD) of 60.0624. These values were lower than those of the
single exponential method, which indicated that the Holt-Winters' method
was an appropriate choice in this time study.
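The three accuracy measures quoted above follow their usual definitions,
sketched below. The function name and sample values are hypothetical. Note
that MAPE is undefined when an actual value is zero, which can happen on
days with no recorded crime.

```python
def accuracy_measures(actual, fitted):
    """Return (MAPE in percent, MAD, MSD) for paired actual and fitted values."""
    errors = [a - f for a, f in zip(actual, fitted)]
    n = len(errors)
    # MAPE divides by the actual value, so every actual must be nonzero.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mad = sum(abs(e) for e in errors) / n
    msd = sum(e * e for e in errors) / n
    return mape, mad, msd

# Made-up values for illustration.
actual = [10, 20, 40]
fitted_values = [12, 18, 40]
mape, mad, msd = accuracy_measures(actual, fitted_values)
```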
4.2 Hot spots
We began our data analysis by checking whether there were areas of
concentrated crime in Atlanta. To locate these areas, called "hot spots,"
we used four basic statistical tests: the mean center, the standard
deviation, standard deviation ellipses, and the test for clustering. The
mean center gave us the mean longitude and latitude of crimes, the standard
deviation showed how dispersed the crimes were around the mean center, and
the standard deviation ellipses showed visually which crimes were within
one standard deviation of the mean center. Most importantly, the test for
clustering gave information on the closeness of crime locations.
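The first two measures can be computed directly from the coordinates. The
sketch below treats latitude and longitude as planar coordinates in
degrees, which is a simplification, and the sample points are made up.

```python
import math

def mean_center(points):
    """Mean latitude and longitude of a list of (lat, lon) points."""
    n = len(points)
    return (sum(lat for lat, _ in points) / n,
            sum(lon for _, lon in points) / n)

def standard_distance(points):
    """Root-mean-square distance of the points from their mean center."""
    c_lat, c_lon = mean_center(points)
    squared = sum((lat - c_lat) ** 2 + (lon - c_lon) ** 2 for lat, lon in points)
    return math.sqrt(squared / len(points))

# Made-up coordinates at the corners of a square, for illustration.
sample_points = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0)]
center = mean_center(sample_points)
spread = standard_distance(sample_points)
```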
The mean center we found was near the Fulton County Juvenile Court. The
mean center by itself did not give much information about hot spots: crimes
were not necessarily more likely to occur near it. It was useful as a point
of comparison, however, because we could check where other crimes occurred
in relation to it. The results from the standard deviation and the standard
deviation ellipses were also vague. The standard deviation ellipses did not
map the areas of concentrated crime: some areas of an ellipse had frequent
crimes, while other areas within the same ellipse did not have many. On the
other hand, the values obtained from the test for clustering were relative
and thus comparable. Therefore, we concluded that the test for clustering
gave the most accurate representation of hot spots among the four tests.
To test for clustering, we used the nearest neighbor index method. Simply
put, we generated random crime spots in Atlanta and compared how close
those spots were to each other with how close the actual crime spots were.
The ratio of the nearest neighbor distances in the observed data to those
in the random data is called the Nearest Neighbor Index (NNI). The smaller
the NNI, the more clustered the data. An NNI below 1 indicates clustering,
and we treated data as clearly clustered when the NNI was about 0.5 or
lower.
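The procedure described above can be sketched as a small Monte Carlo
estimate. The code is an illustration, not our original program: the
parameter names (bounds, trials, seed) are hypothetical, distances are
planar, and the random points are drawn from a bounding rectangle.

```python
import math
import random

def mean_nn_distance(points):
    """Mean distance from each point to its nearest neighbor (brute force)."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        total += min(math.hypot(x1 - x2, y1 - y2)
                     for j, (x2, y2) in enumerate(points) if j != i)
    return total / len(points)

def nearest_neighbor_index(points, bounds, trials=20, seed=0):
    """Observed mean NN distance divided by the mean over random point sets."""
    rng = random.Random(seed)
    x_min, x_max, y_min, y_max = bounds
    observed = mean_nn_distance(points)
    random_means = []
    for _ in range(trials):
        # Same number of points as the data, placed uniformly in the rectangle.
        sample = [(rng.uniform(x_min, x_max), rng.uniform(y_min, y_max))
                  for _ in points]
        random_means.append(mean_nn_distance(sample))
    return observed / (sum(random_means) / trials)

# Made-up data: 30 points packed into a small patch of the unit square.
cluster_rng = random.Random(1)
clustered = [(cluster_rng.uniform(0.50, 0.51), cluster_rng.uniform(0.50, 0.51))
             for _ in range(30)]
nni = nearest_neighbor_index(clustered, bounds=(0.0, 1.0, 0.0, 1.0))
```

Averaging over several random point sets plays the same role as repeating
the calculation several times and taking the mean.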
The NNI for all crimes in Atlanta was 0.543, which showed that crime
locations were clustered rather than randomly spread. We then found the NNI
for each type of crime. To reduce the error from random sampling, we
calculated the NNI several times and took the average. Table 1 shows the
NNI for each type of crime. Since every NNI was less than 1, all crime
types were clustered to some degree. Note that robbery was the most
clustered and murder was the least clustered. Except for murder and rape,
every crime type had an NNI below 0.5, which implied that it was worthwhile
to investigate the hot spots. One reason robbery and theft had the lowest
NNI was their relatively frequent occurrence: the data showed that these
types of crimes appeared more often than the others. It was natural that
there were hot spots where victims were more vulnerable to robbery and
theft. In contrast, since rape and murder took place less frequently than
other crimes, it was not surprising to see their locations more scattered.
Type of crime NNI
Total 0.543
Assault 0.416
Burglary 0.448
Murder/Homicide 0.823
Rape 0.694
Robbery 0.258
Theft 0.371
Vehicle 0.414
Table 1: NNI for different types of crimes
Some notable hot spot regions for all types of crimes included the areas
along 10th street NW and along Peachtree street SW. Although not many
crimes occurred inside schools, many crimes were reported near colleges,
including Georgia Tech, Georgia State University, Clark Atlanta University,
and Spelman College. Since we were specifically interested in the
relationship between robbery and auto theft, we compared the hot spots of
auto theft with the hot spots of robbery and observed some overlaps. We
have yet to conduct a statistical test on the correlation between the two
crime types, but this seemed like a notable topic to study, and we decided
to do more research to find out whether stolen cars were used to commit
other crimes.
Errors in the analysis came partly from crime types not having the same
amount of data. A crime type with many records is likely to produce an
accurate NNI, while a crime type with few records is not. Another error
arose when we generated random crime spots on the map. We had difficulty
setting an exact boundary and instead generated random points inside a
rectangle that approximately resembles the border of Atlanta. Furthermore,
we treated the Earth as a two-dimensional plane and used a planar formula
to find the distance between points. Our results would have been more
accurate with the Haversine formula:
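\[
d = 2r \arcsin\left( \sqrt{ \sin^2\left( \frac{\varphi_2 - \varphi_1}{2} \right) + \cos(\varphi_1)\cos(\varphi_2)\sin^2\left( \frac{\lambda_2 - \lambda_1}{2} \right) } \right)
\]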
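In the Haversine formula, the φ values are latitudes, the λ values are
longitudes (both in radians), and r is the radius of the Earth. A minimal
Python version is sketched below; the function name is hypothetical and the
mean Earth radius of 6371 km is an assumed constant.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the valid range.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

# Half of the equator, from longitude 0 to longitude 180.
half_equator = haversine_km(0.0, 0.0, 0.0, 180.0)
```

Over distances within a single city the difference from a planar
approximation is small, but the formula removes one known source of error.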
Even though there are ways to compute a more accurate NNI, the currently
calculated NNI is sufficient for comparing the clustering of one crime type
with that of others. However, we could develop a better algorithm to
compute the NNI, as the current algorithm computes a great deal of
unnecessary information. For instance, it calculates the distance between
all pairs of points and compares all values, when we could have selected
only a few points to compare. We plan to improve our algorithm by
incorporating the Voronoi diagram and Fortune's algorithm to reduce the
computation time. This will allow us to analyze more data in less time and
to handle multidimensional data more efficiently.
As shown in Figure 9, the Voronoi diagram is a plot in which the plane is
divided into cells bounded by half-planes. The cells are formed so that
each cell contains exactly one point and the imaginary line segment
connecting two neighboring points is perpendicular to the border between
their cells. Since the points are now spatially organized, the diagram can
be used to find a nearest neighbor with a computation time of O(log n). The
problem is that generating the half-planes naively takes a long time,
O(n^2 log n), and thus slows down the process. Fortunately, Fortune's
algorithm can find the half-planes faster, in O(n log n). Therefore,
combining the two, we end up with a computation time of O(n log n).
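Implementing Fortune's algorithm is beyond a short example. As a simpler
stand-in, and explicitly not the Voronoi-based method, the sketch below
shows how sorting the points lets the search skip most pairwise
comparisons. It sorts by x-coordinate and stops scanning once the gap in x
alone exceeds the best distance found. Its worst case is still quadratic,
so it does not reach the O(n log n) bound discussed above.

```python
import math

def nearest_neighbor_distances(points):
    """Nearest-neighbor distance for every point, pruning by sorted x."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    result = [math.inf] * len(points)
    for pos, i in enumerate(order):
        xi, yi = points[i]
        best = math.inf
        # Scan to the right, then to the left, in x-sorted order.
        for step in (1, -1):
            k = pos + step
            while 0 <= k < len(order):
                xj, yj = points[order[k]]
                if abs(xj - xi) >= best:
                    break  # every farther point in this direction is too far
                best = min(best, math.hypot(xj - xi, yj - yi))
                k += step
        result[i] = best
    return result

# Made-up points for illustration.
sample = [(0, 0), (3, 4), (0, 1), (10, 10)]
dists = nearest_neighbor_distances(sample)
```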
(a) Voronoi diagram step 1 (b) Voronoi diagram step 2
(c) Fortune algorithm
Figure 9: Caption place holder
In addition to those improvements, we can also filter out avoidable calcu-
lations by identifying the unstable queries. An unstable query arbitrarily sets
a border around each point so that the algorithm determines which points to
include in its process. Along with the integration of algorithms stated above,
this improvement will further reduce the computational time. Additionally,
the algorithm can be used to find which points are located near a certain
point.
Finally, we will perform more statistical tests on the data set. Our focus
will be to reduce errors and computation time as well as to locate zones
that need more attention from officers. Once we have the algorithm, we will
be able to suggest new patrol routes that minimize the arrival time at the
crime site, or the optimal number of officers in each patrol zone. Then, by
comparing with the optimized solution, we can check how efficient the
current resource allocation is.
4.3 Auto Theft
When we checked the hot spots, we noticed that the hot spots for auto
theft and for robbery overlapped considerably. This observation interested
us, and we decided to test the relationship between auto theft and robbery.
We then realized that the criminals' primary goal in stealing cars was not
to commit robberies but rather to sell the cars. If they did not sell a car
right away, however, they used it to commit other crimes, including
joyriding (driving around freely), drug dealing, or robbery.
One way to find the correlation between auto thefts and other crimes was
to track a stolen car and check whether it was recorded again as a
suspect's car. The most obvious way to do so was to compare license plates.
However, there was not enough information; witnesses often could not
remember the license plate numbers. Instead, we compared other attributes
of the stolen vehicles and suspect vehicles. There was too much
information, so we filtered out the less important fields and ended up with
60 variables. We reconstructed two data sets from them and started our
research.
One file contained all the necessary information about auto theft, such as
the offense code and the date of the crime. Unfortunately, one fourth of
the records did not contain any information about the stolen car; more
consistent and complete documentation would have made the data more
helpful. The other file included the information about suspect vehicles.
Here, we listed any vehicle that was used to commit any type of crime. This
data set was also incomplete, but we wrote code to make the best use of the
two files.
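The matching step can be illustrated with a small sketch. It is not our
original code: the field names (make, color, year, stolen_at, seen_at) are
hypothetical, and agreement on a few attributes only suggests, and does not
prove, that two records refer to the same car.

```python
from datetime import datetime

def match_stolen_to_suspect(stolen, suspects, keys=("make", "color", "year")):
    """Pair stolen-vehicle records with later suspect-vehicle records
    that agree on every attribute in `keys`."""
    index = {}
    for record in suspects:
        index.setdefault(tuple(record.get(k) for k in keys), []).append(record)
    matches = []
    for car in stolen:
        if any(car.get(k) is None for k in keys):
            continue  # skip records with missing vehicle information
        for record in index.get(tuple(car[k] for k in keys), []):
            if record["seen_at"] >= car["stolen_at"]:
                matches.append((car, record))
    return matches

# Made-up records for illustration.
stolen_cars = [
    {"make": "Honda", "color": "white", "year": 1996,
     "stolen_at": datetime(2014, 1, 1, 10)},
    {"make": "Ford", "color": None, "year": 1999,
     "stolen_at": datetime(2014, 1, 2, 9)},
]
suspect_cars = [
    {"make": "Honda", "color": "white", "year": 1996,
     "seen_at": datetime(2014, 1, 1, 14)},
    {"make": "Honda", "color": "white", "year": 1996,
     "seen_at": datetime(2013, 12, 31, 8)},
]
matches = match_stolen_to_suspect(stolen_cars, suspect_cars)
```

Requiring the sighting to come after the theft removes pairs that cannot
refer to the same stolen car.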
While examining criminals' habits, we came up with more questions, such as
the time delay between a car being stolen and a subsequent robbery, the car
types that were vulnerable to theft, and how the stolen cars were used. In
the following paragraphs, we provide an analysis of police crime data along
with the derived questions.
The easiest way to categorize cars was by color and maker, so we made a
color versus maker pivot table. The most noticeable result was that Dodge,
Chevrolet, and Ford were the most frequently stolen makes and that white,
black, and silver were the most vulnerable colors. In particular, old
models from the 1990s were targeted frequently. These results were fairly
intuitive, as those cars had weaker security systems and criminals did not
want to get noticed by stealing fancy cars. However, to our surprise, the
thieves showed more interest in luxury cars than we had expected. We found
that the main reason for stealing those cars, despite the difficulty, was
that they could be sold for high prices.
[Bar chart: number of cars by type of car (Dodge, Chevrolet, Ford, Honda) and color (White, Black, Silver)]
Figure 10: The popularities of car types and their years
Criminals evidently targeted old, common cars for easy theft and expensive
cars for high return. What was interesting, however, was that criminals
tended to steal cars that were less valuable than the cars they used for
robbery. In other words, they used newer cars to steal older cars. This
could be interpreted in two ways: they wanted small, easy money, or they
needed a new car to commit a new crime. We needed to know what they did
with the stolen cars. To find out, we measured how much time passed before
criminals committed a crime with their stolen cars. Out of 5270 auto theft
offenses and 4237 suspect vehicle cases, we found 48 exact matches. In
these 48 cases, the average time before a stolen car was spotted at another
crime scene was about 4 hours, not counting some cars that reappeared
several days later. Notably, among the 48 cases, two cars were used to
commit multiple crimes in a short time period. Since the cars were used in
crimes only a few hours after they were stolen and were then sold, we could
infer that criminals stole cars to make their crimes less traceable and to
earn some quick cash.
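The average delay can be computed as sketched below. The one-day cutoff for
leaving out cars that reappeared several days later is an assumption made
for this example, and the timestamps are made up.

```python
from datetime import datetime, timedelta

def mean_delay_hours(pairs, cutoff=timedelta(days=1)):
    """Mean hours between theft and next sighting, ignoring delays
    longer than `cutoff` and sightings that precede the theft."""
    delays = [seen - stolen for stolen, seen in pairs
              if timedelta(0) <= seen - stolen <= cutoff]
    if not delays:
        return None
    return sum(d.total_seconds() for d in delays) / len(delays) / 3600

# Made-up (stolen time, sighting time) pairs for illustration.
pairs = [
    (datetime(2014, 1, 1, 10), datetime(2014, 1, 1, 13)),  # 3 hours
    (datetime(2014, 1, 2, 20), datetime(2014, 1, 3, 1)),   # 5 hours
    (datetime(2014, 1, 5, 8), datetime(2014, 1, 9, 8)),    # 4 days, excluded
]
average_hours = mean_delay_hours(pairs)
```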
Suspect vehicle year   Suspect vehicle maker   Stolen vehicle year   Stolen vehicle maker
2002                   Chevrolet               1999                  Ford
2010                   Dodge                   2004                  Dodge
2011                   Toyota                  1996                  Honda
2001                   Ford                    1996                  Honda
2008                   Nissan                  1984                  Oldsmobile
Table 2: Examples of suspects stealing less valuable cars
4.4 Questions and Goals
Our goals for next semester are as follows.
1. Extend the model used for Georgia Tech crimes so that it can be applied
to Atlanta crimes.
2. Develop a better algorithm for locating hot spots.
3. Find geographic matches and correlations among crimes.
4. Suggest an optimized way of allocating resources.
5. Recognize crime patterns.
To continue our research, we need more information about crimes. Below we
list some questions that are preventing us from advancing.
1. Atlanta is divided up into 5 zones, but the Excel data shows the place
of crime by latitude and longitude, not by zones. Given the coordinate
of a place, is there a way of telling which zone that place is in?
2. Have the crime classification criteria changed over the past few years?
In other words, is there a crime that was once considered a "type A" crime
but is now considered "type B"?
Now that we are familiar with the data and have gained insights into crimes
in Atlanta, we are confident that our progress will speed up. The amount of
data was limited; however, we learned to make use of small data sets to
reach noteworthy conclusions. We hope to establish a generalized algorithm
that can be used in many cities.
4.5 Reference
Pictures of Voronoi Diagrams: https://www.youtube.com/watch?v=7eCrHAv6sYY
Crime Data Analysis and Prediction for city of Los AngelesHeta Parekh
 
Mr. Friend is acrime analystwith the SantaCruz, Califo.docx
Mr. Friend is acrime analystwith the SantaCruz, Califo.docxMr. Friend is acrime analystwith the SantaCruz, Califo.docx
Mr. Friend is acrime analystwith the SantaCruz, Califo.docxaudeleypearl
 
Mr. Friend is acrime analystwith the SantaCruz, Califo.docx
Mr. Friend is acrime analystwith the SantaCruz, Califo.docxMr. Friend is acrime analystwith the SantaCruz, Califo.docx
Mr. Friend is acrime analystwith the SantaCruz, Califo.docxroushhsiu
 
Student #1 I have chosen to write about the history of data anal.docx
Student #1 I have chosen to write about the history of data anal.docxStudent #1 I have chosen to write about the history of data anal.docx
Student #1 I have chosen to write about the history of data anal.docxjohniemcm5zt
 
Crime prediction based on crime types
Crime prediction based on crime typesCrime prediction based on crime types
Crime prediction based on crime typesIJDKP
 
Chicago Crime Analysis
Chicago Crime AnalysisChicago Crime Analysis
Chicago Crime AnalysisTom Donoghue
 
IRJET- Detecting Criminal Method using Data Mining
IRJET- Detecting Criminal Method using Data MiningIRJET- Detecting Criminal Method using Data Mining
IRJET- Detecting Criminal Method using Data MiningIRJET Journal
 
Merseyside Crime Analysis
Merseyside Crime AnalysisMerseyside Crime Analysis
Merseyside Crime AnalysisParang Saraf
 
Journal of-Criminal Justice, Vol. 7, pp. 217-241 (1979). Per.docx
Journal of-Criminal Justice, Vol. 7, pp. 217-241 (1979). Per.docxJournal of-Criminal Justice, Vol. 7, pp. 217-241 (1979). Per.docx
Journal of-Criminal Justice, Vol. 7, pp. 217-241 (1979). Per.docxpriestmanmable
 
External Mechanisms Of Accountaability
External Mechanisms Of AccountaabilityExternal Mechanisms Of Accountaability
External Mechanisms Of AccountaabilityCarmen Martin
 
PA-Consulting-Group_Cybercrime-Tipping-point-survey-report
PA-Consulting-Group_Cybercrime-Tipping-point-survey-reportPA-Consulting-Group_Cybercrime-Tipping-point-survey-report
PA-Consulting-Group_Cybercrime-Tipping-point-survey-reportJames Fisher
 
IRJET- Crime Analysis using Data Mining and Data Analytics
IRJET- Crime Analysis using Data Mining and Data AnalyticsIRJET- Crime Analysis using Data Mining and Data Analytics
IRJET- Crime Analysis using Data Mining and Data AnalyticsIRJET Journal
 
How scanning and digitizing police records help fight crimes at the earliest (1)
How scanning and digitizing police records help fight crimes at the earliest (1)How scanning and digitizing police records help fight crimes at the earliest (1)
How scanning and digitizing police records help fight crimes at the earliest (1) Managed Outsource Solutions
 
Crime Analysis based on Historical and Transportation Data
Crime Analysis based on Historical and Transportation DataCrime Analysis based on Historical and Transportation Data
Crime Analysis based on Historical and Transportation DataValerii Klymchuk
 
Disadvantages Of Intelligence Led Policing
Disadvantages Of Intelligence Led PolicingDisadvantages Of Intelligence Led Policing
Disadvantages Of Intelligence Led PolicingChristina Ramirez
 
Research on Clustering Method of Related Cases Based On Chinese Text
Research on Clustering Method of Related Cases Based On Chinese TextResearch on Clustering Method of Related Cases Based On Chinese Text
Research on Clustering Method of Related Cases Based On Chinese TextIOSR Journals
 

Similaire à Analysis of Atlanta crime hotspots and patrol routes (20)

Database and Analytics Programming - Project report
Database and Analytics Programming - Project reportDatabase and Analytics Programming - Project report
Database and Analytics Programming - Project report
 
georgia-tech-atlanta
georgia-tech-atlantageorgia-tech-atlanta
georgia-tech-atlanta
 
Crime Data Analysis and Prediction for city of Los Angeles
Crime Data Analysis and Prediction for city of Los AngelesCrime Data Analysis and Prediction for city of Los Angeles
Crime Data Analysis and Prediction for city of Los Angeles
 
Mr. Friend is acrime analystwith the SantaCruz, Califo.docx
Mr. Friend is acrime analystwith the SantaCruz, Califo.docxMr. Friend is acrime analystwith the SantaCruz, Califo.docx
Mr. Friend is acrime analystwith the SantaCruz, Califo.docx
 
Mr. Friend is acrime analystwith the SantaCruz, Califo.docx
Mr. Friend is acrime analystwith the SantaCruz, Califo.docxMr. Friend is acrime analystwith the SantaCruz, Califo.docx
Mr. Friend is acrime analystwith the SantaCruz, Califo.docx
 
Student #1 I have chosen to write about the history of data anal.docx
Student #1 I have chosen to write about the history of data anal.docxStudent #1 I have chosen to write about the history of data anal.docx
Student #1 I have chosen to write about the history of data anal.docx
 
Crime prediction based on crime types
Crime prediction based on crime typesCrime prediction based on crime types
Crime prediction based on crime types
 
Chicago Crime Analysis
Chicago Crime AnalysisChicago Crime Analysis
Chicago Crime Analysis
 
IRJET- Detecting Criminal Method using Data Mining
IRJET- Detecting Criminal Method using Data MiningIRJET- Detecting Criminal Method using Data Mining
IRJET- Detecting Criminal Method using Data Mining
 
Merseyside Crime Analysis
Merseyside Crime AnalysisMerseyside Crime Analysis
Merseyside Crime Analysis
 
Journal of-Criminal Justice, Vol. 7, pp. 217-241 (1979). Per.docx
Journal of-Criminal Justice, Vol. 7, pp. 217-241 (1979). Per.docxJournal of-Criminal Justice, Vol. 7, pp. 217-241 (1979). Per.docx
Journal of-Criminal Justice, Vol. 7, pp. 217-241 (1979). Per.docx
 
External Mechanisms Of Accountaability
External Mechanisms Of AccountaabilityExternal Mechanisms Of Accountaability
External Mechanisms Of Accountaability
 
PA-Consulting-Group_Cybercrime-Tipping-point-survey-report
PA-Consulting-Group_Cybercrime-Tipping-point-survey-reportPA-Consulting-Group_Cybercrime-Tipping-point-survey-report
PA-Consulting-Group_Cybercrime-Tipping-point-survey-report
 
Technical Seminar
Technical SeminarTechnical Seminar
Technical Seminar
 
IRJET- Crime Analysis using Data Mining and Data Analytics
IRJET- Crime Analysis using Data Mining and Data AnalyticsIRJET- Crime Analysis using Data Mining and Data Analytics
IRJET- Crime Analysis using Data Mining and Data Analytics
 
How scanning and digitizing police records help fight crimes at the earliest (1)
How scanning and digitizing police records help fight crimes at the earliest (1)How scanning and digitizing police records help fight crimes at the earliest (1)
How scanning and digitizing police records help fight crimes at the earliest (1)
 
Crime Analysis based on Historical and Transportation Data
Crime Analysis based on Historical and Transportation DataCrime Analysis based on Historical and Transportation Data
Crime Analysis based on Historical and Transportation Data
 
Disadvantages Of Intelligence Led Policing
Disadvantages Of Intelligence Led PolicingDisadvantages Of Intelligence Led Policing
Disadvantages Of Intelligence Led Policing
 
RoboCop World
RoboCop WorldRoboCop World
RoboCop World
 
Research on Clustering Method of Related Cases Based On Chinese Text
Research on Clustering Method of Related Cases Based On Chinese TextResearch on Clustering Method of Related Cases Based On Chinese Text
Research on Clustering Method of Related Cases Based On Chinese Text
 

Analysis of Atlanta crime hotspots and patrol routes

  • 1. Analysis on crimes in Atlanta
Undergraduate Research Team
Georgia Institute of Technology
ISYE 4699
December 7 2014

1 Abstract
In this report, we present the conclusions we drew from the Georgia Tech and Atlanta crime data. Our areas of study were patrol analysis, hot spots, and correlations among crimes. In the patrol analysis, we studied how current patrol routes could be changed. Improvements would result in shorter arrival times for police officers, lower crime rates, less economic waste, and more. A hot spot is an area with concentrated crimes. We developed an algorithm for spotting hot spots in Atlanta. This program needs improvements, but once we upgrade it to locate hot spots more accurately, we will be able to compare the hot spots of different types of crimes and find overlaps among them. Finally, we studied how one crime led to other crimes. We focused on finding the relationship between auto theft and burglary.

2 Patrol Analysis
After we received the data from the Georgia Tech Police Department, we filtered out unnecessary variables to conduct our preliminary research. We were left with the 18 most useful variables, and our research involved using this information to come up with an improved solution. The variables are listed below for reference:
  • 2.
• OCANumber
• IncidentFromDate
• IncidentFromTime
• IncidentToDate
• IncidentToTime
• OffenseCode
• Offense Description
• CaseStatus CaseDisposition
• LocationCode
• PatrolZone
• Location
• Landmark
• LocationStreetNumber
• LocationDirectional
• LocationStreet
• LocationLatitude
• LocationLongitude
• CreatedSource

2.1 Overview
A total of 11578 crimes were recorded at Georgia Tech and in nearby regions. Looking at the distribution by time, day, and month, we found that crimes occurred most often at around 1 am, and the frequency gradually dropped until 6 am, when crime was least likely to occur. The crime rate fluctuated smoothly between 400 and 700 crimes per hour from 8 am to 11 pm. April and September, which fall in the Spring and Fall semesters, had the highest numbers of crimes during the year. Offense codes 2700, 3657, and 2751 topped the list of crimes. Approximately three-fourths of cases were closed or cleared, and the remaining cases were mostly inactive. In the analysis of the four patrol zones defined by the Georgia Tech police, Zone 2 was found to be the most dangerous. The number of crimes there was almost twice that of any other zone. A detailed crime type analysis is given later.

2.2 Urban Police Patrol Model
Initially, the police department calculated patrol efficiency by considering only the patrol time, and even that calculation contained many errors. It was computed by taking the difference between the total work time of police officers and the time they spent on other duties, such as answering radio calls, handling traffic, or having a meal. This calculation gave inaccurate efficiency values
  • 3. since many police officers patrolled during the particular periods when crimes were most likely to happen. As a result, there were too few police officers on patrol at other times. The goal was to have the right number of patrol officers.

To fix these problems, in the 1960s, Dr. Richard Larson developed a systematic approach to studying police patrol efficiency. He cooperated with the NYC police department to develop the first version of the Urban Police Patrol Model. He used the same 18 variables we listed above and came up with the most "accurate" model. The key idea of Dr. Larson's model was as follows: given the pattern of crimes and a limited amount of preventive patrol, how should the effort be allocated along the streets to achieve the highest efficiency? We were influenced by his study and decided to adopt his model to analyze the Atlanta crime data and optimize the patrol resource allocation.

Figure 1: Patrol Time = Total Time − Time for other duties

2.3 Further Questions
Dr. Larson's model gave a rough approximation of the behavior of police preventive patrol. Qualitatively, Koopman's method suggested that the
  • 4. patrol effort should grow with the logarithm of the crime density and further advised that areas with a low likelihood of crimes should not be patrolled at all. Refinement of this model was required before it could be implemented by the police, and Dr. Larson asked a few more questions:
1. To what extent is an optimal patrol coverage function realizable?
2. How closely does a unit have to approach the optimal coverage in order to achieve a satisfactory result? Or, equivalently, what is the sensitivity of the solution about the optimum?
3. To what extent is the crime distribution modified by patrol strategies?
4. How should each crime type be evaluated to reflect its relative seriousness?
5. What is the conditional probability that a crime will be detected given its pattern?
We believe that these questions contain valuable insights, and we will continue our research to answer them.

3 Georgia Tech Crime data
One interesting phenomenon was that while the crime rate of Atlanta kept decreasing at a rate of 5.324%, the crime rate of Georgia Tech fluctuated over the years. As the graphs suggest, the crime rate was higher in 2011 than in 2010, then decreased in 2012 and reached its peak in 2013. We were unsure whether the crime rates actually changed or whether a change in how crimes were identified at Georgia Tech was the reason. For example, some crimes in 2010 were categorized differently in later years.
  • 5. Figure 2: Georgia Tech Crime, fluctuates (x-axis: Year, 2010 to 2014; y-axis: Number of crimes)

Figure 3: Atlanta Crime, decreases (x-axis: Year, 2010 to 2014; y-axis: Number of crimes)

We also observed the crime patterns using time series. To find the relationship between Atlanta crimes and Georgia Tech crimes, we compared the annual data and found that the crime count patterns had no recognizable similarity. In fact, there were some notable differences in their patterns.
  • 6. Figure 4: Monthly number of crimes for Georgia Tech and Atlanta

Compared to Georgia Tech, which had fewer crimes in the summer, Atlanta had more. Georgia Tech's summer crime rate was low, most likely because most students left the campus for vacation. However, we did not have an easy explanation for why the Atlanta data showed an increase in summer. Even when we averaged the rates over the years and plotted GT and Atlanta together, we could see that Georgia Tech was more dangerous during the semesters, while the opposite held for Atlanta.

Also, it was interesting that the crime rate at Georgia Tech generally decreased between late August and December. We came up with a few possible reasons for this phenomenon. First, most freshmen arrived at the end of August every year; they lacked a sense of safety and were thus much more vulnerable to crimes. Second, September was the pledge month for fraternities and sororities. Students were asked to perform risky stunts and were at risk of being targeted, especially when they were drunk or walked outside late at night.
  • 7. Figure 5: Average number of crimes for Georgia Tech and Atlanta

3.1 Geographical Relationship
We analyzed crimes geographically by using offense codes and patrol zones. This was easily done by making a pivot table and looking at the results. The table showed an overall trend in the data and gave insight into which other techniques to apply to achieve better results.

Our objective was to prove or disprove that there is a clear relationship between the GTPD patrol zones (Zones 1 to 4) and the offense codes used by the NCIC. Furthermore, if this process proved to be efficient, we could apply the same procedure to the Atlanta crime data.

We used all of the GTPD data from 2010 to 2014. To exclude unnecessary information, we took only two variables into account: Patrol Zone and Offense Code. For every crime, both its offense code and its location were given, so we had enough data for analysis. We programmed Excel to give the output in the following way:

Z1 = [22 : 24, 23 : 325, 29 : 84, . . . ]

• The first number in each pair was the first two digits of the offense code
• The second number was the number of such incidents
• For example, there were 24 crimes that were coded "22"

We explored the NCIC code list before we proceeded with the test.
• There were many different types of offenses on the offense code list, but we could categorize them neatly based on their first two digits
  • 8.
• We excluded some offenses from the data because they were student conduct violations, public order crimes, juvenile offenses, invalid, or trivial to the overall data

Using our manipulated data set, we generated a pivot table.

Figure 6: The pivot table (Location versus type of crimes)

Based on the table, we found that Zone 2 had the largest number of crimes. In particular, Zone 2 had the most assaults, burglaries, property damage incidents, and stolen vehicles compared to the other zones.

In conclusion, our approach could have worked better with more data. We will apply this method to the Atlanta data later, since we believe there will be enough data. However, we concluded that we could not infer more information about the relationship between location and type of crime at Georgia Tech.

Crime Type        Most frequent (# of Crimes)   2nd most frequent (# of Crimes)
Assault           Zone 2 (94)                   Zone 3 (24)
Burglary          Zone 2 (79)                   Zone 4 (32)
Damage Property   Zone 2 (171)                  Zone 1 (84)
Stolen Vehicle    Zone 2 (43)                   Zone 1 (27)

Based on this approach, we could conclude that Zone 2 was the most dangerous zone. It was difficult to find a relationship between types of crimes and patrol zones because Zone 2 had so many more crimes than the other zones; there was not enough information about crimes in the other zones. There were several explanations for why there might not be enough data. First, the Georgia Tech
  • 9. campus was considered safe and did not have many crimes to record. Second, many of the recorded crimes were minor, and after we filtered them out, we were left with only a small amount of data. Last, there were not enough variables to take into account. There could have been more significant factors that contributed to the result.

3.2 Questions
We have come up with some questions that need to be answered in order to continue our research:
1. How are the 4 zones divided? Can we have a detailed description of where each zone is?
2. There are 4 zones within Georgia Tech, and there are 2 more zones: off campus and SAV. What does SAV mean?
3. There were many incidents that counted as "minor" crimes. Are they really insignificant enough to be excluded from our research, or should we give more attention to them?

4 Atlanta Crime data
4.1 Time series and Seasonality
A time study of the criminal data was helpful in revealing crime patterns on a time scale. With the 2011-2014 crime data, we grouped the entries by date (occur_date) and crime type (UC2 Literal). We computed the count of each crime type on every reported date and performed the time series analysis.
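The grouping step described above can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes each record is a dictionary keyed by the column names occur_date and UC2 Literal mentioned in the text, and the helper names and sample records are hypothetical.

```python
from collections import Counter


def count_by_date_and_type(records):
    """Count incidents for every (occur_date, crime type) pair."""
    counts = Counter()
    for rec in records:
        counts[(rec["occur_date"], rec["UC2 Literal"])] += 1
    return counts


def daily_series(counts, crime_type, dates):
    """Daily counts for one crime type; dates without incidents get 0."""
    return [counts.get((d, crime_type), 0) for d in dates]


# Hypothetical sample records, for illustration only.
sample = [
    {"occur_date": "2013-09-01", "UC2 Literal": "LARCENY"},
    {"occur_date": "2013-09-01", "UC2 Literal": "LARCENY"},
    {"occur_date": "2013-09-02", "UC2 Literal": "ROBBERY"},
]
counts = count_by_date_and_type(sample)
larceny = daily_series(counts, "LARCENY", ["2013-09-01", "2013-09-02"])
```

The resulting per-type daily series is the kind of input that the time series plots and smoothing steps would work on.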
  • 10. In the time series plot of total crimes each day, we could observe a rough seasonal pattern. We were unsure whether this seasonal pattern was present in all crime types or just in a few that influenced the total crime rate. To find out, we decomposed the data into different crime types and performed the time series analysis on each. Two crime types returned interesting patterns: aggravated assault (AGG_ASSULT) and larceny (LARCENY). Therefore, we decided to investigate these types of crimes further. Below are the time series plots for them.
  • 11. Moreover, using the additive single exponential method, we were able to smooth the data and produce cleaner diagrams.

The smoothed data plots showed us the trend of the crime data. The frequencies of both aggravated assault and larceny tended to peak around September and then slowly dropped to a low in March.

Using the Holt-Winters' method, we were able to apply weights to the data points, and we came up with an applicable model for the current data points. The diagram below shows the result of applying the Holt-Winters' method to the larceny data. Red points represent the smoothed data points of our model.
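As a rough illustration of the single exponential method mentioned above, the sketch below applies the standard recursion s_t = alpha * x_t + (1 - alpha) * s_(t-1). The smoothing weight alpha and the choice to start from the first observation are assumptions made for this example; the software used in the study may initialize the series differently.

```python
def single_exponential_smoothing(series, alpha):
    """Smooth a series with s_t = alpha * x_t + (1 - alpha) * s_(t-1).

    Assumption: the first smoothed value is the first observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    smoothed = [float(series[0])]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical daily counts, for illustration only.
daily_counts = [10, 20, 30]
print(single_exponential_smoothing(daily_counts, alpha=0.5))
```

A larger alpha puts more weight on recent observations, which matches the description of lower weight on older data and higher weight on later data.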
  • 12. With the smoothed model, we were able to make predictions of future data points. We used this method on the larceny data and predicted 100 more data points with a 95% prediction interval. The residual plots are shown below.
  • 13. In the following diagram, blue points represent the actual data, red points show the smoothed data points with lower weight on older data and higher weight on later data, green points give the prediction for the next 100 data points, and purple points are the upper and lower bounds of the 95% prediction interval for the green points.

The residual analysis plot of this method was as follows. The p-value of the Anderson-Darling test was 0.009. We took this as support for the normality assumption, but a p-value this far below 0.05 would normally lead to rejecting normality, so this conclusion needs to be rechecked. The residuals versus fits plot showed that the residuals were randomly distributed, which supported our constant variance assumption. The Holt-Winters' method had a mean absolute percentage error (MAPE) of 14.4879, a mean absolute deviation (MAD) of 6.1235, and a mean squared deviation (MSD) of 60.0624. These values were lower than those of the single exponential method, which indicated that the Holt-Winters' method was an appropriate choice for this time study.
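The three accuracy measures quoted above can be computed as in the following sketch. It uses the usual definitions (MAPE as the mean of |actual − fitted| / |actual| times 100, MAD as the mean absolute error, MSD as the mean squared error); the function name and sample numbers are hypothetical, and actual values of zero are assumed not to occur because MAPE is undefined for them.

```python
def accuracy_measures(actual, fitted):
    """Return (MAPE, MAD, MSD) for paired actual and fitted values.

    Assumption: no actual value is zero (MAPE divides by the actual value).
    """
    if len(actual) != len(fitted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - f for a, f in zip(actual, fitted)]
    mape = 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    mad = sum(abs(e) for e in errors) / n
    msd = sum(e * e for e in errors) / n
    return mape, mad, msd


# Hypothetical values, for illustration only.
mape, mad, msd = accuracy_measures([10, 20], [8, 25])
```

Lower values of all three measures indicate a closer fit, which is the basis for comparing the two smoothing methods.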
  • 14. 4.2 Hot spots
We began our data analysis by checking whether there were areas of concentrated crime in Atlanta. In order to locate these areas, called "hot spots," we used four basic statistical tests: mean center, standard deviation, standard deviation ellipses, and the test for clustering. The mean center gave us the mean longitude and latitude of crimes, the standard deviation showed how far the crimes deviated from the mean center, and the standard deviation ellipses visually showed which crimes were within one standard deviation of the mean center. Most importantly, the test for clustering gave information on the closeness of crime locations.

The mean center we found was near the Fulton County Juvenile Court. We figured that the mean center itself did not give much information about hot spots. Crimes did not necessarily occur with high probability near the mean center; however, it was useful as a reference point for checking where other crimes occurred. Also, the results from the standard deviation and the standard deviation ellipses were vague. The standard deviation ellipses did not map the concentrated areas of crimes; some areas of an ellipse had frequent occurrences of crimes, while
  • 15. other areas within the same ellipse did not have many crimes. On the other hand, the values obtained from the test for clustering were relative and thus comparable. Therefore, we concluded that the test for clustering gave the most accurate representation of hot spots among the four tests.

To test for clustering, we used the nearest neighbor index method. Simply put, we generated random crime spots in Atlanta and compared how close those spots were to each other with how close the actual crime spots were. The ratio of the distances among the observed data to the distances among the random data is called the Nearest Neighbor Index (NNI). The smaller the NNI, the more clustered the data. We treated the data as clearly clustered if the NNI was close to 0.5. The NNI for all crimes in Atlanta was 0.543. This showed that crimes were strongly tied to location. Then we found the NNI for each type of crime. To minimize error, we calculated the NNI several times and took the average.

Table 1 shows the NNI for each type of crime. Since every NNI was less than 1, all crimes were clustered to some degree. Note that robbery was the most clustered and murder the least clustered. Except for murder and rape, all other crimes' NNIs were below 0.5, which implied that it was worthwhile to investigate the hot spots. One reason for robbery and theft having the lowest NNIs was their relatively frequent occurrence. The data showed that these types of crimes appeared more frequently than the others. It was natural that there were hot spots where victims were more vulnerable to robbery and theft. In contrast, since rape and murder took place less frequently than other crimes, it was not surprising to observe more scattered data.
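A minimal sketch of the nearest neighbor index described above is given below. It is not the program used in the study: the function names are hypothetical, distances are planar (treating coordinates as points on a flat plane), the reference points are drawn uniformly inside a bounding rectangle, and the brute-force search compares all pairs of points.

```python
import math
import random


def mean_nearest_neighbor_distance(points):
    """Average distance from each point to its closest other point (brute force)."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        total += min(
            math.hypot(x1 - x2, y1 - y2)
            for j, (x2, y2) in enumerate(points)
            if j != i
        )
    return total / len(points)


def random_points(n, x_min, x_max, y_min, y_max, rng):
    """Draw n uniform random points inside a bounding rectangle."""
    return [(rng.uniform(x_min, x_max), rng.uniform(y_min, y_max)) for _ in range(n)]


def nearest_neighbor_index(observed, reference):
    """Ratio of observed to reference mean nearest-neighbor distance."""
    return mean_nearest_neighbor_distance(observed) / mean_nearest_neighbor_distance(reference)


# Hypothetical usage: compare observed points with seeded random reference points.
rng = random.Random(0)
observed = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (0.1, 0.1)]
reference = random_points(len(observed), 0.0, 10.0, 0.0, 10.0, rng)
nni = nearest_neighbor_index(observed, reference)
```

Repeating the random draw several times and averaging the resulting values corresponds to the averaging step described in the text.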
Type of crime      NNI
Total              0.543
Assault            0.416
Burglary           0.448
Murder/Homicide    0.823
Rape               0.694
Robbery            0.258
Theft              0.371
Vehicle            0.414

Table 1: NNI for different types of crimes

Some notable hot spot regions for all types of crimes included the areas along 10th Street NW and along Peachtree Street SW. Although not
  • 16. many crimes occurred inside schools, there were many crimes reported near colleges, including Georgia Tech, Georgia State University, Clark Atlanta University, and Spelman College. Since we were specifically interested in the relationship between robbery and auto theft, we compared the hot spots of auto theft to the hot spots of robbery and observed some overlaps. We have yet to conduct a statistical test on the correlation between the two crime types, but this seemed like a notable topic to study, and we decided to do more research to figure out whether stolen cars were used to commit other crimes.

Errors in the analysis came from crime types not having the same amount of data. A crime type with the most data will most likely produce an accurate NNI, while a crime type with the least data will not. Another error appeared when generating random crime spots on the map. We had difficulty setting an exact boundary and instead generated random points inside a rectangle that approximately resembles the border of Atlanta. Furthermore, we assumed that the Earth was a two-dimensional plane and used an inappropriate formula for the distance between points. Our results would have been improved by using the Haversine formula:

\[
d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos(\varphi_1)\cos(\varphi_2)\sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)
\]

where r is the radius of the Earth, φ1 and φ2 are the latitudes of the two points, and λ1 and λ2 are their longitudes, all in radians.

Even though there were many ways to compute a more accurate NNI, the currently calculated NNI is sufficient for comparing the clustering of one crime type to others. However, we could develop a better algorithm to compute the NNI, as the current algorithm performed numerous unnecessary computations. For instance, it calculated the distance between all pairs of points and compared all values when we could have selected just a few points to compare. We would improve our algorithm by incorporating the Voronoi diagram and Fortune's algorithm to reduce the computational time.
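For reference, the Haversine formula given above can be written as a short function. This is a sketch under stated assumptions: the function name is hypothetical, inputs are in decimal degrees, and the mean Earth radius of 6371 km is an assumed constant rather than a value taken from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # Clamp to 1.0 to guard against rounding slightly above the arcsin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Hypothetical coordinates, for illustration only.
distance = haversine_km(33.7756, -84.3963, 33.7490, -84.3880)
```

Swapping this function in for the planar distance would leave the rest of the NNI computation unchanged.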
This would allow us to analyze more data in less time, and we would also be able to handle multidimensional data more efficiently.

As shown in Figure 9, the Voronoi diagram is a partition of the plane into cells bounded by half-planes. The plane is divided such that each cell contains one point and the line segment connecting two neighboring points is perpendicular to the border between their cells. Since the points are now organized, this diagram can find the nearest neighbor efficiently, with a query time of O(log n). The problem is that generating half-planes
  • 17. takes a long time, O(n^2 log n), and thus will slow down the process. Fortunately, Fortune's algorithm can find the half-planes faster, in O(n log n). Therefore, combining the two algorithms, we end up with a computation time of O(n log n).

Figure 9: (a) Voronoi diagram step 1; (b) Voronoi diagram step 2; (c) Fortune algorithm

In addition to those improvements, we can also filter out avoidable calculations by identifying the unstable queries. An unstable query arbitrarily sets a border around each point so that the algorithm determines which points to include in its process. Along with the integration of the algorithms stated above, this improvement will further reduce the computational time. Additionally, the algorithm can be used to find which points are located near a certain point.

Finally, we will perform more statistical tests on the data set. Our focus will be to reduce errors and computation time as well as to locate zones that need more attention from officers. Once we have the algorithm, we will be able to suggest new patrol routes that minimize the arrival time at the crime site, or the optimal number of officers in each patrol zone. Then by
  • 18. comparing with the optimized solution, we can check how efficient the current resource allocation is.

4.3 Auto Theft
When we checked the hot spots, we noticed that the hot spots for auto theft and for robbery overlapped considerably. We were interested in this observation and decided to test the relationship between auto thefts and robbery. Then we realized that the criminals' primary goal in auto theft was not to commit robberies but rather to sell the cars. If they did not sell a car right away, however, they used it to commit other crimes, including joyriding (driving around freely), drug dealing, or robbery.

One way we looked for the correlation between auto thefts and other crimes was by tracking a stolen car and checking whether it was recorded again as a suspect's car. The most obvious way to do so was by comparing license plates. However, there was not enough information; witnesses often could not remember the license plate numbers. Instead, we compared other attributes of the stolen vehicles and suspect vehicles. There was too much information, so we filtered out the less important fields and ended up with 60 variables. We reconstructed two data sets from them and started our research.

One file contained all necessary information about auto thefts, such as the offense code and the date of the crime. Unfortunately, one fourth of the crime records did not contain any information about the stolen car; the data would have been more helpful with more consistent and complete documentation. The other file included the information about suspect vehicles. Here, we listed any vehicle that was used to commit any type of crime. This data set also had an insufficient amount of data, but we wrote code to make the best use of these two files.
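The idea of matching stolen vehicles against suspect vehicles by their attributes can be sketched as follows. This is not the code written for the study: the field names (id, make, color, year, date), the choice of matching attributes, and the rule that the suspect sighting must not precede the theft are all assumptions made for illustration.

```python
from datetime import date


def match_stolen_to_suspect(stolen, suspects, keys=("make", "color", "year")):
    """Pair stolen vehicles with suspect vehicles whose attributes all agree.

    Returns (stolen id, suspect id, delay in days) tuples. Records with a
    missing attribute (None) are never matched on that attribute.
    """
    matches = []
    for s in stolen:
        for v in suspects:
            same = all(s.get(k) is not None and s.get(k) == v.get(k) for k in keys)
            if same and v["date"] >= s["date"]:
                matches.append((s["id"], v["id"], (v["date"] - s["date"]).days))
    return matches


# Hypothetical records, for illustration only.
stolen = [
    {"id": "T1", "make": "DODGE", "color": "WHITE", "year": 1996, "date": date(2013, 3, 1)},
    {"id": "T2", "make": "FORD", "color": None, "year": 1998, "date": date(2013, 3, 2)},
]
suspects = [
    {"id": "S1", "make": "DODGE", "color": "WHITE", "year": 1996, "date": date(2013, 3, 4)},
    {"id": "S2", "make": "FORD", "color": "BLACK", "year": 1998, "date": date(2013, 3, 5)},
]
matches = match_stolen_to_suspect(stolen, suspects)
```

The delay in days attached to each match is the kind of quantity needed to study the time between a theft and a later crime involving the same vehicle.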
While examining criminals’ habits, we came up with more questions, such as the time delay between the theft of a car and a subsequent robbery, the car types that were most vulnerable to theft, and how the stolen cars were used. In the following paragraphs, we provide an analysis of the police crime data along with these derived questions.

The easiest way to categorize cars was by color and make, so we built a color-versus-make pivot table. The most noticeable information from the pivot table was that Dodge, Chevrolet, and Ford were the most commonly targeted makes, and white, black, and silver were the most vulnerable colors. In particular, older models from the 1990s were targeted frequently. These
  • 19. results were fairly intuitive, as those cars had weaker security systems and criminals did not want to be noticed stealing flashy cars. To our surprise, however, the thieves showed more interest in luxury cars than we had expected. We found that the main reason for stealing those cars, despite the difficulty of doing so, was that they could be sold for high prices.

[Bar chart: number of cars (0 to 160) by type of car (Dodge, Chevrolet, Ford, Honda), with series for White, Black, and Silver]
  • 20. Figure 10: The popularities of car types and their years

It was clear that criminals targeted old, common cars for easy theft and expensive cars for high return. What was interesting, however, was that criminals tended to steal cars that were less valuable than the cars they used for robbery. In other words, they used newer cars to steal older cars. This could be interpreted in two ways: they wanted small, easy money, or they needed a new car to commit a new crime.

We had to know what they did with the stolen cars. To do so, we measured how much time passed before criminals committed a crime with their stolen cars. Out of 5270 auto theft offenses and 4237 suspect vehicle cases, we found 48 exact matches. In these 48 cases, the average time before a stolen car was spotted at another crime scene
  • 21. was about 4 hours, not counting some cars that reappeared several days later. In particular, among the 48 cases, two cars were used to commit multiple crimes within a short time period. Since the cars were used in crimes only a few hours after they were stolen and were then sold, we could infer that criminals stole cars to make their crimes less traceable and to earn some quick cash.

Suspect vehicle year   Suspect vehicle make   Stolen vehicle year   Stolen vehicle make
2002                   Chevrolet              1999                  Ford
2010                   Dodge                  2004                  Dodge
2011                   Toyota                 1996                  Honda
2001                   Ford                   1996                  Honda
2008                   Nissan                 1984                  Oldsmobile

Table 2: Examples of suspects stealing less valuable cars

4.4 Questions and Goals

Our goals for next semester are as follows.

1. Upgrade the model used for Georgia Tech crimes so that it can be used for Atlanta crimes.
2. Develop a better algorithm for locating hot spots.
3. Find geographic matches and correlations among crimes.
4. Suggest an optimized way of allocating resources.
5. Recognize crime patterns.

To continue with our research, we need more information about crimes. We list below some questions that are preventing us from advancing.

1. Atlanta is divided into 5 zones, but the Excel data shows the place of each crime by latitude and longitude, not by zone. Given the coordinates of a place, is there a way of telling which zone that place is in?
  • 22.
2. Have the crime criteria changed over the past few years? In other words, is there a crime that was formerly considered a "type A" crime but is now considered "type B"?

Now that we are familiar with the data and have gained insights into crimes in Atlanta, we are confident that our progress will speed up. The amount of data was limited; however, we learned to make use of small data sets to reach noteworthy conclusions. We hope to establish a generalized algorithm that can be used in many cities.

4.5 Reference

Pictures of Voronoi Diagrams: https://www.youtube.com/watch?v=7eCrHAv6sYY