Imagine this scenario: you are driving to a city for a business meeting or to meet a friend. You look for a place to park your car, and you find a parking lot near your meeting location. So far so good! But since you are not familiar with the area, you ask yourself: "Is this neighborhood safe enough?"
Now imagine another scenario: the new Marvel movie came out this week! You want to go see it, but your friends are all busy and only let you know at the last minute. You try to book a ticket at your favorite cinema, but it's sold out, so you find a cinema with available tickets on the other side of town. The same question pops up: "Is this neighborhood safe?"
In this post, we are going to use data to answer this question. We aim to use crime reports to give you the information you need to make an informed decision. For this analysis, we have used police data from the city of Chicago. By analyzing the most recent crimes, we build visual representations of the areas with the most reports, highlighting "dangerous zones" for every hour of the day. Let's have a look at the data first, to understand what we have at hand.
Dataset
The dataset contains documented crime incidents in Chicago from 2001 to the present, excluding the most recent seven days, except for murders, where data is available for each victim. The information is sourced from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. To safeguard the privacy of crime victims, addresses are provided only at the block level, without pinpointing specific locations.
Let's see what has happened over the last 23 years in terms of crime types.
The dataset also contains information about the area of each reported crime.
For this analysis, we have used the data of the last five years, i.e., all records from 2019 until today. We will explore two use cases based on the type of crime and the location where it was reported.
| Use Case | Crime type | Crime location |
|---|---|---|
| Case 1 | Sexual assault | |
| Case 2 | Motor vehicle theft | |
For both use cases, the processing steps are the same; the only difference is the subset of data used in each case.
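As a rough sketch of that subsetting step in pandas (the file name and column labels are assumptions based on the city data portal's standard CSV export):

```python
import pandas as pd

# Load a CSV export of the Chicago crime dataset; "chicago_crimes.csv" and the
# column names below (Date, Primary Type, Latitude, Longitude) are assumptions
# based on the data portal's export format.
df = pd.read_csv("chicago_crimes.csv", parse_dates=["Date"])

# Keep only the last five years of reports.
df = df[df["Date"] >= df["Date"].max() - pd.DateOffset(years=5)]

# Drop records without a geolocation, since they cannot be assigned to a block.
df = df.dropna(subset=["Latitude", "Longitude"])

# Split into the two use cases; the exact "Primary Type" labels may vary
# between exports (e.g. "CRIM SEXUAL ASSAULT" in older records).
assaults = df[df["Primary Type"] == "CRIMINAL SEXUAL ASSAULT"]
thefts = df[df["Primary Type"] == "MOTOR VEHICLE THEFT"]
```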
Processing steps: visualizing crime data for safety
Let's begin by visualizing crime data for safety and awareness. To visualize the data, we can either create a heat map or partition the map into segments. Let's take a look at the overall vehicle theft dataset.
This is not very informative; it seems that the whole city is covered in crime. We will need a more granular visualization to get better intuition, which is of course slightly more complex.
First, we have to split the map into blocks. Each block is characterized by a bounding box, which carries its coordinates as well as temporal information about the types and frequencies of crimes reported inside it.
After partitioning the map into blocks, we compute, for each hour of the day, the frequency of the targeted crime type within each block over the last five years, e.g., between 13:00 and 14:00, 7 motor vehicle thefts took place over the last five years, and so on. Using this temporal information we can reason about the risk associated with each block.
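A sketch of this partitioning and counting step, continuing from the snippet above (the cell size is an assumed value, corresponding roughly to a few hundred meters):

```python
import numpy as np

CELL_SIZE = 0.005  # grid cell size in degrees; an assumed value

def assign_blocks(data):
    """Assign each report to a grid cell and extract the hour of the day."""
    data = data.copy()
    data["block_x"] = np.floor(data["Longitude"] / CELL_SIZE).astype(int)
    data["block_y"] = np.floor(data["Latitude"] / CELL_SIZE).astype(int)
    data["hour"] = data["Date"].dt.hour
    return data

thefts = assign_blocks(thefts)

# Crime counts per block and hour of day over the whole 5-year window,
# e.g. block (x, y), 13:00-14:00 -> 7 motor vehicle thefts.
hourly_counts = (
    thefts.groupby(["block_x", "block_y", "hour"])
          .size()
          .rename("crime_count")
          .reset_index()
)
```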
To estimate the risk value associated with each block, we have built a formula that incorporates information about the current block and its surrounding blocks, as well as its historical information through a decay factor. More analytically, the risk is described as follows:
The crime parameter of block "i" is the frequency of the targeted crime during time period "t". The α parameter is the weight applied to the sum over the neighboring blocks "j", and δ is the decay factor applied to the crime count of the previous time period "t-1". For the purposes of this analysis, α = 0.75 and δ = 0.5.
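Putting the pieces together, one way to read this description is the following recursive form (a reconstruction from the parameter descriptions above; the exact expression may differ, e.g. the decay could be applied to the previous hour's crime count rather than to the previous risk value):

$$
\text{risk}_i(t) = \text{crime}_i(t) + \alpha \sum_{j \in N(i)} \text{crime}_j(t) + \delta \cdot \text{risk}_i(t-1)
$$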
The neighborhood of block "i" consists of its surrounding blocks. In this analysis, we consider the blocks that are directly adjacent to block "i", i.e., all the blocks at level 1 as shown below. One could also include the blocks at level 2 with a smaller weight, and so on for level 3, etc.
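A sketch of the level-1 neighborhood lookup and the risk computation, using the reconstructed formula above and the `hourly_counts` table from the earlier snippet:

```python
ALPHA, DELTA = 0.75, 0.5  # weight and decay factors used in this analysis

# (block_x, block_y, hour) -> crime count, for fast lookup.
counts = {
    (row.block_x, row.block_y, row.hour): row.crime_count
    for row in hourly_counts.itertuples()
}
blocks = {(x, y) for (x, y, _) in counts}

def neighbor_sum(x, y, hour):
    """Sum of crime counts over the level-1 neighborhood (the 8 adjacent blocks)."""
    return sum(
        counts.get((x + dx, y + dy, hour), 0)
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )

# Roll the risk forward over the 24 hours of the day, carrying the decayed
# risk of the previous hour.
risk = {}
for x, y in blocks:
    previous = 0.0
    for hour in range(24):
        value = (
            counts.get((x, y, hour), 0)
            + ALPHA * neighbor_sum(x, y, hour)
            + DELTA * previous
        )
        risk[(x, y, hour)] = value
        previous = value
```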
Visualization
Now that we have formulated the risk factor of each block, let's have a look at the temporal behavior of the blocks for the two use cases. A threshold value has been applied to reduce the number of bounding boxes and highlight the areas with frequently reported crimes. This DOES NOT mean that other areas are completely safe, only that they have fewer or zero reported crimes.
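As an illustration, drawing such a map for a single hour with folium could look like this (the threshold and map settings are illustrative values, not the ones used for the figures below):

```python
import folium

THRESHOLD = 5.0  # illustrative cut-off on the risk score
HOUR = 13        # hour of the day to render

m = folium.Map(location=[41.88, -87.63], zoom_start=11)  # central Chicago
max_risk = max(risk.values(), default=1.0)

for (x, y, hour), value in risk.items():
    if hour != HOUR or value < THRESHOLD:
        continue
    # Recover the cell's bounding box from its grid indices.
    lat0, lon0 = y * CELL_SIZE, x * CELL_SIZE
    folium.Rectangle(
        bounds=[[lat0, lon0], [lat0 + CELL_SIZE, lon0 + CELL_SIZE]],
        color="red",
        weight=1,
        fill=True,
        fill_opacity=min(value / max_risk, 1.0),
    ).add_to(m)

m.save("risk_map_13h.html")
```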
For the first use case (sexual assaults), the risk map is the following (red indicates high risk):
Unsurprisingly, there is a high crime rate in the city center, but there are also reported crimes across the city, with some areas showing reported crimes at different hours of the day. If we set δ near 1, the bounding boxes persist throughout the day and become completely red where crimes recur.
For the second use case (motor vehicle theft), the behavior is similar, but there are far more reported crimes.
For the second use case, we can imagine that anti-theft insurance companies would charge extra for people who live downtown.
The way from visualization to risk estimation is to create an API service. The API would receive the time of the request and the geolocation, match them to the corresponding block, and return the risk score of that area.
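A minimal sketch of such a service (Flask is just an illustrative choice; the grid indexing and the `risk` scores are the ones computed above):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/risk")
def risk_endpoint():
    # The client sends a latitude, longitude and hour of the day.
    lat = float(request.args["lat"])
    lon = float(request.args["lon"])
    hour = int(request.args.get("hour", 12))
    # Same grid indexing that was used to build the blocks.
    x, y = int(lon // CELL_SIZE), int(lat // CELL_SIZE)
    score = float(risk.get((x, y, hour), 0.0))
    return jsonify({"block": [x, y], "hour": hour, "risk": score})

# Example request: GET /risk?lat=41.88&lon=-87.63&hour=13
```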
We are using reported crimes to create these estimates. Again, this DOES NOT mean that in areas with low or no reported crimes the risk is zero; rather, it is low. At the end of the day, we generate these scores from frequencies, so if crimes are not reported or not recorded, the risk scores do not depict the full picture.
Conclusion
In this post, we visualized the temporal behavior of crime activity for two different types of crimes. By extracting the frequency of crimes per block, we calculated a risk score based on the formula above. This score indicates high-risk areas throughout the day and can potentially help users avoid such areas or at least be very cautious. One possible extension is to incorporate all types of crimes within a block and design a formula that takes into account all the crimes within the block as well as its surrounding blocks to calculate a multi-aspect risk score.
** If you are interested in other ML use-cases, please contact me using the form (and also include a publicly available dataset for this case, I'm always curious to explore new problems).