Understanding and Visualizing Crime Incidents: A Yearly Breakdown

Data Analysis: Extracting Number of Occurrences Per Year

Understanding the Problem and Requirements

The Stack Overflow question behind this post is a data analysis task: given a CSV file of crime incidents, count how many incidents of a particular crime category occur each year. The goal is to show those yearly counts as a bar graph.

Background Information: Data Preprocessing

Before diving into the solution, it’s essential to understand some fundamental concepts in data analysis:

  • Data types: pandas provides two core structures, Series (one-dimensional) and DataFrame (two-dimensional). When a CSV file is read, the Date column is normally loaded as plain strings, so it has to be converted to datetime values before the year can be extracted.
  • Data manipulation libraries: Pandas (import pandas as pd) is a powerful library used for data manipulation and analysis.
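To make the data-type point concrete, here is a small sketch on a made-up three-row table. The column names mirror those used later in this post; the values and the date format are assumptions for illustration, not taken from the real file.

```python
import pandas as pd

# Hypothetical sample rows; the real CSV is assumed to have similar columns.
df = pd.DataFrame({
    "IncidntNum": [1, 2, 3],
    "Category": ["THEFT", "ASSAULT", "THEFT"],
    "Date": ["01/15/2015", "06/30/2015", "02/01/2016"],
})

# A single column is a Series; the whole table is a DataFrame.
dates = df["Date"]
print(type(df).__name__, type(dates).__name__)

# Before conversion the column holds plain strings.
is_datetime_before = pd.api.types.is_datetime64_any_dtype(df["Date"])

# After conversion it holds datetime64 values.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
is_datetime_after = pd.api.types.is_datetime64_any_dtype(df["Date"])

print(is_datetime_before, is_datetime_after)
```

Once the column is a datetime type, the year of each row is available through `df["Date"].dt.year`.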

Step 1: Import Libraries and Load Data

To solve this problem, we need to import the necessary libraries and load the CSV file:

import pandas as pd
import matplotlib.pyplot as plt

Next, we’ll create a function that loads the data from the CSV file and performs any necessary preprocessing steps.

def load_data(file_name):
    try:
        # Load data from the CSV file into a DataFrame object.
        data = pd.read_csv(file_name)
        
        return data
    
    except FileNotFoundError:
        print("File not found. Please check the file path.")
        return None
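To try the loader's core call without a real file, the sketch below feeds `pd.read_csv` an in-memory CSV. The header matches the columns used in this post, but the rows are invented for illustration.

```python
import io
import pandas as pd

# Invented sample content standing in for the real CSV file.
csv_text = """IncidntNum,Category,Date
1001,THEFT,01/15/2015
1002,ASSAULT,06/30/2015
1003,THEFT,02/01/2016
"""

# read_csv accepts any file-like object, not only a path on disk.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)             # (rows, columns)
print(sample.columns.tolist())  # column names taken from the header row
```

This can be handy for checking the later steps on a few rows before running them on the full data set.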

Step 2: Preprocess Data

The next step is to preprocess the data by converting the Date column into a suitable format for analysis:

def preprocess_data(data):
    # Convert 'Date' column to datetime type.
    data['Date'] = pd.to_datetime(data['Date'])
    
    return data
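Real-world date columns often contain blanks or malformed entries, and `pd.to_datetime` raises an error on those by default. One possible way to handle this, assuming it is acceptable to drop such rows, is `errors='coerce'`, which turns unparseable values into NaT. The sample values and the explicit format string below are assumptions.

```python
import pandas as pd

raw = pd.DataFrame({"Date": ["01/15/2015", "not a date", "02/01/2016"]})

# Unparseable entries become NaT instead of raising an exception.
raw["Date"] = pd.to_datetime(raw["Date"], format="%m/%d/%Y", errors="coerce")

# Drop the rows whose date could not be parsed.
clean = raw.dropna(subset=["Date"])

print(len(raw), len(clean))
```

Whether silently dropping rows is appropriate depends on the data; counting the NaT values first shows how much would be lost.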

Step 3: Group Data by Year and Count Occurrences

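To extract the number of occurrences per year, we can use the groupby method together with count. Note that set_index returns a new DataFrame rather than modifying the original, so its result has to be assigned:

def group_by_year(data):
    # Set 'Date' as the index; set_index returns a new DataFrame.
    data = data.set_index('Date')
    
    # Group by the year of each index entry and count non-null values per column.
    grouped_data = data.groupby(data.index.year).count()
    
    return grouped_data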
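As a sketch of what the grouping produces, the example below counts rows per year on a tiny invented table. It groups on `Date.dt.year` directly rather than on the index, which is an alternative to the index-based approach, and it uses `size()` so that the result is a single Series of row counts.

```python
import pandas as pd

df = pd.DataFrame({
    "IncidntNum": [1, 2, 3, 4],
    "Date": pd.to_datetime(
        ["2015-01-15", "2015-06-30", "2016-02-01", "2016-03-09"]
    ),
})

# .dt.year extracts the year from each datetime value.
# size() counts rows per group; count() would instead count
# non-null values separately for every column.
per_year = df.groupby(df["Date"].dt.year).size()

print(per_year.to_dict())
```

With `count()`, the same information appears once per column, which is why the plotting step has to pick a single column such as IncidntNum.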

Step 4: Plot Data

Now that we have the yearly counts, let’s create a bar graph to visualize the number of occurrences per year:

def plot_data(grouped_data):
    # Select the 'IncidntNum' column; after count() it holds the rows per year.
    counts = grouped_data['IncidntNum']
    
    # Plot as a bar chart with one bar per year.
    counts.plot(kind='bar')
    
    return counts
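A bar chart is easier to read with axis labels and a title. The sketch below shows one way to add them with matplotlib; the counts are invented, and the `Agg` backend is chosen only so that the example also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented yearly counts, indexed by year.
counts = pd.Series([120, 95, 143], index=[2014, 2015, 2016])

# Draw onto an explicit Axes object so it can be labelled afterwards.
fig, ax = plt.subplots()
counts.plot(kind="bar", ax=ax)

ax.set_xlabel("Year")
ax.set_ylabel("Number of incidents")
ax.set_title("Incidents per year")

fig.tight_layout()
```

In an interactive session, calling `plt.show()` afterwards displays the figure.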

Step 5: Filter Data by Category

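To restrict the data to one category, we can use the loc indexer with a boolean mask. To count a single category per year, this filter has to be applied before the grouping step: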

def filter_by_category(data, category):
    filtered_data = data.loc[data['Category'] == category]
    
    return filtered_data
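The equality test in this filter is case-sensitive, so 'theft' will not match 'THEFT'. If the capitalisation in the file is uncertain, one option is to normalise both sides before comparing. The category values below are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "IncidntNum": [1, 2, 3],
    "Category": ["THEFT", "Assault", "theft"],
})

wanted = "Theft"

# Exact comparison finds nothing because the case differs.
exact = df.loc[df["Category"] == wanted]

# Lower-casing both sides makes the match case-insensitive.
relaxed = df.loc[df["Category"].str.lower() == wanted.lower()]

print(len(exact), len(relaxed))
```

Listing `df["Category"].unique()` first is a quick way to see which spellings actually occur in the data.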

Step 6: Combine Code into a Single Function

Here is the complete code combined into a single function:

import pandas as pd
import matplotlib.pyplot as plt

def get_occurrences_per_year(file_name, category):
    def load_data(file_name):
        try:
            return pd.read_csv(file_name)
        
        except FileNotFoundError:
            print("File not found. Please check the file path.")
            return None
    
    def preprocess_data(data):
        data['Date'] = pd.to_datetime(data['Date'])
        return data
    
    def filter_by_category(data, category):
        return data.loc[data['Category'] == category]
    
    def group_by_year(data):
        # set_index returns a new DataFrame, so keep the result.
        data = data.set_index('Date')
        return data.groupby(data.index.year).count()
    
    def plot_data(grouped_data):
        counts = grouped_data['IncidntNum']
        counts.plot(kind='bar')
        return counts
    
    data = load_data(file_name)
    
    if data is None:
        print("Data loading failed.")
        # Return a pair so the caller can always unpack two values.
        return None, None
    
    data = preprocess_data(data)
    
    # Filter first so the yearly counts cover only the chosen category.
    filtered_data = filter_by_category(data, category)
    
    # Group by year and count occurrences.
    grouped_data = group_by_year(filtered_data)
    
    # Plot as a bar chart.
    counts = plot_data(grouped_data)
    
    return counts, filtered_data

# Test the function.
file_name = 'crimes.csv'
# The category must match a value in the 'Category' column exactly.
category = 'theft'

plot, filtered = get_occurrences_per_year(file_name, category)

if plot is not None and filtered is not None:
    plt.show()

This code first loads the data from the CSV file into a DataFrame and converts the Date column to datetime type.

Next, it filters the data by category using the loc indexer, so that only rows of the requested category remain.

It then groups the filtered rows by year and counts the occurrences using groupby together with count.

Finally, it plots the yearly counts as a bar chart.

The test case uses the ‘crimes.csv’ file and the ‘theft’ category.
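If counts for every category are wanted at once rather than for a single category, `pd.crosstab` is one way to build a year-by-category table. This goes slightly beyond the function in this post and uses invented rows, so treat it as a sketch rather than part of the original solution.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["THEFT", "ASSAULT", "THEFT", "THEFT"],
    "Date": pd.to_datetime(
        ["2015-01-15", "2015-06-30", "2016-02-01", "2016-03-09"]
    ),
})

# Rows are years, columns are categories, cells are incident counts.
table = pd.crosstab(df["Date"].dt.year, df["Category"])

print(table)
# table.plot(kind="bar") would draw one group of bars per year.
```

Combinations that never occur, such as a category with no incidents in a given year, appear as 0 in the table.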


Last modified on 2024-02-06