Kovachev, Stefan, Patrick Reichert, and Hendrik Speck. Crimeblips - Web Based Framework for Crime Incident Analysis and Visualization. 10th International Conference on Information Integration and Web-based Applications and Services (iiWAS2008). November 24-26, 2008. Linz (Austria).

Crimeblips - Web Based Framework for Crime Incident Analysis and Visualization.

Stefan Kovachev, Patrick Reichert, and Hendrik Speck. University of Applied Sciences Kaiserslautern. Zweibrücken, Germany.

iiWAS2008, November 24–26, 2008, Linz, Austria. (c) 2008 ACM 978-1-60558-349-5/08/0011 $5.00.

ABSTRACT. The crime mapping framework Crimeblips[1] provides up-to-date crime statistics for neighborhoods throughout Berlin (Germany)[2]. The application maps, visualizes, and analyzes crime incident patterns, allowing users to identify crime hot spots, trends and general patterns. The occurrences, events and data are extracted from public press releases of the City of Berlin Police Department.

Bayesian algorithms are used to extract relevant information; the resulting statistics and visualizations can be filtered and categorized, and addresses are mapped to the nearest known location. The interpretation of mapped and visualized results as well as further conclusions should be discussed with the local police departments. Future versions of the semester project[3] will provide comparisons with incident descriptions in the media, published law enforcement agency data, schools and shopping centers as well as other third party information including real estate prices and indicators.

Categories and Subject Descriptors: H.2.8 [Database Applications]: Data mining, Scientific databases, Spatial databases and GIS, Statistical databases; H.3.1 [Content Analysis and Indexing]: Abstracting methods, Dictionaries, Indexing methods, Linguistic processing, Thesauruses; H.3.3 [Information Search and Retrieval]: Clustering, Information filtering; H.3.4 [Systems and Software]: Current awareness systems (selective dissemination of information--SDI), User profiles and alert services; H.3.5 [Online Information Services]: Web-based services; H.5.1 [Multimedia Information Systems]: Hypertext navigation and maps; H.5.2 [User Interfaces]: Graphical user interfaces (GUI), Interaction styles (e.g., commands, menus, forms, direct manipulation), Screen design (e.g., text, graphics, color)

General Terms: Algorithms, Measurement, Documentation, Performance, Design, Security, Human Factors, Languages, Theory, Legal Aspects

Keywords: Crimeblips, crime statistics, Kriminalstatistik, Polizei, Berlin, mapping, visualizing, map, analysis, visualization, hot spots, trends, incident patterns, police department, law enforcement, data mining, GIS, indexing, linguistic processing, clustering, filtering, web based services, graphical user interface, University of Applied Sciences Kaiserslautern

1. INTRODUCTION. The evolution of the web and the proliferation of data not only allow access to new types of entertainment and commerce, but also allow content providers to mix data and data layers, generating a completely different user experience.

The analysis of crime incident data has become a vital tool for many police departments[4]. In Germany there are several information systems used to gather and analyze data for law enforcement purposes, including: RIVAR (Rheinland-Pfälzisches Informations, Vorgangsbearbeitungs, Auswert- und Recherchesystem), POLADIS.net (Polizeiliches Anwenderorientiertes Dezentrales Informationssystem), POLIS.net (Polizeiliches Informations- und Fahndungssystem), ZEVIS.net (Zentrales Verkehrs Informationssystem).

Most of these applications are not available to the public; very few police departments inform citizens through web applications about crime incidents in specific regions. Independent parties however decided to gather crime incidents and publish crime maps for different locations. Those websites are very popular in the United States[5][6] but similar applications in Europe exist only in Great Britain. Different legal systems and privacy laws limit access to crime incidents, police reports or court records.

The Crimeblips framework attempts to close that gap for informed citizens by parsing official crime and press reports of the City of Berlin (Germany) police department. Crimeblips analyzes and classifies the text documents, visualizes the results on a map oriented web interface, and adds statistical data for all city districts. The framework is an open source project hosted by sourceforge.net where it can be downloaded for free.

2. DATA AND RELIABILITY. Crimeblips relies on the publicly available press reports of police departments as its main source of information. Local police departments, however, do not always provide information about each individual incident; press publications and public awareness must instead be understood within the legal and social framework of law enforcement.

Press releases can only include incidents which are actually reported to the police, and even then police press reports only include major crimes or incidents. Smaller crimes, which generally consume most law enforcement resources, are in most cases not included or covered.

The process of press release generation generally also limits the coverage: Press releases are not the result of an automated process directly tied to individual events or incidents, but are created by specific public relations officials within the police departments. As a result of the distribution of tasks and workloads within law enforcement, the volume of press releases within a given time frame (for instance per day) remains more or less constant; there is not always a direct correlation to the actual crime landscape or situation.

Additional social expectations, the resources available to the police, as well as political pressure can further influence the coverage and availability of police press reports, resulting in the over- or underrepresentation of certain crime types, districts or incidents. Other factors influencing the reliability of the source data are the partially overlapping and shifting responsibilities of individual police departments, administrative structures and city districts. These inconsistencies or developments can be used, depending on the individual, social or political intention, to support or undermine the discourse of crime mapping and analysis.

Police press reports are further influenced by the current status, norms, or morale; changes in perception patterns, the social contract or accepted tolerance levels further affect long-term trends and reliability. Examples include teenage violence, sexuality, and drug consumption, which have seen different social interpretations and corresponding law enforcement measures in the last decades.

With these limitations in mind, it is vital to understand that official police press reports are part of the public relations efforts of police departments. They are specifically targeted towards the press and other mass media; they pass through several limiting and filtering structures; and they overrepresent certain crime types and incidents through the exclusion of other, often minor crimes and incidents.

3. TECHNOLOGY AND FUNCTIONALITY. 3.1 Information retrieval: The Crimeblips framework consists of two modules: a crawler and analysis module, and a visualization and interaction module. The crawler and analysis module is an open framework responsible for data extraction and analysis; the visualization and interaction module provides the framework for data visualization, user interaction and statistical analysis.

All crime incidents are extracted from the website of the Berlin police department featuring public press releases. The content of these items is in most cases unstructured, pure text and natural language, which impedes the data collection process. The following example features a press report from October 15, 2008.[7]

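Pressemeldung. Eingabe: 15.10.2008 - 11:55 Uhr. Laute Musik führte zu Cannabisplantage. Treptow-Köpenick. # 3175. Eine Cannabisplantage entdeckten Polizeibeamte in der vergangenen Nacht in Oberschöneweide, nachdem sie wegen ruhestörenden Lärms gerufen worden waren. Mieter eines Hauses in der Parsevalstraße hatten gegen 21 Uhr 30 die Polizei alarmiert, da sie sich durch laute Musik gestört fühlten. … Bei der Überprüfung des 20-jährigen Wohnungsinhabers stellte sich zudem heraus, dass er per Haftbefehl gesucht wurde. ...

(English translation: Press release. Entered: 15.10.2008 - 11:55. Loud music led to cannabis plantation. Treptow-Köpenick. # 3175. Police officers discovered a cannabis plantation last night in Oberschöneweide after being called because of a noise disturbance. Tenants of a building in Parsevalstraße had alerted the police at around 9:30 p.m. because they felt disturbed by loud music. … A check of the 20-year-old tenant of the apartment also revealed that he was wanted under an arrest warrant. ...)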

The press items released on the website of the Berlin police department are unstructured and created manually, rendering the already complex process of data extraction more difficult. Our framework therefore uses different pattern matching and text analysis algorithms to ensure that the press releases are correctly parsed and categorized.

Figure 1. The process of collecting data from the internet

For performance reasons the crawler module was developed to run in a multithreaded environment. The crawler performs several tasks, including crawling, parsing, analyzing and indexing different types of web content such as press releases; these tasks are represented by the components Crawler, Subcrawler, EventCatcher, ItemWriter, and ItemCreator (see Figure 1). This includes crawling individual web pages and extracting relevant information from them using predefined pattern sequences, analyzing that data, and ultimately storing it persistently. Third-party content gathered by our crawling framework and included in our application comprises a database of locations, street names and geographic identifiers for the target region.
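The pattern-based extraction step can be illustrated with a minimal Python sketch. It is not the Crimeblips implementation: the regular expression, the function names parse_release and parse_all, and the field names are assumptions made for illustration, modeled only on the header layout of the example press release above. A thread pool stands in for the multithreaded crawler.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime

# Hypothetical header pattern; the actual pattern sequences are not given in the paper.
HEADER = re.compile(
    r"Eingabe:\s*(?P<date>\d{2}\.\d{2}\.\d{4})\s*-\s*(?P<time>\d{2}:\d{2})\s*Uhr\.\s*"
    r"(?P<title>.+?)\.\s*(?P<district>[^.#]+?)\.\s*#\s*(?P<number>\d+)\.\s*(?P<body>.*)",
    re.DOTALL,
)


def parse_release(text):
    """Extract structured fields from one press release, or return None."""
    m = HEADER.search(text)
    if m is None:
        return None
    entered = datetime.strptime(f"{m['date']} {m['time']}", "%d.%m.%Y %H:%M")
    return {
        "entered": entered,
        "title": m["title"].strip(),
        "district": m["district"].strip(),
        "number": int(m["number"]),
        "body": m["body"].strip(),
    }


def parse_all(texts, workers=4):
    """Parse many releases concurrently and drop those that do not match."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [item for item in pool.map(parse_release, texts) if item is not None]
```

Releases that do not match the header pattern are dropped here; a production crawler would more likely log them for manual review.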

Information such as the crime type or the precise location of the incident had to be extracted with the help of various algorithms, including Bayesian algorithms based on probability calculations. Bayes' theorem, in the context of the Crimeblips application, says that the probability that a crime item is of a specific type, given that it has certain words in its description, is equal to the probability of finding those words in a description of that type, times the probability that any crime item fits in that type category, divided by the probability of finding those words in any crime press release:

P(type | words) = P(words | type) × P(type) / P(words)
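The verbal statement of Bayes' theorem above can be turned into a small classifier. The following Python sketch is one common way to do this, not the classifier used in Crimeblips: it assumes that words are independent given the crime type (the "naive" Bayes assumption) and applies Laplace smoothing, neither of which is stated in the paper. The class name NaiveBayes and the training texts in the usage note are made up for illustration. Because P(words) is the same for every type, it is dropped when types are compared.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Illustrative word-based classifier; not the Crimeblips implementation."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # type -> word -> count
        self.doc_counts = Counter()              # type -> number of training items
        self.vocabulary = set()

    def train(self, category, text):
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def log_scores(self, text):
        """Return log(P(words | type) * P(type)) for every known type."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log P(type)
            for word in words:
                # Laplace smoothing keeps unseen words from zeroing the product.
                score += math.log(
                    (counts[word] + 1) / (total_words + len(self.vocabulary))
                )
            scores[category] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)
```

After training with, for example, train("Arson", "fire arson flames") and train("Robbery", "robbery stolen wallet"), calling classify("fire in apartment") selects the type whose training words best match the text.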

3.2 Crime Categorization. Once that algorithm is implemented in the crawler our framework is capable of analyzing the textual content of individual crime items and calculating the probability for several predefined categories. For the Bayesian module to work it must be configured with a certain number of words. An enhanced dictionary dramatically improves the quality of our matching algorithms. Table 1 features the main categories or crime types within Crimeblips and the number of press releases gathered until the writing of this article.

Table 1. Categories with the current number of items in them
Category    Items
Arson       78
Murder      8
Robbery     154
Vandalism   45
Drugs       14
Others      53
Assault     64
Messages    131

3.3 Geocoding. To identify the exact address of crime incidents the crawler module uses address cases. These are objects that describe the different ways in which an address can appear in textual content. They are ordered by their abstraction level (for example a street with a street number, an intersection of two streets, a location between two streets, or a train station without street information) and are evaluated starting from the most precise case. A dictionary of street names and locations within the target region provides the basis for the address extraction.

Each address case is then evaluated against the textual context of the press item, and if a match occurs, the algorithm tries to identify the exact address. Once the address is extracted, the last phase of the positioning process, geocoding, begins, converting the textual representation of an address into geographical coordinates.
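The idea of address cases evaluated from most to least precise can be sketched in Python as an ordered list of patterns built from a street dictionary. The case names, the regular expressions, and the two dictionary entries are assumptions chosen for illustration; the paper does not specify how its address cases are implemented.

```python
import re

# Illustrative excerpt of a street dictionary (assumed entries).
STREETS = ["Parsevalstraße", "Wilhelminenhofstraße"]


def build_address_cases(streets):
    """Return (name, pattern) pairs ordered from most to least precise."""
    # Longest names first so a longer street name wins over a shorter prefix.
    s = "|".join(re.escape(x) for x in sorted(streets, key=len, reverse=True))
    return [
        ("street_number", re.compile(rf"(?P<street>{s})\s+(?P<number>\d+[a-z]?)\b")),
        ("intersection", re.compile(rf"(?P<street>{s})\s+Ecke\s+(?P<cross>{s})")),
        ("street_only", re.compile(rf"(?P<street>{s})")),
    ]


def extract_address(text, cases):
    """Return the first, and therefore most precise, matching address case."""
    for name, pattern in cases:
        m = pattern.search(text)
        if m:
            return {"case": name, **m.groupdict()}
    return None
```

For the sentence "Mieter eines Hauses in der Parsevalstraße hatten gegen 21 Uhr 30 die Polizei alarmiert" only the least precise case matches, so the sketch yields the street name without a house number.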

Geocoding generally implements geographical location extraction algorithms as defined in the US Patent 7257570. Because of performance considerations our address cases are sent directly to the Google Geocoder web service and if the result is precise enough, the coordinates are stored for that item.
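The "precise enough" check can be sketched as follows. This is a hedged illustration only: geocode stands for any callable that wraps a geocoding web service, and both the (lat, lng, accuracy) result format and the min_accuracy threshold are hypothetical, not the interface of the Google Geocoder.

```python
def geocode_item(item, address, geocode, min_accuracy=6):
    """Store coordinates on `item` only if the geocoder result is precise enough.

    `geocode` is any callable returning (lat, lng, accuracy) or None.
    The accuracy scale and the default threshold are assumptions.
    """
    result = geocode(address)
    if result is None:
        return False
    lat, lng, accuracy = result
    if accuracy < min_accuracy:
        return False  # too coarse, e.g. only the city or district was resolved
    item["lat"], item["lng"] = lat, lng
    return True
```

Passing the geocoder in as a parameter keeps the precision check independent of the web service, so it can be exercised with a stub function.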

4. VISUALIZATION. Crimeblips is capable of gathering and processing the several hundred press releases and incident reports published by the Berlin police department within one month. Representing all of these crime incidents or accidents on one map view, however, would overwhelm the user and result in a less satisfying user experience. Crimeblips therefore provides a comprehensive map view and interaction system that gives users an overview of trends and events, while also providing access to the detailed information hidden beneath the clutter of crime incident objects.

All crime incident objects pass a clustering phase that makes it easier for the users to identify individual crime activities, types, or regional trends without being overwhelmed with information. The clustering algorithm also addresses issues arising from usability and information aesthetics.

The clustering algorithm merges crime items within a certain area depending on zoom level and user perspective in a way that still conveys qualitative and quantitative information about the contained incidents. The number of individual crime incidents or accidents is represented by a numeric indicator and by different scales or sizes of the symbols (police hat) used to represent crime clusters (see Figure 3). Individual crime incidents or accidents are represented by specific category icons located at the address identified. Further development of the application will focus on the qualitative aspect of the cluster visualization, including but not limited to dynamically created cluster icons which will feature information about the contained item types and categories.

Figure 3. Visualization of crime incidents
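A zoom-dependent clustering step of this kind can be sketched with a simple grid. The paper does not describe the actual algorithm, so the grid approach, the function name cluster_incidents, the cells_per_tile parameter and the incident fields below are all assumptions. The sketch also ignores the latitude distortion of the map projection.

```python
import math
from collections import Counter, defaultdict


def cluster_incidents(incidents, zoom, cells_per_tile=4):
    """Group incidents into grid cells whose size shrinks as the zoom level grows."""
    # One map tile spans 360 / 2**zoom degrees of longitude at a given zoom level.
    cell = 360.0 / (2 ** zoom) / cells_per_tile
    buckets = defaultdict(list)
    for inc in incidents:
        key = (math.floor(inc["lat"] / cell), math.floor(inc["lng"] / cell))
        buckets[key].append(inc)

    clusters = []
    for members in buckets.values():
        n = len(members)
        clusters.append({
            "lat": sum(m["lat"] for m in members) / n,  # cluster centroid
            "lng": sum(m["lng"] for m in members) / n,
            "count": n,                                 # numeric indicator
            "categories": Counter(m["category"] for m in members),
        })
    return clusters
```

The count field corresponds to the numeric indicator on a cluster symbol, and the categories counter is the kind of information a dynamically created cluster icon could display.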

Another feature of the framework is a filtering system that reduces the complexity of the visualization and allows users to focus on specific crime types, date ranges, time periods and city districts. Users can choose how data is presented on the map, select different map types, zoom and detail levels, and concentrate on the information that they find important (see Figure 4).

Figure 4. The crime items filter
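A filter over crime type, district and date range can be sketched in Python as follows. The class and field names (Incident, IncidentFilter, category, district, day) are hypothetical and chosen only to illustrate the idea; a criterion left unset does not restrict the result.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Set


@dataclass
class Incident:
    category: str
    district: str
    day: date


@dataclass
class IncidentFilter:
    """Each criterion is optional; None means 'do not filter on this field'."""
    categories: Optional[Set[str]] = None
    districts: Optional[Set[str]] = None
    start: Optional[date] = None
    end: Optional[date] = None

    def matches(self, inc):
        if self.categories is not None and inc.category not in self.categories:
            return False
        if self.districts is not None and inc.district not in self.districts:
            return False
        if self.start is not None and inc.day < self.start:
            return False
        if self.end is not None and inc.day > self.end:
            return False
        return True

    def apply(self, incidents):
        return [inc for inc in incidents if self.matches(inc)]
```

Combining the criteria with a logical AND mirrors a filter panel in which every selected option narrows the set of items shown on the map.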

Crimeblips further integrates several statistics and comparisons of crime types, trends, and city districts. Statistical charts combine data provided by external sources including population, as well as the percentage of foreigners or unemployed citizens within a certain region, providing users with a rather rudimentary option to identify possible trends and general patterns.

5. RISKS AND CHANCES. Future versions will provide comparisons with incident descriptions in the media, published law enforcement agency data, schools and shopping centers as well as other third party information including real estate prices and indicators. It will be possible to filter that data in many ways allowing users to concentrate only on information that best suits their needs from a vast pool of possible sources.

The spatial turn of crime statistics and serious application frameworks such as Crimeblips must rely on the data provided by law enforcement agencies and police departments. If such ideas or concepts, based on notions such as transparency, accountability and the informed citizen, are to provide valuable (information) services for our communities, it is vital that all relevant data is made available to the platform, that all possible indicators and influencing factors are included or eliminated, and that administration and politics are willing to discuss, act and prevent based upon the findings and results.

Widespread usage and a more complex implementation of frameworks such as Crimeblips might encourage citizens to discuss and change the environmental conditions they are living in, leading to a situation where a friendly competition between individual city districts and areas can dramatically increase the quality of life and foster community spirit and neighborhood support, resulting in better educational systems and reduced crime.

6. CONCLUSION. Information on the internet is mostly unordered, and even where a source featuring structured information is found, it is still very likely that such a resource does not contain all the required information. Most valuable data is contained within unstructured texts and websites, which makes it difficult to extract and categorize the proper information. That problem can be solved with the use of algorithms that find semantic connections and connotations in unordered textual information.

An application like Crimeblips bridges societal and user expectations as well as structured and unstructured data by providing meaningful layers and representations through web interfaces. Crime mapping applications rely on the availability of data and data layers, gathered by law enforcement agencies or other public services, which in many cases have been paid for by the taxpayer. Still, very often the data is not made available to the public, or only in a form that makes it impossible for individuals to get an idea of the overall trends or figures.

In order to improve the information quality on the web, in order to support such applications with meaningful data and indicators, in order to enhance transparency and accountability, and in order to allow users to make informed choices leading to improvements of our way of living, information and service providers such as traffic channels[8] [9] [10] [11], bus and railroad companies [12] [13] [14] or airports[15] should make their data available to the public through convenient and efficient web services.

The crime mapping application Crimeblips tries to accomplish just that – to extract information from public sources and to combine it with additional data showing relations that are normally not visible. The application includes sophisticated algorithms which enhance the quality of the data in the eyes of the user, provides a convenient and efficient interface, and enjoys growing popularity amongst its users.

[1] CrimeBlips. Sourceforge. http://ccrime.sourceforge.net/
[2] CrimeBlips. Prototype. http://crimeblips.informatik.fh-kl.de/
[3] Fachhochschule Kaiserslautern. University of Applied Sciences. http://www.fh-kl.de
[4] Töpfer, Eric. "Daten, Karten, Lagebilder. Mit dem 'spatial turn' in der Polizeiarbeit schreitet auch ihre Geoinformatisierung voran." Telepolis. April 23, 2008, Available: http://www.heise.de/tp/r4/artikel/27/27741/1.html
[5] Outside.In. http://www.outside.in/
[6] YourStreet. http://www.yourstreet.com/
[7] Polizei Berlin. Laute Musik führte zu Cannabisplantage. Pressemeldung. Eingabe: 15.10.2008 - 11:55 Uhr. Treptow-Köpenick. # 3175. http://www.berlin.de/polizei/presse-fahndung/archiv/111694/index.html
[8] Meldungen des Verkehrsfunks (TMC). http://www.berlin.de/polizei/verkehr/stau/index.html
[9] VMZ Berliner Verkehrsgeschehen. http://www.vmz-berlin.de/
[10] Verkehrslenkung Berlin. http://www.stadtentwicklung.berlin.de/verkehrslenkung/verkehrsbehinderungen/
[11] Baustellen in Brandenburg. http://www.ls.brandenburg.de/cms/detail.php/lbm1.c.361298.de
[12] BVG. http://www.bvg.de/index.php/de/Bvg/TrafficReportAll
[13] S-Bahn. http://www.s-bahn-berlin.de/bauinformationen/index.html
[14] Bahn. http://bauarbeiten.bahn.de/berlin-bb/
[15] Berliner Flughafen. http://www.berlin-airport.de/