ICC Programming

Clustering and Networks

3506.2 - Significant locations in auxiliary data as seeds for typical use cases of point clustering

Monday, July 3
1:50 PM - 2:10 PM
Location: Maryland A

To avoid data crowding and overlapping marker icons, or to minimize load on the client's computer, clustering methods are often applied to point data in interactive maps. Instead of displaying a marker icon for each point, groups of points are aggregated into clusters, and a single marker icon is displayed for each group. Popular web mapping libraries and services offer point clustering as a basic feature that users can enable without any additional knowledge.
Usually, either greedy or grid-based clustering algorithms are used because of their low computational cost and general applicability. In greedy clustering, randomly chosen points seed clusters that aggregate their neighbors until all points are assigned to a cluster. In grid-based clustering, a polygonal grid, often made of squares, is placed over the area of interest and points are aggregated per cell.
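
The two standard approaches can be sketched as follows. This is a minimal illustration in Python, not the implementation of any particular mapping library; the function names, the planar coordinates, and the `cell_size` and `radius` parameters are assumptions made for the example.

```python
import random
from collections import defaultdict
from math import floor, hypot


def grid_cluster(points, cell_size):
    """Aggregate points per square grid cell; returns {cell_index: [points]}."""
    cells = defaultdict(list)
    for x, y in points:
        cells[(floor(x / cell_size), floor(y / cell_size))].append((x, y))
    return dict(cells)


def greedy_cluster(points, radius, rng=None):
    """Pick a random unassigned point as a seed, absorb all unassigned
    neighbors within `radius`, and repeat until every point is assigned."""
    rng = rng or random.Random(0)  # fixed seed only to keep the example repeatable
    remaining = list(points)
    clusters = []
    while remaining:
        seed = remaining.pop(rng.randrange(len(remaining)))
        members, rest = [seed], []
        for p in remaining:
            if hypot(p[0] - seed[0], p[1] - seed[1]) <= radius:
                members.append(p)
            else:
                rest.append(p)
        remaining = rest
        clusters.append((seed, members))
    return clusters


# Four nearby points that happen to straddle a corner of the grid.
points = [(9.5, 9.5), (10.5, 9.5), (9.5, 10.5), (10.5, 10.5)]
grid_result = grid_cluster(points, cell_size=10)
greedy_result = greedy_cluster(points, radius=2.0)
```

With these hypothetical inputs, the grid-based variant splits the four points across four cells, while the greedy variant groups them into a single cluster whose center is whichever point was drawn first.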

We argue that the cluster distributions resulting from these standard algorithms are often misleading, at least in maps where the spatial distribution of the points follows an underlying irregular pattern such as population density. A typical store locator map, as commonly displayed on company websites, should allow potential customers to determine whether a store is available at a certain location. Here, such clustering can drastically affect the validity of the map: a group of stores located in a city might be torn apart into separate clusters, leaving the city displayed as uncovered by any cluster. The city might sit just between the randomly chosen cluster centers of a greedy clustering approach, or right on the border between cells in a grid-based clustering. If this happens, the map fails its purpose of showing that stores exist in the city.

We present a new, straightforward clustering algorithm in which precalculated, weighted points at significant locations in auxiliary data are used as seeds for the clustering process. The seeds serve as “context-aware” cluster anchors for point data that are, or at least appear to be, closely correlated with the phenomena the auxiliary data represent. An example, and our primary use case, is the use of local maxima of population density as clustering seeds for point data related to population density, such as stores. The concept itself is applicable to other phenomena for which prior knowledge exists.

In our approach, the cluster seeding points are located at local maxima of population density. These local maxima can be calculated at varying scales from grid-based population maps or derived from datasets of populated places such as settlements or metropolitan areas. Each seed's weight is used to calculate the extent of its catchment area. Neighboring points of the dataset to be clustered are then aggregated into clusters per catchment area. Each seed's weight and its influence on the catchment area can be specified depending on the scale, allowing granular and dynamic control over suitable locations.
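
A rough sketch of such seed-based aggregation is given below. The abstract does not specify how a seed's weight translates into a catchment extent, so the circular catchment, the square-root rule in `catchment_radius`, the `scale_factor` parameter, the tie-breaking by relative distance, and the handling of points outside every catchment are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict
from math import hypot, sqrt


def catchment_radius(weight, scale_factor):
    """Assumed rule: the radius grows with the square root of the seed weight."""
    return scale_factor * sqrt(weight)


def seed_cluster(points, seeds, scale_factor):
    """Assign each point to a seed whose catchment contains it.

    seeds: list of (x, y, weight) tuples, e.g. local population maxima.
    Returns (clusters, leftovers), where clusters maps a seed index to its
    points and leftovers holds points outside every catchment.
    """
    clusters = defaultdict(list)
    leftovers = []
    for px, py in points:
        best, best_ratio = None, None
        for i, (sx, sy, weight) in enumerate(seeds):
            radius = catchment_radius(weight, scale_factor)
            distance = hypot(px - sx, py - sy)
            if radius > 0 and distance <= radius:
                # Assumed tie-break: prefer the seed that is closest
                # relative to the size of its catchment.
                ratio = distance / radius
                if best_ratio is None or ratio < best_ratio:
                    best, best_ratio = i, ratio
        if best is None:
            leftovers.append((px, py))
        else:
            clusters[best].append((px, py))
    return dict(clusters), leftovers


# Hypothetical data: a large and a small population maximum, and six stores.
seeds = [(10.0, 10.0, 4.0), (30.0, 10.0, 1.0)]
stores = [(9.5, 9.5), (10.5, 9.5), (9.5, 10.5), (10.5, 10.5),
          (30.5, 10.0), (20.0, 20.0)]
clusters, leftovers = seed_cluster(stores, seeds, scale_factor=1.0)
```

In this example the four stores around the larger seed stay together in one cluster anchored at that seed, the store near the smaller seed forms its own cluster, and the remote store is left unassigned. Changing `scale_factor` per zoom level is one conceivable way to make the catchment extents depend on the map scale.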

Prototyping and preliminary tests suggest that this approach can drastically improve the quality of point clustering for specific use cases. Detailed results are yet to be evaluated. Both an analysis of quantitative metrics and a study of user acceptance and task efficiency will be performed. Further work may include a free, global, multi-level dataset of suitable cluster seeding points for these purposes.

Johannes Kröger

Research Assistant
HafenCity University Hamburg, Lab for Geoinformatics and Geovisualization

After completing both a B.Sc. and an M.Sc. in Geomatics at HafenCity Universität Hamburg, Germany, Johannes Kröger now works as a research assistant in the Lab for Geoinformatics and Geovisualization at the same institution. His interests span a wide range of topics in cartography and geoinformatics. His recent work includes analysing spatial effects in gas pricing, the (re-)design and implementation of a tangible map interface for participatory urban planning, and visualising bathymetric data in a journalistic context.

Lawrence Stanislawski

U.S. Geological Survey

Lawrence Stanislawski is a cartographic research scientist for the Center of Excellence for Geospatial Information Science within the National Geospatial Program of the United States Geological Survey. He studied surveying and forest resource conservation at the University of Florida, and he currently lives in Rolla, Missouri. His research interests include generalization, multiscale representation, geospatial data accuracy, and high-performance computing.
