Visualising geospatial data using SVG, Python and JavaScript/d3

Geography is an immensely important tool in the modern business environment. Doing business globally was the first wave of the globalised economy, and the widespread use of internet-based services and e-commerce only accelerated and solidified the notion. The next wave came with mobility, in the form of location-based services. Today anyone can create location-based tools that span the globe, and many companies do, but in my view this is akin to a car manufacturer producing a single model for Chinese farmers and European urban commuters alike. Hyperlocality, the concept that each location is worthy of particular attention and has unique characteristics and inherent value that are impossible to tap by going ‘global’, is slowly becoming a major trend. In 2009, while bootstrapping AthensBook, ThessBook and GEO|ADS, it was our only choice.

For us geography is everything. We didn’t start global or even regional. We started local, paying attention to the needs of the people in Athens and Thessaloniki, and expanded our feature set accordingly, with limited resources and funding. The importance of understanding geography is amplified when your geographic realm is a city, as opposed to a country or region. When we launched GEO|ADS in early 2009, the business world didn’t really know what to do with it. Even today, mobile advertising is still in the process of coming up with standards, conceptual or technical, and the world is still trying to understand how to use it and how to extract value from it. With GEO|ADS we were, in 2009, the first platform to provide meaningful, consistent, high-resolution spatial analytics for Athens and Thessaloniki to our customers. We always thought this was a fundamental point where we could contribute valuable, differentiated feedback, compared to the Web or traditional media.

Our spatial analytics reports have long been generated, largely automatically, as KML files. KML is a de facto XML-based standard that originated alongside Keyhole’s Earth Viewer. Keyhole was a company funded by In-Q-Tel, the CIA’s venture capital appendage, and focused on a single (publicly offered) product, the Earth Viewer. The application was very much ahead of its time, and it was only the acquisition of the company by Google in 2004 that turned it into a product enjoyed by the masses: Google Earth. Choosing KML was an easy decision for us, as the format is open and the output is usable by everyone: Google Earth is free and infinitely more accessible, usable and engaging than your average GIS application. It is also deeply interactive, as users can zoom, pan and rotate around regions of interest extremely quickly, allowing business development managers and marketers to choose locations for campaigns, expansion and targeting more easily and quickly than would otherwise have been possible. Being XML-based meant that we could write a relatively short Python script and leverage all the amazing facilities KML and Google Earth provide.
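
To give a flavour of why an XML-based format keeps such a script short, here is a minimal sketch (not our actual report generator) that writes a single filled grid cell as a KML Placemark using only Python's standard library. The function names, the cell coordinates and the colour are made up for illustration; the element names are the standard KML ones.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def grid_cell_placemark(name, west, south, east, north, kml_colour):
    """Build a KML Placemark holding one rectangular, filled grid cell."""
    placemark = ET.Element("Placemark")
    ET.SubElement(placemark, "name").text = name

    # Inline style; KML colours are hex strings in aabbggrr order.
    style = ET.SubElement(placemark, "Style")
    poly_style = ET.SubElement(style, "PolyStyle")
    ET.SubElement(poly_style, "color").text = kml_colour

    polygon = ET.SubElement(placemark, "Polygon")
    boundary = ET.SubElement(polygon, "outerBoundaryIs")
    ring = ET.SubElement(boundary, "LinearRing")
    # KML coordinates are "lon,lat,alt" tuples; the ring must be closed.
    corners = [(west, south), (east, south), (east, north),
               (west, north), (west, south)]
    ET.SubElement(ring, "coordinates").text = " ".join(
        f"{lon},{lat},0" for lon, lat in corners)
    return placemark


def build_kml(cells):
    """cells: iterable of (name, west, south, east, north, colour) tuples."""
    kml = ET.Element("kml", xmlns=KML_NS)
    document = ET.SubElement(kml, "Document")
    for cell in cells:
        document.append(grid_cell_placemark(*cell))
    return ET.tostring(kml, encoding="unicode")


# Hypothetical cell near central Athens, semi-transparent red.
kml_text = build_kml([("cell-0-0", 23.72, 37.97, 23.73, 37.98, "7f0000ff")])
```

A heatmap-style report would presumably emit one such Placemark per grid cell, with the colour derived from the metric being shown.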

In addition to customer reports, however, as part of our operations at Cosmical we often employ spatial analytics for internal use: to understand how our users interact with our applications, to identify potential improvements or interesting types of venues to be considered for future versions, and to enhance our hyperlocal character by identifying local trends and featured venues. For this purpose we have been using web-based technologies, and more specifically SVG in conjunction with CSS and JavaScript, to create interactive, scalable and æsthetically coherent reports.

Reports coming to the Web

Since the end of last year we’ve expanded our in-house administrative dashboard for GEO|ADS by including a real-time spatial visualisation component for the greater Athens metropolitan area. This map displays the administrative divisions of Athens (i.e. Kallikratis) and overlays important information about GEO|ADS, including local ads that are currently running (for which you can get more information by hovering your mouse over them), and colour-codes the municipalities based on the last 50K impressions. This part of the dashboard is extremely impressive, as it provides an immediate indication of where our AthensBook and ThessBook apps are being used, where users engage with our ads and how we can maximise value for our customers. Much of the data we currently use in-house will soon be available to our paying GEO|ADS customers, along with a slew of statistics about their campaigns, in a web portal launching in the coming months.

Obviously, creating our SVG-based spatial analytics dashboard meant a wholly different approach, as the goal was very different from the one served by the KML files: we now cared more about making aggregate information accessible at a glance than about providing detailed, extensive, high-resolution spatial analytics. To create the spatial analytics component we made use of a number of tools and libraries.

First, there’s the data. Our existing KML-based reports employ a high-resolution grid overlay over Athens that provides a very engaging view of how (or rather where) users engage with the applications. You can visualise impressions, clicks, CTRs and even conversions, and generate heatmaps on the fly. We have even experimented with network-enabled KML generation, whereby Google Earth fetches a dynamically generated KML file from the server every few seconds and updates the overlay (extremely cool, but not worth the resource and latency overheads). While impressive, the dependence on Google Earth is cumbersome for people unfamiliar with the application. Then there’s the need for a more ‘traditional’ presentation of the data, along established administrative lines, and the need to ‘print’ the reports.

To solve this we decided to go with the (new) Kallikratis administrative divisions (municipalities). This way it would be easier for customers to ‘see’ how popular they were in Chalandri vs the centre of Athens, instead of trying to ‘decipher’ the high-resolution grid. (Some people are not that hot on geography, but obviously we will keep providing our high-resolution KML grid version as well, for those keen on more detailed spatial analytics.)

The Implementation

To create the municipalities map, we got the administrative polygons from geodata.gov.gr. The polygons provided by geodata.gov.gr were needlessly detailed for our purposes, and correspondingly heavy to download and render. This meant that SVG visualisation, let alone dynamic visualisation, was out of the question in a web context. To remedy this we simplified the provided shapefile using MapShaper, a free, online, Flash-based tool. This made interactive yet high-quality (i.e. ‘printable’) SVG visualisation a realistic proposition.

The next step was converting our ‘simplified’ shapefile to a Web-friendly GIS format, which was none other than GeoJSON. After the conversion we were left with a .geojson file measuring a mere 257KB, and that covered all of Attiki, including information such as the municipalities’ names.
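
For readers wondering what ‘simplifying’ a polygon actually involves, the sketch below implements Douglas-Peucker, one classic line-simplification algorithm. This is an illustration only: it is not a description of what MapShaper did to our shapefile, and the function names, sample line and tolerance are invented for the example.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (planar approximation)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker: drop vertices closer than `tolerance` to the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the vertex farthest from the chord joining the end points.
    index, farthest = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > farthest:
            index, farthest = i, d
    if farthest <= tolerance:
        return [first, last]
    # Keep that vertex and simplify both halves recursively.
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


# Hypothetical border: small wiggles around one real corner at (3, 0).
line = [(0, 0), (1, 0.01), (2, -0.01), (3, 0), (3.01, 1), (3, 2), (3, 3)]
simplified = simplify(line, tolerance=0.05)
```

Here seven vertices collapse to three while the corner survives. Applied naively to each municipality on its own, this kind of simplification can open gaps between neighbours that share a border, which is one reason to prefer a dedicated tool over a home-grown script.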

Having the map data in place meant we needed a way to render it. To make our lives easier, we employed what has become our staple library for SVG visualisation, d3. This is an extremely impressive piece of work by Mike Bostock from Stanford’s Visualisation Group (the people who also created Protovis a few years ago), and its geo extensions allowed us to ‘load’ the GeoJSON file. The GeoJSON data were described using WGS84 coordinates (longitude and latitude), so we employed d3’s projection facilities to turn them into screen coordinates. Specifically, we defined our projection as xy, where:

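var xy = d3.geo.albers()
 .origin([24,38])           // [longitude, latitude] the projection is centred on
 .parallels([29.5,38.7])    // the two standard parallels (latitudes)
 .scale(104000)             // zoom factor
 .translate([600,300]);     // pixel position of the origin on the SVG canvas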

and then used it in defining a path:

var path = d3.geo.path().projection(xy);
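
To make the four parameters less magical, here is a rough Python sketch of the textbook Albers equal-area conic formulas, wired up with the same numbers. It is meant to mirror what we assume d3 computes internally rather than to quote its source; make_albers and project are names invented for this example.

```python
import math


def make_albers(origin, parallels, scale, translate):
    """Return a function mapping (lon, lat) in degrees to (x, y) in pixels."""
    lon0, lat0 = (math.radians(v) for v in origin)
    phi1, phi2 = (math.radians(v) for v in parallels)
    # Standard Albers equal-area conic constants.
    n = 0.5 * (math.sin(phi1) + math.sin(phi2))
    c = math.cos(phi1) ** 2 + 2 * n * math.sin(phi1)
    rho0 = math.sqrt(c - 2 * n * math.sin(lat0)) / n

    def project(lon, lat):
        theta = n * (math.radians(lon) - lon0)
        rho = math.sqrt(c - 2 * n * math.sin(math.radians(lat))) / n
        # SVG's y axis points down, so north must map to smaller y.
        x = scale * rho * math.sin(theta) + translate[0]
        y = scale * (rho * math.cos(theta) - rho0) + translate[1]
        return x, y

    return project


# Same parameters as the d3 projection above.
xy = make_albers([24, 38], [29.5, 38.7], 104000, [600, 300])
origin_px = xy(24, 38)          # the origin lands exactly on `translate`
athens_px = xy(23.73, 37.98)    # roughly central Athens: left of and below it
```

The origin always lands on the translate offset, so changing translate pans the map and changing scale zooms it around that point.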

The projection above means that we could play with the scale, origin and parallels parameters and enable panning and zooming by the user. Actually showing the GeoJSON data on our ‘map’ was then a matter of reading the appropriate file and using our ‘path’ object to draw it:

var municipalities = svg.append("svg:g")
    .attr("id", "municipalities");

// Load the GeoJSON file and draw one SVG path per feature (municipality).
d3.json("attiki.json", function(json) {
    municipalities.selectAll("path")
        .data(json.features)
        .enter().append("svg:path")
        .attr("fill", "#111")
        .attr("d", path);
    // ... other stuff we want to show on the map ...
});

The d attribute of the path element is what contains the actual ‘drawing’ information. This resulted in an ‘outlines’ map of Attiki, with all the ‘municipalities’ drawn and their information (e.g. name) available to us via the GeoJSON features array. Colouring them translates to using different CSS classes, each with the appropriate fill, depending on the count of impressions. This information is provided by our analytics infrastructure as a dictionary/JavaScript object of the form:

{municipalityName:metricCount,..}

A function takes care of assigning the appropriate class by mapping the (normalised) values to one of four CSS classes, ranging from dark grey for no impressions to bright red for maximum impressions. The CSS class was applied using d3’s convenient selectAll method:

municipalities.selectAll("path")
    .attr("class", assignClass);

where assignClass is the JavaScript function returning the appropriate ‘class’ name; d3 calls it once per path, passing in the GeoJSON feature bound to that path.
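
The body of assignClass is not shown here. The bucketing logic is simple enough to sketch, though, and the version below is written in Python (our server-side language) rather than JavaScript purely to show the idea. The class names, the equal-width buckets and the sample counts are all assumptions made for the example.

```python
# Hypothetical class names, ordered from "no impressions" to "maximum".
CSS_CLASSES = ["q0", "q1", "q2", "q3"]


def assign_class(count, max_count, classes=CSS_CLASSES):
    """Map an impression count to a CSS class via its normalised value."""
    if max_count <= 0 or count <= 0:
        return classes[0]
    normalised = min(count / max_count, 1.0)      # now in (0, 1]
    # Equal-width buckets; the top value falls into the last class.
    index = min(int(normalised * len(classes)), len(classes) - 1)
    return classes[index]


# Hypothetical counts in the {municipalityName: metricCount} form.
counts = {"Athens": 20000, "Chalandri": 6000, "Marousi": 0}
peak = max(counts.values())
classes_by_name = {name: assign_class(c, peak) for name, c in counts.items()}
```

With heavily skewed data (one municipality dwarfing the rest), equal-width buckets push almost everything into the lowest class, so logarithmic or quantile-based thresholds may be a better fit.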

Mapping Data to Municipalities

All this was the ‘client-facing’ component. Actually counting impressions per municipality was the most interesting part of this endeavour: determining which municipality a ‘point’ belongs to is akin to throwing a dart at a world map and figuring out which country it landed on. Easy enough for a human, not so much for a computer. To process the stream of ‘impression’ data (which comes from our server infrastructure, not covered in this post), we used Python, a pretty popular language (both internally at Cosmical and in the not-totally-braindead-GIS crowd), and a combination of OGR, Shapely and Rtree. In our initial version we used only OGR and Shapely, effectively calling point.within(geom) for every combination of the 50K points in our data set and the municipalities (read as features from our simplified shapefile and converted to Shapely geometries).

This was further optimised with an Rtree index. By creating an index of the municipalities of Athens, effectively storing their ‘bounding boxes’, determining the municipality that a point belongs to became a much more efficient process: for each point the index returned a handful of candidate municipalities (usually between one and four) whose bounding boxes contain it. Shapely was then asked which of those candidates actually contains the point; once it found the match, processing moved on to the next point. The R-tree optimisation cut the time it takes to process the 50K impressions by a staggering 80%!
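
The two-stage lookup (cheap bounding-box filter first, exact containment test second) can be sketched without any GIS libraries. In the sketch below a ray-casting test stands in for Shapely's point.within(geom), and a plain scan over precomputed bounding boxes stands in for the Rtree index, which answers the same ‘which boxes contain this point?’ query without visiting every box. All function names and the toy municipalities are invented for illustration.

```python
from collections import Counter


def bounding_box(ring):
    """Return (minx, miny, maxx, maxy) for a list of (x, y) vertices."""
    xs, ys = zip(*ring)
    return min(xs), min(ys), max(xs), max(ys)


def point_in_ring(x, y, ring):
    """Ray casting: toggle on every edge crossed by a ray going right."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses the horizontal line at y.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_by_municipality(points, municipalities):
    """municipalities: {name: ring}; returns (counts, exact tests performed)."""
    boxes = {name: bounding_box(ring) for name, ring in municipalities.items()}
    counts, exact_tests = Counter(), 0
    for x, y in points:
        # Cheap filter: only municipalities whose box contains the point.
        candidates = [n for n, (minx, miny, maxx, maxy) in boxes.items()
                      if minx <= x <= maxx and miny <= y <= maxy]
        for name in candidates:
            exact_tests += 1
            if point_in_ring(x, y, municipalities[name]):
                counts[name] += 1
                break  # found it; move on to the next point
    return dict(counts), exact_tests


# Hypothetical municipalities: two triangles sharing a diagonal, plus a square.
municipalities = {
    "A": [(0, 0), (4, 0), (0, 4)],
    "B": [(4, 0), (4, 4), (0, 4)],
    "C": [(5, 0), (7, 0), (7, 2), (5, 2)],
}
points = [(1, 1), (3, 3), (6, 1), (9, 9)]
counts, exact_tests = count_by_municipality(points, municipalities)
```

The returned counts are already in the {municipalityName: metricCount} shape the map expects. The saving comes from exact_tests: the expensive containment check only runs for the few municipalities whose bounding box contains the point, instead of for every municipality.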
Using the same d3 constructs as before, we added JavaScript handlers for hovering over and clicking on regions, showing tooltips, and overlaying ‘local’ (1.5km radius) GEO|ADS on the map.

The result of all this is a near-real-time visualisation of the number of ad impressions in different regions of Athens, colour-coded appropriately, with additional interactive overlays of information about running ads, statistics etc. It makes minimal use of imperative constructs and follows a data-driven design that is both extensible and extremely efficient.