The Data Daily

Interactive Data Visualization of Geospatial Data using D3.js, DC.js, Leaflet.js and Python // Adil Moujahid // Data Analytics and more

// tags python javascript data visualization d3.js dc.js leaflet.js
The goal of this tutorial is to introduce the steps for building an interactive visualization of geospatial data.
To do this, we will use a dataset from a Kaggle competition to build a data visualization that shows the distribution of mobile phone users in China. We will also create additional charts that show the usage patterns, the most popular phone brands, and users’ age segments and gender. We will be able to filter the data by the different attributes and see the results reflected in the map and all charts.
We will cover a wide range of technologies in this tutorial: Pandas for cleaning the data, Flask for building the server, the JavaScript libraries d3.js, dc.js and crossfilter.js for building the charts, and Leaflet.js for building the map.
The source code for this tutorial can be found in this GitHub repository.
Below is an animated gif of the interactive data visualization dashboard that we will be building in this tutorial.
1. The Case Study
In this tutorial, we will use a dataset from a Kaggle competition called "TalkingData Mobile User Demographics". This dataset is provided by TalkingData, China’s largest third-party mobile data platform. It contains app usage data, geolocation data and mobile device properties. The goal of the competition is to predict the gender and age segments of users based on the data provided.
Data visualization is an important first step in the data analysis workflow. It enables us to effectively discover patterns through graphical means, and to represent these findings in a meaningful and effective way.
The dataset that we will use contains various attributes that can be combined to build interesting data visualizations. Geospatial data is particularly interesting, as it allows us to see how user profiles and usage behavior change with location.
In this tutorial, we will build a data visualization that combines a map of user locations with various charts that summarize users’ information and usage behavior. We will make this visualization interactive, so we can drill down into a particular user segment or location.
2. System Architecture
For our data visualization, we need a system architecture that handles the following:
Cleaning and structuring data for visualization. We will use mainly Python’s Pandas library for this.
Serving static files (HTML, CSS and JavaScript files) and data to the browser. We will use a lightweight Python server called Flask for this.
Building the charts and map. We will mainly use three JavaScript libraries for this: DC.js, D3.js and Leaflet.js.
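As a rough sketch of how the Flask piece of this architecture fits in, the server needs one route for the page and one for the data. The route paths, the placeholder page and the sample records below are illustrative assumptions, not the tutorial's actual code.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical sample records standing in for the cleaned Pandas output.
SAMPLE_RECORDS = [
    {"device_id": 1, "gender": "M", "longitude": 116.4, "latitude": 39.9},
    {"device_id": 2, "gender": "F", "longitude": 121.5, "latitude": 31.2},
]


@app.route("/")
def index():
    # In a full app this would return the HTML page that loads
    # d3.js, dc.js, crossfilter.js and Leaflet.js.
    return "<h1>Dashboard placeholder</h1>"


@app.route("/data")
def data():
    # The browser-side charts would request this JSON and build on it.
    return jsonify(SAMPLE_RECORDS)


if __name__ == "__main__":
    app.run(debug=True)
```

Keeping the data behind its own route means the page can load first and fetch the records afterwards, which is the usual split between static files and data.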
3. Data Preparation
We start by downloading the dataset from the competition website. You need to create a Kaggle account and agree to the competition rules to download the data.
We will use three CSV files: gender_age_train.csv, events.csv and phone_brand_device_model.csv.
gender_age_train.csv: This file contains the device id, gender and age of users.
events.csv: This file contains information about phone events triggered by the users. Each event has an id, a timestamp and location info (latitude and longitude).
phone_brand_device_model.csv: This file contains the brand and model for each device.
We start by importing pandas.
import pandas as pd
Next, we read and merge the different datasets into a single Pandas DataFrame that we call df.
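# data_path is the folder that holds the downloaded CSV files; define it before running this.
gen_age_tr = pd.read_csv(data_path + 'gender_age_train.csv')
ev = pd.read_csv(data_path + 'events.csv')
ph_br_dev_model = pd.read_csv(data_path + 'phone_brand_device_model.csv')

# Left joins keep every device from gender_age_train.csv, with or without events.
df = gen_age_tr.merge(ev, how='left', on='device_id')
df = df.merge(ph_br_dev_model, how='left', on='device_id')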
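To see what the two left joins do to the row count, here is a small sketch on toy DataFrames. The values are made up and only the column names mentioned in the file descriptions are assumed.

```python
import pandas as pd

# Toy stand-ins for the three CSV files (all values are invented).
gen_age_tr = pd.DataFrame({
    "device_id": [1, 2],
    "gender": ["M", "F"],
    "age": [25, 31],
})
ev = pd.DataFrame({
    "event_id": [10, 11, 12],
    "device_id": [1, 1, 3],
    "longitude": [116.4, 116.5, 121.5],
    "latitude": [39.9, 39.8, 31.2],
})
ph_br_dev_model = pd.DataFrame({
    "device_id": [1, 2],
    "phone_brand": ["小米", "华为"],
})

# Same two merges as in the tutorial, applied to the toy data.
df = gen_age_tr.merge(ev, how="left", on="device_id")
df = df.merge(ph_br_dev_model, how="left", on="device_id")

print(df[["device_id", "event_id", "phone_brand"]])
```

Device 1 appears twice (once per event), device 2 is kept with missing event columns, and device 3 is dropped because it is absent from the left-hand table.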
Next, we add the English name of each phone brand to our DataFrame. Brands outside the top 10 are labeled 'Other'.
top_10_brands_en = {
    '华为': 'Huawei', '小米': 'Xiaomi', '三星': 'Samsung', 'vivo': 'vivo',
    'OPPO': 'OPPO', '魅族': 'Meizu', '酷派': 'Coolpad', '乐视': 'LeEco',
    '联想': 'Lenovo', 'HTC': 'HTC',
}
df['phone_brand_en'] = df['phone_brand'].apply(
    lambda phone_brand: top_10_brands_en[phone_brand]
    if (phone_brand in top_10_brands_en) else 'Other')
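An equivalent way to express the same lookup is `Series.map` followed by `fillna`. The sketch below uses a shortened mapping and an invented list of brands purely for illustration.

```python
import pandas as pd

# Shortened mapping for illustration; the tutorial's dictionary has ten entries.
brands_en = {"华为": "Huawei", "小米": "Xiaomi", "vivo": "vivo"}

# Invented sample column; the last brand is not in the mapping.
brands = pd.Series(["小米", "华为", "vivo", "努比亚"], name="phone_brand")

# map() yields NaN for brands missing from the dictionary;
# fillna() then turns those into 'Other'.
brands_mapped = brands.map(brands_en).fillna("Other")

print(brands_mapped.tolist())
```

On large DataFrames this form avoids calling a Python lambda for every row, though both versions should give the same labels.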
Next, we add the age segment of users to the DataFrame.
def get_age_segment(age): if age
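As a minimal sketch of how an age segmentation function of this kind could be written and applied, assuming hypothetical cut-offs of 22, 32 and 42 (the function name `age_segment_sketch`, the boundaries and the labels are illustrations only, not the tutorial's values):

```python
import pandas as pd


def age_segment_sketch(age):
    # Hypothetical boundaries and labels, chosen for illustration only.
    if age <= 22:
        return "22-"
    elif age <= 32:
        return "23-32"
    elif age <= 42:
        return "33-42"
    return "43+"


# Invented ages to show how the function is applied to a column.
ages = pd.DataFrame({"age": [19, 27, 35, 60]})
ages["age_segment"] = ages["age"].apply(age_segment_sketch)

print(ages)
```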
