Friday, April 1, 2016

Fashion Trends in Time Series

Originally appeared on the Ather Blog.
For those who would rather skip to the end: these visuals are now available in R through the ggTimeSeries package.



Time Series Data

Any data which has some temporal identity can be considered time series data.

Your internet data consumption every month, tracked over multiple months, would comprise time series data. If you record the distance your car has been driven every week over a long duration then you are creating time series data. Flagging every instance that you used the phrase, ‘So what else is up?’ on a phone call would also make time series data. Time series data, actually, is all around.



The Legacy of Line Charts

The first step in analysing data is usually visualising it to glean some simple insights.

Time series data has traditionally been portrayed with line charts, which have been around since the early 1700s (source: Wikipedia). They facilitate trend detection and comparison, are simple to draw, and are easy to understand; all in all, a very well-behaved visualisation. In modern times their use is widespread, from the heartbeat monitor at a hospital to the multiple-monitor display at a trader’s desk.

We all remember these days [1] -






Alternatives to Line Charts

However, there are cases when the data scientist becomes more demanding and specific. Five alternatives available to such a data scientist are listed below. We are a smart and connected technology company (hence the mandatory IoT section above), and we’re also a clean energy company, so we decided to use meteorological data from random weather stations somewhere in the USA to make some examples. [2]

1. Calendar Heatmap

A calendar heatmap is a great way to visualise daily data over one or more years. The smallest box is a day, the thicker borders demarcate months, and a year forms one entire box. Its structure makes it easy to detect weekly, monthly, or seasonal patterns. A line chart might also point to a trend, but because a regular line chart carries no context of a month or a week, the viewer usually needs to do some further analysis to arrive at that conclusion.

The chart below plots the daily maximum temperature recorded over two years. Can you tell the summer months from the winter ones? Does 2015 look warmer than 2014? Do weekends in the early part of the year look slightly warmer than the weekdays? When would you say the onset of winter usually happens? Can you make out the brief periods of respite from the heat in the summers?
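The layout described above can be sketched in a few lines. This is a minimal illustration in Python, not the ggTimeSeries implementation: the function names are hypothetical, weeks are assumed to start on Monday, and the temperatures in the usage example are made up.

```python
from datetime import date


def calendar_cell(d):
    """Return (week_column, weekday_row) for one date within its year.

    weekday_row: 0 = Monday ... 6 = Sunday.
    week_column: 0 for the week containing 1 January, counting upwards.
    """
    jan1 = date(d.year, 1, 1)
    day_of_year = d.timetuple().tm_yday  # 1 for 1 January
    # Shift by the weekday of 1 January so that columns break on Mondays.
    week_column = (day_of_year - 1 + jan1.weekday()) // 7
    return week_column, d.weekday()


def calendar_grid(daily_values):
    """Group {date: value} into {year: {(week_column, weekday_row): value}}."""
    grid = {}
    for d, value in daily_values.items():
        grid.setdefault(d.year, {})[calendar_cell(d)] = value
    return grid


if __name__ == "__main__":
    # Made-up daily maximum temperatures, for illustration only.
    sample = {date(2015, 1, 1): 3.5, date(2015, 1, 5): 1.0}
    print(calendar_grid(sample))
```

Each year becomes its own box, each week a column, and each weekday a row, which is why weekly and seasonal patterns line up visually. Colouring the cells by value and drawing the month borders is left to the plotting library.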


2. Horizon Plots

Imagine an area chart which has been chopped into multiple chunks of equal height. If you overlay these chunks one on top of the other, and colour them to indicate which chunk is which, you get a horizon plot. Horizon plots are useful when vertical space is constrained, when visualising y values spanning a vast range but with a skewed distribution, or when trying to highlight outliers without losing the context of variation in the rest of the data.

The chart below plots the daily maximum temperature recorded over two years. Can you spot the hottest day? The coldest day? Which of the questions posed against the previous chart are you able to answer?
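The chopping step can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the ggTimeSeries code: the function name is hypothetical, values are assumed to be non-negative distances from the baseline, and anything above the top band is clipped. Negative values are usually mirrored and given a second colour scale, which this sketch does not handle.

```python
def horizon_bands(values, band_height, n_bands):
    """Slice each value into n_bands layers of equal height.

    layers[i][t] is how much of values[t] falls inside band i, that is,
    between i * band_height and (i + 1) * band_height.
    """
    layers = []
    for i in range(n_bands):
        lower = i * band_height
        # Clamp the part of the value above `lower` to [0, band_height].
        layers.append([min(max(v - lower, 0.0), band_height) for v in values])
    return layers


if __name__ == "__main__":
    # Three bands of height 10 applied to a made-up series.
    for band in horizon_bands([5, 12, 27], band_height=10, n_bands=3):
        print(band)
```

Drawing every layer from the same baseline, with a darker colour for each higher band, gives the overlaid effect: the chart needs only `band_height` of vertical space, yet a value in the third band still stands out by colour.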



3. Steamgraphs

A steamgraph (more commonly spelled streamgraph) is a more aesthetically appealing version of a stacked area chart. It tries to highlight the changes in the data by placing the groups with the most variance on the edges, and the groups with the least variance towards the centre. This feature, in conjunction with the centred alignment of each of the individual parts, makes it easier for the viewer to compare the contribution of the individual components across time.

Here is a plot of cumulative snowfall over one year for a bunch of weather stations. Can you spot the sudden increase in snowfall in January? Can you make out which weather station contributed the most to this jump? When does the snowfall usually cease for the year?
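The two ideas in the description, ordering by variance and centring the stack, can be sketched like this. It is a rough Python illustration of the description above, not the algorithm used by ggTimeSeries: the function names and station names are hypothetical, the data is made up, and the stack is simply centred on zero.

```python
from statistics import pvariance


def order_inside_out(series):
    """Order series names so the least variable ones sit in the middle."""
    by_variance = sorted(series, key=lambda name: pvariance(series[name]))
    ordered = []
    for i, name in enumerate(by_variance):
        # Alternate sides: the calmest series lands near the centre and
        # the most variable ones end up on the outer edges.
        if i % 2 == 0:
            ordered.append(name)
        else:
            ordered.insert(0, name)
    return ordered


def centred_layers(series, order):
    """Return {name: [(bottom, top), ...]} with the stack centred on zero."""
    n_points = len(series[order[0]])
    layers = {name: [] for name in order}
    for t in range(n_points):
        total = sum(series[name][t] for name in order)
        bottom = -total / 2.0  # symmetric silhouette around y = 0
        for name in order:
            top = bottom + series[name][t]
            layers[name].append((bottom, top))
            bottom = top
    return layers


if __name__ == "__main__":
    # Made-up values for three hypothetical stations.
    snowfall = {"A": [1, 1, 1], "B": [0, 4, 0], "C": [2, 3, 2]}
    order = order_inside_out(snowfall)
    print(order)
    print(centred_layers(snowfall, order))
```

Each layer is then drawn as a ribbon between its bottom and top curves. Because the total is split evenly above and below zero, a sudden jump in one series bulges the whole shape outwards instead of pushing every other layer upwards.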



4. Waterfall

In some cases, instead of the values themselves, you might want to see the changes in the values. Instead of plotting the coordinates, we plot rectangles which stretch between successive coordinates. The height of the rectangle signifies the change in the value, the width signifies the change in time, and the top signifies the final value attained. The reds and greens signify the drops and rises respectively on that particular day. You can also make out the contour of the overall trend.

Here is a plot of the depth of snow for a random weather station. Can you spot the day the highest increase was recorded? The week with the highest fall?
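Turning a series into waterfall rectangles is a small transformation. The sketch below is a Python illustration with hypothetical names and made-up snow depths, not the ggTimeSeries code; it assumes the observations are already sorted by time.

```python
def waterfall_rectangles(times, values):
    """Build one rectangle per pair of successive observations."""
    rectangles = []
    for i in range(1, len(values)):
        previous, current = values[i - 1], values[i]
        change = current - previous
        if change > 0:
            direction = "rise"   # drawn in green
        elif change < 0:
            direction = "drop"   # drawn in red
        else:
            direction = "flat"
        rectangles.append({
            "xmin": times[i - 1],
            "xmax": times[i],
            "ymin": min(previous, current),
            "ymax": max(previous, current),
            "change": change,
            "direction": direction,
        })
    return rectangles


if __name__ == "__main__":
    # Made-up snow depths on four consecutive days.
    rects = waterfall_rectangles([0, 1, 2, 3], [10, 14, 9, 9])
    biggest_rise = max(rects, key=lambda r: r["change"])
    print(biggest_rise)
```

With the changes stored explicitly, questions such as "which day saw the highest increase" reduce to a `max` over the rectangles, which is the same thing the eye does when it looks for the tallest green box.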



5. Occurrence Dot Plot

In infographics, this one is a favourite alternative to bar charts. For rare events, the reader would find it convenient to have the count of events encoded in the chart itself instead of having to map the value back to the Y axis.

We’ve slightly abused this one by plotting the amount of precipitation instead of discrete events. In our defense, there aren’t too many things that happen in meteorological data.
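The encoding can be sketched as one dot per event, stacked vertically. This Python illustration uses hypothetical function names and made-up rainfall figures, and it is not the ggTimeSeries implementation. The `amounts_to_counts` helper shows one possible way to plot a continuous amount, assuming one dot per whole `unit` with the remainder dropped.

```python
def occurrence_dots(counts):
    """Turn {period: count} into a list of (period, level) dot positions."""
    dots = []
    for period, count in counts.items():
        # One dot per event, stacked from level 1 up to `count`.
        for level in range(1, count + 1):
            dots.append((period, level))
    return dots


def amounts_to_counts(amounts, unit):
    """Convert continuous amounts into whole dots, one per `unit`."""
    return {period: int(amount // unit) for period, amount in amounts.items()}


if __name__ == "__main__":
    # Made-up precipitation in millimetres; one dot stands for 5 mm.
    rain_mm = {"Mon": 0, "Tue": 12, "Wed": 5}
    print(occurrence_dots(amounts_to_counts(rain_mm, unit=5)))
```

Because the reader can count the dots directly, the Y axis becomes optional for small counts. For large counts the stacks grow tall and a bar chart is the better choice.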




ggTimeSeries

R users, we’ve open sourced the code and you can create these plots yourself! Check out the (under development, but works for the most part) ggTimeSeries package.


Attributions

[1] Plot created in R using the ggplot2 and ggthemes packages

[2] Data downloaded from ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/by_year/

Menne, M.J., I. Durre, B. Korzeniewski, S. McNeal, K. Thomas, X. Yin, S. Anthony, R. Ray, R.S. Vose, B.E. Gleason, and T.G. Houston, 2012: Global Historical Climatology Network - Daily (GHCN-Daily), Version 3.12
