Friday, July 29, 2016

Data Science of Debt in Seven Charts

To hear some people tell the story, we are on the verge of economic disaster. According to some, the problem is that debt continues to “explode.” The data science may not support these claims. The argument for explosion rests on dubious data presentations, which, at best, make these claims difficult to assess, and, at worst, contradict them. Given the importance of this topic, I want to use this context to make a few comments on the presentation of quantitative information. Today's post has little to do with eDiscovery, but a lot to do with data and how we analyze and present it.

Data science is principled storytelling about data. But an inadequate understanding of data can lead to misleading stories. Moreover, how you present the data has a big impact on how people understand the story that the data are telling. This is a story about misrepresenting data, leading to the wrong conclusion.

In particular, I want to focus on an analysis by Grant Williams. Williams created a 40-minute video called “Crazy, A Story of Debt,” in which he claims that there is too much debt, which will cause a future economic collapse when that debt comes due. He argues that “The relationship between [total debt and GDP] is what everything that we have been through has been all about.” He argues specifically that high levels of debt were the cause of the 2008 credit crisis and that the level of debt has increased even more since then. Williams argues that “absurd levels of short-term debt” have grown more absurd since the Great Recession.
Figure 1. The graph to rule them all: debt and GDP from 1951 to 2015.
The first chart (Figure 1) is my redrawing of the one that Williams calls the one chart to rule them all. The debt and GDP data are available from the Federal Reserve Bank of St. Louis.

This chart shows the total outstanding debt for the US (“All Sectors; Debt Securities and Loans; Liability, Level”) for each quarter from 1951 to 2015.  It is clear that the amount of debt has increased substantially over that time period, and that it has increased at a higher rate than the GDP (Gross Domestic Product; the market value of goods and services produced in a country), particularly over the last 35 years.

What is less clear from Figure 1 is whether the debt increased at the same rate after 2008 as it did before 2008. The time scale on the chart makes the post-2008 debt look like it is increasing substantially, and it is difficult to compare two slopes in a line graph visually.

Figure 2. The average rate of increase per day of the debt prior to 2008 and its average rate after 2008. The rate was calculated as the slope of the line from Figure 1 from 2001 to 2007 and from 2008 onward.
Figure 2, on the other hand, makes this comparison explicit. From 2001 to 2007 the debt increased by $9.5 billion per day. After 2008, it increased at the much lower rate of $3.4 billion per day. Both rates may be economically unhealthy, but it would be wrong to claim that the debt continues to grow unabated after 2008. Its rate of growth is substantially lower.
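The slope comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the calculation actually used for Figure 2: the function name `slope_per_day` is hypothetical, the fit is assumed to be an ordinary least-squares line, and the series below is synthetic (built to grow by exactly 2.0 units per day) rather than the Federal Reserve data.

```python
from datetime import date


def slope_per_day(dates, levels):
    """Least-squares slope of levels against time, in level units per day."""
    days = [(d - dates[0]).days for d in dates]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(levels) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, levels))
    sxx = sum((x - mean_x) ** 2 for x in days)
    return sxy / sxx


# Synthetic quarterly series (an assumption for illustration only):
# a level that grows by exactly 2.0 units per day.
quarters = [date(year, month, 1) for year in (2001, 2002) for month in (1, 4, 7, 10)]
levels = [100.0 + 2.0 * (d - quarters[0]).days for d in quarters]

rate = slope_per_day(quarters, levels)
print(rate)
```

Applying the same function to the quarterly debt series for each of the two periods would give two per-day rates that can be compared directly, which is the comparison Figure 2 presents.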

We can focus our chart on the period since 2008, where Williams focuses his attention.  This chart is shown in Figure 3.


Figure 3 shows just the time period since 2008. The increase in the debt level over this time period does not look as dramatic in this graph as it does in the first one, though the numbers are exactly the same.

Figure 3. Debt and GDP from 2008 to 2015. These are the same data as shown in Figure 1 for the time from 2008 to the end of 2015.
Some things are easy to see in this graph, but some things are more difficult.  Williams argues that the relationship between Debt and GDP is important.  We can easily see that the debt since 2008 is larger than the GDP.  We can easily see that it has increased since 2008.  We have difficulty, however, determining whether the debt has increased or decreased relative to GDP during the time period.  For that, we can replot these data as the ratio of debt to GDP.  If debt is growing as a percentage of GDP, as Williams asserts, then we should see a line that generally rises over the time period.  This line is shown in Figure 4.

Larger economies are likely to have larger debt, all other things being equal.  The GDP is a global measure of the economy’s output.  Looking at the debt as a proportion of GDP presents a very different story than looking at the absolute dollar level.  The relative amount of debt has actually declined even as the absolute value has increased. 

Both debt and GDP have increased since 2008. Debt has grown more quickly than GDP in dollar terms (as measured by the slopes of the two lines), but debt as a percentage of GDP has actually declined since 2008.
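Replotting two series as a ratio takes only a few lines. The sketch below uses hypothetical values in trillions of dollars, chosen for illustration and not taken from the Federal Reserve series. It shows a case where debt rises by more dollars than GDP in every period and the debt-to-GDP ratio still falls.

```python
# Hypothetical values in trillions of dollars (illustration only).
debt = [54.0, 56.0, 58.0, 60.0]
gdp = [14.0, 15.0, 16.0, 17.0]

# The quantity plotted in a ratio chart: debt divided by GDP, period by period.
ratio = [d / g for d, g in zip(debt, gdp)]

debt_change = debt[-1] - debt[0]  # dollar increase in debt
gdp_change = gdp[-1] - gdp[0]     # dollar increase in GDP

# True when every period's ratio is lower than the one before it.
ratio_falls = all(later < earlier for earlier, later in zip(ratio, ratio[1:]))
print(debt_change > gdp_change, ratio_falls)
```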


Figure 4. The ratio of debt to GDP for the period from 2008 to 2015. In this version, it is easier to see that debt has fallen as a percentage of GDP since 2008.
Williams took advantage of people’s generally poor intuitions about fractions and proportions when both the numerator (debt) and denominator (GDP) change. Here we are comparing two lines, both of which are increasing. Many people might intuit that for debt to be a constant fraction of GDP, the two lines would have to increase at the same rate, that is, that a billion dollar increase in debt would have to correspond to a billion dollar increase in GDP. In reality, if debt and GDP increased by the same dollar amount, we would see a decrease in the proportion. For example, if debt started at $54 trillion and GDP started at $14 trillion, the ratio ($54 trillion / $14 trillion) would be 3.86. If they both increased by $10 trillion, the ratio ($64 trillion / $24 trillion) would be 2.67. In contrast to the simplistic expectation, and as in the present situation, the numerator can increase by more dollars than the denominator and still result in a decreasing ratio, because the ratio depends on the percentage growth of each quantity.
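The arithmetic in this example is easy to check. The sketch below uses the starting values from the example ($54 trillion of debt, $14 trillion of GDP, and a $10 trillion increase in each). The `break_even` quantity is my addition: it is the debt increase that would hold the ratio constant for the same GDP increase.

```python
def ratio(debt, gdp):
    """Debt as a multiple of GDP."""
    return debt / gdp


debt_start, gdp_start = 54.0, 14.0  # trillions of dollars, from the example
increase = 10.0                     # same dollar increase applied to both

before = ratio(debt_start, gdp_start)
after = ratio(debt_start + increase, gdp_start + increase)

# The same dollar increase is a much smaller percentage of the larger number.
debt_growth = increase / debt_start  # roughly 18.5 percent
gdp_growth = increase / gdp_start    # roughly 71.4 percent

# Debt increase needed to keep the ratio unchanged when GDP rises by `increase`.
break_even = before * increase

print(round(before, 2), round(after, 2), round(break_even, 2))
```

For the ratio to stay constant, debt would have to grow by the same percentage as GDP, which here means several times the dollar increase in GDP.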

Williams takes advantage of this poor intuition when he implies that the debt is growing out of control relative to the GDP.   Debt may be too high, but it is definitely not growing relative to GDP following 2008. Can it be said to be "out of control?"  I don't know.

In contrast to the overall debt, the US Federal Debt has, in fact, increased since 2008 in absolute terms, relative to GDP, and relative to the total debt.

Figure 5. The ratio of debt to GDP before and after the Great Recession of 2008. This presentation makes it easier to see that debt was increasing rapidly before 2008 but has been decreasing since then as a proportion of GDP. The right half of this graph shows the same data as Figure 4.
Again, I cannot say whether this increase is a problem.  The US government has been incurring debt faster than the GDP has been growing. Obviously, if Federal debt is growing relative to GDP while overall debt is shrinking, it must be true that Federal debt is taking up an increasing proportion of the total debt.
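The last step follows from a simple identity: dividing Federal debt over GDP by total debt over GDP cancels the GDP term and leaves Federal debt over total debt. A minimal sketch, using hypothetical ratios chosen only to show the direction of the change (they are not the actual figures):

```python
def federal_share(federal_to_gdp, total_to_gdp):
    """(federal / GDP) / (total / GDP) simplifies to federal / total."""
    return federal_to_gdp / total_to_gdp


# Hypothetical ratios (illustration only): Federal debt rises relative
# to GDP while total debt falls relative to GDP.
federal_to_gdp_before, federal_to_gdp_after = 0.65, 1.00
total_to_gdp_before, total_to_gdp_after = 3.80, 3.50

share_before = federal_share(federal_to_gdp_before, total_to_gdp_before)
share_after = federal_share(federal_to_gdp_after, total_to_gdp_after)

print(share_after > share_before)
```

Whenever the numerator of this quotient rises and the denominator falls, the share must rise, whatever the particular values are.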

Conclusions

Good data science starts with a question, analyzes data to address that question, and presents the results of that analysis in a form accessible to the target audience.  In the present context, the primary question is whether there has been an explosion of debt relative to GDP since 2008.  The answer to that question seems to be a resounding no.  Although the debt has, as expected, increased, it has been at a lower rate than before 2008 and it has actually declined relative to GDP.

In conducting this analysis, we have had to make some decisions about how to translate the informal language of English into precise, testable hypotheses. We interpreted “explosion” to mean a higher rate of increase after 2008 than before. We interpreted “relative to” to mean the ratio of debt to GDP. Effective data science, or any other kind of science, for that matter, always requires such translations from informal language into mathematically precise language.
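These two interpretations can be written down as small, testable definitions. The sketch below is one way to express them. The function names are hypothetical, and the two rates are the per-day figures reported with Figure 2.

```python
def exploded(rate_before, rate_after):
    """'Explosion' read as: the rate of increase is higher after than before."""
    return rate_after > rate_before


def relative_to(debt, gdp):
    """'Relative to' read as: the ratio of debt to GDP."""
    return debt / gdp


rate_before = 9.5  # $ billions per day, 2001 to 2007 (from Figure 2)
rate_after = 3.4   # $ billions per day, after 2008 (from Figure 2)

debt_exploded = exploded(rate_before, rate_after)
print(debt_exploded)
```

Under a different reading of “explosion,” the test would change. Writing the definition down makes clear which claim is being checked.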

Figure 6. US government debt and GDP since 2008.  
Unlike overall debt, Federal debt has increased since 2008, 
but that relationship is relatively difficult to observe 
when both debt and GDP are increasing.
Many people have a difficult time assessing claims about rates. They often find it difficult to understand what it means to say that some factor increased, but at a lower rate than before. Although people can, if they think about it, understand that the GDP growth rate is lower at one time than at another, if they don’t actively work at it, they might expect that a decreasing growth rate would result in a decreasing GDP. It is not that people are incapable of understanding such rate claims, but they may find it difficult.
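A short numerical sketch makes the distinction concrete. The growth rates and the starting level below are hypothetical, chosen only to show that a growth rate can fall every year while the level itself keeps rising.

```python
# Hypothetical annual growth rates that decline each year (illustration only).
growth_rates = [0.04, 0.03, 0.02, 0.01]

level = 100.0  # hypothetical starting level, as an index
levels = [level]
for g in growth_rates:
    level *= 1 + g  # positive growth, however small, still raises the level
    levels.append(level)

rates_falling = all(b < a for a, b in zip(growth_rates, growth_rates[1:]))
levels_rising = all(b > a for a, b in zip(levels, levels[1:]))
print(rates_falling, levels_rising)
```

The level would fall only if the growth rate itself dropped below zero.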

Effective data science presentations, therefore, are the ones that reduce the cognitive load of making difficult comparisons by designing visualizations that make these comparisons as explicit as possible. Figure 5 makes the comparison of growth rates before and after 2008 clearer than Figure 1 does, and Figure 2 focuses on it even more directly.

A lot of data science focuses on big data and machine learning, but good data science is necessary when dealing with more moderately scaled data sets, as well.  Basic line charts can provide interesting and useful information, but they can also be misused.

The data scientist's conclusion should be apparent to the reader from the chosen visualization. The audience should not have to dig through the graph to assess one's claims (see Figure 4, debt relative to GDP).

Figure 7. Federal debt as a proportion of GDP. This version makes it clear that the Federal debt has been increasing relative to GDP since 2008.
Data science can help us to make sense of the world around us.  Effective data science can help to guide policy and predict the future.  Even relatively simple data, however, must be handled with care.  Insights are only possible when the data speak loudly through sound analysis and sound presentation.


The goal of effective data science should be to find patterns in data, not to selectively present the data to support preformed conclusions. Visualizations can help to make data more comprehensible, but they can also be used to mislead. Visualizations are not arbitrarily related to the data; rather, the combination of theory and data constrains the kind of visualization that is appropriate and useful.