3.7 Comparing one city’s data to the US median

Whether you’re a journalist or a government staffer looking at a trend line like the rise in San Francisco median household income, it’s helpful to keep a key statistical question in mind: compared to what? A 6% increase in a city’s median household income over several years might be impressive if national income stayed flat during the same period, but it could look sluggish if US median household income rose 10% over that time.
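To make that comparison concrete, here is a small R sketch of the arithmetic. The dollar figures are made up purely to reproduce the 6% and 10% example; they are not real FRED values, and pct_change() is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: percent change from a start value to an end value
pct_change <- function(start, end) 100 * (end - start) / start

# Made-up values chosen only to mirror the 6% vs 10% example
city_change <- pct_change(50000, 53000)   # 6% rise for the city
us_change   <- pct_change(50000, 55000)   # 10% rise for the nation

# Positive means the city outpaced the nation; negative means it lagged
city_change - us_change
# [1] -4
```

The result is in percentage points: the city's 6% gain trails the national 10% gain by 4 points.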

So let’s add national median household income to the graph. I looked up US median household income on FRED, and its series ID is “MHIUS00000A052NCEN”. Download the national data the same way you pulled the San Francisco data.


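Then add the national data column to the San Francisco data set with R’s cbind() function, storing the combined result in a variable called mygraphdata. cbind stands for “column bind”: it binds two data sets together side by side, adding the columns of one to the other. Its counterpart rbind() (“row bind”) adds the rows of one data set below another. With xts time-series objects like these, cbind() also lines the rows up by date.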
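Here is a minimal, offline sketch of what that binding does. It assumes the xts package is installed. With real data you would first download both series from FRED; if you use the quantmod package for that, a call such as getSymbols("MHIUS00000A052NCEN", src = "FRED") creates an xts object named after the series. To keep the sketch runnable without an internet connection, it builds two tiny annual series with made-up values instead, where the city series deliberately lacks the last year.

```r
library(xts)

# Made-up annual values for illustration only; the real series come from FRED
years <- as.Date(c("2014-01-01", "2015-01-01", "2016-01-01", "2017-01-01"))
sf <- xts(c(80000, 84000, 88000), order.by = years[1:3])  # city lacks 2017
us <- xts(c(52000, 55000, 57000, 59000), order.by = years)

# For xts objects, cbind() hands off to merge.xts(), which matches rows by date;
# a date found in only one series gets NA in the other column
mygraphdata <- cbind(sf, us)
colnames(mygraphdata) <- c("SanFrancisco", "US")
print(mygraphdata)
```

Because the rows are matched by date rather than by position, series that start or end in different years still line up correctly; the gaps simply show up as NA.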

## Warning in merge.xts(..., all = all, fill = fill, suffixes = suffixes): NAs
## introduced by coercion

Now, use the dygraph() function to graph the mygraphdata data set the same way you created the first graph.
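A sketch of that call, assuming the xts and dygraphs packages are installed. So that it runs on its own, it builds a small stand-in for mygraphdata with made-up values; with the real data you would skip that step and pass your own mygraphdata. The title text is just an example.

```r
library(xts)
library(dygraphs)

# Hypothetical stand-in for mygraphdata (made-up values, not FRED data)
mygraphdata <- xts(cbind(SanFrancisco = c(80000, 84000, 88000),
                         US = c(52000, 55000, 57000)),
                   order.by = as.Date(c("2014-01-01", "2015-01-01", "2016-01-01")))

# dygraph() draws one line per column of the xts object
income_graph <- dygraph(mygraphdata,
                        main = "San Francisco vs US median household income")
```

Type income_graph at the console (or run the code in RStudio) to display the interactive graph.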

Figure 3.5: Graph of San Francisco and US median household income

One of the best things about scripting this: When new data is available from the Fed, you can just run the same code and you’ll have an updated graph! Now, imagine how useful this is if you work with data that updates monthly or weekly.

Another advantage: once you’ve got the basic code for pulling data from the Fed, it’s easy to switch to another data point such as unemployment. The FRED code for San Francisco unemployment is “SANF806UR” and for US unemployment it’s “UNRATE”, so you can swap those in for “MHICA06075A052NCEN” and “MHIUS00000A052NCEN”, respectively, to get your data set.
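One way to make that swap painless is to wrap the download-and-bind steps in a function that takes the two series IDs. The sketch below is a hypothetical helper, fred_pair(), not something from an existing package; it assumes you have the quantmod package installed and an internet connection when you call it. With auto.assign = FALSE, quantmod’s getSymbols() returns the downloaded series instead of creating a variable for it.

```r
# Hypothetical helper: download two FRED series and bind them side by side.
# Assumes the quantmod package is installed and you are online when calling it.
fred_pair <- function(city_id, us_id, col_names = c("City", "US")) {
  city <- quantmod::getSymbols(city_id, src = "FRED", auto.assign = FALSE)
  us   <- quantmod::getSymbols(us_id, src = "FRED", auto.assign = FALSE)
  both <- cbind(city, us)   # xts objects are matched up by date
  colnames(both) <- col_names
  both
}

# Uncomment to run (requires an internet connection).
# Median household income, as earlier in this section:
# mygraphdata <- fred_pair("MHICA06075A052NCEN", "MHIUS00000A052NCEN",
#                          col_names = c("SanFrancisco", "US"))
# Unemployment: same code, different series IDs:
# unemploymentdata <- fred_pair("SANF806UR", "UNRATE",
#                               col_names = c("SanFrancisco", "US"))
```

Keeping the series IDs as arguments means that switching topics is a one-line change rather than an edit scattered through your script.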

Because San Francisco unemployment data is only available from 1990 on, while the national series goes back much further, it’s probably better to show only data from 1990 onward instead of displaying the earlier national data by itself. Time series have their own, somewhat unusual way of subsetting data: with an xts object, you can put a date range in square brackets, which lets you update the unemploymentdata variable so it contains only information from 1990 and later.
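A minimal sketch of that date-range subsetting, assuming the xts package is installed. It uses made-up monthly rates in which the national series starts in 1989 and the city series only in 1990, so the bound data starts with a year of NA values for the city.

```r
library(xts)

# Made-up monthly unemployment rates: US from 1989, city only from 1990
months    <- seq(as.Date("1989-01-01"), as.Date("1991-12-01"), by = "month")
us_rate   <- xts(rep(5.5, length(months)), order.by = months)
city_rate <- xts(rep(4.0, 24), order.by = months[13:36])

unemploymentdata <- cbind(city_rate, us_rate)
colnames(unemploymentdata) <- c("SanFrancisco", "US")

# xts accepts ISO-8601 date ranges as strings: "1990/" means
# "from the start of 1990 through the end of the data"
unemploymentdata <- unemploymentdata["1990/"]
```

The same notation works at both ends: a range such as "1990/1999" would keep only the 1990s.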

Now you can draw the graph with dygraph(), just as you did for median household income.
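A sketch of that final step, assuming the xts and dygraphs packages are installed. The first lines build a stand-in for the subset unemploymentdata with made-up rates so the code runs on its own. Beyond the basic call, it adds two optional extras: a y-axis label through dygraph()’s ylab argument, and dyRangeSelector(), a dygraphs function that adds a draggable date-range slider under the chart.

```r
library(xts)
library(dygraphs)

# Hypothetical stand-in for the subset unemploymentdata: 24 months of made-up rates
months <- seq(as.Date("1990-01-01"), by = "month", length.out = 24)
unemploymentdata <- xts(cbind(SanFrancisco = 4 + (0:23) / 10,
                              US = 5 + (0:23) / 10),
                        order.by = months)

# Same dygraph() call as before, plus an optional y-axis label
# and a range selector for zooming into part of the date range
unemployment_graph <- dyRangeSelector(
  dygraph(unemploymentdata,
          main = "San Francisco vs US unemployment rate",
          ylab = "Percent")
)
```

Print unemployment_graph at the console to view it; drop the dyRangeSelector() wrapper if you want the plain graph.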