Originally posted 8-13-2018
Abstract: One of the goals of stabilization policy is to reduce the output gap—the difference between potential and actual output—during downturns. Potential output, however, is an unobserved variable whose definition can vary. For example, some view potential output as the level of output that can be produced when employment is at the natural rate. Others use trend measures of output to measure potential. We survey some of these measures using both full-sample data (all of the data that would be available through June 2017) and real-time data (the actual data that would have been available at different points in the sample). We construct six different measures of potential: a linear trend, a quadratic trend, the Congressional Budget Office measure, and three filtered trends. We compare these measures across methods and across time. We also use the measures to compute the monetary policy prescription in a standard interest rate rule and find very little difference across methods.
One of the goals of stabilization policy is closing the output gap—that is, the difference between actual and potential output. Obviously, then, a key component of stabilization policy and its timing is the proper measurement of potential output. Potential output is an unobserved measure of the economy's ability to generate output. Unfortunately, many decades of academic research have failed to converge on a single measure (or even definition) of potential output, leaving one of policymakers' key objectives—at least, in part—out of focus.
As an example, one of the early works on measuring potential output was by Okun (1962) and led to the development of Okun's law, an empirical relationship between output and unemployment. Since that paper, a large literature has arisen around the measurement of potential output in various countries. Orphanides and van Norden (2002) show that conventional statistical measures produce unreliable estimates of the U.S. output gap in real time because they are subject to large revisions and the unreliability of end-of-sample estimates; however, Edge and Rudd (2016) find that despite changes in productivity, the revision properties of output-gap estimates improved significantly in the late 1990s and early 2000s. Fernald (2014) closely examines the productivity slowdown in the 2000s, showing that declines in productivity preceded the Great Recession and that the current low-productivity state is not related to the downturn. This slowdown implies slower long-term potential growth (Fernald estimates around 2.1 percent) and a narrower output gap than that identified by the Congressional Budget Office (CBO).
Marcellino and Musso (2011) find that estimates of the output gap in the euro area are also highly uncertain. However, they attribute the uncertainty to parameter instability and model uncertainty instead of data revisions. Champagne, Poulin-Bellisle, and Sekkel (2016) extend the U.S.-based studies to Canada's output-gap estimates, concluding that the Bank of Canada's staff estimates have become more reliable over the past 30 years. Dungey, Jacobs, and Tian (2017) use an unobserved components (UC) filter to forecast one-step-ahead potential for the G-7 countries and find that allowing for correlated shocks is more valuable to the measurement of potential than allowing for structural breaks.1
One of the central issues in measuring potential output is that there is no single theoretical definition. Kiley (2014) highlights the importance of defining concepts related to various methodologies for estimating potential output. Some economists prefer to define potential output as the trend—sometimes linear or quadratic—in output. This definition allows the analyst to use univariate (or sometimes multivariate) econometric techniques to measure potential output without much theory. However, even these techniques must rely on often implicit theoretical assumptions (Basu and Fernald, 2009). These techniques generally consist of filters that separate the trend from cyclical or high-frequency fluctuations. There are, however, a variety of different filters that are commonly used, and their popularity often depends on their ease of use or availability of executable code rather than the reliability or sensibility of the resulting potential output series. Other techniques, like those used by the CBO, define potential as the level of output possible if all resources are employed at their full potential, defined as trend growth in productive capacity. The theoretical assumptions underlying these techniques directly influence the calculation of potential; however, there are many unknowns associated with these models, including structural (parameter) shifts and even the appropriateness of the model itself to describe the economy (Basu and Fernald, 2009).
In this article, we explore some of the myriad methods of measuring potential output. We start by examining a measure produced by the CBO, which uses empirical relationships among output, the unemployment rate, and inflation, among other variables. We then consider a few common methods used to extract a trend version of potential output: (i) a linear trend, (ii) a quadratic trend, (iii) the Hodrick-Prescott (1997) filtered trend, (iv) a univariate version of the UC model (Harvey, 1989; Clark, 1987; and Watson, 1986), and (v) a multivariate version of the UC model that includes inflation (Basistha and Startz, 2008).
Evaluating these methods, however, is a difficult task. Unlike standard out-of-sample forecast experiments with observed data, potential output is latent, so there is no "truth" to which we can compare our estimates. However, one way to examine the measures is to view the implications of their use through a policy lens. We consider the use of the six different measures (five trend measures and the CBO measure) in a standard version of the Taylor rule, asking whether different methods for estimating potential output would lead to dramatically different conclusions about how policy should be implemented at different points. We first conduct this experiment ex post, using all of the data in the sample. We then consider how these conclusions would have varied in "real time," using only data that were available at the time.
We find small differences across the measures of potential output. These differences are magnified at the beginning and end of the sample around turning points and when using real-time data. However, we find that these differences do not typically affect the policy prescription resulting from a standard Taylor-type interest rate rule.
The balance of the article is laid out as follows: The first section describes four of the six measures of potential output that we consider. The second section describes the data and the full-sample estimates of potential output using linear and quadratic trends and each of the four measures described in the first section. The third section recomputes the estimates of the measures in real time, while the fourth section shows the implications for monetary policy using a calibrated Taylor rule. The final section concludes.
METHODS OF COMPUTING POTENTIAL OUTPUT
Potential output is an unobserved construct often thought to be synonymous with the maximum level of sustainable output. Because it is unobserved, potential output is generally constructed from observed data such as gross domestic product (GDP). Some definitions of potential output are based on theoretical or empirical relationships that are imposed on the data. For example, one might believe in both a Phillips curve-type relationship (between unemployment and inflation) and an Okun's law-type relationship (between output and unemployment) and construct potential output from a three-variable New Keynesian model. In this section, we discuss a few common methods of estimating potential output and how they differ, both theoretically and empirically. Obviously, the methods here describe only a small subset of the literature; they are, however, relatively commonly used measures.
The CBO defines potential output as the trend growth in the productive capacity of the economy. To estimate potential output, it uses a model that attributes real GDP growth to the growth in three factor inputs: capital, labor, and technological progress. The CBO method divides GDP into five sectors: nonfarm business, government, farm, households and nonprofits, and housing.2 For each sector, the CBO estimates a production function based on labor, capital accumulation, and total factor productivity (TFP) using a "potential value" for each factor.3
Output in sector $i$ is assumed to be generated from a Cobb-Douglas production function:
$$Y_{i,t} = A_{i,t}\, L_{i,t}^{1-\alpha}\, K_{i,t}^{\alpha},$$
where $Y_{i,t}$ is period-$t$ output in sector $i$, $L_{i,t}$ is hours worked in sector $i$ during period $t$, $K_{i,t}$ is the level of the capital stock in sector $i$ at time $t$, and $A_{i,t}$ is period-$t$ TFP for sector $i$ (CBO, 2001). The parameter $\alpha$ represents the capital share, that is, capital's contribution to output growth.4 Aggregate output is assumed to be the sum of output across the five sectors. The CBO defines potential output for each sector as the output obtained when all of the factors (A, L, and K) are at their potential levels.
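As a rough illustration of the production-function step, the following Python sketch evaluates a Cobb-Douglas function for one hypothetical sector, once at actual factor levels and once at assumed potential levels. The function name `cobb_douglas` and every input value are assumptions made for this example; none of them come from the CBO's model or data.

```python
def cobb_douglas(tfp: float, hours: float, capital: float, capital_share: float) -> float:
    """Sector output Y = A * L**(1 - alpha) * K**alpha."""
    if not 0.0 < capital_share < 1.0:
        raise ValueError("capital_share must lie strictly between 0 and 1")
    return tfp * hours ** (1.0 - capital_share) * capital ** capital_share


# Hypothetical inputs for a single sector (illustrative only, not CBO data).
actual = cobb_douglas(tfp=1.00, hours=95.0, capital=300.0, capital_share=0.3)
potential = cobb_douglas(tfp=1.02, hours=100.0, capital=300.0, capital_share=0.3)

# Output gap in percent of potential; negative when output is below potential.
gap_pct = 100.0 * (actual - potential) / potential
print(f"actual={actual:.1f} potential={potential:.1f} gap={gap_pct:.1f}%")
```

Because hours and TFP are below their assumed potential levels in this example, the computed gap is negative.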
The potential level of the hours worked factor, $L_t^*$, is computed as the product of average weekly hours, $h_t$, and potential employment, which in turn is determined by the potential labor force, $LF_t^*$:
$$L_t^* = h_t\,(1 - u_t^*)\,LF_t^*,$$
where $u_t^*$ is the natural rate of unemployment.5 The size of the labor force is assumed to be a function of the unemployment gap (i.e., the deviation of the unemployment rate from the natural rate) and the state of the business cycle, $D_t$:
$$LF_t = \beta_0 + \beta_1\,(u_t - u_t^*) + D_t\,\gamma + \varepsilon_t,$$
where $D_t$ is row $t$ of $D$, a matrix of business cycle dummy variables, with each column representing one business cycle, and $\varepsilon_t$ is an iid normal error term. A representative element of $D$ is set to 1 during the expansion associated with that column. The CBO uses the fitted values from this regression, setting $u_t = u_t^*$, to obtain an estimate for the potential size of the labor force, $LF_t^*$.
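The labor-input step can be sketched in Python under some loudly stated assumptions: the labor force is regressed linearly on an intercept, the unemployment gap, and expansion dummies; the data are simulated; and the coefficient values, average weekly hours, and natural rate are all made up. This is an illustration of the idea of "fitted values with the gap set to zero," not the CBO's actual specification.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 80
# Hypothetical expansion dummies: one column per business cycle, equal to 1
# during that cycle's expansion quarters and 0 otherwise.
D = np.zeros((n, 2))
D[0:30, 0] = 1.0    # expansion of cycle 1 (quarters 30-39 are a recession)
D[40:70, 1] = 1.0   # expansion of cycle 2 (quarters 70-79 are a recession)

u_gap = rng.normal(0.0, 1.0, n)                  # u_t - u*_t, percentage points
X = np.column_stack([np.ones(n), u_gap, D])
true_beta = np.array([150.0, -0.4, 1.0, 2.5])    # made-up coefficients
labor_force = X @ true_beta + rng.normal(0.0, 0.1, n)

# OLS regression of the labor force on the gap and the cycle dummies.
beta_hat, *_ = np.linalg.lstsq(X, labor_force, rcond=None)

# Potential labor force: fitted values with the unemployment gap set to zero.
X_star = X.copy()
X_star[:, 1] = 0.0
potential_labor_force = X_star @ beta_hat

# Potential hours: average weekly hours x (1 - natural rate) x potential labor force.
avg_weekly_hours = 34.5   # hypothetical
natural_rate = 0.05       # hypothetical
potential_hours = avg_weekly_hours * (1.0 - natural_rate) * potential_labor_force
```

Setting the gap column to zero removes the cyclical component from the fitted values, so the resulting series moves only with the cycle-specific dummies.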
Because different types of capital have different potential productivities, the CBO uses a capital services index that assumes that the marginal productivity of different types of capital is proportional to capital cost shares.6 The types of capital in the CBO index are computers, software, communications equipment, other equipment, nonresidential structures, inventories, and land. The index is then the average of capital growth rates weighted by cost share:
$$\Delta \ln K_{i,t} = \sum_{j} \bar{w}_{i,j,t}\, \Delta \ln K_{i,j,t},$$
where $j$ denotes the different types of capital. The weights $\bar{w}_{i,j,t}$ are a two-period average of the relative cost share of each type of capital (CBO, 2001) for sector $i$. The resulting index does not have to be adjusted to potential because the productivities already represent the potential contribution of capital to output.
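A minimal Python sketch of a cost-share-weighted capital index follows. It assumes that growth rates are measured as log differences and that the weights are simple two-period averages of cost shares; the function name `capital_services_growth`, the three asset types shown, and all numbers are hypothetical.

```python
import math


def capital_services_growth(stocks_prev, stocks_curr, shares_prev, shares_curr):
    """Log growth of a cost-share-weighted capital index between two periods."""
    growth = 0.0
    for asset in stocks_curr:
        # Two-period average of the asset's relative cost share.
        weight = 0.5 * (shares_prev[asset] + shares_curr[asset])
        growth += weight * math.log(stocks_curr[asset] / stocks_prev[asset])
    return growth


# Hypothetical stocks and cost shares for three asset types (illustrative only).
stocks_prev = {"computers": 100.0, "structures": 500.0, "land": 300.0}
stocks_curr = {"computers": 110.0, "structures": 505.0, "land": 300.0}
shares_prev = {"computers": 0.20, "structures": 0.50, "land": 0.30}
shares_curr = {"computers": 0.24, "structures": 0.48, "land": 0.28}

index_growth = capital_services_growth(stocks_prev, stocks_curr, shares_prev, shares_curr)
print(f"capital services growth: {100.0 * index_growth:.2f} percent")
```

Fast-growing assets with large cost shares dominate the index, which is why the weighting matters when the composition of capital shifts toward equipment such as computers.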
The final component of the nonfarm business sector is TFP, which is estimated as the residual: the historical growth in output not attributed to capital and labor. These values are then cyclically adjusted in a manner similar to the labor factor, using separate dummy time trends for each business cycle.
While the sectors of the CBO model generally follow the same format, there is some variation. For example, the government sector separates output into local, state, and federal levels and then calculates each individual component by using potential compensation of employees and capital cost allowance. Compensation is modeled as a function of potential employment and potential productivity. Output for the farm sector is a function of potential farm employment and potential output per employee, and output from the households and nonprofit sector is calculated using potential hours worked and potential productivity in the same manner outlined above. The housing sector is a bit different and relies on projections of the residential housing stock based on residential investment and projections of the productivity of that stock using the rate of residential capital depreciation.
Another method for computing potential output that is commonly used by both the media and academics defines potential as "trend output." While this method can be as simple as computing a deterministic (say, linear or quadratic) trend from output data, the literature has typically adopted slightly more complicated methods of extracting the trend. One popular method for extracting a trend from a single series of data is the Hodrick-Prescott (1997, henceforth HP) filter. The objective of the HP filter is to separate the high-frequency or cyclical fluctuations from the low-frequency or trend movements.
Suppose that $y_t$ is the series in question (in our case, output) and $T$ is the length of the data series. The HP-filtered trend is obtained from
$$\min_{\{\tau_t\}_{t=1}^{T}} \; \sum_{t=1}^{T} (y_t - \tau_t)^2 + \lambda \sum_{t=2}^{T-1} \left(\Delta\tau_{t+1} - \Delta\tau_t\right)^2,$$
where $\Delta\tau_t = \tau_t - \tau_{t-1}$ and $\tau_t$ is the value of the potential output (measured as the trend) at time $t$. The parameter $\lambda$ is a smoothing parameter that is discussed below. The two terms of the equation lead to a trade-off between allowing the trend to follow the data (the first term) and minimizing movements in the trend (the second term). The smoothing parameter increases the weight on the latter: The larger the smoothing parameter, the more weight is placed on minimizing the higher-frequency fluctuations. Thus, as the data frequency increases, it is desirable to set the smoothing parameter to larger values (see Ravn and Uhlig, 2002, for a discussion of the values commonly used for various data frequencies).
While the HP filter's relative ease of use is an advantage, the method does have a number of drawbacks. Aside from the need to choose the "correct" value of the smoothing parameter, the HP filter makes a number of assumptions about the data to be examined, including that the series is $I(2)$. If the series is not $I(2)$, the HP filter can introduce false fluctuations in the trend.7
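For readers who want to see the mechanics, here is a minimal Python sketch of the HP filter that solves the first-order condition of the minimization problem as a linear system. The function name `hp_filter` and the simulated series are assumptions for illustration; the default smoothing value of 1600 is the one conventionally used for quarterly data. A dense matrix is used for clarity, which is adequate only for short series.

```python
import numpy as np


def hp_filter(y, lam=1600.0):
    """Return (trend, cycle) from the Hodrick-Prescott filter."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if n < 3:
        raise ValueError("need at least three observations")
    # Second-difference operator: row t picks out tau[t] - 2*tau[t+1] + tau[t+2].
    D = np.zeros((n - 2, n))
    for t in range(n - 2):
        D[t, t:t + 3] = [1.0, -2.0, 1.0]
    # First-order condition of the HP problem: (I + lam * D'D) tau = y.
    trend = np.linalg.solve(np.eye(n) + lam * (D.T @ D), y)
    return trend, y - trend


# Hypothetical quarterly log-output series: linear growth plus a six-year cycle.
t = np.arange(120, dtype=float)
log_output = 0.005 * t + 0.02 * np.sin(2.0 * np.pi * t / 24.0)
trend, cycle = hp_filter(log_output, lam=1600.0)
output_gap_pct = 100.0 * cycle   # approximate gap in percent of potential
```

Raising `lam` pushes the trend toward a straight line, while `lam = 0` returns the data themselves as the trend, which is the trade-off between the two terms of the objective.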
An alternative to the HP filter that may have better statistical characteristics is the UC model first introduced in Harvey (1989). Similar to the HP filter, the UC model assumes that the data series $y_t$ (in our case, log output) can be written as the sum of a unit root trend $\tau_t$ and stationary cycle $c_t$:
$$y_t = \tau_t + c_t,$$
where the trend is a random walk with drift $\mu$,
$$\tau_t = \mu + \tau_{t-1} + \eta_t,$$
and the cycle exhibits stationary AR($p$) dynamics,
$$\phi(L)\,c_t = \varepsilon_t,$$
where $\phi(L) = 1 - \phi_1 L - \cdots - \phi_p L^p$ is a lag polynomial, $\eta_t$ and $\varepsilon_t$ are iid normal error terms, and $p = 2$ is a common identification assumption.
A typical assumption in the UC literature is that the variance-covariance of the vector of errors is diagonal; that is, $\eta_t$ and $\varepsilon_t$ are orthogonal. However, Morley, Nelson, and Zivot (2003) showed that the assumption of zero correlation between the trend and cycle innovations can be relaxed. Moreover, imposing perfect correlation between $\eta_t$ and $\varepsilon_t$ yields another common filter, the Beveridge-Nelson decomposition. For our application, we will consider the uncorrelated UC model but refer the reader to Morley, Nelson, and Zivot (2003) for the application with correlation between the shocks to the trend and cycle.
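The trend-cycle decomposition can be sketched in Python as a Kalman filter on the state vector (trend, cycle, lagged cycle). Several assumptions are made here and should be read as such: the cycle is AR(2), the two innovations are uncorrelated, the parameters are taken as known rather than estimated by maximum likelihood, and the data are simulated. The sketch runs only the one-sided filter, not a smoother, and the function name `uc_kalman_filter` is hypothetical.

```python
import numpy as np


def uc_kalman_filter(y, mu, phi1, phi2, sigma_eta, sigma_eps):
    """Filtered trend and cycle for y_t = tau_t + c_t with an AR(2) cycle."""
    y = np.asarray(y, dtype=float)
    # State vector: [tau_t, c_t, c_{t-1}].
    T = np.array([[1.0, 0.0, 0.0],
                  [0.0, phi1, phi2],
                  [0.0, 1.0, 0.0]])
    d = np.array([mu, 0.0, 0.0])
    Q = np.diag([sigma_eta ** 2, sigma_eps ** 2, 0.0])   # uncorrelated innovations
    Z = np.array([1.0, 1.0, 0.0])

    # Vague prior on the trend level, modest prior on the cycle.
    a = np.array([y[0] - mu, 0.0, 0.0])
    P = np.diag([1e4, 1.0, 1.0])

    trend = np.empty(y.size)
    cycle = np.empty(y.size)
    for t in range(y.size):
        # Prediction step.
        a = d + T @ a
        P = T @ P @ T.T + Q
        # Update step (the observation equation has no measurement error).
        v = y[t] - Z @ a
        F = Z @ P @ Z
        K = P @ Z / F
        a = a + K * v
        P = P - np.outer(K, Z @ P)
        trend[t], cycle[t] = a[0], a[1]
    return trend, cycle


# Simulated log output under made-up parameter values.
rng = np.random.default_rng(1)
n = 200
tau = np.cumsum(0.008 + rng.normal(0.0, 0.005, n))
c = np.zeros(n)
for t in range(2, n):
    c[t] = 1.3 * c[t - 1] - 0.4 * c[t - 2] + rng.normal(0.0, 0.008)
y = tau + c
trend, cycle = uc_kalman_filter(y, 0.008, 1.3, -0.4, 0.005, 0.008)
```

Because the observation equation contains no measurement error, the filtered trend and cycle sum to the observed series in every period; what the filter decides is how each new observation is split between the two components.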
Multivariate Unobserved Components
One issue with the potential output measures computed in the previous two sections is that they are simply trends extracted from output without any economic theory. We can add an element of economic theory by incorporating other variables that might fluctuate when the difference between output and potential output changes. For example, if one believed in Okun's law and a Phillips curve (as in a New Keynesian model), then there should exist a relationship between output's deviation from potential and inflation. We can capture this relationship by estimating a multivariate version of the UC model, where we allow some correlation between the cross-series innovations in the trends and cycles.
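For this model, we assume that both output, $y_t$, and inflation, $\pi_t$, can be written as the simple sum of a trend and a cycle:
$$y_t = \tau_{y,t} + c_{y,t}, \qquad \pi_t = \tau_{\pi,t} + c_{\pi,t},$$
where the trends in each variable are unit roots with separate drifts:
$$\tau_{y,t} = \mu_y + \tau_{y,t-1} + \eta_{y,t}, \qquad \tau_{\pi,t} = \mu_\pi + \tau_{\pi,t-1} + \eta_{\pi,t}.$$
The cycles, $c_t = (c_{y,t}, c_{\pi,t})'$, evolve as a VAR:
$$\Phi(L)\,c_t = \varepsilon_t, \qquad \varepsilon_t = (\varepsilon_{y,t}, \varepsilon_{\pi,t})',$$
where the covariance matrix of the stacked innovations $(\eta_t', \varepsilon_t')'$, with $\eta_t = (\eta_{y,t}, \eta_{\pi,t})'$, is block diagonal,
$$\Sigma = \begin{bmatrix} \Sigma_{\eta} & 0 \\ 0 & \Sigma_{\varepsilon} \end{bmatrix}.$$
This formulation allows an outside variable (in this case, inflation) to influence the estimate of potential output through the contemporaneous correlation in $\Sigma$ and the lagged correlation in the VAR.8 The block diagonal structure allows contemporaneous correlation across the variables in the cycles or trends but does not allow the trend and cycle of a variable to be contemporaneously correlated and does not allow the trend of one variable to be correlated with the cycle of the other.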
COMPARING SIX MEASURES OF POTENTIAL OUTPUT
The standard measure for output is real GDP, which is available at a quarterly frequency from the Bureau of Economic Analysis (BEA). The BEA releases three estimates of output: an advance estimate available one month after the end of a quarter, a preliminary estimate available two months after the end of a quarter, and a final estimate available three months after the end of a quarter. The final numbers are continually revised. As is typical in these types of analyses, we present the revised final estimates, seasonally adjusted and annualized.
In a later section, we consider the computation of real-time estimates of potential output using only the final estimates of real GDP that would have been available at the time. Moreover, we use the data vintage from each period, ignoring subsequent revisions to the data. Thus, in 1990:Q1, we would compute potential output using only data available at that point in time and using only the revisions that were available up to that date.
Inflation data are the year-over-year percent change in the quarterly, seasonally adjusted personal consumption expenditures (PCE) price index compiled by the BEA.
These two data series, along with the CBO's potential output, are accessed from the Federal Reserve Bank of St. Louis FRED® database. We use data from 1950:Q1 through 2017:Q2 for the full-sample analysis. The real-time measures of output, inflation, and the CBO's potential output come from the FRED® vintage data service, ALFRED®. For the output measures, we use the first vintage available in January 1992; for inflation, we use the first vintage in January 1996.
Figure 1 presents output gaps—the difference between actual output and potential output—for six alternative measures of potential GDP. (NOTE: The gray bars in the figures indicate recessions as determined by the National Bureau of Economic Research.) Panel A shows the linear and quadratic time trends, Panel B shows the CBO measure and the trend component of the HP filter, and Panel C shows the two UC models. The linear time trend assumes that the growth rate in potential output is constant over time. A prominent feature of the data is that GDP growth appears slower in the first half of the sample than in the second half. Because the linear time trend assumes that the growth rate of potential output is constant over the whole sample, the output gap is negative for much of the middle years of the sample. Additionally, the output gap is large and positive (actual output is greater than potential) at the beginning and end of the sample.
The quadratic time trend introduces an additional term that allows the growth rate of potential output to change over time. The quadratic trend may be a more realistic estimate of potential GDP, yielding a negative output gap during economic downturns and a positive output gap during the height of some expansions. Based on the quadratic trend, GDP was above potential for most of the 2000s, then fell below potential during the 2007-09 recession, and has yet to recover. While simple to compute and easy to understand, these deterministic trends do not allow for possible structural change and are sensitive to the sample used to compute them.
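The two deterministic trends can be sketched in a few lines of Python with least-squares polynomial fits. The series below is simulated with some curvature so that a straight line cannot fit both halves of the sample; the function name `polynomial_trend`, the use of logs, and all coefficients are assumptions for illustration and do not reproduce the estimates in Figure 1.

```python
import numpy as np


def polynomial_trend(y, degree):
    """Fit a deterministic polynomial time trend by least squares."""
    t = np.arange(len(y), dtype=float)
    coefs = np.polyfit(t, y, degree)
    return np.polyval(coefs, t)


rng = np.random.default_rng(2)
t = np.arange(120, dtype=float)
# Hypothetical series with curvature plus noise (illustrative only).
log_gdp = 7.0 + 0.010 * t + 0.00002 * t ** 2 + rng.normal(0.0, 0.01, t.size)

# Output gaps in percent: actual minus trend, scaled by 100.
linear_gap = 100.0 * (log_gdp - polynomial_trend(log_gdp, 1))
quadratic_gap = 100.0 * (log_gdp - polynomial_trend(log_gdp, 2))
```

With this curvature, the linear-trend gap is positive at both ends of the sample and negative in the middle, while the quadratic trend absorbs the curvature and leaves a smaller gap.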
Panel B of Figure 1 presents the CBO estimate and the trend component of the HP filter. Compared with the deterministic trends above, these two measures track real GDP more closely, usually resulting in smaller output gaps. For example, measured GDP was below both the HP-filtered and the CBO measures of potential output for short periods at the beginning of the 2000s. However, during the 2007-09 recession, the two measures diverged: Measured GDP was at the CBO measure of potential leading into the Great Recession but above the HP-filtered measure of potential.
These differences highlight one feature of the CBO model: It rarely places output above potential. Figure 2 highlights the last 10 years of data (2006 to 2016) for both the CBO measure and the HP filter. The CBO has the output gap much larger than the other measures during and after the Great Recession, with output still below potential into 2016.9 The HP filter, on the other hand, has output above potential for the first half of the recession and shows a negative output gap only until mid-2011.
Figure 3 compares the year-over-year percent change in the HP-filtered and the CBO potential output measures. The two series mostly move together, with some exceptions around the end of the sample, during the expansion between 1960 and 1970, and during the 1973 downturn. At the beginning of the sample in 1950, potential GDP growth was high in both measures, slightly above 5 percent. Potential output growth fluctuates over 5- to 10-year cycles for the next 50 years, averaging about 3 percent until the early 2000s when it falls to less than 1 percent before bottoming out. Figure 3 also shows that the HP filter's performance deteriorates around the ends of the sample. Because the filter has fewer data points on the ends, it is more difficult to identify the trend. Near the ends of the sample, the CBO potential growth rate is about 1.5 percent, while the HP filter's is higher, at about 2.1 percent.10
Panel C of Figure 1 presents the trend component of GDP from the univariate and multivariate UC models. The univariate UC trend yields positive output gaps before the 1991, 2001, and 2007-09 recessions (albeit a much smaller positive gap leading into the 2007-09 recession). After the Great Recession, the univariate UC trend yields a large negative output gap. Unlike previous measures, the univariate UC output gap does not narrow during the Great Recession recovery. Instead, the gap widens. The bivariate UC trend, which allows for correlation between inflation and output in the estimation of potential, shows output below potential through the 1980s.11 Output returns to potential for a while after the 1980s until the mid-1990s, then goes above potential and remains there until the 2007-09 recession. Consistent with a narrative about unsustainable gains in productivity during the IT boom leading to growth above potential (Fernald, 2014), the bivariate UC model is the only measure for which measured output is above potential for all of the 2000s. After the 2007-09 recession, output falls below potential and the gap widens through the end of the sample. This model implies that GDP during the 2000s was growing above potential and that the 2007-09 recession actually served to adjust growth back down to potential levels.
REAL-TIME ESTIMATES OF POTENTIAL OUTPUT
The previous section demonstrated that differences in the definition of potential output or the methods used to construct it can alter our conclusions about the state of the economy. In that section, we used the data that were available in 2017:Q3. The data are, however, revised over time. For example, the 2001:Q1 observation of GDP might be different if observed in 2003:Q1 versus 2004:Q1. These revisions to the GDP data will obviously affect the calculation of potential.12
To see how the different measures of potential output change as the data are revised, we compute quasi-real-time estimates using different vintages of data. To compute a series for potential output at time $t$, we use the data that would be available at that time; that is, we use the vintage-$t$ observations $\{y_1, \ldots, y_t\}$. We consider a few different vintages of data: the first available vintage, 1992:Q1, and vintages for the first quarter of every fifth year, 1995:Q1, 2000:Q1, 2005:Q1, 2010:Q1, and 2015:Q1. The bivariate UC model is computed using vintage data for both GDP and inflation. Because of limited availability of inflation vintages, the bivariate UC model is calculated starting in 1996:Q1 and then every fifth year (2000:Q1 and so forth). Figure 4 shows the six GDP vintages indexed to 100 at the beginning of the sample (January 1961 for the real-time analysis). A major revision between the 1995:Q1 and 2000:Q1 vintages shifts GDP up substantially. Another, less dramatic, upward shift occurs in the 2000s between the 2010:Q1 and 2015:Q1 vintages.
Figures 5-7 plot the vintage computations of potential output for the different methods outlined above. Notice that the real-time measure not only affects the last period of the vintage, but also can affect the estimates of potential output for all periods before. This effect can occur because the data are different (revised) or because the latest observation of the data affects the inference for all periods before. The GDP vintages used are reported in real terms using varying dollar indexes. To compare vintages across different dollar indexes, we index the potential computations to 100 in the first period.
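The vintage exercise can be sketched in Python as follows: estimate potential separately on each vintage, index each estimate to 100 in its first period, and compare over the common window. The two vintages below are hypothetical (a made-up GDP path, a made-up rebasing factor of 1.25, and a made-up upward growth revision), and a linear trend in levels stands in for any of the six methods.

```python
import numpy as np


def linear_trend(y):
    """Least-squares linear time trend fitted to a single vintage."""
    y = np.asarray(y, dtype=float)
    t = np.arange(y.size, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)
    return intercept + slope * t


def index_to_100(series):
    """Rescale a series so that its first observation equals 100."""
    series = np.asarray(series, dtype=float)
    return 100.0 * series / series[0]


# Hypothetical vintages: same economy, different dollar base and revised growth.
quarters = np.arange(156)
history = 1000.0 * 1.008 ** quarters
vintages = {
    "1992:Q1": history[:124],
    "2000:Q1": 1.25 * history * 1.0005 ** quarters,   # rebased and revised upward
}

potential = {label: index_to_100(linear_trend(gdp)) for label, gdp in vintages.items()}

# Revision to indexed potential over the quarters covered by both vintages.
common = len(vintages["1992:Q1"])
revision = potential["2000:Q1"][:common] - potential["1992:Q1"]
```

Indexing removes the pure change in dollar base, so what remains in `revision` reflects the longer sample and the revised growth path; note that the revision is nonzero for every period after the first, not just the most recent ones.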
Figure 5 shows real-time potential output for the linear time trend (Panel A) and quadratic time trend (Panel B). Because of the restriction in the linear trend that output growth be constant over time, as the sample size increases, the slope of the potential output curve becomes steeper. This steady increase reflects both GDP growth over time and revisions that shifted historical growth upward. The quadratic trend picks up on the same increasing growth over time. The first two vintages, 1992:Q1 and 1995:Q1, move together, then the vintages between 2000:Q1 and 2010:Q1 have an upward shift with a more defined curve, and finally the last vintage in 2015:Q1 moves up again.
Figure 6 plots real-time vintage potential output for the HP-filter (Panel A) and CBO measures (Panel B). These measures produce essentially the same potential output estimates for the 1992:Q1 and 1995:Q1 vintages. In 2000:Q1, the GDP revision causes an upward level shift in potential output. The real-time analysis further illustrates the problem of the HP filter being less accurate around the end points. As seen in the figure, around the Great Recession, the HP-filter potential estimate increases notably from the 2010:Q1 vintage to the 2015:Q1 vintage. The CBO measure is more consistent through the recession, although the 2015:Q1 vintage shifts potential down slightly during the recession.
Figure 7 plots the vintage output estimates for the univariate UC (left) and bivariate UC (right) models. Potential obtained from the bivariate UC model has a different starting vintage to incorporate the personal consumption expenditures inflation data. The univariate UC vintages follow a pattern similar to those of the CBO and HP filter, reflecting an increase in potential output after 2000:Q1. Otherwise, the univariate model is mostly consistent despite the GDP revisions. The notable exception is around the 2007-09 recession, where the 2015:Q1 vintage estimates potential to be higher during the recession than the 2010:Q1 vintage.
Data revisions for the bivariate UC model are substantively different from the other models. In the years following the Great Recession, data revisions for the other models generally lead to upward revisions in potential. For the bivariate UC model, however, the revised data show a downward adjustment of potential.
IMPLICATIONS FOR POLICY
One reason to care about measuring potential output is that it may be important for the conduct of monetary policy. While there appear to be differences in the measure of potential across methods and vintages, one might want to determine whether these differences are substantive enough to produce differences in policy. Although the Federal Reserve's dual mandate is to achieve full employment and stable prices, the conduct of monetary policy is often theorized to depend on output relative to its potential (which would be proportional to the deviation between the unemployment rate and the natural rate if Okun's law were assumed to hold). As an example, the Taylor (1993) rule, which is often characterized as an optimal rule for the stance of monetary policy, sets the policy rate, $i_t$, as a function of the deviation of the inflation rate, $\pi_t$, from a target rate, $\pi^*$, and the deviation of log output, $y_t$, from log potential, $y_t^*$:
$$i_t = r^* + \pi_t + \alpha_\pi\,(\pi_t - \pi^*) + \alpha_y\,(y_t - y_t^*), \tag{3}$$
where $r^*$ is the equilibrium interest rate. A common parameterization of the Taylor rule is to set $\alpha_\pi = \alpha_y = 0.5$ and set the equilibrium interest rate $r^* = 2$.
The two coefficients $\alpha_\pi$ and $\alpha_y$ reflect the policymaker's responsiveness to inflation and output deviations, respectively. Larger coefficients imply larger movements in the policy instrument (usually the federal funds rate). However, the coefficients in the Taylor rule are not the only "free parameters." While the Fed has recently adopted a target band for inflation centered on 2 percent, how one computes potential output can affect the prescribed interest rate even with fixed parameters and a fixed inflation target.
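To make the role of the output gap concrete, the following Python sketch evaluates a Taylor (1993)-type rule for several hypothetical output gaps. It assumes the standard form in which the prescribed rate equals the equilibrium real rate plus inflation plus one-half of the inflation deviation plus one-half of the output gap, with a 2 percent target and a 2 percent equilibrium rate. The function name `taylor_rule` and the gap values are illustrative and are not the estimates reported in this article.

```python
def taylor_rule(inflation, output_gap, target=2.0, r_star=2.0, a_pi=0.5, a_y=0.5):
    """Prescribed policy rate (percent) under a Taylor (1993)-type rule."""
    return r_star + inflation + a_pi * (inflation - target) + a_y * output_gap


# Hypothetical output gaps (percent) implied by different potential measures.
gaps = {"linear trend": 1.5, "HP filter": -0.5, "CBO": -2.0}
inflation = 2.0   # inflation at target, so only the gap term differs

prescriptions = {name: taylor_rule(inflation, gap) for name, gap in gaps.items()}
for name, rate in prescriptions.items():
    print(f"{name:>12}: {rate:.2f} percent")
```

With a coefficient of 0.5 on the gap, a 1-percentage-point disagreement about the output gap moves the prescribed rate by only half a percentage point, which helps explain why moderate differences across potential measures translate into small differences in the prescription.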
Figure 8 shows the time series of the federal funds rate prescribed by the Taylor rule in Equation 3 using the six measures of potential output described above and the actual federal funds rate over the same period. Each of these series is computed using the full sample of data. Except for the beginning of the sample, the different measures of potential prescribe almost the same policy rate. In the beginning of the sample, the linear time trend and bivariate UC yield estimates that are slightly different from those of the other four measures because of large differences in each measure's estimated output gap. The linear trend prescribes a slightly higher policy rate in the early years of the sample because assuming a constant growth rate places potential below actual output for the first 10 years of data.
One issue with the exercise in Figure 8 is that policymakers did not have access to all of the information for the full sample of data at every point in time. This availability issue is important because the HP filter is a two-sided filter and both UC methods use smoothers. The use of smoothers means that they use all of the data available to infer potential output; that is, data at time $T$ at the end of the sample could influence the estimate of potential output at time $t < T$. Because the policymaker would not have the period-$T$ data available, his or her estimate of potential output could differ substantially and lead to a very different prescription for policy.
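The dependence of a two-sided estimate on later data can be illustrated with a deliberately simple stand-in. The Python sketch below uses a centered moving average rather than the HP filter or a UC smoother, and the data and function name `two_sided_trend` are hypothetical; the point is only that the estimate for a given period changes once later observations become available.

```python
def two_sided_trend(y, t, half_window=4):
    """Centered moving average at index t, using whatever leads and lags exist."""
    lo = max(0, t - half_window)
    hi = min(len(y), t + half_window + 1)
    window = y[lo:hi]
    return sum(window) / len(window)


# Hypothetical output series with a downturn after period 5.
full_sample = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 103.0, 101.0, 100.0, 100.5]
t = 5

real_time = two_sided_trend(full_sample[: t + 1], t)   # data only through period t
ex_post = two_sided_trend(full_sample, t)              # later observations now enter
print(f"real-time trend: {real_time:.2f}, ex post trend: {ex_post:.2f}")
```

In this example the ex post estimate of the trend at period 5 is lower than the real-time estimate because the subsequent downturn pulls the two-sided average down, so the real-time output gap looks smaller than the gap measured with hindsight.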
Figure 9 shows the real-time policy rates prescribed by the univariate UC model. The full-sample data above do not yield drastic differences for the policy rate; however, different vintages within the same measure do show important differences. We show only the vintages for the univariate UC model, but the real-time policy rates from the other measures yield the same results.13 As more data become available and historical data are revised, estimated potential changes and so does the respective output gap. These updates to the data series lead us to reach different conclusions about policy rate levels around turning points. The real-time policy rule identifies an earlier turning point for the 1970 recession in the 1995:Q1 vintage than in the 1992:Q1 vintage and exhibits a lower max policy rate during the double-dip 1980s recession. After 1992:Q1, the revised data cause a downward level shift in the prescribed policy rate during the 1980s. Beyond a small blip in the late 1960s between the 1992:Q1 and 1995:Q1 vintages, the turning points occur on the same dates. However, the level of the policy rate at these turning points does vary. Figures 10 and 11 highlight level shifts in vintage policy rules around turning points. For example, the turning point before the 1990s recession was revised lower as new vintages were released. Figure 10 shows that after the 1992:Q1 vintage, data revisions imply a later turning point in the 1969-70 recession. Figure 11 highlights the downward shift in prescribed policy rates during the 1980s. As data revisions were released, policy rates shifted from a peak of around 9.5 percent from the 1992:Q1 vintage during the 1991 recession to around 8.5 percent in the 2010:Q1 and 2015:Q1 vintages.
We considered a number of relatively common methods of measuring potential output. While the measures produce qualitatively similar results, they can also vary at important times: at the beginning and end of the sample period and around turning points. Moreover, the measures can vary substantially in real time. However, and perhaps fortunately for the policymaker, simple interest rate rules do not show wildly different policy prescriptions based on the different measures.
9 For a full discussion of advantages and disadvantages related to the CBO's production function based approach, see Arnold (2009).
10 The CBO argues that filtered growth data is "trend output" rather than "potential output" because it does not tie directly to stable inflation (CBO, 2004).
11 Dungey, Jacobs, and Tian (2017) find that correlated innovations are more important than structural breaks for potential estimation, which suggests greater consideration should be given to multivariate models incorporating inflation into measures of potential.
Amy Y. Guisinger is an assistant professor of economics at Lafayette College. Michael T. Owyang is an assistant vice president and economist and Hannah G. Shell is a senior research associate in the Research Division at Federal Reserve Bank of St. Louis. The authors thank Kevin Kliesen and Jeremy Piger for helpful discussion.