This article presents empirical evidence on the efficacy of forecast averaging using the ALFRED (ArchivaL Federal Reserve Economic Data) real-time database. The authors consider averages over a variety of bivariate vector autoregressive models. The models differ from one another in at least one of the following respects: (i) the choice of variables used as predictors, (ii) the number of lags, (iii) the use of all available data or only data after the Great Moderation, (iv) the observation window used to estimate the model parameters and construct averaging weights, and (v) the use of either iterated multistep or direct multistep methods for forecast horizons greater than one. A variety of averaging methods are considered. The results indicate that the benefits of model averaging, relative to model selection based on the Bayesian information criterion, depend heavily on the class of models averaged. The authors provide a novel decomposition of the forecast improvements that identifies which averaging methods, and which types of models included in the average, are most (and least) helpful.