This paper presents empirical evidence on the efficacy of forecast averaging using the ALFRED real-time database. We consider averages taken over a variety of bivariate VAR models that differ in at least one of the following dimensions: the variables used as predictors, the number of lags, whether all available data or only post-Great Moderation data are used, the observation window used to estimate the model parameters and construct averaging weights, and, for forecast horizons greater than one, whether iterated or direct multistep methods are used. A variety of averaging methods are considered. Our results indicate that the benefits of model averaging relative to BIC-based model selection depend strongly on the class of models being averaged over. We provide a novel decomposition of the forecast improvements that identifies which types of averaging methods and models were most (and least) useful in the averaging process.
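The contrast between BIC-based model selection and forecast averaging can be illustrated with a minimal sketch. Simulated data and an equal-weight combination stand in here for the ALFRED vintages and the paper's actual weighting schemes; the `fit_var` and `forecast` helpers are hypothetical, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a bivariate series (stand-in for real-time ALFRED data).
T = 200
A = np.array([[0.5, 0.1], [0.0, 0.4]])
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.normal(scale=0.5, size=2)

def fit_var(y, p):
    """OLS fit of a bivariate VAR(p); returns coefficients and a BIC value."""
    T, k = y.shape
    # Stack lagged regressors: lag 1 first, then lag 2, ..., lag p.
    X = np.hstack([y[p - i - 1:T - i - 1] for i in range(p)])
    X = np.hstack([np.ones((T - p, 1)), X])  # intercept
    Y = y[p:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ B
    sigma = resid.T @ resid / (T - p)
    bic = np.log(np.linalg.det(sigma)) + B.size * np.log(T - p) / (T - p)
    return B, bic

def forecast(y, B, p):
    """One-step-ahead forecast from the fitted VAR(p)."""
    x = np.concatenate([[1.0], np.concatenate([y[-i - 1] for i in range(p)])])
    return x @ B

# Candidate models differ only in lag length here; the paper's model set
# also varies predictors, sample start, and estimation window.
models = {p: fit_var(y[:-1], p) for p in (1, 2, 4)}
fcasts = {p: forecast(y[:-1], B, p) for p, (B, _) in models.items()}

# BIC-based selection: keep only the lowest-BIC model's forecast.
best_p = min(models, key=lambda p: models[p][1])
selected = fcasts[best_p]

# Equal-weight average over all candidates (one of many weighting schemes).
averaged = np.mean(list(fcasts.values()), axis=0)
```

An equal-weight combination is only the simplest benchmark; the weights could instead be built from past forecast performance over the observation window, which is closer in spirit to the schemes the paper compares.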