|
on Econometrics |
By: | Shi, Chengchun; Song, Rui; Lu, Wenbin; Li, Runzi |
Abstract: | In this article, we develop a new estimation and valid inference method for single or low-dimensional regression coefficients in high-dimensional generalized linear models. The number of the predictors is allowed to grow exponentially fast with respect to the sample size. The proposed estimator is computed by solving a score function. We recursively conduct model selection to reduce the dimensionality from high to a moderate scale and construct the score equation based on the selected variables. The proposed confidence interval (CI) achieves valid coverage without assuming consistency of the model selection procedure. When the selection consistency is achieved, we show the length of the proposed CI is asymptotically the same as the CI of the “oracle” method which works as well as if the support of the control variables were known. In addition, we prove the proposed CI is asymptotically narrower than the CIs constructed based on the desparsified Lasso estimator and the decorrelated score statistic. Simulation studies and real data applications are presented to back up our theoretical findings. Supplementary materials for this article are available online. |
Keywords: | Confidence interval; Ultrahigh dimensions; Generalized linear models; Online estimation |
JEL: | C1 |
Date: | 2020–01–23 |
URL: | http://d.repec.org/n?u=RePEc:ehl:lserod:103043&r=all |
By: | Paulo M.M. Rodrigues; Marina Balboa; Antonio Rubia; A. M. Robert Taylor |
Abstract: | We introduce a new joint test for the order of fractional integration of a multivariate fractionally integrated vector autoregressive [FIVAR] time series based on applying the Lagrange multiplier principle to a feasible generalised least squares estimate of the FIVAR model obtained under the null hypothesis. A key feature of the test we propose is that it is constructed using a heteroskedasticity-robust estimate of the variance matrix. As a result, the test has a standard χ² limiting null distribution under considerably weaker conditions on the innovations than are permitted in the extant literature. Specifically, we allow the innovations driving the FIVAR model to follow a vector martingale difference sequence allowing for both serial and cross-sectional dependence in the conditional second-order moments. We also do not constrain the order of fractional integration of each element of the series to lie in a particular region, thereby allowing for both stationary and non-stationary dynamics, nor do we assume any particular distribution for the innovations. A Monte Carlo study demonstrates that our proposed tests avoid the large over-sizing problems seen with extant tests when conditional heteroskedasticity is present in the data. We report an empirical case study for a sample of major U.S. stocks investigating the order of fractional integration in trading volume and different measures of volatility in returns, including realized variance. Our results suggest that both return volatility and trading volume are fractionally integrated, but with the former generally found to be more persistent (having a higher fractional exponent) than the latter, when more reliable proxies for volatility such as the range or realized variance are used. |
JEL: | C12 C22 |
Date: | 2021 |
URL: | http://d.repec.org/n?u=RePEc:ptu:wpaper:w202102&r=all |
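For readers unfamiliar with fractional integration, the entry above concerns series that become short-memory after applying the filter (1 - L)^d for a possibly non-integer d. The sketch below only illustrates that filter through its binomial expansion; it is not the authors' test, and the function names `frac_diff_weights` and `frac_diff` are hypothetical, chosen for illustration.

```python
def frac_diff_weights(d, n):
    """First n coefficients of the binomial expansion of (1 - L)**d.

    Uses the recursion w_0 = 1, w_k = w_{k-1} * (k - 1 - d) / k.
    """
    weights = [1.0]
    for k in range(1, n):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(series, d):
    """Apply the truncated filter (1 - L)**d, treating pre-sample values as zero."""
    w = frac_diff_weights(d, len(series))
    return [
        sum(w[k] * series[t - k] for k in range(t + 1))
        for t in range(len(series))
    ]


if __name__ == "__main__":
    # d = 1 reproduces ordinary first differencing.
    print(frac_diff([1.0, 3.0, 6.0, 10.0], 1.0))
    # A fractional d gives slowly decaying weights (long memory).
    print(frac_diff_weights(0.5, 4))
```

With d = 1 the weights collapse to 1, -1, 0, 0, ..., so the filter is the usual first difference; values of d between 0 and 1 interpolate between no differencing and full differencing.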
By: | Francisco Blasques (Vrije Universiteit Amsterdam); Andre Lucas (Vrije Universiteit Amsterdam); Anne Opschoor (Vrije Universiteit Amsterdam); Luca Rossini (Queen Mary University of London) |
Abstract: | We introduce the new F-Riesz distribution to model tail-heterogeneity in fat-tailed covariance matrix observations. In contrast to the typical matrix-valued distributions from the econometric literature, the F-Riesz distribution allows for different tail behavior across all variables in the system. We study the consistency properties of the maximum likelihood estimator in both static and dynamic models with F-Riesz innovations using both one-step and two-step (targeting) estimation techniques. Allowing for tail-heterogeneity when modeling covariance matrices appears empirically highly relevant. When applying the new distribution to realized covariance matrices of 30 U.S. stocks over a 14-year period, we find huge likelihood increases both in-sample and out-of-sample compared to all competing distributions, including the Wishart, inverse Wishart, Riesz, inverse Riesz, and matrix-F distribution. |
Keywords: | Matrix Distributions, Tail Heterogeneity, (inverse) Riesz Distribution, Fat-Tails, Realized Covariance Matrices |
JEL: | C58 C32 C46 |
Date: | 2021–01–24 |
URL: | http://d.repec.org/n?u=RePEc:tin:wpaper:20210010&r=all |
By: | Wang, Wenjie |
Abstract: | This note studies the asymptotic validity of bootstrapping the test of overidentifying restrictions under many/many weak instruments and heteroskedasticity. We show that the wild bootstrap consistently estimates the null limiting distributions of a jackknife overidentification statistic under this asymptotic framework. In particular, such bootstrap validity holds even when the bootstrap procedure fails to mimic well the distribution of the jackknife instrumental variable estimator, an important component of the statistic of interest. Monte Carlo simulations show that the wild bootstrap provides a more reliable method than that based on asymptotic critical values to approximate the null distributions of interest under many/many weak instruments and heteroskedasticity. |
Keywords: | Bootstrap, Overidentification Tests, Many Instruments, Weak Instruments, Heteroskedasticity |
JEL: | C12 C15 C26 |
Date: | 2020–12–21 |
URL: | http://d.repec.org/n?u=RePEc:pra:mprapa:104858&r=all |
By: | Siem Jan Koopman (Vrije Universiteit Amsterdam); Julia Schaumburg (Vrije Universiteit Amsterdam); Quint Wiersma (Vrije Universiteit Amsterdam) |
Abstract: | We propose a new unified approach to identifying and estimating spatio-temporal dependence structures in large panels. The model accommodates global cross-sectional dependence due to global dynamic factors as well as local cross-sectional dependence, which may arise from local network structures. Model selection, filtering of the dynamic factors, and estimation are carried out iteratively using a new algorithm that combines the Expectation-Maximization algorithm with coordinate descent and gradient descent, allowing us to efficiently maximize an l1- and l2-penalized state space likelihood function. A Monte Carlo simulation study illustrates the good performance of the algorithm in terms of determining the presence and magnitude of global and/or local cross-sectional dependence. In an empirical application, we investigate monthly US interest rate data on 15 maturities over almost 40 years. We find that besides a changing number of global dynamic factors, there is heterogeneous local dependence among neighboring maturities. Taking this heterogeneity into account substantially improves out-of-sample forecasting performance. |
Keywords: | high-dimensional factor model, Lasso, spatial error model, yield curve |
JEL: | C32 C33 C38 |
Date: | 2021–01–21 |
URL: | http://d.repec.org/n?u=RePEc:tin:wpaper:20210008&r=all |
By: | Pincheira, Pablo; Hardy, Nicolás; Muñoz, Felipe |
Abstract: | In this paper we present a new asymptotically normal test for out-of-sample evaluation in nested models. Our approach is a simple modification of a traditional encompassing test that is commonly known as the Clark and West (CW) test. The key point of our strategy is to introduce an independent random variable that prevents the traditional CW test from becoming degenerate under the null hypothesis of equal predictive ability. Using the approach developed by West (1996), we show that in our test the impact of parameter estimation uncertainty vanishes asymptotically. Using a variety of Monte Carlo simulations in iterated multi-step-ahead forecasts, we evaluate our test and CW in terms of size and power. These simulations reveal that our approach is reasonably well-sized even at long horizons, where CW may present severe size distortions. In terms of power, results are mixed, but CW has an edge over our approach. Finally, we illustrate the use of our test with an empirical application in the context of the commodity currencies literature. |
Keywords: | forecasting; random walk; out-of-sample; prediction; mean square prediction error |
JEL: | C01 C1 C12 G17 |
Date: | 2021–01 |
URL: | http://d.repec.org/n?u=RePEc:pra:mprapa:105368&r=all |
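As background for the entry above, the following sketch computes the unmodified Clark and West statistic for two nested forecasts. The abstract does not specify the authors' modification, so none is attempted here. The function names are hypothetical, and the plain sample variance is an assumption that suits one-step-ahead forecasts only; multi-step forecasts would typically call for a HAC variance.

```python
import math


def cw_terms(y, f_small, f_large):
    """Per-period Clark-West adjusted loss differentials.

    e1 is the error of the small (nested) model, e2 that of the large model.
    The term e1**2 - (e2**2 - (f_small - f_large)**2) equals 2 * e1 * (e1 - e2),
    which is the encompassing form of the test.
    """
    terms = []
    for target, fs, fl in zip(y, f_small, f_large):
        e1 = target - fs
        e2 = target - fl
        terms.append(e1 ** 2 - (e2 ** 2 - (fs - fl) ** 2))
    return terms


def cw_statistic(y, f_small, f_large):
    """t-type statistic: mean of the terms over its estimated standard error."""
    terms = cw_terms(y, f_small, f_large)
    n = len(terms)
    mean = sum(terms) / n
    var = sum((t - mean) ** 2 for t in terms) / (n - 1)  # sample variance, no HAC
    return mean / math.sqrt(var / n)


if __name__ == "__main__":
    y = [1.0, 2.0, 0.5]
    print(cw_statistic(y, [0.0, 0.0, 0.0], [0.5, 1.0, 1.0]))
```

The statistic is usually compared with one-sided standard normal critical values, rejecting equal predictive ability for large positive values.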
By: | Minji Bang; Wayne Yuan Gao; Andrew Postlewaite; Holger Sieg |
Abstract: | This paper develops a new method for identifying econometric models with partially latent covariates. Such data structures arise naturally in industrial organization and labor economics settings where data are collected using an "input-based sampling" strategy, e.g., if the sampling unit is one of multiple labor input factors. We show that the latent covariates can be nonparametrically identified, if they are functions of a common shock satisfying some plausible monotonicity assumptions. With the latent covariates identified, semiparametric estimation of the outcome equation proceeds within a standard IV framework that accounts for the endogeneity of the covariates. We illustrate the usefulness of our method using two applications. The first focuses on pharmacies: we find that production function differences between chains and independent pharmacies may partially explain the observed transformation of the industry structure. Our second application investigates education achievement functions and illustrates important differences in child investments between married and divorced couples. |
Date: | 2021–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2101.05847&r=all |
By: | Palumbo, D. |
Abstract: | The occurrence of extreme observations in a time series depends on the heaviness of the tails of its distribution. The paper proposes a dynamic conditional score model (DCS) for modelling dynamic shape parameters that govern the tail index. The model is based on the Generalised t family of conditional distributions, allowing for the presence of asymmetric tails and therefore the possibility of specifying different dynamics for the left and right tail indices. The paper examines through simulations both the convergence properties of the model and the implications of the link functions used. In addition, the paper introduces and studies the size and power properties of a new Lagrange Multiplier (LM) test based on fitted scores to detect the presence of dynamics in the tail index parameter. The paper also shows that the novel LM test is more effective than existing tests based on fitted scores. The model is fitted to Equity Indices and Credit Default Swaps returns. It is found that the tail index for equities has dynamics driven mainly by either the upper or the lower tail, depending on whether leverage is taken into account. In the case of Credit Default Swaps, the test identifies very persistent dynamics for both tails. Finally, the implications of dynamic tail indices for the estimated conditional distribution are assessed in terms of conditional distribution forecasting, showing that the novel model predicts expected shortfalls and value-at-risk more accurately than existing models. |
Keywords: | Heavy Tailed Distributions, Extreme Events, Score-Driven Models, Tail Index, Lagrange Multiplier Test, Financial Markets |
JEL: | C12 C18 C51 C52 C46 C58 G12 |
Date: | 2021–01–29 |
URL: | http://d.repec.org/n?u=RePEc:cam:camdae:2111&r=all |
By: | Nicolas Debarsy (CNRS - Centre National de la Recherche Scientifique); Cem Ertur (UO - Université d'Orléans, Laboratoire d'Economie d'Orléans - LEO - Laboratoire d'Économie d'Orleans - CNRS - Centre National de la Recherche Scientifique - Université de Tours - UO - Université d'Orléans) |
Abstract: | The interaction matrix, or spatial weight matrix, is the fundamental tool to model cross-sectional interdependence between observations in spatial econometric models. However, most of the time it is not derived from theory, as it ideally should be, but chosen on an ad hoc basis. In this paper, we propose a modified version of the J test to formally select the interaction matrix. Our methodology is based on the GMM estimation method developed by Lin & Lee (2010), which is robust to unknown heteroskedasticity. We then implement the testing procedure developed by Hagemann (2012) to overcome the decision problem inherent to tests of non-nested models. An application is presented for the Schumpeterian growth model with worldwide interactions (Ertur & Koch 2011) using three different types of interaction matrix: genetic distance, linguistic distance and bilateral trade flows. We find that the interaction matrix based on trade flows is the most adequate. Furthermore, we propose an innovative network-based representation of spatial econometric results. |
Keywords: | Bootstrap, GMM, Interaction matrix, J tests, Non-nested models, Heteroskedasticity, Spatial autoregressive models |
Date: | 2019–03 |
URL: | http://d.repec.org/n?u=RePEc:hal:journl:halshs-01278545&r=all |
By: | Florian Eckert (KOF Swiss Economic Institute, ETH Zurich, Switzerland); Philipp Kronenberg (KOF Swiss Economic Institute, ETH Zurich, Switzerland); Heiner Mikosch (KOF Swiss Economic Institute, ETH Zurich, Switzerland); Stefan Neuwirth (KOF Swiss Economic Institute, ETH Zurich, Switzerland) |
Abstract: | Most macroeconomic indicators failed to capture the sharp economic fluctuations during the Corona crisis in a timely manner. Instead, alternative high-frequency data have been used, aiming to monitor the economic situation. However, these data are often only loosely related to the business cycle and come with irregular patterns of missing observations, ragged edges and short histories. This paper presents a novel mixed-frequency dynamic factor model for measuring economic activity at high-frequency intervals in rich data environments. Previous research has estimated the dynamic factor conditional on actually observed data only. In contrast, we propose to estimate the dynamic factor conditional on a balanced panel with observed and latent data information, where the latent data are themselves estimated in a separate state-space block. One benefit of this data augmentation strategy is that it makes it easy to account for serial correlation in the factor measurement errors. We apply the model to a set of daily, weekly, monthly and quarterly series and extract a dynamic factor, which is identified as the weekly growth rate of GDP. It turns out that the model is well suited to exploit the business cycle information contained in alternative high-frequency data. GDP is tracked in a timely and accurate manner during the Corona crisis and past economic crises. |
Keywords: | Economic Activity Indicator, Real Time, Nowcasting, Alternative High-Frequency Data, Mixed-Frequency Dynamic Factor Model, Data Augmentation |
JEL: | C11 C32 C38 C53 E32 E37 |
Date: | 2020–12 |
URL: | http://d.repec.org/n?u=RePEc:kof:wpskof:20-488&r=all |
By: | Edvard Bakhitov; Amandeep Singh |
Abstract: | Recent advances in the literature have demonstrated that standard supervised learning algorithms are ill-suited for problems with endogenous explanatory variables. To correct for the endogeneity bias, many variants of nonparametric instrumental variable regression methods have been developed. In this paper, we propose an alternative algorithm called boostIV that builds on the traditional gradient boosting algorithm and corrects for the endogeneity bias. The algorithm is very intuitive and resembles an iterative version of the standard 2SLS estimator. Moreover, our approach is data driven, meaning that the researcher does not have to take a stance on either the form of the target function approximation or the choice of instruments. We demonstrate that our estimator is consistent under mild conditions. We carry out extensive Monte Carlo simulations to demonstrate the finite sample performance of our algorithm compared to other recently developed methods. We show that boostIV is at worst on par with the existing methods and on average significantly outperforms them. |
Date: | 2021–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2101.06078&r=all |
By: | Arthur Lewbel (Boston College); Susanne M. Schennach (Brown University); Linqi Zhang (Boston College) |
Abstract: | We show that a standard linear triangular two equation system can be point identified, without the use of instruments or any other side information. We find that the only case where the model is not point identified is when a latent variable that causes endogeneity is normally distributed. In this non-identified case, we derive the sharp identified set. We apply our results to Acemoglu and Johnson’s (2007) model of life expectancy and GDP, obtaining point identification and comparable estimates to theirs, without using their (or any other) instrument. |
Keywords: | Returns to schooling, identification, triangular system, Kotlarski, deconvolution |
JEL: | C14 C30 |
Date: | 2020–12–20 |
URL: | http://d.repec.org/n?u=RePEc:boc:bocoec:1022&r=all |
By: | Tamás Krisztin; Philipp Piribauer |
Abstract: | We develop a Bayesian approach to estimate weight matrices in spatial autoregressive (or spatial lag) models. Our approach focuses on spatial weights which are binary prior to row-standardization. However, unlike recent literature our approach requires no strong a priori assumptions on (socio-)economic distances between the spatial units. The estimation approach relies on efficient Gibbs sampling techniques and can be easily combined with and extended to more flexible spatial specifications. In addition to geographic prior structures, we also discuss shrinkage priors on the neighbourhood size, which are particularly useful in spatial panels where T is small relative to N. |
Date: | 2021–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2101.11938&r=all |
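The entry above refers to spatial weights that are "binary prior to row-standardization". As a minimal illustration of that preprocessing step only (not of the Bayesian estimation approach), the sketch below row-standardizes a binary adjacency matrix so that each unit's neighbour weights sum to one. The function name `row_standardize` and the choice to leave rows without neighbours at zero are assumptions made for illustration.

```python
def row_standardize(adjacency):
    """Divide each row of a binary adjacency matrix by its row sum.

    Rows with no neighbours (row sum zero) are left as zeros, which is one
    common convention; other treatments are possible.
    """
    standardized = []
    for row in adjacency:
        total = sum(row)
        standardized.append([value / total if total else 0.0 for value in row])
    return standardized


if __name__ == "__main__":
    # Unit 0 neighbours units 1 and 2; unit 1 neighbours unit 0; unit 2 is isolated.
    binary_w = [
        [0, 1, 1],
        [1, 0, 0],
        [0, 0, 0],
    ]
    for row in row_standardize(binary_w):
        print(row)
```

After standardization, the spatial lag of a variable is a weighted average of the neighbours' values, which keeps the spatial autoregressive parameter on a comparable scale across units.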
By: | Pablo Montero-Manso; Rob J Hyndman |
Abstract: | Forecasting of groups of time series (e.g. demand for multiple products offered by a retailer, server loads within a data center or the number of completed ride shares in zones within a city) can be approached locally, by considering each time series as a separate regression task and fitting a function to each, or globally, by fitting a single function to all time series in the set. While global methods can outperform local for groups composed of similar time series, recent empirical evidence shows surprisingly good performance on heterogeneous groups. This suggests a more general applicability of global methods, potentially leading to more accurate tools and new scenarios to study. However, the evidence has been of empirical nature and a more fundamental study is required. Formalizing the setting of forecasting a set of time series with local and global methods, we provide the following contributions: • We show that global methods are not more restrictive than local methods for time series forecasting, a result which does not apply to sets of regression problems in general. Global and local methods can produce the same forecasts without any assumptions about similarity of the series in the set, therefore global models can succeed in a wider range of problems than previously thought. • We derive basic generalization bounds for local and global algorithms, linking global models to pre-existing results in multi-task learning: We find that the complexity of local methods grows with the size of the set while it remains constant for global methods. Global algorithms can afford to be quite complex and still benefit from better generalization error than local methods for large datasets. These bounds serve to clarify and support recent experimental results in the area of time series forecasting, and guide the design of new algorithms. For the specific class of limited-memory autoregressive models, this bound leads to the design of global models with much larger memory than what is effective for local methods. • The findings are supported by an extensive empirical study. We show that purposely naïve algorithms derived from these principles, such as global linear models fit by least squares, deep networks or even high order polynomials, result in superior accuracy in benchmark datasets. In particular, global linear models show an unreasonable effectiveness, providing competitive forecasting accuracy with far fewer parameters than the simplest of local methods. Empirical evidence points towards global models being able to automatically learn long memory patterns and related effects that are only available to local models if introduced manually. |
Keywords: | time series, forecasting, generalization, global, local, cross-learning, pooled regression |
Date: | 2020 |
URL: | http://d.repec.org/n?u=RePEc:msh:ebswps:2020-45&r=all |
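The local versus global distinction in the entry above can be made concrete with the simplest possible case: a one-lag autoregression without intercept, fitted by least squares either per series (local) or once on the pooled data (global). This is a toy sketch under those assumptions, not the authors' experimental setup; all function names are hypothetical.

```python
def lagged_pairs(series):
    """Return (previous value, current value) pairs for a one-lag regression."""
    return list(zip(series[:-1], series[1:]))


def ar1_slope(pairs):
    """Least-squares slope of current on previous value, without intercept."""
    numerator = sum(prev * cur for prev, cur in pairs)
    denominator = sum(prev * prev for prev, _ in pairs)
    return numerator / denominator


def fit_local(panel):
    """Local approach: one coefficient per series."""
    return [ar1_slope(lagged_pairs(series)) for series in panel]


def fit_global(panel):
    """Global approach: a single coefficient fitted on all series pooled."""
    pooled = [pair for series in panel for pair in lagged_pairs(series)]
    return ar1_slope(pooled)


if __name__ == "__main__":
    panel = [
        [1.0, 0.5, 0.25, 0.125],  # follows x_t = 0.5 * x_{t-1}
        [2.0, 1.0, 0.5],          # same dynamics, different scale and length
    ]
    print("local :", fit_local(panel))
    print("global:", fit_global(panel))
```

The local fit estimates as many coefficients as there are series, while the global fit always estimates one, which is the sense in which the complexity of local methods grows with the size of the set.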
By: | Daouia, Abdelaati; Gijbels, Irene; Stupfler, Gilles |
Abstract: | Regression extremiles define a least squares analogue of regression quantiles. They are determined by weighted expectations rather than tail probabilities. Of special interest is their intuitive meaning in terms of expected minima and maxima. Their use appears naturally in risk management where, in contrast to quantiles, they fulfill the coherency axiom and take the severity of tail losses into account. In addition, they are comonotonically additive and belong to both the families of spectral risk measures and concave distortion risk measures. This paper provides the first detailed study exploring implications of the extremile terminology in a general setting where covariates are present. We rely on local linear (least squares) check function minimization for estimating conditional extremiles and deriving the asymptotic normality of their estimators. We also extend extremile regression far into the tails of heavy-tailed distributions. Extrapolated estimators are constructed and their asymptotic theory is developed. Some applications to real data are provided. |
Date: | 2021–01–18 |
URL: | http://d.repec.org/n?u=RePEc:tse:wpaper:125140&r=all |
By: | Pincheira, Pablo; Hardy, Nicolas |
Abstract: | This is a summary of the paper entitled “The Mean Squared Prediction Error Paradox”. In that paper, we show that traditional comparisons of Mean Squared Prediction Error (MSPE) between two competing forecasts may be highly controversial. This is so because when some specific conditions of efficiency are not met, the forecast displaying the lowest MSPE will also display the lowest correlation with the target variable. Given that violations of efficiency are usual in the forecasting literature, this opposite behavior in terms of accuracy and correlation with the target variable may be a fairly common empirical finding that we label here as "the MSPE Paradox." We characterize "Paradox zones" in terms of differences in correlation with the target variable and conduct some simple simulations to show that these zones may be non-empty sets. Finally, we illustrate the relevance of the Paradox with two empirical applications. |
Keywords: | Mean Squared Prediction Error, Correlation, Forecasting, Time Series, Random Walk. |
JEL: | C0 C00 C01 C02 C2 C21 C22 C4 C41 C44 C5 C51 C52 C53 C54 C58 E0 E3 E37 E5 E58 E6 F3 F31 F37 F4 F41 F44 F47 G00 G1 G12 G15 G17 |
Date: | 2020–12–29 |
URL: | http://d.repec.org/n?u=RePEc:pra:mprapa:105020&r=all |
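The situation described in the entry above, where the forecast with the lower MSPE is also the one less correlated with the target, is easy to reproduce with made-up numbers. The sketch below uses a hypothetical four-point target and two hypothetical forecasts; the data are invented for illustration and do not come from the paper.

```python
import math


def mspe(target, forecast):
    """Mean squared prediction error."""
    return sum((y - f) ** 2 for y, f in zip(target, forecast)) / len(target)


def correlation(x, y):
    """Pearson correlation coefficient."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    target = [1.0, -1.0, 2.0, -2.0]
    # Forecast A: close to the target on average but only loosely related to it.
    forecast_a = [0.5, 0.5, 0.5, -0.5]
    # Forecast B: perfectly correlated with the target but badly scaled,
    # so it violates forecast efficiency.
    forecast_b = [3.0 * value for value in target]

    for name, forecast in (("A", forecast_a), ("B", forecast_b)):
        print(name, mspe(target, forecast), correlation(target, forecast))
```

Forecast A wins on MSPE while forecast B wins on correlation, so ranking the two forecasts depends on which criterion is used.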