on Big Data |
By: | Dominique Guegan (Centre d'Economie de la Sorbonne and LabEx ReFi); Bertrand Hassani (Group Capgemini and Centre d'Economie de la Sorbonne and LabEx ReFi) |
Abstract: | The arrival of big data strategies is threatening the latest trends in financial regulation, which favour the simplification of models and the enhancement of the comparability of approaches chosen by financial institutions. Indeed, the intrinsically dynamic philosophy of Big Data strategies is almost incompatible with the current legal and regulatory framework, as illustrated in this paper. Besides, as presented in our application to credit scoring, the model selection may also evolve dynamically, forcing both practitioners and regulators to develop libraries of models, strategies allowing them to switch from one to another, as well as supervising approaches allowing financial institutions to innovate in a risk-mitigated environment. The purpose of this paper is therefore to analyse the issues related to the Big Data environment, and in particular to machine learning models, highlighting the problems present in the current framework when confronting the data flows, the model selection process and the necessity to generate appropriate outcomes. |
Keywords: | Financial Regulation; Algorithm; Big Data; Risk |
JEL: | C55 |
Date: | 2017–07 |
URL: | http://d.repec.org/n?u=RePEc:mse:cesdoc:17034&r=big |
By: | Xiaojiao Yu |
Abstract: | Online lending has disrupted the traditional consumer banking sector with more effective loan processing. Risk prediction and monitoring are critical to the success of the business model. Traditional credit score models fall short in applying big data technology to building risk models. In this manuscript, data of various formats and sizes were collected from public websites and third parties and assembled with clients' loan application information. Ensemble machine learning models, a random forest model and an XGBoost model, were built and trained on the historical transaction data and subsequently tested on separate data. The XGBoost model shows a higher K-S value, suggesting better classification capability in this task. The top 10 important features from the two models suggest that external data such as the zhimaScore, multi-platform loan stacking information, and social network information are important factors in predicting loan default probability. |
Date: | 2017–07 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:1707.04831&r=big |
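The abstract above ranks the two classifiers by their K-S value. As a minimal sketch, the Kolmogorov-Smirnov statistic can be computed directly from scores and default labels; the data below are synthetic, and the paper's actual features (e.g. zhimaScore) are not used:

```python
# Hedged sketch: the K-S statistic used above to compare credit-default
# classifiers. It is the maximum gap between the cumulative score
# distributions of defaulters (label 1) and non-defaulters (label 0).

def ks_statistic(scores, labels):
    """Return the K-S value for a list of scores and 0/1 default labels."""
    pairs = sorted(zip(scores, labels))       # sweep scores in ascending order
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tpr = fpr = 0.0
    best = 0.0
    for _, y in pairs:
        if y == 1:
            tpr += 1.0 / n_pos                # cumulative share of defaulters
        else:
            fpr += 1.0 / n_neg                # cumulative share of non-defaulters
        best = max(best, abs(tpr - fpr))
    return best

# Toy example: a score that separates the classes perfectly gives K-S = 1.
print(ks_statistic([0.9, 0.8, 0.7, 0.2, 0.1], [1, 1, 1, 0, 0]))  # 1.0
```

A higher K-S value for the XGBoost model than for the random forest is the paper's criterion for better classification capability.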
By: | Vasilios Plakandaras; Rangan Gupta; Periklis Gogas; Theophilos Papadimitriou |
Abstract: | The sudden and immense downturn in U.S. house prices in 2006 sparked the 2007 global financial crisis and revived interest in forecasting such imminent threats to economic stability. In this paper we propose a novel hybrid forecasting methodology that combines Ensemble Empirical Mode Decomposition (EEMD), from the field of signal processing, with Support Vector Regression (SVR), which originates from machine learning. We test the forecasting ability of the proposed model against a Random Walk (RW) model, a Bayesian Autoregressive model and a Bayesian Vector Autoregressive model. The proposed methodology outperforms all the competing models, with half the error of the RW model (with and without drift) in out-of-sample forecasting. Finally, we argue that this new methodology can be used as an early warning system for forecasting sudden house price drops, with direct policy implications. |
Date: | 2017–07 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:1707.04868&r=big |
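The hybrid methodology above follows a decompose-forecast-recombine pattern: split the series into components, forecast each separately, then sum the component forecasts. The sketch below illustrates only that pattern; a moving-average trend/residual split stands in for EEMD and a one-lag least-squares fit stands in for SVR, both deliberate simplifications of the paper's method:

```python
# Hedged sketch of the decompose-forecast-recombine idea behind the
# EEMD + SVR hybrid. The decomposition and the component forecaster here
# are toy stand-ins, not the paper's actual EEMD or SVR.

def decompose(series, window=3):
    """Split a series into a smooth trend and a residual component."""
    trend = []
    for i in range(len(series)):
        lo, hi = max(0, i - window + 1), i + 1
        trend.append(sum(series[lo:hi]) / (hi - lo))  # trailing moving average
    resid = [x - t for x, t in zip(series, trend)]
    return trend, resid

def ar1_forecast(component):
    """One-step forecast from a least-squares fit of x[t] on x[t-1]."""
    xs, ys = component[:-1], component[1:]
    denom = sum(x * x for x in xs)
    beta = sum(x * y for x, y in zip(xs, ys)) / denom if denom else 0.0
    return beta * component[-1]

def hybrid_forecast(series):
    trend, resid = decompose(series)
    # Forecast each component separately, then sum -- the core of the hybrid.
    return ar1_forecast(trend) + ar1_forecast(resid)

prices = [100, 102, 101, 104, 106, 105, 108]
print(round(hybrid_forecast(prices), 2))
```

Replacing the stand-ins with EEMD (one forecast per intrinsic mode function) and SVR recovers the structure the paper benchmarks against the random walk.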
By: | Matthew F Dixon |
Abstract: | Recurrent neural networks (RNNs) are a type of artificial neural network (ANN) well suited to forecasting and sequence classification. They have been applied extensively to forecasting univariate financial time series; however, their application to high-frequency trading has not previously been considered. This paper solves a sequence classification problem in which a short sequence of observations of limit order book depths and market orders is used to predict a next-event price flip. The capability to adjust quotes according to this prediction reduces the likelihood of adverse price selection. Our results demonstrate the ability of the RNN to capture the non-linear relationship between near-term price flips and a spatio-temporal representation of the limit order book. The RNN compares favorably with other classifiers, including a linear Kalman filter, on S&P 500 E-mini futures level II data over the month of August 2016. Further results assess the effect of retraining the RNN daily and the sensitivity of performance to trade latency. |
Date: | 2017–07 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:1707.05642&r=big |
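The sequence classification setup above pairs a short window of order book snapshots with the direction of the next mid-price move. A minimal sketch of that data preparation step follows; the snapshots, the two-level depth representation and the flip rule are illustrative assumptions, not the paper's exact specification:

```python
# Hedged sketch: building the short input sequences and price-flip labels
# that a sequence classifier such as the paper's RNN would consume.

def make_sequences(mid_prices, depths, seq_len=3):
    """Pair each window of `seq_len` depth snapshots with a label:
    +1 for an up-flip, -1 for a down-flip, 0 for no change in the next mid."""
    data = []
    for t in range(seq_len, len(mid_prices)):
        window = depths[t - seq_len:t]            # spatio-temporal input
        move = mid_prices[t] - mid_prices[t - 1]  # the next event
        label = (move > 0) - (move < 0)           # sign encoded as -1/0/+1
        data.append((window, label))
    return data

mids = [10.0, 10.0, 10.5, 10.5, 10.0]
depth = [[5, 7], [6, 6], [4, 8], [3, 9], [5, 5]]  # [bid depth, ask depth]
for window, label in make_sequences(mids, depth):
    print(label, window)
```

Each (window, label) pair is one training example; a market maker predicting the label ahead of the event can adjust quotes and avoid adverse selection.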
By: | Nalan Basturk (Maastricht University & RCEA); Stefano Grassi (University of Rome “Tor Vergata”); Lennart Hoogerheide (Vrije Universiteit Amsterdam & Tinbergen Institute); Anne Opschoor (Vrije Universiteit Amsterdam & Tinbergen Institute); Herman K. van Dijk (Erasmus University Rotterdam, Norges Bank (Central Bank of Norway) & Tinbergen Institute & RCEA) |
Abstract: | This paper presents the R package MitISEM (mixture of t by importance sampling weighted expectation maximization), which provides an automatic and flexible two-stage method to approximate a non-elliptical target density kernel – typically a posterior density kernel – using an adaptive mixture of Student-t densities as the approximating density. In the first stage a mixture of Student-t densities is fitted to the target using an expectation maximization algorithm in which each step of the optimization procedure is weighted using importance sampling. In the second stage this mixture density serves as a candidate density for efficient and robust application of importance sampling or the Metropolis-Hastings (MH) method to estimate properties of the target distribution. The package enables Bayesian inference and prediction on model parameters and probabilities, in particular for models whose densities have multi-modal or other non-elliptical shapes such as curved ridges. These shapes occur in research topics in several scientific fields: for instance, the analysis of DNA data in bio-informatics, loan access for heterogeneous groups in financial economics, and the analysis of education's effect on earned income in labor economics. The package MitISEM also provides an extended algorithm, 'sequential MitISEM', which substantially decreases computation time when the target density has to be approximated for increasing data samples. This occurs when the posterior or predictive density is updated with new observations and/or when one computes model probabilities using predictive likelihoods. We illustrate the MitISEM algorithm using three canonical statistical and econometric models that are characterized by several types of non-elliptical posterior shapes and that describe well-known data patterns in econometrics and finance. We show that MH using the candidate density obtained by MitISEM outperforms, in terms of numerical efficiency, MH using a simpler candidate, as well as the Gibbs sampler. The MitISEM approach is also used for Bayesian model comparison using predictive likelihoods. |
Keywords: | Finite mixtures, Student-t densities, importance sampling, MCMC, Metropolis-Hastings algorithm, expectation maximization, Bayesian inference, R software |
Date: | 2017–06–26 |
URL: | http://d.repec.org/n?u=RePEc:bno:worpap:2017_10&r=big |
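The second stage described above uses a fat-tailed Student-t candidate for importance sampling from a non-elliptical target. The sketch below shows only that core mechanism with a single hand-tuned t candidate and an illustrative bimodal target kernel; MitISEM itself fits an adaptive *mixture* of t densities by IS-weighted EM, which this toy omits:

```python
import math
import random

# Hedged sketch of Student-t importance sampling, the second-stage idea
# behind MitISEM. The candidate (a single scaled t density) and the bimodal
# target kernel are illustrative assumptions, not the package's algorithm.

def student_t_pdf(x, df):
    """Density of the standard Student-t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def sample_student_t(df, rng):
    """Draw t = Z / sqrt(V/df) with Z standard normal and V chi-squared(df)."""
    z = rng.gauss(0, 1)
    v = rng.gammavariate(df / 2, 2)  # chi-squared(df) as Gamma(df/2, scale 2)
    return z / math.sqrt(v / df)

def target_kernel(x):
    # Unnormalised, non-elliptical (bimodal) target: Gaussian bumps at -2 and +2.
    return math.exp(-0.5 * (x - 2) ** 2) + math.exp(-0.5 * (x + 2) ** 2)

def is_estimate(h, df=5, scale=4.0, n=20000, seed=42):
    """Self-normalised importance-sampling estimate of E[h(X)] under the target."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        x = scale * sample_student_t(df, rng)                 # fat-tailed candidate draw
        q = student_t_pdf(x / scale, df) / scale              # candidate density at x
        w = target_kernel(x) / q                              # importance weight
        num += w * h(x)
        den += w
    return num / den

print(is_estimate(lambda x: x))       # near 0 by the target's symmetry
print(is_estimate(lambda x: x > 0))   # near 0.5
```

Because the t candidate has heavier tails than the Gaussian bumps, the weights stay bounded; MitISEM improves on this toy by adapting a mixture candidate to the target's actual shape before sampling.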
By: | Takaya Fukui (Mizuho Securities Co., Ltd.); Akihiko Takahashi (Faculty of Economics, University of Tokyo) |
URL: | http://d.repec.org/n?u=RePEc:tky:jseres:2017cj287&r=big |