nep-big 2020-01-06 papers

on Big Data

Issue of 2020‒01‒06
24 papers chosen by
Tom Coupé
University of Canterbury

Wage Indexation and Jobs. A Machine Learning Approach By Gert Bijnens; Shyngys Karimov; Jozef Konings
Using Massive Online Choice Experiments to Measure Changes in Well-being By Brynjolfsson, Erik; Collis, Avinash; Eggers, Felix
Using Machine Learning to Detect and Forecast Accounting Fraud By KONDO Satoshi; MIYAKAWA Daisuke; SHIRAKI Kengo; SUGA Miki; USUKI Teppei
A Robust Predictive Model for Stock Price Prediction Using Deep Learning and Natural Language Processing By Sidra Mehtab; Jaydip Sen
Forecasting significant stock price changes using neural networks By Firuz Kamalov
How data analytics drive sharing economy business models? By Soraya SEDKAOUI; Rafika Benaichouba
Role of Energy use in the Prediction of CO2 Emissions and Growth in India: An Application of Artificial Neural Networks (ANN) By K, Ashin Nishan M; ASHIQ, MUHAMMED V
Intuitive Beliefs By Jawwad Noor
The Impact of Local Taxes and Public Services on Property Values By Grodecka, Anna; Hull, Isaiah
Shocks to Supply Chain Networks and Firm Dynamics: An Application of Double Machine Learning By MIYAKAWA Daisuke
INTRODUCING A NEW TECHNICAL INDICATOR BASED ON OCTAV ONICESCU INFORMATIONAL ENERGY AND COMPARE IT WITH BOLLINGER BANDS FOR S&P 500 MOVEMENT PREDICTIONS By Alexandru, Daia
Double debiased machine learning nonparametric inference with continuous treatments By Kyle Colangelo; Ying-Ying Lee
Explanation, prediction, and causality: Three sides of the same coin? By Watts, Duncan J; Beck, Emorie D; Bienenstock, Elisa Jayne; Bowers, Jake; Frank, Aaron; Grubesic, Anthony; Hofman, Jake; Rohrer, Julia Marie; Salganik, Matthew
Older Workers Need Not Apply? Ageist Language in Job Ads and Age Discrimination in Hiring By Ian Burn; Patrick Button; Luis Felipe Munguia Corella; David Neumark
Book Review: Donald Kettl, Little Bites of Big Data for Public Policy By Li, Huafang
EXPERIMENTED KINETIC ENERGY AS FEATURES FOR NATURAL LANGUAGE CLASSIFICATION By Alexandru, Daia
AI-readiness for circular economy_Prospects and challenges By Ho, Tung Manh
Housing Prices and Property Descriptions: Using Soft Information to Value Real Assets By Lily Shen; Stephen L. Ross
Towards a general large sample theory for regularized estimators By Michael Jansson; Demian Pouzo
Using Machine Learning to Target Treatment: The Case of Household Energy Use By Christopher R. Knittel; Samuel Stolper
Generative Synthesis of Insurance Datasets By Kevin Kuo
Alpha Discovery Neural Network based on Prior Knowledge By Jie Fang; Zhikang Xia; Xiang Liu; Shutao Xia; Yong Jiang; Jianwu Lin
Minimax Semiparametric Learning With Approximate Sparsity By Jelena Bradic; Victor Chernozhukov; Whitney K. Newey; Yinchu Zhu
"Don't know" Tells: Calculating Non-Response Bias in Firms' Inflation Expectations Using Machine Learning Techniques By Yosuke Uno; Ko Adachi

Wage Indexation and Jobs. A Machine Learning Approach

By:	Gert Bijnens; Shyngys Karimov; Jozef Konings
Abstract:	In 2015 Belgium suspended the automatic wage indexation for a period of 12 months in order to boost competitiveness and increase employment. This paper uses a novel, machine-learning-based approach to construct a counterfactual experiment. This artificial counterfactual allows us to analyze the employment impact of suspending the indexation mechanism. We find a positive impact on employment of 0.5 percent, which corresponds to a labor demand elasticity of -0.25. This effect is more pronounced for manufacturing firms, where the impact on employment can reach 2 percent, which corresponds to a labor demand elasticity of -1.
Keywords:	labor demand, wage elasticity, counterfactual analysis, artificial control, machine learning
Date:	2019–11–27
URL:	http://d.repec.org/n?u=RePEc:ete:ceswps:643831&r=all
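
The two headline numbers can be cross-checked with the usual definition of a labor demand elasticity, elasticity = %change in employment / %change in wage. The abstract does not report the wage change itself, so the sketch below backs it out from the reported employment effect and elasticity. It is a back-of-envelope consistency check under that constant-elasticity assumption, not the authors' estimation.

```python
def implied_wage_change(pct_employment: float, elasticity: float) -> float:
    """Percent wage change implied by an employment change and a labor demand elasticity."""
    # elasticity = %dL / %dw  =>  %dw = %dL / elasticity
    return pct_employment / elasticity


def labor_demand_elasticity(pct_employment: float, pct_wage: float) -> float:
    """Own-wage labor demand elasticity from two percent changes."""
    return pct_employment / pct_wage


# All-firm result from the abstract: +0.5% employment, elasticity -0.25.
wage_change = implied_wage_change(0.5, -0.25)
print(f"implied wage change: {wage_change:+.1f}%")

# Manufacturing: +2% employment against the same implied wage change.
print(f"manufacturing elasticity: {labor_demand_elasticity(2.0, wage_change):.2f}")
```

Read this way, both pairs of figures are consistent with the same implied relative wage reduction of about 2 percent.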

Using Massive Online Choice Experiments to Measure Changes in Well-being

By:	Brynjolfsson, Erik; Collis, Avinash; Eggers, Felix
Abstract:	GDP and derived metrics such as productivity have been central to our understanding of economic progress and well-being. In principle, changes in consumer surplus provide a superior, and more direct, measure of changes in well-being, especially for digital goods. In practice, these alternatives have been difficult to quantify. We explore the potential of massive online choice experiments to measure consumer surplus. We illustrate this technique via several empirical examples which quantify the valuations of popular digital goods and categories. Our examples include incentive-compatible discrete choice experiments where online and lab participants receive monetary compensation if and only if they forgo goods for pre-defined periods. For example, the median user needed a compensation of about $48 to forgo Facebook for one month. Our overall analyses reveal that digital goods have created large gains in well-being that are not reflected in conventional measures of GDP and productivity. By periodically querying a large, representative sample of goods and services, including those which are not priced in existing markets, online choice experiments can track changes in consumer surplus and other new measures of well-being; these measures have the potential to provide cost-effective supplements to the existing National Income and Product Accounts.
Date:	2019–04–09
URL:	http://d.repec.org/n?u=RePEc:osf:osfxxx:akqhn&r=all
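
The $48 figure is a median willingness to accept (WTA). One simple way to read such a number off a single-offer choice experiment is to tabulate, for each randomly assigned offer, the share of participants who accept it and give up the good, and then find the offer at which that share reaches one half. The sketch below does this with linear interpolation on made-up responses; the paper's own estimation procedure may differ (for example, a parametric choice model).

```python
from collections import defaultdict


def acceptance_curve(offers, accepted):
    """Share of participants accepting (i.e. forgoing the good) at each offer level."""
    tally = defaultdict(lambda: [0, 0])  # offer -> [number accepting, number asked]
    for offer, acc in zip(offers, accepted):
        tally[offer][0] += int(acc)
        tally[offer][1] += 1
    return sorted((offer, n_acc / n) for offer, (n_acc, n) in tally.items())


def median_wta(curve):
    """Offer at which the acceptance share first reaches 0.5, by linear interpolation."""
    for (p0, s0), (p1, s1) in zip(curve, curve[1:]):
        if s0 < 0.5 <= s1:
            return p0 + (0.5 - s0) * (p1 - p0) / (s1 - s0)
    raise ValueError("acceptance share does not cross 0.5 within the offered range")


# Hypothetical responses: four participants per offer level (US dollars per month).
offers = [10] * 4 + [30] * 4 + [50] * 4 + [70] * 4
accepted = [0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1]
print(median_wta(acceptance_curve(offers, accepted)))  # 40.0
```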

Using Machine Learning to Detect and Forecast Accounting Fraud

By:	KONDO Satoshi; MIYAKAWA Daisuke; SHIRAKI Kengo; SUGA Miki; USUKI Teppei
Abstract:	This study investigates the usefulness of machine learning methods for detecting and forecasting accounting fraud. First, we aim to "detect" accounting fraud and confirm an improvement in detection performance. We achieve this by using machine learning, which can exploit a high-dimensional feature space, in contrast to a classical parametric model based on a limited set of explanatory variables. Second, we aim to "forecast" accounting fraud using the same approach. This area has received little attention in the past, yet we confirm solid forecast performance. Third, we interpret the model by examining how the estimated score changes with respect to a change in each predictor. The validation is done on publicly listed companies in Japan, and we confirm that the machine learning method improves model performance, and that the richer interactions among predictors that machine learning makes possible contribute to a large improvement in prediction.
Date:	2019–12
URL:	http://d.repec.org/n?u=RePEc:eti:dpaper:19103&r=all
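
Examining how the estimated score changes with respect to a change in each predictor is, in spirit, a partial dependence analysis. A model-agnostic version fixes one feature at each value on a grid, scores every firm with that substitution, and averages. The sketch assumes any fitted model exposed as a function from a feature row to a score; the toy scoring function below is hypothetical and stands in for a trained classifier.

```python
from statistics import mean
from typing import Callable, List, Sequence


def partial_dependence(
    score: Callable[[Sequence[float]], float],
    rows: List[List[float]],
    feature: int,
    grid: Sequence[float],
) -> List[float]:
    """Average model score when `feature` is set to each grid value in every row."""
    curve = []
    for value in grid:
        scores = []
        for row in rows:
            modified = list(row)  # copy so the original data are untouched
            modified[feature] = value
            scores.append(score(modified))
        curve.append(mean(scores))
    return curve


def toy_fraud_score(row: Sequence[float]) -> float:
    """Hypothetical score with an interaction between features 0 and 1."""
    return 0.2 * row[0] + 0.5 * row[0] * row[1]


firms = [[0.0, 1.0], [0.0, 0.0], [0.0, 2.0]]
print(partial_dependence(toy_fraud_score, firms, feature=0, grid=[0.0, 1.0, 2.0]))
```

Because feature 1 averages to 1 across these toy firms, the curve rises by about 0.7 per unit of feature 0; comparing such curves across subgroups is one way to see interaction effects.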

A Robust Predictive Model for Stock Price Prediction Using Deep Learning and Natural Language Processing

By:	Sidra Mehtab; Jaydip Sen
Abstract:	Prediction of the future movement of stock prices has been the subject of much research. There is a large literature on technical analysis of stock prices, where the objective is to identify patterns in stock price movements and profit from them. Improving prediction accuracy remains the single biggest challenge in this area of research. We propose a hybrid approach for stock price movement prediction using machine learning, deep learning, and natural language processing. We select the NIFTY 50 index values of the National Stock Exchange of India and collect its daily price movement over a period of three years (2015 to 2017). Based on the data from 2015 to 2017, we build various predictive models using machine learning, and then use those models to predict the closing value of NIFTY 50 for the period from January 2018 to June 2019 with a prediction horizon of one week. For predicting the price movement patterns, we use a number of classification techniques, while for predicting the actual closing price of the stock, various regression models are used. We also build a Long Short-Term Memory (LSTM) based deep learning network for predicting the closing price of the stocks and compare the prediction accuracies of the machine learning models with that of the LSTM model. We further augment the predictive model by integrating a sentiment analysis module on Twitter data to correlate the public sentiment on stock prices with the market sentiment. This is done using Twitter sentiment and the previous week's closing values to predict stock price movement for the next week. We tested our proposed scheme using a cross-validation method based on Self-Organizing Fuzzy Neural Networks and found extremely interesting results.
Date:	2019–12
URL:	http://d.repec.org/n?u=RePEc:arx:papers:1912.07700&r=all
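
The forecasting setup described above (this week's closing value and Twitter sentiment as inputs, next week's direction as the target, models fitted on earlier data and evaluated on a later period) can be made concrete with a small dataset builder. The prices, sentiment scores, and cutoff below are hypothetical, and the sketch covers only data preparation, not the authors' models.

```python
from datetime import date


def weekly_rows(weeks, closes, sentiment):
    """(week, features, label) rows: this week's close and sentiment -> next week up (1) or not (0)."""
    rows = []
    for t in range(len(closes) - 1):
        features = (closes[t], sentiment[t])
        label = 1 if closes[t + 1] > closes[t] else 0
        rows.append((weeks[t], features, label))
    return rows


def chronological_split(rows, cutoff):
    """Train on weeks before `cutoff`, test on later weeks; no shuffling across time."""
    train = [r for r in rows if r[0] < cutoff]
    test = [r for r in rows if r[0] >= cutoff]
    return train, test


weeks = [date(2017, 12, 11), date(2017, 12, 18), date(2017, 12, 25),
         date(2018, 1, 1), date(2018, 1, 8)]
closes = [10300.0, 10350.0, 10500.0, 10440.0, 10600.0]  # hypothetical index levels
sentiment = [0.1, 0.3, -0.2, 0.4, 0.0]                  # hypothetical weekly tweet scores
train, test = chronological_split(weekly_rows(weeks, closes, sentiment), date(2018, 1, 1))
print(len(train), len(test))
```

Splitting by date rather than at random keeps later observations out of the training rows; note that the last training row's label still uses the first test week's close, which a stricter setup could drop.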

Forecasting significant stock price changes using neural networks

By:	Firuz Kamalov
Abstract:	Stock price prediction is a rich research topic that has attracted interest from various areas of science. The recent success of machine learning in speech and image recognition has prompted researchers to apply these methods to asset price prediction. The majority of the literature has been devoted to predicting either the actual asset price or the direction of price movement. In this paper, we study the hitherto little-explored question of predicting significant changes in stock price based on previous changes using machine learning algorithms. We are particularly interested in the performance of neural network classifiers in this context. To this end, we construct and test three neural network models: a multi-layer perceptron, a convolutional net, and a long short-term memory net. As benchmark models, we use random forest and relative strength index methods. The models are tested using 10-year daily stock price data of four major US public companies. Test results show that significant changes in stock price can be predicted with a high degree of accuracy. In particular, we obtain substantially better results than similar studies that forecast the direction of price change.
Date:	2019–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:1912.08791&r=all
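
The relative strength index used as a benchmark compares average gains with average losses over a lookback window. The sketch below uses a simple-average variant (Wilder's original smooths the averages exponentially) and pairs it with one hypothetical definition of a "significant change": an absolute daily return above a fixed threshold. The abstract does not state the paper's threshold, so the 2 percent value is an assumption.

```python
def rsi(closes, n=14):
    """Relative strength index with simple n-period averages of gains and losses."""
    values = []
    for t in range(n, len(closes)):
        changes = [closes[i] - closes[i - 1] for i in range(t - n + 1, t + 1)]
        avg_gain = sum(c for c in changes if c > 0) / n
        avg_loss = sum(-c for c in changes if c < 0) / n
        if avg_loss == 0:
            values.append(100.0)  # no losses in the window
        else:
            values.append(100.0 - 100.0 / (1.0 + avg_gain / avg_loss))
    return values


def significant_moves(closes, threshold=0.02):
    """1 if the absolute daily return exceeds `threshold` (hypothetical definition), else 0."""
    return [int(abs(closes[t] / closes[t - 1] - 1.0) > threshold) for t in range(1, len(closes))]


print(rsi([10, 11, 10, 11], n=2))               # [50.0, 50.0]
print(significant_moves([100, 101, 104, 103]))  # [0, 1, 0]
```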

How data analytics drive sharing economy business models?

By:	Soraya SEDKAOUI (Faculty of Economics, University of Khemis Miliana, Algeria); Rafika Benaichouba (Faculty of Economics, University of Khemis Miliana)
Abstract:	Several studies and reports published by McKinsey, Gartner, Cisco, PwC, etc., confirm that data analytics offers companies more value and enables them to create new and innovative ideas. This is why the data-driven approach has received considerable publicity in recent years. This approach has given rise to many business models, each of which has created its own way of doing things. This is the case for many emerging business models that have noticed that several assets (goods or services) are not exploited effectively by the parties that hold them. We buy many products that we use only for a certain period and then put aside. What if we could find one or more people who might need them? This is the question that these innovative business models have taken into account. They could see potential monetary benefits in these different resources, simply by facilitating their sharing. Some succeed by disrupting value chains and shaking up established players: Uber for taxis, BlaBlaCar for interurban carpooling, Airbnb for accommodation, etc., and this is, of course, just the beginning, because the trend is accelerating. These are fascinating ideas that have led to the emergence of the sharing economy. But one thing is clear: the ideas created by Uber, Airbnb, BlaBlaCar, etc. cannot be realized without determining what enables their development (how?) and, of course, the target (for whom?). These companies use data to determine what to develop and whom to target, in order to create untapped sharing market opportunities. Many researchers have recognized the potential of the large amounts of data produced and collected by sharing platforms.
The analysis of these data not only helps to improve the performance of these models and operationalize their activities, but also helps to predict economic outcomes such as inflation, unemployment, housing prices, etc. All sharing platforms and applications rely on data and analysis to develop practices and determine whom to target. These data are increasingly used today because of the conjunction of a number of factors, such as: the constant decrease in data storage costs; the increase in computing power; and the production of large amounts of data, which are largely unstructured, require different processing techniques, and cannot be processed by traditional methods. Being able to generate value in the context of the sharing economy, and to make big data more profitable, depends on companies' ability to analyze the available data. The challenge, therefore, lies in the ability to extract value from the volume of data produced in continuous real-time streams, in multiple forms and from multiple sources. In other words, the key to exploring data and uncovering its secrets is to find and develop practical ways to extract knowledge that can guide business strategy. Indeed, recent years have been marked by the use of very advanced methods and computer tools previously reserved for large companies. This has opened up many ways of creating innovative ideas. Therefore, this paper answers the following research question: How do sharing economy companies use data and advanced analytics to boost their business models? Through this question, we recall the context of big data and analytics, their importance in the sharing economy, their challenges, and the role they play together in creating new opportunities for sharing economy companies. Through this paper, we will see how sharing economy business models use data analytics to generate value.
Keywords:	Data analytics, big data, sharing economy, platforms, business model, innovation
Date:	2019–10
URL:	http://d.repec.org/n?u=RePEc:sek:iacpro:9911754&r=all

Role of Energy use in the Prediction of CO2 Emissions and Growth in India: An Application of Artificial Neural Networks (ANN)

By:	K, Ashin Nishan M; ASHIQ, MUHAMMED V
Abstract:	The relationship among energy use, carbon dioxide emissions and growth is a matter of discussion among policymakers, economists and researchers. The concept of sustainable development has undeniably motivated inquiry into this area. The primary aim of this work is to develop and apply machine learning techniques to predict carbon dioxide emissions and growth, taking energy use as the input variables. Our findings suggest that the prediction accuracy for CO2 emissions and growth can be improved by using machine learning techniques. In this case, prediction using Adam optimisation outperforms Stochastic Gradient Descent (SGD) for both carbon dioxide emissions and growth. Further, the results highlight that a shift from fossil fuel use to renewable energy use is a possible way to reduce carbon dioxide emissions without sacrificing economic growth.
Date:	2019–12–08
URL:	http://d.repec.org/n?u=RePEc:osf:socarx:gkpbu&r=all
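
The comparison between Adam and SGD is a comparison of update rules. The sketch below fits a one-variable linear model to hypothetical noise-free data with both rules: plain gradient steps, and Adam's bias-corrected running averages of the gradient and squared gradient. The beta and epsilon values are the defaults proposed by Kingma and Ba; the learning rates are illustrative. Full-batch gradients are used so the example is deterministic (SGD applies the same plain rule to mini-batch gradients). It illustrates the mechanics only and does not reproduce the paper's result.

```python
import math

# Hypothetical noise-free data generated by y = 2x + 1 on centred inputs.
XS = [-1.0, -0.5, 0.0, 0.5, 1.0]
YS = [2.0 * x + 1.0 for x in XS]


def mse_grad(w, b):
    """Gradient of the mean squared error of y_hat = w * x + b."""
    errors = [w * x + b - y for x, y in zip(XS, YS)]
    n = len(XS)
    grad_w = 2.0 * sum(e * x for e, x in zip(errors, XS)) / n
    grad_b = 2.0 * sum(errors) / n
    return grad_w, grad_b


def fit_gradient_descent(lr=0.1, steps=3000):
    """Plain gradient steps: theta <- theta - lr * gradient."""
    w = b = 0.0
    for _ in range(steps):
        gw, gb = mse_grad(w, b)
        w, b = w - lr * gw, b - lr * gb
    return w, b


def fit_adam(lr=0.01, steps=3000, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: per-parameter steps scaled by bias-corrected moment estimates."""
    theta = [0.0, 0.0]  # [w, b]
    m = [0.0, 0.0]      # running mean of gradients
    v = [0.0, 0.0]      # running mean of squared gradients
    for t in range(1, steps + 1):
        grads = mse_grad(*theta)
        for i, g in enumerate(grads):
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g * g
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return tuple(theta)


print("gradient descent:", fit_gradient_descent())
print("adam:", fit_adam())
```

Adam's steps are normalised per parameter, so its early progress depends little on the scale of the gradient, whereas plain gradient steps scale with it; which rule does better on a given problem depends on the learning rates and the data.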

Intuitive Beliefs

By:	Jawwad Noor (Department of Economics, Boston University)
Abstract:	Beliefs are intuitive if they rely on associative memory, which can be described as a network of associations between events. A belief-theoretic characterization of the model is provided, its uniqueness properties are established, and the intersection with the Bayesian model is characterized. The formation of intuitive beliefs is modelled after machine learning, whereby the network is shaped by past experience via minimization of the difference from an objective probability distribution. The model is shown to accommodate correlation misperception, the conjunction fallacy, base-rate neglect/conservatism, etc.
Keywords:	Beliefs, Intuition, Associative memory, Boltzmann machine, Energy-Based Neural Networks, Non-Bayesian updating
JEL:	C45 D01 D90
Date:	2019–12
URL:	http://d.repec.org/n?u=RePEc:cwl:cwldpp:2216&r=all
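
The keywords point to a Boltzmann machine, an energy-based network in which associations between binary events are pairwise weights. As a generic illustration (not the paper's characterization), the sketch fits a fully visible Boltzmann machine over three hypothetical binary events to a hypothetical objective distribution. It uses gradient ascent on the expected log-likelihood, which is equivalent to minimising the KL divergence from the objective distribution; the gradient is the gap between objective and model moments.

```python
import itertools
import math

STATES = list(itertools.product([0, 1], repeat=3))  # joint outcomes of three binary events
PAIRS = [(0, 1), (0, 2), (1, 2)]                    # pairwise associations


def model_distribution(bias, weight):
    """p(x) proportional to exp(sum_i bias_i x_i + sum_{i<j} weight_ij x_i x_j)."""
    scores = []
    for x in STATES:
        energy = -sum(bias[i] * x[i] for i in range(3))
        energy -= sum(w * x[i] * x[j] for w, (i, j) in zip(weight, PAIRS))
        scores.append(math.exp(-energy))
    z = sum(scores)
    return [s / z for s in scores]


def moments(p):
    """Marginal probabilities P(x_i = 1) followed by pairwise co-occurrence probabilities."""
    first = [sum(pk * x[i] for pk, x in zip(p, STATES)) for i in range(3)]
    second = [sum(pk * x[i] * x[j] for pk, x in zip(p, STATES)) for i, j in PAIRS]
    return first + second


def fit(target, lr=1.0, steps=10000):
    """Gradient ascent on sum_x target(x) log p(x): move parameters by (target - model) moments."""
    bias, weight = [0.0] * 3, [0.0] * 3
    goal = moments(target)
    for _ in range(steps):
        gap = [g - m for g, m in zip(goal, moments(model_distribution(bias, weight)))]
        bias = [b + lr * d for b, d in zip(bias, gap[:3])]
        weight = [w + lr * d for w, d in zip(weight, gap[3:])]
    return bias, weight


# Hypothetical objective probabilities over the 8 states, ordered as in STATES.
TARGET = [0.30, 0.05, 0.05, 0.10, 0.05, 0.10, 0.10, 0.25]
bias, weight = fit(TARGET)
belief = model_distribution(bias, weight)
print([round(p, 3) for p in belief])
```

With only pairwise weights, the fitted beliefs match the objective marginal and pairwise probabilities but generally not every joint probability, so they can differ systematically from the objective distribution.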

The Impact of Local Taxes and Public Services on Property Values

By:	Grodecka, Anna (Lund University and Knut Wicksell Centre for Financial Studies); Hull, Isaiah (Research Department, Central Bank of Sweden)
Abstract:	Attempts to measure the capitalization of local taxes into property prices, starting with Oates (1969), have suffered from a lack of local public service controls. We revisit this vast literature with a novel dataset of 947 time-varying local characteristic and public service controls for all municipalities in Sweden over the 2010-2016 period. To make use of the high dimensional vector of controls, as well as time and geographic fixed effects, we employ a novel empirical approach that modifies the recently-introduced debiased machine learning estimator by coupling it with a deep-wide neural network. We find that existing estimates of tax capitalization in the literature, including quasi-experimental work, may understate the impact of taxes on house prices by as much as 50%. We also exploit the unique features of our dataset to test core assumptions of the Tiebout hypothesis and to estimate the impact of public services, education, and crime on house prices.
Keywords:	Local Public Goods; Tax Capitalization; Tiebout Hypothesis; Machine Learning; Property Prices
JEL:	C45 C55 H31 H41 R30
Date:	2019–04–01
URL:	http://d.repec.org/n?u=RePEc:hhs:rbnkwp:0374&r=all

Shocks to Supply Chain Networks and Firm Dynamics: An Application of Double Machine Learning

By:	MIYAKAWA Daisuke
Abstract:	We examine the association between changes in supply chain networks and firm dynamics. To determine the causal relationship, first, using data on over a million Japanese firms, we construct machine learning-based prediction models for the three modes of firm exit (i.e., default, voluntary closure, and dissolution) and for firm sales growth. Second, given the high performance of those prediction models, we use the double machine learning method (Chernozhukov et al. 2018) to determine causal relationships running from the changes in supply chain networks to those indexes of firm dynamics. The estimated nuisance parameters suggest, first, that an increase in global and local centrality indexes results in a lower probability of exit. Second, higher meso-scale centrality leads to a higher probability of exit. Third, we also confirm the positive association of global and local centrality indexes with sales growth, as well as the negative association of a meso-scale centrality index with sales growth. Fourth, somewhat surprisingly, we find that an increase in one type of local centrality index shows a negative association with sales growth. These results reconfirm, as a causal relationship, the previously reported correlation between the centrality of firms in supply chain networks and firm dynamics, and further show the unique role of centralities measured in local and medium-sized clusters.
Date:	2019–12
URL:	http://d.repec.org/n?u=RePEc:eti:dpaper:19100&r=all
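
The double machine learning method cited in the abstract (Chernozhukov et al. 2018) can be illustrated in its simplest, partially linear form, Y = theta * D + g(X) + U: predict both the outcome and the treatment from the controls with a flexible learner, using sample splitting (cross-fitting), then regress outcome residuals on treatment residuals. The sketch below uses a binned-mean regression as a stand-in learner and simulated data; the learner, the data-generating process and theta = 0.5 are hypothetical and much simpler than the firm-level application.

```python
import math
import random


def fit_binned_mean(xs, ys, lo=-2.0, hi=2.0, nbins=20):
    """Regressogram learner: predict the mean outcome of training points in the same x-bin."""
    def bin_of(x):
        return min(nbins - 1, max(0, int((x - lo) / (hi - lo) * nbins)))

    sums, counts = [0.0] * nbins, [0] * nbins
    for x, y in zip(xs, ys):
        k = bin_of(x)
        sums[k] += y
        counts[k] += 1
    overall = sum(ys) / len(ys)
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[bin_of(x)]


def dml_partially_linear(xs, ds, ys, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of theta in Y = theta * D + g(X) + U."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    num = den = 0.0
    for k, held_out in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        outcome_model = fit_binned_mean([xs[i] for i in train], [ys[i] for i in train])
        treatment_model = fit_binned_mean([xs[i] for i in train], [ds[i] for i in train])
        for i in held_out:
            v = ds[i] - treatment_model(xs[i])  # treatment residual
            u = ys[i] - outcome_model(xs[i])    # outcome residual
            num += v * u
            den += v * v
    return num / den


# Simulated data: X confounds D and Y; the true effect of D on Y is 0.5.
rng = random.Random(42)
n = 2000
xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
ds = [x + rng.gauss(0.0, 1.0) for x in xs]
ys = [0.5 * d + 2.0 * math.sin(x) + rng.gauss(0.0, 1.0) for d, x in zip(ds, xs)]
print(round(dml_partially_linear(xs, ds, ys), 3))
```

In this simulation a naive regression of Y on D would be biased upward, because X raises both D and Y over most of its range; the residual-on-residual step removes that confounding.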

INTRODUCING A NEW TECHNICAL INDICATOR BASED ON OCTAV ONICESCU INFORMATIONAL ENERGY AND COMPARE IT WITH BOLLINGER BANDS FOR S&P 500 MOVEMENT PREDICTIONS

By:	Alexandru, Daia
Abstract:	This research paper introduces kinetic bands, based on the Romanian mathematician and statistician Octav Onicescu’s kinetic energy, also known as “informational energy”, where we use historical data on foreign exchange currencies or indexes to predict the trend displayed by a stock or an index and whether it will go up or down in the future. Here, we explore the imperfections of Bollinger Bands to determine a more sophisticated triplet of indicators that predict the future movement of prices in the stock market. An Extreme Gradient Boosting (XGBoost) model was fitted in Python using a historical dataset from Kaggle spanning all 500 companies currently listed in the index. A feature-importance plot was produced. The results show that kinetic bands, derived from kinetic energy (KE), are highly influential as features, or technical indicators, of stock market trends. Furthermore, experiments with this invention provide tangible empirical evidence for it. The machine learning code has a low chance of error if all the proper procedures are followed. The experiment samples are attached to this study for future reference or scrutiny.
Date:	2019–06–20
URL:	http://d.repec.org/n?u=RePEc:osf:osfxxx:m478b&r=all
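
Both ingredients named in the abstract are short computations. Bollinger Bands are a moving average plus and minus a multiple of the moving standard deviation (conventionally 20 periods and 2 standard deviations), and Onicescu's informational energy of a discrete distribution is the sum of squared probabilities, E = sum of p_i^2. The abstract does not say how the kinetic bands are built from E, so the sketch below only computes the energy of an equal-width histogram of values (the bin count is a hypothetical choice) next to the standard bands.

```python
from statistics import mean, pstdev


def bollinger_bands(closes, n=20, k=2.0):
    """(lower, middle, upper) for each full window: SMA -/+ k population standard deviations."""
    bands = []
    for t in range(n - 1, len(closes)):
        window = closes[t - n + 1 : t + 1]
        mid, sd = mean(window), pstdev(window)
        bands.append((mid - k * sd, mid, mid + k * sd))
    return bands


def informational_energy(values, bins=10):
    """Onicescu informational energy sum(p_i ** 2) of an equal-width histogram of `values`."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 1.0  # all mass in a single bin
    counts = [0] * bins
    for v in values:
        counts[min(bins - 1, int((v - lo) / (hi - lo) * bins))] += 1
    n = len(values)
    return sum((c / n) ** 2 for c in counts)


print(bollinger_bands([1.0] * 20))            # [(1.0, 1.0, 1.0)]
print(informational_energy([5.0, 5.0, 5.0]))  # 1.0, maximal concentration
print(informational_energy(list(range(10))))  # about 0.1, spread evenly over 10 bins
```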

Double debiased machine learning nonparametric inference with continuous treatments

By:	Kyle Colangelo (Institute for Fiscal Studies); Ying-Ying Lee (Institute for Fiscal Studies)
Abstract:	We propose a nonparametric inference method for causal effects of continuous treatment variables, under unconfoundedness and in the presence of high-dimensional or nonparametric nuisance parameters. Our simple kernel-based double debiased machine learning (DML) estimators for the average dose-response function (or the average structural function) and the partial effects are asymptotically normal with a nonparametric convergence rate. The nuisance estimators for the conditional expectation function and the generalized propensity score can be nonparametric kernel or series estimators or ML methods. Using doubly robust influence function and cross-fitting, we give tractable primitive conditions under which the nuisance estimators do not affect the first-order large sample distribution of the DML estimators.
Date:	2019–10–21
URL:	http://d.repec.org/n?u=RePEc:ifs:cemmap:54/19&r=all
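
A kernel-based doubly robust estimator of the average dose-response at a treatment level t averages an outcome-model prediction gamma(t, X) plus a correction that kernel-weights observations with T near t and divides by the generalized propensity score f(t | X). The sketch below shows that structure on simulated data, plugging in a known propensity density and a deliberately misspecified outcome model instead of the ML nuisance estimates and cross-fitting the paper uses; the data-generating process, bandwidth and Gaussian kernel are hypothetical choices.

```python
import math
import random


def normal_pdf(u):
    """Standard normal density, used both as the kernel and as the propensity density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def dr_dose_response(t, ts, xs, ys, outcome_model, propensity, h):
    """Kernel-weighted doubly robust estimate of E[Y(t)].

    outcome_model(t, x): estimate of E[Y | T = t, X = x]
    propensity(t, x):    estimate of the conditional density of T at t given X = x
    h:                   kernel bandwidth
    """
    total = 0.0
    for ti, xi, yi in zip(ts, xs, ys):
        pred = outcome_model(t, xi)
        weight = normal_pdf((ti - t) / h) / h / propensity(t, xi)
        total += pred + weight * (yi - pred)
    return total / len(ts)


# Simulated data: T | X ~ N(X, 1) and Y = T + X + noise, so E[Y(t)] = t + E[X] = t + 1.
rng = random.Random(1)
n = 20000
xs = [rng.uniform(0.0, 2.0) for _ in range(n)]
ts = [x + rng.gauss(0.0, 1.0) for x in xs]
ys = [t + x + rng.gauss(0.0, 0.5) for t, x in zip(ts, xs)]


def wrong_outcome_model(t, x):
    return t  # ignores X on purpose


def true_propensity(t, x):
    return normal_pdf(t - x)


print(round(dr_dose_response(1.0, ts, xs, ys, wrong_outcome_model, true_propensity, h=0.2), 2))
```

The plug-in average of the wrong outcome model alone would give 1.0 at t = 1; the kernel-weighted correction brings the estimate close to the true value of 2.0 because the propensity density is correct, which illustrates double robustness.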

Explanation, prediction, and causality: Three sides of the same coin?

By:	Watts, Duncan J; Beck, Emorie D; Bienenstock, Elisa Jayne (Arizona State University); Bowers, Jake; Frank, Aaron; Grubesic, Anthony; Hofman, Jake; Rohrer, Julia Marie (University of Leipzig); Salganik, Matthew
Abstract:	In this essay we make four interrelated points. First, we reiterate previous arguments (Kleinberg et al 2015) that forecasting problems are more common in social science than is often appreciated. From this observation it follows that social scientists should care about predictive accuracy in addition to unbiased or consistent estimation of causal relationships. Second, we argue that social scientists should be interested in prediction even if they have no interest in forecasting per se. Whether they do so explicitly or not, that is, causal claims necessarily make predictions; thus it is both fair and arguably useful to hold them accountable for the accuracy of the predictions they make. Third, we argue that prediction, used in either of the above two senses, is a useful metric for quantifying progress. Important differences between social science explanations and machine learning algorithms notwithstanding, social scientists can still learn from approaches like the Common Task Framework (CTF) which have successfully driven progress in certain fields of AI over the past 30 years (Donoho, 2015). Finally, we anticipate that as the predictive performance of forecasting models and explanations alike receives more attention, it will become clear that it is subject to some upper limit which lies well below deterministic accuracy for many applications of interest (Martin et al 2016). Characterizing the properties of complex social systems that lead to higher or lower predictive limits therefore poses an interesting challenge for computational social science.
Date:	2018–10–31
URL:	http://d.repec.org/n?u=RePEc:osf:osfxxx:u6vz5&r=all

Older Workers Need Not Apply? Ageist Language in Job Ads and Age Discrimination in Hiring

By:	Ian Burn; Patrick Button; Luis Felipe Munguia Corella; David Neumark
Abstract:	We study the relationships between ageist stereotypes – as reflected in the language used in job ads – and age discrimination in hiring, exploiting the text of job ads and differences in callbacks to older and younger job applicants from a previous resume (correspondence study) field experiment (Neumark, Burn, and Button, 2019). Our analysis uses methods from computational linguistics and machine learning to directly identify, in a field-experiment setting, ageist stereotypes that underlie age discrimination in hiring. We find evidence that language related to stereotypes of older workers sometimes predicts discrimination against older workers. For men, our evidence points most strongly to age stereotypes about physical ability, communication skills, and technology predicting age discrimination, and for women, age stereotypes about communication skills and technology. The method we develop provides a framework for applied researchers analyzing textual data, highlighting the usefulness of various computer science techniques for empirical economics research.
JEL:	J14 J23 J7 J78
Date:	2019–12
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:26552&r=all

Book Review: Donald Kettl, Little Bites of Big Data for Public Policy

By:	Li, Huafang
Abstract:	This is a book review on Don Kettl's book Little Bites of Big Data for Public Policy.
Date:	2019–05–31
URL:	http://d.repec.org/n?u=RePEc:osf:osfxxx:8hy4b&r=all

EXPERIMENTED KINETIC ENERGY AS FEATURES FOR NATURAL LANGUAGE CLASSIFICATION

By:	Alexandru, Daia
Abstract:	This article describes various uses of kinetic energy in Natural Language Processing (NLP) and why NLP could be used in trading, with the potential to be used also in other applications, including psychology and medicine. Kinetic energy, introduced by the great Romanian mathematician Octav Onicescu (1892-1983), enables feature engineering in various domains, including NLP, as we did in this experiment. In addition, we ran a machine learning model called XGBoost to assess feature importance; the features that XGBoost ranked as most important were then used to classify, as a simple illustration for the reader, some authors by their content and type of writing.
Date:	2019–06–20
URL:	http://d.repec.org/n?u=RePEc:osf:osfxxx:drwc6&r=all

AI-readiness for circular economy_Prospects and challenges

By:	Ho, Tung Manh
Abstract:	In this essay, the prospects and challenges of a circular economy (CE) powered by artificial intelligence (AI) will be discussed.
Date:	2019–08–15
URL:	http://d.repec.org/n?u=RePEc:osf:osfxxx:s4jpz&r=all

Housing Prices and Property Descriptions: Using Soft Information to Value Real Assets

By:	Lily Shen (Clemson University); Stephen L. Ross (University of Connecticut)
Abstract:	Recent research in economics and finance has recognized the potential of utilizing textual “soft” data for valuing heterogeneous assets. This paper employs machine learning to quantify the value of “soft” information contained in real estate property descriptions. Textual descriptions contain information that traditional hedonic attributes cannot capture. A one standard deviation increase in unobserved quality based on our “soft” information leads to a 15% increase in property sale price. Further, annual hedonic house price indices ignoring our measure of unobserved quality overstate real estate prices by 11% to 16% and mistime the recovery of housing prices following the Great Recession.
Keywords:	Natural Language Processing, Unsupervised Machine Learning, Soft Information, Housing Prices, Price Indices, Property Descriptions
JEL:	R31 G12 G14 C45
Date:	2019–12
URL:	http://d.repec.org/n?u=RePEc:uct:uconnp:2019-20&r=all

Towards a general large sample theory for regularized estimators

By:	Michael Jansson (Institute for Fiscal Studies); Demian Pouzo (Institute for Fiscal Studies)
Abstract:	We present a general framework for studying regularized estimators; such estimators are pervasive in estimation problems wherein “plug-in” type estimators are either ill-defined or ill-behaved. Within this framework, we derive, under primitive conditions, consistency and a generalization of the asymptotic linearity property. We also provide data-driven methods for choosing tuning parameters that, under some conditions, achieve the aforementioned properties. We illustrate the scope of our approach by studying a wide range of applications, revisiting known results and deriving new ones.
Date:	2019–11–25
URL:	http://d.repec.org/n?u=RePEc:ifs:cemmap:63/19&r=all

Using Machine Learning to Target Treatment: The Case of Household Energy Use

By:	Christopher R. Knittel; Samuel Stolper
Abstract:	We use causal forests to evaluate the heterogeneous treatment effects (TEs) of repeated behavioral nudges towards household energy conservation. The average response is a monthly electricity reduction of 9 kilowatt-hours (kWh), but the full distribution of responses ranges from -30 to +10 kWh. Selective targeting of treatment using the forest raises social net benefits by 12-120 percent, depending on the year and welfare function. Pre-treatment consumption and home value are the strongest predictors of treatment effect. We find suggestive evidence of a "boomerang effect": households with lower consumption than similar neighbors are the ones with positive TE estimates.
JEL:	C53 D90 Q40
Date:	2019–12
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:26531&r=all
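
Selective targeting turns predicted treatment effects into a treat/do-not-treat rule. A common rule, sketched below, treats a household only when the predicted value of its electricity savings exceeds the cost of treating it. The effect predictions, electricity value and per-household cost are hypothetical, and the paper's welfare functions may weigh these terms differently. The sketch scores the rule on the same numbers it was built from; in practice the evaluation would use held-out or honest estimates.

```python
def target(predicted_te_kwh, value_per_kwh, cost_per_household):
    """Treat where predicted savings are worth more than the treatment cost (negative TE = saving)."""
    return [-te * value_per_kwh > cost_per_household for te in predicted_te_kwh]


def net_benefit(te_kwh, treated, value_per_kwh, cost_per_household):
    """Value of electricity saved by treated households minus the cost of treating them."""
    return sum(-te * value_per_kwh - cost_per_household
               for te, d in zip(te_kwh, treated) if d)


# Hypothetical monthly effects (kWh), within the -30 to +10 range reported in the abstract.
effects = [-30.0, -15.0, -9.0, -2.0, 5.0, 10.0]
VALUE, COST = 0.10, 1.0  # hypothetical $ per kWh and $ per household-month

everyone = [True] * len(effects)
targeted = target(effects, VALUE, COST)
print(targeted)
print(round(net_benefit(effects, everyone, VALUE, COST), 2))
print(round(net_benefit(effects, targeted, VALUE, COST), 2))
```

Households with small or positive effects drag down the net benefit of treating everyone, which is the gap that targeting on predicted effects tries to close.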

Generative Synthesis of Insurance Datasets

By:	Kevin Kuo
Abstract:	One of the impediments in advancing actuarial research and developing open source assets for insurance analytics is the lack of realistic publicly available datasets. In this work, we develop a workflow for synthesizing insurance datasets leveraging state-of-the-art neural network techniques. We evaluate the predictive modeling efficacy of datasets synthesized from publicly available data in the domains of general insurance pricing and life insurance shock lapse modeling. The trained synthesizers are able to capture representative characteristics of the real datasets. This workflow is implemented via an R interface to promote adoption by researchers and data owners.
Date:	2019–12
URL:	http://d.repec.org/n?u=RePEc:arx:papers:1912.02423&r=all

Alpha Discovery Neural Network based on Prior Knowledge

By:	Jie Fang; Zhikang Xia; Xiang Liu; Shutao Xia; Yong Jiang; Jianwu Lin
Abstract:	In automatic feature construction for finance, genetic programming is the state-of-the-art technique. It uses reverse Polish expressions to represent features and then simulates the evolution process. With the development of deep learning, more powerful feature extractors have become available. We believe that understanding the relationship between different feature extractors and the data is the key. In this work, we put prior knowledge into an alpha discovery neural network, combined with different kinds of feature extractors, to perform this task. We find that within the same type of network, a simple network structure can produce more informative features than a sophisticated one, and at a lower training cost. However, complex networks are better at providing more diversified features. In both experiments and a real business environment, fully-connected and recurrent networks are good at extracting information from financial time series, but convolutional network structures cannot effectively extract this information.
Date:	2019–12
URL:	http://d.repec.org/n?u=RePEc:arx:papers:1912.11761&r=all
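
Genetic programming for alpha factors represents each candidate feature as a reverse Polish expression over price and volume series, which the search then mutates and recombines. The sketch below shows only the evaluation step; the operator set, field names and sample data are hypothetical and do not come from the paper.

```python
from typing import Dict, List

Series = List[float]

BINARY_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else float("nan"),  # guard against division by zero
}


def eval_rpn(expr: str, data: Dict[str, Series]) -> Series:
    """Evaluate a reverse Polish alpha formula element-wise over aligned series."""
    length = len(next(iter(data.values())))
    stack: List[Series] = []
    for token in expr.split():
        if token in BINARY_OPS:
            right = stack.pop()
            left = stack.pop()
            op = BINARY_OPS[token]
            stack.append([op(a, b) for a, b in zip(left, right)])
        elif token in data:
            stack.append(list(data[token]))
        else:
            stack.append([float(token)] * length)  # numeric constant, broadcast
    if len(stack) != 1:
        raise ValueError(f"malformed expression: {expr!r}")
    return stack[0]


data = {
    "close": [10.0, 11.0, 12.0],
    "open": [9.0, 11.0, 13.0],
    "volume": [100.0, 200.0, 50.0],
}
print(eval_rpn("close open - volume /", data))  # [0.01, 0.0, -0.02]
```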

Minimax Semiparametric Learning With Approximate Sparsity

By:	Jelena Bradic; Victor Chernozhukov; Whitney K. Newey; Yinchu Zhu
Abstract:	Many objects of interest can be expressed as a linear, mean square continuous functional of a least squares projection (regression). Often the regression may be high dimensional, depending on many variables. This paper gives minimal conditions for root-n consistent and efficient estimation of such objects when the regression and the Riesz representer of the functional are approximately sparse and the sum of the absolute value of the coefficients is bounded. The approximately sparse functions we consider are those where an approximation by some $t$ regressors has root mean square error less than or equal to $Ct^{-\xi}$ for $C,$ $\xi>0.$ We show that a necessary condition for efficient estimation is that the sparse approximation rate $\xi_{1}$ for the regression and the rate $\xi_{2}$ for the Riesz representer satisfy $\max\{\xi_{1} ,\xi_{2}\}>1/2.$ This condition is stronger than the corresponding condition $\xi_{1}+\xi_{2}>1/2$ for Hölder classes of functions. We also show that Lasso based, cross-fit, debiased machine learning estimators are asymptotically efficient under these conditions. In addition we show efficiency of an estimator without cross-fitting when the functional depends on the regressors and the regression sparse approximation rate satisfies $\xi_{1}>1/2$.
Date:	2019–12
URL:	http://d.repec.org/n?u=RePEc:arx:papers:1912.12213&r=all

"Don't know" Tells: Calculating Non-Response Bias in Firms' Inflation Expectations Using Machine Learning Techniques

By:	Yosuke Uno (Bank of Japan); Ko Adachi (Bank of Japan)
Abstract:	This paper examines the "don't know" responses for questions concerning inflation expectations in the Tankan survey. Specifically, using machine learning techniques, we attempt to extract "don't know" responses where respondent firms are more likely to "know" in a sense. We then estimate the counterfactual inflation expectations of such respondents and examine the non-response bias based on the estimation results. Our findings can be summarized as follows. First, there is indeed a fraction of firms that respond "don't know" despite the fact that they seem to "know" something in a sense. Second, the number of such firms, however, is quite small. Third, the estimated counterfactual inflation expectations of such firms are not statistically significantly different from the corresponding official figures in the Tankan survey. Fourth and last, based on the above findings, the non-response bias in firms' inflation expectations is likely statistically negligible.
Keywords:	inflation expectations; PU classification; non-response bias
JEL:	C55 E31
Date:	2019–12–25
URL:	http://d.repec.org/n?u=RePEc:boj:bojwps:wp19e17&r=all
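
PU (positive-unlabeled) classification learns from data in which only some positives are labeled. One standard adjustment, due to Elkan and Noto (2008), trains a classifier g(x) to separate labeled from unlabeled examples and rescales its score by c, the average score on held-out labeled positives, giving P(positive | x) = g(x) / c under the assumption that labeled positives are a random sample of all positives. In the setting above, "positive" could mean a firm that in some sense knows its expectation, with numerical answers labeled and "don't know" answers unlabeled. That mapping and all numbers below are hypothetical, and the abstract does not say which PU method the authors use.

```python
def label_frequency(scores_on_labeled_positives):
    """Estimate c = P(labeled | positive) as the mean score on held-out labeled positives."""
    return sum(scores_on_labeled_positives) / len(scores_on_labeled_positives)


def positive_probability(score, c):
    """Rescale a labeled-vs-unlabeled score g(x) into P(positive | x) = g(x) / c, capped at 1."""
    return min(1.0, score / c)


# Hypothetical classifier scores g(x).
held_out_labeled = [0.75, 0.25, 0.5]  # firms that gave a numerical answer
dont_know_scores = [0.1, 0.3, 0.6]    # firms that answered "don't know"

c = label_frequency(held_out_labeled)
probs = [positive_probability(s, c) for s in dont_know_scores]
likely_knowers = [p > 0.5 for p in probs]  # hypothetical decision threshold
print(c, probs, likely_knowers)
```

Using held-out scores for c matters: scores on the training positives tend to be optimistically high, which would inflate c and understate how many "don't know" respondents are likely to know.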

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.
