New Economics Papers on Big Data |
By: | Matt Taddy |
Abstract: | We have seen in the past decade a sharp increase in the extent that companies use data to optimize their businesses. Variously called the `Big Data' or `Data Science' revolution, this has been characterized by massive amounts of data, including unstructured and nontraditional data like text and images, and the use of fast and flexible Machine Learning (ML) algorithms in analysis. With recent improvements in Deep Neural Networks (DNNs) and related methods, application of high-performance ML algorithms has become more automatic and robust to different data scenarios. That has led to the rapid rise of an Artificial Intelligence (AI) that works by combining many ML algorithms together – each targeting a straightforward prediction task – to solve complex problems. We will define a framework for thinking about the ingredients of this new ML-driven AI. Having an understanding of the pieces that make up these systems and how they fit together is important for those who will be building businesses around this technology. Those studying the economics of AI can use these definitions to remove ambiguity from the conversation on AI's projected productivity impacts and data requirements. Finally, this framework should help clarify the role for AI in the practice of modern business analytics and economic measurement. |
JEL: | C01 C1 O33 |
Date: | 2018–02 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:24301&r=big |
By: | Avi Goldfarb; Daniel Trefler |
Abstract: | This paper explores the international dimensions of the economics of artificial intelligence. Trade theory emphasizes the roles of scale, competition, and knowledge creation and knowledge diffusion as fundamental to comparative advantage. We explore key features of AI with respect to these dimensions and describe the features of an appropriate model of international trade in the context of AI. We then discuss policy implications with respect to investments in research, and behind-the-border regulations such as privacy, data localization, standards, and competition. We conclude by emphasizing that there is still much to learn before we have a comprehensive understanding of how AI will affect trade. |
JEL: | F1 O33 |
Date: | 2018–01 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:24254&r=big |
By: | Ajay K. Agrawal; Joshua S. Gans; Avi Goldfarb |
Abstract: | Recent artificial intelligence advances can be seen as improvements in prediction. We examine how such predictions should be priced. We model two inputs into decisions: a prediction of the state and the payoff or utility from different actions in that state. The payoff is unknown, and can only be learned through experiencing a state. It is possible to learn that there is a dominant action across all states, in which case the prediction has little value. Therefore, if predictions cannot be credibly contracted upfront, the seller cannot extract the full value, and instead charges the same price to all buyers. |
JEL: | D81 L12 O33 |
Date: | 2018–02 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:24284&r=big |
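The role of a dominant action in the argument above is easy to see in a stripped-down numerical example (my illustration, not the authors' model; the payoff matrices are invented): the value of a perfect state prediction is the gain from choosing the best action per state rather than one action for all states, and it collapses to zero once a single action dominates.

```python
import numpy as np

# Toy decision problem: two actions (rows), two states (columns), uniform state probabilities.
p = np.array([0.5, 0.5])

def value_of_prediction(payoff):
    """Value of a perfect state prediction = E[max over actions | state]
    minus max over actions of the expected payoff without any prediction."""
    with_prediction = (payoff.max(axis=0) * p).sum()   # pick the best action in each state
    without_prediction = (payoff @ p).max()            # pick one action for all states
    return with_prediction - without_prediction

# No dominant action: the best action depends on the state, so the prediction is valuable.
print(value_of_prediction(np.array([[10.0, 0.0],
                                    [0.0, 10.0]])))    # 5.0

# Dominant action: action 0 is best in every state, so the prediction is worth nothing.
print(value_of_prediction(np.array([[10.0, 8.0],
                                    [2.0, 1.0]])))     # 0.0
```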
By: | Ginger Zhe Jin |
Abstract: | Thanks to big data, artificial intelligence (AI) has spurred exciting innovations. In the meantime, AI and big data are reshaping the risk in consumer privacy and data security. In this essay, I first define the nature of the problem and then present a few facts about the ongoing risk. The bulk of the essay describes how the U.S. market copes with the risk in the current policy environment. It concludes with key challenges facing researchers and policy makers. |
JEL: | D04 D18 D8 L15 L51 |
Date: | 2018–01 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:24253&r=big |
By: | Choi, Jay Pil; Jeon, Doh-Shin; Kim, Byung-Cheol |
Abstract: | We provide a theoretical model of privacy in which data collection requires consumers' consent and consumers are fully aware of the consequences of such consent. Nonetheless, excessive collection of personal information arises in the monopoly market equilibrium, which results in excessive loss of privacy compared to the social optimum. In a fragmented market with a continuum of firms, no individual website has incentives to collect and monetize users' personal data in the presence of scale economies in data analytics. However, the emergence of a data brokerage industry can restore these incentives. Our results have important policy implications for the ongoing debate regarding online privacy protection: excessive loss of privacy emerges even with costless reading and perfect understanding of all privacy policies. We support the view that privacy is a public good and propose alternative policy remedies beyond the current informed-consent approach. |
Keywords: | privacy; personal data; information externalities; big data analytics |
Date: | 2018–01 |
URL: | http://d.repec.org/n?u=RePEc:tse:wpaper:32426&r=big |
By: | Magdalena Bennett; Peter Leopold S. Bergman |
Abstract: | Truancy correlates with many risky behaviors and adverse outcomes. We use detailed administrative data on by-class absences to construct social networks based on students who miss class together. We simulate these networks and use permutation tests to show that certain students systematically coordinate their absences. Leveraging a parent-information intervention on student absences, we find spillover effects from treated students onto peers in their network. We show that an optimal-targeting algorithm that incorporates machine-learning techniques to identify heterogeneous effects, as well as the direct effects and spillover effects, could further improve the efficacy and cost-effectiveness of the intervention subject to a budget constraint. |
Keywords: | social networks, peer effects, education |
JEL: | D85 I20 |
Date: | 2018 |
URL: | http://d.repec.org/n?u=RePEc:ces:ceswps:_6848&r=big |
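The permutation-test step described above can be sketched on synthetic data (the absence rates, the co-absence statistic, and all tuning choices here are illustrative guesses, not the authors' specification): shuffle one student's absence days and ask how often the shuffled data produce at least as many joint absences as observed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic by-class absence records: rows = school days, columns = students,
# 1 = absent from the shared class that day.
n_days, n_students = 180, 30
absences = (rng.random((n_days, n_students)) < 0.08).astype(int)
# Plant two students who coordinate: student 1 copies most of student 0's absences.
absences[:, 0] = (rng.random(n_days) < 0.15).astype(int)
absences[:, 1] = absences[:, 0] * (rng.random(n_days) < 0.8)

def coabsence_pvalue(a, b, n_perm=5000):
    """Permutation test: is the observed co-absence count larger than expected
    when the two students' absence days are independent?"""
    observed = int((a & b).sum())
    null = np.array([int((rng.permutation(a) & b).sum()) for _ in range(n_perm)])
    return observed, (null >= observed).mean()

obs, p = coabsence_pvalue(absences[:, 0], absences[:, 1])
print(f"co-absences={obs}, permutation p-value={p:.4f}")   # small p suggests coordination
```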
By: | Hans B\"uhler; Lukas Gonon; Josef Teichmann; Ben Wood |
Abstract: | We present a framework for hedging a portfolio of derivatives in the presence of market frictions such as transaction costs, market impact, liquidity constraints or risk limits using modern deep reinforcement machine learning methods. We discuss how standard reinforcement learning methods can be applied to non-linear reward structures, i.e. in our case convex risk measures. As a general contribution to the use of deep learning for stochastic processes, we also show that the set of constrained trading strategies used by our algorithm is large enough to $\epsilon$-approximate any optimal solution. Our algorithm can be implemented efficiently even in high-dimensional situations using modern machine learning tools. Its structure does not depend on specific market dynamics, and generalizes across hedging instruments including the use of liquid derivatives. Its computational performance is largely invariant in the size of the portfolio as it depends mainly on the number of hedging instruments available. We illustrate our approach by showing the effect on hedging under transaction costs in a synthetic market driven by the Heston model, where we outperform the standard "complete market" solution. |
Date: | 2018–02 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:1802.03042&r=big |
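A compressed sketch of the kind of training loop the abstract describes, under simplifying assumptions that are mine rather than the paper's: a Black-Scholes market instead of Heston, a single hedging instrument, proportional transaction costs, and the entropic risk measure as the convex risk objective, with one small network mapping the current log-price and time to the hedge position.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_paths, n_steps, dt, sigma, cost = 2000, 30, 1 / 30, 0.2, 0.002
strike, lam = 1.0, 1.0                          # short European call; risk aversion

net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def entropic_risk(pnl, lam):
    # Convex risk measure rho(X) = (1 / lam) * log E[exp(-lam * X)]
    n = torch.tensor(float(pnl.shape[0]))
    return (torch.logsumexp(-lam * pnl, dim=0) - torch.log(n)) / lam

for epoch in range(100):
    # Simulate geometric Brownian motion paths for the underlying.
    z = torch.randn(n_paths, n_steps)
    log_s = torch.cumsum(-0.5 * sigma ** 2 * dt + sigma * dt ** 0.5 * z, dim=1)
    s = torch.cat([torch.ones(n_paths, 1), torch.exp(log_s)], dim=1)

    pnl, pos = torch.zeros(n_paths), torch.zeros(n_paths)
    for t in range(n_steps):
        state = torch.stack([torch.log(s[:, t]), torch.full((n_paths,), t * dt)], dim=1)
        new_pos = net(state).squeeze(-1)
        pnl = pnl - cost * s[:, t] * (new_pos - pos).abs()   # proportional transaction costs
        pnl = pnl + new_pos * (s[:, t + 1] - s[:, t])        # gains from the hedge position
        pos = new_pos
    pnl = pnl - torch.relu(s[:, -1] - strike)                # liability: short call payoff

    loss = entropic_risk(pnl, lam)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(float(loss))   # risk of the hedged short-call position after training
```

Training on a convex risk measure of the terminal P&L, rather than a squared hedging error, is what lets the learned strategy trade residual risk against transaction costs, which is the point of the approach sketched in the abstract.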
By: | Chia-Lin Chang (National Chung Hsing University); Michael McAleer (Asia University; University of Sydney Business School; Erasmus University); Wing-Keung Wong (Asia University; China Medical University Hospital; Lingnan University) |
Abstract: | The paper provides a review of the literature that connects Big Data, Computational Science, Economics, Finance, Marketing, Management, and Psychology, and discusses some research related to these seven disciplines. Academics could develop theoretical models, and then econometric and statistical models to estimate their parameters, as well as conduct simulations to examine whether the associated estimators and hypothesis tests have good size and high power. Thereafter, academics and practitioners could apply the theory to analyse some interesting issues in the seven disciplines and cognate areas. |
Keywords: | Big Data; Computational science; Economics; Finance; Management; Theoretical models; Econometric and statistical models; Applications. |
JEL: | A10 G00 G31 O32 |
Date: | 2018–02–03 |
URL: | http://d.repec.org/n?u=RePEc:tin:wpaper:20180011&r=big |
By: | Julio A. Berdegué (Centro Latinoamericano para el Desarrollo Rural, RIMISP); Tatiana Hiller (Universidad Iberoamericana); Juan Mauricio Ramírez (Centro Latinoamericano para el Desarrollo Rural, RIMISP); Santiago Satizábal (Centro Latinoamericano para el Desarrollo Rural, RIMISP); Isidro Soloaga (Universidad Iberoamericana, Ciudad de México); Juan Soto (Centro Latinoamericano para el Desarrollo Rural, RIMISP); Miguel Uribe (Food and Agriculture Organization, FAO-United Nations); Milena Vargas (Centro Latinoamericano para el Desarrollo Rural, RIMISP) |
Abstract: | The delimitation of functional spatial units or functional territories is an important topic in regional science and economic geography since the empirical verification of many causal relationships is affected by the size and shape of these areas. Most of the literature on the delimitation of these functional territories is based on developed countries, usually using contemporary and updated information of commuting flows. Conversely, in developing countries the technical contributions have been incipient. This paper proposes a complementary step in the delimitation of functional territories, combining stable satellite night lights and commuting flows, with applications for Mexico, Colombia and Chile. This method leads to a more accurate definition of functional territories, especially in cases where official data for commuting flows are unreliable and/or outdated, as is the case of several developing and underdeveloped countries. We exploit important advances associated with the use of satellite images, and specifically, the use of night lights as a source of information for the delimitation of metropolitan areas and urban settlements. |
JEL: | R10 R12 R23 |
Date: | 2017–09–15 |
URL: | http://d.repec.org/n?u=RePEc:smx:wpaper:2017004&r=big |
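A stylized sketch of the combination idea (toy data; the commuting cutoff, the contiguity rule, and the way the two sources are merged are illustrative guesses, not the paper's procedure): link administrative units that exchange a large commuting share or whose built-up areas fall inside one contiguous lit patch of the night-lights imagery, and read functional territories off the connected components.

```python
import networkx as nx

# Toy inputs. commuting[(a, b)] = share of a's workers who commute to b.
commuting = {
    ("A", "B"): 0.22, ("B", "A"): 0.10,
    ("C", "D"): 0.05, ("D", "C"): 0.03,
    ("E", "A"): 0.18,
}
# Pairs of municipalities whose built-up areas form one contiguous lit patch
# (in practice this would be extracted from the stable night-lights raster).
shared_lit_patch = [("C", "D")]

FLOW_CUTOFF = 0.15   # minimum commuting share to call two units functionally integrated

g = nx.Graph()
g.add_nodes_from("ABCDE")
for (a, b), share in commuting.items():
    if share >= FLOW_CUTOFF:
        g.add_edge(a, b, source="commuting")
for a, b in shared_lit_patch:
    g.add_edge(a, b, source="night lights")

territories = [sorted(c) for c in nx.connected_components(g)]
print(territories)   # e.g. [['A', 'B', 'E'], ['C', 'D']]
```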
By: | Florian Maire (School of Mathematics and Statistics, University College Dublin; Insight Centre for Data Analytics, University College Dublin); Nial Friel (School of Mathematics and Statistics, University College Dublin; Insight Centre for Data Analytics, University College Dublin); Pierre Alquier (CREST-ENSAE) |
Abstract: | This paper introduces a framework for speeding up Bayesian inference conducted in the presence of large datasets. We design a Markov chain whose transition kernel uses an unknown fraction of fixed size of the available data that is randomly refreshed throughout the algorithm. Inspired by the Approximate Bayesian Computation (ABC) literature, the subsampling process is guided by the fidelity to the observed data, as measured by summary statistics. The resulting algorithm, Informed Sub-Sampling MCMC, is a generic and flexible approach which, contrary to existing scalable methodologies, preserves the simplicity of the Metropolis-Hastings algorithm. Even though exactness is lost, i.e., the chain's stationary distribution only approximates the target, we theoretically study and quantify this bias and show on a diverse set of examples that the algorithm yields excellent performance when the computational budget is limited. We also show that, when it is available and cheap to compute, setting the summary statistic to the maximum likelihood estimator is supported by theoretical arguments. |
Keywords: | Bayesian inference, Big-data, Approximate Bayesian Computation, noisy Markov chain Monte Carlo |
Date: | 2017–06–26 |
URL: | http://d.repec.org/n?u=RePEc:crs:wpaper:2017-40&r=big |
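A toy reading of the subsampling idea for a Gaussian-mean model (my simplification, not the authors' algorithm or tuning): a Metropolis-Hastings chain evaluates a rescaled likelihood on a fixed-size subsample, and the subsample is periodically refreshed with a preference for subsets whose summary statistic, here the sample mean, stays close to its full-data value.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.0, size=100_000)   # large dataset
n, m = len(data), 500                                  # full size, subsample size
full_stat = data.mean()                                # summary statistic of the full data

def draw_informed_subsample(n_candidates=20, epsilon=0.05):
    """Among random candidate subsets, keep one whose summary statistic is
    close to the full-data statistic (an ABC-style fidelity criterion)."""
    best, best_gap = None, np.inf
    for _ in range(n_candidates):
        idx = rng.choice(n, size=m, replace=False)
        gap = abs(data[idx].mean() - full_stat)
        if gap < best_gap:
            best, best_gap = idx, gap
        if best_gap < epsilon:
            break
    return best

def log_post(theta, subset):
    # N(0, 10^2) prior; subsample log-likelihood rescaled by n/m to mimic the full data.
    return -theta ** 2 / 200.0 - (n / m) * 0.5 * np.sum((data[subset] - theta) ** 2)

theta, subset = 0.0, draw_informed_subsample()
samples = []
for it in range(5_000):
    if it % 100 == 0:                                  # refresh the subsample periodically
        subset = draw_informed_subsample()
    prop = theta + 0.02 * rng.standard_normal()
    if np.log(rng.random()) < log_post(prop, subset) - log_post(theta, subset):
        theta = prop
    samples.append(theta)

print(np.mean(samples[2_000:]))   # should sit near the true mean of 2.0
```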
By: | Kohn, Robert; Nguyen, Nghia; Nott, David; Tran, Minh-Ngoc |
Abstract: | Deep neural networks (DNNs) are a powerful tool for functional approximation. We describe flexible versions of generalized linear and generalized linear mixed models that incorporate basis functions formed by a deep neural network. Neural networks with random effects seem to be little used in the literature, perhaps because of the computational challenges of incorporating subject-specific parameters into already complex models. Efficient computational methods for Bayesian inference are developed based on Gaussian variational approximation methods. A parsimonious but flexible factor parametrization of the covariance matrix is used in the Gaussian variational approximation. We implement natural-gradient methods for the optimization, exploiting the factor structure of the variational covariance matrix to perform fast matrix-vector multiplications in the iterative conjugate-gradient linear solvers used in the natural-gradient computations. The method can be implemented in high dimensions, and the use of the natural gradient allows faster and more stable convergence of the variational algorithm. In the case of random effects, we compute unbiased estimates of the gradient of the lower bound in the model with the random effects integrated out by making use of Fisher's identity. The proposed methods are illustrated in several examples for DNN random-effects models and high-dimensional logistic regression with sparse signal-shrinkage priors. |
Keywords: | Variational approximation; Stochastic optimization; Reparametrization gradient; Factor models |
Date: | 2017 |
URL: | http://d.repec.org/n?u=RePEc:syb:wpbsba:2123/17877&r=big |
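A condensed sketch of a Gaussian variational approximation with a factor-parametrized covariance, applied to plain Bayesian logistic regression on synthetic data (my simplification: ordinary Adam on the reparametrization gradient rather than the paper's natural-gradient scheme, and no random effects):

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, p, k = 2000, 50, 4                              # observations, coefficients, factors

# Synthetic logistic-regression data with a sparse true signal.
beta_true = torch.zeros(p)
beta_true[:5] = 1.0
x = torch.randn(n, p)
y = torch.bernoulli(torch.sigmoid(x @ beta_true))

# Variational family: q(beta) = N(mu, B B^T + diag(exp(2 d))).
mu = torch.zeros(p, requires_grad=True)
B = (0.01 * torch.randn(p, k)).requires_grad_()
d = torch.full((p,), -2.0, requires_grad=True)     # log std devs of the diagonal part
opt = torch.optim.Adam([mu, B, d], lr=0.01)

def neg_elbo():
    eps1, eps2 = torch.randn(k), torch.randn(p)
    beta = mu + B @ eps1 + torch.exp(d) * eps2     # reparametrized draw from q
    loglik = -F.binary_cross_entropy_with_logits(x @ beta, y, reduction="sum")
    logprior = -0.5 * (beta ** 2).sum() / 100.0    # N(0, 10^2) prior on the coefficients
    # Entropy of q, with log det(B B^T + D) computed via the matrix determinant lemma.
    dinv2 = torch.exp(-2.0 * d)
    logdet = 2.0 * d.sum() + torch.logdet(torch.eye(k) + B.T @ (dinv2[:, None] * B))
    entropy = 0.5 * (p * math.log(2 * math.pi * math.e) + logdet)
    return -(loglik + logprior + entropy)

for step in range(2000):
    loss = neg_elbo()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(mu[:6].detach())   # variational means should track the nonzero true coefficients
```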
By: | Pierre Alquier (CREST-ENSAE, CNRS); James Ridgway (INRIA) |
Abstract: | While Bayesian methods are extremely popular in statistics and machine learning, their application to massive datasets is often challenging, when possible at all. Indeed, classical MCMC algorithms are prohibitively slow when both the model dimension and the sample size are large. Variational Bayesian (VB) methods aim at approximating the posterior by a distribution in a tractable family, so that MCMC is replaced by an optimization algorithm that is orders of magnitude faster. VB methods have been applied in computationally demanding applications such as collaborative filtering, image and video processing, and natural language processing. However, despite very good results in practice, the theoretical properties of these approximations are usually not known. In this paper, we propose a general approach to prove the concentration of variational approximations of fractional posteriors. We apply our theory to two examples: matrix completion and Gaussian VB. |
Date: | 2017–06–28 |
URL: | http://d.repec.org/n?u=RePEc:crs:wpaper:2017-39&r=big |
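For orientation, the two objects named in the abstract have standard definitions: the fractional (tempered) posterior raises the likelihood to a power alpha in (0, 1], and the variational approximation is the closest member of a tractable family in Kullback-Leibler divergence (the notation below is generic, not necessarily the paper's):

```latex
% Fractional posterior with likelihood L_n(\theta) and prior \pi(\theta):
\pi_{n,\alpha}(\theta) \;\propto\; L_n(\theta)^{\alpha}\,\pi(\theta),
\qquad \alpha \in (0,1].

% Variational approximation over a tractable family \mathcal{F}:
\tilde{\pi}_{n,\alpha} \;=\; \arg\min_{q \in \mathcal{F}}
  \mathrm{KL}\!\left(q \,\|\, \pi_{n,\alpha}\right).
```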
By: | FUJI Kazuhiko |
Abstract: | General purpose artificial intelligence (AI) will be realized in 30 years, and there is a possibility that the majority of current jobs will become redundant. Many experts claim that a basic income should be introduced in the future, but I think there is very little chance of this occurring in Japan. In the future, knowledge will become excessive due to AI, but an aesthetic sense will be lacking, as general purpose AI cannot create a new frontier of art. Brain science research has revealed issues including the following: appreciation of beauty has the potential to be a new object of desire, and consciousness of beauty raises a feeling of justice. Many young Japanese enjoy aesthetic activities through the internet (e.g., as YouTubers). For these reasons, I expect aesthetic sense to play an important role in economic activities. To promote aesthetic activities, the government, and especially local governments, should take necessary measures, including issuing local currencies. Abundant aesthetic activities can lead to happier lives without a basic income in Japan. |
Date: | 2017–12 |
URL: | http://d.repec.org/n?u=RePEc:eti:rpdpjp:17035&r=big |