nep-big New Economics Papers
on Big Data
Issue of 2020‒02‒17
twenty papers chosen by
Tom Coupé
University of Canterbury

  1. Night lights in economics: Sources and uses By John Gibson; Susan Olivia; Geua Boe-Gibson
  2. Priority to unemployed immigrants? A causal machine learning evaluation of training in Belgium By Cockx, Bart; Lechner, Michael; Bollens, Joost
  3. The Macroeconomics of Automation: Data, Theory, and Policy Analysis By Jaimovich, Nir; Saporta-Eksten, Itay; Siu, Henry E.; Yedid-Levi, Yaniv
  4. Artificial Intelligence, Data, Ethics: An Holistic Approach for Risks and Regulation By Alexis Bogroff; Dominique Guegan
  5. On Calibration Neural Networks for extracting implied information from American options By Shuaiqiang Liu; Álvaro Leitao; Anastasia Borovykh; Cornelis W. Oosterlee
  6. On the Basis of Brain: Neural-Network-Inspired Change in General Purpose Chips By Ekaterina Prytkova; Simone Vannuccini
  7. A New Approach to Analyzing Opioid Use among SSDI Applicants By April Yanyuan Wu; Peter Mariani; Jia Pu; Andrew Hurwitz
  8. Artificial Intelligence Platforms – A New Research Agenda for Digital Platform Economy By Mucha, Tomasz; Seppälä, Timo
  9. Big Data based Research on Mechanisms of Sharing Economy Restructuring the World By Dingju Zhu
  10. Housing Search in the Age of Big Data: Smarter Cities or the Same Old Blind Spots? By Boeing, Geoff; Besbris, Max; Schachter, Ariela; Kuk, John
  11. Housing Search in the Age of Big Data: Smarter Cities or the Same Old Blind Spots? By Geoff Boeing; Max Besbris; Ariela Schachter; John Kuk
  12. Technology and Big Data Are Changing Economics: Mining Text to Track Methods By Janet Currie; Henrik Kleven; Esmée Zwiers
  13. The macroeconomics of automation: data, theory, and policy analysis By Nir Jaimovich; Itay Saporta-Eksten; Henry Siu; Yaniv Yedid-Levi
  14. The Global Impact of Brexit Uncertainty By Tarek Hassan; Laurence van Lent; Stephan Hollander; Ahmed Tahoun
  15. Robots and the origin of their labour-saving impact By Fabio Montobbio; Jacopo Staccioli; Maria Enrica Virgillito; Marco Vivarelli
  16. Paid parental leave and maternal reemployment: Do part-time subsidies help or harm? By Zimmert, Franziska; Zimmert, Michael
  17. Rental Housing Spot Markets: How Online Information Exchanges Can Supplement Transacted-Rents Data By Geoff Boeing; Jake Wegmann; Junfeng Jiao
  18. Central Bank Communication in Ghana: Insights from a Text Mining Analysis By Omotosho, Babatunde S.
  19. Digital Platforms and the Demand for International Tourism Services By Lopez Cordova, Jose Ernesto
  20. Nowcasting in Real Time Using Popularity Priors By George Monokroussos; Yongchen Zhao

  1. By: John Gibson (CERDI - Centre d'Études et de Recherches sur le Développement International - Clermont Auvergne - UCA - Université Clermont Auvergne - CNRS - Centre National de la Recherche Scientifique, University of Waikato [Hamilton]); Susan Olivia (University of Waikato [Hamilton]); Geua Boe-Gibson (University of Waikato [Hamilton])
    Abstract: Night lights, as detected by satellites, are increasingly used by economists, typically as a proxy for economic activity. The growing popularity of these data reflects either the absence, or the presumed inaccuracy, of more conventional economic statistics, like national or regional GDP. Further growth in use of night lights is likely, as they have been included in the AidData geo-query tool for providing sub-national data, and in geographic data that the Demographic and Health Survey links to anonymised survey enumeration areas. Yet this ease of obtaining night lights data may lead to inappropriate use, if users fail to recognize that most of the satellites providing these data were not designed to assist economists, and have features that may threaten validity of analyses based on these data, especially for temporal comparisons, and for small and rural areas. In this paper we review sources of satellite data on night lights, discuss issues with these data, and survey some of their uses in economics.
    Keywords: Density, Development, DMSP, Luminosity, Night lights, VIIRS
    Date: 2020–01–24
    URL: http://d.repec.org/n?u=RePEc:hal:wpaper:hal-02453838&r=all
  2. By: Cockx, Bart; Lechner, Michael; Bollens, Joost
    Abstract: We investigate heterogenous employment effects of Flemish training programmes. Based on administrative individual data, we analyse programme effects at various aggregation levels using Modified Causal Forests (MCF), a causal machine learning estimator for multiple programmes. While all programmes have positive effects after the lock-in period, we find substantial heterogeneity across programmes and types of unemployed. Simulations show that assigning unemployed to programmes that maximise individual gains as identified in our estimation can considerably improve effectiveness. Simplified rules, such as one giving priority to unemployed with low employability, mostly recent migrants, lead to about half of the gains obtained by more sophisticated rules.
    Keywords: Policy evaluation, active labour market policy, causal machine learning, modified causal forest, conditional average treatment effects
    JEL: J68
    Date: 2020–01
    URL: http://d.repec.org/n?u=RePEc:usg:econwp:2020:01&r=all
  3. By: Jaimovich, Nir (University of Zurich); Saporta-Eksten, Itay (Tel Aviv University); Siu, Henry E. (University of British Columbia); Yedid-Levi, Yaniv (Interdisciplinary Center (IDC) Herzliya)
    Abstract: The U.S. economy has experienced a significant drop in the fraction of the population employed in middle wage, "routine task-intensive" occupations. Applying machine learning techniques, we identify characteristics of those who used to be employed in such occupations and show they are now less likely to work in routine occupations. Instead, they are either non-participants in the labor force or working at occupations that tend to occupy the bottom of the wage distribution. We then develop a quantitative, heterogeneous agent, general equilibrium model of labor force participation, occupational choice, and capital investment. This allows us to quantify the role of advancement in automation technology in accounting for these labor market changes. We then use this framework as a laboratory to evaluate various public policies aimed at addressing the disappearance of routine employment and its consequent impacts on inequality.
    Keywords: polarization, automation, routine employment, labor force participation, universal basic income, unemployment insurance, retraining
    JEL: E22 E24 J23 J24
    Date: 2020–01
    URL: http://d.repec.org/n?u=RePEc:iza:izadps:dp12913&r=all
  4. By: Alexis Bogroff (UP1 - Université Panthéon-Sorbonne); Dominique Guegan (UP1 - Université Panthéon-Sorbonne, CES - Centre d'économie de la Sorbonne - UP1 - Université Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique, Labex ReFi - UP1 - Université Panthéon-Sorbonne, University of Ca’ Foscari [Venice, Italy])
    Abstract: An extensive list of risks related to big data frameworks and their use through models of artificial intelligence is provided, along with measurements and implementable solutions. Bias, interpretability and ethics are studied in depth, with several interpretations from the point of view of developers, companies and regulators. Reflections suggest that fragmented frameworks increase the risks of model misspecification, opacity and bias in the result; domain experts and statisticians need to be involved in the whole process, as the business objective must drive each decision from the data extraction step to the final activatable prediction. We propose a holistic and original approach to take into account the risks encountered throughout the implementation of systems using artificial intelligence, from the choice of the data and the selection of the algorithm to the decision making.
    Keywords: Artificial Intelligence, Bias, Big Data, Ethics, Governance, Interpretability, Regulation, Risk
    Date: 2019–06
    URL: http://d.repec.org/n?u=RePEc:hal:journl:halshs-02181597&r=all
  5. By: Shuaiqiang Liu; Álvaro Leitao; Anastasia Borovykh; Cornelis W. Oosterlee
    Abstract: Extracting implied information, like volatility and/or dividend, from observed option prices is a challenging task when dealing with American options, because of the computational costs needed to solve the corresponding mathematical problem many thousands of times. We will employ a data-driven machine learning approach to estimate the Black-Scholes implied volatility and the dividend yield for American options in a fast and robust way. To determine the implied volatility, the inverse function is approximated by an artificial neural network on the computational domain of interest, which decouples the offline (training) and online (prediction) phases and thus eliminates the need for an iterative process. For the implied dividend yield, we formulate the inverse problem as a calibration problem and determine simultaneously the implied volatility and dividend yield. For this, a generic and robust calibration framework, the Calibration Neural Network (CaNN), is introduced to estimate multiple parameters. It is shown that machine learning can be used as an efficient numerical technique to extract implied information from American options.
    Date: 2020–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2001.11786&r=all
  6. By: Ekaterina Prytkova (Friedrich Schiller University Jena, Department of Economics and Business Administration); Simone Vannuccini (Science Policy Research Unit, University of Sussex Business School, University of Sussex)
    Abstract: In this paper, we disentangle the changes that the rise of Artificial Intelligence Technologies (AITs) is inducing in the semiconductor industry. The prevailing von Neumann architecture at the core of the established “intensive” technological trajectory of chip production is currently challenged by the rising difficulty of improving product performance over a growing set of computation tasks. In particular, the challenge is exacerbated by the increasing success of Artificial Neural Networks (ANNs) in application to a set of tasks barely tractable for classical programs. The inefficiency of the von Neumann architecture in the execution of ANN-based solutions opens room for competition and pushes for an adequate response from hardware producers in the form of exploration of new chip architectures and designs. Based on a historical overview of the industry and on collected data, we identify three characteristics of a chip, namely (i) computing power, (ii) heterogeneity of computation, and (iii) energy efficiency, as focal points of demand interest and simultaneously as directions of product improvement for the semiconductor industry players, and consolidate them into a techno-economic trilemma. Pooling together the trilemma and an analysis of the economic forces at work, we construct a simple model formalising the mechanism of demand distribution in the semiconductor industry, stressing in particular the role of its supporting services, the software domain. We conclude by deriving two possible scenarios for chip evolution: (i) the emergence of a new dominant design in the form of a “platform chip” comprising heterogeneous cores; (ii) the fragmentation of the semiconductor industry into submarkets with dedicated chips. The convergence toward one of the proposed scenarios is conditional on (i) technological progress along the trilemma’s edges, (ii) advances in the software domain and its compatibility with hardware, (iii) the number of tasks successfully addressed by this software, (iv) market structure and dynamics.
    Keywords: neural network; Artificial Intelligence; technological trajectory; semiconductor industry; hardware; software
    JEL: L63 O31 O33
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:sru:ssewps:2020-01&r=all
  7. By: April Yanyuan Wu; Peter Mariani; Jia Pu; Andrew Hurwitz
    Abstract: This is a proof-of-concept study for the proposition that machine learning can be used to classify free-form text of SSDI applicant medication information in SSA’s Structured Data Repository. Using this new approach, we documented opioid use among a sample of SSDI applicants.
    Keywords: disability, opioids, machine learning, SSDI
    URL: http://d.repec.org/n?u=RePEc:mpr:mprres:11942c9f2f484a59a26728ca01c1bd73&r=all
  8. By: Mucha, Tomasz; Seppälä, Timo
    Abstract: Three out of nine S&P 500 digital platform companies stand out as building their own artificial intelligence (AI) platforms. There is overwhelming empirical evidence that AI technologies are central to running a digital platform business. However, the current research agenda is not directing researchers to study AI technologies in the context of digital platforms. We have divided the proposed AI platforms research agenda as follows: The first set of questions we propose relates to an overall conceptualization of AI platforms. Thereafter, we recognize specific aspects of AI platforms, which need to be investigated in detail to gain a more complete understanding. The second set of questions we propose relates to understanding the dynamics between AI platforms and the broader socio-economic context. This topic might be particularly relevant to economies of countries without indigenous AI platforms. Our paper builds on the proposition that AI is a general-purpose technology, which by itself carries properties of a digital platform.
    Keywords: Platforms, Digital Platform Economy, Artificial Intelligence, AI platforms, Research agenda
    JEL: M1 M21 O3 O33
    Date: 2020–02–06
    URL: http://d.repec.org/n?u=RePEc:rif:wpaper:76&r=all
  9. By: Dingju Zhu
    Abstract: Many studies have discussed the phenomenon and definition of the sharing economy, but an understanding of the sharing economy's reconstruction of the world remains elusive. Based on big data, we illustrate in detail the mechanisms by which the sharing economy reconstructs the world, including its reconstruction of society, time and space, users, and industry, and its self-reconstruction in the future. This is very important if society is to make full use of the reconstruction opportunity to upgrade our world through the sharing economy. On the one hand, we establish the mechanisms by which the sharing economy rebuilds society, industry, space-time, and users through qualitative analyses; on the other hand, we demonstrate the rationality of these mechanisms through quantitative analyses of big data.
    Date: 2020–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2001.08926&r=all
  10. By: Boeing, Geoff (Northeastern University); Besbris, Max; Schachter, Ariela; Kuk, John
    Abstract: Housing scholars stress the importance of the information environment in shaping housing search behavior and outcomes. Rental listings have increasingly moved online over the past two decades and, in turn, online platforms like Craigslist are now central to the search process. Do these technology platforms serve as information equalizers or do they reflect traditional information inequalities that correlate with neighborhood sociodemographics? We synthesize and extend analyses of millions of US Craigslist rental listings and find they supply significantly different volumes, quality, and types of information in different communities. Technology platforms have the potential to broaden, diversify, and equalize housing search information, but they rely on landlord behavior and, in turn, likely will not reach this potential without a significant redesign or policy intervention. Smart cities advocates hoping to build better cities through technology must critically interrogate technology platforms and big data for systematic biases.
    Date: 2020–01–30
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:n5zws&r=all
  11. By: Geoff Boeing; Max Besbris; Ariela Schachter; John Kuk
    Abstract: Housing scholars stress the importance of the information environment in shaping housing search behavior and outcomes. Rental listings have increasingly moved online over the past two decades and, in turn, online platforms like Craigslist are now central to the search process. Do these technology platforms serve as information equalizers or do they reflect traditional information inequalities that correlate with neighborhood sociodemographics? We synthesize and extend analyses of millions of US Craigslist rental listings and find they supply significantly different volumes, quality, and types of information in different communities. Technology platforms have the potential to broaden, diversify, and equalize housing search information, but they rely on landlord behavior and, in turn, likely will not reach this potential without a significant redesign or policy intervention. Smart cities advocates hoping to build better cities through technology must critically interrogate technology platforms and big data for systematic biases.
    Date: 2020–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2001.11585&r=all
  12. By: Janet Currie; Henrik Kleven; Esmée Zwiers
    Abstract: The last 40 years have seen huge innovations in computing technology and data availability. Data derived from millions of administrative records or by using (as we do) new methods of data generation such as text mining are now common. New data often requires new methods, which in turn can inspire new data collection. If history is any guide, some methods will stick and others will prove to be a flash in the pan. However, the larger trends towards demanding greater credibility and transparency from researchers in applied economics and a “collage” approach to assembling evidence will likely continue.
    JEL: A0 B0 C0 H0 I0 J0 L0
    Date: 2020–01
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:26715&r=all
  13. By: Nir Jaimovich; Itay Saporta-Eksten; Henry Siu; Yaniv Yedid-Levi
    Abstract: The U.S. economy has experienced a significant drop in the fraction of the population employed in middle wage, “routine task-intensive” occupations. Applying machine learning techniques, we identify characteristics of those who used to be employed in such occupations and show they are now less likely to work in routine occupations. Instead, they are either non-participants in the labor force or working at occupations that tend to occupy the bottom of the wage distribution. We then develop a quantitative, heterogeneous agent, general equilibrium model of labor force participation, occupational choice, and capital investment. This allows us to quantify the role of advancement in automation technology in accounting for these labor market changes. We then use this framework as a laboratory to evaluate various public policies aimed at addressing the disappearance of routine employment and its consequent impacts on inequality.
    Keywords: Polarization, automation, routine employment, labor force participation, universal basic income, unemployment insurance, retraining
    JEL: E00 E23 E25 E60 J01 J2
    Date: 2020–01
    URL: http://d.repec.org/n?u=RePEc:zur:econwp:340&r=all
  14. By: Tarek Hassan (Boston University); Laurence van Lent (Frankfurt School of Finance and Management); Stephan Hollander (Tilburg University); Ahmed Tahoun (London Business School)
    Abstract: Using tools from computational linguistics, we construct new measures of the impact of Brexit on listed firms in the United States and around the world; these measures are based on the proportion of discussions in quarterly earnings conference calls on the costs, benefits, and risks associated with the UK’s intention to leave the EU. We identify which firms expect to gain or lose from Brexit and which are most affected by Brexit uncertainty. We then estimate effects of the different types of Brexit exposure on firm-level outcomes. We find that the impact of Brexit-related uncertainty extends far beyond British or even European firms; US and international firms most exposed to Brexit uncertainty lost a substantial fraction of their market value and have also reduced hiring and investment. In addition to Brexit uncertainty (the second moment), we find that international firms overwhelmingly expect negative direct effects from Brexit (the first moment) should it come to pass. Most prominently, firms expect difficulties from regulatory divergence, reduced labor mobility, limited trade access, and the costs of post-Brexit operational adjustments. Consistent with the predictions of canonical theory, this negative sentiment is recognized and priced in stock markets but has not yet significantly affected firm actions.
    Keywords: Brexit, uncertainty, sentiment, machine learning, cross-country effects
    JEL: D8 E22 E24 E32 E6 F0 G18 G32 G38 H32
    URL: http://d.repec.org/n?u=RePEc:thk:wpaper:106&r=all
  15. By: Fabio Montobbio; Jacopo Staccioli; Maria Enrica Virgillito; Marco Vivarelli
    Abstract: This paper investigates the presence of explicit labour-saving heuristics within robotic patents. It analyses innovative actors engaged in robotic technology and their economic environment (identity, location, industry), and identifies the technological fields particularly exposed to labour-saving innovations. It exploits advanced natural language processing and probabilistic topic modelling techniques on the universe of patent applications at the USPTO between 2009 and 2018, matched with the ORBIS (Bureau van Dijk) firm-level dataset. The results show that labour-saving patent holders comprise not only robot producers, but also adopters. Consequently, labour-saving robotic patents appear along the entire supply chain. The paper shows that labour-saving innovations challenge manual activities (e.g. in the logistics sector), activities entailing social intelligence (e.g. in the healthcare sector) and cognitive skills (e.g. learning and predicting).
    Keywords: Robotic Patents; Labour-Saving Technology; Search Heuristics; Probabilistic Topic Models.
    Date: 2020–02–05
    URL: http://d.repec.org/n?u=RePEc:ssa:lemwps:2020/03&r=all
  16. By: Zimmert, Franziska; Zimmert, Michael
    Abstract: Employment subsidies can incentivize mothers to shorten employment interruptions after childbirth. We examine a German parental leave reform promoting an early return to part-time work. Exploiting the exogenous variation in the benefit entitlement length defined by the child’s birthday, we apply machine-learning augmented semi-parametric difference-in-difference estimation using administrative data. The reform yields positive average employment effects mainly driven by part-time employment, as our dynamic optimization model for mothers on parental leave suggests. Conditional effects show that the policy creates heterogeneous incentives depending on the opportunity costs of working part-time.
    Keywords: Causal machine learning, effect heterogeneity, maternal labor supply, parental leave, Germany
    JEL: J21 J22 C14
    Date: 2020–01
    URL: http://d.repec.org/n?u=RePEc:usg:econwp:2020:02&r=all
  17. By: Geoff Boeing; Jake Wegmann; Junfeng Jiao
    Abstract: Traditional US rental housing data sources such as the American Community Survey and the American Housing Survey report on the transacted market - what existing renters pay each month. They do not explicitly tell us about the spot market - i.e., the asking rents that current homeseekers must pay to acquire housing - though they are routinely used as a proxy. This study compares governmental data to millions of contemporaneous rental listings and finds that asking rents diverge substantially from these most recent estimates. Conventional housing data understate current market conditions and affordability challenges, especially in cities with tight and expensive rental markets.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.01578&r=all
  18. By: Omotosho, Babatunde S.
    Abstract: Effective central bank communication is useful for anchoring market expectations and enhancing macroeconomic stability. In this paper, the communication strategy of the Bank of Ghana (BOG) is analysed using BOG’s monetary policy committee press releases for the period 2018-2019. Specifically, we apply text mining techniques to investigate the readability, sentiments and hidden topics of the policy documents. Our results provide evidence of increased central bank communication during the sample period, implying improved monetary policy transparency. Also, the computed Coleman and Liau (1975) readability index shows that the word and sentence structures of the press releases have become less complex, indicating increased readability. Furthermore, we find an average monetary policy net sentiment score of 3.9 per cent. This means that the monetary policy committee expressed positive sentiments regarding policy and macroeconomic outlooks during the period. Finally, the estimated topic model reveals that the topic proportion for “monetary policy and inflation” was prominent in the year 2018 while concerns regarding exchange rate were strong in 2019. The paper recommends that in order to enhance monetary policy communication, the Bank of Ghana should continue to improve on the readability of the monetary policy press releases.
    Keywords: Central bank communication, text mining, monetary policy
    JEL: E52 E58 E65
    Date: 2019–11–15
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:98297&r=all
  19. By: Lopez Cordova, Jose Ernesto
    Abstract: Tourism is an important source of foreign exchange and employment across developing economies. A scant literature has explored the relationship between tourism and the advent of the internet. This paper contributes to the tourism-trade literature and studies the empirical relationship between international tourism and the adoption of digital technologies that facilitate search about tourism opportunities across countries. It links foreign visits with the spread of the use of the internet in sending countries and the level of development of business-to-consumer digital tools in host countries. The paper estimates a well-specified gravity model of tourist arrivals between country pairs with panel data. The results indicate that frictions affecting bilateral tourism flows have been attenuated by the advent of digital tools. The absolute value of the effects of bilateral geographic distance, language differences, and border-contiguity seems to be reduced by the use of the internet by potential tourists and the business sector in host countries. The results are robust to alternative proxies for internet use in tourism search based on data from Google Trends. The paper also presents simulations of the potential impacts of advances in the adoption of digital tools over time, linking the adoption process to mechanisms of technology adoption that are commonplace in the literature.
    Date: 2020–02–13
    URL: http://d.repec.org/n?u=RePEc:wbk:wbrwps:9147&r=all
  20. By: George Monokroussos (Amazon - Seattle); Yongchen Zhao (Department of Economics, Towson University)
    Abstract: We construct a "Google Recession Index" (GRI) using Google Trends data on internet search popularity, which tracks the public’s attention to recession-related keywords in real time. We then compare nowcasts made with and without this index using both a standard dynamic factor model and a Bayesian approach with alternative prior setups. Our results indicate that using the Bayesian model with GRI-based "popularity priors" we could identify the 2008Q3 turning point in real time, without sacrificing the accuracy of the nowcasts over the rest of the sample periods.
    Keywords: Gibbs Sampling, Factor Models, Kalman Filter, Real-Time Data, Google Trends, Monetary Policy, Great Recession.
    JEL: C11 C22 C53 E37 E52
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:tow:wpaper:2020-01&r=all

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.