nep-big New Economics Papers
on Big Data
Issue of 2020‒09‒28
sixteen papers chosen by
Tom Coupé
University of Canterbury

  1. Inequality and Artificial Intelligence in European Union By Mihail Caradaica
  2. Industrial Growth in Sub-Saharan Africa: Evidence from Machine Learning with Insights from Nightlight Satellite Images By Otchia, Christian; Asongu, Simplice
  3. Big Data and Happiness By Rossouw, Stephanie; Greyling, Talita
  4. Learning from Data and Network Effects: The Example of Internet Search By Maximilian Schäfer; Geza Sapi
  5. Artificial Intelligence, Income Distribution and Economic Growth By Gries, Thomas; Naudé, Wim
  6. Prévision de l’activité économique au Québec et au Canada à l’aide des méthodes Machine Learning By Philippe Goulet Coulombe; Maxime Leroux; Dalibor Stevanovic; Stéphane Surprenant
  7. Machine learning techniques for strawberry yield forecasting By Li, Sheng; Wu, Feng; Guan, Zhengfei
  8. Using Google data to understand governments’ approval in Latin America By Nathalia Montoya; Sebastián Nieto-Parra; René Orozco; Juan Vázquez Zamora
  9. ‘Fruchtfolge’: A crop rotation decision support system for optimizing cropping choices with big data and spatially explicit modeling By Pahmeyer, Christoph; Kuhn, Till; Britz, Wolfgang
  10. Data: A collaborative ? By Jean-Sebastien Lacam
  11. Two-Stage Least Squares Random Forests with an Application to Angrist and Evans (1998) By Biewen, Martin; Kugler, Philipp
  12. Technologies numériques, intelligence artificielle et responsabilité By Christine Balagué
  13. Expanding the measurement of culture with a sample of two billion humans By Obradovich, Nick; Özak, Ömer; Martín, Ignacio; Ortuño-Ortín, Ignacio; Awad, Edmond; Cebrián, Manuel; Cuevas, Rubén; Desmet, Klaus; Rahwan, Iyad; Cuevas, Ángel
  14. What A Deep Learning Approach Say about Future US Soybean Yields By Xiong, Tao; Ji, Yongjie; Ficklin, Darren
  15. Deep Learning, Predictability, and Optimal Portfolio Returns By Mykola Babiak; Jozef Barunik
  16. Data-intensive Innovation and the State: Evidence from AI Firms in China By Martin Beraja; David Y. Yang; Noam Yuchtman

  1. By: Mihail Caradaica (National University of Political Studies and Public Administration)
    Abstract: Researchers and engineers of the 21st century have produced technologies that might deeply change our way of life. There is Blockchain, which could revolutionise trust between people and the financial sector; the Internet of Things, which allows machines to communicate with each other to provide better services; and Artificial Intelligence, which gives machines the ability to 'think' and empowers them to make decisions by themselves. The intersection between technological development and society (understood as economic activities, social habits, politics, political institutions, etc.) has always been a delicate issue in human history. It can generate both wealth and poverty, war and peace, or illness and health. It all depends on how we use technology and how prepared we are to accept changes and to adapt to them. Artificial Intelligence fits all of these scenarios, and it generates great concern among ordinary people. Therefore, in this paper, I will try to answer the following research question: does artificial intelligence have the potential to create more inequality in the European Union? In the first phase of this endeavour, I will analyse the state of the art in AI to see the most recent achievements in the field, its areas of implementation and the potential it could reach. Secondly, using the concept of the digital divide, I will try to identify the mechanisms through which this new technology could create more inequality. The digital divide concept focuses on the possibility that people become even more marginalized due to a lack of basic skills and an inability to afford the new technologies available on the market. My case study then focuses on the European Union, which is one of the three main global actors in the field. Because AI is still an emerging technology, I will focus on the AI strategies of the EU member states in order to highlight possible future cleavages.
    Keywords: Artificial intelligence, digital divide, European Union, inequality, machine learning
    URL: http://d.repec.org/n?u=RePEc:sek:iacpro:10612985&r=all
  2. By: Otchia, Christian; Asongu, Simplice
    Abstract: This study uses nighttime light data and machine learning techniques to predict industrial development in Africa. The results provide the first evidence on how machine learning techniques and nightlight data can be used to predict economic development in places where subnational data are missing or imprecise. Taken together, the research confirms four groups of important determinants of industrial growth: natural resources, agricultural growth, institutions, and manufacturing imports. Our findings indicate that Africa should follow a more multisector approach to development, putting natural resources and agricultural productivity growth at the forefront.
    Keywords: Industrial growth; Machine learning; Africa
    JEL: I32 O15 O40 O55
    Date: 2019–01
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:101524&r=all
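    A minimal sketch of the kind of pipeline described above, predicting local industrial growth from nightlight-derived features with a random forest; the file name, column names and learner are illustrative assumptions, since the abstract does not spell out the authors' exact setup:
```python
# Illustrative sketch of predicting local industrial growth from nightlight-derived
# features with a random forest. The file, column names and learner are assumptions
# for illustration; the abstract does not specify the authors' exact pipeline.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("nightlights_panel.csv")  # hypothetical region-year panel
features = ["mean_luminosity", "luminosity_growth", "lit_area_share"]

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["industrial_growth"], test_size=0.2, random_state=0
)
model = RandomForestRegressor(n_estimators=500, random_state=0)
model.fit(X_train, y_train)
print("Out-of-sample R^2:", r2_score(y_test, model.predict(X_test)))
```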
  3. By: Rossouw, Stephanie; Greyling, Talita
    Abstract: The pursuit of happiness. What does that mean? Perhaps a more pertinent question is: how does one know whether people have succeeded in their pursuit? Survey data have, thus far, served us well in determining where people see themselves on their journey. However, in an ever-changing world, one needs high-frequency data rather than data released with significant time lags. High-frequency data, which stem from Big Data, give policymakers access to virtually real-time information that can assist effective decision-making to increase the quality of life for all. Additionally, Big Data collected from, for example, social media platforms give researchers unprecedented insight into human behaviour and considerable predictive power for the future.
    Keywords: Happiness, Big Data, Sentiment analysis
    JEL: C88 I31 I39 J18
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:zbw:glodps:634&r=all
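    The kind of high-frequency mood index the abstract alludes to can be sketched with a simple lexicon-based sentiment scorer; VADER is used below purely as a common off-the-shelf example, not as the authors' method:
```python
# Minimal lexicon-based sentiment sketch of the kind of Big Data mood index the
# abstract alludes to. VADER is one common off-the-shelf scorer; the authors'
# actual method and data are not specified here.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-off download of the scoring lexicon
analyzer = SentimentIntensityAnalyzer()

posts = [
    "Loving the sunshine in the park today!",
    "Stuck in traffic again, this commute is miserable.",
]
# Averaging the compound score (in [-1, 1]) over a day's posts gives a crude
# high-frequency happiness index.
daily_index = sum(analyzer.polarity_scores(p)["compound"] for p in posts) / len(posts)
print("Daily sentiment index:", daily_index)
```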
  4. By: Maximilian Schäfer; Geza Sapi
    Abstract: The rise of dominant firms in data-driven industries is often credited to their alleged data advantage, yet empirical evidence supporting this conjecture is surprisingly scarce. In this paper we document that data used as an input into machine learning tasks display features that support the claim that data are a source of market power. We study how data on keywords improve search result quality on Yahoo!. Search result quality increases when more users search a keyword. In addition to this direct network effect caused by more users, we observe a novel externality caused by the amount of data the search engine collects on those users. More data on users' personal search histories reinforce the direct network effect stemming from the number of users searching the same keyword. Our findings imply that a search engine with access to longer user histories may improve the quality of its search results faster than an otherwise equally efficient rival with the same size of user base but access to shorter user histories.
    Keywords: Competition, network effects, search engines, Big Data
    JEL: L12 L41 L81 L86
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:diw:diwwpp:dp1894&r=all
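    One way to make the two effects described above concrete is a log-linear regression of result quality on the number of users searching a keyword and the length of their search histories, with an interaction term; the simulated data and functional form below are assumptions for exposition, not the authors' estimating equation:
```python
# Illustrative regression separating the direct network effect (number of users
# searching a keyword) from the history externality (length of their search
# histories). Variable names, simulated data and the log-linear form are
# assumptions for exposition, not the authors' estimating equation.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "n_users": rng.integers(1, 10_000, n),    # users searching the keyword
    "history_len": rng.integers(1, 365, n),   # days of personal search history
})
# Simulated quality: both margins matter, and longer histories amplify the user effect.
df["quality"] = (0.3 * np.log(df["n_users"]) + 0.1 * np.log(df["history_len"])
                 + 0.05 * np.log(df["n_users"]) * np.log(df["history_len"])
                 + rng.normal(0, 0.5, n))

model = smf.ols("quality ~ np.log(n_users) * np.log(history_len)", data=df).fit()
print(model.params)
```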
  5. By: Gries, Thomas (University of Paderborn); Naudé, Wim (RWTH Aachen University)
    Abstract: The economic impact of Artificial Intelligence (AI) is studied using a (semi-)endogenous growth model with two novel features. First, the task approach from labor economics is reformulated and integrated into a growth model. Second, the standard representative household assumption is relaxed, so that aggregate demand restrictions can be introduced. With these novel features it is shown that (i) AI automation can decrease the share of labor income regardless of the size of the elasticity of substitution between AI and labor, and (ii) when this elasticity is high, AI will unambiguously reduce aggregate demand and slow down GDP growth, even in the face of the positive technology shock that AI entails. If the elasticity of substitution is low, GDP, productivity and wage growth may nevertheless still slow down, because the economy will then fail to benefit from the supply-side capacity expansion potential that AI can deliver. The model can thus explain why, despite much AI hype, advanced countries tend to experience rather high employment together with stagnating wages, productivity, and GDP.
    Keywords: technology, artificial intelligence, productivity, labor demand, income distribution, growth theory
    JEL: O47 O33 J24 E21 E25
    Date: 2020–08
    URL: http://d.repec.org/n?u=RePEc:iza:izadps:dp13606&r=all
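    For readers who want the substitution mechanism in symbols, a generic textbook illustration (not the authors' exact specification) is a CES aggregate of AI capital A and labor L, whose elasticity of substitution sigma governs how automation moves the labor share:
```latex
% Illustrative CES production function and implied labor share; this is a generic
% textbook form, not the authors' model.
Y = \left[\alpha A^{\frac{\sigma-1}{\sigma}} + (1-\alpha) L^{\frac{\sigma-1}{\sigma}}\right]^{\frac{\sigma}{\sigma-1}},
\qquad
s_L = \frac{wL}{Y} = (1-\alpha)\left(\frac{L}{Y}\right)^{\frac{\sigma-1}{\sigma}}
```
    With sigma > 1, growth in Y relative to L pushes the labor share s_L down; the paper's result (i) is that the labor share can fall even outside this case.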
  6. By: Philippe Goulet Coulombe; Maxime Leroux; Dalibor Stevanovic; Stéphane Surprenant
    Abstract: In this report we apply a wide range of machine learning techniques to the problem of forecasting economic activity in Quebec and Canada. Six groups of models are considered: factor models, penalized regressions, complete subset regressions, support vector regressions, random forests and neural networks. These models offer different ways of handling large datasets and of generating highly complex functional forms. The prediction of 16 Quebec and Canadian macroeconomic variables is evaluated in an out-of-sample forecasting exercise, using large Canadian and US datasets. The results indicate that machine learning methods, combined with large datasets, have good predictive power for several real-activity variables such as GDP, gross fixed capital formation and industrial production. Random forests are particularly resilient, followed by neural networks. Forecasts of labor market variables are improved by penalized regressions, either simple or over complete subsets. Inflation rates can be predicted with random forests and penalized regressions. For housing starts and the USD/CAD exchange rate, machine learning methods do not improve point forecasts, but they show interesting results in predicting the future direction of these variables.
    Keywords: Forecasting, Macroeconomics, Big Data, Machine Learning
    Date: 2020–08–27
    URL: http://d.repec.org/n?u=RePEc:cir:cirpro:2020rp-18&r=all
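    The pseudo-out-of-sample exercise described above can be sketched with an expanding-window loop; the data file, target variable and one-step horizon below are assumptions, not the authors' dataset:
```python
# Illustrative expanding-window (pseudo-out-of-sample) forecasting loop with a
# random forest, in the spirit of the exercise above. The data file, target
# variable and one-step horizon are assumptions, not the authors' dataset.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("macro_panel.csv", index_col="date", parse_dates=True).dropna()
target, horizon = "gdp_growth", 1
y = df[target].shift(-horizon)   # value to be predicted, one period ahead
X = df                           # large set of contemporaneous predictors

squared_errors = []
for t in range(120, len(df) - horizon):      # expanding training window
    model = RandomForestRegressor(n_estimators=300, random_state=0)
    model.fit(X.iloc[:t], y.iloc[:t])
    pred = model.predict(X.iloc[[t]])[0]
    squared_errors.append((y.iloc[t] - pred) ** 2)

print("Out-of-sample RMSE:", (sum(squared_errors) / len(squared_errors)) ** 0.5)
```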
  7. By: Li, Sheng; Wu, Feng; Guan, Zhengfei
    Keywords: Agribusiness, Research Methods/Statistical Methods, Risk and Uncertainty
    Date: 2020–07
    URL: http://d.repec.org/n?u=RePEc:ags:aaea20:304502&r=all
  8. By: Nathalia Montoya; Sebastián Nieto-Parra; René Orozco; Juan Vázquez Zamora
    Abstract: This paper studies the potential drivers of governments’ approval rates in 18 Latin American countries using Internet search query data from Google Trends and traditional data sources. It employs monthly panel data between January 2006 and December 2015. The analysis tests several specifications including traditional explanatory variables of governments’ approval rates (inflation, the unemployment rate, GDP growth, the output gap) and subjective explanatory variables (e.g. perceptions of corruption and insecurity). For the latter, it uses Internet search query data to proxy citizens’ main social concerns, which are expected to drive governments’ approval rates. The results show that perceptions of corruption and insecurity, and complaints about public services, have a statistically significant association with governments’ approval rates. This paper also discusses the potential of Internet search query data as a tool for policy makers to better understand citizens’ perceptions, since it provides largely anonymous, high-frequency series in real time.
    Keywords: big data, citizens’ perceptions, governments’ approval, Latin America, social contract
    JEL: D72 H11 O3
    Date: 2020–09–28
    URL: http://d.repec.org/n?u=RePEc:oec:devaaa:343-en&r=all
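    The panel specification described above can be sketched as a two-way fixed-effects regression; the data file and column names below are assumptions, not the authors' exact specification:
```python
# Illustrative two-way fixed-effects panel regression in the spirit of the paper
# above: approval regressed on macro conditions and Google-Trends-based perception
# indices. The data file and column names are assumptions, not the authors' data.
import pandas as pd
from linearmodels.panel import PanelOLS

df = pd.read_csv("approval_panel.csv")       # hypothetical monthly country panel
df["month"] = pd.to_datetime(df["month"])
df = df.set_index(["country", "month"])      # entity-time index required by PanelOLS

exog = df[["inflation", "unemployment", "gdp_growth",
           "trends_corruption", "trends_insecurity"]]   # last two built from Google Trends
model = PanelOLS(df["approval"], exog, entity_effects=True, time_effects=True)
res = model.fit(cov_type="clustered", cluster_entity=True)
print(res.summary)
```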
  9. By: Pahmeyer, Christoph; Kuhn, Till; Britz, Wolfgang
    Abstract: Deciding which crop to plant on a field and how to fertilize it has become increasingly complex, as volatile markets, location factors and policy restrictions need to be considered simultaneously. To assist farmers in this process, we develop the web-based, open-source decision support system ‘Fruchtfolge’ (German for ‘crop rotation’). It provides decision makers with a crop and management recommendation for each field based on the solution of a single-farm optimization model. The optimization model accounts for field-specific location factors, labor endowments, field-to-farm distances and policy restrictions such as measures linked to the EU Nitrates Directive and the Greening of the EU Common Agricultural Policy. ‘Fruchtfolge’ is user-friendly, automatically incorporating big data on farm, location and management characteristics and providing instant feedback on alternative management choices. As a result, creating a first optimal cropping plan generally requires less than five minutes. We apply the decision support system to a German case study farm that manages fields outside and inside a nitrate-sensitive area. In 2021, revised fertilization regulations come into force in Germany which, among other things, lower the maximum allowed nitrogen applications relative to crop nutrient needs in nitrate-sensitive areas. The regulations cause profit losses of up to 15% for the formerly optimal crop rotation. The optimal adaptation strategy proposed by ‘Fruchtfolge’ reduces this loss to 10%. The reduction in profit loss clearly underlines the benefits of our support tool for making optimal cropping decisions in a complex environment. Future research should identify the barriers farmers face in adopting decision support systems and, as they become available, integrate more detailed crop- and field-specific sensor data.
    Keywords: Agribusiness, Crop Production/Industries, Farm Management, Land Economics/Use, Production Economics, Productivity Analysis, Research and Development/Tech Change/Emerging Technologies, Research Methods/Statistical Methods
    Date: 2020–09–18
    URL: http://d.repec.org/n?u=RePEc:ags:ubfred:305287&r=all
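    The core of such a decision support system is a farm-level optimization model; the toy linear program below allocates area across three crops under a nitrogen cap, with invented numbers, whereas the actual ‘Fruchtfolge’ model handles far richer constraints:
```python
# Toy single-farm crop allocation LP in the spirit of 'Fruchtfolge'. Gross margins,
# nitrogen needs and the farm-level nitrogen cap are invented numbers; the actual
# decision support system handles far richer field-level constraints.
from scipy.optimize import linprog

crops = ["wheat", "maize", "rapeseed"]
margin = [800, 950, 700]       # EUR gross margin per hectare
nitrogen = [180, 200, 150]     # kg N applied per hectare
total_area = 100               # hectares available
nitrogen_cap = 17000           # kg N allowed on the whole farm

# linprog minimizes, so negate the margins to maximize profit.
res = linprog(
    c=[-m for m in margin],
    A_ub=[[1, 1, 1], nitrogen],
    b_ub=[total_area, nitrogen_cap],
    bounds=[(0, total_area)] * 3,
    method="highs",
)
for crop, hectares in zip(crops, res.x):
    print(f"{crop}: {hectares:.1f} ha")
print("Maximized gross margin (EUR):", round(-res.fun))
```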
  10. By: Jean-Sebastien Lacam (ESSCA Research Lab - ESSCA - Groupe ESSCA, CleRMa - Clermont Recherche Management - Clermont Auvergne - École Supérieure de Commerce (ESC) - Clermont-Ferrand - UCA - Université Clermont Auvergne)
    Abstract: This study examines the interdependence of the relational strategies and data management policies of SMEs during product innovation. The type of data management a small firm develops to support its innovation efforts requires it to engage in competitive, vertically cooperative or coopetitive relationships. An empirical study of 109 leaders of French high-tech SMEs provides a descriptive and explanatory analysis of this question. The study combines three theoretical dimensions: the characteristics of a big data policy, of a product innovation and of a relational strategy. We enrich existing knowledge on the exploitation of data by SMEs by presenting a typology of their data strategies. We also find that big data and smart data policies are deployed by SMEs to support product innovation. Finally, we show that SMEs implement data management individually to support radical product innovation but collaborate to support incremental product innovation. The nature of the data innovation guides the relational context of the SME. The study thus deepens our understanding of the interdependence between data management and relational strategies among SMEs.
    Keywords: Data management, product innovation, competition, vertical cooperation, coopetition, SMEs, Big data challenges
    Date: 2020–05
    URL: http://d.repec.org/n?u=RePEc:hal:journl:hal-02930902&r=all
  11. By: Biewen, Martin (University of Tuebingen); Kugler, Philipp (Institut für Angewandte Wirtschaftsforschung (IAW))
    Abstract: We develop two-stage least squares (2SLS) estimation in the general framework of Athey et al. (Generalized Random Forests, Annals of Statistics, Vol. 47, 2019) and provide a software implementation for R and C++. We use the method to revisit the classic instrumental variables application of Angrist and Evans (Children and Their Parents' Labor Supply: Evidence from Exogenous Variation in Family Size, American Economic Review, Vol. 88, 1998). The two-stage least squares random forest allows one to investigate local heterogeneous effects that cannot be investigated using ordinary 2SLS.
    Keywords: machine learning, generalized random forests, fertility, instrumental variable estimation
    JEL: C26 C55 J22 J13 C14
    Date: 2020–08
    URL: http://d.repec.org/n?u=RePEc:iza:izadps:dp13613&r=all
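    As a point of reference for the method above, a plain (non-forest) 2SLS of the Angrist-Evans type can be sketched as follows; the data file and column names are placeholders, and the forest-based heterogeneity analysis itself requires the authors' R/C++ implementation or a generalized random forest package:
```python
# Baseline (non-forest) 2SLS sketch of the Angrist-Evans design: mothers' labor
# supply regressed on having more than two children, instrumented by whether the
# first two children share the same sex. The data file and column names are
# placeholders, not the authors' data.
import pandas as pd
from linearmodels.iv import IV2SLS

df = pd.read_csv("census_mothers.csv")   # hypothetical census extract
df["const"] = 1

model = IV2SLS(
    dependent=df["worked_for_pay"],
    exog=df[["const", "age", "age_at_first_birth"]],
    endog=df["more_than_two_kids"],
    instruments=df["same_sex_first_two"],
)
print(model.fit(cov_type="robust").summary)
```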
  12. By: Christine Balagué (CONNECT - Consommateur Connecté dans la Société Numérique - DEFI - Département Droit, Economie et Finances - TEM - Télécom Ecole de Management - IMT - Institut Mines-Télécom [Paris] - IMT-BS - Institut Mines-Télécom Business School - IMT-BS - Institut Mines-Télécom Business School - MMS - Département Management, Marketing et Stratégie - TEM - Télécom Ecole de Management - IMT - Institut Mines-Télécom [Paris] - IMT-BS - Institut Mines-Télécom Business School, LITEM - Laboratoire en Innovation, Technologies, Economie et Management (EA 7363) - UEVE - Université d'Évry-Val-d'Essonne - IMT-BS - Institut Mines-Télécom Business School, MMS - Département Management, Marketing et Stratégie - TEM - Télécom Ecole de Management - IMT - Institut Mines-Télécom [Paris] - IMT-BS - Institut Mines-Télécom Business School)
    Abstract: Technological innovations in the health sector are enabling considerable progress, but the development of connected devices, robots and algorithms relying on ever more complex machine learning methods is a source of major societal impacts. What, then, are the ethical stakes of medicine built on artificial intelligence technologies? How do we keep the human being at the heart of e-health? How can responsible and ethical technologies be designed? What are the current schools of thought, and what actions are being taken, to envision a responsible medicine of the future?
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:hal:journl:hal-02907065&r=all
  13. By: Obradovich, Nick (Max Planck Institute for Human Development); Özak, Ömer (Southern Methodist University); Martín, Ignacio; Ortuño-Ortín, Ignacio; Awad, Edmond; Cebrián, Manuel; Cuevas, Rubén; Desmet, Klaus; Rahwan, Iyad; Cuevas, Ángel
    Abstract: Culture has played a pivotal role in human evolution. Yet, the ability of social scientists to study culture is limited by currently available measurement instruments. Scholars of culture must regularly choose between scalable but sparse survey-based methods or restricted but rich ethnographic methods. Here, we demonstrate that massive online social networks can advance the study of human culture by providing quantitative, scalable, and high-resolution measurement of behaviorally revealed cultural values and preferences. We employ publicly available data across nearly 60,000 topic dimensions drawn from two billion Facebook users across 225 countries and territories. The data capture preferences inferred by Facebook from online behaviors on the platform, behaviors on external websites and apps, and offline behaviors captured by smartphones and other devices. We first validate that cultural distances calculated from this measurement instrument correspond to survey-based and objective measures of cultural differences. We then demonstrate that this measure enables insight into the global cultural landscape at previously impossible resolution. We analyze the importance of national borders in shaping culture and explore unique cultural markers that identify subnational population groups. The global collection of massive data on human behavior provides a high-dimensional complement to traditional cultural metrics, potentially enabling novel insight into fundamental questions in the social sciences. The measure enables detailed investigation into countries’ geopolitical stability, social cleavages within both small- and large-scale human groups, the integration of migrant populations, and the disaffection of certain population groups from the political process, among myriad other potential future applications.
    Date: 2020–09–09
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:qkf42&r=all
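    The cultural-distance measure can be illustrated in miniature: represent each country by a vector of interest shares across topics and take one minus the cosine similarity; the three-topic vectors below are invented, whereas the paper uses nearly 60,000 dimensions:
```python
# Tiny sketch of the cultural-distance idea: each country is summarized by a vector
# of interest shares across topics, and the distance between two countries is one
# minus the cosine similarity of their vectors. The three-topic vectors below are
# invented; the paper uses nearly 60,000 topic dimensions.
import numpy as np
from scipy.spatial.distance import cosine

interest_shares = {
    "A": np.array([0.50, 0.30, 0.20]),   # shares across topics, summing to one
    "B": np.array([0.45, 0.35, 0.20]),
    "C": np.array([0.10, 0.20, 0.70]),
}

for c1 in interest_shares:
    for c2 in interest_shares:
        if c1 < c2:
            d = cosine(interest_shares[c1], interest_shares[c2])
            print(f"cultural distance({c1}, {c2}) = {d:.3f}")
```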
  14. By: Xiong, Tao; Ji, Yongjie; Ficklin, Darren
    Keywords: Production Economics, Research Methods/Statistical Methods, Productivity Analysis
    Date: 2020–07
    URL: http://d.repec.org/n?u=RePEc:ags:aaea20:304452&r=all
  15. By: Mykola Babiak; Jozef Barunik
    Abstract: We study the optimal dynamic portfolio choice of a long-horizon investor who uses deep learning methods to predict equity returns when forming optimal portfolios. The results show statistically and economically significant out-of-sample portfolio benefits of deep learning, as measured by high certainty equivalent returns and Sharpe ratios. Return predictability via deep learning generates substantially improved portfolio performance across different subsamples, particularly during recession periods. These gains are robust to the inclusion of transaction costs and of short-selling and borrowing constraints.
    Date: 2020–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2009.03394&r=all
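    The two evaluation metrics named in the abstract can be computed from any realized out-of-sample return series, as in the sketch below; the simulated returns, monthly frequency and CRRA risk aversion of 5 are assumptions for illustration:
```python
# Sketch of the two evaluation metrics named above: annualized Sharpe ratio and the
# certainty equivalent return under CRRA utility. The simulated return series,
# monthly frequency and risk aversion of 5 are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(1)
monthly_returns = rng.normal(0.008, 0.04, 240)   # placeholder for realized portfolio returns
rf = 0.001                                        # monthly risk-free rate
gamma = 5                                         # CRRA relative risk aversion

excess = monthly_returns - rf
sharpe = np.sqrt(12) * excess.mean() / excess.std(ddof=1)

# Certainty equivalent: the sure monthly return delivering the same average CRRA utility.
avg_utility = np.mean((1 + monthly_returns) ** (1 - gamma)) / (1 - gamma)
cer = (avg_utility * (1 - gamma)) ** (1 / (1 - gamma)) - 1

print(f"Annualized Sharpe ratio: {sharpe:.2f}")
print(f"Monthly certainty equivalent return: {cer:.4%}")
```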
  16. By: Martin Beraja; David Y. Yang; Noam Yuchtman
    Abstract: Data-intensive technologies, like AI, are increasingly widespread. We argue that the direction of innovation and growth in data-intensive economies may be crucially shaped by the state because: (i) the state is a key collector of data and (ii) data is sharable across uses within firms, potentially generating economies of scope. We study a prototypical setting: facial recognition AI in China. Collecting comprehensive data on firms and government procurement contracts, we find evidence of economies of scope arising from government data: firms awarded contracts providing access to more government data produce both more government and commercial software. We then build a directed technical change model to study the implications of government data access for the direction of innovation, growth, and welfare. We conclude with three applications showing how data-intensive innovation may be shaped by the state: both directly, by setting industrial policy; and indirectly, by choosing surveillance levels and privacy regulations.
    JEL: E0 H4 L5 L63 O25 O30 O40 P00 P16 Z21
    Date: 2020–08
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:27723&r=all

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.