nep-big New Economics Papers
on Big Data
Issue of 2023‒05‒08
sixteen papers chosen by
Tom Coupé
University of Canterbury

  1. Big Data, Algorithms, AI, Ethics, and the Economy: An Aristotelian Perspective By Ricardo F. Crespo
  2. The political economy of AI: Towards democratic control of the means of prediction By Kasy, Maximilian
  3. Oracle Counterpoint: Relationships between On-chain and Off-chain Market Data By Zhimeng Yang; Ariah Klages-Mundt; Lewis Gudgeon
  4. The Cost of Influence: How Gifts to Physicians Shape Prescriptions and Drug Costs By Melissa Newham; Marica Valente
  5. Web-scraping housing prices in real-time: The Covid-19 crisis in the UK By Jean-Charles Bricongne; Baptiste Meunier; Sylvain Pouget
  6. OFTER: An Online Pipeline for Time Series Forecasting By Nikolas Michael; Mihai Cucuringu; Sam Howison
  7. Greenhouse gases emissions: estimating corporate non-reported emissions using interpretable machine learning By Jeremi Assael; Thibaut Heurtebize; Laurent Carlier; François Soupé
  8. Dissecting the explanatory power of ESG features on equity returns by sector, capitalization, and year with interpretable machine learning By Jérémi Assael; Laurent Carlier; Damien Challet
  9. Peer Prediction for Peer Review: Designing a Marketplace for Ideas By Alexander Ugarov
  10. Narrative-Driven Fluctuations in Sentiment: Evidence Linking Traditional and Social Media By Alistair Macaulay; Wenting Song
  11. Regulatory Markets: The Future of AI Governance By Gillian K. Hadfield; Jack Clark
  12. The Economic Effect of Gaining a New Qualification Later in Life By Finn Lattimore; Daniel M. Steinberg; Anna Zhu
  13. Finding Anomalies in China By Hou, Kewei; Qiao, Fang; Zhang, Xiaoyan
  14. Reinforcement learning for optimization of energy trading strategy By Łukasz Lepak; Paweł Wawrzyński
  15. Data, Competition, and Digital Platforms By Dirk Bergemann; Alessandro Bonatti
  16. Mastering Pair Trading with Risk-Aware Recurrent Reinforcement Learning By Weiguang Han; Jimin Huang; Qianqian Xie; Boyi Zhang; Yanzhao Lai; Min Peng

  1. By: Ricardo F. Crespo (Universidad Austral)
    Abstract: While a growing body of literature points to the advantages of using algorithms in big data processing, as well as applying them to artificial intelligence (AI), in order to achieve a desired output, it also warns about the pitfalls and perils of algorithmic decision-making. Algorithms and AI are the machines and big data is the new oil. Criticisms come from different fields: legal, social, political, medical, and economic. They argue that algorithms have the power to predict our wishes and behavior and, subsequently, to manage our life: they decide the music we listen to, the news we read, the information we obtain, the content we see online, the movies we watch, the health care we receive, the products we buy, and so on.
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:aoz:wpaper:232&r=big
  2. By: Kasy, Maximilian
    Abstract: This chapter discusses the regulation of artificial intelligence (AI) from the vantage point of political economy. By "political economy" I mean a perspective which emphasizes that there are different people and actors in society who have divergent interests and unequal access to resources and power. By "artificial intelligence" I mean the construction of autonomous systems that maximize some notion of reward. The construction of such systems typically draws on the tools of machine learning and optimization. AI and machine learning are used in an ever wider array of socially consequential settings. This includes labor markets, education, criminal justice, health, banking, housing, as well as the curation of information by search engines, social networks, and recommender systems. There is a need for public debates about desirable directions of technical innovation, the use of technologies, and constraints to be imposed on technologies. In this chapter, I review some frameworks to help structure such debates. The discussion in this chapter is opinionated and based on the following premises: AI concerns the construction of systems which maximize a measurable objective (reward). Such systems take data as an input, and produce chosen actions as an output. Maximization of a singular objective by autonomous systems is taking place in a social world where different individuals have divergent objectives. These divergent objectives might stand in conflict. Evaluated in terms of these divergent objectives, the actions and policies chosen by AI systems (almost) always generate winners and losers. Going from individual-level assessments of gains and losses to society-level assessments requires aggregation, which trades off gains and losses across individuals. 
In order to normatively evaluate AI, as well as proposed regulations, we need to explicitly assess the resulting individual gains and losses, and explicitly aggregate these gains and losses across individuals. The social issues raised by AI, including questions of fairness, privacy, value alignment, accountability, and automation, can only be resolved through democratic control of algorithm objectives, and of the means to obtain them - data and computational infrastructure. Democratic control requires public debate and binding collective decision-making, at many different levels of society. My discussion draws on concepts and references from machine learning, economics, and social choice theory. I touch on several debates regarding the ethics and social impact of artificial intelligence, without any pretension of doing justice to the vast and growing literature on these topics; instead my goal is to give an internally coherent and principled account.
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:amz:wpaper:2023-06&r=big
  3. By: Zhimeng Yang; Ariah Klages-Mundt; Lewis Gudgeon
    Abstract: We investigate the theoretical and empirical relationships between activity in on-chain markets and pricing in off-chain cryptocurrency markets (e.g., ETH/USD prices). The motivation is to develop methods for proxying off-chain market data using data and computation that is in principle verifiable on-chain and could provide an alternative approach to blockchain price oracles. We explore relationships in PoW mining, PoS validation, block space markets, network decentralization, usage and monetary velocity, and on-chain liquidity pools and AMMs. We select key features from these markets, which we analyze through graphical models, mutual information, and ensemble machine learning models to explore the degree to which off-chain pricing information can be recovered entirely on-chain. We find that a large amount of pricing information is contained in on-chain data, but that it is generally hard to recover precise prices except on short time scales of retraining the model. We discuss how even a noisy trustless data source such as this can be helpful toward minimizing trust requirements of oracle designs.
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2303.16331&r=big
  4. By: Melissa Newham; Marica Valente
    Abstract: This paper studies how gifts – monetary or in-kind payments – from drug firms to physicians in the US affect prescriptions and drug costs. We estimate heterogeneous treatment effects by combining physician-level data on antidiabetic prescriptions and payments with causal inference and machine learning methods. We find that payments cause physicians to prescribe more brand drugs, resulting in a cost increase of $30 per dollar received. Responses differ widely across physicians, and are primarily explained by variation in patients’ out-of-pocket costs. A gift ban is estimated to decrease drug costs by 3-4%. Taken together, these novel findings reveal how payments shape prescription choices and drive up costs.
    Keywords: public health, payments to physicians, gift ban, heterogeneous treatment effects, causal machine learning
    JEL: I11 I18 M31
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:inn:wpaper:2023-03&r=big
  5. By: Jean-Charles Bricongne (Centre de recherche de la Banque de France - Banque de France, LEO - Laboratoire d'Économie d'Orleans [2022-...] - UO - Université d'Orléans - UT - Université de Tours - UCA - Université Clermont Auvergne, LIEPP - Laboratoire interdisciplinaire d'évaluation des politiques publiques (Sciences Po) - Sciences Po - Sciences Po); Baptiste Meunier (Centre de recherche de la Banque Centrale européenne - Banque Centrale Européenne, AMSE - Aix-Marseille Sciences Economiques - EHESS - École des hautes études en sciences sociales - AMU - Aix Marseille Université - ECM - École Centrale de Marseille - CNRS - Centre National de la Recherche Scientifique); Sylvain Pouget (Grenoble INP ENSIMAG - École nationale supérieure d'informatique et de mathématiques appliquées - UGA - Université Grenoble Alpes - Grenoble INP - Institut polytechnique de Grenoble - Grenoble Institute of Technology - UGA - Université Grenoble Alpes)
    Abstract: While official statistics provide lagged and aggregate information on the housing market, extensive information is available publicly on real-estate websites. By web-scraping them for the UK on a daily basis, this paper extracts a large database from which we build timely and highly granular indicators. One original feature of the dataset is its focus on the supply side of the housing market, which allows us to compute innovative indicators reflecting the sellers' perspective, such as the number of new listings posted or how prices fluctuate over time for existing listings. Matching listing prices in our dataset with transacted prices from the notarial database, using machine learning, also measures the negotiation margin of buyers. During the Covid-19 crisis, these indicators demonstrate the freezing of the market and the "wait-and-see" behaviour of sellers. They also show that listing prices after the lockdown experienced a continued decline in London but increased in other regions.
    Keywords: Housing, Real time, Big data, Web-scraping, High frequency, United Kingdom
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:hal:journl:hal-04064185&r=big
  6. By: Nikolas Michael; Mihai Cucuringu; Sam Howison
    Abstract: We introduce OFTER, a time series forecasting pipeline tailored for mid-sized multivariate time series. OFTER utilizes the non-parametric models of k-nearest neighbors and Generalized Regression Neural Networks, integrated with a dimensionality reduction component. To circumvent the curse of dimensionality, we employ a weighted norm based on a modified version of the maximal correlation coefficient. The pipeline we introduce is specifically designed for online tasks, has an interpretable output, and is able to outperform several state-of-the-art baselines. The computational efficacy of the algorithm, its online nature, and its ability to operate in low signal-to-noise regimes render OFTER an ideal approach for financial multivariate time series problems, such as daily equity forecasting. Our work demonstrates that while deep learning models hold significant promise for time series forecasting, traditional methods carefully integrating mainstream tools remain very competitive alternatives with the added benefits of scalability and interpretability.
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2304.03877&r=big
  7. By: Jeremi Assael (BNPP CIB GM Lab - BNP Paribas CIB Global Markets Data & AI Lab, MICS - Mathématiques et Informatique pour la Complexité et les Systèmes - CentraleSupélec - Université Paris-Saclay); Thibaut Heurtebize (BNP Paribas Asset Management, Quantitative Research Group, Research Lab); Laurent Carlier (BNPP CIB GM Lab - BNP Paribas CIB Global Markets Data & AI Lab); François Soupé (BNP Paribas Asset Management, Quantitative Research Group, Research Lab)
    Abstract: As of 2022, greenhouse gas (GHG) emissions reporting and auditing are not yet compulsory for all companies, and methodologies of measurement and estimation are not unified. We propose a machine learning-based model to estimate scope 1 and scope 2 GHG emissions of companies not reporting them yet. Our model, designed to be transparent and completely adapted to this use case, is able to estimate emissions for a large universe of companies. It shows good out-of-sample global performance as well as good out-of-sample granular performance when evaluated by sectors, countries, or revenue buckets. We also compare the model results to those of other providers and find our estimates to be more accurate. Explainability tools based on Shapley values allow the constructed model to be fully interpretable, so that the user can understand which factors explain the GHG emissions of each particular company.
    Keywords: sustainability, disclosure, greenhouse gas emissions, machine learning, interpretability, carbon emissions, scope 1, scope 2, interpretable machine learning
    Date: 2023–02–13
    URL: http://d.repec.org/n?u=RePEc:hal:journl:hal-03905325&r=big
  8. By: Jérémi Assael (BNPP CIB GM Lab - BNP Paribas CIB Global Markets Data & AI Lab, MICS - Mathématiques et Informatique pour la Complexité et les Systèmes - CentraleSupélec - Université Paris-Saclay); Laurent Carlier (BNPP CIB GM Lab - BNP Paribas CIB Global Markets Data & AI Lab); Damien Challet (MICS - Mathématiques et Informatique pour la Complexité et les Systèmes - CentraleSupélec - Université Paris-Saclay)
    Abstract: We systematically investigate the links between price returns and Environment, Social and Governance (ESG) features in the European market. We propose a cross-validation scheme with random company-wise validation to mitigate the relative initial lack of quantity and quality of ESG data, which allows us to use most of the latest and best data to both train and validate our models. Boosted trees successfully explain a part of annual price returns not accounted for by the market factor. We check with benchmark features that ESG features do contain significantly more information than basic fundamental features alone. The most relevant sub-ESG feature encodes controversies. Finally, we find opposite effects of better ESG scores on the price returns of small and large capitalization companies: better ESG scores are generally associated with larger price returns for the latter, and the reverse holds for the former.
    Keywords: ESG features, sustainable investing, interpretable machine learning, model selection, asset management, equity returns, ESG data
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:hal:journl:hal-03791538&r=big
  9. By: Alexander Ugarov
    Abstract: The paper describes a potential platform to facilitate academic peer review with emphasis on early-stage research. This platform aims to make peer review more accurate and timely by rewarding reviewers on the basis of peer prediction algorithms. The algorithm uses a variation of Peer Truth Serum for Crowdsourcing (Radanovic et al., 2016) with human raters competing against a machine learning benchmark. We explain how our approach addresses two large productive inefficiencies in science: mismatch between research questions and publication bias. Better peer review for early research creates additional incentives for sharing it, which simplifies matching ideas to teams and makes negative results and p-hacking more visible.
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2303.16855&r=big
  10. By: Alistair Macaulay; Wenting Song
    Abstract: This paper studies the role of narratives for macroeconomic fluctuations. We micro-found narratives as directed acyclic graphs and show how exposure to different narratives can affect expectations in an otherwise standard macroeconomic model. We capture such competing narratives in news media’s reports on a US yield curve inversion by using techniques in natural language processing. Linking these media narratives to social media data, we show that exposure to a recessionary narrative is associated with a more pessimistic sentiment, while exposure to a nonrecessionary narrative implies no such change in sentiment. In a model with financial frictions, narrative-driven beliefs create a trade-off for quantitative easing: extended periods of quantitative easing make narrative-driven waves of pessimism more frequent, but smaller in magnitude.
    Keywords: Financial markets; Inflation and prices; Monetary policy
    JEL: D84 E32 E43 E44 E5 G1
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:bca:bocawp:23-23&r=big
  11. By: Gillian K. Hadfield; Jack Clark
    Abstract: Appropriately regulating artificial intelligence is an increasingly urgent policy challenge. Legislatures and regulators lack the specialized knowledge required to best translate public demands into legal requirements. Overreliance on industry self-regulation fails to hold producers and users of AI systems accountable to democratic demands. Regulatory markets, in which governments require the targets of regulation to purchase regulatory services from a private regulator, are proposed. This approach to AI regulation could overcome the limitations of both command-and-control regulation and self-regulation. Regulatory markets could enable governments to establish policy priorities for the regulation of AI, whilst relying on market forces and industry R&D efforts to pioneer the methods of regulation that best achieve policymakers' stated objectives.
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2304.04914&r=big
  12. By: Finn Lattimore; Daniel M. Steinberg; Anna Zhu
    Abstract: Pursuing educational qualifications later in life is an increasingly common phenomenon within OECD countries since technological change and automation continue to drive the evolution of skills needed in many professions. We focus on the causal impacts on economic returns of degrees completed later in life, where motivations and capabilities to acquire additional education may be distinct from education in early years. We find that completing an additional degree leads to a gain of more than $3000 (AUD, 2019) per year compared to those who do not complete additional study. For outcomes, treatment and controls we use the extremely rich and nationally representative longitudinal data from the Household Income and Labour Dynamics Australia survey. To take full advantage of the complexity and richness of this data we use a Machine Learning (ML) based methodology to estimate the causal effect. We are also able to use ML to discover sources of heterogeneity in the effects of gaining additional qualifications; for example, those younger than 45 years of age when obtaining additional qualifications tend to reap more benefits (as much as $50 per week more) than others.
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2304.01490&r=big
  13. By: Hou, Kewei (Ohio State U); Qiao, Fang (U of International Business and Economics, Beijing); Zhang, Xiaoyan (Tsinghua U)
    Abstract: To study the cross-section of returns in the Chinese stock market, we follow the anomaly literature and construct 454 strategies between 2000 and 2020, based on 208 firm-level trading and accounting signals. With the conventional single-testing t-statistic cutoff of 1.96, 101 strategies have significant value-weighted raw returns, and 20 remain significant after risk adjustments. To avoid false discoveries, we recalibrate the t-statistic cutoff to 2.85 to accommodate multiple testing. 36 strategies survive the higher hurdle rate in value-weighted raw returns, while none remains significant after risk adjustments. When we use machine learning techniques to combine information from multiple signals, the resulting composite strategies mostly have significant returns after risk adjustments, even with the higher t-statistic cutoff. We relate Chinese anomaly returns to aggregate economic conditions and find that they comove with financial market development, accounting quality, market liquidity, and government regulations.
    JEL: G1 G12
    Date: 2023–01
    URL: http://d.repec.org/n?u=RePEc:ecl:ohidic:2023-02&r=big
  14. By: Łukasz Lepak; Paweł Wawrzyński
    Abstract: An increasing part of energy is produced from renewable sources by a large number of small producers. The efficiency of these sources is volatile and, to some extent, random, exacerbating the energy market balance problem. In many countries, that balancing is performed on day-ahead (DA) energy markets. In this paper, we consider automated trading on a DA energy market by a medium-sized prosumer. We model this activity as a Markov Decision Process and formalize a framework in which a ready-to-use strategy can be optimized with real-life data. We synthesize parametric trading strategies and optimize them with an evolutionary algorithm. We also use state-of-the-art reinforcement learning algorithms to optimize a black-box trading strategy fed with available information from the environment that can impact future prices.
    Date: 2023–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2303.16266&r=big
  15. By: Dirk Bergemann (Cowles Foundation, Yale University); Alessandro Bonatti
    Abstract: We analyze digital markets where a monopolist platform uses data to match multiproduct sellers with heterogeneous consumers who can purchase both on and off the platform. The platform sells targeted ads to sellers that recommend their products to consumers and reveals information to consumers about their values. The revenue-optimal mechanism is a managed advertising campaign that matches products and preferences efficiently. In equilibrium, sellers offer higher qualities at lower unit prices on than off the platform. Privacy-respecting data-governance rules such as organic search results or federated learning can lead to welfare gains for consumers.
    Keywords: Data, Privacy, Data Governance, Digital Advertising, Competition, Digital Platforms, Digital Intermediaries, Personal Data, Matching, Price Discrimination, Automated Bidding, Algorithmic Bidding, Managed Advertising Campaigns, Showrooming
    JEL: D18 D44 D82 D83
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:cwl:cwldpp:2343r&r=big
  16. By: Weiguang Han; Jimin Huang; Qianqian Xie; Boyi Zhang; Yanzhao Lai; Min Peng
    Abstract: Although pair trading is the simplest hedging strategy for an investor to eliminate market risk, it is still a great challenge for reinforcement learning (RL) methods to perform pair trading at the level of a human expert. It requires RL methods to take thousands of correct actions that nevertheless have no obvious relation to the overall trading profit, and to reason over infinite states of the time-varying market, most of which have never appeared in history. However, existing RL methods ignore the temporal connections between asset price movements and the risk of the trades performed. This leads to frequent trading with high transaction costs and potential losses, which barely reaches the human expert level of trading. Therefore, we introduce CREDIT, a risk-aware agent capable of learning to exploit long-term trading opportunities in pair trading similar to a human expert. CREDIT is the first to apply a bidirectional GRU along with a temporal attention mechanism to fully consider the temporal correlations embedded in the states, which allows CREDIT to capture long-term patterns in the price movements of two assets to earn higher profit. We also design a risk-aware reward, inspired by economic theory, that models both the profit and the risk of the trades during the trading period. It helps our agent to master pair trading with a robust trading preference that avoids risky trades with possible high returns and losses. Experiments show that it outperforms existing reinforcement learning methods in pair trading and achieves a significant profit over five years of U.S. stock data.
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2304.00364&r=big

This nep-big issue is ©2023 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.