nep-big New Economics Papers
on Big Data
Issue of 2024‒09‒16
sixteen papers chosen by
Tom Coupé, University of Canterbury


  1. Deep Learning for Economists By Melissa Dell
  2. Predicting the distributions of stock returns around the globe in the era of big data and learning By Jozef Barunik; Martin Hronec; Ondrej Tobek
  3. Directional Stock Price Forecasting Based on Quantitative Value Investing Principles for Loss Averted Bogle-Head Investing using Various Machine Learning Algorithms By Moitra, Agnij
  4. EUR-USD Exchange Rate Forecasting Based on Information Fusion with Large Language Models and Deep Learning Methods By Hongcheng Ding; Xuanze Zhao; Zixiao Jiang; Shamsul Nahar Abdullah; Deshinta Arrova Dewi
  5. Neural Network Learning for Nonlinear Economies By Julian Ashwin; Paul Beaudry; Martin Ellison
  6. Why Groups Matter: Necessity of Group Structures in Attributions By Dangxing Chen; Jingfeng Chen; Weicheng Ye
  7. Predicting full retirement attainment of NBA players By Foutzopoulos, Giorgos; Pandis, Nikolaos; Tsagris, Michail
  8. Optimizing Portfolio with Two-Sided Transactions and Lending: A Reinforcement Learning Framework By Ali Habibnia; Mahdi Soltanzadeh
  9. Spooky Boundaries at a Distance: Inductive Bias, Dynamic Models, and Behavioral Macro By Mahdi E. Kahou; Jesús Fernández-Villaverde; Sebastian Gomez-Cardona; Jesse Perla; Jan Rosa
  10. Machine Learning and the Yield Curve: Tree-Based Macroeconomic Regime Switching By Siyu Bie; Francis X. Diebold; Jingyu He; Junye Li
  11. Enhancement of price trend trading strategies via image-induced importance weights By Zhoufan Zhu; Ke Zhu
  12. Enhancing Deep Hedging of Options with Implied Volatility Surface Feedback Information By Pascal François; Geneviève Gauthier; Frédéric Godin; Carlos Octavio Pérez Mendoza
  13. Evaluating the Role of Information Disclosure on Bidding Behavior in Wholesale Electricity Markets By Brown, David P.; Cajueiro, Daniel O.; Eckert, Andrew; Silveira, Douglas
  14. From Text to Insight: Leveraging Large Language Models for Performance Evaluation in Management By Ning Li; Huaikang Zhou; Mingze Xu
  15. Get in the Zone: The Risk-Adjusted Welfare Effects of Data-Driven vs. Administrative Borders for Index Insurance Zones By Benami, Elinor; Carter, Michael R.; Hobbs, Andrew; Jin, Zhenong; Kirchner, Ella
  16. Artificial Intelligence and Strategic Decision-Making: Evidence from Entrepreneurs and Investors By Felipe A. Csaszar; Harsh Ketkar; Hyunjin Kim

  1. By: Melissa Dell
    Abstract: Deep learning provides powerful methods to impute structured information from large-scale, unstructured text and image datasets. For example, economists might wish to detect the presence of economic activity in satellite images, or to measure the topics or entities mentioned in social media, the congressional record, or firm filings. This review introduces deep neural networks, covering methods such as classifiers, regression models, generative AI, and embedding models. Applications include classification, document digitization, record linkage, and methods for data exploration in massive scale text and image corpora. When suitable methods are used, deep learning models can be cheap to tune and can scale affordably to problems involving millions or billions of data points. The review is accompanied by a companion website, EconDL, with user-friendly demo notebooks, software resources, and a knowledge base that provides technical details and additional applications.
    JEL: C0
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:nbr:nberwo:32768
  2. By: Jozef Barunik; Martin Hronec; Ondrej Tobek
    Abstract: This paper presents a method for accurately predicting the full distribution of stock returns, given a comprehensive set of 194 stock characteristics and market variables. Such distributions, learned from rich data using a machine learning algorithm, are not constrained by restrictive model assumptions and allow the exploration of non-Gaussian, heavy-tailed data and their non-linear interactions. The method uses a two-stage quantile neural network combined with spline interpolation. The results show that the proposed approach outperforms alternative models in terms of out-of-sample losses. Furthermore, we show that the moments derived from such distributions can be useful as alternative empirical estimates in many cases, including mean estimation and forecasting. Finally, we examine the relationship between cross-sectional returns and several distributional characteristics. The results are robust to a wide range of US and international data.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.07497
  3. By: Moitra, Agnij
    Abstract: Boglehead investing, founded on the principles of John C. Bogle, is one of the classic time-tested, long-term, low-cost, passive investment strategies. This paper uses various machine learning methods and fundamental stock data to predict whether or not a stock will incur negative returns next year, and suggests a loss-averted Boglehead strategy of investing in all stocks that are not expected to give negative returns over the next year. Results reveal that XGBoost, out of the 44 models trained, has the highest classification metrics for this task. Furthermore, this paper uses various machine learning methods for exploratory data analysis, and SHAP values reveal that Net Income Margin, ROA, Gross Profit Margin and EBIT are some of the most important factors for this prediction. Also, based on the SHAP values, it is interesting to note that the current year has a negligible contribution to the final prediction. Investors can use this as a heuristic guide for loss-averted long-term (1-year) stock portfolios.
    Date: 2024–07–27
    URL: https://d.repec.org/n?u=RePEc:osf:osfxxx:y3mr6
  4. By: Hongcheng Ding; Xuanze Zhao; Zixiao Jiang; Shamsul Nahar Abdullah; Deshinta Arrova Dewi
    Abstract: Accurate forecasting of the EUR/USD exchange rate is crucial for investors, businesses, and policymakers. This paper proposes a novel framework, IUS, that integrates unstructured textual data from news and analysis with structured data on exchange rates and financial indicators to enhance exchange rate prediction. The IUS framework employs large language models for sentiment polarity scoring and exchange rate movement classification of texts. These textual features are combined with quantitative features and input into a Causality-Driven Feature Generator. An Optuna-optimized Bi-LSTM model is then used to forecast the EUR/USD exchange rate. Experiments demonstrate that the proposed method outperforms benchmark models, reducing MAE by 10.69% and RMSE by 9.56% compared to the best performing baseline. Results also show the benefits of data fusion, with the combination of unstructured and structured data yielding higher accuracy than structured data alone. Furthermore, feature selection using the top 12 important quantitative features combined with the textual features proves most effective. The proposed IUS framework and Optuna-Bi-LSTM model provide a powerful new approach for exchange rate forecasting through multi-source data integration.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.13214
  5. By: Julian Ashwin (Maastricht University); Paul Beaudry (University of British Columbia); Martin Ellison (University of Oxford; Centre for Economic Policy Research (CEP))
    Abstract: Neural networks offer a promising tool for the analysis of nonlinear economies. In this paper, we derive conditions for the global stability of nonlinear rational expectations equilibria under neural network learning. We demonstrate the applicability of the conditions in analytical and numerical examples where the nonlinearity is caused by monetary policy targeting a range, rather than a specific value, of inflation. If shock persistence is high or there is inertia in the structure of the economy, then the only rational expectations equilibria that are learnable may involve inflation spending long periods outside its target range. Neural network learning is also useful for solving and selecting between multiple equilibria and steady states in other settings, such as when there is a zero lower bound on the nominal interest rate.
    Keywords: inflation targeting, machine learning, neural networks, zero lower bound
    Date: 2024–07
    URL: https://d.repec.org/n?u=RePEc:cfm:wpaper:2432
  6. By: Dangxing Chen; Jingfeng Chen; Weicheng Ye
    Abstract: Explainable machine learning methods have been accompanied by substantial development. Despite their success, the existing approaches focus more on the general framework with no prior domain expertise. High-stakes financial sectors have extensive domain knowledge of the features. Hence, it is expected that explanations of models will be consistent with domain knowledge to ensure conceptual soundness. In this work, we study the group structures of features that are naturally formed in the financial dataset. Our study shows the importance of considering group structures that conform to the regulations. When group structures are present, direct applications of explainable machine learning methods, such as Shapley values and Integrated Gradients, may not provide consistent explanations; alternatively, group versions of the Shapley value can provide consistent explanations. We include detailed examples that focus on the practical perspective of our framework.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.05701
  7. By: Foutzopoulos, Giorgos; Pandis, Nikolaos; Tsagris, Michail
    Abstract: The aim of this analysis is to predict whether a National Basketball Association (NBA) player will be active in the league for at least 10 years so as to be qualified for the NBA's full retirement scheme, which allows for the maximum benefit payable by law. We collected per-game statistics for players during their second year, drafted during the years 1999 up to 2006, for whom information on their career longevity is known. By feeding these statistics of the sophomore players into statistical and machine learning algorithms, we select the important statistics and manage to accomplish a satisfactory predictability performance. Further, we visualize the effect of each of the selected statistics on the estimated probability of staying in the league for more than 10 years. Finally, as an illustration, we collected data from players that were drafted 11 years ago (and some are still active) and estimated their probability of surviving in the league for at least 10 years.
    Keywords: NBA, career duration, exit discrimination, retirement scheme
    JEL: C21 C38 C4 C53
    Date: 2024–07–23
    URL: https://d.repec.org/n?u=RePEc:pra:mprapa:121540
  8. By: Ali Habibnia; Mahdi Soltanzadeh
    Abstract: This study presents a Reinforcement Learning (RL)-based portfolio management model tailored for high-risk environments, addressing the limitations of traditional RL models and exploiting market opportunities through two-sided transactions and lending. Our approach integrates a new environmental formulation with a Profit and Loss (PnL)-based reward function, enhancing the RL agent's ability in downside risk management and capital optimization. We implemented the model using the Soft Actor-Critic (SAC) agent with a Convolutional Neural Network with Multi-Head Attention (CNN-MHA). This setup effectively manages a diversified 12-crypto asset portfolio in the Binance perpetual futures market, leveraging USDT for both granting and receiving loans and rebalancing every 4 hours, utilizing market data from the preceding 48 hours. Tested over two 16-month periods of varying market volatility, the model significantly outperformed benchmarks, particularly in high-volatility scenarios, achieving higher return-to-risk ratios and demonstrating robust profitability. These results confirm the model's effectiveness in leveraging market dynamics and managing risks in volatile environments like the cryptocurrency market.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.05382
  9. By: Mahdi E. Kahou; Jesús Fernández-Villaverde; Sebastian Gomez-Cardona; Jesse Perla; Jan Rosa
    Abstract: In the long run, we are all dead. Nonetheless, when studying the short-run dynamics of economic models, it is crucial to consider boundary conditions that govern long-run, forward-looking behavior, such as transversality conditions. We demonstrate that machine learning (ML) can automatically satisfy these conditions due to its inherent inductive bias toward finding flat solutions to functional equations. This characteristic enables ML algorithms to solve for transition dynamics, ensuring that long-run boundary conditions are approximately met. ML can even select the correct equilibria in cases of steady-state multiplicity. Additionally, the inductive bias provides a foundation for modeling forward-looking behavioral agents with self-consistent expectations.
    JEL: C0 E0
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:nbr:nberwo:32850
  10. By: Siyu Bie; Francis X. Diebold; Jingyu He; Junye Li
    Abstract: We explore tree-based macroeconomic regime-switching in the context of the dynamic Nelson-Siegel (DNS) yield-curve model. In particular, we customize the tree-growing algorithm to partition macroeconomic variables based on the DNS model's marginal likelihood, thereby identifying regime-shifting patterns in the yield curve. Compared to traditional Markov-switching models, our model offers clear economic interpretation via macroeconomic linkages and ensures computational simplicity. In an empirical application to U.S. Treasury bond yields, we find (1) important yield curve regime switching, and (2) evidence that macroeconomic variables have predictive power for the yield curve when the short rate is high, but not in other regimes, thereby refining the notion of yield curve "macro-spanning".
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.12863
  11. By: Zhoufan Zhu; Ke Zhu
    Abstract: We open up the "black-box" to identify the predictive general price patterns in price chart images via deep learning image analysis techniques. Our identified price patterns lead to the construction of image-induced importance (triple-I) weights, which are used to compute a weighted moving average of the existing price trend trading signals according to their level of importance in predicting price movements. From an extensive empirical analysis on the Chinese stock market, we show that the triple-I weighting scheme can significantly enhance the price trend trading signals for proposing portfolios, with a thoughtful robustness study in terms of network specifications, image structures, and stock sizes. Moreover, we demonstrate that the triple-I weighting scheme is able to propose long-term portfolios from a time-scale transfer learning, enhance the news-based trading strategies through a non-technical transfer learning, and increase the overall strength of numerous trading rules for portfolio selection.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.08483
  12. By: Pascal François; Geneviève Gauthier; Frédéric Godin; Carlos Octavio Pérez Mendoza
    Abstract: We present a dynamic hedging scheme for S&P 500 options, where rebalancing decisions are enhanced by integrating information about the implied volatility surface dynamics. The optimal hedging strategy is obtained through a deep policy gradient-type reinforcement learning algorithm, with a novel hybrid neural network architecture improving the training performance. The favorable inclusion of forward-looking information embedded in the volatility surface allows our procedure to outperform several conventional benchmarks such as practitioner and smile-implied delta hedging procedures, both in simulation and backtesting experiments.
    Date: 2024–07
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2407.21138
  13. By: Brown, David P. (University of Alberta, Department of Economics); Cajueiro, Daniel O. (University of Brasilia); Eckert, Andrew (University of Alberta, Department of Economics); Silveira, Douglas (University of Alberta, Department of Economics)
    Abstract: Real-time information has the potential to improve market outcomes in wholesale electricity markets. However, transparency can also facilitate coordination between firms, raising questions over the appropriate extent of information disclosure. Despite this ongoing debate, there is a lack of understanding of the information employed by firms when bidding in wholesale electricity markets. We use data from Alberta’s wholesale market and leverage machine learning techniques to evaluate the real-time information firms use when forming their bidding decisions. We find that aggregate market-level variables emerge as important predictors, while detailed firm-specific information does not lead to a material improvement in predicting firms’ bidding decisions. These results suggest that firm-specific information, which has raised concerns because of its potential use in facilitating coordinated behavior, may not be required to promote efficient market outcomes.
    Keywords: Machine Learning; Electricity; Price Forecasting; Competition Policy
    JEL: D43 L13 L50 L94 Q40
    Date: 2024–08–18
    URL: https://d.repec.org/n?u=RePEc:ris:albaec:2024_002
  14. By: Ning Li; Huaikang Zhou; Mingze Xu
    Abstract: This study explores the potential of Large Language Models (LLMs), specifically GPT-4, to enhance objectivity in organizational task performance evaluations. Through comparative analyses across two studies, including various task performance outputs, we demonstrate that LLMs can serve as a reliable and even superior alternative to human raters in evaluating knowledge-based performance outputs, which are a key contribution of knowledge workers. Our results suggest that GPT ratings are comparable to human ratings but exhibit higher consistency and reliability. Additionally, combined multiple GPT ratings on the same performance output show strong correlations with aggregated human performance ratings, akin to the consensus principle observed in performance evaluation literature. However, we also find that LLMs are prone to contextual biases, such as the halo effect, mirroring human evaluative biases. Our research suggests that while LLMs are capable of extracting meaningful constructs from text-based data, their scope is currently limited to specific forms of performance evaluation. By highlighting both the potential and limitations of LLMs, our study contributes to the discourse on AI's role in management studies and sets a foundation for future research to refine AI's theoretical and practical applications in management.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.05328
  15. By: Benami, Elinor; Carter, Michael R.; Hobbs, Andrew; Jin, Zhenong; Kirchner, Ella
    Abstract: Agricultural index insurance seeks to protect producers against negative shocks that are common across a prespecified area, i.e., an index insurance zone. Often, administrative boundaries are used to delineate such index insurance zones. However, administrative boundaries may not reflect relevant variations in yield over space, which can be costly for policyholders as well as the public, especially since agricultural insurance is often heavily subsidized. Increased availability of finely resolved geospatial data on agronomic conditions coupled with machine learning approaches to identify similarities promises the ability to reduce losses associated with index insurance by identifying more homogeneous zones. In this work, we examine the changes in welfare impacts of a hypothetical area-yield index insurance when redrawing zone boundaries on the basis of relevant observed agronomic conditions. Drawing upon crop cut data from over 10,000 maize fields in Kenya from 2016-2020 combined with satellite-based estimates of agronomic conditions, we examine the changes in expected utility to assess the value of data-driven and administrative insurance zones. When keeping the number of insurance zones equal to the number of administrative zones, we find that data-driven zones may offer only slightly higher risk reduction value than administrative zones. If no set number of zones is prespecified, the data-driven approach offers a flexible approach to identify an optimal number of zones that balances costs and performance. This approach can help inform program design as well as impact evaluations, as it further sheds light on trade-offs between the costs of ground sampling and zone size that can inform how to design and evaluate new programs in resource-constrained environments for maximum impact.
    Keywords: Agricultural Finance, International Development, Risk and Uncertainty
    Date: 2024–08–27
    URL: https://d.repec.org/n?u=RePEc:ags:cfcp15:344685
  16. By: Felipe A. Csaszar; Harsh Ketkar; Hyunjin Kim
    Abstract: This paper explores how artificial intelligence (AI) may impact the strategic decision-making (SDM) process in firms. We illustrate how AI could augment existing SDM tools and provide empirical evidence from a leading accelerator program and a startup competition that current Large Language Models (LLMs) can generate and evaluate strategies at a level comparable to entrepreneurs and investors. We then examine implications for key cognitive processes underlying SDM -- search, representation, and aggregation. Our analysis suggests AI has the potential to enhance the speed, quality, and scale of strategic analysis, while also enabling new approaches like virtual strategy simulations. However, the ultimate impact on firm performance will depend on competitive dynamics as AI capabilities progress. We propose a framework connecting AI use in SDM to firm outcomes and discuss how AI may reshape sources of competitive advantage. We conclude by considering how AI could both support and challenge core tenets of the theory-based view of strategy. Overall, our work maps out an emerging research frontier at the intersection of AI and strategy.
    Date: 2024–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2408.08811

This nep-big issue is ©2024 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.