|
on Big Data |
By: | Morgane Laouenan (CES - Centre d'économie de la Sorbonne - UP1 - Université Paris 1 Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique, LIEPP - Laboratoire interdisciplinaire d'évaluation des politiques publiques (Sciences Po) - Sciences Po - Sciences Po, CNRS - Centre National de la Recherche Scientifique); Palaash Bhargava (Department of Economics Columbia University - Columbia University [New York]); Jean-Benoît Eyméoud (LIEPP - Laboratoire interdisciplinaire d'évaluation des politiques publiques (Sciences Po) - Sciences Po - Sciences Po); Olivier Gergaud (LIEPP - Laboratoire interdisciplinaire d'évaluation des politiques publiques (Sciences Po) - Sciences Po - Sciences Po); Guillaume Plique (médialab - médialab (Sciences Po) - Sciences Po - Sciences Po, Kedge BS - Kedge Business School); Etienne Wasmer (New York University [Abu Dhabi] - NYU - NYU System, LIEPP - Laboratoire interdisciplinaire d'évaluation des politiques publiques (Sciences Po) - Sciences Po - Sciences Po) |
Abstract: | A new strand of literature aims at building the most comprehensive and accurate database of notable individuals. We collect a massive amount of data from various editions of Wikipedia and Wikidata. Using deduplication techniques over these partially overlapping sources, we cross-verify each piece of retrieved information. For some variables, Wikipedia adds 15% more information when missing in Wikidata. We find very few errors in the part of the database that contains the most documented individuals but nontrivial error rates in the bottom of the notability distribution, due to sparse information and classification errors or ambiguity. Our strategy results in a cross-verified database of 2.29 million individuals (an elite of 1/43,000 of human beings having ever lived), including a third who are not present in the English edition of Wikipedia. Data collection is driven by specific social science questions on gender, economic growth, urban and cultural development. We document an Anglo-Saxon bias present in the English edition of Wikipedia, and document when it matters and when not. |
Date: | 2022 |
URL: | http://d.repec.org/n?u=RePEc:hal:cesptp:hal-03930666&r=big |
By: | Jan Prüser; Florian Huber |
Abstract: | Modeling and predicting extreme movements in GDP is notoriously difficult and the selection of appropriate covariates and/or possible forms of nonlinearities are key in obtaining precise forecasts. In this paper, our focus is on using large datasets in quantile regression models to forecast the conditional distribution of US GDP growth. To capture possible non-linearities we include several nonlinear specifications. The resulting models will be huge dimensional and we thus rely on a set of shrinkage priors. Since Markov Chain Monte Carlo estimation becomes slow in these dimensions, we rely on fast variational Bayes approximations to the posterior distribution of the coefficients and the latent states. We find that our proposed set of models produces precise forecasts. These gains are especially pronounced in the tails. Using Gaussian processes to approximate the nonlinear component of the model further improves the good performance in the tails. |
Date: | 2023–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2301.13604&r=big |
By: | Andrew Papanicolaou; Hao Fu; Prashanth Krishnamurthy; Farshad Khorrami |
Abstract: | We analyze a fixed-point algorithm for reinforcement learning (RL) of optimal portfolio mean-variance preferences in the setting of multivariate generalized autoregressive conditional-heteroskedasticity (MGARCH) with a small penalty on trading. A numerical solution is obtained using a neural network (NN) architecture within a recursive RL loop. A fixed-point theorem proves that NN approximation error has a big-oh bound that we can reduce by increasing the number of NN parameters. The functional form of the trading penalty has a parameter $\epsilon>0$ that controls the magnitude of transaction costs. When $\epsilon$ is small, we can implement an NN algorithm based on the expansion of the solution in powers of $\epsilon$. This expansion has a base term equal to a myopic solution with an explicit form, and a first-order correction term that we compute in the RL loop. Our expansion-based algorithm is stable, allows for fast computation, and outputs a solution that shows positive testing performance. |
Date: | 2023–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2301.10869&r=big |
By: | Marta Crispino (Bank of Italy); Vincenzo Mariani (Bank of Italy) |
Abstract: | This paper proposes a strategy for nowcasting tourist overnight stays in Italy by exploiting payment card data and Google Search indices. The strategy is applied to national and regional overnight stays at a time of a significant and unanticipated shock to tourism flows and payment habits (the COVID-19 pandemic). Our results show that indicators based on payment data are very informative for predicting tourist volumes, both at the national and at the regional level. Instead, the predictive power of Google Search data is more limited. |
Keywords: | tourism, time series, payment cards data, Google Trends, nowcasting |
JEL: | L83 C53 C55 F47 |
Date: | 2023–02 |
URL: | http://d.repec.org/n?u=RePEc:bdi:opques:qef_746_23&r=big |
By: | Ezzedine Ghlamallah (CERGAM - Centre d'Études et de Recherche en Gestion d'Aix-Marseille - AMU - Aix Marseille Université - UTLN - Université de Toulon); Christos Alexakis; Michael Dowling; Anke Piepenbrink |
Abstract: | We provide a comprehensive structuring of research on Islamic economics and finance into the core topics of the area, for the period 1979 to 2018. This is carried out through a probabilistic topic modeling approach that allows statistical learning of the connection between research articles as well as their shared topics. This approach, which blends machine learning and natural language processing, helps provide a comprehensive structure to the literature. Our topic modeling analysis is conducted on approximately 1500 articles, and suggests the Islamic economics and finance literature can be well-described by 11 topics. These topics cover economic, finance, and morality issues. Our research can be applied to provide a clear structure for ongoing research agendas in Islamic economics and finance as well as a framework for understanding research development in this area. We also note the differences between Islamic and conventional approaches to economics and finance research in order to highlight the inherent new contributions of this maturing area of research. |
Keywords: | Islamic finance, Islamic banking, Islamic economics, Takaful |
Date: | 2021–09 |
URL: | http://d.repec.org/n?u=RePEc:hal:journl:hal-03511406&r=big |
By: | Békés, Gábor; Ottaviano, Gianmarco I. P. |
Abstract: | One may reasonably think that cultural preferences affect collaboration in multinational teams in general, but not in superstar teams of professionals at the top of their industry. We reject this hypothesis by creating and analyzing an exhaustive dataset recording all 10.7 million passes by 7 thousand professional European football players from 138 countries fielded by all 154 teams competing in the top 5 men's leagues over 8 sporting seasons, together with full information on players' and teams' characteristics. We use a discrete choice model of players' passing behavior as a baseline to separately identify collaboration due to cultural preferences ('choice homophily') from collaboration due to opportunities ('induced homophily'). The outcome we focus on is the 'pass rate', defined as the count of passes from a passer to a receiver relative to the passer's total passes when both players are fielded together in a half-season. We find strong evidence of choice homophily. Relative to the baseline, player pairs of the same culture have a 2.42 percent higher pass rate due to choice, compared with a 6.16 percent higher pass rate due to both choice and opportunity. This shows that choice homophily based on culture is pervasive and persistent even in teams of very high-skill individuals with clear common objectives and aligned incentives, who are involved in interactive tasks that are well defined, readily monitored and not particularly language intensive. |
Keywords: | organizations; teams; culture; homophily; diversity; language; globalization; big data; panel data; sport |
JEL: | J1 |
Date: | 2022–10–07 |
URL: | http://d.repec.org/n?u=RePEc:ehl:lserod:117993&r=big |
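The 'pass rate' defined in the abstract above can be illustrated with a short Python sketch. This is not the authors' code: the function name `pass_rates`, the `(passer, receiver)` tuple layout, and the toy data are assumptions made for illustration, and the sketch presumes the input has already been restricted to a single half-season in which all listed players were fielded together.

```python
from collections import Counter


def pass_rates(passes):
    """Map each (passer, receiver) pair to its share of the passer's passes.

    `passes` is a sequence of (passer, receiver) tuples. It is assumed to
    cover one half-season and only moments when the players involved were
    on the pitch together (a simplification made for this sketch).
    """
    passes = list(passes)
    pair_counts = Counter(passes)                         # passes per ordered pair
    passer_totals = Counter(passer for passer, _ in passes)  # passes per passer
    return {
        pair: count / passer_totals[pair[0]]
        for pair, count in pair_counts.items()
    }


# Hypothetical toy data: player A passes twice to B and once to C.
toy_passes = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "A")]
toy_rates = pass_rates(toy_passes)
```

Here `toy_rates[("A", "B")]` is two thirds, because two of A's three passes went to B. Separating choice from opportunity, as the paper does, additionally requires a baseline model of passing behavior, which this sketch does not attempt.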
By: | Moreira, Hugo |
Abstract: | This study conducts a threefold analysis of the EU proposal for an Artificial Intelligence Act (AIA). The first objective is a regulatory analysis of the proposal, focusing on the proposed structures for implementation, concepts, and key requirements for Artificial Intelligence (AI) producers. The second objective is a comparison with the General Data Protection Regulation (GDPR), and its complementarity in providing a robust response to the needs of operators and users of data-driven algorithmic technologies. The third objective is to examine the potential for harmonization of the EU internal market and competitiveness with non-EU markets. The analysis includes a regulatory comparison with the GDPR, which highlights the EU's digital economy policy based on national authorities, risk-based approaches and European bodies for harmonization. The analytical framework of the Brussels Effect is also applied to the AIA proposal, which expresses the intentions of the regulations, both from internal and external pressures. The study concludes that the AIA proposed by the European Commission has the potential to have a significant impact on the development and use of AI systems, particularly in the EU and for companies operating internationally. However, it also poses challenges in terms of implementation and enforcement, which could hamper growth. |
Date: | 2023–01–25 |
URL: | http://d.repec.org/n?u=RePEc:osf:socarx:59fbk&r=big |
By: | Solveig Flaig; Gero Junike |
Abstract: | Machine learning methods are getting more and more important in the development of internal models using scenario generation. As internal models under Solvency 2 have to be validated, an important question is in which aspects the validation of these data-driven models differs from a classical theory-based model. On the specific example of market risk, we discuss the necessity of two additional validation tasks: one to check the dependencies between the risk factors used and one to detect the unwanted memorizing effect. The first one is necessary because in this new method, the dependencies are not derived from a financial-mathematical theory. The latter one arises when the machine learning model only repeats empirical data instead of generating new scenarios. These measures are then applied to a machine-learning-based economic scenario generator. It is shown that those measures lead to reasonable results in this context and can be used for validation as well as for model optimization. |
Date: | 2023–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2301.12719&r=big |
By: | Kyung, Heekwon (Korea Institute for Industrial Economics and Trade); Lee, Jun (Korea Institute for Industrial Economics and Trade) |
Abstract: | On March 2, the National Security Commission on Artificial Intelligence (NSCAI) released its final report: a glimpse of the U.S. view of advanced industries such as artificial intelligence (AI) and semiconductors, as well as the direction of its related strategies. The commission urged a full-scale mobilization of government capacity to beat China for global supremacy in AI and other related advanced industries. American strategies for AI and other advanced industries are key constants to be considered in devising Korea’s industrial policy, and a national blueprint is needed to respond to such strategies. This analytical brief analyzes the main features of the NSCAI report and the implications carried for Korean industrial strategy and policy. |
Keywords: | artificial intelligence; AI; technology; US; China; Korea; semiconductors; supply chain; advanced technology; manufacturing; competition; competitiveness; national security; conflict; hegemony; economic strategy; innovation; R&D |
JEL: | F02 F13 F23 F50 F52 H12 H56 J21 J24 J38 L16 L53 L63 O32 O38 |
Date: | 2021–05–17 |
URL: | http://d.repec.org/n?u=RePEc:ris:kietia:2021_009&r=big |
By: | Kazuki Amagai; Tomoya Suzuki |
Abstract: | In the practical business of asset management by investment trusts and the like, the general practice is to manage over the medium to long term owing to the burden of operations and increase in transaction costs with the increase in turnover ratio. However, when machine learning is used to construct a management model, the number of learning data decreases with the increase in the long-term time scale; this causes a decline in the learning precision. Accordingly, in this study, data augmentation was applied by the combined use of not only the time scales of the target tasks but also the learning data of shorter term time scales, demonstrating that degradation of the generalization performance can be inhibited even if the target tasks of machine learning have long-term time scales. Moreover, as an illustration of how this data augmentation can be applied, we conducted portfolio management in which machine learning of a multifactor model was done by an autoencoder and mispricing was used from the estimated theoretical values. The effectiveness could be confirmed in not only the stock market but also the FX market, and a general-purpose management model could be constructed in various financial markets. |
Date: | 2023–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2301.12346&r=big |
By: | Benjamin Avanzi; Greg Taylor; Melantha Wang; Bernard Wong |
Abstract: | High-cardinality categorical features are pervasive in actuarial data (e.g. occupation in commercial property insurance). Standard categorical encoding methods like one-hot encoding are inadequate in these settings. In this work, we present a novel _Generalised Linear Mixed Model Neural Network_ ("GLMMNet") approach to the modelling of high-cardinality categorical features. The GLMMNet integrates a generalised linear mixed model in a deep learning framework, offering the predictive power of neural networks and the transparency of random effects estimates, the latter of which cannot be obtained from the entity embedding models. Further, its flexibility to deal with any distribution in the exponential dispersion (ED) family makes it widely applicable to many actuarial contexts and beyond. We illustrate and compare the GLMMNet against existing approaches in a range of simulation experiments as well as in a real-life insurance case study. Notably, we find that the GLMMNet often outperforms or at least performs comparably with an entity embedded neural network, while providing the additional benefit of transparency, which is particularly valuable in practical applications. Importantly, while our model was motivated by actuarial applications, it can have wider applicability. The GLMMNet would suit any applications that involve high-cardinality categorical variables and where the response cannot be sufficiently modelled by a Gaussian distribution. |
Date: | 2023–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2301.12710&r=big |
By: | Chronopoulos, Ilias; Raftapostolos, Aristeidis; Kapetanios, George |
Abstract: | In this paper we use a deep quantile estimator, based on neural networks and their universal approximation property to examine a non-linear association between the conditional quantiles of a dependent variable and predictors. This methodology is versatile and allows both the use of different penalty functions, as well as high dimensional covariates. We present a Monte Carlo exercise where we examine the finite sample properties of the deep quantile estimator and show that it delivers good finite sample performance. We use the deep quantile estimator to forecast Value-at-Risk and find significant gains over linear quantile regression alternatives and other models, which are supported by various testing schemes. Further, we consider also an alternative architecture that allows the use of mixed frequency data in neural networks. This paper also contributes to the interpretability of neural networks output by making comparisons between the commonly used SHAP values and an alternative method based on partial derivatives. |
Keywords: | Quantile regression, machine learning, neural networks, value-at-risk, forecasting |
Date: | 2023–02–07 |
URL: | http://d.repec.org/n?u=RePEc:esy:uefcwp:34837&r=big |
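Quantile estimators such as the one described above are commonly fit by minimizing the check (pinball) loss. The abstract does not spell out the objective or the penalty functions used, so the following Python sketch is an assumption about the general technique rather than the authors' implementation; the name `pinball_loss` and the example values are hypothetical.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average check (pinball) loss at quantile level `tau`.

    Under-predictions (y > q) are weighted by tau and over-predictions
    (y < q) by 1 - tau, so minimizing this loss targets the tau-th
    conditional quantile rather than the conditional mean.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # diff >= 0: prediction too low; diff < 0: prediction too high.
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


# Hypothetical example: at tau = 0.9, predicting too low costs nine times
# as much as predicting too high by the same amount.
loss_too_low = pinball_loss([3.0], [2.0], tau=0.9)
loss_too_high = pinball_loss([1.0], [2.0], tau=0.9)
```

For Value-at-Risk forecasting, `tau` would be set to a tail level such as 0.01 or 0.05, and a neural network's output would take the place of `y_pred`.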
By: | Jonathan Leslie (Indiana University, Department of Economics) |
Abstract: | I evaluate whether incorporating sub-national trends improves macroeconomic forecasting accuracy in a deep machine learning framework. Specifically, I adopt a computer vision setting by transforming U.S. economic data into a ‘video’ series of geographic ‘images’ and utilizing a recurrent convolutional neural network to extract spatio-temporal features. This spatial forecasting model outperforms equivalent methods based on country-level data and achieves a 0.14 percentage point average error when forecasting out-of-sample monthly percentage changes in real GDP over a twelve-month horizon. The estimated model focuses on Middle America in particular when making its predictions, providing insight into the benefit of employing spatial data. |
Keywords: | Macroeconomic Forecasting, Machine Learning, Deep Learning, Computer Vision, Economic Geography |
Date: | 2023–02 |
URL: | http://d.repec.org/n?u=RePEc:inu:caeprp:2023003&r=big |
By: | Andrew Na; Justin Wan |
Abstract: | We propose a deep recurrent neural network (RNN) framework for computing prices and deltas of American options in high dimensions. Our proposed framework uses two deep RNNs, where one network learns the price and the other learns the delta of the option for each timestep. Our proposed framework yields prices and deltas for the entire spacetime, not only at a given point (e.g. t = 0). The computational cost of the proposed approach is linear in time, which improves on the quadratic time seen for feedforward networks that price American options. The memory cost of our method is constant, which is an improvement over the linear memory costs seen in feedforward networks. Our numerical simulations demonstrate these contributions, and show that the proposed deep RNN framework is computationally more efficient than traditional feedforward neural network frameworks in time and memory. |
Date: | 2023–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2301.08232&r=big |
By: | de Cornière, Alexandre; Taylor, Greg |
Abstract: | Does enhanced access to data foster or hinder competition among firms? Using a competition-in-utility framework that encompasses many situations where firms use data, we model data as a revenue-shifter and identify two opposite effects: a mark-up effect according to which data induces firms to compete harder, and a surplus-extraction effect. We provide conditions for data to be pro- or anti-competitive, requiring neither knowledge of demand nor computation of equilibrium. We apply our results to situations where data is used to recommend products, monitor insuree behavior, price-discriminate, or target advertising. We also revisit the issue of data and market structure. |
JEL: | L1 L4 L5 |
Date: | 2023–01–31 |
URL: | http://d.repec.org/n?u=RePEc:tse:wpaper:32535&r=big |
By: | Mike Tsionas (Montpellier Business School Université de Montpellier, Montpellier Research in Management and Lancaster University Management School); Christopher F. Parmeter (Miami Herbert Business School, University of Miami, Miami FL); Valentin Zelenyuk (School of Economics and Centre for Efficiency and Productivity Analysis (CEPA) at The University of Queensland, Australia) |
Abstract: | Artificial neural networks have offered their share of econometric insights, given their power to model complex relationships. One area where they have not been readily deployed is the estimation of frontiers. The literature on frontier estimation has seen its share of research comparing and contrasting data envelopment analysis (DEA) and stochastic frontier analysis (SFA), the two workhorse estimators. These studies rely on both Monte Carlo experiments and actual data sets to examine a range of performance issues which can be used to elucidate insights on the benefits or weaknesses of one method over the other. As can be imagined, neither method is universally better than the other. The present paper proposes an alternative approach that is quite flexible in terms of functional form and distributional assumptions and that amalgamates the benefits of both DEA and SFA. Specifically, we bridge these two popular approaches via Bayesian artificial neural networks while accounting for possible endogeneity of inputs. We examine the performance of this new machine learning approach using Monte Carlo experiments and find it to be very good, comparable to, or often better than, the current standards in the literature. To illustrate the new techniques, we provide an application of this approach to a data set of large US banks. |
Keywords: | Machine Learning; Simulation; Flexible Functional Forms; Bayesian Artificial Neural Networks; Banking; Efficiency Analysis. |
Date: | 2023–01 |
URL: | http://d.repec.org/n?u=RePEc:qld:uqcepa:183&r=big |
By: | Mr. Anil Ari; Gabor Pula; Liyang Sun |
Abstract: | The qualitative and granular nature of most structural indicators and the variety in data sources pose difficulties for consistent cross-country assessments and empirical analysis. We overcome these issues by using a machine learning approach (the partial least squares method) to combine a broad set of cross-country structural indicators into a small number of synthetic scores which correspond to key structural areas, and which are suitable for consistent quantitative comparisons across countries and time. With this newly constructed dataset of synthetic structural scores in 126 countries between 2000 and 2019, we establish stylized facts about structural gaps and reforms, and analyze the impact of reforms targeting different structural areas on economic growth. Our findings suggest that structural reforms in the areas of product, labor and financial markets as well as the legal system have a significant impact on economic growth over a 5-year horizon, with a one standard deviation improvement in one of these reform areas raising cumulative 5-year growth by 2 to 6 percent. We also find synergies between different structural areas, in particular between product and labor market reforms. |
Keywords: | Structural reforms; institutions; economic growth; PLS estimation procedure; machine learning approach; labor market composite; Business environment; Labor markets; Machine learning; Labor market reforms; Global |
Date: | 2022–09–16 |
URL: | http://d.repec.org/n?u=RePEc:imf:imfwpa:2022/184&r=big |
By: | Xiaohong Chen; Zhengling Qi; Runzhe Wan |
Abstract: | Batch reinforcement learning (RL) aims at finding an optimal policy in a dynamic environment in order to maximize the expected total rewards by leveraging pre-collected data. A fundamental challenge behind this task is the distributional mismatch between the batch data generating process and the distribution induced by target policies. Nearly all existing algorithms rely on the assumption that the distribution induced by target policies is absolutely continuous with respect to the data distribution, so that the batch data can be used to calibrate target policies via a change of measure. However, the absolute continuity assumption could be violated in practice, especially when the state-action space is large or continuous. In this paper, we propose a new batch RL algorithm without requiring absolute continuity in the setting of an infinite-horizon Markov decision process with continuous states and actions. We call our algorithm STEEL: SingulariTy-awarE rEinforcement Learning. Our algorithm is motivated by a new error analysis on off-policy evaluation, where we use maximum mean discrepancy, together with distributionally robust optimization, to characterize the error of off-policy evaluation caused by the possible singularity and to enable the power of model extrapolation. By leveraging the idea of pessimism and under some mild conditions, we derive a finite-sample regret guarantee for our proposed algorithm without imposing absolute continuity. Compared with existing algorithms, STEEL only requires a minimal data-coverage assumption and thus greatly enhances the applicability and robustness of batch RL. Extensive simulation studies and one real experiment on personalized pricing demonstrate the superior performance of our method when facing possible singularity in batch RL. |
Date: | 2023–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2301.13152&r=big |
By: | Morgane Laouenan (CES - Centre d'économie de la Sorbonne - UP1 - Université Paris 1 Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique, LIEPP - Laboratoire interdisciplinaire d'évaluation des politiques publiques (Sciences Po) - Sciences Po - Sciences Po, CNRS - Centre National de la Recherche Scientifique); Palaash Bhargava (Department of Economics Columbia University - Columbia University [New York]); Jean-Benoît Eyméoud (LIEPP - Laboratoire interdisciplinaire d'évaluation des politiques publiques (Sciences Po) - Sciences Po - Sciences Po); Olivier Gergaud (LIEPP - Laboratoire interdisciplinaire d'évaluation des politiques publiques (Sciences Po) - Sciences Po - Sciences Po); Guillaume Plique (médialab - médialab (Sciences Po) - Sciences Po - Sciences Po, Kedge BS - Kedge Business School); Etienne Wasmer (New York University [Abu Dhabi] - NYU - NYU System, LIEPP - Laboratoire interdisciplinaire d'évaluation des politiques publiques (Sciences Po) - Sciences Po - Sciences Po) |
Abstract: | A new strand of literature aims at building the most comprehensive and accurate database of notable individuals. We collect a massive amount of data from various editions of Wikipedia and Wikidata. Using deduplication techniques over these partially overlapping sources, we cross-verify each piece of retrieved information. For some variables, Wikipedia adds 15% more information when missing in Wikidata. We find very few errors in the part of the database that contains the most documented individuals but nontrivial error rates in the bottom of the notability distribution, due to sparse information and classification errors or ambiguity. Our strategy results in a cross-verified database of 2.29 million individuals (an elite of 1/43,000 of human beings having ever lived), including a third who are not present in the English edition of Wikipedia. Data collection is driven by specific social science questions on gender, economic growth, urban and cultural development. We document an Anglo-Saxon bias present in the English edition of Wikipedia, and document when it matters and when not. |
Date: | 2022 |
URL: | http://d.repec.org/n?u=RePEc:hal:journl:hal-03930666&r=big |
By: | Nhu Khoa Nguyen; Thierry Delahaut; Emanuela Boros; Antoine Doucet; Gaël Lejeune |
Abstract: | Identifying and exploring emerging trends in the news is becoming more essential than ever, with many changes occurring worldwide due to the global health crises. However, most recent research has focused on detecting trends in social media, benefiting from social features (e.g. likes and retweets on Twitter) that can be used to measure the engagement and diffusion rate of content. Formal text data, unlike short social media posts, comes in a longer, less restricted writing format and is thus more challenging. In this paper, we focus our study on emerging trend detection in financial news articles about Microsoft, collected before and during the start of the COVID-19 pandemic (July 2019 to July 2020). We make the dataset accessible and propose a strong baseline (Contextual Leap2Trend) for exploring the dynamics of similarities between pairs of keywords based on topic modelling and term frequency. Finally, we evaluate against a gold standard (Google Trends) and present noteworthy real-world scenarios regarding the influence of the pandemic on Microsoft. |
Date: | 2023–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2301.11318&r=big |
By: | John J. Horton |
Abstract: | Newly developed large language models (LLMs) -- because of how they are trained and designed -- are implicit computational models of humans -- a homo silicus. These models can be used the same way economists use homo economicus: they can be given endowments, information, preferences, and so on, and then their behavior can be explored in scenarios via simulation. I demonstrate this approach using OpenAI's GPT3 with experiments derived from Charness and Rabin (2002), Kahneman, Knetsch and Thaler (1986) and Samuelson and Zeckhauser (1988). The findings are qualitatively similar to the original results, but it is also trivially easy to try variations that offer fresh insights. Departing from the traditional laboratory paradigm, I also create a hiring scenario where an employer faces applicants that differ in experience and wage ask and then analyze how a minimum wage affects realized wages and the extent of labor-labor substitution. |
Date: | 2023–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2301.07543&r=big |