nep-big New Economics Papers
on Big Data
Issue of 2023‒01‒16
29 papers chosen by
Tom Coupé
University of Canterbury

  1. Fallen Angel Bonds Investment and Bankruptcy Predictions Using Manual Models and Automated Machine Learning By Harrison Mateika; Juannan Jia; Linda Lillard; Noah Cronbaugh; Will Shin
  2. Benchmarking Machine Learning Models to Predict Corporate Bankruptcy By Emmanuel Alanis; Sudheer Chava; Agam Shah
  3. A machine learning approach to support decision in insider trading detection By Piero Mazzarisi; Adele Ravagnani; Paola Deriu; Fabrizio Lillo; Francesca Medda; Antonio Russo
  4. Orthogonal Series Estimation for the Ratio of Conditional Expectation Functions By Kazuhiko Shinoda; Takahiro Hoshino
  5. The Role of Renewable Energy Consumption in Promoting Sustainability and Circular Economy. A Data-Driven Analysis By Laureti, Lucio; Costantiello, Alberto; Leogrande, Angelo
  6. Design interpretable experience of dynamical feed forward machine learning model for forecasting NASDAQ By Pouriya Khalilian; Sara Azizi; Mohammad Hossein Amiri; Javad T. Firouzjaee
  7. Bi-LSTM Price Prediction based on Attention Mechanism By Jiashu Lou; Leyi Cui; Ye Li
  8. Dominant Drivers of National Inflation By Jan Ditzen; Francesco Ravazzolo
  9. Langevin algorithms for Markovian Neural Networks and Deep Stochastic control By Pierre Bras
  10. The effects of discontinuing machine learning decision support By Bauer, Kevin; Nofer, Michael; Abdel-Karim, Benjamin M.; Hinz, Oliver
  11. Consumer credit in the age of AI: Beyond anti-discrimination law By Langenbucher, Katja
  12. Lie detection algorithms attract few users but vastly increase accusation rates By Alicia von Schenk; Victor Klockmann; Jean-François Bonnefon; Iyad Rahwan; Nils Köbis
  13. Data-Driven Prediction and Evaluation on Future Impact of Energy Transition Policies in Smart Regions By Chunmeng Yang; Siqi Bu; Yi Fan; Wayne Xinwei Wan; Ruoheng Wang; Aoife Foley
  14. The Effects of Just-in-time Delivery on Social Engagement: A Cluster Analysis By Moisés Ramírez; Raziel Ruíz; Nathan Klarer
  15. Policy learning for many outcomes of interest: Combining optimal policy trees with multi-objective Bayesian optimisation By Patrick Rehill
  16. Visual Privacy: Current and Emerging Regulations Around Unconsented Video Analytics in Retail By Pletcher, Scott Nicholas
  17. Management of Big data: An empirical investigation of the Too-Much-of-a-Good-Thing effect in medium and large firms By Claudio Vitari; Elisabetta Raguseo; Federico Pigni
  18. CDO calibration via Magnus Expansion and Deep Learning By Marco Di Francesco; Kevin Kamm
  19. Sentiment Indexes and Economic Activity Indicators in Mexico 2016-2021 By Torre Leonardo; González Eva; Casillas Ramón; Alvarado Jorge
  20. NETpred: Network-based modeling and prediction of multiple connected market indices By Alireza Jafari; Saman Haratizadeh
  21. Human Security: Concepts and Measurement By Phoebe Koundouri; Konstantinos Dellis
  22. AI Ethics on Blockchain: Topic Analysis on Twitter Data for Blockchain Security By Yihang Fu; Zesen Zhuang; Luyao Zhang
  23. Time Series Analysis in American Stock Market Recovering in Post COVID-19 Pandemic Period By Weilin Fu; Zhuoran Li; Yupeng Zhang; Xingyou Zhou
  24. Building Daily Economic Sentiment Indicators By Pilar Rey del Castillo
  25. Deep Quadratic Hedging By Alessandro Gnoatto; Silvia Lavagnini; Athena Picarelli
  26. R Libraries for Remote Sensing Data Classification by k-means Clustering and NDVI Computation in Congo River Basin, DRC By Polina Lemenkova; Olivier Debeir
  27. A probabilistic autoencoder for causal discovery By Matthias Feiler
  28. Open banking and customer data sharing: Implications for FinTech borrowers By Nam, Rachel J.
  29. Social Media and the Broadening of Social Movements: Evidence from Black Lives Matter By Artís, Annalí Casanueva; Avetian, Vladimir; Sardoschau, Sulin; Saxena, Kavya

  1. By: Harrison Mateika; Juannan Jia; Linda Lillard; Noah Cronbaugh; Will Shin
    Abstract: The primary aim of this research was to find a model that best predicts which fallen angel bonds would potentially rise back to investment grade and which would fall into bankruptcy. To implement the solution, we decided that the ideal method would be to create an optimal machine learning model that could predict bankruptcies. Among the many machine learning models available, we picked four classification methods: logistic regression, KNN, SVM, and NN. We also utilized Google Cloud's automated machine learning. The results of our model comparisons showed that the models did not predict bankruptcies very well on the original data set, with the exception of Google Cloud's machine learning, which had a high precision score. However, our over-sampled and feature-selection data sets did perform very well. This is likely because the model was over-fitted to match the narrative of the over-sampled data (that is, it does not accurately predict data outside this data set). We were therefore not able to create a model that we are confident would predict bankruptcies. However, we were able to find value in this project in two key ways. The first is that Google Cloud's machine learning model either outperformed or performed on par with the other models in every metric and on every data set. The second is that utilizing feature selection did not reduce predictive power by much. This means that we can reduce the amount of data to collect for future experimentation on predicting bankruptcies.
    Date: 2022–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2212.03454&r=big
  2. By: Emmanuel Alanis; Sudheer Chava; Agam Shah
    Abstract: Using a comprehensive sample of 2,585 bankruptcies from 1990 to 2019, we benchmark the performance of various machine learning models in predicting financial distress of publicly traded U.S. firms. We find that gradient boosted trees outperform other models in one-year-ahead forecasts. Variable permutation tests show that excess stock returns, idiosyncratic risk, and relative size are the most important variables for predictions. Textual features derived from corporate filings do not improve performance materially. In a credit competition model that accounts for the asymmetric cost of default misclassification, the survival random forest is able to capture large dollar profits.
    Date: 2022–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2212.12051&r=big
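The benchmarking exercise the abstract describes — comparing a gradient-boosted-trees classifier against a baseline on a rare-event target — can be sketched as follows. This is an illustrative toy, not the authors' code: the features, sample, and class imbalance below are synthetic placeholders.

```python
# Toy benchmark: gradient boosted trees vs. logistic regression on an
# imbalanced binary target mimicking rare bankruptcy events.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data: ~5% positive class stands in for bankruptcy filings.
X, y = make_classification(n_samples=4000, n_features=12, weights=[0.95],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

models = {
    "logit": LogisticRegression(max_iter=1000),
    "gbt": GradientBoostingClassifier(random_state=0),
}
# Out-of-sample AUC per model, analogous to a one-year-ahead forecast test.
aucs = {name: roc_auc_score(y_te, m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1])
        for name, m in models.items()}
print(aucs)
```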
  3. By: Piero Mazzarisi; Adele Ravagnani; Paola Deriu; Fabrizio Lillo; Francesca Medda; Antonio Russo
    Abstract: Identifying market abuse activity from data on investors' trading activity is very challenging, both because of the data volume and because of the low signal-to-noise ratio. Here we propose two complementary unsupervised machine learning methods to support market surveillance aimed at identifying potential insider trading activities. The first uses clustering to identify, in the vicinity of a price-sensitive event such as a takeover bid, discontinuities in the trading activity of an investor with respect to his/her own past trading history and to the present trading activity of his/her peers. The second unsupervised approach aims to identify (small) groups of investors that act coherently around price-sensitive events, pointing to potential insider rings, i.e. groups of synchronised traders displaying strong directional trading in rewarding positions in the period before the price-sensitive event. As a case study, we apply our methods to investor-resolved data on Italian stocks around takeover bids.
    Date: 2022–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2212.05912&r=big
  4. By: Kazuhiko Shinoda; Takahiro Hoshino
    Abstract: In various fields of data science, researchers are often interested in estimating the ratio of conditional expectation functions (CEFR). Specifically in causal inference problems, it is sometimes natural to consider ratio-based treatment effects, such as odds ratios and hazard ratios, and even difference-based treatment effects are identified as CEFR in some empirically relevant settings. This chapter develops the general framework for estimation and inference on CEFR, which allows the use of flexible machine learning for infinite-dimensional nuisance parameters. In the first stage of the framework, the orthogonal signals are constructed using debiased machine learning techniques to mitigate the negative impacts of the regularization bias in the nuisance estimates on the target estimates. The signals are then combined with a novel series estimator tailored for CEFR. We derive the pointwise and uniform asymptotic results for estimation and inference on CEFR, including the validity of the Gaussian bootstrap, and provide low-level sufficient conditions to apply the proposed framework to some specific examples. We demonstrate the finite-sample performance of the series estimator constructed under the proposed framework by numerical simulations. Finally, we apply the proposed method to estimate the causal effect of the 401(k) program on household assets.
    Date: 2022–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2212.13145&r=big
  5. By: Laureti, Lucio; Costantiello, Alberto; Leogrande, Angelo
    Abstract: In this article we investigate the role of “Renewable Energy Consumption” in the context of the Circular Economy. We use data from the World Bank for 193 countries over the period 2011-2020. We apply several econometric techniques, i.e., Panel Data with Fixed Effects, Panel Data with Random Effects, Pooled OLS, and WLS. Our results show that “Renewable Energy Consumption” is positively associated, among others, with “Cooling Degree Days” and “Adjusted savings: net forest depletion”, and negatively associated, among others, with “GHG net emissions/removals by LUCF” and “Mean Drought Index”. Furthermore, we perform a cluster analysis with the k-Means algorithm optimized with the Silhouette Coefficient and find the presence of two clusters. Finally, we compare eight different machine learning algorithms to predict the value of Renewable Energy Consumption. Our results show that Polynomial Regression is the best algorithm in terms of prediction and that, on average, renewable energy consumption is expected to grow by 2.61%.
    Keywords: Environmental Economics, General, Valuation of Environmental Effects, Pollution Control Adoption and Costs, Recycling.
    JEL: Q5 Q50 Q51 Q52 Q53
    Date: 2022–12–25
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:115763&r=big
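The k-Means-with-Silhouette step the abstract mentions — picking the number of clusters that maximises the silhouette coefficient — can be illustrated with a minimal sketch. The two-blob toy data below is a placeholder, not the authors' World Bank indicators.

```python
# Choose k for k-means by maximising the silhouette coefficient.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Toy stand-in for country-level indicators: two well-separated blobs in 3D.
X = np.vstack([rng.normal(0, 1, (80, 3)), rng.normal(5, 1, (80, 3))])

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # in [-1, 1]; higher is better

best_k = max(scores, key=scores.get)
print(best_k)  # the two-blob toy data should favour k = 2
```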
  6. By: Pouriya Khalilian; Sara Azizi; Mohammad Hossein Amiri; Javad T. Firouzjaee
    Abstract: The National Association of Securities Dealers Automated Quotations (NASDAQ) is an American stock exchange based in New York City, and its index is one of the most valuable stock economic indices in the world. The stock market is volatile, and economic indicators such as crude oil, gold, and the dollar influence it; NASDAQ shares are likewise affected and have a volatile and chaotic nature. In this article, we examine the effect of oil, the dollar, gold, and stock market volatility on the economic market, and then the effect of these indicators on NASDAQ stocks. We then analyze the impact of feedback from the past prices of NASDAQ stocks on the current price. Using PCA and a Linear Regression algorithm, we design an optimal dynamic learning experience for modeling these stocks. The results obtained from the quantitative analysis are consistent with the results of the qualitative analysis of economic studies, and the modeling done with the optimal dynamic machine learning experience justifies the current price of NASDAQ shares.
    Date: 2022–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2212.12044&r=big
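The PCA-plus-linear-regression modelling step described above might be sketched as below. This is a hedged illustration on synthetic data: the "oil/gold/dollar" drivers and the lagged-price feature are placeholders, not the paper's series.

```python
# Reduce placeholder drivers with PCA, then fit a linear regression on the
# retained components to model an index-like target.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(42)
n = 500
# Columns stand in for oil, gold, the dollar, and a lagged index value.
X = rng.normal(size=(n, 4))
index = X @ np.array([0.5, 0.3, -0.4, 0.8]) + rng.normal(scale=0.1, size=n)

model = make_pipeline(PCA(n_components=3), LinearRegression())
model.fit(X, index)
r2 = model.score(X, index)  # in-sample R^2 of the reduced-rank fit
print(round(r2, 3))
```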
  7. By: Jiashu Lou; Leyi Cui; Ye Li
    Abstract: With the increasing enrichment and development of the financial derivatives market, transactions are also becoming more and more frequent. Due to human limitations, algorithms and automatic trading have recently become the focus of discussion. In this paper, we propose a bidirectional LSTM neural network based on an attention mechanism and apply it to two popular assets, gold and bitcoin. For feature engineering, we add traditional technical factors and, at the same time, develop factors by combining time series models. For the model parameters, we finally chose a two-layer deep learning network. Measured by AUC, the accuracy for bitcoin and gold is 71.94% and 73.03% respectively. Using the forecast results, we achieved a return of 1089.34% over two years. We also compare the attention Bi-LSTM model proposed in this paper with traditional models, and the results show that our model performs best on this data set. Finally, we discuss the significance of the model and the experimental results, as well as possible directions for future improvement.
    Date: 2022–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2212.03443&r=big
  8. By: Jan Ditzen; Francesco Ravazzolo
    Abstract: For western economies a long-forgotten phenomenon is on the horizon: rising inflation rates. We propose a novel approach christened D2ML to identify drivers of national inflation. D2ML combines machine learning for model selection with time dependent data and graphical models to estimate the inverse of the covariance matrix, which is then used to identify dominant drivers. Using a dataset of 33 countries, we find that the US inflation rate and oil prices are dominant drivers of national inflation rates. For a more general framework, we carry out Monte Carlo simulations to show that our estimator correctly identifies dominant drivers.
    Date: 2022–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2212.05841&r=big
  9. By: Pierre Bras
    Abstract: Stochastic Gradient Descent Langevin Dynamics (SGLD) algorithms, which add noise to the classic gradient descent, are known to improve the training of neural networks in some cases where the neural network is very deep. In this paper we study the possibilities of training acceleration for the numerical resolution of stochastic control problems through gradient descent, where the control is parametrized by a neural network. If the control is applied at many discretization times then solving the stochastic control problem reduces to minimizing the loss of a very deep neural network. We numerically show that Langevin algorithms improve the training on various stochastic control problems like hedging and resource management, and for different choices of gradient descent methods.
    Date: 2022–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2212.12018&r=big
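The core Langevin idea the abstract relies on — ordinary gradient descent plus injected Gaussian noise with a decaying scale — can be shown on a toy problem. This is our own minimal illustration on a convex quadratic, not the paper's stochastic control setup.

```python
# Langevin-style gradient descent: each step adds Gaussian noise whose
# scale vanishes over time, so iterates still settle at the minimiser.
import numpy as np

def grad(x):
    # Gradient of the toy loss f(x) = ||x||^2 / 2.
    return x

rng = np.random.default_rng(0)
x = np.ones(5) * 5.0
lr = 0.1
for t in range(1, 2001):
    noise = rng.normal(size=x.shape) * np.sqrt(lr) / t  # decaying Langevin noise
    x = x - lr * grad(x) + noise

print(np.linalg.norm(x))  # should be close to the minimiser at 0
```

On non-convex deep-network losses, the noise helps iterates escape poor basins; here it merely demonstrates that the added noise does not prevent convergence.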
  10. By: Bauer, Kevin; Nofer, Michael; Abdel-Karim, Benjamin M.; Hinz, Oliver
    Abstract: Advances in Machine Learning (ML) led organizations to increasingly implement predictive decision aids intended to improve employees' decision-making performance. While such systems improve organizational efficiency in many contexts, they might be a double-edged sword when there is the danger of a system discontinuance. Following cognitive theories, the provision of ML-based predictions can adversely affect the development of decision-making skills that come to light when people lose access to the system. The purpose of this study is to put this assertion to the test. Using a novel experiment specifically tailored to deal with organizational obstacles and endogeneity concerns, we show that the initial provision of ML decision aids can latently prevent the development of decision-making skills which later becomes apparent when the system gets discontinued. We also find that the degree to which individuals "blindly" trust observed predictions determines the ultimate performance drop in the post-discontinuance phase. Our results suggest that making it clear to people that ML decision aids are imperfect can have its benefits especially if there is a reasonable danger of (temporary) system discontinuances.
    Date: 2022
    URL: http://d.repec.org/n?u=RePEc:zbw:safewp:370&r=big
  11. By: Langenbucher, Katja
    Abstract: Search costs for lenders when evaluating potential borrowers are driven by the quality of the underwriting model and by access to data. Both have undergone radical change over the last years, due to the advent of big data and machine learning. For some, this holds the promise of inclusion and better access to finance. Invisible prime applicants perform better under AI than under traditional metrics. Broader data and more refined models help to detect them without triggering prohibitive costs. However, not all applicants profit to the same extent. Historic training data shape algorithms, biases distort results, and data as well as model quality are not always assured. Against this background, an intense debate over algorithmic discrimination has developed. This paper takes a first step towards developing principles of fair lending in the age of AI. It submits that there are fundamental difficulties in fitting algorithmic discrimination into the traditional regime of antidiscrimination laws. Received doctrine with its focus on causation is in many cases ill-equipped to deal with algorithmic decision-making under both disparate treatment and disparate impact doctrine. The paper concludes with a suggestion to reorient the discussion and with the attempt to outline contours of fair lending law in the age of AI.
    Keywords: credit scoring methodology, AI-enabled credit scoring, AI borrower classification, responsible lending, credit scoring regulation, financial privacy, statistical discrimination
    JEL: C18 C32 K12 K23 K33 K40 J14 O31 O33
    Date: 2022
    URL: http://d.repec.org/n?u=RePEc:zbw:safewp:369&r=big
  12. By: Alicia von Schenk; Victor Klockmann; Jean-François Bonnefon; Iyad Rahwan; Nils Köbis
    Abstract: People are not very good at detecting lies, which may explain why they refrain from accusing others of lying, given the social costs attached to false accusations - both for the accuser and the accused. Here we consider how this social balance might be disrupted by the availability of lie-detection algorithms powered by Artificial Intelligence. Will people elect to use lie detection algorithms that perform better than humans, and if so, will they show less restraint in their accusations? We built a machine learning classifier whose accuracy (67%) was significantly better than human accuracy (50%) in a lie-detection task and conducted an incentivized lie-detection experiment in which we measured participants' propensity to use the algorithm, as well as the impact of that use on accusation rates. We find that the few people (33%) who elect to use the algorithm drastically increase their accusation rates (from 25% in the baseline condition up to 86% when the algorithm flags a statement as a lie). They make more false accusations (18pp increase), but at the same time, the probability of a lie remaining undetected is much lower in this group (36pp decrease). We consider individual motivations for using lie detection algorithms and the social implications of these algorithms.
    Date: 2022–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2212.04277&r=big
  13. By: Chunmeng Yang; Siqi Bu; Yi Fan; Wayne Xinwei Wan; Ruoheng Wang; Aoife Foley
    Abstract: To meet widely recognised carbon neutrality targets, over the last decade metropolitan regions around the world have implemented policies to promote the generation and use of sustainable energy. Nevertheless, there is an availability gap in formulating and evaluating these policies in a timely manner, since sustainable energy capacity and generation are dynamically determined by various factors along dimensions based on local economic prosperity and societal green ambitions. We develop a novel data-driven platform to predict and evaluate energy transition policies by applying an artificial neural network and a technology diffusion model. Using Singapore, London, and California as case studies of metropolitan regions at distinctive stages of energy transition, we show that in addition to forecasting renewable energy generation and capacity, the platform is particularly powerful in formulating future policy scenarios. We recommend global application of the proposed methodology to future sustainable energy transition in smart regions.
    Date: 2022–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2212.07019&r=big
  14. By: Moisés Ramírez; Raziel Ruíz; Nathan Klarer
    Abstract: Fooji Inc. is a social media engagement platform that has created a proprietary "Just-in-time" delivery network to provide prizes to social media marketing campaign participants in real-time. In this paper, we prove the efficacy of the "Just-in-time" delivery network through a cluster analysis that extracts and presents the underlying drivers of campaign engagement. We utilize a machine learning methodology with a principal component analysis to organize Fooji campaigns across these principal components. The arrangement of data across the principal component space allows us to expose underlying trends using a K-means clustering technique. The most important of these trends is the demonstration of how the "Just-in-time" delivery network improves social media engagement.
    Date: 2022–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2212.12285&r=big
  15. By: Patrick Rehill
    Abstract: Methods for learning optimal policies use causal machine learning models to create human-interpretable rules for making choices around the allocation of different policy interventions. However, in realistic policy-making contexts, decision-makers often care about trade-offs between outcomes, not just single-mindedly maximising utility for one outcome. This paper proposes an approach termed Multi-Objective Policy Learning (MOPoL) which combines optimal decision trees for policy learning with a multi-objective Bayesian optimisation approach to explore the trade-off between multiple outcomes. It does this by building a Pareto frontier of non-dominated models for different hyperparameter settings. The key here is that a low-cost surrogate function can be an accurate proxy, in terms of expected regret, for the very computationally costly optimal tree. This surrogate can be fit many times with different hyperparameter values to proxy the performance of the optimal model. The method is applied to a real-world case study of conditional cash transfers in Morocco, where hybrid (partially optimal, partially greedy) policy trees provide good performance as a surrogate for optimal trees while being computationally cheap enough to feasibly fit a Pareto frontier.
    Date: 2022–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2212.06312&r=big
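The Pareto-frontier construction at the heart of the approach — keeping only models not dominated on every outcome — reduces to a simple filter. The helper below is our own illustration (two outcomes, both to be maximised), not MOPoL's implementation.

```python
# Keep the non-dominated points among (outcome1, outcome2) pairs,
# where larger values are better on both outcomes.
def pareto_frontier(points):
    """Return the points not dominated by any other point (maximisation)."""
    frontier = []
    for p in points:
        dominated = any(q != p and q[0] >= p[0] and q[1] >= p[1]
                        for q in points)
        if not dominated:
            frontier.append(p)
    return frontier

# Hypothetical (outcome1, outcome2) scores for five candidate models.
models = [(0.9, 0.1), (0.7, 0.6), (0.4, 0.8), (0.3, 0.5), (0.2, 0.9)]
print(pareto_frontier(models))  # (0.3, 0.5) is dominated by (0.7, 0.6)
```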
  16. By: Pletcher, Scott Nicholas
    Abstract: Video analytics is the practice of combining digital video data with machine learning models to infer various characteristics from that video. This capability has been used for years to detect objects, movement and the number of customers in physical retail stores, but more complex machine learning models combined with more powerful computing have unlocked new levels of possibility. Researchers claim it is now possible to infer a whole host of characteristics about an individual using video analytics–such as specific age, ethnicity, health status and emotional state. Moreover, an individual’s visual identity can be augmented with information from other data providers to build out a detailed profile–all with the individual unknowingly contributing their physical presence in front of a retail store camera. Some retailers have begun to experiment with this new technology as a way to better know their customers. However, those same early adopters are caught in an evolving legal landscape around privacy and data ownership. This research looks into the current legal landscape and the legislation currently in progress around the use of video analytics, specifically in the retail in-store setting. Because the ethical and legal norms around individualized video analytics are still heavily in flux, retailers are urged to adopt a ‘wait-and-see’ approach or potentially incur costly legal expenses and risk damage to their brand.
    Date: 2022–12–06
    URL: http://d.repec.org/n?u=RePEc:osf:osfxxx:tfw96&r=big
  17. By: Claudio Vitari (AMU - Aix Marseille Université, CERGAM - Centre d'Études et de Recherche en Gestion d'Aix-Marseille - AMU - Aix Marseille Université - UTLN - Université de Toulon); Elisabetta Raguseo (Polito - Politecnico di Torino = Polytechnic of Turin); Federico Pigni (EESC-GEM Grenoble Ecole de Management)
    Abstract: Firms adopt Big data solutions, but a body of evidence suggests that Big data in some cases may create more problems than benefits. We hypothesize that the problem may not be Big data in itself but rather too much of it. These kinds of effects echo the Too-Much-of-a-Good-Thing (TMGT) effect in the field of management. This theory also seems meaningful and applicable in management information systems. We contribute to assessments of the TMGT effect related to Big data by providing an answer to the following question: When does the extension of Big data lead to value erosion? We collected data from a sample of medium and large firms and established a set of regression models to test the relationship between Big data and value creation, considering firm size as a moderator. The data confirm the existence of both an inverted U-shaped curve and firm size moderation. These results extend the applicability of the TMGT effect theory and are useful for firms exploring investments in Big data.
    Keywords: Too-Much-of-a-Good-Thing effect, inverted U-shaped curve, Big data, business value, medium and large firms
    Date: 2022
    URL: http://d.repec.org/n?u=RePEc:hal:journl:hal-03876785&r=big
  18. By: Marco Di Francesco; Kevin Kamm
    Abstract: In this paper, we improve the performance of the large basket approximation developed by Reisinger et al. to calibrate Collateralized Debt Obligations (CDO) to iTraxx market data. The iTraxx tranches and index are computed using a basket of size K = 125. In the context of the large basket approximation, it is assumed that this is sufficiently large to approximate it by a limit SPDE describing the portfolio loss of a basket with size K → ∞. For the resulting SPDE, we present four different numerical methods and demonstrate how the Magnus expansion can be applied to efficiently solve the large basket SPDE with high accuracy. Moreover, we calibrate a structural model to the available market data. For this, it is important to efficiently infer the so-called initial distances to default from the Credit Default Swap (CDS) quotes of the constituents of the iTraxx for the large basket approximation. We show how Deep Learning techniques can help us improve the performance of this step significantly. In the end, we obtain a good fit to the market data and develop a highly parallelizable numerical scheme using GPU and multithreading techniques.
    Date: 2022–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2212.12318&r=big
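The basic Magnus idea invoked above — replacing the time-ordered solution of a linear system x'(t) = A(t) x(t) by a single matrix exponential of an integrated generator — can be shown at first order. This is a generic sketch truncated at the first Magnus term Omega_1 = ∫ A(s) ds (assumption: higher-order commutator terms are dropped), not the paper's SPDE scheme.

```python
# First-order Magnus propagation for x'(t) = A(t) x(t):
# x(T) ≈ exp(Omega_1) x(0) with Omega_1 = ∫_0^T A(s) ds (Riemann sum).
import numpy as np
from scipy.linalg import expm

def magnus1_solve(A_func, x0, T, n_quad=200):
    """Propagate x0 to time T using the first Magnus term only."""
    ts = np.linspace(0.0, T, n_quad)
    omega1 = sum(A_func(t) for t in ts) * (T / n_quad)
    return expm(omega1) @ x0

# For A(t) = cos(t) * I all A(t) commute, so the truncation is exact
# up to quadrature error: x(T) = exp(sin(T)) x(0).
A = lambda t: np.eye(2) * np.cos(t)
x_T = magnus1_solve(A, np.array([1.0, 0.0]), T=1.0)
print(x_T)
```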
  19. By: Torre Leonardo; González Eva; Casillas Ramón; Alvarado Jorge
    Abstract: This paper uses, for the first time, information in text format from 9,802 interviews performed between January 2016 and January 2021 as part of the Programa Trimestral de Entrevistas a Directivos, employed to elaborate Banco de Mexico's Regional Economic Report, to estimate regional and national sentiment indexes. These indexes are then associated with different "soft" and "hard" indicators of economic activity. The results show positive and statistically significant correlations between both types of indicators, mainly at the national level, suggesting that data in text format contained in the Programa Trimestral de Entrevistas a Directivos can be useful to complement the information provided by traditional indicators of economic activity.
    Keywords: Sentiment Analysis; Regional Analysis; Mexico
    JEL: C45 R11 R15
    Date: 2022–12
    URL: http://d.repec.org/n?u=RePEc:bdm:wpaper:2022-18&r=big
  20. By: Alireza Jafari; Saman Haratizadeh
    Abstract: Market prediction plays a major role in supporting financial decisions. An emerging approach in this domain is to use graphical modeling and analysis for the prediction of the next market index fluctuations. One important question in this domain is how to construct an appropriate graphical model of the data that can be effectively used by a semi-supervised GNN to predict index fluctuations. In this paper, we introduce a framework called NETpred that generates a novel heterogeneous graph representing multiple related indices and their stocks by using several stock-stock and stock-index relation measures. It then carefully selects a diverse set of representative nodes that cover different parts of the state space and whose price movements are accurately predictable. By assigning initial predicted labels to such a set of nodes, NETpred makes sure that the subsequent GCN model can be successfully trained using a semi-supervised learning process. The resulting model is then used to predict the stock labels, which are finally aggregated to infer the labels for all the index nodes in the graph. Our comprehensive set of experiments shows that NETpred improves the performance of the state-of-the-art baselines by 3%-5% in terms of F-score on different well-known data sets.
    Date: 2022–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2212.05916&r=big
  21. By: Phoebe Koundouri; Konstantinos Dellis
    Abstract: The notion of Human Security has regained traction in the public domain, mostly following the disruptive impact of the global pandemic and the geopolitical tensions in Eastern Europe. The concept, however, was molded during the second half of the twentieth century, as scholars, policy makers and the public became ever more disillusioned with the focus on national security that dominated the public domain. The pressing issues of climate change, health challenges and human rights violations in the 21st century have resulted in elevated policy attention and resources for these issues in the form of targeted reports, concepts, metrics, and empirical and theoretical research. Having said that, the introduction, monitoring and implementation of the SDGs within the UN 2030 Agenda is inherently related to the concept of human security and its components. This paper attempts to briefly summarize the methods and concepts of the reports germane to human security and its classifications. In addition, we provide an initial conceptual mapping of the proposed measures to the Sustainable Development Goals. The process provides ample fodder for future research on the interlinkages between human security measures and all 169 targets within the SDGs using up-to-date machine learning techniques.
    Keywords: Human Security, Sustainable Development Goals, Economic Development
    JEL: F63 I31 O15
    Date: 2022–12–16
    URL: http://d.repec.org/n?u=RePEc:aue:wpaper:2234&r=big
  22. By: Yihang Fu; Zesen Zhuang; Luyao Zhang
    Abstract: Blockchain has empowered computer systems to be more secure using a distributed network. However, the current blockchain design suffers from fairness issues in transaction ordering. Miners are able to reorder transactions to generate profits, the so-called miner extractable value (MEV). Existing research recognizes MEV as a severe security issue and proposes potential solutions, including the prominent Flashbots. However, previous studies have mostly analyzed blockchain data, which might not capture the impacts of MEV in the much broader AI society. Thus, in this research, we applied natural language processing (NLP) methods to comprehensively analyze topics in tweets on MEV. We collected more than 20,000 tweets with the #MEV and #Flashbots hashtags and analyzed their topics. Our results show that the tweets discussed profound topics of ethical concern, including security, equity, emotional sentiments, and the desire for solutions to MEV. We also identify the co-movements of MEV activities on blockchain and social media platforms. Our study contributes to the literature at the interface of blockchain security, MEV solutions, and AI ethics.
    Date: 2022–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2212.06951&r=big
  23. By: Weilin Fu; Zhuoran Li; Yupeng Zhang; Xingyou Zhou
    Abstract: Every financial crisis has dealt a dual shock to the global economy. Shortages of market liquidity, such as defaults on debt and bonds, have led to waves of bankruptcies, such as Lehman Brothers in 2008. Using data for the ETFs of the S&P 500, Nasdaq 100, and Dow Jones Industrial Average collected from Yahoo Finance, this study applies deep learning, neural network, and time-series methods to analyze the trend of the American stock market in the post-COVID-19 period. An LSTM (Long Short-Term Memory) neural network model is used to predict the future trend, and it suggests that the US stock market keeps falling in the post-COVID-19 period. The study also presents a reasonable allocation approach based on Long Short-Term Memory, for which there is strong evidence.
    Date: 2022–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2212.05369&r=big
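The LSTM forecasting described in the abstract above rests on a standard supervised framing of the price series: each training example pairs a fixed window of past prices with the next price. As a minimal sketch (the window length and data here are illustrative, not taken from the paper), the sliding-window preparation looks like:

```python
import numpy as np

def make_windows(prices, lookback):
    """Turn a 1-D price series into supervised (X, y) pairs:
    each row of X holds `lookback` past prices, y the next price."""
    X = np.array([prices[i:i + lookback] for i in range(len(prices) - lookback)])
    y = np.array(prices[lookback:])
    return X, y
```

The resulting X (reshaped to samples × timesteps × features) and y would then be fed to an LSTM layer in any standard deep learning framework.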
  24. By: Pilar Rey del Castillo
    Abstract: The copious amounts of data produced by the increasing datafication of our society are nowadays deemed an opportunity to produce timely and convenient statistical information. This paper describes the construction of economic sentiment indexes from the texts of the most widely read economic newspapers in Spain. The data are collected by scraping the Digital Periodical and Newspaper Library website. To compute the sentiment, an existing emotional lexicon for Spanish words has been customized, allowing the sentiment of words in the texts to be inferred. The resulting indexes are then compared with other well-known indicators that monitor similar or related phenomena.
    Keywords: index numbers, large datasets, leading indicators, proxy variables, sentiment analysis, web scraping
    JEL: C18 C43 C55 C89
    Date: 2022
    URL: http://d.repec.org/n?u=RePEc:ces:ceswps:_10087&r=big
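The lexicon-based scoring the abstract above describes can be sketched in a few lines. The lexicon entries and scores below are illustrative placeholders, not the customized Spanish lexicon the paper actually uses:

```python
# Illustrative word-level sentiment lexicon (scores in [-1, 1]);
# the paper customizes a real Spanish emotional lexicon instead.
LEXICON = {
    "crisis": -1.0, "desempleo": -0.8,
    "crecimiento": 0.9, "recuperacion": 0.7,
}

def sentiment_score(text: str) -> float:
    """Average the lexicon scores of the words found in the text;
    words absent from the lexicon are treated as neutral (ignored)."""
    words = text.lower().split()
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0
```

Aggregating such scores over all articles in a period yields one index value per period, which can then be compared against conventional leading indicators.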
  25. By: Alessandro Gnoatto; Silvia Lavagnini; Athena Picarelli
    Abstract: We present a novel computational approach to quadratic hedging in a high-dimensional incomplete market, covering both mean-variance hedging and local risk minimization. In the first case, the solution is linked to a system of BSDEs, one of which is a backward stochastic Riccati equation (BSRE); in the second case, the solution is related to the Föllmer-Schweizer decomposition and is also linked to a BSDE. We apply a deep neural network-based BSDE solver recursively. This approach lets us solve high-dimensional quadratic hedging problems and obtain the entire paths of the hedging strategies, which would otherwise require solving high-dimensional PDEs. We test our approach on a classical Heston model and on a multi-dimensional generalization of it.
    Date: 2022–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2212.12725&r=big
  26. By: Polina Lemenkova; Olivier Debeir
    Abstract: In this paper, an image analysis framework is formulated for Landsat-8 Operational Land Imager and Thermal Infrared Sensor (OLI/TIRS) scenes using the R programming language. The libraries of R are shown to be effective in remote sensing data processing tasks, such as classification using k-means clustering and computing the Normalized Difference Vegetation Index (NDVI). The data are processed using an integration of the RStoolbox, terra, raster, rgdal and auxiliary packages of R. The proposed approach to image processing using R exploits the parameters of image bands as cues to detect land cover types and vegetation parameters corresponding to the spectral reflectance of the objects represented on the Earth’s surface. Our method is effective at processing time series of images taken at various periods to monitor the landscape dynamics in the middle part of the Congo River basin, Democratic Republic of the Congo (DRC). Whereas previous approaches primarily used Geographic Information System (GIS) software, we propose to explicitly use scripting methods for satellite image analysis by applying the extended functionality of R. Applying scripts to geospatial data is an effective and robust method compared with traditional approaches owing to its high level of automation and machine-based graphical processing. The algorithms of the R libraries are adjusted to spatial operations, such as projections and transformations, object topology, classification and map algebra. The data include Landsat-8 OLI/TIRS scenes covering three regions along the Congo River, Bumba, Basoko and Kisangani, for the years 2013, 2015 and 2022. We also validate the performance of graphical data handling for cartographic visualization using R libraries, visualizing changes in land cover types by k-means clustering and calculating the NDVI for vegetation analysis.
    Keywords: image processing; remote sensing; Landsat; R language; programming; cartography; mapping; data visualization; NDVI; Africa
    JEL: Y91 Q20 Q24 Q23 Q01 R11 O44 O13 Q51 Q55 N57 C61
    Date: 2022–12–07
    URL: http://d.repec.org/n?u=RePEc:ulb:ulbeco:2013/352357&r=big
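Although the paper above works in R (RStoolbox, terra, raster), the NDVI it computes is a simple band ratio: (NIR − Red) / (NIR + Red). For Landsat-8 OLI, the red and near-infrared channels are bands 4 and 5. A minimal NumPy sketch, with the epsilon guard being an implementation choice rather than part of the paper's method:

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).
    `eps` guards against division by zero over water/shadow pixels."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)
```

NDVI ranges from −1 to 1; dense, healthy vegetation reflects strongly in the NIR band and so scores high, while bare soil and water score near or below zero.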
  27. By: Matthias Feiler
    Abstract: The paper addresses the problem of finding the causal direction between two associated variables. The proposed solution is to build an autoencoder of their joint distribution and to maximize its estimation capacity relative to both marginal distributions. It is shown that the resulting two capacities cannot, in general, be equal. This leads to a new criterion for causal discovery: the higher capacity is consistent with the unconstrained choice of a distribution representing the cause, while the lower capacity reflects the constraints imposed by the mechanism on the distribution of the effect. Estimation capacity is defined as the ability of the autoencoder to represent arbitrary datasets. A regularization term forces it to decide which of the variables to model in a more generic way, i.e. with higher model capacity. The causal direction is revealed by the constraints encountered while encoding the data, rather than being measured as a property of the data itself. The idea is implemented and tested using a restricted Boltzmann machine.
    Date: 2022–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2212.04235&r=big
  28. By: Nam, Rachel J.
    Abstract: With open banking, consumers take greater control over their own financial data and share it at their discretion. Using a rich set of loan application data from the largest German FinTech lender in consumer credit, this paper studies what characterizes borrowers who share data and assesses its impact on loan application outcomes. I show that riskier borrowers share data more readily, which subsequently leads to an increase in the probability of loan approval and a reduction in interest rates. The effects hold across all credit risk profiles but are the most pronounced for borrowers with lower credit scores (a higher increase in loan approval rate) and higher credit scores (a larger reduction in interest rate). I also find that standard variables used in credit scoring explain substantially less variation in loan application outcomes when customers share data. Overall, these findings suggest that open banking improves financial inclusion, and also provide policy implications for regulators engaged in the adoption or extension of open banking policies.
    Keywords: Open banking, FinTech, Marketplace lending, P2P lending, Big data, Customer data sharing, Data access, Data portability, Digital footprints
    Date: 2022
    URL: http://d.repec.org/n?u=RePEc:zbw:safewp:364&r=big
  29. By: Artís, Annalí Casanueva (Paris School of Economics); Avetian, Vladimir (Université Paris-Dauphine); Sardoschau, Sulin (Humboldt University Berlin); Saxena, Kavya (affiliation not available)
    Abstract: How do modern social movements broaden their base? Prompted by the viral video footage of George Floyd's murder, the Black Lives Matter (BLM) movement gained unprecedented scope in the spring of 2020. In this paper, we show that pandemic exposure (COVID-19 related deaths) significantly increased the take-up of social media and subsequently mobilized protesters in whiter, more affluent and suburban counties with low ex-ante probability of protesting. We exploit Super Spreader Events in the early stages of the pandemic as a source of plausibly exogenous variation at the county level and develop a novel index of social media penetration, using information from more than 45 million tweets, Google searches and mobility data. We show that a one standard deviation increase in pandemic exposure increased the number of new Twitter accounts by 27% and increased protest propensity by 9 percentage points. Our results suggest that social media can be persuasive and inspire action outside of traditional coalitions.
    Keywords: social media, BLM, protest, COVID-19
    JEL: P16 D7
    Date: 2022–12
    URL: http://d.repec.org/n?u=RePEc:iza:izadps:dp15812&r=big

This nep-big issue is ©2023 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.