nep-big New Economics Papers
on Big Data
Issue of 2019‒01‒21
25 papers chosen by
Tom Coupé
University of Canterbury

  1. What is the Value Added by using Causal Machine Learning Methods in a Welfare Experiment Evaluation? By Anthony Strittmatter
  2. Empirical Asset Pricing via Machine Learning By Shihao Gu; Bryan Kelly; Dacheng Xiu
  3. Can Deep Learning Predict Risky Retail Investors? A Case Study in Financial Risk Behavior Forecasting By Yaodong Yang; Alisa Kolesnikova; Stefan Lessmann; Tiejun Ma; Ming-Chien Sung; Johnnie E. V. Johnson
  4. Deep Learning for Ranking Response Surfaces with Applications to Optimal Stopping Problems By Ruimeng Hu
  5. Artificial Intelligence, Jobs, Inequality and Productivity: Does Aggregate Demand Matter? By Gries, Thomas; Naudé, Wim
  6. Forecasting economic decisions under risk: The predictive importance of choice-process data By Steffen Q. Mueller; Patrick Ring; Maria Schmidt
  7. Learning Policy Levers: Toward Automated Policy Analysis Using Judicial Corpora By Ash, Elliott; Chen, Daniel L.; Delgado, Raul; Fierro, Eduardo; Lin, Shasha
  8. Learning Policy Levers: Toward Automated Policy Analysis Using Judicial Corpora By Ash, Elliott; Chen, Daniel L.; Delgado, Raul; Fierro, Eduardo; Lin, Shasha
  9. Data is Different: Why the World Needs a New Approach to Governing Cross-border Data Flows By Susan Aaronson
  10. Data Minefield: How AI is Prodding Governments to Rethink Trade in Data By Susan Aaronson
  11. Judicial Analytics and the Great Transformation of American Law By Chen, Daniel L.
  12. Patterns of domestic and cross-border e-commerce in Spain: A gravitational model approach By Hicham Ganga; Javier Alonso; Vincenzo Spiezia; Jan Tscheke
  13. Top Lights - Bright Cities and their Contribution to Economic Development By Richard Bluhm; Melanie Krause
  14. The logical evolution of internet governance policy in China from 1994-2017: A computational textual analysis approach By Wu, Jun
  15. In Search of Information: Use of Google Trends’ Data to Narrow Information Gaps for Low-income Developing Countries By Futoshi Narita; Rujun Yin
  16. Double Deep Q-Learning for Optimal Execution By Brian Ning; Franco Ho Ting Ling; Sebastian Jaimungal
  17. Bright Investments: Measuring the Impact of Transport Infrastructure Using Luminosity Data in Haiti By Mitnik, Oscar A.; Sanchez, Raul; Yanez-Pagans, Patricia
  18. End-of-Conflict Deforestation: Evidence from Colombia's Peace Agreement By Prem, M; Saavedra, S; Vargas, J.F
  19. News on Fake News – Media Portrayals of Fake News by Japanese News Media By Cheng, John W.; Mitomo, Hitoshi
  20. Automated Classification of Modes of Moral Reasoning in Judicial Decisions By Ash, Elliott; Chen, Daniel L.; Mainali, Nischal; Meier, Liam
  21. Attorney Voice and the U.S. Supreme Court By Chen, Daniel L.
  22. Machine Learning Estimation of Heterogeneous Causal Effects: Empirical Monte Carlo Evidence By Knaus, Michael C.; Lechner, Michael; Strittmatter, Anthony
  23. Investor Sentiment and Attention in Capital Markets - A (Social) Media Perspective By Ton, Quoc-Thai
  24. To (Psychologically) Own Data is to Protect Data: How Psychological Ownership Determines Protective Behavior in a Work and Private Context By Heidt, Margareta; Olt, Christian; Buxmann, Peter
  25. The anti-competition measures and policy remedies in the data economy By Chou, Yuntsai

  1. By: Anthony Strittmatter
    Abstract: I investigate causal machine learning (CML) methods to estimate effect heterogeneity by means of conditional average treatment effects (CATEs). In particular, I study whether the estimated effect heterogeneity can provide evidence for the theoretical labour supply predictions of Connecticut's Jobs First welfare experiment. For this application, Bitler, Gelbach, and Hoynes (2017) show that standard CATE estimators fail to provide evidence for theoretical labour supply predictions. Therefore, this is an interesting benchmark to showcase the value added by using CML methods. I report evidence that the CML estimates of CATEs provide support for the theoretical labour supply predictions. Furthermore, I document some reasons why standard CATE estimators fail to provide evidence for the theoretical predictions. However, I show the limitations of CML methods that prevent them from identifying all the effect heterogeneity of Jobs First.
    Date: 2018–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1812.06533&r=all
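    Illustration: a minimal two-model ("T-learner") sketch of CATE estimation in Python. The paper's causal machine learning estimators are considerably more elaborate; the simulated data and the scikit-learn model choice below are assumptions made purely for illustration.

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      rng = np.random.default_rng(0)
      n = 5000
      X = rng.normal(size=(n, 5))             # covariates (e.g., earnings history)
      d = rng.integers(0, 2, size=n)          # randomized treatment assignment
      tau = 0.5 * X[:, 0]                     # true heterogeneous effect
      y = X[:, 1] + tau * d + rng.normal(size=n)

      # Fit separate outcome models for treated and controls; their
      # difference at each covariate point estimates the CATE.
      m1 = RandomForestRegressor(n_estimators=200).fit(X[d == 1], y[d == 1])
      m0 = RandomForestRegressor(n_estimators=200).fit(X[d == 0], y[d == 0])
      cate_hat = m1.predict(X) - m0.predict(X)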
  2. By: Shihao Gu; Bryan Kelly; Dacheng Xiu
    Abstract: We synthesize the field of machine learning with the canonical problem of empirical asset pricing: measuring asset risk premia. In the familiar empirical setting of cross section and time series stock return prediction, we perform a comparative analysis of methods in the machine learning repertoire, including generalized linear models, dimension reduction, boosted regression trees, random forests, and neural networks. At the broadest level, we find that machine learning offers an improved description of expected return behavior relative to traditional forecasting methods. Our implementation establishes a new standard for accuracy in measuring risk premia summarized by an unprecedented out-of-sample return prediction R2. We identify the best performing methods (trees and neural nets) and trace their predictive gains to allowance of nonlinear predictor interactions that are missed by other methods. Lastly, we find that all methods agree on the same small set of dominant predictive signals that includes variations on momentum, liquidity, and volatility. Improved risk premia measurement through machine learning can simplify the investigation into economic mechanisms of asset pricing and justifies its growing role in innovative financial technologies.
    JEL: C45 C58 G11 G12
    Date: 2018–12
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:25398&r=all
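    Illustration: the out-of-sample predictive R2 that headlines results like these can be computed as below. Not demeaning returns in the denominator is a common convention in this literature, but treating it as the paper's exact metric is an assumption.

      import numpy as np

      def r2_oos(r_true, r_pred):
          # 1 - SSE(model) / SSE(zero forecast); returns not demeaned
          return 1.0 - np.sum((r_true - r_pred) ** 2) / np.sum(r_true ** 2)

      r = np.array([0.02, -0.01, 0.03])       # realized excess returns
      rhat = np.array([0.01, 0.00, 0.02])     # model forecasts
      print(r2_oos(r, rhat))                  # positive = beats a zero forecast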
  3. By: Yaodong Yang; Alisa Kolesnikova; Stefan Lessmann; Tiejun Ma; Ming-Chien Sung; Johnnie E. V. Johnson
    Abstract: The success of deep learning for unstructured data analysis is well documented, but little evidence has emerged related to the structured, tabular datasets used in decision support. We address this research gap by considering the potential of deep learning to support financial risk management. In particular, we develop a deep learning model for predicting whether individual spread traders are likely to secure profits from future trades. This embodies typical modeling challenges faced in risk and behavior forecasting. Conventional machine learning requires data that is representative of the feature-target relationship and relies on the often costly development, maintenance, and revision of handcrafted features. Consequently, modeling highly variable, heterogeneous patterns such as the behavior of traders is challenging. Deep learning promises a remedy. By learning hierarchical distributed representations of the raw data in an automatic manner (e.g., risk-taking behavior), it uncovers generative features that determine the target (e.g., a trader's profitability), avoids manual feature engineering, and is more robust toward change (e.g., dynamic market conditions). The results of employing a deep network for operational risk forecasting confirm the feature-learning capability of deep learning, provide guidance on designing a suitable network architecture, and demonstrate the superiority of deep learning over powerful machine learning benchmarks. Empirical results suggest that the financial institution that provided the data could increase annual profits by 16% by implementing a deep learning based risk management policy. The findings demonstrate the potential of applying deep learning methods to management science problems in finance, marketing, and accounting.
    Date: 2018–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1812.06175&r=all
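    Illustration: a small feedforward network for tabular risk features, sketched in PyTorch. The architecture, feature count, and training loop are invented stand-ins, not the network described in the paper.

      import torch
      import torch.nn as nn

      net = nn.Sequential(                    # toy tabular classifier
          nn.Linear(40, 64), nn.ReLU(),       # 40 behavioral/market features
          nn.Linear(64, 32), nn.ReLU(),
          nn.Linear(32, 1),                   # logit: P(future trades profitable)
      )
      loss_fn = nn.BCEWithLogitsLoss()
      opt = torch.optim.Adam(net.parameters(), lr=1e-3)

      X = torch.randn(256, 40)                # a batch of trader feature vectors
      y = torch.randint(0, 2, (256, 1)).float()
      for _ in range(100):                    # toy training loop
          opt.zero_grad()
          loss = loss_fn(net(X), y)
          loss.backward()
          opt.step()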
  4. By: Ruimeng Hu
    Abstract: In this paper, we propose deep learning algorithms for ranking response surfaces, with applications to optimal stopping problems in financial mathematics. The problem of ranking response surfaces is motivated by estimating optimal feedback policy maps in stochastic control problems, aiming to efficiently find the index associated with the minimal response across the entire continuous input space $\mathcal{X} \subseteq \mathbb{R}^d$. By considering points in $\mathcal{X}$ as pixels and indices of the minimal surfaces as labels, we recast the problem as an image segmentation problem, which assigns a label to every pixel in an image such that pixels with the same label share certain characteristics. This provides an alternative method for efficiently solving the problem instead of using sequential design as in our previous work [R. Hu and M. Ludkovski, SIAM/ASA Journal on Uncertainty Quantification, 5 (2017), 212--239]. Deep learning algorithms are scalable, parallel, and model-free, i.e., no parametric assumptions are needed on the response surfaces. Considering ranking response surfaces as image segmentation allows one to use a broad class of deep neural networks, e.g., UNet, SegNet, and DeconvNet, which have been widely applied and numerically shown to achieve high accuracy in the field. We also systematically study the dependence of deep learning algorithms on the input data generated on uniform grids or by sequential design sampling, and observe that the performance of deep learning is not sensitive to the noise in, or locations (close to or away from boundaries) of, the training data. We present a few examples, including synthetic ones and the Bermudan option pricing problem, to show the efficiency and accuracy of this method.
    Date: 2019–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1901.03478&r=all
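    Illustration of the recasting step described in the abstract: evaluate K response surfaces on a grid and label each grid point ("pixel") with the index of the minimal surface, producing the segmentation image a UNet-style network would be trained on. The surfaces below are synthetic stand-ins.

      import numpy as np

      g = np.linspace(0, 1, 128)
      xx, yy = np.meshgrid(g, g)              # 2-D input space X
      surfaces = np.stack([                   # K = 3 response surfaces
          xx ** 2 + yy ** 2,
          (xx - 1) ** 2 + yy ** 2,
          xx ** 2 + (yy - 1) ** 2,
      ])
      noisy = surfaces + 0.05 * np.random.default_rng(0).normal(size=surfaces.shape)
      labels = noisy.argmin(axis=0)           # 128x128 "segmentation" label image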
  5. By: Gries, Thomas (University of Paderborn); Naudé, Wim (Maastricht University)
    Abstract: Rapid technological progress in artificial intelligence (AI) has been predicted to lead to mass unemployment, rising inequality, and higher productivity growth through automation. In this paper we critically re-assess these predictions by (i) surveying the recent literature and (ii) incorporating AI-facilitated automation into a product-variety model, frequently used in endogenous growth theory, but modified to allow for demand-side constraints. This is a novel approach, given that endogenous growth models, including most recent work on AI in economic growth, are largely supply-driven. Our contribution is motivated by two observations. One is that there are still only very few theoretical models of economic growth that incorporate AI, and moreover an absence of growth models with AI that take into consideration growth constraints due to insufficient aggregate demand. The second is that the predictions of AI causing massive job losses and faster growth in productivity and GDP are at odds with reality so far: if anything, unemployment in many advanced economies is historically low, while wage growth and productivity are stagnating and inequality is rising. Our paper provides a theoretical explanation of this in the context of rapid progress in AI.
    Keywords: technology, artificial intelligence, productivity, labour demand, innovation, growth theory
    JEL: O47 O33 J24 E21 E25
    Date: 2018–11
    URL: http://d.repec.org/n?u=RePEc:iza:izadps:dp12005&r=all
  6. By: Steffen Q. Mueller (Chair for Economic Policy, University of Hamburg); Patrick Ring (Social and Behavioral Approaches to Global Problems, Kiel Institute for the World Economy); Maria Schmidt (Department of Psychology, Kiel University)
    Abstract: We investigate various statistical methods for forecasting risky choices and identify important decision predictors. Subjects (n=44) are presented with a series of 50/50 gambles, each involving a potential gain and a potential loss, and can choose either to accept or to reject a displayed lottery. From these data, we use information on 8,800 individual lottery gambles and specify four predictor sets that include different combinations of input categories: lottery design, socioeconomic characteristics, past gambling behavior, eye movements, and various psychophysiological measures recorded during the first three seconds of lottery-information processing. The results of our forecasting experiment show that choice-process data can effectively be used to forecast risky gambling decisions; however, we find large differences among models' forecasting capabilities with respect to subjects, predictor sets, and lottery payoff structures.
    Keywords: Forecasting, lottery, risk, choice-process tracing, experiments, machine learning, decision theory
    JEL: C44 C45 C53 D87 D91
    Date: 2019–01–11
    URL: http://d.repec.org/n?u=RePEc:hce:wpaper:066&r=all
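    Illustration: the core of such a forecasting comparison is fitting the same classifier on nested predictor sets and comparing held-out accuracy. All column names and the simulated data below are invented for the sketch.

      import numpy as np
      import pandas as pd
      from sklearn.ensemble import GradientBoostingClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(1)
      n = 1000                                # one row per lottery decision
      df = pd.DataFrame({
          "gain": rng.uniform(1, 10, n), "loss": rng.uniform(1, 10, n),
          "age": rng.integers(18, 70, n), "income": rng.uniform(1e3, 1e4, n),
          "fixation_gain_ms": rng.uniform(0, 1500, n),
          "pupil_dilation": rng.normal(0, 1, n),
      })
      df["accept"] = (df["gain"] > df["loss"]).astype(int)

      predictor_sets = {
          "lottery design":   ["gain", "loss"],
          "+ socioeconomic":  ["gain", "loss", "age", "income"],
          "+ choice process": ["gain", "loss", "age", "income",
                               "fixation_gain_ms", "pupil_dilation"],
      }
      for name, cols in predictor_sets.items():
          acc = cross_val_score(GradientBoostingClassifier(),
                                df[cols], df["accept"], cv=5).mean()
          print(name, round(acc, 3))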
  7. By: Ash, Elliott; Chen, Daniel L.; Delgado, Raul; Fierro, Eduardo; Lin, Shasha
    Abstract: To build inputs for end-to-end machine learning estimates of the causal impacts of law, we consider the problem of automatically classifying cases by their policy impact. We propose and implement a semi-supervised multi-class learning model, with the training set being a hand-coded dataset of thousands of cases in over 20 politically salient policy topics. Using opinion text features as a set of predictors, our model can classify labeled cases by topic correctly 91% of the time. We then take the model to the broader set of unlabeled cases and show that it can identify new groups of cases by shared policy impact.
    Date: 2018–08
    URL: http://d.repec.org/n?u=RePEc:tse:iastwp:33154&r=all
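    Illustration: a minimal semi-supervised text-classification sketch in the spirit of the abstract: TF-IDF features, hand-labeled cases plus unlabeled cases marked -1, and self-training to spread labels. The model choice and toy sentences are assumptions, not the authors' implementation.

      import numpy as np
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.semi_supervised import SelfTrainingClassifier

      docs = ["officers searched the vehicle without a warrant",
              "the permit violates the clean air act",
              "the search exceeded the scope of the warrant",
              "emissions limits apply to stationary sources"]
      y = np.array([0, 1, -1, -1])            # 0/1 hand-coded topics, -1 unlabeled

      X = TfidfVectorizer().fit_transform(docs)
      clf = SelfTrainingClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
      print(clf.predict(X[2:]))               # predicted topics for unlabeled cases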
  8. By: Ash, Elliott; Chen, Daniel L.; Delgado, Raul; Fierro, Eduardo; Lin, Shasha
    Abstract: To build inputs for end-to-end machine learning estimates of the causal impacts of law, we consider the problem of automatically classifying cases by their policy impact. We propose and implement a semi-supervised multi-class learning model, with the training set being a hand-coded dataset of thousands of cases in over 20 politically salient policy topics. Using opinion text features as a set of predictors, our model can classify labeled cases by topic correctly 91% of the time. We then take the model to the broader set of unlabeled cases and show that it can identify new groups of cases by shared policy impact.
    Date: 2018–08
    URL: http://d.repec.org/n?u=RePEc:tse:wpaper:33153&r=all
  9. By: Susan Aaronson (George Washington University)
    Abstract: Companies, governments, and individuals are using data to create new services such as apps, artificial intelligence (AI) and the internet of things (IoT). These data-driven services rely on large pools of data and a relatively unhindered flow of data across borders (few market access or governance barriers). The current approach to governing cross-border data flows through trade agreements has not led to binding, universal, or interoperable rules governing the use of data. Trade diplomats first established principles to govern cross-border data flows and then drafted e-commerce language in free trade agreements, rather than working through the WTO, the most international of trade arrangements. Data-driven services, however, will require a different domestic and international regulatory environment than the one developed to facilitate e-commerce. Most countries with significant numbers of data-driven firms are in the process of debating how to regulate these services and the data that underpins them. I argue that policymakers must devise a more effective approach to regulating trade in data for four reasons: the unique nature of data as an item exchanged across borders; the sheer volume of data exchanged; the fact that much of the data exchanged across borders is personal data; and the fact that although data could be a significant source of growth, many developing countries are unprepared to participate in this new data-driven economy and to build new data-driven services. The article begins with an overview and then describes how trade in data is different from trade in goods or services. It then examines analogies used to describe data as an input, which can help us understand how data could be regulated. Next, it discusses how trade policymakers are regulating trade in data and how these efforts have created a patchwork. Finally, it suggests an alternative approach.
    Keywords: data, digital trade, AI, internet, trade, FTA, WTO
    JEL: F1 F5
    Date: 2018–10
    URL: http://d.repec.org/n?u=RePEc:gwi:wpaper:2018-10&r=all
  10. By: Susan Aaronson (George Washington University)
    Abstract: • No nation alone can regulate artificial intelligence (AI), because AI is built on cross-border data flows.
    • Countries are just beginning to figure out how best to use and to protect the various types of data used in AI, whether proprietary, personal, public or metadata.
    • Countries could alter comparative advantage in data through various approaches to regulating data — for example, requiring companies to pay for personal data.
    • Canada should carefully monitor and integrate its domestic regulatory and trade strategies related to data utilized in AI.
    Keywords: AI, trade, FTA, WTO, internet, data
    JEL: F1 F5
    Date: 2018–11
    URL: http://d.repec.org/n?u=RePEc:gwi:wpaper:2018-11&r=all
  11. By: Chen, Daniel L.
    Abstract: Predictive judicial analytics holds the promise of increasing the efficiency and fairness of law. Judicial analytics can assess extra-legal factors that influence decisions. Behavioral anomalies in judicial decision-making offer an intuitive understanding of feature relevance, which can then be used for debiasing the law. A conceptual distinction between inter-judge disparities in predictions and inter-judge disparities in prediction accuracy suggests another normatively relevant criterion with regard to fairness. Predictive analytics can also be used in the first step of causal inference, where the features employed in the first step are exogenous to the case. Machine learning thus offers an approach to assess bias in the law and to evaluate theories about the potential consequences of legal change.
    Keywords: Judicial Analytics; Causal Inference; Behavioral Judging
    Date: 2018–12
    URL: http://d.repec.org/n?u=RePEc:tse:iastwp:33148&r=all
  12. By: Hicham Ganga; Javier Alonso; Vincenzo Spiezia; Jan Tscheke
    Abstract: This paper presents econometric evidence on the determinants of domestic and cross-border e-commerce in Spain based on BBVA anonymised data. The paper applies the gravity model of trade to explain online credit card payment flows, using all private customer transactions of BBVA for Spain.
    Keywords: Working Paper, Consumption, Digital economy, Analysis with Big Data, Spain
    JEL: B22 F41 L81
    Date: 2018–12
    URL: http://d.repec.org/n?u=RePEc:bbv:wpaper:1818&r=all
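    Illustration: the gravity model in its standard log-linear form, log(flow) = b0 + b1*log(mass_origin) + b2*log(mass_destination) + b3*log(distance), fitted here on simulated flows. Variable names and data are illustrative, not the BBVA specification.

      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(2)
      n = 500
      df = pd.DataFrame({
          "mass_o": rng.uniform(1, 100, n),   # origin economic mass
          "mass_d": rng.uniform(1, 100, n),   # destination economic mass
          "dist": rng.uniform(1, 1000, n),    # pairwise distance (km)
      })
      df["flow"] = df.mass_o * df.mass_d / df.dist * rng.lognormal(0, 0.3, n)

      fit = smf.ols("np.log(flow) ~ np.log(mass_o) + np.log(mass_d) + np.log(dist)",
                    data=df).fit()
      print(fit.params)                       # distance elasticity near -1 here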
  13. By: Richard Bluhm; Melanie Krause
    Abstract: The commonly-used satellite images of nighttime lights fail to capture the true brightness of most cities. We show that night lights are a reliable proxy for economic activity at the city level, provided they are first corrected for top-coding. We present a stylized model of urban luminosity and empirical evidence which both suggest that these ‘top lights’ follow a Pareto distribution. We then propose a simple correction procedure which recovers the full distribution of city lights. Applying this approach to cities in Sub-Saharan Africa, we find that primate cities are outgrowing secondary cities but are changing from within.
    Keywords: development, urban growth, night lights, top-coding, inequality
    JEL: O10 O18 R11 R12
    Date: 2018
    URL: http://d.repec.org/n?u=RePEc:ces:ceswps:_7411&r=all
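    Illustration: the essence of a Pareto top-coding correction: estimate the tail exponent from bright but uncensored pixels, then replace pixels stuck at the sensor cap (DN = 63 for DMSP sensors) with the Pareto conditional mean above the cap. The paper's procedure is richer; this sketch and its numbers are illustrative.

      import numpy as np

      rng = np.random.default_rng(3)
      alpha, xm, cap = 2.5, 30.0, 63.0
      lights = xm * (1 + rng.pareto(alpha, 10000))    # "true" luminosity
      censored = np.minimum(lights, cap)              # what the satellite reports

      tail = censored[(censored >= xm) & (censored < cap)]
      alpha_hat = tail.size / np.log(tail / xm).sum() # Hill-type tail estimate
      fix = cap * alpha_hat / (alpha_hat - 1.0)       # E[X | X > cap] for a Pareto
      corrected = np.where(censored >= cap, fix, censored)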
  14. By: Wu, Jun
    Abstract: As the Chinese government gives full play to the role of digital technology in driving and leading the development of the economy and society, policy issues in China's internet governance have attracted wide attention from academics and practitioners. Grounded in a unique textual dataset collected from China's official law and regulation database, comprising over 300 central and local government laws and regulations on internet governance released between 1994 and 2017, this study investigates the evolution of central policy topics, policy subjects and policy orientation over that period. The paper contributes to the existing body of knowledge not only by uncovering the evolutionary logic hidden in large swaths of policy text but also by introducing a computational text analysis approach to facilitate text-oriented policy evaluation.
    Keywords: Internet governance; regulation policy; China; topic modeling
    Date: 2018
    URL: http://d.repec.org/n?u=RePEc:zbw:itsb18:190425&r=all
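    Illustration: the kind of computational text analysis described here can be sketched as an LDA topic model over the policy corpus, with per-document topic shares then tracked by year of release. The toy documents below are invented.

      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.decomposition import LatentDirichletAllocation

      docs = ["measures for the administration of internet information services",
              "regulation on protecting critical information infrastructure",
              "provisions on the administration of online publishing services",
              "notice on strengthening the management of network security"]
      X = CountVectorizer(stop_words="english").fit_transform(docs)
      lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
      doc_topics = lda.transform(X)           # per-document topic shares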
  15. By: Futoshi Narita; Rujun Yin
    Abstract: Timely data availability is a long-standing challenge in policy-making and analysis for low-income developing countries. This paper explores the use of Google Trends’ data to narrow such information gaps and finds that online search frequencies about a country significantly correlate with macroeconomic variables (e.g., real GDP, inflation, capital flows), conditional on other covariates. The correlation with real GDP is stronger than that for nighttime lights, whereas the opposite holds for emerging market economies. The search frequencies also improve out-of-sample forecasting performance, albeit slightly, demonstrating their potential to facilitate timely assessments of economic conditions in low-income developing countries.
    Date: 2018–12–14
    URL: http://d.repec.org/n?u=RePEc:imf:imfwpa:18/286&r=all
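    Illustration: a stylized version of the forecasting exercise: compare out-of-sample errors of a simple autoregressive benchmark with and without a search-frequency index as an extra regressor. The data here is simulated; the paper uses actual Google Trends series.

      import numpy as np
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(4)
      T = 80
      trends = rng.normal(size=T)                     # search-frequency index
      gdp = 0.4 * trends + rng.normal(scale=0.8, size=T)
      lag = np.roll(gdp, 1); lag[0] = 0.0             # AR(1) baseline regressor

      split = 60
      for name, X in [("AR only", lag[:, None]),
                      ("AR + Trends", np.column_stack([lag, trends]))]:
          m = LinearRegression().fit(X[:split], gdp[:split])
          rmse = np.sqrt(np.mean((gdp[split:] - m.predict(X[split:])) ** 2))
          print(name, round(rmse, 3))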
  16. By: Brian Ning; Franco Ho Ting Ling; Sebastian Jaimungal
    Abstract: Optimal trade execution is an important problem faced by essentially all traders. Much research into optimal execution uses stringent model assumptions and applies continuous-time stochastic control to solve the resulting problems. Here, we instead take a model-free approach and develop a variation of Deep Q-Learning to estimate the optimal actions of a trader. The model is a fully connected neural network trained using experience replay and Double DQN, with input features given by the current state of the limit order book, other trading signals, and available execution actions, while the output is the Q-value function estimating the future rewards under an arbitrary action. We apply our model to nine different stocks and find that it outperforms the standard benchmark approach on most of them, as measured by (i) mean and median outperformance, (ii) probability of outperformance, and (iii) gain-loss ratios.
    Date: 2018–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1812.06600&r=all
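    Illustration: the defining step of Double DQN is that the online network selects the next action while the target network evaluates it. A PyTorch sketch of the target computation follows; the state contents (limit-order-book features, signals) and shapes are assumptions.

      import torch

      def double_dqn_target(r, s_next, done, q_online, q_target, gamma=0.99):
          # q_online, q_target: networks mapping states to Q-values per action
          with torch.no_grad():
              a_star = q_online(s_next).argmax(dim=1, keepdim=True)   # select
              q_eval = q_target(s_next).gather(1, a_star).squeeze(1)  # evaluate
              return r + gamma * (1.0 - done) * q_eval

      # Training regresses q_online(s).gather(1, a) toward this target,
      # which curbs the overestimation bias of vanilla Q-learning.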
  17. By: Mitnik, Oscar A. (Inter-American Development Bank); Sanchez, Raul (IDB Invest); Yanez-Pagans, Patricia (IDB Invest)
    Abstract: This paper quantifies the impacts of transport infrastructure investments on economic activity in Haiti, using satellite night-light luminosity as a proxy measure. Our identification strategy exploits the differential timing of rehabilitation projects across various road segments of the primary road network. We combine multiple sources of non-traditional data and carefully address concerns related to unobserved heterogeneity. The results obtained across multiple specifications consistently indicate that receiving a road rehabilitation project leads to an increase in luminosity values of between 6% and 26% at the communal-section level. Taking into account the national-level elasticity between luminosity values and GDP, we approximate that these interventions translate into communal-section GDP increases of between 0.5% and 2.1% for the communal sections benefited by a transport infrastructure project. We observe temporal and spatial variation in the results; crucially, the larger impacts appear once projects are completed and are concentrated within 2 km buffers around the intervened roads. Neither the richest nor the poorest communities reap the benefits from road improvements, with gains accruing to those in the middle of the ranking of communal sections based on unsatisfied basic needs. Our findings provide novel evidence on the role of transport investments in promoting economic activity in developing countries.
    Keywords: Haiti, night-time luminosity, road investments
    JEL: O1 O47 R4 D04
    Date: 2018–12
    URL: http://d.repec.org/n?u=RePEc:iza:izadps:dp12018&r=all
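    A back-of-the-envelope reading of the numbers above: with a luminosity-to-GDP elasticity eps, the percentage GDP change is roughly eps times the percentage luminosity change, and the reported figures imply eps of roughly 0.08 (an inference from the abstract's numbers, not a value it states).

      for dlum, dgdp in [(6.0, 0.5), (26.0, 2.1)]:         # reported % effects
          print(f"implied elasticity: {dgdp / dlum:.3f}")  # ~0.083 and ~0.081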
  18. By: Prem, M; Saavedra, S; Vargas, J.F
    Abstract: Armed conflict can endanger natural resources through several channels such as direct predation from fighting groups, but it may also help preserve ecosystems by dissuading extractive economic activities through the fear of extortion. The effect of conflict on deforestation is thus an empirical question. This paper studies the effect on forest cover of Colombia’s recent peace negotiation between the central government and the FARC insurgency. Using yearly deforestation data from satellite images and a difference-in-differences identification strategy, we show that areas controlled by FARC prior to the declaration of a permanent ceasefire that ultimately led to a peace agreement experienced a differential increase in deforestation after the start of the ceasefire. The deforestation effect of peace is attenuated in municipalities with higher state capacity, and is exacerbated by land intensive economic activities. Our results highlight the importance of complementing peacemaking milestones with state building efforts to avoid environmental damage.
    Keywords: Deforestation, Conflict, Peace building, Colombia
    JEL: D74 Q34
    Date: 2018–12–27
    URL: http://d.repec.org/n?u=RePEc:col:000092:017068&r=all
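    Illustration: a stylized version of the difference-in-differences design: deforestation regressed on (FARC presence x post-ceasefire) with municipality and year fixed effects. Simulated data; the paper's controls and inference are richer.

      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(5)
      df = pd.DataFrame([(m, t) for m in range(200) for t in range(2010, 2018)],
                        columns=["muni", "year"])
      df["farc"] = (df.muni < 80).astype(int)     # FARC-controlled before ceasefire
      df["post"] = (df.year >= 2015).astype(int)  # permanent ceasefire onward
      df["deforest"] = 0.3 * df.farc * df.post + rng.normal(size=len(df))

      fit = smf.ols("deforest ~ farc:post + C(muni) + C(year)", data=df).fit()
      print(fit.params["farc:post"])              # the DiD estimate (about 0.3)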
  19. By: Cheng, John W.; Mitomo, Hitoshi
    Abstract: This study quantitatively examines how the term 'fake news' is portrayed by the Japanese news media, using semantic network analysis. It focuses on newspapers as representative outlets, as they remain among the most influential news media in Japan. The dataset consists of 624 newspaper articles containing the word 'fake news' in Japanese and its equivalents, extracted from the five national Japanese newspapers between 2015 and 2017. The analysis reveals six main themes within the articles. They show that fake news is mainly portrayed as an American problem, chiefly associated with 'news about the US President,' 'the Trump-Russian inquiry,' and the 'media reportage of the US President.' On top of that, fake news is also portrayed as an 'informational problem' that affects society through 'human-Internet interaction,' and it carries some 'implications for Japan' as well.
    Keywords: fake news, media portrayal, news media, content analysis, semantic network analysis
    Date: 2018
    URL: http://d.repec.org/n?u=RePEc:zbw:itsb18:190384&r=all
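    Illustration: semantic network analysis in miniature: nodes are terms, edges are weighted by within-article co-occurrence, and themes appear as densely connected clusters. The one-line "articles" are invented stand-ins.

      import itertools
      import networkx as nx
      from sklearn.feature_extraction.text import CountVectorizer

      articles = ["president inquiry fake news",
                  "media reportage president fake news",
                  "internet society informational problem"]
      vec = CountVectorizer()
      X = (vec.fit_transform(articles) > 0).astype(int)
      cooc = (X.T @ X).toarray()                  # term-by-term co-occurrence
      terms = vec.get_feature_names_out()

      G = nx.Graph()
      for i, j in itertools.combinations(range(len(terms)), 2):
          if cooc[i, j] > 0:
              G.add_edge(terms[i], terms[j], weight=int(cooc[i, j]))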
  20. By: Ash, Elliott; Chen, Daniel L.; Mainali, Nischal; Meier, Liam
    Abstract: What modes of moral reasoning do judges employ? We construct a linear SVM classifier for modes of moral reasoning, trained on applied ethics articles written by consequentialists and deontologists. The model can classify a paragraph of text in held-out data with over 90 percent accuracy. We then apply this classifier to a corpus of circuit court opinions. We show that the use of consequentialist reasoning has increased over time. We report rankings of the relative use of reasoning modes by legal topic, by judge, and by judges' law school.
    Date: 2018–12
    URL: http://d.repec.org/n?u=RePEc:tse:iastwp:33158&r=all
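    Illustration: the classifier described is a linear SVM over text features; a minimal scikit-learn version with invented training sentences (the authors train on full applied-ethics articles):

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import LinearSVC

      texts = ["the rule maximizes aggregate welfare and overall utility",
               "weighing social costs against benefits favors this outcome",
               "the act violates a duty owed regardless of consequences",
               "rights must be respected whatever the outcome"]
      labels = ["consequentialist", "consequentialist",
                "deontological", "deontological"]

      clf = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(texts, labels)
      print(clf.predict(["the court weighs the costs of the rule"]))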
  21. By: Chen, Daniel L.
    Abstract: Using data from 1946–2014, we show that audio features of lawyers’ introductory statements improve the performance of the best prediction models of Supreme Court outcomes. We infer voice attributes using a 15-year sample of human-labeled Supreme Court advocate voices. Audio features improved prediction of case outcomes by 1.1 percentage points. Lawyer traits receive approximately half the weight of the most important feature from the models without audio features.
    Date: 2018–12
    URL: http://d.repec.org/n?u=RePEc:tse:iastwp:33156&r=all
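    Illustration: one simple way to turn an advocate's recorded introduction into model inputs is to append summary statistics of spectral and pitch features to the case feature vector. The paper instead infers human-labeled voice attributes; this librosa featurization and the file name are assumptions.

      import numpy as np
      import librosa

      y, sr = librosa.load("oral_argument_intro.wav", sr=16000)  # hypothetical file
      mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)         # spectral features
      f0 = librosa.yin(y, fmin=60, fmax=300, sr=sr)              # pitch track
      audio_features = np.concatenate([mfcc.mean(axis=1),
                                       [np.nanmean(f0), np.nanstd(f0)]])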
  22. By: Knaus, Michael C. (University of St. Gallen); Lechner, Michael (University of St. Gallen); Strittmatter, Anthony (University of St. Gallen)
    Abstract: We investigate the finite sample performance of causal machine learning estimators for heterogeneous causal effects at different aggregation levels. We employ an Empirical Monte Carlo Study that relies on arguably realistic data generation processes (DGPs) based on actual data. We consider 24 different DGPs, eleven different causal machine learning estimators, and three aggregation levels of the estimated effects. In the main DGPs, we allow for selection into treatment based on a rich set of observable covariates. We provide evidence that the estimators can be categorized into three groups. The first group performs consistently well across all DGPs and aggregation levels. These estimators have multiple steps to account for the selection into the treatment and the outcome process. The second group shows competitive performance only for particular DGPs. The third group is clearly outperformed by the other estimators.
    Keywords: causal machine learning, conditional average treatment effects, selection-on-observables, random forest, causal forest, lasso
    JEL: C21
    Date: 2018–12
    URL: http://d.repec.org/n?u=RePEc:iza:izadps:dp12039&r=all
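    Illustration: a stylized Empirical Monte Carlo step: keep covariates, simulate selection from a propensity score and outcomes with a known effect, then score an estimator against that truth. Everything below is simulated; the study's 24 DGPs are built from actual data.

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      rng = np.random.default_rng(6)
      X = rng.normal(size=(4000, 10))             # stand-in for real covariates
      p = 1.0 / (1.0 + np.exp(-X[:, 0]))          # "estimated" propensity score
      d = rng.binomial(1, p)                      # simulated selection into treatment
      tau = (X[:, 1] > 0).astype(float)           # known true CATE
      y = X[:, 0] + tau * d + rng.normal(size=len(X))

      m1 = RandomForestRegressor().fit(X[d == 1], y[d == 1])
      m0 = RandomForestRegressor().fit(X[d == 0], y[d == 0])
      mse = np.mean((m1.predict(X) - m0.predict(X) - tau) ** 2)  # estimator score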
  23. By: Ton, Quoc-Thai
    Abstract: This dissertation examines the impact of social and traditional media on capital markets. The empirical tests focus on investor sentiment, which can be captured, for example, by postings on social media platforms, innovative news databases, and the textual analysis of the traditional media press. The research direction of this dissertation implicitly questions the assumptions of traditional finance theory; our new empirical findings and their explanations are hence closely linked to behavioral finance theory. The Efficient Market Hypothesis constitutes one of the fundamental pillars of traditional finance theory. In this framework, the availability of information is the basic requirement for the functioning of efficient capital markets: new information is quickly and correctly incorporated into an asset's price, so that the new price immediately reflects the updated fundamental value (Fama, 1969; 1970). However, various studies have recently shown that stock market movements are not always associated with rational information about an asset's value. Observations of over- and underreaction of asset prices to news signals, and of distinctive return patterns, have contributed to the growing importance of behavioral finance theory since the 1990s. The changing availability of, and easier access to, information for institutional and individual investors plays an important role in this development. For example, Figure 1.1 (p. 3) depicts the circulation of US newspapers between 1970 and 2017: the number of households covered by the traditional media press decreased from more than 60 million to around 30 million in 2017. The establishment of the internet, on the other hand, accelerated the digital development of the media landscape in parallel. Figure 1.3 (p. 5) describes the global development of social media use since 2010: the number of social media users is expected to increase from 1 billion in 2010 to around 3 billion in 2021. This development affects not only society but also a specific focus group of this dissertation: financial investors. The way investors gather, process, and disseminate information has also changed significantly in recent decades (Puppis et al., 2017). In this connection, the development of investor attention and sentiment for individual assets is lastingly shaped by the digitalization of media channels. Consequently, we derive four fundamental research questions, which accompany the empirical analyses of this dissertation: 1. What role does investor sentiment play in financial markets? Do investors solely follow the market, or do investors' beliefs predict future returns or other market variables? 2. How does (social) media relate to financial markets in the general daily context and specifically around news events, such as earnings or M&A announcements? 3. What kinds of firms are more sensitive to investor sentiment than others? 4. Does arbitrage stabilize financial markets against noise traders? The structure of this dissertation aims to answer these questions as follows. The first chapter introduces the reader to the relevance of the topic and the leading research questions. The second chapter lays the theoretical foundation and describes the fundamental concepts of traditional and behavioral finance theory, which aim to explain selected market anomalies.
Also, we summarize selected psychological concepts that help explain irrational actions of investors, which potentially cause market volatility and asset prices to deviate from their fundamental value. Literature reviews on investor sentiment, in close relationship with traditional and social media, complete the second chapter. The third chapter encompasses the first empirical study of this dissertation and primarily explores the impact of social media on capital markets. The empirical analysis draws on more than 4.5 million posts on the leading Australian financial internet message board HotCopper between January 2008 and May 2016. The findings suggest that social media activity is price relevant for capital markets. Positive investor sentiment, for example, is contemporaneously and significantly correlated with a stock's abnormal return; however, the effect diminishes after one month. Arbitrage by presumably informed investors only partially countervails this effect. Postings by individual investors on social media thus cause capital markets to overreact to potentially non-relevant information in the short term. Negative investor sentiment expressed on internet message boards, however, provides a more differentiated picture. Negative investor sentiment is significantly related to the next month's abnormal returns. Also, an increasing rate of agreement on negative investor sentiment before earnings announcements forecasts negative earnings surprises. Both findings support the information hypothesis that negative internet message board postings contain value-relevant information. Whether social media activity induces market volatility remains ambiguous: Granger tests and the reactions of the impulse-response functions show a bilateral relationship between return volatility and the number of internet message board postings. However, we find in this context that individual investors react more sensitively to market volatility on social media than the other way around. In summary, the results of the first empirical study provide evidence for the economic significance of investor sentiment measured on social media and for its asymmetric role in capital markets. We extend the empirical analysis in the fourth chapter of this dissertation and investigate the impact of traditional and social media on target price run-ups before bid announcements. The literature has documented an increase in the target stock price two months prior to the official bid announcement (e.g., Keown and Pinkerton, 1981), a phenomenon also referred to as the target run-up. One group of researchers finds explanations in the insider hypothesis (leakage of insider information prior to the bid announcement); another argues based on the market expectation hypothesis (the market anticipates publicly available information to predict upcoming mergers). Our second empirical study considers 2,765 bid announcements in Australia between January 2008 and August 2015. We use more than 15 thousand news articles, more than 80 thousand posts on the internet message board HotCopper, analyst recommendations, and Google search queries to analyze their relationship with target run-ups before official bid announcements. Thus, we specifically examine the varying impact of the attention of different investor groups (institutional and individual investors) on target run-ups.
The results lead us to conclude that target run-ups of smaller, unprofitable, and growth firms are significantly related to social media coverage on HotCopper. In contrast, similar firms that lack media coverage do not experience a significant target run-up prior to a bid announcement. Target run-ups of larger-capitalization stocks are, on the other hand, more sensitive to analyst recommendations. The results are consistent with the anecdotal evidence that smaller firms are usually less covered by analysts; social media closes the information gap for small firms in this respect. Google search queries for target firms are not found to be significantly related to target run-ups. The overall findings of the second empirical study support the market expectation hypothesis. In this regard, social media contributes to increased market efficiency and partially closes informational blind spots that may exist for smaller firms due to inefficient allocation of resources or costly information sourcing. The fifth chapter comprises the last empirical study of this dissertation and explores the relationship between media press sentiment and capital markets. We specifically examine the impact of aggregated news sentiment indices on the cross-section of returns in an asset pricing context. The asset pricing literature focuses especially on the determination of risk premia that help to explain stock returns. A central question of our third empirical study is, therefore, whether stock returns are associated with underlying risk or whether they are just a result of irrational market movements in the spirit of behavioral finance theory. We calculate monthly aggregated news sentiment indices based on more than 120 million unique classified news articles from the RavenPack News Analytics database between 2000 and 2017. We then construct monthly zero-investment portfolios that go long (short) in stocks which exhibit on average positive (negative) news sentiment in the previous month. The portfolio yields an annual return of 7.5% even after controlling for widely accepted risk factors, such as market, size, momentum, liquidity, profitability, and investment. The results are mainly driven by positive news sentiment; hence, we refer to this premium as the "premium on optimism". One possible explanation could be the persistent positive news coverage in the respective time period: the probability of the publication of good news is particularly high if a firm experienced positive news in the prior months. The overall results of our third empirical study support the view that news sentiment reflects a risk factor. The results of this dissertation have several implications for firms, investors, regulators, and researchers in the field of behavioral finance. Firms must learn to anticipate crowd movements on (social) media early and to deal with putatively fake news. A firm's investor relations department must engage with this topic in a more sophisticated way, both in terms of content and in the communicative interaction with its stakeholders. Selective communication strategies for specific firm events are required to prevent a potentially negative public perception of the firm at an early stage. Fake news and volatile markets are also gaining in importance for regulators: the identification of manipulative activities and the stabilization of financial markets in the presence of ambiguous information are of special interest to them.
This task is even more relevant in times of increasing digitalization of media channels and of the networks behind them. A better understanding of the stakeholders in financial markets and of their actions is hence all the more important for the functioning of efficient markets. Finally, the results of this dissertation create new connecting points for future research. The asymmetric role of investor sentiment and its underlying mechanism remain controversial and elusive; current studies in particular fail to shed light on the long-term impact of investor sentiment on capital markets. This dissertation hence provides a well-founded baseline for future empirical work. This work also could not fully answer the question of in which situations investors use particular media channels for information sourcing and dissemination; an intraday analysis across various media channels could provide new answers. In summary, this dissertation shows that investor sentiment is an integral part of today's financial markets, and its important role can no longer be neglected by advocates of traditional finance theory.
    Date: 2019
    URL: http://d.repec.org/n?u=RePEc:dar:wpaper:110621&r=all
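    Illustration: the zero-investment sentiment portfolio described in the fifth chapter can be sketched as a monthly long-short sort on prior-month news sentiment. Data is simulated; the dissertation uses RavenPack-classified articles.

      import numpy as np
      import pandas as pd

      rng = np.random.default_rng(7)
      months, stocks = 24, 50
      sent = pd.DataFrame(rng.normal(size=(months, stocks)))  # month t-1 sentiment
      ret = 0.01 * sent.shift(1) + pd.DataFrame(
          rng.normal(scale=0.05, size=(months, stocks)))      # month t returns

      long_leg = ret[sent.shift(1) > 0].mean(axis=1)          # equal-weighted legs
      short_leg = ret[sent.shift(1) < 0].mean(axis=1)
      ls = (long_leg - short_leg).dropna()                    # monthly L-S return
      print(12 * ls.mean())                                   # annualized premium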
  24. By: Heidt, Margareta; Olt, Christian; Buxmann, Peter
    Date: 2019–01–02
    URL: http://d.repec.org/n?u=RePEc:dar:wpaper:110476&r=all
  25. By: Chou, Yuntsai
    Keywords: data network effect, essential data, algorithm audit, data cooperatives, data portability
    Date: 2018
    URL: http://d.repec.org/n?u=RePEc:zbw:itsb18:190372&r=all

This nep-big issue is ©2019 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.