New Economics Papers on Big Data |
By: | Eitan Sapiro-Gheiler |
Abstract: | The increasing digitization of political speech has opened the door to studying a new dimension of political behavior using text analysis. This work investigates the value of word-level statistical data from the US Congressional Record--which contains the full text of all speeches made in the US Congress--for studying the ideological positions and behavior of senators. Applying machine learning techniques, we use this data to automatically classify senators according to party, obtaining accuracy in the 70-95% range depending on the specific method used. We also show that using text to predict DW-NOMINATE scores, a common proxy for ideology, does not improve upon these already-successful results. This classification deteriorates when applied to text from sessions of Congress that are four or more years removed from the training set, pointing to a need on the part of voters to dynamically update the heuristics they use to evaluate party based on political speech. Text-based predictions are less accurate than those based on voting behavior, supporting the theory that roll-call votes represent greater commitment on the part of politicians and are thus a more accurate reflection of their ideological preferences. However, the overall success of the machine learning approaches studied here demonstrates that political speeches are highly predictive of partisan affiliation. In addition to these findings, this work also introduces the computational tools and methods relevant to the use of political speech data. |
Date: | 2018–09 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:1809.00741&r=big |
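A minimal sketch, not the paper's exact pipeline, of the kind of text-based party classifier the abstract above describes: speeches are turned into a TF-IDF bag-of-words representation and a linear classifier predicts party labels. The inputs `speeches` and `parties` are assumed to come from the Congressional Record and party rosters.

```python
# Minimal sketch (assumed setup, not the authors' actual model): predict a
# senator's party from speech text with TF-IDF features and a linear classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def party_classification_accuracy(speeches, parties):
    """speeches: list of speech strings; parties: matching list of 'D'/'R' labels."""
    X_train, X_test, y_train, y_test = train_test_split(
        speeches, parties, test_size=0.2, random_state=0, stratify=parties)
    vec = TfidfVectorizer(stop_words="english", min_df=5)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vec.fit_transform(X_train), y_train)
    preds = clf.predict(vec.transform(X_test))
    return accuracy_score(y_test, preds)   # held-out party classification accuracy
```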
By: | Ikudo, Akina (University of California, Los Angeles); Lane, Julia (New York University); Staudt, Joseph (U.S. Census Bureau); Weinberg, Bruce A. (Ohio State University) |
Abstract: | Characterizing the work that people do on their jobs is a longstanding and core issue in labor economics. Traditionally, classification has been done manually. If it were possible to combine new computational tools and administrative wage records to generate an automated crosswalk between job titles and occupations, millions of dollars could be saved in labor costs, data processing could be sped up, data could become more consistent, and it might be possible to generate, without a lag, current information about the changing occupational composition of the labor market. This paper examines the potential to assign occupations to job titles contained in administrative data using automated, machine-learning approaches. We use a new extraordinarily rich and detailed set of data on transactional HR records of large firms (universities) in a relatively narrowly defined industry (public institutions of higher education) to identify the potential for machine-learning approaches to classify occupations. |
Keywords: | UMETRICS, occupational classifications, machine learning, administrative data, transaction data |
JEL: | J0 J21 J24 |
Date: | 2018–08 |
URL: | http://d.repec.org/n?u=RePEc:iza:izadps:dp11738&r=big |
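An illustrative sketch of an automated job-title-to-occupation crosswalk in the spirit of the abstract above, not the authors' actual classifier. Job titles are short strings, so character n-grams are used as features; `titles` are HR job-title strings and `occupation_codes` are hand-coded occupation labels used for training (both hypothetical inputs).

```python
# Hedged sketch of a title-to-occupation crosswalk trained on manually coded examples.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def fit_title_to_occupation(titles, occupation_codes):
    """titles: list of job-title strings; occupation_codes: matching hand-coded labels."""
    model = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4), lowercase=True),
        LogisticRegression(max_iter=1000),
    )
    model.fit(titles, occupation_codes)
    return model   # model.predict(new_titles) assigns occupation codes to unseen titles
```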
By: | Naudé, Wim (Maastricht University); Dimitri, Nicola (University of Siena) |
Abstract: | An arms race for an artificial general intelligence (AGI) would be detrimental to, and could even pose an existential threat to, humanity if it results in an unfriendly AGI. In this paper an all-pay contest model is developed to derive implications for public policy to avoid such an outcome. It is established that in a winner-takes-all race, where players must invest in R&D, only the most competitive teams will participate. Given the difficulty of AGI, the number of competing teams is unlikely ever to be very large. It is also established that the intentions of the teams competing in an AGI race, as well as the possibility of an intermediate prize, are important in determining the quality of the eventual AGI. The possibility of an intermediate prize will raise the quality of research, but it also raises the probability of finding the dominant AGI application and hence makes public control more urgent. It is recommended that the danger of an unfriendly AGI be reduced by taxing AI and by using public procurement. This would reduce the pay-off to contestants, raise the amount of R&D needed to compete, and coordinate and incentivize co-operation, all outcomes that will help alleviate the control and political problems in AI. Future research is needed to elaborate the design of systems for public procurement of AI innovation and to appropriately adjust the legal frameworks underpinning high-tech innovation, in particular those dealing with patents created by AI. |
Keywords: | artificial intelligence, innovation, technology, public policy |
JEL: | O33 O38 O14 O15 H57 |
Date: | 2018–08 |
URL: | http://d.repec.org/n?u=RePEc:iza:izadps:dp11737&r=big |
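A stylized winner-takes-all all-pay contest pay-off, written out as an illustration of the kind of model the abstract above refers to; this is a generic textbook formulation, not the paper's exact specification.

```latex
% Stylized winner-takes-all all-pay contest (illustrative, not the paper's exact model).
% Team i sinks R&D investment x_i >= 0; the team with the highest investment wins
% the AGI prize of value V, and every team's investment is forfeited.
\pi_i(x_i, x_{-i}) =
\begin{cases}
  V - x_i & \text{if } x_i > \max_{j \neq i} x_j, \\
  -\, x_i & \text{otherwise.}
\end{cases}
% A tax at rate t on the winner's prize lowers the winner's net pay-off to
% (1 - t)V - x_i, which is one channel through which taxation can dampen the race.
```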
By: | Oliver Kirchkamp (FSU Jena, School of Economics); Christina Strobel (FSU Jena, School of Economics) |
Abstract: | Humans make decisions jointly with others and share responsibility for the outcome with their interaction partners. Today, more and more often, the partner in a decision is not another human but a machine. Here we ask whether the type of partner, machine or human, affects our responsibility, our perception of the choice, and the choice itself. As a workhorse we use a modified dictator game with two joint decision makers: either two humans, or one human and one machine. We find no treatment effect on perceived responsibility or guilt. We also find only a small and insignificant effect on actual choices. |
Keywords: | Human-computer interaction, Experiment, Shared responsibility, Moral wiggle room |
JEL: | C91 D63 D80 |
Date: | 2018–09–12 |
URL: | http://d.repec.org/n?u=RePEc:jrp:jrpwrp:2018-014&r=big |
By: | Everett Grant (Federal Reserve Bank of Dallas) |
Abstract: | We use daily equity returns to estimate global inter-firm networks across all major industries from 1981 to 2016 and test whether the network is robust or fragile, relating multinational firms' overall health to their global integration. More connected firms are less likely to be in distress and have higher profit growth and equity returns, but they are also more exposed to direct contagion from distressed neighboring firms and to network-level crises. Our machine learning analysis reveals the centrality of finance in the international firm network and increased globalization over time, with greater potential for crises to spread globally when they do occur. |
Date: | 2018 |
URL: | http://d.repec.org/n?u=RePEc:red:sed018:506&r=big |
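A minimal sketch of turning a panel of daily equity returns into a firm network and measuring connectedness, as a rough analogue of the estimation described above; the thresholded-correlation construction is an assumption, not the authors' method. `returns` is assumed to be a pandas DataFrame with dates as rows and firms as columns.

```python
# Hedged sketch: build a firm network from pairwise return correlations and
# score each firm's connectedness by degree centrality.
import networkx as nx

def firm_network(returns, threshold=0.5):
    """returns: DataFrame of daily returns (rows = dates, columns = firms)."""
    corr = returns.corr()
    firms = list(corr.columns)
    graph = nx.Graph()
    graph.add_nodes_from(firms)
    for i, firm_i in enumerate(firms):
        for firm_j in firms[i + 1:]:
            if abs(corr.loc[firm_i, firm_j]) >= threshold:
                graph.add_edge(firm_i, firm_j)
    return graph

def connectedness(returns, threshold=0.5):
    # Higher values correspond to the "more connected" firms discussed above.
    return nx.degree_centrality(firm_network(returns, threshold))
```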
By: | Boxell, Levi |
Abstract: | I build a dataset of over one million images used on the front pages of websites around the 2016 election period. I then use machine-learning tools to detect the faces of politicians in these images and measure the nonverbal emotional content expressed by each politician. Combining this with data on the partisan composition of each website's users, I show that websites portray politicians who align with their users' partisan preferences with more positive emotions. I also find that nonverbal coverage by Republican-leaning websites was not consistent over the 2016 election, becoming more favorable towards Donald Trump after he clinched the Republican nomination. |
Keywords: | media bias, images, emotions, nonverbal, polarization |
JEL: | C0 H0 L82 L86 |
Date: | 2018–09–17 |
URL: | http://d.repec.org/n?u=RePEc:pra:mprapa:89047&r=big |
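A heavily hedged sketch of the measurement step described above: detect faces in a front-page image and average an emotion score across them. Face detection here uses OpenCV's stock Haar cascade as one possibility; `score_positivity` is a hypothetical placeholder, since the abstract does not specify which emotion classifier is used.

```python
# Assumed pipeline sketch, not the author's actual tools.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def score_positivity(face_img):
    """Hypothetical placeholder: return a positive-emotion score in [0, 1] for a face crop."""
    raise NotImplementedError("plug in an emotion-recognition model here")

def image_positivity(image_path):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    scores = [score_positivity(img[y:y + h, x:x + w]) for (x, y, w, h) in faces]
    return sum(scores) / len(scores) if scores else None
```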
By: | Will Dobbie; Andres Liberman; Daniel Paravisini; Vikram Pathania |
Abstract: | This paper tests for bias in consumer lending decisions using administrative data from a high-cost lender in the United Kingdom. We motivate our analysis using a simple model of bias in lending, which predicts that profits should be identical for loan applicants from different groups at the margin if loan examiners are unbiased. We identify the profitability of marginal loan applicants by exploiting variation from the quasi-random assignment of loan examiners. We find significant bias against both immigrant and older loan applicants when using the firm's preferred measure of long-run profits. In contrast, there is no evidence of bias when using a short-run measure used to evaluate examiner performance, suggesting that the bias in our setting is due to the misalignment of firm and examiner incentives. We conclude by showing that a decision rule based on machine learning predictions of long-run profitability can simultaneously increase profits and eliminate bias. |
JEL: | J15 J16 |
Date: | 2018–08 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:24953&r=big |
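A rough sketch of the quasi-random-assignment idea mentioned above, using a generic leave-one-out examiner-leniency instrument; this is an assumed, simplified design, not the authors' exact specification. `df` is assumed to have columns examiner_id, approved (0/1), and profit.

```python
# Hedged sketch: examiner leniency as an instrument for loan approval,
# written out as two explicit OLS stages (standard errors would need adjusting;
# a dedicated IV routine is preferable in practice).
import statsmodels.api as sm

def examiner_iv(df):
    # Leave-one-out approval rate of the assigned examiner = leniency instrument.
    g = df.groupby("examiner_id")["approved"]
    df = df.assign(leniency=(g.transform("sum") - df["approved"])
                            / (g.transform("count") - 1))
    # First stage: approval on examiner leniency.
    first = sm.OLS(df["approved"], sm.add_constant(df["leniency"])).fit()
    df = df.assign(approved_hat=first.fittedvalues)
    # Second stage: long-run profit on instrumented approval.
    second = sm.OLS(df["profit"], sm.add_constant(df["approved_hat"])).fit()
    return first, second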
By: | Dario Buono; George Kapetanios; Massimiliano Marcellino; Gianluigi Mazzi; Fotis Papailias |
Abstract: | This paper aims to provide a primer on the use of big data in macroeconomic nowcasting and early estimation. We discuss: (i) a typology of big data characteristics relevant for macroeconomic nowcasting and early estimates, (ii) methods for feature extraction from unstructured big data into usable time series, (iii) econometric methods that could be used for nowcasting with big data, (iv) some empirical nowcasting results for key target variables for four EU countries, and (v) ways to evaluate nowcasts and flash estimates. We conclude by providing a set of recommendations to assess the pros and cons of the use of big data in a specific empirical nowcasting context. |
Keywords: | Big Data, Nowcasting, Early Estimates, Econometric Methods |
JEL: | C32 C53 |
Date: | 2018 |
URL: | http://d.repec.org/n?u=RePEc:baf:cbafwp:cbafwp1882&r=big |
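One simple possibility among the econometric approaches the paper surveys, shown as a sketch rather than the paper's own specification: extract a common factor from a large panel of timely indicators and use it in a bridge regression for the target variable. `indicators` is assumed to be a DataFrame of standardized monthly series with more observations than the (delayed) `target`.

```python
# Hedged sketch of a factor-based bridge-equation nowcast.
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

def factor_nowcast(indicators, target):
    """indicators: standardized monthly panel (rows = months, columns = series);
    target: array of the variable to be nowcast, available only with a lag."""
    factor = PCA(n_components=1).fit_transform(indicators.to_numpy())  # common factor
    bridge = LinearRegression().fit(factor[: len(target)], target)     # bridge equation
    return float(bridge.predict(factor[-1:])[0])                       # latest-period nowcast
```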
By: | Francis X. Diebold; Minchul Shin |
Abstract: | Despite the clear success of forecast combination in many economic environments, several important issues remain incompletely resolved. The issues relate to selection of the set of forecasts to combine, and whether some form of additional regularization (e.g., shrinkage) is desirable. Against this background, and also considering the frequently-found good performance of simple-average combinations, we propose a LASSO-based procedure that sets some combining weights to zero and shrinks the survivors toward equality ("partially-egalitarian LASSO"). Ex-post analysis reveals that the optimal solution has a very simple form: The vast majority of forecasters should be discarded, and the remainder should be averaged. We therefore propose and explore direct subset-averaging procedures motivated by the structure of partially-egalitarian LASSO and the lessons learned, which, unlike LASSO, do not require choice of a tuning parameter. Intriguingly, in an application to the European Central Bank Survey of Professional Forecasters, our procedures outperform simple average and median forecasts – indeed they perform approximately as well as the ex-post best forecaster. |
JEL: | C53 |
Date: | 2018–08 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:24967&r=big |
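A sketch in the spirit of the "discard most, average the rest" lesson described above: a plain LASSO picks which forecasters survive, and the survivors are simply averaged. This is not the authors' partially-egalitarian LASSO itself, which also shrinks the surviving weights toward equality.

```python
# Hedged sketch: LASSO-based forecaster selection followed by a simple average.
import numpy as np
from sklearn.linear_model import LassoCV

def select_then_average(forecasts, realized, new_forecasts):
    """forecasts: (T x K) past predictions of K forecasters; realized: T outcomes;
    new_forecasts: (H x K) predictions to be combined out of sample."""
    lasso = LassoCV(cv=5).fit(forecasts, realized)
    survivors = np.flatnonzero(lasso.coef_ != 0)
    if survivors.size == 0:                      # fall back to the full average
        survivors = np.arange(forecasts.shape[1])
    return new_forecasts[:, survivors].mean(axis=1)
```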
By: | Twomey, Paul |
Abstract: | Building on the 2017 Hamburg Statement and the G20 Roadmap for Digitalization, this paper recommends a G20 framework for artificial intelligence in the workplace. It proposes high-level principles for such a framework, intended to enable G20 governments to introduce big data and AI more smoothly, on a broader international basis, and in a more socially acceptable way. The principles are specific to the workplace. The paper summarises the main issues behind the framework principles and suggests two paths towards adoption of a G20 framework for artificial intelligence in the workplace. |
Keywords: | artificial intelligence, privacy, wealth distribution, workplace, regulation, political principles, workers, transparency, G20, heads of government, big data, Hamburg Statement |
JEL: | K2 O3 |
Date: | 2018 |
URL: | http://d.repec.org/n?u=RePEc:zbw:ifwedp:201863&r=big |
By: | Mallory, Mindy; Kuethe, Todd; Hubbs, Todd |
Keywords: | Agricultural Finance, Risk and Uncertainty |
Date: | 2018–04–06 |
URL: | http://d.repec.org/n?u=RePEc:ags:scc018:276141&r=big |
By: | Victor Chernozhukov; Whitney K Newey; Rahul Singh |
Abstract: | Many objects of interest can be expressed as an L2 continuous functional of a regression, including average treatment effects, economic average consumer surplus, expected conditional covariances, and discrete choice parameters that depend on expectations. Debiased machine learning (DML) of these objects requires learning a Riesz representer (RR). We provide Lasso and Dantzig learners of the RR and corresponding learners of affine and other nonlinear functionals. We give an asymptotic variance estimator for DML. We allow for a wide variety of regression learners that can converge at relatively slow rates. We give conditions for root-n consistency and asymptotic normality of the functional learner, and we give results for non-affine functionals in addition to affine functionals. |
Date: | 2018–09 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:1809.05224&r=big |
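For readers less familiar with the setup, a standard way to write the debiased moment for a linear functional of a regression, consistent with the abstract above; this is the familiar textbook form, not a restatement of the paper's exact estimator.

```latex
% Object of interest and Riesz representer (linear-functional case):
\theta_0 = \mathbb{E}\bigl[m(W;\gamma_0)\bigr], \qquad
\gamma_0(x) = \mathbb{E}[Y \mid X = x], \qquad
\mathbb{E}\bigl[m(W;\gamma)\bigr] = \mathbb{E}\bigl[\alpha_0(X)\,\gamma(X)\bigr]
\ \ \text{for all } \gamma \in L_2 .
% Debiased (Neyman-orthogonal) estimator with cross-fitted learners \hat\gamma, \hat\alpha:
\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n}
\Bigl[\, m(W_i;\hat{\gamma})
      + \hat{\alpha}(X_i)\bigl(Y_i - \hat{\gamma}(X_i)\bigr) \Bigr].
% Example: for the average treatment effect, m(W;\gamma) = \gamma(1,X) - \gamma(0,X)
% and \alpha_0(D,X) = D/p(X) - (1-D)/(1-p(X)), with p the propensity score.
```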
By: | Asgari, Mahdi; Nemati, Mehdi; Zheng, Yuqing |
Abstract: | Predicting financial market movements in today's fast-paced and complex environment is more challenging than ever. For many investors, online resources are a major source of information. Researchers can use Google Trends to access the number of search queries for a particular topic by internet users, and the search volume index provided by Google can then serve as a proxy for the importance of that topic. To predict the collective response to particular news, we include the search index for relevant search terms in our forecasting model. The focus of our study is forecasting food stock movements. A unique feature of the food industry is that, besides common fundamental information, stakeholders are responsive to food safety news. In this study, we test whether including relevant search terms reduces forecasting error and improves the predictive power of traditional models. We use market data and the Google Trends index for 46 listed food companies. The empirical results show that, on average, the use of search terms reduces forecasting error by 2 to 31 percent for predicting trading volume and by 3.5 to 77 percent for predicting the closing price, depending on the company. We also apply a model confidence set (MCS) procedure to construct the set of specifications with the statistically smallest forecasting error. The average forecasting error of the models in this set is lower than that of all the models with search terms, which implies that the MCS approach is effective in identifying the models with the best predictive power. |
Keywords: | Agribusiness, Research Methods/ Statistical Methods |
Date: | 2018–02–06 |
URL: | http://d.repec.org/n?u=RePEc:ags:saea18:266323&r=big |
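An illustrative comparison, not the paper's exact models, of forecasting a stock series with and without a Google Trends search-volume index as an exogenous regressor. `price` and `trends` are assumed to be aligned pandas Series; the last `h` observations are held out for evaluation.

```python
# Hedged sketch: ARIMA-type model with and without a search-volume regressor,
# compared on out-of-sample RMSE.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

def rmse_with_and_without_trends(price, trends, h=12, order=(1, 1, 1)):
    train_y, test_y = price[:-h], price[-h:].to_numpy()
    train_x, test_x = trends[:-h], trends[-h:]
    base = SARIMAX(train_y, order=order).fit(disp=False)
    augm = SARIMAX(train_y, exog=train_x, order=order).fit(disp=False)

    def rmse(forecast):
        return float(np.sqrt(np.mean((test_y - np.asarray(forecast)) ** 2)))

    return {"baseline": rmse(base.forecast(steps=h)),
            "with_search_index": rmse(augm.forecast(steps=h, exog=test_x))}
```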
By: | Ying-Hui Chiang |
Abstract: | Current government data on rental housing in Taipei (only rentals handled by agencies must be registered) do not reflect real activity in the rental market. This study uses web scraping to collect big data on rental listings; by cleaning, analyzing, and mapping the data, it reveals spatial and temporal patterns across district housing markets in Taipei City. The rental market has become more important in Taipei as housing prices surge. The research builds a rent model to estimate the fair rent of different housing types, assesses rent affordability as the ratio between social housing rent and fair rent, and calculates rent burdens as the ratio between median household income and median rent across statistical areas. We use these two indicators, rent affordability and rent burden, to discuss social housing policy in Taipei. The findings capture the real rental market in Taipei and provide suggestions for social housing policy based on big data. |
Keywords: | Housing Affordability; Housing Rent; Social Housing |
JEL: | R3 |
Date: | 2018–01–01 |
URL: | http://d.repec.org/n?u=RePEc:arz:wpaper:eres2018_246&r=big |
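A minimal sketch of the two indicators described above, not the study's actual model: a hedonic regression estimates "fair rent" from listing attributes, and the affordability and burden ratios follow the definitions in the abstract. Column names are assumptions.

```python
# Hedged sketch: hedonic fair-rent model plus the two policy indicators.
import statsmodels.api as sm

def fair_rent_model(listings):
    """listings: DataFrame with a 'rent' column and numeric attribute columns."""
    X = sm.add_constant(listings.drop(columns=["rent"]))
    return sm.OLS(listings["rent"], X).fit()       # fitted values = estimated fair rent

def rent_affordability(social_housing_rent, fair_rent):
    return social_housing_rent / fair_rent          # ratio as defined in the abstract

def rent_burden(median_income, median_rent):
    return median_income / median_rent              # ratio as defined in the abstract
```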
By: | Farah Zawaideh (Irbid National University); Raed Sahawneh (Irbid National University) |
Abstract: | Automatic text categorization (TC) has become one of the most active fields for researchers in data mining, information retrieval, web text mining, and natural language processing, due to the vast number of new documents being retrieved by various information retrieval systems. This paper proposes a new TC technique that classifies Arabic-language text documents using a naïve Bayesian classifier combined with a genetic algorithm model; the algorithm classifies documents by generating a random sample of chromosomes that represent documents in the corpus. The developed model aims to enhance the naïve Bayesian classifier by applying the genetic algorithm. Experimental results show that precision and recall increase when a larger number of documents is tested; precision ranged from 0.8 to 0.97 across different testing environments. The number of genes placed in each chromosome was also tested, and experiments show that the best value is 50 genes. |
Keywords: | Data mining, Text classification, Genetic algorithm, Naïve Bayesian Classifier, N-gram processing |
JEL: | C80 |
Date: | 2018–06 |
URL: | http://d.repec.org/n?u=RePEc:sek:iacpro:6409186&r=big |
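A minimal sketch of the naïve Bayes / n-gram part of the pipeline described above; the genetic-algorithm layer from the abstract is not reproduced here. `docs` are text documents and `labels` their categories (assumed inputs).

```python
# Hedged sketch: multinomial naïve Bayes on character n-gram counts, evaluated
# with macro-averaged precision and recall on a hold-out split.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

def nb_ngram_precision_recall(docs, labels):
    X_train, X_test, y_train, y_test = train_test_split(
        docs, labels, test_size=0.2, random_state=0)
    vec = CountVectorizer(analyzer="char_wb", ngram_range=(2, 4))  # character n-grams
    clf = MultinomialNB().fit(vec.fit_transform(X_train), y_train)
    preds = clf.predict(vec.transform(X_test))
    return (precision_score(y_test, preds, average="macro"),
            recall_score(y_test, preds, average="macro"))
```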
By: | Mills, Brian; Brorsen, Wade; Tostão, Emílio |
Keywords: | Agricultural Finance, Production Economics, Agribusiness |
Date: | 2017–06–30 |
URL: | http://d.repec.org/n?u=RePEc:ags:aaea17:258269&r=big |
By: | Somsri Banditvilai (King Mongkut's Institute of Technology Ladkrabang); Siriluck Anansatitzin (King Mongkut's Institute of Technology Ladkrabang) |
Abstract: | Accurate incidence forecasting of infectious diseases such as dengue hemorrhagic fever is critical for early prevention and detection of outbreaks. This research presents a comparative study of three forecasting methods applied to the monthly incidence of dengue hemorrhagic fever: the Holt-Winters method, the Box-Jenkins method, and artificial neural networks. The data were taken from the Bureau of Epidemiology, Department of Disease Control, Ministry of Public Health, covering January 2003 to December 2016, and were divided into two sets. The first set, from January 2003 to December 2015, was used for constructing and selecting the forecasting models; the second set, from January 2016 to December 2016, was used to assess forecasting accuracy. The forecasting models were chosen on the basis of the smallest root mean square error (RMSE), and the mean absolute percentage error (MAPE) was used to measure forecasting accuracy. The results show that artificial neural networks obtained the smallest RMSE in the modeling process, with a MAPE of 14.05% in the forecasting process. |
Keywords: | Dengue hemorrhagic fever, Time Series Forecasting, Holt-Winters method, Box-Jenkins method, Artificial Neural Networks |
JEL: | C22 C45 |
Date: | 2018–06 |
URL: | http://d.repec.org/n?u=RePEc:sek:iacpro:6409199&r=big |
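A sketch of the comparison described above for the two classical methods (the neural-network model is omitted): fit Holt-Winters and a Box-Jenkins ARIMA on the training portion of a monthly series, forecast the hold-out year, and compare RMSE and MAPE. `series` is assumed to be a monthly pandas Series of case counts.

```python
# Hedged sketch: Holt-Winters vs. ARIMA on a monthly series with a 12-month hold-out.
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.arima.model import ARIMA

def compare_methods(series, h=12, arima_order=(1, 1, 1)):
    train, test = series[:-h], series[-h:].to_numpy()
    hw = ExponentialSmoothing(train, trend="add", seasonal="add",
                              seasonal_periods=12).fit()
    bj = ARIMA(train, order=arima_order).fit()

    def errors(fc):
        fc = np.asarray(fc)
        return {"RMSE": float(np.sqrt(np.mean((test - fc) ** 2))),
                "MAPE": float(np.mean(np.abs((test - fc) / test)) * 100)}

    return {"Holt-Winters": errors(hw.forecast(h)),
            "Box-Jenkins": errors(bj.forecast(steps=h))}
```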
By: | Bergemann, Dirk; Bonatti, Alessandro |
Abstract: | We survey a recent and growing literature on markets for information. We offer a comprehensive view of information markets through an integrated model of consumers, information intermediaries, and firms. The model embeds a large set of applications ranging from sponsored search advertising to credit scores to information sharing among competitors. We then review a mechanism design approach to selling information in greater detail. We distinguish between ex ante sales of information (the buyer acquires an information structure) and ex post sales (the buyer pays for specific realizations). We relate this distinction to the different products that brokers, advertisers, and publishers use to trade consumer information online. We discuss the endogenous limits to the trade of information that derive from its potential adverse use for consumers. Finally we revisit the role of recommender systems and artificial intelligence systems as markets for indirect information. |
Keywords: | information design; information markets; intermediaries; mechanism design; predictions; ratings |
JEL: | D42 D82 D83 |
Date: | 2018–08 |
URL: | http://d.repec.org/n?u=RePEc:cpr:ceprdp:13148&r=big |