Abstract: |
This paper seeks to better understand what makes big data analysis different,
what we can and cannot do with existing econometric tools, and what issues
need to be dealt with in order to work with the data efficiently. As a case
study, I set out to extract any business cycle information that might exist in
four terabytes of weekly scanner data. The main challenge is to handle the
volume, variety, and characteristics of the data within the constraints of our
computing environment. Scalable and efficient algorithms are available to ease
the computational burden, but they often have unknown statistical properties and
are not designed for efficient estimation or optimal inference.
In addition, economic data have unique characteristics that generic algorithms may
not accommodate. There is a need for computationally efficient econometric
methods, as big data is likely here to stay. |