A data mining approach to economic crisis governance: the CoRisk-Index as a measure of industry risks related to COVID-19 in the US economy.

Philipp Darius introduces the CoRisk-Index, a measure of corporate industry risk related to the coronavirus pandemic compiled by researchers from the Oxford.Berlin social data science collaboration group.

The COVID-19 pandemic and economic crisis

In recent months, the global spread of COVID-19 has led to countermeasures including lockdowns in many countries. These successful attempts at slowing the spread of the virus by reducing people's mobility have, however, resulted in a global slump in production and trade. While some industries, especially tech, might benefit from consumption shifts during the lockdown, businesses in many sectors face existential financial and operational risks associated with the coronavirus and with the strain on government responses. In its monthly report on the employment situation, the US Bureau of Labor Statistics reported a record increase in the unemployment rate in April, up 10.3 percentage points to 14.7 percent, while May and June showed signs of economic recovery. However, the overall economic environment remains difficult, uncertainty is high, and the recovery is limited.

Data mining to inform economic governance

In times of crisis, which bring high uncertainty and make forecasting difficult, mining web data may provide an insightful, real-time data source to improve financial and economic governance for businesses and governments alike. Such data mining approaches have proved useful in the past; search engine data, for instance, can be employed to predict short-term US unemployment rates, as well as a number of other economic indicators. Contributing to that strand of research, we built an industry-specific business risk indicator, the CoRisk-Index, based on publicly available financial governance reports (10-K reports filed with the US Securities and Exchange Commission). The results indicate a surge of economic uncertainty and business risks related to the coronavirus, which is positively correlated with stock market and unemployment developments during the crisis. We are confident that this work provides valuable up-to-date data for research institutions working on economic forecasting, for government institutions such as market supervision bodies, and for private sector organisations seeking real-time empirical information about economic risks during the unfolding crisis.


We have been continuously collecting text data from the risk reports since 30 January, the date on which a company first mentioned COVID-19 in one of its reports. This unique dataset allows us to monitor the magnitude, context, and timing of the risks that businesses attribute to COVID-19 by analysing the quantity and sentiment of their statements. These are aggregated into the CoRisk-Index (embedded below). We have summarised the analysis and results in a working paper and illustrated the main findings in an interactive online dashboard, which is updated weekly. In the following, we present the key empirical findings:

1. Industries assess COVID-19 risks differently. While the majority of firms in retail and manufacturing perceive COVID-19 as a substantial risk factor, only one in three companies in the finance sector mentioned COVID-19 in their reports (Tab 3 - Industry View). As the crisis unfolds, the share of companies reporting COVID-19 as a business risk has been rising steadily. The CoRisk-Index monitors this development in real time (Tab 1 - CoRisk Index).

2. The context of perceived COVID-19 risks changes over time. Automated text analysis of the risk reports allows us to monitor the context in which businesses refer to COVID-19. In a heatmap illustration, we explore five domains relevant to businesses: demand, finance, production, supply, and travel. In January and February, travel-related issues were most relevant, but this changed drastically when retail and manufacturing started to report supply chain and production interruptions: the pandemic had caused a supply shock to the economy. In March and April, as lockdown policies were introduced, retail and manufacturing started to suffer from a rapid demand shock and began to report substantial concerns about decreasing demand. At the end of June, wholesale & retail and manufacturing continue to be the most concerned sectors, in particular with regard to supply, production and demand (Tab 4 - Topic Heatmap).

3. The CoRisk-Index appears to be correlated with stock market developments. Moreover, the “negativity” of the risk reports (Tab 2 - Text Sentiment) had already increased before the negative stock market developments (see working paper). We are assessing the relationship between our index and the S&P 500 in more detail, since it promises to benefit economic forecasting models.
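
To make the first two findings more concrete, the following is a minimal Python sketch of how the share of reports mentioning COVID-19 per industry, and the topic context of those mentions, could be computed from report text. It is not the method of the working paper: the search pattern, the keyword lists in TOPIC_KEYWORDS, the simple sentence splitter and the sample reports are all assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical search pattern and keyword lists, chosen for illustration only.
COVID_PATTERN = re.compile(r"covid-?19|coronavirus", re.IGNORECASE)
TOPIC_KEYWORDS = {
    "demand": ["demand", "customers"],
    "finance": ["liquidity", "credit"],
    "production": ["production", "manufacturing"],
    "supply": ["supply chain", "supplier"],
    "travel": ["travel", "airline"],
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def covid_sentences(text):
    """Return only the sentences that mention COVID-19 or the coronavirus."""
    return [s for s in split_sentences(text) if COVID_PATTERN.search(s)]


def industry_share(reports):
    """Share of reports per industry that mention COVID-19 at least once.

    `reports` is a list of (industry, report_text) pairs.
    """
    totals = defaultdict(int)
    mentions = defaultdict(int)
    for industry, text in reports:
        totals[industry] += 1
        if COVID_PATTERN.search(text):
            mentions[industry] += 1
    return {industry: mentions[industry] / totals[industry] for industry in totals}


def topic_counts(reports):
    """Count COVID-19 sentences per industry and topic (input for a heatmap)."""
    counts = defaultdict(lambda: defaultdict(int))
    for industry, text in reports:
        for sentence in covid_sentences(text):
            lowered = sentence.lower()
            for topic, keywords in TOPIC_KEYWORDS.items():
                if any(keyword in lowered for keyword in keywords):
                    counts[industry][topic] += 1
    return {industry: dict(topics) for industry, topics in counts.items()}


# Made-up example reports, not taken from real 10-K filings.
sample_reports = [
    ("retail", "The coronavirus outbreak may disrupt our supply chain. "
               "Demand could fall due to COVID-19."),
    ("retail", "COVID-19 has reduced travel by our staff."),
    ("finance", "Interest rate changes may affect our results."),
    ("finance", "The coronavirus may reduce credit quality."),
    ("finance", "Competition remains intense."),
]

shares = industry_share(sample_reports)
topics = topic_counts(sample_reports)
```

In this toy data, both retail reports but only one of the three finance reports mention the virus. The per-industry topic counts could then be plotted as a heatmap over time by repeating the calculation for each filing period.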

Since the release of the CoRisk-Index, we have presented it in academic and policy-making circles, including at the CEPR Covid Economics Series, the UNDP Eurasia Webinar Series, the Oxford Department of International Development and the German Institute for Economic Research.


We have presented our work in progress on the CoRisk-Index, a measure of corporate industry risk related to the coronavirus based on text data from mandatory financial reports filed by corporations. The measure builds on the sentiment expressed in the reports and appears to lead the negative stock market reaction, and thus promises to have explanatory value for forecasting models. Moreover, we seek to encourage government and research institutions to integrate the mining of web data into economic modelling in order to improve crisis forecasting capacities. Correspondingly, the CoRisk-Index may, as a measure of perceived and reported risk, inform more complex economic models to further enhance crisis forecasting and public policy responses to the crisis.
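
One simple way to probe whether an indicator leads another series is a lagged correlation, sketched below in Python. This is an illustrative check under stated assumptions, not the analysis from the working paper: the function names are hypothetical and the two weekly series are invented numbers, not CoRisk-Index or S&P 500 data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(indicator, target, lag):
    """Correlate indicator[t] with target[t + lag] for a non-negative lag.

    A high value at lag > 0 suggests that the indicator moves before the target.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    x = indicator[: len(indicator) - lag]
    y = target[lag:]
    return pearson(x, y)


# Invented weekly series: the second one repeats the first with a two-week delay.
report_negativity = [0, 1, 3, 6, 8, 9, 9, 8]
market_decline = [0, 0, 0, 1, 3, 6, 8, 9]

same_week = lagged_correlation(report_negativity, market_decline, 0)
two_weeks_ahead = lagged_correlation(report_negativity, market_decline, 2)
```

With these invented numbers, the correlation is highest at a lag of two weeks, which is the pattern one would look for when testing whether report sentiment moves ahead of the market. A real analysis would also need to account for trends and autocorrelation in both series.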