DATA INSIGHT TRAINING
  • Home
  • On-Site Training
    • Power BI Master Class
    • Excel BI Boot Camp
    • Interactive Dashboards with Excel Pivot Tables
  • Online & Virtual Training
    • Our Courses
    • Affiliate Courses
  • About Us
  • Blog

Tracking a virus

3/3/2020

Picture
An accurate recreation of what the coronavirus would look like
The current topic of conversation in many households and places of work is the coronavirus, or more correctly, Covid-19. 

Having studied biochemistry at university, and having an interest in epidemiology, I have found this fascinating.  A few days ago, I convinced Ian to create some visualisations based on real data, so as to show the impact the disease has had. 

The first issue we had to solve was finding data that was reliable.  There is a lot of data floating around, but much of it is incomplete or questionable.  As anyone who has worked on projects like this will know, getting clean data is more than half the battle.  After digging around the internet for a while and looking at various sources, we elected to use the daily situation reports issued by the World Health Organisation (WHO).  We felt that this organisation would have the necessary authority to provide data of a satisfactory level of accuracy.
What we have found, though, is that the data WHO provides does have some anomalies.  For example, we found cases where the total number of deceased in some countries would drop, which is literally a miracle.  Digging into these, we found that most of the anomalies have an explanation of some sort (e.g. some deaths were incorrectly attributed to Covid-19 and this was later corrected).  As a result, we decided that a few anomalies, found in the inevitable chaos that dealing with a rapidly spreading disease introduces, were acceptable. 
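
For readers who prefer code, here is a minimal Python sketch of how drops in a cumulative total could be flagged. It is not the Excel process described in this post; the function name `find_decreases` and the sample numbers are made up purely for illustration.

```python
from typing import List, Tuple


def find_decreases(cumulative: List[int]) -> List[Tuple[int, int, int]]:
    """Return (index, previous, current) for each report where the running total fell."""
    return [
        (i, prev, curr)
        for i, (prev, curr) in enumerate(zip(cumulative, cumulative[1:]), start=1)
        if curr < prev
    ]


# Hypothetical cumulative death counts for one country, one value per daily report.
deaths = [0, 1, 3, 3, 2, 4]
print(find_decreases(deaths))  # [(4, 3, 2)]
```

A flagged entry is only a prompt to go and read the relevant report; as noted above, most drops turn out to be legitimate corrections rather than errors.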

Once we found a source of data we were happy with, the next problem was getting it into a usable format.  The daily situation reports from WHO are published as PDF documents.  We had to take these and somehow get them into a structured data format.  This involved converting the PDF to Microsoft Word format and then copying the data from Word to Microsoft Excel. 

However, we still had more to do.  The way the situation reports presented the data was not ideal.  For example, total cases and new cases were given in one field, with the new cases shown as a bracketed number next to the total cases.  We needed to split the new cases from the total cases and hold them as two separate fields.
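
As an illustration only, the same split can be sketched in Python with a regular expression. This is not the Excel approach used for the dashboard. The helper names `split_cases` and `_to_int`, the sample values, and the assumption that numbers may contain commas or spaces as thousands separators are all hypothetical.

```python
import re
from typing import Optional, Tuple

# A total, optionally followed by new cases in brackets, e.g. "1234 (56)".
# Assumption: digits may be grouped with commas or spaces.
_CASES = re.compile(r"^\s*([\d,\s]+?)\s*(?:\(\s*([\d,\s]+?)\s*\))?\s*$")


def _to_int(text: str) -> int:
    """Strip grouping characters and convert to int."""
    return int(re.sub(r"[,\s]", "", text))


def split_cases(field: str) -> Tuple[int, Optional[int]]:
    """Split 'total (new)' into (total, new); new is None when no bracket is present."""
    match = _CASES.match(field)
    if match is None:
        raise ValueError(f"Unrecognised case field: {field!r}")
    total = _to_int(match.group(1))
    new = _to_int(match.group(2)) if match.group(2) is not None else None
    return total, new


print(split_cases("1234 (56)"))   # (1234, 56)
print(split_cases("4,335 (12)"))  # (4335, 12)
print(split_cases("17"))          # (17, None)
```

Returning `None` when there is no bracketed figure keeps "no new-case value reported" distinct from "zero new cases".
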
A final issue was changes to the format of the data.  As the situation has developed, the format of the daily WHO reports has evolved, which has meant changing how we extract the data into the final usable format.  Furthermore, we found cases where the naming of countries varied, usually something small like an extra space, but nonetheless an issue for analysis purposes.  Fortunately, Excel has some simple-to-use tools for doing all this without complicated formulas or manual intervention. 
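
For comparison, here is a minimal Python sketch of the kind of name clean-up that deals with stray spaces. Again, this is not the Excel tooling mentioned above; the function name `normalise_country` and the sample names are hypothetical.

```python
import re
import unicodedata


def normalise_country(name: str) -> str:
    """Fold look-alike characters and collapse runs of whitespace into single spaces."""
    # NFKC turns compatibility characters such as non-breaking spaces into plain ones.
    text = unicodedata.normalize("NFKC", name)
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical variants of the same name as they might appear after copy and paste.
raw_names = ["Republic of Korea", "Republic of  Korea ", "Republic\u00a0of Korea"]
print({normalise_country(n) for n in raw_names})  # {'Republic of Korea'}
```

This only handles whitespace differences. Genuine renames or alternative spellings would still need a manual lookup table.
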
We now have an automated process where we simply paste the data from the daily situation reports and refresh our Excel data source.  This process takes around 5 minutes, which is manageable. 

Having found a way to get solid data in a usable format, we were able to build a visualisation using Power BI.  We can now update our visualisation with the new data on a daily basis, and it's available for public consumption.  We published the dashboard to the Power BI server for access by anyone who wants to view it.  We've published the actual visualisation in its own dedicated blog post; click HERE to see it.  Our plan is to keep this updated as new data comes in. 
We hope you find this interesting. 

That's all for now...
Scott (and big thanks to Ian for all his hard work in making this visualisation happen!)
1 Comment
Aravind Hande
31/5/2020 22:38:48

I have a question about this site. How was this corona virus site set up? Do you have to pay the $750 a month to Microsoft for the "app owns data" model so that all users can see these visualizations without needing to be logged in to Power BI (or needing a Pro license)?

https://www.datainsighttraining.com/blog/coronavirus-power-bi-visualisation

I was thinking of creating a React site with embedded Power BI visualizations for Corona Virus and I was wondering if I could do it in a way that wouldn't cost me $750 a month.

Privacy Policy
Copyright datainsighttraining.com
info@datainsighttraining.com
+44 330 223 5910