Earlier this month I came across an article in the BMJ about HealthMap, a website that automatically monitors and disseminates information on disease outbreaks; some of the people involved describe it in more detail in an article¹ in PLoS Medicine.
The traditional disease surveillance network suffers from gaps in coverage and sometimes from restricted flow of information between countries; the purpose of HealthMap, which has been operating since September 2006, is to bring together the large amount of information now available on the Internet from discussion sites, news outlets and the like. It is funded by Google.org, the philanthropic arm of Google, and available free of charge to users.
Currently 14 information sources, containing information from over 20 000 websites, are monitored. Articles are automatically analysed by text mining algorithms to identify diseases and locations, to estimate currency and relevance, to cluster similar articles and to remove duplicates; humans then review the items and correct them where necessary (further extensive human input is needed to maintain and improve the database of disease and geographic names on which the automatic system depends). An article² containing a full technical description of the system is available if you have a subscription to the journal concerned; from the abstract I see that the automatic classification was found to be 84% accurate. It is hoped that the service can be improved both by technological developments and by greater human input through collaboration with users.
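To make the idea of dictionary-based tagging and duplicate removal a little more concrete, here is a minimal Python sketch. It is not HealthMap's actual code: the tiny `DISEASES` and `LOCATIONS` sets, the function names and the title-normalisation rule are all assumptions for illustration, and the real system presumably relies on a much larger curated database and more sophisticated matching.

```python
import re

# Hypothetical, tiny dictionaries; the real database is hand-maintained and far larger.
DISEASES = {"cholera", "dengue", "avian influenza"}
LOCATIONS = {"kenya", "brazil", "vietnam"}


def tag(text, vocabulary):
    """Return the vocabulary terms that occur as whole words in the text."""
    lowered = text.lower()
    return sorted(
        term for term in vocabulary
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    )


def normalise(title):
    """Lower-case a title and strip punctuation so near-identical titles match."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", title.lower()).split())


def process(titles):
    """Tag each title with diseases and locations, skipping duplicate titles."""
    seen = set()
    results = []
    for title in titles:
        key = normalise(title)
        if key in seen:
            continue  # assumed duplicate rule: same normalised title
        seen.add(key)
        results.append({
            "title": title,
            "diseases": tag(title, DISEASES),
            "locations": tag(title, LOCATIONS),
        })
    return results
```

Calling `process` on a list of headlines returns one tagged record per distinct headline; anything the dictionaries do not cover is simply left untagged, which is where the human review described above would come in.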
The visual display makes it easy to spot reports from particular countries; the approximate location of each report or group of reports is displayed using Google Maps (colour-coded by ‘heat index’ — see below), and the user can click on the marker on the map to read the original reports and other related information. The user can choose which diseases they want to know about, or zoom in to a particular country or region. The system is available in Spanish, French, Russian and Chinese as well as English, and further languages are under development. The default option is to show only reports in the language that you choose, but you can choose to see reports in any language; the ‘latest alerts’ list is multilingual in any case.
Of course such a system can never be better than the information that goes into it. The authors of the article are well aware that reporting is least complete in the parts of the world that are at greatest risk from emerging infectious diseases, and addressing this problem is a key aim. News reports may not always be reliable, or representative of the overall situation, but reliable sources such as WHO reports are given more weight than local media reports when calculating the ‘heat index’, and outbreaks mentioned by multiple sources of information are given a higher index. (Recent reports and multiple outbreaks are also given a higher index.)
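The articles I read do not give the formula for the ‘heat index’, so the following Python sketch is only one way such a score might be put together. The source weights, the seven-day half-life and the names `Report`, `SOURCE_WEIGHT` and `heat_index` are all invented for illustration; the only things taken from the description above are that more reliable sources, more sources and more recent reports should each push the index up.

```python
from dataclasses import dataclass
from datetime import date

# Assumed weights: an official source counts for more than a local news item.
SOURCE_WEIGHT = {"who": 3.0, "local_media": 1.0}


@dataclass
class Report:
    source_type: str  # hypothetical label, e.g. "who" or "local_media"
    published: date


def heat_index(reports, today, half_life_days=7.0):
    """Sum source weights over all reports, halving each weight every half_life_days."""
    total = 0.0
    for report in reports:
        age_days = max((today - report.published).days, 0)
        recency = 0.5 ** (age_days / half_life_days)  # assumed exponential decay
        total += SOURCE_WEIGHT.get(report.source_type, 1.0) * recency
    return total
```

Because the contributions are simply added, a location mentioned by several sources scores higher than one mentioned by a single source, and an old report fades gradually rather than disappearing at a fixed cut-off.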
I clicked on 10 randomly chosen markers on the map and looked at the top report for each; there were a couple of oddities (an item about forest pests in Canada and one about the Chinese algal problem described in Vicki’s blog entry of 4th July). I also found that diseases with only one report sometimes didn’t appear on the map at all, and the related background items for any given report often seemed to be a slightly odd selection.
Overall, it seems that this service could be useful to many people, especially if it is used in combination with other information sources. Although it is not perfect, the benefits of a large amount of information being clearly displayed probably outweigh the drawbacks, and the developers are working to improve the service. If anyone reading this has views about it, please add a comment below.
1: Brownstein, J.S. et al.: Surveillance sans frontières: Internet-based emerging infectious disease intelligence and the HealthMap project. PLoS Medicine (2008) 5 (7), e151. DOI: 10.1371/journal.pmed.0050151.
2: Freifeld, C.C. et al.: HealthMap: Global Infectious Disease Monitoring through Automated Classification and Visualization of Internet Media Reports. Journal of the American Medical Informatics Association (2008) 15 (2), pp. 150-157. DOI: 10.1197/jamia.M2544.