Fast Grid-Based Scan Statistic for Detection of Significant Spatial Disease Clusters
Daniel B. Neill, A. Moore
Corresponding author: Daniel B. Neill, Department of Computer Science, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213. Telephone: 412-621-2650; E-mail: neill@cs.cmu.edu.

Abstract

Introduction: The spatial scan statistic is a commonly used statistical test for detecting significant disease clusters. However, the time needed to compute the scan statistic increases as the square of the number of data points M, making the test computationally infeasible for large data sets (M > 100,000). One solution is to aggregate data points to a uniform grid; when the grid is dense, the scan statistic can be computed substantially faster, with complexity O() instead of O(M²). However, even this approach can require multiple days to compute when M is large. Because disease clusters must be found in minutes rather than days for real-time detection, a more efficient algorithm is needed.

Objectives: Given a grid of squares, where each square has an associated count (number of disease cases) and underlying population, the goal is to quickly find the region with the maximum value of the scan statistic (the most significant disease cluster).

Methods: A multiresolution algorithm is proposed that partitions the grid into overlapping regions, bounds the maximum score of each region, and prunes regions that cannot contain the most significant cluster. Users can therefore search over all possible regions while examining only a fraction of them, which reduces complexity to O(M) for dense test regions. As in the original scan statistic, randomization testing is used to calculate the statistical significance (p-value) of the detected cluster. (For additional details, see the full paper at http://www.cs.cmu.edu/~neill/papers/sss-techreport.pdf.)

Results: The algorithm was tested on seven data sets (M ≈ 200,000), including western Pennsylvania emergency department data. The algorithm identified the most significant disease clusters in 20 to 130 minutes, which is 20 to 150 times faster than exhaustive search (Table).

Conclusions: The algorithm yields substantial speedups compared with exhaustive search, making real-time detection of disease clusters computationally feasible. This algorithm is being applied toward automatic real-time detection of outbreaks.

Table
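As a rough illustration of the problem stated under Objectives, the Python sketch below runs the exhaustive grid search that the proposed algorithm is designed to speed up. It is not the multiresolution method itself, and it does no bounding or pruning. Several details are assumptions that do not come from this abstract: regions are taken to be axis-aligned squares, the score is the Poisson log-likelihood ratio commonly used for the spatial scan statistic, randomization replicas distribute the observed number of cases across cells in proportion to population, and all function names are hypothetical.

```python
import math
import random


def kulldorff_score(c, p, c_tot, p_tot):
    """Assumed score: Poisson log-likelihood ratio for count c, population p."""
    if p <= 0 or p >= p_tot:
        return 0.0
    c_out, p_out = c_tot - c, p_tot - p
    # Only regions whose rate inside exceeds the rate outside score above 0.
    if c * p_out <= c_out * p:
        return 0.0

    def xlogy(x, y):
        return x * math.log(y) if x > 0 else 0.0

    return (xlogy(c, c / p) + xlogy(c_out, c_out / p_out)
            - xlogy(c_tot, c_tot / p_tot))


def prefix_sums(grid):
    """Summed-area table so any square's total is read in constant time."""
    n = len(grid)
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            s[i + 1][j + 1] = grid[i][j] + s[i][j + 1] + s[i + 1][j] - s[i][j]
    return s


def square_sum(s, i, j, k):
    """Total over the k-by-k square whose top-left cell is (i, j)."""
    return s[i + k][j + k] - s[i][j + k] - s[i + k][j] + s[i][j]


def best_square(counts, pops):
    """Exhaustive search: score every square region of an n-by-n grid."""
    n = len(counts)
    sc, sp = prefix_sums(counts), prefix_sums(pops)
    c_tot, p_tot = sc[n][n], sp[n][n]
    best_score, best_region = 0.0, None
    for k in range(1, n + 1):
        for i in range(n - k + 1):
            for j in range(n - k + 1):
                score = kulldorff_score(square_sum(sc, i, j, k),
                                        square_sum(sp, i, j, k), c_tot, p_tot)
                if score > best_score:
                    best_score, best_region = score, (i, j, k)
    return best_score, best_region


def randomization_p_value(counts, pops, replicas=99, seed=0):
    """Assumed null model: cases fall in cells in proportion to population."""
    rng = random.Random(seed)
    n = len(counts)
    observed, region = best_square(counts, pops)
    cells = [(i, j) for i in range(n) for j in range(n)]
    weights = [pops[i][j] for i, j in cells]
    c_tot = sum(map(sum, counts))
    beat = 0
    for _ in range(replicas):
        replica = [[0] * n for _ in range(n)]
        for i, j in rng.choices(cells, weights=weights, k=c_tot):
            replica[i][j] += 1
        if best_square(replica, pops)[0] >= observed:
            beat += 1
    # p-value: rank of the observed maximum among the replica maxima.
    return observed, region, (beat + 1) / (replicas + 1)
```

Because every replica repeats the full search, the cost of the search is multiplied by the number of replicas, which is one reason a faster search matters for real-time use.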
This page last reviewed 9/14/2004