Views from the Field No. 7: Improving Data Quality using GIS
In this Views from the Field blog post we explore Geographic Information Systems (GIS) and how they can be applied to survey design. This blog post is closely linked to previous work we have published on paradata and designing baseline surveys for tracking respondents.
GIS is a system by which spatial and geographic data is stored, managed and analysed. One of the earliest and best-known examples of this kind of spatial analysis comes from public health when, in 1854, John Snow (not that one) investigated the causes of cholera. He plotted cases on a map of Soho, in central London, which supported the theory that cholera is a waterborne disease and is widely regarded as a founding moment for the field of epidemiology. Following the advancement of technology during the last 20 years, the vast potential of GIS has been unlocked. Today huge amounts of data are available to all, as are a number of paid and free-to-use, open source software programs that run on an everyday computer, making GIS more accessible than it has ever been.
At EDI, we believe GIS and other forms of paradata are vital in data collection: to monitor data as it is collected, to check that interviewer behaviour is consistent, and to ensure and improve the quality of the data collected (Choumert-Nkolo, Cust, Taylor, 2018). Geographic data is also a vital part of our systematic tracking information. GPS coordinates, detailed contact information and specific localised directions are all key to ensuring respondents can be tracked during a follow-up survey. Furthermore, analysis and presentation of this tracking data can help reduce attrition and improve survey outcomes. Read our blog post on designing a successful baseline survey for more information.
We are always looking for ways to improve the services we provide and so, earlier this year, EDI’s Research Team, including myself, completed a course in GIS. During the course we learnt the fundamentals of GIS data and how to use GIS software, mainly the open source software QGIS. Once the groundwork was complete, we discussed how analysing spatial data could improve our internal methods for fieldwork monitoring and respondent tracking. Figure 1 shows a map of Tanzania with the locations of households from a recent project (scale removed). It was produced as a result of the GIS course and is an example of the descriptive work that is possible.
Figure 1: Final output
How can GIS be used to improve our monitoring, and how can it be incorporated into our methods to improve the quality of the data we ultimately collect?
At the time of writing there have been a number of instances where GIS has already proved useful to EDI. Chief amongst these are the following two examples. First, we monitored the random walk patterns performed by interviewers by plotting the route of each interviewer over satellite photos of the villages our teams were working in. Figure 2 shows the spread of interviewers’ routes across a village (the satellite image has been removed to avoid identification of the village). From an image such as this it is easy to see whether the random walk methodology is producing the required spread of interviews, and to identify where protocols have been violated or where interviewers have not performed as expected. This can then guide the corrective measures required. Such insights would be very hard to obtain without visualising the data in two dimensions.
Figure 2: Random Walk Routes
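A map is the most natural way to judge spread, but a rough numerical check can sit alongside it. The sketch below is an illustration only and not the method we used: it assumes a rectangular bounding box around the village and a grid size chosen by the analyst, and the function names and coordinates are made up. It counts interviews per grid cell and lists the cells with none.

```python
from collections import Counter


def grid_coverage(points, bounds, n_rows, n_cols):
    """Count interviews per grid cell inside a bounding box.

    points: iterable of (lat, lon) pairs in decimal degrees.
    bounds: (min_lat, min_lon, max_lat, max_lon), a hypothetical village box.
    Returns a Counter mapping (row, col) -> number of interviews,
    with row 0 at the southern edge and col 0 at the western edge.
    """
    min_lat, min_lon, max_lat, max_lon = bounds
    counts = Counter()
    for lat, lon in points:
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            continue  # outside the box; worth checking separately
        # Scale the position into a cell index; clamp points on the far edge.
        row = min(int((lat - min_lat) / (max_lat - min_lat) * n_rows), n_rows - 1)
        col = min(int((lon - min_lon) / (max_lon - min_lon) * n_cols), n_cols - 1)
        counts[(row, col)] += 1
    return counts


def empty_cells(counts, n_rows, n_cols):
    """List the grid cells that contain no interviews."""
    return [(r, c) for r in range(n_rows) for c in range(n_cols)
            if counts[(r, c)] == 0]


# Made-up coordinates on a unit square, purely for illustration.
interviews = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.8), (0.7, 0.2)]
counts = grid_coverage(interviews, bounds=(0.0, 0.0, 1.0, 1.0), n_rows=2, n_cols=2)
print(empty_cells(counts, 2, 2))
```

An empty cell is a prompt to look at the map, not a verdict: that part of the village may simply be uninhabited.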
Second, during the preparation phase of a project with Results for Development, EDI plotted the GPS coordinates of primary and replacement health facilities and then calculated the nearest replacement facility to each primary facility, so teams were well prepared with the closest replacements should they be required. Figure 3 displays the primary health facilities in black, linked with the closest replacement facilities in white. This helped the coordination team decide which replacement facilities to use in the field, saving time and resources. Once a replacement has been used, it can be removed from the calculation and the nearest facility recalculated for the next replacement required. As I touched on earlier, similar visualisations can be produced to help track respondents. Respondent household locations and directions can be pre-loaded onto interviewers’ tablets, allowing them to locate households easily, which reduces tracking time during fieldwork and lowers attrition rates.
Figure 3: Nearest Replacement Facilities
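The nearest-replacement calculation can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the workflow we actually ran: it uses the haversine great-circle distance on a spherical Earth, and the facility names, coordinates and function names are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_replacement(primary, replacements, used=()):
    """Return (name, distance_km) of the closest unused replacement.

    primary: (lat, lon) of the primary facility.
    replacements: dict mapping facility name -> (lat, lon).
    used: names already allocated, which are excluded from the search.
    Returns None when every replacement has been used.
    """
    candidates = [
        (haversine_km(primary, coords), name)
        for name, coords in replacements.items()
        if name not in used
    ]
    if not candidates:
        return None
    distance, name = min(candidates)
    return name, distance


# Made-up facilities, purely for illustration.
primaries = {"P1": (-6.80, 39.28), "P2": (-6.16, 35.75)}
replacements = {"R1": (-6.82, 39.27), "R2": (-6.17, 35.74), "R3": (-6.90, 39.20)}

used = set()
first, first_km = nearest_replacement(primaries["P1"], replacements, used)
used.add(first)  # once used, the facility drops out of the calculation
second, second_km = nearest_replacement(primaries["P1"], replacements, used)
print(first, round(first_km, 1), second, round(second_km, 1))
```

Straight-line distance ignores roads and terrain, so the result is best treated as a shortlist for the coordination team, not a final decision.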
The positive effects of this training are already being seen, with EDI teams taking GIS into account when advising on data collection projects. For example, by considering GIS during project preparations we are improving protocols so that GPS information is defined consistently and can be used more readily in analysis. We are also taking steps to improve the accuracy of GPS readings by instructing interviewers to ensure they have a clear, unobscured view of the sky when they take them, as taking a reading inside a building can skew it. We also give specific and consistent instructions for when this is not possible. For example, if interviewing households in a block of flats, the GPS reading for every household in that block is taken outside the front entrance. Surveybe’s in-built commenting function can then be used to provide further detail on the location of the household within the block of flats.
The EDI Research Team will continue to develop its own ideas and innovations, and to adopt external methodologies from the world of GIS where appropriate, to contribute to the high-quality data we systematically produce. We all know that poor quality data undermines and hinders the efforts of the development sector, and embracing GIS to improve data quality is another step towards achieving our collective goals.