Statement of Data Verification and Validation Procedures
BTO is a world-leading impartial, scientific research charity interested in the status and changes of wildlife populations in relation to their ecology and the habitats on which they depend. We specialise in knowledge about birds, in the UK and internationally, but increasingly work with partners to improve knowledge of other wildlife taxa. We are a major custodian of wildlife data, with over 200 million records, organised through long-term schemes, some of which have run continuously for more than 50 years. We are committed to making our data and information products available to inform nature conservation, land management, policy and scientific purposes by sharing our data, publishing our results and providing impartial scientific advice.
These data are collated through a range of surveys (in which data collection follows specified protocols), schemes (in which records are submitted in a more ad hoc manner) and projects. These include our core surveys, some in operation for over 50 years, which monitor populations of British breeding and wintering birds (and, increasingly, other taxa), as well as a range of bespoke targeted schemes, of which the highest profile are periodic atlas surveys. It is essential that the records in our databases are appropriately validated, stored and interpreted. The Trust has robust procedures in place to facilitate this. In particular, we ensure that:
- data are collected using repeatable, systematic observation and documented methods
- data are verified and validated in a manner appropriate to the scheme goals
- data are stored securely
- data are appropriately analysed, with results published in the peer-reviewed literature
BTO’s long-term surveys (and the BirdTrack scheme) are overseen by steering groups, which provide strategic direction on their goals and ensure that operating methods are appropriate and fit for purpose. Additionally, we regularly hold workshops and other events, as necessary, on specific aspects of scheme operation, involving participants, organisers, data-users and wider stakeholders, to ensure the schemes are collecting relevant data appropriate to the uses to which they are being directed. By operating a national network of Regional Organisers, trainers and mentors, running training schemes and providing online and printed material, we aim to maximise the skills of our volunteers (see below).
Data are collected using repeatable, systematic observation
- Surveys are designed such that sampling is approached in a systematic way to provide data from representative locations and habitats, to allow broader inferences to be drawn. Where practicable, we aim to do this using a stratified random sampling design that is consistent with the survey’s aims. Where this is not possible, inference from the data will be limited accordingly.
- For long-term and periodic schemes, wherever possible, participants survey the same site as in previous years.
- In regions with few volunteers (such as the uplands), we may accept a greater turn-over of volunteers, or employ professional surveyors, to ensure sufficient coverage is achieved.
- All observers are provided with instructions detailing data collection methods and clear forms to ensure the relevant information is recorded in a standardised fashion.
- We provide opportunities for training and relevant support material to surveyors, both on paper and digitally, through our website and using other media as appropriate, to ensure surveyors are well-informed about data collection protocols and the need to follow them consistently.
- Online data submission systems, wherever possible, are designed around a common structure to help observers gain familiarity with the systems and hence reduce the likelihood of errors.
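As an illustration of the stratified random sampling mentioned above, the following is a minimal sketch only, not BTO's survey design code. The square identifiers, the stratum labels and the function name `stratified_sample` are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(squares, n_per_stratum, seed=0):
    """Pick up to n_per_stratum squares at random from each stratum.

    `squares` is a list of (square_id, stratum) pairs. Both the pairs and
    the stratum labels are hypothetical illustration data.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    by_stratum = defaultdict(list)
    for square_id, stratum in squares:
        by_stratum[stratum].append(square_id)

    sample = {}
    for stratum, ids in sorted(by_stratum.items()):
        k = min(n_per_stratum, len(ids))  # never ask for more than exist
        sample[stratum] = sorted(rng.sample(ids, k))
    return sample

squares = [
    ("TL1234", "lowland"), ("TL1235", "lowland"), ("TL1236", "lowland"),
    ("NH4501", "upland"), ("NH4502", "upland"),
]
chosen = stratified_sample(squares, n_per_stratum=2)
```

Drawing separately within each stratum means that sparsely covered strata (such as the uplands in this toy example) are still represented in the sample.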
Data are verified and validated in a manner appropriate to the scheme goals
- We have invested in enabling online data capture with the result that most data are submitted using customised online data entry forms with built-in validation algorithms at the point of data entry. Typically, once the data have been entered, a summary is provided to the user for checking and approval before final submission.
- This automated validation is typically enhanced by manual checks of unusual records by local and national organisers.
- Data received on paper are checked in various ways: initially by inspection and then, once the data have been input electronically, by extracting unusual records or counts and, often, all records of rare species. When paper records were the main form of data received, BTO also used procedures such as double-inputting and checking for discrepancies to ensure accuracy.
- For most schemes, the final complete annual dataset (from paper and online submissions) is searched using agreed criteria for unusual records of any type by the scheme organiser and/or regional organisers, and these are then followed up by questions to the original observer.
- Our online systems include auditing of any record editing, and we continue to improve such mechanisms across our data holdings, detailing when and why changes were made and by whom. In some cases individual records may be flagged as unusual or unvalidated, so they may be treated appropriately at the point of use (e.g. in statistical analyses).
- Most BTO records are collected with the primary aim of examining them in aggregate to uncover broad trends, as opposed to considering individual records in isolation, and validation is undertaken at the level necessary for this. Increasingly, though, records may be repurposed: for example, counts from specific Breeding Bird Survey (BBS) squares may be requested for use in local site assessments, such as might be required for planning consent. In these cases, careful consideration is given to the interpretative guidance provided alongside such data so that they can be re-used appropriately (for example, BBS records provide evidence of presence, but are unlikely to provide a good estimate of true abundance without analyses that account for variation in detectability).
- Whilst modern systems allow powerful and flexible approaches to automation, it is important to recall that the long time-series of some BTO datasets extend back well before the information systems revolution, and verification of some (particularly older) records may have been adequate for their primary purpose then but less complete in the context of other current uses.
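The double-inputting check described above can be sketched as follows. This is an illustrative example only: the record keys, field names and the function `find_discrepancies` are assumptions and do not describe BTO's actual systems.

```python
def find_discrepancies(first_pass, second_pass):
    """Compare two independent keyings of the same paper forms.

    Each argument maps a record key to a dict of field values.
    Returns a list of (record_key, field, first_value, second_value)
    for every field where the two keyings disagree.
    """
    issues = []
    for key in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(key, {})
        b = second_pass.get(key, {})
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                issues.append((key, field, a.get(field), b.get(field)))
    return issues

# Hypothetical data: the count on form-002 was keyed differently each time.
first = {"form-001": {"species": "Robin", "count": 2},
         "form-002": {"species": "Wren", "count": 11}}
second = {"form-001": {"species": "Robin", "count": 2},
          "form-002": {"species": "Wren", "count": 1}}
discrepancies = find_discrepancies(first, second)
```

Each discrepancy would then be resolved by referring back to the original paper form.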
Data are stored securely
- All data collected electronically are housed in secure relational databases, with daily backups of key data that are stored off site and a continuous archive log allowing recovery to the point in time of any media failure.
- Paper data are all input, usually using double-input procedures to minimise data-entry errors, and most are loaded into the secure relational database. Otherwise, files are saved on the network and backed up nightly. Paper forms are retained and filed, or scanned for digital preservation.
- BTO recognises that records of some species, or from some locations, are especially sensitive and is committed to ensuring that these are held securely, with access restricted to those permitted to see them. We work closely with the Rare Breeding Birds Panel (RBBP) and others to ensure these are protected in an appropriate manner.
- More generally, BTO operates a Computer Security Policy ensuring regular backups are made, file access is restricted to authorised personnel, and robust anti-virus measures are in place. We aim to meet wider industry standards in this regard, e.g. we first achieved Cyber Essentials accreditation in 2018.
Data are appropriately analysed, with results published in the peer-reviewed literature
- All BTO scientists are expected to follow our Code of Good Scientific Practice, which incorporates guidance on quality management procedures to ensure analyses are undertaken to a sufficient standard. This policy is reviewed annually by the Trust’s governing Board of Trustees to ensure it is current and relevant.
- We aim to publish the results of our analyses in the international peer-reviewed scientific literature as appropriate. This provides additional scientific oversight and peer review of both analytical and data collection methods, ensuring they are both reliable and appropriate to the questions or hypotheses that are being tested.
- Scientists continually review their analytical methods in the light of recent advances, and consider the most appropriate methods to use, taking account of the potential biases in the data, in order to generate the most robust findings possible. This often means accepting a degree of uncertainty in the data collected (e.g. stochastic variation in detection, or between observers), but then analysing those data appropriately to account for that uncertainty.
- Increasingly, BTO makes data and information openly available either directly through our own online systems, via third party systems (for example, National Biodiversity Network) or as published datasets for scientific use (for example, the Atlas datasets).
- Where data cannot be directly downloaded, we operate a data-request system to make data available for specific purposes. This allows staff to discuss the proposed plans and ensure that the data requested are suitable to achieve the stated aims.
- As far as possible we ensure that sufficient interpretative background information is presented alongside these data to ensure appropriate inference is drawn (e.g. apparent absence of rare breeding species may be due to publishing restrictions rather than true absence).
Scheme-specific Validation Procedures
BTO encourages people with a wide range of skills to contribute to its survey programme at an appropriate level. Our Regional Organisers are local experts, familiar with their region, and are in a position to assess the capabilities of existing and potential surveyors and to direct participants to a survey requiring an appropriate skill level. Each data record submitted is subject to a level of verification (checking the identification is accurate) and validation (checking it has been recorded and submitted correctly) appropriate to the scheme. The general principles described above are applied, as appropriate, to each of our schemes. The following provides some examples of how these are applied in each scheme, together with some additional, scheme-specific, details.
BirdTrack (online bird recording that includes casual records, complete lists and supplementary information on breeding evidence, direction of flight or sex)
- Auto-validation of dates and counts at point of data entry, based on a county-specific look-up table that is provided by the local bird club. (For data from outside the UK, automated verification rules are used to flag records if they are unusual in comparison to records already entered for that country.)
- Notification, when a local or national rarity is submitted, asking for a description to be sent to the county recorder for assessment at the appropriate level.
- Verification system for use by county bird club officials to check all records submitted.
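A county-specific look-up of the kind described above might work along the following lines. This is a sketch under stated assumptions: the table structure, the thresholds and the names `LOOKUP`, `in_season` and `check_record` are invented for illustration and are not BirdTrack's actual rules.

```python
from datetime import date

# Hypothetical look-up table: (county, species) -> expected season and count.
# "season" is (first_month, last_month); "max_count" is the largest count
# accepted without a query. All values are invented for illustration.
LOOKUP = {
    ("Norfolk", "Swallow"): {"season": (4, 10), "max_count": 500},
    ("Norfolk", "Waxwing"): {"season": (10, 4), "max_count": 50},
}

def in_season(month, season):
    """True if month falls inside the season, which may wrap over the new year."""
    first, last = season
    if first <= last:
        return first <= month <= last
    return month >= first or month <= last

def check_record(county, species, when, count):
    """Return a list of queries for the observer; an empty list means none."""
    rule = LOOKUP.get((county, species))
    if rule is None:
        return ["species not on county list: description may be needed"]
    queries = []
    if not in_season(when.month, rule["season"]):
        queries.append("unusual date")
    if count > rule["max_count"]:
        queries.append("unusually high count")
    return queries

queries = check_record("Norfolk", "Swallow", date(2020, 1, 15), 10)
```

In this sketch a flagged record is not rejected; the queries are prompts for the observer and the county recorder to follow up.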
Atlas (records and counts of all species, received via standardised tetrad surveys and casual, roving records)
- Training and advice articles were written on topics such as providing grid references and breeding evidence.
- Online systems only allowed data to be added to locations on land and for dates within the survey periods.
- Auto-verification of species identity and counts (i.e. highlighting potential errors) based on location and date at point of data entry.
- Verification module for use by Atlas Organisers and other nominated officials, allowing manual acceptance or querying of records. Some records were also validated automatically on the basis of a previously manually accepted record of the same species in the same square within the same time period.
- Production of draft distribution maps for scrutiny by RBBP and all Local Atlas Organisers via an online tool.
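The rule of accepting a record automatically when a matching record has already been accepted manually can be sketched as below. The record fields, the period labels and the function `auto_validate` are hypothetical and are not taken from the Atlas verification module.

```python
def auto_validate(record, manually_accepted):
    """Accept a record if the same species, square and period was already
    accepted by a human verifier; otherwise leave it for manual review.

    `manually_accepted` is a set of (species, square, period) keys.
    """
    key = (record["species"], record["square"], record["period"])
    return "accepted" if key in manually_accepted else "needs review"

# Hypothetical data: one record already accepted by a local organiser.
manually_accepted = {("Nuthatch", "TL88", "breeding 2010")}

repeat = {"species": "Nuthatch", "square": "TL88", "period": "breeding 2010"}
new = {"species": "Nuthatch", "square": "NH45", "period": "breeding 2010"}

repeat_status = auto_validate(repeat, manually_accepted)
new_status = auto_validate(new, manually_accepted)
```

The design choice here is that automation only ever repeats a human decision; a record from a new square still goes to a verifier.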
Wetland Bird Survey (monthly counts of all waterbird species at registered WeBS sites, also Low Tide Counts and the results of WeBS add-on surveys)
- All new counters receive a handbook which covers counting techniques. Training days are available from BTO and Local Organisers. Local Organisers are responsible for being satisfied counters are competent at identifying the usual waterbirds that occur on their allocated site.
- Data are usually input by the counters themselves, or in some cases by local or national organisers, with validation on entry. Species that have previously been recorded at the site are on the main data entry screen, with extra steps required to record a new species for the site.
- Automatic verification flags high counts (using national rules) to the counter for immediate checking. National Organisers also look for significant counts for checking during the annual reporting process, which are confirmed with the counters if necessary.
- WeBS Online includes a ‘Manage my Team’ section for Local Organisers where they can check all the counts from the counters in their region and flag that they have checked a submission and the counts for that submission are all correct.
Breeding Bird Survey/Waterways Breeding Bird Survey (species counts per transect section in distance bands, as well as information on habitats)
- Auto-validation of records and counts at point of data entry, along with manual review by the user prior to submission.
- Records checked by Regional Organisers with the ability for them to comment on records.
- A number of SQL scripts are run by the BBS/WBBS National Organiser to check for errors, such as species encountered outside the expected range.
- Production of distribution maps to aid checking for outliers.
- Prompts to ensure that habitat data are submitted, allowing long-term collation of habitat information for each transect section (levels 1 and 2 only).
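A check for species encountered outside the expected range could take a form like the one below. This is a minimal sketch using an in-memory SQLite database from Python; the table names, column names and rows are hypothetical and do not reflect the BBS database schema or the organiser's actual scripts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: submitted counts, and the regions in which
    -- each species is expected to occur.
    CREATE TABLE counts (square TEXT, region TEXT, species TEXT, n INTEGER);
    CREATE TABLE expected_range (region TEXT, species TEXT);
    INSERT INTO counts VALUES
        ('SQ01', 'East', 'Skylark', 4),
        ('SQ02', 'East', 'Ptarmigan', 1),
        ('SQ03', 'North', 'Ptarmigan', 2);
    INSERT INTO expected_range VALUES
        ('East', 'Skylark'), ('North', 'Skylark'), ('North', 'Ptarmigan');
""")

# Counts with no matching (region, species) row are outside the expected range.
out_of_range = conn.execute("""
    SELECT c.square, c.species
    FROM counts AS c
    LEFT JOIN expected_range AS e
      ON e.region = c.region AND e.species = c.species
    WHERE e.species IS NULL
    ORDER BY c.square
""").fetchall()
conn.close()
```

Records returned by such a query would be candidates for follow-up with the observer rather than automatic rejection.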
Garden BirdWatch (weekly maximum counts of numbers of target species in gardens)
- Volunteers are recruited on the basis that they can identify common garden birds, and are given a book with identification information for all core species.
- Auto-validation of records and counts at point of data entry (limited species presented).
- On a quarterly basis paper data are loaded, and data for all core species are checked for abnormally high counts, which are corrected.
- All data are only from registered gardens, so no possibility of location errors.
- Records of non-core species (rarer birds and mammals for which summary information is not presented online) are not routinely checked, but as and when these records are used or provided externally they are checked and any ‘suspect’ records are flagged, and queried if appropriate.
Garden Bird Feeding Survey (weekly counts of birds at garden feeders in winter)
- Volunteers are recruited from within the pool of Garden BirdWatch participants, can identify common garden birds, and are given a book with identification information for all core species.
- All data are only from registered gardens, so no possibility of location errors.
- Data from all sites (around 250) submitted annually on paper forms, which are checked for errors including abnormal species or high counts when computerised.
Garden Wildlife Health
- All submissions of sick or diseased birds are individually checked and assessed by the vets at the Zoological Society of London.
Nest Record Scheme (location and contents of individual nests)
- Prospective participants are provided with a handbook, which includes a Code of Conduct and details of survey methodology. On joining the survey, they sign a registration form to say that they have read, understood and agree to follow these protocols.
- Training in field techniques is available via training events and a network of volunteer mentors.
- Records input and submitted using downloaded survey software are subject to data type validation, range and constraint validation (including nulls in mandatory fields) and cross-reference validation, e.g. grid reference is validated against given county. Entries may be disallowed or accepted with a warning, e.g. valid but biologically unlikely values.
- On loading into the database, records with valid but biologically unlikely values are individually checked by staff. Those records associated with a high degree of uncertainty (for example, where the first egg date has been estimated) may be excluded from routine analyses.
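The layers of validation described above (mandatory fields, data type, range and cross-reference checks, with entries either disallowed or accepted with a warning) can be sketched as follows. The field names, the threshold `LIKELY_MAX_CLUTCH`, the county table and the function `validate_nest_record` are assumptions made for illustration, not the rules used by the survey software.

```python
# Hypothetical cross-reference table: county -> grid-square prefixes it spans.
COUNTY_PREFIXES = {"Norfolk": {"TF", "TG", "TL", "TM"}}
LIKELY_MAX_CLUTCH = 15  # invented threshold for "valid but unlikely"

def validate_nest_record(rec):
    """Return (errors, warnings). Errors disallow the entry; warnings do not."""
    errors, warnings = [], []

    # Constraint validation: no nulls in mandatory fields.
    for field in ("species", "county", "grid_ref", "clutch_size"):
        if rec.get(field) in (None, ""):
            errors.append(f"{field} is mandatory")
    if errors:
        return errors, warnings

    # Data type and range validation.
    clutch = rec["clutch_size"]
    if not isinstance(clutch, int):
        errors.append("clutch_size must be a whole number")
    elif clutch < 0:
        errors.append("clutch_size cannot be negative")
    elif clutch > LIKELY_MAX_CLUTCH:
        warnings.append(f"clutch_size {clutch} is valid but biologically unlikely")

    # Cross-reference validation: grid reference against the given county.
    prefixes = COUNTY_PREFIXES.get(rec["county"])
    if prefixes is not None and rec["grid_ref"][:2].upper() not in prefixes:
        errors.append("grid_ref does not fall within the given county")

    return errors, warnings

record = {"species": "Blue Tit", "county": "Norfolk",
          "grid_ref": "TG2308", "clutch_size": 20}
errors, warnings = validate_nest_record(record)
```

Records that pass with warnings are the ones that, in the process described above, would be checked individually by staff after loading.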
Ringing (status and condition of individual birds caught)
- All ringers are provided with a handbook detailing best practice in terms of welfare, data collection (what and how) and data submission. In addition, all ringers undertake extensive training in the field and are given guidance on data collection and submission by their trainer.
- All data are submitted electronically with validation on data entry. This includes validation rules for key fields (for example, that the grid reference entered is within the entered county and that biometric measures are within expected ranges). A violated rule raises either an error, which prevents the record from being saved until it is corrected, or a warning, which requires a validation comment.
- Prompts to ensure all mandatory fields are submitted for each type of record.
- On upload to Oracle tables, all records go through further internal validation, and those that fail are either rejected or flagged for edit by staff. In particular, subsequent encounters are given a final sense check by staff to identify and check unusual movements.
- The scheme is overseen by a Committee including members of the BTO Board of Trustees and active ringers, elected from within the membership, which has oversight of the strategic scientific direction of the scheme and the standards to which ringers operate.
Single Species Surveys
- Online validation at the point of data entry on date, location and counts.
- Prompts to ensure that all mandatory data fields are submitted, so that data quality is maintained.