MBBS data checklist

After a new year and a new series of surveys, here’s a walkthrough of how to update this repository with the latest data.

STEP 1: Checking and QC’ing the ebird data

You’ll want the ‘NC Mini BBS Route Runners’ Google Sheet [1] open to check which routes ought to have been run, and who you should get in touch with if they haven’t been. Check the routes on all three ebird accounts.

Update the ‘sampling events’ tab with comments as needed.

  1. Check that all the routes are accounted for.
  2. Check that all routes have 20 stops and are formatted correctly.
  3. Does Stop 1 have environmental data? Are the types of data written correctly? The only required field is observers=.

Here’s an example of what this environmental data might look like:

observers=Allen Hurlbert, Sarah Pollack; weather=55 F, clear; notes=big thunderstorm last night, everything wet; vehicles=3 (or v=3); habitat=B,H (or h=B,H).

Habitat data may be recorded only if something has changed since the previous year.

* Potential errors:
  * misspellings in the data type, e.g. "wether" instead of "weather".
    To fix, edit the comments to the correct spelling.
  * no *"observer(s)="* in the first stop.
    To fix, check the submitter's ebird account
    or the 'NC Mini BBS Route Runners'
    to see who to credit with running that route.
    Add the information.
  * data types separated by a comma "," instead of semi-colon ";". 
    To fix, edit the comments to change commas to semi-colons.
  * h = no changes
    This will cause errors because that's not an expected habitat string.
    To fix, change it to e.g. notes = no habitat changes
  4. Do the other stops have comments? Are the comments formatted correctly?

  5. Are all the routes sorted under the right county?
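
The Stop 1 format checks above can be partly automated. The following is a minimal sketch in base R, not the repository's own parser: the function names `parse_stop1_comment()` and `check_stop1_comment()` are hypothetical, and the list of accepted field names is an assumption taken from the example comment in this checklist. It does not check whether a habitat string is valid.

```r
# Field names assumed from the example comment in this checklist.
known_keys <- c("observers", "observer", "weather", "notes",
                "vehicles", "v", "habitat", "h")

# Hypothetical helper: split a Stop 1 comment into named key=value fields.
parse_stop1_comment <- function(comment) {
  # Drop a trailing period, then split on semicolons.
  cleaned <- sub("\\.\\s*$", "", comment)
  fields <- trimws(strsplit(cleaned, ";", fixed = TRUE)[[1]])
  fields <- fields[nzchar(fields)]
  # Split each field on its first "=" only.
  keys <- tolower(trimws(sub("=.*$", "", fields)))
  values <- trimws(sub("^[^=]*=", "", fields))
  values[!grepl("=", fields, fixed = TRUE)] <- NA_character_
  stats::setNames(values, keys)
}

# Hypothetical helper: return a character vector of problems (empty if none).
check_stop1_comment <- function(comment) {
  parsed <- parse_stop1_comment(comment)
  problems <- character(0)
  if (!any(names(parsed) %in% c("observers", "observer"))) {
    problems <- c(problems, "missing observers=")
  }
  unknown <- setdiff(names(parsed), known_keys)
  if (length(unknown) > 0) {
    problems <- c(problems, paste("unrecognised field:", unknown))
  }
  # A second "=" inside a value usually means a comma was used instead of ";".
  if (any(grepl("=", parsed, fixed = TRUE))) {
    problems <- c(problems, "possible comma used instead of semi-colon")
  }
  problems
}

good <- "observers=Allen Hurlbert, Sarah Pollack; weather=55 F, clear; vehicles=3; habitat=B,H."
typo <- "wether=55 F, clear; vehicles=3"
commas <- "observers=Allen Hurlbert, weather=55 F"

check_stop1_comment(good)
check_stop1_comment(typo)
check_stop1_comment(commas)
```

A sketch like this only flags candidates for review; the fix is still made by editing the checklist comments on ebird.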

STEP 2: Download the ebird data

Now that the data is QC’d, download the ebird data from all the accounts (it will be sent to Allen’s email initially) and add it to data/ebird. Rename the files to MyEbirdData_[COUNTY]_[YYYYMMDD]. IMPORTANT: do NOT open the files in Excel. If you want to check the data, open the .csv in R. Opening the files in Excel may change the date format and cause errors when the data is processed. If you later get an error about an invalid date format, redownload the data.
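
As a rough illustration of renaming a download and inspecting it in R instead of Excel, here is a base R sketch. Everything specific in it is an assumption made for the example: the downloaded file name, the county name, the use of today's date in the file name, the `.csv` extension on the renamed file, and a `Date` column in YYYY-MM-DD form. It works in a temporary directory rather than data/ebird.

```r
# Stand-in for a freshly downloaded export (hypothetical name and contents).
downloaded <- file.path(tempdir(), "MyEBirdData.csv")
writeLines(c("Submission ID,Common Name,Date",
             "S0000001,Carolina Wren,2024-05-18"), downloaded)

# Rename to MyEbirdData_[COUNTY]_[YYYYMMDD]; county and date are placeholders.
target <- file.path(
  tempdir(),
  sprintf("MyEbirdData_%s_%s.csv", "orange", format(Sys.Date(), "%Y%m%d"))
)
file.rename(downloaded, target)

# Read every column as text so nothing is silently reformatted.
ebird <- read.csv(target, colClasses = "character", check.names = FALSE)
head(ebird)

# Dates that do not parse as YYYY-MM-DD suggest the file was altered.
bad_dates <- is.na(as.Date(ebird[["Date"]], format = "%Y-%m-%d"))
sum(bad_dates)
```

If `sum(bad_dates)` is greater than zero on a real file, redownloading the data is the safer fix than editing the dates by hand.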

STEP 3: Update the taxonomy

Download the latest version of the eBird taxonomy CSV to the data/taxonomy folder. The latest version of the taxonomy is published on the eBird website.

The file should be named with the format ebird_taxonomy_vYYYY (it should download in this format). Leave previous versions in the directory.

If conform_taxonomy() later stops the update because it has flagged a common name in the historical data that is NOT present in the ebird data (i.e. the taxonomy for that species has changed and been updated on ebird), add the species to the case_when() in conform_taxonomy() in taxonomy.R, e.g. “House Wren ~ Northern House Wren”.
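
To show the general shape of such a case_when() entry, here is a small standalone sketch using dplyr. It is not the body of conform_taxonomy(): the data frame and the `common_name` column name are assumptions made for the example.

```r
library(dplyr)

# Toy historical data; the real column name may differ.
historical <- data.frame(
  common_name = c("House Wren", "Carolina Wren"),
  stringsAsFactors = FALSE
)

updated <- historical %>%
  mutate(common_name = case_when(
    common_name == "House Wren" ~ "Northern House Wren",
    TRUE ~ common_name  # leave every other name unchanged
  ))

updated$common_name
```

Each renamed species gets its own line ahead of the final catch-all, so names without a taxonomy change pass through untouched.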

STEP 4: Run the update locally

Now, you’ll run the update locally. This will enable you to do two things:

  1. check for errors and
  2. give input on any new observers.

To run the update locally, run devtools::load_all() while you have the mbbs project open in RStudio. If you don’t have devtools installed, run install.packages('devtools') first. The functions for creating both the stop-level and the route-level datasets are in the R/mbbs folder. All the other R code in the project feeds into these functions, and create_mbbs_data() is the one thing that needs to run to update the data. Run this function: create_mbbs_data().

As you import the data for each county, warning messages may appear in the console. E.g.: “The following year/route don’t have either 1 or 20 checklists:”. Follow up on these messages - first check for a note in the NC Mini BBS Route Runners sheet (in the sampling events tab), then fix any errors, and finally redownload the data from ebird as necessary.
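
When following up on a warning like the one above, it can help to count checklists per year and route yourself. The sketch below uses base R on toy data with one row per checklist. The column names `year` and `route` and the route labels are assumptions, not the package's actual schema.

```r
# Toy data: one row per checklist (hypothetical routes).
checklists <- data.frame(
  year  = rep(2024, 40),
  route = c(rep("orange-01", 20), rep("orange-02", 19), "orange-03"),
  stringsAsFactors = FALSE
)

# Count checklists for each year/route combination.
counts <- aggregate(n ~ year + route,
                    data = cbind(checklists, n = 1),
                    FUN = sum)

# Keep only combinations that have neither 1 nor 20 checklists.
flagged <- counts[!counts$n %in% c(1, 20), ]
flagged
```

Here only the route with 19 checklists is flagged, which is the kind of case to look up in the sampling events tab before changing anything.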

You will get INFO notices for changes occurring in the background, WARN notices for warnings that should be fixed, and ERROR notices for large errors that must be fixed.

If there are any new observers, or any previous observers whose names have been entered with typos in the new data, you will be prompted to give input on their names.

Are there any checklists which, after discussion, need to be removed from the data? Add their checklist IDs to data/excluded_submissions.yml. Examples include duplicate checklists and pre-count owling checklists, which for the moment are excluded from the data. Leave a note about the county, route, year, and why each checklist was excluded.

If everything’s gone well, we’ve now updated the mbbs datasets with the new year’s info.

STEP 5: Update the version number

Once you’ve confirmed that the update is running smoothly locally, update the version number in the DESCRIPTION file.

STEP 6: Push update to github

Push your commits to github.

The data available for download on the website will be automatically updated. Download it and check that the latest year of data is available in the dataset. Voilà!
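
One quick way to make that final check is to read the downloaded file in R and look at the most recent year. This is a sketch only: the file name and the `year` column are assumptions, and a small stand-in file is written to a temporary directory so the example runs on its own.

```r
# Stand-in for the file downloaded from the website (hypothetical name).
path <- file.path(tempdir(), "mbbs_download.csv")
write.csv(
  data.frame(year = c(2023, 2024), route = c("orange-01", "orange-01")),
  path,
  row.names = FALSE
)

mbbs <- read.csv(path)

# The latest year in the data should match the survey year just added.
latest_year <- max(mbbs$year)
latest_year
```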


  1. The sheet is view-only unless it has been shared with your account.