This document describes the MBBS data and the parts of the mbbs package related to it.
MBBS data comes from several sources. They are listed briefly here and explained in more detail below.
eBird
Since 2009, all MBBS checklists have been submitted to eBird. These checklists are downloaded manually as CSV files. Prior to 2020, checklists were submitted at the route level. Beginning in 2020, some users began submitting individual stop-level checklists, and by 2022 all (or nearly all) checklists were stop-level eBird checklists.
historical
stop-level
survey-list
taxonomy
data/taxonomy/ebird_taxonomy_vXXXX.csv. The get_latest_taxonomy function is used internally for accessing the taxonomy.
route_stop_coordinates
excluded-submissions
stop-deviations
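As a rough illustration of the taxonomy entry in the list above, choosing the most recent versioned taxonomy file could look like the following Python sketch. It is not the package's get_latest_taxonomy (an R function whose implementation is not shown in this document). The helper name latest_taxonomy_file and the assumption that the elided version suffix is purely numeric are hypothetical.

```python
import re
from pathlib import Path

# Hypothetical pattern: assumes the elided "XXXX" suffix is a run of digits.
TAXONOMY_PATTERN = re.compile(r"^ebird_taxonomy_v(\d+)\.csv$")


def latest_taxonomy_file(filenames):
    """Return the filename with the highest numeric version, or None."""
    versioned = []
    for name in filenames:
        # Match on the file name only, so directory prefixes are allowed.
        match = TAXONOMY_PATTERN.match(Path(name).name)
        if match:
            versioned.append((int(match.group(1)), name))
    if not versioned:
        return None
    # Tuples compare by version number first.
    return max(versioned)[1]
```

Files that do not match the naming pattern are ignored rather than treated as errors, which keeps the sketch tolerant of other files in the same folder.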
flowchart TD
A[eBird]
B[historical]
C[stop-level]
D[taxonomy]
F[coordinates]
G[excluded-submissions]
H[survey-list]
O1(by-stop)
O2(by-route)
A --> O1
B --> O1
C --> O1
D --> A
D --> B
A --> H
C --> H
H --> O1
G --> A
E --> O1
F --> O1
A --> O2
O1 --> O2
See also: data-checklist.
eBird
The eBird checklists are manually downloaded as CSV files from each of the MBBS eBird accounts: mbbsorangenc, mbbsdurhamnc, and mbbschathamnc. Files are stored in data/ebird/.
Sometimes stop-level data are missing when in fact the route was surveyed and stop-level data were collected for the rest of the route. The stop_deviations.yml file tracks these deviations.
Deviations could be for 2 reasons:
historical
Prior to 2009, checklists were available on the old MBBS website. These files were collated into a single CSV file for each county. Files are stored in data/historical/. These data are not updated.
stop-level
Prior to 2022 (prior to 2020 for some routes), survey counts were aggregated at the route level. These data are the un-summarized version of the routes for which records exist. The stop-level data come from a variety of sources and are all stored in data/stop_level/. The data are considered stable and are not updated.
The sources are as follows:
1. Excel files provided by observers. The raw data are stored in data/stop-level/ in folders by the name of the observer who sent them. Code in R/prepare_historical_xls creates the processed stop_level_hist_xls.csv.
2. Scraped from the eBird species_comments column. Some checklists summarizing routes on eBird contain stop-level information in the notes for each species by listing comma-separated values of abundance at each stop like “,,3,,1,,,1,1,,,,,,,1,2,,,1,”. The R/process_species_comments R script processes this data into a stop-level format to create stop_level_species_comments.csv.
3. Transcribed paper files. Many surveyors sent Haven Wiley their paper recording sheets, which were then summarized to route for the old website. These sheets have been transcribed with double entry to prevent errors. The transcribed_paper_files_NAME spreadsheets are processed to create stop_level_transcribed_paper.csv.
NOTE: When there is disagreement between counts at the route-level and the stop-level, the stop-level data is taken as the source of truth.
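The species_comments format described above can be illustrated with a short Python sketch. It is not the R/process_species_comments script, whose logic is not shown in this document. The function name parse_species_comment, the treatment of an empty field as a count of 0, and the tolerance for one trailing comma are all assumptions.

```python
def parse_species_comment(comment, n_stops=None):
    """Turn a comma-separated per-stop abundance string into a list of ints.

    Assumptions (not confirmed by the source): an empty field means a count
    of 0, and one trailing comma after the last stop is tolerated.
    """
    fields = [field.strip() for field in comment.strip().split(",")]
    # A trailing comma yields one extra empty field; drop it.
    if fields and fields[-1] == "":
        fields.pop()
    counts = [int(field) if field else 0 for field in fields]
    # Optionally check the number of stops (pass n_stops to enable).
    if n_stops is not None and len(counts) != n_stops:
        raise ValueError(f"expected {n_stops} stops, got {len(counts)}")
    return counts


# The example string quoted in the list of sources.
counts = parse_species_comment(",,3,,1,,,1,1,,,,,,,1,2,,,1,")
# (stop number, count) pairs for stops where the species was recorded.
nonzero = [(stop, n) for stop, n in enumerate(counts, start=1) if n > 0]
```

Summing the parsed counts gives a route-level total that can be compared against the route-level checklist, which is one way to spot the disagreements mentioned in the note.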
survey-list
The data/survey-list.csv file contains a basic summary of all route/years for which a survey was completed. This data is taken from eBird, scraped from the old website, and/or confirmed independently with surveyors about who ran which routes each year. It also summarizes the number of species (S) and total birds seen (N). It is updated when new surveys have been added to eBird.
A normalized version of this file is available as a data product.
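The S and N summaries are simple to state in code. The Python sketch below shows one way to compute them for a single route/year. The function name summarize_survey and the input shape (a mapping from species name to total count) are hypothetical, the counts are made up for illustration, and treating a zero count as "not observed" is an assumption.

```python
def summarize_survey(counts):
    """Summarize one route/year.

    counts: mapping of species name -> total count on the route.
    Returns the number of species observed (S) and total birds seen (N).
    """
    # Assumption: species listed with a count of 0 do not count toward S.
    observed = {species: n for species, n in counts.items() if n > 0}
    return {"S": len(observed), "N": sum(observed.values())}


# Made-up counts for illustration only.
example = {"Carolina Wren": 5, "Northern Cardinal": 7, "Wood Thrush": 0}
summary = summarize_survey(example)
```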
The following data are available from the MBBS data website.
mbbs_stops_counts.csv
: Species counts by route/stop/year for route/years for which data are available.
    year
    : year of survey
    county
    : chatham | durham | orange
    route
    : survey route
    route_num
    : route number within a county
    stop_num
    : stop number within a route (1 - 20)
    source
    : One of ebird, obs_details, transcribed_paper, or observer_xls. When data are available from multiple sources, the following order of preference is used: ebird, obs_details, transcribed_paper, observer_xls. See the stop-level description for details on each of these sources.
    common_name
    : common name from eBird taxonomy
    sci_name
    : scientific name from eBird taxonomy
    count
    : count of birds observed
mbbs_route_counts.csv
: Species counts by route/year for all route/years. This dataset both summarizes mbbs_stops_counts.csv at the route level and additionally includes route/years where stop-level counts are not available.
    year
    : year of survey
    county
    : chatham | durham | orange
    route
    : survey route
    route_num
    : route number within a county
    nstops
    : number of stops that were surveyed on this route
    source
    : One of historical, stop-level, or ebird, meaning the data came from historical sources, were summarized from stop-level, or came directly from ebird, respectively.
    common_name
    : common name from eBird taxonomy
    sci_name
    : scientific name from eBird taxonomy
    count
    : count of birds observed
surveys.csv
: Summary of all surveys run.
    route
    : survey route
    year
    : year of survey
    obs1
    : name of first observer
    obs2
    : name of second observer
    obs3
    : name of third observer
    total_species
    : total number of species observed
    total_abundance
    : total number of birds observed
    date
    : date of survey. If the survey was conducted on more than one date, the most common date is used.
    protocol_violation
    : A boolean flag indicating that the survey had any protocol violations, which may include:
comments.csv
: Caveat emptor: this is essentially all the comments from eBird submissions. It does include vehicles and weather fields, which are parsed from the comments field.
route_stop_coordinates.csv
: Geographic coordinates of each stop.
    county
    route
    route_num
    stop_num
    lat
    lon
    stop_notes
log.txt
: output of the data processing log
Data products are versioned as follows… TODO
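The source preference described for the source field of mbbs_stops_counts.csv amounts to picking the first available source from a fixed order. A minimal Python sketch follows. The function name preferred_source is hypothetical, and the sketch does not show how the package actually applies the preference, for example at which level (route/year or individual stop) the choice is made.

```python
# Preference order as listed for the source field of mbbs_stops_counts.csv.
SOURCE_PREFERENCE = ["ebird", "obs_details", "transcribed_paper", "observer_xls"]


def preferred_source(available):
    """Return the most preferred source among those available, or None."""
    available = set(available)
    unknown = available - set(SOURCE_PREFERENCE)
    if unknown:
        raise ValueError(f"unknown source(s): {sorted(unknown)}")
    # Walk the preference order and return the first source that is present.
    for source in SOURCE_PREFERENCE:
        if source in available:
            return source
    return None
```

For example, when a route/year has both transcribed paper sheets and an observer's Excel file, the sketch selects transcribed_paper.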