Monday, September 17, 2007

Bone Rooms, Bird Bodies, and Biodiversity Informatics

Where in the world does the same grizzly bear perpetually whack the same leaping salmon -- both creatures frozen in shared savagery -- while children whisper and point only inches away? Some people believe that museums contain only musty air, stuffy docents, and pure boredom. However, tucked away behind a mysterious door marked "Museum Staff Only" is a dynamic and ever-growing resource few of us are lucky enough to see in person: the museum collection itself. Whether you imagine graybeards stirring up dust as they pin shiny beetles into tiny boxes or a sparkling modern facility, every museum's beating heart is its hidden collection of specimens and associated library of descriptive notebooks. These collections are anything but boring, and many are now online.


A stuffed albatross presides over the MVZ specimen collection, housed in dozens of metal shelves filled with hundreds of trays of preserved creatures.

Dust or no dust, it may also be difficult to guess how 100-year-old bird bodies could have any relevance to the geospatial industry or our lives in general. One visit to the consortium of Berkeley Natural History Museums (BNHM, bnhm.berkeley.museum), however, reveals that collections worldwide are data storehouses of tremendous relevance to researchers in such disciplines as biology, geomorphology, ecology, and climatology. A natural history museum, in particular, is not just a warehouse of dead creatures, but a spatio-temporal census of flora and fauna. Need to search 100 years of specimens for mammals collected in Colorado, sorted by genetic signature and mapped by evolving distribution? Museum curators teaming with in-house geospatial experts are enabling just such analyses by developing spatio-temporal specimen catalogs, taxonomic protocols, and online spatial visualization tools capable of geocoding even "fuzzy" data from historic textual references.


Chasing Critters

What our industry typically calls geospatial data -- points, lines, polygons, raster grids, and so forth -- are either ink on paper or electronic zeros and ones. Usually, when we venture into the field to collect vector and raster data, we don't really bring anything tangible home. The street centerlines we digitize merely represent the streets -- we capture the bits and bytes, but leave the asphalt where it is.

Museum data are, quite literally, a different animal. Museum collectors note when and where they found the creatures they were seeking, but also often bring some critters home with them for further study and preservation. It's both the collectors' written records and the actual bodies, bones, and skins that fill a museum collection. (In comparison, a GIS lab seems a bit empty and sterile -- just a few posters and some humming machines.)



In the past, field biologists collected specimens with shotguns, leg-hold traps, or snap traps. Modern collectors are more likely to capture and release most of the animals they discover, keeping only enough specimens for positive identification and later reference (particularly when sampling small mammals, amphibians, or reptiles). Researchers may trap in the morning and then remove the animals' skins and stuff them with cotton in the afternoon. Back at the museum, they tag the skinned bodies and drop them in a tank of flesh-eating beetles that leave only bones behind. To capture the DNA, collectors save small slices of the animals' livers in vials of alcohol.


Figure 1. A page from one of Joseph Grinnell's 1918 notebooks shows his penciled map of gopher burrows in Siskiyou County, California. Grinnell's notebooks, which are now being scanned to create a queryable Internet-based database, are rich in spatio-temporal textual references to historical ecological conditions.

Some research, such as bird population studies, calls for data about whole groups of animals rather than individuals. In these cases, naturalists simply observe and count, bringing home only photos and notebooks. The contents of the notebooks are data, of course, but the notebooks themselves may also become historic specimens over time. Berkeley's Museum of Vertebrate Zoology (MVZ), a member of the BNHM consortium, for instance, has journals from such collectors as Joseph Grinnell and Aldo Leopold (author of A Sand County Almanac) that are as worthy of preservation as the specimen collections they describe (see Figure 1). Grinnell's highly detailed field notes established a system in the early 1900s that continues to this day at many museums. Specifically, Grinnell attached tabular data to each specimen using a consistent organizational template -- in other words, he pioneered a metadata standard for museum collections.

Part of that standard includes specific spatio-temporal metadata. For instance, Grinnell and his colleague, Tracy Storer, conducted a survey of birds, mammals, reptiles, and amphibians from California's Central Valley through Yosemite Valley to Mono Lake between 1914 and 1920. They followed a transect -- one line cutting across many different ecosystems -- that they gradually navigated during six years of field work. As they captured and observed the creatures, the researchers noted both the location along the transect and the date of each capture. Today, nearly 100 years later, new curators at the same institution, Berkeley's MVZ, are following that same transect to detect changes in species abundance and distribution.

Such long-term comparison studies, however, raise issues about incompatible data formats. It's safe to assume that any modern field-collected data, even if not captured digitally, will ultimately be converted to digital format. Data collected before computers even existed, such as that in Grinnell's and Leopold's notebooks, are also valuable when making temporal comparisons with parallel studies today. Consequently, the National Science Foundation (NSF) awarded MVZ a grant, starting in 2003, to scan the 13,000 pages of field notes and 2,000 photos by Grinnell and others into a queryable Internet-based database. The notebooks' text will become searchable by specimen catalog numbers, names of collectors, scientific names, common names, places, and dates.
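To make the idea of a queryable notebook index concrete, here is a minimal Python sketch of records that can be filtered by the six search keys listed above. It is not MVZ's actual schema or software: the `FieldNoteRecord` class, its field names, the `search` function, and the two sample rows (including their catalog numbers, collectors, and dates) are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FieldNoteRecord:
    """One indexed notebook entry. Field names are assumptions, not MVZ's schema."""
    catalog_number: str
    collector: str
    scientific_name: str
    common_name: str
    place: str
    collected_on: date


# Placeholder rows invented for this example; they are not real museum records.
RECORDS = [
    FieldNoteRecord("EX-0001", "Collector A", "Thomomys bottae",
                    "pocket gopher", "Siskiyou County, California",
                    date(1918, 6, 1)),
    FieldNoteRecord("EX-0002", "Collector B", "Peromyscus maniculatus",
                    "deer mouse", "Yolo County, California",
                    date(1915, 9, 12)),
]


def search(records, **criteria):
    """Return the records that satisfy every criterion.

    String fields match on a case-insensitive substring;
    any other field (such as the date) must match exactly.
    """
    def matches(record):
        for field_name, wanted in criteria.items():
            actual = getattr(record, field_name)
            if isinstance(actual, str):
                if wanted.lower() not in actual.lower():
                    return False
            elif actual != wanted:
                return False
        return True

    return [record for record in records if matches(record)]


if __name__ == "__main__":
    for hit in search(RECORDS, place="siskiyou", common_name="gopher"):
        print(hit.catalog_number, hit.scientific_name, hit.collected_on)
```

A real catalog would sit in a database with proper indexes and date-range queries; the point here is only that once the handwritten notes are keyed to consistent fields, a query combining place, name, and date becomes a simple filter.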


A tray of colorful South American bird specimens tagged with collection metadata.

Supporting the digitization effort, a program called BioGeomancer (www.biogeomancer.org) can accept even "fuzzy" textual spatial references such as "seven miles west of Davis" and automatically return a point location. BioGeomancer is the result of a partnership between the University of Kansas Natural History Museum and Biodiversity Research Center (KUNHM, http://nhm.ku.edu), Brazil's Reference Center on Environmental Information (www.cria.org.br), Yale University (www.yale.edu), and MVZ (www.mip.berkeley.edu/mvz). BioGeomancer's founders have named the service's capability "geoparsing," and like any well-designed Web service, it provides just that single function. The Web site's interface is consequently deceptively simple, offering users four text entry fields that begin with country, step down in scale through state and county, and end with locality. The service will format geoparsed results as Hypertext Markup Language (HTML), Extensible Markup Language (XML), or a graphic map.

BioGeomancer matters to collectors, curators, and users of natural history specimens because it extends the gazetteer concept to handle the grammar that biologists in the field commonly use to describe locations. Basic gazetteers convert place names of, say, cities or monuments into points. BioGeomancer's enhancement to the gazetteer concept is that it parses not just single place names but whole phrases, including locations at some distance and cardinal direction from a nearby city or monument. When hunting for specimens, collectors are seldom actually in the cities that gazetteers reference, but often refer to their position in relation to a nearby city or distant mountain peak.
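The offset-from-a-place idea can be sketched in a few lines of Python. This is not BioGeomancer's code or API, only a toy illustration under stated assumptions: the one-entry `GAZETTEER` (with approximate coordinates for Davis, California), the `geoparse` function, the phrase pattern it accepts, and the flat 69-miles-per-degree approximation are all hypothetical simplifications.

```python
import math
import re

# Hypothetical one-entry gazetteer: place name -> (latitude, longitude).
# The coordinates for Davis, California are approximate.
GAZETTEER = {"davis": (38.5449, -121.7405)}

# Spelled-out distances the toy parser understands.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

# (east, north) components for each supported compass direction.
DIRECTIONS = {
    "north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0),
    "northeast": (1, 1), "northwest": (-1, 1),
    "southeast": (1, -1), "southwest": (-1, -1),
}

MILES_PER_DEGREE_LAT = 69.0  # rough approximation, fine for a sketch

PATTERN = re.compile(
    r"^\s*(?P<dist>\d+(?:\.\d+)?|[a-z]+)\s+miles?\s+"
    r"(?P<dir>[a-z]+)\s+of\s+(?P<place>.+?)\s*$",
    re.IGNORECASE,
)


def geoparse(locality):
    """Turn '<distance> miles <direction> of <place>' into (lat, lon)."""
    match = PATTERN.match(locality)
    if match is None:
        raise ValueError(f"unrecognized locality phrase: {locality!r}")

    dist_text = match.group("dist").lower()
    direction = match.group("dir").lower()
    place = match.group("place").lower()

    if dist_text in NUMBER_WORDS:
        miles = float(NUMBER_WORDS[dist_text])
    else:
        miles = float(dist_text)  # raises ValueError for unknown words
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    if place not in GAZETTEER:
        raise ValueError(f"place not in gazetteer: {place!r}")

    lat, lon = GAZETTEER[place]
    east, north = DIRECTIONS[direction]
    norm = math.hypot(east, north)  # keeps diagonal offsets at full length

    # A degree of longitude shrinks with the cosine of the latitude.
    dlat = miles * north / norm / MILES_PER_DEGREE_LAT
    dlon = miles * east / norm / (MILES_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon


if __name__ == "__main__":
    print(geoparse("seven miles west of Davis"))
```

A production geoparser also has to cope with ambiguous place names, distances measured along roads rather than straight lines, and the uncertainty in the collector's own estimate, none of which this sketch attempts.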
