Interact with DataONE data repositories using the dataone R package

The dataone R package enables users to construct programmatic queries to DataONE data repositories. This should be very helpful for people interested in making their analyses reproducible! Here I show my workflow for locating, analyzing, and plotting Oneida Lake benthic invertebrate data, all without leaving my RStudio session.

Setup

Let’s assume that I know that my target data is on KNB, so I start by setting up dataone for a node-specific KNB search:

library(dataone)

cn <- CNode("PROD")
mn <- getMNode(cn, "urn:node:KNB")

Find data id

Let’s also assume that, at least initially, I do not know the id of my target dataset. I only know that the dataset title should include the keywords benthic and Oneida. I choose to use the Solr method to query the database on the Oneida keyword and filter the results to entries containing the benthic keyword, limiting the result fields to id, title, and dateModified. For a more detailed explanation of the Solr query method, see this dataone vignette.

(qy <- dataone::query(cn, list(
                              rows = "300", 
                              q    = "title:*Oneida*",
                              fq   = "(title:*benthic*)",
                              fl   = "id,title,dateModified"), 
                     as = "data.frame"))
##                             id dateModified
## 1                 kgordon.4.51   2016-02-01
## 2                 kgordon.4.52   2016-09-01
## 3                 kgordon.4.56   2016-12-07
## 4                 kgordon.4.38   2016-02-01
## 5                 kgordon.4.37   2013-11-14
## 6  doi:10.5063/AA/kgordon.4.24   2015-01-05
## 7   doi:10.5063/AA/kgordon.4.9   2015-01-06
## 8   doi:10.5063/AA/kgordon.4.3   2015-01-06
## 9  doi:10.5063/AA/kgordon.4.34   2015-01-06
## 10  doi:10.5063/AA/kgordon.4.4   2015-01-06
##                                                                          title
## 1              Benthic invertebrates in Oneida Lake, New York, 1956-
## 2              Benthic invertebrates in Oneida Lake, New York, 1956
## 3              Benthic invertebrates in Oneida Lake, New York, 1956
## 4  Ekman sampling of benthic invertebrates in Oneida Lake, New York,
## 5  Ekman sampling of benthic invertebrates in Oneida Lake, New York,
## 6  Ekman sampling of benthic invertebrates in Oneida Lake, New York,
## 7       Eckman sampling of benthic invertebrates in Oneida Lake, NY,
## 8       Eckman sampling of benthic invertebrates in Lake Oneida, NY,
## 9  Ekman sampling of benthic invertebrates in Oneida Lake, New York,
## 10      Eckman sampling of benthic invertebrates in Oneida Lake, NY,

As far as I know, the results returned by a KNB website query include only the entries not prefixed by a doi designation. Let’s filter our results to exclude the doi entries and sort on the dateModified field to find the most recent version:

library(dplyr)
(qy <- slice(qy, grep("^doi", id, invert = TRUE)))
##             id dateModified
## 1 kgordon.4.51   2016-02-01
## 2 kgordon.4.52   2016-09-01
## 3 kgordon.4.56   2016-12-07
## 4 kgordon.4.38   2016-02-01
## 5 kgordon.4.37   2013-11-14
## 6 kgordon.4.55   2016-09-01
## 7 kgordon.4.57   2016-12-07
##                                                                         title
## 1             Benthic invertebrates in Oneida Lake, New York, 1956-t
## 2             Benthic invertebrates in Oneida Lake, New York, 1956 t
## 3             Benthic invertebrates in Oneida Lake, New York, 1956 t
## 4 Ekman sampling of benthic invertebrates in Oneida Lake, New York,
## 5 Ekman sampling of benthic invertebrates in Oneida Lake, New York,
## 6             Benthic invertebrates in Oneida Lake, New York, 1956 t
## 7             Benthic invertebrates in Oneida Lake, New York, 1956 t
(qy <- arrange(qy, desc(id), desc(dateModified)))
##             id dateModified
## 1 kgordon.4.57   2016-12-07
## 2 kgordon.4.56   2016-12-07
## 3 kgordon.4.55   2016-09-01
## 4 kgordon.4.52   2016-09-01
## 5 kgordon.4.51   2016-02-01
## 6 kgordon.4.38   2016-02-01
## 7 kgordon.4.37   2013-11-14
##                                                                         title
## 1             Benthic invertebrates in Oneida Lake, New York, 1956 t
## 2             Benthic invertebrates in Oneida Lake, New York, 1956 t
## 3             Benthic invertebrates in Oneida Lake, New York, 1956 t
## 4             Benthic invertebrates in Oneida Lake, New York, 1956 t
## 5             Benthic invertebrates in Oneida Lake, New York, 1956-t
## 6 Ekman sampling of benthic invertebrates in Oneida Lake, New York,
## 7 Ekman sampling of benthic invertebrates in Oneida Lake, New York,

Get data package

Next, I download the data package with the getPackage command. This command returns the path to a zip file in the machine’s temporary file system, which can be passed to the unzip function.

resource_path <- paste0("resourceMap_", qy[1,"id"])
zip_path <- getPackage(mn, id = resource_path)
unzip(zip_path)
# the unzipped folder has underscores as separators rather than periods
package_path <- file.path(gsub("\\.", "_", resource_path), "data")
(flist <- list.files(package_path))
## [1] "cbfs.140.3-Oneida_Benthos_1956_to_present.csv.csv"
## [2] "cbfs.141.3-Taxa_list_Oneida_Benthos.csv.csv"      
## [3] "cbfs.27.8-Benthos_locations.csv.csv"              
## [4] "kgordon.4.57-METADATA.pdf"                        
## [5] "kgordon.4.57-METADATA.xml"                        
## [6] "resourceMap_kgordon.4.57-RESOURCE.rdf"
fpath <- file.path(package_path, flist[1])
dt <- read.csv(fpath, stringsAsFactors = FALSE, na.strings = "-999.0")
head(dt)
##   Year DepthGroup Season X.SamplingEvents Chironomidae Ephemeroptera
## 1 1957       deep Spring                1        359.9         121.7
## 2 1958       deep Spring                1         34.3         500.0
## 3 1959       deep Spring                1        822.7          34.8
## 4 1962       deep Spring                4        939.8           0.0
## 5 1963       deep Spring                2        454.2           0.0
## 6 1965       deep Spring                2       1782.5           0.0
##   Trichoptera Other.insects Amphipoda Isopoda Leeches Oligochaeta Planaria
## 1         0.0          17.4       8.7     0.0     0.0          NA       NA
## 2         0.0           0.0       0.0     0.0     0.0          NA       NA
## 3         0.0           0.0       0.0     0.0     0.0          NA       NA
## 4         0.0           6.5       0.0     0.0    39.1          NA       NA
## 5         0.0           0.0      52.2     0.0     0.0         4.3       NA
## 6         4.3           0.0       8.7    69.6    13.0        21.7       NA
##   Snails Clams Quagga.Mussels Zebra.Mussels
## 1     NA    NA             NA            NA
## 2     NA    NA             NA            NA
## 3     NA    NA             NA            NA
## 4     NA    NA             NA            NA
## 5     NA    NA             NA            NA
## 6     NA    NA             NA            NA

Plot data

Now that I have the dataset located, downloaded, and read into a data.frame object, I go ahead and plot the result while faceting on taxonomic group.

library(dplyr)
library(ggplot2)

dt <- group_by(dt, Year)
# average each taxon column (columns 5:17) within years, then melt to long form
dt <- summarise_each(dt, funs(mean), 5:17)
dt <- reshape2::melt(dt, "Year")
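
As an aside, summarise_each() has since been deprecated in dplyr. On current versions the same aggregation can be written with across(); a sketch, assuming the taxa columns run from Chironomidae to Zebra.Mussels as in the output above:

# equivalent to the group_by() + summarise_each() lines above
dt <- group_by(dt, Year)
dt <- summarise(dt, across(Chironomidae:Zebra.Mussels, mean))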

gg <- ggplot(data = dt) + 
        geom_line(aes(x = Year, y = value)) + 
        facet_wrap(~variable)
gg

Interact with ScienceBase.gov using sbtools: Finding Dam Removal Data

I was searching for data on US dam removal trends and stumbled upon a sciencebase.gov page with data that looked promising. However, I wasn’t sure whether this was the best source of data given the mixed results of my earlier search engine queries. I decided to try out the sbtools package as a way of seeing all the files associated with multiple dam removal entries in ScienceBase. I started by passing a query to the query_sb_text function. Although the function returns a list of twenty items by default, I show only the first two, enough to illustrate that it is not obvious which entry is best for my purposes.

library(sbtools)
query_sb_text("Dam Removal Database")[1:2]
## [[1]]
## <ScienceBase Item> 
##   Title: National Dam Removal Database: A living database for information on dying dams
##   Creator/LastUpdatedBy:      / 
##   Provenance (Created / Updated):   / 
##   Children: 
##   Item ID: 552448fce4b027f0aee3d3d4
##   Parent ID: 550c9621e4b02e76d759d7a2
## 
## [[2]]
## <ScienceBase Item> 
##   Title: USGS Dam Removal Science Database
##   Creator/LastUpdatedBy:      / 
##   Provenance (Created / Updated):   / 
##   Children: 
##   Item ID: 55071bf9e4b02e76d757c076
##   Parent ID: 53f630a3e4b09d12e0e9bd1e

Next, I found the subset of the results that sounded most promising and looped over it with item_list_files.

query_names <- c("American Rivers Dam Removals",
                 "Dam Removal Information Portal (DRIP)",
                 "National Dam Removal Science Database",
                 "USGS Dam Removal Science Database",
                 "National Dam Removal Database: A living database for information on dying dams")
res_ids <- lapply(query_names, function(x) query_sb_text(x)[[1]]$id)
lapply(res_ids, item_list_files)
## [[1]]
## data frame with 0 columns and 0 rows
## 
## [[2]]
##                             fname size
## 1 metadata2365201133950090499.xml 3840
##                                                                                                                                       url
## 1 https://www.sciencebase.gov/catalog/file/get/57b6ce28e4b03fd6b7d919cf?f=__disk__15%2Fac%2F72%2F15ac724223c8464c77e8615543dc0c9a07411d00
## 
## [[3]]
## data frame with 0 columns and 0 rows
## 
## [[4]]
##                                                      fname    size
## 1                        20150309_DamRemovalDatabase.accdb 1196032
## 2        20150309_DamRemovalDatabase_Accession Numbers.xml   62542
## 3                20150309_DamRemovalDatabase_Citations.xml  106793
## 4                     20150309_DamRemovalDatabase_Dams.xml  178601
## 5        20150309_DamRemovalDatabase_Study Design Data.xml  322143
## 6            20150309_DamRemovalDatabase_Study Results.xml  373008
## 7           USGS_Dam_Removal_Science_Database_METADATA.xml   77789
## 8                      20150527_Dam Removal Database.accdb 1302528
## 9      20150527_Dam Removal Database_Accession Numbers.xml   62542
## 10             20150527_Dam Removal Database_Citations.xml  107128
## 11                  20150527_Dam Removal Database_Dams.xml  178958
## 12     20150527_Dam Removal Database_Study Design Data.xml  322143
## 13          20150527_Dam Removal DatabaseStudy Results.xml  373008
## 14                     20150710_Dam_Removal_Database.accdb 1777664
## 15     20150710_Dam_Removal_Database_Accession_Numbers.xml   69881
## 16             20150710_Dam_Removal_Database_Citations.xml  107128
## 17                  20150710_Dam_Removal_Database_Dams.xml  195634
## 18     20150710_Dam_Removal_Database_Study_Design_Data.xml  354367
## 19         20150710_Dam_Removal_Database_Study_Results.xml  373008
## 20 20150710_USGS_Dam_Removal_Science_Database_metadata.xml   76303
##                                                                                                                                        url
## 1  https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__43%2Ffb%2F81%2F43fb810a41ebeb792d5f1dad87809e35b8a79354
## 2  https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__93%2F76%2Fef%2F9376ef25e9e1e9f70420448ed98d50a66660c0bd
## 3  https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__43%2F3a%2Fb5%2F433ab51df501fb9065584e22a8459a5abe4ad52c
## 4  https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__54%2Ffc%2Fae%2F54fcaea6e89924e24cedb15bb6d31aac13f41d23
## 5  https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__59%2Fd0%2Fb8%2F59d0b8df94974951b02fe7081b56e1ad5086123f
## 6  https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__1a%2F2d%2F30%2F1a2d307c3286790c7818fceca9972cca37466891
## 7  https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__86%2Fc1%2F41%2F86c1410e5197a5da45a7c2cffb516c99fddb1407
## 8  https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__5f%2Fda%2Fc7%2F5fdac70a07b9908c9e23acf313113f0fff9266e7
## 9  https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__b7%2F88%2F4b%2Fb7884b9f771b65ac655a88d27874b8511534efec
## 10 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__15%2F64%2Fea%2F1564ead29d2c5228a19de8f63d32c36d960f19ba
## 11 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__14%2F1c%2F4c%2F141c4cab303202778b4a5b4ee1a21711a522eed4
## 12 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__0b%2F63%2F0b%2F0b630b5a4f5e924ee447879f3b87786dbc9368cf
## 13 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__6b%2F82%2Fb2%2F6b82b20c4c64fbb96473f29dd6c8988f0038b138
## 14 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__0d%2Fc5%2F78%2F0dc578f5dfd8c03968e1847ce15846bed89c0635
## 15 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__d5%2F4e%2F7e%2Fd54e7eb3c8d04b419d71761f3c7bbc99076183d9
## 16 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__f7%2F7b%2Fd4%2Ff77bd4e124f85e8689ca11a99112ce3f7877964f
## 17 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__fc%2F1c%2Fa3%2Ffc1ca3b96366654f311d3b2abf1da3fdff35b61b
## 18 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__5d%2F66%2F2e%2F5d662ede94fb288e9291672f1e9838b9dff1f2e7
## 19 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__65%2F20%2F30%2F652030ca51fc3d33b5f8f052c8afb14f5fa5a5db
## 20 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__6a%2F1f%2Fa8%2F6a1fa8987c5773a9aca05fdb74de1b6b3beab674
## 
## [[5]]
##          fname   size
## 1     Dams.PNG 187555
## 2 figure13.PNG 424553
## 3 figure14.png 169371
##                                                                                                                                       url
## 1 https://www.sciencebase.gov/catalog/file/get/552448fce4b027f0aee3d3d4?f=__disk__35%2F0d%2F58%2F350d58bf13bd184f88f24e5f61c92ab76cc4c577
## 2 https://www.sciencebase.gov/catalog/file/get/552448fce4b027f0aee3d3d4?f=__disk__e3%2F66%2F6b%2Fe3666bf48aa91fbad53ac3813dffcff0d6922888
## 3 https://www.sciencebase.gov/catalog/file/get/552448fce4b027f0aee3d3d4?f=__disk__87%2Fb1%2Fa1%2F87b1a16e3f65d765bdea0c00a2b9f32173704a77

It looks like only the fourth term is associated with substantive data files; the others return nothing, a lone metadata record, or a few figure images. Next, I narrowed my query to this term and used the item_file_download function to retrieve the largest file, which I will assume is the actual database.

query_names <- c("USGS Dam Removal Science Database")
query_ids   <- query_sb_text(query_names)[[1]]$id
query_files <- item_list_files(query_ids)
big_file <- query_files[order(query_files$size, decreasing = TRUE),][1,]

item_file_download(query_ids, names = big_file$fname,
                   destinations = big_file$fname, dest_dir = ".", overwrite_file = TRUE)
## [1] "20150710_Dam_Removal_Database.accdb"

Since I am on a Linux machine without access to MS Access, reading the Access database file is a bit difficult. Luckily, I can use the mdbtools command line programs (installable with apt-get). First, I identify the Dams table using the mdb-tables command. Then I pipe the output of mdb-export to read.csv.

system(paste0("mdb-tables ", big_file$fname))
res <- read.csv(pipe(paste0("mdb-export ", big_file$fname, " Dams")))
head(res[,1:8])
##   DamAccessionNumber          DamName    DamRiverName
## 1                  1 Murphy Creek Dam    Murphy Creek
## 2                  2       Croton Dam       Big Creek
## 3                  3  Hawkesville Dam Conestoga River
## 4                  4  Huttonville Dam    Credit River
## 5                  5     Chilligo Dam     Ellis Creek
## 6                  6   Greenfield Dam      Nith River
##                      DamLocation DamState_Province DamCountry DamLatitude
## 1 San Joaquin County, California                CA        USA    38.23394
## 2         Delhi, Ontario, Canada           Ontario     Canada    42.82025
## 3   Hawkesville, Ontario, Canada           Ontario     Canada    43.56758
## 4   Huttonville, Ontario, Canada           Ontario     Canada    43.64485
## 5     Cambridge, Ontario, Canada           Ontario     Canada    43.43619
## 6    Greenfield, Ontario, Canada           Ontario     Canada    43.29834
##   DamLongitude
## 1   -121.02740
## 2    -80.50807
## 3    -80.63648
## 4    -79.80412
## 5    -80.33448
## 6    -80.47440

Routing reservoir input hydrographs using the level pool routing method

Level pool routing refers to one of the more straightforward methods for calculating reservoir outflow given an input hydrograph (time vs inflow) along with information about basin discharge relative to elevation. Here the term reservoir is used in the technical engineering sense and does not exclude natural lakes; I believe this method is applicable to any waterbody where you can assume a one-to-one relationship between discharge and elevation. Said another way, we assume that discharge has no hysteresis component and does not depend on the direction (rising vs falling limb) of the input hydrograph.

For a more rigorous mathematical treatment of this method see here, here, and here. The following demonstration lays out the computational details and describes code to implement the method on three test datasets from the above links.

Computation

The level pool routing computation procedure involves:

  • Determining the volume of the reservoir at a range of depths (if not known ahead-of-time) given its area.

  • Developing a relationship between outflow and reservoir change-in-storage. The reference examples use the classical approach of linear interpolation; here I use non-linear generalized additive modelling (GAM) to more flexibly represent the shape of this relationship.

Then for each time-step we:

  • Calculate the sum of inflows for the current and previous time-steps.

  • Calculate change-in-storage-with-time as a function of outflow during the previous time-step.

  • Use our fitted relationship to calculate outflows for the current time-step as a function of reservoir change-in-storage.

  • Subtract these outflows from the running reservoir storage term (the update equation below summarizes these steps).
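
In equation form, each pass through the loop below solves the standard level pool routing update (the textbook relation that the ii, sjtminq, and sj1tplusq columns in the code correspond to):

\[ \frac{2 S_{j+1}}{\Delta t} + Q_{j+1} = \left( I_j + I_{j+1} \right) + \left( \frac{2 S_j}{\Delta t} - Q_j \right) \]

where I is inflow, Q is outflow, and S is storage. The fitted relationship between discharge and 2S/Δt + Q is then used to recover the outflow Q_{j+1} from the left-hand side at each step.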

#' level_pool_routing
#' @param lt data.frame with time and inflow columns
#' @param qh data.frame with elevation and discharge columns.
#'  Storage column optional.
#' @param area numeric reservoir area
#' @param delta_t numeric time step interval in seconds
#' @param initial_outflow numeric
#' @param initial_storage numeric
#' @param linear.fit logical; if TRUE, fit a linear relationship
#'  between outflow and reservoir change-in-storage, otherwise a GAM
#' @return lt with an added outflow column
#' @importFrom mgcv gam

level_pool_routing <- function(lt, qh, area, delta_t,
                        initial_outflow, initial_storage,
                        linear.fit){
  
  lagpad <- function(x, k) {
    c(rep(NA, k), x)[1 : length(x)] 
  }
  
  lt$ii <- apply(cbind(lagpad(lt$inflow, 1), lt$inflow), 1, sum)

  if(is.null(qh$storage)){
    qh$storage <- area * qh$elevation
  }
  
  qh$stq       <- ((2 * qh$storage) / (delta_t)) + qh$discharge
  
  lt$sjtminq   <- NA
  lt$sj1tplusq <- NA
  lt$outflow   <- NA
  lt[1, c("sj1tplusq")] <- c(NA)
  lt[1, c("sjtminq")] <- ((2 * initial_storage) / delta_t) -
                          initial_outflow
  
  lt[1, "outflow"] <- initial_outflow
  
  if(linear.fit == TRUE){
    fit <- lm(discharge ~ stq, data = qh)
  }else{
    fit <- mgcv::gam(discharge ~ s(stq, k = 3), data = qh)
    
    plot(qh$stq, qh$discharge, xlab = "Change-in-storage-with-time",
         ylab = "Discharge")
    lines(qh$stq, predict(fit))
  }
  
  for(i in seq_len(nrow(lt))[-1]){
    lt[i, "sj1tplusq"] <- lt[i-1, "sjtminq"] + lt[i, "ii"]
    lt[i, "outflow"]   <- predict(fit,
                            data.frame(stq = lt[i, "sj1tplusq"]))
    lt[i, "sjtminq"]   <- lt[i, "sj1tplusq"] -
                            (lt[i, "outflow"] * 2)
  }

  lt
}

Example 1

Linear relationship between discharge and change-in-storage

The data for this example comes from this level-pool routing walkthrough. We are given an inflow hydrograph in 6 hour increments, so we specify a delta_t timestep of 6 * 3600 seconds. The problem statement specifies an initial storage of 0 and an initial outflow of 20. There is no need to specify a reservoir area because storage is given directly alongside discharge.

input_hydro     <- data.frame(
                    time = seq(0, 162, by = 6),
                    inflow = c(0, 50, 130, 250, 350, 540, 735, 1215,
                              1800, 1400, 1050, 900, 740, 620, 510,
                              420, 320, 270, 200, 150, 100, 72, 45,
                              25, 10, 0, 0, 0))
reservoir_char <- data.frame(
                    elevation = c(130:134, 136:139),
                    discharge = c(20, 34, 57, 96, 162, 463, 781,
                              1318, 2226),
                    storage = c(1, 1.69, 2.85, 4.8, 8.12, 23.1,
                              39.1, 65.9, 111))
reservoir_char$storage <- reservoir_char$storage * 1000000
delta_t <- 6 * 3600

res_linear <- level_pool_routing(input_hydro, reservoir_char,
                area = NA, delta_t = delta_t, initial_outflow = 20,
                initial_storage = 0, linear.fit = FALSE)

plot(res_linear$time, res_linear$inflow,
     xlab = "Time (h)", ylab = "Flow (m3/s)")
lines(res_linear$time, res_linear$outflow)
legend("topleft", legend = c("Inflow", "Outflow"), lty = c(NA, 1),
       pch = c(21, NA))

Example 2

Curvilinear relationship between discharge and change-in-storage

The data for this example comes from this level-pool routing walkthrough pdf. I scraped the data tables from the pdf using the pdftools package. Storage is given directly alongside discharge, so we have no need for the reservoir area. Also, we are given an inflow hydrograph in 2 hour increments, so we specify a delta_t timestep of 2 * 3600 seconds. Unlike the previous example, our reservoir has a non-zero initial storage.

library(pdftools)
txt  <- strsplit(pdf_text(
  "https://www.caee.utexas.edu/prof/maidment/CE374KSpr12/Docs/Hmwk5Soln.pdf"), "\n")[[1]]

parse_table <- function(tbl, tbl_names){
  tbl <- lapply(tbl, function(x) read.table(text = x,
            stringsAsFactors = FALSE))
  tbl <- lapply(tbl, function(x) gsub(",", "", x))
  
  inds <- lapply(tbl,
            function(x) ifelse(max(grep(")", x)) == -Inf,
            1, max(grep(")", x))))
  inds <- lapply(seq_along(tbl),
            function(x) c(1, (inds[[x]] + 1):length(tbl[[x]])))
  
  tbl <- lapply(seq_along(tbl),
          function(x) as.numeric(tbl[[x]][inds[[x]]]))
  tbl <- data.frame(t(do.call("rbind", tbl)))
  tbl <- tbl[2:nrow(tbl),]
  names(tbl) <- tbl_names
  
  tbl
}

tbl1 <- suppressWarnings(rbind(
          parse_table(txt[6:7], tbl_names = c("time", "inflow")),
          data.frame(time = seq(20, 36, by = 2), inflow = 0)))
tbl2 <- suppressWarnings(
          parse_table(txt[9:10], tbl_names = c("storage", "discharge")))
tbl2$storage <- tbl2$storage * 1000000
res_curv <- level_pool_routing(lt = tbl1, qh = tbl2, area = NA,
              delta_t = 7200, initial_outflow = 57,
              initial_storage = 75000000, linear.fit = FALSE)

plot(res_curv$time, res_curv$inflow, xlab = "Time (h)",
  ylab = "Flow (m3/s)")
lines(res_curv$time, res_curv$outflow)
legend("topleft", legend = c("Inflow", "Outflow"), lty = c(NA, 1),
  pch = c(21, NA))

Example 3

Oscillating relationship between discharge and change-in-storage

The data for this example comes from this level-pool routing walkthrough ppt. Here we are given discharge as a function of reservoir elevation, but not the corresponding storage. As a result, we must specify a reservoir area. We are given an inflow hydrograph in 10 minute increments, so we specify a delta_t timestep of 10 * 60 seconds. In this case, our reservoir has a zero initial storage and initial outflow.

lt <- data.frame(time = seq(0, 210, by = 10),
         inflow = c(seq(0, 360, by = 60), seq(320, 0, by = -40),
         rep(0, 6)))

qh <- data.frame(elevation = seq(0, 10, by = 0.5),
         discharge = c(0, 3, 8, 17, 30, 43, 60, 78, 97, 117, 137,
         156, 173, 190, 205, 218, 231, 242, 253, 264, 275))
res_osc <- level_pool_routing(lt, qh, area = 43560, delta_t = 600,
            initial_outflow = 0, initial_storage = 0,
            linear.fit = FALSE)


plot(res_osc$time, res_osc$inflow, xlab = "Time (min)",
  ylab = "Flow (cfs)")
lines(res_osc$time, res_osc$outflow)
legend("topleft", legend = c("Inflow", "Outflow"), lty = c(NA, 1),
  pch = c(21, NA))

Using the Geopackage Format with R

Many (most?) people involved in vector geospatial data analysis work exclusively with shapefiles. However, the shapefile format has a number of drawbacks, including the fact that spatial attributes, metadata, and projection information are stored in separate files. For instance, it can take as much as 45 lines of code to ensure a complete “shapefile” download.
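
To make that concrete, here is a minimal sketch of what a “complete” shapefile download can involve (the URL and file names are hypothetical, and real datasets often ship additional sidecar files such as .cpg or .sbn):

# a shapefile is really a bundle of files that must travel together
base_url <- "http://example.com/data/lakes"  # hypothetical URL
exts     <- c(".shp", ".shx", ".dbf", ".prj")
for (ext in exts) {
  download.file(paste0(base_url, ext), destfile = paste0("lakes", ext))
}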

In a post to the R-Sig-Geo listserv (excerpted below) Barry Rowlingson mentions the GeoPackage format which is an interesting alternative to shapefiles.

Barry Rowlingson b.rowlingson at lancaster.ac.uk
Wed Jul 13 19:44:28 CEST 2016

[...] the agency from which I got the data has been enlightened
enough not to use the clunky, outdated, and limited "shapefile" format
and has released the data as a modern, OGC-standard GeoPackage. My
variables have long names, my metadata is stored with my data, and its
all in one file instead of six. [...]

 Shapefiles are an awful, awful format which Esri didn't think people
would actually use. They should not be encouraged. [...] 

Barry

The interested but time-limited geospatial analyst might wonder: Can I easily switch from a shapefile workflow to a GeoPackage workflow? Will my colleagues be able to open and interact with GeoPackage files? Fortunately, the answer is yes: the rgdal R package supports reading and writing of vector geospatial data to GeoPackage files.

First, let’s load some geospatial data into a SpatialPointsDataFrame object.

library(rgdal)
cities <- readOGR(system.file("vectors", package = "rgdal")[1], "cities")
class(cities)
> [1] "SpatialPointsDataFrame"    
> attr(,"package")    
> [1] "sp"

Now let’s write the data to disk using the GeoPackage format:

# outname <- file.path(tempdir(), "cities.gpkg")
outname <- "cities.gpkg"
writeOGR(cities, dsn = outname, layer = "cities", driver = "GPKG")

This file can then be read back into R:

cities_gpkg <- readOGR("cities.gpkg", "cities")
identical(cities, cities_gpkg)
> [1] TRUE

The file can also be opened in your standalone GIS program of choice such as GRASS, QGIS, or even ArcGIS.
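
If the GDAL command line utilities are installed, the layer can also be inspected without opening a GIS at all (assuming ogrinfo is available on your PATH):

system("ogrinfo -so cities.gpkg cities")  # -so prints the layer summary only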

Sometimes you may want to directly access the metadata for your spatial data without loading the object geometry. Because GeoPackage files are formatted as SQLite databases, you can use the existing R tools for SQLite files. One option is to use the slick dplyr interface:

library(dplyr)
cities_sqlite <- tbl(src_sqlite("cities.gpkg"), "cities")
print(cities_sqlite, n = 5)
> Source: sqlite 3.8.6 [cities.gpkg]
> From: cities [606 x 6]

>    fid      geom             NAME COUNTRY POPULATION CAPITAL
>   (int)     (chr)            (chr)   (chr)      (dbl)   (chr)
> 1      1 <raw[29]>         Murmansk  Russia     468000       N
> 2      2 <raw[29]>      Arkhangelsk  Russia     416000       N
> 3      3 <raw[29]> Saint Petersburg  Russia    5825000       N
> 4      4 <raw[29]>          Magadan  Russia     152000       N
> 5      5 <raw[29]>            Perm'  Russia    1160000       N
> ..   ...       ...              ...     ...        ...     ...

Another option is to use the more primitive RSQLite interface:

library(RSQLite)
con <- dbConnect(RSQLite::SQLite(), dbname = "cities.gpkg")
dbListTables(con)

cities_rsqlite <- dbGetQuery(con, 'select * from cities')
head(cities_rsqlite[, -which(names(cities_rsqlite) == "geom")])
dbGetQuery(con, 'select * from gpkg_spatial_ref_sys')[3,"description"]
> fid             NAME COUNTRY POPULATION CAPITAL
> 1   1         Murmansk  Russia     468000       N
> 2   2      Arkhangelsk  Russia     416000       N
> 3   3 Saint Petersburg  Russia    5825000       N
> 4   4          Magadan  Russia     152000       N
> 5   5            Perm'  Russia    1160000       N
> 6   6    Yekaterinburg  Russia    1620000       N

> [1] "longitude/latitude coordinates in decimal degrees on the WGS 84 spheroid"

[Re] Accelerated sea level rise and Florida Current transport

Here I examine the sea level trend calculations presented in Figure 4 and Table 3 of:

Park, J., & Sweet, W. (2015). Accelerated sea level rise and Florida Current transport. Ocean Science Discussions, 12(2), 551-572. [pdf]

The code repository for generating the following figures and tables can be found here. As in the original Park and Sweet (2015) implementation, I calculate sea level trends at two gages with publicly available data (Vaca Key and Virginia Key). My result for Vaca Key was very similar to the original. However, my result for Virginia Key was quite a bit higher despite using the same period-of-record. One explanation for the different trend numbers might be that the authors used provisional data that has since been corrected.
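
For reference, the core trend calculation reduces to a linear fit of monthly mean sea level against time. Here is a minimal sketch of one way to do it (the msl data frame and its decimal_year and level_mm columns are hypothetical names; the original analysis may include additional steps such as seasonal adjustment):

# the slope of the fit is the trend in mm per year and its
# standard error is the reported uncertainty
fit <- lm(level_mm ~ decimal_year, data = msl)
coef(summary(fit))["decimal_year", c("Estimate", "Std. Error")]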

Figure 1 Monthly mean sea levels at Vaca Key. The solid line represents the calculated trend.

Figure 2 Monthly mean sea levels at Virginia Key. The solid line represents the calculated trend.

Table 1 Linear models of mean sea level rise.

Station        Trend (mm yr-1)   Uncertainty (SE)
Vaca Key       3.91              0.43
Virginia Key   4.66              0.92