Thursday, July 28 • 10:50am - 10:55am
Accelerating geospatial computing using Apache Arrow

The ‘arrow’ R package and wider Apache Arrow ecosystem provide an end-to-end solution for querying and computing on in-memory and bigger-than-memory data sets using the Apache Arrow C++ library. In this talk we introduce the ‘geoarrow’ package, which extends Arrow to provide efficient columnar storage for spatial types and functions to support spatial queries in the Arrow compute engine. We focus on a workflow where (1) data are stored in multiple files that can be hosted remotely (e.g., on S3-compatible storage), (2) queries are processed batchwise and in parallel, allowing for efficient processing of bigger-than-memory geospatial data, and (3) results can be passed without copying to Rust, Python, or other R packages for further analysis.

Talk materials are available at https://github.com/rstudio/rstudio-conf/blob/master/2022/deweydunnington/Accelerating%20geospatial%20computing%20using%20Apache%20Arrow%20-%20Dewey%20Dunnington.pdf.

Dewey Dunnington

Voltron Data
Dewey Dunnington (Ph.D., P.Geo.) is an environmental researcher, programmer, and educator based in Nova Scotia, Canada. He recently completed his Ph.D. in lake sediment geochemistry and is currently an R Developer at Voltron Data working on all things Apache Arrow + R...

Thursday July 28, 2022 10:50am - 10:55am EDT
2. Potomac D