High-Resolution Soybean Yield Mapping Across the US Midwest Using Sub-field Harvester Data


Abstract/Contents

Abstract
Cloud computing and freely available, high-resolution satellite data have enabled recent progress in fine-scale crop yield mapping. However, extensive validation data at a matching resolution remain scarce or infeasible to collect in much of the world. Here, we use a large-scale ground-truth dataset of sub-field harvester yields across the United States Midwest to assess the capacity of machine learning models to predict soybean yield. First, we compare random forest (RF) implementations across 400,000 fields, testing a range of feature engineering approaches that use Sentinel-2 and Landsat spectral data for yield prediction at 20- and 30-meter scales. We find that Sentinel-2-based models explain up to 45% of out-of-sample yield variability across 2017-18, while Landsat-based models explain up to 43% across the longer 2008-18 period. Discrete Fourier transforms, or harmonic regressions, proved helpful for capturing soybean phenology and considerably improved a Landsat-based model. Second, we compare RF models trained on our fine-scale harvester data with models trained on freely available county-level data. We find that county-level models rely more heavily on a few predictors, namely August weather covariates (VPD, rainfall, temperature) and July and August near-infrared (NIR) observations. As a result, county-scale models perform relatively poorly in field-scale validation, especially for high-yielding fields, but perform similarly to field-scale models when evaluated at the county scale. Finally, we test whether our findings on variable importance can inform improvements to the Scalable Crop Yield Mapper (SCYM), an approach that uses crop simulations to train yield estimation models. Based on our RF results, the new SCYM model uses harmonic regressions to estimate the peak vegetation index (VI) and a VI observation 30 days later, with August rainfall as the sole weather covariate. These changes improved SCYM’s explained variance and produced a simple, generalizable framework for regions or time periods beyond those where ground data are available.
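The abstract describes using harmonic regression to estimate a field's peak VI and the VI 30 days after that peak. The following is a minimal, illustrative sketch of that general idea, not the thesis's own implementation. It assumes a NumPy least-squares fit with two harmonics and a 365-day period, evaluated on a daily day-of-year grid. The function names `harmonic_design` and `peak_and_lagged_vi` are hypothetical.

```python
import numpy as np


def harmonic_design(doy, n_harmonics=2, period=365.0):
    """Design matrix [1, cos(wt), sin(wt), cos(2wt), sin(2wt), ...] for day-of-year values.

    Assumption: two harmonics and a 365-day period; the thesis may use other settings.
    """
    doy = np.asarray(doy, dtype=float)
    cols = [np.ones_like(doy)]
    for k in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * k / period
        cols.append(np.cos(w * doy))
        cols.append(np.sin(w * doy))
    return np.column_stack(cols)


def peak_and_lagged_vi(doy, vi, n_harmonics=2, lag_days=30, period=365.0):
    """Fit a harmonic regression to sparse VI observations, then read the fitted
    curve at its peak and `lag_days` after the peak (hypothetical helper)."""
    X = harmonic_design(doy, n_harmonics, period)
    # Ordinary least squares for the harmonic coefficients.
    coefs, *_ = np.linalg.lstsq(X, np.asarray(vi, dtype=float), rcond=None)
    grid = np.arange(1, 366)  # daily grid over one year (day 1..365)
    fitted = harmonic_design(grid, n_harmonics, period) @ coefs
    # Note: fitted values far outside the observed dates are extrapolations.
    i_peak = int(np.argmax(fitted))
    peak_doy = int(grid[i_peak])
    lagged_doy = min(peak_doy + lag_days, int(grid[-1]))  # cap at day 365
    lagged_vi = float(fitted[lagged_doy - 1])  # grid index = day - 1
    return peak_doy, float(fitted[i_peak]), lagged_vi
```

For example, on VI samples taken every 10 days between day 120 and day 290 from a smooth seasonal curve peaking on day 200, the function returns day 200 as the peak. The lagged value is read from the fitted curve on day 230. One reason to read both values from a fitted curve rather than from raw observations is that the curve supplies estimates on days with no clear-sky image.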

Description

Type of resource text
Date created June 1, 2020

Creators/Contributors

Author Dado, Walter T
Contributing author Deines, Jillian M
Contributing author Patel, Rinkal
Contributing author Liang, Sang-Zi
Primary advisor Lobell, David B

Subjects

Subject School of Earth, Energy & Environmental Sciences
Subject Crop Yield Mapping
Subject Remote Sensing
Subject Machine Learning
Genre Thesis

Bibliographic information

Access conditions

Use and reproduction
User agrees that, where applicable, content will not be used to identify or to otherwise infringe the privacy or confidentiality rights of individuals. Content distributed via the Stanford Digital Repository may be subject to additional license and use restrictions applied by the depositor.
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).

Preferred citation

Dado, Walter T and Deines, Jillian M and Patel, Rinkal and Liang, Sang-Zi and Lobell, David B. (2020). High-Resolution Soybean Yield Mapping Across the US Midwest Using Sub-field Harvester Data. Stanford Digital Repository. Available at: https://purl.stanford.edu/gj825fq6518

Collection

Master's Theses, Doerr School of Sustainability
