High-Resolution Soybean Yield Mapping Across the US Midwest Using Sub-field Harvester Data

Dado, Walter T; Deines, Jillian M; Patel, Rinkal; Liang, Sang-Zi

Abstract/Contents

Abstract: Cloud computing and freely available, high-resolution satellite data have enabled recent progress in crop yield mapping at fine scales. However, extensive validation data at a matching resolution remain uncommon or infeasible to collect in much of the world. Here, we use a large-scale ground-truth dataset across the United States Midwest to assess machine learning models' capacity for soybean yield prediction. First, we compare random forest (RF) implementations across 400,000 fields, testing a range of feature engineering approaches using Sentinel-2 and Landsat spectral data for 20- and 30-meter scale yield prediction. We find that Sentinel-2-based models can explain up to 45% of out-of-sample yield variability across 2017-18, while Landsat models explain up to 43% across the longer 2008-18 period. Discrete Fourier transforms, or harmonic regressions, proved helpful for capturing soybean phenology, improving a Landsat-based model considerably. Second, we compare RF models trained using our fine-scale harvester data to models trained on freely available county-level data. We find that county-level models rely more heavily on just a few predictors, namely August weather covariates (VPD, rainfall, temperature) and July and August NIR observations. As a result, county-scale models perform relatively poorly on field-scale validation, especially for high-yielding fields, but perform similarly to field-scale models when evaluated at the county scale. Finally, we test whether our findings on variable importance can inform improvements to a Scalable Crop Yield Mapper (SCYM) approach that uses crop simulations to train models for yield estimation. Based on findings from our RF models, we employ harmonic regressions to estimate peak VI and a VI observation 30 days later, with August rainfall as the sole weather covariate in our new SCYM model. These changes proved effective for improving SCYM's explained variance and creating a simple, generalizable framework for regions or time periods beyond those for which ground data are available.
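
As a rough illustration of the harmonic-regression step described in the abstract, the following Python sketch fits a low-order harmonic series to a vegetation-index (VI) time series and reads off the fitted peak VI and the fitted VI 30 days later. It is not the thesis's implementation: the function names, the two-harmonic order, the day-of-year search window, and the synthetic input series are all assumptions made for the example.

```python
import numpy as np


def fit_harmonic(doy, vi, n_harmonics=2, period=365.0):
    """Least-squares fit of vi ~ a0 + sum_k [a_k cos(w_k t) + b_k sin(w_k t)].

    n_harmonics and period are illustrative defaults, not values from the thesis.
    """
    t = np.asarray(doy, dtype=float)
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * k / period
        cols += [np.cos(w * t), np.sin(w * t)]
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, np.asarray(vi, dtype=float), rcond=None)
    return coef


def predict_harmonic(coef, doy, period=365.0):
    """Evaluate the fitted harmonic series at the given days of year."""
    t = np.asarray(doy, dtype=float)
    n_harmonics = (len(coef) - 1) // 2
    out = np.full_like(t, coef[0])
    for k in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * k / period
        out += coef[2 * k - 1] * np.cos(w * t) + coef[2 * k] * np.sin(w * t)
    return out


def peak_and_lag_features(coef, season=(120, 300), lag_days=30):
    """Return (peak day of year, fitted peak VI, fitted VI lag_days after peak).

    The season window is a hypothetical search range for the peak.
    """
    days = np.arange(season[0], season[1] + 1)
    fitted = predict_harmonic(coef, days)
    peak_idx = int(np.argmax(fitted))
    peak_doy = int(days[peak_idx])
    vi_lag = float(predict_harmonic(coef, np.array([peak_doy + lag_days]))[0])
    return peak_doy, float(fitted[peak_idx]), vi_lag


# Synthetic, noise-free VI series peaking on day 210 (made-up data),
# sampled every 16 days to mimic an irregular satellite revisit.
obs_doy = np.arange(100, 320, 16)
obs_vi = 0.5 + 0.3 * np.cos(2.0 * np.pi * (obs_doy - 210) / 365.0)

coef = fit_harmonic(obs_doy, obs_vi)
peak_doy, peak_vi, vi_lag = peak_and_lag_features(coef)
print(peak_doy, round(peak_vi, 3), round(vi_lag, 3))
```

In a workflow like the one the abstract outlines, the two fitted VI values would then be combined with a weather covariate such as August rainfall as model inputs; how the thesis does this in detail is not specified in this record.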

Description

Type of resource	text
Date created	June 1, 2020

Creators/Contributors

Author	Dado, Walter T
Contributing author	Deines, Jillian M
Contributing author	Patel, Rinkal
Contributing author	Liang, Sang-Zi
Primary advisor	Lobell, David B

Subjects

Subject	School of Earth Energy & Environmental Sciences
Subject	Crop Yield Mapping
Subject	Remote Sensing
Subject	Machine Learning
Genre	Thesis

Bibliographic information

Location	https://purl.stanford.edu/gj825fq6518

Access conditions

Use and reproduction: User agrees that, where applicable, content will not be used to identify or to otherwise infringe the privacy or confidentiality rights of individuals. Content distributed via the Stanford Digital Repository may be subject to additional license and use restrictions applied by the depositor.
License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).

Preferred citation

Preferred Citation: Dado, Walter T and Deines, Jillian M and Patel, Rinkal and Liang, Sang-Zi and Lobell, David B. (2020). High-Resolution Soybean Yield Mapping Across the US Midwest Using Sub-field Harvester Data. Stanford Digital Repository. Available at: https://purl.stanford.edu/gj825fq6518

Collection

Master's Theses, Doerr School of Sustainability

Contact information

Contact: tekedado@icloud.com
