Model-based Principal Field Crop Estimates

Detailed information for August 31, 2023

Status: Active

Frequency: Annual

Record number: 5225

The model-based crop estimates provide provincial and national yield and production estimates for principal field crops in Canada. The model uses data from low-resolution satellite imagery, historical field crop survey estimates, and agroclimatic information.

Data release - September 14, 2023

Description

Since 2012-13, Statistics Canada has been collaborating with Agriculture and Agri-Food Canada (AAFC) and Environment Canada (EC) on a model which could derive crop yield estimates for principal crops grown in Canada. Statistics Canada recognised a potential opportunity to make new estimates available with these modelled yields. These estimates could eventually replace collected data and reduce the response burden on crop producers.

The modelled crop yield estimates are produced at the provincial and national levels for dissemination.

The modelled estimates will provide yield and production information for the month of August at an earlier date than the September Farm Survey, without adding to the response burden on crop producers.

These estimates provide important information for global food security, crop products markets, and planning for transporting crops from the farm to market.

Federal and provincial government agencies, grain marketing agencies, crop insurance companies, researchers and producers are typical users of the yield and production estimate information.

Subjects

Agriculture and food (formerly Agriculture)
Crops and horticulture

Data sources and methodology

Target population

The target population is the entire agricultural area of Québec, Ontario, Manitoba, Saskatchewan and Alberta.

Instrument design

This methodology does not apply.

Sampling

Data are collected for all units of the target population; therefore, no sampling is done.

Data sources

Data are extracted from administrative files and derived from other Statistics Canada surveys and/or other sources.

Three data sources are used as input variables for the crop models: 1) the Normalized Difference Vegetation Index (NDVI), derived from coarse resolution satellite data, 2) survey yield data, and 3) agroclimatic indices.

The weekly NDVI data are a product of Statistics Canada's Crop Condition Assessment Program (CCAP). The NDVI is a standardized index of vegetation health that allows the direct comparison of changing vegetation conditions within a time series. The mean NDVI value for an individual Census Agricultural Region (CAR) is computed by averaging all of the pixels within that CAR. The mean NDVI values are then imported into the crop models as one of the input variable databases, in the form of three-week moving averages from Julian week 18 to week 35 (May to August). For more information regarding the NDVI data, follow the link to the CCAP IMDB page in the Documentation section of this document.
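As an illustration only, the two averaging steps described above could be sketched in Python as follows. This is not the CCAP processing code. The function names, the dictionary layout and the use of a trailing window (each week averaged with the two preceding weeks) are assumptions; the source does not state how the three-week window is aligned.

```python
from statistics import mean


def car_mean_ndvi(pixels_by_car):
    """Average all pixel NDVI values that fall inside each CAR.

    pixels_by_car: {car_id: [ndvi_pixel_value, ...]} (hypothetical layout).
    """
    return {car: mean(values) for car, values in pixels_by_car.items()}


def three_week_moving_average(weekly_means, first_week=18, last_week=35):
    """Return {week: mean NDVI of that week and the two preceding weeks}.

    weekly_means: {week_number: mean NDVI for one CAR}.
    Assumption: a trailing window, so the first complete average
    is available at first_week + 2.
    """
    averages = {}
    for week in range(first_week + 2, last_week + 1):
        window = [weekly_means[w] for w in (week - 2, week - 1, week)]
        averages[week] = mean(window)
    return averages
```

With weekly means for weeks 18 to 35, this trailing-window choice yields sixteen smoothed values (weeks 20 to 35); a centred window would shift them by one week.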

The Field Crop Reporting Series collects information on grains and other field crops stored on farms (March, June, and December field crop surveys), seeded area (March, June, and November field crop surveys), and harvested area, expected yield and production of field crops (November field crop survey). The resulting estimates are based on sample surveys collected at four points throughout the year, principally via telephone interviews with farm operators. Historical data from the July Farm Survey and the current-year expected crop yield from the July model-based data are used as input variables for the model. The historical November survey crop yield estimates are used as the dependent variable in the model. Modelled production estimates are calculated by multiplying the harvested area from the July model-based estimates by the modelled crop yield estimate of August. For more information regarding the Field Crop Reporting Series surveys, follow the link to the IMDB page in the Documentation section of this document.
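The production calculation described above is a single multiplication, sketched below in Python for illustration. The function name and the units (hectares and tonnes per hectare) are assumptions, not taken from the source.

```python
def modelled_production(july_harvested_area, august_modelled_yield):
    """Production = harvested area x modelled yield.

    Assumed units: area in hectares, yield in tonnes per hectare,
    so the result is in tonnes. Returns None if either input is missing.
    """
    if july_harvested_area is None or august_modelled_yield is None:
        return None
    return july_harvested_area * august_modelled_yield
```

For example, a hypothetical harvested area of 1,000,000 hectares and a modelled yield of 3.2 tonnes per hectare would give about 3.2 million tonnes.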

The agroclimate information measured during the growing season is the third data source used for modelling crop yields. The station-based daily temperature and precipitation data provided by Environment and Climate Change Canada and other partner institutions are used to generate climate-based predictors. In total, approximately 478 climate stations across the crop land extent of Canada are selected to represent the climate of the 69 CARs and the 19 CDs for Saskatchewan. The quality control and gap-filling of the missing data are performed by AAFC.

The daily series of air temperature and precipitation are incorporated into a Versatile Soil Moisture Budget (VSMB) model by AAFC to generate agroclimatic indices used in the yield model. The VSMB model outputs are generated at a daily time step and used as potential yield predictors.

Average values of the indices at all stations within the cropland extent of a specific CAR are used to represent the mean agroclimate of that CAR. If a CAR lacks input climate data, stations from neighbouring CARs are used.
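The station averaging and the neighbour fallback described above could look roughly like the following Python sketch. It is illustrative only: the function name, the data layout, and the choice to pool all neighbouring stations with equal weight are assumptions, since the source does not say how neighbouring stations are combined.

```python
from statistics import mean


def car_agroclimate_mean(car, station_values, neighbours):
    """Mean of one agroclimatic index over the stations representing a CAR.

    station_values: {car_id: [index value at each station in the CAR's cropland]}
    neighbours:     {car_id: [neighbouring car_id, ...]}  (hypothetical lookup)
    Falls back to the pooled stations of neighbouring CARs when the CAR
    has no stations of its own; returns None if nothing is available.
    """
    values = station_values.get(car, [])
    if not values:
        values = [
            value
            for neighbour in neighbours.get(car, [])
            for value in station_values.get(neighbour, [])
        ]
    return mean(values) if values else None
```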

To form a manageable array of potential crop yield predictors, AAFC aggregated the daily agroclimatic indices which are included in the modelling methodology (Newlands et al. 2014 - http://journal.frontiersin.org/article/10.3389/fenvs.2014.00017/full; Chipanshi et al. 2015 - http://dx.doi.org/10.1016/j.agrformet.2015.03.007).

Error detection

After the modelled estimates have been generated, they are compared against the July yield estimates, although differences between the two sets of estimates are to be expected. Subject-matter experts also review the results to identify any estimates which seem questionable. External sources of field crop yield estimates are also used to identify possible errors.

Imputation

This methodology type does not apply to this statistical program.

Estimation

The modelled field crop yield estimates are calculated by a robust linear regression model (using the MM method in SAS) that uses data from 1987 to the present. The dependent variable is the historical November field crop survey estimate for crop yield. The independent variables come from the NDVI dataset, the historical July field crop survey estimate for crop yield, and historical agroclimatic data. A maximum of five independent variables are included in the model. They are selected by the LASSO (Least Absolute Shrinkage and Selection Operator) variable selection approach in SAS.
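The production system runs in SAS, and the Python below does not reproduce it. It is a minimal sketch of how LASSO shrinks some coefficients exactly to zero, here solved by cyclic coordinate descent, so that the surviving predictors form the selected set. The function names, the penalty grid, and the rule of raising the penalty until at most max_vars predictors remain are assumptions made for illustration; the robust MM regression that is then fitted is not shown.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; exactly 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(columns, response, penalty, sweeps=200):
    """Minimise (1/2n) * sum(residual^2) + penalty * sum(|beta|).

    columns:  list of predictor columns, each a list of n centred values.
    response: list of n centred values (e.g. historical November yields).
    """
    n = len(response)
    betas = [0.0] * len(columns)
    residual = list(response)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            scale = sum(x * x for x in col) / n
            if scale == 0.0:
                continue  # a constant column carries no information
            # Covariance of column j with the residual that excludes its own fit
            rho = sum(x * (r + betas[j] * x) for x, r in zip(col, residual)) / n
            new_beta = soft_threshold(rho, penalty) / scale
            delta = new_beta - betas[j]
            if delta != 0.0:
                residual = [r - delta * x for x, r in zip(col, residual)]
                betas[j] = new_beta
    return betas


def select_at_most(columns, response, max_vars=5, penalties=(0.0, 0.1, 0.5, 1.0)):
    """Try penalties in increasing order; return the first fit that keeps
    no more than max_vars predictors, as (penalty, kept_indices, betas)."""
    for penalty in sorted(penalties):
        betas = lasso_coordinate_descent(columns, response, penalty)
        kept = [j for j, beta in enumerate(betas) if beta != 0.0]
        if len(kept) <= max_vars:
            return penalty, kept, betas
    return None
```

The indices returned in kept_indices would then identify which candidate predictors (NDVI, July yield, agroclimatic indices) enter the regression.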

The crop yield estimates are modelled at the CAR level and aggregated to provincial and national levels for dissemination. The aggregation is accomplished by using seeded area from the June Farm Survey of the Field Crop Reporting Series to weight the contribution of individual CARs to the respective aggregated level. Certain crops that are less abundant in a province are modelled directly at the provincial level.
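The area-weighted aggregation described above can be sketched as follows. The Python is illustrative only; the function name and dictionary layout are assumptions, and the figures in the usage note are hypothetical.

```python
def area_weighted_yield(car_yields, car_seeded_areas):
    """Aggregate CAR-level yields to a higher level (province or national),
    weighting each CAR by its seeded area from the June Farm Survey.

    car_yields:       {car_id: modelled yield}
    car_seeded_areas: {car_id: seeded area}
    Returns None when the total seeded area is zero.
    """
    total_area = 0.0
    weighted_sum = 0.0
    for car, car_yield in car_yields.items():
        area = car_seeded_areas[car]
        total_area += area
        weighted_sum += area * car_yield
    if total_area == 0.0:
        return None
    return weighted_sum / total_area
```

For instance, two hypothetical CARs with yields of 3.0 and 2.0 and seeded areas of 100 and 300 would aggregate to (100 x 3.0 + 300 x 2.0) / 400 = 2.25.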

Quality evaluation

During the model development phase, the quality of the model was tested by using it to predict the historical November survey crop yield estimates and comparing the predictions to the actual values at the provincial and national levels. Based upon these observations, the optimal model was chosen. Model diagnostics were also run to confirm that its input data conformed to the properties required for the model.

A number of requirements are defined which must be met before the model's estimate is eligible for publication. If fewer than twelve years of July or November Farm Survey yield estimates are available, or if the July Farm Survey yield estimate or the June Farm Survey area estimate for the current year is missing for a CAR, then a yield estimate will not be produced for that CAR. At the provincial level, the total area suppressed at the CAR level is computed; if it is greater than 10% of the total area for that crop within that province, the yield estimate is suppressed at the provincial level. Similarly, if greater than 10% of the total area for that crop at the national level is suppressed, a yield estimate will not be produced for that crop at the national level.

Additionally, if the coefficient of variation (CV) calculated for the yield estimate of a province is greater than 35%, then the yield estimate is not published for that province.
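The publication rules above can be read as a small set of checks, sketched here in Python. The thresholds (twelve years, 10% of area, CV of 35%) come from the text; the function names, argument layout, and the assumption that an area figure is available for every CAR when computing the suppressed share are illustrative only.

```python
MIN_YEARS = 12                # minimum years of July and November yield history
MAX_SUPPRESSED_SHARE = 0.10   # maximum share of crop area that may be suppressed
MAX_CV_PERCENT = 35.0         # maximum coefficient of variation for a province


def car_is_eligible(years_july, years_november, july_yield, june_area):
    """A CAR gets a yield estimate only if both histories are long enough
    and the current-year July yield and June area are present."""
    return (
        years_july >= MIN_YEARS
        and years_november >= MIN_YEARS
        and july_yield is not None
        and june_area is not None
    )


def suppressed_share(cars):
    """cars: list of (area, eligible) pairs for one crop in one geography."""
    total = sum(area for area, _ in cars)
    if total == 0:
        return None
    return sum(area for area, eligible in cars if not eligible) / total


def province_is_publishable(cars, cv_percent):
    """Apply the 10% suppressed-area rule, then the 35% CV rule."""
    share = suppressed_share(cars)
    if share is None or share > MAX_SUPPRESSED_SHARE:
        return False
    return cv_percent <= MAX_CV_PERCENT
```

The same suppressed-share check could be applied to all CARs in the country for the national-level rule.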

Prior to release, the current year's modelled crop yield estimates are compared with those from the July field crop survey and subject-matter experts review the results to identify any estimates which seem questionable. External sources of field crop yield estimates are also used to identify questionable results.

Disclosure control

Statistics Canada is prohibited by law from releasing any information it collects that could identify any person, business, or organization, unless consent has been given by the respondent or as permitted by the Statistics Act. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.

To prevent any data disclosure, confidentiality analysis is done using the Statistics Canada Generalized Disclosure Control System (G-Confid). G-Confid is used for primary suppression (direct disclosure) as well as for secondary suppression (residual disclosure). Direct disclosure occurs when the value in a tabulation cell is composed of or dominated by a few enterprises, while residual disclosure occurs when confidential information can be derived indirectly by piecing together information from different sources or data series.

Revisions and seasonal adjustment