The Open Database of Greenhouses (ODG)

Detailed information for 2023/2024

Status:

Active

Frequency:

Occasional

Record number:

5443

To explore open data for official statistics and to support research, planning and policy across various domains, the Data Exploration and Integration Lab (DEIL) undertook a project to create an accessible and harmonized database of greenhouses within Canada.

Data release - August 18, 2025

Description

The Open Database of Greenhouses (ODG) is a collection of digitized greenhouses and their locations across Canada and is made available under the Open Government Licence - Canada.

The ODG is a derivative product of open data that leverages high- and medium-resolution earth observation imagery available from various sources, such as open data portals, partnerships, and agreements. These sources are controlled by municipal or provincial governments, or the imagery was provided by companies that hold an existing National Standing Offer with the Federal Government or by international space agencies. This project uses leading-edge methods, data integration and advanced technologies in an effort to reduce the response burden on agricultural operators. As part of these efforts, the ODG is used as a tool for new technologies created to automate the collection of greenhouse data across Canada.

Reference period: The reference period for these data varies. For more information on the reference period for a specific dataset, please consult the metadata documentation.

Collection period: May through October of each year in the reference period 2021-2024

Subjects

  • Agriculture and food (formerly Agriculture)
  • Crops and horticulture

Data sources and methodology

Target population

Statistics Canada defines a greenhouse and greenhouse products as a space for growing seedlings, potted plants, bedding plants, cuttings and other propagating material, vegetables and fruit grown for sale in a permanent, artificially heated enclosed structure made of plastic, plexiglass, poly-film or glass.

As a result of this definition, additional buildings which do not fit into the greenhouse definition, as outlined above, can possibly be included in the dataset based on their common visual characteristics. The database does not include linkages to business information, which would differentiate agricultural versus non-agricultural facilities.

Minimal editing and validation are applied to building shapes, through manual digitization and automated machine learning, in cases where buildings captured in the database have similar visual characteristics. The dataset identifies greenhouses without distinguishing their type, the crops grown inside, or any features that could aid in their classification. The database does not include linkages to business information or refer to Statistics Canada surveys, business registers, taxation data or other sources. This allows the database to remain open.

Instrument design

This methodology type does not apply to this statistical program.

Sampling

This methodology does not apply.

Data sources

Responding to this survey is voluntary.

Data are collected from other Statistics Canada surveys and/or other sources.

The creation of the ODG version 2.0 comprised two main processing steps: first, the processing of earth observation data and, second, the creation and formatting of the dataset overlaying the earth observation data and the mapping of the original dataset attributes to standard variable (column) names. The ODG version 2.0 has two separate methods of construction: manual digitization and automated machine learning.

Both methods used to compile the data into the final geographic shapefile database are outlined below:

Digitized
- Once acquired, earth observation data were extracted, uncompressed and converted to TIF format if not originally in this format.
- Sentinel-2 imagery was displayed in GIS software through web services, which removed the steps of downloading and processing the imagery, and new geographic feature classes were created for each region of interest.
- Greenhouses visually comparable to known greenhouses were identified in the earth observation imagery, and a new record was digitized within the feature class for each one.
- Where a greenhouse building was identified in ODG v 1.0 but was not visible in the ODG v 2.0 timeframe, the greenhouse polygon(s) were removed.
- Once a region was completed, its attribute tables were filled in with correct and up-to-date information.
- Once all the regions were covered, the feature classes were merged to form the final ODG file.
Super-resolved imagery
- The super-resolution model is a neural network with the following architecture:
-- 1x3x3 CGR layer followed by 1x3x3 DenseNet block
-- 1x1x1 CGR layer followed by multi-head attention layer
-- 1x1 CG layer followed by 3x3 DenseNet block
-- 1x1 CG layer followed by upsampling, followed by 1x1 CG layer, followed by 3x3 DenseNet block
-- 1x1 CGR layer followed by 1x1 convolutional layer and a sigmoid
- The architecture was based on a literature review (Wang et al., 2022; Dong et al., 2016; Kawulok et al., 2020; Fuoli et al., 2021).
- The first step when running the model is to download Sentinel-1 and Sentinel-2 imagery. The images are brought to a common projection and then provided to the super-resolution model to obtain super-resolved images of the desired sites.
- The super-resolved images are reprojected to the same projection as the manually digitized greenhouse shapefile.
Machine learning Detection
- The detection model is a U-Net architecture neural network for image segmentation, based on the paper by Ronneberger O. et al.
- The architecture was modified to use ResNet34 as the encoder portion to improve training stability.
- The model was trained using super-resolution imagery and manually digitized greenhouse labels.
- To run the detection, super-resolved imagery is first fed to the trained machine-learning greenhouse detection model.
- A shapefile of greenhouses is then obtained from the detections.
- For this release, false positives were removed by taking the intersection of the detected greenhouse shapefile and the known (manually digitized) greenhouse shapefile.
- In the future, post-processing steps, for example using an NDVI layer, will remove these false positives without relying on known greenhouse polygons.
- Lastly, the manually digitized greenhouse shapefile is merged with the machine learning detections.

The original data fields were the unique ID and Shape, identified automatically by the software. New fields were created to provide information on the imagery data source, method of collection, centroid X and Y location and province.
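
The false-positive filter and the centroid fields described above can be illustrated with a minimal Python sketch. It is not the ODG implementation. It assumes polygons are plain lists of (x, y) vertices, it replaces the true polygon intersection with a simpler bounding-box overlap test, and the function names, record keys and sample values (keep_confirmed, centroid_x, "ON" and so on) are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # ring of (x, y) vertices; the first vertex is not repeated


def bounding_box(poly: Polygon) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) for a polygon."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_overlap(a: Polygon, b: Polygon) -> bool:
    """Simplified stand-in for polygon intersection: do the bounding boxes overlap?"""
    ax0, ay0, ax1, ay1 = bounding_box(a)
    bx0, by0, bx1, by1 = bounding_box(b)
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


def keep_confirmed(detections: List[Polygon], known: List[Polygon]) -> List[Polygon]:
    """Keep only detections that overlap at least one known greenhouse."""
    return [d for d in detections if any(boxes_overlap(d, k) for k in known)]


def centroid(poly: Polygon) -> Point:
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3.0 * area2), cy / (3.0 * area2)


def attribute_record(poly: Polygon, source: str, method: str, province: str) -> Dict[str, object]:
    """Build the added attribute fields for one polygon (hypothetical key names)."""
    cx, cy = centroid(poly)
    return {"source": source, "method": method,
            "centroid_x": cx, "centroid_y": cy, "province": province}


# Illustrative data only: one known greenhouse and two detections.
known = [[(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]]
detections = [
    [(1.0, 1.0), (5.0, 1.0), (5.0, 3.0), (1.0, 3.0)],          # overlaps the known polygon
    [(10.0, 10.0), (12.0, 10.0), (12.0, 11.0), (10.0, 11.0)],  # isolated, treated as a false positive
]
confirmed = keep_confirmed(detections, known)
records = [attribute_record(p, "Sentinel-2", "machine learning", "ON") for p in confirmed]
```

In practice a GIS library would perform a true geometric intersection on projected coordinates; the bounding-box test is used here only to keep the sketch free of dependencies.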

Error detection

While effort was made to ensure all greenhouses were identified and other building types were not included, some buildings may be misidentified, or greenhouses could have been missed from the source image. Should any such errors be reported, they will be corrected in future versions of the ODG.

The data included in the ODG are based on visual inspection only and are not linked to official databases, surveys, or private sources.

Imputation

All greenhouses in the ODG version 2 were digitized with reference to ESA satellite imagery from within a certain date range. In general, other than processing and digitization of the features in the dataset, the imagery was used as is, so errors can arise in the final database where features could not be identified correctly. Given the nature of the data acquisition and the creation of the database, some errors may be present in the final geographic product.

Estimation

This methodology type does not apply to this statistical program.

Quality evaluation

Due to the different standards adopted in the original data, the steps taken to standardize the data may have introduced some errors. The key principles of the methodology used were the avoidance of false positives and of significant alterations to the data. The methodology and limitations of each technique are described below. Simple cleaning techniques, such as the removal of whitespace characters and punctuation, are omitted from the discussion.

For the machine learning methodology, the standardization process involved processing the labels and images to the same projection and ensuring alignment for training the machine learning model.

Statistics Canada's Annual Greenhouse, Sod and Nursery Survey (GSNA) collects information on greenhouse production, nursery stocks and sod produced in Canada and is frequently used to perform market trend analysis. Because the GSNA does not use information from this data source, and the ODG does not use data from the GSNA, it is unlikely that the information and total area for a province or region are comparable. The data are kept separate from each other to allow the ODG to be published and used by the public through the open data licence.

Disclosure control

This methodology does not apply.

Revisions and seasonal adjustment

This methodology type does not apply to this statistical program.

Data accuracy

Super Resolution Model
Quality assessment of the super-resolved predictions was done in the following manner:
- Looking at the mean absolute error (MAE) and root mean squared error (RMSE) for each band.
-- A sliding tolerance threshold was applied to MAE and RMSE: for each 100x100 window, the minimum MAE and RMSE over shifts of up to 8 pixels were considered.
-- This sliding window accepts images that are slightly shifted (for example, a constant shift of 5 pixels to the left) compared to the ground truth images.
-- Allowing such slight shifts greatly favors more detailed images. The model cannot be certain what the shift between the 10 m Sentinel images and the ground truth images is. Without allowing the shift, a blurry image without much enhancement is favored over a detailed image that is shifted by a few pixels.

- Computing the ratio of the MAE of the super-resolved images to the MAE of a baseline consisting of cubic upsampling of the one Sentinel-2 image most closely resembling the target image (with a similar ratio for RMSE). If this ratio is less than one, super-resolution is outperforming the baseline for the corresponding band. The same sliding window threshold was used for both the baseline and super-resolution.
- Visual inspection of the super-resolved results with input images and the high-resolution ground truth images.
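
The shift-tolerant error measure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used for the ODG: a single band of one window is a list of rows of floats, only the pixels that still overlap after a shift are compared, and the minimum MAE and minimum RMSE are taken independently over all shifts. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Image = List[List[float]]  # one band of one window, as rows of pixel values


def shifted_errors(pred: Image, truth: Image, dy: int, dx: int) -> Tuple[float, float]:
    """MAE and RMSE between truth and pred shifted by (dy, dx), over overlapping pixels."""
    h, w = len(truth), len(truth[0])
    abs_sum = sq_sum = 0.0
    count = 0
    for y in range(h):
        for x in range(w):
            py, px = y + dy, x + dx
            if 0 <= py < h and 0 <= px < w:
                diff = pred[py][px] - truth[y][x]
                abs_sum += abs(diff)
                sq_sum += diff * diff
                count += 1
    return abs_sum / count, math.sqrt(sq_sum / count)


def shift_tolerant_errors(pred: Image, truth: Image, max_shift: int = 8) -> Tuple[float, float]:
    """Minimum MAE and minimum RMSE over all shifts of up to max_shift pixels."""
    h, w = len(truth), len(truth[0])
    best_mae = best_rmse = math.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            if abs(dy) >= h or abs(dx) >= w:
                continue  # no overlapping pixels for this shift
            mae, rmse = shifted_errors(pred, truth, dy, dx)
            best_mae = min(best_mae, mae)
            best_rmse = min(best_rmse, rmse)
    return best_mae, best_rmse


def error_ratio(super_resolved_error: float, baseline_error: float) -> float:
    """Ratio below one means super-resolution beats the baseline for that band."""
    return super_resolved_error / baseline_error


# Illustrative 4x4 window: the prediction equals the truth shifted one pixel to the right.
truth = [[float(x + 10 * y) for x in range(4)] for y in range(4)]
pred = [[0.0] + row[:-1] for row in truth]

unshifted_mae, _ = shifted_errors(pred, truth, 0, 0)
tolerant_mae, tolerant_rmse = shift_tolerant_errors(pred, truth, max_shift=1)
```

In this toy case the unshifted comparison penalizes the prediction even though it is a perfect copy offset by one pixel, while the shift-tolerant measure finds the matching offset and reports zero error.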

Greenhouse Detection Model
There were 1,209 greenhouses in the dataset, of which 749 were used for training and 460 for testing. The greenhouse detection model achieved a pixel-wise F1 score of 80% in testing. Recall was 87%, meaning the model consistently identified true greenhouses, and precision was 75%. The object-wise F1 score was higher, at 91%. Most of the error in performance came from false positives. Upon closer inspection, the false positives occurred where errors in the super-resolved imagery appeared as shapes that were distinct from the surrounding land and had characteristics similar to greenhouses. There were also instances of buildings that became washed out in the super-resolution process and looked more like the roof of a greenhouse and, conversely, of greenhouses in the original imagery that became washed out in the super-resolution process. These errors in the super-resolved images contributed to errors in the greenhouse detection results.
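
For readers who want to relate the reported figures to one another, the following Python sketch shows how pixel-wise precision, recall and F1 are conventionally computed from binary masks. It assumes masks are lists of rows of 0/1 values; the function names are hypothetical and the sample masks are illustrative, not ODG data.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 1 = greenhouse pixel, 0 = background


def pixel_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives pixel by pixel."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Illustrative masks only.
pred_mask = [[1, 1, 0], [0, 1, 0]]
truth_mask = [[1, 0, 0], [0, 1, 1]]
precision, recall, f1 = precision_recall_f1(*pixel_counts(pred_mask, truth_mask))

# F1 implied by the rounded precision (75%) and recall (87%) quoted in the text.
reported_f1 = f1_score(0.75, 0.87)
```

The rounded precision and recall give an F1 of about 0.806, which is in line with the reported 80% given that the published inputs are themselves rounded.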
