Annual Environmental Protection Expenditures Survey (EPES)
Detailed information for 2019
The purpose of this survey is to obtain information on the expenditures made by industry to protect the environment in Canada. This information serves as an important indicator of Canadian investment in environmental protection.
Data release - March 28, 2022
The Annual Environmental Protection Expenditures Survey provides a measure of the costs incurred by Canadian industries to protect the environment, whether or not they are in response to current or anticipated Canadian or international environmental regulations, conventions or voluntary agreements. The survey also collects information on the goods, technologies and services purchased by industries as well as the processes and practices adopted by them to protect the environment.
Data from this survey are used by all levels of government for establishing informed environmental policies. The private sector also uses this information in the corporate decision-making process.
The survey is administered as part of the Integrated Business Statistics Program (IBSP). The IBSP has been designed to integrate approximately 200 separate business surveys into a single master survey program. The IBSP aims at collecting industry and product detail at the provincial level while minimizing overlap between different survey questionnaires. The redesigned business survey questionnaires have a consistent look, structure and content.
The integrated approach makes reporting easier for firms operating in different industries because they can provide similar information for each branch operation. This way they avoid having to respond to questionnaires that differ for each industry in terms of format, wording and even concepts. The combined results produce more coherent and accurate statistics on the economy.
Reference period: The calendar year or the 12-month fiscal period for which the final day occurs on or between April 1st of the reference year and March 31st of the following year.
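The fiscal-period rule above can be illustrated with a short sketch. The function name `reference_year` is hypothetical and is not part of any Statistics Canada system; it only restates the rule that a 12-month fiscal period ending between April 1 of year Y and March 31 of year Y+1 is assigned to reference year Y.

```python
from datetime import date


def reference_year(fiscal_year_end: date) -> int:
    """Return the reference year for a 12-month fiscal period.

    A period whose final day falls between April 1 of year Y and
    March 31 of year Y+1 (inclusive) belongs to reference year Y.
    """
    if fiscal_year_end.month >= 4:
        # April through December: the period ends in the reference year itself.
        return fiscal_year_end.year
    # January through March: the period belongs to the previous year.
    return fiscal_year_end.year - 1
```

Under this rule, a fiscal year ending on March 31, 2020 is reported for reference year 2019, as is a calendar year ending on December 31, 2019.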
Collection period: August through February of the year after the reference period.
- Environmental protection
Data sources and methodology
The target population includes establishments engaged in primary industries (resource extraction), manufacturing industries, the electric power generation, transmission and distribution industry, the natural gas distribution industry and the pipeline transportation industry.
The observed population comes from the Generic Survey Universe File (GSUF) created by Statistics Canada's Data Integration Infrastructure Division in April 2020. It contains all establishments in Canada existing in April 2020. From this frame, only establishments with the following North American Industry Classification System (NAICS) codes are retained: Manufacturing industries (31-33), Logging (except contract) (113311), Oil and gas extraction (211), Coal mining (2121), Metal ore mining (2122), Other non-metallic mineral mining and quarrying (21239), Shale, clay and refractory mineral mining and quarrying (212326), Electric power generation, transmission and distribution (2211), Natural gas distribution (2212) and Pipeline transportation (486).
For this survey, smaller establishments have been excluded from the target population. That is, within each industry group (three-digit NAICS code) by region, 10% of the smallest establishments in terms of the size measure (revenue) have been excluded from the population.
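One possible reading of this exclusion rule is sketched below. It assumes the 10% is taken by count of establishments within each group, rounded down. The source does not specify the rounding or whether the threshold is based on counts or on cumulative revenue. The field names `naics`, `region` and `revenue` and the function name are hypothetical.

```python
from collections import defaultdict
from math import floor


def drop_smallest(establishments, fraction=0.10):
    """Drop the lowest-revenue establishments in each group.

    Groups are (three-digit NAICS, region). The count-based cut-off and
    the rounding down are assumptions made for illustration only.
    """
    groups = defaultdict(list)
    for est in establishments:
        key = (est["naics"][:3], est["region"])
        groups[key].append(est)

    kept = []
    for members in groups.values():
        ordered = sorted(members, key=lambda e: e["revenue"])
        n_drop = floor(len(ordered) * fraction)
        kept.extend(ordered[n_drop:])  # keep everything above the cut-off
    return kept
```

With ten establishments in a group, this sketch removes the single smallest one; with five, none are removed because the count is rounded down.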
The questionnaire has undergone various transformations since its inception in 1994. The original material was developed by the Environmental Accounts and Statistics Division with assistance from Industry Canada. A pilot study involving a limited group of respondents was performed as an initial test of content and terminology. The basic expenditure questions remained relatively consistent up to 2016, while the survey was biennial, with the most significant changes involving the environmental management processes and technological materials (qualitative data).
The survey was redesigned for the reference year 2018 to harmonize the environmental protection activities with international classifications and extend the survey scope to include all expenditures made to protect the environment, whether or not they are in response to current or anticipated Canadian or international regulations, conventions or voluntary agreements. It introduces new questions on goods, technologies and services purchased by industries during the reference year, as well as questions on resource management. These changes were implemented in response to the new data requirements of Natural Resources Canada and Innovation, Science and Economic Development Canada.
Prior to its implementation, the new questionnaire was tested on survey respondents and other industry members to ensure that the concepts were clearly understood and that respondents were able to answer the questions correctly.
This is a sample survey with a cross-sectional design.
The Generic Survey Universe File (GSUF) was used as the survey frame. A stratified Bernoulli sample of establishments classified to the North American Industry Classification System (NAICS) Canada 2017 and to geographical regions was selected.
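In Bernoulli sampling, each unit in a stratum is selected independently with the same probability, so the realized sample size is random. The following is a minimal sketch of that idea, not the production sampling system. The stratum labels, selection rates and field names are hypothetical.

```python
import random


def stratified_bernoulli_sample(frame, rates, seed=None):
    """Select each unit independently with its stratum's probability.

    frame: list of dicts, each with a "stratum" key.
    rates: dict mapping stratum label to selection probability.
    Each selected unit gets a design weight of 1 / probability.
    """
    rng = random.Random(seed)
    sample = []
    for unit in frame:
        p = rates[unit["stratum"]]
        # rng.random() is in [0, 1), so p = 1.0 always selects the unit.
        if rng.random() < p:
            sample.append({**unit, "design_weight": 1.0 / p})
    return sample
```

A stratum with a rate of 1.0 behaves as a take-all stratum: every unit is selected and carries a weight of one.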
Data collection for this reference period: 2020-09-15 to 2021-04-30
Responding to this survey is mandatory.
Data are collected directly from survey respondents.
Data are collected using English and French electronic questionnaires. Respondents are contacted by email or letter and given a secured access code and password for the electronic questionnaire for the survey. The electronic questionnaires are addressed to a contact person who is either responsible for, or has knowledge of, the environment-related operations of the firm.
Telephone follow-up is used to obtain data from establishments that return incomplete questionnaires or that fail to respond.
View the Questionnaire(s) and reporting guide(s).
Many factors affect the accuracy of data produced in a survey. For example, respondents may make errors in interpreting questions, answers may be incorrectly entered on the questionnaires, and errors may be introduced during the data capture or tabulation process. Every effort is made to reduce the occurrence of such errors in the survey.
The electronic questionnaire contains edits to help respondents correct for inconsistencies (e.g., a percentage summation variable does not equal 100%).
For returned data the system verifies that all mandatory cells have been filled in, that certain values are within acceptable ranges, that questionnaire flow patterns have been respected, that percentages are converted to dollars, that certain transformed and derived variables are assigned values, and that blanks are set to zeroes where appropriate. Consistency edit rules are performed on the data for each usable record. These rules ensure that all the variables have valid responses and are complete and coherent both within the questionnaire and across questionnaires.
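The kinds of checks described above can be sketched as a small validation function. This is an illustration only: the actual edit rules are not listed in this document, and the field names, the tolerance and the function name `check_record` are hypothetical.

```python
def check_record(record, pct_fields, mandatory_fields, tol=0.01):
    """Return a list of edit-failure messages for one questionnaire record."""
    failures = []

    # Mandatory cells must be filled in.
    for field in mandatory_fields:
        if record.get(field) is None:
            failures.append(f"missing mandatory field: {field}")

    # Blank percentage cells are treated as zero before checking.
    pct_values = [record.get(f) or 0 for f in pct_fields]

    # Each percentage must fall within an acceptable range.
    if any(v < 0 or v > 100 for v in pct_values):
        failures.append("percentage outside 0-100")

    # The percentage breakdown must sum to 100.
    if abs(sum(pct_values) - 100) > tol:
        failures.append("percentages do not sum to 100")

    return failures
```

A record that passes every check returns an empty list. A record with a blank mandatory cell and a percentage breakdown totalling 40 returns two messages.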
Further data checking is performed by subject matter officers who research companies (annual reports, web sites, etc.) in an effort to verify information submitted by respondents.
Outliers are identified after collection, outside of the Integrated Business Statistics Program, and are removed from the imputation process.
Statistical imputation is used for total non-response and partial non-response records. Six methods of imputation are used:
- deterministic imputation (there is only one possible value for the field to impute);
- historical imputation (when available);
- imputation by ratio;
- imputation by mean;
- donor imputation (using a nearest neighbour approach to find, for each record requiring imputation, the valid record that is most similar to it);
- manual imputation.
The criteria used for ratio, mean and donor imputation are various combinations of industry group, establishment size and geographical location (province, region, or Canada). Statistics Canada's generalized edit and imputation system (Banff) is used for this process. Usually, key variables are imputed first and are used as anchors in subsequent steps to impute other related variables.
Imputation generates a complete and coherent microdata file that covers all survey variables.
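Three of the methods listed above (mean, ratio and nearest-neighbour donor imputation) can be sketched in a few lines. This is a simplified illustration within a single imputation class, not the Banff implementation. The function names are hypothetical, `None` marks a missing value, and `x` stands for an auxiliary variable such as revenue that is known for every record.

```python
def impute_mean(values):
    """Replace missing values with the mean of the reported values."""
    reported = [v for v in values if v is not None]
    mean = sum(reported) / len(reported)
    return [mean if v is None else v for v in values]


def impute_ratio(y, x):
    """Replace missing y with R * x, where R = sum(y) / sum(x) over respondents."""
    pairs = [(yi, xi) for yi, xi in zip(y, x) if yi is not None]
    ratio = sum(yi for yi, _ in pairs) / sum(xi for _, xi in pairs)
    return [ratio * xi if yi is None else yi for yi, xi in zip(y, x)]


def impute_donor(y, x):
    """Replace missing y with the y of the respondent whose x is closest."""
    donors = [(xi, yi) for yi, xi in zip(y, x) if yi is not None]
    result = []
    for yi, xi in zip(y, x):
        if yi is None:
            # Nearest neighbour on the auxiliary variable.
            _, yi = min(donors, key=lambda d: abs(d[0] - xi))
        result.append(yi)
    return result
```

For example, with reported values 20 and 40 for units whose auxiliary values are 10 and 20, the ratio is 2, so a non-respondent with an auxiliary value of 50 would be imputed as 100 under the ratio method.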
The Generalized Estimation System (G-Est) developed at Statistics Canada is used to produce domain estimates and quality indicators. It is a SAS based application for producing estimates (totals, ratios, percentages) for domains of a population based on a sample.
An initial sampling weight (the design weight) is calculated for each unit in the survey and is the inverse of the probability of selection. The weight calculated for each sampling unit indicates how many other units it represents. Sampling units which are selected with certainty (must-take units) have sampling weights of one and only represent themselves; outlier units with larger than expected size are seen as misclassified and their weight is usually adjusted so that they only represent themselves, and the weights of other units are adjusted accordingly to take into account the existence of outliers. The final weights are usually either one or greater than one.
Estimates are computed at several levels of interest such as the North American Industry Classification System code and region or province, based on the most recent classification information for the statistical entity and the survey reference period.
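A weighted domain total is the sum, over sampled units in the domain, of each unit's final weight multiplied by its reported value. The sketch below shows that calculation. It is not G-Est, and the record layout and function name are hypothetical.

```python
from collections import defaultdict


def weighted_domain_totals(records, domain_key, value_key, weight_key="weight"):
    """Estimate a total for each domain as the sum of weight * value."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[domain_key]] += rec[weight_key] * rec[value_key]
    return dict(totals)


# Hypothetical records: one must-take unit (weight 1) and three sampled units.
records = [
    {"naics": "211", "weight": 1.0, "expenditure": 100.0},
    {"naics": "211", "weight": 4.0, "expenditure": 10.0},
    {"naics": "211", "weight": 4.0, "expenditure": 20.0},
    {"naics": "486", "weight": 2.0, "expenditure": 5.0},
]
totals = weighted_domain_totals(records, "naics", "expenditure")
```

Here the must-take unit contributes only its own value, while each unit with a weight of 4 stands in for four units in the population.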
Data evaluation and error detection during data collection are important ways to ensure that the final estimates are of good quality. Post-collection, the survey results and estimates are evaluated as a further check on data quality. One way to assess data quality is to compare the data to the trends of other data collected. For example, in comparing the environmental protection expenditures with those of the previous cycle, some industry groups would show increases in expenditures, while others would show a decrease. Increased expenditures would be expected in industry groups that were subject to new environmental regulations or that had experienced a general increase in capital investments. On the other hand, some industries would have made large expenditures in the past to comply with regulations that came into effect before the reference year, and therefore would not be reporting as much in the current year. Instead, a shift in expenditures from capital to operating may be noticed, as firms now pay for the operation and maintenance of previously implemented processes and equipment.
Statistics Canada is prohibited by law from releasing any information it collects that could identify any person, business, or organization, unless consent has been given by the respondent or as permitted by the Statistics Act. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential.
Statistics Canada typically uses suppression techniques to protect sensitive statistical information. Statistics Canada's generalized G-Confid system uses these techniques, which involve suppressing data points that can directly or indirectly reveal information about a respondent. This can often lead to the suppression of a large number of data points and significantly reduce the amount of available data.
The random tabular adjustment (RTA) technique was introduced in 2019 for estimates expressed in total amounts. This alternative technique aims to increase the amount of data made available to users while protecting the confidentiality of respondents.
Rather than using suppression techniques, RTA changes estimates by a random amount and adds a degree of uncertainty to the accuracy of the estimate to prevent the disclosure of individual values. As a result, estimates that could disclose an individual's response are not released. Note that if the adjusted estimates are part of a table with totals or sub-totals, the related total and sub-total estimates will also be adjusted.
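The general idea of adjusting an estimate by a random amount and carrying the change through to the table total can be sketched as follows. This is a toy illustration and not the RTA algorithm. The uniform noise, the 10% bound, the notion of a pre-identified set of sensitive cells and all names are assumptions made for the example.

```python
import random


def adjust_table(cells, sensitive, max_rel_noise=0.10, seed=None):
    """Perturb sensitive cells by a random amount and recompute the total.

    cells: dict mapping cell label to estimate.
    sensitive: labels of the cells to adjust (assumed to be known already).
    The uniform noise within +/- max_rel_noise is an illustrative choice.
    """
    rng = random.Random(seed)
    adjusted = dict(cells)
    for label in sensitive:
        factor = 1 + rng.uniform(-max_rel_noise, max_rel_noise)
        adjusted[label] = cells[label] * factor
    # The total is rebuilt from the adjusted cells so the table stays additive.
    adjusted_total = sum(adjusted.values())
    return adjusted, adjusted_total
```

Because the total is recomputed from the adjusted cells, a change to one cell is also reflected in the total, in line with the note above on totals and sub-totals.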
For more information on RTA, please refer to the blog article "Random Tabular Adjustment is here!" available as part of the StatCan Blog (www.statcan.gc.ca/en/blog/cs/rta).
Both G-Confid and RTA techniques are used for 2019, the first for percentage estimates and the second for total estimates.
Revisions and seasonal adjustment
Revisions are made for the previous survey reference period, with the initial release of the current data, as required. The purpose is to address any significant issues with the data that were found between survey cycles. The actual period of revision depends on the nature of the issue, but rarely exceeds three years.
Since the reference year 2018 is the first cycle of the redesigned survey, no revision will be done on the data of the previous cycle. The tables 38-10-0042-01 to 38-10-0046-01, 38-10-0004-01, 38-10-0005-01, 38-10-0008-01 and 38-10-0120-01 (referring to years 2006 to 2016) are now terminated. From reference year 2018, new tables will be created.
The data are not seasonally adjusted.
The accuracy of data collected in a sample survey is affected by both sampling and non-sampling errors. Sampling errors arise from the fact that the information obtained from a sample of the population is applied to the entire population. The sampling method, the estimation method, the sample size and the variability associated with each measured variable determine the sampling error.
Non-sampling errors arise from coverage error, data response error, non-response error, and processing errors. Every effort is made to reduce these types of errors, including verification of keyed data, consistency and validity edits, follow-up for non-response and consultation with government departments and industry associations.
Data response error may be due to questionnaire design, the characteristics of a question, inability or unwillingness of the respondent to provide correct information, misinterpretation of the questions or definitional problems. These errors are controlled through careful questionnaire design and testing and the use of simple concepts and consistency checks.
Processing errors may occur at various stages of processing such as data entry, editing and tabulation. Measures have been taken to minimize these errors.
Non-response error results when respondents refuse to answer, are unable to respond or are too late in reporting. Missing data items are imputed for partial non-responses (i.e., when only some questions are left unanswered) and total non-response (i.e., when all questions from the survey are left unanswered).
Quality indicators are derived to take into account the variability due to sampling, non-response and RTA, when applied. On the data tables, quality indicators are as follows:
A = excellent
B = very good
C = good
D = acceptable
E = use with caution
F = too unreliable to be published
X = suppressed to meet the confidentiality requirements of the Statistics Act