The Survey of Industrial Processes (SIP) is an industry-specific business survey focusing on small- and medium-sized enterprises (SMEs). It is designed to link economic data with industrial processes and environmental outcomes. The SIP collects data on operational activities and engineering processes of industrial, manufacturing, and service-oriented establishments.

The 2009 SIP pilot focused on retail gasoline outlets across Canada, which include retail gas stations and marinas with gas docks. The survey collected data on the activities, processes, and equipment of these establishments, including quantities of gasoline sold, gasoline truck delivery frequencies, number of gasoline storage tanks, number of gas dispensers, age of site, and several other variables. The survey was designed to collect economic and process data for estimating gasoline evaporative losses.

In conjunction with related Statistics Canada surveys and administrative records, the SIP pilot was administered to explore potential linkage between economic and environmental data. Results from this pilot will be evaluated to examine the feasibility of collecting process-related data from such establishments for use in estimating environmental outcomes. For example, data from this pilot will be used to estimate evaporative losses from retail gasoline outlets across Canada.


Data sources and methodology

Target population

The population consisted of establishments primarily engaged in retailing gasoline fuels, including marinas, whether or not operated in conjunction with a convenience store, repair garage, restaurant or other type of operation. Diesel-only stations and card-locks were excluded. The initial frame was derived from Statistics Canada's Business Register using the following NAICS codes: 447110 (retail gas stations), 447190 (other retail gas stations) and 713930 (marinas).

Instrument design

The questionnaire was developed with input and review from a technical advisory group consisting of professional engineers, academics, and industry experts in the subject matter area. Several retail gas station owners and operators were consulted during content development. Questionnaire content review and cognitive testing were conducted through the Questionnaire Design Resource Centre (QDRC) of Statistics Canada with direct face-to-face input and reviews from over a dozen retail gasoline outlet owners and operators.


Sampling

This is a sample survey with a cross-sectional design.

The population frame was stratified by several criteria, including but not limited to size, banner and brand name, Census Metropolitan Area, and proximity to highways and lakes. The survey employed a two-stage sampling design; the first stage was used to frame the population and develop the stratification variables. The size of the frame was estimated after a pre-contact sampling campaign. In this stage, 50% of all locations from Statistics Canada's initial 20,000 Business Register records were selected based on predefined criteria of brand name and size. All marinas and most small independently owned and operated retail gas stations were included in this first-stage sample. These locations received telephone calls to validate and confirm their addresses and business activities.

Following the first-stage sampling campaign, the population was stratified into six groups based on brand names, size of the operations, and revenues. The six groupings were: 1) small independent retail gas stations that do not belong to any brand name, 2) small retail gas stations that are associated with a brand name, 3) retail gas stations that belong to a banner of regional brand names with multiple outlets, 4) retail gas stations that belong to a banner of national brand names with multiple outlets, 5) all marinas with gas docks, and 6) a residual stratum to include those outlets that could not be assigned to any of the above.

Groups 1 and 5 were surveyed as a census. Groups 2, 3, 4, and 6 were surveyed as a random sample with an assigned probability of selection. These probabilities of selection were used to estimate respective population parameters.
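
The mix of census and sampled groups described above can be sketched in a few lines of Python. This is only an illustration: the selection probabilities, the function name select_sample, and the use of simple random sampling within each group are assumptions, since the actual probabilities and selection procedure are not given here. Census groups keep every unit with a weight of 1, and sampled groups receive a design weight equal to the inverse of the realized sampling fraction.

```python
import random

# Hypothetical selection probabilities per group; groups 1 and 5 are a census.
# The actual probabilities used in the SIP pilot are not stated in this document.
SELECTION_PROB = {1: 1.0, 2: 0.5, 3: 0.25, 4: 0.25, 5: 1.0, 6: 0.5}


def select_sample(frame, probs=SELECTION_PROB, seed=0):
    """Return (unit_id, group, design_weight) for each selected unit.

    `frame` is a list of (unit_id, group) pairs.
    """
    rng = random.Random(seed)

    # Collect the unit ids that fall in each group.
    by_group = {}
    for unit_id, group in frame:
        by_group.setdefault(group, []).append(unit_id)

    selected = []
    for group, units in sorted(by_group.items()):
        p = probs[group]
        if p >= 1.0:
            chosen = list(units)  # census group: take every unit
        else:
            n = max(1, round(p * len(units)))
            chosen = rng.sample(units, n)  # simple random sample, no replacement
        # Design weight is the inverse of the realized sampling fraction.
        weight = len(units) / len(chosen)
        selected.extend((unit_id, group, weight) for unit_id in chosen)
    return selected
```

With this weighting, the weights of the selected units in each group sum to the number of units on the frame for that group, which is what allows sample values to be scaled up to population estimates.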

Data sources

Data collection for this reference period: 2010-01-15 to 2010-03-15

Responding to this survey is mandatory.

Data are collected directly from survey respondents.

Data collected included the types and age of equipment, engineering processes, and the quantity/types of fuels dispensed for the reference period January 1, 2009 to December 31, 2009. Survey data were collected via a standard mail-out mail-back questionnaire. Data from returned questionnaires were keyed into an internal Statistics Canada data capture application. Quantity data, activity data, equipment data, process and operating procedure data were all collected directly from survey respondents. Establishments that did not respond, or that responded only partially, were contacted by collection staff, who assisted them in completing the questionnaire.

Error detection

No survey is error free. Many factors contribute to errors and biases. For example, respondents may have made errors in interpreting questions, answers may have been incorrectly entered on the questionnaires, and errors may have been introduced during the data capture or tabulation processes.

Returned questionnaires were checked for errors. Data were captured using a text recognition program. Flags generated by the automated capture algorithm were manually reviewed and corrected. Once all data were captured, an application was used to check that all mandatory cells had been filled in, that certain values were within acceptable ranges, that questionnaire flow patterns had been respected, and that totals equalled the sum of their components. Collection staff evaluated the edit failures and concentrated follow-up efforts accordingly. Consistency edit rules were applied to the data for each usable record. These rules ensured that all the variables had valid responses and were complete and coherent both within the questionnaire and across questionnaires.
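
Three of the checks described above (mandatory cells, acceptable ranges, and totals equal to the sum of their components) can be illustrated with a short sketch. The function name run_edit_checks, the field names, and the tolerance are hypothetical and are not taken from the application Statistics Canada used; flow-pattern checks are omitted because they depend on the questionnaire layout.

```python
def run_edit_checks(record, mandatory, ranges, totals, tolerance=1e-6):
    """Return a list of edit-failure messages for one captured record.

    record    : dict mapping field name -> value (None if left blank)
    mandatory : list of field names that must be filled in
    ranges    : dict mapping field name -> (low, high) inclusive bounds
    totals    : dict mapping a total field -> list of its component fields
    """
    failures = []

    # Mandatory cells must not be blank.
    for field in mandatory:
        if record.get(field) is None:
            failures.append(f"missing: {field}")

    # Reported values must fall within the acceptable range.
    for field, (low, high) in ranges.items():
        value = record.get(field)
        if value is not None and not (low <= value <= high):
            failures.append(f"out of range: {field}")

    # Totals must equal the sum of their components.
    for total_field, parts in totals.items():
        total = record.get(total_field)
        values = [record.get(part) for part in parts]
        if total is None or any(v is None for v in values):
            continue  # blanks are handled by the mandatory check
        if abs(total - sum(values)) > tolerance:
            failures.append(f"total mismatch: {total_field}")

    return failures
```

A record that returns an empty list passes these edits; any other record would be routed to collection staff for follow-up.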

Data checking was also performed by subject matter staff who conducted more in-depth research through external sources, data confrontation, and administrative records for some establishments in an effort to verify the information submitted. Outliers were identified and were controlled for during the imputation process.


Imputation

Four methods of statistical imputation were used for missing data in partially completed questionnaires: deterministic imputation (only one possible value for the field to impute); imputation by ratio within or in-between the records of respective establishments (for example, using the quantity of gasoline sold as a surrogate for total gasoline sales); donor imputation (using a "nearest neighbour" approach to find a valid record that is similar to the record requiring imputation); and manual imputation (use of expert judgement). Ratios were calculated and donors were selected for imputation purposes based on the most appropriate strata, that is, homogeneous groupings within one geographic area.
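
Two of these methods, ratio imputation and nearest-neighbour donor imputation, lend themselves to a brief sketch. The function names, field names, and the unscaled Euclidean distance are assumptions made for illustration; the matching variables and distance measure actually used are not described here.

```python
def ratio_impute(aux_value, complete_pairs):
    """Impute a missing value y from an auxiliary value x by ratio.

    complete_pairs : list of (x, y) pairs from complete records in the
                     same imputation group.
    The imputed value is x * (sum of y) / (sum of x).
    """
    sum_x = sum(x for x, _ in complete_pairs)
    sum_y = sum(y for _, y in complete_pairs)
    if sum_x == 0:
        raise ValueError("auxiliary values sum to zero; ratio is undefined")
    return aux_value * sum_y / sum_x


def nearest_neighbour_donor(recipient, candidates, match_fields):
    """Return the candidate record closest to the recipient.

    Distance is plain Euclidean distance over `match_fields`. In practice
    the fields would usually be rescaled first so that no single variable
    dominates the distance.
    """
    def distance(candidate):
        return sum(
            (candidate[f] - recipient[f]) ** 2 for f in match_fields
        ) ** 0.5

    return min(candidates, key=distance)
```

For instance, if complete records in a group report 200 units of x against 400 units of y, a record with x = 100 and y missing would be imputed y = 200. The donor function would be called with candidates restricted to the same stratum and geographic area as the recipient.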


Estimation

Estimates for the target population were calculated by multiplying the response values for the sampled units by their sampling weights. The groups that were part of a census had a weight close to 1. For groups that were sampled, weights were calculated using standardized statistical methodologies such as probability allocation, generalised weight share adjustment, and post-stratification methods.
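
As a rough illustration of weighted estimation, the sketch below computes a population total as the sum of value times weight, and applies a basic post-stratification adjustment that rescales weights so they add up to a known count in each post-stratum. The function names, dictionary keys, and example counts are hypothetical; the generalised weight share adjustment mentioned above is not shown.

```python
def poststratify(units, known_counts):
    """Rescale weights so they sum to the known count in each post-stratum.

    units        : list of dicts with keys "stratum", "weight", "value"
    known_counts : dict mapping stratum -> known number of population units
    Returns a new list; the input records are not modified.
    """
    weight_sums = {}
    for unit in units:
        stratum = unit["stratum"]
        weight_sums[stratum] = weight_sums.get(stratum, 0.0) + unit["weight"]

    adjusted = []
    for unit in units:
        stratum = unit["stratum"]
        factor = known_counts[stratum] / weight_sums[stratum]
        adjusted.append({**unit, "weight": unit["weight"] * factor})
    return adjusted


def estimate_total(units):
    """Weighted estimate of a population total: sum of value * weight."""
    return sum(unit["value"] * unit["weight"] for unit in units)
```

For example, two responding units with weight 2 in a post-stratum known to contain 6 outlets would each be adjusted to weight 3, so that the weighted count matches the known population count before totals are computed.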

Quality evaluation

The quality of the data was evaluated and cross-validated using a number of Statistics Canada's surveys and related industry data and administrative records. Data on volume and types of fuel sold, employment numbers, and number of retail gas stations across Canada were compared with data from the following sources: the Survey of Employment, Payrolls and Hours; the Refined Petroleum Products Survey; the Canadian Vehicle Survey; the Monthly Retail Trade Survey; the Road Motor Vehicle Survey - Fuel; Statistics Canada's Business Register; Excise Fuel Sales Data; the Census of Population; MJ Ervin and Associates; Kent Marketing Services Limited; and administrative records from several major brand name retail gasoline outlets across Canada.

Disclosure control

Statistics Canada is prohibited by law from releasing any information it collects which could identify any person, business, or organization, unless consent has been given by the respondent or as permitted by the Statistics Act. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.

Revisions and seasonal adjustment

Data accuracy

Approximately 5,000 establishments received the questionnaire by mail. About 3,200 complete or partially complete responses were processed. The response rate for the survey was 68%.

Sampling error

The accuracy of data collected in a survey is affected by both sampling and non-sampling errors. Sampling errors arise because information obtained from a sample of the population is applied to the entire population. The accuracy of the data representing the population, the sampling method, the sample size, the imputation methods, the weighting scheme, and the variability associated with each reported and estimated variable are potential sources of error that affect the quality and accuracy of the estimates.

The coefficient of variation (CV) is used to quantify sampling error. The CV is the standard error of an estimate expressed as a percentage of that estimate, and it indicates the range within which the true value of a variable is likely to fall. As an example, if the total number of retail gasoline outlets across Canada is estimated to be 11,262 outlets with a CV of 5%, then the true number of retail gasoline outlets across Canada would be within the range of 11,262 ± 5% (one standard error, approximately 68% statistical confidence) or 11,262 ± 10% (two standard errors, approximately 95% statistical confidence).
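
The arithmetic of the example above can be reproduced with a small helper. It treats the CV as the standard error expressed as a percentage of the estimate, which is the standard definition; the function name cv_interval is hypothetical.

```python
def cv_interval(estimate, cv_percent, n_std_errors=1):
    """Return the (lower, upper) range implied by a CV.

    The standard error is estimate * CV / 100; the range extends
    `n_std_errors` standard errors on either side of the estimate.
    """
    standard_error = estimate * cv_percent / 100.0
    half_width = n_std_errors * standard_error
    return estimate - half_width, estimate + half_width


# Example from the text: 11,262 outlets with a CV of 5%.
one_se = cv_interval(11262, 5, n_std_errors=1)  # about 10,698.9 to 11,825.1
two_se = cv_interval(11262, 5, n_std_errors=2)  # about 10,135.8 to 12,388.2
```

In other words, a 5% CV on 11,262 outlets corresponds to a standard error of about 563 outlets.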

The following classification was used to qualify the CVs of each reported estimate: Class A = Excellent (0.00% to 4.99%); Class B = Very good (5.00% to 9.99%); Class C = Good (10.00% to 14.99%); Class D = Acceptable (15.00% to 24.99%); Class E = Use with caution (25.00% to 49.99%); Class F = Too unreliable to be published (50.00% and more); and Class X = Suppressed to meet the confidentiality requirements of the Statistics Act.
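
The classification above maps directly onto a simple lookup. The sketch below uses the published class boundaries; the function name cv_quality_class is hypothetical. Class X is not included because it reflects confidentiality suppression rather than the size of the CV.

```python
def cv_quality_class(cv_percent):
    """Return the quality class (A to F) for a CV given in percent."""
    if cv_percent < 0:
        raise ValueError("a CV cannot be negative")

    # (exclusive upper bound in percent, class letter), in ascending order
    bounds = [
        (5.0, "A"),   # Excellent: 0.00% to 4.99%
        (10.0, "B"),  # Very good: 5.00% to 9.99%
        (15.0, "C"),  # Good: 10.00% to 14.99%
        (25.0, "D"),  # Acceptable: 15.00% to 24.99%
        (50.0, "E"),  # Use with caution: 25.00% to 49.99%
    ]
    for upper, label in bounds:
        if cv_percent < upper:
            return label
    return "F"  # Too unreliable to be published: 50.00% and more
```

Under this scheme, the 5% CV used in the earlier example would fall in Class B.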

For details on the CVs of key variables published please view "Reported Estimates with Corresponding Coefficient of Variation", available in the Documentation section below.

Non-sampling errors

Non-sampling errors arise from coverage errors, response errors, non-response errors, and processing errors. To reduce coverage errors, a pre-contact campaign (calling each establishment to confirm business activity and address) was conducted on a large sample of retail gas stations. To reduce response error, a post-contact follow-up campaign was conducted with all respondents who partially completed their surveys.

Response error may be due to questionnaire design, the characteristics of an individual question, the inability or unwillingness of the respondent to provide correct information, misinterpretation of the questions or definitional problems. These errors were controlled through questionnaire cognitive testing in which over 20 retail gas station owners/operators participated.

To further reduce response errors, a sample (20%) of the completed questionnaires was manually verified one by one for consistency and coherence of data, both before and after imputation.

To prevent small retail gasoline outlets from representing larger ones, and vice versa, weight adjustments took into consideration outlets with similar characteristics in quantity of gasoline sold, number and size of tanks, and geographical area.

Partial non-response records (those in which respondents left some questions unanswered) were imputed using donors with similar characteristics.


