Retail Commodity Survey (RCS)
Detailed information for October 2020
This survey collects detailed information about retail commodity sales in Canada to produce estimates of the sales of various commodities at the national level, for different types of retail outlets in Canada.
Data release - January 12, 2021
The Retail Commodity Survey (RCS) collects detailed information about retail commodity sales in Canada. The objective is to produce estimates of the sales of various commodities at the national level, for different types of retail outlets in Canada. The survey is a complement to the Monthly Retail Trade Survey (MRTS - record number 2406). MRTS gathers total monthly retail sales, while RCS collects a breakdown of these sales by commodity type.
The information provided by RCS can be used to track commodity sales within and across various types of retail stores, as well as to calculate commodity market share, and to gain a better understanding of the rapidly changing retail industry. The data show the type of outlets where consumers prefer to buy certain commodities, and the shifts in the different types of commodities retailers decide to sell. Analysis of these data assists in establishing trends in commodity sales over time.
The RCS data are used by Statistics Canada's System of National Accounts for the estimates of personal expenditure. Other users of the data include federal and provincial government departments, retail analysts, market researchers, industry experts and independent consultants.
Reference period: Monthly
Collection period: the month following the reporting period.
- Retail and wholesale
- Retail sales by type of product
Data sources and methodology
The Retail Commodity Survey (RCS) has the same target population as the Monthly Retail Trade Survey (MRTS). The MRTS target population consists of all statistical establishments on Statistics Canada's Business Register (BR) that are classified to the retail sector using NAICS 2017. The NAICS code range for the retail sector is 441100 to 454110.
The following are excluded from the target population: ancillary establishments (establishments that provide services in support of the production of goods and services for the market by more than one establishment within the enterprise, and that serve as a cost centre or a discretionary expense centre for which the business can report data on all costs, including labour and depreciation), future establishments, establishments with a missing or zero gross business income value on the BR, and establishments in the following non-covered NAICS codes:
- 4542 (vending machine operators)
- 45431 (fuel dealers)
- 45439 (other direct selling establishments)
The questionnaires were developed at Statistics Canada and were reviewed and tested in the field in both official languages. In the course of developing the survey, Statistics Canada consulted with a number of retailers as well as with industry associations. The questionnaire underwent significant changes in 2016 with the move to an electronic questionnaire and to the North American Product Classification System (NAPCS).
This is a sample survey with a cross-sectional design.
The Retail Commodity Survey (RCS) sample contains all of the retailers in the Monthly Retail Trade Survey (MRTS). The MRTS sample consists of 10,000 groups of establishments (clusters) classified to the retail trade sector selected from the Statistics Canada Business Register. A cluster of establishments is defined as all establishments belonging to a statistical enterprise that are in the same industry and geographical region. The MRTS uses a stratified design with simple random sample selection in each stratum. The stratification is done by sampling groups using the three-, four- or five-digit NAICS codes, depending on the subsector, and the geographical regions consisting of the provinces and territories, as well as nine provincial sub-regions (Vancouver, Edmonton, Calgary, Winnipeg, Toronto, Ottawa, Gatineau, Montreal, and Quebec City). The population is further stratified by establishment size. The size measure is created using a combination of independent survey data and three administrative variables: the gross business income, the GST sales, and the T2 revenue (from the corporation tax return).
The size strata consist of one take-all (census), at most two take-some (partially sampled) strata, and one take-none (none sampled) stratum. Take-none strata serve to reduce respondent burden by excluding the smaller businesses from the surveyed population. These businesses should represent at most 10% of total sales.
The sample is allocated optimally in order to reach target coefficients of variation at the national, provincial/territorial, industrial, and sampling group by province/territory levels. The sample is also inflated to compensate for dead, non-responding, and misclassified units.
MRTS is a repeated survey with maximization of monthly sample overlap. The sample is kept month after month and every month births are added to the sample and dead units are identified. MRTS births, i.e., new clusters of establishment(s), are identified every month via the BR's latest universe. They are stratified according to the same criteria as the initial population. A sample of these births is selected according to the sampling fraction of the stratum to which they belong and is added to the monthly sample. Deaths also occur on a monthly basis. A death can be a cluster of establishment(s) that have ceased their activities (out-of-business) or whose major activities are no longer in retail trade (out-of-scope). The status of these businesses is updated on the BR using administrative sources and survey feedback, including feedback from the MRTS.
Methods to treat dead units and misclassified units are part of the sample and population update procedures.
For the RCS, there is one industry at the five-digit NAICS level that is subject to a different sampling treatment: the new car dealers industry (code 441110). For this industry, approximately 20 manufacturers and importers of new cars are surveyed through the New Motor Vehicle Dealer Commodity Survey to collect information on behalf of their dealers.
Responding to this survey is mandatory.
Data are collected directly from survey respondents.
If a respondent finds it more convenient to report their commodity data to Statistics Canada on a monthly basis, they are allowed to do so. Respondents can report annually when the distribution of their sales does not vary throughout the year. The reporting period refers to the period that the commodities were actually sold in the retail stores. The collection period is the time period in which the data is collected.
Data are principally collected by electronic questionnaire and the Statistics Canada Regional Offices. A selected number of units are collected through the head office in Ottawa.
Respondents are given a choice of collection methods: electronic or paper questionnaire or telephone. They also have the choice to report commodity data in dollars or as a percentage of total sales and receipts. Telephone follow-up is conducted to resolve edit problems with mail-back questionnaires and to collect data from respondents who have not returned the questionnaire.
The initial contact with the respondent consists of sending the respondent a package including an introductory letter informing the respondent that a Statistics Canada representative will be calling. A sample questionnaire is also included. This package is followed by a telephone conversation to introduce the survey to the respondent, identify the person best able to provide the data and obtain a detailed profile of what the business sells over a one-year time frame. A profile is a list of all the commodities sold by the retailer. The electronic questionnaire is then tailored to the commodities sold by the retailer.
Commodity indices were developed to assist interviewers and respondents in choosing the most appropriate commodity codes to classify the type of items being sold by retailers. There are two indices -- one is organized by the North American Product Classification System codes and the other one is an alphabetical listing by product within the 5 digit classes of the North American Industry Classification System.
During data collection, on-line edits are performed to check for consistency between the current period's data and the last period's data. If the commodities reported for the current period are inconsistent with the previous period, the data are verified with the respondent. Edits to ensure that the captured information is numerically valid and that all data fields are completed are also performed, as well as edits to ensure that the reporting period dates are valid.
Once the data are received back at head office, an extensive series of processing steps is undertaken to thoroughly verify each record received. Edits are performed at the micro level to ensure that: the commodities sold make sense for the type of store; the sum of the individual commodities equals the total sales reported; there are no missing fields; the total sales reported to this survey are in line with the sales reported to the Monthly Retail Trade Survey; and there are no large fluctuations in commodity sales from period to period. Records failing these edits are subject to manual inspection and possible corrective action.
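The micro-level edits described above can be illustrated with a minimal sketch. The function name, the record layout and all three tolerances are hypothetical placeholders chosen for illustration; they are not Statistics Canada's actual edit rules. The check that commodities make sense for the type of store is omitted because it would require a reference list that is not given here.

```python
def micro_edit_failures(commodity_sales, total_sales, mrts_sales, previous_sales,
                        sum_tol=0.5, mrts_tol=0.05, change_tol=0.5):
    """Return the names of the edits that one record fails.

    commodity_sales / previous_sales: dict of commodity code -> sales (None = missing).
    All tolerances are illustrative assumptions, not official thresholds.
    """
    failures = []

    # Edit: no missing commodity fields.
    if any(value is None for value in commodity_sales.values()):
        failures.append("missing_field")
    reported = {k: v for k, v in commodity_sales.items() if v is not None}

    # Edit: the sum of the individual commodities equals the total sales reported.
    if abs(sum(reported.values()) - total_sales) > sum_tol:
        failures.append("sum_mismatch")

    # Edit: total sales are in line with the sales reported to the MRTS.
    if mrts_sales > 0 and abs(total_sales - mrts_sales) / mrts_sales > mrts_tol:
        failures.append("mrts_mismatch")

    # Edit: no large period-to-period fluctuation for any commodity.
    for code, value in reported.items():
        previous = previous_sales.get(code)
        if previous and abs(value - previous) / previous > change_tol:
            failures.append(f"fluctuation:{code}")

    return failures


clean = micro_edit_failures({"A": 60.0, "B": 40.0}, 100.0, 100.0, {"A": 58.0, "B": 41.0})
suspect = micro_edit_failures({"A": 60.0, "B": 30.0}, 100.0, 120.0, {"A": 20.0, "B": 30.0})
```

A record with a non-empty failure list would be the kind of record routed to manual inspection.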
An automated imputation system is used to impute for missing or erroneous data. Non-respondents, as well as respondents with one or more fields flagged for imputation (due to incomplete or inconsistent data identified during the editing process), are subject to imputation. Since the Retail Commodity Survey (RCS) sample is monthly-based, the imputation system processes the data for one reference month at a time. The system makes use of the auxiliary information available from the Monthly Retail Trade Survey (MRTS). Since all retailers in RCS are also in MRTS, the total sales for each record is obtained from the MRTS file after the MRTS edit and imputation process has been completed. The commodity fields are then imputed one at a time.
First, the system uses the most recent historical data available to determine which commodities are sold by the retailer. The system then imputes a sales value by commodity. Different methods are used depending on the information available. Where possible, the system first applies deductive imputation. Then, for the remaining missing values, historical imputation is used to impute commodity values. Data from the retailer for the same month of the previous year are used. If those data are unavailable, the previous month's data are used.
Where there is no historical data available, commodity values are imputed using nearest-neighbor imputation, and where that is not possible, commodity values are imputed by ratio imputation using a current auxiliary variable. Imputation groups of similar retailers are formed on the basis of type of store and geographic region. Respondents that are considered to be outliers are excluded from the group. When there are not sufficient respondents in an imputation group, groups at successively more aggregated levels of type of store and geographic region are used. Finally, as a last resort, if a unit still has fields missing, a proportion of its total sales will be imputed.
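The order in which the imputation methods are tried can be summarized in a short sketch. The function name, the boolean inputs and the returned labels are hypothetical; the sketch only encodes the fallback order described above and does not perform any imputation itself.

```python
def choose_imputation_method(can_deduce, has_same_month_last_year,
                             has_previous_month, has_donor, has_auxiliary):
    """Return a label for the first imputation method that can be applied.

    Each argument is a boolean saying whether the information needed by
    that method is available for the field being imputed (assumed inputs).
    """
    if can_deduce:
        return "deductive"
    if has_same_month_last_year:
        return "historical_same_month_previous_year"
    if has_previous_month:
        return "historical_previous_month"
    if has_donor:
        return "nearest_neighbour"
    if has_auxiliary:
        return "ratio"
    # Last resort: impute a proportion of the unit's total sales.
    return "proportion_of_total_sales"


method = choose_imputation_method(False, False, True, True, True)
```

Here `method` is `"historical_previous_month"`, because no deduction is possible and no data exist for the same month of the previous year.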
The last step consists of adjusting the imputed values to ensure that all parts add up to the corresponding totals for each North American Product Classification System hierarchy.
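One simple way to make the parts add up to a total is to scale only the imputed values proportionally, leaving reported values unchanged. The sketch below assumes this proportional approach and a single level of hierarchy; the actual adjustment method is not described here, and the function and variable names are hypothetical.

```python
def prorate_imputed(values, imputed_flags, total):
    """Scale imputed commodity values so that all values sum to `total`.

    values: dict of commodity code -> sales value.
    imputed_flags: dict of commodity code -> True if the value was imputed.
    Reported (non-imputed) values are left as they are.
    Assumes the reported values do not already exceed the total.
    """
    reported_sum = sum(v for k, v in values.items() if not imputed_flags[k])
    imputed_sum = sum(v for k, v in values.items() if imputed_flags[k])
    if imputed_sum == 0:
        return dict(values)  # nothing to adjust
    factor = (total - reported_sum) / imputed_sum
    return {k: (v * factor if imputed_flags[k] else v) for k, v in values.items()}


adjusted = prorate_imputed(
    {"A": 50.0, "B": 30.0, "C": 30.0},
    {"A": False, "B": True, "C": True},
    total=100.0,
)
```

In this example the reported value for A stays at 50, and the two imputed values are scaled down from 30 to about 25 each so that the three parts sum to 100.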
The commodity values for the new car dealers industry (code 441110 of the North American Industry Classification System) are derived in a different manner than the other industries. Since commodity distributions are collected in the responses to the New Motor Vehicle Dealer Commodity Survey, these distributions are applied to the Monthly Retail Trade Survey retail sales for this industry to derive commodity distributions for each individual new car dealer.
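As a rough illustration of this derivation, the sketch below applies a commodity distribution to one dealer's retail sales. The commodity names, the shares and the function name are invented for the example, and the sketch assumes a single distribution is applied to each dealer.

```python
def derive_dealer_commodities(dealer_sales, distribution):
    """Split a dealer's total retail sales across commodities.

    dealer_sales: total sales for the dealer (from the MRTS in the text above).
    distribution: dict of commodity -> share, as collected from the
                  manufacturer or importer (hypothetical values here).
    Shares are normalized so that the parts always sum to dealer_sales.
    """
    share_total = sum(distribution.values())
    return {c: dealer_sales * s / share_total for c, s in distribution.items()}


dealer = derive_dealer_commodities(
    1000.0, {"new_vehicles": 0.80, "parts": 0.15, "service": 0.05}
)
```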
Estimation is a process that approximates unknown population parameters using only the part of the population that is included in a sample. Inferences about these unknown parameters are then made, using the sample data and associated survey design, such as design weight. This stage uses Statistics Canada's Generalized Estimation System.
The estimation weight that is applied to units in the Retail Commodity Survey (RCS) sample is made up of three components that are multiplied together. The first component is a weight reflecting the sampling design (i.e. a weight to inflate the sample data to represent the entire population). The second component is an adjustment to increase the representativeness and precision of the estimates by using the ratio estimator. The third component is an adjustment to ensure coherence with the Monthly Retail Trade Survey (MRTS).
Ratio estimation consists of replacing the initial sampling weights (defined as the inverse of the probability of selection in the sample) by new weights in a manner that satisfies the constraints of calibration. Calibration ensures that the total of an auxiliary variable estimated using the sample must equal the sum of the auxiliary variable over the entire population, and that the new sampling weights are as close as possible (using a specific distance measure) to the initial sampling weights.
For example, suppose that the known population total of the auxiliary variable is equal to 100 and based on a sample the estimated total is equal to 90, so that we are underestimating by approximately 10%. Since we know the population total of the auxiliary variable, it would be reasonable to increase the weights of the sampled units so that the estimate would be exactly equal to it. Now since the variable of interest is correlated to the auxiliary variable, it is not unreasonable to believe that the estimate of the sales based on the same sample and weights as the estimate of the auxiliary variable may also be an underestimation by approximately 10%. If this is in fact the case, then the adjusted weights produce a more exact estimation of the total sales.
In essence, the ratio estimator tries to compensate for 'unlucky' samples and brings the estimate closer to the true total. It also reduces the variance. The gain in variance will depend on the strength of the relationship between the variable of interest and the auxiliary data.
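The numerical example above (a known total of 100 against an estimate of 90) can be reproduced with a small sketch of a single ratio adjustment, which is the simplest case of calibration. The three-unit sample, its weights and its values are made up for illustration, and the function name is hypothetical.

```python
def ratio_adjusted_weights(design_weights, aux_values, aux_population_total):
    """Rescale design weights so the weighted auxiliary total matches the known total."""
    estimated_aux_total = sum(w * x for w, x in zip(design_weights, aux_values))
    adjustment = aux_population_total / estimated_aux_total
    return [w * adjustment for w in design_weights]


design_weights = [10.0, 10.0, 10.0]   # inverse selection probabilities (illustrative)
aux_values = [2.0, 3.0, 4.0]          # auxiliary variable; weighted total = 90
sales = [20.0, 33.0, 37.0]            # variable of interest; weighted total = 900

new_weights = ratio_adjusted_weights(design_weights, aux_values, 100.0)
calibrated_aux_total = sum(w * x for w, x in zip(new_weights, aux_values))
initial_sales_estimate = sum(w * y for w, y in zip(design_weights, sales))
adjusted_sales_estimate = sum(w * y for w, y in zip(new_weights, sales))
```

With these numbers every weight is multiplied by 100/90, so the auxiliary total is reproduced and the sales estimate rises from 900 to about 1,000.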
Finally, the last component is an adjustment factor to ensure that the RCS total sales estimate equals the MRTS sales estimate at the 3 digit code level of the North American Industry Classification System (NAICS).
The RCS produces monthly estimates of total retail sales by North American Product Classification System (NAPCS), and more detailed quarterly estimates by NAPCS and 3-digit NAICS.
Since the MRTS and RCS samples are monthly-based, commodity estimates and their variances are calculated for each month. The variances are derived directly from a stratified simple random sample without replacement. For quarterly estimations, the monthly estimates are summed to obtain commodity estimates for the quarter. Variance of the quarterly estimates is calculated as if respondents had reported on a quarterly basis.
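For reference, the textbook estimator of a total and its variance under stratified simple random sampling without replacement can be sketched as follows. This is the standard formula only; it ignores the ratio and benchmarking adjustments and the clustering described elsewhere on this page, and the strata in the example are invented.

```python
from statistics import variance


def stratified_total_and_variance(strata):
    """Estimate a total and its variance under stratified SRS without replacement.

    strata: list of (N_h, sample_values), where N_h is the stratum population
            size and sample_values are the sampled units' values.
    Variance: sum over strata of N_h^2 * (1 - n_h/N_h) * s_h^2 / n_h.
    Strata with a single sampled unit contribute no variance term here,
    because s_h^2 cannot be computed (a simplification of this sketch).
    """
    total = 0.0
    var = 0.0
    for N_h, y in strata:
        n_h = len(y)
        total += N_h * sum(y) / n_h
        if 1 < n_h < N_h:  # take-all strata (n_h == N_h) have zero sampling variance
            var += N_h ** 2 * (1 - n_h / N_h) * variance(y) / n_h
    return total, var


# One take-all stratum (3 of 3 units) and one take-some stratum (2 of 10 units).
est_total, est_var = stratified_total_and_variance(
    [(3, [100.0, 200.0, 300.0]), (10, [10.0, 20.0])]
)
```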
Prior to publication, combined survey results are analyzed for comparability; in general, this includes a detailed review of individual responses (especially for the largest companies), general economic conditions, and historical trends.
The data are examined at a macro level to ensure that the long-term trends make sense when compared to publicly available information in media reports, company press releases, etc. Large fluctuations in year-over-year sales for commodities are analyzed to determine if they are in error or if sales for these commodities accurately reflect retail activity. Subject matter officers follow up with the company to confirm the data and to document reasons for large fluctuations in sales.
Statistics Canada is prohibited by law from releasing any information it collects which could identify any person, business, or organization, unless consent has been given by the respondent or as permitted by the Statistics Act. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.
Confidentiality analysis includes the detection of possible direct disclosure, which occurs when the value in a tabulation cell is composed of a few respondents or when the cell is dominated by a few companies.
Revisions and seasonal adjustment
With each release, current monthly preliminary estimates and revised estimates for previous months are made available. Once a year, annual revisions are performed. The revisions mainly stem from responses received after the initial release of the month's data. Data are also revised due to revisions to the retail sales level provided by the Monthly Retail Trade Survey (MRTS).
The Retail Commodity Survey (RCS) total sales estimates are benchmarked at the sampling group level to the sales estimates (before seasonal adjustment) from the Monthly Retail Trade Survey (MRTS). Total sales for RCS differ slightly from the sales published by MRTS in that the sales of department store concessions are included in RCS and not in MRTS.
RCS estimates are not adjusted for seasonality.
The commodity estimates are derived from a sample survey and, as such, are subject to both sampling and non-sampling errors. Sampling errors are present because observations are made only on a sample and not on the entire population. The sampling error depends on factors such as the size of the sample, variability in the population, sampling design and method of estimation. The coefficient of variation (CV), which is the estimated standard error expressed as a percentage of the estimate, is used to measure the degree to which sampling error potentially exists within the sample. Estimates with smaller CVs are more reliable than estimates with larger CVs.
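The CV definition above translates directly into a one-line calculation. The function name and the input values are illustrative only.

```python
from math import sqrt


def cv_percent(estimate, variance_of_estimate):
    """Coefficient of variation: standard error as a percentage of the estimate."""
    return 100.0 * sqrt(variance_of_estimate) / estimate


cv = cv_percent(200.0, 100.0)  # standard error 10 on an estimate of 200
```

In this example the CV is 5%.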
Non-sampling errors are not related to sampling and may occur for many reasons. Population coverage errors, differences in the interpretation of questions, incorrect information from respondents, and mistakes in recording, coding and processing data are examples of non-sampling errors. Non-response is an important source of non-sampling error. While the impact of non-sampling errors is difficult to evaluate, imputation rates are used as one measure. The imputation rate is defined as the total of imputed sales divided by the total sales for a given commodity. For example, if the total estimated sales for a commodity is $1 million, and $150,000 is from imputed data, then the imputation rate is 15%. Estimates with lower imputation rates are considered more reliable than those with higher imputation rates.
A quality indicator is derived for each estimate. This quality indicator code is a joint measure of the magnitude of the CV and the imputation rate. Quality indicators are defined by a letter from A to F (where A is most reliable and E is to be used with caution). Estimates coded F are considered to be of insufficient quality to be published.
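The imputation rate example above ($150,000 of imputed sales out of $1 million) can be checked with a short calculation. The function name is hypothetical.

```python
def imputation_rate_percent(imputed_sales, total_sales):
    """Imputed sales as a percentage of total estimated sales for a commodity."""
    return 100.0 * imputed_sales / total_sales


rate = imputation_rate_percent(150_000, 1_000_000)
```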