Annual Retail Trade Survey
Detailed information for 2012
Status: Active
Frequency: Annual
Record number: 2447
The purpose of this survey is to collect the financial and operating/production data needed to develop national and regional economic policies and programs.
Data release - March 26, 2014
Description
The Annual Retail Trade Survey measures, on an annual basis, the operating and financial characteristics of Canadian retailers.
Data from this survey provide information on revenue, expenses and inventory. The data are used by all levels of government, government agencies, the retail industry and individuals to assess trends within the industry, measure performance, benchmark, and study the evolving structure of the retail industry. The information is also a critical input into the measure of gross margins in the Canadian System of National Accounts (CSNA).
Statistical activity
The survey is administered as part of the Unified Enterprise Survey program (UES). The UES program has been designed to integrate, gradually over time, the approximately 200 separate business surveys into a single master survey program. The UES aims to collect more industry and product detail at the provincial level than was previously possible while avoiding overlap between different survey questionnaires. The redesigned business survey questionnaires have a consistent look, structure and content. The unified approach makes reporting easier for firms operating in different industries because they can provide similar information for each branch operation. This way they avoid having to respond to questionnaires that differ for each industry in terms of format, wording and even concepts.
Reference period: The calendar year, or the 12-month fiscal period for which the final day occurs on or between April 1st of the reference year and March 31st of the following year.
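The reference period rule can be illustrated with a short sketch. The function name `reference_year` is hypothetical and is not part of any Statistics Canada system; the sketch only encodes the rule stated above, under which a fiscal period ending between April 1 of a given year and March 31 of the following year is assigned to that given year.

```python
from datetime import date


def reference_year(fiscal_year_end: date) -> int:
    """Return the reference year to which a 12-month fiscal period belongs.

    Illustrative helper (hypothetical name): a period whose final day falls
    between April 1 of year Y and March 31 of year Y + 1 is assigned to Y.
    """
    # Fiscal periods ending in January, February or March belong to the
    # previous calendar year's reference period.
    if fiscal_year_end.month <= 3:
        return fiscal_year_end.year - 1
    return fiscal_year_end.year


print(reference_year(date(2013, 3, 31)))  # fiscal year ending March 31, 2013
print(reference_year(date(2012, 4, 1)))   # fiscal year ending April 1, 2012
```

Under this rule, both dates in the usage example fall in reference year 2012.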
Collection period: April to October
Subjects
- Retail and wholesale
- Retail sales by type of store
Data sources and methodology
Target population
The target population consists of all retail establishments operating in Canada for at least one day between January and December of a calendar year. Direct sellers and operators of vending machines are excluded from the target population of this survey.
The survey population is the collection of all retail establishments from which the survey can realistically obtain information. The survey population will differ from the target population due to difficulties in identifying all the units that belong to the target population because of a possible lack of detailed information for some units, particularly small businesses with low sales levels.
The survey population comprises all statistical establishments coded to NAICS 441 through 453 on Statistics Canada's Business Register, as well as those small unincorporated businesses that are classified to the retail industry.
Instrument design
The questionnaires used in the survey were designed to minimise differences in interpretation. The questions asked on the questionnaire are closed questions. The survey forms were field tested with respondents to ensure that the questions, concepts and terminology were appropriate. Statistics Canada consulted with the Canadian Retail Trade Council and the Quebec Retail Council and tested the questionnaire with a number of different retail firms in Toronto, Ottawa and Montreal. The Centre for the Study of Commercial Activity at Ryerson Polytechnic University in Toronto also organized a workshop on the survey with representatives of some of Canada's largest retailers.
Sampling
This is a sample survey with a cross-sectional design.
To reduce response burden while still producing reliable figures, exclusion thresholds based on industrial, provincial, and size dimensions were implemented. Administrative (tax) data were used to produce estimates for small businesses below the threshold. Data for retail establishments above the threshold were collected mainly through questionnaires, but also through direct replacement with tax data for several businesses.
Before sample selection, the survey population is delineated into cells representing the required provincial, industrial group (mainly, but not only, at the four-digit NAICS level) and size dimensions. The establishments in the survey population are first stratified according to their province/territory and industrial group based on the NAICS industrial classification. The industrial groups are mutually exclusive classifications, each representing similar businesses.
Within each province/territory by industrial group combination, four size strata are created to group businesses of a similar size. The boundaries are determined using total estimated revenues for the businesses. The resulting groups are one take-all stratum of the largest businesses (which are all included in the sample), two take-some strata (from which representative samples are selected) and one take-none stratum (containing small businesses which are not eligible to be sampled). Optimal stratum boundaries or thresholds are determined to minimise the total sample size. Chains of stores (defined as an organization operating four or more outlets in the same industry class under the same legal ownership at any time during the survey year) are all placed in the take-all stratum and are therefore all included in the sample.
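The assignment of a business to one of the four size strata can be sketched as follows. This is a minimal illustration only: the function name, the three boundary parameters and the dollar values in the usage example are hypothetical, and the actual boundaries are determined by the optimisation described above, which is not reproduced here.

```python
def assign_size_stratum(revenue, is_chain, exclusion_threshold,
                        mid_boundary, take_all_boundary):
    """Assign a business to a size stratum within one province/industry cell.

    All three boundaries are hypothetical inputs; in practice they are
    derived per cell so as to minimise the total sample size.
    """
    if not exclusion_threshold <= mid_boundary <= take_all_boundary:
        raise ValueError("boundaries must be in ascending order")
    # Chains of stores are always placed in the take-all stratum.
    if is_chain or revenue >= take_all_boundary:
        return "take-all"
    # Small businesses below the exclusion threshold are never sampled.
    if revenue < exclusion_threshold:
        return "take-none"
    if revenue < mid_boundary:
        return "take-some (smaller)"
    return "take-some (larger)"


# Usage with made-up boundaries for a single cell.
print(assign_size_stratum(500_000, False, 100_000, 1_000_000, 10_000_000))
```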
Following the sample selection process, data for the take-all and take-some strata are collected through questionnaires. However, for 55% of the selected 'simple' businesses, that is, those that operate in a single province and conduct all their activities in the same industry, under the same legal entity, tax data are substituted for survey collection. For those units belonging to the take-none stratum, a census of administrative (tax) records is used to collect selected financial information.
All sampled units are assigned a sampling weight. An initial weight equal to the inverse of the original probability of selection is assigned to each entity. The sampling weight is a raising factor attached to each sampled unit to obtain estimates for the population. For example, if two units are selected at random and with equal probability out of a population of 10 units, then each selected unit represents five units in the population, and it is given a sampling weight of five. These weights are subsequently adjusted, at the time of producing survey results, to reflect as closely as possible the characteristics of the population in this industry.
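The initial sampling weight in the example above can be reproduced with a few lines of code. The function name `sampling_weight` is hypothetical, and the sketch covers only the equal-probability case described in the text, not the subsequent weight adjustments.

```python
def sampling_weight(population_size: int, sample_size: int) -> float:
    """Initial weight under equal-probability selection within a stratum.

    The probability of selection is sample_size / population_size, and the
    weight is its inverse.
    """
    if not 0 < sample_size <= population_size:
        raise ValueError("sample size must be between 1 and the population size")
    return population_size / sample_size


# Two units selected out of ten: each represents five units.
print(sampling_weight(10, 2))
```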
On the Business Register, approximately 198,242 retail establishments operated for at least one day during the reference year 2012. The sample comprised approximately 41,494 establishments.
Data sources
Responding to this survey is mandatory.
Data are collected directly from survey respondents and extracted from administrative files.
The survey is conducted using the mail-out / mail-back questionnaire approach, as well as using Computer Assisted Telephone Interviews (CATI) for capture, edit and follow-up.
The questionnaires are mailed to the respondents to the survey after the end of the calendar year. An automatic fax reminder is sent to non-reporters around 15 days after mailing out the questionnaires. A telephone contact is made with non-reporting companies 15 days after the first fax follow-up to discuss reporting delinquency and possible special arrangements. A second fax is sent to persistent non-reporters later on in the collection period before collection is closed.
Respondents can report to the survey by fax or by mail. Information may be transmitted by the Internet or by telephone. In exceptional cases a company may not be able to comply with the legal reporting deadlines and special reporting arrangements are determined.
Error detection
Several checks are performed on the collected data to verify internal consistency and identify extreme values. Data are analyzed within each industrial group and geographic region. Extreme values are reviewed and corrective action taken. These extreme values are excluded from use in the calculation of imputation variables by the imputation system.
Imputation
Units which do not respond in the current period are imputed (their characteristics are estimated). Units are imputed by applying a growth factor to previously reported data when available. The growth factor is estimated using the survey responses for the units that are most similar to the unit being imputed.
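Imputation from historical data can be sketched as below. The source does not state how the growth factor is computed from similar units, so this sketch assumes a simple ratio of totals (current-period total over previous-period total among similar respondents). The function names are hypothetical.

```python
def growth_factor(similar_units):
    """Estimate a growth factor from similar responding units.

    similar_units: list of (previous_value, current_value) pairs.
    Assumption: the factor is the ratio of the two totals.
    """
    previous_total = sum(prev for prev, _ in similar_units)
    current_total = sum(curr for _, curr in similar_units)
    if previous_total == 0:
        raise ValueError("previous-period total must be non-zero")
    return current_total / previous_total


def impute_from_history(previous_value, similar_units):
    """Impute a non-respondent's value from its previously reported value."""
    return previous_value * growth_factor(similar_units)


# Similar respondents grew by 10% overall, so a unit that previously
# reported 200 is imputed at roughly 220.
print(impute_from_history(200.0, [(100.0, 110.0), (300.0, 330.0)]))
```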
When partial survey data covering three key variables (total operating revenue, total operating expenses and cost of goods sold) are received, the imputation factors are calculated at the unit level using these partial data. For records without historical information, a donor imputation system (nearest neighbor) is used. Information on the size of the non-respondent is obtained and a similar-sized respondent is found. The size information consists of the three key variables (total operating revenue, total operating expenses and cost of goods sold). If this information is not available, sales from the Monthly Retail Trade Survey (Survey ID 2406) are used. In this case, the monthly sales are directly copied over to the non-respondent and the rest of the key variables are calculated using the sales data.
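The nearest-neighbor donor search can be illustrated as follows. The distance measure is not specified in the source; this sketch assumes an unscaled Euclidean distance over the three key variables, and the function name and unit identifiers are hypothetical.

```python
import math


def nearest_donor(target, respondents):
    """Return the identifier of the respondent closest in size to the target.

    target: (total operating revenue, total operating expenses,
             cost of goods sold) for the non-respondent.
    respondents: dict mapping a unit identifier to the same three variables.
    Assumption: closeness is measured by unscaled Euclidean distance.
    """
    if not respondents:
        raise ValueError("at least one candidate donor is required")
    return min(respondents, key=lambda uid: math.dist(target, respondents[uid]))


# Made-up candidate donors within one imputation class.
donors = {"A": (100.0, 80.0, 50.0), "B": (1000.0, 900.0, 600.0)}
print(nearest_donor((120.0, 90.0, 55.0), donors))
```

In practice the variables would likely be scaled before distances are compared, since revenue would otherwise dominate; that choice is left open here.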
Estimation
Estimation is a process that approximates unknown population parameters using information from only the part of the population that is included in a sample. Inferences about these unknown parameters are then made, using the sample data and the associated survey design. The estimation process for the Annual Retail Trade Survey is done within the Unified Enterprise Survey (UES) framework, and takes place after all the missing data from partial or total non-response have been imputed.
The population is divided into a survey portion (take-all and take-some strata) and a non-survey portion (take-none stratum). From the sample that is drawn from the survey portion, an estimate for the population is determined through the use of a Horvitz-Thompson estimator where responses are weighted using the inverses of the inclusion probabilities of the sampled units. Such weights (called sampling weights) can be interpreted as the number of times that each sampled unit should be replicated in order to represent the entire population. The calculated weighted values are summed by domain in order to produce the total estimates for each industrial group / geographic area combination. A domain is defined by the most recent classification values available from the Business Register (BR) for the unit and the survey reference period. These domains may differ from the original sampling strata because units may have changed size, industry or location. Changes in classification are reflected immediately in the estimates and do not accumulate over time.
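The Horvitz-Thompson estimate of a domain total is the sum, over sampled units in the domain, of each reported value divided by the unit's inclusion probability. A minimal sketch follows; the function name, the domain labels and the values are made up for illustration.

```python
from collections import defaultdict


def horvitz_thompson_totals(sample):
    """Estimate totals by domain from a probability sample.

    sample: iterable of (domain, value, inclusion_probability) tuples,
    where the domain is the unit's current industry/geography combination.
    """
    totals = defaultdict(float)
    for domain, value, probability in sample:
        if not 0 < probability <= 1:
            raise ValueError("inclusion probability must be in (0, 1]")
        # Weighting by the inverse inclusion probability.
        totals[domain] += value / probability
    return dict(totals)


sample = [
    ("ON / group 1", 100.0, 1.0),   # take-all unit, weight 1
    ("ON / group 1", 40.0, 0.25),   # take-some unit, weight 4
    ("QC / group 1", 30.0, 0.5),    # take-some unit, weight 2
]
print(horvitz_thompson_totals(sample))
```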
During the estimation process for the survey portion, an adjustment is applied to the units for which data were collected via tax records as opposed to survey data. This correction is done to take into account the fact that, because they were not actually contacted, some of them might be erroneously considered alive or in-scope for the survey. Furthermore, an outlier detection procedure identifies influential units and a correction process ensures that they do not contribute too heavily to the estimates.
For the non-survey portion, total revenue is taken from tax data, and a ratio-type estimator (derived from the survey portion for each North American Industry Classification System (NAICS) and geographic combination) is calculated and applied to the missing variables. The total estimate is equal to the sum of the survey and non-survey portion estimates.
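One way to picture the ratio-type estimator is sketched below. The exact form used in production is not given in the source; this sketch assumes that, within one industry and geography cell, the ratio of a variable to total revenue observed in the survey portion is applied to the tax-based revenue of the take-none stratum. All names and figures are hypothetical.

```python
def non_survey_estimate(survey_variable_total, survey_revenue_total,
                        tax_revenue_total):
    """Estimate a variable for the take-none stratum of one cell.

    Assumption: the variable-to-revenue ratio from the survey portion also
    holds for the small businesses in the non-survey portion.
    """
    if survey_revenue_total == 0:
        raise ValueError("survey revenue total must be non-zero")
    ratio = survey_variable_total / survey_revenue_total
    return ratio * tax_revenue_total


def total_estimate(survey_variable_total, survey_revenue_total,
                   tax_revenue_total):
    """Sum of the survey and non-survey portion estimates for one cell."""
    return survey_variable_total + non_survey_estimate(
        survey_variable_total, survey_revenue_total, tax_revenue_total)


# Made-up cell: expenses of 600 on revenue of 1,000 in the survey portion,
# and 200 of tax-based revenue in the take-none stratum.
print(total_estimate(600.0, 1000.0, 200.0))
```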
The measure of precision used to evaluate the quality of a population parameter estimate and to obtain valid inferences is the variance. The variance from the survey portion is derived directly from a stratified simple random sample without replacement.
Sample estimates may differ from the expected value of the estimates. However, since the estimate is based on a probability sample, the variability of the sample estimate with respect to its expected value can be measured. The variance of an estimate is a measure of the precision of the sample estimate and is defined as the average, over all possible samples, of the squared difference of the estimate from its expected value.
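For a stratified simple random sample without replacement, the textbook estimator of the variance of an estimated total sums, over strata, N_h^2 (1 - n_h / N_h) s_h^2 / n_h, where N_h is the stratum population size, n_h the stratum sample size and s_h^2 the sample variance within the stratum. The sketch below implements that standard formula; it does not include the adjustments for tax-replaced units or outliers mentioned earlier, and the function name is hypothetical.

```python
from statistics import variance


def stratified_total_variance(strata):
    """Estimated variance of an estimated total under stratified SRSWOR.

    strata: list of (population_size, sampled_values) pairs, one per stratum.
    """
    total = 0.0
    for population_size, values in strata:
        sample_size = len(values)
        if sample_size > population_size:
            raise ValueError("sample cannot exceed the stratum population")
        # A take-all stratum is fully enumerated and contributes no
        # sampling variance.
        if sample_size == population_size:
            continue
        if sample_size < 2:
            raise ValueError("at least two sampled units are needed per stratum")
        sampling_fraction = sample_size / population_size
        total += (population_size ** 2 * (1 - sampling_fraction)
                  * variance(values) / sample_size)
    return total


# One take-some stratum (2 of 10 units) and one take-all stratum (3 of 3).
print(stratified_total_variance([(10, [2.0, 4.0]), (3, [5.0, 6.0, 7.0])]))
```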
Quality evaluation
Prior to the data release, combined survey results are analyzed for comparability; in general, this includes a detailed review of: individual responses (especially for the largest companies), general economic conditions, historic trends, and comparisons with annualized monthly survey data and industry and trade association sources.
Disclosure control
Statistics Canada is prohibited by law from releasing any information it collects which could identify any person, business, or organization, unless consent has been given by the respondent or as permitted by the Statistics Act. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.
Confidentiality analysis includes the detection of possible direct disclosure, which occurs when the value in a tabulation cell is composed of a few respondents or when the cell is dominated by a few companies.
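A direct-disclosure check of this kind can be sketched as follows. The thresholds below (a minimum of three respondents, and a top-two share of 90%) are purely hypothetical placeholders chosen for illustration; the actual confidentiality rules and parameters are not given in the source.

```python
def is_sensitive_cell(contributions, min_respondents=3, top_k=2,
                      dominance_share=0.9):
    """Flag a tabulation cell that risks direct disclosure.

    contributions: each respondent's contribution to the cell value.
    All three parameters are hypothetical, not published rules.
    """
    values = sorted(contributions, reverse=True)
    # Too few respondents in the cell.
    if len(values) < min_respondents:
        return True
    total = sum(values)
    if total <= 0:
        return False
    # The cell is dominated by its largest few contributors.
    return sum(values[:top_k]) / total > dominance_share


print(is_sensitive_cell([50.0, 30.0, 20.0]))  # balanced cell
print(is_sensitive_cell([95.0, 3.0, 2.0]))    # dominated cell
```

Residual disclosure, where a suppressed cell could be recovered from published totals, requires a separate analysis that this sketch does not cover.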
Revisions and seasonal adjustment
Revisions in the raw data are required to correct known non-sampling errors. These normally include replacing imputed data with reported data and respondent corrections to previously reported data.
Raw data are revised, on an annual basis, for the year immediately prior to the current reference year being published. That is, when data for the current year are being published for the first time, there will also be revisions, if necessary, to the raw data for the previous year.
Data accuracy
While considerable effort is made to ensure high standards throughout all stages of collection and processing, the resulting estimates are inevitably subject to a certain degree of non-sampling error. Non-sampling error is not related to sampling and may occur for many reasons. For example, non-response is an important source of non-sampling error. Population coverage, differences in the interpretation of questions, incorrect information from respondents, and mistakes in recording, coding and processing data are other examples of non-sampling errors.
Measures such as response rate (total number of completed questionnaires as a percentage of the total active, in-scope survey sample) and response fraction (the proportion of the estimate based upon reported data) can be used as indicators of the possible extent of non-sampling errors. For the 2012 survey, at the Canada level, the response fraction (RF) for total operating revenue (TOR) was 93%.
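The two indicators defined above can be computed as in the following sketch. The function names and the input figures are hypothetical; only the definitions come from the text.

```python
def response_rate(completed_questionnaires, active_in_scope_sample):
    """Completed questionnaires as a percentage of the active, in-scope sample."""
    if active_in_scope_sample <= 0:
        raise ValueError("sample size must be positive")
    return 100 * completed_questionnaires / active_in_scope_sample


def response_fraction(reported_contribution, total_estimate):
    """Percentage of a weighted estimate that is based on reported data."""
    if total_estimate <= 0:
        raise ValueError("total estimate must be positive")
    return 100 * reported_contribution / total_estimate


# Made-up inputs: 750 completed out of 1,000 in-scope units, and 93 of a
# weighted total of 100 coming from reported (not imputed) data.
print(response_rate(750, 1000))
print(response_fraction(93.0, 100.0))
```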
It is an unavoidable fact that estimates from a sample survey are subject to sampling error. The basis for measuring the potential size of sampling errors is the standard deviation of the estimates derived from survey results. However, due to the large variety of estimates that can be produced from a survey, the standard deviation of an estimate is usually expressed relative to the estimate to which it pertains. This resulting measure, known as the coefficient of variation (CV) of an estimate, is obtained by dividing the standard deviation of the estimate by the estimate itself and is expressed as a percentage of the estimate. Statistics Canada commonly uses CV results when analyzing data and urges users to do so as well.
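The coefficient of variation described above is a one-line calculation. The function name and the figures in the usage example are hypothetical.

```python
def coefficient_of_variation(standard_deviation, estimate):
    """Standard deviation of an estimate expressed as a percentage of it."""
    if estimate == 0:
        raise ValueError("the CV is undefined for an estimate of zero")
    return 100 * standard_deviation / estimate


# An estimate of 200 with a standard deviation of 5 has a CV of 2.5%.
print(coefficient_of_variation(5.0, 200.0))
```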
Documentation
- Partial list of retail chain stores for the Annual Retail Trade Survey - 2012