National Household Survey (NHS)

Detailed information for 2011

Status:

Inactive

Frequency:

Every 5 years

Record number:

5178

Complementing the data collected by the census, the National Household Survey (NHS) is designed to provide information about people in Canada by their demographic, social and economic characteristics as well as provide information about the housing units in which they live.

Data release - May 8, 2013 (first in a series of releases). For variables and definitions, see "Documentation" below.

Description

Starting in 2011, information previously collected by the mandatory long-form census questionnaire is collected as part of the voluntary National Household Survey (NHS). The NHS provides information about the demographic, social and economic characteristics of people living in Canada as well as the housing units in which they live.

The information from the survey provides data to support federal, provincial, territorial and local government planning and program delivery.

Statistical activity

The term 'Census Program' is used to refer in a general way to the Census of Population (record number 3901) and, if applicable, any accompanying survey conducted at the time of the census. The Census Program consists of two parts: a short questionnaire (census) with a basic set of questions distributed to 100% of households, and a long questionnaire (National Household Survey - record number 5178) distributed to a 33% sample of households.

Subjects

  • Aboriginal peoples
  • Education, training and learning
  • Ethnic diversity and immigration
  • Families, households and housing
  • Income, pensions, spending and wealth
  • Labour
  • Languages
  • Population and demography
  • Society and community

Data sources and methodology

Target population

The NHS covers all persons who usually live in Canada, in the provinces and the territories. It includes persons who live on Indian reserves and in other Indian settlements, permanent residents, non-permanent residents such as refugee claimants, holders of work or study permits, and members of their families living with them.

Foreign residents such as representatives of a foreign government assigned to an embassy, high commission or other diplomatic mission in Canada, members of the armed forces of another country stationed in Canada, and residents of another country who are visiting Canada temporarily are not covered by the NHS.

The survey also excludes persons living in institutional collective dwellings such as hospitals, nursing homes and penitentiaries; Canadian citizens living in other countries; and full-time members of the Canadian Forces stationed outside Canada. Also excluded are persons living in non-institutional collective dwellings such as work camps, hotels and motels, and student residences.

Instrument design

The NHS questions were tested during the 2011 Census consultation and testing processes. Those processes helped Statistics Canada understand users' data requirements and assess questions. The questions were tested through focus groups and one-on-one interviews (qualitative tests) to make sure that they were properly understood. However, neither the questions as asked in a voluntary context nor the NHS collection method was tested.

Two types of questionnaires were developed for the NHS: a questionnaire for the self-administered collection method, and a questionnaire for collection on Indian reserves and in remote areas, where 100% of the households were interviewed by a Statistics Canada enumerator. The survey questions related to each person's situation on May 10, 2011 (the same reference period as the census) unless otherwise noted in the questionnaire.

Sampling

This is a sample survey with a cross-sectional design.

A random sample of 4.5 million dwellings was selected for the NHS. This is slightly less than one-third (30%) of all private dwellings in Canada in 2011. The sample size was determined to ensure a uniform dissemination probability for small areas and small populations, within the available budget and resources. The NHS sample was selected from the 2011 Census of Population dwelling list.

The sampling fraction varies with the questionnaire delivery mode. For the mail delivery mode, about 3 households in 10 (29%) received a questionnaire. For the enumerator delivery mode, the sampling fraction is 1 in 3 households (33%). However, in cases where it was necessary to reach households in remote areas or on Indian reserves, where only the interview response mode was offered, all households were invited to participate in the NHS.
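
The way the sampling fraction depends on the delivery mode can be sketched in a few lines of Python. This is only an illustration: the mode labels, the function name and the use of an independent random draw per dwelling are assumptions, and the actual NHS selection procedure is not described here in that detail.

```python
import random

# Sampling fractions quoted in the text; the mode labels are hypothetical.
SAMPLING_FRACTION = {
    "mail": 0.29,            # about 3 households in 10
    "enumerator": 1 / 3,     # 1 household in 3
    "interview_only": 1.0,   # remote areas and reserves: all households
}

def select_sample(dwellings, seed=0):
    """Return the selected dwellings, using one independent draw per dwelling."""
    rng = random.Random(seed)
    return [d for d in dwellings if rng.random() < SAMPLING_FRACTION[d["mode"]]]
```

With this sketch, every dwelling labelled "interview_only" is always selected, while roughly 29% of "mail" dwellings are.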

Data sources

Data collection for this reference period: 2011-05-10 to 2011-08-24

Responding to this survey is voluntary.

Data are collected directly from survey respondents.

There were two primary collection methods: a paper questionnaire and an online collection stream, although in some instances, a respondent may have been asked by an enumerator to complete the questionnaire.

NHS data collection ran from May to August 2011. It was carried out primarily in three successive waves:

In wave 1 (May and June), the focus was on online collection.

In wave 2 (June to mid-July), printed questionnaires were mailed out to households that did not respond in wave 1.

In wave 3 (mid-July to mid-August), non-response follow-up was conducted for households that did not respond in waves 1 and 2, with the aim of maximizing the survey's response rate.

NHS non-response follow-up was planned in such a way as to maximize the survey's response rate and control potential non-response bias due to the survey's voluntary nature. Non-response follow-up began in June 2011. During that process, enumerators contacted selected households that had not yet responded, in person or by telephone, to obtain their questionnaire responses. Subsequently, in mid-July 2011, a subsample of 400,000 of the 1.2 million dwellings that had not yet responded to the NHS was selected for non-response follow-up.

The 400,000-dwelling subsample was distributed geographically on the basis of the observed level of non-response and the heterogeneity of the population. Heterogeneity reflects the diversity of the population in a particular geographic area. Heterogeneity was determined with data from the 2006 Census long-form questionnaire and was calculated for geographic areas similar in size to census dissemination areas (a dissemination area contains about 400 dwellings). Hence there was a correlation between the size of the subsample and the level of heterogeneity: the more heterogeneous the population was, the larger the subsample, subject to a minimum size for each geographic area. The subsample was introduced to minimize the non-response bias that can arise when non-respondents are different from respondents.
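
The allocation principle described above (more follow-up where non-response and heterogeneity are higher, with a floor per area) might look something like the following sketch. The exact allocation formula used for the NHS is not given here; the proportional rule, the field names and the heterogeneity score between 0 and 1 are all assumptions made for illustration.

```python
def allocate_subsample(areas, total, minimum):
    """Split `total` follow-up dwellings across geographic areas.

    Each area's share is proportional to non_respondents * heterogeneity
    (an assumed rule), raised to `minimum` where needed and capped at the
    number of non-respondent dwellings in the area.
    """
    scores = {name: a["non_respondents"] * a["heterogeneity"]
              for name, a in areas.items()}
    score_sum = sum(scores.values())
    allocation = {}
    for name, a in areas.items():
        share = round(total * scores[name] / score_sum)
        allocation[name] = min(a["non_respondents"], max(minimum, share))
    return allocation
```

Because of the floor and the cap, the allocated sizes in this sketch need not sum exactly to `total`; a production allocation would rebalance the remainder.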

Error detection

Statistics Canada's Data Operations Centre (DOC) was the central reception and storage point for electronic and printed questionnaires. Electronic questionnaires were transmitted directly to the DOC's servers, and printed questionnaires were scanned and stored as images. After the quality of the image was confirmed, the data were captured by optical mark recognition (OMR) and intelligent character recognition (ICR). If the image quality was inadequate, the data were captured manually by an operator.

Coding, the next stage of data processing, was also carried out in the Data Operations Centre. All write-in responses were submitted to an automated coding system that assigned each response a numeric code using Statistics Canada reference files, code sets and standard classifications. When the system was unable to assign a code to a particular response, the response was coded manually by an operator. Coding was applied to the following variables: relationship to Person 1, place of birth, citizenship, non-official languages, home language, mother tongue, ethnic origin, population group, Indian band/First Nation, place of residence 1 year ago, place of residence 5 years ago, place of birth of parents, major field of study, location of study, language of work, industry, occupation and place of work.
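
The two-step flow described above (automated coding first, manual coding for whatever is left) can be illustrated with a minimal sketch. The reference entries and numeric codes used with it are hypothetical and do not come from any Statistics Canada classification, and exact matching on a normalized string is an assumed, simplified matching rule.

```python
def code_write_ins(responses, reference):
    """Assign numeric codes to write-in responses.

    `responses` maps a response id to the text written by the respondent.
    `reference` maps a normalized text to a numeric code (hypothetical file).
    Returns the coded responses and the ids left for manual coding.
    """
    coded, manual_queue = {}, []
    for resp_id, text in responses.items():
        key = " ".join(text.lower().split())  # lowercase, collapse whitespace
        if key in reference:
            coded[resp_id] = reference[key]
        else:
            manual_queue.append(resp_id)      # an operator codes these
    return coded, manual_queue
```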

After data capture, initial edit and coding operations have been completed, the data are processed up to the final edit and imputation stage. The final edit detects invalid responses and inconsistencies. This edit is based on rules determined by Statistics Canada's subject-matter analysts. Unanswered questions are also identified. Imputation replaces these missing, invalid or inconsistent responses with plausible values. When carried out properly, imputation can improve data quality by replacing non-responses with plausible responses similar to the ones that the respondents would have given if they had answered the questions. It also has the advantage of producing a complete data set.
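
As a rough illustration of what an edit step does, the sketch below flags missing, invalid and inconsistent values in a single person record. The two rules shown (a plausible age range, and no occupation reported for a person under 15) are hypothetical examples, not the rules defined by Statistics Canada's subject-matter analysts.

```python
def edit_record(record):
    """Return a list of (field, problem) flags for one person record."""
    flags = []
    age = record.get("age")
    if age is None:
        flags.append(("age", "missing"))           # unanswered question
    elif not 0 <= age <= 120:
        flags.append(("age", "invalid"))           # value outside assumed range
    elif age < 15 and record.get("occupation") is not None:
        flags.append(("occupation", "inconsistent"))  # conflicts with age
    return flags
```

Flagged fields would then be passed to imputation, which replaces them with plausible values.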

Imputation

The nearest-neighbour method was used to impute NHS data. This method is widely used in the treatment of non-response. It replaces missing, invalid or inconsistent information about one respondent with values from another, 'similar' respondent. The rules for identifying the respondent most similar to the non-respondent may vary with the variables to be imputed. Donor imputation methods have good properties and generally do not alter the distribution of the data, whereas many other imputation techniques do. Following nearest-neighbour imputation, the data are checked for consistency.
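
A minimal sketch of nearest-neighbour donor imputation follows. The matching variables and the distance measure (the number of matching variables on which two records differ) are assumptions chosen for simplicity; the NHS rules for identifying the most similar respondent are not detailed here.

```python
def impute_nearest_neighbour(records, target, match_vars):
    """Fill missing `target` values from the closest donor record.

    A donor is any record with a value for `target`. Distance is the number
    of matching variables on which the two records differ (assumed measure).
    The input records are not modified; new dictionaries are returned.
    """
    donors = [r for r in records if r.get(target) is not None]
    if not donors:
        raise ValueError("no donor has a value for " + target)
    result = []
    for r in records:
        if r.get(target) is None:
            donor = min(donors,
                        key=lambda d: sum(d[v] != r[v] for v in match_vars))
            r = {**r, target: donor[target]}
        result.append(r)
    return result
```

Because the imputed value is copied from an actual respondent, only values that were really observed can appear in the completed data set.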

Estimation

The final responses are weighted so that the data from the sample accurately represent the NHS's target population. The weighting process involves calculating sampling weights, adjusting the weights for the survey's total non-response, and calibrating the weights against census totals.

First, an initial sampling weight of about 3 is assigned to each sampled household. The initial weight of 3 is the inverse of the probability of being selected in the NHS sample. About 3 of 10 households were selected in the sample, which yields an initial weight of just over 3 (10/3). Then the sampling weights are adjusted to reflect the selection of the subsample. The subsample was selected from the set of households that had not responded to the NHS by mid-July 2011. It is important to note that at the end of these two weighting steps, some households have a weight of 1 because in some regions, all households are selected in the NHS sample.
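
The first two weighting steps can be expressed as a short calculation. The function below is a sketch under simplifying assumptions: it uses a single subsampling probability of 400,000 / 1,200,000 in the usage example, whereas the text indicates that the subsample was distributed unevenly across geographic areas. The function and parameter names are hypothetical.

```python
def design_weight(selection_prob, in_subsample=False, subsample_prob=1.0):
    """Inverse-probability weight after sampling and optional subsampling."""
    weight = 1.0 / selection_prob      # initial sampling weight
    if in_subsample:
        weight /= subsample_prob       # adjustment for subsample selection
    return weight

# A household sampled at about 3 in 10 that responded before mid-July.
early_respondent = design_weight(0.30)
# A household in an area where all households were selected.
full_coverage = design_weight(1.0)
# A non-respondent at mid-July that was drawn into the follow-up subsample.
followed_up = design_weight(0.30, in_subsample=True,
                            subsample_prob=400_000 / 1_200_000)
```

Under these assumed rates, the early respondent has a weight of just over 3, the full-coverage household a weight of 1, and the followed-up household a weight of about 10.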

Next, since a number of households in the subsample were still non-respondent at the end of collection operations, the sampling weight is adjusted for the survey's residual non-response. This is done by transferring the weights of non-respondent households to the nearest-neighbour respondent households. The latter are identified in a manner similar to the imputation process, using known variables for respondent and non-respondent households, including census variables and a few variables resulting from matches to administrative databases.
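
The weight transfer described above can be sketched as follows. The matching variables and the distance measure (a count of differing variables) are assumptions for illustration; the variables actually used include census variables and matched administrative data, as noted in the text.

```python
def transfer_nonresponse_weights(households, match_vars):
    """Move each non-respondent's weight onto its most similar respondent.

    Returns copies of the respondent households with adjusted weights.
    The sum of weights over all households is preserved.
    """
    respondents = [dict(h) for h in households if h["responded"]]
    if not respondents:
        raise ValueError("at least one respondent is required")
    for h in households:
        if not h["responded"]:
            nearest = min(respondents,
                          key=lambda r: sum(r[v] != h[v] for v in match_vars))
            nearest["weight"] += h["weight"]
    return respondents
```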

Lastly, the weights are calibrated against census totals at the level of geographic calibration areas. Those areas contain an average of about 2,300 dwellings or 5,600 people in the NHS target population. They are formed by grouping dissemination areas so that they are contiguous, have enough respondent households to make calibration easy to perform, and do not straddle census division boundaries or, wherever possible, census subdivision and census tract boundaries. Calibration is performed so that the estimates for an NHS calibration area are approximately equal to the census counts for that area, for a set of about 60 characteristics common to the NHS and the Census. The control totals used are for age, sex, marital/common-law status, dwelling structure, household size, family structure and language. They include the number of households and individuals in all the dissemination areas that make up the calibration area. It is important to note, however, that for a given area, a number of calibration totals are discarded on the basis of certain criteria to avoid reducing the general quality of the estimates.
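
To give a sense of what calibration against control totals involves, the sketch below uses raking (iterative proportional fitting), a generic technique chosen here for simplicity. It is not necessarily the calibration method used for the NHS, and the variable names and totals in any usage are hypothetical.

```python
def rake(records, controls, iterations=50):
    """Adjust weights so weighted counts match control totals.

    `records` is a list of dicts, each with a "weight" and category values.
    `controls` maps a variable name to {category: control total}.
    Returns the adjusted weights in the same order as `records`.
    """
    weights = [r["weight"] for r in records]
    for _ in range(iterations):
        for var, totals in controls.items():
            # Current weighted count in each category of this variable.
            current = {}
            for r, w in zip(records, weights):
                current[r[var]] = current.get(r[var], 0.0) + w
            # Scale each weight by (control total / current total).
            weights = [w * totals[r[var]] / current[r[var]]
                       for r, w in zip(records, weights)]
    return weights
```

After raking, the weighted counts for each category are approximately equal to the control totals, which mirrors the stated goal that NHS estimates for a calibration area approximately equal the census counts.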

Quality evaluation

The final estimates were certified after weighting to ensure that the data are consistent and reliable. At this point, the final estimates are compared with various data sources. These comparisons help determine whether the NHS estimates are consistent and therefore of good quality. The key data sources used are estimates from other Statistics Canada surveys for which data based on common concepts are available (for example, the Labour Force Survey), data from previous censuses, and data from selected administrative records available to Statistics Canada (for example, the T1 file on family income and Citizenship and Immigration Canada's Longitudinal Immigration Database). Population projections, available for population subgroups (for example, projections for Aboriginal peoples), which are based on the 2006 Census and are produced with microsimulations, were also compared with the NHS estimates.

Disclosure control

Statistics Canada is prohibited by law from releasing any information it collects which could identify any person, business, or organization, unless consent has been given by the respondent or as permitted by the Statistics Act. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.

Revisions and seasonal adjustment

This methodology type does not apply to this statistical program.

Data accuracy

In a sample survey there are two types of error: sampling error and non-sampling error. The former is present because when we estimate a characteristic, we are measuring only part of the population instead of the whole population. The latter covers all errors that are not related to sampling.

Sampling error

The objective of the NHS is to produce estimates from a number of questions for a wide variety of geographies, ranging from very large areas (such as provinces and census metropolitan areas) to very small areas (such as neighbourhoods and municipalities), and for various population groups such as Aboriginal peoples and immigrants. These groups also vary in size, especially when cross-classified by geographic area. Such groupings are generally referred to as 'domains of interest'.

For any given domain of interest, on the assumption that the sampling is random, the sampling error depends on several parameters: population size, the number of survey respondents, the variability of the variables being measured, stratification and cluster sampling.
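
The effect of the first three parameters can be illustrated with the textbook standard error of an estimated proportion under simple random sampling without replacement. This is a simplification: it ignores stratification, clustering, weighting and non-response, so it should not be read as the NHS variance estimator.

```python
import math

def se_proportion(p, n, N):
    """Standard error of a proportion under simple random sampling.

    p: proportion in the domain (variability is p * (1 - p))
    n: number of respondents in the domain
    N: population size of the domain
    """
    fpc = (N - n) / (N - 1)            # finite population correction
    return math.sqrt(p * (1 - p) / n * fpc)
```

For a fixed population, the standard error shrinks as the number of respondents grows and reaches zero when the whole domain responds.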

With a sampling rate of about 3 in 10 and a response rate of 68.6%, it is estimated that about 21% of the Canadian population participated in the NHS. Nevertheless, the quality of the domain estimates may vary appreciably, in particular because of the variation in response rates from domain to domain.
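
The 21% figure follows directly from multiplying the two rates. The short calculation below uses the rounded rates quoted above and treats the sampling rate as uniform, which ignores the areas where all households were selected.

```python
sampling_rate = 0.30    # about 3 households in 10
response_rate = 0.686   # 68.6%

participation_share = sampling_rate * response_rate
print(f"{participation_share:.1%}")  # prints 20.6%, i.e. about 21%
```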

Non-sampling error

Besides sampling, a number of factors can cause errors in the survey's results. Respondents may misunderstand the questions and answer them inaccurately, and responses may be entered incorrectly during data capture and processing. These are examples of non-sampling errors. Measures were taken at every stage of collection and processing to mitigate their impact.

In addition, in every self-administered voluntary survey, error due to non-response to the survey's variables makes up a substantial portion of the non-sampling error. A distinction is made between partial non-response (lack of response to one or some questions) and total non-response (lack of response to the survey because the household could not be reached or refused to participate). Total non-response is likely to bias the estimates based on the survey, because non-respondents tend to have different characteristics from respondents. As a result, there is a risk that the results will not be representative of the actual population.

Because the NHS response rate was 68.6%, that risk had to be taken into account. Statistics Canada conducted several studies and various simulations, before and after collection, to assess the risk and extent of the potential bias. A number of measures were taken to mitigate its effects.

Documentation
