Census Undercoverage Survey (CUS)

Detailed information for May 11, 2021

Status:

Active

Frequency:

Every 5 years

Record number:

3902

Following each census since the 1961 Census, the Reverse Record Check (RRC) has been carried out to measure census population undercoverage. For the 2021 Census cycle, the RRC has been renamed the Census Undercoverage Survey (CUS). The CUS estimates the number of persons missed in the census.

Data release - April 28, 2023 (preliminary estimates); September 27, 2023 (final estimates)

Description

Following each census since the 1961 Census, the Census Undercoverage Survey (CUS), formerly called the Reverse Record Check (RRC), has been carried out to measure census population undercoverage. The CUS estimates the number of persons missed in the census. This estimate is combined with the estimate of the number of persons enumerated more than once, from the Census Overcoverage Study, to calculate net undercoverage.

Population undercoverage is regarded as one of the most important sources of error affecting census data. It causes a downward bias to the extent that census counts underestimate true population counts. Overcoverage, on the other hand, results in an upward bias whereby census counts overestimate true population counts. These two sources of error will also distort the distribution of population characteristics estimated from census data when the overcovered and missed persons don't have the same characteristics as those who were enumerated once.
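The relationship between these quantities can be written compactly. The notation below is introduced here for illustration and is not taken from the survey documentation: C is the census count, U is undercoverage (persons missed), O is overcoverage (surplus enumerations of persons counted more than once), and T is the true population count.

```latex
\[
  N_{\text{net}} = U - O, \qquad T = C + U - O = C + N_{\text{net}}
\]
```

When U exceeds O, net undercoverage is positive and the census count understates the true population.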

Reference period: Census Day

Collection period: Following the census

Subjects

Population and demography

Data sources and methodology

Target population

The target population is identical to that of the census. The census targets every person living in Canada on Census Day, as well as Canadians who are abroad, either on a military base, attached to a diplomatic mission, or at sea or in port aboard Canadian-registered merchant vessels. Persons in Canada, including those holding a temporary resident permit, study permit or work permit, and their dependents, are also part of the census.

The sampling frames used for the survey do not cover the following groups: persons who had emigrated or were out of the country at the time of the previous census, did not complete a census questionnaire and returned during the intercensal period (returning Canadians within a province); persons returning from a territory to a province; and persons from reserves who were partially enumerated in the previous census and enumerated in the current one. The observed population therefore excludes these groups, which totalled an estimated 290,000 persons for the 2016 Reverse Record Check.

Instrument design

Prior to the 2021 Census Undercoverage Survey (CUS), users of CUS data were consulted about potential changes, but the content of the CUS collection instrument has remained fairly stable across iterations. The electronic questionnaire used for the 2021 CUS was reviewed by Statistics Canada's Questionnaire Design Resource Center.

Computer-assisted telephone interviewing (CATI) was the principal collection mode. The CATI application was thoroughly tested prior to collection.

Sampling

This is a sample survey with a cross-sectional design.

A stratified random systematic sample design was used to select a sample of individuals.
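As a rough illustration of stratified systematic selection, the sketch below groups a frame by stratum and, within each stratum, picks units at a fixed interval from a random start. The function names, the record layout and the allocation are hypothetical; the actual CUS stratification and ordering rules are not reproduced here.

```python
import random
from collections import defaultdict


def systematic_sample(units, n, rng):
    """Select n units from an ordered list using a random start and a fixed step."""
    if n <= 0 or not units:
        return []
    n = min(n, len(units))
    step = len(units) / n            # fractional sampling interval (>= 1)
    start = rng.random() * step      # random start in [0, step)
    last = len(units) - 1
    # min() guards against a floating-point edge case at the end of the list
    return [units[min(int(start + i * step), last)] for i in range(n)]


def stratified_systematic_sample(frame, stratum_of, allocation, seed=0):
    """Apply systematic selection independently within each stratum.

    frame      : list of units, already sorted in the desired order
    stratum_of : function mapping a unit to its stratum label
    allocation : dict mapping stratum label -> sample size (hypothetical values)
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    return {h: systematic_sample(members, allocation.get(h, 0), rng)
            for h, members in strata.items()}
```

Because the step is at least 1, the selected positions within a stratum are always distinct.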

The sampling frame was constructed from eight sources independent of the 2021 Census. The first five frames were used to select a sample for estimating undercoverage in the ten provinces, while estimates for the three territories were calculated using samples from the last three frames only.

At the provincial level, the first two frames cover the persons who were in the 2016 Census target population. They represent all persons enumerated in the 2016 Census along with the persons missed by the census, represented by the portion of the sample of persons from the 2016 Reverse Record Check (RRC) who were classified as missed. To represent persons added to the target population since the previous census, intercensal births and immigrants (i.e., people who were born or immigrated between the 2016 and 2021 censuses) were added, as well as non-permanent residents on Census Day. The sampling frame for the three territories was constructed from their respective health insurance files.

The sample design varied by frame. In the 2016 Census frame, the sample design was a one-stage stratified design. The population was stratified according to the likelihood of being in-scope for the Census and the likelihood of being enumerated, as well as by province, sex and age. People enumerated on reserves in the 2016 Census were placed in separate strata. In the territories frame, the sample design was also a one-stage stratified design. The population was stratified by territory of residence, sex and age.

The missed frame is a sample-based frame since there is no list of all persons missed in the 2016 Census. The sample for this frame consisted of all cases classified as "missed" in the 2016 RRC. Strictly speaking, the sample was not stratified, but there was an implicit stratification since the 2016 missed cases were from different frames and strata.

The births frame was stratified by mother's province of residence. The immigrants frame and the non-permanent residents frame (permit holders and refugee claimants) were also stratified by province.

When using multiple sampling frames, as is the case for the CUS, the possibility exists that someone will be included in more than one frame. For example, a person in the immigrants frame may have been in Canada on a work permit in May 2016 and thus have been enumerable in the 2016 Census. The person would then be in both the immigrants frame and the census frame if he or she was enumerated, or in the immigrants frame and the missed frame if not enumerated. Consequently, it is important to identify all cases of frame overlap. If this is not done, estimates may be too high because some people are included twice in the frames. Though such overlap was identified wherever possible when the sampling frames were constructed, some overlap was also identified later using information provided by respondents.
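The identification step can be pictured with a small sketch. It assumes, purely for illustration, that each person carries a common identifier across frames; in practice overlap would have to be found through record linkage and respondent information, which this example does not attempt.

```python
from collections import defaultdict


def find_frame_overlap(frames):
    """Return the persons that appear in more than one frame.

    frames : dict mapping a frame name to an iterable of person identifiers
             (a shared identifier is an assumption made for this sketch)
    Returns a dict mapping each overlapping identifier to the sorted list
    of frames in which it appears.
    """
    seen = defaultdict(set)
    for name, person_ids in frames.items():
        for pid in person_ids:
            seen[pid].add(name)
    return {pid: sorted(names) for pid, names in seen.items() if len(names) > 1}
```

For example, a person listed in both a hypothetical "census" frame and an "immigrants" frame would be returned with both frame names, so that the duplicate chance of selection can be dealt with before estimation.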

The total sample size and the sample allocation were determined with the objective of achieving a targeted level of precision for the undercoverage estimate for each province and territory, as well as at the national level, while controlling for collection and processing costs.

Data sources

Data collection for this reference period: March 2022 to August 2022

Responding to this survey is mandatory.

Data are collected directly from survey respondents.

Although the 2021 Census Undercoverage Survey (CUS) was a multi-mode survey, the main data collection mode was computer-assisted telephone interviewing (CATI). The CATI application was developed using many of the standards set for all CATI questionnaires used at Statistics Canada. The application consisted of various interrelated modules and was accessed through the regional offices' generic interface. Interviewers were assigned cases based on language and on whether or not the cases required tracing. By design, collection was by proxy for Selected Persons (SPs) who were less than 18 years of age or presumed deceased. Proxy respondents were also used when the SP was not available during the collection period or was difficult to reach.

The average duration of the CATI interview was less than 15 minutes. However, the actual time spent on each case was much greater, given the number of contact attempts required and the amount of tracing that was involved.

A self-response electronic questionnaire was available for SPs who were contacted by telephone and asked to complete the questionnaire themselves. SPs whom interviewers could not reach by telephone and who had a good-quality mailing address were sent an invitation letter explaining the survey and providing a Secure Access Code (SAC) to complete the questionnaire online, with instructions on how to do so.

Several sources of administrative data were used in the various CUS steps. To build the sampling frames, vital statistics data on intercensal births were used, as well as administrative data from Immigration, Refugees and Citizenship Canada on immigrants and non-permanent residents, and health care files for the three territories. To update the geographic information, especially for the census sample and the missed sample, for which the information dated from 2016, a match was performed with Canada Revenue Agency (CRA) files, including personal income tax files for 2016 to 2020 and 2020-2021 Canada Child Benefit files. CRA files and vital statistics data were also used to check whether any selected persons had died. As part of the sample preparation, cases were linked to tax data and telephone files to provide updated contact data for the SP and their household members. These various administrative data files are obtained by Statistics Canada under the authority of the Statistics Act.

View the Questionnaire(s) and reporting guide(s).

Error detection

The CATI application included an automated verification to ensure that data were collected for the right person. A similar verification was done as part of the post-collection edits. The CATI application also included a number of missing data edits and consistency edits. Interviewers were offered the opportunity to change the data they had entered. The data were also subject to post-collection edits for missing, incomplete, and inconsistent data. Classification of each sampled person as enumerated, missed or out of scope took place after post-collection processing. In order to achieve the highest quality of classification possible, each case potentially classified as missed was reviewed extensively.

Imputation

Although item non-response was very low for most questions in the CUS questionnaire, it was higher for the module asked of people whose usual residence on Census Day was outside of Canada. The questions from this module play a key role in properly classifying these persons. Donor imputation was used to impute for item non-response for the main questions in this module. Elsewhere, deterministic imputation was used for some missing, incomplete, or inconsistent data.
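Donor imputation in general replaces a missing answer with the answer of a similar responding record. The sketch below shows one simple variant, a random donor drawn from the same imputation class. The field names, the class definition and the random choice of donor are assumptions made for illustration and are not the CUS specification.

```python
import random
from collections import defaultdict


def donor_impute(records, field, class_key, seed=0):
    """Fill missing values of `field` from a random donor in the same class.

    records   : list of dicts; a missing value is represented by None
    field     : name of the item to impute (hypothetical)
    class_key : name of the variable defining imputation classes (hypothetical)
    Returns new records; imputed ones carry the flag 'imputed': True.
    """
    rng = random.Random(seed)
    donors = defaultdict(list)
    for r in records:
        if r.get(field) is not None:
            donors[r[class_key]].append(r[field])

    result = []
    for r in records:
        r = dict(r)  # copy so the input records are left untouched
        if r.get(field) is None and donors[r[class_key]]:
            r[field] = rng.choice(donors[r[class_key]])
            r["imputed"] = True
        result.append(r)
    return result
```

A record whose class contains no donor is left missing rather than filled from an unrelated class.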

Estimation

Estimation for the Census Undercoverage Survey (CUS) is done in two parts: the weighting of selected persons (SPs), followed by the calculation of census undercoverage.

The initial weight of an SP from the 2016 missed frame was the final weight assigned to that person in the 2016 RRC, in which the person was classified as missed. For SPs from the other sampling frames, the initial weights were generally based on the inverse of the selection probabilities in the sample.
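For the frames where the weight is the inverse of the selection probability, the arithmetic is straightforward. The sketch below assumes equal-probability selection within a stratum; the function names and the numbers in the usage note are hypothetical.

```python
def initial_weight(selection_probability):
    """Design weight as the inverse of the selection probability."""
    if not 0 < selection_probability <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / selection_probability


def stratum_weight(stratum_size, sample_size):
    """Weight under equal-probability selection of sample_size units
    from a stratum of stratum_size units (assumption for this sketch)."""
    return initial_weight(sample_size / stratum_size)
```

With 50 persons selected from a stratum of 1,000, each selected person would represent about 20 persons on the frame.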

To reduce bias, the initial weights of respondents had to be adjusted to account for non-response. The weight of persons who could not be classified (referred to as non-respondents) was redistributed among persons who were classified (referred to as respondents). This was done by ensuring that the weight of non-respondents with certain characteristics was redistributed among respondents with the same characteristics. The following characteristics were used: information available on the sampling frame, various tax indicators, as well as collection information.
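The redistribution described above can be sketched as a weighting-class adjustment: within each class of records sharing the same characteristics, the weight of respondents is inflated so that it also carries the weight of the non-respondents. The record layout and the way classes are formed are assumptions; how the CUS actually built its adjustment classes is not shown here.

```python
from collections import defaultdict


def adjust_for_nonresponse(records):
    """Redistribute non-respondent weight to respondents within each class.

    records : list of dicts with keys 'weight', 'cell' and 'responded'
              (hypothetical layout; 'cell' is the adjustment class)
    Returns the adjusted weights in input order; non-respondents get 0.
    """
    total = defaultdict(float)
    responding = defaultdict(float)
    for r in records:
        total[r["cell"]] += r["weight"]
        if r["responded"]:
            responding[r["cell"]] += r["weight"]

    for cell in total:
        if responding[cell] == 0:
            raise ValueError(f"class {cell!r} has no respondents; collapse classes first")

    return [r["weight"] * total[r["cell"]] / responding[r["cell"]] if r["responded"] else 0.0
            for r in records]
```

The sum of the adjusted weights in each class equals the sum of the initial weights, so the weighted total is preserved.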

The adjustment of the initial weights to account for non-response was followed by two calibration steps. The first one was for the territorial frames. The estimated number of enumerated persons in the territories has traditionally been lower than the comparable census count. This is probably due to undercoverage of the census target population in health insurance files. To address this undercoverage, the weights of the SPs selected in each territory were adjusted so that the estimated number of enumerated persons by age and sex equaled the comparable census count for that territory. Three age groups were used. The second calibration was for the census frame. Auxiliary variables strongly correlated with the CUS classifications of enumerated, deceased and missed persons were derived for each person on the census frame. The weights of the respondents from the census frame sample were then adjusted so that the estimated totals for those auxiliary variables matched the known frame totals.
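The first calibration step can be illustrated with a simple ratio adjustment. In this sketch, every SP in an age-sex cell has its weight multiplied by the ratio of the census count to the weighted number of enumerated SPs in that cell. The record layout and cell labels are hypothetical, and the second calibration (to auxiliary totals on the census frame) would need a more general calibration method that is not sketched here.

```python
from collections import defaultdict


def calibrate_to_census(records, census_counts):
    """Scale weights so that weighted enumerated SPs match census counts by cell.

    records       : list of dicts with keys 'weight', 'cell', 'enumerated'
    census_counts : dict mapping each cell to the comparable census count
    Returns calibrated weights in input order; the same factor is applied
    to every SP in a cell, whether enumerated or not.
    """
    estimated = defaultdict(float)
    for r in records:
        if r["enumerated"]:
            estimated[r["cell"]] += r["weight"]

    factors = {}
    for cell, count in census_counts.items():
        if estimated[cell] == 0:
            raise ValueError(f"no enumerated SPs in cell {cell!r}")
        factors[cell] = count / estimated[cell]

    return [r["weight"] * factors[r["cell"]] for r in records]
```

After the adjustment, the weighted number of enumerated persons in each cell reproduces the census count, and the weights of missed persons in that cell rise or fall by the same factor.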

Lastly, the weights of SPs from the 2016 Census frame who were enumerated more than once in 2016 were adjusted downward to account for the fact that these SPs had more than one chance of being selected.
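One common way to make such a downward adjustment is to divide the weight by the number of times the person appears on the frame. The source does not state the exact formula used, so the sketch below is an assumption for illustration only.

```python
def adjust_for_multiple_enumeration(weight, times_enumerated):
    """Divide the weight by the number of times the person was enumerated.

    Assumption for this sketch: each enumeration gave the person one
    equal chance of selection, so the weight is shared across them.
    """
    if times_enumerated < 1:
        raise ValueError("times_enumerated must be at least 1")
    return weight / times_enumerated
```

Under this assumption, a person enumerated twice with a weight of 30 would end up with a weight of 15, while persons enumerated once are unaffected.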

Census population undercoverage was estimated as the weighted number of missed persons less the number of persons excluded from the CUS version of the 2021 Census Response Database (CUS CRDB). The CUS CRDB differs from the final 2021 Census Response Database: it is the version that was available before the end of census data processing. There are some minor differences between the CUS CRDB and later versions of the census databases. In particular, the CUS CRDB, which is a database of persons, contains all census records for persons with two exceptions. The first exception consists of census records imputed through whole household imputation. The second consists of census records with invalid or incomplete names, or invalid or incomplete birth dates; this group is also known as the 'incompletely enumerated.'
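In symbols, the estimator described above can be written as follows. The notation is introduced here for illustration: s is the sample of classified SPs, w_i is the final weight of SP i, m_i equals 1 if SP i was classified as missed and 0 otherwise, and E is the number of persons excluded from the CUS CRDB.

```latex
\[
  \hat{U} = \sum_{i \in s} w_i \, m_i \; - \; E
\]
```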

Lastly, the bootstrap method was used to calculate the variance of the CUS estimates. Five hundred bootstrap replicate weights were created.
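With replicate weights in hand, a bootstrap variance is typically obtained by recomputing the estimate under each set of replicate weights and averaging the squared deviations from the full-sample estimate. The sketch below follows that general recipe for the weighted number of missed persons; the function names and data layout are hypothetical, and how the CUS replicate weights themselves were built is not covered.

```python
import math


def weighted_missed(weights, missed):
    """Weighted count of SPs classified as missed."""
    return sum(w for w, m in zip(weights, missed) if m)


def bootstrap_variance(full_weights, replicate_weights, missed):
    """Mean squared deviation of replicate estimates from the full-sample estimate.

    full_weights      : final weights, one per SP
    replicate_weights : list of weight vectors, one per bootstrap replicate
    missed            : booleans, True if the SP was classified as missed
    """
    theta = weighted_missed(full_weights, missed)
    replicates = [weighted_missed(w, missed) for w in replicate_weights]
    return sum((t - theta) ** 2 for t in replicates) / len(replicates)


def bootstrap_standard_error(full_weights, replicate_weights, missed):
    """Square root of the bootstrap variance."""
    return math.sqrt(bootstrap_variance(full_weights, replicate_weights, missed))
```

In the CUS setting, `replicate_weights` would hold the five hundred replicate weight vectors.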

Quality evaluation

Pre-release verification consisted of data confrontation with other published sources (census count of enumerated persons, population estimates for deceased, emigrants, and internal migration), and historical trend analysis. In addition, there is an extensive certification process after the release of the preliminary estimates with the provincial and territorial statistical focal points and other key clients.

Disclosure control

Statistics Canada is prohibited by law from releasing any information it collects which could identify any person, business, or organization, unless consent has been given by the respondent or as permitted by the Statistics Act. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.

No microdata are released. No tabulations are produced for small geographic domains. Otherwise, there are no sensitive cells.

Revisions and seasonal adjustment

This methodology does not apply to this survey.

Non-response

Several measures were taken to minimize the effect of non-response on the Census Undercoverage Survey estimates. First, most cases were resolved without the need for collection. Introductory letters were sent to the selected persons (SPs) prior to collection. Head office provided tracing leads drawn from several administrative files to help the regional offices (ROs) locate each SP. A five-month collection period allowed for multiple attempts for each case and for extensive tracing. Multi-mode collection, in which invitation letters to complete the questionnaire online were mailed to harder-to-contact SPs at various points during collection, also helped in completing more interviews.

It is important to note that the definition of a non-respondent for classification, and therefore for estimation, was not the same as the usual definition of a non-respondent for whom data collection was attempted but not completed. This is because classification was based on data from many sources, one of which might be collection.

Data accuracy

Data from the Census Undercoverage Survey are combined with the results of the Census Overcoverage Study and data from the final census database to estimate population coverage error. Standard errors are provided for each estimate in the dissemination tables.

Date modified: 2022-04-29
