Reverse Record Check (RRC)
Detailed information for May 10, 2016
Every 5 years
Data release - March 29, 2018 (preliminary estimates); September 27, 2018 (final estimates)
Following each census since the 1961 Census, the Reverse Record Check (RRC) has been carried out to measure census population undercoverage. The RRC estimates the number of persons missed in the census. This estimate is combined with the estimate of the number of persons enumerated more than once from the Census Overcoverage Study in order to calculate net undercoverage.
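The combination described above can be written compactly. The notation below is illustrative only and is not taken from the source: \hat{U} is the RRC estimate of missed persons, \hat{O} is the Census Overcoverage Study estimate of persons enumerated more than once, C is the census count, and the rate shown uses the conventional denominator (the estimated true population), which is an assumption here.

```latex
\hat{N} = \hat{U} - \hat{O}, \qquad
\hat{T} = C + \hat{N}, \qquad
\hat{R}_N = \frac{\hat{N}}{\hat{T}}
```

Under this notation, a positive \hat{N} means the census count understates the population, and a negative value means it overstates it.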
Population undercoverage is regarded as one of the most important sources of error affecting census data. It causes a downward bias to the extent that census counts underestimate true population counts. Overcoverage, on the other hand, results in an upward bias whereby census counts overestimate true population counts. These two sources of error will also distort the distribution of population characteristics estimated from census data when the overcovered and missed persons don't have the same characteristics as those who were enumerated once.
Reference period: Census Day
Collection period: Following the census
- Population and demography
Data sources and methodology
The target population is identical to the census. The census targets every man, woman and child living in Canada on Census Day, as well as Canadians who are abroad, either on a military base, attached to a diplomatic mission, at sea or in port aboard Canadian-registered merchant vessels. Persons in Canada including those holding a temporary resident permit, study permit or work permit, and their dependents, are also part of the census.
The sampling frames used for the survey do not cover three groups: persons who had emigrated or were out of the country at the time of the previous census, did not complete a census questionnaire, and returned during the intercensal period (returning Canadians within a province); persons returning from a territory to a province; and persons from Indian reserves or Indian settlements who were partially enumerated in the previous census and enumerated in the current one. The observed population therefore excludes these groups, which totaled an estimated 260,000 persons for the 2011 Reverse Record Check.
Before the 2016 Reverse Record Check (RRC), users of the RRC data were consulted about potential changes, but the content of the RRC collection instrument has remained fairly stable across iterations. Three questionnaires were used for the 2016 RRC: one for when the selected person was the respondent, one for when a proxy was interviewed, and one for when the selected person had died before May 10, 2016. Each version was reviewed by Statistics Canada's Questionnaire Design Resource Centre.
Computer-assisted telephone interviewing (CATI) was the principal collection mode. The CATI application incorporated the content of all three questionnaires, with different flows and wording based on the type of respondent. The application was thoroughly tested prior to collection.
This is a sample survey with a cross-sectional design.
A stratified random systematic sample design was used to select a sample of individuals.
The sampling frame was constructed from six sources independent of the 2016 Census, each of which forms its own frame. The first five frames were used to select a sample for estimating undercoverage in the ten provinces, while estimates for the three territories were calculated using samples from the sixth frame only.
At the provincial level, the first two frames cover the persons who were in the 2011 Census target population. They represent all persons enumerated in the 2011 Census along with the persons missed by the census, represented by the portion of the sample of persons from the 2011 Reverse Record Check (RRC) who were classified as missed. To represent persons added to the target population since the previous census, intercensal births and immigrants (i.e., people who were born or immigrated between the 2011 and 2016 censuses) were added, as well as non-permanent residents on Census Day. The sampling frame for the three territories was constructed from their respective health insurance files.
The sample design varied by frame. In the 2011 Census frame, the sample design was a one-stage stratified design. The population was stratified by province of residence, sex, age and marital status. People enumerated on Indian reserves in the 2011 Census were placed in separate strata. In the territories frame, the sample design was also a one-stage stratified design. The population was stratified by territory of residence, sex and age.
The missed frame is a sample-based frame since there is no list of all persons missed in the 2011 Census. The sample for this frame consisted of all cases classified as "missed" in the 2011 RRC. Strictly speaking, the sample was not stratified, but there was an implicit stratification since the 2011 missed cases were from different frames and strata.
The births frame was stratified by mother's province of residence. The immigrants frame and the non-permanent residents frame (permit holders and refugee claimants) were also stratified by province.
When using multiple sampling frames, as is the case for the RRC, the possibility exists that someone will be included in more than one frame. For example, a person in the immigrants frame may have been in Canada on a work permit in May 2011 and thus have been enumerable in the 2011 Census. The person would then be in both the immigrants frame and the census frame if he or she was enumerated, or in the immigrants frame and the missed frame if not enumerated. Consequently, it is important to identify all cases of frame overlap. If this is not done, estimates may be too high because some people are included twice in the frames. Though such overlap was identified wherever possible when the sampling frames were constructed, some overlap was also identified later using information provided by respondents.
It was decided that the total size of the 2016 sample would be similar to that of the 2011 RRC. Sample allocation was done in two stages. First, the national sample was allocated to the provinces using a combination of equal-variance allocation, to obtain the same variance for all provincial undercoverage rate estimates, and optimal allocation, to find the national undercoverage rate estimate with the smallest variance. Second, the provincial samples were allocated to the provincial strata. This was done by optimal allocation based on historical undercoverage rates, historical non-response rates, and stratum size. The missed frame was an exception: everyone who was classified as missed in the 2011 RRC was selected.
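The second allocation stage can be illustrated with a textbook Neyman-style allocation. This is only a sketch under stated assumptions, not the procedure actually used for the RRC: the function name, the input layout, the use of sqrt(p(1-p)) as the stratum standard deviation for an undercoverage rate p, and the inflation by the expected response rate are all hypothetical choices made for illustration.

```python
import math

def neyman_allocation(n, strata):
    """Allocate a sample of size n across strata (illustrative sketch).

    strata maps a stratum name to a tuple (N_h, p_h, r_h):
      N_h: stratum size
      p_h: historical undercoverage rate (assumed input)
      r_h: historical non-response rate (assumed input)
    Returns (base, inflated): the Neyman allocation, and the same
    allocation inflated so that the expected number of respondents
    matches the base allocation.
    """
    # Neyman allocation: n_h is proportional to N_h * S_h,
    # with S_h = sqrt(p_h * (1 - p_h)) for a proportion.
    scores = {h: N * math.sqrt(p * (1 - p)) for h, (N, p, r) in strata.items()}
    total = sum(scores.values())
    base = {h: n * s / total for h, s in scores.items()}
    # Inflate for expected non-response in each stratum.
    inflated = {h: base[h] / (1 - strata[h][2]) for h in strata}
    return base, inflated
```

With equal stratum sizes, a stratum with a higher historical undercoverage rate (below 50%) receives a larger share, which matches the intent of concentrating the sample where the estimate is most variable.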
The resulting allocation was only approximately optimal, since assumptions had to be made about the size of certain populations, including the expected number of intercensal births and immigrants, at the time of the allocation. The final total allocated sample was 70,467 persons, distributed across the frames:
- 53,663 for the census frame;
- 4,745 for the missed frame;
- 4,026 for the births frame;
- 2,958 for the immigrants frame;
- 2,480 for the non-permanent residents frame;
- 2,595 for the territories frame.
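As a quick consistency check, the frame counts listed above can be summed and expressed as shares of the total. The dictionary keys are shorthand labels chosen here for illustration; the counts are those given in the list.

```python
# Allocated sample sizes by frame, as listed above.
allocation = {
    "census": 53_663,
    "missed": 4_745,
    "births": 4_026,
    "immigrants": 2_958,
    "non_permanent_residents": 2_480,
    "territories": 2_595,
}

total = sum(allocation.values())  # 70,467, matching the stated total

# Share of the total sample allocated to each frame, in percent.
shares = {frame: 100 * n / total for frame, n in allocation.items()}

for frame, share in shares.items():
    print(f"{frame}: {allocation[frame]:,} ({share:.1f}%)")
```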
In each stratum, the list of persons was sorted according to certain characteristics, and then a systematic random sample was selected. This ensured that each of these characteristics was adequately represented in the sample. The sorting variables varied by frame. For the census frame, the population was sorted by type of dwelling (private or collective), age and geographical region. For the Indian reserves strata, sorting was also done by sex (for the rest of the census frame, sex was a stratification variable). For the births frame, sorting was done by the child's year of birth and age of the mother. The immigrants frame strata were sorted by year of immigration and age. The non-permanent residents frame sorting was done by type of permit and refugee status, sex and age. For the strata in the territories, sorting was done by geography and age.
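The selection step within a stratum can be sketched as follows. This is a minimal illustration of systematic sampling from a sorted list with a random start and a fractional interval; the function name and arguments are hypothetical, and the actual RRC selection program may differ in its details.

```python
import random

def systematic_sample(units, n, sort_key, rng=None):
    """Select n units from a stratum by systematic random sampling.

    units:    list of records in the stratum
    n:        sample size for the stratum
    sort_key: function giving the sort order (for example type of
              dwelling, then age, then region)
    rng:      optional random.Random instance, for reproducibility
    """
    rng = rng or random.Random()
    ordered = sorted(units, key=sort_key)
    size = len(ordered)
    if not 0 < n <= size:
        raise ValueError("sample size must be between 1 and the stratum size")
    step = size / n                # sampling interval (may be fractional)
    start = rng.random() * step    # random start within the first interval
    # Take one unit per interval; clamp the index to guard against rounding.
    return [ordered[min(int(start + i * step), size - 1)] for i in range(n)]
```

Because the list is sorted before selection, the sample is spread evenly over the sorting variables, which is what gives the implicit representation described above.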
Data collection for this reference period: January 2017 to June 2017
Responding to this survey is mandatory.
Data are collected directly from survey respondents.
Although the 2016 Reverse Record Check (RRC) was a multi-mode survey, the main data collection mode was computer-assisted telephone interviewing (CATI). The CATI application was developed using many of the standards set for all CATI questionnaires used at Statistics Canada. The application consisted of various interrelated modules and was accessed through the regional offices' generic interface. Interviewers were assigned cases based on language and on whether the cases required tracing. By design, collection was by proxy for selected persons (SPs) who were less than 18 years of age or presumed deceased. Proxy respondents were also used when the SP was not available during the collection period or was difficult to reach.
The average duration of the CATI interview was less than 15 minutes. However, the actual time spent on each case was much greater, given the number of contact attempts required and the amount of tracing that was involved.
Paper questionnaires in both official languages were available for SPs who were contacted by telephone and requested one. Selected persons whom the regional office (RO) did not succeed in contacting by telephone and who had a good-quality mailing address (as determined by the RO) were sent a paper questionnaire package from Head Office (HO) containing the different questionnaire versions, a cover letter explaining the survey, and instructions for choosing the right questionnaire and completing it. Finally, field interviewers completed some interviews using the paper questionnaires. Data capture from the paper questionnaires was performed at HO using the CATI system. A great deal of coordination is required to operate a sequential multiple-mode collection system such as the one used for the 2016 RRC. Of the 12,790 completed questionnaires, 94.8% were done by CATI, 4.7% by self-enumeration, and 0.5% by personal interview.
Several sources of administrative data were used in the various RRC steps. To build the sampling frames, vital statistics data on intercensal births were used, as well as administrative data from Immigration, Refugees and Citizenship Canada on immigrants and non-permanent residents, and health care files for the three territories. To update the geographic information, especially for the census sample and the missed sample, for which the information dated from 2011, a match was performed with Canada Revenue Agency (CRA) files, including personal income tax files for 2010 to 2015 and 2015-2016 Canada Child Tax Benefit files. CRA files and vital statistics data were also used to check whether any selected persons had died. As part of the sample preparation, cases were linked to tax data and telephone files to provide updated contact data for the SP and their household members. These administrative data files are obtained by Statistics Canada under the authority of the Statistics Act.
The CATI application included an automated verification to ensure that data were collected for the right person. A similar verification was done as part of the post-collection edits. The CATI application also included a number of missing data edits and consistency edits. Interviewers were offered the opportunity to change the data they had entered. The data were also subject to post-collection edits for missing, incomplete, and inconsistent data. Classification of each sampled person as enumerated, missed or out of scope took place after post-collection processing. In order to achieve the highest quality of classification possible, each case potentially classified as missed was reviewed extensively.
Deterministic imputation was used for some missing, incomplete, or inconsistent data.
Estimation for the Reverse Record Check (RRC) is divided into two parts: the weighting of selected persons (SPs), followed by the calculation of census undercoverage.
The initial weight of an SP from the 2011 missed frame was the final weight assigned to that person in the 2011 RRC when he/she was classified as missed. For SPs from the other sampling frames, the initial weights were generally based on the inverse selection probabilities in the sample.
To reduce bias, the initial weights of respondents had to be adjusted to account for non-response. The weight of persons who could not be classified (referred to as non-respondents) was redistributed among persons who were classified (referred to as respondents). This was done by ensuring that the weight of non-respondents with certain characteristics was redistributed among respondents with the same characteristics. The following characteristics were used: information available on the sampling frame, various tax indicators, as well as collection information.
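The redistribution described above resembles a standard weighting-class adjustment, which can be sketched as follows. The record layout, the field names ("cell", "weight", "classified") and the function name are hypothetical, and the way adjustment cells were actually formed from the frame, tax and collection information is not reproduced here.

```python
from collections import defaultdict

def adjust_for_nonresponse(records):
    """Redistribute non-respondent weight within adjustment cells.

    Each record is a dict with:
      "cell":       label of the adjustment cell (shared characteristics)
      "weight":     initial weight
      "classified": True for respondents, False for non-respondents
    Returns new records for respondents only, with adjusted weights.
    """
    total = defaultdict(float)       # weight of all SPs per cell
    responding = defaultdict(float)  # weight of respondents per cell
    for rec in records:
        total[rec["cell"]] += rec["weight"]
        if rec["classified"]:
            responding[rec["cell"]] += rec["weight"]

    adjusted = []
    for rec in records:
        if not rec["classified"]:
            continue
        # Inflate each respondent's weight by total / respondent weight.
        factor = total[rec["cell"]] / responding[rec["cell"]]
        adjusted.append({**rec, "weight": rec["weight"] * factor})

    # A cell made up only of non-respondents would lose its weight;
    # in practice such cells would be collapsed with a similar cell.
    for cell in total:
        if responding.get(cell, 0.0) == 0.0:
            raise ValueError(f"no respondents in cell {cell!r}")
    return adjusted
```

The total weight within each cell is unchanged by the adjustment, so the non-respondents' weight is carried by respondents with the same characteristics.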
The adjustment of the initial weights to account for non-response was followed by two calibration steps. The first one was for the territorial frames. The estimated number of enumerated persons in the territories has traditionally been lower than the comparable census count. This is probably due to undercoverage of the census target population in health insurance files. To address this undercoverage, the weights of the SPs selected in each territory were adjusted so that the estimated number of enumerated persons by age and sex equaled the comparable census count for that territory. Three age groups were used. The second calibration was for the census frame. Auxiliary variables strongly correlated with the RRC classifications of enumerated, deceased and missed persons were derived for each person on the census frame. The weights of the respondents from the census frame sample were then adjusted so that the estimated totals for those auxiliary variables matched the known frame totals.
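The territorial calibration can be illustrated with a simple ratio adjustment within age-sex cells. This is a sketch under assumptions: the field names, the cell labels and the choice to apply one factor to every SP in a cell are illustrative, and the census frame calibration on auxiliary variables would normally need a more general calibration method than the one shown.

```python
from collections import defaultdict

def calibrate_to_census(sps, census_counts):
    """Scale weights so enumerated totals match census counts by cell.

    sps:           list of dicts with "cell" (age-sex group), "status"
                   ("enumerated", "missed", ...) and "weight"
    census_counts: dict mapping each cell to the comparable census count
    Returns new records with calibrated weights.
    """
    # Weighted estimate of enumerated persons in each cell.
    enumerated = defaultdict(float)
    for sp in sps:
        if sp["status"] == "enumerated":
            enumerated[sp["cell"]] += sp["weight"]

    calibrated = []
    for sp in sps:
        estimate = enumerated.get(sp["cell"], 0.0)
        if estimate == 0.0:
            raise ValueError(f"no enumerated SPs in cell {sp['cell']!r}")
        # The same factor is applied to every SP in the cell, so missed
        # persons are scaled along with enumerated persons.
        factor = census_counts[sp["cell"]] / estimate
        calibrated.append({**sp, "weight": sp["weight"] * factor})
    return calibrated
```

After the adjustment, the weighted number of enumerated persons in each cell equals the census count supplied for that cell.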
Lastly, the weights of SPs from the 2011 Census frame who were enumerated more than once in 2011 were adjusted downward to account for the fact that these SPs had more than one chance of being selected.
Census population undercoverage was estimated by the weighted number of missed persons less the number of persons excluded from the RRC version of the 2016 Census Response Database (RRC CRDB). The RRC CRDB is an early version of the final 2016 Census Response Database that was available before the end of census data processing. There are some minor differences between the RRC CRDB and later versions of the census databases. In particular, the RRC CRDB, which is a database of persons, contains all census records for persons with three exceptions. The first group consists of census records imputed through whole household imputation. The second group consists of census records with invalid or incomplete names, or invalid or incomplete birth dates. This group is also known as the 'incompletely enumerated.' The third group consists of all census records that were added late, after the start of RRC processing.
Lastly, the bootstrap method was used to calculate the variance of the RRC estimates. Five hundred bootstrap replicate weights were created.
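Once replicate weights exist, the variance calculation itself is short. The sketch below assumes a common convention, the mean squared deviation of the replicate estimates around the full-sample estimate; the function names are hypothetical, and the method used to generate the replicate weights is not shown.

```python
def weighted_total(values, weights):
    """Weighted total of an indicator or count variable."""
    return sum(v * w for v, w in zip(values, weights))

def bootstrap_variance(values, full_weights, replicate_weights):
    """Estimate the variance of a weighted total from replicate weights.

    values:            one value per SP (for example 1 if missed, else 0)
    full_weights:      final weights, one per SP
    replicate_weights: list of weight vectors, one per bootstrap
                       replicate (500 in the 2016 RRC)
    """
    theta = weighted_total(values, full_weights)
    replicates = [weighted_total(values, w) for w in replicate_weights]
    # Mean squared deviation of replicate estimates from the full estimate.
    return sum((t - theta) ** 2 for t in replicates) / len(replicates)
```

The standard error of the estimate is the square root of the returned variance.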
Pre-release verification consisted of data confrontation with other published sources (census count of enumerated persons, population estimates for deceased, emigrants, and internal migration), and historical trend analysis. In addition, there is an extensive certification process after the release of the preliminary estimates with the provincial and territorial statistical focal points and other key clients.
Statistics Canada is prohibited by law from releasing any information it collects which could identify any person, business, or organization, unless consent has been given by the respondent or as permitted by the Statistics Act. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.
No microdata are released. No tabulations are produced for small geographic domains. Otherwise, there are no sensitive cells.
Revisions and seasonal adjustment
This methodology type does not apply to this survey.
Several measures were taken to minimize the effect of non-response on the Reverse Record Check estimates. First of all, most cases were resolved without the need for collection. Introductory letters were sent to the selected persons (SPs) prior to collection. Head Office provided tracing leads from several administrative files to help the regional offices (ROs) locate each SP. A six-month collection period allowed for multiple attempts for each case and for extensive tracing. Multi-mode collection, where questionnaires were mailed and personal visits were made to harder-to-contact SPs, also helped in completing more interviews.
It is important to note that the definition of a non-respondent for classification, and therefore for estimation, was not the same as the usual definition of a non-respondent for whom data collection was attempted but not completed. This is because classification was based on data from many sources, one of which might be collection.
Data from the Reverse Record Check are combined with the results of the Census Overcoverage Study and data from the final census database to estimate population coverage error. Standard errors are provided for each estimate in the dissemination tables.