Canadian Census Health and Environment Cohorts (CanCHECs)
Detailed information for 2021
Status: Active
Frequency: Every 5 years
Record number: 5422
The primary purpose of the Canadian Census Health and Environment Cohorts (CanCHECs) is to enable researchers and academics, public health officials, government agencies, the private sector and other non-governmental organizations to examine health outcomes by various population characteristics, such as income, education, occupation, language, ethnocultural diversity, immigrant status, and Indigenous identity. Additionally, environmental data can be integrated to study the association between environmental exposures and health outcomes.
A cohort is created for each census cycle, every five years, and its longitudinal follow-up data are updated annually until the end of the planned study period.
Data release - June 18, 2025
Description
The Canadian Census Health and Environment Cohorts (CanCHECs) enable the creation of population-based linked data sets. The CanCHECs combine census respondents to the long-form questionnaire with administrative health data (e.g., mortality, cancer, hospitalizations, ambulatory care, youth and adult mental health hospitalizations) and annual postal codes for mailing addresses. These data can be used to examine health outcomes by population characteristics measured by the census long-form sample data or the National Household Survey data (e.g., income, education, occupation, language, ethnocultural diversity, immigrant status, or Indigenous identity). Environmental data can also be integrated into the CanCHEC using the annual postal codes for mailing addresses to examine the association between environmental exposure and a health outcome. The potential users of CanCHEC data are researchers and academics, public health officials, government agencies, private sectors and other non-governmental organizations.
Reference period: May 11th, 2021, unless otherwise specified. For annual administrative data, this is the calendar year or the 12-month financial year beginning on April 1 of the reference year and ending on March 31 of the following year.
Collection period: May through July, every five years. For annual administrative data, 13 to 27 months after January 1 of the calendar year or within two months after the end of each quarter of the 12-month fiscal period.
Subjects
- Diseases and health conditions
- Environmental factors
- Health
- Life expectancy and deaths
- Lifestyle and social conditions
Data sources and methodology
Target population
The 2021 Canadian Census Health and Environment Cohort (CanCHEC) links the 2021 Census long-form sample data to administrative data, such as the Canadian Vital Statistics - Death database (CVSD) and the Discharge Abstract Database (DAD), through the Derived Record Depository (DRD) in the Social Data Linkage Environment (SDLE) at Statistics Canada. The 2021 CanCHEC includes persons from the 2021 Census long-form sample, which consisted only of private households, including those living in private dwellings attached to collective dwellings in Canada, but excluding those living in incompletely enumerated reserves and settlements. Households living in collective dwellings, outside Canada, or in incompletely enumerated reserves and settlements were therefore considered "out of scope" for the 2021 CanCHEC.
Instrument design
Statistics Canada's repository of information about the 2021 Census of Population (record number 3901) includes information on the key steps that ensured that the 2021 Census of Population produced relevant information for Canadians and decision makers.
Sampling
The long-form sample is selected from the 2021 Census of Population dwelling list.
The Canadian Census Health and Environment Cohort (CanCHEC) is a subset of the long-form sample survey: the census data follow a cross-sectional design, and longitudinal follow-up is provided through administrative data. The census information for 2021 CanCHEC members is specific to May 11th, 2021, unless otherwise specified, and these members are also followed over time using the administrative data sources. The most recent 2021 CanCHEC data release included the following administrative data sources for longitudinal follow-up:
- Canadian Vital Statistics - Death database (CVSD): May 11th, 2021, to December 31st, 2023
- Discharge Abstract Database (DAD): April 1st, 2000, to March 31st, 2024
- National Ambulatory Care Reporting System (NACRS): April 1st, 2002, to March 31st, 2024
- Ontario Mental Health Reporting System (OMHRS): April 1st, 2006, to March 31st, 2024
- T1 Universe File (Personal Master) for extracting the annual postal codes for mailing addresses: 1981 to 2022.
Canadian Cancer Registry (CCR) data for the 2021 CanCHEC cycle are unavailable, while they are available for other CanCHEC cycles.
Sampling unit
Statistics Canada's repository of information about the 2021 Census of Population (record number 3901) includes information on the sampling unit.
Stratification method
Statistics Canada's repository of information about the 2021 Census of Population (record number 3901) includes information on the stratification strategy.
Sampling and sub-sampling
Statistics Canada's repository of information about the 2021 Census of Population (record number 3901) includes information on how the sample was allocated and selected.
Data sources
Data collection for this reference period: 2000-04-01 to 2024-03-31
Responding to this survey is mandatory.
Data are collected directly from survey respondents, extracted from administrative files and derived from other Statistics Canada surveys and/or other sources.
Data are collected directly from 2021 Census respondents, extracted from administrative files, and derived from other Statistics Canada programs such as the Social Data Linkage Environment.
Census data are collected under the authority of the Statistics Act, R.S.C. 1985, c. S-19. Statistics Canada's repository of information about the Census of Population includes information on how the data were collected. For example, for the 2021 Census of Population, collection methods included response by Internet, paper questionnaire, the Census Help Line, and failed-edit and non-response follow-up.
The Census of Population is the primary source of sociodemographic data for specific population groups such as lone-parent families, Indigenous Peoples, immigrants, seniors, and language groups. Moreover, the use of administrative health data improves our knowledge and understanding of the association between health outcomes and the social determinants of health.
The data collected from the Census of Population via the long-form questionnaire or the National Household Survey and the data extracted from the Canadian Vital Statistics - Death database (CVSD) and from one or more of the following administrative data sources, which were obtained under the authority of the Statistics Act, R.S.C. 1985, c. S-19, section 13, were integrated to create each Canadian Census Health and Environment Cohort:
- Canadian Cancer Registry (CCR)
- Discharge Abstract Database (DAD)
- National Ambulatory Care Reporting System (NACRS)
- Ontario Mental Health Reporting System (OMHRS).
The T1 Universe File (Personal Master) is an electronic file of data from all individual Canadian T1 income tax filers, including unincorporated businesses, and its data were obtained from the Canada Revenue Agency under the authority of the Statistics Act, R.S.C. 1985, c. S-19, section 24. These data are the primary data source for extracting the annual postal codes for mailing addresses.
Statistics Canada's repository of information about its surveys and statistical programs includes information on data processing conducted on the administrative data by the data provider and by Statistics Canada. The Canadian Institute for Health Information (CIHI) maintains metadata about the DAD, the NACRS, and the OMHRS, which include data processing conducted on the administrative data by the data providers (e.g., acute care facilities or their respective health/regional authority or ministry/department of health) and CIHI. Based on the dissemination guidelines for CIHI data, abstracts identifying that a therapeutic abortion was performed based on diagnosis codes or intervention codes were deleted by Statistics Canada. Furthermore, some data elements disclosed by CIHI to Statistics Canada are either suppressed or deleted by Statistics Canada.
The Social Data Linkage Environment (SDLE) at Statistics Canada provides a secure environment that maximizes the use of existing survey and administrative data to address important research questions and inform socio-economic policy through record linkage without the need to collect additional data from Canadians. Its Derived Record Depository (DRD) and Key Registry facilitate data integration across multiple domains, such as health, justice, education, and income, using standardized record linkage processes and methods.
At the core of the 2021 Canadian Census Health and Environment Cohort (CanCHEC) are the paired record identifiers (i.e., linkage keys) between a subset of the 2021 Census long-form sample survey and the CVSD. To create these, the 2021 Census of Population and the CVSD were first separately linked to the DRD. Then, the preliminary linkage keys between the two data sources were extracted from the Key Registry. This was followed by data cleaning of the preliminary linkage keys to create the final linkage keys between the 2021 CanCHEC and the CVSD. The final linkage keys between the 2021 CanCHEC and the other administrative data sources were created in a similar way.
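The derivation of preliminary linkage keys described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Statistics Canada's implementation: the record identifiers, the dictionary layout, and the function name `derive_linkage_keys` are all hypothetical, and the Key Registry is reduced to two simple lookups.

```python
from collections import defaultdict

def derive_linkage_keys(census_to_drd, cvsd_to_drd):
    """Pair census and CVSD record identifiers that share a DRD person ID.

    Both arguments map a source record ID to a DRD person ID
    (hypothetical layout). Returns a sorted list of
    (census_id, cvsd_id) preliminary linkage keys.
    """
    # Index the death records by the DRD person they were linked to.
    cvsd_by_drd = defaultdict(list)
    for cvsd_id, drd_id in cvsd_to_drd.items():
        cvsd_by_drd[drd_id].append(cvsd_id)

    # A census record pairs with every death record sharing its DRD person ID.
    pairs = []
    for census_id, drd_id in census_to_drd.items():
        for cvsd_id in cvsd_by_drd.get(drd_id, []):
            pairs.append((census_id, cvsd_id))
    return sorted(pairs)

# Illustrative identifiers only.
census_to_drd = {"C1": "P10", "C2": "P11", "C3": "P12"}
cvsd_to_drd = {"D7": "P11", "D8": "P99"}
preliminary_keys = derive_linkage_keys(census_to_drd, cvsd_to_drd)
```

In this toy example only census record C2 and death record D7 share a DRD person ID, so they form the single preliminary key. The data cleaning that turns preliminary keys into final keys is not shown here.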
Error detection
The 2021 Canadian Census Health and Environment Cohort (CanCHEC) uses census and administrative data, which have been released and linked to the Derived Record Depository (DRD). Statistics Canada's repository of information about surveys and statistical programs includes information on the methods used to identify missing, invalid or inconsistent entries, or observations that were in error for each of the 2021 Census of Population, the Canadian Vital Statistics - Death database (CVSD), and the Canadian Cancer Registry (CCR). For example, the Census Program corrects non-response and coverage errors by imputing and adjusting the weighting of the census data from the long-form questionnaire. The Canadian Institute for Health Information (CIHI) reports the methods on error detection for their data holdings such as the Discharge Abstract Database (DAD), the National Ambulatory Care Reporting System (NACRS), and the Ontario Mental Health Reporting System (OMHRS). For example, CIHI's Open-Year Data Quality tests help reporting facilities to create their own data quality audits to identify abstracts with suspected data quality issues and to submit corrections.
The CanCHEC production team works with the understanding that each subject matter program minimizes errors in its data. This enables the CanCHEC production team to focus on cleaning the linkage keys extracted from the Key Registry. This data cleaning includes:
1) removing duplicate observations if persons were enumerated more than once in the Census of Population,
2) identifying possible false matches between the census and administrative data at the person and event levels.
The Social Data Linkage Environment (SDLE) team identifies persons who were enumerated more than once by the Census of Population via internal record linkage. If multiple census respondents were grouped as the same person, then one of the census respondents from the linkage keys was selected at random by the CanCHEC production team. The random selection is performed only once for the creation of a cohort (i.e., the cohort membership remains fixed for all subsequent updates). Observations from the census and administrative data are then match-merged using the de-duplicated linkage keys to produce linked data in which possible false matches can be identified. Sex at birth or gender and dates (e.g., date of birth; the Census day and date of death; date of death and date of discharge; etc.) across the census data and the administrative data are compared to reduce the size of the search space. For example, persons who were enumerated by the Census of Population, but who died prior to the Census day according to the matched CVSD observations, are identified as false matches. However, residual false matches between the census data and the DRD remain following this data cleaning, especially among those persons whose census data did not match to the administrative data via the DRD.
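The two cleaning steps described above (random selection among duplicate census records, and flagging of deaths that precede Census day) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the record layout, and the use of a fixed seed to mimic a one-time selection are hypothetical and are not taken from the CanCHEC production code.

```python
import random
from datetime import date

CENSUS_DAY = date(2021, 5, 11)  # reference day of the 2021 Census

def deduplicate(person_groups, seed=2021):
    """Keep one census record per person, chosen at random.

    person_groups maps a person ID to the census record IDs grouped as
    that person. The fixed seed (an assumption) stands in for performing
    the selection only once, so that cohort membership stays fixed.
    """
    rng = random.Random(seed)
    return {
        person: rng.choice(sorted(records))
        for person, records in person_groups.items()
    }

def is_false_match(date_of_death, census_day=CENSUS_DAY):
    """Flag a census-to-death pair when the death precedes Census day."""
    return date_of_death < census_day

# Illustrative identifiers only.
groups = {"P1": ["C1", "C4"], "P2": ["C2"]}
kept = deduplicate(groups)
```

Only the date-of-death comparison is shown here. The other comparisons named in the text (sex at birth or gender, date of birth, date of discharge) would follow the same pattern.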
Imputation
Imputation was performed for each invalid day of a month in a year for each of the data sources to help validate the match-merged data. Any invalid day, excluding missing day, was set to the last day of the month (e.g., 29th day of February in a non-leap year was imputed to be the 28th).
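The rule for invalid days can be sketched as follows. This is a minimal illustration; the function name `impute_invalid_day` and the use of `None` to represent a missing day are assumptions made for the example.

```python
import calendar

def impute_invalid_day(year, month, day):
    """Return a day that is valid for the given year and month.

    A missing day (None) is left untouched; an invalid day is set to
    the last day of the month.
    """
    if day is None:
        return None
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    if 1 <= day <= last_day:
        return day
    return last_day

feb_non_leap = impute_invalid_day(2021, 2, 29)  # 2021 is not a leap year
feb_leap = impute_invalid_day(2020, 2, 29)      # 2020 is a leap year
```

Here `feb_non_leap` is 28, matching the example in the text, while `feb_leap` stays at 29 because the date is valid.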
A modified version of the method of imputing missing values in the annual postal codes for mailing addresses for the 1991 Canadian Census Health and Environment Cohort (CanCHEC), developed by the Health Analysis Division at Statistics Canada, has been implemented to create the annual imputed postal codes for mailing addresses for the 2021 CanCHEC using the set of files derived from the latest Postal CodeOM Conversion File Plus.
Estimation
The 2021 Canadian Census Health and Environment Cohort (CanCHEC) consists of the 2021 Census long-form sample survey linked to the Derived Record Depository (DRD) with a linkage rate of 95.8%. To allow data users to estimate and assess the variance of summary measures of the Canadian population in the presence of missed linked pairs between the long-form sample and the DRD, the main and replicate weights produced and associated with persons from their responding households of the long-form sample survey were adjusted and calibrated.
To create the CanCHEC's main and replicate weights, response homogeneous groups were first constructed using the linkage propensity score from a logistic model fitted for the long-form sample. A cell calibration procedure was used to further adjust the weights such that the weighted counts for a multivariate tabulation of the 2021 CanCHEC match the weighted counts from the 2021 Census long-form sample survey.
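The cell calibration step can be sketched as follows. This is a minimal illustration of the general idea only: the record layout, the cell labels, and the function name `calibrate_by_cell` are hypothetical, and the preceding adjustment within response homogeneous groups is not shown.

```python
def calibrate_by_cell(cohort, census_totals):
    """Scale cohort weights so each cell's weighted count matches the census.

    cohort: list of dicts with keys "cell" and "weight" (hypothetical layout).
    census_totals: dict mapping each cell to the weighted count from the
    long-form sample.
    """
    # Weighted count of the cohort in each calibration cell.
    cohort_totals = {}
    for rec in cohort:
        cohort_totals[rec["cell"]] = cohort_totals.get(rec["cell"], 0.0) + rec["weight"]

    # Multiply each weight by the ratio of census total to cohort total.
    calibrated = []
    for rec in cohort:
        factor = census_totals[rec["cell"]] / cohort_totals[rec["cell"]]
        calibrated.append({**rec, "weight": rec["weight"] * factor})
    return calibrated

# Illustrative values only.
cohort = [
    {"cell": "A", "weight": 4.0},
    {"cell": "A", "weight": 4.0},
    {"cell": "B", "weight": 5.0},
]
census_totals = {"A": 10.0, "B": 6.0}
calibrated = calibrate_by_cell(cohort, census_totals)
```

After calibration the weights in cell A sum to 10 and the weight in cell B equals 6, so the weighted counts of the cohort reproduce the census totals for each cell. The same adjustment would be repeated for each set of replicate weights.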
Recall that the replicate estimator chosen for the 2021 Census long-form sample survey was derived from Fay's balanced half-sample method, which determined the creation of replicates, the calculation of replicate weights and the multiplication factor used to estimate variance. To produce variance estimates for the 2021 CanCHEC estimates, the 100 sets of replicate weights produced for variance estimation of the long-form sample estimates were adjusted and calibrated.
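A variance estimate from replicate weights under Fay's method can be sketched as follows. This uses the standard textbook form of Fay's balanced repeated replication estimator. The Fay factor is not stated in this document, so it is left as a required parameter, and the value used in the example is purely illustrative. The function name `fay_variance` is hypothetical.

```python
def fay_variance(full_estimate, replicate_estimates, fay_factor):
    """Fay's balanced repeated replication variance estimator.

    var = sum((theta_r - theta)^2) / (R * (1 - fay_factor)^2)

    full_estimate: estimate computed with the main weights (theta).
    replicate_estimates: estimates computed with each set of replicate
    weights (theta_r); R is their number.
    fay_factor: Fay perturbation factor in [0, 1); an assumption here.
    """
    if not 0 <= fay_factor < 1:
        raise ValueError("fay_factor must be in [0, 1)")
    r = len(replicate_estimates)
    squared = sum((theta_r - full_estimate) ** 2 for theta_r in replicate_estimates)
    return squared / (r * (1 - fay_factor) ** 2)

# Four replicates for brevity; the 2021 CanCHEC provides 100 sets.
variance = fay_variance(10.0, [9.0, 11.0, 10.5, 9.5], fay_factor=0.5)
```

In this toy example the squared deviations sum to 2.5 and the divisor is 4 × 0.25 = 1, so the variance estimate is 2.5.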
Quality evaluation
The Social Data Linkage Environment (SDLE) team routinely assesses the quality of the internal and external record linkages they perform. Common data quality measures include duplication rates, linkage rates, linkage error rates, and differences in the distribution of data elements. To better understand the phenomenon of error propagation resulting from data integration, the Canadian Census Health and Environment Cohort (CanCHEC) production team analyzes these measures before cleaning the linkage keys extracted from the Key Registry.
The Methodological Support Services team responsible for adjusting and calibrating the main and replicate weights for the CanCHECs examines slippage on estimates of key data elements (e.g., a ratio of the weighted count of transgender persons aged 35 and older from the 2021 Census long-form sample survey and the weighted count from the 2021 CanCHEC).
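A slippage check of this kind can be sketched as a ratio of two weighted counts. This is a minimal illustration: the record layout, the domain definition, and the choice to place the census count in the numerator are assumptions made for the example.

```python
def weighted_count(records, in_domain):
    """Sum the weights of the records that fall in the domain of interest."""
    return sum(r["weight"] for r in records if in_domain(r))

def slippage_ratio(census_records, cohort_records, in_domain):
    """Ratio of the census weighted count to the cohort weighted count.

    A value of 1.0 means the cohort weights reproduce the census count
    for this domain (direction of the ratio is an assumption).
    """
    return weighted_count(census_records, in_domain) / weighted_count(cohort_records, in_domain)

def aged_35_plus(record):
    return record["age"] >= 35

# Illustrative values only.
census = [
    {"age": 40, "weight": 5.0},
    {"age": 20, "weight": 5.0},
    {"age": 60, "weight": 5.0},
]
cohort = [
    {"age": 40, "weight": 5.5},
    {"age": 60, "weight": 4.5},
]
ratio = slippage_ratio(census, cohort, aged_35_plus)
```

Both weighted counts equal 10 in this toy example, so the ratio is 1.0. Values away from 1.0 would indicate slippage for that domain.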
The CanCHEC production team uses the final linkage keys between the 2021 CanCHEC and each administrative health data source to compare the weighted distribution of data elements of the administrative data for the CanCHEC to the distribution of the data elements of the population-level administrative data.
Disclosure control
Statistics Canada is prohibited by law from releasing any information it collects that could identify any person, business, or organization, unless consent has been given by the respondent or as permitted by the Statistics Act. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.
Only Statistics Canada employees and deemed employees can be approved to access the confidential microdata.
The use of the Canadian Census Health and Environment Cohorts (CanCHECs) is subject to the normal privacy and confidentiality constraints to prevent the disclosure of personal information or business information (e.g., health care facilities). The confidentiality rules were developed based on the confidentiality standards and guidelines for the dissemination of 2021 Census of Population data, which are retroactively applied to all historical census cycles (unless specified), and those for the dissemination of administrative data. All aggregate statistics are subject to the confidentiality rules for the CanCHECs (e.g., minimum cell count thresholds, consistent random or deterministic rounding, rules for custom geographic areas, etc.).
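Two of the rule types named above, a minimum cell count threshold and random rounding, can be sketched as follows. This is a generic illustration only: the rounding base of 5 and the threshold of 10 are placeholder values, not the actual CanCHEC confidentiality rules, and the sketch does not implement the consistency property of consistent random rounding. The function names are hypothetical.

```python
import random

def random_round(count, base=5, rng=random):
    """Unbiased random rounding of a count to a multiple of base.

    The count is rounded up with probability remainder / base, so its
    expected value is unchanged. The base is a placeholder assumption.
    """
    remainder = count % base
    if remainder == 0:
        return count
    lower = count - remainder
    return lower + base if rng.random() < remainder / base else lower

def suppress_small(count, threshold=10):
    """Return None for counts below the threshold (placeholder value)."""
    return None if count < threshold else count

rng = random.Random(0)  # seeded only to make the example reproducible
rounded = [random_round(c, rng=rng) for c in range(50)]
```

Every rounded value is a multiple of the base and lies within one base unit of the original count.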
Revisions and seasonal adjustment
The final linkage keys between the 2021 Canadian Census Health and Environment Cohort (CanCHEC) and the administrative data sources are subject to revision as new administrative data are released and linked pairs between a given data source and the Derived Record Depository (DRD) are created, modified, or deleted. Version control is used to track and manage changes in different versions of the final linkage keys between the 2021 CanCHEC and each administrative data source.
Given that the cohort membership was fixed at the creation of the 2021 CanCHEC, those cohort members whose census records were no longer linked to the DRD in each subsequent version of the final linkage keys between the 2021 CanCHEC and the Canadian Vital Statistics - Death database (CVSD) were identified (i.e., lost to follow-up). For example, the creation of the second 2021 CanCHEC version identified that 0.01% of the original cohort members were lost to follow-up.
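Identifying members lost to follow-up amounts to comparing the fixed cohort membership with the records still linked in a later version. A minimal sketch, assuming hypothetical identifiers and a hypothetical function name `lost_to_follow_up`:

```python
def lost_to_follow_up(cohort_ids, linked_ids_in_version):
    """Return the cohort members no longer linked, and their share in percent.

    cohort_ids: identifiers fixed at cohort creation.
    linked_ids_in_version: identifiers still linked to the DRD in a
    later version of the linkage keys.
    """
    members = set(cohort_ids)
    lost = members - set(linked_ids_in_version)
    rate = 100.0 * len(lost) / len(members)
    return lost, rate

# Illustrative identifiers only.
lost, rate = lost_to_follow_up(["C1", "C2", "C3", "C4"], ["C1", "C2", "C4"])
```

In this toy example one of four members (C3) is no longer linked, giving a rate of 25%.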
The 2021 CanCHEC is expected to be followed prospectively for up to about a decade from Census day. Length of the follow-up period for each outcome is subject to change as needed.
Data accuracy
The 2021 Canadian Census Health and Environment Cohort (CanCHEC) combines the 2021 Census long-form sample survey with administrative data sources through record linkages. Among the survey and administrative data that contribute to the production of the 2021 CanCHEC, Statistics Canada's repository of information about its surveys and statistical programs includes information on data accuracy for the following data sources:
- Census of Population, 2021 (record number 3901)
- Canadian Vital Statistics - Death database (CVSD) (record number 3233)
- Canadian Cancer Registry (CCR) (record number 3207)
The Canadian Institute for Health Information (CIHI) maintains metadata about the Discharge Abstract Database (DAD), the National Ambulatory Care Reporting System (NACRS), and the Ontario Mental Health Reporting System (OMHRS), which include information on the data quality to help users decide whether the data fit their needs.
Response rates
According to the 2021 Census program, the response rate for the 2021 Census long-form questionnaire was 95.7%.
Non-sampling error
For the 2021 Census of Population, the Census program used administrative data during data processing to impute non-responding households in areas with low response rates where the administrative data were of sufficient quality to generate reliable population counts. Results showed that using administrative data to impute some non-responding households improved the quality of the data for key population and demographic indicators. However, members of the non-responding households that had undergone whole household imputation were not available for record linkage purposes and were considered out of scope for the creation of the 2021 Canadian Census Health and Environment Cohort (CanCHEC).
Non-response bias
To allow data users to estimate and assess the variance of summary measures of the Canadian population in the presence of missed linked pairs between the long-form sample and the Derived Record Depository, the main and replicate weights produced and associated with persons from their responding households of the long-form sample survey were adjusted and calibrated. When weighting the 2021 Canadian Census Health and Environment Cohort (CanCHEC), the logistic model fitted for the 2021 Census long-form sample to calculate the linkage propensity score and the cell calibration procedure that followed both accounted for the members of non-responding households treated with the process of whole household imputation who were not available for record linkage purposes.
Coverage error
For the 2021 Census, a total of 63 census subdivisions defined as reserves and settlements were incompletely enumerated. For these reserves and settlements, dwelling enumeration was either not permitted or could not be completed. Therefore, household members from these reserves and settlements were not available for record linkage purposes and were considered out of scope for the creation of the 2021 Canadian Census Health and Environment Cohort (CanCHEC). Moreover, the weighting of the 2021 CanCHEC does not account for these incompletely enumerated reserves and settlements.
The coverage technical report examined coverage errors in the 2021 Census of Population (Catalogue number 98-303-X, issue 2021001). Similar to the use of linkages in the Census Overcoverage Study, the Social Data Linkage Environment team identified persons who were enumerated more than once by the Census of Population via internal record linkage. The CanCHEC production team removed duplicate observations if in-scope persons were enumerated more than once in the target population of the Census long-form sample survey.
Other non-sampling errors
Statistics Canada's repository of information about the 2021 Census of Population (record number 3901) includes documentation that describes other non-sampling errors (Catalogue number 98-304-X, issue 2021001).
Documentation
- Canadian Census Health and Environment Cohorts (CanCHECs): Creation of a new health surveillance program
- Census of Population, 2021
- Canadian Cancer Registry (CCR)
- Canadian Vital Statistics - Death database (CVSD)
- Coverage Technical Report, Census of Population, 2021
- Social Data Linkage Environment (SDLE)
- Discharge Abstract Database (DAD) metadata
- National Ambulatory Care Reporting System (NACRS) metadata
- Ontario Mental Health Reporting System (OMHRS) metadata