Canadian Income Survey - 2021 (CIS)
Detailed information for 2021
The Canadian Income Survey (CIS) is a cross-sectional survey developed to provide a portrait of the income and income sources of Canadians, with their individual and household characteristics.
Data release - May 2, 2023
The primary objective of the Canadian Income Survey (CIS) is to provide information on the income and income sources of Canadians, along with their individual and household characteristics. The data collected in the CIS is combined with Labour Force Survey (LFS, record number 3701) and tax data.
The survey gathers information on labour market activity, school attendance, disability, unmet health care needs, support payments, child care expenses, inter-household transfers, personal income, food security, and characteristics and costs of housing. This content is supplemented with information on individual and household characteristics (e.g. age, educational attainment, main job characteristics, family type), as well as geographic details (e.g. province/territory, census metropolitan area (CMA)) from the LFS. Tax data for income and income sources are also combined with the survey data.
Results from the survey are made available not only to various levels of government, but also to individuals and organizations. All levels of government can use CIS data to shape policies and programs related to the economic well-being of Canadians. Statistical organizations such as the Organisation for Economic Co-operation and Development (OECD) use the results for international benchmarking and comparison studies.
Reference period: Calendar year
Collection period: January through June of the year following the reference year.
- Families, households and housing
- Household, family and personal income
- Income, pensions, spending and wealth
- Low income and inequality
Data sources and methodology
The survey is conducted nationwide, in both the provinces and the territories. It covers all individuals in Canada, excluding persons living on reserves and other Indigenous settlements in the provinces, the institutionalized population, and households in extremely remote areas with very low population density. Overall, these exclusions amount to less than two percent of the population.
Qualitative testing was carried out by Statistics Canada's Questionnaire Design Resource Centre (QDRC) for selected modules of the survey questionnaire, while questions for the remaining modules came from other Statistics Canada surveys. Question wording adheres as closely as possible to questions established by the Harmonized Content Committee at Statistics Canada.
The questionnaire follows standard practices and wording used in a computer-assisted interviewing environment, such as the automatic control of flows that depend upon answers to earlier questions and the use of edits to check for logical inconsistencies and capture errors. The computer application for data collection was tested extensively.
This is a sample survey with a cross-sectional design.
The Canadian Income Survey is administered to a sub-sample of LFS respondents. The LFS sample is drawn from an area frame and is based on a stratified, multi-stage design that uses probability sampling.
The LFS uses a rotating panel sample design. In the provinces, selected dwellings remain in the LFS sample for six consecutive months. Each month, about one-sixth of the LFS sampled dwellings are in their first month of the survey, one-sixth are in their second month of the survey, and so on. These six independent samples are called rotation groups. In the territories, the sample is composed of eight rotation groups. Selected dwellings are interviewed every quarter and remain in the LFS sample for two years.
For the 2021 CIS, six rotation groups from the LFS were used for the provinces: the rotation groups answering the LFS for the last time from January to June of 2022. For the territories, dwellings from all rotation groups were included in the CIS. The CIS sample size was about 55,000 households.
Data collection for this reference period: 2022-01-16 to 2022-07-05
Responding to this survey is voluntary.
Data are collected directly from survey respondents and extracted from administrative files.
CIS interviews are conducted by telephone by interviewers working in a regional office or by personal visit from a field interviewer. In addition, respondents who meet certain criteria are offered the option of completing the survey online.
In each dwelling, information about all household members is usually obtained from one knowledgeable household member. Such 'proxy' reporting is used to avoid the high cost and extended time requirements that would be involved in repeat visits or calls necessary to obtain information directly from each respondent.
Personal income data from the Canada Revenue Agency (CRA) are used for income and income sources information.
The Canadian Income Survey introduced improvements to the methods and systems used to produce income estimates. Beginning with the 2021 reference year, CIS income data were produced from the Administrative Personal Income Masterfile, a comprehensive source of personal income data generated not only from T1 tax returns, but also from associated tax slips. Estimates for previous reference years were produced using T1 tax returns only. Other changes to income processing were introduced at the same time, and estimates for 2021 also incorporated updates to the weighting methodology. These changes to the data source, processing system and weighting improve the quality of the data, while having minimal impact on key CIS estimates and trends.
The CIS Computer Assisted Interviewing (CAI) questionnaire incorporates many features that serve to maximize the quality of the data collected. Edits built into the CAI questionnaire flag unusual values in the entered data and check for logical inconsistencies. Whenever an edit fails, the interviewer is prompted to correct the information (with the help of the respondent when necessary). For most edit failures, the interviewer can override the edit if the apparent discrepancy cannot be resolved.
Once the data are received at head office, an extensive series of processing steps is undertaken to thoroughly verify each record. This includes the review of interviewer-entered notes. The editing phase of processing involves the identification of logically inconsistent items and the correction of such conditions. Since the true value of each entry on the questionnaire is not known, errors can be identified only through recognition of obvious inconsistencies.
Households are kept as respondents if information for at least one person in the household was provided, and any key data that is missing for individuals within responding households is imputed. Imputation is carried out for income variables as well as variables related to labour, school attendance, food security, housing and utility costs.
CIS uses a nearest neighbour approach for the imputation of most income variables, and for labour, school attendance, food security and housing variables. This imputation method involves the selection of a donor record based on matching variables. First, a set of matching variables, each of which is correlated with the variables to be imputed, is defined. Then, through the combined use of a score function (for categorical matching variables) and a distance function (for numeric matching variables), the most similar consistent donor record is identified and used to impute data for the record.
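The donor selection described above can be sketched as follows. This is a minimal illustration, not the CIS production system: the field names (`province`, `family_type`, `age`, `wages`) are hypothetical, and combining the score and the distance lexicographically (best score first, then smallest distance) is an assumption made for the example, since the exact combination rule is not stated here.

```python
from typing import Optional


def nearest_donor(recipient: dict, donors: list[dict],
                  categorical: list[str], numeric: list[str]) -> Optional[dict]:
    """Return the donor most similar to the recipient on the matching variables."""
    best, best_key = None, None
    for donor in donors:
        # Score function: count of categorical matching variables that agree.
        score = sum(donor[v] == recipient[v] for v in categorical)
        # Distance function: sum of absolute differences on numeric variables.
        distance = sum(abs(donor[v] - recipient[v]) for v in numeric)
        # Assumed ordering: higher score wins, ties broken by smaller distance.
        key = (-score, distance)
        if best_key is None or key < best_key:
            best, best_key = donor, key
    return best


def impute(recipient: dict, donors: list[dict], categorical: list[str],
           numeric: list[str], target: str) -> dict:
    """Return a copy of the recipient with `target` copied from its nearest donor."""
    donor = nearest_donor(recipient, donors, categorical, numeric)
    filled = dict(recipient)
    if donor is not None:
        filled[target] = donor[target]
    return filled
```

A production system would also check that the donor value is consistent with the recipient's other reported data before accepting it; that check is omitted here.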
Cold-deck imputation using donor information from the 2021 Census is used to impute utility costs for all CIS households. Imputation classes are formed to identify groups of Census households sharing characteristics with the CIS household to be imputed. Data from a randomly selected (with replacement) Census household is used to impute data for the CIS household.
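As a rough sketch of the cold-deck step, the example below groups external donor households into imputation classes and draws one donor at random, with replacement, for each household to be imputed. The class variables (`province`, `dwelling`) and the target field (`utilities`) are hypothetical names chosen for illustration; the actual class definitions are not given here.

```python
import random
from collections import defaultdict


def build_classes(census_households: list[dict], class_vars: list[str]) -> dict:
    """Group donor households by their values on the class variables."""
    classes = defaultdict(list)
    for hh in census_households:
        classes[tuple(hh[v] for v in class_vars)].append(hh)
    return classes


def cold_deck_impute(cis_households: list[dict], classes: dict,
                     class_vars: list[str], target: str,
                     rng: random.Random) -> list[dict]:
    """Copy `target` from a randomly chosen donor in the same imputation class."""
    imputed = []
    for hh in cis_households:
        pool = classes.get(tuple(hh[v] for v in class_vars), [])
        filled = dict(hh)
        if pool:
            # Sampling with replacement: a donor stays in the pool after use,
            # so the same donor may serve several households.
            filled[target] = rng.choice(pool)[target]
        imputed.append(filled)
    return imputed
```

Households whose class has no donor are returned unchanged in this sketch; a real system would typically fall back to a broader class.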
The CIS sample is a sub-sample of the Labour Force Survey sample. The LFS uses a complex random sampling plan to select households. Each household in the sample represents a number of other households in the population. Estimates for a given characteristic are obtained by multiplying each record's survey weight by its value for that characteristic and summing the results over the sample. The key step in the point estimation process is therefore the derivation of the weights.
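For illustration, a weighted total and a weighted mean can be computed as in the sketch below. The record layout and the key names passed in (for example `weight` and `income`) are hypothetical.

```python
def weighted_total(records: list[dict], weight_key: str, value_key: str) -> float:
    """Sum of weight times value over all records."""
    return sum(r[weight_key] * r[value_key] for r in records)


def weighted_mean(records: list[dict], weight_key: str, value_key: str) -> float:
    """Weighted total divided by the sum of the weights."""
    total_weight = sum(r[weight_key] for r in records)
    return weighted_total(records, weight_key, value_key) / total_weight
```

With three made-up records carrying weights 100, 200 and 50 and incomes 40,000, 60,000 and 100,000, the weighted total is 21,000,000 and the weighted mean is 60,000.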
The initial weights are the LFS subweights, which are then adjusted to account for the fact that the CIS is a sub-sample of the LFS sample.
Two types of adjustment are then applied to these weights in order to improve the reliability of the estimates. The weights are first inflated to compensate for CIS non-response. Then, the non-response adjusted weights are further adjusted to ensure that estimates on relevant population characteristics respect population totals from sources other than the survey.
The first set of population totals used by the CIS consists of estimates of population counts based on the 2016 Census, provided by Statistics Canada's Centre for Demography. For each province, population counts for different age/sex groups, household size and economic family size are used. CIS also employs population counts for six Census Metropolitan Areas (Montreal, Toronto, Winnipeg, Calgary, Edmonton, and Vancouver).
The second set of totals is derived from the T4 file from the Canada Revenue Agency (CRA) and is intended to ensure that the weighted distribution of income (based on wages and salaries) in the dataset matches that of the Canadian population.
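The two adjustments described above can be sketched in simplified form. This is an illustration under stated assumptions, not the CIS procedure: the non-response adjustment is done within hypothetical classes (`stratum`), and the second step is a simple ratio adjustment to a single set of control totals (`age_group`), whereas the survey calibrates to several sets of totals. All field names are hypothetical.

```python
from collections import defaultdict


def nonresponse_adjust(units: list[dict], class_key: str) -> list[dict]:
    """Inflate respondent weights so they also represent non-respondents in their class."""
    sampled = defaultdict(float)
    responded = defaultdict(float)
    for u in units:
        sampled[u[class_key]] += u["weight"]
        if u["responded"]:
            responded[u[class_key]] += u["weight"]
    # Assumes every class contains at least one respondent.
    return [
        dict(u, weight=u["weight"] * sampled[u[class_key]] / responded[u[class_key]])
        for u in units if u["responded"]
    ]


def poststratify(units: list[dict], group_key: str,
                 control_totals: dict) -> list[dict]:
    """Scale weights so that weighted counts by group match the control totals."""
    weighted = defaultdict(float)
    for u in units:
        weighted[u[group_key]] += u["weight"]
    return [
        dict(u, weight=u["weight"] * control_totals[u[group_key]] / weighted[u[group_key]])
        for u in units
    ]
```

After the first step the respondents' weights sum to the total weight of the full sample in each class; after the second, the weighted count in each group equals its control total.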
In order to estimate sampling variance, the bootstrap approach is used. A set of 1,000 bootstrap weights is produced.
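A minimal sketch of how a set of bootstrap weights can be turned into a variance estimate and a coefficient of variation is shown below. It assumes the commonly used formula in which the variance is the average squared deviation of the replicate estimates from the full-sample estimate; the function and argument names are illustrative.

```python
import math


def estimate_total(values: list[float], weights: list[float]) -> float:
    """Weighted total of the values."""
    return sum(v * w for v, w in zip(values, weights))


def bootstrap_variance(values: list[float], full_weights: list[float],
                       replicate_weights: list[list[float]]) -> float:
    """Average squared deviation of replicate estimates from the full-sample estimate."""
    theta = estimate_total(values, full_weights)
    replicates = [estimate_total(values, w) for w in replicate_weights]
    return sum((t - theta) ** 2 for t in replicates) / len(replicates)


def coefficient_of_variation(values: list[float], full_weights: list[float],
                             replicate_weights: list[list[float]]) -> float:
    """Standard error divided by the estimate (multiply by 100 for a percentage)."""
    theta = estimate_total(values, full_weights)
    variance = bootstrap_variance(values, full_weights, replicate_weights)
    return math.sqrt(variance) / theta
```

In practice `replicate_weights` would hold the 1,000 bootstrap weight vectors supplied with the data; the same estimator is recomputed once per vector.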
A separate set of weights is created specifically for estimating disability and unmet health care needs. The initial weights are the CIS non-response adjusted weights. These weights are then inflated to account for the fact that only one person in the household among those aged 16 years or older is selected for the disability and unmet health care needs questions. They are further increased to compensate for non-response to these questions. To ensure that estimates of population characteristics respect population totals, weights are adjusted to match age/sex group counts and income distribution within each province.
A set of 1,000 bootstrap weights is also produced in order to estimate sampling variance related to disability and unmet health care needs.
Results from the survey are compared with other data sources that include administrative databases and other Statistics Canada surveys.
Statistics Canada is prohibited by law from releasing any data which would divulge information obtained under the Statistics Act that relates to any identifiable person, business or organization without the prior knowledge or the consent in writing of that person, business or organization. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.
Estimates based on fewer than 25 observations or with a coefficient of variation higher than 33.3% are suppressed. Population estimates are rounded to the nearest thousand and income estimates are rounded to the nearest hundred.
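The suppression rule can be expressed as a simple check. The function name and the choice to pass the coefficient of variation as a percentage are assumptions made for this sketch.

```python
MIN_OBSERVATIONS = 25   # estimates based on fewer observations are suppressed
MAX_CV_PERCENT = 33.3   # estimates with a higher CV are suppressed


def is_suppressed(n_observations: int, cv_percent: float) -> bool:
    """Return True when an estimate fails either release criterion."""
    return n_observations < MIN_OBSERVATIONS or cv_percent > MAX_CV_PERCENT
```

An estimate based on exactly 25 observations with a coefficient of variation of exactly 33.3% passes both criteria under this reading of the rule.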
CIS Public Use Microdata Files (PUMFs):
CIS uses a number of techniques to ensure confidentiality:
- Data reduction involves limiting the amount of identifying information on the PUMF. Techniques include removing direct identifiers, sub-sampling, reducing the level of detail, grouping and suppressing data values for specific records.
- Data perturbation involves applying protective measures by coarsening or perturbing the data to hamper re-identification attempts. The addition of noise and data swapping are examples of perturbation techniques that are often employed.
- Quantitative variables with very large positive or negative values are usually rare or unique in the population. Such extreme values are often top-coded, which involves replacing the most extreme values while preserving the integrity of the file for the purposes of producing precise and accurate statistics.
- All income values are rounded.
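As an illustration of top-coding, the sketch below replaces values beyond a lower and an upper limit with the limit itself. The limits are hypothetical, and replacing with the limit is only one possible rule; the replacement values actually used on the PUMF are not described here.

```python
def top_and_bottom_code(values: list[float], lower: float, upper: float) -> list[float]:
    """Replace values outside [lower, upper] with the nearest limit."""
    return [min(max(v, lower), upper) for v in values]
```

For example, with hypothetical limits of -100,000 and 500,000, a value of 2,000,000 becomes 500,000 and a value of -500,000 becomes -100,000, while values inside the limits are unchanged.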
Revisions and seasonal adjustment
Revisions are made to CIS data every five years or so after new population estimates become available following the most recent census. At that time, all CIS data back to the previous census are re-weighted using the new population estimates (since the new population estimates will cover the inter-censal period between the two most recent censuses), and all corresponding historical CIS estimates are revised. With the release of CIS 2020, estimates from 2012 to 2019 have been revised to reflect 2016 Census population estimates.
Since the CIS is a sample survey, all estimates are subject to both sampling and non-sampling errors.
Non-sampling errors can arise at any stage of the collection and processing of the survey data. These include coverage errors, non-response errors, response errors, interviewer errors, coding errors and other types of processing errors.
Coverage errors arise when sampling frame units do not exactly represent the target population. Units may have been omitted from the sampling frame (undercoverage), or units not in the target population may have been included (overcoverage), or units may have been included more than once (duplicates).
Undercoverage represents the most common coverage problem. Slippage is a measure of survey coverage error. It is defined as the percentage difference between control totals (postcensal population estimates) and weighted sample counts. In 2021, the CIS person-level slippage rate was 9.8%.
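The slippage calculation can be sketched as below. Expressing the difference relative to the control total is an assumption, since the denominator is not spelled out above, and the figures in the usage note are made up to reproduce a 9.8% rate.

```python
def slippage_rate(control_total: float, weighted_count: float) -> float:
    """Percentage by which the weighted sample count falls short of the control total."""
    return 100.0 * (control_total - weighted_count) / control_total
```

With a hypothetical control total of 1,000 and a weighted sample count of 902, the function returns a slippage rate of about 9.8. A positive rate indicates undercoverage; a negative rate would indicate overcoverage.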
In 2021, the final CIS response rate was 70.8%.
Sampling errors associated with survey estimates are measured using coefficients of variation for CIS estimates.