Census of Population

Detailed information for 2016

Status: Active

Frequency: Every 5 years

Record number: 3901

The census provides a detailed statistical portrait of Canada and its people by their demographic, social and economic characteristics. This information is important for communities and is vital for planning services such as child care, schooling, family services, and skills training for employment.

Data release - February 8, 2017 (Population and dwelling counts); May 3, 2017 (Age and sex and type of dwelling); August 2, 2017 (Families, households and marital status; Language); September 13, 2017 (Income); October 18, 2017 (Immigration...)

Description

Statistics Canada conducts the Census of Population in order to develop a statistical portrait of Canada and Canadians on one specific day. The census is designed to provide information about people and housing units in Canada by their demographic, social and economic characteristics.

Statistics Canada reinstated the mandatory long-form census in time for the 2016 Census of Population. As such, a sample of approximately 25% of Canadian households received a long-form questionnaire. All other households received a short-form questionnaire.

The Census of Population is a reliable basis for the estimation of the population of the provinces, territories and municipal areas. The information collected is related to federal and provincial legislative measures and provides a basis for the distribution of federal transfer payments. The census also provides information about the characteristics of the population and its housing within small geographic areas and for small population groups to support planning, administration, policy development and evaluation activities of governments at all levels, as well as data users in the private sector.

Clients: Users of census data include the federal government; provincial, territorial and municipal governments; libraries; educational institutions; researchers and academics; private industry; business associations; labour organisations; ethnic and cultural groups; private citizens; and public interest groups.

Reference period: May 10th, 2016, unless otherwise specified.

Collection period: Month of May, every five (5) years

Subjects

  • Aboriginal peoples
  • Education, training and learning
  • Ethnic diversity and immigration
  • Families, households and housing
  • Income, pensions, spending and wealth
  • Labour
  • Languages
  • Population and demography
  • Population estimates and projections
  • Society and community

Data sources and methodology

Target population

The census enumerates the entire Canadian population, which consists of Canadian citizens (by birth and by naturalization), landed immigrants and non-permanent residents and their families living with them in Canada. Non-permanent residents are persons who hold a work or student permit, or who claim refugee status.

The census also counts Canadian citizens and landed immigrants who are temporarily outside the country on census day. This includes federal and provincial government employees working outside Canada, Canadian embassy staff posted to other countries, members of the Canadian Forces stationed abroad, all Canadian crew members of merchant vessels and their families. Because people outside the country are enumerated, the Census of Canada is considered a modified de jure census.

Foreign residents such as representatives of a foreign government assigned to an embassy, high commission or other diplomatic mission in Canada, and residents of another country who are visiting Canada temporarily are not covered by the census.

Long-form

The census long-form includes the same target population as the short-form census, with the exception of Canadian citizens living temporarily in other countries; full-time members of the Canadian Forces stationed outside Canada; persons living in institutional collective dwellings such as hospitals, nursing homes and penitentiaries; and persons living in non-institutional collective dwellings such as work camps, hotels and motels, and student residences.

Instrument design

Prior to each Census of Population, Statistics Canada undertakes a three to four-year process to review content by consulting with users of data, testing, and developing the questionnaire to ensure the content reflects changes in Canadian society. Factors considered in developing content include legislative requirements for information, program and policy needs, the burden on the respondent in answering the questions, privacy concerns, input from consultations and testing, data quality, costs and operational considerations, historical comparability, and the availability of alternate data sources.

Statistics Canada held content consultations on the census questionnaires, which included receiving submissions, meeting and having conference calls with various data users, such as federal government departments and agencies, provincial and territorial government departments, local governments, the general public, libraries, academia, special interest groups, the private sector and licensed distributors of census data. Statistics Canada launched the 2016 Census Strategy Project in December 2010 and consultations for the 2016 Census content began in September 2012.

Statistics Canada tests the questionnaire content both quantitatively and qualitatively. The qualitative testing was conducted with the help of Statistics Canada's Questionnaire Design Resource Center (QDRC), using questionnaires that were based on the 2011 Census questions. The questionnaires were tested with the QDRC in June 2013, September 2013 and January 2014. Furthermore, the questionnaires were quantitatively tested in May 2014, and a field procedures test was conducted with the questionnaires in October 2014.

In accordance with the Statistics Act, the questions for the Census of Population were prescribed by the Governor in Council through an Order in Council signed January 29, 2016. Questions following Question 10 of the questionnaire constitute the 2016 Census of Population long-form and will be asked pursuant to section 22 of the Statistics Act, and prescribed by the Chief Statistician acting under the direction of the Minister. The Order in Council and the schedule of questions were published in the Canada Gazette on February 6, 2016.

Sampling

For the short-form:

This survey is a census with a cross-sectional design.

Questions on age, sex, marital status, mother tongue and relationship to Person 1 are collected from 100% of the population. Data are collected for all units (dwellings) of the target population, therefore no sampling is done.

For the long-form:

This is a sample survey with a cross-sectional design.

For the census long-form, a random sample of 1 in 4 private dwellings in Canada is selected systematically. The sample size was determined to ensure the dissemination of reliable estimates for small areas and small populations. The long-form sample is selected from the 2016 Census of Population dwelling list.
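The 1-in-4 systematic selection described above can be illustrated with a minimal sketch. It assumes the dwelling list is a simple ordered Python list and that selection uses a random start followed by a fixed interval; the function name, the dwelling identifiers and the seed are hypothetical, and the actual census procedure may differ (for example, in how the list is ordered or stratified).

```python
import random


def systematic_sample(dwellings, interval=4, seed=None):
    """Select every `interval`-th dwelling after a random start.

    Assumption: `dwellings` is an ordered list; a 1-in-4 sample
    corresponds to interval=4.
    """
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random start between 0 and interval - 1
    return dwellings[start::interval]


# Hypothetical dwelling identifiers, for illustration only.
dwelling_list = [f"D{i:04d}" for i in range(1, 21)]
sample = systematic_sample(dwelling_list, interval=4, seed=2016)
```

With 20 dwellings and an interval of 4, the sample always contains 5 dwellings, whichever start is drawn.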

Data sources

Data collection for this reference period: 2016-02-02 to 2016-07-29

Responding to this survey is mandatory.

Data are collected directly from survey respondents and extracted from administrative files.

Collection for the 2016 Census of Population included response by Internet, paper, the Census Help Line, canvasser, and failed-edit and non-response follow-up. The content of the 2016 Census of Population questionnaires (short and long forms) is similar to that of the 2011 Census of Population, with two exceptions: income information is collected through administrative data rather than through questions, and the religion question (asked every 10 years) is omitted.
Statistics Canada obtained income information from personal income tax and benefits records.

1) To reduce response burden and to increase the quality and quantity of income statistics data available, the 2016 Census of Population Program gathered income information solely from administrative data sources rather than asking respondents directly.

For statistical purposes, respondents' information on income, income taxes, contributions to registered savings plans and selected expenditures was extracted from their personal income tax and benefits records (including the T1 income tax return, various information slips held by CRA and CCTB and GST credit programs) and added to their responses to the Census of Population (short and long forms).

From the Canada Revenue Agency (CRA):
T1 Personal Master, 2014 and 2015
Canada Child Tax Benefit files, 2014-2015 and 2015-2016 program years
GST Credit files, 2014-2015, 2015-2016 program years
T4 Statement of Earnings, 2014 and 2015
TFSA contribution and withdrawal slips, 2014 and 2015
Miscellaneous Income Statement and Payroll Deduction slips (MISP), 2015
(includes T4A, T4ANR, T4OAS, T4RIF, T4RSP, T5007, T5, T4NR, T4A(P) - CPP and QPP, T5018, T5008, T1204, RRSP contributions, SAFER)
T3 Master File, Schedule 9 File and Information Slips, 2015
T5013 Partnership slips, 2015
T4E Statement of Employment Insurance Income, 2015

These files were acquired under section 24 of the Statistics Act. Versions current as of December 2016 were used where possible.

The data processing of each of these files was very similar and was performed in two broad phases: the linkage of the administrative records and the adaptation of administrative concepts to statistical concepts.

The linkage phase required selecting personal identifiers available in the file and cleaning and normalising them to permit comparison to a similarly processed version of the identifiers on the Census Response Database. Where administrative file personal identifiers were less complete (information slips in particular) but a Social Insurance Number (SIN) was available, the date of birth and full names were retrieved from an index to complete the information and facilitate the linkage. A certain tolerance was included in the statistical linkage process that allowed for matches, even if some fields did not correspond exactly on both sources.
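As a rough illustration of the cleaning, normalising and tolerance ideas in the linkage phase, the sketch below compares two records field by field after normalisation and accepts a match when most fields agree. The field names, the agreement threshold and the simple agreement count are assumptions made for this example; the actual statistical linkage method is not specified here and is likely more sophisticated.

```python
import unicodedata


def normalise(value):
    """Remove accents, punctuation, spacing and case differences."""
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return "".join(ch for ch in text.upper() if ch.isalnum())


def is_match(admin, census,
             fields=("surname", "given_name", "birth_date", "postal_code"),
             min_agree=3):
    """Accept a link when at least `min_agree` normalised fields agree.

    Assumption: a plain agreement count stands in for the tolerance
    built into the real statistical linkage process.
    """
    agree = sum(normalise(admin[f]) == normalise(census[f]) for f in fields)
    return agree >= min_agree


# Hypothetical records, for illustration only.
admin_record = {"surname": "Côté", "given_name": "Jean-Luc",
                "birth_date": "1980-05-10", "postal_code": "K1A 0T6"}
census_record = {"surname": "COTE", "given_name": "Jean Luc",
                 "birth_date": "1980-05-10", "postal_code": "K1A0T7"}
linked = is_match(admin_record, census_record)
```

Here three of the four fields agree after normalisation, so the pair is linked even though the postal codes differ.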

The concepts required to administer the Income Tax Act are not always aligned with the income concept standards used by Statistics Canada. Where possible, dollar amounts from tax return lines or information slip boxes were aggregated or subtracted to align more closely to the concepts used for statistical purposes. In particular, non-taxable income sources were included in the total income concept and certain taxable amounts such as capital gains were excluded.

Additionally, some edits of implausible values in certain fields were carried out during edit and imputation based on general coherence with other fields.

Only the income information, some basic demographics and a linkage accuracy measure were added to the 2016 Census edit and imputation database. No personal identifiers (i.e., no names, civic addresses, or telephone numbers) were retained on this database and access is restricted to Statistics Canada employees whose assigned work activities require such access.

Only aggregate statistical estimates and analyses conforming to the confidentiality provisions of the Statistics Act are released outside of Statistics Canada. Outputs for the census include a wide range of analysis and standard data tables, as well as custom tabulations.

2) As described above, an administrative index was used to complete the available personal identifiers, to permit linkage for records with information slips only (without date of birth, name or proper address), and to ensure that persons holding multiple SINs were properly covered.

From Employment and Social Development Canada (ESDC):
Extract from the Social Insurance Registry, 2016

Acquired under section 13 of the Statistics Act. The file version was from September 2016.

3) While not planned for before the census, additional uses of administrative data were required to compensate for collection difficulties in areas affected by forest fires in Alberta. These uses have been described in Appendix 1.4 of the Guide to the Census of Population, 2016 (http://www12.statcan.gc.ca/census-recensement/2016/ref/98-304/app-ann1-4-eng.cfm).

The data integration in the census for the income variables is a straightforward data replacement technique. Administrative income data is obtained for specific income fields for the subset of respondents identified in each administrative source.

If conflicting values are supplied on different files, priority is usually given to the T1 tax and benefit return value rather than the information slip value, which is often left untouched and unedited at the Canada Revenue Agency.
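The priority rule can be sketched as a simple fallback. This is only an illustration: the function name is hypothetical, `None` is assumed to mean that a source has no value for the field, and the "usual" priority is treated here as if it always applied.

```python
def resolve_income_field(t1_value, slip_value):
    """Prefer the T1 return value; fall back to the information slip.

    Returns None when neither source has a value, in which case the
    field would be left for imputation.
    """
    if t1_value is not None:
        return t1_value
    return slip_value


# Hypothetical amounts, for illustration only.
wages = resolve_income_field(t1_value=52000, slip_value=51800)
pension = resolve_income_field(t1_value=None, slip_value=7300)
interest = resolve_income_field(t1_value=None, slip_value=None)
```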

For records with less complete administrative information - if the respondent is not found in administrative records or if he or she had not filed an income tax return - the missing fields are imputed based on the values from a donor record identified using other available information.

Error detection

Statistics Canada's Data Operations Centre (DOC) was the central reception and storage point for electronic and printed questionnaires. Electronic questionnaires were transmitted directly to the DOC's servers, and printed questionnaires were scanned and stored as images.

A set of automated checks was conducted on the images to ensure that each one was suitable for data capture. Failed images were sent for review by clerks, and in cases where the image was deemed unsuitable for capture, the form was sent for re-scanning or transcription.

The data were then captured off the images at an automated step using optical mark recognition (OMR) and optical character recognition (OCR). Responses not successfully captured by OMR or OCR due to problems such as unclear handwriting were captured manually by keying operators. A number of edits were applied to both the automated and manual keying results to ensure the values met expectations, such as consistency among fields, numeric fields falling within a certain range, etc. Failures at the automated capture step were sent to be keyed, while failures at the keying step were brought to the keyer's attention for correction.
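The kinds of capture edits mentioned above (range checks and consistency among fields) might look like the following sketch. The field names, the age bounds and the specific consistency rule are hypothetical and chosen only to illustrate the two types of check; they are not the actual census edit rules.

```python
def capture_edit_failures(record, min_age=0, max_age=125):
    """Return a list of edit failures for one captured person record."""
    failures = []
    age = record.get("age")
    if not isinstance(age, int) or not (min_age <= age <= max_age):
        # Range edit: numeric field outside the expected interval.
        failures.append("age_out_of_range")
    elif age < 15 and record.get("marital_status") not in (None, "never_married"):
        # Consistency edit: two fields that should agree do not.
        failures.append("age_marital_status_inconsistent")
    return failures


# Hypothetical captured records, for illustration only.
clean = capture_edit_failures({"age": 34, "marital_status": "married"})
bad_range = capture_edit_failures({"age": 340, "marital_status": "married"})
inconsistent = capture_edit_failures({"age": 9, "marital_status": "married"})
```

A record with a non-empty failure list would be routed to keying or brought to the keyer's attention, as described above.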

Following data capture, all responses were subject to a set of checks, called "coverage edits", to identify any problems affecting the count of persons in each household. This included duplicate persons or questionnaires, or false person data created by data capture or respondent error. Any errors found were corrected by automated processes where possible. The remainder were sent to an interactive system to be corrected by clerical operators.

After the coverage edits step, responses were checked to determine if the household required telephone follow-up to confirm that all usual residents had been enumerated, and to obtain any missing data. Households requiring follow-up included those who indicated they were unsure whether to include some household members on the questionnaire, or who did not provide data for all persons listed in the roster of household members.

Coding, the next stage of data processing, was also carried out at the DOC. All write-in responses were submitted to an automated coding system that assigned each response a numeric code using Statistics Canada reference files, code sets and standard classifications. When the system was unable to assign a code to a particular response, the response was coded manually by an operator. Coding was applied to the following variables: relationship to Person 1, place of birth, citizenship, non-official languages, home language, mother tongue, ethnic origin, population group, First Nation/Indian band, place of residence 1 year ago, place of residence 5 years ago, place of birth of parents, major field of study, location of study, language of work, industry, occupation and place of work.

At each of the data capture, coverage edits and coding steps, quality assurance (QA) activities are conducted to measure quality and provide ongoing feedback to the operations. The QA consists of re-processing samples of work (keyed fields, households, or coded write-ins) and comparing to the initial processing. Discrepancies are sent to be processed a third time to determine what the appropriate result should be. Daily reports containing accuracy rates and other processing statistics are distributed to the operations.

After coding, the data are processed through the final edit and imputation stage. The final edits detect invalid responses and inconsistencies. They are based on rules determined by Statistics Canada's subject-matter analysts. Unanswered questions are also identified. Imputation replaces these missing, invalid or inconsistent responses with plausible values.

Imputation

The data collected in any survey or census contain omissions or inconsistencies. These errors can be the result of respondents missing a question, or they can be due to errors generated during processing.

After the initial editing and coding operations were completed, the data were processed through the final edit and imputation activity. The final editing process detected errors and the imputation process corrected them.

The imputation system consists of two components: deterministic imputation and donor imputation. Deterministic imputation is done to correct systematic errors or errors that have only one solution based on subject matter experience. When many solutions are possible to solve an error, donor imputation is done. The latter method, also called nearest neighbour, is widely used in the treatment of non-response. It replaces missing, invalid or inconsistent information about one respondent with values from another, 'similar' respondent. The rules for identifying the respondent most similar to the non-respondent may vary with the variables to be imputed. Donor imputation methods have good properties and generally will not alter the distribution of the data, a drawback of many other imputation techniques. Nearest neighbour imputation makes sure that any imputed value is consistent with the values of other variables.
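A minimal sketch of nearest-neighbour donor imputation follows. It assumes numeric matching variables and an unweighted absolute-difference distance; the function name, variable names and records are hypothetical, and the production system uses rules that vary with the variables being imputed.

```python
def impute_from_nearest_donor(recipient, donors, match_vars, target):
    """Fill `target` in `recipient` with the value from the closest donor.

    Assumption: similarity is the sum of absolute differences over
    `match_vars`; only donors with a reported `target` are eligible.
    """
    def distance(donor):
        return sum(abs(donor[v] - recipient[v]) for v in match_vars)

    eligible = [d for d in donors if d.get(target) is not None]
    nearest = min(eligible, key=distance)
    return {**recipient, target: nearest[target]}


# Hypothetical records, for illustration only.
donors = [
    {"age": 30, "household_size": 2, "hours_worked": 40},
    {"age": 62, "household_size": 1, "hours_worked": 0},
    {"age": 33, "household_size": 3, "hours_worked": None},
]
recipient = {"age": 32, "household_size": 3, "hours_worked": None}
imputed = impute_from_nearest_donor(
    recipient, donors, match_vars=("age", "household_size"), target="hours_worked"
)
```

Because the imputed value is copied from a real record, it is a value that actually occurs in the data, which is why donor methods tend to preserve the distribution.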

Estimation

For the short form:

No weights are calculated to produce the estimates for the Census of Population short form. However, whole household imputation is performed to compensate for the cases of total non-response and the cases of under-coverage that were identified by the Dwelling Classification Survey (DCS).

One of the sources of coverage error in the census is the misclassification of dwellings on Census Day. This error can occur when an occupied dwelling is classified as unoccupied, or when an unoccupied dwelling is classified as occupied. The purpose of the DCS is to produce estimates of the number of these classification errors. A sample of dwellings for which no census questionnaire was returned is contacted, information is collected on the occupancy status and, if occupied, on the number of usual residents.

For the long form:

A weight is produced and associated with each responding household of the long-form sample. Persons, census families and economic families inherit the weight of their household. This weight is used to estimate population parameters.

A weight is first calculated for all the dwellings of the sample based on the sample design as the inverse of the probability of selection. In remote areas and Indian reserves, where the long form is used to collect the information of all households, this is the final weight and data imputation is performed for all cases of total and partial non-response.

In the rest of the country, the sample weight of responding households is first adjusted for coverage, as evaluated by the DCS. The resulting weight is then adjusted for non-response with the help of a model that predicts the probability of response as a function of the variables derived from the short form and from matched administrative data (individual taxation data, Indian Register and immigration data). The weight of non-responding households is set to zero. The weight adjusted for non-response is then benchmarked for various geographic levels to a large number of Census totals, as well as to totals derived from the previously mentioned administrative data that were matched to the Census records. Calibration totals include households that were imputed based on the results of the DCS.
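The first weighting steps can be sketched as follows, under stated assumptions: the design weight is the inverse of the selection probability, the coverage adjustment is a multiplicative factor, and the non-response adjustment divides by the modelled response probability (a common approach, though the exact form used is not given here). The function and parameter names are hypothetical, and the final benchmarking to census totals is not shown.

```python
def long_form_weight(selection_prob, responded, response_prob,
                     coverage_factor=1.0):
    """Weight of a sampled household before calibration.

    Assumptions: `coverage_factor` stands in for the DCS-based coverage
    adjustment, and `response_prob` is the model-predicted probability
    of response.
    """
    if not responded:
        return 0.0  # non-responding households receive a weight of zero
    design_weight = 1.0 / selection_prob  # inverse of selection probability
    return design_weight * coverage_factor / response_prob


# Hypothetical values: 1-in-4 sampling and an 80% predicted response probability.
respondent_weight = long_form_weight(0.25, responded=True, response_prob=0.8)
nonrespondent_weight = long_form_weight(0.25, responded=False, response_prob=0.8)
```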

For the estimation of the variability due to sampling and to total non-response, replication weights are used. The variance is estimated using a method based on balanced repeated replication.
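As an illustration of how replication weights yield a variance estimate, the sketch below applies the textbook balanced repeated replication formula: the mean squared deviation of the replicate estimates from the full-sample estimate. The census method is described only as "based on" balanced repeated replication, so any scaling factors it applies are not reflected here; the function name and figures are hypothetical.

```python
import math


def replicate_variance(full_estimate, replicate_estimates):
    """Mean squared deviation of replicate estimates from the full estimate."""
    r = len(replicate_estimates)
    return sum((est - full_estimate) ** 2 for est in replicate_estimates) / r


# Hypothetical estimates, for illustration only.
full = 100.0
replicates = [98.0, 102.0, 101.0, 99.0]
variance = replicate_variance(full, replicates)
standard_error = math.sqrt(variance)
```

Each replicate estimate would be computed by re-running the same estimator with one set of replication weights in place of the final weights.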

Quality evaluation

Data quality assessment provides an evaluation of the overall quality of census data. The results of this assessment are used to inform users of the reliability of the data, to make improvements for the next census, to adjust census data for non-response and, for two coverage studies (reverse record check and the Census Overcoverage Study), to produce official population estimates. Quality assessment activities take place throughout the census process, beginning prior to data collection and ending after dissemination.

Sources of error:

However well a census is designed, the data collected will inevitably contain errors. Errors can occur at virtually every stage of the census process, from material preparation to creation of the list of dwellings, data collection and processing. Census data users should be aware of the types of errors that can occur, so they can assess the usefulness of the data for their own purposes.

Main types of errors:

Coverage errors occur when dwellings and/or persons are missed, incorrectly enumerated or counted more than once.

Non-response errors occur when some or all information about individuals, households or dwellings is not provided.

Response errors occur when a question is misunderstood or a characteristic is misreported by the respondent, the census enumerator or the Census Help Line operator.

Processing errors can occur at any stage of processing. Processing errors include errors that can be made:
1) at data capture,
2) during coding operations when written responses are converted into numerical codes, and
3) during imputation, when valid (but not necessarily accurate) values are inserted into a record to replace missing or invalid data.

Sampling errors apply only when answers to questions are obtained from a sample. This type of error applies only to the 2016 Census long-form questionnaire.

Measuring data quality:

Many data quality studies have been conducted for recent censuses to allow data users to assess the impact of errors and improve their own understanding of how errors occur. For the 2016 Census, special studies examine errors in coverage and data quality, i.e., non-response, response and processing.

Three studies are conducted to measure coverage errors:

(1) Dwelling Classification Survey - One of the sources of coverage error in the census is the misclassification of dwellings on Census Day. This error can occur when an occupied dwelling is classified as unoccupied, or when an unoccupied dwelling is classified as occupied. The purpose of the Dwelling Classification Survey is to produce estimates of the number of these classification errors. A sample of dwellings for which no census questionnaire was returned is contacted, information is collected on the occupancy status and, if occupied, on the number of usual residents.

The DCS estimates are used to adjust the Census household size distribution through imputation of dwellings that did not return a questionnaire. This is done in time for the initial population count release.

(2) Reverse Record Check - This study provides estimates of persons missed by the census (after accounting for the adjustments described in the Dwelling Classification Survey above). Estimates are developed for each province and territory and for various population subgroups (e.g., age-sex groups and marital status).

For the provinces, this study comprises two steps:

o Step 1: Selecting a sample of persons who should have been enumerated in the census, using sources such as the previous census, birth registrations, immigration and non-permanent residents' records, and the sample of persons missed in the Reverse Record Check from the previous census.

o Step 2: Linking persons selected in Step 1 to the Census Response Database (CRD) to determine whether these persons were enumerated. The survey is then used to trace and interview persons who could not be linked with certainty to the CRD in order to collect additional information. Persons who have died or who emigrated prior to Census Day are identified using administrative records, such as the death register, or during tracing or the interviews.

For the territories, Step 1 consists of linking the persons on health insurance records to the Census Response Database to identify persons who were enumerated in the census. The Reverse Record Check sample is then selected among the unmatched persons.

The results of the Reverse Record Check are the most important source of information about persons missed in the census. However, unlike the Dwelling Classification Survey, the estimates are not used to adjust census data before the initial population count release.

(3) Census Overcoverage Study - In the 2011 and 2016 censuses, double-counting of persons is determined by searching for linked records that have a high degree of matching on sex, date of birth and name. Linked records are sampled and checked manually, and results are used to estimate the census overcoverage (or the number of duplicate persons).

When combined with the results of the Reverse Record Check, the results of the Census Overcoverage Study provide estimates of net coverage error in census data. This net error is used to calculate the official population estimates.
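The arithmetic behind net coverage error can be sketched as persons missed minus persons counted more than once. The sign convention, the function names and all figures below are assumptions for illustration; official population estimates involve further adjustments that are not shown.

```python
def net_undercoverage(missed_persons, duplicate_persons):
    """Persons missed (undercoverage) minus persons double-counted (overcoverage)."""
    return missed_persons - duplicate_persons


def coverage_adjusted_count(census_count, missed_persons, duplicate_persons):
    """Census count corrected for net coverage error only."""
    return census_count + net_undercoverage(missed_persons, duplicate_persons)


# Invented figures, for illustration only.
net_error = net_undercoverage(missed_persons=40_000, duplicate_persons=15_000)
adjusted = coverage_adjusted_count(1_000_000, 40_000, 15_000)
```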

Certification:

Certification consists of several activities to rigorously assess the quality of census data at specific levels of geography in order to ensure that the quality standards for public release are met. This evaluation includes the certification of population and dwelling counts, and variables related to dwelling and population characteristics.

During certification, response rates, invalid responses, edit failure rates, and a comparison of data before and after imputation are among the data quality measures used. Tabulations for the 2016 Census are produced and compared with corresponding data from past censuses, other surveys and administrative sources. Detailed cross-tabulations are also checked for consistency and accuracy.

For more information on the quality indicators and certification results, see the reference guides for the various domains of interest.

Disclosure control

Statistics Canada is prohibited by law from releasing any information it collects which could identify any person, business, or organization, unless consent has been given by the respondent or as permitted by the Statistics Act. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.

Depending on the certification results, census data can be released in one of three ways:
- First, the data may be released unconditionally, meaning that the data are of suitable quality.
- Second, the data may be released conditionally or with restrictions. In this case, the data will be released with a special note alerting users to possible limitations, or the data may be specially processed, for example, by combining reporting categories to address quality or confidentiality concerns.
- Finally, the data may be suppressed for quality reasons.

Published census data go through a variety of automated and manual processes to determine whether the data need to be suppressed. This is done primarily for two reasons: (1) to ensure that the identity and characteristics of respondents are not disclosed (which will subsequently be referred to as confidentiality) and (2) to limit the dissemination of data of unacceptable quality (which will subsequently be referred to as data quality).

Overview of suppression for confidentiality reasons

Confidentiality refers to the assurance that Statistics Canada will not disclose any information that could identify respondents. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data. Consequently, geographic areas with a population below a certain threshold are not published.

Random rounding

All counts in census tabulations undergo random rounding, a process that transforms all raw counts into randomly rounded counts. This reduces the possibility of identifying individuals in the tabulations.
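One common form of random rounding is sketched below. It assumes a rounding base of 5 and unbiased rounding probabilities (a count is rounded up with probability equal to its remainder divided by the base); neither detail is stated in this text, so both should be read as assumptions, and the function name is hypothetical.

```python
import random


def random_round(count, base=5, rng=random):
    """Randomly round `count` up or down to a multiple of `base`.

    Rounding up with probability remainder/base keeps the expected
    value of the rounded count equal to the raw count.
    """
    remainder = count % base
    if remainder == 0:
        return count  # already a multiple of the base
    lower = count - remainder
    if rng.random() < remainder / base:
        return lower + base
    return lower


# A raw count of 13 becomes either 10 or 15; 20 is left unchanged.
rounded = random_round(13, rng=random.Random(2016))
unchanged = random_round(20)
```

Because each cell is rounded independently, totals in a table may not equal the sum of their rounded components.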

Preventing disclosure

The risk of direct or residual disclosure must also be addressed when determining product content. A number of factors must be considered when assessing the risk of disclosure. The detail of individual variables, cross-classification of variables and the geographic level of the data will all contribute to the level of risk. For example, there may be no risk in producing tables with the number of persons in the dwelling and detailed groupings of age by various characteristics of the household members for large geographic areas. However, the risk of disclosure would increase for lower levels of geography.

Area suppression for standard and non-standard geographic areas

Area suppression is used to remove all characteristic data for geographic areas whose population size is below a certain threshold. The population size threshold for all standard areas or aggregations of standard areas is 40, except for blocks, blockfaces and postal code-defined areas. Consequently, no characteristics or tabulated data are released if the total population of the area is less than 40. However, for six-character postal code areas, areas built from the block or blockface, the population size threshold is 100. These population size thresholds are applied to 2016 Census data as well as to all previous census data.
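The two population thresholds can be expressed as a short check. The thresholds of 40 and 100 come from the paragraph above; the function name and the boolean flag are hypothetical.

```python
STANDARD_THRESHOLD = 40     # standard areas and aggregations of standard areas
SMALL_AREA_THRESHOLD = 100  # six-character postal code areas, areas built from blocks or blockfaces


def is_area_suppressed(population, built_from_blocks_or_postal_codes=False):
    """Return True when characteristic data for the area would be suppressed."""
    if built_from_blocks_or_postal_codes:
        threshold = SMALL_AREA_THRESHOLD
    else:
        threshold = STANDARD_THRESHOLD
    return population < threshold


# Hypothetical areas, for illustration only.
small_municipality = is_area_suppressed(39)
postal_code_area = is_area_suppressed(75, built_from_blocks_or_postal_codes=True)
```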

Revisions and seasonal adjustment

This methodology type does not apply to this statistical program.

Data accuracy

The census long-form estimates come from a sample survey and are thus subject to sampling error and non-sampling error. The census short-form estimates come from a census and are thus only subject to non-sampling error.

Non-sampling errors include coverage, non-response, response and processing errors. They can happen during collection or processing operations, despite all efforts made to minimise them.

Sampling errors of long-form estimates are measured with the standard error. The standard errors that are calculated and disseminated estimate the variability due to sampling as well as the variability due to total non-response of sampled households. Variability due to total non-response is taken into account because it can represent a significant portion of the total variability of the estimates since the sampling fraction is large.

Response rates:

One of the key data quality measures used for the Census of Population is the response rate. Table 10.1 in the Guide to the Census of Population, 2016 (http://www12.statcan.gc.ca/census-recensement/2016/ref/98-304/chap10-eng.cfm) shows the response rate for the 2016 Census of Population nationally, and for each province and territory. The rates are provided for the short-form and the long-form together, and for the long-form only.

Table 7 of the Income Reference Guide (http://www12.statcan.gc.ca/census-recensement/2016/ref/guides/004/98-500-x2016004-eng.cfm) provides the linkage rates to administrative sources nationally, and for each province and territory. The data quality section also has a description of the Income data quality indicator flags present in all the data products to show the amount of data not obtained from administrative sources for each geographic area. Separate data quality indicator flags are provided for products produced from the full census and those from the long-form sample estimates.
