General Social Survey - Social Identity (SI)
Detailed information for 2020 (Cycle 35)
Every 5 years
The two primary objectives of the General Social Survey are to gather data on social trends in order to monitor changes in the living conditions and well-being of Canadians over time, and to provide information on specific social policy issues of current or emerging interest.
Data release - September 28, 2021 (First in a series of releases for this reference period.)
The main objective of the General Social Survey on Social Identity is to provide an overall picture of Canadians' identification, attachment, belonging and pride in their social and cultural environment. The key components of the survey include the following topics: social networks, civic participation and engagement, knowledge of Canadian history, appreciation of national symbols, shared values, confidence in institutions, and trust in people. The survey also covers people's possible experiences of discrimination before and during the COVID-19 pandemic.
This record is part of the General Social Survey (GSS) program. The GSS originated in 1985. Each survey contains a core topic, focus or exploratory questions and a standard set of socio-demographic questions used for classification. More recent cycles have also included some qualitative questions, which explore intentions and perceptions.
- Social networks and civic participation
- Society and community
Data sources and methodology
The target population for the 2020 General Social Survey is all persons 15 years of age or older living in the 10 provinces of Canada, excluding residents of institutions and residents of First Nations reserves.
The questionnaire was designed based on research and extensive consultations with data users. Qualitative testing was carried out by Statistics Canada's Questionnaire Design Resource Center (QDRC) with respondents in Ottawa who were screened in based on representative criteria. The testing highlighted questions that worked well and others that needed clarification or redesign. QDRC staff compiled a detailed report of the results along with their recommendations. All comments and feedback from qualitative testing were carefully considered and incorporated into the survey whenever possible.
This is a sample survey with a cross-sectional design.
The frame for the regular sample of the survey is the Dwelling Universe File (DUF).
The regular sample is based on a stratified design employing probability sampling. The stratification is done at the province level. Information is collected from one randomly selected household member aged 15 or older, and proxy responses are not permitted.
The 2020 General Social Survey also had an oversample of selected population groups designated as visible minorities. The oversample was drawn from a frame created using the 2016 long-form Census of Population with updated address information from administrative information available to Statistics Canada.
GSS uses a two-stage sampling design. The first-stage sampling units are households. The final-stage units are individuals within the selected households. Note that GSS selects only one eligible person per household to be interviewed.
The 2020 GSS frame is stratified by province and household size composition and a simple random sample of dwellings is selected independently within each province.
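The independent selection within each province can be sketched as follows. This is a minimal illustration, not the procedure used for the GSS: it stratifies by province only (ignoring household size composition), and the function name `stratified_sample`, the dwelling identifiers and the allocation figures are all hypothetical.

```python
import random


def stratified_sample(frame, allocation, seed=0):
    """Draw an independent simple random sample of dwellings within each province.

    frame      -- iterable of (dwelling_id, province) pairs (hypothetical layout)
    allocation -- dict mapping province to the number of dwellings to select
    """
    rng = random.Random(seed)
    by_province = {}
    for dwelling_id, province in frame:
        by_province.setdefault(province, []).append(dwelling_id)
    # Sampling without replacement, done separately for each stratum.
    return {
        province: rng.sample(dwellings, allocation[province])
        for province, dwellings in by_province.items()
    }


# Toy frame: 100 dwellings in one province and 20 in another.
frame = [(f"ON-{i}", "ON") for i in range(100)] + [(f"PE-{i}", "PE") for i in range(20)]
allocation = {"ON": 10, "PE": 5}
sample = stratified_sample(frame, allocation)
```

Allocating a fixed number of dwellings to each stratum, rather than sampling the whole frame at once, is what allows a small province to receive enough sample for provincial estimates.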
Sampling and sub-sampling
Sufficient sample was allocated to each of the provinces so that the survey could produce provincial and national level estimates.
For the survey, a single eligible member of each sampled household is selected by the age-order selection method to complete the questionnaire.
A field sample of approximately 47,000 dwellings was selected for the regular sample. A completion of 20,000 questionnaires was expected. An additional field sample of 40,000 dwellings was selected for the oversample of selected population groups designated as visible minorities. A completion of 10,000 questionnaires was expected.
Data collection for this reference period: 2020-08-17 to 2021-02-07
Responding to this survey is voluntary.
Data are collected directly from survey respondents.
Data are collected using computer-assisted telephone interviewing and an electronic questionnaire. First contact is made either by telephone or by an invitation letter in the mail, which provides a link and access code for completing the survey electronically. A non-responding household may receive up to three mail reminders and one SMS reminder before being contacted by a Statistics Canada interviewer to complete the questionnaire over the telephone. No proxy reporting is allowed. Respondents can choose between French and English. Interviews last approximately 45 minutes.
Tax derived files (CSDD environment).
Questions relating to income show rather high non-response rates, and the incomes reported by respondents are usually rough estimates. Linkage makes it possible to obtain this information without having to ask the questions.
The information collected during the 2020 GSS (Cycle 35) has been linked to the personal tax records (T1, T1FF or T4) of respondents. The information collected has also been linked to the Longitudinal Immigration Database. Household information (address, postal code, and telephone number) and respondent's information (social insurance number, surname, name, date of birth, sex) are key variables for the linkage.
Respondents were notified of the planned linkage before and during the survey. Any respondents who objected to the linkage of their data had their objections recorded, and no linkage to their tax data took place.
View the Questionnaire(s) and reporting guide(s).
Processing used the Social Survey Processing Environment (SSPE) set of generalized processing steps and utilities to allow subject matter and survey support staff to specify and run the processing of the survey in a timely fashion with high quality outputs.
It used a structured environment to monitor the processing of data ensuring best practices and harmonized business processes were followed.
Edits were performed automatically and manually at various stages of processing at macro and micro levels. They included family, consistency, and flow edits. Family relationships were checked to ensure the integrity of matrix data. A series of checks were done to ensure the consistency of survey data. An example was to check the respondent age against the respondent birth date. Flow edits were used to ensure respondents followed the correct path and fix off-path situations. Error detection was done through edits programmed into the system.
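A range edit and the age-versus-birth-date consistency edit mentioned above might look like the following sketch. The record layout (`age`, `birth_date`), the function names and the failure messages are hypothetical and do not reflect the actual processing system.

```python
from datetime import date


def age_on(birth_date: date, reference_date: date) -> int:
    """Age in completed years on the reference date."""
    years = reference_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (reference_date.month, reference_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def edit_record(record: dict, interview_date: date) -> list:
    """Return the list of edit failures for one record (empty if it passes)."""
    failures = []
    # Range edit: the target population is aged 15 or older.
    if record["age"] < 15:
        failures.append("range: age below 15")
    # Consistency edit: reported age must agree with the reported birth date.
    if record["age"] != age_on(record["birth_date"], interview_date):
        failures.append("consistency: age does not match birth date")
    return failures


interview = date(2020, 9, 1)
passed = edit_record({"age": 40, "birth_date": date(1980, 3, 15)}, interview)
failed = edit_record({"age": 35, "birth_date": date(1980, 3, 15)}, interview)
```

Returning a list of failures instead of stopping at the first one lets a record with several problems be reviewed in a single pass.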
The data capture program allowed only a valid range of codes for each question, included built-in edits, and automatically followed the flow of the questionnaire.
All survey records were subjected to computer edits throughout the course of the interview. The system principally edited the flow of the questionnaire and identified out of range values. As a result, such problems were immediately resolved with the respondent. If the interviewer was unable to correctly resolve the detected errors, the interviewer bypassed the edit and forwarded the data to head office for resolution. All interviewer comments were reviewed and taken into account by head office editing.
Head office performed the same checks as the system as well as the more detailed edits discussed previously.
In 2020, personal income questions were not asked as part of the survey. Income information has been obtained instead through a linkage to tax data for respondents who did not object to this linkage. Income information was obtained from the 2019 T1 Family File (T1FF). Missing information for all other respondents has been imputed. Since GSS 2016, the family income (i.e., linking directly to a variable on the T1FF that corresponds to the census family income) is used instead of the household income.
When a probability sample is used, as was the case for this survey, the principle behind estimation is that each person selected in the sample represents (in addition to himself/herself) several other persons not in the sample. For example, in a simple random sample of 2% of the population, each person in the sample represents 50 persons in the population (himself/herself and 49 others). The number of persons represented by a given respondent is usually known as the weight or weighting factor.
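The arithmetic in this example can be written out as a short sketch. It assumes an equal-probability simple random sample, which is simpler than the actual GSS design; the function name `design_weight` and the population size are hypothetical.

```python
def design_weight(sampling_fraction: float) -> float:
    """Return the weight for an equal-probability sample.

    Each sampled person represents 1 / sampling_fraction persons
    in the population, themselves included.
    """
    if not 0 < sampling_fraction <= 1:
        raise ValueError("sampling_fraction must be in (0, 1]")
    return 1.0 / sampling_fraction


# A 2% simple random sample, as in the example above.
weight = design_weight(0.02)

# Hypothetical population of 1,000 persons, so the sample holds 20 persons.
sample_size = 20
estimated_population = weight * sample_size
```

Here `weight` is 50, and multiplying it by the 20 sampled persons recovers the assumed population of 1,000.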
The 2020 GSS is a survey of individuals and the analytic files contain questionnaire responses and associated information from the respondents.
A weighting factor is available on the microdata file:
WGHT_PER: This is the basic weighting factor for analysis at the person level, i.e. to calculate estimates of the number of persons in the target population having one or several given characteristics.
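As an illustration of how a person-level weight is used, the sketch below sums weights to estimate a population count and a proportion. Only the variable name WGHT_PER comes from the text; the records, the `volunteered` flag and the function names are hypothetical.

```python
# Hypothetical microdata records; the values are made up for illustration.
records = [
    {"WGHT_PER": 1200.0, "volunteered": True},
    {"WGHT_PER": 800.0, "volunteered": False},
    {"WGHT_PER": 1500.0, "volunteered": True},
    {"WGHT_PER": 500.0, "volunteered": False},
]


def weighted_count(rows, flag):
    """Estimated number of persons in the target population with the characteristic."""
    return sum(r["WGHT_PER"] for r in rows if r[flag])


def weighted_proportion(rows, flag):
    """Estimated share of the target population with the characteristic."""
    total = sum(r["WGHT_PER"] for r in rows)
    return weighted_count(rows, flag) / total


estimated_persons = weighted_count(records, "volunteered")     # 1200 + 1500 = 2700
estimated_share = weighted_proportion(records, "volunteered")  # 2700 / 4000 = 0.675
```

An unweighted count of these records would give 2 persons out of 4; the weighted figures are the ones that refer to the target population.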
In addition to the estimation weights, bootstrap weights have been created for the purpose of design-based variance estimation.
Estimates based on the survey data are also adjusted (by weighting) so that they are representative of the target population with regard to certain characteristics (each month we have independent estimates for various age-sex groups by province). To the extent that the characteristics are correlated with those independent estimates, this adjustment can improve the precision of estimates.
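One common way to make weighted totals agree with independent population estimates is post-stratification, sketched below. The text does not say which calibration method the GSS uses, so this is an assumed stand-in; the group labels, control totals and the function name `poststratify` are hypothetical.

```python
from collections import defaultdict


def poststratify(weights, groups, control_totals):
    """Scale weights so each group's weighted total matches its control total."""
    group_sums = defaultdict(float)
    for w, g in zip(weights, groups):
        group_sums[g] += w
    # Every weight in a group is multiplied by the same factor:
    # control total / current weighted total for that group.
    return [w * control_totals[g] / group_sums[g] for w, g in zip(weights, groups)]


weights = [100.0, 100.0, 200.0, 150.0]
groups = ["F 15-34", "F 15-34", "M 15-34", "M 15-34"]
control_totals = {"F 15-34": 300.0, "M 15-34": 280.0}

adjusted = poststratify(weights, groups, control_totals)
# adjusted is [150.0, 150.0, 160.0, 120.0]
```

After the adjustment, the weights in each group sum to the independent estimate for that group (300 and 280 here).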
While rigorous quality assurance mechanisms are applied across all steps of the statistical process, validation and scrutiny of the data by statisticians are the ultimate quality checks prior to dissemination. Many validation measures were implemented. They include:
a. Analysis of changes over time;
b. Verification of estimates through cross-tabulations;
c. Confrontation with other similar sources of data.
Statistics Canada is prohibited by law from releasing any information it collects that could identify any person, business, or organization, unless consent has been given by the respondent or as permitted by the Statistics Act. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.
Revisions and seasonal adjustment
This methodology does not apply to this survey.
As the data are based on a sample of persons, they are subject to sampling error. That is, estimates based on a sample will vary from sample to sample, and typically they will be different from the results that would have been obtained from a complete census. More precise estimates of the sampling variability of estimates can be produced with the bootstrap method using bootstrap weights that have been created for this survey. The bootstrap method was used to estimate the sampling variability for all of the estimates produced based on the data from the 2020 GSS. Estimates with high sampling variability are indicated in this publication and all of the highlighted differences between subgroups of the population are significant at the 95% level.
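The way replicate weights yield a variance estimate can be sketched as follows. This assumes standard bootstrap weights, where the variance is the average squared deviation of the replicate estimates from the full-sample estimate; any adjustment factor specific to the GSS bootstrap weights is not covered. The data, the replicate weights and the function names are hypothetical.

```python
import math


def weighted_proportion(values, weights):
    """Weighted share of records for which the value is true."""
    return sum(w for v, w in zip(values, weights) if v) / sum(weights)


def bootstrap_se(values, main_weights, replicate_weights):
    """Return the full-sample estimate and its bootstrap standard error."""
    theta = weighted_proportion(values, main_weights)
    reps = [weighted_proportion(values, rw) for rw in replicate_weights]
    # Average squared deviation of the replicate estimates from the main estimate.
    variance = sum((r - theta) ** 2 for r in reps) / len(reps)
    return theta, math.sqrt(variance)


values = [True, False, True, False]
main_weights = [100.0, 100.0, 100.0, 100.0]
replicate_weights = [
    [200.0, 0.0, 100.0, 100.0],
    [0.0, 200.0, 100.0, 100.0],
    [100.0, 100.0, 200.0, 0.0],
    [100.0, 100.0, 0.0, 200.0],
]

estimate, se = bootstrap_se(values, main_weights, replicate_weights)
# Approximate 95% confidence interval under a normal approximation.
low, high = estimate - 1.96 * se, estimate + 1.96 * se
```

With only four records and four replicates the interval is very wide; the toy data are there to show the mechanics, not a realistic level of precision.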
The overall response rate is 40.3% (43.5% for the regular sample and 36.7% for the oversample of selected population groups designated as visible minorities).
Common sources of non-sampling errors are imperfect coverage and non-response. Coverage errors (or imperfect coverage) arise when there are differences between the target population and the surveyed population. Households without telephones, as well as households with telephone services not covered by the current frame, represent a part of the target population that was excluded from the surveyed population. To the extent that the excluded population differs from the rest of the target population, the results may be biased. In general, since these exclusions are small, one would expect the biases introduced to be small. Non-response could occur at either of the two stages of information collection in this survey: the household level and the individual level. Survey estimates are adjusted (i.e. weighted) to account for non-response cases. Other types of non-sampling errors can include response errors and processing errors.
The main method used to reduce nonresponse bias involved a series of adjustments to the survey weights to account for nonresponse as much as possible. Information was extracted from administrative sources and used to model and adjust nonresponse.
The frame for the regular sample was the Dwelling Universe File (DUF), a file produced at Statistics Canada. All respondents in the ten provinces were interviewed by telephone or self-completed an electronic questionnaire. Dwellings that were identified as vacant at the time the sampling frame was created were excluded. Dwellings that had neither a mailing address nor an associated telephone number were also excluded from the sample frame, as they could not be contacted by any of the survey collection modes. However, the survey estimates were weighted to include persons living in these dwellings.
As the frame for the oversample was created using the 2016 long-form Census of Population, neither Census non-respondents nor recent residents of Canada could be sampled for the oversample. Given the very high response rate of the 2016 long-form Census of Population (97.8%), together with the use of the DUF for the regular sample, the scope of this imperfect coverage is minimal.
Other non-sampling errors:
For the 2020 GSS, significant effort was made to minimize bias by using a well-tested questionnaire, a proven methodology, specialized interviewers and strict quality control.
- The General Social Survey: An Overview
Last review: January 7, 2021