Canadian COVID-19 Antibody and Health Survey (CCAHS)
Detailed information for November 2020 to April 2021
The Canadian COVID-19 Antibody and Health Survey (CCAHS) is collecting key information relevant to the COVID-19 pandemic to learn as much as possible about the virus: how it affects overall health, how it spreads, and whether Canadians are developing antibodies against it.
Data release - July 6, 2021
The Canadian COVID-19 Antibody and Health Survey will collect information in two parts. The first part is an electronic questionnaire about general health and exposure to COVID-19. The second part is an at-home finger-prick blood test, which is sent to a lab to determine the presence of COVID-19 antibodies.
The data will be used to:
- estimate how many Canadians test positive for antibodies even if they have never had symptoms of COVID-19
- better understand the social distancing behaviours of Canadians and their general health during the pandemic.
This important information will help assess the health impacts of the COVID-19 pandemic in a representative sample of Canadians, such as the prevalence of infection, including among people who have never had symptoms. Through integration with health and social administrative data, the survey will also provide a platform to explore emerging public health issues, including the impact of COVID-19 on health and social well-being. It also aims to shed light on immune responses to SARS-CoV-2 in diverse communities, age groups, populations and occupational groups across the country.
Reference period: Varies according to the question (e.g. in the past six months, since March 1, 2020, in the past 20 years, etc.).
Collection period: November 2020 to April 2021
- Diseases and health conditions
- Lifestyle and social conditions
Data sources and methodology
The target population for the survey is persons 1 year of age and older living in the 10 provinces and the three territorial capitals.
The observed population excludes: persons living in the three territories outside of the capitals; persons living on reserves and other Aboriginal settlements in the provinces; full-time members of the Canadian Forces; the institutionalized population and residents of certain remote regions.
The content for the survey was developed by Statistics Canada's Centre for Population Health Data, with input from the COVID-19 Immunity Task Force (CITF) and in consultation with Health Canada and the Public Health Agency of Canada.
The survey takes place in two parts: an electronic questionnaire about general health and exposure to COVID-19, and an at-home finger-prick blood test, which is sent to a lab to determine the presence of COVID-19 antibodies.
The questionnaire and blood sample collection were tested through interviews in both of Canada's official languages, conducted by Statistics Canada's Questionnaire Design Resource Centre. The goal was to assess the questionnaire content and to evaluate the effectiveness of the instructions for self-administering the blood sample.
This is a sample survey with a cross-sectional design.
The Dwelling Universe File (DUF) is used to select dwellings for persons aged 25 and over. Multiple vintages of the Canadian Child Benefit (CCB) file and the 2016 Census are used to identify persons aged 1 to 24. Their contact information is then updated, where possible, using Canada Revenue Agency (CRA) databases.
The sample is a stratified random sample with two components: a targeted respondent sample and a household sample.
The following files are used as sampling frames and sources of contact information:
Dwelling Universe File (DUF)
Canadian Child Benefit File (CCB)
CRA databases for contact information
Given the heterogeneity of COVID-19 in the population, particularly by geography, sub-provincial strata were created and the sample was allocated across these strata.
In the provinces, 27 strata were created by first subdividing each province into CMA and non-CMA areas. The CMAs of St. John's, Halifax, Saint John, Montréal, Québec, Toronto, Ottawa, Hamilton, Winnipeg, Regina, Saskatoon, Calgary, Edmonton and Vancouver each form their own stratum (14 strata). Ontario, Québec and British Columbia each have one additional stratum that aggregates their remaining CMAs (3 strata). Finally, there are 10 non-CMA strata, one for each province. Each of the three territories forms one further stratum, for a total of 30 strata.
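The stratum count can be checked with a short Python sketch. The stratum labels, including the two-letter province and territory codes, are illustrative names chosen here and are not identifiers from the survey files.

```python
# The 14 CMAs that each form their own stratum.
CMA_STRATA = [
    "St. John's", "Halifax", "Saint John", "Montréal", "Québec", "Toronto",
    "Ottawa", "Hamilton", "Winnipeg", "Regina", "Saskatoon", "Calgary",
    "Edmonton", "Vancouver",
]
PROVINCES = ["NL", "PE", "NS", "NB", "QC", "ON", "MB", "SK", "AB", "BC"]
TERRITORIES = ["YT", "NT", "NU"]


def build_strata():
    """Return the list of strata and the number of provincial strata."""
    strata = list(CMA_STRATA)
    # One aggregated "remaining CMAs" stratum each for QC, ON and BC.
    strata += [f"{p} remaining CMAs" for p in ("QC", "ON", "BC")]
    # One non-CMA stratum per province.
    strata += [f"{p} non-CMA" for p in PROVINCES]
    n_provincial = len(strata)
    # Each territory is its own stratum.
    strata += TERRITORIES
    return strata, n_provincial


strata, n_provincial = build_strata()
print(n_provincial, len(strata))
```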
Typically, the population size of a stratum contributes to the sample size determination, with larger strata receiving more sample. This is balanced against the need to ensure that all strata receive enough sample to produce estimates. Increasing the sample in larger populations and in populations with more heterogeneity leads to more precise results at the national level. In this context, increasing the sample in large CMAs and in strata with more confirmed COVID-19 cases increases the precision of the national estimates. Statistical sample allocation formulae were adapted to this situation, using the population size and the proportion of confirmed COVID-19 cases for each stratum. Strata sample sizes were determined by a formula that favours larger population sizes and higher proportions of confirmed COVID-19 cases. The formula was then balanced to ensure that sufficient sample was allocated to smaller strata with fewer cases. The resulting allocation will facilitate analysis for the larger and hardest-hit strata, with the added benefit of more precise national results. Weighting that incorporates the sampling design will ensure that the final weighted sample is representative of the population.
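The exact CCAHS allocation formula is not given here. As a stand-in, the sketch below uses Neyman-style allocation, n_h ∝ N_h·√(p_h(1 − p_h)), where N_h is the stratum population and p_h its proportion of confirmed cases. For proportions below 0.5, this favours larger and harder-hit strata. A per-stratum minimum plays the role of the balancing step. The function name, the minimum and the example figures are hypothetical.

```python
import math


def neyman_with_floor(pop, prop_cases, total_n, min_n):
    """Allocate total_n across strata with n_h proportional to
    N_h * sqrt(p_h * (1 - p_h)), enforcing a minimum of min_n per stratum.
    Strata that fall below the minimum are fixed at min_n, and the remaining
    sample is re-allocated among the other strata until none is below it."""
    strata = list(pop)
    if min_n * len(strata) > total_n:
        raise ValueError("total_n is too small for the requested minimum")
    if not all(pop[h] > 0 and 0 < prop_cases[h] < 1 for h in strata):
        raise ValueError("populations must be positive and proportions in (0, 1)")
    size = {h: pop[h] * math.sqrt(prop_cases[h] * (1 - prop_cases[h])) for h in strata}
    fixed = set()
    while True:
        free = [h for h in strata if h not in fixed]
        remaining = total_n - min_n * len(fixed)
        denom = sum(size[h] for h in free)
        alloc = {h: remaining * size[h] / denom for h in free}
        below = [h for h in free if alloc[h] < min_n]
        if not below:
            break
        fixed.update(below)  # pin these strata at the minimum and re-allocate
    alloc.update({h: float(min_n) for h in fixed})
    return alloc


# Hypothetical strata: populations and proportions of confirmed cases.
pop = {"A": 2_000_000, "B": 500_000, "C": 50_000}
cases = {"A": 0.03, "B": 0.01, "C": 0.002}
alloc = neyman_with_floor(pop, cases, total_n=10_000, min_n=1_000)
print({h: round(n) for h, n in alloc.items()})
```

The allocations are left as real numbers. In practice they would be rounded to integers that still sum to the total.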
Sampling and sub-sampling
The age groups defined in the proposal are broad (under 25, 25 to 64, and 65 and over), but analysis is not limited to these groups. To ensure a sufficient sample of children and youth, persons under 25 will be sampled directly from administrative files, which provide good coverage of children as young as 6 months of age. Finer age groups, such as 5-year groupings within this youngest cohort, will be considered at sampling to ensure that all ages are well represented in the raw national sample.
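Here is a minimal sketch of sampling within 5-year age groups from an administrative person frame. It assumes the frame is a list of (person ID, age) pairs. The frame layout, the group boundaries and the per-group sample size are hypothetical. The within-group selection probability it returns is the quantity whose inverse gives the initial weight for the targeted respondent sample.

```python
import random
from collections import defaultdict


def age_group(age):
    """Hypothetical 5-year groupings for the under-25 frame: 1-4, 5-9, ..., 20-24."""
    if age < 5:
        return "1-4"
    lo = (age // 5) * 5
    return f"{lo}-{lo + 4}"


def sample_by_age_group(frame, n_per_group, seed=None):
    """Simple random sample without replacement within each age group.
    frame: list of (person_id, age) with 1 <= age <= 24.
    Returns ({group: selected ids}, {group: selection probability})."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for pid, age in frame:
        groups[age_group(age)].append(pid)
    sample, prob = {}, {}
    for g, ids in groups.items():
        k = min(n_per_group, len(ids))
        sample[g] = rng.sample(ids, k)
        prob[g] = k / len(ids)  # inclusion probability within the group
    return sample, prob


# Hypothetical frame: 100 persons at each age from 1 to 24.
frame = [(i, 1 + i % 24) for i in range(2400)]
sample, prob = sample_by_age_group(frame, n_per_group=50, seed=7)
print({g: (len(ids), prob[g]) for g, ids in sample.items()})
```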
For those aged 25 and over, the available administrative files have reduced coverage, so dwellings will be selected instead. Within each household, one individual aged 25 or over will be selected based on specific instructions in the letter the household receives (or provided by the interviewer if they respond by phone). The instructions use the ages of household members to determine who is selected, and they vary from one household to another. In some households the oldest member is selected, in others the second oldest, the youngest, and so on. These letters are randomly assigned to the selected dwellings, which ensures that the individual selected within each dwelling is random. Unlike the sample of those under 25, sampling by specific fine age groups is not controlled, because the household composition of the selected dwellings is not known in advance. However, this method randomly selects individuals of all ages 25 and over. Given the proposed sample sizes, analysis can be conducted for much finer age groups at aggregated geographic levels. The sample will also be weighted by these finer age groups to ensure representativeness.
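The letter wording is not reproduced here, so the sketch below substitutes a hypothetical rule with the same goal. Each letter carries a random number u in [0, 1). Eligible members are ranked from oldest to youngest, and the member at rank ⌊u·k⌋ is selected. This gives each of the k eligible members probability 1/k. The inverse of that probability is the factor applied when household weights are converted to person weights.

```python
import random


def select_person(ages_25_plus, u):
    """Hypothetical within-household selection rule.
    ages_25_plus: ages of the k eligible (25+) household members.
    u: the letter's random number in [0, 1).
    Returns (index of the selected member, selection probability 1/k)."""
    k = len(ages_25_plus)
    if k == 0:
        raise ValueError("no eligible household members")
    if not 0 <= u < 1:
        raise ValueError("u must be in [0, 1)")
    # Rank members from oldest to youngest (stable for ties).
    ranked = sorted(range(k), key=lambda i: ages_25_plus[i], reverse=True)
    chosen = ranked[min(int(u * k), k - 1)]
    return chosen, 1.0 / k


# Simulate many letters sent to the same household: each member should be
# selected about one third of the time.
rng = random.Random(2021)
counts = [0, 0, 0]
for _ in range(30_000):
    idx = select_person([70, 45, 30], rng.random())[0]
    counts[idx] += 1
print([c / 30_000 for c in counts])
```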
This comprehensive sample will provide nationally representative estimates and facilitate more granular estimation.
People under the age of 25 will be selected from administrative files at Statistics Canada. If the selected respondent is under the age of 15, their guardian will be the contact. For those aged 25 and over, dwellings with a mailing address will be randomly selected, and one person within each dwelling will be selected at random to participate. Strict instructions will ensure that the household does not substitute a different person for the selected individual.
A sample size of 48,000 people is proposed. This is based on the assumptions that a 45% response rate will be achieved and that the current SARS-CoV-2 immunity prevalence is 2% to 3%. This should yield reliable estimates at the provincial and territorial level for three age groups (under 25, 25 to 64, and 65 and over) by sex. It should also yield reliable national estimates for at least three ethno-cultural groups or by visible minority status.
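The planning assumptions can be turned into expected yields with simple arithmetic. Since 48,000 × 45% = 21,600 respondents, a prevalence of 2% to 3% implies roughly 432 to 648 antibody-positive respondents nationally. The sketch below also computes a simple-random-sampling standard error. That figure understates the true variance because it ignores the stratified design, unequal weights and test error.

```python
import math


def expected_yield(n_selected, response_rate, prevalence):
    """Expected respondents, expected positives, and the standard error of a
    proportion under simple random sampling (a lower bound on the real SE)."""
    respondents = n_selected * response_rate
    positives = respondents * prevalence
    se_srs = math.sqrt(prevalence * (1 - prevalence) / respondents)
    return respondents, positives, se_srs


for p in (0.02, 0.03):
    r, pos, se = expected_yield(48_000, 0.45, p)
    print(f"p={p:.0%}: {r:,.0f} respondents, ~{pos:,.0f} positives, "
          f"95% CI half-width ~{1.96 * se:.2%} (SRS only)")
```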
Data collection for this reference period: 2020-11-02 to 2021-04-30
Responding to this survey is voluntary.
Data are collected directly from survey respondents.
1- Collection methods
A) Electronic questionnaire
The initial contact with respondents is a letter sent by mail with the DBS kit. The letter informs people living at the sampled address that a specific person or a randomly selected person has been chosen to participate in the survey. The letter includes a code that gives access to the online questionnaire. The electronic questionnaire takes on average 20 minutes to complete and covers a wide range of COVID-19-related topics. For respondents aged 14 or younger, the questionnaire is answered by a parent or guardian. Respondents aged 15 and older provide consent and answer the questions themselves.
B) Dried blood spots (DBS) sample
The respondents are asked to provide a small blood sample (via finger prick) to be tested for COVID-19 antibodies. Respondents must prick their finger and place up to 5 blood spots on a test strip.
For respondents aged 1 to 14, a parent or guardian provides consent for the child to participate in the DBS test, to receive the child's results, and to store any leftover samples for future use.
All materials related to the survey (initial letter, questionnaire, DBS instructions, etc.) are available in both official languages.
2- Follow-up methods
A Statistics Canada interviewer may call, email or text the respondent to follow up if the respondent's completed questionnaire has not been received. Afterward, a tracking system will be used to flag DBS cards that have not been sent. Follow-up calls will be made by CCAHS staff.
When a respondent reaches 14 years of age, they will receive a letter asking for their consent to keep their DBS sample in the biobank. A respondent who wants their sample removed must request this by letter or by email to the CCAHS team.
The questionnaire was developed in both official languages.
4- Average time to complete the survey
The electronic questionnaire takes 20 minutes to complete and the dried blood spot (DBS) test takes 10 minutes.
Electronic files containing the daily transmissions of completed respondent survey records were combined to create the "raw" survey file. Before further processing, verification was performed to identify and eliminate potential duplicate records and to drop non-response and out-of-scope records.
In addition, some out-of-scope respondent records were found during the data clean-up stage. All respondent records that were determined to be out-of-scope and those records that contained no data were removed from the data file.
After the verification stage, editing was performed to identify errors and modify affected data at the individual variable level. The first editing step was to identify errors and determine which items from the survey output needed to be kept on the survey master file.
Subsequent to this, invalid characters were deleted and the remaining data items were formatted appropriately.
Imputation was carried out for only a few variables. First, there are two household size variables: total household size and the number of household members aged 25 and over. Each respondent was asked only one of the two questions, depending on the collection wave and the respondent's age group, and in some cases the respondent did not answer. Where these variables were missing, they were imputed from administrative data when available; otherwise, donor imputation was used. Lastly, income data were linked to tax files where possible and donor imputed otherwise.
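Here is a minimal sketch of the imputation order described above: an administrative value first, then donor imputation. It assumes a random hot-deck within imputation classes. The survey's actual donor method and matching variables are not specified here. The field names (`hhsize`, `admin_hhsize`, `prov`, `agegrp`) and the class definition are hypothetical.

```python
import random
from collections import defaultdict


def impute(records, target, class_keys, admin_key=None, seed=None):
    """Fill missing `target` values: use the administrative value when present,
    otherwise copy the value of a random donor (a record that reported `target`)
    from the same imputation class. Records with no donor stay missing.
    Input records are not modified; an 'imputation' flag is added when filled."""
    rng = random.Random(seed)

    def cls(rec):
        return tuple(rec[k] for k in class_keys)

    donors = defaultdict(list)
    for rec in records:
        if rec.get(target) is not None:
            donors[cls(rec)].append(rec[target])
    out = []
    for rec in records:
        rec = dict(rec)
        if rec.get(target) is None:
            if admin_key is not None and rec.get(admin_key) is not None:
                rec[target], rec["imputation"] = rec[admin_key], "admin"
            elif donors.get(cls(rec)):
                rec[target], rec["imputation"] = rng.choice(donors[cls(rec)]), "donor"
        out.append(rec)
    return out


records = [
    {"prov": "ON", "agegrp": "25-64", "hhsize": 3},
    {"prov": "ON", "agegrp": "25-64", "hhsize": None, "admin_hhsize": 2},
    {"prov": "ON", "agegrp": "25-64", "hhsize": None},
    {"prov": "QC", "agegrp": "65+", "hhsize": None},
]
result = impute(records, "hhsize", ["prov", "agegrp"], admin_key="admin_hhsize", seed=1)
for rec in result:
    print(rec)
```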
The estimation of population characteristics from a sample survey is based on the premise that each person in the sample represents a certain number of other persons in addition to themselves. This number is referred to as the survey weight. The process of computing survey weights for each survey respondent involves several steps.
1) Each selected dwelling (in the household sample) is given an initial weight equal to the inverse of its selection probability from the sampling frame (DUF). Dwellings identified as out-of-scope during collection are dropped from the sample.
2) The weights of responding households are adjusted to represent the households that did not respond. Adjustment factors are calculated separately by province, using a nonresponse model based on frame information.
3) The household weights are calibrated so that the sum of the weights matches province-level household size demographic counts.
4) Person weights are computed by multiplying the household level weights by the inverse of the probability of selecting the person within the household.
5) Each selected person in the targeted respondent sample is given an initial weight equal to the inverse of their selection probability from the person frame. Persons identified as out-of-scope are dropped from the sample.
6) The weights of respondents are adjusted to represent the persons who did not respond to the survey. Adjustment factors are computed separately by province, based on a nonresponse model using frame information.
7) The person weights coming from the household sample and the targeted respondent sample are pooled together.
8) The weights of persons who provided a DBS kit are adjusted to represent the persons who did not. Adjustment factors are computed separately by province, based on a nonresponse model using frame information and questionnaire data.
9) The person weights are calibrated so that the sum of the weights matches demographic population counts by region, age group and sex. The weights are also calibrated to demographic counts for large Census Metropolitan Areas (CMAs).
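Steps 1 to 9 combine three standard operations. Each unit gets an initial weight equal to its inverse selection probability, the weights are adjusted for nonresponse, and the result is calibrated to population counts. The sketch below illustrates the last two operations with two simplifications. Weighting classes replace the nonresponse models, and calibration uses raking (iterative proportional fitting) rather than G-Est. All field names, classes and control totals are hypothetical.

```python
from collections import defaultdict


def nonresponse_adjust(units, class_key):
    """Weighting-class nonresponse adjustment (a simplified stand-in for a
    nonresponse model): within each class, respondents' weights are scaled by
    (total weight of all units) / (total weight of respondents).
    Returns the respondents only."""
    total, resp = defaultdict(float), defaultdict(float)
    for u in units:
        total[u[class_key]] += u["w"]
        if u["responded"]:
            resp[u[class_key]] += u["w"]
    return [dict(u, w=u["w"] * total[u[class_key]] / resp[u[class_key]])
            for u in units if u["responded"]]


def rake(units, margins, max_iter=1000, tol=1e-10):
    """Raking: repeatedly rescale weights so that, for each variable in
    `margins`, the weighted total of every category equals its control count.
    margins = {variable: {category: control_total}}."""
    units = [dict(u) for u in units]
    for _ in range(max_iter):
        max_change = 0.0
        for var, controls in margins.items():
            sums = defaultdict(float)
            for u in units:
                sums[u[var]] += u["w"]
            for u in units:
                factor = controls[u[var]] / sums[u[var]]
                u["w"] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break
    return units


# Hypothetical sample; the initial weight is 1 / selection probability.
sample = [
    {"prob": 0.10, "responded": True,  "prov": "ON", "sex": "F", "age": "25-64"},
    {"prob": 0.10, "responded": False, "prov": "ON", "sex": "M", "age": "25-64"},
    {"prob": 0.10, "responded": True,  "prov": "ON", "sex": "M", "age": "65+"},
    {"prob": 0.05, "responded": True,  "prov": "ON", "sex": "F", "age": "65+"},
    {"prob": 0.05, "responded": True,  "prov": "ON", "sex": "M", "age": "25-64"},
]
for u in sample:
    u["w"] = 1.0 / u["prob"]
respondents = nonresponse_adjust(sample, "prov")
margins = {"sex": {"F": 40, "M": 38}, "age": {"25-64": 45, "65+": 33}}
calibrated = rake(respondents, margins)
print([round(u["w"], 2) for u in calibrated])
```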
Sampling variance estimation is based on a resampling method called the bootstrap. A further variance adjustment must be done to incorporate sensitivity and specificity of the test when estimating variance for COVID-19 prevalence.
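Suppose a set of bootstrap replicate weights is available for each respondent, such as the ones produced with G-Est. The sampling variance of a weighted estimate can then be computed by re-estimating it with each replicate. This sketch uses the mean squared deviation of the replicate estimates from the full-sample estimate, which is one common convention. It covers only the sampling variance of the apparent (test-positive) prevalence, not the adjustment for test sensitivity and specificity.

```python
def weighted_prevalence(weights, positive):
    """Weighted share of respondents with a positive test result."""
    return sum(w for w, y in zip(weights, positive) if y) / sum(weights)


def bootstrap_variance(full_weights, replicate_weights, positive):
    """Bootstrap variance of the weighted prevalence.
    replicate_weights: list of B weight vectors, one per bootstrap replicate.
    Uses the mean squared deviation of replicate estimates from the
    full-sample estimate."""
    theta = weighted_prevalence(full_weights, positive)
    reps = [weighted_prevalence(rw, positive) for rw in replicate_weights]
    return sum((t - theta) ** 2 for t in reps) / len(reps)


# Tiny illustration with two hypothetical replicates.
full = [1.0, 1.0, 1.0, 1.0]
positive = [1, 0, 0, 0]
replicates = [[2.0, 0.0, 1.0, 1.0], [0.0, 2.0, 1.0, 1.0]]
print(weighted_prevalence(full, positive), bootstrap_variance(full, replicates, positive))
```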
For the calculation of COVID-19 antibody prevalence, the Rogan-Gladen estimator must be used, which incorporates the sensitivity and specificity of the test.
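The Rogan-Gladen estimator corrects the apparent prevalence p for test sensitivity Se and specificity Sp: π = (p + Sp − 1)/(Se + Sp − 1). One way to carry out the variance adjustment uses the delta method. It assumes that Se and Sp are estimated independently of the survey, with known variances: Var(π) ≈ [Var(p) + π²·Var(Se) + (1 − π)²·Var(Sp)]/(Se + Sp − 1)². This is an illustrative sketch, not necessarily the adjustment used for CCAHS, and the Se and Sp values below are made up. Clipping negative estimates to 0 is also a common convention rather than something the source states.

```python
def rogan_gladen(p_apparent, sensitivity, specificity):
    """Rogan-Gladen corrected prevalence, clipped to [0, 1]."""
    j = sensitivity + specificity - 1  # Youden's index; must be positive
    if j <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    return min(max((p_apparent + specificity - 1) / j, 0.0), 1.0)


def rg_variance(p_apparent, var_apparent, sensitivity, specificity,
                var_sens=0.0, var_spec=0.0):
    """Delta-method variance of the (unclipped) Rogan-Gladen estimator.
    var_apparent: sampling variance of the apparent prevalence (e.g. bootstrap).
    var_sens, var_spec: variances of the estimated sensitivity and specificity."""
    j = sensitivity + specificity - 1
    pi = (p_apparent + specificity - 1) / j
    return (var_apparent + pi ** 2 * var_sens + (1 - pi) ** 2 * var_spec) / j ** 2


# Hypothetical inputs: 3% apparent prevalence, Se = 0.90, Sp = 0.99.
est = rogan_gladen(0.03, 0.90, 0.99)
var = rg_variance(0.03, 1e-6, 0.90, 0.99, var_sens=4e-4, var_spec=1e-5)
print(f"adjusted prevalence {est:.4f}, SE {var ** 0.5:.4f}")
```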
The Generalized Estimation System (G-Est) was used to generate the survey weights and bootstrap weights.
While quality assurance mechanisms are applied at all stages of the statistical process, the validation and detailed review of data by statisticians is the final verification of quality prior to release. Many validation measures were implemented, including:
a. Verification of estimates through cross-tabulations
c. Consultation with stakeholders internal to Statistics Canada
d. Consultation with external stakeholders
Survey weights were also adjusted to minimise any potential bias that could arise from survey non-response; non-response adjustments and calibration using available auxiliary information were applied and are reflected in the survey weights provided with the data file.
Extensive validations of survey estimates were also performed and examined from a bias analysis perspective. Despite these rigorous adjustments and validations, the high non-response increases the risk of residual bias and the potential magnitude of its impact on estimates produced from the survey data. Therefore, users are advised to use the CCAHS data with caution, especially when producing estimates for small sub-populations or when comparing them with other publicly available data sources.
Statistics Canada is prohibited by law from releasing any information it collects that could identify any person, business, or organization, unless consent has been given by the respondent or as permitted by the Statistics Act. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.
Estimates with fewer than 5 positive counts in the numerator are suppressed for confidentiality reasons.
Estimates for which the effective sample size is below 30 are also suppressed. For estimates of COVID-19 prevalence, the effective sample size must be calculated using the adjusted variance, not just the sampling variance. Because prevalence is very low in certain domains, the design effect used in this calculation should be bounded below both by 1 and by the Kish design effect based on the weights.
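The suppression rules can be sketched as follows. The sketch takes the design effect to be the ratio of the adjusted variance to the simple-random-sampling variance p(1 − p)/n. That ratio is then bounded below by 1 and by Kish's design effect n·Σw²/(Σw)². The function names and the fallback when p(1 − p) = 0 are assumptions of this sketch.

```python
def kish_deff(weights):
    """Kish design effect due to unequal weighting: n * sum(w^2) / (sum w)^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def effective_sample_size(weights, prevalence, adjusted_variance):
    """n / deff, where deff compares the adjusted variance with the SRS
    variance and is bounded below by 1 and by the Kish design effect."""
    n = len(weights)
    srs_variance = prevalence * (1 - prevalence) / n
    deff = adjusted_variance / srs_variance if srs_variance > 0 else 1.0
    deff = max(deff, 1.0, kish_deff(weights))
    return n / deff


def is_suppressed(n_positive, weights, prevalence, adjusted_variance):
    """Suppress if fewer than 5 positives or effective sample size below 30."""
    return (n_positive < 5
            or effective_sample_size(weights, prevalence, adjusted_variance) < 30)


weights = [1.0] * 100  # hypothetical equal weights, n = 100
print(effective_sample_size(weights, 0.10, 0.0018))
print(is_suppressed(10, weights, 0.10, 0.0018), is_suppressed(4, weights, 0.10, 0.0018))
```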
Revisions and seasonal adjustment
This methodology does not apply to this survey program.
The survey aims to produce unbiased, good-quality national and provincial estimates. Age group and sex breakdowns are also possible, but sample size and quality indicators (such as confidence intervals) must be carefully considered.
In all, 47,900 persons were selected to participate in the Canadian COVID-19 Antibody and Health Survey (CCAHS). Some were directly chosen, others via two-stage sampling (household, then person). The response rate to the electronic questionnaire was 36.3%. Of those who completed a questionnaire, 63.6% provided a completed dried blood spot (DBS) kit and consented to testing, for an overall response rate of 23.0%.
The CCAHS covers the population aged 1 and over living in the 10 provinces and 3 territories. Excluded from the survey's coverage are: persons living on reserves and other Aboriginal settlements in the provinces; the institutionalized population; and children living in foster care or whose parents are ineligible for the Canadian Child Benefit. For persons aged 25 and over, these exclusions represent about 3% of the target population; for children, they represent about 4%.
Much time and effort was devoted to reducing non-sampling errors in the survey. Quality assurance measures were applied at each stage of the data collection and processing cycle to control the quality of the data.