This survey was designed to collect information on the health of the Canadian population and related socio-demographic information.
Data release – May 29, 1998
In the fall of 1991, the National Health Information Council (NHIC) recommended that an ongoing national survey of population health be conducted. This recommendation was based on consideration of the economic and fiscal pressures on the health care systems and the commensurate requirement for information with which to improve the health status of the population in Canada. Commencing in April 1992, Statistics Canada received funding for development of a National Population Health Survey (NPHS).
The objectives of the NPHS are to:
- aid in the development of public policy by providing measures of the level, trend and distribution of the health status of the population;
- provide data for analytic studies that will assist in understanding the determinants of health;
- collect data on the economic, social, demographic, occupational and environmental correlates of health;
- increase the understanding of the relationship between health status and health care utilization, including alternative as well as traditional services;
- provide information on a panel of people who will be followed over time to reflect the dynamic process of health and illness;
- provide the provinces and territories and other clients with a health survey capacity that will permit supplementation of content or sample;
- allow the possibility of linking survey data to routinely collected administrative data such as vital statistics, environmental measures, community variables, and health services utilization.
This survey became part of the Canadian Community Health Survey (number 3226) for reference year 2000.
The NPHS collects information related to the health of the Canadian population and related socio-demographic information. It is composed of three components: the Household, the Health Institutions, and the North components.
The Household component started in 1994-1995 and is conducted every two years. The first three cycles (1994-1995, 1996-1997 and 1998-1999) were both cross-sectional and longitudinal. Beginning in Cycle 4 (2000-2001), the survey became strictly longitudinal (i.e. collecting health information from the same individuals each cycle). The cross-sectional component is now part of the Canadian Community Health Survey. The cross-sectional and longitudinal documentation for the Household component is presented separately as well as the documentation for the Health Institutions and North components.
In addition to a common set of questions asked in Cycles 1, 2 and 3, the questionnaire included focus content and supplements that changed from cycle to cycle. General health questions were asked about each member of the selected household, and specific health questions were asked of the selected member of the household. For the complete list of topics covered by the NPHS over time, please consult the NPHS Content, Household Component, Cycles 1 to 5.
Health Canada and provincial ministries of health use NPHS data to plan, implement and evaluate programs and health policies to improve health and the efficiency of health services. Non-profit health organizations and researchers in the academic fields use NPHS data to move research ahead and to improve health. The media uses the results from the survey to raise awareness about various health issues.
The target population of the cross-sectional NPHS included household residents of all ages in all provinces, with the principal exclusion of populations on Indian Reserves, Canadian Forces Bases and some remote areas in Québec and Ontario.
Each NPHS cycle questionnaire was conceived in collaboration with specialists from Statistics Canada, Health Canada, provincial ministries of health and researchers from the academic fields. The questionnaire development involved an elaborate literature research and numerous consultations between specialists in order to adapt existing survey instruments from other well-known sources, or to create new ones especially for the NPHS. Every questionnaire was approved by members of the expert committees and the Advisory Committee, which includes representatives from the provincial ministries of health, Health Canada, Statistics Canada, other government departments and specialists.
Data collection was performed using a Computer Assisted Interview (CAI) system. The logical flow of the questions was programmed to reflect the skip patterns associated with certain variables, such as age. The program also took into account the type of answer required, the allowed minimum and maximum values, the on-line edits associated with each question and what to do in case of item non-response.
Before data were collected from respondents, the data collection computer application was extensively tested to identify any errors in the program flow and text. Furthermore, two field tests were conducted for each cycle. The sample size of each test was approximately 900 respondents. The tests involved Statistics Canada's Regional Offices and the interviews were conducted by Statistics Canada's interviewers. The main objectives of the two tests were to observe respondents' reaction to the survey, to test new or modified questions, to obtain time estimates for the various sections of the questionnaire, to study the response rates and to test field operations and procedures such as interviewer training and data transmission.
This is a sample survey with a cross-sectional design and a longitudinal follow-up.
The NPHS used a stratified two-stage sample design (clusters, dwellings) based on the Labour Force Survey (LFS) in all provinces except Québec, where Santé Québec's design for the 1992-1993 Enquête sociale et de santé (ESS) was used.
The LFS design consists of a multistage stratified sample where dwellings are selected within clusters. Each province was divided into three types of areas (Major Urban Centres, Urban Towns and Rural Areas) from which separate geographic and/or socio-economic strata were formed. In most strata, six clusters, usually Census Enumeration Areas (EAs), were selected with probability proportional to size (PPS). The sample of dwellings was obtained once listing operations in sample clusters were completed. As NPHS usually requested only between 2 and 6 clusters per LFS stratum, similar LFS strata were grouped to form larger NPHS strata with the required number of sample clusters. Once strata were grouped, their sample clusters were also grouped to form replicates. As a result of these modifications, the NPHS sample of clusters can be considered as a stratified replicated sample where strata are groups of LFS strata and replicates are typically independent, identically distributed samples of 4 clusters each.
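The selection of clusters with probability proportional to size can be illustrated with a minimal sketch. It assumes systematic PPS selection, which is one common way to draw such a sample; the source does not say which PPS method the LFS uses. The function names and the dwelling counts are hypothetical.

```python
import random


def pps_systematic(sizes, n, rng):
    """Pick n cluster indices with probability proportional to size (PPS).

    Systematic PPS: lay the clusters end to end on a line as long as the
    total size, then take n equally spaced points from a random start.
    Assumes no single cluster is larger than the sampling interval.
    """
    step = sum(sizes) / n
    start = rng.random() * step  # random start in [0, step)
    selected = []
    cumulative = 0
    idx = 0
    for k in range(n):
        point = start + k * step
        # Advance to the cluster whose size range covers this point.
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= point:
            cumulative += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected


def inclusion_probability(sizes, n, idx):
    """First-stage inclusion probability of cluster idx under PPS."""
    return n * sizes[idx] / sum(sizes)


# Hypothetical stratum of eight clusters (dwelling counts), six selected.
dwelling_counts = [120, 80, 150, 140, 90, 160, 110, 150]
sample = pps_systematic(dwelling_counts, 6, random.Random(1))
```

Under this scheme a cluster's chance of selection grows with its dwelling count, and the inverse of the inclusion probability is the first-stage contribution to the design weight.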
In Québec, the NPHS sample was selected from dwellings participating in a health survey organized by Santé Québec: the 1992-1993 ESS. The survey sampled 16,010 dwellings using a two-stage design similar to that of the LFS. The province was divided geographically by crossing fifteen health areas with four urban density classes (the Montreal Census Metropolitan Area, regional capitals, small urban agglomerations, and the rural sector). In each area, clusters were defined using socio-economic characteristics and selected using a PPS sample. Selected clusters were enumerated and random samples of their dwellings were drawn: ten per cluster in major cities, twenty or thirty elsewhere.
The NPHS Cycle 2 (1996-1997) cross-sectional sample contains all members of the NPHS longitudinal panel and buy-in samples from 3 provinces: Ontario, Manitoba and Alberta. These provinces bought additional samples to produce reliable estimates at the sub-provincial level. The buy-in samples were selected using random digit dialing.
For more information on the sampling plan, consult Chapter 5 of the NPHS Cycle 2 User Guide in the Documentation section.
The NPHS Cycle 2 health cross-sectional file contains all selected respondents (longitudinal sample and buy-in sample) that answered the detailed health questionnaire.
The NPHS Cycle 2 general cross-sectional file contains all members of households (longitudinal sample and buy-in sample) for which a general health questionnaire was completed.
Responding to this survey is voluntary.
Data are collected directly from survey respondents.
At the beginning of each cycle, each person in the NPHS sample received by mail a letter announcing the start of data collection and a brochure that provided information about the survey and presented results based on NPHS data. In Cycle 1, 79% of the interviews regarding household members were conducted in person (21% by telephone), and 72% of the interviews with the selected respondent were conducted in person (28% by telephone). In Cycles 2 and 3, approximately 95% of the interviews were conducted by telephone (5% in person). Personal visits were made only if the selected respondent did not have a telephone, if the interviewer made a personal visit in the course of tracing a respondent, or upon request by the respondent. The total interview time averaged one hour per household. Proxy reporting was allowed for the general health questions (household members), and interviews for selected respondents under 12 years old were done by proxy. However, proxy reporting for selected respondents aged 12 and over was allowed only for reasons of illness or incapacity.
Interviewers were employees hired and trained by Statistics Canada to carry out surveys using computer-assisted interviewing, and most were experienced Labour Force Survey interviewers. Each cycle, interviewers attended a training session and received a manual for use as a reference tool.
The survey questions were designed for computer-assisted interviewing (CAI): as the questions were developed, the associated logical flow into and out of the questions was specified, along with the type of answer required, the minimum and maximum values, on-line edits associated with the question, and what to do in case of item non-response. With CAI, the interview was controlled based on answers provided by the respondent. Onscreen prompts were shown when an invalid entry was recorded and the interviewer could correct inconsistencies when needed. The CAI system inserted text or data on the screen based on information gathered during the current interview or previous cycle's interviews (e.g. name and sex). In other words the questionnaire was customised to the respondent based on the data collected during the current and previous interviews.
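The way a CAI application combines skip patterns, range edits and item non-response handling can be sketched as follows. This is an illustration only, not the NPHS application: the `Question` class, the `ask` function, the variable names and the ranges are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

NOT_STATED = "not stated"  # hypothetical code for item non-response


@dataclass
class Question:
    name: str
    minimum: int
    maximum: int
    # Skip pattern: predicate on the answers collected so far.
    applies: Callable[[dict], bool] = lambda answers: True


def ask(question, raw, answers):
    """Record one answer; return a warning for the interviewer, or None."""
    if not question.applies(answers):
        return None  # question is not on this respondent's path
    if raw is None:
        answers[question.name] = NOT_STATED  # item non-response
        return None
    if not question.minimum <= raw <= question.maximum:
        # On-line edit: reject the entry so the interviewer can correct it.
        return (f"{question.name}: {raw} is outside "
                f"{question.minimum}-{question.maximum}")
    answers[question.name] = raw
    return None


# Hypothetical questions: the second is asked only of respondents aged 12+.
age = Question("age", 0, 130)
cigarettes = Question("cigarettes_per_day", 0, 99,
                      applies=lambda a: a.get("age", 0) >= 12)

answers = {}
warning = ask(age, 200, answers)   # out of range: warning, nothing stored
ask(age, 8, answers)               # accepted
ask(cigarettes, 5, answers)        # skipped because age is under 12
```

Keeping the skip rule as a predicate on earlier answers is one simple way to let the path through the questionnaire depend on what the respondent has already reported.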
Interviewers were instructed to make all reasonable attempts to obtain interviews with longitudinal respondents. Numerous calls were made to try to reach a respondent. Interviewer training covered ways of reducing the number of non-contacts using information collected in previous cycles. When needed, appointments were scheduled with respondents. Regional Offices sent a letter to respondents who refused to participate, stressing the importance of the survey and of their participation. Refusals were followed up by senior interviewers. Interviewers used several methods to trace a respondent (the last known address and telephone number, the name and address of one or two contacts provided by the respondent, local telephone directories and directory assistance, etc.). If these leads were unsuccessful, the case was transferred to an interviewer specially trained in tracing respondents. Attempts were made to contact panel members who moved. The survey was not conducted with respondents living outside Canada and the United States.
Editing was done in two stages. The first stage was performed on-line during data collection. Valid ranges for variables were programmed into the computer-assisted interviewing (CAI) application, as were consistency edits between variables and between cycles. The flow of questions was controlled by the CAI application. Warning messages appeared on the CAI screen when an invalid value was captured or when inconsistencies were detected by the application. The interviewers then had the opportunity to confirm responses with the respondents. In most cases, the conflict had to be resolved before the interview could continue. The second stage of editing was performed during data processing at Head Office (mainly SAS programming). Inconsistencies discovered at this stage were usually corrected by setting one or more of the variables in question to "not stated". The exception was the relationship edits, in which inconsistencies went through a manual collection process.
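The second-stage rule of setting inconsistent variables to "not stated" can be sketched in a few lines. The head-office edits were written mainly in SAS; the Python below is only an illustration, and the rule, variable names and `NOT_STATED` code are hypothetical.

```python
NOT_STATED = "not stated"  # hypothetical code for an edited-out value


def apply_consistency_edit(record, is_consistent, fields_to_blank):
    """Return a copy of record; blank the listed fields if the rule fails."""
    edited = dict(record)
    if not is_consistent(record):
        for field in fields_to_blank:
            edited[field] = NOT_STATED
    return edited


def started_smoking_by_current_age(record):
    # Hypothetical rule: nobody started smoking after their current age.
    return record["age_started_smoking"] <= record["age"]


raw = {"age": 25, "age_started_smoking": 40}
clean = apply_consistency_edit(raw, started_smoking_by_current_age,
                               ["age_started_smoking"])
```

Returning an edited copy leaves the collected record intact, which makes it possible to audit what each edit changed.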
Several questions allowing write-in responses had the write-in information coded to either new unique categories, or to an existing category. Where possible (e.g., occupation, industry, diseases), the coding followed the standard classification systems as used either in the Census of Population or in other Statistics Canada surveys such as the Health and Activity Limitation Survey and General Social Survey-Cycle 6.
Cycle 2 industry and occupation data were coded to the Standard Industrial Classification 1980 (SIC-80) and the Standard Occupational Classification 1980 (SOC-80).
Estimation from NPHS data was done using the sampling weights provided with each data set. These weights were computed using an approach where an initial weight representing the inverse probability of selection was computed. This weight was then adjusted to take into account the various specifics of the survey. The typical adjustment was the one to compensate for non-response; homogeneous response groups were formed based on data available from both respondents and non-respondents. For the part of the sample that comes from the longitudinal panel, this adjustment used the longitudinal data from previous cycles. The CHAID algorithm was used to determine which variables best characterize the response groups. Once the adjustments were made, the last step consisted of post-stratifying the weights based on 10 age-sex groups (one-dimensional post-stratification) to ensure consistency with the available Census-based estimates for the reference year of the survey.
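The three weighting steps described above (inverse-probability weight, non-response adjustment within response groups, post-stratification to age-sex totals) can be sketched as follows. This is a simplified illustration under stated assumptions: the response groups are taken as given rather than derived with CHAID, the other survey-specific adjustments are omitted, and all field names, probabilities and control totals are hypothetical.

```python
from collections import defaultdict


def design_weight(selection_probability):
    """Initial weight: inverse of the probability of selection."""
    return 1.0 / selection_probability


def adjust_for_nonresponse(units):
    """Keep respondents only, scaling their weights within each response
    group so they also carry the weight of the group's non-respondents."""
    group_total = defaultdict(float)
    group_responding = defaultdict(float)
    for u in units:
        group_total[u["group"]] += u["weight"]
        if u["responded"]:
            group_responding[u["group"]] += u["weight"]
    return [
        {**u, "weight": u["weight"] * group_total[u["group"]]
                        / group_responding[u["group"]]}
        for u in units
        if u["responded"]
    ]


def poststratify(units, control_totals):
    """Scale weights so each age-sex group sums to its control total."""
    current = defaultdict(float)
    for u in units:
        current[u["age_sex"]] += u["weight"]
    return [
        {**u, "weight": u["weight"] * control_totals[u["age_sex"]]
                        / current[u["age_sex"]]}
        for u in units
    ]


# Hypothetical sample of three selected persons.
selected = [
    {"weight": design_weight(0.01), "group": "A", "responded": True,
     "age_sex": "F 12-24"},
    {"weight": design_weight(0.01), "group": "A", "responded": False,
     "age_sex": "F 12-24"},
    {"weight": design_weight(0.02), "group": "B", "responded": True,
     "age_sex": "M 12-24"},
]
final = poststratify(adjust_for_nonresponse(selected),
                     {"F 12-24": 250.0, "M 12-24": 40.0})
```

After the last step the weights in each age-sex group add up to the control total for that group, which is what makes weighted estimates consistent with the external population counts.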
In addition, for each of the sampling weights computed for the group of respondents in each cycle, a "share" version of the weight was computed. This share weight was given only to those respondents who agreed to share their data with the survey partners (Health Canada and the various provincial health ministries). The computation of this weight involved the redistribution of the weights of the non-sharers to the sharers using an approach similar to that of the non-response adjustment. Since the share partners only have access to the share data, they must use the share weights for estimation.
Note that during Cycle 2 of NPHS, an additional weight was created specifically for the analysis of responses to the questions relating to health services for children. These questions were only asked of a sub-sample of children interviewed in Alberta and Manitoba.
Given its multi-stage survey design, the NPHS uses the bootstrap method to calculate variances. This method takes the complexities of the survey design into account, as well as the various adjustments made to the weights during the weighting process. For each sampling weight, a set of bootstrap weights is available to calculate the variances. Since the NPHS longitudinal sample is part of the cross-sectional sample for the first three cycles, it is important to keep the dependence of these samples in mind when calculating the variances for statistics using more than one cycle of data. For this reason, a set of coordinated bootstrap weights has been computed.
For more details, see the document "Coordinated Bootstrap Weights" in the Documentation section.
Note that the Bootvar program, available in SAS and SPSS, is distributed with the bootstrap weights in order to calculate the variances with this method.
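The general idea of variance estimation with bootstrap weights can be sketched as follows. This is not the Bootvar program: it is a generic illustration that assumes the variance is taken as the mean squared deviation of the replicate estimates from the full-sample estimate, which is one common convention. The function names and the data are hypothetical.

```python
import math


def weighted_proportion(flags, weights):
    """Weighted share of units for which the flag is true."""
    return sum(w for f, w in zip(flags, weights) if f) / sum(weights)


def bootstrap_variance(flags, main_weights, bootstrap_weight_sets):
    """Estimate a proportion and its variance from bootstrap weights.

    The estimate is recomputed once per set of bootstrap weights; the
    spread of those replicates around the full-sample estimate gives
    the variance.
    """
    estimate = weighted_proportion(flags, main_weights)
    replicates = [weighted_proportion(flags, bw)
                  for bw in bootstrap_weight_sets]
    variance = (sum((r - estimate) ** 2 for r in replicates)
                / len(replicates))
    return estimate, variance


def coefficient_of_variation(estimate, variance):
    """Standard error as a percentage of the estimate."""
    return 100 * math.sqrt(variance) / estimate


# Hypothetical data: four respondents and two sets of bootstrap weights
# (a real file would carry several hundred sets).
has_condition = [True, False, True, False]
main = [10, 10, 10, 10]
boot = [[20, 0, 10, 10], [0, 20, 10, 10]]
estimate, variance = bootstrap_variance(has_condition, main, boot)
cv = coefficient_of_variation(estimate, variance)
```

Because every adjustment made to the main weight is repeated for each bootstrap weight, the replicates reflect both the sample design and the weighting process.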
Finally, coefficient of variation tables are also available as an alternative to the use of bootstrap weights (for example, for users of the public files who do not have access to the bootstrap weights for confidentiality reasons). These tables allow users to obtain an approximate coefficient of variation according to the estimate calculated from the survey data. Several tables are available for each cycle, each referring to a different sub-population of interest. The coefficients of variation in these tables are based on the average design effect obtained from a variety of variables. By their nature, these tables apply only to estimates of totals, proportions, and the differences between them.
For more information on the estimation process, consult the NPHS User Guide in the Documentation section.
Various strategies were used during data collection to improve response rates, such as interviewer training, introductory letters and brochures, interviews in languages other than French and English, rescheduling of interviews when needed, follow-up of non-respondents, tracing, response rate monitoring and transfer of caseloads to other offices.
NPHS data were collected using a Computer-Assisted Interview (CAI) system, which ensured that all and only appropriate questions were asked. The CAI application was extensively tested in-house in order to identify any errors in the program flow and text. Furthermore, in each cycle, two field tests were conducted. The tests involved four of Statistics Canada's Regional Offices. The main objectives of the two tests were to observe respondent reaction to the survey, to obtain estimates of time for the various sections, to study response rates and to test feedback questions. Field operations and procedures, interviewer training, and the CAI application (i.e., the questionnaire on computer) were also tested. Application testing was an ongoing operation up until the start of collection for the survey.
Editing was performed on-line in the CAI application during data collection. It was not possible to enter out-of-range values, and flow errors were controlled through the use of CAI. Some types of inconsistent or unusual reporting were edited after data collection at Head Office. Inconsistencies were usually corrected by setting answers to a question to 'not stated'.
Files, record layouts, programs, documentation, CD-ROMs, etc. were verified and tested before they were sent outside Statistics Canada.
Statistics Canada is prohibited by law from releasing any data which would divulge information obtained under the Statistics Act that relates to any identifiable person, business or organization without the prior knowledge or the consent in writing of that person, business or organization. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.
In Cycles 1, 2 and 3 of the NPHS, cross-sectional Public Use Microdata Files (PUMFs) were produced in addition to the Master files. The PUMFs differ in a number of important aspects from the survey 'master' files held by Statistics Canada. These differences are the result of actions taken to protect the anonymity of individual survey respondents. First, only cross-sectional data are available on such files, because longitudinal information can lead to the identification of respondents. Also, some sensitive variables were regrouped, capped or completely deleted from the files. All the PUMFs must be approved by Statistics Canada's Microdata Release Committee before their release.
Users requiring access to information excluded from the PUMFs may purchase custom tabulations, or access the master files through Statistics Canada's Research Data Centres program or Remote Access program. All outputs are vetted for confidentiality before being given to users.
Before releasing and/or publishing any estimate from these files, users should first determine the number of sampled respondents who contribute to the calculation of the estimate. If this number is less than 30, the weighted estimate should not be released regardless of the value of the coefficient of variation for this estimate. For weighted estimates based on sample sizes of 30 or more, users should determine the coefficient of variation of the rounded estimate and follow the guidelines below.
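The minimum sample size check can be sketched as follows. The threshold of 30 contributing respondents comes from the text above; the function name, the record layout and the data are hypothetical, and the sketch deliberately stops before the coefficient of variation step.

```python
MIN_RESPONDENTS = 30  # release threshold stated in the text above


def releasable_total(records, has_characteristic):
    """Weighted total for a characteristic, or None if too few sampled
    respondents contribute to it.

    Passing this check is only the first step: the coefficient of
    variation of the estimate still has to be assessed separately.
    """
    contributors = [r for r in records if has_characteristic(r)]
    if len(contributors) < MIN_RESPONDENTS:
        return None  # suppress regardless of the coefficient of variation
    return sum(r["weight"] for r in contributors)


# Hypothetical file: 29 respondents report the condition, 71 do not.
records = ([{"weight": 100.0, "condition": True}] * 29
           + [{"weight": 100.0, "condition": False}] * 71)
with_condition = releasable_total(records, lambda r: r["condition"])
without_condition = releasable_total(records, lambda r: not r["condition"])
```

The check counts unweighted sampled respondents, not the weighted population they represent, so a large weighted total can still be suppressed.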
Estimates in the main body of a statistical table are rounded to the nearest hundred units using the normal rounding technique. If the first or only digit dropped is zero to four, the last digit retained is not changed. If the first or only digit dropped is five to nine, the last digit retained is raised by one. Marginal sub-totals and totals in statistical tables are derived from their corresponding unrounded components and then are rounded themselves to the nearest 100 units using normal rounding methods. Averages, proportions, rates and percentages are computed from unrounded components (for example, numerators and/or denominators) and then are rounded themselves to one decimal using normal rounding. In normal rounding to a single digit, if the final or only digit dropped is zero to four, the last digit retained is not changed. If the first or only digit dropped is five to nine, the last digit retained is increased by one. Sums and differences of aggregates (or ratios) are derived from their corresponding unrounded components and then are rounded themselves to the nearest 100 units (or the nearest one decimal) using normal rounding. Under no circumstances are unrounded estimates, published or otherwise, released. Unrounded estimates imply greater precision than actually exists.
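The rounding rules above can be sketched with Python's `decimal` module. The built-in `round` is avoided because it rounds halves to the nearest even digit, which differs from the normal rounding described here. The function names and the figures are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_hundred(value):
    """Normal rounding to the nearest 100: a dropped 5 to 9 rounds up."""
    hundreds = (Decimal(str(value)) / 100).quantize(
        Decimal("1"), rounding=ROUND_HALF_UP)
    return int(hundreds * 100)


def round_to_one_decimal(value):
    """Normal rounding to one decimal place."""
    return Decimal(str(value)).quantize(
        Decimal("0.1"), rounding=ROUND_HALF_UP)


# Hypothetical unrounded cell estimates in one table column.
cells = [1249, 1249]
rounded_cells = [round_to_hundred(c) for c in cells]

# The total is derived from the unrounded components, then rounded itself.
rounded_total = round_to_hundred(sum(cells))
```

In this example each cell rounds to 1,200 while the total rounds to 2,500, so rounded cells need not add up to the rounded total.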
Cycle 2, household level (general questionnaire) cross-sectional response rate: 82.6%
Cycle 2, selected person level (health questionnaire) cross-sectional response rate: 95.6%
Consult the Cycle 2 NPHS User Guide in the Documentation section, more specifically Chapter 8 on data quality and Chapter 10 on approximate sampling variability tables.