This survey was designed to collect information on the health of the Canadian population and related socio-demographic information.
Data release – November 24, 2004
In the fall of 1991, the National Health Information Council recommended that an ongoing national survey of population health be conducted. This recommendation was based on consideration of the economic and fiscal pressures on the health care systems and the requirement for information with which to improve the health status of the population in Canada. Commencing in April 1992, Statistics Canada received funding for development of a National Population Health Survey (NPHS).
The objectives of the NPHS are to:
- aid in the development of public policy by providing measures of the level, trend and distribution of the health status of the population;
- provide data for analytic studies that will assist in understanding the determinants of health;
- collect data on the economic, social, demographic, occupational and environmental correlates of health;
- increase the understanding of the relationship between health status and health care utilization, including alternative as well as traditional services;
- provide information on a panel of people who will be followed over time to reflect the dynamic process of health and illness;
- provide the provinces and territories and other clients with a health survey that will permit supplementation of content or sample (due to the longitudinal nature of NPHS the sample option is no longer available; the Canadian Community Health Survey [CCHS] is now providing this possibility);
- allow the possibility of linking survey data to routinely collected administrative data such as vital statistics, environmental measures, community variables, and health services utilization.
The NPHS collects information related to the health of the Canadian population and related socio-demographic information. It is composed of three components: the Households, the Health Institutions, and the North components.
The Household component started in 1994/1995 and is conducted every two years. The first three cycles (1994/1995, 1996/1997 and 1998/1999) were both cross-sectional and longitudinal. Beginning in Cycle 4 (2000/2001) the survey became strictly longitudinal (collecting health information from the same individuals each cycle). The cross-sectional and longitudinal documentation of the Household component is presented separately as well as the documentation for the Health Institutions and North components.
The NPHS longitudinal sample includes 17,276 persons of all ages selected in 1994/1995, and these same persons will be interviewed every two years.
Each cycle, a common set of health questions is asked of the respondents. This allows for the analysis of changes in the health of the respondents over time. In addition to the common set of questions, the questionnaire includes focus content and supplements that change from cycle to cycle. For the complete list of topics covered by the NPHS over time, please consult "NPHS Content, Household Component" in the Documentation section at the bottom of the page.
Health Canada, Public Health Agency of Canada and provincial ministries of health use NPHS longitudinal data to plan, implement and evaluate programs and health policies to improve health and the efficiency of health services. Non-profit health organizations and researchers in the academic fields use the information to move research ahead and to improve health.
The target population of the longitudinal NPHS Household component consists of household residents in the ten Canadian provinces in 1994/1995, excluding persons living on Indian Reserves and Crown Lands, residents of health institutions, full-time members of the Canadian Forces living on bases, and residents of some remote areas in Ontario and Quebec.
Each NPHS cycle questionnaire is developed in collaboration with specialists from Statistics Canada, Health Canada, provincial ministries of health and researchers from the academic fields. Questionnaire development involves an extensive literature review and numerous consultations between specialists in order to adapt existing survey instruments from other well-known sources, or to create new ones especially for the NPHS. Every questionnaire is approved by members of the expert committees and the Advisory Committee, which includes representatives from the provincial ministries of health, Health Canada, Statistics Canada, other government departments and specialists.
Data collection is performed using a computer-assisted interview (CAI) system. The logical flow of the questions is programmed to reflect the skip patterns associated with certain variables such as age. The program also takes into account the type of answer required, the allowed minimum and maximum values, on-line edits associated with the question and what to do in case of item non-response.
Before collecting data from respondents, the data collection computer application is extensively tested in order to identify any errors in the program flow and text. Furthermore, two field tests are conducted each cycle. The sample size of each of the tests is approximately 900 respondents. The tests involve Statistics Canada's Regional Offices and the interviews are conducted by Statistics Canada's interviewers. The main objectives of the two tests are to observe the respondents' reactions to the survey, to test new or modified questions, to obtain time estimates for the various sections of the questionnaire, to study the response rates and to test field operations and procedures such as the interviewer training and data transmission.
This is a sample survey with a longitudinal design.
The same longitudinal units are followed over time.
The NPHS employed a stratified two-stage sample design (clusters, dwellings) based on the Labour Force Survey (LFS) in all provinces except Québec, where Santé Québec's design for the 1992/1993 Enquête sociale et de santé (ESS) was used.
The LFS design consists of a multistage stratified sample where dwellings are selected within clusters. Each province was divided into 3 types of areas (Major Urban Centres, Urban Towns and Rural Areas) from which separate geographic and/or socio-economic strata were formed. In most strata, 6 clusters, usually Census Enumeration Areas (EAs), were selected with probability proportional to size (PPS). The sample of dwellings was obtained once listing operations in sample clusters were completed. Requirements specific to the NPHS led to small modifications to the LFS sampling strategy. For more details on the NPHS sampling plan, consult Chapter 5 of the Longitudinal Documentation.
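To illustrate what selection with probability proportional to size means, the following is a minimal Python sketch of systematic PPS sampling within one stratum. It is not the LFS or NPHS selection program: the function names, the cluster sizes and the choice of the systematic variant of PPS are all assumptions made for illustration.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n, rng=None):
    """Pick n cluster indices with probability proportional to size.

    Systematic PPS: lay the clusters end to end on a line of length
    sum(sizes), then take n equally spaced points from a random start.
    """
    rng = rng or random.Random()
    cum = list(accumulate(sizes))      # running totals of cluster sizes
    step = cum[-1] / n                 # sampling interval
    start = rng.random() * step        # random start in [0, step)
    picks, idx = [], 0
    for k in range(n):
        point = start + k * step
        # advance to the cluster whose interval contains this point
        while idx < len(cum) - 1 and cum[idx] <= point:
            idx += 1
        picks.append(idx)
    return picks


def inclusion_probabilities(sizes, n):
    """Inclusion probability of each cluster (valid when size <= interval)."""
    total = sum(sizes)
    return [n * s / total for s in sizes]


# Hypothetical stratum: 12 clusters with made-up dwelling counts.
sizes = [120, 80, 200, 150, 60, 90, 180, 110, 70, 130, 100, 90]
selected = pps_systematic(sizes, 6, random.Random(1))
probs = inclusion_probabilities(sizes, 6)
```

Larger clusters cover a longer stretch of the line and are therefore more likely to contain one of the sampling points. The inverse of the inclusion probability is the natural starting point for a design weight.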
In Québec, the NPHS sample was selected from dwellings participating in a health survey organized by Santé Québec: the 1992/1993 ESS. The survey sampled 16,010 dwellings using a two-stage design similar to that of the LFS. The province was divided geographically by crossing fifteen health areas with four urban density classes (the Montreal Census Metropolitan Area, regional capitals, small urban agglomerations, and the rural sector). In each area, clusters were defined using socio-economic characteristics and selected using a PPS sample. Selected clusters were enumerated and random samples of their dwellings were drawn: ten per cluster in major cities, twenty or thirty elsewhere.
In the first cycle of the NPHS (1994/1995), the sample was created by first selecting households and then within each household, choosing one member 12 years of age or older to be the longitudinal respondent. The NPHS longitudinal sample consists of all longitudinal respondents who have completed at least the general component of the questionnaire in Cycle 1. It also includes 2,022 children from the first cycle (1994/1995) of the National Longitudinal Survey of Children and Youth (NLSCY). These children were interviewed by the NLSCY for the NPHS in Cycle 1 and have been interviewed by the NPHS since the second cycle. Please note that the persons selected in 1994/1995 as part of the supplemental buy-in samples (for cross-sectional purposes) were not part of the longitudinal sample. The NPHS longitudinal sample is composed of 17,276 persons and is not renewed over time.
Data collection for this reference period: 2002-05-30 – 2003-07-05
Responding to this survey is voluntary.
Data are collected directly from survey respondents.
Collection for the household sample is divided into four quarters (June, August, October and January). An additional collection period is added for further follow-up of non-respondents from previous quarters.
In Cycle 6, data collection for quarters 1 to 3 was performed by interviewers working from their homes. Data collection for quarter 4 and the additional period was performed by interviewers working in Statistics Canada Calling Centres (Edmonton, Sturgeon Falls, Sherbrooke and Halifax).
Data collection for respondents living in health institutions is conducted in person using a paper and pencil questionnaire. The health institutions component questionnaires are used for this purpose.
At the beginning of each cycle, each living longitudinal panel member receives by mail a letter announcing the start of data collection and a brochure, which provides information about the survey and presents results based on NPHS data. The interview is conducted using a computer-assisted interview (CAI) system. In Cycle 1, 75% of the interviews with the longitudinal respondents were conducted in person and the rest by telephone. Since Cycle 2, around 95% of the interviews have been conducted by telephone. Personal interviews are conducted if the respondent does not have a telephone, if the respondent requests one, or if the respondent lives in a health institution. The interview lasts less than one hour. Interviews for respondents under 12 years old are done by proxy. For other respondents, proxy reporting is allowed only for reasons of illness or incapacity.
Interviewers are employees hired by Statistics Canada. Each cycle, interviewers attend a training session specific to NPHS and they receive a manual for reference.
The questionnaire is designed to be used with a CAI system. Questions are specified along with the type of answer required, the minimum and maximum values, on-line edits associated with the question, and what to do in case of item non-response. The CAI questionnaire gets customized to the respondent based on the data collected during the current and previous interviews.
During collection, all reasonable attempts are made to obtain interviews with longitudinal respondents. Interviewer training covers ways of reducing the number of non-contacts and refusals and of increasing the success of tracing. The interview is not conducted with respondents living outside Canada and the United States.
A certain number of questions allow write-in responses. The write-in information is coded using various classification systems.
The industry and occupation data for all cycles are coded to the North American Industry Classification System (NAICS) and Standard Occupational Classification 1991 (SOC-91).
The drug coding for all cycles is based on the Anatomical Therapeutic Chemical (ATC) classification developed by the World Health Organization (WHO) as available on the Health Canada Drug Product Database (DPD) in September 2003.
For all cycles the conditions or health problems causing activity restrictions were coded based on the International Classification of Diseases, 10th Revision (ICD-10). The Musculoskeletal Impairment Supplementary Coding Scheme was not used.
When the death of a respondent is confirmed against the Canadian Vital Statistics Database - Deaths, the cause and the date of death are captured. The cause of death is coded using the ICD-10.
Editing is done in two stages. The first stage of editing is performed during data collection. Valid ranges for variables have been programmed in the computer-assisted interviewing (CAI) application as well as consistency edits between variables and between cycles. The flow of questions is controlled by the CAI application. Warning messages appear on the CAI screen when an invalid value is captured or when inconsistencies are detected by the application. The interviewers then have the opportunity to confirm responses with the respondents. In most cases the conflict has to be resolved before the interview can continue. The second stage of editing is performed during data processing at Head Office (mainly by computer programs). Inconsistencies discovered at this stage are usually corrected by setting one or more of the variables in question to "not stated". The exception is the relationship edits, for which inconsistencies are resolved manually.
Estimation from NPHS data is done using the sampling weights provided with each data set. These weights are computed using an approach where an initial weight representing the inverse probability of selection is computed. This weight is then adjusted to take into account the various specifics of the survey. The typical adjustment is the one to compensate for non-response; homogeneous response groups are formed based on data available from both respondents and non-respondents. For the longitudinal sample, this adjustment tries as much as possible to use the longitudinal data from previous cycles. The Chi-Square Automatic Interaction Detection (CHAID) algorithm is used to determine which variables best characterize the response groups. Once the adjustments have been made, the last step consists of post-stratifying the weights within each province, for 10 age-sex groups (one-dimensional post-stratification). This post-stratification is done to ensure consistency with the (1991 Census-based) population estimates for 1994, the reference year for the panel.
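The sequence of weighting steps described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the NPHS weighting system: the response groups are taken as given (the CHAID step is not implemented), the post-strata stand in for the province by age-sex groups, and all function names and figures are hypothetical.

```python
from collections import defaultdict


def initial_weights(selection_probs):
    """Initial weight: inverse of the probability of selection."""
    return [1.0 / p for p in selection_probs]


def nonresponse_adjust(weights, groups, responded):
    """Within each response group, move non-respondents' weight to respondents."""
    total = defaultdict(float)   # weight of all sampled units per group
    resp = defaultdict(float)    # weight of respondents per group
    for w, g, r in zip(weights, groups, responded):
        total[g] += w
        if r:
            resp[g] += w
    return [w * total[g] / resp[g] if r else 0.0
            for w, g, r in zip(weights, groups, responded)]


def post_stratify(weights, strata, control_totals):
    """Scale weights so each post-stratum sums to its population total."""
    sums = defaultdict(float)
    for w, s in zip(weights, strata):
        sums[s] += w
    return [w * control_totals[s] / sums[s] if w > 0 else 0.0
            for w, s in zip(weights, strata)]


# Hypothetical four-person sample.
probs = [0.1, 0.1, 0.2, 0.2]
groups = ["a", "a", "b", "b"]              # assumed response groups
responded = [True, False, True, True]
strata = ["m", "m", "f", "f"]              # assumed post-strata
controls = {"m": 40.0, "f": 30.0}          # assumed population totals

w = initial_weights(probs)
w = nonresponse_adjust(w, groups, responded)
w = post_stratify(w, strata, controls)
```

After the last step the weights in each post-stratum add up to the corresponding population total, which is the consistency property that post-stratification is meant to provide.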
For each of the sampling weights computed for the group of respondents in each cycle, a "share" version of the weights is also computed. These share weights are only given to those respondents who agreed to share their data with the survey partners (typically Health Canada and the various provincial health ministries). The computation of these weights involves the redistribution of the weights of the non-sharers to the sharers using a similar approach to that of the non-response adjustment. Since the share partners only have access to the share data, they must use the share weights for estimation.
Several sets of longitudinal weights were computed throughout the NPHS cycles. First, for Cycle 1, sampling weights were computed to represent the entire panel of 17,276 persons. For Cycle 2, two types of longitudinal weights were computed; the first one was associated exclusively with the subset of panel members who had fully responded to the survey in both cycles, and the second one with the subset that partially or fully responded in both cycles. From Cycle 3 onwards, only the weights for the subset that provided full responses to all cycles were recomputed after each cycle. Since Cycle 4, new weights have been added to those that already existed; these weights are computed for the subset of panel members who fully responded to Cycle 1 and to the most recent cycle (for example, the Cycle 5 weight covers Cycles 1 and 5).
Given the NPHS's multi-stage survey design, the NPHS uses the bootstrap method to calculate the variance. This method takes the complexities of the survey design into account, as well as the various adjustments to the weights during the weighting process. For each sampling weight, a set of bootstrap weights is available to calculate the variances. Note that the Bootvar program, a program made up of macros available in SAS and SPSS, is distributed with the bootstrap weights in order to calculate the variances with this method.
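The way bootstrap weights are used to obtain a variance can be sketched in Python as follows. This is not the Bootvar program: it assumes the commonly used formula in which the variance is the average squared deviation of the bootstrap replicate estimates from the full-sample estimate, and the function names and data are hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean (a weighted proportion when values are 0/1)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def bootstrap_variance(values, main_weights, bootstrap_weights,
                       estimator=weighted_mean):
    """Return (estimate, variance) using one set of bootstrap weights.

    The estimate uses the main sampling weights. Each row of
    bootstrap_weights gives one replicate estimate, and the variance is
    the mean squared deviation of the replicates from the estimate
    (assumed formula).
    """
    theta = estimator(values, main_weights)
    reps = [estimator(values, bw) for bw in bootstrap_weights]
    variance = sum((t - theta) ** 2 for t in reps) / len(reps)
    return theta, variance


# Hypothetical data: a 0/1 indicator, main weights, two bootstrap replicates.
values = [1, 0, 1, 0]
main = [1.0, 1.0, 1.0, 1.0]
boot = [[2.0, 0.0, 1.0, 1.0],
        [0.0, 2.0, 1.0, 1.0]]

estimate, variance = bootstrap_variance(values, main, boot)
cv = math.sqrt(variance) / estimate   # coefficient of variation
```

In practice the number of bootstrap replicates is far larger than two. Any estimator can be passed in place of `weighted_mean`, as long as it is recomputed from scratch with each set of bootstrap weights.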
For more information on the estimation process, consult section 7.4 of the Cycle 5 Longitudinal Documentation.
Statistics Canada is prohibited by law from releasing any data which would divulge information obtained under the Statistics Act that relates to any identifiable person, business or organization without the prior knowledge or the consent in writing of that person, business or organization. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.
In Cycles 1, 2 and 3 of the NPHS, cross-sectional Public Use Microdata Files (PUMFs) were produced in addition to the Master files. The PUMFs differ in a number of important aspects from the survey 'master' files held by Statistics Canada. These differences are the result of actions taken to protect the anonymity of survey respondents. First, only cross-sectional data are available on such files, because longitudinal information can lead to the identification of respondents. Also, some sensitive variables are regrouped, capped or completely deleted from the files. All the PUMFs must be approved by Statistics Canada's Microdata Release Committee before their release.
Access to the longitudinal Master files and to the information excluded from the PUMFs is available through Statistics Canada's Research Data Centres program or Remote Access program. Custom tabulations can also be purchased. All outputs are vetted for confidentiality before being given to users.
Before releasing and/or publishing any estimate from these files, users should first determine the number of sampled respondents who contribute to the calculation of the estimate. If this number is less than 30, the weighted estimate should not be released regardless of the value of the coefficient of variation for this estimate. For weighted estimates based on sample sizes of 30 or more, users should determine the coefficient of variation of the rounded estimate and follow the guidelines below.
Estimates in the main body of a statistical table are rounded to the nearest hundred units using the normal rounding technique. If the first or only digit dropped is zero to four, the last digit retained is not changed. If the first or only digit dropped is five to nine, the last digit retained is raised by one. Marginal sub-totals and totals in statistical tables are derived from their corresponding unrounded components and then are rounded themselves to the nearest 100 units using normal rounding methods. Averages, proportions, rates and percentages are computed from unrounded components (for example, numerators and/or denominators) and then are rounded themselves to one decimal using normal rounding. In normal rounding to a single digit, if the final or only digit dropped is zero to four, the last digit retained is not changed. If the first or only digit dropped is five to nine, the last digit retained is increased by one. Sums and differences of aggregates (or ratios) are derived from their corresponding unrounded components and then are rounded themselves to the nearest 100 units (or the nearest one decimal) using normal rounding. Under no circumstances are unrounded estimates, published or otherwise, released.
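The normal rounding rule described above always rounds a dropped digit of five upward, which differs from the default behaviour of some software. As a minimal sketch (the function names and figures are hypothetical), the rule can be written in Python with the `decimal` module, because the built-in `round()` rounds ties to the nearest even digit:

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_hundred(x):
    """Round a non-negative estimate to the nearest 100, halves going up."""
    hundreds = (Decimal(str(x)) / 100).quantize(Decimal("1"),
                                                rounding=ROUND_HALF_UP)
    return int(hundreds * 100)


def round_one_decimal(x):
    """Round a rate or percentage to one decimal, halves going up."""
    return Decimal(str(x)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Totals are rounded from the unrounded components, not from rounded cells.
cells = [1240, 1240]                               # hypothetical estimates
rounded_cells = [round_to_hundred(c) for c in cells]
rounded_total = round_to_hundred(sum(cells))
```

In this example each cell rounds to 1,200 while the total of 2,480 rounds to 2,500, so a published total need not equal the sum of the published cells.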
Coverage errors occur at a number of stages in the survey: during frame design, sampling unit definition, data collection and processing. An indicator, the slippage rate, is used to measure the coverage error. The slippage rate represents the discrepancy between population estimates based on the survey (using the weights computed before post-stratification) and the most recent census-based population estimates. The discrepancy is expressed as a percentage of the census-based estimate. As with most surveys, NPHS observed some under-coverage, which is manifested by a positive slippage rate (about 10% for the longitudinal sample selected in 1994/95). To reduce the effect of the coverage error, sampling weights are adjusted during their computation according to population estimates provided for the survey's reference period.
Please note that the Cycle 1 response rate is based on the 20,095 in-scope persons selected to form the longitudinal panel while the response rate for subsequent cycles is based on the 17,276 individuals who form the longitudinal panel.
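The slippage rate can be written as a one-line calculation. The Python sketch below assumes the sign convention implied by the text, in which a survey estimate below the census-based estimate gives a positive rate; the function name and the figures are hypothetical.

```python
def slippage_rate(survey_estimate, census_estimate):
    """Slippage rate in percent; positive values indicate under-coverage."""
    return 100.0 * (census_estimate - survey_estimate) / census_estimate


# Hypothetical figures: the survey weights add up to 900 persons
# where the census-based estimate is 1,000.
rate = slippage_rate(900, 1000)
```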
Cycle Response rate
Cycle 1 : 83.6%
Cycle 2 : 92.8%
Cycle 3 : 88.2%
Cycle 4 : 84.8%
Cycle 5 : 80.6%
Attrition is a loss in sample size due to non-response, i.e. refusals, no-contacts, unable-to-trace cases, etc. Note that deceased respondents are not considered part of attrition for the NPHS longitudinal sample. The cumulative attrition rate is presented for each cycle. Each attrition rate is calculated using the number of individuals found in the Full subset of respondents, i.e. those who completed the questionnaire in all cycles. The main cause of attrition is an increasing number of respondents who refuse to continue to participate in the survey.
Cycle Attrition rate
Cycle 2 : 9.3%
Cycle 3 : 15.4%
Cycle 4 : 21.4%
Cycle 5 : 27.4%
Item non-response for the Cycle 5 questionnaire was around 0.7%. Higher non-response was observed for a few variables such as household and personal income, insurance and some stress variables.
The Working Paper, NPHS Data Quality: Exploring Non-sampling Errors, DMEM 2003-2004 provides information on a number of data quality indicators (e.g., number of attempted contacts, length of interview, unit/item non-response (refusal/don't know) and edit failures) and is available upon request from the Data Access Unit, Health Statistics Division, firstname.lastname@example.org.