National Population Health Survey: Household Component, Longitudinal (NPHS)
This survey was designed to collect information on the health of the Canadian population and related socio-demographic information.
Detailed information for 2010-2011 (Cycle 9)
Data release - September 12, 2012. After nine cycles, the National Population Health Survey has ended. Special thanks and appreciation to all NPHS respondents who, by devoting their time, made this survey possible.
In the fall of 1991, the National Health Information Council recommended that an ongoing national survey of population health be conducted. This recommendation was based on consideration of the economic and fiscal pressures on the health care systems and the requirement for information with which to improve the health status of the population in Canada. Commencing in April 1992, Statistics Canada received funding for development of a National Population Health Survey (NPHS).
The objectives of the NPHS are to:
- aid in the development of public policy by providing measures of the level, trend and distribution of the health status of the population;
- provide data for analytic studies that will assist in understanding the determinants of health;
- collect data on the economic, social, demographic, occupational and environmental correlates of health;
- increase the understanding of the relationship between health status and health care utilization, including alternative as well as traditional services;
- provide information on a panel of people who will be followed over time to reflect the dynamic process of health and illness;
- provide the provinces and territories and other clients with a health survey that will permit supplementation of content or sample (due to the longitudinal nature of NPHS the sample option is no longer available; the Canadian Community Health Survey [CCHS] is now providing this possibility);
- allow the possibility of linking survey data to routinely collected administrative data such as vital statistics, environmental measures, community variables, and health services utilization.
The NPHS collects information related to the health of the Canadian population and related socio-demographic information. It is composed of three components: the Household, Health Institutions, and North components.
The Household component started in 1994/1995 and is conducted every two years. The first three cycles (1994/1995, 1996/1997 and 1998/1999) were both cross-sectional and longitudinal. Beginning in Cycle 4 (2000/2001) the survey became strictly longitudinal (collecting health information from the same individuals each cycle). The cross-sectional and longitudinal documentation of the Household component is presented separately, as is the documentation for the Health Institutions and North components.
The NPHS longitudinal sample includes 17,276 persons of all ages selected in 1994/1995; these same persons are interviewed every two years.
Each cycle, a common set of health questions is asked of the respondents. This allows for the analysis of changes in the health of the respondents over time. In addition to the common set of questions, the questionnaire includes focus content and supplements that change from cycle to cycle. For the complete list of topics covered by the NPHS over time, please consult "NPHS Content, Household Component" in the Documentation section at the bottom of the page.
Health Canada, the Public Health Agency of Canada and provincial ministries of health use NPHS longitudinal data to plan, implement and evaluate programs and health policies to improve health and the efficiency of health services. Non-profit health organizations and academic researchers use the information to advance research and to improve health.
- Diseases and health conditions
- Health care services
- Lifestyle and social conditions
- Mental health and well-being
- Prevention and detection of disease
Data sources and methodology
The target population of the longitudinal NPHS Household component includes household residents in the ten Canadian provinces in 1994/1995, excluding persons living on Indian Reserves and Crown Lands, residents of health institutions, full-time members of the Canadian Forces Bases, and residents of some remote areas in Ontario and Quebec.
Each NPHS cycle questionnaire is developed in collaboration with specialists from Statistics Canada, Health Canada, provincial ministries of health and academic researchers. The questionnaire development involves an extensive literature review and numerous consultations between specialists in order to adapt existing survey instruments from other well-known sources, or to create new ones especially for the NPHS. Every questionnaire is approved by Statistics Canada, members of the expert committees and the Advisory Committee, which includes representatives from the provincial ministries of health, Health Canada, Public Health Agency of Canada, Statistics Canada, other government departments and specialists.
Data collection is performed using a computer-assisted interview (CAI) system. The logical flow of the questions is programmed to reflect the skip patterns associated with certain variables such as age. The program also takes into account the type of answer required, the allowed minimum and maximum values, the on-line edits associated with the question and what to do in case of item non-response.
Before collecting data from respondents, the data collection computer application is tested extensively in order to identify any errors in the program flow and text. Furthermore, field tests were conducted in Cycles 1 to 8. From Cycle 1 to Cycle 6, two field tests were conducted. For Cycles 7 and 8, only one field test was conducted (in November prior to collection). The samples from the two field tests were combined, and the majority of long-term non-respondents were removed from the sample; a small number were kept in order to test tracing procedures, among other things. The interviews were conducted by Statistics Canada interviewers in Statistics Canada's regional offices. The objectives of the test remained the same as before: to observe the respondents' reactions to the survey, to test the questionnaire with the changing focus content from one cycle to another, to obtain time estimates for the various sections of the questionnaire, to study the response rates, and to test field operations and procedures such as the interviewer training and data transmission. In Cycle 9, no field test was conducted since changes to the questionnaire were minimal. Instead, additional in-house testing was performed.
This is a sample survey with a longitudinal design.
The same longitudinal units are followed over time.
The NPHS employed a stratified two-stage sample design (clusters, dwellings) based on the Labour Force Survey (LFS) in all provinces except Québec, where Santé Québec's design for the 1992/1993 Enquête sociale et de santé (ESS) was used.
The LFS design consists of a multistage stratified sample in which dwellings are selected within clusters. Each province was divided into 3 types of areas (Major Urban Centres, Urban Towns and Rural Areas) from which separate geographic and/or socio-economic strata were formed. In most strata, 6 clusters, usually Census Enumeration Areas (EAs), were selected with probability proportional to size (PPS). The sample of dwellings was obtained once listing operations in sample clusters were completed. Requirements specific to the NPHS led to small modifications to the LFS sampling strategy. For more details on the NPHS sampling plan, consult chapter 5 of the Longitudinal Documentation.
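The PPS selection of clusters within a stratum can be illustrated with a minimal sketch. It assumes systematic PPS sampling from a random start, which is one common way to implement PPS and is not confirmed here as the exact LFS procedure. The function names and the cluster sizes in the usage note are hypothetical.

```python
import random

def pps_systematic_sample(sizes, n_select, rng=None):
    """Pick n_select cluster indices with probability proportional to size.

    Assumption: systematic PPS from a random start. A cluster larger than
    the sampling step could be picked more than once.
    """
    rng = rng or random.Random()
    total = float(sum(sizes))
    step = total / n_select            # sampling interval on the size scale
    start = rng.random() * step        # random start in [0, step)
    selected = []
    cumulative = 0.0                   # total size of clusters already passed
    idx = 0
    for k in range(n_select):
        target = start + k * step
        # Advance to the cluster whose size interval contains the target.
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= target:
            cumulative += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected

def inclusion_probabilities(sizes, n_select):
    """Nominal inclusion probability of each cluster: n * size / total size."""
    total = float(sum(sizes))
    return [n_select * s / total for s in sizes]
```

For example, with twelve hypothetical clusters of sizes 120, 80, 200, 150, 50, 100, 200, 90, 110, 60, 140 and 100 dwellings, `pps_systematic_sample(sizes, 6)` returns six cluster indices, and the nominal inclusion probabilities sum to 6.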
In Québec, the NPHS sample was selected from dwellings participating in a health survey organized by Santé Québec: the 1992/1993 ESS. The survey sampled 16,010 dwellings using a two-stage design similar to that of the LFS. The province was divided geographically by crossing fifteen health areas with four urban density classes (the Montreal Census Metropolitan Area, regional capitals, small urban agglomerations, and the rural sector). In each area, clusters were defined using socio-economic characteristics and selected with PPS sampling. Selected clusters were enumerated and random samples of their dwellings were drawn: ten per cluster in major cities, twenty or thirty elsewhere.
In the first cycle of the NPHS (1994/1995), the sample was created by first selecting households and then within each household, choosing one member 12 years of age or older to be the longitudinal respondent. The NPHS longitudinal sample consists of all longitudinal respondents who have completed at least the general component of the questionnaire in Cycle 1. It also includes 2,022 children from the first cycle (1994/1995) of the National Longitudinal Survey of Children and Youth (NLSCY). These children were interviewed by the NLSCY for the NPHS in Cycle 1 and have been interviewed by the NPHS since the second cycle. Please note that the persons selected in 1994/1995 as part of the supplemental buy-in samples (for cross-sectional purposes) were not part of the longitudinal sample. The NPHS longitudinal sample is composed of 17,276 persons and is not renewed over time.
Responding to this survey is voluntary.
Data are collected directly from survey respondents.
For seasonality reasons, collection is split into four quarters, starting in May, July, September, and January. There is an additional collection period starting in April of the second year with further follow-up of the non-respondents. Dates and causes of death are confirmed with the Canadian Vital Statistics Database - Deaths. The causes are then coded using the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10).
Data collection was performed by interviewers working in the four Statistics Canada Regional Offices (Edmonton, Sturgeon Falls, Sherbrooke and Halifax).
Data collection for the respondents of the household component panel who moved to health institutions was conducted in person using a paper and pencil questionnaire until Cycle 8, when the NPHS began using a computer-assisted personal interview (CAPI) application instead. The health institutions component questionnaires are used for this purpose.
At the beginning of each cycle, each living longitudinal panel member receives, by mail, a letter announcing the start of data collection along with some additional information which includes results based on NPHS data. The interview is conducted using a computer-assisted interview (CAI) system. In Cycle 1, 75% of the interviews with the longitudinal respondents were conducted in person and the rest by telephone. Since Cycle 2, around 95% of the interviews have been conducted by telephone. Personal interviews are conducted if the respondent does not have a telephone, upon request by the respondent, or if the respondent lives in a health institution. The interview lasts less than one hour. Interviews for respondents under 12 years old are done by proxy. However, proxy reporting for the other respondents is allowed only for reasons of illness or incapacity.
Interviewers are employees hired by Statistics Canada. Each cycle, interviewers attend a training session specific to NPHS and they receive a manual for reference.
The questionnaire is designed to be used with a CAI system. Questions are specified along with the type of answer required, the minimum and maximum values, on-line edits associated with the question, and what to do in case of item non-response. The CAI questionnaire gets customized to the respondent based on the data collected during the current and previous interviews.
During collection, all reasonable attempts to obtain interviews with longitudinal respondents are made. The interviewer training covers ways of reducing the number of non-contacts and refusals and of increasing the success of tracing. The interview is not conducted with respondents living outside Canada and the United States.
A certain number of questions allow write-in responses. The write-in information is coded using various classification systems.
The industry and occupation data for all cycles are coded to the North American Industry Classification System (NAICS) 2007 and National Occupational Classification for Statistics (NOC-S) 2006.
The drug coding for all cycles is based on the Anatomical Therapeutic Chemical (ATC) Classification System developed by the World Health Organization (WHO).
For all cycles the conditions or health problems causing activity restrictions were coded based on the ICD-10.
When the death of a respondent is confirmed against the Canadian Vital Statistics Database - Deaths, the cause and the date of death are captured. The cause of death is coded using the ICD-10.
View the Questionnaire(s) and reporting guide(s).
Editing is done in two stages. The first stage of editing is performed during data collection. Valid ranges for variables have been programmed in the computer-assisted interviewing (CAI) application, as have consistency edits between variables and between cycles. The flow of questions is controlled by the CAI application. Warning messages appear on the CAI screen when an invalid value is captured or when inconsistencies are detected by the application. The interviewers then have the opportunity to confirm responses with the respondents. In most cases the conflict has to be resolved before the interview can continue. The second stage of editing is performed during data processing at Head Office (mainly by computer programs). Inconsistencies discovered at this stage are usually corrected by setting one or more of the variables in question to "not stated". The exception to this is the relationship edits, for which inconsistencies are processed manually.
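The two editing stages can be sketched as follows. This is an illustration only: the variable names, the valid range and the consistency rule are hypothetical and are not taken from the NPHS edit specifications.

```python
NOT_STATED = "not stated"

def range_edit(value, minimum, maximum):
    """Stage 1 (during collection): is the captured value within the valid range?

    In a CAI application a False result would trigger a warning so that the
    interviewer can confirm the response with the respondent.
    """
    return minimum <= value <= maximum

def consistency_edit(record):
    """Stage 2 (processing): resolve an inconsistency by setting a variable
    to "not stated".

    Hypothetical rule: the age at which a respondent started smoking cannot
    exceed the respondent's current age.
    """
    edited = dict(record)  # leave the input record unchanged
    age = edited.get("age")
    started = edited.get("age_started_smoking")
    if isinstance(age, int) and isinstance(started, int) and started > age:
        edited["age_started_smoking"] = NOT_STATED
    return edited
```

For instance, `consistency_edit({"age": 40, "age_started_smoking": 55})` returns a record in which `age_started_smoking` is "not stated", while a consistent record is returned unchanged.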
Estimation from NPHS data is done using the sampling weights provided with each data set. The computation of these weights starts from an initial weight equal to the inverse of the probability of selection. This weight is then adjusted to take into account the various specifics of the survey. The typical adjustment is the one to compensate for non-response; homogeneous response groups are formed based on data available from both respondents and non-respondents. For the longitudinal sample, this adjustment tries as much as possible to use the longitudinal data from previous cycles. The Chi-Square Automatic Interaction Detection (CHAID) algorithm is used to determine which variables best characterize the response groups. Once the adjustments have been made, the last step consists of post-stratifying the weights within each province, for 10 age-sex groups (one-dimensional post-stratification). This post-stratification is done to ensure consistency with the (1996 Census-based) population estimates for 1994, the reference year for the panel.
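The sequence of weighting steps described above (initial weight, non-response adjustment within response groups, post-stratification) can be sketched as below. This is a simplified illustration, not the NPHS production method: the response groups are taken as given rather than derived with CHAID, and the dictionary keys (`weight`, `group`, `stratum`, `responded`) are hypothetical.

```python
from collections import defaultdict

def initial_weight(selection_probability):
    """Initial weight: inverse of the probability of selection."""
    return 1.0 / selection_probability

def adjust_for_nonresponse(units):
    """Redistribute the weight of non-respondents to respondents within each
    response group. Returns the respondents only, with adjusted weights.

    Assumes every group contains at least one respondent.
    """
    group_total = defaultdict(float)
    group_resp = defaultdict(float)
    for u in units:
        group_total[u["group"]] += u["weight"]
        if u["responded"]:
            group_resp[u["group"]] += u["weight"]
    adjusted = []
    for u in units:
        if u["responded"]:
            factor = group_total[u["group"]] / group_resp[u["group"]]
            adjusted.append({**u, "weight": u["weight"] * factor})
    return adjusted

def post_stratify(units, control_totals):
    """Scale weights so they sum to the population control total in each
    post-stratum (for example, a province by age-sex group)."""
    stratum_sum = defaultdict(float)
    for u in units:
        stratum_sum[u["stratum"]] += u["weight"]
    return [
        {**u, "weight": u["weight"] * control_totals[u["stratum"]] / stratum_sum[u["stratum"]]}
        for u in units
    ]
```

After `post_stratify`, the weights in each post-stratum add up exactly to the supplied control total, which is the consistency property the post-stratification step is meant to guarantee.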
At the end of each interview, questions are asked about the agreement to share data with the survey partners: Health Canada, Public Health Agency of Canada and the provincial health ministries. The "share" version of the microdata file contains data (and corresponding survey weights) from only those respondents who agreed to share their data. The computation of these weights involves the redistribution of the weights of the non-sharers to the sharers using a similar approach to that of the non-response adjustment. Since the share partners only have access to the share data, they must use the share weights for estimation.
Several sets of longitudinal weights were computed throughout the NPHS cycles. First, for Cycle 1, sampling weights were computed to represent the entire panel of 17,276 persons. For Cycle 2, two types of longitudinal weights were computed; the first one was associated exclusively with the subset of panel members who had fully responded to the survey in both cycles, and the second one with the subset that partially or fully responded in both cycles. From Cycle 3 onwards, the weights for the subset that provided full responses to all cycles were recomputed after each cycle.
Given the NPHS's multi-stage survey design, the NPHS uses the bootstrap method to calculate the variance. This method takes the complexities of the survey design into account, as well as the various adjustments to the weights during the weighting process. For each sampling weight, a set of bootstrap weights is available to calculate the variances. Note that the Bootvar program, a program made up of macros available in SAS and SPSS, is distributed with the bootstrap weights in order to calculate the variances with this method.
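The way a set of bootstrap weights yields a variance can be sketched as follows. This is not the Bootvar program. It assumes the commonly used estimator in which the statistic is recomputed with each set of bootstrap weights and the squared deviations from the full-sample estimate are averaged. The function names are hypothetical, and a weighted mean is used as the example statistic.

```python
import math

def weighted_mean(values, weights):
    """Weighted mean of the values; used here as the statistic of interest."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def bootstrap_variance(values, main_weights, bootstrap_weights):
    """Variance estimate from B sets of bootstrap weights.

    Assumed estimator: (1/B) * sum over b of (theta_b - theta_hat)^2, where
    theta_hat uses the main sampling weights and theta_b the b-th bootstrap set.
    """
    theta_hat = weighted_mean(values, main_weights)
    replicates = [weighted_mean(values, bw) for bw in bootstrap_weights]
    return sum((t - theta_hat) ** 2 for t in replicates) / len(replicates)

def bootstrap_cv(values, main_weights, bootstrap_weights):
    """Coefficient of variation, in percent, of the weighted mean."""
    theta_hat = weighted_mean(values, main_weights)
    variance = bootstrap_variance(values, main_weights, bootstrap_weights)
    return 100.0 * math.sqrt(variance) / abs(theta_hat)
```

In practice each record carries many bootstrap weights (one per replicate), so `bootstrap_weights` would be a list of B weight vectors, each the same length as `values`.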
For more information on the estimation process, consult section 7.4 of the NPHS Longitudinal Documentation.
Various strategies are put in place during data collection to improve response rates. Examples are: interviewer training, use of introductory letters and brochures, use of languages other than French and English to conduct interviews, rescheduling of interviews when needed, follow-up of non-respondents, tracing, response rate monitoring, transfer of caseloads to other offices, etc.
NPHS data are collected using a computer-assisted interview (CAI) system which ensures that all and only appropriate questions are asked. The CAI application is extensively tested in-house in order to identify any errors in the program flow and text. Furthermore, in each cycle, field tests were conducted. The tests involved four of Statistics Canada's Regional Offices. The main objectives of these tests were to observe respondent reaction to the survey, to obtain estimates of time for the various sections, to study response rates and to test feedback questions. Field operations and procedures, interviewer training, and the CAI application (that is, the computerized questionnaire) are also tested. Application testing is an ongoing operation up until the start of collection for the survey.
Editing is performed on-line in the CAI application during data collection. It is not possible to enter out-of-range values, and flow errors are controlled through the use of CAI (skip patterns). Some types of inconsistent or unusual reporting are edited at Head Office after data collection. Inconsistencies are usually corrected by setting answers to a question to 'not stated'.
Files, record layouts, programs, documentation, and CD-ROMs are verified and tested before they are sent outside Statistics Canada.
Data quality is an important aspect for any survey. Examining data quality allows for the verification of the reliability and accuracy of the data collected, and helps to determine what steps should be taken to improve data quality in future cycles. The Longitudinal Documentation, chapter 9, provides information related to sampling and non-sampling errors (response, refusals, tracing, attrition, "Don't know" rates).
Please note that the Cycle 1 response rate is based on the 20,095 in-scope persons selected to form the longitudinal panel while the response rate for subsequent cycles is based on the 17,276 individuals who form the longitudinal panel.
Response rate by cycle:
Cycle 1: 83.6%
Cycle 2: 92.8%
Cycle 3: 88.3%
Cycle 4: 84.9%
Cycle 5: 80.8%
Cycle 6: 77.6%
Cycle 7: 77.0%
Cycle 8: 70.7%
Cycle 9: 69.7%
Item non-response rates have been calculated from the numbers of refusals, "Don't know" responses and valid values for each variable, sub-module and module in the questionnaire. For Cycle 9, the total item refusal rate was 0.1% and the total "Don't know" rate was 0.4%. The highest non-response rates were observed for a few variables in the Income and Preventive Health sections.
Statistics Canada is prohibited by law from releasing any data which would divulge information obtained under the Statistics Act that relates to any identifiable person, business or organization without the prior knowledge or the consent in writing of that person, business or organization. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.
Access to longitudinal Master files and access to the information excluded from the public use microdata files (PUMFs) can be obtained via Statistics Canada's Research Data Centres program or Remote Access program. Custom tabulations can also be bought. All outputs are vetted for confidentiality before being given to users.
Before releasing and/or publishing any estimate from these files, users should first determine the number of sampled respondents who contribute to the calculation of the estimate. If this number is less than 30, the weighted estimate should not be released regardless of the value of the coefficient of variation for this estimate. For weighted estimates based on sample sizes of 30 or more, users should determine the coefficient of variation of the rounded estimate and follow the guidelines below.
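The first step of this release rule can be sketched as a small check. The threshold of 30 respondents comes from the text above. The function names and return labels are hypothetical, and no CV cut-offs are encoded because the guidelines themselves are not reproduced here.

```python
MIN_RESPONDENTS = 30  # minimum number of sampled respondents, from the rule above

def coefficient_of_variation(estimate, standard_error):
    """CV in percent: standard error relative to the estimate."""
    return 100.0 * standard_error / abs(estimate)

def release_check(n_respondents, rounded_estimate, standard_error):
    """Apply the sample-size rule, then compute the CV for comparison with
    the release guidelines.

    Returns ("suppress", None) when fewer than 30 respondents contribute,
    otherwise ("check_cv", cv_percent).
    """
    if n_respondents < MIN_RESPONDENTS:
        return ("suppress", None)
    return ("check_cv", coefficient_of_variation(rounded_estimate, standard_error))
```

An estimate based on 29 respondents is suppressed whatever its CV, while one based on 30 or more respondents moves on to the CV comparison.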
Estimates in the main body of a statistical table are rounded to the nearest hundred units using the normal rounding technique. If the first or only digit dropped is zero to four, the last digit retained is not changed. If the first or only digit dropped is five to nine, the last digit retained is raised by one. Marginal sub-totals and totals in statistical tables are derived from their corresponding unrounded components and then are rounded themselves to the nearest 100 units using normal rounding methods. Averages, proportions, rates and percentages are computed from unrounded components (for example, numerators and/or denominators) and then are rounded themselves to one decimal using normal rounding. In normal rounding to a single digit, if the final or only digit dropped is zero to four, the last digit retained is not changed. If the first or only digit dropped is five to nine, the last digit retained is increased by one. Sums and differences of aggregates (or ratios) are derived from their corresponding unrounded components and then are rounded themselves to the nearest 100 units (or the nearest one decimal) using normal rounding. Under no circumstances are unrounded estimates, published or otherwise, released.
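The normal rounding technique described above can be sketched as follows. The sketch uses the standard library `decimal` module with `ROUND_HALF_UP`, because Python's built-in `round` rounds halves to the nearest even digit and would not match the rule. The function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_to_hundred(value):
    """Round to the nearest 100 units; a dropped digit of five to nine rounds up."""
    hundreds = (Decimal(str(value)) / 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(hundreds * 100)

def round_to_one_decimal(value):
    """Round averages, proportions, rates and percentages to one decimal."""
    return Decimal(str(value)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

def rounded_total(unrounded_components):
    """Totals are derived from unrounded components, then rounded themselves."""
    return round_to_hundred(sum(unrounded_components))
```

Deriving totals from unrounded components matters: two components of 1,249 each round to 1,200 individually, but their total of 2,498 rounds to 2,500, not 2,400.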
Coverage errors occur at a number of stages in the survey: during frame design, sampling unit definition, data collection and processing. An indicator, the slippage rate, is used to measure the coverage error. The slippage rate represents the discrepancy between population estimates based on the survey (using the weights as they stand before post-stratification) and the most recent census-based population estimates. The discrepancy is expressed as a percentage of the census-based estimate. As with most surveys, the NPHS observed some under-coverage, which is manifested by a positive slippage rate (about 10% for the longitudinal sample selected in 1994/1995). To reduce the effect of the coverage error, sampling weights are adjusted during their computation according to population estimates provided for the survey's reference period.
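The slippage rate defined above can be written as a one-line computation. The sign convention (census-based estimate minus survey-based estimate) is inferred from the statement that under-coverage gives a positive rate. The function name and the figures in the usage note are hypothetical.

```python
def slippage_rate(survey_estimate, census_estimate):
    """Slippage rate in percent.

    The discrepancy between the census-based and survey-based population
    estimates, expressed as a percentage of the census-based estimate.
    Positive values indicate under-coverage by the survey.
    """
    return 100.0 * (census_estimate - survey_estimate) / census_estimate
```

For example, a survey-based estimate of 900 persons against a census-based estimate of 1,000 persons gives a slippage rate of 10%.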
Methods used to estimate sampling error
Variances and coefficients of variation (CVs) of estimates produced from these data files are calculated using the bootstrap method. For each cycle, users are provided with a file that contains the bootstrap weights, a program that computes the variance or CV for a certain number of statistics, and complete documentation. NPHS longitudinal CANSIM tables, available in the NPHS I-PUB (publication 82-618), include confidence intervals or a standard indicator of the precision of the estimates based on Statistics Canada's Policy on Informing Users of Data Quality and Methodology. The topics covered by the tables are: Tobacco use, Self-rated health, Body mass index, Physical activity.
For more details on accuracy, please consult chapters 10 and 11 of the NPHS Longitudinal Documentation.
- ARCHIVED - NPHS Content, Household Component, Cycles 1 to 9