National Population Health Survey: Health Institutions Component, Longitudinal (NPHS)

Detailed information for 2002/2003




Every 2 years

Record number:


The NPHS Health Institutions Component survey data support national level estimates only.

Data release - January 26, 2007


In the fall of 1991, the National Health Information Council recommended that an ongoing national survey of population health be conducted. This recommendation was based on consideration of the economic and fiscal pressures on the health care systems and the commensurate requirement for information with which to improve the health status of the population in Canada. Commencing in April 1992, Statistics Canada received funding for development of a National Population Health Survey (NPHS) Health Institutions component.

The NPHS collects information related to the health of the Canadian population and related socio-demographic information. It is composed of three components: the Household, the Health Institutions and the North components.

To obtain a comprehensive picture of Canadians' health, a special survey was developed for people living in health care institutions--hospitals, nursing homes, and residential facilities for people with disabilities.

The content of the NPHS Health Institutions component was selected according to the following criteria:

- the survey should collect information on the health status of the Canadian population residing in health institutions;
- the data collected should be comparable to that of the household population whenever possible;
- the survey should increase the understanding of conditions relating to institutionalization;
- information provided should permit the study, over time, of the transitions from households to institutions and vice versa;
- the survey should produce national level data.

The Health Institutions component started in 1994-1995 and has been conducted every two years. The first two cycles (1994-1995 and 1996-1997) were both cross-sectional and longitudinal. Beginning in Cycle 3 (1998-1999) the survey became strictly longitudinal (i.e. collecting information from the same individuals each cycle). After five cycles of collection, the institutions component has ended.

The NPHS Health Institutions component data are primarily used to study prevalence and incidence of disease, to make projections and perform demographic trend analyses. The data are used by the research community and other health professionals.

Reference period: Varies according to the question (for example: "over the last 12 months", "over the last 6 months", "during the last week").


  • Disability
  • Diseases and health conditions
  • Health
  • Health and disability among seniors
  • Health care services
  • Seniors

Data sources and methodology

Instrument design

Each NPHS cycle questionnaire is designed in collaboration with specialists from Statistics Canada, Health Canada, provincial ministries of health and academic researchers. Questionnaire development involves an extensive literature review and numerous consultations between specialists in order to adapt existing survey instruments from other well-known sources, or to create new ones specifically for the NPHS. Every questionnaire is approved by members of the expert committees and the Advisory Committee, which includes representatives from the provincial ministries of health, Health Canada, Statistics Canada, other government departments and specialists. The health institution component questionnaire was slightly modified over the first three cycles but has remained essentially the same since Cycle 3.


This is a sample survey with a longitudinal design.

The selection of the sample was done in 2 stages. First, health institutions were selected, then residents were selected within these institutions.

A list of in-scope health institutions (long-term institutions with at least 4 beds whose residents are not autonomous) was generated from the list of residential care facilities collected by the Canadian Institute for Health Information and the list of hospitals maintained by the Health Statistics Division (HSD) of Statistics Canada. This list was initially stratified by geographic region (geographic strata) and subsequently by the type of institution (characteristic strata) and number of beds (size strata).

There were five geographic strata: the Atlantic Provinces, Quebec, Ontario, the Prairie Provinces, and British Columbia. Within each geographic stratum, three characteristic strata were defined:

- Institutions for the aged, including residential care facilities for the aged and extended/chronic care hospitals.
- Cognitive Institutions, including residential care facilities for emotionally disturbed children, psychiatrically disabled and developmentally delayed people, and psychiatric hospitals.
- Other Rehabilitative Institutions, including rehabilitation, pediatric and other speciality hospitals, general hospitals with long-term units, as well as residential care facilities for people with physical disabilities.

Within each of these geographic/characteristic strata, the institutions were grouped into size strata by grouping facilities with a similar number of beds. The number of size strata created depended on the total number of beds in the geographic/characteristic strata. Once the number of size strata was determined, the boundaries for the different size strata were determined using the rule of the cumulative square root of the number of beds.
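The boundary rule can be illustrated with a minimal sketch. This is a simplification, not the survey's actual program: it assumes the rule accumulates the square root of each institution's bed count over institutions sorted by size and cuts the running total into equal intervals. The function name and the bed counts are hypothetical.

```python
import math

def size_strata_boundaries(bed_counts, n_strata):
    """Return the upper bed-count boundary of each of the first n_strata - 1 size strata."""
    beds = sorted(bed_counts)
    # Running total of sqrt(beds) over institutions ordered by size.
    cumulative = []
    total = 0.0
    for b in beds:
        total += math.sqrt(b)
        cumulative.append(total)
    step = total / n_strata  # equal slice of the cumulative scale per stratum
    boundaries = []
    for k in range(1, n_strata):
        target = k * step
        # First institution whose cumulative value reaches the k-th cut point.
        for b, c in zip(beds, cumulative):
            if c >= target:
                boundaries.append(b)
                break
    return boundaries

# Hypothetical bed counts for one geographic/characteristic stratum.
example_beds = [4, 9, 16, 25, 36, 49, 64, 81, 100]
print(size_strata_boundaries(example_beds, 3))
```

Because the square root compresses large values, the strata of large institutions span wider ranges of bed counts than the strata of small ones.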

In Cycle 1, the number of institutions selected from a size stratum depended on the amount of sample allocated to the stratum and the size of the institutions within the stratum (consult Chapter 5 of the Cycle 1 NPHS Health Institution Component Public Use Microdata File Guide in the Documentation section). In strata comprised of larger institutions, a larger sample of residents was selected from each institution. This reduced the total number of institutions visited. Once the number of institutions to be selected from each size stratum was determined, a systematic sample of institutions was taken from the stratum list with the probability of selection proportional to size (PPS). Size was determined by the number of long-term beds.
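A systematic PPS draw of the kind described can be sketched as follows. The function name, the optional `start` parameter and the bed counts are assumptions made for illustration; the survey's own selection program is not reproduced here.

```python
import random

def systematic_pps(sizes, n, start=None):
    """Select n unit indices systematically with probability proportional to size.

    sizes: size measure per unit (here, number of long-term beds).
    start: optional fixed start in [0, interval); drawn at random when omitted.
    """
    total = sum(sizes)
    interval = total / n
    if start is None:
        start = random.random() * interval
    # Equally spaced selection points along the cumulated size scale.
    points = [start + k * interval for k in range(n)]
    selected = []
    cumulative = 0.0
    i = 0
    for idx, size in enumerate(sizes):
        cumulative += size
        # A unit is hit by every selection point that falls inside its size range.
        while i < n and points[i] < cumulative:
            selected.append(idx)
            i += 1
    # Guard against floating-point drift on the final selection point.
    while i < n:
        selected.append(len(sizes) - 1)
        i += 1
    return selected

# Hypothetical stratum of four institutions with 10, 20, 30 and 40 beds.
print(systematic_pps([10, 20, 30, 40], 2, start=5))
```

In this sketch an institution larger than the sampling interval can be hit more than once; a production design would typically treat such units as selected with certainty.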

It was possible that the listing indicated a head office for several smaller institutions. In this case, a listing of all of the institutions under this head office was obtained and two were selected: the largest (in terms of beds) and another randomly selected using PPS sampling.

Once the institutions had been selected, residents were selected within them. The total sample of 2,600 residents was proportionally allocated to each of the size strata based on the number of beds in each stratum. When a size stratum had an initial sample size of fewer than thirty residents, its sample was increased to thirty.
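The allocation rule can be sketched in a few lines. The stratum labels and bed totals below are hypothetical, and rounding each allocation to a whole resident is an assumption.

```python
def allocate_sample(beds_by_stratum, total_sample=2600, minimum=30):
    """Allocate the resident sample proportionally to beds, with a floor per stratum."""
    total_beds = sum(beds_by_stratum.values())
    allocation = {}
    for stratum, beds in beds_by_stratum.items():
        proportional = round(total_sample * beds / total_beds)
        # Strata that would receive fewer than `minimum` residents are raised to it.
        allocation[stratum] = max(proportional, minimum)
    return allocation

# Hypothetical bed totals for three size strata.
print(allocate_sample({"small": 500, "medium": 19500, "large": 30000}))
```

Raising small strata to the minimum means the allocated total can slightly exceed the nominal 2,600.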

After Cycle 1 collection, this sample consisted of 2,287 respondents living in a health institution in 1994-1995.
This sample of respondents represents the Cycle 1 cross-sectional sample as well as the longitudinal panel of the NPHS Health Institutions component. The longitudinal panel is followed every cycle and no additional sampling units are added to the longitudinal panel over the cycles.

Data sources

Responding to this survey is voluntary.

Data are collected directly from survey respondents and extracted from administrative files.

Prior to collection, all institutions were sent an introductory letter and then contacted by telephone by senior interviewers to arrange a meeting between an interviewer and the administrator of the institution. During this liaison visit the interviewer administered a short questionnaire on the policies of the institution. The residents requiring proxy interviews were determined at this time. The name and telephone number of the next-of-kin were obtained in these cases. The next-of-kin was then phoned and given the option to complete the interview primarily themselves or have it completed by a knowledgeable institutional staff member.

The NPHS Institutional component questions were designed to be conducted by personal interview (face-to-face collection) using paper and pencil. Telephone interviews were acceptable when a proxy respondent could not be contacted in person.

Interviewers were instructed to make all reasonable attempts to obtain interviews with selected residents. The administrator of the institution or a contact within the institution determined which of the selected residents required a proxy interview because of illness or incapacity. The proxy respondent could be a relative, a staff member, or a volunteer at the institution. In Cycle 5, proxy respondents completed 72% of the interviews (of the proxy interviews, 34% were done by relatives of the resident). A staff member from the institution provided information on each selected resident's use of medications and contact with health professionals.

In Cycle 5, most interviews were conducted in person.

The respondent questionnaire and Institution Control Form were captured at the Head Office using EP90 (Entry Point 90). The programmes written for the data capture prevented most out-of-range values from being entered. All captured information, excluding comments, was 100% verified.

Conditions or health problems causing activity restrictions were coded based on the International Classification of Diseases, 9th Revision (ICD-9) or according to the Musculoskeletal Impairment Supplementary Coding Scheme developed for the Health and Activity Limitation Survey (HALS). Conditions or health problems causing activity restrictions were also coded based on the International Classification of Diseases and Related Health Problems, 10th revision (ICD-10).

The drug coding for all cycles is based on the Anatomical Therapeutic Chemical (ATC) classification developed by the World Health Organization (WHO) as available on the Health Canada Drug Product Database (DPD) in September 2003.

When the death of a respondent is confirmed against the Canadian Vital Statistics Database -- Deaths, the cause and date of death are captured. The cause of death is coded using the ICD-9 and the ICD-10.


Error detection

After completing an interview, the interviewer reviewed the questionnaire to ensure the skip patterns were correctly followed. Further editing was done at the Regional Offices to check for completeness, legibility and consistency of entries on the questionnaire. This allowed for immediate follow-up with the respondents.

After data capture, top-down editing was performed on all records to check the skip patterns.


Imputation was used to derive the missing values for one variable in the NPHS Health Institutions component. The variable HSIxDHSI denotes the respondent's Health Utilities Index (HUI). This measure of overall health status assesses vision, hearing, speech, getting around (ability to move about), dexterity (movement of hands and fingers), feelings, cognitive ability (memory and thinking) and pain. The overall HUI rating, which can range from -0.360 to 1.000, is calculated based on responses to a series of questions on health status. However, this overall rating cannot be calculated if one or more of the answers are missing. Imputation was therefore used for the missing values so that the HUI could be calculated for the Health Institutions component.

The HUI was calculated based on the answers to questions on the eight elements in the health status section. A partial rating was calculated for each of the elements and then further calculations were done on these partial ratings to derive the overall HUI rating. Imputation was at the level of the eight partial ratings rather than the questions. After imputation, the program for calculating the derived HUI variable was changed slightly so that it selected as entry data the eight imputed values for vision, hearing, speech, getting around, feelings, cognitive ability, dexterity and pain.

Imputation was done in two stages:

- The first stage used a deterministic imputation. In some instances, even if the person did not answer the question providing the partial rating, there was sufficient information to deduce the partial rating with certainty. Therefore, a partial rating based on this partial information was attributed in all instances where it was considered appropriate to do so.

- The second stage used hot deck donor imputation to attribute the remaining missing partial ratings. The nearest neighbour method was used to identify the donors. The nearest neighbour was determined by calculating a temporary HUI using only those partial ratings that had valid values.
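The second stage can be illustrated with a minimal sketch of nearest-neighbour donor imputation. The temporary score used here (the product of the valid partial ratings) is an assumption made for illustration and is not the HUI scoring function; the function names and the data are hypothetical.

```python
def temporary_score(ratings, attributes):
    """Assumed temporary score: product of the partial ratings for the given attributes."""
    score = 1.0
    for attribute in attributes:
        score *= ratings[attribute]
    return score

def impute_nearest_neighbour(recipient, donors):
    """Fill missing (None) partial ratings from the closest complete donor record."""
    # Only the attributes the recipient actually answered enter the comparison.
    valid = [a for a, v in recipient.items() if v is not None]
    target = temporary_score(recipient, valid)
    nearest = min(donors, key=lambda d: abs(temporary_score(d, valid) - target))
    return {a: (v if v is not None else nearest[a]) for a, v in recipient.items()}

# Hypothetical partial ratings; the recipient is missing the pain rating.
recipient = {"vision": 0.9, "hearing": 1.0, "pain": None}
donors = [
    {"vision": 1.0, "hearing": 1.0, "pain": 1.0},
    {"vision": 0.9, "hearing": 0.95, "pain": 0.8},
]
print(impute_nearest_neighbour(recipient, donors))
```

The recipient's reported ratings are kept unchanged; only the missing partial rating is taken from the donor.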


Estimation from NPHS data is done using the sampling weights provided with each data set. These weights are computed by first deriving an initial weight equal to the inverse of the probability of selection. This weight is then adjusted to take into account the various specifics of the survey. The typical adjustment is the one that compensates for non-response; the CHAID algorithm is used to determine which variables best characterize the response groups. Once the adjustments have been made, the last step consists of post-stratifying the weights. Post-stratification is normally done to ensure consistency with the Census-based population estimates. However, since the total number of people in Canada living in a health care institution (based on the institution definition in the NPHS) is unknown, it is impossible to post-stratify to such totals. Instead, post-stratification is done using the total weights obtained in Cycle 1. It is done in two steps: first for each of the five regions, and then for each type of institution and age-sex category.
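The weighting steps can be sketched as follows. This is an illustration under stated assumptions: the response groups are taken as given (in the survey they come from CHAID), the control totals stand in for the Cycle 1 weight totals, and all names and figures are hypothetical.

```python
from collections import defaultdict

def adjust_for_nonresponse(units):
    """Give respondents an inverse-probability weight inflated within response groups.

    units: dicts with 'prob' (selection probability), 'group' and 'responded'.
    """
    selected_total = defaultdict(float)
    respondent_total = defaultdict(float)
    for u in units:
        initial = 1.0 / u["prob"]  # initial weight: inverse probability of selection
        selected_total[u["group"]] += initial
        if u["responded"]:
            respondent_total[u["group"]] += initial
    weighted = []
    for u in units:
        if u["responded"]:
            factor = selected_total[u["group"]] / respondent_total[u["group"]]
            weighted.append({**u, "weight": factor / u["prob"]})
    return weighted

def post_stratify(respondents, control_totals, key):
    """Scale weights so that they sum to the control total in each post-stratum."""
    sums = defaultdict(float)
    for r in respondents:
        sums[r[key]] += r["weight"]
    return [
        {**r, "weight": r["weight"] * control_totals[r[key]] / sums[r[key]]}
        for r in respondents
    ]

# Hypothetical group: four selected residents, one non-respondent.
units = [
    {"prob": 0.1, "group": "A", "responded": True},
    {"prob": 0.1, "group": "A", "responded": True},
    {"prob": 0.1, "group": "A", "responded": True},
    {"prob": 0.1, "group": "A", "responded": False},
]
adjusted = adjust_for_nonresponse(units)
final = post_stratify(adjusted, {"A": 50.0}, key="group")
print(sum(r["weight"] for r in final))
```

The two-step post-stratification described above would correspond to calling `post_stratify` twice, first with the region as key and then with the combined institution type and age-sex category.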

For each of the sampling weights computed for the group of respondents in each cycle, a "share" version of the weight is also computed. This share weight is given only to those respondents who agreed to share their data with the survey partners (typically Health Canada and the various provincial health ministries). The computation of this weight involves the redistribution of the weights of the non-sharers to the sharers using an approach similar to that of the non-response adjustment. Since the share partners only have access to the share data, they must use the share weights for estimation.

For the first two cycles of the NPHS Health Institution component (1994-1995 and 1996-1997), a well-known, simple variance formula was used to compute the variances and the CVs of estimates. It assumes that institutions are selected with unequal probabilities and with replacement. In reality, the institutions were selected without replacement, that is, once selected, an institution could not be chosen a second time. A variance computation program written in SAS and SPSS is provided along with the microdata files. This program can be used to calculate variances for means and totals.

For the third (1998/1999), fourth (2000/2001) and fifth (2002/2003) cycles, the NPHS Health Institution component used the bootstrap method to calculate the variance. This method takes the complexities of the survey design into account, as well as the various adjustments to the weights during the weighting process. A set of bootstrap weights is available with the data files to calculate the variances. Note that the Bootvar program, a program made up of macros available in SAS and SPSS, is distributed with the bootstrap weights in order to calculate the variances with this method.
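The idea behind bootstrap variance estimation can be sketched as follows. This is not the Bootvar program: it assumes the variance is taken as the mean squared deviation of the replicate estimates from the full-sample estimate, and the function names and data are hypothetical.

```python
import math

def weighted_mean(values, weights):
    """Weighted mean of a variable (a proportion when values are 0/1)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def bootstrap_cv(values, full_weights, bootstrap_weights):
    """Return (estimate, variance, cv) from one set of weights per bootstrap replicate."""
    estimate = weighted_mean(values, full_weights)
    # Re-compute the same estimate once with each set of bootstrap weights.
    replicates = [weighted_mean(values, bw) for bw in bootstrap_weights]
    variance = sum((r - estimate) ** 2 for r in replicates) / len(replicates)
    cv = math.sqrt(variance) / estimate
    return estimate, variance, cv

# Hypothetical 0/1 indicator for four respondents and two bootstrap replicates.
values = [1, 0, 1, 0]
full_weights = [1, 1, 1, 1]
bootstrap_weights = [[2, 0, 1, 1], [0, 2, 1, 1]]
print(bootstrap_cv(values, full_weights, bootstrap_weights))
```

In practice the files contain many more replicates than the two used here, which only keep the arithmetic easy to follow.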

Quality evaluation

The survey design has a profound effect on whether the survey meets its objectives, which are listed under "Survey Description". To meet these objectives, a Steering Committee and an Advisory Board comprised of authorities from the provincial Ministries of Health and Health Canada determined the concepts and focus. Expert Groups were convened to advise on the measures needed to obtain the results envisioned by the Steering Committee and Advisory Board, and to recommend proven collection vehicles and indices. The resulting data are recognized as valid measures of contemporary concepts such as depression, activity limitation, weight problems and chronic pain.

High response rates are essential for quality data. To reduce the number of non-response cases, the interviewers are all extensively trained by Statistics Canada, provided with detailed Interviewer Manuals, and are under the direction of interviewer supervisors. A maximum recommended assignment size by interviewer was calculated based on test results. Interviewers make every effort to contact respondents.

Refusals were followed up by senior interviewers, project supervisors or by other interviewers to encourage respondents to participate in the survey. In addition, to maximize the response rate, non-response cases were also followed up in subsequent collection periods.

Disclosure control

Statistics Canada is prohibited by law from releasing any data which would divulge information obtained under the Statistics Act that relates to any identifiable person, business or organization without the prior knowledge or the consent in writing of that person, business or organization. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.

In Cycles 1 and 2 of the NPHS Health Institutions component, Public Use Microdata Files (PUMFs) were produced in addition to the Master files. The PUMFs differ in a number of important aspects from the survey 'master' files held by Statistics Canada. These differences are the result of actions taken to protect the anonymity of individual survey respondents. First, only cross-sectional data are available on such files, because longitudinal information can lead to the identification of respondents. Also, some sensitive variables are regrouped, capped or completely deleted from the files. All the PUMFs must be approved by the Microdata Release Committee before their release.

Users requiring access to information excluded from the microdata files may purchase custom tabulations, or access the master files through the Research Data Centres program or the Remote Access program. Outputs are vetted for confidentiality before being given to users.

Before releasing and/or publishing any estimate from these files, users should first determine the number of sampled respondents who contribute to the calculation of the estimate. If this number is less than 30, the weighted estimate should not be released regardless of the value of the coefficient of variation for this estimate. For weighted estimates based on sample sizes of 30 or more, users should determine the coefficient of variation of the rounded estimate and follow the guidelines below.

Estimates are rounded using the normal rounding technique: if the first or only digit dropped is zero to four, the last digit retained is not changed; if it is five to nine, the last digit retained is raised by one. The following rules apply:

- Estimates in the main body of a statistical table are rounded to the nearest hundred units.
- Marginal sub-totals and totals in statistical tables are derived from their corresponding unrounded components and then rounded to the nearest 100 units.
- Averages, proportions, rates and percentages are computed from unrounded components (for example, numerators and/or denominators) and then rounded to one decimal.
- Sums and differences of aggregates (or ratios) are derived from their corresponding unrounded components and then rounded to the nearest 100 units (or to one decimal).

Under no circumstances are unrounded estimates, published or otherwise, released, because unrounded estimates imply greater precision than actually exists.
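The normal rounding technique can be sketched in Python as follows. The helper name is hypothetical. The `decimal` module is used because Python's built-in `round` sends ties to the nearest even digit, which differs from the "five to nine is raised" rule described here.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_estimate(value, unit="100"):
    """Round half up to the nearest multiple of unit ('100' for counts, '0.1' for rates)."""
    step = Decimal(unit)
    # Express the value in units of `step`, round to a whole number, then scale back.
    scaled = (Decimal(str(value)) / step).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return scaled * step

print(round_estimate(12250))         # count rounded to the nearest 100
print(round_estimate(24.25, "0.1"))  # percentage rounded to one decimal
```

Rounding should be the last step: totals, rates and differences are computed from unrounded components and only then passed to the helper.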

Revisions and seasonal adjustment

This methodology type does not apply to this statistical program.

Data accuracy

Two separate response rates are calculated for the NPHS Health Institution component. The institutions response rate corresponds to the percentage of in-scope institutions that agreed to have the survey conducted among their residents and the individual response rate corresponds to the percentage of selected residents from the responding institutions with whom an interview was conducted.

Cycle 5 Longitudinal Institution response rate: 100%
Cycle 5 Longitudinal Individual response rate: 97.7%

Methods used to estimate sampling error
Variances and coefficients of variation (CVs) of estimates produced from these data files are calculated using the bootstrap method. For each cycle, users are provided with a file that contains the bootstrap weights, a program that computes the variance and CV for a certain number of statistics, and complete documentation. For more details on accuracy, please consult chapter 11 of the Cycle 5 NPHS Longitudinal documentation for the Health Institutions component.

