Youth Smoking Survey (YSS)

Detailed information for 2002







The main objective of the Youth Smoking Survey (YSS) is to provide current information on the smoking behaviour of students in grades 5 to 9 (in Quebec primary school grades 5 and 6 and secondary school grades 1 to 3), and to measure changes that occurred since the last time the survey was conducted. Additionally, the survey collects basic data on alcohol and drug use by students in grades 7 to 9 (in Quebec secondary 1 to 3). Results of the Youth Smoking Survey will help with the evaluation of anti-smoking and anti-drug use programs, as well as with the development of new programs.

Data release - June 14, 2004


The YSS collects information on the following topics:

- the prevalence of smoking among students in grades 5 to 9 (in Quebec primary school grades 5 and 6 and secondary school grades 1 to 3);

- the types of smoking behaviour among children (e.g. experimental smoking, occasional smoking, daily smoking);

- the social and demographic factors associated with smoking behaviour (e.g. what motivates children to smoke, the influence of family and friends);

- where and how children obtain cigarettes;

- attitudes and beliefs about smoking, including awareness of health risks;

- recollection and opinions on health warning messages on cigarette packages;

- experience with alcohol, drugs and medications used for non-medical purposes.


  • Children and youth
  • Health
  • Lifestyle and social conditions
  • Risk behaviours

Data sources and methodology

Target population

The target population consists of all young Canadian residents aged 10 to 14 attending private or public schools in grades 5 to 9 inclusively (in Quebec primary school grades 5 and 6 and secondary school grades 1 to 3). Specifically excluded from the survey's coverage are residents of the Yukon, Northwest Territories and Nunavut, persons living on Indian Reserves and inmates of institutions. Young persons who are attending special schools (schools for the blind or for deaf-mutes) or who are attending schools located on military bases are also excluded from the target population. Furthermore, the population actually surveyed differs somewhat from the target population. The differences may be categorized as:

1) Young people enrolled in small classes (fewer than 10 students); and

2) Young people living in remote areas, i.e.:

Newfoundland & Labrador above latitude of 55 degrees,
Quebec above latitude of 51 degrees, as well as Îles de la Madeleine,
Ontario above latitude of 51 degrees,
Manitoba and Saskatchewan above latitude of 55 degrees,
Alberta and British Columbia above latitude of 57 degrees and the Queen Charlotte Islands.

Neither category was eligible to be surveyed, although both remained part of the target population. It is estimated that these exclusions represent approximately 2.3% of the target population.

Instrument design

As comparability with the 1994 survey results was an important objective of the 2002 YSS, only minimal modifications were made to the wording of the questions that were asked of children in 1994. The layout and the style of the questionnaire were also very similar to the original version. So as not to affect responses to the questions related to smoking, the alcohol and drug use questions were added at the end of the questionnaire for the older students. The new questions about children's activities, as well as those measuring self-esteem, came from the National Longitudinal Survey of Children and Youth.

The draft questionnaire was tested in the spring of 2002 with children from various grades: boys and girls, English and French speaking, with or without experience with cigarettes, and with good or low marks. The respondents completed the questionnaire and later, in one-on-one interviews, provided comments and clarified their answers.

The basic function of the parent's questionnaire was still collection of socio-demographic information about the child's family. However, the design of the questionnaire was significantly modified compared with the 1994 version. Of the 15 questions addressed to parents, 13 are standard questions used in other surveys. The parent's questionnaire was tested informally.


This is a sample survey with a cross-sectional design.

The 2002 Youth Smoking Survey was administered to a sample of children in grades 5 to 9 by sampling classes from a frame of all public and private schools in Canada. The sample design consists of a two-stage stratified clustered design with schools as primary sampling units and with classes as secondary sampling units. All of the students in the selected classes were surveyed (approximately 19,000 students).

The sample design features three levels of stratification. First, each province constitutes a stratum. Second, an implicit stratification by grade level (5 to 9; in Quebec primary school grades 5 and 6 and secondary school grades 1 to 3) is used. Finally, the schools are stratified by census metropolitan area (CMA) versus non-CMA, with additional strata in Quebec (Montreal) and Ontario (Toronto). (In Quebec, the English and bilingual schools were grouped together in a stratum called English, regardless of their CMA status. The French schools were stratified by CMA, non-CMA and Montreal.) The sample was then selected in each stratum independently, meaning that some schools may be selected more than once, for different grades.

The sample of schools was selected systematically with probability proportional to school size, i.e., the total number of students for each stratum. In order to ensure better representation by school board size and school size, the school file was sorted, first by school board size and then by school size within each school board. The selection of the secondary sampling units (classes) was accomplished in the field by the interviewer who randomly selected one class in the desired grade. This translated into a final sample of 1,070 classes in 982 schools situated in 327 school boards.
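The systematic selection with probability proportional to size described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the function name, the school sizes and the random seed are hypothetical, and the list of sizes is assumed to be already sorted by school board size and then by school size.

```python
import random


def pps_systematic_sample(sizes, n, rng=None):
    """Pick n unit indices systematically, with probability proportional to size.

    sizes: enrolment counts in file order (assumed already sorted).
    Returns the indices of the selected units, in file order.
    """
    rng = rng or random.Random()
    total = sum(sizes)
    step = total / n                      # sampling interval
    start = rng.random() * step           # random start in [0, step)
    targets = [start + k * step for k in range(n)]

    selected = []
    cumulative = 0.0
    t = 0
    for index, size in enumerate(sizes):
        cumulative += size
        # A unit is hit once for every target falling in its cumulative range.
        while t < n and targets[t] < cumulative:
            selected.append(index)
            t += 1
    return selected


# Hypothetical enrolments for four schools in one stratum.
picked = pps_systematic_sample([100, 300, 50, 550], 2, random.Random(1))
```

In this sketch a unit larger than the sampling interval can be hit more than once; how such cases were handled in the survey is not stated in the text.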

Data sources

Data collection for this reference period: October 2002 to December 2002

Responding to this survey is voluntary.

Data are collected directly from survey respondents.

Survey collection activities in schools were conducted from October to December 2002. They included mailing an introductory letter to the selected schools, selecting the classes to participate in the survey and conducting classroom sessions during which students completed paper questionnaires. These collection activities were preceded by a lengthy school board approval process which began in June 2002.

Only experienced Statistics Canada interviewers worked on this survey. An average assignment size was three classrooms per interviewer. If more than one classroom, up to a maximum of four, was selected in one school, all of the classrooms were assigned to the same interviewer. This procedure ensured that each school was approached by only one interviewer and a minimal number of visits were made to the school. Allocation of assignments was based on the geographic distribution of the schools relative to the interviewers' residences.

Interviewers were allowed up to four hours for training. This included reading the Interviewer's Manual, completing the review exercises, answering test questions posed by senior interviewers over the phone and discussing any data collection issues. The YSS data collection managers in the Regional Offices participated in a two-day classroom training session in Ottawa and later trained the senior interviewers.

Error detection

As in 1994, the youth questionnaire was designed with very few skip patterns. It was felt that skip patterns might not be correctly followed by the young respondents aged 10 to 14 and might result in identifying smokers as non-smokers during the classroom session, as non-smokers would require much less time to get through the questionnaire.

The questionnaire was edited using "top-down" logic. To accomplish this task, flows had to be determined before the edit program could be written. The critical element was establishing the smoking status of respondents, as many survey questions applied to smokers only. Similar to the 1994 survey, answers to several questions were examined to resolve inconsistencies between key questions used as indicators of smoking status.

The first stage of survey processing undertaken at head office was the replacement of any "out-of-range" values on the data file with blanks. This process was designed to make further editing easier.

The first type of error treated concerned questionnaire flow, where questions that did not apply to the respondent (and should therefore not have been answered) were found to contain answers. In this case, a computer edit automatically eliminated superfluous data by following the flow of the questionnaire implied by answers to previous and, in some cases, subsequent questions.

The second type of error treated involved a lack of information in questions which should have been answered. For this type of error, a non-response or "not-stated" code was assigned to the variable.
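The three processing steps described above (blanking out-of-range values, removing answers that contradict the questionnaire flow, and coding missing answers as "not stated") can be sketched as follows. The question names, the answer codes and the use of a "valid skip" code are hypothetical and are not taken from the survey's edit specifications.

```python
NOT_STATED = 9   # hypothetical "not stated" code
VALID_SKIP = 6   # hypothetical code for a question that did not apply

# Hypothetical questions with their allowed answer codes (1 = yes, 2 = no).
VALID_CODES = {
    "ever_tried": {1, 2},
    "whole_cigarette": {1, 2},
}


def edit_record(record):
    """Apply the three edit steps to one respondent record (a dict)."""
    edited = dict(record)

    # Step 1: replace out-of-range (or absent) values with blanks.
    for question, allowed in VALID_CODES.items():
        if edited.get(question) not in allowed:
            edited[question] = None

    # Step 2: flow edit. The follow-up question does not apply to a
    # respondent who never tried smoking, so any answer there is removed.
    if edited["ever_tried"] == 2:
        edited["whole_cigarette"] = VALID_SKIP

    # Step 3: questions that should have been answered but are blank
    # receive the "not stated" code.
    for question in VALID_CODES:
        if edited[question] is None:
            edited[question] = NOT_STATED
    return edited


cleaned = edit_record({"ever_tried": 2, "whole_cigarette": 1})
```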

Some item non-response was also subject to imputation. Imputation is the process used to resolve problems of missing, invalid or inconsistent responses identified during editing. This is done by changing some of the responses or item non-responses on the record being edited to ensure that a plausible, internally coherent record is created.


Although item non-response was treated for most variables by assigning a "Not stated" code, imputation was necessary for question Y_Q16 (Have you smoked 100 or more cigarettes in your life?), since this variable was critical for deriving smoking status. Records with "Not stated" and "Don't know" answers were imputed using a hot-deck procedure known as nearest-neighbour or donor imputation.
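As an illustration of nearest-neighbour donor imputation in general, the sketch below copies the missing value from the complete record that is closest on a set of matching fields. The matching fields ("grade", "friends_smoking"), the squared-distance measure and the answer codes are assumptions made for the example; the text does not state which variables or distance function were used for Y_Q16.

```python
def impute_from_nearest_donor(records, target, matching_fields):
    """Fill missing values of `target` from the nearest complete record.

    records: list of dicts; a missing target value is represented by None.
    matching_fields: numeric fields used to measure closeness.
    """
    donors = [r for r in records if r[target] is not None]
    if not donors:
        raise ValueError("no donor records available")

    def distance(a, b):
        # Squared Euclidean distance over the matching fields.
        return sum((a[f] - b[f]) ** 2 for f in matching_fields)

    imputed = []
    for record in records:
        if record[target] is None:
            donor = min(donors, key=lambda d: distance(d, record))
            record = {**record, target: donor[target]}
        imputed.append(record)
    return imputed


# Hypothetical records: 1 = yes, 2 = no, None = not stated / don't know.
sample = [
    {"grade": 5, "friends_smoking": 0, "Y_Q16": 2},
    {"grade": 9, "friends_smoking": 4, "Y_Q16": 1},
    {"grade": 8, "friends_smoking": 3, "Y_Q16": None},
]
completed = impute_from_nearest_donor(sample, "Y_Q16", ["grade", "friends_smoking"])
```

Here the third record is closest to the second one, so it receives that donor's answer.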

Questions Y_Q11A (Have you ever tried cigarette smoking, even just a few puffs?) and Y_Q14 (Have you ever smoked a whole cigarette?) were also imputed for missing entries since they were pivotal in determining the valid skips. To reconcile and deterministically impute these two variables, several variables were considered: Q11A, Q14 and Q15 as primary fields, and Q16, Q17, Q18, Q19 and Q20 as secondary fields.


A statistical weight was placed on each record to indicate the number of persons that the record represents. The weighting for the Youth Smoking Survey (YSS) consisted of several steps:

1) Initial Sampling Weight (School Weight)

2) Adjustment for Non-response at the School Level

3) Adjustment for the Selection of a Class (Class Weight)

4) Adjustment for Class Non-response

5) Adjustment for Student Non-response

6) Post-stratification Adjustment
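One common way to combine steps like these is to multiply the initial weight by a series of adjustment factors. The sketch below assumes that form; the function name, the parameters and the specific form of each factor (for example, dividing by a response rate) are illustrative assumptions, not the formulas documented for the YSS.

```python
def final_weight(school_selection_prob, school_response_rate, classes_in_grade,
                 class_response_rate, student_response_rate, poststrat_factor):
    """Build a student weight as a product of the six steps (assumed forms)."""
    weight = 1.0 / school_selection_prob   # 1) initial sampling (school) weight
    weight /= school_response_rate         # 2) school non-response adjustment
    weight *= classes_in_grade             # 3) one class selected among those in the grade
    weight /= class_response_rate          # 4) class non-response adjustment
    weight /= student_response_rate        # 5) student non-response adjustment
    weight *= poststrat_factor             # 6) post-stratification adjustment
    return weight


# Hypothetical inputs for one respondent.
w = final_weight(
    school_selection_prob=0.1,
    school_response_rate=0.8,
    classes_in_grade=3,
    class_response_rate=1.0,
    student_response_rate=0.9,
    poststrat_factor=1.05,
)
```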

In order to supply coefficients of variation (CV) which would be applicable to a wide variety of categorical estimates produced from this microdata file and which could be readily accessed by the user, a set of Approximate Sampling Variability Tables has been produced. These CV tables allow the user to obtain an approximate coefficient of variation based on the size of the estimate calculated from the survey data.

The coefficients of variation are derived using the variance formula for simple random sampling and incorporating a factor which reflects the multi-stage, clustered nature of the sample design. This factor, known as the design effect, was determined by first calculating design effects for a wide range of characteristics and then choosing from among these a conservative value (usually the 75th percentile) to be used in the CV tables which would then apply to the entire set of characteristics.
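The idea can be sketched for an estimated proportion as follows. The formula assumes the usual simple random sampling variance p(1 - p)/n without a finite population correction; the function names and all input values are hypothetical, and the exact formula behind the published CV tables is not given in the text.

```python
import math
import statistics


def conservative_design_effect(design_effects):
    """Return the 75th percentile (third quartile) of a set of design effects."""
    return statistics.quantiles(design_effects, n=4)[2]


def approximate_cv(proportion, sample_size, design_effect):
    """Approximate CV of a proportion: SRS variance inflated by the design effect."""
    srs_variance = proportion * (1 - proportion) / sample_size
    variance = design_effect * srs_variance
    return math.sqrt(variance) / proportion


# Hypothetical design effects computed for a range of characteristics.
deff = conservative_design_effect([1.2, 1.5, 1.8, 2.4, 3.0])
cv = approximate_cv(proportion=0.2, sample_size=1000, design_effect=2.0)
```

For a given sample size, this form gives smaller estimates a larger CV, which is why such tables are indexed by the size of the estimate.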

The variance estimation method used is bootstrap.
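A bootstrap variance is often computed from a set of replicate weights: the estimate is recalculated once per replicate, and the variance is the average squared deviation of the replicate estimates from the full-sample estimate. The sketch below assumes that approach; the text does not say how the replicates were built or which formula was used, and all names and values are hypothetical.

```python
def weighted_proportion(indicators, weights):
    """Weighted share of records whose indicator is 1."""
    return sum(w for x, w in zip(indicators, weights) if x) / sum(weights)


def bootstrap_variance(indicators, full_weights, replicate_weights):
    """Mean squared deviation of replicate estimates from the full-sample estimate."""
    full_estimate = weighted_proportion(indicators, full_weights)
    replicate_estimates = [
        weighted_proportion(indicators, weights) for weights in replicate_weights
    ]
    squared_deviations = [(e - full_estimate) ** 2 for e in replicate_estimates]
    return sum(squared_deviations) / len(squared_deviations)


# Hypothetical data: four respondents, two bootstrap replicates.
smokes = [1, 0, 1, 0]
variance = bootstrap_variance(
    smokes,
    full_weights=[10, 10, 10, 10],
    replicate_weights=[[20, 0, 10, 10], [0, 20, 10, 10]],
)
```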

Quality evaluation

Considerable time and effort were devoted to reducing non-sampling errors in the survey. Quality assurance measures were implemented at each step of the data collection and processing cycle to monitor the quality of the data. These measures include the use of highly skilled interviewers, extensive training of interviewers with respect to the survey procedures and questionnaire, observation of interviewers to detect problems of questionnaire design or misunderstanding of instructions, procedures to ensure that data capture errors were minimized, and coding and edit quality checks to verify the processing logic.

Disclosure control

Statistics Canada is prohibited by law from releasing any data which would divulge information obtained under the Statistics Act that relates to any identifiable person, business or organization without the prior knowledge or the consent in writing of that person, business or organization. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.

Since the YSS file amalgamates data from both the child and the parent/guardian interview into a single record, there is a possibility that a parent/guardian may identify his/her child's answers and consequently obtain access to the child's responses. This assumption led to the suppression of all but one of the parental variables (GP2_07 -- Child's family situation).

To prevent disclosure on the child's section of the record, age and aboriginal status have been suppressed, while the weekly amount of spending money has been capped at $75. Additionally, the responses to Questions 37a (father) and 39a (mother) "I don't live with a father/mother or anyone who is like a father/mother" have been coded to "Not stated". There were also a total of 44 local suppressions on 30 records to minimize the risk of disclosure in case of unique combinations of variables.
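Suppression and top-coding of this kind can be sketched as below. Only the $75 cap comes from the text; the field names are hypothetical, and the recoding of Questions 37a and 39a and the local suppressions are not shown.

```python
SPENDING_CAP = 75  # dollars per week, as stated in the text

# Hypothetical field names for the suppressed variables.
SUPPRESSED_FIELDS = ("age", "aboriginal_status")


def protect_child_record(record):
    """Drop suppressed fields and top-code weekly spending money."""
    protected = {k: v for k, v in record.items() if k not in SUPPRESSED_FIELDS}
    spending = protected.get("weekly_spending")
    if spending is not None:
        # Amounts above the cap are reported as the cap itself.
        protected["weekly_spending"] = min(spending, SPENDING_CAP)
    return protected


released = protect_child_record(
    {"age": 12, "aboriginal_status": 1, "weekly_spending": 120, "grade": 7}
)
```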

To avoid disclosure of product brands, Question 22b (What brand do you usually smoke?) has been replaced by the derived variables DVSMOKE, which describes the strength of the brand, and DVLOWTAR, which indicates its lower tar value. Similarly, the questions referring to the use of Ritalin (75a and 75b) and Gravol (78a and 78b) have been replaced by derived variables grouping the drugs in question with other prescription drugs (DVPDG and DVPDGAG) and non-prescription drugs (DVNDG and DVNDGAG).

Revisions and seasonal adjustment

This methodology does not apply to this survey.

Data accuracy

The overall response rate for the 2002 Youth Smoking Survey was 82%.

Please refer to Chapter 8.0 (Data Quality) of the User Guide for detailed information.

