National Apprenticeship Survey (NAS)

Detailed information for 2015







There is a critical need in Canada for highly skilled tradespeople. Apprenticeships in trades are a major source of skilled workers for the Canadian economy. The National Apprenticeship Survey collects information to understand apprenticeship-related issues. This includes the factors that affect apprentices' completion and certification before, during and after their involvement with their apprenticeship.

Data release - March 29, 2017


This new cycle of the National Apprenticeship Survey helps us to describe and better understand apprentices' pathways and experiences, including the motivations that bring people to the skilled trades, experiences with apprenticeship training, and labour market outcomes during and following an apprenticeship program. This information will contribute to ensuring that apprenticeship systems across Canada remain strong and are able to continue to support Canada's evolving economy.

The specific objectives were to better understand:

- pathways to apprenticeship and why people did not enter earlier,
- apprenticeship program progression and completion, including barriers to both entry and completion,
- experiences of select groups, such as women, Aboriginal people, immigrants, and persons with disabilities, and
- the financing of apprenticeship training.


  • Education, training and learning
  • Job training and educational attainment
  • Labour
  • Outcomes of education

Data sources and methodology

Target population

This survey targets registered apprentices in the ten provinces and the three territories for two types (statuses) of apprentices and only for certain reference years. The two targeted statuses are completers and discontinuers. The target population was determined in two steps: at frame creation and at collection.

First, there is the target population at the time of creation of the frame. The targeted completers and discontinuers are those who completed or discontinued their apprenticeship training between 2011 and 2013, and are not registered in any apprenticeship training as of December 31, 2013.

Then, at the time of collection, it was decided to keep in the target population those sampled apprentices who reported having had some apprenticeship activity (i.e., completing or discontinuing) between 2010 and 2013. Those who reported no apprenticeship activity in these years were considered out-of-scope for this survey.

Instrument design

Statistics Canada designed the NAS questionnaire in partnership with Employment and Social Development Canada (ESDC). Various tools and methods were used to design the NAS questionnaire:

- Previous iteration of surveys on apprenticeship such as the 2007 National Apprenticeship Survey;
- Panel groups of internal and external subject-matter experts;
- Standard sets of questions from other STC surveys, for example, the use of some Labour Force Survey questions to collect data on labour force activities of the apprentices. In addition, other Statistics Canada Harmonized Content questions were also used;
- Statistics Canada's Questionnaire Design Resource Centre (QDRC) to set-up testing of the questionnaire using the Qualitative Testing process by conducting one-on-one interviews with apprentices;
- Revisions to the questionnaire after each round of testing, with changes approved by QDRC and ESDC.


This is a sample survey with a cross-sectional design.

A one-phase stratified sample was designed for this survey and, based on the budget allocated for the survey, a total sample of approximately 29,000 respondents was targeted. The major goal of the design was ensuring, when possible, that domains of interest to analysts would be well represented in the survey while remaining within the constraints of the budget. Among other things, analysts will be interested in estimating counts, proportions, means and medians (e.g., age, training duration, salary) within each domain of interest. Within each jurisdiction, domains of interest are the status of the apprentice at time of collection (the final status) and the eleven major trade groups as established in the Registered Apprentice Information System (RAIS). At the national level, red seal trades (there are 45 red seal trades) and gender were also domains of interest. As women represent a very small proportion of all apprentices, it would not be possible to compute estimates by gender within all provincial/territorial domains, whereas it would be possible at the national level.

Data sources

Data collection for this reference period: 2015-09-08 to 2016-03-31

Responding to this survey is voluntary.

Data are collected directly from survey respondents.

- IDENT files (for quarters 2014Q2, 2014Q1, 2013Q4, 2013Q3 and 2012Q4), as well as the household survey frame, were used to obtain the most up-to-date contact information (primarily phone numbers).
- The income variable from the T1FF file was used as an auxiliary variable when performing imputation of the respondent's personal income variable.

View the Questionnaire(s) and reporting guide(s).

Error detection

After collection, the individual raw data files were appended together and put through a series of standard processing steps designed to clean the data and help ensure their consistency, thereby increasing their usefulness. The edits were done on the data at both the micro and macro levels.

The flow edits replicated the flow patterns (question order) used in the application and set the respondent's non-applicable questions to a value of 'Valid Skip'. Non-responses were set to a value of 'Not Stated'. These are questions that were applicable to the respondent but were not answered; in a CATI application these values usually follow a response of 'Refusal' or 'Don't Know'.
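The flow edit described above can be illustrated with a minimal sketch. The question names (`Q1`, `Q2`), the skip rule and the handling of a follow-up question when its filter question is not stated are all hypothetical; they are not taken from the NAS questionnaire.

```python
VALID_SKIP = "Valid Skip"
NOT_STATED = "Not Stated"
# Raw CATI outcomes treated as non-response (assumed labels).
NON_ANSWERS = {None, "Refusal", "Don't Know"}


def apply_flow_edits(record):
    """Return a copy of `record` with flow edits applied.

    Hypothetical flow: Q2 is asked only when Q1 is answered "Yes".
    """
    edited = dict(record)
    # An applicable question with no usable answer becomes 'Not Stated'.
    if edited.get("Q1") in NON_ANSWERS:
        edited["Q1"] = NOT_STATED

    if edited["Q1"] == "Yes":
        # Q2 was applicable: keep the answer or flag the non-response.
        if edited.get("Q2") in NON_ANSWERS:
            edited["Q2"] = NOT_STATED
    elif edited["Q1"] == "No":
        # Q2 was not applicable to this respondent.
        edited["Q2"] = VALID_SKIP
    else:
        # Assumption: when the filter question is not stated, the
        # follow-up is also set to 'Not Stated'.
        edited["Q2"] = NOT_STATED
    return edited


print(apply_flow_edits({"Q1": "No", "Q2": None}))
print(apply_flow_edits({"Q1": "Yes", "Q2": "Refusal"}))
```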

In addition, various types of editing were done to detect missing or inconsistent information. For example, edits were performed to check the logical relationship between responses. Outliers were identified at the edit step and then updated during the imputation stage.


Imputation is the process that supplies valid values for those variables that have been identified for a change either because of invalid information or because of missing information. The new values are supplied in such a way as to preserve the underlying structure of the data and to ensure that the resulting records will pass all required edits. In other words, the objective is not to reproduce the true microdata values, but rather to establish internally consistent data records that yield good aggregate estimates.

We can distinguish between three types of non-response. Complete non-response is when the respondent does not provide the minimum set of answers. These records are dropped and accounted for in the weighting process (see Chapter 12.0 in the NAS 2015 User Guide). Item non-response is when the respondent does not provide an answer to one question, but goes on to the next question. These are usually handled using the "not stated" code or are imputed. Finally, partial non-response is when the respondent provides the minimum set of answers but does not finish the interview. These records can be handled like either complete non-response or multiple item non-response.

In the case of the NAS, donor imputation was used to fill in missing data for item and partial non-response.
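As an illustration only, donor imputation can be sketched as follows. The way donors were actually selected for the NAS is not described here, so this sketch assumes a nearest-neighbour rule on a single auxiliary variable; the field names `tax_income` and `personal_income` are hypothetical.

```python
def impute_from_donors(records, target, auxiliary):
    """Fill missing `target` values with the value of the nearest donor.

    A donor is a record with a reported `target`. "Nearest" means the
    smallest absolute difference on `auxiliary`; ties go to the first
    donor found. This rule is an assumption made for the sketch.
    """
    donors = [r for r in records if r[target] is not None]
    if not donors:
        raise ValueError("no donors available")

    completed = []
    for r in records:
        new = dict(r)
        new["imputed"] = r[target] is None
        if new["imputed"]:
            donor = min(donors, key=lambda d: abs(d[auxiliary] - r[auxiliary]))
            new[target] = donor[target]
        completed.append(new)
    return completed


sample = [
    {"id": 1, "tax_income": 40000, "personal_income": 42000},
    {"id": 2, "tax_income": 65000, "personal_income": 63000},
    {"id": 3, "tax_income": 61000, "personal_income": None},
]
completed = impute_from_donors(sample, "personal_income", "tax_income")
for row in completed:
    print(row)
```

In this toy data set, record 3 takes its value from record 2, whose auxiliary income is closest. The `imputed` flag keeps imputed values distinguishable from reported ones.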


The National Apprenticeship Survey (NAS) is a probability survey. As is the case with any probability survey, the sample is selected to represent a reference population - the apprenticeship population - at a specific date within the context of the survey as accurately as possible. Each unit in the sample must therefore represent a certain number of units in the population. If the frame used were perfect (covering exactly the population of interest) and all selected units were traced, contacted and completed the survey, then the design weight assigned to each unit would represent accurately and exactly the number of apprentices in the target population. In this situation, using this weight would yield unbiased estimates. However, this is not the case when surveys are faced with non-response and imperfect frames. Weight adjustments are traditionally used to compensate for these different issues. Response patterns have to be studied carefully to appropriately correct for non-response. It was observed that non-response did not occur randomly or uniformly within the population since different response rates were obtained for different sub-populations. The use of appropriate techniques helps correct non-response bias that may be introduced. Similarly, it was observed that the out-of-scope status did not occur randomly within the population and was observed at a very high rate, as presented in Table 9.2 in the NAS 2015 User Guide.

This survey can be viewed as a two-phase sample where the first phase is the sample selection by stratum and the second phase is a combined adjustment for nonresponse and out-of-scope.

The NAS sample can be divided into several groups according to the results of collection (see Diagram 1, section 12.0 in the NAS 2015 User Guide). First, the sample is divided into resolved (R) and unresolved (U) units. For NAS a resolved unit is one where enough information was obtained during collection to determine whether or not a unit was in-scope or out-of-scope for this survey. All unresolved units are nonrespondents at collection. Then, within each of the two groups, the sample can be divided into in-scope (I) or out-of-scope (OOS). Finally, resolved in-scope units can be divided into respondents (R) or nonrespondents (NR). The resolved units represent 61% of the sample while the unresolved units represent 39%. The unresolved units are comprised of the unresolved in-scope units (U_IS) and the unresolved out-of-scope units (U_OOS) with unknown proportions. The resolved units are comprised of three groups, the resolved in-scope respondents (R_IR) which form 80% of the resolved units, the resolved in-scope non-respondents (R_INR) which form only 5% of the resolved units, and the resolved out-of-scope units (R_OOS) which form 15% of the resolved cases.
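The percentages above are expressed relative to different bases (the whole sample for resolved and unresolved units, the resolved units for the three sub-groups). The short calculation below converts them to shares of the whole sample. Because the published percentages are rounded, the results are approximate.

```python
# Shares of the whole sample, as stated in the text.
resolved_share = 0.61
unresolved_share = 0.39

# Shares within the resolved units, as stated in the text.
within_resolved = {"R_IR": 0.80, "R_INR": 0.05, "R_OOS": 0.15}

# Convert the within-resolved shares to shares of the whole sample.
shares_of_sample = {
    group: resolved_share * share for group, share in within_resolved.items()
}
shares_of_sample["U"] = unresolved_share

for group, share in shares_of_sample.items():
    print(f"{group}: {share:.1%} of the sample")
```

On these figures, resolved in-scope respondents make up roughly 49% of the sample, resolved in-scope non-respondents about 3% and resolved out-of-scope units about 9%.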

There are several key issues concerning weighting for the NAS. One issue is that we do not know the in-scope/out-of-scope status of the unresolved units. As we know that 15% of the resolved units are out-of-scope, we suspect that a significant proportion of the unresolved units are also out-of-scope. It is also possible that being out-of-scope is a factor in being unresolved (non-ignorable non-response), but this is very difficult to assess. As the out-of-scope units are not part of the population of interest, they will not be included in the calculation of survey estimates (total, mean, ratio, etc.). However, they have an impact on the variability of these estimates due to uncertainty about the target population or domain totals. Therefore, it is important to estimate as accurately as possible the proportion of the unresolved units that are out-of-scope so that the sum of the weights of the in-scope portion reflects as closely as possible the true totals of the target population and the domains within it. Another issue is that no known counts of the target population are available, and therefore no calibration to known totals is possible. For all these reasons, the weights were calculated in three steps.

Step 1. Selection weight

At the time of selection, an initial design weight was assigned to each apprentice, as the inverse of its probability of selection. Since the NAS design is stratified with simple random sampling within strata, the probability of selection of the apprentice is shown in Section 12.0 of the NAS 2015 User Guide.
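For a stratified design with simple random sampling within strata, the selection probability in stratum h is n_h / N_h, so the design weight is N_h / n_h. The sketch below illustrates this standard result. The stratum labels and counts are invented for the example and are not NAS figures.

```python
def design_weights(population_counts, sample_counts):
    """Return the design weight N_h / n_h for each stratum.

    population_counts: stratum -> number of units on the frame (N_h)
    sample_counts:     stratum -> number of units selected (n_h)
    """
    weights = {}
    for stratum, N_h in population_counts.items():
        n_h = sample_counts[stratum]
        if not 0 < n_h <= N_h:
            raise ValueError(f"invalid sample size for stratum {stratum!r}")
        # Inverse of the selection probability n_h / N_h.
        weights[stratum] = N_h / n_h
    return weights


frame_counts = {"A": 1200, "B": 300}   # hypothetical strata
sample_counts = {"A": 100, "B": 60}
weights = design_weights(frame_counts, sample_counts)
print(weights)

# The design weights of the sampled units add up to the frame size.
total = sum(weights[h] * sample_counts[h] for h in weights)
print(total)
```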

Step 2. Predict the scope status of the unresolved units by modeling the probability of being in-scope or out-of-scope

This step consists of calculating the probability of being in-scope (or out-of-scope) for each unresolved unit. Using the resolved cases for which the status was determined as the analysis data, a logistic regression model was built using variables on the frame as explanatory variables (such as province, frame status, trade, registration year, age and sex). Using the probability from the logistic model, survey scope inclusion homogeneity groups (GHI) were formed (see Section 12.0 of the NAS 2015 User Guide).
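The grouping idea can be sketched as follows, taking the modelled probabilities as given. The rule used here to form groups (ranking units and cutting them into near-equal-sized classes) and the use of a weighted in-scope rate per group are assumptions made for illustration; the procedure actually used is documented in the User Guide.

```python
def form_groups(probabilities, n_groups):
    """Assign each unit to one of `n_groups` classes of similar probability.

    Units are ranked by modelled probability and cut into classes of
    near-equal size (an assumed grouping rule).
    """
    n = len(probabilities)
    order = sorted(range(n), key=lambda i: probabilities[i])
    groups = [0] * n
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // n
    return groups


def in_scope_rate_by_group(groups, weights, resolved, in_scope):
    """Weighted in-scope rate among resolved units, for each group."""
    numerator, denominator = {}, {}
    for g, w, is_resolved, is_in_scope in zip(groups, weights, resolved, in_scope):
        if is_resolved:
            denominator[g] = denominator.get(g, 0.0) + w
            numerator[g] = numerator.get(g, 0.0) + (w if is_in_scope else 0.0)
    return {g: numerator[g] / denominator[g] for g in denominator}


# Toy data: six units, the last two unresolved (scope status unknown).
p_in_scope = [0.95, 0.60, 0.90, 0.55, 0.92, 0.58]
weights = [10.0] * 6
resolved = [True, True, True, True, False, False]
in_scope = [True, False, True, True, None, None]

groups = form_groups(p_in_scope, n_groups=2)
rates = in_scope_rate_by_group(groups, weights, resolved, in_scope)
print(groups)
print(rates)
```

The rate observed among resolved units in a group can then serve as the estimated in-scope proportion for the unresolved units in the same group.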

Step 3. Non-response adjustment

After step 2, unresolved cases can be classified as in-scope non-response. Therefore, a typical nonresponse adjustment (second phase adjustment) can be applied on the in-scope units only. For that purpose, response homogeneity groups (RHGs) were formed. RHGs are determined through a combination of logistic regression to predict the probability of being a respondent and then using a clustering procedure based on the modeled probability of response (see Section 12.0 of the NAS 2015 User Guide).
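A common form of non-response adjustment within response homogeneity groups is the weighting-class adjustment, sketched below: within each group, respondent weights are multiplied by the ratio of the total weight of all in-scope units to the total weight of respondents. This is a generic illustration with invented data and field names, not the exact NAS computation.

```python
def adjust_for_nonresponse(units):
    """Return adjusted weights for respondents, keyed by unit id.

    units: list of dicts with keys 'id', 'weight', 'rhg' and
    'respondent' (bool). All units are assumed to be in-scope.
    """
    total, responding = {}, {}
    for u in units:
        g = u["rhg"]
        total[g] = total.get(g, 0.0) + u["weight"]
        if u["respondent"]:
            responding[g] = responding.get(g, 0.0) + u["weight"]

    adjusted = {}
    for u in units:
        if u["respondent"]:
            # Respondents carry the weight of non-respondents in their group.
            factor = total[u["rhg"]] / responding[u["rhg"]]
            adjusted[u["id"]] = u["weight"] * factor
    return adjusted


units = [
    {"id": 1, "weight": 10.0, "rhg": 1, "respondent": True},
    {"id": 2, "weight": 10.0, "rhg": 1, "respondent": True},
    {"id": 3, "weight": 10.0, "rhg": 1, "respondent": False},
    {"id": 4, "weight": 20.0, "rhg": 2, "respondent": True},
    {"id": 5, "weight": 20.0, "rhg": 2, "respondent": False},
]
adjusted = adjust_for_nonresponse(units)
print(adjusted)
```

After adjustment, the respondent weights in each group add up to the total weight of all in-scope units in that group, so the overall weighted total is preserved.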

Quality evaluation

To ensure high quality data, Statistics Canada's own internal policies and procedures were followed. The validation of the NAS 2015 results involved many processes, which are outlined below.

During various stages of the survey development, steps were taken to ensure its quality. For example, the quality of the frame was evaluated by the methodologists to ensure that we had a complete and accurate population for the survey. As well, the response rates were closely monitored throughout the collection phase to ensure that there were sufficient responses to produce quality estimates for data analysis. The NAS collection was performed using a Computer Assisted Telephone Interview (CATI) application, which allowed certain edits to be built into the application; therefore, some editing was done directly at the time of the interview (for example, validity edits). Additionally, during the data processing phase, data quality was ensured through verification of outlier values for some key variables, such as income and age, as well as through the use of consistency edits.

The NAS 2015 results were also compared with other relevant surveys, such as the Census (2011), the National Household Survey (NHS), and the Labour Force Survey (LFS), to ensure consistency for similar variables and similar trends. Additionally, the NAS 2015 results were compared against the data from the last iteration of the survey, i.e., NAS 2007, to detect any unusual or unexpected findings in order to validate the results. Frequencies of all NAS 2015 variables were reviewed by subject-matter experts, at the national level, to ensure that the responses were consistent across similar questions and corresponded to known trends in the field. Cross-tabulations of similar variables were performed to ensure that the results made sense and were as expected. A validation of the frame information was also performed on three variables (Status, Trade, and Year) to confirm the accuracy of the frame data. As well, a verification of all soft edits and consistency edits was performed to ensure that the rules for each question in the questionnaire were applied and that the designed flow of the questionnaire was followed.

Disclosure control

Statistics Canada is prohibited by law from releasing any information it collects that could identify any person, business, or organization, unless consent has been given by the respondent or as permitted by the Statistics Act. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.

In order to prevent any data disclosure, confidentiality analysis is done using the Statistics Canada Generalized Disclosure Control System (G-Confid). G-Confid is used for primary suppression (direct disclosure) as well as for secondary suppression (residual disclosure). Direct disclosure occurs when the value in a tabulation cell is composed of or dominated by few enterprises while residual disclosure occurs when confidential information can be derived indirectly by piecing together information from different sources or data series.

Revisions and seasonal adjustment

This methodology does not apply to this survey.

Data accuracy

To determine the quality of an estimate and calculate its coefficients of variation (CVs), the standard deviation must first be established. Confidence intervals also require the standard deviation of the estimate. The NAS uses a relatively simple sample design, but since calibration is done in multiple stages, there is no simple formula for calculating variance estimates. Therefore, an approximate method, the bootstrap method, is needed. Using the bootstrap weights and the BootVar program described in the subsection below, the CVs and other variance measurements can be calculated with accuracy.
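Once a standard deviation (standard error) has been obtained for an estimate, the CV and a confidence interval follow directly. The sketch below uses the usual definitions and an assumed normal approximation for the 95% interval; the estimate and standard error are invented values.

```python
def coefficient_of_variation(estimate, std_error):
    """CV in percent: the standard error relative to the estimate."""
    if estimate == 0:
        raise ValueError("CV is undefined for an estimate of zero")
    return 100.0 * std_error / abs(estimate)


def confidence_interval(estimate, std_error, z=1.96):
    """Normal-approximation interval; z=1.96 gives about 95% coverage."""
    margin = z * std_error
    return (estimate - margin, estimate + margin)


estimate = 5000.0   # hypothetical estimated count
std_error = 250.0   # hypothetical bootstrap standard error

cv = coefficient_of_variation(estimate, std_error)
low, high = confidence_interval(estimate, std_error)
print(f"CV = {cv:.1f}%")
print(f"95% CI: {low:.0f} to {high:.0f}")
```

With these values the CV is 5% and the interval runs from roughly 4,510 to 5,490.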

1. Bootstrap method for variance estimation
1) Independently, in each stratum, a simple random sample of n-1 of n units in the sample is selected with replacement. Since the selection is with replacement, a unit may be chosen more than once.
2) This step is repeated R times to form R bootstrap samples. An average initial bootstrap weight based on the R samples is calculated for each unit sampled in the stratum.
3) Steps 1) and 2) are repeated B times, where B is large, yielding B initial bootstrap weights.
4) For each of the B samples produced in 3), the weights are then adjusted according to the same weighting process as the regular weights: non-response adjustment, calibration and so on. The end result is B final mean bootstrap weights for each unit in the sample.
5) The variation among the B estimates obtained from the B sets of bootstrap weights is related to the variance of the estimator based on the regular weights and can be used to estimate it.

For the NAS, R=1 and B=1,000.
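The steps above can be sketched for the case R=1 as follows. The rescaling factor n/(n-1) applied to the number of times a unit is drawn is the usual Rao-Wu form and is an assumption here, as is the variance formula based on deviations from the full-sample estimate. Step 4 (re-applying the weight adjustments to each replicate) is omitted, and the strata and weights are invented.

```python
import random
from collections import Counter


def bootstrap_weights(strata, design_weight, B, seed=0):
    """Return B sets of initial bootstrap weights (one draw each, R=1).

    strata:        stratum -> list of sampled unit ids
    design_weight: unit id -> design weight
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(B):
        replicate = {}
        for units in strata.values():
            n = len(units)
            if n < 2:
                raise ValueError("each stratum needs at least two units")
            # Draw n-1 of the n units with replacement.
            draws = Counter(rng.choices(units, k=n - 1))
            for u in units:
                # Assumed Rao-Wu rescaling: w * n/(n-1) * times drawn.
                replicate[u] = design_weight[u] * n / (n - 1) * draws[u]
        replicates.append(replicate)
    return replicates


def bootstrap_variance(full_estimate, replicate_estimates):
    """Mean squared deviation of replicate estimates from the full estimate."""
    B = len(replicate_estimates)
    return sum((e - full_estimate) ** 2 for e in replicate_estimates) / B


strata = {"A": ["a1", "a2", "a3", "a4"], "B": ["b1", "b2", "b3"]}
design_weight = {"a1": 12.0, "a2": 12.0, "a3": 12.0, "a4": 12.0,
                 "b1": 5.0, "b2": 5.0, "b3": 5.0}
y = {"a1": 1, "a2": 0, "a3": 1, "a4": 1, "b1": 0, "b2": 1, "b3": 0}

replicates = bootstrap_weights(strata, design_weight, B=200)
full_total = sum(design_weight[u] * y[u] for u in y)
replicate_totals = [sum(w[u] * y[u] for u in y) for w in replicates]
print(full_total, bootstrap_variance(full_total, replicate_totals))
```

A fixed seed is used so that the replicates are reproducible. In practice the bootstrap weights supplied with the survey file should be used rather than regenerated.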

2. Statistical packages for variance estimation
2.1 BootVar
The bootstrap weights are provided and should be used to calculate variance estimation. BootVar is a macro program, available in SAS format, that can be used to calculate variance with bootstrap weights. It is made up of macros that calculate variances for totals, ratios, differences between ratios and for linear and logistic regression.

BootVar can be downloaded from Statistics Canada's Research Data Centre (RDC) website.

2.2 Other packages
The SAS/STAT module in SAS offers procedures, such as SURVEYFREQ, SURVEYMEANS and SURVEYREG, that can calculate variances from the bootstrap weights provided. This is done using the VARMETHOD=BRR option for the procedure in question.

Other commercially available software can properly calculate the sampling variance from the bootstrap weights provided (e.g., SUDAAN [setting design = BRR], WesVar and Stata 9).

These methods can be adapted for the NAS according to a paper by Owen Phillips, "Using bootstrap weights with WesVar and SUDAAN" (Catalogue no. 12-002-XIE-20040027032), in The Research Data Centre Information and Technical Bulletin, Chronological index, Fall 2004, vol. 1, no. 2, Statistics Canada Catalogue no. 12-002-XIE.
