Longitudinal Administrative Databank (LAD)

Status:
Active
Frequency:
Annual
Record number:
4107

The Longitudinal Administrative Databank (LAD) is a longitudinal file designed as a research tool on income and demographics.

Detailed information for 1982 to 2012

Data release - November 18, 2014

Description

The Longitudinal Administrative Databank (LAD) is a longitudinal file designed as a research tool on income and demographics. It comprises a 20% sample of the annual T1 Family File (record number 4105) and the Longitudinal Immigration Data Base (record number 5057). Variables have been harmonized where possible and individuals can be linked year to year starting with 1982 data. The file is augmented annually with new data.

The longitudinal file contains many annual demographic variables about the individuals represented and annual income information for both the individual and their census family in that year. For immigrants landed between 1980 and 2011, the file also contains certain key characteristics observed at landing.

The longitudinal nature of the LAD permits custom-tailored research into dynamic phenomena, as well as representative cross-sectional patterns. Data are mainly used by government departments to evaluate programs and support policy recommendations. Academics, private consultants and Statistics Canada researchers also use the data for analyses of socio-economic conditions.

Reference period:
Calendar year

Subjects

  • Ethnic diversity and immigration
  • Household, family and personal income
  • Income, pensions, spending and wealth
  • Labour market and income
  • Personal and household taxation

Data sources and methodology

Target population

The population of interest is all members of Canadian families (families that include at least one person living in Canada). For cross-sectional purposes, in any specific reference year, the data cover all persons who completed a T1 tax return for that year or who received Canada Child Tax Benefits (CCTB) in that year, their non-filing spouses (including wage and salary information from the T4 file), their non-filing children identified from three sources (the CCTB file, the births files, and an historical file) and filing children who reported the same address as their parent.

The dataset is based on the census family concept. The census family includes parent(s) and children living at the same address and non-family persons (people neither with a partner nor living with unmarried, childless children at the same address).

For longitudinal projects, data can be linked only across years in which a reliable identifier is available. Only persons who completed a T1 tax return or who received CCTB, most of their non-filing spouses, and non-filing children under 19 years of age who have previously filed will have a reliable identifier and can be followed across years. This limits representative longitudinal analysis to individuals who have started filing income tax returns and their partners. Even so, this group covers around 75% of the official population estimates.

Sampling

This is a sample survey with a longitudinal design.

The frame is constructed from the annual release of the T1 Family File. Only individual records that have social insurance numbers can be selected, and these are sampled at a 20% rate. The sample also includes a 20% sample of the Longitudinal Immigration Data Base. The survey units are individuals, but information about the characteristics of their family during the reference year is also kept. No stratification is performed, as the sampling weight is equal across all units. Each individual is sampled only once: anyone selected in a particular reference year is also selected in every other year, earlier or later, in which they are present in the T1 Family File.
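The "selected once, selected in every year" property can be illustrated with a short sketch. It assumes, purely for illustration, that selection is a deterministic function of a stable identifier. The hash-based rule, the function name is_selected and the identifiers are hypothetical and are not a description of the actual selection procedure.

```python
import hashlib

ONE_IN = 5  # a 1-in-5 selection rule gives a 20% sampling rate

def is_selected(person_id: str) -> bool:
    """Return the same selection decision for an identifier in every year.

    ASSUMPTION: a hash of a stable identifier stands in for the real,
    undocumented selection rule.
    """
    digest = hashlib.sha256(person_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % ONE_IN == 0

# Hypothetical identifiers present in two reference years.
files_by_year = {
    2004: ["A001", "A002", "A003", "A004"],
    2005: ["A002", "A003", "A004", "A005"],
}

# The decision depends only on the identifier, never on the year, so a
# person sampled in one year is sampled in every year in which they appear.
sample_by_year = {
    year: [pid for pid in ids if is_selected(pid)]
    for year, ids in files_by_year.items()
}
```

Because the decision never depends on the reference year, no bookkeeping of past selections is needed when a new year is added to the file.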

Data sources

Data are extracted from administrative files and derived from other Statistics Canada surveys and/or other sources.

Income tax returns are filed mainly in the spring following the year of reference. The T1 files are usually received from the Canada Revenue Agency (CRA) one year and one month after the end of the income reference period. The T1 Family File (T1FF) is usually ready for extraction a year and a half after the end of the income reference period. The Longitudinal Administrative Databank (LAD) is drawn from the T1FF and linked to previous years, a process that takes approximately two months after the T1FF is available.

All LAD data are micro-records extracted and constructed from the annual releases of the T1FF. More detailed information on the sources for that file is available in the entry for record number 4105. An additional cross-reference file of social insurance numbers is also supplied annually by the Canada Revenue Agency. Its use permits the reliable linkage across years of people whose social insurance number changes over time. The key characteristics of the 20% sample of recent immigrants are supplied by a linkage to an extract of the Longitudinal Immigration Data Base. In addition, Tax-Free Savings Account (TFSA) information for 2009 to 2012 has been added to the LAD.

Error detection

Most error detection and edits of income fields are performed during the construction of the T1 Family File. Outliers are identified and the plausibility of those records is checked manually; some mathematical identities are verified; and identifiable data-entry problems are corrected. All edits are done at the micro-record level. When a new reference year of the annual T1 Family File is sampled and processed into the LAD, a few longitudinal consistency comparisons are made at the micro-record level. In particular, sex, year of birth and year of death are edited to a uniform and constant value for each individual.

Imputation

No imputation is performed when deriving the LAD from the T1 Family File. For details on the creation of families and imputations made during the construction of the T1 Family File, please consult the entry for that file (record number 4105). In general, however, if an identifiable person was not a T1 filer in a specific year, very limited income information is available for them for that year.

Estimation

CANSIM tables 204-0001 and 204-0002 are generated from the Longitudinal Administrative Databank (LAD). Two types of estimates are typically derived from the LAD. The first, estimates of cross-sectional individual characteristics and all longitudinal estimates, are usually produced without adjustment for non-response and without calibration; a simple constant weight equal to the inverse of the sampling rate is sufficient. The second, estimates of family characteristics, are similar, but larger families may have a higher probability of selection, so a varying family weight must be used. Most variance calculations are direct, but some may require a slightly more complex method such as Rao-Demnati or, for small enough sub-populations, a bootstrap-based technique.
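The two kinds of weight can be sketched as follows. The individual weight follows directly from the 20% sampling rate. The family weight shown is only one possible form, under the loudly stated assumption that each eligible family member is selected independently at 20% and that a family enters the sample when at least one member is selected. The function names are hypothetical and the actual LAD family weight may be defined differently.

```python
SAMPLING_RATE = 0.20
INDIVIDUAL_WEIGHT = 1 / SAMPLING_RATE  # constant weight of 5 per sampled person

def estimate_total(sample_values):
    """Estimate a population total from sampled individual values."""
    return INDIVIDUAL_WEIGHT * sum(sample_values)

def family_weight(eligible_members: int) -> float:
    """Illustrative varying family weight.

    ASSUMPTION: members are selected independently at the sampling rate and
    the family is in the sample if any member is selected, so the family's
    selection probability is 1 - (1 - rate) ** members.
    """
    p_family = 1 - (1 - SAMPLING_RATE) ** eligible_members
    return 1 / p_family

# 200 sampled individuals, each counted once, represent about 1,000 people.
estimated_people = estimate_total([1] * 200)
```

Under this assumption a one-person family keeps a weight of about 5, while larger families receive smaller weights because they are more likely to be selected.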

Quality evaluation

Most quality control procedures are performed when constructing the T1 Family File. When the records of a new year are integrated into the LAD, the main tools used are comparisons of control totals with those from the full T1 Family File, to ensure that the sample remains representative and that fields were identified correctly. Some historical trend analysis is also used.

Disclosure control

Statistics Canada is prohibited by law from releasing any data which would divulge information obtained under the Statistics Act that relates to any identifiable person, business or organization without the prior knowledge or the consent in writing of that person, business or organization. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.

Only a small group of people within the Division have access to confidential data. Users must specify their requirements to these people who then carry out the retrievals. Before release, data are subjected to stringent non-disclosure practices:

1. A perturbation weight is used in all computations of counts, amounts or other statistical analyses.
2. Any cell must contain a minimum of 5 sampled individuals (or families), otherwise it is suppressed.
3. Each cell which can be dominated by one individual (or one family) is checked for dominance and suppressed if a problem is identified.
4. Once the primary suppressions are made, complementary suppressions are made so that suppressed information cannot be discovered residually. This is an iterative process - each complementary suppression may require an additional complementary suppression. Patterns are created to keep these to a minimum.
5. Finally, the counts and amounts are rounded -- sampled counts to the nearest five, dollar amounts to the nearest $100 (or $10 if the amount is below $1,000).
6. Totals and percentages are based on rounded means and counts to prevent the unravelling of non-disclosure procedures.

Outside of these general guidelines, special cases may sometimes require case-by-case evaluation by a committee.
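The minimum cell size in step 2 and the rounding in step 5 can be sketched as below. This is a simplified illustration only: the perturbation weight, the dominance check and complementary suppression are not modelled, the function names are hypothetical, and rounding ties upward is an assumption, since the source does not say how ties are handled.

```python
MIN_CELL_COUNT = 5  # minimum number of sampled individuals (or families)

def round_to_base(value: int, base: int) -> int:
    """Round a non-negative integer to the nearest multiple of base.

    ASSUMPTION: exact ties are rounded up.
    """
    return (value + base // 2) // base * base

def release_count(sampled_count: int):
    """Return the rounded count, or None if the cell must be suppressed."""
    if sampled_count < MIN_CELL_COUNT:
        return None
    return round_to_base(sampled_count, 5)

def release_amount(dollars: int) -> int:
    """Round to the nearest $100, or to the nearest $10 below $1,000."""
    base = 10 if dollars < 1000 else 100
    return round_to_base(dollars, base)
```

For example, under these assumptions a cell with 4 sampled individuals is suppressed, a cell with 13 is released as 15, and an amount of $1,234 is released as $1,200.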

Revisions and seasonal adjustment

No calendarization, benchmarking or seasonal adjustments are performed on the dataset. Specific projects using the dataset may choose to adjust weights for the filing rate (as compared for example to official population estimates) or benchmark using the T4 control totals for employment earnings. In general, no adjustments are made and there is no policy of regular revisions.

Dollar amounts are always stored in current dollars as provided on the tax returns. For multi-year comparisons, specific analyses may convert the amounts to constant dollars by deflating or inflating them with appropriate indices, or may leave them in current dollars.
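A conversion from current to constant dollars might look like the sketch below. The index values are placeholders chosen for easy arithmetic, not actual price index figures, and the names price_index and to_constant_dollars are hypothetical. An analysis would substitute whichever index is appropriate for its purpose.

```python
# HYPOTHETICAL index values for illustration only (not real index figures).
price_index = {2004: 100.0, 2005: 102.0, 2012: 116.0}

def to_constant_dollars(amount: float, year: int, base_year: int,
                        index=price_index) -> float:
    """Express a current-dollar amount from `year` in `base_year` dollars."""
    return amount * index[base_year] / index[year]

# $51,000 reported for 2005, expressed in 2004 dollars under the
# placeholder index, is about $50,000.
income_2005_in_2004_dollars = to_constant_dollars(51_000, 2005, 2004)
```

Choosing a later base year inflates earlier amounts; choosing an earlier base year deflates later ones.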

Data accuracy

Details on cross-sectional data accuracy may be consulted under the entry for the T1 Family File (T1FF, record number 4105). The main departures from the T1FF are the sampling and the longitudinal components.

Since the sampling rate is relatively high at 20%, the variation due to sampling is quite low even for relatively small populations. For example, for population counts of individuals with specific characteristics, the coefficient of variation (CV) due to sampling error is 20% or less when the population has 100 or more units, less than 10% when the population exceeds 400 and less than 2% for populations above 10,000. When calculating percentages of a population with specific characteristics, the CV due to sampling would be less than 10% as long as the population count is 400 people or more and the estimated percentage is 50% or more, or if the population count is 1,000 people or more and the estimated percentage is above 20%.
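The thresholds quoted for population counts are consistent with a simple approximation. Assuming each unit is sampled independently at rate f (an assumption made here for illustration, not a statement of the official variance method), the CV of an estimated count for a population of N units is sqrt((1 - f) / (f * N)), which equals 2 / sqrt(N) when f = 0.20. The function name count_cv is hypothetical.

```python
import math

SAMPLING_RATE = 0.20

def count_cv(population: int, rate: float = SAMPLING_RATE) -> float:
    """Approximate CV of an estimated count.

    ASSUMPTION: independent (Bernoulli) selection of each unit at `rate`.
    """
    return math.sqrt((1 - rate) / (rate * population))

# Populations of 100, 400 and 10,000 give CVs of roughly 20%, 10% and 2%.
cv_by_population = {n: count_cv(n) for n in (100, 400, 10_000)}
```

The sketch covers counts only; the thresholds given for percentages depend on the estimated share as well and are not reproduced here.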

For longitudinal projects, the coverage will be lower than that observed in any single cross-sectional year: the main restriction is the inability to follow individuals without a reliable identifier. Furthermore, the individual usually must be included in all of the study years. For example, when studying one-year transitions, 95.9% of individuals with a record for the 2004 income reference year also have one in 2005. Emigration or death accounts for 0.8% of the original 2004 group so 3.2% remain unexplained missing; these could be non-filers or late filers in 2005. When studying the composition of the 2005 cohort, 94.9% were also in the 2004 file, 2.7% had never filed before or moved to Canada in 2005 and 2.3% were non-filers or late filers in 2004 (of these, 56.3% had filed in 2003). The study of longer periods would result in more observations with at least one missing year of income data.
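Shares of this kind can be computed from the sets of identifiers present in each year. The sketch below uses tiny, made-up identifier sets and hypothetical function names; it shows the arithmetic only and does not reproduce the published percentages.

```python
def forward_retention(ids_start: set, ids_next: set) -> float:
    """Share of the starting-year group that also has a record next year."""
    return len(ids_start & ids_next) / len(ids_start)

def backward_presence(ids_cohort: set, ids_previous: set) -> float:
    """Share of a cohort that was already present in the previous year."""
    return len(ids_cohort & ids_previous) / len(ids_cohort)

# Hypothetical identifiers, for illustration only.
ids_2004 = {"A001", "A002", "A003", "A004"}
ids_2005 = {"A002", "A003", "A004", "A005", "A006"}

retained = forward_retention(ids_2004, ids_2005)          # 3 of 4 in 2004
already_present = backward_presence(ids_2005, ids_2004)   # 3 of 5 in 2005
```

The two shares use different denominators, which is why the forward figure for the 2004 group and the backward figure for the 2005 cohort differ.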