Longitudinal Administrative Databank (LAD)

Detailed information for 1982 to 2021

Status:

Active

Frequency:

Annual

Record number:

4107

The Longitudinal Administrative Databank is a longitudinal file designed as a research tool on income and demographics.

Data release - November 10, 2023

Description

The Longitudinal Administrative Databank (LAD) is a longitudinal file designed as a research tool on income and demographics. It comprises a 20% sample of the annual T1 Family File (record number 4105). The longitudinal LAD file contains many annual demographic variables about the individuals represented, including the landing year of recent immigrants and an immigration flag, and annual income information for both the individual and their census family in that year.

In addition, since 2017 the Longitudinal Immigration Database (record number 5057) has included a linking key variable that permits researchers to link all key characteristics of tax-filing immigrants in that database to their records on the LAD, from 1982 onward.

The longitudinal nature of the LAD permits custom-tailored research into dynamic phenomena, as well as representative cross-sectional patterns. Data are mainly used by government departments to evaluate programs and support policy recommendations. Academics, private consultants and Statistics Canada researchers also use the data for analyses of socio-economic conditions.

Reference period: The calendar years. Calendar year "y" for income; end of calendar year "y" for age; point in time (usually April of calendar year "y+1") for address information.

Collection period: Income tax returns are filed mainly in the spring following the year of reference. The T1 files for calendar year "y" are received from Canada Revenue Agency in January of the year "y+2".

Subjects

  • Household, family and personal income
  • Immigration and ethnocultural diversity
  • Income, pensions, spending and wealth
  • Labour market and income
  • Personal and household taxation

Data sources and methodology

Target population

The population of interest is individuals who filed a federal tax return. Specifically, all persons who have a social insurance number and completed a T1 tax return for that year are included. Additionally, the population includes a small number of relatives of tax filers who did not file a T1 themselves yet had a social insurance number and either received Canada Child Benefit, received a T4 - Statement of earnings, or were listed as dependants on their spouse's T1.

Instrument design

This methodology does not apply.

Sampling

This is a sample survey with a longitudinal design.

The frame is constructed from the annual release of the T1 Family File. Only individual records that have social insurance numbers can be selected, and these are sampled at a 20% rate. Key characteristics of recent immigrants are available by a linkage to the Longitudinal Immigration Database (record number 5057). The survey units are individuals, but information about the characteristics of their family during the reference year is also kept. No stratification is performed, as the sampling weight is equal across all units. The sampling is done once on each record in such a way that if someone is selected in a particular reference year, they will be selected in any other year, later or earlier, in which they are present in the T1 Family File.
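The "selected once, selected in every year" property described above can be illustrated with a minimal sketch. This is not the actual LAD selection mechanism, which this document does not describe. The sketch assumes, purely for illustration, that selection is decided by hashing a stable identifier and comparing the result with the 20% rate. The function name `is_selected` and the identifiers are hypothetical.

```python
import hashlib

SAMPLING_RATE = 0.20  # from the text: records are sampled at a 20% rate


def is_selected(person_id: str, rate: float = SAMPLING_RATE) -> bool:
    """Return the same selection decision for a given identifier every time.

    Assumption: selection is modelled as a hash of a stable identifier
    compared with the sampling rate. The real mechanism is not documented here.
    """
    digest = hashlib.sha256(person_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    return u < rate


# The decision depends only on the identifier, not on the reference year,
# so a person selected for one year is selected for every year they appear.
ids = [f"person-{i}" for i in range(100_000)]  # hypothetical identifiers
share = sum(is_selected(p) for p in ids) / len(ids)
print(f"share selected: {share:.3f}")
```

Because the decision never looks at the reference year, applying the same rule to each annual file yields a sample that can be followed over time.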

For longitudinal projects, data can be linked only across years in which a reliable identifier is available. Only persons who completed a T1 tax return or who received Canada Child Benefit, most of their non-filing spouses, and non-filing children under 19 years of age who have previously filed have a reliable identifier and can be followed across years. This limits representative longitudinal analysis to individuals who have started filing income tax returns and their partners. However, this covers around 75% of the official population estimates.

Data sources

Data are extracted from administrative files and derived from other Statistics Canada surveys and/or other sources.

Income tax returns are filed mainly in the spring following the year of reference. The T1 files are usually received from the Canada Revenue Agency (CRA) one year and a month after the end of the income reference period. The T1 Family File (T1FF) is usually ready for extraction one year and a half after the end of the income reference period. The longitudinal administrative data are drawn from the T1FF and linked to previous years, a process that takes a few months after the T1FF is available.

All longitudinal and administrative data are microrecords extracted and constructed from the annual releases of the T1FF. More detailed information on the sources for that file is available in the entry for record number 4105 (Annual Income Estimates for Census Families and Individuals [T1 Family File]). An additional cross-reference file of social insurance numbers is also supplied annually by the CRA. Its use permits the reliable linkage across years of people whose social insurance number changes over time. The key characteristics of recent immigrants are available by a linkage to the Longitudinal Immigration Database. In addition, tax-free savings account information and a group of variables relating to individuals and their ownership of shares in Canadian-controlled private corporations are available on the Longitudinal Administrative Databank.

Error detection

Most error detection and edits of income fields are performed during the construction of the T1 Family File. Outliers are identified and the plausibility of those records is checked manually; some mathematical identities are verified; and identifiable data-entry problems are corrected. All edits are done at the micro-record level. During the sampling and processing of the longitudinal administrative data from a new reference year of the annual T1 Family File, a few longitudinal consistency comparisons are made at the micro-record level. In particular, sex, year of birth and year of death are edited to a single constant value for each individual.
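As an illustration of the longitudinal consistency edit, the sketch below sets a field to one constant value per individual across all years. The rule for choosing that value is not stated in this document. Taking the most frequently reported value is an assumption made only for this example, and the record layout (`person_id`, `year`, `birth_year`) is hypothetical.

```python
from collections import Counter


def harmonize(records, field):
    """Set `field` to one constant value per person across all years.

    Assumption: the constant value is the most frequently reported one.
    The actual edit rule is not described in the source.
    """
    counts_by_person = {}
    for rec in records:
        if rec[field] is not None:
            counts_by_person.setdefault(rec["person_id"], Counter())[rec[field]] += 1

    harmonized = []
    for rec in records:
        counts = counts_by_person.get(rec["person_id"])
        value = counts.most_common(1)[0][0] if counts else None
        harmonized.append({**rec, field: value})  # copy, do not mutate the input
    return harmonized


records = [
    {"person_id": "A", "year": 2012, "birth_year": 1970},
    {"person_id": "A", "year": 2013, "birth_year": 1971},  # inconsistent report
    {"person_id": "A", "year": 2014, "birth_year": 1970},
    {"person_id": "B", "year": 2014, "birth_year": None},  # never reported
]
cleaned = harmonize(records, "birth_year")
```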

Imputation

No imputation is performed when deriving the Longitudinal Administrative Data from the T1 Family File. For details on the creation of families and imputations made during the construction of the T1 Family File, consult record number 4105 (Annual Income Estimates for Census Families and Individuals [T1 Family File]) in the Integrated Metadatabase. In general, if an identifiable person was not a T1 filer in a specific year, very limited income information is available for them for that year.

Estimation

Data tables 11-10-0024-01, 11-10-0025-01, 11-10-0026-01, 11-10-0054-01, 11-10-0055-01, 11-10-0056-01, 11-10-0058-01, 11-10-0059-01 and 11-10-0061-01 are generated from the Longitudinal Administrative Databank. Estimates of cross-sectional individual characteristics and all longitudinal estimates are usually performed without adjustment for non-response and without calibration. A simple constant weighting according to the inverse of the sampling rate is sufficient to obtain the estimates. Estimates of family characteristics are similar though larger families may have a higher probability of selection so a varying family weight must be used to obtain the estimates. Most variance calculations are direct but some may necessitate a slightly more complex method such as Rao-Demnati or, in the case of small enough sub-populations, a bootstrap-based technique.
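The weighting described above can be sketched as follows. The individual weight is the inverse of the 20% sampling rate, as stated in the text. The family weight shown here is only an illustration of why larger families need a different weight. It assumes that each eligible family member is selected independently at the 20% rate and that a family enters the sample when at least one member is selected. The document does not specify the actual family weight, and all function names are hypothetical.

```python
SAMPLING_RATE = 0.20
PERSON_WEIGHT = 1 / SAMPLING_RATE  # constant weight of 5 per sampled individual


def estimate_count(sampled_count):
    """Estimated population count from a count of sampled individuals."""
    return sampled_count * PERSON_WEIGHT


def estimate_total(sampled_amounts):
    """Estimated population total (e.g. income) from sampled amounts."""
    return sum(sampled_amounts) * PERSON_WEIGHT


def family_inclusion_probability(eligible_members, rate=SAMPLING_RATE):
    """Chance that at least one of the eligible members is sampled.

    Assumption: members are selected independently of one another.
    """
    return 1 - (1 - rate) ** eligible_members


def family_weight(eligible_members, rate=SAMPLING_RATE):
    """Inverse of the family inclusion probability (illustrative only)."""
    return 1 / family_inclusion_probability(eligible_members, rate)


for k in (1, 2, 3, 4):
    print(k, round(family_inclusion_probability(k), 3), round(family_weight(k), 2))
```

Under these assumptions, a one-person family keeps a weight of 5, while the weight falls as the number of eligible members grows.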

Quality evaluation

Most quality control procedures are performed when constructing the T1 Family File. Once at the stage of integrating the records of a new year into the longitudinal administrative data, the main tools used are comparisons of control totals with those from the full T1 Family File to ensure we still have the representative sample and that fields were identified correctly. Some historical trend analysis is also used.

Disclosure control

Statistics Canada is prohibited by law from releasing any data which would divulge information obtained under the Statistics Act that relates to any identifiable person, business or organization without the prior knowledge or the consent in writing of that person, business or organization. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.

Only Statistics Canada employees and deemed employees can be approved to access the confidential microdata. Before release, aggregate data are subjected to stringent non-disclosure practices:

1. A perturbation weight is used in all computations of counts, amounts or other statistical analyses.
2. Any cell must contain a minimum of five sampled individuals (or families), otherwise it is suppressed.
3. Each cell which can be dominated by one individual (or one family) is checked for dominance and suppressed if a problem is identified.
4. Once the primary suppressions are made, complementary suppressions are made so that suppressed information cannot be discovered residually. This is an iterative process - each complementary suppression may require an additional complementary suppression. Patterns are created to keep these to a minimum.
5. Finally, the counts and amounts are rounded; sampled counts to the nearest five, dollar amounts to the nearest $100 (or $10 if the amount is below $1,000).
6. Totals and percentages are based on rounded means and counts to prevent the unravelling of non-disclosure procedures.

Outside of these general guidelines, special cases may sometimes require case by case evaluation by a committee.
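The minimum cell size and rounding rules in the list above can be sketched as below. This covers only rules 2 and 5. Perturbation weights, dominance checks and complementary suppression are not shown. The treatment of exact ties (rounded up here) and the restriction to non-negative whole-dollar amounts are assumptions, and the function names are hypothetical.

```python
MIN_CELL_SIZE = 5  # rule 2: fewer sampled individuals (or families) means suppression


def round_to_nearest(value, base):
    """Round a non-negative integer to the nearest multiple of `base`.

    Assumption: exact ties are rounded up; the source does not say.
    """
    return (value + base // 2) // base * base


def publish_count(sampled_count):
    """Return the rounded sampled count, or None if the cell is suppressed."""
    if sampled_count < MIN_CELL_SIZE:
        return None
    return round_to_nearest(sampled_count, 5)


def round_amount(dollars):
    """Rule 5: nearest $100, or nearest $10 when the amount is below $1,000."""
    base = 10 if dollars < 1000 else 100
    return round_to_nearest(dollars, base)


print(publish_count(4), publish_count(13), round_amount(994), round_amount(1249))
```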

Revisions and seasonal adjustment

No calendarization, benchmarking or seasonal adjustments are performed on the dataset. Specific projects using the dataset may choose to adjust weights for the filing rate (as compared for example to official population estimates) or benchmark using the T4 control totals for employment earnings. In general, no adjustments are made and there is no policy of regular revisions.

Dollar amounts are always stored in current dollars as provided on the tax returns. Specific analyses may choose to deflate, inflate or not the amounts to constant dollars using appropriate indices for multi-year comparisons.
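A conversion to constant dollars could look like the minimal sketch below. The index values are made-up placeholders, not real price index figures. The choice of index and base year is left to the analyst, and the function name is hypothetical.

```python
def to_constant_dollars(amount, index_in_year, index_in_base_year):
    """Express a current-dollar amount in dollars of the chosen base year."""
    return amount * index_in_base_year / index_in_year


# Made-up illustrative index values; substitute an appropriate published index.
price_index = {2010: 100.0, 2020: 120.0}

income_2010 = 50_000  # current dollars, as stored on the file
income_2010_in_2020_dollars = to_constant_dollars(
    income_2010, price_index[2010], price_index[2020]
)
print(income_2010_in_2020_dollars)
```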

Data accuracy

Details on cross-sectional data accuracy may be consulted under the entry for the T1 Family File (record number 4105, Annual Income Estimates for Census Families and Individuals [T1 Family File]). The main departures from the T1 Family File are the sampling and the longitudinal components.

Since the sampling rate is relatively high, at 20%, the variation due to sampling is quite low even for relatively small populations. For example, for population counts of individuals with specific characteristics, the coefficient of variation (CV) due to sampling error is 20% or less when the population has 100 or more units, less than 10% when the population exceeds 400 and less than 2% for populations of 10,000 or more. When calculating percentages of a population with specific characteristics, the CV due to sampling would be less than 10% as long as the population count is 400 people or more and the estimated percentage is 50% or more, or if the population count is 1,000 people or more and the estimated percentage is above 20%.
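The thresholds quoted for population counts are consistent with a simple calculation, sketched below under an assumption that is not stated in the source: each unit is selected independently with probability f = 0.2, and the count is estimated as the sampled count divided by f. In that case the CV of an estimated count N is sqrt((1 - f) / (f N)), which equals 2 / sqrt(N) at a 20% rate. The function name is hypothetical, and the sketch covers counts only, not percentages.

```python
import math

SAMPLING_RATE = 0.20


def cv_of_count(population_count, rate=SAMPLING_RATE):
    """CV of an estimated count, assuming independent selection at `rate`."""
    return math.sqrt((1 - rate) / (rate * population_count))


for n in (100, 400, 10_000):
    print(n, f"{cv_of_count(n):.1%}")
```

The CV falls with the square root of the population size, which is why quadrupling the population from 100 to 400 halves it.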

For longitudinal projects, the coverage will be lower than that observed in any single cross-sectional year: the main restriction is the inability to follow individuals without a reliable identifier. Furthermore, the individual usually must be included in all of the study years. For example, when studying one-year transitions, 95.9% of individuals with a record for the 2013 income reference year also have one for 2014. Emigration or death accounts for 0.8% of the original 2013 group, leaving 3.2% missing without explanation; these could be non-filers or late filers in 2014. When studying the composition of the 2014 cohort, 94.9% were also in the 2013 file, 2.7% had never filed before or moved to Canada in 2014, and 2.3% were non-filers or late filers in 2013 (of these, 56.3% had filed in 2012). The study of longer periods would result in more observations with at least one missing year of income data.

Documentation
