Longitudinal Administrative Databank (LAD)
Detailed information for 1982 to 2015
Data release - November 15, 2017
The Longitudinal Administrative Databank (LAD) is a longitudinal file designed as a research tool on income and demographics. It comprises a 20% sample of the annual T1 Family File (record number 4105) and the Longitudinal Immigration Data Base (record number 5057). Variables have been harmonized where possible, and individuals can be linked from year to year starting with the 1982 data. The file is augmented annually with new data.
The longitudinal file contains many annual demographic variables about the individuals represented and annual income information for both the individual and their census family in that year. For immigrants landed between 1980 and 2015, the file also contains certain key characteristics observed at landing.
The longitudinal nature of the LAD permits custom-tailored research into dynamic phenomena, as well as representative cross-sectional patterns. Data are mainly used by government departments to evaluate programs and support policy recommendations. Academics, private consultants and Statistics Canada researchers also use the data for analyses of socio-economic conditions.
Reference period: Calendar years 1982 to 2015. Income refers to calendar year "y", age to the end of calendar year "y", and address information to a point in time (usually April of calendar year "y+1").
Collection period: Income tax returns are filed mainly in the spring following the reference year. The T1 files for income year "y" are received from the Canada Revenue Agency (CRA) in January of year "y+2".
Subjects
- Ethnic diversity and immigration
- Household, family and personal income
- Income, pensions, spending and wealth
- Labour market and income
- Personal and household taxation
Data sources and methodology
The population of interest is individuals who filed a federal tax return: specifically, all persons who have a social insurance number and completed a T1 tax return for the reference year. The population also includes a small number of relatives of tax filers who did not file a T1 themselves but had a social insurance number and either received the Canada Child Tax Benefit (CCTB), received a T4 statement of earnings, or were listed as dependants on their spouse's T1.
Instrument design: This methodology does not apply.
This is a sample survey with a longitudinal design.
The frame is constructed from the annual release of the T1 Family File. Only individual records with a social insurance number can be selected, and these are sampled at a rate of 20%. A 20% sample of the Longitudinal Immigration Data Base (record number 5057) is also included. The survey units are individuals, but information on the characteristics of their family during the reference year is also kept. No stratification is performed, so every unit carries the same sampling weight (5, the inverse of the 20% sampling rate). Selection is made in such a way that a person selected in a particular reference year is also selected in every other year, earlier or later, in which they are present in the T1 Family File.
For longitudinal projects, data can be linked only across years in which a reliable identifier is available. Persons who completed a T1 tax return or received the CCTB have a reliable identifier and can be followed across years, as can most of their non-filing spouses and their non-filing children under 19 years of age who have filed previously. Representative longitudinal analysis is therefore limited to individuals who have started filing income tax returns and their partners. This population nevertheless covers around 75% of the official population estimates.
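The text does not say how the "selected once, selected in every year" property is achieved. One common way to obtain it, shown here purely as an assumption and not as Statistics Canada's procedure, is to derive the selection decision from a hash of a stable identifier, so the same person always gets the same decision. The identifiers and the use of SHA-256 are hypothetical.

```python
import hashlib

SAMPLING_RATE = 0.20  # the 20% rate stated for the LAD


def is_selected(person_id: str, rate: float = SAMPLING_RATE) -> bool:
    """Hypothetical rule: hash a stable identifier with SHA-256 and select the
    person when the first 64 bits fall in the lowest `rate` share of their range.

    The decision depends only on the identifier, so it is identical in every
    reference year in which the person appears.
    """
    digest = hashlib.sha256(person_id.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    return value < rate * 2**64


# Hypothetical identifiers appearing in two consecutive annual files.
file_2014 = ["P001", "P002", "P003", "P004", "P005"]
file_2015 = ["P003", "P004", "P005", "P006"]
sample_2014 = {pid for pid in file_2014 if is_selected(pid)}
sample_2015 = {pid for pid in file_2015 if is_selected(pid)}

# Anyone present in both files is in both samples or in neither.
common = set(file_2014) & set(file_2015)
assert sample_2014 & common == sample_2015 & common
```

Because no stratification is involved, every selected person carries the same weight of 5.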
Data collection for this reference period: 1982-01-01 to 2015-12-31
Data are extracted from administrative files and derived from other Statistics Canada surveys and/or other sources.
Income tax returns are filed mainly in the spring following the reference year. The T1 files are usually received from the Canada Revenue Agency (CRA) a year and a month after the end of the income reference period, and the T1 Family File (T1FF) is usually ready for extraction about a year and a half after the end of that period. The Longitudinal Administrative Databank is then drawn from the T1FF and linked to previous years, a process that takes approximately two months after the T1FF becomes available.
All Longitudinal Administrative Databank records are micro-records extracted and constructed from the annual releases of the T1FF. More detailed information on the sources of that file is available in the entry for record number 4105. The Canada Revenue Agency also supplies an annual cross-reference file of social insurance numbers, which permits reliable linkage across years of people whose social insurance number changes over time. The key characteristics of recent immigrants in the 20% sample are obtained by linkage to an extract of the Longitudinal Immigration Data Base. In addition, Tax-Free Savings Account (TFSA) information for 2009 to 2015 has been added to the LAD.
Most error detection and editing of income fields is performed during the construction of the T1 Family File: outliers are identified and the plausibility of those records is checked manually, some mathematical identities are verified, and identifiable data-entry problems are corrected. All edits are done at the micro-record level. When a new reference year of the annual T1 Family File is sampled and processed for the LAD, a few longitudinal consistency checks are also made at the micro-record level. In particular, sex, year of birth and year of death are edited to a single constant value for each individual.
No imputation is performed when deriving the LAD from the T1 Family File. For details on the creation of families and the imputations made during the construction of the T1 Family File, consult the T1FF entry (record number 4105). In general, however, very limited income information is available for an identifiable person in any year in which they did not file a T1.
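As a rough illustration of these delays, the sketch below converts them into calendar months for a given income year. The month offsets are read directly from the approximate durations in the text and are not an exact schedule; for instance, this release, covering data up to 2015, is dated November 15, 2017, later than the arithmetic suggests.

```python
def approximate_schedule(income_year: int) -> dict:
    """Approximate (year, month) milestones for one income reference year,
    using the typical delays described in the text."""
    # Count months on a single axis: month index = year * 12 + (month - 1).
    end_of_period = income_year * 12 + 11  # December of the income year

    def as_year_month(index: int) -> tuple:
        return (index // 12, index % 12 + 1)

    return {
        "reference_period_ends": as_year_month(end_of_period),
        "t1_received_from_cra": as_year_month(end_of_period + 13),  # a year and a month
        "t1ff_ready": as_year_month(end_of_period + 18),  # about a year and a half
        "lad_ready": as_year_month(end_of_period + 18 + 2),  # about two more months
    }


for milestone, (year, month) in approximate_schedule(2015).items():
    print(f"{milestone:>22}: {year}-{month:02d}")
```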
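The text does not describe the layout of the CRA cross-reference file. Assuming, hypothetically, that it can be read as a mapping from a superseded social insurance number to its replacement, linkage amounts to following each chain of replacements to its final identifier and using that as the person key. The identifiers and amounts below are made up.

```python
def resolve_current_id(sin: str, xref: dict) -> str:
    """Follow superseded-to-replacement links until reaching an identifier
    with no further replacement. Raises ValueError on a cycle."""
    seen = {sin}
    current = sin
    while current in xref:
        current = xref[current]
        if current in seen:
            raise ValueError(f"cycle in cross-reference starting at {sin!r}")
        seen.add(current)
    return current


def link_years(records_by_year: dict, xref: dict) -> dict:
    """Group (sin, income) records from each year under one stable key."""
    panel = {}
    for year in sorted(records_by_year):
        for sin, income in records_by_year[year]:
            key = resolve_current_id(sin, xref)
            panel.setdefault(key, {})[year] = income
    return panel


# Hypothetical data: one person's number changed twice (S1 -> S2 -> S3).
xref = {"S1": "S2", "S2": "S3"}
records = {
    2013: [("S1", 40000)],
    2014: [("S2", 41000)],
    2015: [("S3", 42000), ("S9", 10000)],
}
print(link_years(records, xref))
# {'S3': {2013: 40000, 2014: 41000, 2015: 42000}, 'S9': {2015: 10000}}
```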
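The text states that sex, year of birth and year of death are edited to one constant value per individual but does not give the decision rule. The sketch below assumes a simple, hypothetical rule (the most frequent non-missing value, with ties going to the most recent report) to show the shape of such an edit; the field names are illustrative.

```python
from collections import Counter


def harmonize(values_by_year: dict):
    """Pick one constant value for a person from the values reported each year.

    Hypothetical rule: the most frequently reported non-missing value, with
    ties resolved in favour of the most recently reported value.
    """
    reported = {year: v for year, v in values_by_year.items() if v is not None}
    if not reported:
        return None
    counts = Counter(reported.values())
    top = max(counts.values())
    tied = {v for v, c in counts.items() if c == top}
    # Scan from the latest year back and return the first tied value found.
    for year in sorted(reported, reverse=True):
        if reported[year] in tied:
            return reported[year]


def apply_edit(panel: dict, field: str) -> None:
    """Overwrite `field` in every yearly record with the person's harmonized value."""
    for years in panel.values():
        value = harmonize({year: rec.get(field) for year, rec in years.items()})
        for rec in years.values():
            rec[field] = value


# Hypothetical panel: year of birth misreported once, year of death recorded once.
panel = {
    "P1": {
        2012: {"birth_year": 1950, "death_year": None},
        2013: {"birth_year": 1951, "death_year": None},
        2014: {"birth_year": 1950, "death_year": 2014},
    }
}
for field in ("birth_year", "death_year"):
    apply_edit(panel, field)
print(panel["P1"][2013])  # {'birth_year': 1950, 'death_year': 2014}
```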
CANSIM tables 204-0001, 204-0002, 204-0101, 204-0102 and 204-0103 are generated from the Longitudinal Administrative Databank (LAD). Two types of estimates are typically derived from the LAD. Estimates of cross-sectional individual characteristics, and all longitudinal estimates, are usually produced without non-response adjustment or calibration: a constant weight equal to the inverse of the sampling rate is sufficient. Estimates of family characteristics are produced similarly, but because larger families may have a higher probability of selection, a varying family weight must be used. Most variance calculations are direct, but some may require a more complex method, such as Rao-Demnati or, for sufficiently small subpopulations, a bootstrap-based technique.
Most quality control procedures are performed when constructing the T1 Family File. When the records of a new year are integrated into the LAD, the main tool is a comparison of control totals with those of the full T1 Family File, to ensure that the sample remains representative and that fields were identified correctly. Some historical trend analysis is also used.
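A minimal sketch of the weighting described above. Individual estimates use the constant weight of 5 (the inverse of the 20% sampling rate). For families, the text only says that a varying weight is needed; the family weight here assumes, as a hypothetical model, that a family is sampled when any of its k selectable members is selected, each independently, so its weight is 1 / (1 - 0.8^k). The perturbation weight used for disclosure control is not modelled.

```python
SAMPLING_RATE = 0.20
INDIVIDUAL_WEIGHT = 5  # inverse of the 20% sampling rate, same for every individual


def estimate_count(has_characteristic: list) -> float:
    """Estimated number of individuals with a characteristic."""
    return INDIVIDUAL_WEIGHT * sum(1 for flag in has_characteristic if flag)


def estimate_total(amounts: list) -> float:
    """Estimated population total of a dollar amount."""
    return INDIVIDUAL_WEIGHT * sum(amounts)


def family_weight(eligible_members: int, rate: float = SAMPLING_RATE) -> float:
    """Inverse inclusion probability of a family.

    Assumption: a family is in the sample when at least one of its
    `eligible_members` (members who can be selected) is selected, and members
    are selected independently at `rate`.
    """
    inclusion_probability = 1 - (1 - rate) ** eligible_members
    return 1 / inclusion_probability


# A one-person family carries weight 5; larger families carry smaller weights.
for k in (1, 2, 3):
    print(k, round(family_weight(k), 3))
```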
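The control-total comparison can be sketched as follows: each field's sample total is weighted up by 5 and compared with the full T1 Family File total, and fields whose relative difference exceeds a tolerance are flagged for review. The 1% tolerance and the field names are hypothetical; the text gives no thresholds.

```python
def compare_control_totals(sample_totals: dict, full_totals: dict,
                           weight: float = 5, tolerance: float = 0.01) -> dict:
    """Return {field: relative difference} for fields whose weighted sample total
    differs from the full-file control total by more than `tolerance`."""
    flagged = {}
    for field, full in full_totals.items():
        estimate = weight * sample_totals.get(field, 0)
        if full == 0:
            rel_diff = 0.0 if estimate == 0 else float("inf")
        else:
            rel_diff = abs(estimate - full) / abs(full)
        if rel_diff > tolerance:
            flagged[field] = rel_diff
    return flagged


# Hypothetical totals; "tfsa_contributions" was not picked up in the sample.
full = {"filers": 1000, "employment_income": 1_000_000, "tfsa_contributions": 50_000}
sample = {"filers": 200, "employment_income": 200_500}
print(compare_control_totals(sample, full))  # {'tfsa_contributions': 1.0}
```

A field missing from the sample shows up with a relative difference of 1.0, which is one way a misidentified field would surface.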
Statistics Canada is prohibited by law from releasing any data which would divulge information obtained under the Statistics Act that relates to any identifiable person, business or organization without the prior knowledge or the consent in writing of that person, business or organization. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.
Only Statistics Canada employees and deemed employees can be approved to access the confidential microdata. Before release, aggregate data are subjected to stringent non-disclosure practices:
1. A perturbation weight is used in all computations of counts, amounts or other statistical analyses.
2. Any cell must contain a minimum of 5 sampled individuals (or families), otherwise it is suppressed.
3. Each cell which can be dominated by one individual (or one family) is checked for dominance and suppressed if a problem is identified.
4. Once the primary suppressions are made, complementary suppressions are made so that suppressed information cannot be derived residually. This is an iterative process: each complementary suppression may require further complementary suppressions. Suppression patterns are designed to keep these to a minimum.
5. Finally, counts and amounts are rounded: sampled counts to the nearest five, and dollar amounts to the nearest $100 (or to the nearest $10 if the amount is below $1,000).
6. Totals and percentages are based on rounded means and counts to prevent the unravelling of non-disclosure procedures.
Outside of these general guidelines, special cases may sometimes require case by case evaluation by a committee.
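Two of the rules above, the minimum cell size (rule 2) and rounding (rule 5), are mechanical enough to sketch. The rounding of halves upward is an assumption, since the text does not specify it; perturbation, dominance checks, complementary suppression and derived totals (rules 1, 3, 4 and 6) are not modelled.

```python
import math

MIN_SAMPLED = 5  # rule 2: minimum sampled individuals (or families) per cell


def round_to(x: float, base: int) -> int:
    """Round x to the nearest multiple of base (halves rounded up; an assumption)."""
    return int(math.floor(x / base + 0.5)) * base


def round_amount(amount: float) -> int:
    """Rule 5: nearest $100, or nearest $10 if the amount is below $1,000."""
    return round_to(amount, 10 if abs(amount) < 1000 else 100)


def publish_cell(sampled_n: int, count: float, mean_amount: float):
    """Apply rules 2 and 5 to one cell; returns None if the cell is suppressed."""
    if sampled_n < MIN_SAMPLED:
        return None
    return {"count": round_to(count, 5), "mean_amount": round_amount(mean_amount)}


print(publish_cell(4, 20, 500))      # None: fewer than 5 sampled individuals
print(publish_cell(7, 37, 45_678))   # {'count': 35, 'mean_amount': 45700}
```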
Revisions and seasonal adjustment
No calendarization, benchmarking or seasonal adjustment is performed on the dataset. Specific projects using the dataset may choose to adjust weights for the filing rate (for example, relative to official population estimates) or to benchmark employment earnings to T4 control totals. In general, however, no adjustments are made, and there is no policy of regular revisions.
Dollar amounts are always stored in current dollars, as reported on the tax returns. For multi-year comparisons, specific analyses may convert amounts to constant dollars using appropriate indices, or leave them in current dollars.
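For projects that do adjust for the filing rate, one simple approach, shown only as an illustrative assumption since the text prescribes no method, is a cell-by-cell ratio adjustment that scales the constant weight of 5 so that weighted LAD counts match official population estimates. The cells and counts are hypothetical.

```python
def filing_rate_weights(sampled_counts: dict, official_counts: dict,
                        base_weight: float = 5) -> dict:
    """Hypothetical ratio adjustment of the constant weight, cell by cell.

    Each cell's weight is scaled so that the weighted LAD count equals the
    official population estimate for that cell.
    """
    adjusted = {}
    for cell, n_sampled in sampled_counts.items():
        lad_estimate = base_weight * n_sampled
        adjusted[cell] = base_weight * official_counts[cell] / lad_estimate
    return adjusted


# Hypothetical cells and counts.
sampled = {"age 25-34": 1600, "age 65+": 1900}
official = {"age 25-34": 10_000, "age 65+": 10_000}
print(filing_rate_weights(sampled, official))
```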
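Conversion to constant dollars is a ratio of index values. The sketch assumes a price index keyed by year (for example, an annual consumer price index); the index values below are made up.

```python
def to_constant_dollars(amount: float, year: int, base_year: int,
                        price_index: dict) -> float:
    """Express a current-dollar amount from `year` in `base_year` dollars:
    amount * index[base_year] / index[year]."""
    return amount * price_index[base_year] / price_index[year]


# Hypothetical index values, not official figures.
index = {2010: 100.0, 2015: 110.0}
print(to_constant_dollars(44_000, 2015, 2010, index))  # 40000.0
```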
Details on cross-sectional data accuracy may be consulted under the entry for the T1 Family File (T1FF, record number 4105). The main departures from the T1FF are the sampling and the longitudinal components.
Because the sampling rate is relatively high (20%), sampling variability is low even for relatively small populations. For example, for counts of individuals with specific characteristics, the coefficient of variation (CV) due to sampling error is 20% or less when the population has 100 or more units, less than 10% when it exceeds 400, and less than 2% for populations of 10,000 or more. For percentages of a population with specific characteristics, the CV due to sampling is less than 10% if the population count is 400 or more and the estimated percentage is 50% or more, or if the population count is 1,000 or more and the estimated percentage is above 20%.
For longitudinal projects, coverage is lower than in any single cross-sectional year: the main restriction is the inability to follow individuals without a reliable identifier. Furthermore, an individual usually must be present in all study years. For example, when studying one-year transitions, 95.9% of individuals with a record for the 2013 income reference year also have one for 2014. Emigration or death accounts for 0.8% of the original 2013 group, so 3.2% are missing for unexplained reasons; these could be non-filers or late filers in 2014. Looking at the composition of the 2014 cohort, 94.9% were also in the 2013 file, 2.7% had never filed before or moved to Canada in 2014, and 2.3% were non-filers or late filers in 2013 (of these, 56.3% had filed in 2012). Studying longer periods results in more individuals with at least one missing year of income data.
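The count figures can be reproduced with a simple model, assumed here because the text does not state the variance formula used: if each individual is selected independently with probability 0.2, the CV of an estimated count N is sqrt((1 - 0.2) / (0.2 N)) = 2 / sqrt(N). This gives CVs of about 20%, 10% and 2% at 100, 400 and 10,000 units.

```python
import math


def cv_of_count(population_count: float, rate: float = 0.20) -> float:
    """CV of the estimated count N_hat = n / rate, assuming each of the N units
    is selected independently with probability `rate`.

    n is binomial(N, rate), so Var(N_hat) = N * (1 - rate) / rate and
    CV = sqrt((1 - rate) / (rate * N)).
    """
    return math.sqrt((1 - rate) / (rate * population_count))


for n in (100, 400, 10_000):
    print(f"N = {n:>6}: CV = {cv_of_count(n):.1%}")
```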
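The one-year decomposition behind these figures can be sketched with sets of person identifiers. The identifiers and counts below are made up; the published shares come from the actual files, and because each share is rounded separately, published components need not add to exactly 100%.

```python
def one_year_transition(ids_t, ids_next, emigrated_or_died) -> dict:
    """Split the year-t cohort into shares present in year t+1, known to have
    emigrated or died, and missing for unexplained reasons."""
    cohort, following = set(ids_t), set(ids_next)
    missing = cohort - following
    left = missing & set(emigrated_or_died)
    n = len(cohort)
    return {
        "present_next_year": len(cohort & following) / n,
        "emigrated_or_died": len(left) / n,
        "unexplained_missing": len(missing - left) / n,
    }


# Hypothetical identifiers: 1,000 persons in year t.
shares = one_year_transition(range(1000), range(41, 1100), range(8))
print({k: f"{v:.1%}" for k, v in shares.items()})
```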
- Longitudinal Administrative Data Dictionary
Last review: January 16, 2017.