Annual Income Estimates for Census Families and Individuals (T1 Family File)

Detailed information for 2017

Status:

Active

Frequency:

Annual

Record number:

4105

This activity is conducted for the development and dissemination of annual small area income data for Canadians.

Data release - July 11, 2019

Description

This activity is conducted for the development and dissemination of annual small area socio-economic data for Canadians and their families. These data, collected primarily from income tax returns submitted to the Canada Revenue Agency (CRA), provide income and demographic information for sub-provincial geographic areas. Municipal, provincial and federal government departments use the data to evaluate programs and support policy recommendations. Businesses and educational institutions use the data to learn more about their target markets. Academics and researchers use the data to analyse socio-economic conditions.

Reference period: Calendar year "y" for income and contributions, end of calendar year "y" for age, point in time (usually April of calendar year "y+1") for address information.

Collection period: Income tax returns are filed mainly in the spring following the reference year. The T1 files for calendar year "y" are received from the CRA in January of year "y+2".
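The reference and collection periods above reduce to fixed year offsets from the reference year "y". The sketch below only encodes those stated offsets; the class and field names are hypothetical and are not part of any Statistics Canada system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class T1Timeline:
    """Year offsets for one T1FF reference year "y" (hypothetical field names)."""
    income_year: int        # "y": income and contributions; age taken at end of y
    address_year: int       # "y+1": address at a point in time, usually April
    filing_year: int        # "y+1": returns are filed mainly in the spring
    cra_delivery_year: int  # "y+2": T1 files received from the CRA in January


def timeline_for(y: int) -> T1Timeline:
    """Build the timeline for reference year y using the offsets stated above."""
    return T1Timeline(
        income_year=y,
        address_year=y + 1,
        filing_year=y + 1,
        cra_delivery_year=y + 2,
    )


if __name__ == "__main__":
    t = timeline_for(2017)
    print(t.filing_year, t.cra_delivery_year)  # 2018 2019
```

For the 2017 reference year, this places receipt of the T1 files in January 2019, ahead of the July 11, 2019 data release.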

Subjects

  • Household, family and personal income
  • Household spending and savings
  • Income, pensions, spending and wealth
  • Pension plans and funds and other retirement income programs
  • Personal and household taxation

Data sources and methodology

Target population

These data cover all persons who completed a T1 tax return for the reference year or who received Federal Child Benefits; their non-filing spouses (including wage and salary information from the T4 file); their non-filing children, identified from three sources (a file pertaining to Federal Child Benefits, the births files and a historical file); and filing children who reported the same address as their parent. The small area family data are based on the census family concept, which either groups individuals into a census family (parent(s) and children living at the same address) or identifies them as persons not in census families.

Instrument design

This methodology does not apply.

Sampling

This methodology does not apply.

Data sources

Data are extracted from administrative files.

The data include almost all individuals who filed an individual T1 tax return (some late filers are not included), to which recipients of Federal Child Benefits are added. From these records, non-filing spouses, partners and children are determined in addition to tax filers. When complete, the file covers approximately 96% of the population and is left unweighted and unadjusted. Sampling methodology does not apply to this survey.

The individual T1 file is combined with the T4 tax file and a file pertaining to Federal Child Benefits, all of which are received from the Canada Revenue Agency. The files are processed over a 4- to 5-month period to create the T1 Family File (T1FF). Newborns from the previous year who are not on the Federal Child Benefits file are identified from a file of births. Families are joined, and identifiable missing spouses and children are imputed. The final T1FF contains information for tax filers and imputed persons (35.3 million individuals in 2017). Tax filers who died within the year are not counted.

The income reference period is the calendar year. Missing components of income are estimated across Canada. For the province of Quebec only, provincial tax data are estimates, because Statistics Canada does not receive the administrative files required for these calculations.

During geocoding, Statistics Canada's Postal Code Conversion File (PCCF) is used to convert postal codes to standard geographic areas (Census Divisions, Census Metropolitan Areas, Census Agglomerations and Census Tracts).
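Conceptually, geocoding is a keyed lookup from a normalized postal code to geographic area codes. The sketch below assumes a simple in-memory dictionary; it does not reproduce the actual PCCF record layout (in which a postal code may link to more than one area), and every key and value in `SAMPLE_LOOKUP` is a hypothetical placeholder.

```python
import re
from typing import Dict, Optional

# Hypothetical stand-in for a postal-code-to-geography lookup. The keys and
# values are placeholders; the real PCCF has its own record layout.
SAMPLE_LOOKUP: Dict[str, Dict[str, str]] = {
    "A1A1A1": {"cd": "CD-EXAMPLE", "cma_ca": "CMA-EXAMPLE", "ct": "CT-EXAMPLE"},
}

# Shape check only (letter-digit-letter digit-letter-digit); it does not
# validate which letters are actually used in Canadian postal codes.
_POSTAL_SHAPE = re.compile(r"[A-Z]\d[A-Z]\d[A-Z]\d")


def normalize_postal_code(raw: str) -> str:
    """Uppercase and drop spaces/hyphens, e.g. 'a1a 1a1' -> 'A1A1A1'."""
    code = re.sub(r"[\s-]", "", raw).upper()
    if not _POSTAL_SHAPE.fullmatch(code):
        raise ValueError(f"not a postal code: {raw!r}")
    return code


def geocode(raw: str,
            lookup: Dict[str, Dict[str, str]] = SAMPLE_LOOKUP) -> Optional[Dict[str, str]]:
    """Return the geography record for a postal code, or None if it is not in the lookup."""
    return lookup.get(normalize_postal_code(raw))
```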

Error detection

During processing, automatic and manual editing are combined. Variables with a value of one (unity, used as a flag by the CRA) are converted to zero, and variables with values above their absolute maximum are corrected automatically. Outliers are identified and examined, and those found to be erroneous are corrected manually. Variables that should never be negative are also checked and adjusted as needed.
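The automatic edits can be sketched as a per-variable function. This is a minimal illustration, assuming that over-maximum values are set to the maximum (as described under Estimation) and that impossible negatives are only flagged, since the text does not say how they are adjusted. The function name and the `absolute_max` and `allow_negative` parameters are hypothetical; actual production limits are not published here.

```python
from typing import List, Tuple


def auto_edit(value: float, absolute_max: float,
              allow_negative: bool = False) -> Tuple[float, List[str]]:
    """Apply the automatic edits to one variable and return (value, notes).

    absolute_max and allow_negative are hypothetical per-variable settings.
    """
    notes: List[str] = []
    if value == 1:
        # A value of exactly one is treated as a CRA flag, not an amount.
        notes.append("unity flag converted to zero")
        value = 0.0
    if value > absolute_max:
        # Values above the variable's absolute maximum are set to that maximum.
        notes.append("capped at absolute maximum")
        value = absolute_max
    if value < 0 and not allow_negative:
        # Impossible negatives are flagged; the adjustment is left to review.
        notes.append("negative value flagged for review")
    return value, notes
```

Outlier review and manual correction are not modelled; in this sketch they would act on the returned notes.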

Imputation

Because the source files contain limited direct information on the number and characteristics of non-filing individuals, this information must be derived. The family system creates families by linking filing family members together, and estimates non-filing members from information on the tax filers' returns (marital status, deductions and tax credit information), from the Federal Child Benefits file or from a historical file. For example, the family system imputes a non-filing spouse whenever a filer has declared him/herself married but was not linked with a filing spouse. Wage and salary income for non-filing spouses is derived from the T4 file when such information exists.
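The spouse-imputation example above can be sketched as a single pass over filers. This assumes that a filer's record carries a flag for whether a filing spouse was linked and a spouse identifier from the return that can be used to look up T4 wages; the record fields, `impute_spouses` and the identifier scheme are all hypothetical, and the real family system applies many more rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Filer:
    filer_id: str
    marital_status: str                  # e.g. "married", "single"
    linked_spouse_id: Optional[str]      # set when a filing spouse was linked
    reported_spouse_id: Optional[str]    # hypothetical: spouse identifier on the return


@dataclass
class ImputedSpouse:
    partner_of: str
    t4_wages: Optional[float]            # None when no T4 information exists


def impute_spouses(filers: List[Filer],
                   t4_wages: Dict[str, float]) -> List[ImputedSpouse]:
    """Impute a non-filing spouse for each married filer with no linked filing spouse."""
    imputed: List[ImputedSpouse] = []
    for f in filers:
        if f.marital_status == "married" and f.linked_spouse_id is None:
            # Take wages from the T4 file only when a matching record exists.
            wages = t4_wages.get(f.reported_spouse_id) if f.reported_spouse_id else None
            imputed.append(ImputedSpouse(partner_of=f.filer_id, t4_wages=wages))
    return imputed
```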

For tax year 2017, 27.8 million people filed tax returns and an additional 7.4 million people were identified as non-filing family members. Thus, fewer than one-quarter of the people on the file were non-filers in 2017. This proportion has been declining over time as more and more people file, either to report taxes or to receive transfer payments now administered by the Canada Revenue Agency.
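The "fewer than one-quarter" statement follows directly from the two published counts:

```python
# 2017 counts, in millions, as published above.
filers_millions = 27.8
non_filers_millions = 7.4

total = filers_millions + non_filers_millions  # about 35.2 million people on the file
share = non_filers_millions / total            # non-filer share, about 0.21

print(f"{total:.1f} million on file, {share:.1%} non-filers")
```

The two rounded figures sum to 35.2 million rather than the 35.3 million quoted earlier, a gap that is most likely due to rounding; either way, non-filers make up roughly 21% of the file.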

Between 1982 and 1992, information about children was derived directly from the tax file. Starting with the 1993 tax year, a combination of files was used to identify non-filing children: a file pertaining to Federal Child Benefits, the provincial births files and the T1 Family File (T1FF) of the previous year.

In 2017, approximately 75.3% of the Canadian population filed a tax return. A completed T1FF accounts for approximately 95.6% of the population; the difference consists of non-filers identified from other files or from filers' information.

Estimation

The production of estimates involves the following major processes:

Edit & Imputation: If a value is identified as being above the maximum allowed for its type, it is set to that maximum value. This approach was chosen because values were occasionally found to be expressed in dollars and cents when they should have been in dollars only. If a value is identified as an erroneous outlier, manual correction can take the same form or any other form that seems reasonable.

Geocoding: During geocoding, Statistics Canada's Postal Code Conversion File (PCCF) is used to convert postal codes to standard geographic areas (Census Divisions, Census Metropolitan Areas, Census Agglomerations and Census Tracts).

Family formation: Census families are formed through matching by social insurance number, family name and postal code, while accounting for age, sex and marital status. Parents are assumed to be at least 15 years older than their children. When a spouse is imputed, their sex is assigned as the opposite of the filer's, and their age is assigned probabilistically from husband-wife age distributions. Imputed children who are not found on the Federal Child Benefits file, the births files or the historical file have their sex and age imputed; a child's imputed age is assigned as a probabilistic function of the mother's age.
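Two of the family formation rules lend themselves to a short sketch: the minimum 15-year parent-child age gap and the probabilistic assignment of an imputed spouse's age. The age-gap distribution passed to `impute_spouse_age` is a hypothetical input, since the actual husband-wife age distributions are not published here; the function names are also illustrative.

```python
import random
from typing import Dict

MIN_PARENT_CHILD_GAP = 15  # years, per the rule above


def plausible_parent(parent_age: int, child_age: int) -> bool:
    """True if the parent is at least 15 years older than the child."""
    return parent_age - child_age >= MIN_PARENT_CHILD_GAP


def opposite_sex(sex: str) -> str:
    """Imputed spouse's sex: the opposite of the filer's ("M" or "F")."""
    return {"M": "F", "F": "M"}[sex]


def impute_spouse_age(filer_age: int, gap_weights: Dict[int, float],
                      rng: random.Random) -> int:
    """Draw an age gap (spouse age minus filer age) from a weighted distribution.

    gap_weights is a hypothetical {age_gap: weight} table standing in for the
    husband-wife age distributions used in production.
    """
    gaps = list(gap_weights)
    weights = list(gap_weights.values())
    return filer_age + rng.choices(gaps, weights=weights, k=1)[0]
```

Passing an explicit `random.Random` instance keeps draws reproducible for a given seed.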

Income & tax estimation of missing values: Some non-taxable sources of income are missing from the tax return; these are calculated from information contained within the return. Federal Child Benefits are obtained directly from a CRA file. The provincial refundable credits and provincial benefits of the National Child Benefit program are calculated using the current year's information as a proxy. The GST/HST credit is calculated for those who applied. Quebec taxes are calculated based on the information contained within the federal return.

Aggregation: The data are aggregated to approximate the standard geographic areas of Statistics Canada. Census metropolitan areas (CMAs) and census agglomerations (CAs) are areas consisting of one or more neighbouring municipalities situated around a major urban core. A CMA must have a total population of at least 100,000 of which 50,000 or more live in the urban core. A CA must have an urban core population of at least 10,000.
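The CMA and CA population thresholds can be expressed as a simple classifier. This sketch covers only the thresholds stated above; the full rules for deciding which neighbouring municipalities belong to an area are not modelled, and the function name is hypothetical.

```python
def classify_urban_area(total_population: int, core_population: int) -> str:
    """Classify an area by the CMA/CA population thresholds only."""
    if total_population >= 100_000 and core_population >= 50_000:
        return "CMA"
    if core_population >= 10_000:
        return "CA"
    return "neither"
```

An area with a large total population but an urban core under 50,000 falls through to the CA test.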

Other levels of postal and census geography are also available.

When performing calculations, Canada Revenue Agency (CRA) tax rules are used.

Since these data are based on the entire T1 file rather than a sample, they are left unweighted and unadjusted.

Quality evaluation

The estimates are evaluated in several ways:

1. The geography is evaluated by comparing the number of tax filers and dependants with population estimates from Statistics Canada for the same areas.
2. The demographic information is evaluated in much the same way - by comparisons with estimates from Statistics Canada for the same areas.
3. The income information is evaluated by trend analysis and by comparisons with data from the Canadian Income Survey (CIS) whenever possible.
4. When Census of Population data are available, many comparisons are made of population, income and demographics.
5. In addition, comparisons are made for income of individuals with the annual data produced by CRA called Income Statistics.

The 2016 T1FF has 34,803,330 persons identified as being in Canada, representing 95.2% of Statistics Canada's official population estimates. Coverage is greater than 91.9% of the population estimates across all provinces and territories. Provincial coverage can be affected by provincial legislation regarding provincial income tax liability and/or eligibility for provincial tax credits.
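Coverage here is the ratio of persons on the T1FF to the official population estimate for the same area. The sketch below computes it and lists regions below a chosen floor; the function names and the region counts in the usage test are hypothetical, and no official population estimates are reproduced.

```python
from typing import Dict, List, Tuple


def coverage(t1ff_persons: int, population_estimate: int) -> float:
    """Share of the official population estimate found on the T1FF."""
    return t1ff_persons / population_estimate


def regions_below(counts: Dict[str, Tuple[int, int]], floor: float) -> List[str]:
    """Names of regions whose coverage is below `floor`.

    `counts` maps a region name to (T1FF persons, population estimate).
    """
    return sorted(
        name for name, (t1ff, pop) in counts.items() if coverage(t1ff, pop) < floor
    )
```

For example, `regions_below(counts, 0.919)` would return an empty list for the 2016 provincial and territorial figures, since coverage exceeds 91.9% everywhere.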

The 2016 Census of Population collected demographic information for May 2016. This timing corresponds closely with the 2015 tax file, since 2015 tax returns were filed mainly at the end of April 2016. The 2015 T1FF had 34,465,690 persons, representing 98% of the Census population; coverage is 96.9% or more across all provinces and territories.

Improvements have been made to the process of identifying children. The Universal Child Care Benefit program (2006 to June 2016), followed by the Canada Child Benefit in July 2016, has allowed more children under the age of six to be identified. These changes have improved the coverage of children in the T1FF data compared with the official Statistics Canada population estimates. Their impact is most notable in the counts and median total income of lone-parent families, although it is not possible to separate the precise impact of the improvements from normal year-to-year change.

Disclosure control

Statistics Canada is prohibited by law from releasing any information it collects which could identify any person, business, or organization, unless consent has been given by the respondent or as permitted by the Statistics Act. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential.

Only a small group of people within the Income Statistics Division of Statistics Canada have access to confidential data. Users specify their requirements to these staff, who then carry out the retrievals. Before release, data are subjected to stringent non-disclosure practices:

1. There must be a minimum of 100 tax filers in any geographic area before any data will be produced.
2. Any cell must represent a minimum of 15 observations otherwise it is suppressed.
3. Each cell that could be dominated by one tax filer (or one family) is checked for dominance and suppressed if a problem is identified.
4. For dollar amounts, once the primary suppressions are made, complementary suppressions are made so that suppressed information cannot be discovered residually. This is an iterative process - each complementary suppression may require an additional complementary suppression. Patterns are created to keep these to a minimum.
5. Finally, the counts and amounts are rounded -- counts to the nearest ten, aggregate amounts to the nearest $5,000 and distribution measures such as percentiles to the nearest $10.
6. Averages and percentages are based on rounded counts and amounts to prevent the unravelling of non-disclosure procedures.
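Rules 1, 2, 5 and 6 can be sketched as follows. This is a minimal illustration under stated assumptions: rounding is conventional round-half-up (the text does not say how ties are handled), and dominance checks (rule 3) and complementary suppression (rule 4) are not modelled. The function names are hypothetical.

```python
import math
from typing import Dict, Optional

MIN_FILERS_PER_AREA = 100  # rule 1
MIN_CELL_COUNT = 15        # rule 2


def round_to(x: float, base: int) -> int:
    """Round to the nearest multiple of base; halves round up (an assumption)."""
    return int(math.floor(x / base + 0.5)) * base


def area_releasable(n_filers: int) -> bool:
    """Rule 1: an area needs at least 100 tax filers before any data are produced."""
    return n_filers >= MIN_FILERS_PER_AREA


def publish_cell(count: int, amount: float,
                 median: Optional[float] = None) -> Optional[Dict[str, float]]:
    """Apply rules 2, 5 and 6 to one cell; return None if it is suppressed."""
    if count < MIN_CELL_COUNT:
        return None  # rule 2: fewer than 15 observations
    out: Dict[str, float] = {
        "count": round_to(count, 10),       # counts to the nearest ten
        "amount": round_to(amount, 5_000),  # aggregate amounts to the nearest $5,000
    }
    # Rule 6: the average is derived from the rounded values, not the raw ones.
    out["average"] = out["amount"] / out["count"]
    if median is not None:
        out["median"] = round_to(median, 10)  # percentiles to the nearest $10
    return out
```

Computing the average from rounded values means a user cannot combine a published average with a published count to recover the unrounded aggregate.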

Additional disclosure control rules may be applied if deemed necessary.

Revisions and seasonal adjustment

Once the data are finalized, they are not revised. For analyses, data are sometimes adjusted to constant dollars for comparison with data from other years, but only current dollars are kept on the file.
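Converting to constant dollars is typically done by deflating with a price index. The text does not name the index used, so the Consumer Price Index (CPI) here is an assumption, and the index values in the usage check are hypothetical.

```python
def to_constant_dollars(amount: float, cpi_year: float, cpi_base: float) -> float:
    """Express a current-dollar amount in base-year dollars.

    cpi_year is the index for the year the amount was earned; cpi_base is the
    index for the chosen base year (both assumed CPI values).
    """
    return amount * cpi_base / cpi_year
```

Because only current dollars are kept on the file, an adjustment like this would be applied at analysis time rather than stored.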

Data accuracy

The data are unadjusted apart from editing and estimation of missing components to achieve a definition of income that is compatible with Statistics Canada's definition of income. There are no coefficients of variation from sampling, as the file is nearly a census (95.6% of the total population identified) and the data are neither weighted nor adjusted to compensate for the 4.4% of people who appear to be missing.
