Administrative Personal Income Masterfile (APIM)

Detailed information for 2021

Status:

Active

Frequency:

Annual

Record number:

5384

The Administrative Personal Income Masterfile (APIM) is a comprehensive, centrally processed source of personal income data generated from tax returns and associated federal slips.

Data release - Scheduled for December 9, 2022

Description

The APIM is a centrally processed annual database containing income information for over 30 million Canadians, covering the tax-filing population and those who have not filed by the time of data production.

This data source can be integrated with other data or used as a standalone data source. Program results aim to provide stakeholders at all levels of government, the research community, and domestic and international organizations with comprehensive and timely income-related data to aid in better understanding Canadians' well-being and inform policy decisions.

Each year, the APIM database is produced once Statistics Canada receives the Canada Revenue Agency (CRA) files. These files are processed at Statistics Canada to align them with the "Total income of person" standard used to measure personal income (https://www23.statcan.gc.ca/imdb/p3Var.pl?Function=DEC&Id=100736).

Reference period: January 1 to December 31

Subjects

  • Household, family and personal income
  • Income, pensions, spending and wealth

Data sources and methodology

Target population

The target population of the APIM includes all individuals with income who can be identified within the administrative files sourced through the CRA for the reference year, which are listed below.

This approach provides maximum coverage because it imposes no restrictions based on age, permanent resident status or place of residence. However, if an individual has not filed taxes and is not on any other income-related files mentioned below, they will be considered out of scope for that reference year.

Instrument design

This methodology type does not apply to this statistical program.

Sampling

This methodology type does not apply to this statistical program.

Data sources

Data are extracted from administrative files.

The CRA supplies the data used to compile the APIM. The intended original purpose of these data sources is to fulfill the CRA's mission to "administer tax, benefits, and related programs, and ensure compliance on behalf of governments across Canada, thereby contributing to the ongoing economic and social well-being of Canadians." (https://www.canada.ca/en/revenue-agency/corporate/about-canada-revenue-agency-cra/mission-vision-promise-values.html).

The following are the two types of data sources used to construct the APIM database:

1- personal tax filings received annually by the CRA. Filing deadlines for these are at the end of April following the tax year for most filers and mid-June for the self-employed.

2- tax slips providing information on income and related matters that are issued for the reference or tax year by employers, the government or other institutions and submitted to the CRA.

Statistics Canada receives different versions of these data from the CRA as tax files are reassessed or submitted late. The APIM database is based on data compiled 10 months after the reference period. Alternate versions of the APIM have been produced using earlier or later compilation dates, which can result in less or more complete files. Information about these alternate versions of the APIM is available to users upon special request.

The APIM compiled using the 10-month data files is assessed as being of very good quality when compared with versions compiled later. Most of the late filers have tax slips available from which a comprehensive profile of income can be constructed.

The following CRA files are used to construct income and related data:

T1 - Income Tax and Benefit Return
GHSTC file - Goods and Services Tax Credit file
CCB file - Canada Child Benefit file
T4 - Statement of Remuneration Paid
T4E - Statement of Employment Insurance and Other Benefits
T3 - Statement of Trust Income Allocations and Designations
T4A - Statement of Pension, Retirement, Annuity, and Other Income
NR4 - Statement of Amounts Paid or Credited to Non-Residents of Canada
T4A-NR - Statement of Fees, Commissions, or Other Amounts Paid to Non-Residents for Services Rendered in Canada
T4A(P) - Statement of Canada Pension Plan Benefits
T4A(OAS) - Statement of Old Age Security Benefits
T4RIF - Statement of Income from a Registered Retirement Income Fund
T4RSP - Statement of RRSP Income
T5 - Statement of Investment Income
T5007 - Statement of Benefits
T5008 - Statement of Securities Transactions
T5018 - Statement of Contract Payments
T1204 - Government Service Contract Payments
T5013 - Statement of Partnership Income

Each year, amendments to tax legislation and government transfers to individuals are reviewed and updated to ensure that the APIM production system is up to date and that correct assessments are made when measuring personal income. A series of validation activities follows these annual updates to ensure proper quality control.

Data from these sources are combined for individuals through direct linkages based on their Social Insurance Number. These numbers are removed from the APIM at the earliest possible processing stage.
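
The linkage and de-identification step described above can be pictured with a minimal sketch. It assumes hypothetical record layouts (dictionaries carrying a "sin" key), a hypothetical function name (link_and_anonymize), made-up values and a randomly generated replacement identifier. It is an illustration only, not the APIM production system.

```python
import uuid
from collections import defaultdict


def link_and_anonymize(*sources):
    """Group records from several files by SIN, then drop the SIN.

    Each source is a list of dicts with a hypothetical "sin" key.
    Returns a dict keyed by a random anonymized identifier.
    """
    by_sin = defaultdict(list)
    for source in sources:
        for record in source:
            by_sin[record["sin"]].append(record)

    linked = {}
    for records in by_sin.values():
        anon_id = uuid.uuid4().hex  # random, not derived from the SIN
        # Keep every field except the direct identifier.
        linked[anon_id] = [
            {key: value for key, value in rec.items() if key != "sin"}
            for rec in records
        ]
    return linked


# Made-up records for two individuals across two files.
t1_file = [{"sin": "000000001", "form": "T1", "total_income": 52000}]
t4_file = [
    {"sin": "000000001", "form": "T4", "employment_income": 50000},
    {"sin": "000000002", "form": "T4", "employment_income": 18000},
]
linked = link_and_anonymize(t1_file, t4_file)
```

In this sketch the first individual ends up with two linked records (T1 and T4) and the second with one, and no output record retains the "sin" field.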

Since the data sources sometimes provide different information for individuals, the most reliable sources are prioritized for compiling income. The T1 Income Tax and Benefit Return file is the default data source, which is supplemented with information from other sources. For individuals who did not file a T1 tax return, income-related tax slips can be used to create a comprehensive portrait of income.
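
A source-precedence rule of this kind can be sketched as follows. The precedence order beyond "T1 first", the function name (select_value) and the amounts are assumptions made for illustration; the actual APIM rules are not published in this document.

```python
# Hypothetical precedence order: the T1 return first, then slips.
PRECEDENCE = ["T1", "T4", "T4A"]


def select_value(amounts_by_source, precedence=PRECEDENCE):
    """Return (source, amount) from the highest-priority source reporting one."""
    for source in precedence:
        amount = amounts_by_source.get(source)
        if amount is not None:
            return source, amount
    return None, None  # no source reports this variable


# Made-up amounts for the same variable reported by different sources.
filer = {"T1": 48500.0, "T4": 48000.0}
non_filer = {"T4": 21000.0}

filer_choice = select_value(filer)          # T1 takes precedence
non_filer_choice = select_value(non_filer)  # falls back to the slip
```

For the filer, the T1 amount is retained even though a slip amount exists; for the non-filer, the slip amount is used because no T1 is available.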

Error detection

Minimal data editing is performed on records in the APIM database. During processing, however, automatic and manual checks are performed. Some variables are compared when amounts are available from multiple sources, and precedence rules are implemented to select the best source.

Imputation

Imputation is used to produce complete income records for individuals whose tax information is incomplete at the time of APIM processing. For example, some tax filers from previous years may be late in filing, some individuals do not file T1 tax returns but receive tax slips, and some income sources are not included on the forms but can be estimated. In APIM processing, two types of imputation are used: donor and deterministic.

For individuals who do not have a tax return at the time of processing, a donor is randomly selected from among individuals with similar characteristics to those of the recipient, and this donor's values are used to impute the recipient's missing values. A set of matching variables present on tax slips, such as province, age, and income variables, is used to identify individuals with similar characteristics. Not all values are imputed directly from the donor to the recipient. When valuable auxiliary information is available from non-filer records, such as tax slips or the previous year's T1 tax return, this information is combined with some of the donor's values to obtain more accurate imputed values. Administrative tax slips thus make it possible to build a comprehensive income profile for an individual who did not file a T1 tax return. Between 2017 and 2020, an average of 10% to 12% of records were for non-filers and required certain variables to be imputed in this manner.
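
Random donor imputation within classes of similar records can be illustrated with the sketch below. It is a simplification under stated assumptions: the matching variables are reduced to province and a hypothetical 10-year age band, the variable name (rrsp_deduction) and all values are made up, and the donor's value is copied as is rather than combined with auxiliary information as the APIM does.

```python
import random
from collections import defaultdict


def age_group(age):
    """Hypothetical 10-year age bands used as a matching variable."""
    return age // 10


def impute_from_donor(recipients, donors, variable, rng):
    """Copy `variable` from a randomly chosen donor in the same class."""
    pools = defaultdict(list)
    for donor in donors:
        pools[(donor["province"], age_group(donor["age"]))].append(donor)

    imputed = []
    for rec in recipients:
        pool = pools.get((rec["province"], age_group(rec["age"])))
        new_rec = dict(rec)
        if pool:  # leave the value missing when no similar donor exists
            new_rec[variable] = rng.choice(pool)[variable]
            new_rec["imputed"] = True
        imputed.append(new_rec)
    return imputed


# Made-up filers (donors) and non-filers (recipients).
donors = [
    {"province": "ON", "age": 34, "rrsp_deduction": 1200.0},
    {"province": "ON", "age": 38, "rrsp_deduction": 900.0},
    {"province": "QC", "age": 61, "rrsp_deduction": 300.0},
]
non_filers = [
    {"province": "ON", "age": 31},
    {"province": "BC", "age": 45},
]
result = impute_from_donor(non_filers, donors, "rrsp_deduction", random.Random(0))
```

The first recipient receives the value of one of the two Ontario donors in the same age band. The second has no donor in its class, so the sketch leaves the value missing; a production system would need a fallback such as collapsing classes.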

Where administrative data are incomplete, deterministic methods are used to derive various income amounts whose values are subject to highly predictable patterns. For example, provincial child benefit amounts are not available to Statistics Canada, but the high correlation of eligibility with federal benefits makes it possible to estimate them.
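
A deterministic derivation of this kind can be sketched as below. The province codes (P1, P2), the ratios and the function name are placeholders invented for the example and are not real program parameters; actual benefit formulas depend on provincial legislation, family composition and income.

```python
# Placeholder ratios of provincial to federal benefit, NOT real parameters.
HYPOTHETICAL_RATIO = {"P1": 0.10, "P2": 0.25}


def estimate_provincial_child_benefit(province, federal_benefit):
    """Derive a provincial amount from the observed federal amount."""
    if federal_benefit <= 0:
        # Not eligible federally, so assumed not eligible provincially.
        return 0.0
    ratio = HYPOTHETICAL_RATIO.get(province, 0.0)  # no program -> 0
    return round(federal_benefit * ratio, 2)


estimate_p2 = estimate_provincial_child_benefit("P2", 4000.0)
estimate_none = estimate_provincial_child_benefit("P1", 0.0)
```

The same inputs always produce the same estimate, which is what distinguishes this approach from the random donor method.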

Estimation

The APIM contains a wide selection of income and income tax variables, including credits that have been estimated to accurately reflect the full income an individual would have received during the year. Such estimates relate primarily to the social benefits an individual may have received, including refundable provincial tax credits, social assistance, provincial child benefits, supplements for seniors and other amounts not tracked by the CRA.

All provinces except Quebec have agreements with the CRA for the administration of their tax laws, and Statistics Canada has an agreement in place to obtain these tax data from the CRA. In contrast, Quebec's tax data are entirely estimated because Quebec administers its tax laws through Revenu Québec, and Statistics Canada does not have an agreement to obtain its data.

Quality evaluation

To ensure consistent delivery of high-quality estimates, subject-matter analysts undertake validation activities at key points in the production cycle. These activities follow the Guidelines for the Validation of Statistical Outputs and include outlier analysis, analysis of change over time to review year-to-year shifts in income, cross tabulations by various demographic identifiers, and coherence and consistency analysis based on events in the news and known changes observed in tax law and government income assistance programs.
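
As an illustration of the analysis of change over time, the sketch below flags groups whose value moved by more than a chosen relative threshold between two years. The 10% threshold, the group labels and the amounts are assumptions made for the example, not the criteria used by subject-matter analysts.

```python
def flag_large_changes(previous, current, threshold=0.10):
    """Return {group: relative change} where |change| exceeds the threshold."""
    flagged = {}
    for group, prev_value in previous.items():
        curr_value = current.get(group)
        if curr_value is None or prev_value == 0:
            continue  # cannot compute a relative change
        change = (curr_value - prev_value) / prev_value
        if abs(change) > threshold:
            flagged[group] = round(change, 4)
    return flagged


# Made-up median incomes by group for two consecutive years.
medians_prev = {"A": 40000.0, "B": 52000.0}
medians_curr = {"A": 41000.0, "B": 60000.0}
flagged = flag_large_changes(medians_prev, medians_curr)
```

Here only group B is flagged, since its median rises by about 15%. A flagged shift is a prompt for review against known events such as changes in tax law or benefit programs, not evidence of an error in itself.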

In addition, budgets are reviewed, and inquiries are made to officials at various levels of government to obtain benchmark assessments, including cost estimates for the program and the approximate number of beneficiaries, to ensure that our estimates are consistent with these benchmarks.

Disclosure control

Statistics Canada is prohibited by law from releasing any information it collects that could identify any person, business, or organization, unless consent has been given by the respondent or as permitted by the Statistics Act. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.
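
Suppression of small cells, one common form of disclosure control, can be sketched as follows. The minimum cell count of 10, the table layout and the function name are assumptions made for the example and do not represent Statistics Canada's confidentiality rules. The sketch also handles only direct suppression; protecting against residual disclosure (recovering a suppressed cell from totals) would require additional complementary suppression.

```python
MIN_CELL_COUNT = 10  # hypothetical threshold, not an official rule


def suppress_small_cells(table, min_count=MIN_CELL_COUNT):
    """Replace cells with too few contributors by None (suppressed)."""
    released = {}
    for cell, (count, total_income) in table.items():
        if count < min_count:
            released[cell] = None
        else:
            released[cell] = (count, total_income)
    return released


# Made-up table: cell label -> (number of individuals, total income).
table = {
    "area_1": (250, 12500000.0),
    "area_2": (4, 310000.0),
}
released = suppress_small_cells(table)
```

In this example the second cell is withheld because only four individuals contribute to it.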

Only a small group of people within the Centre for Income and Socioeconomic Well-being Statistics of Statistics Canada have access to confidential data. Users must specify their requirements and seek appropriate approvals before data retrieval is permitted. Furthermore, personal identifiers are stripped from the data at the beginning of processing and replaced by anonymized identifiers.

Revisions and seasonal adjustment

This methodology type does not apply to this statistical program.

Data accuracy

The methodology of this program was designed to control errors and reduce their potential effects. However, the results of the program remain subject to non-sampling error (e.g., coverage, response, and processing error).

Non-sampling error

The APIM's coverage of personal income is very extensive, leaving only a limited proportion of the population's income uncovered. Because certain types of transactions cannot be sufficiently documented or traced, the APIM program is unable to cover income from unreported cash transactions (e.g., under-the-table work), foreign income for which the CRA does not have information, and illegal or gambling activities.

Response-related errors are also possible in the administrative tax data from which the APIM is built. Errors related to the collection and processing of information by the CRA, including misreporting by filers, may be present in the data.

Another recognized source of error is the deterministic calculation of certain social benefits an individual may have received, such as refundable provincial tax credits, provincial supplements for seniors and other refunds that are not tracked by the CRA. These amounts may contain a degree of error because some of the data underlying an estimate may be incomplete or missing. This may result in an overestimation or an underestimation of an individual's total income.
