Canadian Employer-Employee Dynamics Database (CEEDD)

Detailed information for January 1, 2001 to December 31, 2020

Status:

Active

Frequency:

Annual

Record number:

5228

To provide matched data between employees and employers in the Canadian labour market from administrative data sources.

Data release - The data are not available to the public because the individual observations are confidential under the Statistics Act.

Description

The CEEDD is a set of linkable files that provide matched data between employees and employers in the Canadian labour market. Selected variables from the component files are made available for analytical purposes. Analysis can be done cross-sectionally by calendar year or longitudinally, from 2001 to the most recent year that can be drawn from the source component files (generally the current year minus two).

Collection period: The CEEDD begins in 2001.

Data sources and methodology

Target population

The CEEDD data is based on linkable files from administrative sources. The population of the database therefore covers all individuals and firms in the administrative records.

Instrument design

This methodology does not apply.

Sampling

The CEEDD is a set of longitudinal, linkable files that match employees and employers, built from the component files.

Data sources

Data are extracted from administrative files and derived from other Statistics Canada surveys and/or other sources.

The CEEDD data is constructed with the following components from 2001 onwards:
- T1 Personal Master File (T1 PMF)
- T1 Historical Personal Master File (T1 H)
- T1 Family File (T1FF)
- T1 Financial Declaration File (T1FD, 2005 onward)
- T1 Business Declaration File (T1BD, 2005 onward)
- T2 Corporation Income Tax Return
- T4 Statement of Remuneration Paid Files (T4)
- Record of Employment (ROE)
- Trade by Exporter Characteristics (TEC, 2010 onward)
- Trade by Importer Characteristics (TIC, 2010 onward)
- National Accounts Longitudinal Microdata File (NALMF)
- Longitudinal Immigration Database (IMDB)
- Temporary Residents File (TR)

Error detection

Individual-level data on tax filers are drawn from the T1 PMF and T1 H files. Selected variables from the yearly T1 tax form are extracted and processed for the CEEDD analytical output files. Three key processes are performed: a) harmonizing variable names and codes; b) identifying individuals who hold multiple Social Insurance Numbers (SINs) and linking their tax data longitudinally across those SINs; and c) producing consistent demographics.

Effort has been made to track the same enterprises over time in order to distinguish 'real' entries and exits from 'false' ones. Real entries and exits reflect actual demographic events (the creation of new enterprises and the failure of existing ones); false entries and exits may simply reflect organizational restructuring within an enterprise or a change in its reporting practices. Methods such as labour tracking or the tracking of predecessors and successors for each pair of years have been used to facilitate this process.
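The idea behind labour tracking can be illustrated with a minimal sketch. It is not the CEEDD procedure, whose details are not given here; it assumes that an enterprise identifier that disappears between two years is treated as a continuation when a large share of its employees reappear under a newly created identifier. The function name, the data layout and the 50% threshold are all hypothetical.

```python
def flag_false_exits(workers_t, workers_t1, min_share=0.5):
    """Link exiting enterprise IDs to likely successor IDs.

    workers_t, workers_t1: {enterprise_id: set of employee ids} for two years.
    min_share: hypothetical threshold, not a documented CEEDD parameter.
    Returns {exiting_id: successor_id} for apparent continuations.
    """
    exits = set(workers_t) - set(workers_t1)      # IDs that disappear
    entries = set(workers_t1) - set(workers_t)    # IDs that newly appear
    links = {}
    for old_id in sorted(exits):
        old_staff = workers_t[old_id]
        if not old_staff:
            continue
        best_id, best_share = None, 0.0
        for new_id in sorted(entries):
            # Share of the exiting enterprise's staff found at the new ID.
            share = len(old_staff & workers_t1[new_id]) / len(old_staff)
            if share > best_share:
                best_id, best_share = new_id, share
        if best_id is not None and best_share >= min_share:
            links[old_id] = best_id
    return links


# Hypothetical data: "A" vanishes, but most of its staff reappear under "D".
year_1 = {"A": {1, 2, 3, 4}, "B": {5, 6}, "C": {7, 8}}
year_2 = {"B": {5, 6}, "D": {1, 2, 3, 9}, "E": {10}}
links = flag_false_exits(year_1, year_2)
```

In this example the exit of "A" and the entry of "D" would be treated as a false exit and entry, while the exit of "C" has no plausible successor and would count as real.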

Imputation

Processing the selected variables from the different component files in the CEEDD linkage environment may require imputation to fill in information that is missing or in error in the source files. Any imputation done during processing is described in the documentation of the relevant component file.

Estimation

Individual-level data is reported on a calendar-year basis, while most firm-level data is reported on a fiscal-year basis. For consistency, a procedure is applied to convert all firm-level records to a calendar-year basis. Each firm-level record is first prorated according to the number of days of each calendar year covered by its reported fiscal period. The prorated records from the multiple fiscal periods are then aggregated to the corresponding calendar year.
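The proration can be sketched as follows. This is an illustration rather than the production procedure: it assumes that a reported value accrues evenly over the days of the fiscal period, and the function name and record layout are hypothetical.

```python
from collections import defaultdict
from datetime import date


def calendarize(records):
    """Prorate fiscal-period values to calendar years by day counts.

    records: iterable of (start, end, value) with inclusive date bounds.
    Assumes the value accrues evenly over the fiscal period.
    Returns {calendar_year: prorated_total}.
    """
    totals = defaultdict(float)
    for start, end, value in records:
        period_days = (end - start).days + 1
        for year in range(start.year, end.year + 1):
            # Part of the fiscal period that falls inside this calendar year.
            lo = max(start, date(year, 1, 1))
            hi = min(end, date(year, 12, 31))
            overlap_days = (hi - lo).days + 1
            totals[year] += value * overlap_days / period_days
    return dict(totals)


# Hypothetical firm whose fiscal year ends on March 31.
example = [
    (date(2018, 4, 1), date(2019, 3, 31), 365.0),
    (date(2019, 4, 1), date(2020, 3, 31), 732.0),
]
calendar_totals = calendarize(example)
```

Here calendar year 2019 receives 90 of the 365 days of the first fiscal period and 275 of the 366 days of the second, so its total combines a share of both records. The sum over calendar years equals the sum of the original fiscal-period values.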

Quality evaluation

Specific data issues related to variables in the different CEEDD component files are described in more detail in their own documentation.

Disclosure control

Statistics Canada is prohibited by law from releasing any information it collects that could identify any person, business, or organization, unless consent has been given by the respondent or as permitted by the Statistics Act. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.

To prevent any data disclosure, confidentiality analysis is done using the Statistics Canada Generalized Disclosure Control System (G-Confid). G-Confid is used for primary suppression (direct disclosure) as well as for secondary suppression (residual disclosure). Direct disclosure occurs when the value in a tabulation cell is composed of or dominated by a few enterprises, while residual disclosure occurs when confidential information can be derived indirectly by piecing together information from different sources or data series.
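A generic check for direct disclosure can be sketched as below. It does not reproduce G-Confid, its interface or its parameters, none of which are described here; it only illustrates the two conditions named above (a cell composed of few enterprises, or dominated by few) using a minimum-count rule and a top-contributor share rule. The function name and all thresholds are hypothetical.

```python
def needs_primary_suppression(contributions, min_units=3, top_n=1, max_share=0.8):
    """Flag a tabulation cell that risks direct disclosure.

    contributions: per-enterprise values making up the cell.
    min_units, top_n, max_share: hypothetical thresholds for illustration.
    """
    values = sorted((abs(v) for v in contributions), reverse=True)
    if len(values) < min_units:
        return True  # cell is composed of too few enterprises
    total = sum(values)
    if total == 0:
        return False  # nothing to dominate in an all-zero cell
    # Cell is dominated if the largest top_n contributors exceed max_share.
    return sum(values[:top_n]) / total > max_share


dominated = needs_primary_suppression([100, 5, 3, 2])
balanced = needs_primary_suppression([30, 25, 25, 20])
too_few = needs_primary_suppression([50, 50])
```

Secondary suppression is a separate step that is not covered by this sketch: it protects suppressed cells from being recovered through row or column totals.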

Revisions and seasonal adjustment

The CEEDD is updated annually with the addition of another year of data. Variables from each year are maintained in the yearly files, but harmonized information that relies on tracking over time may change as the most recent year adds new information and as information on existing variables is updated.

Data accuracy

In the CEEDD linkage environment, more information can now be obtained from more than one source file (for example, total T4 earnings are available both from T4 slips and from T1 files). Users should be aware of this and, when the sources are inconsistent, make the analytical decision appropriate to their project.
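One way a user might surface such inconsistencies before deciding how to treat them is sketched below. The record layouts, the function name and the one-dollar tolerance are hypothetical and do not correspond to actual CEEDD variable names; which source to prefer remains a project-specific decision.

```python
def flag_earnings_gaps(t4_slips, t1_earnings, tolerance=1.0):
    """Compare summed T4 slip earnings with T1-reported T4 earnings.

    t4_slips: iterable of (person_id, amount), one entry per slip.
    t1_earnings: {person_id: amount reported on the T1}.
    Returns {person_id: (t4_total, t1_amount)} where the sources disagree
    by more than `tolerance` or where one source is missing (None).
    """
    t4_totals = {}
    for person_id, amount in t4_slips:
        t4_totals[person_id] = t4_totals.get(person_id, 0.0) + amount
    gaps = {}
    for person_id in sorted(set(t4_totals) | set(t1_earnings)):
        t4 = t4_totals.get(person_id)
        t1 = t1_earnings.get(person_id)
        if t4 is None or t1 is None or abs(t4 - t1) > tolerance:
            gaps[person_id] = (t4, t1)
    return gaps


# Hypothetical records: p1 agrees, p2 differs, p3 has no T1 amount.
slips = [("p1", 30000.0), ("p1", 12000.0), ("p2", 50000.0), ("p3", 8000.0)]
t1 = {"p1": 42000.0, "p2": 48500.0}
gaps = flag_earnings_gaps(slips, t1)
```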

Specific data issues related to variables in the different component files are described in more detail in their own documentation.
