The Open Database of Educational Facilities (ODEF)

Detailed information for August 2019

Status: Active

Frequency: Occasional

Record number: 5301

For the purpose of exploring open data for official statistics and to support geospatial research across various domains, the Data Exploration and Integration Lab (DEIL) undertook a project to create an accessible and harmonized database of educational facilities released as open data by various levels of government within Canada.

Data release - August 19, 2019

Description

The Open Database of Educational Facilities (ODEF) is a compilation of data from open and internet sources on the locations and types of educational facilities across Canada, originating from municipal, regional, and provincial governments. It is a centralized and harmonized repository of educational facility data made available under the Open Government Licence - Canada. The database is expected to be updated periodically as new open datasets from government sources become available.

The benefit of open data is that any user can access and make use of it freely. Individuals, formal and informal organizations, or enterprises can use the data and other information to research and innovate on any number of topics.

This dataset is one of a number of datasets created as part of the Linkable Open Data Environment (LODE). The LODE is an exploratory initiative that aims to enhance the use and harmonization of open data from authoritative sources by providing a collection of datasets released under a single license, as well as open source code to link these datasets together. The LODE datasets and code are available through the Statistics Canada website at:
https://www.statcan.gc.ca/eng/lode

Reference period: The reference period for these data varies. For more information on the reference period of a specific dataset, please consult the open data portal for that data provider directly.

Data sources and methodology

Target population

An education facility is a physical site at which the primary activity is imparting instruction to a body of students or participants. All education facilities in Canada are in scope for this dataset. These include all levels of education and both private and public schools, with no exclusions for funding arrangement, operator type, subject area, denomination, student type, or location. Note that both on-reserve and off-reserve facilities are covered by this database.
As a result of this definition, the database covers facilities such as early childhood education, kindergarten, elementary, secondary, and post-secondary institutions, as well as specific vocational training centers (such as hairdressing schools). The database does not include virtual educational institutions or daycares.

Instrument design

This methodology type does not apply to this statistical program.

Sampling

Data are collected for all units of the target population, therefore no sampling is done.

Data sources

Data collection for this reference period: 2019-01-01 to 2019-07-31.

Individual open datasets were downloaded from their respective government open data portals. In addition to openly licensed databases, the ODEF also includes a set of publicly available listings of educational facilities for which permission to include was granted by the data providers.

The primary processing component for the database comprised reformatting the source data to CSV format and mapping the original dataset attributes to standard variable (column) names. Concatenated address data were parsed and separated into the respective location variables using libpostal, a natural language processing solution for address parsing. The original data files and fields were converted to standard formats and fields using the custom software OpenTabulate. Deduplication was done using literal and fuzzy string matching.
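
The column mapping and deduplication steps described above can be illustrated with a minimal sketch. It is not the OpenTabulate implementation: the column names in COLUMN_MAP, the rule that two records must share an address to be compared, and the 0.9 similarity threshold are all assumptions made for illustration, and the fuzzy comparison uses the Python standard library rather than whatever matcher was used for the ODEF.

```python
from difflib import SequenceMatcher

# Hypothetical mapping from one provider's column names to standard names.
COLUMN_MAP = {
    "SCHOOL_NAME": "facility_name",
    "ADDR": "address",
    "MUNICIPALITY": "city",
}


def standardize(record, column_map=COLUMN_MAP):
    """Rename source columns to standard names and drop unmapped columns."""
    return {column_map[k]: v.strip() for k, v in record.items() if k in column_map}


def normalize(text):
    """Lowercase and collapse whitespace before comparing strings."""
    return " ".join(text.lower().split())


def is_duplicate(a, b, threshold=0.9):
    """Literal match on name and address, else fuzzy match on the name.

    Assumption: records are only compared when their addresses agree.
    """
    if normalize(a["address"]) != normalize(b["address"]):
        return False
    name_a = normalize(a["facility_name"])
    name_b = normalize(b["facility_name"])
    if name_a == name_b:
        return True
    return SequenceMatcher(None, name_a, name_b).ratio() >= threshold


def deduplicate(records, threshold=0.9):
    """Keep the first record of each duplicate group (pairwise, for clarity)."""
    kept = []
    for rec in records:
        if not any(is_duplicate(rec, other, threshold) for other in kept):
            kept.append(rec)
    return kept
```

The pairwise comparison is quadratic in the number of records, which is acceptable for a sketch but would likely need blocking (for example, by postal code) on a national dataset.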

Error detection

During processing, entries with an incorrectly formatted postal code or 2-letter province/territory code were separated from the cleaned data, flagged as erroneous, and then manually corrected. A limited number of other entries were also manually corrected when it was clear that the address parsing had not been done correctly.
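
A format check of this kind can be sketched as follows. The field names postal_code and prov_terr are hypothetical, and the choice to skip empty values rather than flag them is an assumption; the sketch only separates suspect entries for manual review and does not attempt any correction.

```python
import re

# Canadian postal codes follow the pattern A1A 1A1. The letters D, F, I, O, Q
# and U are not used, and W and Z do not appear in the first position.
POSTAL_CODE_RE = re.compile(
    r"[ABCEGHJ-NPRSTVXY]\d[ABCEGHJ-NPRSTV-Z] ?\d[ABCEGHJ-NPRSTV-Z]\d"
)

# Two-letter codes for the ten provinces and three territories.
PROVINCE_CODES = frozenset(
    {"AB", "BC", "MB", "NB", "NL", "NS", "NT", "NU", "ON", "PE", "QC", "SK", "YT"}
)


def find_format_errors(record):
    """Return the names of fields whose format looks wrong.

    Assumption: empty or missing values are skipped, not flagged.
    The check is strict, so lowercase codes are flagged.
    """
    errors = []
    postal_code = record.get("postal_code")
    if postal_code and not POSTAL_CODE_RE.fullmatch(postal_code):
        errors.append("postal_code")
    province = record.get("prov_terr")
    if province and province not in PROVINCE_CODES:
        errors.append("prov_terr")
    return errors


def split_records(records):
    """Separate clean records from those flagged for manual correction."""
    clean, flagged = [], []
    for record in records:
        (flagged if find_format_errors(record) else clean).append(record)
    return clean, flagged
```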

Imputation

The original data sources use a variety of standards, classifications and nomenclatures to describe the education level or grade range. The International Standard Classification of Education (ISCED) is used to provide a standard definition of an education level, which allows for the imputation of ISCED levels of a facility from its corresponding grade range or education level label in the original data.

ISCED levels were directly obtained from the grade range indicated by the data provider, if there was one available. Otherwise, an education level was converted to a grade range, which was then mapped to ISCED levels.
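
The two-step imputation can be sketched as below. The grade bands, the encoding of kindergarten as grade 0, the level labels in LEVEL_TO_GRADES, and the ISCED label strings are all illustrative assumptions; the actual correspondence used for the ODEF may differ, particularly since grade groupings vary by province and territory.

```python
# Illustrative grade bands (assumption). Kindergarten is encoded as grade 0.
ISCED_BANDS = {
    "ISCED020": (0, 0),    # pre-primary education
    "ISCED1": (1, 6),      # primary education
    "ISCED2": (7, 9),      # lower secondary education
    "ISCED3": (10, 12),    # upper secondary education
}

# Hypothetical lookup from an education level label to a grade range.
LEVEL_TO_GRADES = {
    "elementary": (0, 6),
    "middle": (7, 9),
    "secondary": (10, 12),
}


def isced_from_grades(min_grade, max_grade):
    """Return every ISCED level whose grade band overlaps the given range."""
    return [
        level
        for level, (low, high) in ISCED_BANDS.items()
        if min_grade <= high and max_grade >= low
    ]


def isced_for_facility(grade_range=None, level_label=None):
    """Use the grade range if present, else convert the level label first."""
    if grade_range is None and level_label is not None:
        grade_range = LEVEL_TO_GRADES.get(level_label.strip().lower())
    if grade_range is None:
        return []  # nothing to impute from
    return isced_from_grades(*grade_range)
```

Under these assumed bands, a facility reported as kindergarten to grade 8 would be assigned pre-primary, primary, and lower secondary levels, while a facility labelled only as "Secondary" would be assigned the upper secondary level.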

Census subdivision (CSD) names were derived from two different attributes in the data. The first attribute is the geographic coordinates, namely latitude and longitude. The second attribute is the city name: each education facility's municipality name is compared, using literal string matching, against a list of CSD names.
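
The name-based derivation can be sketched as a simple lookup. The coordinate-based derivation is not shown, since it requires CSD boundary geometry. The case and whitespace normalization applied before matching is an assumption, and the short list of CSD names in the usage note is only a sample.

```python
def normalize(name):
    """Casefold and collapse whitespace (assumed pre-processing)."""
    return " ".join(name.casefold().split())


def build_csd_index(csd_names):
    """Map each normalized CSD name to its official spelling."""
    return {normalize(name): name for name in csd_names}


def match_csd(city, index):
    """Return the CSD name that literally matches the city name, else None."""
    if not city:
        return None
    return index.get(normalize(city))
```

For example, with an index built from the sample list ["Ottawa", "Toronto", "Winnipeg"], the city value " ottawa " resolves to "Ottawa", while a misspelling such as "Otawa" yields no match because the comparison is literal rather than fuzzy.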

Estimation

This methodology type does not apply to this statistical program.

Quality evaluation

Validation of the underlying datasets was not undertaken, since the data provided were taken "as-is".

Disclosure control

This methodology does not apply.

Revisions and seasonal adjustment

This methodology type does not apply to this statistical program.

Data accuracy

This methodology does not apply.
