
Going Global with Big Data

Across the world, skills are one of the biggest data gaps in labour market information (LMI). In addition to being difficult to identify and measure, skills are not clearly defined. As we noted in a recent LMI Insight report, these skills data gaps can only be meaningfully filled if skills are linked to existing labour market data such as occupations. This is a big challenge for Canada, but as I learned last week, countries around the world face the same problem. One important avenue that LMIC is exploring, in collaboration with Statistics Canada and Employment and Social Development Canada (ESDC), is how to take advantage of big data sources that have recently become available.

Last week, I had the privilege of participating in a workshop on skills and big data hosted by the International Labour Organization (ILO). The workshop centred on the question, “Can we use big data for skills anticipation and matching?” The consensus was a resounding, “Yes! But there are many hurdles that must first be overcome.” On this front, participants identified four key challenges:

  1. Skills linked to traditional data

Reliable, accurate linkages between skills and other LMI indicators, primarily occupations, are crucial. Such linkages allow researchers and other LMI users to leverage existing historical data. For this reason, we are working with Statistics Canada and ESDC to assess the various approaches to linking skills to the National Occupational Classification (NOC). In doing so, our pilot project will focus on the 47 skills in ESDC’s new Skills and Competencies Taxonomy. As Figure 1 shows, however, many other workplace domains, such as knowledge, interests, work activities, and tools and technologies, can and should be mapped to existing LMI data such as occupation and industry.
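To make the idea of a linkage concrete, here is a minimal sketch in Python of how a skills-to-occupation crosswalk lets existing occupation-level data be read through a skills lens. Everything in it is an assumption made for illustration: the occupation codes, skill names, and employment counts are invented placeholders, not entries from the NOC or from ESDC’s taxonomy, and the pilot project may well take a different approach.

```python
from collections import defaultdict

# Hypothetical crosswalk: occupation code -> skills linked to that occupation.
# Codes and skill names are illustrative placeholders, not real NOC or ESDC entries.
CROSSWALK = {
    "OCC-A": ["critical thinking", "writing"],
    "OCC-B": ["critical thinking", "equipment maintenance"],
}

# Hypothetical historical employment counts by occupation.
EMPLOYMENT = {"OCC-A": 1200, "OCC-B": 800}


def employment_by_skill(crosswalk, employment):
    """Sum employment over every occupation linked to each skill."""
    totals = defaultdict(int)
    for occupation, skills in crosswalk.items():
        for skill in skills:
            # Occupations with no employment record contribute zero.
            totals[skill] += employment.get(occupation, 0)
    return dict(totals)


skill_totals = employment_by_skill(CROSSWALK, EMPLOYMENT)
```

Because a worker is counted once for every skill linked to their occupation, the totals describe how many people work in occupations that use a skill. They are not a headcount that adds up across skills.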

  2. Big data must be used

The amount of available data has exploded in recent years, largely thanks to the skyrocketing use of job-posting websites. Web scraping and analysis firms such as Vicinity Jobs, Burning Glass Technologies, and Janzz are becoming experts in cleaning and structuring the rich data stored in these natural-language postings. Although job-posting data comes with many limitations, ignoring this type of information would mean forgoing new and important insights from the real-world language, words, and phrases used by employers. Cleaning and structuring this data remains a major challenge. Thankfully, many organizations are becoming more open about the taxonomies they use to organize the information pulled from big data sources.

  3. Big data is not the be-all and end-all

Some of the best features of online job-posting data are also a source of weakness. While natural-language (i.e., unformatted, raw) descriptions of skills are important, postings may not include all implicit skills. Furthermore, other requirements, such as field of study or computer programs, are often mislabelled as skills. In this sense, there are both too few and too many “skills” identified in job-posting data. Resources such as the O*NET database, which provides skills data for US occupations, must continue to play an important role in ensuring that skills data pulled from job postings and other non-standard sources are reliable.
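The “too few and too many” problem can be illustrated with a small Python sketch that compares the terms extracted from a posting against a reference list of skills for the same occupation. This is only a sketch under stated assumptions: the function name, the example terms, and the reference list are hypothetical, and exact string matching is a simplification that would miss synonyms and spelling variants.

```python
def compare_with_reference(posting_terms, reference_skills):
    """Split posting terms into confirmed, possibly mislabelled, and possibly implicit."""
    # Normalize case and whitespace so that "Writing" and "writing" match.
    posted = {term.strip().lower() for term in posting_terms}
    reference = {skill.strip().lower() for skill in reference_skills}
    return {
        # Terms found in both the posting and the reference list.
        "confirmed": sorted(posted & reference),
        # Posting terms absent from the reference list ("too many").
        "possibly_mislabelled": sorted(posted - reference),
        # Reference skills the posting never mentions ("too few").
        "possibly_implicit": sorted(reference - posted),
    }


# Hypothetical terms tagged as "skills" in a single job posting.
posting_terms = ["Writing", "Bachelor's degree in economics", "Excel"]
# Hypothetical reference skills for the same occupation.
reference_skills = ["writing", "critical thinking"]

result = compare_with_reference(posting_terms, reference_skills)
```

In this example, the degree requirement and the software name are flagged for review rather than silently counted as skills, while “critical thinking” surfaces as a skill the posting leaves unstated.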

  4. Big data quality and availability vary widely

Not all jobs are posted online, and this omission is not random. Job postings are often skewed toward urban job markets and white-collar occupations. While this is a problem for Canada and other advanced economies, it is magnified for emerging and developing economies. In addition to having a much smaller share of jobs posted online, many emerging and developing countries lack a regular data collection tool, such as the Labour Force Survey, to which skills can be linked. This is a major challenge for the ILO and its member countries. LMIC, Statistics Canada and ESDC will publish and share lessons learned in Canada as we begin to use and assess big data in combination with other methods. Ideally, our experiences will help others address and overcome the skills data gap.

At the ILO workshop, I gained many insights into emerging data analysis techniques, applications, and continuing challenges faced in the LMI world. This included input from international organizations, private firms, and academics. Yet perhaps the clearest insight is that everyone is at the early stages of making sense of the world of work as viewed through the lens of skills. While the future holds many possible pathways, we all agreed that the best route forward can only be found in a shared commitment to data openness, transparency and dialogue.


Tony Bonen is LMIC’s Director, Research, Data and Analytics. He leads LMIC’s team of economists, investigating everything from Canada’s diverse labour market information needs to the potential implications of the changing nature of the world of work.
