
Data Science at DIT: harnessing the potential of Natural Language Processing

Categories: People

Data visualisation on computer screen

Advances in computing power have made possible new forms of analysis that would once have been impractical. A key development in data science has been in the field of Natural Language Processing (NLP).

Our Digital, Data and Technology (DDaT) team in the Department for International Trade (DIT) works with state-of-the-art technology that makes it easier to use the latest NLP tools and techniques to deliver new insights for the organisation.

Using NLP to better understand information

By using NLP techniques, we can automate analyses of language and improve our understanding of information in text form by processing large amounts of data at speeds that would previously have been impossible. 

Our Data Science team is using NLP to analyse our own internal data, as well as external sources of data, including social media.

Examples of NLP that you may have seen include:

  • voice recognition in smartphones
  • spam filters for email 
  • customer support chatbots
  • translation of web pages into foreign languages

How computers process natural language

Word cloud graphic, the Department for International Trade

NLP techniques do not rely on computers having any real understanding of natural language; this is something computers cannot currently do. Instead, the techniques quantify statistical patterns in text according to rules that humans have set up in advance.

For instance, we can count the number of words that are positive or negative according to a predefined list of words that have been categorised by sentiment. As an example, here are the top 5 negative and positive words in Jane Austen’s novel Emma:

Negative: poor, doubt, object, sorry, impossible.
Positive: well, good, great, like, better.

This type of analysis is being used by our Data Science team here in DIT to understand the sentiment behind customer feedback or social media data.
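The word-counting approach described above can be sketched in a few lines of Python. This is only an illustration, not the team's actual method: the two word lists simply reuse the ten words quoted from Emma as a stand-in lexicon, and the names `POSITIVE`, `NEGATIVE`, `tokenize` and `sentiment_counts` are hypothetical. A real analysis would use a much larger published sentiment word list.

```python
import re
from collections import Counter

# Illustrative stand-in lexicon (an assumption): real sentiment word lists
# contain thousands of entries, not five per category.
POSITIVE = {"well", "good", "great", "like", "better"}
NEGATIVE = {"poor", "doubt", "object", "sorry", "impossible"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_counts(text, top_n=5):
    """Count lexicon words in the text and report the most frequent ones."""
    words = tokenize(text)
    pos = Counter(w for w in words if w in POSITIVE)
    neg = Counter(w for w in words if w in NEGATIVE)
    return {
        "positive_total": sum(pos.values()),
        "negative_total": sum(neg.values()),
        "top_positive": pos.most_common(top_n),
        "top_negative": neg.most_common(top_n),
    }


# Made-up feedback sentence used purely as sample input.
sample = "The service was good, really good, but the poor signage left me in doubt."
result = sentiment_counts(sample)
print(result)
```

Because the approach only matches words against a fixed list, it takes no account of context: a phrase such as "not good" would still be counted as positive.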

More complex analyses take much longer pieces of text and examine things such as:

  • patterns in high-frequency and low-frequency words
  • topics present in the text
  • sentiment - by looking at positive and negative words
  • similarity to other pieces of text
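Two of the analyses listed above, word frequency and similarity to other pieces of text, can be illustrated with a minimal sketch. It assumes a simple "bag of words" representation and cosine similarity between word-count vectors, which is one common way to compare texts and not necessarily the one used in DIT. The function names and sample sentences are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Return a Counter of lowercase word tokens in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty text is treated as similar to nothing.
    return dot / (norm_a * norm_b)


# Made-up sample sentences used purely for illustration.
doc_a = word_counts("Exports of tea grew while exports of coffee fell.")
doc_b = word_counts("Tea exports grew strongly this year.")
doc_c = word_counts("The weather was cold and wet.")

print(doc_a.most_common(2))      # highest-frequency words in doc_a
sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
print(sim_ab, sim_ac)            # doc_b shares words with doc_a; doc_c does not
```

In practice, very common words such as "of" and "the" are usually removed or down-weighted before comparing texts, since they say little about the topic.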

As stated above, we should not think of computers as ‘understanding’ language in any real sense. A number of challenges would need to be resolved before we could say that computers process natural language in the same way as humans. For example:

  • some words have multiple meanings depending on the context
  • sarcasm or exaggeration can alter the meaning of words

Possible applications of NLP in DIT

Even so, NLP in its current state has powerful applications for all sectors of society, including government. Possible applications for DIT include:

  • extracting information from complex open source data
  • answering queries phrased in natural language on our websites
  • detecting phishing and making financial transactions secure
  • creating structured summaries of large volumes of feedback received through partner organisations
  • uncovering insights on trade from social media

Our data team is continually looking at these applications using both public and internal data to deliver insight and improve operational processes within DIT.

This is part of our ambition to become an example of the most effective use of data to develop better digital services, guide trade policy and provide export and investment services.

Tony Coyne is Data Science Programme Lead in the Digital, Data and Technology (DDaT) team at DIT.

We currently have opportunities for data scientists to join our team. Visit our job listing page for details.

Find out more about the journey of our data team.

Subscribe to our blog channel for updates.
