Researchers pinpoint the “neurons” in machine-learning systems that capture specific linguistic features during language-processing tasks.

Researchers from MIT and the Qatar Computing Research Institute (QCRI) are putting the machine-learning systems known as neural networks under the microscope.

In a study that sheds light on how these systems manage to translate text from one language to another, the researchers developed a method that pinpoints individual nodes, or “neurons,” in the networks that capture specific linguistic features.

Neural networks learn to perform computational tasks by processing huge sets of training data. In machine translation, a network crunches language data annotated by humans, and presumably “learns” linguistic features, such as word morphology, sentence structure, and word meaning. Given new text, the networks match these learned features from one language to another to produce a translation.

In training, however, these networks adjust their internal settings and values in ways the creators can’t interpret. For machine translation, that means the creators don’t necessarily know which linguistic features the network captures.

In a paper being presented at this week’s Association for the Advancement of Artificial Intelligence conference, the researchers describe a method that identifies which neurons are most active when classifying specific linguistic features. They also designed a toolkit for users to analyze and manipulate how their networks translate text for various purposes, such as compensating for classification biases in the training data.

In their paper, the researchers pinpoint neurons that are used to classify, for instance, gendered words, past and present tenses, numbers at the beginning or middle of sentences, and plural and singular words. They also show how some of these tasks require many neurons, while others require only one or two.
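The general probing idea can be illustrated with a small sketch: train a simple linear classifier to predict a linguistic label from neuron activations, then rank neurons by how heavily the classifier relies on each one. In the sketch below, the `activations` and `labels` arrays are hypothetical stand-ins for data exported from a real translation model, and the scikit-learn probe is only one plausible way to do this kind of analysis, not the authors’ actual toolkit.

```python
# Sketch: ranking neurons by how predictive they are of a linguistic label.
# `activations` (tokens x neurons) and `labels` are random placeholders here;
# in practice they would come from a trained translation model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
activations = rng.standard_normal((1000, 500))   # stand-in for real neuron activations
labels = rng.integers(0, 2, size=1000)           # e.g. past vs. present tense

# Train a sparse linear probe on all neurons at once.
probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
probe.fit(activations, labels)

# Neurons with the largest absolute weights contribute most to the prediction,
# so they are the strongest candidates for capturing the property.
ranking = np.argsort(-np.abs(probe.coef_[0]))
print("Top 10 candidate neurons:", ranking[:10])
```

A ranking like this also makes the paper’s observation concrete: if only one or two neurons carry most of the weight, the property is localized; if the weights are spread across many neurons, the task requires many of them.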

“Our research aims to look inside neural networks for language and see what information they learn,” says co-author Yonatan Belinkov, a postdoc in the Computer Science and Artificial Intelligence Laboratory (CSAIL). “This work is about gaining a more fine-grained understanding of neural networks and having better control of how these models behave.”

Co-authors on the paper are senior research scientist James Glass and undergraduate student Anthony Bau, of CSAIL; and Hassan Sajjad, Nadir Durrani, and Fahim Dalvi, of QCRI.

Putting a microscope on neurons

Neural networks are structured in layers, where each layer consists of many processing nodes, each connected to nodes in the layers above and below. Data are first processed in the lowest layer, which passes an output to the layer above, and so on. Each output has a different “weight” that determines how much it figures into the next layer’s computation. During training, these weights are constantly readjusted.
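As a rough, self-contained illustration of this layered computation (not the actual translation networks studied in the paper), a two-layer pass in NumPy might look like the following, with made-up sizes and random weights:

```python
# Minimal illustration of layered computation: each layer multiplies its input
# by a weight matrix (the "weights" adjusted during training), applies a
# nonlinearity, and passes the result to the layer above. Toy sizes only.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)            # input to the lowest layer
W1 = rng.standard_normal((8, 4))      # weights of the first layer
W2 = rng.standard_normal((3, 8))      # weights of the second layer

hidden = np.tanh(W1 @ x)              # output of the lower layer
output = W2 @ hidden                  # passed up to the layer above
print(output)
```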

Neural networks used for machine translation train on annotated language data. In training, each layer learns a different “word embedding” for each word. Word embeddings are essentially tables of several hundred numbers combined in a way that corresponds to one word and that word’s function in a sentence. Each number in the embedding is calculated by a single neuron.
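A toy sketch of such an embedding table follows; the tiny vocabulary, the 300-number dimension, and the variable names are illustrative assumptions, not details from the paper:

```python
# Toy word-embedding table: each word maps to a vector of numbers, and each
# position ("neuron") in that vector is one of the values a probing method
# can inspect. Vocabulary and dimension are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "cats": 2, "sat": 3}
embedding_dim = 300                              # "several hundred numbers"
embeddings = rng.standard_normal((len(vocab), embedding_dim))

vector = embeddings[vocab["cats"]]               # the embedding for "cats"
neuron_42 = vector[42]                           # one neuron's value for that word
print(vector.shape, neuron_42)
```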