Chemists are training machine learning algorithms used by Facebook and Google to find new molecules

by Roberto Molar Candanosa

January 6, 2020

Steven A. Lopez is an assistant professor of chemistry and chemical biology in the College of Science at Northeastern. Photo by Matthew Modoono/Northeastern University

For more than a decade, Facebook and Google algorithms have been learning as much as they can about you. It’s how they refine their systems to deliver the news you read, those puppy videos you love, and the political ads you engage with.

These same kinds of algorithms can be used to search billions of molecules for ones that can catalyze important chemical reactions currently induced with expensive and toxic metals, says Steven A. Lopez, an assistant professor of chemistry and chemical biology at Northeastern.

Lopez is working with a team of researchers to train machine learning algorithms to spot the molecular patterns that could help find new molecules in bulk, and fast. It’s a much smarter approach than scanning through billions—and billions—of molecules without a streamlined process.

“We’re teaching the machines to learn the chemistry knowledge that we have,” Lopez says. “Why should I just have the chemical intuition for myself?”

The alternative to using expensive metals is organic molecules, and particularly plastics, which are everywhere, Lopez says. Depending on their molecular structure and ability to absorb light, these plastics can be converted with chemistry to produce better materials for today’s most important problems.

Lopez says the goal is to find molecules with the right properties and with structures similar to those of metal catalysts. But to attain that goal, Lopez will need to explore an enormous number of molecules.

Thus far, scientists have been able to synthesize only about a million molecules. But a conservative estimate of the number of possible molecules that could be analyzed is a quintillion, which is 10 raised to the power of 18, or the number one followed by 18 zeros.

Lopez thinks of this enormous number of possibilities as a vast ocean made up of billions of unexplored molecules. Such an immense molecular space is practically impossible to navigate—even if scientists were to combine experiments with supercomputer analysis.

Lopez says all of the calculations that have ever been done by computers add up to about a billion, or 10 to the ninth power. That’s about a billion times fewer than the number of possible molecules.

“Forget it, there’s no chance,” he says. “We just have to use a smarter search technique.”

That’s why Lopez is leading a team, supported by a grant from the National Science Foundation, that includes researchers from Tufts University, Washington University in St. Louis, Drexel University, and Colorado School of Mines. The team is using an open-access database of organic molecules called VERDE materials DB, which Lopez and colleagues recently published, to improve their algorithms and find more useful molecules.

The database will also register newly found molecules and can serve as a data hub for researchers across several different domains, Lopez says. That’s because it can launch researchers toward finding different molecules with many new properties and applications.

In tandem with the database, the algorithms will allow scientists to use computational resources more efficiently. After molecules of interest are found, researchers will recalibrate the algorithm to find more similar groups of molecules.

The active-search algorithm, developed by Roman Garnett at Washington University in St. Louis, uses a process similar to the classic board game Battleship, in which two players guess hidden locations on a grid to target and destroy vessels within a naval fleet.

In that grid, players place vessels as far apart as possible to make opponents miss targets. Once a ship is hit, players can readjust their strategy and redirect their attacks to the coordinates surrounding that hit.

That’s exactly how Lopez thinks of the concept of exploring a vast ocean of molecules.

“We are looking for regions within this ocean,” he says. “We are starting to set up the coordinates of all the possible molecules.”
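The Battleship analogy can be made concrete with a toy sketch. This is not the team's active-search algorithm, whose details the article does not give. It is a simplified illustration that assumes molecules sit on a small square grid, that "useful" ones cluster together like a ship, and that each probe reveals only whether one cell is a hit. The names `battleship_search`, `neighbors`, `targets`, and `budget` are all hypothetical.

```python
import random


def neighbors(cell, size):
    """Return the up/down/left/right neighbors of a cell that lie on the grid."""
    r, c = cell
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < size and 0 <= j < size]


def battleship_search(targets, size, budget, seed=0):
    """Toy search: probe at random until a hit, then probe around known hits.

    `targets` is the hidden set of "useful" cells (an assumption made for
    illustration); the search learns whether a cell is a target only by
    spending one query on it.
    """
    rng = random.Random(seed)
    unqueried = [(r, c) for r in range(size) for c in range(size)]
    rng.shuffle(unqueried)  # random exploration order
    queried, hits, frontier = set(), [], []

    for _ in range(min(budget, size * size)):
        cell = None
        # Exploit: prefer an unprobed cell next to an earlier hit.
        while frontier and cell is None:
            candidate = frontier.pop()
            if candidate not in queried:
                cell = candidate
        # Explore: otherwise fall back to the next random unprobed cell.
        while unqueried and cell is None:
            candidate = unqueried.pop()
            if candidate not in queried:
                cell = candidate
        if cell is None:
            break
        queried.add(cell)
        if cell in targets:
            hits.append(cell)
            frontier.extend(neighbors(cell, size))
    return hits, len(queried)


# Example: one four-cell "ship" hidden on a 10 x 10 grid.
ship = {(2, 3), (2, 4), (2, 5), (2, 6)}
hits, used = battleship_search(ship, size=10, budget=40)
print(f"found {len(hits)} of {len(ship)} targets in {used} queries")
```

The point of the design is the order of the two steps: once a hit is recorded, its neighbors jump to the front of the queue, so the remaining budget is spent near known successes rather than spread evenly over the grid.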

Hitting the right candidate molecules might also expand the understanding that chemists have of this unexplored chemical space.

“Maybe we’ll find out through this analysis that we have something really at the edge of what we call the ocean, and that we can expand this ocean out a bit more in that region,” Lopez says. “Those are things that we wouldn’t [be able to find by searching] with a brute force, trial-and-error kind of approach.”

For media inquiries, please contact Jessica Hair at j.hair@northeastern.edu or 617-373-5718.
