Machine learning gets smarter to speed up drug discovery

Researchers develop a self-supervised learning framework that leverages the large amounts of unlabeled data that other models can’t use. Credit: Mechanical and AI Lab, Carnegie Mellon University

Predicting molecular properties quickly and accurately is important to advancing scientific discovery and application in areas ranging from materials science to pharmaceuticals. Because experiments and simulations to explore potential options are time-consuming and costly, scientists have investigated using machine learning (ML) methods to aid in computational chemistry research. But most ML models can make use only of known, or labeled, data, which makes it nearly impossible to accurately predict the properties of novel compounds.

In an industry like drug discovery, there are millions of molecules from which to select for use in a potential drug candidate. A prediction error as small as 1% can lead to the misidentification of more than ten thousand molecules. Improving the accuracy of ML models with limited data will play a vital role in developing new treatments for disease.

While the amount of labeled molecule data is limited, there is a rapidly growing amount of feasible, but unlabeled, data. Researchers at Carnegie Mellon University’s College of Engineering wondered whether they could use this large volume of unlabeled molecules to build ML models that perform better on property predictions than other models.

Their work culminated in the development of a self-supervised learning framework named MolCLR, short for Molecular Contrastive Learning of Representations with Graph Neural Networks (GNNs). The findings were published in the journal Nature Machine Intelligence.

“MolCLR significantly boosts the performance of ML models by leveraging approximately 10 million unlabeled molecule data,” said Amir Barati Farimani, assistant professor of mechanical engineering.

For a simple explanation of labeled vs. unlabeled data, Ph.D. student Yuyang Wang suggested thinking of two sets of images of dogs and cats. In one set, each animal is labeled with the name of its species. In the other set, no labels accompany the images. To a human, the difference between the two types of animals might be obvious. But to a machine learning model, the difference isn’t clear. The unlabeled data is therefore not reliably useful. Applying this analogy to the millions of unlabeled molecules that could take humans decades to manually identify, the critical need for smarter machine learning tools becomes obvious.

The research team sought to teach its MolCLR framework how to use unlabeled data by contrasting positive and negative pairs of augmented molecule graph representations. Graphs transformed from the same molecule are considered a positive pair, while those from different molecules are negative pairs. By this means, representations of similar molecules stay close to each other, while distinct ones are pushed far apart.
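The idea of pulling positive pairs together and pushing negative pairs apart can be illustrated with a small contrastive loss. The sketch below is not taken from the article, which does not name the exact objective. It assumes an NT-Xent-style loss over cosine similarities, a common choice for this kind of training, and the function names, the `temperature` value, and the toy two-dimensional embeddings are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(embeddings, anchor, positive, temperature=0.1):
    """NT-Xent-style loss for a single anchor (assumed objective).

    embeddings: one vector per augmented graph in the batch.
    anchor, positive: indices of two views of the same molecule.
    Every other index is treated as a negative.
    """
    sims = [
        math.exp(cosine_similarity(embeddings[anchor], e) / temperature)
        for e in embeddings
    ]
    # The anchor is never compared with itself.
    denominator = sum(s for k, s in enumerate(sims) if k != anchor)
    return -math.log(sims[positive] / denominator)


# Toy batch: indices 0 and 1 are two views of molecule A,
# indices 2 and 3 are two views of molecule B.
aligned = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# Same vectors, but the view paired with the anchor now looks like molecule B.
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

loss_aligned = contrastive_loss(aligned, anchor=0, positive=1)
loss_mixed = contrastive_loss(mixed, anchor=0, positive=1)
print(loss_aligned < loss_mixed)
```

The loss is small when the two views of the same molecule sit close together and large when a view of a different molecule sits closer to the anchor than its own partner does, which is the pressure that shapes the learned representations.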

The researchers applied three graph augmentations to remove small amounts of information from the unlabeled molecules: atom masking, bond deletion, and subgraph removal. In atom masking, a piece of information about a molecule is eliminated. In bond deletion, a chemical bond between atoms is erased. Subgraph removal combines the two. Through these three types of changes, MolCLR was forced to learn intrinsic information and make correlations.
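As a rough illustration of the three augmentations, the sketch below applies them to a toy molecule stored as a list of atom symbols and a list of bonds between atom indices. This is an assumption made for readability, not the team's implementation: the `MASK` token, the `ratio` parameter, the function names, and the breadth-first way of picking the subgraph are all hypothetical.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder for a hidden atom


def mask_atoms(atoms, bonds, ratio, rng):
    """Hide a random fraction of atoms; bonds are left untouched."""
    k = max(1, round(ratio * len(atoms)))
    hidden = set(rng.sample(range(len(atoms)), k))
    masked = [MASK if i in hidden else a for i, a in enumerate(atoms)]
    return masked, list(bonds)


def delete_bonds(atoms, bonds, ratio, rng):
    """Drop a random fraction of bonds; atoms are left untouched."""
    k = max(1, round(ratio * len(bonds)))
    dropped = set(rng.sample(range(len(bonds)), k))
    kept = [b for i, b in enumerate(bonds) if i not in dropped]
    return list(atoms), kept


def remove_subgraph(atoms, bonds, ratio, rng):
    """Mask a connected group of atoms and delete the bonds between them."""
    target = max(1, round(ratio * len(atoms)))
    start = rng.randrange(len(atoms))
    removed, frontier = {start}, [start]
    # Grow the group outward from the start atom, breadth first.
    while frontier and len(removed) < target:
        node = frontier.pop(0)
        for a, b in bonds:
            neighbor = b if a == node else a if b == node else None
            if (neighbor is not None and neighbor not in removed
                    and len(removed) < target):
                removed.add(neighbor)
                frontier.append(neighbor)
    masked = [MASK if i in removed else a for i, a in enumerate(atoms)]
    kept = [(a, b) for a, b in bonds if not (a in removed and b in removed)]
    return masked, kept


# Heavy atoms of acetic acid: C-C(=O)-O, bond orders omitted for brevity.
atoms = ["C", "C", "O", "O"]
bonds = [(0, 1), (1, 2), (1, 3)]
rng = random.Random(0)

print(mask_atoms(atoms, bonds, 0.25, rng))
print(delete_bonds(atoms, bonds, 0.25, rng))
print(remove_subgraph(atoms, bonds, 0.5, rng))
```

Applying two of these transformations to the same molecule yields two slightly different graphs that still describe one compound, which is what makes them usable as a positive pair.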

When the team applied MolCLR to ClinTox, a database used to predict drug toxicity, MolCLR significantly outperformed other ML baseline models. On another database, Tox21, MolCLR stood out from the other ML models with the potential to distinguish which environmental chemicals posed the most severe threats to human health.

“We’ve demonstrated that MolCLR bears promise for efficient molecule design,” said Barati Farimani. “It can be applied to a wide variety of applications, including drug discovery, energy storage, and environmental protection.”

Source: Techxplore.com

AUTHOR: Direzione