An idiot's guide to lead optimisation for proteins

theophrastus•about 1 hour ago

After spending an entire career working 'by hand' (and with a helluva lot of molecular orbital calculations) on the problem this post is about, I've got to tersely weigh in with this: there's (still) not enough available data, given the size of protein 'phase space', to hope for proper coverage with one's trained-up linear algebra model. Put another way: at some stage you've got to include physical modeling parameters, like molecular orbitals [1]. Otherwise the 'response curve' will only optimize if you get quite lucky (which is actually unlucky, because then you'll delude yourself into thinking the approach is generally applicable, which it isn't). For instance, swap in a carboxylic acid moiety where there was previously an aldehyde, a protein side-chain flips over, and you're in a completely different corner of the energetic 'galaxy'.

[1] e.g. https://proteindf.github.io/

phreeza•18 minutes ago

That seems plausible for generating completely new proteins.

Do you think it's also the case for lead optimization, where you typically have some measurements around your starting point and you expect the generated candidates to stay in that local neighborhood too?

(Disclaimer: former Cradle employee here)

patrickkidger•12 minutes ago

Oh hello Thomas, fancy seeing you here :D ex-Cradlers unite!

patrickkidger•8 minutes ago

I'll offer a +1 to the sibling comment here.

Yeah, it's totally true you can't build a one-size-fits-all foundation model; the data just isn't there. But also... no-one needs that. It's totally fine to tweak a foundation model for each individual problem, and that's the bulk of what is being described in the linked blog post and in the underlying paper.

FWIW, whilst at Cradle we had a lot of doubts going into this. Like, thermostability is clearly evolutionarily correlated, so it was always pretty likely that by hook or by crook the models could do that correctly. But binding? Aggregation? It was not at all clear that the same principles should hold. And the exciting finding was that yes, yes they do.


Original discussion on Hacker News.