I am hiring a PhD student in AI for structure prediction and design. The position is fully funded for 5 years (salary, health insurance, benefits) and will take place at Stockholm University/SciLifeLab.
Hi! I am starting a group at Stockholm University in AI for protein applications. Currently hiring a PhD student: and a postdoc: . Come work with me in beautiful Stockholm!
Happy to see Umol published:
If you want to predict the structure of protein-ligand complexes and avoid being sued by Google or Nvidia - we are here for you. Great work with @AtharvaKelkar, Andrea Guljas, @CecClementi and @FrankNoeBerlin
We trained a new network for protein structure prediction on a conformational split of the PDB to generate alternative conformations. 52% (81) of the nonredundant protein conformations evaluated are predicted with high accuracy (TM-score>0.8).
Try it out:
We designed linear and cyclic peptide binders only from a protein target sequence. Different lengths; 46% had Kd in the micromolar range from a single sequence selection.
Check out our new paper where we improve protein complex prediction by performing gradient descent through the AF-multimer network. We effectively denoise the MSA profile, similar to how a blurry image would be sharpened.
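The denoising idea can be sketched in a toy form. This is a minimal illustration, not the AFProfile code: the "target" profile, the quadratic confidence stand-in and the learning rate are all invented for the example, and a real run would backpropagate through the AF-multimer network instead.

```python
# Toy sketch, NOT the AFProfile code: treat the MSA profile as learnable
# parameters and take gradient-ascent steps on a differentiable confidence
# score. A simple quadratic stands in for the network's confidence output.

target = [0.7, 0.1, 0.15, 0.05]      # hypothetical "denoised" profile column
profile = [0.25, 0.25, 0.25, 0.25]   # noisy starting profile (uniform)

def confidence(p):
    # Stand-in confidence: higher is better, maximal when p equals target
    return -sum((pi - ti) ** 2 for pi, ti in zip(p, target))

def grad(p):
    # Gradient (up to a constant factor) of the stand-in confidence
    return [-2.0 * (pi - ti) for pi, ti in zip(p, target)]

lr = 0.1
start = confidence(profile)
for _ in range(100):
    g = grad(profile)
    profile = [pi + lr * gi for pi, gi in zip(profile, g)]  # ascent step
```

After the loop the profile has moved essentially all the way to the stand-in optimum, which is the "sharpening" intuition from the post.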
Update: Umol can predict affinity, even though we did not train for this(!). Here are affinity values vs ligand plDDT on a held-out test set. This is not in the preprint () - more details will come in the publication.
This is what it looks like. Only the receptor sequence (green) is input and EvoBind2 finds a binding spot and creates a binder (blue) simultaneously. Code:
Mind-blowing talk by @Patrick18287926 at the @cphbiosciphd Copenhagen Bioscience Snapshot about AI tools for the prediction of protein interactions! It was a pleasure for @VKleinSousa and me to host you.
Try out his fast and state-of-the-art tools!
It is hard to tell what the performance here really is. They report 42% <2Å l-RMSD on PoseBusters (), but train on all of the PDB? DiffDock has 1% considering data overlaps and chemical validity. Please mind your data.
RoseTTAFold updated to be All-Atom... biological assemblies containing proteins, nucleic acids, small molecules, metals, and covalent modifications ... and diffusion🤯
This is now published, including a substantial comparison with AlphaFold-multimer where we highlight the success of MolPC for large protein complexes of different symmetries.
My extremely talented student Patrick has taken #alphafold one step further and shows that we are entering the era of automatic prediction of the structures of large complexes
@pushmeet
Great to hear - please also wait to release the print in Nature, to maintain scientific practice. I think we still have a long way to go in this space, and industry-academia interplay is essential
AlphaFold3: will be available as a web server. The protein-ligand lDDT displays how far we have to go until these models really become useful. We are missing some crucial piece of information:
@sokrypton
I can't read the paper - no Science access (I guess they couldn't afford the open-source fee). The results are underwhelming though and I don't see the point of predicting 71 million missense variants when the SpearmanR is 0.5 to ddG (I presume)
We went through the PDB and partitioned all single-chain structures into structural clusters. Then, we train on one partition and evaluate on the other. Result: the network learns to relate one MSA to one conformation, and we can evaluate on proteins with substantial conformational changes.
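The split logic can be sketched minimally. The cluster IDs and chain names below are made up for illustration: each structural cluster is assigned wholly to either train or eval, so no structure (or its cluster mates) leaks across the split.

```python
# Sketch of a leakage-free split: every structural cluster goes entirely
# to either train or eval, never both. Data here is illustrative only.
import random

# cluster id -> member PDB chains (hypothetical placeholders)
clusters = {
    "c1": ["1abcA", "2defB"],
    "c2": ["3ghiA"],
    "c3": ["4jklC", "5mnoA", "6pqrB"],
    "c4": ["7stuA"],
}

rng = random.Random(42)
ids = sorted(clusters)
rng.shuffle(ids)                       # random but reproducible assignment
n_train = int(0.5 * len(ids))
train_ids, eval_ids = set(ids[:n_train]), set(ids[n_train:])

train = [m for c in train_ids for m in clusters[c]]
evalset = [m for c in eval_ids for m in clusters[c]]

# No chain appears on both sides of the split
assert not set(train) & set(evalset)
```

The point of splitting on cluster IDs rather than on individual chains is exactly the conformational-memorization concern: two chains in the same cluster must never end up on opposite sides.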
@NobelPrize
The correlation between telomere length and ageing is around 0.4 in humans. This statement makes it look like you actually know why we age/what ageing is. Please refer to the sources of this statement. Finding an enzyme that extends DNA ends hardly explains ageing...
@arneelof
Yeah, I think the ddG data is the key here and there the performance is low. I think what is being learned is evolutionary likelihoods and not the actual effects.
@judith_bernett
@itisalist
@dbblumenthal
Great to see that someone shows what many have assumed: 99% PPI accuracy from a single sequence is unreasonable... Reviewers have to be better at checking data.
We demonstrate the performance on seven difficult targets from CASP15 and increase the average MMscore to 0.76, compared to 0.63 with AF-multimer (AFM). Available here:
@amine_ketata
Doesn't it say that HDock is better at 2Å? Or do you consider your oracle with perfect selection here? Btw, did you try running the HDock scoring function on your generated poses? That scoring function looks very good for many tasks
Neural networks such as AlphaFold2 see almost all conformations in the PDB during training. Therefore, it is not possible to assess whether alternative protein conformations can be predicted or if these are reproduced from memory.
Our protocol, AFProfile, provides a way to direct predictions towards a defined target function guided by the MSA. We expect gradient descent over the MSA to be useful for different tasks, such as generating alternative conformations.
@arneelof
We mutate the binding residues of peptides and show that AF can distinguish between native and mutated peptide binders using only the plDDT score:
@AnimaAnandkumar
How do you know? I think there should be very little difference between any of these tools on new (low-homology) targets. None work very well.
@sokrypton
I have noticed similar issues with training and I think this is by design. If you delete the references to the function you are calling (e.g. del predict) after you call it, this releases the cache. You can also do .clear_cache(), depending on how things are vmapped/pmapped.
@pedrobeltrao
@sokrypton
There is even published work building on EvoBind (EvoPlay, ), which is great 👍 The difference is that they cite us. Maybe it's just me, but I think previous work should be cited. They should also cite EvoPlay.
@WillMcCorki1
Thanks, maybe mention this in the main text? The issue here is the overlap with PoseBusters. It would be nice to see the performance on proteins with <20% sequence identity to your dataset compared to the other methods, i.e. performance on unseen targets.
@y_bromberg
Great to see these types of studies 👏. As expected, function prediction is more like annotation. Perhaps data partitions should not be based on date cutoffs for evaluation (CAFA)...
@sokrypton
Update: we have now added the possibility to upload your own receptor structures and to specify a starting sequence to optimise further. This also makes it possible to only predict by setting NITER=1. If you have a request, contact us. 😊
@arneelof
@amine_ketata
HDock seems to be the best for rigid docking. It is also very fast. Quite impressive, since the scoring function was trained on relatively few structures.
@JameAbduljalil
@sokrypton
@arneelof
We didn't. If this is what you have available in the Colab session, the largest trimer can't be bigger than 1000 residues. You could try dimer mode if this doesn't fit for you.
@nickpolizzi_
@sokrypton
Important to also extract the available SEQRES from the files, otherwise you will end up with clusters/mappings that do not exist (don't trust the SEQRES from the PDB)
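For illustration, a minimal sketch of pulling SEQRES straight out of PDB-format text rather than relying on external sequence dumps. The two records and the `seqres_by_chain` helper are invented for the example; real pipelines also need mmCIF handling.

```python
# Minimal sketch: collect the SEQRES sequence per chain directly from a
# PDB-format file's text. Records below are illustrative, not a real entry.
from collections import defaultdict

PDB_TEXT = """\
SEQRES   1 A    8  MET LYS ALA ILE GLY VAL PHE THR
SEQRES   1 B    4  GLY GLY SER GLY
"""

def seqres_by_chain(text):
    chains = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("SEQRES"):
            fields = line.split()
            chain_id = fields[2]                 # chain identifier field
            chains[chain_id].extend(fields[4:])  # three-letter residue names
    return dict(chains)

chains = seqres_by_chain(PDB_TEXT)
```

Building the mapping from the files themselves keeps the clusters consistent with the structures you actually have on disk.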
@JameAbduljalil
@sokrypton
@arneelof
There is no upper limit; this will depend on your available GPU memory. Currently, approximately 3000 residues fit on 40 GB, which will be limited by the size of the largest trimeric subcomponent.
@arneelof
AF adapts the receptor interface structure to the binders during optimisation. An example for PDB ID 2cnz is shown, where residue 12 (orange) in the receptor (grey, green) changes orientation to interact with the peptide residues (blue, magenta).
@sokrypton
Very interesting. Maybe it is best to use AF, OF, OmegaFold and ESMFold? It will be interesting to see how one can reduce the adversarial effect through a joint score. Are we then back at ensemble methods? 😅
@arneelof
This is not the same problem though. By that logic, everyone can continue to evaluate on their training data. Sad that this is so widespread that it is not considered problematic.
@Lauren_L_Porter
I think this is true. Note that ColabFold is not trained in any way. ColabFold is an online version of AlphaFold. What you are really discussing is AlphaFold and overfitting to certain sequences.
@owl_poster
Also like the idea, but 58% with less-than-native activity will not be very useful. I would be surprised if a different fold were adopted at 58%. The active site is really what matters here
@BrianHie
Nice work! I would perhaps remove the conclusion regarding the correlation analysis in Figure 3B; I think this result is due to the large separation between values and very few data points.
@ylecun
You are the first person I hear agree with me on the power of gradient descent. Everyone I have ever talked to seems to think it is not such an important concept in ML. To me, the idea of gradient descent is what is important.
The PhD position comes with full funding for 4 years, salary, insurance, pension package, health benefits and help with student housing (the same package as for all employees).
@RolandDunbrack
Ligand format. Ligands are sometimes included in protein chains and sometimes not. Sometimes multiple ligands are merged, sometimes they are too similar to amino acids, sometimes the residue number doesn't change between the atoms, sometimes it does.
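One way to make these inconsistencies visible is to group HETATM atoms by an explicit key. A minimal sketch, with invented records and an invented `group_ligand_atoms` helper: keying on (chain, residue number, residue name) exposes merged or renumbered ligands as unexpected key patterns.

```python
# Sketch: group ligand atoms from PDB-format HETATM records by
# (chain, residue number, residue name). Fixed-column slices follow the
# standard PDB coordinate record layout; the records are illustrative.
PDB_TEXT = """\
HETATM    1  C1  LIG A 201      11.000  22.000  33.000  1.00  0.00           C
HETATM    2  O1  LIG A 201      12.000  23.000  34.000  1.00  0.00           O
HETATM    3  C1  NAG B 301      15.000  26.000  37.000  1.00  0.00           C
"""

def group_ligand_atoms(text):
    ligands = {}
    for line in text.splitlines():
        if line.startswith("HETATM"):
            name = line[12:16].strip()     # atom name columns
            resname = line[17:20].strip()  # residue name columns
            chain = line[21]               # chain identifier column
            resnum = line[22:26].strip()   # residue sequence number
            ligands.setdefault((chain, resnum, resname), []).append(name)
    return ligands

ligands = group_ligand_atoms(PDB_TEXT)
```

A ligand whose atoms keep one residue number shows up as a single key with many atoms; one whose residue number changes per atom fragments into many keys, which is exactly the failure mode worth checking for.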
@FurmanLab
I don't think it matters whether the structures were included in training if you still use them at inference. I would be surprised if AF3 doesn't manage given the native structures as input