Recent breakthroughs have used deep learning to exploit evolutionary information in multiple sequence alignments (MSAs) to accurately predict protein structures. However, MSAs of homologous proteins are not always available, as with orphan proteins or fast-evolving proteins such as antibodies. Moreover, a protein in its natural setting folds from its primary amino acid sequence into its three-dimensional structure, suggesting that evolutionary information and MSAs should not be necessary to predict a protein's folded form. Here, we introduce OmegaFold, the first computational method to successfully predict high-resolution protein structure from a single primary sequence alone. Combining a protein language model that enables predictions from single sequences with a geometry-inspired transformer model trained on protein structures, OmegaFold outperforms RoseTTAFold and achieves prediction accuracy similar to AlphaFold2 on recently released structures. OmegaFold enables accurate predictions on orphan proteins that do not belong to any functionally characterized protein family and on antibodies, whose fast evolution tends to produce noisy MSAs. Our study fills a much-encountered gap in structure prediction and brings us a step closer to understanding protein folding in nature.
Shapley values have become one of the most popular feature attribution explanation methods. However, most prior work has focused on post-hoc Shapley explanations, which can be computationally demanding due to their exponential time complexity and which preclude model regularization based on Shapley explanations during training. Thus, we propose to incorporate Shapley values themselves as latent representations in deep models, thereby making Shapley explanations first-class citizens in the modeling paradigm. This intrinsic explanation approach enables layer-wise explanations, explanation regularization of the model during training, and fast explanation computation at test time. We define the Shapley transform, which transforms the input into a Shapley representation given a specific function. We operationalize the Shapley transform as a neural network module and construct both shallow and deep networks, called ShapNets, by composing Shapley modules. We prove that our Shallow ShapNets compute the exact Shapley values and that our Deep ShapNets maintain the missingness and accuracy properties of Shapley values. We demonstrate on synthetic and real-world datasets that our ShapNets enable layer-wise Shapley explanations, novel Shapley regularizations during training, and fast computation while maintaining reasonable performance. Code is available at https://github.com/inouye-lab/ShapleyExplanationNetworks.
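For concreteness, the exact Shapley computation whose exponential cost motivates this work can be sketched as a brute-force weighted sum over coalitions. This is an illustrative stand-alone snippet, not the ShapNet implementation; the function name `exact_shapley` and the additive toy game are our own assumptions:

```python
from itertools import combinations
from math import factorial

def exact_shapley(f, n):
    """Exact Shapley values of a set function f over n players.

    f maps a frozenset of player indices to a real value. The loop over
    all coalitions is exponential in n, which is the cost ShapNets avoid.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                S = frozenset(subset)
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += weight * (f(S | {i}) - f(S))
    return phi

# Additive toy game: a coalition's value is the sum of its members' weights,
# so each player's Shapley value is (approximately, up to float error) its own weight.
weights = [1.0, 2.0, 3.0]
f = lambda S: sum(weights[j] for j in S)
phi = exact_shapley(f, 3)
print(phi)
```

For the additive game the marginal contribution of player `i` is constant, so the Shapley weights, which sum to 1, return each player's own weight; the efficiency property (attributions sum to the grand-coalition value) also holds.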
De-identifying qualitative datasets is time-consuming and expensive but is a critical step in protecting the confidentiality of study participants. Peer-to-peer comments are an important supplement to peer evaluation ratings in team-based learning courses. These comments constitute valuable data for educational research, but they usually contain identifiable information, such as names. In this work in progress, we study and propose a pipeline tool that identifies all names appearing in CATME team peer evaluation comments and replaces those names with pseudonyms such as Rater 1 and Rater 2. We explored several natural language processing techniques empowered by machine learning methods and then optimized them into a final algorithm. At its core, the algorithm combines the long short-term memory (LSTM) and conditional random field (CRF) approaches most common in the field of named entity recognition. The current algorithm performs well, with a high recall of 0.8 and reasonable precision, yielding an F-score of 0.76, as we place an emphasis on recall. We also propose this as a tool to make a large amount of data available for research that would otherwise be sensitive due to personally identifiable information.
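The pseudonym-replacement stage of such a pipeline can be sketched as follows. This is a minimal illustration assuming name detection has already happened: the `detected_names` list stands in for the output of the paper's LSTM-CRF tagger, and the function name `pseudonymize` is our own:

```python
import re

def pseudonymize(comment, detected_names):
    """Replace each detected name with a stable pseudonym (Rater 1, Rater 2, ...).

    `detected_names` stands in for the names an NER tagger (e.g. LSTM-CRF)
    would extract from the comment; here it is simply given as a list.
    """
    # Assign each distinct name a pseudonym in order of first appearance.
    pseudo = {}
    for name in detected_names:
        if name not in pseudo:
            pseudo[name] = f"Rater {len(pseudo) + 1}"
    # Replace whole-word occurrences of each name with its pseudonym.
    out = comment
    for name, alias in pseudo.items():
        out = re.sub(rf"\b{re.escape(name)}\b", alias, out)
    return out

print(pseudonymize("Alice helped Bob, and Bob thanked Alice.", ["Alice", "Bob"]))
# → Rater 1 helped Rater 2, and Rater 2 thanked Rater 1.
```

Mapping each distinct name to one stable pseudonym preserves who-said-what-about-whom across a team's comments while removing the identifying string itself.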
Peer evaluation is well established as an effective method to motivate team members to reflect on their contribution and performance, to reinforce a sense of responsibility, to act as an incentive for demonstrating good interpersonal skills, and to help the team achieve its goals. A behaviorally anchored rating scale is generally considered an efficient and fair way to assign such scores. However, in the application of peer evaluation, numerical ratings can be influenced by raters' biased understanding of the scale based on their cultural background. Supplementing a numerical peer evaluation system with peer-to-peer comments could mitigate this rater bias effect. In this paper, we propose a natural language processing model that (1) processes the peer-to-peer comments about raters' teammates' teamwork behaviors and (2) converts those comments into a numerical space that allows for computation. We evaluate our results against CATME data and validate our proposed system.