CRISPR—Clustered Regularly Interspaced Short Palindromic Repeats—is the microbial world’s answer to adaptive immunity. Bacteria don’t generate antibodies when they are invaded by a pathogen and then hold those antibodies in abeyance in case they encounter that same pathogen again, the way we do. Instead, they incorporate some of the pathogen’s DNA into their own genome and link it to an enzyme that can use it to recognize that pathogenic DNA sequence and cut it to pieces if the pathogen ever turns up again.
The enzyme that does the cutting is called Cas, for CRISPR-associated. Although the CRISPR-Cas system evolved as a bacterial defense mechanism, it has been harnessed and adapted by researchers as a powerful tool for genetic manipulation in laboratory studies. It also has demonstrated agricultural uses, and the first CRISPR-based therapy was just approved in the UK to treat sickle-cell disease and transfusion-dependent beta-thalassemia.
Now, researchers have developed a new way to search genomes for CRISPR-Cas-like systems. And they’ve found that we may have a lot of additional tools to work with.
Modifying DNA
To date, six types of CRISPR-Cas systems have been identified in various microbes. Although they differ in detail, they all have the same appeal: They deliver proteins to a given sequence of genetic material with a degree of specificity that has heretofore been technically difficult, expensive, and time-consuming to achieve. Any DNA sequence of interest can be programmed into the system and targeted.
The native systems found in microbes usually bring a nuclease—a DNA-cleaving enzyme—to the sequence, to chop up the genetic material of a pathogen. This ability to cut any chosen DNA sequence can be used for gene editing; in tandem with other enzymes and/or DNA sequences, it can be used to insert or delete additional short sequences, correcting mutant genes. Some CRISPR-Cas systems cleave specific RNA molecules instead of DNA. These can be used to eliminate pathogenic RNA, like the genomes of some viruses, the way they are eliminated in their native bacteria. This can also be used to rescue defects in RNA processing.
But there are lots of additional ways to modify nucleic acids that might be useful. And it’s an open question as to whether enzymes that perform additional modifications have evolved. So, some researchers decided to search for them.
Researchers at MIT developed a new tool to detect variable CRISPR arrays and applied it to 8.8 tera-base pairs (8.8 × 10¹² base pairs) of prokaryotic genomic information. Many of the systems they found are rare and only appeared in the dataset in the past 10 years, highlighting how important it is to keep adding environmental samples that were previously hard to obtain to these data repositories.
The new tool was required because databases of protein and nucleic acid sequences are expanding at a ridiculous rate, so the techniques for analyzing all of that data need to keep up. Some algorithms that are used to analyze them try to compare every sequence to every other one, which is obviously untenable when dealing with billions of genes. Others rely on clustering, but these find only genes that are highly similar, so they can’t really shed light on the evolutionary relationships between distantly related proteins. But fast locality-sensitive hashing-based clustering (FLSHclust, pronounced “flash clust”) works by binning billions of proteins into fewer, larger clusters of sequences that differ slightly, which makes it possible to identify new, rare relatives.
The search using FLSHclust successfully pulled out 188 new CRISPR-Cas systems.
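To give a feel for why hashing sidesteps the every-sequence-against-every-other comparison, here is a minimal Python sketch of locality-sensitive hashing applied to protein strings. It is not the researchers' implementation: the MinHash-plus-banding scheme, the function names, and every parameter value are assumptions chosen for illustration. Each sequence is hashed once, and only sequences that land in the same bucket are ever grouped.

```python
import hashlib
from collections import defaultdict


def kmers(seq, k=3):
    """Set of overlapping k-letter words in a sequence (assumes len(seq) >= k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_signature(seq, num_hashes=32, k=3):
    """For each seeded hash function, keep the smallest hash over all k-mers.

    Sequences that share many k-mers tend to agree at many signature positions.
    """
    words = kmers(seq, k)
    return [
        min(hashlib.blake2b(f"{seed}:{w}".encode(), digest_size=8).digest()
            for w in words)
        for seed in range(num_hashes)
    ]


def lsh_cluster(seqs, num_hashes=32, bands=16, k=3):
    """Group sequences whose signatures match in at least one band.

    Illustrative only: band count and signature length are arbitrary choices.
    """
    rows = num_hashes // bands
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    buckets = {}  # (band index, band contents) -> first sequence seen there
    for idx, seq in enumerate(seqs):
        sig = minhash_signature(seq, num_hashes, k)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            if key in buckets:
                parent[find(idx)] = find(buckets[key])  # merge the two groups
            else:
                buckets[key] = idx

    clusters = defaultdict(list)
    for idx in range(len(seqs)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())
```

Calling `lsh_cluster` on a list of protein strings returns lists of indices, one list per cluster. Using fewer rows per band makes the binning more permissive, which is the general lever for pulling slightly divergent relatives into the same cluster; the trade-off is more chance groupings.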
Lots of CRISPyness
A few themes emerged from the work. One is that some of the newly identified CRISPR systems use Cas enzymes with never-before-seen domains, or appear to be fusions with known genes. The scientists further characterized some of these and found one to be more specific than the CRISPR enzymes currently in use. Another, which cuts RNA, is structurally distinct enough that they propose it constitutes an entirely new, seventh type of CRISPR-Cas system.
A corollary of this theme is the linkage of enzymes with different functionalities, not just nucleases (enzymes that cut DNA and RNA), with CRISPR arrays. Scientists have harnessed CRISPR’s remarkable gene-targeting ability by linking it to other kinds of enzymes and molecules, like fluorescent dyes. But evolution obviously got there first.
As one example, FLSHclust identified something called a transposase associated with two different types of CRISPR systems. A transposase is an enzyme that helps a particular stretch of DNA jump to another part of the genome. CRISPR RNA-guided transposition has been seen before, but this is another example of it. A whole host of proteins with varying functions, like proteins with transmembrane domains and signaling molecules, were found linked to CRISPR arrays, highlighting the mix-n-match nature of the evolution of these systems. They even found a protein expressed by a virus that binds to CRISPR arrays and renders them inactive—essentially, the virus inactivates the CRISPR system that evolved to protect against viruses.
Not only did the researchers find novel proteins associated with CRISPR arrays, but they also found other regularly interspaced repeat arrays that were not associated with any Cas enzymes: similar to CRISPR, but not CRISPR. They’re not sure what the functionality of these RNA-guided systems might be but speculate that they are involved in defense, just like CRISPR is.
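The pattern behind the name, a short repeat recurring at regular intervals with different spacer sequences in between, is simple enough to sketch in a few lines of Python. This is a toy illustration only, not the detection tool used in the study: the function name, the exact-match requirement, and the default lengths are all assumptions, and real arrays tolerate variation that this sketch ignores.

```python
from collections import defaultdict


def find_repeat_arrays(genome, repeat_len=8, min_copies=3,
                       min_spacer=4, max_spacer=12):
    """Report exact repeats that recur with spacer-sized gaps between copies.

    Returns a list of (repeat_sequence, [start positions]) tuples.
    All thresholds are arbitrary values picked for a small example.
    """
    # Record where every window of length repeat_len occurs.
    positions = defaultdict(list)
    for i in range(len(genome) - repeat_len + 1):
        positions[genome[i:i + repeat_len]].append(i)

    arrays = []
    for repeat, starts in positions.items():
        run = [starts[0]]
        for pos in starts[1:]:
            # Gap between the end of the previous copy and this copy.
            spacer = pos - (run[-1] + repeat_len)
            if min_spacer <= spacer <= max_spacer:
                run.append(pos)
            else:
                if len(run) >= min_copies:
                    arrays.append((repeat, run))
                run = [pos]
        if len(run) >= min_copies:
            arrays.append((repeat, run))
    return arrays
```

Given a DNA string, the function returns each qualifying repeat along with the start position of every copy. Note that nothing in it looks for a neighboring Cas gene, which is why a scan of this kind can also turn up arrays that resemble CRISPR without being CRISPR.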
The authors set out to find “a catalog of RNA-guided proteins that expand our understanding of the biology and evolution of these systems and provide a starting point for the development of new biotechnologies.” It seems they achieved their goal: “The results of this work reveal unprecedented organizational and functional flexibility and modularity of CRISPR systems,” they write. They go on to conclude: “This represents only a small fraction of the discovered systems, but it illuminates the vastness and untapped potential of Earth’s biodiversity, and the remaining candidates will serve as a resource for future exploration.”
Article DOI: 10.1126/science.adi1910