Influenza Virus Genome Sequencing and Genetic Characterization

Genome Sequencing

Influenza viruses are constantly changing; all influenza viruses undergo genetic changes over time (for more information, see How Flu Viruses Can Change: “Drift” and “Shift”). An influenza virus’s genome consists of all the genes that make up the virus. CDC conducts year-round surveillance of circulating influenza viruses to monitor changes in their genomes. This work is performed as part of routine U.S. influenza surveillance and as part of CDC’s role as a World Health Organization (WHO) Collaborating Center for the Surveillance, Epidemiology and Control of Influenza. The information CDC collects from studying genetic changes (also known as “substitutions” or “mutations”) in influenza viruses plays an important public health role: it helps determine whether vaccines and antiviral drugs will work against currently circulating influenza viruses, and it helps assess the potential for influenza viruses in animals to infect humans.

Genome sequencing is a process that determines the order, or sequence, of the nucleotides (i.e., A, C, G and T/U) in each of the genes present in the virus’s genome. Nucleotides are organic molecules that form the building blocks of nucleic acids, such as RNA and DNA. The genomes of all influenza viruses consist of single-stranded RNA, as opposed to double-stranded DNA. The RNA genes of influenza viruses are made up of chains of nucleotides that are bonded together and coded by the letters A, C, G and U, which stand for adenine, cytosine, guanine, and uracil, respectively. Full genome sequencing can reveal the approximately 13,500-letter sequence of all the genes in an influenza virus’s genome.
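
As a small illustration of the T/U notation mentioned above: sequence files commonly record influenza genes in DNA letters (with T), while the viral RNA itself uses U. The Python sketch below converts a sequence to RNA letters and counts each nucleotide. The fragment is made up for illustration and is not real surveillance data, and the function names are hypothetical. Ambiguity codes such as N are deliberately not handled.

```python
from collections import Counter

RNA_LETTERS = set("ACGU")


def to_rna(seq: str) -> str:
    """Return the sequence in RNA letters (T replaced by U), uppercased."""
    rna = seq.upper().replace("T", "U")
    unexpected = set(rna) - RNA_LETTERS
    if unexpected:
        # This sketch only accepts the four unambiguous letters.
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return rna


def nucleotide_counts(seq: str) -> dict:
    """Count how many times each RNA letter occurs in the sequence."""
    return dict(Counter(to_rna(seq)))


fragment = "ATGAAGGCAATACTAGTA"  # made-up 18-letter fragment
rna_fragment = to_rna(fragment)  # "AUGAAGGCAAUACUAGUA"
counts = nucleotide_counts(fragment)
```

For this fragment, the counts add up to its 18 letters; a full influenza genome would have about 13,500.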

The two influenza types (A and B) that cause seasonal epidemics have eight RNA gene segments. These genes contain the “instructions” for making new viruses, which is how infection spreads. An influenza virus’s surface proteins, hemagglutinin (HA) and neuraminidase (NA), determine important properties of the virus and are included in most seasonal vaccines, which is why they are analyzed more closely. In a typical year, CDC performs whole genome sequencing on about 7,000 influenza viruses from original clinical samples collected through virologic surveillance.

Comparing the nucleotides in one gene of a virus with those in the corresponding gene of a different virus can reveal variations between the two viruses. Genetic variations are important because they can change the amino acids that make up the influenza virus’s proteins, resulting in structural changes to the proteins and thereby altering properties of the virus. These properties include the ability to evade human immunity, the ability to spread between people, and susceptibility to antiviral drugs. The changes to the proteins can come in the form of amino acid substitutions, insertions, or deletions.
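
The comparison described above can be sketched in a few lines of Python. The example below takes two aligned, equal-length coding sequences, lists the nucleotide differences, translates both with the standard genetic code, and reports any amino acid substitutions. The sequences are made up for illustration, the function names are hypothetical, and insertions and deletions are not handled because they would first require an alignment step.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nucleotide_differences(ref: str, other: str) -> list:
    """Return (position, ref_base, other_base) for each mismatch (1-based)."""
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(ref, other)) if a != b]


def translate(seq: str) -> str:
    """Translate a coding sequence into one-letter amino acid codes."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def amino_acid_substitutions(ref: str, other: str) -> list:
    """Return substitutions such as 'A3T' (alanine to threonine at position 3)."""
    pairs = zip(translate(ref), translate(other))
    return [f"{a}{i + 1}{b}" for i, (a, b) in enumerate(pairs) if a != b]


reference = "ATGAAGGCAATACTAGTA"  # made-up fragment, translates to MKAILV
variant = "ATGAAAACAATACTAGTA"    # differs at nucleotide positions 6 and 7

nt_changes = nucleotide_differences(reference, variant)
aa_changes = amino_acid_substitutions(reference, variant)
```

In this made-up pair, the change at position 6 turns one lysine codon into another and leaves the protein unchanged, while the change at position 7 replaces alanine with threonine. This is why not every nucleotide difference alters a protein.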

Genetic Characterization

CDC and other public health laboratories around the world have been sequencing the gene segments of influenza viruses since the 1980s. CDC contributes gene sequences to public databases, such as GenBank and the Global Initiative on Sharing All Influenza Data (GISAID), for use by researchers and public health scientists. The sequences deposited into these databases allow CDC and other researchers to compare the genes of currently circulating influenza viruses with the genes of older influenza viruses and those used in vaccines. This process of comparing genetic sequences is called genetic characterization. CDC uses genetic characterization for several reasons:

To determine how closely “related” or similar flu viruses are to one another genetically
To monitor how flu viruses are evolving or changing over time
To identify genetic changes that affect the virus’s properties, for example, changes that are associated with influenza viruses spreading more easily, causing more severe disease, or developing resistance to antiviral drugs
To assess how well a flu vaccine might protect against a particular influenza virus based on its genetic similarity to the vaccine virus
To monitor for genetic changes in influenza viruses circulating in animal populations that could enable them to infect humans
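
One simple way to express how genetically similar two viruses are is percent identity: the share of aligned positions at which two sequences carry the same nucleotide. The Python sketch below ranks made-up circulating sequences by their identity to a made-up vaccine virus sequence. All names and sequences are hypothetical, and genetic similarity is only one input into assessing vaccine protection, not a measure of protection itself.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions at which two sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Made-up 20-letter fragments, for illustration only.
vaccine_virus = "ATGAAGGCAATACTAGTAGT"
circulating = {
    "virus_1": "ATGAAGGCAATACTAGTAGC",  # 1 difference from the vaccine virus
    "virus_2": "ATGAAAGCAACACTAGTAGC",  # 3 differences from the vaccine virus
}

identity = {
    name: percent_identity(vaccine_virus, seq) for name, seq in circulating.items()
}

# Most similar to the vaccine virus first.
ranked = sorted(identity.items(), key=lambda item: item[1], reverse=True)
```

Here virus_1 is 95% identical to the vaccine virus fragment and virus_2 is 85% identical, so virus_1 ranks first.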

The genetic differences among a group of influenza viruses can be shown by organizing them into a graphic called a “phylogenetic tree.” Phylogenetic trees for influenza viruses are like family (genealogy) trees for people. These trees show how closely “related” individual viruses are to one another. Each sequence from a specific influenza virus has its own branch on the tree. Viruses are grouped by comparing changes in nucleotides within the gene. The points where branches meet, called “nodes,” represent the common ancestor of the viruses and indicate that the viruses share similar genetic sequences. Viruses that share a common ancestor can also be described as belonging to the same clade. The degree of genetic difference (number of nucleotide differences) between viruses is represented by the length of the horizontal lines (branches) in the phylogenetic tree. The further apart viruses are on the horizontal axis of a phylogenetic tree, the more genetically different the viruses are from one another.
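
To make the grouping idea concrete, the Python sketch below counts nucleotide differences between every pair of sequences and then repeatedly joins the two closest groups, a simple clustering method known as UPGMA. UPGMA is used here only because it is short; the text above does not say which tree-building method CDC uses, and real analyses generally rely on dedicated phylogenetics software. The virus names and sequences are made up, and the sketch returns only the grouping (as nested tuples), not branch lengths.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def upgma(seqs: dict):
    """Group sequences by repeatedly joining the two closest clusters.

    A cluster is either a virus name or a (left, right) tuple of clusters.
    """
    trees = list(seqs)
    sizes = {name: 1 for name in seqs}
    dist = {
        (a, b): float(hamming(seqs[a], seqs[b])) for a, b in combinations(trees, 2)
    }

    def d(x, y):
        # Distances are stored once per pair, in either order.
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    while len(trees) > 1:
        a, b = min(combinations(trees, 2), key=lambda pair: d(*pair))
        merged = (a, b)
        sizes[merged] = sizes[a] + sizes[b]
        trees = [t for t in trees if t not in (a, b)]
        for t in trees:
            # Size-weighted average distance from the new cluster to the rest.
            dist[(merged, t)] = (sizes[a] * d(a, t) + sizes[b] * d(b, t)) / sizes[merged]
        trees.append(merged)
    return trees[0]


# Made-up 12-letter fragments, for illustration only.
sequences = {
    "virus_A": "ATGAAGGCAATA",
    "virus_B": "ATGAAGGCAATC",
    "virus_C": "ATGACGGTAATA",
    "virus_D": "ATGACGGTCATA",
}

tree = upgma(sequences)
# tree == (("virus_A", "virus_B"), ("virus_C", "virus_D"))
```

In this example, virus_A and virus_B differ by a single nucleotide, as do virus_C and virus_D, so each pair is joined first and the two pairs meet at a deeper node.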

Phylogenetic trees of influenza viruses usually show how similar the nucleotide sequences of the hemagglutinin (HA) genes of the vaccine virus and circulating viruses are to each other.

Genome sequencing reveals the sequence of the nucleotides in a gene, like alphabet letters in words. Comparing the order of nucleotides in one virus gene with the order of nucleotides in the corresponding gene of a different virus can reveal variations between the two viruses.

Genetic variations are important because they affect the structure of an influenza virus’s surface proteins. Proteins are made of sequences of amino acids.

The substitution of one amino acid for another can affect properties of a virus, such as how well a virus transmits between people, and how susceptible the virus is to antiviral drugs or current vaccines.

Figure 1: A phylogenetic tree.

Methods of Flu Genome Sequencing

One influenza sample contains many influenza virus particles that were grown in a test tube. Within this population of sibling viruses, individual particles often have small genetic differences from one another.

Due to the constantly changing nature of influenza viruses, every sample collected from a patient contains many influenza virus particles that have small genetic differences in comparison to one another. Traditionally, scientists have used a sequencing technique called “the Sanger method” to monitor influenza evolution as part of genetic characterization. Sanger sequencing identifies the predominant genetic sequence among the many influenza viruses found in a virus sample. This means small variations in the population of viruses present in a sample are not reflected in the final result. Newer technologies (such as Next Generation Sequencing, described below) are better suited for detecting small variations in the virus genes and offer advantages for whole genome sequencing.
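
The difference between reporting only the predominant sequence and detecting small variations can be illustrated with a simplified Python sketch. It treats a sample as a set of aligned, equal-length reads, takes the most common nucleotide at each position as the consensus, and separately lists less common nucleotides that appear above a chosen frequency. This is an illustration of the concept only, not a description of how Sanger or NGS instruments work; the reads are made up, and the function names and the 5% threshold are assumptions.

```python
from collections import Counter


def column_counts(reads: list) -> list:
    """Count the nucleotides observed at each position across all reads."""
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned to the same length")
    return [Counter(column) for column in zip(*reads)]


def consensus(reads: list) -> str:
    """Predominant nucleotide at each position."""
    return "".join(c.most_common(1)[0][0] for c in column_counts(reads))


def minor_variants(reads: list, min_fraction: float = 0.05) -> list:
    """Return (position, base, fraction) for non-predominant bases (1-based)."""
    found = []
    for pos, counts in enumerate(column_counts(reads), start=1):
        total = sum(counts.values())
        major = counts.most_common(1)[0][0]
        for base, n in counts.items():
            if base != major and n / total >= min_fraction:
                found.append((pos, base, n / total))
    return found


# Made-up sample: 8 reads carry A at position 5, 2 reads carry G.
reads = ["ATGAAG"] * 8 + ["ATGAGG"] * 2

predominant = consensus(reads)    # "ATGAAG"
variants = minor_variants(reads)  # [(5, "G", 0.2)]
```

The consensus alone hides the fact that one in five reads in this made-up sample carries a different nucleotide at position 5; the per-position counts reveal it.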

Since 2014, CDC has been using “Next Generation Sequencing (NGS)” methodologies, which have greatly expanded the amount of information and detail that sequencing analysis can provide.

NGS uses advanced molecular detection (AMD) to identify gene sequences from each virus in a sample. As a result, NGS reveals the genetic variations among many different influenza virus particles in a single sample and can increase the speed and accuracy of sequencing each of the protein-coding regions of the virus. This level of detail can directly benefit public health decision-making in important ways, but the data must be carefully interpreted by highly trained experts in the context of other available information. For more information about how NGS and AMD are revolutionizing flu genome mapping at CDC, see AMD Projects: Improving Influenza Vaccines.

Additional Resources

CDC’s Advanced Molecular Detection Program