Protein identification and characterization is one of the most important components of expressing and manufacturing recombinant proteins for biologics. It involves a wide variety of analytical tools and techniques because proteins themselves are so complex: they are built from 20 standard amino acids that can be arranged in a nearly infinite number of ways and then folded into three-dimensional structures.
Understanding a protein’s structure is key to understanding its function, and the creation of biological therapies, including antibodies, recombinant proteins, vaccines and other molecules, depends upon clear and accurate characterization.
Protein Identification Methods
Mass spectrometry (MS) – Proteins are ionized and passed into a mass analyzer, either intact or after digestion into peptides. The spectrometer identifies proteins by peptide mass fingerprinting or tandem mass spectrometry. Measured peptide masses are compared to online databases to find the closest protein matches.
Edman degradation – Sequences a peptide by removing one residue at a time from its amino end, using phenyl isothiocyanate. Each cycle cleaves only the N-terminal residue and leaves the rest of the peptide intact, but the method is not as useful for larger proteins.
Peptide mass fingerprinting – This high-throughput method uses endoproteases that cleave an unknown protein into smaller peptides. The masses of these peptides are measured using MS, and the resulting lists of “peptide peaks” are compared to protein databases.
Database searching and bioinformatics tools – Data from MS and fingerprinting are compared against databases using computer programs to find the closest match to your sample.
Immunoassays – Identifies proteins by their interactions with specific antibodies. These tests include ELISA (enzyme-linked immunosorbent assays), Western blotting (separation of proteins by molecular weight, electrophoresis transfer to a membrane, and probing with antibodies), and immunoprecipitation (precipitating a protein antigen out of solution using an antibody that specifically binds to that protein).
Size exclusion chromatography – Using porous beads of specific dimensions packed in a column, this method separates proteins according to size. As a sample passes through the column, molecules of different sizes elute at different rates.
Affinity chromatography – This technique also uses a column, packed with beads such as cellulose. A substrate (or sometimes a coenzyme) is bound covalently to the beads. Proteins that have a binding site for the immobilized substrate will bind, while all other proteins flow through the column.
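The peptide mass fingerprinting entry above can be illustrated with a minimal sketch: digest a candidate sequence in silico, compute theoretical peptide masses, and check which ones fall near observed peaks. The choice of trypsin as the endoprotease, the function names, and the 0.01 Da tolerance are assumptions made for illustration. Real search engines also handle missed cleavages, modifications, and scoring.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water per peptide (free N- and C-termini)


def tryptic_digest(sequence):
    """Assumed trypsin rule: cleave after K or R unless followed by P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_peaks(observed_masses, sequence, tol=0.01):
    """Return theoretical peptides whose mass lies within tol of any observed mass."""
    return [
        pep for pep in tryptic_digest(sequence)
        if any(abs(peptide_mass(pep) - obs) <= tol for obs in observed_masses)
    ]
```

A database search repeats this comparison for every candidate protein and ranks candidates by how many observed peaks they explain.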
Protein Characterization Techniques
Protein characterization elucidates the primary sequence, higher-order structure, post-translational modifications, interactions, and biological activities of proteins. Protein characterization involves a wide range of analytical tools and techniques. There cannot be a “one-size fits all” technology to determine the chemical makeup, structure and function of large molecules with different aggregation states, charges, size, and three dimensional configuration. But characterizing proteins in your samples, especially potential therapeutics, is vitally important to determine the safety, efficacy and purity of your protein-based therapies.
Protein Analysis Equipment
Protein analysis involves tools that detect, purify and identify proteins, and begin to characterize structure and function. Traditional methods include electrophoresis (separating proteins by size and charge), western blotting (marking targets with antibodies), and mass spectrometry (measuring mass-to-charge ratios). More modern methods include light scattering, such as dynamic light scattering (DLS).
Protein separation (electrophoresis) places proteins in a gel and observes their mobility in an electric field. Proteins can be separated by solubility, size, charge and binding. The most common method is SDS-PAGE (short for sodium dodecyl sulfate polyacrylamide gel electrophoresis), which separates proteins based on molecular weight.
Western blotting separates proteins extracted from cells by size, transfers them to a solid support (the blot), and uses primary and secondary antibodies to target specific proteins, thus identifying proteins based on antibody binding.
Light scattering is a more modern approach, and is more sensitive to larger molecules. For instance, Dynamic Light Scattering (DLS) can sensitively detect small quantities of polypeptide aggregates in preparations. The Halo Labs Aura family takes a different approach and measures particle size distribution using backgrounded membrane imaging (BMI) and fluorescence membrane microscopy (FMM). These instruments provide valuable information about the size distribution of particles in a sample, helping researchers and manufacturers understand the characteristics of their products.
Biologics Formulation Instruments
During manufacturing of therapeutic molecules, it’s important to have the right instrumentation to monitor protein conformation, predict thermal stability, and measure aggregate formation.
Differential Scanning Calorimetry (DSC) characterizes the thermal stability of proteins by measuring the enthalpy (ΔH) and midpoint temperature (Tm) of thermally induced structural transitions. Dynamic Light Scattering (DLS) instruments measure particle size, and many can also analyze particle mobility and charge (zeta potential) using Electrophoretic Light Scattering (ELS), and the molecular weight of particles in solution using Static Light Scattering (SLS). Other systems, like Halo Labs’ Aura PTx, use Backgrounded Membrane Imaging (BMI) with two channels of Fluorescence Membrane Microscopy (FMM) to detect and identify subvisible particles that can form aggregates and threaten the stability of biologics. FMM also enables characterization of formulation excipients such as polysorbate.
Protein Characterization Tools
Chromatography: High-performance liquid chromatography (HPLC), size-exclusion chromatography (SEC), ion-exchange chromatography (IEC), and affinity chromatography separate proteins based on size, charge, hydrophobicity, or specific interactions, enabling purity analysis and separation of protein isoforms or variants.
Electrophoresis: Sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), isoelectric focusing (IEF), and capillary electrophoresis (CE) separate proteins based on size, charge, or isoelectric point, facilitating purity analysis, subunit composition, and post-translational modifications.
Mass spectrometry (MS): Liquid chromatography-mass spectrometry (LC-MS) and matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) identify and quantify proteins, peptides, and post-translational modifications, elucidating primary sequence, structural variations, and interactions.
Spectroscopy: Ultraviolet-visible (UV-Vis) spectroscopy, fluorescence spectroscopy, circular dichroism (CD), and nuclear magnetic resonance (NMR) spectroscopy allow protein structural analysis as well as analysis of folding, conformational changes, and ligand binding.
The Aura family of instruments (in particular the Aura PTx for proteins) combines backgrounded membrane imaging (BMI), a microscopy method, with fluorescence membrane microscopy (FMM), which uses fluorescent dyes or antibodies, for faster, reliable characterization on one instrument. With Aura, researchers can understand the stability and purity of the proteins being made.
Protein Characterization Assays
Biophysical assays: Dynamic light scattering (DLS), surface plasmon resonance (SPR), and differential scanning calorimetry (DSC) characterize protein size, shape, stability, and interactions with ligands or other molecules.
Immunological assays: Enzyme-linked immunosorbent assay (ELISA), Western blotting, immunoprecipitation, and flow cytometry detect and quantify proteins, epitopes, or specific protein-protein interactions, facilitating immunoassays, biomarker detection, and protein-protein interaction studies.
Protein Formulation
The process of assembling the primary structure of proteins (amino acids), followed by secondary, and tertiary (three-dimensional) structures into final pharmaceutical protein products requires steps that ensure stability, structural integrity and function. The protein must be stabilized so it tolerates manufacturing processes and remains stable and active during transportation, storage, and administration. Formulation development aims to ensure the stability, efficacy, safety, and manufacturability of biopharmaceutical products by selecting appropriate excipients, buffers, pH, and dosage forms.
Factors that influence stability and efficacy include aggregate formation, which impacts protein structure (and therefore function). To understand the root cause of instability, it is important to distinguish aggregated active pharmaceutical ingredients (API) from other particle types. The Aura™ PTx 96-well aggregate and particle imaging system can rapidly size, count, and characterize particles and identify them as proteins, non-proteins, excipients, or other types of molecules. This system is much more efficient and accurate than traditional flow cell systems used to measure protein particle size and count.
Protein Aggregation Analysis
Aggregates in protein samples are clusters of misfolded or partially unfolded protein molecules, sometimes together with segments of broken polypeptide chains, which “come along for the ride” when manufacturing protein therapies. They can significantly reduce target protein stability and reduce the efficacy of a therapeutic agent. In addition, exposure to air, solid surfaces, light or changing temperatures (often parts of the pharmaceutical development process) can produce protein particles.
Successful protein therapies require product quality measurements in the late discovery/early development stages that detect, count and characterize particles and formulation excipients, ensuring protein stability. But traditional methods require hours of sorting through images, relatively large sample volumes, and the adoption of complex machine learning libraries, and they often overlook key sample aggregates and degraded compounds. Aura PTx combines Backgrounded Membrane Imaging (BMI) with two Fluorescence Membrane Microscopy (FMM) channels to quickly provide count, size and morphology, and differentiate cellular, protein or extrinsic aggregates.
Molecular Weight Characterization
Techniques for characterizing proteins by molecular weight include SDS-PAGE, MALDI-TOF MS, and size exclusion chromatography. SDS-PAGE is the most common technique, and separates proteins by molecular weight (and not by charge or folding). It’s often used in biochemistry, forensics, genetics and molecular biology. MALDI-TOF is a form of mass spectrometry that measures the mass-to-charge ratio of ions, from which the masses of proteins and peptides are determined. Once MALDI (matrix-assisted laser desorption/ionization) was developed, this technique became popular for studying proteins. Size exclusion chromatography, also known as gel filtration and often run on high-performance liquid chromatography (HPLC) systems, is a powerful technique for aggregate and fragment analysis in the research, development, and manufacturing of biotherapeutic proteins, such as insulin and monoclonal antibodies.
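As a small worked illustration of the mass-to-charge ratio mentioned above, the sketch below computes the m/z expected for a protonated ion of a given neutral mass and charge. It assumes positive-mode ions of the form [M + zH] with charge z; the function name is hypothetical. MALDI spectra are typically dominated by singly charged ions, so z = 1 is the usual case there.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def mass_to_charge(neutral_mass, charge):
    """m/z of an ion formed by adding `charge` protons to a neutral molecule."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def neutral_mass_from_mz(mz, charge):
    """Invert the relation above to recover the neutral mass."""
    return charge * mz - charge * PROTON_MASS
```

For example, a 1000.0 Da peptide would be expected near m/z 1001.007 as a singly charged ion and near m/z 501.007 as a doubly charged ion.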
HPLC Purity Analysis
High-performance liquid chromatography (HPLC), as mentioned above, is used for protein purity studies. It allows for highly selective separation of sample material. HPLC separates molecules as they pass through a column packed with a stationary phase, typically beads with a specific pore size distribution, charge, and/or affinity. Separation occurs when molecules of different sizes, charges or affinities are included in or excluded from the pores within the matrix, or are retained by it to different degrees. In size-exclusion mode, larger analytes elute first, while smaller molecules enter more of the pores and elute later.
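One common way to summarize a purity chromatogram is area percent: each integrated peak area divided by the total area. The sketch below assumes the peaks have already been integrated by the chromatography software; the peak names and areas are hypothetical values, not data from any instrument.

```python
def area_percent(peak_areas):
    """Convert integrated peak areas into percent of total area."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


# Hypothetical integrated areas from a size-exclusion run (arbitrary units).
example = {"aggregate": 2.0, "monomer": 97.0, "fragment": 1.0}
purity = area_percent(example)["monomer"]
```

Area percent assumes every species has a similar detector response, which is a reasonable first approximation for size variants of the same protein monitored by UV absorbance.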
Protein Charge Variants Analysis
Large biotherapeutic proteins can go through enzymatic post-translational modifications during manufacturing, such as glycosylation and C-terminal lysine truncation. In addition, chemical modifications can occur during purification and storage. Because of this, FDA and other regulators require that these proteins go through charge variants analysis. This analysis has traditionally been performed using ion exchange chromatography with salt gradients, while newer techniques rely on pH gradients instead of salt. Isoelectric focusing using capillary electrophoresis (CE) has also been used as a global method, but it can have long run times and is prone to human error. It also often requires a very pure sample, as it cannot handle contaminated or heterogeneous samples well.
Aggregation and Fragment Content Analysis
Aggregates and fragments are common impurities in biopharmaceuticals that have an impact on product efficacy, safety, and stability. Size-exclusion chromatography (SEC) can be coupled to electrospray ionization mass spectrometry (MS) for the characterization of size variants of therapeutic monoclonal antibodies (mAbs). Quadrupole time-of-flight (QTOF) MS can also be optimized for more sensitivity. In addition, analytical ultracentrifugation can aid in separating fragments.
Identifying and Quantifying Host Cell Protein Residues
Host cell proteins (HCPs) are impurities produced by the host organism during biotherapeutic manufacturing. Purification usually removes most of these residues from the final product, but residual proteins that remain with the manufactured antibody therapy, even at low levels, can be immunogenic. FDA and other regulators today dictate that practically no protein impurities be present in a final biotherapeutic product. Mass spectrometry (MS) techniques are useful for identification and quantitation of HCPs lingering in recombinant therapeutic products, and are an improvement over ELISA and other analytical methods. BMI and FMM microscopy analytical techniques in the Halo Labs Aura PTx system can also be very effective at detecting proteins and aggregates.
Sequence Coverage/Peptide Mapping
Determining the exact, accurate sequence of your therapeutic protein is critical to successful characterization. Peptide mapping works by digesting proteins down to their constituent peptides, and confirms the full sequence of the biotherapeutic molecule. Techniques for sequence coverage and peptide mapping include HPLC, and HPLC coupled with MS. MS techniques can provide highly accurate and selective data, especially with very complex biomolecules like monoclonal antibodies (mAbs).
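Sequence coverage is usually reported as the percentage of residues in the expected sequence that are covered by at least one identified peptide. The sketch below shows that calculation under simple assumptions: peptides are matched by exact substring search, and the sequence and peptides are toy values chosen for illustration.

```python
def sequence_coverage(protein, peptides):
    """Percent of residues in `protein` covered by at least one peptide."""
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:
            # Mark every residue spanned by this occurrence of the peptide.
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Toy example: 7 of 10 residues are covered by the three peptides.
coverage = sequence_coverage("ACDEFGHIKL", ["ACD", "DEF", "KL"])
```

Overlapping peptides count each residue only once, so coverage never exceeds 100%.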
N-terminal and C-terminal Sequence Confirmation
Amino (N-terminal) sequence analysis identifies the order of amino acids of your protein, starting at the N-terminus. N-terminal sequences significantly impact half-life, subcellular localization, and post-translational modifications of proteins. Similarly, C-terminal (carboxyl-terminal) sequence analysis identifies the order of amino acids at the C-terminal end of the protein. This can provide information on protein folding, secondary structure, and the arrangement of functional domains. Two principal methods for conducting this sequence confirmation are mass spectrometry (MS) and Edman degradation.
Post-translational Modification Analysis
Post-translational modifications (PTMs) are chemical changes made to proteins after they are translated. Epigenetics studies have shown that PTMs can be innocuous or can dramatically alter the function of proteins, and thus it is necessary to detect PTM effects in any biotherapeutic product. Common PTMs range from the very simple to the complex, and include phosphorylation, glycosylation, ubiquitination, nitrosylation, methylation, acetylation, lipidation and proteolysis. They are most often enzymatic (caused by enzyme actions). Techniques for identifying and analyzing PTMs vary by the type of PTM, and include SDS-PAGE, Western blotting, chromatin immunoprecipitation (ChIP), and other techniques.
Disulfide Bond Analysis
Disulfide bonds are covalent bonds between the sulfur atoms of cysteine residues. They are the most common covalent link found between polypeptide chains, and stabilize protein structure. Reducing agents like tris(2-carboxyethyl)phosphine hydrochloride (TCEP), beta-mercaptoethanol (BME), and dithiothreitol (DTT) can disrupt disulfide bonds and therefore protein stability. Disulfide bond analysis is usually conducted by MS and HPLC.
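One way MS data can inform disulfide analysis is through mass shift: reducing a disulfide bond adds two hydrogen atoms, so each bond raises the measured mass by about 2.016 Da. The sketch below estimates the number of disulfide bonds from masses measured before and after reduction. It is an illustration only, with hypothetical function names, and assumes the reduced cysteines are not alkylated and that no other modification changes between the two measurements.

```python
H_MASS = 1.007825  # monoisotopic mass of a hydrogen atom in Da


def disulfide_count(reduced_mass, nonreduced_mass):
    """Estimate disulfide bonds from the mass gained on reduction (2 H per bond)."""
    shift = reduced_mass - nonreduced_mass
    if shift < 0:
        raise ValueError("reduced mass should not be lower than non-reduced mass")
    return round(shift / (2 * H_MASS))
```

For example, a species that gains about 8.06 Da on reduction would be consistent with four disulfide bonds. Locating which cysteines are paired requires peptide-level analysis, which this mass comparison does not provide.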
Glycosylation Modification Analysis
Protein glycosylation is an important PTM, and can change protein folding, conformation, distribution, stability and activity. Glycosylation involves the addition of a variety of sugars to proteins, which then influence cell attachment to the extracellular matrix and protein-ligand interactions in the cell. Glycoproteins are detected, purified and analyzed by glycan staining and visualization, glycan crosslinking to agarose or magnetic resin for labeling or purification, and proteomic analysis by mass spectrometry.
Analysis of Glycosylation Sites and Glycoforms
Glycosylation will vary according to where in the cell the sugar molecule binds, and according to the structure and chemical binding site of the sugar molecule. These oligosaccharides can affect protein-protein interactions by either allowing or blocking proteins from binding to cognate interaction domains. Since they are hydrophilic, they can also alter protein solubility. Techniques for analysis include glycan staining, enrichment and analysis using lectins, and mass spectrometry.
Analysis of Sialic Acid
Sialic acids are among the most prevalent sugars found on mammalian cell surfaces. They are essential to cell adhesion and immune modulation, and they bind selectins and lectins. Sialic acid and monosaccharide analysis is required by ICH Q6B guidelines for biopharmaceutical production. Typically, sialic acid is quantified by fluorescence labeling followed by HPLC with fluorescence detection.
Overview: Protein Characterization Technology
Proteins are arguably the most complex molecules in living cells and tissues, and their use as biopharmaceuticals requires integrating various technologies for comprehensive analysis to ensure their safety and efficacy.
Mass Spectrometry Characterization
As described previously, mass spectrometry is an invaluable tool to characterize and analyze proteins as well as fragments, aggregates and PTMs that can impact protein development and stability. MS techniques are used to determine structure, function, folding and interactions, to identify proteins from the masses of their peptide fragments, and to accurately quantify proteins in a sample. The development of high-throughput and quantitative MS proteomics within the last 20 years has expanded the scope of MS.
Characterization by Circular Dichroism Spectrometry
Circular dichroism (CD) measures the differential absorption of left- and right-handed circularly polarized light (LCP and RCP) by optically active compounds at different wavelengths. The incident light on the sample switches between LCP and RCP. As the polarization switches, the absorption changes and the differential molar absorptivity can be calculated. CD can track secondary structure changes and set the stage for a conformational analysis of proteins. CD can also be used to examine protein and nucleic acid stability (folding, unfolding and refolding) as a function of pH, denaturants, temperature and other factors.
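To compare CD spectra between proteins of different size and concentration, raw ellipticity is often normalized to mean residue ellipticity. The sketch below applies the commonly used conversion [θ] = θ × MRW / (10 × l × c), with θ in millidegrees, path length l in cm, concentration c in mg/mL, and mean residue weight MRW = M / (N − 1). The function and parameter names are hypothetical, and the input values in the usage note are made up for illustration.

```python
def mean_residue_ellipticity(theta_mdeg, mol_weight_da, n_residues,
                             conc_mg_per_ml, path_cm):
    """Mean residue ellipticity in deg cm^2 dmol^-1.

    theta_mdeg      observed ellipticity in millidegrees
    mol_weight_da   protein molecular weight in Da
    n_residues      number of amino acid residues (N); N - 1 peptide bonds
    conc_mg_per_ml  protein concentration in mg/mL
    path_cm         cuvette path length in cm
    """
    mean_residue_weight = mol_weight_da / (n_residues - 1)
    return theta_mdeg * mean_residue_weight / (10.0 * path_cm * conc_mg_per_ml)
```

For instance, an observed signal of -20 mdeg for an 11,000 Da, 101-residue protein at 0.1 mg/mL in a 0.1 cm cuvette corresponds to roughly -22,000 deg cm² dmol⁻¹.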
FMM/SIMI/BMI
Forming the basis of Aura’s protein and protein fragment analysis, Aura PTx is the first and only system that combines Backgrounded Membrane Imaging (BMI) with two Fluorescence Membrane Microscopy (FMM) channels to quickly provide count, size, ID, imaging and morphology, differentiating cellular and protein aggregates, excipients such as degraded polysorbate, and extrinsic particles.
Dynamic Light Scattering
DLS is used to measure particles from 0.3 nm to 10 µm. These instruments rely on Brownian motion, in which smaller particles diffuse faster than larger ones. A laser illuminates the particles and the scattered light is analyzed. The rate at which the scattered intensity fluctuates reflects how fast the particles diffuse, and is used to determine the size distribution in a sample.
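The link between diffusion and size is commonly expressed with the Stokes-Einstein relation, R_h = k_B T / (6 π η D), which converts a measured diffusion coefficient D into a hydrodynamic radius. The sketch below assumes spherical particles in a dilute solution; the default temperature (25 °C) and viscosity (approximately that of water at 25 °C) are illustrative assumptions, and the function name is hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def hydrodynamic_radius(diff_coeff_m2_s, temp_k=298.15, viscosity_pa_s=8.9e-4):
    """Hydrodynamic radius in meters from the Stokes-Einstein relation."""
    if diff_coeff_m2_s <= 0:
        raise ValueError("diffusion coefficient must be positive")
    return BOLTZMANN * temp_k / (6 * math.pi * viscosity_pa_s * diff_coeff_m2_s)


# A diffusion coefficient of 4.0e-11 m^2/s corresponds to a radius of about 6 nm.
radius_nm = hydrodynamic_radius(4.0e-11) * 1e9
```

Because the radius is inversely proportional to D, a particle that diffuses half as fast is reported as twice as large, which is why slowly diffusing aggregates stand out in DLS data.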
Characterization Through X-ray Crystallography
X-ray crystallography is based on the diffraction of x-rays as they pass through a crystallized form of a target molecule (in this case, a protein). It is an accurate way to determine the three-dimensional structure of a protein. To use this technique, the crystallographer obtains protein crystals, records the diffraction pattern formed by x-rays passed through the crystals, and then interprets the data using computer software. The result is an atomic-resolution model of a protein.
Nuclear Magnetic Resonance Spectroscopy Characterization
Nuclear magnetic resonance (NMR) occurs when nuclei in a static magnetic field are perturbed by an oscillating magnetic field; the nuclei generate an electromagnetic signal whose frequency depends on the magnetic field applied. Along with x-ray crystallography and cryogenic electron microscopy, NMR is one of three techniques that are used to elucidate the structure of proteins. Unlike magnetic resonance imaging, protein NMR uses algorithms to create three-dimensional models of the sample of interest. Protein NMR is conducted on thoroughly purified samples.
Cryo-Electron Microscopy Characterization
Another technique that can determine the structure of a protein, cryo-EM is an emerging and promising technology for obtaining high-resolution membrane protein structures, which are often difficult to obtain with other techniques like NMR or X-ray crystallography. Cryo-EM encompasses several approaches, including single particle analysis (SPA), cryo-electron tomography (cryoET), and micro electron diffraction (MicroED). It has been revolutionary in structural biology, infectious disease research, and drug discovery.
Advancing Protein Characterization
Because of the complex nature (not to mention importance) of proteins, it’s necessary to develop integrative approaches that combine multiple techniques for comprehensive analysis and protein characterization. Future trends in characterization appear to lie in higher throughputs and better atomic-level resolution, as well as the ability to quickly and reliably determine non-proteins and protein fragments and aggregates, particularly as more protein-based biotherapeutics are developed and enter the marketplace.