3.1. Comparison of GST Sequences from Drosophila melanogaster Using Multiple Sequence Alignment
Figure 2A presents a sample of multiple sequence alignment generated using the 36 GST sequences from
D. mel; the full MSA is given in the
Supplementary Materials (Figure S1). First, in the GSTome, sequence lengths vary between 199 (GSTD3) and 268 residues (GSTT3); this means that there is potentially a large number of gaps between GST sequences. As shown in
Figure 2B, the number of gaps can reach the value of 35. By looking at the probability distribution, two distinct peaks were observed, one for a very few number of gaps (between 0 and 5 gaps) and one for a very large number of gaps (between 28 and 35 gaps). Overall, around 45% of GST sequences do not present a gap and 63% present 6 gaps or less. Per class, sequence lengths vary from 199 to 224 residues for class Delta; from 220 to 240 for class Epsilon; from 241 to 254 for class Omega; from 228 to 268 for class Theta; from 227 to 246 for class Zeta. GSTS1 enzyme is the only GST of class Sigma and has a sequence length of 249 residues. GSTs, which belong to class Delta, are shorter than the others, inducing several gaps when comparing two sequences from a different class. This is especially true for MSA numbers
, associated with the N-terminal (
Figure 2B). MSA numbers 118–121 and 249–256 also present a large number of gaps between GST sequences due to specific insertions in the evolution of class Omega. By analyzing the complete MSA, only three MSA numbers are associated with a perfect 100% conservation score. They correspond to Pro113, which have been found to be an essential constituent of the G site of GSTs [
6,
32]; Leu136 and Asp237, which is known to be part of a N-capping box motif [
49]. These last two residues are located in
and
helices of the structure. For all the other MSA numbers, the conservation score is less than 100% and can vary from 14% (MSA number 63) to 97% (MSA number 132). Similarlly, the total charge of GST proteins in the
D. mel GSTome can strongly vary, from −15 (GSTS1) to +17 (GSTZ1). In GSTS1, 11 negatively charged amino acids are located in the N-terminal part (MSA number < 50); this region includes a large number of gaps in the MSA, as shown in
Figure 2B. In GSTZ1, six positively charged amino acids are located in the N-terminal, and most of the others are located in domain I which has a total charge of +12 (MSA number < 140). On average, GST classes Delta and Sigma are globally negatively charged, whereas classes Epsilon, Omega, Theta, and Zeta are globally positively charged. However, the total charge of GST in class Epsilon is on average null and GST class Zeta is the most positively charged class among the six of
D. mel. GSTome. Moreover, for class Delta, with the exception of GSTD8 (−11), the total charge is between −4 (GSTD6) and +2 (GSTD1); for class Epsilon, the total charge is between −8 (GSTE4) and +6 (GSTE10); for class Omega, the total charge is between +3 (GSTO3) and +7 (GSTO1 and GSTO2iA); for class Theta, the total charge is between +6 (GSTT3) and +10 (GSTT4); for class Zeta, the total charge is between +8 (GSTZ2) and +17 (GSTZ1). The exact location of positively and negatively charged amino acids as a function of MSA number for the complete GSTome is given in the
Supplementary Materials (Figure S2).
Figure 2D presents four examples of conservation scores for two well-known biological features of GST. First, the active residue of the G site is characterized by a Cystein for all four sequences in class Omega and by a Serine for most of the other classes. This residue has been identified in domain I (N-terminal) at MSA number 64. The conservation plot shows a high score for Serine (≈
). Exceptions concern Gly64 for GSTD2, which presents the unique Glycine patch among GST [
29] and for GSTD5, which does not correspond to a Glycine patch. GSTD7 is characterized by an Alanine, GSTS1 by a Lysine, and GSTT4 by an Aspargine at MSA number 64. Second, a central lock and key motif of GST has been identified for enzymes in classes Delta and Epsilon. For GST class Delta, this key motif, referred as the “Clasp” motif, comprises Tyr165 and Met168 [
28]. For GST in class Epsilon, this motif is different and is referred as the “Wafer” motif (
Figure 1). It comprises His130, His165, and Ser168 [
8]. For these three MSA numbers, the conservation score is below 40%, with Arginine and Histidine residues at MSA number 130; but, interestingly, we found Leucine residues that are specific to class Omega, and Valine and Isoleucine residues that are specific to the other classes of
D. mel. The same observation is performed for Tyr165, which is specific to class Delta (except for GSTD11iA with a Glutamine). Class Epsilon is mostly characterized by Histidine residues at this MSA number, except for GSTE1 and GSTE2, characterized by a Phenylalanine of motif “Clasp+Wafer” [
30]; GSTE11 and GSTE14 being characterized by an Histidine and an Isoleucine, respectively. GST classes Omega, Sigma, Theta, and Zeta are mostly characterized by negatively charged amino acids at MSA number 165, i.e., Aspartic acid, except for GSTT4 (Alanine). MSA number 168 is the less conserved among motif residues with a large variability in the nature of the amino acid observed for the 36 sequences of
D. mel, with Methionine and Histidine residues being the most conserved for classes Delta and Epsilon, respectively. Interestingly, Cystein residues are also identified for class Zeta, which are probably not directly related to the binding of GSH. However, it may have a crucial role in the stabilization of GST dimeric structure for this class specifically (see
Section 2.4). Finally, we compared sequence identity between pairs of GST sequences. As shown in
Figure 2E, sequence identities vary from 11% for the pair GSTS1–GSTT4 to 79% for the pair GSTD2–GSTD5. Moreover, sequence identities within a class vary between 45% and 64%. GST classes Delta and Epsilon show very similar sequences when compared to each other, in contrast to the other classes, with 34% sequence identity on average and below 20% for the other classes. Clusters of sequence identity within a class were identified: GSTD1 to GSTD5 for class Delta and GSTE5 to GSTE8 for class Epsilon. GSTD11iA, GSTE14, and GSTT4 present sequence identities with the other classes that are much lower than the other members.
3.2. Analysis of the Global and Local Conformations of GST from Drosophila melanogaster and Comparison between AlphaFold Predictions and Experiments
From the 36 sequences identified in
D. mel GSTome, only 12 structures corresponding to 8 different GSTs have been resolved experimentally using X-ray crystallography and deposited in the Protein Data Bank (
Table 1). In detail,
3ein [
4],
3mak [
28] (GSTD1),
3f6f, 3gh6 [
28] (GSTD10) and
6t2t [
33] (GSTE14) PDB structures contain one monomer per asymmetric unit, whereas
5f0g [
29] (GSTD2),
4png [
31] (GSTE7),
6kel, 6kem [
32] (GSTE14), and
1m0u [
34] (GSTS1) PDB structures contain dimeric structures. In addition,
4yh2 [
8] and
4pnf [
31] (GSTE6) are PDB complexes of two and four dimeric structures, respectively. GST experimental structures are given in the
Supplementary Materials (Figure S3). Except for
1m0u (GSTS1), XRD structures present a different number of residues in their monomeric subunits. When compared to their gene sequence length given by UniProt, the number of residues missing in XRD structures can vary from 1 (
3mak, GSTD1 and
3f6f, GSTD10) to 46 amino acids (
1m0u, GSTS1). Particularly, 32 and 46 residues are missing for
6kem (GSTE14) and
1m0u (GSTS1), respectively. Missing parts are located in the N-terminal, which have been either removed before crystallization or had too low electron density to be detected. Historically, in 2003, Agianian et al. published GSTS1 structure (
1m0u), proposing that GST class Sigma is not related to the detoxification but rather to protect tissues from oxidative stress [
34]. It has also been shown that GSTS1 from
D. mel has low activity toward the commonly used synthetic substrate 1-chloro-2,4-dinitrobenzene (CDNB), but has relatively high glutathione-conjugating activity for 4-hydroxynonenal (4-HNE), an electrophilic aldehyde derived from lipid peroxidation [
50]. In 2010, Low et al. studied recognition and detoxification of the insecticide DDT by GSTD1. They observed that
helix occluded the H site in its APO form while still being able to metabolize DDT [
4]. In 2012, Wongsantichon et al. observed structural transition after ligand binding in both GSTD1 and GSTD10 [
28]. In 2015, Scian et al. compared GSTE6 and GSTE7 structures, concluding that they were structurally and functionally comparable [
31]. In addition, it has also been shown that GSTE6 and GSTE7 display outstanding catalytic activities with several substrates, particularly with environmental pollutants [
51,
52]. In 2018, Gonzalez et al. studied GSTD2 and observed a unique “GGGG” motif called the Glycine patch [
29]. In 2020, Koiwai et al. studied GSTE14 and its implication in ecdysteroid synthesis [
32]. In 2020, Skerlová et al. demonstrated that GSTE14 is catalytically active with steroid substrates [
33]. Finally, in 2024, Schwartz et al. studied GSTE1 and characterized its dimerization interface from the XRD structure. Particularly, they showed that the interface contains a central lock and key motif that is an hybrid motif between “Clasp” and “Wafer” [
30]. To summarize, all the experimental structures of GST from
D. mel represent in total only 8 out of the 36 sequences of the complete GSTome.
Therefore, we performed structural predictions of GST proteins including their GSH cofactors for the 36 sequences of
D. mel, as explained in
Section 2.2.
Figure 3A presents AlphaFold prediction scores (pLDDT) as a function of MSA number. Overall, pLDDT score along the sequence is larger than 95% for all GSTs, with two main exceptions observed for N- (MSA number
) and C-terminal domains (MSA number
). In addition, GSTO1 also shows prediction scores ≈60% for MSA numbers 175–210 and 240–260, associated with the loop between
and
helices and to the end of
helix, respectively. This structural part of GST only exist for class Omega, with gaps in the MSA for the other classes (
Figure 3A). The corresponding 3D structures are shown in
Figure 3B and in the
Supplementary Materials (Figures S4–S9) and secondary structure prediction as a function of MSA number is presented in
Figure 3C. Although the 3D folds of GST look very similar, the local structural properties can vary in the
D. mel GSTome due to additions/insertions/deletions during the evolution. First, the N-terminals of the GSTS1 and GSTT1 structures are predicted as loops that are intrinsically disordered and that have been badly predicted by AlphaFold (pLDDT
). Second, the thioredoxin-like fold, indicated by the
-
-
-
-
-
-
labels in
Figure 3C, is very well conserved. Third, as mentioned in
Section 3.1, GSTD3 is characterized by the shortest sequence among all GSTs of
D. mel and is associated with a missing
-sheet and a cropped
helix in its N-terminal (domain I) compared to the other structures, starting at MSA number 72. Consequently, GSTD3 is missing Ser64, which is characteristic of the active site in GST. In addition, GSTD7 is characterized by a longer
helix when compared to other GSTs and classes Omega and Zeta exhibit structures with shorter
helix. Aside from that, all GSTs are characterized by identical secondary structures in domain I. The linker loop between domains I and II is located at MSA numbers 140–153. Domain II is the one that shows the biggest differences in secondary structure predictions compared to domain I, which is more conserved through evolution. For example, we identified potentially five
helices that have a relatively well defined positions in the sequence but with some variations between GSTs. In addition, part of the structure between MSA numbers 168 and 176 contains loops that shorten the
helix for GSTE4–GSTE11. This is the range of MSA numbers that corresponds to the “Wafer” motif in the dimerization interface, as mentioned above. From 3D structures, we observed that these interactions between monomeric subunits in this region seems to deform the
helix, leading to a partial local unfolding. In addition, this structural feature is also visible in the structures of GST in classes Delta, Theta, and Zeta but for a fewer number of residues. Moreover, C-terminal part of GSTE10 (MSA numbers 310–332) presents longer
helix compared to the other GSTs, which is oriented towards the outside of the structure (
Figure 3B). However, this local structure has a pLDDT score that is much lower than for the other regions of the protein (<60%). Finally, a set of two extra
sheets located between
and
helices on one side, and in the C-terminal on the other side, were found in GSTE1–GSTE9 (
Figure 3C). These additional secondary structures are very specific and may have an impact on the dynamics of GST structures, introducing extra local rigidity. Identically, a short
sheet of two residues was predicted for GST in class Omega with a very high pLDDT score from AlphaFold, whereas an extra
helix was also predicted in the N-terminal part of GSTT3 but with a prediction score below 50% (
Figure 3B).
For all pairs of AF-predicted structures, we computed the global deviation of the backbone between GST proteins.
Figure 4A shows a matrix representation of the pairwise computed RMSD. Overall, RMSD values vary between 0.8 (pair GSTE5–GSTE6) and 10.0 Å (pair GSTE10–GSTO1). Since thermal fluctuations are associated with values of the order of magnitude of ≈2.0 Å, pairs of GST structures with a pairwise RMSD lower than 2.0 Å are considered having an identical 3D fold. With this criterion, identical folds were identified in GST classes Delta (11 members) and Zeta (2 members); class Sigma containing only 1 member. On the contrary, GST classes Omega and Theta (4 members each) present the largest global deviation within a class, whereas GST class Epsilon (14 members) is the one showing the largest variability, particularly for structures GSTE10 to GSTE14 (
Figure 4A). For instance, global deviation ranges between 0.8 Å (GSTE5-E6) and 5.4 Å (GSTE10-E11). Moreover, GST structures which belong to different classes are characterized by non-similar 3D folds when compared to each other, which means that the global fold is strongly class-dependent. Fold similarity between GST classes Delta and Epsilon is the largest, with RMSD values on average around 3.8 Å compared to RMSD values on average around 6.0 Å between the other classes (
Figure 4B). The two classes showing the largest differences of 3D fold within each other are GST classes Epsilon and Omega, even though they are characterized by similar sequence lengths. This observation is also correlated with the sequence similarity already observed in
Figure 2E. Structurally, it comes from the fact that the dimerization interface is much smaller in GST class Omega than all the other classes, which leads to much more separated monomeric subunits and from the C-terminal domain. As shown in the
Supplementary Materials (Figure S10), the radius of the gyration of GST structures in class Omega is around 24 Å, which is a little bit larger than the mean value observed for the other classes (22 Å). Within a class, the radius of gyration is ≈21 Å for classes Delta and Epsilon, with the exception of GSTD7, GSTE10, and GSTE14, which are above 22 Å. GST classes Theta and Zeta are characterized by a radius of gyration ≈23 Å, with the exception of GSTZ1 (29 Å). Exceptions listed here generally have longer N- and C-terminal parts. In the case of GSTD7 and GSTE10, it is a longer C-terminal
helix. In the case of GSTZ1, it is a long N-terminal loop; for GSTE14, it is both the N- and C-terminal loops that significantly increase the radius of the gyration. For experimental structures, and as mentioned in
Section 3.1, the terminal parts are usually removed during the structure resolution process.
Figure 4C shows local deviation as a function of MSA number for two selected pairs of GST structures, within a class and between GST structures from two different classes. On the one hand, the local deviation for the pair GSTE5–GSTE6 is always below 2.0 Å, which shows how similar the two folds are, both at a global and at a local scale. Structural similarities between GSTD6 and GSTD7 have already been pointed out experimentally [
31] and, from the present work, the structural similarities are confirmed for GSTE5-E8 (
Figure 4A). For pair GSTE10–GSTO1, the local deviation can be as large as 20.0 Å, which confirms significant differences between the two folds. Particularly, the C-terminal domain (MSA numbers 280–300) is characterized by residues with large deviations (≈10.0 Å), associated with the
helix on both structures. Surprisingly, MSA numbers 160–210, corresponding to
and
helices, also show large local deviations. It is probably due to the fact that monomeric subunits of GSTO1 are much more separated compared to the ones of GSTE10. MSA numbers 80–100 show different secondary structures locally, a loop for GSTO1 and an
helix for GSTE10, leading to significant conformational changes. It is noteworthy that, even if global folds present significant differences, local deviations for MSA numbers 55–95 (in the conserved domain I) are very low (≈1.0 Å). We performed the same analysis of local deviation along the amino acid sequence by comparing AF-predicted structure of GSTE7 with its corresponding XRD structure (
4png), as shown in
Figure 4D. First, the backbone of GSTE7, as predicted by AF, is in agreement with the experimental data, with an associated global deviation between both structures of 0.63 Å. The main AF/XRD structural differences are observed for residues Met53 (Met1 in GSTE7), Pro54 (Pro2), Gln191 (Gln124), Thr192 (Thr125), Ser315 (Ser222), and Asn316 (Asn223), which are all located in loops in the N- or C-terminals. We also compared local deviations computed using Cartesian coordinates (
Figure 4D, top panel) with the same local deviation computed using internal coordinates, the coarse-grained angles
of the backbone [
40]. These local probes are more suitable to analyze local deviations since they are not dependent on the algorithm used to perform the structural alignment between the structures. As shown in
Figure 4D (middle panel), we found that structural differences observed using Cartesian coordinates for
sheet and
helix are significantly reduced, which means that the local conformation of the backbone is in fact very similar. On the other hand, large local deviations observed for
and
helices from Cartesian coordinates are also very large, which confirms significant differences between AF and XRD structures of GSTE7 in this region. Last but not least, we performed the same study for the side-chain local conformations predicted by AF compared to XRD (
Figure 4D, bottom panel). We used as internal coordinates
dihedral angles as probes to characterize the local conformation of the side chains. Overall, there are many more differences in the side chain conformations compared to the backbone, which means that AF fails to predict the correct structure in specific locations. This is particularly true at the dimerization interface of GSTs. Moreover, as already observed from local deviation analysis of the backbone, the region between the
and
helices shows large deviation. Surprisingly, Ser168 (Ser104 in GSTE7) located in the “Wafer” motif has a
distance of ≈1, which means that the local conformation is in opposite direction between the AF and XRD structures, equivalent to a cis vs. trans local conformation of this amino acid side chain in the experimental structure. By looking deeply into the XRD structure of GSTE7 (
4png), there is a water molecule at the interface between the two monomers close to Ser104 that could play a role and also explain the orientation of the side chain. In addition, the temperature as well as the periodicity of the crystal are experimental conditions that may induce such differences. The relaxation of side chains predicted by AF using a classical force-field, for example, is therefore necessary if we are to delve deeper into the comparison between AI-predicted and experimental structures.
3.3. Characterization of the Dimerization Interface and of the GSH Binding Sites of GST Structures from Drosophila melanogaster
The two main structural features of GST related to their biological functions are the dimerization interface (DI) and the GSH binding site (GS). From the AF-predicted structures, we extracted residues which belong to the DI and to the GS (see
Section 2.4) in order to characterize similarities/differences in the complete GSTome from
D. mel.
Figure 5 presents the study of the DI and, as shown in
Figure 5A for all GSTs, residues in the DI are located in both domain I and II, essentially in the
helices. GST class with the largest group of residues in the DI is Sigma, with
40, whereas GST class with the smallest group of residues in the DI is Omega, with
20. For classes Delta, Epsilon, Theta, and Zeta, the number of residues in the DI is similar, between 25 and 35 residues. The total charge of the DI is a well-conserved property in the
D. mel GSTome. Overall, the DI of GST structures is mostly positively charged and comprises between +2 and +6 for five of the six GST classes. The exception concerns GST class Omega, which presents mostly negatively charged DI (see
Figure S11). Then, we identified MSA numbers presenting a singular behavior in
D. mel GSTome. Particularly, we focus on MSA numbers 126, 127, 131, 158, and 162, which were found to belong to the DI of all 36 GST structures. These residues are located in the
and
helices and their nature in the different sequences of the GSTome is conserved or not, as depicted in
Figure 5B. For instance, residues with MSA numbers 126, 131, and 158 are mainly hydrophobic residues interacting with each other, with conservation scores between 80 and 100%. Residues at MSA number 127 are mainly hydrophobic, with a majority of Tryptophan (in 55% of the GSTome), whereas residues at MSA number 162 are mainly hydrophilic residues (polar neutral), with a large majority of Glutamine (in 58% of the GSTome). As shown in
Figure 5, the fact that these five residues are present in the DI of all the GSTs of
D. mel with similar type of interactions means that there is a particularly high conservation of the DI for this organism during evolution; a modification of these amino acid by mutations, for example, might disrupt the homodimeric structure. In addition, 19 residues were found to belong to the DI of the large majority of the GSTome, with at least 30 of the 36 sequences involved. It confirms the high structural conservation of this feature, which is essential for dimeric GST to perform their catalytic activities. It corresponds to MSA numbers 108, 109, 124, 126, 127, 128, 130, 131, 135, 138, 154, 155, 158, 162, 165, 166, 168, 169, and 219. In addition, GST class Omega is characterized by less residues in the DI, specifically at the extremity of
helix and inside the
helix (
Figure 5A). This is a consequence of the large separation observed between the two monomeric subunits, already mentioned in
Section 3.2. On the other hand, GST class Zeta is characterized by a larger number of residues in the DI for these specific MSA numbers, which impacts the stability of the dimeric structures in this class.
To further investigate this observation, we computed the energy (in kJ/mol) and the surface area (in Å
2) of the DI of all 36 structures in the GSTome, by taking into account the physical/chemical nature of the interactions using a classical force field (see
Section 2.4). As shown in
Figure 5C, the surface areas of the DI within a class are almost identical, whereas energies show a much larger variability. This means that the nature of the contacts between the residues at the interface is more relevant to characterize the DI than just their count. For instance, GSTS1 is the protein that presents both the largest number of residues in the DI and the largest surface area. However, GSTS1 is characterized by an energy of the DI very close to the median value of the GSTome. Furthermore, class Theta, which comprises enzymes that show surface areas of the DI very close to the median value of the GSTome, is one of the most stable classes. Finally, the most intriguing class is class Zeta. GSTZ1 and GSTZ2 show very similar sequences (sequence identity larger than 70%), very similar 3D folds (RMSD smaller than 2 Å), and very similar surface areas to those of the DI (≈2100 Å
2). However, they are characterized by energies of the DI extremely different. GSTZ1 presents an energy of the DI around −1750 kJ/mol, whereas GSTZ2 presents an energy almost two times larger, i.e., around −3230 kJ/mol. To gain insights into this particular behavior of GSTZ2, we looked at the 3D atomic structure. As shown in
Figure 5D, the two monomeric subunits are very close to each other, in contrast with class Omega. In addition, the DI does contain several Histidine and Glutamic acid, which involves strong electrostatic interactions between the monomeric subunits of GSTZ2. Such interactions have a major impact on the dynamics and flexibility of the protein and therefore in the biological function. For example, it is very unlikely that one would observe the breathing modes of the binding sites of GST for this class, as observed previously for GSTD1 and GSTD10 [
28]. By comparing the DI residues for GSTZ1 and GSTZ2, differences are identified at MSA numbers 127–128, which have a Cys–Asp motif for GSTZ1 and an Ile–Glu motif for GSTZ2; MSA numbers 145–155 have a Pro–Val motif for GSTZ1 and a Val-His motif for GSTZ2; MSA numbers 166 and 179 have a Leucine and a Serine for GSTZ1 that are both substituted by Isoleucines in GSTZ2; MSA number 186 has an Isoleucine for GSTZ1 that is substituted by a Valine; MSA number 215 has a Glycine in GSTZ1 that is substituted by an Arginine in GSTZ2.
Next, a similar analysis was performed to characterize the GSH binding site (GS) of the complete
D. mel GSTome (
Figure 6). Structures were predicted using AlphaFold and AlphaFill, as described in
Section 2.2 and summarized in the
Supplementary Materials (Table S1). First, we found that residues in the GS are mostly located in domain I (
Figure 6A), except for MSA numbers 170–180, located in the central
helix of GST structures. The total charge in the GS is also a well-conserved property in the
D. mel GSTome. Overall, the GS total charge is slightly positive, between +1 and +2 for 86% of the GSTome, except for five GST structures of class Epsilon, i.e., GSTE1 and GSTE9–E12, which present a total charge of +3 or +4 (see
Figure S11). Compared to the dimerization interface, there is no MSA number in the GSTome that shows a 100% identification score for the GSH binding sites. It comes from the fact that the N-terminal of GSTT3 structure is in the G site. It might be an artifact of AlphaFold, since, for this particular GST, the pLDDT is <
and no GSH cofactors were found below sequence identity of 25% (see
Section 2.2 and
Table S1). Nevertheless, four residues belonging to the GS of 35 out of 36 GST sequences were identified at MSA numbers 111, 112, 113, and 129, which are located in the loop between
helix and
sheet in the thioredoxin fold, whereas MSA number 129 is located at the very beginning of the
helix. Residues at MSA numbers 111, 112, and 113 form a TVP sequence motif, which is highly conserved in the
D. mel GSTome, particularly Pro113, which is conserved in 36 sequences of the GSTome. Val112 can be replaced by other hydrophobic non-aromatic amino acids such as Isoleucine; this is the case in GST class Delta. Thr111 is less conserved than the other two residues and can be replaced by positively charged Lysine; this is the case for the GST in classes Omega, Theta, or Zeta. This motif has been demonstrated to be involved in the stabilization of GSH in the GS [
8]. We also identified four more residues at MSA numbers 64, 66, 110, and 128 that appear in the GS of 30 out of 36 sequences in the
D. mel GSTome; this makes them of strong interest in the present work. Among them, we found a residue at MSA number 64, which corresponds to the active Serine already mentioned in
Section 3.1. This residue is present in the GS of all GST structures except GSTD2, GSTD3, GSTS1, and GSTT3. Finally, class Omega does not include residues located in the
helix, was already the case for the DI, whereas class Zeta includes several residues located in this secondary structure. As explained above for energies of the DI, the fact that there are several contacts does not always mean that there are very strong interactions between residues and ligands. However, it is a good indicator of pocket volume accessible for the ligand to bind. Therefore, GSTs in class Zeta present binding pockets that appear to be more closed when compared to other classes.
Last but not least, we studied communication pathways between the two G sites of GST homodimeric structures. Basically, we identified residues involved in all possible existing pathways between the two GSH binding sites. Pathways were defined as a group of interacting residues that were visited to go from one GSH-bound site to the other APO (empty) site, as shown in
Figure 6D. In this context, it has been shown experimentally that the presence of a GSH ligand in the one binding site induces modifications in the other monomeric subunit, corresponding to allosteric effects in GST [
28]. From this analysis, all of the residues previously identified to belong to the GS were systematically involved in communication pathways. Moreover, several residues located in the middle of
helix were also identified. When compared to
Figure 5A, this property is associated with the central motif in the dimerization interface. Surprisingly, none of the pathways identified residues located in
helix of the DI, which might be due to the presence of a GSH ligand in the pocket. Second, some GST sequences show residues in
and
helices but they are not highly conserved in the GSTome. For example, MSA number 243 in
helix is visited in GST structures of classes Omega, Sigma, and Theta only. The most-visited residue in the GSTome is the residue at MSA number 165, which is located in the middle of the
helix. This residue has been identified in the GS for the complete GSTome and its nature is class-dependent. It is mostly a Tyrosine for GST in class Delta, an Histidine for GST in class Epsilon, and a Glutamic or Aspartic acid in GST classes Omega, Theta, and Zeta.
Figure 6D shows the structure of GSTZ2 with residues identified in the GS and an example of pathway passing through Glu165. It illustrates how the presence of a GSH ligand in the G site of one subunit involves a relatively short pathway to reach the other G site. Moreover, we saw that Glu165 (Glu114 in GSTZ2) in a given monomer is in contact with Glu128 (Glu81 in GSTZ2) of the other monomer. This is a very common interaction that has been identified in communication pathways of GST from the present work. Finally, by looking at the number of residues involved in communication pathways for each GST sequence, we observed that the average length is similar in the
D. mel GSTome (≈30 residues), with GSTD3 that shows the shortest average pathway (20 residues) and GST in classes Sigma and Zeta that show the longest average pathways (≈40 residues).
To summarize, we identified an ensemble of key residues in
D. mel GSTome, i.e., 10 in the DI and 8 in the GS that plays a crucial role. These residues, according to their MSA numbers, are the following: 108, 109, 124, 126, 127, 128, 130, 131, 135, 138, 154, 155, 158, 162, 165, 166, 168, 169 and 219 for the DI and 64, 66, 110, 111, 112, 113, 128, and 129 for the GS. Most of these residues have been highlighted in
Figure 2A and their conservation plots are presented in the
Supplementary Materials (see Figure S11). Among these residues, the-least conserved residue is at MSA number 168, with a maximum conservation score of 25% in the GSTome; this is very class dependent. In detail, it is a Methionine in GST class Delta; Serine in class Epsilon; Phenylalanine in class Omega; Histidine in class Theta; Cystein in class Zeta. The second least conserved residue is found at MSA number 165 (
Figure 6B), which is located at the center of GST structures and close to the GS of both monomeric subunits. Its maximum conservation score is of 27% in the GSTome and this is also very class dependent, as described above. In addition, this specific location is part of “Clasp” and “Wafer” motifs in GST classes Delta and Epsilon and we also identified its crucial role in GST structures of the other classes. Another case can be highlighted, MSA number 130, which corresponds to positively charged amino acids in GST classes Delta and Epsilon; and hydrophobic residues in the other classes. Finally, MSA numbers 113 and 128 show much higher conservation, with Pro113 which belongs to the GS; these have perfect conservation in the GSTome, and have negatively charged Asp128 or Glu128. From our analysis, it is the only residue that is in both the DI and the GS of GST structures. The overall high conservation of these MSA number residues emphasizes their important role in GST from
D. mel.
3.4. Analysis of the Structural Dynamics of GST from Drosophila melanogaster
From X-ray crystallography experiments, thermal B-factors, also called Debye–Waller factors or temperature factors, can be measured. They describe the attenuation of X-ray scattering caused by thermal motion. From AF-predicted structures, this information is not yet available, but physical models can be used to estimate structural fluctuations from atomic coordinates. One of them is normal mode (NM), which is a technique that can be used to describe flexible states accessible to a protein around its equilibrium position, in the harmonic approximation of small atomic displacements. In the present work, we applied an NM analysis (see
Section 2.5) to predict thermal B-factors of the 36 AF-predicted structures from
D. mel. First, we compared theoretical B-factors with experimental data (see
Figure S12). For GST structures listed in
Table 1, the comparison between the measured and predicted XRD thermal B-factors was very satisfying, with an average correlation of
. The best prediction was obtained for GSTD2 (
5f0g) with
; the worst prediction was obtained for GSTS1 (
1m0u), with
between the theoretical and experimental data. This particular behavior of GSTS1 is due to very large fluctuations measured experimentally in the region around MSA number 190, which corresponds to an external loop located between the
and
helices in domain II (see
Figure S12). These fluctuations are not reproducible using NM calculations and are probably due to anharmonic oscillations in the structure.
Figure 7A presents predicted thermal B-factors of AF-predicted structures for the complete
D. mel GSTome, as a function of MSA number. The most flexible regions are located in the N- and C-terminals, with B-factors values above 25 Å
2, as envisioned from structural properties shown in
Figure 2. On the one hand, residues which belong to the DI or to the GS show low flexibility for the complete GSTome (≈10 Å
2) due to the fact most of these residues are located in
helices. On the other hand, some MSA numbers are associated with highly flexible region. For instance, MSA number 100 shows B-factors of ≈20 Å
2 in all GST classes, except in class Omega. Similarly, MSA numbers 142–143 are highly flexible in class Delta and Epsilon but not much in the other classes. This result is not directly related to a difference of secondary structure as they are located in the linker loop between domains I and II. In terms of conservation, in GST class Delta, these MSA numbers correspond to a motif Lys–Asp and the polar amino acid is mostly a Serine in class Epsilon. However, it is less conserved in the other classes. This loop, which is located at the outside of the structure, is therefore more exposed to the solvent environment in GST structures of classes Delta and Epsilon and shows higher flexibility than the rest of the structure. This is also the case for the loop between the
and
helices, at MSA numbers 185–195. Furthermore, we computed the standard deviation of thermal B-factors in order to identify the MSA numbers that present the largest variability in flexibility/rigidity in the GSTome.
Figure 7B represents both standard deviation and average thermal B-factors as a function of MSA number. Hereafter, we mainly focus on MSA number characterized with a few gaps in the MSA of the GSTome. First, structural fluctuations of the GSTome profile roughly vary between 10 and 20 Å
2, apart from disordered N- and C-terminal parts. There are five mainly rigid regions (
Å
2), which correspond to the
helices plus the
sheet, and which are associated with a low variability in the GSTome. The residues which belong to the DI and to the GS are located in these specific rigid regions. In addition, the
and
helices, located in domain II, show intermediate flexibility with thermal B-factors around 15 Å
2, but are associated with a much larger variability compared to the
helices of the thioredoxin domain. In domain II, the largest variability is observed at MSA number 256. This MSA number has gaps in the Delta and Zeta classes. For the Omega class, it is located in the loop between the
and
helices, whereas in Epsilon and Theta classes, it is part of the
helix. It induces a larger flexibility in GST class Omega specifically.
In total, we identified six MSA numbers that are associated with both large variability and low number of gaps. This concerns numbers 142, 143, 193, 197, 198, 224, 263, and 281. MSA number 224, which is located at the extremity of the
helix, corresponds to a Glycine amino acid in 50% of the sequences, particularly in GST classes Delta and Omega. GSTs, which present residues other than Gly224, shows a lower flexibility in this region. MSA number 263, which is located between the
and
helices, is characterized by high flexibility in the Epsilon and Omega classes. In addition, it is not directly correlated with: (i) the nature of the amino acid, since, at this position, the maximum conservation score in the GSTome is 20%; (ii) the local conformation, since it is a loop for all GST structures in the GSTome. Within a class (
Figure 7C), the flexibility profiles of the GST are pretty well conserved and GST classes showing the largest variability in thermal B-factors profiles are the Epsilon, Omega, and Theta classes. The GST Epsilon class presents a large variability at MSA numbers 310–314, which are associated with the extra
-sheet mentioned in
Section 3.2 (
Figure 3C). The GST Omega class at MSA number 200, located in the
helix, presents flexibility for GSTO1 and is characterized by an Aspartic acid, whereas for the other GST of this class, this residue is a Leucine or an Isoleucine. The GST Theta class presents a large variability in flexibility in the loop between
and
. It is not relevant to discuss the variability in rigidity/flexibility in GST classes Sigma and Zeta since they only contain 1 and 2 structures, respectively.
Finally, we quantified the variability in flexibility/rigidity in the GSTome from
D. mel by computing the Pearson correlation between thermal B-factors profiles of each GST individually (
Figure 7A) with the average profile of the complete GSTome (
Figure 7B). As shown in
Figure 7D, GST classes Delta, Epsilon, and Theta are characterized by correlations
. However, GSTE10 and GSTT3 show singular behaviors, with correlation much smaller than the other GST structures of the same class. As shown in
Figure 3, the N-terminal of GSTT3 is very long and not well predicted by AF. In addition, it shows a lower flexibility in this region compared with the average profile. The same analysis holds for the very long C-terminal helix of GSTE10. GST classes Zeta and Sigma show correlation values
and
, respectively. Finally, GST class Omega shows correlation
, which is the smallest among the GSTome. This means that the GST from the Omega class are characterized by a flexibility profile slightly different compared to the other classes; this correlates with the fact that they present a 3D fold that is different from the other GSTs from the
D. mel (see
Section 3.2).