Virus Research 173 (2013) 350–353
Journal homepage: www.elsevier.com/locate/virusres
Synonymous codon usage pattern analysis of Hepatitis D virus
Arghya Kamal Bishal a, Rashmi Mukherjee b,∗, Chandan Chakraborty b
a Haldia Institute of Technology, Haldia, Purba Midnapur, West Bengal, India
b School of Medical Science & Technology, IIT, Kharagpur 721302, India
Article info
Article history:
Received 23 September 2012
Received in revised form 15 January 2013
Accepted 15 January 2013
Available online 23 January 2013
Keywords:
Hepatitis D virus
Relative synonymous codon usage
Effective number of codons
Correlation analysis
Mutational pressure
Abstract
Hepatitis D virus (HDV) is the smallest animal-infecting RNA virus, with unique features distinguishing it from other hepatitis viruses. Codon usage variation is considered an indicator of the forces shaping genome evolution. RSCU (relative synonymous codon usage) values, nucleotide contents, ENC (effective number of codons) values, aromaticity and hydrophobicity of 28 HDV sequences were calculated and compared. RSCU values revealed that most of the preferred codons ended with G or C. A comparative analysis of codon usage between HDV and human cells indicated that the synonymous codon usage pattern of HDV is a mixture of coincidence with and antagonism to that of the host cell. Finally, the characteristics of the synonymous codon usage patterns, the ENC plot and the correlation analysis revealed that the most important determinant of the codon usage pattern of HDV is mutational pressure, and that positive selection might have some influence on sequence diversity. Comparison of ENC values and GC frequencies at the third codon position (GC3s) between HDV and other hepatitis viruses indicated that HDV comprises a distinct entity.
© 2013 Elsevier B.V. All rights reserved.
1. Introduction
Most amino acids are coded by more than one codon (synonymous codons), owing to the degeneracy of the genetic code. The usage of these synonymous codons is non-random and species-specific (Grantham et al., 1981; Gupta et al., 2004). Moreover, some codons are used preferentially over others, and codon usage varies even from gene to gene within the same species. Codon usage variation is widely considered an indicator of the forces shaping genome evolution. Understanding the extent and causes of codon usage bias is therefore essential to the comprehension of viral evolution, particularly the interplay between viruses and the immune response (Shackelton et al., 2006). Selection pressure acts as a driving force altering the behavior and fitness of living organisms within the host environment, for example through the selection mechanism of fine-tuning translation kinetics (Aragones et al., 2008, 2010). Mutation pressure is the change in gene frequencies that arises when the same mutations occur repeatedly. Mutation frequencies range from 10⁻⁴ to 10⁻⁵ substitutions per base per round of copying for a variety of RNA viruses (Domingo, 1996). In previous reports, translational selection and mutational pressure were thought to be among the major factors accounting for codon usage variation among the genomes of microorganisms (Karlin and Mrázek, 1996; Lesnik et al., 2000; Sharp et al., 1986). For some RNA viruses, mutation pressure plays a more important role than translational selection in the synonymous codon usage pattern (Jenkins and Holmes, 2003; Levin and Whittome, 2000). Although compositional constraints and translational selection are the generally accepted factors accounting for codon usage bias, recent studies suggest that fine-tuning of translation kinetics and escape from antiviral cellular responses also underlie codon usage bias (Aragones et al., 2008, 2010; Karlin et al., 1994; Sugiyama et al., 2005). Although several selective mechanisms have been proposed to contribute to specific patterns of codon usage, the combined effects of these forces in shaping genomic patterns of codon usage are not well understood (Shah and Gilchrist, 2011). The relative significance of these forces in the evolution of codon usage bias can be deciphered by developing mechanistic models that explicitly take into account tRNA competition and intraribosomal dynamics as well as the effects of amino acid substitutions on protein structure and function.
∗ Corresponding author. Tel.: +91 9732535029.
E-mail address: rashmikgpiit@gmail (R. Mukherjee).
Hepatitis D virus (HDV), classified as Hepatitis delta virus (Deltaviridae family), is a small, circular, enveloped RNA virus. It contains a single-stranded, covalently closed RNA genome of approximately 1.7 kb and about 200 molecules of hepatitis D antigen (HDAg) per genome. As the genome size of RNA viruses ranges from approximately 2 to 31 kb, HDV, with a genome of approximately 1.7 kb and only one ORF, is the smallest known “RNA virus” infecting animals (Lai, 1995; Makino et al., 1987; Taylor, 2006; Wang et al., 1986). HDV requires the coexistence of HBV to supply the envelope protein for its assembly into mature virions and is hence called a satellite virus of HBV (Lai, 1995; Taylor, 2006). Part of the HDV genome might have historical homology to viroids
0168-1702/$–see front matter © 2013 Elsevier B.V. All rights reserved./10.1016/j.virusres.2013.01.007
Table 1
List of 28 available complete RNA sequences of HDV examined.
No.  Accession no.  Strain       Genotype  Location  Collection year
1    AF425645.1     TWD2476-38   IIa       Taiwan    2002
2    AF309420.1     Miyako       2c        Japan     2009
3    AY261460.1     TW2479-18    2a        Taiwan    2006
4    AY261458.1     TW2479-53    2a        Taiwan    2006
5    AY261459.1     TW2479-13    2a        Taiwan    2006
6    AY261457.1     TW2479-12s   2a        Taiwan    2004
7    AF425644.1     TWD2577-66   I         Taiwan    2002
8    GU177114.1     HDV-DN79     -         Gabon     2010
9    HQ005372.1     Isolate3     1         Turkey    2010
10   HQ005368.1     Isolate9     1         Turkey    2010
11   HQ005370.1     Isolate1     1         Turkey    2010
12   HQ005366.1     Isolate6     1         Turkey    2010
13   HQ005364.1     Isolate5     1         Turkey    2010
14   HQ005371.1     Isolate2     1         Turkey    2010
15   HQ005367.1     Isolate8     1         Turkey    2010
16   HQ005369.1     Isolate4     1         Turkey    2010
17   HQ005365.1     Isolate7     1         Turkey    2010
18   AY648959.1     TW3678#25    -         Taiwan    2007
19   AY648957.1     TW5132#24    -         Taiwan    2007
20   AY648955.1     TW3038#25    -         Taiwan    2007
21   AY648953.1     2621#56      -         Taiwan    2007
22   U81989.1       -            IC        Ethiopia  2004
23   AY648958.1     TW1573#4     -         Taiwan    2007
24   AY648956.1     TW1435#47    -         Taiwan    2007
25   AY648954.1     TW1025#14    -         Taiwan    2007
26   AY648952.1     TWD62#16     -         Taiwan    2007
27   AY633627.1     Isolate IR-1 I         Iran      2005
28   U81988.1       -            IC        Somalia   2004
or plant virus satellite RNA sequences (Elena et al., 1991; Jenkins et al., 2000), and a rolling-circle model has been developed for viral RNA replication (Taylor, 2003). However, in contrast to viroids, which do not code for any protein, the HDV antigenome contains an open reading frame that was probably acquired by HDV from a cellular ancestor transcript, leading to the expression of the delta protein (Brazas and Ganem, 1996; Long et al., 1997). The circular HDV genome also has a high GC content. These unique features make HDV distinct from other hepatitis viruses. In the present study, the codon usage and nucleotide compositions of 28 HDV genomes were analyzed to investigate the possible key determinants of codon usage bias.
2. Materials and methods
2.1. Sequences
Twenty-eight available complete RNA sequences of HDV were downloaded randomly from the National Center for Biotechnology Information (NCBI; bi.v/Genbank/). Serial numbers, GenBank accession numbers and other detailed information about the viruses are summarized in Table 1.
2.2. Measure of synonymous codon usage (RSCU)
The relative synonymous codon usage (RSCU) values for all 28 coding sequences of HDV were calculated to investigate the characteristics of synonymous codon usage without the confounding influence of the amino acid composition of different gene samples (Sharp and Li, 1986a). Codons with RSCU values > 1.0 have positive codon usage bias (abundant codons), those with RSCU values < 1.0 have negative codon usage bias (less-abundant codons), and an RSCU value of 1.0 means that the codon is chosen equally or randomly (Sharp and Li, 1986b). Synonymous codons with RSCU above 1.6 were considered over-represented, while those with RSCU below 0.6 were regarded as under-represented (Wong et al., 2010).
2.3. Compositional properties measures
The overall nucleotide composition (U%, A%, C% and G%) and the nucleotide composition at the third codon position (U3%, A3%, C3% and G3%) of the HDV coding sequences were calculated. The GC3s (the frequency of G+C at the third codon position) and the GC content of the HDV samples were also calculated to examine the compositional properties.
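As a minimal sketch of the measures described in Sections 2.2 and 2.3, the following Python computes RSCU values and GC3s for a single coding sequence. It is an illustration only and not the CodonW implementation used in this study; the function names (`codons`, `rscu`, `gc3s`) and the toy sequence are assumptions introduced here, and the standard genetic code is assumed.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks stop codons (assumed here).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split a coding sequence into complete codons (DNA 'T' is read as 'U')."""
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def rscu(seq):
    """RSCU = observed codon count / mean count over its synonymous family."""
    counts = Counter(c for c in codons(seq) if c in CODON_TABLE)
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or len(family) == 1 or total == 0:
            continue  # skip stop codons, Met, Trp and unused amino acids
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


def gc3s(seq):
    """G+C fraction at the third position of synonymous codons."""
    thirds = [c[2] for c in codons(seq)
              if CODON_TABLE.get(c, "*") not in "*MW"]
    return sum(b in "GC" for b in thirds) / len(thirds)
```

For a toy sequence of four Ala codons, `rscu("GCUGCUGCCGCG")` gives 2.0 for GCU (used twice as often as the family average) and 0.0 for the unused GCA, and `gc3s` of the same sequence is 0.5. `gc3s` raises a division error if the sequence contains no synonymous codons.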
2.4. Analysis of codon usage
The ‘effective number of codons’ (ENC), a useful estimator of absolute codon usage bias, was used to quantify the codon usage bias of the whole coding sequence of HDV. The ENC value ranges from 20 (when only one synonymous codon is used for each amino acid) to 61 (when all synonymous codons are used equally) (Wright, 1990). In this study, this measure was used to evaluate the degree of codon usage bias of the HDV coding sequences.
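The ENC calculation can be sketched as follows, based on our reading of the definition in Wright (1990): a homozygosity F = (n Σp² − 1)/(n − 1) is computed per amino acid, averaged within each degeneracy class, and combined as ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6. This is a simplified illustration, not the CodonW code used in the study; the function name `enc` is an assumption, the standard genetic code is assumed, and edge cases (such as very short sequences) may be handled differently by other tools.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks stop codons (assumed here).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def enc(seq):
    """Effective number of codons for one coding sequence (simplified)."""
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    homozygosity = defaultdict(list)  # degeneracy class -> list of F values
    for family in families.values():
        k = len(family)
        n = sum(counts[c] for c in family)
        if k == 1 or n < 2:
            continue  # Met, Trp, or too few observations to estimate F
        s = sum((counts[c] / n) ** 2 for c in family)
        f = (n * s - 1) / (n - 1)
        if f > 0:
            homozygosity[k].append(f)

    mean_f = {k: sum(v) / len(v) for k, v in homozygosity.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        # Fallback when Ile is absent: average of the 2- and 4-fold classes.
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    weights = {2: 9, 3: 1, 4: 5, 6: 3}  # number of amino acids per class
    nc = 2 + sum(w / mean_f[k] for k, w in weights.items())
    return min(nc, 61.0)  # ENC is capped at 61
```

The sketch raises a `KeyError` when a whole degeneracy class has no usable data, which is a deliberate simplification for short inputs.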
2.5. Correspondence analysis (COA)
COA is an ordination technique that identifies the major trends in the variation of the data and distributes genes along continuous axes in accordance with these trends. COA creates a series of orthogonal axes to identify trends that explain the data variation, with each subsequent axis explaining a decreasing amount of the variation (Greenacre, 1984). In this study, the complete coding region of each gene was represented as a 59-dimensional vector, with each dimension corresponding to the RSCU value of one sense codon (excluding Met, Trp and the termination codons) (Mardia et al., 1979). The analysis was done using the CodonW program.
2.6. Statistical analysis
Correlation analysis was used to identify the relationship between nucleotide composition and synonymous codon usage pattern (Ewens and Grant, 2001). This analysis was carried out using Spearman’s rank correlation method. All the indices described above were calculated using the CodonW program, as were the relative dinucleotide frequencies. P ≤ 0.05 was considered significant. Results were statistically analyzed using SPSS software (version 11; SPSS, Inc., Chicago, IL).
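For readers who want to reproduce the rank correlation outside SPSS, a minimal Python sketch of Spearman's coefficient is given below. It ranks both variables (tied values share their average rank) and then takes the Pearson correlation of the ranks. The function names `average_ranks` and `spearman` are assumptions introduced here, and the sketch does not compute P values.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

In the setting of this study, `x` and `y` would be two per-genome series of equal length, for example the 28 GC3s values and the 28 ENC values.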
3. Results and discussion
3.1. Synonymous codon usage in Hepatitis D virus
The overall relative synonymous codon usage (RSCU) values of all 64 codons in the 28 HDV genomes are summarized in Suppl. Table 2a. All preferentially used codons end in C or G, except UGU (Cys) and GGA (Gly), which end in U and A, respectively. GC content varies from 0.569 to 0.607, with a mean of 0.5869 and SD of 0.00846, indicating that G and C are the major nucleotides of the HDV genome. Nucleotide composition is therefore a major factor shaping the codon usage pattern.
Additionally, a comparative analysis of the RSCU values between HDV and human cells (International Human Genome Sequencing Consortium, 2001) was performed (Suppl. Table 2a), and the codon usage pattern of this virus was found to be mostly coincident with that of its host. The similar synonymous codon usage pattern includes the synonymous codons for Phe, Leu, Ile, Ser, Pro, Thr, Ala, His, Gln, Asn, Lys, Asp, Glu, Arg and Ser. This similarity enables HDV to translate the corresponding amino acids efficiently, by adapting to its host under translational selection. The codons GUA (Val), UGU (Cys) and GGA (Gly), which are rare in human cells, were not rare in HDV. This result suggests that these codons may influence translation efficiency, enabling viral proteins to be folded properly.
Fig. 1. (a) Distribution of ENC values and GC frequency at the 3rd codon position (GC3s) of HDV. (b) Comparison of ENC values and GC frequencies at the 3rd codon position (GC3s) between HDV and other hepatitis viruses.
3.2. Compositional properties of HDV genome
Comparison of the A, U, C and G contents revealed that A and G were higher than C and U. Among A3, U3, C3 and G3, G3 was the highest and U3 the lowest. The GC3s values of the 28 samples ranged from 0.565 to 0.732, with a mean of 0.6805 and SD of 0.03465. The ENC values of these HDV genomes ranged from 41.92 to 50.16, with a mean of 46.8468 and SD of 2.14037, and were thus high. From Suppl. Table 3a, it can be observed that, apart from compositional constraints, other factors influenced codon usage.
3.3. Mutational bias is another main factor leading to codon usage variation
To investigate patterns of synonymous codon usage, an ENC-plot (see Fig. 1a) was used as part of the general strategy. In an ENC-plot, genes whose codon choice is constrained only by C3+G3 composition lie on or just below the curve of the predicted values (Wright, 1990). The ENC value of each HDV genome was plotted against its corresponding GC3s value. All points for the coding sequences lie below the expected curve, as shown in Fig. 1a. It can therefore be hypothesized that nucleotide composition was not the only factor influencing codon usage and that the codon usage bias in all 28 HDV genomes is principally influenced by mutational bias.
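The predicted curve in an ENC-plot is commonly written, following Wright (1990), as ENC = 2 + s + 29/(s² + (1 − s)²), where s is GC3s. The short Python sketch below evaluates this curve and checks whether a point lies below it; the function names `expected_enc` and `below_curve` are assumptions introduced for illustration, and the two input values are the means reported in Section 3.2.

```python
def expected_enc(gc3s):
    """Expected ENC when codon choice is constrained only by GC3s."""
    s = gc3s
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


def below_curve(enc, gc3s):
    """True if an observed (GC3s, ENC) point lies below the expected curve."""
    return enc < expected_enc(gc3s)


# Mean values reported for the 28 HDV genomes in Section 3.2.
mean_gc3s, mean_enc = 0.6805, 46.8468
print(round(expected_enc(mean_gc3s), 2), below_curve(mean_enc, mean_gc3s))
```

At the mean GC3s of 0.6805 the expected ENC is about 54, so the mean observed ENC of 46.8468 falls below the curve, in line with the pattern described for Fig. 1a.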
3.4. Correspondence analysis
COA was performed on RSCU values to investigate the major trends in codon usage variation among HDV genes. One major trend was detected on the first axis (axis 1), which accounted for 21.52% of the total variation, and another on the second axis (axis 2), which accounted for 17.38% of the total variation. Axis 1 and axis 2 were plotted according to genotype. Genotype 2a clearly has genetic characteristics different from the rest, while genotypes 1, 1c and 2c appear to share an evolutionary relationship. This result indicates that codon usage variation might be one of the factors driving HDV evolution (see Fig. 2).
Fig. 2. Genetic characteristics of HDV based on different genotypes.
Table 2
Correlation analysis of ENC with GC3s, GC, ARO and HYD.
       GC3s          GC            ARO           HYD
ENC    r = −0.154*   r = −0.080    r = 0.236*    r = 0.064
* Correlation is significant at 0.01% level.
3.5. Influencing factors of codon usage pattern
Correlation analysis was performed between GC3s content, GC content, aromaticity (ARO), hydrophobicity (HYD) and the ENC values of the 28 HDV genomes using Spearman’s rank correlation method (Section 2.6). The results showed a significant negative correlation between GC3s content and ENC values. It can therefore be concluded that the codon usage pattern is directly related to the nucleotide composition of the coding sequence. Moreover, there was a significant positive correlation between aromaticity and ENC. There was also a positive correlation between hydrophobicity and ENC values, indicating that these properties were critical factors affecting the HDV codon usage pattern (Table 2).
Correlation analysis was also carried out between the A, U, C, G and GC contents and the A3, U3, C3, G3 and GC3s contents using Spearman’s rank correlation method. The results showed a complex pattern of correlations among the nucleotide contents. Both A3 and U3 were significantly and negatively correlated with G, suggesting that this constraint may affect the synonymous codon usage pattern. However, A3 showed no correlation with U or C, and C3 showed no correlation with A, U, G or GC, indicating no peculiarity in the codon usage pattern. Furthermore, U3 and G3 showed no correlation with A or C, suggesting no influence on synonymous codon usage (Table 3).
Table 3
Correlation study among the nucleotide contents.
      A3             U3             C3            G3             GC3
A     r = 0.552**    r = 0.057      r = −0.212    r = −0.280     r = −0.481**
U     r = −0.316     r = 0.503**    r = −0.005    r = −0.006     r = −0.066
C     r = 0.085      r = −0.194     r = 0.322*    r = −0.173     r = 0.123
G     r = −0.635**   r = −0.401*    r = 0.186     r = 0.636**    r = 0.696**
GC    r = −0.457**   r = −0.515**   r = 0.260     r = 0.450**    r = 0.652**
** Correlation is significant at 0.01% level (1-tailed).
* Correlation is significant at 0.05% level (1-tailed).
Table 4
Correlation analysis between the first two major axes (axis 1 and axis 2) of HDV genome and nucleotide contents.
          A3            U3            C3            G3             GC3
Axis 1    r = −0.060    r = −0.209    r = −0.160    r = 0.101      r = 0.060
Axis 2    r = 0.196     r = 0.331*    r = −0.173    r = −0.373*    r = −0.345*
* Correlation is significant at 0.05% level (1-tailed).
The correlation between the first two principal axes (axis 1 and axis 2) of the COA of the HDV genomes and the nucleotide contents was also analyzed. Surprisingly, there was no significant correlation between the first axis and the nucleotide contents. Only the second axis showed significant correlations, negative with G3 and GC3 and positive with U3, indicating some influence on the synonymous codon usage pattern (Table 4).
4. Conclusion
Codon usage patterns help in understanding the processes of HDV evolution, especially the roles played by translational selection from the host and mutation pressure from the virus. Our findings reveal that the most important determinant of the codon usage pattern of HDV is mutation pressure. HDV sequence diversity may result from positive selection. Although different factors regulating synonymous codon usage in HDV are demonstrated in this study, the pressure exerted by natural selection, such as fine-tuning of translation kinetics or escape from the immune system, should be investigated further to analyze the characteristics of synonymous codon usage in the HDV genome in depth and to fully understand HDV evolution.
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at /10.1016/j.virusres.2013.01.007.
References
Aragones, L., Bosch, A., Pinto, R.M., 2008. Hepatitis A virus mutant spectra under the selective pressure of monoclonal antibodies: codon usage constraints limit capsid variability. Journal of Virology 82, 1688–1700.
Aragones, L., Guix, S., Ribes, E., Bosch, A., Pinto, R.M., 2010. Fine-tuning translation kinetics selection as the driving force of codon usage bias in the hepatitis A virus capsid. PLoS Pathogens 6, e1000797.
Brazas, R., Ganem, D., 1996. A cellular homolog of hepatitis delta antigen: implications for viral replication and evolution. Science 274, 90–94.
Domingo, E., 1996. Biological significance of viral quasi species. Viral Hepatitis Review 2, 247–261.
Elena, S.F., Dopazo, J., Flores, R., Diener, T.O., Moya, A., 1991. Phylogeny of viroids, viroid like satellite RNAs, and the viroid like domain of hepatitis delta virus RNA. Proceedings of the National Academy of Sciences of the United States of America 88, 5631–5634.
Ewens, W.J., Grant, G.R., 2001. Statistical Methods in Bioinformatics. Springer, New York.
Grantham, R., Gautier, C., Gouy, M., Jacobzone, M., Mercier, R., 1981. Codon catalog usage is a genome strategy modulated for gene expressivity. Nucleic Acids Research 9, r43–r74.
Greenacre, M.J., 1984. Theory and Applications of Correspondence Analysis. Academic Press, London.
Gupta, S.K., Bhattacharyya, T.K., Ghosh, T.C., 2004. Synonymous codon usage in Lactococcus lactis: mutational bias versus translational selection. Journal of Biomolecular Structure & Dynamics 21 (4), 739–1102.
International Human Genome Sequencing Consortium, 2001. Initial sequencing and analysis of the human genome. Nature 409, 860–921.
Jenkins, G.M., Woelk, C.H., Rambaut, A., Holmes, E.C., 2000. Testing the extent of sequence similarity among viroids, satellite RNAs, and hepatitis delta virus. Journal of Molecular Evolution 50, 98–102.
Jenkins, G.M., Holmes, E.C., 2003. The extent of codon usage bias in human RNA viruses and its evolutionary origin. Virus Research 92, 1–7.
Karlin, S., Doerfler, W., Cardon, L.R., 1994. Why is CpG suppressed in the genomes of virtually all small eukaryotic viruses but not in those of large eukaryotic viruses? Journal of Virology 68, 2889–2897.
Karlin, S., Mrázek, J., 1996. What drives codon choices in human genes? Journal of Molecular Biology 262, 459–472.
Lai, M.M.C., 1995. The molecular biology of hepatitis delta virus. Annual Review of Biochemistry 64, 259–286.
Lesnik, T., Solomovici, J., Deana, A., Ehrlich, R., Reiss, C., 2000. Ribosome traffic in … 175–185.
Levin, D.B., Whittome, B., 2000. Codon usage in nucleopolyhedroviruses. Journal of General Virology 81, 2313–2325.
Long, M., de Souza, S.J., Gilbert, W., 1997. Delta-interacting protein A and the origin of hepatitis delta antigen. Science 276, 824–825.
Makino, S., Chang, M.F., Shieh, C.K., et al., 1987. Molecular cloning and sequencing of a human hepatitis delta (δ) virus RNA. Nature 329 (6137), 343–346.
Mardia, K.V., Kent, J.T., Bibby, J.M., 1979. Multivariate Analysis. Academic Press, New York.
Shackelton, L.A., Parrish, C.R., Holmes, E.C., 2006. Evolutionary basis of codon usage and nucleotide composition bias in vertebrate DNA viruses. Journal of Molecular Evolution 62, 551–563.
Shah, P., Gilchrist, M.A., 2011. Explaining complex codon usage patterns with selection for translational efficiency, mutation bias, and genetic drift. Proceedings of the National Academy of Sciences of the United States of America 108 (25), 10231–10236.
Sharp, P.M., Li, W.H., 1986a. Codon usage in regulatory genes in Escherichia coli does not reflect selection for ‘rare’ codon. Nucleic Acids Research 14, 7737–7749.
Sharp, P.M., Li, W.H., 1986b. An evolutionary perspective on synonymous codon usage in unicellular organisms. Journal of Molecular Evolution 24, 28–38.
Sharp, P.M., Tuohy, T.M., Mosurski, K.R., 1986. Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Research 14, 5125–5143.
Sugiyama, T., Gursel, M., Takeshita, F., Coban, C., Conover, J., Kaisho, T., Akira, S., Klinman, D.M., Ishii, K.J., 2005. CpG RNA: identification of novel single-stranded RNA that stimulates human CD14+CD11c+ monocytes. Journal of Immunology 174, 2273–2279.
Taylor, J.M., 2003. Replication of human hepatitis delta virus: recent developments. Trends in Microbiology 11, 185–190.
Taylor, J.M., 2006. Hepatitis delta virus. Virology 344 (1), 71–76.
Wang, K.S., Choo, Q.L., Weiner, A.J., et al., 1986. Structure, sequence and expression of the hepatitis delta (δ) viral genome. Nature 323, 508–514.
Wong, E.H., Smith, D.K., Rabadan, R., Peiris, M., Poon, L.L., 2010. Codon usage bias and the evolution of influenza A viruses. Codon usage biases of influenza virus. BMC Evolutionary Biology 10, 253.
Wright, F., 1990. The ‘effective number of codons’ used in a gene. Gene 87, 23–29.