The occurrence of T-boxes in all analyzed genomes is shown in detail. Position, direction, e-value and specifier codon are shown for each T-box, together with the position, name and function of the first gene located downstream and the size of the downstream operon. All T-boxes are color coded according to the function of the genes located downstream. For all species, the number of T-boxes is displayed, divided over four categories: tRNA synthesis, amino acid transport, amino acid biosynthesis and other.

The number of regulated genes is shown between brackets. This number is based on the operon structure of the genes downstream of the T-box. Where multiple strains of a species were sequenced, they are shown only when differences between the strains were observed.
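As a rough illustration of how a regulated-gene count could be derived from operon structure, the sketch below walks downstream from a T-box and collects same-strand genes until an intergenic gap exceeds a threshold. The `Gene` record, the function name `regulated_genes` and the 100 bp `max_gap` cutoff are all assumptions made for this example; the source does not state which operon prediction rule was used.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # leftmost coordinate on the genome
    end: int     # rightmost coordinate on the genome
    strand: str  # "+" or "-"


def regulated_genes(tbox_pos, strand, genes, max_gap=100):
    """Return same-strand genes downstream of a T-box that form one putative operon.

    max_gap is a hypothetical intergenic distance cutoff, not a value from the source.
    """
    if strand == "+":
        # Downstream means higher coordinates on the forward strand.
        candidates = sorted(
            (g for g in genes if g.strand == "+" and g.start > tbox_pos),
            key=lambda g: g.start,
        )
    else:
        # Downstream means lower coordinates on the reverse strand.
        candidates = sorted(
            (g for g in genes if g.strand == "-" and g.end < tbox_pos),
            key=lambda g: g.end,
            reverse=True,
        )
    operon = []
    for gene in candidates:
        if operon:
            prev = operon[-1]
            gap = gene.start - prev.end if strand == "+" else prev.start - gene.end
            if gap > max_gap:
                break  # gap too large: assume the operon has ended
        operon.append(gene)
    return operon
```

The number between brackets would then correspond to `len(regulated_genes(...))` under these assumptions. The first gene is accepted regardless of its distance from the T-box, which is a simplification.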

Extensive description of the improved annotation of the T-box regulated transporter systems not given in the main text.

Abstract

Background

T-box anti-termination is an elegant and sensitive mechanism by which many bacteria maintain constant levels of amino acid-charged tRNAs.

The amino acid specificity of the regulatory element is related to a so-called specifier codon and can in principle be used to guide the functional annotation of the genes controlled via the T-box anti-termination mechanism.
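To make the link between specifier codon and amino acid specificity concrete, the following minimal sketch translates a specifier codon with the standard genetic code. The function name `specifier_amino_acid` is hypothetical, and the sketch assumes that the specifier codon is read like an ordinary sense codon; it is not the procedure used in the study.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated in UCAG order (first, second, third base).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def specifier_amino_acid(codon: str) -> str:
    """Return the one-letter amino acid for a specifier codon (RNA or DNA alphabet)."""
    codon = codon.upper().replace("T", "U")
    if codon not in CODON_TABLE:
        raise ValueError(f"not a valid codon: {codon!r}")
    return CODON_TABLE[codon]
```

For example, `specifier_amino_acid("UAC")` returns `"Y"` (tyrosine), which would suggest that the downstream genes are related to tyrosine metabolism, transport or tRNA charging.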

Results

Hidden Markov Models were defined to search for the T-box regulatory element and were applied to all completed prokaryotic genomes. The vast majority of the genes found downstream of the retrieved elements encoded functions related to the transport and synthesis of amino acids and the charging of tRNA. This is fully in line with findings reported in the literature and with the proposed biological role of the regulatory element.

For several species, the functional annotation of a large number of genes encoding proteins involved in amino acid transport could be improved significantly on the basis of the amino acid specificity of the identified T-boxes. In addition, these annotations could be extrapolated to a larger number of orthologous systems in other species. Analysis of T-box distribution confirmed that the element is restricted predominantly to species of the phylum Firmicutes.

Furthermore, the distribution appeared to be highly species specific, and in the case of amino acid transport some T-boxes seem to have "popped up" only recently.

Conclusion

We have demonstrated that the identification of the molecular specificity of a regulatory element can be of great help in solving notoriously difficult annotation issues. Furthermore, our analysis of the species-dependency of the occurrence of specific T-boxes indicated that these regulatory elements propagate in a semi-independent way from the genes that they control.

Background

Transcription anti-termination is a regulatory mechanism commonly encountered in all lineages within the bacterial kingdom.

We demonstrate that pf16 represents a peripherally related T4-like phage and show its evolutionary standing amongst the currently poorly classified T4 supergroup Tevenvirinae.

Materials and methods

Phage isolation and purification

Cultures of P. The lysate was concentrated by PEG precipitation and purified using CsCl density gradient centrifugation as described by Sambrook and Russell [16]. DNA was extracted using SDS and proteinase K, followed by phenol-chloroform extraction and ethanol precipitation, as described previously [16].

Electron microscopy

Carbon-coated Formvar grids were subjected to hydrophilic treatment with poly-L-lysine for 5 min. Sequencing of the resulting library was carried out from both ends with the MiSeq Reagent Kit v3 on a MiSeq (Illumina, USA), and the adapters were trimmed from the resulting reads at the facility.

Genome assembly and annotation

A total of 1, bp paired-end reads was obtained and underwent initial quality checking using FastQC [17].

Given the massive read coverage, stringent quality trimming parameters were utilised. Assembly was carried out using Geneious R8 (Biomatters, New Zealand) due to the difficulty other assemblers have in resolving the terminal repetition often seen in phages. Geneious assemblies were checked for potential misassembly by comparison with output from SPAdes v3. In addition, the two sequences obtained were utilised as references for read mapping using bbmap. Pf16 was assembled with an average per base coverage of x and phiPMW with x.
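For readers unfamiliar with the coverage figure, the sketch below shows one common way to compute an average per-base coverage from mapped read intervals: accumulate the depth at every genome position and divide the total by the genome length. The function name and the interval representation (0-based, end-exclusive) are assumptions for this example; in practice the read mapper reports this value itself.

```python
def mean_per_base_coverage(alignments, genome_length):
    """Average depth over all genome positions.

    alignments: iterable of (start, end) mapped intervals, 0-based, end-exclusive.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    # Difference array: +1 where a read starts covering, -1 where it stops.
    delta = [0] * (genome_length + 1)
    for start, end in alignments:
        start = max(start, 0)
        end = min(end, genome_length)
        if start < end:
            delta[start] += 1
            delta[end] -= 1
    total_depth = 0
    depth = 0
    for pos in range(genome_length):
        depth += delta[pos]   # depth at this position
        total_depth += depth
    return total_depth / genome_length
```

This is equivalent to dividing the total number of aligned bases by the genome length, and it ignores the circular or terminally redundant nature of many phage genomes.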

Transmembrane domains and signal peptide sequences were identified using TM-Pred and SignalP, respectively [26]. The frequency of codons recognised by tRNAs was calculated using an in-house script across both the whole genome and at the gene level.
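The in-house script is not given in the source. As a minimal sketch of the kind of calculation described, the code below counts codons in coding sequences and reports the fraction belonging to a given set of codons, either for one gene or pooled over a genome. All function names, and the idea of passing the tRNA-recognised codons as a plain set, are assumptions for illustration.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons of one coding sequence; trailing partial codons are ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def fraction_recognised(cds_list, recognised_codons):
    """Fraction of all codons in cds_list that fall in recognised_codons.

    Pass a single-element list for a gene-level value, or every CDS for a genome-level value.
    """
    total = Counter()
    for cds in cds_list:
        total.update(codon_counts(cds))
    n_codons = sum(total.values())
    if n_codons == 0:
        return 0.0
    hits = sum(count for codon, count in total.items() if codon in recognised_codons)
    return hits / n_codons
```

Calling `fraction_recognised([gene], codons)` for each gene and `fraction_recognised(all_genes, codons)` once gives the gene-level and whole-genome views mentioned in the text.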

To identify putative regulatory sequences, the region upstream of every ORF in pf16 and phiPMW was extracted using an in-house script and analysed using both neural network promoter prediction for prokaryotic sigma 70 promoters and Multiple EM for Motif Elicitation (MEME) to identify phage-specific promoters [33]. Promoters were confirmed using an additional in-house script on the extracted sequences, which also confirmed the associated consensus sequence.
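The extraction step could look roughly like the sketch below, which returns the sequence immediately upstream of an ORF in the orientation of that ORF, so that promoters on either strand are read 5' to 3'. The function names, the 0-based end-exclusive coordinates and the `length` parameter are assumptions; the window size used in the study is not recoverable from the text, and genome circularity is ignored here.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(genome: str, start: int, end: int, strand: str, length: int) -> str:
    """Return up to `length` bases upstream of an ORF, oriented like the ORF.

    start/end are 0-based, end-exclusive coordinates on the forward strand.
    """
    if strand == "+":
        # Upstream lies to the left of the start codon.
        return genome[max(0, start - length):start]
    if strand == "-":
        # Upstream lies to the right of the ORF on the forward strand.
        return reverse_complement(genome[end:end + length])
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")
```

The returned sequences can be written to a FASTA file and passed to a promoter predictor or motif finder.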

Protein bioinformatics

Molecular models were constructed from amino acid sequences using the I-TASSER package, and energy minimisation of the first, most favourable prediction was carried out on the YASARA server [34, 35].

Fidelity of models was assessed using the validation tools of the WHAT IF server before subsequent analyses. MetaPPISP was utilised to predict residues likely to be involved in protein-protein interactions [36]. At each docking stage, energy minimisation and validation were carried out in an iterative approach to ensure the fidelity of models. Models were viewed and aligned using the PyMOL package [39]. Quantitative similarity and structural superimposition between molecular models were obtained with the TM-align script implemented through PyMOL [40].

Phylogenetic analysis

Concatenation of three typically conserved phage genes: T-Coffee was then used to generate a consensus alignment from information provided by the separate programs. Such an approach typically leads to more accurate alignments.