
Multilocus sequence typing of Streptococcus pyogenes

Please direct all enquiries to the database curator:
Karen McGregor (


MLST was developed for Streptococcus pyogenes (group A streptococci, GAS) by Mark Enright in the laboratory of Brian Spratt, in collaboration with the laboratory of Debra Bessen, as a resource for researchers worldwide.  The internet-accessible database, funded by the Wellcome Trust and hosted at Imperial College London, allows unambiguous comparison of data between different laboratories.

The S. pyogenes MLST database currently contains information on over 1000 isolates obtained from cases of serious invasive disease, upper respiratory tract infection and impetigo, as well as on macrolide-resistant isolates.  Many of the more than 150 recognised emm types are represented in this set.

Investigators carrying out MLST on this species are encouraged to submit their data to the curator so that allelic profiles and strain details can be added to the database.  In this way the MLST database becomes an increasingly useful resource for the S. pyogenes community.

Enright MC, Spratt BG, Kalia A, Cross JH, Bessen DE. 2001. Multilocus sequence typing of Streptococcus pyogenes and the relationships between emm type and clone. Infection and Immunity 69:2416-2427.

McGregor KF, Spratt BG, Kalia A, Bennett A, Bilek N, Beall B, Bessen DE. 2004. Multilocus sequence typing of Streptococcus pyogenes representing most known emm types and distinctions among subpopulation genetic structures. Journal of Bacteriology 186:4285-4294.


Acknowledging the use of the MLST databases in your publications

Please acknowledge the use of this site in your publications as follows: 'We acknowledge the use of the Streptococcus pyogenes MLST database which is located at Imperial College London and is funded by the Wellcome Trust'.


The seven loci and the primers and conditions used for PCR

The S. pyogenes MLST scheme uses internal fragments of seven housekeeping genes, amplified by PCR using the following primer pairs:

Locus | Gene function | Primer sequences (5'-3') | Size of amplicon used for assigning alleles
gki | glucose kinase | |
gtr | glutamine transporter protein | |
murI | glutamate racemase | |
mutS | DNA mismatch repair protein | |
recP | transketolase | |
xpt | xanthine phosphoribosyl transferase | |
yqiL | acetyl-CoA acetyltransferase | |

PCR conditions

The PCR reactions are performed in 50 µL volumes, with an initial denaturation at 95°C for 5 min, followed by 28 cycles of 95°C for 1 min, 55°C for 1 min and 72°C for 1 min. The amplified DNA fragments are purified either by precipitation with polyethylene glycol or with a commercial PCR purification kit. Each fragment is sequenced on both strands using the same primers as in the initial PCR amplification.

As the same primers are used for amplification and sequencing, it is important that only a single DNA fragment is amplified in the initial PCR. This may involve some optimisation of the annealing temperature and other PCR conditions in individual laboratories.
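As a cross-check on the cycling program, the programmed hold times add up to 5 + 28 × (1 + 1 + 1) = 89 min, excluding ramp times between temperatures (no final extension step is specified). The minimal Python sketch below records the program as data and repeats that sum; `Step` and `total_hold_minutes` are hypothetical names for illustration, not part of any thermocycler software.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    minutes: float


# Program as described in the text; the volume is recorded for reference only.
REACTION_VOLUME_UL = 50
INITIAL = Step("initial denaturation", 95, 5)
CYCLE: Tuple[Step, ...] = (
    Step("denaturation", 95, 1),
    Step("annealing", 55, 1),  # may need local optimisation to give a single product
    Step("extension", 72, 1),
)
N_CYCLES = 28


def total_hold_minutes(initial: Step, cycle: Tuple[Step, ...], n_cycles: int) -> float:
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    return initial.minutes + n_cycles * sum(step.minutes for step in cycle)


if __name__ == "__main__":
    print(total_hold_minutes(INITIAL, CYCLE, N_CYCLES))  # 89
```

If the annealing temperature is changed during local optimisation, only the `CYCLE` entry needs editing; the total hold time is unaffected unless step durations change.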

Obtaining an allelic profile and comparing your strains with those in our database

The allelic profile of a strain is based on the sequences of internal fragments of the seven housekeeping genes.  The sequences have to be trimmed so that they correspond exactly to the region that we use to define the alleles.  The sequences of the seven loci from a typical GAS isolate can be obtained below and used to check that your sequences have been trimmed correctly.  The sequences must be obtained on both strands and must be 100% accurate, since even a single error may convert a known allele into an apparently novel allele.

Click a locus name below to obtain a correctly trimmed sequence for that locus:
gki_ | gtr_ | muri | muts | recp | xpt_ | yqil
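Trimming is usually done by eye against the reference sequences above, but it can also be scripted. The minimal Python sketch below assumes you have a consensus read built from both strands and the correctly trimmed reference allele for the same locus, and it further assumes that the first and last k bases of the allele region are identical in your strain. If a polymorphism falls inside either anchor, the function returns None and the sequence should be trimmed by alignment instead. The function names and the default k = 20 are hypothetical choices, not part of the database software.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def _trim_one_strand(read: str, reference: str, k: int) -> Optional[str]:
    start = read.find(reference[:k])   # first hit of the 5' anchor
    end = read.rfind(reference[-k:])   # last hit of the 3' anchor
    if start == -1 or end == -1 or end < start:
        return None
    return read[start:end + k]


def trim_to_allele_region(read: str, reference: str, k: int = 20) -> Optional[str]:
    """Cut `read` down to the region spanned by the trimmed `reference` allele.

    Assumes (hypothetically) that the first and last k bases of the allele
    region are conserved in the query strain; returns None if the anchors
    are not found on either strand.
    """
    read, reference = read.upper(), reference.upper()
    if len(reference) < 2 * k:
        raise ValueError("reference allele must be at least 2*k bases long")
    trimmed = _trim_one_strand(read, reference, k)
    if trimmed is None:
        # The consensus may be in the opposite orientation to the reference.
        trimmed = _trim_one_strand(reverse_complement(read), reference, k)
    return trimmed
```

Before querying the database, check that the trimmed sequence has the same length as the reference allele; a length difference points either to a trimming problem or to a non-standard allele.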

The S. pyogenes database can be interrogated in a variety of ways:

The locus query options allow you to obtain an allele number for each of your sequences.  You can assign alleles one locus at a time by selecting the single locus option or, by using the multiple locus option, paste the correctly trimmed sequences for all seven loci of a query strain into the corresponding boxes.

The software will check that the sequences are the correct length and that they do not contain any unrecognised characters.  A check is also made to see whether the submitted sequence is at least 70% similar to an existing allele at that locus (in case you have pasted a sequence into the wrong box or selected the wrong locus from the drop-down menu).  If the sequence corresponds to a known allele, the allele number will be returned.  If the sequence appears to be a new allele, compare it with the most similar allele at that locus to check that any nucleotide differences are real.  If you are convinced you have a new allele, submit the sequence traces to the database curator, who will check your data, assign a new allele number and add the new allele to the database.
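The checks described above can be sketched in Python as follows. This is not the database's own code: the function and class names are hypothetical, the two 10-bp "alleles" in the usage example are toy data, and similarity is measured here as the fraction of identical positions between equal-length sequences, which is an assumption, since the page does not say how the 70% figure is computed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

VALID_BASES = set("ACGT")


@dataclass
class LocusResult:
    status: str                     # "known", "novel" or "error"
    allele: Optional[int] = None    # exact match, or closest allele if "novel"
    differences: List[int] = field(default_factory=list)  # 1-based positions
    message: str = ""


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def query_locus(seq: str, alleles: Dict[int, str], min_identity: float = 0.70) -> LocusResult:
    seq = "".join(seq.split()).upper()  # drop whitespace/newlines from pasted text
    expected_len = len(next(iter(alleles.values())))
    bad = set(seq) - VALID_BASES
    if bad:
        return LocusResult("error", message="unrecognised characters: " + "".join(sorted(bad)))
    if len(seq) != expected_len:
        return LocusResult("error", message=f"length {len(seq)}, expected {expected_len}")
    for number, allele in alleles.items():
        if seq == allele:
            return LocusResult("known", allele=number)
    # No exact match: find the closest allele and list the differing positions.
    closest, closest_seq = max(alleles.items(), key=lambda item: identity(seq, item[1]))
    if identity(seq, closest_seq) < min_identity:
        return LocusResult("error", message="below similarity threshold: wrong locus?")
    diffs = [i + 1 for i, (x, y) in enumerate(zip(seq, closest_seq)) if x != y]
    return LocusResult("novel", allele=closest, differences=diffs)


if __name__ == "__main__":
    toy_alleles = {1: "ACGTACGTAC", 2: "ACGTACGTAT"}  # hypothetical toy data
    print(query_locus("ACGAACGTAC", toy_alleles))
```

A "novel" result lists the positions that differ from the closest allele, which are exactly the positions to re-examine in the trace files before contacting the curator.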

The profile query options allow you to search the database for allelic profiles matching your own and to obtain information on strains with that allelic profile.   After you have obtained the allele numbers at each locus for your query strain, you can select allelic profile query and enter the seven integers.  If the allelic profile is in the database, the sequence type assigned to this allelic profile will be returned along with details of any S. pyogenes isolates that are identical to the one you submitted.  You can also search for isolates that have allelic profiles that are similar to yours (e.g. isolates that have at least 4/7, 5/7 or 6/7 matches to the submitted allelic profile) and show relationships between your query strain and these strains by using the tree button.
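A minimal Python sketch of the profile query, assuming profiles are stored as seven allele numbers in the locus order gki, gtr, murI, mutS, recP, xpt, yqiL, and that the ST table maps each profile to its ST number. Setting `min_matches` to 7 gives an exact-match search, while 6, 5 or 4 mirror the search for similar profiles. The names and the toy ST numbers (901 to 903) are hypothetical, not real database entries.

```python
from typing import Dict, List, Tuple

LOCI = ("gki", "gtr", "murI", "mutS", "recP", "xpt", "yqiL")
Profile = Tuple[int, ...]


def matches(a: Profile, b: Profile) -> int:
    """Number of loci at which two allelic profiles share the same allele."""
    return sum(x == y for x, y in zip(a, b))


def profile_query(query: Profile, st_table: Dict[Profile, int],
                  min_matches: int = len(LOCI)) -> List[Tuple[int, int]]:
    """Return (ST, shared loci) for every ST sharing >= min_matches alleles with query."""
    if len(query) != len(LOCI):
        raise ValueError(f"expected {len(LOCI)} allele numbers, got {len(query)}")
    hits = []
    for profile, st in st_table.items():
        shared = matches(query, profile)
        if shared >= min_matches:
            hits.append((st, shared))
    # Closest profiles first, ties broken by ST number.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    # Hypothetical toy table: these ST numbers and profiles are not real data.
    toy_sts = {
        (1, 1, 1, 1, 1, 1, 1): 901,
        (1, 1, 1, 1, 1, 1, 2): 902,
        (3, 3, 1, 1, 3, 3, 3): 903,
    }
    print(profile_query((1, 1, 1, 1, 1, 1, 1), toy_sts, min_matches=6))  # [(901, 7), (902, 6)]
```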

Further details about strains that are identical, or similar, to the query strain can be obtained by clicking on the strain names.

There are also options to query the database directly (e.g. to look at the details of all strains of a particular emm type) and to perform more advanced queries.

If you have sequenced a large number of strains, options are available in the batch query menu to allow data from multiple strains to be entered simultaneously. 

For many of these pages, help boxes (?) are available with further details on how to enter and retrieve data.


Non-standard alleles

Some strains do not produce housekeeping gene fragments of the standard size (e.g. yqiL allele 48 has a 3-nt deletion).  To assign an allele number to a sequence of non-standard length, use the single locus option from the locus query menu.  The sequence will be identified as being of non-standard length, and a further search option lets you query a database of such alleles.  An allele number will be returned if the sequence corresponds to a known allele.
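The two-stage lookup described above can be sketched as follows, assuming each locus has a single standard allele length and that non-standard alleles are held in a separate table keyed by allele number. The function name, the toy sequences and the allele number 900 are hypothetical; this is an illustration, not the database's own code.

```python
from typing import Dict, Optional, Tuple


def assign_allele(seq: str, standard_length: int,
                  standard: Dict[int, str],
                  non_standard: Dict[int, str]) -> Tuple[str, Optional[int]]:
    """Look up a trimmed sequence, routing non-standard lengths to a separate table.

    Returns ("standard" or "non-standard", allele number, or None if not found).
    """
    seq = seq.upper()
    if len(seq) == standard_length:
        kind, table = "standard", standard
    else:
        kind, table = "non-standard", non_standard
    for number, allele in table.items():
        if allele == seq:
            return kind, number
    return kind, None


if __name__ == "__main__":
    toy_standard = {1: "ACGTACGTAC"}   # hypothetical 10-bp standard allele
    toy_non_standard = {900: "ACGTACG"}  # hypothetical allele with a 3-nt deletion
    print(assign_allele("ACGTACG", 10, toy_standard, toy_non_standard))
```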

Strains have been identified that lack the yqiL locus.  Due to the way the software supporting the MLST database works, an allele number must be entered for all seven loci to obtain an ST assignment for a query allelic profile.  To obtain an ST for a query strain lacking yqiL, the yqiL allele should be entered as ‘67’ in the allelic profile query page.
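If you build allelic profiles from your own results, the substitution can be applied automatically. In this minimal Python sketch, only the value 67 comes from this page; the locus order (gki, gtr, murI, mutS, recP, xpt, yqiL) and the function name are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple

LOCI = ("gki", "gtr", "murI", "mutS", "recP", "xpt", "yqiL")
YQIL_ABSENT = 67  # value to enter when a strain lacks the yqiL locus


def build_profile(alleles: Dict[str, Optional[int]]) -> Tuple[int, ...]:
    """Assemble a seven-locus profile, substituting 67 for a missing yqiL."""
    profile = []
    for locus in LOCI:
        number = alleles.get(locus)
        if number is None:
            if locus != "yqiL":
                # Every other locus must have a real allele number.
                raise ValueError(f"no allele number for {locus}; all seven loci are required")
            number = YQIL_ABSENT
        profile.append(number)
    return tuple(profile)
```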

Submitting your data to the MLST database

Each database is maintained by a curator and data can only be entered into a database by a curator.  The curator of the S. pyogenes database is Karen McGregor.

Submitting a new allele

Please send (preferably by email) two sequence trace files (one in each direction; note: these do not have to be edited) for the new allele to the database curator, along with the trimmed sequence (in a text file or within the body of the email) of the proposed new allele. 

Upon visual inspection of the trace files, the curator will assign an allele number and enter the sequence of the new allele into the database.  If the curator feels that the trace files do not clearly show the identity of the unique nucleotide(s), no allele number will be assigned; the curator will contact you to explain why the allele was not accepted and give you the opportunity to submit another trace file for it.

Submitting a new allelic profile

To be assigned a new ST, submit the allelic profile, together with information on a representative strain and its epidemiological data, to the database curator, who will enter it in the MLST database and assign an ST number.  If the new allelic profile contains a new allele, sequence trace files must also be sent to the curator as described above.

Note that submitting a new ST that is a novel combination of known alleles does not require sequence trace files.  There is, however, the possibility that one of these alleles has been sequenced incorrectly, and the onus is on the submitter to ensure that the allelic profile is correct.  If a new ST differs from a previously identified ST at only a single locus, it is strongly recommended that the variant locus be re-sequenced.
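Single-locus variants of known STs can be flagged before submission so that the variant locus is re-sequenced. A minimal Python sketch, assuming profiles are tuples of seven allele numbers in the locus order gki, gtr, murI, mutS, recP, xpt, yqiL; the function name and the toy ST numbers in the usage example are hypothetical.

```python
from typing import Dict, List, Tuple

LOCI = ("gki", "gtr", "murI", "mutS", "recP", "xpt", "yqiL")
Profile = Tuple[int, ...]


def single_locus_variants(new: Profile, known: Dict[Profile, int]) -> List[Tuple[int, str]]:
    """Return (ST, locus) for every known ST that differs from `new` at exactly one locus.

    Each returned locus is a candidate for repeat sequencing before the new ST is submitted.
    """
    hits = []
    for profile, st in known.items():
        differing = [locus for locus, a, b in zip(LOCI, new, profile) if a != b]
        if len(differing) == 1:
            hits.append((st, differing[0]))
    return sorted(hits)


if __name__ == "__main__":
    toy_known = {(1, 1, 1, 1, 1, 1, 1): 901, (2, 2, 2, 2, 2, 2, 2): 902}  # hypothetical
    print(single_locus_variants((1, 1, 1, 1, 5, 1, 1), toy_known))  # [(901, 'recP')]
```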

If you are submitting information on a number of strains at one time, you can use a template Excel form, which can be obtained from the database curator.

Submitting strain information

Investigators are strongly encouraged to submit ST and strain information on all their isolates, not just ones with new STs.  The database will be of most use to researchers if as much information as possible on as many isolates as possible is included. 

To submit information on isolates with previously reported STs, a template Excel form can be used; this form can be obtained from the database curator.


MLST of group C and G streptococci

Group C and G streptococci carry genes that are highly homologous to the seven housekeeping loci used in the S. pyogenes MLST scheme and, in many cases, fragments of these loci can be amplified using the S. pyogenes MLST primers.

A protocol which produces comparable MLST information for group C and G streptococci is currently being developed.   In the future this database will contain MLST information on all three Lancefield groups together with appropriate guidance on the use and interpretation of this data.  Information on these pages will be updated accordingly.

