segid vs chain IDs, using the ribosome as an example

UPDATE: I found this command in PyMOL:

set ignore_case, 0

Setting it to 0 makes PyMOL distinguish between lowercase and uppercase chain IDs, which circumvents some of the problems of working with large molecules in PyMOL!
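To make the effect of that setting concrete, here is a minimal Python sketch of case-insensitive versus case-sensitive matching of chain IDs. It is only an illustration of the idea, not PyMOL's actual selection code; the function name `match_chain` and the list of chain IDs are made up for the example.

```python
# Hypothetical chain IDs, as found in a file that uses both cases.
CHAIN_IDS = ["A", "a", "B", "b"]

def match_chain(query, chains, ignore_case):
    """Return the chain IDs that match `query`.

    With ignore_case true, "A" and "a" are treated as the same chain,
    which is what makes mixed-case chain IDs impossible to separate.
    """
    if ignore_case:
        return [c for c in chains if c.lower() == query.lower()]
    return [c for c in chains if c == query]

# Case-insensitive: both "A" and "a" come back for the query "a".
both_cases = match_chain("a", CHAIN_IDS, ignore_case=True)
# Case-sensitive: only the lowercase chain matches.
lower_only = match_chain("a", CHAIN_IDS, ignore_case=False)
```

With case ignored, a query for chain a cannot be told apart from a query for chain A, which is exactly the ambiguity the setting removes.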

When working with ribosome structures, the traditional chain ID format of PDB files unfortunately falls short, as the full 70S/80S ribosome contains far more chains than there are letters in the English alphabet. All deposited ribosome structures are therefore split into multiple PDB entries, each containing a single subunit of one molecule in the asymmetric unit (ASU). Sometimes they are divided even further into proteins and rRNA for each subunit, yielding a total of eight depositions for one structure with two molecules in the ASU. Normally, at least four entries are required for a standard ribosome structure.

However, this is still not enough to circumvent the inherent shortcomings of the single-letter chain ID format. Some authors therefore resort to using numbers, uppercase and lowercase letters, or double letters in the chain IDs to tell the chains apart. While it is possible to select chain IDs consisting of numbers in PyMOL, you cannot (at least not without the setting mentioned in the update above) differentiate between lowercase and uppercase letters or double-letter chain IDs. This is obviously a problem when working with a full 70S/80S ribosome structure, because you cannot make specific selections using chain IDs and/or resid only.

Luckily, PyMOL supports something called the segment ID, or segid for short. The segid is a four-character string that can be used to describe specific segments within the same chain ID. The segid is, as far as I know, no longer officially part of the PDB format, but it can still be used.
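To show where the two identifiers live in a file, here is a small Python sketch that builds and reads an old-style fixed-column ATOM record. It assumes the older PDB layout in which the chain ID sits in column 22 and the segid in columns 73-76. The helper names and the atom values are invented for illustration.

```python
def make_atom_line(serial, name, resname, chain, resseq, x, y, z, segid):
    """Build an old-style ATOM record (assumed layout: chain ID in
    column 22, segid in columns 73-76). All values are illustrative."""
    return (
        f"ATOM  {serial:>5d} {name:<4s} {resname:>3s} {chain:1s}{resseq:>4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.00:6.2f}{0.00:6.2f}      {segid:<4s}"
    )

def chain_and_segid(line):
    """Extract the one-character chain ID and the four-character segid."""
    chain = line[21]               # column 22 (0-based index 21)
    segid = line[72:76].strip()    # columns 73-76
    return chain, segid

# A made-up atom in ASU molecule b, small subunit protein 20.
line = make_atom_line(1, "CA", "ALA", "b", 10, 1.0, 2.0, 3.0, "s20b")
chain, segid = chain_and_segid(line)
```

The chain ID has room for a single character, while the segid has four, which is what leaves space for a protein or rRNA name plus the ASU molecule letter.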

I was lucky enough to get hold of a special version of the PDB file of the yeast 80S ribosome from the authors of the structure. What sets this PDB file apart from the 80S ribosome PDB file you can assemble yourself from the deposited entries is the methodical implementation of segids. The file contains specific segids for all of the ribosomal proteins (and the ligands) and the rRNA molecules, for both molecules in the ASU. The clever part is that it has only two chain IDs, chain a and chain b, each of which corresponds to a full 80S molecule in the ASU. However, since a segid is four characters long, the chain ID has also been built into the segid. This means that a specific molecule, no matter which molecule in the ASU and which subunit it belongs to, can be selected simply by typing e.g. “select test, segid s20b”. This command selects only small subunit protein 20 in ASU molecule b. The equivalent molecule in both molecules in the ASU can be selected by altering the command to “select test, segid s20*”. For the rRNA the segid is a bit different, but the principle is the same. For example, to select the large rRNA molecule 25S in ASU molecule b, all I need to type is “select test, segid 25sb”. Here the “s” in the segid does not indicate the small subunit, but refers to the sedimentation coefficient of the rRNA molecule in Svedberg units; the trailing b is still the chain ID of the ASU molecule.
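The naming scheme described above can be mimicked in a few lines of Python. This sketch uses shell-style wildcard matching from the standard library to imitate what the “segid s20*” selection does. It is not PyMOL's selection engine, and the list of segids is a hypothetical subset following the scheme described here.

```python
from fnmatch import fnmatchcase

# Hypothetical segids: name (three characters) + ASU molecule (a or b).
SEGIDS = ["s20a", "s20b", "s21a", "s21b", "25sa", "25sb"]

def select_segid(pattern, segids=SEGIDS):
    """Return the segids matching `pattern`, case-sensitively.

    A trailing '*' matches either ASU molecule, imitating
    the selection "segid s20*".
    """
    return [s for s in segids if fnmatchcase(s, pattern)]

one_protein = select_segid("s20b")   # protein 20, molecule b only
both_copies = select_segid("s20*")   # protein 20 in both molecules
large_rrna = select_segid("25sb")    # 25S rRNA in molecule b
```

Because the ASU molecule letter is always the last character, a single wildcard is enough to pick up the same component in both copies of the ribosome.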

This clearly demonstrates the power of the segid compared with the chain ID when dealing with structures containing very many chains, such as the ribosome. In general, there are very few structures where the segid is truly needed, but if you have ever worked in PyMOL with a PDB file containing lowercase and uppercase chain IDs, you will know how utterly frustrating it is.

