Cleaning Up Dozens of Protein Structures in One Go

If you’ve ever worked with large sets of protein structures, whether from databases like the PDB or from in-house experimental data, you already know how time-consuming the preparation step can be. Downloading structures, checking for missing atoms, stripping unnecessary molecules, adding hydrogens: doing all of this by hand for tens or hundreds of files is not just tedious, it also invites mistakes and inconsistent setups.

The Batch Protein Prepare extension in SAMSON addresses this pain point by automating the structure-cleaning workflow across multiple files or PDB codes. Whether you’re running docking campaigns or setting up batches for molecular dynamics, this tool can free you from repetitive cleanup tasks and let you focus on the science.

Why batch preparation matters

Poorly prepared input structures often lead to simulation failures, inaccurate results, or downstream bugs that are tricky to diagnose. When you work with multiple structures, it is essential that each one goes through the same validation and cleanup process.

The Batch Protein Prepare extension brings consistency by applying the same cleaning steps to each file. These include:

Removing alternate locations (low-occupancy atoms)
Deleting unwanted ligands, co-factors, or ions
Stripping solvent molecules like water
Adding hydrogens based on residue type or valence
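
To make these steps concrete, here is a minimal sketch in plain Python of what such a cleanup pass can look like on PDB-format text. It is not SAMSON's implementation and does not use the SAMSON API. The function names, the set of water residue names, and the rule "keep the highest-occupancy alternate location, treat a missing occupancy as 1.0" are all assumptions made for illustration. Hydrogen addition is left out because it needs real chemistry perception.

```python
WATER_NAMES = {"HOH", "WAT", "DOD"}  # assumed set of water residue names


def _occupancy(line):
    """Read the occupancy field (columns 55-60); assume 1.0 if absent or unreadable."""
    try:
        return float(line[54:60])
    except ValueError:
        return 1.0


def clean_pdb_lines(lines, remove_hetero=True, remove_water=True):
    """Return the ATOM/HETATM records that survive a simple cleanup pass."""
    best = {}    # atom key -> (occupancy, record)
    order = []   # atom keys in order of first appearance
    for line in lines:
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # this sketch only looks at coordinate records
        is_water = line[17:20].strip() in WATER_NAMES
        if is_water and remove_water:
            continue
        if record == "HETATM" and not is_water and remove_hetero:
            continue  # caution: this also drops modified residues such as MSE
        # Same chain, residue number + insertion code, and atom name means
        # "same atom, possibly at a different alternate location".
        key = (line[21:22], line[22:27], line[12:16])
        occupancy = _occupancy(line)
        if key not in best:
            order.append(key)
            best[key] = (occupancy, line)
        elif occupancy > best[key][0]:
            best[key] = (occupancy, line)
    # Blank the altLoc column (column 17) on the records that are kept.
    return [best[key][1][:16] + " " + best[key][1][17:] for key in order]
```

Calling clean_pdb_lines(open("structure.pdb")) on a hypothetical file returns only the protein atoms, one record per atom. Passing remove_hetero=False keeps ligands, co-factors, and ions. Applying one function with one set of arguments to every file is what gives a batch its consistency.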

Two ways to start

You can either:

Process a folder of local structure files (supports PDB, mmCIF, MMTF, and MOL2)
Fetch structures based on PDB codes—from a simple list in a text file or a comma-separated string

Either way, whether you start from pre-downloaded datasets or from a list of PDB identifiers, you get the same standardized result.
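
If you assemble your list of PDB codes outside SAMSON, a small helper can normalize it first. The sketch below is plain Python with hypothetical function names. It assumes codes are separated by commas or whitespace (so both a one-per-line text file and a comma-separated string work) and that lowercasing is an acceptable normalization. The exact input format the extension expects may differ.

```python
import re
from pathlib import Path


def parse_pdb_codes(text):
    """Split on commas and whitespace, lowercase, and drop duplicates in order."""
    codes = []
    seen = set()
    for token in re.split(r"[,\s]+", text):
        code = token.lower()
        if code and code not in seen:
            seen.add(code)
            codes.append(code)
    return codes


def read_pdb_codes(path):
    """Read codes from a text file (assumed one per line or comma-separated)."""
    return parse_pdb_codes(Path(path).read_text(encoding="utf-8"))
```

For example, parse_pdb_codes("1abc, 2XYZ,1ABC") returns ["1abc", "2xyz"], with the duplicate removed regardless of case.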

Maintaining structure hierarchy

When cleaning local folders of structures, the output generated by the extension preserves the original folder structure. This is especially useful if you’ve organized your dataset based on conditions, source organisms, or any other hierarchy: no need to lose metadata or spend time reorganizing after the cleanup.
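
The idea of mirroring a folder tree can be sketched in a few lines of plain Python. This is only an illustration of the concept, not how the extension is implemented. The function name and the file suffixes chosen for the four supported formats are assumptions.

```python
from pathlib import Path

# Assumed file suffixes for PDB, mmCIF, MMTF, and MOL2 files.
STRUCTURE_SUFFIXES = {".pdb", ".cif", ".mmtf", ".mol2"}


def plan_outputs(input_root, output_root):
    """Pair each structure file with an output path that mirrors its subfolders."""
    input_root = Path(input_root)
    output_root = Path(output_root)
    plan = []
    for source in sorted(input_root.rglob("*")):
        if source.is_file() and source.suffix.lower() in STRUCTURE_SUFFIXES:
            # relative_to() keeps the sub-path, e.g. human/apo/1abc.pdb
            target = output_root / source.relative_to(input_root)
            plan.append((source, target))
    return plan
```

With this mapping, a file at dataset/human/apo/1abc.pdb would be written to cleaned/human/apo/1abc.pdb, so the grouping encoded in the folder names carries over unchanged.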

Handling edge cases

The batch module is designed to be reasonably robust. For instance:

It supports both standard and extended PDB identifiers
It automatically downloads and prepares structures that are not already available locally
It applies consistent rules when dealing with ambiguous or missing data (e.g., occupancy values)
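
As an illustration of the first point, the sketch below recognizes both identifier styles in plain Python. The patterns reflect the wwPDB conventions as understood here: a classic ID is a digit followed by three alphanumeric characters, and an extended ID is "pdb_" followed by eight alphanumeric characters, with classic IDs mapping to the extended form by zero-padding (1ABC becomes pdb_00001abc). How the extension itself validates identifiers is not documented in this post, and the function name is hypothetical.

```python
import re

# Assumed patterns for the two identifier styles (case-insensitive).
STANDARD_ID = re.compile(r"[0-9][a-z0-9]{3}", re.IGNORECASE)
EXTENDED_ID = re.compile(r"pdb_[a-z0-9]{8}", re.IGNORECASE)


def to_extended_id(identifier):
    """Normalize a classic or extended PDB ID to the lowercase extended form."""
    identifier = identifier.strip()
    if EXTENDED_ID.fullmatch(identifier):
        return identifier.lower()
    if STANDARD_ID.fullmatch(identifier):
        return "pdb_0000" + identifier.lower()
    raise ValueError(f"not a recognizable PDB identifier: {identifier!r}")
```

Normalizing every identifier to one form before processing makes it easy to spot duplicates and to reject typos early, before anything is downloaded.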

No code required

If you’re not into scripting or if you just want a graphical interface that gets the job done, no problem—the Batch Protein Prepare extension comes with a simple and intuitive UI.

When should you use it?

Here are typical use cases where the Batch Protein Prepare extension adds value:

Virtual screening workflows involving many protein targets
Systematic analysis across structural families
Batch simulations or ensemble-based modeling
Teaching or training sessions with large sets of examples

One more tip

If you need deeper fixes, such as rebuilding missing residues or protonating at a specific pH, you can run Batch Protein Prepare first and then apply further corrections with the PDBFixer extension, also available in SAMSON.

To learn more and access detailed steps and examples, visit the original documentation page: https://documentation.samson-connect.net/tutorials/prepare-protein/prepare-protein/

SAMSON and all SAMSON Extensions are free for non-commercial use. To get SAMSON, visit https://www.samson-connect.net.