Processing Multiple Protein Structures Without the Headaches

Working with dozens (or hundreds) of PDB files is a common challenge in computational biology and drug discovery. Whether you’re screening compounds, running molecular dynamics, or preparing datasets for machine learning, one issue keeps coming up: curating and preparing all those structures. One file might be missing hydrogens. Another might still contain water molecules or irrelevant ligands. And repeating the same cleanup steps by hand? Tedious at best, error-prone at worst.

This is where the Batch Protein Prepare extension in SAMSON can help. It simplifies large-scale protein preparation workflows by applying consistent cleaning and preparation steps across many files in one go.

Why Batch Preparation Matters

Many molecular modeling pipelines depend on consistent, reliable input structures. Inconsistencies or omissions, such as missing hydrogens or leftover solvent, can propagate downstream and skew the very results you’re trying to measure. Preparing one file by hand might be fine. But what if you’re working with hundreds?

The Batch Protein Prepare extension offers a way to:

Apply standardized preparation steps like removing alternate atom locations, stripping solvents/ions, and adding hydrogens.
Download structures automatically using their PDB IDs, without leaving the platform.
Preserve internal folder structures when processing large directories of input files.
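
The extension takes care of folder mirroring itself, but if you post-process its output with your own scripts, the idea is easy to reproduce. Below is a minimal Python sketch, not SAMSON’s implementation, that maps each input file to the same relative location under an output folder. The function names mirrored_output_path and write_mirrored are hypothetical.

```python
from pathlib import Path


def mirrored_output_path(input_root: Path, input_file: Path, output_root: Path) -> Path:
    """Map input_root/<sub/dirs>/<file> to output_root/<sub/dirs>/<file>."""
    # relative_to() raises ValueError if input_file does not live under input_root
    return output_root / input_file.relative_to(input_root)


def write_mirrored(input_root: Path, input_file: Path, output_root: Path, text: str) -> Path:
    """Write text to the mirrored location, creating intermediate folders as needed."""
    out_path = mirrored_output_path(input_root, input_file, output_root)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(text)
    return out_path
```

For example, in/kinases/abl1.pdb maps to out/kinases/abl1.pdb, so targets you grouped by family stay grouped after preparation.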

How It Works

The extension provides a simple interface that takes either a folder of structures or a list of PDB identifiers. It supports multiple input formats, including:

.pdb
.mmCIF / PDBx
.mmtf
.mol2
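
If you want to check ahead of time which files in a folder match these formats, a short Python sketch can do the filtering. It assumes that mmCIF/PDBx files carry a .cif or .mmcif suffix (the common conventions; the exact suffixes SAMSON recognizes are not stated here), and the names SUPPORTED_SUFFIXES, is_supported, and find_structures are hypothetical.

```python
from pathlib import Path

# Assumed suffixes for the formats listed above; mmCIF/PDBx files commonly use .cif.
SUPPORTED_SUFFIXES = {".pdb", ".cif", ".mmcif", ".mmtf", ".mol2"}


def is_supported(path: Path) -> bool:
    # Case-insensitive, so 1ABC.PDB and 1abc.pdb are treated alike
    return path.suffix.lower() in SUPPORTED_SUFFIXES


def find_structures(input_root: Path) -> list[Path]:
    """Recursively list supported structure files, sorted for a reproducible order."""
    return sorted(p for p in input_root.rglob("*") if p.is_file() and is_supported(p))
```

Compressed files such as 1abc.pdb.gz fall through this filter (their suffix is .gz) and would need decompressing first.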

Each file is processed with the same protocol used in SAMSON’s Home > Prepare tool. This includes:

Removing alternate atom locations (retaining higher-occupancy atoms)
Deleting ligands or co-factors not needed for further analysis
Clearing water and monatomic ions
Adding hydrogens based on residue type or valence
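
To make the alternate-location and water/ion steps concrete, here is a minimal Python sketch of that logic on a simplified, hypothetical Atom record. It is not SAMSON’s code: the ion list is a short illustrative assumption, occupancy ties keep the first-listed location, insertion codes are ignored, and ligand removal and hydrogen addition are left out because they need chemistry-aware tooling.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Atom:
    """Hypothetical, simplified atom record (not a SAMSON type)."""
    chain: str
    res_seq: int
    res_name: str
    name: str
    alt_loc: str  # "" when the atom has a single location
    occupancy: float


# Illustrative residue names only; real pipelines use larger, curated lists.
WATER_NAMES = {"HOH", "WAT"}
MONATOMIC_ION_NAMES = {"NA", "K", "CL", "MG", "CA", "ZN", "MN", "FE", "CU", "NI", "CO", "CD"}
REMOVED_RESIDUES = WATER_NAMES | MONATOMIC_ION_NAMES


def strip_water_and_ions(atoms: list[Atom]) -> list[Atom]:
    return [a for a in atoms if a.res_name not in REMOVED_RESIDUES]


def resolve_alt_locs(atoms: list[Atom]) -> list[Atom]:
    """Keep one copy of each atom: the alternate location with the highest occupancy."""
    best: dict[tuple, Atom] = {}
    for atom in atoms:
        key = (atom.chain, atom.res_seq, atom.name)  # ignores insertion codes for brevity
        # Strict ">" means an occupancy tie keeps the first-listed location
        if key not in best or atom.occupancy > best[key].occupancy:
            best[key] = atom
    # Dicts keep first-insertion order, so atoms come out in their original order;
    # the alt_loc label is cleared on every survivor
    return [replace(atom, alt_loc="") for atom in best.values()]


def clean(atoms: list[Atom]) -> list[Atom]:
    # Strip solvent and ions first so they can never collide with protein atom keys
    return resolve_alt_locs(strip_water_and_ions(atoms))
```

For a serine whose OG atom has locations A (occupancy 0.6) and B (0.4), clean keeps the A copy, relabels it without an alternate location, and drops any HOH or ZN records.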

Need to prepare 50 structures for docking with AutoDock Vina? Just enter the PDB codes and let Batch Protein Prepare download them, clean them, and write out prepared files. No coding required, no manual editing needed.
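
If you ever need to reproduce the download step outside SAMSON, RCSB serves entries at https://files.rcsb.org/download/<ID>.pdb (or .cif). The sketch below, with hypothetical helper names, normalizes a pasted list of IDs and builds those URLs. It assumes classic four-character IDs and does not perform the download itself.

```python
import re

# Classic PDB IDs: a digit 1-9 followed by three letters or digits (e.g. 1CRN, 4HHB).
# Assumption: the newer extended "pdb_XXXXXXXX" identifiers are not handled here.
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def parse_id_list(text: str) -> list[str]:
    """Split on commas, semicolons, or whitespace; upper-case; drop duplicates, keep order."""
    tokens = (t.upper() for t in re.split(r"[\s,;]+", text) if t)
    return list(dict.fromkeys(tokens))


def rcsb_download_url(pdb_id: str, fmt: str = "pdb") -> str:
    """Build an RCSB file-download URL for one entry."""
    if not PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    if fmt not in {"pdb", "cif"}:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.{fmt}"
```

Each URL can then be fetched with urllib.request.urlopen from the standard library. Very large entries are distributed only as mmCIF, so fmt="cif" is the safer choice for bulk downloads.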

Example Use Case

Let’s say you’re setting up a computational screening campaign and have collected 100 target proteins in PDB format. Rather than opening, editing, and checking each one by hand, you can:

Put all the files in a folder
Launch the Batch Protein Prepare extension
Select your folder and output location
Click Run

The extension processes the files, downloading updated versions if needed, and applies every preparation step consistently. The output is ready for downstream modeling tasks without additional manual curation.
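
Before handing outputs to a docking or simulation tool, you may still want a quick automated spot-check. The following sketch (hypothetical function name, PDB-format output only) scans a file’s ATOM/HETATM records for leftover alternate-location labels and water residues, and flags files that contain no hydrogens at all. It reads the standard fixed PDB columns: altLoc in column 17, residue name in columns 18-20, element symbol in columns 77-78.

```python
def check_prepared_pdb(text: str, removed_residues=frozenset({"HOH", "WAT"})) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    n_atoms = 0
    n_hydrogens = 0
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip headers, TER, END, etc.
        n_atoms += 1
        # Slicing (rather than indexing) tolerates short, truncated lines
        alt_loc = line[16:17].strip()
        res_name = line[17:20].strip()
        if alt_loc:
            problems.append(f"line {number}: alternate location {alt_loc!r}")
        if res_name in removed_residues:
            problems.append(f"line {number}: residue {res_name} still present")
        # Element symbol sits in columns 77-78; fall back to the atom name's first letter
        element = line[76:78].strip() or line[12:16].strip()[:1]
        if element.upper() == "H":
            n_hydrogens += 1
    if n_atoms and n_hydrogens == 0:
        problems.append("no hydrogen atoms found")
    return problems
```

Running this over every output file and collecting the non-empty results gives a short list of structures worth a closer look.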

This is especially helpful for researchers who pull structures from structural databases or automate repetitive modeling workflows. Preparing entire sets of proteins cleanly and consistently saves time and reduces the chance of manual errors, making your results more reproducible and your life easier.

To learn more about protein preparation and validation in SAMSON, visit the full documentation page.

SAMSON and all SAMSON Extensions are free for non-commercial use. You can download SAMSON here.