If you’ve ever needed to prepare more than a handful of protein structures for docking, molecular dynamics, or energy calculations, you’ve probably faced this challenge: dealing with inconsistencies, missing atoms, or unwanted molecules across dozens (or hundreds) of PDB files.
Manual cleaning isn’t just tedious—it’s error-prone. Forgetting to strip a co-factor or missing an alternate atom location can skew results or crash downstream computations. Fortunately, if you’re working with SAMSON, there’s a way to automate these repetitive steps while keeping full control over your inputs and outputs.
Introducing SAMSON’s Batch Protein Prepare
The Batch Protein Prepare extension lets you process entire folders of protein structures—or download and prepare them directly using PDB codes. You get the same cleaning options available in the one-click preparation feature, but scaled to handle your entire dataset.
The extension is especially useful when:
- You need to prepare a protein library for virtual screening.
- You want to clean downloaded structures in bulk instead of opening each one manually.
- You need consistency across all cleaned structures for downstream workflows.
What Does It Do Exactly?
Batch Protein Prepare includes automated options to:
- Remove alternate locations – keeps only the highest-occupancy position for each atom, so every atom has a single, unambiguous location.
- Delete unwanted ligands – essential if you want to isolate protein chains from solvents or co-factors.
- Strip water and monatomic ions – removes waters and single-atom ions that can interfere with many modeling protocols.
- Add hydrogens – based on valence or standard residue types, so hydrogens are already in place before simulation setup.
Batch Protein Prepare supports PDB, mmCIF, MMTF, and MOL2 files.
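To make the first and third options concrete, here is a minimal plain-Python sketch of what "keep the highest-occupancy alternate location" and "strip water" mean for PDB-format text. It is not SAMSON's implementation or API: the function name `clean_pdb_lines`, the `WATER_NAMES` set, and the single-model assumption are all illustrative choices, and the column positions follow the fixed-width PDB ATOM/HETATM layout.

```python
# Illustrative sketch only; not how SAMSON implements these options.
# Assumes fixed-column PDB text containing a single model.

WATER_NAMES = {"HOH", "WAT", "DOD"}  # assumed set of water residue names


def _occupancy(line):
    """Read the occupancy field (columns 55-60); treat a missing value as 0."""
    try:
        return float(line[54:60])
    except ValueError:
        return 0.0


def clean_pdb_lines(lines, strip_water=True):
    """Keep one position per atom (highest occupancy) and optionally drop waters.

    Records other than ATOM/HETATM pass through unchanged.
    """
    kept = []
    best = {}  # atom key -> (occupancy, position in kept)
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            kept.append(line)
            continue
        if strip_water and line[17:20].strip() in WATER_NAMES:
            continue
        # Identify an atom by chain, residue number + insertion code, atom name.
        key = (line[21:22], line[22:27], line[12:16])
        occ = _occupancy(line)
        cleared = line[:16] + " " + line[17:]  # blank the altLoc column
        if key not in best:
            best[key] = (occ, len(kept))
            kept.append(cleared)
        else:
            prev_occ, pos = best[key]
            if occ > prev_occ:
                best[key] = (occ, pos)
                kept[pos] = cleared
    return kept
```

Calling `clean_pdb_lines(open("input.pdb").read().splitlines())` on a hypothetical `input.pdb` would return the filtered lines. A real preparation tool handles more cases than this (multiple models, residues whose type differs between alternate locations, ligand selection), which is part of the reason to automate the step with a dedicated extension.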
Working with PDB Codes Instead?
If you have a list of PDB identifiers from a screening dataset or a publication, you can simply provide them as a string or in a text file. The extension downloads the corresponding files automatically and applies the cleaning workflow.
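If your identifiers come from a spreadsheet or a paper's supplementary material, it can help to normalize them before handing them to the extension. The sketch below is one way to do that in plain Python; it is not part of SAMSON. The helper names (`parse_pdb_codes`, `read_pdb_codes`, `rcsb_url`) are hypothetical, the accepted separators are an assumption, and the pattern covers only classic four-character PDB IDs. The URL template reflects the RCSB file server's usual download pattern and is shown only to illustrate what "downloads the corresponding files" amounts to.

```python
import re
from pathlib import Path

# Classic PDB IDs: a digit 1-9 followed by three alphanumeric characters.
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def parse_pdb_codes(text):
    """Split text on commas, semicolons, or whitespace; return unique upper-case IDs."""
    codes = []
    for token in re.split(r"[\s,;]+", text):
        if not token:
            continue
        if not PDB_ID.fullmatch(token):
            raise ValueError(f"not a 4-character PDB ID: {token!r}")
        code = token.upper()
        if code not in codes:  # drop duplicates, keep first-seen order
            codes.append(code)
    return codes


def read_pdb_codes(path):
    """Read identifiers from a text file (hypothetical helper)."""
    return parse_pdb_codes(Path(path).read_text())


def rcsb_url(code):
    """Assumed download location for a PDB-format entry on the RCSB file server."""
    return f"https://files.rcsb.org/download/{code}.pdb"
```

Rejecting malformed tokens up front means a typo in a long list surfaces immediately rather than as a failed download halfway through a batch.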
Folder Structure Retention
One small but useful detail: Batch Protein Prepare preserves your folder structure. If your inputs are sorted into subfolders (for example, by protein family or functional group), your cleaned files will appear in the same organization—ready for post-processing or scripting.
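If you script around the cleaned files, the mirrored layout is easy to predict. Here is a minimal sketch that maps each input file to the same relative path under an output folder. It only illustrates the idea of structure retention: the function name `plan_outputs` and the suffix set are assumptions, and the extension's actual output file names may differ.

```python
from pathlib import Path

# Assumed file suffixes for structure files; adjust to your own dataset.
STRUCTURE_SUFFIXES = {".pdb", ".cif", ".mmtf", ".mol2"}


def plan_outputs(input_root, output_root):
    """Map every structure file under input_root to the same relative path under output_root.

    Nothing is created on disk; the result is a dict of source -> destination paths.
    """
    input_root = Path(input_root)
    output_root = Path(output_root)
    plan = {}
    for src in sorted(input_root.rglob("*")):
        if src.is_file() and src.suffix.lower() in STRUCTURE_SUFFIXES:
            plan[src] = output_root / src.relative_to(input_root)
    return plan
```

With inputs such as `library/kinases/a.pdb` and `library/proteases/b.pdb`, `plan_outputs("library", "cleaned")` would pair them with `cleaned/kinases/a.pdb` and `cleaned/proteases/b.pdb`, so a post-processing script can walk both trees in parallel.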
Why It Matters
By automating a notoriously repetitive step, the Batch Protein Prepare extension helps save time and reduce human errors. It also ensures that your pipeline starts with cleaner, more consistent inputs—improving reproducibility and simplifying downstream analysis.

To get started, install the Batch Protein Prepare extension from SAMSON Connect.
To learn more, visit the official documentation.
SAMSON and all SAMSON Extensions are free for non-commercial use. You can download SAMSON from https://www.samson-connect.net.
