If you’ve ever worked with multiple protein structures from the Protein Data Bank (PDB), you’ve probably gone through the repetitive, tedious process of cleaning them: stripping waters, removing alternate locations, deleting ligands, adding missing hydrogens… the list goes on. Doing this manually for a handful of files is annoying. Doing it for hundreds is impractical, unless you want to spend days clicking through menus or writing and debugging your own scripts.
Fortunately, there’s an alternative that doesn’t require coding: the Batch Protein Prepare extension in SAMSON. It enables fast, consistent preprocessing of large sets of proteins with a few clicks.
Why batch preparation matters
Before protein structures can be used for docking, molecular dynamics, machine learning, or virtual screening, they often need significant cleanup. Without it, downstream tasks can fail or return poor results. Batch preparation matters when:
- You download multiple structures from the PDB for comparative modeling.
- You want to screen many systems through an automated pipeline (e.g., Vina-based docking).
- You need consistent preprocessing for training data used in structural ML models.
What Batch Protein Prepare does
Available as a free extension in SAMSON, Batch Protein Prepare helps you:
- Automatically download and prepare proteins from their PDB codes (both classic and extended PDB IDs).
- Apply consistent cleanup: remove waters, ions, and unwanted ligands; add hydrogens; and resolve alternate locations.
- Process all files in a folder, in several formats (PDB, mmCIF, MMTF, MOL2), while preserving your folder structure.
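To make concrete what these cleanup steps mean at the file level, here is a minimal Python sketch that works on PDB-format text. It is not how Batch Protein Prepare works internally; the function name `clean_pdb_lines` and the set of water residue names are illustrative assumptions. It drops HETATM records (waters, ions, ligands), keeps a single alternate location per atom, and leaves hydrogen addition aside, since that step needs real chemistry and geometry.

```python
# Illustrative sketch only: NOT the extension's implementation.
# Assumes fixed-column PDB-format records (ATOM/HETATM/TER/END).

WATER_NAMES = {"HOH", "WAT", "DOD"}  # assumed common water residue names


def clean_pdb_lines(lines, keep_altloc="A"):
    """Keep protein ATOM records, drop HETATM and waters, keep one altloc."""
    cleaned = []
    for line in lines:
        record = line[0:6].strip()
        if record == "HETATM":
            # Waters, ions, and ligands are usually HETATM records.
            # Caveat: this also drops modified residues (e.g. MSE) that
            # some files store as HETATM; real tools handle these with care.
            continue
        if record == "ATOM":
            if line[17:20].strip() in WATER_NAMES:
                continue  # some files store waters as ATOM records
            altloc = line[16]  # column 17 in the PDB specification
            if altloc not in (" ", keep_altloc):
                continue  # discard the other alternate conformations
            # Blank the altloc column so the output has one conformation.
            cleaned.append(line[:16] + " " + line[17:])
        elif record in ("TER", "END"):
            cleaned.append(line)
        # Adding hydrogens is intentionally omitted from this sketch.
    return cleaned
```

For example, given an alanine with two alternate CA positions (`A` and `B`) plus a water molecule, the function keeps the `A` conformer with its altloc column blanked and removes the water.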
This is all done through a user-friendly graphical interface: no scripting required.
Example use case
Imagine you have 150 protein-ligand complexes downloaded from the PDB. You want to:
- Keep only the protein.
- Remove waters, ions, and alternate conformations.
- Add missing hydrogens.
With Batch Protein Prepare, you can point to your input folder (or simply list the PDB codes) and let the extension take care of everything. All files are cleaned identically and, for folder input, saved in the same folder hierarchy as the originals.
This approach ensures reproducibility and consistency, which is essential for screening campaigns or training datasets.
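The "same cleanup, same folder hierarchy" idea can be sketched in a few lines of Python. This is a hypothetical illustration, not the extension's code: `batch_process` and `keep_protein_atoms` are made-up names, and only PDB-format text files (`.pdb`, `.ent`) are handled, since mmCIF, MMTF, and MOL2 would need proper parsers.

```python
# Hypothetical sketch: apply one cleanup function to every structure file
# in a folder tree and mirror the tree under an output folder.
from pathlib import Path
from typing import Callable, Iterable, List


def keep_protein_atoms(lines: Iterable[str]) -> List[str]:
    """Minimal illustrative filter: keep ATOM/TER/END, drop HETATM records."""
    return [l for l in lines if l[:6].strip() in {"ATOM", "TER", "END"}]


def batch_process(
    in_root: Path,
    out_root: Path,
    clean: Callable[[Iterable[str]], List[str]] = keep_protein_atoms,
    suffixes=(".pdb", ".ent"),
) -> List[Path]:
    """Clean every matching file under in_root; return the paths written."""
    written = []
    # sorted() materializes the file list before anything is written,
    # so new output files are never picked up as inputs.
    for src in sorted(in_root.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in suffixes:
            continue
        # Same relative path under out_root preserves the folder hierarchy.
        dst = out_root / src.relative_to(in_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        cleaned = clean(src.read_text().splitlines())
        dst.write_text("\n".join(cleaned) + "\n")
        written.append(dst)
    return written
```

Because every file goes through the same `clean` function, swapping in a different filter changes the preprocessing for the whole dataset at once, which is the consistency property that matters for screening campaigns and training sets.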
Getting started
To use Batch Protein Prepare in SAMSON:
- Launch SAMSON and install the Batch Protein Prepare extension.
- Open the extension from the Extensions menu or the main interface.
- Choose to input a folder of protein files or a list of PDB codes.
- Set your preferences for what to keep or remove.
- Click Prepare. That’s it.
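When you supply a list of PDB codes, it can help to know what "classic or extended" IDs look like. The sketch below is an assumption-laden illustration, not part of SAMSON: it assumes the wwPDB extended format `pdb_` followed by eight characters, with classic four-character IDs zero-padded (so `1abc` becomes `pdb_00001abc`). The function name `normalize_pdb_codes` is hypothetical.

```python
# Hypothetical helper: parse a pasted list of PDB codes into one canonical
# form. Assumes extended IDs are "pdb_" + 8 characters and classic IDs map
# to them by zero-padding (1abc -> pdb_00001abc).
import re
from typing import List

CLASSIC_ID = re.compile(r"^[1-9][0-9a-z]{3}$")
EXTENDED_ID = re.compile(r"^pdb_[0-9a-z]{8}$")


def normalize_pdb_codes(text: str) -> List[str]:
    """Split on whitespace, commas, or semicolons; dedupe, keeping order."""
    seen, result = set(), []
    for token in re.split(r"[\s,;]+", text.strip().lower()):
        if not token:
            continue  # empty input yields no tokens
        if CLASSIC_ID.match(token):
            code = "pdb_0000" + token
        elif EXTENDED_ID.match(token):
            code = token
        else:
            raise ValueError(f"not a PDB code: {token!r}")
        if code not in seen:  # classic and extended forms of one entry merge
            seen.add(code)
            result.append(code)
    return result
```

For instance, `"1ABC, 4hhb pdb_00001abc"` yields two entries, because `1ABC` and `pdb_00001abc` refer to the same structure under this assumed mapping.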

Learn more
To explore filtering options, supported formats, and advanced controls, check the complete documentation on protein preparation in SAMSON.
SAMSON and all SAMSON Extensions are free for non-commercial use. You can get SAMSON at https://www.samson-connect.net.
