Molecular modelers often need to prepare dozens, or even hundreds, of protein structures before running simulations, performing docking, or building databases. The problem? Manual cleaning is slow, repetitive, and error-prone. Scripts help, but they take time to maintain and don’t always cover edge cases.
That’s where SAMSON’s Batch Protein Prepare extension comes in: it prepares many protein files in a single run, without requiring you to write any code. Whether you’re working with a folder of downloaded structures or a list of PDB codes, Batch Protein Prepare helps you automate and standardize your preprocessing workflow reliably and quickly.
Why Batch Preparation Matters
Protein preparation is crucial. Raw structures may contain alternate atom locations, waters, co-factors, or missing atoms that interfere with calculations or make them fail. When preparing many structures, consistency matters even more:
- You need to keep track of which structures were prepared and how.
- Manual clicking slows down processing and introduces inconsistency.
- Scripting is a solution, but not everyone scripts.
Batch Protein Prepare addresses these issues with a user-friendly interface built into SAMSON that applies the same settings to every structure.
How Batch Protein Prepare Works
The extension applies the same preparation workflow available in Home > Prepare, including:
- Removing alternate locations (keeping highest-occupancy atoms)
- Deleting unnecessary ligands, waters, and monatomic ions
- Adding hydrogens
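To make the first two steps concrete at the file level, here is a minimal Python sketch. It is a hypothetical illustration, not SAMSON’s implementation (the extension itself needs no code): it reads fixed-column PDB ATOM/HETATM records, keeps the highest-occupancy alternate for each atom site, and drops waters plus a small, assumed list of monatomic ion residue names. Ligand removal and hydrogen addition are left out.

```python
# Hypothetical illustration only: NOT SAMSON's implementation or API.
# Works on fixed-column PDB ATOM/HETATM records.

WATER_NAMES = {"HOH", "WAT", "DOD"}                    # assumed water residue names
ION_NAMES = {"NA", "K", "CL", "MG", "CA", "ZN", "MN"}  # assumed monatomic-ion residue names
DROP_RESNAMES = WATER_NAMES | ION_NAMES


def atom_key(line: str) -> tuple:
    """Identify an atom site independently of its altLoc (column 17)."""
    chain = line[21]          # column 22
    res_seq = line[22:26]     # columns 23-26
    ins_code = line[26]       # column 27
    atom_name = line[12:16]   # columns 13-16
    return (chain, res_seq, ins_code, atom_name)


def occupancy(line: str) -> float:
    """Occupancy sits in columns 55-60; a blank field is treated as 1.0."""
    field = line[54:60].strip()
    return float(field) if field else 1.0


def resolve_altlocs_and_strip(lines: list[str]) -> list[str]:
    """Return coordinate records with one conformer per atom, minus waters/ions."""
    best: dict[tuple, str] = {}   # atom site -> best record seen so far
    order: list[tuple] = []       # first-seen order of atom sites
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # this sketch keeps only coordinate records
        if line[17:20].strip() in DROP_RESNAMES:
            continue  # drop waters and monatomic ions
        key = atom_key(line)
        if key not in best:
            order.append(key)
            best[key] = line
        elif occupancy(line) > occupancy(best[key]):
            best[key] = line  # keep the higher-occupancy alternate
    # Blank the altLoc column: only one conformer remains per atom site.
    return [best[k][:16] + " " + best[k][17:] for k in order]
```

A production tool would also handle alternates that differ in residue name and would preserve records such as TER and CONECT; the sketch skips both for brevity.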
Input formats supported: PDB, PDBx/mmCIF, MMTF, MOL2
Ways to provide input:
- Select a folder containing structure files (preserves output folder hierarchy)
- Provide a list of PDB identifiers (plain text or comma-separated)
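The two input modes can be illustrated with a short, hypothetical Python sketch. `parse_pdb_ids` accepts identifiers separated by commas, spaces, or newlines and checks them against the classic four-character PDB ID pattern (newer extended IDs are not handled), while `mirrored_output_path` shows what preserving the folder hierarchy means: each output keeps its path relative to the input folder. The function names and the file-suffix list are assumptions for illustration, not part of SAMSON.

```python
import re
from pathlib import Path

# Hypothetical helpers, not SAMSON's API.

# Classic PDB ID: a digit 1-9 followed by three alphanumerics, e.g. "1CRN".
PDB_ID = re.compile(r"^[1-9][A-Za-z0-9]{3}$")

# Assumed file suffixes for PDB, PDBx/mmCIF, MMTF and MOL2 inputs.
SUPPORTED_SUFFIXES = {".pdb", ".cif", ".mmtf", ".mol2"}


def parse_pdb_ids(text: str) -> list[str]:
    """Split on commas and/or whitespace, validate, upper-case, de-duplicate."""
    ids: list[str] = []
    for token in re.split(r"[,\s]+", text):
        if not token:
            continue  # skip empty pieces from leading/trailing separators
        if not PDB_ID.match(token):
            raise ValueError(f"not a 4-character PDB ID: {token!r}")
        code = token.upper()
        if code not in ids:
            ids.append(code)
    return ids


def collect_structure_files(input_root: Path) -> list[Path]:
    """Recursively find files whose suffix is one of the assumed formats."""
    return sorted(
        p for p in input_root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


def mirrored_output_path(input_file: Path, input_root: Path, output_root: Path) -> Path:
    """Map input_root/sub/dir/x.pdb to output_root/sub/dir/x.pdb."""
    return output_root / input_file.relative_to(input_root)
```

De-duplicating IDs while keeping their first-seen order means a pasted list with repeats is still processed once per structure, in the order given.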
The extension will download missing structures (if needed) and apply uniform cleaning based on your selections.
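Downloading only what is missing amounts to checking a local folder before fetching anything. The Python sketch below assumes structures come from the public RCSB PDB file server as mmCIF files; the source and format SAMSON actually uses are not stated here, and `plan_downloads` and `download_missing` are hypothetical names.

```python
import urllib.request
from pathlib import Path

# Assumption: the public RCSB PDB file server is used as the download source.
RCSB_URL = "https://files.rcsb.org/download/{code}.cif"


def plan_downloads(pdb_ids: list[str], cache_dir: Path) -> dict[str, str]:
    """Map each PDB ID that has no local .cif or .pdb copy to a download URL."""
    missing: dict[str, str] = {}
    for code in pdb_ids:
        local_copies = (cache_dir / f"{code}.cif", cache_dir / f"{code}.pdb")
        if not any(path.exists() for path in local_copies):
            missing[code] = RCSB_URL.format(code=code)
    return missing


def download_missing(pdb_ids: list[str], cache_dir: Path) -> list[Path]:
    """Fetch only the missing files; structures already on disk are reused."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    fetched: list[Path] = []
    for code, url in plan_downloads(pdb_ids, cache_dir).items():
        target = cache_dir / f"{code}.cif"
        urllib.request.urlretrieve(url, target)
        fetched.append(target)
    return fetched
```

Separating the plan from the fetch makes it easy to inspect which structures would be downloaded before any network access happens.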
Use Cases & Workflow
- Building training datasets for machine learning
- Preparing libraries for docking (e.g., for AutoDock Vina)
- Generating pre-cleaned input files for molecular dynamics simulations
For example, say you want to prepare 100 protein-ligand complexes for a virtual screening benchmark. Collect the PDB codes, paste them into Batch Protein Prepare, and process every structure with the same preparation settings. You can then move straight to docking or simulation without the tedious manual clean-up.

Why This Approach is Useful
Unlike bespoke pipelines or in-house scripts, this extension is:
- Graphical – accessible to users who aren’t programming experts
Fast – processes an entire directory or list of PDB codes in a single run
Reproducible – applies the same preparation settings to every input
It handles tricky edge cases like inconsistent residue naming and alternate atom positions in a standardized way.
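One way to make "the same settings for every input" auditable, and to keep track of which structures were prepared and how, is to record the settings next to each output. The Python sketch below is hypothetical: the field names mirror the preparation steps listed earlier but are not SAMSON’s option names, and the manifest layout is an assumption.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PreparationPolicy:
    """Hypothetical record of the options applied to every input in a batch."""
    remove_alternate_locations: bool = True
    remove_ligands: bool = True
    remove_waters: bool = True
    remove_monatomic_ions: bool = True
    add_hydrogens: bool = True

    def fingerprint(self) -> str:
        """Short, stable hash of the settings so two runs can be compared."""
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()[:12]


def manifest_entry(input_name: str, output_name: str, policy: PreparationPolicy) -> dict:
    """One manifest row: which file was prepared, into what, and with which settings."""
    return {
        "input": input_name,
        "output": output_name,
        "policy": asdict(policy),
        "policy_id": policy.fingerprint(),
    }
```

Because the policy is frozen and hashed from sorted JSON, two batches with identical settings share the same `policy_id`, and any changed option produces a different one.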
Conclusion
Preparing large sets of proteins shouldn’t require hours of manual polishing or complex scripting. With SAMSON’s Batch Protein Prepare, you can focus on your science while ensuring your inputs are consistently clean.
Learn more about protein preparation in SAMSON.
SAMSON and all SAMSON Extensions are free for non-commercial use. You can download SAMSON here.
