If you work with large structural datasets, you’re probably familiar with the tedium of preparing protein files before launching simulations or docking studies: removing water molecules, deleting unnecessary ligands, fixing missing atoms. Doing this manually for dozens or even hundreds of structures is slow and error-prone.
The Batch Protein Prepare extension in SAMSON offers a streamlined way to automate this entire process. Whether you’re collecting PDB files from public databases or cleaning up your own experimental models, this extension ensures that your input files are ready for downstream workflows with minimal effort.
Why Batch Preparation Matters
Accurate protein modeling depends on high-quality input data. Poorly prepared structures can lead to inaccurate binding affinities, simulation crashes, or misinterpreted results. Here are a few common issues with raw PDB files:
- Missing atoms or residues
- Alternate atom locations causing ambiguity
- Unnecessary heterogens like ions, solvents, or co-factors
Manually addressing each of these across multiple files quickly becomes unmanageable. Batch Protein Prepare removes that overhead by applying the same cleaning steps to every file.
How It Works
Once installed from SAMSON Connect, the Batch Protein Prepare extension lets you:
- Process entire folders of PDB, PDBx/mmCIF, MMTF, or MOL2 files, maintaining the original folder structure for outputs.
- Download structures automatically from PDB codes, supplied either as a single string or as a list in a text file.
- Apply standard cleaning steps automatically—remove alternate locations, delete unnecessary ligands and solvents, strip ions, and add hydrogens based on residue type or valence.
This allows you to prepare large datasets with a consistent procedure—a must for any work involving structure-based virtual screening, statistical modeling of protein properties, or training AI models on biomolecular structures.
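To make the cleaning steps above concrete, here is a minimal sketch in plain Python of what two of them amount to on a PDB-format file: dropping heterogens (including water) and keeping a single alternate location. This is not the extension's implementation or API. The function name `clean_pdb_text` and the `keep_altloc` parameter are hypothetical, the sketch assumes every alternate-location group has a conformer labeled `A`, and it does not add hydrogens or repair records (such as CONECT) that refer to removed atoms.

```python
def clean_pdb_text(pdb_text: str, keep_altloc: str = "A") -> str:
    """Illustrative only: drop heterogens and keep one alternate location.

    Relies on the fixed-column PDB layout: the record name is in
    columns 1-6 and the alternate location indicator is in column 17.
    """
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6]
        # HETATM covers waters, ions, ligands and cofactors. ANISOU records
        # are dropped too, so none is left behind for a removed atom.
        if record in ("HETATM", "ANISOU"):
            continue
        if record == "ATOM  ":
            altloc = line[16:17]
            if altloc.strip() and altloc != keep_altloc:
                continue  # a conformer other than the one we keep
            if altloc == keep_altloc:
                # Blank the indicator so the kept atom is unambiguous.
                line = line[:16] + " " + line[17:]
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Applying one function like this to every file is what makes a batch run consistent: each structure goes through exactly the same rules, rather than whatever a person remembered to do that day.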
Use Cases
If you’re screening ligand libraries against multiple target structures, validating predicted complexes, or assembling training data for machine learning, Batch Protein Prepare can save hours of manual prep while ensuring consistency across your workflows.
The extension supports both local files and remote downloads from the RCSB PDB, handling the entire operation in one go.
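For readers who want a feel for the bookkeeping behind the two input modes, the sketch below shows one way to collect inputs in plain Python. It is an illustration under stated assumptions, not how the extension works internally: all function names are hypothetical, the accepted separators for PDB codes (commas, semicolons, whitespace) and the file extensions are guesses, and the URL pattern is the one used by the RCSB file download service for PDB-format entries.

```python
import re
from pathlib import Path

# Assumed file extensions for the supported formats.
STRUCTURE_SUFFIXES = {".pdb", ".cif", ".mmtf", ".mol2"}
# Classic PDB codes: a digit followed by three alphanumeric characters.
PDB_CODE = re.compile(r"[0-9][A-Za-z0-9]{3}")


def parse_pdb_codes(text: str) -> list[str]:
    """Split a string (or the contents of a text file) into PDB codes."""
    codes = []
    for token in re.split(r"[,;\s]+", text):
        if not token:
            continue
        if not PDB_CODE.fullmatch(token):
            raise ValueError(f"not a 4-character PDB code: {token!r}")
        codes.append(token.upper())
    return codes


def rcsb_download_url(code: str) -> str:
    """Build the RCSB download URL for a PDB-format entry."""
    return f"https://files.rcsb.org/download/{code.upper()}.pdb"


def find_structure_files(input_root: Path) -> list[Path]:
    """Recursively list structure files under input_root."""
    return sorted(
        p for p in input_root.rglob("*")
        if p.is_file() and p.suffix.lower() in STRUCTURE_SUFFIXES
    )


def mirrored_output_path(input_file: Path, input_root: Path, output_root: Path) -> Path:
    """Map an input file to the same relative location under output_root."""
    return output_root / input_file.relative_to(input_root)
```

For example, `mirrored_output_path(Path("in/kinases/1abc.pdb"), Path("in"), Path("out"))` yields `out/kinases/1abc.pdb`, which is the kind of mapping needed to keep the original folder structure in the outputs.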
Sneak Peek
Here’s what the Batch Protein Prepare interface looks like:
[Screenshot: Batch Protein Prepare interface]

Final Thoughts
Manually cleaning PDB files isn’t just a productivity bottleneck; it can also introduce inconsistencies between structures that carry through to your results. The Batch Protein Prepare extension handles this crucial (but often overlooked) step so you can focus on your science, not file prep.
To learn more about protein preparation workflows in SAMSON, visit the full documentation: https://documentation.samson-connect.net/tutorials/prepare-protein/prepare-protein/
SAMSON and all SAMSON Extensions are free for non-commercial use. You can get SAMSON at https://www.samson-connect.net
