If you frequently model proteins from PDB files or need to work with several different protein structures at once, you might already know how time-consuming and error-prone it can be to clean and prepare each structure individually. Protein preparation involves removing solvents and ligands, fixing missing atoms, and ensuring geometry is valid—steps that are critical before running any simulation or docking task.
This need for cleaning becomes a bottleneck when you’re dealing with multiple files or running high-throughput tasks like virtual screening. Fortunately, SAMSON offers an extension that simplifies this task—Batch Protein Prepare.
Streamline your workflow
The Batch Protein Prepare extension in SAMSON allows researchers to automate the preparation of entire folders of protein structures or even fetch and prepare proteins directly from their PDB IDs. This means you can:
- Prepare proteins stored in folders, with support for formats like .pdb, .cif (mmCIF/PDBx), .mmtf, and .mol2.
- Submit a string or text file of PDB IDs to automatically download and clean structures.
- Preserve your subfolder architecture during export, which helps keep data organized.
These batch operations remove alternate atom locations, water molecules, and monatomic ions. They also delete unnecessary ligands and add hydrogens using default rules suitable for downstream tasks like docking and energy calculations.
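To make the first three rules concrete, here is a minimal Python sketch of how such a filter could work on plain PDB-format lines. It is not SAMSON's implementation. The names `clean_pdb_lines` and `residue_key` are hypothetical, keeping only the blank or "A" alternate location is an assumed convention, and treating every single-atom HETATM residue as a monatomic ion is a heuristic. Ligand removal and hydrogen addition are left out.

```python
from collections import Counter

# Residue names commonly used for water in PDB files.
WATER_NAMES = {"HOH", "WAT", "DOD"}


def residue_key(line):
    """Identify a residue by chain ID, residue number, insertion code and name.

    Uses the fixed-column PDB layout: chain in column 22, residue number in
    columns 23-26, insertion code in column 27, residue name in columns 18-20.
    """
    return (line[21:22], line[22:26], line[26:27], line[17:20])


def clean_pdb_lines(lines):
    """Return the lines without extra altlocs, waters and single-atom HETATM groups."""
    # Pass 1: count atoms per HETATM residue (first alternate location only),
    # so that single-atom groups can be recognised as likely monatomic ions.
    het_counts = Counter(
        residue_key(line)
        for line in lines
        if line.startswith("HETATM") and line[16:17] in " A"
    )

    cleaned = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            cleaned.append(line)  # keep headers, TER, END, etc. untouched
            continue
        if line[16:17] not in " A":
            continue  # assumption: keep only the blank or 'A' alternate location
        if line[17:20].strip() in WATER_NAMES:
            continue  # water molecule
        if line.startswith("HETATM") and het_counts[residue_key(line)] == 1:
            continue  # heuristic: a single-atom hetero group is treated as an ion
        cleaned.append(line)
    return cleaned
```

A call such as `clean_pdb_lines(open("input.pdb").read().splitlines())` would return the filtered lines, which can then be written back to a new file. The sketch relies on the fixed column positions of the PDB format, so it does not apply to mmCIF, MMTF, or MOL2 input.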
Who is this useful for?
This tool is particularly helpful for scientists doing virtual screenings, molecular dynamics simulations, or anyone needing to prepare hundreds (or thousands) of protein structures. Instead of opening each file and clicking through a series of manual validation steps, you can trigger one operation and let SAMSON do the rest.
How it works
Once the Batch Protein Prepare extension is installed, you can select a folder containing structures or input a list of PDB codes. SAMSON will then download any missing entries, prepare all structures according to the same cleaning rules, and export them into an output folder that mirrors the original folder hierarchy.
The extension is also valuable for teams: because every structure goes through the same cleaning rules, collaborators work with identically pre-processed models, which improves reproducibility and reduces debugging time.
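The idea of an output folder that mirrors the input can be illustrated with a short Python sketch. It only plans where each prepared file would go and does not call SAMSON. The function name `plan_outputs` is hypothetical, and the suffix list simply reuses the formats named above.

```python
from pathlib import Path

# File extensions mentioned above as supported input formats.
SUPPORTED_SUFFIXES = {".pdb", ".cif", ".mmtf", ".mol2"}


def plan_outputs(input_root, output_root):
    """Pair every structure file under input_root with a mirrored output path."""
    input_root = Path(input_root)
    output_root = Path(output_root)
    plan = []
    for src in sorted(input_root.rglob("*")):
        if src.is_file() and src.suffix.lower() in SUPPORTED_SUFFIXES:
            # Re-root the path relative to input_root under output_root,
            # so subfolders are preserved in the output.
            plan.append((src, output_root / src.relative_to(input_root)))
    return plan
```

With this layout, a file at `proteins/kinases/1abc.pdb` would map to `prepared/kinases/1abc.pdb` when calling `plan_outputs("proteins", "prepared")`. Before writing, each destination's parent folder would need to be created, for example with `dst.parent.mkdir(parents=True, exist_ok=True)`.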
Visual example

Closing thoughts
Working with proteins at scale requires tools that scale with your needs. SAMSON’s Batch Protein Prepare extension offers a streamlined, reliable solution to a common bottleneck in molecular modeling. Instead of spending time manually curating PDB files, scientists can focus on analysis, simulation, and discovery.
Learn more in the full documentation.
SAMSON and all SAMSON Extensions are free for non-commercial use. You can get SAMSON at https://www.samson-connect.net.
