Preparing Multiple Protein Structures Without the Headache

When working with protein modeling pipelines, one task that can quickly become repetitive and error-prone is formatting and cleaning your input structures. Whether you’re setting up simulations, docking experiments, or dataset-wide screening tasks, the protein structures you start with need to be clean and consistent. But what if you have dozens—or hundreds—of PDB files to prepare?

Manual preparation is not only tedious but also risky: a single leftover water molecule, a missing hydrogen, or a misnamed residue can cause unexpected results downstream. Fortunately, SAMSON offers a practical and time-saving solution through its Batch Protein Prepare extension.

Why Batch Preparation Matters

Most molecular modeling workflows assume properly preprocessed input structures. If a structure contains solvent or ambiguous atom positions (alternate locations), or lacks hydrogens, the result can be poor docking scores, simulation instability, or even application crashes. Now imagine running a pipeline on dozens of structures only to discover that several runs failed because of small, avoidable irregularities. That is time lost. The Batch Protein Prepare extension lets you prepare entire folders of structures, or download and clean them directly from PDB codes, in a single step.

How It Works

The Batch Protein Prepare extension automates the same cleaning workflow available under Home > Prepare in SAMSON. This includes:

Removing alternate atom locations (keeping the highest-occupancy ones)
Deleting unnecessary ligands, co-factors, and ions
Stripping water molecules
Adding hydrogens to standard residues using residue-type rules or valence information
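The first three steps can be made concrete with a small script. The Python sketch below shows what they might amount to for a PDB-format file. It is not SAMSON's implementation or API. It works on raw fixed-column ATOM/HETATM records and treats HOH, WAT and DOD as water. For each residue it keeps the alternate-location label with the highest occupancy, which is an assumption about how conformers are chosen. It drops every other HETATM group unless that group is listed in the hypothetical `keep_het` parameter. Hydrogen addition is deliberately left out, because it needs residue templates or valence rules.

```python
from collections.abc import Iterable

# Assumption: common residue names used for water in PDB files
WATER_NAMES = {"HOH", "WAT", "DOD"}
COORD_RECORDS = ("ATOM  ", "HETATM")


def _residue_key(line: str) -> tuple:
    # Chain ID (col 22), residue number (cols 23-26), insertion code (col 27)
    return (line[21], line[22:26], line[26])


def _occupancy(line: str) -> float:
    # Occupancy is stored in cols 55-60; a missing value is treated as 1.0
    try:
        return float(line[54:60])
    except ValueError:
        return 1.0


def _keep_record(line: str, keep_het: frozenset) -> bool:
    res_name = line[17:20].strip()
    if res_name in WATER_NAMES:
        return False  # strip waters
    # Keep standard-residue atoms; drop HETATM groups unless explicitly kept
    return line.startswith("ATOM  ") or res_name in keep_het


def clean_pdb_lines(lines: Iterable[str], keep_het: frozenset = frozenset()) -> list[str]:
    """Hypothetical cleanup: remove waters, HETATM groups and alternate locations."""
    lines = list(lines)
    kept = [l for l in lines if l.startswith(COORD_RECORDS) and _keep_record(l, keep_het)]

    # Pass 1: per residue, the altLoc label with the highest occupancy wins
    # (ties go to the label seen first)
    best = {}
    for line in kept:
        alt = line[16]
        if alt == " ":
            continue
        key = _residue_key(line)
        occ = _occupancy(line)
        if key not in best or occ > best[key][1]:
            best[key] = (alt, occ)

    # Pass 2: rebuild the file, keeping non-coordinate records in order
    cleaned = []
    for line in lines:
        if line.startswith("ANISOU"):
            continue  # anisotropic records are dropped for simplicity
        if not line.startswith(COORD_RECORDS):
            cleaned.append(line)  # HEADER, TER, END, ... pass through unchanged
            continue
        if not _keep_record(line, keep_het):
            continue
        alt = line[16]
        if alt != " ":
            if alt != best[_residue_key(line)][0]:
                continue  # a lower-occupancy alternate location
            line = line[:16] + " " + line[17:]  # blank the altLoc column
        cleaned.append(line)
    return cleaned
```

Choosing one label per residue, rather than per atom, avoids mixing atoms from different conformers. Note that modified residues such as selenomethionine are also HETATM records, so this sketch would drop them unless their names are passed in `keep_het`.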

Using this extension, you can:

Process an entire folder of structures in bulk. Supported formats include .pdb, .mmCIF, .mmtf, and .mol2.
Download structures automatically from the PDB using a list of identifiers (supports both legacy and extended PDB IDs).

Output structures are saved with the same folder structure as the input, making it easy to map back to original datasets or continue processing without restructuring your files.
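Mirroring the input layout is a simple path mapping: each output file keeps its path relative to the input root and only the root changes. The Python sketch below shows that idea under stated assumptions. The function names are hypothetical, `.cif` is accepted alongside `.mmcif` on the assumption that mmCIF files often use that suffix, and `transform` stands in for whatever per-file preparation is applied. Files are handled as bytes so that binary formats such as MMTF are not decoded as text.

```python
from pathlib import Path
from typing import Callable

# Extensions accepted in this sketch (".cif" added as an assumption)
SUPPORTED_SUFFIXES = {".pdb", ".cif", ".mmcif", ".mmtf", ".mol2"}


def plan_outputs(input_root, output_root):
    """Pair every supported file under input_root with a mirrored output path."""
    input_root, output_root = Path(input_root), Path(output_root)
    plan = []
    for src in sorted(input_root.rglob("*")):
        if src.is_file() and src.suffix.lower() in SUPPORTED_SUFFIXES:
            # Same relative location under a different root preserves the hierarchy
            plan.append((src, output_root / src.relative_to(input_root)))
    return plan


def mirror_folder(input_root, output_root, transform: Callable[[bytes], bytes]):
    """Apply a hypothetical per-file transform and write the mirrored tree."""
    written = []
    for src, dst in plan_outputs(input_root, output_root):
        dst.parent.mkdir(parents=True, exist_ok=True)
        # Bytes, not text, so binary formats such as MMTF pass through intact
        dst.write_bytes(transform(src.read_bytes()))
        written.append(dst)
    return written
```

For example, `mirror_folder("dataset", "dataset_prepared", lambda b: b)` copies every supported file into `dataset_prepared` with the same subfolders. Both folder names are placeholders.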

A Simple Workflow Example

To clean a folder of downloaded proteins (for example, from the PDB), launch the Batch Protein Prepare extension, specify the input folder, and let SAMSON process each structure. Alternatively, to work with specific identifiers (like 1abc, 2xyz), you can either paste them directly or load them from a text file. SAMSON takes care of downloading the corresponding files and applying standard cleanup steps.
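A list of identifiers, whether pasted or read from a text file, can be checked before any download starts. The Python sketch below is hypothetical and is not how SAMSON parses IDs. It splits on whitespace, commas or semicolons, lowercases and deduplicates the tokens, and sets aside anything that does not look like a PDB code. It assumes that legacy IDs are a digit from 1 to 9 followed by three alphanumerics (e.g. `1abc`). It also assumes that extended IDs are `pdb_` followed by eight alphanumerics (e.g. `pdb_00001abc`).

```python
import re

# Assumed ID shapes: legacy "1abc" and extended "pdb_00001abc"
LEGACY_ID = re.compile(r"^[1-9][a-z0-9]{3}$")
EXTENDED_ID = re.compile(r"^pdb_[a-z0-9]{8}$")


def parse_pdb_ids(text: str) -> tuple[list[str], list[str]]:
    """Return (valid IDs in first-seen order, rejected tokens)."""
    ids, rejected, seen = [], [], set()
    for token in re.split(r"[\s,;]+", text.strip()):
        if not token:
            continue  # empty input yields a single empty token
        tok = token.lower()
        if LEGACY_ID.match(tok) or EXTENDED_ID.match(tok):
            if tok not in seen:  # skip duplicates
                seen.add(tok)
                ids.append(tok)
        else:
            rejected.append(token)
    return ids, rejected
```

To read the IDs from a file, pass its contents in, e.g. `parse_pdb_ids(open("ids.txt").read())`, where `ids.txt` is a placeholder name. Reporting rejected tokens up front is cheaper than discovering a typo after a long batch run.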

If your input files already contain partially prepared structures, the extension handles them gracefully, applying only the necessary steps. You no longer have to micromanage preprocessing.

When Should You Use Batch Protein Prepare?

This tool is especially helpful for:

Large-scale virtual screening tasks
Benchmarking multiple systems under the same preparation protocol
Generating standardized simulation-ready structures across a dataset

It ensures that preparation steps are uniform across many files, reducing variability caused by inconsistent preprocessing. And the time it saves quickly adds up.


Final Thoughts

Although preprocessing might feel like a small step in your workflow, poor preparation can derail even the most thoughtfully designed protocol. Automating batch preparation with SAMSON turns this necessary step into a minor task, letting you focus more on actual modeling and analysis.

Learn more about preparing protein systems in SAMSON.

SAMSON and all SAMSON Extensions are free for non-commercial use. You can download SAMSON here.