Cleaning 100 PDBs in One Go: Efficient Batch Protein Preparation with SAMSON

Anyone who has worked with protein structures knows the feeling: you download hundreds of PDB files for computational modeling, but before you can even run a simulation or docking, you need to clean them.

Hydrogens missing. Water molecules everywhere. Ligands and ions you don’t need. Alternate locations for atoms. Any one of these issues can cause downstream problems, from inaccurate results to simulation crashes. Cleaning your dataset one structure at a time isn’t just tedious; it’s a productivity killer.

SAMSON’s Batch Protein Prepare extension provides an efficient and repeatable way to prepare dozens—or hundreds—of structures in one go. If you regularly work with large datasets of protein structures, this can significantly reduce time spent on preprocessing.

Automated Yet Customizable

Once installed, the Batch Protein Prepare extension in SAMSON offers two main modes of operation:

  • Folder-based input: Prepare all protein structures in a folder (including subfolders). Supported formats include PDB, PDBx/mmCIF, MMTF, and MOL2.
  • PDB code input: Provide one or more PDB identifiers (either as a string or from a text file), and let SAMSON handle downloading and processing them.
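
To make the two input modes concrete, here is a minimal sketch in plain Python of what gathering the inputs could look like. It is not SAMSON's implementation and does not use the SAMSON API: the function names, the extension list, and the ID pattern (classic 4-character PDB codes only) are assumptions made for illustration.

```python
from pathlib import Path
import re

# Assumed file extensions for the formats named above; real datasets may differ.
STRUCTURE_EXTENSIONS = {".pdb", ".cif", ".mmcif", ".mmtf", ".mol2"}


def find_structure_files(root):
    """Return every structure file under root, including subfolders, sorted."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in STRUCTURE_EXTENSIONS
    )


def parse_pdb_codes(text):
    """Extract unique classic 4-character PDB IDs from text, keeping order.

    Tokens may be separated by whitespace, commas, or semicolons.
    Anything that does not look like a classic PDB ID is ignored.
    """
    codes = []
    for token in re.split(r"[\s,;]+", text.strip()):
        code = token.upper()
        if re.fullmatch(r"[1-9][A-Z0-9]{3}", code) and code not in codes:
            codes.append(code)
    return codes
```

The same parser covers both ways of supplying codes: pass a string directly, or pass the contents of a text file, for example `parse_pdb_codes(Path("codes.txt").read_text())`, where `codes.txt` is a hypothetical file name.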

In both modes, the tool applies the same cleaning steps used in Home > Prepare in the SAMSON interface:

  • Remove alternate atom locations, keeping the highest-occupancy location for each atom.
  • Delete ligands, co-factors, water, and monatomic ions.
  • Add hydrogens automatically based on residue type or valence.
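
For readers who want to see what the first two steps amount to, the sketch below applies them to the text of a single-model, fixed-column PDB file in plain Python. It is an illustration under simplifying assumptions, not the code SAMSON runs: it treats every HETATM record as removable (which would also drop modified residues stored as HETATM), it discards all non-ATOM records, and it does not add hydrogens, since that step needs chemistry-aware tooling. The function name is hypothetical.

```python
def clean_pdb_lines(lines):
    """Keep ATOM records only, with one alternate location per atom.

    Assumes fixed-column PDB text containing a single model.
    """
    best = {}   # atom key -> (occupancy, line)
    order = []  # atom keys in first-seen order
    for line in lines:
        if not line.startswith("ATOM"):
            continue  # drops HETATM (ligands, water, ions) and all other records
        # The key identifies an atom independently of its altLoc flag (column 17):
        # chain ID, residue number + insertion code, residue name, atom name.
        key = (line[21], line[22:27], line[17:20], line[12:16])
        try:
            occupancy = float(line[54:60])
        except ValueError:
            occupancy = 0.0  # missing or malformed occupancy field
        if key not in best:
            order.append(key)
            best[key] = (occupancy, line)
        elif occupancy > best[key][0]:
            best[key] = (occupancy, line)
    # Blank the altLoc column so the output describes a single conformation.
    return [best[k][1][:16] + " " + best[k][1][17:] for k in order]
```

When two alternate locations have equal occupancy, this sketch keeps the one that appears first in the file.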

Why Batch Operation Helps

Let’s say you’re preparing a dataset of 150 proteins for docking simulations. Each file must be consistent in format and free from elements that might interfere with the simulation workflow. Manual preparation of even a single structure might take a few minutes. At, say, three minutes each, 150 structures would add up to 7.5 hours of repetitive work prone to human error.

With Batch Protein Prepare, the same work can be done in minutes. Just as importantly, the tool applies the same steps to every protein, so no structure ends up preprocessed differently from the rest.

Maintaining Structure and Output

The extension maintains your folder structure when saving the cleaned files. This is especially useful when working with large hierarchies or datasets from multiple sources. You don’t need to reorganize or manually track which files were cleaned—it’s all done for you automatically.
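
The idea of mirroring a folder hierarchy can be sketched in a few lines of plain Python. This is only an illustration of the concept: the function name and the `_prepared` file-name suffix are hypothetical, and the names SAMSON actually gives to its output files may differ.

```python
from pathlib import Path


def mirrored_output_path(input_file, input_root, output_root, suffix="_prepared"):
    """Map an input file to the same relative location under output_root.

    The suffix is inserted before the file extension (hypothetical convention).
    """
    relative = Path(input_file).relative_to(Path(input_root))
    target = Path(output_root) / relative
    return target.with_name(target.stem + suffix + target.suffix)
```

With this mapping, `data/kinases/1abc.pdb` under the root `data` becomes `out/kinases/1abc_prepared.pdb` under the root `out`. Before writing, a script would create the target folder with `target.parent.mkdir(parents=True, exist_ok=True)`.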

Who This Is For

If you work in:

  • Structure-based drug design
  • Molecular docking or screening pipelines
  • Protein modeling for simulations
  • AI and ML workflows that leverage uniform protein structures

…the Batch Protein Prepare extension can quickly become a part of your daily workflow.

To learn more about preparing proteins in SAMSON, including manual validation and PDBFixer workflows, visit the full guide here.

SAMSON and all SAMSON Extensions are free for non-commercial use. To get started, download SAMSON at https://www.samson-connect.net.
