Cleaning Hundreds of PDBs Without Scripting: Protein Batch Preparation in SAMSON

If you’ve ever worked with multiple protein structures from the Protein Data Bank (PDB), you’ve probably experienced the repetitive and tedious process of cleaning them: stripping waters, resolving alternate locations, deleting ligands, adding missing hydrogens… the list goes on. Doing this manually for a handful of files is annoying. Doing it for hundreds is impractical, unless you want to spend days clicking through menus or writing and maintaining scripts.

Fortunately, there’s an alternative that doesn’t require coding: the Batch Protein Prepare extension in SAMSON. It enables fast, consistent preprocessing of large sets of proteins with a few clicks.

Why batch preparation matters

Before using protein structures for applications like docking, dynamics, machine learning, or screening, they often require significant cleanup. Without it, downstream tasks can fail or return poor results. Batch preparation matters when:

  • You download multiple structures from the PDB for comparative modeling.
  • You want to scan many systems using a pipeline (e.g., Vina-based docking).
  • You need consistent preprocessing for training data used in structural ML models.

What Batch Protein Prepare does

Available as a free extension in SAMSON, Batch Protein Prepare helps you:

  • Automatically download and prepare proteins from their PDB codes (classic four-character or extended format).
  • Apply consistent cleanup: remove waters, ions, and unwanted ligands, add hydrogens, and resolve alternate locations.
  • Process multiple files in a folder, with support for various file formats (PDB, mmCIF, MMTF, MOL2) and preservation of your folder structure.
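For readers who want to check what "preservation of your folder structure" means in practice, here is a minimal Python sketch of the idea. It is not the extension's code and does not call SAMSON. The function name `plan_outputs` and the list of file suffixes are assumptions made for illustration only.

```python
from pathlib import Path

# Suffixes for the formats named above. The exact set of extensions that
# Batch Protein Prepare accepts is an assumption made for this sketch.
STRUCTURE_SUFFIXES = {".pdb", ".cif", ".mmcif", ".mmtf", ".mol2"}


def plan_outputs(input_root: Path, output_root: Path) -> dict[Path, Path]:
    """Map each structure file under input_root to a mirrored output path."""
    plan = {}
    for path in sorted(input_root.rglob("*")):
        if path.is_file() and path.suffix.lower() in STRUCTURE_SUFFIXES:
            # Keep the path relative to the input root so subfolders survive.
            plan[path] = output_root / path.relative_to(input_root)
    return plan
```

With an input such as `in/kinases/1abc.pdb`, the planned output would be `out/kinases/1abc.pdb`: same subfolder, same file name, different root.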

This is all done through a user-friendly graphical interface – no scripting required.

Example use case

Imagine you have 150 protein-ligand complexes downloaded from the PDB. You want to:

  1. Keep only the protein.
  2. Remove waters, ions, and alternate conformations.
  3. Add missing hydrogens.
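To make the first two steps concrete, here is a rough Python sketch of what they amount to for a single PDB-format file. It is a deliberate simplification and not how the extension works internally. It relies only on the fixed-column PDB layout (record name in columns 1 to 6, alternate location flag in column 17). The function name `keep_protein_only` is hypothetical, and the rule "keep blank or A alternate locations" is an assumption chosen for brevity. Adding hydrogens needs real chemistry and is left out.

```python
def keep_protein_only(pdb_text: str) -> str:
    """Return PDB text with polymer ATOM records and a single conformation."""
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "ATOM":
            alt_loc = line[16:17]  # column 17 holds the alternate location flag
            # Keep atoms with no alternate location, or the one labelled "A".
            if alt_loc not in (" ", "A"):
                continue
            # Blank the flag so the output describes one conformation only.
            kept.append(line[:16] + " " + line[17:])
        elif record in ("TER", "END"):
            kept.append(line)
        # HETATM records (waters, ions, ligands) and all other records are
        # dropped. Note that this also drops modified residues stored as HETATM.
    return "\n".join(kept) + "\n"
```

Even this toy version shows why hand-written scripts get complicated quickly: modified residues, occupancy-based selection of conformations, and non-PDB formats all need extra handling.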

With Batch Protein Prepare, you can point to your input folder – or even just list the PDB codes – and let the extension take care of everything. All files are cleaned with the same settings, and the results are stored in the same folder hierarchy as the input.

This approach ensures reproducibility and consistency, which is essential for screening campaigns or training datasets.
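If you keep your PDB codes in a text file, it can help to sanity-check the list before pasting it in. The sketch below is an optional convenience, not part of the extension. It assumes the usual wwPDB conventions: classic IDs are a digit followed by three alphanumeric characters (for example 4hhb), and extended IDs are `pdb_` followed by eight alphanumeric characters (for example pdb_00004hhb). How the extension itself parses codes is not covered here, and `parse_pdb_codes` is a hypothetical helper name.

```python
import re

# Assumed ID shapes (wwPDB conventions), not taken from the extension itself.
CLASSIC_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")
EXTENDED_ID = re.compile(r"pdb_[A-Za-z0-9]{8}", re.IGNORECASE)


def parse_pdb_codes(text: str) -> tuple[list[str], list[str]]:
    """Split text into unique, lowercased PDB codes and rejected tokens."""
    valid, rejected = [], []
    for token in re.split(r"[\s,;]+", text.strip()):
        if not token:
            continue
        if CLASSIC_ID.fullmatch(token) or EXTENDED_ID.fullmatch(token):
            code = token.lower()  # PDB IDs are case-insensitive
            if code not in valid:
                valid.append(code)
        else:
            rejected.append(token)
    return valid, rejected
```

Calling `parse_pdb_codes("1ABC, 4hhb foo")` separates the two well-formed codes from the stray token, so typos surface before any download starts.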

Getting started

To use Batch Protein Prepare in SAMSON:

  1. Launch SAMSON and install the extension from here: Batch Protein Prepare Extension.
  2. Open the extension from the Extensions menu or the main interface.
  3. Choose to input a folder of protein files or a list of PDB codes.
  4. Set your preferences for what to keep or remove.
  5. Click prepare – and that’s it.

Learn more

To explore filtering options, supported formats, and advanced controls, check the complete documentation on protein preparation in SAMSON.

SAMSON and all SAMSON Extensions are free for non-commercial use. You can get SAMSON at https://www.samson-connect.net.