Preparing Dozens of Protein Structures? Try This Workflow-Saving Tool

If you’ve ever had to prepare a large number of protein structures for simulations or docking runs, you know how time-consuming it can be to check alternate locations, add missing atoms, strip solvents, and standardize formats—file by file.

Whether you’re setting up a virtual screening pipeline or preparing a dataset for machine learning, consistent and clean protein models are essential. Fortunately, SAMSON offers a useful solution for handling this kind of repetitive work: the Batch Protein Prepare extension.

What Does Batch Protein Prepare Do?

Instead of preparing structures one at a time, this extension allows you to process entire folders or lists of PDB IDs in one go. It applies the same cleaning steps used in the regular one-click Home > Prepare tool, including:

  • Removing alternate locations (keeping the highest-occupancy atoms)
  • Deleting unwanted ligands, solvents, and ions
  • Adding missing hydrogens and standardizing residue geometries

This improves the consistency of your dataset and reduces manual intervention, along with the errors that come with it. You can then focus on the biology of your systems rather than on file formatting or structural problems.
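To make the alternate-location step concrete, here is a minimal Python sketch of the "keep the highest-occupancy atom" rule applied to fixed-column PDB records. It is an illustration only, not the extension's actual implementation: the function name keep_highest_occupancy is hypothetical, and the sketch assumes a single-model PDB file with standard column positions.

```python
def keep_highest_occupancy(pdb_lines):
    """Keep one alternate location per atom: the one with the highest occupancy.

    Assumptions (not taken from SAMSON): single-model file, standard PDB
    fixed columns, and ties resolved in favor of the first record seen.
    """
    best = {}  # atom key -> (occupancy, line index)
    for idx, line in enumerate(pdb_lines):
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Key: chain ID, residue number + insertion code, atom name.
        key = (line[21:22], line[22:27], line[12:16])
        try:
            occupancy = float(line[54:60])
        except ValueError:
            occupancy = 0.0  # missing or malformed occupancy field
        if key not in best or occupancy > best[key][0]:
            best[key] = (occupancy, idx)

    keep = {idx for _, idx in best.values()}
    cleaned = []
    for idx, line in enumerate(pdb_lines):
        if line.startswith(("ATOM", "HETATM")):
            if idx not in keep:
                continue  # a lower-occupancy alternate of the same atom
            line = line[:16] + " " + line[17:]  # blank the altLoc column
        cleaned.append(line)
    return cleaned
```

Non-atom records (headers, END, and so on) pass through unchanged, so the function can be run over a whole file read with readlines().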

Multiple Input Options

Batch Protein Prepare is flexible with inputs, supporting:

  • PDB, PDBx/mmCIF, MMTF, and MOL2 formats
  • Folders of structures (preserving internal subfolder layout for output)
  • Lists of PDB identifiers supplied as comma-separated strings or text files

This means you can paste in a list of PDB codes or feed an entire data folder, and the extension will handle downloads (if needed) and standardized preparation automatically. If you work with public datasets or want to prepare large batches for training ML models, this is particularly useful.
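If you assemble your list of PDB codes programmatically, it can help to normalize it before pasting it in. The following Python sketch shows one way to do that for comma-separated strings and text files. The names parse_pdb_ids and read_pdb_ids are hypothetical helpers, and the validation assumes classic four-character PDB IDs (a digit followed by three alphanumeric characters); it says nothing about how the extension itself parses input.

```python
import re
from pathlib import Path

# Assumption: classic 4-character PDB IDs, a digit then three alphanumerics.
PDB_ID = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def parse_pdb_ids(text):
    """Split on commas and whitespace, validate, uppercase, and de-duplicate."""
    ids = []
    seen = set()
    for token in re.split(r"[,\s]+", text.strip()):
        if not token:
            continue
        if not PDB_ID.match(token):
            raise ValueError(f"not a 4-character PDB ID: {token!r}")
        code = token.upper()
        if code not in seen:  # keep first occurrence, preserve input order
            seen.add(code)
            ids.append(code)
    return ids


def read_pdb_ids(path):
    """Read PDB IDs from a text file (one per line or comma-separated)."""
    return parse_pdb_ids(Path(path).read_text())
```

For example, parse_pdb_ids("1abc, 2XYZ, 1ABC") yields a list with 1ABC and 2XYZ, each appearing once.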

Getting Started

To start using it:

  1. Download and install the extension from SAMSON Connect.
  2. Launch SAMSON and open the Batch Protein Prepare extension.
  3. Select your input type—folder of files, PDB codes, or text list.
  4. Set your output location and desired preparation options.
  5. Click Run.

The process may take some time depending on dataset size, but you’ll avoid hours of manual work and gain a reproducible pipeline step for structural preparation.
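When the input is a folder, the output keeps the internal subfolder layout, so it can be useful to preview where each file should land before you click Run. Here is a small Python sketch of that mapping. The function name plan_outputs, the list of file extensions, and the assumption that output files keep their original names are all illustrative choices, not documented behavior of the extension.

```python
from pathlib import Path

# Assumption: common extensions for the supported formats.
STRUCTURE_SUFFIXES = {".pdb", ".cif", ".mmtf", ".mol2"}


def plan_outputs(input_root, output_root):
    """Pair each structure file under input_root with a mirrored output path."""
    input_root = Path(input_root)
    output_root = Path(output_root)
    plan = []
    for src in sorted(input_root.rglob("*")):
        if src.is_file() and src.suffix.lower() in STRUCTURE_SUFFIXES:
            # relative_to() keeps the subfolder structure below input_root.
            plan.append((src, output_root / src.relative_to(input_root)))
    return plan
```

Comparing this plan with the actual output folder after a run is a quick way to spot structures that failed to process.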

Visual Preview

Here’s what the Batch Protein Prepare interface looks like:

[Screenshot: the Batch Protein Prepare interface]

Conclusion

Using Batch Protein Prepare in SAMSON can save you significant time and ensure that your system preparation is reproducible and consistent across proteins. Whether you’re curating a dataset or setting up a simulation pipeline, automated tools like this one can make structural biology workflows more efficient—and less error-prone.

To learn more, visit the official SAMSON documentation page for protein preparation and validation.

SAMSON and all SAMSON Extensions are free for non-commercial use. You can get SAMSON here.
