Saving Hours: How to Prepare Hundreds of Protein Structures at Once with SAMSON

If you’ve ever worked on a large-scale protein modeling or virtual screening project, you’ve probably experienced the tedious task of preparing many protein structures for simulation or docking. Downloading PDB files, cleaning them, removing unwanted molecules, and adding missing atoms manually can take hours, if not days.

Fortunately, SAMSON offers a way to automate this entire process through its Batch Protein Prepare extension. If you haven’t used it yet, it could noticeably streamline your preparation pipeline.

When Manual Prep Becomes a Bottleneck

Let’s say you have a set of 150 PDB IDs you want to screen via molecular docking or dynamics. Each structure needs to be downloaded (if you only have the PDB codes) and then cleaned: alternate locations resolved, waters removed, hydrogens added. Manually handling even 20 structures this way can be error-prone and time-consuming. That’s the problem the Batch Protein Prepare extension is designed to solve.
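
To make the manual burden concrete, here is a minimal Python sketch of just two of those steps (resolving alternate locations and removing waters) for a single PDB-format file. It is an illustration only, not how SAMSON implements its preparation: the function name `clean_pdb_lines` is hypothetical, keeping only the blank or "A" alternate location is a simplification (a more careful approach would compare occupancies), and hydrogen addition is not covered at all.

```python
def clean_pdb_lines(lines):
    """Keep primary-conformer atoms and drop waters; pass other records through."""
    kept = []
    for line in lines:
        record = line[:6].strip()
        if record in ("ATOM", "HETATM", "ANISOU"):
            # Fixed-width PDB columns: altLoc is column 17, resName is columns 18-20.
            alt_loc = line[16:17] or " "
            res_name = line[17:20].strip()
            if alt_loc not in (" ", "A"):
                continue  # simplification: discard every alternate location except A
            if res_name in ("HOH", "WAT"):
                continue  # discard water molecules
        kept.append(line)
    return kept
```

Even this small piece has to be written, tested, and repeated for every file, which is the kind of work a batch tool takes off your hands.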

What Does Batch Protein Prepare Do?

The extension applies a predefined cleaning and preparation protocol to many structures in one go. Here’s what it covers:

  • Automatically downloads structures from the PDB based on a list or string of PDB IDs (supports both the classic four-character style and the extended style).
  • Prepares existing folders of PDB, PDBx/mmCIF, MMTF, or MOL2 files in bulk.
  • Preserves your folder/file structure in the output for easy downstream tracking.
  • Applies the same steps as in SAMSON’s Home > Prepare panel: removes alternate atom locations, optionally deletes ligands and co-factors, strips solvent and ions, and adds hydrogens where appropriate.

This saves significant time, especially when dealing with large datasets in drug discovery pipelines or protein structure analyses.
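
If you keep your PDB IDs in a spreadsheet or text file, it can help to sanity-check the list before pasting it in. The following Python sketch splits a free-form string into IDs and flags anything that looks malformed. It assumes classic IDs are a digit followed by three alphanumeric characters and extended IDs are `pdb_` followed by eight alphanumeric characters; the function name `parse_pdb_ids` is hypothetical, and the extension’s own parsing rules may differ.

```python
import re

# Classic style, e.g. 1abc; extended style, e.g. pdb_00001abc.
_PDB_ID = re.compile(r"^(?:pdb_[0-9a-z]{8}|[0-9][0-9a-z]{3})$")

def parse_pdb_ids(text):
    """Split a string into unique lowercase PDB IDs (in order) plus rejected tokens."""
    ids, rejected = [], []
    for token in re.split(r"[\s,;]+", text.strip()):
        if not token:
            continue
        candidate = token.lower()
        if _PDB_ID.match(candidate):
            if candidate not in ids:
                ids.append(candidate)
        else:
            rejected.append(token)
    return ids, rejected

ids, rejected = parse_pdb_ids("1ABC, 4hhb; pdb_00001abc xyz")
# ids == ["1abc", "4hhb", "pdb_00001abc"], rejected == ["xyz"]
```

Note that this sketch treats a classic ID and its extended form as different strings; it only checks the shape of each ID, not whether the entry exists.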

How to Use It

To get started:

  1. Install the Batch Protein Prepare extension via SAMSON Connect.
  2. In SAMSON, go to Home > Prepare and select the batch preparation mode.
  3. Choose your input: either select a folder or input a list of PDB codes.
  4. Launch the preparation. SAMSON will handle downloading (if needed), cleaning, and generating the final ready-to-use protein files.

The extension is flexible and designed for scientists scaling up their workflows: it lets you process entire libraries without coding or scripting.

What About Output?

The processed files are saved in a user-specified folder, preserving the original folder structure for traceability. This is especially useful when mapping prepared files back to source IDs or experimental data.
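
To picture what “preserving the original folder structure” means, here is a small Python sketch that maps each input file to an output path with the same relative location. It only illustrates the idea and is not SAMSON’s implementation: the function name `plan_outputs` and the list of file extensions are assumptions made for this example.

```python
from pathlib import Path

# Assumed extensions for the supported formats; adjust to match your own files.
SUPPORTED = {".pdb", ".cif", ".mmtf", ".mol2"}

def plan_outputs(input_root, output_root):
    """Map every supported file under input_root to a mirrored path under output_root."""
    input_root, output_root = Path(input_root), Path(output_root)
    plan = {}
    for src in sorted(input_root.rglob("*")):
        if src.is_file() and src.suffix.lower() in SUPPORTED:
            # Same relative path, different root: inputs/kinases/1abc.pdb
            # becomes prepared/kinases/1abc.pdb.
            plan[src] = output_root / src.relative_to(input_root)
    return plan
```

Because the relative path is unchanged, a prepared file can be traced back to its source by swapping the root folder, with no lookup table needed.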

When Should You Use This?

This tool is particularly helpful in workflows involving:

  • Library-wide docking or scoring.
  • Algorithm benchmarking on a structure dataset.
  • Preparing molecular dynamics simulations in bulk.
  • Cleaning structures for machine learning model training.

Rather than writing scripts or relying on separate tools for downloading and fixing structures, SAMSON combines everything in a unified interface.

Make It a Habit

Once you integrate Batch Protein Prepare into your molecular design workflow, you’ll likely wonder how you managed without it. It’s ideal for researchers who want to delegate repetitive tasks to software so they can focus on the science.

To learn more or get started, visit the official documentation page.

SAMSON and all SAMSON Extensions are free for non-commercial use. You can download SAMSON at https://www.samson-connect.net.
