Cleaning Hundreds of Protein Files Doesn’t Have to Be Painful

If you’ve ever worked with protein structure files downloaded from the Protein Data Bank (PDB), you’ve probably experienced one or more of the following:

  • Manually opening and cleaning each PDB file, one at a time
  • Writing custom scripts to remove unwanted residues or atoms
  • Spending more time preparing data than analyzing results

It gets even more complicated with the large datasets used for ligand screening or molecular dynamics simulations. Many researchers get stuck at this bottleneck before they can start the actual science.

The Batch Protein Prepare extension in SAMSON was designed to address exactly this pain point. Whether you’re cleaning a handful of protein files or hundreds, you can standardize the process and save a significant amount of time. Here’s how.

Automated Preparation, at Scale

The Batch Protein Prepare extension lets you apply the same cleaning steps you would otherwise run manually, one structure at a time, to many structures in one go. You can:

  • Automatically download structures based on PDB codes
  • Prepare structures locally from folders (supports PDB, PDBx/mmCIF, MMTF, MOL2)
  • Preserve internal subfolder structures to keep output organized
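
To make the folder-based mode concrete, here is a minimal Python sketch of what preserving subfolder structure means in practice: every supported file under an input folder is paired with an output path at the same relative location. This is not SAMSON’s code or API. The file suffixes are an assumption based on the formats listed above, and prepare_file is a hypothetical callback standing in for the actual preparation step.

```python
from pathlib import Path

# Suffixes assumed from the formats listed above (PDB, PDBx/mmCIF, MMTF, MOL2).
SUPPORTED_SUFFIXES = {".pdb", ".cif", ".mmcif", ".mmtf", ".mol2"}


def plan_outputs(input_root: Path, output_root: Path) -> list[tuple[Path, Path]]:
    """Pair each supported structure file under input_root with an output
    path under output_root that keeps the same relative subfolder layout."""
    pairs = []
    for src in sorted(input_root.rglob("*")):
        if src.is_file() and src.suffix.lower() in SUPPORTED_SUFFIXES:
            relative = src.relative_to(input_root)
            pairs.append((src, output_root / relative))
    return pairs


def run_batch(input_root: Path, output_root: Path, prepare_file) -> int:
    """Call the hypothetical prepare_file(src, dst) on every planned pair,
    creating output subfolders as needed. Returns the number of files handled."""
    count = 0
    for src, dst in plan_outputs(input_root, output_root):
        dst.parent.mkdir(parents=True, exist_ok=True)
        prepare_file(src, dst)
        count += 1
    return count
```

Passing shutil.copyfile as prepare_file gives a dry run that only copies files, which is a quick way to confirm that the output tree mirrors the input before any real preparation happens.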

This means you can set up your preprocessing once and run it on dozens, or even hundreds, of structures without having to intervene repeatedly.

What Does “Prepare” Actually Do?

The batch preparation tool applies the same steps as the single-click “Home > Prepare” function in SAMSON:

  • Removes alternate atom locations based on occupancy values
  • Deletes ligands or small molecules
  • Strips water molecules
  • Removes ions
  • Adds hydrogens based on residue types or valence
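
To illustrate what the first four steps mean at the file level, the following Python sketch applies them to the fixed-column ATOM/HETATM records of a PDB file. It is a simplified illustration, not how SAMSON implements Prepare. It keeps the highest-occupancy alternate location for each atom, classifies HETATM residues as water, ion, or ligand using small hypothetical name lists, keeps only coordinate, TER and END records, and does not add hydrogens, which requires residue templates or valence rules.

```python
# Illustrative residue-name lists (assumptions; real tools use far larger tables).
WATER_NAMES = {"HOH", "WAT", "DOD"}
ION_NAMES = {"NA", "K", "CL", "MG", "CA", "ZN", "MN", "FE", "CU", "NI", "CO", "CD", "IOD", "BR"}
COORD_RECORDS = ("ATOM  ", "HETATM")


def _atom_key(line: str) -> tuple[str, str, str, str]:
    """Chain ID, residue number, insertion code, atom name (PDB columns 22, 23-26, 27, 13-16)."""
    return (line[21], line[22:26], line[26], line[12:16])


def _occupancy(line: str) -> float:
    """Occupancy from columns 55-60; missing or malformed values count as 0."""
    try:
        return float(line[54:60])
    except ValueError:
        return 0.0


def clean_pdb_lines(lines, remove_waters=True, remove_ions=True, remove_ligands=True):
    """Return cleaned coordinate records plus TER/END lines. Hydrogens are NOT added."""
    # Pass 1: for every atom that has alternate locations, remember the best one.
    best = {}
    for line in lines:
        if line.startswith(COORD_RECORDS) and line[16] != " ":
            key, occ = _atom_key(line), _occupancy(line)
            if key not in best or occ > best[key][1]:
                best[key] = (line[16], occ)

    # Pass 2: keep one location per atom and filter HETATM residues by type.
    kept = []
    for line in lines:
        if line.startswith(("TER", "END")):
            kept.append(line)
            continue
        if not line.startswith(COORD_RECORDS):
            continue  # headers, CONECT, ANISOU, etc. are dropped in this sketch
        if line[16] != " ":
            if best[_atom_key(line)][0] != line[16]:
                continue  # a lower-occupancy alternate location
            line = line[:16] + " " + line[17:]  # blank the altLoc column
        if line.startswith("HETATM"):
            res = line[17:20].strip()
            if res in WATER_NAMES:
                if remove_waters:
                    continue
            elif res in ION_NAMES:
                if remove_ions:
                    continue
            elif remove_ligands:
                continue
        kept.append(line)
    return kept
```

Treating every other HETATM residue as a ligand is a deliberate simplification. Modified amino acids such as selenomethionine (MSE) are also written as HETATM records and would be dropped here, which is one reason a dedicated preparation tool is preferable to ad hoc scripts.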

These steps are essential for getting structures into a clean, consistent state before docking, molecular dynamics, or other simulations. Malformed or inconsistent structures can cause errors or unreliable results later in the pipeline.

When Might You Use This?

Batch Protein Prepare is useful in a variety of scenarios, such as:

  • Preparing input files for virtual screening campaigns
  • Standardizing a dataset of proteins for comparative modeling
  • Cleaning structures downloaded from PDB for reuse in teaching or publication

Getting Started

Getting access to Batch Protein Prepare is easy within the SAMSON ecosystem: you can install the extension from the SAMSON Extension Store. Once it is installed, its interface lets you specify either a folder containing your target structures or a list of PDB identifiers. SAMSON handles the rest, downloading the files if needed.
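
When you supply PDB identifiers instead of files, the identifiers have to be resolved into downloads. The Python sketch below shows one way to do that: normalize and de-duplicate the codes, reject malformed ones, skip structures already cached locally, and fetch the rest from the RCSB PDB file server. It is an illustration under stated assumptions rather than SAMSON’s behavior: the cache folder layout is hypothetical, and only classic four-character IDs are accepted.

```python
import re
import urllib.request
from pathlib import Path

# Classic four-character PDB IDs: a digit 1-9 followed by three alphanumerics.
PDB_ID = re.compile(r"^[1-9][A-Z0-9]{3}$")
RCSB_URL = "https://files.rcsb.org/download/{code}.{ext}"


def plan_downloads(codes, cache_dir: Path, ext: str = "cif"):
    """Normalize and de-duplicate codes, collect malformed ones, and return
    (todo, bad) where todo holds (code, url, local_path) for uncached files."""
    todo, bad, seen = [], [], set()
    for raw in codes:
        code = raw.strip().upper()
        if not code or code in seen:
            continue  # skip blank entries and repeats
        seen.add(code)
        if not PDB_ID.match(code):
            bad.append(raw)
            continue
        local = cache_dir / f"{code}.{ext}"
        if not local.exists():
            todo.append((code, RCSB_URL.format(code=code, ext=ext), local))
    return todo, bad


def download(todo):
    """Fetch each planned file (requires network access)."""
    for _code, url, local in todo:
        local.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, local)
```

Calling plan_downloads before download separates the offline checks from network access, so typos in a long ID list are reported before anything is fetched.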

Whether you’re a student assembling training data or a research team processing high-throughput results, this tool turns a time-consuming process into a reliable, reproducible workflow.

To learn more, visit the full documentation page: https://documentation.samson-connect.net/tutorials/prepare-protein/prepare-protein/

SAMSON and all SAMSON Extensions are free for non-commercial use. You can download SAMSON at https://www.samson-connect.net.