Cleaning Hundreds of Protein Structures Without the Headache

If you’ve ever worked with multiple PDB structures, you already know how tedious the data cleaning process can become. Water molecules, ions, alternate locations, missing atoms—they all add up. Multiply that by 20, 100, or even 1,000 structures, and suddenly you’re spending more time cleaning files than running studies.

Fortunately, there’s a better way.

The Batch Protein Prepare extension in SAMSON lets you process large numbers of protein structures quickly and consistently. Whether your goal is docking, molecular dynamics, or binding affinity assessment, this extension helps you spend less time wrangling files and more time doing science.

Why automate protein preparation?

Manually cleaning protein structures might work for one or two files. But for screening studies, or any high-throughput molecular modeling work, manual preparation is slow, error-prone, and hard to standardize.

Inconsistent preparation across a dataset hurts both reproducibility and accuracy. For example, leftover water molecules or missing hydrogens can change docking scores. The Batch Protein Prepare extension helps prevent these issues by applying the same predictable cleaning pipeline to every structure.

What does the Batch Protein Prepare extension do?

Here’s what happens under the hood when you use the extension:

Download PDB structures automatically using PDB codes (both classic four-character IDs and extended IDs are supported).
Read local structures in formats such as PDB, PDBx/mmCIF, MMTF, and MOL2.
Apply the same kind of standard cleaning operations as the one-click Home > Prepare tool: remove water, strip alternate atom locations, delete unwanted ligands, and add hydrogens.
Preserve subfolder structures if you’re preparing entire directories of files.
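
To make the water and alternate-location steps above concrete, here is a minimal Python sketch that works directly on the fixed-column text of a legacy PDB file. It is not the extension's implementation: the function name clean_pdb_lines, the set of water residue names, and the rule of keeping only alternate location A are assumptions chosen for illustration. It leaves ANISOU records untouched, and it does not remove ligands or add hydrogens, which require chemistry-aware tooling.

```python
# Residue names commonly used for water in PDB files (assumed list).
WATER_NAMES = {"HOH", "WAT", "H2O", "DOD"}


def clean_pdb_lines(lines, keep_altloc="A"):
    """Drop water records and keep a single alternate location per atom.

    `lines` are PDB lines without trailing newlines. Columns follow the
    legacy PDB format: altLoc is column 17, resName is columns 18-20.
    """
    cleaned = []
    for line in lines:
        record = line[:6].strip()
        # Pass through everything that is not a coordinate record.
        if record not in ("ATOM", "HETATM") or len(line) < 20:
            cleaned.append(line)
            continue

        # Skip water molecules.
        if line[17:20].strip() in WATER_NAMES:
            continue

        # Keep atoms with no alternate location, or the chosen one.
        if line[16] not in (" ", keep_altloc):
            continue

        # Blank the altLoc column so the output has a single conformation.
        cleaned.append(line[:16] + " " + line[17:])
    return cleaned
```

A typical call would read a file with Path("structure.pdb").read_text().splitlines(), pass the result to clean_pdb_lines, and write the returned lines back out. Keeping location A is a simplification; choosing the conformer with the highest occupancy is a common alternative.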

Everything is done in one streamlined batch process. No more repetitive clicking or scripting for common tasks.

Who is this for?

If you’re a molecular modeler working on any of the following, this tool can help:

Virtual screening campaigns
Comparative modeling
Protein-ligand docking
Binding energy calculations
Machine learning pipelines requiring standardized macromolecular inputs

How to get started

You can install the extension from the SAMSON Connect site. Once installed, it’s accessible from the extension panel. You’ll be able to:

Select a folder of structure files to clean
Or enter a list of PDB codes for automatic download and preparation
Launch the batch protocol and get consistent, cleaned structures in minutes
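
The two kinds of input above (a list of PDB codes, or a folder of files) can be sketched in plain Python. This is only an illustration of the bookkeeping involved, not how the extension works internally. The names parse_pdb_codes and plan_outputs are hypothetical, the extended-ID pattern assumes the wwPDB form of "pdb_" followed by eight alphanumeric characters, and the list of file suffixes is an assumption.

```python
import re
from pathlib import Path

# Classic IDs: a digit followed by three alphanumerics (e.g. 1ABC).
CLASSIC_ID = re.compile(r"^[0-9][A-Za-z0-9]{3}$")
# Extended IDs (assumed form): "pdb_" plus eight alphanumerics.
EXTENDED_ID = re.compile(r"^pdb_[A-Za-z0-9]{8}$", re.IGNORECASE)

# File suffixes treated as structure files (assumed list).
STRUCTURE_SUFFIXES = {".pdb", ".cif", ".mmtf", ".mol2"}


def parse_pdb_codes(text):
    """Split input on commas/whitespace; return (valid, rejected) codes."""
    valid, rejected = [], []
    for token in re.split(r"[,\s]+", text.strip()):
        if not token:
            continue
        if CLASSIC_ID.match(token):
            code = token.upper()
        elif EXTENDED_ID.match(token):
            code = token.lower()
        else:
            rejected.append(token)
            continue
        if code not in valid:  # de-duplicate, keeping first-seen order
            valid.append(code)
    return valid, rejected


def plan_outputs(input_root, output_root):
    """Pair each structure file with an output path that mirrors its subfolder."""
    input_root, output_root = Path(input_root), Path(output_root)
    plan = []
    for path in sorted(input_root.rglob("*")):
        if path.is_file() and path.suffix.lower() in STRUCTURE_SUFFIXES:
            plan.append((path, output_root / path.relative_to(input_root)))
    return plan
```

Rejected tokens are returned rather than silently dropped, so a typo in a long list of codes is easy to spot before a batch starts.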

The interface is straightforward, and most users can run their first batch shortly after installing.

Conclusion

Batch preparation isn’t just about saving time. It’s also about improving consistency and reducing human error. Whether you’re building datasets or running high-throughput modeling jobs, a dedicated workflow tool can improve both efficiency and quality.

To learn more, including step-by-step instructions and extended capabilities, visit the SAMSON documentation: https://documentation.samson-connect.net/tutorials/prepare-protein/prepare-protein/

SAMSON and all SAMSON Extensions are free for non-commercial use. You can download SAMSON at https://www.samson-connect.net.