Quickly Count Specific Atom Types in Molecular Folders with NSL

When working with complex molecular systems, it’s often necessary to select folders based on the precise number of atoms they contain. Whether you’re building datasets, analyzing molecular systems, or preparing simulations, filtering by atom counts can save time and reduce errors. Fortunately, if you’re using the SAMSON platform, you can leverage the Node Specification Language (NSL) to do just that.

This post explores how to use folder attributes in NSL to quickly query molecular folders by the number of specific atom types, using attributes like f.nC for carbon atoms, f.nH for hydrogen atoms, and others. This feature is especially useful when managing large projects where folder-level summaries are key.

Why Filter Folders by Atom Counts?

Imagine you’re working with dozens or hundreds of structures. Maybe you’re preparing training sets where each molecule needs to contain between 10 and 20 carbon atoms. Manually opening each folder and checking atom counts is tedious. This is where NSL’s filtering helps: a single short query selects exactly the folders you need.

How the Filtering Works

Folder attributes live in the folder attribute space, which has the short name f. To filter by atom count, write f, a dot, and the short name of the count attribute you want to test.

Short names for atom-count attributes include:

f.nC – number of carbon atoms
f.nH – number of hydrogen atoms
f.nN – number of nitrogen atoms
f.nO – number of oxygen atoms
f.nS – number of sulfur atoms
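
To make the meaning of these counts concrete, here is a small plain-Python sketch that derives the same five numbers for a single folder. It does not use the SAMSON API: the atom list and the names folder_atoms, short_names, and folder_counts are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical folder contents: one element symbol per atom (ethanol, C2H6O).
folder_atoms = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]

# Tally how many atoms of each element the folder holds.
element_counts = Counter(folder_atoms)

# Map each NSL short name listed above to its element symbol.
short_names = {"nC": "C", "nH": "H", "nN": "N", "nO": "O", "nS": "S"}

# A Counter returns 0 for elements that never occur, so absent types count as zero.
folder_counts = {name: element_counts[symbol] for name, symbol in short_names.items()}

print(folder_counts)
```

In SAMSON these values are exposed directly as folder attributes, so nothing like this needs to be written by hand; the sketch only shows what each attribute represents.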

You can perform numerical comparisons with these attributes. For example:

f.nC < 10 – folders with fewer than 10 carbon atoms
f.nH 10:20 – folders with 10 to 20 hydrogen atoms inclusive
f.nN > 5 – folders with more than 5 nitrogen atoms
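
The semantics of these three comparisons can be sketched in plain Python. This is an illustration under stated assumptions, not SAMSON code: the folder names, the counts, and the select helper are all hypothetical, and the range 10:20 is modeled as including both bounds, as described above.

```python
# Hypothetical per-folder atom counts, keyed by the NSL short names.
folders = {
    "ligand_a": {"nC": 8, "nH": 18, "nN": 0},
    "ligand_b": {"nC": 12, "nH": 10, "nN": 6},
    "ligand_c": {"nC": 25, "nH": 21, "nN": 7},
}


def select(predicate):
    """Return the names of folders whose counts satisfy the predicate."""
    return [name for name, counts in folders.items() if predicate(counts)]


# f.nC < 10: fewer than 10 carbon atoms
fewer_than_10_carbons = select(lambda f: f["nC"] < 10)

# f.nH 10:20: between 10 and 20 hydrogen atoms, both bounds included
hydrogens_10_to_20 = select(lambda f: 10 <= f["nH"] <= 20)

# f.nN > 5: more than 5 nitrogen atoms
more_than_5_nitrogens = select(lambda f: f["nN"] > 5)

print(fewer_than_10_carbons, hydrogens_10_to_20, more_than_5_nitrogens)
```

Note that ligand_b, with exactly 10 hydrogen atoms, falls inside the range, while ligand_c, with 21, does not.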

NSL’s concise syntax makes these expressions practical for quick filtering, even in very large molecular systems.

Combining Filters

Filters can also be combined logically. For instance, to select folders that have more than 10 carbon atoms and fewer than 5 sulfur atoms, you could use:

f.nC > 10 and f.nS < 5
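
The combined filter behaves like a logical AND over the two conditions. As a rough plain-Python equivalent (again not the SAMSON API; the folder names, counts, and the matches function are hypothetical):

```python
# Hypothetical per-folder atom counts, keyed by the NSL short names.
folders = {
    "mol_1": {"nC": 12, "nS": 1},
    "mol_2": {"nC": 9, "nS": 0},
    "mol_3": {"nC": 30, "nS": 6},
    "mol_4": {"nC": 11, "nS": 4},
}


def matches(counts):
    """Mirror of the NSL expression: f.nC > 10 and f.nS < 5."""
    return counts["nC"] > 10 and counts["nS"] < 5


# Keep only the folders for which both conditions hold.
selected = [name for name, counts in folders.items() if matches(counts)]

print(selected)
```

Here mol_2 is rejected because it has too few carbon atoms and mol_3 because it has too many sulfur atoms; a folder must pass both tests to be selected.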

Who Can Benefit?

This is useful for molecular modelers who:

Need to curate molecular libraries based on structure size or atom types
Prepare input structures for computational chemistry simulations
Filter out folders based on molecular complexity
Build machine learning datasets requiring chemical diversity

What Makes This Helpful

NSL filtering operates at the folder level, so it stays computationally light and fast even across extensive datasets. Because NSL is declarative, it also offers script-like automation without requiring you to write a script.

To explore all folder attribute capabilities, including additional atom types, structural counts, and full attribute definitions, visit the official documentation:

https://documentation.samson-connect.net/users/latest/nsl/folder/

SAMSON and all SAMSON Extensions are free for non-commercial use. You can download SAMSON at https://www.samson-connect.net.