Filtering molecules by composition with NSL: A guide to atom count attributes

Molecular modelers often work with large datasets of molecules, ranging from small organic compounds to complex proteins. In such workflows, it is common to filter molecules by composition: how many carbon or nitrogen atoms they contain, or how many atoms they have overall. Searching large structural datasets by hand for molecules with specific characteristics is time-consuming.

In SAMSON, the Node Specification Language (NSL) offers an efficient, expressive way to query molecule nodes based on their attributes. One powerful application is filtering molecules based on their atomic composition using attributes like numberOfAtoms, numberOfCarbons, numberOfHydrogens, and more.

Why filter by atom counts?

  • To quickly isolate molecules suitable for quantum calculations (e.g., fewer than 100 atoms).
  • To analyze families of molecules based on their elemental composition.
  • To prepare datasets for machine learning workflows with specific size ranges.

Instead of counting atoms manually, you can define expressive filters using NSL directly in SAMSON.

Key attributes to know

Within the molecule attribute space (mol for short), here are important count-based attributes you can use:

  • numberOfAtoms (mol.nat): Total number of atoms in the molecule.
  • numberOfCarbons (mol.nC): Number of carbon atoms present.
  • numberOfHydrogens (mol.nH): Number of hydrogen atoms.
  • numberOfNitrogens (mol.nN): Number of nitrogen atoms.
  • numberOfOxygens (mol.nO): Number of oxygen atoms.
  • numberOfSulfurs (mol.nS): Number of sulfur atoms.
  • numberOfCoarseGrainedAtoms (mol.ncga): If using coarse-grained models, the count of coarse-grained atoms.
  • formalCharge and partialCharge: Not counts, but available in the same attribute space for filtering on charge properties.
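
To make the meaning of these counts concrete, here is a minimal Python sketch that tallies them from a list of element symbols. It is not SAMSON's API: the `MoleculeCounts` record and the `count_atoms` helper are hypothetical, and only the field names are borrowed from the attribute list above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MoleculeCounts:
    """Hypothetical record; field names mirror the NSL attribute names."""
    numberOfAtoms: int
    numberOfCarbons: int
    numberOfHydrogens: int
    numberOfNitrogens: int
    numberOfOxygens: int
    numberOfSulfurs: int


def count_atoms(element_symbols):
    """Tally element symbols (e.g. ["C", "H", "H"]) into a MoleculeCounts."""
    tally = Counter(element_symbols)  # missing elements count as 0
    return MoleculeCounts(
        numberOfAtoms=sum(tally.values()),
        numberOfCarbons=tally["C"],
        numberOfHydrogens=tally["H"],
        numberOfNitrogens=tally["N"],
        numberOfOxygens=tally["O"],
        numberOfSulfurs=tally["S"],
    )


# Ethanol, C2H6O: 2 carbons, 6 hydrogens, 1 oxygen.
ethanol = count_atoms(["C"] * 2 + ["H"] * 6 + ["O"])
print(ethanol)
```

In SAMSON these values are exposed directly as molecule attributes, so no such tallying is needed there; the sketch only shows what each attribute measures.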

Practical examples

Want to find all visible molecules with fewer than 10 carbon atoms?

Looking for neutral molecules (formal charge = 0) that contain between 100 and 200 atoms?

Need to ignore molecules containing more than 5 sulfurs for a certain workflow?
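
The logic behind these three questions can be sketched in plain Python. This illustrates the filter semantics only and is not NSL syntax: the `Molecule` record, its field names, and the sample data are all made up for the example, and the 100 to 200 range is assumed to be inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    """Hypothetical stand-in for a molecule node and a few of its attributes."""
    name: str
    visible: bool
    formal_charge: int
    n_atoms: int
    n_carbons: int
    n_sulfurs: int


def visible_with_few_carbons(m):
    """Visible molecules with fewer than 10 carbon atoms."""
    return m.visible and m.n_carbons < 10


def neutral_mid_sized(m):
    """Neutral molecules with 100 to 200 atoms (bounds assumed inclusive)."""
    return m.formal_charge == 0 and 100 <= m.n_atoms <= 200


def at_most_five_sulfurs(m):
    """Keeps molecules that do NOT contain more than 5 sulfurs."""
    return m.n_sulfurs <= 5


# Made-up sample data, for illustration only.
molecules = [
    Molecule("ligand", True, 0, 24, 8, 0),
    Molecule("peptide", True, 0, 150, 60, 1),
    Molecule("hidden_ion", False, 1, 5, 1, 0),
    Molecule("sulfur_rich", True, 0, 180, 70, 6),
]

few_carbons = [m.name for m in molecules if visible_with_few_carbons(m)]
neutral_mid = [m.name for m in molecules if neutral_mid_sized(m)]
low_sulfur = [m.name for m in molecules if at_most_five_sulfurs(m)]

print(few_carbons)
print(neutral_mid)
print(low_sulfur)
```

Each predicate corresponds to one of the questions above; in SAMSON the equivalent test is written as a one-line NSL expression rather than a function.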

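Each of these questions can be answered with a short NSL expression built from the attributes above. Using such expressions, you can interactively explore only the parts of your system relevant to your current task, saving time and reducing cognitive load.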

Composability for advanced filters

Multiple attributes can be easily combined, helping you define sophisticated filters. For example, selecting molecules that:

  • Are visible
  • Contain between 5 and 10 oxygens
  • Have fewer than 500 total atoms

…can be done with a single NSL expression that combines the three conditions.
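
The logic of such a combined filter can be sketched in plain Python. This is only an illustration of the semantics, not NSL syntax or SAMSON's API: the `Molecule` record, the `all_of` helper, and the sample data are hypothetical, and the oxygen range is assumed to be inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    """Hypothetical stand-in for a molecule node."""
    name: str
    visible: bool
    n_atoms: int
    n_oxygens: int


def all_of(*predicates):
    """Combine predicates so that a molecule must satisfy every one of them."""
    return lambda m: all(p(m) for p in predicates)


def is_visible(m):
    return m.visible


def has_5_to_10_oxygens(m):
    return 5 <= m.n_oxygens <= 10  # bounds assumed inclusive


def under_500_atoms(m):
    return m.n_atoms < 500


combined = all_of(is_visible, has_5_to_10_oxygens, under_500_atoms)

# Made-up sample data, for illustration only.
molecules = [
    Molecule("sugar", True, 45, 6),
    Molecule("hidden_sugar", False, 45, 6),   # fails: not visible
    Molecule("lipid", True, 130, 2),          # fails: too few oxygens
    Molecule("glycan", True, 620, 9),         # fails: too many atoms
]

selected = [m.name for m in molecules if combined(m)]
print(selected)
```

Keeping each condition as its own small predicate mirrors how a composed filter reads: every clause can be checked, swapped, or dropped independently.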

Short, readable filters like these are far more practical than writing custom scripts or repeatedly closing and opening molecular files by hand.

The full list of count-based attributes and their syntax is available in the official SAMSON documentation.

SAMSON and all SAMSON Extensions are free for non-commercial use. You can download SAMSON at https://www.samson-connect.net.
