When working with large datasets of molecular structures, it can be difficult to focus only on the relevant models for your research. Whether you’re analyzing molecular dynamics, running simulations, or preparing datasets for machine learning, filtering models based on atomic composition is a frequent—and sometimes tedious—task.
The Node Specification Language (NSL) in SAMSON offers a useful solution: structuralModel attributes that let you query and filter structural models based on the number and type of atoms they contain. With a clear syntax and powerful selectors, NSL helps you focus only on the molecules that matter to your analysis.
Target Atom Counts with Simple Filters
The following expressions filter structural models by atom count:
- Find models with more than 100 atoms:
    sm.nat > 100
- Select models with 100 to 200 atoms:
    sm.nat 100:200
- Filter models with fewer than 10 carbon atoms:
    sm.nC < 10
- Identify models with 10 to 20 oxygen atoms:
    sm.nO 10:20
All of this is performed within the structuralModel attribute space, which specifically targets nodes representing structural models. The prefix sm is used to indicate this context. You can combine these attributes logically to perform compound filters. For example:
    sm.nC > 15 and sm.nH 5:10
This selects models with more than 15 carbon atoms and between 5 and 10 hydrogen atoms.
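To make the logic of that compound filter concrete, here is a minimal Python sketch of the same test applied to a few hypothetical models. It is not SAMSON's API and does not run NSL; the model names and element counts are invented for illustration, and it assumes that a range such as 5:10 includes both endpoints.

```python
# Hypothetical per-model element counts; these are made up, not read from SAMSON.
models = {
    "ligand_a": {"C": 18, "H": 8, "O": 2},
    "ligand_b": {"C": 18, "H": 22, "O": 4},
    "ligand_c": {"C": 6, "H": 6},
}


def in_range(value, low, high):
    # Assumption: an NSL range low:high includes both endpoints.
    return low <= value <= high


def matches(counts):
    # Mirrors the expression: sm.nC > 15 and sm.nH 5:10
    return counts.get("C", 0) > 15 and in_range(counts.get("H", 0), 5, 10)


# Keep only the models that satisfy both conditions.
selected = [name for name, counts in models.items() if matches(counts)]
print(selected)
```

Only ligand_a passes both conditions: ligand_b has too many hydrogen atoms and ligand_c has too few carbon atoms.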
Why This Matters
Imagine you’re preparing a subset of protein structures for input into a docking algorithm that has a performance bottleneck when the atom count exceeds 1,000. Instead of manually browsing through your model collection, use:
    sm.nat < 1000
Need to exclude water or solvent-like molecules with very few atoms? Use:
    sm.nat > 20
Want to analyze only nitrogen-rich molecules for a study involving nitrogenous bases or amino groups? Try:
    sm.nN > 10
Included Atom Types You Can Filter
Here are some of the atomic features you can filter on:
- sm.nat: Total number of atoms
- sm.nC: Number of carbon atoms
- sm.nH: Number of hydrogen atoms
- sm.nN: Number of nitrogen atoms
- sm.nO: Number of oxygen atoms
- sm.nS: Number of sulfur atoms
These filters can be combined with other filters such as name, selected, or visible to narrow down your dataset even further.
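As a rough illustration of what the atom-count attributes listed above represent, the Python sketch below derives the same quantities from a plain list of element symbols. The function name atom_count_attributes is hypothetical and the code does not call SAMSON; it only shows how the total and per-element counts relate to each other.

```python
from collections import Counter


def atom_count_attributes(symbols):
    """Return counts keyed like the NSL attributes (nat, nC, nH, nN, nO, nS).

    Hypothetical helper for illustration only; not part of SAMSON or NSL.
    """
    counts = Counter(symbols)
    attrs = {"nat": len(symbols)}  # total number of atoms
    for element in ("C", "H", "N", "O", "S"):
        attrs["n" + element] = counts[element]  # Counter returns 0 for absent elements
    return attrs


# Glycine, C2H5NO2, written out as one element symbol per atom.
glycine = ["N"] + ["C"] * 2 + ["O"] * 2 + ["H"] * 5
attrs = atom_count_attributes(glycine)
print(attrs)
```

For glycine, the total is 10 atoms and the sulfur count is 0, so a filter like sm.nat > 20 would exclude a model of this size.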
Use Cases in Research Workflows
This functionality is particularly helpful for researchers who:
- Need clean, filtered datasets for simulations or training data
- Want to exclude solvent molecules from visualization or processing
- Are troubleshooting memory usage during dynamics or rendering
- Require specific molecular configurations for reaction modeling
Filtering at the structural model level directly in the SAMSON interface saves time and helps maintain reproducibility. You can also script these queries as part of your data preparation pipeline in NSL-enabled contexts.
To dive deeper and explore more attributes such as formalCharge, partialCharge, or numberOfChains, visit the full documentation below.
Learn more about structural model attributes in NSL.
SAMSON and all SAMSON Extensions are free for non-commercial use. You can download SAMSON here.
