Open Cheminformatics Agent Documentation
Information regarding the CheminfEDU tools.
RDKit Tools
| Tool Name | Description | Example Prompt | Example Answer |
|---|---|---|---|
| RDKitCalculateProperties | Calculates molecular properties, such as molecular weight and number of rotatable bonds, from a SMILES input. | Calculate properties for SMILES: C[C@@H](O)c1ccccc1 | { "MolWeight": 138.068, "NumRotatableBonds": 2 } |
| MolSimilarity | Computes the Tanimoto similarity between two SMILES strings (input as a single string with SMILES separated by '.'). Provides a descriptive output. | Compare similarity between benzene and aspirin | The Tanimoto similarity between benzene and aspirin is 0.2424, indicating low similarity. |
| SMILES2Weight | Returns the molecular weight based on a given SMILES string. | What is the molecular weight of water? | 18.010564686 |
| FuncGroups | Identifies functional groups in a molecule based on its SMILES input. | Identify functional groups in aspirin | This molecule contains carboxylic acids, esters, and a carbonyl methylester. |
| IsSmiles | Checks if the input text is a valid SMILES string. | Is CCO a valid SMILES string? | Yes, CCO is a valid SMILES string. |
| | | Check if water is a valid SMILES. | No, 'water' is not a valid SMILES string. |
| CanonicalSmiles | Converts a given SMILES string into its canonical form. | What is the canonical SMILES for C(C)O? | The canonical SMILES for C(C)O is CCO. |
| | | Convert 'invalid_smiles' to canonical SMILES. | Invalid SMILES string |
| IsCas | Checks if the input text is a valid CAS registry number. | Is 7732-18-5 a CAS number? | Yes, 7732-18-5 is a valid CAS number format. |
| | | Tell me if 123-45-67 is a CAS number. | No, 123-45-67 is not a valid CAS number format. |
| IsMultipleSmiles | Checks if the input text consists of multiple SMILES strings separated by dots ('.'). | Does 'CCO.CCC' represent multiple SMILES? | Yes, 'CCO.CCC' appears to represent multiple SMILES strings. |
| | | Is 'benzene' a set of multiple SMILES? | No, 'benzene' does not appear to represent multiple SMILES strings. |
| Tanimoto | Calculates the Tanimoto similarity coefficient between two molecules (input as two separate SMILES strings). | Calculate the Tanimoto similarity between CCO and CCC. | The Tanimoto similarity between the molecules represented by the SMILES strings "CCO" and "CCC" is approximately 0.43. |
| | | What's the Tanimoto similarity of 'CCO' and 'invalid'? | Error: Not a valid SMILES string. |
| LargestMol | Identifies and returns the SMILES string of the largest molecule from a dot-separated SMILES string. | What is the largest molecule in 'CCO.[Na+].[Cl-]'? | The largest molecule in 'CCO.[Na+].[Cl-]' is CCO. |
| | | Find the largest component in '[Na+].invalid.[Cl-]' | The largest molecule in '[Na+].invalid.[Cl-]' is [Na+]. |
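The IsCas examples above describe a format check. A minimal sketch of such a check follows, assuming the usual CAS layout of two to seven digits, two digits, and one check digit. The function names and the regular expression are illustrative assumptions, not the tool's actual implementation, and the check-digit test goes one step beyond the format validation the table describes.

```python
import re

# Assumed layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_cas_format(text: str) -> bool:
    """Return True if text has the layout of a CAS registry number."""
    return CAS_PATTERN.fullmatch(text.strip()) is not None


def has_valid_check_digit(text: str) -> bool:
    """Verify the CAS check digit (weighted digit sum modulo 10)."""
    match = CAS_PATTERN.fullmatch(text.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    # Weights 1, 2, 3, ... run from the rightmost body digit to the left.
    total = sum(w * int(d) for w, d in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))


print(is_cas_format("7732-18-5"))          # True
print(is_cas_format("123-45-67"))          # False
print(has_valid_check_digit("7732-18-5"))  # True
```

A format-only check accepts any string with the right digit grouping, so pairing it with the check digit catches most single-digit typos.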
Chemfiles Tools
| Tool Name | Description | Example Prompt | Example Answer |
|---|---|---|---|
| ChemfilesReadFile | Reads molecular structure files in various formats (XYZ, PDB, SDF, MOL, etc.) and returns atomic positions, topology, and properties. | Read the molecular structure from input.xyz file | Successfully read 3 atoms from input.xyz. Contains O, H, H atoms with position and bond information. |
| | | Load molecular data from protein.pdb | Loaded protein structure with 1024 atoms, including topology and unit cell information. |
| ChemfilesWriteFile | Writes molecular structures to files in various formats. Can create water molecules, proteins, or any molecular system with atoms, bonds, and unit cells. | Create a water molecule and save it as water.xyz | Successfully created water molecule with O at origin and H atoms at (1,0,0) and (0,1,0), saved to water.xyz. |
| | | Save the molecular structure with bonds to output.pdb | Successfully saved molecular structure to output.pdb with proper bond connectivity. |
| ChemfilesConvertFormat | Converts molecular files between different formats (XYZ to PDB, SDF to MOL, etc.). Preserves molecular structure and properties during conversion. | Convert input.xyz to PDB format | Successfully converted input.xyz to output.pdb format while preserving atomic positions and connectivity. |
| | | Transform the SDF file to MOL format | Converted SDF file to MOL format, maintaining molecular structure and properties. |
| ChemfilesAnalyzeStructure | Analyzes molecular structures to provide information about atom counts, bond counts, atom types, molecular weight, and structural properties. | Analyze the structure of the water molecule | Structure analysis: 3 atoms (1 O, 2 H), 2 bonds, molecular weight: 18.015 Da, geometry: bent molecule. |
| | | What are the structural properties of this protein? | Protein analysis: 1024 atoms, 356 residues, contains alpha helices and beta sheets, molecular weight: 12.5 kDa. |
| ChemfilesCalculateDistances | Calculates distances between specified atom pairs in molecular structures. Useful for analyzing bond lengths, hydrogen bonds, and molecular geometry. | Calculate the distance between atoms 0 and 1 in the molecule | Distance between atom 0 and atom 1: 1.00 Å (typical O-H bond length). |
| | | Find all bond distances in the water molecule | Bond distances: O-H1: 1.00 Å, O-H2: 1.00 Å, H1-H2: 1.41 Å (104.5° bond angle). |
| ChemfilesAddBonds | Adds chemical bonds to molecular structures. Can create single, double, and triple bonds and specify bond connectivity for molecules. | Add O-H bonds to create a water molecule | Successfully added bonds: O(0)-H(1) and O(0)-H(2), creating proper water molecule connectivity. |
| | | Create C-C bond for ethane molecule | Added C-C single bond between carbons, plus C-H bonds to complete ethane structure. |
| ChemfilesSetUnitCell | Sets periodic boundary conditions and unit cell parameters for crystalline structures. Supports cubic, orthorhombic, and triclinic cells. | Set a cubic unit cell with 10 Å sides | Successfully set cubic unit cell: a=b=c=10.0 Å, α=β=γ=90°. |
| | | Create orthorhombic cell for protein crystal | Set orthorhombic unit cell: a=50.0 Å, b=60.0 Å, c=70.0 Å, all angles 90°. |
| ChemfilesAddResidue | Adds residue information to molecular structures, useful for proteins and nucleic acids. Can specify residue names, numbers, and chain identifiers. | Add an alanine residue to the protein structure | Successfully added ALA residue #1 to the structure with proper atom assignments. |
| | | Create residue information for DNA base | Added nucleotide residue with proper base pairing and backbone connectivity. |
| ChemfilesTrajectoryRead | Reads molecular dynamics trajectories with multiple frames. Can process time-series molecular data and extract specific frames or time ranges. | Read all frames from the MD trajectory file | Successfully read 1000 frames from trajectory, each containing 512 atoms over 1 ns simulation. |
| | | Extract frame 500 from the trajectory | Extracted frame 500 (t=500 ps) with atomic positions and velocities. |
| ChemfilesSelection | Selects specific atoms based on criteria like element type, residue name, or spatial location. Useful for filtering and analyzing subsets of molecular structures. | Select all zinc and nitrogen atoms from the structure | Selected 12 atoms: 4 Zn atoms and 8 N atoms matching the criteria "name Zn or name N". |
| | | Find all carbon atoms in aromatic rings | Selected 24 aromatic carbon atoms from benzene rings in the molecular structure. |
| ChemfilesGetFormats | Lists all supported file formats for reading and writing molecular structures. Includes XYZ, PDB, SDF, MOL, GROMACS, AMBER, and many others. | What file formats does chemfiles support? | Supported formats: Read (25 formats): XYZ, PDB, SDF, MOL, GRO, TRR, XTC, DCD, etc. Write (18 formats): XYZ, PDB, SDF, MOL, GRO, etc. |
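The distances reported in the ChemfilesCalculateDistances examples can be reproduced by hand for the idealized water geometry from the ChemfilesWriteFile example (O at the origin, H atoms at (1,0,0) and (0,1,0)). The sketch below uses only the Python standard library and is not the tool's implementation; the `atoms` list and `distance` helper are hypothetical names, and periodic boundary conditions are ignored.

```python
import math

# Idealized water geometry from the ChemfilesWriteFile example (coordinates in Å).
atoms = [
    ("O", (0.0, 0.0, 0.0)),
    ("H", (1.0, 0.0, 0.0)),
    ("H", (0.0, 1.0, 0.0)),
]


def distance(i: int, j: int) -> float:
    """Euclidean distance between atoms i and j (no periodic boundaries)."""
    return math.dist(atoms[i][1], atoms[j][1])


# Print every pair distance using zero-based atom indices.
for i, j in [(0, 1), (0, 2), (1, 2)]:
    print(f"{atoms[i][0]}{i}-{atoms[j][0]}{j}: {distance(i, j):.2f} Å")
```

The indices are zero-based, matching the "atoms 0 and 1" wording of the example prompt.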
Notes on Chemfiles Integration
The chemfiles tools cover molecular file I/O and structure manipulation. They handle file formats commonly used in computational chemistry, molecular dynamics, and structural biology, and they support both single molecular structures and molecular dynamics trajectories.
Key features:
- Support for 25+ input formats and 18+ output formats
- Trajectory processing for molecular dynamics simulations
- Unit cell and periodic boundary condition support
- Atom selection and filtering capabilities
- Bond connectivity and topology management
- Residue and chain information for biomolecules
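To make the file I/O feature concrete, the sketch below writes and re-reads a water molecule in the plain XYZ layout (atom count, comment line, then one element and three coordinates per line). It deliberately uses only the Python standard library rather than chemfiles; `write_xyz` and `read_xyz` are hypothetical helpers for illustration, not part of any tool listed here.

```python
import tempfile
from pathlib import Path


def write_xyz(path, atoms, comment=""):
    """Write (element, (x, y, z)) tuples as a single-frame XYZ file."""
    lines = [str(len(atoms)), comment]
    lines += [f"{el} {x:.6f} {y:.6f} {z:.6f}" for el, (x, y, z) in atoms]
    Path(path).write_text("\n".join(lines) + "\n")


def read_xyz(path):
    """Read the first frame of an XYZ file into (element, (x, y, z)) tuples."""
    lines = Path(path).read_text().splitlines()
    count = int(lines[0])  # first line holds the number of atoms
    atoms = []
    for line in lines[2:2 + count]:  # the second line is a free-form comment
        el, x, y, z = line.split()[:4]
        atoms.append((el, (float(x), float(y), float(z))))
    return atoms


water = [
    ("O", (0.0, 0.0, 0.0)),
    ("H", (1.0, 0.0, 0.0)),
    ("H", (0.0, 1.0, 0.0)),
]

with tempfile.TemporaryDirectory() as tmp:
    xyz_path = Path(tmp) / "water.xyz"
    write_xyz(xyz_path, water, comment="water")
    loaded = read_xyz(xyz_path)

print(f"Read {len(loaded)} atoms: {[el for el, _ in loaded]}")
```

Plain XYZ stores only elements and coordinates, so connectivity, residues, and unit cells need a richer format such as PDB or SDF.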
Notes on Tanimoto Tools
The Tanimoto function is now exposed as a direct tool. It takes two SMILES strings as separate inputs.
The MolSimilarity class-based tool takes a single string with two SMILES separated by a dot (.) and provides a more descriptive, human-readable output about the similarity level.
The agent might choose either depending on the phrasing of the prompt and its understanding of the tools. The prompts above for Tanimoto are designed to encourage the use of the direct Tanimoto function tool. If a user asks "Compare CCO and CCC.", the agent might opt for the MolSimilarity tool.
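The two input conventions and the coefficient itself can be sketched in a few lines. The Tanimoto coefficient of two fingerprints is the number of shared on-bits divided by the number of distinct on-bits. The example below is an illustration under assumptions: the fingerprints are hypothetical sets of on-bit indices rather than real RDKit fingerprints, and `tanimoto` and `split_pair` are illustrative names, not the tools' code.

```python
from typing import Set, Tuple


def tanimoto(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by distinct on-bits."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / len(union)


def split_pair(dot_separated: str) -> Tuple[str, str]:
    """Split a 'SMILES1.SMILES2' string (MolSimilarity-style input) in two."""
    parts = dot_separated.split(".")
    if len(parts) != 2:
        raise ValueError("expected exactly two dot-separated SMILES strings")
    return parts[0], parts[1]


# Hypothetical on-bit indices standing in for real molecular fingerprints.
fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 8}

print(split_pair("CCO.CCC"))           # ('CCO', 'CCC')
print(round(tanimoto(fp_a, fp_b), 2))  # 0.4
```

Splitting on a dot only works when each molecule is a single fragment, since the dot also separates disconnected components within one SMILES string (as in 'CCO.[Na+].[Cl-]').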