Batch Polymer Embedding Analysis

Batch analysis of 10 common polymers with the TransPolymer model, computing pairwise cosine similarities between their embeddings. PET and Polycarbonate were the most similar pair (98.5%), and the embeddings were exported for downstream ML tasks.

768-dimensional embeddings per polymer
Pairwise cosine similarity calculations
Export to JSON and NumPy formats
Structure-property relationship insights

Chat Dialogs

Analyze 10 common polymers

PROMPT: I have a list of 10 polymer structures I need to analyze. Can you generate embeddings for all of them and help me find which ones are most similar?

CONTEXT: The 10 polymers are: Polyethylene (*CC*), Polypropylene (*CC(C)*), Polybutylene (*CC(CC)*), Polystyrene (*CC(C1=CC=CC=C1)*), PVC (*CC(Cl)*), PVDF (*CC(F)(F)*), PMMA (*CC(C)(C(=O)OC)*), Nylon-6, PET, Polycarbonate.

THINKING: I'll mount TransPolymer and generate 768D embeddings for each polymer, then compute pairwise cosine similarities to find clusters.

RESULT: All 10 polymers received 768-dimensional embeddings from TransPolymer. Computed 45 pairwise similarities (10 choose 2 combinations).
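Below is a minimal sketch of the pairwise step. It assumes the embeddings are stacked into a NumPy array with one row per polymer. Random vectors stand in for the real TransPolymer output, so the similarity values will not match the ones reported here. `pairwise_cosine` is a hypothetical helper and not part of the TransPolymer tooling.

```python
from itertools import combinations

import numpy as np


def pairwise_cosine(names, embeddings):
    """Return {(name_a, name_b): cosine similarity} for every unordered pair."""
    X = np.asarray(embeddings, dtype=float)
    # Normalise each row to unit length so a dot product equals cosine similarity
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = unit @ unit.T  # (n, n) similarity matrix
    # combinations() yields each unordered pair (i < j) exactly once
    return {
        (names[i], names[j]): float(sim[i, j])
        for i, j in combinations(range(len(names)), 2)
    }


names = [
    "Polyethylene", "Polypropylene", "Polybutylene", "Polystyrene", "PVC",
    "PVDF", "PMMA", "Nylon-6", "PET", "Polycarbonate",
]
# Placeholder: random 768-dim vectors instead of real TransPolymer embeddings
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(names), 768))

pairs = pairwise_cosine(names, embeddings)
print(len(pairs))  # 45 = 10 choose 2
```

Normalising the rows once and then taking a single matrix product is cheaper than calling a cosine function 45 times. The saving becomes more noticeable as the polymer list grows.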


Top similar polymer pairs

PROMPT: Which polymers are most similar to each other?

CONTEXT: Cosine similarity ranges from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction). In this analysis, values above 0.95 are taken to indicate high structural similarity.
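For reference, here is the standard definition for two embedding vectors u and v of dimension d (d = 768 for these embeddings). A reported value such as 98.5% is this quantity expressed as a percentage.

```latex
\cos(\mathbf{u}, \mathbf{v})
  = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
  = \frac{\sum_{i=1}^{d} u_i v_i}{\sqrt{\sum_{i=1}^{d} u_i^2}\,\sqrt{\sum_{i=1}^{d} v_i^2}}
  \in [-1, 1]
```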

THINKING: Sorting the 45 similarity pairs to find the highest and lowest matches.

RESULT: Top similar pairs:
- PET / Polycarbonate: 98.5% (both have para-substituted benzene rings in the backbone)
- PMMA / Nylon-6: 98.1% (both carry carbonyl groups: ester in PMMA, amide in Nylon-6)
- Polypropylene / Polybutylene: 97.8% (polyolefin family)

Least similar:
- Polyethylene / Polycarbonate: 79.6% (simple aliphatic chain vs. aromatic repeat unit)
- Polyethylene / PET: 81.6%


Export embeddings for ML

PROMPT: Can you save these embeddings for later use in my property prediction model?

CONTEXT: Embeddings can be used as feature vectors for ML models that predict glass transition temperature (Tg), mechanical properties, or other polymer characteristics.

THINKING: I'll export in multiple formats for different use cases.

RESULT: Created files:
- polymer_embeddings.json (human-readable, with metadata)
- polymer_embeddings.npy (10 x 768 NumPy array for direct ML input)
- polymer_names.json (name/SMILES mapping)
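Here is one way the three files could be produced. This sketch rests on several assumptions. The embeddings are already an (n, 768) array whose rows follow the same order as `names` and `smiles`. The JSON layouts, including the `model` and `dim` metadata fields, are illustrative and not the exact schema of the files listed above. `export_embeddings` is a hypothetical helper.

```python
import json
from pathlib import Path

import numpy as np


def export_embeddings(names, smiles, embeddings, out_dir="."):
    """Write the three export files, keeping one row order across all of them."""
    out = Path(out_dir)
    X = np.asarray(embeddings, dtype=float)
    if not (X.shape[0] == len(names) == len(smiles)):
        raise ValueError("names, smiles and embeddings must have the same length")

    # Human-readable JSON with per-polymer metadata (field names are illustrative)
    records = [
        {"name": n, "smiles": s, "embedding": row.tolist()}
        for n, s, row in zip(names, smiles, X)
    ]
    with open(out / "polymer_embeddings.json", "w") as f:
        json.dump({"model": "TransPolymer", "dim": int(X.shape[1]), "polymers": records}, f, indent=2)

    # Plain (n, 768) array for direct ML input; row i belongs to names[i]
    np.save(out / "polymer_embeddings.npy", X)

    # Name/SMILES mapping in the same order as the array rows
    with open(out / "polymer_names.json", "w") as f:
        json.dump([{"name": n, "smiles": s} for n, s in zip(names, smiles)], f, indent=2)
```

Calling `export_embeddings(names, smiles, embeddings)` writes the files into the current directory. Keeping a single row order across all three files lets `polymer_names.json` act as the key for the rows of the NumPy array.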


Technical Details

Embedding Generation

INPUT:

{
  "polymers": [
    {"name": "Polyethylene", "smiles": "*CC*"},
    {"name": "Polypropylene", "smiles": "*CC(C)*"},
    {"name": "Polystyrene", "smiles": "*CC(C1=CC=CC=C1)*"},
    {"name": "PET", "smiles": "*OC(=O)C1=CC=C(C=C1)C(=O)OCC*"},
    {"name": "Polycarbonate", "smiles": "*OC(C)(C)C1=CC=C(C=C1)C(C)(C)O*"}
  ]
}

OUTPUT:

{
  "success": true,
  "embeddings": {
    "Polyethylene": [0.123, -0.456, ...768 dims...],
    "Polypropylene": [0.134, -0.445, ...768 dims...],
    "PET": [0.567, -0.234, ...768 dims...]
  }
}
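The `...768 dims...` entries above are abbreviations. This sketch therefore assumes the full response has been parsed into a Python dict with the same `success` and `embeddings` keys. `response_to_matrix` is a hypothetical helper. It stacks the vectors into an (n, 768) array and checks the dimension before any similarity or ML step runs.

```python
import numpy as np

EXPECTED_DIM = 768  # embedding size reported for TransPolymer in this analysis


def response_to_matrix(response, expected_dim=EXPECTED_DIM):
    """Turn {"success": ..., "embeddings": {name: [floats]}} into (names, array)."""
    if not response.get("success"):
        raise RuntimeError("embedding request did not succeed")
    names = list(response["embeddings"])  # dicts preserve insertion order
    # Ragged vectors make np.array raise ValueError, which is the desired failure
    X = np.array([response["embeddings"][n] for n in names], dtype=float)
    if X.ndim != 2 or X.shape[1] != expected_dim:
        raise ValueError(f"expected (n, {expected_dim}) embeddings, got {X.shape}")
    return names, X
```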

Similarity Results

OUTPUT:

{
  "top_similar": [
    {"polymer1": "PET", "polymer2": "Polycarbonate", "similarity": 0.9848},
    {"polymer1": "PMMA", "polymer2": "Nylon-6", "similarity": 0.9814},
    {"polymer1": "Polypropylene", "polymer2": "Polybutylene", "similarity": 0.9778}
  ],
  "least_similar": [
    {"polymer1": "Polyethylene", "polymer2": "Polycarbonate", "similarity": 0.7962},
    {"polymer1": "Polyethylene", "polymer2": "PET", "similarity": 0.8163}
  ]
}
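The following sketch shows how a summary in this shape could be built from the embedding matrix. It ranks all unordered pairs by cosine similarity and then keeps the highest and lowest entries. The function name `rank_pairs`, the default counts (3 top, 2 least, mirroring the output above) and rounding to four decimals are assumptions chosen to match the example. None of them is a documented TransPolymer API.

```python
from itertools import combinations

import numpy as np


def rank_pairs(names, embeddings, n_top=3, n_least=2):
    """Return {"top_similar": [...], "least_similar": [...]} like the output above."""
    X = np.asarray(embeddings, dtype=float)
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)  # rows now have length 1
    sim = unit @ unit.T                                   # cosine similarity matrix
    # All unordered pairs, sorted on the unrounded value, highest first
    ranked = sorted(
        ((float(sim[i, j]), i, j) for i, j in combinations(range(len(names)), 2)),
        reverse=True,
    )

    def as_record(entry):
        s, i, j = entry
        return {"polymer1": names[i], "polymer2": names[j], "similarity": round(s, 4)}

    return {
        "top_similar": [as_record(e) for e in ranked[:n_top]],
        # Lowest similarity first, matching the ordering of "least_similar" above
        "least_similar": [as_record(e) for e in ranked[::-1][:n_least]],
    }
```

Sorting on the exact similarity and rounding only when formatting keeps two pairs that differ past the fourth decimal in their true order.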

Export Format

OUTPUT:

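# Usage example
import json

import numpy as np
from sklearn.linear_model import Ridge

embeddings = np.load("polymer_embeddings.npy")  # Shape: (10, 768)
with open("polymer_names.json") as f:
    names = json.load(f)  # name/SMILES mapping, assumed to follow the array's row order

# Use as features for property prediction
X = embeddings
# Placeholder: supply one measured Tg value per polymer, in the same order as the rows of X
y = np.asarray(known_tg_values)
# Ridge is one illustrative choice; regularization helps with 10 samples vs. 768 features
model = Ridge(alpha=1.0)
model.fit(X, y)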

Tools Used

brain_discovery brain_mount_service brain_predict

Outcomes

Generated 768D embeddings for 10 polymers
PET/Polycarbonate most similar (98.5%) - both have aromatic backbones
Polyethylene/Polycarbonate least similar (79.6%) - simple aliphatic chain vs. aromatic repeat unit
Exported to JSON and NumPy for ML integration
