Tutorial

From SMILES to DFT Energy in One API Call

This tutorial walks through the complete workflow for submitting batch DFT calculations via the Qchemvyx API: from a SMILES-format candidate library through geometry optimization, thermochemistry, and energy ranking. We’ll cover authentication, single-molecule submission, batch library submission, result retrieval, and how to build a simple ranking pipeline in Python.

The target audience is a computational chemist or software developer who wants to integrate Qchemvyx into an existing chemistry informatics stack — whether that’s a Jupyter notebook for ad hoc catalyst screening or a production pipeline calling the API programmatically from an automated screening workflow.

Prerequisites and Installation

Requirements:

  • Python 3.9 or later
  • A Qchemvyx API key (generate one at qchemvyx.com/platform/api-keys after account creation)
  • The qchemvyx Python SDK
# Install the SDK
pip install qchemvyx

# Verify installation
python -c "import qchemvyx; print(qchemvyx.__version__)"
# Should print 0.4.x or later
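A notebook or pipeline that depends on the 0.4 series can fail fast on an older install by checking qchemvyx.__version__ at import time. Here is a minimal sketch. meets_min_version is a hypothetical helper, not part of the SDK, and it assumes plain dotted numeric versions.

```python
import re


def meets_min_version(version: str, minimum: tuple = (0, 4)) -> bool:
    """Return True if a version string such as "0.4.2" is at least `minimum`.

    Hypothetical helper: only the leading digits of each dot-separated
    component are compared, so "0.5.0rc1" is treated as (0, 5, 0).
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:  # stop at the first non-numeric component
            break
        parts.append(int(match.group()))
    # Tuples compare element-wise; pad so "0.4" compares like (0, 4)
    parts += [0] * (len(minimum) - len(parts))
    return tuple(parts) >= tuple(minimum)
```

For example, call meets_min_version(qchemvyx.__version__) right after the import and raise an error if it returns False.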

API keys are scoped per project — a key created for Project A cannot submit jobs to Project B. Store your API key in an environment variable rather than hardcoding it in scripts:

# In your shell profile or .env file
export QCVX_API_KEY="qcvx_sk_your_key_here"
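If the variable is missing, os.environ["QCVX_API_KEY"] raises a bare KeyError. A small wrapper can turn that into a clearer message. load_api_key is a hypothetical helper, not part of the SDK. Also note that Python only sees a .env file if something loads it into the environment first, such as your shell or a dotenv loader.

```python
import os


def load_api_key(var: str = "QCVX_API_KEY") -> str:
    """Read the API key from the environment, failing with a clear message.

    Hypothetical helper; it does not validate the key against the service.
    """
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(
            f"{var} is not set; export it in your shell profile or load your .env file"
        )
    return key
```

The returned string can then be passed as Client(api_key=load_api_key()).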

Single Molecule Submission: Geometry Optimization and Thermochemistry

The simplest use case: submit a SMILES string, get back an optimized geometry, Gibbs free energy, and orbital energies.

import os
from qchemvyx import Client

# Initialize client with the key from the environment
# (Client() with no arguments reads QCVX_API_KEY by default)
client = Client(api_key=os.environ["QCVX_API_KEY"])

# Submit a DFT geometry optimization
# SMILES: aspirin (acetylsalicylic acid)
job = client.dft.submit(
    smiles="CC(=O)Oc1ccccc1C(=O)O",
    functional="B3LYP-D4",
    basis_set="def2-TZVP",
    solvent="water",         # SMD solvation model
    compute_thermochemistry=True,  # includes ZPE + thermal corrections
    temperature=298.15,            # Kelvin
    charge=0,
    multiplicity=1,
)

The job object is returned immediately with a job ID. The DFT calculation runs asynchronously on Qchemvyx infrastructure. To retrieve results:

# Blocking wait: returns when the job finishes or the timeout (in seconds) elapses
result = job.wait(timeout=7200)   # 2-hour timeout

# Or non-blocking status check
status = job.status()             # "queued" | "running" | "completed" | "failed"

# Access results
print(f"Job ID: {result.job_id}")
print(f"Gibbs free energy: {result.gibbs_free_energy:.6f} Eh")
print(f"Electronic energy: {result.electronic_energy:.6f} Eh")
print(f"ZPE correction: {result.zpe:.6f} Eh")
print(f"Thermal correction to G: {result.g_correction:.6f} Eh")
print(f"HOMO energy: {result.homo:.4f} eV")
print(f"LUMO energy: {result.lumo:.4f} eV")
print(f"HOMO-LUMO gap: {result.homo_lumo_gap:.4f} eV")

# Save optimized geometry
result.save_xyz("aspirin_opt.xyz")
# Or retrieve as a string
xyz_string = result.xyz_string()
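Total and thermochemical energies come back in hartree (Eh), and orbital energies come back in eV. Energy differences are usually reported in kcal/mol or kJ/mol. Below is a small conversion helper that uses the CODATA 2018 values. hartree_to is a hypothetical name, not part of the SDK.

```python
# CODATA 2018 conversion factors for the hartree (Eh)
HARTREE_TO_EV = 27.211386245988
HARTREE_TO_KJ_PER_MOL = 2625.4996394799
HARTREE_TO_KCAL_PER_MOL = HARTREE_TO_KJ_PER_MOL / 4.184  # thermochemical calorie


def hartree_to(value_eh: float, unit: str) -> float:
    """Convert an energy in Eh to 'eV', 'kJ/mol' or 'kcal/mol' (case-insensitive)."""
    factors = {
        "ev": HARTREE_TO_EV,
        "kj/mol": HARTREE_TO_KJ_PER_MOL,
        "kcal/mol": HARTREE_TO_KCAL_PER_MOL,
    }
    try:
        return value_eh * factors[unit.lower()]
    except KeyError:
        raise ValueError(f"unsupported unit {unit!r}") from None
```

In practice, convert differences such as hartree_to(g_product - g_reactant, "kcal/mol") rather than absolute totals.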

Configuring DFT Parameters

The dft.submit() method exposes method, basis set, ECP, solvation, property, and SCF settings. Key options:

job = client.dft.submit(
    smiles="...",

    # Method selection
    functional="wB97X-D3",      # or "B3LYP-D4", "PBE0-D4", "M06-2X", "r2SCAN-D4"
    basis_set="def2-TZVP",      # or "6-311++G(d,p)", "def2-SVP", "cc-pVTZ"

    # For transition metals: specify ECP separately
    ecp="SDD",                   # applied to metal atoms; def2-TZVP applied to rest
    metal_atoms=["Pd"],          # list of element symbols requiring ECP

    # Solvation
    solvent="thf",               # implicit solvent; options: "water", "dmso", "acetonitrile",
                                 # "thf", "toluene", "methanol", "dcm", "dmf", etc.
    solvent_model="smd",         # or "pcm" for IEF-PCM

    # Calculation type
    compute_thermochemistry=True,   # frequency calculation + thermal corrections
    compute_nbo=True,               # natural bond orbital analysis
    compute_esp=True,               # electrostatic potential charges

    # SCF settings
    scf_convergence=1e-8,          # SCF convergence criterion in Eh (default 1e-8)
    max_scf_cycles=300,
    diis=True,
)
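A cheap local check on the keyword arguments can catch inconsistent combinations before a job enters the queue, such as an ECP with no metal atoms listed. The sketch below is a hypothetical pre-flight helper, not part of the SDK. It covers only the consistency rules implied by the options above, treats a missing solvent_model as valid, and does not replace the service's own validation.

```python
def check_dft_params(params: dict) -> list:
    """Return a list of problems in a dict of dft.submit() keyword arguments.

    Hypothetical pre-flight check: it covers only consistency rules implied
    by the documented options, and the service still validates its own inputs.
    """
    problems = []
    if not params.get("smiles"):
        problems.append("smiles is required")
    # The ECP is applied to the elements listed in metal_atoms
    if params.get("ecp") and not params.get("metal_atoms"):
        problems.append("ecp is set but metal_atoms is empty")
    model = params.get("solvent_model", "smd")
    if model not in ("smd", "pcm"):
        problems.append(f"solvent_model must be 'smd' or 'pcm', got {model!r}")
    if params.get("scf_convergence", 1e-8) <= 0:
        problems.append("scf_convergence must be a positive threshold")
    cycles = params.get("max_scf_cycles")
    if cycles is not None and (not isinstance(cycles, int) or cycles < 1):
        problems.append("max_scf_cycles must be a positive integer")
    if not isinstance(params.get("charge", 0), int):
        problems.append("charge must be an integer")
    mult = params.get("multiplicity", 1)
    if not isinstance(mult, int) or mult < 1:
        problems.append("multiplicity must be a positive integer (2S + 1)")
    return problems
```

Run it on the same dict you would unpack into client.dft.submit(**params), and submit only when it returns an empty list.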

Batch Submission: Screening a Catalyst Library

For high-throughput screening, use client.screening.submit(). This routes to Qchemvyx’s parallelized screening engine and is substantially faster than looping over individual dft.submit() calls because the scheduler can pack jobs onto available compute resources more efficiently.

Prepare your candidate library as a CSV file with, at minimum, a smiles column. Optional name, charge, and multiplicity columns provide per-molecule overrides.

# candidates.csv format:
# name,smiles,charge,multiplicity
# PPh3_Pd_complex,c1ccc(cc1)P(c1ccccc1)c1ccccc1,0,1
# XPhos_Pd_complex,...,0,1
# IPr_NHC,...,0,1
from qchemvyx import Client
import pandas as pd

client = Client()

# Submit batch screening job
batch = client.screening.submit(
    library_csv="candidates.csv",
    functional="wB97X-D3",
    basis_set="def2-TZVP",
    solvent="thf",
    compute_thermochemistry=True,
    ranking_property="gibbs_free_energy",   # property to rank by
    batch_name="phosphine_ligand_screen_v1",
)

print(f"Batch ID: {batch.batch_id}")
print(f"Total jobs submitted: {batch.job_count}")
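Checking the library file locally before submission means a malformed row fails in seconds rather than in the queue. The sketch below parses a CSV in the format above with Python's csv module. read_library is hypothetical. Its defaults (charge 0 and multiplicity 1 when a cell or column is missing) mirror the single-molecule example rather than documented batch behaviour.

```python
import csv
import io


def read_library(handle) -> list:
    """Parse a candidate CSV, requiring a smiles column and filling defaults.

    Hypothetical helper: missing names become candidate_<row>, missing
    charge/multiplicity become 0 and 1 (an assumption, see above).
    """
    reader = csv.DictReader(handle)
    if "smiles" not in (reader.fieldnames or []):
        raise ValueError("missing required column: smiles")
    rows = []
    for i, row in enumerate(reader, start=1):
        smiles = (row.get("smiles") or "").strip()
        if not smiles:
            raise ValueError(f"row {i}: empty smiles")
        rows.append({
            "name": (row.get("name") or f"candidate_{i}").strip(),
            "smiles": smiles,
            "charge": int(row.get("charge") or 0),
            "multiplicity": int(row.get("multiplicity") or 1),
        })
    return rows


# Example: an in-memory CSV that relies on the charge/multiplicity defaults
sample = io.StringIO("name,smiles\naspirin,CC(=O)Oc1ccccc1C(=O)O\n")
print(read_library(sample))
```

For a real file, open it with open("candidates.csv", newline="") and pass the handle to read_library.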

Poll for batch completion and retrieve results:

# Wait for all jobs in batch to complete
results = batch.wait_all(timeout=86400, poll_interval=60)  # 24 h timeout, check every 60 s

# Collect results as a pandas DataFrame, then sort by Gibbs free energy
df = results.to_dataframe()
print(df[["name", "smiles", "gibbs_free_energy", "homo", "lumo",
          "homo_lumo_gap", "status"]].sort_values("gibbs_free_energy"))

# Save to CSV
df.to_csv("screening_results.csv", index=False)

# Access individual job results
for job_result in results:
    if job_result.status == "completed":
        print(f"{job_result.name}: G = {job_result.gibbs_free_energy:.4f} Eh")
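batch.wait_all() wraps a polling loop. If you want to do other work between checks, for example in a notebook, the sketch below shows one way to write that loop yourself. This is not the SDK's implementation. get_statuses is a hypothetical callable that returns a mapping of job ID to one of the status strings the SDK reports ("queued", "running", "completed", "failed"). The sleep and clock arguments are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable

TERMINAL = {"completed", "failed"}


def poll_until_terminal(
    get_statuses: Callable[[], dict],
    timeout: float,
    poll_interval: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until every job is 'completed' or 'failed', else raise TimeoutError."""
    deadline = clock() + timeout
    while True:
        statuses = get_statuses()
        if all(s in TERMINAL for s in statuses.values()):
            return statuses
        remaining = deadline - clock()
        if remaining <= 0:
            pending = sum(s not in TERMINAL for s in statuses.values())
            raise TimeoutError(f"{pending} job(s) still pending after {timeout} s")
        # Never sleep past the deadline
        sleep(min(poll_interval, remaining))
```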

Transition State Search via API

The API supports NEB-based transition state searches when both reactant and product SMILES are provided:

ts_job = client.ts_search.submit(
    reactant_smiles="...",
    product_smiles="...",
    functional="wB97X-D3",
    basis_set="def2-TZVP",
    solvent="thf",
    method="ci_neb",         # climbing-image NEB
    n_images=16,
    spring_constant=0.1,     # Eh/Ang^2
    confirm_irc=True,        # runs IRC after CI-NEB converges
)

ts_result = ts_job.wait(timeout=28800)   # 8-hour timeout for large systems

print(f"Activation barrier ΔG‡: {ts_result.activation_barrier:.4f} Eh")
print(f"Imaginary frequency: {ts_result.imaginary_frequency:.1f} cm⁻¹")
print(f"IRC confirmed: {ts_result.irc_confirmed}")

# Save TS geometry
ts_result.save_xyz("ts_geometry.xyz")
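n_images sets how many geometries make up the band between reactant and product. A common way to seed such a band is linear interpolation of Cartesian coordinates, sketched below. This is a generic illustration, not necessarily how Qchemvyx builds its initial path; IDPP interpolation is a widely used alternative. The sketch assumes both endpoint geometries list the same atoms in the same order and are already aligned. It also treats n_images as interior images only, because the text above does not say whether the API's count includes the endpoints.

```python
Geometry = list[tuple[float, float, float]]  # one (x, y, z) tuple per atom, in Å


def interpolate_band(reactant: Geometry, product: Geometry, n_images: int) -> list[Geometry]:
    """Build an initial NEB band by linear interpolation in Cartesian space.

    Returns n_images + 2 geometries: reactant, n_images interior images, product.
    Assumes both endpoints list the same atoms in the same order and are aligned.
    """
    if len(reactant) != len(product):
        raise ValueError("reactant and product must contain the same atoms")
    band = []
    for k in range(n_images + 2):
        t = k / (n_images + 1)  # 0.0 at the reactant, 1.0 at the product
        band.append([
            tuple((1 - t) * r + t * p for r, p in zip(atom_r, atom_p))
            for atom_r, atom_p in zip(reactant, product)
        ])
    return band
```

With n_images=16, as in the example above, interpolate_band returns 18 geometries.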

Error Handling and Job Recovery

DFT calculations can fail for various reasons: SCF non-convergence, geometry optimization failures, or resource limits. The SDK provides structured error information:

from qchemvyx.exceptions import JobFailedError, SCFConvergenceError

try:
    result = job.wait(timeout=7200)
except SCFConvergenceError as e:
    print(f"SCF failed: {e.message}")
    print(f"Suggested fix: {e.suggested_remedy}")
    # Common fix: increase level shift or use fractional occupation
    retry_job = client.dft.submit(
        ...,
        level_shift=0.2,        # Eh, helps convergence for open-shell systems
        fractional_occupation=True,
    )
except JobFailedError as e:
    print(f"Job failed: {e.message}")
    print(f"Error type: {e.error_type}")  # "geometry", "scf", "resource", "input"
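When the same system keeps failing, the single retry above can be generalised into an escalation ladder: try the default settings first, then progressively more aggressive SCF options. In the sketch below, submit_and_wait is a hypothetical callable that would wrap client.dft.submit(...) followed by job.wait(...). SCFConvergenceError is a local stand-in for the SDK exception so the block runs on its own. The override values are illustrative, not recommended defaults.

```python
from typing import Any, Callable, Sequence


class SCFConvergenceError(Exception):
    """Local stand-in for qchemvyx.exceptions.SCFConvergenceError."""


def run_with_fallbacks(
    submit_and_wait: Callable[..., Any],
    fallbacks: Sequence[dict],
    retry_on: type = SCFConvergenceError,
) -> Any:
    """Run with default settings, then retry with each override set in order.

    Only `retry_on` errors trigger a retry; anything else propagates at once.
    If every attempt fails, the last `retry_on` error is re-raised.
    """
    last_error = None
    for overrides in [{}, *fallbacks]:
        try:
            return submit_and_wait(**overrides)
        except retry_on as exc:
            last_error = exc
    raise last_error


# Hypothetical escalation ladder built from the options shown above
SCF_FALLBACKS = [
    {"level_shift": 0.2},
    {"level_shift": 0.2, "fractional_occupation": True},
]
```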

For batch screening, individual job failures don’t cancel the batch. Access the status breakdown:

results = batch.get_results()   # returns even if some jobs failed
completed = [r for r in results if r.status == "completed"]
failed = [r for r in results if r.status == "failed"]

print(f"Completed: {len(completed)}/{batch.job_count}")
print(f"Failed: {len(failed)}/{batch.job_count}")

for f in failed:
    print(f"  {f.name}: {f.error_message}")
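When many jobs fail, counting them by error type shows whether the problem is systematic (for example, every open-shell candidate hitting SCF trouble) or scattered. The sketch below assumes each batch result has been converted to a dict with name, status and error_type keys. An error_type field on batch results is an assumption borrowed from JobFailedError. Which types count as worth resubmitting is a judgment call, not SDK behaviour.

```python
from collections import Counter

# Assumption: other error types ("geometry", "input") need a manual look first
RETRYABLE = {"scf", "resource"}


def failure_breakdown(results: list) -> Counter:
    """Count failed jobs by error type; jobs without one count as 'unknown'."""
    return Counter(
        r.get("error_type") or "unknown" for r in results if r.get("status") == "failed"
    )


def retry_candidates(results: list) -> list:
    """Names of failed jobs whose error type suggests resubmission is worth trying."""
    return [
        r.get("name") for r in results
        if r.get("status") == "failed" and r.get("error_type") in RETRYABLE
    ]
```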

Building a Screening Ranking Pipeline

A complete example pipeline: submit a SMILES library, retrieve results, apply secondary filters, and export the top candidates for synthesis consideration.

import os
import pandas as pd
from qchemvyx import Client

client = Client(api_key=os.environ["QCVX_API_KEY"])

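# Load candidates and check that the required smiles column is present
candidates_df = pd.read_csv("200_catalyst_candidates.csv")
if "smiles" not in candidates_df.columns:
    raise ValueError("200_catalyst_candidates.csv needs a 'smiles' column")
print(f"Loaded {len(candidates_df)} candidates")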

# Submit batch
batch = client.screening.submit(
    library_csv="200_catalyst_candidates.csv",
    functional="wB97X-D3",
    basis_set="def2-TZVP",
    solvent="thf",
    compute_thermochemistry=True,
    batch_name="c-n_coupling_screen_q4",
)

print(f"Submitted {batch.job_count} jobs, batch ID: {batch.batch_id}")

# Wait and retrieve
results = batch.wait_all(timeout=72 * 3600)  # 72-hour max
df = results.to_dataframe()

# Filter: completed jobs only, stable ground states
df_ok = df[
    (df["status"] == "completed") &
    (df["s2_deviation"] < 0.05) &     # spin contamination check
    (df["n_imaginary_frequencies"] == 0)  # confirmed minimum
]

# Rank by Gibbs free energy (lower = more stable; absolute G values are only
# directly comparable between species with the same atomic composition)
df_ok = df_ok.sort_values("gibbs_free_energy")

# Apply secondary filter: HOMO-LUMO gap > 1.0 eV (ground state stability)
df_ok = df_ok[df_ok["homo_lumo_gap"] > 1.0]

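# Export the top 15 (or fewer, if fewer pass the filters) for synthesis prioritization
top15 = df_ok.head(15)
top15.to_csv("top15_synthesis_candidates.csv", index=False)
if top15.empty:
    print("No candidates passed the filters.")
else:
    print(f"Exported {len(top15)} candidates. Lowest G: "
          f"{top15.iloc[0]['gibbs_free_energy']:.4f} Eh")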

This pipeline — from a 200-candidate SMILES library to a ranked synthesis shortlist of 15 — typically completes in 6–12 hours on the Qchemvyx infrastructure for small-to-medium organic catalyst structures (<80 non-hydrogen atoms). For larger organometallic complexes with multiple conformers that need to be sampled, computation time scales with the number of conformers submitted per candidate.
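Absolute Gibbs free energies of different molecules are only directly comparable when the species share the same atomic composition, as isomers do. When candidates differ in composition, as ligands in a catalyst library usually do, a common alternative is to rank by a reaction or binding free energy assembled from several calculations. For a ligand L binding a fragment M, for example, ΔG_bind = G(ML) − G(M) − G(L). The sketch below shows that bookkeeping. The function names and species labels are hypothetical, and all G values are assumed to come from completed jobs run with the same functional, basis set and solvent model.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5094740631  # CODATA 2018, thermochemical calorie


def reaction_free_energy(g: dict, reactants: list, products: list) -> float:
    """Return sum(G, products) - sum(G, reactants) in kcal/mol.

    `g` maps species names to Gibbs free energies in Eh, all computed with
    the same functional, basis set, and solvent model.
    """
    delta_eh = sum(g[s] for s in products) - sum(g[s] for s in reactants)
    return delta_eh * HARTREE_TO_KCAL_PER_MOL


def rank_relative(delta_g: dict) -> list:
    """Sort candidates by ΔG and add ΔΔG relative to the best one (same units)."""
    ordered = sorted(delta_g.items(), key=lambda item: item[1])
    if not ordered:
        return []
    best = ordered[0][1]
    return [(name, dg, dg - best) for name, dg in ordered]
```

The ΔG values from reaction_free_energy can be collected into a dict keyed by candidate name and passed to rank_relative. The result lists each candidate with its ΔΔG from the best one.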