Usage

Quick start

from altametris.inference_sdk import InferenceClient, InferenceManager

InferenceClient — submitting and monitoring jobs

Check endpoint health

client = InferenceClient("batch_pipeline")
health = client.health()
print(health["status"])  # healthy / degraded / unhealthy
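
A health check is usually most useful as a gate before submitting work. The sketch below shows one possible policy. The helper name `should_submit` and the `allow_degraded` flag are hypothetical and not part of the SDK; only the `status` key and its three values come from the example above.

```python
def should_submit(health: dict, allow_degraded: bool = True) -> bool:
    """Decide whether to submit a job from a health() result.

    Hypothetical policy, not SDK behaviour: healthy endpoints are always
    accepted, degraded ones only if allow_degraded is set, and anything
    else (including a missing status) is rejected.
    """
    status = str(health.get("status", "unhealthy")).lower()
    if status == "healthy":
        return True
    if status == "degraded":
        return allow_degraded
    return False
```

With a real client this could be used as `if should_submit(client.health()): client.invoke(inputs=inputs)`.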

Submit a job (fire-and-forget)

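# `inputs` must be defined beforehand; its structure depends on the pipeline
job_name = client.invoke(inputs=inputs)  # returns the job name without waiting for completion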

Blocking run — submit + stream + results in one call

result = client.run(inputs=inputs)
print(result["status"])   # Completed
print(result["job_name"]) # pipelinejob-XX-XXXX

Monitor a running job

# Stream logs in real time (blocks until done)
status = client.stream(job_name)

# Or poll every 60 seconds
status = client.poll(job_name, interval=60)

# Check status without blocking
status = client.status(job_name)

# Get result summary
result = client.results(job_name)
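
If neither `stream` nor `poll` fits, for example because an overall deadline is needed, a custom wait loop can be built on top of the non-blocking status check. The following is a minimal sketch, not the SDK's own polling logic. It assumes the status is available as a plain string; `wait_for_terminal` is a hypothetical helper, and of the terminal states only `Completed` appears in this document (`Failed` and `Canceled` are assumptions).

```python
import time
from typing import Callable

# "Completed" is shown above; "Failed" and "Canceled" are assumed names.
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_for_terminal(
    get_status: Callable[[], str],
    interval_s: float = 60.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call get_status() every interval_s seconds until a terminal state.

    Raises TimeoutError if no terminal state is seen within timeout_s.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} s")
        sleep(interval_s)
```

Assuming `client.status(job_name)` returns such a string, the call would look like `wait_for_terminal(lambda: client.status(job_name), interval_s=60)`.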

Cancel a job

client.cancel(job_name)  # blocks until Azure ML confirms cancellation

Failover between deployments

Submit on a primary deployment. If the job stays queued longer than queue_timeout_s, it is cancelled and resubmitted on a fallback deployment. The timer counts only time spent in a waiting state (Queued, Preparing) and resets when the child job starts running.

job_name = client.invoke_with_failover(
    inputs=inputs,
    primary_deployment="deployment-vg-t4",
    fallback_deployment="deployment-vg-v100",
    queue_timeout_s=3600,  # 1h in queue before switching
    poll_interval_s=60,
)
status = client.stream(job_name)
result = client.results(job_name)
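
The queue-timer rule described above can be illustrated in isolation. This is a sketch of the decision logic only, under stated assumptions, and not the implementation of `invoke_with_failover`: the function `should_fail_over` is hypothetical, it works on recorded (timestamp, status) observations rather than a live job, and it resets the timer on any non-waiting status, which is a simplification of "starts running".

```python
from typing import Iterable, Optional, Tuple

# Waiting states as named in the description above.
WAITING_STATES = {"Queued", "Preparing"}


def should_fail_over(
    samples: Iterable[Tuple[float, str]],
    queue_timeout_s: float,
) -> bool:
    """Return True once continuous waiting time exceeds queue_timeout_s.

    samples are (timestamp_seconds, status) pairs in chronological order,
    e.g. one per poll. Leaving the waiting states resets the timer.
    """
    waiting_since: Optional[float] = None
    for timestamp, status in samples:
        if status in WAITING_STATES:
            if waiting_since is None:
                waiting_since = timestamp  # start of a waiting stretch
            if timestamp - waiting_since > queue_timeout_s:
                return True
        else:
            waiting_since = None  # job left the queue: reset the timer
    return False
```

For example, a job observed as Queued at 0 s, Preparing at 1800 s and Queued again at 3601 s exceeds a 3600 s timeout, while a job that ran briefly in between does not, because its timer restarted.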

InferenceManager — endpoint lifecycle

from pathlib import Path
from altametris.inference_sdk import InferenceManager

manager = InferenceManager("batch_pipeline")

# One-shot setup
manager.ensure_ready(
    deployment_name="my-deployment",
    pipeline_yml=Path("config/pipeline.yml"),
    compute_name="my-cluster",
)

# Step by step
manager.create_endpoint(description="My endpoint")
manager.ensure_compute(Path("config/compute.yml"))
manager.create_deployment("my-deployment", Path("config/pipeline.yml"), "my-cluster")

# Cleanup
manager.delete_deployment("my-deployment")
manager.delete_endpoint()