AI/ML Workloads
Deploy LLMs, image generators, and ML inference at scale
AI Inference Small
CPU-based inference for small models
$60/mo
$10 setup fee
- 4 OCPUs (ARM64)
- 16 GB RAM
- 100 GB NVMe
Includes:
- ONNX Runtime
- FastAPI
- Model hosting
- Auto-scaling
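A minimal sketch of how the bundled ONNX Runtime and FastAPI pieces fit together, assuming a generic model exported as model.onnx with a single float32 input (the file name, input shape, and port are placeholders, not part of the plan):

# Serve an ONNX model over HTTP with FastAPI + ONNX Runtime (CPU).
# Run with: uvicorn app:app --host 0.0.0.0 --port 8000
import numpy as np
import onnxruntime as ort
from fastapi import FastAPI

app = FastAPI()
session = ort.InferenceSession("model.onnx")   # load once at startup
input_name = session.get_inputs()[0].name

@app.post("/predict")
def predict(features: list[float]):
    # Single-row inference; adjust shape/dtype to match your model.
    x = np.asarray([features], dtype=np.float32)
    outputs = session.run(None, {input_name: x})
    return {"prediction": outputs[0].tolist()}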
AI Inference Medium
High-memory instance for larger models (e.g., LLaMA 7B)
$150/mo
$25 setup fee
- 8 OCPUs (ARM64)
- 64 GB RAM
- 250 GB NVMe
Includes:
- llama.cpp
- vLLM
- Model quantization
- API gateway
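One common way to use the bundled llama.cpp stack is through its Python bindings with a quantized GGUF model; a minimal sketch, assuming the llama-cpp-python package and a 4-bit quantized 7B file (file name and thread count are assumptions):

from llama_cpp import Llama

# 4-bit quantization keeps a 7B model comfortably within this plan's RAM.
llm = Llama(model_path="llama-7b.Q4_K_M.gguf", n_ctx=4096, n_threads=8)
reply = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain model quantization in one sentence."}],
    max_tokens=128,
)
print(reply["choices"][0]["message"]["content"])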
AI Inference Large
Maximum-memory instance for 13B+ models
$300/mo
$50 setup fee
- 16 OCPUs (ARM64)
- 128 GB RAM
- 500 GB NVMe
Includes:
- llama.cpp
- Mistral/LLaMA 13B+
- Load balancing
- Priority support
Ollama Server
Pre-configured Ollama server with popular models preinstalled
$80/mo
$10 setup fee
- 8 OCPUs (ARM64)
- 32 GB RAM
- 200 GB NVMe
Includes:
- Ollama
- Open WebUI
- LLaMA 3.2
- Mistral 7B
- API access
Stable Diffusion
Image generation server
$100/mo
$15 setup fee
- 8 OCPUs (ARM64)
- 32 GB RAM
- 100 GB NVMe
Includes:
- Automatic1111
- ComfyUI
- SD 1.5/XL models
- API access
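The Automatic1111 web UI also exposes a REST API when launched with its --api flag; a minimal text-to-image sketch (host name, default port 7860, and prompt are placeholders):

import base64
import requests

payload = {
    "prompt": "product mockup of a ceramic mug, studio lighting",
    "steps": 20,
    "width": 512,
    "height": 512,
}
r = requests.post("http://your-ai-server:7860/sdapi/v1/txt2img", json=payload, timeout=600)
r.raise_for_status()
# The API returns base64-encoded images.
with open("output.png", "wb") as f:
    f.write(base64.b64decode(r.json()["images"][0]))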
Whisper Transcription
Audio transcription server
$50/mo
$5 setup fee
- 4 OCPUs (ARM64)
- 16 GB RAM
- 50 GB NVMe
Includes:
- OpenAI Whisper
- faster-whisper
- REST API
- Batch processing
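A minimal transcription sketch with the bundled faster-whisper library, assuming CPU inference with int8 compute and a local audio file (model size and file name are placeholders):

from faster_whisper import WhisperModel

# int8 compute keeps memory use modest for CPU-only inference.
model = WhisperModel("large-v3", device="cpu", compute_type="int8")
segments, info = model.transcribe("meeting.wav")
print("Detected language:", info.language)
for segment in segments:
    print(f"[{segment.start:.1f}s -> {segment.end:.1f}s] {segment.text}")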
Supported Models
LLaMA 3.2
1B, 3B, 8B, 70B
Mistral
7B, Mixtral 8x7B
Stable Diffusion
SD 1.5, SDXL, Flux
Whisper
Large-v3, Turbo
Qwen
1.5B, 7B, 14B
Phi-3
Mini, Small, Medium
DeepSeek
Coder, Math, V2
CodeLlama
7B, 13B, 34B
ARM64 Optimized
Ampere Altra CPUs deliver exceptional tokens/watt efficiency
OpenAI Compatible API
Drop-in replacement for your existing OpenAI client code
Private & Secure
Your data never leaves your infrastructure
Auto-Scaling
Scale up or down based on demand
Low Latency
Sub-second response times for chat models
24/7 Support
Expert help with model optimization
Use Cases
Chatbots & Assistants
Deploy private AI assistants for your team or customers without data leaving your infrastructure.
Code Generation
Run CodeLlama or DeepSeek Coder for private code completion and review.
Document Processing
Summarize, extract, and analyze documents with LLaMA or Qwen models.
Image Generation
Generate marketing images, product mockups, and creative content with Stable Diffusion.
Transcription
Transcribe meetings, podcasts, and calls with Whisper. Supports 99+ languages.
Translation
Translate content between languages while keeping sensitive data private.
Quick Start with Ollama
# After deployment, SSH to your server
$ ssh root@your-ai-server
# Check Ollama status
$ ollama list
NAME           SIZE     MODIFIED
llama3.2:8b    4.7GB    2 hours ago
mistral:7b     4.1GB    2 hours ago
# Query with cURL (OpenAI-compatible)
$ curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"llama3.2:8b","messages":[{"role":"user","content":"Hello!"}]}'
# Or use Open WebUI at https://your-ai-server:8080
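The same OpenAI-compatible endpoint also works from Python with the official openai client; a minimal sketch (the host name is a placeholder, and Ollama ignores the api_key value, but the client requires one):

from openai import OpenAI

client = OpenAI(base_url="http://your-ai-server:11434/v1", api_key="ollama")
response = client.chat.completions.create(
    model="llama3.2:8b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)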