AI/ML Workloads

Deploy LLMs, image generators, and ML inference at scale

AI Inference Small

CPU-based inference for small models

$60/mo

$10 setup fee

  • 4 OCPUs (ARM64)
  • 16 GB RAM
  • 100 GB NVMe

Includes:

  • ONNX Runtime
  • FastAPI
  • Model hosting
  • Auto-scaling
Deploy Now

AI Inference Medium

High-memory for larger models (LLaMA 7B)

$150/mo

$25 setup fee

  • 8 OCPUs (ARM64)
  • 64 GB RAM
  • 250 GB NVMe

Includes:

  • llama.cpp
  • vLLM
  • Model quantization
  • API gateway
Deploy Now

AI Inference Large

Max memory for 13B+ models

$300/mo

$50 setup fee

  • 16 OCPUs (ARM64)
  • 128 GB RAM
  • 500 GB NVMe

Includes:

  • llama.cpp
  • Mistral/LLaMA 13B+
  • Load balancing
  • Priority support
Deploy Now
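When choosing between these tiers, a quantized model's memory footprint can be estimated from its parameter count. A rough sketch, assuming ~4.5 bits per weight for a Q4-style quantization and a couple of GB of fixed overhead for the KV cache and runtime (both figures are illustrative assumptions, not llama.cpp internals):

```python
def estimated_ram_gb(params_billions: float, bits_per_weight: float = 4.5,
                     overhead_gb: float = 2.0) -> float:
    """Rough RAM estimate for a quantized model: weights + fixed overhead.

    bits_per_weight ~4.5 approximates a Q4-style quantization;
    overhead_gb covers KV cache and runtime buffers (illustrative).
    """
    weight_gb = params_billions * bits_per_weight / 8
    return weight_gb + overhead_gb

# A 13B model at ~4.5 bits needs roughly 9-10 GB, so it fits the
# 64 GB Medium plan with headroom, and the 128 GB Large plan easily.
print(round(estimated_ram_gb(13), 1))  # → 9.3
```

By the same estimate, a 70B model at Q4 lands near 42 GB, which is why the 13B+ tier ships with 128 GB of RAM.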
POPULAR
🦙

Ollama Server

Pre-configured Ollama with models

$80/mo

$10 setup fee

  • 8 OCPUs (ARM64)
  • 32 GB RAM
  • 200 GB NVMe

Includes:

  • Ollama
  • Open WebUI
  • LLaMA 3.2
  • Mistral 7B
  • API access
Deploy Now

Stable Diffusion

Image generation server

$100/mo

$15 setup fee

  • 8 OCPUs (ARM64)
  • 32 GB RAM
  • 100 GB NVMe

Includes:

  • Automatic1111
  • ComfyUI
  • SD 1.5/XL models
  • API access
Deploy Now

Whisper Transcription

Audio transcription server

$50/mo

$5 setup fee

  • 4 OCPUs (ARM64)
  • 16 GB RAM
  • 50 GB NVMe

Includes:

  • OpenAI Whisper
  • faster-whisper
  • REST API
  • Batch processing
Deploy Now

Supported Models

🦙

LLaMA 3.2

1B, 3B, 8B, 70B

🌊

Mistral

7B, Mixtral 8x7B

🎨

Stable Diffusion

SD 1.5, SDXL, Flux

🎤

Whisper

Large-v3, Turbo

💬

Qwen

1.5B, 7B, 14B

⚡

Phi-3

Mini, Small, Medium

🔮

DeepSeek

Coder, Math, V2

🧮

CodeLlama

7B, 13B, 34B

ARM64 Optimized

Ampere Altra CPUs deliver exceptional tokens/watt efficiency

OpenAI Compatible API

Drop-in replacement for your existing code
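For example, code using the official OpenAI Python SDK (v1+) can usually be redirected without source changes by setting the SDK's standard environment variables; the hostname and port below are placeholders for your deployed server:

```shell
# Redirect existing OpenAI SDK code to your private endpoint.
# Hostname is a placeholder; the API key is required by the SDK
# but its value is ignored by a self-hosted server.
export OPENAI_BASE_URL="http://your-ai-server:11434/v1"
export OPENAI_API_KEY="unused"
```

After this, an unmodified `client.chat.completions.create(...)` call targets your server instead of api.openai.com.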

Private & Secure

Your data never leaves your infrastructure

Auto-Scaling

Scale up or down based on demand

Low Latency

Sub-second response times for chat models

24/7 Support

Expert help with model optimization

Use Cases

Chatbots & Assistants

Deploy private AI assistants for your team or customers without data leaving your infrastructure.

Code Generation

Run CodeLlama or DeepSeek Coder for private code completion and review.

Document Processing

Summarize, extract, and analyze documents with LLaMA or Qwen models.

Image Generation

Generate marketing images, product mockups, and creative content with Stable Diffusion.

Transcription

Transcribe meetings, podcasts, and calls with Whisper. Supports 99+ languages.

Translation

Translate content between languages while keeping sensitive data private.

Quick Start with Ollama

# After deployment, SSH to your server

$ ssh root@your-ai-server

# Check Ollama status

$ ollama list

NAME           SIZE    MODIFIED
llama3.2:8b    4.7GB   2 hours ago
mistral:7b     4.1GB   2 hours ago

# Query with cURL (OpenAI-compatible)

$ curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.2:8b","messages":[{"role":"user","content":"Hello!"}]}'

# Or use Open WebUI at https://your-server:8080
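The same request can be made from Python with only the standard library. A minimal sketch, using the model tag and port from the quick start above (the actual send is left commented out, since it needs a running server):

```python
import json
import urllib.request

def chat_request(prompt: str, model: str = "llama3.2:8b",
                 base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Send it once the server is reachable (not executed here):
# with urllib.request.urlopen(chat_request("Hello!")) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```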