AI/ML Workloads
Deploy LLMs, image generators, and ML inference at scale
AI Inference Small
CPU-based inference for small models
$60/mo
$10 setup fee
- 4 OCPUs (ARM64)
- 16 GB RAM
- 100 GB NVMe
Includes:
- ONNX Runtime
- FastAPI
- Model hosting
- Auto-scaling
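A minimal sketch of how the bundled ONNX Runtime and FastAPI pieces fit together, assuming a generic model exported as model.onnx with a single float32 input (the file name, input shape, and port are placeholders, not part of the plan):

# Serve an ONNX model over HTTP with FastAPI + ONNX Runtime (CPU).
# Run with: uvicorn app:app --host 0.0.0.0 --port 8000
import numpy as np
import onnxruntime as ort
from fastapi import FastAPI

app = FastAPI()
session = ort.InferenceSession("model.onnx")   # load once at startup
input_name = session.get_inputs()[0].name

@app.post("/predict")
def predict(features: list[float]):
    # Single-row inference; adjust shape/dtype to match your model.
    x = np.asarray([features], dtype=np.float32)
    outputs = session.run(None, {input_name: x})
    return {"prediction": outputs[0].tolist()}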
AI Inference Medium
High-memory instance for larger models (e.g., LLaMA 7B)
$150/mo
$25 setup fee
- 8 OCPUs (ARM64)
- 64 GB RAM
- 250 GB NVMe
Includes:
- llama.cpp
- vLLM
- Model quantization
- API gateway
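One common way to use the bundled llama.cpp stack is through its Python bindings with a quantized GGUF model; a minimal sketch, assuming the llama-cpp-python package and a 4-bit quantized 7B file (file name and thread count are assumptions):

from llama_cpp import Llama

# 4-bit quantization keeps a 7B model comfortably within this plan's RAM.
llm = Llama(model_path="llama-7b.Q4_K_M.gguf", n_ctx=4096, n_threads=8)
reply = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain model quantization in one sentence."}],
    max_tokens=128,
)
print(reply["choices"][0]["message"]["content"])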
AI Inference Large
Maximum-memory instance for 13B+ models
$300/mo
$50 setup fee
- 16 OCPUs (ARM64)
- 128 GB RAM
- 500 GB NVMe
Includes:
- llama.cpp
- Mistral/LLaMA 13B+
- Load balancing
- Priority support
Ollama Server
Pre-configured Ollama server with popular models preinstalled
$80/mo
$10 setup fee
- 8 OCPUs (ARM64)
- 32 GB RAM
- 200 GB NVMe
Includes:
- Ollama
- Open WebUI
- LLaMA 3.2
- Mistral 7B
- API access
Stable Diffusion
Image generation server
$100/mo
$15 setup fee
- 8 OCPUs (ARM64)
- 32 GB RAM
- 100 GB NVMe
Includes:
- Automatic1111
- ComfyUI
- SD 1.5/XL models
- API access
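The Automatic1111 web UI also exposes a REST API when launched with its --api flag; a minimal text-to-image sketch (host name, default port 7860, and prompt are placeholders):

import base64
import requests

payload = {
    "prompt": "product mockup of a ceramic mug, studio lighting",
    "steps": 20,
    "width": 512,
    "height": 512,
}
r = requests.post("http://your-ai-server:7860/sdapi/v1/txt2img", json=payload, timeout=600)
r.raise_for_status()
# The API returns base64-encoded images.
with open("output.png", "wb") as f:
    f.write(base64.b64decode(r.json()["images"][0]))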
Whisper Transcription
Audio transcription server
$50/mo
$5 setup fee
- 4 OCPUs (ARM64)
- 16 GB RAM
- 50 GB NVMe
Includes:
- OpenAI Whisper
- faster-whisper
- REST API
- Batch processing
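A minimal transcription sketch with the bundled faster-whisper library, assuming CPU inference with int8 compute and a local audio file (model size and file name are placeholders):

from faster_whisper import WhisperModel

# int8 compute keeps memory use modest for CPU-only inference.
model = WhisperModel("large-v3", device="cpu", compute_type="int8")
segments, info = model.transcribe("meeting.wav")
print("Detected language:", info.language)
for segment in segments:
    print(f"[{segment.start:.1f}s -> {segment.end:.1f}s] {segment.text}")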
Supported Models
LLaMA 3.2
1B, 3B, 8B, 70B
Mistral
7B, Mixtral 8x7B
Stable Diffusion
SD 1.5, SDXL, Flux
Whisper
Large-v3, Turbo
Qwen
1.5B, 7B, 14B
Phi-3
Mini, Small, Medium
DeepSeek
Coder, Math, V2
CodeLlama
7B, 13B, 34B
ARM64 Optimized
Ampere Altra CPUs deliver exceptional tokens/watt efficiency
OpenAI Compatible API
Drop-in replacement for your existing OpenAI client code
Private & Secure
Your data never leaves your infrastructure
Auto-Scaling
Scale up or down based on demand
Low Latency
Sub-second response times for chat models
24/7 Support
Expert help with model optimization
Use Cases
Chatbots & Assistants
Deploy private AI assistants for your team or customers without data leaving your infrastructure.
Code Generation
Run CodeLlama or DeepSeek Coder for private code completion and review.
Document Processing
Summarize, extract, and analyze documents with LLaMA or Qwen models.
Image Generation
Generate marketing images, product mockups, and creative content with Stable Diffusion.
Transcription
Transcribe meetings, podcasts, and calls with Whisper. Supports 99+ languages.
Translation
Translate content between languages while keeping sensitive data private.
Quick Start with Ollama
# After deployment, SSH to your server
$ ssh root@your-ai-server
# Check Ollama status
$ ollama list
NAME           SIZE     MODIFIED
llama3.2:8b    4.7GB    2 hours ago
mistral:7b     4.1GB    2 hours ago
# Query with cURL (OpenAI-compatible)
$ curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"llama3.2:8b","messages":[{"role":"user","content":"Hello!"}]}'
# Or use Open WebUI at https://your-ai-server:8080
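The same OpenAI-compatible endpoint also works from Python with the official openai client; a minimal sketch (the host name is a placeholder, and Ollama ignores the api_key value, but the client requires one):

from openai import OpenAI

client = OpenAI(base_url="http://your-ai-server:11434/v1", api_key="ollama")
response = client.chat.completions.create(
    model="llama3.2:8b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)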