Stack overview
Technical Architecture
The architecture behind private AI at enterprise scale.
Gridlight is not a wrapper around a cloud API. It is a purpose-built inference control plane that runs entirely within your environment: on hardware you own, in networks you control, with data that never leaves your perimeter.
Single point of governance
Every inference request, from every application, passes through the control plane, so policy and audit are applied in one place.
Model-agnostic by design
Gridlight treats models as hot-swappable infrastructure. Add, remove, or replace models without touching application code. The fabric handles version management and traffic migration.
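As an illustration of how applications can stay untouched while models are swapped, here is a minimal sketch of weighted traffic migration behind a stable alias. It is not Gridlight's API: `ModelAlias`, `set_traffic`, and `resolve` are hypothetical names, and the version labels are placeholders.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ModelAlias:
    """Hypothetical stable name that applications call instead of a model version."""
    name: str
    weights: dict[str, float] = field(default_factory=dict)  # version -> traffic share

    def set_traffic(self, shares: dict[str, float]) -> None:
        """Replace the traffic split; shares are normalized so they sum to 1."""
        positive = {version: s for version, s in shares.items() if s > 0}
        if not positive:
            raise ValueError("at least one version needs a positive share")
        total = sum(positive.values())
        self.weights = {version: s / total for version, s in positive.items()}

    def resolve(self, rng: random.Random) -> str:
        """Pick the concrete version for one request according to the split."""
        versions = list(self.weights)
        shares = [self.weights[v] for v in versions]
        return rng.choices(versions, weights=shares, k=1)[0]


# Assumed migration: start with a 90/10 canary, then cut over completely.
alias = ModelAlias("chat")
alias.set_traffic({"model-a": 90, "model-b": 10})
alias.set_traffic({"model-b": 1})
```

Because callers only ever reference the alias name, changing the split is a configuration change rather than an application change.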
Hardware you control
Gridlight runs on your existing GPU hardware — on-prem, edge, or private cloud. No GPU cloud subscription. No egress charges. No data leaving your network perimeter.
Intelligent routing
Route every query to the right model — automatically.
The Model Router evaluates three dimensions on every inference request and dispatches to the optimal model without application-layer changes.
Cost routing
Lightweight queries route to smaller, faster, cheaper models. Complex reasoning routes to larger ones. Cost per query is optimized automatically based on each task's complexity classification.
Capability routing
Code generation routes to code-optimized models. Document analysis routes to long-context models. Routing rules are policy-driven and version-controlled — no hardcoding in application logic.
Capacity-aware routing
Requests dispatch to the model instance with available headroom based on real-time GPU utilization and queue depth — maximizing throughput and keeping latency predictable under load.
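The three routing dimensions above can be sketched as a single selection function. This is an illustrative sketch, not the Model Router's actual logic: the `ModelInstance` and `Request` types, the integer `tier` scale, the `max_utilization` threshold, and the tie-breaking order are all assumptions, and the complexity score is assumed to come from an upstream classifier.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInstance:
    name: str
    capabilities: frozenset[str]  # task types this model is suited for
    tier: int                     # 1 = small and cheap; higher = larger and costlier
    gpu_utilization: float        # 0.0 to 1.0, from real-time metrics
    queue_depth: int              # requests currently waiting


@dataclass(frozen=True)
class Request:
    task: str        # e.g. "code", "document", "chat"
    complexity: int  # minimum tier needed, from an assumed classifier


def route(request: Request, instances: list[ModelInstance],
          max_utilization: float = 0.9) -> ModelInstance:
    # Capability: keep models that handle the task and are large enough for it.
    candidates = [m for m in instances
                  if request.task in m.capabilities and m.tier >= request.complexity]
    if not candidates:
        raise LookupError(f"no model can serve task {request.task!r}")
    # Capacity: prefer instances with headroom; fall back to all candidates.
    with_headroom = [m for m in candidates if m.gpu_utilization < max_utilization]
    pool = with_headroom or candidates
    # Cost: cheapest sufficient tier first, then shortest queue, then lowest load.
    return min(pool, key=lambda m: (m.tier, m.queue_depth, m.gpu_utilization))
```

In this sketch the policy lives entirely in `route` and its inputs, so the calling application only submits a request and never names a model.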
Deployment topologies
Deploy where your data lives.
Gridlight supports multiple deployment topologies — from a single on-prem cluster to federated edge deployments with centralized governance.
Single-site on-prem
Control plane and model fabric co-located on your data center hardware. Simplest deployment. Recommended for organizations with a single primary data center.
Hub-and-spoke edge
Central control plane with distributed model fabric nodes at edge locations. Audit and policy enforced centrally; inference runs locally at each site. Ideal for multi-location regulated enterprises.
Air-gapped deployment
Fully isolated deployment with no external network dependencies. Designed for government, defense, and high-security environments where even telemetry must remain internal.
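To make the differences between the three topologies concrete, the sketch below models a deployment description and checks it against the constraints stated above. The `Deployment` fields, the `Topology` values, and the `validate` helper are hypothetical and do not reflect Gridlight's configuration format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Topology(Enum):
    SINGLE_SITE = "single-site"
    HUB_AND_SPOKE = "hub-and-spoke"
    AIR_GAPPED = "air-gapped"


@dataclass
class Deployment:
    topology: Topology
    control_plane_site: str
    fabric_sites: list[str]
    external_endpoints: list[str] = field(default_factory=list)


def validate(d: Deployment) -> list[str]:
    """Return a list of problems; an empty list means the description is consistent."""
    problems: list[str] = []
    if not d.fabric_sites:
        problems.append("at least one model fabric site is required")
    if d.topology is Topology.SINGLE_SITE and d.fabric_sites != [d.control_plane_site]:
        problems.append("single-site: fabric must be co-located with the control plane")
    if d.topology is Topology.HUB_AND_SPOKE and not any(
        site != d.control_plane_site for site in d.fabric_sites
    ):
        problems.append("hub-and-spoke: expected a fabric site away from the control plane")
    if d.topology is Topology.AIR_GAPPED and d.external_endpoints:
        problems.append("air-gapped: no external endpoints are allowed")
    return problems
```

A check like this could run before rollout, for example to reject an air-gapped description that still lists an outbound telemetry endpoint.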
