Technical Architecture

The architecture behind private AI at enterprise scale.

Gridlight is not a wrapper around a cloud API. It is a purpose-built inference control plane that runs entirely within your environment: on hardware you own, in networks you control, with data that never leaves your network.

Stack overview

How Gridlight sits in your environment

Diagram: the Gridlight stack, top to bottom. Everything below the client layer runs inside your environment.
CLIENT APPLICATIONS: Web Apps, Enterprise Tools, Developer APIs, Dashboards, Custom Integrations
GRIDLIGHT CONTROL PLANE:
Workload Orchestrator (queue, priority and load balancing)
Model Router (cost, capability and compliance routing)
Policy Engine (RBAC, data classification, guardrails)
Capacity Governor (GPU utilization, QoS, throttling)
Audit Logger (immutable, tamper-evident logs)
Observability Stack (telemetry, SLOs, SIEM export)
MODEL FABRIC (your models): Open-Source (Llama, Mistral), Fine-Tuned / Domain, Private / Custom Models, Sharded Large Models; all hot-swap ready
YOUR COMPUTE (data never leaves this layer): On-Prem GPU Cluster, Edge Nodes, Private Cloud, Workstations, and more
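The diagram lists the Audit Logger as tamper-evident. How Gridlight implements this is not specified here; a common technique is a hash chain, in which each entry stores a SHA-256 digest of its own content plus the previous entry's digest, so editing any past entry breaks the chain from that point on. `AuditLog`, `append`, and `verify` are hypothetical names used only for this sketch.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass
class AuditLog:
    """Hypothetical append-only log; each entry commits to the one before it."""
    entries: list = field(default_factory=list)

    @staticmethod
    def _digest(prev_hash: str, event: dict) -> str:
        # Canonical JSON (sorted keys) so the same event always hashes the same way.
        payload = json.dumps(event, sort_keys=True)
        return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

    def append(self, event: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        digest = self._digest(prev_hash, event)
        self.entries.append({"event": event, "prev": prev_hash, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every link; any edited, inserted, or removed entry fails."""
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash or entry["hash"] != self._digest(prev_hash, entry["event"]):
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append({"user": "alice", "model": "model-a", "action": "inference"})
log.append({"user": "bob", "model": "model-b", "action": "inference"})
print(log.verify())  # True
log.entries[0]["event"]["user"] = "mallory"  # tamper with history
print(log.verify())  # False
```

A plain chain only detects edits if the latest digest is also kept somewhere an attacker cannot rewrite (for example, a write-once store); otherwise the entire chain could simply be recomputed.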
CONTROL PLANE

Single point of governance

Every inference request, from every application, passes through the control plane, so routing, policy enforcement, and audit logging are applied in one place.
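A minimal sketch of what a single governance entry point could look like, assuming a simple RBAC table that maps each role to the most sensitive data classification it may submit. `handle_request`, `InferenceRequest`, `ROLE_CLEARANCE`, and the classification levels are hypothetical illustrations, not Gridlight's API; the return value stands in for the hand-off to the Model Router.

```python
from dataclasses import dataclass

# Hypothetical data classifications, ordered least to most sensitive.
CLASSIFICATION_LEVELS = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}

# Hypothetical RBAC table: the most sensitive classification each role may submit.
ROLE_CLEARANCE = {"analyst": "internal", "engineer": "confidential", "compliance": "restricted"}


@dataclass
class InferenceRequest:
    user: str
    role: str
    classification: str
    prompt: str


def handle_request(req: InferenceRequest, audit: list) -> str:
    """Single entry point: every request is checked and logged before dispatch."""
    clearance = ROLE_CLEARANCE.get(req.role)
    allowed = (
        clearance is not None
        and CLASSIFICATION_LEVELS[req.classification] <= CLASSIFICATION_LEVELS[clearance]
    )
    decision = "allowed" if allowed else "denied"
    # Record the decision whether or not the request proceeds.
    audit.append({"user": req.user, "classification": req.classification, "decision": decision})
    if not allowed:
        raise PermissionError(f"{req.role} may not submit {req.classification} data")
    return "dispatched"  # placeholder for the Model Router hand-off


audit_trail = []
print(handle_request(InferenceRequest("alice", "engineer", "confidential", "..."), audit_trail))  # dispatched
try:
    handle_request(InferenceRequest("bob", "analyst", "restricted", "..."), audit_trail)
except PermissionError as err:
    print(err)  # analyst may not submit restricted data
print([e["decision"] for e in audit_trail])  # ['allowed', 'denied']
```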

MODEL FABRIC

Model-agnostic by design

Gridlight treats models as hot-swappable infrastructure. Add, remove, or replace models without touching application code. The fabric handles version management and traffic migration.
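One way to read "traffic migration" is a weighted canary behind a stable model alias: applications always call the alias, and the fabric sends a growing share of requests to the new version before completing the swap. The sketch assumes that design; `ModelAlias`, `migrate`, `resolve`, and the version names are hypothetical.

```python
import random


class ModelAlias:
    """Hypothetical stable model name; applications call the alias, never a version."""

    def __init__(self, name: str, version: str):
        self.name = name
        self.current = version
        self.candidate = None        # version being migrated to, if any
        self.candidate_share = 0.0   # fraction of traffic sent to the candidate

    def migrate(self, new_version: str, share: float) -> None:
        """Shift `share` (0..1) of traffic to new_version; 1.0 completes the swap."""
        if not 0.0 <= share <= 1.0:
            raise ValueError("share must be in [0, 1]")
        if share == 1.0:
            self.current, self.candidate, self.candidate_share = new_version, None, 0.0
        else:
            self.candidate, self.candidate_share = new_version, share

    def resolve(self, rng: random.Random) -> str:
        """Pick the concrete version that serves this request."""
        if self.candidate is not None and rng.random() < self.candidate_share:
            return self.candidate
        return self.current


alias = ModelAlias("summarizer", "summarizer-v1")
rng = random.Random(0)
alias.migrate("summarizer-v2", 0.10)  # canary: about 10% of traffic
served = [alias.resolve(rng) for _ in range(1000)]
print(served.count("summarizer-v2"))  # requests that hit the canary
alias.migrate("summarizer-v2", 1.0)   # complete the swap
print(alias.resolve(rng))             # summarizer-v2
```

Because the application only ever names `summarizer`, rolling back is just another `migrate` call; no client code changes.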

COMPUTE LAYER

Hardware you control

Gridlight runs on your existing GPU hardware — on-prem, edge, or private cloud. No GPU cloud subscription. No egress charges. No data leaving your network perimeter.

Intelligent routing

Route every query to the right model — automatically.

The Model Router evaluates three dimensions on every inference request and dispatches to the optimal model without application-layer changes.

Cost routing

Lightweight queries route to smaller, faster, cheaper models; complex reasoning routes to larger ones. Because each request is classified by task complexity, cost per query is optimized automatically.
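The text does not say how task complexity is classified. The sketch below substitutes a deliberately crude keyword-and-length heuristic (a hypothetical `classify_complexity`) purely to show the shape of cost routing: classify the request, then pick the cheapest tier judged adequate. The tier names are placeholders.

```python
# Hypothetical model tiers, ordered from cheapest to most capable.
TIERS = ["small-fast", "medium", "large-reasoning"]

# Toy signals of multi-step reasoning (plain substring match).
REASONING_HINTS = ("why", "explain", "compare", "prove", "step by step", "analyze")


def classify_complexity(prompt: str) -> int:
    """Toy classifier: 0 = simple, 1 = moderate, 2 = complex."""
    text = prompt.lower()
    score = 0
    if len(text.split()) > 50:  # long prompts tend to need more context handling
        score += 1
    if any(hint in text for hint in REASONING_HINTS):
        score += 1
    return score


def route_by_cost(prompt: str) -> str:
    """Send each prompt to the cheapest tier judged adequate for its complexity."""
    return TIERS[classify_complexity(prompt)]


print(route_by_cost("Translate 'hello' to French"))              # small-fast
print(route_by_cost("Explain why the Q3 forecast diverged ..."))  # medium
```

In practice the classifier would likely be a small model of its own; the routing step stays the same either way.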

Capability routing

Code generation routes to code-optimized models. Document analysis routes to long-context models. Routing rules are policy-driven and version-controlled — no hardcoding in application logic.
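Because routing rules are described as policy-driven and version-controlled, one plausible shape is a declarative rule file kept in a repository and evaluated first-match. The JSON schema, the model names, and the requirement for a catch-all rule are assumptions for illustration, and the task label is assumed to come from an upstream classifier.

```python
import json

# Hypothetical routing policy, stored in version control as a plain JSON file.
# Rules are evaluated top to bottom; the first matching task type wins.
POLICY_JSON = """
{
  "version": 3,
  "rules": [
    {"task": "code_generation",   "model": "code-specialist"},
    {"task": "document_analysis", "model": "long-context"},
    {"task": "*",                 "model": "general-purpose"}
  ]
}
"""


def load_policy(text: str) -> dict:
    """Parse and sanity-check a policy before the router starts using it."""
    policy = json.loads(text)
    if not policy["rules"] or policy["rules"][-1]["task"] != "*":
        raise ValueError("policy must end with a catch-all '*' rule")
    return policy


def route_by_capability(task: str, policy: dict) -> str:
    for rule in policy["rules"]:
        if rule["task"] in (task, "*"):
            return rule["model"]
    raise LookupError(task)  # unreachable when the catch-all rule is present


policy = load_policy(POLICY_JSON)
print(route_by_capability("code_generation", policy))  # code-specialist
print(route_by_capability("summarization", policy))    # general-purpose
```

Keeping the rules as data means a routing change is a reviewed commit to the policy file rather than a change to application logic.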

Capacity-aware routing

Requests dispatch to the model instance with available headroom based on real-time GPU utilization and queue depth — maximizing throughput and keeping latency predictable under load.
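A sketch of capacity-aware selection under an assumed scoring rule: headroom is spare GPU capacity multiplied by the fraction of the request queue still free. The `Instance` fields, the `max_queue` limit of 32, and the scoring formula are hypothetical; a real scheduler might also weigh latency targets and model fit.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    gpu_util: float      # 0.0-1.0, e.g. sampled from GPU telemetry
    queue_depth: int     # requests waiting on this instance
    max_queue: int = 32  # hypothetical per-instance queue limit


def headroom(inst: Instance) -> float:
    """Hypothetical score: spare GPU capacity, discounted by how full the queue is."""
    queue_free = 1.0 - min(inst.queue_depth / inst.max_queue, 1.0)
    return (1.0 - inst.gpu_util) * queue_free


def pick_instance(instances: list[Instance]) -> Instance:
    """Dispatch to the replica with the most headroom; refuse if all are saturated."""
    best = max(instances, key=headroom)
    if headroom(best) <= 0.0:
        raise RuntimeError("all instances saturated; queue or throttle upstream")
    return best


replicas = [
    Instance("gpu-node-a", gpu_util=0.92, queue_depth=4),   # busy GPU, short queue
    Instance("gpu-node-b", gpu_util=0.55, queue_depth=20),  # idle-ish GPU, long queue
    Instance("gpu-node-c", gpu_util=0.60, queue_depth=2),
]
print(pick_instance(replicas).name)  # gpu-node-c
```

Multiplying the two factors means an instance scores well only if it has both spare compute and a short queue, which keeps a lightly loaded GPU with a long backlog from attracting more work.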

Deployment topologies

Deploy where your data lives.

Gridlight supports multiple deployment topologies — from a single on-prem cluster to federated edge deployments with centralized governance.

TOPOLOGY 01

Single-site on-prem

Control plane and model fabric co-located on your data center hardware. Simplest deployment. Recommended for organizations with a single primary data center.

TOPOLOGY 02

Hub-and-spoke edge

Central control plane with distributed model fabric nodes at edge locations. Audit and policy enforced centrally; inference runs locally at each site. Ideal for multi-location regulated enterprises.

TOPOLOGY 03

Air-gapped deployment

Fully isolated deployment with no external network dependencies. Designed for government, defense, and high-security environments where even outbound telemetry must remain internal.
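The three topologies can be expressed as a deployment descriptor with consistency checks. This sketch assumes a hypothetical `Topology` schema and `validate` function; the air-gapped rule uses Python's standard `ipaddress` module to flag telemetry endpoints outside private address ranges, mirroring the requirement that even outbound telemetry stay internal.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address

KINDS = {"single_site", "hub_and_spoke", "air_gapped"}


@dataclass
class Topology:
    """Hypothetical deployment descriptor covering the three topologies above."""
    kind: str
    control_plane_site: str
    inference_sites: list = field(default_factory=list)
    telemetry_endpoints: list = field(default_factory=list)  # IP addresses


def validate(topo: Topology) -> list:
    """Return a list of problems; an empty list means the descriptor is consistent."""
    if topo.kind not in KINDS:
        return [f"unknown topology kind: {topo.kind}"]
    errors = []
    if topo.kind == "single_site" and topo.inference_sites != [topo.control_plane_site]:
        errors.append("single_site: inference must run at the control-plane site")
    if topo.kind == "hub_and_spoke" and not set(topo.inference_sites) - {topo.control_plane_site}:
        errors.append("hub_and_spoke: needs at least one edge site besides the hub")
    if topo.kind == "air_gapped":
        for endpoint in topo.telemetry_endpoints:
            if not ip_address(endpoint).is_private:
                errors.append(f"air_gapped: telemetry endpoint {endpoint} is outside the network")
    return errors


edge = Topology("hub_and_spoke", "dc-central", ["branch-nyc", "branch-chi"])
print(validate(edge))  # []

vault = Topology("air_gapped", "secure-dc", ["secure-dc"], ["10.20.0.5", "8.8.8.8"])
print(validate(vault))  # ['air_gapped: telemetry endpoint 8.8.8.8 is outside the network']
```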