Introduction
What is Agent Cloud?
Agent Cloud is VideoSDK's fully managed cloud infrastructure for deploying and running AI voice agents. It abstracts away all the complexity of server management, scaling, and maintenance, allowing you to focus entirely on building your agent logic.
Agent Cloud supports two deployment workflows:
Low-Code Deployment (UI-Based)
For users who prefer a visual approach, Agent Cloud provides a low-code interface where you can:
- Design your AI agent directly from the VideoSDK dashboard
- Configure agent behavior, prompts, and integrations through the UI
- Deploy with a single click – no coding required
This approach is ideal for rapid prototyping, non-technical users, or teams that want to iterate quickly without writing deployment scripts.
Developer Deployment (CLI-Based)
For developers who build custom AI voice agents using the VideoSDK Pipeline, Agent Cloud provides a CLI-based deployment workflow:
- Develop your AI voice agent using the VideoSDK Agents Python SDK
- Use the VideoSDK CLI to package and deploy your agent to the cloud
- Manage deployments, versions, and configurations programmatically
This approach gives developers full control over their agent code while leveraging the managed infrastructure benefits of Agent Cloud.
Deploying agents with the CLI is covered in detail in the upcoming CLI guides.
Agent Cloud Architecture
A single deployment can have multiple versions running simultaneously, so you can release an updated agent without taking the versions that are already running offline.

What is a Deployment?
A Deployment represents a managed instance of your AI agent running on VideoSDK's cloud infrastructure. When you deploy an agent to Agent Cloud, VideoSDK handles:
- Infrastructure Provisioning: Automatically allocates compute resources
- Load Balancing: Distributes incoming requests across available replicas
- Health Monitoring: Continuously monitors agent health and restarts failed instances
- Scaling: Automatically scales replicas based on demand within configured limits
Each deployment is identified by a unique name and contains configuration for how your agent should be run, scaled, and managed.
What is a Version?
A Version represents a specific release of your AI agent within a deployment. Each time you update your agent code or configuration and deploy it, a new version is created.
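To make the relationship between deployments and versions concrete, here is a minimal Python sketch of the data model described above. It is hypothetical: the class names, the release method, the sequential version numbers, and the idea that a version records a container image are illustrative assumptions, not part of the VideoSDK SDK or CLI. It only mirrors the documented rules that a deployment is identified by a name and that each deploy of updated code or configuration adds a new version alongside the existing ones.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    # Hypothetical model of one release of an agent within a deployment.
    number: int
    image: str  # assumption: a version points at a packaged agent image


@dataclass
class Deployment:
    # Hypothetical model: a deployment is identified by its name.
    name: str
    versions: list[Version] = field(default_factory=list)

    def release(self, image: str) -> Version:
        # Each deploy of updated code or configuration creates a new version.
        # Earlier versions are kept, since several can run at the same time.
        version = Version(number=len(self.versions) + 1, image=image)
        self.versions.append(version)
        return version


deployment = Deployment(name="support-agent")  # hypothetical name
deployment.release("myrepo/myagent:v1")
deployment.release("myrepo/myagent:v2")
print([v.number for v in deployment.versions])  # [1, 2]
```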
Version Configuration
Every version includes the following configurable parameters:
| Parameter | Description |
|---|---|
| Min Replicas | The minimum number of agent instances that should always be running. This ensures baseline availability even during low traffic. |
| Max Replicas | The maximum number of agent instances that can be scaled up to during high demand. This caps your resource usage and costs. |
| Profile | The compute resource profile that defines CPU and memory allocation for each replica. |
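The three parameters can be pictured as a small configuration record. The sketch below is a hypothetical Python model, not the VideoSDK configuration schema: the field names min_replicas, max_replicas, and profile are illustrative, and the validation rules (a non-negative minimum, a maximum of at least one and no smaller than the minimum, and a profile taken from the documented profile names) are assumptions inferred from the definitions above rather than documented platform limits.

```python
from dataclasses import dataclass

# Profile names as listed in the Resource Profiles table.
PROFILES = {"cpu-small", "cpu-medium", "cpu-large"}


@dataclass(frozen=True)
class VersionConfig:
    # Hypothetical field names; not the VideoSDK configuration schema.
    min_replicas: int
    max_replicas: int
    profile: str

    def __post_init__(self) -> None:
        # Assumed sanity rules implied by the parameter definitions.
        if self.min_replicas < 0:
            raise ValueError("min_replicas must be >= 0")
        if self.max_replicas < max(self.min_replicas, 1):
            raise ValueError("max_replicas must be >= 1 and >= min_replicas")
        if self.profile not in PROFILES:
            raise ValueError(f"unknown profile: {self.profile}")


config = VersionConfig(min_replicas=2, max_replicas=10, profile="cpu-medium")
```

A configuration such as VersionConfig(min_replicas=5, max_replicas=2, profile="cpu-small") is rejected at construction time, which is the kind of inconsistency the min/max definitions rule out.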
Resource Profiles
Agent Cloud offers predefined resource profiles to match your agent's computational requirements:
| Profile | Description | Best For |
|---|---|---|
| cpu-small | Lightweight compute resources with minimal CPU and memory allocation | Simple agents, low-traffic applications |
| cpu-medium | Balanced compute resources suitable for most production workloads | Standard agents, moderate traffic |
| cpu-large | High-performance compute resources with increased CPU and memory | Complex agents, high-traffic, compute-intensive tasks |
Deployment Regions
Agent Cloud is available in multiple regions to ensure low latency and compliance with data residency requirements:
| Region | Location | Description |
|---|---|---|
| in002 | India | Optimized for users in the Indian subcontinent |
| us002 | United States | Optimized for users in North America (default) |
If no region is specified during deployment, us002 (United States) is used as the default region.
Choose the region closest to your users for the best performance. You can specify the region when deploying your agent using the --region flag:
```bash
videosdk agent deploy --image myrepo/myagent:v1 --region in002
```
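The region rule above (use the value passed with --region, otherwise fall back to us002) amounts to a default lookup. This Python sketch is hypothetical and only mirrors that documented behavior: the function name resolve_region is illustrative, and rejecting any code other than the two regions listed in this guide is an assumption, not a description of how the CLI validates its input.

```python
from typing import Optional

# Region codes from the Deployment Regions table.
REGIONS = {"in002": "India", "us002": "United States"}
DEFAULT_REGION = "us002"


def resolve_region(requested: Optional[str] = None) -> str:
    # Hypothetical helper: mirrors the documented default, not the CLI's code.
    if requested is None:
        return DEFAULT_REGION
    if requested not in REGIONS:
        # Assumption: only the regions listed in this guide are accepted.
        raise ValueError(f"unsupported region: {requested}")
    return requested


print(resolve_region())         # us002
print(resolve_region("in002"))  # in002
```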
Replica Scaling
Replicas are individual instances of your agent running within a version. Agent Cloud automatically manages replicas based on your configuration:
- Minimum Replicas (minReplica): Guarantees this many instances are always running, ensuring your agent is ready to handle requests without cold-start delays.
- Maximum Replicas (maxReplica): Sets the upper limit for scaling. When traffic increases, Agent Cloud automatically spins up additional replicas up to this limit.
Example configuration:
- Min Replicas: 2
- Max Replicas: 10
- Profile: cpu-medium
In this example, your agent will always have at least 2 instances running but can scale up to 10 instances during peak demand, each using medium-tier compute resources.
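The effect of the two limits can be illustrated by clamping a desired replica count into the range between the minimum and maximum. This guide does not describe how Agent Cloud decides how many replicas a given traffic level needs, so the sketch below assumes a hypothetical requests_per_replica capacity and a simple ceiling division to estimate demand; only the final clamping step reflects the documented min/max behavior.

```python
import math


def target_replicas(concurrent_requests: int, requests_per_replica: int,
                    min_replicas: int, max_replicas: int) -> int:
    # Assumed demand model: enough replicas to cover the concurrent requests.
    needed = math.ceil(concurrent_requests / requests_per_replica)
    # Documented behavior: never below min_replicas, never above max_replicas.
    return max(min_replicas, min(needed, max_replicas))


# Example configuration (Min Replicas: 2, Max Replicas: 10) with a
# hypothetical capacity of 5 concurrent requests per replica.
for load in (0, 23, 200):
    print(load, target_replicas(load, 5, 2, 10))
# 0 2
# 23 5
# 200 10
```

With no traffic the target stays at the guaranteed 2 replicas, moderate load scales it proportionally, and at 200 concurrent requests it is capped at 10, matching the behavior described in the example.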
Summary
| Term | Definition |
|---|---|
| Agent Cloud | VideoSDK's managed cloud platform for deploying AI voice agents |
| Deployment | A managed instance of your agent on Agent Cloud, capable of running multiple versions |
| Version | A specific release of your agent within a deployment, with its own scaling and resource configuration |
| Replica | An individual running instance of your agent within a version |
| Min Replicas | Minimum number of agent instances always running |
| Max Replicas | Maximum number of agent instances during peak scaling |
| Profile | Compute resource tier (cpu-small, cpu-medium, cpu-large) for each replica |
| Region | Geographic location for deployment (in002 for India, us002 for US) |
Understanding these concepts is essential for effectively deploying and managing your AI agents on Agent Cloud. In the following guides, we'll explore how to create deployments, manage versions, and configure scaling for your specific use case.

