Introduction

What is Agent Cloud?

Agent Cloud is VideoSDK's fully managed cloud infrastructure for deploying and running AI voice agents. It handles server management, scaling, and maintenance for you, so you can focus on building your agent logic.

Agent Cloud supports two deployment workflows:

Low-Code Deployment (UI-Based)

For users who prefer a visual approach, Agent Cloud provides a low-code interface where you can:

Design your AI agent directly from the VideoSDK dashboard
Configure agent behavior, prompts, and integrations through the UI
Deploy with a single click – no coding required

This approach is ideal for rapid prototyping, non-technical users, or teams that want to iterate quickly without writing deployment scripts.

Developer Deployment (CLI-Based)

For developers who build custom AI voice agents using the VideoSDK Pipeline, Agent Cloud provides a CLI-based deployment workflow:

Develop your AI voice agent using the VideoSDK Agents Python SDK
Use the VideoSDK CLI to package and deploy your agent to the cloud
Manage deployments, versions, and configurations programmatically

This approach gives developers full control over their agent code while still benefiting from Agent Cloud's managed infrastructure.

Info: Check out the CLI Installation Guide to get started with deploying your agents to Agent Cloud.

Agent Cloud Architecture

A single deployment can run multiple versions at the same time, which gives you flexibility in how you manage and update your agents.

[Figure: Agent Cloud Architecture]

What is a Deployment?

A Deployment represents a managed instance of your AI agent running on VideoSDK's cloud infrastructure. When you deploy an agent to Agent Cloud, VideoSDK handles:

Infrastructure Provisioning: Automatically allocates compute resources
Load Balancing: Distributes incoming requests across available replicas
Health Monitoring: Continuously monitors agent health and restarts failed instances
Scaling: Automatically scales replicas based on demand within configured limits

Each deployment is identified by a unique name and contains configuration for how your agent should be run, scaled, and managed.

What is a Version?

A Version represents a specific release of your AI agent within a deployment. Each time you update your agent code or configuration and deploy it, a new version is created.
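As a mental model only, the relationship between a deployment and its versions can be sketched in Python. The class names, fields, and the `release` method below are hypothetical illustrations; they are not part of the VideoSDK SDK or CLI, and the deployment name is made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Version:
    """One release of an agent (hypothetical model, not a VideoSDK type)."""
    number: int
    image: str


@dataclass
class Deployment:
    """A named deployment that keeps every version released to it (hypothetical)."""
    name: str
    versions: List[Version] = field(default_factory=list)

    def release(self, image: str) -> Version:
        # Each deploy of updated code or configuration adds a new version.
        version = Version(number=len(self.versions) + 1, image=image)
        self.versions.append(version)
        return version


deployment = Deployment(name="support-agent")
deployment.release("myrepo/myagent:v1")
deployment.release("myrepo/myagent:v2")
print([v.image for v in deployment.versions])
```

The sketch only captures the containment relationship: one deployment, identified by name, with a growing list of versions.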

Version Configuration

Every version includes the following configurable parameters:

Parameter	Description
Min Replicas	The minimum number of agent instances that should always be running. This ensures baseline availability even during low traffic.
Max Replicas	The maximum number of agent instances that Agent Cloud can scale up to during high demand. This caps your resource usage and costs.
Profile	The compute resource profile that defines CPU and memory allocation for each replica.
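To make the three parameters concrete, here is a minimal sketch of how they might be grouped and sanity-checked before deploying. `VersionConfig` and its validation rules are assumptions made for illustration, not a VideoSDK type; the platform may enforce different or additional rules. The profile names are the ones listed under Resource Profiles.

```python
from dataclasses import dataclass

# Profile names as listed under Resource Profiles in this guide.
KNOWN_PROFILES = {"cpu-small", "cpu-medium", "cpu-large"}


@dataclass(frozen=True)
class VersionConfig:
    """Hypothetical container for a version's settings; not a VideoSDK type."""
    min_replicas: int
    max_replicas: int
    profile: str

    def __post_init__(self) -> None:
        # Assumed sanity checks; the real platform rules may differ.
        if self.min_replicas < 0:
            raise ValueError("min_replicas must not be negative")
        if self.max_replicas < self.min_replicas:
            raise ValueError("max_replicas must be >= min_replicas")
        if self.profile not in KNOWN_PROFILES:
            raise ValueError(f"unknown profile: {self.profile!r}")


config = VersionConfig(min_replicas=2, max_replicas=10, profile="cpu-medium")
print(config)
```

Validating locally like this catches an inverted replica range or a mistyped profile name before anything is sent to the cloud.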

Resource Profiles

Agent Cloud offers predefined resource profiles to match your agent's computational requirements:

Profile	Description	Best For
cpu-small	Lightweight compute resources with minimal CPU and memory allocation	Simple agents, low-traffic applications
cpu-medium	Balanced compute resources suitable for most production workloads	Standard agents, moderate traffic
cpu-large	High-performance compute resources with increased CPU and memory	Complex agents, high-traffic, compute-intensive tasks

Deployment Regions

Agent Cloud is available in multiple regions to ensure low latency and compliance with data residency requirements:

Region	Location	Description
in002	India	Optimized for users in the Indian subcontinent
us002	United States	Optimized for users in North America (default)

Note: If no region is specified during deployment, us002 (United States) is used as the default region.

Choose the region closest to your users for the best performance. You can specify the region when deploying your agent using the --region flag:

videosdk agent deploy --image myrepo/myagent:v1 --region in002

Note: In examples like myrepo/myagent:v1, myrepo is a placeholder for your Docker registry username (e.g., your Docker Hub username).
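If you script deployments, the command above can be assembled programmatically. The following is a minimal sketch: `build_deploy_command` is a hypothetical helper, and it uses only the `--image` and `--region` flags shown in this guide. It builds the argument list and does not run the CLI.

```python
import shlex
from typing import List, Optional

# Region codes as listed under Deployment Regions in this guide.
REGIONS = {"in002", "us002"}


def build_deploy_command(image: str, region: Optional[str] = None) -> List[str]:
    """Return the argument list for `videosdk agent deploy` (hypothetical helper)."""
    command = ["videosdk", "agent", "deploy", "--image", image]
    # When region is None, --region is omitted and the default (us002) applies.
    if region is not None:
        if region not in REGIONS:
            raise ValueError(f"unknown region: {region!r}")
        command += ["--region", region]
    return command


print(shlex.join(build_deploy_command("myrepo/myagent:v1", region="in002")))
```

The returned list can be passed to `subprocess.run` on a machine where the VideoSDK CLI is installed and authenticated. Building a list rather than a single string avoids shell-quoting problems with image names.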

Replica Scaling

Replicas are individual instances of your agent running within a version. Agent Cloud automatically manages replicas based on your configuration:

Minimum Replicas (minReplica): Guarantees this many instances are always running, ensuring your agent is ready to handle requests without cold start delays.
Maximum Replicas (maxReplica): Sets the upper limit for scaling. When traffic increases, Agent Cloud automatically spins up additional replicas up to this limit.

Example Configuration:

Min Replicas: 2
Max Replicas: 10
Profile: cpu-medium

In this example, your agent will always have at least 2 instances running but can scale up to 10 instances during peak demand, each using medium-tier compute resources.
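The effect of the two limits can be illustrated with a small clamping function. This is only a sketch of the bounds: this guide does not describe how Agent Cloud derives a target replica count from demand, so `desired` is a hypothetical input and `clamp_replicas` is not a platform API.

```python
def clamp_replicas(desired: int, min_replicas: int, max_replicas: int) -> int:
    """Bound a desired replica count to the range [min_replicas, max_replicas]."""
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    return max(min_replicas, min(desired, max_replicas))


# Using the example configuration: Min Replicas 2, Max Replicas 10.
for desired in (0, 5, 25):
    print(desired, "->", clamp_replicas(desired, 2, 10))
```

With the example configuration, a desired count of 0 is raised to 2, a count of 5 is left unchanged, and a count of 25 is capped at 10.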

Summary

Term	Definition
Agent Cloud	VideoSDK's managed cloud platform for deploying AI voice agents
Deployment	A managed instance of your agent on Agent Cloud, capable of running multiple versions
Version	A specific release of your agent within a deployment, with its own scaling and resource configuration
Replica	An individual running instance of your agent within a version
Min Replicas	Minimum number of agent instances always running
Max Replicas	Maximum number of agent instances during peak scaling
Profile	Compute resource tier (cpu-small, cpu-medium, cpu-large) for each replica
Region	Geographic location for deployment (in002 for India, us002 for US)

Understanding these concepts is essential for effectively deploying and managing your AI agents on Agent Cloud. In the following guides, we'll explore how to create deployments, manage versions, and configure scaling for your specific use case.
