Autoscaling for Model Deployments in Data Science Is Now Available

Services: Data Science
Release Date: March 13, 2024

Key benefits of autoscaling for model deployments include:

Dynamic Resource Adjustment: Autoscaling automatically increases or decreases the number of compute instances based on real-time demand (for example, scaling out and back in between 1 and 10 instances), so the deployed model can handle varying loads efficiently.
Cost Efficiency: By adjusting resources dynamically, autoscaling ensures you only use (and pay for) the resources you need. This can result in cost savings compared to static deployments.
Enhanced Availability: Paired with a load balancer, autoscaling allows traffic to be rerouted to healthy instances if one instance fails, which helps keep the service uninterrupted.
Customizable Triggers: Users can customize the query that triggers autoscaling by using Monitoring Query Language (MQL) expressions.
Load Balancer Compatibility: Autoscaling works hand-in-hand with load balancers, whose bandwidth can be scaled automatically to support more traffic, improving performance and reducing bottlenecks.
Cool-down Periods: After scaling actions, there can be a defined cool-down period during which the autoscaler doesn't take further actions. This prevents excessive scaling actions in a short time frame.
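
To illustrate how an instance range and a cool-down period interact, here is a minimal Python sketch of a threshold-based scaling decision. It is not the service's implementation or API: the Autoscaler class, its field names, the thresholds (70 and 30), and the 600-second cool-down are all hypothetical values chosen for illustration. Only the 1-to-10 range comes from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Autoscaler:
    """Toy threshold autoscaler. All names and defaults are hypothetical."""

    min_instances: int = 1           # lower bound of the scaling range
    max_instances: int = 10          # upper bound of the scaling range
    scale_out_above: float = 70.0    # metric value above which one instance is added
    scale_in_below: float = 30.0     # metric value below which one instance is removed
    cooldown_seconds: float = 600.0  # quiet period after any scaling action
    last_scaled_at: Optional[float] = None

    def decide(self, current: int, metric: float, now: float) -> int:
        """Return the instance count to run after seeing `metric` at time `now`."""
        in_cooldown = (
            self.last_scaled_at is not None
            and now - self.last_scaled_at < self.cooldown_seconds
        )
        if in_cooldown:
            # No further scaling actions until the cool-down has elapsed.
            return current

        if metric > self.scale_out_above:
            target = current + 1
        elif metric < self.scale_in_below:
            target = current - 1
        else:
            target = current

        # Keep the result inside the configured instance range.
        target = max(self.min_instances, min(self.max_instances, target))

        # Only an actual change in instance count starts a new cool-down.
        if target != current:
            self.last_scaled_at = now
        return target


if __name__ == "__main__":
    scaler = Autoscaler()
    count = 1
    # (seconds since start, observed metric value such as CPU utilization in percent)
    for now, metric in [(0, 90.0), (100, 90.0), (600, 90.0), (1300, 10.0)]:
        count = scaler.decide(count, metric, now)
        print(f"t={now}s metric={metric} -> {count} instance(s)")
```

In this sketch, the second observation (at 100 seconds) is ignored because it falls inside the cool-down that started with the first scaling action, which is the behavior that prevents rapid back-to-back scaling.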

For more information, see the documentation.