Calculating Cost in Generative AI

In Generative AI, you pay for either on-demand inferencing or dedicated AI clusters:

On-Demand Inferencing
  • You pay as you go.
  • You are charged based on the number of characters in each inference call.
  • On the pricing page, a price quoted per number of transactions refers to the number of characters in inference calls. One transaction equals one character.
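
As a rough illustration of the character-based billing described above, the following Python sketch multiplies the total number of characters (transactions) by a unit price. The function name, the sample call lengths, and the price are hypothetical placeholders, not published rates; check the pricing page for the actual price and for the transaction quantity it is quoted against.

```python
def on_demand_cost(call_char_lengths, price_per_transaction):
    """Estimate on-demand cost, treating one character as one transaction.

    call_char_lengths: billable character count of each inference call.
    price_per_transaction: price of a single transaction (hypothetical value).
    """
    total_transactions = sum(call_char_lengths)
    return total_transactions * price_per_transaction


# Hypothetical example: three inference calls and a placeholder price.
calls = [1200, 800, 2000]
example_cost = on_demand_cost(calls, price_per_transaction=0.00001)
print(f"Transactions: {sum(calls)}, estimated cost: {example_cost:.4f}")
```

If the pricing page quotes a price for a block of transactions rather than a single one, divide that price by the block size before passing it in.
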
Dedicated AI Clusters
  • You get a dedicated set of GPUs.
  • You can fine-tune custom models on the dedicated AI clusters.
  • You can host replicas of foundational and fine-tuned models on the dedicated AI clusters.
  • You commit in advance to a certain number of hours of dedicated AI cluster use.
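
To compare the two options, one simple approach is to estimate the cost of the committed hours and then find how many characters of on-demand inferencing would cost the same. The Python sketch below assumes a single flat hourly rate for the cluster; the function names, the hourly rate, the hour count, and the per-transaction price are all hypothetical placeholders rather than actual pricing.

```python
def dedicated_cluster_cost(committed_hours, hourly_rate):
    """Estimate the cost of a dedicated AI cluster commitment.

    Assumes one flat hourly rate for the whole cluster (an assumption
    made for this sketch, not a statement of the actual pricing model).
    """
    return committed_hours * hourly_rate


def break_even_characters(cluster_cost, price_per_transaction):
    """Number of on-demand characters (transactions) that would cost the
    same as the cluster commitment. One transaction is one character."""
    return cluster_cost / price_per_transaction


# Hypothetical example: a 744-hour commitment at a placeholder hourly rate.
cluster_cost = dedicated_cluster_cost(committed_hours=744, hourly_rate=10.0)
characters = break_even_characters(cluster_cost, price_per_transaction=0.00001)
print(f"Cluster cost: {cluster_cost:.2f}")
print(f"Break-even volume: about {characters:,.0f} characters")
```

Under these assumptions, expected character volumes well below the break-even figure favor on-demand inferencing, while volumes above it favor a dedicated AI cluster. Fine-tuning requires a dedicated AI cluster regardless of volume.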

Review the following topics and examples to help you decide between on-demand inferencing and dedicated AI clusters, and to calculate the cost of each option.