Now Available: Optimized DeepSeek-R1 On GMI Cloud

GMI Cloud is excited to announce that we are now hosting DeepSeek-R1 and its distilled models on a dedicated inference endpoint, running on optimized, US-based hardware.

What’s DeepSeek-R1? Read our initial takeaways here.

Technical details:

  • Model Provider: DeepSeek
  • Type: Chat
  • Parameters: 685B
  • Deployment: Serverless (MaaS) or Dedicated Endpoint
  • Quantization: FP16
  • Context Length: 128K tokens (the maximum the model can process within a single session)
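
Once you have access, calling the model should look like any OpenAI-compatible chat API. Here is a minimal sketch of a request; the base URL, API key, and `deepseek-r1` model identifier are placeholder assumptions, so check your GMI Cloud console for the exact values.

```python
# A minimal sketch, assuming an OpenAI-compatible chat completions endpoint.
# The base URL, credential, and model identifier below are placeholders,
# not GMI Cloud's documented values.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-gmi-endpoint>/v1",  # placeholder endpoint URL
    api_key="<your-api-key>",                   # placeholder credential
)

response = client.chat.completions.create(
    model="deepseek-r1",  # placeholder model identifier
    messages=[
        {"role": "user", "content": "Explain FP16 quantization in one sentence."}
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```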

Additionally, we are offering the following distilled models:

  • DeepSeek-R1-Distill-Llama-70B
  • DeepSeek-R1-Distill-Qwen-32B
  • DeepSeek-R1-Distill-Qwen-14B
  • DeepSeek-R1-Distill-Llama-8B
  • DeepSeek-R1-Distill-Qwen-7B
  • DeepSeek-R1-Distill-Qwen-1.5B
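
The same call pattern should work for the distilled models by swapping in the corresponding model identifier. Below is a hedged streaming sketch; the `DeepSeek-R1-Distill-Qwen-7B` identifier string and endpoint details are assumptions, not confirmed values.

```python
# A minimal streaming sketch for a distilled model, again assuming an
# OpenAI-compatible endpoint; identifiers and URLs are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-gmi-endpoint>/v1",  # placeholder endpoint URL
    api_key="<your-api-key>",                   # placeholder credential
)

stream = client.chat.completions.create(
    model="DeepSeek-R1-Distill-Qwen-7B",  # placeholder; check the console for exact names
    messages=[
        {"role": "user", "content": "Summarize chain-of-thought distillation."}
    ],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # deltas may be None for some chunks
        print(delta, end="", flush=True)
```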

Try our token-free service with unlimited usage!

Reach out for access to our dedicated endpoint here.
