Search/
Skip to content
/
OpenRouter
© 2026 OpenRouter, Inc

Product

  • Chat
  • Rankings
  • Apps
  • Models
  • Providers
  • Pricing
  • Enterprise
  • Labs

Company

  • About
  • Announcements
  • CareersHiring
  • Privacy
  • Terms of Service
  • Support
  • State of AI
  • Works With OR
  • Data

Developer

  • Documentation
  • API Reference
  • SDK
  • Status

Connect

  • Discord
  • GitHub
  • LinkedIn
  • X
  • YouTube
Favicon for Nvidia

NVIDIA

Browse models provided by NVIDIA (Terms of Service)

6 models

Tokens processed on OpenRouter

  • NVIDIA: Nemotron 3 SuperNemotron 3 SuperFree variant

    NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it delivers over 50% higher token generation compared to leading open models. The model features a 1M token context window for long-term agent coherence, cross-document reasoning, and multi-step task planning. Latent MoE enables calling 4 experts for the inference cost of only one, improving intelligence and generalization. Multi-environment RL training across 10+ environments delivers leading accuracy on benchmarks including AIME 2025, TerminalBench, and SWE-Bench Verified. Fully open with weights, datasets, and recipes under the NVIDIA Open License, Nemotron 3 Super allows easy customization and secure deployment anywhere — from workstation to cloud.

by nvidiaMar 11, 20261M context$0/M input tokens$0/M output tokens
  • NVIDIA: Llama Nemotron Embed VL 1B V2Llama Nemotron Embed VL 1B V2Free variant

    The Llama Nemotron Embed VL 1B V2 embedding model is optimized for multimodal question-answering retrieval. The model can embed 'documents' in the form of image, text, or image and text combined. Documents can be retrieved given a user query in text form. The model supports images containing text, tables, charts, and infographics.

    by nvidiaFeb 25, 2026131K context$0/M input tokens$0/M output tokens
  • NVIDIA: Nemotron 3 Nano 30B A3BNemotron 3 Nano 30B A3BFree variant

    NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems. The model is fully open with open-weights, datasets and recipes so developers can easily customize, optimize, and deploy the model on their infrastructure for maximum privacy and security.

    by nvidiaDec 14, 2025256K context$0/M input tokens$0/M output tokens
  • NVIDIA: Nemotron Nano 12B 2 VLNemotron Nano 12B 2 VLFree variant

    NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s memory-efficient sequence modeling for significantly higher throughput and lower latency. The model supports inputs of text and multi-image documents, producing natural-language outputs. It is trained on high-quality NVIDIA-curated synthetic datasets optimized for optical-character recognition, chart reasoning, and multimodal comprehension. Nemotron Nano 2 VL achieves leading results on OCRBench v2 and scores ≈ 74 average across MMMU, MathVista, AI2D, OCRBench, OCR-Reasoning, ChartQA, DocVQA, and Video-MME—surpassing prior open VL baselines. With Efficient Video Sampling (EVS), it handles long-form videos while reducing inference cost. Open-weights, training data, and fine-tuning recipes are released under a permissive NVIDIA open license, with deployment supported across NeMo, NIM, and major inference runtimes.

    by nvidiaOct 28, 2025128K context$0/M input tokens$0/M output tokens
  • NVIDIA: Nemotron Nano 9B V2Nemotron Nano 9B V2Free variant

    NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so.

    by nvidiaSep 5, 202532K context$0/M input tokens$0/M output tokens
  • Meta: Llama Guard 4 12BLlama Guard 4 12BFree variant

    Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM—generating text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated. Llama Guard 4 was aligned to safeguard against the standardized MLCommons hazards taxonomy and designed to support multimodal Llama 4 capabilities. Specifically, it combines features from previous Llama Guard models, providing content moderation for English and multiple supported languages, along with enhanced capabilities to handle mixed text-and-image prompts, including multiple images. Additionally, Llama Guard 4 is integrated into the Llama Moderations API, extending robust safety classification to text and images.

    by meta-llamaApr 30, 2025164K context$0/M input tokens$0/M output tokens