Sarvam-1-VL-4B-Instruct - VLLM (Merged)
Model Description
This is the recommended version for inference: a fully merged 16-bit model that combines Qwen3-VL-4B-Instruct with the trained LoRA weights.
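For reference, a merged checkpoint like this is typically produced by folding a LoRA adapter into the base weights with peft. The sketch below is illustrative, not the exact script used for this release; the adapter path is a placeholder.

import torch
from peft import PeftModel
from transformers import Qwen3VLForConditionalGeneration

# Load the base model in 16-bit precision
base = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3-VL-4B-Instruct",
    torch_dtype=torch.bfloat16,
)

# Attach the trained adapter (placeholder path) and fold it into the base weights
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")
merged = model.merge_and_unload()

# Save the standalone 16-bit checkpoint
merged.save_pretrained("Sarvam-1-VL-4B-Instruct-VLLM")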
Training Details
- Base Model: Qwen/Qwen3-VL-4B-Instruct
- Training Method: LoRA fine-tuning (a configuration sketch follows this list)
- Format: Merged 16-bit weights
- Training Steps: 2,000
- Final Loss: 6.25
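The exact LoRA hyperparameters are not published on this card. For orientation, a typical peft configuration for a run like this might look as follows; the rank, alpha, dropout, and target modules are assumptions, not the values actually used.

from peft import LoraConfig, get_peft_model
from transformers import Qwen3VLForConditionalGeneration

base = Qwen3VLForConditionalGeneration.from_pretrained("Qwen/Qwen3-VL-4B-Instruct")

lora_config = LoraConfig(
    r=16,                 # adapter rank -- assumed, not published
    lora_alpha=32,        # scaling factor -- assumed
    lora_dropout=0.05,    # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # common choice: attention projections
    task_type="CAUSAL_LM",
)

# Wrap the base model so only the adapter weights are trainable
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()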
Datasets
Trained on a weighted mix of 4 datasets (a sampling sketch follows the list):
- Translation (40%): BPCC - 22 Indic languages ↔ English
- Instruction Following (20%): Pralekha - 11 language pairs
- Document Layout (30%): IndicDLP - Document understanding
- Visual QA (10%): DocVQA - Question answering
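To reproduce a similar mixture, the Hugging Face datasets library can interleave sources by probability. In this sketch the repo ids and the shared-schema preprocessing are assumptions; only the 40/20/30/10 weights come from the card.

from datasets import interleave_datasets, load_dataset

# Repo ids are illustrative; each dataset must first be mapped to a common schema
bpcc = load_dataset("ai4bharat/BPCC", split="train")
pralekha = load_dataset("ai4bharat/Pralekha", split="train")
indicdlp = load_dataset("ai4bharat/IndicDLP", split="train")
docvqa = load_dataset("lmms-lab/DocVQA", split="train")

# Sample examples according to the 40/20/30/10 mixture described above
mixed = interleave_datasets(
    [bpcc, pralekha, indicdlp, docvqa],
    probabilities=[0.4, 0.2, 0.3, 0.1],
    seed=42,
)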
Supported Languages
Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Marathi, Manipuri, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, Urdu, English
Usage
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from PIL import Image

# Load model and processor
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "mashriram/Sarvam-1-VL-4B-Instruct-VLLM",
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("mashriram/Sarvam-1-VL-4B-Instruct-VLLM")

# Prepare input
image = Image.open("document.jpg")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Translate this document from English to Hindi."},
        ],
    }
]

# Generate
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(processor.decode(generated, skip_special_tokens=True))
Performance
- Inference Speed: Optimized for vLLM serving (a serving sketch follows this list)
- Memory: ~8-9 GB VRAM (fp16)
- Quality: Balanced trade-off between accuracy and speed
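A minimal offline-serving sketch with vLLM's Python API, assuming your vLLM build supports the Qwen3-VL architecture. The prompt must follow the model's chat template, so it is built with the Hugging Face processor as in the Usage section.

from vllm import LLM, SamplingParams
from transformers import AutoProcessor
from PIL import Image

llm = LLM(model="mashriram/Sarvam-1-VL-4B-Instruct-VLLM")
processor = AutoProcessor.from_pretrained("mashriram/Sarvam-1-VL-4B-Instruct-VLLM")

# Build a chat-template prompt with an image placeholder
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Translate this document from English to Hindi."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Pass the raw image through vLLM's multimodal input dict
image = Image.open("document.jpg")
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=512, temperature=0.2),
)
print(outputs[0].outputs[0].text)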
License
Apache 2.0