cleardusk, 1 day ago, 14 comments

The model has 3B active parameters. We put the code, homepage, paper and model links here:

- Code: https://github.com/bytedance/Lance

- Homepage: https://lance-project.github.io/

- Paper: https://arxiv.org/abs/2605.18678

- Model: https://huggingface.co/bytedance-research/Lance

p.s. Lance is a research project, not a polished product. The model was trained using fewer than 128 GPUs.

Discussion (14 Comments)

embedding-shape, about 19 hours ago
Video understanding is still fairly new, especially done well. If it also works well with UI and UX, that'd be great. Current agents already struggle a bit with 2D space in normal screenshots of unconventional UIs. I wonder if this model would do better with actual recordings of navigating and using applications; it feels like it could help a bunch with understanding UX, at least. Will be fun to play around with :)
wxw, about 16 hours ago
What’s SOTA for video understanding? AFAIK most video search is powered by transcription and not the actual video. This seems impressive.
bguberfain, about 20 hours ago
Any plans to port to sglang or vLLM?
nkvdev, about 21 hours ago
Great quality, forked and going to try
Tsarp, about 23 hours ago
Nice work. Wish they had picked another name given how popular lance/lancedb is.
popalchemist, about 22 hours ago
Seems like the video output is crippled. Resolution is low (720 or so), as is the frame rate. The samples are shown up-scaled and frame-interpolated.

Why do that? Seems strange to be building sub-HD resolution video models in 2026.

jadbox, about 21 hours ago
Sure, but again, it's a micro 3B model. Perhaps it can't be used for general video work, but it might be able to do basic edits like remove an object from a table in a shot.
MattRix, about 20 hours ago
It’s not a micro model at all; it requires 40 GB of VRAM. The 3B is just the active parameters.
asadm, about 23 hours ago
last dance for lance vance!
cleardusk, about 22 hours ago
:D