Discussion (65 Comments)
That’s the real limitation on an economy flight - space rather than power or the internet… at least it would be for me.
The only times I was able to get my laptop out and do some productive work were when I was sitting in a premium economy aisle seat with room to spare, or when there was an empty seat next to me.
What really makes me nervous if I'm in an economy seat is the seat in front of me. Depending on how the seat is designed, if the person suddenly reclines (or hell, just flexes the seat a bunch while moving around), it can come pretty close to pinching the laptop screen. That would be bad news.
The ergonomics of using a laptop on an economy-class tray table are not worth it. You're sitting there like a T-rex trying to make your arms as small as possible to tap on the keys. And the vertical viewing angle to your screen sometimes prevents you from even seeing anything. I wouldn't even bring my laptop out during a flight.
The trick I've found is to pack a Bluetooth keyboard. If you put your laptop on the tray table, you can put the Bluetooth keyboard on your legs _under_ the tray table and have your arms fully and comfortably extended. This works especially well if you're a vim/emacs/other keyboard-driven editor user, as you very rarely need to reach up to poke the trackpad.
The tradeoff of poor comfort is insane productivity, for me anyway. Being restricted in place, no wifi, inconvenient toilet breaks, not being in control of meal times, all mean I get a lot of work done.
A few years ago I saw some very interesting custom ergonomic setups optimized for traveling + flying.
One person with a thinkpad is able to get the monitor to be 180 degrees flat w/ the keyboard, and can hang it off the seat. He also brings a split ergo keyboard with a lap mount.
Another person did something similar with an M1 laptop, but needs an iPad to act as the external monitor (the laptop stays in the bag), with a split ergo keyboard designed and built from scratch.
They are "OK enough" that whether they're acceptable for you to use is, at this point, a matter of taste.
For coding they work fine for me, terminal tools work particularly well as I can bump the font size up. IDEs and web browsing aren't bad either, it's about the equivalent of a single 1080p screen. They are nicer than hunching over a laptop for travel use but I still prefer a proper monitor when available.
The optics are a generation or two from being where they need to be to market these as productivity devices, but if you like being an early adopter with all the quirks that come with it, they're fun.
It's also uncomfortable to look at the very bottom of the screen (which is where all the chat text boxes are), so I usually resize all my windows to be a bit smaller. With that, it's very good (and you can always just increase the font size).
I would like glasses with a smaller FOV so I didn't have to look around so much, but that's probably just me, since everyone else likes them larger.
The return flight will test this with the correct cable. I expect at least a 16% improvement against the 70W cap.
Some plane sockets cut out completely if you attempt to draw more than the limit, rather than continuing to provide power at the limit.
[1] https://betweentheprompts.com/40000-feet/
I really don't know what the hell people are doing locally, and suspect a lot of the hype around running these models locally is bullshit. Sure, you can make it do something but certainly nothing useful or substantial.
I ran 8 tests on a variety of open-weights models and opus 4.7 (1mil ctx version), and the little dense model was right behind it: https://github.com/sleepyeldrazi/llm_programming_tests/tree/... Of note, opus was the only model to push back against the spec on the hardest challenge, saying 'that's not possible', even though there are links in the spec to examples of it being done.
There may be problems with the mlx versions, as I haven't had any looping in all the testing I've done, which is all my agentic and coding work over the last couple of days (since it dropped). I have had tool_call misses 4 or 5 times so far, which isn't ideal, but no looping. First I used it in pi-mono, and later, when I realized it's a serious model, I switched to opencode.
My setup is llama.cpp running on a 3090 in WSL, unsloth IQ4_NL with those flags:
  --ctx-size 128000 \
  --jinja \
  --temp 0.6 \
  --top-p 0.95 \
  --top-k 20 \
  --min-p 0.0 \
  --repeat-penalty 1.0 \
  --presence-penalty 0.0 \
  --threads 12 \
  --gpu-layers 99 \
  --no-warmup \
  --no-mmap \
  -fa on
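If you want to poke at a setup like this yourself, here is a minimal sketch of querying it; it assumes the flags above are passed to llama-server (rather than llama-cli) and that it is listening on its default port of 8080, with the launch-time sampling flags acting as per-request defaults:

  # ask the local server a question via its OpenAI-compatible endpoint
  curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Write a small Python script that tails a log file."}]}'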
I admit I sometimes get caught up in the tooling for its own sake, but I find local models useful for specific tasks like migrating configuration schemas, writing homelab scripts, or exploring financial data.
It might sound a bit paranoid, but privacy is another major driver for me. Keeping credentials and private information off cloud services is worth the extra friction.
That said, I don't see why people are so scared to touch code even if it saves them 500 euro a month. Using my IDE's find across my repo and auto-replacing 2 patterns is trivial and way faster to do by hand. I mostly use small models; it prevents a lot of the issues I've seen with large models and vibe/agentic coding over the medium to long term. I also write a lot of code.
There is certainly a lot of hype around local models. Some of it is overhype, some of it is just "people finding out" and discovering what cool stuff you can do. I suspect the post is a reply to the other one a few days ago where someone from HF posted a pic of themselves on a plane, using a local model, and saying it's really, really close to opus. That was BS.
That being said, I've been working with local LMs since before chatgpt launched. The progress we've made from the likes of gpt-j (6B) and gpt-neoX (22B) (some of the first models you could run on regular consumer hardware) is absolutely amazing. It has gone way above my expectations. We're past "we have chatgpt at home" (as it was when launched), and now it is actually usable in a lot of tasks. Nowhere near SotA, but "good enough".
I will push back a bit on the "substantial" part, and I will push a lot on "nothing useful". You can absolutely get useful stuff out of these models. Not in a claude-code "leave it to cook for 6 hours and get a working product" way, but with a bit of hand-holding and scope reduction you can get useful stuff. When devstral came out (24B) I ran it for about a week as a "daily driver" just to see where it's at. It was ok-ish. Lots of hand-holding, and I figured out I couldn't use it for planning much (its plans looked fine at a glance, but either didn't make sense or used outdated stuff). But with a better plan, it could handle implementation fine. I coded 2 small services that have been running in prod for ~6mo without any issues. That is useful, imo. And the current models are waaay better than devstral1.
As to substantial, eh... Your substantial can be someone else's taj mahal, and their substantial could be your toy project. It all depends. I draw the line at useful. If you can string together a couple of useful tasks, it starts to become substantial.
The biggest lesson I've learned working with local models so far is: with the smaller models, you have to understand their limitations, be willing to run experiments, and fine-tune the heck out of everything. There are endless choices to be made: which model to use, which quant, thinking or not, sampling parameters, llama.cpp vs vLLM, etc. They're much more fiddly for serious work than just downloading Claude Code and having it one-shot your application. But some of us enjoy fiddling, so it all works out in the end.
Agree it's more fiddly than Claude code. But, it's also free, and in many cases way faster. For me I don't have a need for Claude code. I believe hands on keys is one of the most important parts of the SDLC especially for serious work. So small models fit the bill for me.
It works great for me. But I like to review the code and understand what it's doing, which doesn't appear to be how people do "useful or substantial" programming these days.
From the Qwen3.6 page:
Thinking mode for general tasks: temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0
Thinking mode for precise coding tasks (e.g. WebDev): temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0
Instruct (or non-thinking) mode: temperature=0.7, top_p=0.80, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
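For reference, here is what the "precise coding" preset above looks like as llama.cpp flags; this is a sketch with the model path as a placeholder, and it assumes a recent llama-server build (the Qwen card's repetition_penalty maps to llama.cpp's --repeat-penalty):

  # Qwen "thinking mode, precise coding" sampling preset as server-side defaults
  llama-server -m ./your-model.gguf \
    --temp 0.6 \
    --top-p 0.95 \
    --top-k 20 \
    --min-p 0.0 \
    --presence-penalty 0.0 \
    --repeat-penalty 1.0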
Gist with the compose and example of an output. https://gist.github.com/meaty-popsicle/f883f4a118ff345b430c3...
set min_p to like 0.3 and ignore top_p and top_k and you'll be fine.
There are better samplers now, like top-N sigma, top-h, P-less decoding, etc., but they're often not available in your LLM inference engine (e.g. vLLM).
For people who aren't completely vibe or agent coding, these models are better than, say, copilot or the free models appearing after a Google search. Probably better than ChatGPT's flagships in some ways.
I mostly use 4b to 9b models for basic inquiries and code examples from libraries I haven't used before. Many of them can solve pretty hard math problems, and these are several steps away from, say, qwen3.6.
I would not discount running models locally. It's the best case scenario of a future with LLMs from a human rights and ecological perspective.
(I’ve recently discovered that you can pipe local models into Claude Code and Claude Desktop, so this is on my list to try.)
/R/localllama is okay for some information but beyond that there is so much noise and very little signal. I think it's intentional.
* New models running in llama.cpp (what's under the hood of ollama et al) frequently require bug fixes.
* The GGUF models that run in llama.cpp frequently require bug fixes (Unsloth is notorious for this -- they release GGUF models about 10 minutes after official .safetensors releases).
* You're probably running a <Q8 quantization of the model, and a good chance <BF16 quantization for KV cache. This makes for compounding issues as context grows and tool calls multiply.
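On that last point, here is a minimal sketch of explicitly keeping the KV cache at full f16 precision in llama.cpp; the model path is a placeholder, and the --cache-type-k / --cache-type-v flag names are an assumption about recent llama-server builds, so check your version's --help:

  # pin the KV cache to f16 rather than a quantized type (q8_0/q4_0 save VRAM but compound errors at long context)
  llama-server -m ./your-model-Q8_0.gguf \
    --ctx-size 65536 \
    --cache-type-k f16 \
    --cache-type-v f16 \
    --gpu-layers 99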
Local models really are great but I think a major problem are the people in groups like r/localllama who run models at absurd quantization levels in order to cram them on their underpowered hardware and convince themselves that they're running SOTA at home.
The best way to run these models is, frankly, a lot of VRAM and vLLM (which is what the people developing these models are almost certainly targeting).
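For what it's worth, a minimal sketch of what that looks like; the model id and parallelism below are placeholders rather than a recommendation, and this assumes a recent vLLM release with the vllm serve entrypoint:

  # serve a model across two GPUs with vLLM's OpenAI-compatible server (default port 8000)
  vllm serve Qwen/Qwen3-32B --tensor-parallel-size 2 --max-model-len 32768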
Did the author mean Qwen3.6-27B? Qwen3.6-35B-A3B?
Curious if you had to heavily throttle the CPU or stick to super small quants (like 4-bit phi3) to actually make it through 10 hours without a power outlet?
Also, agreed with the other commenters: just read a damn book and take a nap.
Not only should we not support someone like Elon Musk, but don't you also find it hypocritical to respond with 'just read a damn book' while suggesting Starlink?
Let's say you have a basic setup like llama.cpp and llama-server on a remote server (even if it's just sitting under your home office desk) running a 35GB Q8-quantized model of qwen 3.6 35B. It's not difficult to make llama-server available to your laptop over just about any form of internet connection and VPN.
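A minimal sketch of one way to do that; the hostname and user are placeholders, and it assumes llama-server is listening on port 8080 on the remote box (a WireGuard or Tailscale address works just as well as SSH):

  # forward the remote llama-server port to the laptop
  ssh -N -L 8080:localhost:8080 you@home-server.example
  # then point any OpenAI-compatible client at http://localhost:8080/v1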
Having the ability to run that same model locally if you really need to, because no internet connection whatsoever is available, is nice, but the times when you simultaneously have no internet and a serious need for something the model can output are fairly rare these days.
Most certainly avoidable, unfortunately.
On the other hand, $6200 every few years is pretty tiny compared to a typical US developer salary, so is this really that crazy if it's your primary work machine?