Zen and the Art of Machine Learning Research

jjxmorris12 3 days ago 12 comments Read Article on blog.jxmo.io

Discussion (12 Comments)

jdw64•about 2 hours ago

I feel that the Zen used in the West and the Zen in East Asia are quite different. I think the Western Zen is probably the one from the 1970s book Zen and the Art of Motorcycle Maintenance. It usually carries a sense of equanimity and beginner's mind. But in East Asia, Zen actually emphasizes aimlessness or non‑purposefulness.

The point where I really feel the difference is that Western Zen seems to be about how to train the self to become stronger, whereas actual Seon (Zen) in East Asia is about going with nature, letting go of the self, and allowing things to flow. In the actual practice of Seon, it's about doubting the self, letting go of attachments, and realizing that achievement, comparison, and the desire for control are all just fleeting. There's a famous phrase: 'Banghachak (放下著)', meaning let it all go.

If anything, I think ancient Roman Stoicism feels more like Zen than Western Zen does.

So that's fascinating. When I saw this article, I was expecting it to be about whether we should give up the desire for success, but instead it took a completely different direction, which was surprising.

peepee1982•about 1 hour ago

Similarly, the Western idea of Stoicism seems to focus mostly on controlling or even suppressing your emotions (at least on surface level), while the Stoicism you rightly call "Roman" (thanks for that, btw) is much more holistic and more of an ethical framework.

jdw64•about 1 hour ago

Thank you for letting me know correctly.

rented_mule•about 1 hour ago

Around 2015, I found myself managing back end and machine learning engineers (not researchers) at the same time. Many of the back end engineers wanted to do more ML. Some of them did well when given a chance, but others wanted to revert to back end within a few months. At the same time, one of the ML leaders wanted to step away from ML and only do back end work to support ML.

As I studied these dynamics, something occurred to me... Different people need to see signs of success at different frequencies. Because of the nature of our product, measuring the performance of a new/updated model required the model to be live for at least a full calendar month. So, between initial work and final analysis, it was often a 2 month wait or more. For many back end tasks, you can build a quick prototype, run it to see if it works, and be on your way - the signals come all day long. The varying frequency needs of different people went a long way to determining which of them liked working on ML.

This is sort of a manager's version of feature engineering. ;-) The people on that team taught me a lot!

sdfsefsdf•about 2 hours ago

Perhaps I've been deep in my own issues for too long, but it seems to me that the author is trying to say "don't trust the current evaluation suites too much"; scores only reflect a small part of the problem. What's interesting is discovering a new, stable evaluation metric, doing something new based on it, and having that new thing yield some unexpected intelligent results

stared•about 1 hour ago

It revolves around the sentiment of "go deeper" - but I think it is a double-edged sword. Sure, entropy, tensors and gradients are important - and yes, they are pretty much requirements.

But from what I see, it is the opposite - a lot of (if not virtually all) progress in the last decade of deep learning came not from a fundamental idea, but from incremental, experimentally-verified practice. Even though I think there is good intuition for why ReLU is better than sigmoid (tl;dr: last layer is log(sigmoid) ~ ReLU, putting anything different inside kills the gradient), the original paper by Hinton himself was more or less "because it trains 3x faster".
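
One way to make the ReLU-versus-sigmoid intuition above concrete is the minimal Python sketch below. It reads "log(sigmoid) ~ ReLU" as the identity -log(sigmoid(-x)) = softplus(x), which approaches ReLU for large x; that reading is an assumption, and all function names are illustrative. It also uses a toy model of depth (a plain product of local derivatives, ignoring weights) to show that sigmoid derivatives, which never exceed 0.25, shrink quickly when chained, while ReLU passes a derivative of 1 on active units.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function; fine for the moderate inputs used here."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid: s * (1 - s), at most 0.25 (reached at x = 0)."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu(x: float) -> float:
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """Derivative of ReLU: 1 on the active side, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def softplus(x: float) -> float:
    """log(1 + exp(x)), which equals -log(sigmoid(-x))."""
    return math.log1p(math.exp(x))


def chained_grad(grad_fn, x: float, depth: int) -> float:
    """Toy model: product of `depth` identical local derivatives (weights ignored)."""
    result = 1.0
    for _ in range(depth):
        result *= grad_fn(x)
    return result


if __name__ == "__main__":
    # softplus and -log(sigmoid(-x)) agree, and both approach relu(x) as x grows
    for x in (1.0, 5.0, 10.0):
        print(x, softplus(x), -math.log(sigmoid(-x)), relu(x))
    # ten stacked sigmoids at their best-case input vs. ten active ReLUs
    print(chained_grad(sigmoid_grad, 0.0, 10))
    print(chained_grad(relu_grad, 1.0, 10))
```

Even at the sigmoid's most favorable input, the chained derivative in this toy setting is 0.25 raised to the depth, which is the "kills the gradient" effect in miniature.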

Re-thinking fundamentals might help, but "let's change the fundamentals" is rarely how progress happens. Even the most seminal papers, e.g. AlexNet and "Attention Is All You Need", are refinements of existing ideas that show how those ideas help.

Machine learning is an experimental science. Many mathematically cool ideas do not work. Many engineering ones do.

> I've tweeted before that one of the most important traits in a researcher is healthy paranoia. Be paranoid!

I have seen so many PhDs burned out to cinders; I don't think it is any better a piece of advice than "depression is good for philosophers". Sure, be a relentless explorer.

> In short, holding on to ideas for too long can actually be counterproductive. Stay open-minded and refuse to let ego cloud your judgement.

Which I think is true.

lostdog•about 3 hours ago

I have some coworkers who are similar in everything--education, work ethic, and intelligence--but some of them tick out ML ideas that work like clockwork, while others get hits rarely if ever. I cannot tell what makes it work for some and not others. Both groups' ideas sound equally good.

Sometimes a coworker will be an ML star for a year or two, but then suddenly run out of steam. It's brutal to watch.

I used to think most smart people had similar distributions of good ideas, and it was just that the hardest working tried out all 50 of their ideas to pick out the 2 good ones. But I've seen smart and hardworking people have a hit rate of 0.
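
A quick arithmetic check on the "50 ideas, 2 good ones" picture, as a sketch only: if every idea succeeded independently with the same probability, here assumed to be 2/50 = 0.04, a run of 50 ideas with zero hits would still occur with probability (1 - 0.04)^50. The independence assumption and the function name are illustrative and are not something the comment states.

```python
def prob_zero_hits(hit_rate: float, n_ideas: int) -> float:
    """Probability of no successes in n_ideas independent tries."""
    return (1.0 - hit_rate) ** n_ideas


if __name__ == "__main__":
    # hit rate assumed from the "2 good ideas out of 50" figure in the comment
    p = prob_zero_hits(2 / 50, 50)
    print(f"P(no hits in 50 ideas) = {p:.3f}")
```

Under these assumptions the result is about 13%, so a single stretch with no hits does not by itself show that someone's ideas are drawn from a worse distribution.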

fyredge•about 2 hours ago

That's the nature of research. You try every idea that may be a good avenue and only a handful work out, if at all. That's why quantifying research credibility via publication and citation counts inherently leads to toxic work cultures. The best ideas must be given time to be discovered, not forced out and contorted to fit the requirements of a journal.

bobmarleybiceps•about 2 hours ago

This is part of why I think most researchers get less productive over time... Someone gets some big result during grad school or early career, gets some big job from it, and then struggles to get new results of similar quality :shrug:

With ML in particular, there's also the sheer volume of people basically all looking at (essentially) the same problems... so it's kind of like monkeys with typewriters spamming ideas until some work.

jack_pp•about 2 hours ago

In spirituality it is believed that ideas and inspirations aren't our own. That our mind is like an LLM that gets prompted by higher beings. In research everyone has high param count minds, trained for many years by studying. But just like LLMs by themselves are useless at creating new original work, no matter the compute you have available, so the mind cannot create anything new without "inspiration".

59nadir•about 1 hour ago

Wow, this makes ML sound even more like voodoo than I thought. Can you give examples of what the nature of these ideas is?

nathaah3•about 2 hours ago

This is gold!!!!