Discussion (51 Comments). Original on HackerNews.
If you’re making a package for a small team or aren’t pushing it to a large audience, then just keep it in a GitHub repository. Installing from GitHub with devtools is almost as easy as using install.packages().
At risk of sounding like ChatGPT, it's not an R thing, it's a general thing. Turn [showdead] on in your profile and see how Show HN is flooded with AI slop projects and we all know GitHub is drowning in it.
I code Java and JavaScript by day and mostly Python for my side projects because of the practicality. I've always been the guy who can finish projects that other people couldn't by attending to the essential details that everybody else feels entitled to ignore.
As for your problem you are making the classic mistake of repeatedly posting links to your blog and nothing but links to your blog. If you were finding articles from other sources and posting (say) 10% from your own blog you wouldn't be tripping up the filters.
Sure, I seem to have led a glamorous life of enterprise sales, industrial espionage and always being ready to write and give a talk in 48 hours if a TED speaker gets kidnapped, but the reality is I haven't had time to fix the busted Python packages that my autoposter depends on. I am way too busy transforming into a fox when I go down the elevator and casting glamours on people, and I am always telling this witch that I am the familiar of that witch, but on the 6th floor they have no idea who they are dealing with.
Is anyone arguing “AI is always bad”? I think the argument is clearly “the negatives outweigh the positives”.
Some other researcher, often with limited skills in your native tongue, even more limited skills in software development best practices, wrote some code for a paper between 5 and 50 years ago and your PI has told you to use that code and some OTHER code together at the same time to validate some experiment he wants you to do.
In the past you would take days/weeks/months to get this to work, but with an LLM?
I'm envious of the grad students of today for the amount of nonsense which is bypassable.
AI speeds up learning, so I bet that’s what you’re noticing with R.
As an aside, the best programmers these days are probabilistic programmers (who write stochastic functions). Our languages are Stan and PyMC. Both can be called by Python or R, and AI writes all of them extremely well. So it seems to me that the underlying language matters less than ever.
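For readers unfamiliar with the term, a "stochastic function" is one whose return value is a random draw rather than a fixed result. The sketch below is only an illustration in plain Python using the standard library; it is not Stan or PyMC, and the name `noisy_measurement` and its parameters are hypothetical.

```python
import random
import statistics


def noisy_measurement(true_value, noise_sd, rng):
    """A stochastic function: each call returns a new Gaussian draw
    centred on true_value with standard deviation noise_sd."""
    return rng.gauss(true_value, noise_sd)


# A seeded generator makes the random draws reproducible.
rng = random.Random(42)
draws = [noisy_measurement(10.0, 2.0, rng) for _ in range(10_000)]

# The sample mean and standard deviation should land near 10.0 and 2.0.
print(statistics.mean(draws), statistics.stdev(draws))
```

Probabilistic programming languages go further than this: they let you declare such functions as a model and then infer the unknown parameters from observed data.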
R these days mostly uses the tidyverse, which feels like a variant of DOP (Data-Oriented Programming). It's a kind of data flow, so it's different from typical OOP. I also occasionally work with statisticians (being a freelancer, ETL work is more common than you'd think), and I know what you mean by Stan and PyMC. I know they're powerful tools for Bayesian statistics and multilevel modeling. I know the basic syntax and examples, but I wouldn't say I know them well. My level is mainly focused on the scientists who hire me, and those tools still don't come up often in my country.
That said, I think we differ on the bigger picture because academic code isn't everything. Academic code is typically algorithm‑centric, like LeetCode problems, but most production work revolves around code hygiene and responsibility (algorithms are usually already established ones). Anyway, that's not the main point. What you said is mostly correct, but my focus was on something else: even people who studied at that level can be surprisingly clumsy at expressing themselves through programming. Regardless, thanks for your input, and I agree that AI is good at programming. But using a programming language generally means understanding its tradeoffs, and R is tricky in that regard since it feels like a mix of OOP and DOP variants.
Programming isn’t even a field in the same way as prob&stats. Computer science does in fact have non-deterministic subfields such as information theory.
Information theory doesn’t even incorporate utility.
As a result of the above, it is full of packages that come with associated datasets right in the package itself. Packages with a tiny script and gigabytes of data. Or perhaps just the data without any actual code.
Very weird universe.
The fact that there is a human (and one with expertise in R) reviewing each incoming package makes pure vibe coded slop much, much harder to get approved.
https://cran.r-project.org/web/packages/policies.html
In the same boat... from a PL perspective, yikes (especially the macro mechanism that somehow never seemed to be planned, but somehow exists). As a working statistician? It really does get work done quickly.
To pass inputs with complex unevaluated syntax, I've seen...
– ad-hoc string parsing (lavaan etc.);
– formulas (which somehow the tidyverse doesn't use);
– base R syntax manipulation by round-tripping between as.list and as.call;
– and whatever wheel reinvention with bizarre semantics that the tidyverse uses.
First, usage: Using R for our undergrads in the time of LLMs is brilliant. ChatGPT slops out working code for their needs. Not pretty, but it works better than in 2022.
Second, development: Mastering R is hard because of its kalkül. Tidyverse mediates some of it, but still. This is the perfect breeding ground for slopification. Let's see.
Third, errata: I would love to know the percentage of science built on R to this day. I mean insights and analysis supported by it and its vast packages. What if somewhere, deep down in the stack, there is an ancient bug that dented all of this? I think AI might help us here, or will review slop negate this?
Science is built on libraries with experience, that have been validated extensively against reality. Code often written by people who have retired and died, because that exact same code has been validated and pinned to reality for decades. It is of course possible that a load-bearing bug survives for a long time, conspiring with an incorrect model of reality to give validated results, but wide use tends to eliminate these things.
Real-world data transformations can get gnarly very quickly, and SQL is the perfect common denominator compared to dplyr, which is still niche...
How do you feel about polars?
It’s certainly miles better than Pandas, which has a terrible API in addition to being comically inefficient. In my group, we generally use it for any new work, and have also swapped out pandas for polars in critical spots of our existing code - the latter giving a huge benefit relative to the amount of work it took.
I largely agree with you on SQL being the common denominator, but there are some things that are just awkward in SQL, and much easier to do in Python or other general purpose language.
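One illustration of that awkwardness (my example, not the commenter's) is collapsing consecutive runs of equal values. In SQL this is usually handled as a "gaps and islands" problem with window functions, while in Python it is a short call to `itertools.groupby`. The function name `run_lengths` and the sample data are hypothetical.

```python
from itertools import groupby


def run_lengths(readings):
    """Collapse consecutive equal values into (value, run_length) pairs.

    groupby only groups adjacent items, which is exactly what is needed
    here: a value that reappears later starts a new run.
    """
    return [(value, sum(1 for _ in group)) for value, group in groupby(readings)]


statuses = ["up", "up", "down", "down", "down", "up"]
print(run_lengths(statuses))  # [('up', 2), ('down', 3), ('up', 1)]
```

The order-dependent nature of the task is what makes it clumsy in a set-based language; a sequential loop or iterator expresses it directly.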
Posit is obviously the only organization with the pull to do that, and I feel like they got pulled in 10 directions during the move to AI and trying to also support Python. R Shiny is dead too which sucks because reflex.dev just copied them and ate their lunch in 3 months.
Not to mention the ridiculous styling/formatting of most tidyverse users, which Wickham and others seem to promote. One of the reasons R has lost ground to other languages recently is that most R code these days is ugly.
The fact that young people are producing sub-optimal code (in terms of whatever optimization criteria you are choosing--here, it sounds like terseness) is not strong evidence that a particular software ecosystem (tidyverse) is flawed. Young people producing bad code is not surprising. They're your grad students, mentor them, and maybe they'll adapt to your ways of thinking. Or not.
> One of the reasons R has lost ground to other languages recently is that most R code these days is ugly
Citation needed, surely. This article is about an increase in the number of CRAN submissions, and pseudo-quantitative indices like the TIOBE index show R's slice of the pie is growing. Both are evidence to the contrary.
What an awful thing to imagine. It's already the programming language of choice for egregious abuses of good practice.
People I've worked with who used R and managed data / did analysis didn't really seem too concerned with long-term maintenance.
Secondary observation: these same people were the first to preach the AI coding gospel.
Two things that make me wonder if they can possibly turn out good quality R.
Perhaps a true test of AGI will be when you ask it to write an application in R and it refuses for fear of what people might think.
Unless you’re the poor schmuck who is given the task of running the code written by the previous analyst, who has probably already left the company. Often it’s easier to just throw something together from scratch and then look for a new job, perpetuating the problem.
As a working data scientist, I know I am not a computer scientist or a 10x engineer (hell, I am probably a 0.8x engineer), but that's not where my expertise is. My engineer co-workers are 0.01x data scientists, but you won't see me complaining that they don't know the Central Limit Theorem or how to build a causal inference engine.
It's been a while so I don't remember any details. I don't go on Twitter/X as much as I used to in those days.