Clojure: Transducers

112

ttosh 2 days ago 47 commentsRead Article on clojure.org

DE version is available. Content is displayed in original English for accuracy.

⚡ Community Insights

Discussion Sentiment

35% Positive

Analyzed from 2157 words in the discussion.

Discussion (47 Comments)Read Original on HackerNews

drob518•about 3 hours ago

Transducers work even better with a Clojure library called Injest. It has macros similar to the standard Clojure threading macros except Injest’s macros will recognize when you’re using transducers and automatically compose them correctly. You can even mix and match transducers and non-transducer functions and Injest will do its best to optimize the sequence of operations. And wait, there’s more! Injest has a parallelizing macro that will use transducers with the Clojure reducers library for simple and easy use of all your cores. Get it here: https://github.com/johnmn3/injest

Note: I’m not the author of Injest, just a satisfied programmer.

adityaathalye•about 4 hours ago

May I offer a little code riff slicing FizzBuzz using transducers, as one would do in practice, in real code (as in not a screening interview round).

Demo One: Computation and Output format pulled apart

  (def natural-nums (rest (range)))

  (def fizz-buzz-xform
    (comp (map basic-buzz)
          (take 100))) ;; early termination

  (transduce fizz-buzz-xform ;; calculate each step
             conj ;; and use this output method
             []   ;; to pour output into this data structure
             natural-nums)

  (transduce fizz-buzz-xform ;; calculate each step
             str ;; and use this output method
             ""  ;; to catenate output into this string
             natural-nums) ;; given this input

  (defn suffix-comma  [s]  (str s ","))

  (transduce (comp fizz-buzz-xform
                   (map suffix-comma)) ;; calculate each step
             str ;; and use this output method
             ""  ;; to catenate output into this string
             natural-nums) ;; given this input

Demos two and three for your further entertainment are here: https://www.evalapply.org/posts/n-ways-to-fizzbuzz-in-clojur...

(edit: fix formatting, and kill dangling paren)

bjoli•about 4 hours ago

I made srfi-171 [0], transducers for scheme. If you have any questions about them in general I can probably answer them. My version is pretty similar to the clojure version judging by the talks Rich Hickey gave on them.

I know a lot of people find them confusing.

0: https://srfi.schemers.org/srfi-171/srfi-171.html

jwr•about 2 hours ago

Transducers are IMHO one of the most under-appreciated features of Clojure. Once you get to know them, building transducer pipelines becomes second nature. Then you realize that a lot of data processing can be expressed as a pipeline of transformations, and you end up with reusable components that can be applied in any context.

The fact that transducers are fast (you don't incur the cost of handling intermediate data structures, nor the GC costs afterwards) is icing on the cake at this point.

Much of the code I write begins with (into ...).

And in Clojure, like with anything that has been added to the language, anything related to transducers is a first-class citizen, so you can reasonably expect library functions to have all the additional arities.

[but don't try to write stateful transducers until you feel really comfortable with the concepts, they are really tricky and hard to get right]

pjmlp•about 3 hours ago

Nowadays you can make use of some transducers ideas via gatherers in Java, however it isn't as straightforward as in plain Clojure.

talkingtab•about 3 hours ago

When I first read about transducers I was wowed. For example, if I want to walk all the files on my computer and find the duplicate photos in the whole file system, transducers provide a conveyor belt approach. And whether there are saving in terms of memory or anything, maybe. But the big win for me was to think about the problem as pipes instead of loops. And then if you could add conditionals and branches it is even easier to think about. At least I find it so.

I tried to implement transducers in JavaScript using yield and generators and that worked. That was before async/await, but now you can just `await readdir("/"); I'm unclear as to whether transducers offer significant advantages over async/await?

[[Note: I have a personal grudge against Java and since Clojure requires Java I just find myself unable to go down that road]]

jwr•about 1 hour ago

I think, like with the rest of Clojure, none of this is "revolutionary" in itself. Clojure doesn't try to be revolutionary, it's a bunch of existing ideas implemented together in a cohesive whole that can be used to build real complex systems (Rich Hickey said so himself).

Transducers are not new or revolutionary. The ideas have been around for a long time, I still remember using SERIES in Common Lisp to get more performance without creating intermediate data structures. You can probably decompose transducers into several ideas put together, and each one of those can be reproduced in another way in another language. What makes them nice in Clojure is, like the rest of Clojure, the fact that they form a cohesive whole with the rest of the language and the standard library.

jnpnj•about 1 hour ago

https://series.sourceforge.net/ this is the SERIES package you're referring to ? sorry, mostly a CL newb here, it's the first time I read about it

justinhj•about 2 hours ago

You could always try ClojureScript

css_apologist•about 1 hour ago

Is there a gain of clojure transducers to js style iterators? - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

both solve the copying problem, and not relying on concrete types

solomonb•about 1 hour ago

I never understood what was so special about Clojure's Transducers. Isn't it essentially just applying a transformation on the lambda applied to a fold?

waffletower•about 1 hour ago

That is a bit reductive. You can consider these implementations in other languages: https://github.com/hypirion/haskell-transducers -- https://github.com/ruuda/transducers

solomonb•about 1 hour ago

It seems like a messy abstraction whose results could be achieved through a variety of other tools. :/

waffletower•about 1 hour ago

It isn't messy in Clojure

vindarel•about 2 hours ago

its Common Lisp cousin: https://github.com/fosskers/transducers/

jwr•about 1 hour ago

I'd say SERIES is it's older cousin.

BoingBoomTschak•6 minutes ago

SERIES would be the grandfather, no?

thih9•about 4 hours ago

From (2016) at least.

https://web.archive.org/web/20161219045343/https://clojure.o...

adityaathalye•about 4 hours ago

I'd reckon most of Clojure is from ten years ago. Excellent backward compatibility, you see :) cf. https://hopl4.sigplan.org/details/hopl-4-papers/9/A-History-...

whalesalad•about 3 hours ago

It's a blessing and a curse that zero innovation has occurred in the Clojure space since 2016. Pretty sure the only big things has been clojure.spec becoming more mainstream and the introduction of deps.edn to supplant lein. altho I am still partial to lein.

seancorfield•about 2 hours ago

Clojure 1.9: Spec.

Clojure 1.10: datafy/nav + tap> which has spawned a whole new set of tooling for exploring data.

Clojure 1.11: portable math (clojure.math, which also works on ClojureScript).

Clojure 1.12: huge improvements in Java interop.

And, yes, the new CLI and deps.edn, and tools.build to support "builds as programs".

vaylian•about 1 hour ago

And we can look forward to Jank https://jank-lang.org/

whalesalad•about 2 hours ago

Things have surely happened and the language has improved, but would you consider any of this to be innovative?

iLemming•about 2 hours ago

> zero innovation has occurred in the Clojure space since 2016.

Oh, really? Zero, eh?

clojure.spec, deps.edn, Babashka, nbb, tap>, requiring-resolve, add-libs, method values, interop improvements, Malli, Polylith, Portal, Clerk, hyperfiddle/electric, SCI, flowstorm ...

Maybe you should've started the sentence with "I stopped paying attention in 2016..."?

eduction•about 4 hours ago

The key insight behind transducers is that a ton of performance is lost not to bad algorithms or slow interpreters but to copying things around needlessly in memory, specifically through intermediate collections.

While the mechanics of transducers are interesting the bottom line is they allow you to fuse functions and basic conditional logic together in such a way that you transform a collection exactly once instead of n times, meaning new allocation happens only once. Once you start using them you begin to see intermediate collections everywhere.

Of course, in any language you can theoretically do everything in one hyperoptimized loop; transducers get you this loop without much of a compromise on keeping your program broken into simple, composable parts where intent is very clear. In fact your code ends up looking nearly identical (especially once you learn about eductions… cough).

fud101•about 3 hours ago

These sound wild in terms of promise but I never understood them in a practical way.

moomin•about 3 hours ago

They're not really that interesting. They're "reduce transformers". So, take a reduction operation, turn it into an object, define a way to convert one reduction operation into another and you're basically done. 99% of the time they're basically mapcat.

The real thing to learn is how to express things in terms of reduce. Once you've understood that, just take a look at e.g. the map and filter transducers and it should be pretty obvious. But it doesn't work until you've grasped the fundamentals.

eduction•about 3 hours ago

Canonical example is rewriting a non transducing set of collection transformations like

   (->> posts
      (map with-user)
      (filter authorized?)
      (map with-friends)
      (into []))

That’s five collections, this is two, using transducers:

    (into []
          (comp
            (map with-user)
            (filter authorized?)
            (map with-friends))
          posts)

A transducer is returned by comp, and each item within comp is itself a transducer. You can see how the flow is exactly like the double threading macro.

map for example is called with one arg, this means it will return a transducer, unlike in the first example when it has a second argument, the coll posts, so immediately runs over that and returns a new coll.

The composed transducer returned by comp is passed to into as the second of three arguments. In three argument form, into applies the transducer to each item in coll, the third argument. In two argument form, as in the first example, it just puts coll into the first argument (also a coll).

kccqzy•about 2 hours ago

That does not sound like a good example. The two-argument form of `map` already returns a lazy sequence. Same for `filter`. I thought lazy sequences are already supposed to get rid of the performance problem of materializing the entire collection. So

fud101•about 2 hours ago

Thanks. So is this not an optimiser Clojure runtime can do for you automatically? I find the first one simpler to read and understand.

waffletower•about 1 hour ago

I am a fan of Christophe Grand's xforms library -- https://github.com/cgrand/xforms -- I find the transducer nexus function, by-key, to be particularly useful for eliminating clojure.core destructuring dances when one needs group-by with post-processing.

waffletower•about 1 hour ago

A not too contrived example: (require '[net.cgrand.xforms :as x]) (into {} (x/by-key :name :size (comp (x/reduce +) (map str))) example-mapseq)

mannycalavera42•about 5 hours ago

transducers and async flow are :chefkiss

instig007•about 2 hours ago

You get this for free in Haskell, and you also save on not having to remember useless terminology for something that has no application on their own outside Foldables anyways.

Maxatar•about 2 hours ago

>...you also save on not having to remember useless terminology...

It may be true in this particular case, but in my admittedly brief experience using Haskell you absolutely end up having to remember a hell of a lot of useless terminology for incredibly trivial things.

tombert•about 2 hours ago

Terminology doesn't bother me nearly as much as people defining custom operators.

I used to think it was cute the you could make custom operators in Haskell but as I've worked more with the language, I wish the community would just accept that "words" are actually a pretty useful tool.

iLemming•about 2 hours ago

> You get this for free in Haskell,

Oh, my favorite part of the orange site, that's why we come here, that's the 'meat of HN' - language tribalism with a technical veneer. Congratulations, not only you said something as lame as: "French doesn't need the subjunctive mood because German has word order rules that already express uncertainty", but you're also incorrect factually.

Haskell's laziness gives you fusion-like memory behavior on lists for free. But transducers solve a broader problem - portable, composable, context-independent transformations over arbitrary reducing processes - and that you don't get for free in Haskell either.

Transducers exist because Clojure is strict, has a rich collection library, and needed a composable abstraction over reducing processes that works uniformly across collections, channels, streams, and anything else that can be expressed as a step function. They're a solution to a specific problem in a specific context.

Haskell's laziness exists because the language chose non-strict semantics as a foundational design decision, with entirely different consequences - both positive (fusion, elegant expression of infinite structures) and negative (space leaks, reasoning difficulty about resource usage).

instig007•about 1 hour ago

> Haskell's laziness gives you fusion-like memory behavior on lists for free.

Haskell laziness & fusion isn't limited to lists, you can fuse any lawful composition of functions applied over data with the required lawful instances used for the said composition. There's no difference to what transducers are designed for.

> But transducers solve a broader problem - portable, composable, context-independent transformations over arbitrary reducing processes - and that you don't get for free in Haskell either.

Transducers don't solve a broader problem, it's the same problem of reducing complexities of your algorithims by eliminating transient data representations. If you think otherwise, I invite you to provide a practical example of the broader scope, especially the part about "context-independent transformations" that would be different to what Haskell provides you without that separate notion.

> and negative (space leaks, reasoning difficulty about resource usage).

which is mostly FUD spread by internet crowd who don't know the basics of call-by-need semantics, such as the places you don't bind your intermediate evaluations at, and what language constructs implicitly force evaluations for you.

iLemming•about 1 hour ago

> you can fuse any lawful composition of functions

each of those requires manually written rewrite rules or specific library support. It's not a universal property that falls out of laziness - it's careful engineering per data type. Transducers work over any reducing function by construction, not by optimization rules that may or may not fire.

> it's the same problem

It is not. Take a transducer like `(comp (filter odd?) (map inc) (take 5))`. You can apply this to a vector, a lazy seq, a core.async channel, or a custom step function you wrote five minutes ago. The transformation is defined once, independent of source and destination. In Haskell, fusing over a list is one thing. Applying that same composed transformation to a conduit, a streaming pipeline, an io-streams source, and a pure fold requires different code or different typeclass machinery for each. You can absolutely build this abstraction in Haskell (the foldl library gets close), but it's not free - it's a library with design choices, just like transducers are.

You're third claim is basically the "skill issue" defense. Two Haskell Simons - Marlow, and Jones, and also Edward Kmett have all written and spoken about the difficulty of reasoning about space behavior in lazy Haskell. If the people who build the compiler and its core libraries acknowledge it as a real trade-off, dismissing it as FUD from people who "don't know the basics" is not an argument. It's gatekeeping.

Come on, how can you fail to see the difference between: "Haskell can express similar things" with "Haskell gives you this for free"?

eduction•about 2 hours ago

It goes beyond a foldable, can be applied to streams. Clojure had foldables, called reducers, this was generalized further when core.async came along - transducers can be attached to core async channels and also used in places where reducers were used. The terminology is used to document the thing that various contexts accept (chan, into, sequence, eduction etc). They exist to make the language simpler and more general. They could actually allow a bunch of old constructs to be dispensed with but came along too late to build the whole language around.

instig007•about 1 hour ago

> It goes beyond a foldable, can be applied to streams.

> Clojure had foldables, called reducers, this was generalized further when core.async came along - transducers can be attached to core async channels and also used in places where reducers were used.

Ok, you mean there's a distinction between foldables and the effectful and/or infinite streams, so there's natural divide between them in terms of interfaces such as (for instance) `Foldable f` and `Stream f e` where `e` is the effect context. It's a fair distinction, however, I guess my overall point is that they all have applicability within the same kind of folding algorithms that don't need a separate notion of "a composing object that's called a transducer" if you hop your Clojure practice onto Haskell runtime where transformations are lazy by default.