Parse, Don't Validate – In a Language That Doesn't Want You To

ffagnerbrack • about 3 hours ago • 51 comments • Article on cekrem.github.io

Community insights: discussion sentiment 56% positive (analyzed from 3,180 words in the discussion).

Discussion (51 comments) • Original on Hacker News

Xenoamorphous•8 minutes ago

I don't like zod. I want to define my types, not write schemas. And I don't like that I then have to use the types derived from those schemas rather than types I've defined myself directly.

So I just define my types and then use typescript-json-schema or similar to build a JSON Schema at build time (i.e. from an npm script), which I then use to validate input with ajv.

The only thing I do on top of that is to use annotations like "@minimum 0" (or, in the email example, "@format email") where the base types are not enough, but those simply go inside comments.

So the compiled package only has ajv as a runtime dependency (which you're likely to have anyway, as it's everywhere); you're just defining regular types with some annotations on top and using a dev dependency to build the JSON Schema for you. And as popular as zod is, I think JSON Schema is more of a standard and likely to stay with us longer.

I also reference those generated JSON Schemas from my OpenAPI definition, as a bonus.
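A minimal sketch of that workflow, using a hypothetical `SignupInput` type. The JSDoc tags sit on the type, and a generator such as typescript-json-schema turns them into schema keywords at build time. The schema object below is written by hand to show roughly what comes out; the exact output (for example a `$schema` key or `additionalProperties`) depends on the tool, version, and flags. The ajv step is left as a comment because it needs the ajv package (plus ajv-formats for the `email` format in ajv v8).

```typescript
// Hypothetical input type; the extra constraints live in ordinary JSDoc comments.
interface SignupInput {
  /** @format email */
  email: string;
  /** @minimum 0 */
  age: number;
}

// Roughly the JSON Schema a generator would emit for SignupInput at build time.
// Hand-written here for illustration; real output varies by tool and flags.
const signupInputSchema = {
  type: "object",
  properties: {
    email: { type: "string", format: "email" },
    age: { type: "number", minimum: 0 },
  },
  required: ["email", "age"],
};

// At runtime the generated schema would be compiled once with ajv, e.g.:
//   const ajv = new Ajv(); addFormats(ajv);  // "email" format needs ajv-formats
//   const validate = ajv.compile<SignupInput>(schema);
//   if (validate(body)) { /* body is now typed as SignupInput */ }
```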

ramon156•about 1 hour ago

The author ran into the square-peg-in-a-round-hole situation with TS. Functions can throw implicitly, and there's no enforced annotation telling you that they might. FP solves this with Result/Option, but that doesn't fit well in TS. Effect is there to find a solution but will fail.

Zod is the acceptable middleground in my opinion. Zod will allow you to throw a schema against an object and it'll tell you "yes the result fits your schema". This is fine for most projects.

If you want to go zero-dependency, you can see how far you can get with TS's type system. Branded types are kinda cool. NewTypes are also cool, but also high maintenance. Unless you're building a library that millions depend on, it's probably not worth it.
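For reference, a zero-dependency sketch of the two ideas mentioned: a hand-rolled `Result` type, so failure shows up in the signature instead of as an implicit throw, and a branded type that only a parse function hands out. `Result`, `Age`, `parseAge`, and the 0 to 150 range are hypothetical choices for illustration, not any library's API.

```typescript
// Hypothetical names, for illustration.
// A hand-rolled Result: failure is part of the return type, not an implicit throw.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

// Branded number: a number at runtime, but a plain number is not assignable to Age.
type Age = number & { readonly __brand: "Age" };

function parseAge(input: unknown): Result<Age, string> {
  if (typeof input !== "number" || !Number.isInteger(input)) {
    return { ok: false, error: "age must be an integer" };
  }
  if (input < 0 || input > 150) {
    return { ok: false, error: "age out of range" };
  }
  // The only place a plain number is cast to Age.
  return { ok: true, value: input as Age };
}

// Callers have to check `ok` before the compiler lets them reach `value`.
const r = parseAge(42);
if (r.ok) {
  const age: Age = r.value;
  console.log(age + 1);
}
```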

whilenot-dev•9 minutes ago

FYI branded types and newtypes are kind of the same thing, branded types just use a unique symbol that's expressed explicitly.

epolanski•39 minutes ago

> Effect is there to find a solution but will fail.

What do you mean?

I've been into Effect for a long time, and it really scales well the more complex your applications get.

Schema is way more advanced than Zod, by the way, both at the type level and in functionality: it has a proper decoder/encoder architecture.

You can encode at the type level not just "string -> non-empty string -> valid email pattern" but also a confirmed email the user has clicked on, by leveraging effectful schemas (and durable workflows if you want).

You may not need it 99% of the time, I myself rarely use that, but it's not a fair comparison.

Zod is more ergonomic, has easier APIs, and is perfect for most users. I would not recommend Schema unless you buy into the whole package.
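Effect Schema has its own API for this (brands, filters, effectful transformations), which isn't reproduced here. As a plain-TypeScript sketch of the layering idea only: each type can only be reached from the previous one, and the last step needs evidence that doesn't live in the string at all. All names are hypothetical, the email regex is deliberately crude, and the `isConfirmed` callback stands in for the effectful lookup (in a real app it would be async).

```typescript
// Hypothetical names, for illustration; not Effect's API.
type NonEmptyString = string & { readonly __brand: "NonEmptyString" };
type EmailAddress = NonEmptyString & { readonly __email: true };
type ConfirmedEmail = EmailAddress & { readonly __confirmed: true };

// Step 1: string -> NonEmptyString
function toNonEmpty(s: string): NonEmptyString | null {
  return s.length > 0 ? (s as NonEmptyString) : null;
}

// Step 2: NonEmptyString -> EmailAddress (crude pattern, for illustration only)
function toEmail(s: NonEmptyString): EmailAddress | null {
  return /^[^@\s]+@[^@\s]+$/.test(s) ? (s as EmailAddress) : null;
}

// Step 3: EmailAddress -> ConfirmedEmail. The evidence lives outside the string;
// `isConfirmed` stands in for an effectful lookup ("did they click the link?").
function toConfirmed(
  e: EmailAddress,
  isConfirmed: (e: EmailAddress) => boolean,
): ConfirmedEmail | null {
  return isConfirmed(e) ? (e as ConfirmedEmail) : null;
}

// Something that must only ever receive a confirmed address.
function sendNewsletter(to: ConfirmedEmail): string {
  return `sending to ${to}`;
}
```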

programmarchy•15 minutes ago

I haven’t used Effect but the problem I see with using it is that it seems to want to completely swallow the whole app architecture. At that point, why not just use a functional language?

Altern4tiveAcc•about 1 hour ago

Zod is by far the most ergonomic way to express those ideas in TypeScript these days. I miss it when writing code in other languages.

The friction with the rest of the ecosystem is real, though. Most code out there expects you to handle errors with exceptions.

I get the impression that polymorphic return types could get in the way of JSC/V8/SpiderMonkey's JIT, but I haven't measured it and I'm not sure of the actual impact on hot and cold paths. Same for all the allocations caused by custom Option<T>/Result<T,E> implementations.

I think using Zod at the edge (with branded types and whatnot), while keeping return types as T/Promise<T> to keep a sane relationship with the ecosystem is a good middle ground.
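A rough sketch of that middle ground without Zod itself (Zod's `parse` similarly throws a `ZodError` on invalid input, while `safeParse` returns a result object instead): check once at the boundary, throw the way the rest of the ecosystem expects, and pass a branded value inward while interior functions keep plain `T`/`Promise<T>` return types. `UserId`, its 8-hex-digit format, `parseUserId`, `loadUser`, and `handle` are all hypothetical.

```typescript
// Hypothetical names and format, for illustration.
type UserId = string & { readonly __brand: "UserId" };

// Edge: throw on bad input, so callers keep ordinary try/catch semantics.
function parseUserId(raw: unknown): UserId {
  if (typeof raw !== "string" || !/^[0-9a-f]{8}$/.test(raw)) {
    throw new TypeError(`invalid user id: ${String(raw)}`);
  }
  return raw as UserId;
}

// Interior: a plain Promise<T> return type (no Result wrapper),
// but the parameter type only admits ids that went through parseUserId.
function loadUser(id: UserId): Promise<{ id: UserId; displayName: string }> {
  return Promise.resolve({ id, displayName: `user ${id}` });
}

// At the boundary, e.g. in a request handler.
async function handle(rawId: unknown): Promise<string> {
  const user = await loadUser(parseUserId(rawId));
  return user.displayName;
}
```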

jerf•19 minutes ago

I haven't done a lot of TypeScript, but I've done at least a couple of months' worth now, and every time I have to type "as" my inner Haskell programmer screams.

If I could add one feature to Typescript it would be something like "as" that actually validates the result against the type system and can fail. Unfortunately, that's way, way easier said than done. It's the bad type of keyword that has unbounded runtime cost because it would have to be a runtime comparison, and there are a lot of design questions about how to write it. However, I still petulantly want it even though I can hardly define it. "zod" is pretty good but you can see how trying to add that as a "keyword" is nightmare fuel for a language-level change.
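TypeScript has no checked `as`, but a common user-land approximation pairs a type guard with a function that throws when the guard fails. This is a sketch, not a language feature: `checkedAs`, `Point`, and `isPoint` are hypothetical, and the guard is still hand-written, so nothing stops it from drifting away from the type it claims to check. That drift is the gap Zod closes by deriving the static type from the schema (`z.infer`).

```typescript
// Hypothetical helper: narrows like `as`, but fails at runtime instead of lying.
function checkedAs<T>(value: unknown, guard: (v: unknown) => v is T, what: string): T {
  if (!guard(value)) {
    throw new TypeError(`expected ${what}`);
  }
  return value; // narrowed to T by the guard
}

interface Point {
  x: number;
  y: number;
}

// Hand-written guard; keeping it in sync with Point is on you.
function isPoint(v: unknown): v is Point {
  return (
    typeof v === "object" &&
    v !== null &&
    typeof (v as { x?: unknown }).x === "number" &&
    typeof (v as { y?: unknown }).y === "number"
  );
}

const p = checkedAs(JSON.parse('{"x":1,"y":2}'), isPoint, "Point");
console.log(p.x + p.y);
```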

IshKebab•6 minutes ago

> I miss it when writing code in other languages.

You can use Pydantic in Python and serde_derive in Rust. I assume most languages have a thing like that.

throwaw12•about 1 hour ago

I personally love the idea and concept, but struggle to apply it to real projects.

Suppose I have a User with some attributes like birthday, email and whether they have been verified.

In a typical codebase you see `if (user.verified_at != null)` or something along those lines; with parsed code I feel like I should have types (or interfaces) for each of them:

    - UserWithBirthday
    - VerifiedUser, UnverifiedUser
    - UserWithEmail, UserWithoutEmail

(And imagine having a method that accepts a user with a birthday and an email to send an email the day before their birthday: would you create a UserWithBirthdayAndEmail type?)

It feels like it's going to bloat the interface space. How do you tackle this problem?

bern4444•24 minutes ago

It's pretty trivial to create derived and augmented types with Pick, Omit, Required, and Partial. Combine them with a few parsing functions that return an object typed to whatever specification you need and you're set, e.g.:

    type User = {
        name: string;
        lastName: string;
        verified: boolean;
        email?: string;
        birthday?: string | { year: string; month: string; date: string };
    };

    // Required<Pick<User, 'birthday'>> is already { birthday: ... } with the field made
    // mandatory, so intersect it directly instead of nesting it under another key.
    type UserWithBirthday = User & Required<Pick<User, 'birthday'>>;
    type VerifiedUser = User & { verified: true; email: string };
    type VerifiedUserWithBirthday = UserWithBirthday & VerifiedUser;

    const userHasBDayAndEmail = (user: User): user is VerifiedUserWithBirthday => {
        // The predicate also promises `verified: true`, so that has to be checked too.
        return user.verified && user.email !== undefined && user.birthday !== undefined;
    };

Anywhere userHasBDayAndEmail(user) has returned true, the compiler narrows user to VerifiedUserWithBirthday, so it can be passed to any function further down the call stack that requires that type; everywhere else it stays a plain User.

The types are cheap to write (they're all derived) and have no runtime impact (types are erased at build/compile time), and these parsing functions are quite small to write.

https://www.typescriptlang.org/play/?#code/FAFwngDgpgBAqgZyg...

throwaw12•15 minutes ago

creation is not a problem, maintenance is.

Suppose you want to add one more property to VerifiedUserWithBirthday and UnverifiedUserWithBirthday: you might get 2 more new types, and somewhere higher up the call chain you need to know which enclosing type to pass so that some method at the bottom of the chain will accept it.

I am sure there are more elegant ways, but I am struggling to generalize it to most enterprise SaaS CRUD apps, where you have one object with a bunch of properties and conditionally traverse the code logic.

bern4444•4 minutes ago

Yeah that's the engineering part in software engineer :)

If you have VerifiedUserWithBirthday, any value that fails the parsing function is implicitly UnverifiedUserOrUserWithoutBirthday... No need to define it separately. You get the inverse type for free, i.e. a value that is of type User and not of type VerifiedUserWithBirthday.

A new property doesn't mean a new derived type. Only if that new property impacts what a VerifiedUserWithBirthday should represent should the VerifiedUserWithBirthday type be updated and even then, it's not a new type, just an update to an existing type. Again minimal updates needed.

The compiler handles all the validation and will tell you exactly where there are any issues - the compiler is what makes the maintenance cost quite low.

columnarx3•36 minutes ago

I think this is the wrong pattern in this instance. You parse an email or phone number because validating leaves it as a plain string, and you lose the context to know for sure if that string is actually an email or phone number.

In your instance, you could have:

  type User = {
    // ... rest of fields
    email: {
      verified: boolean,
      // branded type here ensures that this string is a proper email address
      value: EmailAddress,
    },
    birthday: Date | null,
  };

In this instance, your logic with a method that accepts birthday and email has all the information it needs to make its choice.
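A self-contained sketch of that shape. It assumes a hypothetical `parseEmailAddress` as the only place a string becomes an `EmailAddress` (the regex is a loose placeholder) and a hypothetical `birthdayReminderTarget` that reads everything it needs from the one `User` object, with no extra `UserWithX` types.

```typescript
// Hypothetical names, for illustration.
type EmailAddress = string & { readonly __brand: "EmailAddress" };

function parseEmailAddress(raw: string): EmailAddress | null {
  // Loose placeholder check; swap in whatever rules your domain needs.
  return /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(raw) ? (raw as EmailAddress) : null;
}

type User = {
  name: string;
  email: { verified: boolean; value: EmailAddress };
  birthday: Date | null;
};

// Returns the address to send a birthday reminder to, or null if we shouldn't send one.
function birthdayReminderTarget(user: User): EmailAddress | null {
  if (!user.email.verified || user.birthday === null) {
    return null;
  }
  return user.email.value;
}
```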

sirwhinesalot•39 minutes ago

The computer-science answer to this problem is called "refinement types", where you can attach arbitrary predicates to a type, e.g. (pseudo-code):

    fn send_birthday_mail(user: {u: User, u.birthday != null})

Contracts are a similar solution that restricts the predicates to only appearing in function types.

The difference between this and an assert is that it gets checked at compile time (it can get quite expensive to do the check though).

What can you do in mainstream languages? As much as is worth and no more than that. String -> User is worth it, User -> UserWithBirthday is not.
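TypeScript can't attach arbitrary predicates to types, but for the specific `u.birthday != null` case in the pseudo-code above, a mapped type gets close: the refinement becomes part of the parameter type, and the check happens once, in a type guard. `WithRequired`, `hasBirthday`, and `sendBirthdayMail` are hypothetical names. Unlike a real refinement type, the compiler trusts the guard instead of proving it.

```typescript
// Hypothetical names, for illustration.
type User = { name: string; birthday?: Date | null };

// A poor man's refinement { u: T | u.K != null }: keys K become present and non-null.
type WithRequired<T, K extends keyof T> = T & { [P in K]-?: NonNullable<T[P]> };

function hasBirthday(u: User): u is WithRequired<User, "birthday"> {
  return u.birthday != null;
}

// Can only be called once the guard (or some other parse step) has narrowed the user.
function sendBirthdayMail(u: WithRequired<User, "birthday">): string {
  return `Happy birthday, ${u.name}! (${u.birthday.toISOString().slice(0, 10)})`;
}
```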

throwaw12•25 minutes ago

This looks cool, but you're doing validation when accepting the object, and you probably can't take it too far. For example, if you're dealing with objects that have heights, you might have a HumanLikeHeight whose range is between 40 cm and 250 cm; if you want to send an email to that human, would you keep adding these conditions to the predicates?
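In mainstream TypeScript the usual compromise is to run the range check once in a smart constructor and carry the result as a branded type, rather than restating the predicate on every function. A sketch, with `HumanLikeHeightCm`, `parseHumanLikeHeight`, and `sizeAdvice` as hypothetical names and the 40 to 250 cm bounds taken from the comment:

```typescript
// Hypothetical names, for illustration.
// Centimetres, already checked to be within a plausible human range.
type HumanLikeHeightCm = number & { readonly __brand: "HumanLikeHeightCm" };

const MIN_CM = 40;
const MAX_CM = 250;

function parseHumanLikeHeight(cm: number): HumanLikeHeightCm | null {
  return Number.isFinite(cm) && cm >= MIN_CM && cm <= MAX_CM
    ? (cm as HumanLikeHeightCm)
    : null;
}

// Downstream code states its requirement once, in the type, and never re-checks it.
function sizeAdvice(h: HumanLikeHeightCm): string {
  return h >= 190 ? "try the tall fit" : "standard fit";
}
```

Whether every such constraint deserves its own type is exactly the judgment call this subthread is debating; the sketch only shows what the cost looks like once you decide it does.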

robertlagrant•about 3 hours ago

This feels right, and I also have never done it (or had the guts to get others to do it).

The reason I've not is - say there's an optional field. Currently we call that null, probably, and check each time if it's there or not. I could instead make a type, like User and UserWithPhoneNumber. Should we be making types for each combination of present/absent fields? That can't be right.

The classic answer is to move the logic inside the domain object, or have a helper function outside the object, so you aren't constantly checking for field presence/absence, but are instead writing the logic once and calling some code.

I'm not sure in practice types can help with this. But I'd love to be proven wrong.

xx_ns•about 2 hours ago

I think this is a slightly different problem. The absence of an optional field, if that's a legal state, is meaningful every time you use the type, so you encode it on the field: `phone: ValidPhoneNumber | null`. When it's not null you're still guaranteed a valid phone number. When it is null, that's a legal state you have to handle and which is domain logic, not validation you forgot to do.

The combinatorial explosion you're picturing only shows up if you make a separate type per combination of present fields, but you don't need to. An independent optional field stays one `T | null`. You only reach for distinct types when fields are correlated and present together because they represent a state, and then it's a discriminated union on a status field, which is N states, not 2^N.
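A sketch of the N-states-not-2^N point, with hypothetical field and state names: the independent optional field stays `PhoneNumber | null`, the correlated fields move into a discriminated union on `status`, and a `never` check in the `default` branch makes the compiler flag any state you forget to handle.

```typescript
// Hypothetical names, for illustration.
type PhoneNumber = string & { readonly __brand: "PhoneNumber" };

// Correlated fields: one variant per state (N types, not 2^N).
type Account =
  | { status: "pending"; email: string }
  | { status: "verified"; email: string; verifiedAt: Date }
  | { status: "suspended"; email: string; reason: string };

type User = {
  name: string;
  phone: PhoneNumber | null; // independent optional field: just `| null`
  account: Account;
};

function describeUser(user: User): string {
  const a = user.account;
  switch (a.status) {
    case "pending":
      return `${user.name}: awaiting verification`;
    case "verified":
      return `${user.name}: verified ${a.verifiedAt.toISOString().slice(0, 10)}`;
    case "suspended":
      return `${user.name}: suspended (${a.reason})`;
    default: {
      // Adding a fourth status without a case here becomes a compile error.
      const unreachable: never = a;
      return unreachable;
    }
  }
}
```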

robertlagrant•35 minutes ago

That's fair enough - I see what you mean. I think I read the case I was thinking of into the article. Now that I've re-read it, it is saying what you're saying, which does make a lot of sense.

Using types like this also means you can more easily avoid assignment errors, as everything will have a very specific type (e.g. Age instead of int).

frogulis•about 1 hour ago

This explosion of optionality types is the main topic of Rich Hickey's "Maybe Not" talk. I recommend it!

The short version is: the shape of a type is inherent to the type itself, but the optionality of its members is dependent on the situation. A type system that solves this problem separates these concepts to allow for this distinction.

I _suspect_ it's possible to implement something like that in typescript but I haven't tried it myself (and I doubt it's very ergonomic).
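One way to approximate that separation in TypeScript, as a sketch rather than anything from the talk: declare the shape once, and let each function state which fields it requires through a small helper type. `UserShape`, `Select`, `greet`, and `birthdayEmail` are hypothetical names. It works, though as suspected above, the ergonomics are only so-so.

```typescript
// Hypothetical names, for illustration.
// The shape: what a user *can* have. Optionality is deliberately not decided here.
type UserShape = {
  id: string;
  name: string;
  email: string;
  birthday: Date;
};

// Each context selects the fields it requires; everything else stays optional.
type Select<T, K extends keyof T> = Partial<T> & Pick<T, K>;

function greet(u: Select<UserShape, "name">): string {
  return `Hello, ${u.name}`;
}

function birthdayEmail(u: Select<UserShape, "email" | "birthday">): string {
  return `to=${u.email} on ${u.birthday.getUTCMonth() + 1}/${u.birthday.getUTCDate()}`;
}
```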

pillmillipedes•about 2 hours ago

If a user with and a user without a phone number are equally valid states, then types won't help you much. I think it's more about writing

  class User{phone: ?PhoneNumber}

over

  class User{phone: ?string}.

throwwwll•about 2 hours ago

To expand and give some notion of good taste:

It's more about writing

    struct User {phone: MaybePhoneNumber} // give or take, it's a monoid

over

    struct User {phone: Option<String>}

pillmillipedes•about 1 hour ago

I don't mind discussing syntax when appropriate, but this feels like arguing over which trivial brainfuck substitution[1] is the best.

> monoid

nullables with `??` and `?.` are also give-or-take monoids. is it common though to `or` two MaybePhoneNumbers together or to apply a PhoneNumber->MaybePhoneNumber function to it? if not then why mention it?

let's see something meaningfully different like a database schema.

[1] https://esolangs.org/wiki/Trivial_brainfuck_substitution
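For readers who haven't met the framing: treating `null` and `undefined` as the same absent value, `??` behaves like a monoid's combine with absence as the identity (it keeps the first present value, and regrouping doesn't change the result), while `?.` maps over presence for property and method access. A small illustration with hypothetical phone values:

```typescript
// Hypothetical values, for illustration.
type MaybePhone = string | null;

const work: MaybePhone = null;
const home: MaybePhone = "+47 555 01 234";
const mobile: MaybePhone = "+47 555 09 876";

// `??` picks the first present value; null acts as the identity element.
const preferred = work ?? home ?? mobile;    // "+47 555 01 234"
const regrouped = (work ?? home) ?? mobile;  // same result: grouping doesn't matter
const withIdentity = home ?? null;           // unchanged: "+47 555 01 234"

// `?.` maps over presence for property/method access: absent in, absent out.
const digits = preferred?.replace(/\D/g, "") ?? null; // "4755501234"
```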

lumpysnake•about 1 hour ago

We should make authors disclose how much AI was used to write an article. This reeks of Opus 4.8.

ramon156•40 minutes ago

I recently made a Firefox Extension to mark authors as Slop for the same goal but not the same reason.

I don't think disclosing helps here. If the article wasn't obviously generated, why would that affect you?

The only issue I have is being half-way through the article and realizing I am reading hallucinated text. If I can mark the author once, I won't see them again. This works fine for me. You could argue that disclosing would fix this issue, but the issue is not that AI was used, but that it was not curated.

lijok•about 1 hour ago

Why should they disclose how much AI was used to write an article?

lumpysnake•42 minutes ago

Because I would've completely avoided the article if I knew that I would be served slop. I was interested in the content, but I was immediately thrown off by the writing style, which closely resembles what I've been getting from Opus 4.8 lately in my dev work. Filler language and useless metaphors everywhere.

> Booleans look tidy until somebody adds a third case and exhaustiveness silently doesn’t kick in. Strings narrow honestly.

Like, nobody truly writes like that. It wouldn't get past any competent editor.

Strings narrow honestly? What does that even mean? This kind of 3-word precision is useless, and it appears everywhere in the article. We get the point within the first sentence; no need to add more.

twoodfin•14 minutes ago

I just flag like I would terrible writing by a human and move on.

It’s frankly depressing when (2018) oldies-but-goodies get reposted here for the Nth time. The clarity of thought and obvious effort that went into communicating that thought was expected for top-voted posts at the time. Now those posts appear exceptional in this era’s standard of “the LLM just cleaned up my notes” slop.

Bjartr•about 1 hour ago

If nothing else, it should be done as a courtesy to those who would like to avoid such content.

If the result is better for having used AI, why wouldn't an author want to disclose it?

NeutralCrane•31 minutes ago

I think the need to jump through hoops to disclose anything and everything that might offend someone's particular sensibilities is a losing battle. What if I want a disclosure on whether the content is hosted on AWS vs some non-megacorp that agrees with my sensibilities more? Or on whether the power used by the data center is renewable? Or a disclosure of the author's every political position, so I know whether I agree with them and whether I should amplify their message and/or generate ad revenue through their site?

At the end of the day, the ideas within the content are what matters. An idea has or does not have merit regardless of whether it was produced entirely by a person, by a person using AI as an editor, or 100% by AI. If you need a disclosure on whether an idea was produced by AI, you are saying that you have no interest in debating the content on the grounds of the arguments it is making, while simultaneously conceding that you can't tell the difference between someone using AI and someone who isn't (which undermines one of the primary arguments against AI, that it makes for inferior outputs).

lijok•about 1 hour ago

Should they disclose the use of a spellchecker? A translation app? Grammarly? A writing tutor?

exceptione•about 1 hour ago

It is nice that the author mentioned F#, because if you want to target the browser (or any JavaScript runtime), you can do so from F# directly via Fable (https://fable.io). This lets you program in a type-safe manner by default without having to play tricks to get around the limits of structural typing.

robrenaud•17 minutes ago

I suspect idiomatic TypeScript or idiomatic F# are both way better solutions in the real world than abstruse Typescript emulating idiomatic F#.

rzmmm•about 1 hour ago

Is there a benefit to using this branded type over just encapsulating the raw string in a private variable in a closure or class? This feels a bit like forced nominal typing. The Email type doesn't have to be a string; it can be encapsulated so that invalid Emails are not representable.

iainmerrick•about 1 hour ago

The main advantage of branding is that it’s a zero-cost abstraction -- the boilerplate vanishes at runtime. Just using a string instead of a containing object can give you a lighter-weight runtime.
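A side-by-side sketch of the two options, with hypothetical names and a placeholder `@` check: a class with a private constructor and field, which type-checked code can't construct except through `parse` but which allocates a wrapper object, versus a brand, which is the original string at runtime but can still be forged with an explicit `as`. (TypeScript's `private` is compile-time only; `#private` fields would also hide the value at runtime.)

```typescript
// Hypothetical names, for illustration.
// Option 1: encapsulation. The only way in is Email.parse; the value is an object.
class Email {
  private readonly value: string;

  private constructor(value: string) {
    this.value = value;
  }

  static parse(raw: string): Email | null {
    return raw.includes("@") ? new Email(raw) : null;
  }

  toString(): string {
    return this.value;
  }
}

// Option 2: branding. Zero runtime cost: an EmailBrand *is* the original string.
type EmailBrand = string & { readonly __brand: "Email" };

function parseEmailBrand(raw: string): EmailBrand | null {
  return raw.includes("@") ? (raw as EmailBrand) : null;
}
```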

somat•about 1 hour ago

"TypeScript is structurally typed, which means two types with the same shape are the same type. string is string is string"

I don't speak TypeScript, so I'm probably missing something obvious. But why would you parse an email (or anything, really) into a string (or string equivalent)? When parsed, it will end up as a specific email object, that is, something closer to a C struct. What is the article's dance doing?

exceptione•41 minutes ago

Javascript doesn't have structs. The idea is that you have data on one hand and you have type witness about that data on the other hand. Type witness is something for the type system. But here you encounter the limits of structural typing versus nominal typing, because structural typing isn't able to witness that directly.

In sufficiently nominal type systems, I can hide the constructor for an EmailAddress type (as in: nobody can just construct an EmailAddress value). In Haskell speak, I can then export a function `parseEmailAddress :: String -> Maybe EmailAddress`. The function parseEmailAddress is the only place that has access to the constructor, which means the only way to turn a string into an EmailAddress is by calling parseEmailAddress.

Note that at runtime EmailAddress is just a string. The boundaries live in the type system, not at the value level. A structural type system (as in TypeScript) does not enable that; it forces you to turn EmailAddress into something other than just a string.

Are you confusing Email with EmailAddress? I think in many cases you would prefer EmailAddress to be represented as a dumb string at runtime. But even if you don't, you will easily find other examples where you have two structurally similar types that you don't want to mix up.
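A TypeScript approximation of the hidden-constructor idea, sketched under the assumption that this file is its own module: the brand key is a `unique symbol` that is never exported, so `parseEmailAddress` is the only sanctioned way to produce an `EmailAddress`, while at runtime the value stays a plain string. Unlike the Haskell version, this can still be bypassed with `as`; it makes forging explicit rather than impossible. `domainOf` is a hypothetical helper and the validation rule is a placeholder.

```typescript
// Never exported: code outside this module has no way to name the brand key.
declare const emailBrand: unique symbol;

export type EmailAddress = string & { readonly [emailBrand]: true };

// The one sanctioned constructor. At runtime the result is just the (trimmed) string.
export function parseEmailAddress(raw: string): EmailAddress | null {
  const trimmed = raw.trim();
  return /^[^@\s]+@[^@\s]+$/.test(trimmed) ? (trimmed as EmailAddress) : null;
}

// EmailAddress is a subtype of string, so string methods and string parameters still accept it.
export function domainOf(email: EmailAddress): string {
  return email.slice(email.indexOf("@") + 1);
}
```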

somat•8 minutes ago

Javascript does have structs, it calls them objects.

If I parsed an email address, the thing that came out would look like {'domain':'example.com', 'user':'john-doe'}, i.e. emailaddr.domain and emailaddr.user, plus an emailaddr.address method if you like that form. Even if what I parsed ended up as a single string-like field, I would still name that field: emailaddr.address.

Salutes for the bit on hiding the constructor, that makes a lot of sense.

It probably does not help that in my one attempt at making a JavaScript web application I did not bother trying to understand how JavaScript likes its objects and just forced a Python-looking model onto it. If any of the web development team saw my code I would definitely get laughed out of the club.
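For comparison, a sketch of the structured alternative described here: parse into an object with named parts, so later code reads `addr.domain` instead of re-splitting a string. `ParsedEmail` and `parseEmail` are hypothetical, and splitting on the last `@` is a simplification of the real address grammar.

```typescript
// Hypothetical names, for illustration.
interface ParsedEmail {
  readonly user: string;
  readonly domain: string;
  readonly address: string; // normalized full form, for places that need a string
}

function parseEmail(raw: string): ParsedEmail | null {
  const at = raw.lastIndexOf("@");
  if (at <= 0 || at === raw.length - 1) {
    return null; // missing "@", empty local part, or empty domain
  }
  const user = raw.slice(0, at);
  const domain = raw.slice(at + 1).toLowerCase(); // domains are case-insensitive
  return { user, domain, address: `${user}@${domain}` };
}
```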

camdenreslink•about 1 hour ago

In some languages you can create a type that is equivalent to a string but is its own distinct type (sometimes called the newtype pattern). I guess that's the same as a struct with a single field, but those languages have syntactic sugar for it and, depending on the implementation, don't allocate an extra wrapper object on the heap (which would happen in JavaScript/TypeScript).

LelouBil•about 1 hour ago

Look up NewTypes.

The article's dance is to avoid having extra fields that are completely unnecessary here. They want some kind of nominal email type, that is actually a string, so can be used in places where a string is needed, but when a method requires an "email" you can't use any string.

It's a pretty common pattern in functional programming, and in many other languages nowadays.

wwalexander•41 minutes ago

This is just validation that is using the type system to indicate the validation has already occurred. I think the real point of “parse, don’t validate” is to make the type system give you structural guarantees that couldn’t exist otherwise (e.g. always having a first/last element in the NonEmpty example from the original article). If you’re just branding the types as “parsed” (in reality, simply validated) you still have to know that the invariants you care about hold when using the “parsed” type (e.g. splitting the email type using “@” will always yield 2 elements), instead of the structure of the type holding that info inherently (e.g. struct Email { name: String, host: String }).
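The original article's NonEmpty is Haskell's `NonEmpty` list; a common TypeScript stand-in is a tuple type with a required first element, so the structure itself carries the guarantee instead of a brand. A sketch, with `NonEmpty`, `parseNonEmpty`, and `head` as hypothetical names:

```typescript
// Hypothetical names, for illustration.
// At least one element, guaranteed by the shape of the type rather than a flag.
type NonEmpty<T> = [T, ...T[]];

function parseNonEmpty<T>(xs: readonly T[]): NonEmpty<T> | null {
  // The length check is what justifies this single cast.
  return xs.length > 0 ? ([...xs] as NonEmpty<T>) : null;
}

// No `| undefined` in the return type: the first element always exists.
function head<T>(xs: NonEmpty<T>): T {
  return xs[0];
}
```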

jerf•25 minutes ago

"This is just validation that is using the type system to indicate the validation has already occurred. I think the real point of “parse, don’t validate” is to make the type system give you structural guarantees that couldn’t exist otherwise (e.g. always having a first/last element in the NonEmpty example from the original article)."

It's the same thing. In the latter case, something has validated that your NonEmpty has a first and a last element. It's all validation before you stick it in a type that asserts that the validation is guaranteed to have occurred so every function receiving it doesn't need to do it itself.

Any non-trivial use of a type system will involve making guarantees the type system itself cannot actually express [1]. There's nothing wrong with saying "this is a valid email in accordance with my standards" in a type. Merely using the type system to assert "I have some sort of value in the name and host fields" is valid but a degenerate use. "struct Email { name: Name, host: Hostname }" is an even stronger use of the type system, where Name and Hostname are themselves values you can only get by passing some incoming string through a validation process. Asserting that these things exist is just the most basic check possible, but your type still permits {name: "\0\0\0\0\0\0", host: "!"}, whereas under my definition, assuming that Name and Hostname are reasonably defined, that value can never be witnessed.

In fact in general, while I don't absolutely rigidly apply this, especially in smaller script-like programs, when a "string" appears in my strong types that specifically means "this has unbounded contents". It's an appropriate type for "stuff I got off a network" or "stuff a user typed". What stuff? Don't know. Haven't checked it yet. When I do it'll get a more specific type like a Username or DecodedUTF8String or something else. Thanks to people using way too many "strings" and "ints" in the world I have to constantly explain to my LLM that I want stronger types. I'm yet to find the invocation to put into my CLAUDE.md or equivalent to get it to do it right the first time consistently.

[1]: With a wistful stare into the distance acknowledging the theoretical utopia of dependent types... but it doesn't seem to be coming down from "theoretical" any time soon.
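A sketch of that stronger version, where each field is itself a parsed type, so the `{name: "\0\0\0\0\0\0", host: "!"}` value from the comment can't come out of the parser. `Name`, `Hostname`, `StrictEmail`, and `parseStrictEmail` are hypothetical, and both rules are placeholders far looser than the relevant RFCs.

```typescript
// Hypothetical names and placeholder rules, for illustration.
type Name = string & { readonly __brand: "Name" };
type Hostname = string & { readonly __brand: "Hostname" };

interface StrictEmail {
  readonly name: Name;
  readonly host: Hostname;
}

// A conservative local-part alphabet...
function parseName(s: string): Name | null {
  return /^[A-Za-z0-9._%+-]+$/.test(s) ? (s as Name) : null;
}

// ...and dot-separated labels (letters, digits, inner hyphens), at least two of them.
const LABEL = "[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?";
const HOSTNAME_RE = new RegExp(`^${LABEL}(?:\\.${LABEL})+$`);

function parseHostname(s: string): Hostname | null {
  return HOSTNAME_RE.test(s) ? (s as Hostname) : null;
}

// A StrictEmail can only be assembled from parts that each passed their own parser.
function parseStrictEmail(raw: string): StrictEmail | null {
  const at = raw.lastIndexOf("@");
  if (at < 0) return null;
  const name = parseName(raw.slice(0, at));
  const host = parseHostname(raw.slice(at + 1).toLowerCase());
  return name !== null && host !== null ? { name, host } : null;
}
```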

hankbond•about 2 hours ago

As a new TypeScript user, I've found these concepts have greatly helped me simplify my code and improve reliability independent of testing. Many LLMs guide you in this direction if you loosely ask them, but having a concise post like this with the what and the why is fantastic as reference material. The suggestion to use Separation and a Linter rule is something I'm going to look into immediately for my current project. Great post!

roywiggins•10 minutes ago

ai; dr, unfortunately

ivolimmen•about 1 hour ago

One of the pillars of Domain Driven Design. I love working on a pure DDD application but I do not often convince my team (I am a constant) that this is the best way ...

jve•about 1 hour ago

> I am a constant

What did you mean by that? You don't accept mutability or any inputs on your state of mind?

ramses0•about 1 hour ago

Meta: in addition to upvotes and downvotes, we almost need a slop/not-slop slider.

This one barely scrapes by at what feels like 30-40% "slop": "honestly", "the one thing", etc...

...but I did learn something about "Brand" types, and have personally tried to do more of "parse don't validate" in my own code.

Recently I did a similar trick for `exec( ValidExecutable(...) )` [python], where it required tagging/washing through a private function/variable to "get" the private bit.

All the scanners tend to light up when they see "exec" at all (eg: `exec( "pandoc" )` for PDF generation), but I needed to hard code a few "expected" pandoc locations so the imaginary hackers couldn't shadow "pandoc" on a path location they controlled.
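The commenter's version is Python; as a TypeScript sketch of the same idea, with hypothetical names and paths, a `ValidExecutable` can only come from resolving a program name against a hard-coded allowlist of absolute paths, so a `pandoc` planted earlier on `PATH` is never picked up. The sketch stops before spawning anything; the file-existence check is injected (in Node it might be `fs.existsSync`) so the logic stays testable.

```typescript
// Hypothetical names and paths, for illustration.
type ValidExecutable = string & { readonly __brand: "ValidExecutable" };

// Hard-coded absolute locations we trust; PATH is never consulted.
const TRUSTED_LOCATIONS: Readonly<Record<string, readonly string[]>> = {
  pandoc: ["/usr/bin/pandoc", "/usr/local/bin/pandoc", "/opt/homebrew/bin/pandoc"],
};

// The only way to obtain a ValidExecutable.
function resolveExecutable(
  program: string,
  exists: (path: string) => boolean,
): ValidExecutable | null {
  const candidates = TRUSTED_LOCATIONS[program] ?? [];
  for (const path of candidates) {
    if (exists(path)) {
      return path as ValidExecutable;
    }
  }
  return null;
}
```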

conartist6•about 2 hours ago

Don't forget to freeze the objects
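A brand only says a value was checked; if the parsed value is an object, nothing stops later code from mutating it back into an invalid state. A sketch of freezing at the parse boundary, with `Money` and `parseMoney` as hypothetical names. Note that `Object.freeze` is shallow, and writing to a frozen object only throws in strict mode (which ES modules and class bodies always use); in sloppy mode the write is silently ignored.

```typescript
// Hypothetical names, for illustration.
type Money = Readonly<{ amountCents: number; currency: "EUR" | "USD" }>;

function parseMoney(amountCents: number, currency: string): Money | null {
  if (!Number.isInteger(amountCents) || amountCents < 0) return null;
  if (currency !== "EUR" && currency !== "USD") return null;
  // Readonly<> stops writes at compile time; freeze stops them at runtime too.
  return Object.freeze({ amountCents, currency });
}
```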