GenCAD
438
Discussion (126 comments), from Hacker News
I noticed on their GitHub that they mention it is only around 60% reliable even on their own training data, but the image shown on the front page feels pretty misleading. I made 10 images very similar in complexity to the examples shown, and even after running it around 50 times on each image, not a single one worked correctly. In the rare cases where it produced something, the output was completely wrong.
This seems pretty misleading in its current state and definitely needs more work.
In the limitations section they are quite direct about it: "images used .. are mostly isometric and noise-free CAD-images" and "limited CAD vocabulary used in this study needs to be extended by including more sophisticated CAD tokens such as revolve operation, edge operation (e.g., fillets/chamfers)". It currently supports only a series of pads (which can be subtractive). Many simple beginner exercises of seemingly similar complexity don't fit those constraints. Some of the parts shown could be created more naturally using a wider set of tools, but technically they can be created using only pads.
So even if you tried with a clean screenshot of a simple 3D model, it will likely fail if the camera settings aren't right, and it will fail if the model can't be represented as a series of pad operations. Anything containing spheres, cones, nearly all lathe parts, or fillets that can't be included in 2D sketches will fail. In theory arbitrary extrusion angles are supported, but all the examples showed only axis-aligned parts.
That said, I wouldn't be surprised if it failed even if your attempts had respected the exact limitations rather than just matching the complexity.
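To make the "series of pads" limitation concrete, here is a minimal sketch of what a pad-only CAD program might look like, assuming axis-aligned rectangular profiles extruded along z and evaluated on a coarse voxel grid. The `Pad` record and the `evaluate` function are hypothetical illustrations, not GenCAD's actual token format; the point is that an operation such as a revolve or a fillet simply has no representation in this vocabulary.

```python
from dataclasses import dataclass

Voxel = tuple[int, int, int]

# Hypothetical, simplified vocabulary: additive pads and subtractive pads only.
SUPPORTED_OPS = {"add", "cut"}


@dataclass(frozen=True)
class Pad:
    """Extrude the rectangle [x0, x1) x [y0, y1) along +z from z0 by `depth` cells."""
    x0: int
    y0: int
    x1: int
    y1: int
    z0: int
    depth: int
    op: str = "add"  # "add" = pad, "cut" = subtractive pad (pocket)


def evaluate(program: list[Pad], size: int) -> set[Voxel]:
    """Replay the operations in order on a size^3 voxel grid and return the filled cells."""
    solid: set[Voxel] = set()
    for step in program:
        if step.op not in SUPPORTED_OPS:
            raise ValueError(f"unsupported operation: {step.op!r}")
        cells = {
            (x, y, z)
            for x in range(max(step.x0, 0), min(step.x1, size))
            for y in range(max(step.y0, 0), min(step.y1, size))
            for z in range(max(step.z0, 0), min(step.z0 + step.depth, size))
        }
        if step.op == "add":
            solid |= cells
        else:
            solid -= cells
    return solid
```

A 10 x 10 x 2 plate followed by a 2 x 2 through-cut leaves 200 - 8 = 192 cells. Because the program is replayed in order, putting the cut before the plate would leave the hole filled, which is one reason the operation sequence, not just the final shape, is what such a model has to get right.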
Assuming the images in the Google Drive are the training data, "mostly isometric and noise-free CAD images" is a bit of an understatement. All of them are at very specific angles and use a single style: a solid gray fill for "images", and a white fill with odd shading around the perimeter for "sketches", both on a white background with pixelated, non-antialiased 1px lines. There's no reason to expect it to be capable of processing anything that doesn't have exactly the same visual style. For a practical product that would have to be solved, but that wasn't really the point of this research. All the style-transfer papers have shown that it's a more or less solvable problem, but for a paper exploring which model architecture and 3D representation work best for AI CAD, it seems like an unnecessary distraction that would only bloat the training cost and time. The most annoying part is that it makes it hard to test the model with your own inputs.
So I would really appreciate a good AI/LLM tool that I can feed my sketch and parameters to, and that can save me hours of searching the web and watching tutorials on how to extrude a circle along a curve.
BTW, existing AI tools work really well with OpenSCAD, so if you want a parametrized model that can be made out of simple shapes, I highly recommend this flow.
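As a rough illustration of the kind of "parametrized model made out of simple shapes" that works well in this flow, the sketch below emits OpenSCAD source for a plate with a centred through-hole, keeping each dimension as a top-level OpenSCAD variable so the file stays editable. The generator is written in Python only to keep the examples in one language; the function name and the example part are hypothetical, and in practice an LLM would usually be asked to write the OpenSCAD text directly.

```python
def plate_with_hole_scad(width: float, depth: float, thickness: float,
                         hole_d: float, segments: int = 64) -> str:
    """Emit OpenSCAD source for a rectangular plate with a centred through-hole.

    The dimensions become top-level OpenSCAD variables, so the generated file stays
    parametric: a person (or an LLM) can edit one number and re-render.
    """
    if not 0 < hole_d < min(width, depth):
        raise ValueError("hole must fit inside the plate")
    return f"""// Hypothetical example part; all dimensions in mm
width = {width};
depth = {depth};
thickness = {thickness};
hole_d = {hole_d};

difference() {{
    cube([width, depth, thickness]);
    // Overshoot the cut by 1 mm above and below to avoid coincident faces.
    translate([width / 2, depth / 2, -1])
        cylinder(h = thickness + 2, d = hole_d, $fn = {segments});
}}
"""
```

The `difference()` of a `cube` and a `cylinder` is the CSG-style "simple shapes" modelling the comment refers to; anything needing fillets on generated edges is where this style starts to struggle.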
There are mechanical engineers out there who can literally model objects nearly as fast as they can 'think' about the layout of said object. If you look at the complexity of, say, a CAD model of a real, highly complex aluminum casting section of an automotive subframe, or the living-room-sized cross-fuselage spar forging of a fighter aircraft, with hundreds of ribs, fillets, and features, and compare that to the simple model you are trying to make in OpenSCAD, you should quickly realize the difference in difficulty (similar to a person without knowledge of C++ or Python watching someone build applications by typing code as if they already knew exactly what to type...).
You are struggling for a few reasons. 1. It is a knowledge hurdle of an entire field you are trying to surmount. Again, go watch someone actually model a real, complex part in a tool like SolidWorks, CATIA, NX, etc., and watch the speed; they work at a rate that is far different because they have experience that can honestly take even good people years to accumulate. 2. They are using professional tools. You mention OpenSCAD like it is CAD, but it isn't. It is programmatic mesh generation, and it turns out that programmatically typing out how to generate complex things is much more difficult than the combination of a GUI and a graph-based generator that all big CAD programs figured out starting in the 1980s. If the tools you use were really the best way to make complex models 'parameterized', then why do we design our fighter jets, Formula 1 cars, or SpaceX rockets in Dassault's CATIA or Siemens NX?
You want an LLM to turn a sketch into CAD, but what I'm saying is that there are people out there who can skip the sketch and build the CAD as fast as you can likely hand-draw the first sketch. These are skills you can actually learn; you may just be using the wrong tools and not have had the necessary practice.
Like it's purposefully built to be unusable.
I think this is possible, but the ‘trick’ would be translating your instructions in English into some kind of language that the CAD software understands.
I’m on a bunch of 3D printing forums, and everyone tries to describe what the finished product would LOOK like. They end up making PICTURES when what they really want is an STL file.
Two dimensions are easier to visualize than three, so let’s put it this way:
If you wanted to turn “English” into “a 2D image that’s dimensionally accurate”, you’d want to translate from “English” to “SVG.”
SVG is dimensionally accurate. JPG isn’t: the file format itself has no concept of “dimensions”, only “pixels.”
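A small sketch of the SVG point, using only the Python standard library: the root element declares its physical size in millimetres, and a `viewBox` with the same numbers makes one user unit equal one millimetre, so every coordinate in the file is a real dimension. The function name and the example part (a plate with a centred hole) are hypothetical.

```python
def plate_svg(width_mm: float, height_mm: float, hole_d_mm: float) -> str:
    """Return an SVG drawing of a plate with a centred hole, with 1 user unit = 1 mm."""
    # Stroke width is in user units, i.e. 0.2 mm here.
    stroke = 'fill="none" stroke="black" stroke-width="0.2"'
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width_mm}mm" height="{height_mm}mm" '
        f'viewBox="0 0 {width_mm} {height_mm}">\n'
        f'  <rect x="0" y="0" width="{width_mm}" height="{height_mm}" {stroke}/>\n'
        f'  <circle cx="{width_mm / 2}" cy="{height_mm / 2}" r="{hole_d_mm / 2}" {stroke}/>\n'
        "</svg>\n"
    )
```

Printed at 100% scale, `plate_svg(80, 50, 10)` should come out 80 mm wide with a 10 mm hole, because the size lives in the file rather than in a pixel count.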
I haven't tried telling AI to "make a thing," but I'm able to get Copilot to refactor code. It's just the geometry that makes my head spin.
It will take much longer than a day for AI to get to this level, so there's not much to lose by just learning how to use the software now :)
From the outside, the hard part of designing a chair is making a blueprint. At least making a blueprint looks hard to people who've never made one. According to outsiders, the next layer of the onion is perhaps inserting reasonable constraint dimensions for similar reasons.
From the inside, as a guy who's recreationally made furniture, the hard part is judgment about joint selection and design, and experience with wood warping (all wood changes shape with the seasons; a good woodworker makes it look easy to work around, and a bad one makes expensive firewood that rapidly falls apart). Another insider PoV is judgment about wood selection to get the correct balance of final finish durability and appearance. Finally, working toward the outer layers of the onion, it's time to make parametric joint design decisions... What's the ideal number and size of dovetail joints for, perhaps, a drawer?
I've seen prints of chairs before; I don't need an LLM to make one similar to the ones I've seen and could probably make from memory (at least ones I built myself), and the library has loanable books and woodworking magazines. I do see the attraction from the outside.
Consider something like a Windsor chair. The larger the wedge in the spindles, the tighter and longer-lasting the chair, until you break something trying to force them in. There's a lot of judgment and experience in designing, selecting, and installing spindles, but none of it is written down, so it'll be hard to train an LLM... Tighten it until it breaks, then don't tighten it that much next time. Most super-detailed plans for Windsors are for inferior machine-produced replicas, which are not necessarily useful for a fine woodworker and are not exactly what craftsmen would aspire to. People who want "a cheap chair" will buy a 4-pack of folding chairs from Walmart anyway, not make a homemade Windsor-style chair.
Another, somewhat blunter example: for actual woodworkers, the "problem" with hand-cut dovetails isn't knowing what they look like or how to draw a diagram of one, but gaining the experience behind a hammer and chisel to push your luck while cutting them as far as possible without going too far and turning the part into scrap. One unavoidable part of woodworking is that I've turned quite a bit of wood into scrap on the last step; oh well, make another. At least I can burn scrap wood to keep warm LOL.
It's kind of like how, from outside the programming fraternity, non-programmers think the only skills required to program are typing really fast and being very experienced at FizzBuzz during interviews. But that doesn't work IRL, from an inside-out perspective.....
The woodworking world is not exactly lacking for a library of "semi-decent" plans. An automated system to make enormous quantities of low quality unverified and untested plans would not really help the field, no.
So the reason $work-1 spent so much time on this was quite logical. When you have point clouds generated from crappy head-mounted cameras, you get models that are very complex.
For example, if you look at a point cloud of an Ikea LACK (https://www.ikea.com/gb/en/p/lack-nest-of-tables-set-of-2-wh...), it will be massively complex. This means that when you want to perform any kind of interaction with it, it's computationally difficult (https://www.researchgate.net/publication/221064696/figure/fi...).
So an active area of research is point cloud to "CAD" model (i.e. simplified, where a LACK table would be ~40 triangles rather than 400k).
One of those ways is to say "oh, this point cloud looks like a table, let's generate a bunch of hypothesis tables and see if they fit." One way to do that is to have a model that understands parametric CAD and can create a number of tables whose parameters can be adjusted until one fits.
A perhaps easier way is to take a point cloud, use an image model trained on CAD models to draw it as 2D imagery, then use something like this to get an actual model out.
It's not efficient, but it might work.
There are also lots of other cases, like automatic plagiarism, which are less good.
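The "generate hypothesis tables and see if they fit" idea can be sketched as a grid search over the parameters of a simple parametric model, scoring each candidate by the mean distance from the scan points to its surface. This is a minimal sketch under loud assumptions: the table model (a top slab plus four legs), its dimensions in metres, the noise-free synthetic scan, and the candidate grid are all hypothetical, and a real system would use a proper optimizer on a noisy, partial scan.

```python
import itertools
import math
import random

Point = tuple[float, float, float]
Box = tuple[Point, Point]  # (centre, half-extents)
Params = tuple[float, float, float]  # (width, depth, height), assumed metres


def box_sdf(p: Point, box: Box) -> float:
    """Signed distance from p to an axis-aligned box (negative inside)."""
    centre, half = box
    q = [abs(p[i] - centre[i]) - half[i] for i in range(3)]
    outside = math.sqrt(sum(max(v, 0.0) ** 2 for v in q))
    return outside + min(max(q), 0.0)


def table_boxes(width: float, depth: float, height: float,
                top: float = 0.05, leg: float = 0.05) -> list[Box]:
    """Hypothetical parametric table: one top slab plus four square legs."""
    boxes: list[Box] = [((0.0, 0.0, height - top / 2), (width / 2, depth / 2, top / 2))]
    leg_half_h = (height - top) / 2
    for sx, sy in itertools.product((-1, 1), repeat=2):
        centre = (sx * (width - leg) / 2, sy * (depth - leg) / 2, leg_half_h)
        boxes.append((centre, (leg / 2, leg / 2, leg_half_h)))
    return boxes


def fit_error(points: list[Point], params: Params) -> float:
    """Mean unsigned distance from the scan points to the hypothesis surface."""
    boxes = table_boxes(*params)
    return sum(abs(min(box_sdf(p, b) for b in boxes)) for p in points) / len(points)


def fake_scan(width: float, depth: float, height: float,
              n: int = 400, top: float = 0.05, seed: int = 0) -> list[Point]:
    """Synthetic noise-free 'scan': points on the slab's top face and four sides."""
    rng = random.Random(seed)
    points: list[Point] = []
    for _ in range(n):
        x = rng.uniform(-width / 2, width / 2)
        y = rng.uniform(-depth / 2, depth / 2)
        z = rng.uniform(height - top, height)
        face = rng.randrange(5)
        if face == 0:
            points.append((x, y, height))  # top face
        elif face in (1, 2):
            points.append(((-1) ** face * width / 2, y, z))  # x = -/+ width/2
        else:
            points.append((x, (-1) ** face * depth / 2, z))  # y = -/+ depth/2
    return points


def best_hypothesis(points: list[Point], widths, depths, heights) -> Params:
    """Grid search: try every candidate table and keep the best-fitting one."""
    candidates = itertools.product(widths, depths, heights)
    return min(candidates, key=lambda c: fit_error(points, c))
```

Fitting `fake_scan(0.55, 0.55, 0.45)` against a 3 x 3 x 3 grid that contains those values should recover them with near-zero error. The winning hypothesis is just five boxes described by three numbers, which is the kind of compact, editable representation the comment contrasts with a raw point cloud.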
Basically leverage the randomness to create many variations, then select the most accurate variation automatically.
Terribly wasteful of time and processing power, but so is using GPU time to make pretty pictures randomly.
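The "many variations, pick the best" idea is best-of-N sampling. A minimal sketch, assuming you have some stochastic generator and some automatic error measure (for a CAD model, that might be a comparison between a rendering of the candidate and the input image); both are stand-ins here, as is the toy usage at the end.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def best_of_n(generate: Callable[[random.Random], T],
              error: Callable[[T], float],
              n: int, seed: int = 0) -> tuple[T, float]:
    """Draw n candidates from a stochastic generator and keep the one with the lowest error."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    best = generate(rng)
    best_err = error(best)
    for _ in range(n - 1):
        candidate = generate(rng)
        err = error(candidate)
        if err < best_err:
            best, best_err = candidate, err
    return best, best_err


# Toy usage: the "generator" is a noisy guess at a target, the error is the distance to it.
TARGET = 42.0


def noisy_guess(rng: random.Random) -> float:
    return TARGET + rng.gauss(0.0, 5.0)


def distance(candidate: float) -> float:
    return abs(candidate - TARGET)


_, err_1 = best_of_n(noisy_guess, distance, n=1)
_, err_50 = best_of_n(noisy_guess, distance, n=50)
```

With the same seed, the first candidate is identical in both runs, so `err_50` can never exceed `err_1`; the price is 50 generator calls instead of one, which is the wastefulness the comment acknowledges.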
https://github.com/cjtrowbridge/vibe-modeling
I would even argue that for basic modelling, the majority of tools/features in CAD operate at an abstraction closer to CSG for describing the "what", and B-rep is only treated as the "how". Just as a good chunk of code-based CAD uses a combination of CSG for the "what" and a triangle-mesh-based geometry engine for the "how". That's assuming you consider standard 2D-to-3D operations (extrude, revolve, sweep along an arbitrary 2D profile) valid primitives for CSG.
The user comes into direct contact with B-rep in very specific situations: 1) doing operations like fillets/chamfers/draft/thickness based on intermediate geometry; 2) attaching sketches or other features to generated geometry, or using generated edges (instead of new sketches) to guide operations like complex sweeps; 3) surface-based modelling workflows where you build up the solid from individual faces, typically including complex curved surfaces.
In cases 1 and 2 the dependency on a B-rep representation is only marginal. In theory you could select edges in a triangle-mesh-based underlying representation, but the quality of the result wouldn't be as nice, and TNP (topological naming problem) issues for parametric model editing would likely be even bigger than they are for existing CAD. That's not really CSG territory anymore, but it isn't exclusive to B-rep either, and it involves a bunch of work that's outside the scope of B-rep. In non-parametric mesh modellers with more destructive editing workflows, like Blender, chamfers and fillets work fine. If anything, for reliable parametric models you often want to limit dependencies on intermediate geometry, since that depends on the CAD tool keeping track, outside the B-rep, of where each edge/face originated, which increases the chance of TNP issues.
Case 3 is critical for industrial design containing large amounts of complex curved surfaces, like cars and other consumer products, but there are also many more technical parts where it can be completely ignored. Beginner CAD tutorials almost completely ignore this category of CAD modelling. The point about not being exclusive to B-rep also applies to surface modelling.
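A minimal sketch of the "CSG for the what" view: the part is a tree of primitives combined with boolean operators, and the only query implemented is point membership. The class names are hypothetical; a real CAD tool would evaluate such a tree into a B-rep (or, in many code-based tools, a triangle mesh) rather than classifying points one at a time, and it would use regularized booleans rather than this naive closed-set logic.

```python
import math
from dataclasses import dataclass
from typing import Protocol

Point = tuple[float, float, float]


class Shape(Protocol):
    def contains(self, p: Point) -> bool: ...


@dataclass(frozen=True)
class Box:
    """Axis-aligned box from corner lo to corner hi (boundary counts as inside)."""
    lo: Point
    hi: Point

    def contains(self, p: Point) -> bool:
        return all(a <= c <= b for a, c, b in zip(self.lo, p, self.hi))


@dataclass(frozen=True)
class Cylinder:
    """Cylinder along z: axis through (cx, cy), radius r, from z0 to z1."""
    cx: float
    cy: float
    r: float
    z0: float
    z1: float

    def contains(self, p: Point) -> bool:
        x, y, z = p
        return math.hypot(x - self.cx, y - self.cy) <= self.r and self.z0 <= z <= self.z1


@dataclass(frozen=True)
class Unite:
    a: Shape
    b: Shape

    def contains(self, p: Point) -> bool:
        return self.a.contains(p) or self.b.contains(p)


@dataclass(frozen=True)
class Subtract:
    a: Shape
    b: Shape

    def contains(self, p: Point) -> bool:
        return self.a.contains(p) and not self.b.contains(p)


@dataclass(frozen=True)
class Intersect:
    a: Shape
    b: Shape

    def contains(self, p: Point) -> bool:
        return self.a.contains(p) and self.b.contains(p)
```

For example, `Subtract(Box((0, 0, 0), (10, 10, 2)), Cylinder(5, 5, 1.5, -1, 3))` describes a plate with a through-hole entirely in terms of "what"; nothing in the tree says which edges or faces result, which is exactly the information a fillet on a generated edge would need from a B-rep.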
It's analogous to "all squares are rectangles, but not all rectangles are squares" (squares=CSG, rectangles=BREP)
CSG by itself isn't suitable for most CAD use-cases.
What is your workflow for llm integration to openscad?
For very simple geometries it works great, but it very quickly becomes apparent that there’s a bit of a disconnect between “LLM views image” and “LLM emits scad that looks like that image” when it comes to anything non-trivial.
Still gives me a starting point I can mess with, which is great since I have zero CAD training or experience.
(I’m not the commenter you replied to)
I've one-shotted a lightsaber hilt with threaded parts and it worked flawlessly.
In comparison, here's one of my recent designs: what I would still call a very simple case [4]. And it's not like I'm a trained mechanical engineer working commercially, this is stuff I design in my spare time as a programmer.
[1] - https://github.com/cjtrowbridge/vibe-modeling/blob/main/outp...
[2] - https://github.com/cjtrowbridge/vibe-modeling/blob/main/outp...
[3] - https://github.com/cjtrowbridge/vibe-modeling/blob/main/outp...
[4] - https://object.ceph-eu.hswaw.net/q3k-personal/fe3e54e6df604a...
Ironically the former is engineered to avoid the latter.
1Blocker and AdGuard.
For 1Blocker I have everything enabled except adult content and scripts. AdGuard has everything turned off except for General.
These are just the free versions too.
I've been using this setup for a few years and it works close to 100% of the time.
I also wrote a bit about what goes into CAD apps! https://campedersen.com/tessellation
I can't help but be skeptical of one person writing ~115k LOC in 4 months, and that's just the Rust crates, never mind the frontend (which is another 100k LOC!!!).
I'm curious why you decided to go with "eager" tessellation. Creating a circle immediately results in a bunch of lines which resemble a circle but would fail under tangency constraints quickly. Is this a current limitation or part of the strategy for the kernel?
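The tangency problem with eager tessellation can be quantified. If a circle of radius r is replaced by a regular n-gon inscribed in it, a line tangent to the true circle midway between two vertices misses the polygon by the sagitta r(1 - cos(π/n)), so an exact tangency constraint can never be satisfied against the polygon. A small sketch that measures this gap numerically (the function names are hypothetical):

```python
import math


def polygon_circle(r: float, n: int) -> list[tuple[float, float]]:
    """Eagerly tessellate a circle of radius r (centred at the origin) into n vertices."""
    return [(r * math.cos(2 * math.pi * k / n), r * math.sin(2 * math.pi * k / n))
            for k in range(n)]


def tangency_gap(r: float, n: int) -> float:
    """Gap between the n-gon and a line tangent to the true circle at angle pi/n.

    The tangent line is {p : p . u = r} with unit normal u = (cos t, sin t).
    The exact circle touches it (gap 0); the polygon's closest approach is
    max(v . u) over its vertices, which falls short of r.
    """
    t = math.pi / n  # halfway between vertex 0 and vertex 1: the worst case
    ux, uy = math.cos(t), math.sin(t)
    reach = max(x * ux + y * uy for x, y in polygon_circle(r, n))
    return r - reach
```

For r = 10 and n = 32 the gap is about 0.048 (in the circle's units); doubling n shrinks it roughly fourfold but never to zero, which is why an analytic circle, not a polyline, is what a constraint solver needs to see.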
In my own project I use LLMs very sparingly and hesitantly, but made the observation that they are not very useful on the hard parts of CAD. I expect this is because of a lack of training material. Most professional CAD applications are proprietary and books on the topic are usually sparse on implementation details. The non-BRep CAD applications such as OpenSCAD and family are probably overrepresented in the training data.
This might also explain why people's experiences with LLMs are very varied. If you stay in the happy path of CRUD web development and stuff all is nice and well, but if you start to veer off this path you get more and more challenged.
What wasn't working? I would love real feedback -
I’ve seen several vibe coded attempts at a geometric kernel (including several of my own) and this happens every time.
Vibe coding a geometric kernel is practically impossible because sooner or later* the LLM inevitably takes the tessellation shortcut, and if you don’t catch it, the codebase is completely compromised. At the end of the day, there isn’t enough training data on architecture or algorithms (OpenCASCADE, SolveSpace, and truck being the only real examples, all significantly worse than commercial kernels like Parasolid or ACIS).
* usually as soon as you ask it to do anything non-trivial. If you’re lucky you’ll get a naive Newton marching algorithm on analytical bodies, which is slightly better but has the same problem with degenerate and pathological cases (coincidence, tangents, parallels)
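The degenerate-case complaint can be seen in a one-dimensional toy: intersecting a line y = c with the unit circle using Newton's method. When the line cuts the circle (c = 0.5) the root is simple and Newton converges quadratically; when the line is tangent (c = 1) the root is double, the derivative vanishes at the solution, and convergence drops to linear. This is an illustrative reduction with hypothetical helper names, not how any particular kernel implements surface intersection.

```python
from typing import Callable

Fn = Callable[[float], float]


def newton_iterations(f: Fn, df: Fn, x0: float,
                      tol: float = 1e-12, max_iter: int = 200) -> int:
    """Run Newton's method from x0 and return the number of steps until |f(x)| < tol."""
    x = x0
    for i in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return i
        d = df(x)
        if d == 0.0:
            raise ZeroDivisionError(f"derivative vanished at x = {x}")
        x -= fx / d
    raise RuntimeError("Newton's method did not converge")


def line_circle(c: float) -> tuple[Fn, Fn]:
    """Intersect the line y = c with the unit circle: g(x) = x^2 + c^2 - 1 and g'(x)."""
    return (lambda x: x * x + c * c - 1.0), (lambda x: 2.0 * x)
```

Starting from x = 1, the transversal case converges in a handful of steps, while the tangent case needs 20, because each step only halves x. In a real surface-surface intersection the same double root shows up as a near-singular Jacobian, which is exactly where naive marching becomes slow or unstable.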
Every time I see one of these things, it's like whoever worked on it doesn't know how to use CAD or understand what CAD is used for.
Every 6 months, I reevaluate how well LLMs can model from scratch or can modify existing files.
It's very superficial (at least the last time I tried it). My guess is that if/when LLMs crack visual reasoning, they might be able to do it.
Which CAD program? I'm confused
Am I reading this right?
>Most importantly, GenCAD does not merely generate a 3D solid but also the entire CAD program.
Looks like you can go from JSON to STEP files, but not really in a way that lets you modify any of the operations.
* https://github.com/mightyhorst/DeepCAD
Clue here: > Our proposed GenCAD architecture...
So, at this point, it seems like this will work with all CAD programs, since they have yet to encounter any systems that they can't work with. More seriously, my guess would be whatever one is available for free in their lab. Kind of standard operating procedure for academic projects -- do a proof of concept, make a video that avoids known bugs, get a grade, push source to git, graduate. Good ideas come out of that... production code... eh... maybe.
More likely someone ends up in the situation my kid did: the previous graduate student's git repo is stale by 2 versions of C++ and 4 versions of ROS, and neither of the two unit tests still works after porting.
Doesn't matter. CAD models/objects are represented by a sequence of operations on a primitive or sketch, unlike meshes, which describe the manifested resulting shape of objects in 3D programs like Blender.
So the point is that their model outputs that hierarchy of operations: the history of development, not just the result.
That seems difficult enough that I have not found an open source program to load a 3D model and allow me to set the toolpaths in a UI, never mind have an LLM generate them from the model.
Actually the drawing and modeling are very much the hard part, so much so that the open source geometric kernels are decades behind the commercial ones. The computational geometry is genuinely a hard problem due to floating point errors and degenerate cases like parallel surfaces and tangent lines.
Once you have the geometric kernels, CAM is little more than a physically aware pathfinding optimization problem. Computationally expensive but otherwise straightforward. The kernel, on the other hand, has to be built up experimentally, tracking down every place where the math breaks down or there’s a pathological case, until you’ve got the thousands of special cases worked out.
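For contrast, the path-planning half of CAM can start very simply. Below is a minimal sketch of a zigzag (raster) toolpath that clears a rectangular pocket, with coordinates given as tool-centre positions inset by the tool radius; the function name and the stepover convention (a fraction of the tool diameter) are assumptions. It deliberately ignores tool engagement, feeds and speeds, depth steps, and collision checking, which is where the "physically aware" optimization actually lives.

```python
import math


def zigzag_pocket(width: float, height: float, tool_d: float,
                  stepover: float = 0.5) -> list[tuple[float, float]]:
    """Raster toolpath clearing a width x height rectangular pocket.

    Returns tool-centre waypoints; successive passes alternate direction so the
    tool never lifts between rows. `stepover` is a fraction of the tool diameter.
    """
    r = tool_d / 2
    if width < tool_d or height < tool_d:
        raise ValueError("tool does not fit in the pocket")
    if not 0 < stepover <= 1:
        raise ValueError("stepover must be in (0, 1]")
    x0, x1 = r, width - r
    y0, y1 = r, height - r
    # Evenly spaced rows whose spacing never exceeds the requested stepover.
    gaps = math.ceil((y1 - y0) / (stepover * tool_d))
    ys = [y0 + (y1 - y0) * i / gaps for i in range(gaps + 1)] if gaps else [y0]
    path: list[tuple[float, float]] = []
    for i, y in enumerate(ys):
        row = [(x0, y), (x1, y)]
        path.extend(row if i % 2 == 0 else row[::-1])
    return path
```

A 20 x 10 pocket with a 4 mm tool at 50% stepover gives four rows at y = 2, 4, 6, 8. Everything here assumes the pocket boundary is already known exactly, which is the kernel's job.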
https://arxiv.org/abs/2603.04337 https://arxiv.org/abs/2603.05607 https://arxiv.org/abs/2605.01171
For a more detailed review: https://github.com/lichengzhanguom/LLMs-CAD-Survey-Taxonomy
"These fonts are licensed under the Open Font License. You can use them in your products & projects – print or digital, commercial or otherwise."
Then they have a trained LLM that can generate KCL, either to create new parts or to act as an LLM assistant for changes to existing parts.
It’s neat that LLMs can do 3D, but I wonder how much of the problem is integration.
https://youtube.com/@thang010146
I don't mean to come across as personally critical. From your comments it sounds possible (I am not sure, of course) that you have been having some distressing experiences. If so, we hope things improve. But please don't post these comments to HN - they aren't on topic here, and they're not an effective way to address the situation.
Re your other comment: I am sure it is a serious issue, although I don't understand it. It's just not an issue that can be solved through internet forum threads like HN.