The comment that points out that this week-long experiment produced nothing more than a non-functional wrapper for Servo (an existing Rust browser) should be at the top:
Has anyone tried to rewrite some popular open source project with AI? I imagine modern LLMs can be very effective at license-washing/plagiarizing dependencies; it could be an interesting new benchmark too.
As the author, it's a stretch to say that JustHTML is a port of html5ever. While you're right that this was part of the initial prompt, the code is very different, which is typically not what counts as a "port". Your mileage may vary.
Interesting, IIUC the transformer architecture / attention mechanism were initially designed for use in the language translation domain. Maybe after peeling back a few layers, that's still all they're really doing.
This has long been how I have explained LLMs to non-technical people: text transformation engines. To some extent, many common, tedious activities basically constitute a transformation of text from one well-known form into another (even some kinds of reasoning are this), and so LLMs are very useful. But they just transform text between well-known forms.
Not me personally, but a GitHub user wrote a replacement for Go's regexp library that was "up to 3-3000x+ faster than stdlib": https://github.com/coregx/coregex ... at first I was impressed, so I started testing it and reporting bugs, but as soon as I ran my own benchmarks, it all fell apart (https://github.com/coregx/coregex/issues/29). After some mostly-bot updates, that issue was closed. But someone else opened a very similar one recently (https://github.com/coregx/coregex/issues/79) -- same deal, "actually, it's slower than the stdlib in my tests". Basically AI slop with poor tests, poor benchmarks, and way oversold. How he's positioning these projects is the problematic bit, I reckon, not the use of AI.
Same user did a similar thing by creating an AWK interpreter written in Go using LLMs: https://github.com/kolkov/uawk -- as the creator of (I think?) the only AWK interpreter written in Go (https://github.com/benhoyt/goawk), I was curious. It turns out that if there's only one item in the training data (GoAWK), AI likes to copy and paste freely from the original. But again, it's poorly tested and poorly benchmarked.
I just don't see how one can get quality this way without being realistic about code review, testing, and benchmarking.
I went through the motions. There are various points in the repo history where compilation is possible, but it's obscure. They got it to compile and operate prior to the article, but several of the PRs since that point broke everything, and this guy went through the effort of fixing it. I'm pretty sure you can just identify the last working commit and pull the version from there, but working out when looks like a big pain in the butt for a proof of concept.
> but several of the PRs since that point broke everything, and this guy went through the effort of fixing it. I'm pretty sure you can just identify the last working commit and pull the version from there, but working out when looks like a big pain in the butt for a proof of concept.
I went through the last 100 commits (https://news.ycombinator.com/item?id=46647037) and nothing there was working (yet/since). It seems that now, after a developer corrected something, it passes `cargo check` without errors, since commit 526e0846151b47cc9f4fcedcc1aeee3cca5792c1 (Jan 16 02:15:02 2026 -0800).
There are conversations elsewhere - I'd have to go look through them, but at some point about an hour before the article was published, it could be compiled, and then things got pushed that broke it again? There's no central discussion, I had to piece together information from multiple threads.
Sorry, I should have taken notes, lol. At any rate, it was so much digging around I just gave up, I didn't want to invest more effort into it. I figured they'd get a stable version for others to try and I'd return to it at some point.
Negative results are great. When you publish them on purpose, it's honorable. When you reveal them by accident, it's hilarious. Cheers to Cursor for today's entertainment.
The blog [0] is worded rather conservatively, but on Twitter the claim is pretty obvious [2] and the hype effect is achieved [1].
CEO stated "We built a browser with GPT-5.2 in Cursor"
instead of
"by dividing agents into planners and workers we managed to get them busy for weeks creating thousands of commits to the main branch, resolving merge conflicts along the way. The repo is 1M+ lines of code but the code does not work (yet)"
Even then, "resolving merge conflicts along the way" doesn't mean anything, as there are two trivial merge strategies that are always guaranteed to work ('ours' and 'theirs').
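(For reference, since people conflate these: in stock git, `ours` is a full merge strategy, while `theirs` only exists as an option to the default strategy; both trivially "resolve" conflicts by throwing one side away. `agent-branch` is a placeholder:)

    # take our tree wholesale; their changes are recorded in history but ignored
    git merge -s ours agent-branch

    # do a normal merge, but resolve every conflicting hunk in their favor
    git merge -X theirs agent-branch

Either always succeeds, which is why "resolving merge conflicts along the way" proves nothing by itself.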
We use Claude Code a lot for updating systems to a newer minor/major version. We have our own 'base' framework for clients, which is by now a very large codebase that does 'everything you can possibly need': not only auth, but payments, billing, support tickets, email workflows, email wysiwyg editing, landing page editor, blogging, cms, AI/agent workflows, etc. (across our client base, we collect features that are 'generic' enough and build them into the base). It gets many updates from the product lead working on it (a senior using Claude Code), but we cannot just update our clients (whose versions are sometimes extremely customised/diverging) at the same pace; some do not want updates outside security, some want them once a year, etc.

In this case AI has really been a productivity booster. Our framework was always quite fast moving before AI too, when we had 3.5 FTE on it (client teams are generally much larger, especially the first years), but merging (that is: including the new features and improvements from the new framework version in the client version without breaking/removing changes on the client side) was a very painful process, taking a lot of time and at least 2 people for an extended period: one from the client team, one from the framework team.

With CC it is much less painful: it will merge them (it is not allowed, by hooks, to touch the tests), it will run the client tests and the new framework tests, and report the difference. That difference is usually evaluated by someone from the client team, who will then merge and fix the tests (mostly manually) to reflect the new reality and test the system manually. Claude misses things (especially when functionalities are very similar but not exactly the same; it cannot really pick which to take, so it usually does nothing), but the biggest bulk of the work is done quickly and usually without causing issues.
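(The "not allowed, by hooks, to touch the tests" part maps to Claude Code's hooks feature. A minimal sketch of the idea for `.claude/settings.json`, assuming tests live under tests/; the field names are from the hooks docs as I remember them, so double-check before relying on it:)

    {
      "hooks": {
        "PreToolUse": [{
          "matcher": "Edit|Write",
          "hooks": [{
            "type": "command",
            "command": "jq -e '.tool_input.file_path | test(\"(^|/)tests/\")' && exit 2 || exit 0"
          }]
        }]
      }
    }

The hook command receives the tool call as JSON on stdin, and exiting with code 2 blocks the edit.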
Haha. True, CI success was not part of PR accept criteria at any point.
If you view the PRs, they bundle multiple fixes together, at least according to the commit messages. The next hurdle will be to guardrail agents so that they only implement one task and don't cheat by modifying the CI pipeline.
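(For the CI part specifically, one cheap guardrail on GitHub is a CODEOWNERS rule plus branch protection, so that any PR touching the pipeline needs a human approval; `@your-org/humans` is a placeholder team:)

    # .github/CODEOWNERS
    /.github/workflows/ @your-org/humans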
If I had a nickel for every time I've seen a human dev disable/xfail/remove a failing test "because it's wrong" and then proceed to break production, I would have several nickels, which is not much, but does suggest that deleting failing tests, like many behaviors, is not LLM-specific.
A coworker opened a PR full of AI slop. One of the first things I do is check if the tests pass. Of course, they didn't. I asked them to fix the tests, since there's no point in reviewing broken code.
"Fix the tests." This was interpreted literally, and assert status == 200 got changed to assert status == 500 in several locations. Some tests required more complex edits to make them "pass."
Inquiries about the tests went unanswered. Eventually the 2,000 lines of slop were closed without merging.
100%. Trying a bit of an experiment like this (similar in that I mostly just care about playing around with different agents, techniques, etc.), it has built out literally hundreds of tests, dozens of which were almost pointless as it decided to mock APIs. When the number of failed tests exceeded 40, it just started disabling tests.
To be fair, many human developers are fond of pointless tests that mock everything to the extent that no real code is actually exercised. At least the tests are fast though.
> it is shocking how often claude suggests just disabling or removing tests.
Arguably, Claude is simply successfully channeling what the developers who wrote the bulk of its training data would do. We've already seen how bad behavior injected into LLMs in one domain causes bad behavior in other domains, so I don't find this particularly shocking.
The next frontier in LLMs has to be distinguishing good training data from bad training data. The companies have to do this, even if only in self defense against the new onslaught of AI-generated slop, and against deliberate LLM poisoning.
If the models become better at critically distinguishing good from bad inputs, particularly if they can learn to treat bad inputs as examples of what not to do, I would expect one benefit to be that models become more capable of writing working code, and therefore more willing to do so, rather than simply disabling failing tests.
If I had a nickel for every time I’ve seen a human being pull down their pants and defecate in the middle of the street I’d have a couple nickels. That’s not a lot but it suggests that this behavior is not LLM specific.
Had humans not been doing this already, I would have walked into Samsung with the demo application that was working an hour before my meeting, rather than the Android app that could only show me the opening logo.
There are a lot of really bad human developers out there, too.
It's implied by the fact that early in the post they say:
>"To test this system, we pointed it at an ambitious goal: building a web browser from scratch."
and then near the end, they say:
>"Hundreds of agents can work together on a single codebase for weeks, making real progress on ambitious projects."
This means they only make progress toward it, but do not "build a web browser from scratch".
If you're curious, the State of Utopia (will be available at https://stateofutopia.com ) did build a web browser from scratch, though it used several packages for the networking portion of it.
So clearly someone, at some point, managed to run this, surely? That's where the screenshots come from? I just don't understand how, given the code is riddled with errors.
I'm eager to find out if this was actually successfully compiled at one point (otherwise how did they get the screenshots?), so I'm running `cargo check` for each of the last 100 commits to see if anything works. Will update here with the results once it's ready.
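(For anyone who wants to reproduce this, the loop is roughly the following; the checkouts are destructive, so run it in a throwaway clone:)

    # cargo-check the last 100 commits of the current branch
    for sha in $(git rev-list -n 100 HEAD); do
      git checkout -q "$sha"
      if cargo check -q >/dev/null 2>&1; then echo "$sha OK"; else echo "$sha FAIL"; fi
    done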
> Yeah, seems latest commit does let `cargo check` successfully run. I'm gonna write an update blog post once they've made their statement, because I'm guessing they're about to say something.
> Something fishy is happening in their `git log`, it doesn't seem like it was the agents who "autonomously" actually made things compile in the end. Notice the git username and email addresses switching around, even a commit made inside an EC2 instance managed to get in there: https://gist.github.com/embedding-shapes/d09225180ea3236f180...
Gonna need to look closer into it when I have time, but seems they manually patched it up in the end, so the original claim still doesn't stand :/
I wouldn't be surprised if any form of screenshot is fake (as in, not made the way it claims); in my experience, Occam's razor tends to lead that way when extraordinary claims are made regarding LLMs.
Like it or not, it's a fundraising strategy. They have followed it multiple times (e.g. earlier vague posts about how much code their in-house model is writing, online RL, lines of code, etc.), and it was less vague before. They released a model and did not give us exact benchmarks or even tell us its base model. This is not to imply there is no substance behind it, but they are not as public about their findings as one would like them to be. Not a criticism, just an observation.
Never releasing benchmarks or being openly benchmarked, unlike literally every other model provider, has always irked me.
I think they know they're on the back foot at the moment. Cursor was hot news for a long time, but now it seems terminal-based agents are the hot commodity, and I rarely see Cursor mentioned. Sure, they already have enterprise contracts signed, but even at my company we're about to swap from a contract with Cursor to Claude Code, because everyone wants to use that instead now, especially since it doesn't tie you to one editor.

So I think they're really trying to get "something" out there that sticks and puts them in the limelight. Long contexts/sessions are one of the hot things, especially with Ralph being the hot topic, so this lines up with that.

Also, I know Cursor has its own CLI, but I rarely see mention of it.
Unfortunately, all the major LLM companies have realized the truth doesn't really matter anymore. We even saw this with the GPT-5 launch, with its obviously vibe-coded charts + nebulous metrics.
Diminishing returns are starting to really set in and companies are desperate for any illusion to the contrary.
I used to hate this, I've seen Apple do it with claims of security and privacy, I've seen populist demagogues do this with every proposal they make. Now I realize this is just the reality of the world.
It's just a reminder not to trust, but instead verify. It's more expensive, but trust only leads to pain.
The Actions overview is impressive: there have been 160,469 workflow runs, of which 247 succeeded. The reason the workflows are failing now is that they have exceeded their spending limit. Of course, the agents couldn't care less.
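(You can tally the numbers yourself with the GitHub CLI and jq; OWNER/REPO is a placeholder:)

    # count workflow runs by conclusion (success, failure, ...)
    gh run list --repo OWNER/REPO --limit 1000 --json conclusion \
      | jq -r '.[].conclusion' | sort | uniq -c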
Yeah, seems latest commit does let `cargo check` successfully run. I'm gonna write an update blog post once they've made their statement, because I'm guessing they're about to say something.
Something fishy is happening in their `git log`; it doesn't seem like it was the agents who "autonomously" made things compile in the end. Notice the git username and email addresses switching around; even some commits made inside an EC2 instance managed to get in there: https://gist.github.com/embedding-shapes/d09225180ea3236f180...
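(For anyone checking locally, printing author and committer identities side by side makes this kind of switching easy to spot:)

    # author vs. committer for the last 20 commits
    git log -20 --format='%h  author: %an <%ae>  committer: %cn <%ce>'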
I think the original post was just headline bait. There is such a fast news cycle around AI that many people would take "Thousands of AI agents collaborate to make a web browser" at face value.
At least I now have something to link to when this inevitably gets mentioned in some off-hand HN comment about how "now AI agents can build whole browsers from scratch".
A fast news cycle around projects that don't actually work. It's a real bummer that "fake news" became politically charged because it's a perfect description of this segment.
> It's 3M+ lines of code across thousands of files. The rendering engine is from-scratch in Rust with HTML parsing, CSS cascade, layout, text shaping, paint, and a custom JS VM.
"From scratch" sounds very impressive. "custom JS VM" is as well. So let's take a look at the dependencies [1], where we find
- html5ever
- cssparser
- rquickjs
That's just Servo [2], a Rust-based browser initially built by Mozilla (and now maintained by Igalia [3]), but with extra steps. So this supposed "from scratch" browser is just calling out to code written by humans. And after all that, it doesn't even compile! It's just plain slop.

[1] - https://github.com/wilsonzlin/fastrender/blob/main/Cargo.tom...
[2] - https://github.com/servo/servo
[3] - https://blogs.igalia.com/mrego/servo-2025-stats/
Why would they think it's a great idea to claim they implemented CSS and JS from scratch, when the first thing any programmer would do is look at the code and immediately find out that they're just using libraries for all of that?! Can they really be so dumb as to think no one would notice?!
I guess the answer is that most people will see the claim, read a couple of comments about "how AI can now write browsers, and probably anything else" from people who are happy to take anything at face value if it supports their view (or business), and move on without seeing any of the later commotion. This happens all the time with the news. No one bothers to check later whether claims were true; they may live their whole lives believing things that later got disproved.
> Why would they think it's a great idea to claim they implemented CSS and JS from scratch when the first thing any programmer would do is to look at the code and immediately find out that they're just using libraries for all of that?!
Programmers were not the target audience for this announcement. I don’t 100% know who was, but you can kind of guess that it was a mix of: VC types for funding, other CEOs for clout, AI influencers to hype Cursor.
Over-hyping a broken demo for funding is a tale as old as time.
That there’s a bit of a fuck-you to us pleb programmers is probably a bonus.
I'm actually impressed by their ignorance. I could never sleep at night knowing my product is built on such brazen lies.
Bullshitting and fleecing investors is a skill that needs to be nurtured and perfected over the years.
I wonder how long this can go on.
Who is the dumb money here? Are VCs fleecing "stupid" pension funds until they go under?
Or is it a symptom of a larger grifting economy in the US, where even the president sells vaporware, and people are just emulating him, trying to get a piece of the cake?
That is because I've noticed the AI just edits the version-management files (package.json, Cargo.toml, etc.) directly instead of using the build tool (npm add, cargo add), so it always hallucinates a random old version that's found in its training set. I explicitly have to tell the AI to use the build tool whenever I use AI.
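(Concretely: instead of letting it type a remembered version number into the manifest, have it run the tool, which resolves the actual latest version from the registry. `serde` and `lodash` are just example packages:)

    cargo add serde      # resolves the newest serde from crates.io, edits Cargo.toml
    npm install lodash   # resolves the newest lodash, updates package.json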
I was LITERALLY thinking the other day about a niche tool to help engineers discover and fix this, because at the rate I have seen models version-lock dependencies, I think this is going to be a big problem in the future.
You can do prompt injection through versions. The LLM would go back to GitHub in its endless attempt to people please, but dependency managers would ignore it for being invalid.
Bigger companies have vulnerability and version management toolsets like Snyk, Cycode, etc. to help keep things up to date at scale across lots of repos.
> The JS engine used a custom JS VM being developed in vendor/ecma-rs as part of the browser, which is a copy of my personal JS parser project vendored to make it easier to commit to.
It looks like there are two JS backends: quickjs and vm-js (vendor/ecma-rs/vm-js), based on a brief skim of the code. There is some logic to select between the two. I have no idea if either or both of them work.
It seemingly did, but after I saw it define a VerticalAlign twice in different files [1][2][3], I concluded that it's probably not coherent enough to be worth checking for correctness.
Would be interesting if someone who has managed to run it tries it on some actually complicated text layout edge cases (like RTL breaking that splits a ligature necessitating re-shaping; also add some right-padding in there to spice things up).

[1] https://github.com/wilsonzlin/fastrender/blob/main/src/layou...
[2] https://github.com/wilsonzlin/fastrender/blob/main/src/layou...
[3] Neither being the right place for defining a struct that should go into computed style imo.
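(A quick-and-dirty test page along those lines, written via a shell heredoc; `word-break: break-all` is one way to force a break inside a shaped word, and السلام contains a lam-alef ligature that would need re-shaping across the break:)

    cat > rtl-test.html <<'EOF'
    <div style="direction: rtl; width: 5ch; padding-right: 2em; word-break: break-all;">
      السلام عليكم
    </div>
    EOF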
I thought they'd plagiarise, not import. Importing servo's code would make it obvious because it's so easy to look at their dependencies file. And yet ... they did. I really think they thought no one would check?
Hypothetically: what if they did check, only in order to ‘check’ they asked the LLM instead of manually verifying and were told a story? Or, perhaps, they did check manually but sometime after the files were subtly changed despite no incentive or reason to do so outside of a passing test? …
Humans who are bad and also bad at coding have predictable, comprehensible failure modes. They don't spontaneously sabotage their career and your project because Lord Markov twitched one of its many tails. They also lie for comprehensible reasons, with attempts at logical manipulations of fact. They don't spontaneously lie claiming not to have a nose, apologize for lying and promise never to do it again, then swear they have no nose in the next breath while maintaining eye contact.
Semi-autonomous to autonomous is a doozy of a step.
You know, a good test would be to tell it to write a browser using a custom programming language, or at least some language for which there are no web browsers written.
"Write a browser without any access to the internet" is what I'd have attempted if I were running this experiment. Just seed it with a bunch of local HTML, CSS and JS files from the various testing suites that exist.
To be fair, even if "from scratch" means "download and build Chromium", that's still nontrivial to accomplish. And with how complicated a modern browser is, you can get into Ship of Theseus philosophy pretty fast.
I wouldn't particularly care what code the agents copied, the bigger indictment is the code doesn't work.
So really, they failed to meet the bar of "download and build Chromium" and there's no point to talk about the code at all.
I really doubt this marketing approach is effective. Isn't this just shooting themselves in the foot? My actual experience with Cursor has been: their design is excellent and the UX is great—it handles frontend work reasonably well. But as soon as you go deeper, it becomes very prone to serious bugs. While the addition of Claude's new models has helped somewhat, the results are still not as good as Google's Antigravity (despite its poor UX and numerous bugs). What's worse, even with this much-hyped Claude model, you can easily blow through the $20 subscription limit in just a few days. Maybe they're betting on models becoming 10x better and 10x cheaper, but that seems unlikely to happen anytime soon.
Hitting my head into buggy apps made by these AI companies and seeing them all be amazed in parallel that skills/MCP would be necessary for real work has me pretty relaxed about ‘our jobs’.
OpenAI's business-model floundering, degenerating into inline ads soon (lol), shows what can be done with infini-LLM, infini-capital, and all the smarts & connections on Earth… broadly speaking, I think the geniuses at Google who invented a lot of this shizz understand it and were leveraging it appropriately before ChatGPT blew up.
Can't help but draw parallels to what working with AI feels like. Your coworker opens a giant, impressive-looking PR and marks it ready for review. Meanwhile, it's up to someone else on the team to do the actual work of checking. Meanwhile, the PR author gets patted on the back by management for being forward-thinking and proactive, while everyone else is "nitpicky" and holding progress back.
I wonder who they actually tried to impress with that? People who understand and appreciate the difficulty of building a browser from scratch would surely want to understand what you (or your agent) did, to a degree that they would notice if you didn't.
Key phrase: "They never actually claim this browser is working and functional." This is what most AI "successes" turn out to be when you apply even a modicum of scrutiny.
In my personal experience, Codex and Claude Code are definitively capable tools when used in certain ways.
What Cursor did with their blog post seems intentionally and outright misleading, since I'm not able to even run the thing. With Codex/Claude Code, it's relatively easy to download something and run it to try for yourself.
Yes, many tools work like that, especially professional tools.
You think you can just fire up Ableton, Cubase or whatever and make music as great as an artist who has done that for a long time? No, it requires practice and understanding. Every tool works like this: some with different difficulties, some with different skill levels, but all of them have it in some way.
This is the company making the tool that is holding the tool, in this case, claiming that "[they] built a browser" when, if TFA's assertions are correct, they did not "build a browser" by any reasonable interpretation of those words.
(I grant that you're speaking from your experience, about different tools, two replies up, but this claim is just paper-rock-scissorable through these various AI tools. "Oh, this tool's authors are just hype, but this tool works totes-mc-oates…". Fool me once, and all.)
Yes, and apparently is a horrible way, because they've obviously failed to produce a functioning browser. But since I'm the author of TFA, I guess I'm kind of biased in this discussion.
Codex was sold to me as a tool that can help me program. I tried it, evaluated it, found it helpful, and continued using it. Based on my experience, it definitively helps with some tasks. Apparently it does not work for others, for some not at all. I know the tool works for me, so if I take the claim that it doesn't work for others, what am I left to believe? That the tool doesn't actually work, even though my own experience and usage of it says otherwise?
Codex is still an "AI success", regardless of whether it could build an entire browser by itself, from scratch, or whatever. It helps as it is today; I wouldn't need it to get better to continue using it.
But even with this perspective, which I'd say is "nuanced" (others would claim "AI zealot" probably), I'm trying to see if what Cursor claims is actually true, that they managed to build a browser in that way. When it doesn't seem true, I call it out. I still disagree with "This is what most AI "successes" turn out to be when you apply even a modicum of scrutiny", and I'm claiming what Cursor is doing here is different.
Not even the Ableton marketing team is telling me I can just fire up Ableton and make great music and if I can't do that I must be a brainwashed doomer.
The argument isn't what OpenAI/Anthropic are selling their users, what I said was:
> are definitively capable tools when used in certain ways
Which I received pushback on. My reply is to that pushback, defending what I said, not what others told you.
Edit: Besides the point, but Ableton (and others) constantly tell people how to learn how to use the tool, so they use it the right way. There is a whole industry of people (teachers) who specialize in specific software/hardware and teaching others "how to hold the tool correctly".
> "definitively capable tools when used in certain ways". This sounds like "if it doesn't work for you is because you don't use in the right way" imo.
Yes, because that's what it is. If you seriously can't get Gemini 3 or Opus 4.5 to work you're either using it wrong or coding on something extremely esoteric.
> Codex and Claude Code are definitively capable tools when used in certain ways.
They definitely can make some things better and let you do some things faster, but all the efficiency is gonna get sucked up by companies trying to drop more slop.
Yes that's completely expected. Just like any other tool or service.
It's just like a chisel. Well the chisel company didn't promise to let you become a master craftsman overnight but anyway it's just like a chisel in that you have to learn how to use it. And people expect a chisel to actually chisel through wood out the box but anyway it's exactly like a chisel.
I haven’t studied the project that this is a comment on, but: The article notices that something that compiles, runs, and renders a trivial HTML page might be a good starting point, and I would certainly agree with that when it’s humans writing the code. But is it the only way? Instead of maintaining “builds and runs” as a constant and varying what it does, can it make sense to have “a decent-sized subset of browser functionality” as a constant and varying the “builds and runs” bit? (Admittedly, that bit does not seem to be converging here, but I’m curious in more general terms.)
In theory you could generate a bunch of code that seems mostly correct and then gradually tweak it until it's closer and closer to compiling/working, but that seems ill-suited to how current AI agents work (or even how people work). AI agents are prone to make very local fixes without an understanding of wider context, where those local fixes break a lot of assumptions in other pieces of code.
It can be very hard to determine if an isolated patch that goes from one broken state to a different broken state is on net an improvement. Even if you were to count compile errors and attempt to minimize them, some compile errors can demonstrate fatal flaws in the design while others are minor syntax issues. It's much easier to say that broken tests are very bad and should be avoided completely, as then it's easier to ensure that no patch makes things worse than it was before.
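(The counting part is mechanically easy, since `cargo check` can emit structured diagnostics; the hard part is exactly what you describe, because one "error" can be a doomed design while another is a missing semicolon:)

    # count hard errors from cargo's structured output (needs jq)
    cargo check --message-format=json 2>/dev/null \
      | jq -r 'select(.reason == "compiler-message") | .message.level' \
      | grep -c '^error'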
Obviously, it has to eventually build and run if there’s to be any point to it, but is it necessary that every, or even any, step along the way builds and runs? I imagine some sort of iterative set-up where one component generates code, more or less "intelligently", and others check it against the C, HTML, JavaScript, CSS and what-have-you specs, and the whole thing iterates until all the checking components are happy. The components can’t be completely separate, of course, they’d have to be more or less intermingled or convergence would be very slow (like when lcamtuf had his fuzzer generate a JPEG out of an empty file), but isn’t that basically what (large) neural networks are; tangled messes of interconnected functions that do things in ways too complicated for anyone to bother figuring out?
I don't want to defend the AI slop, but it's common for me to go on for a few weeks without being able to compile everything when doing something really big. I can still compile individual modules and run their tests, but not the full application (which puts all modules together)... it may take a lot of time until all the modules can come together and actually run the app.
Browsers contain several high-complexity pieces, each of which could take a while to build on its own, interconnected with reasonably verbose APIs that need to be implemented, or at least stubbed out, for code not to crash. There is also the difficulty of matching existing implementations quirk for quirk.

I guess the complexity is on par with operating systems, but with the added compatibility problem that in order to be useful, it doesn't just have to load sites intended to be compatible with it; it has to handle sites people actually use on the internet, and those are both a moving target and tend to use lots of high-complexity features that you have to build, or at least stub out, before the site will even work.
In all sincerity, this question is almost identical to "what's the most difficult thing about building an operating system" as a modern browser is tens of millions of lines of code that can run sophisticated applications. It has a network stack, half a dozen parsers, frame construction and reflow modules, composite, render and paint components, front end UI components, an extensibility framework, and more. Each one of these must enable supporting backward compatibility for 30 year old content as well as ridiculously complex contemporary web apps. And it has to load and render sites that a completely programming illiterate fool like me wrote. It must do this all in a performant and secure way using minimal system resources. Also, it probably also must run on Mac, Windows, Linux, Android, iOS, and maybe more.
I think it's only a matter of time until this becomes reality. It's almost inevitable.
My prediction last year was already that in the distant future - more than 10 years into the future - operating systems will create software on the fly. It will be a basic function of computers. However, there might remain a need for stable, deterministic software, the two human-machine interaction models can live together. There will be a need for software that does exactly what one wants in a dumb way and there will be a need for software that does complex things on the fly in an overall less reliable ad hoc way.
Even if it doesn't see any improvements beyond this point it wouldn't be a big deal. It's good enough for most programmers and any improvements are just a bonus.
This is why AI skeptics exist. We’re now at the point where you can make entirely unsubstantiated claims about AI capability, and even many folks on HN will accept it with a complete lack of discernment. The hype is out of control.
> folks on HN will accept it with a complete lack of discernment
Well, I'm a heavy LLM user, I "believe" LLM helps me a lot for some tasks, but I'm also a developer with decades of experience, so I'm not gonna claim it'll help non-programmers to build software, or whatever. They're tools, not solutions in themselves.
But even us "folks on HN" who generally keep up with where the ecosystem is going, have a limit I suppose. You need to substantiate what you're saying, and if you're saying you've managed to create a browser, better let others verify that somehow.
The second top comment is my own (skeptical) comment, with 20 points at this moment. Thanks to those 20 people, I felt compelled to write the blog post in this submission, and to ask a bit more clearly "what is going on?", since apparently there are at least 20 of us wondering about this.
I certainly don’t think Simon is a shill. He’s obviously a highly talented person, who in my opinion just doesn’t exercise appropriate discernment in some cases.
Edit: Of course, this isn’t a trait unique to Simon either. Everybody has blind spots, and it’s reasonable to be excited when new tech is released. On an unrelated note, my intent is to push back against some of the people here who try to shut down skepticism. Obviously, this doesn’t describe Simon, but I’ve seen others here who try to silence skeptical voices. This comes across as highly controlling and insecure.
I do not think you are reacting to what I said in good faith.
> he better hope he's on the right side of history here, as otherwise he will have burnt his reputation
That's something I've actually given quite a lot of thought to. My reputation and credibility matters a great deal to me. If it turns out this entire LLM thing was an over-hyped scam I'll take a very big hit to that reputation, and I'll deserve it.
(If AI rises up and tries to kill or enslave us all I'll be too busy fighting back to care.)
> company claims they "built a browser" from scratch
> looks inside
> completely useless and busted
30 billion dollar VS Code fork, everyone. When do we start looking at these people for what they are: snake oil salesmen?
They slop laundered the FOSS Servo code into a broken mess and called it a browser, but dumbasses with money will make line go up based on lies. EFF right off.
Always take any pronouncement from an AI company (heavily dependent on VC and public sentiment on AI) with a heavy grain of salt.
Hype over reality.

I'm building an AI startup myself, and I know that world; it's full of hypesters and hucksters, unfortunately. Also, social media communication + low attention spans + AI slop communication are a blight upon today's engineering culture.
Thank you for telling me about the email, it had a typo :( Been fixed now.
Regarding the downvotes, I think it's because it feels like you're pushing your project, although it isn't really super relevant to the topic. The topic is specifically about Cursor failing to live up to their claims.
The amount of negativity in the original post was astounding.
People were making all sorts of statements like:
- “I cloned it and there were loads of compiler warnings”
- “the commit build success rate was a joke”
- “it used 3rd party libs”
- “it is AI slop”
What they all seem to be just glossing over is how the project unfolded: without human intervention, using computers, in an exceptionally accelerated time frame, working 24hr/day.
If you are hung up on commit build quality, or code quality, you are completely missing the point, and I fear for your job prospects. These things will get better; they will get safer as the workflows get tuned; they will scale well beyond any of us.
Don’t look at where the tech is. Look where it’s going.
As mentioned elsewhere (I'm the author of this blog post), I'm a heavy LLM user myself; I use it every day as a tool and get lots of benefits from it. This is not a "hit post" on using LLM tools for development; it's a post about Cursor making grand claims without being able to back them up.
No one is hung up on the quality, but there is a ground fact of whether something compiles or doesn't. No one is gonna claim a software project was successful if the end artifact doesn't compile.
I think for the point of the article, it appeared to, at some point, render homepages for select well known sites. I certainly did not expect this to be a serious browser, with any reliability or legs. I don’t think that is dishonest.
> I certainly did not expect this to be a serious browser, with any reliability or legs.
Me neither, and I note so twice in the submission article. But I also didn't expect a project that for the last 100+ commits couldn't reliably be built and therefore tested and tried out.
My apologies - my point(s) were more about the original submission for the Cursor blog post, not your post itself.
I did read your post, and agree with what you're saying. It would be great if they pushed the agents to favour reliability or reproducibility, instead of just marching forwards.
> What they all seem to be just glossing over is how the project unfolded: without human intervention, using computers, in an exceptionally accelerated time frame, working 24hr/day.
Correct, but Gas Town [1] already happened and, what's more, _actually worked_, so this experiment is both useless (because it doesn't demonstrate working software) _and_ derivative (because we've already seen that you can set up a project where, with spend similar to that of a single developer, you can churn out more code than any human could read in a week).
Spending 24h/day to build nothing isn't impressive - it's really, really bad. That's worse than spending 8h/day to build nothing.
If the piece of shit can't even compile, it's equivalent to 0 lines of code.
> Don’t look at where the tech is. Look where it’s going.
Given that the people making the tech seem incapable of not lying, that doesn't give me hope for where it's going!
Look, I think AI and LLMs in particular are important. But the people actively developing them do not give me any confidence. And, neither do comments like these. If I wanted to believe that all of this is in vain, I would just talk to people like you.
I'm sorry but what? Are you really trying to argue that it doesn't matter that nothing works, that all it produced is garbage and that what is really important is that it made that garbage really quickly without human oversight?
Quality absolutely matters, but it's hyper context dependent.
Not everything needs to, or should have the same quality standards applied to them. For the purposes of the Cursor post, it doesn't bother me that most of the commits produced failed builds. I assume, from their post, that at some points, it was capable of building, and rendering the pages shown in the video on the post. That alone, is the thing that I think is interesting.
Would I use this browser? Absolutely not. Do I trust the code? Not a chance in hell. Is that the point? No.
"Quality" here isn't if A is better than B. It's "Does this thing actually work at all?"
Sure, I don't care too much if the restaurant serves me food with silverware that is 18/10 vs 18/0 stainless steel, but I absolutely do care if I order a pizza and they just dump a load of gravel onto my plate and tell me it's good enough, and after all, quality isn't the point.
It is hard to look at where it is going when there are so many lies about where the tech is today. There are extraordinary claims made on Twitter all the time about the technology, but when you look into things, it’s all just smoke and mirrors, the claims misrepresent the reality.
What a silly take. Where the tech is is extremely relevant. The reality of this blog post is it shows the tech is clearly not going anywhere better either, as they seem to imply. 24 hours of useless code is still useless code.
This idea that quality doesn't matter is silly. Quality is critical for things to work, scale, and be extensible. By either LLMs or humans.
People that spend time poking holes in random vendor claims remind me of folks you see video of standing on the beach during a tsunami warning. Their eyes fixed on the horizon looking for a hundred foot wave, oblivious to the shore in front of them rapidly being gobbled up by the sea.
https://news.ycombinator.com/item?id=46649046
- JustHTML [1], which in practice [2] is a port of html5ever [3] to Python.
- justjshtml, which is a port of JustHTML to JavaScript :D [4].
- MiniJinja [5] was recently ported to Go [6].
All three projects have one thing in common: comprehensive test suites which were used to guardrail and guide AI.
References:
1. https://github.com/EmilStenstrom/justhtml
2. https://friendlybit.com/python/writing-justhtml-with-coding-...
3. https://github.com/servo/html5ever
4. https://simonwillison.net/2025/Dec/15/porting-justhtml/
5. https://github.com/mitsuhiko/minijinja
6. https://lucumr.pocoo.org/2026/1/14/minijinja-go-port/
I was seeing screenshots and actually getting scared for my job for a second.
It’s broken and there’s no browser engine? Cursor should be tarred and feathered.
True, but it is shocking how often claude suggests just disabling or removing tests.
"Fix the tests." This was interpreted literally, and assert status == 200 got changed to assert status == 500 in several locations. Some tests required more complex edits to make them "pass."
Inquiries about the tests went unanswered. Eventually the 2000 lines of slop was closed without merging.
Arguably, Claude is simply successfully channeling what the developers who wrote the bulk of its training data would do. We've already seen how bad behavior injected into LLMs in one domain causes bad behavior in other domains, so I don't find this particularly shocking.
The next frontier in LLMs has to be distinguishing good training data from bad training data. The companies have to do this, even if only in self defense against the new onslaught of AI-generated slop, and against deliberate LLM poisoning.
If the models become better at critically distinguishing good from bad inputs, particularly if they can learn to treat bad inputs as examples of what not to do, I would expect one benefit of this is that the increased ability of the models to write working code will then greatly increase the willingness of the models to do so, rather than to simply disable failing tests.
There are a lot of really bad human developers out there, too.
So you flubbed managing a project and are now blaming your employees. Classy.
>"To test this system, we pointed it at an ambitious goal: building a web browser from scratch."
and then near the end, they say:
>"Hundreds of agents can work together on a single codebase for weeks, making real progress on ambitious projects."
This means they only make progress toward it, but do not "build a web browser from scratch".
If you're curious, the State of Utopia (will be available at https://stateofutopia.com ) did build a web browser from scratch, though it used several packages for the networking portion of it.
See my other comments and posts for links.
But apparently "some pages take a literal minute to load"
Seems like "I had to do the last mile myself", not "autonomous coding" which was Cursor's claim here.
Edit: As mentioned, I ran `cargo check` on all the last 100 commits, and seems every single of them failed in some way: https://gist.github.com/embedding-shapes/f5d096dd10be44ff82b...
https://github.com/wilson-anysphere/formula
I couldn't make it render the Apple page that was in the Cursor promo. Maybe they've used some other build.
- Servo's HTML parser
- Servo's CSS parser
- QuickJS for JS
- selectors for CSS selector matching
- resvg for SVG rendering
- egui, wgpu, and tiny-skia for rendering
- tungstenite for WebSocket support
And with all of that, it still has 3M+ lines!

It's also using weirdly old versions of some dependencies (e.g. wgpu 0.17 from June 2023, when the latest is 28, released in December 2025).
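(If anyone wants to quantify the drift, the third-party cargo-outdated subcommand does it in one shot:)

    cargo install cargo-outdated   # one-time setup
    cargo outdated                 # each dependency vs. the newest published version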
https://news.ycombinator.com/item?id=46650998
Well, at least it's not outright ripping them off like it usually does.
I doubt even they checked, given they say they just let the agents run autonomously.
Reminds me of SAP/Salesforce.
It's an almost universal truth that you need to learn how to use any non-trivial tool.
The diffusion model of software engineering
Writing junk in a text file isn't the hard part.
That doesn't mean we can usefully build software that is a big, tangled mess.
It _is_ stuck at this point.
There's so much money involved no one wants to admit it out loud.
They have no path to the necessary exponential gains and no one is actually working on it.
I don't mean the tech itself, which is kind of useful. I mean the 99% value inflation of a kind-of-useful tool (if you know what you're doing).
Well, I'm a heavy LLM user, and I "believe" LLMs help me a lot with some tasks, but I'm also a developer with decades of experience, so I'm not gonna claim they'll help non-programmers build software, or whatever. They're tools, not solutions in themselves.
But even us "folks on HN", who generally keep up with where the ecosystem is going, have a limit, I suppose. You need to substantiate what you're saying, and if you're saying you've managed to create a browser, you'd better let others verify that somehow.
The top comment is indeed baseless hype without a hint of skepticism.
There are also clearly a lot of other skeptical people in that submission. And simonw (from that top comment) told me themselves "it's not clear that what they built even runs": https://bsky.app/profile/simonwillison.net/post/3mckgw4mxoc2...
> This project from Cursor is the second attempt I've seen at this now!
I used the word "attempt" very deliberately, to avoid suggesting that either of these two projects had achieved the goal.
I don't see how you can get to "baseless hype without a hint of skepticism" there unless you've already decided to take anything I say in bad faith.
and he wonders why people call him a shill
accepting everything some shit company tells you as gospel is not the default position of a "researcher"
he better hope he's on the right side of history here, as otherwise he will have burnt his reputation
Edit: Of course, this isn't a trait unique to Simon; everybody has blind spots, and it's reasonable to be excited when new tech is released. To be clear, my intent is to push back against some of the people here who try to shut down skepticism. That obviously doesn't describe Simon, but I've seen others here try to silence skeptical voices, and it comes across as highly controlling and insecure.
I do not think you are reacting to what I said in good faith.
> he better hope he's on the right side of history here, as otherwise he will have burnt his reputation
That's something I've actually given quite a lot of thought to. My reputation and credibility matters a great deal to me. If it turns out this entire LLM thing was an over-hyped scam I'll take a very big hit to that reputation, and I'll deserve it.
(If AI rises up and tries to kill or enslave us all I'll be too busy fighting back to care.)
> looks inside
> completely useless and busted
30 billion dollar VS Code fork, everyone. When do we start looking at these people for what they are: snake oil salesmen?
They slop-laundered the FOSS Servo code into a broken mess and called it a browser, but dumbasses with money will make the line go up based on lies. EFF right off.
Man
Always take any pronouncement from an AI company (heavily dependent on VC money and public sentiment on AI) with a heavy grain of salt.
hype over reality
I'm building an AI startup myself, and I know that world: it's full of hypesters and hucksters, unfortunately. Also, social media communication + low attention spans + AI slop communication are a blight upon today's engineering culture.
Regarding the downvotes, I think it's because it feels like you're pushing your own project even though it isn't really relevant to the topic. The topic is specifically about Cursor failing to live up to their claims.
People were making all sorts of statements like:
- “I cloned it and there were loads of compiler warnings”
- “the commit build success rate was a joke”
- “it used 3rd party libs”
- “it is AI slop”
What they all seem to be glossing over is how the project unfolded: without human intervention, using computers, in an exceptionally accelerated time frame, working 24 hours a day.
If you are hung up on commit build quality, or code quality, you are completely missing the point, and I fear for your job prospects. These things will get better; they will get safer as the workflows get tuned; they will scale well beyond any of us.
Don’t look at where the tech is. Look where it’s going.
No one is hung up on the quality, but there is a ground truth of whether something compiles or doesn't. No one is gonna claim a software project was successful if the end artifact doesn't compile.
Me neither, and I note so twice in the submission article. But I also didn't expect a project where, for the last 100+ commits, the code couldn't reliably be built, and therefore couldn't be tested or tried out. (A rough sketch of how to check that yourself is below.)
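For anyone who wants to survey that kind of claim on their own machine, a minimal sketch, assuming a local clone with `git` and `cargo` on PATH (the repo path is illustrative):

    use std::process::Command;

    fn main() {
        // Hypothetical local clone of the project; path is illustrative.
        let repo = "/path/to/repo";
        // List the last 100 commit hashes, newest first.
        let log = Command::new("git")
            .args(["-C", repo, "rev-list", "--max-count=100", "HEAD"])
            .output()
            .expect("failed to run git rev-list");
        let stdout = String::from_utf8_lossy(&log.stdout);
        for sha in stdout.lines() {
            // Check out each commit (detached HEAD)...
            let co = Command::new("git")
                .args(["-C", repo, "checkout", "--quiet", sha])
                .status()
                .expect("failed to run git checkout");
            if !co.success() {
                eprintln!("{sha}: checkout failed");
                continue;
            }
            // ...and see whether it at least passes `cargo check`.
            let check = Command::new("cargo")
                .args(["check", "--quiet"])
                .current_dir(repo)
                .status()
                .expect("failed to run cargo check");
            println!("{sha}: {}", if check.success() { "builds" } else { "broken" });
        }
    }

Using `cargo check` rather than a full `cargo build` keeps the loop fast while still answering the ground-truth question of whether each commit compiles.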
I did read your post, and agree with what you're saying. It would be great if they pushed the agents to favour reliability or reproducibility, instead of just marching forwards.
Correct, but Gas Town [1] already happened and, what's more, _actually worked_, so this experiment is both useless (because it doesn't demonstrate working software) _and_ derivative (because we've already seen that, with spend similar to that of a single developer, you can set up a project that churns out more code in a week than any human could read).
[1]: https://github.com/steveyegge/gastown
If the piece of shit can't even compile, it's equivalent to 0 lines of code.
> Don’t look at where the tech is. Look where it’s going.
Given that the people making the tech seem incapable of not lying, that doesn't give me hope for where it's going!
Look, I think AI and LLMs in particular are important. But the people actively developing them do not give me any confidence. And, neither do comments like these. If I wanted to believe that all of this is in vain, I would just talk to people like you.
I'm sorry but what? Are you really trying to argue that it doesn't matter that nothing works, that all it produced is garbage and that what is really important is that it made that garbage really quickly without human oversight?
That's... that's not success.
Not everything needs to, or should, have the same quality standards applied to it. For the purposes of the Cursor post, it doesn't bother me that most of the commits produced failed builds. I assume from their post that, at some points, it was capable of building and rendering the pages shown in the video. That alone is the thing I find interesting.
Would I use this browser? Absolutely not. Do I trust the code? Not a chance in hell. Is that the point? No.
Sure, I don't care too much if the restaurant serves me food with silverware that is 18/10 vs 18/0 stainless steel, but I absolutely do care if I order a pizza and they just dump a load of gravel onto my plate and tell me it's good enough, and after all, quality isn't the point.
There are very few software development contexts where the quality metric of “does the project build and run at all” doesn’t matter quite a lot.
This idea that quality doesn't matter is silly. Quality is critical for things to work, scale, and be extensible. By either LLMs or humans.
Am I misunderstanding this metaphor? Tsunamis pull the sea back before making landfall.