
They're still very good for finetuned classification, often 10-100x cheaper to run at similar or higher accuracy than a large model - but I think most people just prompt the large model unless they have high-volume needs or need to self-host.

Maybe a bit off-topic, but how'd you meet your partner while on your adventures?

As a follow-up to this, even though water makes up 70% of the Earth's surface, it's only 0.02% of the Earth's mass.


I might have messed it up, but as a follow-up to your follow-up, I think the depth of the ocean relative to the Earth is comparable to the width of a single human hair relative to the head.

If you inflate an 18 cm diameter head to the size of our planet, a 75 µm hair would be about 5 km wide - which is about the average depth of our oceans.
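For the curious, here's the back-of-the-envelope check in TypeScript - a sketch using the figures above plus Earth's mean diameter of roughly 12,742 km:

    // rough sanity check of the head-to-Earth scaling
    const headDiameterM  = 0.18;         // 18 cm head
    const earthDiameterM = 12_742_000;   // ~12,742 km mean diameter of Earth
    const hairWidthM     = 75e-6;        // 75 µm hair

    const scale = earthDiameterM / headDiameterM;       // ~7.1e7
    const scaledHairKm = (hairWidthM * scale) / 1000;   // ~5.3
    console.log(`scaled hair width: ~${scaledHairKm.toFixed(1)} km`);

So a hair blown up by the same factor comes out at roughly 5 km wide, in the same ballpark as the depth of the ocean.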

It's one hair, not a whole head of hair!


Wow, this deserves its own submission.


Neat approach, but seems like the eventual goal of caching DOM maps for all users would be a privacy nightmare?


Yes, I can imagine PII somehow being stored in the workflow. I frequently see LLMs hardcoding tests just to make the user happy, and this can also happen in the browser version: if something is too hard to scrape but the agent can infer it from a screenshot, it might end up producing a workflow that seems correct but is just hardcoded with data. We are thinking of multiple guards/blocks to keep users from creating such a workflow, but the risks that come with an open-ended agent are still going to be present.


What's misleading about that? You rent $100 of time on an H100 to train the model.


What sorts of automations were you able to get working with the Chrome dev tools MCP?


Not OP, but in my experience, Jest and Playwright are so much faster that it's not worth doing much with the MCP. It's a neat toy, but it's just too slow for an LLM to try to control a browser using MCP calls.


Actually, the superpower of having the LLM in the browser may be that it vastly simplifies using LLMs to write Playwright scripts.

Case in point: last week I wrote a scraper for Rate Your Music, but found it frustrating. I'm not experienced with Playwright, so I used VS Code with Claude to iterate on the project. Constantly diving into devtools, copying outer HTML, inspecting specific elements, etc. is a chore that this could get around, making for faster development of complex tests.


Yeah, I think it would be better to just have the model write out Playwright scripts than the way it's doing it right now (or at least navigate manually first and then, based on that, write a Playwright TypeScript script for future tests).

Because right now it's way too slow... perform an action, then read the results, then wait for the next tool call, etc.
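To make that concrete, here's roughly the kind of standalone Playwright test (TypeScript) a model could emit after one manual exploration pass - the URL and selectors are made-up placeholders, not a real site's:

    // hypothetical example - site URL and selectors are placeholders
    import { test, expect } from '@playwright/test';

    test('album page shows a rating', async ({ page }) => {
      await page.goto('https://example.com/album/123');
      // selector assumed to have been discovered during the manual pass
      const rating = page.locator('.avg_rating');
      await expect(rating).toBeVisible();
      expect(parseFloat(await rating.innerText())).toBeGreaterThan(0);
    });

Once a file like that exists, re-running the check is a plain `npx playwright test` with no LLM in the loop.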


This is basically our approach with Herd[0]. We operate agents that develop, test and heal trails[1, 2], which are packaged browser automations that don't require browser-use LLMs to run and are therefore much cheaper and more reliable. Trail automations are then abstracted as a REST API and MCP[3], which can be used either as simple functions called from your code, or by your own agent, or any combination of the two.

You can build your own trails, publish them on our registry, compose them ... You can also run them in a distributed fashion over several Herd clients, where we take care of the signaling and communication while you simply call functions. The CLI and npm & Python packages [4, 5] might be interesting as well.

Note: the automation stack is entirely home-grown to enable distributed orchestration; it doesn't rely on Puppeteer or Playwright, but the browser automation API[6] is relatively similar to ease adoption. We also don't use the Chrome DevTools Protocol and therefore have a different tradeoff footprint.

0: https://herd.garden

1: https://herd.garden/trails

2: https://herd.garden/docs/trails-automations

3: https://herd.garden/docs/reference-mcp-server

4: https://www.npmjs.com/package/@monitoro/herd

5: https://pypi.org/project/monitoro-herd/

6: https://herd.garden/docs/reference-page


Whoa that’s cool. I’ll check it out, thanks!


Thanks! Let me know if you give it a shot and I’ll be happy to help you with anything.


You might want to change column title colors as they're not visible (I can see them when highlighting the text) https://herd.garden/docs/alternative-herd-vs-puppeteer/


Oh thanks! It was a bug in handling browser light mode. I just fixed it.


Now I notice the testimonials are victims of the same issue.


Looks useful! What would it take to add support for (totally random example :D) Harper's Magazine?


> or at least first navigate manually and then based on that, write a playwright typescript script for future tests

This has always felt like a natural best use for LLMs - let them "figure something out", then write/configure a tool to do the same thing. Throwing the full might of an LLM at every run of something that could be scripted is a massive waste of compute, not to mention the inconsistency of LLM output.


Exactly this. I spent some time last week at a 50-something-person web agency helping them set up a QA process where agents explore the paths and, based on those passes, write automated scripts that humans verify and put into the testing flow.


That's nice. Do you have some tips/tricks based on your experience that you can share?


Not tested much, but Playwright can read the response from browser_network_requests, which is a much faster way to extract information than waiting for all the requests to finish and then parsing the HTML when what you're looking for is already nicely returned in an API call. The Puppeteer MCP server doesn't have an equivalent.
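For reference, the same idea in plain Playwright (TypeScript), outside the MCP - the URL and the /api/items endpoint are invented placeholders:

    // hypothetical sketch: grab the JSON the page fetches instead of parsing rendered HTML
    import { chromium } from 'playwright';

    (async () => {
      const browser = await chromium.launch();
      const page = await browser.newPage();
      // wait for the underlying API response triggered by the navigation
      const [response] = await Promise.all([
        page.waitForResponse(r => r.url().includes('/api/items') && r.ok()),
        page.goto('https://example.com/list'),
      ]);
      const data = await response.json();   // already structured, no HTML parsing needed
      console.log(data);
      await browser.close();
    })();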


You can use it for debugging with the LLM, though.


In theory or in practice?


I've used it a few times with VS Code.

Although I find the Electron MCP is better for what I'm doing at the moment.


I've used it to read authenticated pages with Chromium. It can be run as a headless browser and convert the HTML to markdown, but I generally open Chromium, authenticate to the system, then allow the CLI agent to interact with the page.

https://github.com/grantcarthew/scripts/blob/main/get-webpag...


The irony of this is that so many Reddit comments these days are AI-generated.


Unfortunately, I think they stopped doing this after COVID.


Maybe Vegas is different, but I saw them just a couple of weeks ago. They were doing a 50th anniversary performance at the place they did their first show together, the Minnesota Renaissance Festival. They hung around in the crowd for at least 20 minutes after the show, talking and letting people get selfies.


This misses the forest for the trees, IMO:

- The datacenter GPU market is 10x larger than the consumer GPU market for Nvidia (and it's still growing). Winning an extra few percentage points in consumer is not a priority anymore.

- Nvidia doesn't have a CPU offering for the datacenter market and they were blocked from acquiring ARM. It's in their interest to have a friend on the CPU side.

- Nvidia is fabless and has concentrated supplier and geopolitical risk with TSMC. Intel is one of the only other leading fabs onshoring, which significantly improves Nvidia's supplier negotiation position and hedges geopolitical risk.


> Nvidia doesn't have a CPU offering for the datacenter market and they were blocked from acquiring ARM. It's in their interest to have a friend on the CPU side.

Someone should tell Nvidia that. They sure seem to think they have a datacenter CPU.

https://www.nvidia.com/en-us/data-center/grace-cpu-superchip...


I wonder if this signals a lack of confidence in their CPU offerings going forward?

But there's always TSMC being a pretty hard bottleneck - maybe they just can't get enough capacity (and can't charge anywhere near what their GPU offerings command per wafer), and pairing with Intel themselves is preferable to just using Intel's foundry services?


> Someone should tell nvidia that

To be fair, from what I hear, someone really should tell at least half of Nvidia that.


Jensen was literally talking about the need for an x86 CPU on yesterday's webcast.


> Nvidia is fabless and has concentrated supplier and geopolitical risk with TSMC.

The East India Company conducted continental wars on its own. A modern company with a $4T valuation, country-GDP-sized revenue, and possession of the key military technology of today's and tomorrow's wars - AI software and hardware, including robotics - could successfully wage such a continental war through a suitable proxy, say an oversized private military contractor (especially one massively armed with drones and robots), and in particular could defend an island like Taiwan. (Or, thinking backwards: an attack on Taiwan would cause a trillion- or two-trillion-dollar drop in NVDA's valuation. What options come onto the table when there is a threat of a trillion-dollar loss? For comparison, 20 years in Iraq cost $3 trillion, i.e. $150B/year buys a lot of military hardware and action, and an efficient defense of Taiwan would cost much less than that.)


Defending against territorial conquest is considerably easier than defending against kinetic strikes on key manufacturing facilities


Not necessarily. Territorial war requires people. Defense against kinetic strikes on key facilities concentrated in a smallish territory mostly requires high tech - radars and missiles - and that would be much easier for a very rich high-tech US corporation.

An example: the Starlink antenna, at sub-$500, is a phased array that is roughly a half or a third of such an array on a modern fighter jet, where it costs several million. Musk naturally couldn't go the million-per-antenna route, so he had to develop and source it on his own. The same goes for anti-missile defense - if/when NVDA gets around to defending the TSMC fabs, it would produce such defense systems orders of magnitude cheaper, and that defense would work much better than modern military systems.


If China bombs TSMC, we blockade the Strait of Malacca.

China's economy shuts down in a month; their population starves in another month.


    > Nvidia is fabless and has concentrated supplier and geopolitical risk with TSMC. Intel is one of the only other leading fabs onshor[e]
TSMC is building state-of-the-art fabs in Arizona, USA, and Samsung is doing the same in Texas, USA. I assume these are being built to reduce geopolitical risk on all sides.

Something I never read about: why can't Nvidia use Samsung's fabs? They are very close to TSMC's state of the art.


> They are very close to TSMC state of the art.

They're not. Most have tried at one point. Apple had a release with TSMC + Samsung chips, and users spotted a difference. There was quite a bit of negativity.


TSMC will not have its state of the art on US soil.

The Taiwanese government prevents them from doing it; the leading node has to stay on Taiwanese soil.


    > Taiwanese gov prevents them from doing it. Leading node has to be on Taiwanese soil
This is a bold claim. Do you have any public evidence to share? I have never once seen this mentioned in any of the newspaper articles I have read about TSMC and their expansion in the US.



Maybe after being bitten by Samsung on their RTX 3000 GPUs: power spikes and a lot of heat.


Intel just released a halfway decent workstation (e.g. data center) card, and we were expecting an even better set of cards by Christmas before this happened.

