I agree with you in spirit, but I find that distinction hard to explain. What's the difference between mass web scraping and an automated tool using this agent? I assume the biggest differences would be scope and intent... but because this API is open for general development, it's difficult to judge the intent and scope of any given use.
What's difficult to explain? If you're having an agent crawl a handful of pages to answer a targeted query, that's clearly not mass scraping. If you're pulling down entire websites and storing their contents, that's clearly not normal use. Sure, there's a gray area, but I bet almost everyone who doesn't work for an AI company would be able to agree whether any given activity was "mass scraping" or "normal use".
What is worse: 10,000 agents running daily targeted queries on your site, or 1 query pulling 10,000 records to cache and post-process your content without unnecessarily burdening your service?
A single agent regularly pulling 10k records that nobody will ever use is worse than 10k agents coming from the same source and sharing a cache, which they fill as they run targeted requests. But worse still are 10k agents from 10k different sources, each scraping 10k sites, where 9,999 of the pages fetched are irrelevant to the request.
In the end it's all about the impact on the servers, and that load can be reduced, but this does not seem to be happening at scale right now. So in that regard, centralizing usage and honouring the rules is a good step, and the rest are details to figure out along the way.
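To make "centralizing usage and honouring the rules" concrete, here's a minimal sketch of what a shared fetch layer could look like: all agents from the same source go through one cache and one robots.txt check, so repeat queries never hit the origin. The USER_AGENT string, the 15-minute TTL, and the fetch helper are illustrative assumptions, not any vendor's actual implementation.

```python
# Minimal sketch: a shared fetch layer that honours robots.txt and
# reuses one TTL cache across all agents from the same source.
import time
import urllib.robotparser
from urllib.parse import urlparse

import requests

USER_AGENT = "example-agent/0.1"   # illustrative agent identifier
CACHE_TTL = 15 * 60                # assumed freshness window: 15 minutes

_cache = {}    # url -> (fetched_at, body), shared by every agent
_robots = {}   # host -> RobotFileParser, fetched once per host

def _allowed(url: str) -> bool:
    parts = urlparse(url)
    host = f"{parts.scheme}://{parts.netloc}"
    if host not in _robots:
        rp = urllib.robotparser.RobotFileParser()
        rp.set_url(host + "/robots.txt")
        try:
            rp.read()
        except OSError:
            pass  # robots.txt unreachable: can_fetch() stays conservative
        _robots[host] = rp
    return _robots[host].can_fetch(USER_AGENT, url)

def fetch(url: str) -> str | None:
    """Return the page body, serving repeat requests from the shared cache."""
    now = time.time()
    if url in _cache and now - _cache[url][0] < CACHE_TTL:
        return _cache[url][1]  # cache hit: zero load on the origin
    if not _allowed(url):
        return None            # honour the site's rules
    resp = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=10)
    resp.raise_for_status()
    _cache[url] = (now, resp.text)
    return resp.text
```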
I suspect you want me to say the first one is worse, but it's impossible to say with so few details. Like: worse for whom? In what way? To what extent?
If (for instance) my content changes often and I always want people to see an up-to-date version, the second option is clearly worse for me!
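For what it's worth, HTTP already has a middle ground between "hammer the origin" and "serve stale copies": conditional requests. A sketch, assuming a requests-based agent (fetch_fresh and the cache dict are hypothetical names):

```python
# Minimal sketch: revalidate a cached copy with a conditional GET, so
# fast-changing content stays fresh without full re-downloads.
import requests

def fetch_fresh(url: str, cache: dict) -> str:
    """Serve from cache only after the origin confirms it is still current."""
    headers = {}
    if url in cache:
        etag, _ = cache[url]
        headers["If-None-Match"] = etag  # ask: has this changed since?
    resp = requests.get(url, headers=headers, timeout=10)
    if resp.status_code == 304:          # Not Modified: cached copy is current
        return cache[url][1]
    resp.raise_for_status()
    if "ETag" in resp.headers:
        cache[url] = (resp.headers["ETag"], resp.text)
    return resp.text
```

If the server replies 304 Not Modified, the agent reuses its copy at almost no cost to the origin; if the content changed, it gets the fresh version. Both sides win, but only if agents bother to implement it.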
No, I've been turning it over in my mind since this question started to emerge, and I think it's complicated; I don't have an answer myself. After all, the first option is really just the analogue of today's web traffic, except that it's no longer your traffic. You created the value, but you don't get the user attention.
My apprehension is not about AI agents per se; it is about the current, and likely future, implementation: AI vendors selling search over, and re-publication of, other parties' content. In this relationship, neither option is great: either these providers are hammering your site on behalf of their subscribers' individual queries, or they are scraping and caching it and reselling potentially stale information about you.