I'm a little unimpressed by its instruction following here; the summaries I get from other models are a lot closer to my system prompt. Here's the same thing against Gemini 2.5 Pro for example (massively better): https://gist.github.com/simonw/f21ecc7fb2aa13ff682d4ffa11ddc...
I tried summarizing the thread so far (339 comments) with a custom system prompt [0] and a user prompt that captures the structure (hierarchy and upvotes) of the thread [1].
This is the output I got (based on the HN-Companion project) [2]:
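For the curious, the structure-capturing step is roughly this kind of thing. A minimal sketch, not the actual HN-Companion code: it assumes the public Algolia HN API, and it only captures the reply hierarchy, since that API doesn't expose per-comment scores.

    # Sketch (my own assumptions, not HN-Companion's code): fetch a
    # thread from the public Algolia HN API and flatten it into an
    # indented block that preserves the reply hierarchy.
    import html
    import re
    import requests

    def fetch_item(item_id):
        url = f"https://hn.algolia.com/api/v1/items/{item_id}"
        return requests.get(url, timeout=30).json()

    def flatten(node, depth=0, lines=None):
        if lines is None:
            lines = []
        # Strip HTML tags and entities from the comment body.
        text = html.unescape(re.sub(r"<[^>]+>", " ", node.get("text") or ""))
        if text.strip():
            lines.append("  " * depth + f"[{node.get('author', '?')}] {text.strip()}")
        for child in node.get("children", []):
            flatten(child, depth + 1, lines)
        return lines

    story = fetch_item(43595585)  # example item ID, substitute the real thread
    prompt_body = "\n".join(flatten(story))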
That Gemini 2.5 one is impressive. I found it interesting that the blog post didn't mention Gemini 2.5 at all. Okay, it was released pretty recently, but 10 days seems like enough time to run the benchmarks, so maybe the results would have made Llama 4 look worse?
This is a great idea! It's exactly what I was thinking too, so I started working on a side project. Currently the project can create summaries like this [1].
Since the HN homepage stories change throughout the day, I thought it was better to create the newsletter based on https://news.ycombinator.com/front
So you get the news a day late, but it captures the top stories for that day. The newsletter has a high-level summary for each post and a link to a static site with the details for that story.
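The fetch step is roughly this (a sketch, not the actual project code): HN's /front page accepts a ?day=YYYY-MM-DD parameter, and the CSS selectors below are based on its current markup, which could change.

    # Sketch: grab yesterday's front page and collect title + link
    # for each story. Selectors match HN's current HTML.
    from datetime import date, timedelta
    import requests
    from bs4 import BeautifulSoup

    yesterday = (date.today() - timedelta(days=1)).isoformat()
    resp = requests.get(
        "https://news.ycombinator.com/front",
        params={"day": yesterday},
        timeout=30,
    )
    soup = BeautifulSoup(resp.text, "html.parser")

    stories = []
    for row in soup.select("tr.athing"):
        link = row.select_one("span.titleline > a")
        if link:
            stories.append({"title": link.get_text(), "url": link["href"]})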
Same; it was using the high-quality OpenAI voice until my account ran out of funds. Now it's using edge-tts, which is free. So far it seems like the best option in terms of price/performance, but I'm happy to switch if something better comes along.
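For anyone who wants to try it, edge-tts has a simple async Python API. A minimal sketch; the voice name is just one I know exists (run `edge-tts --list-voices` for the full set):

    # Synthesize text to an mp3 with edge-tts (free, uses Edge's
    # online voices).
    import asyncio
    import edge_tts

    async def synthesize(text, out_path):
        communicate = edge_tts.Communicate(text, voice="en-US-AriaNeural")
        await communicate.save(out_path)  # save() is a coroutine

    asyncio.run(synthesize("Today's top Hacker News stories...", "digest.mp3"))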
I'll look into it for the next iteration! I could just take the transcript that's already on the page and put it somewhere separate from the audio.
But thinking about it a little more, what would the use case for a text version actually look like? I feel like if you're already on HN, navigating somewhere else to get a TLDR would be too much friction. Or are we talking RSS/blog type delivery?
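If it's RSS, something like this with the feedgen library would probably be enough. A sketch only; every title, link, and description here is a placeholder:

    # Publish the text summaries as an RSS feed with feedgen.
    from feedgen.feed import FeedGenerator

    fg = FeedGenerator()
    fg.title("HN Audio Digest (text edition)")
    fg.link(href="https://example.com/digest", rel="alternate")
    fg.description("TLDR text versions of the daily audio summaries")

    fe = fg.add_entry()
    fe.title("Top stories, text edition")
    fe.link(href="https://example.com/digest/latest")
    fe.description("Summary text goes here...")

    fg.rss_file("digest.xml")  # write the feed to disk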
And with Scout I got complete junk output for some reason:
Junk output here: https://gist.github.com/simonw/d01cc991d478939e87487d362a8f8...

I'm running it through openrouter, so maybe I got proxied to a broken instance?
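One way to test the broken-instance theory would be to pin the provider: OpenRouter speaks the OpenAI-compatible API and, if I'm reading its docs right, accepts a provider-routing field in the request body. The model slug and that field are assumptions:

    # Sketch: hit the same model through OpenRouter's OpenAI-compatible
    # endpoint while pinning routing to one provider, to rule out a
    # single broken upstream instance.
    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )
    resp = client.chat.completions.create(
        model="meta-llama/llama-4-scout",  # assumed model slug
        messages=[
            {"role": "system", "content": "Summarize the following thread."},
            {"role": "user", "content": "...thread text..."},
        ],
        extra_body={"provider": {"order": ["Groq"]}},  # pin one provider
    )
    print(resp.choices[0].message.content)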
I managed to run it through Scout on Groq directly (with the llm-groq plugin) but that had a 2048 limit on output size for some reason:
Result here: https://gist.github.com/simonw/a205c5fc131a1d4e9cd6c432a07fe...
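If the 2048 cap is just the plugin's default rather than a hard limit, the Python API should let you override it. A rough sketch; the model ID is a guess (run `llm models` to see what llm-groq actually registers), and max_tokens assumes the plugin exposes that option:

    # Sketch: same run via llm's Python API instead of the CLI,
    # with an explicit output cap.
    import llm

    model = llm.get_model("groq/llama-4-scout")  # assumed ID
    response = model.prompt(
        "Summarize the themes of the opinions expressed here.",
        system="You are a summarizer of Hacker News threads.",
        max_tokens=4096,  # assumes llm-groq supports this option
    )
    print(response.text())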