Is there any TTS engine that doesn't need cloning and has some sort of parameters one can specify?
Like what if I want to graft TTS onto an existing text chat system and give each person a unique, randomly generated voice? Or want to try to get something that's not quite human, like some sort of alien or monster?
You could use an old-school formant synthesizer that lets you tune the parameters, like espeak or DECtalk. espeak apparently has a Klatt mode which might sound better than the default, but I haven't tried it.
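Roughly, as a sketch (assuming espeak-ng is installed; the variant list and the hash-based seeding are just an illustration, not anything standard), you could derive a stable per-user voice from the username:

    # Give each chat user a stable, pseudo-random espeak-ng voice.
    # Assumes espeak-ng is on PATH; the variant list and seeding scheme
    # are illustrative only.
    import hashlib
    import subprocess

    VARIANTS = ["m1", "m2", "m3", "m4", "f1", "f2", "f3", "f4", "croak", "whisper"]

    def speak(username: str, text: str) -> None:
        # Hash the username so the same person always gets the same voice.
        seed = int.from_bytes(hashlib.sha256(username.encode()).digest()[:4], "big")
        variant = VARIANTS[seed % len(VARIANTS)]
        pitch = 20 + seed % 60          # espeak-ng -p takes roughly 0-99
        speed = 140 + (seed >> 8) % 60  # words per minute
        subprocess.run(
            ["espeak-ng", "-v", f"en+{variant}", "-p", str(pitch), "-s", str(speed), text],
            check=True,
        )

    speak("alice", "Hello from the chat room.")
    speak("zorg", "Greetings, puny humans.")

Since the voice is a pure function of the username, everyone hears the same voice for the same person without any coordination between clients.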
Whatever's possible, I run locally on Linux. If it's not, one of my servers has a GPU that I pass through to a Windows VM; I run the games there and display them by streaming. Also works for VR.
Not a setup for everyone and a tad technically complex to set up, but it works well enough for my needs.
It does run into some trouble with games that don't like virtual machines, but since I'm a very casual gamer I just play things that don't complain about that.
> Can someone be the Steam for Excel, please? :)
You can actually add anything you want to Steam, so you can use Steam Link to run Excel remotely.
Sleep just ceased to exist in the last few years and got replaced with an always-on, low-power mode.
I believe the reasoning was partly that suspend-to-RAM had serious reliability issues due to the complexity of saving the state, and partly that people started expecting cell-phone-like behavior where, e.g., mail is always received.
I think that makes sense and was in a way unavoidable.
Compare a physical shop with Spotify. A physical shop has limited space, so old stuff has to be pruned out to leave room for the new releases. So sales for old stuff gradually stop, and there's a small selection of current releases you can buy.
Spotify and the like aren't like that. It's an infinitely growing catalog of music you can play. New releases may go completely unnoticed by users who follow recommendation algorithms. You can trivially follow impulses like "So what else did the band that made Video Killed the Radio Star make?".
Since digital is infinitely reproducible and not perishable this will keep getting worse and worse. Any new artist competes against all of the music that was released before them.
I can generate images or get LLM answers in under 15 seconds on mundane hardware. The image generator draws many times faster than any normal person, and the LLM, even on my consumer hardware, still produces output faster than I can type (and I'm quite good at that), let alone think what to type.
Speed highly correlates with power efficiency. I believe my hardware maxes out somewhere around 150W. 15 seconds of that isn't much at all.
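Back-of-the-envelope, using the 150 W and 15 s figures above (and a rough ~100 W for a person's resting metabolism, which is an assumption on my part):

    150 W x 15 s = 2250 J ≈ 0.6 Wh

That's on the order of what a person burns just sitting there for about 20 seconds.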
> Also, why are people moving mountains to make huge, power obliterating datacenters if actually "its fine, its not that much"?
I presume that's mostly training, not inference. But in general anything that serves millions of requests in a small footprint is going to look pretty big.
It's not a good analogy at all, because of what they said about mundane hardware. They're specifically not talking about any kind of ridiculous wattage situation, they're talking about single GPUs that need fewer watts than a human in an office to make text faster than a human, or that need 2-10x the watts to make video a thousand times faster.
An LLM gives AN answer. Ask it for even a handful more than that and it gets confused, but instead of admitting it the way a human would, it confidently proceeds with incorrect answers. You never quite know when the context got poisoned, but once it is, reliability drops to zero.
There are many things to say about this. Free is worthless. Speed is not necessarily a good thing. The image generation is drivel. But...
The main nail in the coffin is accountability. I can't trust my work if I can't trust the output of the machine. (And as a bonus, the machine can't build a house; it's single-purpose.)
I think you still should be able to expect a bit of accommodation on trains that cross country borders or go to airports.
The EU makes travel between EU countries as easy as travel between US states. You can just get on a train from Germany to Spain without any prior planning.
It's also unusual given how much English you'll hear in Germany nowadays (at least in major, tourist-attracting cities) in just about any other context.
English has been in a hegemonic position over German for the past sixty years, not vice versa.
The majority of popular German-language films tend to have English-language titles when aimed at the English market, and nearly always when aimed at children: "Goodbye Lenin", "Run Lola Run", etc. I was pretty amazed at "Ice Age", because it would be easy and concise to translate.
The way I see it, when LLMs work, they're almost magical. When they don't, oh well, it didn't take that long anyway, and I didn't have them until recently, so I can just do things the old boring way if the magic fails.
The problem with Zork is that you don't have a list of all the options in front of you, so you have to guess. You could have a menu that lists all the valid options, but that changes the game. It no longer requires you to use imagination and open-ended thinking; it becomes more of a point'n'click storybook.
But for tools, we should have a clear, up-front list of capabilities and menu options. Photoshop and VS Code give you menu after menu of options with explicit, well-defined behaviors, because they are tools used to achieve a specific aim and not toys for open-ended exploration.
An LLM doesn't give you a menu because the LLM doesn't even know what it's capable of. And that's why I think we see such polarized responses: some people want an LLM that's a supercharged version of a tool, others want a toy for exploration.
The only time it ever seems like magic is when you don't really care about the problem or how it gets "solved" and are willing to ignore all the little things it got wrong.
Generative AI is neither magic, nor does it really solve any problems. The illusion of productivity is all in your head.
For my uses, my rule is "long to research, but easy to verify". I only ask for things where I can quickly determine whether the answer is right or not; I just don't want to spend half an hour googling and sorting through the data.
For most of my queries there's an acceptable margin of error, which is generally unavoidable AI or not. Google isn't guaranteed to return everything you might want either.
> As a practical implementation of "six degrees of Kevin Bacon", you could get an organic trust chain to random people.
GPG is terrible at that.
0. Alice's GPG trusts Alice's key tautologically.
1. Alice's GPG can trust Bob's key because it can see Alice's signature on it.
2. Alice's GPG can trust Carol's key because Alice has Bob's key, and Carol's key is signed by Bob.
After that, things break. GPG has no tools for finding longer paths like Alice -> Bob -> ??? -> signature on some .tar.gz.
I'm in the "strong set", I can find a path to damn near anything, but only with a lot of effort.
The good way used to be using the path finder, some random website maintained by some random guy, which disappeared years ago. The bad way is downloading a .tar.gz, checking the signature, fetching the key, then fetching every key that signed it, in the hope somebody you know signed one of those, and so on.
And GPG is terrible at dealing with that, it hates having tens of thousands of keys in your keyring from such experiments.
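For what it's worth, the graph search itself is the easy part; the pain is extracting the signature graph from GPG and keeping the keyring usable. A minimal sketch of the "who signed whom" breadth-first search, over a made-up edge list rather than a real keyring (all names hypothetical):

    # Minimal sketch of the trust-path search GPG itself doesn't offer.
    # 'signed_by' maps a key to the set of keys that signed it; in reality
    # you'd have to scrape this out of gpg --list-sigs output, which is
    # exactly the part GPG makes miserable.
    from collections import deque

    signed_by = {
        "carol": {"bob"},
        "bob": {"alice"},
        "tarball-signer": {"carol", "mallory"},
    }

    def trust_path(my_key: str, target: str):
        # Breadth-first search from the target key back toward a key I already trust.
        queue = deque([[target]])
        seen = {target}
        while queue:
            path = queue.popleft()
            key = path[-1]
            if key == my_key:
                return list(reversed(path))
            for signer in signed_by.get(key, ()):
                if signer not in seen:
                    seen.add(signer)
                    queue.append(path + [signer])
        return None

    print(trust_path("alice", "tarball-signer"))  # ['alice', 'bob', 'carol', 'tarball-signer']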
GPG never grew into the modern era. It was made for people who mostly know each other directly. Finding a trust path to verify the keys of random free software developers isn't something it ever did well.
What's funny about this is that the whole idea of the "web of trust" was (and, as you demonstrate, is) literally PGP punting on this problem. That's how they talked about it at the time, in the 90s, when the concept was introduced! But now the precise mechanics of that punt have become a critically important PGP feature.
I don't think it punted so much as it never had that as an intended use case.
I vaguely recall the PGP manuals talking about scenarios like a woman secretly communicating with her lover, or Bob introducing Carol to Alice, and people reading fingerprints over the phone. I don't think long trust chains and the use case of finding a trust path to some random software maintainer on the other side of the planet were part of the intended design.
I think, to the extent the Web of Trust was supposed to work, it was assumed you'd have some familiarity with everyone along the chain and work through it step by step. Alice would know Bob, who'd introduce his friend Carol, who'd introduce her friend Dave.
I don't think it was quite too early, it just makes tradeoffs that are undesirable.
Lytro, as I understand it, trades a huge amount of resolution for the focusing capability. Some ridiculous amount, like the user gets to see just 1/8th of the pixels on the sensor.
In a way, I'd say rather than too early it was too late. Because autofocus was already quite good and getting better. You don't need to sacrifice all that resolution when you can just have good AF to start with. Refocusing in post is a very rare need if you got the focus right initially.
And time has only made that even worse. Modern autofocus is darn near magic, and people love their high resolution photos.
I find it very useful for wildlife photos. Autofocus never seems to work well for me on e.g. birds in flight.
It's also possible to generate a depth map from a single shot, to use as a starting point for a 3D model.
They're pretty neat cameras. The relatively low output resolution is the main downside. They would also have greatly benefited from consulting with more photographers on the UI of the hardware and software. There's way too much dependency on using the touchscreen instead of dedicated physical controls.
I'd argue the opposite, consumers need more resolution than pros.
A pro will show up with a 300mm f/2.8, a tripod, a camera with good AF and high ISO, and the skills, plan and patience to catch birds in flight.
But all that stuff is expensive. The consumer way to approximate the lack of a good lens is a small, high res sensor. That only works in bright light, but you can get good results with affordable equipment in the right conditions. Greatly reducing the resolution is far from optimal when you can't have a big fancy lens to compensate.
And where is focus the hardest? Mostly where you want to have high detail. Wildlife, macro, sports.
I've got my doubts, because current AI tech doesn't quite live in the real world.
In the real world, something like inventing a meat substitute is a thorny problem that must be solved in meatspace, not in math. Anything from not squicking out the customers, to being practical and cheap to produce, to tasting good, to being safe to eat long term.
I mean, maybe some day we'll have a comprehensive model of humans to the point that we can objectively describe the taste of a steak and then calculate whether a given mix and processing of various ingredients will taste close enough, but we're nowhere near that yet.