![](https://hexbear.net/pictrs/image/5b3b9aae-f62d-4a32-855c-631182c9e7df.png)
![](https://hexbear.net/pictrs/image/2c4e08c9-7864-4b30-8012-7ffb79b949f6.jpeg)
Synthetic data is basically a fancy way of saying ‘I’m properly formatting data and reinforcing the AI’s good outputs’. Rearranging words, fixing or adding tags, that sort of thing. It’s generated with various tools that usually have an LLM or VLM plugged in, though some are as simple as a regex script.
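To give a rough idea of the "simple regex script" end of that spectrum, here’s a minimal sketch of a tag cleanup pass. The function name `normalize_tags` and the tag format (comma- or semicolon-separated, underscores instead of spaces) are just assumptions for illustration, not any particular tool’s convention.

```python
import re

def normalize_tags(raw: str) -> str:
    """Clean up a messy tag string (hypothetical format, for illustration)."""
    # Split on commas or semicolons, tolerating stray whitespace around them.
    parts = re.split(r"\s*[,;]\s*", raw.strip())
    seen = set()
    cleaned = []
    for part in parts:
        # Lowercase the tag and turn internal whitespace into underscores.
        tag = re.sub(r"\s+", "_", part.strip().lower())
        # Skip empty entries and duplicates, keeping first-seen order.
        if tag and tag not in seen:
            seen.add(tag)
            cleaned.append(tag)
    return ", ".join(cleaned)

print(normalize_tags("Red Hair,  blue eyes ;red hair,, Smiling"))
# prints: red_hair, blue_eyes, smiling
```

The fancier tools do the same kind of job, just with a model deciding what the tags or wording should be instead of a fixed pattern.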
I mean that’s just the case with everything really. There are a lot of very good use cases that mostly have to do with data manipulation, but the coolest ones are in translation. I think we’re approaching a point where small models provide very accurate translations and even carry over tone and intent properly, which is far superior to simple dictionary translation methods. I think it’s very possible that new phones could be outfitted with tensor cores and you could have a real-time universal translator in your hand, though it’ll likely only add ‘subtitles’ irl for you. AI speech recognition has also gotten very good and can be miniaturized. This is the use case I’m most excited for, personally, as a communist. Currently, translating in a foreign country requires a lot of typing (if you don’t have a perfect grasp of the language), and I feel it removes a very human element from conversation. If everyone could locally run a subtitle-translation app it’d be amazing for all of humanity.
There are of course plenty of manufacturing use cases as well, but China is spearheading that, though there is some work being done in the US too, in the few industries that remain.