David Gerard@awful.systems to TechTakes@awful.systems · English · 5 months ago

come see all the popular super-duper-autocomplete systems failing hard at really simple reasoning questions and babbling nonsense from latent space!

Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models

sinedpick@awful.systems · 5 months ago
This all but confirms that all those benchmark evals are in the training set, right?
- David Gerard@awful.systems (OP) · 5 months ago
  Some forms are - but many are not! The fun stuff is in Appendix 2, the responses.
