• Kuvwert@lemm.ee
    11 months ago

    52% in the first year is pretty cool. Excited to see how it will evolve.

    • SirGolan@lemmy.sdf.org
      11 months ago

      GPT-4 with Reflexion prompting gets 90% correct on the HumanEval coding benchmark. The paper this is based on is misleading at best.