Retentive Network: A Successor to Transformer for Large Language Models

nsa@kbin.social · 1 year ago

Lenguador@kbin.social · 1 year ago

This looks amazing, if true. The paper claims state-of-the-art results on literally every metric. Even in their ablation study, the model outperforms all the others.

I’m a bit suspicious that they don’t extend their perplexity numbers to the 13B model or provide its hyperparameters, even though they reference it in the text and in their scaling table.

Code will be released in a week: https://github.com/microsoft/unilm/tree/master/retnet

KingsmanVince@kbin.social · 1 year ago

Unofficial implementation: https://github.com/Jamie-Stirling/RetNet

missing@kbin.social · 1 year ago

If the claims here are true… wow, research and development are moving very quickly.

SSamDav@lemmy.pt · 1 year ago

Would love to know how it compares with Hyena on the LRA (Long Range Arena).
