GrapheneOS now officially supports Pixel 9, 9 Pro, and 9 Pro XL

BookSwiftieAndrew@kbin.earth · 23 days ago

GrapheneOS now officially supports Pixel 9, 9 Pro, and 9 Pro XL

Possibly linux@lemmy.zip · 20 days ago

What else would it be except an LLM? What do you think "model" means?

helenslunch · 20 days ago

…what do you think LLM means?

Possibly linux@lemmy.zip · 20 days ago

Large language model

helenslunch · 20 days ago

Large language model.

You are aware AI is used for more than just reading and generating text?

evo@sh.itjust.works · edit-2 20 days ago

You are aware that those are often called LMMs (Large Multimodal Models)? And one of the modes that makes them multimodal is language. All LMMs are or contain an LLM.

helenslunch · edit-2 20 days ago

LLMs are not called LMMs, they’re called LLMs LOL

But thank you for moving the goalposts and making it clear you don’t know what you’re talking about and have no interest in an honest discussion. Goodbye.

Possibly linux@lemmy.zip · 20 days ago

https://github.com/haotian-liu/LLaVA

I don’t think Google actually uses LLaVA, but the concept is the same. The image data gets converted into embeddings in the same space as the text tokens, so the language model can process both together.

helenslunch · 20 days ago

How do you convert text to images?

Possibly linux@lemmy.zip · edit-2 20 days ago

It’s complicated and far over my head mathematically.

https://arxiv.org/abs/2304.08485

Instruction tuning large language models (LLMs) using machine-generated instruction-following data has improved zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field. In this paper, we present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data. By instruction tuning on such generated data, we introduce LLaVA: Large Language and Vision Assistant, an end-to-end trained large multimodal model that connects a vision encoder and LLM for general-purpose visual and language understanding. Our early experiments show that LLaVA demonstrates impressive multimodal chat abilities, sometimes exhibiting the behaviors of multimodal GPT-4 on unseen images/instructions, and yields a 85.1% relative score compared with GPT-4 on a synthetic multimodal instruction-following dataset. When fine-tuned on Science QA, the synergy of LLaVA and GPT-4 achieves a new state-of-the-art accuracy of 92.53%. We make GPT-4 generated visual instruction tuning data, our model and code base publicly available.

This paper is a few years old, but it covers the basics. The newer LLaVA versions are based on open models.
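
For what it's worth, here is a rough toy sketch of the "connects a vision encoder and LLM" part of the abstract. It is not LLaVA's actual code: the sizes, the random weights, and the names `linear_project` and `build_llm_input` are all made up for illustration, and the vision encoder is replaced by random patch features. As I understand it, the idea is that image features get projected into the same embedding space as the text tokens and then go into the language model as one sequence.

```python
import random


def linear_project(features, weights):
    """Multiply each feature vector (length d_in) by a d_in x d_out matrix."""
    d_in, d_out = len(weights), len(weights[0])
    return [
        [sum(f[i] * weights[i][j] for i in range(d_in)) for j in range(d_out)]
        for f in features
    ]


def build_llm_input(patch_features, text_embeddings, projection):
    """Project image patch features into the LLM embedding space and
    prepend them to the text token embeddings as a single sequence."""
    image_tokens = linear_project(patch_features, projection)
    return image_tokens + text_embeddings


rng = random.Random(0)
VISION_DIM, LLM_DIM = 4, 6            # toy sizes (hypothetical)
NUM_PATCHES, NUM_TEXT_TOKENS = 3, 5   # toy sizes (hypothetical)

# Stand-in for a vision encoder's output: one feature vector per image patch.
patch_features = [[rng.gauss(0, 1) for _ in range(VISION_DIM)]
                  for _ in range(NUM_PATCHES)]
# Stand-in for the LLM's embeddings of the text prompt tokens.
text_embeddings = [[rng.gauss(0, 1) for _ in range(LLM_DIM)]
                   for _ in range(NUM_TEXT_TOKENS)]
# Stand-in for the trainable projection that connects the two models.
projection = [[rng.gauss(0, 1) for _ in range(LLM_DIM)]
              for _ in range(VISION_DIM)]

sequence = build_llm_input(patch_features, text_embeddings, projection)
print(len(sequence), len(sequence[0]))  # 8 6
```

So the language model ends up with 3 "image tokens" plus 5 text tokens, all the same width. In a real model the projection is learned during training instead of being random.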