From https://twitter.com/llm_sec/status/1667573374426701824
- People ask LLMs to write code
- LLMs recommend imports that don’t actually exist
- Attackers work out what these imports’ names are, and create & upload them with malicious payloads
- People using LLM-written code then auto-add malware themselves
Indirect prompt injections will make this worse. Plugins lead to scraping insecure websites (i.e., search for docs for a particular topic). This can result in malicious context being embedded and suggested during a PR or code output.
That along with the above, faking commonly recommended inputs, it becomes very difficult to just trust and use LLM output. One argument is that experienced devs can catch this, but security is often about the weakest link, one junior dev’s mistake with this could lead to a hole.
There are guard rails to put in place for some of these things (i.e., audit new libraries, only scrape from ‘reliable’ websites), but I suspect most enterprises/startups implementing this stuff don’t have such guard rails in place.
Related
https://medium.com/palo-alto-networks-developer-blog/mind-tricks-the-perils-of-prompt-injection-attacks-against-llms-4b148fcd7519
https://rez0.blog/hacking/2023/05/19/prompt-injection-poc.html#:~:text=Indirect Prompt Injection%3A A technique,for the next LLM execution.