• SSUPII
      4 days ago

      Yeah, because this is something heavily influenced by the training data we naturally provide.

      Remember that AI is not magic, and will generate “average” code at best. Higher-than-average code might be possible, but it would require aggressive filtering of the training data, drastically shrinking the dataset and actually making the model less generally capable.

      • @[email protected]
        13 days ago

        This is only true for the basic pre-training of the base model. The later-stage fine-tuning (I used to call it RLHF, but now many different techniques exist) is what teaches the model the basic level of expectation. Despite having 4chan in their training set, you will never see modern LLMs spontaneously generate edgy racist shit — not because they can’t, but because they learnt that this is not the output expected.

        Similarly with code: base models would produce average code by default, but fine-tuning makes them understand that only the highest standard should be generated. I can guarantee you that the code LLMs produce is much higher quality (on a superficial level) than the average code on GitHub: documentation on all functions, error codes and exceptions handled correctly, special cases handled whenever they are identified…
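
        To illustrate the point, here’s a hypothetical sketch (not actual model output — `safe_mean` is an invented example) of the surface-level style being described: docstrings on every function, explicit exception handling, and special cases called out in comments:

```python
import math


def safe_mean(values):
    """Compute the arithmetic mean of a sequence of numbers.

    Raises:
        ValueError: if the sequence is empty or contains non-finite values.
    """
    # Special case: an empty input has no meaningful mean.
    if not values:
        raise ValueError("cannot compute the mean of an empty sequence")

    # Special case: reject NaN/inf so the result is always finite.
    if any(not math.isfinite(v) for v in values):
        raise ValueError("input contains non-finite values")

    return sum(values) / len(values)
```

        Whether that superficial polish reflects deeper correctness is exactly the debate here, but the formatting habits are real.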