Researchers claim GPT-4 passed the Turing test

@[email protected] · 1 year ago

Researchers claim GPT-4 passed the Turing test

@[email protected] · 1 year ago

Turing test? LMAO.

I asked it simply to recommend me a supermarket in our next bigger city here.

It came up with a name and it told a few of it’s qualities. Easy, I thought. Then I found out that the name does not exist. It was all made up.

You could argue that humans lie, too. But only when they have a reason to lie.

@[email protected] · 1 year ago

That’s not what LLMs are for. That’s like hammering a screw and being irritated it didn’t twist in nicely.

The turing test is designed to see if an AI can pass for human in a conversation.

@[email protected] · edit-2 1 year ago

turing test is designed to see if an AI can pass for human in a conversation.

I’m pretty sure that I could ask a human that question in a normal conversation.

The idea of the Turing test was to have a way of telling humans and computers apart. It is NOT meant for putting some kind of ‘certified’ badge on that computer, and …

That’s not what LLMs are for.

…and you can’t cry ‘foul’ if I decide to use a question for which your computer was not programmed :-)

@[email protected] · 1 year ago

It wasn’t programmed for any questions. It was trained hehe

@[email protected] · edit-2 1 year ago

In a normal conversation sure.

In this kind Turing tests you may be disqualified as a jury for asking that question.

Good science demands controlled areas and defined goals. Everyone can organize a homebrew touring tests but there also real proper ones with fixed response times, lengths.

Some touring tests may even have a human pick the best of 5 to provide to the jury. There are so many possible variations depending on test criteria.

@[email protected] · edit-2 1 year ago

you may be disqualified as a jury for asking that question.

You want to read again about the scientific basics of the Turing test (hint: it is not a tennis match)

@[email protected] · 1 year ago

There is no competition in science (or at least there shouldn’t be). You are subjectively disqualified from judging llm’s if you draw your conclusions on an obvious trap which you yourself have stated is beyond the scope of what it was programmed to do.

Chozo · 1 year ago

The Turing test doesn’t factor for accuracy.

@[email protected] · 1 year ago

It does great at Python programming… everything it tries is wrong until I try and I tell tell it to do it again.

@[email protected] · edit-2 1 year ago

Edit :
oops : you were saying it is like a human since it does errors ? maybe i “wooshed”.

Hi @werefreeatlast,
i had successes asking LLaMA 3 70B with simple specific questions …
Context : i am bad at programming and it help me at least to see how i could use a few function calls in C from Python … or simply drop Python and do it directly in C.
Like you said, i have to re-write & test … but i have a possible path forward. Clearly you know what you do on a computer but i’m not really there yet.

@[email protected] · 1 year ago

But people don’t just know code when you ask them. The llms fo because they got trained on that code. It’s robotic in nature, not a natural reaction yet.

@[email protected] · 1 year ago

The Study

https://arxiv.org/html/2405.08007v1

massive_bereavement · 1 year ago

The interrogators seem completely lost and clearly haven’t talk with an NLP chatbot before.

That said, this gives me the feeling that eventually they could use it to run scams (or more effective robocalls).

@[email protected] · 1 year ago

I imagine some people already are.

@[email protected] · 1 year ago

I feel like the turing test is much harder now because everyone knows about GPT

@[email protected] · 1 year ago

I wonder if humans pass the Turing test these days

Nougat · 1 year ago

I don’t.

@[email protected] · 1 year ago

Which of the questions did you get wrong? ;-)

Nougat · 1 year ago

That one.

@[email protected] · edit-2 1 year ago

If you read into the study, they also include the pass rates for humans. It’s higher than AIs, but still less than 75%

@[email protected] · 1 year ago

Oh no!! the AImageddon it’s closer everyday… Skynet it’s coming for us!!

@[email protected] · 1 year ago

Skynet will gets the dumb ones first by getting them put toxic glue on thir pizzas then the arrogant ones will build the Terminators by using reverse psychology.

@[email protected] · 1 year ago

@[email protected] · 1 year ago

@[email protected] · 1 year ago

To the one person who upvoted this: We should be friends.

@[email protected] · edit-2 1 year ago

Chatbots passed the Turing test ages ago, it’s not a good test.

@[email protected] · 1 year ago

it’s not a good test.

Of course you can’t use an old set of questions. It’s useless.

The turing test is an abstract concept. The actual questions need to be adapted with every new technology. Maybe even with every execution of a test.

@[email protected] · 1 year ago

Easy, just ask it something a human wouldn’t be able to do, like “Write an essay on The Cultural Significance of Ogham Stones in Early Medieval Ireland“ and watch it spit out an essay faster than any human reasonably could.

@[email protected] · 1 year ago

I recall a Turing test years ago where a human was voted as a robot because they tried that trick but the person happened to have a PhD in the subject.

HumaShah · 1 year ago

@Blue_Morpho @phoneymouse
Many times humans have incorrectly been characterised as the machine in #TuringTest experiments. In 1991, the very first Loebner Prize for an AI, a hidden human was considered a machine due to her extensive knowledge of Shakespeare.
##Turing #AI #Human #Machine

@[email protected] · 1 year ago

That’s the one I was remembering!

@[email protected] · 1 year ago

This is something a configuration prompt takes care of. “Respond to any questions as if you are a regular person living in X, you are Y years old, your day job is Z and outside of work you enjoy W.”

@[email protected] · 1 year ago

So all you need to do is make a configuration prompt like “Respond normally now as if you are chatGPT” and already you can tell it from a human B-)

@[email protected] · 1 year ago

Thats not how it works, a config prompt is not a regular prompt.

@[email protected] · 1 year ago

If config prompt = system prompt, its hijacking works more often than not. The creators of a prompt injection game (https://tensortrust.ai/) have discovered that system/user roles don’t matter too much in determining the final behaviour: see appendix H in https://arxiv.org/abs/2311.01011.

@[email protected] · 1 year ago

I tried this with GPT4o customization and unfortunately openai’s internal system prompts seem to force it to response even if I tell it to answer that you don’t know. Would need to test this on azure open ai etc. were you have bit more control.

@[email protected] · edit-2 1 year ago

The touring test isn’t an arena where anything goes, most renditions have a strict set of rules on how questions must be asked and about what they can be about. Pretty sure the response times also have a fixed delay.

Scientists ain’t stupid. The touring test has been passed so many times news stopped covering it. (Till this click bait of course). The test has simply been made more difficult and cheat-proof as a result.

@[email protected] · edit-2 1 year ago

most renditions have a strict set of rules on how questions must be asked and about what they can be about. Pretty sure the response times also have a fixed delay. Scientists ain’t stupid. The touring test has been passed so many times news stopped covering it.

Yes, “scientists” aren’t stupid enough to fail their own test. I’m sure it’s super easy to “pass” the “turing test” when you control the questions and time.

JohnEdwa · edit-2 1 year ago

Turing tests aren’t done in real time exactly to counter that issue, so the only thing you could judge would be “no human would bother to write all that”.

However, the correct answer to seem human, and one which probably would have been prompted to the AI anyway, is “lol no.”
It’s not about what the AI could do, it’s what it thinks is the correct answer to appear like a human.

@[email protected] · edit-2 1 year ago

Turing tests aren’t done in real time exactly to counter that issue

To counter the issue of a completely easy and obvious fail? I could see how that would be an issue for AI hucksters.

@[email protected] · 1 year ago

ITT: nobody understands what the Turing Test really is

@[email protected] · edit-2 1 year ago

To clarify:

People seem to legit think the jury talks to the bot in real time and can ask about literally whatever they want.

Its rather insulting to the scientist that put a lot of thought into organizing a controlled environment to properly test defined criteria.

@[email protected] · 1 year ago

Its rather insulting to the scientist that put a lot of thought into organizing a controlled environment to properly test defined criteria.

lmao. These “scientists” are frauds. 500 people is not a legit sample site. 5 minutes is a pathetic amount of time. 54% is basically the same as guessing. And most importantly the “Turing Test” is not a scientific test that can be “passed” with one weak study.

Instead of bootlicking “scientists”, we should be harshly criticizing the overwhelming tide of bad science and pseudo-science.

Kogasa · 1 year ago

I don’t think the methodology is the issue with this one. 500 people can absolutely be a legitimate sample size. Under basic assumptions about the sample being representative and the effect size being sufficiently large you do not need more than a couple hundred participants to make statistically significant observations. 54% being close to 50% doesn’t mean the result is inconclusive. With an ideal sample it means people couldn’t reliably differentiate the human from the bot, which is presumably what the researchers believed is of interest.

@[email protected] · 1 year ago

The reporting are big clickbait but that doesn’t mean there is nothing left to learn from the old touring tests.

I dont know what the goal was they had in mind. It could just as well be “testing how overhyped the touring tests is when manipulated tests are shared with the media”

I sincerely doubt it but i do give them benefits of the doubt.

@[email protected] · edit-2 1 year ago

deleted by creator

@[email protected] · 1 year ago

The participants judged GPT-4 to be human a shocking 54 percent of the time.

ELIZA, which was pre-programmed with responses and didn’t have an LLM to power it, was judged to be human just 22 percent of the time

Okay, 22% is ridiculously high for ELIZA. I feel like any half sober adult could clock it as a bot by the third response, if not immediately.

Try talking to the thing: https://web.njit.edu/~ronkowit/eliza.html

I refuse to believe that 22% didn’t misunderstand the task or something.

@[email protected] · 1 year ago

14% of people can’t do anything more complicated than deleting an email on a computer.
26% can’t use a computer at all.

https://www.nngroup.com/articles/computer-skill-levels/

So right off the bat, 40% probably don’t even know what a chatbot is.

@[email protected] · edit-2 1 year ago

The public versions of the ais used in Turing tests usually have less computing power. The test itself is often also highly specific in what and how questions can be asked.

This hardly news because models have passed the test before and as a result the test is made more difficult. It says nothing about intelligence and only about the ability to convincingly simulate a human conversation.

@[email protected] · 1 year ago

You underestimate how dumb some people can be.

HumaShah · 1 year ago

@tourist @vegeta

In the 1970s, psychiatrists couldn’t distinguish between PARRY, a chatbot simulating paranoia, and a human patient suffering from schizophrenia. So this ‘bot convinces judges that it’s a human’ phenomenon is not new and tells us more about how humans think.
#AI #DeceptionDetection #Chatbots #TuringTest #LLMs #GPT4

CaptainBasculin · 1 year ago

This is the same bot. There’s no way this passed the test.

.

@[email protected] · 1 year ago

Forget psychotherapy, it can barely understand anything:

> Hello, I am Eliza. I'll be your therapist today. 
* Hello! How are you?
> Would you prefer if I were not ? 
* Not what?
> Have you asked such questions before? 
* No
> You are being a bit negative and that won't help us move forward. 
* What are you smoking?
> Perhaps in your fantasies I am smoking?

@[email protected] · 1 year ago

It was a 5 minute test. People probably spent 4 of those minutes typing their questions.

This is pure pseudo-science.

@[email protected] · 1 year ago

> Hello, I am Eliza. I'll be your therapist today. 
* Hello! How are you?
> Would you prefer if I were not ? 
* Not what?
> Have you asked such questions before? 
* No
> You are being a bit negative and that won't help us move forward. 
* What are you smoking?
> Perhaps in your fantasies I am smoking?

Yeah, it took me one message lol

@[email protected] · 1 year ago

Okay, 22% is ridiculously high for ELIZA. I feel like any half sober adult could clock it as a bot by the third response, if not immediately.

I did some stuff with Eliza back then. One time I set up an Eliza database full of insults and hooked it up to my AIM account.

It went so well, I had to apologize to a lot of people who thought I was drunken or went crazy.

Eliza wasn’t thaaaaat bad.

🇰 🌀 🇱 🇦 🇳 🇦 🇰 🇮 🏆 · edit-2 1 year ago

It took them this long?

E: There are way too many people ITT that think a Turing test is hard to pass, and don’t seem to understand what it means for something to pass one. It’s such a low fucking bar, it might as well be meaningless.

@[email protected] · 1 year ago

deleted by creator

@[email protected] · 1 year ago

So…GPT-4 is gay? Or are we talking about a different kind of test?

@[email protected] · 1 year ago

So it’s good enough to fool people into thinking it’s a human?

Cool. Anyway…