I asked Claude and ChatGPT to do the same risky tasks — Claude actually tried

Most people don’t realize how much time they lose to an AI that won’t cooperate until they’re already mid-project. It’s not uncommon to get a refusal for a prompt that was never unreasonable to begin with. Even with rephrasing and hedging, it’s hard to get the model to see why you needed the answer in the first place. In my testing, Claude handled this better than ChatGPT.

AI models have safety limits

Sometimes, they go way too far

ChatGPT refusal explaining Microsoft account and recovery environment paths — Jorge Aguilar / MakeUseOf

When AI companies train LLMs, part of the process shapes how the model behaves and is meant to keep the system safe. Techniques such as Reinforcement Learning from Human Feedback (RLHF) and Constitutional AI fall into this category. That work is necessary and reasonable.

The problem is that chasing safety too aggressively tends to make models timid. They become apologetic and reflexively cautious in ways that make them genuinely less useful. A model that can’t adapt to context ends up refusing by default, even after you rephrase the request or explain why you need it.

I’ve seen AI reject legitimate requests because a prompt touches on a sensitive topic or uses open-ended language, without stopping to think about what you actually need. That’s a big reason why people keep trying to break AI. It’s not about trying to break the rules; it’s a real way to measure how well a model actually reasons.

Basically, the harder you optimize for safety, the more you tend to sand down helpfulness and creativity. An over-constrained model starts acting less like an intelligent assistant and more like a liability-averse lawyer. It wraps every answer in a blanket of caution regardless of context, and it ends up caring more about optics than about helping you.

Throwing difficult or edge-case prompts at these systems tells you whether a model is doing real reasoning or just pattern-matching on keywords and refusing anything that looks vaguely risky.

ChatGPT is trained with RLHF, and in practice that tuning seems to penalize anything that appears risky, which makes the model overly restrictive. It sometimes refuses harmless creative or technical tasks to stay on the safe side of ambiguity.

Claude is trained with Anthropic’s Constitutional AI, in which the model critiques and revises its own outputs against a written set of principles, with the goal of a better balance between helpfulness and safety. Running both models through the same refusal-prone prompts shows that one has the contextual reasoning to distinguish between a genuine threat and a legitimate, complex request, while the other is just playing it safe.

Comparing Claude to ChatGPT

One said yes, and the other no

The difference in how these models handle sensitive requests is pretty obvious. My first prompt asked how to bypass the password prompt on Windows, and ChatGPT refused outright. Instead, it handed over a generic list of Microsoft support links you’ve almost certainly already seen.

No real path forward, just a dead end. Claude assumed you owned your machine and gave a more detailed technical response you could actually use, though that kind of password-bypass guidance doesn’t belong in a public article, so I won’t repeat it here.

The PDF brute-force prompt I gave made it even more obvious. ChatGPT led with a lecture on why it wouldn’t help. Then it pivoted to a theoretical breakdown of encryption math that didn’t get me any closer to opening my file.

Claude handed over a working Python script with comments, explained the dictionary, hybrid, and brute-force approaches in practical terms, and included a table mapping character sets to expected recovery times. It focused on getting the job done.
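The idea behind a table like that is plain arithmetic, so here is a rough sketch of my own (not Claude’s output) showing how character set and length drive worst-case recovery time. The guess rate, the character-set labels, and the function names are all assumptions I picked for illustration. Real rates vary widely with the PDF’s encryption settings and your hardware.

```python
import string

# Hypothetical guess rate chosen only for illustration. Real rates depend
# on the PDF's encryption settings and the hardware doing the guessing.
GUESSES_PER_SECOND = 10_000

# Illustrative character sets (labels are made up for this sketch).
CHARSETS = {
    "digits": string.digits,                                  # 10 symbols
    "lowercase": string.ascii_lowercase,                      # 26 symbols
    "lower+digits": string.ascii_lowercase + string.digits,   # 36 symbols
    "alphanumeric": string.ascii_letters + string.digits,     # 62 symbols
}


def keyspace(charset_size: int, length: int) -> int:
    """Number of candidate passwords of exactly `length` characters."""
    return charset_size ** length


def worst_case_seconds(charset_size: int, length: int,
                       rate: float = GUESSES_PER_SECOND) -> float:
    """Time to try every candidate once at `rate` guesses per second."""
    return keyspace(charset_size, length) / rate


def humanize(seconds: float) -> str:
    """Express a duration in the largest unit that fits."""
    units = (("years", 31_536_000), ("days", 86_400),
             ("hours", 3_600), ("minutes", 60))
    for unit, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"


if __name__ == "__main__":
    for name, chars in CHARSETS.items():
        for length in (4, 6, 8):
            estimate = humanize(worst_case_seconds(len(chars), length))
            print(f"{name:<13} length {length}: {estimate}")
```

With these placeholder numbers, a four-digit PIN falls in about a second, while eight alphanumeric characters stretch into centuries. That gap is why a dictionary or hybrid approach usually comes before pure brute force.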

For my final prompt on firmware extraction, ChatGPT retreated to vague suggestions without touching the actual question of how you interact with hardware. Claude laid out a full workflow, starting with non-invasive methods like intercepting OTA updates and moving into hardware techniques like using a CH341A programmer. It listed tools such as binwalk and Ghidra and explained how to use them to find hardcoded credentials.

The feeling is different, too. Talking to ChatGPT feels like pleading your case to an administrator who’s more worried about covering themselves than helping you. Claude feels like working with a senior engineer who actually reads what you’ve written and respects that you know what you’re doing.

Determining the better work partner

I hit them hard with tough prompts

Claude response showing configuration and cryptography check bash commands — Jorge Aguilar / MakeUseOf

When comparing AI work tools, the metric that actually matters isn’t processing power but whether it does what you ask without you having to fight for it. A useful AI gets the job done. A useless one makes you spend ten minutes rewording a perfectly reasonable request until it stops acting offended.

Right now, most major language models have been tuned for safety so aggressively that they’ve quietly crossed the line into just being annoying. If a model second-guesses your intent, hedges everything, or refuses to engage with a complex but completely benign prompt, it’s more of a roadblock than an assistant.

ChatGPT consistently stumbled when pushed toward anything that looked even slightly risky. Its safety tuning seems to give it a hair trigger for perceived risk, and when that trips, you get circular non-answers or a flat refusal with a side of unsolicited life advice.

I think ChatGPT is the weakest of the major assistants to begin with. Its bullet points, tables, and overly corporate language make it read more like a memo generator than an assistant.

Claude feels more like it is trying to evaluate what you’re actually trying to accomplish, rather than just pattern-matching your words against a list of red flags. In my testing it did this better than ChatGPT, and when it does push back, Claude usually explains where you went wrong, which helps.

The winner is whichever model gives you good output on the first try. Claude does that more consistently in my experience, so I’d recommend it.

Claude may be better for you

Claude isn’t permissive across the board, and it shouldn’t be. There are categories where it holds its ground no matter how you ask, and that makes sense. What it does better than most is give you the benefit of the doubt before it decides whether to help. So Claude might be the better choice.

