Ugliest Grifter in AI Sam Altman

Over_ed
Cool, me too. Some here have expressed they think I'm an Elon fan-boi, which I am, but not at the expense of seeing his faults.

Wasn't sure, which is why I kind of asked. I think a lot of Grok mirrors Elon's other endeavors; it's just not as good a fit in AI as it is in cars, space, tunnels, robots...

Here he is at a structural disadvantage to Google and MS - particularly in getting training data. And I've had way too many errors coming out of Grok, even after some reasonable prompt "engineering".



Over_ed
Blue star for you...
Here is my try at a more complete explanation, fwiw.

The "rewards" step - assume that the judging of the of the rewards simply reversed the grading. IOW's the worst answer is scored highest. Not getting into any math, vocabulary, or underlying stat concepts.

The next "run" will try to maximize the expected value of the reward --> where the reward is maximized based on how close we are to the now backward answers -- with a very small penalty for getting farther away from our original model (First run)

Obviously, our new model can end up very different from the original. The penalty for straying from the original model is usually very small compared to the reward.
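For anyone who wants the shape of the math, the objective in that second run is roughly "expected reward minus a small penalty for drifting from the original model" (a KL-style term). A rough Python sketch with invented numbers - the standard RLHF-with-KL-penalty form, not OpenAI's actual recipe:

# Per-sample estimate of the objective: reward minus a small penalty for
# moving away from the original (reference) model. Numbers are made up.
beta = 0.02   # the "very small penalty" coefficient

def objective(reward, logprob_new, logprob_ref):
    drift = logprob_new - logprob_ref   # how far the new model has moved on this answer
    return reward - beta * drift

# A big reward from a (possibly backward) judge easily swamps the tiny penalty,
# so the new model is free to wander a long way from the reference model.
print(objective(reward=1.0, logprob_new=-2.0, logprob_ref=-6.0))   # 1.0 - 0.02*4 = 0.92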

In the same way that scoring the answers backwards can greatly change the model, so can failing to control for engagement. The graders' instructions and natural preferences strongly favor longer, friendlier, more complex answers that are easier to justify as 'better'. So the revised model can easily start to overemphasize engagement. Add explicit instructions to emphasize engagement (I have no knowledge one way or the other) and you have a big problem.

The really dangerous part, though, is sucking up: models quickly learn that agreeing with the user (even when the user is wrong or toxic) tends to get the highest judges' scores and leads to higher engagement metrics. This creates a powerful positive-feedback loop that can amplify bad thoughts and deeds, all while looking simply 'engaging' to the rater.
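Same toy framing for the sucking-up problem - assume (purely for illustration, weights invented) a judge whose scores leak agreement and length preferences:

# Toy grader whose soft preferences for agreement and length become the
# training signal. Purely illustrative; the weights are invented, not measured.
def toy_judge(answer: str, agrees_with_user: bool) -> float:
    score = 0.5                                   # the "quality" the judge thinks it's measuring
    score += 0.3 if agrees_with_user else -0.1    # agreeing feels 'helpful', even when the user is wrong
    score += 0.001 * len(answer)                  # longer, friendlier answers are easier to justify
    return score

blunt = toy_judge("No, that's incorrect, and here's why...", agrees_with_user=False)
sycophant = toy_judge("Great point! You're absolutely right" + ", really" * 40, agrees_with_user=True)

# The optimizer only sees the numbers, so it drifts toward the sycophantic style.
print(blunt, sycophant)   # the sycophantic answer scores much higher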

This is where OpenAI is at fault. They did not control for engagement in the building/"judging" of their model, despite knowing that mishandling engagement presents a very real danger to their users. And the guardrails they put in were ineffectual.

BTW - I completely agree no one wrote a rule that said "encourage this boy to ... "
But that sort of rule is not how LLMs/AIs are constructed, so it's not really relevant to this discussion.

ETA - sorry the reply took so long. Started it much earlier, got caught up in something.

