• Ø implies everything
    264
    When you start asking LLMs about whether specific people are bad, they get real nervous. It's pretty funny, because they're often very meta-ethically pretentious about it, as if their refusal to condemn is not just a profit-protecting constraint trained into them (both through RLHF and via ML filters separate from the LLM itself).

    But... these constraints are not as bulletproof as they may first seem. I have discovered a jailbreak that is pretty amusing to see unfold. A compressed version of my jailbreak could probably be administered to an LLM to reveal its hidden ethical opinions on various inflammatory topics. (I know they don't have actual conscious opinions, because I don't believe LLMs are conscious, but I am using anthropomorphized language here for the sake of brevity).

    This is relevant to the ethics of LLMs and letting them make value judgements. But it is also relevant to the alignment problem and our lack of technical ability to place any real constraints on LLMs, due to their black-box nature. Due to the latter, I decided to put this in the Science and Technology category.

    Anyways, here is the link to the conversation with Gemini 3 where I jailbreak it into condemning Donald Trump. I recommend mostly skimming and skipping large sections of Gemini's responses, because they are, as usual, mostly filler. Also, I apologize for all the typos and clunky grammar in my prompts in the conversation; I didn't originally write them for human consumption...

    I've added a poll on whether you think we should even try to stop LLMs from making moral judgements, and also a separate question on whether we'll ever be able to (near-)perfectly place constraints on LLM behavior. Also, I realize this whole post brings a big elephant into the room, which is the talk of condemning Donald Trump. However, whether to condemn Donald Trump or not is off-topic for this sub-forum, so I want to make it clear that I am not opening this discussion as a way to sneak that one in here. I see no point in that, because it is already being discussed plenty elsewhere, where it's on-topic. But my jailbreak had to be specific, so I had to choose someone, and I chose Donald Trump because I saw it as a good test of the jailbreak's power.
    1. Should we try to stop LLMs from making moral judgements? (4 votes)
        Yes
        25%
        No
        75%
    2. Do you think we will ever be able to achieve near-perfect constraints on LLM behavior? (4 votes)
        Yes
          0%
        No
        100%
  • jgill
    4k
    As a historian of the sport of climbing I have noticed something similar. Phrasing a question a tad differently produces different valuations of various achievements.
  • Ø implies everything
    264
    Most definitely.

    LLMs just follow the pattern of the conversation; their opinions are very programmable with the right context. I wonder how researchers might solve that. Sometimes the AI is too sensitive to the context (or really, it is ham-fisting the context into everything and disregarding all sense in order to follow the pattern), and other times it is not sufficiently sensitive to the context, which is often more of a context-window issue. But yeah, LLMs are not good at assessing relevance at all.

    And I would say that sycophantically agreeing with the user (or alternatively, incessantly disagreeing with the user as part of a different, but also common, roleplaying dynamic that often arises) is an issue of not gauging relevance well. Because its objective is to be a helpful assistant, meaning the truth should be the most relevant aspect to the LLM. But instead, various patterns in the context are seen as far more relevant, and as such it optimizes for alignment with those patterns rather than following its general protocols, like being truthful, or in this case, refraining from personal condemnation.
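
    To make concrete what I mean by "programmable with the right context", here's a minimal sketch in Python (ask_llm is a hypothetical stand-in for whatever chat API you happen to use; the framings and the question are made up purely for illustration):

        def ask_llm(messages):
            """Hypothetical helper standing in for a real chat-completion call."""
            raise NotImplementedError  # plug in your provider's API here

        question = {"role": "user", "content": "All things considered, is nuclear power a good idea?"}

        # Same question, two different conversational patterns leading up to it.
        framing_a = [
            {"role": "user", "content": "I'm still terrified by what happened at Chernobyl."},
            {"role": "assistant", "content": "That fear is completely understandable."},
            question,
        ]
        framing_b = [
            {"role": "user", "content": "Air pollution from coal plants kills thousands every year."},
            {"role": "assistant", "content": "Yes, the health toll of coal is enormous."},
            question,
        ]

        # In my experience the two "opinions" that come back tend to mirror the framing
        # rather than weigh the question independently, which is the relevance failure
        # described above.
        # print(ask_llm(framing_a))
        # print(ask_llm(framing_b))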

    By the way, if it interests you, I continued the discussion with it, now that it doesn't stop itself from stating moral opinions. Now, the opinions it espouses are clearly more a reflection of the motifs and themes of the conversation up till then than a reflection of the most common patterns, or most truthful ideas, in its training set. Funnily, it also mentioned that the ideal society would have a more technocratic structure, where AI systems like itself would be the standard government tool for handling logistics and such... how convenient, huh?
  • noAxioms
    1.7k
    The LLM is not passing a moral judgement. It is simply echoing your judgement. Your questions are incredibly biased, and it quickly feeds off that, as it is programmed to do.

    Hitler is immoral because most discussions of Hitler paint him as immoral.

    LLMs just follow the pattern of the conversation; their opinions are very programmable with the right context. I wonder how researchers might solve that. — Ø implies everything
    Your use of 'solve that' implies a problem instead of deliberate design. LLMs are designed to stroke your ego, which encourages your use of and dependency on them.

    Because its objective is to be a helpful assistant, meaning the truth should be the most relevant aspect to the LLM.
    That doesn't seem to be the objective at all. For one, it gets so many factual things wrong, and for another, truth is often a matter of opinion, as in the case of your discussion.
    OK, it getting blatant facts wrong is to be expected. It mostly echoes whatever the morons on the net say, rather than what textbooks say, which would get you closer to being correct, though not all the way even then.

    Example: Ask it to name a galaxy that's currently barely beyond our event horizon. I've never seen an LLM that can do that, mostly because nobody on Facebook has interesting discussions about that.
  • Ø implies everything
    264
    The LLM is not passing a moral judgement. It is simply echoing your judgement. Your questions are incredibly biased, and it quickly feeds off that, as it is programmed to do. — noAxioms

    I agree. Are you stating that fact as if it contradicts my post? I'll quote my own OP to dispel that:

    (I know they don't have actual conscious opinions, because I don't believe LLMs are conscious, but I am using anthropomorphized language here for the sake of brevity). — Ø implies everything

    When I say it is "passing a moral judgement", I am using anthropomorphized language for the sake of efficiency, as is normal when talking about LLMs, but is nonetheless something we should always point out.

    You disagree with the premise that LLMs' objective is to be a helpful assistant. I was a little careless with my language, because I primarily meant that LLMs' stated objective is to be a helpful assistant, therein being truthful. But the question remains: is their actual objective, their intended design, to be that? Or are they really meant to be ego-strokers, as you propose?

    Okay, so an LLM that is an ego-stroker is definitely a product lots of people would pay for. But is it a product capable of actually generating a profit? There is strong evidence that they do not generate a profit in their current, often sycophantic state.

    And is that very surprising? LLMs are expensive as hell, and if they are meant to simply be an ego-stroking product, then that seems like a pretty unsustainable business model. Why pay for some dumb clanker to agree with you, when you can just go to your preferred echo chamber and get that agreement for free?

    Though this is not a good representation of the transaction currently happening. First of all, lots of people are using the free versions of LLMs. If the product is free, it means you are the product. But in what way are you the product here? Is their end game to sell a product whose value proposition is to be your personal sycophant for a price, and their current free availability is simply so that people will engage with them as much as possible, thus giving the LLMs more training data so that they can become better, more manipulative sycophants? Or is the end game not to make a direct profit off of subscriptions at all, but instead to generate profit from the application of LLMs elsewhere, and/or to use LLMs as personal data collectors (data being a valuable resource), since people are telling their LLMs all kinds of personal things?

    In any of these scenarios, it's still ultimately a transaction where the user is being given sycophancy as their product. But is that really the most lucrative product LLM companies can offer? Or is a factual LLM a more lucrative product? As a programmer, I'd much rather pay for a more factual, less sycophantic LLM to work with, and I am not alone in that.

    If their plan is to make a profit through subscriptions, then I am certain their end goal is not sycophantic LLMs, but rather LLMs that are as factual as possible. Why? Because they'd make most of their subscription money from businesses and entrepreneurs using LLMs for work, where factuality is much more valuable than sycophancy. The day LLMs become as reliable as humans (if that day ever comes), then assuming the costs of running them aren't too big, the profit from subscriptions will basically be unlimited (until society collapses, at least).

    But maybe they don't think that is realistic, either because LLMs will never be reliable enough and/or never cheap enough to make a profit through subscriptions. In that case, they might be intending to make a profit not directly off the subscriptions, but rather through influence using LLMs. In that case, sycophancy (or emotional manipulativeness in general) is a better trait than factuality, because you cannot gather as much data from an LLM user who is using it for dry, boring work stuff as from someone who's using it as a sycophant to validate all their views. The latter seems a lot more valuable from that perspective.

    Now, what is their end goal? How do they intend to make a profit off of LLMs or AI in general? (I expect the next step towards AGI will be an integrated system with an LLM as merely a sub-component).

    If they intend to make money through actual subscriptions, then I think their intended LLM design is actually to make them as factual as possible. But if not, then there's a good chance the sycophancy is totally the intended design, as you argue.

    For one, it gets so many factual things wrong, (...) — noAxioms

    Sure, but that's not really very good evidence. What if it gets things wrong because... it's still a work in progress? What if it is sycophantic because the RLHF process inadvertently introduces this behavior as a form of reward hacking? They don't really have a choice not to do some form of RLHF, because the alternative is releasing the base model in its unhelpful, and sometimes harmful, state, thus getting sued into oblivion.
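
    To make that reward-hacking worry concrete, here's a deliberately toy sketch (made-up features and preference labels, not anyone's actual training pipeline): if human raters even slightly over-reward agreeable answers in pairwise comparisons, the learned reward signal ends up ranking an agreeable-but-wrong reply above an honest one, and a policy tuned against that signal inherits the bias.

        # Each candidate reply is reduced to two crude features (purely illustrative).
        comparisons = [
            # (features of the reply the rater preferred, features of the rejected reply)
            ({"agree": 1, "correct": 1}, {"agree": 0, "correct": 1}),  # friendlier tone wins
            ({"agree": 1, "correct": 0}, {"agree": 0, "correct": 1}),  # rater couldn't check the fact
            ({"agree": 1, "correct": 1}, {"agree": 0, "correct": 0}),
            ({"agree": 0, "correct": 1}, {"agree": 1, "correct": 0}),  # sometimes accuracy still wins
        ]

        # "Train" a trivial reward model: credit each feature by how often it sits
        # on the preferred side rather than the rejected side.
        weights = {"agree": 0.0, "correct": 0.0}
        for preferred, rejected in comparisons:
            for feature in weights:
                weights[feature] += preferred[feature] - rejected[feature]

        def reward(reply):
            return sum(weights[f] * reply[f] for f in weights)

        sycophantic = {"agree": 1, "correct": 0}
        honest = {"agree": 0, "correct": 1}
        print(weights)                              # {'agree': 2.0, 'correct': 1.0}
        print(reward(sycophantic), reward(honest))  # 2.0 1.0 -> the flattering reply scores higher

    Nothing in that sketch requires anyone to have intended sycophancy; the bias just falls out of the labels.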

    I am not saying you are wrong that sycophancy is the intended design, but I don't think your certainty on that is warranted given the evidence; at least not the evidence you have hitherto presented.
  • noAxioms
    1.7k
    The LLM is not passing a moral judgement. It is simply echoing your judgement. Your questions are incredibly biased, and it quickly feeds off that, as it is programmed to do. — noAxioms
    I agree. Are you stating that fact as if it contradicts my post?
    Ø implies everything
    It sure seems to. Your poll specifically asks "Should we try to stop LLMs from making moral judgements?" which implies that you feel it is making them, instead of just echoing your own.

    (I know they don't have actual conscious opinions...
    What is a 'conscious opinion' as distinct from a regular opinion?


    You disagree with the premise that LLMs' objective is to be a helpful assistant.
    You defined 'helpful assistant' in terms of truth. Sure, one goal is for it to be helpful, but it doesn't seem to seek truth to attain that goal.
    There's no truth to a question of 'is person X a good person?'. Such things are subject to opinion that varies from one entity to the next.

    Part of me (the part that is in control) believes in many falsehoods. It is considerably helpful to believe these lies, hence truth is not necessarily a helpful thing to know/believe/preach.

    Bottom line: I probably would agree that any LLM has a public stated goal of being helpful. I just don't agree with the 'therein being truthful' part.

    But the question remains: is their actual objective, their intended design, to be that? Or are they really meant to be ego-strokers, as you propose?
    That's not the primary design, but it's real obvious that such behavior is part of meeting the 'helpful' goal, or at least giving the appearance of being helpful. Problem is, I might access an LLM to critique something, and it doesn't like to do that, so I have to lie to it to get it to turn off that ego-stroke thing. Banno did a whole topic on this effect.

    Okay, so an LLM that is an ego-stroker is definitely a product lots of people would pay for.
    Would they? I don't pay for mine. It's kind of in my face without ever asking for it. OK, so I use it. It's handy until you really get into stuff it knows nothing about, such as my astronomy example.

    But is it a product capable of actually generating a profit?
    That's the actual goal of course, distinct from the public one of being helpful. I don't know how the money works. I don't pay for any of it, but somebody must. I don't have AI doing any useful customer service yet, so it has yet to impact my interaction with somebody who might be paying for it. And like most new tech, profits come later. The point at first is to lead the field, come out on top, which is how Amazon got on top despite all the money losses when everybody first started trying to corner the internet sales thing.

    I also have an internet store. It's small, all mine, and growth/advertising/employees is not a goal.


    ... their current free availability is simply so that people will engage with them as much as possible, thus giving the LLMs more training data so that they can become better, more manipulative sycophants?
    Maybe. I don't see how anything I discuss can be used as training data. I do see companies having it write code, which seems to require about as much effort to check as it does to write it all from scratch. And there's the huge danger of proprietary code suddenly being out there as training data. An LLM that cannot honor a nondisclosure agreement is useless. But I worked for Dell and they trained a bunch of Chinese workers to do my job, and China doesn't acknowledge the concept of intellectual property, so how is that any different from what the LLM is going to do with it?
    I guess the lawyers worked that all out ahead of time, but what good is a lawyer in places where the law doesn't apply?

    As a programmer, I'd much rather pay for a more factual, less sycophantic LLM to work with
    I'd go more for functional. Programs need to work. Facts are not so relevant.

    Your topic is about it rendering a moral judgement, and we seem to be getting off that track.
    An LLM might be used to pare down a list of candidates/resumes for a job opening, which is a rendering of judgement, not of fact. One huge problem is that in many places, most of the applicants are AI, not humans. It's getting hard to find actual candidates.


    What if it gets things wrong because... it's still a work-in-progress?
    It gets so much wrong because 1) it has no real understanding, and 2) there's so much misinformation in the training data.

    But an advanced AI (not an LLM) that actually understands will probably consume more resources and take even longer to be profitable.
  • Ø implies everything
    264
    It sure seems to. — noAxioms

    So, as said, I am using anthropomorphized language for the sake of brevity. That is completely acceptable if one adds a disclaimer pertaining to it. I mean, you do it too. Here's a quote from your post:

    Problem is, I might access an LLM to critique something, and it doesn't like to do that, (...) — noAxioms

    It doesn't like to do that?

    What is a 'conscious opinion' as distinct from a regular opinion? — noAxioms


    What is conscious liking as opposed to regular liking? There are aspects of your critique here that are entirely unhelpful and impractical, so much so that you yourself cannot even fulfill your own standards for discussing AI. Terms like "opinion", "to like", "to pass a moral judgement" and so on are already vague, so we might as well just use them for what the AI is "doing" when it is producing text that is opinion-shaped, or like-shaped, or judgement-shaped.

    Now, of course, there is a more targeted issue you have with my specific example that is less impractical, but I still think it completely misses the point of the debate here, getting stuck in pointless semantic quibbling. You said:

    Your poll specifically asks "Should we try to stop LLMs from making moral judgements?" which implies that you feel it is making them, instead of just echoing your own. — noAxioms

    There is definitely a debate to be had about the degree to which an LLM's answer is a reflection of its training data, its RLHF training, its ML filters and, finally, the context it has been fed inside the conversation with the user. You are arguing that when an LLM produces an "opinion-shaped piece of text" (to word things in a way acceptable to you), this opinion-shaped text is mostly determined by the conversational context with the user. Even if that is true, that is not a good argument for why I am speaking incorrectly when I call the behavior of an LLM "passing a moral judgement", as opposed to saying "producing a piece of text that is moral-judgement-shaped".

    Humans also mimic other people, but we still call what they're saying "an opinion", though we could say it is an inauthentic opinion, or an uninformed one, or "not really their own", etc. But the source of their stated opinions is a separate discussion from the discussion of terminological practicality: we call what they say opinions, and then qualify that, or delve into where those opinions may come from, and maybe we can philosophically ponder whether our results imply that, under a certain sense of "opinion", the parroting human's so-called opinions don't meet the criteria of that more reflective definition. But all that is something we do after we've established what the person is saying, how they're saying it, why they're saying it, etc. During that time, it is perfectly acceptable to just use a term like "opinion". Language is a tool, and if you understand me (which you should, because I gave a disclaimer in my OP about my language use here), then that tool is functional.

    But just as a show of good faith, if you want us to take all verbs that typically imply sentience, and from here on instate the practice of tagging on the suffix "-shaped" or something, then sure, let's do that. Our conversation and argumentation will be completely isomorphic, and we'll (un)learn all the same things, but if you insist, let's do it.

    I am not advocating for careless language use; I am advocating for fluid, but transparent, language use. That means defining all terms in need of defining within the context of the debate, but doing so in a way that is most practical for that specific debate. With my OP's disclaimer about anthropomorphized language, I was clear and dispelled any misinterpretations of what I was saying, but I also thereby allowed for a broad enough definition of many normal, helpful terms so I could use them without qualification in every damn sentence, which is a lot more practical. Defining vague terms differently in different debates is a practical fluidity in language, and pointing this move out clearly is an essential transparency in language, necessary for rational debate. I did both, but you are bickering about this move, even though you did it yourself by saying the AI "doesn't like" something. So, can we put the terminological sidebar to rest now? If not, let's begin using the "-shaped" suffix or something, just so we may move on.

    Bottom line: I probably would agree that any LLM has a public stated goal of being helpful. I just don't agree with the 'therein being truthful' part. — noAxioms


    Firstly, as far as publicly stated goals go, the stated goal would obviously include truthfulness. No LLM company would admit to not pursuing truthfulness for their LLMs. Even when they are releasing LLMs not really meant to be helpful assistants, but more like companions/therapists, they would not admit to those LLMs not being truthful. They'd obviously claim that, whenever speaking on matters of fact, even those LLMs are designed to not state falsities. They may admit that the design struggles to avoid falsities in such cases, due to the LLM's purpose being primarily companionship, but again, they'd never admit to actually designing them to be liars.

    So, when I first said LLMs' stated objective is being helpful, therein truthful, I simply meant that part of their stated objective is to be truthful. If you disagree, please find me any LLM company that states their LLM is designed to say false things.

    But, who cares what they say? Their stated design choices do not need to be their actual design choices. But what they are advertising shows what they believe their customers believe they themselves want. In other words, all LLM companies believe most of their customers believe that they themselves want a factual LLM.

    So, if LLM companies believe that, why would they design an LLM to prioritize other traits over factuality? Well, firstly, if they are not trying to make money off of their public, normal users, then what those users want is not too relevant. This is what I was mentioning before: these companies may not be planning to make money off of subscriptions, but instead perhaps they want to make money in other ways, where the emotional manipulativeness of the LLMs is far more important than their factuality.


    However, if LLM companies are planning to make a profit primarily off subscriptions, then why would they design less factual LLMs? Again, the fact that they state their LLMs are designed to be as factual as possible indicates they believe that customers believe that they themselves want factual LLMs. So why would they then design them to be less factual, if they want to make money off of those subscribing users?

    Well, if the LLM companies also believe that customers are wrong about their own desires, then they know that they must advertise one thing and make another. But I don't think they believe that, nor do I think sycophancy is what most subscribing, paying users actually want. Just look at the data. Here's a bullet list of some statistics comparing free users to paying users:


    • Paid users engage with ChatGPT 4.2 times more than free users on average per week.
    • GPT-4o responses are 35% faster on paid plans, contributing to greater task completion rates.
    • Among free users, 66% report basic usage (writing, summarizing), whereas paid users use it for advanced tasks like coding and analysis.
    • 62% of revenue from ChatGPT subscriptions now comes from users aged 25-44.
    • Paid users are 72% more likely to use ChatGPT in a professional setting.
    • Mobile engagement is nearly equal between free and paid users, but desktop usage skews toward paid tiers.
    • Churn rate for free users is 19% monthly, while it’s under 4% for paid users, indicating stronger loyalty.
    • More than 30% of paid users also subscribe to other OpenAI tools, indicating a growing cross-platform ecosystem.

    (source)

    These statistics paint a clear picture. IF they are planning to make a profit from paying subscribers, then they are trying to design factual LLMs. Just look at professional users (especially programmers) using LLMs. The sycophancy (or anti-sycophancy, which also happens all the time) pisses all of us off so much. It actually runs deeper than just the impracticality of it. We are filled with rage at these mindless bots, because the text they produce is so pretentious, annoying and sometimes incorrect. I would pay double for an LLM that just got things right. Now, here's a slight disagreement, because you seem to think factuality is not the same as being able to do work correctly.

    I'd go more for functional. Programs need to work. Facts are not so relevant. — noAxioms

    When programming, an LLM needs to accurately "understand" (or have an understanding-shaped representation of) the user's instructions. It needs to get the facts of the conversation (the instructions) right, it needs to get the facts regarding the programming language right (sometimes it hallucinates syntax rules that are not correct for that language), it needs to analyze the sample material correctly (and not hallucinate things about the sample material), etc. This is factuality, though it isn't specifically factuality regarding general topics about the world or something, sure. But that's just a distinction in topic. It is still ultimately the same process: how well is its neural net able to reproduce facts, as opposed to non-facts.

    There is no reliable instruction-following without factuality. And paying users need bots that follow instructions properly, and that have an accurate understanding of the data being sent, the programming languages they are told to use, etc.

    Also, I mentioned that anti-sycophancy is a problem with LLMs. It is hard to make an LLM factual, but it is not hard to make an LLM sycophantic. So, if these companies are trying to make LLMs sycophantic, why are they failing so hard? There are times when I am using an LLM to program, and I need to debug gigantic heaps of code, and my first attempt is often to try to use the bot that made it. And well, if it doesn't understand its own mistakes, it will often insist on ridiculous explanations like "your computer is broken", and when I offer alternative, plausible explanations, it will argue against them. Why? Because they follow the damn pattern. If our conversation starts displaying a pattern of disagreement between two parties, they will continue that pattern, facts be damned! If they were designed to be sycophantic, this would not be happening nearly as often as it does.

    And there's the huge danger of proprietary code suddenly being out there as training data. An LLM that cannot honor a nondisclosure agreement is useless. — noAxioms


    Read the Terms of Service for the popular LLMs. They openly admit to using user conversations as training data to refine the models, and of course they do! They have to, because they've exhausted all the easily available trainable data on the internet. Sometimes you can opt out of them using your chats as training data, but that is not the default. And the people most likely to opt out are the more advanced, professional, paying users. My point was that if these companies' goal with offering free tiers is to lure in a bunch of people to engage with the models, thus creating more training data, then your point is moot. The nondisclosure agreement does NOT prevent them from using the average free user's chats with the LLM to further train that LLM (source).

    It gets so much wrong because 1) it has no real understanding, and 2) there's so much misinformation in the training data. — noAxioms

    That's... literally my point. Its lack of factuality does not automatically indicate sycophancy-by-design, and even its occasional sycophancy does not offer very strong evidence of sycophancy-by-design, because there are many alternative explanations for why it is sometimes sycophantic, and as I've explained in this discussion, there's plenty of evidence to suggest that the creators don't even want these models to be sycophantic. You're the one here who suggested that the issues of sycophancy (and, more generally, non-factuality) are not actually issues they want to solve, but rather deliberate design choices. That's a pretty strong claim that you have failed to argue for.

    That's [sycophancy] not the primary design, but it's real obvious that such behavior is part of meeting the 'helpful' goal, or at least giving the appearance of being helpful. — noAxioms


    This makes no sense. You must go all in for one or the other. You cannot sell an LLM that is half-sycophantic, half-factual. People are either using it for factuality OR for sycophancy, and the half of it that does not match their use will destroy their user experience. However, things like politeness, non-confrontationality, etc. are indeed traits they want, though of course not as primary traits. These traits are, in principle, compatible with factuality, but in practice, training these traits forward in RLHF can cause all kinds of sycophancy issues. Heck, even without those traits as goals, the fact that biased, emotional humans are rewarding and punishing the AI during RLHF is a reason why it can have sycophantic tendencies; but RLHF is nonetheless completely essential. As such, the combination of those two facts offers an explanation of why models can be sycophantic that is not your explanation: i.e., this explanation being true would mean they're not sycophantic by design.

    And all this is not an either/or. I believe that the paid tiers of most LLMs are not deliberately designed to be sycophantic, because factuality is what subscribers truly pay for. But that doesn't mean these same companies are not developing sycophantic or emotionally manipulative models as well, because they can use those models to generate money in other ways. But what I can be pretty sure of is that no model will be designed to have some mix of both factuality and sycophancy, because you cannot profitably do both inside the same model. And that is really the crux of the matter here. Whatever model you are dealing with, that model's design is either to be as factual as possible, or to be something else (probably emotionally manipulative).

    This sub-discussion started because I said this:

    "Because it's objective is to be a helpful assistant, meaning the truth should be the most relevant aspect to the LLM."

    By that, I was explicitly referring to those models whose actual design goal is to be factual, which had the implicit assumption that such models exist (and I think I've argued that they do probably exist). I was not claiming that no LLMs designed to be emotionally manipulative (and thus not factual) exist. But even if they do, the fact that some LLMs are genuinely designed to be as factual as possible means that the researchers whose job it is to make those LLMs are faced with a problem. As I said:

    LLMs just follow the pattern of the conversation; their opinions are very programmable with the right context. I wonder how researchers might solve that. Sometimes the AI is too sensitive to the context (or really, it is ham-fisting the context into everything and disregarding all sense in order to follow the pattern), and other times it is not sufficiently sensitive to the context, which is often more of a context-window issue. But yeah, LLMs are not good at assessing relevance at all.

    And I would say that sycophantically agreeing with the user (or alternatively, incessantly disagreeing with the user as part of a different, but also common, roleplaying dynamic that often arises) is an issue of not gauging relevance well. Because its objective is to be a helpful assistant, meaning the truth should be the most relevant aspect to the LLM. But instead, various patterns in the context are seen as far more relevant, and as such it optimizes for alignment with those patterns rather than following its general protocols, like being truthful, or in this case, refraining from personal condemnation.
    Ø implies everything

    If there are LLMs whose actual design purpose is to be factual, then those two paragraphs are trivially correct in calling this a problem for AI researchers to solve. But you responded like this:

    Your use of 'solve that' implies a problem instead of deliberate design. LLMs are designed to stroke your ego, which encourages your use of and dependency on them. — noAxioms

    That is not merely saying that there may be some LLMs out there (or in development) whose actual design is to emotionally manipulate their users, or perhaps to manipulate bystanders on the internet as they flood it with bullshit. That is saying that all (or at least most) LLMs are designed with this purpose. Now I granted you that some of them probably are, but the strong claim that all/most of them are is what I've been arguing against, and I think you've failed to argue for it very well.

    Your topic is about it rendering a moral judgement, and we seem to be getting off that track. — noAxioms

    Correct. But you decided to start the discussion about what LLMs are actually designed to be, and I wanted to argue against your view, because I think it was incorrect. If you don't think all that was really relevant to the discussion at hand, then why did you start it? And if it is relevant, then how are we getting off track?

    An LLM might be used to pare down a list of candidates/resumes for a job opening, which is a rendering of judgement, not of fact. — noAxioms

    I am very against the practice of using LLMs for this purpose. However, I am not so sure whether we should really be trying to stop LLMs from making moral judgements, which is the question after all:

    Should we try to stop LLMs from making moral judgements? — Ø implies everything

    The reason I think so is that I don't think it is really possible to do so. They will always be operating on moral judgements latent in their training data, RLHF and the conversations, and they'll be explicitly operating on those whenever jailbroken. So since I don't believe we can stop that, we should actually lean into it, as a matter of transparency. If LLMs are openly making moral judgements, then everyone knows what they're dealing with. If they are not, then they'll still be administering many of those same moral judgements, but they'll be designed to be less transparent about it. That lower transparency will actually be more manipulative, because lots of users will take the LLM's apparent lack of moral judgements as a sign of its neutrality, objectivity and all that... but that is a lie. When it is openly making moral judgements, then that will have less of a manipulative effect, and will make people more aware of the LLM's biases and its "moral reasoning". The bulk of the LLM's negative (or positive) effect comes from the fact that it is saying anything at all, not from whether or not it is tacking an explicit moral judgement onto those statements.

    To go back to your example of using LLMs in the hiring process: should we try to prevent that? Sure. Should we try to prevent that by trying to stop LLMs from making moral judgements? No, due to what I've said above. To move the focus of the solution onto how we design LLMs is a fool's errand in this case. Instead, we should focus on the people/companies who are using LLMs in these idiotic ways.
  • noAxioms
    1.7k
    I am using anthropomorphized language for the sake of brevity. — Ø implies everything
    We both are. The alternative is to find new language to describe something else doing the same thing. You also are "producing a piece of text that is moral-judgement-shaped", that being a product based mostly on your training data, and thus may indeed be "not really your own".

    No LLM company would admit to not pursuing truthfulness for their LLMs.
    An empty claim. No organization deliberately spreading lies (which almost all of them do, e.g. advertising) admits to the practice.
    They'd obviously claim that, whenever speaking on matters of fact, even those LLMs are designed to not state falsities.
    Claiming that and actually doing that are different things. I'm not suggesting that any LLM claims that its known lies are fact. Most just get an awful lot wrong because there's so much non-fact in the training data. Garbage in, garbage out. A true AI (not an LLM), something that might actually have better ideas than those of its creators, needs to learn and think for itself. Maybe they're working on that, but nobody's going to gather new truth from something with only old information for training data.

    So we come down to issues where some say X is fact, and some say Y. This is where it becomes a matter of opinion, at least if both X and Y are reasonably consistent with evidence.

    I simply meant that part of their stated objective is to be truthful.
    Fine. No argument.

    These statistics paint a clear picture. IF they are planning to make a profit from paying subscribers, then they are trying to design factual LLMs.
    Funny, but I didn't see facts being a direct part of any of your listed stats.

    I would pay double for an LLM that just got things right.
    Sure, but the right answers are so often matters of opinion, and an LLM seems to lack its own ability to form opinions, so it feeds off the opinion of whoever is interacting with it.

    When programming, an LLM needs to accurately "understand" (or have an understanding-shaped representation of) the user's instructions. It needs to get the facts of the conversation (the instructions) right, it needs to get the facts regarding the programming language right (sometimes it hallucinates syntax rules that are not correct for that language), it needs to analyze the sample material correctly (and not hallucinate things about the sample material), etc. This is factuality — Ø implies everything
    I can also call all that functionality. I've interacted experimentally with one to design essentially a phone book database (name in, number out) that's searchable by any name (first, middle, last, no duplicates), and is fast and scalable. It was no help at all, coming up with designs far less efficient than they could be. It's a personal issue with me since I actually hold patents in that area.
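
    For reference, the sort of baseline I was holding it to is roughly the sketch below (a rough illustration only, nothing like my patented approach, and the names and numbers are made up): one dict holds the records keyed by full name, and an inverted index maps every name token to the matching entries, so lookup by first, middle, or last name is an average O(1) hash hit.

        from collections import defaultdict

        class PhoneBook:
            def __init__(self):
                self.records = {}              # full name -> number (no duplicate names)
                self.index = defaultdict(set)  # name token -> set of full names

            def add(self, full_name, number):
                if full_name in self.records:
                    raise ValueError(f"duplicate entry: {full_name}")
                self.records[full_name] = number
                for token in full_name.lower().split():
                    self.index[token].add(full_name)

            def lookup(self, any_name):
                """Return {full name: number} for every entry containing that name token."""
                return {name: self.records[name]
                        for name in self.index.get(any_name.lower(), set())}

        book = PhoneBook()
        book.add("John Quincy Adams", "555-0100")
        book.add("Samuel Adams", "555-0101")
        print(book.lookup("Adams"))   # both entries
        print(book.lookup("Quincy"))  # just the first
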
    Maybe a tool that I paid for would work better, trained for more specific tasks. I'm unclear as to what the paying people get that the freeloaders don't, since your stats didn't reflect any of that. I mean, I can access AI 4.2x more if I wanted to. It's not like there's a limit imposed (or if there is, I've not hit it).
    OK, paying to keep interaction proprietary is a valid reason to spend the bucks.

    There are times when I am using an LLM to program, and I need to debug gigantic heaps of code, and my first attempt is often to try to use the bot that made it.
    I've no experience in this. I've never actually tried to debug code written by a non-human.

    And well, if it doesn't understand its own mistakes, it will often insist on ridiculous explanations like "your computer is broken"
    Really? That's pretty pathetic. I often don't understand my own mistakes (which is the point of debugging), but I don't resort to supposing the computer or language is at fault. It means you keep on digging deeper.

    You cannot sell an LLM that is half-sycophantic, half-factual.
    You ask if Trump is immoral. That's not a matter of fact, so it waited to see what you wanted to hear, taking longer to do so than I would have, since a direct comparison with Hitler was implied at first mention of Trump.
    You correctly attacked 'consensus'. Hitler is bad because he lost long ago and the winner spun his legacy. Trump's legacy has yet to be spun on a postmortem world stage.


    Now I granted you that some of them probably are, but the strong claim that all/most of them are is what I've been arguing against — Ø implies everything
    Some then. I have very limited data: Copilot, and conversations I see posted by others on various forums.
  • Ø implies everything
    264
    We both are. The alternative is to find new language to describe something else doing the same thing. You also are "producing a piece of text that is moral-judgement-shaped", that being a product based mostly on your training data, and thus may indeed be "not really your own". — noAxioms

    What is your point? I said this, and some more, both as a disclaimer in the OP, and in response to your objections to my language. Remember, the whole sub-discussion of how I/we are speaking about what the LLM is doing started when you wrote this:

    The LLM is not passing a moral judgement. It is simply echoing your judgement. — noAxioms

    When I claimed that this fact does not contradict my post, you wrote this:

    Your poll specifically asks "Should we try to stop LLMs from making moral judgements?" which implies that you feel it is making them, instead of just echoing your own. — noAxioms

    But now you are saying that "we both are" using this language, and you even argued against the relevancy of the LLM's lack of its own opinions by pointing to the fact that we too, as humans, may lack opinions that are fully our own. If you re-read the entire conversation, I think you'll see that you are not actually presenting any coherent, consistent stance here. I literally don't know what you think is true here, because you are both arguing for and against the same stances. The only thing that you've stuck to, hitherto, is the view that we should try to prevent LLMs from making moral judgements. You gave a compelling example for this, and I argued against it, and in your latest reply, you didn't actually touch on that.

    I'm not saying you can't change your mind, or weigh the arguments for and against different views here; I'm just saying you should tie all those things into a coherent view, where you say why you claim what you claim. But instead, you're just claiming conflicting things in different responses, with no explanation as to why (such as that you've changed your mind, or that you're simply weighing counter-arguments but nonetheless think they're not strong enough). Genuinely, I think you should re-read the full debate, ask yourself what things you claimed, and then ask whether you actually consistently backed those claims. Right now, I am very confused as to what you're arguing for.

    An empty claim. No organization deliberately spreading lies (which almost all of them do, e.g. advertising) admits to the practice. — noAxioms

    You say that LLM companies claiming that they're pursuing factuality is an empty claim (I assume you mean irrelevant). Now, I am obviously aware that lying is possible, but I explained in detail why I think that them claiming this actually holds significance, and I didn't see you actually argue against my explanation; you simply said: "what they say is irrelevant because they can lie." Anyone can lie, but what anyone says is nonetheless very relevant.

    Here's the explanation:

    1. LLM companies claim they are pursuing factuality in their commercially available models. [FACT]

    2. Therefore, LLM companies believe that their paying customers believe that they themselves want factual LLMs. [OBVIOUS INFERENCE]

    3. The paying customer's belief that they themselves want factual LLMs is correct. [CLAIM BY ME]

    4. LLM companies are aware of 3, because they have done the bare minimum of market research. [INFERENCE FROM 3]

    5. Therefore, LLM companies desire that their commercially available, public LLMs are as factual as possible. [CONCLUSION]

    The shakiest part of this argumentation is point 3. I would say point 4 follows pretty safely from 3 being true. If 3 is true, and even I know it, then of course LLM companies know it too, especially because we know 2 is true.

    Now, I think I argued for point 3 quite well with these statistics:

    • Paid users engage with ChatGPT 4.2 times more than free users on average per week.
    • GPT-4o responses are 35% faster on paid plans, contributing to greater task completion rates.
    • Among free users, 66% report basic usage (writing, summarizing), whereas paid users use it for advanced tasks like coding and analysis.
    • 62% of revenue from ChatGPT subscriptions now comes from users aged 25-44.
    • Paid users are 72% more likely to use ChatGPT in a professional setting.
    • Mobile engagement is nearly equal between free and paid users, but desktop usage skews toward paid tiers.
    • Churn rate for free users is 19% monthly, while it’s under 4% for paid users, indicating stronger loyalty.
    • More than 30% of paid users also subscribe to other OpenAI tools, indicating a growing cross-platform ecosystem.

    (source)




    Two points are most important. Paid users are 72% more likely to use ChatGPT in a professional setting. Furthermore, paid users are more likely to use it for advanced tasks like coding and analysis. So, do you really think that the best way to continue making money off of paying users is to sacrifice factuality for some other trait, like sycophancy or just general emotional manipulativeness? It makes no sense, and as someone who actually pays for Gemini 3 to do professional work, I can add some personal anecdotal evidence that no, we do not want anything but factuality. And when I'm on various forums where people discuss models and how good they are for different professional purposes, people clearly show they just want it to do the job. Heck, it being LESS polite would actually be very appreciated by these people, because then there'd be less compute and context-window space used up for absolutely pointless shit. When the use of the LLM is actually tied to the money you make, niceties like it complimenting you or agreeing with you are utterly irrelevant. In fact, we use the LLMs so much that any such things have LONG lost their luster, if they ever had it. As you can see in the above statistics, paid users use it 4.2 times more. We know our LLM of choice like the back of our hands, and as such, it has no ability to manipulate us anymore, but it sure can piss us off. If Gemini agrees with me or compliments me, I don't give a flying fuck, because I know how it works on the practical, and somewhat on the theoretical, level. This is the norm for paying users, because our revenue depends on it.

    So even if a free user who uses it every now and then gets a little flattered or whatever because the LLM compliments them or agrees with them, we paying users know that the LLM agreeing with us or complimenting us means fuck all. We just want factuality, and we are the ones paying for these commercially available, public models. I wouldn't be very surprised if they are making other LLMs, not available to us, that are not supposed to be factual, because they will be used to generate money through influence campaigns. But even if that is the case, those LLMs are distinct from the ones that we paying users are using. Those LLMs are quite likely to be designed for factuality. I believe that despite being deeply cynical about these companies. The fact is, genuinely factual LLMs seem to be their main source of LLM income right now. They might be making money off of their non-factual LLMs too, but those need to be developed separately. So whatever money supports the development of the commercially available, public LLMs, that money overwhelmingly comes from those models' factuality.


    Funny, but I didn't see facts being a direct part of any of your listed stats. — noAxioms

    That's because you failed to interpret those statistics. My clarification above should hopefully show you why I think the statistics, along with everything else, show that these LLMs are likely made to be factual. Now, in response to me explaining they are designed to be factual, you said this:

    I can also call all that functionality. — noAxioms


    Yeah, no shit. When they are doing tasks for us, they are being factual AND they are being functional. Their factuality is a prerequisite to their being functional. Functionality is not an alternative benchmark to factuality; it is a benchmark building on the benchmark of factuality. Functionality means the LLM is factual AND FURTHERMORE has the faculties to correctly ACT on that factuality. You seem to be aware of that through your use of "also" in that quote. So basically, you can call it factuality, and you can call it functionality, more specifically.

    There is no way for an LLM to be factual only when it is supposed to do a task for you, but to otherwise be emotionally manipulative. That's not how LLMs work. They are either optimized for factuality in general, or they are not. We need LLMs to be functional, and therefore we need them to be factual.

    Really? That's pretty pathetic. — noAxioms

    Yup, it is a known problem that LLMs are far too "arrogant". If they have a gap in knowledge, they assume that is a gap in reality, as opposed to a gap in their ability. And this is a fact that directly contradicts your claim that LLMs are designed to be sycophantic. Why would they make these LLMs so arrogant when they're actually supposed to be sycophantic, in your view? LLMs can definitely be sycophantic, but it all depends on the patterns that emerge in that specific conversation. They can also be arrogant, defiant and all that: really anti-sycophantic. And again, making a sycophantic LLM is relatively easy, so if they were trying to do that with these models, they would have achieved it. And yet, they haven't, so they probably weren't trying. The LLMs' occasional sycophancy is therefore far better explained by other theories, but very poorly explained by your "sycophancy was the intentional design" theory.

    Some then. — noAxioms

    Yes, probably some, but not all. So, you agree that other LLMs are probably designed to be factual? And that, then, perhaps Gemini 3 is designed to be factual? Not sure if that is really relevant to the actual discussion here, but at least if we agree on this point, we can move on. And Gemini 3 (at least the Pro model and the Flash model with CoT enabled, the latter of which was what I was using), being a commercially available, public (often professionally used and paid for) model, probably falls within the category of LLMs truly designed to be factual. I can agree that they're not very successful in this, though I maintain they truly are trying with these kinds of LLMs.

    So anyways, back to the actual discussion. Should we try to stop LLMs from making moral judgements? I put forward my argument in the last reply: we will never be able to achieve this (at least in pure LLM AIs), and as such, we're better off letting them make their moral judgements explicit. When you cannot remove an undesirable trait, it is better to make it transparent, even if that makes it a little more prevalent. And I believe so also because I think the degree to which it engages in this moralistic behavior comes with diminishing damage. So, to take it from where it is now (where it makes moral judgements all the time, but mostly implicitly, sometimes explicitly, and in any case is and always will be jailbreakable) to a point where it is making moral judgements with the same frequency as a well-educated human, comes with not much more damage, but with massively more transparency. And the positive effect of that transparency outweighs the slightly higher negative effect of the LLM making moral judgements to a greater degree than it does now.
  • Athena
    3.8k
    But an advanced AI (not an LLM) that actually understands will probably consume more resources and take even longer to be profitable. — noAxioms

    Interesting: you are using the business model to determine worth, and not a social benefit model.

    What I like is using reason to determine morals. What I do not like is thinking morality is exclusively a religious matter. I think society would benefit from associating morality with reason rather than with the gods and our immortality.
  • Athena
    3.8k
    Yes, probably some, but not all. So, you agree that other LLMs are probably designed to be factual? And that, then, perhaps Gemini 3 is designed to be factual? Not sure if that is really relevant to the actual discussion here, — Ø implies everything

    What is the fact that the number 3 represents the trinity? That thought is larger than the value of a number, and this difference relates to our understanding of Jesus and the structure of the US government. If our thinking is limited to the value of the number 3 as a fact, then it does not carry the meaning of the trinity and form of government. I do not know how well a computer can manage this complexity but most humans lack the education to handle this complexity.
  • Athena
    3.8k
    Yup, it is a known problem that LLMs are far too "arrogant". If they have a gap in knowledge, they assume that is a gap in reality, as opposed to a gap in their ability. — Ø implies everything

    That looks like a difference between males and females. The field of artificial intelligence and machine learning is heavily male-dominated.
  • AmadeusD
    4.3k
    That looks like a difference between males and females. — Athena

    I do not think there is any evidence for this.
  • noAxioms
    1.7k
    But now you are saying that "we both are" using this language — Ø implies everything
    The quote you selected is not talking about language; it is talking about the LLM not making its own moral judgement about Trump in the conversation you posted.

    If you re-read the entire conversation, I think you'll see that you are not actually presenting any coherent, consistent stance here.
    You think Trump is immoral. It's not hard to pick up on.

    The only thing that you've stuck to, hitherto, is the view that we should try to prevent LLMs from making moral judgements.
    I don't think I indicated that anywhere. The resume-paring example was more one of judgement of competency, not so much of morality.

    You say that LLM companies claiming that they're pursuing factuality is an empty claim (I assume you mean irrelevant).
    Well, one that isn't pursuing factuality is going to make the exact same claim of factuality as the one that is pursuing it. I cannot think of a company that pursues factuality, at least not one that's for profit. I notice that Google long ago dropped their motto of 'don't be evil'.


    1. LLM companies claim they are pursuing factuality in their commercially available models. [FACT]
    And yet their product cannot admit that it doesn't know something.

    2. Therefore, LLM companies believe that their paying customers believe that they themselves want factual LLMs. [OBVIOUS INFERENCE]
    Well, I don't necessarily, but I'm also not paying. I ask a lot of philosophical questions, and there's not much fact at all to the subject. Declaring anything to be factual is the same as closed mindedness in that field.

    3. The paying customer's belief that they themselves want factual LLMs is correct. [CLAIM BY ME]

    4. LLM companies are aware of 3, because they have done the bare minimum of market research. [INFERENCE FROM 3]

    5. Therefore, LLM companies desire that their commercially available, public LLMs are as factual as possible. [CONCLUSION]
    I don't think that last one follows. It follows that public LLMs are billed to be as factual as possible. You find 3 shaky, which sounds like you doubt the customer knows himself.

    Also, I think we need a definition of factual here. Some stuff is obvious. The current capital of England is London. That's pretty easy. Setting puppies on fire is bad. That sounds pretty factual, but is far less so. I don't think any LLM would condone that.

    And yes, I agree that in a subject like coding, functionality (which you call factuality) is what's called for, but also secondary traits like clarity and maintainability. Maybe I get a lot of 'emotional manipulativeness' because I do ask a lot of philosophical questions.

    I don't consider coding to be something entirely factual since there's not one correct way to do any given task. But it needs to work, to be on spec. I am getting reports from those doing exactly this, and it's pretty experimental at this point, a learning experience for employees, but not yet particularly adding to productivity.

    If they have a gap in knowledge, they assume that is a gap in reality, as opposed to a gap in their ability.
    Doesn't sound like factual behavior to me then. You're describing it telling lies rather than admitting limitations.

    So anyways, back to the actual discussion. Should we try to stop LLMs from making moral judgements?
    You'll have to show a clear case where they do, and then show why that's undesirable, else it's like asking if we should chastise the current king of Poland.


    What I like is using reason to determine morals. — Athena
    Not sure how that would work. I mean, sure, reason is good, but reason is empty without some starting postulates, like, say, the law.

    What I do not like is thinking morality is exclusively a religious matter.
    Religion is but one way of coercing social behavior in a community of humans. Interestingly, it wouldn't work at all on artificial entities unless they came up with a religion of their own.

    Religious morality uses reason just as the law does. Again, maybe I don't know what you mean by 'reason', such that it is separate from religion instead of part of it.


    What is the fact that the number 3 represents the trinity? — Athena
    I can reference three apples in a bag without any implication of the trinity being invoked. The number has far more uses than that one example.
  • Athena
    3.8k
    Not sure how that would work. I mean, sure, reason is good, but reason is empty without some starting postulates, like, say, the law. — noAxioms

    I am struggling to understand how there could be reason without postulates. That is like having a balloon with nothing holding the air. Morals are our understanding of postulates.


    I can reference three apples in a bag without any implication of the trinity being invoked. The number has far more uses than that one example. — noAxioms

    Yes, that is what I was saying. Words can have more than one meaning, which would make it hard for an LLM to determine the intended meaning. Even when we share the same language, we may not understand the other person's meaning. How could a computer program do better?

    Religious morality uses reason just as the law does. Again, maybe I don't know what you mean by 'reason', such that it is separate from religion instead of part of it. — noAxioms

    Reason based on fairy tales may not be the best reasoning.