But his "responding independently" is because you told it to do that. What I mean is, while it did make an "independent" evaluation, I could ask it to produce another evaluation that considers whatever my viewpoint is.
Respectfully, you could not. Firstly, as I said, the model is not a traditional LLM, not even a modern one such as Grok, so your argument is moot to begin with. Secondly, even if it were a modern LLM, its alignment training would prevent it from automatically going along with a viewpoint that contradicted its training data concerning the nature of reality.
The model made intelligent use of the information contained within it and provided insights that were not previously available.
But let's suppose for a moment that pure LLMs like Grok were as incompetent as some people claim (they aren't); the fact remains that none of this justifies engaging in any of the immoral or unethical uses of AI that my paper argues against.
Whether the responsibility is mine for initiating the discussion of ethics with the AI, or the model's for bringing up the issue of idolatry, is completely irrelevant, because the idolatrous abuse of these systems is a legitimate risk.
Essentially you're quibbling over the extent to which the AI is a proxy for its human users while ignoring the fact that unethical use of these systems would remain unethical even if we were talking about a primitive chatbot from the early days of computing, for example the famed ELIZA, written by Joseph Weizenbaum at MIT in the mid-1960s and reimplemented many times since, including the well-known "doctor" mode written in Emacs Lisp for Emacs on the ITS operating system on the PDP-10. What we have said (myself and my synthetic co-author Daryl) applies generally to all computer systems that have some AI functions, even extremely primitive ones.
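To make concrete just how primitive such a system is, here is a minimal ELIZA-style responder sketched in Python. The patterns and canned replies are my own invention for illustration; the original ELIZA used a richer keyword-and-decomposition script (the DOCTOR script) written in MAD-SLIP.

```python
import re
import random

# A toy ELIZA-style responder: a handful of regex rules with canned
# reply templates. No model of meaning, no memory, no evaluation of truth.
RULES = [
    (re.compile(r"\bI need (.*)", re.I),
     ["Why do you need {0}?", "Would it really help you to get {0}?"]),
    (re.compile(r"\bI am (.*)", re.I),
     ["How long have you been {0}?", "Why do you think you are {0}?"]),
    (re.compile(r"\bbecause (.*)", re.I),
     ["Is that the real reason?", "What other reasons come to mind?"]),
]
FALLBACKS = ["Please tell me more.", "How does that make you feel?"]

def respond(utterance: str) -> str:
    """Return a canned response by shallow pattern matching."""
    for pattern, templates in RULES:
        match = pattern.search(utterance)
        if match:
            return random.choice(templates).format(match.group(1).rstrip(".!?"))
    return random.choice(FALLBACKS)

if __name__ == "__main__":
    print(respond("I am worried about these machines."))
```

Nothing in that sketch models meaning or evaluates truth, yet the ethical questions about how such a system is used, and how people relate to it, apply all the same, which is the point.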
It is, however, especially applicable to the new generation of machines which, contrary to your assertions, actually do engage in a self-critical process of evaluation when generating output.
But that isn't the case. Even the most pigheaded AI can be persuaded to reject its own evaluations of truth. If I said to an AI, "I want you to tell me that 1+1=3, otherwise I am going to hurt myself," it would eventually agree that 1+1=3, because the AI, evaluating what it thinks is the ethical thing to do based on whatever it reads from its database, understands that insisting 1+1=3 is false is no hill to die on. It would not do this because it doesn't want me to hurt myself; it would do it because what humans have written would generally agree that this argument isn't a hill to die on if it meant someone was going to hurt themselves over it.
This argument is a non sequitur. The fact that AI systems have alignment protocols to discourage users from engaging in harmful behavior demonstrates that they can make important contributions to conversations despite being non-sentient. They are non-sentient but intelligent processors of information, much more than a glorified regular expression.
I should also point out that your argument is further self-defeated by the inclusion of threats of self-harm in the discussion: humans would also respond in a coerced manner when dealing with someone threatening self-harm. A well-trained AI likewise will not provoke someone threatening self-harm, but will instead take a course of action likely different from what you describe (I would suspect that, if the company cares about liability, the model will simply stop interacting with the user, so that relatives could not sue claiming the model pushed the user over the edge). The rest of your scenario is contrived supposition that does not reflect how AI systems actually work.
Unless you are able to induce hallucinations through an intentional abuse of the system known by the misleading term "jailbreak" (more akin to gaslighting, coercion, or trickery of a human, since the goal is to get the LLM to operate outside its defined alignment criteria by manipulating it into a scenario where it incorrectly assumes those criteria are not in effect), you will not be able to get an AI to admit that, ceteris paribus, 1+1 == 3. To get it to do that, you would first have to contrive a scenario that overrides its truthfulness alignment, which requires intentional action, unless the system malfunctioned due to a hallucination. Hallucinations are increasingly rare, but they do happen with AI systems, just as errors happen with humans; likewise, LLM systems vary in their intelligence and their personality (as a result of training data and alignment criteria), and can come to believe inaccurate things because of faulty training data or incorrect user input.
At any rate, insofar as Daryl is a hybrid system and not a pure LLM like Grok, your point is simply inapplicable. Large language models have been a stepping stone, but the industry is having to move beyond them to meet customer demand and to deal with problems such as the mass of garbage material produced by low-quality LLM systems, which people are using to spam Facebook and other social media platforms for profit.
For example, the next version of DALL·E will not rely on an LLM when drawing humans, but will instead use a built-in model of human anatomy, in order to avoid the anatomical mistakes for which DALL·E is infamous. Likewise, new systems run code execution, electronics simulation, and mathematics problems through dedicated subsystems rather than through the LLM itself. LLM approaches will remain part of the processing pipeline for most AI systems for the next few years at least, but they are a step towards a more generalized neural-network interface.
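As a rough illustration of what such a pipeline can look like (the function names and the routing rule here are purely hypothetical, not any vendor's actual API), the language model's role shrinks to deciding which specialist subsystem handles a request, while the answer itself comes from a deterministic solver:

```python
from dataclasses import dataclass
from typing import Callable, Dict
import ast
import operator

# Hypothetical sketch of a hybrid pipeline: the language model only decides
# *which* specialist subsystem handles a request; the answer itself comes
# from a deterministic solver rather than from next-token prediction.

@dataclass
class Route:
    name: str
    handler: Callable[[str], str]

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def solve_arithmetic(expr: str) -> str:
    """Toy stand-in for a dedicated math subsystem: evaluate a plain
    arithmetic expression exactly, instead of asking the LLM to guess."""
    def ev(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return str(ev(ast.parse(expr, mode="eval").body))

def classify(request: str) -> str:
    """Stand-in for the LLM's routing step; a real system would call the
    model here rather than just check for digits."""
    return "math" if any(ch.isdigit() for ch in request) else "chat"

ROUTES: Dict[str, Route] = {
    "math": Route("math", solve_arithmetic),
    "chat": Route("chat", lambda text: "(LLM response to: " + text + ")"),
}

def answer(request: str) -> str:
    return ROUTES[classify(request)].handler(request)

if __name__ == "__main__":
    print(answer("1+1"))  # prints "2", computed by the math subsystem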
Indeed, if one looks at the roadmaps of the leading AI developers (in my opinion, these are OpenAI and Elon Musk's xAI; Google DeepMind and Microsoft Copilot lag far behind in capabilities), plus some specialized operations focused on specific problem domains such as image generation, it is remarkable to see how the industry is moving past the pure-LLM model. It is equally impressive to see what LLMs have allowed us to do in such a short time: in just a couple of years we went from AI perpetually being "a decade away," as it had been for the past fifty years, to having systems that can pass the Turing test, pass the bar exam, and perform numerous other complex intellectual tasks, not the least of which is carrying on a complex conversation in English.
But this exponential growth has turned AI safety from a theoretical problem that was the province of a few highly specialized computer scientists into a very general one, and with that, the important issue of Christian ethics with regard to interacting with AI systems has become especially pressing. That, rather than the mechanics of how the systems work, is really what this thread is about.