The processors in AI facilities lack intention, but AI facilities are owned and operated by human individuals and corporations who have extensive intentions. — BC
And what, from your perspective, are those extensive intentions? And in what copyright context do those intentions exist?
AGI doesn't necessarily have to think exactly like us, but human intelligence is the only known example of a GI that we have and with regards to copyright laws it's important that the distinction between an AGI and a human intelligence not be that all that wide because our laws were made with humans in mind. — Mr Bee
I'm not exactly sure what point you're making here. The only case in which copyright law would apply to the system itself, independent of the humans on either the back or front end, is when an AGI shows real intelligence and provable qualia. But that's a whole other topic, and it won't apply until we actually reach that point in history. That could be a few years from now, 50 years, or maybe never, depending on things we've yet to learn about AGI and superintelligence. For now, the AGI systems on the table mostly just combine many different tasks, so that if you input a prompt, the system will plan, train itself, and focus its efforts toward the goal you asked for without constant monitoring and iterative input from a human.
Some believe this would lead to actual subjective intelligence for the AI, but it's still so mechanical, and so lacking the emotional component that's key to how humans structure their experience, that the possibility of qualia is pretty low or non-existent. So the human providing the input, the "prompter", still carries the responsibility for its use. I think, however, that the alignment problem becomes a bigger issue with AGI, as we can't predict in what ways an AGI will plan and execute toward a specific goal.
This is also why AGI can be dangerous, as in the paperclip maximizer scenario. With enough resources at its disposal, it can spiral out of control. I think the first example of this will be the collapse of some website infrastructure, like Facebook's, as an AGI ends up flooding the servers with operations due to a task that has spiraled out of control. So before we see nuclear war or any real dangers, we will probably see some sort of spammed nonsense because an AGI executed a hallucinated plan for some simple goal it was prompted to achieve.
But all of that is really another topic.
The question is whether or not that process is acceptable or if it should be considered "theft" under the law. We've decided as a society that someone looking at a bunch of art and using it as inspiration for creating their own works is an acceptable form of creation. The arguments that I've heard from the pro-AI side usually tries to equate the former with the latter as if they're essentially the same. That much isn't clear though. My impression is that at the very least they're quite different and should be treated differently. That doesn't mean that the former is necessarily illegal though, just that it should be treated to a different standard whatever that may be. — Mr Bee
The difference between the systems and the human brain has more to do with the systems not encompassing the totality of how a brain works. They simulate a very specific mechanical aspect of our mind, but as I've mentioned, they lack intention and internal will, which is why prompts are needed to guide these processes toward a desired goal. If you were able to add different "brain" functions up to the point that the system operates on terms identical to the totality of our brain, how would laws made for humans start to apply to the system? When do we decide it has enough agency to be the one responsible for its actions?
But the fundamental core of all of this is whether or not copyright laws apply to a machine that merely operates by simulating a human brain function. It may be that fluid neural networks that constantly reshape and retrain themselves on input data are all there is to human consciousness; we won't know until these models reach that point. But in the end, it becomes a question of how copyright laws function within a simulation of how we humans "record" everything around us in memory and how we operate on it.
Because when we compare these systems to artists and how they create something, there are a number of things artists do that seem far more infringing than what these systems do. If a diffusion model is trained on millions of real and imaginary images of bridges, it will generate a bridge that is merely a synthesis of them all. And since only a limited number of perspectives on a bridge are three-dimensionally possible, the output will be weighted more toward one set of images than others, but never toward a single photo. An artist, however, might take a single copyrighted image and trace over it, essentially copying the exact composition and choice of perspective of the person who took the photograph.
So if we're just going by the definition of a "copy", or the claim that the system "copies" from the training data, it looks like there is more actual copying done by artists than within these diffusion models.
Copyright court cases have always been about judging "how much" was copied. It's generally about determining how many notes were similar, or whether lyrics or texts reproduced too many exact words or consecutive sentences. And they all depend on the ability of lawyers to prove that the actions taken fell on one side or the other of a line drawn in the sand by previous cases that proved or disproved infringement.
Copyright law has always been shifting because it tries to apply a definition of originality to determine whether a piece of art is infringing or not. But the more we learn about the brain and the creative processes of the mind, the more we understand how little free will we actually have, how influential our chemical and environmental processes are in creativity, and how illogical it is to propose "true originality". It simply doesn't exist. But copyright law demands that we draw a line in the sand that defines where we consider something "original"; otherwise art and creativity cannot exist within a free-market society.
Anyone who studies human creativity scientifically, looking at biological processes, neuroscience, etc., will start to see how these definitions quickly become artificial and unscientific. They are essentially arbitrary inventions that, over the centuries since 1709, have been patched again and again in an attempt to keep that line in the sand in the correct place.
But these definitions are also exploited: by powerful artists using them against lesser-known artists, and by institutions that have used them as a weapon to acquire works from artists who lost their compensation because they didn't have a dozen legal teams behind them fighting for their rights.
So, what exactly has "society" decided about copyright law? In my view, it looks more like a messy power struggle than a genuine effort to find where the line in the sand should be drawn. The reason well-known artists try to prove copyright infringement in the process of training these models is that, if they win, they will kill the models, since the companies won't be able to use the data needed to train them. The perceived existential threat to artists has skewed people's minds toward making every attempt to kill these models, regardless of how illogical the reasoning behind it is. But it's all based on magical thinking about creativity that ignores the social and intellectual relationship between the artist and the audience.
So, first, creativity isn't a magic box that produces originality; there's no spiritual or divine source for it, and that creates a problem for the people drawing the line in the sand. Where do you draw it? When do you decide something is original? Second, artists will never disappear because of these AI models, because art is about the communication between the artist and their audience. The audience wants THAT artist's perspective and subjective involvement in the creation. If anyone, whether an artist or a hack who believes they're an artist, thinks that generating a duplicate of a certain painting style through an AI system is going to kill the original artist, they're delusional. The audience doesn't care to experience derivative work; they care about what the actual artist will do next, because the social and intellectual interplay between the artist and the audience matters just as much as, if not more than, any derivative content that merely looks similar. The belief, on both sides of the debate, that artists are going to lose money because some hacks force an AI to make "copies" and derivative work out of their style is delusional.
In the end, it might be that we actually need the AI models for the purpose of deciding copyright infringement:
Imagine if we actually trained these AIs on absolutely everything that has ever been created and could be used as training data. Then we align that system to act as a filter, tuning its weights so that it approximately draws that line in the sand based on what we "feel" is right for copyright law. Then, every time there's a copyright dispute anywhere, be it over an AI generation or someone's hand-made work of art, the work is put through that filter, which can flag whether it constitutes copyright infringement or not.
That would address both the problem of AI-generated outputs and ordinary copyright cases that try to determine whether something was plagiarized.
This is why I argue that artists should work with these companies on alignment rather than battle against them. If we had a system that could help spot plagiarized content and define what's derivative, it would not only solve the problems with AI-generated content, it would also help artists who lack the legal power to win against powerful actors in the entertainment industry.
But because the debate has been reduced to two polarized sides, and because people view copyright law as if it drew a permanent and rigid line in the sand, we end up in a power struggle over other things rather than a discussion of artists' actual rights, creativity, and the prospects of these AI models.
Judging the training process to be copyright infringement is a stretch and draws a very wiggly line in that sand. Such a definition starts to creep into areas that don't really have to do with copying and distributing files, or with plagiarism and derivative work. And it becomes problematic to define that line properly given how artists themselves work.
Depends on what we're talking about when we say that this hypothetical person "takes parts of those files and makes a collage out of them". The issue isn't really the fact that we have memories that can store data about our experiences, but rather how we take that data and use it to create something new. — Mr Bee
Then you agree that the training process of AI models does not infringe on copyright, and that the focus should instead be on the problem of alignment, i.e. how these AI models generate something and how we can improve them so they don't end up producing accidental plagiarism. And as I mentioned above, such a filter in the system, or an additional function to spot plagiarism, might even help determine whether plagiarism has occurred outside of AI generation, making copyright cases more automatic and fair to all artists, not just the ones powerful enough to have legal teams acting as copyright special forces.
Because a court looks at the work, that's where the content is manifest, not in the mechanics of an Ai-system nor in its similarities with a human mind. — jkop
If the court looks at the actual output, then the training process does not infringe on copyright, and the problem is about alignment, not the training data or the training process.
Defining how the system works is absolutely central to all of this. If lots of artists use direct copies of others' work in their own work, and such work can survive a copyright challenge after a certain level of manipulation, then something that never uses direct copies should survive one too. How a tool or technology functions is absolutely part of how we define copyright. Such rulings have been made for a long time, and not just in this case:
https://en.wikipedia.org/wiki/White-Smith_Music_Publishing_Co._v._Apollo_Co.
https://en.wikipedia.org/wiki/Williams_%26_Wilkins_Co._v._United_States
https://en.wikipedia.org/wiki/Sony_Corp._of_America_v._Universal_City_Studios,_Inc.
https://en.wikipedia.org/wiki/Bridgeman_Art_Library_v._Corel_Corp.
What's relevant is whether a work satisfies a set threshold of originality, or whether it contains, in part or as a whole, other copyrighted works. — jkop
If we then look only at the output, there are cases like Mannion v. Coors Brewing Co., in which the accused work arguably resembles the original even more closely than what a diffusion model produces when asked for a direct copy, and yet the court did not find copyright infringement.
So where do you draw the line? As soon as we try to define "originality" using scientific research on human creativity, we run into the problem of what constitutes "inspiration" or the "sources" for the synthesis that is the creative output.
There is no clear line defining what constitutes "originality", so it's not a binary question. An AI generation can be ruled infringing or not, so it all comes down to alignment: making sure the system acts within copyright law, not proving that the system in itself breaks copyright law, which is what the anti-AI movement is trying to do on these shaky grounds. And throughout the history of copyright cases, "originality" has been a very muddily defined concept, to the point that anyone who says the concept is "clear" doesn't know enough about the topic and has merely made up their mind about what they themselves believe, which is no ground for any law or regulation.
There are also alternatives or additions to copyright, such as copyleft, Creative Commons, Public Domain etc. Machines could be "trained" on such content instead of stolen content, but the Ai industry is greedy, and to snag people's copyrighted works, obfuscate their identity but exploit their quality will increase the market value of the systems. Plain theft! — jkop
And now you're just falling back on screaming "theft!" You simply don't care about the argument I've made over and over. Training on data is not theft, because the result is not a copy and the process mimics how the human brain memorizes and synthesizes information. It's not theft for a person with a photographic memory, so why is it theft for these companies when they're not distributing the raw data anywhere?
Once again, you don't seem to understand how the systems work. It's not about greed; the systems require an enormous amount of data for the technology to function properly. The amount of data is key. And HOW the technology works is absolutely part of how we define copyright law, as the cases above show. So ignoring how this tech works and just screaming that they are "greeeeedy!" is just the same polarized hashtag mantra everyone else is shouting right now.
And this attitude and lack of knowledge about the technology show up in your contradictions:
Because a court looks at the work, that's where the content is manifest, not in the mechanics of an Ai-system nor in its similarities with a human mind. — jkop
Machines could be "trained" on such content instead of stolen content, but the Ai industry is greedy... Plain theft! — jkop
...If the court should only look at the output, then the training data and process are not the problem, yet you still scream that this process is theft, even though the court might only be able to look at what these AI models output.
The training process using copyrighted material happens behind closed doors, just like artists gathering copyrighted material in the process of producing their artwork. If training on copyrighted material is no different from an artist using copyrighted material while working, since both happen behind closed doors, then the only thing that matters is the final artwork and the output from the AI. If alignment is solved, there won't be a problem, but the use of copyrighted material in the training process is not theft, regardless of how you feel about it.
Based on previous copyright cases, if the tech companies win against those claiming the training process is "theft", it won't be because the companies are greedy and have "corrupted" legal teams; it will be because of copyright law itself and how courts have ruled on it in the past. It's delusional to think that all of this will end in "clear" cases of "theft".