I was wondering if anyone is aware of any solutions for getting around the legal issues with using models that are pretrained on datasets not licensed for commercial use, like ImageNet and parts of MS-COCO.
It seems to me that the academic community creates wildly useful models that sometimes can’t be used in industry because they rely on a pretrained ResNet.
Perhaps I’m dead wrong about this being the reality of using research in industry. I honestly want to know whether anyone else has run into these legal issues at larger companies and has either solved the problem or felt the same way.
Lastly, it goes without saying, but an AI model, or Photoshop, or Windows Paint, or even a pencil and paper can be used to produce material that falls under someone else’s copyright. Prompting a model to produce a facsimile of Mickey Mouse and using that in some commercial production is effectively no different from taking a pencil and drawing Mickey Mouse. Both will likely end up in a lawsuit brought by Disney. So the fact that a model can produce copyrighted material does not by itself mean the model is illegal to use.
I really appreciate the work you just did to provide that context; I think it was very valuable.
However, I don’t think these points get to the issue that I experienced working for a Fortune 500 manufacturing company. From what I experienced, they will not touch a model if there is even a possibility of a lawsuit. I would imagine this has to do with risk assessment. I’ve even heard rumors that Netflix doesn’t allow their engineers to use pretrained ImageNet models.
TL;DR:
Are there any larger (ImageNet- or MS-COCO-sized) image datasets that give similar performance on deep learning CV benchmark tasks and are either commercially licensable or guarantee that you won’t get sued?