AI companies claim their tools couldn’t exist without training on copyrighted material. It turns out they could; it’s just really hard. To prove it, AI researchers trained a new model that’s less powerful but much more ethical, because the LLM’s dataset uses only public domain and openly licensed material.
The paper (via The Washington Post) was a collaboration among 14 institutions. The authors represent universities like MIT, Carnegie Mellon and the University of Toronto. Nonprofits like Vector Institute and the
→ Continue reading at Engadget