The Internet Archive has often been a valuable resource for journalists, from finding records of deleted tweets to providing academic texts for background research. However, the advent of AI has created new tension between the archive and news publishers. A few major publications have begun blocking the nonprofit digital library’s access to their content over concerns that AI companies’ bots are using the Internet Archive’s collections to indirectly scrape their articles.
“A lot of these AI businesses are looking for readily available, structured databases of content,” Robert Hahn, head of business affairs and licensing for The Guardian, told Nieman Lab. “The Internet Archive’s API would have been an obvious place
→ Continue reading at Engadget