Making AI Smarter?

Tech companies are teaming up with libraries to add tons of information to artificial intelligence databases. Will this make AI searches better?

In this photo, a man prepares to scan, or digitize, a very old book. (The scanner in this photo is not related to the projects described in today’s article.)

It might seem as if artificial intelligence (AI) knows everything. But AI bots have only the data people have given them, and that can lead to incomplete, inaccurate, or biased search results. Now, tech companies are working to expand AI’s knowledge, by partnering with libraries around the world.

Companies like Google, Microsoft, and OpenAI (which owns ChatGPT, a well-known chatbot) are working with Harvard University, public libraries, and other institutions to digitize, or put online, parts of their book collections and feed them into the banks of data used to “train” AI. The book subjects range from law to the sciences to literature.

Expanding AI

The tech companies are eager to expand what AI “knows.” When AI bots, like ChatGPT and Google’s AI Overview, were first developed, the companies fed information into them from various online sources, from scanned books to Wikipedia to social media. Since not all of these sources are reliable, AI search results aren’t always accurate. Moreover, some of the sources that were fed into AI bots were copyrighted, meaning it was illegal to copy them without permission from the author or copyright holder. This has led to numerous lawsuits against the big tech companies.

There’s also plenty that’s missing from AI’s data collection, including much of the information on library bookshelves. Under the new partnerships with institutions, books will be added only if they are in the public domain, meaning their copyright has expired. In the United States, many copyrighted works enter the public domain once they are 95 years old.

College students study at Harvard University’s Widener Library. Harvard is among those working with tech companies to scan part of their book collections.

More Information for Everyone

The partnership benefits not only the tech companies but also libraries, which are eager to digitize their collections so that more people have access to them. Digitization is expensive—but now that tech companies are funding the project, libraries can go ahead with it.

“Many of these titles exist only in the stacks of major libraries, and the creation and use of this dataset will provide expanded access to these volumes and the knowledge within,” said Mary Rasenberger, CEO of the Authors Guild, in a statement.

Approach with Caution

No one is sure how these projects will affect AI searches. One concern is that old books often contain outdated or even harmful information. This might include disproven scientific theories or racist language. Librarians say it’s important for people to look closely at the search results returned by AI and think carefully about what information to accept and what to reject.

“When you’re dealing with such a large data set, there are some tricky issues around harmful content and language,” Kristi Mukk, a coordinator at Harvard’s Library Innovation Lab, told the Associated Press. Mukk said it is important to make “informed decisions and use AI responsibly.”

In the

News!

Making AI Smarter?

Did You Know?

Human vs. Robot

Bringing Reading to the World

WORD OF THE DAY

repository

Sudoku

In Case You Missed It