Making AI Smarter?

Tech companies are teaming up with libraries to add tons of information to artificial intelligence databases. Will this make AI searches better?

A pair of gloved hands flip through the pages of a very old book that is on a machine, about to be scanned.

© EL MAR/stock.adobe.com

In this photo, a man prepares to scan, or digitize, a very old book. (The scanner in this photo is not related to the projects described in today’s article.)

It might seem as if artificial intelligence (AI) knows everything. But AI bots have only the data people have given them, and that can lead to incomplete, inaccurate, or biased search results. Now, tech companies are working to expand AI’s knowledge by partnering with libraries around the world.

Companies like Google, Microsoft, and OpenAI (which owns ChatGPT, a well-known chatbot) are working with Harvard University, public libraries, and other institutions to digitize, or put online, parts of their book collections and feed them into the banks of data used to “train” AI. The book subjects range from law to the sciences to literature. 

Expanding AI

The tech companies are eager to expand what AI “knows.” When AI bots, like ChatGPT and Google’s AI Overview, were first developed, the companies fed information into them from various online sources, from scanned books to Wikipedia to social media. Since not all of these sources are reliable, AI search results aren’t always accurate. Moreover, some of the sources that were fed into AI bots were copyrighted, meaning it was illegal to copy them without permission from the author or copyright holder. This has led to numerous lawsuits against the big tech companies. 

There’s also plenty that’s missing from AI’s data collection, including much of the information on library bookshelves. Under the new partnerships with institutions, books will be added only if they are in the public domain, meaning their copyright has expired. In the United States, many copyrighted works enter the public domain once they are 95 years old. 

A view of a reading room at the Widener Library at Harvard University.

© Scott Jones/Dreamstime.com

College students study at Harvard University’s Widener Library. Harvard is among the institutions working with tech companies to scan parts of their book collections.

More Information for Everyone

The partnerships benefit not only the tech companies but also libraries, which are eager to digitize their collections so that more people have access to them. Digitization is expensive, but now that tech companies are funding the projects, libraries can go ahead with them.

“Many of these titles exist only in the stacks of major libraries, and the creation and use of this dataset will provide expanded access to these volumes and the knowledge within,” said Mary Rasenberger, CEO of the Authors Guild, in a statement.

Approach with Caution

No one is sure how these projects will affect AI searches. One concern is that old books often contain outdated or even harmful information. This might include disproven scientific theories or racist language. Librarians say it’s important for people to look closely at the search results returned by AI and think carefully about what information to accept and what to reject.

“When you’re dealing with such a large data set, there are some tricky issues around harmful content and language,” Kristi Mukk, a coordinator at Harvard’s Library Innovation Lab, told the Associated Press. Mukk said it is important to make “informed decisions and use AI responsibly.”

Did You Know?

The world’s oldest continuously operating library is in Fez, Morocco. The al-Qarawiyyin Library was founded in 859 by Fatima al-Fihri, who also established a university.

An open, handwritten copy of the Koran with fingertips resting on one of the pages.

© Chris Griffiths—Moment/Getty Images

This copy of the Koran, the holy book of Islam, was made in the 800s, around the time the al-Qarawiyyin Library was founded. The book is now housed at the ancient library.

Human vs. Robot

A screenshot of a GPTZero result with text about grizzly bears determined to be AI generated.

Results and interface © GPTZero; Composite image Encyclopædia Britannica, Inc.

Websites like GPTZero are designed to detect whether text was written by a human or by AI.

Chatbots, like OpenAI’s ChatGPT, can not only “chat” with people and answer their questions but also produce on-demand content that sounds almost like it was written by a human. But ChatGPT isn’t perfect, and there are ways to detect when it’s been used.

Numerous tools have been designed to identify AI-generated content, and teachers and employers have become more skilled at telling the difference between human and robot writing, just by examining the language and style. Human writers have their own style. AI doesn’t. Plus, some of ChatGPT’s output is inaccurate, and some of it just doesn’t make sense. 

“It’s a mistake to be relying on [ChatGPT] for anything important,” OpenAI chief executive officer Sam Altman told the Associated Press.

Bringing Reading to the World

Pages of a newspaper are being produced by a printing press.

© Gustavo Roa/Dreamstime.com

At one time, books were handwritten, making them rare and valuable. The invention of the printing press changed everything. 

Find out at Britannica how this one invention helped transform the world!

WORD OF THE DAY

repository

PART OF SPEECH:

noun

Definition:

: a place where a large amount of something is stored

Definitions provided by Merriam-Webster
