Imagine an undergraduate trying to brainstorm a fruitful theme for a term paper. One midnight, she is struck by inspiration, and opens the Libraries’ Digital Collections to type this query:
“Are there any examples of World War Two propaganda posters that talk about the importance of food?”
A regular search bar often cannot handle such a specifically worded request. But if a generative AI-assisted search tool now in prototype form at Northwestern Libraries were available to our hypothetical student, she might be surprised when the AI brought back images of pertinent posters (with slogans like “Save every drop of fat!” and “Fight food waste in the home!”), along with a written summary of the findings and percentile ratings showing the AI’s confidence in the relevance of each image.
For the student, this is still a “someday” technology – but the Libraries’ prototype really did answer this question in a live demonstration this year, delighting the development team that is experimenting with how artificial intelligence can facilitate a digital collection search. But that’s just the immediate goal; writ large, this experimentation is meant to explore how generative AI tools can be a force for good in the library field, said Carolyn Caizzi, head of Repository and Digital Curation.
“This field is moving at breakneck speed,” she said. “At the Libraries, we have developers saying, ‘What can I do? How can I use this?’ We’re fortunate that we have this team of software developers here who have the support to be creative and experiment.”
That team, led by digital initiatives product manager David Schober, is looking for answers to the biggest questions posed since tools like ChatGPT exploded into the public consciousness last year. In describing his team’s experimentation, Schober invoked one of the most famous scenes in cinema about mastering technology.
“At first, there was some excitement and maybe some trepidation in our field,” he said. “Like 2001: A Space Odyssey, we had to walk up to this tool and ask: Do I hit something with it? Or fly with it? Throw it in the air? How can I use it?”
The Libraries are treating this prototyping work as a way to show a proof of concept. “And we’re publishing how we did it,” Schober said. “Because there needs to be a bigger conversation in the library tech community around what this technology can do.”
The result of this early-stage development is a workable tool that is giving developers and librarians plenty of material to start that conversation.
“For me as a librarian, this flips search completely on its head,” Caizzi said. “We’re used to using keywords to search, but now I’m having a conversation to ask questions and get answers. That’s a powerful shift.”
The Libraries’ Digital Collections turns out to be ideal for this experimentation, because it’s the perfect size — as a “Goldilocks collection,” it’s just big enough for meaningful queries without being overwhelming. Plus, Northwestern developers are already familiar with collections digitized and stored here. (“It’s easier if we develop a tool like this for data we manage and understand,” Caizzi said.)
To build this prototype, developers took the metadata from the digitized collections and stored it in a “vector database,” then combined its search capabilities with the reasoning of a large language model developed by OpenAI. Rather than storing content in rows and columns, like a typical spreadsheet, a vector database operates in multiple dimensions, offering a way to query for similar concepts rather than relying on keywords. The generative AI portion then describes the results and provides context for the user’s query; for example, a request for “people on a beach” might surface previously unrelated images combining people, sand or water for a more comprehensive result.
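For readers curious about what “querying concepts” can look like in practice, here is a minimal sketch of similarity search in Python. It is not the Libraries’ implementation: the catalog titles, the four hand-labeled dimensions and every number below are invented for illustration. A real system would get its vectors from an embedding model, with hundreds or thousands of dimensions, and would store them in a dedicated vector database.

```python
import math

# Hypothetical toy vectors. Each position scores one invented concept:
# (people, sand, water, books). Real embeddings are produced by a model.
CATALOG = {
    "Family picnic on the dunes": [0.9, 0.8, 0.1, 0.0],
    "Swimmers at the lakefront": [0.8, 0.3, 0.9, 0.0],
    "Empty library reading room": [0.0, 0.0, 0.0, 1.0],
    "Students studying in the stacks": [0.7, 0.0, 0.0, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, catalog, top_k=2):
    """Rank catalog items by similarity to the query and keep the top_k."""
    scored = [
        (title, cosine_similarity(query_vector, vector))
        for title, vector in catalog.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Assumed vector for the query "people on a beach": people, sand and water.
QUERY = [0.7, 0.7, 0.7, 0.0]

for title, score in search(QUERY, CATALOG):
    print(f"{score:.2f}  {title}")
```

Neither of the two top-ranked titles contains the word “beach.” They rank highest because their vectors point in nearly the same direction as the query’s, which is the sense in which a vector search matches similar concepts rather than keywords.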
The result is a tool that is showing real promise. In one demonstration, senior developer Brendan Quinn, who initiated the early experimentation and development, posed this query to the prototype:
“Are there any examples of French humor in the collection? And can you explain why the titles are funny in English? If the jokes don’t make sense, I’d be okay with that.”
The tool searched the Libraries’ digitized collections and successfully identified a set of covers from 20th-century humor publications, such as Le Pêle Mêle and Le Bon Vivant. The tool also tried summarizing the images it found, but conceded, “It is difficult to translate the humor from French to English, as many of the jokes are based on cultural references and wordplay that may not translate well.”
“The language model is translating these titles here and is trying to answer our prompt in the best way it can,” Quinn said. Still, this represented an encouraging outcome for the prototype, he said, adding, “The proof is in the prototype, as it were.”
There’s more to the work than delivering relevant search results. For example, developers must continue to grapple with ethical implications of AI-assisted search. Quinn pointed out how a search for the term “women’s clothing” successfully delivered a number of images of women in dresses; however, “does that term have to mean wearing a dress?” he asked. “Does this reveal a Western bias in the language model? This is a good example of where we need to begin investigating.”
Similarly, developers are concerned with communicating to the end user how the tool arrived at its results in order to build trust in them. In other words, by offering transparency into the tool’s “thinking,” users can come away with higher confidence that they found everything in their search.
Next, Caizzi and her team will release the tool to campus. Her group is applying for grants to make a toolkit that will enable other libraries to try similar projects with their own collections, so those institutions can contribute their own discoveries to the collective understanding of AI-assisted search.
“We are all trying to understand these tools and how they can be leveraged to the benefit of our users and frankly — I know it sounds lofty — to the benefit of humanity at large,” Schober said. “Library-led initiatives are critical to ensuring that happens.”