Library Genesis, the pirated database of millions of books, scientific papers, comics, and magazine issues, was used by Meta to train its flagship AI model.
Court documents released on March 19 show that senior staff at Meta obtained permission from company CEO Mark Zuckerberg to download and use Library Genesis, or LibGen, to train its AI model Llama 3.
LibGen’s collection currently contains more than 7.5 million books and 81 million research papers. While much of the content is in the fields of science, technology, engineering and mathematics, the database also includes literary works authored and published by museums, artists, architects, and art galleries.
Meta’s internal communications about this decision to use LibGen were recently unsealed as part of a copyright infringement lawsuit filed against the company by several authors of books in LibGen’s database, including Ta-Nehisi Coates, Sarah Silverman, and David Henry Hwang. Earlier this year, another lawsuit by a similar group of authors revealed that OpenAI had also used LibGen in the past.
While most people may be unaware of what LibGen has pirated, generative AI models trained on its massive database have been embedded in numerous popular products with millions of daily users, such as Meta’s Facebook, Instagram, and WhatsApp, and OpenAI’s ChatGPT.
(A spokesperson for Meta declined to comment to The Atlantic, citing the ongoing litigation against the company. OpenAI also did not return a request for comment from The Atlantic.)
The Atlantic used some of LibGen’s metadata to create an interactive database, searchable by author name. Among the results, ARTnews found:
- John Waters’s book Make Trouble, based on his 2015 commencement speech at the Rhode Island School of Design
- Gagosian Gallery’s monograph of Jenny Saville published in 2018
- exhibition catalogues for “Mark Rothko, 1903-1970: A Retrospective”, Maurizio Cattelan, and “The Great Utopia: The Russian and Soviet Avant-Garde, 1915-1932” from the Solomon R. Guggenheim Museum
- the exhibition catalogue for Kehinde Wiley’s solo show at the Brooklyn Museum in 2015
- a children’s book and two hardcover books published by the National Gallery of Art
- books by Andy Warhol in English, Spanish, Italian and Portuguese
- several works by acclaimed Lebanese-American painter and writer Etel Adnan
- the English and Italian editions of Peggy Guggenheim’s Confessions of an Art Addict
- three annual notes from MoMA director Glenn D. Lowry
- Wassily Kandinsky’s Point and Line to Plane translated by architect Howard Dearstyne and Guggenheim Museum co-founder Hilla Rebay
- Jerry Saltz’s How to Be An Artist and Art Is Life: Icons and Iconoclasts, Visionaries and Vigilantes, and Flashes of Hope in the Night
- a Chinese translation of MoMA Highlights: 350 Works from The Museum of Modern Art, New York
- the Italian edition of Marina Abramovic’s biography Walk Through Walls, co-written with James Kaplan
- a Russian translation of Frank Lloyd Wright’s 1932 book The Disappearing City and a German edition of Humane Architecture
- English, Italian, and Portuguese versions of Ai Weiwei’s 1000 Years of Joys and Sorrows: A Memoir
- multiple issues of the American art magazine Bomb, The Art Bulletin, Art Journal, The Burlington Magazine, and Grand Street
There were also results for Yoko Ono, David Byrne, Robert Mapplethorpe, Ed Ruscha, David Hockney, and Ludwig Mies van der Rohe.
Editor’s Note: The work of ARTnews reporter Karen K. Ho was also used to train Meta’s AI through the anthology Unspeakable Acts: True Tales of Crime, Murder, Deceit, edited by Sarah Weinman, which was found in LibGen’s database.