Onuma Tahee
National Diet Library. Chief, Research and Development Section, Research and Development for Next-Generation Systems Office, Digital Information Planning Division, Digital Information Department

Developing new library services using AI (machine learning): an introduction to the Next Digital Library

The Research and Development for Next-Generation Systems Office (R&D Office) at the National Diet Library, Japan (NDL) conducts research and development of practical applications for new library services using machine learning and other advanced information technologies as a means of improving the discoverability of digitized materials. One example of how the R&D Office’s efforts are being put to use is the Next Digital Library, an experimental search and view service for digitized materials.

The Next Digital Library <https://lab.ndl.go.jp/dl/> features two search functions: a Keyword Fulltext Search over text generated by optical character recognition (OCR), and an Illustration Search for finding illustrations, photographs, and maps that are extracted automatically from digitized materials. Users can search content from the NDL Digital Collections covering about 336,000 books and old and rare Japanese materials for which copyright protection has expired; full-text data is available for about 30,000 of these items, on industrial subjects. The Next Digital Library also offers functions intended to improve library services: whitening the background of digitized pages that have discolored with age, automatically splitting two-page spreads into single pages for easier reading on portrait-oriented displays such as smartphones, and automatically detecting the page-turning direction. The NDL is now working to implement these functions in the NDL Digital Collections.
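
As a rough illustration of how the smartphone display processing and the page-turning direction mentioned above relate to each other, the following minimal sketch splits a landscape scan of a two-page spread into single pages and returns them in reading order. It is not the NDL's implementation: the function name `split_spread`, the list-of-rows image representation, and the assumption that the gutter lies at the horizontal midpoint are all hypothetical, chosen only to keep the example self-contained.

```python
from typing import List

# Hypothetical image representation: rows of grayscale pixel values.
Image = List[List[int]]


def split_spread(image: Image, right_to_left: bool = True) -> List[Image]:
    """Split a landscape two-page spread into single pages in reading order.

    Assumption: the gutter sits at the horizontal midpoint of the scan.
    Portrait (or square) images are treated as single pages and returned as is.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if width <= height:
        return [image]

    mid = width // 2
    left = [row[:mid] for row in image]
    right = [row[mid:] for row in image]

    # In a book bound for right-to-left reading, the right-hand page comes first.
    return [right, left] if right_to_left else [left, right]


# Tiny 2x4 "scan": the left page holds 1, 2, 5, 6; the right page holds 3, 4, 7, 8.
spread = [[1, 2, 3, 4],
          [5, 6, 7, 8]]
pages = split_spread(spread, right_to_left=True)
```

With `right_to_left=True` the right-hand page is returned first, which is why a reliable page-turning direction matters once spreads are shown one page at a time. A production system would presumably locate the gutter from the image content rather than assume the midpoint.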

Since August 2019, the R&D Office has maintained a GitHub account <https://github.com/ndl-lab>, through which it shares its work, including the source code for the Next Digital Library and datasets for training machine learning models. The NDL hopes that making these resources available will attract talented engineers from outside the NDL and that the account will serve as a hub for the exchange of expertise.

Developing new library services using AI (machine learning): with a focus on introducing the "Next Digital Library"


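all books and old and rare materials, about 336,000 items in total (of which about 30,000 items on industrial subjects are covered by full-text search). The service also provides a function that automatically splits two-page spreads to suit portrait-oriented displays such as smartphones, and a function that automatically detects and sets the page-turning direction on the viewing screen. The NDL is currently studying how to incorporate these functions into the "NDL Digital Collections".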