Background
The Bibliographic Knowledge Network (BKN) is a project funded for two years, starting in September 2008, by the NSF Cyber-enabled Discovery and Innovation (CDI) Program. Its aim is to develop a suite of tools and services that encourage the formation of virtual organizations in scientific communities of various sizes, such as conference groups and departmental research groups. These tools should allow such organizations to filter relevant documents from various input streams, to select and enhance the quality of the bibliographic data associated with the organization, and to attract students, teachers and researchers to contribute to the organization's activity.
A copy of the project proposal is available at http://www.stat.berkeley.edu/users/pitman/bkn-proposal.pdf.
Methods of bibliometric analysis, machine learning and statistical visualization will be applied to assist the exploration and understanding of bibliographic collections of various sizes, for example all work produced by a research group, all work published in a journal, or all work in a field. This will provide an interactive environment that allows the researcher to move beyond static summaries and dynamically explore the context in which an article of interest exists. In particular, methods of machine learning will be applied to build an article recommendation service, based on collaborative filtering and on semantic analysis of bibliographic data, initially for researchers in probability and statistics.
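As a rough illustration of what collaborative filtering means in this setting, the sketch below recommends articles to a researcher based on which articles share readers. It is only a minimal sketch under stated assumptions, not the project's recommendation service: the `bookmarks` data, the researcher and article names, and the functions `item_similarity` and `recommend` are all hypothetical, and cosine similarity over shared readers is just one common choice of similarity measure.

```python
from math import sqrt

# Hypothetical data: which researchers have bookmarked which articles.
bookmarks = {
    "alice": {"art1", "art2", "art3"},
    "bob": {"art1", "art2"},
    "carol": {"art2", "art3", "art4"},
    "dave": {"art4"},
}


def item_similarity(bookmarks):
    """Cosine similarity between articles, computed from shared readers."""
    # Invert the data: article -> set of researchers who bookmarked it.
    readers = {}
    for user, items in bookmarks.items():
        for item in items:
            readers.setdefault(item, set()).add(user)

    sim = {}
    for a in readers:
        for b in readers:
            if a == b:
                continue
            overlap = len(readers[a] & readers[b])
            if overlap:
                sim[(a, b)] = overlap / sqrt(len(readers[a]) * len(readers[b]))
    return sim


def recommend(user, bookmarks, top_n=3):
    """Rank unseen articles by summed similarity to the user's bookmarks."""
    sim = item_similarity(bookmarks)
    seen = bookmarks[user]
    scores = {}
    for (a, b), s in sim.items():
        if a in seen and b not in seen:
            scores[b] = scores.get(b, 0.0) + s
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda k: (-scores[k], k))[:top_n]


if __name__ == "__main__":
    print(recommend("bob", bookmarks))
```

A semantic component, as mentioned above, would add a second score derived from the bibliographic data itself (titles, abstracts, keywords) rather than from reader behavior; how the two would be combined is not specified here.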
Research will also be done to provide authoring tools that let authors in mathematical fields easily create highly structured, machine-readable documents in LaTeX, BibTeX or similar formats. Such documents can then be aggregated and interlinked in encyclopedic compilations, and subjected to machine learning and statistical analysis to provide high-level overviews of the landscape of these fields.
In statistics, mathematics and related fields, including social science, we expect that the networks of information about authors, publications, problems and datasets created and exposed through this project will advance these fields by revealing hidden connections among different sub-disciplines and accelerating the transmission of knowledge across them. With respect to information science, the project should advance understanding of the collaborative production and enhancement of bibliographic information online. It will use flexible similarity metrics, presented in a visually stimulating way, to draw interest and encourage researchers to expand their search parameters.
The project addresses three fundamental problems of knowledge management:
- The compartmentalization problem (how to break down barriers which separate disciplines)
- The navigation problem (how to guide students and researchers within and between disciplines)
- The maintenance problem (how to provide incentives for individuals and organizations to improve the quality of publicly accessible knowledge).
The proposed solution is to gradually distill the wealth of heterogeneous data now available in digital formats into an openly navigable network of websites, the Bibliographic Knowledge Network (BKN), each node of which is a website dedicated to a specific topic or field of knowledge. Each participating site will be maintained by an individual or virtual organization with a commitment to that field. Sites may be designed as guides for researchers, teachers, and students, or they may provide more specialized services, such as gateways that connect to other internet resources.
The BKN will be created through the development of software that makes it easy for a large collection of mostly small and distributed organizations to brand, select, maintain, and annotate collections of structured scientific content. That content will be made available in machine-readable formats, so that methods of machine learning can be used to draw connections between ideas in different disciplines and to provide article recommendation services based on both collaborative filtering and semantic analysis of documents.
The collective knowledge system emerging from this project will be available beyond the walls of academia, and should provide well-organized, high-quality information to anyone with an Internet connection. The expository components of the system are intended to attract people from all backgrounds to pursue scientific careers, and to allow students at all levels to encounter materials that lead them on to more advanced work. The system should add value to (and leverage the capabilities of) other Open Access initiatives, including the system of interoperable digital repositories, Wikipedia, Open Journal Systems, and free academic search services such as Google Scholar.



