CLOCKSS: An Archive for the Open Access Community
Victoria Reich and Irene Perciali, CLOCKSS
As the number of online scholarly resources has grown, so too has the fraction of them that are open access (OA). Online scholarly resources represent the best and most important advances in knowledge, the result of years of study and billions of dollars of investment. The fact that so many of them are available for free is a testament to the hard work of OA advocacy organizations (such as SPARC), and to the responsible stewardship of major research institutions and universities that have supported the OA movement (for example, universities such as Stanford that have passed OA mandates for their faculty).
And yet OA materials are especially difficult to safeguard for the very long term. After all the effort to procure and publish OA content, there are rarely financial resources left to secure and preserve it in a reliable archive. Open access content is more fragile and vulnerable than publisher-owned content, which is protected by the larger profit margins of commercial presses. Much of the new online open access scholarship comes from small publishers with tenuous business models, from scholars and universities using do-it-yourself web publishing platforms, and from smaller scholarly communities and those in the developing world (the DOAJ list of journals gives a good cross-section, and quantitative measures of OA growth through 2009 have also been published).
This content is most at risk of disappearing. Every day there are examples of open access materials that have become unavailable. When content is no longer provided by the publisher, the author, or the host institution, there is no guarantee that it will ever be made available again, let alone at no charge. And yet we need to guarantee that these irreplaceable materials persist.
What is the right approach to preserve open access content for the very long term? A trustworthy long-term archive for open access content would need to have most of the following features:
• It must be governed by and represent the diverse voices of the global open access scholarship community.
• It must be affordable; for some of the smaller scholarly communities, it will probably need to be free of charge.
• It must be technologically diverse – able to store any number of content types in a diverse array of redundant locations.
• Most importantly, it must guarantee that open access content remain free and open access, regardless of changes in the market, institutions, and governments.
CLOCKSS is a not-for-profit archive that was founded by the world’s leading academic publishers and libraries in 2006, with a vision for a global archive that could fill some of the gaps in the landscape of scholarly archives. In the process of designing the archive – its governance, its finances, and its technology – the founding organizations grappled with the same question that faces the open access community today: how can we ensure that all scholarly resources are preserved in the hands of the community, via a fair and sustainable model that is affordable and accessible to all? The CLOCKSS model represents one way to address the archiving needs of open access content outlined above, and the experience CLOCKSS has gained in implementing these principles may be instructive to others.
2. Community governance
Open access content belongs to the entire scholarly community. An archive for open access content must thus be governed by the community. A transparent community-based governance structure is most likely to persist in the future as technologies and priorities change. Distributed governance and administration ensure that no single legal entity can compromise the long-term viability of the archive.
CLOCKSS was designed carefully to share the responsibility of governing the archive among all the participants. The CLOCKSS archive is managed by and for its stakeholders. The bylaws require that the Board of Directors be made up of libraries and publishers in equal proportion. CLOCKSS also has an Advisory Council made up of one delegate from each supporting library. The Board responds to recommendations from the Advisory Council, which ensures that all libraries contribute to the policies and practices of CLOCKSS. CLOCKSS participants have the opportunity to be deeply involved in all aspects of the scholarly communications industry and keep the community’s best interests at the forefront when they make their decisions.
3. Affordability
Just as open access content belongs to the entire scholarly community, the costs of archiving it are the responsibility of all. CLOCKSS evens out the financial responsibility through a mixture of fee-for-service and endowment funding. It receives participation fees from the publishers and libraries. Annual fees start at $450 for research libraries and $200 for publishers, and our goal is to have even lower participation fees over time, in order to extend the archive to smaller publishers and those in the developing world. Indeed, fees for smaller publishers have already been lowered once, which has allowed a number of smaller publishers to participate.
To guarantee the very long-term sustainability of an archive, its operating expenses must be secured for that very long term. CLOCKSS is therefore building an endowment that will guarantee the future of the archive independent of any particular member, and that will make it possible for all content to be preserved, regardless of cost. CLOCKSS also keeps its operating costs as low as possible, to make it easier for institutions of all sizes and budgets to participate, and to make sure the archive is lean enough to persist into the very long term.
4. Distributed technology
Open access scholarship is worldwide and spans continents, borders, and institutions. In the open access spirit of decentralization, diversity, and community, CLOCKSS technology is geographically distributed, to guard against natural or human failure; it uses open source software, to guard against technological arrogance; and each of its redundant repositories is administered independently, to guard against the risk of political and policy fluctuations at any given institution.
CLOCKSS (which stands for Controlled Lots of Copies Keep Stuff Safe) is built on LOCKSS technology (Lots of Copies Keep Stuff Safe), an award-winning open source preservation method that relies on technological redundancy and duplication. The technology manages preserved content in all formats and genres, to ensure that today's content will be readable by tomorrow's scholars. The archive is distributed across 12 geopolitically, geographically, and geologically diverse, long-lived steward libraries that have agreed to take on an archival role on behalf of the wider international community. CLOCKSS steward libraries are located around the world, in the United States, Canada, Hong Kong, Japan, Australia, and Scotland, and their number is growing.
CLOCKSS thus reinforces the role of libraries as memory organizations, complementing their role as custodians and advocates for open access content.
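To make the idea of preservation through redundancy more concrete, the following is a minimal sketch of how independently held copies might be compared so that a damaged copy can be spotted. It is an illustration only and is not the LOCKSS protocol: the function names, the site labels, and the simple majority rule are all assumptions made for this example.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of one stored copy."""
    return hashlib.sha256(content).hexdigest()


def audit_replicas(replicas: dict[str, bytes]) -> tuple[str, list[str]]:
    """Return the majority digest and the sites whose copy disagrees with it.

    `replicas` maps a (hypothetical) site label to the bytes that site holds.
    """
    if not replicas:
        raise ValueError("at least one replica is required")
    digests = {site: digest(content) for site, content in replicas.items()}
    majority_digest, count = Counter(digests.values()).most_common(1)[0]
    # Assumption for this sketch: more than half of the copies must agree.
    if count <= len(replicas) / 2:
        raise ValueError("no majority among replicas")
    damaged = sorted(s for s, d in digests.items() if d != majority_digest)
    return majority_digest, damaged


# Hypothetical example: three sites, one of which holds a corrupted copy.
copies = {
    "site_a": b"article text",
    "site_b": b"article text",
    "site_c": b"artic1e text",
}
_, needs_repair = audit_replicas(copies)
print(needs_repair)
```

The point of the sketch is that the more independent copies exist, the easier it is to tell a good copy from a damaged one and to repair the latter from the former.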
5. Permanent Open Access
Content that is open access once should stay open access. The open access commitment must be resilient over the very long term: no eventual changes in policy, finances, or management should change the terms of access.
The CLOCKSS bylaws guarantee open access continuity in several ways. First, the CLOCKSS founding organizations unanimously agreed that any content triggered from the archive is assigned a Creative Commons license, which guarantees that it remains permanently open access. This is part of the CLOCKSS bylaws as well as the agreement that each publisher signs when it contributes content to the archive.
Second, representatives of the participating publishers and libraries decide by supermajority vote whether and when to release content from the archive. Content is triggered when it is no longer available from any publisher and the CLOCKSS board votes to “light up” the affected titles and restore access to them. The trigger event must be approved by at least 75% of the board; this means that no one publisher can block content from being released from the archive.
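The arithmetic behind the 75% rule can be checked with a short sketch. The threshold comes from the text above; the function names and the board sizes used in the example are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Threshold taken from the text: at least 75% of the board must approve.
APPROVAL_THRESHOLD = Fraction(3, 4)


def trigger_approved(votes_for: int, board_size: int) -> bool:
    """Return True if the share of approving votes meets the threshold."""
    if board_size <= 0 or not 0 <= votes_for <= board_size:
        raise ValueError("votes_for must be between 0 and board_size")
    return Fraction(votes_for, board_size) >= APPROVAL_THRESHOLD


def single_member_can_block(board_size: int) -> bool:
    """Return True if one dissenting vote alone defeats the trigger."""
    return not trigger_approved(board_size - 1, board_size)


# Hypothetical board of 12 members: one dissenter cannot stop a release.
print(trigger_approved(11, 12))
print(single_member_can_block(12))
```

Under this reading of the rule, a lone dissenter can block a release only on a board of fewer than four members; on any larger board, a single publisher voting against cannot prevent triggered content from being released.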
When content is triggered, all libraries and scholars around the world have access in perpetuity without payment. Within the past year, CLOCKSS experienced three trigger events and responded by releasing the triggered content, making it free not only to CLOCKSS participants, or to current or former subscribers to that licensed content, but free to everyone with access to the Internet.
For commercial publishers, CLOCKSS is a way to leave a long-term legacy to all the world’s scholars, and make their content as useful as possible once it no longer has commercial value. For open access publishers, CLOCKSS is the only comprehensive archive that keeps open access content open access. It is the only way to make sure that, no matter what happens, scholarly content will always remain in the hands of scholars.
Victoria Reich is the director of the LOCKSS Program (for Lots of Copies Keep Stuff Safe) at Stanford University. She is also a founding Director of the CLOCKSS Archive, a community-governed, geographically distributed dark archive. Prior to LOCKSS and CLOCKSS, Victoria helped launch Stanford University's HighWire Press. She has extensive library experience in both the public and private sectors, having held positions at Stanford University Libraries, the National Agricultural Library, the Library of Congress, and the University of Michigan. Victoria is a member of the Stanford Copyright & Fair Use Web Site Advisory Board. In 2008 she received the Ulrich's Serials Librarianship Award. For more information and a list of publications and presentations, see http://www.lockss.org/lockss/Vicky_Reich.
Irene Perciali has worked in scholarly communications for the past four years, and is currently the Director of Development at CLOCKSS. Previously, she directed the journal publishing program at Berkeley Electronic Press, and researched scholarly communications issues at UC Berkeley’s Center for Studies in Higher Education.