Guidelines for Preserving New Forms of Scholarship

Guidelines

If a publication platform enables user-contributed content that the platform itself manages, such as annotations or comments, the platform’s Terms of Use should clearly define the rights related to that content, especially if the content may need to be preserved or migrated as part of the context of the publication. If a publication is likely to be archived with this context intact, the implementation of these features and their associated terms should take into account the ethics of how a user’s information is displayed on the platform, and how users are informed about and consent to the use of their content.

See also:
55. Ethical concerns of user-contributed content

If a publication platform integrates third-party applications for features such as annotations or comments, the publisher should ensure that the terms of service for each application grant appropriate permission to preserve and migrate that content over time. For example, Hypothesis’ Terms of Service specify that annotation data is released under CC0.

See also:
14. Avoid being dependent on third party services for core features
15. Plan a strategy for preservation when third party dependencies exist

The Library of Congress regularly updates its Recommended Formats Statement, a helpful quick reference for selecting a stable format when there is an opportunity to choose. If converting data from a proprietary format to an open file format results in some data loss, consider saving both versions. For less established or proprietary formats, consider recording the file type, the format version, and the software used to generate and play the file; this information can be included in the metadata or documentation.

These guidelines may also be considered during file format selection:
13. Acquire the highest quality version of media to use for preservation
34. For EPUBs, opt for core media types, as defined by the EPUB specification
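
The advice above to record the type, version, and software for less established formats could be captured in a small structured record. The following is a minimal sketch, not a prescribed schema: the class `FormatRecord`, its field names, and the sample values are all hypothetical and would need to be mapped to whatever metadata standard the publisher or preservation service already uses.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FormatRecord:
    """Hypothetical technical-metadata record for one file (illustrative field names)."""
    path: str             # location of the file within the export package
    format_name: str      # human-readable name of the file format
    format_version: str   # version of the format, if known
    created_with: str     # software (and version) used to generate the file
    plays_with: str       # software known to open or play the file


def to_json(record: FormatRecord) -> str:
    """Serialize the record so it can travel with the metadata or documentation."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Placeholder values only; they do not describe a real format or product.
record = FormatRecord(
    path="media/scene-01.example",
    format_name="Example 3D scene format",
    format_version="2.1",
    created_with="ExampleModeler 5.0",
    plays_with="ExampleViewer 5.x",
)

print(to_json(record))
```

Keeping the record next to the file, or embedding it in the package-level metadata, gives a future reader enough information to look for suitable rendering software.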

Sometimes it is necessary or preferable to reference or embed third-party media content that is outside the control of the publisher but integral to the understanding of the work. For these features, anticipate that their availability may be temporary, and make plans to ensure that they are not only preserved but also sustained in some form within the publication while it is on the publisher platform. In the case of an embedded YouTube video, for example, some options to support preservation might include: retaining or requesting a copy of the video file; getting permission to take a copy of the content using the youtube-dl tool in order to bring it into the local publication; or archiving and linking to a copy on the Internet Archive. An informative caption can help future readers if the content becomes unavailable.

These guidelines may also improve the preservability of media hosted by third parties:
12. Start discussions about multimedia early in the project
14. Avoid externally hosted media
16. Captions for non-text features add meaningful context
20. Ensure all core intellectual components of a work are reflected in the export package

When a publisher acquires rights for resources that are part of the publication, these should include rights pertaining to the preservation of those resources. Express these rights in the metadata in a way that allows a preservation institution to determine what it has permission to preserve and to relate each rights statement to the relevant material.

These guidelines may also support the creation of license metadata:
8. Clarify the license related to preserving third party web resources
24. Create descriptive metadata for each publication resource
40. Embed license information in the HTML
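
One way to make the rights described above machine-readable is to attach a rights statement to each resource identifier. The sketch below is illustrative only: `ResourceRights`, its fields, and the sample entries are hypothetical, and the `preservation_permitted` flag stands in for whatever rights vocabulary the publisher and preservation institution agree on.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResourceRights:
    """Hypothetical per-resource rights statement (illustrative field names)."""
    resource_id: str              # ties the rights statement to one resource
    license: str                  # license name or URL agreed with the rights holder
    preservation_permitted: bool  # whether the acquired rights cover preservation


def preservable(resources: List[ResourceRights]) -> List[str]:
    """Return the identifiers of resources that may be preserved."""
    return [r.resource_id for r in resources if r.preservation_permitted]


# Sample entries are placeholders, not real publication resources.
resources = [
    ResourceRights("fig-01.tif", "https://creativecommons.org/licenses/by/4.0/", True),
    ResourceRights("video-02.mp4", "All rights reserved; display only", False),
]

print(preservable(resources))
```

Because each statement carries the resource identifier, a preservation institution can filter the package without reading the publisher's contracts.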

Some publishers use copyrighted fonts and obfuscate them to protect the associated rights when the fonts are embedded in an EPUB. Because obfuscated fonts create both a technical and a copyright challenge for preservation, open fonts should be used instead.

A preservation service may not collect web content outside of the agreed-upon domain names unless copyright for the content being harvested is clear. If third-party pages and features that are visually embedded in an EPUB or a web-based publication are meant to be preserved, it should be possible to identify which content the publisher has the right to collect, so that a web crawler can be configured to include or exclude it. One way to differentiate is to consistently express the rights in the metadata that is supplied to the preservation service. Another option is to apply structured metadata describing the rights status to the HTML. The Creative Commons Rights Expression Language (ccREL) documentation includes examples that cover both page-level and object-level licenses; this approach could support automated harvesting decisions at either level. Alternatively, a publisher could supply a list of domain names to include for harvest during the initial preservation workflow configuration.

These guidelines may also be useful to consider when embedding external web content:
25. Add license information to resource-level metadata
38. List the URLs for external web content in the metadata
45. Embed metadata that includes a license in the <head> of a web page
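
The two harvesting signals discussed above, a publisher-supplied domain list and license metadata embedded in the HTML, could be combined in a crawler's scoping logic. The following sketch uses only the Python standard library and assumes licenses are declared with `rel="license"` on `<a>` or `<link>` elements. The function names, the policy of accepting a page when any declared license is in an accepted set, and the sample domains are assumptions for illustration, not the behavior of any particular preservation service.

```python
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urlparse


class LicenseLinkParser(HTMLParser):
    """Collect href values from <a> and <link> elements marked rel="license"."""

    def __init__(self) -> None:
        super().__init__()
        self.licenses: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "license" in rel_tokens and attr.get("href"):
            self.licenses.append(attr["href"])


def find_licenses(html: str) -> List[str]:
    """Return every license URL declared in the page, in document order."""
    parser = LicenseLinkParser()
    parser.feed(html)
    parser.close()
    return parser.licenses


def in_allowed_domains(url: str, allowed: Set[str]) -> bool:
    """True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)


def may_harvest(url: str, html: str, allowed: Set[str], accepted: Set[str]) -> bool:
    """Hypothetical policy: harvest agreed domains, or pages with an accepted license."""
    if in_allowed_domains(url, allowed):
        return True
    return any(lic in accepted for lic in find_licenses(html))


# Placeholder configuration and page content for illustration.
ALLOWED = {"publisher.example"}
ACCEPTED = {"https://creativecommons.org/licenses/by/4.0/"}
PAGE = (
    '<html><head><link rel="license" '
    'href="https://creativecommons.org/licenses/by/4.0/"></head>'
    "<body><p>Embedded third-party page</p></body></html>"
)

print(may_harvest("https://thirdparty.example/item/1", PAGE, ALLOWED, ACCEPTED))
```

A production workflow would likely need finer distinctions, such as object-level license statements or licenses that permit display but not copying, so the accepted set here is only a starting point.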