
40. If external web content is embedded in a publication, identify the rights and note intention for collecting this content as part of the publication.

Some preservation services will not collect web content outside of the agreed-upon domain names unless the copyright status of the content being harvested is clear. If third-party pages and features that are visually embedded in an EPUB or a web-based publication are meant to be preserved, it should be possible to identify which of that content the publisher has the right to collect, so that a web crawler can be configured to include or exclude it. One way to communicate these rights is to express them in the metadata that is supplied to the preservation service. Another option is to add structured metadata describing the rights status to the HTML itself. The Creative Commons REL documentation includes examples of this that cover both page-level and object-level licenses, so this approach could support automated harvesting decisions at either level. Alternatively, a publisher could supply a list of domain names to include in the harvest when the preservation workflow is first configured.
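As a rough illustration of how the last two options might inform an automated harvesting decision, the Python sketch below checks an embedded page against a publisher-supplied domain list and, failing that, looks for a page-level `rel="license"` link of the kind shown in the Creative Commons REL examples. The function names, the sample page, the domain names, and the decision rule itself are assumptions made for illustration. They do not describe any particular preservation service's crawler, and object-level licenses (which CC REL expresses with additional RDFa attributes) are not handled here.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LicenseLinkParser(HTMLParser):
    """Collect href values of <a> and <link> elements marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, e.g. "license noopener".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "license" in rel_tokens and attr.get("href"):
            self.licenses.append(attr["href"])


def find_license_urls(html):
    """Return the page-level license URLs declared in an HTML document."""
    parser = LicenseLinkParser()
    parser.feed(html)
    parser.close()
    return parser.licenses


def may_harvest(url, html, allowed_domains, accepted_license_prefixes):
    """Hypothetical decision rule: harvest if the host is on the
    publisher-supplied list, or if the page declares an accepted license."""
    host = (urlparse(url).hostname or "").lower()
    if host in allowed_domains:
        return True
    return any(
        license_url.startswith(prefix)
        for license_url in find_license_urls(html)
        for prefix in accepted_license_prefixes
    )


# Illustrative input only: a page that declares a CC BY 4.0 license.
SAMPLE_PAGE = (
    '<html><head>'
    '<link rel="license" href="https://creativecommons.org/licenses/by/4.0/">'
    '</head><body>Embedded feature</body></html>'
)
```

A crawler configuration step could call `may_harvest` once per embedded resource, passing the domain list agreed with the publisher and the license prefixes the preservation service is prepared to accept. A declared license is only a signal for that decision; whether it is sufficient remains a policy question for the publisher and the service.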

These guidelines may also be useful to consider when embedding external web content: