Abstract

This specification defines a collection of information that describes the structure of Web Publications, so that user agents or developers may create user experiences well-suited to reading publications, such as sequential navigation and offline reading. This information includes the default reading order, a list of resources, and publication-wide metadata.

Status of This Document

This is a preview

Do not attempt to implement this version of the specification. Do not reference this version as authoritative in any way. Instead, see https://w3c.github.io/wpub/ for the Editor's draft.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This first public working draft provides a preliminary outline of a Web Publication. Many details are under active consideration within the Publishing Working Group and are subject to change. The most prominent known issues have been identified in this document and links provided to comment on them.

In particular, the Working Group seeks feedback on the following issues:

This document was published by the Publishing Working Group as an Editor's Draft. Comments regarding this document are welcome. Please send them to public-publ-wg@w3.org.

Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 March 2017 W3C Process Document.

1. Introduction§

1.1 Why Web Publications§

This section is non-normative.

The Web is a lonely place. It is unbounded: resources live out their lives on remote servers scattered across the globe, only reachable by addresses sometimes known to only a few people. But life on the Web is not all doom and gloom. Through the power of Web pages, these resources can be brought together to create amazing experiences.

Web sites add another layer of relationship — this time between pages — but the relationship is a tenuous one that typically depends on hyperlinks to add cohesion. Without a user that understands how to follow the connections, a Web site is still no more than a loose coupling of information.

The preceding is not a critique of the Web, but meant to highlight that the modern Web is very much an active, event-driven experience. Users follow the necessary paths to obtain the information they need.

The traditional publishing model, sometimes called the print model, differs from the Web model in that the publisher packages all the information together and thereby establishes the common pathway through it. The user can passively follow the content page-by-page, or actively find other pathways via a table of contents or index. It is a model that has worked to bind information in a cohesive way for millennia, and continues to be an important model alongside the Web. The publication as a bounded edition, made public, is used to carry intellectual and artistic works of innumerable forms: novels, plays, poetry, journals, magazines, newspapers, articles, laws, treatises, pamphlets, atlases, comics, manga, notebooks, memos, manuals, and albums of all sorts.

Attempts to reproduce this model on the Web, however, have had to work around its loose coupling of information: sometimes publications are compressed into a single page; sometimes they are broken across multiple pages and hyperlinked together. Both models have flaws: single-page publications are often so large they render slowly, especially on low-power devices; multi-page publications cannot be easily taken offline because their common thread cannot be established.

As a result, users have had trouble accessing, compiling and downloading Web content for curation and personal use. That, in turn, has fed the continuing development of non-Web digital formats to redress these problems, and made it necessary to create both Web-ready content and alternative renditions for offline use.

This specification aims to reduce these barriers and reinvigorate publishing by combining the best aspects of both models — the persistent availability and portability of bounded publications with the pervasive accessibility, addressability, and interconnectedness of the Open Web Platform. To do so, it adds an unobtrusive definition of interrelation to the Web model: the Web Publication.

1.2 What is a Web Publication§

This section is non-normative.

A Web Publication is a discoverable and identifiable collection of information about a publication and its resources. This information is expressed in a machine-readable document called a manifest, which is what enables user agents to understand the bounds of the Web Publication and connection between its resources.

The manifest includes metadata that describes the Web Publication, as it has an identity and nature beyond its constituent resources. The manifest also provides a list of all the resources that belong to the Web Publication and the default reading order, which is how it connects resources into a single contiguous work.

A Web Publication is discoverable in one of two ways: resources either include a link to the manifest (via an HTTP Link header or an [HTML] link element), or the manifest can be loaded directly by a compatible user agent.

With the establishment of Web Publications, user agents can build new experiences tailored specifically for their unique reading needs.

1.3 Scope§

This section is non-normative.

This specification only defines requirements for the production and rendering of valid Web Publications. As much as possible, it leverages existing Open Web Platform technologies to achieve its goal—that being to allow for a measure of boundedness on the Web without changing the way that the Web itself operates.

Moreover, the specification is designed to adapt automatically to updates to Open Web Platform technologies in order to ensure that Web Publications continue to interoperate seamlessly as the Web evolves (e.g., by referencing the latest published versions instead of specific versions).

Further, this specification does not attempt to constrain the nature of a Web Publication: any type of work that can be represented on the Web constitutes a potential Web Publication.

1.4 Terminology§

Wherever appropriate, this document relies on terminology defined by the note on "Publishing and Linking on the Web" [publishing-linking], including, in particular, user, user agent, browser, and address.

Identifier

An identifier is metadata that can be used to refer to Web Content in a persistent and unambiguous manner. URLs, URNs, DOIs, ISBNs, or PURLs are all examples of persistent identifiers frequently used in publishing.

Manifest

A manifest represents structured information about a Web Publication, such as informative metadata, a list of all resources, and a default reading order.

Non-empty

For the purposes of this specification, non-empty is used to refer to an element, attribute or property whose text content or value consists of one or more characters after whitespace normalization, where whitespace normalization rules are defined per the host format.
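
The following non-normative sketch illustrates the non-empty test. Because whitespace normalization is defined per host format, the normalization used here (collapse runs of whitespace, trim both ends) is only an assumption, and the function name is hypothetical.

```python
def is_non_empty(value: str) -> bool:
    """Return True if value keeps at least one character after normalization."""
    # Assumed normalization: collapse whitespace runs and trim both ends.
    # A real host format may define different rules.
    normalized = " ".join(value.split())
    return len(normalized) >= 1
```

Under this assumption, a title made up only of spaces, tabs, or line breaks is treated as empty.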

URL

In this specification, the general term URL is used as in other W3C specifications like HTML [html], and is defined by the URL Standard of the WHATWG [url]. In particular, such a URL allows for the usage of characters from Unicode following [RFC3987]. See the note in the HTML5 document for further details.

Web Publication

A Web Publication is a collection of one or more resources, organized together through a manifest into a single logical work with a default reading order. The Web Publication is uniquely identifiable and presentable using Open Web Platform technologies.

2. Conformance§

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MAY, MUST, MUST NOT, RECOMMENDED, SHOULD, and SHOULD NOT are to be interpreted as described in [RFC2119].

2.1 Conformance Classes§

This specification defines two conformance classes: one for Web Publications and one for user agents that process them.

A Web Publication is conformant to this specification if it meets the following criteria:

A user agent is conformant to this specification if it meets the following criteria:

3. Information Set§

Editor's note

The name "infoset" may change depending on feedback. Although this term has a different meaning for individuals familiar with XML, alternatives such as "properties" and "metadata" do not fully capture the nature or purpose. See issue #63 for discussion.

Editor's note

As the serialization of the manifest remains an open issue, specifics about how properties are compiled into the infoset remain unspecified. This includes, but is not limited to, what specific names the properties will have in the infoset, whether the names in the manifest will be the same as those in the infoset and/or whether mappings to properties from known vocabularies will be used.

3.1 Explanation§

A Web Publication is defined by a set of properties known as its information set (infoset). The infoset is both abstract and concrete. It is abstract in the sense that it represents a set of information that a user agent has to be able to compile about the Web Publication, but it also becomes concrete when the user agent creates an internal representation of the information.

The infoset does not require a specific serialization. It is primarily compiled from a Web Publication's manifest, whose serialization requirements are defined in 4.4 Serialization. It is therefore possible to express the same infoset via different manifests, although a Web Publication will only have one manifest.

Although the manifest is the primary source of the infoset, some information may be obtained independent of it. For example, fallback rules for properties defined in the following subsections allow a user agent to compile information that the author has not provided in the manifest, whether as an intentional optimization or by accidental omission.

3.2 Requirements§

The Web Publication infoset MUST include the following information:

In addition, the infoset SHOULD include the following information:

Editor's note

These requirements reflect the current minimum consensus, though a number of issues remain open that could change whether an item is required or recommended. See the following sections for more information.

Issue 21: manifest: metadata

Whether the minimum manifest must include any metadata, or a specific slot to handle metadata. (Note: this is now more specifically related to the infoset.)

3.3 Title§

The title provides the human-readable name of the Web Publication.

When specified in the manifest, the title MUST be non-empty.

If a user agent requires a title and one is not available in the infoset, it MAY create one. This specification does not mandate how such a title is created. The user agent might:

Note

A user agent is not expected to produce a meaningful title [WCAG20] for a Web Publication when one is not specified.

3.4 Creators§

Creators are the individuals or entities responsible for the creation of the Web Publication.

The role the creator played in the creation of the Web Publication SHOULD also be specified (e.g., 'author', 'illustrator', 'translator').

3.5 Language§

The language specified in the Web Publication's infoset identifies the natural language(s) of its content.

This language is not used in the processing or rendering of the Web Publication (including the manifest), and is not a replacement for identifying the language of each resource as defined by its format. It instead allows a user agent to provide supplementary enhancements, such as downloading a custom dictionary or preloading a language-specific text-to-speech module.

When specified, the language MUST be a tag that conforms to [BCP47].

If a user agent requires the language and one is not available in the infoset, it MAY attempt to determine the language. This specification does not mandate how such a language tag is created. The user agent might:

If a language tag cannot be determined, the value "und" (undetermined) MUST be used.
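
The fallback to "und" might be sketched as follows. This example is non-normative: the function name, the idea of passing in candidate tags gathered by the user agent, and the choice to skip a malformed declared tag are all assumptions. The regular expression is only a loose well-formedness check, not a full [BCP47] validator.

```python
import re
from typing import Iterable, Optional

# Loose shape check only (assumption); it does not validate subtags
# against the registry as a real [BCP47] implementation would.
_LANGTAG = re.compile(r"^[A-Za-z]{2,8}(-[A-Za-z0-9]{1,8})*$")


def resolve_language(declared: Optional[str], candidates: Iterable[str] = ()) -> str:
    """Return the first plausible language tag, or "und" if none is found."""
    for tag in [declared, *candidates]:
        if tag and _LANGTAG.match(tag):
            return tag
    # No language tag could be determined.
    return "und"
```

Here `candidates` stands in for whatever the user agent manages to determine on its own, for example the language of the first resource in the default reading order.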

Issue 53: Language of web publication v. language of manifest/resources

The question is whether the language declared for the manifest content is the same as the language of the publication, and how to deal with multilingual publications.

3.6 Canonical Identifier§

A Web Publication's canonical identifier is a unique identifier that resolves to the preferred version of the Web Publication. The canonical identifier SHOULD be an address, but, if not, it MUST be possible to make a one-to-one mapping to an address (e.g., a DOI can be resolved to a URL via a DOI resolver).
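
As a non-normative illustration of the one-to-one mapping, the sketch below returns a URL identifier unchanged and maps a DOI to a URL through the public resolver at https://doi.org/. The function name and the "doi:" prefix convention for recognizing a DOI are assumptions made for the example.

```python
from urllib.parse import quote, urlsplit

DOI_RESOLVER = "https://doi.org/"


def canonical_to_address(identifier: str) -> str:
    """Map a canonical identifier to an address (sketch, two schemes only)."""
    parts = urlsplit(identifier)
    if parts.scheme in ("http", "https") and parts.netloc:
        # Already an address: use it as-is.
        return identifier
    if identifier.lower().startswith("doi:"):
        # Assumed convention: DOIs are written with a "doi:" prefix.
        return DOI_RESOLVER + quote(identifier[4:], safe="/")
    raise ValueError("no known mapping to an address: " + identifier)
```

Other identifier schemes would need their own resolution rules; the sketch raises an error rather than guessing.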

If a Web Publication is hosted at more than one address, this identifier allows a user agent to identify the shared relationship between the versions and to determine which of the available addresses is primary.

The canonical identifier is also intended to provide a measure of permanence above and beyond the Web Publication's address. Even if a Web Publication is permanently relocated to a new address, for example, the canonical identifier will provide a way of locating the new location (e.g., a DOI registry could be updated with the new URL, or a redirect could be added to the URL of the canonical identifier).

When assigned, the canonical identifier needs to be unique to one and only one Web Publication, independent of its address(es). Ensuring uniqueness is outside the scope of this specification, however. The actual uniqueness achievable depends on such factors as the conventions of the identifier scheme used and the degree of control over assignment of identifiers.

Note

If the canonical identifier is a URL, it can be used as the target of a "canonical" link [RFC6596] (e.g., an [html] link element whose rel attribute has the value canonical or a Link HTTP header field [RFC5988] similarly identified).

Issue 58: is a canonical identifier necessary

The question is whether a canonical identifier is necessary to call out explicitly in the infoset, or whether it is/can be handled by other metadata.

3.7 Address§

A Web Publication's address is a URL that refers to a Web Publication and enables the retrieval of a representation of the manifest of the Web Publication.

The availability of this address does not preclude the creation and use of other identifiers and/or addresses to retrieve a representation of a Web Publication in whole or part.

Note
The Web Publication's address can also be used as the value for an identifier link relation [link-relation].

3.8 Resources§

The infoset MUST include a list of the Web Publication's resources, although the list is not required to be exhaustive. Resources in the default reading order MUST be included in this list.
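
A non-normative sketch of checking this requirement is shown below. It assumes that both the resource list and the default reading order are available as lists of absolute URLs; the function name is hypothetical.

```python
from typing import List, Sequence


def missing_from_resources(resources: Sequence[str], reading_order: Sequence[str]) -> List[str]:
    """Return reading-order URLs that are absent from the resource list."""
    listed = set(resources)
    return [url for url in reading_order if url not in listed]
```

An empty result means every resource in the default reading order is also listed. The resource list may still omit other resources, since it is not required to be exhaustive.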

Issue 22: manifest: requirements for offline

The discussion led to the question of whether the manifest/infoset MUST list all resources or not. In this sense, this issue became a duplicate of issue #23, which ended up at the same question.

Issue 23: MUST the manifest include information about secondary resources or not?

The question is whether the manifest/infoset MUST list all resources or not.

Issue 59: avoiding resource declaration duplication

The question is whether the manifest MUST list resources in the default reading order or whether this can be inferred.

3.9 Default Reading Order§

The default reading order is a specific progression through a set of Web Publication resources.

A user might follow alternative pathways through the content, but in the absence of such interaction the default reading order defines the expected progression from one resource to the next.

The default reading order MUST include at least one resource.

The default reading order is either specified directly in the manifest or a link is provided to an [html] nav element whose list of links is processed to create one.

The process for extracting a default reading order from a nav element is as follows:

  1. extract a list of resource paths referenced from the href attribute of all a elements;
  2. strip any fragment identifiers from the references;
  3. resolve all relative paths to full URLs;
  4. remove all consecutive references to the same resource, leaving only the first.
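
The four steps above can be sketched as follows. This example is non-normative and uses only the Python standard library. It assumes that the nav element of interest is the first one in the document and that the URL of that document is known; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urldefrag, urljoin


class _NavLinkCollector(HTMLParser):
    """Collects href values of a elements inside the first nav element."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []
        self._depth = 0      # nesting depth inside the nav element
        self._done = False   # True once the first nav element has closed

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "nav":
            self._depth += 1
        elif tag == "a" and self._depth > 0:
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "nav" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self._done = True


def reading_order_from_nav(html: str, base_url: str) -> List[str]:
    collector = _NavLinkCollector()
    collector.feed(html)
    order: List[str] = []
    for href in collector.hrefs:                        # step 1: href of each a element
        without_fragment, _ = urldefrag(href)           # step 2: strip the fragment
        absolute = urljoin(base_url, without_fragment)  # step 3: resolve to a full URL
        if not order or order[-1] != absolute:          # step 4: drop consecutive repeats
            order.append(absolute)
    return order
```

Only consecutive references are collapsed, so a resource that is linked again after a different resource appears twice in the result.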

If a user agent requires a default reading order and one is not provided in the infoset, it MAY attempt to construct one. This specification does not mandate how such a default reading order is created. The user agent might:

Issue 35: Proposal: an HTML-first Table of Contents approach to Web Publication

Define the default reading order of a Web Publication to be the files referenced in the first

Issue 39: Do all documents in the reading order have to be reachable from the ToC

There is a consensus that a Web Publication must have a reading order and must/should have a table of contents (the main navigation entry point).

3.10 Table of Contents§

The table of contents provides access to major sections of the Web Publication. There are no requirements on the completeness of the table of contents, except that, when specified, it MUST link to at least one resource in the default reading order.

The table of contents is either specified directly in the manifest or a link is provided to an [html] nav element containing one.

If a user agent requires a table of contents and one is not specified, it MAY construct one. This specification does not mandate how such a table of contents is created. The user agent might:

  1. attempt to locate a table of contents in the default reading order (e.g., an HTML document with a nav element that has the role attribute value doc-toc);
  2. use the titles of resources in the default reading order;
  3. calculate a table of contents using its own algorithms.
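
As a non-normative illustration of the second option, the sketch below builds table of contents entries from resource titles. It assumes the user agent has already obtained a title for each resource in the default reading order; the function name and the "Section N" placeholder label for untitled resources are hypothetical.

```python
from typing import List, Sequence, Tuple


def toc_from_titles(reading_order: Sequence[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn (url, title) pairs into (label, url) table of contents entries."""
    toc = []
    for index, (url, title) in enumerate(reading_order, start=1):
        # Collapse whitespace; fall back to a placeholder label if no title remains.
        label = " ".join(title.split()) or f"Section {index}"
        toc.append((label, url))
    return toc
```
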
Issue

This question arises only if this mechanism is accepted: the question is whether a table of contents navigation element can refer, via links, to any resource that is not listed in the default reading order.

Editor's note

The issue of using the HTML nav element as a possible encoding of the table of contents is mentioned or explicitly addressed in a number of issues listed below.

Issue 35: Proposal: an HTML-first Table of Contents approach to Web Publication

Define the resources in the default reading order of a Web Publication to be the files referenced in the first

Issue 39: Do all documents in the reading order have to be reachable from the ToC

There is a consensus that a Web Publication must have a reading order and must/should have a table of contents (the main navigation entry point).

3.11 Publication Date§

The publication date is the date on which the Web Publication was originally published. It represents a static event in the lifecycle of a Web Publication and allows subsequent revisions to be identified and compared.

The exact moment of publication is intentionally left open to interpretation: it could be when the Web Publication is first made available online, or could be a point in time before publication when the Web Publication is considered final.

3.12 Last Modification Date§

The last modification date is the date when the Web Publication was last updated.

The last modification date SHOULD be updated whenever changes are made to the resources of the Web Publication, including the manifest. It does not necessarily reflect all changes to the Web Publication, however; for example, it might not reflect changes to third-party content.

3.13 Accessibility Metadata§

Accessibility metadata allows discovery of features and affordances of the Web Publication that enable its usability by users with specific reading requirements and needs.

3.14 Extensibility§

The infoset is designed to provide a basic set of properties for use by user agents in presenting and rendering a Web Publication. It MAY be extended in the following ways:

  1. through the inclusion of additional properties in the manifest;
  2. by the provision of linked metadata records.

User Agents MAY support additional properties but MUST NOT include unrecognized properties in the infoset. The use of linked records is RECOMMENDED whenever possible, as the use of native formats standardizes and simplifies processing by user agents.
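
The rule on unrecognized properties might be sketched as follows. This example is non-normative. Since the manifest serialization is not yet defined, it assumes the manifest has already been parsed into a dictionary of property names and values; the function name and any property names passed to it are hypothetical.

```python
from typing import Any, Dict, Iterable


def compile_infoset(manifest: Dict[str, Any], recognized: Iterable[str]) -> Dict[str, Any]:
    """Keep only the manifest properties that this user agent recognizes."""
    known = set(recognized)
    # Unrecognized properties are left out of the infoset.
    return {name: value for name, value in manifest.items() if name in known}
```

A user agent that supports additional properties would simply add their names to `recognized`.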

4. Manifest§

4.1 Overview§

This section is non-normative.

A manifest is a specific serialization of a Web Publication's infoset.

4.2 Requirements§

The requirements for a conforming Web Publication manifest are as follows:

  1. It MUST declare that it describes a Web Publication.
  2. It MUST be serialized as defined in 4.4 Serialization.

4.3 Declaration§

Editor's note
Placeholder for how a manifest declares it describes a web publication.

4.4 Serialization§

Issue 25: manifest embedded, linked, both?

Should the manifest be in an external file, embedded in a specified manner, or should either option be allowed?

Should the table of contents be a separate HTML file or is the listing of resources in the default reading order an implicit table of contents?

Issue 32: Relationships to the Web App Manifest specification.

In case the (concrete) manifest is expressed in JSON (see issue #7), should it be defined “on top” (i.e., as some form of an extension) of the Web Application Manifest specification, or should it be a fully separate specification?

4.5 Linking to a Manifest§

Providing a link from a resource to its manifest allows a user agent to discover that a Web Publication is available. The inclusion of links is not always possible, however: for example, the resource might be hosted by a third party, or access to modify HTTP headers might be restricted.

The inclusion of links from resources to their manifests is therefore RECOMMENDED.

Links MUST take one or both of the following forms:

A user agent is only required to process the first publication link encountered, so a resource SHOULD NOT link to more than one Web Publication.
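
A non-normative sketch of processing only the first publication link in an HTML resource is shown below. The relation name "publication" is purely a placeholder, as this draft does not settle how such links are identified; the class and function names are likewise hypothetical. Links conveyed in an HTTP Link header are not covered by this sketch.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin

PUBLICATION_REL = "publication"  # placeholder relation name (assumption)


class _FirstPublicationLink(HTMLParser):
    """Remembers the href of the first link element with the assumed rel value."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.href is not None or tag != "link":
            return  # later publication links are ignored
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if PUBLICATION_REL in rels and href:
            self.href = href


def find_manifest_url(html: str, page_url: str) -> Optional[str]:
    finder = _FirstPublicationLink()
    finder.feed(html)
    if finder.href is None:
        return None
    # Resolve a relative href against the URL of the linking resource.
    return urljoin(page_url, finder.href)
```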

Issue 13: Associating a manifest with publication resources

If we have a collection of information about a web publication as a whole ("manifest") that exists separately from most of the publication's resources, we need to find a way to associate the manifest with the other publication resources.

Issue 32: Relationships to the Web App Manifest specification.

How linking to a manifest is done may change depending on whether integration with [AppManifest] is feasible and beneficial. For a list of differences in linking approaches, refer to the wiki analysis.

Issue 76: Allow links to multiple publications?

When it comes to linking to a manifest and initiating the reading experience, the question is whether to allow resources to link to multiple manifests and solve the problems of selecting from among the available publications, or only use the first link found.

4.6 Linking from a Manifest§

The manifest serialization MUST provide a general linking mechanism for defining a relationship between the Web Publication and other resources on the Web as well as the type of those relationships.

This mechanism is used to express many parts of the Web Publication's infoset, including but not limited to:

Editor's note

There are some overlaps between this list and, e.g., the separate section on canonical identifiers.

This linking mechanism may also be used to express other common link structures on the open Web. For example:

Note

Some of these link structures, such as dynamic search links, may require support for URI templates [RFC6570] to be meaningfully useful in the context of a Web Publication. User agents should(?) support URI templates in order to make it easier for publishers to integrate dynamic server-side features into their publications with minimal coding and effort.
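
To make the idea concrete, the non-normative sketch below implements only the simplest form of [RFC6570] expansion (Level 1, simple string expansion), in which each `{name}` expression is replaced by the percent-encoded value of the variable. The function name and the example template are hypothetical, and operators such as `{?q}` are deliberately left unsupported.

```python
import re
from typing import Mapping
from urllib.parse import quote

_VARIABLE = re.compile(r"\{([A-Za-z0-9_]+)\}")


def expand_simple(template: str, variables: Mapping[str, str]) -> str:
    """Expand {name} expressions; undefined variables expand to nothing."""
    def substitute(match):
        value = variables.get(match.group(1), "")
        # Percent-encode everything outside the unreserved character set.
        return quote(value, safe="")
    return _VARIABLE.sub(substitute, template)
```

For instance, a search link expressed as `https://example.org/search?q={terms}` could be expanded by the user agent with the user's search terms.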

Issue 67: Should linking from manifests support URI templates?

URI templates [RFC6570] may be useful features to make external references easier to maintain.

5. Web Publication Lifecycle§

Editor's note

The publishing working group is currently evaluating the best approach for implementing web publications in user agents. This note is intended to provide an overview of where current thinking is and what issues are under consideration.

The development of web publications is not viewed as a separate forking of the web, but an enhancement layer that can be supported by user agents. To that end, the primary constraints on any solution for web publications are that:

  • the rendering of web publications must not interfere with the underlying web model and APIs. All functionality and enhancements must be layered on top.
  • a web publication should not have to carry its own implementation code. Functionality is ideally provided by the user agent and/or polyfill.

While this specification will provide implementation flexibility for user agents, there are still a number of areas that have been identified as potentially needing to be detailed. These include:

  • initialization expectations for a web publication:

    • user agent initiation v. user prompts;
    • linked v. directly loaded manifests;
    • resources that belong to more than one web publication.
  • the creation of a "publication state":

    • persistence of publication information across page loads;
    • location and persistence of UI;
    • indications of supported features;
    • DOM issues such as persistence of numbering schemes.
  • tracking the extent of a publication:

    • taking an entire publication offline;
    • enabling search across documents.
  • establishing the bounds of a publication:

    • when to end the publication state;
    • document history traversal;
    • how to handle links outside the publication.
  • updating of the manifest.

The working group intends to flesh out the lifecycle in later revisions once it is clearer what models are viable and what solutions can be standardized. Input on the feasibility and challenges of these approaches is welcome at any time.

6. Reading Enhancements§

Editor's note

This section contains placeholders for possible reading enhancements the UA may/should/must provide. The list is subject to addition, modification and removal as the enhancements get discussed in more detail.

6.1 Navigation§

6.1.1 Reading Order§

Editor's note

Placeholder for automatic progression through the default reading order.

6.1.2 Table of Contents§

Editor's note

Placeholder for accessing a table of contents.

6.2 Offline Reading§

Editor's note

Placeholder for offline reading of a publication.

6.4 Pagination§

Editor's note

Placeholder for paginated reading experience.

7. Security§

Editor's note
Placeholder for security issues.

8. Privacy§

Editor's note
Placeholder for privacy issues.

A. Acknowledgements§

This section is non-normative.

The following people contributed to the development of this specification:

The Working Group would also like to thank the members of the Digital Publishing Interest Group for all the hard work they did paving the road for this specification.

B. References§

B.1 Normative references§

[BCP47]
Tags for Identifying Languages. A. Phillips; M. Davis. IETF. September 2009. IETF Best Current Practice. URL: https://tools.ietf.org/html/bcp47
[HTML]
HTML Standard. Anne van Kesteren; Domenic Denicola; Ian Hickson; Philip Jägenstedt; Simon Pieters. WHATWG. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[RFC2119]
Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. IETF. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[RFC3987]
Internationalized Resource Identifiers (IRIs). M. Duerst; M. Suignard. IETF. January 2005. Proposed Standard. URL: https://tools.ietf.org/html/rfc3987
[RFC5988]
Web Linking. M. Nottingham. IETF. October 2010. Proposed Standard. URL: https://tools.ietf.org/html/rfc5988
[RFC6596]
The Canonical Link Relation. M. Ohye; J. Kupke. IETF. April 2012. Informational. URL: https://tools.ietf.org/html/rfc6596
[html]
HTML 5.1. Steve Faulkner; Arron Eicholz; Travis Leithead; Alex Danilo. W3C. 2016-11-01. W3C Recommendation. URL: https://www.w3.org/TR/html/
[publishing-linking]
Publishing and Linking on the Web. Ashok Malhotra; Larry Masinter; Jeni Tennison; Daniel Appelquist. W3C. 30 April 2013. W3C Note. URL: https://www.w3.org/TR/publishing-linking/
[url]
URL Standard. Anne van Kesteren. WHATWG. Living Standard. URL: https://url.spec.whatwg.org/

B.2 Informative references§

[AppManifest]
Web App Manifest. Marcos Caceres; Kenneth Christiansen; Mounir Lamouri; Anssi Kostiainen; Rob Dolin. W3C. 22 September 2017. W3C Working Draft. URL: https://www.w3.org/TR/appmanifest/
[RFC4287]
The Atom Syndication Format. M. Nottingham, Ed.; R. Sayre, Ed.. IETF. December 2005. Proposed Standard. URL: https://tools.ietf.org/html/rfc4287
[RFC6570]
URI Template. J. Gregorio; R. Fielding; M. Hadley; M. Nottingham; D. Orchard. IETF. March 2012. Proposed Standard. URL: https://tools.ietf.org/html/rfc6570
[WCAG20]
Web Content Accessibility Guidelines (WCAG) 2.0. Ben Caldwell; Michael Cooper; Loretta Guarino Reid; Gregg Vanderheiden et al. W3C. 11 December 2008. W3C Recommendation. URL: https://www.w3.org/TR/WCAG20/
[link-relation]
Identifier: A Link Relation to Convey a Preferred URI for Referencing. H. Van de Sompel; M. Nelson; G. Bilder; J. Kunze; S. Warner. IETF. URL: https://tools.ietf.org/html/draft-vandesompel-identifier-00
[webmention]
Webmention. Aaron Parecki. W3C. 12 January 2017. W3C Recommendation. URL: https://www.w3.org/TR/webmention/