Abstract

This specification defines digital publishing based on a fully native representation of publications within the Open Web Platform.

Status of This Document

This is a preview

Do not attempt to implement this version of the specification. Do not reference this version as authoritative in any way. Instead, see https://w3c.github.io/wpub/ for the Editor's draft.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This first public working draft provides a preliminary outline of a Web Publication. Many details are under active consideration within the Publishing Working Group and are subject to change. The most prominent known issues have been identified in this document and links provided to comment on them.

In particular, the Working Group seeks feedback on the following issues:

This document was published by the Publishing Working Group as an Editor's Draft. Comments regarding this document are welcome. Please send them to public-publ-wg@w3.org (subscribe, archives).

Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 March 2017 W3C Process Document.

1. Introduction §

1.1 Background §

This section is non-normative.

For millennia now, the written word has been the primary means of encoding and sharing ideas and information. The publication as a bounded edition, made public, has been used to carry intellectual and artistic works of innumerable forms: novels, plays, poetry, journals, magazines, newspapers, articles, laws, treatises, pamphlets, atlases, comics, manga, notebooks, memos, manuals, and albums of all sorts.

More recently, with the advent of the information age, print has been ceding ground to digital, and the Web has become a major forum for the public dissemination of ideas. But the Web is unbounded: information and resources are only loosely connected through hyperlinks. While this model has helped the Web thrive in many areas, it has proven problematic for traditional information publishing—users often cannot access works in their entirety, especially when offline, and have not been able to easily access, compile and download content for curation and personal use. That, in turn, has fed the continuing development of non-Web document formats to redress these problems, and made it necessary to create both Web-ready content and alternative offline renditions to ensure publications are fully available.

This specification aims to reduce these barriers and reinvigorate publishing by combining the best aspects of both models—the persistent availability and portability of bounded publications with the pervasive accessibility, addressability, and interconnectedness of the Open Web Platform.

1.2 Scope §

This section is non-normative.

This specification only defines requirements for the production and rendering of valid Web Publications. As much as possible, it leverages existing Open Web Platform technologies to achieve its goal: allowing a measure of boundedness on the Web without changing the way that the Web itself operates.

Moreover, the specification is designed to adapt automatically to updates to Open Web Platform technologies in order to ensure that Web Publications continue to interoperate seamlessly as the Web evolves (e.g., by referencing the latest published versions instead of specific versions).

Further, this specification does not attempt to constrain the nature of a Web Publication: any type of work that can be represented on the Web constitutes a potential Web Publication.

1.3 Terminology §

Wherever appropriate, this document relies on terminology defined by the note on "Publishing and Linking on the Web" [ publishing-linking ]. In particular, this applies to the following terms: user, user agent, browser, and address.

Default Reading Order

The default reading order is a specific progression through one or more primary resources defined in the manifest by the author of a Web Publication.

A user might follow alternative pathways through the content, but in the absence of such interaction the default reading order defines the expected progression from one primary resource to the next.

Identifier

An identifier is metadata that can be used to refer to Web content in a persistent and unambiguous manner. URLs, URNs, DOIs, ISBNs, and PURLs are all examples of persistent identifiers frequently used in publishing.

Manifest

A manifest represents structured information about a Web Publication, such as informative metadata, a list of all primary and secondary resources, and a default reading order.

Primary Resource

A primary resource is one that is presented directly by a user agent (i.e., not embedded within another).

Secondary Resource

A secondary resource is one that is required for the processing or rendering of a primary resource.

URL

In this specification, the general term URL is used as in other W3C specifications like HTML [ html ], and is defined by the WHATWG URL Standard [ url ]. In particular, such a URL allows for the usage of characters from Unicode following [ rfc3987 ]. See the note in the HTML5 document for further details.

Web Publication

A Web Publication is a collection of one or more primary resources, organized together through a manifest into a single logical work with a default reading order. The Web Publication is uniquely identifiable and presentable using Open Web Platform technologies.

2. Conformance §

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MAY, MUST, and SHOULD are to be interpreted as described in [ RFC2119 ].

2.1 Conformance Classes §

This specification defines two conformance classes: one for Web Publications and one for user agents that process them.

A Web Publication is conformant to this specification if it meets the following criteria:

A user agent is conformant to this specification if it meets the following criteria:

3. Information Set §

Editor's note

The name "infoset" may change depending on feedback. Although this term has a different meaning for individuals familiar with XML, alternatives such as "properties" and "metadata" do not fully capture the nature or purpose.

3.1 Overview §

This section is non-normative.

A Web Publication is defined by a set of properties and features known as its information set (infoset). The infoset is both abstract and concrete. It is abstract in the sense that it represents a set of information that a user agent has to be able to compile about the Web Publication, but it also becomes concrete when the user agent creates an internal representation of the information.

The infoset does not require a specific serialization. It is primarily compiled from a Web Publication's manifest, whose serialization requirements are defined in Manifest. It is therefore possible to express the same infoset via different manifests, although a Web Publication will only have one manifest.

Although the manifest is the primary source of the infoset, some information may be obtained independently of it. For example, fallback rules for properties defined in the following subsections allow a user agent to compile information that the author has not provided in the manifest, whether as an intentional optimization or by accidental omission.
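
As a rough illustration of the "concrete" side of the infoset, a user agent's internal representation might resemble the following Python sketch. The class name `Infoset`, every field name, and the choice of which fields are optional are assumptions made for this example; they loosely mirror the properties discussed in the subsections of this section and are not defined by this specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Infoset:
    """Hypothetical internal representation of a Web Publication's infoset.

    Field names are illustrative only; the specification does not
    prescribe a serialization or an in-memory structure.
    """

    title: str                        # see the Title section
    languages: List[str]              # language tags, see the Language section
    address: str                      # URL that retrieves the manifest
    primary_resources: List[str]      # URLs of the primary resources
    default_reading_order: List[str]  # progression through primary resources
    canonical_identifier: Optional[str] = None  # assigned by the publisher, if any
    secondary_resources: List[str] = field(default_factory=list)
    table_of_contents: Optional[str] = None     # placeholder in this draft


# Example instance; all values are made up for illustration.
example = Infoset(
    title="An Example Publication",
    languages=["en"],
    address="https://example.org/pub/",
    primary_resources=["chapter1.html", "chapter2.html"],
    default_reading_order=["chapter1.html", "chapter2.html"],
)
```

Because the infoset is independent of serialization, two differently formatted manifests could in principle be compiled into equal `Infoset` instances.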

3.2 Requirements §

The Web Publication infoset MUST include the following information:

In addition, the infoset SHOULD include the following information:

Editor's note

These requirements reflect the current minimum consensus, though a number of issues remain open that could change whether an item is required or recommended. See the following sections for more information.

Issue 15 : Minimum Viable Manifest

Ignoring issues such as location, serialization, etc., what is the minimum viable manifest? (Note: this is now specifically related to the infoset.)

Issue 21 : manifest: metadata

Whether the minimum manifest must include any metadata, or a specific slot to handle metadata. (Note: this is now more specifically related to the infoset.)

3.3 Title §

The Web Publication's infoset requires a title.

If a title is not specified in the manifest, the user agent MUST provide one as follows:

  1. If the Web Publication contains at least one primary resource whose media type includes a title element (e.g., SVG or HTML), use the title of the first instance listed in the default reading order.
  2. If the preceding step results in an empty title, or no primary resources are recognized as having a title, the user agent MUST use its own algorithm to produce a title (e.g., inspect subsequent primary resources or provide a placeholder title).
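
The fallback steps above can be sketched in Python as follows. This is one possible reading under stated assumptions, not a definitive implementation: the function name `resolve_title`, its parameters, and the placeholder string are hypothetical. The mapping `resource_titles` is assumed to contain an entry only for primary resources whose media type has a title element, and a manifest title consisting solely of whitespace is treated as unspecified.

```python
from typing import Callable, Dict, List, Optional


def resolve_title(
    manifest_title: Optional[str],
    reading_order: List[str],
    resource_titles: Dict[str, Optional[str]],
    fallback: Callable[[], str] = lambda: "Untitled publication",
) -> str:
    """Return the infoset title, applying the fallback steps when needed."""
    # A title given in the manifest always wins.
    if manifest_title and manifest_title.strip():
        return manifest_title.strip()

    # Step 1: take the title of the first resource in the default reading
    # order whose media type has a title element (i.e., has a map entry).
    for url in reading_order:
        if url in resource_titles:
            title = (resource_titles[url] or "").strip()
            if title:
                return title
            break  # the first instance had an empty title: go to step 2

    # Step 2: the user agent's own algorithm (here, a fixed placeholder).
    return fallback()
```

For example, `resolve_title(None, ["cover.jpg", "c1.html"], {"c1.html": "Chapter 1"})` skips the image, which has no title element, and returns the title of `c1.html`.
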
Note

A user agent may not be able to produce a meaningful title [ WCAG20 ] for a Web Publication based on the above rules. Authors are encouraged to ensure that their manifest contains such a title, or that one can be obtained through one of the two steps above.

Issue 20 : manifest: title

(See also issue #24.) The question is whether the manifest MUST include a title or not.

In the current proposal, a title is required in the infoset because the user agent must create one using the fallback mechanism when not present in the manifest.

3.4 Language §

The Web Publication's infoset requires the language(s) of its content be specified. The language MUST be a tag that conforms to [ BCP47 ] or its successors.

If the language is not specified in the manifest, the user agent MUST provide one as follows:

  1. If the Web Publication contains at least one primary resource whose media type allows the language to be specified (e.g., SVG or HTML), use the language of the first instance listed in the default reading order.
  2. If the preceding step results in an empty language tag, use the value " und " (undetermined).
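
The language fallback above can be sketched the same way. The function name `resolve_language` and its parameters are hypothetical, the sketch handles a single language tag rather than a list, and it does not check that a tag is well-formed according to [ BCP47 ]. The mapping `resource_languages` is assumed to contain an entry only for primary resources whose media type allows a language to be specified.

```python
from typing import Dict, List, Optional

UNDETERMINED = "und"  # the tag used when no language can be determined


def resolve_language(
    manifest_language: Optional[str],
    reading_order: List[str],
    resource_languages: Dict[str, Optional[str]],
) -> str:
    """Return the infoset language, applying the fallback steps when needed."""
    # A language given in the manifest always wins.
    if manifest_language and manifest_language.strip():
        return manifest_language.strip()

    # Step 1: use the language of the first resource in the default reading
    # order whose media type allows a language to be specified.
    for url in reading_order:
        if url in resource_languages:
            tag = (resource_languages[url] or "").strip()
            if tag:
                return tag
            break  # the first instance declared no language: go to step 2

    # Step 2: fall back to "und" (undetermined).
    return UNDETERMINED
```
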
Issue 29 : For manifest in FPWD: Should Natural Language be Required per WCAG 2

The question is whether the manifest MUST include the language(s) of the content or not.

In the current proposal, language is required in the infoset because the user agent must create one using the fallback mechanism when not present in the manifest.

3.5 Canonical Identifier §

A Web Publication's canonical identifier is an identifier assigned to it by the publisher. It SHOULD be an address, but, if not, it MUST be possible to make a 1-to-1 mapping to an address.

If the canonical identifier is a URL, it MAY be used as the value of the href attribute of a "canonical" link element [ rfc6596 ] (i.e., a link element with the attribute rel="canonical" specified on it).
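
As a small illustration, the following Python sketch builds such a "canonical" link element from a canonical identifier that is a URL. The function name `canonical_link` and the example URL are hypothetical; the only processing shown is escaping the URL for use inside an HTML attribute value.

```python
from html import escape


def canonical_link(url: str) -> str:
    """Return an HTML link element with rel="canonical" pointing at url."""
    # escape() with quote=True makes the URL safe inside a quoted attribute.
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


print(canonical_link("https://example.org/publications/moby-dick/"))
```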

If assigned, this canonical identifier MUST be unique to the Web Publication.

3.6 Address §

A Web Publication's address is a URL that refers to a Web Publication and enables the retrieval of a representation of the manifest of the Web Publication.

The availability of this address does not preclude the creation and use of other identifiers and/or addresses to retrieve a representation of a Web Publication in whole or part.

Note
The Web Publication's address can also be used as value for an identifier link relation [ link-relation ].

3.7 Resources §

The infoset MUST include a list of the primary resources of the Web Publication, regardless of whether a primary resource is also used as a secondary resource in another context (e.g., an image might be embedded in an HTML document and also be rendered directly in a user agent via a link to view the raw image).

The infoset also SHOULD list secondary resources, although the list is not required to be exhaustive.

Issue 22 : manifest: requirements for offline

The discussion led to the question of whether the manifest/infoset MUST list all secondary resources or not. In this sense, it became a duplicate of issue #23, which ended up at the same question.

Issue 23 : MUST the manifest include information about secondary resources or not?

The question is whether the manifest/infoset MUST list all Secondary resources or not.

3.8 Default Reading Order §

The default reading order MUST include at least one primary resource.

If the default reading order is not specified in the manifest, but the table of contents is available (either as part of the manifest or retrieved from among the primary resources), the primary resources listed in that table of contents also provide the default reading order. Multiple references to the same primary resource in the table of contents should be disregarded in favor of the first occurrence of that resource.
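
A minimal Python sketch of deriving the default reading order from a table of contents is shown below. The function name `reading_order_from_toc` is hypothetical, and the input is assumed to be the list of link targets already extracted from the table of contents, in document order. Stripping fragment identifiers, so that `c1.html#s2` counts as a reference to `c1.html`, is an assumption of this example and is not stated in the text above.

```python
from typing import Iterable, List
from urllib.parse import urldefrag


def reading_order_from_toc(toc_links: Iterable[str]) -> List[str]:
    """Derive a reading order from table-of-contents link targets.

    Only the first reference to each resource is kept; later references
    to the same resource are disregarded.
    """
    seen = set()
    order: List[str] = []
    for link in toc_links:
        # Assumption: links that differ only by fragment name one resource.
        resource, _fragment = urldefrag(link)
        if resource and resource not in seen:
            seen.add(resource)
            order.append(resource)
    return order
```

A user agent would still need to check that the result contains at least one primary resource, as required for the default reading order.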

Editor's note

The relationship between the default reading order and the table of contents is the subject of several issues; see the list in the section on table of contents.

Issue 26 : Should the manifest be an implicit TOC?

Should the TOC be a separate HTML file or is the listing of primary resources in the manifest an implicit TOC?
See #2

Issue 35 : Proposal: an HTML-first Table of Contents approach to Web Publication

Define the primary resources of a WP to be the files referenced in the first

Issue 36 : Is the ToC sufficient to provide reading order?

Relates to both #26 "Should the manifest be an implicit TOC?" and the more recent #35 "Proposal: an HTML-first Table of Contents approach to Web Publication."

Issue 39 : Do all documents in the reading order have to be reachable from the ToC

There is a consensus that a Web publication must have a reading order (a list of primary resources) and must/should have a table of contents (ToC) (the main navigation entry point).

3.9 Table of Contents §

Editor's note
Placeholder for identifying table of contents - whether embedded or by reference.

4. Manifest §

4.1 Overview §

This section is non-normative.

A manifest is a specific serialization of a Web Publication's infoset.

4.2 Requirements §

The requirements for a conforming Web Publication manifest are as follows:

  1. It MUST declare that it describes a Web Publication.
  2. It MUST be serialized as defined in Serialization.

4.3 Declaration §

Editor's note
Placeholder for how a manifest declares it describes a web publication.

4.4 Serialization §

Issue 7 : manifest format

Format of the Manifest (JSON, XML, embedded into HTML, etc.).

Issue 25 : manifest embedded, linked, both?

Should the manifest be in an external file, embedded in a specified manner, or should either option be allowed?

Issue 26 : Should the manifest be an implicit TOC?

Should the table of contents be a separate HTML file or is the listing of primary resources in the manifest an implicit table of contents?

Issue 32 : Relationships to the Web App Manifest specification.

If the (concrete) manifest is expressed in JSON (see issue #7), should it be defined “on top” of the Web Application Manifest specification (i.e., as some form of an extension), or should it be a fully separate specification?

5. Establishing a Web Publication §

Editor's note
Placeholder for how primary resources identify they belong to a publication and discussion of the lifecycle of a publication.
Issue 13 : Associating a manifest with publication resources

If we have a collection of information about a web publication as a whole ("manifest") that exists separately from most of the publication's resources, we need to find a way to associate the manifest with the other publication resources.

6. Security §

Editor's note
Placeholder for security issues.

7. Privacy §

Editor's note
Placeholder for privacy issues.

A. Acknowledgements §

This section is non-normative.

The following people contributed to the development of this specification:

The Working Group would also like to thank the members of the Digital Publishing Interest Group for all the hard work they did paving the road for this specification.

B. References §

B.1 Normative references §

[BCP47]
Tags for Identifying Languages. A. Phillips; M. Davis. IETF. September 2009. IETF Best Current Practice. URL: https://tools.ietf.org/html/bcp47
[RFC2119]
Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. IETF. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[html]
HTML 5.1. Steve Faulkner; Arron Eicholz; Travis Leithead; Alex Danilo. W3C. 2016-11-01. W3C Recommendation. URL: https://www.w3.org/TR/html/
[publishing-linking]
Publishing and Linking on the Web. Ashok Malhotra; Larry Masinter; Jeni Tennison; Daniel Appelquist. W3C. 30 April 2013. W3C Note. URL: https://www.w3.org/TR/publishing-linking/
[rfc3987]
Internationalized Resource Identifiers (IRIs). M. Duerst; M. Suignard. IETF. January 2005. Proposed Standard. URL: https://tools.ietf.org/html/rfc3987
[rfc6596]
The Canonical Link Relation. M. Ohye; J. Kupke. IETF. April 2012. Informational. URL: https://tools.ietf.org/html/rfc6596
[url]
URL Standard. Anne van Kesteren. WHATWG. Living Standard. URL: https://url.spec.whatwg.org/

B.2 Informative references §

[WCAG20]
Web Content Accessibility Guidelines (WCAG) 2.0 . Ben Caldwell; Michael Cooper; Loretta Guarino Reid; Gregg Vanderheiden et al. W3C. 11 December 2008. W3C Recommendation. URL: https://www.w3.org/TR/WCAG20/
[link-relation]
Identifier: A Link Relation to Convey a Preferred URI for Referencing . H. Van de Sompel; M. Nelson; G. Bilder; J. Kunze; S. Warner. IETF. URL: https://tools.ietf.org/html/draft-vandesompel-identifier-00