Copyright © 2017 W3C ® ( MIT , ERCIM , Keio , Beihang ). W3C liability , trademark and permissive document license rules apply.
Selecting part of a resource on the Web is a ubiquitous action. Over the years several selection techniques have been developed, usually in conjunction with the media type of the resource. Many of these approaches are also expressed in terms of fragment identifiers [ url ], but that is not always the case.
This document does not define any new approach to selection. Instead, it relies on existing techniques, providing a common model and syntax to express and possibly combine selections. Although defined in conjunction with Web Publications, the techniques described in this document can be used for any type of Web Resource.
The formal specification and the semantics originate from a separate W3C Recommendation, namely the Web Annotation Data Model [ annotation-model ], where it is used to select targets of annotations. That model has been extended by adding two more selector types (see the Introduction for further details.)
The document consists of two parts: a description of, essentially, the Selector Model as defined by the Web Annotation Data Model, and a reformulation of that data model in the form of Fragment ID-s. It is not clear, at this moment, whether the standardization of fragment identifiers is necessary, or whether the JSON based structure fulfills the needs of the requirements. If the latter, we can remove the relevant section, and the only possible normative extension in the document is described in issue #4 .
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This document was published by the Publishing Working Group as an Editor's Draft. Comments regarding this document are welcome. Please send them to public-publ-wg@w3.org ( subscribe , archives ).
Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the W3C Patent Policy . W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy .
This document is governed by the 1 March 2017 W3C Process Document .
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MAY , MUST , MUST NOT , NOT RECOMMENDED , RECOMMENDED , SHOULD , and SHOULD NOT are to be interpreted as described in [ RFC2119 ].
This specification relies on a subset of JSON terms originally defined as part of the Web Annotation Data Model [ annotation-model ] and Vocabulary [ annotation-vocab ]. This specification extends the definitions of some of those terms in order to satisfy additional use cases, but all uses conforming to original definitions remain valid. This specification also defines additional JSON terms to meet needs of additional use cases. In order to ensure backward compatibility, implementations of this specification MAY ignore any JSON terms not defined in this specification (directly or by reference to the Web Annotation Data Model and Vocabulary) and MUST NOT treat as invalid any JSON term encountered that is not defined in this specification.
This section is non-normative.
Selecting part of a resource on the Web is a ubiquitous action. Interactive editing of a resource, highlighting an area on the screen, adding an annotation to a specific point in a resource, or defining a bookmark to a section of a long document are all examples that involve selection within a resource. Over the years several selection techniques have been developed, usually in conjunction with the media type of the resource. These include referring to a unique identifier within a resource, defining a time interval for an audio or video track, identifying an element within the DOM tree for an XML source, or using CSS style elements to locate content. Many of these approaches are also expressed in terms of fragment identifiers [ url ], but that is not always the case.
This document does not define any new approach to selection. Instead, it relies on existing techniques, providing a common model and syntax to express selections. Furthermore, the model also includes a way to combine selections via refinements, a feature that may greatly improve the efficiency of applications relying on complex selections. Such a common model makes it easier to provide generic and interoperable tools and APIs to handle selections in various applications.
A selection or state, as described in this document, may have its own unique identity in the form of a URL. This URL SHOULD be dereferenceable and return the selection/state definition itself.
Using the URL of the selection definition, instead of the reference to the “complete” resource could be seen as akin to a server side redirection, returning part of a resource.
The data model is defined in [ json ], in the form of JSON objects and keys. The formal specification and the semantics of these originate from a larger model, namely the Web Annotation Data Model [ annotation-model ], where it is used to select targets of annotations. The current document “extracts” Selectors and States from that data model; by doing so, it makes their usage easier for application developers whose concerns are not related to annotations. Compared to the Web Annotation Data Model, however, this document adds two new selectors, namely the Multi Resource Selector and the Embedded Resource Selector.
Both of these new selectors address the particular requirements of resource collections on the Web, like Web Applications or Web Publications [ wpub ].
This section is normative
Wherever appropriate, this document relies on terminology defined by the note on “Publishing and Linking on the Web” [ publishing-linking ], including, in particular, user , user agent , browser , and address . Furthermore, the document also relies on some additional terms defined by the “Web Publication” [ wpub ], including a URL .
source term, and MAY contain other terms to refine the selection.
The reference to the TR version of the WPUB document must be used (for the url definition), when available.
This section is non-normative.
A Locator serves as a wrapper around the selection of another Web Resource . This extra selection is done via Specifiers that can be:
Specifiers MAY be External Web Resources with their own URLs, such as in the example for the Selector construction; however, it is RECOMMENDED that they be included in the representation to avoid requiring unnecessary network interactions to retrieve all of the information.
| Term | Type | Description |
|---|---|---|
| id | Property | The identity of the Locator. A Locator SHOULD have exactly 1 URL that identifies it. |
| source | Relationship | The relationship between a Locator and the resource that it is a more specific representation of, i.e., the Source. There MUST be exactly 1 source relationship associated with a Locator. The source resource MAY be described in detail as in the core data model or be just the resource’s URL. |
| scope | Relationship | The relationship between a Locator and the resource that provides the scope or context in this selection. There MAY be 0 or more scope relationships for each Locator. Conceptually, if no scope is provided, the value of the source relationship can be considered as the scope. |
The WA scope facility has been added; do we need it when we also have the Embedded Resource Selector?
This section is non-normative.
Selection of part of a Web Resource requires two distinct entities:
A Selector object is used to describe how to determine the Segment from within the Source resource. The nature of the Selector is dependent on the type of resource, as the methods to describe Segments from various media-types differ. These two entities are encapsulated in a Locator .
Example Use Case: Qitara wants to associate a selection of text in a web page with a slice of a dataset. She selects both using her client, and creates Locators with Selectors for both entities before associating them with one another.
| Term | Type | Description |
|---|---|---|
| selector | Relationship | The relationship between a Locator and a Selector. There MAY be 0 or more selector relationships associated with a Locator. Multiple Selectors SHOULD select the same content, however some Selectors will not have the same precision as others. User Agents MUST pick one of the described segments, if they are different. |
{
"source": "http://example.org/page1",
"selector": "http://example.org/paraselector1"
}
As the most well understood mechanism for selecting a Segment is to use the fragment part of a URL defined by the representation’s media type, it is useful to allow this as a description mechanism via a Selector. This allows existing and future fragment specifications to be used with Locators in a consistent way. To be clear about which fragment type is being used, the Selector may refer to the specification that defines it.
Example Use Case: Ramona wants to associate part of a video as the description of an image. She selects the time range within the video and clicks that it is describing the target. Her client then creates the Annotation using a Locator with a FragmentSelector.
| Term | Type | Description |
|---|---|---|
| type | Relationship | The class of the Selector. FragmentSelectors MUST have exactly 1 type and the value MUST be FragmentSelector. |
| value | Property | The contents of the fragment component of a URL that describes the Segment. The FragmentSelector MUST have exactly 1 value property. |
| conformsTo | Relationship | The relationship between the FragmentSelector and the specification that defines the syntax of the URL fragment in the value property. The Fragment Selector SHOULD have exactly 1 conformsTo link to the specification that defines the syntax of the fragment and MUST NOT have more than 1. |
It is RECOMMENDED to use FragmentSelector as a consistent method compatible with other means of describing Locators, rather than using the URL with a fragment directly. User Agents SHOULD be aware of both.
The following URLs are some of the specifications that define the semantics of fragments, and hence may be used with the conformsTo relationship. Other URLs MAY also be used.
| Name | Fragment Specification | Description |
|---|---|---|
| HTML | http://tools.ietf.org/rfc/rfc3236 | [ rfc3236 ] Example: namedSection |
| PDF | http://tools.ietf.org/rfc/rfc3778 | [ rfc3778 ] Example: page=10&viewrect=50,50,640,480 |
| Plain Text | http://tools.ietf.org/rfc/rfc5147 | [ rfc5147 ] Example: char=0,10 |
| XML | http://tools.ietf.org/rfc/rfc3023 | [ rfc3023 ] Example: xpointer(/a/b/c) |
| CSV | http://tools.ietf.org/rfc/rfc7111 | [ rfc7111 ] Example: row=5-7 |
| Media | http://www.w3.org/TR/media-frags/ | [ media-frags ] Example: xywh=50,50,640,480 |
| SVG | http://www.w3.org/TR/SVG/ | [ svg11 ] Example: svgView(viewBox(50,50,640,480)) |
source, a #, and the value. For example, the URL from the example below would be http://example.org/video1#t=30,60.
{
"source": "http://example.org/video1",
"selector": {
"type": "FragmentSelector",
"conformsTo": "http://www.w3.org/TR/media-frags/",
"value": "t=30,60"
}
}
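The concatenation described above (source, a #, and the value) can be sketched in a few lines of Python. This is only an illustration, not part of the model; the function name `fragment_selector_to_url` is an assumption made for this example.

```python
def fragment_selector_to_url(locator: dict) -> str:
    """Rebuild a fragment URL from a Locator that carries a FragmentSelector.

    Hypothetical helper for illustration; the name is not defined by the model.
    """
    selector = locator["selector"]
    if selector.get("type") != "FragmentSelector":
        raise ValueError("expected a FragmentSelector")
    # Concatenate the source, a '#', and the selector's value.
    return f'{locator["source"]}#{selector["value"]}'


LOCATOR = {
    "source": "http://example.org/video1",
    "selector": {
        "type": "FragmentSelector",
        "conformsTo": "http://www.w3.org/TR/media-frags/",
        "value": "t=30,60",
    },
}

print(fragment_selector_to_url(LOCATOR))
```

Going the other way (splitting a fragment URL into a source and a FragmentSelector) would additionally require choosing a conformsTo value, which depends on the media type of the resource.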
One of the most common ways to select elements in the HTML Document Object Model is to use CSS Selectors [ css3-selectors ]. CSS Selectors allow for a wide variety of well supported ways to describe the path to an element in a web page, and thus cover many of the basic use cases for selection. Results are not defined for when a CSS Selector is applied to a representation that does not conform to the Document Object Model.
Example Use Case: Sally selects a paragraph in one of the chapters of a Web Publication that she wishes to bookmark. Her client calculates a CSS path that cleanly identifies that element and stores the CSS Selector in its local bookmark store. Because the selection is made as part of a Web Publication (e.g., through the WP aware User Agent), a reference to the Web Publication’s address is also added to the bookmark.
| Term | Type | Description |
|---|---|---|
| type | Relationship | The class of the Selector. CssSelectors MUST have exactly 1 type and the value MUST be CssSelector. |
| value | Property | The CSS selection path to the Segment. There MUST be exactly 1 value associated with a CSS Selector. |
{
"scope": "https://dauwhe.github.io/html-first/MobyDick.wpub",
"source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
"selector": {
"type": "CssSelector",
"value": "#elemid > .elemclass + p"
}
}
Another common method of selecting elements and content within a resource that supports the Document Object Model (DOM), such as documents in XML or HTML, is to use an XPath selection [ dom-level-3-xpath ]. XPath allows a great deal of flexibility when describing the path through the structure to the selected content. Results are not defined for when an XPath Selector is applied to a representation that does not conform to the DOM.
Example Use Case: Teynika selects a span within a table in an HTML page and writes a note about the content. To refer explicitly to this element, her client carefully constructs an XPath to identify the relevant element.
| Term | Type | Description |
|---|---|---|
| type | Relationship | The class of the Selector. XPath Selectors MUST have exactly 1 type and the value MUST be XPathSelector. |
| value | Property | The xpath to the selected segment. There MUST be exactly 1 value associated with an XPath Selector. |
{
"source": "http://example.org/page1.html",
"selector": {
"type": "XPathSelector",
"value": "/html/body/p[2]/table/tr[2]/td[3]/span"
}
}
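As a rough illustration of how a user agent might evaluate an XPathSelector, the sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath and resolves paths relative to the root element. The helper `select_xpath` and the sample markup are assumptions made for this example; a real implementation would more likely use a full XPath engine over the DOM.

```python
import xml.etree.ElementTree as ET


def select_xpath(root: ET.Element, selector: dict):
    """Evaluate a simple absolute element path against a parsed document.

    Hypothetical helper: ElementTree cannot evaluate absolute paths, so the
    leading step naming the document element is stripped first.
    """
    path = selector["value"]
    lead = "/" + root.tag
    if path == lead:
        return root
    if not path.startswith(lead + "/"):
        raise ValueError("path does not start at the document element")
    # "/html/body/p[2]/span" becomes "./body/p[2]/span" relative to <html>.
    return root.find("." + path[len(lead):])


# Sample markup invented for this example (well-formed XML, no namespaces).
SAMPLE = "<html><body><p>one</p><p>two <span>hit</span></p></body></html>"
SELECTOR = {"type": "XPathSelector", "value": "/html/body/p[2]/span"}

element = select_xpath(ET.fromstring(SAMPLE), SELECTOR)
print(element.tag, element.text)
```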
This Selector describes a range of text by copying it, and including some of the text immediately before (a prefix) and after (a suffix) it to distinguish between multiple copies of the same sequence of characters.
For example, if the document was “abcdefghijklmnopqrstuvwxyz”, one could select “efg” by a prefix of “abcd”, the match of “efg” and a suffix of “hijk”.
Example Use Case: Ulrika selects a typo (‘anotation’) in a web page and adds a comment that it should be replaced with the correct spelling (‘annotation’).
| Term | Type | Description |
|---|---|---|
| type | Relationship | The class of the Selector. Text Quote Selectors MUST have exactly 1 type and the value MUST be TextQuoteSelector. |
| exact | Property | A copy of the text which is being selected, after normalization. Each TextQuoteSelector MUST have exactly 1 exact property. |
| prefix | Property | A snippet of text that occurs immediately before the text which is being selected. Each TextQuoteSelector SHOULD have exactly 1 prefix property, and MUST NOT have more than 1. |
| suffix | Property | The snippet of text that occurs immediately after the text which is being selected. Each TextQuoteSelector SHOULD have exactly 1 suffix property, and MUST NOT have more than 1. |
The selection of the text MUST be in terms of Unicode code points (the “character number”), not in terms of code units (that number expressed using a selected data type). Selections SHOULD NOT start or end in the middle of a grapheme cluster. The selection MUST be based on the logical order of the text, rather than the visual order, especially for bidirectional text. For more information about the character model of text used on the web, see [ charmod ]. Also, the text MUST be normalized for the purpose of selection. Thus HTML/XML tags SHOULD be removed, and character entities SHOULD be replaced with the character that they encode.
If, after processing the prefix, exact, and suffix, the user agent discovers multiple matching text sequences, then the selection SHOULD be treated as matching all of the matches.
{
"source": "http://example.org/page1",
"selector": {
"type": "TextQuoteSelector",
"exact": "anotation",
"prefix": "this is an ",
"suffix": " that has some"
}
}
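The matching behaviour described above can be sketched as follows. This is a minimal illustration that assumes the text has already been normalized (tags removed, entities replaced); the function name `find_text_quote` is hypothetical. Python strings are indexed by Unicode code point, which matches the unit required for selection.

```python
def find_text_quote(text: str, selector: dict) -> list[tuple[int, int]]:
    """Return (start, end) code point offsets for every match of the selector.

    Assumes `text` is already normalized; prefix and suffix are optional.
    """
    exact = selector["exact"]
    prefix = selector.get("prefix", "")
    suffix = selector.get("suffix", "")
    needle = prefix + exact + suffix
    matches = []
    pos = text.find(needle)
    while pos != -1:
        start = pos + len(prefix)
        matches.append((start, start + len(exact)))
        # Keep searching: all matching sequences are reported.
        pos = text.find(needle, pos + 1)
    return matches


ALPHABET = "abcdefghijklmnopqrstuvwxyz"
QUOTE = {
    "type": "TextQuoteSelector",
    "exact": "efg",
    "prefix": "abcd",
    "suffix": "hijk",
}
print(find_text_quote(ALPHABET, QUOTE))
```

Returning a list rather than a single match reflects the rule that multiple matching sequences should all be treated as selected.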
This Selector describes a range of text by recording the start and end positions of the selection in the stream. Position 0 would be immediately before the first character, position 1 would be immediately before the second character, and so on. The start character is thus included in the list, but the end character is not.
For example, if the document was “abcdefghijklmnopqrstuvwxyz”, the start was 4, and the end was 7, then the selection would be “efg”.
Example Use Case: Valeria writes a review of an ebook that does not allow its content to be extracted and copied. Her client describes the selection using its start and end position in the content.
| Term | Type | Description |
|---|---|---|
| type | Relationship | The class of the Selector. Text Position Selectors MUST have exactly 1 type and the value MUST be TextPositionSelector. |
| start | Property | The starting position of the segment of text. The first character in the full text is character position 0, and the character is included within the segment. Each TextPositionSelector MUST have exactly 1 start property, and the value MUST be a non-negative integer. |
| end | Property | The end position of the segment of text. The character is not included within the segment. Each TextPositionSelector MUST have exactly 1 end property, and the value MUST be a non-negative integer. |
The text MUST be selected and normalized in the same way as for the Text Quote Selector before counting the number of characters to determine the start and end positions.
{
"source": "http://example.org/ebook1",
"selector": {
"type": "TextPositionSelector",
"start": 412,
"end": 795
}
}
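Because positions count Unicode code points with an inclusive start and an exclusive end, a Text Position Selector maps directly onto a half-open slice. The sketch below assumes normalized text and uses a hypothetical function name, `select_text_position`.

```python
def select_text_position(text: str, selector: dict) -> str:
    """Return the segment between start (inclusive) and end (exclusive).

    Python str slicing counts code points, not UTF-16 code units.
    """
    start, end = selector["start"], selector["end"]
    for name, value in (("start", start), ("end", end)):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer")
    return text[start:end]


ALPHABET = "abcdefghijklmnopqrstuvwxyz"
POSITION = {"type": "TextPositionSelector", "start": 4, "end": 7}
print(select_text_position(ALPHABET, POSITION))
```

In languages whose strings are indexed by UTF-16 code units, the same offsets would need converting before slicing.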
Similar to the Text Position Selector , the Data Position Selector uses the same properties but works at the byte in bitstream level rather than the character in text level.
Example Use Case: Wendy produces visualizations of regions of online disk images as part of her publication. She calculates the start and end positions from the binary stream and stores that as a reference using the DataPositionSelector.
| Term | Type | Description |
|---|---|---|
| type | Relationship | The class of the Selector. Data Position Selectors MUST have exactly 1 type and the value MUST be DataPositionSelector. |
| start | Property | The starting position of the segment of data. The first byte is byte position 0. Each DataPositionSelector MUST have exactly 1 start property. |
| end | Property | The end position of the segment of data. The byte at the end position is not included within the segment. Each DataPositionSelector MUST have exactly 1 end property. |
{
"source": "http://example.org/diskimg1",
"selector": {
"type": "DataPositionSelector",
"start": 4096,
"end": 4104
}
}
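The byte-level counterpart can be sketched the same way, slicing a `bytes` object instead of a string. The function name and the sample data are assumptions made for this example.

```python
def select_data_position(data: bytes, selector: dict) -> bytes:
    """Return the bytes from start (inclusive) to end (exclusive)."""
    return data[selector["start"]:selector["end"]]


# Invented sample data: 8192 bytes cycling through the values 0..255.
SAMPLE_DATA = bytes(range(256)) * 32
DATA_POSITION = {"type": "DataPositionSelector", "start": 4096, "end": 4104}

segment = select_data_position(SAMPLE_DATA, DATA_POSITION)
print(len(segment), segment.hex())
```

The difference from the Text Position Selector shows up with non-ASCII content: a single character such as "é" occupies one code point but two bytes in UTF-8, so the two selectors generally yield different offsets for the same selection.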
An SvgSelector defines an area through the use of the Scalable Vector Graphics [ svg11 ] standard. This allows the user to select a non-rectangular area of the content, such as a circle or polygon by describing the region using SVG. The SVG may be either embedded or referenced as an External Web Resource .
Note that the SvgSelector uses SVG to select an area of a resource. Segments of an SVG representation may also be selected using selectors, including the FragmentSelector or even an SvgSelector.
Example Use Case: Xena is tagging an old map online with a diagonal region for a historical road. Her client creates an SVG polygon to highlight the region by overlaying a transparent area with a different color.
| Term | Type | Description |
|---|---|---|
| type | Relationship | The class of the Selector. SVG Selectors MUST have exactly 1 type and the value MUST include SvgSelector. |
| value | Property | The character sequence of the SVG content. There MAY be exactly 1 value property associated with the Selector, and if so the value of the property MUST be well-formed SVG XML. |
The dimensions of the SVG shape or canvas MUST be relative to the dimensions of the Source , such that scaling the shape’s size to the full size of the image correctly describes the desired area.
{
"source": "http://example.org/map1",
"selector": {
"type": "SvgSelector",
"id": "http://example.org/svg1"
}
}
{
"source": "http://example.org/map1",
"selector": {
"type": "SvgSelector",
"value": "<svg:svg> ... </svg:svg>"
}
}
Selections made by users may be extensive and/or cross over internal boundaries in the representation, making it difficult to construct a single selector that robustly describes the correct content. A Range Selector can be used to identify the beginning and the end of the selection by using other Selectors. In this way, two points can be accurately identified using the most appropriate selection mechanisms, and then linked together to form the selection. The selection consists of everything from the beginning of the starting selector through to the beginning of the ending selector, but not including it.
Example Use Case: Yadira wants to comment on text in a Web Publication that spreads over several paragraphs. She selects the start and the end of the selection; her User Agent calculates the Range Selector using the first selection as a start and the second selector as the end.
| Term | Type | Description |
|---|---|---|
| type | Relationship | The class of the Selector. Range Selectors MUST have exactly 1 type and the value MUST be RangeSelector. |
| startSelector | Relationship | The Selector which describes the inclusive starting point of the range. There MUST be exactly 1 startSelector associated with a Range Selector. |
| endSelector | Relationship | The Selector which describes the exclusive ending point of the range. There MUST be exactly 1 endSelector associated with a Range Selector. Both startSelector and endSelector SHOULD be of the same class. |
{
"source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
"selector": {
"type": "RangeSelector",
"startSelector": {
"type": "TextQuoteSelector",
"exact": "Call me Ishmael.",
"suffix": "Some years ago"
},
"endSelector": {
"type": "TextQuoteSelector",
"exact": "He desires to paint you the dreamiest, ",
"prefix": "But here is an artist. ",
"suffix": "shadiest, quietest"
}
}
}
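The range semantics (from the beginning of the start selection up to, but not including, the beginning of the end selection) can be sketched for the case where both ends are TextQuoteSelectors over normalized plain text. The helper names and the sample text below are assumptions made for this example; the sample is not the actual content of the resource referenced above.

```python
def quote_start(text: str, selector: dict) -> int:
    """Offset at which the quoted text of a TextQuoteSelector begins."""
    prefix = selector.get("prefix", "")
    needle = prefix + selector["exact"] + selector.get("suffix", "")
    pos = text.find(needle)
    if pos == -1:
        raise LookupError("quote not found")
    return pos + len(prefix)


def select_range(text: str, range_selector: dict) -> str:
    """Everything from the start of the start match to the start of the end match."""
    start = quote_start(text, range_selector["startSelector"])
    end = quote_start(text, range_selector["endSelector"])
    if end < start:
        raise ValueError("endSelector matches before startSelector")
    return text[start:end]


# Invented sample text and selectors, for illustration only.
SAMPLE_TEXT = (
    "Call me Ishmael. Some years ago I went to sea. "
    "But here is an artist. He desires to paint you the dreamiest, "
    "shadiest, quietest bit."
)
RANGE_SELECTOR = {
    "type": "RangeSelector",
    "startSelector": {
        "type": "TextQuoteSelector",
        "exact": "Call me Ishmael.",
        "suffix": " Some years ago",
    },
    "endSelector": {
        "type": "TextQuoteSelector",
        "exact": "He desires to paint you the dreamiest, ",
        "prefix": "But here is an artist. ",
        "suffix": "shadiest, quietest",
    },
}
print(repr(select_range(SAMPLE_TEXT, RANGE_SELECTOR)))
```

Note that the text matched by the endSelector is itself excluded from the result, which is why the selection stops just before "He desires".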
This section is normative
For some use cases it is required to identify a fragment that spans, possibly, over multiple contiguous members of a group of resources (e.g., a subset in order of the resources which comprise a Web Publication [ wpub ]). A Multi Resource Selection can be used to identify this span by creating an ordered list of Locators. The selection consists of everything from the beginning of the starting selector in the first Locator, all selections identified by the intermediate Locators in the list (if any), through to the beginning of the ending selector, but not including it.
A Range Selector may be considered as a special case for a Multi Resource Selector, albeit defined much more succinctly.
Example Use Case: Előd wants to comment on a text in a Web Publication that spreads over several resources within the Web Publication. He selects the start and the end of the selection in two different constituent resources; his User Agent calculates the Multi Resource Selector using the first selection as a start, the second selector as the end, and the resources listed in the default reading order of the Web Publication as intermediate selections.
The reference to the TR version of the WPUB document must be used, when available.
The definition is complex, and we have to be sure that the selector really has reasonable use cases. Selection of 2 consecutive resources is covered by the usage of the Embedded Resource Selector (combined with refinement).
| Term | Type | Description |
|---|---|---|
| type | Relationship | The class of the Selector. Multi Resource Selectors MUST have exactly 1 type and the value MUST be MultiResourceSelector. |
| locators | Relationship | A list of Locators. There MUST be exactly 1 locators associated with a Multi Resource Selector. The list MUST have at least 1 element. |
{
"source": "https://dauwhe.github.io/html-first/MobyDick.wpub",
"selector": {
"type": "MultiResourceSelector",
"locators": [{
"source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
"selector": {
"type": "TextQuoteSelector",
"exact": "Call me Ishmael.",
"suffix": "Some years ago"
}
},{
"source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c002.html"
},{
"source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c003.html",
"selector": {
"type": "TextQuoteSelector",
"exact": "The opposite wall of this entry",
"suffix": " was hung"
}
}]
}
}
This section is normative
For some use cases it is required to identify a resource that is part of a group of resources, where that group has its own identity on the Web (and can be identified via its own URL). An example may be a resource representing a chapter as part of a Web Publication [ wpub ]. An Embedded Resource Selector can be used to identify that resource through the value relationship. This Selector is usually used in conjunction with other Selectors, e.g., through a refinement.
Example Use Case: Janine wants to select the cover image of a Web Publication, which is linked to the Web Publication as a whole. She uses an Embedded Resource Selector to designate the image, with the Web Publication’s address as the Source for the selector.
| Term | Type | Description |
|---|---|---|
| type | Relationship | The class of the Selector. Embedded Resource Selectors MUST have exactly 1 type and the value MUST be EmbeddedResourceSelector. |
| value | Relationship | The URL of the resource within the collection of resources identified by the Source. An EmbeddedResourceSelector MUST have exactly 1 value property. |
{
"source": "https://dauwhe.github.io/html-first/MobyDick.wpub",
"selector": {
"type": "EmbeddedResourceSelector",
"value": "https://dauwhe.github.io/html-first/MobyDickNav/images/moby-dick-book-cover.jpg"
}
}
It may be easier, more reliable, or more accurate to specify the segment of interest of a resource as a selection of a selection, rather than as a selection of the complete resource. Particularly for resources that contain other resources, such as various packaging formats, this also allows decomposition of the selection mechanisms when the components do not have unique identifiers. This is accomplished by having selectors chained together, where each refines the results of the previous one.
Example Use Case: Zara selects a paragraph of text and then a short phrase within it. Her client records the phrase as a TextQuoteSelector that further modifies a FragmentSelector used to identify the paragraph that the phrase is part of.
Example Use Case: Brianne wants to comment on a text in a Web Publication that spreads over two consecutive resources in a Web Publication. The user agent cannot use the Range Selector with, e.g., Text Quote Selectors directly, because those Selectors would rely on the same Source . Instead, Embedded Resource Selectors are used for the start and the end, each using the Selector Refinement to identify the necessary quote.
| Term | Type | Description |
|---|---|---|
| refinedBy | Relationship | The relationship between a broader selector and the more specific selector that should be applied to the results of the first. A Selector MAY be refinedBy 1 or more other Selectors. If more than 1 is given, then they are considered to be alternatives that will result in the same selection. |
{
"source": "http://example.org/page1",
"selector": {
"type": "FragmentSelector",
"value": "para5",
"refinedBy": {
"type": "TextQuoteSelector",
"exact": "Selected Text",
"prefix": "text before the ",
"suffix": " and text after it"
}
}
}
{
"source": "https://dauwhe.github.io/html-first/MobyDick.wpub",
"selector": {
"type": "RangeSelector",
"startSelector": {
"type": "EmbeddedResourceSelector",
"value": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
"refinedBy": {
"type": "TextQuoteSelector",
"exact": "Call me Ishmael"
}
},
"endSelector": {
"type": "EmbeddedResourceSelector",
"value": "https://dauwhe.github.io/html-first/MobyDickNav/html/c002.html",
"refinedBy": {
"type": "TextQuoteSelector",
"exact": "A hundred black faces turned around"
}
}
}
}
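Chained selection can be sketched as a recursive resolver in which each refinement is evaluated only against the segment produced by the selector before it. This minimal illustration handles just TextPositionSelector and TextQuoteSelector over normalized plain text, and assumes refinedBy holds a single selector; the function name `resolve` and the sample text are assumptions made for this example.

```python
def resolve(text: str, selector: dict, offset: int = 0) -> tuple[int, int]:
    """Return (start, end) offsets in the original text for a selector chain.

    `offset` is the position of `text` within the original resource.
    """
    kind = selector["type"]
    if kind == "TextPositionSelector":
        start, end = selector["start"], selector["end"]
    elif kind == "TextQuoteSelector":
        prefix = selector.get("prefix", "")
        needle = prefix + selector["exact"] + selector.get("suffix", "")
        pos = text.find(needle)
        if pos == -1:
            raise LookupError("quote not found")
        start = pos + len(prefix)
        end = start + len(selector["exact"])
    else:
        raise NotImplementedError(kind)
    refinement = selector.get("refinedBy")
    if refinement is None:
        return offset + start, offset + end
    # Apply the refinement to the selected segment only.
    return resolve(text[start:end], refinement, offset + start)


# Invented sample: the quote "fish" occurs twice; the outer selector
# narrows the search to the second sentence.
SAMPLE = "red fish. blue fish."
CHAIN = {
    "type": "TextPositionSelector",
    "start": 10,
    "end": 20,
    "refinedBy": {"type": "TextQuoteSelector", "exact": "fish"},
}
start, end = resolve(SAMPLE, CHAIN)
print(start, end, SAMPLE[start:end])
```

Applied to the whole text, the inner TextQuoteSelector alone would match the first "fish"; the outer selector is what disambiguates it, which is the benefit refinement is meant to provide.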
A State describes the intended state of a resource when selected, and thus provides the information needed to retrieve the correct representation of that resource. Web resources change over time, and a State might be used to describe how to recover the intended previous version. Web resources also have multiple formats, and a State might equally be used to describe how to retrieve that particular format.
The state aspect of a Web Resource requires two distinct entities:
A State object is used to describe how to determine the state of interest from within the Source resource. These two entities are encapsulated in a Locator .
Example Use Case: Alexandra visualizes data on a web page that changes frequently. Her client records information to allow other clients to hopefully reconstruct the original visualization.
| Term | Type | Description |
|---|---|---|
| state | Relationship | The relationship between the Locator and the State. There MAY be 0 or more state relationships for each Locator. Multiple States SHOULD select the same content, however some States will not have the same precision as others. Consuming user agents MUST pick one of the described segments, if they are different. |
States MUST be processed before processing Selector information.
{
"source": "http://example.org/page1",
"state": {
"id": "http://example.org/state1"
}
}
A Time State resource records the time at which the resource was in the state intended for the selection, typically the time that the resource was created and/or a link to a persistent copy of the current version. The timestamp for the resource could be resolved via the Memento protocol, described in RFC 7089 [ rfc7089 ].
Example Use Case: Britney makes a note about the current state of the front page of a news website, and flags that the page is likely to change often. Her client adds in a State with the current time to describe the version of the page.
| Term | Type | Description |
|---|---|---|
| type | Relationship | The class of the State. Time States MUST have exactly 1 type and the value MUST be TimeState. |
| sourceDate | Property | The timestamp at which the Source resource should be interpreted. There MAY be 0 or more sourceDate properties per TimeState. If there is more than 1, each gives an alternative timestamp at which the Source may be interpreted. The timestamp MUST be expressed in the xsd:dateTime format, and MUST use the UTC timezone expressed as "Z". If sourceDate is provided, then sourceDateStart and sourceDateEnd MUST NOT be provided. |
| sourceDateStart | Property | The timestamp that begins the interval over which the Source resource should be interpreted. There MAY be exactly 1 sourceDateStart property per TimeState. The timestamp MUST be expressed in the xsd:dateTime format, and MUST use the UTC timezone expressed as "Z". If sourceDateStart is provided then sourceDateEnd MUST also be provided. |
| sourceDateEnd | Property | The timestamp that ends the interval over which the Source resource should be interpreted. There MAY be exactly 1 sourceDateEnd property per TimeState. The timestamp MUST be expressed in the xsd:dateTime format, and MUST use the UTC timezone expressed as "Z". If sourceDateEnd is provided then sourceDateStart MUST also be provided. |
| cached | Relationship | A link to a copy of the Source resource's representation, appropriate for the application. There MAY be 0 or more cached relationships per TimeState. If there is more than 1, each gives an alternative copy of the representation. |
{
"source": "http://example.org/page1",
"state": {
"type": "TimeState",
"cached": "http://archive.example.org/copy1",
"sourceDate": "2015-07-20T13:30:00Z"
}
}
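The constraints in the table above lend themselves to a mechanical check. The following Python sketch is illustrative only: the function name check_time_state and the regular expression, which merely approximates xsd:dateTime restricted to a "Z" suffix, are assumptions and not part of this specification.

```python
import re

# Assumed approximation of xsd:dateTime restricted to UTC with a literal "Z".
_UTC_DATETIME = re.compile(r"-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z")


def _as_list(value):
    """Normalize a missing, single, or list-valued JSON property to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def check_time_state(state):
    """Return the constraint violations found in a TimeState dict (empty if none)."""
    problems = []
    if state.get("type") != "TimeState":
        problems.append("type must be TimeState")

    dates = _as_list(state.get("sourceDate"))
    starts = _as_list(state.get("sourceDateStart"))
    ends = _as_list(state.get("sourceDateEnd"))

    # sourceDate and the start/end interval are mutually exclusive.
    if dates and (starts or ends):
        problems.append("sourceDate excludes sourceDateStart/sourceDateEnd")
    if len(starts) > 1 or len(ends) > 1:
        problems.append("at most one sourceDateStart and one sourceDateEnd")
    # An interval needs both of its ends.
    if bool(starts) != bool(ends):
        problems.append("sourceDateStart and sourceDateEnd must appear together")

    for stamp in dates + starts + ends:
        if not (isinstance(stamp, str) and _UTC_DATETIME.fullmatch(stamp)):
            problems.append(f"not a UTC xsd:dateTime: {stamp!r}")
    return problems
```

Under these assumptions, the state object of the example above produces an empty list, while a state that carries both sourceDate and sourceDateStart is reported twice: once for mixing the two forms and once for the missing sourceDateEnd.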
As there are potentially many representations that can be delivered from a resource with a single URL, and a selection may only apply to one of them, it is important to be able to record the HTTP Request headers that need to be sent to retrieve the correct representation. The HttpRequestState resource maintains a copy of the headers to be replayed when obtaining the representation.
Example Use Case: Carla retrieves a PDF representation of a Web resource that can deliver HTML, PDF or plain text and then writes a description about it. She signals that her description is only about the PDF representation. Her client then includes a State to describe how to retrieve the target representation.
| Term | Type | Description |
|---|---|---|
| type | Relationship | The class of the State. Request Header States MUST have exactly 1 type and the value MUST be HttpRequestState. |
| value | Property | The HTTP request headers to send as a single, complete string. An HttpRequestState MUST have exactly 1 value property. |
If the server's response includes a Content-Location header, then the client might instead use it as the target of the Annotation, rather than the URL that was requested.
{
"source": "http://example.org/resource1",
"state": {
"type": "HttpRequestState",
"value": "Accept: application/pdf"
}
}
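As an illustration of how a client might replay the recorded headers, the Python sketch below turns the value string into a prepared request without sending it. The helper names parse_header_block and build_request are hypothetical, and the sketch assumes that several headers, if present, are separated by line breaks; the description above does not fix a separator.

```python
import urllib.request


def parse_header_block(value):
    """Split a header string into (name, value) pairs.

    Assumption: multiple headers are separated by line breaks, as in an
    HTTP/1.1 message.
    """
    headers = []
    for line in value.splitlines():
        if not line.strip():
            continue
        name, sep, content = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        headers.append((name.strip(), content.strip()))
    return headers


def build_request(locator):
    """Build (but do not send) a request that replays the recorded headers."""
    state = locator.get("state", {})
    if state.get("type") != "HttpRequestState":
        raise ValueError("locator does not carry an HttpRequestState")
    request = urllib.request.Request(locator["source"])
    for name, content in parse_header_block(state["value"]):
        request.add_header(name, content)
    return request
```

For the example above, build_request yields a request for http://example.org/resource1 whose Accept header is application/pdf; passing it to urllib.request.urlopen would then fetch the PDF representation, provided the server honours content negotiation.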
Similar to the refinement of selection, it may be easier, more reliable or more accurate to specify the appropriate state of the resource as a hierarchy of atomic State resources. This is particularly appropriate for representing the combination of a State that reflects an internal transformation along with the results of a State that describes an external request. This decomposition is accomplished by having the states chained together in the same way as Selectors.
Further, given that the State(s) will likely result in a specific representation, there may be specific Selectors that are appropriate for describing the segment of the representation. In order to accommodate this, States may also be refined by Selectors.
Example Use Case: Devina writes a comment about a travel e-book which has many versions available over time, and is available in different formats. She is particularly commenting on a specific version and format, so her client adds both a TimeState to capture the time and an HttpRequestState to capture the format.
| Term | Type | Description |
|---|---|---|
| refinedBy | Relationship | The relationship between a broader State and either a more specific State or a Selector that SHOULD be applied to the results of the first. Each State MAY be refinedBy 1 or more other States or Selectors. If more than 1 is given, then they are considered to be alternatives that will result in the same result. |
{
"source": "http://example.org/ebook1",
"state": {
"type": "TimeState",
"sourceDate": "2016-02-01T12:05:23Z",
"refinedBy": {
"type": "HttpRequestState",
"value": "Accept: application/epub+zip"
}
}
}
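A consumer has to apply chained States from the outermost one inward. The Python sketch below flattens such a chain into the order of application. The function name state_chain is hypothetical, and the sketch assumes that alternatives, if any, arrive as a JSON array, of which it simply follows the first entry.

```python
def state_chain(locator):
    """Return the States (and any trailing Selector) in the order they apply."""
    chain = []
    node = locator.get("state")
    while node is not None:
        if isinstance(node, list):
            # Assumption: a list holds equivalent alternatives; follow the first.
            if not node:
                break
            node = node[0]
        if not isinstance(node, dict):
            # A bare URL reference cannot be followed without dereferencing it.
            break
        chain.append(node)
        node = node.get("refinedBy")
    return chain
```

For the example above the chain consists of the TimeState followed by the HttpRequestState, i.e., the client would first settle on the 2016-02-01 version and then request its EPUB representation.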
Although Selectors and States provide a flexible way of identifying, e.g., a suitable Segment of a Resource, the fact that this is defined through an indirection using a Locator may be an obstacle for some applications.
We must add a good example. The original one referred to RDF, which is not relevant in this context.
To mitigate this issue, a mapping of Selectors and States onto URL fragments [ url ] is defined below. As a result of this mapping, the targeted Segment, or the relevant State, is expressed in a single (albeit complex) URL. In that URL the Selector, respectively the State, is expressed as a single string and serves as the fragment appended to the URL of the Source. Note that this representation is valid only if the URL of the Source does not contain a fragment identifier of its own (a URL may contain at most one fragment identifier).
The syntax for mapping a Selector, respectively a State, follows the same “functional” syntax as used, for example, by the XPointer Framework [ xptr-framework ]:
- a Selector, respectively a State, is expressed using the selector(…), respectively the state(…), functional syntax;
- for the keys refinedBy, startSelector, and endSelector the syntax is key=selector(…), respectively key=state(…) when appropriate, with the value following, recursively, the same syntax as the full fragment;
- for locators the content is a comma separated list of locators;
- all other key/value pairs use the key=value syntax, e.g., type=FragmentSelector.
(see the examples below.)
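The rules above can be read as a small recursive grammar. The Python sketch below parses a fragment written in this syntax into nested dictionaries. It is an illustration under stated assumptions, not a reference parser: the function names are hypothetical, values are assumed to contain no unencoded comma or closing parenthesis, and the keyless list form used for locators is not handled.

```python
from urllib.parse import unquote


def _parse_call(text, pos):
    """Parse one selector(...) or state(...) call starting at text[pos]."""
    open_paren = text.find("(", pos)
    kind = text[pos:open_paren] if open_paren != -1 else ""
    if kind not in ("selector", "state"):
        raise ValueError(f"expected selector( or state( at offset {pos}")
    pos = open_paren + 1
    fields = {}
    while True:
        equals = text.find("=", pos)
        if equals == -1:
            raise ValueError("missing '=' in key=value pair")
        key = text[pos:equals]
        pos = equals + 1
        if text.startswith(("selector(", "state("), pos):
            # Nested call, e.g. refinedBy=selector(...): recurse.
            _, value, pos = _parse_call(text, pos)
        else:
            end = pos
            while end < len(text) and text[end] not in ",)":
                end += 1
            value = unquote(text[pos:end])  # undo percent-encoding
            pos = end
        fields[key] = value
        if pos >= len(text):
            raise ValueError("unbalanced parentheses")
        if text[pos] == ")":
            return kind, fields, pos + 1
        pos += 1  # skip the comma separating two pairs


def parse_fragment(fragment):
    """Return (kind, fields) for a fragment written in the functional syntax."""
    kind, fields, pos = _parse_call(fragment, 0)
    if pos != len(fragment):
        raise ValueError("unexpected text after the closing parenthesis")
    return kind, fields
```

Note that every leaf value comes back as a string, so numeric keys such as start and end would still need converting by the caller.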
The reference to the Web Publication Resources should be changed to use the short name of the WP specification, as opposed to the editor's draft like now.
The values SHOULD be percent encoded [ rfc3986 ]; the encoding is a MUST for characters that may make the fragment ambiguous, namely:
| character | code |
|---|---|
| space | %20 |
| = | %3D |
| , | %2C |
| # | %23 |
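A minimal Python sketch of this encoding step follows. It escapes only the four characters listed above, plus a literal "%" so that decoding stays unambiguous; the latter is an assumption of this sketch, as are the function names encode_value and decode_value.

```python
from urllib.parse import unquote

# The four mandatory characters from the table, plus "%" itself (an assumption).
_ESCAPES = str.maketrans({
    "%": "%25",
    " ": "%20",
    "=": "%3D",
    ",": "%2C",
    "#": "%23",
})


def encode_value(value):
    """Percent-encode the characters that would make a fragment ambiguous."""
    return value.translate(_ESCAPES)


def decode_value(encoded):
    """Reverse encode_value (and any other percent-encoding present)."""
    return unquote(encoded)
```

For instance, encode_value("t=30,60") gives t%3D30%2C60. An implementation that prefers to encode more aggressively could use urllib.parse.quote instead, since the text above only recommends, and does not limit, percent-encoding of other characters.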
A fragment identifier is defined for a specific media type. This means that, formally, the fragment identifier syntax and semantics defined in this section should be registered with IANA for each media type separately. Until such a registration is done, these fragment identifiers have the potential to conflict with other fragments possibly specified by the media type registrations. Consequently, this pattern should only be used when the implementation cannot produce or manage the full representation described above.
This section contains a mapping of all examples used in the definition of Selectors and States onto full URLs with fragment identifiers. Note that the examples below have been broken into several lines for readability; in real usage such line breaks are not allowed in a URL.
Example for a 4.1 Fragment Selector
http://example.org/video1#selector(
type=FragmentSelector,
conformsTo=http://www.w3.org/TR/media-frags,
value=t%3D30%2C60
)
Example for a 4.2 CSS Selector
https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html#selector(
type=CssSelector,
scope=https://dauwhe.github.io/html-first/MobyDick.wpub,
value=%23elemid%20>%20.elemclass%20+%20p
)
Example for a 4.3 XPath Selector
http://example.org/page1.html#selector(
type=XPathSelector,
value=/html/body/p[2]/table/tr[2]/td[3]/span
)
Example for a 4.4 Text Quote Selector
http://example.org/page1#selector(
type=TextQuoteSelector,
exact=annotation,
prefix=this%20is%20an%20,
suffix=%20that%20has%20some
)
Example for a 4.5 Text Position Selector
http://example.org/ebook1#selector(
type=TextPositionSelector,
start=412,
end=795
)
Example for a 4.6 Data Position Selector
http://example.org/diskimg1#selector(
type=DataPositionSelector,
start=4096,
end=4104
)
First example for a 4.7 SVG Selector
http://example.org/map1#selector(
type=SvgSelector,
id=http://example.org/svg1
)
Second example for a 4.7 SVG Selector
http://example.org/map1#selector(
type=SvgSelector,
value=<svg:svg>%20…%20</svg:svg>
)
Please note that long SVG representations yield very long URLs when serialized according to this pattern. Care should be taken in environments that limit the length of URLs, and implementers should consider publishing the SVG as a separate resource and using its URL as shown in Example 22.
Example for a 4.8 Range Selector
https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html#selector(
type=RangeSelector,
startSelector=selector(
type=TextQuoteSelector,
exact=Call%20me%20Ishmael.,
suffix=Some%20years%20ago
),
endSelector=selector(
type=TextQuoteSelector,
exact=He%20desires%20to%20paint%20you%20the%20dreamiest%2C%20,
prefix=But%20here%20is%20an%20artist.%20,
suffix=shadiest%2C%20quietest
)
)
Example for a 4.9 Multi Resource Selector
https://dauwhe.github.io/html-first/MobyDick.wpub#selector(
type=MultiResourceSelector,
selector(
source=https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html,
type=TextQuoteSelector,
exact=Call%20me%20Ishmael.,
suffix=Some%20years%20ago
),
selector(
source=https://dauwhe.github.io/html-first/MobyDickNav/html/c002.html
),
selector(
source=https://dauwhe.github.io/html-first/MobyDickNav/html/c003.html,
type=TextQuoteSelector,
exact=The%20opposite%20wall%20of%20this%20entry,
suffix=%20was%20hung
)
)
Example for an Embedded Resource Selector
https://dauwhe.github.io/html-first/MobyDick.wpub#selector(
type=EmbeddedResourceSelector,
value=https://dauwhe.github.io/html-first/MobyDickNav/images/moby-dick-book-cover.jpg
)
Example for a 4.11 Refinement of Selection
http://example.org/page1#selector(
type=FragmentSelector,
value=para5,
refinedBy=selector(
type=TextQuoteSelector,exact=Selected%20Text,
prefix=text%20before%20the%20,
suffix=%20and%20text%20after%20it
)
)
Example for a 4.11 Refinement of Selection
https://dauwhe.github.io/html-first/MobyDick.wpub#selector(
type=RangeSelector,
startSelector=selector(
type=EmbeddedResourceSelector,
value=https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html,
refinedBy=selector(
type=TextQuoteSelector,
exact=Call%20me%20Ishmael.
)
),
endSelector=selector(
type=EmbeddedResourceSelector,
value=https://dauwhe.github.io/html-first/MobyDickNav/html/c002.html,
refinedBy=selector(
type=TextQuoteSelector,
exact=A%20hundred%20black%20faces%20turned%20around
)
)
)
Example for a 5.1 Time State
http://example.org/page1#state(
type=TimeState,
cached=http://archive.example.org/copy1,
sourceDate=2015-07-20T13:30:00Z
)
Example for a 5.2 Request Header State
http://example.org/resource1#state(
type=HttpRequestState,
value=Accept:%20application/pdf
)
Example for a 5.3 Refinement of State
http://example.org/ebook1#state(
type=TimeState,sourceDate=2016-02-01T12:05:23Z,
refinedBy=state(
type=HttpRequestState,
value=Accept:%20application/epub+zip
)
)
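The examples above can also be generated mechanically from the JSON form. The Python sketch below shows one way to do so; the names to_url and _serialize are hypothetical, the decision between selector(…) and state(…) for nested values relies on the assumption that State type names end in "State", and the locators list of the Multi Resource Selector is not covered.

```python
# Characters whose percent-encoding is mandatory in this mapping.
_ESCAPES = str.maketrans({" ": "%20", "=": "%3D", ",": "%2C", "#": "%23"})
_NESTED_KEYS = ("refinedBy", "startSelector", "endSelector")


def _serialize(node, wrapper):
    """Serialize one Selector or State dict as wrapper(key=value,...)."""
    parts = []
    for key, value in node.items():
        if key in _NESTED_KEYS:
            # Assumption: State classes are recognizable by their type name.
            is_state = str(value.get("type", "")).endswith("State")
            inner = "state" if is_state else "selector"
            parts.append(f"{key}={_serialize(value, inner)}")
        else:
            parts.append(f"{key}={str(value).translate(_ESCAPES)}")
    return f"{wrapper}({','.join(parts)})"


def to_url(locator):
    """Map a Locator carrying one selector or one state onto a single URL."""
    source = locator["source"]
    if "#" in source:
        raise ValueError("source already has a fragment identifier")
    if "selector" in locator:
        return f"{source}#{_serialize(locator['selector'], 'selector')}"
    return f"{source}#{_serialize(locator['state'], 'state')}"
```

Applied to the JSON form of the Refinement of State example, to_url returns the URL shown above, joined into a single line without line breaks.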
Not all Selectors are relevant for all media types; some combinations are meaningless or not formally defined. An implementation may therefore ignore certain types of Selectors in case the corresponding media types are not handled by that particular implementation.
The table below shows the correspondence among the main media types addressed in this specification and Selector types. The meaning of the table elements, and their effect on implementation conformance, is as follows.
| | Fragment | CSS | XPath | Text Quote | Text Position | Data Position | SVG |
|---|---|---|---|---|---|---|---|
| HTML (text/html) | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✘ | ✘ |
| CSV (text/csv) | ✔︎ | ✘ | ✘ | ✔︎ | ✔︎ | ✘ | ✘ |
| Plain Text (text/plain) | ✔︎ | ✘ | ✘ | ✔︎ | ✔︎ | ✘ | ✘ |
| Other text files (text/*) | ? | ✘ | ✘ | ✔︎ | ✔︎ | ✘ | ✘ |
| PDF (application/pdf) | ✔︎ | ✘ | ✘ | ✔︎ | ✔︎ | ✘ | ✘ |
| XML (application/xml, application/*+xml) | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✘ | ✘ |
| SVG (image/svg+xml) | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✔︎ | ✘ | ✔︎ |
| Image, other than SVG (image/gif, image/jpeg, image/png, image/tiff) | ✔︎ | ✘ | ✘ | ✘ | ✘ | ? | ✔︎ |
| Video (video/*) | ✔︎ | ✘ | ✘ | ✘ | ✘ | ? | ✔︎ |
| Binary Data Files | ? | ✘ | ✘ | ✘ | ✘ | ✔︎ | ✘ |
This section is non-normative.
The table below contains some other, possible combinations of media types and selector types, which MAY be implemented but are not mandated by this specification. Some of these combinations may also form the basis for defining new, implementation-specific selector extensions.
| | Fragment | CSS | XPath | Text Quote | Text Position | Data Position | SVG |
|---|---|---|---|---|---|---|---|
| CSS (text/css) | ✘ | ✘ | ✘ | ✔︎ | ✔︎ | ✘ | ✘ |
| TSV (text/tab-separated-values) | ✔︎ ✝ | ✘ | ✘ | ✔︎ | ✔︎ | ✘ | ✘ |
| JSON (application/json, application/*+json) | ✘ | ✘ | ✘ | ✔︎ | ? | ✘ | ✘ |
| Programming languages (application/javascript, python files, etc.) | ✘ | ✘ | ✘ | ✔︎ | ? | ✘ | ✘ |
✝ Fragments are not formally defined through IETF, though there are well-known connections to existing fragments or practices.
This document has been derived from the “Selectors and States” [ selectors-states ] Working Group Note, published by the Web Annotation Working Group. That Note is based on the formal specification and the semantics of a separate W3C Recommendation, namely the Web Annotation Data Model [ annotation-model ], where it is used to select targets of annotations. This document introduces some changes as follows.
Editorial Changes
Non-editorial Changes
The scope relationship has been re-introduced. (This relationship is part of the “Web Annotation Data Model” [ annotation-model ] but was not retained in the “Selectors and States” [ selectors-states ] Note.)
| Term | Usage |
|---|---|
| cached | Time State |
| conformsTo | Fragment Selector |
| end | Text Position Selector, Data Position Selector |
| endSelector | Range Selector |
| exact | Text Quote Selector |
| locators | Multi Resource Selector |
| prefix | Text Quote Selector |
| refinedBy | Selector, State |
| selector | Locator |
| scope | Locator |
| source | Locator |
| sourceDate | Time State |
| sourceDateEnd | Time State |
| sourceDateStart | Time State |
| start | Text Position Selector, Data Position Selector |
| startSelector | Range Selector |
| state | Locator |
| suffix | Text Quote Selector |
| type | Locator, Fragment Selector, CSS Selector, XPath Selector, Text Quote Selector, Text Position Selector, Data Position Selector, SVG Selector, Embedded Resource Selector, Multi Resource Selector, Time State, Request Header State |
| value | Fragment Selector, CSS Selector, SVG Selector, XPath Selector, Embedded Resource Selector, Request Header State |
Just a placeholder for now; once the specification is, overall, complete, a section should come to summarize the exact relationships to the Web Annotation Model, the relationships in the conformance of the two standards, etc.
This section is non-normative.
The following people contributed to the development of this specification:
The Working Group would also like to thank the members of the Digital Publishing Interest Group for all the hard work they did paving the road for this specification.