web: add support for URI-reference

Based on a patch by Daniel Hartwig <mandyke@gmail.com>.

* NEWS: Update.
* doc/ref/web.texi (URIs): Fragments are properly part of a URI, so
  remove the incorrect note.  Add documentation on URI subtypes.
* module/web/uri.scm (uri-reference?): New base type predicate.
  (uri?, relative-ref?): Specific predicates.
  (validate-uri-reference): Strict validation.
  (validate-uri, validate-relative-ref): Specific validators.
  (build-uri-reference, build-relative-ref): New constructors.
  (string->uri-reference): Rename from string->uri.
  (string->uri, string->relative-ref): Specific constructors.
  (uri->string): Add #:include-fragment? keyword argument.
* module/web/http.scm (parse-request-uri): Use `build-uri-reference',
  and the result is a URI-reference object, not a URI.  No longer infer
  that an absent `uri-scheme' means `http'.
  (write-uri): Just use `uri->string'.
  (declare-uri-header!): Remove unused function.
  (declare-uri-reference-header!): Update.  Rename from
  `declare-relative-uri-header!'.
* test-suite/tests/web-uri.test ("build-uri-reference"):
  ("string->uri-reference"): Add.
  ("uri->string"): Also tests for relative-refs.
* test-suite/tests/web-http.test ("read-request-line"):
  ("write-request-line"): Update for no scheme in some URIs.
  ("entity headers", "request headers"): Content-location, Referer, and
  Location should also parse relative-URIs.
* test-suite/tests/web-request.test ("example-1"): Expect URI-reference
  with no scheme.
Andy Wingo 2017-05-21 11:56:59 +02:00
commit 7095a536f3
9 changed files with 340 additions and 148 deletions


@ -173,23 +173,13 @@ Guile provides a standard data type for Universal Resource Identifiers
The generic URI syntax is as follows:
@example
URI := scheme ":" ["//" [userinfo "@@"] host [":" port]] path \
[ "?" query ] [ "#" fragment ]
URI-reference := [scheme ":"] ["//" [userinfo "@@"] host [":" port]] path \
[ "?" query ] [ "#" fragment ]
@end example
For example, in the URI, @indicateurl{http://www.gnu.org/help/}, the
scheme is @code{http}, the host is @code{www.gnu.org}, the path is
@code{/help/}, and there is no userinfo, port, query, or fragment. All
URIs have a scheme and a path (though the path might be empty). Some
URIs have a host, and some of those have ports and userinfo. Any URI
might have a query part or a fragment.
There is also a ``URI-reference'' data type, which is the same as a URI
but where the scheme is optional. In this case, the scheme is taken to
be relative to some other related URI. A common use of URI references
is when you want to be vague regarding the choice of HTTP or HTTPS --
serving a web page referring to @code{/foo.css} will use HTTPS if loaded
over HTTPS, or HTTP otherwise.
@code{/help/}, and there is no userinfo, port, query, or fragment.
Userinfo is something of an abstraction, as some legacy URI schemes
allowed userinfo of the form @code{@var{username}:@var{passwd}}. But
@ -197,14 +187,6 @@ since passwords do not belong in URIs, the RFC does not want to condone
this practice, so it calls anything before the @code{@@} sign
@dfn{userinfo}.
Properly speaking, a fragment is not part of a URI. For example, when a
web browser follows a link to @indicateurl{http://example.com/#foo}, it
sends a request for @indicateurl{http://example.com/}, then looks in the
resulting page for the fragment identified @code{foo} reference. A
fragment identifies a part of a resource, not the resource itself. But
it is useful to have a fragment field in the URI record itself, so we
hope you will forgive the inconsistency.
@example
(use-modules (web uri))
@end example
@ -213,40 +195,36 @@ The following procedures can be found in the @code{(web uri)}
module. Load it into your Guile, using a form like the above, to have
access to them.
The most common way to build a URI from Scheme is with the
@code{build-uri} function.
@deffn {Scheme Procedure} build-uri scheme @
[#:userinfo=@code{#f}] [#:host=@code{#f}] [#:port=@code{#f}] @
[#:path=@code{""}] [#:query=@code{#f}] [#:fragment=@code{#f}] @
[#:validate?=@code{#t}]
Construct a URI object. @var{scheme} should be a symbol, @var{port}
either a positive, exact integer or @code{#f}, and the rest of the
fields are either strings or @code{#f}. If @var{validate?} is true,
also run some consistency checks to make sure that the constructed URI
is valid.
Construct a URI. @var{scheme} should be a symbol, @var{port} either a
positive, exact integer or @code{#f}, and the rest of the fields are
either strings or @code{#f}. If @var{validate?} is true, also run some
consistency checks to make sure that the constructed URI is valid.
@end deffn
@deffn {Scheme Procedure} build-uri-reference [#:scheme=@code{#f}]@
[#:userinfo=@code{#f}] [#:host=@code{#f}] [#:port=@code{#f}] @
[#:path=@code{""}] [#:query=@code{#f}] [#:fragment=@code{#f}] @
[#:validate?=@code{#t}]
Like @code{build-uri}, but with an optional scheme.
@end deffn
In Guile, both URI and URI reference data types are represented in the
same way, as URI objects.
@deffn {Scheme Procedure} uri? obj
@deffnx {Scheme Procedure} uri-scheme uri
Return @code{#t} if @var{obj} is a URI.
@end deffn
In Guile, URIs are represented as URI records, with a number of
associated accessors.
@deffn {Scheme Procedure} uri-scheme uri
@deffnx {Scheme Procedure} uri-userinfo uri
@deffnx {Scheme Procedure} uri-host uri
@deffnx {Scheme Procedure} uri-port uri
@deffnx {Scheme Procedure} uri-path uri
@deffnx {Scheme Procedure} uri-query uri
@deffnx {Scheme Procedure} uri-fragment uri
A predicate and field accessors for the URI record type. The URI scheme
will be a symbol, or @code{#f} if the object is a URI reference but not
a URI. The port will be either a positive, exact integer or @code{#f},
and the rest of the fields will be either strings or @code{#f} if not
present.
Field accessors for the URI record type. The URI scheme will be a
symbol, or @code{#f} if the object is a relative-ref (see below). The
port will be either a positive, exact integer or @code{#f}, and the rest
of the fields will be either strings or @code{#f} if not present.
@end deffn
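
As a brief illustration of the accessors, the following sketch rebuilds
the @indicateurl{http://www.gnu.org/help/} example from its components
and reads the fields back.  The variable name @code{gnu-help} is only
illustrative.

@example
(use-modules (web uri))

;; Build the example URI from its components.
(define gnu-help
  (build-uri 'http #:host "www.gnu.org" #:path "/help/"))

(uri-scheme gnu-help)    ; the symbol http
(uri-host gnu-help)      ; "www.gnu.org"
(uri-path gnu-help)      ; "/help/"
(uri-port gnu-help)      ; #f, as no port was given
(uri-fragment gnu-help)  ; #f, as there is no fragment
@end example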
@deffn {Scheme Procedure} string->uri string
@ -254,15 +232,11 @@ Parse @var{string} into a URI object. Return @code{#f} if the string
could not be parsed.
@end deffn
@deffn {Scheme Procedure} string->uri-reference string
Parse @var{string} into a URI object, while not requiring a scheme.
Return @code{#f} if the string could not be parsed.
@end deffn
@deffn {Scheme Procedure} uri->string uri
@deffn {Scheme Procedure} uri->string uri [#:include-fragment?=@code{#t}]
Serialize @var{uri} to a string. If the URI has a port that is the
default port for its scheme, the port is not included in the
serialization.
serialization. If @var{include-fragment?} is given as false, the
resulting string will omit the fragment (if any).
@end deffn
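
The following sketch shows both serialization behaviors on a URI whose
port is the default for its scheme.  The host and the variable name
@code{page} are chosen only for illustration.

@example
(use-modules (web uri))

;; Port 80 is the default port for the http scheme.
(define page
  (build-uri 'http #:host "example.com" #:port 80
             #:path "/index.html" #:fragment "foo"))

;; The default port is left out of the serialization.
(uri->string page)
;; "http://example.com/index.html#foo"

;; Ask for the fragment to be dropped as well.
(uri->string page #:include-fragment? #f)
;; "http://example.com/index.html"
@end example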
@deffn {Scheme Procedure} declare-default-port! scheme port
@ -323,6 +297,70 @@ For example, the list @code{("scrambled eggs" "biscuits&gravy")} encodes
as @code{"scrambled%20eggs/biscuits%26gravy"}.
@end deffn
@subsubheading Subtypes of URI
As we noted above, not all URI objects have a scheme. You might have
noted in the ``generic URI syntax'' example that the left-hand side of
that grammar definition was URI-reference, not URI. A
@dfn{URI-reference} is a generalization of a URI where the scheme is
optional. If no scheme is specified, it is taken to be relative to some
other related URI. A common use of URI references is when you want to
be vague regarding the choice of HTTP or HTTPS -- serving a web page
referring to @code{/foo.css} will use HTTPS if loaded over HTTPS, or
HTTP otherwise.
@deffn {Scheme Procedure} build-uri-reference [#:scheme=@code{#f}]@
[#:userinfo=@code{#f}] [#:host=@code{#f}] [#:port=@code{#f}] @
[#:path=@code{""}] [#:query=@code{#f}] [#:fragment=@code{#f}] @
[#:validate?=@code{#t}]
Like @code{build-uri}, but with an optional scheme.
@end deffn
@deffn {Scheme Procedure} uri-reference? obj
Return @code{#t} if @var{obj} is a URI-reference. This is the most
general URI predicate, as it includes not only full URIs that have
schemes (those that match @code{uri?}) but also URIs without schemes.
@end deffn
It's also possible to build a @dfn{relative-ref}: a URI-reference that
explicitly lacks a scheme.
@deffn {Scheme Procedure} build-relative-ref @
[#:userinfo=@code{#f}] [#:host=@code{#f}] [#:port=@code{#f}] @
[#:path=@code{""}] [#:query=@code{#f}] [#:fragment=@code{#f}] @
[#:validate?=@code{#t}]
Like @code{build-uri}, but with no scheme.
@end deffn
@deffn {Scheme Procedure} relative-ref? obj
Return @code{#t} if @var{obj} is a ``relative-ref'': a URI-reference
that has no scheme. Every URI-reference will either match @code{uri?}
or @code{relative-ref?} (but not both).
@end deffn
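
As a small sketch of how these constructors and predicates fit
together, the example below builds the @code{/foo.css} reference
mentioned above, along with a URI-reference that does carry a scheme.
The names @code{css} and @code{full} and the host are illustrative
only.

@example
(use-modules (web uri))

;; A relative-ref: a path with no scheme.
(define css (build-relative-ref #:path "/foo.css"))

(relative-ref? css)    ; #t
(uri-reference? css)   ; #t
(uri-scheme css)       ; #f
(uri->string css)      ; "/foo.css"

;; A URI-reference that happens to have a scheme.
(define full
  (build-uri-reference #:scheme 'https #:host "example.com"
                       #:path "/foo.css"))

(relative-ref? full)   ; #f
(uri->string full)     ; "https://example.com/foo.css"
@end example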
In case it's not clear from the above, the most general of these URI
types is the URI-reference, with @code{build-uri-reference} as the most
general constructor. @code{build-uri} and @code{build-relative-ref}
enforce specific restrictions on the URI-reference. The most
generic URI parser is then @code{string->uri-reference}, and there is
also a parser for when you know that you want a relative-ref.
@deffn {Scheme Procedure} string->uri-reference string
Parse @var{string} into a URI object, while not requiring a scheme.
Return @code{#f} if the string could not be parsed.
@end deffn
@deffn {Scheme Procedure} string->relative-ref string
Parse @var{string} into a URI object, while asserting that no scheme is
present. Return @code{#f} if the string could not be parsed.
@end deffn
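
To make the difference between the parsers concrete, here is a short
sketch that runs them over a string with no scheme and a string with
one.  The input strings and the name @code{ref} are illustrative only.

@example
(use-modules (web uri))

;; The general parser accepts a string with no scheme.
(define ref (string->uri-reference "/foo.css?v=2"))

(uri-scheme ref)   ; #f
(uri-path ref)     ; "/foo.css"
(uri-query ref)    ; "v=2"

;; string->uri insists on a scheme, so this parse fails.
(string->uri "/foo.css")                      ; #f

;; string->relative-ref insists that there is no scheme.
(relative-ref? (string->relative-ref "/foo.css"))   ; #t
(string->relative-ref "http://www.gnu.org/")  ; #f
@end example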
For compatibility reasons, note that @code{uri?} will return @code{#t}
for all URI objects, even relative-refs. In contrast, @code{build-uri}
and @code{string->uri} require that the resulting URI not be a
relative-ref. As a predicate to distinguish relative-refs from proper
URIs (in the language of RFC 3986), use something like @code{(and
(uri-reference? @var{x}) (not (relative-ref? @var{x})))}.
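
The predicate suggested above could be wrapped in a helper along these
lines.  The name @code{proper-uri?} is hypothetical and is not exported
by @code{(web uri)}.

@example
(use-modules (web uri))

;; Hypothetical helper: true only for URI objects that have a scheme.
(define (proper-uri? x)
  (and (uri-reference? x)
       (not (relative-ref? x))))

(proper-uri? (string->uri-reference "http://www.gnu.org/"))  ; #t
(proper-uri? (string->uri-reference "/foo.css"))             ; #f
(proper-uri? "not a URI object")                             ; #f
@end example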
@node HTTP
@subsection The Hyper-Text Transfer Protocol