Segments

Hierarchical schemes often interpret the path as a slash-delimited sequence of percent-encoded strings called segments. In this library the segments may be accessed using these separate, bidirectional view types which reference the underlying URL:

Type                    Accessor            Description

segments_view           segments            A read-only range of decoded segments.
segments_ref            segments            A modifiable range of decoded segments.
segments_encoded_view   encoded_segments    A read-only range of segments.
segments_encoded_ref    encoded_segments    A modifiable range of segments.

First we observe these invariants about paths and segments:

  • All URLs have a path

  • A path is absolute, relative, or empty

  • Paths starting with "/" are absolute

  • A relative path can never follow an authority

  • Every path maps to a unique range of segments, plus a bool indicating if the path is absolute.

  • Every range of segments, plus a bool indicating if the path is absolute, maps to a unique path.

The following URL contains a path with three segments: "path", "to", and "file.txt":

http://www.example.com/path/to/file.txt

To understand the relationship between the path and segments, we define this function segs which returns a list of strings corresponding to the elements in a container of segments:

auto segs( core::string_view s ) -> std::list< std::string >
{
    url_view u( s );
    std::list< std::string > seq;
    for( auto seg : u.encoded_segments() )
        seq.push_back( seg.decode() );
    return seq;
}

In this table we show the result of invoking segs with different paths. This demonstrates how the library achieves the invariants described above for various interesting cases:

s                        segs( s )                       absolute

""                       { }
"/"                      { }                             yes
"./"                     { "" }
"usr"                    { "usr" }
"./usr"                  { "usr" }
"/index.htm"             { "index.htm" }                 yes
"/images/cat-pic.gif"    { "images", "cat-pic.gif" }     yes
"images/cat-pic.gif"     { "images", "cat-pic.gif" }
"/fast//query"           { "fast", "", "query" }         yes
"fast//"                 { "fast", "", "" }
"/./"                    { "" }                          yes
".//"                    { "", "" }
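
The mapping in the table can be modelled without the library. The following is a minimal sketch, not the library's implementation: split_path is a hypothetical helper that works on the encoded path only, performs no validation or percent-decoding, and was written solely to reproduce the rows above.

```cpp
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper, not part of the library: models the table above.
// Returns the encoded segments of `path` plus a flag telling whether the
// path is absolute. No percent-decoding or validation is performed.
std::pair<std::vector<std::string>, bool>
split_path(std::string_view path)
{
    bool const absolute = !path.empty() && path.front() == '/';

    // Strip the prefix that is not part of any segment:
    // "/" for absolute paths, then an optional "./" guard.
    std::string_view rest = path;
    if (absolute)
        rest.remove_prefix(1);
    bool dot_prefix = false;
    if (rest.substr(0, 2) == "./")
    {
        rest.remove_prefix(2);
        dot_prefix = true;
    }

    std::vector<std::string> segs;
    // "" and "/" have no segments; "./" and "/./" have one empty segment.
    if (rest.empty() && !dot_prefix)
        return {segs, absolute};

    // Split the remainder on every '/', keeping empty segments.
    for (;;)
    {
        auto const pos = rest.find('/');
        segs.emplace_back(rest.substr(0, pos));
        if (pos == std::string_view::npos)
            break;
        rest.remove_prefix(pos + 1);
    }
    return {segs, absolute};
}
```

Under this model, split_path( "./usr" ) and split_path( "usr" ) both return { "usr" } with the absolute flag unset, while split_path( "/./" ) returns { "" } with the flag set.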

This implies that two paths may map to the same sequence of segments. In the paths "usr" and "./usr", the "./" is a prefix that might be necessary to maintain the invariant that instances of url_view_base always refer to valid URLs. Thus, both paths map to { "usr" }. On the other hand, each sequence determines a unique path for a given URL. For instance, setting the segments to { "a" } would always map to either "./a" or "a", depending on whether the "." prefix is necessary to keep the URL valid.

Iterating a sequence skips a leading "." that is present only to keep the URL valid. Thus, when we assign { "x", "y", "z" } to segments, the sequence always contains { "x", "y", "z" } after that. It never contains { ".", "x", "y", "z" }, even when a "." prefix had to be written into the path string. In other words, the contents of the segment container are authoritative, and the path string is a function of them, not vice versa.
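
To illustrate the direction from segments to path string, here is a minimal sketch under stated assumptions. join_path is a hypothetical helper, not a library function. It assumes a URL without an authority, and it handles only the case of an empty first segment. It does not handle a first segment that is literally "." or one that contains a colon.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of the library: builds a path string from
// encoded segments for a URL that has no authority. Only the empty-first-
// segment case is guarded; a first segment of "." or one containing ':'
// is outside the scope of this sketch.
std::string
join_path(std::vector<std::string> const& segs, bool absolute)
{
    std::string path = absolute ? "/" : "";
    if (segs.empty())
        return path;

    // Without a guard, an empty first segment would produce a path that
    // starts with "//" (read as an authority), starts with "/" (read as
    // absolute), or disappears entirely. The "." written here is a prefix,
    // not a segment.
    if (segs.front().empty())
        path += "./";

    for (std::size_t i = 0; i < segs.size(); ++i)
    {
        if (i != 0)
            path += '/';
        path += segs[i];
    }
    return path;
}
```

With these assumptions, the absolute sequence { "", "foo", "fighters.txt" } yields "/.//foo/fighters.txt", while { "x", "y", "z" } yields "x/y/z" with no prefix. In both cases the segments passed in are exactly the segments the path represents.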

Library algorithms which modify individual segments of the path, or set the entire path, attempt to behave as if the operation were performed on the equivalent sequence. If a path maps, say, to the three-element sequence { "a", "b", "c" }, then erasing the middle segment should result in the sequence { "a", "c" }. The library always strives to do exactly what the caller requests; however, in some cases this would result in either an invalid URL or a dramatic and unwanted change in the URL's semantics.

For example, consider the following URL:

url u = url().set_path( "kyle:xy" );

The library produces the URL string "kyle%3Axy" and not "kyle:xy", because the latter would be parsed as having the unintended scheme "kyle". The table below demonstrates the results achieved by performing various modifications to a URL containing a path:

URL                                  Operation                             Result

"info:kyle:xy"                       remove_scheme()                       "kyle%3Axy"
"kyle%3Axy"                          set_scheme( "gopher" )                "gopher:kyle:xy"
"http://www.example.com//kyle:xy"    remove_authority()                    "http:/.//kyle:xy"
"//www.example.com//kyle:xy"         remove_authority()                    "/.//kyle:xy"
"http://www.example.com//kyle:xy"    remove_origin()                       "/.//kyle:xy"
"info:kyle:xy"                       remove_origin()                       "kyle%3Axy"
"/kyle:xy"                           set_path_absolute( false )            "kyle%3Axy"
"kyle%3Axy"                          set_path_absolute( true )             "/kyle:xy"
""                                   set_path( "kyle:xy" )                 "kyle%3Axy"
""                                   set_path( "//foo/fighters.txt" )      "/.//foo/fighters.txt"
"my%3Asharona/billa%3Abong"          normalize()                           "my%3Asharona/billa:bong"
"./my:sharona"                       normalize()                           "my%3Asharona"

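The colon handling can be sketched in isolation. In RFC 3986, the first segment of a relative-path reference cannot contain a colon, since the text before the colon would be read as a scheme. The helper below, encode_first_segment_colons, is hypothetical and is not a library function. It only models how a relative path can be kept unambiguous when no scheme is present, and it makes no claim about how the library implements this.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper, not part of the library: percent-encodes every ':'
// in the first segment of a relative path so that a URL without a scheme
// cannot be misread as "scheme:rest".
std::string
encode_first_segment_colons(std::string_view path)
{
    // An absolute path starts with '/', so no scheme can be inferred from it.
    if (path.empty() || path.front() == '/')
        return std::string(path);

    auto const first_len = path.find('/'); // npos if there is one segment
    std::string_view const first = path.substr(0, first_len);

    std::string out;
    for (char c : first)
    {
        if (c == ':')
            out += "%3A";
        else
            out += c;
    }
    // Later segments may keep their colons.
    if (first_len != std::string_view::npos)
        out.append(path.substr(first_len));
    return out;
}
```

In this sketch "kyle:xy" becomes "kyle%3Axy", "/kyle:xy" is returned unchanged, and "my:sharona/billa:bong" becomes "my%3Asharona/billa:bong", since only the first segment is at risk of being mistaken for a scheme.
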
For the full set of containers and functions for operating on paths and segments, please consult the reference.