URLencode: Encode or Decode (partial) URLs

Description

Functions to percent-encode or decode characters in URLs.

Usage

URLencode(URL, reserved = FALSE, repeated = FALSE)
URLdecode(URL)

Arguments

URL

a character vector.

reserved

logical: should ‘reserved’ characters be encoded? See ‘Details’.

repeated

logical: should apparently already-encoded URLs be encoded again? See ‘Details’.

Details

Characters in a URL other than the English alphanumeric characters and - _ . ~ should be encoded as % followed by a two-digit hexadecimal representation of the character's byte value, and any single-byte character can be encoded this way. Multi-byte characters are encoded byte by byte, giving one %xx triplet per byte. The standard refers to this as ‘percent-encoding’.
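A byte-level sketch of both directions in base R may make the rule above concrete. It assumes the text is UTF-8, and pct_encode_sketch and pct_decode_sketch are hypothetical names for illustration, not the code used by utils. The encoder escapes reserved characters too, so for ASCII input it should agree with URLencode(x, reserved = TRUE).

```r
# Hypothetical helpers for illustration; not the utils implementation.
# Assumption: text is treated as UTF-8, so a multi-byte character
# becomes one %XX triplet per byte. Both take a single string.

pct_encode_sketch <- function(s) {
  bytes <- as.integer(charToRaw(enc2utf8(s)))
  # Unreserved set: 0-9, A-Z, a-z and - . _ ~ are kept as they are
  keep <- (bytes >= 0x30 & bytes <= 0x39) |
          (bytes >= 0x41 & bytes <= 0x5A) |
          (bytes >= 0x61 & bytes <= 0x7A) |
          bytes %in% c(0x2D, 0x2E, 0x5F, 0x7E)
  # Every other byte (reserved characters included) becomes %XX
  out <- ifelse(keep, intToUtf8(bytes, multiple = TRUE),
                sprintf("%%%02X", bytes))
  paste0(out, collapse = "")
}

pct_decode_sketch <- function(s) {
  bytes <- charToRaw(s)
  n <- length(bytes)
  out <- raw(0)
  i <- 1L
  while (i <= n) {
    # The two bytes after position i, if there are two of them
    hex <- if (i + 2L <= n) rawToChar(bytes[i + 1:2]) else ""
    if (bytes[i] == as.raw(0x25) &&
        grepl("^[[:xdigit:]]{2}$", hex, useBytes = TRUE)) {
      # '%' plus two hex digits -> one byte (a %00 would make rawToChar fail)
      out <- c(out, as.raw(strtoi(hex, base = 16L)))
      i <- i + 3L
    } else {
      out <- c(out, bytes[i])  # anything else is copied through
      i <- i + 1L
    }
  }
  res <- rawToChar(out)
  Encoding(res) <- "UTF-8"  # assumption: the decoded bytes are UTF-8
  res
}

pct_encode_sketch("a b")           # "a%20b"
pct_encode_sketch("\u00e9")        # "%C3%A9" (two bytes in UTF-8)
pct_decode_sketch("ab%20cd")       # "ab cd"
```

Because the decoder only rewrites a % that is followed by two hexadecimal digits, a stray % (as in "100% sure") passes through unchanged.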

In addition, the characters ! $ & ' ( ) * + , ; = : / ? @ # [ ] are reserved: they should be encoded unless used in their reserved sense, which is scheme-specific. By default, URLencode leaves them alone, which is appropriate for file:// URLs but probably not for http:// ones.
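To see why the default is probably not right for http:// URLs, consider a query value that contains &. The base URL and search term below are placeholders chosen for illustration.

```r
# A placeholder base URL and search term (assumptions for illustration).
base <- "http://example.com/search"
term <- "fish & chips"

# Default reserved = FALSE: the spaces are encoded but '&' is kept, so a
# server would likely read it as the start of another query parameter.
loose <- paste0(base, "?q=", URLencode(term))
# reserved = TRUE also encodes '&' (as %26), keeping the value intact.
strict <- paste0(base, "?q=", URLencode(term, reserved = TRUE))

loose   # "http://example.com/search?q=fish%20&%20chips"
strict  # "http://example.com/search?q=fish%20%26%20chips"
```

Only the value is passed with reserved = TRUE; applying it to the whole URL would also encode the : and / that separate the scheme and path.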

An ‘apparently already-encoded URL’ is one containing %xx, where xx is a pair of hexadecimal digits.
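The definition above can be expressed as a one-line regular-expression test. looks_encoded is a hypothetical helper written from that definition, not a function exported by utils.

```r
# Hypothetical helper mirroring the definition: TRUE when the string
# contains '%' followed by two hexadecimal digits.
looks_encoded <- function(x) grepl("%[[:xdigit:]]{2}", x, useBytes = TRUE)

looks_encoded(c("ab%20cd", "100% sure", "ab cd", "12%ABV beer"))
# TRUE FALSE FALSE TRUE
```

The last input shows the limit of the heuristic: "12%ABV beer" is raw text, but %AB happens to look like an escape, so by this definition it counts as already encoded and, with repeated = FALSE, would not be encoded again despite its space.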

Value

A character vector.

References

Internet STD 66 (currently RFC 3986), https://tools.ietf.org/html/std66

Examples

(y <- URLencode("a url with spaces and / and @"))
URLdecode(y)
(y <- URLencode("a url with spaces and / and @", reserved = TRUE))
URLdecode(y)

URLdecode(z <- "ab%20cd")
c(URLencode(z), URLencode(z, repeated = TRUE)) # first is usually wanted

## both functions support character vectors of length > 1
(y <- URLdecode(URLencode(c("url with space", "another one"))))

Copyright (©) 1999–2012 R Foundation for Statistical Computing.
Licensed under the GNU General Public License.