URL General Syntax
"Unsafe" Characters and Special Encodings
URLs are normally expressed in the standard US ASCII character set, which is the default used by most TCP/IP application protocols. Certain characters in the set are called unsafe because they have special meaning in different contexts, and including them in a URL would lead to ambiguity or problems in how they should be interpreted. The space character is the classic unsafe character: spaces are normally used to separate URLs, so including one in a URL would break the URL into pieces. Other characters, such as the colon (:), are unsafe because they have special significance within a URL itself.
The safe characters in a URL are alphanumerics (A to Z, a to z and 0 to 9) and the following special characters: the dollar sign ($), hyphen (-), underscore (_), period (.), plus sign (+), exclamation point (!), asterisk (*), apostrophe ('), left parenthesis ((), and right parenthesis ()). All other characters are unsafe, and can be represented in a URL using an encoding scheme consisting of a percent sign (%) followed by the two-digit hexadecimal ASCII value of the character. The most common examples are given in Table 223.
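The encoding rule described above can be sketched in a few lines of Python. This is only an illustration, not a reference implementation: the names SAFE_CHARS and percent_encode are hypothetical, the safe set is copied from the list in the preceding paragraph, and the handling of non-ASCII input (encoding it as UTF-8 bytes first) is an assumption that goes beyond the text.

```python
import string

# Safe set as listed in the text: alphanumerics plus $ - _ . + ! * ' ( )
SAFE_CHARS = set(string.ascii_letters + string.digits + "$-_.+!*'()")


def percent_encode(text: str) -> str:
    """Replace every character outside SAFE_CHARS with %XX (hex value)."""
    out = []
    for ch in text:
        if ch in SAFE_CHARS:
            out.append(ch)
        else:
            # Assumption: non-ASCII characters are first encoded as UTF-8,
            # and each resulting byte is written as its own %XX sequence.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


encoded_name = percent_encode("are you there?")
```

Because this sketch also encodes reserved characters such as the slash and the colon, it is meant for a single component (for example one file name), not for a complete URL. For real use, Python's standard library offers urllib.parse.quote, whose default safe set differs somewhat from the list above.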
When these sequences are encountered, they are interpreted as the literal characters they represent, without any special significance. So, the URL http://www.myfavesite.com/are%20you%20there%3F points to a file called are you there? on www.myfavesite.com. The %20 codes prevent the spaces from breaking up the URL, and the %3F prevents the question mark in the file name from being interpreted as a special URL character.
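To see this interpretation in practice, the example URL can be taken apart with Python's standard urllib.parse module. The snippet below is a small sketch; the variable names are illustrative, and the second URL (with a literal question mark) is a hypothetical variation added for contrast.

```python
from urllib.parse import urlsplit, unquote

url = "http://www.myfavesite.com/are%20you%20there%3F"

# Split first: the encoded %3F stays inside the path component.
parts = urlsplit(url)
file_name = unquote(parts.path.lstrip("/"))  # decode %XX sequences

# Hypothetical contrast: a literal "?" ends the path and starts the query.
unencoded = urlsplit("http://www.myfavesite.com/are%20you%20there?")
truncated_name = unquote(unencoded.path.lstrip("/"))

print(parts.netloc)     # host part of the URL
print(file_name)        # decoded file name, question mark included
print(truncated_name)   # file name cut short at the literal "?"
```

The order matters: splitting the URL into components before decoding is what keeps the encoded question mark from being treated as the query delimiter.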
Unfortunately, these encodings are also sometimes abused for nefarious purposes, such as encoding regular ASCII characters that do not need it in order to obscure a URL.
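As a brief illustration of that kind of obscuring, ordinary letters can be written as percent sequences even though they are safe and need no encoding. The string below is a made-up example, and the variable names are hypothetical; decoding it with urllib.parse.unquote reveals the plain text.

```python
from urllib.parse import unquote

# Every letter written as %XX, although none of them is unsafe.
obscured = "%6C%6F%67%69%6E"

# Decoding recovers the ordinary ASCII text hidden behind the sequences.
plain = unquote(obscured)

# A decoded form that differs from the original, yet contains only safe
# characters, suggests the encoding was unnecessary.
was_needlessly_encoded = plain != obscured and plain.isalnum()
```

Decoding a suspicious URL in this way before reading it is one simple means of seeing what it actually refers to.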
The TCP/IP Guide (http://www.TCPIPGuide.com)
Version 3.0 - Version Date: September 20, 2005
© Copyright 2001-2005 Charles M. Kozierok. All Rights Reserved.
Not responsible for any loss resulting from the use of this site.