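I have recently been looking into which characters are valid in request and response headers. Let's start with how RFC 2616 defines message headers: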
RFC2616 4.2 Message Headers
    message-header = field-name ":" [ field-value ]
    field-name     = token
    field-value    = *( field-content | LWS )
    field-content  = <the OCTETs making up the field-value
                     and consisting of either *TEXT or combinations
                     of token, separators, and quoted-string>
This is the grammar for message headers. Some of the terms it uses are defined in section 2.2, Basic Rules, of the standard:
    CR            = <US-ASCII CR, carriage return (13)>
    LF            = <US-ASCII LF, linefeed (10)>
    SP            = <US-ASCII SP, space (32)>
    HT            = <US-ASCII HT, horizontal-tab (9)>
    CRLF          = CR LF
    LWS           = [CRLF] 1*( SP | HT )
    OCTET         = <any 8-bit sequence of data>
    CHAR          = <any US-ASCII character (octets 0 - 127)>
    CTL           = <any US-ASCII control character
                    (octets 0 - 31) and DEL (127)>
    TEXT          = <any OCTET except CTLs,
                    but including LWS>
    token         = 1*<any CHAR except CTLs or separators>
    separators    = "(" | ")" | "<" | ">" | "@"
                  | "," | ";" | ":" | "\" | <">
                  | "/" | "[" | "]" | "?" | "="
                  | "{" | "}" | SP | HT
    quoted-string = ( <"> *(qdtext | quoted-pair ) <"> )
    qdtext        = <any TEXT except <">>
    quoted-pair   = "\" CHAR
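The basic rules above map naturally onto sets of octet values. The following is a minimal sketch, not a full parser; the names CTL, CHAR, SEPARATORS, TOKEN_CHARS and is_token are chosen here for illustration and do not come from the RFC or from any library.

```python
# Character classes from RFC 2616 section 2.2, expressed as sets of octet values.
# All names below are illustrative assumptions, not part of any library.

CHAR = set(range(0, 128))              # any US-ASCII character (octets 0 - 127)
CTL = set(range(0, 32)) | {127}        # control characters (0 - 31) and DEL (127)

# The 19 separators: ( ) < > @ , ; : \ " / [ ] ? = { } SP HT
SEPARATORS = {ord(c) for c in '()<>@,;:\\"/[]?={} \t'}

# token = 1*<any CHAR except CTLs or separators>
TOKEN_CHARS = CHAR - CTL - SEPARATORS


def is_token(data: bytes) -> bool:
    """Return True if data is a non-empty run of token characters."""
    return len(data) > 0 and all(b in TOKEN_CHARS for b in data)
```

Since field-name is defined as a token, is_token(b"Content-Type") holds under this sketch, while a name containing a space or a colon is rejected.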
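The grammar is quite explicit. field-content can be either of the following:
1) A string made up of token, separators, and quoted-string, where:
token: US-ASCII characters in [0, 127], excluding control characters and separators;
separators: the separator characters listed above;
quoted-string: qdtext or quoted-pair items enclosed in a pair of double quotes;
qdtext: any TEXT except the double quote " (that is, any 8-bit octet except control characters, but including CR, LF, and HT);
quoted-pair: a backslash followed by any value in [0, 127].
2) *TEXT: a string of 8-bit octets excluding control characters, but including CR, LF, and HT (CR and LF only as part of LWS, that is, a CRLF followed by SP or HT).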
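Working through the specification, one finds that every 8-bit octet value is in fact permitted by the standard: octets 128 to 255 qualify as TEXT, and control characters can appear inside a quoted-string as the second half of a quoted-pair.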
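This conclusion can be checked mechanically. The sketch below looks at each octet in isolation and asks whether some production of the grammar admits it; it deliberately ignores ordering constraints (for example, that CR and LF are only valid as part of LWS). The set names are assumptions made for this illustration.

```python
# Which octets can appear somewhere in a field-value under RFC 2616?
# This treats octets one at a time and ignores sequence rules such as
# "CR LF must be followed by SP or HT". Names are illustrative only.

CTL = set(range(0, 32)) | {127}

# TEXT = any OCTET except CTLs, but including LWS (CR, LF, SP, HT)
TEXT = (set(range(256)) - CTL) | {9, 10, 13, 32}

# quoted-pair = "\" CHAR: inside a quoted-string, any octet 0..127
# may follow a backslash, control characters included.
QUOTED_PAIR_CHARS = set(range(0, 128))

allowed_somewhere = TEXT | QUOTED_PAIR_CHARS
missing = sorted(set(range(256)) - allowed_somewhere)
print("octets never allowed:", missing)
```

Under these assumptions the list of octets that are never allowed comes out empty, which matches the observation in the text.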
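In practice, however, almost no web server accepts control characters other than CR, LF, and HT, or characters with a value of 127 or above (in other words, nearly all non-printable characters are excluded).
The set of characters that is actually accepted is strict, namely: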
9, 10, 13, [32, 127)
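A check against this practical range might look like the following. It is a minimal sketch assuming the range stated above; ALLOWED and find_disallowed are hypothetical names, and real servers may apply stricter or different rules.

```python
from typing import List

# Practical whitelist from the text: HT (9), LF (10), CR (13), and [32, 127).
# ALLOWED and find_disallowed are illustrative names, not a real server API.
ALLOWED = {9, 10, 13} | set(range(32, 127))


def find_disallowed(value: bytes) -> List[int]:
    """Return the positions of octets that fall outside the practical range."""
    return [i for i, b in enumerate(value) if b not in ALLOWED]


if __name__ == "__main__":
    print(find_disallowed(b"text/html; charset=utf-8"))
    print(find_disallowed(b"a\x00b\x7f"))
```

An empty result means every octet of the value is within the range; otherwise the returned indices point at the offending bytes, which is handy when logging rejected headers.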