<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://stlab.adobe.com/wiki/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;feed=atom&amp;action=history</id>
		<title>How To Write A Simple Lexical Analyzer or Parser - Revision history</title>
		<link rel="self" type="application/atom+xml" href="http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;feed=atom&amp;action=history"/>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;action=history"/>
		<updated>2013-05-20T01:50:43Z</updated>
		<subtitle>Revision history for this page on the wiki</subtitle>
		<generator>MediaWiki 1.19.0</generator>

	<entry>
		<id>http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;diff=2432&amp;oldid=prev</id>
		<title>SeanParent at 20:26, 20 October 2011</title>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;diff=2432&amp;oldid=prev"/>
				<updated>2011-10-20T20:26:25Z</updated>
		
		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 20:26, 20 October 2011&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 28:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 28:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;template &amp;lt;typename I&amp;gt; // I models InputIterator&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;template &amp;lt;typename I&amp;gt; // I models InputIterator&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;// require value_type(I) == code_point_t&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;// require value_type(I) == code_point_t&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;struct parser&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;{&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&amp;#160; &amp;#160; bool is_production(T&amp;amp; value);&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;#160;&amp;#160; &amp;#160; bool is_production(T&amp;amp; value);&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 194:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 196:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;That's the complete code. Besides having a solution to our problem we have a general framework for building other solutions to parsing problems and a pretty decent start that can be reused for other XML related issues. Parsers written in this style are simple to extend even if they are packaged as a library, you can simply inherit from the struct and add additional productions. The code described here took very little time to write (the biggest hassle was building the larger lookup tables from the productions in the XML specification.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;That's the complete code. Besides having a solution to our problem we have a general framework for building other solutions to parsing problems and a pretty decent start that can be reused for other XML related issues. Parsers written in this style are simple to extend even if they are packaged as a library, you can simply inherit from the struct and add additional productions. The code described here took very little time to write (the biggest hassle was building the larger lookup tables from the productions in the XML specification.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;The complete example file can be found [http://stlab.adobe.com:8080&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;/@md=d&amp;amp;cd=//sandbox/papers/xml_parser_example/&amp;amp;cdf=//sandbox/papers/xml_parser_example/xml_parser_example.cpp&amp;amp;c=9XT@/&lt;/del&gt;/sandbox/papers/xml_parser_example/xml_parser_example.cpp?ac=64&amp;amp;rev1=&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;2 &lt;/del&gt;here.]&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;The complete example file can be found [http://stlab.adobe.com:8080/sandbox/papers/xml_parser_example/xml_parser_example.cpp?ac=64&amp;amp;rev1=&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;3 &lt;/ins&gt;here.]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>SeanParent</name></author>	</entry>

	<entry>
		<id>http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;diff=2292&amp;oldid=prev</id>
		<title>SeanParent: /* A Simple Recursive Decent Parser */</title>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;diff=2292&amp;oldid=prev"/>
				<updated>2008-05-20T17:25:05Z</updated>
		
		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;A Simple Recursive Decent Parser&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 17:25, 20 May 2008&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 17:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 17:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;The stream denoted by first last is going to be UTF16 code points because that is how the XML specification is defined. We'll define code_point_t to be a 16 bit integer:&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;The stream denoted by &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;tt&amp;gt;[&lt;/ins&gt;first&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;, &lt;/ins&gt;last&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;)&amp;lt;/tt&amp;gt; &lt;/ins&gt;is going to be UTF16 code points because that is how the XML specification is defined. We'll define code_point_t to be a 16 bit integer:&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 35:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 35:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Writing the body of the production is &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;generally quite &lt;/del&gt;simple. We start at the top level production rule and work our way down - in this case we are interested in parsing &amp;lt;tt&amp;gt;Name&amp;lt;/tt&amp;gt;. The &amp;lt;tt&amp;gt;Name&amp;lt;/tt&amp;gt; production rule looks like this:&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Writing the body of the production is simple. We start at the top level production rule and work our way down - in this case we are interested in parsing &amp;lt;tt&amp;gt;Name&amp;lt;/tt&amp;gt;. The &amp;lt;tt&amp;gt;Name&amp;lt;/tt&amp;gt; production rule looks like this:&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 62:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 62:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Here string16_t is our token type. For this problem, we don't actually care what the token value is - and constructing the token is relatively expensive &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;(if &lt;/del&gt;we were writing a full-blown XML parser we would spend some energy on dealing with tokens efficiently. This code also doesn't tell is if the name is qualified or reserved - those are attributes of our token value. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;So we're going to &lt;/del&gt;rewrite the function to drop the token and add some attributes. We'll deal with attributes in a similar fashion to the value of the production rule, except if the production rule is recognized the attributes will be accumulated (as opposed to overwritten). We don't need to pass the attribute to all production rules, only those that have attributes. Here is the definition of our attribute type:&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Here string16_t is our token type. For this problem, we don't actually care what the token value is - and constructing the token is relatively expensive&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;. If &lt;/ins&gt;we were writing a full-blown XML parser we would spend some energy on dealing with tokens efficiently. This code also doesn't tell is if the name is qualified or reserved - those are attributes of our token value. &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;We &lt;/ins&gt;rewrite the function to drop the token and add some attributes. We'll deal with attributes in a similar fashion to the value of the production rule, except if the production rule is recognized the attributes will be accumulated (as opposed to overwritten). We don't need to pass the attribute to all production rules, only those that have attributes. Here is the definition of our attribute type:&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 145:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 145:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Some of the tables, such as the one for &amp;lt;tt&amp;gt;BaseChar&amp;lt;/tt&amp;gt; are quite large making it prohibitively expensive to write an expression like &amp;lt;tt&amp;gt;(0x4E00U &amp;lt;= x &amp;amp;&amp;amp; x &amp;lt;= 0x9FA5U) || (x == 0x3007U) || ...&amp;lt;/tt&amp;gt;. Instead we will create a simple lookup table. When dealing with a parser that works with a simple &amp;lt;tt&amp;gt;char&amp;lt;/tt&amp;gt; type we could just build a table with 256 elements that contained an enumeration indicating what production rule the character matched. A table 16 bit values would be a bit large (but we could do it - 64K elements) - but instead we'll create an array of values denoting ranges which contain the code points we want to match and search to find if our character is in one of the ranges. To do that we'll create a semi-open ranges for each character range of the form &amp;lt;tt&amp;gt;(f0, l0], (f1, l1], ...&amp;lt;/tt&amp;gt; and then use &amp;lt;tt&amp;gt;lower_bound()&amp;lt;/tt&amp;gt; to search the table, if we end up on an odd index (starting at 0) then the item is in the range, an even index it is not. This strategy means we can't represent a range which includes &amp;lt;tt&amp;gt;0&amp;lt;/tt&amp;gt; but &amp;lt;tt&amp;gt;0&amp;lt;/tt&amp;gt; is not a valid character in an XML production. The code for the table lookup for &amp;lt;tt&amp;gt;Ideographic&amp;lt;/tt&amp;gt; is this:&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Some of the tables, such as the one for &amp;lt;tt&amp;gt;BaseChar&amp;lt;/tt&amp;gt; are quite large making it prohibitively expensive to write an expression like &amp;lt;tt&amp;gt;(0x4E00U &amp;lt;= x &amp;amp;&amp;amp; x &amp;lt;= 0x9FA5U) || (x == 0x3007U) || ...&amp;lt;/tt&amp;gt;. Instead we will create a simple lookup table. When dealing with a parser that works with a simple &amp;lt;tt&amp;gt;char&amp;lt;/tt&amp;gt; type we could just build a table with 256 elements that contained an enumeration indicating what production rule the character matched. A table &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;of &lt;/ins&gt;16 bit values would be a bit large (but we could do it - 64K elements) - but instead we'll create an array of values denoting ranges which contain the code points we want to match and search to find if our character is in one of the ranges. To do that we'll create a semi-open ranges for each character range of the form &amp;lt;tt&amp;gt;(f0, l0], (f1, l1], ...&amp;lt;/tt&amp;gt; and then use &amp;lt;tt&amp;gt;lower_bound()&amp;lt;/tt&amp;gt; to search the table, if we end up on an odd index (starting at 0) then the item is in the range, an even index it is not. This strategy means we can't represent a range which includes &amp;lt;tt&amp;gt;0&amp;lt;/tt&amp;gt; but &amp;lt;tt&amp;gt;0&amp;lt;/tt&amp;gt; is not a valid character in an XML production. The code for the table lookup for &amp;lt;tt&amp;gt;Ideographic&amp;lt;/tt&amp;gt; is this:&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 161:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 161:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;To transform from the closed ranges to the semi-open ranges we subtract one from the first item in the range. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;Note that this &lt;/del&gt;code doesn't have to be a member function and is not a template - we can put these lookup tables in a .cpp file. We'll use this function in our &amp;lt;tt&amp;gt;is_ideographic()&amp;lt;/tt&amp;gt; member function:&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;To transform from the closed ranges to the semi-open ranges we subtract one from the first item in the range. &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;This &lt;/ins&gt;code doesn't have to be a member function and is not a template - we can put these lookup tables in a .cpp file. We'll use this function in our &amp;lt;tt&amp;gt;is_ideographic()&amp;lt;/tt&amp;gt; member function:&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 192:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 192:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;That's the complete code. Besides having a solution to our problem we &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;also &lt;/del&gt;have a general framework for building other solutions to parsing problems and a pretty decent start that can be reused for other XML related issues. Parsers written in this style are simple to extend even if they are packaged as a library, you can simply inherit from the struct and add additional productions. The code described here took very little time to write (the biggest hassle was building the larger lookup tables from the productions in the XML specification.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;That's the complete code. Besides having a solution to our problem we have a general framework for building other solutions to parsing problems and a pretty decent start that can be reused for other XML related issues. Parsers written in this style are simple to extend even if they are packaged as a library, you can simply inherit from the struct and add additional productions. The code described here took very little time to write (the biggest hassle was building the larger lookup tables from the productions in the XML specification.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;The complete example file can be found [http://stlab.adobe.com:8080/@md=d&amp;amp;cd=//sandbox/papers/xml_parser_example/&amp;amp;cdf=//sandbox/papers/xml_parser_example/xml_parser_example.cpp&amp;amp;c=9XT@//sandbox/papers/xml_parser_example/xml_parser_example.cpp?ac=64&amp;amp;rev1=2 here.]&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;The complete example file can be found [http://stlab.adobe.com:8080/@md=d&amp;amp;cd=//sandbox/papers/xml_parser_example/&amp;amp;cdf=//sandbox/papers/xml_parser_example/xml_parser_example.cpp&amp;amp;c=9XT@//sandbox/papers/xml_parser_example/xml_parser_example.cpp?ac=64&amp;amp;rev1=2 here.]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>SeanParent</name></author>	</entry>

	<entry>
		<id>http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;diff=2291&amp;oldid=prev</id>
		<title>SeanParent: /* A Simple Recursive Decent Parser */</title>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;diff=2291&amp;oldid=prev"/>
				<updated>2008-05-20T17:16:37Z</updated>
		
		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;A Simple Recursive Decent Parser&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 17:16, 20 May 2008&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 9:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 9:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== A Simple Recursive Decent Parser ==&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== A Simple Recursive Decent Parser ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;To solve the problem of validating an XML name we're going to write a simple recursive decent, or top-down, parser. The idea is to read from a stream of characters denoted by a pair of InputIterators - we will only need to read each character once without backtracking &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;(a &lt;/del&gt;grammar which can be read without backtracking is known as an [http://en.wikipedia.org/wiki/LL_parser LL(1)] grammar.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;To solve the problem of validating an XML name we're going to write a simple recursive decent, or top-down, parser. The idea is to read from a stream of characters denoted by a pair of InputIterators - we will only need to read each character once without backtracking&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;. A &lt;/ins&gt;grammar which can be read &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;in this form &lt;/ins&gt;without backtracking is known as an [http://en.wikipedia.org/wiki/LL_parser LL(1)] grammar.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;To do this we are going to translate each production rule into a function which will return a &amp;lt;tt&amp;gt;bool&amp;lt;/tt&amp;gt; with the value &amp;lt;tt&amp;gt;true&amp;lt;/tt&amp;gt; if it recognizes the production and false if it does not. If it does recognize the production then the stream will be advanced to the end of the production and a value will be updated with the recognized production. Otherwise, the stream and value are left unchanged. Such a function will follow this pattern:&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;To do this we are going to translate each production rule into a function which will return a &amp;lt;tt&amp;gt;bool&amp;lt;/tt&amp;gt; with the value &amp;lt;tt&amp;gt;true&amp;lt;/tt&amp;gt; if it recognizes the production and false if it does not. If it does recognize the production then the stream will be advanced to the end of the production and a value will be updated with the recognized production. Otherwise, the stream and value are left unchanged. Such a function will follow this pattern:&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>SeanParent</name></author>	</entry>

	<entry>
		<id>http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;diff=2290&amp;oldid=prev</id>
		<title>SeanParent: /* A Simple Recursive Decent Parser */</title>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;diff=2290&amp;oldid=prev"/>
				<updated>2008-05-20T17:16:03Z</updated>
		
		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;A Simple Recursive Decent Parser&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 17:16, 20 May 2008&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 9:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 9:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== A Simple Recursive Decent Parser ==&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== A Simple Recursive Decent Parser ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;To solve the problem of validating an XML name we're going to write a simple recursive decent, or top-down, parser. The idea is to read from a stream of &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;character &lt;/del&gt;denoted by a pair of InputIterators - we will only need to read each character once without backtracking (a grammar which can be read without backtracking is known as an [http://en.wikipedia.org/wiki/LL_parser LL(1)] grammar.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;To solve the problem of validating an XML name we're going to write a simple recursive decent, or top-down, parser. The idea is to read from a stream of &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;characters &lt;/ins&gt;denoted by a pair of InputIterators - we will only need to read each character once without backtracking (a grammar which can be read without backtracking is known as an [http://en.wikipedia.org/wiki/LL_parser LL(1)] grammar.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;To do this we are going to translate each production rule into a function which will return a &amp;lt;tt&amp;gt;bool&amp;lt;/tt&amp;gt; with the value &amp;lt;tt&amp;gt;true&amp;lt;/tt&amp;gt; if it recognizes the production and false if it does not. If it does recognize the production then the stream will be advanced to the end of the production and a value will be updated with the recognized production. Otherwise, the stream and value are left unchanged. Such a function will follow this pattern:&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;To do this we are going to translate each production rule into a function which will return a &amp;lt;tt&amp;gt;bool&amp;lt;/tt&amp;gt; with the value &amp;lt;tt&amp;gt;true&amp;lt;/tt&amp;gt; if it recognizes the production and false if it does not. If it does recognize the production then the stream will be advanced to the end of the production and a value will be updated with the recognized production. Otherwise, the stream and value are left unchanged. Such a function will follow this pattern:&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>SeanParent</name></author>	</entry>

	<entry>
		<id>http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;diff=2289&amp;oldid=prev</id>
		<title>SeanParent: /* Introduction */</title>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;diff=2289&amp;oldid=prev"/>
				<updated>2008-05-20T17:14:50Z</updated>
		
		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Introduction&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 17:14, 20 May 2008&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 3:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 3:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;This ''paper'' is a response to an e-mail question asking if anyone had code to validate that a sequence of characters was a valid XML name which was not a reserved XML name (starting with &amp;lt;tt&amp;gt;'x'|'X' 'm'|'M' 'l'|'L'&amp;lt;/tt&amp;gt;) or a qualified name (one containing &amp;lt;tt&amp;gt;':'&amp;lt;/tt&amp;gt;). ''See [http://www.w3.org/TR/REC-xml/#sec-common-syn Section 2.3 of the XML Specification].''&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;This ''paper'' is a response to an e-mail question asking if anyone had code to validate that a sequence of characters was a valid XML name which was not a reserved XML name (starting with &amp;lt;tt&amp;gt;'x'|'X' 'm'|'M' 'l'|'L'&amp;lt;/tt&amp;gt;) or a qualified name (one containing &amp;lt;tt&amp;gt;':'&amp;lt;/tt&amp;gt;). ''See [http://www.w3.org/TR/REC-xml/#sec-common-syn Section 2.3 of the XML Specification].''&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Languages are defined using a grammar. There are many notations for grammars but most use some variant of [http://en.wikipedia.org/wiki/Extended_Backus–Naur_form Extended Backus-Naur form] (EBNF), XML is no exception to this rule. The variant of EBNF used for the XML notation is described in [http://www.w3.org/TR/REC-xml/#sec-notation Section 6 of the XML Specification]. Transforming an EBNF grammar into a lexical analyzer and/or parser is a relatively straight forward process. For simplicity I define the lexical analyzer as the system which transforms a stream of characters into tokens and the parser as the system which &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;reads the &lt;/del&gt;tokens &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;and takes invokes some semantic action &lt;/del&gt;for &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;the &lt;/del&gt;production &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;rules&lt;/del&gt;. There are libraries, such as the [http://www.boost.org/doc/libs/1_35_0/libs/spirit/index.html Boost Spirit Library] and tools such as [http://en.wikipedia.org/wiki/Lex_programming_tool Lex] and [http://en.wikipedia.org/wiki/Yacc Yacc] which can be used to aid in writing lexical analyzers and parser. Knowing how to write these systems simply and directly is an invaluable tool in any programmers tool chest and experience with writing such systems will make you more productive when using libraries and tools and give you some insight into when you should and should not use them. Some issues (including this one) can also be addressed by writing a regular expression using a library such as the [http://www.boost.org/doc/libs/1_35_0/libs/regex/doc/html/index.html Boost Regex Library] - however, dealing with the numerous ranges of values in the grammar production with this problem are difficult to express in a regular expression. A [http://en.wikipedia.org/wiki/Finite_state_machine finite automata] can also be constructed for many such problems (including this one) and such systems have the advantage of inverting control flow - you put values into a finite automata as opposed to executing a parse function which pulls values from a stream.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Languages are defined using a grammar. There are many notations for grammars but most use some variant of [http://en.wikipedia.org/wiki/Extended_Backus–Naur_form Extended Backus-Naur form] (EBNF), XML is no exception to this rule. The variant of EBNF used for the XML notation is described in [http://www.w3.org/TR/REC-xml/#sec-notation Section 6 of the XML Specification]. Transforming an EBNF grammar into a lexical analyzer and/or parser is a relatively straight forward process. For simplicity I define the lexical analyzer as the system which transforms a stream of characters into tokens and the parser as the system which &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;transforms a stream of &lt;/ins&gt;tokens &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;into a sequence of actions &lt;/ins&gt;for &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;each &lt;/ins&gt;production &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;rule&lt;/ins&gt;. There are libraries, such as the [http://www.boost.org/doc/libs/1_35_0/libs/spirit/index.html Boost Spirit Library] and tools such as [http://en.wikipedia.org/wiki/Lex_programming_tool Lex] and [http://en.wikipedia.org/wiki/Yacc Yacc] which can be used to aid in writing lexical analyzers and parser. Knowing how to write these systems simply and directly is an invaluable tool in any programmers tool chest and experience with writing such systems will make you more productive when using libraries and tools and give you some insight into when you should and should not use them. Some issues (including this one) can also be addressed by writing a regular expression using a library such as the [http://www.boost.org/doc/libs/1_35_0/libs/regex/doc/html/index.html Boost Regex Library] - however, dealing with the numerous ranges of values in the grammar production with this problem are difficult to express in a regular expression. A [http://en.wikipedia.org/wiki/Finite_state_machine finite automata] can also be constructed for many such problems (including this one) and such systems have the advantage of inverting control flow - you put values into a finite automata as opposed to executing a parse function which pulls values from a stream.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;When you look at a typical EBNF grammar it will frequently not distinguish which parts are the lexical analyzer production rules and which are the parser production rules - the structure of both a simple parser and simple lexer will be the same. The separation is the point where the value returned by the production rule is not a character from the stream, but some form of token. Because the structure is the same I'm going to refer from here on to both parts as simply the ''parser''.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;When you look at a typical EBNF grammar it will frequently not distinguish which parts are the lexical analyzer production rules and which are the parser production rules - the structure of both a simple parser and simple lexer will be the same. The separation is the point where the value returned by the production rule is not a character from the stream, but some form of token. Because the structure is the same I'm going to refer from here on to both parts as simply the ''parser''.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>SeanParent</name></author>	</entry>

	<entry>
		<id>http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;diff=2240&amp;oldid=prev</id>
		<title>SeanParent: /* A Simple Recursive Decent Parser */</title>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;diff=2240&amp;oldid=prev"/>
				<updated>2008-05-03T00:19:49Z</updated>
		
		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;A Simple Recursive Decent Parser&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 00:19, 3 May 2008&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 103:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 103:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;NameChar &lt;del class=&quot;diffchange diffchange-inline&quot;&gt; &lt;/del&gt;::= &lt;del class=&quot;diffchange diffchange-inline&quot;&gt; &lt;/del&gt;Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>SeanParent</name></author>	</entry>

	<entry>
		<id>http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;diff=2239&amp;oldid=prev</id>
		<title>SeanParent: /* A Simple Recursive Decent Parser */</title>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;diff=2239&amp;oldid=prev"/>
				<updated>2008-05-03T00:18:31Z</updated>
		
		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;A Simple Recursive Decent Parser&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 00:18, 3 May 2008&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 142:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 142:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Ideographic &lt;del class=&quot;diffchange diffchange-inline&quot;&gt; &lt;/del&gt;::= &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;	&lt;/del&gt;[#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029]&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Ideographic ::= [#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>SeanParent</name></author>	</entry>

	<entry>
		<id>http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;diff=2238&amp;oldid=prev</id>
		<title>SeanParent: /* A Simple Recursive Decent Parser */</title>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;diff=2238&amp;oldid=prev"/>
				<updated>2008-05-03T00:17:38Z</updated>
		
		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;A Simple Recursive Decent Parser&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 00:17, 3 May 2008&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 121:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 121:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Letter &lt;del class=&quot;diffchange diffchange-inline&quot;&gt; &lt;/del&gt;::= BaseChar | Ideographic&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Letter ::= BaseChar | Ideographic&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&amp;lt;pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>SeanParent</name></author>	</entry>

	<entry>
		<id>http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;diff=2237&amp;oldid=prev</id>
		<title>SeanParent: /* A Simple Recursive Decent Parser */</title>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;diff=2237&amp;oldid=prev"/>
				<updated>2008-05-03T00:12:59Z</updated>
		
		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;A Simple Recursive Decent Parser&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;a href=&quot;http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;amp;diff=2237&amp;amp;oldid=2232&quot;&gt;Show changes&lt;/a&gt;</summary>
		<author><name>SeanParent</name></author>	</entry>

	<entry>
		<id>http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;diff=2232&amp;oldid=prev</id>
		<title>SeanParent: /* Introduction */</title>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php?title=How_To_Write_A_Simple_Lexical_Analyzer_or_Parser&amp;diff=2232&amp;oldid=prev"/>
				<updated>2008-05-02T23:34:56Z</updated>
		
		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Introduction&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 23:34, 2 May 2008&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 5:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 5:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Languages are defined using a grammar. There are many notations for grammars but most use some variant of [http://en.wikipedia.org/wiki/Extended_Backus–Naur_form Extended Backus-Naur form] (EBNF), XML is no exception to this rule. The variant of EBNF used for the XML notation is described in [http://www.w3.org/TR/REC-xml/#sec-notation Section 6 of the XML Specification]. Transforming an EBNF grammar into a lexical analyzer and/or parser is a relatively straight forward process. For simplicity I define the lexical analyzer as the system which transforms a stream of characters into tokens and the parser as the system which reads the tokens and takes invokes some semantic action for the production rules. There are libraries, such as the [http://www.boost.org/doc/libs/1_35_0/libs/spirit/index.html Boost Spirit Library] and tools such as [http://en.wikipedia.org/wiki/Lex_programming_tool Lex] and [http://en.wikipedia.org/wiki/Yacc Yacc] which can be used to aid in writing lexical analyzers and parser. Knowing how to write these systems simply and directly is an invaluable tool in any programmers tool chest and experience with writing such systems will make you more productive when using libraries and tools and give you some insight into when you should and should not use them. Some issues (including this one) can also be addressed by writing a regular expression using a library such as the [http://www.boost.org/doc/libs/1_35_0/libs/regex/doc/html/index.html Boost Regex Library] - however, dealing with the numerous ranges of values in the grammar production with this problem are difficult to express in a regular expression. A [http://en.wikipedia.org/wiki/Finite_state_machine finite automata] can also be constructed for many such problems (including this one) and such systems have the advantage of inverting control flow - you put values into a finite automata as opposed to executing a parse function which pulls values from a stream.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Languages are defined using a grammar. There are many notations for grammars but most use some variant of [http://en.wikipedia.org/wiki/Extended_Backus–Naur_form Extended Backus-Naur form] (EBNF), XML is no exception to this rule. The variant of EBNF used for the XML notation is described in [http://www.w3.org/TR/REC-xml/#sec-notation Section 6 of the XML Specification]. Transforming an EBNF grammar into a lexical analyzer and/or parser is a relatively straight forward process. For simplicity I define the lexical analyzer as the system which transforms a stream of characters into tokens and the parser as the system which reads the tokens and takes invokes some semantic action for the production rules. There are libraries, such as the [http://www.boost.org/doc/libs/1_35_0/libs/spirit/index.html Boost Spirit Library] and tools such as [http://en.wikipedia.org/wiki/Lex_programming_tool Lex] and [http://en.wikipedia.org/wiki/Yacc Yacc] which can be used to aid in writing lexical analyzers and parser. Knowing how to write these systems simply and directly is an invaluable tool in any programmers tool chest and experience with writing such systems will make you more productive when using libraries and tools and give you some insight into when you should and should not use them. Some issues (including this one) can also be addressed by writing a regular expression using a library such as the [http://www.boost.org/doc/libs/1_35_0/libs/regex/doc/html/index.html Boost Regex Library] - however, dealing with the numerous ranges of values in the grammar production with this problem are difficult to express in a regular expression. A [http://en.wikipedia.org/wiki/Finite_state_machine finite automata] can also be constructed for many such problems (including this one) and such systems have the advantage of inverting control flow - you put values into a finite automata as opposed to executing a parse function which pulls values from a stream.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;When you look at a typical EBNF grammar it will frequently not distinguish which parts are the lexical analyzer production rules and which are the parser production rules - the structure of both a simple parser and simple lexer will be the same. The separation point &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;becomes apparent&amp;#160; once you start writing &lt;/del&gt;the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;grammar&lt;/del&gt;. Because the structure is the same I'm going to refer from here on to both parts as simply the ''parser''.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;When you look at a typical EBNF grammar it will frequently not distinguish which parts are the lexical analyzer production rules and which are the parser production rules - the structure of both a simple parser and simple lexer will be the same. The separation &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;is the &lt;/ins&gt;point &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;where the value returned by the production rule is not a character from &lt;/ins&gt;the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;stream, but some form of token&lt;/ins&gt;. Because the structure is the same I'm going to refer from here on to both parts as simply the ''parser''.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== A Simple Recursive Decent Parser ==&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== A Simple Recursive Decent Parser ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>SeanParent</name></author>	</entry>

	</feed>