<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/css" href="http://stlab.adobe.com/wiki/skins/common/feed.css"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>http://stlab.adobe.com/wiki/index.php?title=Special:Newpages&amp;feed=atom</id>
		<title>Adobe Open Source Wiki - New pages [en]</title>
		<link rel="self" type="application/atom+xml" href="http://stlab.adobe.com/wiki/index.php?title=Special:Newpages&amp;feed=atom"/>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php/Special:Newpages"/>
		<updated>2009-11-24T05:39:01Z</updated>
		<subtitle>From Adobe Open Source Wiki</subtitle>
		<generator>MediaWiki 1.6.7</generator>

	<entry>
		<id>http://stlab.adobe.com/wiki/index.php/ABI_safe_library_theory_of_operation</id>
		<title>ABI safe library theory of operation</title>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php/ABI_safe_library_theory_of_operation"/>
				<updated>2009-08-27T23:59:08Z</updated>
		
		<summary type="html">&lt;p&gt;Summary: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The idea with the ABI safe library (currently the version_1 namespace types) is to provide a set of types that have a defined binary representation regardless of compiler settings, version, or options. And so these types can be passed freely across a DLL boundary.&lt;br /&gt;
&lt;br /&gt;
The template types are intended to be ABI safe so long as they are themselves instantiated with an ABI safe type.&lt;br /&gt;
&lt;br /&gt;
The following are the specific techniques that the library uses:&lt;br /&gt;
&lt;br /&gt;
1) Memory allocation - any heap allocated object will carry with it an capture_allocator (look up the actual type name). This allocator holds a pointer to a new/delete pair of proc pointers. All allocations go through the allocator and all frees also go through the allocator. This ensures that a consistent new/delete pair is used if a DLL or application of locally replaced these operations.&lt;br /&gt;
&lt;br /&gt;
1a) The memory allocators in the ABI safe containers are copied with the pointers which were allocated by them. With STL containers it is undefined if the allocator is copied with the container, this means that the capture_allocator is not guaranteed to work with containers other than the version_1 containers.&lt;br /&gt;
&lt;br /&gt;
2) Alignment - the types all try to ensure maximal alignment of types where necessary - this is usually done by aligning to the natural alignment for the type up to 64 bits (there may be issues with types that require 128 bit alignment). There is currently a known issue here with close_hash_map which relies on adobe::pair but makes no attempt to ensure alignment of the second element. So far this hasn't caused any real world problems and it should be fixed at some point by trying to ensure that the current &amp;quot;typical&amp;quot; alignment is kept.&lt;br /&gt;
&lt;br /&gt;
3) No use of C++ compiler constructs, such as RTTI and v-tables which don't have a defined representation. Instead there is an adobe typeinfo system that relies on matching by name (rather than by a symbol that would need to be exported) and constructing v-tables with tables of proc-pointers (compiler generated as static structs - see any_regular_t).&lt;/div&gt;</summary>
		<author><name>SeanParent</name></author>	</entry>

	<entry>
		<id>http://stlab.adobe.com/wiki/index.php/Debugging_any_regular_t_and_short_names</id>
		<title>Debugging any regular t and short names</title>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php/Debugging_any_regular_t_and_short_names"/>
				<updated>2009-02-11T21:45:26Z</updated>
		
		<summary type="html">&lt;p&gt;Summary: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Debugging the any_regular_t type can be problematic if you don't know what the data type of the value stored in it is. To make it simpler for to view an any_regular_t in the debugger, release 1.0.41 contains a four character short name for the type which is stored at the top of the vtable.&lt;br /&gt;
&lt;br /&gt;
The structure of an any_regular_t is a pair of doubles, the first word of the first double contains a pointer to a vtable.&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot;&lt;br /&gt;
|+ any_regular_t&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;2&amp;quot; | double || double&lt;br /&gt;
|-&lt;br /&gt;
| vtable* || pad (on 32 bit machines) ||  data or data*&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The first word of the vtable contains a version number on release builds (currently 1) and a short name (a four character constant) on debug builds.&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot;&lt;br /&gt;
|+ vtable&lt;br /&gt;
|-&lt;br /&gt;
| version or short name&lt;br /&gt;
|-&lt;br /&gt;
| ... proc pointers ...&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
By convention, the four character codes are stored in little endian order (so as to be readable in the debugger when viewing memory) -&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot;&lt;br /&gt;
|+ short name for adobe::version_1::dictionary_t&lt;br /&gt;
|-&lt;br /&gt;
| 'd' || 'i' || 'c' || 't' || (remainder of word is zeros on 64 bit machine)&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Also by convention, ASL will only use lower case letters in their short names - clients are free to use upper case letters. Note that the short name facility is to aid debugging _only_. it is not intended as a unique name for every type.&lt;br /&gt;
&lt;br /&gt;
You can define a name for your own type using the ADOBE_SHORT_NAME_TYPE() macro in the global scope defined in &amp;lt;adobe/typeinfo.hpp&amp;gt;. For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
namespace my_space {&lt;br /&gt;
class my_class { };&lt;br /&gt;
} // namespace my_space&lt;br /&gt;
&lt;br /&gt;
ADOBE_SHORT_NAME_TYPE('M','y','C','l', my_space::my_class);&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once defined, a short name for a particular type can be referred to using the short_name type function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cout &amp;lt;&amp;lt; adobe::short_name&amp;lt;double&amp;gt;::value;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following is a list of the short names for the CEL types defined in ASL:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot; border=&amp;quot;1&amp;quot;&lt;br /&gt;
|+ CEL short names&lt;br /&gt;
|-&lt;br /&gt;
! type || name&lt;br /&gt;
|-&lt;br /&gt;
| double || dble&lt;br /&gt;
|-&lt;br /&gt;
| bool || bool&lt;br /&gt;
|-&lt;br /&gt;
| empty_t || emty&lt;br /&gt;
|-&lt;br /&gt;
| array_t || arry&lt;br /&gt;
|-&lt;br /&gt;
| dictionary_t || dict&lt;br /&gt;
|-&lt;br /&gt;
| string_t || strg&lt;br /&gt;
|-&lt;br /&gt;
| name_t || name&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In addition, the following short names are defined though these won't typically appear in an any_regular_t:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot; border=&amp;quot;1&amp;quot;&lt;br /&gt;
|+ short names&lt;br /&gt;
|-&lt;br /&gt;
! type || name&lt;br /&gt;
|-&lt;br /&gt;
| string16_t || st16&lt;br /&gt;
|-&lt;br /&gt;
| float || flot&lt;br /&gt;
|-&lt;br /&gt;
| int || int_&lt;br /&gt;
|-&lt;br /&gt;
| short || shrt&lt;br /&gt;
|-&lt;br /&gt;
| long || long&lt;br /&gt;
|-&lt;br /&gt;
| unsigned int || uint&lt;br /&gt;
|-&lt;br /&gt;
| unsigned short || ushr&lt;br /&gt;
|-&lt;br /&gt;
| unsigned long || ulng&lt;br /&gt;
|-&lt;br /&gt;
| char || char&lt;br /&gt;
|-&lt;br /&gt;
| signed char || schr&lt;br /&gt;
|-&lt;br /&gt;
| uchar || uchr&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>SeanParent</name></author>	</entry>

	<entry>
		<id>http://stlab.adobe.com/wiki/index.php/PropertyModelEvaluator</id>
		<title>PropertyModelEvaluator</title>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php/PropertyModelEvaluator"/>
				<updated>2008-10-29T21:32:42Z</updated>
		
		<summary type="html">&lt;p&gt;Summary: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: pmeval [option]... [file]...&lt;br /&gt;
Command line only options:&lt;br /&gt;
  --help                produce help message&lt;br /&gt;
  -v [ --version ]      print version string&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Grammar:&lt;br /&gt;
translation_unit        = [ adam_test ].                               &lt;br /&gt;
adam_test               = sheet_specifier [ interaction_list ].         &lt;br /&gt;
interaction_list        = interaction [ interaction_list ].             &lt;br /&gt;
interaction             = update_decl | reinitialize_decl               &lt;br /&gt;
                                      | dump_decl | check_decl | print_decl | assert_decl &lt;br /&gt;
                                      | contributing_decl | trailing_comment | . &lt;br /&gt;
update_decl             = &amp;quot;update&amp;quot; dictionary.                        &lt;br /&gt;
reinitialize_decl       = &amp;quot;reinitialize&amp;quot; dictionary.                  &lt;br /&gt;
dump_decl               = &amp;quot;dump&amp;quot;.                                     &lt;br /&gt;
check_decl              = &amp;quot;check&amp;quot; dictionary.                         &lt;br /&gt;
print_decl              = &amp;quot;print&amp;quot; &amp;quot;(&amp;quot; expression &amp;quot;)&amp;quot;.             &lt;br /&gt;
assert_decl             = &amp;quot;assert&amp;quot; &amp;quot;(&amp;quot; expression &amp;quot;)&amp;quot;.            &lt;br /&gt;
contributing_decl       = &amp;quot;contributing&amp;quot;.                             &lt;br /&gt;
                                                                        &lt;br /&gt;
keywords                += &amp;quot;update&amp;quot; | &amp;quot;reinitialize&amp;quot; | &amp;quot;check&amp;quot; | &amp;quot;dump&amp;quot; | &amp;quot;print&amp;quot; &lt;br /&gt;
                           | &amp;quot;assert&amp;quot; | &amp;quot;contributing&amp;quot; .&lt;br /&gt;
&amp;lt;pre/&amp;gt;&lt;/div&gt;</summary>
		<author><name>MatMarcus</name></author>	</entry>

	<entry>
		<id>http://stlab.adobe.com/wiki/index.php/All_About_Binding</id>
		<title>All About Binding</title>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php/All_About_Binding"/>
				<updated>2008-08-22T22:57:20Z</updated>
		
		<summary type="html">&lt;p&gt;Summary: /* Introduction */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
Within a layout description, &amp;lt;code&amp;gt;bind&amp;lt;/code&amp;gt; arguments are used to specify cells within the sheet to bind the widget to. The value of a bind argument must be of type &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;. A common misconception with the &amp;lt;code&amp;gt;@cell&amp;lt;/code&amp;gt; notation is that &amp;lt;code&amp;gt;@&amp;lt;/code&amp;gt; is some kind of reference operator - it is not. The &amp;lt;code&amp;gt;@&amp;lt;/code&amp;gt; operator is a quote operator which quotes a keyword or identifier and generate a &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[Image:bind_01.png|frame|Model-View-Controller]]&lt;br /&gt;
&lt;br /&gt;
The sheet within ASL acts as a model in a Model-View-Controller pattern. A typical widget, such as a checkbox,  acts as a View-Controller.&lt;br /&gt;
&lt;br /&gt;
[[Image:bind_02.png|frame|checkbox(bind: @cell)]]&lt;br /&gt;
&lt;br /&gt;
In this configuration the widget is displaying the current value in the cell to which it is bound, and when clicked on, requests a change of the model to a new value. A control will be disabled if changing the value of the cell to which it is bound will have no effect on the output of the sheet.&lt;br /&gt;
&lt;br /&gt;
Not all widgets comprise both a controller and view. A widget which simply displays a value is only a view, a piece of static text may be neither.&lt;br /&gt;
&lt;br /&gt;
The usual configuration for a button is a bit different. A button binds to a cell as a view, even though it doesn't display the value. The value it binds to are typically the arguments to an action. Clicking on the button will invoke the action with the value from the cell to which the button is bound. A button will be disabled if the value in the cell to which it is bound is &amp;lt;i&amp;gt;invalid&amp;lt;/i&amp;gt;. A cells value is invalid when it is derived from information contributing to an invariant in the sheet.&lt;br /&gt;
&lt;br /&gt;
[[Image:bind_03.png|frame|button(bind: @cell, action: @do_it)]]&lt;br /&gt;
&lt;br /&gt;
A button may also act as a controller. This is the equivalent of a button with an action that simply sets a cell within the sheet. The cell to set is specified with the &amp;lt;code&amp;gt;bind_output&amp;lt;/code&amp;gt; argument. For example, let's say that you wanted a button which added 11 to the &amp;lt;code&amp;gt;value&amp;lt;/code&amp;gt; cell. That could be expressed as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// -- in the sheet&lt;br /&gt;
interface:&lt;br /&gt;
    value: 1;&lt;br /&gt;
output:&lt;br /&gt;
    next_value &amp;lt;== value + 11;&lt;br /&gt;
// -- in the layout&lt;br /&gt;
    button(name: &amp;quot;Next Value&amp;quot;, bind: @next_value, bind_output: value);&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Image:bind_04.png|frame|button(bind: @cell, bind_output: @another)]]&lt;br /&gt;
&lt;br /&gt;
The button acts as the latch in the system to prevent this configuration from being an infinite loop.&lt;br /&gt;
&lt;br /&gt;
The value of a button can also be specified with the &amp;lt;code&amp;gt;value&amp;lt;/code&amp;gt; argument, rather than &amp;lt;code&amp;gt;bind&amp;lt;/code&amp;gt;. For example, a button which reset the above value to 1 would look like:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   button(name: &amp;quot;Reset&amp;quot;, value: 1, bind_output: @value);&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Such a button acts strictly as a controller.&lt;/div&gt;</summary>
		<author><name>SeanParent</name></author>	</entry>

	<entry>
		<id>http://stlab.adobe.com/wiki/index.php/Performance/Analysis/Example3</id>
		<title>Performance/Analysis/Example3</title>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php/Performance/Analysis/Example3"/>
				<updated>2008-07-21T01:46:23Z</updated>
		
		<summary type="html">&lt;p&gt;Summary: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;h2&amp;gt;Stepanov Abstraction with MSVC&amp;lt;/h2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we'll examine the results from stepanov_abstraction.cpp compiled with Microsoft Visual C++ (&amp;lt;b&amp;gt;MSVC&amp;lt;/b&amp;gt;) 2008 and 2005.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
And to be specific:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Intel Core2 QuadCore 2.66Ghz&lt;br /&gt;
Windows Vista 64 bit Ultimate&lt;br /&gt;
&lt;br /&gt;
MSVC 2008:&lt;br /&gt;
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.21022.08 for 80x86&lt;br /&gt;
&lt;br /&gt;
MSVC 2005:&lt;br /&gt;
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cl /nologo /Ox /TP /EHsc /D &amp;quot;WIN32&amp;quot; /D &amp;quot;NDEBUG&amp;quot; /Wp64 stepanov_abstraction.cpp -o stepanov_abstraction.exe&lt;br /&gt;
.\stepanov_abstraction &amp;gt; stepanov_abstraction_msvc2008.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;hr&amp;gt;&lt;br /&gt;
&amp;lt;h2&amp;gt;MSVC 2008&amp;lt;/h2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;Accumulate&amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
test                        description   absolute   operations   ratio with&lt;br /&gt;
number                                    time       per second   test0&lt;br /&gt;
&lt;br /&gt;
 0                     &amp;quot;double pointer&amp;quot;   4.60 sec   869.19 M     1.00&lt;br /&gt;
 1               &amp;quot;double pointer_class&amp;quot;   4.59 sec   872.22 M     1.00&lt;br /&gt;
 2         &amp;quot;DoubleValueWrapper pointer&amp;quot;   4.54 sec   881.06 M     0.99&lt;br /&gt;
 3   &amp;quot;DoubleValueWrapper pointer_class&amp;quot;   4.55 sec   878.16 M     0.99&lt;br /&gt;
 4       &amp;quot;DoubleValueWrapper10 pointer&amp;quot;   4.57 sec   875.08 M     0.99&lt;br /&gt;
 5 &amp;quot;DoubleValueWrapper10 pointer_class&amp;quot;   4.54 sec   881.25 M     0.99&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is good. The compiler had no significant penalty for increasing the data or pointer abstractions in the accumulation loops.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;Insertion Sort&amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
test                                       description   absolute   operations   ratio with&lt;br /&gt;
number                                                   time       per second   test0&lt;br /&gt;
&lt;br /&gt;
 0                     &amp;quot;insertion_sort double pointer&amp;quot;   0.86 sec    2.33 M     1.00&lt;br /&gt;
 1               &amp;quot;insertion_sort double pointer_class&amp;quot;   1.19 sec    1.69 M     1.38&lt;br /&gt;
 2         &amp;quot;insertion_sort DoubleValueWrapper pointer&amp;quot;   1.19 sec    1.69 M     1.38&lt;br /&gt;
 3   &amp;quot;insertion_sort DoubleValueWrapper pointer_class&amp;quot;   1.56 sec    1.28 M     1.82&lt;br /&gt;
 4       &amp;quot;insertion_sort DoubleValueWrapper10 pointer&amp;quot;   1.17 sec    1.71 M     1.36&lt;br /&gt;
 5 &amp;quot;insertion_sort DoubleValueWrapper10 pointer_class&amp;quot;   1.56 sec    1.28 M     1.82&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
This isn't so good -- all of the times should have been the same and the ratios close to 1.0.&lt;br /&gt;
Increasing the abstraction on the value and the pointer both appear to cause slowdowns.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;Quicksort&amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
test                                  description   absolute   operations   ratio with&lt;br /&gt;
number                                              time       per second   test0&lt;br /&gt;
&lt;br /&gt;
 0                     &amp;quot;quicksort double pointer&amp;quot;   1.22 sec   13.15 M     1.00&lt;br /&gt;
 1               &amp;quot;quicksort double pointer_class&amp;quot;   1.34 sec   11.92 M     1.10&lt;br /&gt;
 2         &amp;quot;quicksort DoubleValueWrapper pointer&amp;quot;   1.51 sec   10.58 M     1.24&lt;br /&gt;
 3   &amp;quot;quicksort DoubleValueWrapper pointer_class&amp;quot;   1.61 sec    9.96 M     1.32&lt;br /&gt;
 4       &amp;quot;quicksort DoubleValueWrapper10 pointer&amp;quot;   1.51 sec   10.58 M     1.24&lt;br /&gt;
 5 &amp;quot;quicksort DoubleValueWrapper10 pointer_class&amp;quot;   1.61 sec    9.96 M     1.32&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Again, we have some slowdowns when increasing abstraction, but not as much as we saw in the insertion sort.&lt;br /&gt;
That is partly due to the different complexities of the algorithms being tested:  Insertion sort is O(N^2) and Quicksort is O(NLogN).  Also, the 3 sort algorithms in this test file use pointers, values, indexing and incrementing very differently.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;Heap Sort&amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
test                                  description   absolute   operations   ratio with&lt;br /&gt;
number                                              time       per second   test0&lt;br /&gt;
&lt;br /&gt;
 0                     &amp;quot;heap_sort double pointer&amp;quot;   1.36 sec   11.79 M     1.00&lt;br /&gt;
 1               &amp;quot;heap_sort double pointer_class&amp;quot;   1.42 sec   11.27 M     1.05&lt;br /&gt;
 2         &amp;quot;heap_sort DoubleValueWrapper pointer&amp;quot;   1.37 sec   11.66 M     1.01&lt;br /&gt;
 3   &amp;quot;heap_sort DoubleValueWrapper pointer_class&amp;quot;   1.51 sec   10.57 M     1.12&lt;br /&gt;
 4       &amp;quot;heap_sort DoubleValueWrapper10 pointer&amp;quot;   1.39 sec   11.53 M     1.02&lt;br /&gt;
 5 &amp;quot;heap_sort DoubleValueWrapper10 pointer_class&amp;quot;   1.72 sec    9.32 M     1.26&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And we still have slowdowns with increasing abstraction.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h4&amp;gt;Conclusions for MSVC 2008:&amp;lt;/h4&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;b&amp;gt;MSVC 2008&amp;lt;/b&amp;gt; could improve code generation related to C++ abstraction.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;hr&amp;gt;&lt;br /&gt;
&amp;lt;h2&amp;gt;MSVC 2005 Comparison&amp;lt;/h2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;Accumulate MSVC2005&amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
test                        description   absolute   operations   ratio with&lt;br /&gt;
number                                    time       per second   test0&lt;br /&gt;
&lt;br /&gt;
 0                     &amp;quot;double pointer&amp;quot;   4.55 sec   878.16 M     1.00&lt;br /&gt;
 1               &amp;quot;double pointer_class&amp;quot;   4.55 sec   878.16 M     1.00&lt;br /&gt;
 2         &amp;quot;DoubleValueWrapper pointer&amp;quot;   4.54 sec   881.06 M     1.00&lt;br /&gt;
 3   &amp;quot;DoubleValueWrapper pointer_class&amp;quot;   4.55 sec   878.16 M     1.00&lt;br /&gt;
 4       &amp;quot;DoubleValueWrapper10 pointer&amp;quot;   4.55 sec   878.16 M     1.00&lt;br /&gt;
 5 &amp;quot;DoubleValueWrapper10 pointer_class&amp;quot;   4.54 sec   881.06 M     1.00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Good - very similar times, and no problems with increased abstraction.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;Insertion Sort MSVC2005&amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
test                                       description   absolute   operations   ratio with&lt;br /&gt;
number                                                   time       per second   test0&lt;br /&gt;
&lt;br /&gt;
 0                     &amp;quot;insertion_sort double pointer&amp;quot;   1.19 sec    1.69 M     1.00&lt;br /&gt;
 1               &amp;quot;insertion_sort double pointer_class&amp;quot;   1.17 sec    1.71 M     0.99&lt;br /&gt;
 2         &amp;quot;insertion_sort DoubleValueWrapper pointer&amp;quot;   1.19 sec    1.69 M     1.00&lt;br /&gt;
 3   &amp;quot;insertion_sort DoubleValueWrapper pointer_class&amp;quot;   1.19 sec    1.69 M     1.00&lt;br /&gt;
 4       &amp;quot;insertion_sort DoubleValueWrapper10 pointer&amp;quot;   1.19 sec    1.69 M     1.00&lt;br /&gt;
 5 &amp;quot;insertion_sort DoubleValueWrapper10 pointer_class&amp;quot;   1.19 sec    1.69 M     1.00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;b&amp;gt;MSVC 2008&amp;lt;/b&amp;gt; produces faster code in one case, but significantly slower code in 2 cases.&lt;br /&gt;
And &amp;lt;b&amp;gt;MSVC 2005&amp;lt;/b&amp;gt; doesn't have any penalty for increased abstraction -- all the times are close to the same.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;Quicksort MSVC2005&amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
test                                  description   absolute   operations   ratio with&lt;br /&gt;
number                                              time       per second   test0&lt;br /&gt;
&lt;br /&gt;
 0                     &amp;quot;quicksort double pointer&amp;quot;   1.22 sec   13.16 M     1.00&lt;br /&gt;
 1               &amp;quot;quicksort double pointer_class&amp;quot;   1.23 sec   12.98 M     1.01&lt;br /&gt;
 2         &amp;quot;quicksort DoubleValueWrapper pointer&amp;quot;   1.54 sec   10.36 M     1.27&lt;br /&gt;
 3   &amp;quot;quicksort DoubleValueWrapper pointer_class&amp;quot;   1.54 sec   10.36 M     1.27&lt;br /&gt;
 4       &amp;quot;quicksort DoubleValueWrapper10 pointer&amp;quot;   1.56 sec   10.26 M     1.28&lt;br /&gt;
 5 &amp;quot;quicksort DoubleValueWrapper10 pointer_class&amp;quot;   1.54 sec   10.36 M     1.27&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is close to the same as &amp;lt;b&amp;gt;MSVC 2008&amp;lt;/b&amp;gt;, but faster in each case involving the pointer class.&lt;br /&gt;
There is still some penalty for using the value abstractions.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;Heap Sort MSVC2005&amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
test                                  description   absolute   operations   ratio with&lt;br /&gt;
number                                              time       per second   test0&lt;br /&gt;
&lt;br /&gt;
 0                     &amp;quot;heap_sort double pointer&amp;quot;   1.36 sec   11.79 M     1.00&lt;br /&gt;
 1               &amp;quot;heap_sort double pointer_class&amp;quot;   1.37 sec   11.65 M     1.01&lt;br /&gt;
 2         &amp;quot;heap_sort DoubleValueWrapper pointer&amp;quot;   1.37 sec   11.65 M     1.01&lt;br /&gt;
 3   &amp;quot;heap_sort DoubleValueWrapper pointer_class&amp;quot;   1.37 sec   11.65 M     1.01&lt;br /&gt;
 4       &amp;quot;heap_sort DoubleValueWrapper10 pointer&amp;quot;   1.40 sec   11.40 M     1.03&lt;br /&gt;
 5 &amp;quot;heap_sort DoubleValueWrapper10 pointer_class&amp;quot;   1.37 sec   11.66 M     1.01&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is much better than &amp;lt;b&amp;gt;MSVC 2008&amp;lt;/b&amp;gt;, faster on 3 tests, and showing little penalty for increased abstraction.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h4&amp;gt;Conclusions for MSVC 2008 compared with MSVC 2005:&amp;lt;/h4&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;b&amp;gt;MSVC 2008&amp;lt;/b&amp;gt; produced faster code then &amp;lt;b&amp;gt;MSVC 2005&amp;lt;/b&amp;gt; in one test.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;b&amp;gt;MSVC 2008&amp;lt;/b&amp;gt; produced slower code than &amp;lt;b&amp;gt;MSVC 2005&amp;lt;/b&amp;gt; in 8 tests.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;b&amp;gt;MSVC 2008&amp;lt;/b&amp;gt; has increased abstraction penalties in the insertion sort and heap sort tests.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;b&amp;gt;MSVC 2008&amp;lt;/b&amp;gt; seems to have the most problems in code involving pointer_class.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;/div&gt;</summary>
		<author><name>ChrisCox</name></author>	</entry>

	<entry>
		<id>http://stlab.adobe.com/wiki/index.php/Performance/Analysis/Example2</id>
		<title>Performance/Analysis/Example2</title>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php/Performance/Analysis/Example2"/>
				<updated>2008-06-30T00:24:36Z</updated>
		
		<summary type="html">&lt;p&gt;Summary: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;h2&amp;gt;Loop Invariant Code Motion and LLVM&amp;lt;/h2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now let's take a look at the results from simple_types_loop_invariant.cpp&amp;lt;P&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We'll use &amp;lt;b&amp;gt;LLVM-gcc&amp;lt;/b&amp;gt; 2.1, specifically:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gcc version 4.0.1 (Apple Computer, Inc. build 5449)(LLVM build 2.1)&lt;br /&gt;
Apple PowerMac G5 2.5 Ghz Quad Processor&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;llvm-g++ -I. -O3 simple_types_loop_invariant.cpp -o simple_types_loop_invariant_llvm&lt;br /&gt;
./simple_types_loop_invariant_llvm &amp;gt; simple_types_loop_invariant_llvm.txt&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;hr&amp;gt;&lt;br /&gt;
&amp;lt;h2&amp;gt;Integer math&amp;lt;/h2&amp;gt;&lt;br /&gt;
The int32_t case is fairly representative of all the integer results under &amp;lt;b&amp;gt;LLVM&amp;lt;/b&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
test                          description   absolute   operations   ratio with&lt;br /&gt;
number                                      time       per second   test0&lt;br /&gt;
&lt;br /&gt;
 0                 &amp;quot;int32_t variable add&amp;quot;   2.57 sec   622.57 M     1.00&lt;br /&gt;
 1       &amp;quot;int32_t multiple variable adds&amp;quot;   2.57 sec   622.57 M     1.00&lt;br /&gt;
 2            &amp;quot;int32_t variable subtract&amp;quot;   2.57 sec   622.57 M     1.00&lt;br /&gt;
 3  &amp;quot;int32_t multiple variable subtracts&amp;quot;   3.52 sec   454.55 M     1.37&lt;br /&gt;
 4            &amp;quot;int32_t variable multiply&amp;quot;   3.84 sec   416.67 M     1.49&lt;br /&gt;
 5 &amp;quot;int32_t multiple variable multiplies&amp;quot;   3.86 sec   414.51 M     1.50&lt;br /&gt;
 6              &amp;quot;int32_t variable divide&amp;quot;  24.36 sec    65.68 M     9.48&lt;br /&gt;
 7    &amp;quot;int32_t multiple variable divides&amp;quot;  53.19 sec    30.08 M    20.70&lt;br /&gt;
 8      &amp;quot;int32_t multiple variable mixed&amp;quot;   3.86 sec   414.51 M     1.50&lt;br /&gt;
 9                 &amp;quot;int32_t variable and&amp;quot;   1.95 sec   820.51 M     0.76&lt;br /&gt;
10        &amp;quot;int32_t multiple variable and&amp;quot;   1.99 sec   804.02 M     0.77&lt;br /&gt;
11                  &amp;quot;int32_t variable or&amp;quot;   2.56 sec   625.00 M     1.00&lt;br /&gt;
12         &amp;quot;int32_t multiple variable or&amp;quot;   2.58 sec   620.16 M     1.00&lt;br /&gt;
13                 &amp;quot;int32_t variable xor&amp;quot;   1.99 sec   804.02 M     0.77&lt;br /&gt;
14        &amp;quot;int32_t multiple variable xor&amp;quot;   1.96 sec   816.33 M     0.76&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;addition&amp;lt;/h3&amp;gt;&lt;br /&gt;
This is good.  Multiple loop invariant adds are the same speed as a single add.&lt;br /&gt;
And that means the compiler moved the loop invariant calculations out of the loop.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;subtraction&amp;lt;/h3&amp;gt;&lt;br /&gt;
Hmm, multiple loop invariant subtracts take longer than a single subtract.&lt;br /&gt;
That means that &amp;lt;b&amp;gt;LLVM&amp;lt;/b&amp;gt; failed to optimize loop invariant subtraction.&lt;br /&gt;
And that is really surprising because it did optimize addition.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for (x=0;x&amp;amp;lt;count;++x)&lt;br /&gt;
	result += input[x] - v1 - v2 - v3 - v4;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
should have been optimized to&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
temp = v1 + v2 + v3 + v4;&lt;br /&gt;
for (x=0;x&amp;amp;lt;count;++x)&lt;br /&gt;
	result += input[x] - temp;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
An assembly dump shows that &amp;lt;b&amp;gt;LLVM&amp;lt;/b&amp;gt; issues each subtract inside the loop.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;multiplication&amp;lt;/h3&amp;gt;&lt;br /&gt;
Also good.  This runs a little slower than adds due to instruction latency, but is optimized correctly.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;division&amp;lt;/h3&amp;gt;&lt;br /&gt;
Clearly not optimized.  Hmm, what was the source for this?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for (x=0;x&amp;amp;lt;count;++x)&lt;br /&gt;
	result += ((((input[x] / v1) / v2) / v3) / v4);&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Whoops! That can't be optimized for integer math without getting wrong results.&lt;br /&gt;
I messed up and used the wrong test code, and failed to catch it for months.&lt;br /&gt;
I'll have to change the test to include something that can be moved out of the loop.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;mixed math&amp;lt;/h3&amp;gt;&lt;br /&gt;
This should be the same speed as a single add, but isn't. And yet it is not as slow as if all the operations were done in the inner loop.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for (x=0;x&amp;amp;lt;count;++x)&lt;br /&gt;
	result += input[x] + v1 - v2 * v3 / v4;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
should have been optimized to&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
temp = v1 - v2 * v3 / v4;&lt;br /&gt;
for (x=0;x&amp;amp;lt;count;++x)&lt;br /&gt;
	result += input[x] + temp;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Reading the assembly shows that &amp;lt;b&amp;gt;LLVM&amp;lt;/b&amp;gt; did move the multiply and divide out of the loop, but kept 2 adds and a subtract.&lt;br /&gt;
This is probably related to whatever caused the subtract case to fail earlier.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;bitwise and&amp;lt;/h3&amp;gt;&lt;br /&gt;
Good.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;bitwise or&amp;lt;/h3&amp;gt;&lt;br /&gt;
Both versions run about the same speed, but how is this running slower than a &amp;lt;b&amp;gt;bitwise and&amp;lt;/b&amp;gt;?&lt;br /&gt;
Reading the assembly shows that &amp;lt;b&amp;gt;LLVM&amp;lt;/b&amp;gt; did optimize the multiple &amp;lt;b&amp;gt;bitwise or&amp;lt;/b&amp;gt;s correctly,&lt;br /&gt;
generating almost identical code to the &amp;lt;b&amp;gt;bitwise and&amp;lt;/b&amp;gt; loops.&lt;br /&gt;
But there's an interesting problem - the 7 instruction &amp;lt;b&amp;gt;bitwise or&amp;lt;/b&amp;gt; loops straddle a cacheline boundary.&lt;br /&gt;
On some CPUs (like the PowerPC 970 aka G5) that can add significant stalls. &lt;br /&gt;
If I change the code slightly (by removing some other tests), the loops fall on better alignment and run as fast as the &amp;lt;b&amp;gt;bitwise and&amp;lt;/b&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;bitwise xor&amp;lt;/h3&amp;gt;&lt;br /&gt;
Good.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;Other sizes and unsigned values&amp;lt;/h3&amp;gt;&lt;br /&gt;
The 8, 16, 64 bit and unsigned values are very similar to int32_t but are also hitting unaligned&lt;br /&gt;
loop problems that make analysis of optimizations a little more difficult (without reading all the assembly).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;Conclusions for integer loop invariants&amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;With the -O3 flag for optimization, &amp;lt;b&amp;gt;LLVM&amp;lt;/b&amp;gt; is usually moving integer loop invariant calculations out of inner loops.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;b&amp;gt;LLVM&amp;lt;/b&amp;gt; does not optimize loop invariant integer subtracts. &amp;lt;i&amp;gt;[ LLVM 2.3 fixes this problem - ccox ]&amp;lt;/i&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;b&amp;gt;LLVM&amp;lt;/b&amp;gt; is not aligning loops and can have code position dependent slowdowns on some processors. &amp;lt;i&amp;gt;[ LLVM team is working on it - ccox ]&amp;lt;/i&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;hr&amp;gt;&lt;br /&gt;
&amp;lt;h2&amp;gt;Floating point math&amp;lt;/h2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
test                        description   absolute   operations   ratio with&lt;br /&gt;
number                                    time       per second   test0&lt;br /&gt;
&lt;br /&gt;
 0                 &amp;quot;float variable add&amp;quot;   3.84 sec   416.67 M     1.00&lt;br /&gt;
 1       &amp;quot;float multiple variable adds&amp;quot;   6.20 sec   258.06 M     1.61&lt;br /&gt;
 2            &amp;quot;float variable subtract&amp;quot;   3.85 sec   415.58 M     1.00&lt;br /&gt;
 3  &amp;quot;float multiple variable subtracts&amp;quot;   6.20 sec   258.06 M     1.61&lt;br /&gt;
 4            &amp;quot;float variable multiply&amp;quot;   3.84 sec   416.67 M     1.00&lt;br /&gt;
 5 &amp;quot;float multiple variable multiplies&amp;quot;   4.72 sec   338.98 M     1.23&lt;br /&gt;
 6              &amp;quot;float variable divide&amp;quot;  18.57 sec    86.16 M     4.84&lt;br /&gt;
 7    &amp;quot;float multiple variable divides&amp;quot;  53.81 sec    29.73 M    14.01&lt;br /&gt;
 8      &amp;quot;float multiple variable mixed&amp;quot;   3.85 sec   415.58 M     1.00&lt;br /&gt;
&lt;br /&gt;
test                         description   absolute   operations   ratio with&lt;br /&gt;
number                                     time       per second   test0&lt;br /&gt;
&lt;br /&gt;
 0                 &amp;quot;double variable add&amp;quot;   3.84 sec   416.67 M     1.00&lt;br /&gt;
 1       &amp;quot;double multiple variable adds&amp;quot;   6.23 sec   256.82 M     1.62&lt;br /&gt;
 2            &amp;quot;double variable subtract&amp;quot;   3.84 sec   416.67 M     1.00&lt;br /&gt;
 3  &amp;quot;double multiple variable subtracts&amp;quot;   6.22 sec   257.23 M     1.62&lt;br /&gt;
 4            &amp;quot;double variable multiply&amp;quot;   3.84 sec   416.67 M     1.00&lt;br /&gt;
 5 &amp;quot;double multiple variable multiplies&amp;quot;   4.17 sec   383.69 M     1.09&lt;br /&gt;
 6              &amp;quot;double variable divide&amp;quot;  18.58 sec    86.11 M     4.84&lt;br /&gt;
 7    &amp;quot;double multiple variable divides&amp;quot;  53.80 sec    29.74 M    14.01&lt;br /&gt;
 8      &amp;quot;double multiple variable mixed&amp;quot;   3.85 sec   415.58 M     1.00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;LLVM&amp;lt;/b&amp;gt; fails to completely optimize even a single case.&lt;br /&gt;
But the assembly dump shows that it does partly optimize the mixed math case similar to the way it did for integers.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h3&amp;gt;Conclusions for floating point loop invariants&amp;lt;/h3&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;With the -O3 flag for optimization, &amp;lt;b&amp;gt;LLVM&amp;lt;/b&amp;gt; is rarely moving floating point loop invariant calculations out of inner loops.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;NOTE: LLVM doesn't do the FP optimizations because the LLVM architect believes it would affect the result of the FP operations.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;/div&gt;</summary>
		<author><name>ChrisCox</name></author>	</entry>

	<entry>
		<id>http://stlab.adobe.com/wiki/index.php/Performance/Analysis/Example1</id>
		<title>Performance/Analysis/Example1</title>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php/Performance/Analysis/Example1"/>
				<updated>2008-05-30T03:23:29Z</updated>
		
		<summary type="html">&lt;p&gt;Summary: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;h2&amp;gt;Loop Unrolling and GCC&amp;lt;/h2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let's take a look at the output from loop_unroll.cpp, compiled with &amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt; 4.0.1.&lt;br /&gt;
(not to pick on &amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt;, but it gives some very interesting results)&lt;br /&gt;
&lt;br /&gt;
And to be specific:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Apple PowerMac G5 2.5 Ghz Quad Processor&lt;br /&gt;
gcc version 4.0.1 (Apple Computer, Inc. build 5367)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;g++ -I. -O3 loop_unroll.cpp -o loop_unroll_gcc&lt;br /&gt;
./loop_unroll_gcc &amp;gt; &amp;quot;loop_unroll gccO3.txt&amp;quot;&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;hr&amp;gt;&lt;br /&gt;
&amp;lt;h3&amp;gt;Integer loop unrolling&amp;lt;/h3&amp;gt;&lt;br /&gt;
I'm only copying the first 10 results of each section to save space.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
test                description   absolute   operations   ratio with&lt;br /&gt;
number                            time       per second   test0&lt;br /&gt;
&lt;br /&gt;
 0  &amp;quot;int32_t for loop unroll 1&amp;quot;  12.10 sec   198.35 M     1.00&lt;br /&gt;
 1  &amp;quot;int32_t for loop unroll 2&amp;quot;  13.94 sec   172.17 M     1.15&lt;br /&gt;
 2  &amp;quot;int32_t for loop unroll 3&amp;quot;  10.61 sec   226.20 M     0.88&lt;br /&gt;
 3  &amp;quot;int32_t for loop unroll 4&amp;quot;   8.91 sec   269.36 M     0.74&lt;br /&gt;
 4  &amp;quot;int32_t for loop unroll 5&amp;quot;   9.62 sec   249.48 M     0.80&lt;br /&gt;
 5  &amp;quot;int32_t for loop unroll 6&amp;quot;   9.77 sec   245.65 M     0.81&lt;br /&gt;
 6  &amp;quot;int32_t for loop unroll 7&amp;quot;   9.16 sec   262.01 M     0.76&lt;br /&gt;
 7  &amp;quot;int32_t for loop unroll 8&amp;quot;   9.86 sec   243.41 M     0.81&lt;br /&gt;
 8  &amp;quot;int32_t for loop unroll 9&amp;quot;   9.31 sec   257.79 M     0.77&lt;br /&gt;
 9 &amp;quot;int32_t for loop unroll 10&amp;quot;   8.96 sec   267.86 M     0.74&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The times are more or less decreasing with the unroll factor, and the minimum time is reached after an unroll factor of 4 or more.&lt;br /&gt;
This is close to the pattern we expect to see when loops are unrolled by hand and the compiler doesn't do any further unrolling.&lt;br /&gt;
In fact, with only -O3, &amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt; does not unroll loops (missing out on an important optimization!).&lt;br /&gt;
So later, we'll have to try another test with &amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt;'s -funroll-loops flag.&lt;br /&gt;
&lt;br /&gt;
Because unrolling by a factor of 2 got slower than a non-unrolled loop, we can also guess that &amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt;&lt;br /&gt;
has some trouble scheduling the instructions (it should have been the same or faster).&lt;br /&gt;
A look at the assembly code confirms that the factor of 2 case has a few more stalls than the non-unrolled case.&lt;br /&gt;
I am a little worried that the times go back up when unrolling by a factor of 5 or more.&lt;br /&gt;
But another look at the assembly shows that it's just more less-than-optimal scheduling.&lt;br /&gt;
&lt;br /&gt;
Alright, on to the &amp;lt;b&amp;gt;while&amp;lt;/b&amp;gt; loops.  These look about the same as the &amp;lt;b&amp;gt;for&amp;lt;/b&amp;gt; loops.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
test                  description   absolute   operations   ratio with&lt;br /&gt;
number                              time       per second   test0&lt;br /&gt;
&lt;br /&gt;
 0  &amp;quot;int32_t while loop unroll 1&amp;quot;  12.11 sec   198.18 M     1.00&lt;br /&gt;
 1  &amp;quot;int32_t while loop unroll 2&amp;quot;  13.93 sec   172.29 M     1.15&lt;br /&gt;
 2  &amp;quot;int32_t while loop unroll 3&amp;quot;  10.60 sec   226.42 M     0.88&lt;br /&gt;
 3  &amp;quot;int32_t while loop unroll 4&amp;quot;   8.90 sec   269.66 M     0.73&lt;br /&gt;
 4  &amp;quot;int32_t while loop unroll 5&amp;quot;   9.61 sec   249.74 M     0.79&lt;br /&gt;
 5  &amp;quot;int32_t while loop unroll 6&amp;quot;   9.78 sec   245.40 M     0.81&lt;br /&gt;
 6  &amp;quot;int32_t while loop unroll 7&amp;quot;   9.19 sec   261.15 M     0.76&lt;br /&gt;
 7  &amp;quot;int32_t while loop unroll 8&amp;quot;   9.86 sec   243.41 M     0.81&lt;br /&gt;
 8  &amp;quot;int32_t while loop unroll 9&amp;quot;   9.32 sec   257.51 M     0.77&lt;br /&gt;
 9 &amp;quot;int32_t while loop unroll 10&amp;quot;   8.95 sec   268.16 M     0.74&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;b&amp;gt;do&amp;lt;/b&amp;gt; loops are just a little different.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
test               description   absolute   operations   ratio with&lt;br /&gt;
number                           time       per second   test0&lt;br /&gt;
&lt;br /&gt;
 0  &amp;quot;int32_t do loop unroll 1&amp;quot;  11.22 sec   213.90 M     1.00&lt;br /&gt;
 1  &amp;quot;int32_t do loop unroll 2&amp;quot;  13.46 sec   178.31 M     1.20&lt;br /&gt;
 2  &amp;quot;int32_t do loop unroll 3&amp;quot;  10.00 sec   240.00 M     0.89&lt;br /&gt;
 3  &amp;quot;int32_t do loop unroll 4&amp;quot;   8.89 sec   269.97 M     0.79&lt;br /&gt;
 4  &amp;quot;int32_t do loop unroll 5&amp;quot;   9.24 sec   259.74 M     0.82&lt;br /&gt;
 5  &amp;quot;int32_t do loop unroll 6&amp;quot;   9.78 sec   245.40 M     0.87&lt;br /&gt;
 6  &amp;quot;int32_t do loop unroll 7&amp;quot;   8.55 sec   280.70 M     0.76&lt;br /&gt;
 7  &amp;quot;int32_t do loop unroll 8&amp;quot;   9.74 sec   246.41 M     0.87&lt;br /&gt;
 8  &amp;quot;int32_t do loop unroll 9&amp;quot;   9.30 sec   258.06 M     0.83&lt;br /&gt;
 9 &amp;quot;int32_t do loop unroll 10&amp;quot;   8.67 sec   276.82 M     0.77&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;do&amp;lt;/b&amp;gt; loops are faster &amp;lt;b&amp;gt;for&amp;lt;/b&amp;gt; the non-unrolled case than &amp;lt;b&amp;gt;for&amp;lt;/b&amp;gt; or &amp;lt;b&amp;gt;while&amp;lt;/b&amp;gt; loops.&lt;br /&gt;
That means that &amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt; may have some problems normalizing the different loop forms or with&lt;br /&gt;
instantiating the templates.  Unfortunately, reading the assembly and compiler intermediate data&lt;br /&gt;
didn't help me identify the exact cause.&lt;br /&gt;
But it is still close to the results for the &amp;lt;b&amp;gt;for&amp;lt;/b&amp;gt; loops after unrolling by a factor of 2 or more.&lt;br /&gt;
&lt;br /&gt;
Finally, the &amp;lt;b&amp;gt;goto&amp;lt;/b&amp;gt; loops look similar to the &amp;lt;b&amp;gt;do&amp;lt;/b&amp;gt; loops.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
test                 description   absolute   operations   ratio with&lt;br /&gt;
number                             time       per second   test0&lt;br /&gt;
&lt;br /&gt;
 0  &amp;quot;int32_t goto loop unroll 1&amp;quot;  11.21 sec   214.09 M     1.00&lt;br /&gt;
 1  &amp;quot;int32_t goto loop unroll 2&amp;quot;  13.46 sec   178.31 M     1.20&lt;br /&gt;
 2  &amp;quot;int32_t goto loop unroll 3&amp;quot;  10.18 sec   235.76 M     0.91&lt;br /&gt;
 3  &amp;quot;int32_t goto loop unroll 4&amp;quot;   8.90 sec   269.66 M     0.79&lt;br /&gt;
 4  &amp;quot;int32_t goto loop unroll 5&amp;quot;   9.24 sec   259.74 M     0.82&lt;br /&gt;
 5  &amp;quot;int32_t goto loop unroll 6&amp;quot;   9.77 sec   245.65 M     0.87&lt;br /&gt;
 6  &amp;quot;int32_t goto loop unroll 7&amp;quot;   8.56 sec   280.37 M     0.76&lt;br /&gt;
 7  &amp;quot;int32_t goto loop unroll 8&amp;quot;   9.74 sec   246.41 M     0.87&lt;br /&gt;
 8  &amp;quot;int32_t goto loop unroll 9&amp;quot;   9.30 sec   258.06 M     0.83&lt;br /&gt;
 9 &amp;quot;int32_t goto loop unroll 10&amp;quot;   8.66 sec   277.14 M     0.77&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h4&amp;gt;Conclusions for int32_t loop unrolling under &amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt;:&amp;lt;/h4&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;With the -O3 flag for optimization, &amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt; is not unrolling loops.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt; can generate faster code for many of these loops if it unrolled them.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt; can generate faster code for the non-unrolled &amp;lt;b&amp;gt;for&amp;lt;/b&amp;gt; and &amp;lt;b&amp;gt;while&amp;lt;/b&amp;gt; loops if&lt;br /&gt;
it generated code similar to the &amp;lt;b&amp;gt;do&amp;lt;/b&amp;gt; and &amp;lt;b&amp;gt;goto&amp;lt;/b&amp;gt; loops.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt; has some instruction scheduling problems on my CPU.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;hr&amp;gt;&lt;br /&gt;
&amp;lt;h3&amp;gt;Double loop unrolling&amp;lt;/h3&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now let's take a look at the double precision loop unrolling results.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;test               description   absolute   operations   ratio with&lt;br /&gt;
number                           time       per second   test0&lt;br /&gt;
&lt;br /&gt;
 0  &amp;quot;double for loop unroll 1&amp;quot;   3.13 sec   191.69 M     1.00&lt;br /&gt;
 1  &amp;quot;double for loop unroll 2&amp;quot;   2.23 sec   269.06 M     0.71&lt;br /&gt;
 2  &amp;quot;double for loop unroll 3&amp;quot;   3.91 sec   153.45 M     1.25&lt;br /&gt;
 3  &amp;quot;double for loop unroll 4&amp;quot;   7.46 sec    80.43 M     2.38&lt;br /&gt;
 4  &amp;quot;double for loop unroll 5&amp;quot;   6.90 sec    86.96 M     2.20&lt;br /&gt;
 5  &amp;quot;double for loop unroll 6&amp;quot;   5.79 sec   103.63 M     1.85&lt;br /&gt;
 6  &amp;quot;double for loop unroll 7&amp;quot;   6.30 sec    95.24 M     2.01&lt;br /&gt;
 7  &amp;quot;double for loop unroll 8&amp;quot;   6.65 sec    90.23 M     2.12&lt;br /&gt;
 8  &amp;quot;double for loop unroll 9&amp;quot;   7.07 sec    84.87 M     2.26&lt;br /&gt;
 9 &amp;quot;double for loop unroll 10&amp;quot;   7.30 sec    82.19 M     2.33&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The times go down, then up, way up, down a little, up again, and... What the heck?&lt;br /&gt;
If a compiler doesn't unroll the loops beyond what the template did,&lt;br /&gt;
the times should decrease with larger unrolling factors.&lt;br /&gt;
And we know that with -O3, &amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt; is not unrolling the loops.&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;b&amp;gt;while&amp;lt;/b&amp;gt; loops look about the same as the &amp;lt;b&amp;gt;for&amp;lt;/b&amp;gt; loops.&lt;br /&gt;
The &amp;lt;b&amp;gt;do&amp;lt;/b&amp;gt; and &amp;lt;b&amp;gt;goto&amp;lt;/b&amp;gt; loops are again a little faster for the non-unrolled case.&lt;br /&gt;
But all of the double loops show the same unexpected timings.&lt;br /&gt;
&lt;br /&gt;
What is going on?&lt;br /&gt;
&lt;br /&gt;
I tried to isolate a few of the double tests so I could look at the assembly, but when I ran it I got:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
test               description   absolute   operations   ratio with&lt;br /&gt;
number                           time       per second   test0&lt;br /&gt;
&lt;br /&gt;
 0  &amp;quot;double for loop unroll 1&amp;quot;   3.13 sec   191.69 M     1.00&lt;br /&gt;
 1  &amp;quot;double for loop unroll 2&amp;quot;   2.23 sec   269.06 M     0.71&lt;br /&gt;
 2  &amp;quot;double for loop unroll 3&amp;quot;   1.99 sec   301.51 M     0.64&lt;br /&gt;
 3  &amp;quot;double for loop unroll 4&amp;quot;   1.95 sec   307.69 M     0.62&lt;br /&gt;
 4  &amp;quot;double for loop unroll 5&amp;quot;   1.92 sec   312.50 M     0.61&lt;br /&gt;
 5  &amp;quot;double for loop unroll 6&amp;quot;   1.82 sec   329.67 M     0.58&lt;br /&gt;
 6  &amp;quot;double for loop unroll 7&amp;quot;   1.95 sec   307.69 M     0.62&lt;br /&gt;
 7  &amp;quot;double for loop unroll 8&amp;quot;   2.01 sec   298.51 M     0.64&lt;br /&gt;
 8  &amp;quot;double for loop unroll 9&amp;quot;   2.06 sec   291.26 M     0.66&lt;br /&gt;
 9 &amp;quot;double for loop unroll 10&amp;quot;   2.02 sec   297.03 M     0.65&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Wait! That's not the same!&lt;br /&gt;
&lt;br /&gt;
When I add back some of the other tests (&amp;lt;b&amp;gt;while&amp;lt;/b&amp;gt;, &amp;lt;b&amp;gt;do&amp;lt;/b&amp;gt;, &amp;lt;b&amp;gt;goto&amp;lt;/b&amp;gt;, or the integer tests)&lt;br /&gt;
the times get worse - but not in any predictable pattern that I could see.&lt;br /&gt;
Somehow, &amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt; is having problems with these loops because of the surrounding code.&lt;br /&gt;
&lt;br /&gt;
Sadly, the results are very similar if I compile with -funroll-loops.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h4&amp;gt;Conclusions for double loop unrolling under &amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt;:&amp;lt;/h4&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&amp;lt;li&amp;gt;I can't draw any conclusions about unrolling loops with double precision math using this code,&lt;br /&gt;
until some bugs are fixed in &amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt;.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;i&amp;gt;[I confirmed with hand unrolled code that the double loops show the same timing&lt;br /&gt;
patterns as the integer loops. The &amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt; bug seems related to template instantiation -- ccox]&amp;lt;/i&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;hr&amp;gt;&lt;br /&gt;
&amp;lt;h3&amp;gt;Integer loops with -funroll-loops&amp;lt;/h3&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now let's see what happens when we explicitly tell &amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt; to unroll the loops.&lt;br /&gt;
Of course, adding the extra flag is not allowed by the benchmark rules, but it is interesting to&lt;br /&gt;
know what would happen if &amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt; did turn on loop unrolling.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;g++ -I. -O3 -funroll-loops loop_unroll.cpp -o loop_unroll_gcc_unrolled&lt;br /&gt;
./loop_unroll_gcc_unrolled &amp;gt; &amp;quot;loop_unroll gccO3unroll.txt&amp;quot;&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
test                description   absolute   operations   ratio with&lt;br /&gt;
number                            time       per second   test0&lt;br /&gt;
&lt;br /&gt;
 0  &amp;quot;int32_t for loop unroll 1&amp;quot;  10.30 sec   233.01 M     1.00&lt;br /&gt;
 1  &amp;quot;int32_t for loop unroll 2&amp;quot;  10.57 sec   227.06 M     1.03&lt;br /&gt;
 2  &amp;quot;int32_t for loop unroll 3&amp;quot;  10.62 sec   225.99 M     1.03&lt;br /&gt;
 3  &amp;quot;int32_t for loop unroll 4&amp;quot;   8.90 sec   269.66 M     0.86&lt;br /&gt;
 4  &amp;quot;int32_t for loop unroll 5&amp;quot;   9.62 sec   249.48 M     0.93&lt;br /&gt;
 5  &amp;quot;int32_t for loop unroll 6&amp;quot;   9.78 sec   245.40 M     0.95&lt;br /&gt;
 6  &amp;quot;int32_t for loop unroll 7&amp;quot;   9.22 sec   260.30 M     0.90&lt;br /&gt;
 7  &amp;quot;int32_t for loop unroll 8&amp;quot;   9.86 sec   243.41 M     0.96&lt;br /&gt;
 8  &amp;quot;int32_t for loop unroll 9&amp;quot;   9.32 sec   257.51 M     0.90&lt;br /&gt;
 9 &amp;quot;int32_t for loop unroll 10&amp;quot;   8.95 sec   268.16 M     0.87&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The non-unrolled case, and the unrolled by 2 cases have improved a little.&lt;br /&gt;
The rest of the cases look about the same as without -funroll-loops.&lt;br /&gt;
It looks like &amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt; is unrolling small loops, but not larger loops.&lt;br /&gt;
And &amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt; is not unrolling them enough to get optimal performance, or has trouble scheduling&lt;br /&gt;
the instructions.  A look at the assembly shows that &amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt; unrolled the non-unrolled case by a&lt;br /&gt;
factor of 4 (good!), but scheduled the instructions rather poorly.&lt;br /&gt;
The unroll factor 2 case was unrolled by an additional factor of 2, and scheduled poorly.&lt;br /&gt;
Ok, so &amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt; is trying to unroll the loops and just not scheduling the instructions very well after unrolling.&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;b&amp;gt;while&amp;lt;/b&amp;gt; loops look very similar to the &amp;lt;b&amp;gt;for&amp;lt;/b&amp;gt; loop results.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
test                  description   absolute   operations   ratio with&lt;br /&gt;
number                              time       per second   test0&lt;br /&gt;
&lt;br /&gt;
 0  &amp;quot;int32_t while loop unroll 1&amp;quot;  10.27 sec   233.69 M     1.00&lt;br /&gt;
 1  &amp;quot;int32_t while loop unroll 2&amp;quot;  10.58 sec   226.84 M     1.03&lt;br /&gt;
 2  &amp;quot;int32_t while loop unroll 3&amp;quot;  10.61 sec   226.20 M     1.03&lt;br /&gt;
 3  &amp;quot;int32_t while loop unroll 4&amp;quot;   8.90 sec   269.66 M     0.87&lt;br /&gt;
 4  &amp;quot;int32_t while loop unroll 5&amp;quot;   9.62 sec   249.48 M     0.94&lt;br /&gt;
 5  &amp;quot;int32_t while loop unroll 6&amp;quot;   9.78 sec   245.40 M     0.95&lt;br /&gt;
 6  &amp;quot;int32_t while loop unroll 7&amp;quot;   9.23 sec   260.02 M     0.90&lt;br /&gt;
 7  &amp;quot;int32_t while loop unroll 8&amp;quot;   9.87 sec   243.16 M     0.96&lt;br /&gt;
 8  &amp;quot;int32_t while loop unroll 9&amp;quot;   9.31 sec   257.79 M     0.91&lt;br /&gt;
 9 &amp;quot;int32_t while loop unroll 10&amp;quot;   8.97 sec   267.56 M     0.87&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;b&amp;gt;do&amp;lt;/b&amp;gt; loops show the same oddity on the non-unrolled case as the integer loops.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
test               description   absolute   operations   ratio with&lt;br /&gt;
number                           time       per second   test0&lt;br /&gt;
&lt;br /&gt;
 0  &amp;quot;int32_t do loop unroll 1&amp;quot;   8.42 sec   285.04 M     1.00&lt;br /&gt;
 1  &amp;quot;int32_t do loop unroll 2&amp;quot;  10.70 sec   224.30 M     1.27&lt;br /&gt;
 2  &amp;quot;int32_t do loop unroll 3&amp;quot;  10.09 sec   237.86 M     1.20&lt;br /&gt;
 3  &amp;quot;int32_t do loop unroll 4&amp;quot;   9.86 sec   243.41 M     1.17&lt;br /&gt;
 4  &amp;quot;int32_t do loop unroll 5&amp;quot;   9.23 sec   260.02 M     1.10&lt;br /&gt;
 5  &amp;quot;int32_t do loop unroll 6&amp;quot;   9.78 sec   245.40 M     1.16&lt;br /&gt;
 6  &amp;quot;int32_t do loop unroll 7&amp;quot;   8.57 sec   280.05 M     1.02&lt;br /&gt;
 7  &amp;quot;int32_t do loop unroll 8&amp;quot;   9.75 sec   246.15 M     1.16&lt;br /&gt;
 8  &amp;quot;int32_t do loop unroll 9&amp;quot;   9.31 sec   257.79 M     1.11&lt;br /&gt;
 9 &amp;quot;int32_t do loop unroll 10&amp;quot;   8.68 sec   276.50 M     1.03&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And the &amp;lt;b&amp;gt;goto&amp;lt;/b&amp;gt; loops look about the same as the &amp;lt;b&amp;gt;do&amp;lt;/b&amp;gt; loops.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
test                 description   absolute   operations   ratio with&lt;br /&gt;
number                             time       per second   test0&lt;br /&gt;
&lt;br /&gt;
 0  &amp;quot;int32_t goto loop unroll 1&amp;quot;   8.42 sec   285.04 M     1.00&lt;br /&gt;
 1  &amp;quot;int32_t goto loop unroll 2&amp;quot;  10.69 sec   224.51 M     1.27&lt;br /&gt;
 2  &amp;quot;int32_t goto loop unroll 3&amp;quot;  10.16 sec   236.22 M     1.21&lt;br /&gt;
 3  &amp;quot;int32_t goto loop unroll 4&amp;quot;   9.84 sec   243.90 M     1.17&lt;br /&gt;
 4  &amp;quot;int32_t goto loop unroll 5&amp;quot;   9.24 sec   259.74 M     1.10&lt;br /&gt;
 5  &amp;quot;int32_t goto loop unroll 6&amp;quot;   9.77 sec   245.65 M     1.16&lt;br /&gt;
 6  &amp;quot;int32_t goto loop unroll 7&amp;quot;   8.56 sec   280.37 M     1.02&lt;br /&gt;
 7  &amp;quot;int32_t goto loop unroll 8&amp;quot;   9.73 sec   246.66 M     1.16&lt;br /&gt;
 8  &amp;quot;int32_t goto loop unroll 9&amp;quot;   9.29 sec   258.34 M     1.10&lt;br /&gt;
 9 &amp;quot;int32_t goto loop unroll 10&amp;quot;   8.67 sec   276.82 M     1.03&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h4&amp;gt;Conclusions for int32_t loop unrolling, with -funroll-loop:&amp;lt;/h4&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&amp;lt;li&amp;gt;With the -funroll-loops flag, &amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt; is unrolling some smaller loops.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt; can generate faster code for these loops if it did a better job of scheduling&lt;br /&gt;
the instructions after unrolling.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;b&amp;gt;gcc&amp;lt;/b&amp;gt; can still generate faster code for the non-unrolled &amp;lt;b&amp;gt;for&amp;lt;/b&amp;gt; and &amp;lt;b&amp;gt;while&amp;lt;/b&amp;gt;&lt;br /&gt;
loops if it generated code similar to the &amp;lt;b&amp;gt;do&amp;lt;/b&amp;gt; and &amp;lt;b&amp;gt;goto&amp;lt;/b&amp;gt; loops.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;/div&gt;</summary>
		<author><name>ChrisCox</name></author>	</entry>

	<entry>
		<id>http://stlab.adobe.com/wiki/index.php/Performance/Analysis/Examples</id>
		<title>Performance/Analysis/Examples</title>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php/Performance/Analysis/Examples"/>
				<updated>2008-05-30T03:17:20Z</updated>
		
		<summary type="html">&lt;p&gt;Summary: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;h2&amp;gt;What do all those numbers mean?&amp;lt;/h2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Ok, you've downloaded the benchmark source, compiled and run it, and now you have a text file with a lot of numbers.&lt;br /&gt;
How do you interpret those numbers?&lt;br /&gt;
&lt;br /&gt;
You are looking at the raw numbers from each test, not a weighted average or formatted summary.&lt;br /&gt;
Eventually, we will have a web site where you can look at the raw numbers from different compilers and platforms,&lt;br /&gt;
as well as analysis that has broken down the raw numbers into something a little smaller and easier to digest.&lt;br /&gt;
But that's going to take time to build (both the analysis and the web site).&lt;br /&gt;
For now, these are engineering numbers, not marketing numbers. You can think of it as a list of&lt;br /&gt;
times taken by each of the runners in a track event, not the short list of olympic medal winners -- useful if you&lt;br /&gt;
want to do some analysis, but not as useful if all you want to know is the winner.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h2&amp;gt;Examples of analyzing benchmark results:&amp;lt;/h2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Performance/Analysis/Example1|loop_unroll.cpp with gcc]]&lt;br /&gt;
&lt;br /&gt;
[[Performance/Analysis/Example2|simple_types_loop_invariant.cpp with LLVM]]&lt;br /&gt;
&lt;br /&gt;
[[Performance/Analysis/Example3|stepanov_abstraction.cpp with MSVC]]&lt;/div&gt;</summary>
		<author><name>ChrisCox</name></author>	</entry>

	<entry>
		<id>http://stlab.adobe.com/wiki/index.php/Performance/WhatToTest</id>
		<title>Performance/WhatToTest</title>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php/Performance/WhatToTest"/>
				<updated>2008-05-06T00:50:43Z</updated>
		
		<summary type="html">&lt;p&gt;Summary: /* More Complex Stuff */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;What things need to be tested as part of this benchmark suite?&lt;br /&gt;
&lt;br /&gt;
If you have additions or suggestions, please add them.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Fundamental Concepts ==&lt;br /&gt;
Most of these can't easily be tested.&lt;br /&gt;
* regular types&lt;br /&gt;
* basic math&lt;br /&gt;
* basic flow control&lt;br /&gt;
* dereference&lt;br /&gt;
** a pointer/iterator is a regular type that can be dereferenced&lt;br /&gt;
* type conversion&lt;br /&gt;
** performance of type conversions (known to be a problem on certain compilers)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Language Concepts ==&lt;br /&gt;
* abstraction&lt;br /&gt;
**what happens to performance when I put { } around a value?&lt;br /&gt;
&lt;br /&gt;
* inheritance&lt;br /&gt;
**what happens if my function is in a child class, virtual, or both?&lt;br /&gt;
**what happens if my values are in a child class?&lt;br /&gt;
&lt;br /&gt;
* function calls - how expensive are different styles of function call?&lt;br /&gt;
** inline, function pointer, defined in another module, etc.&lt;br /&gt;
&lt;br /&gt;
* exceptions - what is the cost of using exceptions?&lt;br /&gt;
** throwing versus not&lt;br /&gt;
** compared to error return codes&lt;br /&gt;
** exceptions enabled or not in the compiler&lt;br /&gt;
** compared to setjmp/longjmp?&lt;br /&gt;
&lt;br /&gt;
* RTTI - what are the costs?&lt;br /&gt;
** compared to per-object ids or strings&lt;br /&gt;
&lt;br /&gt;
* function objects - what are the costs/benefits compared to inline functions and function pointers?&lt;br /&gt;
&lt;br /&gt;
* template instantiation&lt;br /&gt;
** are there any problems with duplicate template bodies and code size?&lt;br /&gt;
** are there any problems with recursive templates or deep usage of templates?&lt;br /&gt;
&lt;br /&gt;
* constructors&lt;br /&gt;
** Are empty constructors correctly optimized away?&lt;br /&gt;
** Are copy constructors correctly optimized away?&lt;br /&gt;
** Are default copy constructors efficient?&lt;br /&gt;
&lt;br /&gt;
== Simple Idioms ==&lt;br /&gt;
These are building blocks, with more than one way to express them.  These are commonly reused things, usually found in headers like &amp;quot;inlines.h&amp;quot;, &amp;quot;math_utils.h&amp;quot;, etc.&lt;br /&gt;
&lt;br /&gt;
For all of these:   What is the optimal implementation, and does the compiler recognize other implementations and substitute the optimal one?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* swapping 2 values&lt;br /&gt;
** does anything need to be tested for swapping 2 different types, or is that handled by assignment/conversion testing?&lt;br /&gt;
* absolute value&lt;br /&gt;
* min/max of 2 values&lt;br /&gt;
* switch versus if/else trees&lt;br /&gt;
* pin values to range (max, min combo)&lt;br /&gt;
* rotate bits within a byte/word&lt;br /&gt;
* byte order reversal&lt;br /&gt;
* round up/down&lt;br /&gt;
* round float to int&lt;br /&gt;
* thresholding&lt;br /&gt;
* pointer alignment tests and comparisons&lt;br /&gt;
* fixed point math&lt;br /&gt;
* extended precision math (ie: 64 bit math on a 32 bit CPU)&lt;br /&gt;
&lt;br /&gt;
== Complex Idioms ==&lt;br /&gt;
These may require multiple optimizations to perform well, or some pattern/idiom recognition in the compiler.&lt;br /&gt;
These are simple algorithms important to many different tasks/fields, but are frequently the inner loops and most performance sensitive code.&lt;br /&gt;
&amp;lt;BR&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* safe bool cast from a smart pointer&lt;br /&gt;
* hand coded memcpy, memmove, memset, memcmp&lt;br /&gt;
** simple types and user defined types&lt;br /&gt;
** arrays and containers&lt;br /&gt;
* convolution&lt;br /&gt;
** 1D, 2D, and more?&lt;br /&gt;
* matrix multiplication&lt;br /&gt;
** 2D is common&lt;br /&gt;
** higher dimensions get a bit esoteric&lt;br /&gt;
* min/max of a sequence&lt;br /&gt;
* summation of a sequence&lt;br /&gt;
* product of a sequence&lt;br /&gt;
* dot product of sequences&lt;br /&gt;
* sorting a sequence&lt;br /&gt;
** many possible algorithms and implementations&lt;br /&gt;
** given the permutations, is this testable?&lt;br /&gt;
** maybe just test STL implementations?&lt;br /&gt;
* reversing a sequence&lt;br /&gt;
* rotating members of a sequence&lt;br /&gt;
* find/search sequence&lt;br /&gt;
** again, many possible algorithms and implementations&lt;br /&gt;
** given the permutations, is this testable?&lt;br /&gt;
** maybe just test STL implementations?&lt;br /&gt;
* lookup tables&lt;br /&gt;
* interleave/deinterleave buffers&lt;br /&gt;
* transpose block&lt;br /&gt;
* rotate a 2D block&lt;br /&gt;
** plus or minus 90 degrees&lt;br /&gt;
** 180 degrees&lt;br /&gt;
** horizontal flip&lt;br /&gt;
** vertical flip&lt;br /&gt;
** transpose&lt;br /&gt;
* histogram building&lt;br /&gt;
* stack like operations&lt;br /&gt;
* bitarrays and manipulations of them&lt;br /&gt;
* complex type and template&lt;br /&gt;
&lt;br /&gt;
== Runtime and Library Support ==&lt;br /&gt;
These are library functions or header defined functions/macros supplied by the compiler or OS. These can sometimes be the performance limiting factor in a variety of applications.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* allocation / deletion&lt;br /&gt;
** new, delete&lt;br /&gt;
** malloc, free&lt;br /&gt;
** performance for various sizes (testing suballocation)&lt;br /&gt;
&lt;br /&gt;
* mathlib&lt;br /&gt;
** all functions&lt;br /&gt;
** compared to inline versions of some functions&lt;br /&gt;
&lt;br /&gt;
* string manipulation routines&lt;br /&gt;
** strcpy, strcmp, strchr, strcat, etc.&lt;br /&gt;
** strncpy, strncmp, strncat, etc.&lt;br /&gt;
&lt;br /&gt;
* memory routines&lt;br /&gt;
** memcpy, memmove, memset, memcmp, etc.&lt;br /&gt;
&lt;br /&gt;
* ctype routines&lt;br /&gt;
** isdigit, isalpha, etc.&lt;br /&gt;
&lt;br /&gt;
* stdio&lt;br /&gt;
** printf, getc, fread, fwrite, etc.&lt;br /&gt;
&lt;br /&gt;
* C++ iostreams&lt;br /&gt;
** cout, ostream, etc.&lt;br /&gt;
** compare cout versus printf&lt;br /&gt;
&lt;br /&gt;
* stdlib&lt;br /&gt;
** strtod, strtof&lt;br /&gt;
&lt;br /&gt;
* threads&lt;br /&gt;
** creation, destruction, sleep overhead, semaphores, mutexes&lt;br /&gt;
&lt;br /&gt;
== More Complex Stuff ==&lt;br /&gt;
* STL containers&lt;br /&gt;
** insertion&lt;br /&gt;
*** push_front&lt;br /&gt;
*** push_back&lt;br /&gt;
*** in-order, reverse-order and random insert for ordered containers&lt;br /&gt;
** removal&lt;br /&gt;
** deletion of container&lt;br /&gt;
** iterating items&lt;br /&gt;
** reverse iterating items&lt;br /&gt;
** copy containers&lt;br /&gt;
** erase all entries&lt;br /&gt;
** search/find/lookup&lt;br /&gt;
*** in-order, reverse-order and random&lt;br /&gt;
** vector&lt;br /&gt;
*** is iteration of vector the same as iterating a pointer&lt;br /&gt;
*** is operator[] the same speed as indexing a pointer&lt;br /&gt;
*** what is the penalty for using operator at() range checking&lt;br /&gt;
&lt;br /&gt;
* STL algorithms&lt;br /&gt;
* iostreams&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* strstr / text searching&lt;br /&gt;
** no &amp;quot;best&amp;quot; algorithm&lt;br /&gt;
** large variance in optimum based on alphabet (roman text, gene sequencing, bit pattern)&lt;br /&gt;
&lt;br /&gt;
* Discrete Wavelet/Fourier/Cosine/Sine Transform&lt;br /&gt;
** 1D (Audio, PDE spectral methods)&lt;br /&gt;
** 2D (video, image compression)&lt;br /&gt;
** simple and optimized algorithms&lt;br /&gt;
** wavelet and FFT have many near optimal implementations&lt;br /&gt;
** JPEG, MPEG&lt;br /&gt;
&lt;br /&gt;
== Optimizations ==&lt;br /&gt;
We need to test as many optimizations as possible.  Priority should be given to those that are used most often and offer the highest benefit to the most code (which isn't always easy to judge).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* algebraic simplification&lt;br /&gt;
* constant folding&lt;br /&gt;
* constant propagation&lt;br /&gt;
* copy propagation&lt;br /&gt;
* common subexpression elimination&lt;br /&gt;
* loop invariant code motion&lt;br /&gt;
* dead code elimination&lt;br /&gt;
* strength reduction&lt;br /&gt;
* value range propagation (aka Predicate Simplification?)&lt;br /&gt;
* loop fusion&lt;br /&gt;
* loop interchange&lt;br /&gt;
* loop normalization&lt;br /&gt;
* loop removal&lt;br /&gt;
* loop unrolling&lt;br /&gt;
* loop unswitching&lt;br /&gt;
* normalization of references&lt;br /&gt;
* scalar replacement of arrays&lt;br /&gt;
* scalar replacement of structs&lt;br /&gt;
* instruction scheduling&lt;br /&gt;
* removal of unused objects/allocations&lt;br /&gt;
** this may not be legal in some circumstances&lt;br /&gt;
* inlining&lt;br /&gt;
** keyword inline&lt;br /&gt;
** auto inline of small functions&lt;br /&gt;
** functions defined in classes&lt;br /&gt;
** functions defined in templates&lt;br /&gt;
** depth of inlining (could be difficult to test)&lt;br /&gt;
** leaf first or trunk first?&lt;br /&gt;
** what factors might disable inlining on some compilers?&lt;br /&gt;
&lt;br /&gt;
== Conformance to Specification ==&lt;br /&gt;
* user specified swap in STL algorithms&lt;br /&gt;
* short circuit evaluation of boolean expressions&lt;br /&gt;
* growth policies and complexities of STL containers&lt;br /&gt;
* concept checks compile away to nothing&lt;br /&gt;
&lt;br /&gt;
== Binary Size ==&lt;br /&gt;
* Are duplicate templates correctly collapsed?&lt;br /&gt;
* Are unused template member functions correctly removed?&lt;br /&gt;
* Are unused (or duplicate) static, const, and static const values correctly removed?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Language Extensions ==&lt;br /&gt;
* [http://www.boost.org/ Boost]&lt;br /&gt;
* [http://openmp.org/wp/ OpenMP]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Other ? ==&lt;br /&gt;
* ineffiency returning or working with std::pair ?&lt;/div&gt;</summary>
		<author><name>ChrisCox</name></author>	</entry>

	<entry>
		<id>http://stlab.adobe.com/wiki/index.php/Performance/ReleaseNotes</id>
		<title>Performance/ReleaseNotes</title>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php/Performance/ReleaseNotes"/>
				<updated>2008-05-05T21:26:33Z</updated>
		
		<summary type="html">&lt;p&gt;Summary: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;h1&amp;gt;Release 2 - December 8, 2008&amp;lt;/h1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h2&amp;gt;Known Problems&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;Line out of place in functionobjects.cpp near line 334.&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;i&amp;gt;&amp;lt;b&amp;gt;Moved the line to the correct location.&amp;lt;/b&amp;gt;&amp;lt;/i&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;makefile.nt was not updated to include the new test files&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;i&amp;gt;&amp;lt;b&amp;gt;Fixed makefile.nt, posted update.&amp;lt;/b&amp;gt;&amp;lt;/i&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h2&amp;gt;Files&amp;lt;/h2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;dl&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;dt&amp;gt;functionobjects.cpp&amp;lt;/dt&amp;gt;&lt;br /&gt;
&amp;lt;dd&amp;gt;&lt;br /&gt;
This is a demonstration of the performance of of function pointers, functors, and native comparison operators.  Some compilers have difficulty instantiating simple functors.&lt;br /&gt;
&amp;lt;/dd&amp;gt;&lt;br /&gt;
&amp;lt;dt&amp;gt;simple_types_constant_folding.cpp&amp;lt;/dt&amp;gt;&lt;br /&gt;
&amp;lt;dd&amp;gt;&lt;br /&gt;
A test to see if the compiler will correctly fold constants and simple constant math for simple types.  Results may be surprising.&lt;br /&gt;
&amp;lt;/dd&amp;gt;&lt;br /&gt;
&amp;lt;dt&amp;gt;stepanov_vector.cpp&amp;lt;/dt&amp;gt;&lt;br /&gt;
&amp;lt;dd&amp;gt;&lt;br /&gt;
Similar to the abstraction penalty benchmark, this file tests what happens when we move from pointers to vector iterators and when we use reverse iterators.  This tests the compiler supplied STL implementation in addition to the compiler itself.  Some good compilers are crippled by bad STL implementations.&lt;br /&gt;
&amp;lt;/dd&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/dl&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h1&amp;gt;Release 1 - May 5, 2008&amp;lt;/h1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h2&amp;gt;Fixed Problems&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Some Linux distributions have incomplete MACH/gcc headers.  I need a simple way to determine whether they have the TR1 headers or not.  Unfortunately, I don't have any of those distributions.&lt;br /&gt;
&amp;lt;i&amp;gt;&amp;lt;b&amp;gt;Related code is now commented out.&amp;lt;/b&amp;gt;&amp;lt;/i&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;Some 64 bit Linux distributions don't like benchmark_stdint.hpp.&lt;br /&gt;
&amp;lt;i&amp;gt;&amp;lt;b&amp;gt;Fix is posted, but really needs to be tested on more OSes.&amp;lt;/b&amp;gt;&amp;lt;/i&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;At least one embedded compiler didn't like the order of templates in loop_unroll.cpp.&lt;br /&gt;
&amp;lt;i&amp;gt;&amp;lt;b&amp;gt;Changed the template order.&amp;lt;/b&amp;gt;&amp;lt;/i&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h2&amp;gt;Files&amp;lt;/h2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;dl&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;dt&amp;gt;loop_unroll.cpp&amp;lt;/dt&amp;gt;&lt;br /&gt;
&amp;lt;dd&amp;gt;&lt;br /&gt;
Test to see if compilers will correctly unroll loops to hide instruction latency.  Some compilers have problems expanding the templates, and most compilers have problems correctly unrolling the loops for best performance.&lt;br /&gt;
&amp;lt;/dd&amp;gt;&lt;br /&gt;
&amp;lt;dt&amp;gt;simple_types_loop_invariant.cpp&amp;lt;/dt&amp;gt;&lt;br /&gt;
&amp;lt;dd&amp;gt;&lt;br /&gt;
A test to see if the compiler will move loop invariant calculations out of the loop.  Most compilers have room for improvement.&lt;br /&gt;
&amp;lt;/dd&amp;gt;&lt;br /&gt;
&amp;lt;dt&amp;gt;stepanov_abstraction.cpp&amp;lt;/dt&amp;gt;&lt;br /&gt;
&amp;lt;dd&amp;gt;&lt;br /&gt;
An expanded version of the original test, answering &amp;quot;what happens to performance when I wrap a value in curly braces?&amp;quot;  Almost all compilers do well on the original summation tests, but they don't do nearly so well on simple sort routines using the same abstractions.&lt;br /&gt;
&amp;lt;/dd&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;dt&amp;gt;machine.cpp&amp;lt;/dt&amp;gt;&lt;br /&gt;
&amp;lt;dd&amp;gt;&lt;br /&gt;
A utility program to print out information about the compiler version, OS, and machine environment - because it is nice to know which build of your compiler generated a particular report, and which of the 30 machines in your lab that it was run on.&lt;br /&gt;
&amp;lt;/dd&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;dt&amp;gt;benchmark_algorithms.h&amp;lt;/dt&amp;gt;&lt;br /&gt;
&amp;lt;dd&amp;gt;&lt;br /&gt;
Templated algorithms shared by several test files.&lt;br /&gt;
&amp;lt;/dd&amp;gt;&lt;br /&gt;
&amp;lt;dt&amp;gt;benchmark_results.h&amp;lt;/dt&amp;gt;&lt;br /&gt;
&amp;lt;dd&amp;gt;&lt;br /&gt;
Shared result reporting and formatting functions for all tests, must work in C and C++.&lt;br /&gt;
&amp;lt;/dd&amp;gt;&lt;br /&gt;
&amp;lt;dt&amp;gt;benchmark_shared_tests.h&amp;lt;/dt&amp;gt;&lt;br /&gt;
&amp;lt;dd&amp;gt;&lt;br /&gt;
Templated test functions shared by several test files.&lt;br /&gt;
&amp;lt;/dd&amp;gt;&lt;br /&gt;
&amp;lt;dt&amp;gt;benchmark_stdint.h&amp;lt;/dt&amp;gt;&lt;br /&gt;
&amp;lt;dd&amp;gt;&lt;br /&gt;
Standard type definitions, required because some compilers still have not picked up C99 standard headers.&lt;br /&gt;
&amp;lt;/dd&amp;gt;&lt;br /&gt;
&amp;lt;dt&amp;gt;benchmark_timer.h&amp;lt;/dt&amp;gt;&lt;br /&gt;
&amp;lt;dd&amp;gt;&lt;br /&gt;
Shared timer function for all tests, must work in C and C++.&lt;br /&gt;
&amp;lt;/dd&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;dt&amp;gt;makefile&amp;lt;/dt&amp;gt;&lt;br /&gt;
&amp;lt;dd&amp;gt;&lt;br /&gt;
Make information for Unix based OSes.  &amp;quot;make report&amp;quot; to build all binaries and start a benchmark run.&lt;br /&gt;
&amp;lt;/dd&amp;gt;&lt;br /&gt;
&amp;lt;dt&amp;gt;makefile.nt&amp;lt;/dt&amp;gt;&lt;br /&gt;
&amp;lt;dd&amp;gt;&lt;br /&gt;
Make information for Windows.  &amp;quot;nmake -f makefile.nt report&amp;quot; (in a shell/command prompt with the appropriate compiler variables set) to build all binaries and start a benchmark run.&lt;br /&gt;
&amp;lt;/dd&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;dt&amp;gt;LICENSE_1_0_0.txt&amp;lt;/dt&amp;gt;&lt;br /&gt;
&amp;lt;dd&amp;gt;&lt;br /&gt;
License details, a copy can also be found at http://stlab.adobe.com/licenses.html .&lt;br /&gt;
&amp;lt;/dd&amp;gt;&lt;br /&gt;
&amp;lt;dt&amp;gt;README.txt&amp;lt;/dt&amp;gt;&lt;br /&gt;
&amp;lt;dd&amp;gt;&lt;br /&gt;
Basic information about the benchmark suite: goals, rules, and instructions on how to build and run.&lt;br /&gt;
&amp;lt;/dd&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/dl&amp;gt;&lt;/div&gt;</summary>
		<author><name>ChrisCox</name></author>	</entry>

	<entry>
		<id>http://stlab.adobe.com/wiki/index.php/How_To_Write_A_Simple_Lexical_Analyzer_or_Parser</id>
		<title>How To Write A Simple Lexical Analyzer or Parser</title>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php/How_To_Write_A_Simple_Lexical_Analyzer_or_Parser"/>
				<updated>2008-05-02T02:33:51Z</updated>
		
		<summary type="html">&lt;p&gt;Summary: /* A Simple Recursive Decent Parser */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
This ''paper'' is a response to an e-mail question asking if anyone had code to validate that a sequence of characters was a valid XML name which was not a reserved XML name (starting with &amp;lt;tt&amp;gt;'x'|'X' 'm'|'M' 'l'|'L'&amp;lt;/tt&amp;gt;) or a qualified name (one containing &amp;lt;tt&amp;gt;':'&amp;lt;/tt&amp;gt;). ''See [http://www.w3.org/TR/REC-xml/#sec-common-syn Section 2.3 of the XML Specification].''&lt;br /&gt;
&lt;br /&gt;
Languages are defined using a grammar. There are many notations for grammars but most use some variant of [http://en.wikipedia.org/wiki/Extended_Backus–Naur_form Extended Backus-Naur form] (EBNF), XML is no exception to this rule. The variant of EBNF used for the XML notation is described in [http://www.w3.org/TR/REC-xml/#sec-notation Section 6 of the XML Specification]. Transforming an EBNF grammar into a lexical analyzer and/or parser is a relatively straight forward process. For simplicity I define the lexical analyzer as the system which transforms a stream of characters into tokens and the parser as the system which transforms a stream of tokens into a sequence of actions for each production rule. There are libraries, such as the [http://www.boost.org/doc/libs/1_35_0/libs/spirit/index.html Boost Spirit Library] and tools such as [http://en.wikipedia.org/wiki/Lex_programming_tool Lex] and [http://en.wikipedia.org/wiki/Yacc Yacc] which can be used to aid in writing lexical analyzers and parser. Knowing how to write these systems simply and directly is an invaluable tool in any programmers tool chest and experience with writing such systems will make you more productive when using libraries and tools and give you some insight into when you should and should not use them. Some issues (including this one) can also be addressed by writing a regular expression using a library such as the [http://www.boost.org/doc/libs/1_35_0/libs/regex/doc/html/index.html Boost Regex Library] - however, dealing with the numerous ranges of values in the grammar production with this problem are difficult to express in a regular expression. A [http://en.wikipedia.org/wiki/Finite_state_machine finite automata] can also be constructed for many such problems (including this one) and such systems have the advantage of inverting control flow - you put values into a finite automata as opposed to executing a parse function which pulls values from a stream.&lt;br /&gt;
&lt;br /&gt;
When you look at a typical EBNF grammar it will frequently not distinguish which parts are the lexical analyzer production rules and which are the parser production rules - the structure of both a simple parser and simple lexer will be the same. The separation is the point where the value returned by the production rule is not a character from the stream, but some form of token. Because the structure is the same I'm going to refer from here on to both parts as simply the ''parser''.&lt;br /&gt;
&lt;br /&gt;
== A Simple Recursive Decent Parser ==&lt;br /&gt;
&lt;br /&gt;
To solve the problem of validating an XML name we're going to write a simple recursive decent, or top-down, parser. The idea is to read from a stream of characters denoted by a pair of InputIterators - we will only need to read each character once without backtracking. A grammar which can be read in this form without backtracking is known as an [http://en.wikipedia.org/wiki/LL_parser LL(1)] grammar.&lt;br /&gt;
&lt;br /&gt;
To do this we are going to translate each production rule into a function which will return a &amp;lt;tt&amp;gt;bool&amp;lt;/tt&amp;gt; with the value &amp;lt;tt&amp;gt;true&amp;lt;/tt&amp;gt; if it recognizes the production and false if it does not. If it does recognize the production then the stream will be advanced to the end of the production and a value will be updated with the recognized production. Otherwise, the stream and value are left unchanged. Such a function will follow this pattern:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bool is_production(I&amp;amp; first, I last, T&amp;amp; value);&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The stream denoted by &amp;lt;tt&amp;gt;[first, last)&amp;lt;/tt&amp;gt; is going to be UTF16 code points because that is how the XML specification is defined. We'll define code_point_t to be a 16 bit integer:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
typedef uint16_t code_point_t;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rather than passing &amp;lt;tt&amp;gt;first&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;last&amp;lt;/tt&amp;gt; through all of our functions we're going to create a simple struct which holds them and write the &amp;lt;tt&amp;gt;is_production()&amp;lt;/tt&amp;gt; as a member function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
template &amp;lt;typename I&amp;gt; // I models InputIterator&lt;br /&gt;
// require value_type(I) == code_point_t&lt;br /&gt;
    bool is_production(T&amp;amp; value);&lt;br /&gt;
&lt;br /&gt;
    I f_m; // first&lt;br /&gt;
    I l_m; // last&lt;br /&gt;
};&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Writing the body of the production is simple. We start at the top level production rule and work our way down - in this case we are interested in parsing &amp;lt;tt&amp;gt;Name&amp;lt;/tt&amp;gt;. The &amp;lt;tt&amp;gt;Name&amp;lt;/tt&amp;gt; production rule looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Name ::= (Letter | '_' | ':') (NameChar)*&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And we could write this production something like the following:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bool is_name(string16_t&amp;amp; s)&lt;br /&gt;
{&lt;br /&gt;
    string16_t   result;&lt;br /&gt;
    code_point_t c;&lt;br /&gt;
&lt;br /&gt;
    /* (Letter | '_' | ':') */&lt;br /&gt;
    if (is_letter(c) || is_match('_', c) || is_match(':', c)) ; else return false;&lt;br /&gt;
&lt;br /&gt;
    result += c;&lt;br /&gt;
&lt;br /&gt;
    /* (NameChar)* */&lt;br /&gt;
    while(is_name_char(c)) result += c;&lt;br /&gt;
&lt;br /&gt;
    s = result;&lt;br /&gt;
    return c;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here string16_t is our token type. For this problem, we don't actually care what the token value is - and constructing the token is relatively expensive. If we were writing a full-blown XML parser we would spend some energy on dealing with tokens efficiently. This code also doesn't tell is if the name is qualified or reserved - those are attributes of our token value. We rewrite the function to drop the token and add some attributes. We'll deal with attributes in a similar fashion to the value of the production rule, except if the production rule is recognized the attributes will be accumulated (as opposed to overwritten). We don't need to pass the attribute to all production rules, only those that have attributes. Here is the definition of our attribute type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
enum xml_parse_attribute_t {&lt;br /&gt;
    xml_none        = 0,&lt;br /&gt;
    xml_qualified   = 1,&lt;br /&gt;
    xml_reserved    = 2&lt;br /&gt;
};&lt;br /&gt;
&lt;br /&gt;
xml_parse_attribute_t&amp;amp; operator|=(xml_parse_attribute_t&amp;amp; r, xml_parse_attribute_t x)&lt;br /&gt;
{ r = xml_parse_attribute_t(r | x); return r; }&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We define &amp;lt;tt&amp;gt;operator|=()&amp;lt;/tt&amp;gt; on our type to make it simple to accumulate. &amp;lt;tt&amp;gt;xml_qualified&amp;lt;/tt&amp;gt; will mean that the name contains a &amp;lt;tt&amp;gt;':'&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;xml_reserved&amp;lt;/tt&amp;gt; will mean it starts with the sequence &amp;lt;tt&amp;gt;xml&amp;lt;/tt&amp;gt;. To check if the name is reserved we unroll the first couple of iterations of our loop:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bool is_name(xml_parse_attribute_t&amp;amp; a)&lt;br /&gt;
{&lt;br /&gt;
    code_point_t c;&lt;br /&gt;
&lt;br /&gt;
    if (is_match(':', c)) { a |= xml_qualified; }&lt;br /&gt;
    else if (is_letter(c) || is_match('_', c)) ; else return false;&lt;br /&gt;
    bool reserved = c == code_point_t('X') || c == code_point_t('x');&lt;br /&gt;
&lt;br /&gt;
    if (is_name_char(c, a)) ; else return true;&lt;br /&gt;
    reserved &amp;amp;= (c == code_point_t('M') || c == code_point_t('m'));&lt;br /&gt;
&lt;br /&gt;
    if (is_name_char(c, a)) ; else return true;&lt;br /&gt;
    reserved &amp;amp;= (c == code_point_t('L') || c == code_point_t('l'));&lt;br /&gt;
&lt;br /&gt;
    while (is_name_char(c, a)) ;&lt;br /&gt;
&lt;br /&gt;
    if (reserved) a |= xml_reserved;&lt;br /&gt;
&lt;br /&gt;
    return true;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we can write our remaining production rules. We'll tackle &amp;lt;tt&amp;gt;NameChar&amp;lt;/tt&amp;gt; next. This is the only other production rule which needs to deal with attributes because it may contain a &amp;lt;tt&amp;gt;':'&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The specification of the XML parser is in terms of UCS2 code points (as opposed to characters) and this production is only going to recognize a single code point. Except for the minor complexity introduced to handle the attribute, the code is nearly identical to the production rule.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bool is_name_char(code_point_t&amp;amp; c, xml_parse_attribute_t&amp;amp; a)&lt;br /&gt;
{&lt;br /&gt;
    if (is_match(':', c)) { a |= xml_qualified; return true; }&lt;br /&gt;
&lt;br /&gt;
    return is_letter(c) || is_digit(c) || is_match('.', c)&lt;br /&gt;
        || is_match('-', c) || is_match('_', c) || is_combining_char(c) || is_extender(c);&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;is_letter()&amp;lt;/tt&amp;gt; function is even simpler:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Letter ::= BaseChar | Ideographic&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bool is_letter(code_point_t&amp;amp; c) { return is_base_char(c) || is_ideographic(c); }&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;is_match()&amp;lt;/tt&amp;gt; is a helper function for ''terminal'' productions. A terminal production is a production which is not defined in terms of another production. It is in the terminal productions where the interesting work is done. Here is the body.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bool is_match(code_point_t x, code_point_t&amp;amp; c)&lt;br /&gt;
{&lt;br /&gt;
    if (f_m == l_m || *f_m != x) return false;&lt;br /&gt;
    c = x; ++f_m; return true;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If we are at the end of our steam or the current character is not a match then return &amp;lt;tt&amp;gt;false&amp;lt;/tt&amp;gt;. Otherwise we set our value to the matched character, increment the stream, and return &amp;lt;tt&amp;gt;true&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The remaining productions are all of a form which match one of a set of code points:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Ideographic ::= [#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Some of the tables, such as the one for &amp;lt;tt&amp;gt;BaseChar&amp;lt;/tt&amp;gt; are quite large making it prohibitively expensive to write an expression like &amp;lt;tt&amp;gt;(0x4E00U &amp;lt;= x &amp;amp;&amp;amp; x &amp;lt;= 0x9FA5U) || (x == 0x3007U) || ...&amp;lt;/tt&amp;gt;. Instead we will create a simple lookup table. When dealing with a parser that works with a simple &amp;lt;tt&amp;gt;char&amp;lt;/tt&amp;gt; type we could just build a table with 256 elements that contained an enumeration indicating what production rule the character matched. A table of 16 bit values would be a bit large (but we could do it - 64K elements) - but instead we'll create an array of values denoting ranges which contain the code points we want to match and search to find if our character is in one of the ranges. To do that we'll create a semi-open ranges for each character range of the form &amp;lt;tt&amp;gt;(f0, l0], (f1, l1], ...&amp;lt;/tt&amp;gt; and then use &amp;lt;tt&amp;gt;lower_bound()&amp;lt;/tt&amp;gt; to search the table, if we end up on an odd index (starting at 0) then the item is in the range, an even index it is not. This strategy means we can't represent a range which includes &amp;lt;tt&amp;gt;0&amp;lt;/tt&amp;gt; but &amp;lt;tt&amp;gt;0&amp;lt;/tt&amp;gt; is not a valid character in an XML production. The code for the table lookup for &amp;lt;tt&amp;gt;Ideographic&amp;lt;/tt&amp;gt; is this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bool in_ideographic(code_point_t c)&lt;br /&gt;
{&lt;br /&gt;
    static const code_point_t table[] =&lt;br /&gt;
    {&lt;br /&gt;
        0x4DFFU, 0x9FA5U, // [#x4E00-#x9FA5]&lt;br /&gt;
        0x3006U, 0x3007U, // #x3007&lt;br /&gt;
        0x3020U, 0x3029U  // [#x3021-#x3029]&lt;br /&gt;
    };&lt;br /&gt;
&lt;br /&gt;
    return (adobe::lower_bound(table, c) - boost::begin(table)) &amp;amp; 0x01;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To transform from the closed ranges to the semi-open ranges we subtract one from the first item in the range. This code doesn't have to be a member function and is not a template - we can put these lookup tables in a .cpp file. We'll use this function in our &amp;lt;tt&amp;gt;is_ideographic()&amp;lt;/tt&amp;gt; member function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bool is_ideographic(code_point_t&amp;amp; c) { return is_match(in_ideographic, c); }&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;tt&amp;gt;is_match()&amp;lt;/tt&amp;gt; is another variant of our previous is_match which takes a predicate function instead of a value:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bool is_match(bool (*p)(code_point_t), code_point_t&amp;amp; c)&lt;br /&gt;
{&lt;br /&gt;
    if (f_m == l_m || !p(*f_m)) return false;&lt;br /&gt;
    c = *f_m; ++f_m; return true;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The remaining productions are written the same way - just with larger tables. Our parser is complete!&lt;br /&gt;
&lt;br /&gt;
Finally - to make this a little simpler to invoke we write a wrapper function which just creates a temporary instance of the class and we add one last check to make sure that the valid name made up the entire range provided:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
template &amp;lt;typename I&amp;gt; // I models InputIterator&lt;br /&gt;
// require value_type(I) == code_point_t&lt;br /&gt;
bool valid_name_unqualified_unreserved(I f, I l)&lt;br /&gt;
{&lt;br /&gt;
    xml_parser&amp;lt;I&amp;gt; parser = { f, l };&lt;br /&gt;
    xml_parse_attribute_t a = xml_none; &lt;br /&gt;
    return parser.is_name(a) &amp;amp;&amp;amp; parser.f_m == l &amp;amp;&amp;amp; a == xml_none;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
That's the complete code. Besides having a solution to our problem we have a general framework for building other solutions to parsing problems and a pretty decent start that can be reused for other XML related issues. Parsers written in this style are simple to extend even if they are packaged as a library, you can simply inherit from the struct and add additional productions. The code described here took very little time to write (the biggest hassle was building the larger lookup tables from the productions in the XML specification.&lt;br /&gt;
&lt;br /&gt;
The complete example file can be found [http://stlab.adobe.com:8080/@md=d&amp;amp;cd=//sandbox/papers/xml_parser_example/&amp;amp;cdf=//sandbox/papers/xml_parser_example/xml_parser_example.cpp&amp;amp;c=9XT@//sandbox/papers/xml_parser_example/xml_parser_example.cpp?ac=64&amp;amp;rev1=2 here.]&lt;/div&gt;</summary>
		<author><name>SeanParent</name></author>	</entry>

	<entry>
		<id>http://stlab.adobe.com/wiki/index.php/Compositing_Models</id>
		<title>Compositing Models</title>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php/Compositing_Models"/>
				<updated>2008-04-21T20:46:57Z</updated>
		
		<summary type="html">&lt;p&gt;Summary: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;A common point of confusion with the property model library (aka Adam) is that this is a mechanism for constructing user interfaces. That isn't the case. The property model library is a library for constructing models of how discrete properties interact. These properties might represent the parameters to a function (as is often the case in a dialog) or may be properties of the document itself.&lt;br /&gt;
&lt;br /&gt;
Viewing the property model library as being part of the UI, leads one to attempt to construct a composite of layered models. Unfortunately, this doesn't work.&lt;br /&gt;
&lt;br /&gt;
The key to the Model-View-Controller pattern is that it is, logically, a simple DAG:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&lt;br /&gt;
[[image:composite_model_00.png]]&lt;br /&gt;
&lt;br /&gt;
''Model-View-Controller''&lt;br /&gt;
&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Using the property model library to build the model has many advantages including:&lt;br /&gt;
&lt;br /&gt;
* The property model is able to handle bidirectional relationships and solve based on a priority mechanism.&lt;br /&gt;
* The same model can be driven from a script or API with no UI attached.&lt;br /&gt;
* The model tracks contributing values which are useful for generating intelligent script which capture the user ''intent''.&lt;br /&gt;
* The model tracks connected components (components which may contribute) and this information is useful to control the enable/disable state of widgets (a disconnected component cannot contribute so the controller is disabled).&lt;br /&gt;
&lt;br /&gt;
The property model library does have some limitations including:&lt;br /&gt;
&lt;br /&gt;
* Only handles discrete values, no mechanism for modeling sequences or other structures.&lt;br /&gt;
* Only handles many-to-one relationships, no support for many-to-many relationships.&lt;br /&gt;
&lt;br /&gt;
There are times where the property model library is not the correct answer - before determining that the property model library is not the correct solution for you problem it is very important that you understand what you are modeling. There are many cases where people dismiss the property model library because they think it is somehow ''only'' a UI mechanism. Also, it is frequently the case the clients believe that they are dealing with a document model, when in fact they are dealing with a model of function parameters (sometimes ''function'' is called ''command''). Distinguishing the two cases is important. A simple test is to ask if any constraints in the model are in fact invariants on the model or, rather, constraints imposed as a mechanism to construct parameters to a function that will operate on the model.&lt;br /&gt;
&lt;br /&gt;
Treating a property model as part of the UI will lead to attempts to composite the property model with some other model. It is very tempting to bind the UI to a sheet and then have a model treat the view interface on the sheet as a way to connect another model (treating the sheet as a controller) and connect back to the sheet through the controller interface (treating the sheet as a view). This leads to the following structure:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&lt;br /&gt;
[[image:composite_model_01.png]]&lt;br /&gt;
&lt;br /&gt;
''Poorly Composited Models''&lt;br /&gt;
&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here &amp;quot;Model A&amp;quot; is the property model and &amp;quot;Model B&amp;quot; is some other model. The figure-eight links between the two create a problem. For example - let's say a request to change the model comes in from the controller to ''A''. The view is notified of the change before ''B'' gets a crack at the data - this is akin to an electrical short.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&lt;br /&gt;
[[image:composite_model_02.png]]&lt;br /&gt;
&lt;br /&gt;
''Effect of Controller Request''&lt;br /&gt;
&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The once ''B'' gets the data, runs it through it's model and then attempts to notify the changes back, then we find that ''B'' will be notified of the change it just made.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&lt;br /&gt;
[[image:composite_model_03.png]]&lt;br /&gt;
&lt;br /&gt;
''Effect of View Update''&lt;br /&gt;
&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is the same problem we have with any widget library which has logical state. I'm frequently heard telling people that if their widget has a ''get'' function then it is incorrect. The simplest solution to this problem is simply not to use ''A'' at all, provide your own model and don't use the property model library. Another solution, of course, is to only use the property model library. Because of the convenience of binding a layout to the property model library, however, some people still insist on attempting to composite models in this form (one side effect is an assert that fires in sheet_t::update() when it is re-entered. It may appear that the system would work without this assert, however, the first thing update() does is clears a set dirty flags - re-entering update() will give undefined behavior as to which views will be notified of the change that was in progress.&lt;br /&gt;
&lt;br /&gt;
There is a supported way that models can be composited - the property model library allows any function to be plugged in to calculate a derived value, such a function can be arbitrarily complex and work on arbitrary data types. This can be used to effectively embed another model within the property model library:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&lt;br /&gt;
[[image:composite_model_04.png]]&lt;br /&gt;
&lt;br /&gt;
''Using Functions to Embed Models''&lt;br /&gt;
&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This structure allows you to keep all of the advantages of the property model library.&lt;br /&gt;
&lt;br /&gt;
A number of clients would like to use the property model library simply as a [http://en.wikipedia.org/wiki/Breadboard breadboard]. This would effectively remove the property model from the system except as a connection point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;center&amp;gt;&lt;br /&gt;
[[image:composite_model_05.png]]&lt;br /&gt;
&lt;br /&gt;
''Breadboard Model''&lt;br /&gt;
&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is currently no way to do this with the property model library. It would not be difficult to add - likely I would add a new cell type, ''connection'', and an input/output API which would let you connect to the two halves of the cell. Input will always refer to information coming from the controller, output information going to the view. So a widget would set the input cell and another model would monitor the input cell, and set the output cell which a view would monitor. This would really just be a conduit and not have much at all to do with the property model library (these cells could not be referenced from the sheet) but would provide a breadboard to connect other models to.&lt;/div&gt;</summary>
		<author><name>SeanParent</name></author>	</entry>

	<entry>
		<id>http://stlab.adobe.com/wiki/index.php/Performance/home</id>
		<title>Performance/home</title>
		<link rel="alternate" type="text/html" href="http://stlab.adobe.com/wiki/index.php/Performance/home"/>
				<updated>2008-03-17T23:49:23Z</updated>
		
		<summary type="html">&lt;p&gt;Summary: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Not much here yet.  This will hold a lot of the documentation for the C++ Benchmarks suite.&amp;lt;P&amp;gt;&lt;br /&gt;
See http://stlab.adobe.com/performance/ for the suite main web site.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* [[Performance/ReleaseNotes|Release notes]]&lt;br /&gt;
&lt;br /&gt;
* [[Performance/Analysis/Examples|Some example analyses of results]]&lt;br /&gt;
&lt;br /&gt;
* [[Performance/WhatToTest|What is tested and what needs to be tested?]]&lt;/div&gt;</summary>
		<author><name>ChrisCox</name></author>	</entry>

	</feed>