Figure 13: OMLET results for test samples from the GRUFF chair database.
Figure 13 plots the average error per sample against training set size for examples from the conventional chair category, with a separate plot for examples from the straightback chair category. Since there are only 28 straightback chair examples, only three training set sizes (6, 12, and 18) were evaluated in addition to the leave-one-out testing. All 78 conventional chair examples were used to train the ranges associated with the conventional chair category before the ranges for the straightback chair category were trained. No testing was done for the subcategory armchair since only four training samples were available. The plots show that increasing the number of training samples generally reduces the average error. When more than 20 training examples are used, the actual evaluation measures of the test examples are within approximately 1% of the desired evaluation measures for both the conventional chair and straightback chair categories.
Note that the errors in overall evaluation measures found for categories at different learning levels are not directly comparable, so the plot of the error rate for the straightback chair category cannot be compared directly with the plot for the conventional chair category (Figure 13). As an example, consider an object with a desired overall evaluation measure of 0.85 for the category conventional chair. If OMLET computes an actual evaluation measure of 0.86, then the error for this example is 0.01. Assume that the provides_back_support portion of this object has a desired evaluation measure of 0.75. The overall desired evaluation measure for this example in the category straightback chair would then be 0.9625 (POR of 0.85 and 0.75). Now suppose OMLET finds the actual evaluation measure for the back support of the object to be 0.76, again an error of 0.01. In this case, the actual overall evaluation measure of this example for the category straightback chair would be 0.9664 (POR of 0.86 and 0.76). As a result, the errors of 0.01 in the conventional chair measure and in the provides_back_support portion of the object are manifested as a much smaller error of 0.0039 in the overall evaluation measure of the object.
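The arithmetic in this example can be checked with a short sketch. It assumes that POR denotes the probabilistic OR, a + b - ab, which reproduces the values quoted above; the function name `por` is introduced here only for illustration.

```python
def por(a: float, b: float) -> float:
    """Probabilistic OR of two evaluation measures in [0, 1] (assumed form: a + b - ab)."""
    return a + b - a * b


# Desired measures: conventional chair 0.85, provides_back_support 0.75.
desired_overall = por(0.85, 0.75)

# Actual measures: each component is off by 0.01.
actual_overall = por(0.86, 0.76)

# Error in the combined straightback chair measure.
overall_error = abs(actual_overall - desired_overall)

print(round(desired_overall, 4), round(actual_overall, 4), round(overall_error, 4))
```

Because the combined measure is already close to 1, an error of 0.01 in each component moves it by far less than 0.01, which is why the two error plots are not on a common scale.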
The original range parameters (z1,n1,n2,z2) hand-crafted by an expert for the three ranges in the conventional chair definition (see Figure 4) are:
AREA (0.057599 0.135 0.22 0.546699)
CONTIGUOUS_SURFACE (0.0 1.0 1.0 1.0)
HEIGHT (0.275 0.4 0.6 1.1)
These are the range values used by GRUFF to determine the desired evaluation measures in the goals provided to OMLET. A typical example of the range parameters as learned by OMLET is:
AREA (0.057599 0.135002 0.219992 0.546706)
CONTIGUOUS_SURFACE (7.45591e-06 0.999995 10000 10000)
HEIGHT (0.275 0.400002 0.6 1.10009)
OMLET was able to determine that the CONTIGUOUS_SURFACE range is a one-legged membership function: the n2 and z2 values (i.e., the leg that does not exist) were set to arbitrarily large values. These results show that OMLET can use labeled examples to automatically determine range parameters that closely match those hand-crafted by an expert, which should facilitate the construction of other object category definitions.
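To make the similarity between the two parameter sets concrete, the following sketch compares the membership values they produce. It assumes a piecewise-linear trapezoidal membership function that is 0 outside [z1, z2] and 1 on [n1, n2]; this section does not give the exact functional form used by GRUFF and OMLET, so the `trapezoid` and `max_difference` helpers are illustrative assumptions rather than the systems' implementation.

```python
def trapezoid(x, z1, n1, n2, z2):
    """Assumed trapezoidal membership: 0 outside [z1, z2], 1 on [n1, n2], linear legs."""
    if x < z1 or x > z2:
        return 0.0
    if x < n1:
        return (x - z1) / (n1 - z1)   # rising leg
    if x <= n2:
        return 1.0                    # plateau
    return (z2 - x) / (z2 - n2)       # falling leg


# Range parameters (z1, n1, n2, z2) as listed in the text.
HAND_CRAFTED = {
    "AREA": (0.057599, 0.135, 0.22, 0.546699),
    "CONTIGUOUS_SURFACE": (0.0, 1.0, 1.0, 1.0),
    "HEIGHT": (0.275, 0.4, 0.6, 1.1),
}
LEARNED = {
    "AREA": (0.057599, 0.135002, 0.219992, 0.546706),
    "CONTIGUOUS_SURFACE": (7.45591e-06, 0.999995, 10000, 10000),
    "HEIGHT": (0.275, 0.400002, 0.6, 1.10009),
}


def max_difference(hand, learned, steps=1000):
    """Largest membership gap on an even grid over the hand-crafted support [z1, z2]."""
    z1, z2 = hand[0], hand[3]
    xs = [z1 + (z2 - z1) * i / steps for i in range(steps + 1)]
    return max(abs(trapezoid(x, *hand) - trapezoid(x, *learned)) for x in xs)


for name in HAND_CRAFTED:
    print(name, max_difference(HAND_CRAFTED[name], LEARNED[name]))
```

The comparison is restricted to the hand-crafted support because the learned CONTIGUOUS_SURFACE range has no upper leg; under the assumed form, the two functions differ for inputs above 1.0, which lie outside the hand-crafted range.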
Figure 13 shows that the number of training samples does affect the error rate of test samples. With more than roughly 20 training samples, the error rates for both the conventional chair and straightback chair categories begin to level off, so the number of training samples becomes less of a factor in system performance once a sufficient number are used. What constitutes a sufficient number of training samples for a category may depend on the number of ranges to be learned and the quality of the training data. Three ranges must be learned for the category conventional chair, and five for the category straightback chair. The histograms of desired evaluation measures for the GRUFF conventional chairs and the back supports of the GRUFF straightback chairs in Figure 11 A and B, respectively, reflect the quality of the training data used for the leave-one-out tests.
We can isolate the effect of the quality of the training data with some additional experiments using two separate data sets of GRUFF conventional chair examples. The number of training epochs, the number of training samples, and the number of ranges to be learned are identical for each data set. One data set of 38 "bad" examples contains all conventional chair examples with desired evaluation measures less than 0.6. A second data set of "good" examples was created by selecting 38 of the remaining conventional chair examples. The histograms of desired evaluation measures for the examples used in the "good" and "bad" data sets are shown in Figure 11 C and D, respectively. Leave-one-out testing (37 training examples) resulted in an average error of 0.0001 for the examples in the "good" data set, and 0.1869 for the examples in the "bad" data set. Thus, the quality of the training data appears to have a considerable effect on the performance of the learning algorithm.
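The leave-one-out protocol used for these averages can be sketched as follows. This is a generic illustration, not OMLET's implementation: `leave_one_out_error`, `train`, and `error` are hypothetical names, and the stand-in learner at the bottom simply predicts the mean of its training values.

```python
from typing import Callable, Sequence, TypeVar

Example = TypeVar("Example")
Model = TypeVar("Model")


def leave_one_out_error(
    examples: Sequence[Example],
    train: Callable[[Sequence[Example]], Model],
    error: Callable[[Model, Example], float],
) -> float:
    """Average test error when each example is held out once and the rest are used to train."""
    total = 0.0
    for i, held_out in enumerate(examples):
        # With 38 examples, each training set holds the other 37.
        training_set = list(examples[:i]) + list(examples[i + 1:])
        model = train(training_set)
        total += error(model, held_out)
    return total / len(examples)


# Stand-in learner (an assumption for illustration): predict the mean of the training values.
def mean_learner(values):
    return sum(values) / len(values)


def absolute_error(prediction, value):
    return abs(prediction - value)


print(leave_one_out_error([0.2, 0.4, 0.6], mean_learner, absolute_error))
```

In the experiments described above, `train` would correspond to an OMLET training run and `error` to the absolute difference between the actual and desired evaluation measures of the held-out example.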
When the 38 "good" conventional chair examples are used to train OMLET, the average error on the 38 "bad" test examples drops to 0.013 (compared to an average error of 0.1869 when 37 "bad" examples are used to train). A closer examination of the results reveals that one "bad" example contributes a relatively high error of 0.5 to the average. If this single example is excluded from the test results, the average error of the remaining 37 "bad" examples is only 0.00067. If the 38 "bad" examples are used to train OMLET, the average error on the 38 "good" test examples is 0.242. These results indicate that OMLET is not inherently biased to produce more accurate test results for "good" examples, since a low error rate is achieved for the "bad" examples when "good" training data is used. Rather, these results emphasize the importance of controlling the quality of the data used to train OMLET.