Thomas R. Hinrichs
The Institute for the Learning Sciences
Northwestern University
Evanston, IL 60201
This paper presents an analysis of some of the limitations of feature-based recognition and describes a process that integrates structural recognition with retrieval. This structural recognition algorithm is designed to augment the retrieval capabilities of case-based reasoners by facilitating the recognition of functional design clichés, natural laws, and subproblems for which individual features may not be predictive.
One reason that feature-based recognition is so ubiquitous is the assumption that recognition should be cheap. After all, human beings easily recognize faces and situations on the order of milliseconds. But the deceptive ease with which we recognize the familiar may lead us astray by suggesting computational models that emphasize reduction to simple primitives that are ``immediate'' or ``free''. Moreover, by reducing to primitive features, it is easy to see how to amortize the cost of recognition by indexing concepts and cases at storage time, rather than performing complex computations at retrieval/recognition time. This apparent computational efficiency can be an illusion, however, because it does not account for the cost of perceiving features, and in some ways it trades off competence for performance.
Some recent work in case-based reasoning has involved, in effect, extracting features from geometrical structure (Berger, 1993) and from traces of continuous parameters over time (Ram and Santamaria, 1993). In this paper, we concentrate instead on recognition and retrieval based on the topological structure of engineered designs.
Figure 1: Voltage Divider
A case-based design system, for example, should be able to retrieve a similar circuit for which the voltage across M has been solved, and to transfer the reasoning or constraints that determined the value. However, because JULIA relies on feature-based recognition, it cannot do this. JULIA's memory is arranged as a multiple-discrimination network that discriminates based on deviations from and specializations of normative features. Consequently, it could retrieve circuits containing resistors and batteries, or even resistors with a specific value, but it could not retrieve only those circuits with two resistors in series, unless the concept ``two resistors in series'' were explicitly represented in the input or arbitrary procedures were permitted as discrimination tests. What makes this problem hard is that the primitive components of the circuit, its features, are for the most part the same; by themselves they are not predictive. What is predictive are the relationships that are implicit in the input representation.
This is not merely a problem with discrimination networks and knowledge-poor matching. The same limitation holds for other feature-based retrieval and recognition algorithms. For example, PROTOS (Bareiss, 1989) does not rely on discrimination to retrieve exemplars, but instead uses a two-phase retrieval algorithm in which the reminding strengths of the input features are combined to activate hypothesis categories, and exemplars of those categories are then matched against the input case using knowledge about the relationships between features. In addition, PROTOS permits categories to be features of other categories, thereby representing complex features. Given the right category and the right relational vocabulary, PROTOS could distinguish between a circuit with resistors in series and one with resistors in parallel. Nevertheless, it could never be reminded of a category based solely on relationships in its input, even if those relationships were explicitly represented. This is a fundamental problem of feature-based recognition.
In effect, feature-based recognition reduces the input to a set of features, thereby losing all information about their configuration and cardinality. For some problems, this suffices to enable an efficient first-pass retrieval, followed by a matching phase that ranks the retrieved items or verifies a hypothesis category. From a pattern-recognition perspective, this is essentially pattern classification followed by template matching.
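For illustration, suppose circuits are encoded as simple lists of (type name terminal-nodes) triples. The sketch below is hypothetical code, not the representation of any system discussed here; it merely shows that a series pair and a parallel pair of resistors reduce to exactly the same feature set.

(defun feature-set (circuit)
  "Reduce a circuit to the set of its component types, discarding
configuration and cardinality."
  (remove-duplicates (mapcar #'first circuit)))

(defparameter *series-pair*
  '((resistor r1 (n1 n2)) (resistor r2 (n2 n3)) (battery v1 (n1 n3))))

(defparameter *parallel-pair*
  '((resistor r1 (n1 n2)) (resistor r2 (n1 n2)) (battery v1 (n1 n2))))

;; Both circuits index identically, so no feature-based first pass
;; can separate them:
(equal (feature-set *series-pair*) (feature-set *parallel-pair*)) ; => T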
For many problems, however, this two-phase strategy raises the question of exactly what should constitute a feature. If symbols in the input do not represent predictive features, then at what granularity and specificity are features to be found? For example, is a 5Ω resistor a different feature from a 500Ω resistor? Is a transistor a feature, or should it be translated into a `small signal' model composed of passive elements such as resistors? One could appeal to basic level effects and argue that component types are natural categories (hence their designation by different schematic symbols). But one could also argue that the voltage divider example shows that features can occur at a larger granularity, and the existence of alternative representations of transistors suggests that, ultimately, there may not be a fixed set of primitives.
Figure 2: The Feature Extraction process
Moreover, for recognizing relational structure, feature extraction can be like the tail wagging the dog. Not only is it a domain-specific black box, but there is typically no feedback from the decision procedure to the feature-extraction process. Feature extraction provides no general account of how to focus attention, and because it discards cardinality and configuration information, it is workable only when there is a small number of relevant configurational categories. Most importantly, because features are extracted via arbitrary code, they are not systematically composed of simpler features that can be derived or learned from the input. The vocabulary of features is either fixed or determined by the task and the local features of the input. While top-down processing is important because it provides expectations in the form of a pragmatic (task-driven) context, it must be applied in a global-to-local fashion in order to provide a structural context as well.
One candidate for a common grammar, increasingly used in engineering design representations, is the bond-graph grammar (Paynter, 1961). Bond graphs are a mathematical notation for modeling a very wide range of dynamic systems in terms of a small set of abstract primitive elements. Although they are almost universally applicable for modeling the flow of energy in dynamic systems, bond graphs are ill-suited to modeling software systems or physical configurations. Moreover, choosing or learning an appropriate domain-specific grammar could be an arbitrarily difficult task.
Furthermore, parsing by itself can only extract the structure of formal languages. Informal languages, such as natural language, require some kind of semantic analysis as well. The basic problem with parsing is that it is fundamentally a bottom-up process. Contextual information might determine an initial pattern-description language, but after that there is no feedback from higher-level processes.
If recognition is to be based first on relational structure, this raises the problem of inductive bias. Since virtually unlimited numbers of relationships are implicit in any input, which ones should be attended to? Part of the answer is that expectations from pragmatic and structural context must provide preferences among decompositions. However, to rely solely on context is to invite infinite regress. Another part of the answer may be to focus attention using gestalt laws of perceptual grouping, such as grouping similar elements, highly interconnected clusters, and contiguous elements (Rock and Palmer, 1990).
Although originally postulated to account for visual perception, these laws also provide guidance in recognizing abstract relationships in a topological representation. While topological representations have no analog for proximity relationships, there could be other decomposition strategies, such as identifying parallel and serial connections of two-terminal devices, disconnecting graphs at cut nodes that form points of cleavage, and separating circuits or loops. In effect, this would promote configurational information from the role of disambiguation to a more predictive role.
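Two of these strategies can be sketched concretely over the toy circuit encoding introduced earlier (again, illustrative code rather than an implemented system): partitioning components by similarity of type, and identifying serial connections as nodes shared by exactly two two-terminal devices.

(defun group-similar (circuit)
  "Gestalt similarity: partition components by device type."
  (let ((groups (make-hash-table)))
    (dolist (c circuit)
      (push c (gethash (first c) groups)))
    (loop for g being the hash-values of groups collect g)))

(defun series-pairs (group circuit)
  "Pairs of devices in GROUP joined at a node shared by exactly two
devices in CIRCUIT, i.e., a serial connection."
  (loop for (c1 . rest) on group append
        (loop for c2 in rest
              for shared = (intersection (third c1) (third c2))
              when (and (= 1 (length shared))
                        (= 2 (count-if (lambda (c)
                                         (member (first shared) (third c)))
                                       circuit)))
                collect (list c1 c2))))

;; With *series-pair* and *parallel-pair* from the earlier sketch:
;; the series circuit yields the pair (R1 R2), while the parallel
;; circuit yields NIL, since its resistors share two nodes, not one.
(series-pairs (find 'resistor (group-similar *series-pair*) :key #'caar)
              *series-pair*)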
To see how this could work, let us return to the voltage divider example. Given a structural description of the circuit, the first step would be to group the similar elements R1 and R2. The next step would be to identify their relationship as a serial connection of two-terminal devices. If the components were complex sub-circuits, then this abstract relationship could provide a structural context to further constrain their interpretation. In this case, however, the components are simple resistors, and their serial connection is enough to retrieve the concept `voltage divider', from which the constraint equation can be transferred to solve the original problem. In this way, recognition can be seen as a search through the space of decompositions for representations that further constrain the recognition process.
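Continuing the sketch, a single re-representation step can be written as a rewrite that folds a recognized serial pair into a composite feature. The function below builds on group-similar and series-pairs above, and like them is illustrative rather than actual implementation code.

(defun fold-series (circuit)
  "One re-representation step: replace a serial pair of resistors
with a single VOLTAGE-DIVIDER feature spanning their outer nodes."
  (let* ((resistors (find 'resistor (group-similar circuit) :key #'caar))
         (pair (first (series-pairs resistors circuit))))
    (if (null pair)
        circuit
        (destructuring-bind ((ty1 nm1 nd1) (ty2 nm2 nd2)) pair
          (declare (ignore ty1 ty2))
          (let ((tap (first (intersection nd1 nd2))))
            (cons (list 'voltage-divider
                        (intern (format nil "~A-~A" nm1 nm2))
                        (append (remove tap nd1) (remove tap nd2)))
                  (remove-if (lambda (c)
                               (member (second c) (list nm1 nm2)))
                             circuit)))))))

(fold-series *series-pair*)
;; => ((VOLTAGE-DIVIDER R2-R1 (N3 N1)) (BATTERY V1 (N1 N3)))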
This process is supported by a conceptual representation with a number of unusual characteristics. First, domains are represented explicitly as categories that determine equivalence classes. Because there is no uniform vocabulary of primitive features with which an input must be specified, the level of abstraction at which features are considered to be `the same' is determined by the domain's equivalence classes. For example, in conceptually breaking apart a circuit design, sometimes you want to group all impedances together, and sometimes you need to distinguish resistors, capacitors, and inductors. In matching and decomposing the input, we progressively abstract the domain. Next, topological structure is itself represented as a domain. While designs are often implicitly represented as graphs of components, structural recognition makes it necessary to manipulate a representation with respect to progressively more abstract domains, up to and including the purely structural. In addition, for any domain, there are two privileged equivalence classes: the `null device' and the `infinite device'. In an electrical circuit, for example, we typically consider wire to be a null device (except for transmission line effects, etc.) In the high frequency domain, capacitors may also belong to the null device class, while inductors become infinite devices. The null device permits us to treat all of its terminals as a single node, while an infinite device can simply be removed from the representation (in the context of a particular domain).
(defdomain electric-Circuit
  :type graph
  :equivalence-classes
  ((null-device wire)
   (resistor)
   (voltage-Source)))

(defconcept two-terminal-device
  :domain graph
  :terminals (t1 t2)
  :remindings (series parallel))

(defconcept series
  :domain graph
  :type three-terminal-device
  :terminals (d1.1 d1.2 d2.2)
  :features ((d1 two-terminal-device)
             (d2 two-terminal-device))
  :relationships ((connected d1.2 d2.1)))

(defconcept resistor
  :domain electric-circuit
  :type two-terminal-device)

(defconcept voltage-Divider
  :domain electric-circuit
  :type series
  :terminals (r1.1 r1.2 r2.2)
  :features ((r1 resistor)
             (r2 resistor)))

(defconcept CommonEmitter
  :domain active-circuit
  :type transistor-amp-config
  :terminals (Q1.1 Q1.2 Q1.3)
  :features ((Q1 transistor)
             (Rc resistor)
             (BN1 Bias-Net))
  :relationships ((connected Q1.2 BN1.2)
                  (connected VD1.1 Rc.1)
                  (connected Rc.2 Q1.1)
                  (connected Q1.3 BN1.3)))

Figure 3: Sample domain and concept definitions
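The null and infinite device classes license a simple, domain-relative rewrite of a circuit graph. The sketch below (hypothetical names and encoding, as in the earlier sketches) abstracts a circuit into a high-frequency domain in which capacitors join wire in the null device class and inductors become infinite devices:

(defparameter *high-frequency*
  '(:null-devices (wire capacitor) :infinite-devices (inductor)))

(defun abstract-circuit (circuit domain)
  "Rewrite CIRCUIT relative to DOMAIN: drop infinite devices and
merge the terminal nodes of null devices."
  (let ((merges '()))                      ; node -> replacement node
    (labels ((canonical (n)
               (let ((m (assoc n merges)))
                 (if m (canonical (cdr m)) n))))
      ;; Merge the endpoints of every null device into one node.
      (dolist (c circuit)
        (when (member (first c) (getf domain :null-devices))
          (let ((a (canonical (first (third c))))
                (b (canonical (second (third c)))))
            (unless (eq a b) (push (cons b a) merges)))))
      ;; Keep the remaining devices, renaming their nodes.
      (loop for (type name nodes) in circuit
            unless (or (member type (getf domain :null-devices))
                       (member type (getf domain :infinite-devices)))
              collect (list type name (mapcar #'canonical nodes))))))

(abstract-circuit '((resistor r1 (n1 n2)) (capacitor c1 (n2 n3))
                    (inductor l1 (n3 n4)) (resistor r2 (n3 n4)))
                  *high-frequency*)
;; => ((RESISTOR R1 (N1 N2)) (RESISTOR R2 (N2 N4)))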
An input to the program is a graph consisting of a set of features (instances of devices) and terminals. All relationships are implicitly represented by terminals shared between features. This input representation is generated automatically by a simple drafting program.
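One plausible encoding of such an input (the drafting program's actual output format is not reproduced here) lists the feature instances and, for each node of the graph, the feature terminals that coincide there, using the same terminal-naming convention as the concept definitions above:

(defparameter *common-emitter-input*
  '(:features ((q1 transistor) (r1 resistor) (r2 resistor)
               (rc resistor) (vcc voltage-source))
    :nodes    ((r1.2 r2.1 q1.2)        ; base node: the divider tap
               (rc.2 q1.1)             ; collector node
               (vcc.1 r1.1 rc.1)       ; positive supply rail
               (vcc.2 r2.2 q1.3))))    ; ground/emitter rail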
The RECOGNIZE procedure works by first determining whether any feature in the input graph predicts (via remindings) a configuration or concept that can be directly matched. If not, it decomposes the input into subgraphs and progressively abstracts the domain and features until some feature is predictive of a configuration or concept. It then re-represents the input in terms of the new concept and repeats the process until no further refinement is possible.

The DECOMPOSE procedure first determines the domain of the input based on its features. It then combines all terminals connected via components in the null device equivalence class for the current domain and eliminates `infinite devices'. Next, it groups features together based on similarity (defined by equivalence classes), and lastly it decomposes those groups based on contiguity (i.e., connectivity).

The RE-REPRESENT procedure takes a subgraph and rewrites it in terms of features that are relevant to the domain. It works bottom-up by predicting the context of the subgraph, matching to find a classification and bindings, discriminating to specialize the classification, and finally rewriting the subgraph of the input in terms of the new feature. Verifying a reminding and discriminating to a specialization both employ a subgraph matching algorithm that uses constraint propagation and marker passing to minimize the amount of non-deterministic graph matching performed.

To illustrate the algorithm, consider again the problem of recognizing a voltage divider in a common-emitter transistor amplifier (Fig. 4):
Figure 4: A common emitter amplifier
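As a stand-in for a full trace, the toy rewrite from the earlier sketches can be driven to a fixed point in the spirit of RECOGNIZE's outer loop. This is illustrative only: real decomposition, domain abstraction, and subgraph matching are elided, with fold-series standing in for the predict-match-rewrite cycle.

(defun recognize-loop (circuit)
  "Repeatedly re-represent CIRCUIT until no rewrite applies."
  (loop for next = (fold-series circuit)
        until (equal next circuit)
        do (setf circuit next))
  circuit)

(recognize-loop *series-pair*)
;; => ((VOLTAGE-DIVIDER R2-R1 (N3 N1)) (BATTERY V1 (N1 N3)))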