All software tends to evolve. Some reasons for evolution are bug fixes, code refactoring, adding new functionality, and modifying existing functionality. When a Java program evolves from one version to another, new classes typically get added and some existing classes get modified or deleted. Understanding how a Java program evolves helps in program maintenance and, more importantly, can help minimize class regression testing. This paper presents a language-based approach for analyzing program evolution. It shows how the notion of atomic changes provides a semantically richer vocabulary for characterizing a Java program that changes across versions. Some atomic changes with respect to Java are changing the order of class members, changing a member's access level, defining new methods, and renaming methods. Examples are presented to demonstrate that classes that evolve by certain permutations of atomic changes might not require any regression testing. A tool called JEvolve™ has been developed to automatically analyze Java programs and identify the modified classes that require regression testing. Data obtained by analyzing various versions of the Java Development Kit with this tool are presented.
1. INTRODUCTION
All software tends to evolve. In fact, evolution indicates that the software is being actively used! Some reasons for the evolution are bug fixes, code refactoring, adding new functionality, and modifying existing functionality.
Section 2 briefly touches upon related research. Section 3 outlines some ways of characterizing an evolving Java program. Section 4 defines the concept of atomic changes and enumerates some of the atomic changes applicable to Java programs. The connection between atomic changes and class retesting is discussed in Section 5. Section 6 briefly describes a tool that automatically analyzes two versions of a Java program to suggest the portions that need to be retested. A summary of the results obtained by running the tool on various versions of the Java Development Kit is presented in Section 7. Finally, Section 8 presents the conclusions.
2. RELATED RESEARCH
Kung et al. [3] identify the types of changes that can be made to an OO library and provide a system that captures these changes and makes inferences about their impact on software maintenance. Hsia et al. [4] address the issue of test case selection for revalidation. Their approach consists of computing the class firewall for a changed class and using it to identify which classes need to be retested when a program evolves. Rothermel and Harrold [5] use a program dependence graph to represent control and data dependencies in an OO program and use it to identify the statements in the modified program that will produce different test results. None of these approaches is language-specific, so none takes advantage of language properties that can further reduce regression testing effort. Our approach brings into play the peculiarities of specific OO languages (our research has considered C++ and Java) and uses that knowledge to minimize regression testing. Whitmire [6] describes a small number of atomic operations that characterize design changes; the changes we propose are finer-grained than his and occur at the programming level. Palay [7] describes a C++ system that recognizes certain compatible changes to an evolving class in order to minimize recompilation.
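To make the class firewall idea mentioned above concrete, here is a minimal sketch that treats the firewall simply as the set of changed classes plus every class that depends on them, directly or transitively. The class name `ClassFirewall`, the method `firewall`, and the example dependency map are hypothetical; published firewall definitions may distinguish kinds of dependency (for example inheritance versus use), which this sketch collapses into a single relation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical, simplified class firewall: the changed classes plus all classes
 * that depend on them directly or transitively. All dependency kinds
 * (inheritance, field types, calls) are collapsed into one "depends on" relation.
 */
public class ClassFirewall {

    /** dependsOn.get(X) holds the classes that X uses. */
    public static Set<String> firewall(Map<String, Set<String>> dependsOn, Set<String> changed) {
        // Invert the edges so that we can ask: which classes depend on C?
        Map<String, Set<String>> dependents = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : dependsOn.entrySet()) {
            for (String target : e.getValue()) {
                dependents.computeIfAbsent(target, k -> new HashSet<>()).add(e.getKey());
            }
        }
        // Breadth-first walk from the changed classes over the inverted edges.
        Set<String> result = new TreeSet<>(changed);
        Deque<String> work = new ArrayDeque<>(changed);
        while (!work.isEmpty()) {
            String c = work.poll();
            for (String d : dependents.getOrDefault(c, Set.of())) {
                if (result.add(d)) { // first time we reach d: explore its dependents too
                    work.add(d);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> deps = new HashMap<>();
        deps.put("Derived", Set.of("Base"));    // Derived extends Base
        deps.put("Client", Set.of("Derived"));  // Client uses Derived
        deps.put("Unrelated", Set.of("Other"));
        System.out.println(firewall(deps, Set.of("Base")));
    }
}
```

Running `main` prints `[Base, Client, Derived]`: `Unrelated` stays outside the firewall because it does not depend on `Base`, even indirectly.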
3. CHARACTERIZING JAVA PROGRAM EVOLUTION
One common way to characterize an evolved Java program is to say something like "10 files were modified, 15 classes were changed, 8000 source lines were altered," and so on. This kind of description is too coarse to be a useful indicator of change complexity. Instead, we would benefit from a formalism that has the following desirable properties:
```java
class A {
    int i;
    void setVal(int v) {
        i = v;
    }
}
```
If the above class is changed to
```java
class A {
    void setVal(int v) {
        i = v;
    }
    int i;
}
```
a text differencing engine will show that the two source files are different, since the order of the class members has changed. Although these differences are relevant from a purely textual point of view, they are not significant from a program behaviour perspective. Consider another example:
```java
class Base {
    protected int value;
    // other elements
}

class Derived extends Base {
    private int value; // --- (1)
    void ff() {
        System.out.println(value);
    }
}
```
If the derived class is later changed to
```java
class Derived extends Base {
    private int i; // --- (1a)
    void ff() {
        System.out.println(value);
    }
}
```
a text differencing utility will highlight the differences at the lines marked (1) and (1a), but will show no change in the body of Derived.ff(). However, the behaviour of Derived.ff() has clearly changed because of a change in symbol binding: in the original version, value in ff() refers to the private field Derived.value, which hides the inherited Base.value, whereas in the modified version it binds to the protected field Base.value. From a regression testing perspective, the method must therefore be retested. Reasoning like this is possible only if program differences are represented at a level higher than the purely lexical one. The concept of atomic changes proves useful here.
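As a minimal sketch of how the two examples above might be recognised at this higher level, the following code models each version of a class as an ordered list of member declarations and reports atomic changes rather than changed lines. The `AtomicDiff` class, the `Member` record, and the change labels are hypothetical, and the sketch compares declarations only (method bodies are ignored).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical detector for a few atomic changes between two versions of one class. */
public class AtomicDiff {

    /** Hypothetical member model: kind is "field" or "method"; sig is the declaration text. */
    public record Member(String kind, String name, String sig) {}

    /** inheritedFields: names of accessible fields declared in the superclass. */
    public static List<String> diff(List<Member> oldV, List<Member> newV,
                                    Set<String> inheritedFields) {
        List<String> changes = new ArrayList<>();
        Set<Member> oldSet = new HashSet<>(oldV);
        Set<Member> newSet = new HashSet<>(newV);

        for (Member m : oldV) {
            if (!newSet.contains(m)) {
                changes.add("Removed " + m.kind() + " " + m.name());
                // Removing a field that hid an inherited one re-binds every
                // unchanged use of that name to the superclass field.
                if (m.kind().equals("field") && inheritedFields.contains(m.name())) {
                    changes.add("Binding of '" + m.name() + "' changed to inherited field");
                }
            }
        }
        for (Member m : newV) {
            if (!oldSet.contains(m)) {
                changes.add("Added " + m.kind() + " " + m.name());
                // Symmetric case: a new field now hides the inherited one.
                if (m.kind().equals("field") && inheritedFields.contains(m.name())) {
                    changes.add("Binding of '" + m.name() + "' changed to new field");
                }
            }
        }
        // Members present in both versions but in a different relative order.
        List<Member> oldCommon = new ArrayList<>(oldV);
        oldCommon.retainAll(newSet);
        List<Member> newCommon = new ArrayList<>(newV);
        newCommon.retainAll(oldSet);
        if (!oldCommon.equals(newCommon)) {
            changes.add("Class members reordered");
        }
        return changes;
    }

    public static void main(String[] args) {
        // Class A: the field and the method swap places, nothing else changes.
        Member field = new Member("field", "i", "int i;");
        Member setter = new Member("method", "setVal", "void setVal(int v)");
        System.out.println(diff(List.of(field, setter), List.of(setter, field), Set.of()));

        // Class Derived: private int value is replaced by private int i.
        Member oldField = new Member("field", "value", "private int value;");
        Member newField = new Member("field", "i", "private int i;");
        Member ff = new Member("method", "ff", "void ff()");
        System.out.println(diff(List.of(oldField, ff), List.of(newField, ff), Set.of("value")));
    }
}
```

For class `A` the sketch reports only `Class members reordered`; for `Derived` it reports the removed and added fields together with the binding change that affects the textually unchanged `ff()`.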
4. ATOMIC CHANGES
An atomic change is a change applied to the source code such that
Atomic changes satisfy three interesting properties [2]:
An important research question is: should a class be retested every time it changes? A related question is: if a class is modified, which other classes must be retested? In the most common scenario, the entire regression test suite is rerun every time a class changes. This eliminates the risk of not testing a class whose behaviour has changed during program evolution, but it can be quite costly! Assume that we have written a fairly large Java application, comprising around 500 classes, that uses JDK 1.1.6. Should we retest the entire application when we link it to JDK 1.1.7 once that becomes available? Imagine the nightmare of having to retest every class in the application! Since testing consumes resources such as time, people, and hardware, we would like to spend no more of them than is needed to retest the application.
Unfortunately, the optimum effort needed to retest a set of changed classes cannot be easily computed. However, certain clues can be derived from a careful study of the source changes; if in doubt, we can of course always resort to retesting. Going back to the Sample class discussed in Section 4, should we retest the modified version? Despite its seven atomic changes, it is safe to conclude that retesting the modified class is not necessary, since nothing significant has changed. But how do we know that nothing significant has changed without retesting the affected class?
As pointed out in Section 4, an atomic change may or may not induce retesting of a class. If we catalog all possible atomic changes in a Java program and associate each one with its impact on retesting, then by examining the actual atomic changes that make up a particular evolution, we can decide whether or not retesting is needed. Our analysis assumes that the modified program builds without errors. In the Sample class above, none of the atomic changes induces a retest, and hence the modified class need not be retested!
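The catalog-based decision described above can be sketched as a lookup table from kinds of atomic change to a retest flag: a class needs retesting if any of its atomic changes induces one. The `RetestDecision` class, the enum constants, and their flags are hypothetical examples (two drawn from Section 3 and one labelled assumption); they are not the paper's catalog. Kinds the catalog does not cover default to retesting, in line with resorting to retesting when in doubt.

```java
import java.util.Collection;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical catalog mapping atomic-change kinds to their impact on retesting. */
public class RetestDecision {

    public enum AtomicChange {
        MEMBERS_REORDERED,     // Section 3: not significant for behaviour
        FIELD_BINDING_CHANGED, // Section 3: Derived.ff() must be retested
        METHOD_BODY_MODIFIED,  // assumption: edited code is always retested
        UNCLASSIFIED           // deliberately left out of the catalog below
    }

    private static final Map<AtomicChange, Boolean> INDUCES_RETEST =
            new EnumMap<>(AtomicChange.class);
    static {
        INDUCES_RETEST.put(AtomicChange.MEMBERS_REORDERED, false);
        INDUCES_RETEST.put(AtomicChange.FIELD_BINDING_CHANGED, true);
        INDUCES_RETEST.put(AtomicChange.METHOD_BODY_MODIFIED, true);
    }

    /** Assumes, as the text does, that the modified program builds without errors. */
    public static boolean needsRetest(Collection<AtomicChange> changes) {
        for (AtomicChange c : changes) {
            // Conservative default: a kind missing from the catalog forces a retest.
            if (INDUCES_RETEST.getOrDefault(c, true)) {
                return true;
            }
        }
        return false; // every observed change is known to be behaviour-neutral
    }

    public static void main(String[] args) {
        System.out.println(needsRetest(List.of(AtomicChange.MEMBERS_REORDERED)));
        System.out.println(needsRetest(List.of(AtomicChange.MEMBERS_REORDERED,
                                               AtomicChange.FIELD_BINDING_CHANGED)));
        System.out.println(needsRetest(List.of(AtomicChange.UNCLASSIFIED)));
    }
}
```

Running `main` prints `false`, `true`, and `true`. The sketch decides only whether the changed class itself needs retesting; deciding which other classes must be retested would also require dependency information.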
A complete description of the possible atomic changes in a Java program is beyond the scope of this paper. However, it is useful to know some of the atomic changes that induce a retest of the affected class. The following is a partial list:

| Description | V1 | V2 | V3 |
|---|---|---|---|
| Total no. of atomic changes | 246 | 420 | 3988 |
| Classes modified | 49 | 120 | 680 |
| Classes added | 1 | 22 | 546 |
| Classes deleted | 0 | 1 | 153 |
| Classes that require retest | 99 | 100 | 779 |
| Methods that require retest | 407 | 606 | 8149 |
| Modified methods that do not require retest | 8 | 30 | 61 |

| Atomic change | V1 | V2 | V3 |
|---|---|---|---|
| Accessibility level of field changed | 0 | 2 | 19 |
| Method made synchronized | 1 | 8 | 30 |
| Method made non-synchronized | 1 | 33 | 18 |
| Method argument renamed | 2 | 2 | 44 |
| Method made final | 3 | 0 | 8 |
| Method made non-final | 0 | 0 | 31 |
| Class made final | 1 | 0 | 0 |
| Class made non-final | 0 | 0 | 1 |
| Class made public | 0 | 0 | 3 |
| Accessibility level of method changed | 1 | 2 | 26 |
| Class members reordered | 1 | 3 | 99 |
| Field made final | 0 | 1 | 16 |
| Instance variable made transient | 0 | 1 | 11 |
| Instance variable made non-transient | 0 | 0 | 3 |
| Field direct initializer modified | 1 | 10 | 34 |
| Static initializer block added | 0 | 2 | 47 |
| Static initializer block removed | 1 | 1 | 4 |
| Static method added | 0 | 7 | 210 |
| Static method removed | 0 | 3 | 37 |