Patch Testing

Fintan mentioned an article in The Register in which Oracle are concerned about the interdependencies of Linux patches. This is something that used to frustrate me no end while I used Debian, where apt-getting the latest package of something often meant downloading a large number of seemingly unrelated packages. Sun’s patch dependency logic is worthy of a post of its own, but for now I’d like (at Fintan’s prodding!) to discuss the system testing of patches performed by my group, Patch System Test.

“Overview of Solaris Patch System Testing and Performance Regression Testing” describes the testing that is performed on patches after they are created and before they are released.

The overall process is, very simply, something like this:

1. An engineer fixes a bug or adds a feature in their code.
2. Their group tests that fix to ensure it is OK.
3. The fix is then fed into the latest Solaris build and tested.
4. A patch is created for the issue.
5. Patch System Test starts testing the patch.

As soon as the patch is created, many audits are performed to ensure that the packaging is sane. For example, one audit checks that the version string of each package matches what was shipped in the OS, and another ensures that the files delivered in the previous revision are all included in the present revision.
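The two audits mentioned above can be sketched roughly as follows. This is only an illustration of the idea, not our actual audit tooling: the `PatchRevision` type, the `audit_patch` function and the shape of the data passed in are all hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchRevision:
    """Hypothetical, simplified view of one revision of a patch."""
    patch_id: str                      # e.g. "123456-02" (made-up ID)
    package_versions: dict[str, str]   # package name -> version string
    files: frozenset[str]              # paths delivered by this revision


def audit_patch(previous: PatchRevision,
                current: PatchRevision,
                shipped_versions: dict[str, str]) -> list[str]:
    """Return a list of human-readable audit failures (empty if clean)."""
    problems = []

    # Audit 1: each package's version string must match what shipped in the OS.
    for pkg, version in sorted(current.package_versions.items()):
        shipped = shipped_versions.get(pkg)
        if shipped is None:
            problems.append(f"{pkg}: not found in the shipped OS")
        elif shipped != version:
            problems.append(
                f"{pkg}: version {version!r} does not match shipped {shipped!r}"
            )

    # Audit 2: every file delivered by the previous revision must still be
    # delivered by the present revision.
    for path in sorted(previous.files - current.files):
        problems.append(
            f"{path}: delivered in {previous.patch_id} "
            f"but missing from {current.patch_id}"
        )

    return problems
```

An empty list means both audits passed; anything else would stop the patch before it reached practical testing.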

Once it passes these audits, it moves to our Install Backout Testing. This simply applies and backs out the patch a number of times. In the more complicated test cases the patch is applied using the -R option to patchadd, which tests that the patch is compliant with Live Upgrade and similar alternate-root installs. Since patch creators are able to write free-form scripts in patches (prepatch etc.), auditing cannot reliably catch every issue, and this practical testing is the best way to identify problems. Absent from the document is our Install Backout testing on zones in Solaris 10, which ensures that the patch can be added to and removed from a system running zones.
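As a rough sketch of what an install/backout cycle looks like, the snippet below builds the sequence of patchadd and patchrm commands and runs them until one fails. It is not our real harness: the function names, the cycle count, the example paths and the injectable `runner` are assumptions made for illustration. The only things taken from the tools themselves are that patchadd takes the patch directory, patchrm takes the patch ID, and both accept -R with an alternate root.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def install_backout_commands(patch_dir: str,
                             patch_id: str,
                             cycles: int = 3,
                             alt_root: Optional[str] = None) -> List[List[str]]:
    """Build the add/remove command sequence for a number of cycles.

    alt_root, if given, is passed via -R so the patch is applied to an
    alternate root rather than the running system.
    """
    root_args = ["-R", alt_root] if alt_root else []
    commands = []
    for _ in range(cycles):
        commands.append(["patchadd", *root_args, patch_dir])
        commands.append(["patchrm", *root_args, patch_id])
    return commands


def run_cycles(commands: Sequence[Sequence[str]],
               runner: Callable = subprocess.run) -> Optional[List[str]]:
    """Run each command in order; return the first failing one, or None.

    runner defaults to subprocess.run, but can be replaced with a stub
    so the sequence can be checked without touching a real system.
    """
    for cmd in commands:
        result = runner(list(cmd))
        if result.returncode != 0:
            return list(cmd)
    return None
```

Keeping the command building separate from the execution means the same sequence can be reused for the plain case and the -R case, and checked on a machine where the patch tools are not installed.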

The testing that Patch System Test performs is system testing (as the name implies!). We are not concerned with testing each and every function in the code to ensure its correctness; that responsibility lies foremost with the developers. What we aim to do is to test whether any of the changes in the patches in our test cycle degrade the system or its applications.

We apply the patches to a range of machines. For our Solaris 10 line this ranges from an Ultra 10 to domains on a 15K, and from relatively old PCs to the latest Opteron offerings from Sun.

On these systems we run the tests that are described in the “Overview of Solaris Patch System Testing and Performance Regression Testing” document. Obviously we can’t test every application on every configuration, so a set of tests covering common applications and environments is selected. By running the OCE test suite, for example, we are not testing Oracle; we are testing whether any of the patches applied to the system cause it to fail. By running the liverpool test suite we are checking whether a patch has caused something that a set of users logging in and using the system might notice. The document explains the other test suites, and although I have not checked every one, I think you can assume that most are also run for Solaris 10 patches.

Of particular interest to some people are our Veritas and Oracle test setups. The Veritas test is designed to check that Veritas fs and vm work correctly after patches have been applied. For example, it checks that an existing filesystem is still mountable and writeable, and it tries to create and destroy volumes, newfs volumes and so on.

The OCE test suites are something that I am even less familiar with! This certification test suite is run against different machine configurations and different Oracle and OS versions. Again, it is run with the intention of finding any problems in patches that could cause something in Oracle not to work.

Despite our testing, some things will slip through. A comment on Fintan’s blog mentions what I think is bug 4978228, which was caused by a minor change to a data structure in a Solaris 8 kernel patch. Veritas file systems were using this portion of the data structure, and this resulted in high I/O wait times being reported by some utilities. This issue, while obviously a problem, did not cause Oracle to cease working or filesystem performance to be degraded, so it was not caught until someone actually started looking at mpstat output.

Hopefully that gives an overview of our testing for you.
