Geeks With Blogs
Tim Watson blog

 

I had an interesting question come up the other day: why override object's "Equals" method and/or the equality/inequality operators at all? Why not just compare objects in some dedicated function or other? After my initial shock died down (the question came from an OO developer), I realized there are several reasons. Let's consider these in the context of Java (which the conversation was about) and .NET, in turn.

In Java, the situation is not cut and dried when it comes to equality comparison. To compare object identity (i.e. do two reference variables point to the same object in memory), Java provides the "==" equality operator. For any non-primitive type, this binary operator returns true if both its operands refer to the same object in memory, otherwise it returns false. For most objects, this makes sense as the default mode of comparison; in some circumstances, this might even be the only behavior appropriate for an object. Consider, for example, an invoice object which is being passed around and asked to update itself with various items, discounts, and so on. You'll want to be sure that each time you change it, you're changing the invoice itself and not a copy of it. When comparing two invoices, you might as well look and see if the references point to the same instance.
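As a rough illustration of identity comparison, here is a minimal sketch. The Invoice class and its methods are hypothetical, made up purely for this example; the only point is what "==" reports for references.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Invoice class, invented only to illustrate reference semantics.
class Invoice {
    private final List<String> items = new ArrayList<>();

    void addItem(String item) { items.add(item); }

    int itemCount() { return items.size(); }
}

class IdentityDemo {
    public static void main(String[] args) {
        Invoice invoice = new Invoice();
        Invoice sameInvoice = invoice;         // a second reference to the same object
        Invoice otherInvoice = new Invoice();  // a distinct object

        sameInvoice.addItem("Widget");

        System.out.println(invoice == sameInvoice);   // true: one object, two references
        System.out.println(invoice == otherInvoice);  // false: two separate objects
        System.out.println(invoice.itemCount());      // 1: the change is visible via both references
    }
}
```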

Not all objects work like this however. Sometimes, you'll come across objects that should be compared using value semantics, similar to the Java language's primitive types. Consider a monetary value, for example, or a date/time. There might be hundreds of these, duplicated throughout the system; when you compare two such objects, you want to know if they represent the same underlying value, not whether two instances point to the same place in memory! An address is potentially a good example of this; it might only exist in one place in the world, but within the system we might be comfortable having dozens of copies of an address. A good measure of whether or not you want a “value object” is this: does the object cease to be "itself" if you change its underlying value(s)? An address certainly does, because once you change the house number, street name or postal code, it’s not the same address anymore!
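The JDK itself ships a few classes that behave this way. The sketch below uses java.math.BigDecimal and java.time.LocalDate (the latter assumes Java 8 or later) as stand-ins for the monetary value and date mentioned above; the variable names are mine.

```java
import java.math.BigDecimal;
import java.time.LocalDate;

class ValueSemanticsDemo {
    public static void main(String[] args) {
        // Two separately constructed objects representing the same amount.
        BigDecimal price1 = new BigDecimal("19.99");
        BigDecimal price2 = new BigDecimal("19.99");

        System.out.println(price1 == price2);       // false: different objects in memory
        System.out.println(price1.equals(price2));  // true: same value and scale

        // Dates are compared by value too.
        LocalDate posted = LocalDate.of(2006, 3, 13);
        LocalDate alsoPosted = LocalDate.of(2006, 3, 13);
        System.out.println(posted.equals(alsoPosted)); // true
    }
}
```

Note that BigDecimal.equals also takes the scale into account, so "19.99" and "19.990" are not equal by that method; that is a design decision of the class, not of value semantics in general.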

Identifying value objects is just half the battle though; they also need to be compared using the proper (value) semantics. My friend had suggested to me that you could just define a method to do this: compare the values of the objects' fields and return true or false to indicate equality. Before I set about explaining why this approach isn't a good idea, let's just look at an example of such a method.

public static boolean AreEqual( Address a, Address b ) {
    // two null references are treated as equal
    if ( ( a == null ) && ( b == null ) )
        return true;
    // exactly one of them is null
    if ( ( a == null ) || ( b == null ) )
        return false;
    return (
            a.house().equals( b.house() ) &&
            a.street().equals( b.street() ) &&
            a.town().equals( b.town() ) &&
            a.county().equals( b.county() ) &&
            a.postCode().equals( b.postCode() )
    );
}

 

This code looks innocuous enough at first glance, but there are several problems with it. First of all, it's totally wrong from an OO point of view. The whole crux of Object Orientation is to put data and behavior together on the same object. This means (from a structural point of view) that any questions about the object (such as, "are you the same as this other instance?") should be directed at the object (in question). Pulling the comparative behavior out into a separate method is an odd choice, even if the method is defined as a static member of the Address class. This is definitely a procedural approach, and not an object oriented one. 

The second problem with this implementation is that it isn't what other developers will be expecting. Every Java developer knows that you use the "==" operator to do reference comparison, and Object.equals to do value comparison (if the class in question supports it). This means that other developers will naturally expect to compare instances of your object(s) using the paradigm they're familiar with: Object.equals(). Implementing comparison in an unusual, counter-intuitive way is very bad, because it makes the code harder to maintain (anything that is hard to understand is hard to maintain) and therefore less reliable; in hard to maintain code, it is far more likely that programmers will introduce bugs accidentally. Always aim for consistency, especially with the language/platform you are targeting. If Java expects you to use Object.equals to implement value based equality comparison, use it. Don't make up your own techniques; you're not doing anyone any favors, least of all your co-workers.

The third problem with this code is that such objects cannot sensibly be used as keys in a hashtable. Yes, that's right - unless you override the "equals" method (and hashCode() along with it), a hashtable falls back on the identity-based defaults inherited from “java.lang.Object”, so two addresses holding identical values are treated as two different keys! A number of data structures use the hashCode() method to organize their data internally. If you wish to use your object as a key in a hashtable for example, you must implement hashCode() properly and you must also override equals(). A quick glance at the Java API documentation will reveal that there is a semantic contract between hashCode() and equals(). If you implement one, you should implement the other properly and meet a few minimum requirements as well. The most important of these is that when calling equals() on two objects returns true, calling hashCode() on them must return the same (integer) value.
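To make the hashtable point concrete, here is a small sketch using java.util.HashMap. Both key classes (PlainPostCode and ValuePostCode) are hypothetical and exist only for this comparison; the second assumes java.util.Objects, available from Java 7.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key type that inherits equals() and hashCode() from Object.
class PlainPostCode {
    final String code;
    PlainPostCode(String code) { this.code = code; }
}

// Same data, but equals() and hashCode() are overridden consistently.
class ValuePostCode {
    final String code;
    ValuePostCode(String code) { this.code = code; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || o.getClass() != getClass()) return false;
        return Objects.equals(code, ((ValuePostCode) o).code);
    }

    @Override
    public int hashCode() { return Objects.hashCode(code); }
}

class HashKeyDemo {
    public static void main(String[] args) {
        Map<PlainPostCode, String> plain = new HashMap<>();
        plain.put(new PlainPostCode("TV2 4BY"), "New Gate");
        // A second instance with the same text is a different key by identity.
        System.out.println(plain.get(new PlainPostCode("TV2 4BY")));    // null

        Map<ValuePostCode, String> byValue = new HashMap<>();
        byValue.put(new ValuePostCode("TV2 4BY"), "New Gate");
        // Equal by value, same hash code, so the entry is found.
        System.out.println(byValue.get(new ValuePostCode("TV2 4BY"))); // New Gate
    }
}
```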

The reason for all this is simple: data structures that use hashing (and many of them do) take an object’s hash code and use it to identify the bucket into which the object should go when added to the collection. Two objects might yield the same hash code but have equals() return false, in which case the collection must find the right bucket and then search it by brute force (i.e. comparing each object in the bucket using equals() to find a match). If all objects of a given type return the same hash code (which is perfectly legal!), they will all land in the same bucket and any search will be horribly inefficient. In an ideal world, each set of objects considered equal would have its own unique hash code; every distinct value then gets its own bucket and searching is efficient.
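The "perfectly legal" degenerate case can be sketched as follows. ConstantHash is a hypothetical class whose hashCode() always returns the same number; the collection still gives correct answers, it just has to lean on equals() for every lookup.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class with a legal but deliberately poor hashCode().
class ConstantHash {
    final int id;
    ConstantHash(int id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        return (o instanceof ConstantHash) && ((ConstantHash) o).id == id;
    }

    @Override
    public int hashCode() { return 42; } // every instance shares one hash code
}

class BucketDemo {
    public static void main(String[] args) {
        Set<ConstantHash> set = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            set.add(new ConstantHash(i));
        }
        // Still correct: equals() distinguishes entries that share a hash code.
        System.out.println(set.size());                          // 1000
        System.out.println(set.contains(new ConstantHash(500))); // true
        System.out.println(set.contains(new ConstantHash(5000))); // false
    }
}
```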

Either way, if you don’t override equals() and hashCode(), your class probably won’t work properly with some of the Java collection classes. This discussion is an over-simplification of the way that equals() and hashCode() work, but it serves to explain why implementing equality comparison using a special method instead of overriding equals() is wrong, and that is the point I’m going for. Some people have done their theses on hashing algorithms, so there’s no point in trying to expound much about them here (I’m not clever enough anyway!). Instead, here's a very simple example in Java.

public class Address {

    private static final int HASH_CODE_SEED = 7;
    private static final int HASH_CODE_MULTIPLIER = 31;

    private final String houseName;
    private final String street;
    private final String town;
    private final String county;
    private final String postalCode;

    // computed lazily; HASH_CODE_SEED doubles as the "not yet computed" marker
    private int cachedHashCode;

    public Address( String houseName, String street,
                    String town, String county, String postalCode ) {
        this.houseName = houseName;
        this.street = street;
        this.town = town;
        this.county = county;
        this.postalCode = postalCode;
        cachedHashCode = HASH_CODE_SEED;
    }

    // accessors used by hashCode() and by the earlier AreEqual example
    public String house() { return houseName; }
    public String street() { return street; }
    public String town() { return town; }
    public String county() { return county; }
    public String postCode() { return postalCode; }

    @Override
    public boolean equals( Object object ) {
        if ( ( object == null ) || ( object.getClass() != this.getClass() ) )
            return false;
        // equality is decided purely by comparing hash codes
        return ( this.hashCode() == object.hashCode() );
    }

    @Override
    public int hashCode() {
        if ( cachedHashCode != HASH_CODE_SEED )
            return cachedHashCode;
        // combine the hash of each field, treating null as 0
        int result = HASH_CODE_SEED;
        result = HASH_CODE_MULTIPLIER * result + ( house() == null ? 0 : house().hashCode() );
        result = HASH_CODE_MULTIPLIER * result + ( street() == null ? 0 : street().hashCode() );
        result = HASH_CODE_MULTIPLIER * result + ( town() == null ? 0 : town().hashCode() );
        result = HASH_CODE_MULTIPLIER * result + ( county() == null ? 0 : county().hashCode() );
        result = HASH_CODE_MULTIPLIER * result + ( postCode() == null ? 0 : postCode().hashCode() );
        cachedHashCode = result;
        return cachedHashCode;
    }

    @Override
    public String toString() {
        return houseName + ", " + street + ", " + town +
                ", " + county + ", " + postalCode;
    }

    // run with assertions enabled: java -ea Address
    public static void main(String[] argv) {
        Address a1 = new Address( "The Goldings", "Wannamaker Street",
                "New Gate", "Dougleshire", "TV2 4BY" );
        Address a2 = new Address( "Number 47", "Elms Lea",
                "Sunbury upon Thames", "South Somewhere", "XF9 Y7H" );
        assert a1.hashCode() != a2.hashCode();
        assert !a1.equals( a2 );
        assert a1.hashCode() != a2.hashCode(); //repeat the test!

        Address sameAsA1 = new Address( "The Goldings", "Wannamaker Street",
                "New Gate", "Dougleshire", "TV2 4BY" );
        assert a1.hashCode() == sameAsA1.hashCode();
        assert a1.equals( sameAsA1 );
    }

}

 

The main problem with this implementation of equals is that some hash codes are bound to be duplicated, as the hash value is only 32 bits long. In an object with more than 32 bits of information in it, we cannot possibly have a different hash code for every possible value, so two different addresses can end up with the same hash code and be wrongly reported as equal. In reality, it would be better to compare the fields themselves, but I'll leave that to the reader's imagination.
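For readers who would rather not use their imagination, here is one possible sketch of field-by-field comparison. It is not the class from the listing above: FieldComparedAddress is a hypothetical name, and it assumes java.util.Objects (Java 7 or later) for null-safe comparison and hashing.

```java
import java.util.Objects;

// Hypothetical variant that compares fields rather than hash codes.
final class FieldComparedAddress {
    private final String houseName;
    private final String street;
    private final String town;
    private final String county;
    private final String postalCode;

    FieldComparedAddress(String houseName, String street,
                         String town, String county, String postalCode) {
        this.houseName = houseName;
        this.street = street;
        this.town = town;
        this.county = county;
        this.postalCode = postalCode;
    }

    @Override
    public boolean equals(Object object) {
        if (this == object) return true;
        if (object == null || object.getClass() != getClass()) return false;
        FieldComparedAddress other = (FieldComparedAddress) object;
        // Objects.equals is null-safe, so null fields need no special handling.
        return Objects.equals(houseName, other.houseName)
                && Objects.equals(street, other.street)
                && Objects.equals(town, other.town)
                && Objects.equals(county, other.county)
                && Objects.equals(postalCode, other.postalCode);
    }

    @Override
    public int hashCode() {
        return Objects.hash(houseName, street, town, county, postalCode);
    }
}

class FieldComparedAddressDemo {
    public static void main(String[] args) {
        // "Aa" and "BB" are distinct strings that share the same String hash code.
        FieldComparedAddress a = new FieldComparedAddress("Aa", "Elms Lea", "New Gate", "Dougleshire", "TV2 4BY");
        FieldComparedAddress b = new FieldComparedAddress("BB", "Elms Lea", "New Gate", "Dougleshire", "TV2 4BY");

        System.out.println(a.hashCode() == b.hashCode()); // true: a genuine collision
        System.out.println(a.equals(b));                  // false: the fields differ
    }
}
```

The collision in the demo is exactly the case where an equals() built on hash codes would give the wrong answer, while the contract between equals() and hashCode() is still honored.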

The same points hold true for implementing value semantics in .NET. I’ll not rehash the points of difference between value and reference types again here, except to note that where you want value semantics, you probably want to implement the object as a value type. The rules for "what is a value object" hold pretty much true for choosing to implement a struct as well: it should be immutable, comparisons are value (not reference) based, etc. The code extract below implements the Address class as a C# struct. All the noteworthy things about equality and hash code implementation were covered in a previous post.

using System;

public struct Address : IEquatable<Address?>, IEquatable<Address>
{
    private const Int32 HASH_CODE_SEED = 7;

    // readonly, because a value object should be immutable
    private readonly String houseName;
    private readonly String street;
    private readonly String town;
    private readonly String county;
    private readonly String postalCode;

    public Address( String houseName, String street,
                    String town, String county, String postalCode ) {
        this.houseName = houseName;
        this.street = street;
        this.town = town;
        this.county = county;
        this.postalCode = postalCode;
    }

    public override int GetHashCode() {
        Int32 result = HASH_CODE_SEED;
        result ^= (
            GetHashForNullableField( houseName ) ^
            GetHashForNullableField( street ) ^
            GetHashForNullableField( town ) ^
            GetHashForNullableField( county ) ^
            GetHashForNullableField( postalCode )
        );
        return result;
    }

    // a null field contributes 0 to the hash
    private static Int32 GetHashForNullableField( Object field ) {
        if ( field == null )
            return 0;
        return field.GetHashCode();
    }

    public static Boolean operator ==( Address a, Address b ) {
        // a struct operand can never be null, so compare the values directly
        return a.Equals( b );
    }

    public static Boolean operator !=( Address a, Address b ) {
        return !( a == b );
    }

    public override bool Equals( object obj ) {
        if ( obj is Address )
            return Equals( (Address?)obj );
        return false;
    }

    public bool Equals( Address? other ) {
        if ( !other.HasValue )
            return false;
        Address theOther = other.Value;
        // field-by-field value comparison
        return (
            ( houseName == theOther.houseName ) &&
            ( street == theOther.street ) &&
            ( town == theOther.town ) &&
            ( county == theOther.county ) &&
            ( postalCode == theOther.postalCode )
        );
    }

    public bool Equals( Address other ) {
        return Equals( (Address?)other );
    }

    public override String ToString() {
        return String.Format( "{0}, {1}, {2}, {3}, {4}",
            House, Street, Town, County, PostCode );
    }

    public string House { get { return houseName; } }
    public string Street { get { return street; } }
    public string Town { get { return town; } }
    public string County { get { return county; } }
    public string PostCode { get { return postalCode; } }
}

 

In terms of inheritance hierarchies, it's worth pointing out that because any .NET type can override Object's Equals method, you shouldn't rely on it as a test for identity. Unlike in Java, you cannot rely on the "==" operator either, because it can be overloaded; instead, you can defer to the static Object.ReferenceEquals method, which exists for the sole purpose of identity testing.

 

I consulted the 3rd edition of Jeff Richter's bible shortly after posting the first time around and noticed a couple of other elucidating points. When defining a struct, you should always override Equals because the default implementation uses reflection to compare all the field values between the two instances. This represents an unnecessary performance hit, and the couple of minutes it takes you to write and test the overrides and overloads is well worth it in that light.

Posted on Monday, March 13, 2006 9:36 AM


Comments on this post: All objects are equal, but some are more equal than others

# re: All objects are equal, but some are more equal than others
Great article!
Left by Tim Scott on Jun 16, 2006 6:43 PM



Copyright © Tim Watson | Powered by: GeeksWithBlogs.net