Why does a foreach loop silently insert an "explicit" conversion?

The C# specification defines
foreach (V v in x) 
  embedded-statement
as having the same semantics as: [1]
{
  E e = ((C)(x)).GetEnumerator();
  try 
  {
    V v;  // Inside the while in C# 5.
    while (e.MoveNext()) 
    {
      v = (V)e.Current;
      embedded-statement
    }
  }
  finally 
  {
    // necessary code to dispose e
  }
}
There are a lot of subtleties here that we've discussed before; what I want to talk about today is the explicit conversion from e.Current to V. On the face of it this seems very problematic; that's an explicit conversion. The collection could be a list of longs and V could be int; normally C# would not allow a conversion from long to int without a cast operator appearing in the source code. [2] What justifies this odd design choice?

The answer is: the foreach loop semantics were designed before generics were added to the language; a highly likely scenario is that the collection being enumerated is an ArrayList or other collection where the element type is unknown to the compiler, but is known to the developer. It is rare for an ArrayList to contain ints and strings and Exceptions and Customers; usually an ArrayList contains elements of uniform type known to the developer. In a world without generics you typically have to know that ahead of time by some means other than the type system telling you. So just as a cast from object to string is a hint to the compiler that the value is really a string, so too is
foreach(string name in myArrayList)
a hint to the compiler that the collection contains strings. You don't want to force the user to write:
foreach(object obj in myArrayList)
{
  string name = (string)obj;
  ...
}
In a world with generics, where the vast majority of sequences enumerated are now statically typed, this is a misfeature. But it would be a large breaking change to remove it, so we're stuck with it.
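To make both halves of that argument concrete, here is a minimal sketch. The class and method names (ForeachCastSketch, TotalLength, Narrow) are made up for illustration and do not come from the specification. The first method is the pre-generics scenario the feature was designed for; the second shows the same hidden cast on a statically typed List<long>, where it is much harder to justify.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical names, chosen only for this illustration.
static class ForeachCastSketch
{
    // Pre-generics style: ArrayList only exposes object, so the loop
    // variable type carries the developer's knowledge of the element type.
    public static int TotalLength(ArrayList names)
    {
        int total = 0;
        foreach (string name in names) // hidden (string) cast on each element
            total += name.Length;
        return total;
    }

    // Generic style: the element type is statically known to be long, yet
    // this still compiles because the loop inserts an explicit (int) cast.
    public static List<int> Narrow(List<long> values)
    {
        var result = new List<int>();
        foreach (int v in values) // hidden (int) cast, no cast in the source
            result.Add(v);
        return result;
    }
}
```

If the ArrayList passed to TotalLength contains anything that is not a string, the hidden cast throws InvalidCastException at runtime. In Narrow, a long that does not fit into an int is silently truncated in an unchecked context and throws OverflowException in a checked one, exactly as a hand-written (int) cast would.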
I personally find this feature quite confusing. When I was a beginner C# programmer I mistakenly believed the semantics of the foreach loop to be:
    while (e.MoveNext()) 
    {
      current = e.Current;
      if (!(current is V)) 
        continue;
      v = current as V;
      embedded-statement
    }
That is, the real feature is "assert that every item in the sequence is of type V and crash if it is not", whereas I believed it was "for every element in this sequence of type V...". [3]
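The difference between the two readings is easy to see side by side. This is only a sketch: Animal, Giraffe and Tiger are placeholder types, and the class and method names are invented for the example. The first method relies on the cast that foreach inserts; the second uses the OfType extension method from System.Linq to get the filtering behavior I had mistakenly expected.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Placeholder types for illustration only.
class Animal { }
class Giraffe : Animal { }
class Tiger : Animal { }

static class FilterVersusAssertSketch
{
    // Actual foreach semantics: every element is cast to Giraffe, and the
    // first element that is not a Giraffe causes an InvalidCastException.
    public static List<Giraffe> GiraffesByCasting(IEnumerable<Animal> animals)
    {
        var result = new List<Giraffe>();
        foreach (Giraffe g in animals)
            result.Add(g);
        return result;
    }

    // Filtering semantics: OfType yields only the elements that really are
    // Giraffes and skips everything else.
    public static List<Giraffe> GiraffesByFiltering(IEnumerable<Animal> animals)
    {
        var result = new List<Giraffe>();
        foreach (Giraffe g in animals.OfType<Giraffe>())
            result.Add(g);
        return result;
    }
}
```

Given a sequence containing a Giraffe, a Tiger and another Giraffe, GiraffesByFiltering returns the two giraffes, while GiraffesByCasting throws when it reaches the tiger.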
You might wonder why the C# compiler does not produce a warning in modern code, where generics are being used. When I was on the C# compiler team I implemented such a warning and tried it on the corpus of C# code within Microsoft. The number of warnings produced in correct code (where someone had a sequence of Animal but knew via other means that they were all Giraffe) was large. Warnings which fire too often in correct code are bad warnings, so we opted out of adding the feature.
The moral of the story is: sometimes you get stuck with weird legacy misfeatures when you massively change the type system in version two of a language. Try to get your type system right the first time when next you design a new language.
  1. This is not exactly what the spec says; I've made one small edit because I don't want to get into the difference between the element type and the loop variable type in this episode.
  2. Or the long being a constant that fits into an int.
  3. If the latter is the behavior you actually want, the OfType extension method has those semantics.