porges

Objects can be collected while their instance methods are still executing

In Peter Ritchie’s post Dispose Pattern and “Set large fields to null”, he states the following (my highlighting):

At face value, setting a field to null means that the referenced object is now unrooted from the class that owns the field and, if that was the last root of that reference, the Garbage Collector (GC) is now free to release the memory used by the object that was referenced by that field. Although advanced, this seems all very academic because the amount of time between unrooting the reference and the return from Dispose (and thus the unrooting of the parent object) would seem like a very short amount of time. Even if the amount of time between these two actions is small, setting a single field to null (i.e. a single assignment) seems like such a minor bit of code to provide no adverse effects. The prevalent opinion seems to be that the GC “handles” this case and does what is best for you without setting the field to null.

In fact, this “short amount of time” can be so short as to be negative! An object that nothing else refers to becomes unrooted as soon as its currently-executing method makes no further use of this, so objects in .NET can be collected while their instance methods are still executing. I first learned about this from Chris Brumme’s blog. Here is an example that shows this behaviour:
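
(The post’s original code isn’t reproduced here; the following is a minimal sketch built from the class and method names used later in the post, Slow and Go(). The finalizer stands in for “collected”. Note that it only behaves this way in a Release build with no debugger attached, since otherwise the JIT keeps locals alive until the end of their scope.)

	using System;

	class Slow
	{
		~Slow() => Console.WriteLine("collected");

		public void Go()
		{
			Console.WriteLine("start method");
			// 'this' is not used beyond this point, so the object is already
			// eligible for collection even though Go() is still executing.
			GC.Collect();
			GC.WaitForPendingFinalizers();
			Console.WriteLine("end method");
		}
	}

	static class Program
	{
		static void Main() => new Slow().Go();
	}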

The output is:

start method
collected
end method

Once the method Go() no longer makes any use of this, the object is eligible for collection. (This means that setting fields to null will actually extend the lifetime of the parent object ever so slightly.)

However, the code that a using statement is transformed into holds onto a reference to the object until its Dispose() method has been called, so you won’t get the same behaviour with the following code:
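
(Again a sketch rather than the original code: Slow now implements IDisposable and is consumed through a using block. The trailing GC.Collect() merely makes the finalizer run before the program exits.)

	using System;

	class Slow : IDisposable
	{
		~Slow() => Console.WriteLine("collected");

		// No GC.SuppressFinalize here, so the finalizer still runs later.
		public void Dispose() { }

		public void Go()
		{
			Console.WriteLine("start method");
			GC.Collect();
			GC.WaitForPendingFinalizers();
			Console.WriteLine("end method");
		}
	}

	static class Program
	{
		static void Main()
		{
			using (var slow = new Slow())
			{
				slow.Go();
			}
			// Only now is the object unreachable:
			GC.Collect();
			GC.WaitForPendingFinalizers();
		}
	}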

The output is:

start method
end method
collected

This is because the using statement is transformed to something like this:
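
(A sketch of the expansion described by the C# specification for a non-nullable reference type; the compiler actually uses an invisible temporary rather than a named local.)

	Slow slow = new Slow();
	try
	{
		slow.Go();
	}
	finally
	{
		if (slow != null)
		{
			((IDisposable)slow).Dispose();
		}
	}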

I think (but cannot confirm) that the reference being held is that of the Slow object in the outer scope. We could restore the previous behaviour if the using transform were written like this instead:
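
(The post’s alternative transform isn’t shown here, so the following is a purely hypothetical illustration of my own: if the finally block held only a WeakReference to the object, nothing would root it during Go(), and the original output would return. The obvious cost, and the reason no real compiler would do this, is that Dispose() is silently skipped whenever the object has already been collected.)

	using System;

	static class Program
	{
		static void Main()
		{
			// Hypothetical transform: only a weak reference survives into the finally block.
			var weak = new WeakReference<Slow>(new Slow());
			try
			{
				// Nothing triggers a collection before this point in this toy program.
				if (weak.TryGetTarget(out var target))
				{
					target.Go(); // 'target' is dead once the call begins
				}
			}
			finally
			{
				// Dispose only if the object survived; here it will not have.
				if (weak.TryGetTarget(out var remaining))
				{
					remaining.Dispose();
				}
			}
		}
	}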

Either way, I wouldn’t bother setting fields to null in the Dispose() method. (For one thing, these fields can no longer be declared readonly.) And most of the time, the GC is smarter than you are.
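
(As a hypothetical illustration of the readonly point, with Wrapper and _stream being my own names: a field that Dispose() nulls out cannot be declared readonly.)

	using System;
	using System.IO;

	sealed class Wrapper : IDisposable
	{
		// Could be 'private readonly Stream _stream;' if Dispose() left it alone:
		private Stream _stream;

		public Wrapper(Stream stream) => _stream = stream;

		public void Dispose()
		{
			_stream?.Dispose();
			_stream = null; // the "set large fields to null" step in question
		}
	}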

On i, j, … as iteration variables (but really a foray into primary sources)

This question was recently asked on StackOverflow:

I know this might seem like an absolutely silly question to ask, yet I am too curious not to ask…

Why did “i” and “j” become THE variables to use as counters in most control structures?

The question has generated many answers, from scholarly to spurious — but the thing that has struck me is that no one has attempted to cite their sources or do any research. Why is this, when we live in a time when primary sources are more widely available than ever?

Let’s start with the claims that FORTRAN was the original source for their use in programming languages: while perhaps not the ultimate origin, it may have been the reason that they became widespread in the programming community.

The original manual for Fortran [1] for the IBM 704 is readily available online. The first thing I notice is the glorious cover:

And sure enough, we can find the definition for the integral variables:

Unfortunately, the trail stops here. I can’t find any explanation from Backus (or anyone else) as to why I, J, K, L, M, and N were chosen as the integer variables. However, because integer variables in Fortran “are somewhat restricted in their use and serve primarily as subscripts or exponents”, [2] I am forced to the conclusion that they were used in imitation of their mathematical counterparts. I don’t think we’ll ever know exactly who introduced them into Fortran itself, or when.

What we can do, however, is have a look at when they arose in mathematics. The usual place that i, j, etc. arise is in ‘sigma notation’, using the summation operator Σ. For example, if we write:

	∑_{i=1}^{100} i

we mean i (= 1) + i (= 2) + i (= 3), and so on until i = 100, so we can calculate the answer as 1 + 2 + 3 + … + 100 = 5050. So where did this notation itself come from?

The standard work on the history of mathematical notations is A History of Mathematical Notations by Florian Cajori. [3] He states that Σ was first used by Euler, in his Institutiones calculi differentialis (1755). We can see the part in question here:

This reads (translation by Ian Bruce, from 17CenturyMaths.com):

26. Just as we have been accustomed to specify the difference by the sign Δ, thus we will indicate the sum by the sign Σ; evidently if the difference of the function y were z, there will be z = Δy; from which, if y may be given, the difference z is found we have shown before. But if moreover the difference z shall be given and the sum of this y must be found, y = Σz is made and evidently from the equation z = Δy on regressing this equation will have the form y = Σz, where some constant quantity can be added on account of the reasons given above; […]

Evidently this is not the Σ we are looking for, as Euler uses it only in opposition to Δ (for finite differencing). In fact, Cajori notes that Euler’s Σ “received little attention”, and it seems that only Lagrange adopted it. Here is an excerpt from his Œuvres (printed MDCCCLXIX, i.e. 1869):

Again, we can see Σ is only used in opposition to Δ. Cajori next states that Σ to mean “sum” was used by Fourier, in his Théorie Analytique de la chaleur (1822), and here we find what we’re looking for:

The sign Σ affects the number i and indicates that the sum must be taken from i = 1 to i = 1/0. One can also include the first term 1 under the sign Σ, and we have: [equation]

i must then take all its integral values from -1/0 up to +1/0; that is what one indicates by writing the limits -1/0 and +1/0 next to the sign Σ, one of the values of i being 0. This is the most concise expression of the solution. [4]

Since Fourier explains Σ several times in the book, rather than just once, we can assume that the notation was either new or unfamiliar to most readers. [5] In any case, it doesn’t really matter who invented it, because while we have found our Σ, Fourier doesn’t explain why he uses i. In fact, since he uses it to index sequences in other places, it appears to be an already-existing usage. [6]

A quick glance at the text by Euler above shows that he uses indexing very rarely (despite the subject of the text being a prime candidate!), and when he does, he uses m.

And this is as far as I got. Time to publish this.

Notes:

  1. It isn’t written FORTRAN here. I’m not sure of the nuances of its capitalization.
  2. J.W. Backus, R.J. Beeber, S. Best, R. Goldberg, L.M. Haibt, H.L. Herrick, R.A. Nelson, D. Sayre, P.B. Sheridan, H.J. Stern, I. Ziller, R.A. Hughes, and R. Nutt, “The FORTRAN automatic coding system”, in Proceedings of the Western Joint Computer Conference, Los Angeles, California, February 1957, pp. 188–198.
  3. Unfortunately, only the first volume appears to be readily available online. You can see some of the second volume on Google Books.
  4. Note that Fourier has no qualms about writing -1/0 and +1/0!
  5. Knuth also states that the notation arrived with Fourier, so I guess I’m not in bad company.
  6. While i is often used as (one of) the indices for a matrix, true matrices weren’t developed until after Fourier’s book was published, so we must look elsewhere.

Casting in .NET via object mutation

In this post, we will see how to make the following code fail:

	object it = new SomeStruct { Item = 1 };

	Floatsy(it);

	Console.WriteLine(((SomeStruct)it).Item);

At runtime, it will throw an InvalidCastException!


Validating email addresses with .NET regex

I did this validation in Haskell a while back, and since I recently discovered .NET’s “balancing groups” regex feature, it seemed like a good time to do it for .NET.
