In the previous installment of this miniseries, we defined string as a data type representative of a series of characters, noticed the shorthand C# keyword, satisfied ourselves that string is a reference type, and observed some behavioral aspects of its immutability and function semantics.

Let’s continue then with our little expedition…

5. Why is there a String.Copy() method?

It’s the simplest way to get a new instance of string with the same value as another string. Consider the following code snippet:

Code:

using System;

namespace StringExplorer {
   internal class Program {
      private static void Main() {
         string s1 = new string('.', 5);

         string s2 = s1;
         Console.WriteLine("s2={0}, ReferenceEquals(s1, s2)={1}",
                           s2, ReferenceEquals(s1, s2));

         string s3 = string.Copy(s1);
         Console.WriteLine("s3={0}, ReferenceEquals(s1, s3)={1}",
                           s3, ReferenceEquals(s1, s3));

         Console.ReadLine();
      }
   }
}

We’ve already seen that an assignment such as string s2 = s1; copies the reference and not the value. So, while the values are indeed the same, both variables refer to the same instance. String.Copy(), on the other hand, delivers a new, separate instance with the same value.

Output:

s2=....., ReferenceEquals(s1, s2)=True
s3=....., ReferenceEquals(s1, s3)=False
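
One caveat for readers on newer runtimes: as far as I’m aware, String.Copy() has been marked obsolete since .NET Core 3.0, so the snippet above compiles with a warning there. Here is a minimal sketch of an alternative, assuming all we want is a separate instance with the same value. The helper name CopyOf is made up for this illustration and is not a framework API.

```csharp
using System;

namespace StringExplorer {
   internal class Program {
      // Hypothetical helper: builds a new string from the characters
      // of the source. Note that for an empty source the runtime may
      // hand back the shared empty string instead of a new instance.
      internal static string CopyOf(string source) {
         return new string(source.ToCharArray());
      }

      private static void Main() {
         string s1 = new string('.', 5);
         string s4 = CopyOf(s1);

         // Same value, different instance
         Console.WriteLine("s4={0}, s1 == s4: {1}, ReferenceEquals(s1, s4)={2}",
                           s4, s1 == s4, ReferenceEquals(s1, s4));
      }
   }
}
```

This should print s4=....., s1 == s4: True, ReferenceEquals(s1, s4)=False, which is the same picture we got from String.Copy().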

6. Why does ReferenceEquals() return true when it should be false?

The following snippet demonstrates this (quite interesting) behavior:

Code:

using System;

namespace StringExplorer {
   internal class Program {
      private static void Main() {
         string s1 = "one";

         string s2 = "two";
         Console.WriteLine("s2={0}, ReferenceEquals(s1, s2)={1}",
                           s2, ReferenceEquals(s1, s2));

         string s3 = "one";
         Console.WriteLine("s3={0}, ReferenceEquals(s1, s3)={1}",
                           s3, ReferenceEquals(s1, s3));

         Console.ReadLine();
      }
   }
}

Given everything we’ve seen so far in this series, one could be forgiven for assuming that an assignment from a string literal, as in this example, always produces a new instance. However, .NET has a clever little trick for dealing with string literals (i.e. string values written directly in the source code, enclosed in double quotes ""), called interning.

The runtime is of course cognizant of the immutability of strings, and (seeing as their values cannot change once created) “interns” string literals into an internal pool. Whenever there is a request to create a string from a literal value (say, as for s2 and s3 in our example) the runtime will first check whether a string with the same value is already in the pool. If there isn’t (as in our case for s2), a new instance gets created, interned, and returned to the caller. If, on the other hand, the pool already holds an item with the requested value (as in our case for s3), the existing instance is returned instead, thereby saving the overhead of creating another copy.

Output:

s2=two, ReferenceEquals(s1, s2)=False
s3=one, ReferenceEquals(s1, s3)=True
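
The lookup just described can be pictured with a toy model. This is only a sketch of the idea and certainly not how the CLR implements its pool; ToyInternPool and its GetOrAdd method are names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

namespace StringExplorer {
   // Toy model only: the real intern pool lives inside the runtime.
   internal class ToyInternPool {
      private readonly Dictionary<string, string> pool =
         new Dictionary<string, string>();

      // Returns the pooled instance for this value, adding the
      // candidate to the pool if no equal value is there yet.
      internal string GetOrAdd(string candidate) {
         string existing;
         if (pool.TryGetValue(candidate, out existing)) {
            return existing;
         }
         pool.Add(candidate, candidate);
         return candidate;
      }
   }

   internal class Program {
      private static void Main() {
         ToyInternPool pool = new ToyInternPool();

         // Built at run time so that each one is a distinct instance
         string first = new string(new[] { 'o', 'n', 'e' });
         string second = new string(new[] { 'o', 'n', 'e' });

         string a = pool.GetOrAdd(first);
         string b = pool.GetOrAdd(second);

         Console.WriteLine("ReferenceEquals(first, second)={0}",
                           ReferenceEquals(first, second));
         Console.WriteLine("ReferenceEquals(a, b)={0}",
                           ReferenceEquals(a, b));
      }
   }
}
```

The two source strings are separate instances (False), yet both requests to the pool come back with the same instance (True), because the second lookup finds the value already present.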

Also, an expression like "o" + "ne" will be pre-calculated by the compiler and interned as "one", and not as "o" and "ne" separately.
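
To see the difference between a concatenation the compiler can fold and one that only happens at run time, consider this small sketch. The helper Join is a made-up name, and the first operand is deliberately built at run time so that nothing can be pre-calculated.

```csharp
using System;

namespace StringExplorer {
   internal class Program {
      // Hypothetical helper: concatenates two strings at run time
      internal static string Join(string left, string right) {
         return left + right;
      }

      private static void Main() {
         string s1 = "one";

         // Both operands are literals, so the compiler folds this to "one"
         string folded = "o" + "ne";
         Console.WriteLine("ReferenceEquals(s1, folded)={0}",
                           ReferenceEquals(s1, folded));

         // The first operand only exists at run time, so the result is
         // a freshly allocated string that is not interned
         string o = new string('o', 1);
         string joined = Join(o, "ne");
         Console.WriteLine("joined={0}, ReferenceEquals(s1, joined)={1}",
                           joined, ReferenceEquals(s1, joined));
      }
   }
}
```

The folded expression shares the interned instance with s1 (True), while the run-time concatenation has the same value but is a different instance (False).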

7. In what ways can I interact with / manipulate string interning?

Okay, this one is really more of an aside, but since we’re talking about interning anyway…

a. String.IsInterned() can be used to detect whether a string exists in the CLR intern pool. If it is found, the function will return the interned string instance, otherwise it will return null.

Code:

string output = string.IsInterned(input);

b. String.Intern() can be used to retrieve the pooled instance for a given string value. If the value is not in the pool yet, the method adds the string to the pool and then returns a reference to the pooled instance.

Code:

string output = string.Intern(input);

c. String.Copy() can be used (as we saw earlier) to forcibly get a non-interned instance with the same value as an interned one.

Code:

string output = string.Copy(interned);
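
Putting the three together, here is a small sketch. It assumes that nothing else in the process has already interned the value "zqx7", which is why the string is built from characters at run time and not written as a literal. The pragma is there only because String.Copy() is, to my knowledge, flagged as obsolete on newer runtimes.

```csharp
using System;

namespace StringExplorer {
   internal class Program {
      private static void Main() {
         // Built from chars at run time, so this is not a literal.
         // Assumption: no other code has interned this value already.
         string runtime = new string(new[] { 'z', 'q', 'x', '7' });

         // a. Not in the pool yet (under the assumption above)
         Console.WriteLine("IsInterned before: {0}",
                           string.IsInterned(runtime) ?? "(null)");

         // b. Adds the value to the pool if needed, returns the pooled instance
         string pooled = string.Intern(runtime);
         Console.WriteLine("IsInterned after: {0}",
                           string.IsInterned(runtime) ?? "(null)");

         // A second, separate instance with the same value maps to the
         // same pooled instance
         string twin = new string(runtime.ToCharArray());
         Console.WriteLine("ReferenceEquals(pooled, Intern(twin))={0}",
                           ReferenceEquals(pooled, string.Intern(twin)));

         // c. A copy of the pooled instance is a separate instance
#pragma warning disable CS0618 // String.Copy is obsolete on newer runtimes
         string copy = string.Copy(pooled);
#pragma warning restore CS0618
         Console.WriteLine("ReferenceEquals(pooled, copy)={0}",
                           ReferenceEquals(pooled, copy));
      }
   }
}
```

Provided the assumption holds, the first line reports (null) and the second reports zqx7. The last two lines should read True and False respectively, regardless of what was in the pool beforehand.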

8. Are string constants also interned?

Yes. More precisely, the compiler substitutes constant values into their “referencing sites” as true literals (i.e. the value is copied and pasted into every place the constant symbol is used). This is how the runtime will eventually see them, so we observe the same behavior as the literal string interning we witnessed above.
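
A quick sketch to confirm this, reusing the shape of the earlier snippets. The Names class and its One constant are made up for this illustration.

```csharp
using System;

namespace StringExplorer {
   internal static class Names {
      // Hypothetical constant for this illustration
      internal const string One = "one";
   }

   internal class Program {
      private static void Main() {
         string s1 = "one";

         // The compiler substitutes the literal "one" for Names.One here
         string s2 = Names.One;
         Console.WriteLine("s2={0}, ReferenceEquals(s1, s2)={1}",
                           s2, ReferenceEquals(s1, s2));

         // Constant expressions built from constants are folded as well
         const string Prefixed = "number " + Names.One;
         Console.WriteLine("ReferenceEquals(Prefixed, \"number one\")={0}",
                           ReferenceEquals(Prefixed, "number one"));
      }
   }
}
```

Both comparisons should report True, exactly as they would if we had typed the literals in by hand.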

I’m hoping that some red flags are going up in everyone’s heads right now: constant values intended for public consumption should ALWAYS be declared as static readonly and NOT as const. This way callers will always refer to the definition to retrieve the symbol’s value at run time, instead of the value being copied into the calling assembly at compile time. Should the defining type later change a const value, any caller that was compiled against the old definition keeps using the old value until it is recompiled, and would not be aware of the update.
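
To make the two options concrete, here is a minimal sketch. Everything sits in one file so that it runs on its own, but imagine LibrarySettings living in a separate library assembly. The type and both member names are invented for this illustration.

```csharp
using System;

namespace StringExplorer {
   // Imagine this type is defined in a separate library assembly.
   public static class LibrarySettings {
      // The value is copied into every calling assembly at compile time
      public const string ConstVersion = "1.0";

      // The value is read from this type at run time
      public static readonly string ReadonlyVersion = "1.0";
   }

   internal class Program {
      private static void Main() {
         // Compiles as if we had written Console.WriteLine("1.0")
         Console.WriteLine(LibrarySettings.ConstVersion);

         // Compiles to a load of the field from LibrarySettings
         Console.WriteLine(LibrarySettings.ReadonlyVersion);
      }
   }
}
```

As written, both lines print 1.0. If the library later shipped with "2.0" in both members and the caller was not recompiled, the first line would keep printing 1.0 while the second would pick up 2.0.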