1. What is a “System.String”?

At the simplest functional level, it is a data type capable of storing text. Technically, it is a class in the .Net BCL that represents text as a sequence of Unicode characters. More specifically, it represents a sequence of UTF-16 code units (System.Char values). The internal buffer is followed by a null terminator, but the length is stored separately, so the terminator is not significant to .Net and a string may contain embedded nulls.
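To make the point about code units and embedded nulls concrete, here is a minimal sketch. The class name LengthDemo and the helper CodeUnits are my own, purely for illustration.

```csharp
using System;

namespace StringExplorer {
   public static class LengthDemo {
      // Returns the number of UTF-16 code units (System.Char values) in the text.
      public static int CodeUnits( string text ) {
         return text.Length;
      }

      public static void Main() {
         // An embedded null character is ordinary data; it does not end the string.
         Console.WriteLine("Length of \"a\\0b\": {0}", CodeUnits("a\0b"));

         // U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 stores it as a surrogate pair.
         Console.WriteLine("Length of U+1F600: {0}", CodeUnits("\U0001F600"));
      }
   }
}
```

Output:

Length of "a\0b": 3
Length of U+1F600: 2

Note that Length counts code units, not the characters a user would see on screen.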

2. Is there a difference between “System.String” and “string”?

No. string is a C# keyword and System.String is a .Net class, but they refer to exactly the same type and are treated in exactly the same way once the compiler takes over. In other words, there are slight differences between them at design time, but that’s where it ends. They will both compile to exactly the same IL.

Specifically, because it is a keyword, string does not require the “System” namespace qualifier. To use the class name, one has to either write the fully qualified System.String or add a “using System;” directive at the top of the code file and write String. However, these are not particularly painful limitations, so it really boils down to personal preference at the end of the day. Whichever you choose, be consistent. Also try to be consistent with your choice over all similar data types (e.g. if you choose to prefer “string” over “System.String” then also prefer “int” over “System.Int32”). Ultimately, string is simply a shorthand or alias for System.String.
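If you want to convince yourself that the keyword and the class name really are the same type, a short check along these lines should do. The class name AliasDemo and its helper methods are hypothetical, chosen only for this sketch.

```csharp
using System;

namespace StringExplorer {
   public static class AliasDemo {
      // True if the C# keyword and the .Net class name resolve to the same type.
      public static bool StringIsAlias() {
         return typeof(string) == typeof(System.String);
      }

      // The same check for int and System.Int32.
      public static bool IntIsAlias() {
         return typeof(int) == typeof(System.Int32);
      }

      public static void Main() {
         Console.WriteLine("typeof(string) == typeof(System.String): {0}", StringIsAlias());
         Console.WriteLine("typeof(int) == typeof(System.Int32): {0}", IntIsAlias());

         // The two spellings are interchangeable in declarations as well.
         String viaClass = "abc";
         string viaKeyword = viaClass;
         Console.WriteLine("viaKeyword: {0}", viaKeyword);
      }
   }
}
```

Output:

typeof(string) == typeof(System.String): True
typeof(int) == typeof(System.Int32): True
viaKeyword: abc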

My personal preference is for the C# keywords where they are available (e.g. string, char, bool, int, etc.) and as such I will for the remainder of this topic refer to System.String simply as string, except where I specifically want to draw your attention to the .Net class implementation (as opposed to the data type).

3. Is string a value type or reference type?

string is a reference type. This is an almost guaranteed junior-to-intermediate interview question, and with good reason. Looking at the .Net implementation of System.String, we notice two obvious things: firstly, it is declared as a “class” (not a struct); and secondly, it inherits directly from System.Object (not System.ValueType). Let’s test this with a few lines of code…

Code:

namespace StringExplorer {
   internal class Program {
      private static void Main() {
         int i1 = System.DateTime.Now.Second;
         int i2 = i1;
         System.Console.WriteLine("ReferenceEquals(i1, i2): {0}", ReferenceEquals(i1, i2));

         string s1 = new string('.', 5);
         string s2 = s1;
         System.Console.WriteLine("ReferenceEquals(s1, s2): {0}", ReferenceEquals(s1, s2));
      }
   }
}

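On line 6, we expect ReferenceEquals() to return false, because i1 and i2 are of a value type (int). Each argument is boxed into its own object before the comparison, so the two references can never be the same. Since ReferenceEquals() will therefore *always* return false for value types, the True output from line 10 is conclusive evidence that string is NOT a value type.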

Output:

ReferenceEquals(i1, i2): False
ReferenceEquals(s1, s2): True
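The two observations about the declaration of System.String (it is a class, and it derives directly from System.Object) can also be checked through reflection. This is only a sketch; the class name TypeInfoDemo and its helpers are my own invention.

```csharp
using System;

namespace StringExplorer {
   public static class TypeInfoDemo {
      // True if the given type is a value type (a struct or an enum).
      public static bool IsValueType( Type type ) {
         return type.IsValueType;
      }

      // Returns the direct base class of the given type.
      public static Type BaseOf( Type type ) {
         return type.BaseType;
      }

      public static void Main() {
         Console.WriteLine("typeof(string).IsValueType: {0}", IsValueType(typeof(string)));
         Console.WriteLine("typeof(string).IsClass: {0}", typeof(string).IsClass);
         Console.WriteLine("typeof(string).BaseType: {0}", BaseOf(typeof(string)));

         // For comparison, int is a struct that derives from System.ValueType.
         Console.WriteLine("typeof(int).IsValueType: {0}", IsValueType(typeof(int)));
         Console.WriteLine("typeof(int).BaseType: {0}", BaseOf(typeof(int)));
      }
   }
}
```

Output:

typeof(string).IsValueType: False
typeof(string).IsClass: True
typeof(string).BaseType: System.Object
typeof(int).IsValueType: True
typeof(int).BaseType: System.ValueType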

4. Then why does string behave like a value type?

First, allow me to demonstrate the source of the confusion: in the following code snippet, I create a class (read: reference type) called “MyString”. This class can be assigned a string value (implicit cast operator or “set” accessor) which is stored internally and can be retrieved easily (ToString() override). Additionally, I’ve added one operator (+) that can be used to modify the value of a MyString instance. ** Make a mental note here, and take a close look at the implementation, it’s important.

Code:

using System;

namespace StringExplorer {
   internal class Program {
      private class MyString {
         public string Value { get; set; }

         public override string ToString() {
            return Value;
         }

         public static implicit operator MyString( string value ) {
            return new MyString { Value = value };
         }

         public static MyString operator +( MyString a, MyString b ) {
            if( a == null || b == null ) {
               throw new ArgumentNullException();
            }
            a.Value += b.Value;
            return a;
         }
      }

      private static void Main() {
         MyString m1 = "one";
         MyString m2 = m1;
         m2 += "two";
         Console.WriteLine("m1={0}, m2={1}", m1, m2);

         string s1 = "one";
         string s2 = s1;
         s2 += "two";
         Console.WriteLine("s1={0}, s2={1}", s1, s2);
      }
   }
}

When we now perform the modification on our “m2” instance (which, because this is a reference type, is really also our “m1” instance), the + operator changes the state of that one object in place, and both references to the instance naturally show the modified value when we output them on line 29. This is as expected and is perfectly normal behavior for a reference type.

However, when we attempt the same pattern on a string, we are presented with a result that looks like we made a value copy of s1 into s2 on line 32.

Output:

m1=onetwo, m2=onetwo
s1=one, s2=onetwo

Though, if we look back at question 3, we are reminded that this is not the case. After the assignment to s2, both references do indeed point to the same instance (as we clearly saw in that example). So why the odd output then?

And thus, to our original question…

Strings in .Net are immutable, meaning that their internal values cannot be changed (by any “normal” means, anyway). Remember the mental note a short while back? Our “MyString” class specifically allowed changing the internal value (or state) of a particular instance. Strings, on the other hand, do not allow this.

A clue to this can be found in the public definition of System.String (just type "" and hit . in the IDE). The class follows functional semantics, meaning that the methods that operate on the value of the string (Insert, Remove, Replace, etc.) never modify the instance they are called on. Instead, they return the result as a string, typically a new instance of String.
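A small sketch of those functional semantics follows. The class name FunctionalDemo and the helper ReplaceTwo are hypothetical; the Replace, Insert and Remove calls are the standard System.String methods.

```csharp
using System;

namespace StringExplorer {
   public static class FunctionalDemo {
      // Returns the text with "two" replaced by "three"; the argument itself is left as it was.
      public static string ReplaceTwo( string text ) {
         return text.Replace("two", "three");
      }

      public static void Main() {
         string original = "one two";

         // Each call hands back its result as a string; none of them touches 'original'.
         string replaced = ReplaceTwo(original);
         string inserted = original.Insert(0, "zero ");
         string removed = original.Remove(3);

         Console.WriteLine("original={0}", original);
         Console.WriteLine("replaced={0}", replaced);
         Console.WriteLine("inserted={0}", inserted);
         Console.WriteLine("removed={0}", removed);
      }
   }
}
```

Output:

original=one two
replaced=one three
inserted=zero one two
removed=one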

Couple this with the immutability of a string (the fact that the internal value cannot be changed once it has been assigned), and the behavior we saw earlier starts to make perfect sense. Simply put, the “+=” operation on line 33 creates a new string instance which is then assigned to s2, and the value of s1 remains unchanged.
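We can watch the reference change by comparing the two variables before and after the “+=”. This is a minimal sketch in the spirit of the question 3 test; the class name ConcatDemo and the helper SameInstanceAfterAppend are made up for illustration.

```csharp
using System;

namespace StringExplorer {
   public static class ConcatDemo {
      // Copies the reference, appends to the copy, and reports whether
      // both variables still refer to the same instance afterwards.
      public static bool SameInstanceAfterAppend( string s1 ) {
         string s2 = s1;
         s2 += "two";
         return object.ReferenceEquals(s1, s2);
      }

      public static void Main() {
         string s1 = new string('.', 5);
         string s2 = s1;
         Console.WriteLine("Before +=: {0}", object.ReferenceEquals(s1, s2));

         // += builds a new string and stores its reference in s2; s1 is untouched.
         s2 += "two";
         Console.WriteLine("After +=: {0}", object.ReferenceEquals(s1, s2));
         Console.WriteLine("s1={0}, s2={1}", s1, s2);
      }
   }
}
```

Output:

Before +=: True
After +=: False
s1=....., s2=.....two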

We can easily adapt our MyString class to behave in the same way:

Code:

private class MyString {
   private readonly string _Value;

   private MyString( string value ) {
      _Value = value;
   }

   public string Value {
      get { return _Value; }
   }

   public override string ToString() {
      return Value;
   }

   public static implicit operator MyString( string value ) {
      return new MyString(value);
   }

   public static MyString operator +( MyString a, MyString b ) {
      if( a == null || b == null ) {
         throw new ArgumentNullException();
      }
      return new MyString(a.Value + b.Value);
   }
}

Notice that our internal state is now read-only (line 6, counting from the top of the full program once this class replaces the original), and thus cannot be changed after the constructor exits. We are therefore forced in line 28 to create and return a new instance (whereas earlier we modified instance “a” and returned that). Running the code now gives us exactly the same behavior we observed with string.

Output:

m1=one, m2=onetwo
s1=one, s2=onetwo

In Part II, we’ll continue our exploration of some “basic questions about System.String”.