Introduction
A common question from C# developers is why structs (the representation of .NET value types in the C# language) do not support inheritance, whereas classes (.NET reference types) do.
I see three reasons:
- Design choice
- Conflict with array covariance
- Memory management
As C# is the main language of the .NET platform, I’ll use the term struct rather than value type throughout this article.
#1 Design
The first reason is that structs have been designed to be lightweight. One of their primary uses is to represent simple data structures like points or colours, which are essentially a set of primitive values.
Preventing inheritance has conceptual implications, because it guides usage. It also has technical ones: with no inheritance, there is no need for a vtable storing pointers to virtual methods that could be overridden further down the type hierarchy.
Similarly, struct instances carry no object header. They have none of the per-instance metadata that reference-type instances have, such as type information and the sync block used to take a lock on the object in multi-threaded programs.
So in memory, a struct instance is merely the storage of its fields, laid out in sequence.
Of course, an operation may need these “missing” parts, such as calling a virtual method inherited from System.Object like ToString when the struct does not override it. In that case, a box is used instead: a reference-type object wrapping a copy of the instance.
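The layout argument has a rough C++ analogy. It is C++ rather than .NET, so it is only an illustration, and the exact sizes are implementation-specific assumptions for a mainstream compiler. A type without virtual methods is just its fields, while a virtual method typically makes the compiler add a hidden vtable pointer to every instance. Point and VirtualPoint are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// "Struct-like" type: no virtual methods, so an instance is just its fields
struct Point
{
    int x;
    int y;
};

// "Class-like" type: the virtual destructor makes it polymorphic, so
// mainstream compilers store a hidden vtable pointer in every instance
struct VirtualPoint
{
    int x;
    int y;
    virtual ~VirtualPoint() = default;
};

// Point is plain data that can be copied byte by byte; VirtualPoint is not
static_assert(std::is_standard_layout<Point>::value, "Point is standard layout");
static_assert(std::is_trivially_copyable<Point>::value, "Point is trivially copyable");
static_assert(!std::is_standard_layout<VirtualPoint>::value, "virtual methods break standard layout");

// Per-instance bytes beyond the two int fields
constexpr std::size_t overhead_of_point = sizeof(Point) - 2 * sizeof(int);
constexpr std::size_t overhead_of_virtual_point = sizeof(VirtualPoint) - 2 * sizeof(int);
```

On a typical 64-bit compiler, overhead_of_virtual_point should be 8 bytes (the vtable pointer), while Point carries no overhead at all.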
#2 Array covariance
The second reason is a technical trade-off: you cannot have both struct inheritance and array covariance.
In a type-system context, covariance means the following: if a type inherits from another, the same relation also holds between types constructed from them, such as arrays of them.
For example, if B inherits from A, then references to B are also references to A. They can be used wherever references to A are expected, which enables dynamic polymorphism.
In C#, arrays of references are also covariant. If B inherits from A and can be substituted for A, then an array B[] of references to B instances can be used where an array A[] of references to A instances is expected.
This is possible because all references have the same memory representation.
Here is an illustration:
```csharp
using static System.Console;
using System.Linq;

long F(A[] a) => a.Sum(x => x.n);

// A B[] is passed where an A[] is expected: array covariance
B[] b = { new B { n = 1 }, new B { n = 2 }, new B { n = 3 } };
WriteLine(F(b));

class A { public int n; }
class B : A { }
```
Compilation and run:
```
>csc Test.cs
Microsoft (R) Visual C# Compiler version 4.4.0-3.22518.13 (7856a68c)
Copyright (C) Microsoft Corporation. All rights reserved.

>Test.exe
6
```
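The C# program above relies on every reference having the same size. In C++, the closest analogue of an array of .NET references is an array of pointers. The sketch below is an analogy, not how the CLR implements references. It uses a hypothetical extra field so that B is bigger than A. Every array element is a pointer of the same size, whatever the size of the object it points to.

```cpp
#include <cassert>
#include <cstddef>

struct A { int n; };
struct B : A { int extra[4]; };  // hypothetical extra field: B is bigger than A

// Counterpart of F in the C# example: items[i] is always sizeof(A*) bytes
// after items[i - 1], whatever the size of the object it points to
long sum(A* const* items, std::size_t count)
{
    long total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += items[i]->n;
    return total;
}

// Stores pointers to B objects in an array of pointers to A and sums them
long sum_of_bs()
{
    B b1{}, b2{}, b3{};
    b1.n = 1;
    b2.n = 2;
    b3.n = 3;
    A* items[] = { &b1, &b2, &b3 };  // each B* converts to an A*
    return sum(items, 3);
}
```

sum_of_bs() adds up 1, 2 and 3 through A pointers, even though each pointee is a larger B.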
What could go wrong?
So what could go wrong if we also had struct inheritance?
We could end up with an array of instances of a derived struct B used where an array of instances of a base struct A is expected. Instances of B usually occupy more memory than instances of A, since B adds fields. As a result, the address computed to reach the ith element of the array would be wrong.
Here is an illustration in C++:
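```cpp
#include <iostream>

struct A { int na; };
struct B : A { int nb; };

// f expects a contiguous array of A: elements sizeof(A) bytes apart
void f(A as[])
{
    std::cout << as[0].na << "|" << as[1].na << "|" << as[2].na << std::endl;
}

int main()
{
    B bs[] = { B{ A{1}, 2 }, B{ A{3}, 4 }, B{ A{5}, 6 } };
    // bs decays to B*, which implicitly converts to A*: this compiles,
    // but indexing past as[0] is undefined behaviour (the bug shown here)
    f(bs);
    return 0;
}
```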
Note: in C++, struct does not have the same semantics at all as in C#. Compared with class, it only changes the default visibility of members (public) and the default inheritance mode (public).
Compilation:
```
>cl /std:c++latest /EHsc test.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.34.31933 for x64
Copyright (C) Microsoft Corporation. All rights reserved.

/std:c++latest is provided as a preview of language features from the latest C++ working draft, and we're eager to hear about bugs and suggestions for improvements. However, note that these features are provided as-is without support, and subject to changes or removal as the working draft evolves. See https://go.microsoft.com/fwlink/?linkid=2045807 for details.
test.cpp
Microsoft (R) Incremental Linker Version 14.34.31933.0
Copyright (C) Microsoft Corporation. All rights reserved.

/out:test.exe
test.obj
```
Run:
```
>test.exe
1|2|3
```
So we get 1|2|3 instead of 1|3|5. What happened?
The issue is that instances of A and B do not have the same memory representation. B instances are twice as big as A instances, so the pointer arithmetic needed to reach the ith element differs between an array of A and an array of B.
Here is a representation of the array bs:
```
+---------+---------+---------+
|    0    |    1    |    2    |
+---------+---------+---------+
|    B    |    B    |    B    |
+----+----+----+----+----+----+
|  1 |  2 |  3 |  4 |  5 |  6 |
+----+----+----+----+----+----+
```
It contains 3 instances of B stored side by side at indices 0, 1 and 2.
But the function f interprets it as:
```
+----+----+----+----+----+----+
|  0 |  1 |  2 |  3 |  4 |  5 |
+----+----+----+----+----+----+
|  A |  A |  A |  A |  A |  A |
+----+----+----+----+----+----+
|  1 |  2 |  3 |  4 |  5 |  6 |
+----+----+----+----+----+----+
```
It mechanically moves from one item in the array to the next by incrementing the address by the size of an instance of A, not by the size of an instance of B.
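In other words, f computes the address of as[i] as the start of the array plus i × sizeof(A), whereas the ith B actually starts at i × sizeof(B). The sketch below simulates this with a plain int array holding the six values of bs, which avoids the undefined behaviour of the real program. It assumes no padding, so that sizeof(A) is one int and sizeof(B) is two ints, as on mainstream compilers. read_na and first_three_na are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct A { int na; };
struct B : A { int nb; };

// Size of one array element, counted in ints (assumes no padding)
constexpr std::size_t stride_of_a = sizeof(A) / sizeof(int);
constexpr std::size_t stride_of_b = sizeof(B) / sizeof(int);

// The six ints of bs as they lie in memory: na, nb, na, nb, na, nb
const int memory[] = { 1, 2, 3, 4, 5, 6 };

// na is the first field, so element i's na sits at offset i * stride
int read_na(std::size_t i, std::size_t stride)
{
    return memory[i * stride];
}

// The three values f prints when it walks the array with a given stride
std::vector<int> first_three_na(std::size_t stride)
{
    return { read_na(0, stride), read_na(1, stride), read_na(2, stride) };
}
```

first_three_na(stride_of_a) yields 1, 2, 3, which is what f printed. first_three_na(stride_of_b) yields 1, 3, 5, the na fields actually stored.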
The solution is to copy the bs array into another array with the expected memory layout:
```cpp
#include <algorithm>
#include <iostream>
#include <iterator>

struct A { int na; };
struct B : A { int nb; };

void f(A as[])
{
    std::cout << as[0].na << "|" << as[1].na << "|" << as[2].na << std::endl;
}

int main()
{
    B bs[] = { B{ A{1}, 2 }, B{ A{3}, 4 }, B{ A{5}, 6 } };
    A as[3];
    // Each B is assigned to an A, which copies only its A part (slicing)
    std::copy(std::begin(bs), std::end(bs), std::begin(as));
    f(as);
    return 0;
}
```
During the copy process, each instance of B is mapped (sliced, see #3) to an instance of A, and we get this array:
```
+----+----+----+
|  0 |  1 |  2 |
+----+----+----+
|  A |  A |  A |
+----+----+----+
|  1 |  3 |  5 |
+----+----+----+
```
Same compilation.
Run:
```
>test.exe
1|3|5
```
As expected, we get the values of the na fields of the B instances, which have been copied into the new A instances.
But wait!
Structs do have inheritance after all: they inherit from System.ValueType, which itself inherits from System.Object, like all .NET types.
So if we cannot get around this restriction further down the inheritance hierarchy, perhaps we can further up:
```csharp
A[] a = { new A() };

object[] o = a;

struct A { }
```
Compilation:
```
>csc Test.cs
Microsoft (R) Visual C# Compiler version 4.4.0-3.22518.13 (7856a68c)
Copyright (C) Microsoft Corporation. All rights reserved.

Test.cs(3,14): error CS0029: Cannot implicitly convert type 'A[]' to 'object[]'
```
So all is safe: where structs do have inheritance (from System.ValueType and System.Object), arrays of them are not covariant.
The reason is that the items of an object[] are references pointing to the objects. That memory representation has nothing in common with A instances stored directly side by side in the array.
As in C++, the solution is to copy the array to another array with the correct memory layout:
```csharp
A[] a = { new A() };

// Array.Copy boxes each A and stores a reference to the box in o
object[] o = new object[a.Length];
System.Array.Copy(a, o, a.Length);

struct A { }
```
During the copy from array a to array o, the A instances are boxed and references to these boxes are stored in o. As statically declared, o is an array of references to objects.
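So in conclusion:
- if we don’t have covariance, inheritance is safe, because a B[] cannot be substituted where an A[] is expected: this is the C++ choice (with the caveat, shown by the first C++ example, that an array of B decays to a B* pointer, which implicitly converts to an A* pointer),
- if we don’t have inheritance, covariance is safe, because an A[] cannot reference a B[] since no B can exist: this is the C# choice.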
#3 Memory management
Finally, memory management, an often-cited reason that is debatable, as we’ll see. What would happen if an instance of a derived struct were copied into a variable holding an instance of a base struct?
Here is an illustration:
```csharp
// Hypothetical: this does not compile, as C# structs cannot inherit
A a = new B();

struct A { public int NA; }
struct B : A { public int NB; }
```
A naive implementation would copy the whole B instance into the memory of a. That could break memory safety by corrupting neighbouring memory, because B instances are larger than A instances.
But there exists a well-known solution, implemented by C++: slicing.
The idea is to copy only the part of the B instance that is inherited from A (the part that makes it an A).
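Slicing itself can be observed without any memory trickery. In this minimal sketch, slice is a hypothetical helper name. Initialising an A from a B calls A’s implicit copy constructor, which binds to the A part of the B and copies only that.

```cpp
#include <cassert>
#include <cstdint>

struct A { std::int64_t na; };
struct B : A { std::int64_t nb; };

// Hypothetical helper: A's implicit copy constructor takes a const A&,
// which binds to the A part of b, so only na is copied and nb is sliced off
A slice(const B& b)
{
    A a = b;
    return a;
}
```

The resulting A has the size of an A, and its na equals the na of the original B.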
Here is an illustration in C++, along with a demonstration of what happens if we dodge it:
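```cpp
#include <cstdint>
#include <iostream>

struct A { int64_t na; };
struct B : A { int64_t nb; };

int main()
{
    B b { A{1}, 1 };
    A a1 = {2};
    A a2 = {3};

    // Default C++ behaviour: only the A part of b is copied (slicing)
    a1 = b;
    std::cout << "With slicing: " << a1.na << "|" << a2.na << std::endl;

    // Deliberately bypass slicing to show what would happen without it:
    // 16 bytes (a whole B) are written into the 8 bytes of a1
    *(B*)&a1 = b;
    std::cout << "Without slicing: " << a1.na << "|" << a2.na << std::endl;

    return 0;
}
```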
The first assignment, a1 = b, uses the default C++ behaviour, which relies on slicing.
The second one, *(B*)&a1 = b, bypasses slicing by artificially creating a B lvalue, reproducing what would happen without slicing:
- &a1 gets an A* pointer to a1,
- (B*) casts it to a B* pointer, breaking type safety along the way,
- and finally * dereferences it to present a B instance.
Note: I’ve used int64_t because the code was compiled and run on a 64-bit machine. There, the compiler aligns struct instances on 8-byte boundaries for performance reasons. With 4-byte int fields, the overflowing bytes would land in unused padding, which would hide the issue.
Compilation:
```
>cl /std:c++latest /EHsc test.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.34.31933 for x64
Copyright (C) Microsoft Corporation. All rights reserved.

/std:c++latest is provided as a preview of language features from the latest C++ working draft, and we're eager to hear about bugs and suggestions for improvements. However, note that these features are provided as-is without support, and subject to changes or removal as the working draft evolves. See https://go.microsoft.com/fwlink/?linkid=2045807 for details.
test.cpp
test.cpp(19) : warning C4789: buffer 'a1' of size 8 bytes will be overrun; 16 bytes will be written starting at offset 0
Microsoft (R) Incremental Linker Version 14.34.31933.0
Copyright (C) Microsoft Corporation. All rights reserved.

/out:test.exe
test.obj
```
Note that the compiler has spotted the issue with the second assignment.
Run:
```
>test.exe
With slicing: 1|3
Without slicing: 1|1
```
So the first assignment, with slicing, is perfectly memory-safe: only a1 has been affected.
But the second assignment, without slicing, also affects a2, a sign that memory has been corrupted.
Let’s explain what happened in detail.
Here is a representation of the memory before the first assignment:
```
+-----------------+--------+--------+
|        b        |   a1   |   a2   |
+--------+--------+--------+--------+
|   na   |   nb   |   na   |   na   |
+--------+--------+--------+--------+
|   1    |   1    |   2    |   3    |
+--------+--------+--------+--------+
```
The memory slots of the 3 instances are contiguous.
Then after the first assignment:
```
+-----------------+--------+--------+
|        b        |   a1   |   a2   |
+--------+--------+--------+--------+
|   na   |   nb   |   na   |   na   |
+--------+--------+--------+--------+
|   1    |   1    |   1    |   3    |
+--------+--------+--------+--------+
```
Thanks to slicing, only b.na has been copied into a1, setting a1.na and nothing else.
But without slicing, the entire b is copied into a1, overflowing onto a2, which is stored right after it.
So after the second assignment, we get:
```
+-----------------+--------+--------+
|        b        |   a1   |   a2   |
+--------+--------+--------+--------+
|   na   |   nb   |   na   |   na   |
+--------+--------+--------+--------+
|   1    |   1    |   1    |   1    |
+--------+--------+--------+--------+
```
As expected, not only a1.na but also a2.na was overwritten.
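The same overflow can be reproduced without undefined behaviour by modelling the stack frame as a single buffer. This sketch assumes the layout in the diagrams above, with b, a1 and a2 in contiguous 64-bit slots. That is what the compiler happened to choose here, not something C++ guarantees. Frame, make_frame and assign_b_to_a1 are hypothetical names, and b.nb is set to 9 instead of 1 so that the overwrite is easy to spot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct A { std::int64_t na; };
struct B : A { std::int64_t nb; };

// Hypothetical model of the frame in the diagrams: slots 0 and 1 hold b,
// slot 2 holds a1 and slot 3 holds a2, contiguously
struct Frame
{
    std::int64_t slots[4];
};

Frame make_frame()
{
    // b = { na: 1, nb: 9 }, a1 = { na: 2 }, a2 = { na: 3 }
    return Frame{ { 1, 9, 2, 3 } };
}

// Copies the first `size` bytes of b into a1's slot, like an assignment
// that writes `size` bytes starting at a1's address
void assign_b_to_a1(Frame& frame, std::size_t size)
{
    std::memcpy(&frame.slots[2], &frame.slots[0], size);
}
```

Copying sizeof(A) bytes only changes a1’s slot. Copying sizeof(B) bytes also overwrites a2’s slot with b.nb, mirroring the corruption seen in the run above.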
So while this is a real technical issue, it has a well-known solution that C++ has implemented for decades. But most C# programmers, myself included, are not fluent in C++, so the solution is largely unknown among them.
Slicing could probably have been implemented since the inception of .NET and C#, but it would have added complexity to the C# compiler and/or the .NET CLR.
Conclusion
So we have presented three good reasons for .NET/C# not to support value type (struct) inheritance.
If you know of any other reasons, please share them.