Friday, April 10, 2009

Generic Arithmetic for .NET

To me, one of the major design flaws and sources of frustration in .NET’s generics system is the absence of suppport for some sort of common arithmetic interface for numeric types (float, double, int, etc.).

But why should anyone care about such things? Well, the problem becomes obvious as soon as you try to implement—say—a generic vector class:

(Note: I'll only show the relavant parts of the code here to avoid clutter and confusion)

   1:  // a vector in R^n
   2:  class Vector<T>
   3:  {
   4:      T[] components;
   5:    
   6:      // ctors, properties etc. ...
   7:  
   8:      // operator defintion, here: vector addition
   9:      public staticVector<T> operator +(Vector<T> lhs,
  10:                                        Vector<T> rhs)
  11:      {
  12:          Vector<T> res = new Vector<T>(lhs.Length);
  13:  
  14:          for (int i = 0; i < res.Length; ++i)
  15:          {
  16:                       // compiler error!
  17:              res[i] = lhs[i] + rhs[i];
  18:          }
  19:  
  20:          return res;
  21:      }
  22:  }

The reason for the compiler error is, that there’s simply not enough information available to the compiler to check, whether T supports op_Addition. Now, if the CLI designers had included some interface definition like

   1:  interface INumeric<T>
   2:  {
   3:      T Add(T rhs);
   4:      T Sub(T rhs);
   5:      T Mul(T rhs);
   6:      T Div(T rhs);
   7:      // etc.

... (or similar) and if they had implemented this interface for every numeric primitive type, we could simply use an interface constraint to solve our problem:

   1:  // a vector in R^n
   2:  class Vector<T> where T: INumeric<T>
   3:  {
   4:      T[] components;
   5:      
   6:      // ctors, properties ...
   7:      
   8:      public static Vector<T> operator +(Vector<T> lhs, Vector<T> rhs)
   9:      {
  10:          Vector<T> res = new Vector<T>(lhs.Length);
  11:      
  12:          for (int i = 0; i < res.Length; ++i)
  13:          {
  14:              res[i] = lhs[i].Add(rhs[i]);       // OK, T implements INumeric<T>
  15:          }
  16:          
  17:          return res;
  18:      }
  19:  }

(Note: Operators are always statically defined in .NET, hence we can only use normal prefix method calls as static methods can never be virtual, thus neither abstract nor part of an interface defintion.)

Unfortunately, such an interface does not exist. Several people already tried to circumvent this limitation. One of the first approaches uses special numerical interfaces that implement arithmetic operations for concrete types. This technique works reasonably well, besides having the disadvantage of forcing the user to explicitly provide an implementation of such an interfaces (or at least a type name of a type implementing the according interface).

More recently, a new solution came up, which uses runtime code generation by building and compiling expression trees on the fly (or using LCG for older .NET version). This indeed is a very elegant solution to the problem. Using an implementation of this concept like the one found in the MiscUtil library you can simply write something like

   1:  T sum = Operator<T>.Add(lhs, rhs);

... inside your generic math algorithms.

The appropriate "Add" method for type T is generated on-the-fly upon the first call and cached for later reuse. The drawback of this method is the need for delegate invocations, which tend to be a lot more expensive than interface calls (at least my first benchmarks indicate that, especially on x64; curiously, the effect is much less severe on x86).

Interestingly, the latest F# CTP also provides generic vector and matrix classes (namespace Microsoft.FSharp.Math in assembly FSharp.PowerPack.dll). You can instantiate e.g. matrices like that:

   1:  Vector<float> v = VectorModule.Generic.create<float>(3, 3, 0f);

This returns a 3×3 matrix of floats initialized to 0.0f. So how did they do that? As far as I understand, their approach is very similar to the first method I described (using interfaces to define "calculator" types). To avoid having the user explicitly specify the operators’ implementation they use some kind of “type registry”, mapping each known numeric type to an appropriate implementation of an interface that defines basic arithmetic operations.

I’ve tried to do a quick n’ dirty implementation of this idea in C# which resulted in something like this (I haven’t yet checked if this actually runs...):

   1:  namespace GenericMath
   2:  {
   3:      public interface INumeric<T>
   4:      {
   5:          T Add(T lhs, T rhs);
   6:          T Sub(T lhs, T rhs);
   7:          T Mul(T lhs, T rhs);
   8:          T Div(T lhs, T rhs);
   9:          T Abs(T x);
  10:          T Zero { get; }
  11:          T One { get; }
  12:      }
  13:   
  14:      class FloatNumerics: INumeric<float>
  15:      {
  16:          public float Add(float lhs, float rhs)
  17:          {
  18:              return lhs + rhs;
  19:          }
  20:   
  21:          public float Sub(float lhs, float rhs)
  22:          {
  23:              return lhs - rhs;
  24:          }
  25:   
  26:          public float Mul(float lhs, float rhs)
  27:          {
  28:              return lhs * rhs;
  29:          }
  30:   
  31:          public float Div(float lhs, float rhs)
  32:          {
  33:              return lhs / rhs;
  34:          }
  35:   
  36:          public float Abs(float x)
  37:          {
  38:              return Math.Abs(x);
  39:          }
  40:   
  41:          public float Zero
  42:          {
  43:              get { return 0f; }
  44:          }
  45:   
  46:          public float One
  47:          {
  48:              get { return 1f; }
  49:          }
  50:      }
  51:   
  52:      // implementations for other types ...
  53:   
  54:      public static class ArithmeticAssociations
  55:      {
  56:          static readonly IDictionary<Type, object> operationsDict = new Dictionary<Type, object>
  57:          {
  58:              { typeof(float), new FloatNumerics() },
  59:              { typeof(double), new DoubleNumerics() },
  60:              { typeof(int), new Int32Numerics() }
  61:          };
  62:   
  63:          static string numericIfaceName = typeof(INumeric<>).FullName;
  64:          static string numericIfaceNameShort = typeof(INumeric<>).Name;
  65:   
  66:          public static bool UnregisterType<T>()
  67:          {
  68:              return operationsDict.Remove(typeof(T));
  69:          }
  70:   
  71:          public static void RegisterType<T>(INumeric<T> arithmetics)
  72:          {
  73:              operationsDict[typeof(T)] = arithmetics;
  74:          }
  75:   
  76:          public static INumeric<T> TryGetNumericOps<T>()
  77:          {
  78:              Type t = typeof(T);
  79:              
  80:              if (!operationsDict.ContainsKey(t))
  81:              {
  82:                  throw new ArgumentException("No implementation of " 
  83:                      + numericIfaceNameShort + "<" + t.Name + "> registered.");
  84:              }
  85:   
  86:              Type opsType = operationsDict[t].GetType();
  87:              
  88:              if (opsType.GetInterface(numericIfaceName) == null)
  89:              {
  90:                  throw new ArgumentException("Arithmetic object associated with " 
  91:                      + t.Name + " does not implement " + numericIfaceNameShort);
  92:              }
  93:   
  94:              return (INumeric<T>)operationsDict[t];
  95:          }
  96:      }
  97:   
  98:      // a sample class that uses INumeric + ArithmeticAssociations
  99:      public sealed class Vector<T>
 100:      {
 101:          static INumeric<T> ops = ArithmeticAssociations.TryGetNumericOps<T>();
 102:   
 103:          T[] components;
 104:   
 105:          public T this[int i]
 106:          {
 107:              get { return components[i]; }
 108:              set { components[i] = value; }
 109:          }
 110:   
 111:          public int Length { get { return components.Length; } }
 112:   
 113:          public Vector(int size)
 114:          {
 115:              components = new T[size];
 116:          }
 117:   
 118:          public static T operator *(Vector<T> lhs, Vector<T> rhs)
 119:          {
 120:              T res = ops.Zero;
 121:   
 122:              for (int i = 0; i < lhs.Length; ++i)
 123:              {
 124:                  res = ops.Add(res, ops.Mul(lhs[i], rhs[i]));
 125:              }
 126:   
 127:              return res;
 128:          }
 129:   
 130:          public static Vector<T> operator *(T lambda, Vector<T> vec)
 131:          {
 132:              Vector<T> result = new Vector<T>(vec.Length);
 133:   
 134:              for (int i = 0; i < result.Length; ++i)
 135:              {
 136:                  result[i] = ops.Mul(vec[i], lambda);
 137:              }
 138:   
 139:              return result;
 140:          }
 141:   
 142:          // other properties, methods and operators ...
 143:      }
 144:   
 145:      // ...
 146:  }

Objects of the type Vector can now be created by this simple line:

   1:  Vector<float> v = new Vector<float>(3);

As you can see, this is totally transparent to the user (no need to specifiy a "Calculator" type parameter) and doesn’t have the potential performance problems of the delegate/Func method.

Maybe someone finds this useful and can even improve the idea. I also plan to do a performance comparision of the different methods soon.


Update 2009-04-11:
I found another nice blog entry that covers the same topic and also presents a solution similar to the one showed above.

No comments:

Post a Comment