
Monday, May 19, 2014

Binary STL I/O using NativeInterop.Stream

There are essentially three common ways to read and write structured binary data of some user-defined type T (read: "files of records") from/to files in C# and .NET in general:
  1. Use System.IO.BinaryReader/BinaryWriter and their ReadXXX/Write methods to read/write individual fields of the data type T in question.
  2. Use one of the System.Runtime.InteropServices.Marshal.PtrToStructure/StructureToPtr methods to convert between (pinned) byte arrays (which can be written to and read from a System.IO.Stream) and non-generic value types or "formatted class types" via .NET's COM marshalling infrastructure.
  3. Use a little bit of unsafe code and cast a byte* pointing at the first entry of a pinned byte buffer to a T* so structs of that type T can be written/read by simply dereferencing a pointer.
Generally, the whole task becomes a lot easier when T is an unmanaged type (required for option #3 to work properly). Unmanaged types are either primitive types or value types (struct in C#) composed only of unmanaged types. See the recursion in the definition? What that essentially boils down to is that an unmanaged type may be composed of arbitrarily nested structs, as long as no reference type is involved at any level.
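
To make option #2 more concrete, here is a minimal sketch of a round trip through a pinned byte buffer using the Marshal class. The SampleRecord type and the MarshalIo helper names are made up for this illustration and are not part of NativeInterop; error handling is kept to the bare minimum.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical record type, used only for this illustration.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct SampleRecord
{
    public int Id;
    public float Value;
}

// Hypothetical helper showing option #2 (Marshal-based conversion).
public static class MarshalIo
{
    // Marshals the struct into a pinned byte buffer, then writes the buffer.
    public static void WriteStruct<T>(Stream stream, T value) where T : struct
    {
        var buffer = new byte[Marshal.SizeOf(typeof(T))];
        var handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            Marshal.StructureToPtr(value, handle.AddrOfPinnedObject(), false);
        }
        finally
        {
            handle.Free();
        }
        stream.Write(buffer, 0, buffer.Length);
    }

    // Fills a byte buffer from the stream, then marshals it back into a struct.
    public static T ReadStruct<T>(Stream stream) where T : struct
    {
        var buffer = new byte[Marshal.SizeOf(typeof(T))];
        int read = 0;
        while (read < buffer.Length)
        {
            int n = stream.Read(buffer, read, buffer.Length - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
        var handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            return (T)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(T));
        }
        finally
        {
            handle.Free();
        }
    }
}
```

Every call goes through the marshaller and boxes the struct, which is the overhead that option #3 avoids.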

Sadly, C# cannot constrain type parameters to unmanaged types and, as a consequence, does not support dereferencing generic pointer types. (In C# there is only the struct constraint, but that is not sufficient as a struct might contain fields of reference type.) What that means is that option #3 from above only works for concrete types.

Because F# can constrain a type parameter to be an unmanaged type, I used F# to build the NativeInterop library, which at its core exposes the possibility to (unsafely!) dereference generic pointers in C#. This in turn enabled the implementation of some generic extension methods (NativeInterop ≥ v1.4.0) for System.IO.Stream that provide both easy* to use and efficient** methods for reading and writing structured binary files.

A word of warning: The F# unmanaged constraint does not surface in the NativeInterop API if used from C# (and probably VB.NET). Thus the user of NativeInterop has to make sure that their data types are truly unmanaged! If that is not the case, NativeInterop may produce arbitrary garbage!
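
One way to guard against passing the wrong kind of type is a runtime check before any raw read or write. The sketch below assumes a runtime that provides RuntimeHelpers.IsReferenceOrContainsReferences<T>() (.NET Core 2.0 or later; it is not available on the .NET Framework). UnmanagedGuard, PlainPoint and NamedPoint are hypothetical names for illustration and are not part of NativeInterop.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical guard, not part of NativeInterop.
public static class UnmanagedGuard
{
    // Throws if T is a reference type or contains a reference-type field
    // at any nesting level.
    public static void EnsureUnmanaged<T>()
    {
        if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
        {
            throw new ArgumentException(
                typeof(T).FullName + " contains references and must not be read or written as raw bytes.");
        }
    }
}

// Hypothetical example types.
public struct PlainPoint { public float X; public float Y; }      // unmanaged
public struct NamedPoint { public string Name; public float X; }  // contains a reference
```

Calling UnmanagedGuard.EnsureUnmanaged<PlainPoint>() returns normally, whereas UnmanagedGuard.EnsureUnmanaged<NamedPoint>() throws because of the string field.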

Example: Writing and Reading Binary STL files

Let's look at a simple example of how to use the aforementioned library methods to handle a simple structured binary file format: STL files essentially contain a description of a triangle mesh (triangulated surface). The exact format of the binary version (there is also an ASCII variant) is described on Wikipedia:
  • An 80 byte ASCII header, which may contain some descriptive string
  • The number of triangles in the mesh as an unsigned 32 bit integer
  • For each triangle there is one record of the following format:
    • A normal vector made up of three single precision floating point numbers
    • Three vertices, each made up of three single precision floating point numbers
    • A field called "Attribute byte count" (16 bit unsigned integer); this should usually be zero, but some software may interpret this as color information.
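
The layout above fixes the size of a binary STL file once the triangle count is known. The following sketch spells out that arithmetic; StlLayout and its members are hypothetical names chosen for this illustration.

```csharp
using System;

// Hypothetical helper that encodes the sizes from the format description.
public static class StlLayout
{
    public const int HeaderSize = 80;        // ASCII header
    public const int TriangleCountSize = 4;  // unsigned 32 bit integer

    // Normal + three vertices (4 vectors * 3 floats) + attribute byte count.
    public const int TriangleRecordSize = 4 * 3 * sizeof(float) + sizeof(ushort);

    // Total size in bytes of a binary STL file with the given triangle count.
    public static long ExpectedFileSize(uint triangleCount)
    {
        return HeaderSize + TriangleCountSize + (long)triangleCount * TriangleRecordSize;
    }
}
```

For a mesh with four triangles this gives 80 + 4 + 4 * 50 = 284 bytes, which is a handy sanity check when reading files of unknown origin.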

Modeling STL Records in C#

In this example, we will only explicitly model the triangle records. The header information is so simple that it's easier to just directly write out an 80-byte ASCII string.

To represent the triangle information, we use the following user-defined unmanaged type(s):

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct STLVector
{
    public readonly float X;
    public readonly float Y;
    public readonly float Z;
 
    public STLVector(float x, float y, float z) {
        this.X = x;
        this.Y = y;
        this.Z = z;
    }
}
 
[StructLayout(LayoutKind.Sequential, Pack = 1)]    
struct STLTriangle
{
    // 4 * 3 * 4 byte + 2 byte = 50 byte
    public readonly STLVector Normal;
    public readonly STLVector A;
    public readonly STLVector B;
    public readonly STLVector C;
    public readonly ushort AttributeByteCount;
 
    public STLTriangle(
        STLVector normalVec, 
        STLVector vertex1, 
        STLVector vertex2, 
        STLVector vertex3, 
        ushort attr = 0) 
    {
        Normal = normalVec;
        A = vertex1;
        B = vertex2;
        C = vertex3;
        AttributeByteCount = attr;            
    }
}
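
The Pack = 1 setting matters here: without it, the trailing 16-bit field is padded so that the record no longer matches the 50 bytes on disk. The sketch below compares the marshalled sizes of two otherwise identical structs. PackedTriangle, PaddedTriangle and LayoutCheck are hypothetical names, and the vectors are flattened into plain floats only to keep the example short.

```csharp
using System;
using System.Runtime.InteropServices;

// Same field sequence as STLTriangle, with explicit byte packing.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct PackedTriangle
{
    public float Nx, Ny, Nz;
    public float Ax, Ay, Az;
    public float Bx, By, Bz;
    public float Cx, Cy, Cz;
    public ushort AttributeByteCount;
}

// Identical fields, but with the default packing.
[StructLayout(LayoutKind.Sequential)]
public struct PaddedTriangle
{
    public float Nx, Ny, Nz;
    public float Ax, Ay, Az;
    public float Bx, By, Bz;
    public float Cx, Cy, Cz;
    public ushort AttributeByteCount;
}

public static class LayoutCheck
{
    // 50: exactly the on-disk record size.
    public static int PackedSize() { return Marshal.SizeOf(typeof(PackedTriangle)); }

    // 52: rounded up to a multiple of the 4-byte float alignment.
    public static int PaddedSize() { return Marshal.SizeOf(typeof(PaddedTriangle)); }
}
```

A check like this at program start is a cheap way to catch an accidentally removed StructLayout attribute before it corrupts a file.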

Defining a Test Geometry

For testing purposes, we can now define a super-simple triangle mesh, a single tetrahedron:

// tetrahedron, vertex order: right hand rule
var mesh = new STLTriangle[] {
    new STLTriangle(new STLVector(0, 0, 0),
                    new STLVector(0, 0, 0),
                    new STLVector(0, 1, 0),
                    new STLVector(1, 0, 0)),
    new STLTriangle(new STLVector(0, 0, 0),
                    new STLVector(0, 0, 0),
                    new STLVector(0, 0, 1),
                    new STLVector(0, 1, 0)),
    // face in the y = 0 plane: counter-clockwise when seen from outside (-y)
    new STLTriangle(new STLVector(0, 0, 0),
                    new STLVector(0, 0, 0),
                    new STLVector(1, 0, 0),
                    new STLVector(0, 0, 1)),
    new STLTriangle(new STLVector(0, 0, 0),
                    new STLVector(0, 1, 0),
                    new STLVector(0, 0, 1),
                    new STLVector(1, 0, 0)),
};

We leave the normals at zero: most software will derive the correct surface normals automatically if the vertices of each face are specified in an order that follows the right-hand rule (i.e., vertices enumerated counter-clockwise when looking at the face from the outside).
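
For software that does expect explicit normals, they can be derived from the vertex order in the same way. The following is a minimal sketch of that computation via the cross product; Vec3 and FaceNormal are hypothetical stand-ins so that the example is self-contained, not types from the post or from NativeInterop.

```csharp
using System;

// Hypothetical stand-in for STLVector, so the example is self-contained.
public struct Vec3
{
    public readonly float X, Y, Z;
    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }
}

public static class FaceNormal
{
    // Unit normal of the triangle (a, b, c) according to the right-hand rule:
    // normalize((b - a) x (c - a)). Returns the zero vector for degenerate triangles.
    public static Vec3 Compute(Vec3 a, Vec3 b, Vec3 c)
    {
        float ux = b.X - a.X, uy = b.Y - a.Y, uz = b.Z - a.Z;
        float vx = c.X - a.X, vy = c.Y - a.Y, vz = c.Z - a.Z;

        float nx = uy * vz - uz * vy;
        float ny = uz * vx - ux * vz;
        float nz = ux * vy - uy * vx;

        float len = (float)Math.Sqrt(nx * nx + ny * ny + nz * nz);
        if (len == 0f) return new Vec3(0, 0, 0);
        return new Vec3(nx / len, ny / len, nz / len);
    }
}
```

For the first face of the tetrahedron, (0, 0, 0), (0, 1, 0), (1, 0, 0), this yields a normal along the negative z axis, i.e. pointing away from the interior.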

Writing STL

Now it's really straightforward to generate a valid STL file from the available data:

// File.Create truncates an existing file; File.OpenWrite would leave stale bytes at the end
using (var bw = new BinaryWriter(File.Create(filename), Encoding.ASCII)) {
    // Encode the header string as ASCII and put it in an 80-byte buffer
    // (the header string must not be longer than 80 characters)
    var headerString = "Tetrahedron";
    var header = new byte[80];
    Encoding.ASCII.GetBytes(headerString, 0, headerString.Length, header, 0);
    bw.Write(header);
    // write the number of triangles as an unsigned 32 bit integer
    bw.Write((uint)mesh.Length);
    // use extension method from NativeInterop.Stream to write out the mesh data
    bw.BaseStream.WriteUnmanagedStructRange(mesh);
}

And here we have it, our, ahem, "beautiful" tetrahedron rendered in MeshLab:


Note how all the surface normals are sticking outward.

Reading STL

By using the ReadUnmanagedStructRange<T> extension method, reading binary STL data is just as simple:

string header;
STLTriangle[] mesh;
 
using (var br = new BinaryReader(File.OpenRead(filename), Encoding.ASCII)) {
    header = Encoding.ASCII.GetString(br.ReadBytes(80));
    var triCount = br.ReadUInt32();
    mesh = br.BaseStream.ReadUnmanagedStructRange<STLTriangle>((int)triCount);
} 
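
For comparison, this is roughly what the same read looks like with option #1, i.e. plain BinaryReader calls field by field and no library support. It is only a sketch: StlFacet and PlainStlReader are hypothetical names, and the facet is stored as float arrays to keep the code short.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical container for one triangle record.
public sealed class StlFacet
{
    public float[] Normal;            // 3 floats
    public float[] Vertices;          // 9 floats: vertex A, B, C
    public ushort AttributeByteCount;
}

public static class PlainStlReader
{
    // Reads a binary STL stream field by field. The stream is left open.
    public static StlFacet[] Read(Stream stream, out string header)
    {
        using (var br = new BinaryReader(stream, Encoding.ASCII, true))
        {
            header = Encoding.ASCII.GetString(br.ReadBytes(80)).TrimEnd('\0');
            int count = checked((int)br.ReadUInt32());
            var facets = new StlFacet[count];
            for (int i = 0; i < count; i++)
            {
                var facet = new StlFacet { Normal = new float[3], Vertices = new float[9] };
                for (int k = 0; k < 3; k++) facet.Normal[k] = br.ReadSingle();
                for (int k = 0; k < 9; k++) facet.Vertices[k] = br.ReadSingle();
                facet.AttributeByteCount = br.ReadUInt16();
                facets[i] = facet;
            }
            return facets;
        }
    }
}
```

This works without unsafe code or extra dependencies, at the cost of thirteen reader calls per triangle instead of one bulk copy.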

Conclusion

Reading and writing simple structured binary data is easy using NativeInterop.Stream. Get it now from NuGet! For reporting bugs or suggesting new features, please use the BitBucket issue tracker.

*Easy, because the user of the library doesn't have to fiddle with unsafe code.
**Efficient, because under the hood it boils down to a generic version of option #3 with zero marshalling overhead.

Sunday, April 25, 2010

A minimalistic native 64 bit array implementation for .NET (updated code)

If you have ever felt the need to process huge amounts of data with an algorithm implemented on .NET/the CLR, you’ve surely run into the 2^31-item limit of the CLR’s current array implementation, which only supports Int32 array indices (this also affects other collections such as List<T>, as those use standard arrays for storage internally).
You can try to circumvent this limitation by implementing your own array-like data structure, either by emulating contiguous storage via a collection of standard .NET arrays (partition your data into chunks of up to 2^31 items each), or by using native APIs and some evil pointer arithmetic to get maximum performance.
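
The first of these two approaches, chunked storage behind a 64 bit indexer, can be sketched in a few lines of safe C#. ChunkedArray<T> is a hypothetical name, and the chunk size of 2^20 items is an arbitrary choice for this illustration; a real implementation would likely use much larger chunks.

```csharp
using System;

// Hypothetical chunked array: many ordinary arrays behind one long-indexed facade.
public sealed class ChunkedArray<T>
{
    private const int ChunkBits = 20;             // 2^20 items per chunk (arbitrary)
    private const int ChunkSize = 1 << ChunkBits;
    private const int ChunkMask = ChunkSize - 1;

    private readonly T[][] chunks;

    public long Length { get; private set; }

    public ChunkedArray(long length)
    {
        if (length < 0) throw new ArgumentOutOfRangeException("length");
        Length = length;

        long chunkCount = (length + ChunkSize - 1) >> ChunkBits;
        chunks = new T[chunkCount][];
        for (long i = 0; i < chunkCount; i++)
        {
            // The last chunk only holds the remaining items.
            long remaining = length - (i << ChunkBits);
            chunks[i] = new T[Math.Min(remaining, ChunkSize)];
        }
    }

    public T this[long index]
    {
        get { Check(index); return chunks[index >> ChunkBits][index & ChunkMask]; }
        set { Check(index); chunks[index >> ChunkBits][index & ChunkMask] = value; }
    }

    private void Check(long index)
    {
        if (index < 0 || index >= Length) throw new IndexOutOfRangeException();
    }
}
```

Each access costs a shift, a mask and a second array lookup, and the storage is not contiguous, so it cannot be handed to native code as a single block.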
A while ago I tried to implement the latter, native approach in C#, which isn’t a big deal: it only takes a few unsafe blocks for pointer arithmetic and a call to Marshal.AllocHGlobal for allocating memory on the unmanaged heap. However, when I tried to make that custom collection generic, I ran into an unsolvable problem:
public unsafe T this[long index]
{
    get
    {                
        return *((T*)pBase + index);
    }
    set
    {
        *((T*)pBase + index) = value;
    }
}

This code does not compile because there is no way to tell the C# compiler that T shall be constrained to unmanaged types.
Interestingly, F# 2.0 does feature such a constraint! This is what a minimalistic F# implementation of such a native 64 bit array could look like:
namespace NativeTools
 
#nowarn "9"
#nowarn "42"
 
open System
open System.Runtime
open Microsoft.FSharp.NativeInterop
 
module internal PointerArithmetic =
    [<CompiledName("AddIntPtrToIntPtr")>]
    [<Unverifiable>]
    let inline addNativeInt (x: nativeptr<'T>) (n: nativeint) : nativeptr<'T> = 
        (NativePtr.toNativeInt x) + n * (# "sizeof !0" type('T) : nativeint #) |> NativePtr.ofNativeInt
    
    // "reinterpret_cast<IntPtr>(x)"... EVIL!
    [<CompiledName("Int64ToIntPtr")>]
    [<Unverifiable>]
    let inline int64ToNativeint (x: int64) = (# "" x : nativeint #)
 
    [<CompiledName("AddInt64ToIntPtr")>]
    [<Unverifiable>]
    let inline addInt64 (x: nativeptr<'a>) (o: int64) : nativeptr<'a> = addNativeInt x (int64ToNativeint o)
    
[<Sealed>]
type NativeArray64<'T when 'T: unmanaged>(length: int64) =
    let itemSize: int64 = (int64)(InteropServices.Marshal.SizeOf(typeof<'T>))
    let mutable isDisposed = false
    let allocatedBytes = length * itemSize
    let blob = InteropServices.Marshal.AllocHGlobal(nativeint allocatedBytes)
    let pBlobBase: nativeptr<'T> = NativePtr.ofNativeInt blob
    let disposeLock = new Object()
 
    member this.Length = length
    member this.BaseAddress = pBlobBase
    member this.ItemSize = itemSize
    member this.IsDisposed = isDisposed
    member this.AllocatedBytes = allocatedBytes
    
    member private this.Free () =
        lock disposeLock (fun () ->
            if isDisposed
                then ()
                else InteropServices.Marshal.FreeHGlobal blob
                     isDisposed <- true
        )
           
    member this.Item
        with get (idx: int64) =
                        NativePtr.read (PointerArithmetic.addInt64 pBlobBase idx)                    
        and  set (idx: int64) (value: 'T) =
                        NativePtr.write (PointerArithmetic.addInt64 pBlobBase idx) value
        
    member private this.Items = seq {
            for i in 0L .. length - 1L do
                yield this.[i]
        }
 
    override this.Finalize () = this.Free()
    
    interface IDisposable with
        member this.Dispose () =
            GC.SuppressFinalize this
            this.Free()
 
    interface Collections.Generic.IEnumerable<'T> with
        member this.GetEnumerator () : Collections.Generic.IEnumerator<'T> =
            this.Items.GetEnumerator()
        member this.GetEnumerator () : Collections.IEnumerator =
            this.Items.GetEnumerator() :> Collections.IEnumerator

UPDATE 2010-04-25: Removed a few bugs.

You can use this data structure in your C# code like a normal array:
var length = 8L * 1024L * 1024L * 1024L;
// allocate a byte array of 8 GiB
using (var arr = new NativeTools.NativeArray64<byte>(length))
{
    arr[0] = 123;
    arr[length - 1] = 222;
    Console.WriteLine("Allocated " + arr.AllocatedBytes);
}
// auto-disposed ...