Reduce the size of MongoDB documents generated from .Net/C#

Introduction

This is a short article about an issue I recently hit while trying to save some big documents, represented as .Net objects, to MongoDB using the MongoDB .Net driver.

While saving a “relatively” big document I received the following exception:

System.IO.FileFormatException: Size 32325140 is larger than MaxDocumentSize 16777216.
   at MongoDB.Bson.IO.BsonBinaryWriter.BackpatchSize() in c:\projects\mongo-csharp-driver\MongoDB.Bson\IO\BsonBinaryWriter.cs:line 697
   at MongoDB.Bson.IO.BsonBinaryWriter.WriteEndArray() in c:\projects\mongo-csharp-driver\MongoDB.Bson\IO\BsonBinaryWriter.cs:line 294
   at MongoDB.Bson.Serialization.Serializers.EnumerableSerializerBase`1.Serialize(BsonWriter bsonWriter, Type nominalType, Object value, IBsonSerializationOptions options) in c:\projects\mongo-csharp-driver\MongoDB.Bson\Serialization\Serializers\EnumerableSerializerBase.cs:line 408
   at MongoDB.Bson.Serialization.BsonClassMapSerializer.SerializeMember(BsonWriter bsonWriter, Object obj, BsonMemberMap memberMap) in c:\projects\mongo-csharp-driver\MongoDB.Bson\Serialization\Serializers\BsonClassMapSerializer.cs:line 684
   at MongoDB.Bson.Serialization.BsonClassMapSerializer.Serialize(BsonWriter bsonWriter, Type nominalType, Object value, IBsonSerializationOptions options) in c:\projects\mongo-csharp-driver\MongoDB.Bson\Serialization\Serializers\BsonClassMapSerializer.cs:line 432
   at MongoDB.Driver.Internal.MongoInsertMessage.AddDocument(BsonBuffer buffer, Type nominalType, Object document) in c:\projects\mongo-csharp-driver\MongoDB.Driver\Communication\Messages\MongoInsertMessage.cs:line 53
   at MongoDB.Driver.Operations.InsertOperation.Execute(MongoConnection connection) in c:\projects\mongo-csharp-driver\MongoDB.Driver\Operations\InsertOperation.cs:line 97
   at MongoDB.Driver.MongoCollection.InsertBatch(Type nominalType, IEnumerable documents, MongoInsertOptions options) in c:\projects\mongo-csharp-driver\MongoDB.Driver\MongoCollection.cs:line 1149
   at MongoDB.Driver.MongoCollection.Insert(Type nominalType, Object document, MongoInsertOptions options) in c:\projects\mongo-csharp-driver\MongoDB.Driver\MongoCollection.cs:line 1004
   at MongoDB.Driver.MongoCollection.Save(Type nominalType, Object document, MongoInsertOptions options) in c:\projects\mongo-csharp-driver\MongoDB.Driver\MongoCollection.cs:line 1426

Well, the message is clear: I had exceeded MongoDB's maximum document size, which is 16MB (16777216 bytes). Fair enough, this is quite a sane design decision.

First I’ll explain why I had this issue, then how I solved it.

Causes and consequences

At first I was quite surprised, because the same set of objects represented as a CSV document was only a 6MB file.
But thinking again about the data, I remembered that this data-set is mostly a sparse matrix: a lot of properties are null.
With the CSV format, each null property only costs you a semi-colon, which is quite cheap even if you have hundreds of thousands of them.

But with an object-oriented representation like .Net objects or BSON documents this is another story: for each null property the cost is far higher, because you still store the name of the property and the “null” symbol!
And when you have dozens of properties (and yes, I have good reasons to have that many properties in a single object :)) the overhead can be huge and represent most of the total size.

So you end up with documents that look something like:

{
    a: "Some data",
    b: null,
    c: null,
    d: "Some other data",
    e: null,
    f: null,
    g: null,
    ...
    z: "Last data"
}

Much of the document is filled with useless markers that increase its size without adding any information.
And this is not really flattering for BSON: at 32MB against 6MB, my BSON document was more than 5 times bigger than the CSV document!
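To put rough numbers on this, here is a minimal sketch of the per-property cost. It relies on the BSON layout of a null element (one type byte, the field name as UTF-8, one terminating zero byte) and on an empty CSV cell costing only its separator. The class name and the property name are hypothetical, chosen for illustration only.

```csharp
using System;
using System.Text;

static class NullOverhead
{
    // A BSON element holding null is laid out as:
    // 1 type byte (0x0A) + the UTF-8 field name + 1 terminating 0x00 byte.
    public static int BsonNullElementSize(string fieldName)
    {
        return 1 + Encoding.UTF8.GetByteCount(fieldName) + 1;
    }

    // An empty CSV cell only costs its separator.
    public const int CsvEmptyCellSize = 1;

    static void Main()
    {
        // Hypothetical property name, not taken from the real data-set.
        string name = "SettlementCurrency";

        // Prints "BSON: 20 bytes, CSV: 1 byte"
        Console.WriteLine("BSON: {0} bytes, CSV: {1} byte",
            BsonNullElementSize(name), CsvEmptyCellSize);
    }
}
```

With descriptive property names, each null costs many times more in BSON than in CSV, which is how a sparse data-set ends up several times bigger.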

The solution

Fortunately the guys behind the MongoDB .Net driver are aware of this kind of issue and took it into account when designing the driver: you can customize the way the BSON documents are generated.

You have at least 2 solutions:

  • mark properties that should be ignored if null,
  • register a global policy for the whole app-domain.

If you want to mark properties individually you can use the BsonIgnoreIfNull attribute:

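using MongoDB.Bson.Serialization.Attributes;

class Data
{
    // Each marked property is omitted from the BSON document when its value is null.
    [BsonIgnoreIfNull]
    public string A { get; set; }
    [BsonIgnoreIfNull]
    public string B { get; set; }
    [BsonIgnoreIfNull]
    public string C { get; set; }
}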

The good thing is that this is quite explicit.
But it can add a lot of code if, like me, you have dozens of properties to mark.
Moreover it is quite obtrusive, and I don’t like to pollute my business entities with technical attributes, though I do it if there is no simpler solution: again, pragmatism should always prevail over dogmatism, though some dogmatic geeks prefer duplicating code and adding mappings to clearly isolate business entities. (I’m a recovering dogmatic ;))
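If the attributes feel too intrusive, the same per-property setting can also be declared outside the entity with a class map. This is only a sketch of that alternative, not what I used: it assumes the driver's BsonClassMap.RegisterClassMap, AutoMap, GetMemberMap and SetIgnoreIfNull calls, and the DataMapping class name is made up for the example.

```csharp
using MongoDB.Bson.Serialization;

// The business entity stays free of MongoDB attributes.
class Data
{
    public string A { get; set; }
    public string B { get; set; }
    public string C { get; set; }
}

// Hypothetical helper holding the mapping code.
static class DataMapping
{
    // Must run once, before the first Data instance is serialized.
    public static void Register()
    {
        BsonClassMap.RegisterClassMap<Data>(cm =>
        {
            // Map all public properties with the default rules...
            cm.AutoMap();
            // ...then skip the selected ones when they are null.
            cm.GetMemberMap(d => d.A).SetIgnoreIfNull(true);
            cm.GetMemberMap(d => d.B).SetIgnoreIfNull(true);
            cm.GetMemberMap(d => d.C).SetIgnoreIfNull(true);
        });
    }
}
```

It is still one line per property, so it keeps the entity clean but does not reduce the amount of code.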

For my current issue I chose the other way, registering a global policy:

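// Requires: using MongoDB.Bson.Serialization.Conventions;
ConventionPack pack = new ConventionPack();
pack.Add(new IgnoreIfNullConvention(true));

// The filter restricts the convention to the Data type.
ConventionRegistry.Register("Ignore null properties of data", pack, type => type == typeof(Data));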

The last predicate ensures the policy only applies to my “Data” class.

I’ve put this code in the static constructor of the type that is the entry point to the MongoDB database.
So if I have no need for MongoDB the type won’t be loaded by the CLR and this code won’t be executed.
You could also put this code in the Main of your application, but if you have more than one application that uses your MongoDB layer you might need to duplicate code, so prefer a static constructor or any other “Init” method.
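As an illustration of the static constructor approach, here is a minimal sketch. DataStore is a hypothetical name standing for whatever type is your entry point to the database, and the sketch assumes the registration happens before any Data instance is serialized, since conventions are applied when the driver first builds the mapping for a type.

```csharp
using MongoDB.Bson.Serialization.Conventions;

class Data
{
    public string A { get; set; }
    public string B { get; set; }
    public string C { get; set; }
}

// Hypothetical entry point to the MongoDB layer; the name is illustrative.
class DataStore
{
    // Runs once, the first time DataStore is used
    // (an instance is created or a static member is accessed).
    static DataStore()
    {
        var pack = new ConventionPack();
        pack.Add(new IgnoreIfNullConvention(true));

        ConventionRegistry.Register(
            "Ignore null properties of data",
            pack,
            type => type == typeof(Data));
    }

    // Connection handling and the Save/Load methods would go here.
}
```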

Conclusion

After applying this patch I was able to save my documents. To get an idea of how much space was saved, I checked the size of the newly saved document in the Mongo Shell using the Object.bsonsize() method:

> Object.bsonsize(db.data.find()[0])
7161729

Compared to the original BSON document that included all the properties, this is far better: 7MB instead of 32MB, more than 4 times smaller.
Of course there is still an overhead compared to CSV, because you need to store the field names when the values are not null, but it’s limited to “only” 15%.
It’s still a big document, but one that fits into the MongoDB database, and this is all that matters.
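The size can also be estimated from the C# side before saving. The following is a sketch only, assuming the ToBson extension method from the MongoDB.Bson namespace; BsonSizeChecker is a hypothetical helper and the limit is simply the value reported in the exception above. The result may differ slightly from what the shell reports for the stored document.

```csharp
using System;
using MongoDB.Bson;

// Hypothetical helper, not part of the driver.
static class BsonSizeChecker
{
    // Limit reported in the exception message (16MB).
    public const int MaxDocumentSize = 16777216;

    // Serializes the object to BSON in memory and returns the byte count.
    // Note that this allocates the whole document, so use it sparingly.
    public static int SizeOf<T>(T document)
    {
        return document.ToBson().Length;
    }

    // True if the serialized document is within the limit.
    public static bool Fits<T>(T document)
    {
        return SizeOf(document) <= MaxDocumentSize;
    }
}
```

Calling BsonSizeChecker.Fits(data) before saving gives a chance to log or split an oversized document instead of catching the exception.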

Hopefully this article will help somebody with the same issue.
If you catch any typo or mistake, or have additional questions, feel free to leave a comment.

5 thoughts on “Reduce the size of MongoDB documents generated from .Net/C#”

    • You mean for your data class?
      If so this would not be a big surprise as XML serializers have similar limitations.
      Could you please provide a minimal sample demonstrating the issue?

  1. Sadly, one limitation of the C# MongoDB driver is that you cannot set integer values to null. What I do in these cases is save the value as a string, and convert it to an integer server side. It requires slightly more CPU, but you sometimes save a lot of disk space.

    • Thanks for sharing, I was not aware of this limitation.
      But often you can work around this kind of limitation by using a special value that is out of the range of acceptable values for the field.
      e.g. if you have only positive numbers you can use -1, or if you have reasonable values you could use int.MaxValue or int.MinValue.
      The downside is that your data model is less future-proof as these special values could be added to the range of acceptable values later.
