Reduce the size of MongoDB documents generated from .Net/C#

Introduction

This is a small article about an issue I had recently trying to save some big documents represented as .Net objects in MongoDB using the MongoDB .Net driver.

While saving a “relatively” big document I’ve received the following exception:

System.IO.FileFormatException: Size 32325140 is larger than MaxDocumentSize 16777216.
   at MongoDB.Bson.IO.BsonBinaryWriter.BackpatchSize() in c:\projects\mongo-csharp-driver\MongoDB.Bson\IO\BsonBinaryWriter.cs:line 697
   at MongoDB.Bson.IO.BsonBinaryWriter.WriteEndArray() in c:\projects\mongo-csharp-driver\MongoDB.Bson\IO\BsonBinaryWriter.cs:line 294
   at MongoDB.Bson.Serialization.Serializers.EnumerableSerializerBase`1.Serialize(BsonWriter bsonWriter, Type nominalType, Object value, IBsonSerializationOptions options) in c:\projects\mongo-csharp-driver\MongoDB.Bson\Serialization\Serializers\EnumerableSerializerBase.cs:line 408
   at MongoDB.Bson.Serialization.BsonClassMapSerializer.SerializeMember(BsonWriter bsonWriter, Object obj, BsonMemberMap memberMap) in c:\projects\mongo-csharp-driver\MongoDB.Bson\Serialization\Serializers\BsonClassMapSerializer.cs:line 684
   at MongoDB.Bson.Serialization.BsonClassMapSerializer.Serialize(BsonWriter bsonWriter, Type nominalType, Object value, IBsonSerializationOptions options) in c:\projects\mongo-csharp-driver\MongoDB.Bson\Serialization\Serializers\BsonClassMapSerializer.cs:line 432
   at MongoDB.Driver.Internal.MongoInsertMessage.AddDocument(BsonBuffer buffer, Type nominalType, Object document) in c:\projects\mongo-csharp-driver\MongoDB.Driver\Communication\Messages\MongoInsertMessage.cs:line 53
   at MongoDB.Driver.Operations.InsertOperation.Execute(MongoConnection connection) in c:\projects\mongo-csharp-driver\MongoDB.Driver\Operations\InsertOperation.cs:line 97
   at MongoDB.Driver.MongoCollection.InsertBatch(Type nominalType, IEnumerable documents, MongoInsertOptions options) in c:\projects\mongo-csharp-driver\MongoDB.Driver\MongoCollection.cs:line 1149
   at MongoDB.Driver.MongoCollection.Insert(Type nominalType, Object document, MongoInsertOptions options) in c:\projects\mongo-csharp-driver\MongoDB.Driver\MongoCollection.cs:line 1004
   at MongoDB.Driver.MongoCollection.Save(Type nominalType, Object document, MongoInsertOptions options) in c:\projects\mongo-csharp-driver\MongoDB.Driver\MongoCollection.cs:line 1426

Well the message is clear: seems like I’ve exceeded the MongoDB max document size threshold which is 16MB, fair enough this is quite a sane design decision.

First I’ll explain why I had this issue, then how I’ve solved it.

Continue reading

Using an Access, MySQL, PostgreSQL or SQLite database from VBA

Introduction

Note to pedantic guys: yes Access is not a database engine, only the graphical front-end to Jet/ACE, but we’ll stand with this simplification. 🙂

The sample application (Excel + VBA) and the SQL source code are available in this ZIP archive: VBA_Databases_Source.zip.

If you are an advanced VBA developer who builds applications that manage a non trivial amount of data odds are good you are using an Access database to store them.
If this setup perfectly fits your current needs, you completely master it, you’re not experiencing any issue and your needs won’t evolve in the near future you can skip this article and continue enjoying your system. 😉

Indeed, do you really need a new database management system (DBMS)?
Often the only argument in favor of migrating to other DBMS is they are “better”; while it’s true for some technical capabilities, it may not be regarding other “metrics” like simplicity: Access is easy to understand and manage for non IT staff and is often installed with default business workstation setup along with the rest of the Office suite.

So let’s say you have strong reasons to migrate to a new DBMS because you’ve come to the point where you feel you need support for at least one of the following features: interoperability, big storage, high concurrency (hundreds of users) and/or high performance, and Access starts to become a part of the problem.
So what can you do if you want to enhance the quality of your database without making your total cost of ownership (TCO) explode?

Your TCO is essentially made of:

  • licensing costs: limiting them is quite simple: using a free, usually open-source, database and paying only for support
  • management costs: they are by far bigger than the licensing costs and are directly impacted by the complexity of the DBMS; so you need a simple DBMS that you’ll be able to setup and manage yourself as you used to do with Access without the help of system or database administrators
  • development costs: every additional change to your current schema or VBA implementation to fit the new DBMS will have a cost; so we want things to be transparent with zero additional development, which in particular means a standard SQL-based DBMS.

While the equation may seem a little complex it has at least three solutions:

  • SQLite is the ideal candidate if you’re happy with the “single-file” model, you don’t have high concurrency constraints, and your only needs are interoperability (with Mac OS, Linux, Unix…), bigger storage and/or costs savings,
  • MySQL and PostgreSQL: if you need support for high-concurrency, really big storage (e.g. tens of GBs), advanced user-management, performance fine tuning and other advanced features you’ll have to jump out of the single-file world.
    If you don’t have specific requirements then MySQL and PostgreSQL will appear similar to you and equally do the job. However, in this kind of situation, I have a preference for MySQL, not because its inherent capabilities would be superior (as I’ve said MySQL and PostgreSQL are roughly equivalent for simple setups), but because, as the reference open-source DBMS for years, MySQL benefits from a huge community and toolbox. Moreover, while you’ll sure find the tools to work in good conditions with PostgreSQL, if you ever need to justify your choice to your hierarchy you’ll be in a better position if you choose the standard solution instead of the challenger.
    But as I’m not sectarian, and for completeness, I’ll cover both.

In this article I’ll quickly cover the setup of these three DBMS (with links to other resources for more extensive instructions) and illustrate their usage with a small VBA application, a revolutionary todo-list manager, that uses Access too.

Continue reading