The Self-Link Nonsense in Azure DocumentDB

Lately I have been playing with Azure DocumentDB. It's really good: it is more truly a Database-as-a-Service than other hosted/managed NoSQL providers; it supports useful features that not every database offers, such as the distinction between replace and update; and its price seems substantially lower than that of the others (MongoLab, RavenHQ, MongoHQ).

However, I am not very happy with DocumentDB's client-side programming experience. Overall, the data access layer I implemented on top of DocumentDB is probably the most bloated among all the document stores I've used, including MongoDB, RavenDB and RethinkDB. One thing in particular is really annoying: the self-link. Because the self-link is required in various places, the data access layer code for DocumentDB is fairly awkward.

What I mean by "bloated" is that to achieve the same thing (e.g. to implement the DeleteOrderByNumber method in the example below), I need only one line of code on MongoDB but a lot more on DocumentDB:

For MongoDB:

collection.DeleteOneAsync(o => o.OrderNumber == orderNumber).Wait();

For DocumentDB:

Order order = client.CreateDocumentQuery<Order>(collection.DocumentsLink)
    .Where(x => x.OrderNumber == orderNumber)
    .AsEnumerable().FirstOrDefault();
Document doc = client.CreateDocumentQuery<Document>(collection.DocumentsLink)
    .Where(x => x.Id == order.Id).AsEnumerable().FirstOrDefault();
client.DeleteDocumentAsync(doc.SelfLink).Wait();


Let's go through the full example. I have an Order document, in which the Id is a GUID generated by the database during insert and the OrderNumber is a user-friendly string, such as "558-4094307-8688964".

    public class Order
    {
        public string Id; 
        public string OrderNumber; 
        public string ShippingAddress;
    }

The next thing I want to do is implement the data access layer to add, get, update and delete an Order document in a document store database. In particular, I need a Get method and a Delete method that take the OrderNumber as a parameter, because the client will also call the REST API using the order number. So basically I need to implement the methods below (I'm using C#):

    void AddOrder(Order order);
    Order GetOrder(string id);
    Order GetOrderByNumber(string orderNumber);
    void UpdateOrder(Order order);
    void DeleteOrder(string id);
    void DeleteOrderByNumber(string orderNumber);
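
As a baseline for the comparison, the contract of these six methods can be sketched against a plain in-memory dictionary. This is only an illustrative sketch: `InMemoryOrderStore` is a hypothetical class that is not part of any of the four drivers, and the GUID it assigns in `AddOrder` merely mimics a database-generated Id.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Order
{
    public string Id;
    public string OrderNumber;
    public string ShippingAddress;
}

// Hypothetical in-memory store; it only illustrates what each method is expected to do.
public class InMemoryOrderStore
{
    private readonly Dictionary<string, Order> orders = new Dictionary<string, Order>();

    public void AddOrder(Order order)
    {
        // Mimic a database-generated Id and back-fill it onto the caller's object.
        order.Id = Guid.NewGuid().ToString();
        orders[order.Id] = order;
    }

    public Order GetOrder(string id)
    {
        Order order;
        return orders.TryGetValue(id, out order) ? order : null;
    }

    public Order GetOrderByNumber(string orderNumber)
    {
        return orders.Values.FirstOrDefault(o => o.OrderNumber == orderNumber);
    }

    public void UpdateOrder(Order order)
    {
        // Replace the whole stored document, but only if it already exists.
        if (orders.ContainsKey(order.Id)) orders[order.Id] = order;
    }

    public void DeleteOrder(string id)
    {
        orders.Remove(id);
    }

    public void DeleteOrderByNumber(string orderNumber)
    {
        // Look up by the secondary key, then delete by Id.
        Order order = GetOrderByNumber(orderNumber);
        if (order != null) orders.Remove(order.Id);
    }
}
```

Each method here is a single lookup or mutation, which is roughly the level of effort I would hope for from a real driver as well.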

For each method, I compared the implementation on each database. The comparison is mainly focused on the amount of code and how straightforward and intuitive it is to code.

MongoDB RethinkDB RavenDB DocumentDB
AddOrder
GetOrder
GetOrderByNumber
UpdateOrder
DeleteOrder
DeleteOrderByNumber

Here is the code and why I gave these ratings:

1. AddOrder

/* MongoDB */
void AddOrder(Order order)
{
    collection.InsertOneAsync(order).Wait();
}

/* RethinkDB */
void AddOrder(Order order)
{
    order.Id = conn.Run(tblOrders.Insert(order)).GeneratedKeys[0];
}

/* RavenDB */
void AddOrder(Order order)
{
    using (IDocumentSession session = store.OpenSession())
    {
        session.Store(order);
        session.SaveChanges();
    }
}

/* DocumentDB */
void AddOrder(Order order)
{
    Document doc = client
        .CreateDocumentAsync(collection.SelfLink, order).Result;
    order.Id = doc.Id;
}

Not too bad. Extra credit to MongoDB and RavenDB: their client libraries automatically back-fill the database-generated Id value into my original document object. On DocumentDB and RethinkDB, I need to write my own code to do the back-fill. Note: I'm using the latest official client drivers for MongoDB, RavenDB and DocumentDB. RethinkDB has no official .NET driver, so I am using a community-supported one.

2. GetOrder

/* MongoDB */
Order GetOrder(string id)
{
    return collection.Find(o => o.Id == id).FirstOrDefaultAsync().Result;
}

/* RethinkDB */
Order GetOrder(string id)
{
    return conn.Run(tblOrders.Get(id));
}

/* RavenDB */
Order GetOrder(string id)
{
    using (IDocumentSession session = store.OpenSession())
    {
        return session.Load<Order>(id);
    }
}

/* DocumentDB */
Order GetOrder(string id)
{
    return client.CreateDocumentQuery<Order>(collection.DocumentsLink)
        .Where(x => x.Id == id).AsEnumerable().FirstOrDefault();
}

One up for RethinkDB and RavenDB. When I use Id to query, I shouldn't need to write a search condition like x => x.Id == id.

3. GetOrderByNumber

/* MongoDB */
Order GetOrderByNumber(string orderNumber)
{
    return collection.Find(o => o.OrderNumber == orderNumber)
        .FirstOrDefaultAsync().Result;
}

/* RethinkDB */
Order GetOrderByNumber(string orderNumber)
{
    return conn.Run(tblOrders.Filter(o => o.OrderNumber == orderNumber))
            .FirstOrDefault();
}

/* RavenDB */
Order GetOrderByNumber(string orderNumber)
{
    using (IDocumentSession session = store.OpenSession())
    {
        return session.Query<Order>()
            .Where(x => x.OrderNumber == orderNumber).FirstOrDefault();
    }
}

/* DocumentDB */
Order GetOrderByNumber(string orderNumber)
{
    return client.CreateDocumentQuery<Order>(collection.DocumentsLink)
        .Where(x => x.OrderNumber == orderNumber)
        .AsEnumerable().FirstOrDefault();
}

4. UpdateOrder

/* MongoDB */
void UpdateOrder(Order order)
{
    collection.ReplaceOneAsync(o => o.Id == order.Id, order).Wait();
}

/* RethinkDB */
void UpdateOrder(Order order)
{
    conn.Run(tblOrders.Get(order.Id.ToString()).Replace(order));
}

/* RavenDB */
void UpdateOrder(Order order)
{
    using (IDocumentSession session = store.OpenSession())
    {
        session.Store(order);
        session.SaveChanges();
    }
}

/* DocumentDB */
void UpdateOrder(Order order)
{
    Document doc = client.CreateDocumentQuery<Document>(collection.DocumentsLink)
        .Where(x => x.Id == order.Id).AsEnumerable().FirstOrDefault();
    client.ReplaceDocumentAsync(doc.SelfLink, order).Wait();
}

DocumentDB needs an extra step! I have to do a separate query by Id first, to get back a Document object, then use the SelfLink value on the Document object to call ReplaceDocumentAsync. I don't understand why the syntax has to be like that.

5. DeleteOrder

/* MongoDB */
void DeleteOrder(string id)
{
    collection.DeleteOneAsync(o => o.Id == id).Wait();
}

/* RethinkDB */
void DeleteOrder(string id)
{
    conn.Run(tblOrders.Get(id).Delete());
}

/* RavenDB */
void DeleteOrder(string id)
{
    using (IDocumentSession session = store.OpenSession())
    {
        session.Delete(id);
        session.SaveChanges();
    }
}

/* DocumentDB */
void DeleteOrder(string id)
{
    Document doc = client.CreateDocumentQuery<Document>(collection.DocumentsLink)
        .Where(x => x.Id == id).AsEnumerable().FirstOrDefault();
    client.DeleteDocumentAsync(doc.SelfLink).Wait();
}

As with GetOrder, RethinkDB and RavenDB earn an extra point for not needing the search condition x => x.Id == id when the Id is used.

6. DeleteOrderByNumber

/* MongoDB */
void DeleteOrderByNumber(string orderNumber)
{
    collection.DeleteOneAsync(o => o.OrderNumber == orderNumber).Wait();
}

/* RethinkDB */
void DeleteOrderByNumber(string orderNumber)
{
    conn.Run(tblOrders.Filter(o => o.OrderNumber == orderNumber).Delete());
}

/* RavenDB */
void DeleteOrderByNumber(string orderNumber)
{
    using (IDocumentSession session = store.OpenSession())
    {
        var order = session.Query<Order>()
            .Where(x => x.OrderNumber == orderNumber).FirstOrDefault();
        session.Delete(order);
        session.SaveChanges();
    }
}

/* DocumentDB */
void DeleteOrderByNumber(string orderNumber)
{
    Order order = client.CreateDocumentQuery<Order>(collection.DocumentsLink)
        .Where(x => x.OrderNumber == orderNumber)
        .AsEnumerable().FirstOrDefault();
    Document doc = client.CreateDocumentQuery<Document>(collection.DocumentsLink)
        .Where(x => x.Id == order.Id).AsEnumerable().FirstOrDefault();
    client.DeleteDocumentAsync(doc.SelfLink).Wait();
}

MongoDB and RethinkDB are the best for DeleteOrderByNumber. They both only need 1 call. RavenDB needs 2 calls: it first needs to query by OrderNumber, then do the Delete (which presumably will use the Id). DocumentDB is the worst as I need to do 3 calls! Before I can call DeleteDocumentAsync, I first need to do a query by OrderNumber to get the Id, then use Id to query again to get the self-link of this Order document! DocumentDB's client driver seems to only have one method for delete: DeleteDocumentAsync, which only takes a SelfLink string.

I don't understand why there isn't an overload of DeleteDocumentAsync that takes an Id. It doesn't seem to be just me: there are 300 votes on feedback.azure.com asking for support for deleting a document by id.

Summary

Overall, implementing the data access layer on DocumentDB is a somewhat inferior experience compared with the other three. I hope the DocumentDB team can improve it in the near future.



Foot Note 1:

I was advised that if my Order object extends the Microsoft.Azure.Documents.Resource type, it will already have the SelfLink property on it and I will not need the extra query step in UpdateOrder and DeleteOrder.

It works, but it is not acceptable to me. Having the Order object extend Resource would pollute my domain model. Usually we want our domain model objects to be free of dependencies, so that they interoperate well across different layers and stacks.

Strictly speaking, the Order object isn't 100% pure on MongoDB either: I needed to put a [BsonId] attribute on the Id property. But an attribute is much better than additional member fields introduced by extending a type from a specific database's client driver. For example, one major difference is that in JSON serialization, attributes won't show up but member fields will.
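
The difference can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: `FakeResource` is a hypothetical stand-in for a driver-supplied base type (it is not the real Microsoft.Azure.Documents.Resource), the serializer is System.Text.Json rather than whatever a given driver uses, and the classes use properties instead of fields because System.Text.Json ignores public fields by default.

```csharp
using System;
using System.Text.Json;

// Hypothetical stand-in for a base type shipped with a database driver.
public class FakeResource
{
    public string SelfLink { get; set; }
}

// An attribute is metadata on the type; it adds no member to the object.
[Serializable]
public class PlainOrder
{
    public string Id { get; set; }
    public string OrderNumber { get; set; }
}

// Inheriting from the driver type adds its members to the domain object.
public class DerivedOrder : FakeResource
{
    public string Id { get; set; }
    public string OrderNumber { get; set; }
}

public static class Demo
{
    public static void Main()
    {
        var plain = new PlainOrder { Id = "1", OrderNumber = "558-4094307-8688964" };
        var derived = new DerivedOrder
        {
            Id = "1",
            OrderNumber = "558-4094307-8688964",
            SelfLink = "example-link"
        };

        // The attribute leaves no trace in the JSON output.
        Console.WriteLine(JsonSerializer.Serialize(plain));

        // The inherited SelfLink member is serialized along with the order data.
        Console.WriteLine(JsonSerializer.Serialize(derived));
    }
}
```

Any consumer of the second JSON payload now sees a storage-specific member, which is exactly the kind of leakage I want to keep out of the domain model.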

Foot Note 2:

The Order object was defined slightly differently on each database. For completeness, here are the exact definitions:

    /* MongoDB */
    [Serializable]
    public class Order
    {
        [BsonId(IdGenerator = typeof(StringObjectIdGenerator))]
        public string Id;
        public string OrderNumber;
        public string ShippingAddress;
    }

    /* RethinkDB */
    [DataContract]
    public class Order
    {
        [DataMember(Name = "id", EmitDefaultValue = false)]
        public string Id;
        [DataMember]
        public string OrderNumber;
        [DataMember]
        public string ShippingAddress;
    }

    /* RavenDB */
    public class Order
    {
        public string Id = string.Empty;
        public string OrderNumber;
        public string ShippingAddress;
    }

    /* DocumentDB */
    public class Order
    {
        [JsonProperty(PropertyName = "id")]
        public string Id;
        public string OrderNumber;
        public string ShippingAddress;
    }
