Lately I have been playing with Azure DocumentDB. It's really good: it is more truly a Database-as-a-Service than other hosted or managed NoSQL providers; it supports useful features that not every store offers, such as the distinction between replace and update; and its price seems substantially lower than the others' (MongoLab, RavenHQ, MongoHQ).
However, I am not very happy with DocumentDB's client-side programming experience. Overall, the data access layer I had to write on top of DocumentDB is probably the most bloated among all the document stores I've used, including MongoDB, RavenDB and RethinkDB. One thing in particular is really annoying: the self-link. Because a self-link is required in various places, the data access layer code for DocumentDB is fairly awkward.
What I mean by "bloated" is that to achieve the same thing (e.g. to implement the DeleteOrderByNumber method in the example below), I need only one line of code on MongoDB but a lot more on DocumentDB:
For MongoDB:
collection.DeleteOneAsync(o => o.OrderNumber == orderNumber).Wait();
For DocumentDB:
Order order = client.CreateDocumentQuery<Order>(collection.DocumentsLink)
    .Where(x => x.OrderNumber == orderNumber)
    .AsEnumerable().FirstOrDefault();
Document doc = client.CreateDocumentQuery(collection.DocumentsLink)
    .Where(x => x.Id == order.Id).AsEnumerable().FirstOrDefault();
client.DeleteDocumentAsync(doc.SelfLink).Wait();
Let's go through the full example. I have an Order document, in which the Id is a GUID generated by the database during insert and the OrderNumber is a user-friendly string, such as "558-4094307-8688964".
public class Order
{
    public string Id;
    public string OrderNumber;
    public string ShippingAddress;
}
The next thing I want to do is implement the data access layer to add, get, update and delete an Order document in a document store database. In particular, I need a Get method and a Delete method that take the OrderNumber as a parameter, because the client will also need to call the REST API using the order number. So basically I need to implement the methods below (I'm using C#):
void AddOrder(Order order);
Order GetOrder(string id);
Order GetOrderByNumber(string orderNumber);
void UpdateOrder(Order order);
void DeleteOrder(string id);
void DeleteOrderByNumber(string orderNumber);
For each method, I compared the implementation on each database. The comparison is mainly focused on the amount of code and how straightforward and intuitive it is to code.
| | MongoDB | RethinkDB | RavenDB | DocumentDB |
|---|---|---|---|---|
| AddOrder | | | | |
| GetOrder | | | | |
| GetOrderByNumber | | | | |
| UpdateOrder | | | | |
| DeleteOrder | | | | |
| DeleteOrderByNumber | | | | |
Here is the code and why I gave these ratings:
1. AddOrder
/* MongoDB */
void AddOrder(Order order)
{
    collection.InsertOneAsync(order).Wait();
}
/* RethinkDB */
void AddOrder(Order order)
{
    order.Id = conn.Run(tblOrders.Insert(order)).GeneratedKeys[0];
}
/* RavenDB */
void AddOrder(Order order)
{
    using (IDocumentSession session = store.OpenSession())
    {
        session.Store(order);
        session.SaveChanges();
    }
}
/* DocumentDB */
void AddOrder(Order order)
{
    Document doc = client
        .CreateDocumentAsync(collection.SelfLink, order).Result;
    order.Id = doc.Id;
}
Not too bad. Extra credit to MongoDB and RavenDB: their client libraries automatically back-fill the database-generated value of Id into my original document object. On DocumentDB and RethinkDB, I need to write my own code to do the back-fill. Note: I'm using the latest official client driver for MongoDB, RavenDB and DocumentDB. RethinkDB has no official .NET driver, so I am using a community-supported one.
2. GetOrder
/* MongoDB */
Order GetOrder(string id)
{
    return collection.Find(o => o.Id == id).FirstOrDefaultAsync().Result;
}
/* RethinkDB */
Order GetOrder(string id)
{
    return conn.Run(tblOrders.Get(id));
}
/* RavenDB */
Order GetOrder(string id)
{
    using (IDocumentSession session = store.OpenSession())
    {
        return session.Load<Order>(id);
    }
}
/* DocumentDB */
Order GetOrder(string id)
{
    return client.CreateDocumentQuery<Order>(collection.DocumentsLink)
        .Where(x => x.Id == id).AsEnumerable().FirstOrDefault();
}
One up for RethinkDB and RavenDB. When I use Id to query, I shouldn't need to write a search condition like x => x.Id == id.
3. GetOrderByNumber
/* MongoDB */
Order GetOrderByNumber(string orderNumber)
{
    return collection.Find(o => o.OrderNumber == orderNumber)
        .FirstOrDefaultAsync().Result;
}
/* RethinkDB */
Order GetOrderByNumber(string orderNumber)
{
    return conn.Run(tblOrders.Filter(o => o.OrderNumber == orderNumber))
        .FirstOrDefault();
}
/* RavenDB */
Order GetOrderByNumber(string orderNumber)
{
    using (IDocumentSession session = store.OpenSession())
    {
        return session.Query<Order>()
            .Where(x => x.OrderNumber == orderNumber).FirstOrDefault();
    }
}
/* DocumentDB */
Order GetOrderByNumber(string orderNumber)
{
    return client.CreateDocumentQuery<Order>(collection.DocumentsLink)
        .Where(x => x.OrderNumber == orderNumber)
        .AsEnumerable().FirstOrDefault();
}
4. UpdateOrder
/* MongoDB */
void UpdateOrder(Order order)
{
    collection.ReplaceOneAsync(o => o.Id == order.Id, order).Wait();
}
/* RethinkDB */
void UpdateOrder(Order order)
{
    conn.Run(tblOrders.Get(order.Id.ToString()).Replace(order));
}
/* RavenDB */
void UpdateOrder(Order order)
{
    using (IDocumentSession session = store.OpenSession())
    {
        session.Store(order);
        session.SaveChanges();
    }
}
/* DocumentDB */
void UpdateOrder(Order order)
{
    Document doc = client.CreateDocumentQuery(collection.DocumentsLink)
        .Where(x => x.Id == order.Id).AsEnumerable().FirstOrDefault();
    client.ReplaceDocumentAsync(doc.SelfLink, order).Wait();
}
DocumentDB needs an extra step! I have to do a separate query by Id first to get back a Document object, then use the SelfLink value on that Document object to call ReplaceDocumentAsync. I don't understand why the syntax has to be like that.
5. DeleteOrder
/* MongoDB */
void DeleteOrder(string id)
{
    collection.DeleteOneAsync(o => o.Id == id).Wait();
}
/* RethinkDB */
void DeleteOrder(string id)
{
    conn.Run(tblOrders.Get(id).Delete());
}
/* RavenDB */
void DeleteOrder(string id)
{
    using (IDocumentSession session = store.OpenSession())
    {
        session.Delete(id);
        session.SaveChanges();
    }
}
/* DocumentDB */
void DeleteOrder(string id)
{
    Document doc = client.CreateDocumentQuery(collection.DocumentsLink)
        .Where(x => x.Id == id).AsEnumerable().FirstOrDefault();
    client.DeleteDocumentAsync(doc.SelfLink).Wait();
}
Same as in GetOrder, extra point for RethinkDB and RavenDB for not needing a search condition x => x.Id == id when Id is used.
6. DeleteOrderByNumber
/* MongoDB */
void DeleteOrderByNumber(string orderNumber)
{
    collection.DeleteOneAsync(o => o.OrderNumber == orderNumber).Wait();
}
/* RethinkDB */
void DeleteOrderByNumber(string orderNumber)
{
    conn.Run(tblOrders.Filter(o => o.OrderNumber == orderNumber).Delete());
}
/* RavenDB */
void DeleteOrderByNumber(string orderNumber)
{
    using (IDocumentSession session = store.OpenSession())
    {
        var order = session.Query<Order>()
            .Where(x => x.OrderNumber == orderNumber).FirstOrDefault();
        session.Delete(order);
        session.SaveChanges();
    }
}
/* DocumentDB */
void DeleteOrderByNumber(string orderNumber)
{
    Order order = client.CreateDocumentQuery<Order>(collection.DocumentsLink)
        .Where(x => x.OrderNumber == orderNumber)
        .AsEnumerable().FirstOrDefault();
    Document doc = client.CreateDocumentQuery(collection.DocumentsLink)
        .Where(x => x.Id == order.Id).AsEnumerable().FirstOrDefault();
    client.DeleteDocumentAsync(doc.SelfLink).Wait();
}
MongoDB and RethinkDB are the best for DeleteOrderByNumber. They both need only one call. RavenDB needs two calls: it first needs to query by OrderNumber, then do the Delete (which presumably will use the Id). DocumentDB is the worst, as I need three calls! Before I can call DeleteDocumentAsync, I first need to query by OrderNumber to get the Id, then query again by Id to get the self-link of this Order document! DocumentDB's client driver seems to have only one method for delete: DeleteDocumentAsync, which only takes a SelfLink string.
I don't understand why there isn't an overload of DeleteDocumentAsync that takes an Id. It doesn't seem to be just me. There are 300 votes on feedback.azure.com asking for support for deleting a document by id.
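Until such an overload exists, the repeated lookup can at least be hidden in one place. The following is only a minimal sketch of that idea, assuming the same DocumentClient, DocumentCollection and Document types used in the code above. The class and method names (DocumentClientExtensions, DeleteDocumentByIdAsync, ReplaceDocumentByIdAsync) are my own hypothetical names and are not part of the SDK. It does not remove the extra round trip; it only keeps the self-link out of the calling code.

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

// Hypothetical helper class: none of these method names exist in the SDK.
public static class DocumentClientExtensions
{
    // Queries by id and returns the document's self-link, or null if not found.
    private static string FindSelfLink(
        DocumentClient client, DocumentCollection collection, string id)
    {
        Document doc = client.CreateDocumentQuery(collection.DocumentsLink)
            .Where(d => d.Id == id)
            .AsEnumerable()
            .FirstOrDefault();
        return doc == null ? null : doc.SelfLink;
    }

    // Deletes the document with the given id; returns false if it does not exist.
    public static async Task<bool> DeleteDocumentByIdAsync(
        this DocumentClient client, DocumentCollection collection, string id)
    {
        string selfLink = FindSelfLink(client, collection, id);
        if (selfLink == null) return false;
        await client.DeleteDocumentAsync(selfLink);
        return true;
    }

    // Replaces the document with the given id; returns false if it does not exist.
    public static async Task<bool> ReplaceDocumentByIdAsync(
        this DocumentClient client, DocumentCollection collection,
        string id, object document)
    {
        string selfLink = FindSelfLink(client, collection, id);
        if (selfLink == null) return false;
        await client.ReplaceDocumentAsync(selfLink, document);
        return true;
    }
}
```

With a helper like this, DeleteOrder would shrink to a single call such as client.DeleteDocumentByIdAsync(collection, id).Wait(), although DocumentDB still performs a query plus a delete underneath.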
Summary
Overall, implementing the data access layer on DocumentDB is a somewhat inferior experience compared with the other three. I hope the DocumentDB team can improve it in the near future.
Footnote 1:
I was advised that if my Order object is extended from the Microsoft.Azure.Documents.Resource type, it will already have the SelfLink property on it and I will not need the extra step in UpdateOrder and DeleteOrder.
It works, but it is not acceptable to me. Extending the Order object from Resource would pollute my domain model. Usually we want our domain model objects to be free of dependencies, so that they interoperate well across different layers and stacks.
Strictly speaking, though, the Order object isn't 100% pure on MongoDB either: I needed to put a [BsonId] attribute on the Id field. But an attribute is much better than additional members introduced by extending a type from a specific database's client driver. For example, one major difference is that attributes won't show up in JSON serialization but members will.
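To illustrate that last point, here is a small self-contained sketch. It makes several assumptions that are not from the setup above: it uses System.Text.Json rather than any database driver's serializer, it uses properties instead of public fields (System.Text.Json ignores fields by default), and DriverResource is a hypothetical stand-in for a driver base type such as Resource.

```csharp
using System;
using System.Text.Json;

// Hypothetical stand-in for a base type shipped with a database client driver.
public class DriverResource
{
    public string SelfLink { get; set; }
}

// An attribute is metadata on the type; it adds no member to the object.
[Serializable]
public class PlainOrder
{
    public string Id { get; set; }
    public string OrderNumber { get; set; }
}

// Inheriting from the driver type adds its members to every Order.
public class DerivedOrder : DriverResource
{
    public string Id { get; set; }
    public string OrderNumber { get; set; }
}

public static class SerializationDemo
{
    // Serializes an order that is only decorated with an attribute.
    public static string Plain() =>
        JsonSerializer.Serialize(
            new PlainOrder { Id = "1", OrderNumber = "558-4094307-8688964" });

    // Serializes an order that inherits from the driver stand-in.
    public static string Derived() =>
        JsonSerializer.Serialize(
            new DerivedOrder { Id = "1", OrderNumber = "558-4094307-8688964" });
}
```

SerializationDemo.Plain() yields JSON with only Id and OrderNumber, while SerializationDemo.Derived() also carries a SelfLink entry that any consumer of the JSON will see.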
Footnote 2:
The Order object was defined slightly differently on each database. For completeness, here are the exact definitions:
/* MongoDB */
[Serializable]
public class Order
{
    [BsonId(IdGenerator = typeof(StringObjectIdGenerator))]
    public string Id;
    public string OrderNumber;
    public string ShippingAddress;
}
/* RethinkDB */
[DataContract]
public class Order
{
    [DataMember(Name = "id", EmitDefaultValue = false)]
    public string Id;
    [DataMember]
    public string OrderNumber;
    [DataMember]
    public string ShippingAddress;
}
/* RavenDB */
public class Order
{
    public string Id = string.Empty;
    public string OrderNumber;
    public string ShippingAddress;
}
/* DocumentDB */
public class Order
{
    [JsonProperty(PropertyName = "id")]
    public string Id;
    public string OrderNumber;
    public string ShippingAddress;
}