Posted about 1 month ago by avalanche123
Hi, my name is Bulat S. (my last name won't make things any easier, but in case you were wondering, it's Shakirzyanov). I joined OpenSky in August 2009 (it's been almost a year since then, but it feels like ages). My official title in the company is Hacker, which also says a lot about me (for one, that I don't like corporate titles).
The last 6 weeks were truly amazing for me. Not only was I able to learn a new technology, I also managed to contribute back to the community. But let's go over everything step by step.
Building an eCommerce system is not easy, and building a platform is even harder. When it comes to data in eCommerce, there is nothing definite: no real structure you can stick to, and no final requirements. Even something as obvious as the "item you add to cart" can get surprisingly complicated when it comes to data.
Magento, one of the most advanced open source eCommerce solutions available today, is a good example of how to model a database for variable product attributes. It uses EAV (Entity Attribute Value), which solves the problem of variable attributes at the cost of database-level integrity and application performance. The amount of queries you need to perform to select one entity will grow with every attribute data type you introduce; however, it is still a viable solution.
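To make that trade-off concrete, here is a minimal in-memory sketch in plain JavaScript (the language of the mongo shell). The table and column names are hypothetical, modeled loosely on the pseudo-queries later in this post rather than on Magento's actual schema. It shows both costs: one value table, and therefore one more query, per data type, plus a hydration step that stitches typed rows back into a single object.

```javascript
// Hypothetical EAV tables: attribute metadata plus one value table per data type.
const attributes = [
  { id: 1, code: "name", type: "varchar" },
  { id: 2, code: "weight", type: "float" },
  { id: 3, code: "stock", type: "int" },
  { id: 4, code: "color", type: "varchar" },
];

const valueTables = {
  varchar: [
    { product_id: 1, attribute_id: 1, value: "Configurable T-Shirt" },
    { product_id: 1, attribute_id: 4, value: "black" },
  ],
  float: [{ product_id: 1, attribute_id: 2, value: 0.2 }],
  int: [{ product_id: 1, attribute_id: 3, value: 40 }],
};

// One query for the entity, one for attribute metadata, one per distinct value type.
function queriesPerEntity(attrs) {
  const types = new Set(attrs.map((a) => a.type));
  return 2 + types.size;
}

// Hydration: merge the typed value rows back into one product object.
function hydrate(productId) {
  const byId = new Map(attributes.map((a) => [a.id, a]));
  const product = { id: productId };
  for (const rows of Object.values(valueTables)) {
    for (const row of rows) {
      if (row.product_id === productId) {
        product[byId.get(row.attribute_id).code] = row.value;
      }
    }
  }
  return product;
}

// 5: the second varchar attribute ("color") adds no query, but a new data type would.
console.log(queriesPerEntity(attributes));
console.log(hydrate(1));
```

With the five value tables used in the pseudo-queries further down, the same formula gives seven queries per product.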
A document store, on the other hand, lets you save two completely different documents in the same collection. Because of its schema-less structure, it is also possible to add or remove a document's properties after saving it: it's a database that adapts to your data structure on the fly.
At OpenSky, we decided to use MongoDB for storage of products and use relational databases for order-related data since MongoDB doesn't support transactions.
So what is the benefit of using MongoDB over MySQL, or any other RDBMS, for storing variable attribute data? Performance. These are the pseudo-queries we would have to write to select one product, with id 1, and all of its attributes in a typical EAV model:
```sql
SELECT * FROM `product` WHERE id = 1;
SELECT * FROM `product_attributes` WHERE product_id = 1;
SELECT * FROM `product_values_int` WHERE product_id = 1;
SELECT * FROM `product_values_varchar` WHERE product_id = 1;
SELECT * FROM `product_values_datetime` WHERE product_id = 1;
SELECT * FROM `product_values_text` WHERE product_id = 1;
SELECT * FROM `product_values_float` WHERE product_id = 1;
```
After the above queries run, there is a large data hydration step that assembles the rows into the product object, which Magento handles quite well, albeit slowly. Contrast this with what we would do in MongoDB:
```javascript
db.products.findOne({'_id': '1'});
```
Not only is the selection simpler, but it also returns a single JSON-like document, which can easily be hydrated into a native PHP object. And here is how a configurable product could be represented in MongoDB:
```javascript
{
  "_id": ObjectId("4bffd798fdc2120019040000"),
  "name": "Configurable T-Shirt",
  "options": [
    { "name": "small", "price": 12.99 },
    { "name": "medium", "price": 15.99 },
    { "name": "large", "price": 17.99 }
  ]
}
```
There is no need for joins, as product options are a collection of embedded objects. Object references (akin to foreign key relationships in RDBMSs) are also possible, but they are generally only necessary if you need to access the object independently. For instance, if I needed a page to list all product options across all products, I would probably put options into their own collection and reference them from the product document.
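The difference between the two layouts can be sketched with in-memory arrays standing in for collections. The field name optionIds and the string ids are hypothetical; the point is only that embedded options come back with the product in one read, while referenced options need a second lookup, which the application or an ODM performs.

```javascript
// Embedded: options live inside the product document, so one read returns everything.
const embeddedProducts = [
  {
    _id: "p1",
    name: "Configurable T-Shirt",
    options: [
      { name: "small", price: 12.99 },
      { name: "medium", price: 15.99 },
      { name: "large", price: 17.99 },
    ],
  },
];

// Referenced: options get their own collection and the product stores their ids.
const options = [
  { _id: "o1", name: "small", price: 12.99 },
  { _id: "o2", name: "medium", price: 15.99 },
  { _id: "o3", name: "large", price: 17.99 },
];
const referencedProducts = [
  { _id: "p1", name: "Configurable T-Shirt", optionIds: ["o1", "o2", "o3"] },
];

// Resolving references means a second lookup against the options collection.
function loadWithOptions(productId) {
  const product = referencedProducts.find((p) => p._id === productId);
  const wanted = new Set(product.optionIds);
  return { ...product, options: options.filter((o) => wanted.has(o._id)) };
}

// A "list every option" page reads the options collection directly...
const allOptionNames = options.map((o) => o.name);
// ...whereas the embedded layout has to unwind every product document.
const allEmbeddedOptionNames = embeddedProducts.flatMap((p) => p.options.map((o) => o.name));
```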
Of course, there were already plenty of ORM libraries for MongoDB, but they were either hard-to-extract parts of frameworks, not quite ORMs, or implementations of the ActiveRecord pattern (which, after using DataMapper for quite some time, I wouldn't want to go back to). The very same day I started writing an object document mapper (ODM) to use at OpenSky, Jon Wage (a developer on the Doctrine project) released a proof-of-concept MongoDB ODM, which you can find on GitHub. After contacting Jon and giving his library a couple of tries and tests, I decided to use it for OpenSky's products domain layer.
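For readers unfamiliar with the two patterns, here is a rough JavaScript sketch of the distinction; every class name is hypothetical and nothing here is Doctrine code. In ActiveRecord the domain object saves itself; in DataMapper the domain object is a plain class and a separate mapper converts it to and from documents, a role that the DocumentManager's persist() appears to fill in the PHP code later in this post.

```javascript
// ActiveRecord: the domain object knows how to persist itself.
class ActiveRecordProduct {
  constructor(store, name) {
    this.store = store;
    this.name = name;
  }
  save() {
    this.store.push({ name: this.name });
  }
}

// DataMapper: the domain object contains no persistence code...
class Product {
  constructor(name) {
    this.name = name;
    this.options = [];
  }
  addOption(name, price) {
    this.options.push({ name, price });
  }
}

// ...and a separate mapper translates between objects and documents.
class ProductMapper {
  constructor(collection) {
    this.collection = collection;
  }
  toDocument(product) {
    return { name: product.name, options: product.options.map((o) => ({ ...o })) };
  }
  fromDocument(doc) {
    const product = new Product(doc.name);
    for (const o of doc.options) product.addOption(o.name, o.price);
    return product;
  }
  save(product) {
    this.collection.push(this.toDocument(product));
  }
}

// Usage: the same product saved both ways.
const arStore = [];
new ActiveRecordProduct(arStore, "Configurable T-Shirt").save();

const docs = [];
const mapper = new ProductMapper(docs);
const tee = new Product("Configurable T-Shirt");
tee.addOption("small", 12.99);
mapper.save(tee);
const loaded = mapper.fromDocument(docs[0]);
```

The appeal of the mapper approach is that Product can be tested and reused without any database in sight.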
I started submitting patches and unit tests to the project and soon joined the MongoDB ODM core team. Today, we are past the first alpha release of the project, and this is my first post on the Doctrine blog (yay!).
Getting back to our example, this is how the product and embedded option classes for the aforementioned data structure could look:
```php
<?php
// Product.php

/** @Document(collection="products") */
class Product
{
    /** @Id */
    private $id;

    /** @String */
    private $name;

    /** @EmbedMany(targetDocument="Product\Option") */
    private $options = array();

    public function getId()
    {
        return $this->id;
    }

    public function setName($name)
    {
        $this->name = $name;
    }

    public function getName()
    {
        return $this->name;
    }

    public function addOption(Product\Option $option)
    {
        $this->options[] = $option;
    }

    //...
}
```
And the Product\Option class:
```php
<?php
// Product/Option.php
namespace Product;

/** @EmbeddedDocument */
class Option
{
    /** @String */
    private $name;

    /** @Float */
    private $price;

    public function setName($name)
    {
        $this->name = $name;
    }

    public function getName()
    {
        return $this->name;
    }

    public function setPrice($price)
    {
        $this->price = $price;
    }

    public function getPrice()
    {
        return $this->price;
    }

    //...
}
```
Using the DocumentManager instance, we could easily persist the product with:
```php
<?php
$product = new Product();
$product->setName('Configurable T-Shirt');

$small = new Product\Option();
$small->setName('small');
$small->setPrice(12.99);
$product->addOption($small);

$medium = new Product\Option();
$medium->setName('medium');
$medium->setPrice(15.99);
$product->addOption($medium);

$large = new Product\Option();
$large->setName('large');
$large->setPrice(17.99);
$product->addOption($large);

// Schedule the product for persistence, then write it to MongoDB.
$documentManager->persist($product);
$documentManager->flush();
```

The above code stores the product object, embedded options included, as a single document in MongoDB.

MongoDB ODM uses MongoDB's atomic update operators (such as $set) to update data, which makes it really fast. It also supports inheritance (collection-per-class and single-collection inheritance), which is similar to the table inheritance design patterns used with ORMs. Check out the official MongoDB ODM project documentation for more information and examples. Complete instructions on how to set up your DocumentManager instance can be found here.
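To illustrate what updating with atomic operators means, the sketch below diffs an old and a new version of a document and builds a MongoDB update document from the difference: $set for changed fields and $push with $each for elements appended to an array. It is a simplified illustration under those assumptions, not Doctrine's actual change-tracking code; removed fields ($unset) and other array edits are not handled specially.

```javascript
// Structural comparison that is good enough for JSON-like data (key order matters).
function sameValue(a, b) {
  return JSON.stringify(a) === JSON.stringify(b);
}

// Build a MongoDB-style update document from two versions of a document.
// Removed fields are ignored here for brevity; a real mapper would emit $unset.
function buildUpdate(oldDoc, newDoc) {
  const update = {};
  for (const [field, value] of Object.entries(newDoc)) {
    const before = oldDoc[field];
    if (sameValue(before, value)) continue;
    // Only treat an array as "appended to" when the old array is an unchanged prefix.
    const appendedOnly =
      Array.isArray(before) &&
      Array.isArray(value) &&
      value.length > before.length &&
      sameValue(value.slice(0, before.length), before);
    if (appendedOnly) {
      update.$push = update.$push || {};
      update.$push[field] = { $each: value.slice(before.length) };
    } else {
      update.$set = update.$set || {};
      update.$set[field] = value;
    }
  }
  return update;
}

const before = { name: "Configurable T-Shirt", options: [{ name: "small", price: 12.99 }] };
const after = {
  name: "Configurable Tee",
  options: [{ name: "small", price: 12.99 }, { name: "medium", price: 15.99 }],
};
console.log(JSON.stringify(buildUpdate(before, after)));
// {"$set":{"name":"Configurable Tee"},"$push":{"options":{"$each":[{"name":"medium","price":15.99}]}}}
```

Sending only the changed parts, rather than rewriting the whole document, is what keeps such updates small.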
There is much more to talk about in terms of the technologies, techniques and practices we adopt and use at OpenSky, so this post is definitely not the last one.
Comments (17)
Good! Posted by jeremyFreeAgent about 1 month ago.
This is what I do. Thanks for doing it too!
Awesome Posted by Kyle about 1 month ago.
Great to see more posts about MongoDB and e-commerce. Keep them coming!
Thanks! Posted by Peter about 1 month ago.
Thanks for your post, Bulat, and welcome to the Doctrine community. Could you write more about how you use MongoDB and MySQL together? I am in exactly the same situation, but chickened out of using MongoDB because I couldn't think of a good way to relate records in MongoDB to records in MySQL.
transactions Posted by taxilian about 1 month ago.
Out of curiosity, what kinds of things in order processing would require a transaction, as opposed to just putting everything in one document?
RE: Thanks! Posted by Bulat S. about 1 month ago.
Peter,
I will definitely continue sharing my experience with mongodb odm in ecommerce, so stay tuned.
P.S. thanks everyone for reading it, and, most importantly, finding it useful.
RE: transactions Posted by Bulat S. about 1 month ago.
Without going into details: inventory management requires you to update stock status only after a successful order. But doing it without transactions means that a second order could start while the first is still in progress. Most of the time that would be fine, but when there is only one item left, it can be confusing to the customer why they couldn't check out. So the solution is to wrap order processing in a transaction, reduce the stock level at the beginning, and roll it back on failure.
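The reserve-then-roll-back flow described here can be sketched in memory. The SKU, the Map standing in for a stock table and the chargePayment callback are all hypothetical. In single-threaded JavaScript the check and the decrement cannot be interleaved; in a real database they must be made atomic, which is exactly what the transaction provides.

```javascript
// In-memory stand-in for a stock table (hypothetical data).
const stock = new Map([["tshirt-large", 1]]);

// Reserve stock first; if a later step fails, release it again (the "rollback").
function placeOrder(sku, qty, chargePayment) {
  const available = stock.get(sku) ?? 0;
  if (available < qty) {
    return { ok: false, reason: "out of stock" };
  }
  stock.set(sku, available - qty); // reserve up front so the next order sees the lower level
  try {
    chargePayment();
    return { ok: true };
  } catch (err) {
    stock.set(sku, stock.get(sku) + qty); // compensate: put the reserved units back
    return { ok: false, reason: err.message };
  }
}

// The charge fails, so the reserved unit is released again.
const failed = placeOrder("tshirt-large", 1, () => {
  throw new Error("card declined");
});
// This order gets the last unit...
const second = placeOrder("tshirt-large", 1, () => {});
// ...so this one is rejected up front instead of failing at the end of checkout.
const third = placeOrder("tshirt-large", 1, () => {});
```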
lists Posted by Koc about 1 month ago.
What should I do if an attribute is not a scalar but a list (multiple choice)? How should I store the data and build the model?
re Koc Posted by jwage about 1 month ago.
You can use the @Collection annotation to store an array or you can store many embedded documents with @EmbedMany.
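For the @Collection case, the stored document simply holds a plain array. The sketch below shows that shape with made-up products and a simplified imitation of how a MongoDB equality filter such as { colors: "black" } treats arrays: it matches when the field equals the value or is an array containing it. The PHP mapping itself is left out, and the whole-array equality match that MongoDB also supports is not modeled.

```javascript
// A multi-valued attribute stored as a plain array (hypothetical products).
const products = [
  { name: "T-Shirt", colors: ["black", "white"] },
  { name: "Hoodie", colors: ["grey"] },
  { name: "Cap", colors: "black" },
];

// Simplified equality matching: scalar equality, or membership when the field is an array.
function matchesEquality(doc, field, value) {
  const actual = doc[field];
  return Array.isArray(actual) ? actual.includes(value) : actual === value;
}

const black = products.filter((p) => matchesEquality(p, "colors", "black")).map((p) => p.name);
```

When each choice needs fields of its own (a price, a SKU), @EmbedMany with one embedded document per choice is the closer fit.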
EAV queries Posted by Colin about 1 month ago.
I'm very excited to see MongoDB used in e-commerce. However, your statement that each new attribute is an additional query is completely incorrect. Magento's EAV models use joins to pull in the additional attributes, so each new attribute is actually an additional join. So 100 products with 100 attributes is still one query, albeit a very large one. What's funny is that they've started moving to "flat" tables and indexes, which is just ridiculously messy, not to mention the multiple sets of metadata tables. A Magento 1.4 installation now has over 300 tables.
This isn't optimal for concurrency, but if you absolutely had to do a transaction you could execute server-side javascript. E.g.: create new records with a "pending" flag, compile a javascript function to modify existing records (decrement inventory) and remove the "pending" flag from the new records (orders/invoices). If the update failed you would delete the "pending" records. Not real ACID, but then again e-commerce isn't banking.
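To make Colin's pending-flag idea concrete, here is an in-memory sketch; the collections, ids and SKU are hypothetical, and plain JavaScript functions stand in for the server-side script. As he says, this is not real ACID: a crash between the steps could still leave a pending order behind, so some cleanup would be needed.

```javascript
// Hypothetical in-memory collections.
const inventory = [{ sku: "tshirt-large", qty: 1 }];
const orders = [];

function checkout(orderId, sku) {
  // 1. Insert the new order flagged as pending.
  orders.push({ _id: orderId, sku, pending: true });
  try {
    // 2. Decrement inventory and clear the pending flag
    //    (the part Colin suggests running as one server-side function).
    const item = inventory.find((i) => i.sku === sku);
    if (!item || item.qty < 1) {
      throw new Error("out of stock");
    }
    item.qty -= 1;
    orders.find((o) => o._id === orderId).pending = false;
    return true;
  } catch {
    // 3. The update failed: delete the pending order so no half-finished record remains.
    orders.splice(orders.findIndex((o) => o._id === orderId), 1);
    return false;
  }
}

const first = checkout("order-1", "tshirt-large"); // takes the last unit
const second = checkout("order-2", "tshirt-large"); // fails, and its pending order is removed
```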
ODM and ORM Posted by Michael about 1 month ago.
Fantastic post. Thank you for that!
Now I have one big question. You told us that OpenSky uses MongoDB for products and a relational database for order-related data.
Is it possible to build relations between Doctrine ODM and ORM objects? Maybe using the default relation annotations to set up a relation from \Documents\ObjectX to \Entities\ObjectY?
After reading this post there are so many things in my mind that I wanna do. But this is essential for all of them...
Another wish for next blog posts: There are some cases where I'm not sure if I should use ODM or ORM. Could you dedicate a blog post to a comparison of ORM and ODM? Advantages and disadvantages of relational databases / document storage compared to each other? When to use ODM and when to use ORM?
RE: EAV queries Posted by Bulat S. about 1 month ago.
Colin,
What I said was "The amount of queries ... will grow with every [new] attribute data type"; data type is the key word. If you decide to store timestamps, floats or blobs, you have to introduce additional tables, hence additional queries. Your statement "Magento's EAV models use joins to pull in the additional attributes ..." is correct for selecting product collections (catalog pages). Individual products (product pages), however, are selected similarly to what I described, because an individual product needs all of its attributes hydrated onto it, whereas a product collection only needs the attributes used in the product list view.
I completely agree with the rest of your comment.
RE: ODM and ORM Posted by Bulat S. about 1 month ago.
Hi Michael,
I understand your frustration. What we had to do is store our own custom identifiers, instead of Mongo ID objects, for each document/entity, and use them in both data stores. This lets us store the same values in MySQL and MongoDB. It's the best solution available at the moment.
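Here is a sketch of the shared-identifier idea, with a Map standing in for the MongoDB products collection and an array of rows standing in for a MySQL order table. The "prod-" prefix and the counter are made up for illustration; a real generator would have to be collision-free across processes, and Bulat doesn't describe OpenSky's actual scheme.

```javascript
// Hypothetical identifier scheme: application-generated strings instead of ObjectIds.
let counter = 0;
function nextProductId() {
  counter += 1;
  return `prod-${String(counter).padStart(6, "0")}`;
}

// MongoDB side: product documents keyed by the custom id.
const productDocs = new Map();
const tee = { _id: nextProductId(), name: "Configurable T-Shirt" };
productDocs.set(tee._id, tee);

// MySQL side: order rows store the same string in an ordinary column.
const orderLines = [{ order_id: 10, product_id: tee._id, qty: 2 }];

// The "join" across the two stores happens in application code.
function describeOrder(orderId) {
  return orderLines
    .filter((line) => line.order_id === orderId)
    .map((line) => `${line.qty} x ${productDocs.get(line.product_id).name}`);
}
```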
P.S. I will soon publish a step-by-step comparison of ORM vs ODM and RDBMSs vs DODs (document-oriented databases).
ORM + ODM Posted by Lukas about 1 month ago.
A big question, of course, is how much Doctrine2 should natively provide to bridge apps that use both the ORM and the ODM. Another question in that vein is whether there should be some sort of abstract query API that lets code easily switch from the ORM to the ODM, at least for the CRUD stuff as well as PK lookups and basic filtering.
@String Posted by Simon H about 1 month ago.
I'm nitpicking, but wouldn't it be nicer to use the standard "@var string" style annotation - which people are probably already using - than introducing a new "@String" style syntax?
Either way, this still looks very exciting, speaking as someone just getting into MongoDB at the moment, so thanks.
re: simon Posted by jwage about 1 month ago.
@var string won't work because that is not the syntax for annotations, and we need the ability to specify arguments, e.g. @String(option=value).
questions Posted by Evgeny about 1 month ago.
Why use name/value pairs in the Mongo document?
IMO, instead of an options collection, it would be better to use something like:
```javascript
{ "price_per_size": { "small": 12.99, "medium": 15.99, "large": 17.99 } }
```
RE: questions Posted by Bulat S. about 1 month ago.
Hi Evgeny,
The reason is custom attributes:
```javascript
{
  name: "large",
  price: 17.99,
  color: "black",
  size: "L",
  // etc.
}
```
One option per stock item.
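The trade-off between the two shapes can be sketched in a few lines. The sample options below, including the white "large" variant and the "S" size, are illustrative assumptions. The array of option objects can always be collapsed into Evgeny's price map, but not the other way round: the map loses the extra attributes and merges stock items that share a name.

```javascript
// One embedded object per stock item, each free to carry extra attributes.
const product = {
  name: "Configurable T-Shirt",
  options: [
    { name: "small", price: 12.99, color: "black", size: "S" },
    { name: "large", price: 17.99, color: "black", size: "L" },
    { name: "large", price: 17.99, color: "white", size: "L" },
  ],
};

// Find the stock item matching a set of attribute values, or null if none does.
function findOption(prod, wanted) {
  return prod.options.find((o) => Object.entries(wanted).every(([k, v]) => o[k] === v)) ?? null;
}

// The price map is derivable from the array, but lossy: two "large" items become one key.
function pricePerSize(prod) {
  return Object.fromEntries(prod.options.map((o) => [o.name, o.price]));
}
```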