As we build Jixee, we run into many technical issues. As you may know, many of these issues can be extremely frustrating. But the silver lining lies in the joy (and relief) you feel when you figure out what the problem is. To better help our community and fellow developers, we’ll occasionally write posts about the technical difficulties we run into.
Just the other night, John, one of our engineers, ran into an issue with our NoSQL database and ACID transactions. While John was reworking Jixee’s task module, he discovered that some of our actions (e.g. updating a task, moving a task) are multi-statement actions. This led to a problem whenever the work was interrupted by an error such as a power or internet failure: half of the steps would be applied while the other half would not be. Here’s how John tackled this issue.
As a backup, John checked for errors and wrote “rollback” steps to revert the partial changes. If, for any reason, these rollbacks failed, an error email was sent saying that “manual intervention” is required to fix the database items in question.
This wasn’t the optimal solution: John found it really time-consuming to write rollbacks and felt it was impractical to write handling for so many edge cases.
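To make the rollback approach concrete, here is a minimal Python sketch of the pattern. It is not Jixee's actual code: the `run_with_rollback` helper, the in-memory `groups` dictionary, and the simulated network failure are all assumptions made for illustration. Each step is paired with an undo step, and the notification callback stands in for the "manual intervention" email.

```python
from typing import Callable, List, Tuple

# Each step is a pair: (apply, rollback). Both names are illustrative.
Step = Tuple[Callable[[], None], Callable[[], None]]


def run_with_rollback(steps: List[Step], notify: Callable[[str], None]) -> bool:
    """Apply steps in order; on failure, undo the applied ones in reverse."""
    undo_stack: List[Callable[[], None]] = []
    try:
        for apply, rollback in steps:
            apply()
            undo_stack.append(rollback)
        return True
    except Exception as exc:
        for rollback in reversed(undo_stack):
            try:
                rollback()
            except Exception:
                # The rollback itself failed: ask a human to step in.
                notify(f"manual intervention required after: {exc}")
                break
        return False


# Hypothetical in-memory stand-in for two task groups in the database.
groups = {"todo": ["jix-1", "jix-2"], "done": []}


def remove_from_todo() -> None:
    groups["todo"].remove("jix-2")


def restore_to_todo() -> None:
    groups["todo"].append("jix-2")


def add_to_done() -> None:
    raise ConnectionError("simulated network failure")


def remove_from_done() -> None:
    groups["done"].remove("jix-2")


alerts: List[str] = []
ok = run_with_rollback(
    [(remove_from_todo, restore_to_todo), (add_to_done, remove_from_done)],
    alerts.append,
)
```

Even in this toy version, every step needs a hand-written inverse, which hints at why covering every edge case this way gets impractical quickly.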
Another thing John found is that someone could “get tasks” at the same time the multi-step query is running, and the result could be inconsistent. These aren’t likely events, but since they are possible, John felt the probability would increase as we scale our user base. Now this news isn’t entirely new to us. It’s a database issue that we’ve been dealing with since the infancy of Jixee (in a future post we’ll discuss Jixee’s technical origins and why this is an issue now). Jixee’s old task system, with levels and updating sub-tasks, was multi-statement and could have run into trouble if something went wrong. It’s a limitation of Mongo.
Mongo gives us basic ACID guarantees on a single document. So we can find and modify a document in one atomic operation, and it stays safe under concurrent access. This allows us to ensure that assigning the user_task_id (like jix-546) is atomic and that only one task could possibly exist with that number.
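The single-document guarantee can be pictured with a small in-memory simulation. This is only a sketch, not Mongo's API: `CounterStore` and `next_task_id` are hypothetical names, and a lock stands in for the atomicity that a find-and-modify with an increment would give on the real counter document.

```python
import threading


class CounterStore:
    """In-memory stand-in for a single document holding a counter."""

    def __init__(self) -> None:
        self._doc = {"seq": 0}
        self._lock = threading.Lock()

    def find_and_increment(self) -> int:
        # The read and the write happen as one indivisible step.
        with self._lock:
            self._doc["seq"] += 1
            return self._doc["seq"]


def next_task_id(store: CounterStore, prefix: str = "jix") -> str:
    return f"{prefix}-{store.find_and_increment()}"


store = CounterStore()
ids = []
ids_lock = threading.Lock()


def worker() -> None:
    # Each worker simulates a client creating 100 tasks.
    for _ in range(100):
        task_id = next_task_id(store)
        with ids_lock:
            ids.append(task_id)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the increment and the read are one step, eight concurrent workers never receive the same ID. The guarantee stops at the boundary of that one document, which is exactly the limitation described next.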
But if we need to do something that has several steps to it, such as updating the ordered lists (which is what tasks are being turned into), then we have to update the ordering on the old task group, the new task group, and possibly across boards if the user moves tasks to another board.
When John wrote the task linking function, he had this issue come up and decided to send a function to Mongo to process the transaction. That way, at least all of the steps were in one network request, and Mongo could apply them all and hope no errors occurred. This is the db.eval feature. However, the eval function is not shard-safe and would break or not work properly on a sharded setup. John might be able to remove it for linking when he gets to it on the new task system.
John’s research into this led him to a possible solution: TokuMX. It is a fork of the 2.2 version of Mongo and uses a newer storage engine. It allows transactions, so we could begin a transaction, do multiple statements, and end the transaction. If anything were to fail during this, then nothing would get applied. Transactions also work on a snapshot, so even if one were in the middle of a multi-statement task modification, anyone getting tasks at the same time would get the original intended list. A possible fix.
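The behavior described above (all-or-nothing commits, with readers seeing the original data mid-transaction) can be sketched with a toy in-memory store. This is an illustration under stated assumptions, not TokuMX's API: `SnapshotStore` and its methods are hypothetical, and the sketch ignores conflicts between concurrent writers.

```python
import copy
from contextlib import contextmanager


class SnapshotStore:
    """Toy store: writers work on a private copy, readers see committed data."""

    def __init__(self, data: dict) -> None:
        self._committed = data

    def read(self) -> dict:
        return copy.deepcopy(self._committed)

    @contextmanager
    def transaction(self):
        working = copy.deepcopy(self._committed)
        yield working
        # Reached only if the block raised no exception: commit everything.
        self._committed = working


store = SnapshotStore({"todo": ["jix-1", "jix-2"], "done": []})

# A successful multi-statement move of one task between groups.
with store.transaction() as tx:
    tx["todo"].remove("jix-2")
    mid_read = store.read()  # a concurrent reader still sees the original
    tx["done"].append("jix-2")

after_commit = store.read()

# A failed transaction: the first statement runs, then an error occurs.
try:
    with store.transaction() as tx:
        tx["done"].remove("jix-2")
        raise ConnectionError("simulated failure")
except ConnectionError:
    pass

after_failure = store.read()
```

In this sketch the half-finished move is never visible to a reader, and the failed transaction leaves the committed data untouched, with no hand-written rollback steps.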
But we would need to make sure it has the same level of support, since all of these tools are open source. Also, TokuMX’s transactions are not yet supported on sharded systems, so that could be another issue.
If we find TokuMX is great, then an alternative to sharding later could be to keep backups of the database machines but split entire accounts across those machines, since accounts can be safely isolated from each other, and do away with shards. Or maybe even move to MySQL, but that’s a broad topic.
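As a rough idea of what splitting accounts across machines could look like, here is a small Python sketch that maps each account to one machine with a stable hash. The function name, account IDs, and host names are all made up for illustration, and this is only one of several possible routing schemes.

```python
import hashlib
from typing import List


def machine_for_account(account_id: str, machines: List[str]) -> str:
    """Pick one machine per account, the same one on every call."""
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(machines)
    return machines[index]


# Hypothetical host names.
machines = ["db-a.example.internal", "db-b.example.internal"]
first = machine_for_account("account-42", machines)
second = machine_for_account("account-42", machines)
```

One caveat with this scheme: adding a machine changes where most accounts map, so a stored account-to-machine lookup table may be a better fit if the fleet is expected to grow.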
What has your experience been with NoSQL databases and ACID transactions? I’ll keep you updated on our progress. I hope this gives you a brief glimpse at Jixee, how we operate, and how we tackle our technical issues. If you need help keeping track of your team and issues, try Jixee free for 14 days.