The case
I am currently working on a project where we use a CMS product together with a search service based on Elasticsearch, as well as some back-end APIs.
Elasticsearch is an incredibly powerful search service, and you can do almost anything with it.
But should you?
What we did
With a CMS that does not support storing arbitrary data particularly well, it is tempting to look for creative alternatives. Adding an O/R mapper, a custom table or a separate database, for example, is always a big step, and not necessarily a good idea if it can be avoided.
In this situation we hit upon the idea of using the search index to store data fetched from a back-end API. After all, it can serialize and index just about any .NET type, so why not add some data-carrying properties to our custom index object?
I had a bad feeling about this; my spidey-sense started tingling... An index, I thought, is something fairly approximate, intended only to make data as easy as possible to find. Not to store it. Hmm...
I tried to argue along these lines in the team, but I had no luck. So off we went, storing more in the index than just searchable text and back-references to the actual data.
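
To make the difference concrete, here is a minimal sketch of the two shapes an index entry can take. It is written in Python for brevity rather than the .NET types we actually used, and every name in it (LeanIndexDocument, FatIndexDocument, resolve, the order data) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LeanIndexDocument:
    """What an index entry is meant to hold: searchable text plus a pointer."""
    search_text: str   # text the search service matches queries against
    source_id: str     # back-reference to the record in the real data source


@dataclass
class FatIndexDocument(LeanIndexDocument):
    """What we drifted into: the index entry also carries the data itself."""
    payload: dict = field(default_factory=dict)  # copy of back-end API data


def resolve(hit: LeanIndexDocument, backend: dict) -> dict:
    """Use the hit only to locate the record, then read it from the source."""
    return backend[hit.source_id]


# Hypothetical back-end data, standing in for whatever the API returns.
backend = {"order-17": {"status": "shipped", "total": 120}}
hit = LeanIndexDocument(search_text="order 17 shipped", source_id="order-17")
print(resolve(hit, backend))
```

With the lean shape, a stale or missing index entry costs you a search hit at worst. With the fat shape, the code ends up reading payload as if it were the truth.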
What happened
So now we're in trouble. Not really deep trouble as yet, but as it turns out it's just not a good idea. We're getting inconsistent states, and the code can't trust what it sees. It works, sort of, kind of, most of the time, but...
The problem is basically that when you store your data in a database or a file, you expect consistent and reproducible behavior, every time. If you don't get that, the assumption is that something is broken.
When you store your data in a search index, this just does not apply. Here are some of the reasons:
- The index never promised to give a consistent view! Two reads can give different results. In Elasticsearch, for example, the data is spread over shards and copies of those shards, and two reads are not guaranteed to be answered from the same copy.
- The index never promised that a write is immediately or deterministically reflected in a subsequent read. This is due to both caching and queueing in the index: the assumption is that you're requesting an index update that you'd like to take effect as soon as possible, but it is not guaranteed to be immediate.
- The index has a rate limit. It's perfectly OK for it to say that it's too busy, since the assumption is that at worst you lose an index update; no data is lost. With a database or a file, a rejected write is a fatal error, and you simply have to add capacity until it no longer happens.
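
The last two points can be illustrated with a toy model. This is only a sketch under my own assumptions, not how Elasticsearch is implemented: ToySearchIndex, IndexBusy and queue_limit are made-up names, and the explicit refresh() call stands in for whatever the real index does in the background.

```python
class IndexBusy(Exception):
    """Raised when the toy index refuses a write because its queue is full."""


class ToySearchIndex:
    """In-memory stand-in for a search index with queued, delayed writes."""

    def __init__(self, queue_limit: int = 2):
        self._visible = {}    # what readers can currently see
        self._pending = []    # accepted writes that are not yet visible
        self._queue_limit = queue_limit

    def index(self, doc_id: str, doc: dict) -> None:
        # For an index, "too busy" is a normal answer, not a fatal error.
        if len(self._pending) >= self._queue_limit:
            raise IndexBusy(doc_id)
        self._pending.append((doc_id, doc))

    def refresh(self) -> None:
        # Only at this point do queued writes become visible to readers.
        for doc_id, doc in self._pending:
            self._visible[doc_id] = doc
        self._pending.clear()

    def get(self, doc_id: str):
        return self._visible.get(doc_id)


index = ToySearchIndex(queue_limit=2)
index.index("order-17", {"status": "shipped"})
print(index.get("order-17"))   # None: accepted, but not yet visible
index.refresh()
print(index.get("order-17"))   # {'status': 'shipped'}

index.index("a", {})
index.index("b", {})
try:
    index.index("c", {})       # third queued write exceeds the limit
except IndexBusy as busy:
    print("rejected:", busy)   # rejected: c
```

Code that treats such an index as a data store has to cope with both the None and the rejection. Code that treats it as an index can simply retry later, because the real data lives somewhere else.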