Friday, April 23, 2021

Using a search index as a database is a bad idea

The case

I am currently working on a project where we are using a CMS product in conjunction with a search service based on Elasticsearch, as well as some back-end APIs.

Elasticsearch is an incredibly powerful search service, and you can do almost anything with it.

But should you?

What we did

With a CMS that does not support storing arbitrary data particularly well, it is tempting to look for creative alternatives. Adding support for an O/R mapper, or any other custom table or database, is always a big step, and not necessarily a good idea if it can be avoided.

In this situation we hit upon the idea of using the search index to store data fetched from a back-end API. After all, it can serialize and index just about any .NET type, so why not add some data-carrying properties to our custom index object?

I had a bad feeling about this, my spidey-sense started tingling... I was thinking that an index is something fairly approximate, and that it's only intended to make it as easy as possible to find data. Not to store it. Hmm...

In the team I tried to argue along these lines, but I had no luck. So off we went, starting to store more than text to search and back-references to the actual data.

What happened

So now we're in trouble. Not really deep trouble as yet, but it's just not a good idea as it turns out. We're getting inconsistent states and the code can't trust what it sees. It works, sort of, kind of, most of the time but...

The problem is basically that when you store your data in a database or a file, you expect consistent and reproducible behavior, every time. If you don't get that, the assumption is that something is broken.

When you store your data in a search index, this just does not apply. Here are some of the reasons:

  • The index never promised to give a consistent view! Two reads can give different results. In Elasticsearch, for example, an index is split into shards that can have several copies, and two reads may be answered by copies that are not in the same state.
  • The index never promised that a write is immediately or deterministically reflected in a subsequent read. This is due both to caching and to queueing behavior in the index, since the assumption is that you're basically requesting an index update that you'd like to take effect as soon as possible, but not guaranteed immediately.
  • The index has a rate limit. It's perfectly OK for it to say that it's too busy, since the assumption is that at worst you lose an index update, not any data. With a database or a file, if that happens, there's nothing to fall back on: it's a fatal error situation.

Specifically, we're now in a situation where our code won't always work, mostly if we're "too fast". If we wait a few minutes between writing something and expecting to be able to read it back, it'll often, but not always, work.
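
To make the "too fast" problem concrete, here is a minimal sketch in Python of an index that queues writes, only makes them visible after a refresh, and is allowed to say it's too busy. It is a toy model under my own assumptions, not the Elasticsearch API and not the code from the project: `ToySearchIndex`, `IndexBusy`, `max_pending` and the order documents are all made-up names for illustration.

```python
class IndexBusy(Exception):
    """Raised when the toy index refuses a write because its queue is full."""


class ToySearchIndex:
    """Hypothetical stand-in for a search index (not a real client library)."""

    def __init__(self, max_pending=2):
        self._searchable = {}            # what reads are able to see
        self._pending = {}               # accepted writes, not yet visible
        self._max_pending = max_pending  # assumed rate limit for the sketch

    def index(self, doc_id, doc):
        # The index may refuse the update; from its point of view nothing
        # is lost, because the real data is assumed to live somewhere else.
        if len(self._pending) >= self._max_pending:
            raise IndexBusy(doc_id)
        # Accepted, but only queued until the next refresh.
        self._pending[doc_id] = doc

    def refresh(self):
        # A real service does this on its own schedule, not on request.
        self._searchable.update(self._pending)
        self._pending.clear()

    def get(self, doc_id):
        # Reads only ever see refreshed data.
        return self._searchable.get(doc_id)


index = ToySearchIndex(max_pending=2)

index.index("order-1", {"status": "paid"})
read_too_fast = index.get("order-1")      # write accepted, but not visible yet

index.index("order-2", {"status": "new"})
try:
    index.index("order-3", {"status": "new"})
    rejected = False
except IndexBusy:
    rejected = True                       # the index was "too busy"

index.refresh()
read_later = index.get("order-1")         # visible after the refresh
never_stored = index.get("order-3")       # the refused write is simply gone
```

If the index only holds search text and back-references, both outcomes are harmless: a stale read is fixed by the next update, and a refused write can be redone from the real data. If the index is the only place the data lives, the code reading `read_too_fast` sees nothing where it expected an order, and `order-3` is lost for good.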


Don't use a search index as a data store. Even if it looks cool and easy, don't. I don't know exactly what problems you'll run into, but I'll bet a beer or soda that it'll cause you some unexpected grief.
