Monday, May 3, 2021

Thinking about cacheability

Performance, Cacheability and Thinking Ahead

Although it's very true that "premature optimization is the root of all evil" (Hoare/Knuth), this should never be taken to mean that writing inefficient code is a good thing. Nor does it mean that we should ignore the possible need for future optimizations. As all things, it's a question of balance.

When designing APIs for example, just how you define them may impact even the possibility for future optimizations, specifically caching. Although caching is no silver bullet, often it is the single most effective measure to increase performance so it's definitively a very important tool in the optimization toolbox.

Let's imagine a search API, that searches for locations of gas stations within a given distance from a given point, and also filters on the brand of the station.

(In a real life application, the actual search terms will of course be more complex, so let's assume that using an underlying search service really is relevant which is perhaps not necessarily the case in this literal example. Also, don't worry about the use of static methods and classes in the sample code, it's just for show.)

[Route("v1/[controller]")]
public class GasStationV1Controller : ControllerBase
{
    [HttpGet]
    public IEnumerable<GasStation> Search(string brand, double lat, double lon, double distance)
    {
        return SearchService.IndexSearch(brand, lat, lon, distance);
    }
}

We're exposing a REST API that delegates the real work to a fictitious search service, accessing an index and providing results based on the search parameters, possibly adding relevance, sponsoring and other soft parameters to the search. That's not important here.

What is important is that we've decided to let the search index handle the geo location part of the search as well, so we're indexing locations and letting the search index handle distance and nearness calculations, which on the surface of things appear to make sense. The less we do in our own code, and the more we can delegate, the better!

But, unfortunately it turns out this is a little too slow, and also we're overloading the back end search service which has a rate limiting function as well as a per-call pricing schedule so it's expensive too. What to do? The obvious thing is to cache. Said and done.

[Route("v2/[controller]")]
public class GasStationV2Controller : ControllerBase
{
    [HttpGet]
    public IEnumerable<GasStation> CachedSearch(string brand, double lat, double lon, double distance)
    {
        string cacheKey = $"{brand}-{lat}-{lon}-{distance}";
        return ObjectCache.GetOrAdd(cacheKey, () => SearchService.IndexSearch(brand, lat, lon, distance));
    }
}

Now we're using our awesome ObjectCache to either get it from the cache, or if need be, call the back end service. All set, right?

Not quite.

The location that we're looking to find near matches to is essentially where the user is, which means there'll be quite a bit of variation of the search parameters. In fact, there is very little chance that anything in the cache will ever be re-used. The net effect of our caching layer is just to fill server memory. We're not reducing the back end search service load, and we're not speeding anything up for anyone.

The thing to consider here is that when we're designing an API that has the potential of being a bottleneck in one way or another, we should consider to make it possible to add a caching layer even if we don't to begin with (remember that thing about premature optimizations).

Avoid designing low-level API:s that take essentially open-ended parameters, i.e. parameters that have effectively infinite variation, and where very seldom a set of parameters is used twice. It's not always possible, it depends on the situation, but consider it.

As it turns out, our only option was to redesign what we use the search index for, and move some functionality into our own application. This is often a memory/time tradeoff, but in this case, keeping up to a 100 000 gas stations in memory is not a problem, and filtering them in memory in the web server is an excellent option.

This is how it looks like now, and although we're obliged to do some more work on our own, we'll be fast and we're offloading the limited and expensive search service quite a bit.

[Route("v3/[controller]")]
public class GasStationV3Controller : ControllerBase
{
    [HttpGet]
    public IEnumerable<GasStation> Search(string brand, double lat, double lon, double distance)
    {
        string cacheKey = $"{brand}";
        return ObjectCache.GetOrAdd(cacheKey, () => SearchService.IndexSearch(brand))
            .Select(g => g.SetDistance(lat, lon))
            .Where(g => g.Distance <= distance)
            .OrderBy(g => g.Distance);
    }
}

Now we have more manageable set of search parameters to cache, and we can still serve the user rapidly and without overloading the search service and or budget.

Taking this a step further, we'd consider moving this logic to the client, if it's reasonable since then we can even let the HTTP response become cacheable, which can further increase the scalability and speed for the users.

In the end performance is always about compromises, but the lesson learned here is that even if we don't (think) we need optimization and caching at the start, we should at least consider leaving the path for it open.

No comments:

Post a Comment