Talk: All Things Cached - SF Python 2017 Meetup

  • Python All Things Cached Slides

  • Can we have some fun together in this talk?

  • Can I show you some code that I would not run in production?

  • Great talk by David Beazley at PyCon Israel this year.

    • Encourages us to scratch our itch under the code phrase: “It’s just a prototype.” Not a bad place to start. Often how it ends :)

Landscape

  • At face value, caches seem simple: get/set/delete.

  • But zoom in a little and you find just more and more detail.

Backends

  • Backends have different designs and tradeoffs.

Frameworks

  • Caches have broad applications.

  • Web and scientific communities reach for them first.

I can haz mor memory?

  • Redis is great technology: free, open source, fast.

  • But another process to manage and more memory required.

$ emacs talk/settings.py
$ emacs talk/urls.py
$ emacs talk/views.py
$ gunicorn --reload talk.wsgi
$ emacs benchmark.py
$ python benchmark.py
  • I dislike benchmarks in general so don’t copy this code. I kind of stole it from Beazley in another great talk he did on concurrency in Python. He said not to copy it so I’m telling you not to copy it.

$ python manage.py shell
>>> import time
>>> from django.conf import settings
>>> from django.core.cache import caches
>>> for key in settings.CACHES.keys():
...     caches[key].clear()
>>> while True:
...     !ls /tmp/filebased | wc -l
...     time.sleep(1)

Fool me once, strike one. Feel me twice? Strike three.

  • Filebased cache has two severe drawbacks.

    1. Culling is random.

    2. set() uses glob.glob1() which slows linearly with directory size.

DiskCache

  • Wanted to solve Django-filebased cache problems.

  • Felt like something was missing in the landscape.

  • Found an unlikely hero in SQLite.

I’d rather drive a slow car fast than a fast car slow

  • Story: driving down the Grapevine in SoCal in friend’s 1960s VW Bug.

Features

  • Lot’s of features. Maybe a few too many. Ex: never used the tag metadata and eviction feature.

Use Case: Static file serving with read()

  • Some fun features. Data is stored in files and web servers are good at serving files.

Use Case: Analytics with incr()/pop()

  • Tried to create really functional APIs.

  • All write operations are atomic.

Case Study: Baby Web Crawler

  • Convert from ephemeral, single-process to persistent, multi-process.

“get” Time vs Percentile

  • Tradeoff cache latency and miss-rate using timeout.

“set” Time vs Percentile

  • Django-filebased cache so slow, can’t plot.

Design

  • Cache is a single shard. FanoutCache uses multiple shards. Trick is cross-platform hash.

  • Pickle can actually be fast if you use a higher protocol. Default 0. Up to 4 now.

    • Don’t choose higher than 2 if you want to be portable between Python 2 and 3.

  • Size limit really indicates when to start culling. Limit number of items deleted.

SQLite

  • Tradeoff cache latency and miss-rate using timeout.

  • SQLite supports 64-bit integers and floats, UTF-8 text and binary blobs.

  • Use a context manager for isolation level management.

  • Pragmas tune the behavior and performance of SQLite.

    • Default is robust and slow.

    • Use write-ahead-log so writers don’t block readers.

    • Memory-map pages for fast lookups.

Best way to make money in photography? Sell all your gear.

  • Who saw eclipse? Awesome, right?

    • Hard to really photograph the experience.

    • This is me, staring up at the sun, blinding myself as I hold my glasses and my phone to take a photo. Clearly lousy.

  • Software talks are hard to get right and I can’t cover everything related to caching in 20 minutes. I hope you’ve learned something tonight or at least seen something interesting.

Conclusion

  • Windows support mostly “just worked”.

    • SQLite is truly cross-platform.

    • Filesystems are a little different.

    • AppVeyor was about half as fast as Travis.

    • check() to fix inconsistencies.

  • Caveats:

    • NFS and SQLite do not play nice.

    • Not well suited to queues (want read:write at 10:1 or higher).

  • Alternative databases: BerkeleyDB, LMDB, RocksDB, LevelDB, etc.

  • Engage with me on Github, find bugs, complain about performance.

  • If you like the project, star-it on Github and share it with friends.

  • Thanks for letting me share tonight. Questions?