BERT models with millisecond inference — The Pattern project

The challenge:

  1. Convert and pre-load BERT models on each shard of the Redis Cluster (code; see the first sketch below).
  2. Pre-tokenise all potential answers with RedisGears and distribute them across the shards of the Redis Cluster (code for the batch and the event-based RedisGears functions; both sketched below).
  3. Amend the calling API to direct the question query to the shard holding the most likely answers (code). The call uses graph-based ranking with ZRANGEBYSCORE to find the highest-ranked sentences for the question and then derives the relevant hash tag from the sentence key (sketched below).
  4. Tokenise the question (code). Tokenisation happens on the shard and uses the RedisGears and RedisAI integration via `import redisAI` (steps 4-7 are sketched together below).
  5. Concatenate the user question with the pre-tokenised potential answers (code).
  6. Run inference using RedisAI (code). The model runs in async mode without blocking the main Redis thread, so the shard can keep serving users.
  7. Select the answer with the highest score and convert the tokens back into words (code).
  8. Cache the answer in Redis: the next API hit with the same question returns the answer in nanoseconds (code). This function uses the 'keymiss' event (sketched below).
Hardware used: Clevo laptop with an Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz, 64 GB RAM, SSD.
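
The numbered steps above link to the project's code; the sketches below are simplified, self-contained approximations rather than the project's exact implementation. For step 1, a possible way to trace a Hugging Face BERT question-answering model to TorchScript and load the same blob on every master shard is through the redisai-py client. The model name, the 'bert-qa' key and the shard addresses are assumptions.

```python
# Sketch (step 1): trace a Hugging Face BERT QA model to TorchScript and load
# it onto every master shard through RedisAI. Assumes the transformers, torch
# and redisai packages; the model name, the 'bert-qa' key and the shard
# addresses are illustrative.
import torch
import redisai as rai
from transformers import BertForQuestionAnswering

model = BertForQuestionAnswering.from_pretrained(
    "bert-large-uncased-whole-word-masking-finetuned-squad",
    torchscript=True,
)
model.eval()

# Trace with dummy input_ids and attention_mask so RedisAI can execute the
# graph without Python.
dummy = torch.ones(1, 384, dtype=torch.long)
traced = torch.jit.trace(model, (dummy, dummy))
torch.jit.save(traced, "bert-qa.pt")

with open("bert-qa.pt", "rb") as f:
    model_blob = f.read()

# Store the same model blob on each master shard of the cluster.
master_shards = [("127.0.0.1", 30001), ("127.0.0.1", 30002), ("127.0.0.1", 30003)]
for host, port in master_shards:
    con = rai.Client(host=host, port=port)
    con.modelstore("bert-qa", "TORCH", "CPU", model_blob)
```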
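
Step 2 can be expressed as a RedisGears function in Python. A minimal sketch of the event-based variant is below; it assumes sentences live in hashes under a `sentence:*` prefix with a `sentence` field, and it writes the token ids back into a `tokens` field on the same shard. The key prefix, field names and tokenizer model are assumptions.

```python
# Sketch (step 2): event-based RedisGears function that pre-tokenises every
# sentence hash as it is written, keeping the token ids on the same shard.
# Key prefix, field names and the tokenizer model are assumptions.
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained(
    "bert-large-uncased-whole-word-masking-finetuned-squad")

def tokenise_sentence(record):
    value = record["value"] or {}
    if value.get("tokens"):
        return  # already tokenised; avoid re-triggering on our own HSET
    sentence = value.get("sentence")
    if not sentence:
        return
    token_ids = tokenizer.encode(sentence, add_special_tokens=False)
    # Store the ids as a space-separated string next to the sentence.
    execute("HSET", record["key"], "tokens",
            " ".join(str(t) for t in token_ids))

# Event-based registration: fire on every HSET to a sentence key.
GB().foreach(tokenise_sentence).register(
    prefix="sentence:*", eventTypes=["hset"], mode="async_local")
```

A batch variant of the same gear would end with `GB().foreach(tokenise_sentence).run("sentence:*")` instead of `register(...)`, walking the existing keys once.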
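
For step 3, the API-side routing can be sketched with redis-py against the cluster. The sketch assumes the ranked sentences for a concept are kept in a sorted set scored by the graph-based ranking, and that each sentence key embeds a `{hash tag}` that Redis Cluster uses for routing; the key names are illustrative.

```python
# Sketch (step 3): find the highest-ranked sentence keys for a question's
# concepts and reuse the {hash tag} from the sentence key, so the follow-up
# QA call lands on the shard that already holds those sentences.
# Key names are illustrative.
from redis.cluster import RedisCluster

r = RedisCluster(host="127.0.0.1", port=30001, decode_responses=True)

def best_sentence_keys(concept_set_key, top_n=5):
    # Highest graph-ranking score first (reverse-order ZRANGEBYSCORE).
    return r.zrevrangebyscore(concept_set_key, "+inf", "-inf", start=0, num=top_n)

def shard_hash_tag(sentence_key):
    # Redis Cluster routes a key by the substring between '{' and '}'.
    return sentence_key[sentence_key.index("{") + 1 : sentence_key.index("}")]

# Example: route the question to the shard of the top-ranked sentence.
# keys = best_sentence_keys("ranked:some_concept")
# tag = shard_hash_tag(keys[0])
```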
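
Steps 4-7 run inside a single RedisGears function on the shard. The sketch below is triggered once per question: it tokenises the question, appends the pre-tokenised candidate stored by the gear above, runs the 'bert-qa' model asynchronously through the `redisAI` integration (so the main Redis thread is never blocked) and decodes the best-scoring span. It assumes a RedisGears build with Python coroutine support and the key and field conventions from the earlier sketches.

```python
# Sketch (steps 4-7): per-shard question answering with the RedisGears /
# RedisAI integration. The trigger name, key and field names, and the
# 'bert-qa' model key follow the earlier sketches and are assumptions; a
# RedisGears build with Python coroutine support is assumed as well.
import numpy as np
import redisAI
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained(
    "bert-large-uncased-whole-word-masking-finetuned-squad")

def to_np(tensor, dtype):
    # Turn a RedisAI tensor handle into a numpy array.
    return np.frombuffer(redisAI.tensorGetDataAsBlob(tensor), dtype=dtype).reshape(
        redisAI.tensorGetDims(tensor))

async def answer_question(record):
    # Triggered as: RG.TRIGGER qa <sentence key> <question>
    sentence_key, question = record[1], record[2]

    # Step 4: tokenise the question on the shard.
    question_ids = tokenizer.encode(question, add_special_tokens=True)

    # Step 5: concatenate with the pre-tokenised candidate answer.
    context_ids = [int(t) for t in execute("HGET", sentence_key, "tokens").split()]
    input_ids = np.array([question_ids + context_ids], dtype=np.int64)
    attention_mask = np.ones_like(input_ids)

    # Step 6: run the model asynchronously; the Redis main thread stays free.
    runner = redisAI.createModelRunner("bert-qa")
    redisAI.modelRunnerAddInput(
        runner, "input_ids",
        redisAI.createTensorFromBlob("INT64", list(input_ids.shape), input_ids.tobytes()))
    redisAI.modelRunnerAddInput(
        runner, "attention_mask",
        redisAI.createTensorFromBlob("INT64", list(attention_mask.shape), attention_mask.tobytes()))
    redisAI.modelRunnerAddOutput(runner, "start_scores")
    redisAI.modelRunnerAddOutput(runner, "end_scores")
    start_scores, end_scores = await redisAI.modelRunnerRunAsync(runner)

    # Step 7: pick the span with the maximum score and turn tokens into words.
    start = int(np.argmax(to_np(start_scores, np.float32)))
    end = int(np.argmax(to_np(end_scores, np.float32))) + 1
    tokens = tokenizer.convert_ids_to_tokens(input_ids[0][start:end].tolist())
    return tokenizer.convert_tokens_to_string(tokens)

# Expose the pipeline as a trigger that runs locally on the receiving shard.
GB("CommandReader").map(answer_question).register(trigger="qa", mode="async_local")
```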
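
Finally, step 8 can be sketched as a gear registered on the 'keymiss' event: when the API asks for a cached answer key that does not exist yet, the gear computes and stores the answer, so the next identical question is a plain cache hit. The `qa:*` key format and the `compute_answer()` helper (which would drive steps 3-7 above) are hypothetical, and a Redis/RedisGears setup that emits key-miss notifications is assumed.

```python
# Sketch (step 8): fill the answer cache on a key miss. The key format
# "qa:{<hash tag>}:<question>" and the compute_answer() helper are
# hypothetical; keyspace miss notifications must be enabled.

def fill_cache(record):
    key = record["key"]
    # Recover the question from the cache key and compute the answer
    # (compute_answer would run steps 3-7 above).
    question = key.split(":", 2)[2]
    answer = compute_answer(question)
    execute("SET", key, answer)
    execute("EXPIRE", key, 3600)  # keep cached answers for an hour

# Fire whenever a qa:* key is requested but missing.
GB().foreach(fill_cache).register(
    prefix="qa:*", eventTypes=["keymiss"], mode="async_local")
```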
