How It's Made

“Avacado” or Avocado?

A simple search query correction heuristic for the resource-constrained

At Instacart, our search engine is one of the most important tools customers rely on to quickly find their favorite grocery items in the digital aisles. But, oftentimes, some of the most common grocery goods are the trickiest to spell. We’re not all spellers or digital natives — if you mistype or forget how to spell 🥑 (it’s Avocado by the way) you’re not alone. But, that typo shouldn’t automatically mean you have a poor search experience.

Some of our most popular misspelled queries are:

  • Siracha ♨️
  • Zuchinni 🥒
  • Jalepeno 🌶
  • Cantelope 🍈
  • Parmesean 🧀

During the early days of Instacart our team was small and scrappy, and we used to manually add correction terms for commonly misspelled queries. But, this approach doesn’t scale. My first project as the first Machine Learning Engineer on Instacart’s Search & Discovery team 5 years ago, was to fix this problem. In technical terms, we wanted to improve the Recall of our search engine for misspelled queries.

The query correction problem

In general, we encounter two main query correction problems:

  • non-word query correction (spelling errors that result in non-words, for example, “avacado” for “avocado”)
  • real-word query correction (spelling errors that accidentally result in an actual word, for example “line” for “lime”)

Non-word query corrections can be solved by using a dictionary and some form of distance function to map the incorrect word with a correct word from the dictionary. Real-word query corrections are much harder problems and require more state-of-the-art techniques like building a language model. Peter Norvig wrote a great post on “How to Write a Spelling Corrector,” which walks through some of these concepts.

Given the short one-word queries we typically see with grocery searches, query correction is not an easy problem. There are thousands of papers on this topic and this is still a very active area of research. For us, the non-word query corrections were the most prevalent and accounted for most of the spelling errors that resulted in zero-result queries in the early days of Instacart.

For any given problem, there may be standard “state-of-the-art” solutions that are not always trivial to implement. In the early days, we had to solve a number of technical problems when building the Instacart marketplace, and it was impossible to prioritize each of these problems. So, we had to get scrappy vs. state-of-the-art when tackling these issues. Our commonly used approach to problem-solving in those days was — “can we use a simple approach that solves 80%-90% of the problem in a short period of time?” We often went for a quick, yet very effective, heuristic. It not only solved most of the incorrect spelling problems but also gave us the added benefit of helping with the query reformulation problem, which we’ll explain below.

The path to a successful query

After entering a search query, a customer will take one of the following actions:

  • Add an item to their cart if the query yields precise results with good recall (this event is called a conversion)
  • Search for a variation of the same query if they do not get any results or are not satisfied with the results
  • Search for a different query altogether if they decide to not convert on the given query
  • Go to the checkout page to place the order
  • Bounce off Instacart.com if they are not satisfied with the experience

For every query, each of these can be considered a different state the customer can move to in a Markov Chain. We are interested in helping the customer add an item to their basket (conversion state) as efficiently as possible.

Jagannath Putrevu

Author

Jagannath Putrevu is a member of the Instacart team. To read more of Jagannath Putrevu's posts, you can browse the company blog or search by keyword using the search bar at the top of the page.

Most Recent in How It's Made

One Model to Serve Them All: How Instacart deployed a single Deep Learning pCTR model for multiple surfaces with improved operations and performance along the way

How It's Made

One Model to Serve Them All: How Instacart deployed a single Deep Learning pCTR model for multiple surfaces with improved operations and performance along the way

Authors: Cheng Jia, Peng Qi, Joseph Haraldson, Adway Dhillon, Qiao Jiang, Sharath Rao Introduction Instacart Ads and Ranking Models At Instacart Ads, our focus lies in delivering the utmost relevance in advertisements to our customers, facilitating novel product discovery and enhancing…...

Dec 19, 2023
Monte Carlo, Puppetry and Laughter: The Unexpected Joys of Prompt Engineering

How It's Made

Monte Carlo, Puppetry and Laughter: The Unexpected Joys of Prompt Engineering

Author: Ben Bader The universe of the current Large Language Models (LLMs) engineering is electrifying, to say the least. The industry has been on fire with change since the launch of ChatGPT in November of…...

Dec 19, 2023
Unveiling the Core of Instacart’s Griffin 2.0: A Deep Dive into the Machine Learning Training Platform

How It's Made

Unveiling the Core of Instacart’s Griffin 2.0: A Deep Dive into the Machine Learning Training Platform

Authors: Han Li, Sahil Khanna, Jocelyn De La Rosa, Moping Dou, Sharad Gupta, Chenyang Yu and Rajpal Paryani Background About a year ago, we introduced the first version of Griffin, Instacart’s first ML Platform, detailing its development and support for end-to-end ML in…...

Nov 22, 2023