Levenshtein Distance Calculator: Measure String Edit Distance

Comparison Zone

Source String

0 chars

Target String

0 chars

Case Sensitive Ignore Whitespace

Edit Distance

minimum operations

Similarity

type in both fields

Transformation Path

Enter text in both fields above to see the transformation path.

Key Terms Explained

Levenshtein Distance

The minimum number of single-character edits (insertions, deletions, substitutions) needed to change one string into another. Named after Soviet mathematician Vladimir Levenshtein, who introduced the concept in 1965.

Edit Distance

A general term for any metric that counts the number of operations required to transform one string into another. Levenshtein distance is the most widely used variant, but others (Damerau-Levenshtein, Hamming) use different operation sets.

Dynamic Programming

An algorithmic technique that solves complex problems by breaking them into overlapping sub-problems and storing their results. The Levenshtein algorithm uses a matrix to avoid recomputing the cost of reaching each position in the two strings.

Wagner-Fischer Algorithm

The standard algorithm for computing Levenshtein distance, published in 1974. It builds an (n+1) x (m+1) cost matrix and fills it row by row, with each cell storing the minimum edits to transform the first i characters of the source into the first j characters of the target.

Fuzzy String Matching

Techniques that find strings which are approximately equal rather than exactly equal. Edit distance is a core building block: a low distance means the strings are likely the same entity with minor typos or formatting differences.

Insertion

An edit operation that adds a character to the source string. In the transformation path it is shown in green and counts as 1 unit of edit distance.

Deletion

An edit operation that removes a character from the source string. In the transformation path it is shown in red with strikethrough and counts as 1 unit of edit distance.

Substitution

An edit operation that replaces one character with another. In the transformation path it is shown in yellow and counts as 1 unit of edit distance. A substitution is equivalent to a deletion followed by an insertion, which is why it costs 1, not 2.

The Complete Guide to Levenshtein Distance

Whether you are cleaning a database, building a spell-checker, or auditing code for near-duplicate functions, Levenshtein distance gives you a precise, numeric answer to the question: "how different are these two strings, and exactly which characters changed?"

How to Use This Tool

Type or paste your first string into the "Source String" box and your second string into the "Target String" box. Results update in real time as you type - no button to click. The Edit Distance score shows the raw integer count of minimum operations. The Similarity percentage normalizes that score against the longer string so you can compare pairs of different lengths on the same scale.

The Transformation Path below the scores shows a character-level map. Characters in red were deleted from the source; characters in green were inserted to reach the target; characters in yellow were substituted. Unchanged characters appear in grey. Use the toggles to control whether the comparison is case-sensitive (on by default) and whether whitespace should be stripped before comparing (off by default).

What the Edit Distance Score Actually Means

An edit distance of 0 means the strings are identical under the current settings. A distance of 1 means a single typo - one key press wrong - separates them. Distances in the range of 1 to 3 are typically considered "close enough to be the same" for fuzzy matching purposes, though the right threshold depends entirely on your use case. A distance equal to the length of the longer string means every character is different - the strings share nothing in common.

The similarity percentage (1 - distance / max length) is more intuitive for communication. 95% similar is easy to grasp; an edit distance of 3 on a 60-character string is less so. Use the raw distance for algorithms and the percentage for dashboards and human review queues.

Why Not Just Compare Character by Character?

A naive positional comparison breaks the moment a single insertion or deletion shifts everything. Compare "colour" and "color": every character from position 4 onward is "wrong" according to positional comparison, even though only one letter was added. Levenshtein distance recognizes that a single deletion transforms one into the other and returns a distance of 1. This makes it far more robust for real-world text where insertions and deletions are as common as outright substitutions.

How the Wagner-Fischer Algorithm Works

The algorithm creates a grid with (source length + 1) rows and (target length + 1) columns. The first row is filled with 0, 1, 2, 3... (cost of deleting each source character to reach an empty target). The first column is filled the same way (cost of inserting each target character from an empty source). Each remaining cell is filled with the minimum of three options: the cell above plus 1 (deletion), the cell to the left plus 1 (insertion), or the diagonal cell plus 0 if the characters match, or plus 1 if they do not (substitution). The value in the bottom-right corner is the Levenshtein distance.

To generate the Transformation Path, this tool then backtracks from the bottom-right cell to the top-left, at each step identifying whether a match, substitution, insertion, or deletion was chosen. This backtrack trace is what powers the color-coded character display you see in the Operation Map.

Practical Applications

Spell checkers use edit distance to rank candidate corrections by closeness to the misspelled word. Database deduplication tools use it to find records that differ only by a typo or abbreviation. DNA sequencing software uses it (and related algorithms) to align genetic sequences. Search engines use it to handle "did you mean?" suggestions. Version control systems use edit-distance-derived metrics for displaying code changes. Any time you need to answer "are these two things the same, just slightly garbled?" edit distance is the right starting point.

Performance Note for Large Inputs

The Wagner-Fischer algorithm has time and memory complexity of O(n x m) where n and m are the string lengths. For strings up to 1,000 characters each, computation is near-instantaneous in a modern browser. Beyond that, you will see a warning in the character counter and the algorithm may take a perceptible moment to run. For very long inputs such as entire files or large code blocks, a line-level diff tool is usually more appropriate - see the Diff Checker tool for that use case.

Frequently Asked Questions

What is an "Edit Distance"? ▾

Edit distance is the minimum number of single-character operations (insertions, deletions, or substitutions) required to transform one string into another. The Levenshtein variant is the most common definition: each operation costs exactly 1, and you find the cheapest sequence of edits. A distance of 0 means the strings are identical; higher numbers mean more work is needed to convert one into the other.

Why is this tool useful for data cleaning? ▾

When merging records or deduplicating databases, the same entity often appears with slight spelling variations - typos, abbreviations, or inconsistent formatting. Edit distance gives you a numeric score for how different two strings are, so you can set a threshold and automatically flag likely duplicates for review without needing an exact match. For example, "Renee Smith" and "Rene Smyth" have an edit distance of 2 and would likely be caught by a threshold of 3, while "Renee Smith" and "Robert Jones" would have a much higher distance and be correctly treated as different people.

How does this differ from a character-by-character comparison? ▾

A character-by-character comparison only checks positions in lockstep and breaks the moment a single insertion or deletion shifts everything out of alignment. Levenshtein distance accounts for insertions and deletions explicitly, so a single extra character in the middle of a word does not cascade into every subsequent position being flagged as a mismatch. This makes it dramatically more accurate for natural language text, usernames, product names, and any other strings where insertions and deletions are common.

Can I use this to compare large blocks of code? ▾

You can, but Levenshtein distance is designed for short-to-medium strings. For very long inputs the algorithm's memory and time requirements grow as length1 times length2, which can become slow in a browser past a few thousand characters each. More importantly, for code comparison the character-level view is often less useful than a line-level diff. If you want to compare two functions, pasting them here will show you every character difference; if you want to compare two files, the Diff Checker tool (which operates line by line) will give you a more readable and actionable result.

How is the similarity percentage calculated? ▾

Similarity is calculated as: 1 minus (edit distance divided by the length of the longer string), expressed as a percentage. A distance of 0 gives 100% similarity. A distance equal to the longer string's length gives 0% similarity. This normalizes the raw distance so you can compare pairs of strings of different lengths on the same scale - a distance of 3 means something very different for a 5-character string versus a 500-character string, and this formula accounts for that.