Edit Distance

Leave a comment

May 25, 2017 by oneOokay

Given two words word1 and word2, find the minimum number of steps required to convert word1 to word2. (each operation is counted as 1 step.)

You have the following 3 operations permitted on a word:

a) Insert a character
b) Delete a character
c) Replace a character


竟然从来没刷到过这个类型的题…大家dou说这个是一个classic dp problem.

int[][] dp = new int[len1 + 1][len2 + 1];

dp[i][j] 表示word1的substring:[第1个到第i个char]与word2的substring:[第1个到第j个char]的edit distance.

  • dp[0][0] = 0
  • dp[0][j]= j (j= [1 – len2] ):表示word1为””,显然此时任何word2的substring与word1之间的edit distance就是此时word2的substring的length
  • 同理:dp[i][0] = i ( i = [1 – len1] )

如果word1.charAt(i) == word2.charAt(j): dp[i][j] = dp[i – 1][j – 1]

如果不等的话:

replace cost dp[i][j] = dp[i – 1][j – 1] + 1.  i-1和j-1的distance + 1. 1 表示替换掉either i or j.

delete cost 和 insert cost: 这里我们要考虑最终edit的结果是把word1变成word2还是把word2变成word1.

把word1变成word2的话:

  • delete cost: delete word1: dp[i][j] = dp[i – 1][j] + 1: 删掉word1的i,剩下的word1(0, i – 1)与word2(0, j)的距离+1就是新的距离
  • insert cost: insert into word1: dp[i][j] = dp[i][j – 1] + 1;在word1的i之后加上一个char(word2的j) 这样word1 i +1就和word2 j相等了.此时的edit cost就是: word1(0,i)到word2(0,  j – 1)的距离+1
  • 综上:不等的时候: dp[i][j] = Math.min(dp[i-1][j-1] + 1, Math.min(dp[i – 1][j] + 1, dp[i][j – 1] + 1))

把word2变成word1的话:

  • delete cost: delete word2: dp[i][j] = dp[i][j – 1] + 1: 删掉word2的j,剩下的word2(0, j – 1)与word1(0, i)的距离+1就是新的距离
  • insert cost: insert into word2: dp[i][j] = dp[i – 1][j] + 1;在word2的j之后加上一个char(word1的i) 这样word2 j +1就和word1 i相等了.此时的edit cost就是: word2(0,j)到word1(0,  i – 1)的距离+1
  • 综上:不等的时候: dp[i][j] = Math.min(dp[i-1][j-1] + 1, Math.min(dp[i][j – 1] + 1, dp[i – 1][j] + 1))

所以把word2变成word1或者把word1变成word2,他们的insert / delete cost 都是一样的(恰好掉换了一下顺序而已)

代码:

public int minDistance(String word1, String word2) {
        int len1 = word1.length();
        int len2 = word2.length();
        int[][] dp = new int[len1 + 1][len2 + 1];
        dp[0][0] = 0;
        for (int i = 1; i <= len1; i ++){
            dp[i][0] = i;
        }
        for (int i = 1; i <= len2; i ++){
            dp[0][i] = i;
        }
        for (int i = 1; i <= len1; i ++){
            for (int j = 1; j <= len2; j ++){
                if (word1.charAt(i - 1) == word2.charAt(j - 1)){
                    dp[i][j] = dp[i - 1][j - 1];
                }else {
                    dp[i][j] = Math.min(Math.min(dp[i - 1][j - 1] + 1, dp[i - 1][j] + 1), dp[i][j - 1] + 1);
                }
            }
        }
        return dp[len1][len2];
    }

优化为一维矩阵:

推导dp[i][j]的时候依赖于dp[i-1][j-1](斜上方), dp[i][j-1](左边)和dp[i – 1][j](上面);

于是可以简化成一维矩阵.

public int minDistance(String word1, String word2) {
        int len1 = word1.length();
        int len2 = word2.length();
//用len1作为dp的长度说明了,我们要把word2变成word1.
        int[] dp = new int[len1 + 1]; 
        dp[0] = 0;
//初始化:相当于把word1[0,i]变成word2:""的edit distance
        for (int i = 1; i <= len1; i ++){
            dp[i] = i;
        }
//注意这里要把word2放到外层循环,word1放到内层循环.
//为了方便,与前面的for的变量相对应,所以外层用了j,内层用了i
//这个双重循环就是把word[0,j]变成word1的edit distance.
        for (int j = 1; j <= len2; j ++){
            int pre = dp[0];//在更新dp[0]的时候先把dp[0]存下来.
            dp[0] = j;//把word2[0,j]变成word1:""的edit distance
            for (int i = 1; i <= len1; i ++){
                int tmp = dp[i]; //这里dp[i]就是二维矩阵中上一层的dp[i-1][j]
                if (word2.charAt(j - 1) == word1.charAt(i - 1)){
                    dp[i] = pre; //pre是dp[i-1][j-1]
                }else {
                    //dp[i - 1]由于dp[i-1]已经被更新过了,所以其实是:dp[i][j-1]
                    dp[i] = Math.min(Math.min(pre + 1, dp[i] + 1), dp[i - 1] + 1);
                }
                pre = tmp;//pre是内层更新之前的i-1的值,所以就是dp[i-][j-1]
            }
        }
        return dp[len1];
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: