Wildcard Matching/Regular Expression

Leave a comment

May 5, 2017 by oneOokay

  • Regular Expression Matching
  • Wildcard Matching

Wildcard matching:Implement wildcard pattern matching with support for '?' and '*'.

'?' Matches any single character.
'*' Matches any sequence of characters (including the empty sequence).

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "*") → true
isMatch("aa", "a*") → true
isMatch("ab", "?*") → true
isMatch("aab", "c*a*b") → false

与RegularExpression的区别在于: *跟它之前的char没有任何关系. 同样的用DP.
  • 一个boolean[][] dp. m和n各为s,p的length还是length+1?
  • 首先明确一下dp的意义: dp[i][j]表示:s:从0到第i个字符与p:从0到第j个字符是否match.
    • 当s(i – 1) == p(j – 1)||p(j) == ‘?’: dp[i][j] = dp[i-1][j-1].
  • 否则当p(j) == ‘*’时:
    • 当 ‘*’ match empty string: dp[i][j] = dp[i][j-1]
    • 当 ‘*’ match多个字符的时候: dp[i][j] = dp[i – 1][j]
      • 当 ‘*’ match s 中的1个字符的时候: dp[i][j] = dp[i – 1][j – 1]
      • match s中的2个字符的时候: dp[i][j] = dp[i – 2][j – 1]
      • match s中的3个字符的时候: dp[i][j] = dp[i – 3][j – 1]
      • match s中的i个字符的时候: dp[i][j] = dp[0][j – 1].
    • 所以dp[i][j] = dp[i][j – 1] ||dp[0到i-1][j-1]是否存在一个True. 也就是这一列dp[0到i][j-1]是否存在一个True. 由递推可知: dp[i – 1][j]的值等于dp[0到i-1][j]是否存在一个True.(因为p(j) == ‘*’) .所以可以简化为dp[i][j] = dp[i][j – 1] || dp[i – 1][j].
  • 由上面的递推公式可以知道需要初始化dp[0][i]和dp[i][0]并且dp[s.length + 1][p.length + 1].
    • dp[0][0] = true.
    • 当p为空的时候dp[i][0] (i > 0)均为false,不需要初始化.
    • 当s为空的时候dp[[0][i] (i > 0):需要考虑特殊情况当p(j) == ‘*’的时候. 根据递推公式带入:dp[0][i] = dp[0][i – 1](因为dp[-1][i] is not valid).

 

public boolean isMatch(String s, String p) {
        int m = s.length();
        int n = p.length();
        boolean[][] dp = new boolean[m + 1][n + 1];
        dp[0][0] = true;
        
        for (int i = 1; i <= n; i ++){
            if (p.charAt(i - 1) == '*')
                dp[0][i] = dp[0][i - 1];
        }
        
        for (int i = 1; i <= m; i ++){
            for (int j = 1; j <= n; j ++){
                if (p.charAt(j - 1) == '?' || p.charAt(j - 1) == s.charAt(i - 1))
                    dp[i][j] = dp[i - 1][j - 1];
                else if (p.charAt(j - 1) == '*'){
                    dp[i][j] = dp[i][j - 1] || dp[i - 1][j];
                }
            }
        }
        return dp[m][n];
    }

Regular Expression Matching:

Implement regular expression matching with support for '.' and '*'.

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "a*") → true
isMatch("aa", ".*") → true
isMatch("ab", ".*") → true
isMatch("aab", "c*a*b") → true

和Wildcard matching情况差不多一个boolean[][] dp: m = slen + 1; n = plen + 1. 特殊情况在于’A*’的match情况: ‘A*’可以match: “”, “A”,”AA”,…”A…..A”(n个A)

  1. 当p.charAt(j) == s.charAt(i)或者p.charAt(j) == ‘.’时: dp[i][j] = dp[i-1][j-1]
  2. 当p.charAt(j) == ‘*’:
    1. 把’a*’当成empty string:此时: dp[i][j] = dp[i][j – 2];
    2. 这里还需要对*之前的char和当前s在i处的char进行判断.只有当*之前的char match s(i) (注意特殊情况:*之前是’.’) 的情况下:
      1. dp[i][j]= dp[i – 1][j] || dp[i][j – 1].这里和wildcard matching相同.

初始化同样的需要初始化dp[0][i]和dp[i][0]. 虽然有涉及到j-2但是不需要担心,因为input一直都是valid的,就不会存在j-2 out of index range的情况.

然后答案在p.charAt(j – 1) == ‘*’ 前p前一个char和s当前的char相同的情况下dp[i][j] |= dp[i][j – 1]只需要看它上一行同一列的元素即可.为什么不用开同一行的左边一个元素(dp[i][j-1])呢?

因为dp[i-1][j]的值取决于dp[i – 1][j – 2]和在p.charAt(i – 2)与s.charAt(j-2)相等的情况下,

 

    public boolean isMatch(String s, String p) {
        int m = s.length();
        int n = p.length();
        boolean[][] dp = new boolean[m + 1][n + 1];
        dp[0][0] = true;
        
        for (int i = 2; i <= n; i ++){
            if (p.charAt(i - 1) == '*'){
                    dp[0][i] = dp[0][i - 2];
            }
        }
        for (int i = 1; i <= m; i ++){
            for (int j = 1; j <= n; j ++){
                if (s.charAt(i - 1) == p.charAt(j - 1) || p.charAt(j - 1) == '.'){
                    dp[i][j] = dp[i - 1][j - 1];
                }else if (p.charAt(j - 1) == '*'){
                    dp[i][j] = dp[i][j - 2];
                    if (p.charAt(j - 2) == s.charAt(i - 1) || p.charAt(j - 2) == '.')
                        dp[i][j] |= dp [i - 1][j];
                }
            }
        }
        return dp[m][n];
    }
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: