php - Remove all spaces from a string that are not enclosed in single quotes or double quotes
Solution:
You can use
preg_replace('~(?<!\\\\)(?:\\\\{2})*(?:"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\')(*SKIP)(?!)|\s+~s', '', $str)
See the PHP demo and a regex demo.
Details
- (?<!\\)(?:\\{2})* - a check that there is no escaping \ immediately on the left: any number of double backslashes not preceded by \
- (?:"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*') - either a double- or single-quoted string literal, allowing escape sequences
- (*SKIP)(?!) - skip the match and start a new search from the location where the regex failed
- | - or
- \s+ - 1 or more whitespace characters.
Note that a backslash in a single-quoted PHP string literal is used to form string escape sequences, so a literal backslash is "coded" as a double backslash. The regex engine in turn needs two backslashes to match one literal backslash in the text, hence \\\\ is used in the PHP code.
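To see the pattern at work, here is a minimal sketch that wraps it in a function. The name strip_unquoted_whitespace() and the sample inputs are assumptions made for illustration; only the regex itself comes from the answer.

```php
<?php
// Hypothetical helper name; the pattern is the one given in the answer.
function strip_unquoted_whitespace(string $str): string
{
    // Quoted strings (with escape sequences) are matched and then skipped
    // via (*SKIP)(?!); whatever whitespace is left gets matched by \s+.
    $pattern = '~(?<!\\\\)(?:\\\\{2})*(?:"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\')(*SKIP)(?!)|\s+~s';

    $result = preg_replace($pattern, '', $str);
    if ($result === null) {
        // preg_replace() returns null when the regex engine reports an error.
        throw new RuntimeException('preg_replace failed with error code ' . preg_last_error());
    }
    return $result;
}

echo strip_unquoted_whitespace('a b "c d" e \'f  g\' h'), "\n"; // ab"c d"e'f  g'h
echo strip_unquoted_whitespace('x "a \" b" y'), "\n";             // x"a \" b"y
```

Because the pattern ends in \s+, it removes every kind of whitespace outside quotes (tabs and line breaks included), not only the space character.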
Answer
Solution:
You could capture either " or ' in a group and consume any escaped variants of each until encountering the matching closing ' or " using a backreference \1
(?<!\\)(['"])(?:(?!(?:\1|\\)).|\\.)*+\1(*SKIP)(*FAIL)|\h+
Explanation
- (?<!\\) Negative lookbehind, assert that there is no \ directly to the left
- (['"]) Capture group 1, match either ' or "
- (?: Non-capture group
  - (?!(?:\1|\\)). If what is directly to the right is neither the value in group 1 nor a backslash, match any char except a newline
  - | Or
  - \\. Match an escaped character
- )*+ Close the non-capture group and repeat it 0+ times (possessive)
- \1 Backreference to what is captured in group 1 (match the same ' or ")
- (*SKIP)(*FAIL) Skip the match until now
- | Or
- \h+ Match 1+ horizontal whitespace chars that you want to remove
As @Wiktor Stribiżew points out in his comment
In some rare situations, this might match at a wrong position, namely, if there is a literal backslash (not an escaping one) before a single/double quoted string that should be skipped. You need to add (?:\\{2})* after (?<!\\)
The pattern would then be:
(?<!\\)(?:\\{2})*(['"])(?:(?!(?:\1|\\)).|\\.)*+\1(*SKIP)(*FAIL)|\h+
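As a rough sketch of how this amended pattern could be used from PHP: the function name remove_unquoted_spaces() and the sample strings are assumptions for illustration. The regex sits in a nowdoc so that no backslashes need doubling at the PHP level.

```php
<?php
// Hypothetical helper name; the regex is the amended backreference pattern.
function remove_unquoted_spaces(string $str): string
{
    // Nowdoc: the text is taken literally, so the regex reads as written.
    $pattern = <<<'REGEX'
~(?<!\\)(?:\\{2})*(['"])(?:(?!(?:\1|\\)).|\\.)*+\1(*SKIP)(*FAIL)|\h+~
REGEX;

    $result = preg_replace($pattern, '', $str);
    if ($result === null) {
        // preg_replace() returns null when the regex engine reports an error.
        throw new RuntimeException('preg_replace failed with error code ' . preg_last_error());
    }
    return $result;
}

echo remove_unquoted_spaces("a  b 'c  d' e"), "\n";    // ab'c  d'e
echo remove_unquoted_spaces('say "a \" b" now'), "\n"; // say"a \" b"now
// Two literal backslashes directly before a quoted string:
echo remove_unquoted_spaces('\\\\"a b" c'), "\n";      // \\"a b"c
```

The last call is the case from the comment: without (?:\\{2})* the lookbehind rejects the opening quote, and the space inside "a b" is removed as well. Note that \h+ only matches horizontal whitespace, so line breaks are left in place, and since there is no s flag a quoted string that spans a line break is not protected.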
Answer
Solution:
Here is a 3 step approach:
- replace spaces in quote sections with placeholder
- remove all spaces
- restore spaces in quote sections
$str = 'abc | xx ?? "1 x \' 3" d e f \' y " 5 \' x yz';
echo 'input: ' . $str . "\n";
$result = preg_replace_callback( // replace spaces in quote sections with placeholder
    '|(["\'])(.*?)(\1)|',
    function ($matches) {
        $s = preg_replace('/ /', "\x01", $matches[2]);
        return $matches[1] . $s . $matches[3];
    },
    $str
);
$result = preg_replace('/ /', '', $result); // remove all spaces
$result = preg_replace('/\x01/', ' ', $result); // restore spaces in quote sections
echo 'result: ' . $result;
echo "\nexpect: " . 'abc|xx??"1 x \' 3"def\' y " 5 \'xyz';
Output:
input: abc | xx ?? "1 x ' 3" d e f ' y " 5 ' x yz
result: abc|xx??"1 x ' 3"def' y " 5 'xyz
expect: abc|xx??"1 x ' 3"def' y " 5 'xyz
Explanation:
- replace spaces in quote sections with placeholder
  - use a preg_replace_callback()
  - '|(["\'])(.*?)(\1)|' matches quote sections starting and ending with either " or '
  - the (\1) makes sure to match the closing quote (either " or ')
  - within the callback, use preg_replace() to replace all spaces with a non-printable replacement "\x01"
- remove all spaces
  - use preg_replace() to remove all spaces
  - the replace does not match the replacement "\x01", thus misses spaces in quote sections
- restore spaces in quote sections
  - use preg_replace() to restore all spaces from replacement "\x01"