php - Remove all spaces from a string that are not enclosed in single quotes or double quotes
Solution:
You can use
preg_replace('~(?<!\\\\)(?:\\\\{2})*(?:"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\')(*SKIP)(?!)|\s+~s', '', $str)
See the PHP demo and a regex demo.
Details
(?<!\\)(?:\\{2})*
- a check that there is no escaping \ immediately on the left: any number of double backslashes not preceded by \
(?:"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*')
- either a double- or single-quoted string literal, allowing escape sequences
(*SKIP)(?!)
- skip the match and start a new search from the location where the regex failed
|
- or
\s+
- 1 or more whitespace characters.
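A minimal sketch of the call in use. The sample input and the variable names are assumptions made for illustration; the pattern is the one given in the solution.

```php
<?php
// Hypothetical sample input, written as a nowdoc so no PHP-level escaping is needed.
$str = <<<'TXT'
foo = "a  b" + 'c \' d' ;
TXT;

// Quoted literals (including the escaped \' inside the second one) are skipped;
// every whitespace run outside of them is removed.
$result = preg_replace(
    '~(?<!\\\\)(?:\\\\{2})*(?:"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\')(*SKIP)(?!)|\s+~s',
    '',
    $str
);

echo $result, "\n"; // foo="a  b"+'c \' d';
```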
Note that a backslash in a single-quoted PHP string literal is used to form string escape sequences, so a literal backslash is "coded" as a double backslash. The regex engine in turn needs two backslashes to match one literal backslash in the text, hence "\\\\" is used.
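The two layers of escaping can be checked directly. This is a small sketch with illustrative variable names, not part of the original answer:

```php
<?php
// PHP layer: inside single quotes, each \\ pair becomes one backslash,
// so this literal holds two backslash characters (the regex token \\).
$regexToken = '\\\\';
$tokenLength = strlen($regexToken);
var_dump($tokenLength); // int(2)

// Regex layer: the two-character token \\ matches ONE literal backslash.
$subject = 'a\\b'; // the string a\b, three characters long
$matched = preg_match('~\\\\~', $subject);
var_dump($matched); // int(1)
```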
Answer
Solution:
You could capture either " or ' in a group and consume any escaped variants of each until encountering the matching closing ' or " using a backreference \1
(?<!\\)(['"])(?:(?!(?:\1|\\)).|\\.)*+\1(*SKIP)(*FAIL)|\h+
Explanation
(?<!\\)
- Negative lookbehind, assert that there is no \ directly to the left
(['"])
- Capture group 1, match either ' or "
(?:
- Non-capturing group
(?!(?:\1|\\)).
- Negative lookahead: if what is directly to the right is neither the value of group 1 nor a backslash, match any char except a newline
|
- Or
\\.
- Match an escaped character
)*+
- Close the non-capturing group and repeat it 0 or more times (possessive, no backtracking)
\1
- Backreference to what is captured in group 1 (match the closing ' or ")
(*SKIP)(*FAIL)
- Skip the match until now. Read more about (*SKIP)(*FAIL)
|
- Or
\h+
- Match 1+ horizontal whitespace chars that you want to remove
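As a sketch, this is how the pattern might be applied with preg_replace(). The sample input and variable names are assumptions; the pattern is kept in a nowdoc so the backslashes do not have to be doubled for PHP, with both alternations (`|`) written out.

```php
<?php
// The pattern inside ~...~ delimiters; a nowdoc performs no escape processing.
$pattern = <<<'RE'
~(?<!\\)(['"])(?:(?!(?:\1|\\)).|\\.)*+\1(*SKIP)(*FAIL)|\h+~
RE;

// Hypothetical sample input with a single-quoted and a double-quoted section.
$input = <<<'TXT'
a  b 'c  d' "e \" f" g
TXT;

$result = preg_replace($pattern, '', $input);

echo $result, "\n"; // ab'c  d'"e \" f"g
```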
As @Wiktor Stribiżew points out in his comment:
In some rare situations, this might match at a wrong position, namely, if there is a literal backslash (not an escaping one) before a single/double quoted string that should be skipped. You need to add (?:\\{2})* after (?<!\\)
The pattern would then be:
(?<!\\)(?:\\{2})*(['"])(?:(?!(?:\1|\\)).|\\.)*+\1(*SKIP)(*FAIL)|\h+
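The difference shows up on an input where a literal (escaped) backslash sits directly before a quoted section. The following comparison is a sketch with made-up sample data and variable names:

```php
<?php
$withoutFix = <<<'RE'
~(?<!\\)(['"])(?:(?!(?:\1|\\)).|\\.)*+\1(*SKIP)(*FAIL)|\h+~
RE;
$withFix = <<<'RE'
~(?<!\\)(?:\\{2})*(['"])(?:(?!(?:\1|\\)).|\\.)*+\1(*SKIP)(*FAIL)|\h+~
RE;

// Hypothetical input: two backslashes (one escaped backslash), then "b c".
$input = <<<'TXT'
x \\"b c" d
TXT;

// Without the fix, the opening quote looks escaped, the quoted section is
// not skipped, and the space inside it is removed.
$broken = preg_replace($withoutFix, '', $input);
echo $broken, "\n"; // x\\"bc"d

// With the fix, the backslash pair is consumed first and the section is kept.
$fixed = preg_replace($withFix, '', $input);
echo $fixed, "\n"; // x\\"b c"d
```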
Answer
Solution:
Here is a 3 step approach:
- replace spaces in quote sections with placeholder
- remove all spaces
- restore spaces in quote sections
$str = 'abc | xx ?? "1 x \' 3" d e f \' y " 5 \' x yz';
echo 'input: ' . $str . "\n";
$result = preg_replace_callback( // replace spaces in quote sections with placeholder
    '|(["\'])(.*?)(\1)|',
    function ($matches) {
        $s = preg_replace('/ /', "\x01", $matches[2]);
        return $matches[1] . $s . $matches[3];
    },
    $str
);
$result = preg_replace('/ /', '', $result); // remove all spaces
$result = preg_replace('/\x01/', ' ', $result); // restore spaces in quote sections
echo 'result: ' . $result;
echo "\nexpect: " . 'abc|xx??"1 x \' 3"def\' y " 5 \'xyz';
Output:
input: abc | xx ?? "1 x ' 3" d e f ' y " 5 ' x yz
result: abc|xx??"1 x ' 3"def' y " 5 'xyz
expect: abc|xx??"1 x ' 3"def' y " 5 'xyz
Explanation:
- replace spaces in quote sections with placeholder
  - use preg_replace_callback()
  - '|(["\'])(.*?)(\1)|' matches quote sections starting and ending with either " or '
  - the (\1) makes sure to match the closing quote (either " or ')
  - within the callback, use preg_replace() to replace all spaces with a non-printable replacement "\x01"
- remove all spaces
  - use preg_replace() to remove all spaces
  - the replace does not match the replacement "\x01", thus misses spaces in quote sections
- restore spaces in quote sections
  - use preg_replace() to restore all spaces from replacement "\x01"
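The three steps can also be wrapped in a reusable function. In this sketch the function name and the $placeholder parameter are hypothetical, and str_replace() is swapped in for preg_replace() in the fixed-string replacements since no regex is needed there. It assumes the input never contains the placeholder byte, and, like the pattern above, it does not treat backslash-escaped quotes specially.

```php
<?php
// Hypothetical helper; the name and the $placeholder parameter are illustrative.
function removeSpacesOutsideQuotes(string $str, string $placeholder = "\x01"): string
{
    // Step 1: replace spaces inside quote sections with the placeholder.
    $protected = preg_replace_callback(
        '|(["\'])(.*?)\1|',
        function (array $m) use ($placeholder): string {
            return $m[1] . str_replace(' ', $placeholder, $m[2]) . $m[1];
        },
        $str
    );

    // Step 2: remove all remaining spaces (those outside quote sections).
    $stripped = str_replace(' ', '', $protected);

    // Step 3: turn the placeholders back into spaces.
    return str_replace($placeholder, ' ', $stripped);
}

echo removeSpacesOutsideQuotes('say "a b" and \'c d\' now'), "\n"; // say"a b"and'c d'now
```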