Regular expression in PHP deletes dots, commas, etc.

I have this code:

$allergens = ['mleko', 'masło'];
$words = explode(' ', $value); // create an array of words
foreach($words as $word) { //iterate through words
    $word = preg_replace('/[^\w]/uis', '', $word);
    if (in_array(mb_strtolower($word), $allergens)) {
        $return .= "<b>" . $word . "</b> ";
    } else {
        $return .= $word . " ";
    }
}

The above code works, but it also deletes punctuation characters such as colons, commas, and periods.

How can I fix it? :)

Answer

Solution:

The problem lies in the approach you used, and not only in the line

$word = preg_replace('/[^\w]/uis', '', $word);

which should be extended with the characters of the Polish alphabet (in its ASCII form the \w class covers only [a-zA-Z0-9_]; remember to list the lowercase and uppercase letters separately), as in this line

$word = preg_replace('/[^\wąćęłńóśźżĄĆĘŁŃÓŚŹŻ]/uis', '', $word);
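
As an alternative to listing every Polish letter by hand, the Unicode property classes \p{L} (any letter) and \p{N} (any number) can be used together with the u modifier. This is only a minimal sketch, assuming the input is valid UTF-8; the helper name stripNonWordChars is made up for illustration and is not part of the original code.

```php
<?php
// Hypothetical helper: keeps letters of any script, digits and underscores,
// and drops everything else (commas, periods, colons, ...).
function stripNonWordChars(string $word): string
{
    // \p{L} = any Unicode letter, \p{N} = any Unicode number; both need the u modifier.
    // preg_replace returns null on error (e.g. invalid UTF-8), so fall back to the input.
    return preg_replace('/[^\p{L}\p{N}_]/u', '', $word) ?? $word;
}

echo stripNonWordChars('masło,'), "\n";    // masło
echo stripNonWordChars('gryczaną.'), "\n"; // gryczaną
```

The character class from the line above can be swapped for this one without changing the rest of the approach.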

Moreover, I believe that the above line is used incorrectly. In my opinion, you should save the result of this operation in another variable as below

$rawWord = preg_replace('/[^\wąćęłńóśźżĄĆĘŁŃÓŚŹŻ]/uis', '', $word);

This gives you access to both the cleaned value and the original one, which you can use like this

if (in_array(mb_strtolower($rawWord), $allergens)) {
    $return .= str_replace($rawWord, "<b>{$rawWord}</b> ", $word);
} else {
    $return .= $word;
}

With this approach, however, you will still lose some characters, including the spaces that explode removed earlier. In my opinion, instead of concatenating a string, you should build an array and join it with spaces at the end. Complete code below.

$allergens = ['jogurt', 'jaja', 'żytni', "jogurt", "banan"];
$value = 'Chleb żytni, masło z mleka, jogurt naturalny z mleka, jaja, pieczeń rzymska z kaszą gryczaną.';

$returns = [];
$words = explode(' ', $value); // create an array of words
foreach($words as $word) { //iterate through words
    $rawWord = preg_replace('/[^\wąćęłńóśźżĄĆĘŁŃÓŚŹŻ]/uis', '', $word);
    if (in_array(mb_strtolower($rawWord), $allergens)) {
        $returns[] = str_replace($rawWord, "<b>{$rawWord}</b>", $word);
    } else {
        $returns[] = $word;
    }
}

$return = implode(' ', $returns);

Look at this line

$returns[] = str_replace($rawWord, "<b>{$rawWord}</b>", $word);

It searches the original word (which still contains the characters you want to ignore) for the cleaned word and replaces that part with its bold version. This keeps all characters (like commas) attached to the word.
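
To see the effect on a single token, here is a small sketch with hard-coded values; the variable $bold is introduced only for this illustration.

```php
<?php
$word = 'jaja,';   // original token with the comma still attached
$rawWord = 'jaja'; // cleaned token used for the allergen lookup

// Only the cleaned part is wrapped in <b>; the trailing comma stays where it was.
$bold = str_replace($rawWord, "<b>{$rawWord}</b>", $word);

echo $bold, "\n"; // <b>jaja</b>,
```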

In the $return variable at the end you will get something like this

Chleb <b>żytni</b>, masło z mleka, <b>jogurt</b> naturalny z mleka, <b>jaja</b>, pieczeń rzymska z kaszą gryczaną.
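
A different technique that avoids explode and implode altogether is preg_replace_callback, which can wrap matching words in place and leave everything between them (spaces, commas, periods) untouched. This is a sketch of that alternative, not the approach described above; the function name highlightAllergens is hypothetical, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper: wraps every word found in $allergens in <b> tags.
function highlightAllergens(string $text, array $allergens): string
{
    // Match runs of letters, digits and underscores; punctuation is never matched,
    // so it is copied to the result unchanged.
    $result = preg_replace_callback(
        '/[\p{L}\p{N}_]+/u',
        function (array $match) use ($allergens): string {
            $word = $match[0];
            // Compare case-insensitively by lowercasing the matched word.
            return in_array(mb_strtolower($word, 'UTF-8'), $allergens, true)
                ? "<b>{$word}</b>"
                : $word;
        },
        $text
    );

    // preg_replace_callback returns null on error; fall back to the input text.
    return $result ?? $text;
}

$allergens = ['jogurt', 'jaja', 'żytni', 'banan'];
$value = 'Chleb żytni, masło z mleka, jogurt naturalny z mleka, jaja, pieczeń rzymska z kaszą gryczaną.';

echo highlightAllergens($value, $allergens), "\n";
```

Because the text is never split, repeated spaces and line breaks in the input are preserved as well.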
