I have strings like 'ageage', 'feetfeetfeet', or 'cmcmcmcmcm' and would like to reduce these to 'age', 'feet', and 'cm' respectively.
This is an intermediate step in normalizing certain classes of data fields, which originally also contained numbers, so that they can be matched across different data sources. The numeric parts have already been moved into a separate string, and all Unicode letters have been transliterated to lowercase ASCII letters with:
public static function transliterate(string $value)
{
    $transliterator = Transliterator::createFromRules(
        ':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;',
        Transliterator::FORWARD
    );
    return $transliterator->transliterate($value);
}
Also note that pluralization doesn't matter: while the examples I've provided are in English, the project is normalizing mainly Turkish strings, where such words would always be singular.
I expect this can be done with a regex, though I'm not entirely sure how.
I assume a non-regex solution is OK.
This method loops over prefixes of up to half the string's length and looks for one that leaves an empty string when removed with str_replace().
If we find one, then we know the string is that prefix repeated.
$str = 'feetfeetfeet';
$return = $str; // return the full string if no repeating prefix is found
$len = strlen($str);
// <= so that a unit repeated exactly twice (e.g. 'ageage') is also found
for ($i = 1; $i <= $len / 2; $i++) {
    $sub = substr($str, 0, $i);
    if (str_replace($sub, "", $str) == "") {
        $return = $sub;
        break;
    }
}
echo $return; // feet
This looks similar to finding the longest prefix which is also a suffix. The answer is then the first (length - longest prefix which is also a suffix) characters of the string. The prefix-suffix table is the same one used by the Knuth-Morris-Pratt (KMP) string search algorithm.
Time complexity is O(n) and space complexity is O(n).
Snippet:
<?php
$str = "feetfeetfeet";
$length = strlen($str);
$prefix_suffix_table = array_fill(0, $length, 0);
$j = 0;
for ($i = 1; $i < $length; ++$i) {
    while ($j > 0 && $str[$i] != $str[$j]) {
        $j = $prefix_suffix_table[$j - 1];
    }
    if ($str[$i] == $str[$j]) {
        $prefix_suffix_table[$i] = ++$j;
    }
}
echo substr($str, 0, $length - end($prefix_suffix_table));
Demo: http://sandbox.onlinephpfunctions.com/code/b401c75cde38a51a561b53bb0a6294eb615b208c
Note: If your string is malformed, like xyz, which doesn't have a repeating substring, you can add an additional check using str_repeat() and throw an exception if required.
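The extra check mentioned in the note could look roughly like the sketch below. The function name smallestRepeatingUnit() and the choice of InvalidArgumentException are assumptions for illustration, not part of the original snippet. It throws when the candidate does not rebuild the input (for example 'abcab', where the table alone would suggest 'abc'), while a string such as 'xyz' is accepted as a single repetition of itself.

```php
<?php
// Hypothetical helper; the name and the exception type are assumptions.
function smallestRepeatingUnit(string $str): string
{
    $length = strlen($str);
    if ($length === 0) {
        return $str;
    }

    // Build the prefix-suffix table the same way as the snippet above.
    $table = array_fill(0, $length, 0);
    $j = 0;
    for ($i = 1; $i < $length; ++$i) {
        while ($j > 0 && $str[$i] !== $str[$j]) {
            $j = $table[$j - 1];
        }
        if ($str[$i] === $str[$j]) {
            $table[$i] = ++$j;
        }
    }

    // Candidate unit: the string minus its longest prefix that is also a suffix.
    $unitLength = $length - $table[$length - 1];
    $unit = substr($str, 0, $unitLength);

    // Extra check: the candidate must rebuild the input when repeated.
    if ($length % $unitLength !== 0
        || str_repeat($unit, intdiv($length, $unitLength)) !== $str) {
        throw new InvalidArgumentException("'$str' is not a repeated substring");
    }

    return $unit;
}

echo smallestRepeatingUnit('feetfeetfeet'), "\n"; // feet
```

A stricter caller could additionally reject results where the unit equals the whole input, if a single repetition should count as malformed.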
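You can also use str_split() to convert the string into an array, keep only its unique elements with array_unique(), and implode() them back together. Note that this only works when no letter occurs more than once in the repeated unit: 'feetfeetfeet' would become 'fet'.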
<?php
$str = array_unique(str_split('ageage'));
$result = implode($str);
echo $result;
?>
Output
age
I have figured out how to do this with a regex, even though I have realized that it might not be useful for my purposes, because mmmm can be either 2x mm (millimeters) or 4x m (meters). If I only care about supporting up to 3 repetitions, I can use:
if (preg_match('/^([a-z]*)\1{2}$/', $input, $matches)) {
    $repeating = $matches[1];
    $reps = 3;
} elseif (preg_match('/^([a-z]*)\1$/', $input, $matches)) {
    $repeating = $matches[1];
    $reps = 2;
} else {
    $repeating = $input;
    $reps = 1;
}
Note that the following will divide the string into the smallest prime number of repeats:
preg_match('/^([a-z]*)\1+$/', $input, $matches);
$repeating = $matches[1];
Here is a table of the outputs of this:
+------------+------------+
| $input     | $repeating |
+------------+------------+
| mm         | m          |
| mmm        | m          |
| mmmm       | mm         |
| mmmmm      | m          |
| mmmmmm     | mmm        |
| mmmmmmm    | m          |
| mmmmmmmm   | mmmm       |
| mmmmmmmmm  | mmm        |
| mmmmmmmmmm | mmmmm      |
+------------+------------+
Because only the smallest prime subdivisions are considered,
preg_match('/^([a-z]*)\1{1,2}$/', $input, $matches)
is unsuitable: as in the table above, it will find the repeating part of 'mmmmmm' to be 'mmm' instead of the desired 'mm'.
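To make the ambiguity concrete, here is a small sketch that lists every way a string can be split into two or more equal repeats. The helper name repeatSplits() and its return shape are assumptions made for illustration only.

```php
<?php
// Hypothetical helper: returns every [unit, repetitions] pair with at least
// two repeats, ordered from the shortest unit to the longest.
function repeatSplits(string $input): array
{
    $length = strlen($input);
    $splits = [];
    for ($unitLength = 1; $unitLength <= intdiv($length, 2); ++$unitLength) {
        if ($length % $unitLength !== 0) {
            continue; // the unit length must divide the string evenly
        }
        $unit = substr($input, 0, $unitLength);
        $reps = intdiv($length, $unitLength);
        if (str_repeat($unit, $reps) === $input) {
            $splits[] = [$unit, $reps];
        }
    }
    return $splits;
}

foreach (repeatSplits('mmmmmm') as [$unit, $reps]) {
    echo $unit, ' x ', $reps, "\n";
}
// m x 6
// mm x 3
// mmm x 2
```

The greedy regex above picks the last of these candidates, while the three-case implementation picks the one with exactly three repeats when it exists.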
The three-case implementation I provided at the beginning is what I am currently using, because my input is generally either age groups or product dimensions, and I have yet to see a product described with more than three dimensions or with an age group like '11yr,12yr,13yr,14yr', though I can imagine something like the latter, however rare, eventually occurring. Thus I will probably move away from this method and switch to extracting the units from the original string containing the numbers with preg_match_all:
preg_match_all('/([0-9]+)\s*([a-z]*)\s*/', $input, $matches)
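As a rough illustration of that approach, the sketch below runs the same pattern on a made-up dimension string. The sample input is an assumption, since the real source data is not shown here.

```php
<?php
// Assumed sample input; the actual data format is not shown in the thread.
$input = '10cm x 20cm x 30cm';

preg_match_all('/([0-9]+)\s*([a-z]*)\s*/', $input, $matches);

$numbers = $matches[1];                           // numeric parts, in order
$units = array_values(array_unique($matches[2])); // distinct units

echo implode(',', $numbers), "\n"; // 10,20,30
echo implode(',', $units), "\n";   // cm
```

Because each unit is captured next to its number, there is no repeated string to collapse afterwards.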
However, in case someone else is actually interested in finding the smallest repeating substring ('m' for 'mmmm'), this can be done with a regex in a loop:
$repeating = $input;
// + rather than *, so that an empty string cannot match and loop forever
while (preg_match('/^([a-z]+)\1+$/', $repeating, $matches)) {
    $repeating = $matches[1];
}
This will produce:
+------------+------------+
| $input     | $repeating |
+------------+------------+
| mm         | m          |
| mmm        | m          |
| mmmm       | m          |
| mmmmm      | m          |
| mmmmmm     | m          |
| mmmmmmm    | m          |
| mmmmmmmm   | m          |
| mmmmmmmmm  | m          |
| mmmmmmmmmm | m          |
| cmcm       | cm         |
| cmcmcm     | cm         |
| cmcmcmcm   | cm         |
| cmcmcmcmcm | cm         |
+------------+------------+
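If the repetition count is needed as well, the loop can be wrapped in a function along these lines. This is only a sketch: the name reduceRepeats() and the [unit, count] return shape are assumptions, and the pattern uses + so that an empty input ends the loop immediately.

```php
<?php
// Hypothetical wrapper; the name and return shape are assumptions.
function reduceRepeats(string $input): array
{
    $repeating = $input;
    // Keep reducing until the string is no longer a repeat of a shorter unit.
    while (preg_match('/^([a-z]+)\1+$/', $repeating, $matches)) {
        $repeating = $matches[1];
    }
    // The count follows from the lengths; an empty input has no repetitions.
    $reps = $repeating === '' ? 0 : intdiv(strlen($input), strlen($repeating));
    return [$repeating, $reps];
}

[$unit, $reps] = reduceRepeats('cmcmcmcm');
echo $unit, ' x ', $reps, "\n"; // cm x 4
```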