Regex PHP Function
Yes, I know that people don't like parsing PHP with regex. Use a tokenizer, they said; it will be great, they said... I want you to know it isn't great. It isn't even fine.
I am working in .NET with PCRE-NET and want to parse some PHP functions to see whether I can do some PHP tree shaking.
I tried using CodeParser, which uses Antlr4 to tokenize, but the results I got back were horrible to navigate. Yes, it is all there technically, but it is so convoluted that regex really is better for what I am looking for.
I have the following regex working:
(?<functionScope>\w+)\s*function\s+(?<functionName>\w+)\s*\((?<functionArguments>(?:[^()]+)*)?\s*\)[\s:]*.*(?<functionBody>{(?:[^{}]+|(?-1))*+})
Try it out: https://regex101.com/r/yU6K45/1
This breaks a PHP file up into the individual scopes, functions, arguments, and function bodies. I am now looking at the functionBody group and want to find all functions called inside that function, which I have here:
(?=[^\=\s])((?<functionClass>[$?\w[\w\d]*)?(?<ClassOperator>::|->|\\)?){0,3}?(?<functionName>\w[\w\d]*)\((?<Arguments>.*)?\)
See it at: https://regex101.com/r/3JzPR5/1
The issue I am having is with named groups. When a call is heavily namespaced, the named groups don't capture each level separately. Do you have any ideas for how to split up a line like this:
$uri = ExtraLevel\Psr7\UriResolver::resolve(Psr7\Utils::uriFor($config['base_uri']), $uri);
so that I would get something like:
Full match ExtraLevel\Psr7\UriResolver::resolve(Psr7\Utils::uriFor($config['base_uri']), $uri)
Group `functionClass` ExtraLevel\
Group `functionClass2` Psr7\
Group `functionClass3` UriResolver::
Group `functionName` resolve
Group `Arguments` Psr7\Utils::uriFor($config['base_uri']), $uri
I would love to match in a way that won't break when there aren't exactly 3-4 levels.
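For illustration, here is a minimal sketch of one way to get that split without knowing the depth in advance. It relies on a two-pass idea: capture the whole qualifier chain in a single group, then split that group into segments with a second, simpler pattern. This sidesteps the fact that in PCRE a repeated capturing group only keeps its last iteration. The sketch is written in Python purely as a neutral demonstration, not as PCRE-NET code; the names `SEGMENT`, `CALL`, `find_closing_paren`, and `first_call` are hypothetical, and the argument list is found with a plain parenthesis counter rather than a recursive pattern.

```python
import re

# One link of the qualifier chain: a name (optionally starting with "$")
# followed by one operator: "\" (namespace), "::" (static), "->" (instance).
SEGMENT = r"\$?\w+(?:::|->|\\)"

# The entire chain is captured in ONE group ("qualifier"); it is split into
# its segments afterwards, so any number of levels (including zero) works.
CALL = re.compile(r"(?P<qualifier>(?:" + SEGMENT + r")*)(?P<functionName>\w+)\(")


def find_closing_paren(text, open_pos):
    """Return the index of the ")" that closes the "(" at open_pos, or -1.

    Assumption: parentheses inside string literals or comments are not
    handled; a real PHP body would need extra care there.
    """
    depth = 0
    for i in range(open_pos, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    return -1


def first_call(line):
    """Split the first (outermost) call found on a line into its parts."""
    m = CALL.search(line)
    if m is None:
        return None
    close = find_closing_paren(line, m.end() - 1)  # m.end() - 1 is the "("
    if close == -1:
        return None
    return {
        "full": line[m.start():close + 1],
        # Second pass: break the qualifier chain into individual segments.
        "classes": re.findall(SEGMENT, m.group("qualifier")),
        "functionName": m.group("functionName"),
        "arguments": line[m.end():close],
    }


LINE = r"$uri = ExtraLevel\Psr7\UriResolver::resolve(Psr7\Utils::uriFor($config['base_uri']), $uri);"
parts = first_call(LINE)
for index, segment in enumerate(parts["classes"], start=1):
    print(f"functionClass{index}: {segment}")
print("functionName:", parts["functionName"])
print("Arguments:", parts["arguments"])
```

Because the segments come out as a list, the same code handles a bare `strlen($x)` (empty list) and a deeper chain alike. Nested calls such as `Psr7\Utils::uriFor(...)` could be found by running `first_call` again on the returned `arguments` string.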