php - How do I stop htmlPurifier from automatically decoding html entities?
I have a strange issue. I use CKEditor 4 to collect formatted text from users as HTML. The HTML content is then filtered on the server using HTMLPurifier.
When the user types quotes like ”, ’ and “, CKEditor converts them into HTML entities like &rdquo;, &rsquo; and &ldquo;, which is fine. The issue is that when I filter the content with HTMLPurifier, these quotes get automatically decoded. This prevents the content from being presented to the user for later editing, because the quotes end up literally encoded in strange ways like ???��?�
How do I fix this? I think that if I could stop HTMLPurifier from automatically decoding entities, this would work, but I am new to HTMLPurifier, so I can't find a way.
I have tried using htmlentities before passing the content to HTMLPurifier, but that encodes the whole HTML, markup included, which stops HTMLPurifier from purifying the HTML at all.
Answer
Solution:
After CBroe's comment, I found out that my application is not using UTF-8 all the way through, and I can't fix that. For those in a similar situation, I found a workaround: HTMLPurifier supports a configuration option that encodes all non-ASCII characters, with some trade-offs. That's fine in my case (I think).
You can enable the HTMLPurifier config option Core.EscapeNonASCIICharacters like so:
$config->set('Core.EscapeNonASCIICharacters', true);
This did the trick for me.
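For context, here is a minimal sketch of where that line might sit in a complete purification call. It assumes HTMLPurifier was installed through Composer (the vendor/autoload.php path is an assumption), and the $dirty string is a made-up sample rather than anything from my application.

```php
<?php
// Assumption: HTMLPurifier was installed with Composer; adjust the path,
// or require HTMLPurifier.auto.php instead if you use the standalone library.
require_once __DIR__ . '/vendor/autoload.php';

// Start from the library defaults.
$config = HTMLPurifier_Config::createDefault();

// Emit every non-ASCII character as a numeric entity in the output.
$config->set('Core.EscapeNonASCIICharacters', true);

$purifier = new HTMLPurifier($config);

// Hypothetical sample input, roughly as CKEditor might submit it.
$dirty = '<p>&ldquo;Hello&rdquo; <script>alert(1)</script></p>';

// Purify as usual; the setting only changes how non-ASCII output is written.
$clean = $purifier->purify($dirty);

echo $clean, PHP_EOL;
```

With this setting the curly quotes should come back as numeric entities (for example &#8220; rather than &ldquo;), so the stored text stays plain ASCII and should survive a database or connection that is not UTF-8. The trade-off is that every non-ASCII character gets escaped, including ones the target encoding could have represented directly.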