php - How do I stop htmlPurifier from automatically decoding html entities?

I have a strange issue. I use CKEditor 4 to collect formatted text from users as HTML. The HTML content is then filtered with HTML Purifier on the server.

When the user types curly quotes like ”, ’ and “, CKEditor converts them into HTML entities like &rdquo;, &rsquo; and &ldquo;, which is fine. The issue is that when I filter the content with HTML Purifier, these entities get decoded automatically. This prevents the content from being presented to the user for later editing, because the quotes then show up garbled, as runs of stray characters instead of the original quotes.

How do I fix this? I think it would work if I could stop HTML Purifier from automatically decoding entities, but I am new to HTML Purifier and can't find a way to do that.

I have tried calling htmlentities on the content before passing it to HTML Purifier, but that encodes the whole HTML, tags included, so HTML Purifier no longer purifies anything.

Answer

Solution:

After CBroe's comment, I found out that my application is not using UTF-8 all the way through.

I can't fix that in my application, either. For anyone in a similar situation, here is a workaround: HTML Purifier has a configuration option that escapes all non-ASCII characters, with some trade-offs. That is fine in my case (I think).

You can enable the HTML Purifier config directive Core.EscapeNonASCIICharacters like so:

$config->set('Core.EscapeNonASCIICharacters', true);

which did the trick for me.
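
For context, here is a minimal sketch of where that line sits in a complete purify call. It assumes HTML Purifier was installed through Composer (the autoload path below is an assumption), and the sample input string is made up for illustration; it is not taken from my application.

```php
<?php
// Assumption: HTML Purifier installed via Composer; adjust the path to your setup.
require_once __DIR__ . '/vendor/autoload.php';

// Start from the library defaults.
$config = HTMLPurifier_Config::createDefault();

// Ask HTML Purifier to emit non-ASCII characters as numeric entities
// instead of raw bytes, so a non-UTF-8 storage layer cannot mangle them.
$config->set('Core.EscapeNonASCIICharacters', true);

$purifier = new HTMLPurifier($config);

// Hypothetical input, similar to what CKEditor might submit.
$dirty = '<p>&ldquo;Hello&rdquo;, it&rsquo;s me</p>';

// The purified output should now contain only ASCII characters.
$clean = $purifier->purify($dirty);

echo $clean;
```

With the directive enabled, the curly quotes should come back as numeric entities rather than raw characters, so they survive storage and can be loaded back into the editor. The named entities such as &rdquo; are not preserved as written.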
