da: (bit)
[personal profile] da
I'm having a devil of a time with some PHP to do Unicode processing and display.

Do you know how to turn the Unicode representation of "Ğ" into its HTML entity (&#286;)? The character is from ISO-8859-9 (Latin 5), and it's one of a few I'm having trouble converting, because they're not in Latin 1. And heaven forfend we actually want to use those other characters.

You'd *think* (or at least I would think) htmlentities() would do the job, but htmlentities($string, ENT_QUOTES, 'UTF-8'); leaves the Unicode string unchanged. get_html_translation_table(HTML_ENTITIES) suggests htmlentities() only has about 100 mappings, which is a disappointment.
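A minimal sketch for reproducing that observation, assuming a PHP build where htmlentities() defaults to the HTML 4.01 entity table (PHP 5.4 and later, as far as I know). The variable names are illustrative, not from my actual code.

```php
<?php
$sample = 'Ğ'; // U+011E, illustrative input

// The HTML 4.01 table has no named entity for this character,
// so htmlentities() should hand the string back unchanged.
$encoded = htmlentities($sample, ENT_QUOTES, 'UTF-8');
var_dump($encoded === $sample);

// How many mappings htmlentities() knows about for this table.
$table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8');
echo count($table), "\n";
```

The count depends on the PHP version and on the flags passed, so treat the number it prints as specific to your install.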

I've browsed all sorts of PHP and Perl docs, as well as straight references for ISO-8859-*, including some which say "here are the HTML mappings for a number of UTF-8 characters" - but I haven't found an anywhere-near-useful set of UTF-8 to HTML entities.

This seems like a bug.

Halp?

[Edit to add: I found this, which is Perl to convert entities to LaTeX, and maybe I need to hack that up to produce a simple array myself?... Hm.]

Date: Tuesday, 28 September 2010 09:34 pm (UTC)
From: [identity profile] da-lj.livejournal.com
Ah, I see that now.

What I ended up doing was

mb_convert_encoding($element,'HTML-ENTITIES','UTF-8');

which fit the bill exactly (it uses named entities where available, and numeric entities where names aren't available).
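For anyone on a newer PHP: as far as I know, the 'HTML-ENTITIES' pseudo-encoding is deprecated as of PHP 8.2. A rough alternative sketch, assuming the mbstring extension is loaded and that numeric entities alone are acceptable (unlike the call above, this never produces named entities). The helper name utf8_to_numeric_entities is hypothetical.

```php
<?php
// Hypothetical helper: encode every non-ASCII character as a
// decimal numeric entity. Requires the mbstring extension.
function utf8_to_numeric_entities(string $text): string
{
    // Map format: [first code point, last code point, offset, mask].
    // This range covers every code point above ASCII.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($text, $map, 'UTF-8');
}

echo utf8_to_numeric_entities('Ğ and ğ'), "\n"; // prints "&#286; and &#287;"
```

ASCII text passes through untouched, so this does not escape quotes or angle brackets. Run htmlspecialchars() first if that is also needed.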

And hey! Your profile builds cleanly. And is republished with a ğ where it belongs.
