da: (bit)
[personal profile] da
I'm having a devil of a time with some PHP that does Unicode processing and display.

Do you know how to turn the Unicode representation of "Ğ" into its HTML entity (&#286;)? The character is from ISO-8859-9 (Latin-5), and it's one of a few I'm having trouble converting, because they're not in Latin-1. And heaven forfend we actually want to use those other characters.

You'd *think* (or at least I would) that htmlentities() would do the job, but htmlentities($string, ENT_QUOTES, 'UTF-8') leaves the character untouched in the Unicode string. get_html_translation_table(HTML_ENTITIES) suggests htmlentities() only has about 100 mappings, which is a disappointment.

I've browsed all sorts of PHP and Perl docs, as well as straight references for ISO-8859-*, including some which say "here are the HTML mappings for a number of UTF-8 characters" - but I haven't found an anywhere-near-useful set of UTF-8 to HTML entity mappings.

This seems like a bug.

Halp?

[Edit to add: I found this, which is Perl to convert entities to LaTeX, and maybe I need to hack that up to produce a simple array myself?... Hm.]

Date: Tuesday, 28 September 2010 01:57 am (UTC)
From: [identity profile] cypherpunk95.livejournal.com
I seem to recognize this problem. ;-)

Are you saying you have the character "Ğ" and you want "&#286;"? In HTML, isn't that the same thing? Or you have the UTF-8 representation "\xc4\x9e" and you want "&#286;"? [The latter's just some bitfiddling.] I don't think you should ever need to generate the string "&#286;".
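
If hand-rolled bitfiddling is not appealing, the conversion from UTF-8 bytes to numeric entities can probably be delegated to PHP's mbstring extension. What follows is a minimal sketch, assuming mbstring is installed and PHP 7 or later; the function name `to_numeric_entities_mb` is made up for illustration and is not part of PHP.

```php
<?php
// Sketch: replace every non-ASCII character in a UTF-8 string with a
// decimal numeric entity. Assumes the mbstring extension is loaded.
// to_numeric_entities_mb() is a hypothetical helper name.
function to_numeric_entities_mb(string $utf8): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    // This range covers everything above ASCII; offset 0 leaves the
    // code point value unchanged.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $map, 'UTF-8');
}

// "\xC4\x9E" is the UTF-8 encoding of U+011E (decimal 286).
echo to_numeric_entities_mb("\xC4\x9E"), "\n"; // prints &#286;
```

ASCII characters pass through unchanged, so `&`, `<` and quotes still need htmlspecialchars() applied to the input before this step.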

Date: Tuesday, 28 September 2010 12:37 pm (UTC)
From: [identity profile] da-lj.livejournal.com
> I seem to recognize this problem. ;-)

Yeah. I can hardly believe that in '08 we fit everybody's data into ISO-8859-1. ...Or shoe-horned it in, more likely.

> Or you have the UTF-8 representation "\xc4\x9e" and you want "&#286;"? [The latter's just some bitfiddling.]

Yes, that is what I want. I will look at Wikipedia again, and see how my comprehension is this time, with this slightly more specific pointer. :)

[edit & to &amp;]
Edited Date: Tuesday, 28 September 2010 12:39 pm (UTC)

Date: Tuesday, 28 September 2010 11:00 pm (UTC)
chezmax: (Default)
From: [personal profile] chezmax
You just want to do UTF-8 decoding, which isn't too difficult. UTF-8 is just a way of spreading a code point's bits across several bytes, so that each piece fits in 8 bits.
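
For what it's worth, the decoding can be sketched in plain PHP with no extensions. The lead byte says how many bytes the sequence has and carries the top bits; each continuation byte (10xxxxxx) contributes six more. The helper names `utf8_codepoint_at` and `utf8_to_numeric_entities` are invented for this sketch, and it assumes PHP 7 or later and reasonably well-formed input: it does not reject overlong encodings or surrogates.

```php
<?php
// Decode the UTF-8 sequence that starts at byte offset $i.
// Returns array(code point, number of bytes consumed).
// Hypothetical helper; a sketch, not a validating decoder.
function utf8_codepoint_at(string $s, int $i): array
{
    $b0 = ord($s[$i]);
    if ($b0 < 0x80) {                       // 0xxxxxxx: plain ASCII
        return array($b0, 1);
    } elseif (($b0 & 0xE0) === 0xC0) {      // 110xxxxx: 2-byte sequence
        $cp = $b0 & 0x1F; $n = 2;
    } elseif (($b0 & 0xF0) === 0xE0) {      // 1110xxxx: 3-byte sequence
        $cp = $b0 & 0x0F; $n = 3;
    } elseif (($b0 & 0xF8) === 0xF0) {      // 11110xxx: 4-byte sequence
        $cp = $b0 & 0x07; $n = 4;
    } else {
        throw new InvalidArgumentException("Bad lead byte at offset $i");
    }
    if ($i + $n > strlen($s)) {
        throw new InvalidArgumentException("Truncated sequence at offset $i");
    }
    for ($k = 1; $k < $n; $k++) {
        $b = ord($s[$i + $k]);
        if (($b & 0xC0) !== 0x80) {         // continuation must be 10xxxxxx
            throw new InvalidArgumentException("Bad continuation byte");
        }
        $cp = ($cp << 6) | ($b & 0x3F);     // append the low six bits
    }
    return array($cp, $n);
}

// Replace every non-ASCII character with a decimal numeric entity.
function utf8_to_numeric_entities(string $s): string
{
    $out = '';
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        list($cp, $n) = utf8_codepoint_at($s, $i);
        $out .= ($cp < 0x80) ? chr($cp) : sprintf('&#%d;', $cp);
        $i += $n;
    }
    return $out;
}

// 0xC4 -> 00100, 0x9E -> 011110, so (4 << 6) | 30 = 286.
echo utf8_to_numeric_entities("\xC4\x9E"), "\n"; // prints &#286;
```

ASCII is copied through as-is here, so running htmlspecialchars() on the input first is still needed if the text may contain `&` or `<`.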
