PHP and JSON and Unicode, Oh my!

Today I had a bit of a puzzle.  The gplus widget that I wrote for WordPress was rendering Google+ activity with a ragged left edge.

Google+ Widget
The ragged edge is shown for *some* activities

As geeks know, html by default collapses whitespace, so in order for the ragged edge to appear there,  a “hard whitespace” must have crept in, somewhere along the line.   For example, HTML has entites like   that will do this.  A quick look in the Chrome developer tool confirmed this.

I figured this was a simple fix:

$content = str_replace(" "," ", $content);

Simple, right?

But that didn’t work.

Hmmm…. Next I added in some obvious variations on that theme:

$content = str_replace(" "," ", $content);
$content = str_replace("\n"," ", $content);
$content = str_replace("\r"," ", $content);

That also did not work.

This data was simple json, coming from Google+, via their REST API. (But not directly. I built caching into the gplus widget so that it doesn’t hit the Google server every time it renders itself. It reads data from the cache file if it’s fresh, and only connects to Google+ if necessary). A call to json_decode() turns that json into a PHP object, and badda-boom. Right?

Turns out, no. The json had a unicode sequence 0xC2A0, which I guess is a non-breaking space if you speak Unicode. Not sure how that got into the datastream, I certainly did not put it in there myself, explicitly. And when WordPress rendered the widget, that sequence somehow, somewhere got translated into  , but only after the gplus widget code was finished.

I needed to replace the unicode sequence with a regular whitespace. This did the trick.

$content = str_replace("\xc2\xa0"," ", $content);

PHP, JSON, HTTP, HTML – these are all minor miracle technologies. Sometimes when gluing them all together you don’t get the results you expect. In those cases I’m glad to have a set of tools at my disposal, so I can match impedances.

2 thoughts on “PHP and JSON and Unicode, Oh my!

  1. Travis Tubbs Reply

    OMG thank you!!! This has been bugging me a LOT today. I had been jumping through hoops to try and fix this error. It would be nice to know why this happens. (The “Â ” are clearly visible in the original JSON output. Just couldn’t figure out why it’s even there to begin with and how to get rid of it.)

  2. Omar Rdz Reply

    Thnks was very useful. I look everywhere and your solution if I worked.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.