In case you’ve never heard of it, I wrote a little Twitter friend/follower cross-reference tool a few years ago. Basically, I was wondering which of the people I followed on Twitter also followed me back, who didn’t follow back, who followed me that I didn’t follow back, and all the permutations around those ideas. After a couple of days of hacking, Twitual was born.
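Those questions boil down to a few set operations on two lists of user IDs. Here is a minimal sketch, assuming the friend and follower IDs have already been fetched into arrays. The function name `crossReference` and the result keys are made up for illustration and are not Twitual's actual code:

```javascript
// Hypothetical helper: the name and result shape are illustrative only.
function crossReference(friendIds, followerIds) {
  const friends = new Set(friendIds);     // accounts I follow
  const followers = new Set(followerIds); // accounts that follow me

  return {
    // I follow them and they follow me back
    mutual: [...friends].filter((id) => followers.has(id)),
    // I follow them, but they do not follow me back
    notFollowingBack: [...friends].filter((id) => !followers.has(id)),
    // They follow me, but I do not follow them back
    notFollowedBack: [...followers].filter((id) => !friends.has(id)),
  };
}

const result = crossReference([1, 2, 3, 4], [3, 4, 5]);
console.log(result);
```

Using `Set` lookups keeps each membership check cheap, which matters once an account has thousands of friends or followers.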
One of the main problems with Twitual, however, is that the data gathering and analysis are linear. When you submit a Twitter username, the server has to fetch all of the friends, fetch all of the followers, and do all of the set calculations before it can send a single thing back to your browser. Because of the way the Twitter API works, this may take many HTTP calls to the Twitter servers. It can be slow, and in the meantime you’re left with a browser that appears to be frozen until it finally updates the page all at once.
On the server side, I did not see any errors. Everything appeared to be working fine. But in the browser, I saw an error stating “Could not decode a text frame as UTF-8”. This error did not come from my code, or from any of the libraries I was using. I finally determined that it came from the Chrome browser itself. When this error occurred, the browser unceremoniously dropped the connection to the server, disconnecting the socket.io link, and from that point there was no way to recover.
My next suspect was the socket.io module. Testing seemed to support this idea: everything went fine until I attempted to send my JSON-encapsulated Twitter data to the browser, at which point Chrome reported “Could not decode a text frame as UTF-8”. But after more searching and reading, it looked more and more like the problem was not specific to socket.io, or to any other module.
In all of my searching, I seemed to keep coming back to this one particular message thread on the V8 issues queue about UTF-8 encoding/decoding problems. This seemed to be the crux of the problem, and after digesting what I read there, and experimenting with some code, I saw what was happening.
Fortunately, that same V8 issue discussion has a work-around. You can do additional encoding/decoding which will escape the troublesome byte sequences. Since my case was dealing with JSON instead of just plain strings, my implementation tosses the JSON serialization/deserialization into the mix:
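As a rough sketch of the idea: the helper names `encodeForWire` and `decodeFromWire` are made up for illustration, and the `unescape(encodeURIComponent(...))` pairing is one commonly used form of this trick, assumed here rather than quoted from the V8 thread.

```javascript
// Hypothetical helper names; a sketch of the technique, not Twitual's exact code.
// escape() and unescape() are legacy globals, still available in Node and in browsers.

// Sender side: serialize to JSON, then re-encode so every character is in the 0x00-0xFF range.
function encodeForWire(obj) {
  const json = JSON.stringify(obj);
  // encodeURIComponent percent-encodes the UTF-8 bytes of the string;
  // unescape then turns each %XX sequence into a single character.
  return unescape(encodeURIComponent(json));
}

// Receiver side: reverse the re-encoding, then parse the JSON.
function decodeFromWire(str) {
  // escape percent-encodes each character back to %XX;
  // decodeURIComponent reads those bytes as UTF-8 again.
  const json = decodeURIComponent(escape(str));
  return JSON.parse(json);
}

const tweet = { user: "José", text: "snowman ☃ and emoji 😀" };
const wire = encodeForWire(tweet);
const roundTripped = decodeFromWire(wire);
console.log(roundTripped.text === tweet.text);
```

In a setup like mine, the server would call something like `encodeForWire` just before emitting over socket.io, and the browser would call `decodeFromWire` on whatever it receives. The cost is a somewhat longer payload, since each non-ASCII character is spread across several characters on the wire.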
While searching for solutions to the problem, I saw a lot of other people frustrated by similar symptoms. I’m hoping that by posting this, others will be able to find it and apply this solution, or something similar. Also, it appears that there are still discussions about modifying the behavior of V8 to better handle this encoding issue, hopefully in a completely transparent fashion.
Oh, and if you’re interested, you can try out the Twitual 2.0 Prototype. It’s very much a work-in-progress since I’ve mostly been trying to solve these underlying issues. So right now, there’s practically no UI, and it still needs better error handling for when something goes wrong with the Twitter API. Once I settle on a templating solution, the look-and-feel of the whole thing is going to start changing radically. And again, this is a prototype, and at any given time it might not even be up and running. You’ve been warned.