hKit and Tidying (X)HTML: A Serious Failure [Update]
Posted on March 6th, 2008
After promoting hAvatar during BarcampBrighton and SemanticCamp, I was getting pretty happy with the response we got from people. So you can imagine how annoyed I was when my own avatar suddenly stopped working on several sites. Today I took a dive into the code behind the hAvatar plugin for WordPress and, most importantly, hKit.
Problem Found
As I expected, the issue was not with Alper’s code for the hAvatar plugin but with the hKit PHP library. As one step in extracting the hCard, hKit “tidies” the source code it has read. hKit has three built-in ways of doing this, the default being the W3C tidy service.
The W3C service is very simple: it takes a URL as a parameter, like this:
http://cgi.w3.org/cgi-bin/tidy?docAddr=http://google.com/
The result is the tidied output of the source at that URL. The serious issue I ran into was that, for some weird reason, my URL (http://cristianobetta.com) causes a timeout in this service. To be more precise: just about every URL on my server causes this issue. Obviously I contacted my hosting provider about it, but let’s put the problem in a different perspective.
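To make the dependency concrete, here is a minimal Python sketch of what a call to the proxy looks like from the client side. It is not hKit's actual code (hKit is PHP). The names `tidy_proxy_url` and `fetch_tidied` and the 5-second timeout are my own assumptions for illustration. Only the endpoint and the `docAddr` parameter come from the example above.

```python
from urllib.parse import quote
from urllib.request import urlopen

TIDY_ENDPOINT = "http://cgi.w3.org/cgi-bin/tidy"  # endpoint quoted in the post


def tidy_proxy_url(doc_url):
    """Build the proxy request URL for the page we want tidied (hypothetical helper)."""
    # Keep ':' and '/' readable, but escape '?', '&' and friends so they
    # are not mistaken for parameters of the tidy service itself.
    return TIDY_ENDPOINT + "?docAddr=" + quote(doc_url, safe=":/")


def fetch_tidied(doc_url, timeout=5.0):
    """Return the tidied markup as bytes, or None if the proxy fails.

    The timeout value is an assumption, not something hKit defines.
    """
    try:
        with urlopen(tidy_proxy_url(doc_url), timeout=timeout) as response:
            return response.read()
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return None
```

The point of returning `None` instead of raising is that the caller can at least notice the remote step failed and react to it, rather than silently ending up without an avatar.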
The Bottleneck Dilemma
The problem here, in my opinion, is that hKit relies on a “bottleneck” in its process. Normally this bottleneck only costs performance, but this time it actually caused errors. Sadly, because it depends on this remote service, hKit is not really a standalone script, and things can go wrong without hKit knowing. In my opinion hKit would be much more interesting if it shipped with a built-in, platform-independent solution for this step.
There are currently some other options besides the proxy for tidying up code. One of hKit’s settings allows changing the tidy mode to “exec”, “php” or “none”. The first tries to use the tidy command, the second the tidy PHP functions. Unfortunately neither is available by default on most systems, which makes an easy deployment of hAvatar on those systems much harder. To use the “php” option the tidy library needs to be compiled into PHP, which is sometimes impossible, and to use the “exec” option a binary is needed, which makes the solution rather platform dependent.
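One way to soften this would be to try the local options first and fall back gracefully. The Python sketch below shows the idea only. It is not how hKit works today, and the names `tidy_via_exec` and `tidy_with_fallback` are hypothetical. It assumes an HTML Tidy binary called `tidy` that reads from standard input and accepts the `-q` and `-asxhtml` flags.

```python
import shutil
import subprocess


def tidy_via_exec(html, timeout=10):
    """Tidy markup with a local `tidy` binary; return None if that is not possible."""
    if shutil.which("tidy") is None:
        return None  # no binary on this system
    proc = subprocess.run(
        ["tidy", "-q", "-asxhtml"],  # quiet mode, XHTML output
        input=html,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # HTML Tidy exits with 0 on success and 1 when there were only warnings.
    if proc.returncode in (0, 1) and proc.stdout:
        return proc.stdout
    return None


def tidy_with_fallback(html, strategies):
    """Try each (name, function) pair in order and return the first usable result.

    If every strategy fails, return the untouched markup with the mode "none".
    """
    for name, strategy in strategies:
        try:
            result = strategy(html)
        except Exception:
            result = None  # a broken strategy must not stop the chain
        if result:
            return name, result
    return "none", html
```

A caller could then pass something like `[("exec", tidy_via_exec), ("proxy", some_proxy_function)]`, so that the remote service is only contacted when nothing local is available.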
The Plea
So here I am, asking for a new solution that makes hKit a more independent library. I think it makes sense from a technical and philosophical perspective, but most of all from a performance point of view. hKit isn’t that fast to begin with (you can’t instantiate it more than once, which makes parallel processing of avatars fairly hard), and I think a server-side, non-proxy solution would give this kit a serious performance boost. Obviously I would be happy too, as people would probably be able to load my avatar by default, whereas it is currently unclear why the W3C service can’t fetch my URL.
* Update: After an email exchange with my hosting provider it seems that the W3C proxy now has access to my server, enabling all hAvatar activities on my domains. Still, I think my point above is valid from a performance point of view.