Cleaning up HTML

Quite some time ago, I coded up a PHP function that attempts to filter out potential nastiness in comments posted to MyPHPBlog sites (like this one). It’s mainly there to keep people from injecting JavaScript into comments, which could open up Cross-Site Scripting holes. But I’ve never been completely satisfied with my solution. What I really wanted was a function that would not only filter for security, but would also turn the input into valid XHTML, automagically.

Simon Willison has come up with something that is a step in that direction: safeHtmlChecker. It’s a PHP class which will parse a chunk of XHTML and return a list of errors. But I want a class that will auto-correct the errors. Maybe it’s time to call upon the power of LazyWeb? Yes, I think it is….

LazyWeb, I invoke thee!

This entry was posted in Blogs.

7 Comments

  1. Posted February 28, 2003 at 4:14 am | Permalink

    The ‘standard’ tool for cleaning HTML (which can output XHTML) is HTML Tidy from the W3C (by Dave Raggett originally, I think) – I’m pretty sure you’ll find a PHP interface with a bit of googling. I use this both as a desktop app and (Java version) embedded in code, and its results really do seem magical at times.

  2. Posted February 28, 2003 at 6:50 am | Permalink

    I thought I had heard something about a native way to call Tidy from PHP, and I did do a search at php.net, but that came up empty. I just didn’t have time to cast my net wider yet.

  3. Posted February 28, 2003 at 1:37 pm | Permalink

    This looks like a working demo. I suspect they’re just passing it through the command line, which is what I’d recommend as the easiest solution as well.

  4. Posted February 28, 2003 at 1:50 pm | Permalink

    Yes, I had run across phpTidyHt earlier today. I’m definitely going to take a look at it, though I still hope to find a solution that doesn’t require a system call to an external program.
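For what it's worth, one way to rebalance tags without shelling out to an external program might look like the sketch below. It leans on PHP's DOM extension, which is an assumption on my part: nobody in this thread mentions it, and it has to be enabled in your PHP build. The function name `balance_markup` is made up for illustration. The idea is to let the forgiving HTML parser close whatever was left open, then serialize the nodes back out as XML so that empty elements come out self-closed. It does nothing for security; it only repairs structure.

```php
<?php
// Hypothetical helper: repair unbalanced tags in an HTML fragment using
// the DOM extension instead of a system call. Not a security filter.
function balance_markup(string $fragment): string
{
    $doc = new DOMDocument();

    // Collect parser warnings about malformed markup instead of printing them.
    $previous = libxml_use_internal_errors(true);

    // Wrap the fragment in a known container so it can be found again.
    // The leading XML declaration is a common trick to make the parser
    // treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $fragment . '</div>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The first <div> in document order is our wrapper.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    if ($wrapper === null) {
        return '';
    }

    // Serialize each child as XML, which closes every element explicitly.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveXML($child);
    }
    return $out;
}

// Usage: the unclosed <b> and <i> should come back closed.
echo balance_markup('<b>Hello <i>world'), "\n";
```

Serializing with `saveXML()` rather than `saveHTML()` is the design choice that gets the output closer to XHTML, though it is no guarantee of validity against a DTD.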

  5. Posted March 10, 2003 at 8:22 am | Permalink

    Try this:

    $foo = strip_tags($foo);

    then

    $foo = nl2br($foo);

    This will remove ALL HTML tags and put a <br /> at every newline.

    If that’s what you are after, anyway.

  6. Posted March 10, 2003 at 9:19 am | Permalink

    I already use the strip_tags() function (with the optional list of allowed tags). But it doesn’t do anything to help close unbalanced tags, or to strip out attempts to insert javascript: URLs (unless you disallow links altogether).

    I’m still mulling the problem over. Another solution might be to use something like BBCode, or a Wiki-like syntax.
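To make that limitation concrete, here is a small sketch. The sample input and the helper name `has_script_href` are invented for illustration, and the regular expression is deliberately naive: it shows the kind of extra check that is needed, and is nowhere near a complete XSS filter.

```php
<?php
// Sample comment text: an unclosed <b>, a javascript: link, and a
// disallowed <i> tag.
$input = '<b>Hello <a href="javascript:alert(1)">click me</a> <i>there</i>';

// Allow only <a> and <b>. strip_tags() drops the <i> tags but leaves the
// attributes of allowed tags alone and does not close the dangling <b>.
$filtered = strip_tags($input, '<a><b>');

// Hypothetical helper: flag href attributes that use the javascript: scheme.
// Naive on purpose; entity-encoded or whitespace-obfuscated schemes slip by.
function has_script_href(string $html): bool
{
    return preg_match('/href\s*=\s*["\']?\s*javascript:/i', $html) === 1;
}

echo $filtered, "\n";
var_dump(has_script_href($filtered));
```

So even with an allow-list, the link's `href` passes through unchanged and the `<b>` stays open, which is why something beyond `strip_tags()` is needed.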

  7. dusoft ambience.sk
    Posted April 16, 2004 at 9:20 am | Permalink

    Check out Content management system Absolut Engine
    It produces valid XHTML Strict from WYSIWYG editor!
