Matching a word / characters outside of html tags

Posted on January 4, 2008. Filed under: PHP, Regular Expressions | Tags: , , , |

Today I spent a good 2 hours on this very simple regEx problem. I tried googling just about every set of search terms I could think of, and didn’t find anything useful … basically I wanted to replace a certain word inside a string with another word, but not within html.

To do this I used a negative lookahead to see if there were any > characters after the string I wanted to replace, preceded by any non < characters [if any]. The beauty of the look around functions are that they don’t match text … they instead match what’s positioned around the text, similar to how $, ^ and \b function.

So in English, the regEx I came up with, word(?!([^<]+)?>), could be interpreted as:

  • Match the characters "word" literally word
  • Assert that it is impossible to match the regex below starting at this position (negative lookahead) (?!([^)
    • Match the regular expression below and capture its match into backreference number 1 ([^<]+)?
      • Between zero and one times, as many times as possible, giving back as needed (greedy) ?
      • Match any character that is not a "<" [^<]+
        • Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
    • Match the character ">" literally >

[Thanks RegexBuddy]

For example, if we wanted to replace all instances of word with repl which exist outside of any HTML:


word <a href="word">word</word>word word

would become:


repl <a href="word">repl</word>repl repl

The regular expression I used to do this, word(?!([^<]+)?>), fits nicely into preg_replace();


<?php
    $str = "word <a href=\"word\">word</word>word word";
    $str = preg_replace("/word(?!([^<]+)?>)/i","repl",$str);
    echo $str;
    # repl <word word="word">repl</word>
?>

I know this is really a one-liner, but I have it in its expanded form to simplify the steps.

Make a Comment

Make a Comment: ( 5 so far )

blockquote and a tags work here.

5 Responses to “Matching a word / characters outside of html tags”

RSS Feed for Adventures in PHP / DHTML / CSS and MySQL Comments RSS Feed

I’m a regex noob, and this is exactly what I was looking for. Thanks!

Thanks for posting this! This is the exact RegEx I was looking for, so you saved me a lot of time!

Thanks but this fails if you have a closing angle bracket following the text you want to replace without having a preceding opening bracket.

e.g.
word > word word

becomes

word > repl repl

Just what I needed! Thank you so much !

True, this doesn’t work for word> word word – but in HTML text a > is represented by > so this won’t be a problem – also a lone > should not be found in valid HTML.


Where's The Comment Form?

Liked it here?
Why not try sites on the blogroll...