Matching a word or characters outside of HTML tags
Today I spent a good two hours on this very simple regex problem. I tried googling just about every set of search terms I could think of and didn’t find anything useful. Basically, I wanted to replace a certain word inside a string with another word, but not where that word appears inside an HTML tag.
To do this I used a negative lookahead to check whether a > character follows the string I wanted to replace, with only non-< characters (if any) in between. The beauty of lookaround assertions is that they don’t consume any text; they only assert something about what surrounds the current position, similar to how $, ^ and \b work.
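To see that zero-width behaviour in isolation, here is a small sketch with a made-up subject string (foo, bar and baz are placeholders, not taken from the post). The lookahead inspects what follows "foo" without consuming it, so only "foo" itself gets replaced.

```php
<?php
// "foo" only matches when it is NOT immediately followed by "bar".
// The lookahead looks at the next characters but does not consume them.
$subject = 'foobar foobaz';
$result = preg_replace('/foo(?!bar)/', 'X', $subject);

echo $result, "\n"; // foobar Xbaz
// The "baz" that the lookahead inspected is still there after the replacement.
```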
So in English, the regex I came up with, word(?!([^<]+)?>), can be read as:
- Match the characters "word" literally: word
- Assert that it is impossible to match the regex below starting at this position (negative lookahead): (?!([^<]+)?>)
  - Match the regular expression below and capture its match into backreference number 1: ([^<]+)?
    - Between zero and one times, as many times as possible, giving back as needed (greedy): ?
    - Match any character that is not a "<": [^<]+
      - Between one and unlimited times, as many times as possible, giving back as needed (greedy): +
  - Match the character ">" literally: >
[Thanks RegexBuddy]
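The breakdown above maps neatly onto PCRE's x (extended) modifier, which ignores whitespace and # comments inside a pattern. Below is a sketch of the same regex written that way; it should behave identically to the compact form. One caution: PHP locates the closing / delimiter before PCRE ever sees the comments, so the comments themselves must not contain a /.

```php
<?php
// word(?!([^<]+)?>) written with the x (extended) modifier so each part of the
// breakdown can carry its own comment. i keeps the match case-insensitive.
$pattern = '/
    word          # match the characters "word" literally
    (?!           # negative lookahead: fail if the following can match here
        ([^<]+)?  # optional group 1: one or more characters that are not "<"
        >         # then a literal ">"
    )             # end of the lookahead
/ix';

$str = 'word <a href="word">word</word>word word';
echo preg_replace($pattern, 'repl', $str), "\n";
// repl <a href="word">repl</word>repl repl
```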
For example, say we want to replace every instance of word that appears outside of any HTML tag with repl:
word <a href="word">word</word>word word
would become:
repl <a href="word">repl</word>repl repl
The regular expression I used to do this, word(?!([^<]+)?>), fits nicely into preg_replace():
<?php
$str = "word <a href=\"word\">word</word>word word";
# The i modifier makes the match case-insensitive, so "Word" and "WORD" are replaced too.
$str = preg_replace("/word(?!([^<]+)?>)/i", "repl", $str);
echo $str;
# repl <a href="word">repl</word>repl repl
?>
I know this is really a one-liner, but I’ve written it out in expanded form to make the steps easier to follow.
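If you need this in more than one place, it can be wrapped in a function. Treat this as a sketch with a few assumptions that go beyond the one-liner above: the name replace_outside_tags is hypothetical, \b is added so that words such as "password" are left alone, and the arrow function needs PHP 7.4 or later. It also uses [^<]*> in place of ([^<]+)?>; both match exactly the same text, but the former needs no capture group.

```php
<?php
/**
 * Hypothetical helper: replace whole-word occurrences of $word with
 * $replacement, skipping any that sit inside an HTML tag.
 * Assumes $word starts and ends with a word character (letter, digit or _),
 * because \b is used for the whole-word check.
 */
function replace_outside_tags(string $word, string $replacement, string $html): string
{
    // preg_quote() escapes regex metacharacters and the "/" delimiter in $word.
    $pattern = '/\b' . preg_quote($word, '/') . '\b(?![^<]*>)/i';

    // Returning $replacement from a callback inserts it verbatim, so "$1" or "\1"
    // in it is not treated as a backreference (as preg_replace() would do).
    return preg_replace_callback($pattern, fn (array $m): string => $replacement, $html);
}

echo replace_outside_tags('word', 'repl', 'word <a href="word">word</word>word password'), "\n";
// repl <a href="word">repl</word>repl password
```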
I’m a regex noob, and this is exactly what I was looking for. Thanks!
Matt Kantor
August 13, 2008
Thanks for posting this! This is the exact RegEx I was looking for, so you saved me a lot of time!
Dave Wooldridge
August 29, 2008
Thanks, but this fails if a closing angle bracket follows the text you want to replace without a preceding opening bracket.
e.g.
word > word word
becomes
word > repl repl
Steve
December 28, 2008
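The failure case in the comment above is easy to reproduce. One way around it is a different technique altogether: split the string into tags and text with preg_split(), then replace only in the text pieces. This is an illustrative sketch, not the method from the post; replace_in_text_only is a hypothetical name, and the tag pattern <[^>]*> assumes no attribute value contains a literal >.

```php
<?php
// The lookahead treats any later ">" (with no "<" before it) as the end of a tag,
// so a stray ">" in plain text protects the word in front of it.
$text = 'word > word word';
echo preg_replace('/word(?!([^<]+)?>)/i', 'repl', $text), "\n"; // word > repl repl

/**
 * Hypothetical alternative: tokenize into tags and text, replace in text only.
 */
function replace_in_text_only(string $search, string $replacement, string $html): string
{
    // The capture group plus PREG_SPLIT_DELIM_CAPTURE keeps the tags in the result,
    // so even indexes hold text and odd indexes hold tags.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Case-insensitive, like the /i pattern; note this is not whole-word.
            $parts[$i] = str_ireplace($search, $replacement, $part);
        }
    }
    return implode('', $parts);
}

echo replace_in_text_only('word', 'repl', $text), "\n"; // repl > repl repl
echo replace_in_text_only('word', 'repl', 'word <a href="word">word</word>word word'), "\n";
// repl <a href="word">repl</word>repl repl
```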
Just what I needed! Thank you so much!
Martijn
March 11, 2009
True, this doesn’t work for word > word word, but in HTML text a > is represented by &gt;, so this won’t be a problem. Also, a lone > should not be found in valid HTML.
Charleh
October 9, 2009
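The escaping point in the last comment can be checked directly. PHP's htmlspecialchars() converts > to &gt;, and once the text is escaped the original pattern replaces every occurrence. This small sketch assumes the plain text is escaped before it is placed into the HTML.

```php
<?php
// Escaping the text first turns the stray ">" into "&gt;",
// so the lookahead no longer mistakes it for the end of a tag.
$text = htmlspecialchars('word > word word');
echo $text, "\n"; // word &gt; word word

echo preg_replace('/word(?!([^<]+)?>)/i', 'repl', $text), "\n"; // repl &gt; repl repl
```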