Regex to remove an HTML tag and its content from PHP string
  John Mwaniki /   05 Dec 2021

How to remove HTML tags from a PHP string

PHP's built-in strip_tags() function removes HTML, XML, and PHP tags from a string.

Example

<?php
$mystring = "<h1>Lorem Ipsum</h1><p>Lorem <strong>ipsum dolor</strong> sit amet, <em>consectetur</em> adipiscing elit. <span style='color: green'>Donec</span> nec volutpat ligula.</p>";
echo strip_tags($mystring);

Output

Lorem IpsumLorem ipsum dolor sit amet, consectetur adipiscing elit. Donec nec volutpat ligula.

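As you can see, it removes all the HTML tags and their attributes but retains the text content of those tags. No whitespace is inserted where a tag used to be, which is why the heading and the paragraph run together as "IpsumLorem".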
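Because only the markup is dropped, text inside elements such as <script> survives as well. A minimal sketch of that behaviour, using a made-up input string:

```php
<?php
// Hypothetical input: a paragraph followed by an inline script.
$html = "<p>Hello</p><script>alert(1);</script>";

// strip_tags() removes the tags but keeps the text between them.
$text = strip_tags($html);

echo $text; // Helloalert(1);
```

This is the situation in which removing a tag together with its content becomes useful.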

How to retain only specified tags

The strip_tags() function accepts an optional second argument listing the tags to keep. Every tag not listed is stripped, so you can retain some tags and remove all the others.

Example

<?php
$mystring = "<h1>Lorem Ipsum</h1><p>Lorem <strong>ipsum dolor</strong> sit amet, <em>consectetur</em> adipiscing elit. <span style='color: green'>Donec</span> nec volutpat ligula.</p>";
echo strip_tags($mystring, "<h1><p>");

Output

<h1>Lorem Ipsum</h1><p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec nec volutpat ligula.</p>

As you can see, all other tags have been removed, leaving the string with only the <h1> and <p> tags that were specified in the second argument.
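As of PHP 7.4, the allowed tags can also be passed as an array of tag names instead of a single string. A short sketch, assuming PHP 7.4 or later and a shortened version of the sample string:

```php
<?php
$mystring = "<h1>Lorem Ipsum</h1><p>Lorem <strong>ipsum dolor</strong> sit amet.</p>";

// Array form of the allowed-tags argument (PHP 7.4+): names without angle brackets.
$kept = strip_tags($mystring, ['h1', 'p']);

echo $kept; // <h1>Lorem Ipsum</h1><p>Lorem ipsum dolor sit amet.</p>
```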

How to remove certain tags with all their content

In the examples above only the tags are removed while their content remains intact. Let's now see how to remove specific tags together with their content.

To achieve this, we use the PHP preg_replace() function.

<?php
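$newstring = preg_replace('~<tag\b[^>]*>.*?</tag>~si', "", $str);

The first argument is the regular expression, which names the tag we want to remove. The second is the replacement, that is, what each match is replaced with. The third is the string we want to change. In the pattern, \b ensures that only the exact tag name matches (so "p" does not also match "pre"), [^>]* skips any attributes, and the lazy .*? stops at the nearest closing tag. The s modifier lets the dot match newlines, and i makes the match case-insensitive.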

Replace "tag" in both places with the name of the tag you wish to remove, and $str with your string. Each matching element is replaced with whatever you pass as the second argument. Here that is an empty string "", so the element is simply removed.
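The substitution described above can be wrapped in a small reusable function. The sketch below is only an illustration: the function name remove_tag_with_content() is made up for this example, and the pattern assumes well-formed markup with no ">" inside attribute values and no nested elements of the same name.

```php
<?php
// Hypothetical helper: remove every <$tag>...</$tag> element, content included.
function remove_tag_with_content(string $html, string $tag): string
{
    // Escape regex metacharacters in the tag name, including the "~" delimiter.
    $name = preg_quote($tag, '~');

    // \b keeps "p" from matching "<pre"; [^>]* skips attributes;
    // .*? is lazy; "s" lets "." span newlines; "i" ignores case.
    $pattern = '~<' . $name . '\b[^>]*>.*?</' . $name . '\s*>~si';

    // preg_replace() returns null on error; fall back to the input in that case.
    return preg_replace($pattern, '', $html) ?? $html;
}

echo remove_tag_with_content('<h1 class="t">Title</h1><p>Body</p>', 'h1'); // <p>Body</p>
```

For untrusted or irregular HTML, a real parser is generally more robust than a regular expression.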

Example

<?php
$mystring = "<h1>Lorem Ipsum</h1><p>Lorem <strong>ipsum dolor</strong> sit amet, <em>consectetur</em> adipiscing elit. <span style='color: green'>Donec</span> nec volutpat ligula.</p>";
echo preg_replace('~<h1\b[^>]*>.*?</h1>~si', "", $mystring);

Output

<p>Lorem <strong>ipsum dolor</strong> sit amet, <em>consectetur</em> adipiscing elit. <span style='color: green'>Donec</span> nec volutpat ligula.</p>

We have removed the <h1> tag and its content as specified in the function.
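The same approach handles a string with several <h1> elements, provided the quantifier stays lazy. The sketch below uses a made-up input and also shows what happens with PCRE's U modifier, which inverts greediness so that .*? becomes greedy and swallows everything between the first opening tag and the last closing tag:

```php
<?php
// Hypothetical input with two separate <h1> elements.
$html = "<h1>First</h1><p>Keep me</p><h1 id='b'>Second</h1>";

// Lazy .*? without the U modifier stops at the nearest </h1>.
$lazy = preg_replace('~<h1\b[^>]*>.*?</h1>~si', '', $html);

// With U, .*? turns greedy and runs to the last </h1>.
$greedy = preg_replace('~<h1\b[^>]*>.*?</h1>~Usi', '', $html);

echo $lazy;   // <p>Keep me</p>
echo $greedy; // (empty string: the paragraph is lost too)
```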

If you would like to strip multiple tags with their content in a single call, you can pass an array of regular expressions as the first argument of the function.

Example

<?php
$mystring = "<h1>Lorem Ipsum</h1><p>Lorem <strong>ipsum dolor</strong> sit amet, <em>consectetur</em> adipiscing elit. <span style='color: green'>Donec</span> nec volutpat ligula.</p>";
echo preg_replace(array('~<h1\b[^>]*>.*?</h1>~si', '~<strong\b[^>]*>.*?</strong>~si', '~<em\b[^>]*>.*?</em>~si'), "", $mystring);

Output

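<p>Lorem  sit amet,  adipiscing elit. <span style='color: green'>Donec</span> nec volutpat ligula.</p>

We have specified an array of patterns for <h1>, <strong> and <em>, all of which have been stripped together with their content. Note the double spaces left behind where <strong> and <em> used to be; a browser collapses them when rendering.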
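As an alternative to an array of patterns, the tag names can be combined into one pattern with alternation and a backreference. This is a sketch rather than the only way to do it; the $tags list is a made-up example, and the same assumptions about well-formed, non-nested markup apply.

```php
<?php
$mystring = "<h1>Lorem Ipsum</h1><p>Lorem <strong>ipsum dolor</strong> sit amet, <em>consectetur</em> adipiscing elit.</p>";

// Hypothetical list of tag names to remove together with their content.
$tags = ['h1', 'strong', 'em'];

// Escape each name, then join them into an alternation: h1|strong|em
$quoted = array_map(function ($t) {
    return preg_quote($t, '~');
}, $tags);
$alternation = implode('|', $quoted);

// \1 requires the closing tag to have the same name as the opening tag.
$pattern = '~<(' . $alternation . ')\b[^>]*>.*?</\1\s*>~si';

$result = preg_replace($pattern, '', $mystring);

echo $result; // <p>Lorem  sit amet,  adipiscing elit.</p>
```

The double spaces in the result are the spaces that surrounded the removed elements.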

That's all for this article.