C# – Regular Expression Match Pattern Exclude Words

Recently I ran into a problem: parsing HTML tags and replacing them with different tags.

For example, find all <b> tags and replace them with <strong>:

<b>This is a bold text</b> Lorem Ipsum is simply dummy text 
<b>some more bold text</b> 
Lorem Ipsum is simply dummy text <b> again bold text </b>

Convert the above to:

<strong>This is a bold text</strong> Lorem Ipsum is simply dummy text 
<strong>some more bold text</strong> 
Lorem Ipsum is simply dummy text <strong> again bold text </strong>

First I thought of using simple string replacement, but that won’t work in all cases.
Then I thought of splitting the input and parsing each token, but this requires a lot of code
and still fails in some corner cases. The only option left was regular expressions.
Here again I ran into several issues, but I finally came up with the code below.

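    // Requires: using System.Text.RegularExpressions;
    // Singleline makes '.' match newlines too, so a <b>...</b> pair may span lines.
    string strRegex = @"<b>(.*?)</b>";
    RegexOptions myRegexOptions = RegexOptions.Singleline;
    Regex myRegex = new Regex(strRegex, myRegexOptions);
    // $1 is the text captured between the opening and closing tag.
    string strReplace = "<strong>$1</strong>";
    string newstring = myRegex.Replace(inputstring, strReplace);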
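
For anyone who wants to reuse this, here is a minimal sketch of the same replacement wrapped in a helper. The class name BoldToStrong and the method name Convert are my own placeholders, and the IgnoreCase option is an extra assumption so that upper-case <B> tags are converted as well; drop it if you only expect lower-case tags.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are placeholders, not part of any library.
public static class BoldToStrong
{
    // Singleline lets '.' match newlines, so a tag pair may span several lines.
    // IgnoreCase is an assumption added here so that <B>...</B> also matches.
    private static readonly Regex BoldTag = new Regex(
        @"<b>(.*?)</b>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    public static string Convert(string html)
    {
        // The lazy quantifier (.*?) stops at the first closing tag, so each
        // pair is replaced separately; $1 is the text between the tags.
        return BoldTag.Replace(html, "<strong>$1</strong>");
    }
}
```

Calling BoldToStrong.Convert(inputstring) replaces every <b>...</b> pair and leaves the rest of the text untouched. Note that this pattern only matches a bare <b> tag; a tag with attributes, such as <b class="x">, would be left as it is.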