Sunday, October 30, 2011

Strip HTML using UDF in SQL Server 2005

I would strongly suggest to do this in the front end application (or) make use of CLR based function to do this job. But for simple well formed html string may be we can make use of the new XML datatype introduced in SQL Server 2005 as shown in the below example.


CREATE FUNCTION dbo.Strip_WellFormed_HTML

@inputString VARCHAR(MAX) 

RETURNS VARCHAR(MAX) 
AS
BEGIN
--Variable Declaration    
DECLARE @htmlContent XML
    DECLARE @parsedValue VARCHAR(MAX)
    
--Variable Initialization
SET @htmlContent = @inputString;
    
  WITH HTML(InnerText) AS
    (
        SELECT Html.Tag.query('.') FROM @htmlContent.nodes('/') AS Html(Tag)
    )
    SELECT @parsedValue = InnerText.value('.', 'VARCHAR(MAX)') FROM HTML
    RETURN @parsedValue
END
GO


Pasting of html tags in blogger seems to be a difficult task. So just check out the attached screenshots of the way I have tested it.



Strip HTML irrespective of whether it is Well formed or not:

I have used Bill McEvoy's script in the past for this purpose. He followed a very cool idea and the original article can be read here. If you wanted to directly see his script then check out here.


No comments: