Friday, June 10, 2011

String Manipulation in SQL

Here we cover all the scenario which are common in the SQL for data cleaning.
1.Remove all the special characters and the characters from the String and pick only numbers.


CREATE FUNCTION dbo.UFNG_ONLY_DIGITS (@StrVal AS VARCHAR(max))
RETURNS VARCHAR(max)
AS
BEGIN
      WHILE PATINDEX('%[^0-9]%', @StrVal) > 0
            SET @StrVal = REPLACE(@StrVal,
                SUBSTRING(@StrVal,PATINDEX('%[^0-9]%', @StrVal),1),'')
      RETURN @StrVal
END


SELECT dbo.UFNG_ONLY_DIGITS('8as</SDf>ggfhfdgh#$%^#$%^df7%87^A8876-*S')

2.Remove the HTML Code from the String
CREATE FUNCTION dbo.tagstrip ( @in VARCHAR(8000)  )
RETURNS VARCHAR(8000) AS BEGIN
DECLARE @i INT
      WHILE 1 = 1 BEGIN
            SET @i = LEN( @in )
            SET @in = REPLACE( @in, SUBSTRING( @in,
            CHARINDEX( '<', @in ),
            CHARINDEX( '>', @in ) -
            CHARINDEX( '<', @in ) + 1 ), SPACE( 0 ) )
            IF @i = LEN( @in ) BREAK END
      RETURN (  @in )
END

SELECT dbo.tagstrip('<div><font face=Calibri size=2 color=black>N.A.</font></div>')

3.Remove HTML text using the SSIS package

If the Source contain the the record with the Html tags then we can remove it by adding the derived column transformation in the package.
Most of the time this scenario occurs when we use the source as a share point List.  


The source data is Like:
<Div>Pushpendra S. Dhakad</Div>
<Div>#$Pushpendra S. Dhakad#$</Div>

Write below code:
(DT_WSTR,4000)(LEN(Comments_U) == 0 ? "" : (DT_WSTR,4000)REPLACE(REPLACE(REPLACE(RTRIM(LTRIM(Comments_U)),"<div>",""),"</div>",""),"&nbsp;","")) 

No comments:

Post a Comment