MS SQL Backslash preceding new line/line feed removes line break - sql-server-2008

I'm working in classic ASP and am trying to simply insert some user input into my MS SQL 2008 database. This is something I do basically every day, but I don't think I've ever experienced this bug before.
The bug is that if a user ends a line of text with a backslash and starts a new line below it, both the backslash and the line break are lost once the data is stored in the DB.
If I try the following statement, hardcoded in an ASP file:
UPDATE TBLarticle_text SET Introduction = 'Text on first line \" & vbCrLf & " text on second line' WHERE ArticleGuid = 28
The resulting data is without the backslash or the line break. The string is correct if stored in a variable and printed on the page.
Here is the example user input (normally from a form, but it's not really relevant). The input:
Text on first line \
text on second line
... is stored as:
Text on first line text on second line
I don't see any issues if the backslash is followed by anything other than a line break.

I know this is old, but I just came across an MS KB article that discusses this: http://support.microsoft.com/kb/164291/en-us. The long and short of it is that SQL Server treats the three characters \<CR><LF> together as a line-continuation sequence inside a string constant, so all three are dropped. When inserting or updating, you need to replace every occurrence of \<CR><LF> with \\<CR><LF><CR><LF>. The first backslash in \\<CR><LF><CR><LF> escapes the continuation sequence (the next three characters, \<CR><LF>), and the additional <CR><LF> puts the line break back in place. Annoying, yes.
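The replacement described above can be sketched in a few lines of Python, applied to the user's text before it is placed into the SQL statement. The function name is hypothetical, and the sketch only reproduces the substitution from the KB article; it assumes line breaks arrive as CR+LF.

```python
def protect_backslash_newlines(text: str) -> str:
    """Rewrite backslash+CRLF as backslash, backslash, CRLF, CRLF.

    Hypothetical helper: it mirrors the substitution described in the
    KB article so the backslash and the line break survive the insert.
    """
    return text.replace("\\\r\n", "\\\\\r\n\r\n")


if __name__ == "__main__":
    sample = "Text on first line \\\r\ntext on second line"
    print(repr(protect_backslash_newlines(sample)))
```

Text that has no backslash directly before a line break passes through unchanged.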

I am seeing a similar issue. I would say the best way to work around this is to not let ASP pass such strings to SQL Server. You can clean this up by simply replacing that character sequence and injecting a space at the end of such a line (which a user is unlikely to ever notice):
sql = REPLACE(sql, "\" & vbCrLf, "\ " & vbCrLf)
You don't have to be using ASP or VBScript to observe this behavior:
CREATE TABLE #floobar(i INT, x VARCHAR(255));
INSERT #floobar SELECT 1, 'foo \
bar';
INSERT #floobar SELECT 2, 'foo \\
bar';
INSERT #floobar SELECT 3, 'foo
bar';
SELECT * FROM #floobar;
I don't know that you're ever going to get Microsoft to change this so that the character sequence isn't treated specially, so the "fix" is going to be to work around it. You may also have to watch out for non-traditional line breaks, e.g. Chr(10) or Chr(13) on their own, or also vbTab (Chr(9)).
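As a rough illustration of the same workaround outside ASP, here is a minimal Python sketch that inserts a space between a backslash and the line break that follows it. It also covers a lone CR or a lone LF. The function and pattern names are made up for this example.

```python
import re

# A backslash immediately followed by CRLF, a lone CR, or a lone LF.
# The lookahead leaves the line break itself untouched.
_BACKSLASH_BEFORE_EOL = re.compile(r"\\(?=\r\n|\r|\n)")


def pad_backslash_before_newline(text: str) -> str:
    """Insert a space after any backslash that ends a line."""
    return _BACKSLASH_BEFORE_EOL.sub(lambda match: "\\ ", text)


if __name__ == "__main__":
    print(repr(pad_backslash_before_newline("foo \\\r\nbar")))
```

Backslashes that are followed by any other character are left alone.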

Related

SQL is seemingly removing spaces that have been inserted by RegEx

I am working on a JavaScript app, in which I am preparing my data by replacing tabs with spaces using RegEx in the frontend:
str = str.replace(/\t+/g, " ");
So
'tabbed	title' (with a tab between the two words)
becomes
'tabbed title' and so on and so forth
This is then passed to an Express route, which sends the data to my MySQL database via a stored procedure, utilizing the escape() method from the JavaScript MySQL SDK.
The issue is that, when passing a string where tab characters have been replaced with spaces by the RegEx, the title is stored in the database as 'tabbedtitle'.
When entering 'tabbed title' normally, with spaces typed on my keyboard, the space is preserved. After the RegEx transform, it is not. It seems like SQL is doing something under the hood, or the " " in my RegEx is not a traditional space character (even though in all of my research it appears to be a regular space).
I've confirmed I am indeed passing 'tabbed title' to the db from Express, and there is nothing transforming the data inside my SP. I've even tried entering a UTF-8 space \u0020 rather than " " in my RegEx, but the problem persists.
Instead of replacing tabs with a space maybe replace them with a hyphen or some other non-whitespace character? Might help narrow it down
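Another way to narrow it down is to look at the exact code points in the string just before it is sent to the database. A minimal Python sketch of that check follows; the function name is hypothetical, and the same idea carries over to any language that can print character codes.

```python
import unicodedata


def describe_chars(text: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for every character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


if __name__ == "__main__":
    for code, name in describe_chars("tabbed title"):
        print(code, name)
```

A regular space shows up as U+0020 SPACE. A tab (U+0009) or a no-break space (U+00A0) would stand out immediately in the output.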

easy way to query without putting everything in quotation marks

How do I query in MySQL without putting all the values in quotation marks? (I have a big list and it would take too much time to quote and unquote every word.)
Example:
SELECT *
FROM names
WHERE names.first IN ("joe", "tom", "vincent")
Since you said the list is comma separated, simply use the 'find and replace' feature to find all commas and replace them with ","
The result should be joe","tom","vincent"," which you can simply copy into MySQL.
All you then have to do is fix up the quotation marks at the start and end of the string.
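If a scripting language is at hand, the same find-and-replace idea can be automated. This is a minimal Python sketch with a made-up function name. It assumes the values contain no commas or quotation marks of their own, and it does no SQL escaping.

```python
def quote_list(raw: str, quote: str = '"') -> str:
    """Turn 'joe, tom, vincent' into '"joe", "tom", "vincent"'."""
    items = [item.strip() for item in raw.split(",")]
    # Skip empty entries such as the one left by a trailing comma.
    return ", ".join(f"{quote}{item}{quote}" for item in items if item)


if __name__ == "__main__":
    print("WHERE names.first IN (" + quote_list("joe, tom, vincent") + ")")
```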

How to replace delimiters from a string in SQL Server

I have the following data
abc
pqr
xyz,
jkl mno
This is one string separated by delimiters like space, new line, comma, tab.
There could be two or more consecutive spaces or tabs or any delimiter after or before a word.
I would like to be able to do the following
Get the individual words removing all leading and trailing delimiters off it
Append the individual words with "OR"
I am trying to achieve this to build a T-SQL query separated by OR clause.
Thanks
I think you can achieve what you need using just SQL (although I think a programming language is a much better tool for this). Here is my approach.
Kindly note that I will only handle commas, newlines and multiple spaces, but you can simply follow the same technique to remove the rest of your undesired characters.
So let's assume that we have a table named ExampleData with a column named DataBefore and another called DataAfter.
DataBefore: has the line value that you want to clean
DataAfter: will host the cleaned text
First we need to trim the leading and trailing space(s) from the text
Update ExampleData
set DataAfter = LTRIM(RTRIM(DataBefore))
Second, we should replace all the commas, line breaks (both CR and LF) and tabs with spaces (it doesn't matter if we end up with many spaces together)
Update ExampleData
set DataAfter = replace(replace(replace(replace(DataAfter,',',' '),char(13),' '),char(10),' '),char(9),' ')
This is the part where you may continue and replace any other unwanted characters with a space, using the same technique.
So far we have a text that has no spaces before or after, and every comma, newline, TAB, dash, etc. character replaced by a space. Let's continue our cleaning procedure.
We can now safely move on to collapse the runs of spaces between words into a single space. This is done with the following SQL statement:
Update ExampleData
set DataAfter = replace(replace(replace(DataAfter,' ','<>'),'><',''),'<>',' ')
As per your needs, we need to place an OR between each word. This is achievable with this SQL statement:
Update ExampleData
set DataAfter = replace(replace(replace(DataAfter,' ','<>'),'><',''),'<>',' OR ')
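As a final step that may or may not change anything, we need to remove any space at the end of the whole text, just in case an unwanted character was at the end of the text and got replaced by a space. Note that a space left at the start or end will already have been turned into a dangling OR by the previous statement, so if your input can begin or end with a delimiter, run this trim (together with LTRIM) before the OR step instead. The trim can be achieved with the following statement: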
Update ExampleData
set DataAfter = RTRIM(DataAfter)
We are now done. :)
As a test, I've put the following text inside the DataBefore column:
this is just a, test, to be sure, that everything is, working, great .
and after running the previous commands, I ended up with this value inside the DataAfter column:
this OR is OR just OR a OR test OR to OR be OR sure OR that OR everything OR is OR working OR great OR .
I hope this is what you want; let me know if you need any extra help :)
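For comparison, this is roughly what the programming-language route mentioned at the start of this answer could look like. It is a minimal Python sketch with a hypothetical function name. It treats commas and any whitespace (spaces, tabs, CR, LF) as delimiters, and other delimiters would need to be added to the character class.

```python
import re


def join_words_with_or(text: str) -> str:
    """Split on runs of commas/whitespace and join the words with OR."""
    words = [word for word in re.split(r"[\s,]+", text) if word]
    return " OR ".join(words)


if __name__ == "__main__":
    print(join_words_with_or("abc\npqr\nxyz,\njkl \t mno"))
```

Because the empty pieces are dropped before joining, leading or trailing delimiters cannot leave a dangling OR.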

dealing with strange characters with access, asp and CSVs

I have a problem: I have to create a CSV file from a classic ASP page, taking the data from an MS Access database. It should all be really simple, but the final file contains tons of strange characters that appear as squares (the unknown-character square). I must get rid of those characters, but I really don't know how. Do you have any ideas?
This is an example of what I see in the file: M�NSTERSTRA�E. Of course, I don't know in advance which characters cause problems, and there are really a lot of them.
And this is how I write the CSV:
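Dim fs, f, d
Set fs = Server.CreateObject("Scripting.FileSystemObject")
' 2 = ForWriting; the third argument creates the file if needed;
' the fourth (True = -1) opens it as Unicode
Set f = fs.OpenTextFile(Server.MapPath("clienti.csv"), 2, True, True)
Do While Not rs1.EOF
    d = ""    ' no delimiter before the first field of a row
    For Each fField In rs1.Fields
        f.Write(d)
        f.Write(" ")
        temp = rs1(fField.Name)
        If Len(Trim(temp)) > 0 Then
            f.Write(Trim(temp))
        End If
        d = ";"    ' every later field is preceded by a semicolon
    Next
    f.WriteLine("")
    rs1.MoveNext
Loop
f.Close
Set f = Nothing
Set fs = Nothing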
I can't just do a replace on all of those characters, because I don't know what they are until I have extracted all the clients. I need some workaround for this.
The � means that your browser doesn't recognize that char, so it shows a substitute. One example is the "smart quotes" (curly ones) that some applications, like MS Word, substitute for the straight quotes. The default character encoding is ISO-8859-1.
If you don't want those to show up, you have 2 choices. You can delete them, or you can try to find the appropriate substitution.
Either way, first you have to identify all the chars that result in �. To do this, you'll have to go through each char and compare it to this list: http://www.ic.unicamp.br/~stolfi/EXPORT/www/ISO-8859-1-Encoding.html
Once you identify a bad char, you can either just delete it or, once you figure out what it should be, change it to that. For instance, the smart quotes are coded as 147 and 148 (in Windows-1252), so you can just change both of those to straight quotes ("). If you do a search, you'll probably find some code that does most, if not all, of this for you.
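The identify-then-substitute steps could be sketched like this in Python. All names are hypothetical, and the sketch assumes the text has already been decoded into a string. It simply lists every character outside printable ASCII so it can be checked against the table, and it maps curly quotes back to straight ones.

```python
# Curly quotes mapped to their straight equivalents.
SMART_QUOTES = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}


def suspicious_chars(text: str) -> list[str]:
    """Return the distinct characters outside printable ASCII, sorted."""
    return sorted(
        {ch for ch in text if ord(ch) > 126 or (ord(ch) < 32 and ch not in "\t\r\n")}
    )


def straighten_quotes(text: str) -> str:
    """Replace curly quotes with straight ones."""
    return text.translate(str.maketrans(SMART_QUOTES))


if __name__ == "__main__":
    for ch in suspicious_chars("M\u00dcNSTERSTRA\u00dfE"):
        print(f"U+{ord(ch):04X}", ch)
```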

CSV with comma or semicolon?

How is a CSV file built in general? With commas or semicolons?
Any advice on which one to use?
In Windows it is dependent on the "Regional and Language Options" customize screen where you find a List separator. This is the char Windows applications expect to be the CSV separator.
Of course this only has effect in Windows applications, for example Excel will not automatically split data into columns if the file is not using the above mentioned separator. All applications that use Windows regional settings will have this behavior.
If you are writing a program for Windows whose CSV output will be imported into other applications, and you know that the list separator on your target machines is a comma (,), then go for it. Otherwise I prefer the semicolon (;), since it causes fewer problems with decimal points and digit grouping and rarely appears in text.
CSV is a standard format, outlined in RFC 4180 (in 2005), so there IS no lack of a standard. https://www.ietf.org/rfc/rfc4180.txt
And even before that, the C in CSV has always stood for Comma, not for semiColon :(
It's a pity Microsoft keeps ignoring that and is still sticking to the monstrosity they turned it into decades ago (yes, I admit, that was before the RFC was created).
One record per line, unless a newline occurs within quoted text (see below).
COMMA as column separator. Never a semicolon.
PERIOD as decimal point in numbers. Never a comma.
Text containing commas, periods and/or newlines enclosed in "double quotation marks".
Quotation marks inside a text are escaped by doubling them, but only if the text itself is enclosed in double quotation marks. These two examples represent the same three fields:
1,"this text contains ""quotation marks""",3
1,this text contains "quotation marks",3
The standard does not cover date and time values; personally I try to stick to ISO 8601 format to avoid day/month/year versus month/day/year confusion.
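The two quoting examples above can be checked with a small Python sketch that uses the standard csv module. The helper name is made up. Python's reader is lenient about quotation marks inside an unquoted field, and other parsers may be stricter about the second form.

```python
import csv


def parse_line(line: str) -> list[str]:
    """Parse a single CSV line into its fields."""
    return next(csv.reader([line]))


quoted = '1,"this text contains ""quotation marks""",3'
bare = '1,this text contains "quotation marks",3'

if __name__ == "__main__":
    print(parse_line(quoted))
    print(parse_line(bare))
```

Both lines come back as the same three fields, with single quotation marks around "quotation marks" in the middle field.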
I'd say stick to comma as it's widely recognized and understood. Be sure to quote your values and escape your quotes though.
ID,NAME,AGE
"23434","Norris, Chuck","24"
"34343","Bond, James ""master""","57"
Also relevant, but specific to Excel: look at this answer and this other one, which suggest inserting a line at the beginning of the CSV with
"sep=,"
to tell Excel which separator to expect.
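To show the "quote your values and escape your quotes" advice in practice, here is a minimal Python sketch using the standard csv module with every field quoted. The function name is hypothetical, and the line terminator is set to a plain newline only to keep the output easy to read.

```python
import csv
import io


def to_quoted_csv(rows: list[list[str]]) -> str:
    """Serialize rows with every field quoted and inner quotes doubled."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_quoted_csv([
        ["23434", "Norris, Chuck", "24"],
        ["34343", 'Bond, James "master"', "57"],
    ]))
```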
1. Change the file format to .CSV (semicolon delimited)
To achieve the desired result we need to temporarily change the delimiter setting in the Excel Options:
Go to File -> Options -> Advanced -> Editing options section
Uncheck the “Use system separators” setting and put a comma in the “Decimal Separator” field.
Now save the file in the .CSV format and it will be saved in the semicolon-delimited format.
Initially it was meant to be a comma. However, since the comma is often used as a decimal point, it isn't such a good separator, hence alternatives like the semicolon, which are mostly country dependent.
http://en.wikipedia.org/wiki/Comma-separated_values#Lack_of_a_standard
CSV is a Comma Separated File. Generally the delimiter is a comma, but I have seen many other characters used as delimiters. They are just not used as frequently.
As for advising you on what to use, we need to know your application. Is the file specific to your application/program, or does this need to work with other programs?
To change comma to semicolon as the default Excel separator for CSV - go to Region -> Additional Settings -> Numbers tab -> List separator
and type ; instead of the default ,
Just to put in a word for the semicolon: in a lot of countries the comma, not the period, is the decimal separator. That covers mostly continental Europe and its former colonies, roughly half of the world, while the other half follows the UK convention (how the hell is the UK so big O_O). As a result, comma-delimited data that includes numbers causes a lot of headaches, because Excel refuses to recognize the comma as the delimiter.
Likewise, my country, Viet Nam, follows France's convention while our partner in Hong Kong uses the UK one, so the comma makes CSV unusable for us. We use \t or ; instead for international exchange, although that is still not "standard" according to the CSV specification.
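A minimal Python sketch of that convention follows: semicolon as the field delimiter and a decimal comma for fractional numbers. The function names are made up for this example, and only float values are reformatted.

```python
import csv
import io


def format_cell(value: object) -> object:
    """Write floats with a decimal comma; leave everything else alone."""
    if isinstance(value, float):
        return str(value).replace(".", ",")
    return value


def to_semicolon_csv(rows: list[list[object]]) -> str:
    """Serialize rows using ';' as the delimiter."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
    for row in rows:
        writer.writerow([format_cell(value) for value in row])
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_semicolon_csv([["apple", 3.5], ["pear", 10]]))
```

Because the delimiter is a semicolon, a value such as 3,5 does not need to be quoted.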
The best way is to save it as a text file with a .csv extension:
Sub ExportToCSV()
    Dim i As Long              ' current row ("Dim i, j As Integer" would leave i as a Variant)
    Dim Name As String
    Dim pathfile As String
    Dim fs As Object
    Dim stream As Object
    Set fs = CreateObject("Scripting.FileSystemObject")
    On Error GoTo fileexists
    i = 15                     ' first row to export
    Name = Format(Now(), "ddmmyyHHmmss")
    pathfile = "D:\1\" & Name & ".csv"
    ' Arguments: overwrite = False, unicode = True
    Set stream = fs.CreateTextFile(pathfile, False, True)
fileexists:
    If Err.Number = 58 Then    ' 58 = "File already exists"
        MsgBox "File already Exists"
        'Your code here
        Exit Sub               ' Return is only valid after a GoSub
    End If
    On Error GoTo 0
    Do Until IsEmpty(ThisWorkbook.ActiveSheet.Cells(i, 1).Value)
        ' Columns 1 and 6, separated by ";", with a decimal comma in column 6
        stream.WriteLine (ThisWorkbook.Worksheets(1).Cells(i, 1).Value & ";" & Replace(ThisWorkbook.Worksheets(1).Cells(i, 6).Value, ".", ","))
        i = i + 1
    Loop
    stream.Close
End Sub