dealing with strange characters with access, asp and CSVs - ms-access

I have a problem, i have to create a csv file with an ASP Classic page, taking the data from a MS Access database, all really simple, but in the final file I have tons of strange characters, appearing as squares (unknown character square). I must get rid of those characters, but i really don't know how... have you got some ideas?
this is how I see something on the file: M�NSTERSTRA�E and of course, I don't really know which are the char that give problems...and they are really a lot.
and this is how I write the csv...
dim fs,f,d
set fs = Server.CreateObject("Scripting.FileSystemObject")
set f = fs.OpenTextFile(Server.MapPath("clienti.csv"), 2, true,true)
d = ""
do while not rs1.EOF
d = ""
For Each fField in RS1.Fields
f.Write(d)
f.Write(" ")
temp = RS1(fField.Name)
if len(trim(temp)) > 0 then
f.Write(trim(temp))
end if
d = ";"
Next
f.WriteLine("")
rs1.movenext
loop
f.Close
set f = Nothing
set fs = Nothing
I can't think about making a replace of all the chars, becouse I don't know them before i extract all the clients... I need some workaround for this...

The � means that your browser doesn't recognize that char, so makes a substitute. One example is the "smart quotes" (curly ones) that some applications, like MS Word, substitute for the strait quotes. The default character encoding is ISO-8859-1.
If you don't want those to show up, you have 2 choices. You can delete them, of you can try to find the appropriate substitution.
Either way, first you have to identify all the chars that result in �. To do this, you'll have to go through each char and compare it to this list: http://www.ic.unicamp.br/~stolfi/EXPORT/www/ISO-8859-1-Encoding.html
Once you identify the bad char, you have the choice of just deleting it, or once you figure out what they should be, you can change them to what they should be. For instance, the smart quotes are coded as 147 & 148, so you can just change both of those to strait quotes ("). If you do a search, you'll probably find some code that does most, if not all, of this for you.

Related

Using REPLACE with CHAR(160) is Returning Hexadecimal as Value

I am trying to get rid of &nbsp characters in MYSQL, but am getting weird behavior where using REPLACE is returning a hexadecimal string.
The original value is some HTML stored in a field with the type BLOB:
<h3>This was just an appetizer. Are you ready for the full course?</h3><p>Dive into more business news, check out the latest tech trends, and get a couple quick tips from our health section.  </p></div>
The SQL I am using is this:
UPDATE tbl
SET field = REPLACE(field, CHAR(160), '');
And after executing, this is what is left in the database:
3C68333E5468697320776173206A75737420616E206170706574697A65722E2041726520796F7520726561647920666F72207468652066756C6C20636F757273653F3C2F68333E3C703E4469766520696E746F206D6F726520627573696E657373206E6577732C20636865636B206F757420746865206C61746573742074656368207472656E64732C20616E6420676574206120636F75706C6520717569636B20746970732066726F6D206F7572206865616C74682073656374696F6E2E20C23C2F703E3C2F6469763E
What is going on and how could I avoid this? Do I need to use VARCHAR for the field type?
You get (binary) BLOB back, after the replace.
so you have to convert it back to text
UPDATE tbl
SET field = CAST(REPLACE(field, CHAR(160), '')AS CHAR(10000) CHARACTER SET utf8);
Of course you have to check character set and size.
I found that CHAR codes didn't work, but a copy pasted whitespace worked. This looks like a normal space, but is in fact CHAR(160) and I don't have an error anymore. ' '

Weird character at the end of database entry

I am migrating an excel sheet (csv) to mysql, however when I do an insert, some fields end up with empty spaces at the end, and I cant get rid of them for some reason. So I assume there is a wierd character at the end, since not even this:
UPDATE FOO set FIELD2 = TRIM(Replace(Replace(Replace(FIELD2,'\t',''),'\n',''),'\r',''));
Gets rid of it completely, I still have a whitespace at the end and I dont know how to get rid of it. I have over 2000 entries, so doing it manually is not an option. I am using Laravel with the revision package and it doesnt work because it thinks that those spaces at the end are changes and it creates a bunch of duplicates. Thank you for your help.
If you think there are weird characters in the original csv, you could open it in a text processor capable of doing regex replaces, and then replace all non ascii characters with nothing.
Your regex would look like this:
[^\u0000-\u007F]+
then after removing any possible strange characters, re-import the data into the database.
Unfortunately, I don't think regex replaces are possible in sql, so you'll need to re-import.

MS SQL Backslash preceding new line/line feed removes line break

I'm working in classical ASP and am trying to simply insert some user input into my MS SQL 2008 database. This is something I do basically every day, but I don't think I've ever experienced this bug before.
The bug I am getting is that, if a user ends a line of text in backslash, and starts a new line below, both the backslash and line break are lost after the data is stored in the DB.
If i try the following statement, hardcoded from an ASP file:
UPDATE TBLarticle_text SET Introduction = 'Text on first line \" & vbCrLf & " text on second line' WHERE ArticleGuid = 28
The resulting data is without the backslash or the line break. The string is correct if stored in a variable and printed on the page.
Here is the example user input (normally from a form, but it's not really relevant). The input:
Text on first line \
text on second line
... is stored as:
Text on first line text on second line
I don't see any issues if the backslash is followed by anything other than a line break.
I know this is old, but I just came across a MS KB article that discusses this: http://support.microsoft.com/kb/164291/en-us. The long and short of it is that the three characters \<CR><LF> together is some weird escape sequence, and you need to replace all occurrences of \<CR><LF> with \\<CR><LF><CR><LF> when inserting or updating because the first backslash in \\<CR><LF><CR><LF> escapes the weird escape sequence, i.e. the next three characters \<CR><LF>, and then the additional <CR><LF> puts the carriage return back in place. Annoying, yes.
I am seeing a similar issue. I would say the best way to work around this is to not let ASP pass such strings to SQL Server. You can clean this up by simply replacing that character sequence and injecting a space at the end of such a line (which a user is unlikely to ever notice):
sql = REPLACE(sql, "\" & vbCrLf, "\ " & vbCrLf)
You don't have to be using ASP or VBScript to observe this behavior:
CREATE TABLE #floobar(i INT, x VARCHAR(255));
INSERT #floobar SELECT 1, 'foo \
bar';
INSERT #floobar SELECT 2, 'foo \\
bar';
INSERT #floobar SELECT 3, 'foo
bar';
SELECT * FROM #floobar;
I don't know that you're ever going to get Microsoft to fix this to not treat this character sequence special, so the "fix" is going to be to work around it. You may also have to watch out for non-traditional CR/LF, e.g. CHR(10) or CHR(13) on their own, or also vbTab (Chr(9))).

Removing strange characters from MySQL data

Somewhere along the way, between all the imports and exports I have done, a lot of the text on a blog I run is full of weird accented A characters.
When I export the data using mysqldump and load it into a text editor with the intention of using search-and-replace to clear out the bad characters, searching just matches every "a" character.
Does anyone know any way I can successfully hunt down these characters and get rid of them, either directly in MySQL or by using mysqldump and then reimporting the content?
This is an encoding problem; the  is a non-breaking space (HTML entity ) in Unicode being displayed in Latin1.
You might try something like this... first we check to make sure the matching is working:
SELECT * FROM some_table WHERE some_field LIKE BINARY '%Â%'
This should return any rows in some_table where some_field has a bad character. Assuming that works properly and you find the rows you're looking for, try this:
UPDATE some_table SET some_field = REPLACE( some_field, BINARY 'Â', '' )
And that should remove those characters (based on the page you linked, you don't really want an nbsp there as you would end up with three spaces in a row between sentences etc, you should only have one).
If it doesn't work then you'll need to look at the encoding and collation being used.
EDIT: Just added BINARY to the strings; this should hopefully make it work regardless of encoding.
The accepted answer did not work for me.
From here http://nicj.net/mysql-converting-an-incorrect-latin1-column-to-utf8/ I have found that the binary code for  character is c2a0 (by converting the column to VARBINARY and looking what it turns to).
Then here http://www.oneminuteinfo.com/2013/11/mysql-replace-non-ascii-characters.html found the actual solution to remove (replace) it:
update entry set english_translation = unhex(replace(hex(english_translation),'C2A0','20')) where entry_id = 4008;
The query above replaces it to a space, then a normal trim can be applied or simply replace to '' instead.
I have had this problem and it is annoying, but solvable. As well as  you may find you have a whole load of characters showing up in your data like these:
“
This is connected to encoding changes in the database, but so long as you do not have any of these characters in your database that you want to keep (e.g. if you are actually using a Euro symbol) then you can strip them out with a few MySQL commands as previously suggested.
In my case I had this problem with a Wordpress database that I had inherited, and I found a useful set of pre-formed queries that work for Wordpress here http://digwp.com/2011/07/clean-up-weird-characters-in-database/
It's also worth noting that one of the causes of the problem in the first place is opening a database in a text editor which might change the encoding in some way. So if you can possibly manipulate the database using MySQL only and not a text editor this will reduce the risk of causing further trouble.

CSV with comma or semicolon?

How is a CSV file built in general? With commas or semicolons?
Any advice on which one to use?
In Windows it is dependent on the "Regional and Language Options" customize screen where you find a List separator. This is the char Windows applications expect to be the CSV separator.
Of course this only has effect in Windows applications, for example Excel will not automatically split data into columns if the file is not using the above mentioned separator. All applications that use Windows regional settings will have this behavior.
If you are writing a program for Windows that will require importing the CSV in other applications and you know that the list separator set for your target machines is ,, then go for it, otherwise I prefer ; since it causes less problems with decimal points, digit grouping and does not appear in much text.
CSV is a standard format, outlined in RFC 4180 (in 2005), so there IS no lack of a standard. https://www.ietf.org/rfc/rfc4180.txt
And even before that, the C in CSV has always stood for Comma, not for semiColon :(
It's a pity Microsoft keeps ignoring that and is still sticking to the monstrosity they turned it into decades ago (yes, I admit, that was before the RFC was created).
One record per line, unless a newline occurs within quoted text (see below).
COMMA as column separator. Never a semicolon.
PERIOD as decimal point in numbers. Never a comma.
Text containing commas, periods and/or newlines enclosed in "double quotation marks".
Only if text is enclosed in double quotation marks, such quotations marks in the text escaped by doubling. These examples represent the same three fields:
1,"this text contains ""quotation marks""",3
1,this text contains "quotation marks",3
The standard does not cover date and time values, personally I try to stick to ISO 8601 format to avoid day/month/year -- month/day/year confusion.
I'd say stick to comma as it's widely recognized and understood. Be sure to quote your values and escape your quotes though.
ID,NAME,AGE
"23434","Norris, Chuck","24"
"34343","Bond, James ""master""","57"
Also relevant, but specially to excel, look at this answer and this other one that suggests, inserting a line at the beginning of the CSV with
"sep=,"
To inform excel which separator to expect
1.> Change File format to .CSV (semicolon delimited)
To achieve the desired result we need to temporary change the delimiter setting in the Excel Options:
Move to File -> Options -> Advanced -> Editing Section
Uncheck the “Use system separators” setting and put a comma in the “Decimal Separator” field.
Now save the file in the .CSV format and it will be saved in the semicolon delimited format.
Initially it was to be a comma, however as the comma is often used as a decimal point it wouldnt be such good separator, hence others like the semicolon, mostly country dependant
http://en.wikipedia.org/wiki/Comma-separated_values#Lack_of_a_standard
CSV is a Comma Seperated File. Generally the delimiter is a comma, but I have seen many other characters used as delimiters. They are just not as frequently used.
As for advising you on what to use, we need to know your application. Is the file specific to your application/program, or does this need to work with other programs?
To change comma to semicolon as the default Excel separator for CSV - go to Region -> Additional Settings -> Numbers tab -> List separator
and type ; instead of the default ,
Well to just to have some saying about semicolon. In lot of country, comma is what use for decimal not period. Mostly EU colonies, which consist of half of the world, another half follow UK standard (how the hell UK so big O_O) so in turn make using comma for database that include number create much of the headache because Excel refuse to recognize it as delimiter.
Like wise in my country, Viet Nam, follow France's standard, our partner HongKong use UK standard so comma make CSV unusable, and we use \t or ; instead for international use, but it still not "standard" per the document of CSV.
best way will be to save it in a text file with csv extension:
Sub ExportToCSV()
Dim i, j As Integer
Dim Name As String
Dim pathfile As String
Dim fs As Object
Dim stream As Object
Set fs = CreateObject("Scripting.FileSystemObject")
On Error GoTo fileexists
i = 15
Name = Format(Now(), "ddmmyyHHmmss")
pathfile = "D:\1\" & Name & ".csv"
Set stream = fs.CreateTextFile(pathfile, False, True)
fileexists:
If Err.Number = 58 Then
MsgBox "File already Exists"
'Your code here
Return
End If
On Error GoTo 0
j = 1
Do Until IsEmpty(ThisWorkbook.ActiveSheet.Cells(i, 1).Value)
stream.WriteLine (ThisWorkbook.Worksheets(1).Cells(i, 1).Value & ";" & Replace(ThisWorkbook.Worksheets(1).Cells(i, 6).Value, ".", ","))
j = j + 1
i = i + 1
Loop
stream.Close
End Sub