I use Sublime Text for writing Python. Every so often it inserts characters that I didn't type. Today's example:
Non-ASCII character '\xc2' in file
/path/to/my/project/forms.py on line 256, but no encoding
declared; see http://www.python.org/peps/pep-0263.html for details
(forms.py, line 256)
This doesn't happen to my colleagues, but it happens to me from time to time. I'm not sure what to do about it. I can delete the line and re-type it and it's fine. I have tried updating versions, etc.
I don't want to just set the file encoding, because I'm not actually typing non-ASCII characters, and that would be ignoring the actual problem.
Has anyone else found this? Solutions?
That happens to me too! On macOS you're typing ⌥ + Space, which inserts a non-breaking space (U+00A0 — UTF-8 encodes it as the two bytes C2 A0, hence the '\xc2' in the error); on Windows/Linux I guess it's Alt + Space.
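If you want to locate such characters, a quick way (a minimal sketch — the demo writes a throwaway file rather than assuming a path) is to scan the raw bytes for anything outside the ASCII range:

```python
import tempfile

def find_non_ascii(path):
    """Return (line, column, byte value) for every byte outside the ASCII range."""
    hits = []
    with open(path, "rb") as f:
        for lineno, line in enumerate(f, start=1):
            for col, byte in enumerate(line, start=1):
                if byte > 0x7F:
                    hits.append((lineno, col, byte))
    return hits

# Demo: a file containing a non-breaking space (U+00A0), which UTF-8
# encodes as the two bytes C2 A0 -- the source of the '\xc2' error.
with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as f:
    f.write(b"x = 1\xc2\xa0+ 2\n")

for lineno, col, byte in find_non_ascii(f.name):
    print(f"line {lineno}, col {col}: 0x{byte:02X}")
```

Run against the real file, the reported column tells you exactly which character to delete and re-type.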
I am exporting a MySQL database on a Windows machine (running XAMPP) to then import into a Linux server (using the command line or phpMyAdmin's import of "filename.sql").
The dbdump file has mixed LF/CRLF line endings, and I know Linux uses LF for line endings.
Will this cause a problem?
Thanks
I anticipate that the mysql program on each platform would expect "its" line-ending style. I honestly don't know if, on each platform, it is "smart enough" to know what to do with each kind of file. (Come to think of it, maybe it does ...) Well, there's one sure way to find out ...
You say that the file has mixed(?!) line-endings? That's very atypical ...
However, there are ready-made [Unix/Linux] utilities, dos2unix and unix2dos, which can handle this problem – and I'm quite sure that Windows has them too. Simply run your dump-file through the appropriate one before using it.
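If dos2unix isn't handy, the same normalization is easy to do yourself — here is a minimal sketch in Python (the function name is mine, not from any tool mentioned above):

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Convert CRLF (and any stray lone CR) line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

# Example on a mixed-endings snippet like the dump described above:
mixed = b"INSERT INTO t VALUES (1);\r\nINSERT INTO t VALUES (2);\n"
print(to_unix_line_endings(mixed))
```

Read the dump file in binary mode (`"rb"`), pass the bytes through, and write the result back out before importing.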
You can import a database from Windows to Linux without a problem.
MySQL's SQL tokenizer skips all whitespace characters, according to ctype.h.
https://github.com/mysql/mysql-server/blob/8.0/mysys/sql_chars.cc#L94-L95
else if (my_isspace(cs, i))
  state_map[i] = MY_LEX_SKIP;
The my_isspace() function tests that in character set cs, the character i is whitespace. For example in ASCII, this includes:
space
tab
newline (\n)
carriage return (\r)
vertical tab (\v)
form feed (\f)
All of the whitespace characters are considered the same for this purpose. So there's no problem using CRLF or LF for line endings in SQL code.
But if your data (i.e. string values) contain different line endings, those line endings will not be converted.
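That distinction can be illustrated with Python's built-in sqlite3 module — used here purely as a stand-in, since the tokenizer behaviour described above is MySQL's, but SQLite treats SQL whitespace the same way:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# A CRLF inside the SQL text itself is just whitespace to the tokenizer...
con.execute("CREATE TABLE t\r\n(v TEXT)")

# ...but a CRLF inside a string value is data, and is stored verbatim:
con.execute("INSERT INTO t VALUES ('line1\r\nline2')")
(value,) = con.execute("SELECT v FROM t").fetchone()
print(repr(value))  # 'line1\r\nline2'
```

So converting a dump's line endings is safe for the SQL statements, but any conversion tool that blindly rewrites every CR/LF will also alter line endings stored inside your string data.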
When I try to read a csv file in Octave I realize that the very first value from it is converted to zero. I tried both csvread and dlmread and I'm receiving no errors. I am able to open the file in a plain text editor and I can see the correct value there. From what I can tell, there are no funny hidden characters, spacings, or similar in the csv file. Files also contain only numbers. The only thing that I feel might be important is that I have five columns/groups that each have different number of values in them.
I went through the commands' documentation on Octave Forge and I do not know what may be causing this. Does anyone have an idea what I can troubleshoot?
To try to illustrate the issue, if I try to load a file with the contents:
1.1,2.1,3.1,4.1,5.1
,2.2,3.2,4.2,5.2
,2.3,3.3,4.3,
,,3.4,4.4
,,3.5,
The command window will return:
0.0,2.1,3.1,4.1,5.1
,2.2,3.2,4.2,5.2
,2.3,3.3,4.3,
,,3.4,4.4
,,3.5,
(with additional trailing zeros after the decimal point).
Command syntaxes I'm using are:
dt = csvread("FileName.csv")
and
dt = dlmread("FileName.csv",",")
and they both return the same.
Your csv file contains a Byte Order Mark right before the first number. You can confirm this by opening the file in a hex editor: you will see the sequence EF BB BF before the numbers start.
This causes the first entry to be interpreted as a 'string', and since strings are parsed based on whether there are numbers in 'front' of the string sequence, this is parsed as the number zero. (see also this answer for more details on how csv entries are parsed).
In a text editor, if you place the cursor at the top left of the file and press the right arrow key once, you can tell that the cursor hasn't moved (you've just stepped over the invisible byte order mark, which takes no visible space). Pressing Backspace at that point deletes the byte order mark and allows the csv to be read properly. Alternatively, you can fix the file in a hex editor, or find some other way to convert it to a plain ASCII file (or UTF-8 without the byte order mark).
Also, it may be worth checking how this file was produced; if you have any control in that process, perhaps you can find why this mark was placed in the first place and prevent it. E.g., if this was exported from Excel, you can choose plain 'csv' format instead of 'utf-8 csv'.
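If you can't fix the producer, stripping the mark programmatically is simple — a minimal sketch in Python (the helper name is mine):

```python
UTF8_BOM = b"\xef\xbb\xbf"

def strip_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 byte order mark, if present."""
    return data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data

# Example: the first cell is unreadable as a number until the BOM is gone.
raw = b"\xef\xbb\xbf1.1,2.1,3.1\n"
print(strip_bom(raw))  # b'1.1,2.1,3.1\n'
```

Read the csv in binary mode, strip, write back out, and csvread/dlmread should then see the first value.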
UPDATE
In fact, this issue seems to have already been reported as a bug and fixed in the development branch of Octave. See #58813 :)
I am having an issue with some characters (Ç, ~, í, ...) using Lazarus + Zeos + Access.
The problem is a bit weird: sometimes I can insert them properly, but sometimes the characters go crazy. For example:
When I am typing it is ok, the ç and ã
BUT when I exit the DBedit:
This happens sometimes, and sometimes the chars are registered just fine
Using Zeos, with ZeosConnection: ClientCodepage UTF8 / ControlsCodepage UTF8 / AutoEncodestrings true.
I tried changing the charset but the problem persists, and the worst thing is that sometimes it works and sometimes it seems to lose the charset...
This difference in behaviour occurs within the same run of the program. For example, when I save a change to a record in my database, everything is fine; then I try to create a new record and the problem occurs. The funny thing is: when I type "requisição" the chars stay correct, but when I go on to type "requisição de saída" the chars break. It seems like the software is trying to auto-encode based on what I am typing.
Also, I have discovered that if I put an extra blank space after the word that ends in "...ção", everything works as it should: "requisição[][]de saída", where the [] are the two blank spaces.
Any advice?
When I open a csv file containing Chinese characters using Microsoft Excel, TextWrangler, or Sublime Text, some Chinese words cannot be displayed properly. I have no idea why this is the case.
Specifically, the csv file can be found in the following link: https://www.hkex.com.hk/eng/plw/csv/List_of_Current_SEHK_EP.CSV
One of the words that cannot be displayed correctly is shown here:
As you can see a ? can be found.
Using the Mac file command, as suggested by http://osxdaily.com/2015/08/11/determine-file-type-encoding-command-line-mac-os-x/, tells me that the csv file is encoded as utf-16le.
I am wondering what's the problem, why I cannot read that specific text?
Is it related to encoding? Or is it related to my laptop settings? Both Mac and Windows 10 on Mac (via Parallels Desktop) fail to display the word correctly.
Thanks for the help. I really want to know why this specific text cannot be displayed properly.
The actual name of HSBC Broking Securities is:
滙豐金融證券(香港)有限公司
The first character, U+6ED9 滙, is one of the troublesome HKSCS characters: characters that weren't available in standard pre-Unicode Big-5, which were grafted on in incompatible ways later.
For a while there was an unfortunate convention of converting these characters into Private Use Area characters when converting to Unicode. This data was presumably converted back then and is now mangled, replacing 滙 with U+E05E Private Use Area Character.
For PUA cases that you're sure are the result of HKSCS-compatibility-bodge, you can convert back to proper Unicode using this table.
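As an illustration only — the single-entry mapping below is hypothetical in the sense that a real fix would load the full published table, not just the one character from this question:

```python
# One known HKSCS PUA-compatibility mapping: U+E05E -> U+6ED9 (滙).
# Minimal illustrative table; extend it from the full published mapping.
PUA_TO_UNICODE = {"\ue05e": "\u6ed9"}

def fix_hkscs_pua(text: str) -> str:
    """Replace mapped Private Use Area characters with proper Unicode."""
    return "".join(PUA_TO_UNICODE.get(ch, ch) for ch in text)

print(fix_hkscs_pua("\ue05e\u8c50"))  # 滙豐
```

Any character not in the table passes through unchanged, so running this over the whole csv is safe.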
I've asked this question before, here, however that solution didn't fix it when I looked closely. This is the problem.
For some reason, my MySQL table is converting single and double quotes into strange characters, e.g.
"aha"
is changed into:
“ahaâ€
How can I fix this, or detect this in PHP and decode everything?
Previously I tried doing this query right after connecting to MySQL:
$sql="SET NAMES 'latin1'";
mysql_query($sql);
But that no longer has any effect. I'm seeing strings such as:
“aha†(for "aha")
It’s (for "it's")
etc.
Any ideas?
As per the answer to your original question, your input is actually in UTF-8, but the output you're seeing looks wrong because your output terminal and/or browser is set to the (single byte) character encoding "Windows 1252".
If you just make sure that your output is also set to UTF-8 then everything should be fine.
See Quotation marks turn to question marks
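You can reproduce the effect outside PHP. Here is a sketch in Python showing how UTF-8 bytes mis-read as Windows-1252 produce exactly this kind of garbage (`errors="ignore"` mimics a display silently dropping 0x9D, a byte Windows-1252 leaves undefined):

```python
original = "\u201caha\u201d"           # "aha" with curly quotes
utf8_bytes = original.encode("utf-8")  # e2 80 9c 61 68 61 e2 80 9d

# Mis-decoding those bytes as Windows-1252 yields the garbled string:
garbled = utf8_bytes.decode("cp1252", errors="ignore")
print(garbled)                         # â€œahaâ€

# Decoding them as UTF-8, as the answer suggests, recovers the text:
print(utf8_bytes.decode("utf-8"))      # “aha”
```

The data itself is fine; only the decoding step on output is wrong, which is why fixing the output encoding (rather than re-encoding the stored strings) is the right repair.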