SQLCMD changes "7 k" string to 7k or 7,000? - csv

I have a SQL query that returns a value similar to "7 KI" for a description column, but when I export my data to a .csv/text file using SQLCMD, the resulting data is "7KI" (no space between the 7 and the K). I believe this causes a formatting issue when I open the CSV file in Excel later on. (I'm wondering whether "7K" is getting read as 7,000, because the data splits into a new column as if it contained a comma there.)
This is so weird to me. I am using the Quotename function to return the Unicode version of this description field in the first place. Do you think that's why I am experiencing this issue? I used the Quotename function to fix issues with this field a few days ago. To me, it's odd that the export strips the space between the 7 and the K at all.
Has anyone ever seen or fixed this before? It really looks like there is a space in the SQL data, and I copied and pasted the value into an online Unicode converter to confirm there is a space. Why would a single space be stripped in the SQLCMD export? Plenty of my other text fields do not have this issue, even though they contain spaces too (although not followed by "K"), so maybe the issue is with CMD somehow reading the value?

Related

Shorttext data type uses wrong character cut-off

I have a table with one field that is made up of hyperlinks such as this:
<http://nl.dbpedia.org/resource/Wereldkampioenschappen_indooratletiek_2008>
Now I have to change the data type of this field to Short Text so I can later use JOIN in the query. From what I understand, it is supposed to automatically cut off any value that goes above the 255-character threshold. That is not what happens with the example above: if I change the data type to Short Text, the text becomes:
<http://nl.dbpedia.org/resource/Wereldkampioenschappen_in
So it seems to keep only 57 characters instead of 255. I also tried using the Import Text Wizard and setting the data type to Short Text there (so the field never gets imported as a hyperlink), but the same problem persists as when I change it from Hyperlink to Short Text.
Does anyone know how I can fix this? Thanks :)
P.S. I literally started working with Access today, so I'm still very much Googling everything. I couldn't find this problem anywhere though unfortunately.
Short Text will truncate the text to the number of characters specified in the "Table Design" view. You can specify any number of characters from 1 to 255.
More Information:
YouTube : Access 2016: Getting Started
Office.com : Access Database design basics
Office.com : Introduction to data types and field properties
Office.com : Data types for Access desktop databases
I imported using fixed width instead of delimited. Use delimited and this problem goes away!

Weird character at the end of database entry

I am migrating an Excel sheet (CSV) to MySQL. However, when I do an insert, some fields end up with empty spaces at the end, and I can't get rid of them for some reason. I assume there is a weird character at the end, since not even this:
UPDATE FOO set FIELD2 = TRIM(Replace(Replace(Replace(FIELD2,'\t',''),'\n',''),'\r',''));
gets rid of it completely. I still have whitespace at the end and I don't know how to get rid of it. I have over 2000 entries, so doing it manually is not an option. I am using Laravel with the revision package, and it doesn't work because it treats those trailing spaces as changes and creates a bunch of duplicates. Thank you for your help.
If you think there are weird characters in the original CSV, you could open it in a text processor capable of regex replaces, and then replace all non-ASCII characters with nothing.
Your regex would look like this:
[^\u0000-\u007F]+
then after removing any possible strange characters, re-import the data into the database.
Unfortunately, I don't think regex replaces are possible in SQL here (MySQL only gained REGEXP_REPLACE in version 8.0), so on older versions you'll need to re-import.
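If no regex-capable text editor is at hand, the same replacement can be scripted before the re-import. The following is only a minimal sketch: the function name and the sample value are made up for illustration, and the trailing non-breaking space (U+00A0) is just one plausible culprit that TRIM() with its default arguments would leave in place.

```python
import re

# Same character class as the regex above: anything outside U+0000..U+007F.
NON_ASCII = re.compile(r"[^\u0000-\u007F]+")

def strip_non_ascii(text: str) -> str:
    """Remove every run of non-ASCII characters from text."""
    return NON_ASCII.sub("", text)

# Hypothetical sample value ending in a non-breaking space (U+00A0).
sample = "Acme Corp\u00a0"
cleaned = strip_non_ascii(sample).strip()
print(repr(cleaned))  # 'Acme Corp'
```

Note that this pattern also deletes legitimate accented letters (for example the "é" in "Société"), so it is only appropriate when the data is expected to be pure ASCII.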

MySQL export to MS Excel : 1-9 becomes 9-Jan or 42013

I use Workbench to query database at work. We have a field which indicates company size and has the following options :
1-9
10-49
50-99
100-499
500+
When I export the results containing this field to Excel (which I use for analysis), 1-9 becomes 9-Jan, and when I change the format of the cell to text, it becomes 42013. Similarly, 10-49 becomes Oct-50, and as text, 18537. Is there a way to avoid this?
I know this may seem trivial but I take a download of the results every couple of hours or so, and currently, I use the Replace function in Excel to fix this which is a time cost. Also, adding manual intervention increases the probability of error which I want to minimize. I would ideally like the result to export as 1-9, as it exists in the database, based on which the analytical model is built to take input.
I would appreciate any help or pointers on how to fix this issue.
Thanks!
You are not saying how you are bringing the data into Excel. The simplest method is to bring the column in as "text". You can do this when you are importing the data into Excel, by setting the column type to "text".
Alternatively, when you create the output file, you can prepend the value with a single quote or some other character:
select concat('''', company_size)
select concat('_', company_size)
Appreciate the help! I used to export the results in CSV which caused the problem I think. I exported them in XML and that solved it, the fields appear as they exist in the database. Thanks a lot. The concatenation would work as well!
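The prefixing idea from the answer can also be applied after the query, while writing the CSV. This is a rough sketch under assumptions: `guard_for_excel` is a hypothetical helper, and the underscore prefix mirrors the second concat example above.

```python
import csv
import io

def guard_for_excel(value: str, prefix: str = "_") -> str:
    """Prepend a marker so a spreadsheet does not parse the value as a date."""
    return prefix + value

# Sample values taken from the company-size options in the question.
sizes = ["1-9", "10-49", "500+"]

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["company_size"])
for size in sizes:
    writer.writerow([guard_for_excel(size)])

print(buf.getvalue())
```

The downstream model would then need to strip the prefix again, which is why importing the column as text (or exporting to XML, as the asker did) may be the less intrusive option.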

SSIS - Text was truncated or one or more characters had no match in the target code page - Special Characters

I have a text file with vertical bar (|) separated values, and I am using a Flat File source to read the values, which fails with the above error.
I have a Flat File Connection Manager, where I set the column width of each column. The particular column which causes the error has
DataType - DT_WSTR
OutputColumnWidth - 30
The problem occurs only when the particular column has special characters, as in 'Société Amomyna da Pramt Hgyme', though it still has only 30 characters.
If I increase the column width it works, but I need to know whether that is the right solution.
Please let me know if you require more details. Thanks in advance.
If you go to the Flat File Connection Manager, open Advanced, and look at the tooltip for the "OutputColumnWidth" description, it will tell you that composite characters may use more spaces. So the "é" in "Société" most likely occupies more than one character.
EDIT: Here's something about it: http://en.wikipedia.org/wiki/Precomposed_character
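To see how the same visible text can occupy a different number of characters, compare the precomposed (NFC) and decomposed (NFD) Unicode forms. This is only an illustration of the idea in the answer. Whether the file in the question actually stores "é" in decomposed form is an assumption that would have to be checked against the raw bytes.

```python
import unicodedata

word = "Société"

# NFC stores "é" as one code point (U+00E9);
# NFD stores it as "e" followed by a combining acute accent (U+0301).
composed = unicodedata.normalize("NFC", word)
decomposed = unicodedata.normalize("NFD", word)

print(len(composed), len(decomposed))  # 7 9
```

Both strings render identically, but the decomposed one is two characters longer, so a value that looks like 30 characters could exceed a 30-character column.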

MSSQL to MySQL migration - char encoding issues with UCS-2 surrogate pairs, how can I remove these from MSSQL database?

I have been tasked with migrating a Microsoft SQL Server 2005 database to MySQL 5.6 (both database servers are running locally) and would really appreciate some help.
-MSSQL source database has latin1 collation (so has ISO 8859-1 character set right?) but doesn't have any char/varchar fields (any string field is nvarchar/nchar) so all this data should be using the UCS-2 character set.
-MySQL target database wants the character set UTF-8
I decided to use the database migration toolkit in the latest version of MySQL Workbench. At first it worked fine and migrated everything as expected, but I was totally tripped up when I encountered UCS-2 surrogate pair characters in the MSSQL database.
The migration toolkit copytable program did not provide a very useful error message: "Error during charset conversion of wstring: No error". It also did not provide any field/row information on the problem-causing data and would fail within chunks of 100 rows. So after searching through the 100 rows after the last successful insert I found that the issue seemed to be caused by two UCS-2 characters in one of the nvarchar fields. They are listed as surrogate pairs in the UCS-2 character set. They were specifically the characters DBC0 and DC83 (I got this by looking at the binary data for the field and comparing byte pairs (little endian) with data that was being migrated successfully).
When this surrogate pair was removed from the MSSQL database the row was migrated successfully to MySQL.
Here is the problem:
I have tried to search for these characters in a test MSSQL table (this chartest table is just various test strings in an nvarchar field) to prepare a replacement script, and I keep getting strange results... I must be doing something incorrectly.
Searching for
SELECT * FROM chartest WHERE text LIKE NCHAR(0xdc83)
will return any surrogate pair character (whether or not it uses DC83), but obviously only if it is the only character (or part of the pair) in that field. This isn't a big deal, since I would like to remove any instance of these anyway (I don't like to remove data like this, but I think we can afford it).
Searching for
SELECT * FROM chartest WHERE text LIKE '%' + (NCHAR(0xdc83))+ '%'
will return every row, regardless of whether the field contains any Unicode character at all, let alone the DC83 character. Is there a better way to find and replace these characters? Or something else I should try?
I have also tried setting the target database, table, and field character set to UCS-2, but it does not seem to make a difference.
I should also mention that this migration is using live data (~50GB database!) while one of the sites that feeds it is taken offline so any solutions to this need to have a quick running time...
I would appreciate any suggestions very much! Please let me know if there is any information I have left out.
I had this error, and I have now discovered the source of the problem. I had a hard time finding it, so maybe this will be useful to someone, even though I realize my problem and workaround may not exactly match the OP's original trouble.
I am migrating data from MSSQL to MySQL, and the content being migrated is html-content from Sitecore CMS (target CMS is Drupal, btw).
I've found that I get this error when the conversion hits records that contain Instagram embeds. Instagram embeds work by copying the embedded post data into the embed code (instead of loading it asynchronously, etc.; even the image is included as base64 CSS), and people tend to put a lot of emoji in their image descriptions (using their iPhones with the emoji keyboard). Emoji are represented by 4-byte encoded characters, but MySQL utf8 only allows for 3-byte encoded Unicode characters.
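The byte counts are easy to check. The sketch below uses two hypothetical helpers: `utf8_len` reports the UTF-8 length of a character, and `needs_utf8mb4` flags text containing anything outside the Basic Multilingual Plane, which is what a 3-byte utf8 column cannot hold.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

def needs_utf8mb4(text: str) -> bool:
    """True if any character lies above U+FFFF (4 bytes in UTF-8)."""
    return any(ord(ch) > 0xFFFF for ch in text)

# "é" (U+00E9), the euro sign (U+20AC), a grinning-face emoji (U+1F600)
print(utf8_len("\u00e9"), utf8_len("\u20ac"), utf8_len("\U0001F600"))  # 2 3 4
print(needs_utf8mb4("Société"), needs_utf8mb4("nice pic \U0001F600"))  # False True
```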
My initial error from running wbcopytables.exe (which is the non-GUI way of doing Migration Wizard in MySQL Workbench) was the
Error during charset conversion of wstring: No error
but upgrading MySQL Workbench to recent version (from 5.something to 6.x) makes the error a bit more descriptive, hinting table and column (alas, not row):
ERROR: Could not successfully convert UCS-2 string to UTF-8 in table
[MyDatabase].[dbo].[MyTable] (column MyColumn).
Original string: ...
Anyway - a solution *could* be to use utf8mb4 which would allow for the emoji's. Read more here.
But it looks like it's a bad idea to do this in, for example, my case with Drupal.
So the solution I ended up with was simply to strip these characters in my migration script. There is no point in keeping them for users of the site in question, since they are displayed as rectangles on the webpage anyway. Since you can't search-and-replace with regex in SQL Server, I processed the data using a DAL and C# .NET, and I found the help here (thanks a ton, Jon Skeet): it turns out there is a regex pattern for matching one half of a surrogate pair in UTF-16. See below (and use the pattern in another language if needed).
var noUcs2SurrogatePairsString = Regex.Replace(stringWithUcs2SurrogatePairs, @"\p{Cs}", string.Empty);
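For a migration script in another language, the equivalent filter looks a little different when strings are sequences of code points rather than UTF-16 code units. A minimal Python sketch follows. The function name is hypothetical, and it assumes that dropping everything outside the Basic Multilingual Plane (plus any lone surrogates) is acceptable, as it was in the case described above.

```python
def strip_astral(text: str) -> str:
    """Drop lone surrogates and every character above U+FFFF."""
    return "".join(
        ch for ch in text
        if not (0xD800 <= ord(ch) <= 0xDFFF or ord(ch) > 0xFFFF)
    )

print(strip_astral("great shot \U0001F600!"))  # great shot !
```

Characters that fit in three UTF-8 bytes, such as accented Latin letters, pass through unchanged.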
I had a very similar problem today, and I found that it was caused by empty strings. I replaced them with NULLs or a value representing no data, and the migration worked fine.
I solved it just by editing the "import data script.cmd": where it reads columns "As NVARCHAR", I replaced those with "VARCHAR".
Note: My table columns were already of VARCHAR type, so for some reason the migration script improperly cast them to a Unicode (NVARCHAR) type.
This issue has now been resolved. I used user Remus Rusanu's suggestion here for finding the rows with these surrogate pair characters using CHARINDEX and have decided to use SUBSTRING to exclude the troublesome characters like so:
UPDATE test
SET a = SUBSTRING(a, 1, (CHARINDEX(0x83dc, CAST(a AS VARBINARY(8000)))+1)/2 - 1) -- string before the unwanted character
+ SUBSTRING(a, (CHARINDEX(0x83dc, CAST(a AS VARBINARY(8000)))+1)/2 +1, LEN(a) ) -- string after the unwanted character
WHERE CHARINDEX(0x83dc, CAST(a AS VARBINARY(8000))) % 2 = 1 -- only odd numbered charindexes (to signify match at beginning of byte pair character)
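The position arithmetic in that statement may be easier to follow outside SQL. The sketch below mimics it on UTF-16LE bytes. `find_code_unit` is a hypothetical helper, and the mapping to CHARINDEX is by analogy only: a 1-based odd byte position in SQL corresponds to a 0-based even offset here.

```python
def find_code_unit(text: str, unit: int) -> int:
    """Return the 1-based UTF-16 position of code unit `unit`, or 0 if absent."""
    data = text.encode("utf-16-le", "surrogatepass")
    needle = unit.to_bytes(2, "little")  # 0xDC83 is stored as bytes 83 DC
    start = 0
    while True:
        idx = data.find(needle, start)
        if idx == -1:
            return 0
        if idx % 2 == 0:
            # Aligned on a 2-byte boundary: a real code unit, not a match
            # straddling two neighbouring characters.
            return idx // 2 + 1
        start = idx + 1

# U+100083 is encoded in UTF-16 as the surrogate pair DBC0 DC83.
print(find_code_unit("ab\U00100083cd", 0xDC83))  # 4
# Bytes 00 83 DC 00 contain 83 DC, but only across a character boundary.
print(find_code_unit("\u8300\u00dc", 0xDC83))    # 0
```

The second call shows why the WHERE clause keeps only odd CHARINDEX results: a byte-level match can otherwise land in the middle of two unrelated characters.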