SSIS tab delimited export for mainframe consumption issue

Good afternoon, hope someone can shed some light on a strange issue that I am having. I am running a simple SSIS flat file export job that creates a tab delimited flat file. The file is to be consumed by our mainframe team, but when they receive it they see a period (.) character before and after each field. I am wondering if the tab character that SSIS produces is not being handled, or is not what the mainframe expects it to be. Any help would be appreciated.

What encoding is your SSIS script outputting, and what encoding is the mainframe expecting?
The table here indicates that the ASCII tab character (0x09) is a superscript indicator in EBCDIC. IIRC (it's been a looong time), the period was used as a 'not printable' placeholder in mainframe output.
EDIT: And what character sets/code pages are in play? ASCII <> EBCDIC, and CP437 (OEM) <> CP1252 (Windows Latin-1).
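A quick way to see the mismatch, if you have Python handy (the code page name below is an assumption; your mainframe may use an EBCDIC variant other than cp037):

# ASCII and the Windows code pages put the horizontal tab at 0x09;
# EBCDIC code page 037 puts it at 0x05. A byte-for-byte transfer
# hands the mainframe 0x09, which is not a tab there and would be
# rendered with the 'not printable' placeholder.
print('\t'.encode('cp037'))            # b'\x05' -- where EBCDIC keeps the tab
print(repr(b'\x09'.decode('cp037')))   # '\x8d' -- a control character, not a tab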

Related

ERP (IFS) export into CSV - encoding problem

I'm exporting some data from the ERP system (IFS) into a CSV file. From that CSV it's being uploaded to another tool.
I have a problem with character encoding. Until now we were pulling only Danish and Finnish data and used WE8MSWIN1252. Now we need to include Polish characters as well. Unfortunately, the encoding we have does not cover the Polish special characters. I've already tried AL16UTF16, AL32UTF8, and EEC8EUROASCI, and none of them gave the expected result (all of the Danish, Finnish, and Polish special characters showing correctly in the CSV). Is there any encoding that would cover all of those characters right in the CSV? When I opened the AL32UTF8 output in Notepad it looked fine, but we have to use the CSV due to the integration that is the next step in the puzzle.
Please note that changing the CSV to anything else is really a last resort. We don't want to play with the integration further down the line.
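Not an IFS-specific check, but you can verify the coverage problem in a couple of lines of Python; the sample string is made up for illustration:

# Danish/Finnish letters fit in Windows-1252 (Oracle's WE8MSWIN1252),
# but the Polish ones do not; UTF-8 (AL32UTF8) covers all of them.
sample = "æøå äö łśżź"
sample.encode("utf-8")        # works: UTF-8 can encode any Unicode character
try:
    sample.encode("cp1252")
except UnicodeEncodeError as e:
    print(e)                  # fails on the Polish letters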

MySQL - Table Data Import Wizard error in macOS: "Unhandled exception: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)"

I am unable to load any CSV file into MySQL. Using the Table Data Import Wizard, this error pops up every time I get to the 'Configure Import Settings' step:
"Unhandled exception: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)"
... even though the CSV is encoded as UTF-8, and that seems to be the default encoding setting for MySQL Workbench. Granted, I am not very skilled with computers; I have only a few weeks' exposure to MySQL. This has not always happened to me: I had no issues with this a couple of months ago while I was in a database management course.
But I think this is where my problem lies: at one point I tried to uninstall MySQL Workbench and Community Server and re-installed, and ever since, this error happens every time I try to load data. I am even using a very basic test file that still won't load (all column types are set to 'Text' in Excel and saved as UTF-8 CSV).
I am using MySQL 8.0.28 on macOS 11.5.2 (Big Sur).
Case 1, you wanted ï ("LATIN SMALL LETTER I WITH DIAERESIS"):
Character set ASCII is not adequate for the accented letters you have. You probably need latin1.
Case 2, the first 3 bytes of the file are (hex) EF BB BF:
That is "BOM", which is a marker at the beginning of the file that indicates that it is encoded in UTF-8. But, apparently, the program reading it dos not handle such.
In some situations, you can remove the 3 bytes and proceed; in other situations, you need to read it using some UTF-8 setting.
Since you say "Text' in Excel and saved as UTF-8 CSV", I suspect that it is case 2. But that only addresses the source (Excel), over which you may not have enough control to get rid of the BOM.
I don't know which app has the "Table Data Import Wizard", so I cannot address the destination side of the problem. Maybe the wizard has a setting of UTF-8 or utf8mb4 or utf8; any of those might work instead of "ascii".
Sorry, I don't have the full explanation, but maybe the clues "BOM" or "EFBBBF" will help you find a solution either in Excel or in the Wizard.
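If it turns out to be case 2 and you cannot change anything on the wizard's side, a minimal Python sketch for stripping the BOM (the file names are placeholders):

# 'utf-8-sig' consumes a leading BOM (EF BB BF) if present;
# plain 'utf-8' writes the file back out without one.
with open("data.csv", encoding="utf-8-sig") as src:
    text = src.read()
with open("data_nobom.csv", "w", encoding="utf-8") as dst:
    dst.write(text)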
I was able to solve it by saving my Excel file to CSV using MS-DOS CSV and Macintosh CSV. After that, I was able to import my CSV through the Import Wizard without the bug.

Import a CSV in another language into SAS

I am attempting to import a CSV file which is in French to my US based analysis. I have noticed several issues in the import related to the use of accents. I put the csv file into a text reader and found that the data look like this
I am unsure how to get rid of the [sub] pieces and format this properly.
I am on SAS 9.3 and am unable to edit the CSV as it is a shared CSV with French researchers. I am also limited to what I can do in terms of additional languages within SAS because of admin rights.
I have tried the following fixes:
data want(encoding=asciiany);
  set have;
  comment = compress(comment, '0D0A'x);      /* strip CR/LF bytes */
  comment = tranwrd(comment, '0D0A'x, '');
  comment = tranwrd(comment, '0D'x, '');
  comment = tranwrd(comment, "\u001a", '');  /* no-op: SAS has no \u escapes; the SUB character is '1A'x */
run;
How can I resolve these issues?
While this would have been a major issue a few decades ago, nowadays, it's very simple to determine the encoding and then run your SAS in the right mode.
First, open the CSV in a text editor, not the basic Notepad but almost any other; Notepad++ is free, for example, or Ultraedit or Textpad, on Windows, or on the Mac, BBEdit, or several others will do. I'll assume Notepad++ for the rest of this answer, but all of them have some way of doing this. If you're in a restricted no-admin-rights environment, good news: Notepad++ can be installed in your user folder with no admin rights (or even on a USB!). (Also, an advanced text editor is a vital data science tool, so you should have one anyway.)
In Notepad++, once you open the file there will be an encoding in the bottom right: "UTF-8", "WLATIN1", "ASCII", etc., depending on the encoding of the file. Look and see what that is, and write it down.
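If even a portable editor is off the table, a rough Python sketch (the file name is a placeholder) can at least tell you whether the file carries a Unicode BOM:

# Check the first bytes for a byte-order mark. Note that no BOM is
# common, though: it just means you must judge the encoding another way.
with open("data.csv", "rb") as f:
    head = f.read(3)
if head.startswith(b"\xef\xbb\xbf"):
    print("UTF-8 with BOM")
elif head.startswith(b"\xfe\xff") or head.startswith(b"\xff\xfe"):
    print("UTF-16")
else:
    print("no BOM: plain ASCII, UTF-8 without BOM, or a single-byte code page")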
Once you have that, you can try starting SAS in that encoding. For the rest of this, I assume it is UTF-8, as that is fairly standard, but replace UTF-8 with whatever encoding you determined earlier.
See this article for more details; the instructions are for 9.4, but they have been the same for years. If this doesn't work, you'll need to talk to your SAS administrator, and they may need to modify your SAS installation.
You can either:
Make a new shortcut (a copy of the one you run SAS with) and add -encoding UTF-8 to the command line
Create a new configuration file, point SAS to it, and include ENCODING=UTF-8 in the configuration file.
Note that this will have some other impacts: the datasets you create will be encoded in UTF-8, and while SAS is capable of handling that, it will add some extra notes to the log, and some extra time, if you later use these datasets in a non-UTF-8 SAS session, or use non-UTF-8 SAS datasets in this mode.
This worked:
data want;
  /* lookup tables: accented character -> ASCII replacement */
  array f[8] $4 _temporary_ ('ä' 'ö' 'ü' 'ß' 'Ä' 'Ö' 'Ü' 'É');
  array t[8] $4 _temporary_ ('ae' 'oe' 'ue' 'ss' 'Ae' 'Oe' 'Ue' 'E');
  set have;
  newvar = oldvar;
  newvar = compress(newvar, '0D0A'x);   /* strip CR/LF bytes */
  newvar = tranwrd(newvar, '0D0A'x, '');
  newvar = tranwrd(newvar, '0D'x, '');
  newvar = tranwrd(newvar, '1A'x, '');  /* the SUB char shown as [sub]; "\u001a" is not a SAS escape */
  newvar = compress(newvar, , 'kw');    /* keep only writable characters */
  do _n_ = 1 to dim(f);
    newvar = tranwrd(newvar, trim(f[_n_]), trim(t[_n_]));
  end;
run;

How do you fix the following error? I am trying to use the Table Data Import Wizard to load a CSV file into Workbench

I am trying to upload a .csv file into Workbench using the Table Data Import Wizard.
I receive the following error whenever attempting to load it:
Unhandled exception: 'ascii' codec can't decode byte 0xc3 in position 1253: ordinal not in range(128)
I have tried previous solutions that suggested encoding the .csv file as an MS-DOS CSV and as a UTF-8 CSV. Neither has worked for me.
Changing the data in the file would not be feasible, since it's made up of thousands of cells, so it would be quite impractical. Is there anything that can be done to resolve this?
What was after the C3? What should have been there?
C3, when interpreted as "latin1", is Ã -- an unlikely character.
More likely is a 2-byte UTF-8 code that starts with C3. This range includes the accented letters of Western European languages; for example é is hex C3A9.
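You can see both readings of those two bytes in Python, just as an illustration:

# The same two bytes, decoded as UTF-8 vs. as latin1.
print(b"\xc3\xa9".decode("utf-8"))    # é  -- one 2-byte UTF-8 character
print(b"\xc3\xa9".decode("latin-1"))  # Ã© -- two latin1 characters (mojibake)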
You tried "UTF-8 csv" -- Please provide the specifics of how you tried it. What settings in the Wizard, etc.
Probably you should state that the data is "UTF-8" or utf8mb4, depending on whether you are referring to outside or inside MySQL.
Meanwhile, if you are loading the data into an existing "table", let's see SHOW CREATE TABLE. It should probably not say "ascii" anywhere; instead, it should probably say "utf8mb4".

Migrating MS Access data to MySQL: character encoding issues

We have an MS Access .mdb file produced, I think, by an Access 2000 database. I am trying to export a table to SQL with mdbtools, using this command:
mdb-export -S -X \\ -I orig.mdb Reviewer > Reviewer.sql
That produces the file I expect, except for one thing: some characters are represented as question marks. "He wasn't ready" shows up as "He wasn?t ready", but only in some cases (primarily single/double curly quotes), where the content may have been pasted into the DB from MS Word. Otherwise, the data look great.
I have tried various values for "export MDB_ICONV=". I've tried using iconv on the resulting file, with ISO-8859-1 in the from/to, with UTF-8 in the from/to, with WINDOWS-1250 and WINDOWS-1252 and WINDOWS-1256 in the from, in various combinations. But I haven't succeeded in getting those curly quotes back.
Frankly, based on the way the resulting file looks, I suspect the issue is either in the original .mdb file, or in mdbtools. The malformed characters are all single question marks, but it is clear that they are not malformed versions of the same thing; so (my gut says) there's not enough data in the resulting file; so (my gut says) the issue can't be fixed in the resulting file.
Has anyone run into this one before? Any tips for moving forward? FWIW, I don't have and never have had MS Access -- the file is coming from a 3rd party -- so this could be as simple as changing something on the database, and I would be very glad to hear that.
Thanks.
Looks like "smart quotes" have claimed yet another victim.
MS Word takes plain ASCII quotes and translates them to the double-byte left-quote and right-quote characters, and translates a single quote into the double-byte apostrophe character. The double-byte characters in question belong to an MS code page which is roughly compatible with UTF-16, except for the silly quote characters.
There is a Perl script called 'demoroniser.pl' which undoes all this malarkey and converts the quotes back to plain ASCII.
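If Perl isn't handy, the same idea is a few lines of Python; this is a rough sketch of what demoroniser.pl does, not the script itself:

# Map the Windows 'smart' punctuation back to plain ASCII stand-ins.
SMART_TO_ASCII = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes / apostrophe
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "--",  # en dash, em dash
    "\u2026": "...",                # ellipsis
})
print("He wasn\u2019t ready".translate(SMART_TO_ASCII))   # He wasn't ready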
It's most likely due to the fact that the text in the Access file is Unicode, and MDB Tools is trying to convert it to ASCII/Latin-1 (ISO-8859-1) or some other single-byte encoding. Since those encodings don't map all the Unicode characters properly, you end up with question marks. The information here may help you fix your encoding issues by getting MDB Tools to use the correct encoding.