CSV File Adding New ROW After Every ROW - csv

When I copy a CSV file from server one to server two, the copy ends up with a new, empty row after every row.
Example:
ORIGINAL FILE:
1. ONE
2. TWO
3. THREE
4. FOUR
AFTER TRANSFERRING THE FILE:
1. ONE
2.
3. TWO
4.
5. THREE
6.
7. FOUR
I have been running this system for three months, and this problem appeared suddenly. It is intermittent: this is only the second time I have faced it.
I use FileZilla to upload the file. After the upload, a script scans the directory for CSV files every 2 minutes and then moves each CSV file to another server.
Need help

I believe the problem could simply be that you are not transferring in binary mode. Use binary mode; ASCII mode translates line endings during the transfer, which can change the file's formatting.
After you log in to an ftp site, ftp will print out the file transfer
type. In our case, it is binary. Binary mode transfers the files, bit
by bit, as they are on the FTP server. Ascii mode, however, will
download the text directly. You can type ascii or binary to switch
between the types.
Source: http://www.tldp.org/HOWTO/FTP-3.html
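If the transfer is scripted rather than done through a GUI client, the same advice can be sketched in Python with the standard-library ftplib. The host, credentials and file names are placeholders, and the repair helper assumes the blank rows come from CRLF line endings having been expanded to CR CR LF by a text-mode transfer, which is a common symptom but is not confirmed for this case.

```python
from ftplib import FTP


def upload_binary(host, user, password, local_path, remote_name):
    """Upload a file in binary (image) mode so its bytes are not altered."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        with open(local_path, "rb") as fh:
            # storbinary sends TYPE I first; storlines would use ASCII mode.
            ftp.storbinary("STOR " + remote_name, fh)


def collapse_doubled_line_endings(data):
    """Turn CR CR LF back into CR LF (assumed damage from a text-mode transfer)."""
    return data.replace(b"\r\r\n", b"\r\n")
```

Running collapse_doubled_line_endings over the bytes of an already damaged file is only a stopgap; switching the transfer itself to binary mode addresses the cause.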

Related

Azure Synapse Dedicated Pool COPY INTO function fails due to base64 encode image in CSV file

I am using Azure Synapse Link for Dynamics 365. It automatically exports data from Dynamics 365 in CSV format into blob storage/data lake. I use the COPY INTO function to load the data into a Dedicated Pool instance. However, the contact model has recently started failing.
I investigated the issue and found that the cause was a field containing an image encoded as text. I only copy selected fields from the CSV files and this is not one of them, but it still causes the copy to fail. I manually updated the CSV file to exclude this data from the one row where it was found, and the copy then worked fine.
The error message associated with the error is:
The column is too long in the data file for row 1328, column 32.
This is supposed to be an automated process so I do not want to be manually editing CSV files when this occurs. Are there any parameters that I can add to the COPY INTO function to prevent this error? I tried using MAXERRORS but that made no difference.
The only other thing that I could think of is to write a script (maybe an Azure Function?) that checks the file for this issue and corrects it. Maybe there is a simpler approach though?
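A minimal sketch of the pre-processing script idea, in Python, assuming the file is ordinary comma-separated text that the csv module can parse. The function name and the max_chars threshold are hypothetical and would need to match the widths of the target table; how the script is hosted (for example in an Azure Function) is left open.

```python
import csv
import io

# Base64 images can exceed the csv module's default field size limit,
# so raise it before reading.
csv.field_size_limit(2**31 - 1)


def blank_oversized_fields(src, dst, max_chars=4000):
    """Copy CSV rows from src to dst, emptying any field longer than max_chars.

    Returns the number of fields that were emptied.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    emptied = 0
    for row in reader:
        cleaned = []
        for value in row:
            if len(value) > max_chars:
                cleaned.append("")
                emptied += 1
            else:
                cleaned.append(value)
        writer.writerow(cleaned)
    return emptied
```

For example, with `src = io.StringIO(...)` holding the exported text and `dst = io.StringIO()`, the cleaned text is available from `dst.getvalue()` after the call. Emptying the field rather than dropping the column keeps the column positions unchanged for the load.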

How to retrieve original pdf stored as MySQL mediumblob?

A table containing almost four thousand records includes a mediumblob field for each record that contains the record's associated PDF report. Under both MySQL Workbench and phpMyAdmin the relevant DOCUMENT column displays the data as a BLOB button or link. In the case of phpMyAdmin the link also indicates the size of the data the Blob contains.
The issue is that clicking the Blob button/link does not produce a viewable PDF. Under MySQL Workbench, opening any of the files in the SQL Editor only displays the raw Blob data; under phpMyAdmin, the link only allows the Blob data to be saved as a .bin file. All previous attempts to retrieve the original PDFs using PHP have failed - see related earlier thread: Extract Pdf from MySql Dump Saved as Text.
The filename field in the table shows that all the stored files are PDF files. Further research and tests indicate that the mediumblob data has been stored as application/octet-streams.
My question is how can the original PDFs be retrieved as readable PDFs? Is it possible for a .bin file saved from the database to be converted or used to recover the original PDF file?
Any assistance would be greatly appreciated.
In line with my assumption and Isaac's suggestion, the only solution was to speak to one of the software developers. It transpires that the documents were zipped using a third-party library and had the header removed before being stored in the database.
The third-party library used is version 2.0.50727 of Chilkat, available from www.chilkatsoft.com. That version no longer appears to be available, but hopefully at least one of the later versions may do the job.
Thanks again for everyone's input and assistance.
Based on the discussion in the comments, it sounds like you'll need to either refer to the original source code or consult with the original developer to determine exactly how the data was stored.
Using phpMyAdmin to download the mediumblob data will produce a .bin file in many cases. I don't recall exactly how it determines the content type: a PNG file, for instance, will download with a .png extension, but most other binary files, PDF included, simply download as .bin when phpMyAdmin isn't sure what the extension should be. So the behavior you're seeing from phpMyAdmin is expected and correct, but since the .bin file doesn't work when it's renamed to .pdf, something has probably gone wrong with the import and upload.
BLOB data is often stored in a pretty standardized way, but it seems your data doesn't follow that method.
Without seeing the code directly, we can't tell exactly what happened when the data was stored; we would only be guessing.
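Before tracking down the original developer, a quick look at how a saved .bin file begins can narrow things down. The sketch below is in Python and uses only well-known file signatures. In this thread the header had reportedly been removed, so the signature check alone may report "unknown"; the raw-inflate attempt is only a guess that the remainder is a headerless deflate stream and is not confirmed to match what the Chilkat library wrote.

```python
import zlib

# Leading bytes of a few common formats.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
    b"\x1f\x8b": "gzip",
}


def sniff_format(blob):
    """Guess the format of a blob from its first bytes."""
    for magic, name in SIGNATURES.items():
        if blob.startswith(magic):
            return name
    return "unknown"


def try_raw_inflate(blob):
    """Try to decompress blob as a headerless deflate stream.

    Returns the decompressed bytes, or None if the data is not raw deflate.
    """
    try:
        return zlib.decompress(blob, wbits=-15)
    except zlib.error:
        return None
```

If try_raw_inflate returns bytes for which sniff_format reports "pdf", writing them to a file with a .pdf extension is worth trying; if it returns None, the storage format is something else.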

Managing a large SPSS (*.sav) file (4.2 GB)

I have received an SPSS file from a survey fielded by another company that allegedly contains only ~1500 respondents, but the file size has somehow ballooned to 4.2 GB. My hunch is that the file came from a global survey and the 1500 selected records are from the US only, so the file includes a series of blank variables plus the metadata for those variables, possibly in multiple languages/alphabets.
I only need a subset of this data, and could likely work with it if I removed the metadata, but my issue has been that I can't get the damn thing open to cut down on the number of variables. I have been using the tools at my disposal to try the following workarounds, though I'm sure there are better options:
Opening the file using PSPP (freeware SPSS) - this causes the PSPP to stop responding
Using the R command read.spss (from the foreign package) to write a .csv - this claims that the file has a duplicate variable name and won't proceed further
Using the R command spss.system.file to write a .csv - R has spent a couple of hours attempting to run this with no apparent success.
Using the PSPP text conversion tool (https://pspp.benpfaff.org/) to create either a dictionary or a .csv file - both of these options crash after the file has completed uploading.
I've gone back to the other company to have them try to reduce the file size. In the meantime, I wasn't sure whether anyone else had ideas for doing either of the following:
Open the file using another program/converter that could turn it into a .csv or other similarly skinny file format
Use another program to at least read only the variable names included in the file so that I can provide the other company with the specific variables I need
The following command from PSPP should do what you need:
$ pspp-convert originalFile.sav output.csv
In case it doesn't, please provide the terminal error message.
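For the second goal (listing only the variable names), a rough Python sketch that reads just the dictionary at the start of the file, without touching the case data, might look like the following. It assumes the system file layout described in the PSPP developer documentation (a 176-byte file header followed by type-2 variable records) and returns only the short, 8-character names; long variable names live in a later extension record that this sketch does not parse. The function name is hypothetical.

```python
import io
import struct


def read_sav_dictionary(stream):
    """Return (n_cases, short_names) from the start of an SPSS system file."""
    header = stream.read(176)
    if len(header) < 176 or header[:4] not in (b"$FL2", b"$FL3"):
        raise ValueError("not an SPSS system file")
    # layout_code (offset 64) is 2 or 3; use it to detect the byte order.
    layout = struct.unpack_from("<i", header, 64)[0]
    order = "<" if layout in (2, 3) else ">"
    # ncases (offset 80) is -1 when the writer did not record it.
    n_cases = struct.unpack_from(order + "i", header, 80)[0]

    names = []
    while True:
        raw = stream.read(4)
        if len(raw) < 4 or struct.unpack(order + "i", raw)[0] != 2:
            break  # first record that is not a variable record
        var_type, has_label, n_missing, _print, _write = struct.unpack(
            order + "5i", stream.read(20)
        )
        name = stream.read(8).decode("latin-1").rstrip()
        if has_label == 1:
            label_len = struct.unpack(order + "i", stream.read(4))[0]
            # Labels are padded to a multiple of 4 bytes.
            stream.seek((label_len + 3) // 4 * 4, io.SEEK_CUR)
        # Each missing value (or range endpoint) takes 8 bytes.
        stream.seek(abs(n_missing) * 8, io.SEEK_CUR)
        if var_type != -1:  # -1 marks continuation slots of long strings
            names.append(name)
    return n_cases, names
```

Called as `read_sav_dictionary(open("originalFile.sav", "rb"))`, this reads only the dictionary records, so the 4.2 GB of data never has to be loaded. The returned case count would also show whether the file really holds about 1500 records.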

I get a mysterious "Neo.ClientError.Statement.InvalidSyntax" error when loading a CSV in Neo4j

For a course on Excel I was trying to load a CSV in Neo4j (first time using this application) when I was blocked at the first step of replicating an example shown in said course: loading.
The command used in the example was this:
LOAD CSV WITH HEADERS FROM "file:/path/to/file/file.csv"
as row
CREATE (m:movie {name:row.movie})
But it gave syntax errors. I found I could get past them by doubling each \ and adding "file:":
LOAD CSV WITH HEADERS FROM "file://C:\\path\\to\\file\\file.csv"
as row
CREATE (m:movie {name:row.movie})
Neo4j accepts this syntax, processes for a few moments, and returns YET ANOTHER error:
Neo.TransientError.Statement.ExternalResourceFailure
I tried the same commands (the original and my own) in the online Neo4j console, but no luck. I can reach the file using that path without a problem; it really is there. The CSV file consists of just 5 strings of regular letters, that's all. No fancy formatting or characters.
What's going on?
Not that mysterious. Neo4j's LOAD CSV command looks for the specified CSV file in the import directory within your server configuration for that database, as specified at the top of its server configuration file (i.e. dbms.directories.import=import in your neo4j.conf file).
You should create the import directory in...
"C:\Users\[User Name]\Documents\Neo4j\default.graphdb\"
If you place your CSV file in there, you can specify any sub-directory, or just the "file.csv" you want to import, with LOAD CSV as below.
LOAD CSV WITH HEADERS FROM "file:///file.csv"
AS row
RETURN row
LIMIT 5
Try using:
"file:///C:/path/to/file/file.csv"
Since your file is on your local computer, the third / following the file scheme is not preceded by a host name or address -- but it still needs to be there. Also, file URI path separators should be forward slashes (even on Windows machines).
See the File URI scheme Wikipedia page if you need more information.
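As an illustration of the rule above, a small Python helper (the function name is hypothetical) can turn a Windows path into a file URI with three slashes and forward-slash separators. It assumes a local drive-letter path; whether Neo4j then resolves that URI relative to its import directory depends on the server configuration discussed in the other answer.

```python
from urllib.parse import quote


def windows_path_to_file_uri(path):
    """Convert a local Windows path such as C:\\dir\\file.csv to a file URI."""
    forward = path.replace("\\", "/")
    # Percent-encode characters such as spaces, but keep "/" and the drive colon.
    return "file:///" + quote(forward, safe="/:")
```

For the path in the question, `windows_path_to_file_uri(r"C:\path\to\file\file.csv")` gives `file:///C:/path/to/file/file.csv`.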

SSIS Package not reading the last row in flat file

I have an SSIS package that loads a .EXT file into my database table.
The package's Flat File Connection Manager Editor properties are:
Format: Ragged Right
Code Page: 1252 ANSI (Latin-I)
Text Qualifier: <None>
Header Row Delimiter: <LF>
When I preview the file before loading, I can see all the rows in the Columns and Preview tabs of the Flat File Connection Manager Editor.
But when the file is actually loaded, the last record alone is not imported into the table.
The package was loading fine and still processes the file daily.
Only for two days' files was the last record not imported. I am trying to find the root cause.
I suspected something was wrong with the files, but I cannot find any differences between the working and non-working versions.
Please suggest how to resolve this, and let me know if any further information is required.
I ran into the same issue and did some research to find a solution that worked for me. Apparently the SSIS package had gone through a conversion from an earlier version at one point. When the conversion was done, the text qualifier property on the flat file connection was mangled. It had originally been <none>, but the conversion changed it to _x003C_none_x003E_. I opened the flat file connection manager and changed the text qualifier property on the General tab back to the proper value of <none>.
Credit goes to this thread for providing the answer.
I had a similar issue. My flat file didn't have any text qualifiers. When I added a text qualifier, the package ran successfully. My guess is that the file is read as text and the CRLF is not recognized at the last line.
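The guess about the last line can be checked by comparing how a working file and a failing file end. A minimal Python sketch, with hypothetical function names, that reports the terminator after the final record:

```python
def trailing_terminator(data):
    """Classify the line terminator at the very end of some file bytes."""
    if data.endswith(b"\r\n"):
        return "CRLF"
    if data.endswith(b"\n"):
        return "LF"
    if data.endswith(b"\r"):
        return "CR"
    return "none"


def read_tail(path, size=2):
    """Read the last `size` bytes of a file without loading the whole file."""
    with open(path, "rb") as fh:
        fh.seek(0, 2)  # jump to the end to learn the length
        length = fh.tell()
        fh.seek(max(0, length - size))
        return fh.read()
```

Running `trailing_terminator(read_tail(path))` on both files shows whether the failing files end differently from the working ones, for example without a final terminator or with a different one than the connection manager is configured to expect.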
If you can provide a sample of the data from the file