Merging PDFs with special characters in the file name with Sejda Console via .csv

I'm new to this forum and I'm not a programmer, so forgive me if I'm asking stupid questions...
I'm trying to merge some PDFs into one PDF with Sejda Console using a .csv file, but when the .csv contains special characters (e.g. ø) Sejda returns with:
Invalid filename found: File 'Something � something.pdf"...
So, it changed ø into �.
I've tried saving the .csv with different encoding standards (via Notepad's Save As: ANSI, UNICODE and UTF-8) and none of them works (though each has its own unique way of screwing up the filename...).
Without this kind of characters it works fine.
It also works fine when the file names with ø are given directly in the syntax, like:
sejda-console-3.1.3/bin/sejda-console merge -f first.pdf second.pdf -o merged.pdf
And a second problem occurred: when a comma exists in the file name, the file name gets cut off at the comma. That would be logical if the list separator were a comma, but on my PC the list separator is a semicolon (Regional and Language Options). Adding quotes around the file name doesn't help either...
I call Sejda's batch file with:
call "C:\sejda-console-3.0.13\bin\sejda-console.bat" merge -l 28.csv -o 28.pdf
And for this test 28.csv contains:
1700050.1_0060764-CROSS TRACK SKATE AXLE.pdf,
1700050.1_0060792-ø32 ATK10K6 FIXING PLATE.pdf,
1700050.1_0060798-CROSS TRACK SKATE NUTPLATE.pdf,
What is the proper way to get Sejda to merge correctly?
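One thing worth trying (this is an assumption, not something confirmed by Sejda's documentation) is writing the list file as UTF-8 without a byte order mark, since Notepad's "UTF-8" option prepends a BOM (EF BB BF) that some command-line tools misread as part of the first filename. A minimal Python sketch that writes the same 28.csv:

```python
# Write the merge list as UTF-8 *without* a BOM.
# Python's "utf-8" codec does not emit a BOM (unlike "utf-8-sig"
# or Notepad's UTF-8 option), so ø survives as two clean bytes.
filenames = [
    "1700050.1_0060764-CROSS TRACK SKATE AXLE.pdf",
    "1700050.1_0060792-\u00f832 ATK10K6 FIXING PLATE.pdf",
    "1700050.1_0060798-CROSS TRACK SKATE NUTPLATE.pdf",
]

with open("28.csv", "w", encoding="utf-8", newline="") as f:
    for name in filenames:
        f.write(name + ",\n")
```

Whether Sejda then decodes the list as UTF-8 depends on the console's own settings (on Windows, the JVM's default charset may also matter), so treat this as a diagnostic step rather than a guaranteed fix.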

Related

Invalid literal because symbol appears when reading a csv file

When I am using Replit I can remove the little symbol that appears when I drag and drop in a csv file so my main.py can read it; otherwise I get an "invalid literal for int() with base 10" error. I am now trying to run this on my local machine with Sublime Text and getting the same error as it reads the file from the directory, so I assume the symbol is being added before reading.... I can click on the csv file in Replit and edit it, but cannot do this in Sublime.
Can someone explain what this is for? How can I get it to read the basic comma-delimited numbers in the file (it is a game tile map)?
import csv

with open(f'level{level}_data.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
Saved it as comma delimited CSV instead of UTF-8 comma delimited CSV. It then imports without the 'question mark in a diamond' symbol. I understand this is an unrecognised special character, but I have nothing apart from integers in my table. Maybe someone could clarify that?...
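That symbol is the UTF-8 byte order mark (BOM) that Excel writes at the start of "CSV UTF-8" files; it has nothing to do with your integers, it just ends up glued to the first cell so int() fails. Rather than re-saving every file, you can tell Python to strip it at read time with the utf-8-sig codec. A small sketch (the helper name and file path are just examples):

```python
import csv

def load_tile_map(path):
    # "utf-8-sig" transparently strips a leading UTF-8 BOM (EF BB BF)
    # if present, so int() no longer sees '\ufeff1' as the first cell.
    # Files saved without a BOM are read unchanged.
    with open(path, newline="", encoding="utf-8-sig") as csvfile:
        reader = csv.reader(csvfile, delimiter=",")
        return [[int(cell) for cell in row] for row in reader]
```

This way the same code works whether the file was saved as plain CSV or as UTF-8 CSV.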

CVS -- Need command line to change status of file from Binary to allow keyword substitution

I am coming into an existing project after several years of use. I have been attempting to add the nice keywords $Header$ and $Id$ so that I can identify the file versions in use.
I have come across several text files where these keywords did not expand at all. Investigation has determined that CVS thinks these files are BINARY and will not expand the keywords.
Is there any way, from a Linux command-line invocation, to permanently change the status of these files in the repository so that keywords are expanded? I'd appreciate it if you could tell me. Several attempts I have made have not succeeded.
cvs admin -kkv filename
will restore the file to the default text mode so keywords are expanded.
If you type
cvs log -h filename
(to show just the header and not the entire history), a binary file will show
keyword substitution: b
which indicates that keyword substitution is never done, while a text file will show
keyword substitution: kv
The CVSROOT/cvswrappers file can be used to specify the default keyword-substitution mode for new files you add, based on their names.

Remove all binary characters from a file

Occasionally, I have a hard time manipulating data in a CSV file because of the following error.
Binary file (standard input) matches
I researched several articles online but cannot seem to find one that helps me remove all of the binary characters or elements from a CSV file.
Unfortunately, I do not know where to start with this.
If I run the 'file' command on the file, I get the following output:
Little-endian UTF-16 Unicode text, with very long lines, with CRLF, CR line terminators
The second from last line in the file prints as:
"???? ?????, ???? ???",????,"?????, ????",???,,,,,,,,,,,,,,,,,,,,,,,,* Home,email#address.com,,
The second line in the file prints as:
,,,,,,,,,,,,,,,,,,,,,,,,,,,* ,email#address.com,,
This file contains too many lines to open in Excel or a GUI editor, so I can't simply "Save as..." and remove the binary elements that way.
Please help me. Thank you!
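The file output suggests the data isn't really binary at all: it's UTF-16, which grep treats as binary because every other byte is a NUL. Rather than deleting characters, converting the whole file to UTF-8 usually fixes this (iconv -f UTF-16 -t UTF-8 would do the same job). A hedged Python sketch, with the CR/CRLF terminators normalised as a bonus:

```python
def utf16_to_utf8(src, dst):
    # The "utf-16" codec reads the BOM to pick the right byte order.
    # errors="replace" means one undecodable byte won't abort the run.
    # newline="" keeps the original line endings so we can normalise
    # them ourselves below.
    with open(src, "r", encoding="utf-16", errors="replace", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            # Collapse CRLF and lone CR terminators to plain LF.
            fout.write(line.replace("\r\n", "\n").replace("\r", "\n"))
```

After the conversion, grep, awk, sed and friends should treat the file as ordinary text.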

Change .xls to .csv without losing columns

I have a folder with around 400 .txt files that I need to convert to .csv. When I batch rename them to .csv, all the columns get smushed together into one. The same thing happens when I convert to .xls and then to .csv, even though the columns are fine in .xls. If I open an .xls file and save it as .csv, it's fine, but this would require opening all 400 files.
I am working with sed from the mac terminal. After navigating to the folder that contains the files within the terminal, here is some code that did not work:
for file in *.csv; do sed 's/[[:blank:]]+/,/g'
for file in *.csv; do sed -e "s/ /,/g"
for file in *.csv; do s/[[:space:]]/,/g
for file in *.csv; do sed 's/[[:space:]]{1,}/,/g'
Any advice on how to restore the column structure to the csv files would be much appreciated. And it's probably already apparent but I'm a coding newb so please go easy. Thanks!
Edit: here is an example of how the xls columns look, and how they should look in csv format:
Dotsc.exe 2/12/15 1:17 PM 0 Nothing 1 Practice
Everything that is separated by spaces here (except the space between 1:17 and PM) is separated into columns in the file. Here is what it looks like when I batch rename the file to .csv:
Dotsc.exe 2/12/15 1:17 PM 0 Nothing 1 Practice
Columns have now turned into spaces, and all data is in the same column. Hope that clarifies things.
I think that what you are trying to do in batch alone is not possible. I suggest you use a library in Java.
Take a look here : http://poi.apache.org/spreadsheet/
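Alternatively, if the .txt files are actually tab-delimited (an assumption, based on the columns surviving the .xls conversion), this is doable without Java and without opening each file. The csv module also quotes any field that happens to contain a comma, which a plain sed substitution would get wrong. A sketch:

```python
import csv
import glob
import os

def convert_tabs_to_csv(folder="."):
    # Convert every tab-delimited .txt file in the folder to a .csv
    # alongside it. Fields containing commas are quoted automatically
    # by csv.writer, unlike a blind sed 's/\t/,/g'.
    for txt_path in glob.glob(os.path.join(folder, "*.txt")):
        csv_path = txt_path[:-4] + ".csv"
        with open(txt_path, newline="") as fin, \
             open(csv_path, "w", newline="") as fout:
            reader = csv.reader(fin, delimiter="\t")
            csv.writer(fout).writerows(reader)
```

If the files turn out to be delimited by runs of spaces rather than tabs, the delimiter logic would need adjusting, so check one file with a hex viewer first.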

Migrating MS Access data to MySQL: character encoding issues

We have an MS Access .mdb file produced, I think, by an Access 2000 database. I am trying to export a table to SQL with mdbtools, using this command:
mdb-export -S -X \\ -I orig.mdb Reviewer > Reviewer.sql
That produces the file I expect, except one thing: Some of the characters are represented as question marks. This: "He wasn't ready" shows up like this: "He wasn?t ready", only in some cases (primarily single/double curly quotes), where maybe the content was pasted into the DB from MS Word. Otherwise, the data look great.
I have tried various values for "export MDB_ICONV=". I've tried using iconv on the resulting file, with ISO-8859-1 in the from/to, with UTF-8 in the from/to, with WINDOWS-1250 and WINDOWS-1252 and WINDOWS-1256 in the from, in various combinations. But I haven't succeeded in getting those curly quotes back.
Frankly, based on the way the resulting file looks, I suspect the issue is either in the original .mdb file, or in mdbtools. The malformed characters are all single question marks, but it is clear that they are not malformed versions of the same thing; so (my gut says) there's not enough data in the resulting file; so (my gut says) the issue can't be fixed in the resulting file.
Has anyone run into this one before? Any tips for moving forward? FWIW, I don't have and never have had MS Access -- the file is coming from a 3rd party -- so this could be as simple as changing something on the database, and I would be very glad to hear that.
Thanks.
Looks like "smart quotes" have claimed yet another victim.
MS Word takes plain ASCII quotes and translates them to the double-byte left-quote and right-quote characters, and translates a single quote into the double-byte apostrophe character. The double-byte characters in question belong to an MS code page which is roughly compatible with UTF-16 except for the silly quote characters.
There is a Perl script called 'demoroniser.pl' which undoes all this malarkey and converts the quotes back to plain ASCII.
It's most likely due to the fact that the data in the Access file is UTF, and MDB Tools is trying to convert it to ASCII/Latin-1/ISO-8859-1 or some other encoding. Since these encodings don't map all the UTF characters properly, you end up with question marks. The information here may help you fix your encoding issues by getting MDB Tools to use the correct encoding.
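If installing demoroniser.pl is overkill, the same fix-up can be sketched in a few lines of Python: map the Windows "smart" punctuation back to plain ASCII before (or after) the export. The table below covers only the common quote and dash characters; extend it as needed:

```python
# Windows "smart" punctuation mapped to plain ASCII equivalents.
SMART_TO_ASCII = {
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote / apostrophe
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
}

def demoronise(text):
    # str.translate does a single pass over the text, replacing each
    # smart character with its ASCII stand-in.
    return text.translate(str.maketrans(SMART_TO_ASCII))
```

Note this only helps if the text still contains the smart-quote characters; once they have been flattened to literal '?' in the exported file, the original characters are unrecoverable, which is why the encoding has to be fixed at export time.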