TCL: file normalize gives unexpected output

I have the following line of code:
file normalize [string map {\\ /} $file]
The string map operation is there to make the line work for paths containing backslashes instead of forward slashes (as is the case on Windows).
For some values of $file (let's say it's "/path/to/my/file") I get output similar to:
/path/to/"/path/to/my/file/"
This doesn't happen for all paths but I'm unable to figure out what causes it. There are no symbolic links in the path.
Is there something I'm doing wrong, or is there an alternative to file normalize that I could try?
My Tcl version is 8.5.
UPDATE:
On further investigation I see that the string map is not making any difference. The output of file normalize itself includes the extra text before the desired path. Also, the extra text seems to be from a previous run of the code.
UPDATE 2: It was because of the quotation marks in the input to file normalize.

Most likely the path has literal quote characters in it that shouldn't be there.
% file normalize {"/path/to/some/file"}
/path/to/"/path/to/some/file"
% file normalize \"/path/to/some/file\"
/path/to/"/path/to/some/file"
Perhaps some pathname handling code escaped special characters for some reason and left the path in a mangled state.
I would try to keep the pathname pristine and when it needs to be changed for display or other processing, make a copy of it first.
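For illustration, here is the same sanitize-then-normalize idea as a minimal Python sketch (the quoted input is hypothetical); the Tcl equivalent is simply stripping the quote characters from the string before calling file normalize:
import os

def normalize(path):
    # Strip stray literal quote characters before normalizing,
    # mirroring the fix needed on the Tcl side.
    path = path.strip('"').replace("\\", "/")
    return os.path.normpath(path)

print(normalize('"/path/to/my/file"'))  # -> /path/to/my/file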

Octave - dlmread and csvread convert the first value to zero

When I try to read a csv file in Octave, I realize that the very first value from it is converted to zero. I tried both csvread and dlmread and I'm receiving no errors. I am able to open the file in a plain text editor and I can see the correct value there. From what I can tell, there are no funny hidden characters, spacings, or similar in the csv file. The files also contain only numbers. The only thing that I feel might be important is that I have five columns/groups that each have a different number of values in them.
I went through the commands' documentation on Octave Forge and I do not know what may be causing this. Does anyone have an idea what I can troubleshoot?
To try to illustrate the issue, if I try to load a file with the contents:
1.1,2.1,3.1,4.1,5.1
,2.2,3.2,4.2,5.2
,2.3,3.3,4.3,
,,3.4,4.4
,,3.5,
The command window will return:
0.0,2.1,3.1,4.1,5.1
,2.2,3.2,4.2,5.2
,2.3,3.3,4.3,
,,3.4,4.4
,,3.5,
(with additional trailing zeros after the decimal point).
Command syntaxes I'm using are:
dt = csvread("FileName.csv")
and
dt = dlmread("FileName.csv",",")
and they both return the same.
Your csv file contains a Byte Order Mark right before the first number. You can confirm this by opening the file in a hex editor: you will see the byte sequence EF BB BF before the numbers start.
This causes the first entry to be interpreted as a 'string', and since strings are parsed based on whether there are numbers at the 'front' of the string sequence, this is parsed as the number zero (see also this answer for more details on how csv entries are parsed).
In my text editor, if I start at the top left of the file and press the right arrow key once, the cursor doesn't appear to move (meaning I've just gone over the invisible byte order mark, which takes no visible space). Pressing backspace at this point deletes the byte order mark and allows the csv to be read properly. Alternatively, you may have to fix your file in a hex editor, or find some other way to convert it to a proper ASCII file (or UTF-8 without the byte order mark).
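If a hex editor is inconvenient, a small script can do the repair. Here is a minimal Python sketch (the file name is an assumption) that strips a UTF-8 BOM if one is present:
path = "FileName.csv"
with open(path, "rb") as f:
    data = f.read()
if data.startswith(b"\xef\xbb\xbf"):  # the UTF-8 byte order mark
    with open(path, "wb") as f:
        f.write(data[3:])             # rewrite without the three BOM bytes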
Also, it may be worth checking how this file was produced; if you have any control in that process, perhaps you can find why this mark was placed in the first place and prevent it. E.g., if this was exported from Excel, you can choose plain 'csv' format instead of 'utf-8 csv'.
UPDATE
In fact, this issue seems to have already been submitted as a bug and fixed in the development branch of Octave. See #58813 :)

Remove last few characters from big JSON file

I have several huge (>2GB) JSON files that end in ,\n]. Here is my test file example, which is the last 25 characters of a 2 GB JSON file:
test.json
":{"value":false}}}}}},
]
Of those last three characters, I need to delete the ,\n and add the ] back. The entire file is on three lines: the opening and closing brackets are each on their own line, and all the contents of the JSON array are on the second line.
I can't load the entire stream into memory to do something like:
string[0..-2]
because the file is way too large. I tried several approaches, including Ruby's:
chomp!(",\n]")
and UNIX's:
sed
both of which made no change to my JSON file. I viewed the last 25 characters by doing:
tail -c 25 filename.json
and also did:
ls -l
to verify that the byte size of the new and the old file versions were the same.
Can anyone help me understand why none of these approaches is working?
It's not necessary to read in the whole file if you're looking to make a surgical operation like this. Instead you can just overwrite the last few bytes in the file:
file = 'huge.json'
IO.write(file, "\n]\n", File.stat(file).size - 3)
The key here is to write exactly as many bytes as you back-track from the end (here three: the trailing ,\n] is overwritten with \n]\n), otherwise you'll need to trim the file length, though you can do that as well if necessary with truncate.
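If Ruby isn't a requirement, the same in-place edit can be sketched in Python using the truncate variant mentioned above (file name assumed):
import os

path = "huge.json"
with open(path, "r+b") as f:
    f.seek(-3, os.SEEK_END)  # position at the trailing ",\n]"
    f.write(b"\n]")          # overwrite the comma and its newline
    f.truncate()             # cut the file at the new end, one byte shorter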

Why are URI chars (or at least spaces) being dropped on an HTML file upload?

I have a file upload form and would like to use the filename on the server; however, I notice that when I upload a file, the spaces are dropped. On the client/browser I can do something like this in an event handler called after the input type='file' element has changed:
function process_svg (e) {
    var files = e.target.files || e.originalEvent.dataTransfer.files;
    console.log(files[0].filename);
}
If I upload a file with the name 'some file - type.ext', then 'some file - type.ext' will be printed in the console. On the server (running bottle), however, if I run:
@route('/some_route')
def some_route():
    print(request.files['form_name_attr'].filename)
I get 'somefile-type.ext'. I am guessing this has to do with URI escaping (or the lack thereof), but since you cannot rename a file pre-upload, how do you get around this and preserve the name? Strangely, I cannot find any mention of this on Google; in part I have had trouble thinking of appropriate search terms, but I'm also aware that this may not actually be native behaviour but a bug elsewhere in my code.
I do not think that is the case, as I've issued these console.log and print statements at the end (right before the upload) and the beginning (right when the server starts processing the request), and I don't believe I have any code that touches the filename in between. However, if that is the case, please let me know, as I could be looking in the wrong direction.
You want raw_filename, not filename.
(Note that it may contain unsafe characters.)
@route('/some_route', method='POST')
def some_route():
    print(request.files['form_name_attr'].filename)      # "cleaned" file name
    print(request.files['form_name_attr'].raw_filename)  # unmodified file name
Found this in the source code for FileUpload.filename:
Only ASCII letters, digits, dashes, underscores and dots are allowed
in the final filename. Accents are removed, if possible. Whitespace is
replaced by a single dash. Leading or tailing dots or dashes are
removed. The filename is limited to 255 characters.
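If you need the spaces preserved but still want some safety, one option is to do your own minimal cleanup on raw_filename. A sketch (the route, form name, and upload directory are assumptions):
import os
from bottle import route, request

@route('/some_route', method='POST')
def some_route():
    upload = request.files['form_name_attr']
    # Keep the original name (spaces included), but drop any directory
    # components a client might send, for either separator style.
    name = os.path.basename(upload.raw_filename.replace('\\', '/'))
    upload.save(os.path.join('/tmp/uploads', name), overwrite=False)
    return name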

Changing The Delimiter to CTRL+A in Python CSV Module

I'm trying to write a csv file with the delimiter Ctrl+A. I'm eventually going to have to write the file to Hadoop, and I'm unable to use a standard delimiter.
Currently I'm trying this:
writer = csv.writer(f, delimiter="\u0001")
for item in aList:
    writer.writerow(item)
f.close()
However, the output file doesn't appear to be written correctly when opened in Excel...
Some rows are condensed into one block, while others have one field in the first cell and the rest condensed into the second, etc.
Is the error where I'm setting up the writer object, or am I just not familiar with separating files this way?
You can try using the nonprinting "group separator" character, which can be represented in Python code as '\035'.
See http://www.asciitable.com/index/asciifull.gif for some other nonprinting characters if you need more.
It may be helpful to include more context about why you want to use a nonstandard delimiter, and whether Excel parsing of the file is actually necessary or just a quick check to see if the file might be parsed properly by the target system, Hadoop.
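For reference, a minimal sketch of writing with a control-character delimiter (file and data names are assumptions). Note that in Python 3 the file should be opened with newline='', and that Excel will not split on these characters, so check the output in a text editor or in the target system instead:
import csv

rows = [["a", "b", "c"], ["1", "2", "3"]]
with open("out.csv", "w", newline="") as f:    # newline="" avoids extra blank rows
    writer = csv.writer(f, delimiter="\x01")   # Ctrl+A; the group separator is "\x1d" (octal 035)
    writer.writerows(rows)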

In Stata, how do I add variable labels from a separate csv file?

I have a set of csv files that are very simple to load into Stata using the -insheet- command. But they have very uninformative variable names. For each of these files, I also have a file of metadata consisting of two columns: the original (uninformative) variable names, and a description of what the variables actually mean. I'd like to use these metadata files to create variable labels, preferably without going through and typing up all the separate label commands or turning the metadata file into a dictionary for each file. It seems like there must be a quick way of loading the metadata file into Stata and looping through it to generate the label commands, but I don't know what it is. Any thoughts?
Ideally each line of the metadata is something like
varname1 "more interesting description"
in which case you can prefix each line with
label var
and then run the file as if it were a do-file using do. See the help for label. That is easy in a decent text editor: for example, search for the start of each line and replace it with label var (note the need for the trailing space).
What could bite here includes:
You don't have double quotes " " as delimiters, in which case you need to insert them.
The extra information does not qualify as a variable label because it is more than 80 characters long. See help limits.
There are other ways to do this with Stata. You could write a program to read in the metadata and write out a do-file using file, but if this were my problem I would reach first for my text editor. (Most experienced Stata programmers use something else as well as doedit.)
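If you would rather script it than use a text editor, here is a minimal Python sketch (the file names and the two-column varname,description layout are assumptions) that writes the do-file for you:
import csv

# Read the two-column metadata file and emit one -label variable- command per row.
with open("metadata.csv", newline="") as meta, open("labels.do", "w") as do:
    for varname, description in csv.reader(meta):
        do.write('label variable %s "%s"\n' % (varname, description))
Running do labels.do after loading the data then applies all the labels at once.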