Is there an efficient command-line tool for prepending lines to a file inside a ZIP archive?
I have several large ZIP files containing CSV files missing their header, and I need to insert the header line. It's easy enough to write a script to extract them, prepend the header, and then re-compress, but the files are so large, it takes about 15 minutes to extract each one. Is there some tool that can edit the ZIP in-place without extracting?
Short answer: no.
A ZIP file contains 1 to N file entries, and each entry is compressed as a single, indivisible unit. To change anything inside an entry, you have to process that entry completely (i.e. decompress it and compress it again).
The only fast operation is adding a new file to the archive: that creates a new entry and appends it to the file, but this is probably not what you need.
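If a script is acceptable, the extract step can at least avoid touching the disk: each entry can be streamed straight from the old archive into a new one, with the header written first. Below is a minimal sketch in Python using the standard zipfile module. The function name prepend_header and the ".csv" suffix check are illustrative assumptions, and every entry is still fully decompressed and recompressed, so this does not make the operation in-place.

```python
import shutil
import zipfile


def prepend_header(src_zip, dst_zip, header_line, suffix=".csv"):
    """Copy src_zip to dst_zip, writing header_line before each matching entry."""
    header = header_line.rstrip("\r\n").encode("utf-8") + b"\n"
    with zipfile.ZipFile(src_zip) as src, \
            zipfile.ZipFile(dst_zip, "w", compression=zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.is_dir():
                # Directory entries carry no data; skipped in this sketch.
                continue
            # force_zip64 because the final size is not known up front.
            with src.open(info) as fin, \
                    dst.open(info.filename, "w", force_zip64=True) as fout:
                if info.filename.lower().endswith(suffix):
                    fout.write(header)
                # Stream in 1 MiB chunks so large files never sit in memory.
                shutil.copyfileobj(fin, fout, 1024 * 1024)
```

Calling prepend_header("in.zip", "out.zip", "id,name,value") would write a new archive next to the old one; the original is left untouched.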
I have a CSV file and I want to extract the element in the first row and 3rd column. How might I go about doing this?
I would load the CSV in a matrix and then take the relevant row/column; of course, you could ignore the non-relevant element while loading the CSV. How to do the aforementioned has already been answered e.g.
How can I read and parse CSV files in C++?
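For comparison, the "skip what you don't need while loading" idea looks like this as a short Python sketch (not C++, which the linked question covers). The function name cell and its defaults are assumptions for illustration; indices are zero-based, so the first row and 3rd column are row 0 and column 2.

```python
import csv


def cell(path, row=0, col=2):
    """Return one field of a CSV file without loading the whole file."""
    with open(path, newline="") as f:
        for i, record in enumerate(csv.reader(f)):
            if i == row:
                # Stop reading as soon as the wanted row is reached.
                return record[col]
    raise IndexError(f"{path} has no row {row}")
```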
I use Stata 12.
I want to add some country code identifiers from file df_all_cities.csv onto my working data.
However, this line of code:
merge 1:1 city country using "df_all_cities.csv", nogen keep(1 3)
Gives me the error:
. run "/var/folders/jg/k6r503pd64bf15kcf394w5mr0000gn/T//SD44694.000000"
file df_all_cities.csv not Stata format
r(610);
This is an attempted solution to my previous problem of the file being a dta file not working on this version of Stata, so I used R to convert it to .csv, but that also doesn't work. I assume it's because the command itself "using" doesn't work with csv files, but how would I write it instead?
Your intuition is right. The command merge cannot read a .csv file directly. (Technically, using is not a command here; it is a common syntax keyword indicating that a file path follows.)
You need to read the .csv file with the command insheet. You can use it like this.
* Preserve saves a snapshot of your data which is brought back at "restore"
preserve
* Read the csv file. clear can safely be used as data is preserved
insheet using "df_all_cities.csv", clear
* Create a tempfile where the data can be saved in .dta format
tempfile country_codes
save `country_codes'
* Bring back into working memory the snapshot saved at "preserve"
restore
* Merge your country codes from the tempfile to the data now back in working memory
merge 1:1 city country using `country_codes', nogen keep(1 3)
Note that insheet also takes using, and unlike merge, it accepts .csv files.
I want to be able to do Vimdiffs and Vimfolds on Bookmarks files that have been converted to CSV files, i.e. with one description and one URI per line. However, because the Bookmarks file has multiple levels for the folders, the CSV file will also need fields for the different levels of folder names on each line.
I am new to jq but it seems like it should be able to do this sort of conversion?
Thanks,
Phil.
Have you tried to use any free tools like: https://json-csv.com/
or json2csv: https://www.npmjs.com/package/json2csv
If neither of those works, perhaps this approach.
When I need to reconstruct data I write a set of loops that identify each property I want for each line in my CSV. Let's say my JSON has Name, Email, Phone but for some reason all are at different object levels in my JSON.
First write a loop that resolves Name, then a loop for Email, and one for Phone. At the end of the first loop call the second, and from the second call the third.
Then you can use jq -n, which lets you create JSON with no input.
So building each flat object would look like jq -n --arg name "$Name" '{NewName: $name}' (passing the shell variable with --arg, since with -n there is no input for a path like .["..."] to look up).
Once you have clean JSON with all data points at the same level, CSV conversion is smooth.
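If jq gets awkward, the same flattening can be sketched in a few lines of Python. This is only a sketch under assumptions: it supposes each node is an object with a "title", that bookmarks carry a "uri", and that folders carry a "children" list (roughly what a Firefox JSON backup looks like). The names flatten and write_csv are made up for illustration.

```python
import csv


def flatten(node, folders=()):
    """Yield (folder path, title, uri) for every bookmark under node."""
    if "uri" in node:
        yield folders, node.get("title", ""), node["uri"]
    for child in node.get("children", []):
        # A node with children is a folder: its title becomes one more level.
        yield from flatten(child, folders + (node.get("title", ""),))


def write_csv(root, out):
    """Write one line per bookmark: folder levels first, then title and uri."""
    rows = list(flatten(root))
    depth = max((len(folders) for folders, _, _ in rows), default=0)
    writer = csv.writer(out)
    for folders, title, uri in rows:
        # Pad shallow bookmarks so every line has the same number of fields.
        padded = list(folders) + [""] * (depth - len(folders))
        writer.writerow(padded + [title, uri])
```

Typical use would be write_csv(json.load(open("bookmarks.json")), sys.stdout), with json and sys imported and the file name being whatever your export is called. Padding the folder columns keeps every line at the same width, which is what makes vimdiff line up.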
Hope this helps
I have a set of csv files that are very simple to load into Stata using the -insheet- command. But they have very uninformative variable names. For each of these files, I also have a file of metadata consisting of two columns: the original (uninformative) variable names, and a description of what the variables actually mean. I'd like to use these metadata files to create variable labels, preferably without going through and typing up all the separate label commands or turning the metadata file into a dictionary for each file. It seems like there must be a quick way of loading the metadata file into Stata and looping through it to generate the label commands, but I don't know what it is. Any thoughts?
Ideally each line of the metadata is something like
varname1 "more interesting description"
in which case you can prefix each line with
label var
and then run the file as if it were a do-file using do. See the help for label. That is easy in a decent text editor: for example, search for the start of each line and replace it with label var (note the need for the trailing space).
What could bite here includes:
You don't have double quotes " " as delimiters, in which case you need to insert them.
The extra information does not qualify as a variable label because it is more than 80 characters long. See help limits.
There are other ways to do this with Stata. You could write a program to read in the metadata and write out a do-file using file, but if this were my problem I would reach first for my text editor. (Most experienced Stata programmers use something else as well as doedit.)
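If the text-editor route is not convenient, the same do-file can be generated by a small script outside Stata. Here is a hedged sketch in Python. It assumes the metadata is a two-column CSV (variable name, description) with no header row; the function names are invented for illustration. It swaps embedded double quotes for single quotes and truncates to 80 characters, which covers the two problems listed above.

```python
import csv

MAX_LABEL = 80  # variable-label length limit mentioned above


def label_commands(metadata_path):
    """Yield one 'label variable' command per metadata row."""
    with open(metadata_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            name, desc = row[0].strip(), row[1].strip()
            # Embedded double quotes would end the label early in Stata.
            desc = desc.replace('"', "'")[:MAX_LABEL]
            yield f'label variable {name} "{desc}"'


def write_do_file(metadata_path, do_path):
    """Write the commands to a file that can be run with 'do'."""
    with open(do_path, "w") as out:
        for command in label_commands(metadata_path):
            out.write(command + "\n")
```

After insheet has loaded the data, running the generated file with do applies all the labels in one go.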
I have files being generated by another program/user that have names such as "jh-1.txt, jh-2.txt, ..., jh-100.txt, ..., jh-1024.txt". I'm extracting a column from these files, manipulating the data, and outputting to a new matrix. The only problem is that Octave is using ASCII ordering and not natural ordering when reading in the files. Thus, the output matrix is not ordered in a natural way. My question is, can Octave sort file names in a natural order? I'm getting file names in the standard method:
fileDirectory = '/path/to/directory';
filePattern = fullfile(fileDirectory, '*.txt'); % Selects only the txt files.
dataFiles = dir(filePattern); % Gets the info from the txt files in the directory.
baseFileName = {dataFiles.name}'; % Gets all the txt file names.
I can't rename the files because this is a script for another user. They are on a Windows machine and already have Octave installed with Cygwin and I don't want to make them use the command line more than they have to because they are unfamiliar with it. Alternatively, it would be nice to have the output with the file names in a column but, I haven't figured that one out either (bit of a noob with Octave myself). That way the user could use Excel (which they are familiar with) to sort the columns.
I don't think there's a built-in natural sort in Octave. However, there is a natural sort submission on the MathWorks File Exchange. I've not used it, but the comments imply it works in Octave too.
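For reference, the idea behind a natural sort is small: split each name into digit and non-digit runs and compare the digit runs as numbers. Here is a sketch of that idea in Python rather than Octave (natural_key is a made-up name, and this is not the File Exchange code):

```python
import re


def natural_key(name):
    """Sort key that compares runs of digits by numeric value."""
    # re.split with a capturing group keeps the digit runs in the result,
    # e.g. "jh-10.txt" -> ["jh-", "10", ".txt"].
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


names = ["jh-100.txt", "jh-2.txt", "jh-1024.txt", "jh-1.txt"]
print(sorted(names, key=natural_key))
# ['jh-1.txt', 'jh-2.txt', 'jh-100.txt', 'jh-1024.txt']
```

The same split-and-compare approach can be ported to Octave by pulling the number out of each file name and sorting on it.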