I tried a few solutions here, but I wasn't able to get any of them working.
Short summary of what it needs to do:
I have a lot of CSV files, and the first 6 lines of each contain header information. I want to create a batch file that deletes those first 6 lines from all of the CSV files.
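One possible approach, sketched in Python rather than as a batch file: stream each CSV into a temporary file while skipping the first 6 lines, then swap the temporary file into place. The function names and the `HEADER_LINES` constant are made up for illustration, and the sketch assumes every `*.csv` in the folder really has 6 header lines to drop.

```python
import glob
import os
import shutil
import tempfile

HEADER_LINES = 6  # number of leading lines to drop, per the question


def strip_leading_lines(path, count=HEADER_LINES):
    """Rewrite the file at `path` without its first `count` lines."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as src, tempfile.NamedTemporaryFile(
        "wb", dir=directory, delete=False
    ) as dst:
        for _ in range(count):
            src.readline()  # discard one header line (no-op at end of file)
        shutil.copyfileobj(src, dst)  # stream the remainder unchanged
    os.replace(dst.name, path)  # swap the trimmed copy into place


def strip_all_csv(folder, count=HEADER_LINES):
    """Apply strip_leading_lines to every .csv file directly inside `folder`."""
    for path in glob.glob(os.path.join(folder, "*.csv")):
        strip_leading_lines(path, count)
```

Calling `strip_all_csv(".")` would process the current folder. The files are modified in place, so it is worth trying this on a copy first.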
Related
I have a CSV file and I want to extract the element in the first row and 3rd column. How might I go about doing this?
I would load the CSV in a matrix and then take the relevant row/column; of course, you could ignore the non-relevant element while loading the CSV. How to do the aforementioned has already been answered e.g.
How can I read and parse CSV files in C++?
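The linked question is about C++, but the second idea above (skipping the non-relevant elements while loading) can be sketched in a few lines of Python. The function name `cell_at` is hypothetical, and indices are zero-based, so the first row and 3rd column are `row=0, col=2`.

```python
import csv
import io


def cell_at(lines, row=0, col=2):
    """Return the field at (row, col), reading only as many rows as needed."""
    for i, record in enumerate(csv.reader(lines)):
        if i == row:
            return record[col]
    raise IndexError("row out of range")


# Example with in-memory data; a real file would be opened with
# open("data.csv", newline="") and passed in the same way.
sample = io.StringIO("a,b,c\nd,e,f\n")
print(cell_at(sample))
```

Using the `csv` module rather than splitting on commas means quoted fields that contain commas are still counted as a single column.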
I have several huge (>2GB) JSON files that end in ,\n]. Here is my test file example, which is the last 25 characters of a 2 GB JSON file:
test.json
":{"value":false}}}}}},
]
I need to delete the ,\n and add back the ] in the last three characters of the last line. The entire file is on three lines: the opening and closing brackets are each on their own line, and all the contents of the JSON array are on the second line.
I can't load the entire stream into memory to do something like:
string[0..-2]
because the file is way too large. I tried several approaches, including Ruby's:
chomp!(",\n]")
and UNIX's:
sed
both of which made no change to my JSON file. I viewed the last 25 characters by doing:
tail -c 25 filename.json
and also did:
ls -l
to verify that the byte size of the new and the old file versions were the same.
Can anyone help me understand why none of these approaches is working?
It's not necessary to read in the whole file if you're looking to make a surgical operation like this. Instead you can just overwrite the last few bytes in the file:
file = 'huge.json'
IO.write(file, "\n]\n", File.stat(file).size - 5)
The key here is to write out as many bytes as you back-track from the end; otherwise you'll need to trim the file length, which you can also do with truncate if necessary.
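For comparison, here is a rough Python sketch of the same seek-and-overwrite idea that also truncates, so the byte counts do not have to match. It assumes the file ends in `,\n]`, optionally followed by trailing newlines, and it checks the tail before touching anything. The function name `fix_json_tail` and the `window` parameter are invented for this example.

```python
import os


def fix_json_tail(path, window=16):
    """Turn a trailing comma-newline-bracket into newline-bracket-newline.

    Only the last `window` bytes are read, so the file size does not matter.
    Returns True if the file was changed, False if the tail did not match.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        start = max(0, size - window)
        f.seek(start)
        tail = f.read()
        stripped = tail.rstrip(b"\r\n")  # ignore trailing newlines, if any
        if not stripped.endswith(b",\n]"):
            return False  # unexpected ending: leave the file untouched
        comma_at = start + len(stripped) - 3  # absolute offset of the comma
        f.seek(comma_at)
        f.write(b"\n]\n")
        f.truncate()  # cut the file off right after the bytes just written
        return True
```

Because the tail is verified first, running it a second time on an already-fixed file does nothing.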
I've got a very large .csv file, which contains 10 million lines of data. The file size is around 250 MB. Each line contains three values and looks like this:
-9.8199980e-03,183,-4.32
I want to delete every 2nd line or e.g. copy every 10th line straight into a new file. Which program should I use and can you also post the code?
I tried it with Scilab and Excel; they couldn't open the file or just a small part of it. I can open the file in Notepad++, but when I tried to record and run a macro, which deletes every 2nd line, it crashed.
I would recommend you install gawk/awk from here and harness the power of this brilliant tool.
If you want every other line:
gawk "NR%2" original.csv > new.csv
If you want every 10th line:
gawk "NR%10==0" original.csv > new.csv
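If installing gawk is not an option, the same line-number filtering can be sketched in Python. The file is streamed line by line, so the 250 MB size should not be a problem. The function name `copy_every_nth` is made up for this example, and line numbers are 1-based to match awk's NR.

```python
from itertools import islice


def copy_every_nth(src_path, dst_path, step, first_line):
    """Copy lines first_line, first_line + step, ... (1-based) to dst_path."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # islice walks the file lazily, so only one line is held at a time
        dst.writelines(islice(src, first_line - 1, None, step))
```

`copy_every_nth("original.csv", "new.csv", 2, 1)` keeps lines 1, 3, 5, ... like `NR%2`, and `copy_every_nth("original.csv", "new.csv", 10, 10)` keeps lines 10, 20, 30, ... like `NR%10==0`.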
Is there an efficient command-line tool for prepending lines to a file inside a ZIP archive?
I have several large ZIP files containing CSV files missing their header, and I need to insert the header line. It's easy enough to write a script to extract them, prepend the header, and then re-compress, but the files are so large, it takes about 15 minutes to extract each one. Is there some tool that can edit the ZIP in-place without extracting?
Short answer: no.
A zip file contains 1 to N file entries, and each of them works as an unsplittable unit, meaning that if you want to do something to an entry, you need to process that entry completely (i.e. extract it).
The only fast operation you can do is adding a new file to your archive. It will create a new entry and append it to the file, but this is probably not what you need.
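To make the point concrete: each entry still has to be decompressed and recompressed in full, but it can be streamed from one archive into another so that nothing is extracted to disk. The sketch below uses Python's standard `zipfile` module; the function name `prepend_header` and its parameters are hypothetical. It is not an in-place edit, and whether it beats the 15-minute extract step is an open question.

```python
import shutil
import zipfile


def prepend_header(src_zip, dst_zip, header, suffix=".csv"):
    """Copy every entry of src_zip into dst_zip, prepending `header` (bytes)
    to entries whose name ends with `suffix`."""
    with zipfile.ZipFile(src_zip) as zin, zipfile.ZipFile(
        dst_zip, "w", compression=zipfile.ZIP_DEFLATED
    ) as zout:
        for info in zin.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            # force_zip64 allows entries larger than 2 GiB
            with zin.open(info) as src, zout.open(
                info.filename, "w", force_zip64=True
            ) as dst:
                if info.filename.lower().endswith(suffix):
                    dst.write(header)
                shutil.copyfileobj(src, dst)  # stream the entry in chunks
```

A call such as `prepend_header("in.zip", "out.zip", b"col1,col2,col3\n")` writes a new archive and leaves the original untouched. The file names and the header line here are placeholders.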
I have a folder with around 400 .txt files that I need to convert to .csv. When I batch rename them to .csv, all the columns get smushed together into one. Same thing happens when I convert to .xls then .csv, even though the columns are fine in .xls. If I open the .xls file and save as to .csv, it's fine, but this would require opening all 400 files.
I am working with sed in the Mac terminal. After navigating in the terminal to the folder that contains the files, I tried the following code, none of which worked:
for file in *.csv; do sed 's/[[:blank:]]+/,/g'
for file in *.csv; do sed -e "s/ /,/g"
for file in *.csv; do s/[[:space:]]/,/g
for file in *.csv; do sed 's/[[:space:]]{1,}/,/g'
Any advice on how to restore the column structure to the csv files would be much appreciated. And it's probably already apparent but I'm a coding newb so please go easy. Thanks!
Edit: here is an example of how the xls columns look, and how they should look in csv format:
Dotsc.exe 2/12/15 1:17 PM 0 Nothing 1 Practice
Everything that is separated by spaces here (except the space between 7 and PM) is in a separate column in the file. Here is what it looks like when I batch rename the file to .csv:
Dotsc.exe 2/12/15 1:17 PM 0 Nothing 1 Practice
Columns have now turned into spaces, and all data is in the same column. Hope that clarifies things.
I don't think what you are trying to do is possible with batch alone. I suggest you use a Java library.
Take a look here: http://poi.apache.org/spreadsheet/