How can multiple input files be read and processed in awk?

I have a lot of .html files saved, and I have already written awk code on Windows that does the further text processing I need and works perfectly with one file, but I couldn't find a way to read all of the files one after another and put the output into results.txt:
awk -f C:/PLT2/parse.txt input_files > C:/PLT2/results.txt

This is a question for your OS, not for awk. The UNIX answer would be:
awk -f C:/PLT2/parse.txt input_file1 input_file2 input_file3 ... > C:/PLT2/results.txt
but you're not using UNIX, so....
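In practice the Windows answer is usually the same: awk itself happily takes any number of input files on the command line, and many Windows gawk builds (including the GnuWin32 one) expand wildcards like `*.html` themselves even though cmd.exe does not. A portable sketch of the multiple-file behavior, with made-up filenames for illustration:

```shell
# create two tiny sample inputs (hypothetical names, for illustration only)
printf 'alpha\n' > in1.html
printf 'beta\n'  > in2.html

# one awk invocation reads every named file in sequence;
# a single redirection collects all of the output
awk '{print FILENAME ": " $0}' in1.html in2.html > results.txt

cat results.txt
```

This prints `in1.html: alpha` and then `in2.html: beta`: awk simply moves on to the next file when one is exhausted, so one `>` redirection on the whole command gathers everything into results.txt.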

Related

Calling Imagemagick from awk?

I have a CSV of image details I want to loop over in a bash script. awk seems like an obvious choice to loop over the data.
For each row, I want to take the values, and use them to do Imagemagick stuff. The following isn't working (obviously):
awk -F, '{ magick "source.png" "$1.jpg" }' images.csv
GNU AWK excels at processing structured text data. Although it can be used to run external commands via the system function, it is less handy for that than some other languages; Python, for example, has a standard-library module called subprocess which is more feature-rich.
If you wish to use awk for this task anyway, then I suggest preparing output to be fed into the bash command. Say you have file.txt with the following content:
file1.jpg,file1.bmp
file2.png,file2.bmp
file3.webp,file3.bmp
and the files listed in the 1st column exist in the current working directory, you wish to convert them to the files named in the 2nd column, and you have access to the convert command, then you might do
awk 'BEGIN{FS=","}{print "convert \"" $1 "\" \"" $2 "\""}' file.txt | bash
which is equivalent to starting bash and running
convert "file1.jpg" "file1.bmp"
convert "file2.png" "file2.bmp"
convert "file3.webp" "file3.bmp"
Observe that I have used literal " characters to enclose the filenames, so it should work with names containing spaces. Disclaimer: it might fail if a name contains special characters, e.g. ".
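If the filenames might contain characters that are special to the shell, a plain read loop in bash avoids building command strings altogether: each name is passed as a single, properly quoted argument and is never re-parsed. A sketch of the same conversion; `echo` is left in front so you can inspect the commands first (drop it to actually run ImageMagick):

```shell
printf 'file1.jpg,file1.bmp\nfile2.png,file2.bmp\n' > file.txt

# split each line on the comma into two variables; because the names
# are expanded inside quotes, spaces and quotes in them are harmless
while IFS=, read -r src dst; do
  echo convert "$src" "$dst"   # remove 'echo' to really convert
done < file.txt
```

This is a dry run: it prints the two convert commands rather than executing them, which is a handy way to sanity-check the loop before pointing it at real images.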

Split large file size json into multiple files [duplicate]

I have json file exported from mongodb which looks like:
{"_id":"99919","city":"THORNE BAY"}
{"_id":"99921","city":"CRAIG"}
{"_id":"99922","city":"HYDABURG"}
{"_id":"99923","city":"HYDER"}
there are about 30000 lines, and I want to split each line into its own .json file. (I'm trying to transfer my data onto a Couchbase cluster.)
I tried doing this:
cat cities.json | jq -c -M '.' | \
while read line; do echo $line > .chunks/cities_$(date +%s%N).json; done
but I found that it seems to drop loads of lines, and running this command only gave me 50-odd files when I was expecting 30000-odd!
Is there a logical way to make this not drop any data, using any tool that would suit?
Assuming you don't care about the exact filenames, if you want to split input into multiple files, just use split. (Your loop most likely loses data because $(date +%s%N) can return the same value for several consecutive lines, so later files overwrite earlier ones.)
jq -c . < cities.json | split -l 1 --additional-suffix=.json - .chunks/cities_
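For instance, with a two-line stand-in for the mongodb export (GNU split assumed, since --additional-suffix is a GNU extension; the jq stage is skipped here because the sample is already one compact object per line):

```shell
mkdir -p .chunks
printf '{"_id":"99919","city":"THORNE BAY"}\n{"_id":"99921","city":"CRAIG"}\n' > cities.json

# -l 1: one input line per output file; the final argument is the prefix
split -l 1 --additional-suffix=.json cities.json .chunks/cities_

ls .chunks
```

split names the pieces with generated suffixes (cities_aa.json, cities_ab.json, ...), each holding exactly one JSON document.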
In general, the way to split any text file into separate files, one per line, using any awk on any UNIX system is simply:
awk '{close(f); f=".chunks/cities_"NR".json"; print > f}' cities.json
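A quick way to see the awk one-liner in action, with a tiny sample standing in for the 30000-line export:

```shell
mkdir -p .chunks
printf '{"_id":"99919"}\n{"_id":"99921"}\n{"_id":"99922"}\n' > cities.json

# close() the previous file on each line so the script never runs
# into the open-file-descriptor limit on very large inputs
awk '{close(f); f=".chunks/cities_"NR".json"; print > f}' cities.json

ls .chunks/cities_[0-9]*.json
```

Unlike the timestamp-based loop in the question, the NR-based names are guaranteed unique, so no line can ever overwrite another.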

Csvkit how to stack many files

I have about 1.5k csv files and I need to stack them. (OS: win10)
How can I stack them using csvkit? (Or maybe you can recommend something other than csvkit?)
I tried the following. I created the following structure and ran:
cd files
for /r %i in (*) do csvstack -e utf-8 ../res.csv %i > ../res.csv
But it doesn't really work. Help please.
You can use
csvstack *.csv >./output.csv
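If csvstack is not available, and provided every file shares exactly the same header in the same column order (csvstack is smarter about mismatched columns), plain awk can do the stacking by keeping only the first file's header. A sketch with two made-up parts:

```shell
printf 'id,name\n1,a\n' > part1.csv
printf 'id,name\n2,b\n' > part2.csv

# FNR==1 is true on the first line of every file; NR!=1 excludes the
# very first line overall, so only the repeated headers are skipped
awk 'FNR==1 && NR!=1 {next} {print}' part1.csv part2.csv > stacked.txt

cat stacked.txt
```

Writing the result to a name that the input pattern cannot match (here stacked.txt, not another .csv in the same directory) also avoids the trap in the question's loop, where res.csv was both an input and the output of the same command.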

Win7/GAWK .csv in .txt - Error Chinese Signs in Output and OFS not working

I'm using GAWK on a Windows 7 PC to manipulate some txt files, and I'm a total newbie.
Currently I have several .csv files that I want to merge together.
My .csv files are separated by "\t" but I need them separated by ";".
So far GAWK recognizes the FS as "\t" but doesn't output the files with a ";" separator, plus at the end of each line there are several "Chinese signs" like "਍㄀嘀圀䈀匀㜀䄀㌀㄀䔀䌀".
Code-Line in Windows CMD:
gawk -v FS="\t" -v OFS=";" "{print $1$2}" test.csv > test.txt
I converted the .csv to .txt as well, had a look at it (tab separated) and used the .txt as input for my AWK:
gawk -v FS="\t" -v OFS=";" "{print $1$2}" test.txt
and had a look directly in CMD. No Chinese signs, but the columns are not separated by ";".
So far I'm only finding posts/threads/documents about using AWK/GAWK on Linux but none for Windows. I know that the functions are the same but the quoting is different. Maybe some of you have some tips?
I'm using GAWK with binaries of GnuWIN32 found Here.
Maybe some of you can help with my two problems?
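Two separate things appear to be going on here. The "Chinese signs" are the classic symptom of a UTF-16 file (Excel's "Unicode Text" export, for example) being read as 8-bit text, so converting the file to UTF-8 first (e.g. with iconv, if you have it) is likely the fix for that part. The missing ";" has a simpler cause: OFS is only inserted between comma-separated arguments to print, and `print $1$2` concatenates the two fields into one string before printing, so OFS never appears. A minimal sketch (plain awk and Unix quoting here; gawk behaves the same, and on cmd you would put double quotes around the program as in the question):

```shell
printf 'a\tb\nc\td\n' > test.csv

# $1$2  -> one concatenated argument, OFS is never used
# $1,$2 -> two arguments, joined with OFS on output
awk -v FS='\t' -v OFS=';' '{print $1, $2}' test.csv
```

This prints `a;b` and `c;d`; swapping the program body for `{print $1$2}` would print `ab` and `cd` with no separator at all.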

Updating files using AWK: Why do I get weird newline character after each replacement?

I have a .csv containing a few columns. One of those columns needs to be updated to the same number in ~1000 files. I'm trying to use AWK to edit each file, but I'm not getting the intended result.
What the original .csv looks like
heading_1,heading_2,heading_3,heading_4
a,b,c,1
d,e,f,1
g,h,i,1
j,k,m,1
I'm trying to update column 4 from 1 to 15.
awk '$4="15"' FS=, OFS=, file > update.csv
When I run this on a .csv generated in Excel, the result is a ^M character after the first line (which it updates to 15), and then it terminates and does not update any of the other rows.
It repeats the same mistake on each file when running through all files in a directory.
for file in *.csv; do awk '$4="15"' FS=, OFS=, "$file" > "${file}_updated.csv"; done
Alternatively, if someone has a better way to do this task, I'm open to suggestions.
Excel is generating the control-Ms, not awk. Run dos2unix or similar on your file before running awk on it.
Well, I couldn't reproduce your problem on my Linux box, as writing 15 to the last column overwrites the \r (the ^M is actually 0x0D, i.e. \r) before the newline \n, but you can always remove the \r first:
$ awk 'sub(/\r/,""); ...' file
I have had some issues with non-ASCII characters when a file is processed in a different locale, for example a file with ISO-8859-1 encoding processed with GNU awk in a UTF-8 shell.
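Putting the two answers together, the \r can be stripped and the column updated in a single awk pass. A minimal sketch with made-up CRLF data matching the question's layout (the header row is printed untouched, which the original one-liner would also have rewritten):

```shell
# two CRLF-terminated lines, as Excel would produce them
printf 'heading_1,heading_2,heading_3,heading_4\r\na,b,c,1\r\n' > file.csv

# strip the trailing \r first, pass the header through, then rewrite column 4
awk '{sub(/\r$/,"")} NR==1 {print; next} {$4="15"; print}' FS=, OFS=, file.csv
```

This prints the header line unchanged followed by `a,b,c,15`, with no carriage returns left in the output, so there is no need for a separate dos2unix pass.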