Compiling CSVs into one master file, then outputting errors? - csv

I am trying to do something for my company. Basically, what I need to do is:
Compile all the CSV files in a folder into one master file.
From the master file, output any potential error codes found in it to the user.
The key thing is to make this automated. Meaning, I only want to press one button or do one step and it will do steps 1 and 2 for me immediately.
The question is, I have no idea what software or language I should be using or looking at. It would be great if someone could enlighten me on how I should approach this.
Note: I have limited knowledge of such things but am willing to learn.
====
Edit:
To give a better example,
File1.csv
Voltage Ampere Power Error ID
==============================================
6V 3A 6W 18-ABB 000123
8V 2A 7W 0 123991
8V 10A 25W 25-ASB 461233
10V 23A 10W 18-ABB 248811
1V 2A 9W 0 321881
File2.csv
Voltage Ampere Power Error ID
==============================================
6V 4A 6W 0 312313
3V 5A 7W 0 123312
2V 10A 5W 25-ASB 461643
1V 2A 10W 18-ABB 656474
11V 2A 9W 0 124242
What I want to achieve:
Compile File1 and File2 into one master.csv as below,
master.csv
File1
Voltage Ampere Power Error ID
==============================================
6V 3A 6W 18-ABB 000123
8V 2A 7W 0 123991
8V 10A 25W 25-ASB 461233
10V 23A 10W 18-ABB 248811
1V 2A 9W 0 321881
File2
Voltage Ampere Power Error ID
==============================================
6V 4A 6W 0 312313
3V 5A 7W 0 123312
2V 10A 5W 25-ASB 461643
1V 2A 10W 18-ABB 656474
11V 2A 9W 0 124242
The master.csv must contain the filename when it is compiled. From master.csv, find and isolate the machine IDs with the error code 18-ABB or 25-ASB (the code will vary, but 0 means no error) into a new file called, for example, outputerror.csv.
The headers (Voltage etc.) need to be carried forward to the new outputerror.csv file.
Hence, the outputerror.csv should look like this,
outputerror.csv
Voltage Ampere Power Error ID
==============================================
File1
6V 3A 6W 18-ABB 000123
8V 10A 25W 25-ASB 461233
10V 23A 10W 18-ABB 248811
File2
2V 10A 5W 25-ASB 461643
1V 2A 10W 18-ABB 656474

Updated
@ECHO OFF
REM Delete any old output files, ignoring any error messages
DEL MASTER.CSV ERROR.CSV 2>NUL:
REM Keep track of file number in FNUM
SET /A FNUM=1
REM Loop through all files whose names look like "2015-03-01.CSV"
FOR %%A IN ( *-*-*.csv ) DO (
    SET FNAME=%%A
    CALL :PROCESSFILE
    SET /A FNUM+=1
)
GOTO :EOF

REM ######################################################################
REM PROCESSFILE SUBROUTINE
REM ######################################################################
:PROCESSFILE
SET /A LNUM=1
REM New file, append its name to MASTER
ECHO %FNAME% >> MASTER.CSV
FOR /F "tokens=*" %%L IN (%FNAME%) DO (
    SET LINE=%%L
    CALL :PROCESSLINE
    SET /A LNUM+=1
)
GOTO :EOF

REM ######################################################################
REM PROCESSLINE SUBROUTINE
REM ######################################################################
:PROCESSLINE
FOR /F "tokens=1-5 delims=," %%T in ("%LINE%") DO (
    ECHO %LINE% >> MASTER.CSV
    IF %LNUM% EQU 1 (
        REM Output header line to ERROR if processing first file
        IF %FNUM% EQU 1 ECHO %LINE% >> ERROR.CSV
        REM Output filename to ERROR for all files
        ECHO %FNAME% >> ERROR.CSV
    ) ELSE (
        REM Output lines where field 4 is not "0" to ERROR
        IF NOT "%%W" == "0" ECHO %LINE% >> ERROR.CSV
    )
)
GOTO :EOF

This is actually MUCH easier using awk - in fact it is only 2 lines of code! I would suggest downloading awk.exe from here. It is INCREDIBLY powerful and will help with any scripting or text-processing task.
The manual is available here.
The whole thing then becomes many lines of comments and 2 lines of code (the third and the last line), which you run just the same as my other all-Windows solution.
@ECHO OFF
REM Print the contents of all CSV files whose names look like a date, e.g. 2012-11-01.csv, printing each file's name ahead of its line 3
awk "FNR==3{print FILENAME}1" *-*-*.csv > MASTER.CSV
REM From MASTER.CSV, print the following lines out to file ERROR.CSV:
REM ... first 3 lines, i.e. Record Number < 4
REM ... any lines containing "CSV" or "csv"
REM ... no lines with "Voltage" or "="
REM ... any lines with field4 != "0"
awk "NR<4 || /csv/ || /CSV/{print;next} /Voltage|=/{next} $4!=\"0\"" MASTER.CSV > ERROR.CSV
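For anyone who would rather not maintain batch or awk, the same two steps can also be sketched with Python's standard csv module. This is a sketch only: it assumes comma-separated files with a header row, input filenames matching *-*-*.csv in the current folder, and that column 4 ("Error") holds 0 when there is no error; `build_master_and_errors` is a name I made up.

```python
import csv
import glob

def build_master_and_errors(pattern="*-*-*.csv",
                            master="MASTER.CSV", errors="ERROR.CSV"):
    with open(master, "w", newline="") as m, open(errors, "w", newline="") as e:
        mw, ew = csv.writer(m), csv.writer(e)
        header_written = False
        for name in sorted(glob.glob(pattern)):
            with open(name, newline="") as f:
                rows = list(csv.reader(f))
            mw.writerow([name])           # filename marker, as in master.csv
            mw.writerows(rows)
            if rows and not header_written:
                ew.writerow(rows[0])      # carry the header forward once
                header_written = True
            ew.writerow([name])
            # keep only rows whose 4th column is a real error code
            ew.writerows(r for r in rows[1:] if len(r) > 3 and r[3] != "0")
```

Running the script once performs both steps, which matches the one-button requirement.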

Related

Decoding a hex file

I would like to use a webservice that delivers a binary file with some data. I know the expected result, but I don't know how I can decode this binary with a script.
Here is my binary:
https://pastebin.com/3vnM8CVk
0a39 0a06 3939 3831 3438 1206 4467 616d
6178 1a0b 6361 7264 6963 6f6e 5f33 3222
0d54 6865 204f 6c64 2047 7561 7264 2a02
....
Some parts are in ASCII, so they are easy to catch; at the end of the file you get the vehicle name in ASCII and some data. It should be kill/victory/battle/XP/money data, but I don't understand how to decode these hex values. I tried to compare 2 vehicles that have the same kills, but I don't see any match.
Is there a way to decode this data?
Thanks :)
Hello guys, after 1 year I started looking for a solution again, so here is the structure of the packet as I guessed it (I still don't know what the parts between [ ] are for):
[52 37 08 01 10] 4E [18] EA [01 25] AB AA AA 3E [28] D4 [01 30] EC [01 38] 88 01 [40] 91 05 [48] 9F CA 22 [50] F5 C2 9A 02 [5A 12]
Reading left to right, the labelled values are: 4E = Victories, EA 01 = Battles, AB AA AA 3E = Victory ratio, D4 01 = Deaths, EC 01 = Respawns, 88 01 = Air targets, 91 05 = Ground targets, 9F CA 22 = XP, F5 C2 9A 02 = Money earned.
So here is the result:
Victory : 78
Battles : 234
Victory Ratio : ? (should be around 33%)
Deaths : 212
Respawns : 236
Air Target : 136
Ground Target : 657
Xp : ? (should be around 566.56k)
Money : ? (should be around 4.63M)
Is there a special way to calculate the result of a long hex like this?
F5 C2 9A 02 (should be around 4.63M)
Let me tell you a bit more:
I know the results, but I don't know how to calculate them from these hex bytes in the packet.
If I check a packet with a small amount of money or XP, so the value fits in one hex byte:
[52 1E 08 01 10] 01 [18] [01 25] 00 00 80 3F [28] 01 [30] 01 [48] 24 [50] 6E [5A 09]
6E = 110 Money earned
24 = 36 XP earned
Another example:
[52 21 08 01 10] 02 [18] 03 [25] AB AA 2A 3F [28] 02 [30] 03 [40] 01 [48] 78 [50] C7 08 [5A 09]
XP earned = hex 78 = 120
Money earned = hex C7 08 = 705
How can C7 08 come to 705 decimal?
Here is the full content, just in case; but I know how to isolate just these parts, so I don't need to decode all of the hex data:
https://pastebin.com/vAKPynNb
What you have asked is essentially how to reverse engineer a binary file. There are lots of threads on SO already:
Reverse engineer a binary dictionary file to extract strings
Tools to help reverse engineer binary file formats
https://reverseengineering.stackexchange.com/questions/3495/what-tools-exist-for-excavating-data-structures-from-flat-binary-files
http://www.iwriteiam.nl/Ha_HTCABFF.html
The takeaway from all of them is that there is no single solution for you; you need to spend effort to figure it out. There are tools to help you, but don't expect a magic-wand tool to hand you the structure/data.
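One concrete hypothesis worth testing on this particular dump: the bracketed marker bytes (08, 10, 18, 25, 28, ...) look like Protocol Buffers field tags, and the multi-byte integers then decode as little-endian base-128 varints (each byte contributes its low 7 bits; the high bit means "another byte follows"). This is a guess to verify, not a certainty, but a minimal decoder reproduces several of the expected values from the question:

```python
def decode_varint(data, pos=0):
    """Decode a little-endian base-128 varint starting at data[pos].

    Each byte contributes its low 7 bits; the high bit signals that
    another byte follows. Returns (value, next_position).
    """
    result, shift = 0, 0
    while True:
        b = data[pos]
        result |= (b & 0x7F) << shift
        pos += 1
        if not b & 0x80:          # high bit clear: last byte of the number
            return result, pos
        shift += 7

# Guessed fields from the question, checked against the expected results:
print(decode_varint(bytes.fromhex("EA01"))[0])      # 234     (Battles)
print(decode_varint(bytes.fromhex("9FCA22"))[0])    # 566559  (~566.56k XP)
print(decode_varint(bytes.fromhex("F5C29A02"))[0])  # 4628853 (~4.63M Money)
```

The single-byte examples are consistent too, since values under 128 encode as themselves (hex 78 = 120).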
Any kind of file read operation is done in text or binary mode with basic file handles, and some languages offer typed reading of int, float, etc., or arrays of them.
The complex operations behind these reads are almost always kept hidden from normal users, so the user has to learn more when it comes to read/write operations on data structures.
In this case, OFFSET and SEEK are the key words: one must find their values and act accordingly. Once the data is read, it must also be converted to a suitable data type.
The following code shows the basics of these operations: writing data and reading blocks back to recover the numbers. It is written in PHP as the OP has commented in the question that he uses PHP.
The offset is calculated from these byte sizes to be 11: char: 1 byte, short: 2 bytes, int: 4 bytes, float: 4 bytes.
<?php
$filename = "testdata.dat";
$filehandle = fopen($filename, "w+");
$data=["test string","another test string",77,777,77777,7.77];
fwrite($filehandle,$data[0]);
fwrite($filehandle,$data[1]);
$numbers=array_slice($data,2);
fwrite($filehandle,pack("c1s1i1f1",...$numbers));
fwrite($filehandle,"end"); // gives 3 to offset
fclose($filehandle);
$filename = "testdata.dat";
$filehandle = fopen($filename, "rb+");
$offset=filesize($filename)-11-3;
fseek($filehandle,$offset);
$numberblock= fread($filehandle,11);
$numbersback=unpack("c1a/s1b/i1c/f1d",$numberblock);
var_dump($numbersback);
fclose($filehandle);
?>
Once this example is understood, the rest is finding the data structure in the requested file. I have written another example, but it uses assumptions; I leave it to readers to find which assumptions I made. Be careful though: I know nothing about the real structure, and the values will not be correct.
<?php
$filename = "testfile";
$filehandle = fopen($filename, "rb");
$offset=17827-2*41; //filesize minus 2 user area
fseek($filehandle,$offset);
print $user1 = fread($filehandle, 41);echo "<br>";
$user1pr=unpack("s1kill/s1victory/s1battle/s1XP/s1Money/f1Life",$user1);
var_dump($user1pr); echo "<br>";
fseek($filehandle,$offset+41);
print $user2 = fread($filehandle, 41);echo "<br>";
$user2pr=unpack("s1kill/s1victory/s1battle/i1XP/i1Money/f1Life",$user2);
var_dump($user2pr); echo "<br>";
echo "<br><br>";
$repackeduser2=pack("s3i2f1",$user2pr["kill"],$user2pr["victory"],
$user2pr["battle"],$user2pr["XP"],$user2pr["Money"],
$user2pr["Life"]
);
print $user2 . "<br>" .$repackeduser2;
print "<br>3*s=6 bytes, 2*i=8 bytes, 1*f=4 bytes (machine dependent)<br>";
print pack("s1",$user2pr["kill"]) ."<br>";
print pack("s1",$user2pr["victory"]) ."<br>";
print pack("s1",$user2pr["battle"]) ."<br>";
print pack("i1",$user2pr["XP"]) ."<br>";
print pack("i1",$user2pr["Money"]) ."<br>";
print pack("f1",$user2pr["Life"]) ."<br>";
fclose($filehandle);
?>
PS: pack and unpack use machine-dependent sizes for some data types such as int and float, so be careful when working with them. Read the official PHP pack and unpack manuals.
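The same caveat exists in other languages; for comparison, Python's struct module makes the fixed-size alternative explicit. The sketch below reuses the 77/777/77777/7.77 numbers from the example above; a "<" format prefix forces little-endian with standard sizes, giving the same 11-byte block regardless of machine.

```python
import struct

# Native mode ("@", the default) uses machine-dependent sizes and padding,
# much like PHP's pack("i"/"f"); "<" forces little-endian standard sizes.
fmt = "<bhif"                      # char, short, int, float = 1+2+4+4 bytes
assert struct.calcsize(fmt) == 11  # matches the 11-byte offset in the text

packed = struct.pack(fmt, 77, 777, 77777, 7.77)
c, s, i, f = struct.unpack(fmt, packed)
print(c, s, i, round(f, 2))        # 77 777 77777 7.77
```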
This looks more like the hexdump of a binary file. Several methods of converting the hex to strings resulted in the same scrambled output; only some lines are readable, like this...
Dgamaxcardicon_32" The Old Guard
As @Tarun Lalwani said, you would have to know the structure of this data to get it in plaintext.
If you have access to the raw binary, you could try using strings: https://serverfault.com/questions/51477/linux-command-to-find-strings-in-binary-or-non-ascii-file

Fixing broken csv files using awk

I have some CSV files which are broken, since there is junk such as control characters, newlines, and delimiters in some of the fields. Example mockup data, without the control characters:
id;col 1;col 2;col 3
1;data 11;good 21;data 31
2;data 12;cut
in two;data 32
3;data 13;good 23;data 33
4;data 14;has;extra delimiter;data 34
5;data 15;good 25;data 35
6;data 16;cut
and;extra delimiter;data 36
7;data 17;data 27;data 37
8;data 18;cut
in
three;data 38
9;data 19;data 29;data 39
I am processing the above crap with awk:
BEGIN { FS=OFS=";" }          # delimiters
NR==1 { nf=NF }               # header record is fine, use its NF
NR>1 {
    if(NF<nf) {               # if NF is less than the header's NF
        prev=$0               # store $0
        if(getline==1) {      # read the "next" line
            succ=$0           # set the "next" line to succ
            $0=prev succ      # rebuild the current record
        }
    }
    if(NF!=nf)                # if NF is still not adequate
        $0=succ               # assume the original line was malformed
    if(NF!=nf)                # if the "next" line was malformed as well
        next                  # skip it and move on
} 1
Naturally the above program will fail on records 4 and 6 (as the actual data has several fields where the extra delimiter may lurk) and on 8 (since I only read the next line if NF is too short). I can live with losing 4 and 6, but 8 might be doable?
Also, three successive ifs scream for a loop, but it's Friday afternoon here, my day is nearing its $, and I just can't spin my head around it anymore. Do you guys have any brain reserve left I could borrow? Any best practices I didn't think of?
The key here is to keep a buffer containing the lines that are still not "complete"; once they are, print the buffer and clear it:
awk -F';' 'NF>=4 && !nf {print; next}   # normal lines are printed
{                                       # otherwise,
    if (nf>0) {                         # continue a "broken" line by...
        buff=buff OFS $0                # appending to the buffer
        nf+=NF-1                        # and adding NF
    } else {                            # new "broken" line, so...
        buff=$0                         # start the buffer
        nf=NF                           # set number of fields seen so far
    }
}
nf>=4 {                                 # once the record is complete
    print buff                          # print it
    buff=""; nf=0                       # and reset the variables
}' file
Here, buff is that buffer and nf an internal counter to keep track of how many fields have been seen so far for the current record (like you did in your attempt).
We add NF-1 when appending to the buffer (that is, from the 2nd physical line of a broken record) because such a line's first chunk does not start a new field; it just continues the last field of the previous line:
8;data 18;cut # NF==3 |
in # NF==1 but it just continues $3 | all together, NF==4
three;data 38 # NF==2 but $1 continues $3 |
With your sample input:
$ awk -F';' 'NF>=4 && !nf {print; next} {buff=(nf>0 ? buff OFS : "") $0; nf+=(nf>0 ? NF-1 : NF)} nf>=4{print buff; buff=""; nf=0}' a
id;col 1;col 2;col 3
1;data 11;good 21;data 31
2;data 12;cut in two;data 32
3;data 13;good 23;data 33
4;data 14;has;extra delimiter;data 34
5;data 15;good 25;data 35
6;data 16;cut and;extra delimiter;data 36
7;data 17;data 27;data 37
8;data 18;cut in three;data 38
9;data 19;data 29;data 39

cmd batch search csv using wildcard

I have a CSV file which I have exported from our data module, which only contains product numbers. The product numbers are comma separated, which may or may not cause a problem (I am not that good of a programmer). I found a fancy little batch here at Stack Overflow, which has helped me read the CSV when running the batch; however, I am lost when it comes to triggering the right commands.
@echo off
set "theDir=C:\Web\Photos"
for /F "delims=" %%x in (C:\Web\MySQL_active_product_numbers.csv) do (
    copy %theDir%\%%x*.* "C:\Web\ActivePhotos\"
)
I have set the directory I want to scan using the theDir variable, and in my CSV file I have a list of product numbers. Example of the CSV content (one column only):
10000,02,65
10000,25,65
10001,02,65
...
The files I want to copy contain more characters than each line in the CSV contains, which is why I need the wildcard search to locate and copy the files. Example of files:
10000,02,65 chocolate bar.jpg
10000,25,65 ice cream cone.jpg
10001,02,65 candy.jpg
....
I really want to copy the JPGs from one directory to another, but as you can see from my CSV, I only have the product numbers to match the filenames. I can't figure out how to use the wildcard search within my batch to loop through the CSV, locate each file, and copy it to a different directory. I hope it all made sense, and I appreciate all your input and support with my batch problem. Thanks.
instead of
copy %theDir%\%%x*.* "C:\Web\ActivePhotos\"
try:
copy "%theDir%\%%x*.*" "C:\Web\ActivePhotos\*.*"
(your filename contains spaces!)
EDIT
well, it runs fine on my computer. The problem with commas and spaces can easily be solved by enclosing path\filename in double quotes ("). Here is the proof:
C:\Users\Stephan\CanCan>type t.bat
@echo off
dir C:\users\Stephan\CanCan\New\*.*
set "theDir=C:\users\Stephan\CanCan"
for /F "delims=" %%x in (C:\users\Stephan\CanCan\t.csv) do (
copy "%theDir%\%%x*.*" "C:\users\Stephan\CanCan\New\*.*"
)
dir C:\users\Stephan\CanCan\New\*.*
C:\Users\Stephan\CanCan>type t.csv
10000,02,65
10000,25,65
10001,02,65
C:\Users\Stephan\CanCan>
C:\Users\Stephan\CanCan>dir *.jpg
Datenträger in Laufwerk C: ist Boot
Volumeseriennummer: FA25-2E12
Verzeichnis von C:\Users\Stephan\CanCan
01.08.2013 18:45 6 10000,02,65 chocolate bar.jpg
01.08.2013 18:45 6 10000,25,65 ice cream cone.jpg
01.08.2013 18:45 6 10001,02,65 candy.jpg
3 Datei(en), 18 Bytes
0 Verzeichnis(se), 753.621.913.600 Bytes frei
C:\Users\Stephan\CanCan>t.bat
Datenträger in Laufwerk C: ist Boot
Volumeseriennummer: FA25-2E12
Verzeichnis von C:\users\Stephan\CanCan\New
01.08.2013 18:52 <DIR> .
01.08.2013 18:52 <DIR> ..
0 Datei(en), 0 Bytes
2 Verzeichnis(se), 753.621.913.600 Bytes frei
C:\users\Stephan\CanCan\10000,02,65 chocolate bar.jpg
1 Datei(en) kopiert.
C:\users\Stephan\CanCan\10000,25,65 ice cream cone.jpg
1 Datei(en) kopiert.
C:\users\Stephan\CanCan\10001,02,65 candy.jpg
1 Datei(en) kopiert.
Datenträger in Laufwerk C: ist Boot
Volumeseriennummer: FA25-2E12
Verzeichnis von C:\users\Stephan\CanCan\New
01.08.2013 18:53 <DIR> .
01.08.2013 18:53 <DIR> ..
01.08.2013 18:45 6 10000,02,65 chocolate bar.jpg
01.08.2013 18:45 6 10000,25,65 ice cream cone.jpg
01.08.2013 18:45 6 10001,02,65 candy.jpg
3 Datei(en), 18 Bytes
2 Verzeichnis(se), 753.621.913.600 Bytes frei
C:\Users\Stephan\CanCan>
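If a batch file ever becomes too limiting, the same prefix-match copy is only a few lines of Python. A sketch under the question's assumptions: each CSV line is a single product-number prefix (commas included), and `copy_matching` is a name I made up; the paths in the comment are the ones from the question.

```python
import glob
import os
import shutil

def copy_matching(list_file, src_dir, dst_dir):
    """Copy every file in src_dir whose name starts with a product
    number listed one-per-line in list_file into dst_dir."""
    copied = []
    with open(list_file) as f:
        for line in f:
            prefix = line.strip()
            if not prefix:
                continue
            # commas are part of the filename, so just glob "<prefix>*"
            for path in glob.glob(os.path.join(src_dir, prefix + "*")):
                shutil.copy(path, dst_dir)
                copied.append(os.path.basename(path))
    return copied

# e.g. copy_matching(r"C:\Web\MySQL_active_product_numbers.csv",
#                    r"C:\Web\Photos", r"C:\Web\ActivePhotos")
```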

A Windows script to merge three csv file columns into one single column

I can do this within Excel, for example using the =CONCATENATE function to merge a number of columns into one single column. But what I want to do is merge columns within 3 different CSV files in the same folder into one single column. I want to run this via a batch script or something like VBScript; the CMD copy command does not seem to work.
Here is the file structure:
File1.csv
Column1: www.domain.com/
Column2: www.nwdomain.com/
Column3: www.stackdomain.com/
Column4: www.example-domain.com/
File2.csv
Column1: about
Column2: contact
Column3: index
Column4: faq
File3.csv
Column1: .html
Column2: .html
Column3: .html
Column4: .html
Result in output file:
Column1: www.domain.com/about.html
Column2: www.nwdomain.com/contact.html
Column3: www.stackdomain.com/index.html
Column4: www.example-domain.com/faq.html
Thanks for your help.
@ECHO OFF
SETLOCAL
::
(
 FOR /f "tokens=1*delims=:" %%a IN ('findstr /n /r "." ^<csv1.csv') DO (
  FOR /f "tokens=1*delims=:" %%c IN ('findstr /n /r "." ^<csv2.csv') DO (
   IF %%a==%%c FOR /f "tokens=1*delims=:" %%e IN ('findstr /n /r "." ^<csv3.csv') DO (
    IF %%a==%%e (
     FOR /f "tokens=1-4delims=," %%m IN ("%%b") DO (
      FOR /f "tokens=1-4delims=," %%r IN ("%%d") DO (
       FOR /f "tokens=1-4delims=," %%w IN ("%%f") DO (
        ECHO.%%m%%r%%w,%%n%%s%%x,%%o%%t%%y,%%p%%u%%z
       )
      )
     )
    )
   )
  )
 )
)>new.csv
should work.
What it does is,
For file1, FINDSTR "outputs" any line which contains any character (/r "."), preceded by the line number and a colon (/n). This "output" is read by the FOR /f and parsed into 2 tokens, delimited by the colon (tokens=1* means 'the first token; all of the rest of the line'), and the effect is to put the line number in %%a and the rest of the line, which is the line from the original .csv, into %%b.
For each line of csv1, repeat for csv2, this time placing the line number in %%c and the line in %%d.
Only if the line numbers match, repeat for csv3, with the number in %%e and the text in %%f.
If the line number from this last file also matches, parse the line text in each of %%b, %%d and %%f - this time selecting the four columns, separated by commas. This data appears in %%m..%%p, %%r..%%u, %%w..%%z. All we have to do then is butt up the appropriate parts and insert the commas.
DONE!
Source and test results, including run time (5 rows)
start:21:45:40.87
end :21:45:41.09
csv1.csv =========
www.domain.com/,www.nwdomain.com/,www.stackdomain.com/,www.example-domain.com/
www.domain.com/,www.nwdomain.com/,www.stackdomain.com/,www.example-domain.com/
www.domain.com/,www.nwdomain.com/,www.stackdomain.com/,www.example-domain.com/
www.domain.com/,www.nwdomain.com/,www.stackdomain.com/,www.example-domain.com/
www.domain.com/,www.nwdomain.com/,www.stackdomain.com/,www.example-domain.com/
csv2.csv =========
about,contact,index,faq
about,contact,index,faq
about,contact,index,faq
about,contact,index,faq
about,contact,index,faq
csv3.csv =========
.html,.html,.html,.html
.html,.html,.html,.html
.html,.html,.html,.html
.html,.html,.html,.html
.html,.html,.html,.html
new.csv =========
www.domain.com/about.html,www.nwdomain.com/contact.html,www.stackdomain.com/index.html,www.example-domain.com/faq.html
www.domain.com/about.html,www.nwdomain.com/contact.html,www.stackdomain.com/index.html,www.example-domain.com/faq.html
www.domain.com/about.html,www.nwdomain.com/contact.html,www.stackdomain.com/index.html,www.example-domain.com/faq.html
www.domain.com/about.html,www.nwdomain.com/contact.html,www.stackdomain.com/index.html,www.example-domain.com/faq.html
www.domain.com/about.html,www.nwdomain.com/contact.html,www.stackdomain.com/index.html,www.example-domain.com/faq.html
=============
In VBScript:
Const delim = ","
Set fso = CreateObject("Scripting.FileSystemObject")
Set f1 = fso.OpenTextFile("File1.csv")
Set f2 = fso.OpenTextFile("File2.csv")
Set f3 = fso.OpenTextFile("File3.csv")
Do Until f1.AtEndOfStream Or f2.AtEndOfStream Or f3.AtEndOfStream
    a1 = Split(f1.ReadLine, delim)
    a2 = Split(f2.ReadLine, delim)
    a3 = Split(f3.ReadLine, delim)
    n = Min(UBound(a1), UBound(a2), UBound(a3))
    ReDim aout(n) ' Dim needs a constant size; ReDim allows a variable one
    For i = 0 To n
        aout(i) = a1(i) & a2(i) & a3(i)
    Next
    WScript.StdOut.WriteLine Join(aout, delim)
Loop
f1.Close
f2.Close
f3.Close

Function Min(a, b, c)
    If a<=b Then
        If c<a Then
            Min = c
        Else
            Min = a
        End If
    Else
        If c<b Then
            Min = c
        Else
            Min = b
        End If
    End If
End Function
Although not really programming, a quick and dirty way is to open all files in Excel, create a new XLS or XLSX file, then use this formula in the first cell of the newly created file:
=[File1.csv]File1!A1&[File2.csv]File2!A1&[File3.csv]File3!A1
where File1.csv, File2.csv, and File3.csv are your CSV files.
Then drag across the columns / rows to apply the formula.
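For completeness, here is a Python take on the same column-wise merge. A sketch only: it assumes the three files have the same number of rows with comma-separated columns, and `merge_columns` is a name I made up.

```python
import csv

def merge_columns(file1, file2, file3, outfile):
    """Concatenate the files column-wise: output column N is
    file1's column N + file2's column N + file3's column N."""
    with open(file1, newline="") as a, open(file2, newline="") as b, \
         open(file3, newline="") as c, open(outfile, "w", newline="") as o:
        writer = csv.writer(o)
        # zip() stops at the shortest file, mirroring the VBScript loop
        for r1, r2, r3 in zip(csv.reader(a), csv.reader(b), csv.reader(c)):
            n = min(len(r1), len(r2), len(r3))
            writer.writerow([r1[i] + r2[i] + r3[i] for i in range(n)])
```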

Automatically sum numeric columns and print total

Given the output of git ... --stat:
3 files changed, 72 insertions(+), 21 deletions(-)
3 files changed, 27 insertions(+), 4 deletions(-)
4 files changed, 164 insertions(+), 0 deletions(-)
9 files changed, 395 insertions(+), 0 deletions(-)
1 files changed, 3 insertions(+), 2 deletions(-)
1 files changed, 1 insertions(+), 1 deletions(-)
2 files changed, 57 insertions(+), 0 deletions(-)
10 files changed, 189 insertions(+), 230 deletions(-)
3 files changed, 111 insertions(+), 0 deletions(-)
8 files changed, 61 insertions(+), 80 deletions(-)
I wanted to produce the sum of the numeric columns but preserve the formatting of the line. In the interest of generality, I produced this awk script that automatically sums any numeric columns and produces a summary line:
{
    for (i = 1; i <= NF; ++i) {
        if ($i + 0 != 0) {
            numeric[i] = 1;
            total[i] += $i;
        }
    }
}
END {
    # re-use non-numeric columns of last line
    for (i = 1; i <= NF; ++i) {
        if (numeric[i])
            $i = total[i]
    }
    print
}
Yielding:
44 files changed, 1080 insertions(+), 338 deletions(-)
Awk has several features that simplify the problem, like automatic string->number conversion, all arrays as associative arrays, and the ability to overwrite auto-split positional parameters and then print the equivalent lines.
Is there a better language for this hack?
Perl - 47 char
Inspired by ChristopheD's awk solution. Used with the -an command-line switch. 43 chars + 4 chars for the command-line switch:
$i-=@a=map{($b[$i++]+=$_)||$_}@F}{print"@a"
I can get it to 45 (41 + -ap switch) with a little bit of cheating:
$i=0;$_="Ctrl-M@{[map{($b[$i++]+=$_)||$_}@F]}"
Older, hash-based 66 char solution:
@a=(),s#(\d+)(\D+)#$b{$a[@a]=$2}+=$1#gefor<>;print map$b{$_}.$_,@a
Ruby — 87
puts ' '+[*$<].map(&:split).inject{|i,j|[0,3,5].map{|k|i[k]=i[k].to_i+j[k].to_i};i}*' '
Python - 101 chars
import sys
print" ".join(`sum(map(int,x))`if"A">x[0]else x[0]for x in zip(*map(str.split,sys.stdin)))
Using reduce is longer at 126 chars
import sys
print" ".join(reduce(lambda X,Y:[str(int(x)+int(y))if"A">x[0]else x for x,y in zip(X,Y)],map(str.split,sys.stdin)))
AWK - 63 characters
(in a bash script, $1 is the filename provided as command line argument):
awk -F' ' '{x+=$1;y+=$4;z+=$6}END{print x,$2,$3,y,$5,z,$7}' $1
One could of course also pipe the input in (would save another 3 characters when allowed).
This problem is not challenging or difficult... it is "cute" though.
Here is a solution in Python:
import sys
r = []
for s in sys.stdin:
    r = map(lambda x,y:(x or 0)+int(y) if y.isdigit() else y, r, s.split())
print ' '.join(map(str, r))
What does it do... it keeps tally in r while proceeding line by line. Splits the line, then for each element of the list, if it is a number, adds it to the tally or keeps it as string. At the end they all get re-mapped to string and merged with spaces in between to be printed.
Alternative, more "algebraic" implementation, if we did not care about reading all input at once:
import sys
def totalize(l):
    try: r = str(sum(map(int,l)))
    except: r = l[-1]
    return r
print ' '.join(map(totalize, zip(*map(str.split, sys.stdin))))
What does this one do? totalize() takes a list of strings and tries to calculate sum of the numbers; if that fails, it simply returns the last one. zip() is fed with a matrix that is list of rows, each of them being list of column items in the row - zip transposes the matrix so it turns into list of column items and then totalize is invoked on each column and the results are joined as before.
At the expense of making your code slightly longer, I moved the main parsing into the BEGIN clause so the main clause is only processing numeric fields. For a slightly larger input file, I was able to measure a significant improvement in speed.
BEGIN {
    getline
    for (i = 1; i <= NF; ++i) {
        # need to test for 0, too, in this version
        if ($i == 0 || $i + 0 != 0) {
            numeric[i] = 1;
            total[i] = $i;
        }
    }
}
{
    for (i in numeric) total[i] += $i
}
END {
    # re-use non-numeric columns of last line
    for (i = 1; i <= NF; ++i) {
        if (numeric[i])
            $i = total[i]
    }
    print
}
I made a test file using your data and doing paste file file file ... and cat file file file ... so that the result had 147 fields and 1960 records. My version took about 1/4 as long as yours. On the original data, the difference was not measurable.
JavaScript (Rhino) - 183 154 139 bytes
Golfed:
x=[n=0,0,0];s=[];readFile('/dev/stdin').replace(/(\d+)(\D+)/g,function(a,b,c){x[n]+=+b;s[n++]=c;n%=3});print(x[0]+s[0]+x[1]+s[1]+x[2]+s[2])
Readable-ish:
x=[n=0,0,0];
s=[];
readFile('/dev/stdin').replace(/(\d+)(\D+)/g,function(a,b,c){
x[n]+=+b;
s[n++]=c;
n%=3
});
print(x[0]+s[0]+x[1]+s[1]+x[2]+s[2]);
PHP 152 130 Chars
Input:
$i = "
3 files changed, 72 insertions(+), 21 deletions(-)
3 files changed, 27 insertions(+), 4 deletions(-)
4 files changed, 164 insertions(+), 0 deletions(-)
9 files changed, 395 insertions(+), 0 deletions(-)
1 files changed, 3 insertions(+), 2 deletions(-)
1 files changed, 1 insertions(+), 1 deletions(-)
2 files changed, 57 insertions(+), 0 deletions(-)
10 files changed, 189 insertions(+), 230 deletions(-)
3 files changed, 111 insertions(+), 0 deletions(-)
8 files changed, 61 insertions(+), 80 deletions(-)";
Code:
$a = explode(" ", $i);
foreach($a as $k => $v){
if($k % 7 == 0)
$x += $v;
if(3-$k % 7 == 0)
$y += $v;
if(5-$k % 7 == 0)
$z += $v;
}
echo "$x $a[1] $a[2] $y $a[4] $z $a[6]";
Output:
44 files changed, 1080 insertions(+), 338 deletions(-)
Note: explode() will require that there is a space char before the new line.
Haskell - 151 135 bytes
import Char
c a b|all isDigit(a++b)=show$read a+read b|True=a
main=interact$unwords.foldl1(zipWith c).map words.filter(not.null).lines
... but I'm sure it can be done better/smaller.
Lua, 140 bytes
I know Lua isn't the best golfing language, but compared by the size of the runtimes, it does pretty well I think.
f,i,d,s=0,0,0,io.read"*a"for g,a,j,b,e,c in s:gmatch("(%d+)(.-)(%d+)(.-)(%d+)(.-)")do f,i,d=f+g,i+j,d+e end print(table.concat{f,a,i,b,d,c})
PHP, 176 166 164 159 158 153
for($a=-1;$a<count($l=explode("
",$i));$r=explode(" ",$l[++$a]))for($b=-1;$b<count($r);$c[++$b]=is_numeric($r[$b])?$c[$b]+$r[$b]:$r[$b]);echo join(" ",$c);
This would, however, require the whole input in $i... A variant with $i replaced with $_POST["i"] so it would be sent in a textarea... Has 162 chars:
for($a=-1;$a<count($l=explode("
",$_POST["i"]));$r=explode(" ",$l[$a++]))for($b=0;$b<count($r);$c[$b]=is_numeric($r[$b])?$c[$b]+$r[$b]:$r[$b])$b++;echo join(" ",$c);
This is a version with NO HARDCODED COLUMNS