Given a TSV file in which col2 can contain a field separator (a tab) or a record separator (a newline) inside a quoted value:
$ printf '%b\n' 'col1\tcol2\tcol3' '1\t"A\tB"\t1234' '2\t"CD\nEF"\t567' | \cat -vet
col1^Icol2^Icol3$
1^I"A^IB"^I1234$
2^I"CD$
EF"^I567$
+------+---------+------+
| col1 | col2 | col3 |
+------+---------+------+
| 1 | "A B" | 1234 |
| 2 | "CD | 567 |
| | EF" | |
+------+---------+------+
Is there a way in sed/awk/perl or even (preferably) miller/mlr to transform those pesky characters into spaces in order to generate the following result:
+------+---------+------+
| col1 | col2 | col3 |
+------+---------+------+
| 1 | "A B" | 1234 |
| 2 | "CD EF" | 567 |
+------+---------+------+
I cannot get Miller 6.2 to make the proper transformation (I tried the DSL with put/gsub) because it doesn't treat the quoted tab or newline as part of the column, which breaks the field count:
$ printf '%b\n' 'col1\tcol2\tcol3' '1\t"A\tB"\t1234' '2\t"CD\nEF"\t567' | mlr --opprint --barred --itsv cat
mlr : mlr: CSV header/data length mismatch 3 != 4 at filename (stdin) line 2.
A good library cleanly handles things like embedded newlines and quoted separators in fields.
In a Perl script with Text::CSV:
use warnings;
use strict;
use Text::CSV;
my $file = shift // die "Usage: $0 filename\n";
my $csv = Text::CSV->new( { binary => 1, sep_char => "\t", auto_diag => 1 } );
open my $fh, '<', $file or die "Can't open $file: $!";
while (my $row = $csv->getline($fh)) {
s/\s+/ /g for @$row; # collapse multiple spaces, tabs, newlines
$csv->say(*STDOUT, $row);
}
Note the many other options for the constructor that can help handle various irregularities.
This can fit in a one-liner; its functional interface (with csv) is particularly well suited for that.
If you run the command below, you get the output shown underneath it:
printf '%b\n' 'col1\tcol2\tcol3' '1\t"A\tB"\t1234' '2\t"CD\nEF"\t567' | \
mlr --c2t --fs "\t" clean-whitespace
col1 col2 col3
1 A B 1234
2 CD EF 567
I'm using mlr 6.2.
In Miller 5, one way to do it is simply to use the put verb:
printf '%b\n' 'col1\tcol2\tcol3' '1\t"A\tB"\t1234' '2\t"CD\nEF"\t567' | \
mlr --tsv put -S 'for (k in $*) {$[k] = gsub($[k], "\n", " ")}' then clean-whitespace
perl -MText::CSV_XS=csv -e'
csv
in => *ARGV,
on_in => sub { s/\s+/ /g for @{$_[1]} },
sep_char => "\t";
'
Or s/[\t\n]/ /g if you prefer.
Can be placed all on one line.
Input is accepted from file named by argument or STDIN.
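The same library-based approach can be sketched in Python with the standard csv module. This is only an illustration, not part of the answers above: the function name clean_tsv is made up for the example, and it assumes the input quotes fields the way the sample does.

```python
import csv
import io
import re

def clean_tsv(src, dst):
    """Copy TSV rows from src to dst, collapsing whitespace runs inside fields."""
    reader = csv.reader(src, delimiter="\t")   # parses quoted tabs and newlines
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in reader:
        # Tabs, newlines and repeated spaces inside a field become one space.
        writer.writerow([re.sub(r"\s+", " ", field) for field in row])

# The sample data from the question.
sample = 'col1\tcol2\tcol3\n1\t"A\tB"\t1234\n2\t"CD\nEF"\t567\n'
out = io.StringIO()
clean_tsv(io.StringIO(sample, newline=""), out)
print(out.getvalue(), end="")
```

Once the embedded separators are gone the writer no longer needs the surrounding quotes, so they are dropped in the output, much like in the clean-whitespace result above.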
With GNU awk for multi-char RS, RT, and gensub():
$ awk -v RS='"([^"]|"")*"' '{ORS=gensub(/[\n\t]/," ","g",RT)} 1' file
col1 col2 col3
1 "A B" 1234
2 "CD EF" 567
The above just uses RS to isolate each "..." string and saves it in RT, then replaces every \n or \t in that string with a blank and saves the result in ORS, then prints the record.
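The same idea (isolate each "..." string, then replace the tabs and newlines inside it) can be sketched in Python as well. This is a rough equivalent for illustration, not the awk answer's own code; flatten_quoted is a hypothetical name, and the whole input is assumed to fit in memory.

```python
import re

# A quoted string: a double quote, then any run of non-quotes or doubled
# quotes, then the closing double quote (the same pattern as the awk RS).
QUOTED = re.compile(r'"(?:[^"]|"")*"')

def flatten_quoted(text):
    """Replace every tab or newline inside a quoted string with a blank."""
    return QUOTED.sub(lambda m: re.sub(r"[\n\t]", " ", m.group(0)), text)

sample = 'col1\tcol2\tcol3\n1\t"A\tB"\t1234\n2\t"CD\nEF"\t567\n'
print(flatten_quoted(sample), end="")
```

Text outside the quotes is left untouched, so the real field and record separators survive and the quotes are kept.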
You absolutely don't need gawk to get this done. Here's one that works with mawk, gawk, or macOS nawk:
INPUT
--before--
col1 col2 col3
1 "A B" 1234
2 "CD
EF" 567
CODE
{m,n,g}awk '
BEGIN {
    __=substr((OFS=FS="\t\"")(FS)(ORS=_)\
         (RS = "^$"),_+=_^=_<_,_)
}
END {
    printbefore()
    for (_^=_<_; _<=NF; _++) {
        sub(/[\t-\r]+/, ($_~__)?" ":"&", $_)
    }
    print
}
function printbefore(_)
{
    printf("\n\n--before--\n%s\n------"\
       "------AFTER------\n\n", $+_)>("/dev/stderr")
}'
OUTPUT
------AFTER (using mawk)------
col1 col2 col3
1 "A B" 1234
2 "CD EF" 567
Strip out the printbefore() part, which is only there for debugging, and it's just:
{m,n,g}awk '
BEGIN { __=substr((OFS=FS="\t\"") FS \
(ORS=_) (RS="^$"),_+=_^=_<_,_)
} END {
for(--_;_<=NF;_++) {
sub(/[\t-\r]+/, $_~__?" ":"&",$_) } print }'
I'm trying to write a query in jq:
cat testfile.txt | jq 'fromjson | select(.kubernetes.pod.memory.usage.bytes != null) .kubernetes.pod.memory.usage.bytes, ."@timestamp"'
My output is:
"2019-03-15T00:24:21.733Z"
"2019-03-15T00:25:10.169Z"
"2019-03-15T00:24:47.908Z"
105889792
"2019-03-15T00:25:04.446Z"
34557952
"2019-03-15T00:25:04.787Z"
How can I remove the excess timestamps?
For example, output only:
105889792
"2019-03-15T00:25:04.446Z"
34557952
"2019-03-15T00:25:04.787Z"
You just need to add a pipe after select :
cat testfile.txt | jq 'fromjson | select(.kubernetes.pod.memory.usage.bytes != null) | .kubernetes.pod.memory.usage.bytes, ."@timestamp"'
Here's a DRYer (as in "don't repeat yourself") solution:
.["@timestamp"] as $ts | .kubernetes.pod.memory.usage.bytes // empty | ., $ts
Note that this particular use of // assumes that you wish to treat null, false, and a missing key in the same way. If not, you can still use the same idea to stay DRY.
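To make the semantics of `// empty` concrete, here is a rough Python sketch of the same filter over already-parsed records. It is an illustration only: dig and usage_with_timestamp are hypothetical helpers, and the timestamp key name is an assumption that should be adjusted to match your data.

```python
TS_KEY = "@timestamp"  # assumed key name; change it to match your records

def dig(obj, *keys):
    """Follow nested dict keys, returning None as soon as one is missing."""
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj

def usage_with_timestamp(records):
    """Yield usage bytes followed by the timestamp, skipping null/false/missing."""
    for rec in records:
        usage = dig(rec, "kubernetes", "pod", "memory", "usage", "bytes")
        # jq's `//` falls through only on null and false, so 0 is kept.
        if usage is None or usage is False:
            continue
        yield usage
        yield rec.get(TS_KEY)

records = [
    {TS_KEY: "2019-03-15T00:24:21.733Z"},
    {TS_KEY: "2019-03-15T00:25:04.446Z",
     "kubernetes": {"pod": {"memory": {"usage": {"bytes": 105889792}}}}},
]
for value in usage_with_timestamp(records):
    print(value)
```

If false should count as a real value, replace the test with `usage is None`, which corresponds to the `select(... != null)` form.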
I have hundreds of CSV files. Each CSV file is similar to this:
| KEYWORD | NUMBER OF COMPS | AVGE M E (K) | GS/M | EST. A SE/M | C CORE |
|---------|-----------------|--------------|------|-------------|--------|
| Apples | 311 | 12 | N/A | <100 | 10 |
| Bananas | >1,200 | 737 | N/A | 490 | 88 |
| Oranges | 48 | 184 | N/A | N/A | 1 |
| Fruits | 161 | 94 | N/A | - | 6 |
(I have posted this in table format, to make it more readable, but the CSV data is at the bottom of this post).
All the CSV files have the same header row. Only the data is different.
I would like to do the following:
Merge all the CSV files together, but only have 1 header row.
Omit any rows where EST. A SE/M (Column 5) contains any of the following data: <100, N/A or -
Notes about the Data
Sometimes some or even all cells in the CSV file are wrapped in quotation marks.
Other times they are not.
Sometimes the first column (keyword) may contain multiple words or accented characters.
My code so far
This code merges all the CSV files into one with only one header row:
awk '(NR == 1) || (FNR > 1)' *.csv > ^0-output.csv
This works perfectly.
However, I am not sure how to delete the unwanted rows after the merge.
So far I have this:
awk '$5 !~ /(<100|N\/A|-)/' ^0-output.csv > ^0-output.csv
But when I use this code, it just produces a blank file.
Plus, I am not sure if there is a way to integrate it in the first line, so it does everything with a single command.
Notes
Here is how the data looks in CSV format
Sample1.csv
KEYWORD,NUMBER OF COMPS,AVGE M E (K),GS/M,EST. A SE/M,C CORE
Apples,311,12,N/A,<100,10
Bananas,">1,200",737,N/A,490,88
Oranges,48,184,N/A,N/A,1
Fruits,161,94,N/A,-,63
Sample2.csv
KEYWORD,NUMBER OF COMPS,AVGE M E (K),GS/M,EST. A SE/M,C CORE
Dino,588,67,N/A,888,234
Thunder,">1,200",211,N/A,<100,77
Ninja,95,37,N/A,-,878
Sample3.csv
KEYWORD,NUMBER OF COMPS,AVGE M E (K),GS/M,EST. A SE/M,C CORE
Blur,84,2454,N/A,-,234
Sample4.csv
"KEYWORD","NUMBER OF COMPS","AVGE M E (K)","GS/M","EST. A SE/M","C CORE"
"hedgehog rolls ròund",32,481,N/A,"878",13
"Clever Fox jumps Hîgh",233,83,N/A,"<100",12
"Bear à lot",122,35,N/A,"-",11
"kitten hîgh life","121","673","32","N/A","15"
Please note: The actual files that the finished script will be used on will have a variety of file names. They will NOT always follow the pattern of sample 1, sample 2 etc.
Expected Output
Expected output: (CSV format)
KEYWORD,NUMBER OF COMPS,AVGE M E (K),GS/M,EST. A SE/M,C CORE
Bananas,">1,200",737,N/A,490,88
Dino,588,67,N/A,888,234
"hedgehog rolls ròund",32,481,N/A,"878",13
(Note: It doesn't matter if the expected output keeps the wrapping quote marks as the final CSV file is opened in Apple Numbers)
Expected output: (Readable format)
| KEYWORD | NUMBER OF COMPS | AVGE M E (K) | GS/M | EST. A SE/M | C CORE |
|---------|-----------------|--------------|------|-------------|--------|
| Bananas | >1,200 | 737 | N/A | 490 | 88 |
| Dino | 588 | 67 | N/A | 888 | 234 |
| hedgehog rolls ròund | 32 | 481 | N/A | 878 | 13 |
Environment:
I am using Mac OS X 10.14.6. I am unable to install other versions of awk.
You can merge the two conditions into one using &&:
awk -F, 'NR==1 || (FNR>1 && $5 !~ /^(<100|N\/A|-)$/)' *.csv > output.csv
Here $5 !~ /^(<100|N\/A|-)$/ skips a row if $5 is exactly <100, - or N/A. The regex anchors ^ and $ are important to avoid matching unwanted strings such as <1000 or AB-123.
It seems you also have a comma inside double quotes in your files. In that case the following GNU awk command should work for you; the optional "? on each side also matches values that are wrapped in quotes:
awk -v FPAT='"[^"]*"|[^,]*' '
NR == 1 || (FNR > 1 && $5 !~ /^"?(<100|N\/A|-)"?$/)' *.csv > output.csv
EDIT: As per the OP's comments, there could be a comma between the double quotes too, so it's better to use FPAT to handle that. Written and tested with GNU awk.
awk -v FPAT='[^,]*|"[^"]+"' '
{ sub(/\r$/,"") }
FNR==1{
if(NR==1){ print }
next
}
$5=="<100"||$5=="N/A"||$5=="-"{
next
}
1
' *.csv
Could you please try the following? It was written and tested with GNU awk on the shown samples only.
awk '
BEGIN{
FS=OFS=","
}
FNR==1{
if(NR==1){ print }
next
}
$5=="<100"||$5=="N/A"||$5=="-"{ next }
1
' *.csv
Or, in case your values can contain something else as well and you want to use a regex to match the values you want to skip, try the following.
awk '
BEGIN{
FS=OFS=","
}
FNR==1{
if(NR==1){ print }
next
}
$5~/<100/ || $5~/N\/A/ || $5~/-/{ next }
1
' *.csv
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of this program from here.
FS=OFS="," ##Setting field separator as comma here.
}
FNR==1{ ##Checking condition if its first line of current Input_file then do following.
if(NR==1){ print } ##If its very first line of very first Input_file then print that line.
next ##next will skip all further statements from here.
}
$5=="<100"||$5=="N/A"||$5=="-"{ next } ##Checking condition if 5th field contains either <100 OR N/A OR - then skip all further statements.
1 ##awk'sh way to print the current line.
' *.csv ##Passing all .csv files to awk program from here.
It looks to me like you only need to test the second-to-last field, and neither it nor the last field can contain commas. So count fields from the end of each line instead of from the beginning; then it doesn't matter whether earlier fields contain commas. Given that, this will work using any awk:
$ awk -F',' '(NR==1) || (FNR>1 && $(NF-1)!~/^"?(<100|N\/A|-)"?$/)' *.csv
KEYWORD,NUMBER OF COMPS,AVGE M E (K),GS/M,EST. A SE/M,C CORE
Bananas,">1,200",737,N/A,490,88
Dino,588,67,N/A,888,234
"hedgehog rolls ròund",32,481,N/A,"878",13
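For comparison, the merge-and-filter logic can also be sketched in Python, where the standard csv module takes care of quoted commas and of values that may or may not be wrapped in quotes. This is a sketch under assumptions, not one of the awk answers: merge_filtered is a hypothetical helper, and column 5 is addressed by its zero-based index 4.

```python
import csv
import io

UNWANTED = {"<100", "N/A", "-"}

def merge_filtered(sources, out, col=4):
    """Write one header row, then every data row whose col-th value is wanted."""
    writer = csv.writer(out, lineterminator="\n")
    header_written = False
    for src in sources:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:          # empty file: nothing to merge
            continue
        if not header_written:      # keep only the first file's header
            writer.writerow(header)
            header_written = True
        for row in reader:
            if len(row) > col and row[col].strip() in UNWANTED:
                continue
            writer.writerow(row)

# Two in-memory "files" built from the question's samples.
sample1 = ('KEYWORD,NUMBER OF COMPS,AVGE M E (K),GS/M,EST. A SE/M,C CORE\n'
           'Apples,311,12,N/A,<100,10\n'
           'Bananas,">1,200",737,N/A,490,88\n')
sample4 = ('"KEYWORD","NUMBER OF COMPS","AVGE M E (K)","GS/M","EST. A SE/M","C CORE"\n'
           '"hedgehog rolls ròund",32,481,N/A,"878",13\n'
           '"Clever Fox jumps Hîgh",233,83,N/A,"<100",12\n')
out = io.StringIO()
merge_filtered([io.StringIO(sample1), io.StringIO(sample4)], out)
print(out.getvalue(), end="")
```

With real files, each source would be opened with `open(path, newline="", encoding="utf-8")`, and the output file should be kept out of the input glob.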
I have some data in my DB and I'm printing it to the screen vertically, but I want to print it in three columns. They can be in a div, a table, etc.
For two columns I found a solution:
If the current row number is even, print to location A,
else print to location B.
But how can we do that if we want to print to more than two columns?
For example, the output should look like this:
Name1 | Name2 | Name3
Name4 | Name5 | Name6
..... | ..... | .....
This way is much better than the previous one I suggested. Took me a while to find it, but I knew I had done it somewhere!
echo '<table><tr>';
$i = 0; // number of cells printed so far
foreach ($data as $el) {
    // start a new row before every 4th, 7th, ... cell
    if ($i > 0 && $i % 3 == 0) echo '</tr><tr>';
    echo '<td>' . $el . '</td>';
    $i++;
}
echo '</tr></table>';
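The underlying technique is just splitting the list into chunks of three and emitting one row per chunk. A language-neutral sketch of that in Python, for illustration only (to_table is a hypothetical name, and the values are HTML-escaped as a precaution):

```python
from html import escape

def to_table(items, columns=3):
    """Render items as an HTML table with `columns` cells per row."""
    # Slice the list into consecutive chunks of `columns` items.
    rows = [items[i:i + columns] for i in range(0, len(items), columns)]
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(x))}</td>" for x in row) + "</tr>"
        for row in rows
    )
    return f"<table>{body}</table>"

print(to_table(["Name1", "Name2", "Name3", "Name4"]))
```

Changing `columns` gives any other layout, and the last row is simply shorter when the item count is not a multiple of it.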
I have some results in a MySQL table that I would like to export. I am currently able to click a download link and download an xls, but I would like to run this via a cron job and have the weekly results emailed to me.
I have looked at doing this from MySQL and saving the result directly as a CSV.
However, I am struggling with the SQL. The table format is as follows:
btFormQuestions (some columns omitted)
+-------+---------------+----------+-----------+
| msqID | questionSetId | Question | InputType |
+-------+---------------+----------+-----------+
| 1 | 123456 | Name | field |
| 2 | 123456 | Telephone| field |
| 3 | 123456 | Email | email |
| 4 | 123456 | Enquiry | test |
btFormAnswers
+-----+------+-------+-----------------+
| aID | asID | msqID | answer |
+-----+------+-------+-----------------+
| 1 | 1 | 1 | Sean |
| 2 | 1 | 2 | 0800 0 |
| 3 | 1 | 3 | se@te.com |
| 4 | 1 | 4 | Asking Question |
btFormAnswersSet
+------+---------------+---------------------+
| asID | questionSetId | created |
+------+---------------+---------------------+
| 1 | 123456 | 2013-04-30 11:07:55 |
The SQL queries I am currently using to get the information into a PHP array are as follows:
//get answers sets
$sql='SELECT * FROM btFormAnswerSet AS aSet '.
'WHERE aSet.questionSetId='.$questionSet.' ORDER BY created DESC LIMIT 0, 100';
$answerSetsRS=$db->query($sql);
//load answers into a nicer multi-dimensional array
$answerSets=array();
$answerSetIds=array(0);
while( $answer = $answerSetsRS->fetchRow() ){
//answer set id - question id
$answerSets[$answer['asID']]=$answer;
$answerSetIds[]=$answer['asID'];
}
//get answers
$sql='SELECT * FROM btFormAnswers AS a WHERE a.asID IN ('.join(',',$answerSetIds).')';
$answersRS=$db->query($sql);
//load answers into a nicer multi-dimensional array
while( $answer = $answersRS->fetchRow() ){
//answer set id - question id
$answerSets[$answer['asID']]['answers'][$answer['msqID']]=$answer;
}
return $answerSets;
I would like to be able to do one of the following:
A.) Move all of this into one query that returns the following sort of result:
+---------------+------+-----------+-----------+-----------------+
| QuestionSetID | Name | Telephone | Email | Enquiry |
+---------------+------+-----------+-----------+-----------------+
| 123456 | Sean | 0800 0 | se@te.com | Asking Question |
(I did try this with various joins but could not get them quite right.)
If I could get this to work, I would not mind saving the result as a CSV.
B.) Output the returned array as an Excel file that can be saved to a location on the server.
The current code creates an HTML table from the array.
The code is a little long, so I am only pasting the top and bottom bits here:
//fwrite($handle, $excelHead);
//fwrite($handle, $row);
//fflush($handle);
ob_start();
header("Content-Type: application/vnd.ms-excel");
echo "<table>\r\n";
//Question headers go here
foreach($answerSets as $answerSetId=>$answerSet){
$questionNumber=0;
$numQuestionsToShow=2;
echo "\t<tr>\r\n";
echo "\t\t<td>". $dateHelper->getSystemDateTime($answerSet['created'])."</td>\r\n";
foreach($questions as $questionId=>$question){
$questionNumber++;
if ($question['inputType'] == 'checkboxlist'){
$options = explode('%%', $question['options']);
$subanswers = explode(',', $answerSet['answers'][$questionId]['answer']);
for ($i = 1; $i <= count($options); $i++)
{
echo "\t\t<td align='center'>\r\n";
if (in_array(trim($options[$i-1]), $subanswers)) {
// echo "\t\t\t".$options[$i-1]."\r\n";
echo "x";
} else {
echo "\t\t\t \r\n";
}
echo "\t\t</td>\r\n";
//fwrite($handle, $node);
//fflush($handle);
}
}elseif($question['inputType']=='fileupload'){
echo "\t\t<td>\r\n";
$fID=intval($answerSet['answers'][$questionId]['answer']);
$file=File::getByID($fID);
if($fID && $file){
$fileVersion=$file->getApprovedVersion();
echo "\t\t\t".''.$fileVersion->getFileName().''."\r\n";
}else{
echo "\t\t\t".t('File not found')."\r\n";
}
echo "\t\t</td>\r\n";
}else{
echo "\t\t<td>\r\n";
echo "\t\t\t".$answerSet['answers'][$questionId]['answer'].$answerSet['answers'][$questionId]['answerLong']."\r\n";
echo "\t\t</td>\r\n";
}
//fwrite($handle, $node);
//fflush($handle);
}
echo "\t</tr>\r\n";
//fwrite($handle, $row);
//fflush($handle);
}
echo "</table>\r\n";
//fwrite($handle, $excelFoot);
//fflush($handle);
//fclose($handle);
file_put_contents($filePath, ob_get_clean());
I can get the file to save to the directory, but I am having issues setting it as an Excel file. I have also tried using fwrite (instead of the output buffer), with similar results.
Can anyone help, or point me in the right direction?
Thank you,
Sean
I would do this from within concrete5. That way you get all the form-results-related models, plus the various helpers (like email).
For more info about jobs, see http://www.concrete5.org/documentation/developers/system/jobs/ . To run from a cron job, see http://www.concrete5.org/documentation/how-tos/developers/how-to-run-certain-jobs-via-cron/ .
It looks like you've got the code to generate the answers, and put it into an array, but you might want to look at something like https://github.com/concrete5/concrete5/blob/master/web/concrete/core/controllers/blocks/form_statistics.php#L32 . I'm not positive that's exactly what you need, but I do know that the dashboard page builds that answers table for you, so the code clearly exists somewhere.
To create an Excel file, elsewhere c5 uses the "put it into a table and call it .xls" method, which works with Excel and OpenOffice. I'm not sure exactly what you mean by "having issues setting it as Excel", but it sounds like this is your issue at the moment. If something is getting saved to the file, then you should post the file contents and you/we can work backwards as to what is causing the issue. It's probably just misformatted HTML or something.
Finally, to send the email, you can use the Mail Helper, but that doesn't currently allow for attachments (there's a pull request on GitHub that does, and you could use it to override the mail helper). Typically, the "best practice" would be to send it as a link.
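If you still want the single-query result from option A in the question, a common pattern is conditional aggregation: join the three tables and pick each question's answer with MAX(CASE ...). The sketch below only demonstrates the shape of such a query. It runs against an in-memory SQLite database from Python rather than MySQL, the table name btFormAnswerSet is taken from the question's PHP code, and the question labels are hard-coded, which is an assumption about your form.

```python
import sqlite3

PIVOT_SQL = """
SELECT s.questionSetId AS QuestionSetID,
       MAX(CASE WHEN q.question = 'Name'      THEN a.answer END) AS Name,
       MAX(CASE WHEN q.question = 'Telephone' THEN a.answer END) AS Telephone,
       MAX(CASE WHEN q.question = 'Email'     THEN a.answer END) AS Email,
       MAX(CASE WHEN q.question = 'Enquiry'   THEN a.answer END) AS Enquiry
FROM btFormAnswerSet AS s
JOIN btFormAnswers   AS a ON a.asID  = s.asID
JOIN btFormQuestions AS q ON q.msqID = a.msqID
GROUP BY s.asID, s.questionSetId, s.created
ORDER BY s.created DESC
"""

def demo():
    """Load the question's sample rows and return the pivoted result."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE btFormQuestions (msqID INTEGER, questionSetId INTEGER,
                                      question TEXT, inputType TEXT);
        CREATE TABLE btFormAnswers (aID INTEGER, asID INTEGER,
                                    msqID INTEGER, answer TEXT);
        CREATE TABLE btFormAnswerSet (asID INTEGER, questionSetId INTEGER,
                                      created TEXT);
        INSERT INTO btFormQuestions VALUES
            (1, 123456, 'Name', 'field'), (2, 123456, 'Telephone', 'field'),
            (3, 123456, 'Email', 'email'), (4, 123456, 'Enquiry', 'text');
        INSERT INTO btFormAnswers VALUES
            (1, 1, 1, 'Sean'), (2, 1, 2, '0800 0'),
            (3, 1, 3, 'se@te.com'), (4, 1, 4, 'Asking Question');
        INSERT INTO btFormAnswerSet VALUES (1, 123456, '2013-04-30 11:07:55');
    """)
    return db.execute(PIVOT_SQL).fetchall()

print(demo())
```

Each answer set becomes one row, so the result could be written straight to a CSV; the query has not been run against MySQL or a real concrete5 schema here.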