psql to csv file '-' becomes '-0'


I want to output the results from a psql query to a csv file. I have used the following approach
\o test.csv
SELECT myo_date, myo_maps_study, cbp_lvef, cbp_rvef, myx_ecg_posneg, myx_st, std_drugs, std_reason_comment FROM myo INNER JOIN studies ON (myo_std_uid = std_uid) LEFT OUTER JOIN cbp on (std_uid = cbp_std_uid) LEFT OUTER JOIN myx on (std_uid = myx_std_uid) WHERE myo_maps_study ~ 'MYO[0-9]*\$' AND std_reason_comment ~ 'AF' AND cbp_lvef is not null AND myx_st IS NOT NULL AND std_drugs IS NOT NULL ORDER by myo_date DESC LIMIT 500;
\q
The result of the query on its own is as follows
06/11/2013 | MYO134537 | 36.75000 | 29.00000 | - | 0.0 | ASPIRIN;BISOPROLOL;LISINOPRIL;METFORMIN;PPI;STATIN;FLUOXETINE;AMLODIPINE;GTN | CPOE;AF;T2DM;POSET
31/10/2013 | MYO130555 | 45.00000 | 36.25000 | - | 0.0 | DILTIAZEM;STATIN;LISINOPRIL;ASPIRIN;FRUSEMIDE;SALBUTAMOL;PARACETAMOL;AMOXICILLIN | TROP-VE; CP; AF; CTPA-VE; ANT T; INV; RF
23/10/2013 | MYO130538 | 18.75000 | 18.50000 | + | -1.0 | ASPIRIN;BISOPROLOL;RAMIPRIL | AF;MR;QLVFN;FAILED CARDIOVERSION
18/10/2013 | MYO134510 | 39.50000 | 32.25000 | - | 0.0 | ASPIRIN;STATIN;CO-CODAMOL;BISOPROLOL;GTN;PPI | PVD;AF
18/10/2013 | MYO130537 | 19.00000 | 18.00000 | - | 0.0 | STATIN;RAMIPRIL;AMLODIPINE;WARFARIN;(METOPROLOL-STOPPED FOR TEST) | TIA;AF;RF+++;ETINAP
However, the csv file (opened in OpenOffice) looks like this
06/11/2013 MYO134537 36.75 29 -0 0 ASPIRIN;BISOPROLOL;LISINOPRIL;METFORMIN;PPI;STATIN;FLUOXETINE;AMLODIPINE;GTN CPOE;AF;T2DM;POSET
31/10/2013 MYO130555 45 36.25 -0 0 DILTIAZEM;STATIN;LISINOPRIL;ASPIRIN;FRUSEMIDE;SALBUTAMOL;PARACETAMOL;AMOXICILLIN TROP-VE; CP; AF; CTPA-VE; ANT T; INV; RF
23/10/2013 MYO130538 18.75 18.5 0 -1 ASPIRIN;BISOPROLOL;RAMIPRIL AF;MR;QLVFN;FAILED CARDIOVERSION
18/10/2013 MYO134510 39.5 32.25 -0 0 ASPIRIN;STATIN;CO-CODAMOL;BISOPROLOL;GTN;PPI PVD;AF
18/10/2013 MYO130537 19 18 -0 0 STATIN;RAMIPRIL;AMLODIPINE;WARFARIN;(METOPROLOL-STOPPED FOR TEST) TIA;AF;RF+++;ETINAP
The '-' signs have become -0 and '+' have become 0. For clarity, I would like to change these to N and P respectively.
Doing a more test.csv gives
06/11/2013,MYO134537,36.75,29,-0,0,ASPIRIN;BISOPROLOL;LISINOPRIL;METFORMIN;PPI;STATIN;FLUOXETINE;AMLODIPINE;GTN,CPOE;AF;T2DM;POSET,,
31/10/2013,MYO130555,45,36.25,-0,0,DILTIAZEM;STATIN;LISINOPRIL;ASPIRIN;FRUSEMIDE;SALBUTAMOL;PARACETAMOL;AMOXICILLIN,TROP-VE; CP; AF; CTPA-VE; ANT T; INV; RF,,
23/10/2013,MYO130538,18.75,18.5,0,-1,ASPIRIN;BISOPROLOL;RAMIPRIL,AF;MR;QLVFN;FAILED CARDIOVERSION,,
18/10/2013,MYO134510,39.5,32.25,-0,0,ASPIRIN;STATIN;CO-CODAMOL;BISOPROLOL;GTN;PPI,PVD;AF,,
18/10/2013,MYO130537,19,18,-0,0,STATIN;RAMIPRIL;AMLODIPINE;WARFARIN;(METOPROLOL-STOPPED FOR TEST),TIA;AF;RF+++;ETINAP,,
However, when I select the cell in OpenOffice, the content of the -0 or 0 cells is always 0. This does not allow me to do a search and replace. I do not want to change these manually.
Can I force the + and - through using a psql command, or can I use some other Linux tool to change the -0 to N and 0 to P? I am using RHEL6.

Try using the decode function in place of the field name.
decode(myx_ecg_posneg,'-','N','+','P')
Update: Sorry, that's pl/sql. Try the case expression:
CASE myx_ecg_posneg
WHEN '-' THEN 'N'
WHEN '+' THEN 'P'
END
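If the rows are post-processed outside the database instead, the same mapping can be sketched in Python. This is only an illustration and not part of the original answer: the function name `recode_posneg` and the column index are assumptions, and unlike the CASE expression (which yields NULL for any other value) this sketch passes other values through unchanged.

```python
import csv
import io

# Same mapping as the CASE expression: '-' -> 'N', '+' -> 'P'.
POSNEG = {"-": "N", "+": "P"}

def recode_posneg(rows, column):
    """Yield copies of the rows with one column recoded; other values pass through."""
    for row in rows:
        row = list(row)
        row[column] = POSNEG.get(row[column], row[column])
        yield row

# Hypothetical usage with two shortened rows taken from the query output.
rows = [["06/11/2013", "MYO134537", "-"], ["23/10/2013", "MYO130538", "+"]]
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(recode_posneg(rows, 2))
print(buf.getvalue(), end="")
```

Writing letters rather than bare signs also keeps a spreadsheet from interpreting the column as numeric.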

Related

Find and replace '000L' from all rows

I'm trying to remove certain letters from part numbers but I'm having difficulties trying to get it working correctly.
This is where I'm at right now. It's non functional.
SELECT REPLACE(`part`, '[0-9]L', '') FROM `table` WHERE (`part` LIKE '%[0-9]L')
Essentially say I have these five items:
D39J02GEN
20F934L
2984CPL
29048L20GEN
1120934L
I only want 20F934L and 1120934L to be detected: the ones that end in L, and only if they have a number directly before the L.
Edit: this one gets close:
SELECT * FROM `table` WHERE `part` REGEXP '^[0-9].*L';
but still shows ones where there is anything after the L. This is also no closer to removing the letter L.

If you know the value is at the end, then do:
SELECT LEFT(part, LENGTH(part) - 2)
FROM `table`
WHERE part REGEXP '[0-9]L$';
This would be much trickier if the pattern were in the middle of the string.
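For checking the pattern outside the database, here is a minimal Python sketch of the same idea. The function name `strip_digit_l` is an assumption for illustration; like `LEFT(part, LENGTH(part) - 2)`, it drops the last two characters (the digit and the L), not just the L.

```python
import re

# A digit followed by 'L' at the very end of the string, as in REGEXP '[0-9]L$'.
ENDS_WITH_DIGIT_L = re.compile(r"[0-9]L$")

def strip_digit_l(parts):
    """Return (part, trimmed) pairs for the parts that end in digit + 'L'."""
    return [(p, p[:-2]) for p in parts if ENDS_WITH_DIGIT_L.search(p)]

# The five part numbers from the question.
parts = ["D39J02GEN", "20F934L", "2984CPL", "29048L20GEN", "1120934L"]
for original, trimmed in strip_digit_l(parts):
    print(original, "->", trimmed)
```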

Something like this should also work if the match is always required at the end of the text.
Query
SELECT
*
FROM
t
WHERE
SUBSTRING(REVERSE(t.text_string), 1, 1) = 'L'
AND
SUBSTRING(REVERSE(t.text_string), 2) >> 0 <> 0
Result
| text_string |
| ----------- |
| 20F934L |
| 1120934L |
see demo
Note
SUBSTRING(REVERSE(t.text_string), 2) >> 0 basically means CAST(SUBSTRING(REVERSE(t.text_string), 2) AS UNSIGNED) here
Why does this work?
I use MySQL's loose autocasting feature, which can convert 439F02 into the INT 439 but cannot convert PC4892 into an INT; that value is converted into 0 instead.
See the below resultset based on the query
Query
SELECT
*
, SUBSTRING(REVERSE(t.text_string), 1, 1)
, SUBSTRING(REVERSE(t.text_string), 2)
, SUBSTRING(REVERSE(t.text_string), 2) >> 0
, SUBSTRING(REVERSE(t.text_string), 2) >> 0 <> 0
FROM
t
Result
| text_string | SUBSTRING(REVERSE(t.text_string), 1, 1) | SUBSTRING(REVERSE(t.text_string), 2) | SUBSTRING(REVERSE(t.text_string), 2) >> 0 | SUBSTRING(REVERSE(t.text_string), 2) >> 0 <> 0 |
| ----------- | --------------------------------------- | ------------------------------------ | ----------------------------------------- | ---------------------------------------------- |
| D39J02GEN | N | EG20J93D | 0 | 0 |
| 20F934L | L | 439F02 | 439 | 1 |
| 2984CPL | L | PC4892 | 0 | 0 |
| 29048L20GEN | N | EG02L84092 | 0 | 0 |
| 1120934L | L | 4390211 | 4390211 | 1 |
Here is a demo to see the above results for yourself.
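The reverse-and-cast trick can be emulated in Python to see why each row passes or fails. This is a simplified sketch, not MySQL's actual casting code: the helper `loose_unsigned` is a hypothetical stand-in that only keeps leading digits and ignores signs and whitespace.

```python
import re

def loose_unsigned(text):
    """Simplified stand-in for the implicit cast: leading digits, else 0."""
    match = re.match(r"[0-9]+", text)
    return int(match.group()) if match else 0

def ends_in_digit_then_l(value):
    """Apply the two conditions of the WHERE clause to one string."""
    reversed_value = value[::-1]
    # First reversed character must be 'L'; the remainder must cast to non-zero.
    return reversed_value[:1] == "L" and loose_unsigned(reversed_value[1:]) != 0

parts = ["D39J02GEN", "20F934L", "2984CPL", "29048L20GEN", "1120934L"]
print([p for p in parts if ends_in_digit_then_l(p)])
```

One consequence of testing for a non-zero cast: a value whose only digit before the L is 0 and is preceded by a letter (for example the made-up value A0L) casts to 0 and is not matched.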

CSV output file using command line for wireshark IO graph statistics

I save the IO graph statistics as CSV file containing the bits per second using the wireshark GUI. Is there a way to generate this CSV file with command line tshark? I can generate the statistics on command line as bytes per second as follows
tshark -nr test.pcap -q -z io,stat,1,BYTES
How do I generate bits/second and save it to a CSV file?
Any help is appreciated.

I don't know a way to do that using only tshark, but you can easily parse the output from tshark into a CSV file:
tshark -nr tmp.pcap -q -z io,stat,1,BYTES | grep -P "\d+\s+<>\s+\d+\s*\|\s+\d+" | awk -F '[ |]+' '{print $2","($5*8)}'
Explanations
grep -P "\d+\s+<>\s+\d+\s*\|\s+\d+" selects only the rows of the tshark output that contain actual data (i.e., second <> second | transmitted bytes).
awk -F '[ |]+' '{print $2","($5*8)}' splits each row into 5 fields with [ |]+ as the separator and prints field 2 (the second at which the interval starts) and field 5 (the transmitted bytes, multiplied by 8 to give bits) with a comma between them.
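The same filtering and conversion can be sketched in Python for cases where the report text has already been captured. The function name `bytes_to_bits_csv` and the sample report are assumptions; the sample numbers are made up and only imitate the row layout the grep pattern expects.

```python
import re

# Mirrors the grep pattern: "start <> end | bytes".
ROW = re.compile(r"(\d+)\s+<>\s+\d+\s*\|\s+(\d+)")

def bytes_to_bits_csv(report):
    """Return 'start,bits' lines for each data row of the report text."""
    lines = []
    for line in report.splitlines():
        match = ROW.search(line)
        if match:
            start, byte_count = match.groups()
            lines.append(f"{start},{int(byte_count) * 8}")
    return lines

# Hypothetical report fragment; the values are for illustration only.
sample = """\
| Interval | Bytes |
|   0 <>   1 |   1500 |
|   1 <>   2 |    250 |
"""
print("\n".join(bytes_to_bits_csv(sample)))
```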

Another thing that may be good to know:
If you change the interval from 1 second to 0.5 seconds, you have to allow a decimal point in the grep pattern by adding \. between two groups of digits (\d).
Otherwise the result will be an empty *.csv file.
grep -P "\d{1,2}\.{1}\d{1,2}\s+<>\s+\d{1,2}\.{1}\d{1,2}\s*\|\s+\d+"
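A pattern that accepts both whole and fractional interval bounds avoids having to switch expressions when the interval changes. The following Python sketch shows one way to write it; `parse_row` is a hypothetical helper and the sample rows in the usage are made up.

```python
import re

# Optional fractional part on both interval bounds, then "| bytes".
ROW = re.compile(r"(\d+(?:\.\d+)?)\s+<>\s+(\d+(?:\.\d+)?)\s*\|\s+(\d+)")

def parse_row(line):
    """Return (start, end, bytes) for a data row, or None for other lines."""
    match = ROW.search(line)
    if match is None:
        return None
    start, end, byte_count = match.groups()
    return float(start), float(end), int(byte_count)

print(parse_row("| 0.5 <> 1.0 |   640 |"))
print(parse_row("|   0 <>   1 |  1500 |"))
```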

The answers in this thread gave me the keys to solving a similar problem with tshark io stats and I wanted to share the results and how it works. In my case, the task was to convert multiple columns of tshark io stat records with potential decimals in the data. This answer converts multiple data columns to csv, adds rudimentary headers, accounts for decimals in fields and variable numbers of spaces.
Complete command string
tshark -r capture.pcapng -q -z io,stat,30,,FRAMES,BYTES,"FRAMES()ip.src == 10.10.10.10","BYTES()ip.src == 10.10.10.10","FRAMES()ip.dst == 10.10.10.10","BYTES()ip.dst == 10.10.10.10" \
| grep -P "\d+\.?\d*\s+<>\s+|Interval +\|" \
| tr -d " " | tr "|" "," | sed -E 's/<>/,/; s/(^,|,$)//g; s/Interval/Start,Stop/g' > somefile.csv
Explanation
The command string has 3 major parts.
tshark creates the report with the data in columns
Extract the desired lines with grep
Use tr and sed to convert the records grep matched into a csv delimited file.
Part 1: tshark creates the report with the data in columns
tshark is run with -z io,stat at a 30 second interval, counting frames and bytes with various filters.
tshark -r capture.pcapng -q -z io,stat,30,,FRAMES,BYTES,"FRAMES()ip.src == 10.10.10.10","BYTES()ip.src == 10.10.10.10","FRAMES()ip.dst == 10.10.10.10","BYTES()ip.dst == 10.10.10.10"
Here is the output when run against my test pcap file:
=================================================================================================
| IO Statistics |
| |
| Duration: 179.179180 secs |
| Interval: 30 secs |
| |
| Col 1: Frames and bytes |
| 2: FRAMES |
| 3: BYTES |
| 4: FRAMES()ip.src == 10.10.10.10 |
| 5: BYTES()ip.src == 10.10.10.10 |
| 6: FRAMES()ip.dst == 10.10.10.10 |
| 7: BYTES()ip.dst == 10.10.10.10 |
|-----------------------------------------------------------------------------------------------|
| |1 |2 |3 |4 |5 |6 |7 |
| Interval | Frames | Bytes | FRAMES | BYTES | FRAMES | BYTES | FRAMES | BYTES |
|-----------------------------------------------------------------------------------------------|
| 0 <> 30 | 107813 | 120111352 | 107813 | 120111352 | 26682 | 15294257 | 80994 | 104808983 |
| 30 <> 60 | 122437 | 124508575 | 122437 | 124508575 | 49331 | 17080888 | 73017 | 107422509 |
| 60 <> 90 | 138999 | 135488315 | 138999 | 135488315 | 54829 | 22130920 | 84029 | 113348686 |
| 90 <> 120 | 158241 | 217781653 | 158241 | 217781653 | 42103 | 15870237 | 115971 | 201901201 |
| 120 <> 150 | 111708 | 131890800 | 111708 | 131890800 | 43709 | 18800647 | 67871 | 113082296 |
| 150 <> Dur | 123736 | 142639416 | 123736 | 142639416 | 50754 | 22053280 | 72786 | 120574520 |
=================================================================================================
Considerations
Looking at this output, we can see several items to consider:
Rows with data have a unique sequence in the Interval column of "space<>space", which we can use for matching.
We want the header line, so we will use the word "Interval" followed by spaces and then a "|" character.
The number of spaces in a column is variable, depending on the number of digits per measurement.
The Interval column gives both the start and the end of each interval. Either can be used, so we will keep both and let the user decide.
When using milliseconds, there will be decimals in the Interval field.
Depending on the statistic requested, there may be decimals in the data columns
The use of "|" as delimiters will require escaping in any regex statement that covers them.
Part 2: Extract the desired lines with grep
Once tshark produces output, we use grep with regex to extract the lines we want to save.
grep -P "\d+\.?\d*\s+<>\s+|Interval +\|"
grep will use the "Digit(s)Space(s)<>Space(s)" character sequence in the Interval column to match the lines with data. It also uses an OR to grab the header by matching the characters "Interval |".
grep -P # The "-P" flag turns on PCRE regex matching, which is not the same as egrep. With egrep, you will need to change the escaping.
"\d+ # Match on 1 or more Digits. This is the 1st set of numbers in the Interval column.
\.? # 0 or 1 Periods. We need this to handle possible fractional seconds.
\d* # 0 or more Digits. To handle possible fractional seconds.
\s+<>\s+ # 1 or more Spaces followed by the Characters "<>", then 1 or more Spaces.
| # Since this is not escaped, it is a regex OR
Interval +\|" # Match the String "Interval" followed by 1 or more Spaces and a literal "|".
From the tshark output, grep matched these lines:
| Interval | Frames | Bytes | FRAMES | BYTES | FRAMES | BYTES | FRAMES | BYTES |
| 0 <> 30 | 107813 | 120111352 | 107813 | 120111352 | 26682 | 15294257 | 80994 | 104808983 |
| 30 <> 60 | 122437 | 124508575 | 122437 | 124508575 | 49331 | 17080888 | 73017 | 107422509 |
| 60 <> 90 | 138999 | 135488315 | 138999 | 135488315 | 54829 | 22130920 | 84029 | 113348686 |
| 90 <> 120 | 158241 | 217781653 | 158241 | 217781653 | 42103 | 15870237 | 115971 | 201901201 |
| 120 <> 150 | 111708 | 131890800 | 111708 | 131890800 | 43709 | 18800647 | 67871 | 113082296 |
| 150 <> Dur | 123736 | 142639416 | 123736 | 142639416 | 50754 | 22053280 | 72786 | 120574520 |
Part 3: Use tr and sed to convert the records grep matched into a csv delimited file.
tr and sed are used to convert the lines grep matched into csv. tr does the bulk of the work, removing spaces and changing the "|" to ",". This is simpler and faster than using sed. However, sed is used for some cleanup work.
tr -d " " | tr "|" "," | sed -E 's/<>/,/; s/(^,|,$)//g; s/Interval/Start,Stop/g'
Here is how these commands perform the conversion. The first trick is to get rid of all of the spaces. This means we don't have to account for them in any regex sequences, making the rest of the work simpler.
| tr -d " " # Spaces are in the way, so delete them.
| tr "|" "," # Change all "|" Characters to ",".
| sed -E 's/<>/,/; # Change "<>" to "," splitting the Interval column.
s/(^,|,$)//g; # Delete leading and/or trailing "," on each line.
s/Interval/Start,Stop/g' # The Interval column was split in two, so it needs two headers: change the text "Interval" into two words with a , separating them.
> somefile.csv # Redirect the output into somefile.csv
Final result
Once through this process, we have a csv output that can now be imported into your favorite csv tool, spreadsheet, or fed to a graphing program like gnuplot.
$ cat somefile.csv
Start,Stop,Frames,Bytes,FRAMES,BYTES,FRAMES,BYTES,FRAMES,BYTES
0,30,107813,120111352,107813,120111352,26682,15294257,80994,104808983
30,60,122437,124508575,122437,124508575,49331,17080888,73017,107422509
60,90,138999,135488315,138999,135488315,54829,22130920,84029,113348686
90,120,158241,217781653,158241,217781653,42103,15870237,115971,201901201
120,150,111708,131890800,111708,131890800,43709,18800647,67871,113082296
150,Dur,123736,142639416,123736,142639416,50754,22053280,72786,120574520
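For readers who prefer to do the conversion in a script, the grep, tr and sed steps map onto a few lines of Python. This is a sketch under the assumption that the tshark report has already been captured as text; the function name `io_stat_to_csv` is made up for illustration.

```python
import re

# Same selection as the grep step: data rows or the header row.
WANTED = re.compile(r"\d+\.?\d*\s+<>\s+|Interval +\|")

def io_stat_to_csv(report):
    """Convert the matching report lines into csv lines."""
    csv_lines = []
    for line in report.splitlines():
        if not WANTED.search(line):
            continue
        line = line.replace(" ", "")                   # tr -d " "
        line = line.replace("|", ",")                  # tr "|" ","
        line = line.replace("<>", ",", 1)              # s/<>/,/
        line = re.sub(r"^,|,$", "", line)              # s/(^,|,$)//g
        line = line.replace("Interval", "Start,Stop")  # s/Interval/Start,Stop/g
        csv_lines.append(line)
    return csv_lines

# Header and first data row copied from the report shown earlier.
report = """\
| Interval | Frames | Bytes | FRAMES | BYTES | FRAMES | BYTES | FRAMES | BYTES |
| 0 <> 30 | 107813 | 120111352 | 107813 | 120111352 | 26682 | 15294257 | 80994 | 104808983 |
"""
print("\n".join(io_stat_to_csv(report)))
```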

bash - extract data from mysql table (GROUP BY)- how to process

I have mySQL table:
+----+---------------------+-------+
| id | timestamp | value |
+----+---------------------+-------+
| 1 | 2016-03-29 18:53:28 | 1 |
| 2 | 2016-03-29 20:26:06 | 1 |
| 3 | 2016-03-29 20:26:22 | 1 |
+----+---------------------+-------+
3 rows in set (0.00 sec)
It is a table to hold water consumption data (each 1 in value is a 1 liter of water).
I wrote a bash script to extract data - sum of litres of water by months.
watersum=`echo " SELECT MONTHNAME(timestamp), SUM(value) FROM woda GROUP BY YEAR(timestamp), MONTH(timestamp);" | mysql -s -u$SQUSER -p$SQPASS -h$SQHOST $SQLDB`
echo $watersum
gives me:
March 693 April 9768 May 11277 June 11987 July 10047 August 8570
I would like to save this data in a json file. How do I convert the string in $watersum to a json string?

Make watersum an array
watersum=( $(echo " SELECT MONTHNAME(timestamp), SUM(value) FROM woda GROUP BY YEAR(timestamp), MONTH(timestamp);" | mysql -s -u$SQUSER -p$SQPASS -h$SQHOST $SQLDB) )
echo "{" && for((i=0;i<"${#watersum[@]}";i+=2))
do
echo -n "\"${watersum[$i]}\":\"${watersum[((i+1))]}\"";
(( (i+2) == "${#watersum[@]}" )) || echo ","
done && echo;echo "}"
Output
{
"March":"693",
"April":"9768",
"May":"11277",
"June":"11987",
"July":"10047",
"August":"8570"
}
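If Python is available, a JSON library takes care of the quoting and commas. The sketch below assumes the same whitespace-separated "month total" string as in the question; the function name `month_totals_to_json` is an assumption. Unlike the shell output, it emits the totals as numbers rather than strings, and because the keys are month names only, the same month from two different years would overwrite each other.

```python
import json

def month_totals_to_json(text):
    """Turn 'March 693 April 9768 ...' into a JSON object string."""
    tokens = text.split()
    # Tokens alternate: month name, total, month name, total, ...
    totals = {month: int(value) for month, value in zip(tokens[0::2], tokens[1::2])}
    return json.dumps(totals)

watersum = "March 693 April 9768 May 11277 June 11987 July 10047 August 8570"
print(month_totals_to_json(watersum))
```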

Loading CSV with NULLs columns using bq load

I am trying to upload a CSV file (actually TSV) generated in MySQL (using outfile) into BigQuery using the bq tool. This table has the following schema:
Here is the sample data file:
"6.02" "0000" "101" \N "Md Fiesta Chicken|1|6.69|M|300212|100|100^M Sourdough|1|0|M|51301|112|112" "6.5" \N "V03" "24270310376" "10/17/2014 3:34 PM" "6.02" "30103" "452" "302998" "2014-12-08 10:57:15" \N
And this is how I try to upload it using bq CLI tool:
$ bq load -F '\t' --quote '"' --allow_jagged_rows receipt_archive.receipts /tmp/rec.csv
BigQuery error in load operation: Error processing job
'circular-gist-812:bqjob_r8d0bbc3192b065_0000014ab097c63c_1': Too many errors encountered. Limit is: 0.
Failure details:
- File: 0 / Line:1 / Field:16: Could not parse '\N' as a timestamp.
Required format is YYYY-MM-DD HH:MM[:SS[.SSSSSS]]
I think the issue is that the updated_at column is NULL and hence skipped. Any idea how I can tell it to accept null/empty columns?

CuriousMind - This isn't an answer. Just an example of the problem of using floats instead of decimals...
CREATE TABLE fd (f FLOAT(5,2),d DECIMAL(5,2));
INSERT INTO fd VALUES (100.30,100.30),(100.70,100.70);
SELECT * FROM fd;
+--------+--------+
| f | d |
+--------+--------+
| 100.30 | 100.30 |
| 100.70 | 100.70 |
+--------+--------+
SELECT f/3+f/3+f/3,d/3+d/3+d/3 FROM fd;
+-------------+-------------+
| f/3+f/3+f/3 | d/3+d/3+d/3 |
+-------------+-------------+
| 100.300003 | 100.300000 |
| 100.699997 | 100.700000 |
+-------------+-------------+
SELECT (f/3)*3,(d/3)*3 FROM fd;
+------------+------------+
| (f/3)*3 | (d/3)*3 |
+------------+------------+
| 100.300003 | 100.300000 |
| 100.699997 | 100.700000 |
+------------+------------+
But why is this a problem, I hear you ask?
Well, consider the following...
SELECT * FROM fd WHERE f <= 100.699997;
+--------+--------+
| f | d |
+--------+--------+
| 100.30 | 100.30 |
| 100.70 | 100.70 |
+--------+--------+
...now surely that's not what would be expected when dealing with money?
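The same effect can be reproduced outside MySQL. The Python sketch below assumes the FLOAT column is stored as a 4-byte single-precision value and uses `struct` to round to that precision; the helper name `as_float32` is made up for illustration.

```python
import struct
from decimal import Decimal

def as_float32(value):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

f = as_float32(100.70)   # roughly what a single-precision column holds
d = Decimal("100.70")    # exact, like DECIMAL(5,2)

print(f <= 100.699997)             # True: the stored float is slightly below 100.70
print(d <= Decimal("100.699997"))  # False: the decimal is exactly 100.70
```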

To specify "null" in a CSV file, elide all data for the field. (It looks like you are using an unspecified escape syntax "\N".)
For example:
$ echo 2, > rows.csv
$ bq load tmp.test rows.csv a:integer,b:integer
$ bq head tmp.test
+---+------+
| a | b |
+---+------+
| 2 | NULL |
+---+------+
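One way to get from the MySQL export to files with empty fields is to rewrite the file before loading it. The Python sketch below is an assumption-laden illustration, not a bq feature: the function name `blank_out_nulls` is made up, the input is assumed to be tab-separated with double-quote quoting, and a genuine string value consisting of \N would be blanked as well.

```python
import csv
import io

def blank_out_nulls(tsv_text):
    """Replace fields that are exactly \\N with empty fields."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quotechar='"')
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", quotechar='"', lineterminator="\n")
    for row in reader:
        writer.writerow(["" if field == r"\N" else field for field in row])
    return out.getvalue()

# Shortened, hypothetical row in the style of the sample data.
sample = '"6.02"\t\\N\t"V03"\t\\N\n'
print(repr(blank_out_nulls(sample)))
```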

How to Set Variables and Process Variable for MySQL in a Perl Script

HERE IS MY TABLE EXAMPLE:
Id | Time | Predicted | Actual | High
----+------------+------------+----------+---------
1 | 01:00:00 | 100 | 100 | NULL
2 | 02:00:00 | 200 | 50 | NULL
3 | 03:00:00 | 150 | 100 | NULL
4 | 04:00:00 | 180 | 80 | NULL
I want to find highest value in Predicted and place it in the 'High' column (IN A SPECIFIC ROW)
========= USING THE FOLLOWING SYNTAX I AM ABLE TO ACHIEVE THIS MANUALLY IN SQL WITH THE FOLLOWING:
SET @peak=(SELECT MAX(Predicted) FROM table);
UPDATE table SET Peak=@peak WHERE Id='1';
Id | Time | Predicted | Actual | High
----+------------+------------+-----------+---------
1 | 01:00:00 | 100 | 100 | 200
2 | 02:00:00 | 200 | 50 | NULL
3 | 03:00:00 | 150 | 100 | NULL
4 | 04:00:00 | 180 | 80 | NULL
=======================================
However, when I attempt to use the above syntax in a Perl script, it fails because of the '@' (or any other variable symbol). Below is the Perl code I tried in order to work around the variable issue, without success. It also fails when I pass @peak to execute() and use a ? placeholder in the prepared statement:
my $Id_var= '1';
my $sth = $dbh->prepare( 'set @peak = (SELECT MAX(Predicted) FROM table)' );
my $sti = $dbh->prepare ( "UPDATE table SET Peak = @peak WHERE Id = ? " );
$sth->execute;
$sth->finish();
$sti->execute('$Id_var');
$sti->finish();
$dbh->commit or die $DBI::errstr;
With the following error:
Global symbol "@peak" requires explicit package name
I would appreciate any help to get this working within my Perl script.

You need to escape the @ symbol (which denotes an array variable) or use single quotes, e.g.
my $sti = $dbh->prepare ( "UPDATE table SET Peak = \@peak WHE...
Or, use single quotes
my $sti = $dbh->prepare ( 'UPDATE table SET Peak = @peak WHE...

Perl sees @peak as an array. Try referring to it as \@peak. The backslash means "interpret the next character literally".
