CSV file data sorting using sed command

tet(44)1,tet(37)1,oqxB1,VanX-D4,VanX-D4,tet(O)2,aadA51,VanR-G1,
How can I use sed to remove everything that comes after the closing bracket ) in each field?
As these data don't follow any fixed pattern, simply deleting a fixed number of characters after the bracket is not working at all.
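One possible approach (a sketch, tested only against the sample line above): the unwanted characters always sit between a closing bracket and the next comma, so sed can delete exactly that span; fields without brackets are left untouched.

```shell
# Sample line from the question, written to a hypothetical data.csv.
printf 'tet(44)1,tet(37)1,oqxB1,VanX-D4,VanX-D4,tet(O)2,aadA51,VanR-G1,\n' > data.csv

# After every closing bracket, delete everything up to the next comma (or end of line).
sed 's/)[^,]*/)/g' data.csv
# → tet(44),tet(37),oqxB1,VanX-D4,VanX-D4,tet(O),aadA51,VanR-G1,
```

Note this only trims fields that contain a bracket; if bracket-less fields such as aadA51 should also be shortened, a different rule is needed.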

Related

Remove newline characters in my data with Duckdb

I've received a csv file with some issues, which I noticed when attempting to load it into BigQuery.
I'm using duckdb to quickly sanitise the data, and I'm noticing a bunch of newline characters in my data.
Is there a quick way to remove newlines from duckdb data before I write back out to csv?
Linux uses \n for a new-line, Windows \r\n and old Macs \r.
So essentially, what solved it for me was
select * from table where regexp_matches(bad_column, '\r\n');
then I could run
delete from table where id in (select id from table where regexp_matches(bad_column, '\r\n'));
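If the stray characters turn out to be Windows \r\n line endings rather than genuine embedded newlines, a shell pre-pass before DuckDB ever sees the file may be simpler (a sketch; input.csv is a placeholder name):

```shell
# Hypothetical CRLF-terminated input.
printf 'id,name\r\n1,alice\r\n' > input.csv

# Remove every carriage return; Unix \n newlines are untouched.
tr -d '\r' < input.csv > clean.csv

cat clean.csv
```

Inside DuckDB itself, an UPDATE with regexp_replace(bad_column, '\r\n', ' ', 'g') should rewrite the offending values instead of deleting the whole rows, which may be preferable when the rest of the row is good.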

Replace a value of attribute in json in SQL dump

I have table in mariadb with five columns and in that one column is of type longtext (Compressed String) in json format.
I also have a dump of that table, and in it I need to change the value of the salary attribute in the nested JSON string below, based on the value of the empId attribute.
INSERT INTO Employee VALUES (1,"ram","1243","19-03-14",{"name":"ram",age:"23","empId":"1234","address":{"city":{"name":"",.*},"country":{"name":"",.*}},"gender":"male","hobbies":"travel","salary":"40000","qualification":"BE","marrital-status":"married",.*),
(2,"komal","1243","19-03-14",{"name":"komal",age:"21","empId":"1534","address":{"city":{"name":"",.*},"country":{"name":"",.*}},"gender":"male","hobbies":"music","salary":"30000","qualification":"BE","marrital-status":"married",.*),
(3,"ramya","1243","19-03-14",{"name":"ramya",age:"22","empId":"1754","address":{"city":{"name":"",.*},"country":{"name":"",.*}},"gender":"male","hobbies":"travel","salary":"40000","qualification":"BE","marrital-status":"married",.*),
(4,"raj","1243","19-03-14",{"name":"raj",age:"23","empId":"1364","address":{"city":{"name":"",.*},"country":{"name":"",.*}},"gender":"male","hobbies":"playing","salary":"40000","qualification":"BE","marrital-status":"married",.*);
I have a csv file with empId and mapped to revised salary.
I will loop the csv file and in the dump based on empId replace the salary from the csv.
I tried to replace it with the substitution below (run from vim).
:%s/\("empId":"1243"\),\(.*\),\("address":{"city":{\),\(.*\),\(}\),\("country":{\),\(.*\),\(}}\),\(.*\),"salary":[0-9\"]*/\1,\2,\3,\4,5,\6,\7,\8,\9,"salaray":"50000"/
but I am getting below error message.
E872: (NFA regexp) Too many '('
E51: Too many \(
E476: Invalid command
How can I parse the json in the dump and change the value using sed or any other way in shell script?
I'm not 100% sure about your use case but you can try this for each empId in the csv.
sed -E 's/("empId":"1234".*"salary":")[0-9]+/\150000/' input.txt
Add -i to the command to replace the text in the file in place; otherwise only the standard output changes. Test that it works before adding -i.
Using Node or Python to do this might be easier: you could write a small script that reads the JSON from a file and writes the modified JSON to a new file. Or, if you're restricted in your tech stack, another option is to load the CSV into the SQL database itself and build a query from it, with your input data, to produce the right rows to insert.
Some resources I used, which might explain it better than this short answer:
https://linuxize.com/post/how-to-use-sed-to-find-and-replace-string-in-files/
https://phoenixnap.com/kb/grep-regex
using extended regular expressions in sed
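To drive the replacement from the empId-to-salary CSV the asker mentions, a small loop can apply one substitution per pair. This is a sketch: salaries.csv, dump.sql, and the one-record-per-line layout are assumptions, the greedy .* is only safe because each record sits on its own line, and -i without a suffix is GNU sed syntax.

```shell
# Hypothetical fixtures: a one-record dump line and an empId-to-salary map.
printf '(1,"ram","1243",{"empId":"1234","address":{"city":{"name":"x"}},"salary":"40000"}),\n' > dump.sql
printf '1234,50000\n' > salaries.csv

# For each pair, rewrite the digits after "salary":" on the line holding that empId.
while IFS=, read -r id salary; do
  sed -i -E "s/(\"empId\":\"$id\".*\"salary\":\")[0-9]+/\1$salary/" dump.sql
done < salaries.csv

cat dump.sql
# → (1,"ram","1243",{"empId":"1234","address":{"city":{"name":"x"}},"salary":"50000"}),
```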

Delete parentheses from a CSV file in Bash script

I'm using a bash script to create a report for AdWords (the AdWords files are in python). I'm generating a "CAMPAIGN_PERFORMANCE_REPORT" (as CSV file), and one of the measures I take is "conversions".
My problem is this: when I have more than a thousand conversions, the number is wrapped in double quotes and contains a thousands-separator comma.
Example:
2016-12-25,Campaign_A,Universal App Campaign,264.0
2016-12-25,Campaign_B,Universal App Campaign,"1,535.0"
2016-12-25,Campaign_C,Universal App Campaign,"1,472.0"
2016-12-25,Campaign_D,Universal App Campaign,"1,378.0"
2016-12-25,Campaign_E,Universal App Campaign,382.0
2016-12-25,Campaign_F,Universal App Campaign,431.0
When I insert this data into MySQL, the cell is split in two and I get 1 in the conversions column instead of 1535 (for instance).
So I need your help in one of these two issues:
Does anyone know how I can fetch the conversions field as a Long rather than a Double from the AdWords API?
If not, how can I remove the double quotes (") and thousands-separator commas (,) in several files in the same folder in Linux? (I have a csv file for each AdWords account.)
Thank you!
This is too long for a comment.
If you are loading data into MySQL, then you should be using load data infile.
This command has an option, fields optionally enclosed by, where you can specify the double-quote character. Commas that appear between the enclosing quote characters are then treated as part of the value, not as field separators.
You can review the documentation here.
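A sketch of what that load could look like (the file path and table name are placeholders):

```sql
LOAD DATA INFILE '/path/to/report.csv'
INTO TABLE campaign_report
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';
```

Note that this only stops the quoted field from being split; the thousands-separator comma inside a value like 1,535.0 may still need stripping before MySQL converts it to a number.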
You can run the file through a sed filter like this:
sed -r ':l s/"([0-9]+),/"\1/g; t l; s/"([0-9.]+)"$/\1/g' yourfile > convertedfile
It uses a two-step approach to get rid of commas and quotes:
as long as there is a quote followed by digits ([0-9]+) followed by a comma, the comma is removed: :l s/"([0-9]+),/"\1/g; t l; (this is a "label; remove comma; if something was removed, go to label" construct)
remove the quotes around numbers ([0-9.]+) at the end of a line ($)
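Applied to one of the sample rows, with the label split into separate -e expressions (some sed builds dislike a label and a command on one line; -E is the modern spelling of -r):

```shell
printf '2016-12-25,Campaign_B,Universal App Campaign,"1,535.0"\n' |
  sed -E -e ':l' -e 's/"([0-9]+),/"\1/g' -e 't l' -e 's/"([0-9.]+)"$/\1/g'
# → 2016-12-25,Campaign_B,Universal App Campaign,1535.0
```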

How can I quickly reformat a CSV file into SQL format in Vim?

I have a CSV file that I need to format (i.e., turn into) a SQL file for ingestion into MySQL. I am looking for a way to add the text delimiters (single quotes) to the text fields, but not to the numbers, booleans, etc. I am finding it difficult because some of the text I need to enclose in single quotes contains commas itself, making it hard to key on the commas for search and replace. Here is an example line I am working with:
1239,1998-08-26,'Severe Storm(s)','Texas,Val Verde,"DEL RIO, PARKS",'No',25,"412,007.74"
This is FEMA data file, with 131246 lines, I got off of data.gov that I am trying to get into a MySQL database. As you can see, I need to insert a single quote after Texas and before Val Verde, so I tried:
s/,/','/3
But that only replaced the first occurrence of the comma on the first three lines of the file. Once I get past that, I will need to find a way to deal with "DEL RIO, PARKS", as that has a comma that I do not want to place a single quote around.
So, is there a "nice" way to manipulate this data to get it from plain CSV to a proper SQL format?
Thanks
CSV files are notoriously dicey to parse. Different programs export CSV in different ways, possibly including strangeness like embedding new lines within a quoted field or different ways of representing quotes within a quoted field. You're better off using a tool specifically suited to parsing CSV -- perl, python, ruby and java all have CSV parsing libraries, or there are command line programs such as csvtool or ffe.
If you use a scripting language's CSV library, you may also be able to leverage the language's SQL import as well. That's overkill for a one-off, but if you're importing a lot of data this way, or if you're transforming data, it may be worthwhile.
I think I would also want to do some troubleshooting to find out why the CSV import into MySQL failed.
I would take an approach like this:
:%s/,\("[^"]*"\|[^,"]*\)/,'\1'/g
:%s/^\("[^"]*"\|[^,"]*\)/'\1'/g
In words: look for either a double-quoted run of characters or, via \|, a run of characters containing no comma or double quote, each preceded by a comma, and wrap the captured text in single quotes.
The second command does the same for the first column of each row, anchoring at the start of the line (^) instead of at a comma.
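For comparison, the same two substitutions can be run outside Vim with sed -E (a sketch on a simplified line; as in the Vim version, pre-existing double quotes are kept inside the added single quotes):

```shell
printf '1239,1998-08-26,"DEL RIO, PARKS",25\n' |
  sed -E "s/,(\"[^\"]*\"|[^,\"]*)/,'\1'/g; s/^(\"[^\"]*\"|[^,\"]*)/'\1'/"
# → '1239','1998-08-26','"DEL RIO, PARKS"','25'
```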
Try the csv plugin. It lets you convert the data into other formats. The help includes an example of how to convert the data for importing into a database.
Just to bring this to a close, I ended up using @Eric Andres' idea, which was the MySQL load data option:
LOAD DATA LOCAL INFILE '/path/to/file.csv'
INTO TABLE MYTABLE FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n';
The initial .csv file still took a little massaging, but not as much as if I had done it all by hand.
When I commented that LOAD DATA had truncated my file, I was incorrect. I was treating the file as a typical .sql file and assumed the "ID" column I had added would auto-increment. That turned out not to be the case. I had to create a quick script that prepended an ID to the front of each line; after that, the LOAD DATA command worked for all lines in my file. In other words, all the data has to be present in the file before the load, or the load will not work.
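The prepend-an-ID script mentioned above can be as small as one awk line (a sketch; it assumes the ID column comes first and the file has no header row):

```shell
# Hypothetical rows missing their ID column.
printf 'alice,10\nbob,20\n' > file.csv

# NR is the current line number; print it, a comma, then the original line.
awk '{ print NR "," $0 }' file.csv
# → 1,alice,10
# → 2,bob,20
```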
Thanks again to all who replied, and to @Eric Andres for his idea, which I ultimately used.

Is there a work-around that allows missing data to equal NULL for LOAD DATA INFILE in MySQL?

I have a lot of large csv files with NULL values stored as ,, (i.e., no entry). Using LOAD DATA INFILE turns these NULL values into zeros, even if I create the table with a column definition like var DOUBLE DEFAULT NULL. After a lot of searching I found that this is a known "bug", although it may be a feature for some users. Is there a way I can fix this on the fly without pre-processing? These data are all numeric, so a zero value is very different from NULL.
Or if I have to do pre-processing, is there one that is most promising for dealing with tens of csv files of 100mb to 1gb? Thanks!
With minimal preprocessing with sed, you can have your data ready for import.
for csvfile in *.csv
do
    sed -i -e 's/^,/\\N,/' -e 's/,$/,\\N/' -e 's/,,/,\\N,/g' -e 's/,,/,\\N,/g' "$csvfile"
done
That should do an in-place edit of your CSV files and replace the blank values with \N. Update the glob, *.csv, to match your needs.
The reason there are two identical regular expressions matching ,, is because I couldn't figure out another way to make it replace two consecutive blank values. E.g. ,,,.
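A branch loop avoids having to guess how many duplicated passes are needed: keep substituting the first ,, until none remains (a sketch; the ^, and ,$ expressions from the loop above are still needed for leading and trailing blanks):

```shell
printf 'a,,,,b\n' | sed -e ':a' -e 's/,,/,\\N,/' -e 'ta'
# → a,\N,\N,\N,b
```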
"\N" (without quotes) in a data file signifies that the value should be null when the file is imported into MySQL. Can you edit the files to replace ",," with ",\N,"?