Every week, I have to completely replace the data in several very large MySQL tables. So what I normally do is delete the existing data, import the new data, and then run my usual queries to modify the new data as needed.
Unfortunately, I have noticed lately that the new data contains unwanted characters, such as quotes and extra spaces. Since some of these tables have well over 100,000 records (as far as I know), I cannot easily open the data in Notepad and strip out the unwanted characters before importing.
I realize I could write a separate find and replace query for every single column in every table, like this:
UPDATE mytablename SET mycolumn = REPLACE(mycolumn, '"', '');
But having to name every column is a bother, so I would like to find a more elegant solution. Today I found a snippet on the internet that looks like a start:
SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE
table_name = 'myTable' and ordinal_position = 1
I think the next step might be to loop through the ordinal positions, and then replace and update each column, but I don't know how to do this in MySQL. I also don't know how to stop the loop after the last column is reached, to avoid error messages.
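One way around looping inside MySQL (a sketch, not the only option) is to read the column names once and then generate a single UPDATE that covers all of them. The Python function below does the generating. The table and column names are made up. The demo runs the generated statement against an in-memory SQLite database as a stand-in for MySQL, since both support REPLACE() and TRIM() and accept backtick-quoted names. In MySQL, the column list could come from a query such as SELECT column_name FROM INFORMATION_SCHEMA.COLUMNS WHERE table_schema = DATABASE() AND table_name = 'mytable' AND data_type IN ('char', 'varchar', 'text'). Filtering on data_type also keeps numeric columns out of the cleanup.

```python
import sqlite3


def quote_ident(name):
    """Backtick-quote an identifier (MySQL style), doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_literal(text):
    """Single-quote a string literal, doubling embedded single quotes.
    (Backslashes would also need escaping under MySQL's default SQL mode.)"""
    return "'" + text.replace("'", "''") + "'"


def build_cleanup_update(table, columns, unwanted=('"',)):
    """Build ONE UPDATE that removes `unwanted` characters and trims every column."""
    assignments = []
    for col in columns:
        expr = quote_ident(col)
        for ch in unwanted:
            # Nest one REPLACE() per unwanted character.
            expr = f"REPLACE({expr}, {quote_literal(ch)}, '')"
        assignments.append(f"{quote_ident(col)} = TRIM({expr})")
    return f"UPDATE {quote_ident(table)} SET " + ", ".join(assignments)


# Demo against in-memory SQLite (stand-in for MySQL; table and columns are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, city TEXT)")
conn.execute("""INSERT INTO mytable VALUES ('  "Alice" ', 'Paris  ')""")
sql = build_cleanup_update("mytable", ["name", "city"])
print(sql)
conn.execute(sql)
rows = conn.execute("SELECT name, city FROM mytable").fetchall()
print(rows)  # [('Alice', 'Paris')]
```

TRIM() only removes leading and trailing spaces. Runs of spaces inside a value are left alone.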
Is there an easy way to do this? Or am I hoping for too much?
I am a beginner, so a clear, simple explanation would be much appreciated.
Thanks in advance.
MORE INFORMATION:
Since my first post, I have discovered that stored procedures are not allowed on my server. Too bad.
Anyway, I have tried this new code, just to get started:
set @mytablestring='mytable';
set @mycolumnnumber=1;
set @mycolumnname=(SELECT column_name FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name = @mytablestring and ordinal_position = @mycolumnnumber);
SELECT @mycolumnname FROM mytable;
Unfortunately, in the final SELECT query, @mycolumnname is interpreted as a string, not as a column name, so the query does not work. If I could get past this, I believe I could write some code to loop through the columns by incrementing @mycolumnnumber.
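This is expected. A variable (or a bound parameter) always stands for a value, never for a column name, so the name has to be pasted into the SQL text itself. In MySQL, this is usually done by building the statement text with CONCAT() into a variable and running it with PREPARE ... FROM / EXECUTE, which also works outside stored procedures. The sketch below shows the same distinction using Python's sqlite3 module as a stand-in (table and column names are made up). The parameterized query returns the literal text, while the spliced and validated name returns the column's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, title TEXT)")
conn.execute("INSERT INTO mytable VALUES (1, 'hello')")

column_name = "title"  # e.g. the name looked up from the column catalog

# A bound parameter is always a *value*, so this returns the string 'title' itself.
as_value = conn.execute("SELECT ? FROM mytable", (column_name,)).fetchall()

# To use it as a column, splice it into the SQL text, but only after checking
# that it really is a column of the table (PRAGMA table_info row[1] is the name).
known = {row[1] for row in conn.execute("PRAGMA table_info(mytable)")}
if column_name not in known:
    raise ValueError(f"unknown column: {column_name}")
quoted = '"' + column_name.replace('"', '""') + '"'
as_column = conn.execute(f"SELECT {quoted} FROM mytable").fetchall()

print(as_value)   # [('title',)]
print(as_column)  # [('hello',)]
```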
If anyone knows how to solve this, I would really appreciate some help.
Many thanks.
I suggest that you take a look at vim, sed, awk and many of the other text editors and text processing utilities that you can find on Linux (and sometimes on Windows too). 100,000 records may be a pain in Notepad, but it's a piece of cake for real text processing utilities.
For example, to strip all # characters from foobar.txt:
sed 's/#//g' foobar.txt > foobar-clean.txt
Or, the same thing with the file opened in (g)vim:
:%s/#//g
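If sed or vim isn't at hand, a few lines of Python do the same job. Like sed, the script reads the file one line at a time, so 100,000+ records are no problem. This is a sketch: the file names follow the sed example, and UTF-8 is an assumption about the export's encoding.

```python
import os
import tempfile


def strip_chars(src_path, dst_path, unwanted="#"):
    """Copy src_path to dst_path with every character in `unwanted` deleted."""
    table = str.maketrans("", "", unwanted)  # maps each unwanted char to None
    with open(src_path, encoding="utf-8", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:  # one line at a time, so file size doesn't matter
            dst.write(line.translate(table))


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "foobar.txt")
    dst = os.path.join(tmp, "foobar-clean.txt")
    with open(src, "w", encoding="utf-8", newline="") as f:
        f.write('a#b#c\r\n"quoted"#\n')
    strip_chars(src, dst)  # like: sed 's/#//g' foobar.txt > foobar-clean.txt
    with open(dst, encoding="utf-8", newline="") as f:
        cleaned = f.read()

print(repr(cleaned))
```

Passing unwanted='"' instead would strip double quotes, which is the case in the question.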
I am trying to create a BCP file with a | delimiter and then load it into a Snowflake table.
Issue:
In SQL Server, some columns are defined as CHAR(4) and hold values such as "sss".
So when I run BCP, the value is padded to a length of 4 ("sss ") and loaded into Snowflake that way.
Because of this, our reports fail: they do something like where column="SSS", but because of the trailing space in Snowflake, the matching rows do not show up.
we do not want to change our reports. So, is there a way that BCP can handle the padding or trimming of these columns?
Note that there are 24 tables, each with around 130+ columns, so I can't go and put TRIM functions on every CHAR column.
If your BCP file keeps the trailing space, then Snowflake will retain it too, as long as the field is enclosed in " or ' via FIELD_OPTIONALLY_ENCLOSED_BY. You may also want to make sure the TRIM_SPACE option is set correctly in the file format definition used by your COPY INTO command.
If your BCP file isn't maintaining the space and you can't figure out how to get that to work, you could force the space back in during the COPY INTO command with some string functions in your SELECT, or you could create a view for your report that does the same set of string functions to force the space for your report to work from.
So, is there a way that BCP can handle the padding or trimming of these columns?
Yes, but not through some switch or option. The correct way to handle this is to set your data types up front. As someone mentioned in the comments on your question, the query that creates your BCP output should use VARCHAR(4) instead of CHAR(4). BCP is giving you what you asked of it; the way to avoid the whitespace is to use VARCHAR.
Seems like a fairly quick "find and replace" against scripted out query objects would work fine but you know your situation best.
Additionally, FYI, "trim" won't work. Even if the value of the field is only "SSS" (as in your example), a result column defined as CHAR(4) gives you 4 bytes of data: you only had 3 bytes, so the 4th place is a blank. TRIM will work during the query... but the padded " " you are getting is placed there by the copy out. The way to correct this is to set your data types the way you need them up front.
Unless someone knows of a better way in Snowflake (I'm not familiar with it), the only other option is to manipulate the file between SQL Server and Snowflake, e.g. replace " |" with "|"... but... blech.
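If you do end up post-processing the file, trimming each field is a bit safer than replacing " |" with "|". A single replace pass leaves extra pad spaces behind whenever a value is shorter than its CHAR width by more than one character. The sketch below assumes that | never appears inside a value and that the file is UTF-8. BCP's actual output encoding depends on the flags it was run with.

```python
def trim_fields(line, delimiter="|"):
    """Strip trailing spaces from every field of one delimited line."""
    body = line.rstrip("\r\n")
    ending = line[len(body):]  # keep the original line ending untouched
    return delimiter.join(f.rstrip(" ") for f in body.split(delimiter)) + ending


def trim_file(src_path, dst_path, delimiter="|"):
    """Stream a BCP output file and write a trimmed copy."""
    with open(src_path, encoding="utf-8", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(trim_fields(line, delimiter))


sample = "sss |12  |x\r\n"              # CHAR(4) 'sss' plus a value with 2 pad spaces
print(repr(sample.replace(" |", "|")))  # 'sss|12 |x\r\n'  (one pad space left behind)
print(repr(trim_fields(sample)))        # 'sss|12|x\r\n'
```

This trims every field, including VARCHAR ones, so any trailing spaces that are genuinely part of the data would be lost as well.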
This is a known "issue" with BCP. The "solution" is to use the queryout option, which means you must include a query with every export. But the data are the way they are.
E.g.: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/88c258fe-d1a6-4f3a-9dac-40388d04e9c7/remove-space-in-columns-on-bcp-out?forum=transactsql
But this is really a Snowflake problem, because Snowflake has its own default CHAR semantics.
You get a warning in the "String & Binary Data Types" documentation, but it doesn't tell the whole truth.
The following, executed on Oracle (and apparently MSSQL? MySQL?), will select the aaa row:
CREATE TABLE C AS SELECT CAST('aaa ' AS CHAR(4)) t FROM DUAL;
SELECT * FROM C WHERE t = 'aaa';
but it won't on Snowflake, unless you create the column with a COLLATE clause:
CREATE OR REPLACE TABLE C (t CHAR(4) COLLATE 'en_US-rtrim');
INSERT INTO C VALUES('aaa ');
SELECT * FROM C WHERE t = 'aaa';
Unfortunately, you can't ALTER the collation after creation, which would have been convenient after a COPY INTO <table>.
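To make the difference concrete, here is a small Python model of the two comparison rules. It sketches the semantics only, not any database's implementation. Blank-padded comparison, as in the Oracle example, pads the shorter value with spaces before comparing. The behaviour described for Snowflake compares the strings exactly, so the trailing space counts.

```python
def pad_space_equal(a, b):
    """Blank-padded comparison: pad the shorter value with spaces, then compare."""
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)


def no_pad_equal(a, b):
    """Exact comparison: trailing spaces are significant."""
    return a == b


stored = "aaa".ljust(4)                         # a CHAR(4) value: 'aaa' padded to 'aaa '
print(repr(stored))                             # 'aaa '
print(pad_space_equal(stored, "aaa"))           # True, like the Oracle example
print(no_pad_equal(stored, "aaa"))              # False, the mismatch described for Snowflake
print(no_pad_equal(stored.rstrip(" "), "aaa"))  # True once trailing spaces are trimmed
```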
PS: Mike Walton's answer is better, TRIM_SPACE is much cleaner than COLLATE.
I have two tables that I would like to join. They were loaded from separate source files. I am trying to join them using this code:
SELECT mutation_icgc2016.idsample, mutation_icgc2016.primarySite,
mutation_icgc2016.cancer, mutation_icgc2016.geneAffected,
ICGCDiffSamplesJan2016.idsample
FROM mutation_icgc2016
JOIN ICGCDiffSamplesJan2016
ON mutation_icgc2016.idsample=ICGCDiffSamplesJan2016.idsample
WHERE mutation_icgc2016.geneAffected is not null;
There is a problem, however: the resulting table is empty. If I use just one value from the rows, e.g.:
select * from ICGCOldSamplesJan2016 where idsample = "xxxxxx";
Results actually come up.
Thanks for your help in advance.
I have found the solution. I exported the files from Excel, because that is how they were given to me. It turns out Excel adds hidden/special characters that MySQL then treats as part of the field values. This is why TRIM doesn't work.
The solution was to copy the rows from Excel into a text editor (Notepad++ in this case) and save the file as a CSV. It worked! Thanks a lot for your help, @JNevill.
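The post doesn't say which hidden characters Excel added. Common culprits in spreadsheet exports are carriage returns from Windows line endings, a byte-order mark at the start of the file, and non-breaking or zero-width spaces. MySQL's plain TRIM() removes none of them, because by default it only strips spaces. The Python sketch below reveals and removes those particular characters before import. The suspect list and the sample ID are assumptions. On the MySQL side, comparing HEX(idsample) against the expected value is a quick way to spot such characters.

```python
import unicodedata

# Assumed suspects; extend the set if HEX() shows other bytes.
SUSPECTS = {"\r", "\ufeff", "\u00a0", "\u200b"}


def reveal(value):
    """List (index, name) for each suspicious character hidden in value."""
    return [(i, unicodedata.name(ch, repr(ch)))
            for i, ch in enumerate(value) if ch in SUSPECTS]


def clean(value):
    """Drop BOM/zero-width characters, turn NBSP into a space, then trim whitespace."""
    value = value.replace("\ufeff", "").replace("\u200b", "").replace("\u00a0", " ")
    return value.strip()  # str.strip() also removes \r and \n


raw = "\ufeffSA12345\r"  # hypothetical idsample value as read from an Excel export
print(reveal(raw))
print(clean(raw))        # SA12345
```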
I have a database full of important information... I need to change the information to fit a new format...
The current information follows this format... (Note this is all inserted in one cell)
DataString1:SomeOtherString1:MoreString1|DataString2:SomeOtherString2:MoreString2|DataString3:SomeOtherString3:MoreString3
I need to be able to do the following...
1) Locate all of the '|' symbols.
2) For each '|' symbol I need to find the second ':' before it.
3) Insert another ':' before the results of step 2.
I can accomplish this through code in another language like PHP for example but I would like to be able to do it via SQL.
The above example would turn into this (the changes are the doubled colons):
DataString1::SomeOtherString1:MoreString1|DataString2::SomeOtherString2:MoreString2|DataString3::SomeOtherString3:MoreString3
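For reference, the transformation shown in the example output doubles the first : of every |-separated record, including the last one. It is short in most scripting languages. Here is a Python sketch of that logic, with the separators as parameters:

```python
def double_first_colon(cell, record_sep="|", field_sep=":"):
    """Double the first ':' in every '|'-separated record of one cell."""
    records = cell.split(record_sep)
    # str.replace with count=1 touches only the first occurrence in each record.
    return record_sep.join(r.replace(field_sep, field_sep * 2, 1) for r in records)


before = ("DataString1:SomeOtherString1:MoreString1|"
          "DataString2:SomeOtherString2:MoreString2|"
          "DataString3:SomeOtherString3:MoreString3")
print(double_first_colon(before))
```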
In SQL, SUBSTRING can be used like this: SUBSTRING(expression, start_position, length). If your occurrence repeats, you will have to create an update script that checks the substring position character by character. Let's say the value is:
DataString1
Here the position of S is 5. You should count the occurrences of | and save them in a variable.
SELECT @sd := SUBSTRING(column_name, start, length) FROM my_table WHERE ...;  -- add your conditions
UPDATE my_table SET column_name = '' WHERE SUBSTRING(column_name, start, length) = '|';
You then have to get the occurrences from @sd, do the math to concatenate the string, and update your cells.
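Here is a sketch of the position arithmetic this answer describes. It is written in Python and counts positions from 1, as SQL's SUBSTRING() and LOCATE() do. It applies steps 1 to 3 from the question literally: for each |, find the second : before it and insert another : there. Taken literally, those steps leave the last record unchanged, because no | follows it. That differs from the question's example output for DataString3.

```python
def positions_to_insert(s):
    """1-based positions (SQL-style) of the second ':' before each '|'."""
    positions = []
    for i, ch in enumerate(s):
        if ch != "|":
            continue
        nearest = s.rfind(":", 0, i)                      # closest ':' before this '|'
        second = s.rfind(":", 0, nearest) if nearest > 0 else -1
        if second == -1 or "|" in s[second:i]:            # must sit in the same record
            continue
        positions.append(second + 1)                      # 0-based index -> 1-based position
    return positions


def insert_colons(s, positions):
    """Insert ':' at each 1-based position, right to left so earlier ones stay valid."""
    for p in sorted(positions, reverse=True):
        s = s[:p - 1] + ":" + s[p - 1:]
    return s


before = ("DataString1:SomeOtherString1:MoreString1|"
          "DataString2:SomeOtherString2:MoreString2|"
          "DataString3:SomeOtherString3:MoreString3")
pos = positions_to_insert(before)
print(pos)
print(insert_colons(before, pos))
```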
I ended up just writing a PHP script to do this. I'm not even sure what I wanted was entirely possible with MySQL, as there were more than 30 instances of | and more than 180 instances of ':' in each cell. If anyone has an answer, feel free to post it, and if it works I'll vote it as the best answer.
This should be very simple but MS Access is killing me!
All I want to do is find and replace all instances of &#39; (and some others) with the appropriate character, in this case an apostrophe.
Here's my query:
UPDATE Table1
SET Title = Replace(Title, "&#39;", "'")
WHERE Title LIKE '*&#39;*'
And even a simple select doesn't work:
SELECT * FROM Table1
WHERE Title LIKE '*&#39;*'
Does anyone have a solution? I really have searched and found nothing for this particular issue.
I could do a PHP script but all this stuff is supposed to be kept in the DB so I kinda need to sort it with a simple query if possible.
Thanks a lot!
EDIT: Some Sample Data of Title in Table1
Row 1. 'Quick Release' Loop, Stainless Steel
Row 2. Silver 'V' Shaped Edging
Row 3. Plastic 'T' Shaped Seal
The problem is that # is a special character in Jet SQL: in a LIKE pattern it matches any single digit. There are several ways to deal with it:
Omit the WHERE part and update all strings with Replace.
Use a WHERE clause such as (InStr(1, [Title], "&#39;") > 0);
Use a DAO or ADO recordset.
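If you want to keep the LIKE clause, there is another option. In Access's default (ANSI-89) wildcard mode, *, ?, # and [ are special in LIKE, and wrapping one in brackets makes it match literally, so the pattern becomes '*&[#]39;*'. The helper below is a hypothetical Python utility, not part of Access, that builds such patterns. It does not double single quotes, so that would still be needed if the text contains one.

```python
# Characters that are wildcards in Access (ANSI-89) LIKE patterns.
ACCESS_LIKE_SPECIAL = "[*?#"


def escape_access_like(text):
    """Wrap each wildcard character in brackets so LIKE matches it literally."""
    return "".join(f"[{ch}]" if ch in ACCESS_LIKE_SPECIAL else ch for ch in text)


def contains_pattern(text):
    """LIKE pattern matching `text` anywhere in the value."""
    return "*" + escape_access_like(text) + "*"


pattern = contains_pattern("&#39;")
print(f"WHERE Title LIKE '{pattern}'")  # WHERE Title LIKE '*&[#]39;*'
```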
I am struggling with this query and want to know if I am wasting my time and need to write a php script or is something like the following actually possible?
UPDATE my_table
SET @userid = user_id
AND SET filename('http://pathto/newfilename_'@userid'.jpg')
FROM my_table
WHERE filename
LIKE '%_%' AND filename
LIKE '%jpg'AND filename
NOT LIKE 'http%';
Basically, I have 700-odd files that need renaming in the database: because I am changing systems, the names they are stored under in the database no longer match the actual filenames.
The format is 2_gfhgfhf.jpg which translates to userid_randomjumble.jpg
But not all files in the database are in this format, only about 700 out of thousands. So I want to identify names that contain _ but don't contain http (that's the correct format, which I don't want to touch).
I can do that fine but now comes the tricky bit!!
I want to replace the file name userid_randomjumble.jpg with http://pathto/filename_userid.jpg. So I want to store that row's user_id column in a variable and insert it into my new filename.
The above doesn't work for obvious reasons, but I am not sure if there is a way round what I'm trying to do. I have no idea if it's possible. Am I wasting my time with this, and should I turn to PHP with MySQL and stop being lazy? Or is there a way to get this to work?
Yes, it is possible without PHP. Here is a simple example:
SET @a := 0;
SELECT * FROM my_table WHERE field_name = @a;
Yes, you can do it using straightforward SQL:
UPDATE my_table
SET filename = CONCAT('http://pathto/newfilename_', user_id, '.jpg')
WHERE filename LIKE '%\_%jpg'
AND filename NOT LIKE 'http%';
Notes:
No need for variables: any column of the row being updated may be referenced directly.
In mysql, use CONCAT() to add text values together
With LIKE, an underscore (_) has a special meaning - it means "any single character". If you want to match a literal underscore, you must escape it with a backslash (\)
Your two LIKE predicates may be safely merged into one for a simpler query
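To see why the escaped underscore matters, here is a small Python model of MySQL's LIKE matching: % matches any run of characters, _ matches exactly one character, and a backslash escapes the next character. It also applies the same rename rule as the UPDATE above. It is a sketch. It is case-sensitive, whereas MySQL's result depends on the column's collation, and the path is the placeholder from the question.

```python
import re


def like_to_regex(pattern, escape="\\"):
    """Translate a LIKE pattern into an equivalent regular expression."""
    parts, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped char matches literally
            i += 2
            continue
        if ch == "%":
            parts.append(".*")  # any run of characters, including none
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


def renamed(filename, user_id):
    """Mirror of the UPDATE: rewrite only matching names that don't start with http."""
    if like(filename, r"%\_%jpg") and not like(filename, "http%"):
        return "http://pathto/newfilename_" + str(user_id) + ".jpg"
    return filename


print(like("photo.jpg", "%_%jpg"))     # True: unescaped _ matches any character
print(like("photo.jpg", r"%\_%jpg"))   # False: needs a literal underscore
print(renamed("2_gfhgfhf.jpg", 2))     # http://pathto/newfilename_2.jpg
print(renamed("http://pathto/newfilename_2.jpg", 2))  # unchanged
```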