MySQL to CSV - separating multiple values - mysql

I have downloaded a MySQL table as CSV, which has over thousand entries of the following type:
id,gender,garment-color
1,male,white
2,"male,female",black
3,female,"red,pink"
Now, when I am trying to create a chart out of this data, it is taking "male" as one value, and "male,female" as a separate value.
So, for the above example, rather than counting 2 "male", and 3 "female", the chart is showing 3 separate categories ("male", "female", "male,female"), with one count each.
I want the output as follows, for chart to have the correct count:
id,gender,garment-color
1,male,white
2,male,black
2,female,black
3,female,red
3,female,pink
The only way I know is to copy the row in MS Excel and adjust the values manually, which is too tedious for 1000+ entries. Is there a better way?

From MySQL command line or whatever tool you are using to send queries to MySQL:
select * from the_table
into outfile '/tmp/out.txt' fields terminated by ',' enclosed by '"'
Then download /tmp/out.txt' from the server and it should be good to go assuming your data is good. If it is not, you might need to massage it with some SQL function use in theselect`.

The csv likely came from a poorly designed/normalized database that had both those values in the same row. You could try using selects and updates, along some built in string functions, on such rows to spawn additional rows containing the additional values and update their original rows to remove those values; but you will have to repeat until all commas are removed (if there is more than one in some field), and will have to determine if a row containing multiple fields with such comma-separated lists need multiplied out (i.e. should 2 gender and 4 color mean 8 rows total).
More likely, you'll probably want to create additional tables for X_garmentcolors, and X_genders; where X is whatever the original table is supposed to be describing. These tables would have an X_id field referencing the original row and a [garmentcolor|gender] value field holding one of the values in the original rows lists. Ideally, they should actually reference [gender|garmentcolor] lookup tables instead of holding actual values; but you'd have to do the grunt work of picking out all the unique colors and genders from your data first. Once that is done, you can do something like:
INSERT INTO X_[garmentcolor|gender] (X_id, Y_id)
SELECT X.X_id, Y.Y_id
FROM originalTable AS X
INNER JOIN valueTable AS Y
ON X.Y_valuelist LIKE CONCAT('%,' Y.value) -- Value at end of list
OR X.Y_valuelist LIKE CONCAT('%,' Y.value, ',%') -- Value in middle of list
OR X.Y_valuelist LIKE CONCAT(Y.value, ',%') -- Value at start of list
OR X.Y_valuelist = Y.value -- Value is entire list
;

Related

How to make MySQL MATCH...AGAINST use various word separators?

I have a table with 300K string values. These values contain all types of word separators so it looks like this:
id value
1 A B C
2 A B_C
3 A_B-C
4 A-B-C
Let's say I want to find all four rows containing A and B. This query
SELECT * FROM table WHERE MATCH(value) AGAINST('+A +B' IN BOOLEAN MODE);
will return only one row with space separated values:
1 A B C
Is there a way to make MATCH...AGAINST use other word separators? I tried to use LIKE and it was too slow.
You will probably want to alter your app and schema just a little bit to solve this problem. You have two tasks:
Task 1: Transform your existing data
Assuming you need to keep the source data unchanged:
Step 1: Add a field to your schema, "searchFriendly", same datatype as the source data.
Step 2: Write a script to transform the data you already have. Get the whole data set and do string replaces to get spaces.
Step 3: Save that transformed data to the new searchFriendly field.
Task 2: Modify the app so that all future database save/update's on this data, also perform the transformation and save that data as well.
Step 1: Find the part of the app that saves these records.
Step 2: Before actually writing the data to the database, perform the transformation.
Step 3: Add the transformed data to your API call to save/update the record, under the searchFriendly field.

MySQL Query conditional find nth element in column string

I have a MySQL table setup where one column's values are a string of comma-separated True/False values (1s or 0s). For example, in the column, one field's value may be "0,1,0,0,0,0,1,1,0" and another may be "1,0,0,1,1,1,0,0,0" (note: these are NOT 9 separate columns, but a string in one column). I need to QUERY the MySQL table for elements that are "true"(1) for the "nth element" of that column's value/string.
So, if I was looking for rows, with a specific column, where the 3rd element of the column's value was 1, it would produce a list of results. So, in this case, I would only be searching for "1" in the fth place (12345 = X,X,X...) of the string (X,X,1,X,X,X,X,X,X,X). How can I query this?
This is a crude example of what I am trying to do ...
"SELECT tfcolumn FROM mytable WHERE substr({tfcolumn}, 0, 5)=1"
{tfcolumn} represents the column value
5 represents the 5th position of the string
=1 represents what I need that position to equal to.
Please help. Thanks
You can't. Once you put a serialized data type into a column in SQL (like comma separated lists, or JSON objects) you are preventing yourself from performing any query on the data in those columns. You have to pull the data in a different way and then use a program like python, VB, etc to get the comma separated values you are looking for.
Unless you want to deal with trying to make this mess of a query work...
I would recommend changing your table structure before it's too late. Although it is possible, it is not optimized in a format that a DBMS recognizes. Because of that the DBMS will spend a significant amount of time going through every record to parse the csv values which is something that it was not meant to be doing. Doing the query in SQL will take as much time (if not more time) than just pulling all the records and searching with a tool that can do it properly.
If the column contains values exactly like the ones you posted, then the Nth element is at the 2 * N - 1 position in the comma separated list.
So do this:
SELECT tfcolumn
FROM tablename
WHERE substr(tfcolumn, 2 * 5 - 1, 1) = '1'
Replace 5 with the index that you search for.
See the demo.
Or remove all commas and get the Nth char:
SELECT tfcolumn
FROM tablename
WHERE substr(replace(tfcolumn, ',', ''), 5, 1) = '1'
See the demo.
Try this
if substring_index(substring_index('0,1,0,0,0,0,1,1,0',',',3),',',-1)='1'
The first argument can be your column name. The second argument (',') tells the function that the string is comma-separated. The third argument takes the first 3 elements of the string. So, the output of inner substring_index is '0,1,0'.
The outer substring_index has -1 as the last argment. So, it starts counting in reverse direction & takes only 1 element starting from right.
For example, if the value in a particular row is '2,682,7003,14,185', then the value of substring_index(substring_index('2,682,7003,14,185',',',3),',',-1) is '7003'.

Show Fields but with content

i'm trying to show fields names on a combobox, but I need only those that are not null or have blank spaces.
I all ready have the field names with this query:
SHOW FIELDS FROM model WHERE type LIKE 'varchar(15)'
Any idea about how can i do this?
Update:
I'm working with an old database who is poorly designed. I attached an image:Database Screenshot This is a tire sizes database, so i need to get the years by model who has the size captured to show them in a combo box.
You can use your current query to get the "candidate" fields, but (short of some very complicated dynamic sql) you'll need to build a separate query to determine which candidates are pertinent. Generically, something like:
SELECT SUM(IF(field1 REGEXP '[a-zA-Z0-9.]+', 1, 0) > 0 AS showField1
, SUM(IF(field2 REGEXP '[a-zA-Z0-9.]+', 1, 0) > 0 AS showField2
, ...
FROM theTable
;
Depending on what you consider "has values" the regexp string may need adjusted; learn more here: http://dev.mysql.com/doc/refman/5.7/en/regexp.html
If the table is huge (large number of rows), you may be better off querying on each field separately like this (the one above will be looking at the whole table without some sort of WHERE to reduce the rows examined):
SELECT 1 FROM theTable WHERE fieldX REGEXP '[a-zA-Z0-9.]+' LIMIT 1;
Treating having a query result as "yes" and no result as "no content".

MySQL column: Change existing phone numbers into specific format?

I have a MySQL column that contains phone numbers, the problem is that they're in different formats, such as:
2125551212
212-555-1212
(212)5551212
I'd like to know if it's possible to take the existing 10 digits, remove the formatting, and change them all to this format: (212) 555-1212
Not a duplicate, as I'm looking to update several thousand entries instead of masking new entries.
Unfortunately, no REGEXP_MATCHES() or TRANSLATE() function comes with standard MySQL installation (they do with Postgres), so you could do this a way which I find really dirty, but it works.
First you cleanse your column by removing characters that aren't numbers using replace()
Then you take several parts of the string to separate them out using substr()
Finally, you concatenate them adding symbols between your substrings with concat()
If you have any more characters that you need truncate, just add another replace() on top of 3 already existing.
Sample data
create table nums ( num text );
insert into nums values
('2125551212'),
('212-555-1212'),
('(212)5551212');
Query formatting your data
select
num,
concat('(',substr(num_cleansed,1,3),') ',substr(num_cleansed,4,3),'-',substr(num_cleansed,7)) AS num_formatted
from (
select
num,
replace(replace(replace(num,'(',''),')',''),'-','') as num_cleansed
from nums
) foo
Result
num num_formatted
2125551212 (212) 555-1212
212-555-1212 (212) 555-1212
(212)5551212 (212) 555-1212
Click here SQLFiddle to preview output.
I'm leaving UPDATE statement as a homework for the reader.

MySQL: How to split a long text separated by pipeline and semicolon

I have a column with type string, and we saved all sizes and prices in this field it is separated by pipeline and semicolon sign as shown below
10;100|11;111|12;112|13;130|14;130|15;105
I want to get the minimum price in this field (price;size where 10 is size and 1502 is price).
I used this query but it returns only one of them and not the minimum
select SUBSTRING_INDEX((textColumn), '|', 1) as sizeAndprice from myTable;
If the structure can be changed, I would suggest in doing so.
However, if it cannot be changed whatever the reason, I would fetch the entire row as it is and use the explode() function:
/* fetch the whole column as string in a variable ($myWholeText for this example) */
$mySplitColumns = explode ("|", $myWholeText );
You will then have all the split values (both price and size) in a separate index of array $mySplitColumns, and you will be able to sort them out as you like.
For more reference about the explode function, check this link