SQL: List unique substring from fields in table - mysql

I'm running a query to retrieve the first word from a string in a column, like so:
SELECT SUBSTRING_INDEX( `field` , ' ', 1 ) AS `field_first_word` FROM `your_table`
But this allows duplicates. What do I need to add to the statement to get a unique list of words?

DISTINCT
For example:
SELECT DISTINCT SUBSTRING_INDEX( `field` , ' ', 1 ) AS `field_first_word` FROM `your_table`
Note: DISTINCT has performance implications. However, in your case there are likely few pure-MySQL alternatives.

To remove duplicates in a SELECT statement, change it to SELECT DISTINCT. In this case:
SELECT DISTINCT SUBSTRING_INDEX( `field` , ' ', 1 ) AS `field_first_word` FROM `your_table`
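If you want to try the effect of DISTINCT without a MySQL server at hand, here is a rough Python sketch using the built-in sqlite3 module. It is only a stand-in: SQLite has no SUBSTRING_INDEX, so the first word is emulated with substr() and instr(), and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (field TEXT)")
# Hypothetical sample data; two rows share the first word "red".
conn.executemany(
    "INSERT INTO your_table (field) VALUES (?)",
    [("red apple",), ("red car",), ("blue sky",), ("green",)],
)

# SQLite stand-in for SUBSTRING_INDEX(field, ' ', 1): appending a space
# guarantees instr() finds one even when the value is a single word.
rows = conn.execute(
    """
    SELECT DISTINCT substr(field, 1, instr(field || ' ', ' ') - 1) AS field_first_word
    FROM your_table
    ORDER BY field_first_word
    """
).fetchall()

first_words = [row[0] for row in rows]
print(first_words)
```

Without DISTINCT the same query would return "red" twice.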

Related

How can I use an IF or Case function to summarize a GROUP_CONCAT column? AND then apply it to the original data table?

I am quite the novice at MySQL and would appreciate any pointers. The goal is to automate a categorical field: build a GROUP_CONCAT column, then summarize certain patterns in that column in a new_column. Furthermore, is it possible to add the new_column to the original table in one query? Below is what I've tried; it fails with an unknown column "Codes" error, if that helps:
SELECT
`ID`,
`Code`,
GROUP_CONCAT(DISTINCT `Code` ORDER BY `Code` ASC SEPARATOR ", ") AS `Codes`,
IF(`Codes` LIKE '123%', 'Description1',
IF(`Codes` = '123, R321', 'Description2',
"Logic Needed"))
FROM Table1
GROUP BY `ID`
Instead of nested IF statements, I would like to use a CASE statement as a substitute. The reason is that I already have around 1000 lines of logic written as "If [column] = "?" Then "?" else if" etc., and I feel that CASE would make for an easier transition. Maybe something like:
SELECT
`ID`,
`Code`,
GROUP_CONCAT(DISTINCT `Code` ORDER BY `Code` ASC SEPARATOR ", ") AS `Codes`,
CASE
WHEN `Codes` LIKE '123%' THEN 'Description1'
WHEN `Codes` = '123, R321' THEN 'Description2'
ELSE "Logic Needed"
END
FROM Table1
GROUP BY `ID`
Table Example:
ID,Code
1,R321
1,123
2,1234
3,1231
4,123
4,R321
Completed Table:
ID,Codes,New_Column
1,"123, R321",Description2
2,1234,Description1
3,1231,Description1
4,"123, R321",Description2
How then can I add back the summarized data to the original table?
Final Table:
ID,Code,New_Column
1,R321,Description2
1,123,Description2
2,1234,Description1
3,1231,Description1
4,123,Description2
4,R321,Description2
Thanks.
You can't refer to a column alias in the SELECT list of the same query. You need to do the GROUP_CONCAT() in a subquery; then the main query can refer to Codes to summarize it.
It also doesn't make sense to select Code, since there isn't a single Code value in the group.
SELECT ID, Codes,
CASE
WHEN `Codes` = '123, R321' THEN 'Description2'
WHEN `Codes` LIKE '123%' THEN 'Description1'
ELSE "Logic Needed"
END AS New_Column
FROM (
SELECT
`ID`,
GROUP_CONCAT(DISTINCT `Code` ORDER BY `Code` ASC SEPARATOR ", ") AS `Codes`
FROM Table1
GROUP BY ID
) AS x
As mentioned in a comment, the WHEN clauses are tested in order, so you need to put the more specific cases first. You might want to use FIND_IN_SET() rather than LIKE, since '123%' will match 1234, not just '123, something'.
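To make the two points above concrete, here is a small Python sketch of the same logic (not MySQL; the function names are made up for illustration). It shows why the exact match has to come before the prefix match, and why a prefix test treats 1234 like 123 while a whole-item test does not. One caveat, as far as I know: FIND_IN_SET() expects items separated by a bare comma, so it would not line up with the ", " separator used in the GROUP_CONCAT above.

```python
def like_prefix(value: str, prefix: str) -> bool:
    """Roughly mimic `value LIKE 'prefix%'` (case handling ignored)."""
    return value.startswith(prefix)


def has_code(codes: str, code: str, separator: str = ", ") -> bool:
    """Whole-item test: split the concatenated list and compare complete codes."""
    return code in codes.split(separator)


def describe(codes: str) -> str:
    """Branches are checked top to bottom, like WHEN clauses in a CASE."""
    if codes == "123, R321":        # most specific rule first
        return "Description2"
    if like_prefix(codes, "123"):   # broader rule afterwards
        return "Description1"
    return "Logic Needed"


samples = ["123, R321", "1234", "1231", "R999"]
results = {codes: describe(codes) for codes in samples}
print(results)

# The prefix test accepts 1234, the whole-item test does not.
print(like_prefix("1234", "123"), has_code("1234", "123"))
```

If the two branches in describe() were swapped, "123, R321" would fall into the prefix rule and come back as Description1.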

How to quote values of single column using group_concat and concat, distinct

I need to use group_concat to build a list of comma separated values but I need the values to be quoted inside single quote. How to do this? The query which I have written doesn't work for me.
I have values inside column like this:
userid (column)
1) 1,2
2) 3,4
Query 1:
SELECT GROUP_CONCAT( DISTINCT CONCAT('\'', user_id, '\'') ) as listed_id
Query 2:
SELECT GROUP_CONCAT( DISTINCT CONCAT('''', user_id, '''') ) as listed_id
Expected output:
'1','2','3','4'
But I am getting values like this
'1,2,3,4'
Try this; it works perfectly in my case:
SELECT GROUP_CONCAT( DISTINCT CONCAT("'", REPLACE(user_id, "," , "','") , "'")) as listed_id FROM users
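For anyone who wants to check the REPLACE trick quickly, here is a sketch with Python's sqlite3 module as a stand-in for MySQL. SQLite uses || instead of CONCAT() and its GROUP_CONCAT defaults to a comma separator, so the statement is an adaptation, not the query above verbatim. The users table is filled with the two rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT)")
conn.executemany("INSERT INTO users (user_id) VALUES (?)", [("1,2",), ("3,4",)])

# '''' is a single quote character in SQL; ''',''' is the three characters ','.
# Each stored value such as 1,2 therefore becomes '1','2' before aggregation.
row = conn.execute(
    """
    SELECT GROUP_CONCAT(DISTINCT '''' || REPLACE(user_id, ',', ''',''') || '''') AS listed_id
    FROM users
    """
).fetchone()

listed_id = row[0]
print(listed_id)
```

Note that DISTINCT compares the whole rewritten string of each row, so an id that appears in two different rows would presumably still show up twice.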

mysql query - how to merge tables in another table

I have a problem joining tables in the result column. I have a working query that combines different tables using UNION, but when I extend it with another table I get an error saying 'The used SELECT statements have a different number of columns'.
This is my query:
(SELECT
IDNumber,
CONCAT(LastName, ', ', FirstName, ' ', Middle) as Name,
CONCAT(EmDesignation, ', ', Department) as Information,
Image,
v.PlateNo as PlateNumber
FROM
tblemployee as e, tblvehicle as v
WHERE
v.RFIDNo LIKE '6424823943'
AND
e.RFIDNo LIKE '6424823943')
UNION
(SELECT
IDNumber,
CONCAT(LastName, ', ', FirstName, ' ', Middle) as Name,
CONCAT(Course, ', ', Year) as Information,
Image,
v.PlateNo as PlateNumber
FROM
tblstudents as s, tblvehicle as v
WHERE
v.RFIDNo LIKE '6424823943'
AND
s.RFIDNo LIKE '6424823943')
The problem is with this part, which continues the query above:
UNION
(SELECT
Barrier
FROM
tblinformation as inf
WHERE
inf.RFIDNo IN (6424823943)
ORDER BY
AttendanceNo DESC LIMIT 1)
The error message is correct. Add NULLs to the last SELECT so that it has the same number of columns as the others, and it will work.
For example:
SELECT
Barrier,
NULL,
NULL,
NULL,
NULL
FROM
tblinformation as inf
...
The error message states what the problem is. Make the number of columns in every SELECT match and it will work correctly.
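Here is a small sketch of the NULL padding, again using Python's sqlite3 as a stand-in for MySQL. The people and gates tables and their rows are hypothetical simplifications of the tables in the question, and SQLite's error text differs from MySQL's, but the rule is the same: every SELECT in a UNION needs the same number of columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (IDNumber TEXT, Name TEXT, Information TEXT);
    CREATE TABLE gates (Barrier TEXT);
    INSERT INTO people VALUES ('E-1', 'Doe, Jane', 'Clerk, Admin');
    INSERT INTO gates VALUES ('OPEN');
    """
)

# Three columns on the left, one on the right: the statement is rejected.
mismatch_error = None
try:
    conn.execute(
        "SELECT IDNumber, Name, Information FROM people "
        "UNION SELECT Barrier FROM gates"
    )
except sqlite3.OperationalError as exc:
    mismatch_error = str(exc)
print(mismatch_error)

# Padding the short SELECT with NULLs makes the column counts match.
rows = conn.execute(
    """
    SELECT IDNumber, Name, Information FROM people
    UNION ALL
    SELECT Barrier, NULL, NULL FROM gates
    """
).fetchall()
print(rows)
```

The padded row carries Barrier in the first column, so the consumer of the result has to know that this column means something different for that row.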

Average length of table column

I have a table with 50+ VARCHAR(255) columns.
The moderators report that some of the content is cut off after 250 characters in a few of the fields.
As far as I have checked, this is expected behavior for VARCHAR(255), and I have to change some of the fields to TEXT. The problem is that they cannot tell me which fields are causing problems.
So my best guess is to analyse the current data and find the columns that usually store long content.
Is there a good query structure I can use to get:
- AVG length for each column.
- Max length for each column.
- Count of rows with length 200+ for this column.
SELECT AVG(CHAR_LENGTH(col)) avg_length,
MAX(CHAR_LENGTH(col)) max_length,
COUNT(CASE WHEN CHAR_LENGTH(col) >= 200 THEN 1 ELSE NULL END) 200_plus_count
FROM tbl;
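A quick way to sanity-check the three aggregates is to run them on a tiny table. The sketch below uses Python's sqlite3 as a stand-in for MySQL, with LENGTH() in place of CHAR_LENGTH() (for text values SQLite's LENGTH() counts characters). The four rows are made up; the NULL row shows that the aggregates skip missing values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col TEXT)")
# Hypothetical rows: lengths 10, 250 and 200, plus one NULL.
conn.executemany(
    "INSERT INTO tbl (col) VALUES (?)",
    [("a" * 10,), ("b" * 250,), ("c" * 200,), (None,)],
)

avg_length, max_length, long_count = conn.execute(
    """
    SELECT AVG(LENGTH(col)),
           MAX(LENGTH(col)),
           COUNT(CASE WHEN LENGTH(col) >= 200 THEN 1 END)
    FROM tbl
    """
).fetchone()

print(avg_length, max_length, long_count)
```

COUNT only counts non-NULL values, which is why a CASE without an ELSE branch works as a conditional counter here.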
For the average, select AVG(LENGTH(column_name)); for the maximum, select MAX(LENGTH(column_name)); for the 200+ count, use SELECT COUNT(column_name) FROM table WHERE LENGTH(column_name) >= 200. Note that LENGTH() counts bytes, so use CHAR_LENGTH() if the data contains multi-byte characters. Hope this answers your question :)
I recently had reason to implement exactly this. Using similar logic as Arth's and VMai's answers, I built a stored procedure to get all column sizes for a table.
DELIMITER //
CREATE PROCEDURE ColumnSizeForTable(TableName varchar(64), SchemaName varchar(64))
BEGIN
SELECT @@group_concat_max_len INTO @group_concat_max;
SET SESSION group_concat_max_len = 100000;
SELECT
CONCAT('SELECT TRIM(TRAILING \' UNION ALL \' FROM CAST(CONCAT(',
GROUP_CONCAT(
CONCAT(
CONCAT('\'SELECT \\\'', COLUMN_NAME, '\\\' ColName,\''),
', ',
CONCAT('IFNULL(AVG(CHAR_LENGTH(',COLUMN_NAME,')),\'0\'), \' ColAverage,\''),
', ',
CONCAT('IFNULL(MAX(CHAR_LENGTH(',COLUMN_NAME,')),\'0\'), \' ColMaximum,\''),
', ',
CONCAT('IFNULL(COUNT(CASE WHEN CHAR_LENGTH(',COLUMN_NAME,') >= 200 THEN 1 ELSE NULL END),\'0\'), \' Col200Plus UNION ALL \'')
)
),
') AS CHAR)) INTO @unionquery FROM ',
TABLE_NAME,
';')
INTO @columnquery FROM
INFORMATION_SCHEMA.COLUMNS
WHERE
TABLE_NAME = TableName
AND TABLE_SCHEMA = SchemaName
GROUP BY TABLE_NAME;
PREPARE columnsizestmnt FROM @columnquery;
EXECUTE columnsizestmnt;
PREPARE unionstmnt FROM @unionquery;
EXECUTE unionstmnt;
SET SESSION group_concat_max_len = @group_concat_max;
END //
DELIMITER ;
CALL ColumnSizeForTable('TABLENAME','SCHEMANAME');
How this works:
Increase the session variable group_concat_max_len so we don't have to worry about GROUP_CONCAT being cut off.
Execute a concat query that builds another query.
This first query populates the column names.
The resulting query is put into @columnquery
Execute @columnquery. This builds the query to put the data into a readable format
This second query gets the column data, including average, max, and the 200+ count.
Similar to VMai's answer, if we weren't building another query from this, it would be a flattened result set.
The resulting query is put into @unionquery
Execute @unionquery. This SELECT is outputted to the user. It returns the column details that we were trying to collect in a single, readable format.
I'm building on Arth's good answer and integrating my comment. This query should build the query to be executed:
SELECT
CONCAT(
-- The `SELECT` keyword
'SELECT ',
-- we build our list of analyzing columns with a GROUP_CONCAT
GROUP_CONCAT(
CONCAT (
-- of the columns from Arths answer
CONCAT ('AVG(CHAR_LENGTH(', COLUMN_NAME, ')) AVG_', COLUMN_NAME), ', ',
CONCAT ('MAX(CHAR_LENGTH(', COLUMN_NAME, ')) MAX_', COLUMN_NAME), ', ',
CONCAT ('COUNT(CASE WHEN CHAR_LENGTH(', COLUMN_NAME, ') >= 200 THEN 1 ELSE NULL END) 200_plus_', COLUMN_NAME)
), ' '),
-- and add the FROM clause
' FROM ',
TABLE_NAME,
';' )
FROM
INFORMATION_SCHEMA.COLUMNS
WHERE
-- replace it by your own
TABLE_NAME = 'example10'
GROUP BY
TABLE_NAME;
Note
It's not nicely formatted, but a search for , and replace with ,\n in an editor like Notepad++ will make the statement readable. This spares you the annoying task of writing the statement by hand, and the mistakes that come with that.
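The "query that builds a query" idea can also be done from client code, which avoids the nested quoting. Below is a minimal Python sketch under some loud assumptions: sqlite3 stands in for MySQL, PRAGMA table_info stands in for INFORMATION_SCHEMA.COLUMNS, build_stats_query is a hypothetical helper, and the column names are trusted (they are pasted into the SQL unquoted). The table name example10 is the one from the answer above; its two columns are made up.

```python
import sqlite3


def build_stats_query(table: str, columns: list[str], threshold: int = 200) -> str:
    """Build one SELECT with AVG, MAX and a long-value count per column.

    Assumes table and column names are trusted identifiers.
    """
    parts = []
    for col in columns:
        parts.append(f"AVG(LENGTH({col})) AS avg_{col}")
        parts.append(f"MAX(LENGTH({col})) AS max_{col}")
        parts.append(
            f"COUNT(CASE WHEN LENGTH({col}) >= {threshold} THEN 1 END) "
            f"AS plus_{threshold}_{col}"
        )
    # One expression per line keeps the generated statement readable.
    return "SELECT\n  " + ",\n  ".join(parts) + f"\nFROM {table};"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example10 (title TEXT, body TEXT)")
conn.execute("INSERT INTO example10 VALUES (?, ?)", ("short", "x" * 220))

# Column 1 of each PRAGMA table_info row is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(example10)")]
query = build_stats_query("example10", columns)
print(query)

stats = conn.execute(query).fetchone()
print(stats)
```

Against MySQL you would presumably read the column names from INFORMATION_SCHEMA.COLUMNS and use CHAR_LENGTH() in the generated expressions.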

MySQL: Selecting rows ordered by word count

How can I order my query by word count? Is it possible?
I have some rows in table, with text fields. I want to order them by word count of these text fields.
The second problem is that I need to select only those rows that have, for example, a minimum of 10 words or a maximum of 20.
Well, this will not perform very well, since the string calculations need to be performed for all rows.
You can count the number of words in a MySQL column, row by row, like so: SELECT LENGTH(name) - LENGTH(REPLACE(name, ' ', '')) + 1 FROM table (provided that words are defined as "whatever is delimited by a single space"). Wrapping the expression in SUM() gives the total for all rows instead. That is not what you want here, and an aggregate is not allowed in a WHERE clause anyway.
Now, add the per-row expression to your query:
SELECT
<fields>
FROM
<table>
WHERE
<condition>
ORDER BY LENGTH(<fieldWithWords>) - LENGTH(REPLACE(<fieldWithWords>, ' ', '')) + 1
Or, add it to the condition:
SELECT
<fields>
FROM
<table>
WHERE
LENGTH(<fieldWithWords>) - LENGTH(REPLACE(<fieldWithWords>, ' ', '')) + 1 BETWEEN 10 AND 20
ORDER BY <something>
Maybe something like this:
SELECT Field1, SUM( LENGTH(Field2) - LENGTH(REPLACE(Field2, ' ', ''))+1)
AS cnt
FROM tablename
GROUP BY Field1
ORDER BY cnt
Field2 is the string field in which you'd like to count words.
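Here is a small runnable sketch that combines both requirements (filter to 10..20 words, then order by word count), using Python's sqlite3 as a stand-in for MySQL. LENGTH() and REPLACE() behave the same way for this purpose in both. The posts table, its body column and the sample rows are made up, and the counting still assumes words are separated by exactly one space.

```python
import sqlite3

# Per-row word count: number of spaces plus one.
WORD_COUNT = "LENGTH(body) - LENGTH(REPLACE(body, ' ', '')) + 1"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO posts (id, body) VALUES (?, ?)",
    [
        (1, "one two three"),          # 3 words, filtered out
        (2, " ".join(["w"] * 12)),     # 12 words
        (3, " ".join(["w"] * 25)),     # 25 words, filtered out
        (4, " ".join(["w"] * 10)),     # 10 words
    ],
)

rows = conn.execute(
    f"""
    SELECT id, {WORD_COUNT} AS word_count
    FROM posts
    WHERE {WORD_COUNT} BETWEEN 10 AND 20
    ORDER BY word_count
    """
).fetchall()

print(rows)
```

The expression is repeated in the WHERE clause because the word_count alias is not available there.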