SQL counting the top occurrences of substrings separated by commas in a column - mysql

I have a column in MySQL with a list of comma-separated names of varying lengths. Some example values would be: ,bob,joe,mike, or ,steve,bill,dan,.
I'm looking to sort by the names that occur the most across all rows and be able to count how many times they occur. For example, it could return that joe is the most common name with x occurrences across all rows and that bob is the second most common name with y occurrences.
Is there an effective way to go about this, or am I better off storing each name individually as its own record? This table has records added to it quite often, so if I could cut down on the size that would be ideal.

I would definitely go for storing these values as one row each in the 'name' column of a one-to-many table. That way you can use aggregate functions easily.
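As a minimal sketch of that one-row-per-name layout, the snippet below runs the aggregate query against an in-memory SQLite database from Python so it can be executed as-is. The table name record_names and its columns are hypothetical; the SELECT is plain standard SQL and should behave the same way in MySQL.

```python
import sqlite3

# Hypothetical schema: one row per (record, name) instead of ",bob,joe,mike,".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record_names (record_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO record_names (record_id, name) VALUES (?, ?)",
    [
        (1, "bob"), (1, "joe"), (1, "mike"),
        (2, "steve"), (2, "bill"), (2, "dan"),
        (3, "joe"), (3, "bob"),
        (4, "joe"),
    ],
)

# Count how often each name occurs, most frequent first.
top_names = conn.execute(
    """
    SELECT name, COUNT(*) AS occurrences
    FROM record_names
    GROUP BY name
    ORDER BY occurrences DESC, name
    """
).fetchall()

for name, occurrences in top_names:
    print(name, occurrences)
```

With this layout, an index on the name column is the usual way to keep the GROUP BY cheap as the table grows.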

Related

Understanding the use of bitwise operators in MySQL?

Can someone explain the purpose of using bitwise operators (like BIT_OR) in MySQL queries? For example, if I have a table such as the following:
What is the purpose of an aggregate operation like:
SELECT name FROM table GROUP BY name HAVING BIT_OR(value) = 0;
What exactly does BIT_OR do? I understand the underlying operation: convert two integers to binary and, for each pair of corresponding bits, produce 1 if at least one of them is 1 and 0 otherwise. But what happens with varchar or other non-numeric columns? I know, for example, that the SUM aggregate function gives me the sum of a column for each group. Likewise, what does BIT_OR tell me for each group?
NOTE: I randomly created the above table and query; it doesn't illustrate any specific problem.
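For integer columns, the per-group behavior can be modeled in a few lines of Python. This is only a sketch: the (name, value) rows are invented stand-ins for the table, and it does not cover how MySQL treats non-numeric columns.

```python
from collections import defaultdict

# Hypothetical (name, value) rows standing in for the table.
rows = [("a", 0), ("a", 0), ("b", 1), ("b", 4), ("c", 0), ("c", 2)]

# BIT_OR per group: start at 0 and OR in every value of the group.
bit_or = defaultdict(int)
for name, value in rows:
    bit_or[name] |= value

# HAVING BIT_OR(value) = 0 keeps only groups in which no bit is ever set,
# i.e. groups whose values are all 0.
kept = [name for name, combined in bit_or.items() if combined == 0]

print(dict(bit_or))
print(kept)
```

Read this way, the result for a group has a bit set whenever at least one row in the group has that bit set, so comparing it to 0 asks whether every value in the group is 0.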

How to add the numbers in comma separated string of numbers contained in a MySQL column?

I have a MySQL column which contains a string of scores separated by semicolons, e.g. "5;21;24;25;26;28;117".
This column was created not by design, but by collecting the values from multiple rows in a table using GROUP_CONCAT and GROUP BY. The original data arrived as a spreadsheet with multiple rows with the ID value.
I can use a select clause with REPLACE function to replace the ; with a +.
SELECT `values`, REPLACE(`values`, ';', '+') AS score FROM [table_name] WHERE 1
values                score
5;21;24;25;26;28;117  5+21+24+25+26+28+117
However what I need is the sum of: 5+21+24+25+26+28+117 to get a total of 246.
Is there any way to do this in MySQL without using some other scripting language?
The SELECT clause shows me a string of numbers joined with the + symbol.
I am looking for a way to evaluate that string to give me the result: 246.
UPDATE:
As I was framing my question, I did more research and came up with this link, which solves my problem:
https://dba.stackexchange.com/questions/120747/evaluate-a-string-value-as-a-computed-expression-in-an-sql-statement-sthg-like
I am keeping this question and the link to the answer here in case it helps other people searching for the same thing.
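A different route, assuming the rows that originally fed GROUP_CONCAT are still available, is to compute the total with SUM in the same grouped query rather than evaluating the joined string afterwards. The sketch below uses an in-memory SQLite database from Python so it runs as-is; the raw_scores table and its columns are hypothetical, and in MySQL the list column would be written as GROUP_CONCAT(score SEPARATOR ';').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source table: one score per row, several rows per id.
conn.execute("CREATE TABLE raw_scores (id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO raw_scores (id, score) VALUES (?, ?)",
    [(1, s) for s in (5, 21, 24, 25, 26, 28, 117)],
)

# Build the delimited list and the total in one pass over each group.
# SQLite spelling of the list; MySQL: GROUP_CONCAT(score SEPARATOR ';').
row = conn.execute(
    """
    SELECT id, group_concat(score, ';') AS score_list, SUM(score) AS total
    FROM raw_scores
    GROUP BY id
    """
).fetchone()

print(row)
```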

How do I compare all rows in a columns incrementally but also group them at the same time in access?

This is the table. I would like to compare the values within each group of rows incrementally, but I do not want to compare rows across groups, as that would produce negative values.
How do I achieve this on the same table? I am unfamiliar with subqueries.
It sounds like what you need is a Calculated Field added as a third column to your table.
Instructions:
Select the drop down menu next to "Click to Add" to create a new column.
From the choices available select Calculated Field and then Number.
The Expression Builder will open.
All you need to do from there is select the column you want to use as your primary number (in your case Number) from the Expression Categories, insert a minus sign, then select the number you want to subtract from the first number (in your case Group), again in the Expression Categories.
Name the column you created whatever you would like.

Concatenated numbers stuck in scientific notation MS Access

I am trying to concatenate two columns of numbers, Column A and Column B. Both columns are stored as numbers in the underlying table. Column A varies from 1 to 9 digits, and Column B is 2 or 3 digits. The combination of 9 digits in Column A and 3 digits in Column B forces Column A to be displayed in scientific notation before merging with Column B, resulting in an error.
So far I have tried the following expressions:
[Table].[ColumnA]&[Table].[ColumnB]
Val([Table].[ColumnA]&[Table].[ColumnB])
FormatNumber([Table].[ColumnA]&[Table].[ColumnB],0)
It seems like "&" is forcing columns A and B into text to be joined, but any formatting functions are converting after. How do I make sure both columns stay numbers before, during and after merging?
Set the field in the table to the Long Integer or Double type, and then all 9 digits should display. Do not use Single or Integer.
Or use CDbl([ColumnA]) in query.
Regardless, the result of concatenation is a string, not a number value.
You missed the last combination:
Format([Table].[ColumnA],"0") & Format([Table].[ColumnB],"0")

MySQL export of single column showing duplicate entries only once

I need to export a single column from a MySQL database which shows each entry only once. So in the following table:
id  author(s)         content
________________________________________
1   Bill, Sara, Mike  foo1
1   Sara              foo2
2   Bill, Sara, Mike  foo3
2   Sara              foo4
3   David             foo5
3   Mike              foo5
I would need to export a list of authors as "Bill, Sara, Mike, David" so that each name is shown only once.
Thanks!
UPDATE: I realize this may not be possible, so I am going to have to accept an exported list which simply eliminates any exact duplicates within the column, so the output would be: Bill, Sara, Mike, Sara, David, Mike. Any help forming this query would be appreciated.
Thanks again!
It's possible to get the resultset, but I'd really only do this to convert this to another table, with one row per author. I wouldn't want to run queries like this from application code.
The SUBSTRING_INDEX function can be used to extract the first, second, etc. author from the list, e.g.
SUBSTRING_INDEX(SUBSTRING_INDEX(authors,',', 1 ),',',-1) AS author1
SUBSTRING_INDEX(SUBSTRING_INDEX(authors,',', 2 ),',',-1) AS author2
SUBSTRING_INDEX(SUBSTRING_INDEX(authors,',', 3 ),',',-1) AS author3
But this gets messy at the end, because you get the last author when you retrieve beyond the length of the list.
So, you can either count the number of commas, with a rather ugly expression:
LENGTH(authors)-LENGTH(REPLACE(authors,',','')) AS count_commas
But it's just as easy to append a trailing comma, and then convert empty strings to NULL.
So, replace authors with:
CONCAT(authors,',')
And then wrap that in TRIM and NULLIF functions.
NULLIF(TRIM( foo ),'')
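To see why the trailing comma helps, here is a rough Python model of the expressions above. substring_index is a hand-written approximation of MySQL's SUBSTRING_INDEX for single-character delimiters, and nth_author is a hypothetical helper name; neither is part of any library.

```python
def substring_index(text, delim, count):
    """Rough Python model of MySQL's SUBSTRING_INDEX(text, delim, count)."""
    parts = text.split(delim)
    if count > 0:
        # Everything left of the count-th delimiter (whole string if too few).
        return delim.join(parts[:count])
    if count < 0:
        # Everything right of the count-th delimiter counted from the end.
        return delim.join(parts[count:])
    return ""


def nth_author(authors, n):
    """n-th list entry (1-based), or None once n runs past the end."""
    padded = authors + ","  # CONCAT(authors, ',')
    item = substring_index(substring_index(padded, ",", n), ",", -1)
    return item.strip(" ") or None  # NULLIF(TRIM(item), '')


authors = "Bill, Sara, Mike"

# Without the trailing comma, asking past the end repeats the last author.
unpadded_fourth = substring_index(substring_index(authors, ",", 4), ",", -1)

print(repr(unpadded_fourth))
print([nth_author(authors, n) for n in range(1, 6)])
```

With the padding in place, every position past the end of the list yields an empty string, which the NULLIF step turns into NULL (None here).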
Then, you can write a query that gets the first author from each row, another query that gets the second author from each row (identical to the first query, just change the '1' to a '2'), another for the third author, and so on, up to the maximum number of authors in a column value. Combine all those queries together with UNION operations (this will eliminate the duplicates for you).
So, this query:
SELECT NULLIF(TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(a.authors,','),',',1),',',-1)),'') AS author
FROM unfortunately_designed_table a
UNION
SELECT NULLIF(TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(a.authors,','),',',2),',',-1)),'')
FROM unfortunately_designed_table a
UNION
SELECT NULLIF(TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(a.authors,','),',',3),',',-1)),'')
FROM unfortunately_designed_table a
UNION
SELECT NULLIF(TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(a.authors,','),',',4),',',-1)),'')
FROM unfortunately_designed_table a
This will return a resultset of unique author names (and undoubtedly a NULL). That only gets the first four authors in each list; you'd need to extend it to get the fifth, sixth, etc.
You can get the maximum count of entries in that column by finding the maximum number of commas and adding 1:
SELECT MAX(LENGTH(a.authors)-LENGTH(REPLACE(a.authors,',','')))+1 AS max_count
FROM unfortunately_designed_table a
That lets you know how far you need to extend the query above to get all of the author values (at the particular point in time you run the query; nothing prevents someone from adding another author to the list within a column at a later time).
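The same bookkeeping can be sketched in Python against the sample rows from the question. The comma-count arithmetic mirrors the MAX(...) expression (Python's len counts characters where MySQL's LENGTH counts bytes, which is the same thing for ASCII data), while entry_at is a hypothetical helper that uses a plain split in place of the SUBSTRING_INDEX expression; the set plays the role of UNION.

```python
# Author lists taken from the sample table in the question.
rows = ["Bill, Sara, Mike", "Sara", "Bill, Sara, Mike", "Sara", "David", "Mike"]

# MAX(LENGTH(authors) - LENGTH(REPLACE(authors, ',', ''))) + 1
max_count = max(len(a) - len(a.replace(",", "")) for a in rows) + 1


def entry_at(authors, position):
    """Entry at a 1-based position, or None past the end of the list."""
    parts = [p.strip(" ") for p in authors.split(",")]
    if position > len(parts) or parts[position - 1] == "":
        return None
    return parts[position - 1]


# One pass per position, like one SELECT per position joined by UNION;
# the set removes duplicates the way UNION does.
distinct_authors = set()
for position in range(1, max_count + 1):
    for authors in rows:
        distinct_authors.add(entry_at(authors, position))
distinct_authors.discard(None)  # WHERE author IS NOT NULL

print(max_count)
print(sorted(distinct_authors))
```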
After all the work to get distinct author values on separate rows, you'd probably want to leave them in a list like that. It's easier to work with.
But, of course, it's also possible to convert that resultset back into a comma-delimited list, though the length of the string returned is limited by the group_concat_max_len system variable (1024 by default), and the effective maximum is in turn constrained by max_allowed_packet.
To get it back as a single row, with a comma-separated list, take that whole mess of a query from above, wrap it in parens as an inline view, give it an alias, and use the GROUP_CONCAT function:
SELECT GROUP_CONCAT(d.author ORDER BY d.author) AS distinct_authors
FROM (
...
) d
WHERE d.author IS NOT NULL
If you think all of these expressions are ugly, and there should be an easier way to do this, unfortunately (aside from writing procedural code), there really isn't. The relational database is designed to handle information in tuples (rows), with each row representing one entity. Stuffing multiple entities or values into a single column goes against relational design. As such, SQL does not provide a simple way to extract values from a string into separate tuples, which is why the code to do this is so messy.