MySQL export of single column showing duplicate entries only once - mysql

I need to export a single column from a MySQL database which shows each entry only once. So in the following table:
id  author(s)         content
________________________________________
1   Bill, Sara, Mike  foo1
1   Sara              foo2
2   Bill, Sara, Mike  foo3
2   Sara              foo4
3   David             foo5
3   Mike              foo5
I would need to export a list of authors as "Bill, Sara, Mike, David" so that each name is shown only once.
Thanks!
UPDATE: I realize this may not be possible, so I am going to have to accept an exported list which simply eliminates any exact duplicates within the column, so the output would be: "Bill, Sara, Mike, Sara, David, Mike". Any help forming this query would be appreciated.
Thanks again!

It's possible to get the resultset, but I'd really only do this to convert this to another table, with one row per author. I wouldn't want to run queries like this from application code.
The SUBSTRING_INDEX function can be used to extract the first, second, etc. author from the list, e.g.
SUBSTRING_INDEX(SUBSTRING_INDEX(authors,',', 1 ),',',-1) AS author1
SUBSTRING_INDEX(SUBSTRING_INDEX(authors,',', 2 ),',',-1) AS author2
SUBSTRING_INDEX(SUBSTRING_INDEX(authors,',', 3 ),',',-1) AS author3
But this gets messy at the end, because you get the last author when you retrieve beyond the length of the list.
So, you can count the number of commas, with a rather ugly expression:
LENGTH(authors)-LENGTH(REPLACE(authors,',','')) AS count_commas
But it's just as easy to append a trailing comma, and then convert the resulting empty strings to NULL.
So, replace authors with:
CONCAT(authors,',')
And then wrap that in TRIM and NULLIF functions.
NULLIF(TRIM( foo ),'')
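To see why the trailing comma helps, here is a minimal Python sketch that mimics the documented behaviour of SUBSTRING_INDEX. The helper names substring_index and nth_author are hypothetical and exist only for this illustration; it is a rough stand-in, not how MySQL implements the function.

```python
def substring_index(s, delim, count):
    """Rough Python stand-in for MySQL's SUBSTRING_INDEX(s, delim, count)."""
    if count == 0 or delim == "":
        return ""
    parts = s.split(delim)
    if count > 0:
        # Everything left of the count-th delimiter (whole string if there are fewer).
        return delim.join(parts[:count])
    # Negative count: everything right of the count-th delimiter from the end.
    return delim.join(parts[count:])


def nth_author(authors, n):
    """Mirror NULLIF(TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(authors, ','), ',', n), ',', -1)), '')."""
    with_comma = authors + ","  # CONCAT(authors, ',')
    piece = substring_index(substring_index(with_comma, ",", n), ",", -1)
    piece = piece.strip(" ")    # TRIM
    return piece or None        # NULLIF(..., '')


# Without the trailing comma, asking for a 4th entry repeats the last author.
repeated = substring_index(substring_index("Bill, Sara, Mike", ",", 4), ",", -1).strip(" ")

# With the trailing comma, positions past the end come back as None (NULL).
third = nth_author("Bill, Sara, Mike", 3)
fourth = nth_author("Bill, Sara, Mike", 4)
```

Here repeated is "Mike" again, while third is "Mike" and fourth is None, which is the behaviour the trailing comma is meant to produce.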
Then, you can write a query that gets the first author from each row, another query that gets the second author from each row (identical to the first query, just change the '1' to a '2'), another for the third author, etc., up to the maximum number of authors in a column value. Combine all those queries together with UNION operations (this will eliminate the duplicates for you).
So, this query:
SELECT NULLIF(TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(a.authors,','),',',1),',',-1)),'') AS author
FROM unfortunately_designed_table a
UNION
SELECT NULLIF(TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(a.authors,','),',',2),',',-1)),'')
FROM unfortunately_designed_table a
UNION
SELECT NULLIF(TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(a.authors,','),',',3),',',-1)),'')
FROM unfortunately_designed_table a
UNION
SELECT NULLIF(TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(a.authors,','),',',4),',',-1)),'')
FROM unfortunately_designed_table a
This will return a resultset of unique author names (and undoubtedly a NULL). That's only getting the first four authors in the list; you'd need to extend it to get the fifth, sixth, etc.
You can get the maximum count of entries in that column by finding the maximum number of commas, and adding 1
SELECT MAX(LENGTH(a.authors)-LENGTH(REPLACE(a.authors,',','')))+1 AS max_count
FROM unfortunately_designed_table a
That lets you know how far you need to extend the query above to get all of the author values (at the particular point in time you run the query; nothing prevents someone from adding another author to the list within a column at a later time).
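The comma-counting expression uses only LENGTH and REPLACE, which SQLite also provides, so it can be tried out with Python's sqlite3 module. This is a sketch under that assumption: SQLite stands in for MySQL here, and the rows are the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE unfortunately_designed_table (id INTEGER, authors TEXT, content TEXT)"
)
conn.executemany(
    "INSERT INTO unfortunately_designed_table (id, authors, content) VALUES (?, ?, ?)",
    [
        (1, "Bill, Sara, Mike", "foo1"),
        (1, "Sara", "foo2"),
        (2, "Bill, Sara, Mike", "foo3"),
        (2, "Sara", "foo4"),
        (3, "David", "foo5"),
        (3, "Mike", "foo5"),
    ],
)

# Number of commas = length with commas minus length without them; entries = commas + 1.
(max_count,) = conn.execute(
    """
    SELECT MAX(LENGTH(a.authors) - LENGTH(REPLACE(a.authors, ',', ''))) + 1 AS max_count
    FROM unfortunately_designed_table a
    """
).fetchone()
```

For the sample rows max_count is 3, so three UNIONed SELECTs would be enough for this data.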
After all the work to get distinct author values on separate rows, you'd probably want to leave them in a list like that. It's easier to work with.
But, of course, it's also possible to convert that resultset back into a comma-delimited list, though the length of the string GROUP_CONCAT returns is limited by the group_concat_max_len session variable (1024 by default), which in turn is effectively capped by max_allowed_packet.
To get it back as a single row, with a comma-separated list, take that whole mess of a query from above, wrap it in parens as an inline view, give it an alias, and use the GROUP_CONCAT function.
SELECT GROUP_CONCAT(d.author ORDER BY d.author) AS distinct_authors
FROM (
...
) d
WHERE d.author IS NOT NULL
If you think all of these expressions are ugly, and there should be an easier way to do this, unfortunately (aside from writing procedural code), there really isn't. The relational database is designed to handle information in tuples (rows), with each row representing one entity. Stuffing multiple entities or values into a single column goes against relational design. As such, SQL does not provide a simple way to extract values from a string into separate tuples, which is why the code to do this is so messy.
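For comparison, the procedural route mentioned above is short once the column values have been fetched into application code. A minimal Python sketch, assuming the values arrive as plain strings; the function name distinct_authors is hypothetical:

```python
def distinct_authors(column_values):
    """Split each comma-separated value and keep each name once, in first-seen order."""
    seen = {}  # dicts preserve insertion order, so this doubles as an ordered set
    for value in column_values:
        for name in value.split(","):
            name = name.strip()
            if name:
                seen.setdefault(name, None)
    return list(seen)


# Sample values from the authors column in the question.
authors_column = [
    "Bill, Sara, Mike",
    "Sara",
    "Bill, Sara, Mike",
    "Sara",
    "David",
    "Mike",
]
exported = ", ".join(distinct_authors(authors_column))
```

With the sample data, exported is "Bill, Sara, Mike, David".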

Related

MYSQL select all record if contains specific number

I have a small problem, I have a table like this:
id|name|group|date_created
1|Volvo|1,3|06-04-2020 10:00:00
2|Audi|3|06-04-2020 10:00:00
etc....
Now I wish I could get all the records that have the value 1 inside the group column.
I tried LIKE "%1%", but I don't think it's a good query. Can you point me in the right direction?
SELECT id FROM cars WHERE group LIKE '%1%'
The problem with your query is that it would wrongly match '1' against a list like '45,12,5' for example.
One method is to add commas on both ends before searching:
where concat(',', `group`, ',') like '%,1,%';
But in MySQL, it is much more convenient to use the string function find_in_set(), whose purpose is just what you are looking for, i.e. searching for a value in a comma-separated list:
select id from cars where find_in_set('1', `group`) > 0
Notes:
You should fix your data model, and have a separate table to store the relationship between ids and groups, with each tuple on a separate row. Related reading: Is storing a delimited list in a database column really that bad?
group is a reserved word in MySQL, so it is not a good choice for a column name (you would need to surround it with backticks every time you use it, which is error-prone).
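The difference between the naive LIKE and the comma-wrapped version can be checked with a small Python sqlite3 sketch. SQLite is only a stand-in for MySQL here: it concatenates with || rather than concat() and has no find_in_set(), and the third row is a hypothetical addition that reproduces the '45,12,5' case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "group" is a reserved word in SQLite as well, so it is quoted throughout.
conn.execute('CREATE TABLE cars (id INTEGER, name TEXT, "group" TEXT)')
conn.executemany(
    'INSERT INTO cars (id, name, "group") VALUES (?, ?, ?)',
    [
        (1, "Volvo", "1,3"),
        (2, "Audi", "3"),
        (3, "Saab", "45,12,5"),  # hypothetical row: contains the digit 1 but not the value 1
    ],
)

# Naive substring match: also hits '45,12,5' because '12' contains '1'.
naive = [row[0] for row in conn.execute(
    """SELECT id FROM cars WHERE "group" LIKE '%1%' ORDER BY id"""
)]

# Comma-wrapped match: only whole list elements can match ',1,'.
wrapped = [row[0] for row in conn.execute(
    """SELECT id FROM cars WHERE (',' || "group" || ',') LIKE '%,1,%' ORDER BY id"""
)]
```

Here naive is [1, 3] while wrapped is [1].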

Delete duplicate records in MS Access that aren't exact matches

I am working with Excel spreadsheets that I'm importing into MS Access. They include a client name, date of birth, some other personal information, and order information. The same clients often have multiple, unique orders. I am creating a table that is just unique clients (which I'll link to the order table later) and so when I import data from Excel I would like to delete duplicate client records, preserving one. I would like to match them on Name and Date of Birth. The issue I'm running into is that some client names are strings that don't match exactly.
For example:
Name DOB
---- ---
DOE,JOHN 1/1/1960
DOE,JOHN L 1/1/1960
JOHNSON,PAT 12/1/1945
SMITH,BETTY 2/1/1935
In the above set I'd like to limit it to just three records and remove an excess John Doe record.
I basically would like to only look at the client name before the space.
I wouldn't be opposed to losing the middle initial totally, so if there's a way to just chop it off, that'd work too. How can I achieve this?
Sounds like your easiest option is in fact to cut off any middle initials.
You'll want to process as follows.
Use Select DISTINCT when all done and said.
If you use the InStr function, you can search for the space after the first name.
You can then choose to select only what's left of that with the Left function (position minus 1, so as not to include the space). You'll get an error if a space isn't found, so add an IIf expression to simply output the unchanged name in that case.
After reviewing the data, you'll need to remove column 1 (in the example below) as well as insert the Expr1 code directly into the iif statement, so in the end you'll only have two columns: DOB and Expr2 (or rename AS Name)
Here's an example:
SELECT DISTINCT
Table1.Name,
Table1.DOB,
InStr(1,[Table1].[Name]," ",1) AS Expr1,
IIf([expr1]>0,Left([Table1].[Name],[Expr1]-1),[Table1].[Name]) AS Expr2
FROM Table1;
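The same InStr / Left / IIf logic, followed by the DISTINCT step, can be sketched in Python to check it against the sample rows. The function name strip_middle_initial is hypothetical, and this only illustrates the logic; it is not Access code.

```python
def strip_middle_initial(name):
    """Keep only the part of the name before the first space, if there is one."""
    pos = name.find(" ")      # like InStr, but 0-based and -1 when no space is found
    if pos != -1:
        return name[:pos]     # like Left(name, InStr(...) - 1)
    return name               # the IIf fallback: no space, keep the name as-is


rows = [
    ("DOE,JOHN", "1/1/1960"),
    ("DOE,JOHN L", "1/1/1960"),
    ("JOHNSON,PAT", "12/1/1945"),
    ("SMITH,BETTY", "2/1/1935"),
]

# SELECT DISTINCT over the trimmed name and the date of birth.
unique_clients = sorted({(strip_middle_initial(name), dob) for name, dob in rows})
```

The two John Doe rows collapse into one, leaving three clients.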
Wayne beat me to it..

Select distinct column and then count 2 columns that relate to that column in MySQL

So I have an error log that I need to analyze.
In that error log there are fields called
EVENT_ATTRIBUTE that displays the name of the device that collected that information.
EVENT_SEVERITY that displays a number from 1 to 5. In this column I need to find the number of 4's and 5's.
The problem is I need to get the distinct EVENT_ATTRIBUTES and then count all the 4's and 5's related to that specific EVENT_ATTRIBUTE and output the count.
Basically the sensors(event_attribute) detect different errors. I need to analyze how many 4's and 5's each of the sensors picks up so that I can analyze them.
I am having problems taking the distinct sensors and linking the counts to each specific sensor. I have tried this so far, but it returns the same number for 4 and 5, so I don't think I am doing it correctly.
SELECT DISTINCT LEFT(EVENT_ATTRIBUTE, locate('(', EVENT_ATTRIBUTE, 1)-1) AS
SensorName,
COUNT(CASE WHEN 'EVENT_SEVERITY' <>5 THEN 1 END) AS ERROR5,
COUNT(CASE WHEN 'EVENT_SEVERITY' <>4 THEN 1 END) AS ERROR4
FROM nodeapp.disc_event
WHERE EVENT_SEVERITY IN (5,4)
Group BY SensorName;
Here is the table that I am looking at.
Event Error Table
I'm truncating the event attribute because the IP address doesn't matter. Basically I want to make the unique event_attribute act as a primary key and count the number of 4's and 5's connected to that primary key.
With the code above I get this output: Event Result Table
Thank you for all your help!
You're very close.
DISTINCT is unnecessary when you're grouping.
You want SUM(). COUNT() simply counts everything that's not null. You can exploit the hack that a boolean expression evaluates to either 1 or 0.
SELECT LEFT(EVENT_ATTRIBUTE, LOCATE('(', EVENT_ATTRIBUTE, 1)-1) AS SensorName,
SUM(EVENT_SEVERITY = 5) ERROR_5,
SUM(EVENT_SEVERITY = 4) ERROR_4,
COUNT(*) ALL_ERRORS
FROM nodeapp.disc_event
GROUP BY LEFT(EVENT_ATTRIBUTE, LOCATE('(', EVENT_ATTRIBUTE, 1)-1);
Even if EVENT_SEVERITY values are stored as strings in your DBMS, expressions like EVENT_SEVERITY = 4 implicitly coerce them to integers.
It's generally good practice to include batch totals like COUNT(*) especially when you're debugging; they form a good sanity check that you're handling your data correctly.
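The SUM-of-a-boolean approach can be tried with Python's sqlite3 module, since SQLite also evaluates a comparison to 1 or 0. This is a sketch with assumptions: SQLite stands in for MySQL, substr() and instr() replace LEFT() and LOCATE(), the schema prefix is dropped, and the rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE disc_event (EVENT_ATTRIBUTE TEXT, EVENT_SEVERITY INTEGER)")
conn.executemany(
    "INSERT INTO disc_event (EVENT_ATTRIBUTE, EVENT_SEVERITY) VALUES (?, ?)",
    [
        ("sensorA(10.0.0.1)", 5),
        ("sensorA(10.0.0.2)", 4),
        ("sensorA(10.0.0.1)", 5),
        ("sensorB(10.0.0.3)", 4),
        ("sensorB(10.0.0.3)", 2),
    ],
)

# Each comparison yields 1 or 0, so SUM() counts the rows where it is true.
rows = conn.execute(
    """
    SELECT substr(EVENT_ATTRIBUTE, 1, instr(EVENT_ATTRIBUTE, '(') - 1) AS SensorName,
           SUM(EVENT_SEVERITY = 5) AS ERROR_5,
           SUM(EVENT_SEVERITY = 4) AS ERROR_4,
           COUNT(*) AS ALL_ERRORS
    FROM disc_event
    GROUP BY SensorName
    ORDER BY SensorName
    """
).fetchall()
```

For this data, rows is [('sensorA', 2, 1, 3), ('sensorB', 0, 1, 2)]; the ALL_ERRORS column shows that sensorB has one event that is neither a 4 nor a 5.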
The query is interpreting 'EVENT_SEVERITY' as a string; use backticks (`) to delimit the field instead. Double quotes are the standard SQL identifier delimiter, but I tend to shy away from them because they look like they should be for strings, and MySQL treats them as string delimiters unless the ANSI_QUOTES SQL mode is enabled.
Edit (for clarity): I mean it is literally interpreting 'EVENT_SEVERITY' as the string "EVENT_SEVERITY", not the underlying value of the field as a string. Compared with a number, that string is converted to 0, so both <> conditions are always true and both counts come out the same.

Where clause with one column and multiple criteria returning one row instead of 13

I have a simple query with a few rows and multiple criteria in the where clause but it is only returning one row instead of 13. No joins and the syntax was triple checked and appears to be free of errors.
Query:
select column1, column2, column3
from mydb
where onecolumn in (number1, number2....number13)
Results:
Returns one row of data associated with a random number in the where clause.
I spent a big part of the day trying to figure this one out and am now out of ideas. Please help...
Absent a more detailed test case, and the actual SQL statement that is actually running, this question cannot be answered. Here are some "ideas"...
Our first guess is that the rows you think are going to satisfy the predicates aren't actually satisfying all of the conditions.
Our second guess is that you've got an aggregate expression (COUNT(), MAX(), SUM()) in the SELECT list that's causing an implicit GROUP BY. This is a common "gotcha"... the non-standard MySQL extension to GROUP BY which allows non-aggregates to appear in the SELECT list, which are not also included as expressions in the GROUP BY clause. This same gotcha appears when the GROUP BY clause is omitted entirely, and an aggregate is included in the SELECT list.
But the question doesn't make any mention of an aggregate expression in the SELECT list.
Our third guess is another issue that beginners frequently overlook: the order of precedence of operations, especially AND and OR. For example, consider the expressions:
a AND b OR c
a AND ( b OR c )
( a AND b ) OR c
Consider those while we sing along, Sesame Street style: "One of these things is not like the others, one of these things just doesn't belong..."
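To find the odd one out, the three expressions can be evaluated with a = 0, b = 0 and c = 1. A small Python sqlite3 sketch, assuming SQLite as a stand-in (it gives AND higher precedence than OR, as MySQL and standard SQL do):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# a = 0, b = 0, c = 1 written out as literals, in the same order as the three expressions.
unparenthesized, and_with_or_group, and_group_with_or = conn.execute(
    "SELECT 0 AND 0 OR 1, 0 AND (0 OR 1), (0 AND 0) OR 1"
).fetchone()
```

The results are 1, 0 and 1: the unparenthesized form behaves like ( a AND b ) OR c, and a AND ( b OR c ) is the one that does not belong.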
A fourth guess... if it wasn't for the row being returned having a value of onecolumn as a random number in the IN list... if it was instead the first number in the IN list, we'd be very suspicious that the IN list actually contains a single string value that looks like a list of values, but is actually not.
The two expressions in the SELECT list look very similar, but they are very different:
SELECT t.n IN (2,3,5,7) AS n_in_list
, t.n IN ('2,3,5,7') AS n_in_string
FROM ( SELECT 2 AS n
UNION ALL SELECT 3
UNION ALL SELECT 5
) t
The first expression is comparing n to each value in a list of four values.
The second expression is equivalent to t.n IN (2).
This is a frequent trip up when neophytes are dynamically creating SQL text, thinking that they can pass in a string value and that MySQL will see the commas within the string as part of the SQL statement.
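When the list is built in application code, the usual way around this is one placeholder per value rather than one string holding the whole list. A minimal Python sketch using sqlite3 as a stand-in driver; the table t and its contents are hypothetical, and note that SQLite does not coerce '2,3,5,7' to 2 the way MySQL does, so the single-string version matches nothing here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t (n) VALUES (?)", [(2,), (3,), (5,)])

wanted = [2, 3, 5, 7]

# Mistake: the whole list is bound as ONE string value, i.e. n IN ('2,3,5,7').
as_one_string = ",".join(str(v) for v in wanted)
wrong = conn.execute(
    "SELECT n FROM t WHERE n IN (?) ORDER BY n", (as_one_string,)
).fetchall()

# Fix: generate one placeholder per value, i.e. n IN (?, ?, ?, ?), and bind each value.
placeholders = ", ".join("?" for _ in wanted)
right = conn.execute(
    f"SELECT n FROM t WHERE n IN ({placeholders}) ORDER BY n", wanted
).fetchall()
```

Here right contains all three matching rows, while wrong does not.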
(But this doesn't explain how the row returned matches a seemingly random number from the list rather than the first one.)
Those are all just guesses. Those are some of the most frequent trip ups we see, but we're just guessing. It could be something else entirely. In its current form, there is no definitive "answer" to the question.

Getting CSV values in mySQL

So I have some CSV values stored in a mySQL database.
For example:
ID Name parentID
1 Dave 1,4,6
2 Josh 2
3 Pete 10
4 Andy 2,10
Using this query
SELECT * FROM `table` WHERE `parentID` LIKE '%4%'
Only Dave will be returned, which is correct.
However, if I select using LIKE '%1%', Pete and Andy are selected as well as Dave, because they contain '1'.
I need the query to be able to distinguish '10', for example, from '1'.
It needs to acknowledge that each value between commas is distinct, and to allow for the fact that the last comma may be omitted.
Am I right in thinking perhaps REGEX could do the job instead?
Thanks.
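You can use a regex to match "word boundaries" (the [[:<:]] and [[:>:]] markers below work before MySQL 8.0.4; later versions use the ICU regex library, where the word-boundary marker is \b):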
WHERE parentID RLIKE '[[:<:]]4[[:>:]]'
Or you can use a special function that parses elements of a comma-separated string:
WHERE FIND_IN_SET('4', parentID) <> 0
I agree with the comment from @Nanne.
You will also find that it's better to store data not in comma-separated lists, but in a normalized fashion. I don't know if you have freedom to change your schema at this time, but for what it's worth, read my answer for the question Is storing a delimited list in a database column really that bad?
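As a rough illustration of the normalized alternative, here is a Python sqlite3 sketch that moves the comma-separated parentID values into a junction table with one row per pair. The table and column names (person, person_parent, person_id, parent_id) are hypothetical, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE person_parent (person_id INTEGER, parent_id INTEGER)")

# The original rows, with parentID still stored as a comma-separated string.
legacy_rows = [(1, "Dave", "1,4,6"), (2, "Josh", "2"), (3, "Pete", "10"), (4, "Andy", "2,10")]

for person_id, name, parent_csv in legacy_rows:
    conn.execute("INSERT INTO person (id, name) VALUES (?, ?)", (person_id, name))
    # One junction row per (person, parent) pair instead of one string per person.
    conn.executemany(
        "INSERT INTO person_parent (person_id, parent_id) VALUES (?, ?)",
        [(person_id, int(p)) for p in parent_csv.split(",")],
    )


def names_with_parent(parent_id):
    """Names of people linked to the given parent id, via a plain equality join."""
    return [row[0] for row in conn.execute(
        """
        SELECT p.name
        FROM person p
        JOIN person_parent pp ON pp.person_id = p.id
        WHERE pp.parent_id = ?
        ORDER BY p.name
        """,
        (parent_id,),
    )]
```

With this layout names_with_parent(1) returns only Dave and names_with_parent(10) returns Andy and Pete, with no string matching involved.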