MySQL 5.7 and SELECT DISTINCT JSON attributes - mysql

I need help with a selection. How can I get rows with unique attributes? For example, I have 4 JSON strings in the db:
{"name":["name1"]};
{"name":["name2", "name1"]};
{"name":["name3", "name4"]};
{"name":["name3"]};
If I just run SELECT DISTINCT data->"$.name", I get all 4 strings back, but I need to check every value and skip a row if one of its names has already appeared.
Is it possible?
I want to get just rows 1 and 3, because rows 2 and 4 contain names which we already have (I don't care about name2; in my case name2 is equivalent to name1).

I wanted to contribute this answer to the void. This returns the distinct sets of top-level JSON keys; unfortunately {'test':1} and {'test':1, 'word':1} produce two records, ["test"] and ["test", "word"], rather than one record per individual key. This may still be suitable for some.
SELECT DISTINCT JSON_KEYS(tags) as name FROM items WHERE JSON_LENGTH(tags) >= 1

SELECT DISTINCT JSON_UNQUOTE(features->"$.name[0]") as name
FROM data WHERE JSON_LENGTH(features->"$.name") = 1
So I just took the rows where the name attribute has exactly 1 item, and those can be checked for uniqueness. It's not the best solution, but I don't have another one yet.
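For the behaviour the question actually asks for (keep a row only if none of its names appeared in an earlier row), one option is to do the filtering in application code after fetching the JSON column. The following is only a sketch under assumptions: the function name rows_with_new_names is made up for illustration, rows are processed in the order they are fetched, and names from skipped rows also count as "seen".

```python
import json

def rows_with_new_names(json_rows):
    """Keep a row only if none of its names appeared in an earlier row."""
    seen = set()
    kept = []
    for raw in json_rows:
        names = json.loads(raw)["name"]
        if not any(name in seen for name in names):
            kept.append(raw)
        # Assumption: names from skipped rows are remembered too.
        seen.update(names)
    return kept

# The four sample documents from the question, in row order.
rows = [
    '{"name":["name1"]}',
    '{"name":["name2", "name1"]}',
    '{"name":["name3", "name4"]}',
    '{"name":["name3"]}',
]
kept = rows_with_new_names(rows)
for row in kept:
    print(row)
```

With the sample data this keeps rows 1 and 3, which matches the result the question describes.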

Related

I want to select rows which has a particular value in the column but the column can contain multiple values

col_1 col_2
0 ab,bc,cd
1 bc,xy
2 zz,xx
3 ab
4 cc
5 ef,kk,ok
I want to select rows that have "ab" as one of the values in col_2. For example - in this case, 0th and 3rd row will be selected.
So, is there any SQL query for that?
First, you should fix your data model. Storing multiple values in a string is just a misuse of strings. The correct data model would have a separate row for each col_1/col_2 combination.
Sometimes, we are stuck with other people's really bad decisions on data modeling. MySQL actually has a function to help deal with this, find_in_set().
You can use:
where find_in_set('ab', col_2) > 0
until you fix the data model.
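To illustrate the "separate row for each col_1/col_2 combination" model, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The table name items is hypothetical, and the splitting is done in Python only to load the sample data.

```python
import sqlite3

# Sample data from the question, still in the comma-separated form.
source = [
    (0, "ab,bc,cd"),
    (1, "bc,xy"),
    (2, "zz,xx"),
    (3, "ab"),
    (4, "cc"),
    (5, "ef,kk,ok"),
]

conn = sqlite3.connect(":memory:")
# Hypothetical normalized table: one row per (col_1, value) pair.
conn.execute("CREATE TABLE items (col_1 INTEGER, col_2 TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(key, value) for key, csv in source for value in csv.split(",")],
)

# With one value per row, the lookup is a plain equality test.
matches = [
    row[0]
    for row in conn.execute(
        "SELECT col_1 FROM items WHERE col_2 = 'ab' ORDER BY col_1"
    )
]
print(matches)
```

Once the values live in their own rows, an ordinary index on col_2 can serve this query, which is not the case for a find_in_set() search over a string.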

Select distinct column and then count 2 columns that relate to that column in MySQL

So I have an error log that I need to analyze.
In that error log there are fields called
EVENT_ATTRIBUTE that displays the name of the device that collected that information.
EVENT_SEVERITY that displays a number from 1 to 5. In this column I need to count the 4's and 5's.
The problem is I need to get the distinct EVENT_ATTRIBUTES and then count all the 4's and 5's related to that specific EVENT_ATTRIBUTE and output the count.
Basically the sensors (event_attribute) detect different errors. I need to find out how many 4's and 5's each of the sensors picks up so that I can analyze them.
I am having problems taking the distinct sensors and linking the counts to the specific sensor. I have tried this so far, but it returns the same number for 4 and 5, so I don't think I am doing it correctly.
SELECT DISTINCT LEFT(EVENT_ATTRIBUTE, locate('(', EVENT_ATTRIBUTE, 1)-1) AS
SensorName,
COUNT(CASE WHEN 'EVENT_SEVERITY' <>5 THEN 1 END) AS ERROR5,
COUNT(CASE WHEN 'EVENT_SEVERITY' <>4 THEN 1 END) AS ERROR4
FROM nodeapp.disc_event
WHERE EVENT_SEVERITY IN (5,4)
Group BY SensorName;
Here is the table that I am looking at.
Event Error Table
I'm truncating the event attribute because the IP address doesn't matter. Basically I want the unique event_attribute to act as a primary key and count the number of 4's and 5's connected to that key.
With the code above I get this output: Event Result Table
Thank you for all your help!
You're very close.
DISTINCT is unnecessary when you're grouping.
You want SUM(). COUNT() simply counts everything that's not null. You can exploit the fact that in MySQL a boolean expression evaluates to either 1 or 0, so SUM() of the expression counts the rows where it is true.
SELECT LEFT(EVENT_ATTRIBUTE, LOCATE('(', EVENT_ATTRIBUTE, 1)-1) AS SensorName,
SUM(EVENT_SEVERITY = 5) ERROR_5,
SUM(EVENT_SEVERITY = 4) ERROR_4,
COUNT(*) ALL_ERRORS
FROM nodeapp.disc_event
GROUP BY LEFT(EVENT_ATTRIBUTE, LOCATE('(', EVENT_ATTRIBUTE, 1)-1);
Even if EVENT_SEVERITY values are stored as strings in your DBMS, expressions like EVENT_SEVERITY = 4 implicitly coerce them to numbers for the comparison.
It's generally good practice to include batch totals like COUNT(*) especially when you're debugging; they form a good sanity check that you're handling your data correctly.
The query is interpreting 'EVENT_SEVERITY' as a string literal; try using backticks (`) or double quotes to delimit the field name instead. Although double quotes are the "standard" identifier delimiter, I tend to shy away from them because they look like they should be for strings (and in some configurations of MySQL they are).
Edit (for clarity): I mean it is literally interpreting 'EVENT_SEVERITY' as the string "EVENT_SEVERITY", not the underlying value of the field as a string.
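Both points (the quoted column name and SUM() over a boolean) can be reproduced on a toy table. The sketch below uses Python's sqlite3 module instead of MySQL, and the sensor column plus the five sample rows are invented for illustration; SQLite also treats 'EVENT_SEVERITY' as a string literal and also evaluates comparisons to 1 or 0, so the behaviour lines up for this case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table: sensor name already extracted.
conn.execute("CREATE TABLE disc_event (sensor TEXT, EVENT_SEVERITY INTEGER)")
conn.executemany(
    "INSERT INTO disc_event VALUES (?, ?)",
    [("A", 5), ("A", 5), ("A", 4), ("B", 4), ("B", 3)],
)

# Original shape: 'EVENT_SEVERITY' is a string literal, so both CASE
# conditions are always true and both columns count every filtered row.
quoted = conn.execute(
    """
    SELECT sensor,
           COUNT(CASE WHEN 'EVENT_SEVERITY' <> 5 THEN 1 END) AS error5,
           COUNT(CASE WHEN 'EVENT_SEVERITY' <> 4 THEN 1 END) AS error4
    FROM disc_event
    WHERE EVENT_SEVERITY IN (5, 4)
    GROUP BY sensor
    ORDER BY sensor
    """
).fetchall()

# Corrected shape: compare the column itself and sum the 1/0 results.
fixed = conn.execute(
    """
    SELECT sensor,
           SUM(EVENT_SEVERITY = 5) AS error_5,
           SUM(EVENT_SEVERITY = 4) AS error_4,
           COUNT(*) AS all_errors
    FROM disc_event
    GROUP BY sensor
    ORDER BY sensor
    """
).fetchall()

print(quoted)
print(fixed)
```

The first result shows identical numbers in both count columns, which is the symptom described in the question; the second gives separate counts per severity plus the batch total.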

MySQL export of single column showing duplicate entries only once

I need to export a single column from a MySQL database which shows each entry only once. So in the following table:
id author(s) content
________________________________________
1 Bill, Sara, Mike foo1
1 Sara foo2
2 Bill, Sara, Mike foo3
2 Sara foo4
3 David foo5
3 Mike foo5
I would need to export a list of authors as "Bill, Sara, Mike, David" so that each name is shown only once.
Thanks!
UPDATE: I realize this may not be possible, so I am going to have to accept an exported list which simply eliminates any exact duplicates within the column, so the output would be: Bill, Sara, Mike, Sara, David, Mike. Any help forming this query would be appreciated.
Thanks again!
It's possible to get the resultset, but I'd really only do this to convert this to another table, with one row per author. I wouldn't want to run queries like this from application code.
The SUBSTRING_INDEX function can be used to extract the first, second, etc. author from the list, e.g.
SUBSTRING_INDEX(SUBSTRING_INDEX(authors,',', 1 ),',',-1) AS author1
SUBSTRING_INDEX(SUBSTRING_INDEX(authors,',', 2 ),',',-1) AS author2
SUBSTRING_INDEX(SUBSTRING_INDEX(authors,',', 3 ),',',-1) AS author3
But this gets messy at the end, because you get the last author when you retrieve beyond the length of the list.
So, you can either count the number of commas, with a rather ugly expression:
LENGTH(authors)-LENGTH(REPLACE(authors,',','')) AS count_commas
Or, just as easily, you can append a trailing comma and then convert empty strings to NULL.
So, replace authors with:
CONCAT(authors,',')
And then wrap that in TRIM and NULLIF functions.
NULLIF(TRIM( foo ),'')
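The effect of the trailing comma is easier to see on a concrete value. The sketch below emulates the expression in Python; substring_index is a rough stand-in written for this example (it is not MySQL's implementation), and nth_author is a hypothetical helper name.

```python
def substring_index(s, delim, count):
    """Rough Python stand-in for MySQL's SUBSTRING_INDEX(s, delim, count)."""
    parts = s.split(delim)
    if count > 0:
        # Everything left of the count-th delimiter (whole string if fewer).
        return delim.join(parts[:count])
    if count < 0:
        # Everything right of the count-th delimiter from the end.
        return delim.join(parts[count:])
    return ""

def nth_author(authors, n, trailing_comma=True):
    """Emulate NULLIF(TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(s,',',n),',',-1)),'')."""
    s = authors + "," if trailing_comma else authors
    # str.strip() removes all surrounding whitespace; MySQL TRIM removes spaces.
    value = substring_index(substring_index(s, ",", n), ",", -1).strip()
    return value or None  # empty string becomes None, like NULLIF(..., '')

authors = "Bill, Sara, Mike"
without_fix = [nth_author(authors, n, trailing_comma=False) for n in range(1, 5)]
with_fix = [nth_author(authors, n) for n in range(1, 5)]
print(without_fix)
print(with_fix)
```

Without the trailing comma the fourth position repeats the last author; with it, positions past the end of the list come back as None (NULL in SQL).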
Then, you can write a query that gets the first author from each row, another query that gets the second author from each row (identical to the first query, just change the '1' to a '2'), another for the third author, etc., up to the maximum number of authors in a column value. Combine all those queries together with UNION operations (this will eliminate the duplicates for you).
So, this query:
SELECT NULLIF(TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(a.authors,','),',',1),',',-1)),'') AS author
FROM unfortunately_designed_table a
UNION
SELECT NULLIF(TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(a.authors,','),',',2),',',-1)),'')
FROM unfortunately_designed_table a
UNION
SELECT NULLIF(TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(a.authors,','),',',3),',',-1)),'')
FROM unfortunately_designed_table a
UNION
SELECT NULLIF(TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(a.authors,','),',',4),',',-1)),'')
FROM unfortunately_designed_table a
This will return a resultset of unique author names (and undoubtedly a NULL). That only gets the first four authors in each list; you'd need to extend it to get the fifth, sixth, etc.
You can get the maximum count of entries in that column by finding the maximum number of commas, and adding 1
SELECT MAX(LENGTH(a.authors)-LENGTH(REPLACE(a.authors,',','')))+1 AS max_count
FROM unfortunately_designed_table a
That lets you know how far you need to extend the query above to get all of the author values (at the particular point in time you run the query; nothing prevents someone from adding another author to the list within a column at a later time).
After all the work to get distinct author values on separate rows, you'd probably want to leave them in a list like that. It's easier to work with.
But, of course, it's also possible to convert that resultset back into a comma delimited list, though the length of the string returned is limited by the group_concat_max_len variable (1024 by default), whose effective maximum is in turn constrained by max_allowed_packet.
To get it back as a single row, with a comma separated list, take that whole mess of a query from above, wrap it in parens as an inline view, give it an alias, and use the GROUP_CONCAT function.
SELECT GROUP_CONCAT(d.author ORDER BY d.author) AS distinct_authors
FROM (
...
) d
WHERE d.author IS NOT NULL
If you think all of these expressions are ugly, and there should be an easier way to do this, unfortunately (aside from writing procedural code), there really isn't. The relational database is designed to handle information in tuples (rows), with each row representing one entity. Stuffing multiple entities or values into a single column goes against relational design. As such, SQL does not provide a simple way to extract values from a string into separate tuples, which is why the code to do this is so messy.
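For comparison, the procedural route mentioned above is short once the column values have been fetched into application code. This is only a sketch: distinct_authors is a hypothetical helper, and the input list is the authors column from the question's sample table.

```python
def distinct_authors(author_lists):
    """Split each comma-separated value and keep each name once, in first-seen order."""
    seen = {}
    for authors in author_lists:
        for name in authors.split(","):
            name = name.strip()
            if name:  # skip empty pieces such as a trailing comma
                seen.setdefault(name, None)
    return list(seen)

# The authors column from the question's sample table.
column = ["Bill, Sara, Mike", "Sara", "Bill, Sara, Mike", "Sara", "David", "Mike"]
unique = distinct_authors(column)
joined = ", ".join(unique)
print(joined)
```

Unlike the GROUP_CONCAT(... ORDER BY d.author) query, this keeps names in the order they are first seen; wrap the result in sorted() if alphabetical order is wanted.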

What is the equivalent in SQLite of a MySQL statement, using a sub-select that refers a field alias

The question looks more complicated than the problem itself, so here is an example.
I have a "Test" table in MySQL with only one field: ObjectId INTEGER(10).
Values in the ObjectId field:
1
1
1
2
2
I execute the following statement in MySQL query browser:
SELECT DISTINCT ObjectId obid, (SELECT count(*) FROM Test WHERE ObjectId = obid) AS cnt FROM Test
The result is what I expect: in the example above, two rows containing:
Column 1: the ObjectId value
Column 2: the number of times the respective ObjectId value appears in the column.
obid cnt
1 3
2 2
Now I execute the same statement on an identical table (structure and data) created in a SQLite database. I get an error saying "no such column: obid".
The above mentioned SELECT is quite convenient for my purpose. It would look a bit odd to be forced to replace it with a series of selects (like using a cursor). I am new to SQLite, so maybe I miss something. I've investigated a lot with zero results.
So, has anyone an idea if there is a similar, single SELECT statement, that would produce the same result on the SQLite database?
Thanks
You're doing it the complicated way. :) GROUP BY should get what you want.
SELECT ObjectID AS obid, COUNT(1) AS cnt
FROM Test
GROUP BY ObjectID
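The GROUP BY form can be checked against SQLite directly with Python's built-in sqlite3 module. This sketch loads the five sample values from the question into an in-memory table; the ORDER BY is an addition here, only to make the row order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (ObjectId INTEGER)")
conn.executemany(
    "INSERT INTO Test (ObjectId) VALUES (?)",
    [(1,), (1,), (1,), (2,), (2,)],
)

# Same query as in the answer, plus ORDER BY for a stable result order.
result = conn.execute(
    """
    SELECT ObjectId AS obid, COUNT(1) AS cnt
    FROM Test
    GROUP BY ObjectId
    ORDER BY obid
    """
).fetchall()

for obid, cnt in result:
    print(obid, cnt)
```

This produces the same two rows as the MySQL statement in the question, without referring to a column alias from inside a subquery.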

How can I add a "group" of rows and increment their "group id" in MySQL?

I have the following table my_table with primary key id set to AUTO_INCREMENT.
id group_id data_column
1 1 'data_1a'
2 2 'data_2a'
3 2 'data_2b'
I am stuck trying to build a query that will take an array of data, say ['data_3a', 'data_3b'], and appropriately increment the group_id to yield:
id group_id data_column
1 1 'data_1a'
2 2 'data_2a'
3 2 'data_2b'
4 3 'data_3a'
5 3 'data_3b'
I think it would be easy to do using a WITH clause, but this is not supported in MySQL. I am very new to SQL, so maybe I am organizing my data the wrong way? (A group is supposed to represent a group of files that were uploaded together via a form. Each row is a single file and the data column stores its path.)
The "Pseudo SQL" code I had in mind was:
INSERT INTO my_table (group_id, data_column)
VALUES ($NEXT_GROUP_ID, 'data_3a'), ($NEXT_GROUP_ID, 'data_3b')
LETTING $NEXT_GROUP_ID = (SELECT MAX(group_id) + 1 FROM my_table)
where the made up 'LETTING' clause would only evaluate once at the beginning of the query.
You can start a transaction, do a SELECT MAX(group_id)+1, and then do the inserts. Locking the table so that others can't insert into it in the meantime would also work.
I would rather have a separate table for the groups if a group represents files which belong together, especially if you might want to save metadata about the group (such as the uploading user, the date, etc.). Otherwise, in this case, you would get redundant data, which is bad most of the time.
Alternatively, MySQL does have something like variables. Check out http://dev.mysql.com/doc/refman/5.1/en/set-statement.html
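The transaction approach could look roughly like the sketch below. It uses Python's sqlite3 module as a stand-in for MySQL, so the locking details are assumptions that do not carry over directly: BEGIN IMMEDIATE is SQLite's way of taking the write lock up front, while on MySQL you would need an equivalent such as the table lock mentioned above. The function name insert_group is hypothetical.

```python
import sqlite3

def insert_group(conn, paths):
    """Insert all paths under one new group_id and return that id."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading MAX
    try:
        (next_id,) = conn.execute(
            "SELECT COALESCE(MAX(group_id), 0) + 1 FROM my_table"
        ).fetchone()
        conn.executemany(
            "INSERT INTO my_table (group_id, data_column) VALUES (?, ?)",
            [(next_id, path) for path in paths],
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return next_id

# isolation_level=None: transactions are controlled explicitly above.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE my_table ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, group_id INTEGER, data_column TEXT)"
)
conn.executemany(
    "INSERT INTO my_table (group_id, data_column) VALUES (?, ?)",
    [(1, "data_1a"), (2, "data_2a"), (2, "data_2b")],
)

new_group = insert_group(conn, ["data_3a", "data_3b"])
rows = conn.execute(
    "SELECT id, group_id, data_column FROM my_table ORDER BY id"
).fetchall()
print(new_group, rows)
```

COALESCE(..., 0) covers the empty-table case, where MAX(group_id) is NULL and the first group should get id 1.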