I hope this is not a duplicate, but I haven't been able to find a matching example in the existing answers. I expect a skilled SQL person will be able to help me easily, as the problem is most likely me.
Please note that I tried hard to understand the similar questions the system suggested, but none of them stood out as the one thing I need. I am very weak with SQL, so one of them may well have been the correct answer.
Let's have two tables:
azure_tickets:
| ticket_id | system_tags | status |
| -------- | -------------- | -------- |
| 1209 | CZ_released_2023/01/19; IT_released_2023/01/24| For Release |
| 1210 | CZ_released_2023/01/19; HU_released_2023/01/24| Closed |
azure_tickets_history_releases_eav:
| ticket_id | status | days_count |
| -------- | -------------- | -------- |
| 1209 |On Stage | 12 |
| 1210 | On Stage | 25 |
The first table gives me ticket_id and system_tags (it has more columns, but these are the ones needed for this calculation).
What I need is the average number of days for all tickets with a given system tag (for a single country only, in this case CZ) in the status "On Stage".
We are using Metabase, so I was able to work my way up to the following SQL:
SELECT
`Table A`.`Tags A1` AS `Tags A`,
`Question 172`.`Azure System Tags` AS `Tags B`,
`Question 172`.`Days On Stage` AS `Days On Stage`
FROM (
SELECT DISTINCT regexp_substr(system_tags, concat({{country}}, '_released_[^;]*')) as "Tags A1"
FROM azure_tickets
WHERE azure_tickets.system_tags LIKE concat("%", {{country}}, "_released_%")
)
`Table A`
LEFT JOIN (
SELECT
regexp_substr(`Table B`.`system_tags`, concat({{country}}, '_released_[^;]*')) AS `Azure System Tags`,
SUM(`Table A`.`days_count`) / COUNT(`Table A`.`days_count`) AS `Days On Stage`
FROM (
SELECT `azure_tickets_history_releases_eav`.`ticket_id`,
`azure_tickets_history_releases_eav`.`days_count`
FROM `azure_tickets_history_releases_eav`
WHERE `ticket_id` IN (
SELECT `azure_tickets`.`ticket_id`
FROM `azure_tickets`
WHERE `azure_tickets`.`system_tags` LIKE concat("%", 'CZ_released_2022/12/14', "%") AND
`azure_tickets`.`state` IN ("For release", 'Closed') AND
`azure_tickets`.`team_project` NOT IN ('mobile-team','cloud-infrastructure','bart-team','ipf-team','integration-backoffice-team','web-measurements-team','devops-team','ecommerce-sla','qaa-team')) AND
`country` = {{country}} AND
`status` = "On Stage"
)
`Table A`
JOIN `azure_tickets` AS `Table B` ON `Table A`.`ticket_id` = `Table B`.`ticket_id`
)
`Question 172` ON `Table A`.`Tags A1` = `Question 172`.`Azure System Tags`
Which almost gives me what I need. It looks like this:
| Tags A | Tags B | Days On Stage |
| -------- | -------------- | -------- |
| CZ_released_2022/12/14 |CZ_released_2022/12/14 | 25.74 |
| CZ_released_2022/05/12 | | |
| CZ_released_2022/07/25 | | |
| cz_released_2022/07/28 | | |
As you can see on row 23, there is a WHERE clause with a hardcoded tag (CZ_released_2022/12/14). What I need is to replace this hardcoded tag with the column value Tags A (or Tags B, it doesn't matter). However, no matter how I change the SQL, I get unknown-column errors in the subselect, for example "Unknown column 'Table A.Tags A1' in 'where clause'".
I am unsure whether the previous queries are needed (i.e. Question 172; I can split it into parts, but I think the problem is one of principle, not the subquery).
I would much appreciate your input on how to move forward with this. I am out of ideas, as I don't use SQL and Metabase very often.
So I was wrong; I thought this would be easy to solve.
Here is how I solved it:
I computed all the data on the backend.
There is a lesson here for me, and maybe for you too: if something is too complicated to calculate in the query, simplify it on the backend.
In MySQL, say I have the following table (called workers):
| id | specialty | status | name |
| :- | :-------- | :--------- | :--- |
| 1 | Bricks | Unemployed | Joe |
| 2 | Bricks | Employed | Eric |
| 3 | Bricks | Contracted | Bob |
| 4 | Tiles | Employed | Dylan |
| 5 | Tiles | Contracted | James |
In my query, say I want to find who is a prospective person for a new job. Thus, I would want to first find who is Unemployed, if no one is Unemployed, then who is only Contracted, and if no one is Contracted then at least who is Employed.
This would be GROUP BY specialty. The only methods I could figure out are either complex sub-queries or sets of UNIONs (or both). I also tried GROUP_CONCAT however this didn't work (or I didn't do it right). Googling this has not yielded any results.
Another idea is to assign a value to each category, and then do a group-wise max/min sub-query. I piloted this and it works, however seems quite messy and definitely not normalized:
SELECT
`id`,
`name`,
`status`,
-- I haven't been able to figure out how to get rid of MIN from the actual select
-- statement except by wrapping this in another sub-query, which I'm not keen on
MIN(`priority`) AS `priority`
FROM workers
INNER JOIN (
SELECT 'Unemployed' AS `status`, 0 AS `priority` FROM dual UNION
SELECT 'Contracted' AS `status`, 1 AS `priority` FROM dual UNION
SELECT 'Employed' AS `status`, 2 AS `priority` FROM dual
) AS priorities USING (`status`)
GROUP BY `specialty`;
I am looking for a more standard, efficient, normalized or versatile method of doing this.
Update:
An additional method could be to use a CASE expression in the SELECT clause of the statement. This would apply if I were to normalize the status column through a foreign-key relationship to a related table:
New table called statuses
| id | status |
| :- | :------------- |
| 1 | Employed |
| 2 | Contracted |
| 3 | Unemployed |
| 4 | Not contracted |
Diffs: 'Not Contracted' is a new status and my workers table now stores the foreign key to the new statuses table.
Then my SQL would be:
SELECT
`id`,
`name`,
statuses.status,
MIN(`priority`) AS `priority`
FROM workers
INNER JOIN (
SELECT
`id`,
`status`,
CASE
-- currently uses text in `status`,
-- could also explicitly use `id`
WHEN `status` IN ('Unemployed', 'Not Contracted') THEN 0
WHEN `status` = 'Contracted' THEN 1
WHEN `status` = 'Employed' THEN 2
ELSE 3
END AS `priority`
FROM statuses
) AS statuses ON workers.status = statuses.id
GROUP BY `specialty`;
Note: You might think - why not put the priority in the statuses table? The reason why I am not doing that is because the priority changes depending on the data needed / the purpose of the report being generated.
Potentially this is a cleaner solution (for the times that the related data to prioritize against is in another table). Again, I am looking for a more standard, efficient, normalized or versatile method of doing this. Also, if there is more of a way this could be configurable to user input / variables.
The difficulty here mainly arises because you don't have an ordinal column which ranks the various status in some order. Absent that, we can introduce one using a CASE expression, similar to what your second query is trying to do:
SELECT w1.*
FROM workers w1
INNER JOIN
(
SELECT
specialty,
MIN(CASE status WHEN 'Unemployed' THEN 1
WHEN 'Contracted' THEN 2
ELSE 3 END) AS status_rnk
FROM workers
GROUP BY specialty
) w2
ON w1.specialty = w2.specialty AND
w2.status_rnk = CASE w1.status WHEN 'Unemployed' THEN 1
WHEN 'Contracted' THEN 2
ELSE 3 END;
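To make the ranking idea above easy to try, here is a minimal sketch that runs the same CASE-based join against the sample `workers` rows from the question. It uses Python's built-in `sqlite3` module as a stand-in for MySQL (an assumption made only so the example is self-contained); the `RANK` template and variable names are hypothetical helpers, not part of the original answer.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workers (id INTEGER PRIMARY KEY, specialty TEXT, status TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO workers VALUES (?, ?, ?, ?)",
    [
        (1, "Bricks", "Unemployed", "Joe"),
        (2, "Bricks", "Employed", "Eric"),
        (3, "Bricks", "Contracted", "Bob"),
        (4, "Tiles", "Employed", "Dylan"),
        (5, "Tiles", "Contracted", "James"),
    ],
)

# The ordinal ranking is written once and reused for both sides of the join.
RANK = "CASE {col} WHEN 'Unemployed' THEN 1 WHEN 'Contracted' THEN 2 ELSE 3 END"
rank_inner = RANK.format(col="status")
rank_outer = RANK.format(col="w1.status")

query = f"""
SELECT w1.specialty, w1.name, w1.status
FROM workers w1
INNER JOIN (
    SELECT specialty, MIN({rank_inner}) AS status_rnk
    FROM workers
    GROUP BY specialty
) w2
  ON w1.specialty = w2.specialty
 AND w2.status_rnk = {rank_outer}
ORDER BY w1.specialty, w1.id
"""
best = conn.execute(query).fetchall()
print(best)  # [('Bricks', 'Joe', 'Unemployed'), ('Tiles', 'James', 'Contracted')]
```

Because the ranking lives in one string, the priority order can be changed per report without touching the table, which seems to be what the question was after.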
I have to select rows that have one kind of value in one row but not in another row, where both rows share a common key.
Sample model :
+------+---------+---------------+
| key | status | arrival_p |
+------+---------+---------------+
| k1 | failure | came |
| k1 | success | gone |
| k2 | failure | came |
| k3 | success | came |
| k3 | failure | gone |
| k4 | success | came |
| k5 | success | came |
| k2 | success | gone |
| k6 | success | gone |
+------+---------+---------------+
So in this case, all keys except k4 and k5 have come and gone. How can I find the ones that have come but not gone?
k6 has just gone, so it's an outlier; good to catch, but not an immediate priority.
I tried the query below, but it doesn't work (I know of an exact value in the actual table that matches my description, but the query returns nothing at all):
select ap1.`key`
from `arrival_pattern` ap1
left join `arrival_pattern` ap2
on ap1.`key` = ap2.`key`
where ap2.`key` is NULL
and ap1.`arrival_p` = 'came'
and ap2.`arrival_p` = 'gone'
limit 10;
Any help or pointers in the right direction as to what might be wrong in my join would be helpful. I am on MySQL.
TIA :)
Since both came and gone can appear only once for a specific key, you might as well select the elements for which a single record exists:
SELECT `key`,
COUNT(*)
FROM arrival_pattern
GROUP BY `key`
HAVING COUNT(*) = 1;
This also solves the second question ('k6 has just gone').
Also, note that key is a reserved keyword in MySQL. You should seriously consider renaming your column to something else.
Use NOT EXISTS (key is backtick-quoted because it is a reserved word):
select t1.* from arrival_pattern t1
where not exists ( select 1
from arrival_pattern t2
where t2.`key`=t1.`key`
and t2.arrival_p='gone')
You can also try the self join below:
select t1.* from arrival_pattern t1
left join
(select `key` from
arrival_pattern where arrival_p='gone'
) t2
on t1.`key`=t2.`key` where t2.`key` is null
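For comparison, here is a small sketch that runs the three suggested approaches on the sample data from the question. It uses Python's `sqlite3` module in place of MySQL (an assumption for the sake of a runnable example; SQLite also accepts backtick-quoted identifiers), and the helper `keys` is a hypothetical name. As a side note, the query in the question cannot return rows because its WHERE clause requires `ap2.key` to be NULL and `ap2.arrival_p` to equal 'gone' at the same time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arrival_pattern (`key` TEXT, status TEXT, arrival_p TEXT)")
conn.executemany(
    "INSERT INTO arrival_pattern VALUES (?, ?, ?)",
    [
        ("k1", "failure", "came"), ("k1", "success", "gone"),
        ("k2", "failure", "came"), ("k3", "success", "came"),
        ("k3", "failure", "gone"), ("k4", "success", "came"),
        ("k5", "success", "came"), ("k2", "success", "gone"),
        ("k6", "success", "gone"),
    ],
)

def keys(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# Approach 1: keys with exactly one row (also catches the 'gone only' outlier k6).
single_row = keys("""
    SELECT `key` FROM arrival_pattern
    GROUP BY `key` HAVING COUNT(*) = 1
    ORDER BY `key`
""")

# Approach 2: 'came' rows for which no 'gone' row exists.
not_exists = keys("""
    SELECT t1.`key` FROM arrival_pattern t1
    WHERE t1.arrival_p = 'came'
      AND NOT EXISTS (SELECT 1 FROM arrival_pattern t2
                      WHERE t2.`key` = t1.`key` AND t2.arrival_p = 'gone')
    ORDER BY t1.`key`
""")

# Approach 3: anti-join against the set of keys that have a 'gone' row.
left_join = keys("""
    SELECT t1.`key` FROM arrival_pattern t1
    LEFT JOIN (SELECT `key` FROM arrival_pattern WHERE arrival_p = 'gone') t2
           ON t1.`key` = t2.`key`
    WHERE t2.`key` IS NULL
    ORDER BY t1.`key`
""")

print(single_row)  # ['k4', 'k5', 'k6']
print(not_exists)  # ['k4', 'k5']
print(left_join)   # ['k4', 'k5']
```

The counting approach relies on each key having at most one 'came' and one 'gone' row, while the two anti-join forms do not depend on that.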
I have two tables with the following structures (unnecessary columns trimmed out)
----------------- ---------------------
| mod_personnel | | mod_skills |
| | | |
| - prs_id | | - prs_id |
| - name | | - skl_id |
----------------- | |
---------------------
There may be 0 to many rows in the skills table for each prs_id
What I want is all the personnel records which do NOT have an associated skill record with skl_id 1.
In plain English "I want all the people who do not have the skill x".
Currently, I have only been able to do it with the following nested select. But I am hoping to find a faster way.
SELECT * FROM `mod_personnel` WHERE `prs_id` NOT IN (
SELECT `prs_id` FROM `mod_skills` WHERE `skl_id` = 1 )
This may be faster:
SELECT `mod_personnel`.*
FROM `mod_personnel`
left outer join `mod_skills`
on `mod_skills`.`prs_id` = `mod_personnel`.`prs_id`
and `mod_skills`.`skl_id` = 1
WHERE `mod_skills`.`prs_id` is null;
Using a NOT EXISTS might be faster.
SELECT *
FROM `mod_personnel` p
WHERE NOT EXISTS (SELECT *
FROM `mod_skills` s
WHERE s.`prs_id` = p.`prs_id`
AND s.`skl_id` = 1 );
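As a quick sanity check that the three forms above select the same people, here is a minimal sketch using Python's `sqlite3` module instead of MySQL. The sample rows (Ann, Ben, Cy) are invented for illustration, and nothing here measures speed; which form is faster depends on the MySQL version and indexes, so it is worth comparing them with EXPLAIN on the real tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mod_personnel (prs_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE mod_skills (prs_id INTEGER, skl_id INTEGER);
-- Hypothetical sample data: Ann has skill 1, Ben has only skill 2, Cy has none.
INSERT INTO mod_personnel VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
INSERT INTO mod_skills VALUES (1, 1), (1, 2), (2, 2);
""")

queries = {
    "not_in": """
        SELECT prs_id, name FROM mod_personnel
        WHERE prs_id NOT IN (SELECT prs_id FROM mod_skills WHERE skl_id = 1)
        ORDER BY prs_id
    """,
    "left_join": """
        SELECT p.prs_id, p.name FROM mod_personnel p
        LEFT OUTER JOIN mod_skills s
               ON s.prs_id = p.prs_id AND s.skl_id = 1
        WHERE s.prs_id IS NULL
        ORDER BY p.prs_id
    """,
    "not_exists": """
        SELECT p.prs_id, p.name FROM mod_personnel p
        WHERE NOT EXISTS (SELECT 1 FROM mod_skills s
                          WHERE s.prs_id = p.prs_id AND s.skl_id = 1)
        ORDER BY p.prs_id
    """,
}

results = {label: conn.execute(sql).fetchall() for label, sql in queries.items()}
for label, rows in results.items():
    print(label, rows)  # each prints [(2, 'Ben'), (3, 'Cy')]
```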
I'm trying to construct a query that's driving me crazy. I had no idea where to start with solving it, but after searching around a bit I started playing with subqueries. Now I'm at the point where I'm not sure if that will solve my issue or, if it will, how to create one that does what I want.
Here's a very simplistic view of my current table (call it tbl_1):
---------------------------------
| row | name | other_names |
|-------------------------------|
| 1 | A | B, C |
| 2 | B | C |
| 3 | A | C |
| 4 | D | E |
| 5 | C | A, B |
---------------------------------
Some of the items I'm working with have multiple names (brand names, names in other countries, code names, etc.), but ultimately all of those different names refer to the same item. I originally was running a search query along the lines of:
SELECT * FROM tbl_1
WHERE name LIKE '%A%'
OR other_names LIKE '%A%';
Which would return rows 1 and 3. However, I quickly realized that my query should also return row 2, as A = B = C. How would I go about doing something like that? I'm open to alternative suggestions outside of a fancy query, such as constructing another table that somehow combines all the names into one row, but I figure something like that would be error prone or inefficient.
Additionally, I'm running MySQL 5.5.23 using InnoDB with other code written in PHP and Python.
Thanks!
Update 5/26/12:
I went back to my original thinking of using a subquery, but right when I thought I was getting somewhere I ran into a documented MySQL issue where the query is evaluated from the outside in and my subquery will be evaluated for every row and won't finish in a realistic amount of time. Here's what I was attempting to do:
SELECT * FROM tbl_1
WHERE name = ANY
(SELECT name FROM tbl_1 WHERE other_names LIKE '%A%' or name LIKE '%A%')
OR other_names = ANY
(SELECT name FROM tbl_1 WHERE other_names LIKE '%A%' or name LIKE '%A%')
Which returns what I want using the example table, but the aforementioned MySQL issue/bug causes the subquery to be considered a dependent query rather than an independent one. As a result, I haven't been able to test the query on my real table (~250,000 rows) as it eventually times out.
I've read that the main workaround for the issue is to use joins rather than subqueries, but I'm not sure how I would apply that to what I'm trying to do. The more I think about it, I might be better off running the subqueries independently using PHP/Python and using the resulting arrays to craft the main query that I want. However, I still think there is the potential to miss some results because the terms in the columns aren't nearly as nice as my example (some of the terms are multiple words, some have parentheses, the other names aren't necessarily comma-separated, etc.).
Alternatively, I'm thinking about constructing a separate table that will build the necessary links, something like:
| 1 | A | B, C|
| 2 | B | C, A|
| 3 | C | A, B|
but I think that's a lot easier said than done considering the data I'm working with and the non-standardized format in which it exists.
The route that I'm strongly considering at this point is to build a separate table with the links that are easily constructed (i.e. 1:1 ratio for name:other_names) so I don't have to deal with the formatting issues that exist in the other_names column. I may also eliminate/limit the use of LIKE and require users to know at least one exact name in order to simplify the results and probably increase the overall performance.
In conclusion, I hate working with input data that I have no control over.
I stumbled on this question by accident, so I don't know if my suggestion is relevant, but this looks like a good use case for something like a "union-find".
The SELECT would be extremely easy and fast.
But the insert and update are relatively complex, and you will probably need an in-code loop (while updated rows > 0) and several database calls.
Example for the table:
---------------------------
| row | name | group |
|-------------------------|
| 1 | A | 1 |
| 2 | B | 1 |
| 4 | C | 1 |
| 5 | D | 2 |
| 6 | X | 1 |
| 7 | Z | 2 |
---------------------------
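Selecting (group is a reserved word in MySQL, so it is backtick-quoted; IN is used because the LIKE subquery may match more than one row):
SELECT name FROM tbl WHERE `group` IN (SELECT `group` FROM tbl WHERE name LIKE '%A%')
Inserting the relation K = T (pseudocode-ish):
SELECT `group` AS gk FROM tbl WHERE name = 'K';
SELECT `group` AS gt FROM tbl WHERE name = 'T';
if (gk empty result) and (gt empty result) insert both with a new group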
---------------------------
| row | name | group |
|-------------------------|
| 1 | A | 1 |
| 2 | B | 1 |
| 4 | C | 1 |
| 5 | D | 2 |
| 6 | X | 1 |
| 7 | Z | 2 |
| 8 | K | 3 |
| 9 | T | 3 |
---------------------------
if (gk empty result) and (gt NOT empty result) insert K with group = gt
---------------------------
| row | name | group |
|-------------------------|
| 1 | A | 1 |
| 2 | B | 1 |
| 4 | C | 1 |
| 5 | D | 2 |
| 6 | X | 1 |
| 7 | Z | 2 |
| 8 | K | 2 |
| 9 | T | 2 |
---------------------------
(the same in the other case)
and when both are not empty, update one group to be the other:
UPDATE tbl SET `group` = gt WHERE `group` = gk
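The insert and merge cases described above can be sketched in a few lines. This is only an illustration under some assumptions: it uses Python's `sqlite3` module rather than MySQL, the column is called `grp` to avoid quoting the reserved word `group`, names are matched exactly rather than with LIKE, and `group_of`, `link` and `aliases` are hypothetical helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (name TEXT PRIMARY KEY, grp INTEGER NOT NULL)")

def group_of(name):
    """Return the group id of a name, or None if the name is not stored yet."""
    row = conn.execute("SELECT grp FROM tbl WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None

def link(k, t):
    """Record that names k and t refer to the same item."""
    gk, gt = group_of(k), group_of(t)
    if gk is None and gt is None:
        # Neither name is known: put both into a brand-new group.
        new_group = conn.execute("SELECT COALESCE(MAX(grp), 0) + 1 FROM tbl").fetchone()[0]
        conn.executemany("INSERT INTO tbl VALUES (?, ?)", [(n, new_group) for n in {k, t}])
    elif gk is None:
        conn.execute("INSERT INTO tbl VALUES (?, ?)", (k, gt))
    elif gt is None:
        conn.execute("INSERT INTO tbl VALUES (?, ?)", (t, gk))
    elif gk != gt:
        # Both are known but in different groups: merge group gk into gt.
        conn.execute("UPDATE tbl SET grp = ? WHERE grp = ?", (gt, gk))

def aliases(name):
    """Return every stored name in the same group as the given name."""
    rows = conn.execute(
        "SELECT name FROM tbl WHERE grp = (SELECT grp FROM tbl WHERE name = ?) ORDER BY name",
        (name,),
    ).fetchall()
    return [r[0] for r in rows]

link("A", "B")
link("D", "E")
link("C", "B")
names_a = aliases("A")
print(names_a)  # ['A', 'B', 'C']

link("E", "A")  # merges the two groups
merged = aliases("D")
print(merged)   # ['A', 'B', 'C', 'D', 'E']
```

With this layout the lookup stays a single indexed query, and all of the bookkeeping cost is paid when a relation is inserted.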
I can't think of a query that supports unlimited depth of name identity. But if you can work with a limited number of "recursions", you might consider a query similar to this one. Starting with the query you provided, you retrieve all rows with name identities:
-- CONCAT() is used because MySQL treats || as logical OR unless PIPES_AS_CONCAT is enabled
SELECT a.* FROM tbl_1 a
WHERE a.name='A'
OR a.other_names LIKE '%A%'
UNION
SELECT b.* FROM tbl_1 a
JOIN tbl_1 b ON a.other_names LIKE CONCAT('%', b.name, '%') OR b.other_names LIKE CONCAT('%', a.name, '%')
WHERE a.name='A'
OR a.other_names LIKE '%A%';
This query would return row 2, but it wouldn't return any additional rows that only have "B" among their other names in your example. So you would have to union another query:
-- CONCAT() is used because MySQL treats || as logical OR unless PIPES_AS_CONCAT is enabled
SELECT a.* FROM tbl_1 a
WHERE a.name='A'
OR a.other_names LIKE '%A%'
UNION
SELECT b.* FROM tbl_1 a
JOIN tbl_1 b ON a.other_names LIKE CONCAT('%', b.name, '%') OR b.other_names LIKE CONCAT('%', a.name, '%')
WHERE a.name='A'
OR a.other_names LIKE '%A%'
UNION
SELECT c.* FROM tbl_1 a
JOIN tbl_1 b ON (a.other_names LIKE CONCAT('%', b.name, '%') OR b.other_names LIKE CONCAT('%', a.name, '%'))
JOIN tbl_1 c ON (b.other_names LIKE CONCAT('%', c.name, '%') OR c.other_names LIKE CONCAT('%', b.name, '%'))
WHERE a.name='A'
OR a.other_names LIKE '%A%';
As you can see, the query would grow rapidly with increasing depth, and it also isn't what I would call beautiful. But it might fit your needs. I'm not very experienced with MySQL functions, but I guess you could use them to create a more elegant solution that also works with unlimited depth. You might also consider solving the problem programmatically with Python.
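As a rough idea of what the programmatic route could look like, here is a minimal Python sketch that keeps expanding the set of known names until nothing new turns up, so the depth is not limited. It assumes the rows have already been fetched into memory and that other_names is cleanly comma-separated, which the question says is not always true for the real data; `names_in` and `related_rows` are hypothetical helper names.

```python
# Sample rows from the question: (row, name, other_names).
rows = [
    (1, "A", "B, C"),
    (2, "B", "C"),
    (3, "A", "C"),
    (4, "D", "E"),
    (5, "C", "A, B"),
]

def names_in(row):
    """All names mentioned by one row (assumes comma-separated other_names)."""
    _, name, other = row
    return {name} | {n.strip() for n in other.split(",") if n.strip()}

def related_rows(rows, start):
    """Return the ids of rows reachable from `start`, plus all linked names."""
    known = {start}
    changed = True
    while changed:  # repeat until a full pass adds no new name
        changed = False
        for row in rows:
            names = names_in(row)
            if names & known and not names <= known:
                known |= names
                changed = True
    ids = [row[0] for row in rows if names_in(row) & known]
    return ids, known

ids, known = related_rows(rows, "A")
print(ids)            # [1, 2, 3, 5]
print(sorted(known))  # ['A', 'B', 'C']
```

For a table of roughly 250,000 rows, repeated full passes like this may be slow, so precomputing the groups once and storing them is probably the more practical variant.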
I want to find items in common from the "following_list" column in a table of users:
+----+--------------------+-------------------------------------+
| id | name | following_list |
+----+--------------------+-------------------------------------+
| 9 | User 1 | 26,6,12,10,21,24,19,16 |
| 10 | User 2 | 21,24 |
| 12 | User 3 | 9,20,21,26,30 |
| 16 | User 4 | 6,52,9,10 |
| 19 | User 5 | 9,10,6,24 |
| 21 | User 6 | 9,10,6,12 |
| 24 | User 7 | 9,10,6 |
| 46 | User 8 | 45 |
| 52 | User 9 | 10,12,16,21,19,20,18,17,23,25,24,22 |
+----+--------------------+-------------------------------------+
I was hoping to be able to sort by the number of matches for a given user id. For example, I want to match all users except #9 against #9 to see which of the IDs in the "following_list" column they have in common.
I found a way of doing this through the "SET" datatype and some bit trickery:
http://dev.mysql.com/tech-resources/articles/mysql-set-datatype.html#bits
However, I need to do this on an arbitrary list of IDs. I was hoping this could be done entirely through the database, but this is a little out of my league.
EDIT: Thanks for the help everybody. I'm still curious as to whether a bit-based approach could work, but the 3-table join works nicely.
SELECT a.following_id, COUNT( c.following_id ) AS matches
FROM following a
LEFT JOIN following b ON b.user_id = a.following_id
LEFT JOIN following c ON c.user_id = a.user_id
AND c.following_id = b.following_id
WHERE a.user_id = ?
GROUP BY a.following_id
Now I have to keep convincing myself not to prematurely optimize.
If you normalised your following_list column into a separate table with user_id and follower_id, then you'd find that COUNT() was extremely easy to use.
You'd also find the logic for selecting a list of followers, or a list of users being followed, much easier.
Your problem would be simplified if you could split your following_list column off into a child table, e.g.
TABLE id_following_list:
id | following
--------------
10 | 21
10 | 24
46 | 45
...| ...
You can read more here.
Normalize the table, drop the column following_list, create a table following:
user_id
following_id
Which leads to the easy-peasy query (untested, you get the point):
SELECT b.user_id, COUNT(c.following_id) AS matches
FROM following a
JOIN following b -- get followings of each user that <id> follows
ON b.user_id = a.following_id
JOIN following c -- get all followings of <id> again, match with followings of b
ON c.following_id = b.following_id
AND c.user_id = a.user_id
WHERE a.user_id = <id>
GROUP BY b.user_id
ORDER BY matches DESC
Performance may very well vary based on indexes and the size of the dataset. For fast data retrieval, you could add a 'similarity' column that is updated at regular intervals or on changes.
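To show how the normalized layout plays out on the data from the question, here is a minimal sketch using Python's `sqlite3` module as a stand-in for MySQL. Note that it uses a simpler two-way self join than the queries above: it counts shared followings between user 9 and every other user, not only the users that 9 follows. The splitting of `following_list` in Python is an assumption about how the migration might be done.

```python
import sqlite3

# following_list values copied from the users table in the question.
following_lists = {
    9: "26,6,12,10,21,24,19,16",
    10: "21,24",
    12: "9,20,21,26,30",
    16: "6,52,9,10",
    19: "9,10,6,24",
    21: "9,10,6,12",
    24: "9,10,6",
    46: "45",
    52: "10,12,16,21,19,20,18,17,23,25,24,22",
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE following ("
    " user_id INTEGER NOT NULL,"
    " following_id INTEGER NOT NULL,"
    " PRIMARY KEY (user_id, following_id))"
)
# One row per (user, followed user) pair instead of a comma-separated string.
conn.executemany(
    "INSERT INTO following VALUES (?, ?)",
    [(uid, int(fid)) for uid, ids in following_lists.items() for fid in ids.split(",")],
)

matches = conn.execute(
    """
    SELECT b.user_id, COUNT(*) AS matches
    FROM following a
    JOIN following b
      ON b.following_id = a.following_id   -- both follow the same id
     AND b.user_id <> a.user_id            -- skip the user itself
    WHERE a.user_id = ?
    GROUP BY b.user_id
    ORDER BY matches DESC, b.user_id
    """,
    (9,),
).fetchall()
print(matches)
# [(52, 6), (19, 3), (21, 3), (10, 2), (12, 2), (16, 2), (24, 2)]
```

User 46 does not appear because the inner join drops users with no followings in common; a LEFT JOIN from a users table would be needed to list them with a count of 0.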