I have an order_transactions table with 3 relevant columns: id (unique id for the transaction attempt), order_id (the id of the order for which the attempt is being made), and success (an int which is 0 if the attempt failed and 1 if it succeeded).
There can be 0 or more failed transactions before a successful transaction, for each order_id.
The question is, how do I find:
The number of orders which never had a successful transaction
The number of orders which had a transaction with a failure (eventually successful or not)
The number of orders which never had a failed transaction (success only)
I realize this is some combination of DISTINCT, GROUP BY, maybe a subselect, etc. I'm just not well versed enough in this. Thanks.
To get the number of orders which never had a successful transaction you can use:
SELECT COUNT(*)
FROM (
SELECT order_id
FROM transactions
GROUP BY order_id
HAVING COUNT(CASE WHEN success = 1 THEN 1 END) = 0) AS t
The number of orders which had a transaction with a failure (eventually successful or not) can be obtained using the query:
SELECT COUNT(*)
FROM (
SELECT order_id
FROM transactions
GROUP BY order_id
HAVING COUNT(CASE WHEN success = 0 THEN 1 END) > 0) AS t
Finally, to get the number of orders which never had a failed transaction (success only):
SELECT COUNT(*)
FROM (
SELECT order_id
FROM transactions
GROUP BY order_id
HAVING COUNT(CASE WHEN success = 0 THEN 1 END) = 0) AS t
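To check the three HAVING-based queries against known data, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example is self-contained), and the sample rows are hypothetical. The SQL itself is the same conditional-aggregation pattern as above.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, order_id INTEGER, success INTEGER)"
)
# Hypothetical sample data:
#   order 1: fail, fail, success    order 2: success
#   order 3: fail                   order 4: fail, success
conn.executemany(
    "INSERT INTO transactions (order_id, success) VALUES (?, ?)",
    [(1, 0), (1, 0), (1, 1), (2, 1), (3, 0), (4, 0), (4, 1)],
)


def count_orders(having_clause):
    """Count the order_ids whose group satisfies the given HAVING condition."""
    sql = f"""
        SELECT COUNT(*)
        FROM (SELECT order_id
              FROM transactions
              GROUP BY order_id
              HAVING {having_clause}) AS t
    """
    return conn.execute(sql).fetchone()[0]


never_succeeded = count_orders("COUNT(CASE WHEN success = 1 THEN 1 END) = 0")
had_failure = count_orders("COUNT(CASE WHEN success = 0 THEN 1 END) > 0")
never_failed = count_orders("COUNT(CASE WHEN success = 0 THEN 1 END) = 0")

print(never_succeeded, had_failure, never_failed)
```

With this sample data, only order 3 never succeeded, orders 1, 3 and 4 had a failure, and only order 2 never failed.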
You want "counts" of orders that meet specific conditions over multiple rows, so I'd start with a GROUP BY order_id
SELECT ...
FROM mytable t
GROUP BY t.order_id
To find out if a particular order ever had a failed transaction, etc. we can use aggregates on expressions that "test" for conditions.
For example:
SELECT MAX(t.success=1) AS succeeded
, MAX(t.success=0) AS failed
, IF(MAX(t.success=1),0,1) AS never_succeeded
FROM mytable t
GROUP BY t.order_id
The expressions in the SELECT list of that query are MySQL shorthand. We could use longer expressions (MySQL IF() function or ANSI CASE expressions) to achieve an equivalent result, e.g.
CASE WHEN t.success = 1 THEN 1 ELSE 0 END
We could include the `order_id` column in the SELECT list for testing. We can compare the results for each order_id to the rows in the original table, to verify that the results returned meet the specification.
To get "counts" of orders, we can reference the query as an inline view, and use aggregate expressions in the SELECT list.
For example:
SELECT SUM(r.succeeded) AS cnt_succeeded
, SUM(r.failed) AS cnt_failed
, SUM(r.never_succeeded) AS cnt_never_succeeded
FROM (
SELECT MAX(t.success=1) AS succeeded
, MAX(t.success=0) AS failed
, IF(MAX(t.success=1),0,1) AS never_succeeded
FROM mytable t
GROUP BY t.order_id
) r
Since the expressions in the SELECT list return either 0, 1 or NULL, we can use the SUM() aggregate to get a count. To make use of a COUNT() aggregate, we would need to return NULL in place of a 0 (FALSE) value.
SELECT COUNT(IF(r.succeeded,1,NULL)) AS cnt_succeeded
, COUNT(IF(r.failed,1,NULL)) AS cnt_failed
, COUNT(IF(r.never_succeeded,1,NULL)) AS cnt_never_succeeded
FROM (
SELECT MAX(t.success=1) AS succeeded
, MAX(t.success=0) AS failed
, IF(MAX(t.success=1),0,1) AS never_succeeded
FROM mytable t
GROUP BY t.order_id
) r
If you want a count of all order_id values, add a COUNT(1) expression (or, equivalently, SUM(1)) to the outer query. If you need percentages, divide by that count and multiply by 100.
For example:
SELECT SUM(r.succeeded) AS cnt_succeeded
, SUM(r.failed) AS cnt_failed
, SUM(r.never_succeeded) AS cnt_never_succeeded
, SUM(1) AS cnt_all_orders
, SUM(r.failed)/SUM(1)*100.0 AS pct_with_a_failure
, SUM(r.succeeded)/SUM(1)*100.0 AS pct_succeeded
, SUM(r.never_succeeded)/SUM(1)*100.0 AS pct_never_succeeded
FROM (
SELECT MAX(t.success=1) AS succeeded
, MAX(t.success=0) AS failed
, IF(MAX(t.success=1),0,1) AS never_succeeded
FROM mytable t
GROUP BY t.order_id
) r
(The percentages here are relative to the count of distinct order_id values, not to the total number of rows in the table.)
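As a rough check of the inline-view approach, the sketch below runs a similar query through Python's sqlite3 module (a stand-in for MySQL, which is an assumption of this example). Two details differ from the MySQL version and are worth flagging: IF() is written as a CASE expression for portability, and the percentage multiplies by 100.0 before dividing, because SQLite performs integer division on integer operands whereas MySQL's `/` does not. The sample rows are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, order_id INTEGER, success INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable (order_id, success) VALUES (?, ?)",
    [(1, 0), (1, 0), (1, 1), (2, 1), (3, 0), (4, 0), (4, 1)],
)

sql = """
    SELECT SUM(r.succeeded)                 AS cnt_succeeded,
           SUM(r.failed)                    AS cnt_failed,
           SUM(r.never_succeeded)           AS cnt_never_succeeded,
           COUNT(*)                         AS cnt_all_orders,
           100.0 * SUM(r.failed) / COUNT(*) AS pct_with_a_failure
    FROM (
        -- One row per order, with 0/1 flags computed from its transactions.
        SELECT MAX(t.success = 1) AS succeeded,
               MAX(t.success = 0) AS failed,
               CASE WHEN MAX(t.success = 1) = 1 THEN 0 ELSE 1 END AS never_succeeded
        FROM mytable t
        GROUP BY t.order_id
    ) r
"""
summary = conn.execute(sql).fetchone()
print(summary)
```

All the counts come back from a single pass over the grouped rows, which is the main attraction of this form over running one query per question.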
Orders with at least one successful transaction:
select count(*) from
( select distinct order_id from my_table where success = 1 ) as t;
Orders with at least one failed transaction:
select count(*) from
( select distinct order_id from my_table where success = 0 ) as t;
Orders that never had a failed transaction:
select count(*) from
( select distinct order_id from my_table where order_id not in
(select distinct order_id from my_table where success = 0) ) as t;
I have a table with the columns member_id, status and created_at (timestamp), and I want to extract the latest status for each member_id based on the timestamp value.
member_id | status | created_at
1 | ON | 1641862225
1 | OFF | 1641862272
2 | OFF | 1641862397
3 | OFF | 1641862401
3 | ON | 1641862402
So, my ideal query result would be like this:
member_id | status | created_at
1 | OFF | 1641862272
2 | OFF | 1641862397
3 | ON | 1641862402
My go-to process for things like this is to assign a row number to each row within its partition, sorted as needed, and then keep only row number 1.
In MySQL, window functions such as ROW_NUMBER() are only available starting with MySQL 8.0:
SELECT ROW_NUMBER() OVER(PARTITION BY member_id ORDER BY created_at DESC) as row_num,
member_id, status, created_at FROM table
This will generate something like this.
row_num | member_id | status | created_at
1 | 1 | OFF | 1641862272
2 | 1 | ON | 1641862225
1 | 2 | OFF | 1641862397
1 | 3 | ON | 1641862402
2 | 3 | OFF | 1641862401
Then you use that as a sub query and get the rows where row_num = 1
SELECT member_id, status, created_at FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY member_id ORDER BY created_at DESC) as row_num,
member_id, status, created_at FROM table
) a WHERE row_num = 1
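The row-number approach can be tried end to end with the sample data from the question. This is only a sketch: it runs on Python's built-in sqlite3 module instead of MySQL (an assumption; window functions need SQLite 3.25 or later), and the table name member_status is hypothetical, chosen because `table` is a reserved word.

```python
import sqlite3

# SQLite stands in for MySQL 8+ (assumption); member_status is a hypothetical name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE member_status (member_id INTEGER, status TEXT, created_at INTEGER)"
)
conn.executemany(
    "INSERT INTO member_status VALUES (?, ?, ?)",
    [
        (1, "ON", 1641862225),
        (1, "OFF", 1641862272),
        (2, "OFF", 1641862397),
        (3, "OFF", 1641862401),
        (3, "ON", 1641862402),
    ],
)

sql = """
    SELECT member_id, status, created_at
    FROM (
        -- Number each member's rows from newest to oldest.
        SELECT ROW_NUMBER() OVER (PARTITION BY member_id
                                  ORDER BY created_at DESC) AS row_num,
               member_id, status, created_at
        FROM member_status
    ) a
    WHERE row_num = 1
    ORDER BY member_id
"""
latest = conn.execute(sql).fetchall()
for row in latest:
    print(row)
```

If two rows for the same member share the same created_at, ROW_NUMBER() picks one of them arbitrarily; adding a tie-breaker column to the ORDER BY would make the choice deterministic.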
MySQL has supported window functions since v8.0. The solution from crimson589 is preferred for v8+; this solution applies to earlier versions of MySQL, or if you need an alternative to window functions.
After grouping by member_id, we can either join back to the original set to get the status value that corresponds to MAX(created_at):
SELECT ByMember.member_id
, status.status
, ByMember.created_at
FROM (
SELECT member_id, max(created_at) as created_at
FROM MemberStatus
GROUP BY member_id
) ByMember
JOIN MemberStatus status ON ByMember.member_id = status.member_id AND ByMember.created_at = status.created_at;
Or you could use a sub query instead of the join:
SELECT ByMember.member_id
, (SELECT status.status FROM MemberStatus status WHERE ByMember.member_id = status.member_id AND ByMember.created_at = status.created_at) as status
, ByMember.created_at
FROM (
SELECT member_id, max(created_at) as created_at
FROM MemberStatus
GROUP BY member_id
) ByMember
The JOIN based solution allows you to query additional columns from the original set instead of having multiple sub-queries. I would almost always advocate for the JOIN solution, but sometimes the sub-query is simpler to maintain.
I've set up a fiddle to compare these options: http://sqlfiddle.com/#!9/0edb931/11
You can group by member_id and max of created_at, then a self join with member_id and created_at will give you the latest status.
I am trying to merge two queries into one, but UNION is not working for me.
Here is the code:
SELECT
Customer_A,
Activity,
Customer_P,
Purchase
FROM (
SELECT
buyer_id as Customer_A,
COUNT(buyer_id) As Activity
FROM
customer_info_mxs
GROUP BY buyer_id
UNION ALL
SELECT
buyer_id as Customer_P,
SUM(purchase_amount) As Purchase
FROM
customer_info_mxs
GROUP BY buyer_id
)sub
I expect to have 4 columns as a result, but I get only 2 instead (Customer_A and Activity).
If the query is supposed to return a list of customers, their number of purchases, and the total amount they’ve spent, then you can use a single query like this:
SELECT mxs.buyer_id as Customer,
COUNT(mxs.purchase_id) As Activity,
SUM(mxs.purchase_amount) As Purchases
FROM customer_info_mxs mxs
GROUP BY mxs.buyer_id;
Otherwise, your first subquery will always be a buyer_id and a value of 1.
Be sure to change purchase_id to whatever the unique id is for each purchase if you wish to see that number.
I think there is some confusion about the union statement. The union statement returns a row set that is the sum of all of the 'unioned' queries; since these queries have only 2 columns, the combined output only has two columns. The fact that the columns have different names is irrelevant. The column names in the output are being applied from the first query of the union.
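The point about UNION stacking rows instead of adding columns can be seen directly by inspecting the result's column names. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (an assumption; both take the output column names from the first SELECT of the union), with a few hypothetical rows.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_info_mxs (buyer_id INTEGER, purchase_amount REAL)")
conn.executemany(
    "INSERT INTO customer_info_mxs VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 20.0)],
)

cur = conn.execute("""
    SELECT buyer_id AS Customer_A, COUNT(buyer_id) AS Activity
    FROM customer_info_mxs
    GROUP BY buyer_id
    UNION ALL
    SELECT buyer_id AS Customer_P, SUM(purchase_amount) AS Purchase
    FROM customer_info_mxs
    GROUP BY buyer_id
""")
# Column names of the combined result come from the first SELECT only.
column_names = [d[0] for d in cur.description]
rows = cur.fetchall()

print(column_names)  # two columns, not four
print(len(rows))     # two rows per buyer: one from each branch of the union
```

The second query's aliases (Customer_P, Purchase) never appear in the output; its rows are simply appended under the first query's column names.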
One option is to just do
select buyer_id, count(buyer_id), sum(purchase_amount) from customer_info_mxs group by buyer_id
From your question, it looks like you are trying to do a pivot, turning some of the rows into additional columns. That could be done with ... some difficulty.
I read your comment:
'main goal is to create a dataset which returns 5 columns as: Customer_A, Activity (top 100), customer_P, Purchase(top 100), inner join of activity and purchase'
Please try this query:
SET @row_number = 0, @row_number2 = 0;
SELECT t1.Customer_A, t1.Activity, t2.Customer_P, t2.Purchase
FROM (
    SELECT (@row_number := @row_number + 1) AS n, t.Customer_A, t.Activity
    FROM (
        SELECT buyer_id AS Customer_A, COUNT(buyer_id) AS Activity
        FROM customer_info_mxs
        GROUP BY buyer_id
        ORDER BY Activity DESC
        LIMIT 100
    ) t
) t1
LEFT JOIN (
    SELECT (@row_number2 := @row_number2 + 1) AS n, t.Customer_P, t.Purchase
    FROM (
        SELECT buyer_id AS Customer_P, SUM(purchase_amount) AS Purchase
        FROM customer_info_mxs
        GROUP BY buyer_id
        ORDER BY Purchase DESC
        LIMIT 100
    ) t
) t2 ON t2.n = t1.n
The basic idea is to assign a temporary row number (1 to 100) to each row of the first result (t1) and join it to the row with the same number in the second result (t2).
This query works and provides me with the information I need, but it is very slow: it takes 18 seconds to aggregate a table of only 4,000 records.
I'm bringing it here to see if anyone has any advice on how to improve it.
SELECT COUNT( status ) AS quantity, status
FROM log_table
WHERE time_stamp
IN (SELECT MAX( time_stamp ) FROM log_table GROUP BY userid )
GROUP BY status
Here's what it does/what it needs to do in plain text:
I have a table full of logs, each log contains a "userid", "status" (integer between 1-12) and "time_stamp" (a time stamp of when the log was created). There may be many entries for a particular userid, but with a different time stamp and status. I'm trying to get the most recent status (based on time_stamp) for each userid, then count the occurrences of each most-recent status among all the users.
My initial idea was to use a subquery with GROUP BY userid. That worked fast, but it always returned the first entry for each userid, not the most recent. If I could do GROUP BY userid using time_stamp DESC to identify which row should be the representative for the group, that would be great. But of course ORDER BY inside a group does not work.
Any suggestions?
The first thing to try is to make this an explicit join:
SELECT COUNT(lg.status) AS quantity, lg.status
FROM log_table lg JOIN
     (SELECT userid, MAX(time_stamp) AS maxts
      FROM log_table
      GROUP BY userid
     ) lgu
     ON lgu.userid = lg.userid AND lgu.maxts = lg.time_stamp
GROUP BY lg.status;
Another approach is to use a different where clause. This will work best if you have an index on log_table(userid, time_stamp). This approach is doing the filtering by saying "there is no timestamp bigger than this one for a given user":
SELECT COUNT(lg.status) AS quantity, lg.status
FROM log_table lg
WHERE NOT EXISTS (SELECT 1
                  FROM log_table lg2
                  WHERE lg2.userid = lg.userid AND lg2.time_stamp > lg.time_stamp
                 )
GROUP BY lg.status;
I have two tables marks and exams.
In the marks table I have studentid, mark1, mark2 and examid (a foreign key referencing exams, one row per exam).
I want to get distinct student id and their number of failures in one single query.
The condition for failure is mark1 + mark2 < 50 or mark1 < 30. For example, if the student with studentid 1 has 15 entries (15 exams) in the marks table and failed 6 of them, I want to get '1' and '6' in two columns, and similarly for all students. For this I wrote the following query using CASE:
select
distinct t1.studentid,
(@arrear:=
case
when (t1.mark1+t1.mark2) <50 OR t1.mark1 < 30
then @arrear+1 else @arrear
end) as failures
from marks t1, exams t2,
(select @arrear := 0) r
where t1.examid = t2.examid group by t1.studentid;
But the above query failed to give correct result. How can I modify the query to get correct result?
Try this. You don't need to use variables to help you.
select
m.studentid,
sum(case when m.mark1 + m.mark2 < 50 or m.mark1 < 30 then 1 else 0 end) as failures
from
marks m inner join exams e
on
m.examid = e.examid
group by
m.studentid
The case statement works out if the result is a failure or not and returns 1 for fail, 0 for no fail. Summing the result of this (grouped by studentid) gives you the number of fails per studentid
Oh and the join makes a more efficient join between your two tables :)
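A small worked example of the SUM(CASE ...) count, as a sketch: Python's sqlite3 module stands in for MySQL (an assumption), and the marks are hypothetical values chosen so that each failure rule is triggered once.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); all marks below are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exams (examid INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE marks (studentid INTEGER, mark1 INTEGER, mark2 INTEGER, examid INTEGER)"
)
conn.executemany("INSERT INTO exams VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO marks VALUES (?, ?, ?, ?)",
    [
        (1, 40, 20, 1),  # pass: total 60, mark1 >= 30
        (1, 20, 40, 2),  # fail: mark1 < 30
        (1, 30, 10, 3),  # fail: total 40 < 50
        (2, 35, 30, 1),  # pass
        (2, 45, 40, 2),  # pass
    ],
)

sql = """
    SELECT m.studentid,
           SUM(CASE WHEN m.mark1 + m.mark2 < 50 OR m.mark1 < 30
                    THEN 1 ELSE 0 END) AS failures
    FROM marks m
    INNER JOIN exams e ON m.examid = e.examid
    GROUP BY m.studentid
    ORDER BY m.studentid
"""
failures = conn.execute(sql).fetchall()
print(failures)
```

Students with no failures still appear, with a count of 0, because the CASE expression contributes 0 for every passing row.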
You don't need the variable @arrear. You can get this information with a plain aggregate query.
Try this:
select
distinct t1.studentid,
sum(
case
when (t1.mark1+t1.mark2) <50 OR t1.mark1 < 30
then 1
else 0
end
) as failures
from marks t1, exams t2
where t1.examid = t2.examid group by t1.studentid;
I want to apply a condition to GROUP BY.
When the condition city_id != 0 is true, the rows should be grouped. Otherwise they should be listed individually.
I used this query for that:
(
SELECT city_id, sum(sales) as counts
FROM product_sales
WHERE city_id !=0
GROUP BY city_id
)
UNION
(
SELECT city_id, sales
FROM product_sales
WHERE city_id =0
ORDER BY sales_id
)
Can anyone help me avoid the UNION and get the list in a single query?
One idea: GROUP BY the city_id when it is not zero; otherwise generate a unique value for grouping with UUID(), so that each row with city_id = 0 ends up in its own group.
select city_id, sum(sales)
from product_sales
group by
case when city_id = 0
then UUID()
else city_id
end
A UUID is designed as a number that is globally unique in space and
time. Two calls to UUID() are expected to generate two different
values, even if these calls are performed on two separate computers
that are not connected to each other.
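The same "one group per ungrouped row" idea can be sketched without UUID(). The example below swaps in a different technique, stated plainly: it builds the grouping key from the row's own unique sales_id instead of a random value. It runs on Python's sqlite3 module (an assumption; SQLite has no built-in UUID() function), and the sample rows are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_sales (sales_id INTEGER PRIMARY KEY, city_id INTEGER, sales INTEGER)"
)
conn.executemany(
    "INSERT INTO product_sales VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 5), (3, 0, 7), (4, 0, 7), (5, 2, 3)],
)

sql = """
    SELECT city_id, SUM(sales) AS counts
    FROM product_sales
    -- Rows with city_id = 0 get a key unique to the row, so they stay separate;
    -- all other rows share one key per city and are summed together.
    GROUP BY CASE WHEN city_id = 0
                  THEN 'row-' || sales_id
                  ELSE 'city-' || city_id
             END
    ORDER BY city_id, MIN(sales_id)
"""
grouped = conn.execute(sql).fetchall()
print(grouped)
```

The text prefixes keep a sales_id from colliding with a city_id that happens to have the same number. Using an existing unique column also makes the grouping repeatable, whereas a random key is only unique by construction.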