get the most ordered package using two database tables - mysql

I want to get the count of all packages ordered for the whole week and get the package_id of the one with the highest frequency and also has the status='active' in my package table
these are my database tables
sales
+------------+------------------+
| package_id | datesales |
+------------+------------------+
| 1 | timestamp |
| 2 | timestamp |
| 1 | timestamp |
| 1 | timestamp |
| 2 | timestamp |
| 2 | timestamp |
| 3 | timestamp |
+------------+------------------+
packages
+------------+------------------+
| package_id | status |
+------------+------------------+
| 1 | inactive |
| 2 | active |
| 3 | active |
+------------+------------------+
I tried using this sql but I'm not really good with aggregation
SELECT count(product_id) as product_id from i.sales
where [i dunno how to put the sql for package table here]
i.date(datesales) <= curdate() and
i.date(datesales) >= curdate() - interval 6 day
group by product_id
with the above example in sales table, since I have 3 counts of package_id=1 and also 3 counts of package_id=2,
I want to get the id for package_id=2 since it is the highest frequency of orders and it has the status='active' in my package table

I think you basically want order by and limit and join:
select package_id, count(*) as cnt
from sales i join
packages p
using (package_id)
where -- i.date(i.datesales) <= curdate() and -- I doubt you have future start dates
i.datesales >= curdate() - interval 6 day and
p.status = 'active'
group by package_id
order by count(*) desc
limit 1;
Here is a db<>fiddle.
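To try the join, group, order, and limit pattern without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. The table and column names come from the question. The date filter is left out because the sample rows only carry placeholder timestamps, and SQLite's date functions differ from MySQL's, so treat this as an illustration rather than the exact production query.

```python
import sqlite3

# SQLite stands in for MySQL here (an assumption made for runnability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (package_id INTEGER, datesales TEXT);
    CREATE TABLE packages (package_id INTEGER PRIMARY KEY, status TEXT);
    INSERT INTO sales (package_id) VALUES (1), (2), (1), (1), (2), (2), (3);
    INSERT INTO packages VALUES (1, 'inactive'), (2, 'active'), (3, 'active');
""")

# Count orders per package, keep only active packages, take the most frequent.
top_package = conn.execute("""
    SELECT package_id, COUNT(*) AS cnt
    FROM sales JOIN packages USING (package_id)
    WHERE packages.status = 'active'
    GROUP BY package_id
    ORDER BY cnt DESC
    LIMIT 1
""").fetchone()

print(top_package)  # (2, 3)
```

Package 1 also has three orders, but it is filtered out by the status condition before the counts are ranked.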

Related

Seek rows with incorrect dates in historic data

I have a table that is a historic log. I recently fixed a bug that was writing incorrect dates to that table: within each entity the dates should be in chronological order, but in some cases a row has a date much older than the previous one.
How can I get all the rows that are out of order for each entity_id? In the example below I should get rows 5 and 10.
The table has millions of rows and thousands of different entities. I was thinking of comparing the results of ordering by date and by id, but that is a lot of manual work.
| id | entity_id | time_stamp |
|--------|-------------|---------------|
| 1 | 7 | 2019-01-22 |
| 2 | 9 | 2019-01-05 |
| 3 | 6 | 2019-03-14 |
| 4 | 9 | 2019-04-20 |
| 5 | 6 | 2015-10-04 | WRONG
| 6 | 9 | 2019-07-15 |
| 7 | 3 | 2019-07-04 |
| 8 | 7 | 2019-06-01 |
| 9 | 6 | 2019-11-04 |
| 10 | 7 | 2019-03-04 | WRONG
Is there a function to compare against the previous date for the same entity_id? I'm completely lost here and not sure how to clean the data. The database is MySQL, by the way.
If you are running MySQL 8.0, you can use lag(); the idea is to order records by id within groups having the same entity_id, and then to filter on records where the current timestamp is smaller than the previous one:
select t.*
from (
select t.*, lag(time_stamp) over(partition by entity_id order by id) lag_time_stamp
from mytable t
) t
where time_stamp < lag_time_stamp
In earlier versions, one option is to use a correlated subquery to get the previous timestamp:
select t.*
from mytable t
where time_stamp < (
select time_stamp
from mytable t1
where t1.entity_id = t.entity_id and t1.id < t.id
order by id desc
limit 1
)
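The lag() approach can be checked against the sample rows from the question. The sketch below uses Python's sqlite3 module as a stand-in for MySQL 8.0 (it assumes the bundled SQLite is 3.25 or newer, which is when window functions were added); the table name mytable is the placeholder used in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, entity_id INTEGER, time_stamp TEXT)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    (1, 7, "2019-01-22"), (2, 9, "2019-01-05"), (3, 6, "2019-03-14"),
    (4, 9, "2019-04-20"), (5, 6, "2015-10-04"), (6, 9, "2019-07-15"),
    (7, 3, "2019-07-04"), (8, 7, "2019-06-01"), (9, 6, "2019-11-04"),
    (10, 7, "2019-03-04"),
])

# For each entity, compare every row with the previous row (ordered by id)
# and keep the rows whose timestamp goes backwards.
rows = conn.execute("""
    SELECT id
    FROM (
        SELECT t.*,
               LAG(time_stamp) OVER (PARTITION BY entity_id ORDER BY id) AS lag_time_stamp
        FROM mytable t
    ) AS t
    WHERE time_stamp < lag_time_stamp
    ORDER BY id
""").fetchall()

wrong_ids = [r[0] for r in rows]
print(wrong_ids)  # [5, 10]
```

The first row of each entity has no predecessor, so its lag value is NULL and the comparison leaves it out.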
Alternatively, use EXISTS to flag every row that has an earlier row (smaller id) for the same entity with a later timestamp:
SELECT s1.*
FROM sourcetable s1
WHERE EXISTS ( SELECT NULL
FROM sourcetable s2
WHERE s2.id < s1.id
AND s2.entity_id = s1.entity_id
AND s2.time_stamp > s1.time_stamp )
An index on (entity_id, id, time_stamp) or (entity_id, time_stamp, id) should improve performance.

Get return for the latest day

I am running a mysql - 10.1.39-MariaDB - mariadb.org binary- database.
I am having the following table:
| id | date | product_name | close |
|----|---------------------|--------------|-------|
| 1 | 2019-08-07 00:00:00 | Product 1 | 806 |
| 2 | 2019-08-06 00:00:00 | Product 1 | 982 |
| 3 | 2019-08-05 00:00:00 | Product 1 | 64 |
| 4 | 2019-08-07 00:00:00 | Product 2 | 874 |
| 5 | 2019-08-06 00:00:00 | Product 2 | 739 |
| 6 | 2019-08-05 00:00:00 | Product 2 | 555 |
| 7 | 2019-08-07 00:00:00 | Product 3 | 762 |
| 8 | 2019-08-06 00:00:00 | Product 3 | 955 |
| 9 | 2019-08-05 00:00:00 | Product 3 | 573 |
I want to get the following output:
| id | date | product_name | close | daily_return |
|----|---------------------|--------------|-------|--------------|
| 4 | 2019-08-07 00:00:00 | Product 2 | 874 | 0,182679296 |
| 1 | 2019-08-07 00:00:00 | Product 1 | 806 | -0,179226069 |
Basically I want to get the top 2 products with the highest return, where the return is calculated as (close_currentDay - close_previousDay) / close_previousDay for each product.
I tried the following:
SELECT
*,
(
CLOSE -(
SELECT
(t2.close)
FROM
prices t2
WHERE
t2.date < t1.date
ORDER BY
t2.date
DESC
LIMIT 1
)
) /(
SELECT
(t2.close)
FROM
prices t2
WHERE
t2.date < t1.date
ORDER BY
t2.date
DESC
LIMIT 1
) AS daily_return
FROM
prices t1
WHERE DATE >= DATE(NOW()) - INTERVAL 1 DAY
Which gives me the return for each product_name.
How to get the last product_name and sort this by the highest daily_return?
Problem Statement: Find the top 2 products with the highest returns on the latest date i.e. max date in the table.
Solution:
If you have an index on the date field, this will be very fast.
It scans the table only once and also uses a date filter (an index allows MySQL to process only the rows in the given date range).
A user-defined variable @old_close is used to compute the return. Note that the data must be sorted by product and date.
SELECT *
FROM (
SELECT
prices.*,
CAST((`close` - @old_close) / @old_close AS DECIMAL(20, 10)) AS daily_return, -- @old_close still holds the previous row's close here; the next column sets it to the current close
@old_close := `close` -- set @old_close to this row's close so the next row can use it
FROM prices
INNER JOIN (
SELECT
DATE(MAX(`date`)) - INTERVAL 1 DAY AS date_from, -- if you're not sure there is a row for the day before the latest date, go back 2 or 3 days instead
@old_close := 0 AS o_c
FROM prices
) AS t ON prices.date >= t.date_from
ORDER BY product_name, `date` ASC
) AS tt
ORDER BY `date` DESC, daily_return DESC
LIMIT 2;
Another version, which doesn't depend on this date parameter:
SELECT *
FROM (
SELECT
prices.*,
CAST((`close` - @old_close) / @old_close AS DECIMAL(20, 10)) AS daily_return, -- @old_close still holds the previous row's close here; the next column sets it to the current close
@old_close := `close` -- set @old_close to this row's close so the next row can use it
FROM prices,
(SELECT @old_close := 0 AS o_c) AS t
ORDER BY product_name, `date` ASC
) AS tt
ORDER BY `date` DESC, daily_return DESC
LIMIT 2
You can do it with a self join:
select
p.*,
cast((p.close - pp.close) / pp.close as decimal(20, 10)) as daily_return
from prices p left join prices pp
on p.product_name = pp.product_name
and pp.date = date_add(p.date, interval -1 day)
order by p.date desc, daily_return desc, p.product_name
limit 2
See the demo.
Results:
| id | date | product_name | close | daily_return |
| --- | ------------------- | ------------ | ----- | ------------ |
| 4 | 2019-08-07 00:00:00 | Product 2 | 874 | 0.182679296 |
| 1 | 2019-08-07 00:00:00 | Product 1 | 806 | -0.179226069 |
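The self join can be reproduced on the sample data with Python's sqlite3 module. This is only a sketch: SQLite stands in for MariaDB, so date_add(p.date, interval -1 day) is written as datetime(p.date, '-1 day'), and close is declared as REAL so the division is not truncated to an integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (id INTEGER PRIMARY KEY, date TEXT, product_name TEXT, close REAL)"
)
conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", [
    (1, "2019-08-07 00:00:00", "Product 1", 806),
    (2, "2019-08-06 00:00:00", "Product 1", 982),
    (3, "2019-08-05 00:00:00", "Product 1", 64),
    (4, "2019-08-07 00:00:00", "Product 2", 874),
    (5, "2019-08-06 00:00:00", "Product 2", 739),
    (6, "2019-08-05 00:00:00", "Product 2", 555),
    (7, "2019-08-07 00:00:00", "Product 3", 762),
    (8, "2019-08-06 00:00:00", "Product 3", 955),
    (9, "2019-08-05 00:00:00", "Product 3", 573),
])

# Pair every row with the same product's row from one day earlier,
# then rank by date (latest first) and by return (highest first).
top2 = conn.execute("""
    SELECT p.id, p.product_name, (p.close - pp.close) / pp.close AS daily_return
    FROM prices p
    LEFT JOIN prices pp
      ON pp.product_name = p.product_name
     AND pp.date = datetime(p.date, '-1 day')
    ORDER BY p.date DESC, daily_return DESC
    LIMIT 2
""").fetchall()

for row in top2:
    print(row)
```

On this data the two rows returned are id 4 (Product 2) and id 1 (Product 1), matching the results table above. Note that the join assumes a row exists for exactly the previous calendar day; gaps such as weekends would leave daily_return NULL.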

How can I retrieve all the columns on a timerange aggregation?

I am currently struggling with how to aggregate my daily data into other time aggregations (weeks, months, quarters, etc.).
Here is what my raw data looks like:
| date | traffic_type | visits |
|----------|--------------|---------|
| 20180101 | 1 | 1221650 |
| 20180101 | 2 | 411424 |
| 20180101 | 4 | 108407 |
| 20180101 | 5 | 298117 |
| 20180101 | 6 | 26806 |
| 20180101 | 7 | 12033 |
| 20180101 | 8 | 80368 |
| 20180101 | 9 | 69544 |
| 20180101 | 10 | 39919 |
| 20180101 | 11 | 26291 |
| 20180102 | 1 | 1218490 |
| 20180102 | 2 | 410965 |
| 20180102 | 4 | 108037 |
| 20180102 | 5 | 297727 |
| 20180102 | 6 | 26719 |
| 20180102 | 7 | 12019 |
| 20180102 | 8 | 80074 |
First, I would like to check the sum of visits regardless of traffic_type:
SELECT date, SUM(visits) as visits_per_day
FROM visits_tbl
GROUP BY date
Here is the outcome:
| ymd | visits_per_day |
|:--------:|:--------------:|
| 20180101 | 2294563 |
| 20180102 | 2289145 |
| 20180103 | 2300367 |
| 20180104 | 2310256 |
| 20180105 | 2368098 |
| 20180106 | 2372257 |
| 20180107 | 2373863 |
| 20180108 | 2364236 |
However, when I try to find the specific day on which visits_per_day was the highest within each time aggregation (e.g. month), I struggle to retrieve the right output.
Here is what I did:
SELECT
(date div 100) as y_month, MAX(visits_per_day) as max_visit_per_day
FROM
(SELECT date, SUM(visits) as visits_per_day
FROM visits_tbl
GROUP BY date) as t1
GROUP BY
y_month
And here is the output of my query:
| y_month | max_visit_per_day |
|:-------:|:-----------------:|
| 201801 | 2435845 |
| 201802 | 2519000 |
| 201803 | 2528097 |
| 201804 | 2550645 |
However, I cannot tell on which exact day visits_per_day was the highest.
Desired output:
| y_month | max_visit_per_day | ymd |
|:-------:|:-----------------:|:--------:|
| 201801 | 2435845 | 20180130 |
| 201802 | 2519000 | 20180220 |
| 201803 | 2528097 | 20180325 |
| 201804 | 2550645 | 20180406 |
ymd would represent the day in which the visits_per_day was the highest.
This logic would be used in a dashboard with the help of programming in order to automatically select the time aggregation.
Can someone please help me?
This is a job for the structured part of structured query language. That is, you will write some subqueries and treat them as tables.
You already know how to find the number of visits per day. Let's add the month for each day to that query (http://sqlfiddle.com/#!9/a8455e/13/0).
SELECT date DIV 100 as month, date,
SUM(visits) as visits
FROM visits_tbl
GROUP BY date
Next you need to find the largest number of daily visits in each month. (http://sqlfiddle.com/#!9/a8455e/12/0)
SELECT month, MAX(visits) max_daily_visits
FROM (
SELECT date DIV 100 as month, date,
SUM(visits) as visits
FROM visits_tbl
GROUP BY date
) dayvisits
GROUP BY month
Then, the trick is retrieving the date on which that maximum occurred in each month. That requires a join. Without common table expressions (which MySQL lacks before version 8.0) you need to repeat the first subquery. (http://sqlfiddle.com/#!9/a8455e/11/0)
SELECT detail.*
FROM (
SELECT month, MAX(visits) max_daily_visits
FROM (
SELECT date DIV 100 as month, date,
SUM(visits) as visits
FROM visits_tbl
GROUP BY date
) dayvisits
GROUP BY month
) maxvisits
JOIN (
SELECT date DIV 100 as month, date,
SUM(visits) as visits
FROM visits_tbl
GROUP BY date
) detail ON detail.visits = maxvisits.max_daily_visits
AND detail.month = maxvisits.month
The outline of this rather complex query helps explain it. Instead of that subquery, we'll use an imaginary table called dayvisits.
SELECT detail.*
FROM (
SELECT month, MAX(visits) max_daily_visits
FROM dayvisits
GROUP BY month
) maxvisits
JOIN dayvisits detail ON detail.visits = maxvisits.max_daily_visits
AND detail.month = maxvisits.month
You're seeking an extreme value for each month in the subquery. (This is a fairly standard sort of SQL operation.) To do that you find that value with a MAX() ... GROUP BY query. Then you join that to the subquery itself to find the other values corresponding to the extreme value.
If you did have common table expressions, the query would look like this. MySQL 8.0 and later support CTEs, as does the MySQL fork MariaDB from version 10.2.
WITH dayvisits AS (
SELECT date DIV 100 as month, date,
SUM(visits) as visits
FROM visits_tbl
GROUP BY date
)
SELECT dayvisits.*
FROM (
SELECT month, MAX(visits) max_daily_visits
FROM dayvisits
GROUP BY month
) maxvisits
JOIN dayvisits ON dayvisits.visits = maxvisits.max_daily_visits
AND dayvisits.month = maxvisits.month
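The CTE form of this "maximum per group, plus the row it came from" query can be exercised with Python's sqlite3 module, which also supports WITH. The rows below are made-up illustration data (two traffic types over four days), not the asker's figures, and SQLite's integer division date / 100 plays the role of MySQL's date DIV 100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits_tbl (date INTEGER, traffic_type INTEGER, visits INTEGER)")
# Hypothetical sample rows, chosen so the daily totals are easy to check by hand.
conn.executemany("INSERT INTO visits_tbl VALUES (?, ?, ?)", [
    (20180101, 1, 100), (20180101, 2, 50),   # daily total 150
    (20180102, 1, 120), (20180102, 2, 60),   # daily total 180
    (20180201, 1, 90),  (20180201, 2, 30),   # daily total 120
    (20180202, 1, 70),  (20180202, 2, 20),   # daily total 90
])

# Step 1 (CTE): total visits per day, tagged with the month.
# Step 2: the largest daily total per month.
# Step 3: join back to the daily totals to recover the day of that maximum.
best_days = conn.execute("""
    WITH dayvisits AS (
        SELECT date / 100 AS month, date, SUM(visits) AS visits
        FROM visits_tbl
        GROUP BY date
    )
    SELECT d.month, d.visits, d.date
    FROM dayvisits d
    JOIN (
        SELECT month, MAX(visits) AS max_daily_visits
        FROM dayvisits
        GROUP BY month
    ) m ON d.month = m.month AND d.visits = m.max_daily_visits
    ORDER BY d.month
""").fetchall()

print(best_days)  # [(201801, 180, 20180102), (201802, 120, 20180201)]
```

If two days in the same month tie for the maximum, the join returns both of them.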
[Query checked on MSSQL] It's quick and efficient.
select visit_sum_day_wise.date
, visit_sum_day_wise.Max_Visits
, visit_sum_day_wise.traffic_type
, LAST_VALUE(visit_sum_day_wise.visits) OVER(PARTITION BY
visit_sum_day_wise.date ORDER BY visit_sum_day_wise.date ROWS BETWEEN
UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS max_visit_per_day
from (
select visits_tbl.date , visits_tbl.visits , visits_tbl.traffic_type
,max(visits_tbl.visits ) OVER ( PARTITION BY visits_tbl.date ORDER
BY visits_tbl.date ROWS BETWEEN UNBOUNDED PRECEDING AND 0
PRECEDING) Max_visits
from visits_tbl
) as visit_sum_day_wise
where visit_sum_day_wise.visits = (select max(visits_B.visits ) from
visits_tbl visits_B where visits_B.Date = visit_sum_day_wise.date )

Using MySQL group by clause with where clause

I have two tables, one that stores product information and one that stores reviews for the products.
I am now trying to get the number of reviews submitted for the products between two dates, but for some reason I get the same results regardless of the dates I put in.
This is my query:
SELECT
productName,
COUNT(*) as `count`,
avg(rating) as `rating`
FROM `Reviews`
LEFT JOIN `Products` using(`productID`)
WHERE `date` BETWEEN '2015-07-20' AND '2015-07-30'
GROUP BY
`productName`
ORDER BY `count` DESC, `rating` DESC;
This returns:
+-------------+-------+-----------+
| productName | count | rating |
+-------------+-------+-----------+
| productA | 23 | 4.3333333 |
| productB | 17 | 4.25 |
| productC | 10 | 3.5 |
+-------------+-------+-----------+
Products table:
+---------+-------------+
|productID | productName|
+---------+-------------+
| 1 | productA |
| 2 | productB |
| 3 | productC |
+---------+-------------+
Reviews table
+---------+-----------+--------+---------------------+
|reviewID | productID | rating | date |
+---------+-----------+--------+---------------------+
| 1 | 1 | 4.5 | 2015-07-27 17:47:01|
| 2 | 1 | 3.5 | 2015-07-27 18:54:22|
| 3 | 3 | 2 | 2015-07-28 13:28:37|
| 4 | 1 | 5 | 2015-07-28 18:33:14|
| 5 | 2 | 1.5 | 2015-07-29 11:58:17|
| 6 | 2 | 3.5 | 2015-07-30 15:04:25|
| 7 | 2 | 2.5 | 2015-07-30 18:11:11|
| 8 | 1 | 3 | 2015-07-30 18:26:23|
| 9 | 1 | 3 | 2015-07-30 21:35:05|
| 10 | 1 | 4.5 | 2015-07-31 14:25:47|
| 11 | 3 | 0.5 | 2015-07-31 14:47:48|
+---------+-----------+--------+---------------------+
When I put in two random dates that I know for sure are not in the date column, I still get the same results. Even when I want to retrieve records for only a certain day, I get the same results.
You should not use left join, because by doing so you retrieve all the data from one table. What you should use is something like :
select
productName,
count(*) as `count`,
avg(rating) as `rating`
from
products p,
reviews r
where
p.productID = r.productID
and `date` between '2015-07-20' and '2015-07-30'
group by productName
order by count desc, rating desc;
If the result, given your sample data, that you're looking for is:
| productName | count | rating |
|-------------|-------|--------|
| productA | 5 | 3.8 |
| productB | 3 | 2.5 |
| productC | 1 | 2 |
This is the count and average of reviews made on any date between 2015-07-20 and 2015-07-30 inclusive.
Then there are two issues with your query. First, you need to change the join to an inner join instead of a left join, but more importantly you need to change the date condition, as you are currently excluding reviews that fall on the last date of the range but after midnight.
This happens because your between clause compares datetime values with date values, so the comparison ends up being date between '2015-07-20 00:00:00' and '2015-07-30 00:00:00', which excludes everything after midnight on the last day.
The fix is to either change the date condition so that the end is a day later:
where date >= '2015-07-20' and date < '2015-07-31'
or cast the date column to a date value, which will remove the time part:
where date(date) between '2015-07-20' and '2015-07-30'
Sample SQL Fiddle
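The effect of the two date conditions can be compared on the Reviews sample data. This sketch uses Python's sqlite3 module instead of MySQL; SQLite compares the stored datetime strings as text, which here leads to the same exclusion of the last day's reviews as MySQL's datetime comparison. The helper name counts is made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Reviews (reviewID INTEGER PRIMARY KEY, productID INTEGER, rating REAL, date TEXT)"
)
conn.executemany("INSERT INTO Reviews VALUES (?, ?, ?, ?)", [
    (1, 1, 4.5, "2015-07-27 17:47:01"), (2, 1, 3.5, "2015-07-27 18:54:22"),
    (3, 3, 2.0, "2015-07-28 13:28:37"), (4, 1, 5.0, "2015-07-28 18:33:14"),
    (5, 2, 1.5, "2015-07-29 11:58:17"), (6, 2, 3.5, "2015-07-30 15:04:25"),
    (7, 2, 2.5, "2015-07-30 18:11:11"), (8, 1, 3.0, "2015-07-30 18:26:23"),
    (9, 1, 3.0, "2015-07-30 21:35:05"), (10, 1, 4.5, "2015-07-31 14:25:47"),
    (11, 3, 0.5, "2015-07-31 14:47:48"),
])

def counts(where_clause):
    """Return {productID: review count} for a fixed, trusted WHERE clause."""
    sql = (
        "SELECT productID, COUNT(*) FROM Reviews "
        f"WHERE {where_clause} GROUP BY productID ORDER BY productID"
    )
    return dict(conn.execute(sql).fetchall())

# BETWEEN with bare dates stops at midnight on 2015-07-30.
between = counts("date BETWEEN '2015-07-20' AND '2015-07-30'")
# The half-open range keeps the whole of 2015-07-30.
half_open = counts("date >= '2015-07-20' AND date < '2015-07-31'")

print(between)    # {1: 3, 2: 1, 3: 1}
print(half_open)  # {1: 5, 2: 3, 3: 1}
```

The half-open form also leaves the date column untouched, so an index on it can still be used, which is generally not the case once the column is wrapped in date().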
You are using a LEFT JOIN between your reviews and your products tables. This will result in all the rows of reviews being shown with some rows having all product columns left empty.
You should use INNER JOIN, as this will filter only the wanted results.
(In the end I can only guess, since I don't even know which column belongs to which table ...)
The full query (very similar to Angelo Giannis's solution):
select
productName,
count(*) as `count`,
avg(rating) as `rating`
from
products INNER JOIN reviews USING(productId)
where date between '2015-07-20' and '2015-07-30'
group by productName
order by count desc, rating desc;
Here a fiddle with my and Angelo's solution (they both work).

how to group mysql based on items and today date

I have a database that stores transaction logs. I would like to count all the logs for the current day and group them by prod_id.
MySQL table structure:
Table name = products
+------+---------+------------+--------+
| ID | PROD_ID | DATE | PERSON |
+------+---------+------------+--------+
| 1 | 2 | 1400137633 | 1 |
| 2 | 2 | 1400137666 | 1 |
| 3 | 3 | 1400137125 | 2 |
| 4 | 4 | 1400137563 | 1 |
| 5 | 2 | 1400137425 | 2 |
| 6 | 3 | 1400137336 | 1 |
+------+---------+------------+--------+
MYSQL CODE:
$q = 'SELECT count(ID) as count
FROM PRODUCTS
WHERE PERSON ='.$db->qstr($person).'
AND DATE(FROM_UNIXTIME(DATE)) = DATE(NOW())';
So what I get is the number of items for the given date, since the date is the same for all the entries. However, I would like to group the items by PROD_ID. I tried GROUP BY PROD_ID, but that did not give me what I want. If a PROD_ID appears multiple times on the same date, it should be counted as one entry, while the other products are still counted.
So here, for $person = 1, the repeated PROD_ID 2 should count as 1, plus PROD_ID 3 and PROD_ID 4, so the total should be 3.
Any suggestions?
Use DISTINCT with COUNT on PROD_ID.
Example:
SELECT count( distinct PROD_ID ) as count
FROM PRODUCTS
WHERE PERSON = 1 -- <---- change this with relevant variable
AND DATE( FROM_UNIXTIME (DATE ) ) = curdate();
I also suggest using a prepared statement to bind the values.
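Here is a small sketch of COUNT(DISTINCT ...) combined with bound parameters, written with Python's sqlite3 module rather than PHP and MySQL. The function name distinct_products is hypothetical, and SQLite's date(DATE, 'unixepoch') takes the place of DATE(FROM_UNIXTIME(DATE)). The day is passed in as a parameter instead of using the current date, because the sample timestamps are from 2014.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (ID INTEGER PRIMARY KEY, PROD_ID INTEGER, DATE INTEGER, PERSON INTEGER)"
)
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", [
    (1, 2, 1400137633, 1), (2, 2, 1400137666, 1), (3, 3, 1400137125, 2),
    (4, 4, 1400137563, 1), (5, 2, 1400137425, 2), (6, 3, 1400137336, 1),
])

def distinct_products(person, day):
    """Count the distinct PROD_ID values logged by one person on one UTC day."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT PROD_ID) FROM products "
        "WHERE PERSON = ? AND date(DATE, 'unixepoch') = ?",
        (person, day),  # values are bound, never concatenated into the SQL
    ).fetchone()
    return row[0]

# Derive the UTC day of the sample rows from one of their timestamps.
sample_day = datetime.fromtimestamp(1400137633, tz=timezone.utc).date().isoformat()

print(distinct_products(1, sample_day))  # 3
print(distinct_products(2, sample_day))  # 2
```

Person 1 logged PROD_ID 2 twice, but it is counted once, which gives the total of 3 the question asks for.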