mysql not in issue - mysql

I have a select statement that I am trying to build a list of scripts as long as the users role is not in the scripts.sans_role_priority field. This works great if there is only one entry into the field but once I add more than one the whole function quits working. I am sure I am overlooking something simple, just need another set of eyes on it. Any help wold be appreciated.
script:
SELECT *
FROM scripts
WHERE active = 1
AND homePage='Y'
AND (role_priority > 40 OR role_priority = 40)
AND (40 not in (sans_role_priority) )
ORDER BY seq ASC
data in scripts.sans_role_priority(varchar) = "30,40".
Additional testing adds this:
When I switch the values in the field to "40, 30" the select works. Continuing to debug...

Maybe you are looking for FIND_IN_SET().
SELECT *
FROM scripts
WHERE active = 1
AND homePage='Y'
AND (role_priority > 40 OR role_priority = 40)
AND NOT FIND_IN_SET('40', sans_role_priority)
ORDER BY seq ASC
Note that having "X,Y,Z" as VARCHAR values in some fields reveals that your DB schema may be improved in order to have X, Y and Z stored as separate values in a related table.

SELECT *
FROM scripts
WHERE active = 1
AND homePage='Y'
AND role_priority >= 40
AND NOT FIND_IN_SET(40,sans_role_priority)
ORDER BY seq ASC
See: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_find-in-set
Note that CSV in databases is just about the worst antipattern you can find.
It should be avoided at all costs because:
You cannot use an index on a CSV field (at least not a mentally sane one);
Joins on CSV fields are a major PITA;
Selects on them are uber-slow;
They violate 1NF.
They waste storage.
Instead of using a CSV field, consider putting sans_role_priority in another table with a link back to scripts.
table script_sans_role_priority
-------------------------------
script_id integer foreign key references script(id)
srp integer
primary key (script_id, srp)
Then the renormalized select will be:
SELECT s.*
FROM scripts s
LEFT JOIN script_sans_role_priority srp
ON (s.id = srp.script_id AND srp.srp = 40)
WHERE s.active = 1
AND s.homePage='Y'
AND s.role_priority >= 40
AND srp.script_id IS NULL
ORDER BY seq ASC

SELECT *
FROM scripts
WHERE active = '1'
AND homePage='Y'
AND role_priority >= '40'
AND sans_role_priority <> '40'
ORDER BY seq ASC

Related

MySQL - Add flag column to identify the first payment

I want to improve my current query. So I have this table called Incomes. Where I have a sourceId varchar field. I have a single SELECT for the fields I need, but I needed to add an extra field called isFirstTime to represent if it was the first time on the row on what that sourceId was used. This is my current query:
SELECT DISTINCT
`income`.*,
CASE WHEN (
SELECT
`income2`.id
FROM
`income` as `income2`
WHERE
`income2`."sourceId" = `income`."sourceId"
ORDER BY
`income2`.created asc
LIMIT 1
) = `income`.id THEN true ELSE false END
as isFirstIncome
FROM
`income` as `income`
WHERE `income`.incomeType IN ('passive', 'active') AND `income`.status = 'paid'
ORDER BY `income`.created desc
LIMIT 50
The query works but slows down if I keep increasing the LIMIT or OFFSET. Any suggestions?
UPDATE 1:
Added WHERE statements used on the original query
UPDATE 2:
MYSQL version 5.7.22
You can achieve it using Ordered Analytical Function.
You can use ROW_NUMBER or RANK to get the desired result.
Below query will give the desired output.
SELECT *,
CASE
WHEN Row_number()
OVER(
PARTITION BY sourceid
ORDER BY created ASC) = 1 THEN true
ELSE false
END AS isFirstIncome
FROM income
WHERE incomeType IN ('passive', 'active') AND status = 'paid'
ORDER BY created desc
DB Fiddle: See the result here
My first thought is that isFirstIncome should be an extra column in the table. It should be populated as the data is inserted.
If you don't like that, let's try to optimize the query...
Let's avoid doing the subquery more than 50 times. This requires turning the query inside-out. (It's like "explode-implode", where the query gathers lots of stuff, then sorts it and throws most of the rows away.)
To summarize:
do the least amount of effort to just identify the 5 rows.
JOIN to whatever tables are needed (including itself if appropriate); this is to get any other columns desired (including isFirstIncome).
SELECT i3.*,
( ... using i3 ... ) as isFirstIncome
FROM (
SELECT i1.id, i1.sourceId
FROM `income` AS i1
WHERE i1.incomeType IN ('passive', 'active')
AND i1.status = 'paid'
ORDER BY i1.created DESC
LIMIT 50
) AS i2
JOIN income AS i3 USING(id)
ORDER BY i2.created DESC -- yes, repeated
(I left out the computation of isFirstIncome; it is discussed in other Answers. But note that it will be executed at most 50 times.)
(The aliases -- i1, i2, i3 -- are numbered in the order they will be "used"; this is to assist in following the SQL.)
To assist in performance, add
INDEX(status, incomeType, created, id, sourceId)
It should help with my formulation, but probably not for the other versions. Your version would benefit from
INDEX(sourceId, created, id)

MySQL Count from a table where condition on the last row of a related table

I'm new with MySQL and actually have a problem. (... and my English is poor... :D)
The database (extract)
I have 3 tables: Batch, MainPost and MainPostHistory.
A Batch has 1 to x MainPost, and a MainPost has 1 to x MainPostHistory (kind of log).
Every tables have an auto-increment primary key.
In addition, a MainPostHistory is defined by a DateTime and a MainPostStatusID.
Of course, all tables are linked by foreign key indexes.
What I have to do
I have to count (for each Batch) the number of MainPost having their last MainPostHistory with a MainPostStatusID equals to (for an example) 0.
So I have 2 parameters: the BatchID and the MainPostStatusID to check.
What I've done
I wrote the following query, but receive an error "Unknown column MP.ID" :
SELECT COUNT(*)
FROM MainPost AS MP
WHERE (MP.BatchID = #BatchID) AND (((
SELECT qMPH.MainPostStatusID
FROM (
SELECT MPH.MainPostStatusID
FROM MainPostHistory AS MPH
WHERE MPH.MainPostID = MP.ID
ORDER BY MPH.DateTime DESC
LIMIT 1
) AS qMPH
)) = #SearchedMainPostStatusID);
What I expect
Why this error, and how to solve it?
And, by the way, is it the best way to do it?
Please! And thanks for reading! :-)
You don't need to nest the subquery inside another one where MP.ID is out of scope:
SELECT COUNT(*)
FROM MainPost AS MP
WHERE (MP.BatchID = #BatchID) AND (
SELECT MPH.MainPostStatusID
FROM MainPostHistory AS MPH
WHERE MPH.MainPostID = MP.ID
ORDER BY MPH.DateTime DESC
LIMIT 1
) = #SearchedMainPostStatusID;

Performance issue on query with math calculations

This my query with its performance (slow_query_log):
SELECT j.`offer_id`, o.`offer_name`, j.`success_rate`
FROM
(
SELECT
t.`offer_id`,
(
SUM(CASE WHEN `offer_id` = t.`offer_id` AND `sales_status` = 'SUCCESS' THEN 1 ELSE 0 END) / COUNT(*)
) AS `success_rate`
FROM `tblSales` AS t
WHERE DATE(t.`sales_time`) = CURDATE()
GROUP BY t.`offer_id`
ORDER BY `success_rate` DESC
) AS j
LEFT JOIN `tblOffers` AS o
ON j.`offer_id` = o.`offer_id`
LIMIT 5;
# Time: 180113 18:51:19
# User#Host: root[root] # localhost [127.0.0.1] Id: 71
# Query_time: 10.472599 Lock_time: 0.001000 Rows_sent: 0 Rows_examined: 1156134
Here, tblOffers have all the OFFERS listed. And the tblSales contains all the sales. What am trying to find out is the top selling offers, based on the success rate (ie. those sales which are SUCCESS).
The query works fine and provides the output I needed. But it appears to be that its a bit slower.
offer_id and sales_status are already indexed in the tblSales. So do you have any suggestion on improving the inner query (where it calculates the success rate) so that performance can be improved? I have been playing with the math for more than 2hrs. But couldn't get a better way.
Btw, tblSales has lots of data. It contains those sales which are SUCCESSFUL, FAILED, PENDING, etc.
Thank you
EDIT
As you requested am including the table design also(only relevant fields are included):
tblSales
`sales_id` bigint UNSIGNED NOT NULL AUTO_INCREMENT,
`offer_id` bigint UNSIGNED NOT NULL DEFAULT '0',
`sales_time` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
`sales_status` ENUM('WAITING', 'SUCCESS', 'FAILED', 'CANCELLED') NOT NULL DEFAULT 'WAITING',
PRIMARY KEY (`sales_id`),
KEY (`offer_id`),
KEY (`sales_status`)
There are some other fields also in this table, that holds some other info. Amount, user_id, etc. which are not relevant for my question.
Numerous 'problems', none of which involve "math".
JOINs make things difficult. LEFT JOIN says "I don't care whether the row exists in the 'right' table. (I suspect you don't need LEFT??) But it also says "There may be multiple rows in the right table. Based on the column names, I will guess that there is only one offer_name for each offer_id. If this is correct, then here my first recommendation. (This will convince the Optimizer that there is no issue with the JOIN.) Change from
SELECT ..., o.offer_name, ...
LEFT JOIN `tblOffers` AS o ON j.`offer_id` = o.`offer_id`
...
to
SELECT ...,
( SELECT offer_name FROM tbloffers WHERE offer_id j.offer_id
) AS offer_name, ...
It also gets rid of a bug wherein you are assuming that the inner ORDER BY will be preserved for the LIMIT. This used to be the case, but in newer versions of MariaDB / MySQL, it is not. The ORDER BY in a "derived table" (your subquery) is now ignored.
2 down, a few more to go.
"Don't hide an indexed column in a function." I am referring to DATE(t.sales_time) = CURDATE(). Assuming you have no sales_time values for the 'future', then that test can be changed to t.sales_time >= CURDATE(). If you really need to restrict to just today, then do this:
AND sales_time >= CURDATE()
AND sales_time < CURDATE() + INTERVAL 1 DAY
The ORDER BY and the LIMIT should usually be put together. In your case, you may as well add the LIMIT to the "derived table", thereby leading to only 5 rows for the outer query to work with. But... There is still the question of getting them sorted correctly. So change from
SELECT ...
FROM ( SELECT ...
ORDER BY ... )
LIMIT ...
to
SELECT ...
FROM ( SELECT ...
ORDER BY ...
LIMIT 5 ) -- trim sooner
ORDER BY ... -- deal with the loss of ordering from derived table
Rolling it all together, I have
SELECT j.`offer_id`,
( SELECT offer_name
FROM tbloffers
WHERE offer_id = j.offer_id
) AS offer_name,
j.`success_rate`
FROM
( SELECT t.`offer_id`,
AVG(t.sales_status = 'SUCCESS') AS `success_rate`
FROM `tblSales` AS t
WHERE t.sales_time >= CURDATE()
GROUP BY t.`offer_id`
ORDER BY `success_rate` DESC
LIMIT 5
) AS j
ORDER BY `success_rate` DESC;
(I took the liberty of shortening the SUM(...) in two ways.)
Now for the indexes...
tblSales needs at least (sales_time), but let's go for a "covering" (with sales_time specifically first):
INDEX(sales_time, sales_status, order_id)
If tbloffers has PRIMARY KEY(offer_id), then no further index is worth adding. Else, add this covering index (in this order):
INDEX(offer_id, offer_name)
(Apologies to other Answerers; I stole some of your ideas.)
Here, tblOffers have all the OFFERS listed. And the tblSales contains all the sales. What am trying to find out is the top selling offers, based on the success rate (ie. those sales which are SUCCESS).
Approach this with a simple JOIN and GROUP BY:
SELECT s.offer_id, o.offer_name,
AVG(s.sales_status = 'SUCCESS') as success_rate
FROM tblSales s JOIN
tblOffers o
ON o.offer_id = s.offer_id
WHERE s.sales_time >= CURDATE() AND
s.sales_time < CURDATE() + INTERVAL 1 DAY
GROUP BY s.offer_id, o.offer_name
ORDER BY success_rate DESC;
Notes:
The use of date arithmetic allows the query to make use of an index on tblSales(sales_time) -- or better yet tblSales(salesTime, offer_id, sales_status).
The arithmetic for success_rate has been simplified -- although this has minimal impact on performance.
I added offer_name to the GROUP BY. If you are learning SQL, you should always have all the unaggregated keys in the GROUP BY clause.
A LEFT JOIN is only needed if you have offers in tblSales which are not in tblOffers. I am guessing you have proper foreign key relationships defined, and this is not the case.
Based on not much information that you have provided (i mean table schema) you could try the following.
SELECT `o`.`offer_id`, `o`.`offer_name`, SUM(CASE WHEN `t`.`sales_status` = 'SUCCESS' THEN 1 ELSE 0 END) AS `success_rate`
FROM `tblOffers` `o`
INNER JOIN `tblSales` `t`
ON `o`.`offer_id` = `t`.`offer_id`
WHERE DATE(`t`.`sales_time`) = CURDATE()
GROUP BY `o`.`offer_id`
ORDER BY `success_rate` DESC
LIMIT 0,5;
You can find a sample of this query in this SQL Fiddle example
Without knowing your schema, the lowest hanging fruit I see is this part....
WHERE DATE(t.`sales_time`) = CURDATE()
Try changing that to something that looks like
Where t.sales_time >= #12-midnight-of-current-date and t.sales_time <= #23:59:59-of-current-date

MySQL get interval records

Are there any solutions on MySQL script to filter the results with specific interval number.
For example, if I have 100,000 records in database and I'd like to get only the record number 1000, 2000, 3000, etc. (step by 1000).
I could do this on server side script by getting the entire results (e.g. 100,000) and use syntax like:
for($i=0, $i <= 100,000, $i = $i+1000) $filterResult[] = $record[$i];
However, as you may see, it would pull stress to the system as 100,000 records will need to generated first.
Are there any solutions that could complete from database script? Please note that, primary key may not start with 1 - 100,000 as the results based on some condition in where clause.
Your help would be really appreciated.
You can do:
SELECT *
FROM tbl
WHERE id % 1000 = 0
But it seems like you don't want to rely on the primary key value, but rather the row ranking of a result set.
In that case, you can do:
SELECT *
FROM (
SELECT *, #rn:=#rn+1 AS rank
FROM tbl
CROSS JOIN (SELECT #rn:=0) var_init
WHERE column1 = value AND
column2 = value
) a
WHERE a.rank % 1000 = 0
Where column1 = value AND column2 = value is just a placeholder for whatever filtration you're doing in your query.

Producing multiple maximum and minimum values with SQL Query

I'm becoming frustrated with a curious limitation of SQL - its apparent inability to relate one record to another outside of aggregate functions. My problem is summarized thusly.
I have a table, already sorted. I need to find its maximum values (note the plural!) and minimum values. No, I am not looking for a single maximum or single minimum. More specifically I'm trying to generate a list of the local peaks of a numeric sequence. A rough description of an algorithm to generate this is:
WHILE NOT END_OF_TABLE
IF RECORD != FIRST_RECORD AND RECORD != LAST_RECORD THEN
IF ((RECORD(Field)<RECORD_PREVIOUS(Field) AND RECORD(Field)<RECORD_NEXT(Field)) OR
RECORD(Field)>RECORD_PREVIOUS(Field) AND RECORD(Field)>RECORD_NEXT(Field)) THEN
ADD_RESULT RECORD
END IF
END IF
END WHILE
See the Problem? I need to do a query that a given record must compare against the previous and next records' values. Can this even be accomplished in standard SQL?
Your frustration is shared by many; while SQL is great for working with general sets, it's terribly deficient when trying to work with issues specific to ordered sets (whether it's physically ordered in the table or there is an implicit or explicit logical order is irrelevant). There are some things that can help (for example, the rank() and row_number() functions), but the solutions can differ across RDBMS's.
If you can be specific about which platform you're working with, I or someone else can provide a more detailed answer.
You have to self-join twice and generate a rownumber without gaps:
In T-SQL:
WITH ordered AS (
SELECT ROW_NUMBER() OVER (ORDER BY your_sort_order) AS RowNumber
,* -- other columns here
)
SELECT *
FROM ordered
LEFT JOIN ordered AS prev
ON prev.RowNumber = ordered.RowNumber - 1
LEFT JOIN ordered AS next
ON next.RowNumber = ordered.RowNumber + 1
WHERE -- here you put in your local min/local max and end-point handling logic - end points will have NULL in next/prev
Yes. You need a self join - but without a database schema, it's hard to be specific about the solution.
Specifically, I'm wondering about the "ordering" thing you mention - but I'm going to assume there's an "ID" field we can use for this.
(Oh, and I'm using old-school join syntax, coz I'm a dinosaur).
select *
from myTable main,
myTable previous,
myTable next
where previous.id = main.id - 1
and next.id = main.id + 1
and previous.record > main.record
and next.record < main.record
(I think I've interpreted your requirement correctly in the greater/less than clauses, but adjust to taste).
SELECT
current.RowID,
current.Value,
CASE WHEN
(
(current.Value < COALESCE(previous.Value, current.Value + 1))
AND
(current.Value < COALESCE(subsequent.Value, current.Value + 1))
)
THEN
'Minima'
ELSE
'Maxima'
END
FROM
myTable current
LEFT JOIN
myTable previous
ON previous.RowID = (SELECT MAX(RowID) FROM myTable WHERE RowID < current.ROWID)
LEFT JOIN
myTable subsequent
ON subsequent.RowID = (SELECT MIN(RowID) FROM myTable WHERE RowID > current.ROWID)
WHERE
(
(current.Value < COALESCE(previous.Value, current.Value + 1))
AND
(current.Value < COALESCE(subsequent.Value, current.Value + 1))
)
OR
(
(current.Value > COALESCE(previous.Value, current.Value - 1))
AND
(current.Value > COALESCE(subsequent.Value, current.Value - 1))
)
Note: The < and > logic is copied from you, but does not cater for local maxima/minima that are equal across one or more consecutive records.
Note: I've created a fictional RowID to join the records in order, all the is important is that the joins get the "previous" and "subsequent" records.
Note: The LEFT JOINs and COALESCE statements cause the first and last values to always be counted as a maxima or minima.