MySQL: how to do string comparisons in a query?

+--------------------+---------------+------+-----+---------+-------+
| ID | GKEY |GOODS | PRI | COUNTRY | Extra |
+--------------------+---------------+------+-----+---------+-------+
| 1 | BOOK-1 | 1 | 10 | | |
| 2 | PHONE-1 | 2 | 12 | | |
| 3 | BOOK-2 | 1 | 13 | | |
| 4 | BOOK-3 | 1 | 10 | | |
| 5 | PHONE-2 | 2 | 10 | | |
| 6 | PHONE-3 | 2 | 20 | | |
| 7 | BOOK-10 | 2 | 20 | | |
| 8 | BOOK-11 | 2 | 20 | | |
| 9 | BOOK-20 | 2 | 20 | | |
| 10 | BOOK-21 | 2 | 20 | | |
| 11 | PHONE-30 | 2 | 20 | | |
+--------------------+---------------+------+-----+---------+-------+
Above is my table. I want to get all records where GKEY > BOOK-2. Can anyone tell me the expression for this in MySQL?
Using " WHERE GKEY>'BOOK-2' " does not return the correct results.

How about (something like):
(this is MSSQL - I guess it will be similar in MySQL)
select
    *
from
(
    select
        *,
        index = convert(int, replace(GKEY, 'BOOK-', ''))
    from table
    where
        GKEY like 'BOOK%'
) sub
where
    sub.index > 2
By way of explanation: The inner query basically recreates your table, but only for BOOK rows, and with an extra column containing the index in the right data type to make a greater than comparison work numerically.
Alternatively something like this:
select
    *
from table
where
(
    case
        when GKEY like 'BOOK%' then
            case when convert(int, replace(GKEY, 'BOOK-', '')) > 2 then 1
                else 0
            end
        else 0
    end
) = 1
Essentially the problem is that you need to check for BOOK before you convert the index to a numeric value, as the other values of GKEY would cause an error (without doing some clunky string handling).
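To make the difference between the two comparisons concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table name goods and the lowercase column names are assumptions made for the illustration; only the GKEY values come from the question. In MySQL the numeric suffix could be extracted with something like CAST(SUBSTRING_INDEX(GKEY, '-', -1) AS UNSIGNED) instead of the SQLite substr/instr pair used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding only the GKEY values from the question.
conn.execute("CREATE TABLE goods (id INTEGER PRIMARY KEY, gkey TEXT)")
keys = ["BOOK-1", "PHONE-1", "BOOK-2", "BOOK-3", "PHONE-2", "PHONE-3",
        "BOOK-10", "BOOK-11", "BOOK-20", "BOOK-21", "PHONE-30"]
conn.executemany("INSERT INTO goods (gkey) VALUES (?)", [(k,) for k in keys])

# Plain string comparison goes character by character,
# so 'BOOK-10' and 'BOOK-11' sort before 'BOOK-2' and are missed.
naive = [row[0] for row in conn.execute(
    "SELECT gkey FROM goods "
    "WHERE gkey LIKE 'BOOK-%' AND gkey > 'BOOK-2' ORDER BY id")]

# Restrict to BOOK rows first, then compare the part after the dash as a number.
numeric = [row[0] for row in conn.execute(
    "SELECT gkey FROM goods "
    "WHERE gkey LIKE 'BOOK-%' "
    "  AND CAST(substr(gkey, instr(gkey, '-') + 1) AS INTEGER) > 2 "
    "ORDER BY id")]

print(naive)    # BOOK-3, BOOK-20, BOOK-21
print(numeric)  # BOOK-3, BOOK-10, BOOK-11, BOOK-20, BOOK-21
```

The string comparison silently drops BOOK-10 and BOOK-11, which is the behaviour the question describes as incorrect.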

SELECT * FROM `table` AS `t1` WHERE `t1`.`id` > (SELECT `id` FROM `table` AS `t2` WHERE `t2`.`GKEY`='BOOK-2' LIMIT 1)

Related

Remove duplicates leaving at least one with highest parameter from group

I have following schema:
+--+------+-----+----+
|id|device|token|cash|
+--+------+-----+----+
Column device is unique; token is not unique and is null by default.
What I want to achieve is to set all duplicate token values to the default (null), leaving only the one with the highest cash. If duplicates have the same cash, leave the first one.
I have heard about cursors, but it seems that it can be done with an ordinary query.
I have tried the following SELECT only to see whether my idea of how to achieve this is right, but it seems I'm wrong.
SELECT
*
FROM
db.table
WHERE
db.table.token NOT IN (SELECT
*
FROM
(
SELECT DISTINCT
MAX(db.table.balance)
FROM
db.table
GROUP BY db.table.balance) temp
)
For example:
This table after query
+-----+---------+--------+-------+
| id | device | token | cash|
+-----+---------+--------+-------+
| 1 | dev_1 | tkn_1 | 3 |
| 2 | dev_2 | tkn_1 | 10 |
| 3 | dev_3 | tkn_2 | 10 |
| 4 | dev_4 | tkn_2 | 14 |
| 5 | dev_5 | tkn_3 | 10 |
| 6 | dev_6 | null | 10 |
| 7 | dev_7 | null | 10 |
| 8 | dev_8 | tkn_4 | 11 |
| 8 | dev_8 | tkn_4 | 11 |
| 8 | dev_8 | tkn_5 | 11 |
+-----+---------+--------+-------+
should be:
+-----+---------+--------+-------+
| id | device | token | cash|
+-----+---------+--------+-------+
| 1 | dev_1 | null | 3 |
| 2 | dev_2 | tkn_1 | 10 |
| 3 | dev_3 | null | 10 |
| 4 | dev_4 | tkn_2 | 14 |
| 5 | dev_5 | tkn_3 | 10 |
| 6 | dev_6 | null | 10 |
| 7 | dev_7 | null | 10 |
| 8 | dev_8 | tkn_4 | 11 |
| 8 | dev_8 | null | 11 |
| 8 | dev_8 | tkn_5 | 15 |
+-----+---------+--------+-------+
Thanks in advance :)
Try using an EXISTS subquery:
UPDATE yourTable t1
SET token = NULL
WHERE EXISTS (SELECT 1 FROM (SELECT * FROM yourTable) t2
WHERE t2.token = t1.token AND
t2.cash > t1.cash);
Note that this answer assumes that there would never be a tie for two token records having the same highest cash amount.
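To keep exactly one row in the event of duplicates on the maximum cash, use the id:
update t join
(select tt.*,
(select t3.id
from t t3
where t3.token = tt.token
order by t3.cash desc, t3.id desc -- use t3.id asc to keep the first of the tied rows
limit 1 -- the scalar subquery must return a single id
) as max_cash_id
from t tt
) tt
on t.id = tt.id and t.id <> tt.max_cash_id -- every row except the one being kept
set t.token = null;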
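As a rough check of the "keep the highest cash, and on a tie keep the first row" rule from the question, here is a small sketch using Python's sqlite3 module in place of MySQL. The table name devices is an assumption, and the sample rows are adapted from the question (the repeated id 8 rows are replaced by distinct ids 8 and 9 so that the tie case is well defined). Note that MySQL generally rejects an UPDATE whose subquery reads the target table directly (error 1093), so there the subquery would need a derived-table wrapper like the one in the EXISTS answer above; SQLite has no such restriction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; the columns follow the schema in the question.
conn.execute(
    "CREATE TABLE devices ("
    "id INTEGER PRIMARY KEY, device TEXT UNIQUE, token TEXT, cash INTEGER)")
rows = [
    (1, "dev_1", "tkn_1", 3),
    (2, "dev_2", "tkn_1", 10),
    (3, "dev_3", "tkn_2", 10),
    (4, "dev_4", "tkn_2", 14),
    (5, "dev_5", "tkn_3", 10),
    (6, "dev_6", None, 10),
    (7, "dev_7", None, 10),
    (8, "dev_8", "tkn_4", 11),   # tie on cash with id 9: this one is kept
    (9, "dev_9", "tkn_4", 11),
]
conn.executemany("INSERT INTO devices VALUES (?, ?, ?, ?)", rows)

# For every token, the row to keep is the one with the highest cash,
# ties broken by the lowest id. Every other row with that token is reset.
conn.execute("""
    UPDATE devices
    SET token = NULL
    WHERE token IS NOT NULL
      AND id <> (SELECT d2.id
                 FROM devices AS d2
                 WHERE d2.token = devices.token
                 ORDER BY d2.cash DESC, d2.id ASC
                 LIMIT 1)
""")

tokens = dict(conn.execute("SELECT id, token FROM devices ORDER BY id"))
print(tokens)
```

Rows 1, 3 and 9 lose their token; rows 2, 4, 5 and 8 keep theirs.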

Group condition in last data mysql

I have data like this:
Table LOT
+-------+--------+
|Lot_id | Prod_id|
+-------+--------+
| LOT-1 | Prd-1 |
| LOT-1 | Prd-2 |
| LOT-1 | Prd-3 |
| LOT-2 | Prd-4 |
+-------+--------+
Table Process
+-------+--------+--------+------------+----------+
|proc_id|proc_cat|proc_seq|proc_prod_id|t_proc_qty|
+-------+--------+--------+------------+----------+
| 1 | Proc-A | 1 | Prd-1 | 100 |
| 2 | Proc-H | 2 | Prd-1 | 100 |
| 3 | Proc-D | 3 | Prd-1 | 100 |
| 4 | Proc-A | 1 | Prd-2 | 100 |
| 5 | Proc-H | 2 | Prd-2 | 100 |
| 6 | Proc-D | 3 | Prd-2 | 20 |
| 7 | Proc-Q | 4 | Prd-2 | 20 |
| 8 | Proc-A | 1 | Prd-3 | 100 |
| 9 | Proc-H | 2 | Prd-3 | 100 |
| 10 | Proc-D | 3 | Prd-3 | 50 |
| 11 | Proc-O | 1 | Prd-4 | 80 |
| 12 | Proc-F | 2 | Prd-4 | 80 |
| 13 | Proc-H | 3 | Prd-4 | 80 |
+-------+--------+--------+------------+----------+
And I want data like this when I select just Lot_id = LOT-1.
Table LOT is joined to table Process, and the result is sum(t_proc_qty) taken from the last proc_seq of each proc_prod_id, grouped by proc_cat and ordered by proc_seq:
+--------+--------+----------+
|proc_cat|proc_seq|t_proc_qty|
+--------+--------+----------+
| Proc-D | 3 | 150 |->accumulated from Prd-1 and prd-3 in last process is seq 3
| Proc-Q | 4 | 20 |->accumulated from Prd-2 in last process is seq 4
+--------+--------+----------+
Which query should I use in MySQL?
I am stuck with this query:
SELECT proc_cat, proc_seq, SUM(t_proc_qty)
FROM Process
LEFT JOIN Lot ON proc_prod_id=Prod_id
WHERE Lot_id='LOT-1'
GROUP BY proc_prod_id
ORDER BY proc_seq DESC LIMIT 1
The schema for trying out queries is on SQLFiddle.
From table Process you want the records with the highest proc_id per proc_prod_id:
select *
from process
where not exists
(
select *
from process later
where later.proc_prod_id = process.proc_prod_id
and later.proc_id > process.proc_id
);
From this data you want an aggregate per proc_cat and proc_seq. And you also want to consider only the prod_id values for 'LOT-1' in table LOT.
The complete query:
select proc_cat, proc_seq, sum(t_proc_qty)
from process
where proc_prod_id in (select prod_id from lot where lot_id = 'LOT-1')
and not exists
(
select *
from process later
where later.proc_prod_id = process.proc_prod_id
and later.proc_id > process.proc_id
)
group by proc_cat, proc_seq
order by proc_cat, proc_seq;
SQL fiddle: http://sqlfiddle.com/#!9/1fa3fd/5
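For readers without access to the fiddle, the complete query can be tried against the sample data with a short script. This is only a sketch that uses Python's sqlite3 module as a stand-in for MySQL; the column types are assumptions, while the rows and the query are taken from the question and the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lot (lot_id TEXT, prod_id TEXT);
    CREATE TABLE process (
        proc_id INTEGER PRIMARY KEY, proc_cat TEXT, proc_seq INTEGER,
        proc_prod_id TEXT, t_proc_qty INTEGER);
    INSERT INTO lot VALUES
        ('LOT-1', 'Prd-1'), ('LOT-1', 'Prd-2'),
        ('LOT-1', 'Prd-3'), ('LOT-2', 'Prd-4');
    INSERT INTO process VALUES
        (1,  'Proc-A', 1, 'Prd-1', 100), (2,  'Proc-H', 2, 'Prd-1', 100),
        (3,  'Proc-D', 3, 'Prd-1', 100), (4,  'Proc-A', 1, 'Prd-2', 100),
        (5,  'Proc-H', 2, 'Prd-2', 100), (6,  'Proc-D', 3, 'Prd-2', 20),
        (7,  'Proc-Q', 4, 'Prd-2', 20),  (8,  'Proc-A', 1, 'Prd-3', 100),
        (9,  'Proc-H', 2, 'Prd-3', 100), (10, 'Proc-D', 3, 'Prd-3', 50),
        (11, 'Proc-O', 1, 'Prd-4', 80),  (12, 'Proc-F', 2, 'Prd-4', 80),
        (13, 'Proc-H', 3, 'Prd-4', 80);
""")

# Keep only the last process row of each product in LOT-1
# (no later row exists for the same product), then aggregate.
result = conn.execute("""
    SELECT proc_cat, proc_seq, SUM(t_proc_qty)
    FROM process
    WHERE proc_prod_id IN (SELECT prod_id FROM lot WHERE lot_id = 'LOT-1')
      AND NOT EXISTS (
          SELECT *
          FROM process later
          WHERE later.proc_prod_id = process.proc_prod_id
            AND later.proc_id > process.proc_id)
    GROUP BY proc_cat, proc_seq
    ORDER BY proc_cat, proc_seq
""").fetchall()

print(result)  # Proc-D / 3 / 150 and Proc-Q / 4 / 20
```

The two rows match the desired output in the question: Prd-1 and Prd-3 both end at Proc-D (100 + 50), and Prd-2 ends at Proc-Q.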

Proper Indexing MySQL Table

I can't seem to get this query to perform any faster than 8 hours! 0_0
I have read up on indexing and I am still not sure I am doing this right.
I am expecting my query to calculate a value for BROK_1_RATING based on dates and other row values - 500,000 records.
Using record #1 as an example - my query should:
get all other records that have the same ESTIMID
ignore records where ANALYST =""
ignore records where ID is the same as record being compared i.e.
ID != 1
the records must fall within a time frame
i.e. BB.ANNDATS_CONVERTED <= working.ANNDATS_CONVERTED,
BB.REVDATS_CONVERTED > working.ANNDATS_CONVERTED
BB.IRECCD must = 1
Then count the result
Then write the count value to the BROK_1_RATING column for record #1
now do same for record#2, and #3 and so on for the entire table
In human terms - "Examine the date of record #1 - Now, within time frame from record #1 - count the number of times the number 1 exists with the same brokerage ESTIMID, do not count record #1, do not count blank ANALYST rows. Move on to record #2 and do the same"
UPDATE `working` SET `BROK_1_RATING` =
(SELECT COUNT(`ID`) FROM (SELECT `ID`, `IRECCD`, `ANALYST`, `ESTIMID`, `ANNDATS_CONVERTED`, `REVDATS_CONVERTED` FROM `working`) AS BB
WHERE
BB.`ANNDATS_CONVERTED` <= `working`.`ANNDATS_CONVERTED`
AND
BB.`REVDATS_CONVERTED` > `working`.`ANNDATS_CONVERTED`
AND
BB.`ID` != `working`.`ID`
AND
BB.`ESTIMID` = `working`.`ESTIMID`
AND
BB.`ANALYST` != ''
AND
BB.`IRECCD` = 1
)
WHERE `working`.`ANALYST` != '';
| ID | ANALYST | ESTIMID | IRECCD | ANNDATS_CONVERTED | REVDATS_CONVERTED | BROK_1_RATING | NO_TOP_RATING |
------------------------------------------------------------------------------------------------------------------
| 1 | DAVE | Brokerage000 | 4 | 1998-07-01 | 1998-07-04 | | 3 |
| 2 | DAVE | Brokerage000 | 1 | 1998-06-28 | 1998-07-10 | | 4 |
| 3 | DAVE | Brokerage000 | 5 | 1998-07-02 | 1998-07-08 | | 2 |
| 4 | DAVE | Brokerage000 | 1 | 1998-07-04 | 1998-12-04 | | 3 |
| 5 | SAM | Brokerage000 | 1 | 1998-06-14 | 1998-06-30 | | 4 |
| 6 | SAM | Brokerage000 | 1 | 1998-06-28 | 1999-08-08 | | 4 |
| 7 | | Brokerage000 | 1 | 1998-06-28 | 1999-08-08 | | 5 |
| 8 | DAVE | Brokerage111 | 2 | 1998-06-28 | 1999-08-08 | | 3 |
'EXPLAIN' results:
id| select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
----------------------------------------------------------------------------------------------------------------------------------------
1 | PRIMARY | working | index | ANALYST | PRIMARY | 4 | NULL | 467847 | Using where
2 | DEPENDENT SUBQUERY | <derived3> | ALL | NULL | NULL | NULL | NULL | 467847 | Using where
3 | DERIVED | working | index | NULL | test_combined_indexes | 226 | NULL | 467847 | Using index
I have indexes on the single columns, and I have also tried a multiple-column index like this:
ALTER TABLE `working` ADD INDEX `test_combined_indexes` (`IRECCD`, `ID`, `ANALYST`, `ESTIMID`, `ANNDATS_CONVERTED`, `REVDATS_CONVERTED`) COMMENT '';
Well you can shorten the query a lot by just removing the extra stuff:
UPDATE `working` as AA SET `BROK_1_RATING` =
(SELECT COUNT(`ID`) FROM `working` AS BB
WHERE BB.`ANNDATS_CONVERTED` <= AA.`ANNDATS_CONVERTED`
AND BB.`REVDATS_CONVERTED` > AA.`ANNDATS_CONVERTED`
AND BB.`ID` != AA.`ID`
AND BB.`ESTIMID` = AA.`ESTIMID`
AND BB.`ANALYST` != ''
AND BB.`IRECCD` = 1 )
WHERE `ANALYST` != '';
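On the indexing side, the EXPLAIN output in the question shows the dependent subquery scanning the derived table with no usable key, so every updated row rescans all rows. A common rule of thumb is to put the equality columns first in a composite index and the range column after them. The sketch below illustrates that idea with Python's sqlite3 module standing in for MySQL; the index name idx_estimid_ireccd_dates and the lowercase column names are assumptions, and whether this ordering actually helps on the real 500,000-row table would need to be checked with EXPLAIN. Also note that MySQL generally rejects an UPDATE whose subquery reads the target table directly (error 1093), which is presumably why the question wraps it in a derived table; SQLite does not have that restriction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE working (
        id INTEGER PRIMARY KEY, analyst TEXT, estimid TEXT, ireccd INTEGER,
        anndats_converted TEXT, revdats_converted TEXT, brok_1_rating INTEGER)
""")
rows = [
    (1, "DAVE", "Brokerage000", 4, "1998-07-01", "1998-07-04"),
    (2, "DAVE", "Brokerage000", 1, "1998-06-28", "1998-07-10"),
    (3, "DAVE", "Brokerage000", 5, "1998-07-02", "1998-07-08"),
    (4, "DAVE", "Brokerage000", 1, "1998-07-04", "1998-12-04"),
    (5, "SAM",  "Brokerage000", 1, "1998-06-14", "1998-06-30"),
    (6, "SAM",  "Brokerage000", 1, "1998-06-28", "1999-08-08"),
    (7, "",     "Brokerage000", 1, "1998-06-28", "1999-08-08"),
    (8, "DAVE", "Brokerage111", 2, "1998-06-28", "1999-08-08"),
]
conn.executemany(
    "INSERT INTO working (id, analyst, estimid, ireccd, "
    "anndats_converted, revdats_converted) VALUES (?, ?, ?, ?, ?, ?)", rows)

# Hypothetical composite index: equality columns first, then the date range.
conn.execute(
    "CREATE INDEX idx_estimid_ireccd_dates "
    "ON working (estimid, ireccd, anndats_converted, revdats_converted)")

# Same counting rule as in the question, one correlated count per updated row.
conn.execute("""
    UPDATE working
    SET brok_1_rating = (
        SELECT COUNT(*)
        FROM working AS bb
        WHERE bb.estimid = working.estimid
          AND bb.ireccd = 1
          AND bb.analyst <> ''
          AND bb.id <> working.id
          AND bb.anndats_converted <= working.anndats_converted
          AND bb.revdats_converted > working.anndats_converted)
    WHERE analyst <> ''
""")

ratings = dict(conn.execute("SELECT id, brok_1_rating FROM working ORDER BY id"))
print(ratings)
```

For record #1 the count is 2 (records 2 and 6 are active on 1998-07-01), and record #7 is left untouched because its ANALYST is blank.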

Subtract values from line above the current line in MySQL

I've the following table:
| id | Name | Date of Birth | Date of Death | Result |
| 1 | John | 3546565 | 3548987 | |
| 2 | Mary | 5233654 | 5265458 | |
| 3 | Lewis| 6546876 | 6548752 | |
| 4 | Mark | 6546546 | 6767767 | |
| 5 | Steve| 6546877 | 6548798 | |
And I need to do this for the whole table:
Result = 1, if( current_row(Date of Birth) - row_above_current_row(Date of Death))>X else 0
To make things easier, I guess, I created the same table above but with 2 extra id fields: id_minus_one and id_plus_one
Like this:
| id | id_minus_one | id_plus_one |Name | Date_of_Birth | Date_of_Death | Result |
| 1 | 0 | 2 |John | 3546565 | 3548987 | |
| 2 | 1 | 3 |Mary | 5233654 | 5265458 | |
| 3 | 2 | 4 |Lewis| 6546876 | 6548752 | |
| 4 | 3 | 5 |Mark | 6546546 | 6767767 | |
| 5 | 4 | 6 |Steve| 6546877 | 6548798 | |
So my approach would be something like (in pseudo code):
for id=1, ignore result. (Because there is no row above)
for id=2, Result = 1 if( (Where id=2).Date_of_Birth - (where id_minus_one=id-1).Date_of_Death )>X else 0
for id=3, Result = 1 if( (Where id=3).Date_of_Birth - (where id_minus_one=id-1).Date_of_Death)>X else 0
and so on for the whole table...
Just ignore id_plus_one if there is no need for it, I'll use it later for the same thing. So, if I manage to do this for id_minus_one I'll manage for id_plus_one as they are the same algorithm.
My question is how to translate that pseudocode into SQL; I can't find a way to relate both ids in just one select.
Thank you!
As you describe this, it is just a self join with some logic on the select:
select t.*,
((t.date_of_birth - tprev.date_of_death) > x) as flag
from t left outer join
t tprev
on t.id_minus_one = tprev.id
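Here is a small runnable sketch of that self join on the sample rows, using Python's sqlite3 module as a stand-in for MySQL. The table name people and the threshold X = 1000000 are assumptions chosen only for the illustration. The first row has no previous row, so the left join gives it a NULL flag.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; columns follow the second table in the question.
conn.execute("""
    CREATE TABLE people (
        id INTEGER PRIMARY KEY, id_minus_one INTEGER, id_plus_one INTEGER,
        name TEXT, date_of_birth INTEGER, date_of_death INTEGER)
""")
rows = [
    (1, 0, 2, "John",  3546565, 3548987),
    (2, 1, 3, "Mary",  5233654, 5265458),
    (3, 2, 4, "Lewis", 6546876, 6548752),
    (4, 3, 5, "Mark",  6546546, 6767767),
    (5, 4, 6, "Steve", 6546877, 6548798),
]
conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?, ?, ?)", rows)

X = 1000000  # assumed threshold, not given in the question

# Join each row to the row above it and compare birth against previous death.
flags = conn.execute("""
    SELECT t.id,
           ((t.date_of_birth - tprev.date_of_death) > ?) AS flag
    FROM people AS t
    LEFT JOIN people AS tprev ON t.id_minus_one = tprev.id
    ORDER BY t.id
""", (X,)).fetchall()

print(flags)  # id 1 has no row above it, so its flag is None
```

With this threshold, rows 2 and 3 get flag 1 (differences of 1684667 and 1281418) and rows 4 and 5 get flag 0 (both differences are negative).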

Top 'n' results for each keyword

I have a query to get the top 'n' users who commented on a specific keyword:
SELECT `user` , COUNT( * ) AS magnitude
FROM `results`
WHERE `keyword` = "economy"
GROUP BY `user`
ORDER BY magnitude DESC
LIMIT 5
I have approx 6000 keywords, and would like to run this query to get me the top 'n' users for each and every keyword we have data for. Assistance appreciated.
Since you haven't given the schema for results, I'll assume it's this or very similar (maybe extra columns):
create table results (
id int primary key,
user int,
foreign key (user) references <some_other_table>(id),
keyword varchar(<30>)
);
Step 1: aggregate by keyword/user as in your example query, but for all keywords:
create view user_keyword as (
select
keyword,
user,
count(*) as magnitude
from results
group by keyword, user
);
Step 2: rank each user within each keyword group (note the use of the subquery to rank the rows):
create view keyword_user_ranked as (
select
keyword,
user,
magnitude,
(select count(*)
from user_keyword
where l.keyword = keyword and magnitude >= l.magnitude
) as rank
from
user_keyword l
);
Step 3: select only the rows where the rank is at most some number:
select *
from keyword_user_ranked
where rank <= 3;
Example:
Base data used:
mysql> select * from results;
+----+------+---------+
| id | user | keyword |
+----+------+---------+
| 1 | 1 | mysql |
| 2 | 1 | mysql |
| 3 | 2 | mysql |
| 4 | 1 | query |
| 5 | 2 | query |
| 6 | 2 | query |
| 7 | 2 | query |
| 8 | 1 | table |
| 9 | 2 | table |
| 10 | 1 | table |
| 11 | 3 | table |
| 12 | 3 | mysql |
| 13 | 3 | query |
| 14 | 2 | mysql |
| 15 | 1 | mysql |
| 16 | 1 | mysql |
| 17 | 3 | query |
| 18 | 4 | mysql |
| 19 | 4 | mysql |
| 20 | 5 | mysql |
+----+------+---------+
Grouped by keyword and user:
mysql> select * from user_keyword order by keyword, magnitude desc;
+---------+------+-----------+
| keyword | user | magnitude |
+---------+------+-----------+
| mysql | 1 | 4 |
| mysql | 2 | 2 |
| mysql | 4 | 2 |
| mysql | 3 | 1 |
| mysql | 5 | 1 |
| query | 2 | 3 |
| query | 3 | 2 |
| query | 1 | 1 |
| table | 1 | 2 |
| table | 2 | 1 |
| table | 3 | 1 |
+---------+------+-----------+
Users ranked within keywords:
mysql> select * from keyword_user_ranked order by keyword, rank asc;
+---------+------+-----------+------+
| keyword | user | magnitude | rank |
+---------+------+-----------+------+
| mysql | 1 | 4 | 1 |
| mysql | 2 | 2 | 3 |
| mysql | 4 | 2 | 3 |
| mysql | 3 | 1 | 5 |
| mysql | 5 | 1 | 5 |
| query | 2 | 3 | 1 |
| query | 3 | 2 | 2 |
| query | 1 | 1 | 3 |
| table | 1 | 2 | 1 |
| table | 3 | 1 | 3 |
| table | 2 | 1 | 3 |
+---------+------+-----------+------+
Only top 2 from each keyword:
mysql> select * from keyword_user_ranked where rank <= 2 order by keyword, rank asc;
+---------+------+-----------+------+
| keyword | user | magnitude | rank |
+---------+------+-----------+------+
| mysql | 1 | 4 | 1 |
| query | 2 | 3 | 1 |
| query | 3 | 2 | 2 |
| table | 1 | 2 | 1 |
+---------+------+-----------+------+
Note that when there are ties -- see users 2 and 4 for keyword "mysql" in the examples -- all parties in the tie get the "last" rank, i.e. if the 2nd and 3rd are tied, both are assigned rank 3.
Performance: adding an index to the keyword and user columns will help. I have a table being queried in a similar way with 4000 and 1300 distinct values for the two columns (in a 600000-row table). You can add the index like this:
alter table results add index keyword_user (keyword, user);
In my case, query time dropped from about 6 seconds to about 2 seconds.
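On servers with window functions (MySQL 8.0 and later, as far as I know), the same top-n-per-keyword result can be sketched with ROW_NUMBER() instead of the counting subquery. The example below runs the idea on the sample data with Python's sqlite3 module as a stand-in (it needs SQLite 3.25 or newer). The alias rn and the tie-break on user are assumptions: unlike the view-based ranking above, ties are broken by the lower user id so that exactly n rows per keyword come back. The alias rn is used rather than rank because RANK is a reserved word in MySQL 8.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (id INTEGER PRIMARY KEY, user INTEGER, keyword TEXT)")
# (user, keyword) pairs in the same order as ids 1..20 in the sample data.
data = [
    (1, "mysql"), (1, "mysql"), (2, "mysql"), (1, "query"), (2, "query"),
    (2, "query"), (2, "query"), (1, "table"), (2, "table"), (1, "table"),
    (3, "table"), (3, "mysql"), (3, "query"), (2, "mysql"), (1, "mysql"),
    (1, "mysql"), (3, "query"), (4, "mysql"), (4, "mysql"), (5, "mysql"),
]
conn.executemany("INSERT INTO results (user, keyword) VALUES (?, ?)", data)

N = 2  # how many users to keep per keyword

top_n = conn.execute("""
    WITH user_keyword AS (
        -- Step 1: count comments per keyword and user.
        SELECT keyword, user, COUNT(*) AS magnitude
        FROM results
        GROUP BY keyword, user
    ),
    ranked AS (
        -- Step 2: number the users inside each keyword, highest count first.
        SELECT keyword, user, magnitude,
               ROW_NUMBER() OVER (
                   PARTITION BY keyword
                   ORDER BY magnitude DESC, user ASC) AS rn
        FROM user_keyword
    )
    -- Step 3: keep the first N rows of every keyword.
    SELECT keyword, user, magnitude
    FROM ranked
    WHERE rn <= ?
    ORDER BY keyword, rn
""", (N,)).fetchall()

for row in top_n:
    print(row)
```

Compared with the "Only top 2" output above, keywords "mysql" and "table" now return two rows each, because the tied users are no longer pushed to a shared lower rank.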
You can use a pattern like this (from Within-group quotas (Top N per group)):
SELECT tmp.ID, tmp.entrydate
FROM (
SELECT
ID, entrydate,
IF( @prev <> ID, @rownum := 1, @rownum := @rownum+1 ) AS rank,
@prev := ID
FROM test t
JOIN (SELECT @rownum := NULL, @prev := 0) AS r
ORDER BY t.ID
) AS tmp
WHERE tmp.rank <= 2
ORDER BY ID, entrydate;
+------+------------+
| ID | entrydate |
+------+------------+
| 1 | 2007-05-01 |
| 1 | 2007-05-02 |
| 2 | 2007-06-03 |
| 2 | 2007-06-04 |
| 3 | 2007-07-01 |
| 3 | 2007-07-02 |
+------+------------+