I have a huge table, and I want to select groups of rows at random.
The classic random query (SELECT * FROM table ORDER BY RAND() LIMIT 1000;) selects non-adjacent rows, but I want to select random groups of n adjacent rows (in my picture, n = 3 rows).
The picture is just an example; the rows are different on every execution.
Not perfect, but maybe adequate for your purposes:
SELECT * FROM my_table;
+-----+
| id |
+-----+
| 1 |
| 2 |
| 3 |
...
| 188 |
| 189 |
| 190 |
| 191 |
...
| 253 |
| 254 |
| 255 |
| 256 |
+-----+
SELECT DISTINCT a.* FROM my_table a JOIN (SELECT * FROM my_table ORDER BY RAND() LIMIT 10) b ON b.id BETWEEN a.id AND a.id+2 ORDER BY id;
+-----+
| id |
+-----+
| 1 |
| 31 |
| 32 |
| 33 |
| 108 |
| 109 |
| 110 |
| 144 |
| 145 |
| 146 |
| 166 |
| 167 |
| 168 |
| 199 |
| 200 |
| 201 |
| 202 |
| 203 |
| 204 |
| 225 |
| 226 |
| 227 |
| 232 |
| 233 |
| 234 |
| 246 |
| 247 |
| 248 |
+-----+
28 rows in set (0.00 sec)
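To try the idea outside MySQL, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in (SQLite spells RAND() as RANDOM()); the 256-row table and the random_runs name are hypothetical. Each random anchor row pulls in itself and the n - 1 ids just below it, which is why runs can merge after DISTINCT (199-204 above) or be cut short at the start of the table (the lone 1).

```python
import sqlite3

# Hypothetical stand-in for my_table: ids 1..256 with no gaps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO my_table (id) VALUES (?)",
                 [(i,) for i in range(1, 257)])


def random_runs(conn, anchors=10, n=3):
    """Return the ids of `anchors` random rows plus the n - 1 ids just below each."""
    # Same shape as the MySQL query above; only RAND() becomes RANDOM().
    sql = """
        SELECT DISTINCT a.id
        FROM my_table a
        JOIN (SELECT id FROM my_table ORDER BY RANDOM() LIMIT :anchors) b
          ON b.id BETWEEN a.id AND a.id + :span
        ORDER BY a.id
    """
    rows = conn.execute(sql, {"anchors": anchors, "span": n - 1})
    return [row[0] for row in rows]


print(random_runs(conn))
```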
Assuming the langids are contiguous, you can select one group with SELECT ... WHERE id > 3*r AND id <= 3*(r+1), where r is a random integer from 0 to MAX(id)/3 - 1. Multiplying r by 3 ensures that no groups overlap.
You could create a temporary table or subquery with SELECT DISTINCT langid DIV 3, order it randomly, select the first N of those values, and then join it against this table.
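A minimal sketch of the bucket approach, again with Python's sqlite3 standing in for MySQL. It assumes gap-free ids starting at 1 and uses (id - 1) / n, which SQLite evaluates as integer division (MySQL would write (id - 1) DIV n); with n = 3 this reproduces the id > 3*r AND id <= 3*(r+1) ranges. The table contents and the random_groups name are hypothetical.

```python
import sqlite3

# Hypothetical stand-in for my_table: gap-free ids 1..256.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO my_table (id) VALUES (?)",
                 [(i,) for i in range(1, 257)])


def random_groups(conn, groups=3, n=3):
    """Return (id, grp) for every row in `groups` random, non-overlapping blocks of n."""
    # (id - 1) / :n truncates because both sides are integers, so block r
    # holds ids n*r + 1 .. n*(r + 1). Shuffle the distinct block numbers,
    # keep the first few, and join back to fetch their rows.
    sql = """
        SELECT t.id, (t.id - 1) / :n AS grp
        FROM my_table t
        JOIN (SELECT grp
              FROM (SELECT DISTINCT (id - 1) / :n AS grp FROM my_table) d
              ORDER BY RANDOM()
              LIMIT :groups) pick
          ON (t.id - 1) / :n = pick.grp
        ORDER BY t.id
    """
    return conn.execute(sql, {"n": n, "groups": groups}).fetchall()


print(random_groups(conn))
```

Because the blocks are fixed in advance, two picks can never overlap or merge.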
Consider this:
SELECT id, name, rn, (rn - 1) DIV 3 AS groupid FROM
(SELECT id, name, @rn := @rn + 1 AS rn
FROM (SELECT id, name FROM Objects) z, (SELECT @rn := 0) zz) numbered;
This result set will give new contiguous IDs to the rows in the Objects table, so we don't have to assume anything about their actual primary keys. groupid indexes the groups.
From this set you can select any number of groupids randomly, and then for each chosen groupid you can find the original primary key.
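A sketch of the renumbering idea in SQLite via Python. It swaps the @rn user variable for ROW_NUMBER() (available in SQLite 3.25+ and MySQL 8.0+), so it assumes a server with window functions; the Objects rows, with deliberate gaps in id, and the random_object_groups name are made up for illustration.

```python
import sqlite3

# Hypothetical Objects rows with gaps in id, so plain id arithmetic would not work.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Objects (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO Objects VALUES (?, ?)",
                 [(i, f"obj{i}") for i in (2, 3, 7, 10, 11, 15, 20, 21, 30)])


def random_object_groups(conn, groups=2, n=3):
    """Number rows 1..N in id order, cut them into blocks of n, return random blocks."""
    sql = """
        WITH numbered AS (
            SELECT id, name,
                   (ROW_NUMBER() OVER (ORDER BY id) - 1) / :n AS groupid
            FROM Objects
        )
        SELECT num.id, num.name, num.groupid
        FROM numbered num
        JOIN (SELECT groupid
              FROM (SELECT DISTINCT groupid FROM numbered) d
              ORDER BY RANDOM()
              LIMIT :groups) pick
          ON pick.groupid = num.groupid
        ORDER BY num.id
    """
    return conn.execute(sql, {"n": n, "groups": groups}).fetchall()


print(random_object_groups(conn))
```

The join back to `numbered` returns the original primary keys for each chosen groupid.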
Related
I am trying to select all but the last row of grouped data from a table.
Current table:
+----+--------+--------+
| id | userID | amount |
+----+--------+--------+
| 1 | 20 | 400 |
| 2 | 20 | 200 |
| 3 | 21 | 100 |
| 4 | 11 | 500 |
| 5 | 11 | 250 |
| 6 | 21 | 50 |
| 7 | 20 | 100 |
| 8 | 21 | 200 |
+----+--------+--------+
Expected output:
+----+--------+--------+
| id | userID | amount |
+----+--------+--------+
| 1 | 20 | 400 |
| 2 | 20 | 200 |
| 3 | 21 | 100 |
| 4 | 11 | 500 |
| 6 | 21 | 50 |
+----+--------+--------+
I have tried this query:
SELECT *
FROM table
WHERE userID != (SELECT MAX(userID) FROM table)
GROUP
BY userID
but it only fetches one row per userID, even though more rows should be returned.
You have no aggregate function, so you don't need GROUP BY:
SELECT *
FROM table
WHERE userID != (
SELECT MAX(userID) FROM table
)
This happens with MySQL versions before 5.7; from 5.7 on, the default settings (ONLY_FULL_GROUP_BY) make this use of GROUP BY raise an error.
E.g.:
SELECT a.*
FROM my_table a
LEFT
JOIN
( SELECT MAX(id) id
FROM my_table
GROUP
BY userid
) b
ON b.id = a.id
WHERE b.id IS NULL
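A quick check of this anti-join on the question's sample data, using Python's sqlite3 as a stand-in for MySQL (the query itself runs unchanged; only ORDER BY is added for a stable result). MAX(id) per userID marks each user's last row, and the rows with no match in b are everything else. The all_but_last name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, userID INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
    (1, 20, 400), (2, 20, 200), (3, 21, 100), (4, 11, 500),
    (5, 11, 250), (6, 21, 50), (7, 20, 100), (8, 21, 200),
])


def all_but_last(conn):
    """Rows that are not the highest-id row of their userID group."""
    sql = """
        SELECT a.*
        FROM my_table a
        LEFT JOIN (SELECT MAX(id) AS id FROM my_table GROUP BY userID) b
          ON b.id = a.id
        WHERE b.id IS NULL          -- no match: a is not the last row of its user
        ORDER BY a.id
    """
    return conn.execute(sql).fetchall()


for row in all_but_last(conn):
    print(row)
```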
I have two tables:
mysql> select * from survey;
+-----------+-----------+----------+--------+-----------+
| survey_id | client_id | stage_id | by_ref | no_branch |
+-----------+-----------+----------+--------+-----------+
| 2 | 65 | 72 | P | 15 |
| 3 | 67 | 72 | D | 2 |
+-----------+-----------+----------+--------+-----------+
2 rows in set (0.07 sec)
mysql> select * from allcode where code_type="MARKETING_STAGES";
+------------------+---------+------+--------------------+
| code_type | code_id | srno | code_name |
+------------------+---------+------+--------------------+
| MARKETING_STAGES | 72 | 1 | Enquiry |
| MARKETING_STAGES | 73 | 3 | Meeting |
| MARKETING_STAGES | 74 | 4 | Presentation |
| MARKETING_STAGES | 75 | 5 | Review / Follow up |
| MARKETING_STAGES | 76 | 6 | Negotiation |
| MARKETING_STAGES | 77 | 7 | Order |
| MARKETING_STAGES | 78 | 8 | Agreement |
| MARKETING_STAGES | 162 | 9 | Complete |
| MARKETING_STAGES | 163 | 2 | Tender |
+------------------+---------+------+--------------------+
9 rows in set (0.04 sec)
I want to update the stage_id in the survey table to the next value, fetched from allcode.code_id.
Right now I have client_id 65 from the survey table, and I want to update its stage_id to 163 (i.e. the next code_id in the allcode table when sorted by srno).
What I have tried so far:
update survey as s
set s.stage_id=
(select code_id from allcode
where code_id > (select stage_id from (select * from survey where client_id=65 )as su)
and code_type="MARKETING_STAGES"
limit 1)
where client_id=65;
This query updates the stage_id in survey to 73, but I want it to be 163 (based on srno).
I would use joins in the update to get the next code_id based on srno:
update survey s
inner join allcode a1 on s.stage_id=a1.code_id
inner join allcode a2 on a1.srno=a2.srno-1
set s.stage_id=a2.code_id
where a1.code_type='MARKETING_STAGES' and a2.code_type='MARKETING_STAGES' and s.client_id=65
I assume that the srno field increments by 1 without any gaps. The first join gets the srno of the current stage_id; the second join then gets the code_id for the next srno.
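A sketch of the srno + 1 lookup that can be run with Python's sqlite3. SQLite has no UPDATE ... JOIN, so here the two joins move into a scalar subquery; COALESCE keeps the current stage when there is no next srno, mirroring the MySQL join, which simply matches no row in that case. Only the survey columns the query needs are created, and advance_stage / stage_of are hypothetical helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE survey (survey_id INTEGER, client_id INTEGER, stage_id INTEGER);
    INSERT INTO survey VALUES (2, 65, 72), (3, 67, 72);
    CREATE TABLE allcode (code_type TEXT, code_id INTEGER, srno INTEGER, code_name TEXT);
    INSERT INTO allcode VALUES
        ('MARKETING_STAGES', 72, 1, 'Enquiry'),
        ('MARKETING_STAGES', 73, 3, 'Meeting'),
        ('MARKETING_STAGES', 74, 4, 'Presentation'),
        ('MARKETING_STAGES', 75, 5, 'Review / Follow up'),
        ('MARKETING_STAGES', 76, 6, 'Negotiation'),
        ('MARKETING_STAGES', 77, 7, 'Order'),
        ('MARKETING_STAGES', 78, 8, 'Agreement'),
        ('MARKETING_STAGES', 162, 9, 'Complete'),
        ('MARKETING_STAGES', 163, 2, 'Tender');
""")


def advance_stage(conn, client_id):
    """Move a client's stage_id to the code_id whose srno is one higher."""
    conn.execute("""
        UPDATE survey
        SET stage_id = COALESCE((
                SELECT a2.code_id
                FROM allcode a1
                JOIN allcode a2
                  ON a2.srno = a1.srno + 1 AND a2.code_type = a1.code_type
                WHERE a1.code_id = survey.stage_id     -- row of the current stage
                  AND a1.code_type = 'MARKETING_STAGES'),
            stage_id)                                  -- no next srno: keep stage
        WHERE client_id = ?
    """, (client_id,))


def stage_of(conn, client_id):
    row = conn.execute("SELECT stage_id FROM survey WHERE client_id = ?",
                       (client_id,)).fetchone()
    return row[0]


advance_stage(conn, 65)
print(stage_of(conn, 65))  # 163
```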
You were missing an ORDER BY before the LIMIT in the subquery.
Without touching the rest of your query, I just added an ORDER BY, and it seems to update the stage_id from 72 to 163 as you want.
update survey as s
set s.stage_id=
(select code_id from allcode
where code_id > (select stage_id from (select * from survey where client_id=65 )as su)
and code_type="MARKETING_STAGES"
ORDER BY SRNO
limit 1)
where client_id=65;
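One caveat, checked against the sample data: the subquery still filters on code_id > stage_id, so it only works while the next stage happens to have a larger code_id. From 163 (srno 2) no code_id is larger, so the subquery returns NULL; from 73 (srno 3) it picks 163 again. Below is a sketch that compares srno instead, run through Python's sqlite3; the statement shape should also be accepted by MySQL because the subqueries read only allcode, not survey. The helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE survey (survey_id INTEGER, client_id INTEGER, stage_id INTEGER);
    INSERT INTO survey VALUES (2, 65, 72), (3, 67, 72);
    CREATE TABLE allcode (code_type TEXT, code_id INTEGER, srno INTEGER, code_name TEXT);
    INSERT INTO allcode VALUES
        ('MARKETING_STAGES', 72, 1, 'Enquiry'),
        ('MARKETING_STAGES', 73, 3, 'Meeting'),
        ('MARKETING_STAGES', 74, 4, 'Presentation'),
        ('MARKETING_STAGES', 75, 5, 'Review / Follow up'),
        ('MARKETING_STAGES', 76, 6, 'Negotiation'),
        ('MARKETING_STAGES', 77, 7, 'Order'),
        ('MARKETING_STAGES', 78, 8, 'Agreement'),
        ('MARKETING_STAGES', 162, 9, 'Complete'),
        ('MARKETING_STAGES', 163, 2, 'Tender');
""")


def advance_stage(conn, client_id):
    """Move a client's stage_id to the code_id with the next larger srno."""
    conn.execute("""
        UPDATE survey
        SET stage_id = COALESCE((
                SELECT nxt.code_id
                FROM allcode nxt
                WHERE nxt.code_type = 'MARKETING_STAGES'
                  AND nxt.srno > (SELECT cur.srno          -- srno of current stage
                                  FROM allcode cur
                                  WHERE cur.code_type = 'MARKETING_STAGES'
                                    AND cur.code_id = survey.stage_id)
                ORDER BY nxt.srno
                LIMIT 1),
            stage_id)                                      -- last stage: keep it
        WHERE client_id = ?
    """, (client_id,))


def stage_of(conn, client_id):
    row = conn.execute("SELECT stage_id FROM survey WHERE client_id = ?",
                       (client_id,)).fetchone()
    return row[0]


advance_stage(conn, 65)
print(stage_of(conn, 65))  # 163
```

Unlike the srno + 1 join, this version also tolerates gaps in srno.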
I have a table with a bunch of machine IDs (mid), sensor IDs (sid), and their corresponding values (v). The id column is a unique row number. (NB: There are other columns in the table, and not all mids have the same sids.)
Current Table:
+------+-------+-------+-----+---------------------+
| id | mid | sid | v | timestamp |
+------+-------+-------+-----+---------------------+
| 51 | 10 | 1 | 40 | 2015/5/1 11:56:01 |
| 52 | 10 | 2 | 39 | 2015/5/1 11:56:25 |
| 53 | 10 | 2 | 40 | 2015/5/1 11:56:42 |
| 54 | 11 | 1 | 50 | 2015/5/1 11:57:52 |
| 55 | 11 | 2 | 18 | 2015/5/1 11:58:41 |
| 56 | 11 | 2 | 19 | 2015/5/1 11:58:59 |
| 57 | 11 | 3 | 58 | 2015/5/1 11:59:01 |
| 58 | 11 | 3 | 65 | 2015/5/1 11:59:29 |
+------+-------+-------+-----+---------------------+
Q: How would I get the MAX(v) for each sid for each mid?
Expected Output:
+------+-------+-------+-----+---------------------+
| id | mid | sid | v | timestamp |
+------+-------+-------+-----+---------------------+
| 51 | 10 | 1 | 40 | 2015/5/1 11:56:01 |
| 53 | 10 | 2 | 40 | 2015/5/1 11:56:42 |
| 54 | 11 | 1 | 50 | 2015/5/1 11:57:52 |
| 56 | 11 | 2 | 19 | 2015/5/1 11:58:59 |
| 58 | 11 | 3 | 65 | 2015/5/1 11:59:29 |
+------+-------+-------+-----+---------------------+
The expected output is the whole row containing the (single) max v for each sid within each mid.
Addendum:
Because the table is very big, I need to restrict the query by date. For the sample above, the two boundaries would be 2015/05/01 00:00:00 (1 May 2015) to 2015/05/02 00:00:00 (2 May 2015). Q: How could I add this date boundary?
Find the max v for each combination of mid and sid in a subquery, then join it with your original table to get the desired result.
select *
from your_table t
join (
select mid, sid, max(v) as v
from your_table
group by mid, sid
) t2 using (mid, sid, v);
Note that if there are multiple rows with the same mid, sid and v, this returns all of them.
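A sketch of the same join with the addendum's date boundary, run through Python's sqlite3. It assumes the timestamps are stored in a sortable 'YYYY-MM-DD HH:MM:SS' form (the column is called ts here) and treats the window as half-open, [start, end). The filter appears twice: inside the derived table, so MAX(v) is computed over in-range rows only, and in the outer query, so an out-of-range row with the same (mid, sid, v) cannot join back in. Rows 59 and 60 are hypothetical additions that exercise both filters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table "
             "(id INTEGER PRIMARY KEY, mid INTEGER, sid INTEGER, v INTEGER, ts TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?, ?)", [
    (51, 10, 1, 40, "2015-05-01 11:56:01"),
    (52, 10, 2, 39, "2015-05-01 11:56:25"),
    (53, 10, 2, 40, "2015-05-01 11:56:42"),
    (54, 11, 1, 50, "2015-05-01 11:57:52"),
    (55, 11, 2, 18, "2015-05-01 11:58:41"),
    (56, 11, 2, 19, "2015-05-01 11:58:59"),
    (57, 11, 3, 58, "2015-05-01 11:59:01"),
    (58, 11, 3, 65, "2015-05-01 11:59:29"),
    # Hypothetical rows outside the window, added to exercise the date filter.
    (59, 11, 3, 99, "2015-04-30 08:00:00"),
    (60, 10, 1, 40, "2015-05-03 09:00:00"),
])


def max_rows_between(conn, start, end):
    """Whole rows holding MAX(v) per (mid, sid), for start <= ts < end."""
    sql = """
        SELECT t.id, t.mid, t.sid, t.v, t.ts
        FROM your_table t
        JOIN (SELECT mid, sid, MAX(v) AS v
              FROM your_table
              WHERE ts >= :start AND ts < :end     -- max over in-range rows
              GROUP BY mid, sid) t2 USING (mid, sid, v)
        WHERE t.ts >= :start AND t.ts < :end       -- keep out-of-range ties out
        ORDER BY t.id
    """
    return conn.execute(sql, {"start": start, "end": end}).fetchall()


for row in max_rows_between(conn, "2015-05-01 00:00:00", "2015-05-02 00:00:00"):
    print(row)
```

With an index on the timestamp column, the range predicate is also what lets the server avoid scanning the whole table.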
As mentioned in the comments, since you have an id column, you can use it in a correlated subquery with LIMIT 1, like this:
select *
from your_table t1
where id = (select id
from your_table t2
where t1.mid = t2.mid
and t1.sid = t2.sid
order by v desc, id desc
limit 1
);
This will give you one single row per mid, sid combination with max v (and latest id in case of ties).
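The correlated version, checked with Python's sqlite3 on the sample rows plus one hypothetical tie (id 59 repeats v = 65 for mid 11, sid 3), so the id DESC tie-break is visible. Only the columns the query touches are created, and one_max_row_per_group is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, mid INTEGER, sid INTEGER, v INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", [
    (51, 10, 1, 40), (52, 10, 2, 39), (53, 10, 2, 40), (54, 11, 1, 50),
    (55, 11, 2, 18), (56, 11, 2, 19), (57, 11, 3, 58), (58, 11, 3, 65),
    (59, 11, 3, 65),  # hypothetical tie with row 58
])


def one_max_row_per_group(conn):
    """One row per (mid, sid): the highest v, and the highest id among tied v."""
    sql = """
        SELECT *
        FROM your_table t1
        WHERE id = (SELECT id
                    FROM your_table t2
                    WHERE t1.mid = t2.mid AND t1.sid = t2.sid
                    ORDER BY v DESC, id DESC
                    LIMIT 1)
        ORDER BY id
    """
    return conn.execute(sql).fetchall()


for row in one_max_row_per_group(conn):
    print(row)
```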
Use the MAX() function with a GROUP BY clause:
SELECT id, mid, sid, MAX(v) AS v, `timestamp`
FROM MyTable
GROUP BY mid, sid;
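This returns the maximum v for each combination of mid and sid. Note, however, that id and timestamp are not aggregated, so MySQL is free to take them from any row in the group, and MySQL 5.7+ rejects the query under the default ONLY_FULL_GROUP_BY mode.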
I have the following data:
+---------+----------+----------+--------+
| id | someId | number | data |
+---------+----------+----------+--------+
| 27 | 123 | 1 | abcde1 |
| 28 | 123 | 3 | abcde2 |
| 29 | 123 | 1 | abcde3 |
| 30 | 123 | 5 | abcde4 |
| 31 | 124 | 4 | abcde1 |
| 32 | 124 | 8 | abcde2 |
| 33 | 124 | 1 | abcde3 |
| 34 | 124 | 2 | abcde4 |
| 35 | 123 | 16 | abcde1 |
| 245 | 123 | 3 | abcde2 |
| 250 | 125 | 0 | abcde3 |
| 251 | 125 | 1 | abcde4 |
| 252 | 125 | 7 | abcde1 |
| 264 | 125 | 0 | abcde2 |
| 294 | 123 | 0 | abcde3 |
| 295 | 126 | 0 | abcde4 |
| 296 | 126 | 0 | abcde1 |
| 376 | 126 | 0 | abcde2 |
+---------+----------+----------+--------+
I want a MySQL query that gets me the data of the row with the highest number for each someId. Note that id is unique, but number isn't.
SELECT someid, highest_number, data
FROM test_1
INNER JOIN (SELECT someid sid, max(number) highest_number
FROM test_1
GROUP BY someid) t
ON (someid=sid and number=highest_number)
Unfortunately, it does not look very efficient. In Oracle it would be possible to use an OVER clause without subqueries, but MySQL…
Update 1
If the highest number occurs several times for a someid, this also returns several data values for that pair of someid and number.
To get only one row per someid, we should pre-aggregate the source table so that the (someid, number) pairs are unique (see the t1 subquery):
SELECT someid, highest_number, data
FROM
(SELECT someid, number, MIN(data) data
FROM test_1
GROUP BY
someid, number) t1
INNER JOIN
(SELECT someid sid, max(number) highest_number
FROM test_1
GROUP BY someid) t2
ON (someid=sid and number=highest_number)
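A side-by-side check of the plain join and the pre-aggregated version on the question's data, using Python's sqlite3 as a stand-in for MySQL. someId 126 has three rows with number 0, so the plain join returns three rows for it, while the pre-aggregated query keeps one (MIN(data) = 'abcde1'). The function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_1 (id INTEGER PRIMARY KEY, someId INTEGER, number INTEGER, data TEXT)")
conn.executemany("INSERT INTO test_1 VALUES (?, ?, ?, ?)", [
    (27, 123, 1, "abcde1"), (28, 123, 3, "abcde2"), (29, 123, 1, "abcde3"),
    (30, 123, 5, "abcde4"), (31, 124, 4, "abcde1"), (32, 124, 8, "abcde2"),
    (33, 124, 1, "abcde3"), (34, 124, 2, "abcde4"), (35, 123, 16, "abcde1"),
    (245, 123, 3, "abcde2"), (250, 125, 0, "abcde3"), (251, 125, 1, "abcde4"),
    (252, 125, 7, "abcde1"), (264, 125, 0, "abcde2"), (294, 123, 0, "abcde3"),
    (295, 126, 0, "abcde4"), (296, 126, 0, "abcde1"), (376, 126, 0, "abcde2"),
])


def plain_join(conn):
    # Original query: ties on the highest number come back as extra rows.
    return conn.execute("""
        SELECT someid, highest_number, data
        FROM test_1
        JOIN (SELECT someid sid, MAX(number) highest_number
              FROM test_1 GROUP BY someid) t
          ON someid = sid AND number = highest_number
        ORDER BY someid, data
    """).fetchall()


def pre_aggregated(conn):
    # Update 1: collapse (someid, number) pairs first, so each pair has one data.
    return conn.execute("""
        SELECT someid, highest_number, data
        FROM (SELECT someid, number, MIN(data) data
              FROM test_1 GROUP BY someid, number) t1
        JOIN (SELECT someid sid, MAX(number) highest_number
              FROM test_1 GROUP BY someid) t2
          ON someid = sid AND number = highest_number
        ORDER BY someid
    """).fetchall()


print(len(plain_join(conn)), pre_aggregated(conn))
```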
Update 2
It is possible to simplify the previous solution:
SELECT someid, highest_number,
(select min(data)
from test_1
where someid=t1.someid and number=highest_number) AS data
FROM
(SELECT someid, max(number) highest_number
FROM test_1
GROUP BY someid) t1
Once the unique pairs of someid and highest number are materialized, we can use a correlated subquery. Unlike a JOIN, it does not produce additional rows when the highest value of number is repeated several times, because MIN(data) collapses the ties to one value.
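The correlated form run on the same data with Python's sqlite3 as a stand-in for MySQL; it should give the same four rows as the pre-aggregated join. The function name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_1 (id INTEGER PRIMARY KEY, someId INTEGER, number INTEGER, data TEXT)")
conn.executemany("INSERT INTO test_1 VALUES (?, ?, ?, ?)", [
    (27, 123, 1, "abcde1"), (28, 123, 3, "abcde2"), (29, 123, 1, "abcde3"),
    (30, 123, 5, "abcde4"), (31, 124, 4, "abcde1"), (32, 124, 8, "abcde2"),
    (33, 124, 1, "abcde3"), (34, 124, 2, "abcde4"), (35, 123, 16, "abcde1"),
    (245, 123, 3, "abcde2"), (250, 125, 0, "abcde3"), (251, 125, 1, "abcde4"),
    (252, 125, 7, "abcde1"), (264, 125, 0, "abcde2"), (294, 123, 0, "abcde3"),
    (295, 126, 0, "abcde4"), (296, 126, 0, "abcde1"), (376, 126, 0, "abcde2"),
])


def correlated(conn):
    # One MAX(number) row per someid, then a single data value for that
    # (someid, highest_number) pair; MIN() reduces tied rows to one value.
    return conn.execute("""
        SELECT someid, highest_number,
               (SELECT MIN(data)
                FROM test_1
                WHERE someid = t1.someid AND number = t1.highest_number) AS data
        FROM (SELECT someid, MAX(number) highest_number
              FROM test_1 GROUP BY someid) t1
        ORDER BY someid
    """).fetchall()


print(correlated(conn))
```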
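A slight tweak to Naeel's answer: to return just a single data result for each someId even when there's a tie, add a GROUP BY. Which of the tied data values you get is then up to MySQL, and the query fails when ONLY_FULL_GROUP_BY is enabled (the default from 5.7):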
SELECT t1.someid, t1.number, t1.data
FROM Table1 t1
INNER JOIN (SELECT someId sid, max(number) max_number
FROM Table1
GROUP BY someId) t2
ON (someId = sid AND number = max_number)
GROUP BY t1.someId
I have 2 tables:
table 'g'
+------+
| id |
+------+
| 1 |
| 32 |
| 3 |
| 6 |
| 5 |
| 22 |
| 54 |
| 21 |
+------+
table 'h'
+------+------+
| id | sl |
+------+------+
| 1 | 323 |
| 11 | 423 |
| 1 | 333 |
| 33 | 32 |
| 44 | 443 |
+------+------+
How can I show records from both tables like this: select the distinct ids from tables 'g' and 'h', and join the maximum 'sl' from table 'h' for each id? For ids of table 'g' that have no matching id in table 'h', the 'sl' field should be null.
+------+------+
| id | sl |
+------+------+
| 1 | 333 |
| 32 | null |
| 3 | null |
| 6 | null |
| 5 | null |
| 22 | null |
| 54 | null |
| 21 | null |
| 11 | 423 |
| 33 | 32 |
| 44 | 443 |
+------+------+
-Thanks.
This can be done with a UNION of the ids from both tables, used as a derived table and left joined against h to get the MAX() values:
SELECT
allids.id,
MAX(sl) AS sl
FROM
/* Subquery gets UNION (distinct, not UNION ALL) of ids from both tables */
(SELECT id FROM g UNION SELECT id FROM h) allids
/* LEFT JOINed back against `h` for the MAX() aggregates */
LEFT JOIN h ON allids.id = h.id
GROUP BY id
http://sqlfiddle.com/#!2/2c348/3
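The same query checked with Python's sqlite3 against the question's tables. GROUP BY is written as allids.id here so that the id column is unambiguous between allids and h; the ids_with_max_sl name is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE g (id INTEGER);
    INSERT INTO g VALUES (1), (32), (3), (6), (5), (22), (54), (21);
    CREATE TABLE h (id INTEGER, sl INTEGER);
    INSERT INTO h VALUES (1, 323), (11, 423), (1, 333), (33, 32), (44, 443);
""")


def ids_with_max_sl(conn):
    """Map every id found in g or h to its MAX(sl) from h (None if h lacks it)."""
    rows = conn.execute("""
        SELECT allids.id, MAX(h.sl) AS sl
        FROM (SELECT id FROM g UNION SELECT id FROM h) allids   -- distinct ids
        LEFT JOIN h ON allids.id = h.id
        GROUP BY allids.id
    """).fetchall()
    return dict(rows)


print(ids_with_max_sl(conn))
```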
Update after comments:
To make them sort in the (arbitrary, unordered) order in which they were inserted, it may sort of work to put a number literal into the subquery and use it in the ORDER BY.
The order in which rows are inserted isn't really meaningful to the RDBMS, though. You cannot reliably assume that they will always be returned in the same order without an ORDER BY clause.
SELECT
allids.id,
MAX(sl) AS sl
FROM
/* Awful hack adds a number literal which is used in the ORDER BY */
/* This still won't guarantee that the rows from each table will be in the original order though */
(SELECT id, 1 AS sort FROM g UNION SELECT id, 2 AS sort FROM h) allids
LEFT JOIN h ON allids.id = h.id
GROUP BY id
ORDER BY sort
http://sqlfiddle.com/#!2/2c348/6
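If a deterministic order is enough (rather than true insertion order), one variant is to aggregate the sort key: ORDER BY MIN(allids.sort) gives id 1, which appears in both tables, a single sort value, and the id tie-break makes the order repeatable; because sort is aggregated, the query also stays valid under ONLY_FULL_GROUP_BY. A sketch using Python's sqlite3; keeping real insertion order would still require a column that records it, and the function name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE g (id INTEGER);
    INSERT INTO g VALUES (1), (32), (3), (6), (5), (22), (54), (21);
    CREATE TABLE h (id INTEGER, sl INTEGER);
    INSERT INTO h VALUES (1, 323), (11, 423), (1, 333), (33, 32), (44, 443);
""")


def ids_in_table_order(conn):
    """ids from g first, then ids found only in h; each with MAX(sl) from h."""
    return conn.execute("""
        SELECT allids.id, MAX(h.sl) AS sl
        FROM (SELECT id, 1 AS sort FROM g
              UNION
              SELECT id, 2 AS sort FROM h) allids
        LEFT JOIN h ON allids.id = h.id
        GROUP BY allids.id
        ORDER BY MIN(allids.sort), allids.id    -- g ids first, then h-only ids
    """).fetchall()


for row in ids_in_table_order(conn):
    print(row)
```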