I've got a really big problem, and it stems from a table with 50k+ records.
This table looks something like this (+15 or so more columns that aren't too important):
table_1
date | name | email | num_x | num_y
I also have another table ON A DIFFERENT DB (same server) that looks something like this (+1 not important column):
table_2
name | comment | status
table_1 is updated daily with new entries (it is a feed table for use on other projects), which means there are a lot of repeat "name" rows. This is intended. table_2 contains comments and status notes about "name"s, but no repeat "name"s.
I need to write a query that will select all "name"s from table_1 where the total of all num_x + num_y > X. So, for example, if this were a few rows...
2010-11-19 | john.smith | john.smith@example.com | 20 | 20
2010-11-19 | joel.schmo | joel.schmo@example.com | 10 | 10
2010-11-18 | john.smith | john.smith@example.com | 20 | 20
2010-11-18 | joel.schmo | joel.schmo@example.com | 10 | 10
.. and I needed to find all "name"s with total num_x + num_y > 50, then I'd return
john.smith | john.smith@example.com | 80. I would also return john.smith's status and comment from the other DB.
I wrote a query that I believe works fine, but it's problematic because it takes forever and a day to run. I also successfully retrieve records from the other db (I don't have that listed below).
SELECT
name,
email,
SUM(num_x + num_y) AS total
FROM
table_1
GROUP BY
name
HAVING
SUM(num_x + num_y) > 100
ORDER BY
total ASC
Is there a better way to go about this?
Thanks everyone!
Dylan
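For reference, the cross-database lookup the question mentions but doesn't list would just qualify table_2 with its database name, since both databases live on the same server. A minimal sketch, assuming the second database is named db2:

SELECT
    t1.name,
    t1.email,
    SUM(t1.num_x + t1.num_y) AS total,
    t2.comment,
    t2.status
FROM
    table_1 t1
    LEFT JOIN db2.table_2 t2 ON t2.name = t1.name  -- db2 is an assumed name
GROUP BY
    t1.name, t1.email, t2.comment, t2.status
HAVING
    total > 100
ORDER BY
    total ASC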
Why do you repeat the sum in HAVING rather than reuse total? Unless I'm missing something, there is no difference in results, and avoiding the second sum would save time.
If you can skip the ORDER BY clause and don't mind the slightly different select, I think you'll get some speedup by splitting up the sum. I have a small database and have tested that it's a valid query and the results are correct, but it's not nearly large enough to quantify the performance difference.
SELECT
name,
email,
SUM(num_x) AS sumX, SUM(num_y) AS sumY
FROM
table_1
GROUP BY
name
HAVING
sumX + sumY > 100
An index on name is a no-brainer. That's the simplest thing that will speed it up.
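Going one step further, a covering index that includes every column the query touches lets MySQL compute the aggregate from the index alone, without visiting the table rows at all. A sketch (the column list is assumed from the query above):

-- name leads the index so the GROUP BY can scan it in order;
-- email, num_x and num_y ride along so no table lookups are needed.
ALTER TABLE `table_1` ADD INDEX `idx_name_covering` (`name`, `email`, `num_x`, `num_y`);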
Create an index on name; this will improve performance:
ALTER TABLE `table_1` ADD INDEX (`name`);
But redesigning your databases would be my recommendation. Create an artificial key for names, something like id_name | name | email, where id_name is an auto_increment integer; this way you'll get better performance.
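A minimal sketch of that redesign (types and sizes are assumed):

CREATE TABLE `names` (
    `id_name` INT AUTO_INCREMENT PRIMARY KEY,
    `name`    VARCHAR(100) NOT NULL UNIQUE,
    `email`   VARCHAR(100)
);
-- table_1 and table_2 would then store the integer id_name instead of
-- repeating the name string, making joins and grouping cheaper.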
Try:
SELECT
name,
email,
num_x + num_y AS total
FROM
table_1
WHERE
num_x + num_y > 100
ORDER BY
total ASC
Just getting rid of the grouping should make quite a significant difference.
Maybe change the database so the sum is maintained every time you change x or y, but it really depends on how often you change them...
Otherwise you can try to do the sum only once...
But I don't see why you do an ORDER BY on only one table if you've got a primary key...
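One way to "do the sum only once", as suggested above, is a running-totals table kept current by a trigger. A sketch with assumed names and types:

CREATE TABLE name_totals (
    name  VARCHAR(100) PRIMARY KEY,
    email VARCHAR(100),
    total INT NOT NULL DEFAULT 0
);

-- Keep the running total current as the daily feed inserts into table_1.
CREATE TRIGGER table_1_after_insert AFTER INSERT ON table_1
FOR EACH ROW
    INSERT INTO name_totals (name, email, total)
    VALUES (NEW.name, NEW.email, NEW.num_x + NEW.num_y)
    ON DUPLICATE KEY UPDATE total = total + NEW.num_x + NEW.num_y;

-- The report then becomes a cheap scan of a small table:
SELECT name, email, total
FROM name_totals
WHERE total > 100
ORDER BY total;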
Related
I have a MariaDB table with an archive of past lottery results, imagine EuroMillions or Powerball lotteries.
For example, on EuroMillions numbers go from 1 to 50 and then the extra balls from 1 to 12; each result is 5 numbers from the main pool and 2 from the extra pool. So my historic results table could look like this:
Lottery Results table
(other columns like id, date, draw number, etc) | main_numbers | extra_numbers | (timestamp columns)
... | 1,2,3,4,5 | 1,2 | ...
... | 3,12,34,35,45 | 5,11 | ...
... | 4,15,34,39,45 | 10,11 | ...
... | 7,11,25,28,44 | 10,12 | ...
(you get the idea, I have thousands of records...)
So I could select main_numbers and get result "3,12,34,35,45" for that second example row. And for the extra_numbers I would get "5,11".
What I want is, given a set of numbers for main and extra, to see if they match any of my results, matching any count of numbers (numbered lottery balls).
So for example if I SELECT to find main_numbers "5,9,22,34,45" with extra_numbers "2,11" I would get (from my extracted example) two records:
... | 3,12,34,35,45 | 5,11 | ...
... | 4,15,34,39,45 | 10,11 | ...
Matching two main numbers and one extra number, in this case finding lottery prizes in the results table. Makes sense?
I'm using MariaDB and I'm a bit lost on how to proceed; I tried WHERE ... IN, FIND_IN_SET, etc.
Is there a way to perform a SELECT to find results in only one statement or do I have to pick all records and then iterate elsewhere, php for example?
My aim would be to have it in one statement, so I could just send the numbers and get the matching records... Possible?
I hope this makes sense.
Many thanks for your answers.
Consider the following.
For simplicity, let's say that a lottery comprises 3 main balls and 2 bonus balls:
DROP TABLE IF EXISTS lottery_results;
CREATE TABLE lottery_results
(draw_id INT NOT NULL
,ball_no INT NOT NULL
,ball_val INT NOT NULL
,PRIMARY KEY(draw_id,ball_no)
);
INSERT INTO lottery_results VALUES
(1,1,22),
(1,2,35),
(1,3,62),
(1,4,27),
(1,5,17),
(2,1,18),
(2,2,33),
(2,3,49),
(2,4, 4),
(2,5,35);
And we want to find all results where 34, 35, or 36 were drawn as a main number...
SELECT draw_id
FROM lottery_results
WHERE ball_no <=3
AND ball_val IN(34,35,36);
+---------+
| draw_id |
+---------+
| 1 |
+---------+
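Extending that idea, because each ball is its own row, counting matches per draw is plain conditional aggregation (a sketch; the played numbers are illustrative):

-- In MySQL/MariaDB a boolean expression evaluates to 0 or 1, so SUM counts
-- how many balls in each draw satisfy the condition.
SELECT draw_id,
       SUM(ball_no <= 3 AND ball_val IN (33, 35, 62)) AS main_matches,
       SUM(ball_no  > 3 AND ball_val IN (17, 35))     AS bonus_matches
FROM lottery_results
GROUP BY draw_id;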
Thanks Strawberry,
I found a solution if I have all the numbers in distinct columns, but could I find them if they are in the same column as CSV?
So if I put my CSV in distinct columns for numbers (n_1...n_5) and extra numbers for the stars in (s_1, s_2), I can seek matches across those multiple columns.
This is using multiple columns:
To find matched numbers 1,2,3,4,5 with stars 1,2...
In EuroMillions you get a prize with 2 or more numbers and any star (one or two).
SELECT
main_numbers, extra_numbers,
((n_1 IN (1,2,3,4,5)) +
(n_2 IN (1,2,3,4,5)) +
(n_3 IN (1,2,3,4,5)) +
(n_4 IN (1,2,3,4,5)) +
(n_5 IN (1,2,3,4,5))) AS matched_numbers,
((s_1 IN (1,2)) +
(s_2 IN (1,2))) AS matched_stars,
created_at
FROM `lottery_results_archive`
HAVING matched_numbers >= 3 OR (matched_numbers = 2 AND matched_stars > 0)
ORDER BY matched_numbers DESC, matched_stars DESC, created_at DESC
Makes sense?
Thanks.
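For completeness, the CSV columns can also be queried in place with FIND_IN_SET, though unlike the multi-column version this can never use an index. A sketch against the same table, for numbers 1,2,3,4,5 with stars 1,2:

-- FIND_IN_SET returns the 1-based position of a value in a CSV string,
-- or 0 when absent, so each comparison contributes 0 or 1 to the count.
SELECT
    main_numbers, extra_numbers,
    ((FIND_IN_SET('1', main_numbers) > 0) +
     (FIND_IN_SET('2', main_numbers) > 0) +
     (FIND_IN_SET('3', main_numbers) > 0) +
     (FIND_IN_SET('4', main_numbers) > 0) +
     (FIND_IN_SET('5', main_numbers) > 0)) AS matched_numbers,
    ((FIND_IN_SET('1', extra_numbers) > 0) +
     (FIND_IN_SET('2', extra_numbers) > 0)) AS matched_stars
FROM `lottery_results_archive`
HAVING matched_numbers >= 3 OR (matched_numbers = 2 AND matched_stars > 0)
ORDER BY matched_numbers DESC, matched_stars DESC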
How to retrieve odd rows from the table?
In the base table, Cr_id is always duplicated 2 times.
Base table
I want a SELECT statement that retrieves only those rows with c_id = 1 where the Cr_id appears first, as shown in the output table.
Output table
Just see the base table and output table and you should know what I want. Thanks.
Just testing for the min date should be enough:
drop table if exists t;
create table t(c_id int,cr_id int,dt date);
insert into t values
(1,56,'2020-12-17'),(56,56,'2020-12-17'),
(1,8,'2020-12-17'),(56,8,'2020-12-17'),
(123,78,'2020-12-17'),(1,78,'2020-12-18');
select c_id,cr_id,dt
from t
where c_id = 1 and
dt = (select min(dt) from t t1 where t1.cr_id = t.cr_id);
+------+-------+------------+
| c_id | cr_id | dt |
+------+-------+------------+
| 1 | 56 | 2020-12-17 |
| 1 | 8 | 2020-12-17 |
+------+-------+------------+
2 rows in set (0.002 sec)
What you're looking for could be "partition by", at least if you're working on MSSQL.
(In the future, please include more background; SQL is not just SQL.)
https://codingsight.com/grouping-data-using-the-over-and-partition-by-functions/
I have an old query lying around that is able to put a sorting index on data that lacks one, although the underlying reason is 99.9% sure to be bad data design.
Typically I use this query to remove bad data, but you may rewrite it to become a join instead, so that you can identify the data you need (a sketch follows the query below).
The reason I'm not putting that answer here is to point out that bad data design results in more work when reading the data afterwards, which seems to be the real root cause here.
DELETE t
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY column_1, column_2, column_3 ORDER BY column_1, column_2, column_3) AS Seq
FROM Table
)t
WHERE Seq > 1
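Rewritten as a plain SELECT against the t table from the earlier answer (a sketch; it requires MariaDB 10.2+ for window functions, and tie-breaking by c_id is an assumption):

-- Number the rows within each cr_id, keep only the earliest,
-- then filter to the c_id we care about.
SELECT c_id, cr_id, dt
FROM (
    SELECT c_id, cr_id, dt,
           ROW_NUMBER() OVER (PARTITION BY cr_id ORDER BY dt, c_id) AS seq
    FROM t
) ranked
WHERE seq = 1
  AND c_id = 1;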
What I mean by literal order is that, although the IDs are auto-increment, through business logic it might end up that 8 comes after 4 where 5 should have been. That is to say, if a deletion of an ID happens, there's no re-indexing.
This is how my rows look (table name is wp_posts):
+-----+-------------+----+
| ID  | post_author | .. |
+-----+-------------+----+
|   4 | ..          | .. |
|   8 | ..          | .. |
| 124 | ..          | .. |
| 672 | ..          | .. |
| 673 | ..          | .. |
| 674 | ..          | .. |
+-----+-------------+----+
ID is an int that has the auto-increment characteristic, but when a post is deleted, there is no re-assignment of IDs. It will just simply get deleted and because it's auto-increment, you can still assume that, vertically, the items that come after the one you're looking at are always bigger than the ones before.
I'm querying for ID: SELECT ID FROM wp_posts to get a list of all the IDs I need. Now, it just so happens that I need to batch all of this, using AJAX requests because once I retrieve the IDs, I need to operate on them.
Thing is, I don't really understand how to pass my data back to AJAX. What LIMIT does is, if I provide 2 arguments, such as: SELECT ID FROM wp_posts LIMIT 1,3, it'll return back 4,8,124 because it looks at row number. But what do I do on the next call? Yes, the first call always starts with 1, but once I need to launch the second AJAX request to perform yet another SELECT, how do I know where I should start? In my case, I'd want to start again at 4, so, my second query would be SELECT ID FROM wp_posts LIMIT 4, 7 and so on.
Do I really need to send that counter (even if I can automate it, since, you see, it's an increment of 3) back?
Is there no way for SQL to handle this automatically?
You have many confusions in your question. Let me try to clear up some basic ones.
First, the auto-incremented key is the primary key for the table. You do not need to worry about gaps. In fact, the key should basically be meaningless. It fulfills the following:
It is guaranteed to be unique.
It is guaranteed to be in insertion order.
Gaps are allowed and of no concern. There is no re-indexing. It is a bad idea because:
Primary keys uniquely identify each row and this mapping should be consistent across time.
Primary keys are used in other tables to refer to values, so re-indexing would either invalidate those relationships or require massive changes to many tables.
Re-indexing presupposes that the value means something, when it doesn't.
Second, a query such as:
SELECT ID
FROM wp_posts
LIMIT 1, 3;
Can return any three rows. Why? Because you have not specified an ORDER BY, and SQL result sets without ORDER BY are unordered. There are no guarantees. So you should always be in the habit of using an ORDER BY.
Third, if you want to essentially "page" through results, then use the OFFSET feature in LIMIT (as you have above):
SELECT ID
FROM wp_posts
ORDER BY ID
LIMIT #offset, 3;
This will allow you to adjust the #offset value and go to whichever rows you want.
First query:
SELECT ID FROM wp_posts ORDER BY ID LIMIT 3
This returns 4,8,124 as you said. In your client, save the largest ID value in a variable.
Subsequent queries:
SELECT ID FROM wp_posts WHERE ID > ? ORDER BY ID LIMIT 3
Send a parameter into this query using the greatest ID value from the previous result. It's still in a variable.
This also helps make the query faster, because it doesn't have to skip all those initial rows every time. Paging through a large dataset using LIMIT/OFFSET is pretty inefficient. SQL has to actually read all those rows even though it's not going to return them.
But if you use WHERE ID > ? then SQL can efficiently start the scan in the right place, on the first row that would be included in the result.
It seems you want to return the first three rows of your query ordered by the currently existing ID values (whatever they are after all the DML statements applied to the table wp_posts).
Then consider using an auxiliary iteration variable @i to provide an ordered integer value set starting from 1 and increasing as 2, 3, ... without any gaps:
select t.*
from
(
select #i := #i + 1 as rownum, t1.*
from tab t1
join (select #i:=0) t2
) t
order by rownum
limit 0,3;
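As an aside, on MySQL 8.0 / MariaDB 10.2 and later, ROW_NUMBER() does the same job without the user-variable trick. A sketch against wp_posts (ordering by ID is an assumption):

SELECT t.*
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY ID) AS rownum, t1.*
    FROM wp_posts t1
) t
ORDER BY rownum
LIMIT 0, 3;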
I have a top list page on my website, which fetches the 25 rows with the highest values in a particular column. I have no issues fetching the top list if it is based on one column (score, for instance), but when more columns are involved, I faced some performance issues.
In the problematic case I want to select 25 rows, ordered by a sum of two columns in a descending order.
SELECT username, rank1 + rank2 AS rank FROM users ORDER BY rank DESC LIMIT 25
The query works, but takes approximately 0.25 seconds to finish, in contrast to queries on a single column, which take about 0.0003 seconds. Below is the EXPLAIN output for the query:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | accounts | ALL | NULL | NULL | NULL | NULL | 517874 | Using filesort
Both rank1 and rank2 are indexed, but clearly the indexes are not used for this query. Is there a way to improve the performance by somehow editing the query or the indexes?
MySQL does not handle this situation very well. Other databases (Oracle, Postgres, SQL Server, for example) offer some form of function-based indexes which can directly solve this problem. To do this in MySQL requires adding a new column to the table, then adding a trigger to keep it up-to-date. And finally an index on the new column. Perhaps a lot of work.
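A sketch of that column-plus-trigger approach (all names are illustrative):

ALTER TABLE users ADD COLUMN total_rank INT;
UPDATE users SET total_rank = rank1 + rank2;

-- Keep the derived column in sync on every write.
CREATE TRIGGER users_rank_bi BEFORE INSERT ON users
FOR EACH ROW SET NEW.total_rank = NEW.rank1 + NEW.rank2;
CREATE TRIGGER users_rank_bu BEFORE UPDATE ON users
FOR EACH ROW SET NEW.total_rank = NEW.rank1 + NEW.rank2;

CREATE INDEX idx_total_rank ON users (total_rank);

-- The top-25 query can now walk the index backwards instead of filesorting:
SELECT username, total_rank AS rank
FROM users
ORDER BY total_rank DESC
LIMIT 25;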
In some situations, you might be able to assume that the top XXX by the sum is going to be in the top YYY for each ranking. If this is true, then a query such as this will improve performance:
select ur1.*
from (select u.*
from users u
order by rank1 desc
limit 1000
) ur1 join
(select u.*
from users u
order by rank2 desc
limit 1000
) ur2
on ur1.username = ur2.username
order by ur1.rank1 + ur1.rank2 desc
limit 25;
This extracts the top 1000 (or whatever values) by each ranking and then identifies users common to the two lists. Hopefully there are 25 such users (for your application). At the very least, this should perform better than the overall query. You can first try this. If it returns 25 rows, then great. Otherwise, go for your original query.
I am thinking of returning a randomly ordered SQL result set, with a limit.
The thing is, I need all the rows back, basically divided into groups (chunks of rows). I hope I am clear.
For example, from table A:
ID | NAME | PROFESSION
++++++++++++++++++++++++++++++++
1 | Jack | Carpenter
2 | Rob | Manager
3 | Phil | Driver
4 | Mary | Cook
5 | Tim | Postman
6 | Bob | Programmer
The query would return something like this:
With a limit of 0,2:
6 | Bob | Programmer
4 | Mary | Cook
With a limit of 2,2:
1 | Jack | Carpenter
5 | Tim | Postman
With a limit of 4,2:
3 | Phil | Driver
2 | Rob | Manager
Note: all the table rows were returned. On my page I need << and >> buttons that will show the user the needed "groups" of data.
How do I go about writing such a query ?
A better name for your problem as explained would be randomly shuffled records. It is true that the order is random, but since the order needs to be remembered, you have no choice but to save it in a column. You can do this by saving a randomly populated field and ordering your records based on that. This way you have ordered your records in no specific order, while the order is remembered for future select queries. And whenever you get tired of the order, you can update the mentioned field with new randomly generated values to shuffle the records again. This is the technique used by music players to shuffle a playlist without replaying a song twice.
[EDIT]
While the first given solution stands as the general answer, there's a hack you can use in MySQL to randomly order records. In this way, all you need to store for remembering an order is its seed.
SELECT * FROM tbl ORDER BY RAND(s);
For instance, if you want each user to see the records in a different random order, you can use their user_id as the seed. This way, the order each user sees the records in will remain the same, while it is random and different from other users.
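For example (the seed value 123 stands in for a user_id, and the page size is illustrative):

-- First and second page of the same shuffle; rerunning with the same seed
-- reproduces the same order as long as the table contents don't change.
SELECT * FROM tbl ORDER BY RAND(123) LIMIT 0, 2;
SELECT * FROM tbl ORDER BY RAND(123) LIMIT 2, 2;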
I can think of two things here:
If the data in the table is huge, add a column that tells the group to which a row belongs. When the user clicks on >> or << buttons, get the rows for that particular group.
If you are dealing with small amount of data, you could do this in the code itself.
If you use ORDER BY RAND() then you will have to flag selected records somewhere, which is not advisable.
You can use some intelligent algorithm with a combination of total_pages and ID, e.g.:
SELECT *
FROM my_table
ORDER BY MOD(ID, total_pages);
Add a column to the table called something like random_col
Then each time you need to randomise the table you run
UPDATE table SET random_col = RAND()
And now each time you want to retrieve results you run a normal select
SELECT * FROM table ORDER BY random_col ASC LIMIT x,y
And the results will appear in the same order until you randomise them again by running the 'UPDATE'