Switching a raw greatest-n-per-group MySQL query to Laravel query builder

I want to move a raw MySQL query into Laravel 4's query builder, or preferably Eloquent.
The Setup
A database for storing discount keys for games.
Discount keys are stored in key sets, where each key set is associated with one game (a game can have multiple key sets).
The following query is intended to return a table of key sets and relevant data, for viewing on an admin page.
The 'keys used so far' count is calculated by a scheduled event and periodically stored/updated as log entries in a keySetLogs table (it is smart enough to only log when the count changes).
We want to show the most up-to-date value of 'keys used', which is a 'greatest-n-per-group' problem.
The Raw Query
SELECT
    `logs`.`id_keySet`,
    `games`.`name`,
    `kset`.`discount`,
    `kset`.`keys_total`,
    `logs`.`keys_used`
FROM `keySets` AS `kset`
INNER JOIN
(
    SELECT `ksl1`.*
    FROM `keySetLogs` AS `ksl1`
    LEFT OUTER JOIN `keySetLogs` AS `ksl2`
        ON (`ksl1`.`id_keySet` = `ksl2`.`id_keySet` AND `ksl1`.`set_at` < `ksl2`.`set_at`)
    WHERE `ksl2`.`id_keySet` IS NULL
    ORDER BY `id_keySet`
) AS `logs`
    ON `logs`.`id_keySet` = `kset`.`id`
INNER JOIN `games`
    ON `games`.`id` = `kset`.`id_game`
ORDER BY `kset`.`id_game` ASC, `kset`.`discount` DESC
Note: the nested query gets the most up-to-date keys_used value from the logs. This greatest-n-per-group technique is used as discussed in this question.
Example Output:
+-----------+-------------+----------+------------+-----------+
| id_keySet | name        | discount | keys_total | keys_used |
+-----------+-------------+----------+------------+-----------+
|         5 | Test_Game_1 |   100.00 |         10 |         4 |
|         6 | Test_Game_1 |    50.00 |        100 |        20 |
|         3 | Test_Game_2 |   100.00 |         10 |         8 |
|         4 | Test_Game_2 |    50.00 |        100 |        14 |
|         1 | Test_Game_3 |   100.00 |         10 |         1 |
|         2 | Test_Game_3 |    50.00 |        100 |         5 |
...
The Question(s)
I have KeySet, KeySetLog and Game Eloquent Models created with relationship functions set up.
How would I write the nested query in query builder?
Is it possible to write the query entirely with Eloquent (without manually writing joins)?

I don't know Laravel or Eloquent, so I probably shouldn't comment, but if performance isn't at stake then it seems to me that this query could be rewritten as something like this:
SELECT ksl1.id_keySet
, g.name
, k.discount
, k.keys_total
, ksl1.keys_used
FROM keySetLogs ksl1
LEFT
JOIN keySetLogs ksl2
ON ksl1.id_keySet = ksl2.id_keySet
AND ksl1.set_at < ksl2.set_at
LEFT
JOIN keySets k
ON k.id = ksl1.id_keySet
LEFT
JOIN games g
ON g.id = k.id_game
WHERE ksl2.id_keySet IS NULL
ORDER
BY k.id_game ASC
, k.discount DESC
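If that flattened query works for you, a rough translation into Laravel 4's query builder might look like the sketch below. It is minimal and untested, assuming the DB facade and the table/column names from the question:
$rows = DB::table('keySetLogs AS ksl1')
    // Self left-join: keep only the newest log row per key set
    ->leftJoin('keySetLogs AS ksl2', function ($join) {
        $join->on('ksl1.id_keySet', '=', 'ksl2.id_keySet')
             ->on('ksl1.set_at', '<', 'ksl2.set_at');
    })
    ->join('keySets AS kset', 'kset.id', '=', 'ksl1.id_keySet')
    ->join('games', 'games.id', '=', 'kset.id_game')
    ->whereNull('ksl2.id_keySet')
    ->orderBy('kset.id_game', 'asc')
    ->orderBy('kset.discount', 'desc')
    ->select('ksl1.id_keySet', 'games.name', 'kset.discount',
             'kset.keys_total', 'ksl1.keys_used')
    ->get();
As for writing it entirely in Eloquent with no manual joins: greatest-n-per-group does not map cleanly onto relationship methods, so a join-based builder query (or a raw expression via DB::raw) is the usual compromise.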

Related

SQL - get records from many-to-many relations by the user itself -OR- his group

I have two database tables, one as the main table and the other as the relation table.
The first table is a table of contents and the second table is a table that connects to users or groups.
Some data may also be modified in this second table.
I'm not sure about the structure and performance.
For example, we have user id 160, which is under group id 7.
First, we have a post table:
id | title   | content   | cover    | status
------------------------------------------------
1  | first   | content 1 | /img/... | 1
2  | second  | content 2 | /img/... | 1
3  | another | content 3 | /img/... | 1
4  | four    | content 4 | /img/... | 1
5  | five    | content 5 | /img/... | 1
And for the second, we have a post_rel table:
id | group_id | user_id | post_id | title   | cover   | sort | status
----------------------------------------------------------------------
1  | 7        | NULL    | 1       | g title | img/... | 1    | 1
2  | NULL     | 160     | 1       | u title | NULL    | 2    | 1       *** selected for user_id
3  | 7        | NULL    | 2       | NULL    | img/... | 6    | 0
4  | NULL     | 160     | 2       | NULL    | img/... | 4    | 1       *** selected for user_id
5  | NULL     | 160     | 3       | some    | img/... | 3    | 1       *** selected for user_id
6  | 7        | NULL    | 4       | NULL    | img/... | 9    | 1       *** selected for group_id
7  | NULL     | 165     | 5       | NULL    | img/... | 5    | 0
This is the basic query we have.
select
    `post_rel`.`title` as `custom_title`,
    `post_rel`.`cover` as `custom_cover`,
    `post_rel`.`group_id`,
    `post_rel`.`user_id`,
    `post`.*
from
    `post`
    inner join `post_rel` on `post`.`id` = `post_rel`.`post_id`
where
    `post`.`status` = 1
    and `post_rel`.`status` = 1
    and (
        `post_rel`.`user_id` = 160
        or (
            `post_rel`.`group_id` = 7
            and `post_rel`.`post_id` not in (
                select
                    `post_rel`.`post_id`
                from
                    `post_rel`
                where
                    `post_rel`.`user_id` = 160
            )
        )
    )
order by
    `post_rel`.`sort` asc
So, what do you think about the basic query? Especially the subquery: won't performance drop on a large table? Is it possible to write a better and simpler query, or to change the structure?
Edit: here is an SQLFiddle example of my code and structure: http://sqlfiddle.com/#!9/ed9d4b/1
I would change it to use "not exists" instead of "not in" and would use aliases so I could pull it off like so:
select
    b.`title` as `custom_title`,
    b.`cover` as `custom_cover`,
    b.`group_id`,
    b.`user_id`,
    a.*
from
    `post` a
    inner join `post_rel` b on a.`id` = b.`post_id`
where
    a.`status` = 1
    and b.`status` = 1
    and (
        b.`user_id` = 160
        or (
            b.`group_id` = 7
            and not exists (
                select
                    'x'
                from
                    `post_rel` c
                where
                    c.`user_id` = 160 and c.`post_id` = b.`post_id`
            )
        )
    )
order by
    b.`sort` asc
Typically when managing users and groups, there is the notion of an exception user who can be assigned assets directly, just like a whole group. This seems to be an example of that.
From a modeling-only perspective, there are two ways to deal with that:
Ensure that every user exists in a group and that you only assign assets to groups; for the exception user, create a group of one. You could even enforce that every user belongs to exactly one group. This way your post_rel table deals only with groups. Unfortunately, the relationship between group and user is not spelled out here well enough to weigh in appropriately.
The other option, driven by the need to eliminate NULL values (which also reduces overhead), is to use name-value pairs: let the user id and the group id share a single field, with a second field beside it denoting whether the row refers to a group or a user.
Here are the SQL Fiddles:
NOT EXISTS version: http://sqlfiddle.com/#!9/1af8cf/2
NOT IN version: http://sqlfiddle.com/#!9/1af8cf/1
Some reading on NULLs: https://dev.mysql.com/doc/refman/5.6/en/data-size.html
Specifically:
Declare columns to be NOT NULL if possible. It makes SQL operations faster, by enabling better use of indexes and eliminating overhead for testing whether each value is NULL. You also save some storage space, one bit per column. If you really need NULL values in your tables, use them. Just avoid the default setting that allows NULL values in every column.
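On the asker's performance worry: the correlated NOT EXISTS probe stays cheap if it can be answered from an index alone. A sketch of a suitable composite index (my own suggestion, not part of the original schema; the index name is made up):
-- Hypothetical index; covers the c.user_id = ? AND c.post_id = ? probe
CREATE INDEX idx_post_rel_user_post ON post_rel (user_id, post_id);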

Left Join takes very long time on 150 000 rows

I am having some difficulty accomplishing a task.
Here is some data from the orders table:
+----+---------+
| id | bill_id |
+----+---------+
|  3 |       1 |
|  9 |       3 |
| 10 |       4 |
| 15 |       6 |
+----+---------+
And here is some data from a bills table:
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
|  4 |
|  5 |
|  6 |
+----+
I want to list all the bills that have no order associated with them.
To achieve that, I thought the use of a LEFT JOIN was appropriate, so I wrote this query:
SELECT * FROM bills
LEFT JOIN orders
ON bills.id = orders.bill_id
WHERE orders.bill_id IS NULL;
I thought that I would have the following result:
+----------+-----------+----------------+
| bills.id | orders.id | orders.bill_id |
+----------+-----------+----------------+
|        2 |      NULL |           NULL |
|        5 |      NULL |           NULL |
+----------+-----------+----------------+
But the query never finishes: it ran for more than 5 minutes without a result, and I stopped it because that can't be acceptable in production anyway.
My real dataset has more than 150 000 orders and 100 000 bills. Is the dataset too big?
Is my query wrong somewhere?
Thank you very much for your tips!
EDIT: side note, the tables have no foreign keys defined... *flies away*
Your query is fine. I would use table aliases in writing it:
SELECT b.*
FROM bills b LEFT JOIN
orders o
ON b.id = o.bill_id
WHERE o.bill_id IS NULL;
You don't need the NULL columns from orders, probably.
You need an index on orders(bill_id):
create index idx_orders_billid on orders(bill_id);
Based on your WHERE clause, I assume you're looking for orders that have no bills.
If that's the case, you don't need to join to the bills table, as those bills would by definition not exist.
You will find
SELECT * FROM orders
WHERE orders.bill_id IS NULL;
a much better-performing query.
Edit:
Sorry, I missed your "I want to list all the bills that have no order associated with them" when reading the question. As @Gordon pointed out, an index would certainly help. However, if changing the schema is feasible, I would rather have a nullable bills.order_id column instead of orders.bill_id: you wouldn't need a LEFT JOIN at all for this report, and an INNER JOIN would suffice to fetch bills together with their orders, which should be quicker for your other assumed requirements.
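A sketch of that suggested schema change, assuming at most one order per bill as the sample data implies (table names are from the question; the new column is hypothetical):
-- Move the link onto bills instead of orders
ALTER TABLE bills ADD COLUMN order_id INT NULL;
-- Bills with no associated order then need no join at all:
SELECT * FROM bills WHERE order_id IS NULL;
-- Bills with orders can use a plain inner join:
SELECT b.*, o.* FROM bills b INNER JOIN orders o ON o.id = b.order_id;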

Issue with grouping?

I asked earlier about a solution to my problem, which worked. However, now that I'm trying to get some information from a second table (which stores more information), I'm running into a few issues.
My tables are as follows
Users
+----+----------------------+---------------+------------------+
| id | username             | primary_group | secondary_groups |
+----+----------------------+---------------+------------------+
|  1 | Username1            |             3 | 7,10             |
|  2 | Username2            |             7 | 3,5,10           |
|  3 | LongUsername         |             1 | 3,7              |
|  4 | Username3            |             1 | 3,10             |
|  5 | Username4            |             7 |                  |
|  6 | Username5            |             5 | 3,7,10           |
|  7 | Username6            |             2 | 7                |
|  8 | Username7            |             4 |                  |
+----+----------------------+---------------+------------------+
Profile
+----+---------------+------------------+
| id | facebook      | steam            |
+----+---------------+------------------+
|  1 |   10049424151 |               11 |
|  2 |   10051277183 |               55 |
|  3 |   10051281183 |              751 |
|  4 |               |              735 |
|  5 |   10051215770 |             4444 |
|  6 |   10020210531 |            50415 |
|  7 |   10021056938 |           421501 |
|  8 |   10011547143 |              761 |
+----+---------------+------------------+
My SQL is as follows (based off the previous thread)
SELECT u.id, u.username, p.id, p.facebook, p.steam
FROM users u, profile p
WHERE p.id=u.id AND FIND_IN_SET( '7', secondary_groups )
OR primary_group = 7
GROUP BY u.id
The problem is that my output is displayed as below:
+----+----------------------+-------------+-------+
| id | username             | facebook    | steam |
+----+----------------------+-------------+-------+
|  1 | Username1            | 10049424151 |    11 |
|  2 | Username2            | 10051277183 |    55 |
|  3 | LongUsername         | 10051281183 |   751 |
|  4 | Username4            | 10051215770 |  4444 |
|  5 | Username5            | 10049424151 |    11 |
|  6 | Username6            | 10049424151 |    55 |
+----+----------------------+-------------+-------+
I'm guessing that the problem is that user rows with a primary_group of 7 are getting matched to all profile rows. Remove the GROUP BY, and you'll be able to see more clearly what is happening.
But that's just a guess. It's not clear what you are attempting to achieve.
I suspect you are getting tripped up with the order of precedence of the AND and OR. (The AND operator has a higher order of precedence than OR operator. That means the AND will be evaluated before the OR.)
The quick fix is to just add some parens, to override the default order of operations. Something like this:
WHERE p.id=u.id AND ( FIND_IN_SET('7',secondary_groups) OR primary_group = 7 )
--                  ^                                                        ^
The parens will cause the OR operation to be evaluated (as either TRUE, FALSE or NULL) and then the result from that will be evaluated in the AND.
Without the parens, it's the same as if the parens were here:
WHERE ( p.id=u.id AND FIND_IN_SET('7',secondary_groups) ) OR primary_group = 7
--    ^                                                  ^
With the AND condition evaluated first, the result from that is then operated on by the OR. This is what causes user rows with primary_group = 7 to be matched to profile rows with different id values.
A few pointers on style:
avoid the old-school comma operator for join operations, and use the newer JOIN syntax
place the join predicates (conditions) in the ON clause, other filtering criteria in the WHERE clause
qualify all column references
As an example:
SELECT u.id
, u.username
, p.id
, p.facebook
, p.steam
FROM users u
JOIN profile p
ON p.id = u.id
WHERE u.primary_group = 7
OR FIND_IN_SET('7',u.secondary_groups)
ORDER BY u.id
We only need a GROUP BY clause if we want to "collapse" rows. If the id column is unique in both the users and profile tables, then there's no need for a GROUP BY u.id. We can add an ORDER BY clause if we want rows returned in a particular sequence.
I don't know what exactly you want to do with the output, but you can't group information like this. MySQL isn't really a classic programming language; it's more like a powerful tool for set mathematics. So if you want to get information based on correlations between two or more tables, first write a SELECT statement containing the raw data you want to work with, like this:
SELECT * FROM users u INNER JOIN profile p ON p.id=u.id
GROUP BY u.id;
Now select the relevant data with a WHERE clause:
SELECT * FROM users u INNER JOIN profile p ON p.id=u.id WHERE
FIND_IN_SET( '7', secondary_groups ) OR primary_group = 7
GROUP BY u.id;
Now you should see the joined profile and users tables, grouped, and you can start mining the data. For example, if you want to count the items in these groups, just add a COUNT() to the SELECT, and so on.
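For instance, a counting variant might look like this (a sketch against the same tables; grouping by primary_group is my assumption about what you would want to count):
SELECT u.primary_group, COUNT(*) AS members
FROM users u
INNER JOIN profile p ON p.id = u.id
GROUP BY u.primary_group;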
When debugging SQL, I highly recommend these steps:
1.) First, write down all the correlations between the data and all the foreign keys between tables, so you will know whether your selection is fully deterministic. You can then start JOINing tables from left to right.
2.) Try small bits of the query on a model database. Then you will see which selection works right and which doesn't do what you expected.
I think #SIDU has it in the comments: You are experiencing a Boolean order of operations problem. See also SQL Logic Operator Precedence: And and Or
For example:
SELECT 0 AND 0 OR 1 AS test;
+------+
| test |
+------+
|    1 |
+------+
When doing complex statements with both AND and OR, use parenthesis. The operator order problem is leading to you doing an unintended outer join that's being masked by your GROUP BY. You shouldn't need a GROUP BY for that statement.
Although I don't personally care for the style #spencer7593 suggests in his answer (using INNER JOIN, etc.), it does have the advantage of preventing or identifying errors early for people new to SQL, so it's something to consider.

activerecord ruby row with max value in mysql table

I need to select from MySQL table table1 (shown below) all records with different foreign_row_id values, keeping for each the row with the maximum datetime value. For example, from the table below I should select the rows with id=2 and id=3. After this, I have to join the result with the table with the phrase_id's.
In my project I use only Ruby and ActiveRecord without Rails.
+----+---------------------+----------------+--------------+
| id | datetime            | foreign_row_id | other_fields |
+----+---------------------+----------------+--------------+
|  1 | 2013-05-02 17:36:15 |              1 |            1 |
|  2 | 2013-05-02 17:36:53 |              1 |            1 |
|  3 | 2013-05-03 00:00:00 |              2 |            3 |
+----+---------------------+----------------+--------------+
Here is my Ruby code:
result = Model1.joins(:foreign_row).
  where(:user_id => user_id).
  order(:datetime).
  reverse_order.
  select('table1.*, foreign_row.*').
  maximum(:datetime, :group => :foreign_row_id)
And it gives me only one record, without grouping by id and joining: {"1":"2013-05-02T17:36:53+09:00"}.
What should I change in my code to get all the rows?
I solved this in parts. First I wrote an SQL statement that would solve the problem:
SELECT * FROM (SELECT * FROM `models` ORDER BY `datetime` desc) m GROUP BY `foreign_row_id`
And then I built that query with Arel:
model_table = Model1.arel_table
subquery = model_table.project(Arel.sql('*')).order('`datetime` desc').as('m')
query = model_table.project(Arel.sql('*')).from(subquery).group('`foreign_row_id`')
Finally you can run that query:
Model1.find_by_sql query.to_sql
I added some backticks because the fields I tested with were SQL reserved words; I think you can omit them.
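One caveat: relying on GROUP BY to keep the first row of an ordered derived table is undocumented MySQL behavior and can break under ONLY_FULL_GROUP_BY. A deterministic alternative is the same self-left-join greatest-n-per-group pattern from the first question above (a sketch, assuming the columns shown in the table):
SELECT m1.*
FROM models m1
LEFT JOIN models m2
  ON m1.foreign_row_id = m2.foreign_row_id
 AND m1.datetime < m2.datetime
WHERE m2.id IS NULL;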

Optimizing sql join query, comparing query effectiveness

I'm a student working on a module for the Moodle CMS (course management system) at my college. I have to write some join queries for my module. I cannot make changes to the table structures; they are pretty much set in stone (I didn't make them, they were given to me).
I have no experience with writing queries for large databases. I've created a working prototype of my module and now I'm trying to organize the code/optimize queries etc.
Tasks:
| id  | task  |
---------------
| 1   | task1 |
| 2   | task3 |
| 3   | task3 |
| 4   | task4 |
| ... | ...   |
Assets:
| id | asset |
--------------------
| 1 | task1 |
| 2 | task3 |
| 3 | task3 |
| 4 | task4 |
| ... | ... |
TaskAsset:
| id | taskid | assetid | coefficient |
-----------------------------------------------
| 1 | 2 | 33 | coefficient1 |
| 2 | 5 | 35 | coefficient2 |
| 3 | 6 | 36 | coefficient3 |
| 4 | 8 | 37 | coefficient4 |
| 5 | ... | ... | ... |
$query = "SELECT TaskAsset.id as id, Assets.asset AS asset, Tasks.task AS task
, coefficient
FROM Tasks, Assets, Taskasset
WHERE Taskasset.taskid= Tasks.id AND TaskAsset.assetid = Assets.id";
$result = mysql_query($query) or die(mysql_error());
while($row = mysql_fetch_array($result))
{
echo $row['id']." - ".$row['asset']." - ".$row['task'] . $row['coefficient'];
echo "<br />";
}
Questions:
1.) So, if the table structures are like these, is my query effective?
If it is, is a simple join still effective if I have to join more tables, like 4 or 5?
2.) How do I rate the effectiveness of queries? In phpMyAdmin, I can see the time it took for a query to run. I've never used anything else for this because my tables had very few records, so it did not matter.
The only thing that I would do differently is explicitly specify the joins.
$query = "SELECT ta.id as id, a.asset AS asset, t.task AS task
, coefficient
FROM TaskAsset ta
JOIN Tasks t ON ta.taskId = t.id
JOIN Assets a ON ta.assetId = a.id";
This does the same thing, but I personally like it a lot better. That said, you should try to run an EXPLAIN on your query; that is where you'll see the pressure points.
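For example (a sketch; EXPLAIN's output columns vary by MySQL version, and what it reports depends on which indexes actually exist):
EXPLAIN SELECT ta.id, a.asset, t.task, coefficient
FROM TaskAsset ta
JOIN Tasks t ON ta.taskId = t.id
JOIN Assets a ON ta.assetId = a.id;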
Your query is fine as is from an optimality standpoint, assuming indexes are present on the id fields of the tables. With the right indexes, you can join many more tables and the performance will still be good.
You should try to get yourself familiar with the ANSI join syntax - as this is much easier to read than the old FROM x, y, z ... style joins - and it's also more difficult to get wrong!
This query is appropriate for the results that you want.
TaskAsset is a mapping table that joins rows of Tasks and Assets together by foreign keys. You need to view columns from all three tables in your result set, so this is the most efficient way for it to be done.
What might be even more important than the query are the indexes in the tables.
You are doing
SELECT ta.id as id, a.asset AS asset, t.task AS task, coefficient
FROM TaskAsset ta
JOIN Tasks t ON ta.taskId = t.id <-- equi join here
JOIN Assets a ON ta.assetId = a.id <-- another equi join.
This query has two equi joins.
Always assign indexes to fields involved in an equi-join (see the sketch after this list).
Consider assigning indexes on fields involved in a where clause (this query doesn't have any but that's beside the point)
Strongly consider putting an index on a field used in a group by clause
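A minimal sketch of that first pointer, assuming the table and column names above (the id primary keys are normally indexed already, so the mapping table's join columns are the ones that need attention; the index names are made up):
CREATE INDEX idx_taskasset_taskid ON TaskAsset (taskid);
CREATE INDEX idx_taskasset_assetid ON TaskAsset (assetid);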