I have the following code:
team_articles = user.npt_teams.to_a.inject({}) {|arts,team|
arts.merge({ team.name =>
NptArticle.join(:npt_authors).join(:users).join(:npt_teams).where(:npt_teams__id => team.id).to_a.uniq})
}
It causes my terminal to stop responding and my Macbook to slow down.
In MySQL Workbench the same query gets a response instantly.
One suggestion was to create a lighter version of the NptArticle object, but I'm not sure how to create a version that pulls fewer columns. Any suggestion to fix this issue would be great.
This is the table.
The generated SQL is:
SELECT * FROM `npt_articles` INNER JOIN `npt_authors` INNER JOIN `users` INNER JOIN `npt_teams` WHERE (`npt_teams`.`id` = 1)
I'd love to upgrade the Ruby version but I can't. I'm working off an old code-base and this is the version of Ruby it uses. There are plans to re-build in the future with more modern tools but at the moment this is what I have to work with.
Results from:
EXPLAIN SELECT * FROM npt_articles INNER JOIN npt_authors INNER JOIN users INNER JOIN npt_teams WHERE (npt_teams.id = 1);
So for npt_teams.id = 1 you are performing a cross join across all of:
npt_articles
npt_authors
users
If the number of articles, authors and users is even moderate you would get a huge number of results as the joins aren't restricted. Normally, you would use something like:
INNER JOIN `npt_authors` ON (npt_articles.ID=npt_authors.articleID)
(it depends on how your database relates).
In addition, you would need indexes on the fields that relate the tables to each other, which will speed things up as well.
Look at the rows column of the EXPLAIN SELECT. That is how many rows are being processed for each part of the join. To get an estimate of the total number of rows processed, multiply these numbers together. 1 x 657 x 269723 x 956188 = rather a lot.
I'm not a Ruby wiz, so perhaps somebody else can post how you do this in Ruby.
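To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The linking columns (article_id, user_id, team_id) and the sample rows are assumptions made up for illustration, since the real schema isn't shown. It counts the rows produced by the unrestricted joins and by joins with ON conditions, and multiplies out the EXPLAIN row estimates quoted above.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema: the linking columns are assumptions, not the real ones.
CREATE TABLE npt_teams    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users        (id INTEGER PRIMARY KEY, team_id INTEGER);
CREATE TABLE npt_articles (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE npt_authors  (id INTEGER PRIMARY KEY, article_id INTEGER, user_id INTEGER);

-- Indexes on the columns that relate the tables to each other.
CREATE INDEX idx_authors_article ON npt_authors (article_id);
CREATE INDEX idx_authors_user    ON npt_authors (user_id);
CREATE INDEX idx_users_team      ON users (team_id);

INSERT INTO npt_teams    VALUES (1, 'red'), (2, 'blue');
INSERT INTO users        VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO npt_articles VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
INSERT INTO npt_authors  VALUES (1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 3);
""")

# Joins without ON conditions: every article x every author x every user.
unrestricted = conn.execute("""
    SELECT COUNT(*)
    FROM npt_articles
    INNER JOIN npt_authors
    INNER JOIN users
    INNER JOIN npt_teams
    WHERE npt_teams.id = 1
""").fetchone()[0]

# Joins restricted with ON: only rows that actually relate to each other.
restricted = conn.execute("""
    SELECT COUNT(*)
    FROM npt_articles a
    INNER JOIN npt_authors au ON au.article_id = a.id
    INNER JOIN users u        ON u.id = au.user_id
    INNER JOIN npt_teams t    ON t.id = u.team_id
    WHERE t.id = 1
""").fetchone()[0]

# Rough total for the original query: product of the EXPLAIN "rows" values.
estimated_rows = math.prod([1, 657, 269723, 956188])

print(unrestricted, restricted, estimated_rows)
```

Even with a handful of sample rows the unrestricted version returns far more rows than the restricted one; with the real table sizes the product runs into the hundreds of trillions.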
I tried to find out whether this has already been answered but couldn't find anything. I'm trying to join four tables together, but one of the joins is not on the table that the other two joins are from. I've successfully joined three of the tables; I'm just not sure of the syntax for joining the fourth.
SELECT * FROM
nc_booking
INNER JOIN
nc_customer ON nc_booking.c_id = nc_customer.id
INNER JOIN
nc_propertys ON nc_booking.p_id = nc_propertys.id
How would I now join nc_propertys to another table, nc_owner?
Building on the code from @GordonLinoff, to add your extra table you need to do something like:
SELECT *
FROM nc_booking b INNER JOIN
nc_customer c
ON b.c_id = c.id INNER JOIN
nc_propertys p
ON b.p_id = p.id INNER JOIN
nc_owner o
ON o.id = p.o_id;
You haven't shared the column names we need to use to connect the extra table, so the last line might not be right. A few things to note ...
(1) The SELECT * is not ideal. If you only need particular columns here, list them. I've stuck with your * because I don't know what you want from the tables. Where a column with the same name exists in more than one table, you'll have to "fully qualify" the field name as follows ...
SELECT c.id as customer_id,
-- more fields can go here, with a comma after each
...
Several of the joined tables have an id field, so the c. is necessary to tell the database which one we want. Notice that as with the tables, we can also give the fields an 'alias', which in this case is 'customer_id'. This can be very helpful for presentation, and is often essential when using the output from a query as part of a larger piece of code.
(2) Since all the joins are INNER JOINS it makes little (if any) difference what order the tables are listed as long as the connections between them remain the same.
(3) For MySQL, it technically shouldn't matter whether you have lots of new-lines or none at all. SQL is designed to ignore "white space" (except within data). What matters is simply laying out your code so it is easy to read ... especially for other users who later might need to figure out what you were doing (although in my experience also for you, when you return to a piece of code several years later and can't remember it at all).
(4) In each ON clause it doesn't actually matter whether you write, say, a = b or b = a. That's because you aren't setting one to equal the other; you are requiring that they already be equal, so it amounts to the same thing either way.
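Points (1), (2) and (4) can be checked on a toy database. The following is only a sketch, using Python's built-in sqlite3 module in place of MySQL; the sample rows, the name and address columns, and the o_id column (taken from the guess in the query above) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical columns: o_id, name and address are assumed for this demo.
CREATE TABLE nc_customer  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE nc_owner     (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE nc_propertys (id INTEGER PRIMARY KEY, o_id INTEGER, address TEXT);
CREATE TABLE nc_booking   (id INTEGER PRIMARY KEY, c_id INTEGER, p_id INTEGER);

INSERT INTO nc_customer  VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO nc_owner     VALUES (10, 'Olga');
INSERT INTO nc_propertys VALUES (100, 10, '1 High St'), (101, 10, '2 Low Rd');
INSERT INTO nc_booking   VALUES (1000, 1, 100), (1001, 2, 101);
""")

# Explicit, fully qualified columns with aliases instead of SELECT *.
query_a = """
SELECT c.id AS customer_id, o.id AS owner_id, b.id AS booking_id
FROM nc_booking b
INNER JOIN nc_customer c  ON b.c_id = c.id
INNER JOIN nc_propertys p ON b.p_id = p.id
INNER JOIN nc_owner o     ON o.id = p.o_id
ORDER BY b.id
"""

# Same connections, but tables listed in a different order and the
# operands of each ON condition swapped.
query_b = """
SELECT c.id AS customer_id, o.id AS owner_id, b.id AS booking_id
FROM nc_owner o
INNER JOIN nc_propertys p ON p.o_id = o.id
INNER JOIN nc_booking b   ON p.id = b.p_id
INNER JOIN nc_customer c  ON c.id = b.c_id
ORDER BY b.id
"""

cursor = conn.execute(query_a)
column_names = [col[0] for col in cursor.description]  # the aliases
rows_a = cursor.fetchall()
rows_b = conn.execute(query_b).fetchall()

print(column_names)
print(rows_a == rows_b)
```

Both queries return the same rows, and the result columns carry the alias names rather than three ambiguous id columns.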
My advice to a SQL beginner would be when you are writing a SELECT query (which only reads and doesn't change any data): if you aren't too sure then write some code and set it to run. If it's completely invalid, your software should give you some idea of what is wrong and no harm will be done. If it's valid but wrong, the very worst that can happen is that you put some unnecessary load on your database server ... if it takes a long time to run and you weren't expecting it to, then you should be able to cancel the query. As long as you have some idea of what you expect the results to look like, and roughly how many rows to expect, you won't go too far wrong. If you get completely stuck come back here to Stack Overflow.
Things get a bit different if you are writing code which DELETEs or UPDATEs data. Then you want to know exactly what you're up to. Normally you can write a closely related SELECT statement first to make sure you're going to be making all and only the changes you were expecting. It's also best to make sure you've got a way to undo your changes should the worst happen. Backups are obviously good, and you can often create your own backup copy of a table before you make any alterations. You don't necessarily need to rely on backup software or your in house IT guys for that ... in my experience they don't like databases anyway.
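As a rough sketch of that routine (your own backup copy, a preview SELECT, then the DELETE with the identical condition), here is a small example using Python's built-in sqlite3 module rather than MySQL. The status column and the sample rows are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table contents; the status column is an assumption.
CREATE TABLE nc_booking (id INTEGER PRIMARY KEY, c_id INTEGER, status TEXT);
INSERT INTO nc_booking VALUES
    (1, 1, 'cancelled'), (2, 1, 'paid'), (3, 2, 'cancelled');
""")

# One condition, reused verbatim for the preview and for the DELETE.
condition = "status = 'cancelled'"

# 1. Keep your own copy of the table before changing anything.
conn.execute("CREATE TABLE nc_booking_backup AS SELECT * FROM nc_booking")

# 2. Preview: a SELECT with the same WHERE clause shows what would go.
preview = conn.execute(
    f"SELECT id FROM nc_booking WHERE {condition} ORDER BY id"
).fetchall()

# 3. Only then run the DELETE and check it touched the expected rows.
deleted = conn.execute(f"DELETE FROM nc_booking WHERE {condition}").rowcount
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM nc_booking").fetchone()[0]
backed_up = conn.execute("SELECT COUNT(*) FROM nc_booking_backup").fetchone()[0]

print(preview, deleted, remaining, backed_up)
```

If the number of deleted rows differs from the preview, the backup table still holds every original row.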
Also there are some great books out there. For a beginner, I'd recommend anything by Ben Forta, including his SQL in 10 Minutes (that's a per chapter figure), or his MySQL Crash Course (the latter is a little old though, so won't have anything on the more recently added features of MySQL).
Your syntax looks okay. I am providing an answer because you really should learn to use table aliases. They make a query easier to write and to read:
SELECT *
FROM nc_booking b INNER JOIN
nc_customer c
ON b.c_id = c.id INNER JOIN
nc_propertys p
ON b.p_id = p.id;
I have a MySQL statement which works: I can get the records requested, movies.* and groups.name.
$stmt= $mysqli->query("SELECT DISTINCT ebooks.*, groups.name FROM ebooks
INNER JOIN ebooks_groups ON ebooks.uuid = ebooks_groups.ebookuuid
INNER JOIN groups_users ON ebooks_groups.groupuuid = groups_users.groupuuid
INNER JOIN groups ON groups_users.groupuuid = groups.uuid
WHERE useruuid=".$get_useruuid."
ORDER BY groups.name");
1/ However, I need to grab another column from the groups table, namely groups.uuid.
I tried
SELECT DISTINCT movies.*, groups.* FROM movies, groups
&
SELECT DISTINCT movies.*, groups.name, groups.uuid FROM movies, groups
but it retrieved no records.
2/ Then I had another look at my original code - ... FROM movies ... - how is this even working if I'm not selecting FROM movies, groups tables?
AFAIK, this is pure MySQL. PHP or not doesn't come into play.
First to understand is the implicit join:
Explicit vs implicit SQL joins
That understanding should solve at least half of your problem.
Secondly, I'd never code a SELECT * without a very good reason (and there are few). It makes much more sense to select just the columns you need instead of getting them all. Even if you need all the columns that currently exist, the database model may later gain (or lose!) columns, and it will be much harder to detect that your code needs updating if the columns aren't listed explicitly.
For the rest, I build my SQL queries slowly, step by step. That helps a lot with debugging your queries, especially as you have the actual tables and some sample data ...
[That should solve your other half of the question]
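A small sketch may help with both halves. It uses Python's built-in sqlite3 module instead of MySQL and PHP, with made-up sample rows and integer uuids, and follows the table names from the working ebooks query. It shows that a table brought in with INNER JOIN is already part of the FROM clause, so groups.uuid can simply be added to the column list, and that the implicit (comma) form gives the same rows once the join conditions move into WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Sample data is invented; "groups" is quoted because it can be a keyword.
CREATE TABLE ebooks        (uuid INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE "groups"      (uuid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ebooks_groups (ebookuuid INTEGER, groupuuid INTEGER);
CREATE TABLE groups_users  (groupuuid INTEGER, useruuid INTEGER);

INSERT INTO ebooks        VALUES (1, 'SQL Basics'), (2, 'Joins in Depth');
INSERT INTO "groups"      VALUES (10, 'readers'), (20, 'staff');
INSERT INTO ebooks_groups VALUES (1, 10), (2, 20);
INSERT INTO groups_users  VALUES (10, 7), (20, 8);
""")

# Explicit joins: the extra column comes from the already joined table.
explicit_sql = """
SELECT DISTINCT e.uuid, e.title, g.name, g.uuid AS group_uuid
FROM ebooks e
INNER JOIN ebooks_groups eg ON e.uuid = eg.ebookuuid
INNER JOIN groups_users gu  ON eg.groupuuid = gu.groupuuid
INNER JOIN "groups" g       ON gu.groupuuid = g.uuid
WHERE gu.useruuid = ?
ORDER BY g.name
"""

# Implicit joins: tables separated by commas, conditions moved to WHERE.
implicit_sql = """
SELECT DISTINCT e.uuid, e.title, g.name, g.uuid AS group_uuid
FROM ebooks e, ebooks_groups eg, groups_users gu, "groups" g
WHERE e.uuid = eg.ebookuuid
  AND eg.groupuuid = gu.groupuuid
  AND gu.groupuuid = g.uuid
  AND gu.useruuid = ?
ORDER BY g.name
"""

# The user id is passed as a bound parameter rather than concatenated in.
explicit_rows = conn.execute(explicit_sql, (7,)).fetchall()
implicit_rows = conn.execute(implicit_sql, (7,)).fetchall()

print(explicit_rows)
print(explicit_rows == implicit_rows)
```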
I'm trying to do what I think is a set of simple set operations on a database table: several intersections and one union. But I don't seem to be able to express that in a simple way.
I have a MySQL table called Moment, which has many millions of rows. (It happens to be a time-series table but that doesn't impact on my problem here; however, these data have a column 'source' and a column 'time', both indexed.) Queries to pull data out of this table are created dynamically (coming in from an API), and ultimately boil down to a small pile of temporary tables indicating which 'source's we care about, and maybe the 'time' ranges we care about.
Let's say we're looking for
(source in Temp1) AND (
((source in Temp2) AND (time > '2017-01-01')) OR
((source in Temp3) AND (time > '2016-11-15'))
)
Just for excitement, let's say Temp2 is empty --- that part of the API request was valid but happened to include 'no actual sources'.
If I then do
SELECT m.* from Moment as m,Temp1,Temp2,Temp3
WHERE (m.source = Temp1.source) AND (
((m.source = Temp2.source) AND (m.time > '2017-01-01')) OR
((m.source = Temp3.source) AND (m.time > '2016-11-15'))
)
... I get a heaping mound of nothing, because the empty Temp2 gives an empty Cartesian product before we get to the WHERE clause.
Okay, I can do
SELECT m.* from Moment as m
LEFT JOIN Temp1 on m.source=Temp1.source
LEFT JOIN Temp2 on m.source=Temp2.source
LEFT JOIN Temp3 on m.source=Temp3.source
WHERE (m.source = Temp1.source) AND (
((m.source = Temp2.source) AND (m.time > '2017-01-01')) OR
((m.source = Temp3.source) AND (m.time > '2016-11-15'))
)
... but this takes >70ms even on my relatively small development database.
If I manually eliminate the empty table,
SELECT m.* from Moment as m,Temp1,Temp3
WHERE (m.source = Temp1.source) AND (
((m.source = Temp3.source) AND (m.time > '2016-11-15'))
)
... it finishes in 10ms. That's the kind of time I'd expect.
I've also tried putting a single unmatchable row in the empty table and doing SELECT DISTINCT, and it splits the difference at ~40ms. Seems an odd solution though.
This really feels like I'm just conceptualizing the query wrong, that I'm asking the database to do more work than it needs to. What is the Right Way to ask the database this question?
Thanks!
--UPDATE--
I did some actual benchmarks on my actual database, and came up with some really unexpected results.
For the scenario above, all tables indexed on the columns being compared, with an empty table,
doing it with left joins took 3.5 minutes (!!!)
doing it without joins (just 'FROM...WHERE') and adding a null row to the empty table, took 3.5 seconds
even more striking, when there wasn't an empty table, but rather ~1000 rows in each of the temporary tables,
doing the whole thing in one query took 28 minutes (!!!!!), but,
doing each of the three AND clauses separately and then doing the final combination in the code took less than a second.
I still feel I'm expressing the query in some foolish way, since again, all I'm trying to do is one set union (OR) and a few set intersections. It really seems like the DB is making this gigantic Cartesian product when it seriously doesn't need to. All in all, as pointed out in the answer below, keeping some of the intelligence up in the code seems to be the better approach here.
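For illustration, here is one way the "combine in code" approach could look. It is only a sketch under assumptions: Python's built-in sqlite3 module stands in for MySQL, the value column and sample rows are invented, and the helper names (sources, moments) are hypothetical. It first reproduces the empty result of the comma join with an empty Temp2, then intersects the source sets in code, runs one query per branch, and unions the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Moment (source INTEGER, time TEXT, value REAL);
CREATE INDEX idx_moment_source ON Moment (source);
CREATE INDEX idx_moment_time   ON Moment (time);
CREATE TABLE Temp1 (source INTEGER);
CREATE TABLE Temp2 (source INTEGER);  -- deliberately left empty
CREATE TABLE Temp3 (source INTEGER);

INSERT INTO Moment VALUES
    (1, '2016-12-01', 1.0), (1, '2016-10-01', 2.0),
    (2, '2017-02-01', 3.0), (3, '2017-03-01', 4.0);
INSERT INTO Temp1 VALUES (1), (2);
INSERT INTO Temp3 VALUES (1), (3);
""")

# The comma join: an empty Temp2 makes the whole Cartesian product empty.
cartesian_rows = conn.execute("""
    SELECT m.* FROM Moment AS m, Temp1, Temp2, Temp3
    WHERE m.source = Temp1.source AND (
        (m.source = Temp2.source AND m.time > '2017-01-01') OR
        (m.source = Temp3.source AND m.time > '2016-11-15'))
""").fetchall()


def sources(table):
    """Return the set of sources in a temp table (fixed names, not user input)."""
    return {row[0] for row in conn.execute(f"SELECT source FROM {table}")}


def moments(source_set, after):
    """Fetch Moment rows for the given sources that are later than `after`."""
    if not source_set:  # an empty set simply contributes nothing
        return set()
    marks = ",".join("?" * len(source_set))
    sql = (f"SELECT source, time, value FROM Moment "
           f"WHERE source IN ({marks}) AND time > ?")
    return set(conn.execute(sql, (*sorted(source_set), after)))


temp1, temp2, temp3 = sources("Temp1"), sources("Temp2"), sources("Temp3")

# Intersections (AND) and the union (OR) are done with Python sets.
combined = (moments(temp1 & temp2, "2017-01-01")
            | moments(temp1 & temp3, "2016-11-15"))

print(cartesian_rows)
print(combined)
```

With very large source sets the IN list would hit the database's limit on bound parameters, so the per-branch queries might need to be batched.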
There are various ways to tackle the problem. Needless to say it depends on
how many queries are sent to the database,
the amount of data you are processing in a time interval,
how the database backend is configured to manage it.
For your use case, a little more information would be helpful. The optimization of your query by using CASE/COUNT(*) or CASE/LIMIT combinations in queries to sort out empty tables would be one option. However, if-like queries cost more time.
You could split the SQL code to downgrade the scaling of the problem from 1*N^x to y*N^z, where z should be smaller than x.
You said that an API is involved. Maybe you are able to handle the temporary "no data" tables differently, or even avoid storing them at all?
Another option would be to enable query caching:
https://dev.mysql.com/doc/refman/5.5/en/query-cache-configuration.html
I have a query that I am trying to build a view off of. The query below works as I would expect it to, but it has a sub-query, so MySQL won't let me build a view off it. I'll start by saying I'm still relatively new to SQL queries, so I am a little lost when trying to refactor this into a query without a sub-query. I have seen a few Q/As on here where more experienced users have helped eliminate sub-queries such as this, and I was hoping someone might be able to make some suggestions for me. I have also seen the suggestion of creating multiple views and then combining them, but this feels less clean to me, and I'd rather not do it if someone can see a means of eliminating the sub-query here. Generic advice on eliminating sub-queries would be great so I have a better idea what to look for, as well as a more specific answer to the problem at hand.
The query:
(SELECT
`user_id`,`operations`.`name` AS `operation`,
`objectables`.`objectable_id` AS `objectable_id`,
`objectables`.`objectable_type` AS `objectable_type`,
sha2(concat(`operations`.`name`,
`objectables`.`objectable_type`,
`objectables`.`objectable_id`),
256) AS `access_hash`
FROM
(
(SELECT
`users`.`id` as `user_id`,
`ace_user`.`ace_id` as `ace_id`
FROM
`users`
LEFT JOIN `ace_user` ON ((`ace_user`.`user_id` = `users`.`id`))
)
UNION
(SELECT
`users`.`id` as `user_id`, `ace_id`
FROM (
((`users`
LEFT JOIN `role_user` ON ((`role_user`.`user_id` = `users`.`id`)))
LEFT JOIN `permission_role` ON ((`permission_role`.`role_id` = `role_user`.`role_id`)))
LEFT JOIN `ace_permission` ON ((`ace_permission`.`permission_id` = `permission_role`.`permission_id`))
)
)
) as `all_aces`
LEFT JOIN `aces` ON ((`aces`.`id` = `all_aces`.`ace_id`))
LEFT JOIN `operations` ON ((`operations`.`id` = `aces`.`operation_id`))
LEFT JOIN `objectables` ON ((`objectables`.`ace_id` = `all_aces`.`ace_id`)))
I apologize for the slightly convoluted mess of a query that's above. In a nutshell what I am trying to do is make a view that summarizes all the access a user has. A user can be granted access either via direct assignment of an ACE (Access control entry) or via a Role, which has certain permissions associated with it, which are linked to their own set of ACEs. ACEs are in turn linked to the objects they grant permission to. I'm trying to make a view that shows all of these, along with a hash of object,operation etc for quicker hasAccess() checks. Hopefully this helps make sense of my messy query above.
Thanks in advance for any and all help!
-Wally
MySQL is not my area of expertise. I wrote this query and it is running increasingly slowly: about 5 minutes with 100k rows in EquipmentData and 30k or so in EquipmentDataStaging (which to me is very little data):
CREATE TEMPORARY TABLE dataCompareTemp
SELECT eds.eds_id FROM equipmentdatastaging eds
INNER JOIN equipment e ON e.e_id_string = eds.eds_e_id_string
INNER JOIN equipmentdata ed ON e.e_id = ed.ed_e_id
AND eds.eds_ed_log_time=ed.ed_log_time
AND eds.eds_ed_unit_type=ed.ed_unit_type
AND eds.eds_ed_value = ed.ed_value
I am using this query to compare data rows pulled from a client's device to the current data sitting in their database. From here I take the temp table and use the IDs from it to make conditional decisions. I have e_id_string and e_id indexed; nothing else is indexed. I know it looks stupid that I have to compare all this information, but the client's system is spitting out redundant data and I am using this query to find it. Any help would be greatly appreciated, whether it is a different approach in SQL or in MySQL management. I feel like MSSQL handles requests like this much better, but that is probably because I have something set up incorrectly.
TIPS
Index all columns that are used in an ON or WHERE condition.
Here you need to index eds_ed_log_time, eds_e_id_string, eds_ed_unit_type, eds_ed_value, ed_e_id, ed_log_time, ed_unit_type and ed_value.
Change the syntax to SELECT STRAIGHT_JOIN ... (see the MySQL reference for details).
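As a sketch of the indexing tip, the snippet below builds the three tables in Python's built-in sqlite3 module (a stand-in for MySQL, so STRAIGHT_JOIN itself cannot be shown here) and adds indexes covering the compared columns. Using one composite index per table is an assumption about how to apply the tip, not the only option; the column types and sample rows are invented, and the index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Column types and sample rows are assumptions for this demo.
CREATE TABLE equipment (e_id INTEGER PRIMARY KEY, e_id_string TEXT);
CREATE TABLE equipmentdata (
    ed_id INTEGER PRIMARY KEY, ed_e_id INTEGER,
    ed_log_time TEXT, ed_unit_type TEXT, ed_value REAL);
CREATE TABLE equipmentdatastaging (
    eds_id INTEGER PRIMARY KEY, eds_e_id_string TEXT,
    eds_ed_log_time TEXT, eds_ed_unit_type TEXT, eds_ed_value REAL);

-- Indexes on every column used in the ON conditions.
CREATE INDEX idx_e_id_string ON equipment (e_id_string);
CREATE INDEX idx_ed_compare ON equipmentdata
    (ed_e_id, ed_log_time, ed_unit_type, ed_value);
CREATE INDEX idx_eds_compare ON equipmentdatastaging
    (eds_e_id_string, eds_ed_log_time, eds_ed_unit_type, eds_ed_value);

INSERT INTO equipment VALUES (1, 'EQ-1');
INSERT INTO equipmentdata VALUES (1, 1, '2017-01-01 00:00', 'C', 20.5);
INSERT INTO equipmentdatastaging VALUES
    (1, 'EQ-1', '2017-01-01 00:00', 'C', 20.5),   -- already in equipmentdata
    (2, 'EQ-1', '2017-01-01 00:05', 'C', 21.0);   -- new reading
""")

compare_sql = """
SELECT eds.eds_id
FROM equipmentdatastaging eds
INNER JOIN equipment e      ON e.e_id_string = eds.eds_e_id_string
INNER JOIN equipmentdata ed ON e.e_id = ed.ed_e_id
    AND eds.eds_ed_log_time  = ed.ed_log_time
    AND eds.eds_ed_unit_type = ed.ed_unit_type
    AND eds.eds_ed_value     = ed.ed_value
ORDER BY eds.eds_id
"""

# Staging rows that duplicate data already stored.
duplicate_ids = [row[0] for row in conn.execute(compare_sql)]

# The query plan shows which indexes the engine decided to use.
plan = conn.execute("EXPLAIN QUERY PLAN " + compare_sql).fetchall()

index_names = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")}

print(duplicate_ids)
for step in plan:
    print(step)
```

In MySQL the equivalent check is to run EXPLAIN on the query before and after adding the indexes and compare the rows column.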