I have two tables, one called st_grid that has soccer matches in it, and another table, st_compiled, that is essentially a copy of st_grid but the process of putting a row in st_compiled is quite intensive so I want to put the data in one row at a time. The two tables are like this with the pertinent columns:
st_grid
id
league_id
fixture_date
inplay_fixture_compiled
inplay_fixture_compiled_id
grid_id
I want to select a row from st_grid where there is no corresponding row in st_compiled ON grid_id, but I havent had any luck. I have looked up various queries and am trying this one
SELECT g.id
FROM st_grid g
WHERE NOT EXISTS
(SELECT i.grid_id
FROM inplay_fixture_compiled i
WHERE g.id = i.grid_id)
AND g.league_id = '15'
But it isnt working, all that is happening is the page is hanging for minutes when I try and run it. There is about 170,000 rows in st_grid (but for each league_id there will max 600 rows) and 10,000 in st_compiled but I dont believe that that is a huge amount by any means.
Hope that makes sense, any help much appreciated.
P
Why don't you try join in the case to get all the data in st_grid that is not yet exists in st_compiled.
For e.g;
select grid_id
from st_grid t1
left join st_compiled t2 on t1.grid_id = st_compiled.grid_id
where t2.grid_id is null
SELECT g.grid_id
FROM st_grid g
LEFT JOIN inplay_fixture_compiled i ON (g.id = i.grid_id)
WHERE g.league_id = '15' and i.grid_id is null.
You can join these tables with a LEFT JOIN keyword and filter out the NULL's, but this will likely be less efficient than using NOT EXISTS.
Related
I had most of this query worked about except two things, large things, one, as soon as I add the forth table [departments_tbl]into the query, I get about 8K rows returned when I should only have about 100.
See the attached schema, no the checkmarks, these are the fields I want returned.
This won't help, but here is just one of the queries that I almost had working, until the [department_tbl was added to the mix]
SELECT _n_cust_entity_storeid_15.entity_id,
_n_cust_entity_storeid_15.email,
customer_group.customer_group_code,
departments.`name`,
departments.manager,
_n_cust_rpt_copy.first_name,
_n_cust_rpt_copy.last_name,
_n_cust_rpt_copy.last_login_date,
_n_cust_rpt_copy.billing_address,
_n_cust_rpt_copy.billing_city,
_n_cust_rpt_copy.billing_state,
_n_cust_rpt_copy.billing_zip
FROM _n_cust_entity_storeid_15 INNER JOIN customer_group ON _n_cust_entity_storeid_15.group_id = customer_group.customer_group_id
INNER JOIN departments ON _n_cust_entity_storeid_15.store_id = departments.store_id,
_n_cust_rpt_copy
ORDER BY _n_cust_rpt_copy.last_name ASC
I've tried subqueries, joins, but just can't get it to work.
Any help would be greatly appreciated.
Schema Please note that entity_id and cust_id fields would the be links between the _ncust_rpt_copy table and the _n_cust_entity_storeid_15 tbl
You have a cross join to the last table, _n_cust_rpt_copy:
SELECT _n_cust_entity_storeid_15.entity_id,
_n_cust_entity_storeid_15.email,
customer_group.customer_group_code,
departments.`name`,
departments.manager,
_n_cust_rpt_copy.first_name,
_n_cust_rpt_copy.last_name,
_n_cust_rpt_copy.last_login_date,
_n_cust_rpt_copy.billing_address,
_n_cust_rpt_copy.billing_city,
_n_cust_rpt_copy.billing_state,
_n_cust_rpt_copy.billing_zip
FROM _n_cust_entity_storeid_15 INNER JOIN
customer_group
ON _n_cust_entity_storeid_15.group_id = customer_group.customer_group_id INNER JOIN
departments
ON _n_cust_entity_storeid_15.store_id = departments.store_id join
_n_cust_rpt_copy
ON ???
ORDER BY _n_cust_rpt_copy.last_name ASC;
It is not obvious to me what the right join conditions are, but there must be something.
I might guess they it at least includes the department:
_n_cust_rpt_copy
ON _n_cust_rpt_copy.department_name = departments.name and
Let's say I have about 25,000 records in two tables and the data in each should be the same. If I need to find any rows that are in table A but NOT in table B, what's the most efficient way to do this.
We've tried it as a subquery of one table and a NOT IN the result but this runs for over 10 minutes and almost crashes our site.
There must be a better way. Maybe a JOIN?
Hope LEFT OUTER JOIN will do the job
select t1.similar_ID
, case when t2.similar_ID is not null then 1 else 0 end as row_exists
from table1 t1
left outer join (select distinct similar_ID from table2) t2
on t1.similar_ID = t2.similar_ID // your WHERE goes here
I would suggest you read the following blog post, which goes into great detail on this question:
Which method is best to select values present in one table but missing
in another one?
And after a thorough analysis, arrives at the following conclusion:
However, these three methods [NOT IN, NOT EXISTS, LEFT JOIN]
generate three different plans which are executed by three different
pieces of code. The code that executes EXISTS predicate is about 30%
less efficient than those that execute index_subquery and LEFT JOIN
optimized to use Not exists method.
That’s why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT
EXISTS.
If the performance you're seeing with NOT IN is not satisfactory, you won't improve this performance by switching to a LEFT JOIN / IS NULL or NOT EXISTS, and instead you'll need to take a different route to optimizing this query, such as adding indexes.
Use exixts and not exists function instead
Select * from A where not exists(select * from B);
Left join. From the mysql documentation
If there is no matching row for the right table in the ON or USING
part in a LEFT JOIN, a row with all columns set to NULL is used for
the right table. You can use this fact to find rows in a table that
have no counterpart in another table:
SELECT left_tbl.* FROM left_tbl LEFT JOIN right_tbl ON left_tbl.id =
right_tbl.id WHERE right_tbl.id IS NULL;
This example finds all rows in left_tbl with an id value that is not
present in right_tbl (that is, all rows in left_tbl with no
corresponding row in right_tbl).
I'm reasonably new to MySQL and I'm trying to select a distinct set of rows using this statement:
SELECT DISTINCT sp.atcoCode, sp.name, sp.longitude, sp.latitude
FROM `transportdata`.stoppoints as sp
INNER JOIN `vehicledata`.gtfsstop_times as st ON sp.atcoCode = st.fk_atco_code
INNER JOIN `vehicledata`.gtfstrips as trip ON st.trip_id = trip.trip_id
INNER JOIN `vehicledata`.gtfsroutes as route ON trip.route_id = route.route_id
INNER JOIN `vehicledata`.gtfsagencys as agency ON route.agency_id = agency.agency_id
WHERE agency.agency_id IN (1,2,3,4);
However, the select statement is taking around 10 minutes, so something is clearly afoot.
One significant factor is that the table gtfsstop_times is huge. (~250 million records)
Indexes seem to be set up properly; all the above joins are using indexed columns. Table sizes are, roughly:
gtfsagencys - 4 rows
gtfsroutes - 56,000 rows
gtfstrips - 5,500,000 rows
gtfsstop_times - 250,000,000 rows
`transportdata`.stoppoints - 400,000 rows
The server has 22Gb of memory, I've set the InnoDB buffer pool to 8G and I'm using MySQL 5.6.
Can anybody see a way of making this run faster? Or indeed, at all!
Does it matter that the stoppoints table is in a different schema?
EDIT:
EXPLAIN SELECT... returns this:
It looks like you are trying to find a collection of stop points, based on certain criteria. And, you're using SELECT DISTINCT to avoid duplicate stop points. Is that right?
It looks like atcoCode is a unique key for your stoppoints table. Is that right?
If so, try this:
SELECT sp.name, sp.longitude, sp.latitude, sp.atcoCode
FROM `transportdata`.stoppoints` AS sp
JOIN (
SELECT DISTINCT st.fk_atco_code AS atcoCode
FROM `vehicledata`.gtfsroutes AS route
JOIN `vehicledata`.gtfstrips AS trip ON trip.route_id = route.route_id
JOIN `vehicledata`.gtfsstop_times AS st ON trip.trip_id = st.trip_id
WHERE route.agency_id BETWEEN 1 AND 4
) ids ON sp.atcoCode = ids.atcoCode
This does a few things: It eliminates a table (agency) which you don't seem to need. It changes the search on agency_id from IN(a,b,c) to a range search, which may or may not help. And finally it relocates the DISTINCT processing from a situation where it has to handle a whole ton of data to a subquery situation where it only has to handle the ID values.
(JOIN and INNER JOIN are the same. I used JOIN to make the query a bit easier to read.)
This should speed you up a bit. But, it has to be said, a quarter gigarow table is a big table.
Having 250M records, I would shard the gtfsstop_times table on one column. Then each sharded table can be joined in a separate query that can run parallel in separate threads, you'll only need to merge the result sets.
The trick is to reduce how many rows of gtfsstop_times SQL has to evaluate. In this case SQL first evaluates every row in the inner join of gtfsstop_times and transportdata.stoppoints, right? How many rows does transportdata.stoppoints have? Then SQL evaluates the WHERE clause, then it evaluates DISTINCT. How does it do DISTINCT? By looking at every single row multiple times to determine if there are other rows like it. That would take forever, right?
However, GROUP BY quickly squishes all the matching rows together, without evaluating each one. I normally use joins to quickly reduce the number of rows the query needs to evaluate, then I look at my grouping.
In this case you want to replace DISTINCT with grouping.
Try this;
SELECT sp.name, sp.longitude, sp.latitude, sp.atcoCode
FROM `transportdata`.stoppoints as sp
INNER JOIN `vehicledata`.gtfsstop_times as st ON sp.atcoCode = st.fk_atco_code
INNER JOIN `vehicledata`.gtfstrips as trip ON st.trip_id = trip.trip_id
INNER JOIN `vehicledata`.gtfsroutes as route ON trip.route_id = route.route_id
INNER JOIN `vehicledata`.gtfsagencys as agency ON route.agency_id = agency.agency_id
WHERE agency.agency_id IN (1,2,3,4)
GROUP BY sp.name
, sp.longitude
, sp.latitude
, sp.atcoCode
There other valuable answers to your question and mine is an addition to it. I assume sp.atcoCode and st.fk_atco_code are indexed columns in their table.
If you can validate and make sure that agency ids in the WHERE clause are valid, you can eliminate joining `vehicledata.gtfsagencys` in the JOINS as you are not fetching any records from the table.
SELECT DISTINCT sp.atcoCode, sp.name, sp.longitude, sp.latitude
FROM `transportdata`.stoppoints as sp
INNER JOIN `vehicledata`.gtfsstop_times as st ON sp.atcoCode = st.fk_atco_code
INNER JOIN `vehicledata`.gtfstrips as trip ON st.trip_id = trip.trip_id
INNER JOIN `vehicledata`.gtfsroutes as route ON trip.route_id = route.route_id
WHERE route.agency_id IN (1,2,3,4);
I have 2 tables with tth hashes, and i need to get which of them, who exist in first table but not exist in second table.
I'll try something like this:
SELECT f.*
FROM files as f
LEFT JOIN trans as t ON t.tth=f.tth
WHERE t.id IS NULL
But it's working very slow, in first table 65k lines, and second table with 130k lines, so query working for ~5 minutes.
Here exist another way?
Thanks.
P.S. All columns in both tables having indexes.
Thanks, OMG Ponies, I read an article on the link, and a little optimizing method using NOT IN, query rate was ~ 0.6 seconds.
SELECT f.*
FROM files as f
WHERE
f.tth NOT IN (
SELECT trans.tth
FROM trans as t
-- We do not need to look entries across entire table, only from already matched, so create selection "limits" with INNER JOIN.
INNER JOIN files ON files.tth=t.tth
)
I need to gather posts from two mysql tables that have different columns and provide a WHERE clause to each set of tables. I appreciate the help, thanks in advance.
This is what I have tried...
SELECT
blabbing.id,
blabbing.mem_id,
blabbing.the_blab,
blabbing.blab_date,
blabbing.blab_type,
blabbing.device,
blabbing.fromid,
team_blabbing.team_id
FROM
blabbing
LEFT OUTER JOIN
team_blabbing
ON team_blabbing.id = blabbing.id
WHERE
team_id IN ($team_array) ||
mem_id='$id' ||
fromid='$logOptions_id'
ORDER BY
blab_date DESC
LIMIT 20
I know that this is messy, but i'll admit, I am no mysql veteran. I'm a beginner at best... Any suggestions?
You could put the where-clauses in subqueries:
select
*
from
(select * from ... where ...) as alias1 -- this is a subquery
left outer join
(select * from ... where ...) as alias2 -- this is also a subquery
on
....
order by
....
Note that you can't use subqueries like this in a view definition.
You could also combine the where-clauses, as in your example. Use table aliases to distinguish between columns of different tables (it's a good idea to use aliases even when you don't have to, just because it makes things easier to read). Example:
select
*
from
<table> as alias1
left outer join
<othertable> as alias2
on
....
where
alias1.id = ... and alias2.id = ... -- aliases distinguish between ids!!
order by
....
Two suggestions for you since a relative newbie in SQL. Use "aliases" for your tables to help reduce SuperLongTableNameReferencesForColumns, and always qualify the column names in a query. It can help your life go easier, and anyone AFTER you to better know which columns come from what table, especially if same column name in different tables. Prevents ambiguity in the query. Your left join, I think, from the sample, may be ambigous, but confirm the join of B.ID to TB.ID? Typically a "Team_ID" would appear once in a teams table, and each blabbing entry could have the "Team_ID" that such posting was from, in addition to its OWN "ID" for the blabbing table's unique key indicator.
SELECT
B.id,
B.mem_id,
B.the_blab,
B.blab_date,
B.blab_type,
B.device,
B.fromid,
TB.team_id
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
WHERE
TB.Team_ID IN ( you can't do a direct $team_array here )
OR B.mem_id = SomeParameter
OR b.FromID = AnotherParameter
ORDER BY
B.blab_date DESC
LIMIT 20
Where you were trying the $team_array, you would have to build out the full list as expected, such as
TB.Team_ID IN ( 1, 4, 18, 23, 58 )
Also, not logical "||" or, but SQL "OR"
EDIT -- per your comment
This could be done in a variety of ways, such as dynamic SQL building and executing, calling multiple times, once for each ID and merging the results, or additionally, by doing a join to yet another temp table that gets cleaned out say... daily.
If you have another table such as "TeamJoins", and it has say... 3 columns: a date, a sessionid and team_id, you could daily purge anything from a day old of queries, and/or keep clearing each time a new query by the same session ID (as it appears coming from PHP). Have two indexes, one on the date (to simplify any daily purging), and second on (sessionID, team_id) for the join.
Then, loop through to do inserts into the "TempJoins" table with the simple elements identified.
THEN, instead of a hard-coded list IN, you could change that part to
...
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
LEFT JOIN TeamJoins TJ
on TB.Team_ID = TJ.Team_ID
WHERE
TB.Team_ID IN NOT NULL
OR B.mem_id ... rest of query
What I ended up doing is;
I added an extra column to my blabbing table called team_id and set it to null as well as another field in my team_blabbing table called mem_id
Then I changed the insert script to also insert a value to the mem_id in team_blabbing.
After doing this I did a simple UNION ALL in the query:
SELECT
*
FROM
blabbing
WHERE
mem_id='$id' OR
fromid='$logOptions_id'
UNION ALL
SELECT
*
FROM
team_blabbing
WHERE
team_id
IN
($team_array)
ORDER BY
blab_date DESC
LIMIT 20
I am open to any thought on what I did. Try not to be too harsh though:) Thanks again for all the info.