I run this query to get 20 random items from my wordpress database based on things like rating, category, etc
SELECT (A.user_votes/A.user_voters) as site_rating, B.ID as post_id, B.post_author, B.post_date,E.name as category
FROM `wp_gdsr_data_article` as A
INNER JOIN `wp_posts` as B ON (A.post_id = B.id)
INNER JOIN wp_term_relationships C ON (B.ID = C.object_id)
INNER JOIN wp_term_taxonomy D ON (C.term_taxonomy_id = D.term_taxonomy_id)
INNER JOIN wp_terms E ON (D.term_id = E.term_id)
WHERE
B.post_type = 'post' AND
B.post_status = 'publish' AND
D.taxonomy='category' AND
E.name NOT IN ('Satire', 'Declined', 'Outfits','Unorganized', 'AP')
ORDER BY RAND()
LIMIT 20
Then, for each result of the random items, I want to find a corresponding item that is very similar to the random item (around the same rating) but not identical and also one the user has not seen:
SELECT ABS($site_rating-(A.user_votes/A.user_voters)) as diff, (A.user_votes/A.user_voters) as site_rating, B.ID as post_id, B.post_author, B.post_date,E.name as category ,IFNULL(F.count,0) as count
FROM `wp_gdsr_data_article` as A
INNER JOIN `wp_posts` as B ON (A.post_id = B.id)
INNER JOIN wp_term_relationships C ON (B.ID = C.object_id)
INNER JOIN wp_term_taxonomy D ON (C.term_taxonomy_id = D.term_taxonomy_id)
INNER JOIN wp_terms E ON (D.term_id = E.term_id)
LEFT JOIN (
SELECT *,COUNT(*) as count FROM `verus` WHERE ip = '{$_SERVER['REMOTE_ADDR']}'
) as F ON (A.post_id = F.post_id_winner OR A.post_id = F.post_id_loser)
WHERE
E.name = '$category' AND
B.ID <> '$post_id' AND
B.post_type = 'post' AND
B.post_status = 'publish' AND
D.taxonomy='category' AND
E.name NOT IN ('Satire', 'Declined', 'Outfits','Unorganized', 'AP')
ORDER BY count ASC, diff ASC
LIMIT 1
Where the following php variables refer to the result of the previous query
$post_id = $result['post_id'];
$category = $result['category'];
$site_rating = $result['site_rating'];
and $_SERVER['REMOTE_ADDR'] refers to the user's IP.
Is there a way to combine the first query with the 20 additional queries that need to be called to find corresponding items, so that I need just 1 or 2 queries?
Edit: Here is the view that simplifies the joins
CREATE VIEW `versus_random` AS
SELECT (A.user_votes/A.user_voters) as site_rating, B.ID as post_id, B.post_author, B.post_date,E.name as category
FROM `wp_gdsr_data_article` as A
INNER JOIN `wp_posts` as B ON (A.post_id = B.id)
INNER JOIN wp_term_relationships C ON (B.ID = C.object_id)
INNER JOIN wp_term_taxonomy D ON (C.term_taxonomy_id = D.term_taxonomy_id)
INNER JOIN wp_terms E ON (D.term_id = E.term_id)
WHERE
B.post_type = 'post' AND
B.post_status = 'publish' AND
D.taxonomy='category' AND
E.name NOT IN ('Satire', 'Declined', 'Outfits','Unorganized', 'AP')
My attempt now with the view:
SELECT post_id,
(
SELECT INNER_TABLE.post_id
FROM `versus_random` as INNER_TABLE
WHERE
INNER_TABLE.post_id <> OUTER_TABLE.post_id
ORDER BY (SELECT COUNT(*) FROM `versus` WHERE ip = '54' AND (INNER_TABLE.post_id = post_id_winner OR INNER_TABLE.post_id = post_id_loser)) ASC
LIMIT 1
) as innerquery
FROM `versus_random` as OUTER_TABLE
ORDER BY RAND()
LIMIT 20
However the query just timesout and freezes my mysql.
I think it should work like this, but I don't have any Wordpress at hand to test it. The second query that gets the related post is embedded in the other query, when it gets just the related_post_id. The whole query is turned into a subquery itself, given the alias 'X' (although you are free to use 'G', if you want to continue your alphabet.)
In the outer query, the tables for posts and data-article are joined again (RA and RP) to query the relevant fields of the related post, based on the related_post_id from the inner query. These two tables are left joined (and in reverse order), so you still get the main post if no related post was found.
SELECT
X.site_rating,
X.post_id,
X.post_author,
X.post_date,
X.category,
RA.user_votes / RA.user_voters as related_post_site_rating,
RP.ID as related_post_id,
RP.post_author as related_post_author,
RP.post_date as related_post_date,
RP.name as related_category,
FROM
( SELECT
(A.user_votes/A.user_voters) as site_rating,
B.ID as post_id, B.post_author, B.post_date,E.name as category,
( SELECT
RB.ID as post_id
FROM `wp_gdsr_data_article` as RA
INNER JOIN `wp_posts` as RB ON (RA.post_id = RB.id)
INNER JOIN wp_term_relationships RC ON (RB.ID = RC.object_id)
INNER JOIN wp_term_taxonomy RD ON (RC.term_taxonomy_id = RD.term_taxonomy_id)
INNER JOIN wp_terms RE ON (RD.term_id = RE.term_id)
LEFT JOIN (
SELECT *,COUNT(*) as count FROM `verus` WHERE ip = '{$_SERVER['REMOTE_ADDR']}'
) as RF ON (RA.post_id = RF.post_id_winner OR RA.post_id = RF.post_id_loser)
WHERE
RE.name = E.name AND
RB.ID <> B.ID AND
RB.post_type = 'post' AND
RB.post_status = 'publish' AND
RD.taxonomy='category' AND
RE.name NOT IN ('Satire', 'Declined', 'Outfits','Unorganized', 'AP')
ORDER BY count ASC, diff ASC
LIMIT 1) as related_post_id
FROM `wp_gdsr_data_article` as A
INNER JOIN `wp_posts` as B ON (A.post_id = B.id)
INNER JOIN wp_term_relationships C ON (B.ID = C.object_id)
INNER JOIN wp_term_taxonomy D ON (C.term_taxonomy_id = D.term_taxonomy_id)
INNER JOIN wp_terms E ON (D.term_id = E.term_id)
WHERE
B.post_type = 'post' AND
B.post_status = 'publish' AND
D.taxonomy='category' AND
E.name NOT IN ('Satire', 'Declined', 'Outfits','Unorganized', 'AP')
ORDER BY RAND()
LIMIT 20
) X
LEFT JOIN `wp_posts` as RP ON RP.id = X.related_post_id
LEFT JOIN `wp_gdsr_data_article` as RA.post_id = RP.id
I can't test my proposal so take it with the benefit of the doubt. Anyway i hope it could be a valid starting point for some of the issues faced.
I can not imagine a solution that does not pass through a temporary table, cabling onerous computations present in your queries. You could also have the goal to not interfere with the randomization of the first phase. In the following I try to clarify.
I'll start with these rewritings:
-- first query
SELECT site_rating, post_id, post_author, post_date, category
FROM POSTS_COMMON
ORDER BY RAND()
LIMIT 20
-- second query
SELECT ABS(R.site_rating_A - R.site_rating_B) as diff, R.site_rating_B as site_rating, P.post_id, P.post_author, P.post_date, P.category, F.count
FROM POSTS_COMMON AS P
INNER JOIN POSTS_RATING_DIFFS AS R ON (P.post_id = R.post_id_B)
LEFT JOIN (
/* post_id_winner, post_id_loser explicited; COUNT(*) NULL treatment anticipated */
SELECT post_id_winner, post_id_loser, IFNULL(COUNT(*), 0) as count FROM `verus` WHERE ip = '{$_SERVER['REMOTE_ADDR']}'
) as F ON (P.post_id = F.post_id_winner OR P.post_id = F.post_id_loser)
WHERE
P.category = '$category'
AND R.post_id_A = '$post_id'
ORDER BY count ASC, diff ASC
LIMIT 1
with:
SELECT A.post_id_A, B.post_id_B, A.site_rating as site_rating_A, B.site_rating as site_rating_B
INTO POSTS_RATING_DIFFS
FROM POSTS_COMMON as A, POSTS_COMMON as B
WHERE A.post_id <> B.post_id AND A.category = B.category
CREATE VIEW POSTS_COMMON AS
SELECT A.ID as post_id, A.user_votes, A.user_voters, (A.user_votes / A.user_voters) as site_rating, B.post_author, B.post_date, E.name as category
FROM wp_gdsr_data_article` as A
INNER JOIN `wp_posts` as B ON (A.post_id = B.post_id)
INNER JOIN wp_term_relationships C ON (B.ID = C.object_id)
INNER JOIN wp_term_taxonomy D ON (C.term_taxonomy_id = D.term_taxonomy_id)
INNER JOIN wp_terms E ON (D.term_id = E.term_id)
WHERE
B.post_type = 'post' AND
B.post_status = 'publish' AND
D.taxonomy='category' AND
E.name NOT IN ('Satire', 'Declined', 'Outfits','Unorganized', 'AP')
POSTS_COMMON isolates a common view between the two queries.
With POSTS_RATING_DIFFS, a temporary table populated with the ratings combinations and diffs, we have "the trick" of transforming the inequality join criteria on post_id(s) in an equality one (see R.post_id_A = '$post_id' in the second query).
We also take advantage of a temporary table in having precomputed ratings for the combinatory explosion of A.post_id <> B.post_id (with post category equality), and moreover being useful for other sessions.
Also extracting the RAND() ordering in a temporary table could be advantageous. In this case we could limit the ratings combinations and diffs only on the 20 randomly chosen.
Original limiting to one single row in the dependent second level query is done by mean of ordering and limit statements.
The proposed solution avoids elaborating a LIMIT 1 on an ORDER BY resultset in the second level query wich become a subquery.
The single row calculation in the subquery is done by mean of a WHERE criteria on the maximum of a single value calculated from the columns values on which ORDER BY clause is used.
The combination into a single value must be valid in preserving the correct ordering. I'll leave in pseudo-code as:
'<combination of count and diff>'
For example, using combination of the two values into a string type, we could have:
CONCAT(LPAD(CAST(count AS CHAR), 10, '0'), LPAD(CAST(ABS(diff) AS CHAR), 20, '0'))
The structure of the single query would be:
SELECT (Q_LVL_1.user_votes/Q_LVL_1.user_voters) as site_rating_LVL_1, Q_LVL_1.post_id as post_id_LVL_1
, Q_LVL_1.post_author as post_author_LVL_1, Q_LVL_1.post_date as post_date_LVL_1
, Q_LVL_1.category as category_LVL_1, Q_LVL_2.post_id as post_id_LVL_2
, Q_LVL_2.diff as diff_LVL_2, Q_LVL_2.site_rating as site_rating_LVL_2
, Q_LVL_2.post_author as post_author_LVL_2, Q_LVL_2.post_date as post_date_LVL_2
, Q_LVL_2.count
FROM POSTS_COMMON AS Q_LVL_1
, /* 1-row-selection query placed side by side for each Q_LVL_1's row */
(
SELECT CORE_P.post_id, CORE_P.ABS_diff as diff, P.site_rating, P.post_author, P.post_date, CORE_P.count
FROM POSTS_COMMON AS P
INNER JOIN (
SELECT FIRST(CORE_P.post_id) as post_id, ABS(CORE_P.diff) as ABS_diff, CORE_P.count
FROM (
/*
selection of posts with post_id(s) different from first level query,
not already taken and with the topmost value of
'<combination of count and diff>'
*/
) AS CORE_P
GROUP BY CORE_P.count, ABS(CORE_P.diff)
/* the one row selector */
) AS CORE_ONE_LINER ON P.post_id = CORE_ONE_LINER.post_id
) AS Q_LVL_2
ORDER BY RAND()
LIMIT 20
CORE_P selection could have more post_id(s) corresponding to the topmost value '<combination of count and diff>', so the use of GROUP BY and FIRST clauses to reach the single row.
This brings to a possible final implementation:
SELECT (Q_LVL_1.user_votes/Q_LVL_1.user_voters) as site_rating_LVL_1, Q_LVL_1.post_id as post_id_LVL_1
, Q_LVL_1.post_author as post_author_LVL_1, Q_LVL_1.post_date as post_date_LVL_1
, Q_LVL_1.category as category_LVL_1, Q_LVL_2.post_id as post_id_LVL_2
, Q_LVL_2.diff as diff_LVL_2, Q_LVL_2.site_rating as site_rating_LVL_2
, Q_LVL_2.post_author as post_author_LVL_2, Q_LVL_2.post_date as post_date_LVL_2
, Q_LVL_2.count
FROM POSTS_COMMON AS Q_LVL_1
, (
SELECT CORE_P.post_id, CORE_P.ABS_diff as diff, P.site_rating, P.post_author, P.post_date, CORE_P.count
FROM POSTS_COMMON AS P
INNER JOIN
(
SELECT FIRST(CORE_P.post_id) as post_id, ABS(CORE_P.diff) as ABS_diff, CORE_F.count
FROM (
SELECT CORE_RATING.post_id as post_id, ABS(CORE_RATING.diff) as ABS_diff, CORE_F.count
FROM (
SELECT post_id_B as post_id, site_rating_A - site_rating_B as diff
FROM POSTS_RATING_DIFFS
WHERE POSTS_RATING_DIFFS.post_id_A = Q_LVL_1.post_id
) as CORE_RATING
LEFT JOIN (
SELECT post_id_winner, post_id_loser, IFNULL(COUNT(*), 0) as count
FROM `verus`
WHERE ip = '{$_SERVER['REMOTE_ADDR']}'
) as CORE_F ON (CORE_RATING.post_id = CORE_F.post_id_winner OR CORE_RATING.post_id = CORE_F.post_id_loser)
WHERE
POSTS_RATING_DIFFS.post_id_A = Q_LVL_1.post_id
AND '<combination of CORE_F.count and CORE_RATING.diff>'
= MAX (
SELECT '<combination of CORE_F_2.count and CORE_RATING_2.diff>'
FROM (
SELECT site_rating_A - site_rating_B as diff
FROM POSTS_RATING_DIFFS
WHERE POSTS_RATING_DIFFS.post_id_A = Q_LVL_1.post_id
) as CORE_RATING_2
LEFT JOIN (
SELECT post_id_winner, post_id_loser, IFNULL(COUNT(*), 0) as count
FROM `verus`
WHERE ip = '{$_SERVER['REMOTE_ADDR']}'
) as CORE_F_2 ON (CORE_RATING_2.post_id = CORE_F_2.post_id_winner OR CORE_RATING_2.post_id = CORE_F_2.post_id_loser)
) /* END MAX */
) AS CORE_P
GROUP BY CORE_P.count, ABS(CORE_P.diff)
) AS CORE_ONE_LINER ON P.post_id = CORE_ONE_LINER.post_id
) AS Q_LVL_2
ORDER BY RAND()
LIMIT 20
I know it´s difficult to answer without knowing the model, but I have next heavy query that takes around 10 secs to complete in my MySQL database. I guess it can be optimized, but I´m not that skilled.
SELECT DISTINCT
b . *
FROM
boats b,
states s,
boat_people bp,
countries c,
provinces pr,
cities ct1,
cities ct2,
ports p,
addresses a,
translations t,
element_types et
WHERE
s.name = 'Confirmed' AND bp.id = '2'
AND b.state_id = s.id
AND b.id NOT IN (SELECT
bc.boat_id
FROM
boat_calendars bc
WHERE
(date(bc.since) <= '2015-02-09 09:23:00 +0100'
AND date(bc.until) >= '2015-02-09 09:23:00 +0100')
OR (date(bc.since) <= '2015-02-10 09:23:00 +0100'
AND date(bc.until) >= '2015-02-10 09:23:00 +0100'))
AND b.people_capacity_id >= bp.id
AND c.id = (SELECT DISTINCT
t.element_id
FROM
translations t,
element_types et
WHERE
t.element_translation = 'Spain'
AND et.name = 'Country'
AND t.element_type_id = et.id)
AND pr.country_id = c.id
AND pr.id = (SELECT DISTINCT
t.element_id
FROM
translations t,
element_types et
WHERE
t.element_translation = 'Mallorca'
AND et.name = 'Province'
AND t.element_type_id = et.id)
AND ((ct1.province_id = pr.id AND p.city_id = ct1.id AND b.port_id = p.id)
OR (ct2.province_id = pr.id AND a.city_id = ct2.id AND b.address_id = a.id));
Basically, it tries to get all the boats, that are not already booked in Confirmed state and that are in a province and a country ie. Mallorca, Spain.
Please, let me know if you need some more details about de purpose of the query or the model.
remove * from select clause. instead give column names in select clause. it will increase some
performance. Its one of the way to optimize
Instead of having a sub query, use LEFT JOIN NULL (just google for it) and it will help a lot.
All your answers are good. But, according to #POHH suggestions, I increased the performance magically by just replacing the b.* for b.somecolumnsnames.
From 10 to 1 or 2 secs.
I have this query:
SELECT c.id, p.id, p.register_date, o.office_agency_name_1, a.street_address, a.supplemental_address_1, a.city, s.abbreviation, co.iso_code, a.postal_code, ph.phone, pf.phone, e.email,
ab.name, c.addressee_custom, (Select phone from civicrm_phone where contact_id = civicrm_contact.id AND is_billing = 1) as billing_phone
FROM civicrm_contact c,
civicrm_participant p,
civicrm_value_office_info_1 o,
civicrm_address a,
civicrm_state_province s,
civicrm_country co,
civicrm_phone ph,
civicrm_phone pf,
civicrm_email e,
civicrm_address ab
WHERE p.contact_id = c.id
AND p.is_test = 0
AND p.event_id = 1
AND p.status_id NOT IN (4,11,12)
AND ( c.is_deleted = 0 OR c.is_deleted IS NULL )
AND o.entity_id = c.id
AND a.contact_id = c.id
AND s.id = a.state_province_id
AND co.id = a.country_id
AND ph.contact_id = c.id
AND ph.is_primary = 1
AND pf.contact_id = c.id
AND pf.phone_type_id = 3
AND e.contact_id = c.id
AND e.is_primary = 1
AND ab.contact_id = c.id
AND ab.is_billing = 1
billing_phone is the most recent addition ... this is of course not working and giving me:
1054 - Unknown column 'civicrm_contact.id' in 'where clause'
Due to the complexity, and the fact that there are more fields yet to add, I'd like to avoid JOINs if possible, and stay as close to the current syntax as possible.
When I tried to add billing phone with the same pattern as the others, it became apparent that not everyone has a billing phone - the number of rows dropped dramatically.
I'd like to make this work as a subquery in the outer SELECT (which would get me NULLs for those rows with no billing phone, right?).
Actually, it doesn't work as a JOIN either:
SELECT c.id, p.id, p.register_date, o.office_agency_name_1, a.street_address, a.supplemental_address_1, a.city, s.abbreviation, co.iso_code, a.postal_code, ph.phone, pf.phone, e.email,
ab.name, c.addressee_custom, pb.phone as billing_phone
FROM civicrm_contact c,
civicrm_participant p,
civicrm_value_office_info_1 o,
civicrm_address a,
civicrm_state_province s,
civicrm_country co,
civicrm_phone ph,
civicrm_phone pf,
civicrm_email e,
civicrm_address ab
LEFT JOIN civicrm_phone as pb on pb.contact_id = c.id AND pb.is_billing = 1
WHERE p.contact_id = c.id
AND p.is_test = 0
AND p.event_id = 1
AND p.status_id NOT IN (4,11,12)
AND ( c.is_deleted = 0 OR c.is_deleted IS NULL )
AND o.entity_id = c.id
AND a.contact_id = c.id
AND s.id = a.state_province_id
AND co.id = a.country_id
AND ph.contact_id = c.id
AND ph.is_primary = 1
AND pf.contact_id = c.id
AND pf.phone_type_id = 3
AND e.contact_id = c.id
AND e.is_primary = 1
AND ab.contact_id = c.id
AND ab.is_billing = 1
That gives me:
1054 - Unknown column 'c.id' in 'on clause'
How do I make this work?
What it appears is that the aliased table "c" (which refers to civicrm_contact) doesn't have a column "id" in it.
Are you sure this column exists in the table?
I have this query:
SELECT CONCAT(f.name, ' ', f.parent_names) AS FullName,
stts.name AS 'Status',
u.name AS Unit,
city.name AS City,
hus.mobile1 AS HusbandPhone,
wife.mobile1 AS WifePhone,
f.phone AS HomePhone,
f.contact_initiation_date AS InitDate,
fh.created_at AS StatusChangeDate,
cmt.created_at AS CommentDate,
cmt.comment AS LastComment,
f.reconnection_date AS ReconnectionDate,
(
SELECT GROUP_CONCAT(t.name)
FROM taggings tgs JOIN tags t
ON tgs.tag_id = t.id
WHERE tgs.taggable_type = 'family' AND
tgs.taggable_id = f.id
) AS HandlingStatus
FROM families f
JOIN categories stts ON f.family_status_cat_id = stts.id
JOIN units u ON f.unit_id = u.id
JOIN categories city ON f.main_city_cat_id = city.id
JOIN contacts hus ON f.husband_id = hus.id
JOIN contacts wife ON f.wife_id = wife.id
JOIN comments cmt ON f.id = cmt.commentable_id AND
cmt.created_at = (SELECT MAX(created_at) FROM comments WHERE commentable_id = f.id)
JOIN family_histories fh ON f.id = fh.family_id AND
fh.created_at = (SELECT MAX(created_at) FROM family_histories WHERE family_id = f.id AND family_history_cat_id = 1422) AND
fh.family_history_cat_id = 1422
WHERE f.id = 12212
The problem is with the second SELECT (the column - HandlingStatus).
I don't understand but when the column has results (tested as a stand alone query) - I get an empty result set, and when there are no results - I get a result.
Why?
I have a query that uses SUBSTRING() as a criteria:
SELECT p.name p_name,
pa.line1 p_line1,
pa.zip p_zip,
c.name c_name,
ca.line1 c_line1,
ca.zip c_zip
FROM bank b
JOIN import_bundle ib ON ib.bank_id = b.id
JOIN generic_import gi ON gi.import_bundle_id = ib.id
JOIN account_import ai ON ai.generic_import_id = gi.id
JOIN account a ON a.account_import_id = ai.id
JOIN account_address aa ON aa.account_id = a.id
JOIN address ca ON aa.address_id = ca.id
JOIN address pa ON pa.zip = ca.zip OR (pa.zip = ca.zip AND pa.line1 = ca.line1)
JOIN prospect p ON p.address_id = pa.id
JOIN customer c ON a.customer_id = c.id
WHERE b.name = 'M'
AND ib.active = 1
AND gi.active = 1
AND SUBSTRING(p.name, 1, 12) = SUBSTRING(c.name, 1, 12)
LIMIT 100
As you can see, it's just comparing the first 12 characters of p.name and c.name. Unfortunately, adding this query to the WHERE clause makes my query unbearably slow. Are there any tricks out there to do this same comparison, or is my best bet to add another column to each table that contains the first 12 characters of the customer's name? I hope it's not the latter because that would be a lot of work and I'll ultimately be doing several comparisons like this.
Add the extra columns and set up an update trigger to populate them automatically. Be sure to create indexes on the new columns, of course.