Database Query by Keywords - mysql

I have the following tables:
products_match:
atcode varchar(6)
valcode varchar(100)
id_prod varchar(15)
products:
asin varchar(15)
title varchar(155)
Example content of products_match table:
atcode='type'
valcode='wifi'
id_prod='1SC52DD'
atcode='type'
valcode='ram'
id_prod='11DD5ER'
There are multiple keywords in this table.
I'm building a simple search engine and need to display products that match multiple criteria, for example:
select products where atcode='type' AND valcode='wifi' AND atcode='brand' AND valcode='Sony'
Do I need to apply self joins for every group of arguments here?
Right now I have the following query:
SELECT * FROM products_match a
JOIN products b ON a.id_prod=b.asin
JOIN assortment_match c ON a.id_prod=c.id_prod
WHERE c.atcode='brand' AND c.valcode='sony'
ORDER BY sales_rank ASC LIMIT 0,60
however it returns no products.
Can anybody help me solve this issue?
Edit
I've been told that I should use one self join for every group of keywords. What do you think?

One method is to add an EXISTS subquery to the WHERE clause for each criterion that must match:
AND EXISTS (select 1 from products_match
where id_prod = a.id_prod
and atcode = 'type' and valcode = 'wifi')
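As a rough illustration of the EXISTS approach, here is a minimal sketch that runs the idea end to end. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows (A1, A2, A3) are invented for the demo. One EXISTS clause is added per atcode/valcode pair, so a product is returned only when every pair matches.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (asin TEXT PRIMARY KEY, title TEXT);
CREATE TABLE products_match (atcode TEXT, valcode TEXT, id_prod TEXT);
-- Hypothetical sample data, not from the original question.
INSERT INTO products VALUES
  ('A1', 'Sony wifi camera'), ('A2', 'Sony radio'), ('A3', 'Acme wifi router');
INSERT INTO products_match VALUES
  ('type', 'wifi', 'A1'),  ('brand', 'sony', 'A1'),
  ('type', 'radio', 'A2'), ('brand', 'sony', 'A2'),
  ('type', 'wifi', 'A3'),  ('brand', 'acme', 'A3');
""")

# One EXISTS subquery per criterion; all of them must hold for a product.
query = """
SELECT p.asin, p.title
FROM products p
WHERE EXISTS (SELECT 1 FROM products_match m
              WHERE m.id_prod = p.asin AND m.atcode = ? AND m.valcode = ?)
  AND EXISTS (SELECT 1 FROM products_match m
              WHERE m.id_prod = p.asin AND m.atcode = ? AND m.valcode = ?)
ORDER BY p.asin
"""
matches = conn.execute(query, ("type", "wifi", "brand", "sony")).fetchall()
print(matches)
```

Only product A1 satisfies both criteria here. A single row of products_match can never have two different valcode values at once, which is why the criteria have to be checked in separate subqueries (or separate self joins).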

Related

Is it possible to combine MySQL queries to multiple tables into a single query based on the results from one of the queries?

The $userid of the currently logged-in user is all that is currently available in the PHP code. I want to run a query against the MySQL tables to return all of the status updates for myself and for my friends, ordered by createddate DESC.
MySQL Sample database tables:
[statusupdates]
statusupdateid int(8)
ownerid int(8)
message varchar(250)
createddate datetime
[friends]
friendid int(8)
requestfrom int(8)
requestto int(8)
daterequested datetime
dateupdated datetime
status varchar(1)
Question: Can I run a single query that returns each statusupdates.ownerid and statusupdates.message, ordered by statusupdates.createddate DESC?
Or do I have to run a query for each friends record where the $userid is in either the friends.requestfrom or friends.requestto then, run another query for alternate friends.requestfrom or friends.requestto (the one that doesn't include $userid), then sort all of the results by statusupdate.createddate and then get the statusupdates.message?
You want to look at MySQL Joins.
I think this may do something like what you're after, but it will almost definitely need debugging!
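SELECT DISTINCT s.ownerid, s.message, s.createddate
FROM statusupdates s
LEFT JOIN friends f1 ON (f1.requestfrom = $userid AND f1.requestto = s.ownerid)
LEFT JOIN friends f2 ON (f2.requestto = $userid AND f2.requestfrom = s.ownerid)
WHERE s.ownerid = $userid OR f1.friendid IS NOT NULL OR f2.friendid IS NOT NULL
ORDER BY s.createddate DESC;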
This is untested, but should work or at least get you in the right direction.
You could use IN() with a list of userids that comes from a subquery. That subquery does a UNION of two queries: the first gets the requestfrom userids, and the second gets the requestto userids. Finally we add an OR to include the current userid.
Also, I assume that you want to keep only rows where status = 1, as you don't want updates from those who have not confirmed friendships.
SELECT s.ownerid, s.message
FROM statusupdates s
WHERE s.ownerid IN (
SELECT f1.requestfrom
FROM friends f1
WHERE f1.requestto = $userid
AND f1.status = 1
UNION
SELECT f2.requestto
FROM friends f2
WHERE f2.requestfrom = $userid
AND f2.status = 1
)
OR s.ownerid = $userid
ORDER BY s.createddate DESC
Take a look at this sqlFiddle example - http://sqlfiddle.com/#!2/85ea0/3
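The IN() plus UNION idea can be tried out with a small sketch like the one below. It assumes Python's built-in sqlite3 as a stand-in for MySQL, the sample rows are invented, and the feed() helper is a hypothetical name. The user id is passed as a bound parameter rather than being interpolated into the SQL string, which is the safer habit when the value comes from application code.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE statusupdates (
  statusupdateid INTEGER PRIMARY KEY, ownerid INTEGER,
  message TEXT, createddate TEXT);
CREATE TABLE friends (
  friendid INTEGER PRIMARY KEY, requestfrom INTEGER,
  requestto INTEGER, status TEXT);
-- Hypothetical sample data: user 1 is friends with 2 and 3; 4 is unconfirmed.
INSERT INTO friends (requestfrom, requestto, status)
  VALUES (1, 2, '1'), (3, 1, '1'), (1, 4, '0');
INSERT INTO statusupdates (ownerid, message, createddate) VALUES
  (1, 'mine',          '2013-01-01 10:00:00'),
  (2, 'from friend 2', '2013-01-02 10:00:00'),
  (3, 'from friend 3', '2013-01-03 10:00:00'),
  (4, 'unconfirmed',   '2013-01-04 10:00:00'),
  (5, 'stranger',      '2013-01-05 10:00:00');
""")

FEED_SQL = """
SELECT s.ownerid, s.message
FROM statusupdates s
WHERE s.ownerid = :uid
   OR s.ownerid IN (
        SELECT requestfrom FROM friends WHERE requestto = :uid AND status = '1'
        UNION
        SELECT requestto FROM friends WHERE requestfrom = :uid AND status = '1')
ORDER BY s.createddate DESC
"""

def feed(connection, userid):
    """Return (ownerid, message) rows for the user and confirmed friends, newest first."""
    return connection.execute(FEED_SQL, {"uid": userid}).fetchall()

feed_rows = feed(conn, 1)
print(feed_rows)
```

With this data the feed for user 1 contains the updates from users 3, 2 and 1, in that order; the unconfirmed friend and the stranger are left out.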

Performing joins

This is my first run-in with MySQL databases.
I got a lot of help from my first question:
MYSQL - First Database Structure help Please
and built my database as pitchinnate recommended
I have :
Table structure for table club
Column Type Null Default
id int(11) No
clubname varchar(100) No
address longtext No
phone varchar(12) No
website varchar(255) No
email varchar(100) No
Table structure for table club_county
Column Type Null Default
club_id int(11) No
county_id int(11) No
Table structure for table county
Column Type Null Default
id int(11) No
state_id tinyint(4) No
name varchar(50) No
Table structure for table states
Column Type Null Default
id tinyint(4) No
longstate varchar(20) No
shortstate char(2) No
I set up foreign key relationships for everything above that looks that way.... states.id -> county.state_id for example
What I tried to run :
SELECT *
FROM club
JOIN states
JOIN county
ON county.state_id=states.id
JOIN club_county
ON club_county.club_id=club.id
club_county.county_id=county.id
This didn't work... I'm sure the reason is obvious to those of you who know what SHOULD be done.
What I'm trying to do is
get a listing of all clubs, with their associated state and county(ies)
You need to specify a JOIN condition for each of your joins. It should look something like the following:
SELECT *
FROM club
JOIN club_county ON club.id = club_county.club_id
JOIN county ON club_county.county_id = county.id
JOIN states ON county.state_id = states.id
Your version omitted an ON clause on the line that reads JOIN states.
One thing regarding your table names: It's advisable to stick to either singular or plural table names and not to mix them (notice you have club (singular) and states (plural) tables). This makes things easier to remember when you're developing and you're less likely to make mistakes.
EDIT:
If you want to limit which columns appear in your result, you just need to modify the SELECT clause. Instead of "SELECT *", list just the fields you want, separated by commas.
E.g.
SELECT club.id, club.clubname, county.name, states.longstate
FROM club
JOIN club_county ON club.id = club_county.club_id
JOIN county ON club_county.county_id = county.id
JOIN states ON county.state_id = states.id
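To see the chain of joins in action, here is a small runnable sketch. It assumes Python's built-in sqlite3 as a stand-in for MySQL, keeps only the columns needed for the demo, and uses invented club, county and state rows. Each JOIN carries its own ON condition, walking from club through the club_county link table to county and then to states.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE states (id INTEGER PRIMARY KEY, longstate TEXT, shortstate TEXT);
CREATE TABLE county (id INTEGER PRIMARY KEY, state_id INTEGER, name TEXT);
CREATE TABLE club (id INTEGER PRIMARY KEY, clubname TEXT);
CREATE TABLE club_county (club_id INTEGER, county_id INTEGER);
-- Hypothetical sample data.
INSERT INTO states VALUES (1, 'Ohio', 'OH');
INSERT INTO county VALUES (1, 1, 'Franklin'), (2, 1, 'Licking');
INSERT INTO club VALUES (1, 'Rod and Gun'), (2, 'Chess Club');
INSERT INTO club_county VALUES (1, 1), (1, 2), (2, 1);
""")

# One ON condition per JOIN: club -> club_county -> county -> states.
club_rows = conn.execute("""
SELECT club.clubname, county.name, states.shortstate
FROM club
JOIN club_county ON club_county.club_id = club.id
JOIN county ON county.id = club_county.county_id
JOIN states ON states.id = county.state_id
ORDER BY club.clubname, county.name
""").fetchall()

for row in club_rows:
    print(row)
```

A club that serves two counties appears on two rows, one per county, which is what the many-to-many link table is there to express.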
The query you have written will not even execute because it has a syntax error.
Please see this link for more details on JOINS:
13.2.8.2. JOIN Syntax
Also, you could write it like this:
SELECT *
FROM club a, county b, states c, club_county d
WHERE a.id = d.club_id
AND b.id = d.county_id
AND b.state_id = c.id
I hope this will help... If you still need help, please let us know...
Thanks...
Mr.777
So after you have modified the question, now the answer would be more like:
SELECT club.id,club.clubname,county.name,states.longstate,states.shortstate
FROM club,club_county,county,states
WHERE club.id=club_county.club_id
AND county.id=club_county.county_id
AND states.id = county.state_id
Please let me know if you need more help...
Thanks...
Mr.777

Dynamic query string

I want to add some dynamic content to the FROM clause based on the value of one particular column.
Is it possible?
For Example,
SELECT BILL.BILL_NO AS BILLNO,
IF(BILL.PATIENT_ID IS NULL,"CUS.CUSTOMERNAME AS NAME","PAT.PATIENTNAME AS NAME")
FROM
BILL_PATIENT_BILL AS BILL
LEFT JOIN IF(BILL.PATIENT_ID IS NULL," RT_TICKET_CUSTOMER AS CUS ON BILL.CUSTOMER_ID=CUS.ID"," RT_TICKET_PATIENT AS PAT ON BILL.PATIENT_ID=PAT.ID")
But this query is not working.
Here, BILL_PATIENT_BILL is a common table.
Each row can have either a PATIENT_ID or a CUSTOMER_ID. If a particular record has a PATIENT_ID, I want PATIENTNAME from RT_TICKET_PATIENT as NAME. Otherwise the record holds a CUSTOMER_ID, and in that case I want CUSTOMERNAME as NAME.
I am sure that every row in BILL_PATIENT_BILL has either a PATIENT_ID or a CUSTOMER_ID.
Can anyone help me?
You can also use IF() to select the right values instead of constructing your query from strings:
SELECT
BILL.BILL_NO AS BILLNO,
IF( BILL.PATIENT_ID IS NULL, cus.CUSTOMERNAME, pat.PATIENTNAME ) AS NAME
FROM
BILL_PATIENT_BILL AS BILL
LEFT JOIN RT_TICKET_CUSTOMER cus ON BILL.CUSTOMER_ID = cus.ID
LEFT JOIN RT_TICKET_PATIENT pat ON BILL.PATIENT_ID = pat.ID
It would also be possible to PREPARE a statement from strings and EXECUTE it, but this technique is prone to SQL injection, so I would advise against it.
Read here: Is it possible to execute a string in MySQL?
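The two LEFT JOINs with a conditional column can be checked with a minimal sketch. It assumes Python's built-in sqlite3 as a stand-in for MySQL, so it uses a standard CASE expression where the MySQL answer uses IF(); the two forms are meant to be equivalent here. The names 'Jane Doe' and 'Acme Ltd' are invented sample data.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RT_TICKET_CUSTOMER (ID INTEGER PRIMARY KEY, CUSTOMERNAME TEXT);
CREATE TABLE RT_TICKET_PATIENT (ID INTEGER PRIMARY KEY, PATIENTNAME TEXT);
CREATE TABLE BILL_PATIENT_BILL (
  BILL_NO INTEGER PRIMARY KEY, PATIENT_ID INTEGER, CUSTOMER_ID INTEGER);
-- Hypothetical sample data: bill 100 is for a patient, bill 101 for a customer.
INSERT INTO RT_TICKET_CUSTOMER VALUES (1, 'Acme Ltd');
INSERT INTO RT_TICKET_PATIENT VALUES (1, 'Jane Doe');
INSERT INTO BILL_PATIENT_BILL VALUES (100, 1, NULL), (101, NULL, 1);
""")

# Join both lookup tables; pick the name column per row with CASE.
bill_rows = conn.execute("""
SELECT BILL.BILL_NO AS BILLNO,
       CASE WHEN BILL.PATIENT_ID IS NULL
            THEN cus.CUSTOMERNAME
            ELSE pat.PATIENTNAME
       END AS NAME
FROM BILL_PATIENT_BILL AS BILL
LEFT JOIN RT_TICKET_CUSTOMER cus ON BILL.CUSTOMER_ID = cus.ID
LEFT JOIN RT_TICKET_PATIENT pat ON BILL.PATIENT_ID = pat.ID
ORDER BY BILL.BILL_NO
""").fetchall()
print(bill_rows)
```

Because both joins are LEFT JOINs, a bill keeps its row even though only one of the two lookups finds a match; the unmatched side simply contributes NULLs that the CASE expression skips over.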

optimising and scaling mysql structure + queries for large mailing groups

So I have a system that stores contacts and allows them to be put into groups. These groups can be defined by criteria (everyone with surname 'smith'), or by explicitly adding / excluding people.
The problem I am having is that when I list the mailing groups, I need to count how many contacts are in each one. This number can change as contacts are added to or removed from the contacts table. With small groups and small numbers of contacts it is fine, but with around 50k contacts it runs into problems.
An example query I use for this is as follows:
SELECT COUNT(c_id) FROM contacts, mgroups
LEFT JOIN mgroups_explicit ON mg_id = me_mg_id
WHERE mgroups.site_id = '10'
AND mg_id = '20'
AND me_c_id = c_id
AND contacts.site_id = '10'
OR (contacts.site_id = '10' AND ( c_tags LIKE '%tag1%')) AND c_id NOT IN
( SELECT mex_c_id FROM mgroups_exclude WHERE c_id = mex_c_id ) GROUP BY c_id
The criteria table does not feature in this query, as the problem presents itself when large groups are created explicitly rather than with criteria. Both kinds are required: criteria-based groups grow or shrink on the fly as you modify your contacts, whereas explicit membership is generally set in stone. So in this case, if you explicitly add 20k contacts to a group, it adds 20k rows to the table, each marked with that mg_id as a foreign key.
This basically takes ages / times out / gets the wrong number / generally doesn't work very well. I either need to figure out a more efficient query, or figure out a better way to store everything.
Any ideas?
The 5 main tables that make up the database
contacts - where the actual contacts reside
Field Type Null Default Comments
c_id int(8) No
site_id int(6) No
c_email varchar(500) No
c_source varchar(255) No
c_subscribed tinyint(1) No 0
c_special tinyint(1) No 0
c_domain text No
c_title varchar(12) No
c_name varchar(128) No
c_surname varchar(128) No
c_company varchar(128) No
c_jtitle text No
c_ad1 text No
c_ad2 text No
c_ad3 text No
c_county varchar(64) No
c_city varchar(128) No
c_postcode varchar(32) No
c_lat varchar(100) No
c_lng varchar(100) No
c_country varchar(64) No
c_tel varchar(20) No
c_mob varchar(20) No
c_dob date No
c_registered datetime No
c_updated datetime No
c_twitter varchar(255) No
c_facebook varchar(255) No
c_tags text No
c_special_1 text No
c_special_2 text No
c_special_3 text No
c_special_4 text No
c_special_5 text No
c_special_6 text No
c_special_7 text No
c_special_8 text No
mgroups - basic mailing group info
Field Type Null Default Comments
mg_id int(8) No
site_id int(6) No
mg_name varchar(255) No
mg_created datetime No
mgroups_criteria - criteria for said mailing groups
Field Type Null Default Comments
mc_id int(8) No
site_id int(6) No
mc_mg_id int(8) No
mc_criteria text No
mgroups_exclude - anyone to exclude from criteria
Field Type Null Default Comments
mex_id int(8) No
site_id int(6) No
mex_c_id int(8) No
mex_mg_id int(8) No
mgroups_explicit - anyone to explicitly add without the use of criteria
Field Type Null Default Comments
me_id int(8) No
site_id int(6) No
me_c_id int(8) No
me_mg_id int(8) No
And the indexes / EXPLAIN output of the query. I must admit, indexes are not my strong point. Any improvements?
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | PRIMARY | mgroups | ALL | PRIMARY,mg_id | NULL | NULL | NULL | 9 | Using temporary; Using filesort
1 | PRIMARY | mgroups_explicit | ref | me_mg_id | me_mg_id | 4 | engine_4.mgroups.mg_id | 8750 |
1 | PRIMARY | contacts | ALL | PRIMARY,c_id | NULL | NULL | NULL | 86012 | Using where; Using join buffer
2 | DEPENDENT SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE noticed after reading const table...
I don't see any indexes in the schema above. You do have indexes, don't you?
Run an EXPLAIN on the query:
EXPLAIN
SELECT COUNT(c_id) FROM
contacts, mgroups LEFT JOIN mgroups_explicit ON mg_id = me_mg_id
WHERE
mgroups.site_id = '10'
AND mg_id = '20'
AND me_c_id = c_id
AND contacts.site_id = '10'
OR (contacts.site_id = '10'
AND ( c_tags LIKE '%tag1%'))
AND c_id NOT IN (SELECT mex_c_id FROM mgroups_exclude WHERE c_id = mex_c_id ) GROUP BY c_id
That will tell you which indexes are being used, how many records it has to sort through, etc.
DC
Right, so I got this answered elsewhere (huge thanks to Hambut_Bulge), so for the sake of it being useful to anyone else, here's the solution:
First things off you're mixing old and new (ANSI) style joins in the same query. This is considered a bad idea in SQL circles. By old style I mean we write a query with a join along these lines
SELECT a.column_name, b.column2
FROM table1 a, second_table b
WHERE a.id_key = b.fid_key
AND b.some_other_criteria = 'Y';
In the newer ANSI style we'd rewrite the above to this:
SELECT a.column_name, b.column2
FROM table1 a INNER JOIN second_table b ON a.id_key = b.fid_key
WHERE b.some_other_criteria = 'Y';
It's neater, and it's easier to see which bits are join conditions and which are where clauses. It's also best to get into the habit of using ANSI style, as old style support may (at some point) be discontinued.
Also try and be consistent in your use of dot notation and/or aliases. Again it makes big queries easier to read.
Back to your problem query. I began by converting it into ANSI style and straight away noticed that you don't have a join condition between contacts and mgroups. This means that the optimizer will create a cross join (also called a cartesian product), which is probably not something you want. The cross join (in case you didn't know) joins every row in the contacts table with every row in the mgroups table. So if you have 50,000 rows in contacts and 20,000 rows in mgroups, you're going to get a joined result set containing 1,000,000,000 rows!
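The row explosion from a missing join condition is easy to reproduce at a smaller scale. The sketch below assumes Python's built-in sqlite3 as a stand-in for MySQL and uses 50 contacts and 20 groups (the figures from the paragraph above, scaled down by a factor of 1,000) with only the key columns.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (c_id INTEGER PRIMARY KEY);
CREATE TABLE mgroups (mg_id INTEGER PRIMARY KEY);
""")
# Scaled-down sample sizes: 50 contacts and 20 groups.
conn.executemany("INSERT INTO contacts VALUES (?)", [(i,) for i in range(1, 51)])
conn.executemany("INSERT INTO mgroups VALUES (?)", [(i,) for i in range(1, 21)])

# Comma join with no join condition: every contact pairs with every group.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM contacts, mgroups").fetchone()[0]

# Filtering on one group does not relate contacts to it: all 50 still pair up.
one_group_count = conn.execute(
    "SELECT COUNT(*) FROM contacts, mgroups WHERE mg_id = 20").fetchone()[0]

print(cross_count, one_group_count)
```

The first count is 50 x 20 = 1000. The second shows that a filter such as mg_id = 20 narrows the groups but still pairs the remaining group with every contact, so the count says nothing about group membership.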
The other thing that is going to slow this query drastically is the subquery on mgroups_exclude. Because it refers to the outer query, it is a correlated subquery, which may be executed once for each row in the outer query, e.g.:
SELECT a.column1
FROM table1 a
WHERE a.id_key NOT IN ( SELECT b.fid_key FROM table2 b WHERE a.id_key = b.fid_key);
Assume that table1 has 2,000,000 rows and table2 has 500,000. In the worst case, with no usable index on table2.fid_key, the database has to do a full scan of the inner table for each and every row in the outer query (table1). So to get a result the database will have read 1,000,000,000,000 rows, and we may only be interested in 1,000!
To get around this we can use a left join (also called a left outer join) on the two tables.
SELECT a.column1
FROM table1 a LEFT JOIN table2 b ON a.id_key = b.fid_key
WHERE b.fid_key IS NULL;
An outer join does not require each record in the joined tables to have a matching record. So in the example above we'd get all the records from table1 even if there is no match in table2. For non-matched records the database returns a NULL, and we can test for that in the where clause. Now the optimizer can use the indexes on the two tables' key fields (assuming there are any), resulting in a much faster query.
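To check that the LEFT JOIN ... IS NULL form returns the same rows as the NOT IN form, here is a minimal sketch. It assumes Python's built-in sqlite3 as a stand-in for MySQL; the table and column names follow the generic example above, and the rows and the index name idx_table2_fid are invented.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id_key INTEGER PRIMARY KEY, column1 TEXT);
CREATE TABLE table2 (fid_key INTEGER);
-- Index on the join column so the anti-join can look up matches directly.
CREATE INDEX idx_table2_fid ON table2 (fid_key);
-- Hypothetical sample data: ids 2 and 4 have matches in table2.
INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
INSERT INTO table2 VALUES (2), (4), (4);
""")

# Correlated NOT IN subquery, as in the slow form.
not_in_rows = conn.execute("""
SELECT a.column1
FROM table1 a
WHERE a.id_key NOT IN (SELECT b.fid_key FROM table2 b
                       WHERE a.id_key = b.fid_key)
ORDER BY a.id_key
""").fetchall()

# Anti-join: keep only table1 rows for which the LEFT JOIN found no match.
anti_join_rows = conn.execute("""
SELECT a.column1
FROM table1 a
LEFT JOIN table2 b ON a.id_key = b.fid_key
WHERE b.fid_key IS NULL
ORDER BY a.id_key
""").fetchall()

print(not_in_rows, anti_join_rows)
```

Both queries return the rows for ids 1 and 3. Duplicate matches in table2 (id 4 appears twice) do not matter for the anti-join, because matched rows are filtered out entirely.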
So, to wrap up, I'd rewrite your original query thus:
SELECT COUNT( a.c_id )
FROM contacts a
INNER JOIN mgroups b ON a.c_id = b.mg_id
LEFT JOIN mgroups_explicit c ON b.mg_id = c.me_mg_id
LEFT JOIN mgroups_exclude d ON a.c_id = d.mex_c_id
WHERE b.mg_id = '20'
AND a.site_id = '10'
AND a.c_tags LIKE '%tag1%'
AND d.mex_c_id IS NULL
GROUP BY c_id;

Using SQL to get the Last Reply on a Post

I am trying to replicate a forum function by getting the last reply of a post.
For clarity, see PHPBB: there are four columns, and the last column is what I like to replicate.
I have my tables created as such:
discussion_id (primary key)
user_id
parent_id
comment
status
pubdate
I was thinking of creating a Link Table that would update for each time the post is replied to.
The link table would be as follow:
discussion_id (primary key)
last_user_id
last_user_update
However, I am hoping that there's an advanced query that achieves this instead. That is, grabbing each Parent Discussion, and finding the last reply in each of those Parent Discussions.
Am I right that there is such a query?
Here is an update.
I am still having a little trouble but I feel like I am almost there.
My current query:
SELECT
`discussion_id`,
`parent_id`,
`user_id` as `last_user_id`,
`user_name` as `last_user_name`
FROM `table1`, `table2`
WHERE `table1`.`id` = `table2`.`user_id`
Results:
discussion_id  parent_id  last_user_id  last_user_name
30             NULL       3             raiku
31             30         2             antu
32             30         1             admin
33             NULL       3             raiku
Adding this:
GROUP BY `parent_id`
Turns it into:
discussion_id  parent_id  last_user_id  last_user_name
32             30         1             admin
33             NULL       3             raiku
But I want it to turn it into:
discussion_id  parent_id  last_user_id  last_user_name
30             NULL       3             raiku
32             30         1             admin
33             NULL       3             raiku
IDs 30 and 33 share the same parent_id (NULL), but each of them is a "starting thread" or "parent post".
They should not be combined. How would I go about grouping while ignoring NULL values?
This query will take the highest (thus assuming latest) discussion per parent_id. Not the neatest solution however ...
select discussion_id, user_id, pubdate
from tablename
where discussion_id in
(
select max(discussion_id)
from tablename
group by parent_id
)
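One possible variation on this query, which is not part of the answer above, is to group by COALESCE(parent_id, discussion_id) so that each starting post forms its own group instead of all NULL parents collapsing into one. The sketch assumes Python's built-in sqlite3 as a stand-in for MySQL, a table named Discussion, and the ids from the question; the pubdate values are invented.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Discussion (
  discussion_id INTEGER PRIMARY KEY, user_id INTEGER,
  parent_id INTEGER, pubdate TEXT);
-- Ids and users follow the question; pubdate values are hypothetical.
INSERT INTO Discussion VALUES
  (30, 3, NULL, '2010-01-01'),
  (31, 2, 30,   '2010-01-02'),
  (32, 1, 30,   '2010-01-03'),
  (33, 3, NULL, '2010-01-04');
""")

# COALESCE maps a starting post (parent_id NULL) to its own id, so every
# thread is one group and MAX(discussion_id) picks its latest post.
latest_rows = conn.execute("""
SELECT discussion_id, user_id
FROM Discussion
WHERE discussion_id IN (
    SELECT MAX(discussion_id)
    FROM Discussion
    GROUP BY COALESCE(parent_id, discussion_id))
ORDER BY discussion_id
""").fetchall()
print(latest_rows)
```

This yields one row per thread: post 32 for thread 30, and post 33 for the thread that has no replies yet. Like the query above, it assumes that a higher discussion_id means a later post.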
You could try something like this:
SELECT parent.discussion_id,
child.discussion_id as last_discussion_id,
child.user_id as last_user_id,
child.pubdate as last_user_update
FROM Discussion parent
INNER JOIN Discussion child ON ( child.parent_id = parent.discussion_id )
LEFT OUTER JOIN Discussion c ON ( c.parent_id = parent.discussion_id AND c.discussion_id > child.discussion_id)
WHERE c.discussion_id IS NULL
The left join to Discussion c will not match when you have the post with the highest id, which should be the row that you want.
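The self-join idea can be tried with the sketch below, which assumes Python's built-in sqlite3 as a stand-in for MySQL and reuses the ids from the question with invented pubdate values. It differs from the query above in two labeled ways: the child join is a LEFT JOIN, so threads without replies still appear, and parent rows are restricted to starting posts with parent.parent_id IS NULL.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Discussion (
  discussion_id INTEGER PRIMARY KEY, user_id INTEGER,
  parent_id INTEGER, pubdate TEXT);
-- Ids and users follow the question; pubdate values are hypothetical.
INSERT INTO Discussion VALUES
  (30, 3, NULL, '2010-01-01'),
  (31, 2, 30,   '2010-01-02'),
  (32, 1, 30,   '2010-01-03'),
  (33, 3, NULL, '2010-01-04');
""")

# For each starting post, keep the reply for which no later reply (c) exists.
last_reply_rows = conn.execute("""
SELECT parent.discussion_id,
       child.discussion_id AS last_discussion_id,
       child.user_id       AS last_user_id
FROM Discussion parent
LEFT JOIN Discussion child
       ON child.parent_id = parent.discussion_id
LEFT JOIN Discussion c
       ON c.parent_id = parent.discussion_id
      AND c.discussion_id > child.discussion_id
WHERE parent.parent_id IS NULL
  AND c.discussion_id IS NULL
ORDER BY parent.discussion_id
""").fetchall()
print(last_reply_rows)
```

Thread 30 reports reply 32 by user 1 as its last reply, while thread 33 has no replies and comes back with NULLs (None in Python) in the reply columns.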
You want GROUP BY. This should work out OK:
SELECT MAX(`pubdate`), `discussion_id`, `user_id` FROM `table` GROUP BY `parent_id`
You'll obviously need to fill in an appropriate WHERE clause and LIMIT as needed.