I could use some expert advice with a MySQL query I'm trying to put together.
What I would like to do:
I'm trying to create a page that will allow users to perform an advanced search across multiple tables.
The 4 tables are:
members, profiles, skills, genre
Members:
*********************************
id | member_id | login | zipcode
*********************************
Profiles:
*********************************************************************
id | member_id | exp | commitment | practice | gigs | availability
*********************************************************************
Skills:
************************************************************************
id | member_id | lead_vocals | background_vocals | guitar | bass| drums
************************************************************************
Genre:
********************************************************************************
id | member_id | alternative | classic_rock | modern_rock | blues | heavy_metal
********************************************************************************
The Skills and Genre columns hold checkbox values: 1 if checked, 0 if not.
The search form would be a series of checkboxes and dropdowns that would allow a user to specify the specific items they want to search for.
What I need help with:
I need help coming up with the best way to put this query together. I've been reading up on Joins, Unions, Sub Queries and Derived tables. I can do some basic queries and get part of the data for example:
SELECT members.member_id FROM members LEFT JOIN skills ON members.member_id = skills.member_id WHERE skills.lead_vocals = 1
However, I just can't seem to wrap my head around putting it all together.
An example of the search criteria would look something like this:
A user fills out the form and wants to search for all members with (members table) zipcode = 11111 OR zipcode = 22222 (profiles table) commitment = ANY, practice = ANY, gigs = 1, availability = ANY (skills table) lead_vocals = 1 and lead_guitar = 1 (genre table) alternative = 1, modern_rock = 1, heavy_metal = 1
Note I already have the logic to calculate the zipcode distance and return a list of zip codes in the range.
At the end of the day the query just needs to return a list of results with member_id and login from the members table that match the criteria.
I'm not looking for somebody to just provide that answer (although I wouldn't mind the answer :)) I learn better by trying to figure it out on my own but I need some help getting started.
Thanks in advance.
The SQL query in the question seems valid; you just need to add the rest of the tables and conditions to get the data you want.
A user fills out the form and wants to search for all members with
(members table) zipcode = 11111 OR zipcode = 22222 (profiles table)
commitment = ANY, practice = ANY, gigs = 1, availability = ANY
(skills table) lead_vocals = 1 and lead_guitar = 1 (genre table)
alternative = 1, modern_rock = 1, heavy_metal = 1
You already put half of the query in your request, except it is written in English and it happens that SQL is just a small subset of English.
The tables you mentioned appear in the FROM clause:
FROM members m
INNER JOIN profiles p USING (member_id)
INNER JOIN skills s USING (member_id)
INNER JOIN genre g USING (member_id)
The conditions appear in the WHERE clause:
WHERE p.gigs = 1
AND s.lead_vocals = 1 AND s.guitar = 1
AND g.alternative = 1 AND g.modern_rock = 1 AND g.heavy_metal = 1
The fields that allow ANY value do not appear in the query; they do not filter the results.
Searching more than one value for zipcode can be done using the IN operator:
AND m.zipcode IN ('11111', '22222')
At the end of the day the query just needs to return a list of results with member_id and login from the members table that match the criteria.
The fields to be returned go in the SELECT clause:
SELECT m.member_id, m.login
Maybe you want to get the list of members in a specific order, for example sorted by their login names:
ORDER BY m.login
... or by some of their skills; put lead vocals in front of the list:
ORDER BY s.lead_vocals DESC
(order DESCending to get those having 1 in front of those having 0 in column lead_vocals)
Now, if we put all together we get the complete query:
SELECT m.member_id, m.login
FROM members m
INNER JOIN profiles p USING (member_id)
INNER JOIN skills s USING (member_id)
INNER JOIN genre g USING (member_id)
WHERE p.gigs = 1
AND s.lead_vocals = 1 AND s.guitar = 1
AND g.alternative = 1 AND g.modern_rock = 1 AND g.heavy_metal = 1
AND m.zipcode IN ('11111', '22222')
ORDER BY s.lead_vocals DESC, m.login
Because the criteria come from user input, you don't know in advance that, for example, practice is allowed to have ANY value. You need to compose the query from pieces, using the data received from the form.
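Composing the query from form data can be sketched roughly as follows. This is a minimal Python illustration run against SQLite instead of MySQL so it is self-contained; the sample rows, the COLUMNS whitelist and the build_search helper are hypothetical. Values go in as bound parameters rather than being pasted into the SQL text, a field set to "ANY" adds no condition, and the zip codes from your distance logic become one IN (...) list.

```python
import sqlite3

# Schema trimmed to the columns this sketch uses (hypothetical sample data below).
SCHEMA = """
CREATE TABLE members  (member_id INTEGER, login TEXT, zipcode TEXT);
CREATE TABLE profiles (member_id INTEGER, commitment INTEGER, practice INTEGER,
                       gigs INTEGER, availability INTEGER);
CREATE TABLE skills   (member_id INTEGER, lead_vocals INTEGER, guitar INTEGER);
CREATE TABLE genre    (member_id INTEGER, alternative INTEGER,
                       modern_rock INTEGER, heavy_metal INTEGER);
"""

# Whitelist of form field -> qualified column. Column names cannot be bound as
# parameters, so only names from this dict ever reach the SQL text.
COLUMNS = {
    "commitment": "p.commitment", "practice": "p.practice",
    "gigs": "p.gigs", "availability": "p.availability",
    "lead_vocals": "s.lead_vocals", "guitar": "s.guitar",
    "alternative": "g.alternative", "modern_rock": "g.modern_rock",
    "heavy_metal": "g.heavy_metal",
}


def build_search(form, zipcodes):
    """Return (sql, params) for the search; 'ANY' fields add no condition."""
    where, params = [], []
    for field, value in form.items():
        if value == "ANY":
            continue  # ANY means "do not filter on this column"
        where.append(f"{COLUMNS[field]} = ?")
        params.append(value)
    if zipcodes:
        where.append("m.zipcode IN (%s)" % ", ".join(["?"] * len(zipcodes)))
        params.extend(zipcodes)
    sql = ("SELECT m.member_id, m.login FROM members m "
           "JOIN profiles p USING (member_id) "
           "JOIN skills s USING (member_id) "
           "JOIN genre g USING (member_id)")
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql + " ORDER BY m.login", params


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO members VALUES (?, ?, ?)",
                 [(1, "ann", "11111"), (2, "bob", "22222"), (3, "cy", "33333")])
conn.executemany("INSERT INTO profiles VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 0, 1, 1), (2, 0, 1, 1, 0), (3, 1, 1, 1, 1)])
conn.executemany("INSERT INTO skills VALUES (?, ?, ?)",
                 [(1, 1, 1), (2, 1, 0), (3, 1, 1)])
conn.executemany("INSERT INTO genre VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 1), (2, 1, 1, 1), (3, 1, 1, 1)])

form = {"commitment": "ANY", "practice": "ANY", "gigs": 1, "availability": "ANY",
        "lead_vocals": 1, "guitar": 1,
        "alternative": 1, "modern_rock": 1, "heavy_metal": 1}
sql, params = build_search(form, ["11111", "22222"])
rows = conn.execute(sql, params).fetchall()
```

Member 2 drops out because guitar is 0, and member 3 drops out because of the zip code. The whitelist is the design choice that matters here: user input only ever selects among known column names and never becomes SQL text itself.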
Try this query (the other tables reference members.member_id, not members.id, and the WHERE conditions must be joined with AND, not commas):
select
m.member_id, m.login
from
members m
join Profiles p on p.member_id = m.member_id
join Skills s on s.member_id = m.member_id
join Genre g on g.member_id = m.member_id
where (m.zipcode = '11111' or m.zipcode = '22222')
and p.gigs = 1
and s.lead_vocals = 1 and s.guitar = 1
and g.alternative = 1 and g.modern_rock = 1 and g.heavy_metal = 1
Related
I have a database with 500k company profiles + locations they provide their services in.
So I have companies table + locations table.
Company can serve in the whole country or only in a city.
Locations table looks like this:
ID | company_id | scope | country_id | city_id
1 | 'companyuuid...' | 'city' | 'UK' | '32321'
2 | 'companyuuid...' | 'country' | 'US' | NULL
When company provides services in the whole country we indicate scope "country" and we have scope "city" when company provides service only within specific city.
Unfortunately, MySQL is pretty slow at processing queries that contain OR conditions, and considering the amount of data we need to work with, queries should be as optimized as possible.
select distinct companies.id from companies
inner join locations on companies.id = locations.company_id
and (locations.scope = 'city' and locations.city_id = '703448' )
order by companies.score desc limit 12 offset 0
My current problem is that when searching for companies within a city, I also need to show companies that provide services within the whole country. The obvious way would be adding an OR condition like this:
select distinct companies.id from companies
inner join locations on companies.id = locations.company_id
and (locations.scope = 'city' and locations.city_id = '703448' )
or (locations.scope = 'country' and locations.country_id = 'UK' )
order by companies.score desc limit 12 offset 0
BUT the problem is that OR statement will make the query extremely slow.
Is there another way, maybe an additional join, so we can keep the query fast?
I would recommend using exists:
select c.id
from companies c
where exists (select 1
from locations l
where l.company_id = c.id and
l.scope = 'city' and
l.city_id = 703448 -- I'm guessing city_id is a number, so no quotes
) or
exists (select 1
from locations l
where l.company_id = c.id and l.scope = 'country' and l.country_id = 'UK'
)
order by c.score desc
limit 12 offset 0;
The exists subqueries can make use of an index on locations(company_id, scope, city_id). The query might even be able to take advantage of an index on companies(score).
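To check what the EXISTS version returns, here is a small SQLite run (SQLite stands in for MySQL; the four companies and their locations are made-up sample rows). It keeps the country_id = 'UK' test in the second EXISTS, so a company covering some other whole country is not returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (id TEXT PRIMARY KEY, score INTEGER);
CREATE TABLE locations (id INTEGER PRIMARY KEY, company_id TEXT, scope TEXT,
                        country_id TEXT, city_id TEXT);
-- the composite index suggested in the answer
CREATE INDEX locations_company_scope_city ON locations (company_id, scope, city_id);
""")
conn.executemany("INSERT INTO companies VALUES (?, ?)",
                 [("a", 90), ("b", 80), ("c", 70), ("d", 60)])
conn.executemany(
    "INSERT INTO locations (company_id, scope, country_id, city_id) VALUES (?, ?, ?, ?)",
    [("a", "city", "UK", "703448"),   # serves the searched city
     ("b", "country", "UK", None),    # serves the whole UK
     ("c", "country", "US", None),    # whole country, but the wrong one
     ("d", "city", "UK", "999")])     # only another UK city

QUERY = """
SELECT c.id
FROM companies c
WHERE EXISTS (SELECT 1 FROM locations l
              WHERE l.company_id = c.id AND l.scope = 'city' AND l.city_id = ?)
   OR EXISTS (SELECT 1 FROM locations l
              WHERE l.company_id = c.id AND l.scope = 'country'
                AND l.country_id = ?)
ORDER BY c.score DESC
LIMIT 12 OFFSET 0
"""
ids = [row[0] for row in conn.execute(QUERY, ("703448", "UK"))]
```

Each company row is tested at most once per EXISTS, so no DISTINCT is needed, unlike the join version.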
Problem 1: OR seems "wrong". Do you want all the cities in the UK, plus all the Londons, including the one in Canada?
You probably want AND instead of OR. And you would need a "self join" to reach into locations twice. EAV schemas make this kind of query painful.
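Problem 2: x AND y OR z is (x AND y) OR z, not x AND (y OR z). In the second query above, that means companies.id = locations.company_id only applies to the city branch, so every company is paired with every location row that has scope 'country' and country_id 'UK'.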
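The precedence problem can be reproduced in a few lines. This sketch uses SQLite, where AND also binds tighter than OR, and three hypothetical companies: 'c' has no location at all, yet the join condition as written still returns it, because the country branch has no join condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (id TEXT PRIMARY KEY);
CREATE TABLE locations (company_id TEXT, scope TEXT, country_id TEXT, city_id TEXT);
INSERT INTO companies VALUES ('a'), ('b'), ('c');
INSERT INTO locations VALUES ('a', 'city', 'UK', '703448'), ('b', 'country', 'UK', NULL);
""")

# As written in the question: parsed as (join AND city) OR (country).
as_written = """
SELECT DISTINCT companies.id FROM companies
INNER JOIN locations ON companies.id = locations.company_id
  AND (locations.scope = 'city' AND locations.city_id = '703448')
  OR (locations.scope = 'country' AND locations.country_id = 'UK')
ORDER BY companies.id
"""
# Intended grouping: join AND (city OR country).
intended = """
SELECT DISTINCT companies.id FROM companies
INNER JOIN locations ON companies.id = locations.company_id
  AND ((locations.scope = 'city' AND locations.city_id = '703448')
    OR (locations.scope = 'country' AND locations.country_id = 'UK'))
ORDER BY companies.id
"""
wrong = [row[0] for row in conn.execute(as_written)]
right = [row[0] for row in conn.execute(intended)]
```

On 500k companies, the unparenthesized form multiplies every company by every UK country-scope row before DISTINCT removes the duplicates, which by itself could explain a good part of the slowness.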
Having looked at various posts, I'm still struggling to write a single query. I'm trying to output a user's progression within a course. Here is my DB setup.
COURSE TABLE:
(CourseID | Product | CourseName | TotalModules | CreationDate)
MODULE TABLE:
(ModuleID | Product | ModuleName | TotalPages | CreationDate)
MODULECOURSEASSOCIATION TABLE:
(CourseID | ModuleID | CreationDate)
User TABLE:
(UserID | StudentEmail | CreationDate)
PROGRESSION TABLE:
(ProgressionID | ModuleID | UserID | PageNumber | CreationDate)
Besides the ProgressionID, the Progression table records which module page the user has seen, and the associated date. What I want to accomplish is this:
Have one total page progression count on a course level.
For example, CourseID 1 might have 3 modules, each with 3 pages, so 9 pages altogether for course 1 in ProductID = 2.
UserID 1 has seen pages 1, 2 and 3 of the first module in ProductID = 2. The progression should then be about 33% (3 of 9 pages). How do I write this in a query?
I tried this:
SELECT count(1) AS TotalModulesCompleted
FROM (select count(PageNumber) AS NumPages,
(select TotalPages
from Module, ModuleCourseAssociation
where Module.ModuleID = ModuleCourseAssociation.ModuleID and
ModuleCourseAssociation.CourseID = 1 and Module.ProductID = 2) AS TotalPages,
(select CourseID from ModuleCourseAssociation
where ModuleCourseAssociation.ModuleID = Progression.ModuleID
and CourseID=8) AS CourseID
FROM Progression WHERE UserID = 11 GROUP by ModuleID)
CourseProgression
WHERE NumPages = TotalPages;
I'm rather lost. Any help is appreciated.
I assume that you require this information per course for one particular student.
You have requested a percentage complete. Therefore you will require two pieces of information from the query:
Total pages in course
Total pages that user has read
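With this in mind, the following query computes both. Progression is joined on ModuleID (joining it on ProgressionID pairs unrelated rows), pages are counted rather than summed (summing PageNumber for pages 1, 2 and 3 gives 6, not 3), and the user's pages are counted per module in a derived table, so that SUM(m.TotalPages) is not multiplied by the number of progression rows and other users' progress cannot filter modules out:
SELECT
 c.CourseID
,c.CourseName
,SUM(m.TotalPages) AS CourseTotalPages
,SUM(COALESCE(up.PagesSeen, 0)) AS UserTotalPages
,100 * SUM(COALESCE(up.PagesSeen, 0)) / SUM(m.TotalPages) AS UserPercentageComplete
FROM `Course` c
JOIN `ModuleCourseAssociation` cm ON c.CourseID = cm.CourseID
JOIN `Module` m ON cm.ModuleID = m.ModuleID
LEFT JOIN (
    -- pages seen per module by one user; DISTINCT ignores repeat visits
    SELECT ModuleID, COUNT(DISTINCT PageNumber) AS PagesSeen
    FROM `Progression`
    WHERE UserID = 11
    GROUP BY ModuleID
) up ON m.ModuleID = up.ModuleID
WHERE c.ProductID = 2
GROUP BY c.CourseID
,c.CourseName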
I have provided an SQLFiddle here.
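The counting logic can be checked against the example from the question (3 modules of 3 pages, user 11 has seen the 3 pages of module 1). This is a rough SQLite sketch, not MySQL. The column name ProductID is an assumption (the question lists it as Product), and the extra progression rows (a repeated page, another user) are made up to show that they do not distort the count. SQLite needs 100.0 to avoid integer division; MySQL's / already returns a decimal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Course (CourseID INTEGER, ProductID INTEGER, CourseName TEXT);
CREATE TABLE Module (ModuleID INTEGER, ProductID INTEGER, TotalPages INTEGER);
CREATE TABLE ModuleCourseAssociation (CourseID INTEGER, ModuleID INTEGER);
CREATE TABLE Progression (ProgressionID INTEGER PRIMARY KEY, ModuleID INTEGER,
                          UserID INTEGER, PageNumber INTEGER);
INSERT INTO Course VALUES (1, 2, 'Course A');
INSERT INTO Module VALUES (1, 2, 3), (2, 2, 3), (3, 2, 3);
INSERT INTO ModuleCourseAssociation VALUES (1, 1), (1, 2), (1, 3);
-- user 11: pages 1-3 of module 1 (page 1 seen twice); user 12: one page of module 2
INSERT INTO Progression (ModuleID, UserID, PageNumber)
VALUES (1, 11, 1), (1, 11, 2), (1, 11, 3), (1, 11, 1), (2, 12, 1);
""")

QUERY = """
SELECT c.CourseID, c.CourseName,
       SUM(m.TotalPages) AS CourseTotalPages,
       SUM(COALESCE(up.PagesSeen, 0)) AS UserTotalPages,
       100.0 * SUM(COALESCE(up.PagesSeen, 0)) / SUM(m.TotalPages) AS UserPercentageComplete
FROM Course c
JOIN ModuleCourseAssociation cm ON c.CourseID = cm.CourseID
JOIN Module m ON cm.ModuleID = m.ModuleID
LEFT JOIN (SELECT ModuleID, COUNT(DISTINCT PageNumber) AS PagesSeen
           FROM Progression
           WHERE UserID = ?
           GROUP BY ModuleID) up ON m.ModuleID = up.ModuleID
WHERE c.ProductID = ?
GROUP BY c.CourseID, c.CourseName
"""
row11 = conn.execute(QUERY, (11, 2)).fetchone()  # parameters: UserID, ProductID
row12 = conn.execute(QUERY, (12, 2)).fetchone()
```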
I've got two tables that are not linked in the standard way (I'm aware this isn't a good way to do it).
Let's say the tables are set up like below:
Table: component
fields: cid, cname, rangeid, company
Table: ranges
fields: rid, rangename, year
While this is quite simple in a relational DB, I'm not too sure of the cleanest way to do this otherwise (remaking the DB is not an option).
The basic query I need is:
select * from component where range.year = '2014' and company = 'xxx'
Any advice would be greatly appreciated.
Is this what you're looking for?
SELECT a.cid, a.cname, a.rangeid, a.company, b.rid, b.rangename, b.year
FROM component a
JOIN ranges b ON
b.rid = a.rangeid
WHERE b.year = 2014
AND a.company = 'xxx'
Result
| CID | CNAME | RANGEID | COMPANY | RID | RANGENAME | YEAR |
------|-----------|---------|---------|-----|-----------|------|
| 1 | Component | 1 | xxx | 1 | Range | 2014 |
If a range from component may not exist in ranges, then use a LEFT JOIN.
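The difference between the two joins shows up once a component points at a missing range. Below is a minimal SQLite sketch with hypothetical rows (SQLite stands in for MySQL). The last two queries show a related trap: with a LEFT JOIN, a year condition in WHERE discards the NULL rows again, so it belongs in the ON clause if unmatched components should be kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ranges (rid INTEGER, rangename TEXT, year INTEGER);
CREATE TABLE component (cid INTEGER, cname TEXT, rangeid INTEGER, company TEXT);
INSERT INTO ranges VALUES (1, 'Range', 2014);
-- component 2 points at a range id that does not exist in ranges
INSERT INTO component VALUES (1, 'Component', 1, 'xxx'), (2, 'Orphan', 99, 'xxx');
""")


def run(join_and_filter):
    """Select (cid, year) for company 'xxx' with the given join/filter text."""
    sql = f"SELECT a.cid, b.year FROM component a {join_and_filter} ORDER BY a.cid"
    return conn.execute(sql).fetchall()


inner = run("JOIN ranges b ON b.rid = a.rangeid WHERE a.company = 'xxx'")
left = run("LEFT JOIN ranges b ON b.rid = a.rangeid WHERE a.company = 'xxx'")
# Year filter in WHERE: the orphan's NULL year fails the test, so it disappears.
left_where = run("LEFT JOIN ranges b ON b.rid = a.rangeid "
                 "WHERE a.company = 'xxx' AND b.year = 2014")
# Year filter in ON: the orphan survives with NULLs.
left_on = run("LEFT JOIN ranges b ON b.rid = a.rangeid AND b.year = 2014 "
              "WHERE a.company = 'xxx'")
```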
JOIN the two tables:
select c.cname, r.rangename, r.year, ...
from component AS c
INNER JOIN ranges AS r ON c.rangeid = r.rid
where r.year = '2014'
and c.company = 'xxx';
Note: you can JOIN any two tables, even if no relation is declared between them; just put the matching condition in the ON clause, as in your case. You do have to make sure the indexes are set up correctly, though; see this page for more information:
Multiple-Column Indexes
JOIN is what you are looking for:
select *
from component
inner join ranges on rid = rangeid and year = 2014
where company = 'xxx'
I have a search page where I am trying to build a complex search condition on two tables which look something like:
Users
ID NAME
1 Paul
2 Remy
...
Profiles
FK_USERS_ID TOPIC TOPIC_ID
1 language 1
1 language 2
1 expertise 1
1 expertise 2
1 expertise 3
2 language 1
2 language 2
The second table, Profiles, lists the "languages" or the "expertises" (among other things) of each user, and TOPIC_ID is a foreign key to another table depending on the topic (if the topic is "language", then TOPIC_ID is the ID of a language in the languages table, etc.).
The search needs to find something like where user name LIKE %PAU% and the user "has" language 1 and has language 2 and has expertise 1 and has expertise 2.
Any help would be really appreciated! I am performing a LEFT JOIN on the two tables although I am not sure that is the correct choice. My main problem lies on the "AND". The same user has to have both languages 1 and 2, and at the same time expertise 1 and 2.
I work in PHP and usually try to avoid inner SELECTs and even joins, but I think an inner SELECT is unavoidable here?
You can accomplish this by building a set of users that match the criteria from your profile table, something like this:
SELECT FK_USERS_ID
FROM Profiles
WHERE topic='x'
AND TOPIC_ID IN (1,2)
GROUP BY FK_USERS_ID
HAVING COUNT(1) = 2
Here you list the users that match the topics you need. By grouping by the user id and requiring a given number of rows per group, you can effectively say "only those that have both 1 and 2 in topic x". Just make sure that the number in COUNT(1) = 2 equals the number of different TOPIC_IDs you are looking for (and that Profiles has no duplicate rows; otherwise use COUNT(DISTINCT TOPIC_ID)).
You can then query the user table
SELECT ID
FROM Users
WHERE name like '%PAU%'
AND ID IN (<insert above query here>)
You can also do it in a join and a derived table, but the essence should be explained above.
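The two steps combined into one query can be checked on the sample rows from the question. This is a small SQLite sketch (SQLite stands in for MySQL; the column name TOPIC_ID follows the answers). As a slight variation, it counts DISTINCT TOPIC_ID so that a duplicated profile row cannot satisfy the HAVING clause on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE Profiles (FK_USERS_ID INTEGER, TOPIC TEXT, TOPIC_ID INTEGER);
INSERT INTO Users VALUES (1, 'Paul'), (2, 'Remy');
INSERT INTO Profiles VALUES
  (1, 'language', 1), (1, 'language', 2),
  (1, 'expertise', 1), (1, 'expertise', 2), (1, 'expertise', 3),
  (2, 'language', 1), (2, 'language', 2);
""")

# Users that have BOTH language 1 and language 2.
HAS_BOTH_LANGUAGES = """
SELECT FK_USERS_ID FROM Profiles
WHERE TOPIC = 'language' AND TOPIC_ID IN (1, 2)
GROUP BY FK_USERS_ID
HAVING COUNT(DISTINCT TOPIC_ID) = 2
"""
both = sorted(r[0] for r in conn.execute(HAS_BOTH_LANGUAGES))

# The same set used as the IN (...) list of the user query.
matches = [r[0] for r in conn.execute(
    f"SELECT ID FROM Users WHERE NAME LIKE '%PAU%' AND ID IN ({HAS_BOTH_LANGUAGES})")]
```

Both users have the two languages, and only Paul survives the name filter. SQLite's LIKE is case-insensitive for ASCII, much like MySQL's default collations.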
EDIT:
If you are looking for multiple combinations, you can use MySQL's multi-column (row constructor) IN:
SELECT FK_USERS_ID
FROM Profiles
WHERE (topic,topic_id) IN (('x',3),('x',5),('y',3),('y',6))
GROUP BY FK_USERS_ID
HAVING COUNT(1) = 4
This will look for users matching the pairs x-3, x-5, y-3 and y-6.
You should be able to build the topic/topic_id pairs easily in PHP, put them into the SQL string, and count the number of pairs you generate into a variable to use as the COUNT(1) number. See http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/ for a discussion of the performance of this approach.
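Building the pair list and the matching count in application code might look roughly like this. It is written in Python rather than PHP and runs on SQLite, where the right-hand side of a row-value IN must be a subquery, hence IN (VALUES ...) instead of MySQL's plain list. The helper users_having_all and the sample rows are hypothetical. Only "?" placeholders go into the SQL text, and the HAVING count is derived from the same de-duplicated pair list, so the two cannot drift apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE Profiles (FK_USERS_ID INTEGER, TOPIC TEXT, TOPIC_ID INTEGER);
INSERT INTO Users VALUES (1, 'Paul'), (2, 'Remy');
INSERT INTO Profiles VALUES
  (1, 'language', 1), (1, 'language', 2),
  (1, 'expertise', 1), (1, 'expertise', 2), (1, 'expertise', 3),
  (2, 'language', 1), (2, 'language', 2);
""")


def users_having_all(conn, name_pattern, pairs):
    """IDs of users whose NAME matches name_pattern and who have every
    (topic, topic_id) pair. Assumes Profiles holds no duplicate rows."""
    wanted = sorted(set(pairs))  # repeated pairs would break the count
    if not wanted:
        raise ValueError("at least one (topic, topic_id) pair is required")
    placeholders = ", ".join(["(?, ?)"] * len(wanted))
    sql = f"""
        SELECT ID FROM Users
        WHERE NAME LIKE ?
          AND ID IN (SELECT FK_USERS_ID FROM Profiles
                     WHERE (TOPIC, TOPIC_ID) IN (VALUES {placeholders})
                     GROUP BY FK_USERS_ID
                     HAVING COUNT(1) = ?)
        ORDER BY ID
    """
    params = [name_pattern]
    for topic, topic_id in wanted:
        params += [topic, topic_id]
    params.append(len(wanted))  # HAVING count = number of distinct pairs
    return [row[0] for row in conn.execute(sql, params)]


all_four = [("language", 1), ("language", 2), ("expertise", 1), ("expertise", 2)]
paul = users_having_all(conn, "%PAU%", all_four)
both_languages = users_having_all(conn, "%", [("language", 1), ("language", 2)])
with_expertise_3 = users_having_all(conn, "%", [("expertise", 3), ("language", 1)])
```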
Isn't it just a simple classical INNER JOIN?
SELECT
p.topic, p.topic_id
FROM
profiles p
INNER JOIN
users u
ON
u.id = p.fk_users_id
WHERE
u.name LIKE '%Paul%'
This query would return all the languages and expertise with their IDs for the users matching the pattern, in this case those with Paul in their name. Is this what you want, or something else?
select u.*
from users u
where exists (select 1
from profiles
where fk_users_id = u.id
and topic = 'language'
and topic_id = 1)
and exists (select 1
from profiles
where fk_users_id = u.id
and topic = 'language'
and topic_id = 2)
and exists (select 1
from profiles
where fk_users_id = u.id
and topic = 'expertise'
and topic_id = 1)
and exists (select 1
from profiles
where fk_users_id = u.id
and topic = 'expertise'
and topic_id = 2)
and u.name like '%PAU%'
EDIT:
OK, a slight variation on @cairnz's answer:
SELECT ID
FROM Users
WHERE name like '%PAU%'
AND ID IN (SELECT FK_USERS_ID
FROM Profiles
WHERE ((TOPIC_ID = 1 AND TOPIC = 'language')
OR (TOPIC_ID = 2 AND TOPIC = 'language')
OR (TOPIC_ID = 1 AND TOPIC = 'expertise')
OR (TOPIC_ID = 2 AND TOPIC = 'expertise'))
GROUP BY FK_USERS_ID
HAVING COUNT(1) = 4)
I would JOIN the Profiles table once for each condition you are "requiring". I would also make sure there is an index on the Profiles table covering each part of the key being looked up: (FK_USERS_ID, Topic_ID, Topic).
SELECT STRAIGHT_JOIN
U.ID
FROM Users U
JOIN Profiles P1
on U.ID = P1.FK_USERS_ID
AND P1.Topic_Id = 1
AND P1.Topic = 'language'
JOIN Profiles P2
on U.ID = P2.FK_USERS_ID
AND P2.Topic_Id = 2
AND P2.Topic = 'language'
JOIN Profiles P3
on U.ID = P3.FK_USERS_ID
AND P3.Topic_Id = 1
AND P3.Topic = 'expertise'
JOIN Profiles P4
on U.ID = P4.FK_USERS_ID
AND P4.Topic_Id = 2
AND P4.Topic = 'expertise'
WHERE
U.name like '%PAU%'
This way, additional criteria like those in the other answers shouldn't have much of an impact. The joins are evaluated as if simultaneously, and if any required row is missing, the user is excluded from the result immediately, instead of running a counting sub-select for every entry (which I think might be the lag you are encountering).
So each of your "required" criteria takes the same JOIN construct; as you can see, I'm just incrementing the alias of each join instance.
Using the tables below as an example and the listed query as a base query, I want to add a way to select only rows with a max id! Without having to do a second query!
TABLE VEHICLES
id vehicleName
----- --------
1 cool car
2 cool car
3 cool bus
4 cool bus
5 cool bus
6 car
7 truck
8 motorcycle
9 scooter
10 scooter
11 bus
TABLE VEHICLE NAMES
nameId vehicleName
------ -------
1 cool car
2 cool bus
3 car
4 truck
5 motorcycle
6 scooter
7 bus
TABLE VEHICLE ATTRIBUTES
nameId attribute
------ ---------
1 FAST
1 SMALL
1 SHINY
2 BIG
2 SLOW
3 EXPENSIVE
4 SHINY
5 FAST
5 SMALL
6 SHINY
6 SMALL
7 SMALL
And the base query:
select a.*
from vehicle a
join vehicle_names b using(vehicleName)
join vehicle_attribs c using(nameId)
where c.attribute in('SMALL', 'SHINY')
and a.vehicleName like '%coo%'
group
by a.id
having count(distinct c.attribute) = 2;
So what I want to achieve is to select rows that have certain attributes and match a name, but only one entry per matching name: the one with the highest id!
So a working solution in this example would return the below rows:
id vehicleName
----- --------
2 cool car
10 scooter
(that is, using some sort of MAX on the id). At the moment I get all the entries for cool car and scooter.
My real-world database follows a similar structure and has tens of thousands of entries, so a query like the one above could easily return 3000+ results. I limit the results to 100 rows to keep execution time low, as the results are used in a search on my site. The reason I have repeats of "vehicles" with the same name but a different ID is that new models are constantly added, but I keep the older ones around for those who want to dig them up. But on a search by car name I don't want to return the older cars, just the newest one, which is the one with the highest ID!
The correct answer would adapt the query I provided above that I'm currently using and have it only return rows where the name matches but has the highest id!
If this isn't possible, suggestions on how I can achieve what I want without massively increasing the execution time of a search would be appreciated!
If you want to keep your logic, here is what I would do:
select a.*
from vehicle a
left join vehicle a2 on (a.vehicleName = a2.vehicleName and a.id < a2.id)
join vehicle_names b on (a.vehicleName = b.vehicleName)
join vehicle_attribs c using(nameId)
where c.attribute in('SMALL', 'SHINY')
and a.vehicleName like '%coo%'
and a2.id is null
group by a.id
having count(distinct c.attribute) = 2;
Which yields:
+----+-------------+
| id | vehicleName |
+----+-------------+
| 2 | cool car |
| 10 | scooter |
+----+-------------+
2 rows in set (0.00 sec)
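The anti-join can be reproduced with the sample tables from the question. This is a SQLite sketch standing in for MySQL, using explicit ON clauses instead of USING. The only change from the base query is the LEFT JOIN to a2 plus the a2.id IS NULL filter, which keeps a row only when no row with the same name has a larger id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vehicle (id INTEGER PRIMARY KEY, vehicleName TEXT);
CREATE TABLE vehicle_names (nameId INTEGER PRIMARY KEY, vehicleName TEXT);
CREATE TABLE vehicle_attribs (nameId INTEGER, attribute TEXT);
""")
conn.executemany("INSERT INTO vehicle VALUES (?, ?)", enumerate(
    ["cool car", "cool car", "cool bus", "cool bus", "cool bus", "car",
     "truck", "motorcycle", "scooter", "scooter", "bus"], start=1))
conn.executemany("INSERT INTO vehicle_names VALUES (?, ?)", enumerate(
    ["cool car", "cool bus", "car", "truck", "motorcycle", "scooter", "bus"], start=1))
conn.executemany("INSERT INTO vehicle_attribs VALUES (?, ?)", [
    (1, "FAST"), (1, "SMALL"), (1, "SHINY"), (2, "BIG"), (2, "SLOW"),
    (3, "EXPENSIVE"), (4, "SHINY"), (5, "FAST"), (5, "SMALL"),
    (6, "SHINY"), (6, "SMALL"), (7, "SMALL")])

ATTRIBUTE_SEARCH = """
SELECT a.id, a.vehicleName
FROM vehicle a
{anti_join}
JOIN vehicle_names b ON a.vehicleName = b.vehicleName
JOIN vehicle_attribs c ON b.nameId = c.nameId
WHERE c.attribute IN ('SMALL', 'SHINY')
  AND a.vehicleName LIKE '%coo%'
  {anti_filter}
GROUP BY a.id, a.vehicleName
HAVING COUNT(DISTINCT c.attribute) = 2
ORDER BY a.id
"""
# Base query from the question: every matching id.
base = conn.execute(ATTRIBUTE_SEARCH.format(anti_join="", anti_filter="")).fetchall()
# With the self anti-join: only the highest id per name.
latest = conn.execute(ATTRIBUTE_SEARCH.format(
    anti_join="LEFT JOIN vehicle a2 ON a.vehicleName = a2.vehicleName AND a.id < a2.id",
    anti_filter="AND a2.id IS NULL")).fetchall()
```

An index on vehicle(vehicleName, id) lets the a2 lookup stop at the first larger id, which matters on tens of thousands of rows.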
As others have said, normalization could be done on a few levels:
Keeping your current vehicle_names table as the primary lookup table, I would change:
update vehicle a
inner join vehicle_names b using (vehicleName)
set a.vehicleName = b.nameId;
alter table vehicle change column vehicleName nameId int;
create table attribs (
attribId int auto_increment primary key,
attribute varchar(20),
unique key attribute (attribute)
);
insert into attribs (attribute)
select distinct attribute from vehicle_attribs;
update vehicle_attribs a
inner join attribs b using (attribute)
set a.attribute=b.attribId;
alter table vehicle_attribs change column attribute attribId int;
This leads to the following query:
select a.id, b.vehicleName
from vehicle a
left join vehicle a2 on (a.nameId = a2.nameId and a.id < a2.id)
join vehicle_names b on (a.nameId = b.nameId)
join vehicle_attribs c on (a.nameId=c.nameId)
inner join attribs d using (attribId)
where d.attribute in ('SMALL', 'SHINY')
and b.vehicleName like '%coo%'
and a2.id is null
group by a.id
having count(distinct d.attribute) = 2;
The table does not seem normalized; however, this lets you do the following:
select max(id), vehicleName
from VEHICLES
group by vehicleName
having count(*)>=2;
I'm not sure I completely understand your model, but the following query satisfies your requirements as they stand. The first sub query finds the latest version of the vehicle. The second query satisfies your "and" condition. Then I just join the queries on vehiclename (which is the key?).
select a.id
,a.vehiclename
from (select a.vehicleName, max(id) as id
from vehicle a
where vehicleName like '%coo%'
group by vehicleName
) as a
join (select b.vehiclename
from vehicle_names b
join vehicle_attribs c using(nameId)
where c.attribute in('SMALL', 'SHINY')
group by b.vehiclename
having count(distinct c.attribute) = 2
) as b on (a.vehicleName = b.vehicleName);
If this "latest vehicle" logic is something you will need to do a lot, a small suggestion would be to create a view (see below) which returns the latest version of each vehicle. Then you could use the view instead of the find-max-query. Note that this is purely for ease-of-use, it offers no performance benefits.
create view latest_vehicle as
select *
from vehicle a
where id = (select max(b.id)
from vehicle b
where a.vehiclename = b.vehiclename);
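The view idea can be tried out quickly in SQLite (standing in for MySQL; the view name latest_vehicle is hypothetical). Once the view exists, searches select from it as if it were a table that contains only the newest row per name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vehicle (id INTEGER PRIMARY KEY, vehicleName TEXT);
-- one row per vehicle name: the one with the highest id
CREATE VIEW latest_vehicle AS
SELECT *
FROM vehicle a
WHERE id = (SELECT MAX(b.id)
            FROM vehicle b
            WHERE a.vehicleName = b.vehicleName);
""")
names = ["cool car", "cool car", "cool bus", "cool bus", "cool bus", "car",
         "truck", "motorcycle", "scooter", "scooter", "bus"]
conn.executemany("INSERT INTO vehicle VALUES (?, ?)", enumerate(names, start=1))

latest_ids = [r[0] for r in conn.execute("SELECT id FROM latest_vehicle ORDER BY id")]
coo_ids = [r[0] for r in conn.execute(
    "SELECT id FROM latest_vehicle WHERE vehicleName LIKE '%coo%' ORDER BY id")]
```

As the answer says, the view only saves typing: the correlated MAX subquery still runs for every row read through it.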
Without going into a proper redesign of your model, you could:
1) Add a column IsLatest that your application could manage.
This is not perfect, but it will answer your question (until the next problem; see the note at the end).
All you need to do when adding a new entry is to issue queries such as
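UPDATE vehicle
SET IsLatest = 0
WHERE IsLatest = 1 AND vehicleName = @name;
INSERT INTO vehicle (vehicleName) VALUES (@name);
UPDATE vehicle
SET IsLatest = 1
WHERE id = LAST_INSERT_ID();
in a transaction (a MySQL trigger cannot update other rows of the table it is defined on, so the first UPDATE cannot be moved into a trigger on vehicle).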
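Here is a rough sketch of maintaining the IsLatest flag inside one transaction, in Python with SQLite rather than MySQL. The helper add_vehicle is hypothetical, and it sets IsLatest = 1 directly in the INSERT, a small simplification of the three-statement version. The reset is limited to rows with the same name; otherwise every other vehicle would lose its flag too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicle (id INTEGER PRIMARY KEY, vehicleName TEXT, "
             "IsLatest INTEGER NOT NULL DEFAULT 0)")


def add_vehicle(conn, name):
    """Insert a new model and make it the only IsLatest row for its name."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("UPDATE vehicle SET IsLatest = 0 "
                     "WHERE IsLatest = 1 AND vehicleName = ?", (name,))
        cur = conn.execute("INSERT INTO vehicle (vehicleName, IsLatest) "
                           "VALUES (?, 1)", (name,))
        return cur.lastrowid


for name in ["cool car", "cool car", "scooter"]:
    add_vehicle(conn, name)
newest = add_vehicle(conn, "cool car")
flagged = conn.execute(
    "SELECT id, vehicleName FROM vehicle WHERE IsLatest = 1 ORDER BY id").fetchall()
```

Searches can then filter with WHERE IsLatest = 1, ideally backed by an index that includes IsLatest.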
2) Alternatively you can find out the max_id before you issue your query
SELECT MAX(id)
FROM vehicle
WHERE vehicleName = @name
3) You can do it in a single query; provided there is an index on vehicle(vehicleName, id), it should actually have decent speed:
select a.*
from vehicle a
join vehicle_names b ON a.vehicleName = b.vehicleName
join vehicle_attribs c ON b.nameId = c.nameId AND c.attribute = 'SMALL'
join vehicle_attribs d ON b.nameId = d.nameId AND d.attribute = 'SHINY'
left join vehicle notmax ON a.vehicleName = notmax.vehicleName AND a.id < notmax.id
where a.vehicleName like '%coo%'
AND notmax.id IS NULL
I have replaced your GROUP BY and HAVING with one join per required attribute (this assumes a given attribute appears at most once per nameId; otherwise the joins return duplicate rows).
I have also used one of the ways to find the max per group: left join the table to itself and keep only the rows for which no row with the same name and a bigger id exists.
There are other ways; search SO for 'max per group sql'.