I want to store data like the following in MySQL, with a unique constraint on (user_id, lids):
recordid user_id lids length breadth
------------------------------------------------------------
1 1 l1,l2 10 5
2 1 l1 7 5
3 1 l1,l3,l2 10 10
4 1 l2,l3 25 15
My query patterns are:
Give me length & breadth where lids are l2,l1
Give me length & breadth where lids are l2,l3
Basically, the input of lids can come in any order to search, still it should provide the correct length, breadth.
I know we should not store comma-separated values in an RDBMS.
Question - How should I structure the DB to have unique user_id/lids combinations that return the correct length & breadth without much string manipulation?
I came up with a solution to query the DB like this -
select * from table1 where find_in_set('l2', lids) AND find_in_set('l1', lids);
then, in code, check that the row has exactly 2 lids. But it is not a perfect solution, and I need guidance on it.
AddOn - A SpringBoot + JPA (Hibernate) specific solution will be great, where there is no requirement of writing native sql query
As per comments if I create a table for lids -
recordid(fk) lid
----------------------------------
1 l1
1 l2
2 l1
3 l1
3 l3
3 l2
4 l2
4 l3
Then how will I ensure that just 1 unique combination of lids should be available for the user?
and what will be my select query? Will it be like the following?
select * from table1, lids where table1.recordid = lids.recordid and lid IN ('l2','l3');
The IN operator will run an OR query instead of AND, which will give wrong results as well.
Do I have to group based on the recordid in lids table then apply where condition? Apologies, I'm totally confused as I have read many articles related to it and got distracted.
Okay the question basically drills down to this - How to find if a list/set is exactly within another list
I want to find recordid having EXACT list of lids to search.
tl;dr: for design problems like this, think about entities, relationships, and sets.
You have two entities, records and lids. They have a many-to-many relationship.
Let's call your second table records_lids, to show that it's a many-to-many association table between records and lids. It has two columns, record_id and lid. When a row exists in that table it means that the record_id mentioned has the lid mentioned.
That table's primary key should be made of both its columns (record_id, lid). Because primary keys are unique, this prevents any record from having the same lid more than once.
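As a minimal sketch of that key, the snippet below uses Python's built-in sqlite3 module with an in-memory SQLite database standing in for MySQL (an assumption made only so the example is self-contained). It shows the composite primary key rejecting a repeated (record_id, lid) pair. Note that this key alone does not stop two different records from carrying the same set of lids.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records_lids ("
    " record_id INTEGER NOT NULL,"
    " lid TEXT NOT NULL,"
    " PRIMARY KEY (record_id, lid))"
)
conn.execute("INSERT INTO records_lids VALUES (1, 'l1')")
conn.execute("INSERT INTO records_lids VALUES (1, 'l2')")

# A second (1, 'l1') row violates the composite primary key.
try:
    conn.execute("INSERT INTO records_lids VALUES (1, 'l1')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM records_lids").fetchone()[0]
print(duplicate_rejected, row_count)
```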
Now, finding the set of record_id values with lid l1 is easy. You don't even need your first table.
SELECT record_id FROM records_lids WHERE lid = 'l1'
To find records with multiple lids, you need to take the logical intersection of the sets of records with each lid. You can do that like this: (https://www.db-fiddle.com/f/cLf4b6LDwMH9eFRTTheZJr/0)
SELECT record_id
FROM (SELECT record_id FROM records_lids WHERE lid = 'l1') l1
NATURAL JOIN (SELECT record_id FROM records_lids WHERE lid = 'l2') l2
NATURAL JOIN (SELECT record_id FROM records_lids WHERE lid = 'l3') l3
The NATURAL JOIN operations handle the intersection; the result only includes rows with matching record_id values. (Some other SQL database servers have had the INTERSECT operator for a long time; MySQL only gained it in version 8.0.31.)
You can also do it this way (https://www.db-fiddle.com/f/cLf4b6LDwMH9eFRTTheZJr/1).
SELECT record_id
FROM records_lids
WHERE lid IN ('l1','l2','l3')
GROUP BY record_id
HAVING COUNT(*) = 3
The HAVING clause is how you insist you want records with all three lids.
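A sketch of the same GROUP BY / HAVING idea with the list of lids passed in as parameters, so the order of the input does not matter. It uses Python's sqlite3 with an in-memory database in place of MySQL (an assumption), the helper name records_with_all is hypothetical, and the rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records_lids (record_id INTEGER, lid TEXT, "
    "PRIMARY KEY (record_id, lid))"
)
conn.executemany(
    "INSERT INTO records_lids VALUES (?, ?)",
    [(1, "l1"), (1, "l2"), (2, "l1"), (3, "l1"),
     (3, "l3"), (3, "l2"), (4, "l2"), (4, "l3")],
)

def records_with_all(conn, lids):
    """Return record_ids that have every lid in `lids` (hypothetical helper)."""
    wanted = sorted(set(lids))  # de-duplicate so the count comparison is reliable
    placeholders = ", ".join("?" for _ in wanted)
    sql = (
        "SELECT record_id FROM records_lids "
        f"WHERE lid IN ({placeholders}) "
        "GROUP BY record_id HAVING COUNT(*) = ? "
        "ORDER BY record_id"
    )
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]

print(records_with_all(conn, ["l2", "l1"]))
print(records_with_all(conn, ["l1", "l2", "l3"]))
```

Records that carry extra lids still qualify here: record 3 has l1, l2 and l3, so it is returned for the input l2, l1 as well.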
Once you have the set of record_ids, you can join that to your other table. (https://www.db-fiddle.com/f/cLf4b6LDwMH9eFRTTheZJr/2)
SELECT records.*
FROM (SELECT record_id FROM records_lids WHERE lid = 'l1') l1
NATURAL JOIN (SELECT record_id FROM records_lids WHERE lid = 'l2') l2
NATURAL JOIN (SELECT record_id FROM records_lids WHERE lid = 'l3') l3
NATURAL JOIN records
or (https://www.db-fiddle.com/f/cLf4b6LDwMH9eFRTTheZJr/3)
SELECT *
FROM records
WHERE record_id IN (
SELECT record_id
FROM records_lids
WHERE lid IN ('l1','l2','l3')
GROUP BY record_id
HAVING COUNT(*) = 3
)
Edit: I did not completely understand your question. You want to exclude records without an *exactly* matching set of lids. Try this (https://www.db-fiddle.com/f/cLf4b6LDwMH9eFRTTheZJr/4). It depends on a quirk of MySQL, which is that Boolean expressions like lid IN ('l1', 'l2') have the value 0 when false and 1 when true.
SELECT *
FROM records
WHERE record_id IN (
SELECT record_id
FROM records_lids
GROUP BY record_id
HAVING SUM(lid IN ('l1', 'l2')) = 2
AND COUNT(*) = 2
)
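Here is a sketch of the exact-match check run against the rows from the question. It uses Python's sqlite3 with an in-memory database instead of MySQL (an assumption; SQLite also evaluates lid IN (...) to 0 or 1, so the same trick applies). The helper name records_with_exactly is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records_lids (record_id INTEGER, lid TEXT, "
    "PRIMARY KEY (record_id, lid))"
)
conn.executemany(
    "INSERT INTO records_lids VALUES (?, ?)",
    [(1, "l1"), (1, "l2"), (2, "l1"), (3, "l1"),
     (3, "l3"), (3, "l2"), (4, "l2"), (4, "l3")],
)

def records_with_exactly(conn, lids):
    """Return record_ids whose set of lids equals `lids` (hypothetical helper)."""
    wanted = sorted(set(lids))
    n = len(wanted)
    placeholders = ", ".join("?" for _ in wanted)
    sql = (
        "SELECT record_id FROM records_lids "
        "GROUP BY record_id "
        f"HAVING SUM(lid IN ({placeholders})) = ? "  # every wanted lid is present
        "AND COUNT(*) = ? "                          # and no other lid is
        "ORDER BY record_id"
    )
    return [row[0] for row in conn.execute(sql, (*wanted, n, n))]

print(records_with_exactly(conn, ["l2", "l1"]))
print(records_with_exactly(conn, ["l2", "l3"]))
```

Record 3 holds l1, l2 and l3, so its COUNT(*) is 3 and it is left out of both results.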
SQL is, at its heart, a language for manipulating sets. The design technique here is:
1) figure out your entities
2) work out the relationships between them
3) work out how to get the sets of entities you require
4) retrieve the rows you need matching the sets
I have 2 tables:
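Table 1: users
id name faculty_id level_id
------------------------------------------------------------
1 john 1 1
2 mark 1 1
3 sam 1 2
Table 2: subjects
id title faculty_id
------------------------------------------------------------
1 physics 1
2 chemistry 1
3 english 2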
SQL query:
SELECT count(subjects.id) FROM users INNER JOIN subjects ON users.faculty_id = subjects.faculty_id WHERE users.level_id = 1
I'm trying to get the count of subjects where users.level_id = 1, which should be 2 in this case: physics and chemistry.
But it's returning more than 2.
Why is that and how to get only 2?
I would recommend exists:
SELECT COUNT(*)
FROM subjects s
WHERE EXISTS (SELECT 1
FROM users u
WHERE u.faculty_id = s.faculty_id AND
u.level_id = 1
);
This counts subjects where a user exists with a level of 1.
You are joining users and subjects on faculty_id; this produces every combination of user and subject rows (2 users and 2 subjects makes 4 combined rows); change your query to SELECT users.*, subjects.* FROM... to see how this works.
count(subjects.id) counts the number of non-null subjects.id values in your results; you can just do count(distinct subjects.id).
The two tables are not directly related as none is parent to the other. The faculty table is parent to both tables and this is what relates the two tables indirectly.
When joining the faculties' students with the faculties' subjects per faculty, you get all combinations (john|physics, mark|physics, sam|physics, john|chemistry, mark|chemistry, ...). Whether John really has the subject Physics cannot even be gathered from the database. We see that John studies in a faculty containing the subjects Physics and Chemistry, but does every student have every subject belonging to their faculty? You probably know, but we don't. That shows that in order to write proper queries, one should know their database :-)
Now you are joining the tables and get all students per faculty multiplied with all subjects per faculty. You limit this to level_id = 1, which gets you 2 students x 2 subjects = 4. You could use COUNT(*) for this, because you are counting rows. By applying COUNT(subjects.id) instead you are only counting rows for which the subject ID is not null, but that is true for all rows, because all four combined rows have either subject ID 1 (Physics) or 2 (Chemistry). Counting something that cannot be null makes no sense, except for counting distinct, as has already been suggested. You can COUNT(DISTINCT subjects.id) to get the distinct number of subjects matching your conditions.
This, however, has two drawbacks. First, the query doesn't clearly show your intention. Why do you join all students with all subjects, when you are not really interested in the (four) combinations? Secondly, you are building an unnecessary intermediate result (four rows in your small example) that must be searched for duplicates, so these can be removed from the counting. That means more memory consumed and more work for the DBMS.
What you want to count is subjects. So select from the subjects table. Your condition is that a student exists with level 1 for the same faculty. Conditions belong in the WHERE clause. Use EXISTS as Gordon suggests in his answer or use IN which is slightly shorter to write and may hence be considered a tad more readable (but that boils down to personal preference, as EXISTS and IN express exactly the same thing here).
select count(*)
from subjects
where faculty_id in (select faculty_id from users where level_id = 1);
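To make the row multiplication concrete, here is a small sketch that loads the sample users and subjects rows into an in-memory SQLite database through Python's sqlite3 (SQLite stands in for the real DBMS, which is an assumption) and compares the four ways of counting discussed in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT,
                    faculty_id INTEGER, level_id INTEGER);
CREATE TABLE subjects (id INTEGER PRIMARY KEY, title TEXT, faculty_id INTEGER);
INSERT INTO users VALUES (1, 'john', 1, 1), (2, 'mark', 1, 1), (3, 'sam', 1, 2);
INSERT INTO subjects VALUES (1, 'physics', 1), (2, 'chemistry', 1), (3, 'english', 2);
""")

def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]

join_from = ("FROM users INNER JOIN subjects "
             "ON users.faculty_id = subjects.faculty_id "
             "WHERE users.level_id = 1")

# The original query counts joined rows: 2 users x 2 subjects.
joined = scalar("SELECT COUNT(subjects.id) " + join_from)
# DISTINCT collapses the duplicates again.
distinct = scalar("SELECT COUNT(DISTINCT subjects.id) " + join_from)
# EXISTS and IN never build the multiplied rows in the first place.
exists = scalar(
    "SELECT COUNT(*) FROM subjects s WHERE EXISTS ("
    "SELECT 1 FROM users u "
    "WHERE u.faculty_id = s.faculty_id AND u.level_id = 1)"
)
in_list = scalar(
    "SELECT COUNT(*) FROM subjects WHERE faculty_id IN ("
    "SELECT faculty_id FROM users WHERE level_id = 1)"
)
print(joined, distinct, exists, in_list)
```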
You can just add "distinct" before subjects.id
Your SQL query would then look like:
SELECT count(distinct subjects.id) FROM users INNER JOIN subjects ON users.faculty_id = subjects.faculty_id WHERE users.level_id = 1
You want to count by level_id, but your code counts subjects.id. I would suggest joining the two tables first.
SELECT users.name, users.level_id,
subjects.title
FROM users
INNER JOIN subjects ON
users.faculty_id = subjects.faculty_id
After joining the tables, you can get the count by using that join as a derived table named new_table. Note that WHERE has to come before GROUP BY.
SELECT level_id, COUNT(level_id)
FROM (
SELECT users.level_id
FROM users
INNER JOIN subjects ON users.faculty_id = subjects.faculty_id
) AS new_table
WHERE level_id = 1
GROUP BY level_id
(You have not mentioned group by in your code.)
I have two tables from two different databases, and both contain lastName and firstName columns. I need to create a JOIN relationship between the two. The lastName columns match about 80% of the time, while the firstName columns match only about 20% of the time. And each table has totally different personID primary keys.
Generally speaking, what would be some "best practices" and/or tips to use when I add a foreign key to one of the tables? Since I have about 4,000 distinct persons, any labor-saving tips would be greatly appreciated.
Sample mismatched data:
db1.table1_____________________ db2.table2_____________________
23 Williams      Fritz          98 Williams  Frederick
25 Wilson-Smith  James          12 Smith     James Wilson
26 Winston       Trudy          73 Winston   Gertrude
Keep in mind: sometimes they match exactly, often they don't, and sometimes two different people will have the same first/last name.
You can join on multiple fields.
select *
from table1
inner join table2
on table1.firstName = table2.firstName
and table1.lastName = table2.lastName
From this you can determine how many 'duplicate' firstname / last name combos there are.
select table1.firstName, table2.lastName, count(*)
from table1
inner join table2
on table1.firstName = table2.firstName
and table1.lastName = table2.lastName
group by table1.firstName, table2.lastName
having count(*) > 1
Conversely, you can also determine the ones which match identically, and only once:
select table1.firstName, table2.lastName
from table1
inner join table2
on table1.firstName = table2.firstName
and table1.lastName = table2.lastName
group by table1.firstName, table2.lastName
having count(*) = 1
And this last query could be the basis for performing the bulk of your foreign key updates.
Names that match more than once between the tables will likely need some sort of manual intervention, unless there are other fields in the tables that can be used to differentiate them.
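A rough sketch of how the "matches exactly once" query could drive the foreign key update. It uses Python's sqlite3 with an in-memory database (an assumption; both tables live in one database here for simplicity). The column table2_personID is a hypothetical name for the new foreign key, and the Adams and Baker rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (personID INTEGER PRIMARY KEY, lastName TEXT, firstName TEXT,
                     table2_personID INTEGER);  -- hypothetical foreign key column
CREATE TABLE table2 (personID INTEGER PRIMARY KEY, lastName TEXT, firstName TEXT);
-- The Adams and Baker rows are made up for illustration.
INSERT INTO table1 (personID, lastName, firstName) VALUES
    (23, 'Williams', 'Fritz'), (30, 'Adams', 'Ann'),
    (31, 'Baker', 'Bob'), (32, 'Baker', 'Bob');
INSERT INTO table2 VALUES
    (98, 'Williams', 'Frederick'), (40, 'Adams', 'Ann'), (41, 'Baker', 'Bob');
""")

# Name pairs that match exactly once across the two tables.
unique_matches = conn.execute("""
    SELECT MIN(t1.personID), MIN(t2.personID)
    FROM table1 t1
    JOIN table2 t2
      ON t1.firstName = t2.firstName AND t1.lastName = t2.lastName
    GROUP BY t1.firstName, t1.lastName
    HAVING COUNT(*) = 1
""").fetchall()

# Fill the foreign key only for those unambiguous matches.
conn.executemany(
    "UPDATE table1 SET table2_personID = ? WHERE personID = ?",
    [(t2_id, t1_id) for t1_id, t2_id in unique_matches],
)

linked = conn.execute(
    "SELECT personID, table2_personID FROM table1 ORDER BY personID"
).fetchall()
print(linked)
```

Only the Adams row gets linked. The two Baker rows match the same table2 row, and Williams does not match on first name, so those are left for manual review.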
I have one supertype table and two subtype tables, a and b, and each supertype row belongs to exactly one of them. A row cannot be in both subtypes, so to query I have to check which subtype contains the supertype id. I have been experimenting with queries but cannot get it right.
This is roughly what I had in mind:
SELECT * from supertypetable INNER JOIN
IF (a.id = given.id) then a ON a.id = supertypetable.id
ELSE b ON b.id = supertypetable.id
job Table
________________________________
|job_id| blach2x....
________________________________
| 1 |
| 2 |
| 3 |
________________________________
parttime Table
________________________________
|job_id| blach2x....
________________________________
| 2 |
| 3 |
________________________________
fulltime Table
________________________________
|job_id| blach2x....
________________________________
| 1 |
| |
________________________________
I want to join tables that satisfy my given id
This looks a lot like a polymorphic join in Rails/ActiveRecord. The way it's implemented there, the 'supertype' table has two fields: subtype_id and subtype_type. The subtype_type field holds a string that can be easily turned into the name of the right subtype table; subtype_id has the id of the row in that table. Structuring your tables like this might help.
The next question you have to ask is what exactly are you expecting to see in the results? If you want to see the supertype table plus ALL of the subtype tables, you're probably going to have to join them one at a time, then union them all together. In other words, first join against just one of the subtype tables, then against the next one, etc. If this isn't what you're going for, maybe you could clarify your question further.
If a.id can never equal b.id, you could join against each table separately and then do a UNION; only the table where the id matched would return results:
SELECT * from supertypetable
INNER JOIN
a ON a.id = supertypetable.id
UNION
SELECT * from supertypetable
INNER JOIN
b ON b.id = supertypetable.id
If a.id can equal b.id, then this would not work. But it's an idea.
EDITING PER COMMENTS:
This approach only works if the structures of a and b are identical.
So one simple suggestion might be just:
SELECT * FROM job
left join parttime on parttime.job_id = job.job_id
left join fulltime on fulltime.job_id = job.job_id
where job.job_id = @job_id
And then let your application figure out which of the two joined tables doesn't have NULL data and display that.
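A minimal sketch of that application-side check, using Python's sqlite3 with an in-memory database (an assumption). The helper name load_job and the hourly_rate and salary columns are hypothetical; the job_id values follow the tables in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job (job_id INTEGER PRIMARY KEY);
CREATE TABLE parttime (job_id INTEGER PRIMARY KEY, hourly_rate INTEGER);
CREATE TABLE fulltime (job_id INTEGER PRIMARY KEY, salary INTEGER);
INSERT INTO job VALUES (1), (2), (3);
INSERT INTO parttime VALUES (2, 20), (3, 25);
INSERT INTO fulltime VALUES (1, 50000);
""")

def load_job(conn, job_id):
    """Return (subtype, detail) for a job; hypothetical helper."""
    row = conn.execute(
        """
        SELECT parttime.job_id, parttime.hourly_rate,
               fulltime.job_id, fulltime.salary
        FROM job
        LEFT JOIN parttime ON parttime.job_id = job.job_id
        LEFT JOIN fulltime ON fulltime.job_id = job.job_id
        WHERE job.job_id = ?
        """,
        (job_id,),
    ).fetchone()
    if row is None:
        return None  # no such job
    pt_id, hourly_rate, ft_id, salary = row
    # Whichever joined key is not NULL tells us the subtype.
    if pt_id is not None:
        return ("parttime", hourly_rate)
    if ft_id is not None:
        return ("fulltime", salary)
    return ("unknown", None)  # job exists but has no subtype row

print(load_job(conn, 1), load_job(conn, 2))
```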
If you don't mind inconsistent datasets and just always want the correct set returned regardless, there is another option. That said, you're still going to need some kind of application logic: as you said, the structures of parttime and fulltime are different, so how are you going to display or use their data conditionally without some kind of inspection? And if you're going to do that inspection, you might as well do it up front: figure out the subtype for your given job_id, and then just pick the appropriate query to run.
Sorry! Digression!
A stored procedure can do this logic for you (removed all the joins, just an example):
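CREATE PROCEDURE getSubTypeDATA (@job_id int)
AS
BEGIN
-- T-SQL (SQL Server) syntax assumed; returns the rows of whichever subtype holds this job_id
IF EXISTS (SELECT 1 FROM parttime WHERE job_id = @job_id)
BEGIN
SELECT * FROM parttime WHERE job_id = @job_id
END
ELSE
BEGIN
SELECT * FROM fulltime WHERE job_id = @job_id
END
END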
Alternatively, if you want a consistent dataset (ideal for application logic), why not put the columns common to fulltime and parttime into a UNION statement, and create hard-coded NULLs for the columns they don't share? For example, if fullTime looked like
EmployeeID, DepartmentID, Salary
and partTime looked like
EmployeeID, DepartmentID, HourlyRate
you could do
SELECT job.*, employeeid, departmentid, salary, null as hourlyrate FROM job inner join fulltime on fulltime.job_id = job.job_id
where job.job_id = ?
union
SELECT job.*, employeeid, departmentid, null as salary, hourlyrate FROM job inner join parttime on parttime.job_id = job.job_id
where job.job_id = ?
If there are a hundred different columns, this might be unwieldy. Also, in case this didn't make it obvious, having subtypes with completely different structures but using the same foreign key is a very good clue that you're breaking third normal form.
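As a sketch of the UNION-with-NULLs idea, the snippet below runs a cut-down version of that query in an in-memory SQLite database through Python's sqlite3 (an assumption). The helper name job_details is hypothetical, and only the salary and hourly_rate columns are kept to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job (job_id INTEGER PRIMARY KEY);
CREATE TABLE parttime (job_id INTEGER PRIMARY KEY, hourly_rate INTEGER);
CREATE TABLE fulltime (job_id INTEGER PRIMARY KEY, salary INTEGER);
INSERT INTO job VALUES (1), (2), (3);
INSERT INTO parttime VALUES (2, 20), (3, 25);
INSERT INTO fulltime VALUES (1, 50000);
""")

def job_details(conn, job_id):
    """Return rows shaped (job_id, salary, hourly_rate); hypothetical helper."""
    sql = """
    SELECT job.job_id, fulltime.salary AS salary, NULL AS hourly_rate
    FROM job INNER JOIN fulltime ON fulltime.job_id = job.job_id
    WHERE job.job_id = ?
    UNION
    SELECT job.job_id, NULL AS salary, parttime.hourly_rate
    FROM job INNER JOIN parttime ON parttime.job_id = job.job_id
    WHERE job.job_id = ?
    """
    return conn.execute(sql, (job_id, job_id)).fetchall()

print(job_details(conn, 1))
print(job_details(conn, 3))
```

Every result has the same three columns, so the application can read it the same way for either subtype and treat a NULL as "does not apply".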
Basically I have two tables
articleID
1
2
3
4
relatedType | articleID
3 1
4 1
3 2
4 3
5 3
2 4
I need to select the articleID that doesn't have any related records with type > 3. With this dataset I basically need:
articleID
2
4
Because their related types contain only 3 and 2.
I can do it with this query:
SELECT * FROM article
WHERE articleID NOT IN (SELECT articleID FROM relatedTable -- NOT IN does the trick!
WHERE type > 3 GROUP BY portalid )
BUT I would like to avoid nested query because this query is pretty slow. Any hint?
You can do
SELECT * FROM article a
WHERE NOT EXISTS
(SELECT NULL FROM relatedTable b WHERE b.type > 3
AND b.articleID = a.articleID)
Technically, all 3 ways to achieve the desired result (NOT IN, NOT EXISTS, LEFT JOIN) should behave the same (for a non-nullable column) and normally generate the same execution plan, except in MySQL, where NOT IN is not recommended (or wasn't recommended prior to 5.5; maybe it has changed).
I'd also blame GROUP BY in your original query for poor performance as well...
Use OUTER JOIN
SELECT a.articleID
FROM article a LEFT OUTER JOIN relatedTable r
ON (a.articleID = r.articleID and r.relatedType > 3)
WHERE r.articleID IS NULL
CORRECTION:
Sorry, I just realized that the request was to leave out the rows that have ANY record with type > 3. You can still do it by having a sub-query in the JOIN or by creating a temp table, indexing it and then joining that. Whether any of these is actually faster than the NOT IN sub-query will depend on the MySQL version and, more importantly, on table size and stats.
If you need only article id, try this:
SELECT
articleID
FROM relatedTable
GROUP BY articleID
HAVING MAX(relatedType) <= 3
or you can JOIN this to your article table.
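For comparison, here is a sketch that runs the NOT EXISTS, LEFT JOIN and HAVING MAX variants from this thread on the sample data, using Python's sqlite3 with an in-memory database (an assumption). Article 5, which has no related rows at all, is an extra row added here to show where the variants differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (articleID INTEGER PRIMARY KEY);
CREATE TABLE relatedTable (relatedType INTEGER, articleID INTEGER);
INSERT INTO article VALUES (1), (2), (3), (4), (5);  -- article 5 is an added row
INSERT INTO relatedTable VALUES (3, 1), (4, 1), (3, 2), (4, 3), (5, 3), (2, 4);
""")

def ids(sql):
    """Return the first column of a query as a list."""
    return [row[0] for row in conn.execute(sql)]

not_exists = ids("""
    SELECT a.articleID FROM article a
    WHERE NOT EXISTS (SELECT 1 FROM relatedTable r
                      WHERE r.relatedType > 3 AND r.articleID = a.articleID)
    ORDER BY a.articleID
""")

left_join = ids("""
    SELECT a.articleID
    FROM article a LEFT OUTER JOIN relatedTable r
      ON a.articleID = r.articleID AND r.relatedType > 3
    WHERE r.articleID IS NULL
    ORDER BY a.articleID
""")

having_max = ids("""
    SELECT articleID FROM relatedTable
    GROUP BY articleID
    HAVING MAX(relatedType) <= 3
    ORDER BY articleID
""")

print(not_exists, left_join, having_max)
```

The first two variants start from article, so they also return article 5. The HAVING variant only sees articles that have at least one related row, which is worth keeping in mind if such articles should be listed.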
Using the tables below as an example and the listed query as a base query, I want to add a way to select only rows with a max id! Without having to do a second query!
TABLE VEHICLES
id vehicleName
----- --------
1 cool car
2 cool car
3 cool bus
4 cool bus
5 cool bus
6 car
7 truck
8 motorcycle
9 scooter
10 scooter
11 bus
TABLE VEHICLE NAMES
nameId vehicleName
------ -------
1 cool car
2 cool bus
3 car
4 truck
5 motorcycle
6 scooter
7 bus
TABLE VEHICLE ATTRIBUTES
nameId attribute
------ ---------
1 FAST
1 SMALL
1 SHINY
2 BIG
2 SLOW
3 EXPENSIVE
4 SHINY
5 FAST
5 SMALL
6 SHINY
6 SMALL
7 SMALL
And the base query:
select a.*
from vehicle a
join vehicle_names b using(vehicleName)
join vehicle_attribs c using(nameId)
where c.attribute in('SMALL', 'SHINY')
and a.vehicleName like '%coo%'
group
by a.id
having count(distinct c.attribute) = 2;
So what I want to achieve is to select rows with certain attributes, that match a name but only one entry for each name that matches where the id is the highest!
So a working solution in this example would return the below rows:
id vehicleName
----- --------
2 cool car
10 scooter
if it was using some sort of max on the id
at the moment I get all the entries for cool car and scooter.
My real-world database follows a similar structure and has tens of thousands of entries in it, so a query like the one above could easily return 3000+ results. I limit the results to 100 rows to keep execution time low, as the results are used in a search on my site. The reason I have repeats of "vehicles" with the same name but a different ID is that new models are constantly added, but I keep the older ones around for those that want to dig them up! But on a search by car name I don't want to return the older cars, just the newest one, which is the one with the highest ID!
The correct answer would adapt the query I provided above that I'm currently using and have it only return rows where the name matches but has the highest id!
If this isn't possible, suggestions on how I can achieve what I want without massively increasing the execution time of a search would be appreciated!
If you want to keep your logic, here is what I would do:
select a.*
from vehicle a
left join vehicle a2 on (a.vehicleName = a2.vehicleName and a.id < a2.id)
join vehicle_names b on (a.vehicleName = b.vehicleName)
join vehicle_attribs c using(nameId)
where c.attribute in('SMALL', 'SHINY')
and a.vehicleName like '%coo%'
and a2.id is null
group by a.id
having count(distinct c.attribute) = 2;
Which yields:
+----+-------------+
| id | vehicleName |
+----+-------------+
| 2 | cool car |
| 10 | scooter |
+----+-------------+
2 rows in set (0.00 sec)
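The self-join works because a row of vehicle only survives a2.id is null when no other row with the same name has a bigger id. Here is a sketch that replays the query on the sample data with Python's sqlite3 and an in-memory SQLite database (an assumption; the USING clause is written as ON, otherwise the SQL is unchanged).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vehicle (id INTEGER PRIMARY KEY, vehicleName TEXT);
CREATE TABLE vehicle_names (nameId INTEGER PRIMARY KEY, vehicleName TEXT);
CREATE TABLE vehicle_attribs (nameId INTEGER, attribute TEXT);
INSERT INTO vehicle VALUES
    (1, 'cool car'), (2, 'cool car'), (3, 'cool bus'), (4, 'cool bus'),
    (5, 'cool bus'), (6, 'car'), (7, 'truck'), (8, 'motorcycle'),
    (9, 'scooter'), (10, 'scooter'), (11, 'bus');
INSERT INTO vehicle_names VALUES
    (1, 'cool car'), (2, 'cool bus'), (3, 'car'), (4, 'truck'),
    (5, 'motorcycle'), (6, 'scooter'), (7, 'bus');
INSERT INTO vehicle_attribs VALUES
    (1, 'FAST'), (1, 'SMALL'), (1, 'SHINY'), (2, 'BIG'), (2, 'SLOW'),
    (3, 'EXPENSIVE'), (4, 'SHINY'), (5, 'FAST'), (5, 'SMALL'),
    (6, 'SHINY'), (6, 'SMALL'), (7, 'SMALL');
""")

latest = conn.execute("""
    select a.*
    from vehicle a
    -- a2 matches any newer row with the same name
    left join vehicle a2 on (a.vehicleName = a2.vehicleName and a.id < a2.id)
    join vehicle_names b on (a.vehicleName = b.vehicleName)
    join vehicle_attribs c on (c.nameId = b.nameId)
    where c.attribute in ('SMALL', 'SHINY')
      and a.vehicleName like '%coo%'
      and a2.id is null          -- keep only rows with no newer sibling
    group by a.id
    having count(distinct c.attribute) = 2
    order by a.id
""").fetchall()
print(latest)
```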
As others said, normalization could be done on a few levels:
Keeping your current vehicle_names table as the primary lookup table, I would change:
update vehicle a
inner join vehicle_names b using (vehicleName)
set a.vehicleName = b.nameId;
alter table vehicle change column vehicleName nameId int;
create table attribs (
attribId int auto_increment primary key,
attribute varchar(20),
unique key attribute (attribute)
);
insert into attribs (attribute)
select distinct attribute from vehicle_attribs;
update vehicle_attribs a
inner join attribs b using (attribute)
set a.attribute=b.attribId;
alter table vehicle_attribs change column attribute attribId int;
Which leads to the following query:
select a.id, b.vehicleName
from vehicle a
left join vehicle a2 on (a.nameId = a2.nameId and a.id < a2.id)
join vehicle_names b on (a.nameId = b.nameId)
join vehicle_attribs c on (a.nameId=c.nameId)
inner join attribs d using (attribId)
where d.attribute in ('SMALL', 'SHINY')
and b.vehicleName like '%coo%'
and a2.id is null
group by a.id
having count(distinct d.attribute) = 2;
The table does not seem normalized; however, that makes it easy to do this:
select max(id), vehicleName
from VEHICLES
group by vehicleName
having count(*)>=2;
I'm not sure I completely understand your model, but the following query satisfies your requirements as they stand. The first sub query finds the latest version of the vehicle. The second query satisfies your "and" condition. Then I just join the queries on vehiclename (which is the key?).
select a.id
,a.vehiclename
from (select a.vehicleName, max(id) as id
from vehicle a
where vehicleName like '%coo%'
group by vehicleName
) as a
join (select b.vehiclename
from vehicle_names b
join vehicle_attribs c using(nameId)
where c.attribute in('SMALL', 'SHINY')
group by b.vehiclename
having count(distinct c.attribute) = 2
) as b on (a.vehicleName = b.vehicleName);
If this "latest vehicle" logic is something you will need to do a lot, a small suggestion would be to create a view (see below) which returns the latest version of each vehicle. Then you could use the view instead of the find-max-query. Note that this is purely for ease-of-use, it offers no performance benefits.
create view latest_vehicle as -- latest_vehicle is an illustrative view name
select *
from vehicle a
where id = (select max(b.id)
from vehicle b
where a.vehiclename = b.vehiclename);
Without going into a proper redesign of your model you could
1) Add a column IsLatest that your application could manage.
This is not perfect but will satisfy your question (until the next problem, see the note at the end)
All you need is when you add a new entry to issue queries such as
UPDATE a
SET IsLatest = 0
WHERE IsLatest = 1 AND vehicleName = @name;
INSERT INTO a (vehicleName) VALUES (@name);
UPDATE a
SET IsLatest = 1
WHERE id = @last_inserted_id;
in a transaction or a trigger
2) Alternatively you can find out the max_id before you issue your query
SELECT MAX(id)
FROM a
WHERE vehicleName = @name
3) You can do it in a single SQL statement, and provided there is an index on (vehicleName, id) it should actually have decent speed with
select a.*
from vehicle a
join vehicle_names b ON a.vehicleName = b.vehicleName
join vehicle_attribs c ON b.nameId = c.nameId AND c.attribute = 'SMALL'
join vehicle_attribs d ON b.nameId = d.nameId AND d.attribute = 'SHINY'
left join vehicle notmax ON a.vehicleName = notmax.vehicleName AND a.id < notmax.id
where a.vehicleName like '%coo%'
AND notmax.id IS NULL
I have removed your GROUP BY and HAVING and replaced them with another join (assuming that each attribute appears at most once per nameId).
I have also used one of the ways to find the max per group, which is to join a table to itself and keep only the rows for which no record with a bigger id exists for the same name.
There are other ways; search SO for 'max per group sql'. Also see here, though not complete.