Efficiency question - Selecting numeric data from one field - MySQL

I have a pair of tables and I need to search for numeric values in Table1 that match associated IDs on Table2. For example:
Table1
ID | Item
---+-------
 1 | Cat
 3 | Frog
 9 | Dog
11 | Horse
Table2
Category | Contains
---------+----------
Group 1  | 1
Group 2  | 3|9
Group 3  | 3|9|11
Originally I was thinking a LIKE would work, but if I searched for "1", I'd end up matching "11". I looked into SETs, but the MySQL docs state that the maximum number of elements is 64 and I have over 200 rows of items in Table1. I could wrap each item id with a character (e.g. "|1|") but that doesn't seem very efficient. Each Group will have unique items (e.g., there won't be two Cats in the same Group).
I found a similar topic as my problem and one of the answers suggested making another table, but I don't understand how that would work. A new table containing what, exactly?
The other option I have is to split Contains into 6 separate columns, since there will never be more than 6 items in a Group, but then I'm not sure how to search all 6 columns without relying on six OR conditions:
Category | C1 | C2   | C3   | C4 (etc)
---------+----+------+------+---------
Group 1  | 1  | null | null | null
Group 2  | 3  | 9    | null | null
Group 3  | 3  | 9    | 11   | null
SELECT * FROM Table2 WHERE C1 = '1' OR C2 = '1' OR C3 = '1' etc.
I'm not sure what the most efficient way of handling this is, and I could use some advice from those with more experience in normalizing this kind of data. Thank you.

I think it'd be best to create another table to normalize your data; however, what you're proposing is not exactly what I'd suggest.
Realistically what you are modeling is a many-to-many relationship between table1 and table2. This means that one row in table1 can be associated with many rows in table2, and vice versa.
In order to create this kind of relation, you need a third table, which we can call rel_table1_table2 for now.
rel_table1_table2 will contain only primary key values from the two associated tables, which in this case seem to be table1.ID and table2.Category.
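A minimal definition of that table might look like this (the column types here are assumptions based on your sample data):
CREATE TABLE rel_table1_table2 (
  ID       INT         NOT NULL,  -- matches table1.ID
  Category VARCHAR(50) NOT NULL,  -- matches table2.Category (assumed to be a string)
  PRIMARY KEY (ID, Category)
);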
When you want to associate a row in table1 with a row in table2, you'd add a row to rel_table1_table2 with the primary key values from table1 and table2 respectively.
Example:
INSERT INTO rel_table1_table2 (ID, Category) VALUES (1, 'Group 1');
When you need to find out what Items belong to a Category, you'd simply query your association table, for example:
SELECT t1.Item
FROM table1 t1
JOIN rel_table1_table2 r ON t1.ID = r.ID
JOIN table2 t2 ON r.Category = t2.Category
WHERE t2.Category = 'Group 3'
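The reverse lookup, i.e. which Categories contain a given Item, only needs the association table (a sketch using the same names):
SELECT r.Category
FROM table1 t1
JOIN rel_table1_table2 r ON t1.ID = r.ID
WHERE t1.Item = 'Frog'   -- returns "Group 2" and "Group 3" for the sample data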
Does that make sense?

That "new" table would contain one row for each category an animal belongs to.
-- column types are illustrative; adjust them to your data
create table animal(
  animal_id   int
 ,name        varchar(50)
 ,primary key(animal_id)
);
create table category(
  category_id int
 ,name        varchar(50)
 ,primary key(category_id)
);
create table animal_categories(
  animal_id   int
 ,category_id int
 ,primary key(animal_id, category_id)
);
For your example data, the animal_categories table would contain:
+-------------+-----------+
| category_id | animal_id |
+-------------+-----------+
|           1 |         1 |
|           2 |         3 |
|           2 |         9 |
|           3 |         3 |
|           3 |         9 |
|           3 |        11 |
+-------------+-----------+
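Finding the animals in a given category is then a simple join (a sketch against the skeleton tables above):
SELECT a.name
FROM animal a
JOIN animal_categories ac ON ac.animal_id = a.animal_id
WHERE ac.category_id = 3;   -- Group 3: Frog, Dog, Horse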

Instead of using LIKE, use REGEXP so that you don't get "11" when looking for "1".
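For example, anchoring the value between delimiters keeps "11" from matching a search for "1" (a sketch, assuming Contains is pipe-delimited as in the question):
SELECT *
FROM Table2
WHERE Contains REGEXP '(^|[|])1($|[|])'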

Break Table2.Contains out into another table which joins Item and Category:
Item                 Item_Category                 Category
------               --------------                ---------
ID   (1)-------(*)   ItemID                        Name
Name                 CategoryID (*)-------(1)      ID
Now, your query will look like:
SELECT Category.* FROM Category, Item_Category
WHERE (Item_Category.CategoryID = Category.ID)
AND (Item_Category.ItemID IN (1, 2, 3, 11))

It seems like your problem is the way you are using the rows in Table 2. In databases it should always trigger a red flag when you find yourself storing a list of values in a single column.
Rather than having each category occupy a single row in Table 2, how about repeating the same category across multiple rows, with the Contains column storing only a single value? Your example could be changed to:
Table 1
ID | Item
---+-------
 1 | Cat
 3 | Frog
 9 | Dog
11 | Horse
Table 2
Category | Contains
---------+---------
Group 1  | 1
Group 2  | 3
Group 2  | 9
Group 3  | 3
Group 3  | 9
Group 3  | 11
Now when you want to find out "What items does group 2 contain?", you can write a query for that which selects all of the "Group 2" category rows from Table 2. When you want to find out, "What is the name of item 9", you can write a query that selects a row from Table 1.
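For example, those two lookups might look like this (a sketch using the column names above):
-- What items does Group 2 contain?
SELECT t1.Item
FROM Table2 t2
JOIN Table1 t1 ON t1.ID = t2.Contains
WHERE t2.Category = 'Group 2';

-- What is the name of item 9?
SELECT Item
FROM Table1
WHERE ID = 9;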

Related

MySQL - how to get count of a single item frequency in a table of CSV values

I have a MySQL table called "projects" with a single field containing CSV lists of project IDs. Assume that I cannot change the table structure.
I need a query that will allow me to quickly retrieve a count of rows that contain a particular project id, for example:
select count(*) from projects where '4' in (project_ids);
This returns just 1 result, which is incorrect (should be 3 results), but I think that it illustrates what I'm attempting to do.
CREATE TABLE `projects` (
`project_ids` varchar(255) DEFAULT NULL
);
INSERT INTO `projects` (`project_ids`)
VALUES
('1,2,4'),
('1,2'),
('4'),
('4,5,2'),
('1,2,5');
I was hoping that there might be a simple MySQL function that would achieve this so that I don't have to do anything complex SQL-wise.
You could use this approach:
SELECT COUNT(*)
FROM projects
WHERE CONCAT(',', project_ids, ',') LIKE '%,4,%';
Or use FIND_IN_SET for a built-in way:
SELECT COUNT(*)
FROM projects
WHERE FIND_IN_SET('4', project_ids) > 0;
But, as Gordon's comment alludes to, a much better table design would be to have a junction table which relates a primary key in one table to all projects in another table. That junction table, based on your sample data, would look like this:
PK | project_id
---+-----------
 1 | 1
 1 | 2
 1 | 4
 2 | 1
 2 | 2
 3 | 4
 4 | 4
 4 | 5
 4 | 2
 5 | 1
 5 | 2
 5 | 5
With this design, if you wanted to find the count of PK's having a project_id of 4, you would only need a much simpler (and sargable) query:
SELECT COUNT(*)
FROM junction_table
WHERE project_id = 4;
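For completeness, a sketch of what that junction table might look like, with project_id indexed so the count query above becomes an index lookup (names assumed from the query):
CREATE TABLE junction_table (
  pk         INT NOT NULL,          -- key of the original projects row
  project_id INT NOT NULL,
  PRIMARY KEY (pk, project_id),
  KEY idx_project_id (project_id)   -- makes WHERE project_id = 4 sargable
);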
You would need to use a like condition as follows
select count(*)
from projects
where concat(',',project_ids,',') like '%,4,%';

Joining pre-defined, possibly non-existing keys with table data

In MySQL (or SQL in general), is it possible to generate a list of pre-defined identifiers, joined with matching table data?
Take for instance the following table data, let's call it my_table:
id | value
---+------
1 | 'a'
3 | 'c'
Now, I have a list of possible id values and would like to get a full list of these values, together with joined data from the table above. With a list [1, 2, 3, 4], the desired result is:
item | id   | value
-----+------+------
   1 |    1 | 'a'
   2 | NULL | NULL
   3 |    3 | 'c'
   4 | NULL | NULL
Obviously, a query like SELECT * FROM my_table WHERE id IN (1, 2, 3, 4) yields only results for two rows (values 'a' and 'c').
For a solution, I am thinking along the line of some form of temporary table, fed with the full list of id's ([1, 2, 3, 4]) and left joining that with the table data, such as
SELECT t1.`item`, t2.`id`, t2.`value`
FROM
...
AS t1
LEFT JOIN `my_table` AS t2 ON t2.`id` = t1.`item`
But how do I do that?
Is this even possible? Or is it really necessary to compare the result with the initial list in external code? (This would be possible, but not trivial since, in my case, the identifiers are not integers.)
(The ultimate idea of this is that I would like a result set from the DB with all input IDs so that I can easily identify the non-existing records.)
Update: I guess it boils down to the question: how can I get a result set such as
id
---
1
2
3
4
from a (My)SQL server without having this as data in some table, but from setting the data in some query?
A new approach flashed into my mind... using a union.
SELECT t1.`item`, t2.`id`, t2.`value`
FROM (
select 1 as `item`
union select 2
union select 3
union select 4
) AS t1
LEFT JOIN `my_table` AS t2 ON t2.`id` = t1.`item`
It answers the question, but it remains to be seen whether this is the 'best' answer. It works as long as the list of items is not too long (which is the case for me).
Anyone a better solution?

How to SELECT the same ID for values occurring in the same column?

I have a MySQL table called names with the following columns: ID, type, row, value.
The composite primary key is (ID, type, row).
The purpose of this table is to save all names and professions of a given person across multiple rows, one piece of data per row.
For example, in Spain people commonly have two first names and two last names, like José Anastacio Rojas Laguna.
In Germany, many people have one first name but two last names, and some have more than one profession, for example teaching at a university while also working as a doctor in a hospital. In that case the name carries the titles Prof. Dr., for example: Prof. Dr. José Anastacio Rojas Laguna
In this case, I would store all these information in the table like this:
ID | type | row | value
1 | 0 | 1 | Prof.
1 | 0 | 2 | Dr.
1 | 1 | 1 | José
1 | 1 | 2 | Anastacio
1 | 2 | 1 | Rojas
1 | 2 | 2 | Laguna
An ID is given to a single person: every person in the table has a unique ID, and that ID identifies the person. type defines, as its name says, the type of the value: 0 means profession, 1 means first name and 2 means last name. row defines the position within the name: 1 means 1st first name, 2 means 2nd first name, 3 means 3rd first name, and so on. The same applies to professions and last names.
Now I would like to find out how I can SELECT the ID of a specific person by passing just some of that person's names. How can I determine the ID by giving only a few of the values, which all occur under the same ID?
This will return the rows of users that have both 'José' and 'Laguna' under the same ID:
select t1.id, t1.value, t2.value
from yourTable t1
join (select * from yourTable
      where value = 'Laguna') t2
  on t1.id = t2.id
where t1.value = 'José'
I used 'José' here; you could use a variable such as @searchText instead:
SELECT *
FROM YourTable
WHERE ID IN (SELECT DISTINCT ID
FROM YourTable
WHERE value = 'José')
Or maybe use an IN if you have multiple parameters:
WHERE value IN ('José', 'Laguna')
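Note that IN alone matches rows having any of the values; if every supplied value must belong to the same ID, a grouped variant works (a sketch, assuming two search terms):
SELECT ID
FROM YourTable
WHERE value IN ('José', 'Laguna')
GROUP BY ID
HAVING COUNT(DISTINCT value) = 2   -- both names must be present for the ID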
So here's something using GROUP_CONCAT. Tested with your sample data and works.
It groups all of the person's titles into a single column, their given names into another column, and all their family names into a third column. It wraps each of those columns with commas to ensure that finding a particular name is accurate.
The snippet below will find anyone who:
Has at least one given name of "José" and
Has at least one family name of "Rojas"
All you have to do to find a different user is change the WHERE clause.
SELECT n.ID, n.type, n.row, n.value
FROM names n
INNER JOIN (
    SELECT ID
    FROM (
        SELECT ID
              ,CONCAT(',', GROUP_CONCAT((CASE WHEN type=0 THEN value ELSE NULL END) ORDER BY value ASC), ',') AS titles
              ,CONCAT(',', GROUP_CONCAT((CASE WHEN type=1 THEN value ELSE NULL END) ORDER BY value ASC), ',') AS givenNames
              ,CONCAT(',', GROUP_CONCAT((CASE WHEN type=2 THEN value ELSE NULL END) ORDER BY value ASC), ',') AS familyNames
        FROM `names`
        GROUP BY ID
    ) grouped
    WHERE grouped.givenNames LIKE '%,José,%' AND grouped.familyNames LIKE '%,Rojas,%'
) people ON n.ID = people.ID
Before the edit, this may not have worked as intended. The extra commas ensure the name searched for is not matched as a substring of a longer name.

Generating "Fake" Records Within A Query

I have a very basic statement, e.g.:
SELECT pet, animal_type, number_of_legs
FROM table
However, where table currently is, I want to insert some fake data, along the lines of:
rufus cat 3
franklin turtle 1
norm dog 5
Is it possible to "generate" these fake records, associating each value with the corresponding field, from within a query so that they are returned as the result of the query?
SELECT pet, animal_type, number_of_legs FROM table
union select 'rufus', 'cat', 3
union select 'franklin', 'turtle', 1
union select 'norm', 'dog', 5
This gives you the content of table plus the 3 records you want, avoiding duplicates. If duplicates are OK, then replace union with union all.
edit: per your comment, for tsql, you can do:
select top 110 'franklin', 'turtle', 1
from sysobjects a, sysobjects b -- this cross join gives n^2 records
Be sure to choose a table where n^2 is greater than the number of records you need, or cross join again and again.
I'm not entirely sure what you're trying to do, but MySQL is perfectly capable of selecting "mock" data and printing it in a table:
SELECT "Rufus" AS "Name", "Cat" as "Animal", "3" as "Number of Legs"
UNION
SELECT "Franklin", "Turtle", "1"
UNION
SELECT "Norm", "Dog", "5";
Which would result in:
+----------+--------+----------------+
| Name     | Animal | Number of Legs |
+----------+--------+----------------+
| Rufus    | Cat    | 3              |
| Franklin | Turtle | 1              |
| Norm     | Dog    | 5              |
+----------+--------+----------------+
Doing this query this way prevents actually having to save information in a temporary table, but I'm not sure if it's the correct way of doing things.

Row number for query results grouped by a column

I have a table that has the following columns:
id | fk_id | rcv_date
There may be multiple records with a common fk_id, which represents a foreign key id in a related table.
I need to create a query that will assign a row number to each record, grouped by fk_id, sorted by rcv_date.
I originally began with the following query, which works quite well for sorting and assigning row numbers:
SELECT @row:=@row + 1 AS ordinality, c.fk_id, rcv_date
FROM (SELECT @row:=0) r, mytable c
ORDER BY rcv_date
However -- the row count and sorting is done across the entire dataset. I need the counting to be within a common fk_id. For example, the following sample data would return (the first column represents the row count/ordinality):
1 | 5 | 2011-10-01
2 | 5 | 2011-10-14
3 | 5 | 2011-11-02
4 | 5 | 2011-12-17
1 | 8 | 2011-09-03
2 | 8 | 2011-11-12
1 | 9 | 2011-10-08
2 | 9 | 2011-10-10
3 | 9 | 2011-11-19
The middle column represents the fk_id. As you can see, the sorting and row count is within the fk_id "grouping."
UPDATE
I have a query that seems to be working, but would like some input as to whether it can be improved:
SELECT IF(@last = c.fk_id, @row:=@row + 1, @row:=1) AS ordinality, @last:=c.fk_id, c.fk_id, rcv_date
FROM (SELECT @row:=0) r, (SELECT @last:=0) l, mytable c
ORDER BY c.fk_id, rcv_date
So what this does is order by fk_id and then rcv_date -- which essentially handles my grouping. Then I use a second variable to compare the fk_id in the previous record with the current record: if it's the same, we increment the row; if different, we reset to 1.
My tests with real data appear to be working. I suspect it's a pretty inefficient query though -- so if anyone has ideas for improving it, or see possible flaws, I would love to hear.
This should be pretty straightforward.
SELECT (CASE WHEN @fk <> fk_id THEN @row:=1 ELSE @row:=@row + 1 END) AS ordinality,
       @fk:=fk_id, rcv_date
FROM (SELECT @row:=0) AS r,
     (SELECT @fk:=0) AS f,
     (SELECT fk_id, rcv_date FROM mytable ORDER BY fk_id, rcv_date) AS t
I ordered by fk_id first to ensure rows with the same foreign key come together (they may not be stored that way in the table), then by your preferred ordering, i.e. by rcv_date. The query checks for a change in fk_id: if there is one, the row number variable is reset to 1; otherwise it is incremented. That is handled in the CASE statement. Notice that @fk:=fk_id is assigned after the CASE check, otherwise it would affect the row number.
Edit: Just noticed your own solution which happened to be the same as I ended up with. Kudos! :)
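For reference, on MySQL 8.0 and later the same ordinality can be computed with a window function instead of user variables (a sketch, using the table and column names from the question):
SELECT ROW_NUMBER() OVER (PARTITION BY fk_id ORDER BY rcv_date) AS ordinality,
       fk_id,
       rcv_date
FROM mytable
ORDER BY fk_id, rcv_date;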