Joining pre-defined, possibly non-existent keys with table data - MySQL

In MySQL (or SQL in general), is it possible to generate a list of pre-defined identifiers, joined with matching table data?
Take, for instance, the following table data; let's call it my_table:
id | value
---+------
1 | 'a'
3 | 'c'
Now, I have a list of possible id values and would like to get a full list of these values, together with joined data from the table above. With a list [1, 2, 3, 4], the desired result is:
item | id | value
-----+------+------
1 | 1 | 'a'
2 | NULL | NULL
3 | 3 | 'c'
4 | NULL | NULL
Obviously, a query like SELECT * FROM my_table WHERE id IN (1, 2, 3, 4) returns only two rows (values 'a' and 'c').
For a solution, I am thinking along the lines of some form of temporary table, fed with the full list of ids ([1, 2, 3, 4]) and left-joined with the table data, such as
SELECT t1.`item`, t2.`id`, t2.`value`
FROM
...
AS t1
LEFT JOIN `my_table` AS t2 ON t2.`id` = t1.`item`
But how do I do that?
Is this even possible? Or is it really necessary to compare the result with the initial list in external code? (That would be possible, but not trivial, because in my case the identifiers are not integers.)
(The ultimate idea is that I would like a result set from the DB containing all input ids, so that I can easily identify the non-existent records.)
Update: I guess it boils down to the question: how can I get a result set such as
id
---
1
2
3
4
from a (My)SQL server without storing these values in a table, i.e. by supplying them directly in the query?

A new approach flashed into my mind... using a union.
SELECT t1.`item`, t2.`id`, t2.`value`
FROM (
select 1 as `item`
union select 2
union select 3
union select 4
) AS t1
LEFT JOIN `my_table` AS t2 ON t2.`id` = t1.`item`
It answers the question, but it remains to be seen whether this is the 'best' answer. It works as long as the list of items is not too long (which is the case for me).
Does anyone have a better solution?
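Below is a minimal, runnable sketch of the UNION-derived-table approach. It assumes Python's built-in sqlite3 module as a stand-in for a MySQL server. The helper name `lookup` is hypothetical. The same SQL shape should work in MySQL. The sketch makes two choices the answer does not:

- It uses UNION ALL, which skips the duplicate-elimination step that plain UNION performs.
- It binds the ids as parameters, so non-integer identifiers work too.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO my_table (id, value) VALUES (?, ?)", [(1, "a"), (3, "c")])


def lookup(conn, items):
    """Left-join a list of candidate ids against my_table (hypothetical helper).

    The list becomes a derived table of UNION ALL'ed one-row SELECTs,
    with each id passed as a bound parameter rather than pasted into the SQL.
    """
    if not items:
        return []  # an empty derived table would be invalid SQL
    derived = " UNION ALL ".join(["SELECT ? AS item"] * len(items))
    sql = (
        "SELECT t1.item, t2.id, t2.value "
        f"FROM ({derived}) AS t1 "
        "LEFT JOIN my_table AS t2 ON t2.id = t1.item "
        "ORDER BY t1.item"
    )
    return conn.execute(sql, list(items)).fetchall()


rows = lookup(conn, [1, 2, 3, 4])
for row in rows:
    print(row)

# Ids with no matching record are the ones whose joined id is NULL (None).
missing = [item for item, found_id, _ in rows if found_id is None]
print(missing)
```

Filtering on `found_id is None` gives exactly the "non-existent records" the question is after, without comparing lists in external code.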


MySQL - how to get count of a single item frequency in a table of CSV values

I have a mysql table called "projects" with a single field containing CSV lists of project Ids. Assume that I cannot change the table structure.
I need a query that will allow me to quickly retrieve a count of rows that contain a particular project id, for example:
select count(*) from projects where '4' in (project_ids);
This returns a count of just 1, which is incorrect (it should be 3), but I think it illustrates what I'm attempting to do.
CREATE TABLE `projects` (
`project_ids` varchar(255) DEFAULT NULL
);
INSERT INTO `projects` (`project_ids`)
VALUES
('1,2,4'),
('1,2'),
('4'),
('4,5,2'),
('1,2,5');
I was hoping that there might be a simple MySQL function that would achieve this, so that I don't have to do anything complex SQL-wise.
You could use this approach:
SELECT COUNT(*)
FROM projects
WHERE CONCAT(',', project_ids, ',') LIKE '%,4,%';
Or use FIND_IN_SET for a built-in way:
SELECT COUNT(*)
FROM projects
WHERE FIND_IN_SET('4', project_ids) > 0;
But, as Gordon's comment alludes to, a much better table design would be a junction table that relates a primary key in one table to all projects in another table. Based on your sample data, that junction table would look like this:
PK | project_id
1 | 1
1 | 2
1 | 4
2 | 1
2 | 2
3 | 4
4 | 4
4 | 5
4 | 2
5 | 1
5 | 2
5 | 5
With this design, if you wanted to find the number of PKs having a project_id of 4, you would need only a much simpler (and sargable) query:
SELECT COUNT(*)
FROM junction_table
WHERE project_id = 4;
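Here is a minimal sketch of the junction-table design, assuming SQLite (via Python's sqlite3) as a stand-in for MySQL. The column name `pk` and the index name `ix_project_id` are illustrative. The CSV strings are split once, at load time. After that, the count becomes a plain equality lookup that the index on `project_id` can serve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical junction table: one row per (original row, project id) pair.
conn.execute(
    "CREATE TABLE junction_table ("
    " pk INTEGER NOT NULL,"
    " project_id INTEGER NOT NULL,"
    " PRIMARY KEY (pk, project_id))"
)
# Secondary index so WHERE project_id = ? does not scan the whole table.
conn.execute("CREATE INDEX ix_project_id ON junction_table (project_id)")

csv_rows = ["1,2,4", "1,2", "4", "4,5,2", "1,2,5"]
# Split each CSV value once, when loading, instead of on every query.
conn.executemany(
    "INSERT INTO junction_table (pk, project_id) VALUES (?, ?)",
    [(pk, int(pid)) for pk, csv in enumerate(csv_rows, start=1) for pid in csv.split(",")],
)

(count,) = conn.execute(
    "SELECT COUNT(*) FROM junction_table WHERE project_id = ?", (4,)
).fetchone()
print(count)
```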
You would need to use a LIKE condition, as follows:
select count(*)
from projects
where concat(',',project_ids,',') like '%,4,%';
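The comma-wrapping trick used in the LIKE answers can be checked with this small sketch. It assumes SQLite (via Python's sqlite3) instead of MySQL, so `||` replaces CONCAT and FIND_IN_SET is not available. The helper name `count_rows_containing` is hypothetical. Wrapping both the stored list and the search term in commas means '4' cannot accidentally match '14' or '45'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (project_ids VARCHAR(255) DEFAULT NULL)")
conn.executemany(
    "INSERT INTO projects (project_ids) VALUES (?)",
    [("1,2,4",), ("1,2",), ("4",), ("4,5,2",), ("1,2,5",)],
)


def count_rows_containing(conn, project_id):
    # SQLite concatenates with ||; MySQL would use CONCAT(',', project_ids, ',').
    sql = "SELECT COUNT(*) FROM projects WHERE ',' || project_ids || ',' LIKE ?"
    (count,) = conn.execute(sql, (f"%,{project_id},%",)).fetchone()
    return count


# The question's original attempt compares '4' with the whole string, so only the row '4' matches.
(naive,) = conn.execute("SELECT COUNT(*) FROM projects WHERE '4' IN (project_ids)").fetchone()
print(naive, count_rows_containing(conn, "4"))
```

Both variants still have to scan every row, which is why the junction-table design is the more scalable fix.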

Extract key, value from json objects in Postgres

I have a Postgres table that has content similar to this:
id | data
1 | {"a":"4", "b":"5"}
2 | {"a":"6", "b":"7"}
3 | {"a":"8", "b":"9"}
The first column is an integer and the second is a json column.
I want to be able to expand out the keys and values from the json so the result looks like this:
id | key | value
1 | a | 4
1 | b | 5
2 | a | 6
2 | b | 7
3 | a | 8
3 | b | 9
Can this be achieved in PostgreSQL?
What I've tried
Given that the original table can be simulated as such:
select *
from
(
values
(1, '{"a":"4", "b":"5"}'::json),
(2, '{"a":"6", "b":"7"}'::json),
(3, '{"a":"8", "b":"9"}'::json)
) as q (id, data)
I can get just the keys using:
select id, json_object_keys(data::json)
from
(
values
(1, '{"a":"4", "b":"5"}'::json),
(2, '{"a":"6", "b":"7"}'::json),
(3, '{"a":"8", "b":"9"}'::json)
) as q (id, data)
And I can get them as record sets like this:
select id, json_each(data::json)
from
(
values
(1, '{"a":"4", "b":"5"}'::json),
(2, '{"a":"6", "b":"7"}'::json),
(3, '{"a":"8", "b":"9"}'::json)
) as q (id, data)
But I can't work out how to achieve the result with id, key and value.
Any ideas?
Note: the real json I'm working with is significantly more nested than this, but I think this example represents my underlying problem well.
SELECT q.id, d.key, d.value
FROM q
JOIN json_each_text(q.data) d ON true
ORDER BY 1, 2;
The function json_each_text() is a set-returning function, so you should use it as a row source. Here its output is joined laterally to the table q. For each row in the table, the (key, value) pairs from that row's data column are joined only to that row. That keeps the relationship between the original row and the rows formed from the JSON object.
The table q can also be a very complicated sub-query (or a VALUES clause, like in your question). Inside the function call you reference the column from the sub-query's result, so all you need is the alias of the sub-query and the (alias of the) column in it.
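The per-row pairing that the lateral join performs can be mimicked in plain Python. This is only an illustration of the semantics, not a way to run PostgreSQL. The function names `as_text` and `expand` are hypothetical. The text conversion is an assumption that approximates json_each_text:

- String values are returned as-is.
- Other values are re-serialized as JSON text.
- JSON null becomes None, standing in for SQL NULL.

```python
import json

# The sample rows from the question: (id, json text).
rows = [
    (1, '{"a":"4", "b":"5"}'),
    (2, '{"a":"6", "b":"7"}'),
    (3, '{"a":"8", "b":"9"}'),
]


def as_text(value):
    """Approximate json_each_text's value column (assumed conversion rules)."""
    if value is None:
        return None  # JSON null -> SQL NULL
    if isinstance(value, str):
        return value
    return json.dumps(value)  # nested objects, numbers, booleans as JSON text


def expand(rows):
    """Emulate `FROM q JOIN json_each_text(q.data) d ON true ORDER BY 1, 2`."""
    out = []
    for row_id, data in rows:
        # The "function" runs once per outer row; its pairs stay tied to that row's id.
        for key, value in json.loads(data).items():
            out.append((row_id, key, as_text(value)))
    return sorted(out, key=lambda r: (r[0], r[1]))


for row in expand(rows):
    print(row)
```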
This will solve it as well:
select your_table.id, js.key, js.value
from your_table, json_each(your_table.data) as js
Another way, which I think is very easy to work with when you have multiple JSON values to join, is something like the following (it requires the hstore extension):
SELECT data -> 'key' AS key,
data -> 'value' AS value
FROM (SELECT Hstore(Json_each_text(data)) AS data
FROM "your_table") t;
You can write:
select js.key , js.value
from metadata, json_each(metadata.column_metadata) as js
where id='6eec';

SQL NOT IN [list of ids] (performance)

I'm just wondering whether the number of ids in a list will influence query performance.
Example query:
SELECT * FROM foos WHERE foos.ID NOT IN (2, 4, 5, 6, 7)
Where (2, 4, 5, 6, 7) is an indefinitely long list.
And how many is too many (in terms of order of magnitude)?
UPDATE: The reason I'm asking is that I have two databases. One of them (read-only) is the source of items, and the other contains items that have already been processed by an operator. Every time the operator asks for a new item from the read-only db, I want to exclude items that have already been processed.
Yes, the number of IDs in the list will impact performance. A network packet is only so big, for example, and the database has to parse all that noise and turn it into a series of:
WHERE foos.ID <> 2
AND foos.ID <> 4
AND foos.ID <> 5
AND ...
You should consider other ways to let your query know about this set.
Here is a wacky rewrite of that query that might perform a little better:
SELECT * FROM foos
LEFT JOIN
(
SELECT 2 id UNION
SELECT 4 UNION
SELECT 5 UNION
SELECT 6 UNION
SELECT 7
) NOT_IDS
USING (id) WHERE NOT_IDS.id IS NULL;
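Here is a runnable comparison of NOT IN and an anti-join, assuming SQLite (via Python's sqlite3) in place of MySQL. The sketch swaps the inline UNION derived table for a temporary table, named `not_ids` for illustration. That is one of the "other ways to let your query know about this set", and it scales better when the list is long. Two caveats worth knowing:

- A NOT IN list that contains a NULL matches no rows at all.
- Drivers cap the number of bound parameters per statement. For SQLite this is SQLITE_MAX_VARIABLE_NUMBER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foos (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO foos (id, name) VALUES (?, ?)", [(i, f"foo{i}") for i in range(1, 11)])

excluded = [2, 4, 5, 6, 7]

# Variant 1: NOT IN with one bound placeholder per excluded id.
placeholders = ", ".join("?" * len(excluded))
not_in = conn.execute(
    f"SELECT id FROM foos WHERE id NOT IN ({placeholders}) ORDER BY id", excluded
).fetchall()

# Variant 2: load the ids into a temporary table once and anti-join against it.
conn.execute("CREATE TEMP TABLE not_ids (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO not_ids (id) VALUES (?)", [(i,) for i in excluded])
anti_join = conn.execute(
    "SELECT foos.id FROM foos "
    "LEFT JOIN not_ids ON not_ids.id = foos.id "
    "WHERE not_ids.id IS NULL ORDER BY foos.id"
).fetchall()

# Pitfall: a single NULL in a NOT IN list makes the predicate never true.
(with_null,) = conn.execute("SELECT COUNT(*) FROM foos WHERE id NOT IN (2, NULL)").fetchone()

print(not_in, anti_join, with_null)
```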
The NOT_IDS subquery does work as shown by the following:
mysql> SELECT * FROM
-> (
-> SELECT 2 id UNION
-> SELECT 4 UNION
-> SELECT 5 UNION
-> SELECT 6 UNION
-> SELECT 7
-> ) NOT_IDS;
+----+
| id |
+----+
| 2 |
| 4 |
| 5 |
| 6 |
| 7 |
+----+
5 rows in set (0.00 sec)
mysql>
Just for fun, and given your update, I'm going to suggest a different strategy:
You could join across tables like so ...
insert into db1.foos (cols)
select cols
from db2.foos src
left join db1.foos dst
on src.pk = dst.pk
where dst.othercolumn is null
I'm not sure how the optimizer will handle this or if it's going to be faster (depends on your indexing strategy, I guess) than what you're doing.
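Here is a small sketch of the cross-database "copy what is not there yet" pattern. It assumes SQLite (via Python's sqlite3), where ATTACH plays the role of MySQL's `db1.`/`db2.` qualifiers. The schema names `src_db` and `main` and the `payload` column are illustrative. The sketch tests the destination's primary key for NULL rather than another column. A non-nullable column can only be NULL in a LEFT JOIN result when no match was found.

```python
import sqlite3

# One connection, two databases: "main" is the operator's db, "src_db" the read-only source.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS src_db")
conn.execute("CREATE TABLE src_db.foos (pk INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("CREATE TABLE main.foos (pk INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO src_db.foos VALUES (?, ?)", [(i, f"item{i}") for i in range(1, 6)])
conn.executemany("INSERT INTO main.foos VALUES (?, ?)", [(2, "item2"), (4, "item4")])

# Copy only the source rows with no counterpart in the destination yet.
cur = conn.execute(
    "INSERT INTO main.foos (pk, payload) "
    "SELECT src.pk, src.payload FROM src_db.foos AS src "
    "LEFT JOIN main.foos AS dst ON src.pk = dst.pk "
    "WHERE dst.pk IS NULL"
)
inserted = cur.rowcount

pks = [pk for (pk,) in conn.execute("SELECT pk FROM main.foos ORDER BY pk")]
print(inserted, pks)
```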
Are the databases on the same server? If so, you can write a multi-database query with a LEFT JOIN and keep the rows where the joined side is NULL (here is an example: Querying multiple databases at once). Otherwise, you can write a stored procedure, pass the ids in as a string, and split them inside with a regular expression. I have a similar problem, but with an in-memory db and a Postgres db. Luckily, my situation is (In...)

Efficiency question - Selecting numeric data from one field

I have a pair of tables and I need to search for numeric values in Table1 that match associated IDs on Table2. For example:
Table1
ID | Item
1 | Cat
3 | Frog
9 | Dog
11 | Horse
Table2
Category | Contains
Group 1 | '1'
Group 2 | '3|9'
Group 3 | '3|9|11'
Originally I was thinking a LIKE would work, but if I searched for "1", I'd end up matching "11". I looked into SETs, but the MySQL docs state that the maximum number of elements is 64 and I have over 200 rows of items in Table1. I could wrap each item id with a character (e.g. "|1|") but that doesn't seem very efficient. Each Group will have unique items (e.g., there won't be two Cats in the same Group).
I found a similar topic as my problem and one of the answers suggested making another table, but I don't understand how that would work. A new table containing what, exactly?
The other option I have is to split Contains into 6 separate columns, since there will never be more than 6 items in a Group, but then I'm not sure how to search all 6 columns without relying on six OR conditions:
Category | C1 | C2 | C3 | C4 (etc.)
Group 1 | 1 | null | null | null
Group 2 | 3 | 9 | null | null
Group 3 | 3 | 9 | 11 | null
SELECT * FROM Table2 WHERE C1 = '1' OR C2 = '1' OR C3 = '1' etc.
I'm not sure what the most efficient way of handling this is. I could use some advice from those with more experience with normalizing this kind of data please. Thank you.
I think it'd be best to create another table to normalize your data; however, what you're proposing is not exactly what I'd suggest.
Realistically what you are modeling is a many-to-many relationship between table1 and table2. This means that one row in table1 can be associated with many rows in table2, and vice versa.
In order to create this kind of relation, you need a third table, which we can call rel_table1_table2 for now.
rel_table1_table2 will contain only primary key values from the two associated tables, which in this case seem to be table1.ID and table2.Category.
When you want to associate a row in table1 with a row in table2, you'd add a row to rel_table1_table2 with the primary key values from table1 and table2 respectively.
Example:
INSERT INTO rel_table1_table2 (ID, Category) VALUES (1, "Group 1")
When you need to find out what Items belong to a Category, you'd simply query your association table, for example:
SELECT t1.Item FROM table1 t1 JOIN rel_table1_table2 r ON t1.ID = r.ID JOIN table2 t2 ON r.Category = t2.Category WHERE t2.Category = "Group 3"
Does that make sense?
That "new" table would contain one row for each category an animal belongs to.
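-- column types below are illustrative assumptions
create table animal(
animal_id int not null
,name varchar(50) not null
,primary key(animal_id)
);
create table category(
category_id int not null
,name varchar(50) not null
,primary key(category_id)
);
create table animal_categories(
animal_id int not null
,category_id int not null
,primary key(animal_id, category_id)
);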
For your example data, the animal_categories table would contain:
category_id | animal_id
------------+----------
1 | 1
2 | 3
2 | 9
3 | 3
3 | 9
3 | 11
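Here is a runnable sketch of the animal / category / animal_categories design, assuming SQLite (via Python's sqlite3) in place of MySQL. The helper names `animals_in` and `categories_containing` are hypothetical. Both directions of the many-to-many relationship become simple joins. Looking up item 1 no longer risks matching item 11, because ids are compared as whole values rather than as substrings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE animal (animal_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE animal_categories (
    animal_id INTEGER NOT NULL REFERENCES animal (animal_id),
    category_id INTEGER NOT NULL REFERENCES category (category_id),
    PRIMARY KEY (animal_id, category_id)
);
INSERT INTO animal VALUES (1, 'Cat'), (3, 'Frog'), (9, 'Dog'), (11, 'Horse');
INSERT INTO category VALUES (1, 'Group 1'), (2, 'Group 2'), (3, 'Group 3');
INSERT INTO animal_categories (category_id, animal_id) VALUES
    (1, 1), (2, 3), (2, 9), (3, 3), (3, 9), (3, 11);
""")


def animals_in(conn, category_name):
    """Names of all animals in one category, ordered by animal id."""
    sql = """
        SELECT a.name
        FROM animal AS a
        JOIN animal_categories AS ac ON ac.animal_id = a.animal_id
        JOIN category AS c ON c.category_id = ac.category_id
        WHERE c.name = ?
        ORDER BY a.animal_id
    """
    return [name for (name,) in conn.execute(sql, (category_name,))]


def categories_containing(conn, animal_id):
    """Names of all categories that contain one animal (exact id match)."""
    sql = """
        SELECT c.name
        FROM category AS c
        JOIN animal_categories AS ac ON ac.category_id = c.category_id
        WHERE ac.animal_id = ?
        ORDER BY c.category_id
    """
    return [name for (name,) in conn.execute(sql, (animal_id,))]


print(animals_in(conn, "Group 3"), categories_containing(conn, 1))
```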
Instead of using LIKE, use REGEXP so that you don't match "11" when looking for "1".
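Here is a sketch of what such a REGEXP filter could look like. The pattern `(^|[|])1([|]|$)` is one assumed way to anchor an id between `|` separators or the string ends. The sketch runs on SQLite through Python's sqlite3, which has no built-in REGEXP. A user function named `REGEXP` is registered instead, because SQLite evaluates `X REGEXP Y` as `regexp(Y, X)`. In MySQL the same pattern would go directly into `Contains REGEXP '...'`. Like LIKE, this cannot use an index, so normalizing is still the better long-term fix.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite calls regexp(pattern, value) for "value REGEXP pattern".
conn.create_function(
    "REGEXP", 2, lambda pattern, value: value is not None and re.search(pattern, value) is not None
)
conn.execute('CREATE TABLE Table2 (Category TEXT, "Contains" TEXT)')
conn.executemany(
    "INSERT INTO Table2 VALUES (?, ?)",
    [("Group 1", "1"), ("Group 2", "3|9"), ("Group 3", "3|9|11")],
)


def categories_with(conn, item_id):
    # The id must be preceded by start-of-string or '|' and followed by '|' or end-of-string.
    pattern = rf"(^|[|]){item_id}([|]|$)"
    rows = conn.execute(
        'SELECT Category FROM Table2 WHERE "Contains" REGEXP ? ORDER BY Category', (pattern,)
    )
    return [category for (category,) in rows]


print(categories_with(conn, 1), categories_with(conn, 11))
```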
Break Table2.Contains in another table which joins Item and Category:
Item Item_Category Category
------ -------------- ---------
ID (1)----(*)ItemID Name
Name CategoryID(*)-------(1) ID
Now, your query will look like:
SELECT Category.* FROM Category, Item_Category
WHERE (Item_Category.CategoryID = Category.ID)
AND (Item_Category.ItemID IN (1, 2, 3, 11))
It seems like your problem is the way you are using the rows in Table 2. In a database, finding yourself storing a list of values in a single row should always trigger a red flag.
Rather than having each category be in a single row in table 2, how about using the same category in multiple rows, with the Contains column only storing a single value. Your example could be changed to:
Table 1
ID | Item
1 | Cat
3 | Frog
9 | Dog
11 | Horse
Table 2
Category | Contains
Group 1 | 1
Group 2 | 3
Group 2 | 9
Group 3 | 3
Group 3 | 9
Group 3 | 11
Now when you want to find out "What items does group 2 contain?", you can write a query for that which selects all of the "Group 2" category rows from Table 2. When you want to find out, "What is the name of item 9", you can write a query that selects a row from Table 1.

Fetching linked list in MySQL database

I have a MySQL database table with this structure:
table
id INT NOT NULL PRIMARY KEY
data ..
next_id INT NULL
I need to fetch the data in order of the linked list. For example, given this data:
id | next_id
----+---------
1 | 2
2 | 4
3 | 9
4 | 3
9 | NULL
I need to fetch the rows for id=1, 2, 4, 3, 9, in that order. How can I do this with a database query? (I can do it on the client end, but I am curious whether this can be done on the database side, so saying it's impossible is okay, given enough proof.)
It would be nice to have a termination point as well (e.g. stop after 10 fetches, or when some condition on the row turns true) but this is not a requirement (can be done on client side). I (hope I) do not need to check for circular references.
Some brands of database (e.g. Oracle, Microsoft SQL Server) support extra SQL syntax to run "recursive queries". MySQL did not support any such solution before version 8.0, which added recursive common table expressions (WITH RECURSIVE).
The problem you are describing is the same as representing a tree structure in a SQL database. You just have a long, skinny tree.
There are several solutions for storing and fetching this kind of data structure from an RDBMS. See some of the following questions:
"What is the most efficient/elegant way to parse a flat table into a tree?"
"Is it possible to make a recursive SQL query ?"
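For databases that support recursive common table expressions, a recursive query can walk the list directly. Examples include SQLite, PostgreSQL and MySQL 8.0+. The sketch below assumes SQLite via Python's sqlite3, and the helper name `walk` is hypothetical. The query should carry over to MySQL 8.0 with little or no change, but that is untested here. The `depth` column serves two purposes:

- It provides the requested termination point (stop after N fetches).
- It prevents an accidental cycle from recursing forever.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, data TEXT, next_id INTEGER NULL)")
conn.executemany(
    "INSERT INTO mytable (id, data, next_id) VALUES (?, ?, ?)",
    [(1, "one", 2), (2, "two", 4), (3, "three", 9), (4, "four", 3), (9, "nine", None)],
)


def walk(conn, head_id, max_nodes=10):
    """Return ids in linked-list order, starting at head_id, at most max_nodes of them."""
    sql = """
        WITH RECURSIVE chain (id, data, next_id, depth) AS (
            SELECT id, data, next_id, 1 FROM mytable WHERE id = ?
            UNION ALL
            SELECT m.id, m.data, m.next_id, c.depth + 1
            FROM mytable AS m
            JOIN chain AS c ON m.id = c.next_id
            WHERE c.depth < ?          -- termination point / cycle guard
        )
        SELECT id FROM chain ORDER BY depth
    """
    return [row_id for (row_id,) in conn.execute(sql, (head_id, max_nodes))]


print(walk(conn, 1))
print(walk(conn, 1, max_nodes=3))
```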
Since you mention that you'd like to limit the "depth" returned by the query, you can achieve this while querying the list this way:
SELECT * FROM mytable t1
LEFT JOIN mytable t2 ON (t1.next_id = t2.id)
LEFT JOIN mytable t3 ON (t2.next_id = t3.id)
LEFT JOIN mytable t4 ON (t3.next_id = t4.id)
LEFT JOIN mytable t5 ON (t4.next_id = t5.id)
LEFT JOIN mytable t6 ON (t5.next_id = t6.id)
LEFT JOIN mytable t7 ON (t6.next_id = t7.id)
LEFT JOIN mytable t8 ON (t7.next_id = t8.id)
LEFT JOIN mytable t9 ON (t8.next_id = t9.id)
LEFT JOIN mytable t10 ON (t9.next_id = t10.id);
It'll perform like molasses, and the result will come back all on one row (per linked list), but you'll get the result.
If what you are trying to avoid is having several queries (one for each node) and you are able to add columns, then you could have a new column that links to the root node. That way you can pull in all the data at once by the root id, but you will still have to sort the list (or tree) on the client side.
So in this example you would have:
id | next_id | root_id
----+---------+---------
1 | 2 | 1
2 | 4 | 1
3 | 9 | 1
4 | 3 | 1
9 | NULL | 1
Of course, the disadvantage of this compared with traditional linked lists or trees is that the root cannot change without O(n) writes, where n is the number of nodes, because you would have to update the root id of every node. Fortunately, you should always be able to do this in a single UPDATE query unless you are dividing a list/tree in the middle.
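With the root_id approach, the client still has to order the fetched rows. This minimal sketch assumes the (id, next_id) pairs were already fetched with something like `WHERE root_id = ?`, and that they form a single list rather than a tree. The function name `order_linked_list` is hypothetical. It finds the head as the only id that no other row points to, then follows the next pointers, refusing to loop on a cycle.

```python
def order_linked_list(rows):
    """Order (id, next_id) pairs by following next pointers from the head.

    Assumes the rows describe one linear list; raises ValueError otherwise.
    """
    next_of = {row_id: next_id for row_id, next_id in rows}
    pointed_to = {next_id for next_id in next_of.values() if next_id is not None}
    heads = [row_id for row_id in next_of if row_id not in pointed_to]
    if len(heads) != 1:
        raise ValueError(f"expected exactly one head, found {heads}")

    ordered, seen = [], set()
    current = heads[0]
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at id {current}")
        seen.add(current)
        ordered.append(current)
        current = next_of.get(current)  # None once the tail is reached
    return ordered


# The rows from the example above, in arbitrary fetch order.
print(order_linked_list([(3, 9), (1, 2), (9, None), (4, 3), (2, 4)]))
```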
This is less a solution and more of a workaround but, for a linear list (rather than the tree Bill Karwin mentioned), it might be more efficient to use a sort column on your list. For example:
CREATE TABLE `schema`.`my_table` (
`id` INT NOT NULL PRIMARY KEY,
`order` INT,
`data` TEXT, -- placeholder for the remaining data columns
INDEX `ix_order` (`order` ASC)
);
Then:
SELECT * FROM `schema`.`my_table` ORDER BY `order`;
This has the disadvantage of slower inserts (you have to reposition all sorted elements past the insertion point) but should be fast for retrieval because the order column is indexed.