I'm attempting to take an existing application and re-architect the schema to support new customer requests and fix several outstanding issues (mostly around our current schema being heavily denormalized). In doing so, I've reached an interesting problem which at first glance seems to have a simple solution, but I can't seem to find the function I'm looking for.
The application is a media organization tool.
Our Old Schema:
Our old schema had separate models for "Groups", "Subgroups", and "Videos". A Group could have many Subgroups (one-to-many) and a Subgroup could have many Videos (one-to-many).
There were certain fields that were shared among Groups, Subgroups, and Videos. For instance, the Google Analytics ID to be used when the Video was embedded on a page. Whenever we displayed the embed page we would first look if the value was set on the Video. If not, we checked its Subgroup. If not, we checked its Group. The query looked roughly like so (I wish this were the real query, but unfortunately our application was written over many years by many junior developers, so the truth is much more painful):
SELECT
v.id,
COALESCE(v.google_analytics_id, sg.google_analytics_id, g.google_analytics_id) as google_analytics_id
FROM
Videos v
LEFT JOIN Subgroups sg ON sg.id = v.subgroup_id
LEFT JOIN Groups g ON g.id = sg.group_id
Pretty straight-forward. Now the issue we've run into is that customers want to be able to nest groups arbitrarily deep, and our schema clearly only allows for 2 levels (and, in fact, necessitates two levels - even if you only want one)
New Schema (First Pass):
As a first pass, I knew we'd want a basic tree structure for the Groups, so I came up with this:
CREATE TABLE Groups (
id INT PRIMARY KEY,
name VARCHAR(255),
parent_id INT,
ga_id VARCHAR(20)
)
We can then easily nest up to N levels deep with N joins like so:
SELECT
v.id,
COALESCE(v.ga_id, g1.ga_id, g2.ga_id, g3.ga_id, ...) as ga_id
FROM
Videos v
LEFT JOIN Groups g1 ON g1.id = v.group_id
LEFT JOIN Groups g2 ON g2.id = g1.parent_id
LEFT JOIN Groups g3 ON g3.id = g2.parent_id
...
There are obvious flaws with this approach: we don't know how many parents there will be, so we don't know how many times to JOIN, which forces us to implement a "max depth". Even with a max depth, if a person only has a single level of groups we still perform every JOIN, because our queries can't know how deep they need to go. MySQL offers recursive queries, but while looking into whether that was the right option I found a smarter schema that produced the same results.
New Schema (Take 2):
Looking into better ways to handle a tree structure, I learned about Adjacency Lists (my prior solution), Nested Sets, Materialized Paths, and Closure Tables. Other than Adjacency Lists (which depend on JOINs to grab the entire tree structure and so produce a single row with multiple columns per node on the tree), the other three solutions all return multiple rows for each node on the tree.
I ended up going with a Closure Table solution like so:
CREATE TABLE Groups (
id INT PRIMARY KEY,
name VARCHAR(255),
ga_id VARCHAR(20)
)
CREATE TABLE Group_Closure (
ancestor_id INT,
descendant_id INT,
PRIMARY KEY (ancestor_id, descendant_id)
)
Now given a Video I can get all of its parents like so:
SELECT
v.id,
v.ga_id,
g.id,
g.ga_id
FROM
Videos v
JOIN Group_Closure gc ON v.group_id = gc.descendant_id
JOIN Groups g ON g.id = gc.ancestor_id;
This returns each group in the hierarchy as a separate row:
+------+---------+------+---------+
| v.id | v.ga_id | g.id | g.ga_id |
+------+---------+------+---------+
| 1 | abc123 | 2 | new_val |
| 1 | abc123 | 1 | default |
| 2 | NULL | 4 | xyz987 |
| 2 | NULL | 3 | NULL |
| 2 | NULL | 1 | default |
| 3 | NULL | 3 | NULL |
| 3 | NULL | 1 | default |
+------+---------+------+---------+
What I wish to do now is somehow achieve the same result I would have expected from using COALESCE on multiple self-joined Group tables: a single value for ga_id based on whichever node is "lowest" in the tree
Because I have multiple rows per Video, I suspect that this can be accomplished using GROUP BY and some kind of aggregate function:
SELECT
v.id,
COALESCE(v.ga_id, FIRST_NON_NULL(g.ga_id))
FROM
Videos v
JOIN Group_Closure gc ON v.group_id = gc.descendant_id
JOIN Groups g ON g.id = gc.ancestor_id
GROUP BY v.id, v.ga_id;
Note that because (ancestor, descendant) is my primary key, I believe the order of the group closure table can be guaranteed to always come back the same - meaning if I put the lowest node first, it will be the first row in the resulting query... If my understanding of this is incorrect, please let me know.
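On the ordering point: SQL makes no promise about row order without an ORDER BY, whatever the primary key is, so "first non-NULL" needs an explicit ordering. One common way to get it is to store the distance between ancestor and descendant in the closure table. The sketch below assumes a hypothetical depth column (0 for a group's row pointing at itself) that is not part of the schema above, uses sample rows reconstructed from the result table in the question, and runs against an in-memory SQLite database from Python only so that it is self-contained. The SELECT is plain SQL, though it has not been checked against MySQL here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Groups" (id INT PRIMARY KEY, name VARCHAR(255), ga_id VARCHAR(20));
CREATE TABLE Group_Closure (
    ancestor_id INT,
    descendant_id INT,
    depth INT,  -- hypothetical column: 0 = the group itself, 1 = its parent, ...
    PRIMARY KEY (ancestor_id, descendant_id)
);
CREATE TABLE Videos (id INT PRIMARY KEY, group_id INT, ga_id VARCHAR(20));

INSERT INTO "Groups" VALUES
    (1, 'root', 'default'), (2, 'a', 'new_val'), (3, 'b', NULL), (4, 'c', 'xyz987');
INSERT INTO Group_Closure VALUES
    (1, 1, 0), (2, 2, 0), (1, 2, 1), (3, 3, 0),
    (1, 3, 1), (4, 4, 0), (3, 4, 1), (1, 4, 2);
INSERT INTO Videos VALUES (1, 2, 'abc123'), (2, 4, NULL), (3, 3, NULL);
""")

# For each video, fall back to the nearest ancestor group with a non-NULL ga_id.
NEAREST_GA_ID = """
SELECT v.id,
       COALESCE(v.ga_id,
                (SELECT g.ga_id
                 FROM Group_Closure gc
                 JOIN "Groups" g ON g.id = gc.ancestor_id
                 WHERE gc.descendant_id = v.group_id
                   AND g.ga_id IS NOT NULL
                 ORDER BY gc.depth      -- nearest ancestor first
                 LIMIT 1)) AS ga_id
FROM Videos v
ORDER BY v.id
"""

rows = conn.execute(NEAREST_GA_ID).fetchall()
print(rows)  # [(1, 'abc123'), (2, 'xyz987'), (3, 'default')]
```

The correlated subquery replaces the imagined FIRST_NON_NULL aggregate: filtering out NULLs and ordering by depth makes "lowest in the tree" explicit instead of relying on storage order.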
If you were to stick with an adjacency list, you could use a recursive CTE. This one traverses up from each video id value until it finds a non-NULL ga_id:
WITH RECURSIVE CTE AS (
SELECT id, ga_id, group_id
FROM videos
UNION ALL
SELECT CTE.id, COALESCE(CTE.ga_id, g.ga_id), g.parent_id
FROM `groups` g
JOIN CTE ON g.id = CTE.group_id AND CTE.ga_id IS NULL
)
SELECT id, ga_id
FROM CTE
WHERE ga_id IS NOT NULL
For my attempt to reconstruct your data from your question, this yields:
id ga_id
1 abc123
2 xyz987
3 default
Demo on dbfiddle
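For anyone who wants to run the recursive CTE locally without the fiddle, here is a minimal sketch using Python's built-in sqlite3 module (SQLite also supports WITH RECURSIVE). The sample rows are a reconstruction of the question's data as an adjacency list and are an assumption, not the asker's real tables; an ORDER BY is added so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "groups" (id INT PRIMARY KEY, name VARCHAR(255),
                       parent_id INT, ga_id VARCHAR(20));
CREATE TABLE videos (id INT PRIMARY KEY, group_id INT, ga_id VARCHAR(20));

INSERT INTO "groups" VALUES
    (1, 'root', NULL, 'default'), (2, 'a', 1, 'new_val'),
    (3, 'b', 1, NULL),            (4, 'c', 3, 'xyz987');
INSERT INTO videos VALUES (1, 2, 'abc123'), (2, 4, NULL), (3, 3, NULL);
""")

# Same shape as the answer's query: climb one parent per step,
# but only for rows that still have no ga_id.
CLIMB = """
WITH RECURSIVE CTE AS (
    SELECT id, ga_id, group_id
    FROM videos
    UNION ALL
    SELECT CTE.id, COALESCE(CTE.ga_id, g.ga_id), g.parent_id
    FROM "groups" g
    JOIN CTE ON g.id = CTE.group_id AND CTE.ga_id IS NULL
)
SELECT id, ga_id
FROM CTE
WHERE ga_id IS NOT NULL
ORDER BY id
"""

rows = conn.execute(CLIMB).fetchall()
print(rows)  # [(1, 'abc123'), (2, 'xyz987'), (3, 'default')]
```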
I'm a beginner when it comes to MySQL and I've taken it upon myself to create a type of translating service like Google Translate. The problem is that the query results are not displayed in the order I enter the words; instead they seem to be ordered by the ID column.
I've tried (with my limited knowledge) looking into different ways of creating relations etc. to display the equivalent words in the different languages. For now I've landed on trying to use the INNER JOIN clause to display and "structure" the sentences.
SELECT swedish.word,
german.word,
german.swear,
swedish.swear,
swedish.id
FROM swedish
INNER JOIN german
ON swedish.id = german.id
WHERE swedish.word = "Hej"
OR swedish.word = "Mitt"
OR swedish.word = "Namn"
OR swedish.word = "Är";
This displays the Swedish words alongside the German words, i.e. it creates sentences, but it does not display them in the order I typed them in; instead it sorts by the ID column, which mixes the words around. Is there any solution to this?
Here's an image of the results, ordered by the ID:
I've thought about using ORDER BY and some sort of temporary value and then order it by that but then I'm not sure about how to implement and auto increment that value for only the selected entries/rows.
I'm using OR statements to enable more than one entry in the same result, as parentheses (seen in other tutorials) gave me syntax errors.
Also, if there is a better way of going about this please let me know!
EDIT: I want to clarify that I am aware that this is not a sustainable solution for creating a translation service; I simply thought this would be an interesting way to understand a bit more about how you can connect and work with different tables etc.
You can use FIND_IN_SET
ORDER BY FIND_IN_SET(swedish.word, 'Hej,Mitt,Namn,Är');
I would suggest a subquery with prioritization:
SELECT s.word, g.word, g.swear, s.swear, s.id
FROM swedish s JOIN
(SELECT 'Hej' as word, 1 as ord UNION ALL
SELECT 'Mitt' as word, 2 as ord UNION ALL
SELECT 'Namn' as word, 3 as ord UNION ALL
SELECT 'Är' as word, 4 as ord
) w
ON s.word = w.word JOIN
german g
ON s.id = g.id
ORDER BY w.ord;
The advantage of this approach over other approaches is that the list of words is only included once. This makes it easier to update and prevents errors when writing the query.
Also, if there is a better way of going about this please let me know!
It isn't the database's job to do this; it's the front end's job.
If you have the sentence;
Hej Mitt Namn Ar Caius
Then the front end should do something like this (pseudocode):
string newsentence = ""
foreach(word in sentence.split(' '))
newsentence = newsentence + " " + dblookup(word)
(You can assume dblookup is a helper method that takes a single [swedish] word and returns the equivalent single [german] word)
The order is preserved because you perform database lookups in order as you traverse the sentence. You don’t try to send all the words to the db, and force order the results so you can just concat them back into a sentence, you look up one word at a time. If you have the same word twice in a sentence, all the approaches here (in other answers - at the time of writing this answer) will break; a sentence of “hej mitt hej” will come back ordered as “hallo hallo meine” because you can’t ask the db to order hej as both first and third, all the “hej” will order to be first
There isn’t much to be gained by submitting multiple words for translation, some minor performance benefit maybe but it would be trivial. If you were engineering this solution for performance you could have your dblookup method cache a few hundred thousand most recently requested words, but don’t bang your head on the wall of trying to submit an entire sentence to the db in “or or or” style and preserving the order; it’s so complicated to do so and for no practical benefit
As a brief aside, this isn't how languages work either, though I appreciate that this is the very early stages and you may just be undertaking this as a learning exercise: you cannot make translation software by translating each word individually.
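The pseudocode above could be fleshed out roughly as follows. This is only a sketch: it assumes swedish and german tables that share an id per word (as in the question), uses made-up sample rows, lowercases words before the lookup, and passes unknown words through unchanged. The function name translate_sentence and all of those choices are mine, not the asker's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE swedish (id INTEGER PRIMARY KEY, word TEXT);
CREATE TABLE german  (id INTEGER PRIMARY KEY, word TEXT);
INSERT INTO swedish VALUES (1, 'hej'), (2, 'mitt'), (3, 'namn'), (4, 'är');
INSERT INTO german  VALUES (1, 'hallo'), (2, 'mein'), (3, 'name'), (4, 'ist');
""")

def dblookup(word):
    """Return the German word for one Swedish word, or the word itself if unknown."""
    row = conn.execute(
        "SELECT g.word FROM swedish s JOIN german g ON g.id = s.id WHERE s.word = ?",
        (word.lower(),),
    ).fetchone()
    return row[0] if row else word

def translate_sentence(sentence):
    # One lookup per word, in sentence order, so order and repeats are preserved.
    return " ".join(dblookup(word) for word in sentence.split())

print(translate_sentence("Hej Mitt Namn Är Caius"))  # hallo mein name ist Caius
print(translate_sentence("hej mitt hej"))            # hallo mein hallo
```

Because the order comes from the loop rather than from the result set, a repeated word such as "hej" lands in both of its positions.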
Not sure if that's what you mean, but try this way:
SELECT swedish.word,
german.word,
german.swear,
swedish.swear,
swedish.id
FROM swedish
INNER JOIN german
ON swedish.id = german.id
WHERE swedish.word = "Hej"
OR swedish.word = "Mitt"
OR swedish.word = "Namn"
OR swedish.word = "Är"
ORDER BY field(swedish.word,"Hej","Mitt","Namn","Är");
You can provide your way to sort the rows with CASE in ORDER BY:
SELECT
swedish.word, german.word, german.swear, swedish.swear, swedish.id
FROM swedish INNER JOIN german
ON swedish.id = german.id
WHERE swedish.word IN ('Hej', 'Mitt', 'Namn', 'Är')
ORDER BY CASE swedish.word
WHEN 'Hej' THEN 1
WHEN 'Mitt' THEN 2
WHEN 'Namn' THEN 3
WHEN 'Är' THEN 4
END
There are some fundamental problems with your question:
I've taken it upon myself to create a type of translating service like google translate.
Automated translation is hard, and can't be solved just by word-to-word database lookups (especially if you're "a beginner when it comes to MySQL"). Languages have lots of complicated grammar rules, and you can't translate a sentence just by translating it word-for-word. Take a look at this article: You'd really have to get into something like machine learning, rather than (just) database development.
If you want experience with automated translation, you might want to take a look at the Google Translate API.
Another problem is that you seem to have a separate table/entity for each language. This is problematic: As the number of languages grows, the number of tables will increase. To translate from Language A to Language B, you'll have to know which tables to use, which will likely involve dynamic SQL. It would be far better to properly normalize your data. Something like this:
CREATE TABLE Words
(
word_id INT PRIMARY KEY,
universal_name VARCHAR(255) -- the "universal" way to refer to a word (e.g. you could store the Esperanto version).
);
INSERT INTO Words
(word_id, universal_name)
VALUES
(1, 'hello');
CREATE TABLE word_translations
(
word_id INT NOT NULL,
language VARCHAR(255) NOT NULL,
word_name VARCHAR(255) NOT NULL,
FOREIGN KEY (word_id) REFERENCES Words(word_id)
);
INSERT INTO word_translations
(word_id, language, word_name)
VALUES
(1, 'en', 'hello'),
(1, 'es', 'hola');
But again, this won't really solve the problem of translation, since word-for-word translations aren't sufficient.
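To show how a lookup would work against that normalized layout, here is a small sketch that joins word_translations to itself on word_id. It runs on an in-memory SQLite database from Python so it is self-contained; the helper name translate_word is hypothetical, and the foreign key is written as a table-level constraint for portability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Words (word_id INT PRIMARY KEY, universal_name VARCHAR(255));
CREATE TABLE word_translations (
    word_id INT NOT NULL,
    language VARCHAR(255) NOT NULL,
    word_name VARCHAR(255) NOT NULL,
    FOREIGN KEY (word_id) REFERENCES Words(word_id)
);
INSERT INTO Words VALUES (1, 'hello');
INSERT INTO word_translations VALUES (1, 'en', 'hello'), (1, 'es', 'hola');
""")

def translate_word(word, src_lang, dst_lang):
    """Look up one word; the languages are data, so no dynamic SQL is needed."""
    row = conn.execute(
        """
        SELECT dst.word_name
        FROM word_translations src
        JOIN word_translations dst ON dst.word_id = src.word_id
        WHERE src.language = ? AND src.word_name = ? AND dst.language = ?
        """,
        (src_lang, word, dst_lang),
    ).fetchone()
    return row[0] if row else None

print(translate_word("hello", "en", "es"))  # hola
print(translate_word("hola", "es", "en"))   # hello
```

Adding a language becomes a matter of inserting rows, and the same query serves every language pair.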
Same answer as Gordon Linoff, just automating the numbering of the order of filter list:
CREATE TABLE tbl
(fruit varchar(10), sweetness int)
;
INSERT INTO tbl
(fruit, sweetness)
VALUES
('apple', 7),
('banana', 6),
('papaya', 4),
('grape', 2),
('watermelon', 3)
;
Query, just including Postgres for its expressiveness :)
Live test: http://sqlfiddle.com/#!17/3fa48/3
with a as
(
select *
from unnest(array['banana','grape','apple','banana'])
with ordinality as x(f,i)
)
select tbl.*,'',a.i
from tbl
join a on tbl.fruit = a.f
order by a.i;
Output:
| fruit | sweetness | ?column? | i |
|--------|-----------|----------|---|
| banana | 6 | | 1 |
| grape | 2 | | 2 |
| apple | 7 | | 3 |
| banana | 6 | | 4 |
Query for MySQL:
Live test: http://sqlfiddle.com/#!9/86b0d9/4
select tbl.*, '', a.i
from tbl
join (
select @i := @i + 1 as i, x.f
from
(
select 'banana' as f
union all
select 'grape'
union all
select 'apple'
union all
select 'banana'
) as x
cross join (select @i := 0) y
) a on tbl.fruit = a.f
order by a.i;
Output:
| fruit | sweetness | | i |
|--------|-----------|--|---|
| banana | 6 | | 1 |
| grape | 2 | | 2 |
| apple | 7 | | 3 |
| banana | 6 | | 4 |
I've got the following two tables (in MySQL):
Phone_book
+----+------+--------------+
| id | name | phone_number |
+----+------+--------------+
| 1 | John | 111111111111 |
+----+------+--------------+
| 2 | Jane | 222222222222 |
+----+------+--------------+
Call
+----+------+--------------+
| id | date | phone_number |
+----+------+--------------+
| 1 | 0945 | 111111111111 |
+----+------+--------------+
| 2 | 0950 | 222222222222 |
+----+------+--------------+
| 3 | 1045 | 333333333333 |
+----+------+--------------+
How do I find out which calls were made by people whose phone_number is not in the Phone_book? The desired output would be:
Call
+----+------+--------------+
| id | date | phone_number |
+----+------+--------------+
| 3 | 1045 | 333333333333 |
+----+------+--------------+
There are several different ways of doing this, with varying efficiency, depending on how good your query optimiser is and on the relative size of your two tables:
This is the shortest statement, and may be quickest if your phone book is very short:
SELECT *
FROM Call
WHERE phone_number NOT IN (SELECT phone_number FROM Phone_book)
alternatively (thanks to Alterlife)
SELECT *
FROM Call
WHERE NOT EXISTS
(SELECT *
FROM Phone_book
WHERE Phone_book.phone_number = Call.phone_number)
or (thanks to WOPR)
SELECT *
FROM Call
LEFT OUTER JOIN Phone_Book
ON (Call.phone_number = Phone_book.phone_number)
WHERE Phone_book.phone_number IS NULL
(ignoring that, as others have said, it's normally best to select just the columns you want, not '*')
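One difference between the three forms is worth seeing in action: if the subquery of NOT IN returns even one NULL, NOT IN matches nothing, whereas NOT EXISTS and the LEFT JOIN form are unaffected. The sketch below runs all three against the question's sample data in an in-memory SQLite database (used here only so the example is self-contained), then adds a phone book entry with a NULL number. That extra row is my own invention for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Phone_book (id INT PRIMARY KEY, name TEXT, phone_number TEXT);
CREATE TABLE Call (id INT PRIMARY KEY, date TEXT, phone_number TEXT);
INSERT INTO Phone_book VALUES (1, 'John', '111111111111'), (2, 'Jane', '222222222222');
INSERT INTO Call VALUES (1, '0945', '111111111111'),
                        (2, '0950', '222222222222'),
                        (3, '1045', '333333333333');
""")

QUERIES = {
    "not_in": """SELECT id FROM Call
                 WHERE phone_number NOT IN (SELECT phone_number FROM Phone_book)""",
    "not_exists": """SELECT id FROM Call
                     WHERE NOT EXISTS (SELECT 1 FROM Phone_book
                                       WHERE Phone_book.phone_number = Call.phone_number)""",
    "left_join": """SELECT Call.id FROM Call
                    LEFT OUTER JOIN Phone_book
                      ON Call.phone_number = Phone_book.phone_number
                    WHERE Phone_book.phone_number IS NULL""",
}

def run_all():
    return {name: [row[0] for row in conn.execute(sql)] for name, sql in QUERIES.items()}

before = run_all()   # every form returns [3]

# A contact with no number stored yet (hypothetical row).
conn.execute("INSERT INTO Phone_book VALUES (3, 'Unknown', NULL)")
after = run_all()    # not_in is now [], the other two still return [3]

print(before)
print(after)
```

So if phone_number is nullable in Phone_book, NOT EXISTS or the LEFT JOIN form is the safer choice, or the NOT IN subquery needs a WHERE phone_number IS NOT NULL.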
SELECT Call.ID, Call.date, Call.phone_number
FROM Call
LEFT OUTER JOIN Phone_Book
ON (Call.phone_number=Phone_book.phone_number)
WHERE Phone_book.phone_number IS NULL
Should remove the subquery, allowing the query optimiser to work its magic.
Also, avoid "SELECT *" because it can break your code if someone alters the underlying tables or views (and it's inefficient).
The code below would be a bit more efficient than the answers presented above when dealing with larger datasets.
SELECT *
FROM Call
WHERE NOT EXISTS (
SELECT 'x'
FROM Phone_book
WHERE Phone_book.phone_number = Call.phone_number
);
SELECT DISTINCT Call.id
FROM Call
LEFT OUTER JOIN Phone_book USING (id)
WHERE Phone_book.id IS NULL
This will return the extra id-s that are missing in your Phone_book table.
I think
SELECT CALL.* FROM CALL LEFT JOIN Phone_book ON
CALL.id = Phone_book.id WHERE Phone_book.name IS NULL
SELECT t1.ColumnID,
CASE
WHEN NOT EXISTS( SELECT t2.FieldText
FROM Table t2
WHERE t2.ColumnID = t1.ColumnID)
THEN t1.FieldText
ELSE t2.FieldText
END FieldText
FROM Table1 t1, Table2 t2
SELECT a.id, a.date, a.phone_number FROM Call a
WHERE a.phone_number NOT IN (SELECT b.phone_number FROM Phone_book b)
Alternatively,
select id from call
minus
select id from phone_number
Don't forget to check your indexes!
If your tables are quite large you'll need to make sure the phone book has an index on the phone_number field. With large tables the database will most likely choose to scan both tables.
SELECT *
FROM Call
WHERE NOT EXISTS
(SELECT *
FROM Phone_book
WHERE Phone_book.phone_number = Call.phone_number)
You should create indexes on both Phone_Book and Call containing the phone_number. If performance is becoming an issue, try a lean index like this, with only the phone number:
The fewer fields the better, since the index has to be loaded entirely. You'll need an index on both tables.
ALTER TABLE [dbo].Phone_Book ADD CONSTRAINT [IX_Unique_PhoneNumber] UNIQUE NONCLUSTERED
(
Phone_Number
)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ONLINE = ON) ON [PRIMARY]
GO
If you look at the query plan it will look something like this and you can confirm your new index is actually being used. Note this is for SQL Server but should be similar for MySQL.
With the query I showed there's literally no other way for the database to produce a result other than scanning every record in both tables.
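Since the question is about MySQL rather than SQL Server, here is a portable sketch of the same idea: create a plain index on phone_number and compare the query plan before and after. It uses SQLite through Python so that it runs anywhere; the index name ix_phone_book_number is made up, and SQLite's EXPLAIN QUERY PLAN stands in for MySQL's EXPLAIN, whose output looks different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Phone_book (id INT PRIMARY KEY, name TEXT, phone_number TEXT);
CREATE TABLE Call (id INT PRIMARY KEY, date TEXT, phone_number TEXT);
""")

QUERY = """
SELECT Call.id
FROM Call
WHERE NOT EXISTS (SELECT 1
                  FROM Phone_book
                  WHERE Phone_book.phone_number = Call.phone_number)
"""

def plan_text():
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    return " | ".join(str(row[3]) for row in rows)

plan_before = plan_text()

# The same CREATE INDEX statement is valid in MySQL.
conn.execute("CREATE INDEX ix_phone_book_number ON Phone_book (phone_number)")

plan_after = plan_text()

print(plan_before)
print(plan_after)   # should now mention ix_phone_book_number
```

The exact plan wording depends on the engine and version, so the thing to look for is simply whether the new index is named in the plan for the Phone_book lookup.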
I want to use the data from table 'similar' to find results from table 'releases'
Table 'Similar' has this structure
artist similar_artist
Moodymann Theo Parrish
Moodymann Jeff Mills
Moodymann Marcellus Pittman
Moodymann Rick Wilhite
My query so far is
SELECT * FROM releases
WHERE
releases.all_artists REGEXP 'Moodymann'
OR releases.label_no_country='KDJ'
OR releases.all_artists IN (SELECT similar_artist
FROM similar
WHERE artist='Moodymann')
ORDER BY date DESC
the column 'all_artists' has records like this:
Moodymann | Theo Parrish | Rick Wade
Jeff Mills | Moodymann | Rick Wilhite
So the end query that I want will essentially be this
SELECT * FROM releases
WHERE
releases.all_artists REGEXP 'Moodymann'
OR releases.label_no_country='KDJ'
OR releases.all_artists IN ('Theo Parrish','Jeff Mills','Marcellus Pittman','Rick Wilhite')
To make matches I think I need to use REGEXP instead of IN, but REGEXP returns the error 'Subquery returns more than 1 row'. How can I use the data returned from the subquery?
Also the query is taking a long time to run (up to 20 seconds). Is there any way to speed this up? As it stands it is not usable in my web app.
Thanks!
The only way I would know of how to use REGEXP with a subquery, would be to use that subquery to produce a REGEXP string.
SELECT * FROM releases
WHERE
releases.all_artists REGEXP 'Moodymann'
OR releases.label_no_country='KDJ'
OR releases.all_artists REGEXP (
SELECT GROUP_CONCAT(similar_artist SEPARATOR '|')
FROM similar
WHERE artist='Moodymann'
GROUP BY similar_artist)
ORDER BY date DESC
The above isn't tested, is just a theory to what I might try. It's not going to be very optimal however.
update
Have since tested this and found that GROUP BY similar_artist should be GROUP BY artist
SELECT * FROM releases
WHERE
releases.all_artists REGEXP 'Moodymann'
OR releases.label_no_country='KDJ'
OR releases.all_artists REGEXP (
SELECT GROUP_CONCAT(similar_artist SEPARATOR '|')
FROM similar
WHERE artist='Moodymann'
GROUP BY artist)
ORDER BY date DESC
However, as mentioned by Pheonix you would be better off refactoring your structure to have a releases_artist table. You could then do all this work via JOINs which would be much, much faster.
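A rough sketch of what that refactoring could look like: one row per (release, artist) in a releases_artist table, so the similar-artist match becomes an ordinary IN on an indexable column. The column names, the sample releases and the helper find_releases are all assumptions for illustration, and the example runs on in-memory SQLite from Python so that it is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE releases (id INTEGER PRIMARY KEY, title TEXT,
                       label_no_country TEXT, date TEXT);
CREATE TABLE releases_artist (release_id INTEGER, artist TEXT,
                              PRIMARY KEY (release_id, artist));
CREATE TABLE similar (artist TEXT, similar_artist TEXT);

INSERT INTO releases VALUES
    (1, 'Release 1', 'KDJ',     '2001-01-01'),
    (2, 'Release 2', 'Label X', '2002-01-01'),
    (3, 'Release 3', 'Label Y', '2003-01-01'),
    (4, 'Release 4', 'Label Z', '2004-01-01'),
    (5, 'Release 5', 'KDJ',     '2005-01-01');
INSERT INTO releases_artist VALUES
    (1, 'Moodymann'), (2, 'Theo Parrish'), (2, 'Rick Wade'),
    (3, 'Jeff Mills'), (4, 'Someone Else'), (5, 'Someone Else');
INSERT INTO similar VALUES
    ('Moodymann', 'Theo Parrish'), ('Moodymann', 'Jeff Mills'),
    ('Moodymann', 'Marcellus Pittman'), ('Moodymann', 'Rick Wilhite');
""")

def find_releases(artist, label):
    """Releases by the artist, on the label, or by any similar artist; newest first."""
    return conn.execute(
        """
        SELECT DISTINCT r.id, r.title, r.date
        FROM releases r
        LEFT JOIN releases_artist ra ON ra.release_id = r.id
        WHERE ra.artist = :artist
           OR r.label_no_country = :label
           OR ra.artist IN (SELECT similar_artist FROM similar WHERE artist = :artist)
        ORDER BY r.date DESC
        """,
        {"artist": artist, "label": label},
    ).fetchall()

matches = find_releases("Moodymann", "KDJ")
print([row[0] for row in matches])  # [5, 3, 2, 1]
```

With exact string comparisons instead of REGEXP or LIKE '%...%', indexes on releases_artist(artist) and similar(artist) can actually be used.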
Try this SQL
SELECT *
FROM releases
WHERE releases.all_artists LIKE '%Moodymann%'
OR releases.label_no_country='KDJ'
ORDER BY date DESC
SQL Fiddle
MySQL 5.5.30 Schema Setup:
CREATE TABLE Table1
(`artist` varchar(9), `similar_artist` varchar(17))
;
INSERT INTO Table1
(`artist`, `similar_artist`)
VALUES
('Moodymann', 'Theo Parrish'),
('Moodymann', 'Jeff Mills'),
('Moodymann', 'Marcellus Pittman'),
('Moodymann', 'Rick Wilhite')
;
create table allt(allf varchar(50));
insert into allt values('Moodymann | Theo Parrish | Rick Wade'),
('Jeff Mills | Moodymann | Rick Wilhite'),
('Jeff Mills | asdasdadasd | Rick Wilhite');
Query 1:
SELECT *
FROM allt
WHERE allt.allf LIKE '%Moodymann%'
Results:
| ALLF |
-----------------------------------------
| Moodymann | Theo Parrish | Rick Wade |
| Jeff Mills | Moodymann | Rick Wilhite |
You can do a join on a comma separated list (won't be fast, but might be quicker than using LIKE with a leading wild card), and you can replace your existing delimiter with a comma to allow this. Also you can use a load of UNIONs to get your list of artists to behave like a table to do a join on.
Further you can use union instead of your other WHERE clauses which might well help with allowing the use of indexes (MySQL will only use one index per table in a query, hence using OR to query on a different column forces it to not use an index for one of the columns it is checking).
As such you can do something like the following:-
SELECT releases.*
FROM releases
INNER JOIN (SELECT 'Theo Parrish' AS anArtist UNION SELECT 'Jeff Mills' UNION SELECT 'Marcellus Pittman' UNION SELECT 'Rick Wilhite') Sub1
ON FIND_IN_SET(Sub1.anArtist, REPLACE(releases.all_artists, " | ", ",")) > 0
UNION
SELECT releases.*
FROM releases
WHERE releases.label_no_country='KDJ'
However if changing the database design to split the pipe separated list of artists onto a different table is even a slight option then do that instead. It will be far quicker and will cope with far greater numbers of artists.
I have a MySQL database table with this structure:
table
id INT NOT NULL PRIMARY KEY
data ..
next_id INT NULL
I need to fetch the data in order of the linked list. For example, given this data:
id | next_id
----+---------
1 | 2
2 | 4
3 | 9
4 | 3
9 | NULL
I need to fetch the rows for id=1, 2, 4, 3, 9, in that order. How can I do this with a database query? (I can do it on the client end. I am curious if this can be done on the database side. Thus, saying it's impossible is okay (given enough proof)).
It would be nice to have a termination point as well (e.g. stop after 10 fetches, or when some condition on the row turns true) but this is not a requirement (can be done on client side). I (hope I) do not need to check for circular references.
Some brands of database (e.g. Oracle, Microsoft SQL Server) support extra SQL syntax to run "recursive queries". MySQL had no such feature when this answer was written; recursive common table expressions (WITH RECURSIVE) arrived in MySQL 8.0.
The problem you are describing is the same as representing a tree structure in a SQL database. You just have a long, skinny tree.
There are several solutions for storing and fetching this kind of data structure from an RDBMS. See some of the following questions:
"What is the most efficient/elegant way to parse a flat table into a tree?"
"Is it possible to make a recursive SQL query ?"
Since you mention that you'd like to limit the "depth" returned by the query, you can achieve this while querying the list this way:
SELECT * FROM mytable t1
LEFT JOIN mytable t2 ON (t1.next_id = t2.id)
LEFT JOIN mytable t3 ON (t2.next_id = t3.id)
LEFT JOIN mytable t4 ON (t3.next_id = t4.id)
LEFT JOIN mytable t5 ON (t4.next_id = t5.id)
LEFT JOIN mytable t6 ON (t5.next_id = t6.id)
LEFT JOIN mytable t7 ON (t6.next_id = t7.id)
LEFT JOIN mytable t8 ON (t7.next_id = t8.id)
LEFT JOIN mytable t9 ON (t8.next_id = t9.id)
LEFT JOIN mytable t10 ON (t9.next_id = t10.id);
It'll perform like molasses, and the result will come back all on one row (per linked list), but you'll get the result.
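Where recursive common table expressions are available (MySQL added WITH RECURSIVE in 8.0), the list can be walked in a single query that returns one row per node. The sketch below runs on SQLite through Python so that it is self-contained; the table name mytable follows the query above, while the helper walk and its max_rows cap are my own additions. The cap doubles as the requested termination point and as a guard against circular references.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INT NOT NULL PRIMARY KEY, next_id INT NULL);
INSERT INTO mytable VALUES (1, 2), (2, 4), (3, 9), (4, 3), (9, NULL);
""")

WALK = """
WITH RECURSIVE chain (id, next_id, depth) AS (
    SELECT id, next_id, 1
    FROM mytable
    WHERE id = ?                 -- head of the list
    UNION ALL
    SELECT t.id, t.next_id, c.depth + 1
    FROM mytable t
    JOIN chain c ON t.id = c.next_id
    WHERE c.depth < ?            -- stop after this many rows
)
SELECT id FROM chain ORDER BY depth
"""

def walk(conn, start_id, max_rows=10):
    """Return ids in linked-list order, starting at start_id."""
    return [row[0] for row in conn.execute(WALK, (start_id, max_rows))]

print(walk(conn, 1))     # [1, 2, 4, 3, 9]
print(walk(conn, 1, 3))  # [1, 2, 4]
```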
If what you are trying to avoid is having several queries (one for each node) and you are able to add columns, then you could have a new column that links to the root node. That way you can pull in all the data at once by the root id, but you will still have to sort the list (or tree) on the client side.
So in this example you would have:
id | next_id | root_id
----+---------+---------
1 | 2 | 1
2 | 4 | 1
3 | 9 | 1
4 | 3 | 1
9 | NULL | 1
Of course, the disadvantage compared with a traditional linked list or tree is that the root cannot change without O(n) writes, where n is the number of nodes, because the root id has to be updated on every node. Fortunately, you should always be able to do this in a single UPDATE query unless you are dividing a list/tree in the middle.
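The client-side sorting step mentioned above might look like this. It is a minimal sketch that assumes the rows arrive as (id, next_id) pairs, for example from SELECT id, next_id FROM mytable WHERE root_id = 1; the function name order_linked_rows is hypothetical.

```python
def order_linked_rows(rows):
    """Order (id, next_id) pairs that form a single linked list."""
    next_of = dict(rows)
    pointed_at = {nxt for nxt in next_of.values() if nxt is not None}

    # The head is the only id that no other row points to.
    heads = [node for node in next_of if node not in pointed_at]
    if len(heads) != 1:
        raise ValueError("expected exactly one head, found %d" % len(heads))

    ordered = []
    current = heads[0]
    while current is not None:
        if current not in next_of:
            raise ValueError("next_id %r points outside the fetched rows" % current)
        ordered.append(current)
        current = next_of[current]
    return ordered

rows = [(1, 2), (2, 4), (3, 9), (4, 3), (9, None)]
print(order_linked_rows(rows))  # [1, 2, 4, 3, 9]
```

The walk is linear in the number of rows, so one query plus this loop replaces one query per node.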
This is less a solution and more of a workaround but, for a linear list (rather than the tree Bill Karwin mentioned), it might be more efficient to use a sort column on your list. For example:
CREATE TABLE `schema`.`my_table` (
`id` INT NOT NULL PRIMARY KEY,
`order` INT,
data ..,
INDEX `ix_order` (`order` ASC)
);
Then:
SELECT * FROM `schema`.`my_table` ORDER BY `order`;
This has the disadvantage of slower inserts (you have to reposition all sorted elements past the insertion point) but should be fast for retrieval because the order column is indexed.
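To make the insert cost concrete, here is a sketch of inserting a row into the middle of such a list: every row at or after the target position is shifted by one before the new row goes in. It uses SQLite through Python so that it is self-contained, names the column sort_order to avoid quoting the reserved word order, and the helper insert_at is my own invention.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (id INT NOT NULL PRIMARY KEY, sort_order INT, data TEXT);
CREATE INDEX ix_order ON my_table (sort_order);
INSERT INTO my_table VALUES (1, 1, 'a'), (2, 2, 'b'), (4, 3, 'c');
""")

def insert_at(conn, new_id, position, data):
    """Insert a row at the given position, shifting later rows down by one."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE my_table SET sort_order = sort_order + 1 WHERE sort_order >= ?",
            (position,),
        )
        conn.execute(
            "INSERT INTO my_table (id, sort_order, data) VALUES (?, ?, ?)",
            (new_id, position, data),
        )

insert_at(conn, 9, 2, "x")
ordered_ids = [row[0] for row in
               conn.execute("SELECT id FROM my_table ORDER BY sort_order")]
print(ordered_ids)  # [1, 9, 2, 4]
```

The UPDATE touches every row after the insertion point, which is the slow part described above; reads stay a simple indexed ORDER BY.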