I am building a document manager where a file can belong to many folders (when a copy action is performed from the front end) and a folder can likewise contain many files. Keep in mind that both a file and a folder can belong to another folder. With that design, I opted for a many-to-many relationship, as that makes the most sense. I have three tables, folders, files, and file_folder, where file_folder is the pivot table.
My simple DB schema is defined below:
files:
file_id | name
1 | document.docx
folders:
folder_id | name
1 | root
2 | images
file_folder:
file_id | folder_id | phash
NULL | 2 | 1
1 | NULL | 1
Now the problem is that I have been trying to write a join query that returns a list of both files and folders when they share the same parent folder. Below are the queries I have tried, but none of them returns the desired result.
I pretty much want my result this way:
file_id | folder_id | name
1 | NULL | document.docx
NULL | 2 | images
SELECT * FROM
folders dir
JOIN file_folder dir_file
ON dir_file.dir_id = dir.dir_id
JOIN files file
ON dir_file.file_id = file.file_id
WHERE dir_file.dir_id = 1 ORDER BY dir_file.created_at
SELECT * FROM
folders dir,files file
JOIN file_folder dir_file
ON (dir_file.file_id = file.file_id OR dir_file.dir_id = dir.dir_id)
WHERE dir_file.dir_id = 1 ORDER BY dir_file.created_at
I know I am just missing something that I can't figure out yet, or maybe I am getting the DB schema totally wrong. I really don't want to put files and folders in the same table and reference phash (the parent folder). That would work, but copying a folder with files into another one would copy only the folder and not the files inside it; I would have to duplicate all the subfolders and files to make that happen, which is very bad. I would appreciate a solution to this.
EDITED:
Below works fine for me now!
SELECT * FROM folders dir
LEFT JOIN file_folder dir_file
ON dir_file.dir_id = dir.dir_id
LEFT JOIN files file
ON dir_file.file_id = file.file_id
WHERE dir_file.phash = 1 ORDER BY dir_file.created_at
Because the tables do not always have matching rows, you should use a LEFT JOIN:
SELECT *
FROM folders dir
LEFT JOIN file_folder dir_file ON dir_file.dir_id = dir.dir_id and dir_file.phash = 1
LEFT JOIN files file ON dir_file.file_id = file.file_id
ORDER BY dir_file.created_at
With a LEFT JOIN, move the WHERE condition into the ON clause (otherwise the query behaves like an INNER JOIN).
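To make the ON-versus-WHERE point concrete, here is a minimal sketch. It runs against an in-memory SQLite database as a stand-in for MySQL, and the extra 'videos' folder is made up for illustration; the column names follow the schema shown in the question (folder_id, phash), not the dir_id used in the queries.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; names follow the question's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE folders (folder_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE file_folder (file_id INTEGER, folder_id INTEGER, phash INTEGER);
    INSERT INTO folders VALUES (1, 'root'), (2, 'images'), (3, 'videos');
    -- Only 'images' has a pivot row whose parent (phash) is folder 1.
    INSERT INTO file_folder VALUES (NULL, 2, 1);
""")

# Filter in WHERE: unmatched folders have phash = NULL, fail the test, and vanish.
where_rows = conn.execute("""
    SELECT f.name, ff.phash
    FROM folders f
    LEFT JOIN file_folder ff ON ff.folder_id = f.folder_id
    WHERE ff.phash = 1
    ORDER BY f.folder_id
""").fetchall()

# Filter in ON: every folder is kept; unmatched ones carry NULL for phash.
on_rows = conn.execute("""
    SELECT f.name, ff.phash
    FROM folders f
    LEFT JOIN file_folder ff ON ff.folder_id = f.folder_id AND ff.phash = 1
    ORDER BY f.folder_id
""").fetchall()

print(where_rows)  # [('images', 1)]
print(on_rows)     # [('root', None), ('images', 1), ('videos', None)]
```

Which form you want depends on the goal: the WHERE version lists only the children of folder 1, while the ON version lists every folder and marks the ones that are not children with NULL.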
Related
I have a database containing different information from our website. One table (called Raw_Pages) contains a list of every page on our site and the path to it (along with other fields of course). Another table (called Paths) contains a list of various branches of the site that are owned by different departments.
I'm trying to run a query to basically find all pages on the site that do not fall under one of the branches specified.
table Raw_Pages
+-------------------------+--------------+
| Field | Type |
+-------------------------+--------------+
| ID | int(11) |
| Path | varchar(500) |
| Title | varchar(255) |
+-------------------------+--------------+
table Paths
+----------+--------------+
| Field | Type |
+----------+--------------+
| ID | int(11) |
| Path | varchar(255) |
+----------+--------------+
We currently have 64,002 pages that I'm checking against 757 paths (all departments have multiple branches because different file types live under different ones). I'm also planning to run a similar query for files, of which we have 306,625, against the same list of 757 paths. Yes, our site is a giant mess.
From what I can tell, a LEFT JOIN is what would work best for me with a wildcard on the right side. I am a novice at code so I could be far off.
SELECT * FROM Raw_Pages LEFT JOIN Paths ON Raw_Pages.path LIKE CONCAT(Paths.Path,'%') WHERE Paths.ID IS NULL
I'm honestly not sure if the above code works or not since it just freezes phpMyAdmin when I try it. I'm assuming something is wrong in it, or there is a better way.
Thank you!
If you have an index on Paths(Path), you might be able to do:
select rp.*
from raw_pages rp
where not exists (select 1
from paths p
where p.path <= rp.path and
rp.path like concat(p.path, '%')
);
It is possible for the subquery to use an index. I'm not sure it will.
If the value of the path field is identical in the two tables you could use:
SELECT *
FROM Raw_Pages AS R
LEFT JOIN Paths AS P ON (R.path=P.path)
WHERE P.ID IS NULL
If the path matches only the page name or a piece of the route, you could use:
SELECT *
FROM Raw_Pages AS R
LEFT JOIN Paths AS P ON (R.path LIKE CONCAT('%',P.path,'%'))
WHERE P.ID IS NULL
You can check this page to verify the type of query you need
It is good practice to index the path fields in both tables so that the query runs faster, given the number of records.
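As a rough illustration of the "pages under no known branch" query, here is a small anti-join sketch. It uses an in-memory SQLite database in place of MySQL, and the sample rows are invented; only the table and column names come from the question. SQLite concatenates with `||`, where MySQL would use CONCAT.

```python
import sqlite3

# SQLite in memory as a stand-in for MySQL; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Raw_Pages (ID INTEGER PRIMARY KEY, Path TEXT, Title TEXT);
    CREATE TABLE Paths (ID INTEGER PRIMARY KEY, Path TEXT);
    CREATE INDEX idx_paths_path ON Paths (Path);
    INSERT INTO Raw_Pages VALUES
        (1, '/hr/index.html', 'HR home'),
        (2, '/it/help.html', 'IT help'),
        (3, '/misc/old.html', 'Old page');
    INSERT INTO Paths VALUES (1, '/hr/'), (2, '/it/');
""")

# Anti-join: keep a page only if no branch path is a prefix of its path.
# In MySQL the pattern would be written CONCAT(p.Path, '%').
orphans = conn.execute("""
    SELECT rp.ID, rp.Path
    FROM Raw_Pages rp
    WHERE NOT EXISTS (
        SELECT 1 FROM Paths p
        WHERE rp.Path LIKE p.Path || '%'
    )
    ORDER BY rp.ID
""").fetchall()

print(orphans)  # [(3, '/misc/old.html')]
```

One caveat with this approach: any `_` or `%` inside a stored branch path is treated as a LIKE wildcard, so paths containing those characters may match more than intended.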
I'm proficient with joining tables in MySQL, but I'm having trouble with joining results from the SAME table. I'm creating a folder structure in PHP where a folder has an ID, a parent ID, a random-string ID, and a label.
My DB looks like:
| id | parent_id | uniq | label
---------------------------------
| 1 | 0 | w2d4f6 | dir 1
| 2 | 1 | h9k3h7 | dir 2
The front end uses the uniq var to identify a folder. So in the DB you can see that if I open the folder dir 1, the folder dir 2 will be inside it, since dir 2 has dir 1's ID as its parent.
Still with me?
|- dir 1
| + dir 2
The folder dir 1 is identified by its uniq string, so w2d4f6. So what I'm wanting to do is:
Get the parent ID of the record that has uniq='w2d4f6'
The parent ID is 1
Look for records where parent_id=1
I know this is totally wrong, and I think I should be using JOIN but I tried the following without success.
SELECT folders.label,folders.parent_id FROM folders WHERE folders.uniq='w2d4f6' AND folders.id=folders.parent_id
To get the children of a folder:
select b.label, b.parent_id
from folders a, folders b
where a.uniq = 'w2d4f6' AND b.parent_id = a.id
This should work if you already have the parent data and just want to request the child data by the uniq value of the parent:
SELECT label, parent_id FROM folders WHERE parent_id IN (SELECT id FROM folders WHERE uniq='w2d4f6')
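For anyone who wants to try the self-join, here is a small sketch using the two rows from the question. It runs on an in-memory SQLite database rather than MySQL, and writes the join with explicit JOIN ... ON syntax and a bound parameter; the aliases `parent` and `child` are just illustrative names.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL, loaded with the rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE folders (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER,
        uniq TEXT,
        label TEXT
    );
    INSERT INTO folders VALUES
        (1, 0, 'w2d4f6', 'dir 1'),
        (2, 1, 'h9k3h7', 'dir 2');
""")

# Self-join: 'parent' is the folder found by its uniq string,
# 'child' is every folder whose parent_id points at that folder's id.
children = conn.execute("""
    SELECT child.label, child.parent_id
    FROM folders AS parent
    JOIN folders AS child ON child.parent_id = parent.id
    WHERE parent.uniq = ?
    ORDER BY child.id
""", ("w2d4f6",)).fetchall()

print(children)  # [('dir 2', 1)]
```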
Using one single SQL query with a join:
How can I add entries from a second table only if there is a corresponding entry available?
project
description | source
----------------------------
Project 1 | 1
Project 2 | null
source
source_id | value
--------------------------------
1 | Additional Info 1
When I type
select project.description, source.value
from project, source
where project.source = source.source_id
and project.description = "Project 1";
As desired I receive
Project 1 | Additional Info 1
However when I replace Project 1 with Project 2 in the last line, I won't get a result, because project.source is null.
Is it possible to use a single SQL query which outputs something like this?
Project 2 | null
I'm looking for a query which covers both cases.
Any ideas?
You can use a LEFT JOIN on the project table to make sure that all projects appear in the result set even if they have no matching value in the source table. Projects from the project table which do not match will have NULL for their value.
SELECT project.description AS description, source.value AS value
FROM project LEFT JOIN source
ON project.source = source.source_id
Output:
+--------------+--------------------+
| description | value |
+--------------+--------------------+
| Project 1 | Additional Info 1 |
| Project 2 | null |
+--------------+--------------------+
Try using a LEFT JOIN:
SELECT project.description, source.value FROM project LEFT JOIN source ON project.source = source.source_id;
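To check that a single LEFT JOIN query really covers both cases, here is a quick sketch with the question's two tables. It runs on an in-memory SQLite database as a stand-in; the `lookup` helper is a hypothetical name used only for this illustration.

```python
import sqlite3

# In-memory SQLite stand-in, loaded with the rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (source_id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE project (description TEXT, source INTEGER);
    INSERT INTO source VALUES (1, 'Additional Info 1');
    INSERT INTO project VALUES ('Project 1', 1), ('Project 2', NULL);
""")

def lookup(description):
    """Return the project row plus its source value, or None if it has no source."""
    # The filter is on the left-hand table, so it can safely stay in WHERE.
    return conn.execute("""
        SELECT project.description, source.value
        FROM project
        LEFT JOIN source ON project.source = source.source_id
        WHERE project.description = ?
    """, (description,)).fetchall()

print(lookup("Project 1"))  # [('Project 1', 'Additional Info 1')]
print(lookup("Project 2"))  # [('Project 2', None)]
```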
I'm facing the following situation and have not found a good solution to it. I am optimizing an API, so I am looking for the fastest possible solution.
The following description is not exactly what I am doing, but I think it represents the problem well.
Let's say I have a table of products:
+----+----------+
| id | name |
+----+----------+
| 1 | product1 |
| 2 | product2 |
+----+----------+
And I have a table of attachments for each product, separated by language:
+----+----------+------------+-----------------------+
| id | language | product_id | attachment_url |
+----+----------+------------+-----------------------+
| 1 | bb | 1 | image1_bb.jpg |
| 1 | en | 1 | image1_en.jpg |
| 1 | pt | 1 | image1_pt.jpg |
| 2 | bb | 1 | image2_bb.jpg |
| 2 | pt | 1 | image2_pt.jpg |
+----+----------+------------+-----------------------+
What I intend to do is get the correct attachment according to the language selected in the request. As you can see above, I can have several attachments for each product. We use Babel (bb) as a generic language, so whenever I don't have an attachment in the right language, I should get the Babel version. It is also important to note that the primary key of the attachments table is a composite of id + language.
So, supposing I try to get all the data in pt, my first option to create a SQL query was:
SELECT p.id, p.name,
GROUP_CONCAT( '{',a.id,',',a.attachment_url, '}' ) as attachments_list
FROM products p
LEFT JOIN attachments a
ON (a.product_id=p.id AND (a.language='pt' OR a.language='bb'))
The problem is that with this query I always get the bb data, and I only want it when there is no attachment in the right language.
I already tried replacing attachments with a subquery:
(SELECT * FROM attachments GROUP BY id ORDER BY id ASC, language DESC)
but it doubles the time of the request.
I also tried using DISTINCT inside the GROUP_CONCAT, but it only works if the whole result of each row is equal, so it does not work for me.
Does anyone know of any other solution that I can apply directly in the query?
EDIT:
Combining the answers of #Vulcronos and #Barmar made the final solution at least 2x faster than the one I first suggested.
Just to add some context, for anybody else who is looking for it. I am using Phalcon. Because of it, I had a lot of trouble putting the pieces together, as Phalcon PHQL does not support subqueries, nor a lot of the other stuff I had to use.
For my scenario, where I had to deliver approximately 1.2MB of JSON content, with more than 2100 objects, using custom queries made the total request time up to 3x faster than Phalcon's native relation management methods (hasMany(), hasManyToMany(), etc.) and 10x faster than my original solution (which relied heavily on the find() method).
Try doing two joins instead of one:
SELECT p.id, p.name,
GROUP_CONCAT( '{',COALESCE(a.id, b.id),',',COALESCE(a.attachment_url, b.attachment_url), '}' ) as attachments_list
FROM products p
LEFT JOIN attachments a
ON (a.product_id=p.id AND a.language='pt')
LEFT JOIN attachments b
ON (b.product_id=p.id AND b.language='bb')
and then using COALESCE to return b instead of a if a doesn't exist. You can also do it with a subselect if the above doesn't work.
OR conditions tend to make queries slow, because it's hard to optimize them with indexes. Try joining separately using the two different languages.
SELECT p.id, p.name,
IFNULL(apt.attachment_url, abb.attachment_url) AS attachment_url
FROM products AS p
JOIN attachments AS abb ON abb.product_id = p.id
LEFT JOIN attachments AS apt ON apt.product_id = p.id AND apt.id = abb.id AND apt.language = 'pt'
WHERE abb.language = 'bb'
This assumes that all products have a bb attachment, while pt is optional.
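Here is a small sketch of the two-join fallback, run on an in-memory SQLite database instead of MySQL. It assumes the fallback should be resolved per attachment id (the primary key is id + language), so the optional join also matches on id; the `attachments_for` helper is a hypothetical name. Requesting 'en' is used because, in the sample data, attachment 2 has no 'en' row and must fall back to bb.

```python
import sqlite3

# In-memory SQLite stand-in, loaded with the attachment rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attachments (
        id INTEGER, language TEXT, product_id INTEGER, attachment_url TEXT,
        PRIMARY KEY (id, language)
    );
    INSERT INTO attachments VALUES
        (1, 'bb', 1, 'image1_bb.jpg'),
        (1, 'en', 1, 'image1_en.jpg'),
        (1, 'pt', 1, 'image1_pt.jpg'),
        (2, 'bb', 1, 'image2_bb.jpg'),
        (2, 'pt', 1, 'image2_pt.jpg');
""")

def attachments_for(lang):
    # Drive from the bb rows (assumed to always exist), then optionally
    # join the requested language and prefer it when present.
    return conn.execute("""
        SELECT abb.product_id, abb.id,
               COALESCE(alang.attachment_url, abb.attachment_url)
        FROM attachments AS abb
        LEFT JOIN attachments AS alang
               ON alang.product_id = abb.product_id
              AND alang.id = abb.id
              AND alang.language = ?
        WHERE abb.language = 'bb'
        ORDER BY abb.product_id, abb.id
    """, (lang,)).fetchall()

print(attachments_for("en"))  # [(1, 1, 'image1_en.jpg'), (1, 2, 'image2_bb.jpg')]
print(attachments_for("pt"))  # [(1, 1, 'image1_pt.jpg'), (1, 2, 'image2_pt.jpg')]
```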
I left out the join to products because it is not relevant to this problem. It is only needed to include the product name in the result set.
SELECT a.product_id, a.id, a.attachment_url FROM attachments a
WHERE a.language = ?
OR (a.language = 'bb'
AND NOT EXISTS
(SELECT * FROM attachments
WHERE language = ?
AND id = a.id
AND product_id = a.product_id));
Notes: problems like this usually have many possible solutions. This is not necessarily the most efficient one.
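The NOT EXISTS variant can be tried out the same way. This is only a sketch on an in-memory SQLite database with the question's sample rows; the named parameter `:lang` and the helper `pick_attachments` are illustrative choices, not part of the original answer.

```python
import sqlite3

# In-memory SQLite stand-in, loaded with the attachment rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attachments (
        id INTEGER, language TEXT, product_id INTEGER, attachment_url TEXT,
        PRIMARY KEY (id, language)
    );
    INSERT INTO attachments VALUES
        (1, 'bb', 1, 'image1_bb.jpg'),
        (1, 'en', 1, 'image1_en.jpg'),
        (1, 'pt', 1, 'image1_pt.jpg'),
        (2, 'bb', 1, 'image2_bb.jpg'),
        (2, 'pt', 1, 'image2_pt.jpg');
""")

def pick_attachments(lang):
    # Take rows in the requested language, plus bb rows for which
    # no row in the requested language exists for the same attachment.
    return conn.execute("""
        SELECT a.product_id, a.id, a.attachment_url
        FROM attachments a
        WHERE a.language = :lang
           OR (a.language = 'bb'
               AND NOT EXISTS (
                   SELECT 1 FROM attachments x
                   WHERE x.language = :lang
                     AND x.id = a.id
                     AND x.product_id = a.product_id))
        ORDER BY a.product_id, a.id
    """, {"lang": lang}).fetchall()

print(pick_attachments("en"))  # [(1, 1, 'image1_en.jpg'), (1, 2, 'image2_bb.jpg')]
```

Unlike the two-join version, this form does not require every attachment to have a bb row; attachments that exist only in the requested language are still returned.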
I was wondering if there is a way to simplify this down from two queries to a single one. I'd like to be able to sort the data as I pull it out of the database and this would allow me to do it.
The tables are set up like this:
table: files
------------------------
fileID (INT) | domainID (INT)
------------------------
table: domains
------------------------
domainID (INT) | domainName (text)
------------------------
table: serverFiles
------------------------
fileID (INT) | uniqueUploads (INT)
------------------------
I am currently running this query first:
SELECT domains.domainName, files.fileID, COUNT(files.fileID)
FROM domains, files
WHERE files.domainID = domains.domainID
GROUP BY files.domainID;
Then looping through the results of that query I am running a second query using the fileID resulting from the first query ($fileIDFromFirstQuery):
SELECT serverFiles.uniqueUploads
FROM serverFiles
WHERE serverFiles.fileID = '$fileIDFromFirstQuery';
The results come out like:
Domains | Files with Domain | Unique Uploads
--------------------------------------------------
domain1.com 32 1412
domain2.com 21 699
domain3.com 52 293
I think this should work:
SELECT domains.domainID, domainName, COUNT(*), SUM(uniqueUploads)
FROM domains
INNER JOIN files ON files.domainID = domains.domainID
INNER JOIN serverFiles on serverFiles.fileID = files.fileID
GROUP BY domains.domainID, domainName
ETA:
Maybe I'm not seeing everything here, but why not just get rid of the serverFiles table and put "uniqueUploads" on the files table? Unless maybe you are sharing files between multiple domains, in which case it would make sense.
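As a rough check of the combined query, here is a sketch on an in-memory SQLite database with a few invented rows (the real counts from the question are not reproduced). It also adds an ORDER BY on the summed uploads, since sorting in the database was the stated reason for wanting a single query; the column aliases are illustrative.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE domains (domainID INTEGER PRIMARY KEY, domainName TEXT);
    CREATE TABLE files (fileID INTEGER PRIMARY KEY, domainID INTEGER);
    CREATE TABLE serverFiles (fileID INTEGER, uniqueUploads INTEGER);
    INSERT INTO domains VALUES (1, 'domain1.com'), (2, 'domain2.com');
    INSERT INTO files VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO serverFiles VALUES (1, 10), (2, 5), (3, 7);
""")

# One query: count files per domain and sum their uploads, sorted by the sum.
report = conn.execute("""
    SELECT d.domainName,
           COUNT(*) AS fileCount,
           SUM(sf.uniqueUploads) AS totalUploads
    FROM domains d
    INNER JOIN files f ON f.domainID = d.domainID
    INNER JOIN serverFiles sf ON sf.fileID = f.fileID
    GROUP BY d.domainID, d.domainName
    ORDER BY totalUploads DESC
""").fetchall()

print(report)  # [('domain1.com', 2, 15), ('domain2.com', 1, 7)]
```

Note that the inner joins drop any file without a serverFiles row; if such files should still be counted, a LEFT JOIN to serverFiles would be the variation to try.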