What is a simple way in Postgres to create deeply nested JSON structures without writing very complex, deeply indented queries?

I'm joining several tables together in a Postgres database, and returning the values in the right joined table as an aggregated JSON structure in the left joined table. However, I find that the query becomes more complicated the more tables are joined. For example:
select row_to_json(output)
from (
select image_type.name,
(
select json_agg(instances)
from (
select image_instance.name, (
select json_agg(versions)
from (
select image_version.name
from image_version
where image_version.image_instance_id = image_version.image_instance_id
) versions
) AS versions
from image_instance
where image_instance.image_type_id = image_type.image_type_id
) instances
) AS images
from image_type
) output;
I've joined three tables here, but I'd like to add several more, and the code will quickly become unwieldy and hard to maintain. Is there a simple way to generate these kinds of aggregated joins?

First of all, JSON is no different than regular fields when combining data from multiple tables: things can get complex quite quickly. There are, however, a few techniques to keep things manageable:
1. Daisy chain functions
There is no need to treat the output from each function independently; you can feed the output from one function as input to the next in a single statement. In your example this means that you lose a level of sub-select for each level of aggregation and you can forget about the aliases. Your example becomes:
-- json_agg() takes a single argument, so each name/children pair is wrapped in row()
select row_to_json(row(image_type.name, (
    select json_agg(row(image_instance.name, (
        select json_agg(image_version.name)
        from image_version
        where image_version.image_instance_id = image_instance.id))) -- join edited
    from image_instance
    where image_instance.image_type_id = image_type.image_type_id)))
from image_type;
2. Don't use scalar sub-queries
This may be a matter of personal taste, but scalar sub-queries tend to be difficult to read (and write: you had an obvious error in the join condition of your innermost scalar sub-query, just to illustrate my point). Use regular sub-queries with explicit joins and aggregations instead:
select row_to_json(row(it.name, iiv.name))
from image_type it
join (select ii.image_type_id, json_agg(row(ii.name, iv.iv_name)) as name
      from image_instance ii
      join (select image_instance_id, json_agg(name) as iv_name
            from image_version
            group by 1) iv on iv.image_instance_id = ii.id
      group by 1) iiv using (image_type_id);
3. Modularize
Right there at the beginning of the documentation, in the Tutorial section (highly recommended reading, however proficient you think you are):
"Making liberal use of views is a key aspect of good SQL database design."
create view iv_json as
    select image_instance_id, json_agg(name) as iv_name
    from image_version
    group by 1;
create view ii_json as
    select image_type_id, json_agg(row(name, iv_name)) as name
    from image_instance
    join iv_json on iv_json.image_instance_id = image_instance.id
    group by 1;
Your main query now becomes:
select row_to_json(row(it.name, ii.name))
from image_type it
join ii_json ii using (image_type_id);
And so on...
This is by far the easiest to code, test and maintain. Performance is usually not a concern: the planner expands the views inline and plans the whole statement as a single query.
Final note: If you are using PostgreSQL 9.4 or later, you can use json_build_object() instead of row_to_json() for more intelligible output, since it lets you name the keys instead of getting f1, f2, and so on.
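As a rough illustration of the view-per-level idea, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Postgres. It assumes the SQLite build includes the JSON functions, and it uses SQLite's json_group_array() and json_object() in place of json_agg() and json_build_object(). The sample rows and the view column names versions and instances are made up for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table image_type (image_type_id integer primary key, name text);
    create table image_instance (id integer primary key, image_type_id integer, name text);
    create table image_version (id integer primary key, image_instance_id integer, name text);

    -- Invented sample data: one type, two instances, three versions.
    insert into image_type values (1, 'base');
    insert into image_instance values (10, 1, 'web'), (11, 1, 'db');
    insert into image_version values (100, 10, 'v1'), (101, 10, 'v2'), (102, 11, 'v1');

    -- One view per nesting level; each level builds on the one below it.
    create view iv_json as
        select image_instance_id, json_group_array(name) as versions
        from image_version
        group by image_instance_id;

    create view ii_json as
        select ii.image_type_id,
               json_group_array(
                   json_object('name', ii.name, 'versions', json(iv.versions))
               ) as instances
        from image_instance ii
        join iv_json iv on iv.image_instance_id = ii.id
        group by ii.image_type_id;
""")

# The main query stays flat however many levels the views hide.
row = conn.execute("""
    select json_object('name', it.name, 'images', json(ii.instances))
    from image_type it
    join ii_json ii using (image_type_id)
""").fetchone()

doc = json.loads(row[0])
print(json.dumps(doc, indent=2))
```

Adding another level then means adding one more view and one more join, rather than another layer of indentation. The json() wrappers are there so that JSON text coming out of a view is embedded as JSON rather than as a quoted string.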

Related

SQL transform id and add where statement before join

I am pretty new to SQL. Here is an operation I am sure is simple for a lot of you. I am trying to join two tables across databases on the same server: TableA (with idA) in dbA and TableB (with idB) in dbB. Before doing that, I want to transform column idA into a number by removing the ":XYZ" suffix from its values, and I also want to add a WHERE condition on another column in dbA. Below is my code for the join, but I am not sure how to convert the values of the column so that I can match idA with idB in the join. Thanks a ton in advance.
Select replace(idA, “:XYZ”, "")
from dbA.TableA guid
where event like “%2015”
left join dbB.TableB own
on guid.idA = own.idB
A few things:
The syntactic order is FROM, joins, then WHERE (unless you use subqueries), and that is also roughly the order of execution. Note that SELECT comes first syntactically but is evaluated near the end.
Alias or fully qualify columns when multiple tables are involved, so it is clear which table each field comes from.
Because the FROM and JOIN clauses are processed first, what you compute in the SELECT list is not yet in scope there. This is why you can't use SELECT column aliases in the FROM or WHERE clauses, or, in many databases, even in GROUP BY.
I don't usually like SELECT *, but as I don't know which columns you really need, I used it here.
As for applying the WHERE before the join: most SQL engines now use cost-based optimization and work out the best execution plan for the tables involved. So just put the limiting criteria in the WHERE clause in this case, since it limits the left table of the left join. If you needed to limit data on the right table of a left join, you'd put the limit in the join criteria, so that it filters as it joins.
You probably need to cast idA to an integer (or to the same type as idB). I used trim to eliminate spaces, but if there are other non-display characters, the left join will still fail to match.
SELECT guid.*, own.*
FROM dbA.TableA guid
LEFT JOIN dbB.TableB own
on cast(trim(replace(guid.idA, ':XYZ', '')) as int) = own.idB
WHERE guid.event like '%2015'
Or materialize the transformation first with a subquery, so that idA is already in its transformed state before the join (as in algebra, parentheses matter and are evaluated from the inside out):
SELECT *
FROM (SELECT cast(trim(replace(guid.idA, ':XYZ', '')) as int) as idA
FROM dbA.TableA guid
WHERE guid.event like '%2015') B
LEFT JOIN dbB.TableB own
on B.IDA = own.idB
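To see the subquery form run end to end, here is a small sketch using Python's built-in sqlite3 module. The two tables live in one in-memory SQLite database rather than in separate dbA and dbB databases, and the rows and the owner column are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table TableA (idA text, event text);
    create table TableB (idB integer, owner text);

    -- Invented sample rows; note the stray spaces around the second idA.
    insert into TableA values
        ('101:XYZ', 'spring 2015'),
        (' 102:XYZ ', 'fall 2015'),
        ('103:XYZ', 'fall 2014');
    insert into TableB values (101, 'alice'), (103, 'carol');
""")

# Strip the suffix, trim, and cast inside the subquery, then join on the clean value.
rows = conn.execute("""
    select B.idA, own.owner
    from (select cast(trim(replace(guid.idA, ':XYZ', '')) as integer) as idA
          from TableA guid
          where guid.event like '%2015') B
    left join TableB own on B.idA = own.idB
    order by B.idA
""").fetchall()

for row in rows:
    print(row)
```

The 2014 row is filtered out before the join, and the row with no partner in TableB is still returned, with a NULL owner, because of the left join.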

Finding which of an array of IDs has no record with a single query

I'm generating prepared statements with PHP PDO to pull in information from two tables based on an array of IDs.
Then I realized that if an ID passed had no record I wouldn't know.
I'm locating records with
SELECT
r.`DEANumber`,
TRIM(r.`ActivityCode`) AS ActivityCode,
TRIM(r.`ActivitySubCode`) as ActivitySubCode,
// other fields...
a.Activity
FROM
`registrants` r,
`activities` a
WHERE r.`DEAnumber` IN ( ?,?,?,?,?,?,?,? )
AND a.Code = ActivityCode
AND a.Subcode = ActivitySubCode
But I am having trouble figuring out the negative join that says which of the IDs has no record.
If two tables were involved I think I could do it like this
SELECT
r.DEAnumber
FROM registrant r
LEFT JOIN registrant2 r2 ON r.DEAnumber = r2.DEAnumber
WHERE r2.DEAnumber IS NULL
But I'm stumped as to how to use the array of IDs here. Obviously I could iterate over the array and track which queries had no result but it seems like such a manual and wasteful way to go...
Obviously I could iterate over the array and track which queries had no result but it seems like such a manual and wasteful way to go.
What could be a real waste is spending time solving this non-existent "problem".
Yes, you could iterate, either manually or with syntactic sugar such as array_diff() in PHP.
I suggest that, instead of making your query more complex (and so harder to maintain) for little gain, you just move on.
As old man Knuth once said, 'premature optimization is the root of all evil'.
The only help from PDO I can think of is a fetch mode that uses the IDs as keys of the returned array, so you can do it without an [explicitly written] loop (this relies on the ID being the first column selected), like
$stmt->execute($ids);
$data = $stmt->fetchAll(PDO::FETCH_UNIQUE);
$notFound = array_diff($ids, array_keys($data));
Yet a manual loop would have taken only two extra lines, which is, honestly, not that big a deal.
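The same "fetch, then diff against the requested IDs" idea can be sketched outside PHP as well. The version below uses Python's built-in sqlite3 module; the table contents and the ID values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table registrants (DEANumber text primary key, ActivityCode text);
    insert into registrants values ('AB1', 'A'), ('CD2', 'B');
""")

# IDs we were asked about; 'ZZ9' deliberately has no record.
ids = ["AB1", "CD2", "ZZ9"]

# One placeholder per ID, as with the generated prepared statement.
placeholders = ", ".join("?" for _ in ids)
cursor = conn.execute(
    f"select DEANumber from registrants where DEANumber in ({placeholders})",
    ids,
)
found = {row[0] for row in cursor}

# Anything requested but not returned has no record.
not_found = [i for i in ids if i not in found]
print(not_found)
```

The database does a single lookup, and the bookkeeping is one set and one list comprehension in application code.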
You are on the right track - a left join that filters out matches will give you the missing joins. You just need to move all conditions on the left-joined table up into the join.
If you leave the conditions on the joined table in the where clause you effectively cause an inner join, because the where clause is executed on the rows after the join is made, which is too late if there was no join in the first place.
Change the query to use proper join syntax, specifying a left join, with the conditions on activities moved into the join's ON clause:
SELECT
    r.DEANumber,
    TRIM(r.ActivityCode) AS ActivityCode,
    TRIM(r.ActivitySubCode) AS ActivitySubCode,
    -- other fields...
    a.Activity
FROM registrants r
LEFT JOIN activities a ON a.Code = r.ActivityCode
    AND a.Subcode = r.ActivitySubCode
WHERE r.DEAnumber IN (?,?,?,?,?,?,?,?)
In your app code, if Activity is null then you know there was no activity for that id.
This won't affect performance much, other than to return (potentially) more rows.
To just select all registrants without activities:
select r.DEAnumber
from registrants r
left join activities a on a.Code = r.ActivityCode
    and a.Subcode = r.ActivitySubCode
where r.`DEAnumber` IN ( ?,?,?,?,?,?,?,? )
    and a.Code is null
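The difference between filtering the joined table in WHERE and filtering it in ON is easy to check on a toy dataset. The sketch below uses Python's built-in sqlite3 module with a simplified schema (no Subcode column) and invented rows, so treat it as an illustration of the rule rather than of the real tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table activities (Code text, Activity text);
    create table registrants (DEANumber text, ActivityCode text);
    insert into activities values ('A', 'Pharmacy'), ('B', 'Hospital');
    -- 'EF3' points at a code that has no activity row.
    insert into registrants values ('AB1', 'A'), ('CD2', 'B'), ('EF3', 'X');
""")

# Condition on the joined table in WHERE: unmatched rows are discarded,
# so the left join behaves like an inner join.
filter_in_where = conn.execute("""
    select r.DEANumber, a.Activity
    from registrants r
    left join activities a on a.Code = r.ActivityCode
    where a.Activity = 'Pharmacy'
    order by r.DEANumber
""").fetchall()

# Same condition in ON: every registrant is kept, with NULL where nothing matched.
filter_in_on = conn.execute("""
    select r.DEANumber, a.Activity
    from registrants r
    left join activities a on a.Code = r.ActivityCode
        and a.Activity = 'Pharmacy'
    order by r.DEANumber
""").fetchall()

# Anti-join: keep only the registrants with no matching activity at all.
without_activity = conn.execute("""
    select r.DEANumber
    from registrants r
    left join activities a on a.Code = r.ActivityCode
    where a.Code is null
""").fetchall()

print(filter_in_where, filter_in_on, without_activity)
```

Only the IS NULL test belongs in WHERE here, because it is deliberately applied after the join to pick out the rows that found no partner.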

Using an INNER JOIN without returning any columns from the joined table

Running an INNER JOIN type of query, I get duplicate column names, which can pose a problem. This has been covered here extensively, and I was able to find the solution (which is also fairly logical): SELECT only the columns I need.
However, I would like to know how I could run such a query without actually returning any of the columns from the joined table.
This is my MySQL query
SELECT * FROM product z
INNER JOIN crosslink__productXmanufacturer a
ON z.id = a.productId
WHERE
(z.title LIKE "%search_term%" OR z.search_keywords LIKE "%search_term%")
AND
z.availability = 1
AND
a.manufacturerId IN (22,23,24)
Question
How would I modify this MySQL query in order to return only columns from product and none of the columns from crosslink__productXmanufacturer?
Add the table name to the *. Replace
SELECT * FROM product z
with
SELECT z.* FROM product z
Often when you are doing this, the intention may be clearer using in or exists rather than a join. The join is being used for filtering, so putting the condition in the where clause makes sense:
SELECT p.*
FROM product p
WHERE (p.title LIKE '%search_term%' OR p.search_keywords LIKE '%search_term%') AND
p.availability = 1 AND
      EXISTS (SELECT 1
              FROM crosslink__productXmanufacturer pXm
              WHERE pXm.productId = p.id AND pXm.manufacturerId IN (22, 23, 24)
             );
With the proper indexes, this should run at least as fast as the join version (the index is crosslink__productXmanufacturer(productId, manufacturerId)). In addition, you don't have to worry about returning duplicate records if there are multiple matches in crosslink__productXmanufacturer.
You may notice two other small changes I made to the query. First, the table aliases are abbreviations of the table names, making the logic easier to follow. Second, the string constants use single quotes (the ANSI standard) rather than double quotes. Using single quotes only for string and date constants helps prevent inadvertent syntax errors.
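The duplicate-row point can be checked with a small experiment. This sketch uses Python's built-in sqlite3 module; the products and manufacturer links are invented, and the search_keywords column is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table product (id integer primary key, title text, availability integer);
    create table crosslink__productXmanufacturer (productId integer, manufacturerId integer);

    insert into product values (1, 'blue widget', 1), (2, 'red widget', 1);
    -- Product 1 is linked to two of the wanted manufacturers, product 2 to none.
    insert into crosslink__productXmanufacturer values (1, 22), (1, 23), (2, 99);
""")

# Join used as a filter: one output row per matching link.
join_rows = conn.execute("""
    select p.id, p.title
    from product p
    inner join crosslink__productXmanufacturer pXm on p.id = pXm.productId
    where p.title like '%widget%'
      and p.availability = 1
      and pXm.manufacturerId in (22, 23, 24)
""").fetchall()

# EXISTS used as a filter: one output row per product, however many links match.
exists_rows = conn.execute("""
    select p.id, p.title
    from product p
    where p.title like '%widget%'
      and p.availability = 1
      and exists (select 1
                  from crosslink__productXmanufacturer pXm
                  where pXm.productId = p.id
                    and pXm.manufacturerId in (22, 23, 24))
""").fetchall()

print(len(join_rows), len(exists_rows))
```

With the join you would need DISTINCT to collapse the repeated product; with EXISTS the question never arises.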

SQL query to select based on many-to-many relationship

This is really a two-part question, but in order not to mix things up, I'll divide into two actual questions. This one is about creating the correct SQL statement for selecting a row based on values in a many-to-many related table:
Now, the question is: what is the absolute simplest way of getting all resources where e.g. metadata.category = 'subject' AND where that category's corresponding metadata.value = 'introduction'?
I'm sure this could be done in a lot of different ways, but I'm a novice in SQL, so please provide the simplest way possible... (If you could describe briefly what the statement means in plain English that would be great too. I have looked at introductions to SQL, but none of those I have found (for beginners) go into these many-to-many selections.)
The easiest way is to use the EXISTS clause. I'm more familiar with MSSQL but this should be close
SELECT *
FROM resources r
WHERE EXISTS (
SELECT *
FROM metadata_resources mr
INNER JOIN metadata m ON (mr.metadata_id = m.id)
WHERE mr.resource_id = r.id AND m.category = 'subject' AND m.value = 'introduction'
)
Translated into English, it means 'return all records for which this subquery returns one or more rows, without returning the data for those rows'. The subquery is correlated to the outer query by the predicate mr.resource_id = r.id, which takes its value from the current outer row.
I'm sure you can google around for more examples of the EXISTS clause.
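For a concrete run of the EXISTS query, here is a minimal sketch using Python's built-in sqlite3 module. The column layout follows the query above, while the title column and all rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table resources (id integer primary key, title text);
    create table metadata (id integer primary key, category text, value text);
    create table metadata_resources (resource_id integer, metadata_id integer);

    insert into resources values (1, 'Intro guide'), (2, 'Advanced guide');
    insert into metadata values
        (1, 'subject', 'introduction'),
        (2, 'subject', 'advanced'),
        (3, 'level', 'introduction');
    -- Resource 2 has 'subject' and 'introduction', but on different metadata rows.
    insert into metadata_resources values (1, 1), (2, 2), (2, 3);
""")

rows = conn.execute("""
    select r.id, r.title
    from resources r
    where exists (
        select 1
        from metadata_resources mr
        inner join metadata m on mr.metadata_id = m.id
        where mr.resource_id = r.id
          and m.category = 'subject'
          and m.value = 'introduction'
    )
""").fetchall()

print(rows)
```

Because both conditions sit in the same subquery, they must hold for the same metadata row, which is why the second resource is not returned.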

MySQL JOIN based on dynamic LIKE statement between multiple tables

I have a table called faq. This table consists of the fields faq_id,faq_subject.
I have another table called article which consists of article_id,ticket_id,a_body and which stores articles in a specific ticket. Naturally there is also a table "ticket" with fields ticket_id,ticket_number.
I want to retrieve a result table in format:
ticket_number,faq_id,faq_subject.
In order to do this I need to search for faq_id in the article.a_body field using a LIKE '%...%' pattern.
My question is, how can I do this dynamically such that I return with SQL one result table, which is in format ticket_number,faq_id,faq_subject.
I tried multiple configurations of UNION ALL, LEFT JOIN, LEFT OUTER JOIN statements, but they all return either too many rows, or have different problems.
Is this even possible with MySQL, and is it possible to write an SQL statement which includes #variables and can take care of this?
First off, that kind of a design is problematic. You have certain data embedded within another column, which is going to cause logic as well as performance problems (since you can't index the a_body in such a way that it will help the JOIN). If this is a one-time thing then that's one issue, but otherwise you're going to have problems with this design.
Second, consider this example: You're searching for faq_id #123. You have an article that includes faq_id 4123. You're going to end up with a false match there. You can embed the faq_id values in the text with some sort of mark-up (for example, [faq_id:123]), but at that point you might as well be saving them off in another table as well.
The following query should work (I think that MySQL supports CAST, if not then you might need to adjust that).
SELECT
    T.ticket_number,
    F.faq_id,
    F.faq_subject
FROM
    article A
INNER JOIN faq F ON
    A.a_body LIKE CONCAT('%', F.faq_id, '%')
INNER JOIN ticket T ON
    T.ticket_id = A.ticket_id
EDIT: Corrected to use CONCAT
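The false-match concern raised above (searching for 123 and hitting 4123) can be reproduced in a few lines. This sketch uses Python's built-in sqlite3 module, so the pattern is built with SQLite's || operator instead of MySQL's CONCAT(); the [faq_id:N] markup is the hypothetical convention suggested earlier, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table faq (faq_id integer primary key, faq_subject text);
    create table article (article_id integer primary key, ticket_id integer, a_body text);

    insert into faq values (123, 'Reset password');
    -- Article 2 mentions a different FAQ whose number merely contains '123'.
    insert into article values
        (1, 1, 'See [faq_id:123] for details'),
        (2, 2, 'See [faq_id:4123] for details');
""")

# Bare-number pattern: '%123%' also matches '4123'.
naive = [row[0] for row in conn.execute("""
    select a.article_id
    from article a
    join faq f on a.a_body like '%' || f.faq_id || '%'
    order by a.article_id
""")]

# Delimited pattern: only the exact marked-up reference matches.
tagged = [row[0] for row in conn.execute("""
    select a.article_id
    from article a
    join faq f on a.a_body like '%[faq_id:' || f.faq_id || ']%'
    order by a.article_id
""")]

print(naive, tagged)
```

Either way the join cannot use an index on a_body, so a separate link table remains the more robust design.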
SELECT DISTINCT t.ticket_number, f.faq_id, f.faq_subject
FROM faq f
INNER JOIN article a ON (a.a_body RLIKE CONCAT('faq_id: ', f.faq_id))
INNER JOIN ticket t ON (t.ticket_id = a.ticket_id)
WHERE somecriteria