Integrating complex SQL queries with DataMapper - datamapper

I have a fairly complicated sorting function I need to use that involves several joins and a bit of math. Because of this, there's no way to express it in the standard DataMapper syntax (such as User.all(:order => ...)). Currently I have something like:
repository(:default).adapter.select('
SELECT users.id, ((count(table1.*) - count(table2.*)) / EXTRACT(EPOCH FROM current_timestamp)) AS sort FROM users
LEFT JOIN table1
ON table1.user_id = users.id
LEFT JOIN table2
ON ...
AND ...
GROUP BY users.id
ORDER BY sort DESC')
However, I want to integrate this into DataMapper such that I can call it on any DataMapper collection.
For example, User.all(:name => 'bob').crazy_sort_function would return users named 'bob' sorted by my special function, or perhaps User.all(:order => crazy_sort), etc...
Whatever formulation integrates this as seamlessly as possible into the DataMapper ORM.
Note: since current_timestamp is part of the query, this isn't something I can compile for all rows periodically and fit into a column.
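One common workaround (a general pattern, not a DataMapper feature) is to run the raw query for the ids only, then reorder the loaded collection in Ruby to match. A plain-Ruby stand-in sketch of that reordering step, with made-up data in place of the adapter call and the collection:

```ruby
# Stand-ins: `sorted_ids` plays the role of the ids returned by
# repository.adapter.select(...), `users` the role of a loaded collection.
sorted_ids = [3, 1, 2]
users      = [{ id: 1 }, { id: 2 }, { id: 3 }]

# Reorder the collection to match the order the raw query produced.
by_sort = users.sort_by { |u| sorted_ids.index(u[:id]) }
```

The trade-off is that the final ordering happens in Ruby after loading, so it composes with other DataMapper conditions but gives up database-side pagination on the computed sort key.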

Related

Lazily evaluate MySQL view

I have some MySQL views which define a number of extra columns based on some relatively straightforward subqueries. The database is also multi-tenanted so each row has a company ID against it.
The problem I have is that my views are evaluated for every row before being filtered by the company ID, giving huge performance issues. Is there any way to lazily evaluate the view so the 'where' clause in the outer query applies to the subqueries in the view? Or is there something similar to views that I can use to add the extra fields? I want to calculate them in SQL so the calculated fields can be used for filtering/searching/sorting/pagination.
I've taken a look at the MySQL docs that explain the algorithms available and am aware that the views can't be processed as a 'merge' since they contain subqueries.
view
create view companies_view as
select *,
(
select count(id) from company_user where company_user.company_id = companies.id
) as user_count,
(
select count(company_user.user_id)
from company_user join users on company_user.user_id = users.id
where company_user.company_id = companies.id
and users.active = 1
) as active_user_count,
(
select count(company_user.user_id)
from company_user join users on company_user.user_id = users.id
where company_user.company_id = companies.id
and users.active = 0
) as inactive_user_count
from companies;
query
select * from companies_view where company_id = 123;
I want the subqueries in the view to be evaluated AFTER applying the 'where company_id = 123' from the main query scope. I can't hard code the company ID in the view since I want the view to be usable for any company ID.
You cannot change the order of evaluation, that is set by the MySQL server.
However, in this particular case you could rewrite the whole sql statement to use joins and conditional counts instead of subqueries:
select c.*,
count(u.id) as user_count,
count(if(u.active=1, 1, null)) as active_user_count,
count(if(u.active=0, 1, null)) as inactive_user_count
from companies c
left join company_user cu on c.id=cu.company_id
left join users u on cu.user_id = u.id
group by c.id, ...
If you have MySQL v5.7, then you may not need to add any further fields to the group by clause, since the other fields in the companies table are functionally dependent on the primary key. In earlier versions you may have to list all fields in the companies table (depending on the sql mode settings).
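The conditional-count trick above works because COUNT ignores NULLs, so count(if(cond, 1, null)) counts only the rows where the condition holds. A quick plain-Ruby analogue of what one grouped row computes, with made-up data:

```ruby
# Stand-in for one company's joined user rows.
users = [{ active: 1 }, { active: 0 }, { active: 1 }]

user_count          = users.size                          # count(u.id)
active_user_count   = users.count { |u| u[:active] == 1 } # count(if(u.active=1, 1, null))
inactive_user_count = users.count { |u| u[:active] == 0 } # count(if(u.active=0, 1, null))
```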
Another way to optimise such a query would be denormalisation. Your users and company_user tables probably have a lot more records than your companies table. You could add user_count, active_user_count, and inactive_user_count fields to the companies table, add after insert / update / delete triggers to the company_user table and an after update trigger to the users table, and maintain these three fields there. This way you would not need the joins and the conditional counts in the view.
It is possible to convince the optimizer to handle a view with scalar subqueries using the MERGE algorithm... you just have to beat the optimizer at its own game.
This will seem quite unorthodox to some, but it is a pattern I use with success in cases where this is needed.
Create a stored function to encapsulate each subquery, then reference the stored function in the view. The optimizer remains blissfully unaware that the functions will invoke the subqueries.
CREATE FUNCTION user_count (_cid INT) RETURNS INT
DETERMINISTIC
READS SQL DATA
RETURN (SELECT count(id) FROM company_user WHERE company_user.company_id = _cid);
Note that a stored function with a single statement does not need BEGIN/END or a change of DELIMITER.
Then in the view, replace the subquery with:
user_count(id) AS user_count,
And repeat the process for each subquery.
The optimizer will then process the view as a MERGE view, select the one appropriate row from the companies table based on the outer WHERE, invoke the functions, and... problem solved.
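What this buys is the ordering of work, which a small Ruby sketch can mimic: hide the expensive per-row computation behind a function, filter first, and the computation only runs for the surviving rows (all names and data below are made up; the call counter just makes the laziness visible):

```ruby
calls = 0
# Stand-in for the stored function: an expensive per-company lookup.
user_count = lambda do |company_id|
  calls += 1
  { 1 => 5, 2 => 7, 3 => 0 }.fetch(company_id, 0)
end

companies = [1, 2, 3]
# Filter first (the outer WHERE), then invoke the function per surviving row.
rows = companies.select { |id| id == 2 }.map { |id| [id, user_count.call(id)] }
```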

What is a simple way on postgres to create deeply nested JSON structures without having to write very complex indented queries?

I'm joining several tables together in a Postgres database, and returning the values in the right-joined table as an aggregated JSON structure in the left-joined table. However, I find that the query becomes more complicated as more tables are joined. For example:
select row_to_json(output)
from (
select image_type.name,
(
select json_agg(instances)
from (
select image_instance.name, (
select json_agg(versions)
from (
select image_version.name
from image_version
where image_version.image_instance_id = image_version.image_instance_id
) versions
) AS versions
from image_instance
where image_instance.image_type_id = image_type.image_type_id
) instances
) AS images
from image_type
) output;
I've joined three tables here; however, I'd like to add several more, and the code will quickly become unwieldy and hard to maintain. Is there a simple way to generate these kinds of aggregated joins?
First of all, JSON is no different than regular fields when combining data from multiple tables: things can get complex quite quickly. There are, however, a few techniques to keep things manageable:
1. Daisy chain functions
There is no need to treat the output from each function independently, you can feed the output from one function as input to the next in a single statement. In your example this means that you lose a level of sub-select for each level of aggregation and you can forget about the aliases. Your example becomes:
select row_to_json(row(image_type.name, (
select json_agg(row(image_instance.name, (
select json_agg(image_version.name)
from image_version
where image_version.image_instance_id = image_instance.id))) -- join edited; row() so json_agg gets one argument
from image_instance
where image_instance.image_type_id = image_type.image_type_id)))
from image_type;
2. Don't use scalar sub-queries
This may be a matter of personal taste, but scalar sub-queries tend to be difficult to read (and write: you had an obvious error in the join condition of your innermost scalar sub-query, just to illustrate my point). Use regular sub-queries with explicit joins and aggregations instead:
select row_to_json(row(it.name, iiv.name))
from image_type it
join (select image_type_id, json_agg(row(ii.name, iv_name)) as name
from image_instance ii
join (select image_instance_id, json_agg(name) as iv_name
from image_version group by 1) iv on iv.image_instance_id = ii.id
group by 1) iiv using (image_type_id);
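The aggregate-then-join shape is easy to mirror in plain Ruby, which also shows the nesting the query builds up (column names follow the question; the data is made up):

```ruby
require 'json'

versions  = [{ image_instance_id: 1, name: 'v1' },
             { image_instance_id: 1, name: 'v2' }]
instances = [{ id: 1, image_type_id: 10, name: 'inst-a' }]

# Aggregate the child rows by foreign key first (the inner subquery)...
agg = versions.group_by { |v| v[:image_instance_id] }
              .transform_values { |vs| vs.map { |v| v[:name] } }

# ...then join the aggregate onto each parent row.
nested = instances.map { |i| { name: i[:name], versions: agg[i[:id]] } }
json = JSON.generate(nested)
```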
3. Modularize
Right there at the beginning of the documentation, in the Tutorial section (highly recommended reading, however proficient you think you are):
Making liberal use of views is a key aspect of good SQL database
design.
create view iv_json as
select image_instance_id, json_agg(name) as iv_name
from image_version
group by 1;
create view ii_json as
select image_type_id, json_agg(row(image_instance.name, iv_name)) as name
from image_instance
join iv_json on iv_json.image_instance_id = image_instance.id
group by 1;
Your main query now becomes:
select row_to_json(row(it.name, ii.name))
from image_type it
join ii_json ii using (image_type_id);
And so on...
This is obviously by far the easiest to code, test and maintain. Performance is a non-issue here: the query optimizer will flatten all the linked views into a single execution plan.
Final note: If you are using PG9.4+, you can use json_build_object() instead of row_to_json() for more intelligible output.

Finding which of an array of IDs has no record with a single query

I'm generating prepared statements with PHP PDO to pull in information from two tables based on an array of IDs.
Then I realized that if an ID passed had no record I wouldn't know.
I'm locating records with
SELECT
r.`DEANumber`,
TRIM(r.`ActivityCode`) AS ActivityCode,
TRIM(r.`ActivitySubCode`) as ActivitySubCode,
-- other fields...
a.Activity
FROM
`registrants` r,
`activities` a
WHERE r.`DEAnumber` IN ( ?,?,?,?,?,?,?,? )
AND a.Code = ActivityCode
AND a.Subcode = ActivitySubCode
But I am having trouble figuring out the negative join that says which of the IDs has no record.
If two tables were involved I think I could do it like this
SELECT
r.DEAnumber
FROM registrant r
LEFT JOIN registrant2 r2 ON r.DEAnumber = r2.DEAnumber
WHERE r2.DEAnumber IS NULL
But I'm stumped as to how to use the array of IDs here. Obviously I could iterate over the array and track which queries had no result, but it seems like such a manual and wasteful way to go...
Obviously I could iterate over the array and track which queries had no result but it seems like such a manual and wasteful way to go.
What could be a real waste is spending time solving this non-existent "problem".
Yes, you could iterate. Either manually, or using a syntax sugar like array_diff() in PHP.
I suggest that instead of making your query more complex (and therefore harder to support) for little gain, you just move on.
As old man Knuth once said 'premature optimization is the root of all evil'.
The only help I can think of from PDO is a fetch mode that uses the IDs as keys for the returned array, so you can do it without an [explicitly written] loop, like
$stmt->execute($ids);
$data = $stmt->fetchAll(PDO::FETCH_UNIQUE);
$notFound = array_diff($ids, array_keys($data));
Yet a manual loop would have taken only two extra lines, which is, honestly, not that big a deal.
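The same missing-ids step expressed in Ruby terms, since Array#- plays the role of array_diff (the data below is made up for illustration):

```ruby
requested = [101, 102, 103]
# Stand-in for the keyed fetchAll(PDO::FETCH_UNIQUE) result: id => row.
returned  = { 101 => { name: 'a' }, 103 => { name: 'c' } }

# The ids you asked for minus the keys that came back = the missing ids.
not_found = requested - returned.keys
```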
You are on the right track - a left join that filters out matches will give you the missing joins. You just need to move all conditions on the left-joined table up into the join.
If you leave the conditions on the joined table in the where clause you effectively cause an inner join, because the where clause is executed on the rows after the join is made, which is too late if there was no join in the first place.
Change the query to use explicit join syntax, specifying a LEFT JOIN, with the conditions on activities moved into the join's ON clause:
SELECT
r.DEANumber,
TRIM(r.ActivityCode) AS ActivityCode,
TRIM(r.ActivitySubCode) as ActivitySubCode,
-- other fields...
a.Activity
FROM registrants r
LEFT JOIN activities a ON a.Code = ActivityCode
AND a.Subcode = ActivitySubCode
WHERE r.DEAnumber IN (?,?,?,?,?,?,?,?)
In your app code, if Activity is null then you know there was no activity for that id.
This won't affect performance much, other than to return (potentially) more rows.
To just select all registrants without activities:
select r.DEAnumber
from registrants r
left join activities a on a.Code = ActivityCode
and a.Subcode = ActivitySubCode
where r.`DEAnumber` IN ( ?,?,?,?,?,?,?,? )
and a.Code is null
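What the LEFT JOIN ... IS NULL pattern computes, sketched in plain Ruby (table contents are made up; a nil partner stands in for the all-NULL right side of an unmatched left join):

```ruby
registrants = [{ dea: 'A' }, { dea: 'B' }, { dea: 'C' }]
activities  = [{ dea: 'A', activity: 'x' }]

# LEFT JOIN: every registrant survives, paired with its match or nil.
joined = registrants.map { |r| [r, activities.find { |a| a[:dea] == r[:dea] }] }

# WHERE a.Code IS NULL: keep only the rows that found no match.
missing = joined.select { |_, a| a.nil? }.map { |r, _| r[:dea] }
```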

ActiveRecord vs. SQL - Is there a cleaner way?

I have a Rails 4 based application that's handling some SIEM style work for us. I'm a big believer in making code as readable as possible and then worrying about optimization. I'm finding that attempting to find all of the events that contain a set of words leads to exceptionally poor performance if I rely on AR, so I've resorted to using SQL directly even though it's fragile.
Is there a better way to do the following using AR?
sql = "select event_id from events_words where generated>'#{starting_time.to_s(:db)}' and word_id in (select id from words where words.text in ('#{terms.join("', '")}')) group by event_id having count(distinct(word_id))=#{terms.count}"
events_words is a join table containing the word_id for every word in every event, the event_id for each event and generated, the timestamp when the event was generated. The generated field is being used to limit search results to a time frame and the table itself is partitioned by date to keep the indices to a size that can fit in RAM.
For even better performance and readability, consider using a JOIN operation in place of the IN (subquery). To improve readability, consider qualifying every column reference.
Personally, I would find this statement to be much more "readable":
SELECT e.event_id
FROM events_words e
JOIN ( SELECT w.id
FROM words w
WHERE w.text IN ('#{terms.join("', '")}')
) s
ON s.id = e.word_id
WHERE e.generated > '#{starting_time.to_s(:db)}'
GROUP BY e.event_id
HAVING COUNT(DISTINCT(e.word_id))=#{terms.count}
... ("readability" gauged in terms of the ability of the reader to quickly figure out what the query is doing).
As to getting a query like that done in ActiveRecord (if that's possible), I am inclined to pity the poor soul that has to wade through whatever that looks like to decipher what the query is actually doing.
EDIT
After reviewing again, I see there's no need for the inline view. (That was generated from the subquery during my initial change to the JOIN operation, but it's not really necessary.)
This should return an equivalent result:
SELECT e.event_id
FROM events_words e
JOIN words w
ON w.id = e.word_id
WHERE e.generated > '#{starting_time.to_s(:db)}'
AND w.text IN ('#{terms.join("', '")}')
GROUP BY e.event_id
HAVING COUNT(DISTINCT(e.word_id))=#{terms.count}
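The heart of this query is the group-and-count trick (sometimes called relational division): an event qualifies only if its distinct matched words cover every term. A plain-Ruby analogue with made-up rows:

```ruby
terms = %w[alpha beta]
# Stand-in for the joined events_words/words rows that matched the terms.
rows = [{ event_id: 1, word: 'alpha' },
        { event_id: 1, word: 'beta'  },
        { event_id: 1, word: 'alpha' },  # duplicates don't help coverage...
        { event_id: 2, word: 'alpha' }]  # ...and partial coverage fails

# GROUP BY event_id HAVING count(distinct word) = terms.count
hits = rows.group_by { |r| r[:event_id] }
           .select { |_, rs| rs.map { |r| r[:word] }.uniq.size == terms.size }
           .keys
```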
You might try this:
EventWord.joins(:word).
where(:words => {:text => terms}).
where("generated > ?", starting_time).
group(:event_id).
having("count(distinct(word_id)) = ?", terms.count).
select(:event_id)
Or ...
Event.joins(:words).
where(:words => {:text => terms}).
where("generated > ?", starting_time).
group(:id).
having("count(distinct(words.id)) = ?", terms.count)

Multiple table joins in rails

How do I write the MySQL query below in Rails ActiveRecord?
select
A.*,
B.*
from
raga_contest_applicants_songs AS A
join
raga_contest_applicants AS B
ON B.contest_applicant_id = A.contest_applicant_id
join
raga_contest_rounds AS C
ON C.contest_cat_id = B.contest_cat_id
WHERE
C.contest_cat_id = contest_cat_id
GROUP BY
C.contest_cat_id
I know how to write joins on two tables; however, I'm not very confident about how to join three tables.
To rewrite the SQL query you've got in your question, I think it should be like the following (though I'm having a hard time fully visualizing your model relationships, so this is a bit of guesswork):
RagaContestApplicantsSong.
joins(:contest_cat, :raga_contest_applicants => :raga_contest_rounds).
group('raga_contest_rounds.contest_cat_id')
...such that the joins method takes care of the two joins as well as the WHERE clause, followed finally by the group call.
As more for reference:
If you're joining multiple associations to the same model you can simply list them:
Post.joins(:category, :comments)
Returns all posts that have a category and at least one comment
If you're joining nested tables you can list them as in a hash:
Post.joins(:comments => :guest)
Returns all posts that have a comment made by a guest
Nested associations, multiple level:
Category.joins(:posts => [{:comments => :guest}, :tags])
Returns all categories with posts that have at least one comment made by a guest and that are also tagged
You can also chain ActiveRecord Query Interface calls such that:
Post.joins(:category, :comments)
...produces the same SQL as...
Post.joins(:category).joins(:comments)
If all else fails, you can always pass a SQL fragment directly into the joins method as a stepping stone to getting from your working query to something more ARQI-centric:
Client.joins('LEFT OUTER JOIN addresses ON addresses.client_id = clients.id')
=> SELECT clients.* FROM clients LEFT OUTER JOIN addresses ON addresses.client_id = clients.id