Active Record (MySQL) - Select distinct ids from multiple columns

Ver 14.14 Distrib 5.1.73
activerecord (4.1.14)
I have a Trade model that belongs to a lender and a borrower. I want to find all unique counterparties to an institution's trades in one SQL query. The query below works, but only because I flatten and de-duplicate the array after the SQL query:
Trade.where("borrower_id = :id OR lender_id = :id", id: institution.id).uniq.pluck(:lender_id, :borrower_id).flatten.uniq
(I know this includes the institution itself, so we remove it afterwards, e.g. [1,2,3,4] - [1].)
But what I'd like to do is use a GROUP BY clause or something similar so that the SQL query itself handles the flatten.uniq part.
The query below does not work because it returns a nested array of unique lender_id/borrower_id combinations:
Trade.where("borrower_id = :id OR lender_id = :id", id: institution.id).group(:lender_id, :borrower_id).uniq.pluck(:lender_id, :borrower_id)
=> [[1,2], [1,3], [2,3]]
I want a flat array of JUST unique ids: [1,2,3]
Any ideas? Thanks!

I don't understand what you're trying to do, or why you'd want to include a GROUP BY clause in the absence of any aggregate functions.
FWIW, a valid query might look like this...
SELECT DISTINCT t.lender_id
, t.borrower_id
FROM trades t
WHERE 28 IN(t.borrower_id,t.lender_id);
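The query above still returns pairs rather than a flat list. One way to make the database do the flatten.uniq step is a UNION of the two columns, because UNION (unlike UNION ALL) discards duplicate rows. Below is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for MySQL. The trades rows and the counterparty_ids helper are made up for illustration. Each branch selects only the opposite column, so the institution's own id drops out too, unless it appears on both sides of a single trade.

```python
import sqlite3

def counterparty_ids(conn, institution_id):
    # First branch: lenders on trades where the institution borrowed.
    # Second branch: borrowers on trades where the institution lent.
    # UNION removes duplicates, so the result is already a flat, unique list.
    sql = """
        SELECT lender_id AS id FROM trades WHERE borrower_id = ?
        UNION
        SELECT borrower_id AS id FROM trades WHERE lender_id = ?
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, (institution_id, institution_id))]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trades (id INTEGER PRIMARY KEY, lender_id INTEGER, borrower_id INTEGER)"
)
conn.executemany(
    "INSERT INTO trades (lender_id, borrower_id) VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 1), (3, 1), (4, 5)],
)
print(counterparty_ids(conn, 1))  # [2, 3]
```

The same SQL text uses only plain UNION, so it should also be accepted by MySQL 5.1.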

Related

Laravel - Grouping eloquent query by date and user

I have the following database table 'observations'
I am trying to build a table by grouping the observations on three criteria (date, user_id, Type_Name_ID):
I can't think of a way to write a Laravel query that gives the required result.
Usually you can start from the SQL statement that produces these results and then express it with the methods provided by the Query Builder:
>>> $observationsQuery = DB::table('observations')
->selectRaw('date, count(observation_id), user_id, Type_Name_ID')
->groupBy('date', 'user_id', 'Type_Name_ID');
>>> $observationsQuery->toSql();
=> "select date, count(observation_id), user_id, Type_Name_ID from "observations"
group by "date", "user_id", "Type_Name_ID""
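The original table screenshot isn't included, so this sketch uses an invented observations table. Python's sqlite3 stands in for the real database and runs the same SQL the Query Builder generates, with an ORDER BY added for stable output. The count is aliased as total (a hypothetical name) so it is easier to read back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations (observation_id INTEGER PRIMARY KEY, "
    "date TEXT, user_id INTEGER, Type_Name_ID INTEGER)"
)
conn.executemany(
    "INSERT INTO observations (date, user_id, Type_Name_ID) VALUES (?, ?, ?)",
    [
        ("2016-01-01", 1, 10),
        ("2016-01-01", 1, 10),
        ("2016-01-01", 1, 11),
        ("2016-01-02", 2, 10),
    ],
)

# One output row per (date, user_id, Type_Name_ID) combination,
# with the number of observations that fall into that group.
rows = conn.execute(
    "SELECT date, COUNT(observation_id) AS total, user_id, Type_Name_ID "
    "FROM observations "
    "GROUP BY date, user_id, Type_Name_ID "
    "ORDER BY date, user_id, Type_Name_ID"
).fetchall()
for row in rows:
    print(row)
```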

"NOT IN" for Active Record

I have a MySQL query, and I am trying to chain a "NOT IN" onto the end of it.
Here is what it looks like in ruby using Active Record:
not_in = find_by_sql("SELECT parent_dimension_id FROM relations WHERE relation_type_id = 6;").map(&:parent_dimension_id)
joins('INNER JOIN dimensions ON child_dimension_id = dimensions.id')
.where(relation_type_id: model_relation_id,
parent_dimension_id: sub_type_ids,
child_dimension_id: model_type)
.where.not(parent_dimension_id: not_in)
So the SQL query I'm trying to do looks like this:
INNER JOIN dimensions ON child_dimension_id = dimensions.id
WHERE relations.relation_type_id = 5
AND relations.parent_dimension_id
NOT IN(SELECT parent_dimension_id FROM relations WHERE relation_type_id = 6);
Can someone confirm what I should use for that query?
Do I chain on where.not?
If you really do want
SELECT parent_dimension_id
FROM relations
WHERE relation_type_id = 6
as a subquery, you just need to convert that SQL to an ActiveRecord relation:
Relation.select(:parent_dimension_id).where(:relation_type_id => 6)
then use that as a value in a where call the same way you'd use an array:
not_parents = Relation.select(:parent_dimension_id).where(:relation_type_id => 6)
Relation.joins('...')
.where(relation_type_id: model_relation_id, ...)
.where.not(parent_dimension_id: not_parents)
When you use an ActiveRecord relation as a value in a where and that relation selects a single column:
r = M1.select(:one_column).where(...)
M2.where(:column => r)
ActiveRecord is smart enough to inline r's SQL as an in (select one_column ...) rather than doing two queries.
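To see what the inlined subquery amounts to, here is a sketch of the single statement ActiveRecord would be expected to produce. It runs through Python's sqlite3 with invented rows, and the dimensions join is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relations (id INTEGER PRIMARY KEY, relation_type_id INTEGER, "
    "parent_dimension_id INTEGER, child_dimension_id INTEGER)"
)
conn.executemany(
    "INSERT INTO relations (relation_type_id, parent_dimension_id, child_dimension_id) "
    "VALUES (?, ?, ?)",
    [(5, 100, 1), (5, 200, 2), (5, 300, 3), (6, 200, 9)],
)

# The inner SELECT plays the role of the relation passed to where.not,
# so the whole filter is evaluated by the database in one round trip.
remaining = [
    row[0]
    for row in conn.execute(
        """
        SELECT parent_dimension_id FROM relations
        WHERE relation_type_id = 5
          AND parent_dimension_id NOT IN (
              SELECT parent_dimension_id FROM relations WHERE relation_type_id = 6
          )
        ORDER BY parent_dimension_id
        """
    )
]
print(remaining)  # [100, 300]
```

One caveat: if the inner SELECT can return a NULL, NOT IN matches no rows at all. When that column is nullable, it may be worth adding parent_dimension_id IS NOT NULL to the subquery.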
You could probably replace your:
joins('INNER JOIN dimensions ON child_dimension_id = dimensions.id')
with a simpler joins(:some_relation) if your associations are set up.
You can feed where clauses with values or arrays of values, in which case they will be translated into in (?) clauses.
Thus, the last part of your query could contain a mapping:
.where.not(parent_dimension_id: Relation.where(relation_type_id: 6).pluck(:parent_dimension_id))
Or you can use a placeholder condition:
.where('parent_dimension_id not in (?)', Relation.where(relation_type_id: 6).pluck(:parent_dimension_id))
which is essentially the same thing.
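The array-based variants above run two queries. The first loads the ids into application code, and the second binds them into the IN list. Here is a minimal sketch of that second step with Python's sqlite3. The parents_excluding helper is hypothetical and the rows are invented. It builds one ? placeholder per value, and it leaves the condition out when the list is empty, because an empty IN () list is a syntax error in MySQL.

```python
import sqlite3

def parents_excluding(conn, excluded_ids):
    """Second step of the two-query approach: excluded_ids is already a plain list."""
    sql = "SELECT parent_dimension_id FROM relations WHERE relation_type_id = 5"
    params = list(excluded_ids)
    if params:
        # One "?" per value, e.g. "NOT IN (?, ?)"; the values are bound, not interpolated.
        sql += " AND parent_dimension_id NOT IN ({})".format(", ".join("?" * len(params)))
    # With an empty list there is nothing to exclude, so no condition is added.
    sql += " ORDER BY parent_dimension_id"
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relations (id INTEGER PRIMARY KEY, relation_type_id INTEGER, "
    "parent_dimension_id INTEGER, child_dimension_id INTEGER)"
)
conn.executemany(
    "INSERT INTO relations (relation_type_id, parent_dimension_id, child_dimension_id) "
    "VALUES (?, ?, ?)",
    [(5, 100, 1), (5, 200, 2), (5, 300, 3), (6, 200, 9)],
)

# First query: load the ids to exclude into application code.
excluded = [row[0] for row in conn.execute(
    "SELECT parent_dimension_id FROM relations WHERE relation_type_id = 6")]
print(parents_excluding(conn, excluded))  # [100, 300]
print(parents_excluding(conn, []))        # [100, 200, 300]
```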

Can I query a record with multiple associated records that fit certain criteria?

I have two tables, Show, Character. Each Show has_many Characters.
class Show < ActiveRecord::Base
  has_many :characters
end

class Character < ActiveRecord::Base
  belongs_to :show
end
What I want is to return the Shows that are associated with multiple Characters matching certain criteria.
For example, I want to be able to return a list of Shows that have as characters both Batman and Robin. Not Batman OR Robin, Batman AND Robin.
So the query should be something like
Show.includes(:characters).where(characters: {'name = ? AND name = ?', "Batman", "Robin"})
But this returns an error. Is there a syntax for this that would work?
UPDATE
The query
Show.includes(:characters).where('characters.name = ? AND characters.name = ?', "Batman", "Robin")
returns a value of 0, even though there are definitely Shows associated with both Batman and Robin.
Using plain SQL, one solution is:
select s.*
from shows as s
join characters as c1 on (s.id=c1.show_id)
join characters as c2 on (s.id=c2.show_id)
where c1.name='Batman'
and c2.name='Robin';
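A quick way to check the self-join idea is Python's sqlite3 with three made-up shows. Each alias of characters has to match a different required name, so only a show that has both characters survives both joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE characters (id INTEGER PRIMARY KEY, show_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO shows (id, title) VALUES (?, ?)",
    [(1, "Both"), (2, "Batman only"), (3, "Robin only")],
)
conn.executemany(
    "INSERT INTO characters (show_id, name) VALUES (?, ?)",
    [(1, "Batman"), (1, "Robin"), (1, "Alfred"), (2, "Batman"), (3, "Robin")],
)

# c1 must be a Batman row and c2 a Robin row of the same show.
rows = conn.execute(
    """
    SELECT s.id, s.title
    FROM shows AS s
    JOIN characters AS c1 ON s.id = c1.show_id
    JOIN characters AS c2 ON s.id = c2.show_id
    WHERE c1.name = 'Batman' AND c2.name = 'Robin'
    """
).fetchall()
print(rows)  # [(1, 'Both')]
```

If a show can contain the same character name twice, it is returned once per matching pair, so SELECT DISTINCT may be needed.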
In ActiveRecord, I would translate it as:
Show.joins('join characters as c1 on shows.id = c1.show_id')
    .joins('join characters as c2 on shows.id = c2.show_id')
    .where("c1.name = 'Batman'")
    .where("c2.name = 'Robin'")
So you'll have to get a little fancy with SQL here, especially if you want it to be performant and handle different types of matchers.
select count(distinct(characters.name)) as matching_characters_count, shows.* from shows
inner join characters on characters.show_id = shows.id and (characters.name = 'Batman' or characters.name = 'Robin')
group by shows.id
having matching_characters_count = 2
Translated into ActiveRecord:
Show.select('count(distinct(characters.name)) as matching_characters_count, shows.*').
joins("inner join characters on characters.show_id = shows.id and (characters.name = 'Batman' or characters.name = 'Robin')").
group('shows.id').
having('matching_characters_count = 2')
Then you'd probably replace the hard-coded names with bound parameters and rewrite the query with Arel, so you wouldn't be building SQL strings by hand.
A plain SQL solution using aggregate functions. The SQL statement returns the IDs of the 'shows' records you are looking for.
SELECT c.show_id
FROM characters c
WHERE c.name IN ('Batman','Robin')
GROUP BY c.show_id
HAVING COUNT(DISTINCT c.name) = 2
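The GROUP BY / HAVING version generalizes to any list of names. It compares the distinct count with the number of names requested, which is what the super_heroes answer below does in ActiveRecord. This is a sketch in Python's sqlite3. The shows_with_all helper is hypothetical and the data is invented.

```python
import sqlite3

def shows_with_all(conn, names):
    # Keep only rows with one of the wanted names, then require every
    # distinct name to be present for a show to qualify.
    placeholders = ", ".join("?" * len(names))
    sql = (
        "SELECT c.show_id FROM characters c "
        f"WHERE c.name IN ({placeholders}) "
        "GROUP BY c.show_id "
        "HAVING COUNT(DISTINCT c.name) = ? "
        "ORDER BY c.show_id"
    )
    # len(set(names)) so a repeated name in the input does not raise the bar.
    return [row[0] for row in conn.execute(sql, [*names, len(set(names))])]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE characters (id INTEGER PRIMARY KEY, show_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO characters (show_id, name) VALUES (?, ?)",
    [(1, "Batman"), (1, "Robin"), (1, "Alfred"),
     (2, "Batman"), (2, "Batman"),
     (3, "Robin"),
     (4, "Batman"), (4, "Robin")],
)
print(shows_with_all(conn, ["Batman", "Robin"]))  # [1, 4]
```

The returned ids are the kind of array that select_values() would hand back for the plain SQL version.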
You can pass this statement to select_values() and then load the shows whose ids are in the returned array.
I think this must work:
super_heroes = ['Batman', 'Robin']
Show.where(
id: Character.select(:show_id).where(name: super_heroes).group(:show_id)
.having("count(distinct characters.name) = #{super_heroes.size}")
)
I just wrote @sschmeck's query the Rails way, as a subquery, and added a super_heroes variable to show how it scales to more names.
The basic idea for better performance is to reduce the number of rows the query works with as early as possible.
Show.where("id IN (( SELECT show_id FROM characters WHERE name = 'Batman') INTERSECT (SELECT show_id FROM characters WHERE name = 'Robin'))")
In the query above, the first subquery gets the show_ids for 'Batman' and the second gets the show_ids for 'Robin'. In my opinion, only a few rows remain after the INTERSECT. By the way, you can use the EXPLAIN command in 'rails db' to decide which solution is better.
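The INTERSECT version can be tried the same way. This sketch uses Python's sqlite3 with invented rows. SQLite does not accept parentheses around the individual members of a compound SELECT, so they are dropped here. Older MySQL versions such as the 5.x series have no INTERSECT at all, so this form suits databases like PostgreSQL or SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE characters (id INTEGER PRIMARY KEY, show_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO shows (id, title) VALUES (?, ?)",
    [(1, "Both"), (2, "Batman only"), (3, "Robin only")],
)
conn.executemany(
    "INSERT INTO characters (show_id, name) VALUES (?, ?)",
    [(1, "Batman"), (1, "Robin"), (2, "Batman"), (3, "Robin")],
)

# Each SELECT yields the show_ids for one name; INTERSECT keeps only
# the ids present in both sets before the outer query touches shows.
rows = conn.execute(
    """
    SELECT id, title FROM shows
    WHERE id IN (
        SELECT show_id FROM characters WHERE name = 'Batman'
        INTERSECT
        SELECT show_id FROM characters WHERE name = 'Robin'
    )
    ORDER BY id
    """
).fetchall()
print(rows)  # [(1, 'Both')]
```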
I think you're looking for this:
Show.includes(:characters).where(characters: {name: ["Batman", "Robin"]})
or
Show.includes(:characters).where('characters.name IN (?)', ["Batman", "Robin"]).references(:characters)
The references is required because we're using a string in the where method.
So first of all, why is your approach wrong?
In the "characters_shows" table (shows joined with characters) you can find rows that look like this:
show_id  name        character_id  character_name
1        first show  1             batman
2        second      1             batman
1        first show  2             robin
1        first show  3             superman
As you can see, there will never be a case where the character name is both batman and robin in the same row.
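Here is a small illustration of that point, using hypothetical rows in Python's sqlite3. WHERE is evaluated one row at a time, so the AND condition never matches. An IN list matches shows that have either character (Batman OR Robin), which is why the approach that follows needs a grouped count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE characters (id INTEGER PRIMARY KEY, show_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO characters (show_id, name) VALUES (?, ?)",
    [(1, "Batman"), (1, "Robin"), (2, "Batman")],
)

# No single row can have two different names at once.
both_in_one_row = conn.execute(
    "SELECT COUNT(*) FROM characters WHERE name = 'Batman' AND name = 'Robin'"
).fetchone()[0]

# IN means "Batman OR Robin", so show 2 (Batman only) is included too.
either_ids = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT show_id FROM characters "
        "WHERE name IN ('Batman', 'Robin') ORDER BY show_id"
    )
]

print(both_in_one_row)  # 0
print(either_ids)       # [1, 2]
```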
Another approach will be something like this
names = ["Batman", "Robin"]
Show.joins(:characters).where(characters: {name: names}).group("shows.id").having("COUNT(*) = #{names.length}")
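One caveat with COUNT(*) in the query above: it counts matching rows, not distinct names. A show with two characters that are both named Batman would therefore also pass. This sketch compares COUNT(*) with COUNT(DISTINCT name) using invented rows in Python's sqlite3. The matching_shows helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE characters (id INTEGER PRIMARY KEY, show_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO characters (show_id, name) VALUES (?, ?)",
    # Show 2 has two rows named Batman but no Robin.
    [(1, "Batman"), (1, "Robin"), (2, "Batman"), (2, "Batman")],
)

def matching_shows(count_expr):
    # count_expr is a trusted constant here, so plain string formatting is fine.
    sql = (
        "SELECT show_id FROM characters WHERE name IN ('Batman', 'Robin') "
        f"GROUP BY show_id HAVING {count_expr} = 2 ORDER BY show_id"
    )
    return [row[0] for row in conn.execute(sql)]

print(matching_shows("COUNT(*)"))              # [1, 2]  (show 2 is a false positive)
print(matching_shows("COUNT(DISTINCT name)"))  # [1]
```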

mysql search for category from serialized array

I have a table 'users_category'
'users_category' : id, prefix, name
and table 'users'
'users' : categories, etc...
where users.categories is an array of users_category.id ids
A user can belong to any number of categories. If I store the categories they belong to as a serialized array, how can I run a query to check whether they belong to 'category x'?
Ex:
SELECT users.*, users_category.* FROM 'users', 'users_category' WHERE users.categories='category x' AND users_category.id = 'category x'
I can't run a LIKE query because users.categories is serialized. Is there any way to search within the serialized data? Also, I know that the above query may have errors.
Normalizing the table structure is the better way forward, but if that route is not practical at this point, you may try this query:
SELECT u.*, uc.*
FROM users u, users_category uc
WHERE uc.name='X' AND FIND_IN_SET(uc.id, u.categories)
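It uses MySQL's FIND_IN_SET() function, which only works if users.categories holds a plain comma-separated list of ids (e.g. '1,2,5') rather than a PHP-style serialized string.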
Working demo available at: http://sqlfiddle.com/#!2/7f392/3
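To show what FIND_IN_SET() computes, this sketch registers a Python stand-in for it in SQLite via sqlite3.Connection.create_function. The find_in_set function is our own hypothetical emulation, not MySQL's implementation. It compares strings exactly, whereas MySQL follows the column collation. The rows are made up. The example also shows why FIND_IN_SET is safer than LIKE here: category 1 does not match a user whose list contains only 12.

```python
import sqlite3

def find_in_set(needle, haystack):
    """Hypothetical emulation of MySQL FIND_IN_SET(str, strlist): 1-based position
    of needle in a comma-separated list, 0 if absent, None (NULL) if either is NULL."""
    if needle is None or haystack is None:
        return None
    items = str(haystack).split(",") if haystack != "" else []
    needle = str(needle)
    return items.index(needle) + 1 if needle in items else 0

conn = sqlite3.connect(":memory:")
conn.create_function("FIND_IN_SET", 2, find_in_set)
conn.execute("CREATE TABLE users_category (id INTEGER PRIMARY KEY, prefix TEXT, name TEXT)")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, categories TEXT)")
conn.executemany(
    "INSERT INTO users_category (id, prefix, name) VALUES (?, ?, ?)",
    [(1, "a", "X"), (2, "b", "Y"), (12, "c", "Z")],
)
conn.executemany(
    "INSERT INTO users (id, categories) VALUES (?, ?)",
    [(1, "1,2"), (2, "12"), (3, "2")],
)

# Same shape as the answer's query; a nonzero FIND_IN_SET result counts as true.
rows = conn.execute(
    "SELECT u.id, uc.name FROM users u, users_category uc "
    "WHERE uc.name = 'X' AND FIND_IN_SET(uc.id, u.categories) "
    "ORDER BY u.id"
).fetchall()
print(rows)  # [(1, 'X')]
```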

Django ORM - Grouped aggregates with different select clauses

Imagine we have the Django ORM model Meetup with the following definition:
class Meetup(models.Model):
language = models.CharField()
speaker = models.CharField()
date = models.DateField(auto_now=True)
I'd like to use a single query to fetch the language, speaker and date for the latest event for each language.
>>> Meetup.objects.create(language='python', speaker='mike')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='python', speaker='ryan')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='node', speaker='noah')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='node', speaker='shawn')
<Meetup: Meetup object>
>>> Meetup.objects.values("language").annotate(latest_date=models.Max("date")).values("language", "speaker", "latest_date")
[
{'speaker': u'mike', 'language': u'python', 'latest_date': ...},
{'speaker': u'ryan', 'language': u'python', 'latest_date': ...},
{'speaker': u'noah', 'language': u'node', 'latest_date': ...},
{'speaker': u'shawn', 'language': u'node', 'latest_date': ...},
]
D'oh! We're getting the latest event, but for the wrong grouping!
It seems like I need a way to GROUP BY the language but SELECT on a different set of fields?
Update - this sort of query seems fairly easy to express in SQL:
SELECT language, speaker, MAX(date)
FROM app_meetup
GROUP BY language;
I'd love a way to do this without using Django's raw() - is it possible?
Update 2 - after much searching, it seems there are similar questions on SO:
Django Query that gets the most recent objects
How can I do a greatest n per group query in Django
MySQL calls this sort of query a group-wise maximum of a certain column.
Update 3 - in the end, with @danihp's help, it seems the best you can do is two queries. I've used the following approach:
from django.db.models import Max

# Abuse the fact that the latest Meetup always has a higher PK to build
# a ValuesList of the latest Meetups grouped by "language".
latest_meetup_pks = (Meetup.objects.values("language")
.annotate(latest_pk=Max("pk"))
.values_list("latest_pk", flat=True))
# Use a second query to grab those latest Meetups!
Meetup.objects.filter(pk__in=latest_meetup_pks)
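In plain SQL, the same "highest pk per language" trick fits in one statement with a subquery. This is a sketch with Python's sqlite3 and invented app_meetup rows. Like the Django version, it assumes that a later meetup always has a larger id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_meetup (id INTEGER PRIMARY KEY, language TEXT, speaker TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO app_meetup (language, speaker, date) VALUES (?, ?, ?)",
    [("python", "mike", "2015-01-01"), ("python", "ryan", "2015-02-01"),
     ("node", "noah", "2015-01-15"), ("node", "shawn", "2015-03-01")],
)

# The subquery picks the largest id per language (assumed to be the latest
# meetup); the outer query fetches the full rows for those ids.
rows = conn.execute(
    """
    SELECT language, speaker, date FROM app_meetup
    WHERE id IN (SELECT MAX(id) FROM app_meetup GROUP BY language)
    ORDER BY language
    """
).fetchall()
print(rows)  # [('node', 'shawn', '2015-03-01'), ('python', 'ryan', '2015-02-01')]
```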
This question is a follow up to my previous question:
Django ORM - Get latest record for group
This is the kind of query that is easy to explain but hard to write. If this were SQL, I would suggest a CTE that ranks the rows with a window function partitioned by language and ordered by date (desc), and then keeps only the top-ranked row of each language.
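For reference, the window-function query described above might look like this. It runs through Python's sqlite3 with invented rows. Window functions need SQLite 3.25 or newer, and MySQL only supports them (and CTEs) from 8.0. ROW_NUMBER picks one row per language even if two meetups share the latest date, whereas RANK would keep both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_meetup (id INTEGER PRIMARY KEY, language TEXT, speaker TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO app_meetup (language, speaker, date) VALUES (?, ?, ?)",
    [("python", "mike", "2015-01-01"), ("python", "ryan", "2015-02-01"),
     ("node", "noah", "2015-01-15"), ("node", "shawn", "2015-03-01")],
)

# Number the meetups inside each language, newest first, and keep number 1.
rows = conn.execute(
    """
    WITH ranked AS (
        SELECT language, speaker, date,
               ROW_NUMBER() OVER (PARTITION BY language ORDER BY date DESC) AS rn
        FROM app_meetup
    )
    SELECT language, speaker, date FROM ranked WHERE rn = 1 ORDER BY language
    """
).fetchall()
print(rows)  # [('node', 'shawn', '2015-03-01'), ('python', 'ryan', '2015-02-01')]
```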
But this is not SQL; this is the Django query API. The easy way is to run one query per language:
languages = Meetup.objects.values_list("language", flat=True).distinct().order_by()
last_by_language = [ Meetup
.objects
.filter( language = l )
.latest( 'date' )
for l in languages
]
This crashes (latest() raises Meetup.DoesNotExist) if some language has no meetups.
The other approach is to get the max date for each language:
from functools import reduce

from django.db import models
from django.db.models import Q

last_dates = ( Meetup
.objects
.values("language")
.annotate(ldate=models.Max("date"))
.order_by() )
q= reduce(lambda q,meetup:
q | ( Q( language = meetup["language"] ) & Q( date = meetup["ldate"] ) ),
last_dates, Q())
your_query = Meetup.objects.filter(q)
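The Q-object chain above builds one (language = X AND date = max date) condition per language. In SQL, the equivalent is a join against the per-language maximums. This sketch uses Python's sqlite3 with invented rows. The data includes a tie on python's latest date to show that, like the Q version, the join returns every meetup that shares the maximum date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_meetup (id INTEGER PRIMARY KEY, language TEXT, speaker TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO app_meetup (language, speaker, date) VALUES (?, ?, ?)",
    [("python", "mike", "2015-01-01"), ("python", "ryan", "2015-02-01"),
     ("python", "anna", "2015-02-01"),  # same date as ryan: a tie
     ("node", "noah", "2015-01-15"), ("node", "shawn", "2015-03-01")],
)

# The derived table holds (language, max date) pairs, like last_dates;
# joining on both columns mirrors OR-ing the per-language Q objects.
rows = conn.execute(
    """
    SELECT m.language, m.speaker, m.date
    FROM app_meetup AS m
    JOIN (
        SELECT language, MAX(date) AS ldate
        FROM app_meetup
        GROUP BY language
    ) AS latest
      ON m.language = latest.language AND m.date = latest.ldate
    ORDER BY m.language, m.speaker
    """
).fetchall()
for row in rows:
    print(row)
```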
Perhaps someone can explain how to do it in a single query without raw sql.
Edited due to OP's comment
You are looking for:
"SELECT language, speaker, MAX(date) FROM app_meetup GROUP BY language"
Not all RDBMSs support this expression, because every field in the SELECT clause that is not wrapped in an aggregate function must appear in the GROUP BY clause. In your case, speaker is in the SELECT clause (without an aggregate function) but does not appear in the GROUP BY.
In MySQL there is no guarantee that the speaker shown is the one that matches the max date. Because of this, this is not an easy query.
Quoting MySQL docs:
In standard SQL, a query that includes a GROUP BY clause cannot refer
to nonaggregated columns in the select list that are not named in the
GROUP BY clause...However, this is useful primarily when all values
in each nonaggregated column not named in the GROUP BY are the same
for each group.
The closest query to your requirements is:
results = ( Meetup
.objects
.values("language","speaker")
.annotate(ldate=models.Max("date"))
.order_by() )