I want to count the number of documents matching a query using Mongoid, such as:
Chain.where(:updated_at.gte => past_time).count
However, I am worried that what actually happens here is that Mongoid selects and PARSES everything from MongoDB and then returns the count to me. That seems very slow. I want Mongo to return the count to me directly, so that Ruby/Mongoid doesn't have to parse a large number of objects. In MySQL I would do this with COUNT(column), which would spare PHP (for instance) the hassle of parsing and mapping a bunch of rows only to discard them, since I'm only interested in the number of rows returned.
You're worrying needlessly. If you check the Mongoid docs, you'll see that Criteria#count is a thin wrapper around Moped::Query#count. If you look at how Moped::Query#count works, you'll see this:
def count(limit = false)
  command = { count: collection.name, query: selector }
  command.merge!(skip: operation.skip, limit: operation.limit) if limit
  result = collection.database.command(command)
  result["n"].to_i
end
So Moped::Query#count simply sends a count command down into MongoDB, then MongoDB does the counting and sends the count back to your Ruby code.
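To make this concrete, here is a plain-Ruby sketch of the round trip: the command document that goes to the server and the single number that comes back. The helper names `count_command` and `count_from_reply` are hypothetical and only mirror the Moped code above; no documents are fetched or parsed in either direction.

```ruby
# Hypothetical helper that builds the same document as Moped::Query#count.
# Only this small hash is sent to MongoDB; the server does the counting and
# replies with a document whose "n" field holds the result.
def count_command(collection_name, selector, skip: nil, limit: nil)
  command = { count: collection_name, query: selector }
  # Optional paging fields, added here only when given
  # (Moped adds them only when count is called with limit = true)
  command[:skip] = skip if skip
  command[:limit] = limit if limit
  command
end

# Reads the reply the way Moped does: take "n" and convert it to an Integer.
def count_from_reply(reply)
  reply["n"].to_i
end

past_time = Time.utc(2024, 1, 1)
cmd = count_command("chains", { "updated_at" => { "$gte" => past_time } })
p cmd
p count_from_reply({ "n" => 42.0, "ok" => 1.0 }) # made-up reply; prints 42
```

Whatever the number of matching documents, the reply stays one small document, which is why the count is cheap on the Ruby side.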
I'm developing an API using NestJS & TypeORM to fetch data from a MySQL DB. Currently I'm trying to get all the instances of an entity (HearingTonalTestPage) and all the related entities (e.g. Frequency). I can get it using createQueryBuilder:
const queryBuilder = await this.hearingTonalTestPageRepo
.createQueryBuilder('hearing_tonal_test_page')
.innerJoinAndSelect('hearing_tonal_test_page.hearingTest', 'hearingTest')
.innerJoinAndSelect('hearingTest.page', 'page')
.innerJoinAndSelect('hearing_tonal_test_page.frequencies', 'frequencies')
.innerJoinAndSelect('frequencies.frequency', 'frequency')
.where(whereConditions)
.orderBy(`page.${orderBy}`, StringToSortType(pageFilterDto.ascending));
The problem is that this produces a SQL query (screenshot below) that outputs one row per related entity (Frequency), whereas I want one row per HearingTonalTestPage (in the screenshot example, 3 rows instead of 12) without losing its relation data. Reading the docs, this can apparently be achieved easily with the relations option of .find(). QueryBuilder has some relation methods, but from what I've read they produce JOINs under the hood, which of course I want to avoid.
So the million-dollar question is: is it possible with createQueryBuilder to load the relations after querying the main entities (something similar to .find({ relations: { } }))? If so, how can I achieve it?
I'm not an expert, but I had a similar case, and I used:
const qb = this.createQueryBuilder("product");
// apply relations
FindOptionsUtils.applyRelationsRecursively(qb, ["createdBy", "updatedBy"], qb.alias, this.metadata, "");
return qb
.orderBy("product.id", "DESC")
.limit(1)
.getOne();
It worked for me; all relations were loaded correctly.
ref: https://github.com/typeorm/typeorm/blob/master/src/find-options/FindOptionsUtils.ts
You say that you want to avoid JOINs and are looking for an analogue of find({relations: {}}), but, as the documentation says, find({relations: {}}) itself uses LEFT JOINs under the hood. So a query that loads relations can't do without JOINs.
Now about the problem:
"The problem here is that this will produce a SQL query (screenshot below) which will output a line per each related entity (Frequency), when I want to output a line per each HearingTonalTestPage"
Your query looks fine, and so does its result. I think you expected the result to look like a JSON structure (where the relation field contains all of its information nested inside itself, instead of being spread across several rows). But that is not how SQL works. Also, the getMany() method should return 3 HearingTonalTestPage objects, not 12, so what the raw SQL query returns should not worry you.
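To see why 12 joined rows can still become 3 entities, here is a conceptual plain-Ruby sketch of the grouping step an ORM performs when it hydrates joined rows. It is not TypeORM's actual implementation; the row shape and the column names `page_id` and `frequency` are hypothetical.

```ruby
# Stand-in for a hydrated parent entity with a nested child collection
Page = Struct.new(:id, :frequencies)

# Simulated flat JOIN output: 3 pages x 4 frequencies = 12 rows
rows = [1, 2, 3].product([250, 500, 1000, 2000]).map do |page_id, hz|
  { page_id: page_id, frequency: hz }
end

# Hydration: group rows by the parent's primary key and fold the child
# columns into a nested collection, producing one object per parent.
def hydrate(rows)
  rows.group_by { |r| r[:page_id] }.map do |page_id, page_rows|
    Page.new(page_id, page_rows.map { |r| r[:frequency] })
  end
end

pages = hydrate(rows)
puts "#{rows.size} rows -> #{pages.size} pages" # 12 rows -> 3 pages
```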
The main question:
"is it possible with CreateQueryBuilder to load the relations after querying the main entities"
I didn't get what you mean by "after querying the main entities". Can you provide more context?
I am forced to use a raw query with Knex, since there is an issue with union.
A single raw query is not that bad, but now I have a different kind of issue.
All other (non-raw) Knex queries simply return an array with the results.
For example:
knex('user_subscription_plan')
.select('*')
.where('paused_days', '>', 91)
.where('status', 'N_PAUSED')
returns an array, which is empty if there are no results.
However, if I run raw query, for example:
mySqlClient.raw('select * from user')
it returns an array with two arrays inside it.
The first one is the normal result, while the second contains column definitions (catalogue metadata).
That interferes with my logic. At the end of each call to knex, I have:
if (result.length > 0) {
// send email
}
Now, when I run the raw query, the result's length is always greater than zero.
How can I tell Knex to leave out the catalogue definitions, in other words, to send back just the results, exactly like it does for non-raw queries?
According to https://github.com/knex/knex/issues/1802 there is no way around it, just do
// raw() returns a promise, so await it before taking the first element (the rows)
(await mySqlClient.raw('select * from user'))[0]
You should do it like @horatiu-jeflea answered.
That said, Knex could gain an option to run raw query results through the default result parser as well. If you want that, you could open a feature request in its GitHub issues.
There is also https://knexjs.org/#Installation-post-process-response (the postProcessResponse hook), which you should be able to override to control how your raw query results are post-processed.
If you want the same result as a regular select, use select and pass your raw query as an argument without the "SELECT" keyword. In your example, instead of doing:
mySqlClient.raw('select * from user')
You should do:
mySqlClient.select( mySqlClient.raw(' * from user') )
I am trying to sort my data by a timestamp field in my controller. Note that the timestamp field may be null or may have a value. I wrote the following query:
@item = Item.sort_by(&:item_timestamp).reverse
  .paginate(:page => params[:page], :per_page => 5)
But this raises an error when some items have NULL in the item_timestamp field, whereas the following query works:
@item = Item.order(:item_timestamp).reverse
.paginate(:page => params[:page], :per_page => 5)
Can anybody explain the difference between these two queries, and when to use which one?
Also, I am using order and reverse to get the latest items from the database. Is this the best way, or are there better ways to get the latest data from the database in terms of performance?
.sort_by is a Ruby method from Enumerable that is used to sort arrays (or array-like objects). Using .sort_by causes all the records to be loaded from the database into the server's memory, which can lead to serious performance problems (as well as your issue with nil values).
.order is an ActiveRecord method that adds an ORDER BY clause to the SQL SELECT statement, so the database handles sorting the records. This is preferable in the vast majority of cases.
sort_by is executed in Ruby, so if you have a nil value, things will break in a way similar to this:
[3, nil, 1].sort
#=> ArgumentError: comparison of Fixnum with nil failed
order is executed by your RDBMS, which generally handles NULL values fine. Some databases (PostgreSQL, for example) even let you specify where the NULL values go by adding NULLS FIRST or NULLS LAST to your ORDER BY clause; the default placement depends on the database and the sort direction.
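If you do end up sorting already-loaded records in Ruby, you have to handle nil yourself. A minimal sketch, assuming a plain Struct as a hypothetical stand-in for the Item model, that mimics DESC NULLS LAST in memory:

```ruby
# Plain Struct standing in for the Item model (hypothetical, for illustration)
Item = Struct.new(:name, :item_timestamp)

items = [
  Item.new("a", Time.utc(2024, 1, 2)),
  Item.new("b", nil),
  Item.new("c", Time.utc(2024, 1, 3))
]

# items.sort_by(&:item_timestamp) would raise ArgumentError here,
# because Ruby can't compare a Time with nil.

# In-memory equivalent of ORDER BY item_timestamp DESC NULLS LAST:
# the first key sends nils to the end, the second sorts newest first.
newest_first = items.sort_by do |item|
  ts = item.item_timestamp
  ts.nil? ? [1, 0] : [0, -ts.to_f]
end

puts newest_first.map(&:name).inspect # ["c", "a", "b"]
```

The array keys never put nil next to a Time in a comparison, which is what makes the sort safe. For data that is still in the database, order remains the better tool.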
You don't need sort_by in that query; it will be very slow. When you work with the database you should use order. Here is a solution for your problem:
@item = Item.order('item_timestamp DESC NULLS LAST').paginate(:page => params[:page], :per_page => 5)
As was said before, .order is faster and enough in most cases, but sometimes you need sort_by, for example if you want to sort by a value in a related record.
If you have a posts table and a view_counters table that stores the number of views per article, you can't easily sort your posts by total views with .order.
But with sort_by, you can just do:
posts = @user.posts.joins(:view_counter)
@posts = posts.sort_by { |p| p.total_views }
.sort_by iterates over each element, reads the related value, and sorts by it, all in one line of code.
You can shorten the code further with &:attribute_name, for example:
@posts = posts.sort_by(&:total_views)
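A runnable sketch of that in-memory approach, using plain Structs as hypothetical stand-ins for the Post and ViewCounter models (in Rails these would be loaded records, and total_views is assumed to be a method you define). It shows that the block form and the &: shorthand sort the same way, and one way to get the most-viewed posts first:

```ruby
# Hypothetical stand-ins for the models
ViewCounter = Struct.new(:count)
Post = Struct.new(:title, :view_counter) do
  # Assumed helper that reads the value from the related record
  def total_views
    view_counter.count
  end
end

posts = [
  Post.new("first", ViewCounter.new(10)),
  Post.new("second", ViewCounter.new(42)),
  Post.new("third", ViewCounter.new(7))
]

# Block form and Symbol#to_proc form are equivalent
ascending = posts.sort_by { |p| p.total_views }
same      = posts.sort_by(&:total_views)

# Most viewed first: negate the numeric key (or call .reverse on the result)
most_viewed = posts.sort_by { |p| -p.total_views }

puts most_viewed.map(&:title).inspect # ["second", "first", "third"]
```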
Also, for your last question about the reverse, you can do this:
Item.order(item_timestamp: :desc)
When you use sort_by, you break ActiveRecord caching and, as pointed out before, you load all the records into memory.
When writing queries, always think about the SQL world and the memory world separately; they are two different things. It is like having an archive (SQL) and a cart (memory) into which you put the files you take out of the archive to use later.
As most people mentioned, the main difference is that sort_by is a Ruby method and order is a Rails ActiveRecord method. However, which one to use varies case by case. For example, sort_by may be appropriate if you have already retrieved the data from the DB and want to sort the loaded data. If you call order at that point, you go to the database again even though the data is already loaded, and you might introduce an N+1 issue.
So, I am trying to execute a query using the ArcGIS API, but the question should apply to any JSON query. I am fairly new to this query format, so I'm pretty sure I'm missing something, but I can't figure out what.
This page allows testing queries against the database before I actually implement them in my code. Features in this database have several fields, including OBJECTID and Identificatie. I would like to, for example, select the feature where Identificatie = 1. If I enter this in the Where field (Identificatie = 1), though, a "Failed to execute" error appears. This happens for every field except OBJECTID: querying where OBJECTID = 1 returns the correct results. I am obviously doing something wrong, but I don't understand why OBJECTID works here. A brief explanation (or a link to a page documenting JSON queries, which I haven't found) would be appreciated!
Identificatie, along with most other fields in the service you're using, is a string field. Therefore, you need to use single quotes in your WHERE clause:
Identificatie = '1'
Or to get one that actually exists:
Identificatie = '1714100000729432'
OBJECTID = 1 works without quotes because it's a numeric field.
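If you build these WHERE clauses in code, the rule is: quote values for string fields, leave numeric values bare, and double any embedded single quote (standard SQL escaping). A hedged Ruby sketch; the helper name `where_clause` is hypothetical and not part of any ArcGIS SDK:

```ruby
# Hypothetical helper: builds a "field = value" condition for a WHERE clause.
# Strings get single quotes (embedded quotes are doubled, as in standard SQL);
# numbers are emitted bare.
def where_clause(field, value)
  literal =
    case value
    when Numeric then value.to_s
    when String  then "'#{value.gsub("'", "''")}'"
    else raise ArgumentError, "unsupported value: #{value.inspect}"
    end
  "#{field} = #{literal}"
end

puts where_clause("OBJECTID", 1)                       # OBJECTID = 1
puts where_clause("Identificatie", "1714100000729432") # Identificatie = '1714100000729432'
```

The caller still has to pass a String for string fields: where_clause("Identificatie", 1) would reproduce the original unquoted query.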
Here's a link to the correct query. And here's a link to the query with all output fields included.
I am doing something like this:
data = Model.where('something="something"')
random_data = data.rand(100..200)
returns:
NoMethodError (private method `rand' called for #<User::ActiveRecord_Relation:0x007fbab27d7ea8>):
Once I get this random data, I need to iterate through that data, like this:
random_data.each do |rd|
  ...
end
I know there's a way to fetch random data in MySQL, but I need to pick random data about 400 times, so I think loading the data once from the database and picking a random element 400 times is more efficient than running the query 400 times on MySQL.
But how do I get rid of that error?
Thank you in advance
I would add the following scope to the model (depends on the database you are using):
# in app/models/model.rb
# 'RANDOM' works with postgresql and sqlite, whereas mysql uses 'RAND'
scope :random, -> { order('RAND()') }
Then the following query would load a random number (in the range of 200-400) of objects in one query:
Model.random.limit(rand(200...400))
If you really want to do that in Rails and not in the database, then load all records and use sample:
Model.all.sample(rand(200..400))
But that will probably be slower (depending on the number of entries in the database), because Rails would load all records from the database and instantiate them, which might take a lot of memory.
It really depends on how much effort you want to put into optimizing this, because there's more than one solution. Here are two options:
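For the in-memory route, note that Array#sample(n) returns n distinct elements (sampling without replacement), and if n exceeds the array size it simply returns all of them in random order. A small sketch, using a plain array of hypothetical ids in place of loaded records:

```ruby
records = (1..1000).to_a # stand-in for the loaded records

# Between 200 and 400 distinct records, no duplicates
picked = records.sample(rand(200..400))
puts picked.size

# Asking for more than exists just returns everything, shuffled
all_of_them = [1, 2, 3].sample(10)
```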
Something simple is to use ORDER BY RAND() LIMIT 400 to randomly select 400 items.
Alternatively, just select everything under the moon and then use Ruby to randomly pick 400 out of the total result set, e.g.:
data = Model.where(something: 'something').to_a # to_a runs the query once and loads the records
400.times do
data.sample # returns a random model
end
I wouldn't recommend the second method, but it should work.
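Note that the loop above discards each pick. To keep all 400 picks (with replacement, so the same record can come up more than once), collect them into an array. A sketch, again with a plain array standing in for the loaded records:

```ruby
data = %w[alpha beta gamma delta] # stand-in for the loaded records

# 400 independent picks; duplicates are expected with replacement
picks = Array.new(400) { data.sample }

picks.each do |rd|
  # ... work with each random record here ...
end
```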
Another way, which is not DB-specific:
def self.random_record
  # assumes ids are contiguous from 1 to count (no gaps left by deleted rows)
  self.where('something = ? and id = ?', "something", rand(1..self.count)).first
end
The only catch is that two queries are performed: self.count runs one (SELECT COUNT(*) FROM models), and the other is your actual query to get a random record.
Now suppose you want n random records. Then write it like this:
def self.random_records(n)
  records = self.count
  # n random ids in 1..count; duplicates collapse inside IN (...), so you may get fewer than n
  rand_ids = Array.new(n) { rand(1..records) }
  self.where('something = ? and id IN (?)',
             "something", rand_ids)
end
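Both methods above assume the ids are contiguous; after deletions, some random ids will match nothing, so you can get fewer records than you asked for. A swapped-in alternative (not part of the answer above) is to sample from the ids that actually exist, e.g. `Model.where(...).pluck(:id).sample(n)` followed by `where(id: ...)` in Rails. The plain-Ruby sketch below simulates a table with gaps to show the difference; the id list is made up:

```ruby
# Simulated table: ids 1..10 with some rows deleted (gaps at 3, 4, 7)
existing_ids = [1, 2, 5, 6, 8, 9, 10]

# Count-based guess: draws from 1..count, so it can hit deleted ids
# and never reaches ids above count
count = existing_ids.size
guessed = Array.new(5) { rand(1..count) }
hits = guessed & existing_ids # guesses that match a real row (deduplicated)

# Sampling the real ids: every pick exists and there are no duplicates
sampled = existing_ids.sample(5)

puts "guessed #{guessed.inspect}, matched #{hits.inspect}"
puts "sampled #{sampled.inspect}"
```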
Use data.sample(rand(100..200))
For more on why rand doesn't work here, read https://rails.lighthouseapp.com/projects/8994-ruby-on-rails/tickets/4555
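The error itself comes from Ruby: rand is a private method defined in Kernel, so it can be called bare (rand(10)) but not with an explicit receiver like data.rand(...). A small pure-Ruby sketch of that behaviour, with Array#sample as the public method that does what was intended:

```ruby
data = [10, 20, 30, 40, 50]

# Kernel#rand is private, so calling it with an explicit receiver fails
begin
  data.rand(2..3)
rescue NoMethodError => e
  puts e.message # reports a private method 'rand' call
end

# Called without a receiver, rand works as usual
n = rand(2..3)

# Array#sample is the public method for picking random elements
subset = data.sample(n)
```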