ThinkingSphinx indexing where the condition is in a joined table

How can I index only the records of active users? In my scenario I need to index messages, but only those of users who are active.
So in the Message model:
define_index do
  indexes messages.subject
  indexes messages.body
  where "messages.user.is_active = 1"
end
How can this conditional clause be implemented?

Try adding a field for the association. After you reindex, the generated SQL query in config/development.sphinx.conf will include the join.
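define_index do
  indexes subject
  indexes body
  # Referencing the association is what adds the join to the users table.
  indexes user.is_active, :as => :user_is_active
  # where takes raw SQL, so it uses the real table name (users).
  where "users.is_active = 1"
end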
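The generated SQL joins messages to users and filters on the active flag, so rows from inactive users never reach Sphinx. Below is a minimal plain-Ruby sketch of that effect. It does not use the Thinking Sphinx API. The in-memory USERS and MESSAGES data and both helper methods are hypothetical stand-ins for the two tables and the indexer.

```ruby
# Hypothetical in-memory rows standing in for the users and messages tables.
USERS = {
  1 => { is_active: 1 },
  2 => { is_active: 0 }
}.freeze

MESSAGES = [
  { id: 10, user_id: 1, subject: "Hello", body: "first post" },
  { id: 11, user_id: 2, subject: "Spam", body: "buy now" },
  { id: 12, user_id: 1, subject: "Again", body: "second post" }
].freeze

# Roughly mimics the SQL the index definition produces:
#   SELECT messages.* FROM messages
#   LEFT OUTER JOIN users ON users.id = messages.user_id
#   WHERE users.is_active = 1
def indexable_messages(messages, users)
  messages.select do |m|
    user = users[m[:user_id]]       # the join
    user && user[:is_active] == 1   # the where condition
  end
end

# Builds a tiny inverted index (word => message ids) from the rows that
# survived the where condition; messages of inactive users never get in.
def build_word_index(messages)
  index = Hash.new { |h, k| h[k] = [] }
  messages.each do |m|
    "#{m[:subject]} #{m[:body]}".downcase.split.uniq.each { |w| index[w] << m[:id] }
  end
  index
end
```

For this data, build_word_index(indexable_messages(MESSAGES, USERS))["post"] gives [10, 12]. There is no entry at all for "spam" or "buy".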

Related

database - how to do right indexing for fast execution of large data in mysql

I have a table which has a huge amount of data. I have 9 columns in that table (bp_detail) plus 1 ID column, which is the primary key. I am fetching data using this query:
select * from bp_detail
So what do I need to do to get the data quickly? Do I need to create indexes? If yes, on which columns?
I am also using that table (bp_detail) in an inner join with another table (extras) to get records based on a WHERE clause, and the query I am using is:
select * from bp_detail bp inner join extras e
on (bp.id = e.bp_id)
where bp.id = '4' or bp.name = 'john'
I have joined these tables by applying a foreign key on bp_detail.id and extras.bp_id. In this case, what should I do to get the data quickly? Right now I have an index on the column "name" in the extras table.
Any guidance would be highly appreciated.
If you are selecting all records, you gain nothing by indexing any column. An index makes filtering and ordering by the database engine quicker. Imagine a large book with 20,000 pages. With an index on the first page listing chapter names and page numbers, you can quickly navigate through the book. The same applies to the database, since it is nothing more than a collection of records kept one after another.
You are planning to join tables though. The filtering takes place when JOINING:
on (bp.id = e.bp_id)
and in the WHERE:
where bp.id = '4' or bp.name = 'john'
(Anyway, any reason why you are filtering by both the ID and the NAME? ID should be unique enough).
Usually table IDs are primary keys, so the join is covered. If you plan to filter by name frequently, consider adding an index there too. You should also read up on how database indexes work.
Regarding the name index, lookup speed depends on the type of search. An = equality search will be very quick. A search with only a trailing wildcard (e.g. name LIKE 'john%') will be quite quick too. A search with wildcards on both sides (e.g. name LIKE '%john%') will be quite slow, because the index cannot narrow it down.
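To see why the wildcard position matters, picture the index on name as a sorted list. Below is a minimal Ruby sketch in which a plain sorted array stands in for the B-tree. NAMES and the helper methods are hypothetical. Equality and a trailing-wildcard prefix both resolve to one contiguous slice found by binary search. A leading wildcard forces a check of every entry.

```ruby
# Hypothetical index on name: kept sorted, like the leaf level of a B-tree.
NAMES = %w[alice bob john johnny johnson mary peterjohn zed].sort.freeze

# First position whose value is >= key (binary search, O(log n)).
def lower_bound(sorted, key)
  sorted.bsearch_index { |v| v >= key } || sorted.size
end

# name = 'john': one binary search, then read while the value is equal.
def equal_lookup(sorted, key)
  i = lower_bound(sorted, key)
  result = []
  while i < sorted.size && sorted[i] == key
    result << sorted[i]
    i += 1
  end
  result
end

# name LIKE 'john%': all matches are contiguous, starting at lower_bound.
def prefix_lookup(sorted, prefix)
  i = lower_bound(sorted, prefix)
  result = []
  while i < sorted.size && sorted[i].start_with?(prefix)
    result << sorted[i]
    i += 1
  end
  result
end

# name LIKE '%john%': the sort order does not help, so every entry is tested.
def infix_scan(sorted, fragment)
  sorted.select { |v| v.include?(fragment) }
end
```

For this data, prefix_lookup(NAMES, "john") returns john, johnny and johnson after a single binary search. infix_scan(NAMES, "john") also finds peterjohn, but it has to look at all eight entries to do so.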
Anyway, is your database actually large? If there is not much data and your application is not read-intensive, this looks like the classic beginner's mistake known as premature optimization.
It depends on your search criteria. If you are just selecting all of the data, the primary key is enough. To speed up the join, you can create an index on e.bp_id. I could help you more if you shared the table schemas.
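MySQL typically executes such a join as a nested loop. For each matching bp_detail row, it fetches the extras rows with that bp_id. Below is a minimal Ruby sketch of why an index on e.bp_id helps. The rows are hypothetical, and a Hash plays the role of the index, so each outer row costs one lookup instead of a scan of all of extras.

```ruby
# Hypothetical rows for the two tables.
BP_DETAIL = [
  { id: 4, name: "john" },
  { id: 5, name: "mary" },
  { id: 6, name: "john" }
].freeze

EXTRAS = [
  { id: 1, bp_id: 4, info: "a" },
  { id: 2, bp_id: 6, info: "b" },
  { id: 3, bp_id: 5, info: "c" },
  { id: 4, bp_id: 4, info: "d" }
].freeze

# Stand-in for an index on extras.bp_id: bp_id => matching extras rows.
EXTRAS_BY_BP_ID = EXTRAS.group_by { |e| e[:bp_id] }.freeze

# SELECT * FROM bp_detail bp INNER JOIN extras e ON (bp.id = e.bp_id)
# WHERE bp.id = 4 OR bp.name = 'john'
# Nested loop: filter the outer table, then probe the index once per outer row
# instead of scanning all of extras each time.
def join_with_index(bp_rows, extras_index)
  outer = bp_rows.select { |bp| bp[:id] == 4 || bp[:name] == "john" }
  outer.flat_map do |bp|
    (extras_index[bp[:id]] || []).map { |e| bp.merge(extra_id: e[:id], info: e[:info]) }
  end
end
```

For this data, join_with_index(BP_DETAIL, EXTRAS_BY_BP_ID) returns the extras rows 1, 4 and 2, all belonging to a bp_detail row named "john".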

MySQL indexes optimisation

I have a big query that joins several tables and has WHERE clauses.
From my understanding, the best index to have comes from looking at the WHERE clause and indexing the column it uses, e.g.:
select name from Table WHERE name = 'John'
We would have an index on the "name" field.
How would we determine the best index to have if the clause looks like this:
WHERE table1.field = 'x' and table2.field = 'y' etc...
Of course the query is much more complicated than that. I just want to know how to proceed, and whether you have a better idea.
SELECT ...
FROM tA
JOIN tB ON tA.x = tB.y
WHERE tA.name = 'foo'
AND tB.name = 'bar'
begs for
tA: INDEX(name, x)
tB: INDEX(name, y)
On the other hand:
SELECT ...
FROM tA
JOIN tB ON tA.name = tB.name
needs INDEX(name) on both tables.
If name is the PRIMARY KEY on each table, then those indexes are redundant and should not be added.
Etc.
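One way to picture why INDEX(name, x) on tA and INDEX(name, y) on tB suit the first query: each composite index is a list sorted by (name, column). Equality on name therefore selects one contiguous slice, and the join column is already at hand in that slice. Below is a minimal Ruby sketch with hypothetical data, where sorted arrays of pairs stand in for the B-trees.

```ruby
# Hypothetical composite indexes as sorted [name, value] pairs:
# tA INDEX(name, x) and tB INDEX(name, y).
TA_NAME_X = [["bar", 1], ["foo", 2], ["foo", 4], ["foo", 7], ["zap", 3]].sort.freeze
TB_NAME_Y = [["bar", 2], ["bar", 5], ["bar", 7], ["foo", 7]].sort.freeze

# Every x (or y) stored for one name: binary search to the start of the
# name's slice, then read forward while the name still matches.
def values_for(index, name)
  i = index.bsearch_index { |(n, _)| n >= name } || index.size
  out = []
  while i < index.size && index[i][0] == name
    out << index[i][1]
    i += 1
  end
  out
end

# Seek on both index columns at once: is [name, y] present?
def tb_has?(index, name, y)
  i = index.bsearch_index { |pair| (pair <=> [name, y]) >= 0 }
  !i.nil? && index[i] == [name, y]
end

# SELECT ... FROM tA JOIN tB ON tA.x = tB.y
# WHERE tA.name = 'foo' AND tB.name = 'bar'
def join_xs(ta_index, tb_index)
  values_for(ta_index, "foo").select { |x| tb_has?(tb_index, "bar", x) }
end
```

Neither table's rows are touched here. Both the filter and the join condition are answered from the two composite indexes.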
How would we determine the best index to have if the clause looks like this:
WHERE table1.field = 'x' and table2.field = 'y' etc...
First of all, since you are joining 2 tables, the join columns should be indexed, and for better performance they should be of an integer type.
Next, check which condition filters out more data (i.e. reduces the number of rows the most) and create an index on that column, or a composite index on several columns. In a composite index, put the column that filters out the most data in the leftmost position, but do not let the index grow too large.
Normally (though not always) MySQL uses one index per table. Since you are filtering data from multiple tables, you can create an index on the relevant columns of each table, if those columns filter the data sufficiently.
Anyone could give better advice after seeing your actual query.
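The "which condition filters more" check can be estimated by comparing how many rows each condition keeps. The condition that keeps the fewest rows is the better candidate for the leftmost column. Below is a minimal Ruby sketch over hypothetical sample rows. On a real server, you would get the same numbers from SELECT COUNT(*) queries with each condition.

```ruby
# Hypothetical sample of rows from one table.
ROWS = [
  { status: "active", country: "US", city: "Boston" },
  { status: "active", country: "US", city: "Denver" },
  { status: "active", country: "FR", city: "Paris" },
  { status: "active", country: "US", city: "Boston" },
  { status: "closed", country: "US", city: "Austin" }
].freeze

# Fraction of rows a single equality condition keeps (lower = more selective).
def kept_fraction(rows, column, value)
  rows.count { |r| r[column] == value }.fdiv(rows.size)
end

# Orders the conditions from most to least selective; the first one is the
# suggested leftmost column of a composite index.
def leftmost_order(rows, conditions)
  conditions.sort_by { |column, value| kept_fraction(rows, column, value) }.map(&:first)
end
```

Here city = 'Boston' keeps 40% of the rows and status = 'active' keeps 80%. leftmost_order therefore suggests an index on (city, status) rather than (status, city).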
There is no such thing as a single index for multiple tables. The first thing you could do is create an index on field in table1 and another one on field in table2. If this is still not fast enough, depending on your database schema, you could set a foreign key.
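Lastly, in databases that support indexed (materialized) views, such as SQL Server, you can create a view which contains data from both tables and then index that view. The advantage is that the data is pre-joined, which might make the query even faster. MySQL views are not materialized and cannot be indexed, so this option is not available there.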

News articles database - should I use index on date_publish field for faster SELECTS

I need some advice on how to optimize my articles table for read operations. I have an articles table where I store the articles editors write. There is a requirement that editors can enter an article with date_publish set in the future. Such articles must not be displayed on the cover page until their date_publish has actually arrived.
So my question is: should I have an index on the date_publish field for better performance? I am using a MySQL database with the InnoDB engine. I store dates as Unix timestamps in an unsigned INT(11) field.
When I read the list of articles for the cover page, I do something like this:
SELECT articles.* FROM articles WHERE date_publish < $time
Adding an index on the column date_publish would optimize the following simple query:
SELECT * FROM articles WHERE date_publish < $time
However, if you change the query, for example by adding an ORDER BY clause on a column other than date_publish, you may need a compound (multi-column) index to optimize the query.
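With an index on date_publish, the range condition turns into one binary search plus a sequential read of the entries below the cut-off, instead of a scan of every article. Below is a minimal Ruby sketch with hypothetical Unix timestamps and article ids. A sorted array of [date_publish, id] pairs stands in for the index.

```ruby
# Hypothetical index on date_publish: [date_publish, article id], sorted.
DATE_INDEX = [
  [1_300_000_000, 3],
  [1_300_500_000, 1],
  [1_301_000_000, 4],
  [1_400_000_000, 2] # scheduled for the future
].sort.freeze

# WHERE date_publish < now: every entry before the first one >= now.
def published_ids(index, now)
  cut = index.bsearch_index { |(ts, _)| ts >= now } || index.size
  index.first(cut).map { |(_, id)| id }
end
```

With now = 1_350_000_000, published_ids returns [3, 1, 4]. The future article 2 is excluded without ever being read.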
EDIT
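To fully utilize an index, a "covering" index must include every column the query uses (in the WHERE, JOIN and ORDER BY clauses and the SELECT list), and column order matters, with equality columns first. So, if you have a range in your WHERE clause on date_publish and ORDER BY article_name, you may wish to index on both columns (date_publish, article_name). Be aware, though, that after a range condition on date_publish, MySQL can use that index only for the selection. article_name is sorted only within each single date_publish value, so the result still needs a filesort. The index can serve both selection and sorting when the ORDER BY is on date_publish itself, or when date_publish is compared with = rather than a range.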
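The layout of a (date_publish, article_name) index decides whether it can also satisfy an ORDER BY on article_name. Below is a minimal Ruby sketch with hypothetical rows, where sorted pairs stand in for the index. It shows that article_name comes out in order within a single date_publish value but not across a range of dates.

```ruby
# Hypothetical composite index (date_publish, article_name), sorted as a B-tree would be.
PUB_NAME_INDEX = [
  [100, "Zebra"], [100, "Apple"], [200, "Mango"], [300, "Banana"]
].sort.freeze

# Names in index order for WHERE date_publish < cutoff (a range).
def names_in_index_order(index, cutoff)
  index.take_while { |(ts, _)| ts < cutoff }.map(&:last)
end

# Names in index order for WHERE date_publish = ts (an equality).
def names_for_date(index, ts)
  index.select { |(d, _)| d == ts }.map(&:last)
end
```

The range read yields Apple, Zebra, Mango, Banana, which is not alphabetical, so a separate sort is still needed. The equality read for date 100 yields Apple, Zebra, which is already sorted.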

getting the last 15 dated items -- any way to optimize?

I have the following Rails associations and named scopes:
class Person < ActiveRecord::Base
  has_many :schedulings, :include => :event
end

class Scheduling < ActiveRecord::Base
  belongs_to :person
  belongs_to :event
  named_scope :recent, :order => "date DESC", :limit => 15
end

class Event < ActiveRecord::Base
  has_many :schedulings
end
I need to get the 15 most recent schedulings for a person. Here is my rails controller code, and the SQL query it generates:
Person.find(1).schedulings.recent
----
SELECT * FROM `schedulings` WHERE (`schedulings`.person_id = 1) ORDER BY date DESC LIMIT 15
Unfortunately this query is getting slower as my table sizes grow (currently at about 10K schedulings and 3K people).
Can anyone give me advice on either A) ways to optimize the existing query, or B) a more direct way to query that information, perhaps with direct SQL instead of Rails Active Record?
My schedulings table has a foreign key index on person_id, and an index on date. I do not have a composite index on (person_id, date). Typically a person will not have more than one scheduling per date.
I am comfortable with Rails but a relative novice at SQL.
Creating an index on person_id and date should speed your code up nicely.
Also, SELECT * is usually a bad idea speed-wise.
Only select the fields you are actually going to use.
Especially if you have memo or blob fields in your table, a SELECT * will be slow as hell.
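The composite index on person_id and date helps because it keeps all of one person's schedulings together, already in date order. ORDER BY date DESC LIMIT 15 then becomes a read of the last 15 entries of that slice, with no sort. In a Rails migration, such an index could be created with add_index :schedulings, [:person_id, :date]. Below is a minimal Ruby sketch of the lookup. The data is hypothetical, and sorted [person_id, date, id] triples stand in for the index.

```ruby
require "date"

# Hypothetical index on (person_id, date), carrying the row id.
SCHED_INDEX = (1..3).flat_map { |person_id|
  (1..20).map { |day| [person_id, Date.new(2010, 1, day), person_id * 100 + day] }
}.sort.freeze

# WHERE person_id = ? ORDER BY date DESC LIMIT ?
def recent_ids(index, person_id, limit = 15)
  from = index.bsearch_index { |(pid, _, _)| pid >= person_id } || index.size
  to   = index.bsearch_index { |(pid, _, _)| pid > person_id } || index.size
  # The person's slice is already in date order, so read it from the end.
  index[from...to].reverse.first(limit).map(&:last)
end
```

recent_ids(SCHED_INDEX, 2, 3) returns [220, 219, 218], the three latest schedulings of person 2. Only that person's slice is read.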
While you should have an index on the key to that table (IMO), more importantly for this case, you should have an index on the "date" field, as that's what you're ordering your results by...
I always like to use rails_footnotes in development to see how well my queries perform (and how many I'm running) for a given view.

MySQL EXPLAIN: "Using index" vs. "Using index condition"

The MySQL 5.4 documentation, on Optimizing Queries with EXPLAIN, says this about these Extra remarks:
Using index
The column information is retrieved from the table using only information in the index tree without having to do an additional seek to read the actual row. This strategy can be used when the query uses only columns that are part of a single index.
[...]
Using index condition
Tables are read by accessing index tuples and testing them first to determine whether to read full table rows. In this way, index information is used to defer (“push down”) reading full table rows unless it is necessary.
Am I missing something, or do these two mean the same thing (i.e. "didn't read the row, index was enough")?
An example explains it best:
SELECT Year, Make -- possibly more fields and/or from extra tables
FROM myUsedCarInventory
WHERE Make = 'Toyota' AND Year > '2006'
Assuming the available indexes are:
CarId
VIN
Make
Make and Year
This query would EXPLAIN with 'Using index' because it doesn't need to "hit" the myUsedCarInventory table itself at all. The "Make and Year" index covers everything the query needs from that table: the conditions in the WHERE clause and the selected Year and Make columns.
Now imagine we keep the query the same but add a condition on the color:
...
WHERE Make = 'Toyota' AND Year > '2006' AND Color = 'Red'
This query would likely EXPLAIN with 'Using index condition'. The 'likely' covers the case where Toyota + Year is not estimated to be selective enough, and the optimizer decides to just scan the table. This would mean that MySQL would FIRST use the index to resolve Make + Year. It would then have to look up the corresponding row in the table as well, but only for the rows that satisfy the Make + Year conditions. That is what is sometimes referred to as "push down optimization".
The difference is that "Using index" doesn't need a lookup from the index to the table, while "Using index condition" sometimes has to. I'll try to illustrate this with an example. Say you have this table:
id, name, location
With an index on
name, id
Then this query doesn't need the table for anything; it can retrieve all its information "Using index":
select id, name from table where name = 'Piskvor'
But this query needs a table lookup for all rows where name equals 'Piskvor', because it can't retrieve location from the index:
select id from table where name = 'Piskvor' and location = 'North Pole'
The query can still use the index to limit the results to the small set of rows with a particular name, but it has to look at those rows in the table to check whether the location matches too.
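The difference can be modelled by counting table lookups. Below is a minimal Ruby sketch using the table (id, name, location) with an index on (name, id). The rows, constants and lookup counter are hypothetical, and this models the idea rather than MySQL's actual executor. The first query is answered from the index alone. The second needs one table fetch per index hit to check location.

```ruby
# Hypothetical table rows keyed by id (the "table").
TABLE = {
  1 => { id: 1, name: "Piskvor", location: "North Pole" },
  2 => { id: 2, name: "Piskvor", location: "Prague" },
  3 => { id: 3, name: "Santa", location: "North Pole" }
}.freeze

# Index on (name, id): sorted pairs; note there is no location column.
NAME_ID_INDEX = TABLE.values.map { |r| [r[:name], r[:id]] }.sort.freeze

# select id, name from table where name = ?
# Returns [result, table_lookups]; everything comes from the index.
def ids_and_names(name)
  hits = NAME_ID_INDEX.select { |(n, _)| n == name } # a real B-tree would seek here
  [hits.map { |(n, id)| [id, n] }, 0]
end

# select id from table where name = ? and location = ?
# The index narrows candidates by name, but location is not in the index,
# so each candidate row has to be fetched from the table.
def ids_by_name_and_location(name, location)
  lookups = 0
  ids = []
  NAME_ID_INDEX.select { |(n, _)| n == name }.each do |(_, id)|
    lookups += 1
    ids << id if TABLE[id][:location] == location
  end
  [ids, lookups]
end
```

For 'Piskvor', the first query needs 0 table lookups. The second needs 2 (one per matching name) to return [1].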