I have an innodb table with 100M records like this:
id name pid cid createdAt
int char int int timestamp
id is PK, and pid is indexed: key
The most frequent query is select count(*) from table1 where pid='pid'
My question: does this query do a full table scan?
count(*) is usually exactly what you want when you need a row count.
The count function with an expression counts rows where that expression is not null, so count(name) counts records where the name field is not null, for example. If the column being counted is not part of the index used for the query, MySQL has to read the row data to check it.
count(*) is different: it counts every row the query retrieves, whether or not any of its columns are null. It never inspects column values, so no rows are excluded.
count(1) gives the same result as count(*), because the value 1 is never null. InnoDB handles the two the same way and there is no performance difference, so there is no reason to prefer one over the other.
If you want to know what the query does, then look at the "explain" plan.
If you want to speed the query in question, then create an index on table1(pid).
The query should scan the index rather than the table.
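To check this on your own data, a minimal sketch along these lines may help. The column sizes, the index name idx_pid, and the literal 42 are assumptions for illustration and are not taken from the question.

```sql
-- Assumed shape of the table from the question; CHAR(32) and idx_pid are made up.
CREATE TABLE table1 (
  id        INT NOT NULL AUTO_INCREMENT,
  name      CHAR(32),
  pid       INT,
  cid       INT,
  createdAt TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  KEY idx_pid (pid)
) ENGINE=InnoDB;

-- Ask the optimizer how it would run the count.
EXPLAIN SELECT COUNT(*) FROM table1 WHERE pid = 42;
```

If the plan shows key = idx_pid and Extra contains "Using index", the count is answered from the index alone and the table rows are not read. A plan with type = ALL and key = NULL would indicate a full table scan.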
Step 1:
I am creating a simple table.
CREATE TABLE `indexs`.`table_one` (
`id` INT NOT NULL AUTO_INCREMENT,
`name` VARCHAR(45) NULL,
PRIMARY KEY (`id`));
Step 2:
I make two inserts into this table.
insert into table_one (name) values ("B");
insert into table_one (name) values ("A");
Step 3:
I make a select, I get a table, the records in which are ordered by id.
SELECT * FROM table_one;
This is the expected result, because in mysql the primary key is a clustered index, therefore the data will be physically ordered by it.
Now the part I don't understand.
Step 4:
I am creating an index on the name column.
CREATE INDEX index_name ON table_one(name);
I repeat step 3 again, but I get a different result. The rows are now ordered according to the name column.
Why is this happening? why the order of the rows in the table changes in accordance with the new index on the name column, because as far as I understand, in mysql, the primary key is the only clustered index, and all indexes created additionally are secondary.
I make a select, I get a table, the records in which are ordered by id. [...] This is the expected result, because in mysql the primary key is a clustered index, therefore the data will be physically ordered by it.
There is some misunderstanding of a concept here.
Table rows have no inherent ordering: they represent an unordered set of rows. While the clustered index enforces a physical ordering of data in storage, it does not guarantee the order in which rows are returned by a select query.
If you want the results of the query to be ordered, then use an order by clause. Without such a clause, the ordering of the rows is undefined: the database is free to return results in whichever order it likes, and results are not guaranteed to be consistent over consecutive executions of the same query.
select * from table_one order by id;
select * from table_one order by name;
(GMB explains most)
Why is this happening? why the order of the rows in the table changes in accordance with the new index on the name column
Use EXPLAIN SELECT ... -- it might give a clue of what I am about to suggest.
You added INDEX(name). In InnoDB, the PRIMARY KEY column(s) are tacked onto the end of each secondary index. So it is effectively a BTree ordered by (name,id) and containing only those columns.
Now, the Optimizer is free to fetch the data from the index, since it has everything you asked for (id and name). (This index is called "covering".)
Since you did not specify an ORDER BY, the result set ordering is valid (see GMB's discussion).
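As a rough sketch of what to look for, assuming the table_one and index_name from the question, run the same EXPLAIN before and after step 4 and compare the key column:

```sql
-- Run once before CREATE INDEX and once after, then compare the two plans.
EXPLAIN SELECT * FROM table_one;
```

Before the index exists, the plan will most likely show type = ALL with key = NULL (a scan of the clustered index). Afterwards it will likely report key = index_name with "Using index" in Extra, meaning the rows are read from the secondary index in (name, id) order. Exact output depends on the MySQL version and table statistics.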
Moral of the story: If you want an ordering, specify ORDER BY. (The Optimizer is smart enough to "do no extra work" if it can see how to provide the data without doing a sort.)
Further experiment: Add another column to the table but don't change the indexes. Now you will find SELECT * FROM t is ordered differently than SELECT id, name FROM t. I think I have given you enough clues to predict this difference, if not, ask.
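One way to run that experiment, as a sketch. The column name note is hypothetical; any column that is not part of an index would do.

```sql
-- `note` is a hypothetical extra column that is not part of any index.
ALTER TABLE table_one ADD COLUMN note VARCHAR(45) NULL;

-- index_name still holds everything this query asks for (name plus the PK id).
EXPLAIN SELECT id, name FROM table_one;

-- index_name cannot satisfy this one, because `note` is stored only in the clustered index.
EXPLAIN SELECT * FROM table_one;
```

The first plan will probably still name index_name as the key, while the second will probably fall back to the clustered index, which is why the two queries can come back in different orders.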
I am working on a project related to a database. I want to find the highest value in the primary key column of the same table (say tbrmenuitem), which exists in multiple databases.
So, is it possible with one query, or do I have to run separate queries and connect to each database in turn? (That is, a first query to get the table name in the database, a second query to find the primary key of that table, and then MAX() on the primary key column?)
You can query tables in other databases on a server similar to how you would any other tables. You just need to qualify the table name with the name of the schema (database).
SELECT MAX(max) FROM (
SELECT MAX(id_column) AS max
FROM test2.test2table
UNION ALL
SELECT MAX(id_column) AS max
FROM test.test1table
) AS t
What this does is select the MAX() of a column from the table test2table in the test2 database.
SELECT MAX(id_column) AS max
FROM test2.test2table
It then combines that result, using UNION ALL, with the result of a similar query performed on the test1table table in the test database.
UNION ALL
SELECT MAX(id_column) AS max
FROM test.test1table
This is then wrapped in a subquery which pulls the maximum of the values returned from the UNION ALL.
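For the first two steps mentioned in the question (locating the table and its primary key column in each database), one possible sketch uses information_schema. It assumes the table is named tbrmenuitem everywhere and that the connected account can see all of the relevant schemas.

```sql
-- List every schema that has a tbrmenuitem table, with its primary key column(s).
SELECT TABLE_SCHEMA, COLUMN_NAME
FROM information_schema.KEY_COLUMN_USAGE
WHERE TABLE_NAME = 'tbrmenuitem'
  AND CONSTRAINT_NAME = 'PRIMARY'   -- MySQL names the primary key constraint PRIMARY
ORDER BY TABLE_SCHEMA, ORDINAL_POSITION;
```

Each returned row gives a schema name and key column that can be plugged into one branch of the query above. Identifiers cannot be bound as parameters, so the final statement would have to be assembled as a string by the application.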
Assume I have this table:
create table table_a (
id int,
name varchar(25),
address varchar(25),
primary key (id)
) engine = innodb;
When I run this query:
select * from table_a where id >= 'x' and name = 'test';
How will MySQL process it? Will it pull all the id's first (assume 1000 rows) then apply the where clause name = 'test'?
Or while it looks for the ids, it is already applying the where clause at the same time?
As id is the PK (and there is no index on name), it will read all rows that satisfy the id-based criterion, after which it will filter that result set by the name criterion. Adding a composite index containing both fields would mean that it would only read the records that satisfy both criteria. Adding a separate single-column index on the name field may not result in an index merge operation, in which case that index would have no effect.
Do you have indexes on either column? That may affect the execution plan. The other thing is that one might cast the value explicitly, e.g. CAST('x' AS SIGNED), to ensure a numeric comparison instead of a string comparison.
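In MySQL specifically, an explicit conversion is written with CAST. A small sketch of how string values convert to integers (the literals are only examples):

```sql
SELECT CAST('42' AS SIGNED);   -- a numeric string converts to the integer 42
SELECT CAST('x' AS SIGNED);    -- a non-numeric string converts to 0 and raises a warning
SHOW WARNINGS;                 -- displays the warning from the previous statement
```

So a condition such as id >= 'x' against an integer column is likely evaluated as id >= 0, which is probably not what was intended.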
For the best result, you should have a single index which includes both of the columns id and name.
In your case, I can't say what effect the primary index has on that query. That depends on the DBMS and its version. If you really don't want to add another index (because more indexes mean slower writes and updates), just populate your table with something like 10,000,000 random rows, try it, and see the effect.
You can compare the execution times by running the query first with id first in the where clause, and then swapping the conditions so that name comes first. For an example of MySQL performance with indexes, see http://www.mysqlperformanceblog.com/2006/06/02/indexes-in-mysql/
You can get information on how the query is processed by running EXPLAIN on the query.
If the idea is to optimize that query then you might want to add an index like:
alter table table_a add unique index name_id_idx (name, id);
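To see whether the index changes anything, a before-and-after comparison could look like the sketch below. The value 100 is an arbitrary stand-in for the 'x' in the question, and the table needs a realistic amount of data for the plans to be meaningful.

```sql
-- Plan with only the primary key available.
EXPLAIN SELECT * FROM table_a WHERE id >= 100 AND name = 'test';

-- Add the composite index suggested above.
ALTER TABLE table_a ADD UNIQUE INDEX name_id_idx (name, id);

-- Plan again; compare the key and rows columns with the first run.
EXPLAIN SELECT * FROM table_a WHERE id >= 100 AND name = 'test';
```

In the first plan the key will most likely be PRIMARY with "Using where" in Extra, meaning the name filter is applied to rows found by the id range. In the second, the key should switch to name_id_idx if the optimizer judges it cheaper.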
I was wondering what the best way of storing user queries correlated with timestamps in MySQL was. Let's say I have just two inputs, a user's "query" and "timestamp"...
I could create a MySQL table with fields (id, query, count, timestamp_list), where:
id is unique identifier of the query,
query is the literal query string,
count is the (constantly-UPDATEd) number of times that query is entered, and
timestamp_list is a LONGTEXT or something with a list of times that query was searched.
Is there a better way to correlate these using indexing I'm not familiar with? It seems like storing a list of timestamps in a LONGTEXT is dumb, but easy; perhaps I should create a separate table like:
id
query_id (correlates to id in first table)
timestamp
And I can join results with the first table. Any thoughts? Thanks!
If you need to record the timestamp when each query was performed, I'd suggest you have 2 tables:
tbl_queries
- id INT
- query VARCHAR
tbl_queries_performed
- id INT AUTO_INCREMENT
- query_id INT
- timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
Each time you want to record a query, check whether it's already in tbl_queries (insert it if not), then save an entry in tbl_queries_performed with the corresponding query_id.
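A possible way to write this out in full, as a sketch only. The VARCHAR length, the key names, the foreign key, and the sample search string are assumptions, and the unique key on a VARCHAR(255) assumes a MySQL version with the larger InnoDB index key limit.

```sql
CREATE TABLE tbl_queries (
  id    INT NOT NULL AUTO_INCREMENT,
  query VARCHAR(255) NOT NULL,          -- length is an assumption
  PRIMARY KEY (id),
  UNIQUE KEY uq_query (query)           -- one row per distinct query string
) ENGINE=InnoDB;

CREATE TABLE tbl_queries_performed (
  id          INT NOT NULL AUTO_INCREMENT,
  query_id    INT NOT NULL,
  `timestamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  KEY idx_query_id (query_id),
  FOREIGN KEY (query_id) REFERENCES tbl_queries (id)
) ENGINE=InnoDB;

-- Record one search: insert the query if it is new, otherwise reuse its existing id.
INSERT INTO tbl_queries (query) VALUES ('mysql index')
  ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id);
INSERT INTO tbl_queries_performed (query_id) VALUES (LAST_INSERT_ID());

-- The count no longer needs its own column; it can be derived when needed.
SELECT q.query, COUNT(*) AS times_searched, MAX(p.`timestamp`) AS last_searched
FROM tbl_queries AS q
JOIN tbl_queries_performed AS p ON p.query_id = q.id
GROUP BY q.id, q.query;
```

The ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id) form makes LAST_INSERT_ID() return the existing row's id when the query string is already present, which avoids a separate lookup.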
I have a table with a nullable datetime field.
I'll execute queries like this:
select * from TABLE where FIELD is not null
select * from TABLE where FIELD is null
Should I index this field, or is it not necessary? I will NOT search for a specific datetime value in that field.
It's probably not necessary.
The only possible edge case where the index can be used (and be of help) is when the split between null and not-null rows is heavily skewed (e.g. you have 100 NULL datetimes in a table with 100,000 rows). In that case select * from TABLE where FIELD is null would use the index and be considerably faster for it.
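To find out how skewed the split actually is, something like the following sketch could be used. The names my_table and field are placeholders for the real table and column.

```sql
-- COUNT(*) counts all rows; COUNT(field) counts only rows where field is not null.
SELECT COUNT(*)                AS total_rows,
       COUNT(field)            AS not_null_rows,
       COUNT(*) - COUNT(field) AS null_rows
FROM my_table;
```

If one of the two groups is a tiny fraction of the table, an index is more likely to pay off for the query that selects that group.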
In short: yes.
Slightly longer: yeeees. ;-)
(From http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html) - "A search using col_name IS NULL employs indexes if col_name is indexed."
It would depend on the number of unique values and the number of records in the table. If you're only searching on whether a column is null or not, you'll probably have one query use the index and the other not, depending on the proportion of nulls in the table overall.
For example: if 99% of the records in a table have null in the queried column, and you have an index on that column, and then execute:
SELECT columnIndexed FROM blah WHERE columnIndexed is null;
The optimizer most likely won't use the index. It won't because it will cost more to read the index and then read the associated data for the records than to just access the table directly. Index usage is based on the statistical analysis of a table, and one major player in that is the cardinality of the values. In general, indexes work best and give the best performance when they select a small subset of the rows in the table. So if you change the above query to select where columnIndexed is not null, you're likely to use the index.
For more details check out the following: http://dev.mysql.com/doc/refman/5.1/en/myisam-index-statistics.html
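To see which way the optimizer goes on a particular table, a sketch like this might help. The index name idx_column is hypothetical, and SELECT * is used here (rather than selecting only the indexed column) so that the extra row lookups described above actually come into play.

```sql
-- idx_column is a hypothetical index name.
CREATE INDEX idx_column ON blah (columnIndexed);

-- Refresh the statistics the optimizer bases its choice on.
ANALYZE TABLE blah;

EXPLAIN SELECT * FROM blah WHERE columnIndexed IS NULL;
EXPLAIN SELECT * FROM blah WHERE columnIndexed IS NOT NULL;
```

In each plan, key = idx_column means the index was chosen, while type = ALL with key = NULL means the table is scanned instead. With mostly-null data, the two queries will probably get different plans.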