JSONB Index using GIN doesn't work on postgres [duplicate]

I have the following table in PostgreSQL:
CREATE TABLE index_test
(
id int PRIMARY KEY NOT NULL,
text varchar(2048) NOT NULL,
last_modified timestamp NOT NULL,
value int,
item_type varchar(2046)
);
CREATE INDEX idx_index_type ON index_test ( item_type );
CREATE INDEX idx_index_value ON index_test ( value );
I run the following SELECTs:
explain select * from index_test r where r.item_type='B';
explain select r.value from index_test r where r.value=56;
The execution plan looks like this:
Seq Scan on index_test r (cost=0.00..1.04 rows=1 width=1576)
Filter: ((item_type)::text = 'B'::text)
As far as I understand, this is a full table scan. The question is: why are my indexes not used?
Maybe the reason is that I have too few rows in my table? I have only 20 of them. Could you please provide me with a SQL statement to easily populate my table with random data so I can check the index issue?
I have found this article: http://it.toolbox.com/blogs/db2luw/how-to-easily-populate-a-table-with-random-data-7888, but it doesn't work for me. The efficiency of the statement does not matter, only the simplicity.

Maybe the reason is that I have too few rows in my table?
Yes. For a total of 20 rows in a table a seq scan is always going to be faster than an index scan. Chances are that those rows are located in a single database block anyway, so the seq scan would only need a single I/O operation.
If you use
explain (analyze true, verbose true, buffers true) select ....
you can see a bit more detail about what is really going on.
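For example, applied to the first query from the question:
explain (analyze true, verbose true, buffers true)
select * from index_test r where r.item_type = 'B';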
Btw: you shouldn't use text as a column name, as that is also a datatype (and a keyword) in Postgres.
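If you want to follow that advice, the rename is a one-liner. The new name data is just one possibility; it is the name the generate_series example below assumes:
ALTER TABLE index_test RENAME COLUMN "text" TO data;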

The example you found is for DB2; in Postgres you can use generate_series to do it.
For example like this:
-- id must be supplied explicitly, because the posted schema gives it no default
INSERT INTO index_test (id, data, last_modified, value, item_type)
SELECT
i, md5(random()::text), now(), floor(random()*100), md5(random()::text)
FROM generate_series(1,1000) AS s(i);
SELECT max(value) from index_test;
http://sqlfiddle.com/#!12/52641/3
The second query in the above fiddle should use an index-only scan.
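If you want to confirm the planner can use the index once the table has more rows, a quick sketch (for testing only) is to refresh the statistics and then, purely as an experiment, discourage sequential scans:
ANALYZE index_test;                       -- refresh planner statistics
EXPLAIN SELECT value FROM index_test WHERE value = 56;

SET enable_seqscan = off;                 -- experiment only, never in production
EXPLAIN SELECT value FROM index_test WHERE value = 56;
SET enable_seqscan = on;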

Related

SLOW QUERY / IN HAVING Clause

I have a many-to-many relationship table in a MySQL database, and this query:
SELECT main_id FROM posts_tag
WHERE post_id IN ('134','140','187')
GROUP BY main_id
HAVING COUNT(DISTINCT post_id) = 3
There are ~5,300,000 rows in this table, and the query takes about 5 seconds (slower if I add more ids to the search).
Is there any way to make it faster?
EXPLAIN shows this (the output was posted as a screenshot and is not reproduced here).
By the way, I want to add more conditions, like NOT IN, and possibly JOIN new tables which have the same structure but different data. Not too much at once, but first I want to know if there is any way to make this simple query faster.
Any advice would be helpful, even another method, structure, etc.
PS: Hardware is an Intel Core i9 3.6 GHz, 64 GB RAM, 480 GB SSD, so I think the server specs are not the problem.
Use a "composite" and "covering" index:
INDEX(post_id, main_id)
And get rid of INDEX(post_id) since it will then be redundant.
"Covering" helps speed up a query.
Assuming this is a normal "many-to-many" table, then:
CREATE TABLE post_main (
    post_id INT UNSIGNED NOT NULL,   -- same type as `id` in table `posts`
    main_id INT UNSIGNED NOT NULL,   -- same type as `id` in table `main`
    PRIMARY KEY(post_id, main_id),
    INDEX(main_id, post_id)
) ENGINE=InnoDB;
There is no need for AUTO_INCREMENT anywhere in a many-to-many table.
(You could add FK constraints, but I say 'why bother'.)
More discussion: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
And NOT IN
This gets a bit tricky. I think this is one way; there may be others.
SELECT main_id
FROM post_main AS x
WHERE post_id IN (244,229,193,93,61)
GROUP BY main_id
HAVING COUNT(*) = 5
AND NOT EXISTS ( SELECT 1
                 FROM post_main
                 WHERE main_id = x.main_id
                   AND post_id IN (92,10,234) );
Alexfsk, the IN list in your query has the values surrounded by single quotes. When the column is defined with an INT or MEDIUMINT (or any integer) datatype, adding single quotes around the values causes datatype conversions on every row considered and delays completion of your query.
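In other words, since post_id is an integer column, write the list with bare integer literals:
SELECT main_id FROM posts_tag
WHERE post_id IN (134,140,187)            -- no quotes around integers
GROUP BY main_id
HAVING COUNT(DISTINCT post_id) = 3;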

How to optimise mysql query as Full ProcessList is showing Sending Data for over 24 hours

I have the following query that runs forever, and I am looking to see if there is any way I can optimise it. It runs on a table that has in total 1,406,480 rows of data; apart from the Filename and Ref_No columns, the ID and End_Date have both been indexed.
My Query:
INSERT INTO UniqueIDs
(
SELECT
T1.ID
FROM
master_table T1
LEFT JOIN
master_table T2
ON
(
T1.Ref_No = T2.Ref_No
AND
T1.End_Date = T2.End_Date
AND
T1.Filename = T2.Filename
AND
T1.ID > T2.ID
)
WHERE T2.ID IS NULL
AND
LENGTH(T1.Ref_No) BETWEEN 5 AND 10
)
;
Explain results (posted as a screenshot, not reproduced here).
The reason for not indexing the Ref_No is that this is a text column and therefore I get a BLOB/TEXT error when I try and index this column.
Would really appreciate if somebody could advise on how I can quicken this query.
Thanks
Thanks to Bill's advice regarding multi-column indexes, I have managed to make some headway. I first ran this code:
CREATE INDEX I_DELETE_DUPS ON master_table(id, End_Date);
I then added a new column to hold the length of the Ref_No, but had to change the approach from the query Bill mentioned because my version of MySQL is 5.5, which has no generated columns. So I ran it in 3 steps:
ALTER TABLE master_table
ADD COLUMN Ref_No_length SMALLINT UNSIGNED;
UPDATE master_table SET Ref_No_length = LENGTH(Ref_No);
ALTER TABLE master_table ADD INDEX (Ref_No_length);
The last step was to change the WHERE clause of my insert query to filter on the new length column. It became:
AND t1.Ref_No_length between 5 and 10;
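For reference, a sketch of the full reworked insert assembled from the steps above (assuming the Ref_No_length column and its index created earlier):
INSERT INTO UniqueIDs
SELECT T1.ID
FROM master_table T1
LEFT JOIN master_table T2
    ON  T1.Ref_No   = T2.Ref_No
    AND T1.End_Date = T2.End_Date
    AND T1.Filename = T2.Filename
    AND T1.ID       > T2.ID                  -- keep only the lowest ID per group
WHERE T2.ID IS NULL
  AND T1.Ref_No_length BETWEEN 5 AND 10;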
I then ran this query, and within 15 mins I had 280k ids inserted into my UniqueIDs table. I then changed my insert script to see if I could include more length values:
AND t1.Ref_No_length IN (5,6,7,8,9,10,13);
This was to also bring in the rows where the length equals 13. This query took a lot longer, 2 hr 50 min to be precise, but the additional work of looking for all rows with a length of 13 gave me an extra 700k unique ids.
I am still looking at ways to optimise the query with the IN clause, but this is a big improvement over a query that kept running for over 24 hours. So thank you so much, Bill.
For the JOIN, you should have a multi-column index on (Ref_No, End_Date, Filename).
You can create a prefix index on a TEXT column like this:
ALTER TABLE master_table ADD INDEX (Ref_No(10));
But that won't help you search based on the LENGTH(). Indexing only helps search by value indexed, not by functions on the column.
In MySQL 5.7 or later, you can create a virtual column like this, with an index on the values calculated for the virtual column:
ALTER TABLE master_table
ADD COLUMN Ref_No_length SMALLINT UNSIGNED AS (LENGTH(Ref_No)),
ADD INDEX (Ref_No_length);
Then MySQL will recognize that your condition in your query is the same as the expression for the virtual column, and it will automatically use the index (exception: in my experience, this doesn't work for expressions using JSON functions).
But this is no guarantee that the index will help. If most of the rows match the condition of the length being between 5 and 10, the optimizer will not bother with the index. It may be more work to use the index than to do a table-scan.
the ID and End_Date have both been indexed.
You have PRIMARY KEY(id) and redundantly INDEX(id)? A PK is a unique key.
"have both been indexed" -- INDEX(a), INDEX(b) is not the same as INDEX(a,b) -- they have different uses. Read about "composite" indexes.
That query smells a lot like "group-wise" max done in a very slow way. (Alas, that may have come from the online docs.)
I have compiled the fastest ways to do that task here: http://mysql.rjweb.org/doc.php/groupwise_max (There are multiple versions, based on MySQL version and what issues your code can/cannot tolerate.)
Please provide SHOW CREATE TABLE. One important question: Is id the PRIMARY KEY?
This composite index may be useful:
(Filename, End_Date, Ref_No, -- first, in any order
ID) -- last
This, as others have noted, is unlikely to be helped by any index, hence T1 will need a full-table-scan:
AND LENGTH(T1.Ref_No) BETWEEN 5 AND 10
If Ref_No cannot be bigger than 191 characters, change it to a VARCHAR so that it can be used in an index. Oh, did I ask for SHOW CREATE TABLE? If you can't make it VARCHAR, then my recommended composite index is
INDEX(Filename, End_Date, ID)

How can I optimize this SQL query with 100,000 records?

Here is my SQL query. The table system_mailer is used for logging sent e-mails. When I want to search some data, the query takes around 10 seconds. Is it possible to optimize this query in any way?
SELECT `ID`
FROM `system_mailer`
WHERE `group` = 'selling_center'
AND `group_parameter_1` = '1'
AND `group_parameter_2` = '2138'
Timing is around a couple of seconds; how could it be optimised?
You might find the following index on 4 columns would help performance:
CREATE INDEX idx ON system_mailer (`group`, group_parameter_1, group_parameter_2, ID);
MySQL should be able to use this index on your current query. By the way, if you are using InnoDB, and ID is the primary key, then you might be able to drop it from the explicit index definition, and just use this:
CREATE INDEX idx ON system_mailer (`group`, group_parameter_1, group_parameter_2);
Please avoid naming your columns and tables with reserved MySQL keywords like group. Because you made this design decision, you will now be forced to forever escape that column name with backticks (ugly).
Just be sure you have a composite index on table system_mailer for the columns:
(`group`, `group_parameter_1`, `group_parameter_2`)
You can also redundantly add the ID to the index, to avoid touching the table data at all in this query (a covering index):
(`group`, `group_parameter_1`, `group_parameter_2`, ID)

Optimal MySQL index for search query within date ranges

I have a MySQL table of the form
CREATE TABLE `myTable` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`timestamp` datetime NOT NULL,
`fieldA` int(11) NOT NULL,
`fieldB` int(11) NOT NULL,
....
)
The table will have around 500,000,000 rows, with the remaining fields being floats.
The queries I will be using will be of the form:
SELECT * FROM myTable
WHERE fieldA= AND fieldB= AND timestamp>'' and timestamp<=''
ORDER BY timestamp;
At the moment I have two indices: a primary key on id, and a unique key on timestamp,fieldA,fieldB (hashed). At the moment, a select query like the above takes around 6 minutes on a reasonably powerful desktop PC.
What would the optimal index to apply? Does the ordering of the 3 fields in the key matter, and should I be using a binary tree instead of hashed? Is there a conflict between my primary key and the second index? Or do I have the best performance I can expect for such a large db without more serious hardware?
Thanks!
For that particular query, adding an index on fieldA and fieldB probably would be optimal. The order of the columns in the index does matter.
Index Order
In order for MySQL to even consider using a particular index, the first column of that index must appear in the query. For example:
alter table mytable add index a_b_index(a, b);
select * from mytable where a = 1 and b = 2;
The above query should use the index a_b_index. Now take this next example:
alter table mytable add index a_b_index(a, b);
select * from mytable where b = 2;
This will not use the index: the index starts with a, but a never appears in the query, so MySQL cannot use it.
Comparison
MySQL can use a B-tree index for range comparisons (<, >, BETWEEN) as well as for equality, but once an index column is used for a range, no later columns in that index can be used for filtering.
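For example, with the same index:
alter table mytable add index a_b_index(a, b);
-- the range on a can use the index, but b can no longer narrow the scan
select * from mytable where a > 1 and b = 2;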
LIKE
MySQL does use indexes with LIKE, but only when the % is at the end of the pattern, like this:
select * from mytable where cola like 'hello%';
Whereas these will not use an index:
select * from mytable where cola like '%hello';
select * from mytable where cola like '%hello%';
Hashed indexes are not used for ranges. They are used for equality comparisons only. Therefore, a hashed index cannot be used for the range portion of your query.
Since you have a range in your query, you should use a standard b-tree index. Ensure that fielda and fieldb are the first columns in the index, then timestamp. MySQL cannot utilize the index for searches beyond the first range.
Consider a multi-column index on (fielda, fieldb, timestamp).
The index should also be able to satisfy the ORDER BY.
To improve the query further, select only those three columns or consider a larger "covering" index.
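A sketch of the suggested B-tree index; the index name is illustrative, and since InnoDB secondary indexes are B-trees by default, no special syntax is needed:
ALTER TABLE myTable
    ADD INDEX idx_a_b_ts (fieldA, fieldB, `timestamp`);
With this index, the two equality columns narrow the search first, the timestamp range is then scanned in order, and the ORDER BY timestamp comes for free.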

How to optimize this query in a large database?

Query
SELECT id FROM `user_tmp`
WHERE `code` = '9s5xs1sy'
AND `go` NOT REGEXP 'http://www.xxxx.example.com/aflam/|http://xx.example.com|http://www.xxxxx..example.com/aflam/|http://www.xxxxxx.example.com/v/|http://www.xxxxxx.example.com/vb/'
AND check='done'
AND `dataip` <1319992460
ORDER BY id DESC
LIMIT 50
MySQL returns:
Showing rows 0 - 29 ( 50 total, Query took 21.3102 sec) [id: 2622270 - 2602288]
If I remove
AND dataip <1319992460
MySQL returns
Showing rows 0 - 29 ( 50 total, Query took 0.0859 sec) [id: 3637556 - 3627005]
And if there is no matching data, MySQL returns:
MySQL returned an empty result set (i.e. zero rows). ( Query took 21.7332 sec )
Explain plan:
SQL query: Explain SELECT * FROM `user_tmp` WHERE `code` = '93mhco3s5y' AND `too` NOT REGEXP 'http://www.10neen.com/aflam/|http://3ltool.com|http://www.10neen.com/aflam/|http://www.10neen.com/v/|http://www.m1-w3d.com/vb/' and checkopen='2010' and `dataip` <1319992460 ORDER BY id DESC LIMIT 50;
Rows: 1
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE user_tmp index NULL PRIMARY 4 NULL 50 Using where
Example of the database used
CREATE TABLE IF NOT EXISTS user_tmp (
  id int(9) NOT NULL AUTO_INCREMENT,
  ip text NOT NULL,
  dataip bigint(20) NOT NULL,
  ref text NOT NULL,
  click int(20) NOT NULL,
  code text NOT NULL,
  too text NOT NULL,
  name text NOT NULL,
  checkopen text NOT NULL,
  contry text NOT NULL,
  vOperation text NOT NULL,
  vBrowser text NOT NULL,
  iconOperation text NOT NULL,
  iconBrowser text NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=4653425;
--
-- Dumping data for table user_tmp
INSERT INTO `user_tmp` (`id`, `ip`, `dataip`, `ref`, `click`, `code`, `too`, `name`, `checkopen`, `contry`, `vOperation`, `vBrowser`, `iconOperation`, `iconBrowser`) VALUES
(1, '54.125.78.84', 1319506641, 'http://xxxx.example.com/vb/showthread.php%D8%AA%D8%AD%D9%85%D9%8A%D9%84-%D8%A7%D8%BA%D9%86%D9%8A%D8%A9-%D8%A7%D9%84%D8%A8%D9%88%D9%85-giovanni-marradi-lovers-rendezvous-3cd-1999-a-155712.html', 0, '4mxxxxx5', 'http://www.xxx.example.com/aflam/', 'xxxxe', '2010', 'US', 'Linux', 'Chrome 12.0.742 ', 'linux.png', 'chrome.png');
I want to know the correct way to write this query and how to optimize the database.
You don't have any indexes besides the primary key. You need to create indexes on the fields that you use in your WHERE clause. Whether you need to index only one field or a combination of several fields depends on the other SELECTs you will be running against that table.
Keep in mind that REGEXP cannot use indexes at all; LIKE can use an index only when the pattern does not begin with a wildcard (so LIKE 'a%' can use an index, but LIKE '%a' cannot); and a range comparison (<, >) can use an index, but it stops any later columns of that index from being used.
So you are left with the code and check fields. I suppose many rows will have the same value for check, so I would begin the index with the code field. Multi-field indexes can be used only in the order in which they are defined...
Imagine an index created on the fields (code, check). This index can be used in your query (whose WHERE clause contains both fields), and also in a query with only the code field, but not in a query with only the check field.
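A sketch of such an index. Since code and checkopen (the actual name of the check column in the posted table definition) are TEXT columns, MySQL requires a prefix length; the lengths below are guesses to be tuned to your data:
ALTER TABLE user_tmp
    ADD INDEX idx_code_check (code(16), checkopen(8));  -- prefix lengths are assumptions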
Is it important to ORDER BY id? If not, leave it out; that avoids the sort pass and your query will finish faster.
I will assume you are using mysql <= 5.1
The answers above fall into two basic categories:
1. You are using the wrong column type
2. You need indexes
I will deal with each, as both are relevant for performance, which is ultimately what I take your question to be about:
Column Types
The difference between bigint/int or int/char for the dataip question is basically not relevant to your issue. The fundamental issue has more to do with index strategy. However, when considering performance holistically, the fact that you are using MyISAM as your engine for this table leads me to ask if you really need the text column types. If you have short (say, less than 255 characters) character columns, then making them fixed-length columns will most likely increase performance. Keep in mind that if any one column is of variable length (varchar, text, etc.), then this is not worth changing for any of them, since the rows will stay variable-length anyway.
Vertical Partitioning
The fact to keep in mind here is that even though you are only requesting the id column, from the standpoint of disk IO and memory you are getting the entire row back. Since so many of the columns are text, this could mean a massive amount of data. Any of these columns that are not used for lookups or are not often accessed could be moved into another table, where the foreign key has a unique key placed on it, keeping the relationship 1:1.
Index Strategy
Most likely the problem is simply indexing, as noted above. The reason the "AND dataip <1319992460" condition causes your current situation is that it forces a full table scan.
As stated above, placing all of the WHERE-clause columns in a single composite index will help. For this query, the order of the equality columns does not matter much, as long as the range column (dataip) comes last.
However, the order can matter a great deal for other queries. A quick example would be an index made of (colA, colB). A query with "where colA = 'foo'" will use this index, but a query with "where colB = 'bar'" will not, because colB is not the left-most column in the index definition. So, if you have other queries that use these columns in some combination, it is worth minimizing the number of indexes created on the table. This is because every index increases the cost of a write and uses disk space. Writes are expensive because of the necessary disk activity. Don't make them more expensive.
You need to add an index, like this:
ALTER TABLE `user_tmp` ADD INDEX(`dataip`);
And if your column dataip contains only unique values, you can add a unique key instead:
ALTER TABLE `user_tmp` ADD UNIQUE(`dataip`);
Keep in mind that adding an index can take a long time on a big table, so don't do it on a production server without testing.
You need to create an index on the fields you use in the WHERE clause; note that MySQL can only use the index if those fields form a left-most prefix of it. Index the fields of your WHERE clause.
Does dataip really need to be a BIGINT? According to the MySQL docs, the signed BIGINT range is -9223372036854775808 to 9223372036854775807 (it is a 64-bit number).
You need to choose the right column type for the job, and add the right type of index too; otherwise these queries will take forever.
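If dataip is really a Unix timestamp (the sample value 1319506641 looks like one), an unsigned 32-bit INT is enough until the year 2106. A hedged sketch:
ALTER TABLE user_tmp MODIFY dataip INT UNSIGNED NOT NULL;  -- assumes all values fit in 32 bits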