How to do a reverse full-text search in MySQL?

By default it's like this:
select * from main_table where match(col1,col2) against('search_item');
But what I want is the reverse:
say I've stored all the search items (1,000 records, for example),
and I want to see which of them match a specified row in main_table.
Is that doable?

You could do something like:
SELECT * FROM search_items WHERE (SELECT col1 FROM main_table WHERE ID = XXX) LIKE CONCAT('%',search_item,'%');
That is going to be pretty darn slow if you have a huge dataset to get through.
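A minimal way to try this reverse match without a MySQL server: the Python sketch below uses the standard-library sqlite3 module with an in-memory SQLite database as a stand-in. The table contents and the helper names (make_demo_db, items_matching_row) are hypothetical. SQLite concatenates with ||, whereas MySQL uses CONCAT('%', search_item, '%') as in the query above. One caveat applies to both engines: a search item containing % or _ is treated as a wildcard, so escape those characters if they can occur.

```python
import sqlite3


def make_demo_db():
    """Hypothetical sample data: one main_table row and three stored search items."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE main_table (id INTEGER PRIMARY KEY, col1 TEXT)")
    conn.execute("CREATE TABLE search_items (search_item TEXT)")
    conn.execute("INSERT INTO main_table VALUES (1, 'The quick brown fox')")
    conn.executemany("INSERT INTO search_items VALUES (?)",
                     [("quick",), ("fox",), ("dog",)])
    return conn


def items_matching_row(conn, row_id):
    """Return the stored search items that occur inside col1 of one main_table row."""
    # Reverse match: the stored item is the pattern, the row's text is the subject.
    sql = """
        SELECT search_item FROM search_items
        WHERE (SELECT col1 FROM main_table WHERE id = ?)
              LIKE '%' || search_item || '%'
        ORDER BY search_item
    """
    return [row[0] for row in conn.execute(sql, (row_id,))]
```

Every stored item is tested against the one row, which is why this gets slow as the number of search items grows.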
If speed is an issue, another way to handle this (although admittedly a lot more complicated) is to get all of the data out of the database and build yourself a trie (a ternary search tree is one space-efficient way to implement one). Once you get through the overhead of building the trie, matching against input strings is lightning fast compared to brute-force methods.
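A minimal Python sketch of the trie idea, assuming all search items fit in memory: build a character trie from the stored items once, then, for each start offset of the row's text, walk down the trie and record every item that ends along the way. This is a plain dictionary-based trie rather than a ternary search tree, and it costs roughly O(n × L) per input for text length n and longest item length L; the Aho-Corasick algorithm extends the same structure to run in time linear in the text plus the number of matches. The class and function names are hypothetical.

```python
class TrieNode:
    """One node of a character trie; `item` is set if a stored item ends here."""
    __slots__ = ("children", "item")

    def __init__(self):
        self.children = {}
        self.item = None


def build_trie(items):
    """Insert every search item into a trie (done once, up front)."""
    root = TrieNode()
    for item in items:
        node = root
        for ch in item:
            node = node.children.setdefault(ch, TrieNode())
        node.item = item
    return root


def matching_items(root, text):
    """Return the set of stored items that occur anywhere in `text`."""
    found = set()
    for start in range(len(text)):
        node = root
        # Walk down the trie as far as the text allows from this offset.
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.item is not None:
                found.add(node.item)
    return found
```

For example, matching_items(build_trie(["cat", "dog", "at", "bird"]), "concatenate") finds "cat" and "at". The sketch is case-sensitive; lowercase both the items and the text if you need the case-insensitive behaviour of MySQL's default collations.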

Well, I realise this is ten months late, but just for the sake of posterity: I think what you're saying is that you want all rows in main_table that would not match the query:
select * from main_table where match(col1,col2) against('search_item');
I'm going to assume that your main_table has some unique identifier column which I'll call unqid.
select * from main_table
where unqid not in (
    select unqid
    from main_table
    where match(col1,col2) against('search_item')
);
My first thought was to use a MINUS operator, but it turns out MySQL doesn't have one! (MySQL 8.0.31 and later do support the standard EXCEPT operator, though.)
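To see that the NOT IN anti-join and a set difference give the same answer, here is a small Python sketch using sqlite3 and an in-memory SQLite database as a stand-in. SQLite has no MATCH ... AGAINST, so a LIKE '%fox%' filter plays the role of the full-text match; the table contents and function names are hypothetical. NOT IN is only safe here because unqid is a non-NULL primary key: if the subquery could return a NULL, NOT IN would return no rows at all.

```python
import sqlite3


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE main_table (unqid INTEGER PRIMARY KEY, col1 TEXT)")
    conn.executemany("INSERT INTO main_table VALUES (?, ?)",
                     [(1, "red fox"), (2, "blue whale"), (3, "arctic fox")])
    return conn


# Stand-in for MATCH(col1, col2) AGAINST('search_item'), which SQLite lacks.
MATCHING = "SELECT unqid FROM main_table WHERE col1 LIKE '%fox%'"


def non_matching_not_in(conn):
    # The answer's approach: exclude every unqid the match query returns.
    sql = ("SELECT unqid, col1 FROM main_table "
           f"WHERE unqid NOT IN ({MATCHING}) ORDER BY unqid")
    return conn.execute(sql).fetchall()


def non_matching_except(conn):
    # The set-difference form, for engines that support EXCEPT (SQLite does).
    sql = ("SELECT unqid, col1 FROM main_table "
           "EXCEPT SELECT unqid, col1 FROM main_table WHERE col1 LIKE '%fox%' "
           "ORDER BY unqid")
    return conn.execute(sql).fetchall()
```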

Related

Can MySQL/MariaDB stop a selection when it reaches a certain condition? Or select all rows before a condition

For example, I have a table:
id;name
1;John
2;Mary
3;Cat
4;Cheng
I want the selection to stop right after 3;Cat and still contain all the rows that come before 3;Cat.
I think this could be described with a query like
SELECT * FROM table WHERE condition ORDER BY id LIMIT name = 'Cat'
but of course there is no such construction as LIMIT name='Cat' in SQL.
Maybe something else fits?
Currently I'm using an oversized select, but it needs an enormous LIMIT of 1200 rows to be sure it contains at least one of the records I expect.
This is a not-so-bad answer:
https://stackoverflow.com/a/22232897/1475428
Solution might look like
SELECT * FROM mytable WHERE id <= (SELECT MIN(id) FROM mytable WHERE name = 'Cat') ORDER BY id
The MIN() subquery plays the role of a backward lookup that works like a conditional LIMIT.
This looks like an ugly way to do it, though; I still think there might be a better solution.
This is quite awkward to do in a single query. That means you probably should not try to do it in a single query.
Sometimes it's simpler to do a complex task in several steps. It's easier to write, it's easier to debug, it's easier to modify if you need to, and it's easier for future programmers to read your code if they need to take over responsibility.
So first query for the condition, and find out the id of the row you want to stop at:
SELECT MIN(id) FROM mytable WHERE name = 'Cat';
This returns either an id value, or else NULL if there is no row matching the condition.
If that result was not NULL, then use that value to run a simple query:
SELECT * FROM mytable WHERE id <= ? ORDER BY id
Else if the result was NULL, then default to a query with the fixed LIMIT you want:
SELECT * FROM mytable ORDER BY id LIMIT ?
If you have special conditions that aren't supported by simple SQL, then break it up into different queries that are each simple, and use a little bit of application logic to choose which query to run.
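The two-step approach can be sketched in Python with the standard-library sqlite3 module, using an in-memory SQLite database as a stand-in for MySQL and the question's sample rows. The function names are hypothetical, and sqlite3 uses ? placeholders, whereas many MySQL drivers use %s instead.

```python
import sqlite3


def make_demo_db():
    """In-memory table holding the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?)",
        [(1, "John"), (2, "Mary"), (3, "Cat"), (4, "Cheng")],
    )
    return conn


def rows_until(conn, stop_name, fallback_limit):
    """Rows up to and including the first row named stop_name, or the first
    fallback_limit rows if no row has that name."""
    # Step 1: find the id to stop at; MIN() yields NULL (None) if nothing matches.
    (stop_id,) = conn.execute(
        "SELECT MIN(id) FROM mytable WHERE name = ?", (stop_name,)
    ).fetchone()
    if stop_id is not None:
        # Step 2a: a plain range query up to the stop row.
        sql = "SELECT id, name FROM mytable WHERE id <= ? ORDER BY id"
        params = (stop_id,)
    else:
        # Step 2b: no match, so fall back to the fixed LIMIT.
        sql = "SELECT id, name FROM mytable ORDER BY id LIMIT ?"
        params = (fallback_limit,)
    return conn.execute(sql, params).fetchall()
```

With the sample data, rows_until(conn, "Cat", 1200) stops at the Cat row, while a name that does not exist falls back to the first fallback_limit rows.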

Query takes too long to run

I am running the query below to retrieve the unique latest result, based on a date field, within the same table. But this query takes too much time as the table grows. Any suggestion to improve it is welcome.
select t2.*
from (
    select (
        select id
        from ctc_pre_assets ti
        where ti.ctcassettag = t1.ctcassettag
        order by ti.createddate desc
        limit 1
    ) lid
    from (
        select distinct ctcassettag
        from ctc_pre_assets
    ) t1
) ro,
ctc_pre_assets t2
where t2.id = ro.lid
order by id
Our table may contain the same row multiple times, but each with a different timestamp. My objective is based on a single column, for example assettag: I want to retrieve a single row for each assettag, the one with the latest timestamp.
It's simpler, and probably faster, to find the newest date for each ctcassettag and then join back to find the whole row that matches.
This does assume that no ctcassettag has multiple rows sharing its newest createddate; if one does, you can get back more than one row for that ctcassettag.
SELECT
    ctc_pre_assets.*
FROM
    ctc_pre_assets
INNER JOIN
    (
        SELECT
            ctcassettag,
            MAX(createddate) AS createddate
        FROM
            ctc_pre_assets
        GROUP BY
            ctcassettag
    )
    newest
        ON  newest.ctcassettag = ctc_pre_assets.ctcassettag
        AND newest.createddate = ctc_pre_assets.createddate
ORDER BY
    ctc_pre_assets.id
EDIT: To deal with multiple rows with the same date.
You haven't actually said how to pick which row you want in the event that multiple rows are for the same ctcassettag on the same createddate. So, this solution just chooses the row with the lowest id from amongst those duplicates.
SELECT
    ctc_pre_assets.*
FROM
    ctc_pre_assets
WHERE
    ctc_pre_assets.id
    =
    (
        SELECT
            lookup.id
        FROM
            ctc_pre_assets lookup
        WHERE
            lookup.ctcassettag = ctc_pre_assets.ctcassettag
        ORDER BY
            lookup.createddate DESC,
            lookup.id ASC
        LIMIT
            1
    )
This does still use a correlated sub-query, which is slower than a simple nested-sub-query (such as my first answer), but it does deal with the "duplicates".
You can change the rules on which row to pick by changing the ORDER BY in the correlated sub-query.
It's also very similar to your own query, but with one less join.
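The difference between the two queries above shows up as soon as one tag has two rows on its newest date. The following Python sketch runs both against hypothetical sample data in an in-memory SQLite database (via the standard-library sqlite3 module, standing in for MySQL); the helper names are assumptions.

```python
import sqlite3


def make_demo_db():
    """Hypothetical sample rows: tag B has two rows on its newest date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ctc_pre_assets "
                 "(id INTEGER PRIMARY KEY, ctcassettag TEXT, createddate TEXT)")
    conn.executemany("INSERT INTO ctc_pre_assets VALUES (?, ?, ?)", [
        (1, "A", "2020-01-01"),
        (2, "A", "2020-03-01"),
        (3, "B", "2020-02-01"),
        (4, "B", "2020-02-01"),  # same newest date as id 3
        (5, "B", "2020-01-15"),
    ])
    return conn


# Join back on MAX(createddate): returns every row that ties on the newest date.
JOIN_ON_MAX = """
    SELECT a.id FROM ctc_pre_assets a
    JOIN (SELECT ctcassettag, MAX(createddate) AS createddate
          FROM ctc_pre_assets GROUP BY ctcassettag) newest
      ON newest.ctcassettag = a.ctcassettag
     AND newest.createddate = a.createddate
    ORDER BY a.id"""

# Correlated sub-query: ORDER BY ... LIMIT 1 picks exactly one row per tag,
# breaking ties on the lowest id.
CORRELATED_ONE_PER_TAG = """
    SELECT a.id FROM ctc_pre_assets a
    WHERE a.id = (SELECT lookup.id FROM ctc_pre_assets lookup
                  WHERE lookup.ctcassettag = a.ctcassettag
                  ORDER BY lookup.createddate DESC, lookup.id ASC
                  LIMIT 1)
    ORDER BY a.id"""


def ids(conn, sql):
    return [row[0] for row in conn.execute(sql)]
```

On this data the join returns ids 2, 3 and 4 (both tied B rows), while the correlated version returns only 2 and 3.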
Nested queries generally take longer than an equivalent query without nesting. Can you prefix the query with EXPLAIN and post the results here? That will help us analyse which query/table is taking longer to respond.
Check whether the table has indexes. Unindexed tables are not advisable (unless there is an obvious need for them to be unindexed) and are alarmingly slow at executing queries.
In fact, I think the best approach is to avoid writing nested queries altogether. Better still, run each of the queries separately and then use the results (in array or list format) in the second query.
First some questions that you should at least ask yourself, but maybe also give us an answer to improve the accuracy of our responses:
Is your data normalized? If yes, maybe you should make an exception to avoid this brutal subquery problem
Are you using indexes? If yes, which ones, and are you using them to the fullest?
Some suggestions to improve the readability and maybe performance of the query:
- Use joins
- Use group by
- Use aggregators
Example (untested, so might not work, but should give an impression):
SELECT t2.*
FROM (
    SELECT ctcassettag, MAX(createddate) AS maxdate
    FROM ctc_pre_assets
    GROUP BY ctcassettag
) ro
INNER JOIN ctc_pre_assets t2
    ON  t2.ctcassettag = ro.ctcassettag
    AND t2.createddate = ro.maxdate
ORDER BY t2.id
Using normalization is great, but there are a few cases where normalization causes more harm than good. This seems like one of those situations, but without your tables in front of me, I can't tell for sure.
Using DISTINCT the way you are, I can't help feeling you might not get all the relevant results; maybe someone else can confirm or deny this?
It's not that subqueries are all bad, but they tend to create massive scalability issues if written incorrectly. Make sure you use them the right way (google it?)
Indexes can potentially save you a bunch of time, if you actually use them. It's not enough to set them up; you have to write queries that actually use your indexes. Google this as well.

Optimizing a particular MySQL query

So I've been searching for a solution and reading books, and haven't been able to figure it out. The question is rather simple: I have 2 tables.
table_1: "chromosome" and "position", both integers.
table_2: "chromosome", "start" and "end", all integers as well.
I want a query that gives me back all rows from table_1 that are between the start and end of table_2. The query looks like this:
SELECT
    table_1.*
FROM
    table_1,
    table_2
WHERE
    table_1.chromosome = table_2.chromosome
    AND table_1.position > table_2.start
    AND table_1.position < table_2.end;
So this query works fine, but my tables are large: 7,092,713 and 215,909 rows respectively. I indexed (chromosome, position) and (chromosome, start, end). The weird part is that if I run the query piecewise (Perl DBI, one statement for every row of table_2), it runs a lot faster. I'm not sure where I'm screwing up.
Any help would be appreciated.
Jorge Kageyama
For the sake of clarity, let's start by recasting your query using the standard JOIN syntax. The query is equivalent but easier to read.
SELECT table_1.*
FROM table_1
JOIN table_2 ON ( table_1.chromosome = table_2.chromosome
              AND table_1.position > table_2.start
              AND table_1.position < table_2.end)
Second, when searching large tables (or any tables, for that matter) it's smart to avoid * in your SELECT clauses. Using * withholds useful information from the optimizer about what you do, or don't, need in your result set. So let us use
SELECT table_1.chromosome, table_1.position
as the SELECT clause.
So it becomes clear that your result set, and your join, need chromosome and position, and nothing else, from your larger table. Try creating a compound BTREE index on that table, as follows (MySQL requires an index name; the names here are just examples).
CREATE INDEX idx_chrom_pos ON table_1 (chromosome, position) USING BTREE;
Similarly, try creating an index on table_2 as follows.
CREATE INDEX idx_chrom_start_end ON table_2 (chromosome, start, end) USING BTREE;
These are called covering indexes. They contain enough columns that the query can be satisfied from the index without having to bounce back to the original table.
BTREE indexes (the default by the way) are inherently ordered. Appropriate records in table_1 can be found by range scans on the index starting with (chromosome,start) and ending with (chromosome,end).
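To illustrate why an ordered index helps with this range condition, here is a Python analogy (not MySQL's actual B-tree code): a sorted list of positions per chromosome stands in for the (chromosome, position) index, and each table_2 interval becomes two binary searches with bisect plus one contiguous slice, instead of a comparison against every row. The function names are hypothetical, and the bounds are strict to match the query's > and <.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_index(rows):
    """Group (chromosome, position) rows into one sorted position list per
    chromosome, mimicking an ordered (chromosome, position) index."""
    index = defaultdict(list)
    for chromosome, position in rows:
        index[chromosome].append(position)
    for positions in index.values():
        positions.sort()
    return index


def positions_in_range(index, chromosome, start, end):
    """Positions p on `chromosome` with start < p < end, found by range scan."""
    positions = index.get(chromosome, [])
    lo = bisect_right(positions, start)  # first position strictly > start
    hi = bisect_left(positions, end)     # first position >= end
    return positions[lo:hi]
```

Each lookup costs two O(log n) searches plus the size of the matching slice, which is the same shape of work an index range scan does.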
Third, it's possible you're getting a massive combinatorial explosion of rows from table_1 in your result set. You'll get a row for every combination of rows in the two tables that matches your ON() clause. It's hard to know whether that's the case without knowing a lot about your data.
You could try to reduce that combinatorial explosion using
SELECT DISTINCT table_1.chromosome, table_1.position
Give this a try. If you're still not getting anywhere, maybe another question with complete table definitions and the results of EXPLAIN will be helpful.
Interesting question. Without knowing more about the quantities contained in "position," I would still approach it generally in this way:
Select from table_1 (with its 7.0 million rows) by position, so that each pass works on a smaller bin of data. Let's say, for instance, that the "position" quantity is a set of discrete integers from 2 to 9. Select from table_1 where position equals 2, then select from table_2 where "start" is less than 2 and "end" is greater than 2. Iterate this selection 8 times (once per value), inserting the results into a new table_3.
I am assuming here that table_2 is unique on chromosome and table_1 is not. Therefore, you end up with chromosomes that could have multiple positions within the same range (a chromosome has one range, but a position can fall anywhere within it). You also can't tell in advance how large the resulting joined table is going to be, and it could be quite large, as each of the 7 million rows in table_1 could fall within all the ranges in table_2.
Iterating would allow you to "grow" your results while observing the quality at each point experimentally before committing to the entire loop.
Here is an idea of the query I have in mind (untested):
SELECT t1.chromosome, t1.position, t2.start, t2.end
FROM
(SELECT table_1.chromosome, table_1.position
from table_1 where table_1.position = 2) AS t1
JOIN
(SELECT table_2.chromosome, table_2.start, table_2.end
from table_2 where table_2.start < 2 AND table_2.end > 2) AS t2
ON
t1.chromosome = t2.chromosome
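The iterate-by-bin idea can be sketched in Python with sqlite3 and an in-memory SQLite database standing in for MySQL. The sample rows, the table_3 layout, and the function names are hypothetical; end is quoted because it is an SQL keyword. Each pass binds the current position value three times (position =, start <, end >) and appends that bin to table_3, which is where you could check the results before continuing.

```python
import sqlite3

# One pass of the binned join; every derived table gets its own alias.
BIN_QUERY = """
    SELECT t1.chromosome, t1.position, t2.start, t2."end"
    FROM (SELECT chromosome, position FROM table_1 WHERE position = ?) AS t1
    JOIN (SELECT chromosome, start, "end" FROM table_2
          WHERE start < ? AND "end" > ?) AS t2
      ON t1.chromosome = t2.chromosome
"""


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_1 (chromosome INTEGER, position INTEGER)")
    conn.execute('CREATE TABLE table_2 (chromosome INTEGER, start INTEGER, "end" INTEGER)')
    conn.executemany("INSERT INTO table_1 VALUES (?, ?)",
                     [(1, 2), (1, 3), (1, 9), (2, 2), (2, 5)])
    conn.executemany("INSERT INTO table_2 VALUES (?, ?, ?)", [(1, 1, 4), (2, 4, 9)])
    return conn


def binned_join(conn, positions):
    """Run the join once per position value, appending each bin to table_3."""
    conn.execute('CREATE TABLE table_3 (chromosome INTEGER, position INTEGER, '
                 'start INTEGER, "end" INTEGER)')
    for p in positions:
        rows = conn.execute(BIN_QUERY, (p, p, p)).fetchall()
        # Each bin could be inspected here before committing to the next pass.
        conn.executemany("INSERT INTO table_3 VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn.execute(
        "SELECT * FROM table_3 ORDER BY chromosome, position").fetchall()
```

Across all position values the bins add up to the same rows as the single-statement join; the gain is only that each pass is small enough to inspect.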
Good luck, and I hope you find your answer!

Only select the row if the field value is unique

I sort the rows on date. If I want to select every row that has a unique value in the last column, can I do this with SQL?
So I would like to select the first row and the second one, but not the third one; the fourth one I do want to select, and so on.
What you want are not unique rows, but rather one row per group. This can be done by taking MIN(pk_artikel_id) and grouping by fk_artikel_bron. This method uses an IN subquery to get the first pk_artikel_id for each unique fk_artikel_bron and then uses that to get the remaining columns in the outer query.
SELECT * FROM tbl
WHERE pk_artikel_id IN
(SELECT MIN(pk_artikel_id) AS id FROM tbl GROUP BY fk_artikel_bron)
Although MySQL would let you put the rest of the columns in the SELECT list directly, avoiding the IN subquery, that isn't portable to other RDBMSs (and MySQL 5.7.5 and later reject it by default, because ONLY_FULL_GROUP_BY is enabled). This method is a little more generic.
It can also be done with a JOIN against the subquery, which may or may not be faster. Hard to say without benchmarking it.
SELECT *
FROM tbl
JOIN (
SELECT
fk_artikel_bron,
MIN(pk_artikel_id) AS id
FROM tbl
GROUP BY fk_artikel_bron) mins ON tbl.pk_artikel_id = mins.id
This is similar to Michael's answer, but does it with a self-join instead of a subquery. Try it out to see how it performs:
SELECT t1.* FROM tbl t1
LEFT JOIN tbl t2
ON t2.fk_artikel_bron = t1.fk_artikel_bron
AND t2.pk_artikel_id < t1.pk_artikel_id
WHERE t2.pk_artikel_id IS NULL
If you have the right indexes, this type of join often outperforms subqueries (older MySQL versions could not use indexes on derived tables).
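The three one-row-per-group queries above (the IN subquery, the JOIN against a derived table, and the LEFT JOIN self-join) should agree. Here is a Python sketch that runs all three against hypothetical sample rows in an in-memory SQLite database (standard-library sqlite3, standing in for MySQL); the title column, the index name, and the helper names are assumptions.

```python
import sqlite3

QUERIES = {
    "in_subquery": """
        SELECT pk_artikel_id FROM tbl
        WHERE pk_artikel_id IN
            (SELECT MIN(pk_artikel_id) FROM tbl GROUP BY fk_artikel_bron)
        ORDER BY pk_artikel_id""",
    "join_derived": """
        SELECT tbl.pk_artikel_id FROM tbl
        JOIN (SELECT fk_artikel_bron, MIN(pk_artikel_id) AS id
              FROM tbl GROUP BY fk_artikel_bron) mins
          ON tbl.pk_artikel_id = mins.id
        ORDER BY tbl.pk_artikel_id""",
    "self_join": """
        SELECT t1.pk_artikel_id FROM tbl t1
        LEFT JOIN tbl t2
          ON t2.fk_artikel_bron = t1.fk_artikel_bron
         AND t2.pk_artikel_id < t1.pk_artikel_id
        WHERE t2.pk_artikel_id IS NULL
        ORDER BY t1.pk_artikel_id""",
}


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (pk_artikel_id INTEGER PRIMARY KEY, "
                 "fk_artikel_bron INTEGER, title TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)",
                     [(1, 10, "a"), (2, 20, "b"), (3, 10, "c"),
                      (4, 30, "d"), (5, 20, "e")])
    # A (group, id) index can serve the self-join's "smaller id in same group" lookup.
    conn.execute("CREATE INDEX idx_bron_id ON tbl (fk_artikel_bron, pk_artikel_id)")
    return conn


def run_all(conn):
    """Map each query name to the list of pk_artikel_id values it returns."""
    return {name: [row[0] for row in conn.execute(sql)]
            for name, sql in QUERIES.items()}
```

Each query keeps the lowest pk_artikel_id per fk_artikel_bron, so all three return ids 1, 2 and 4 here.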
This non-standard, MySQL-only trick will select the first row encountered for each value of fk_artikel_bron.
select *
...
group by fk_artikel_bron
Like it or not, this query produces the output asked for.
Edited
I seem to be getting hammered here, so here's the disclaimer:
This only works in MySQL 5+, and only when the ONLY_FULL_GROUP_BY SQL mode is disabled (it is enabled by default as of MySQL 5.7.5).
Although the MySQL documentation says the row returned using this technique is not predictable (i.e. you could get any row as the "first" encountered), in every case I've ever seen you get the first row in the order selected. So to get a predictable row that works in practice (it may not work in future releases, but probably will), select from an ordered result:
select * from (
select *
...
order by pk_artikel_id) x
group by fk_artikel_bron

What is the most efficient MySQL query to find all entries that start with a number?

In a database that has over 1 million entries, occasionally we need to find all rows whose name column starts with a number.
This is what currently is being used, but it just seems like there may be a more efficient manner in doing this.
SELECT * FROM mytable WHERE name LIKE '0%' OR name LIKE '1%' OR name ...
etc...
Any suggestions?
select * from mytable where name regexp '^[0-9]'
Hey,
you should add an index with a length of 1 to the field in the db. The query will then be significantly faster.
ALTER TABLE `database`.`table` ADD INDEX `indexName` ( `column` ( 1 ) )
Felix
My guess is that the indexes on the table aren't being used efficiently (if at all).
Since this is a char field of some type, and if this is the primary query on this table, you could restructure your indexes (my MySQL knowledge is a bit short here, somebody help out) so that this table is ordered by this field (a clustered index in MS SQL). Then you could say something like
select * from mytable where name >= char(48) and name < char(58)
Do some testing there (char(48) is '0' and char(58) is ':', the character right after '9'); I'm not 100% sure how MySQL would rank those characters, but that should get you going.
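A quick way to check a range predicate of this kind: this Python sketch (sqlite3, in-memory SQLite standing in for MySQL) compares name >= '0' AND name < ':' (that is, CHAR(48) and CHAR(58)) against a brute-force regular-expression check. It assumes a byte-ordered (binary) collation, which is SQLite's default; MySQL's case-insensitive collations such as utf8mb4_0900_ai_ci may not order punctuation and digits by ASCII code, so test under your own collation. The sample names and function names are hypothetical.

```python
import re
import sqlite3


def make_demo_db(names):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE INDEX idx_name ON mytable (name)")
    conn.executemany("INSERT INTO mytable (name) VALUES (?)", [(n,) for n in names])
    return conn


def digit_names_by_range(conn):
    # Under a binary collation every string starting with 0-9 sorts in
    # ['0', ':'), so the filter is a single range on the indexed column.
    return [row[0] for row in conn.execute(
        "SELECT name FROM mytable WHERE name >= '0' AND name < ':' ORDER BY name")]


def digit_names_reference(names):
    # Brute-force check used to validate the range predicate.
    return sorted(n for n in names if re.match(r"[0-9]", n))
```

Note that a lower bound of '/' (CHAR(47)) would wrongly admit names such as "/etc", and an upper bound of '9' (CHAR(57)) would drop names starting with 9.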
Another option is to add a new true/false column, "starts_with_number". You could set up a trigger to populate that column. This might give the best and most predictable results.
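The flag-column idea can be sketched with SQLite triggers via Python's sqlite3 module (in-memory database; table, index, trigger, and function names are hypothetical). MySQL's trigger syntax differs: there you would typically use BEFORE INSERT and BEFORE UPDATE ... FOR EACH ROW triggers that SET NEW.starts_with_number = ..., because a MySQL trigger cannot update the table it is defined on. The index on the flag turns the search into a simple equality lookup.

```python
import sqlite3

SCHEMA = """
CREATE TABLE mytable (
    id INTEGER PRIMARY KEY,
    name TEXT,
    starts_with_number INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX idx_starts_with_number ON mytable (starts_with_number);

-- Set the flag on insert; coalesce() maps a NULL name to 0.
CREATE TRIGGER trg_swn_insert AFTER INSERT ON mytable
BEGIN
    UPDATE mytable
    SET starts_with_number = coalesce(substr(NEW.name, 1, 1) BETWEEN '0' AND '9', 0)
    WHERE id = NEW.id;
END;

-- Recompute the flag whenever name changes.
CREATE TRIGGER trg_swn_update AFTER UPDATE OF name ON mytable
BEGIN
    UPDATE mytable
    SET starts_with_number = coalesce(substr(NEW.name, 1, 1) BETWEEN '0' AND '9', 0)
    WHERE id = NEW.id;
END;
"""


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def names_starting_with_number(conn):
    # Equality lookup on the maintained flag column.
    return [row[0] for row in conn.execute(
        "SELECT name FROM mytable WHERE starts_with_number = 1 ORDER BY id")]
```

Because the flag is maintained on every write, reads never have to inspect the name column itself.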
If you're not actually using each and every field in the rows returned, and you really want to wring every drop of efficiency out of this query, then don't use select *, but instead specify only the fields you want to process.
I'm thinking...
SELECT * FROM myTable WHERE LEFT(name, 1) BETWEEN '0' AND '9'