I am trying to do a 'fuzzy' search on a sql table with names of people:
This is the table:
+----+------------+
| T1 | T2 |
+----+------------+
| 1 | Last,First |
| 2 | Last,First |
| 3 | Last,First |
+----+------------+
I want a SELECT statement that queries T2 with the LIKE operator such that it still works when the user's query is "First Last".
The only way I can think of is splitting the value, concatenating the parts the other way around, and then searching again for the entry. Is there a better way to do this?
Yes. If there's a possibility you may put both "last, first" and "first last" into the database, the better way is to design your schema properly.
If you ever find yourself trying to search on, or otherwise manipulate, parts of columns, your schema is almost certainly broken. It will almost certainly kill performance.
The correct way is to have the table thus:
T1  FirstName  LastName
==  =========  ========
1   Pax        Diablo
2   Bob        Smith
3   George     Jones
Then you can more efficiently split the user-entered name (once, before running the query) rather than trying to split every single name in the database.
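With the split schema, you parse the user's input once in application code and match either column order. A minimal sketch of that idea, using the table and sample rows above (SQLite in Python stands in for MySQL; the query itself is portable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (T1 INTEGER, FirstName TEXT, LastName TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?, ?)",
                 [(1, "Pax", "Diablo"), (2, "Bob", "Smith"), (3, "George", "Jones")])

def find_person(conn, user_input):
    # Split the user-entered name once, then try both column orders.
    parts = user_input.strip().split()
    if len(parts) == 2:
        a, b = parts
        rows = conn.execute(
            "SELECT T1 FROM people "
            "WHERE (FirstName = ? AND LastName = ?) "
            "   OR (FirstName = ? AND LastName = ?)",
            (a, b, b, a)).fetchall()
    else:
        # Single word: match it against either column.
        rows = conn.execute(
            "SELECT T1 FROM people WHERE FirstName = ? OR LastName = ?",
            (user_input, user_input)).fetchall()
    return [r[0] for r in rows]

print(find_person(conn, "Pax Diablo"))   # [1]
print(find_person(conn, "Diablo Pax"))   # [1]
```

The splitting happens once per search, not once per row, which is the whole point of the schema change.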
In the case where the database always holds "last, first", a schema change may not actually be necessary.
The problem you have in that case is simply one of interpreting what the user entered.
One possibility, although it is a performance killer, is to do a like for each separate word. So, if the user entered pax diablo, your resultant query might be:
select T1 from mytable
where T2 like '%pax%'
and T2 like '%diablo%'
That way, you don't care about the order so much.
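Building that per-word query from arbitrary input is straightforward in application code. A sketch using SQLite in Python (the T2 values follow the "Last,First" layout from the question; LIKE is case-insensitive for ASCII in both engines by default):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (T1 INTEGER, T2 TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)",
                 [(1, "Diablo,Pax"), (2, "Smith,Bob"), (3, "Jones,George")])

def search(conn, user_input):
    # One LIKE per word, ANDed together, so word order does not matter.
    words = user_input.strip().split()
    where = " AND ".join("T2 LIKE ?" for _ in words)
    params = ["%" + w + "%" for w in words]
    return [r[0] for r in conn.execute(
        "SELECT T1 FROM mytable WHERE " + where, params)]

print(search(conn, "pax diablo"))   # [1]
```

Note that each `%...%` pattern forces a full scan, which is exactly the performance concern raised below.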
However, given my dislike of slow queries, I'd try to steer clear of that unless absolutely necessary (or your database is relatively small and likely to stay that way).
There are all sorts of ways to speed up these sorts of queries, such as:
using whatever full-text search capabilities your DBMS has.
emulating such abilities by extracting and storing words during insert/update triggers (and removing them during delete triggers).
the previous case, but with extra columns holding lower-cased copies of the searched column (for speed).
telling the user they need to use the last, first form for searching.
trying to avoid the %something% search string as much as possible (with something%, indexes can still be used).
my previously mentioned "split the name into two columns" method.
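The word-extraction idea above can be sketched without triggers by maintaining the word table from application code. The `words` table and `insert_name` helper here are made-up names, and SQLite stands in for MySQL:

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (T1 INTEGER PRIMARY KEY, T2 TEXT);
CREATE TABLE words (word TEXT, T1 INTEGER);
CREATE INDEX idx_word ON words (word);
""")

def insert_name(conn, t1, t2):
    # Store the row, then index its lower-cased words for fast equality lookups.
    conn.execute("INSERT INTO mytable VALUES (?, ?)", (t1, t2))
    for w in set(re.split(r"\W+", t2.lower())):
        if w:
            conn.execute("INSERT INTO words VALUES (?, ?)", (w, t1))

def search(conn, user_input):
    # Indexed equality per word instead of a %...% table scan.
    terms = [w for w in re.split(r"\W+", user_input.lower()) if w]
    sql = " INTERSECT ".join(
        "SELECT T1 FROM words WHERE word = ?" for _ in terms)
    return [r[0] for r in conn.execute(sql, terms)]

for t1, t2 in [(1, "Diablo,Pax"), (2, "Smith,Bob"), (3, "Jones,George")]:
    insert_name(conn, t1, t2)

print(search(conn, "pax diablo"))   # [1]
```

A real implementation would also remove the words when a row is deleted, as the list above notes.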
You can try this one. But a better approach is to restructure your table's schema by separating the names into their own columns.
SELECT *
FROM myTable
WHERE T2 LIKE CONCAT('%', 'First', '%','Last', '%') OR
T2 LIKE CONCAT('%', 'Last', '%','First', '%')
I have this database: http://sqlfiddle.com/#!9/9c1f66/16
and I would like to produce an output like this for a certain DataEntry (in this case the one with id 1):
RowId | field1       | field2       | field3       | field4
------+--------------+--------------+--------------+-------------
1     | value_field1 | value_field2 | value_field3 | value_field4
I tried using pivot tables but I can't figure out how to do it properly.
The SQL language has a very strict and unbreakable rule requiring you to know the number of columns returned by a query before looking at any data in your tables.
The only way around this is with dynamic SQL, where you complete the query over three steps:
1. Run a query to find out information about your target columns.
2. Use the information from the first step to build a complex new SQL statement on the fly, using a PIVOT, conditional aggregations, numerous JOINs to the same table, or some combination thereof.
3. Run the query from step 2.
The design you are pursuing will require you to jump through those hoops for pretty much every query you will want to run. This will make application maintenance and development much more complex, it will make executing the query itself slow, and possibly worst of all this will break your ability to effectively index your data.
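Those three steps look roughly like this against a minimal key/value layout (SQLite in Python; the `entry_fields` table and its columns are assumptions, since the fiddle's exact schema isn't reproduced here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry_fields (entry_id INTEGER, name TEXT, value TEXT)")
conn.executemany("INSERT INTO entry_fields VALUES (?, ?, ?)", [
    (1, "field1", "value_field1"), (1, "field2", "value_field2"),
    (1, "field3", "value_field3"), (1, "field4", "value_field4"),
])

# Step 1: discover the column names from the data.
names = [r[0] for r in conn.execute(
    "SELECT DISTINCT name FROM entry_fields ORDER BY name")]

# Step 2: build a conditional-aggregation pivot on the fly.
# (names come straight from the data; quote or whitelist them in real code)
cols = ", ".join(
    f"MAX(CASE WHEN name = '{n}' THEN value END) AS {n}" for n in names)
sql = ("SELECT entry_id AS RowId, " + cols +
       " FROM entry_fields WHERE entry_id = ? GROUP BY entry_id")

# Step 3: run the generated query.
print(conn.execute(sql, (1,)).fetchone())
```

Every new field name added to the data changes the generated statement, which is exactly the maintenance burden described above.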
Currently I have a column in my table which has a set of comma separated values. I am currently using it to filter the results. I am wondering if it would be possible to index on it and query directly using it.
My table is as below:
userId | types
-------+--------
123    | A, B, C
234    | B, C
If I query for a user with types A and C, I should get 123.
If with B and C, then both 123 and 234.
EDIT: I am aware the problem can be solved by normalization. However, my table actually stores JSON, and this field is a virtual column referencing a list; there are no relations used anywhere. We are facing an issue where querying by types was not considered up front and is now causing a performance impact.
First of all, you should normalize your table and remove the CSV data. Use something like this:
userId | types
-------+------
123    | A
123    | B
123    | C
234    | B
234    | C
For the specific query you have in mind, you might choose:
SELECT userId
FROM yourTable
WHERE types IN ('A', 'C')
GROUP BY userId
HAVING MIN(types) <> MAX(types);
With this in mind, MySQL might be able to use the following composite index:
CREATE INDEX idx ON yourTable (userId, types);
This index should cover the entire query above actually.
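The query and sample data above run unchanged on SQLite, which is handy for checking the MIN/MAX trick (a Python sketch; results are sorted for stable output):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (userId INTEGER, types TEXT)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?)",
                 [(123, "A"), (123, "B"), (123, "C"), (234, "B"), (234, "C")])

def users_with_both(conn, t1, t2):
    # A userId qualifies only when both distinct types appear in its group.
    sql = ("SELECT userId FROM yourTable WHERE types IN (?, ?) "
           "GROUP BY userId HAVING MIN(types) <> MAX(types)")
    return sorted(r[0] for r in conn.execute(sql, (t1, t2)))

print(users_with_both(conn, "A", "C"))   # [123]
print(users_with_both(conn, "B", "C"))   # [123, 234]
```

Note the MIN/MAX comparison only works for exactly two requested types; for three or more you would use HAVING COUNT(DISTINCT types) = N instead.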
The answer is basically no . . . but not really. The important point is that you should restructure the data and store it properly. And "properly" means that string columns are not used to store multiple values.
However, that is not your question. You can create an index to do what you want. Such an index would be a full-text index, allowing you to use match(). If you take this approach you need to be very careful:
You need to use boolean mode when querying.
You need to set the minimum word length so single characters are recognized as words.
You need to check the stop-words list, so words such as "A" and "I" are included.
So, what you want to do is possible. However, it is not recommended, because the data is not in a proper relational format.
MySQL supports Multi-Value Indexes for JSON columns as of MySQL 8.0.17.
It seems like exactly your case.
Details: https://dev.mysql.com/doc/refman/8.0/en/create-index.html#create-index-multi-valued
I have a table and a query like this:
ID | name                  | commentsCount
---+-----------------------+--------------
1  | mysql for dummies     | 33
2  | mysql beginners guide | 22
SELECT
...,
commentsCount -- will return 33 for the first row, 22 for the second one
FROM
mycontents
WHERE
name LIKE "%mysql%"
I also want to know the total of comments across all matching rows:
SELECT
...,
SUM(commentsCount) AS commentsCountAggregate -- should return 55
FROM
mycontents
WHERE
name LIKE "%mysql%"
but this one obviously returns a single row with the total.
Now I want to merge these two queries into one single query,
because my actual query is very heavy to execute (it uses boolean full-text search, substring offset search, and sadly a lot more), so I don't want to execute it twice.
Is there a way to get the total of comments without making the SELECT twice?
Custom functions are welcome!
Variable usage is also welcome; I have never used variables before...
You can cache the intermediate result in a temporary table, and then do the sum over that table.
One obvious solution is storing the intermediate results in another 'temporary' table, and then performing the aggregation in a second step.
Another solution is preparing a lookup table containing the sums you need (but there obviously needs to be some grouping ID; I'll call it MASTER_ID), like this:
CREATE TABLE comm_lkp AS
SELECT MASTER_ID, SUM(commentsCount) as cnt
FROM mycontents
GROUP BY MASTER_ID
Also create an index on that table's MASTER_ID column. Later, you can modify your query like this:
SELECT
...,
commentsCount,
cnt as commentsSum
FROM
mycontents as a
JOIN comm_lkp as b ON (a.MASTER_ID=b.MASTER_ID)
WHERE
name LIKE "%mysql%"
It also shouldn't hurt your performance, as long as the lookup table stays relatively small.
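The lookup-table approach translates directly. A runnable sketch with SQLite in Python (a single MASTER_ID group keeps the example small; the 55 total matches the question's data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mycontents "
             "(ID INTEGER, name TEXT, commentsCount INTEGER, MASTER_ID INTEGER)")
conn.executemany("INSERT INTO mycontents VALUES (?, ?, ?, ?)", [
    (1, "mysql for dummies", 33, 1),
    (2, "mysql beginners guide", 22, 1),
])

# Build the lookup table of per-group sums, plus the index on MASTER_ID.
conn.executescript("""
CREATE TABLE comm_lkp AS
  SELECT MASTER_ID, SUM(commentsCount) AS cnt
  FROM mycontents GROUP BY MASTER_ID;
CREATE INDEX idx_lkp ON comm_lkp (MASTER_ID);
""")

# Each matching row now carries its group total alongside its own count.
rows = conn.execute("""
  SELECT a.ID, a.commentsCount, b.cnt AS commentsSum
  FROM mycontents AS a
  JOIN comm_lkp AS b ON a.MASTER_ID = b.MASTER_ID
  WHERE a.name LIKE '%mysql%'
""").fetchall()
print(rows)   # [(1, 33, 55), (2, 22, 55)]
```

On MySQL 8+ a window function, SUM(commentsCount) OVER (), computes the same per-row total in a single pass without any lookup table.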
A GROUP BY on one of the ID fields might do the trick. This will then give you the SUM(commentsCount) for each ID.
The query in your question is not detailed enough to know which of your fields/tables the ID field should come from.
I'm trying to make it quick and easy to perform a keyword search on a set of MySQL tables which are linked to each other.
There's a table of items with a unique "itemID" and associated data is spread out amongst other tables, all linked to via the itemID.
I've created a view which concatenates much of this information into one usable form. This makes searching really easy, but hasn't helped with performance. It's my first use of a view, and perhaps wasn't the right use. If anyone could give me some pointers I'd be very grateful.
A simplified example is:
ITEMS TABLE:
itemID | name
-------+--------
1      | "James"
2      | "Bob"
3      | "Mary"
KEYWORDS TABLE:
keywordID | itemID | keyword
----------+--------+----------
1         | 2      | "rabbit"
2         | 2      | "dog"
3         | 3      | "chicken"
plus many more relations...
MY VIEW: (created using CONCAT_WS, GROUP_CONCAT and a fair few JOINs)
itemID | important_search_terms
-------+------------------------
1      | "James ..."
2      | "Bob, rabbit, dog ..."
3      | "Mary, chicken ..."
I can then search the view for "mary" and "chicken" and easily find that itemID=3 matches. Brilliant!
The problem is, it seems to be doing all the work of the CONCATs and JOINs for each and every search, which is not efficient. With my current test data, searches take approximately 2 seconds, which is not practical.
I was hoping that the view would be cached in some way, but perhaps I'm not using it in the right way.
I could have an actual table with this search info which I update periodically, but it doesn't seem as neat as I had hoped.
If anyone has any suggestions I'd be very grateful. Many Thanks
Well, a view is nothing more than a way of making your queries easier to read; underneath, the SQL statement it is defined on runs every time you query it.
So it's no wonder it is as slow as (or even slower than) running that statement directly.
Usually this kind of search is handled by indexing jobs (running at night, when they won't annoy anyone) or by indexed inserts (when new data is inserted, a check decides whether to add its interesting words to the index).
Doing this at runtime is really hard; it requires a well-designed database structure and, most of the time, powerful hardware for the SQL server (depending on the amount of data).
A MySQL view is not the same as a materialized view in other database systems. All it's really doing is caching the query itself, not the data needed for the query.
The primary use for a MySQL view is to eliminate repetitive queries that you have to write over and over again.
You've made it easy, but not made it quick. I think if you look at the EXPLAIN for your query you are going to see that MySQL is materializing that view (writing out a copy of the result set from the view query as a "derived table") each time you run the query, and then running a query from that "derived table".
You would get better performance if you can have the "search" predicate run against each table separately, something like this:
SELECT 'items' AS source, itemID, name AS found_term
FROM items WHERE name LIKE 'foo'
UNION ALL
SELECT 'keywords', itemID, keyword
FROM keywords WHERE keyword LIKE 'foo'
UNION ALL
SELECT 'others', itemID, other
FROM others WHERE other LIKE 'foo'
-or-
if you don't care what the matched term is, or which table it was found in, and you just want to return a distinct list of itemID that were matched
SELECT itemID
FROM items WHERE name LIKE 'foo'
UNION
SELECT itemID
FROM keywords WHERE keyword LIKE 'foo'
UNION
SELECT itemID
FROM others WHERE other LIKE 'foo'
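The second form runs as-is on the question's sample tables (SQLite in Python; a parameterized '%term%' stands in for the literal 'foo', and the hypothetical others table is omitted):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (itemID INTEGER, name TEXT);
CREATE TABLE keywords (keywordID INTEGER, itemID INTEGER, keyword TEXT);
INSERT INTO items VALUES (1, 'James'), (2, 'Bob'), (3, 'Mary');
INSERT INTO keywords VALUES (1, 2, 'rabbit'), (2, 2, 'dog'), (3, 3, 'chicken');
""")

def search_ids(conn, term):
    # UNION (not UNION ALL) already de-duplicates the matched itemIDs.
    like = "%" + term + "%"
    sql = ("SELECT itemID FROM items WHERE name LIKE ? "
           "UNION "
           "SELECT itemID FROM keywords WHERE keyword LIKE ?")
    return [r[0] for r in conn.execute(sql, (like, like))]

print(search_ids(conn, "chicken"))   # [3]
print(search_ids(conn, "bob"))       # [2]
```

Each branch searches its own table directly, so the per-search CONCAT and GROUP_CONCAT work from the view disappears entirely.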
Currently I'm running these two queries:
SELECT COUNT(*) FROM `mytable`
SELECT * FROM `mytable` WHERE `id`=123
I'm wondering what format will be the most efficient. Does the order the queries are executed make a difference? Is there a single query that will do what I want?
Both queries are fairly unrelated. The COUNT doesn't use any indexes, while the SELECT likely uses the primary key for a fast look-up. The only thing the queries have in common is the table.
Since these are so simple, the query optimizer and results cache shouldn't have a problem performing very well on these queries.
Are they causing you performance problems? If not, don't bother optimizing them.
Does the order the queries are executed make a difference?
No, they fetch different things. The count reads a field that contains the number of rows in the table; the select by id uses the index. Both are fast and simple.
Is there a single query that will do what I want?
Yes, but it will make your code less clear and less maintainable (due to mixing concepts), and in the best case it will not improve performance (it will probably make it worse).
If you really, really want to group them somehow, create a stored procedure, but unless you use this pair of queries a lot or in several places in the code, it may be overkill.
First off: Ben S. makes a good point. This is not worth optimizing.
But if one wants to put those two statements into one SQL statement, I think this is one way to do it:
select *,count(*) from mytable
union all
select *,-1 from mytable where id = 123
This gives one row for the count(*) (where you ignore all but the last column) and as many rows as match id = 123 (where you ignore the last column, as it is always -1).
Like this:
| Column1    | Column2    | Column3    | ColumnN    | Count(*) Column |
|------------|------------|------------|------------|-----------------|
| ignore     | ignore     | ignore     | ignore     | 4711            |
| valid data | valid data | valid data | valid data | -1 (ignore)     |
Regards
Sigersted
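That union trick can be checked mechanically (SQLite in Python; the mytable columns here are made up, and the data columns of the count row carry an arbitrary row's values, which you ignore as described):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, payload TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)",
                 [(123, "wanted"), (456, "other"), (789, "other")])

# First branch: one row whose last column is the total count.
# Second branch: the matching rows, with -1 as a marker in the last column.
rows = conn.execute("""
  SELECT *, COUNT(*) FROM mytable
  UNION ALL
  SELECT *, -1 FROM mytable WHERE id = 123
""").fetchall()
print(rows)
```

Both branches must produce the same number of columns, which is why the count row drags along columns you have to ignore.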
What table engine are you using?
select count(*) is faster on MyISAM than on InnoDB. MyISAM stores the number of rows for each table, so when you do count(*), that stored value is simply returned. InnoDB doesn't do this because it supports transactions.
More info:
http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/