Is there any alternative for mysql concat with better performance? - mysql

I am trying to join two tables, but the values in the join columns are not identical, so I have to use concat. The problem is that the query takes a very long time to run. Here is an example:
I have two tables:
Table: MasterEmployee
Fields: varchar(20) id, varchar(20) name, Int age, varchar(20) status
Table: Employee
Fields: varchar(20) id, varchar(20) designation, varchar(20) name, varchar(20) status
I have constant prefix: 08080
Postfix of constant length 1 char but value is random.
id in Employee = 08080 + {id in MasterEmployee} +{1 char random value}
Sample data:
MasterEmployee:
999, John, 24, approved
888, Leo, 26, pending
Employee:
080809991, developer, John, approved
080808885, Tester, Leo, approved
Here is the query that I am using:
select * from Employee e inner join MasterEmployee me
on e.id like concat('%',me.id,'%')
where e.status='approved' and me.status='approved';
Is there a better way to do this? I need to run the same kind of query over a very large dataset.

It would certainly be better to use the static prefix 08080 so that the DBMS can use an index. It won't use an index with LIKE and a leading wildcard:
SELECT * FROM Employee e INNER JOIN MasterEmployee me
ON e.id LIKE CONCAT('08080', me.id, '_')
AND e.status = me.status
WHERE e.status = 'approved';
Note that I added status to the JOIN condition since you want Employee.status to match MasterEmployee.status.
Also, since you only have one postfix character you can use the single-character wildcard _ instead of %.
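The join condition above can be checked against the sample data from the question. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, so '08080' || me.id || '_' replaces CONCAT('08080', me.id, '_'), and it says nothing about how MySQL's optimizer will actually treat the index.

```python
import sqlite3

# SQLite stands in for MySQL; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MasterEmployee (id TEXT, name TEXT, age INTEGER, status TEXT);
CREATE TABLE Employee (id TEXT, designation TEXT, name TEXT, status TEXT);
INSERT INTO MasterEmployee VALUES ('999', 'John', 24, 'approved'),
                                  ('888', 'Leo', 26, 'pending');
INSERT INTO Employee VALUES ('080809991', 'developer', 'John', 'approved'),
                            ('080808885', 'Tester', 'Leo', 'approved');
""")

# '08080' || me.id || '_' is the SQLite spelling of CONCAT('08080', me.id, '_').
# The trailing '_' matches exactly one character (the random postfix).
prefix_rows = conn.execute("""
SELECT e.id, me.id, me.name
FROM Employee e
INNER JOIN MasterEmployee me
        ON e.id LIKE '08080' || me.id || '_'
       AND e.status = me.status
WHERE e.status = 'approved'
""").fetchall()

print(prefix_rows)
```

Only John's row comes back: Leo's Employee row matches on id, but his MasterEmployee status is pending.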

It's not concat that's the issue; scalar operations are extremely cheap. The problem is the way you are using LIKE. Anything of the form field LIKE '%...' cannot use the index and results in a scan, because an index is ordered by the leading characters of the value and a leading wildcard leaves nothing to seek on.
If you have to have this code, then that's that: there's nothing you can do, and you have to be resigned to the large performance hit you'll take. If at all possible though, I'd rethink either your database schema or the way you address it.
Edit: Rereading it, what you want is to concatenate the prefix so your query takes the form field like '08080...'. This will make use of any indices you might have.
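One way to "rethink the way you address it" is to avoid LIKE altogether: since the prefix is always 5 characters and the postfix always 1, the middle part can be cut out and compared with plain equality. This is not from the answers above, just a hedged sketch under that fixed-length assumption. It runs on SQLite via Python's sqlite3; in MySQL the equivalent functions would be SUBSTRING and CHAR_LENGTH, and whether an index on MasterEmployee.id is used still has to be confirmed with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MasterEmployee (id TEXT PRIMARY KEY, name TEXT, status TEXT);
CREATE TABLE Employee (id TEXT PRIMARY KEY, name TEXT, status TEXT);
INSERT INTO MasterEmployee VALUES ('999', 'John', 'approved'), ('888', 'Leo', 'pending');
INSERT INTO Employee VALUES ('080809991', 'John', 'approved'),
                            ('080808885', 'Leo', 'approved');
""")

# Strip the 5-character prefix and the 1-character postfix, then join on equality.
# substr(x, 6, length(x) - 6) keeps everything between the two.
equality_rows = conn.execute("""
SELECT e.id, me.id
FROM Employee e
INNER JOIN MasterEmployee me
        ON me.id = substr(e.id, 6, length(e.id) - 6)
WHERE e.status = 'approved' AND me.status = 'approved'
""").fetchall()

print(equality_rows)
```

Because the comparison is an exact match on me.id, it also avoids the false positives that a '%...%' pattern can produce when one id happens to be a substring of another.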

Related

Slow performing LEFT JOIN, CONCAT_WS search (MySQL, VBscript) [duplicate]


MySQL - Using a CASE statement vs. lookup table?

I'm debating between using a CASE statement or a lookup table to replace text from table2.columnB when table1.columnB = table2.columnA. I'd rather use a lookup table because it's easier to manage.
Our database pulls all the customer order information from our online store. It receives all the state names in full and I need to replace all instances of U.S. states with their 2-character abbreviation. (e.g. Texas -> TX)
How would I use a lookup table with this query for State?
Here's my query: http://sqlfiddle.com/#!9/e44aa3/12/0
Thank you in advance!
To use the lookup table in your query, add this join:
LEFT JOIN `state_abbreviations` AS `sa` ON `sa`.`shipping_zone` = `o`.`shipping_zone`
and replace this line:
`o`.`shipping_zone` AS `State`
with:
COALESCE(`sa`.`zone_abbr`, `o`.`shipping_zone`) AS `State`
so you get the abbreviation returned.
See the demo.
Results:
Order ID | Name | State | Qty | Option | Size | Product | Ref
12345 | Mason Sklut | NC | 1 | R | L | Tee | R / Tee L
12346 | John Doe | OH | 2 | Bl | S | Hood | 2x Bl / Hood S
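The LEFT JOIN plus COALESCE pattern from this answer can be tried in isolation. In the sketch below, state_abbreviations, shipping_zone and zone_abbr come from the answer, while the orders table, its order_id column and the 'Ontario' row are assumptions made up for illustration. SQLite (through Python's sqlite3) stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- 'orders' and 'order_id' are assumed names; only shipping_zone comes from the answer.
CREATE TABLE orders (order_id INTEGER, shipping_zone TEXT);
CREATE TABLE state_abbreviations (shipping_zone TEXT PRIMARY KEY, zone_abbr TEXT);
INSERT INTO orders VALUES (12345, 'North Carolina'), (12346, 'Ohio'), (12347, 'Ontario');
INSERT INTO state_abbreviations VALUES ('North Carolina', 'NC'), ('Ohio', 'OH');
""")

# LEFT JOIN keeps orders with no matching abbreviation;
# COALESCE then falls back to the full name for those rows.
state_rows = conn.execute("""
SELECT o.order_id, COALESCE(sa.zone_abbr, o.shipping_zone) AS State
FROM orders AS o
LEFT JOIN state_abbreviations AS sa ON sa.shipping_zone = o.shipping_zone
ORDER BY o.order_id
""").fetchall()

print(state_rows)
```

The third order shows why COALESCE is there: a zone that is missing from the lookup table is returned unchanged instead of becoming NULL.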
Using a CASE expression is certainly an option. However, it does not scale well: there are 50+ states in the US, so you would need to write 50 WHEN branches, like:
case state
when 'North Carolina' then 'NC'
when 'Ohio' then 'OH'
when ...
end
Creating a mapping table seems like a better idea. It is also a good way to enforce referential integrity (i.e., to ensure that the names being used really are state names).
That would look like:
create table states (
code varchar(2) not null primary key,
name varchar(100) not null
);
In your original table, you want to have a column that stores the state code, with a foreign key constraint that references states(code) (you may also store the state name, but this looks like a less efficient option in terms of storage).
You can do the mapping in your queries with a join:
select t.*, s.name state_name
from mytable t
inner join states s on s.code = t.state_code
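To see the referential-integrity point in action, here is a small sketch of the states table with a foreign key from mytable.state_code. It runs on SQLite through Python's sqlite3, which needs PRAGMA foreign_keys = ON; the id column on mytable and the sample rows are assumptions added for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite-specific switch; it has no counterpart in the answer's MySQL DDL.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE states (
    code VARCHAR(2) NOT NULL PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);
-- 'id' is an assumed column; state_code and the reference follow the answer.
CREATE TABLE mytable (
    id INTEGER PRIMARY KEY,
    state_code VARCHAR(2) NOT NULL REFERENCES states(code)
);
INSERT INTO states VALUES ('NC', 'North Carolina'), ('OH', 'Ohio');
INSERT INTO mytable VALUES (1, 'NC'), (2, 'OH');
""")

# A code that is not in states should be refused by the foreign key.
try:
    conn.execute("INSERT INTO mytable VALUES (3, 'XX')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

joined = conn.execute("""
SELECT t.id, s.name AS state_name
FROM mytable t
INNER JOIN states s ON s.code = t.state_code
ORDER BY t.id
""").fetchall()

print(rejected, joined)
```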

Just what exactly is the performance loss in adding a table that gets joined on every request?

I'm working on an application that previously had unique handles for users only--but now we want to have handles for events, groups, places... etc. Unique string identifiers for many different first class objects. I understand the thing to do is adopt something like the Party Model, where every entity has its own unique partyId and handle. That said, that means on pretty much every data-fetching query, we're adding a join to get that handle! Certainly for every user.
So just what is the performance loss here? For a table with just three or four columns, is a join like this negligible? Or is there a better way of going about this?
Example Table Structure:
Party
int id
int party_type_id
varchar(256) handle
Events
int id
int party_id
varchar(256) name
varchar(256) time
int place_id
Users
int id
int party_id
varchar(256) first_name
varchar(256) last_name
Places
int id
int party_id
varchar(256) name
-- EDIT --
I'm getting a bad rating on this question, and I'm not sure I understand why. In PLAIN TERMS, I'm asking,
If I have three first class objects that must all share a UNIQUE HANDLE property, unique across all three objects, does adding an additional table that must be joined with on almost any request incur a significant performance hit? Is there a better way of accomplishing this in a relational database like MySQL?
-- EDIT: Proposed Queries --
Getting one user
SELECT * FROM Users u LEFT JOIN Party p ON u.party_id = p.id WHERE p.handle='foo'
Searching users
SELECT * FROM Users u LEFT JOIN Party p ON u.party_id = p.id WHERE p.handle LIKE '%foo%'
Searching all parties... I guess I'm not sure how to do this in one query. Would you have to select all Parties matching the handle and then get the individual objects in separate queries? E.g.
db.makeQuery("SELECT * FROM Party p WHERE p.handle LIKE '%foo%'")
.then(function (results) {
// iterate through results and assemble lists of matching parties by type, then get those objects in separate queries
})
This last example is what I'm most concerned about I think. Is this a reasonable design?
The queries you show should be blazingly fast on any modern implementation, and should scale to tens or hundreds of thousands, or even millions, of records without too much trouble.
Relational Database Management Systems (of which MySQL is one) are designed explicitly for this scenario.
In fact, the slow part of your second query:
SELECT * FROM Users u LEFT JOIN Party p ON u.party_id = p.id WHERE p.handle LIKE '%foo%'
is going to be WHERE p.handle LIKE '%foo%' as this will not be able to use an index. Once you have a large table, this part of the query will be many times slower than the join.
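As a rough illustration of the design being discussed, the sketch below builds a cut-down version of the question's Party, Users and Places tables and runs both an exact handle lookup and a search across all party types in a single query. The UNIQUE constraint on handle, the sample rows and the COALESCE label are assumptions added here, and SQLite (via Python's sqlite3) stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Party (id INTEGER PRIMARY KEY, party_type_id INTEGER, handle TEXT UNIQUE);
CREATE TABLE Users (id INTEGER PRIMARY KEY, party_id INTEGER REFERENCES Party(id),
                    first_name TEXT, last_name TEXT);
CREATE TABLE Places (id INTEGER PRIMARY KEY, party_id INTEGER REFERENCES Party(id),
                     name TEXT);
INSERT INTO Party VALUES (1, 1, 'foo'), (2, 2, 'foobar'), (3, 1, 'baz');
INSERT INTO Users VALUES (10, 1, 'Ann', 'Lee'), (11, 3, 'Bob', 'Ray');
INSERT INTO Places VALUES (20, 2, 'Cafe');
""")

# Exact lookup: the UNIQUE constraint on handle gives the engine an index to seek on.
one_user = conn.execute("""
SELECT u.first_name, p.handle
FROM Users u
JOIN Party p ON u.party_id = p.id
WHERE p.handle = ?
""", ("foo",)).fetchall()

# Search across party types in one query: LEFT JOIN each type table and
# let COALESCE pick the column from whichever table matched.
all_parties = conn.execute("""
SELECT p.handle, COALESCE(u.first_name, pl.name) AS label
FROM Party p
LEFT JOIN Users u ON u.party_id = p.id
LEFT JOIN Places pl ON pl.party_id = p.id
WHERE p.handle LIKE ?
ORDER BY p.handle
""", ("%foo%",)).fetchall()

print(one_user, all_parties)
```

The second query still pays for the LIKE '%foo%' scan on Party; the joins only add a keyed lookup per matching row.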

Two-way partial search using SQL

I have a PHP snippet that looks up a MySQL table and returns the top 6 closest matches, both exact as well as partial, against a given search string. The SQL statement is:
SELECT phone, name FROM contacts_table WHERE phone LIKE :ph LIMIT 6;
Using the above example, if :ph is assigned, say, %981% it would return every entry that contains 981, e.g. 9819133333, +917981688888, 9999819999, etc. However, is it also possible to return all entries whose values are contained within the search string using the same query? Thus, if the search string is 12345, it would return all of the following:
123456789 (contains the search string)
88881234500 (contains the search string)
99912345 (contains the search string)
123 (is contained within the search string)
45 (is contained within the search string)
2345 (is contained within the search string)
You can do a lookup where the number is LIKE the column:
SELECT * FROM `test`
WHERE '123456' LIKE CONCAT('%',`stuff`,'%')
OR `stuff` LIKE '%123456%';
An index will never be used, though, because an index cannot be used with a preceding %.
An alternate way to do it would be to create a temporary table in memory and insert tokenized strings and use a JOIN on the temporary table. This will likely be much slower than my solution above, but it is a potential option.
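The two-way LIKE query can be checked against the sample numbers from the question. This sketch runs on SQLite through Python's sqlite3, so '%' || phone || '%' replaces CONCAT('%', phone, '%'); the extra row 777 is an assumed non-matching value added to show that something is filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts_table (phone TEXT, name TEXT)")
phones = ["123456789", "88881234500", "99912345", "123", "45", "2345", "777"]
conn.executemany("INSERT INTO contacts_table VALUES (?, ?)",
                 [(p, "n/a") for p in phones])

search = "12345"
two_way = conn.execute("""
SELECT phone FROM contacts_table
WHERE phone LIKE '%' || :s || '%'      -- phone contains the search string
   OR :s LIKE '%' || phone || '%'      -- search string contains the phone
""", {"s": search}).fetchall()

matched = sorted(row[0] for row in two_way)
print(matched)
```

Both conditions put a wildcard in front of the pattern, so every row has to be examined, as noted above.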
You can try the option of dynamic SQL:
SELECT
phone
FROM
contacts_table
WHERE
phone LIKE :ph or
phone = :val1 or
phone = :val2 or
phone = :val3 or
phone = :val4 or
phone = :val5 -- and so on
LIMIT 6;
Here :ph is your regular input (e.g. %981%) and each :valX is one token of the input.
It is a good idea to tokenize smartly (say, if the input has length 5, use a token size of 3 or 4). Try to limit the number of tokens to get better performance.
DEMO
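If you are using PHP, do something like:
$where_args = [];
$params = [];
// getPhoneNumberTokens() is assumed to return a plain list of token strings
foreach (getPhoneNumberTokens($input) as $i => $token) {
    if ($token != "") {
        // use a named placeholder and bind the token instead of interpolating user input
        $where_args[] = "phone = :val$i";
        $params[":val$i"] = $token;
    }
}
$where_clause = implode(' OR ', $where_args);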
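The answer does not show what the tokenizer looks like, so here is one possible reading of it as a sketch: take every contiguous substring of the input with at least a minimum length and bind each one as a parameter. The function name phone_tokens, the min_len default and the sample rows are all assumptions; SQLite via Python's sqlite3 stands in for MySQL, and IN (...) is used as a shorter spelling of the chained "phone = ... OR phone = ..." conditions.

```python
import sqlite3

def phone_tokens(text, min_len=2):
    """Return every distinct contiguous substring of text with at least min_len characters."""
    tokens = {text[i:j]
              for i in range(len(text))
              for j in range(i + min_len, len(text) + 1)}
    return sorted(tokens, key=lambda t: (len(t), t))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts_table (phone TEXT, name TEXT)")
conn.executemany("INSERT INTO contacts_table VALUES (?, '')",
                 [("123456789",), ("123",), ("45",), ("2345",), ("777",)])

tokens = phone_tokens("12345")
placeholders = ", ".join("?" for _ in tokens)   # one bound parameter per token
token_rows = conn.execute(
    f"SELECT phone FROM contacts_table "
    f"WHERE phone LIKE ? OR phone IN ({placeholders}) LIMIT 6",
    ["%12345%", *tokens],
).fetchall()

found_phones = sorted(row[0] for row in token_rows)
print(tokens, found_phones)
```

The number of tokens grows roughly with the square of the input length, which is why the advice above about raising the minimum token size matters.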
You could use three tables. I don't actually know how performant it will be, though. I didn't actually insert anything to test it out.
contact would contain every contact, and token would contain every valid token: when you insert into contact, you also tokenize the phone number and insert each token into the token table. Tokens are unique. A relation table then holds the many-to-many relationship between contact and token.
Then you would get all contacts that have tokens matching the input phone number.
Table definitions:
CREATE TABLE contact (id int NOT NULL AUTO_INCREMENT, phone varchar(16), PRIMARY KEY (id), UNIQUE(phone));
CREATE TABLE token (id int NOT NULL AUTO_INCREMENT, token varchar(16), PRIMARY KEY (id), UNIQUE(token));
CREATE TABLE relation (token_id int NOT NULL, contact_id int NOT NULL);
The query:
There might be a better way to write this query (maybe by using a subquery rather than so many joins?), but this is what I came up with.
SELECT DISTINCT contact_list.phone FROM contact AS contact_input
JOIN relation AS relation_input
ON relation_input.contact_id = contact_input.id
JOIN token AS all_tokens
ON all_tokens.id = relation_input.token_id
JOIN relation AS relation_query
ON relation_query.token_id = all_tokens.id
JOIN contact AS contact_list
ON contact_list.id = relation_query.contact_id
WHERE contact_input.phone LIKE '123456789'
Query Plan:
However, this is with no data actually in the database, so the execution plan could change if data were present. It looks promising to me, because of the eq_ref and key usage.
I also made an SQL Fiddle demonstrating this.
Notes:
I didn't add any indexes. You could probably add some indexes and make it more performant... but indexes might not actually help in this instance, since you aren't querying over any duplicated rows.
It might be possible to add optimizer hints or use LEFT/RIGHT joins to improve query plan execution. LEFT/RIGHT joins in the wrong place could break the query, though.
As it currently stands, you'd have to insert the queried number into the contact table, tokenize it, and insert into relation and token prior to querying. Instead, you could use a temporary table for the queried tokens, then do JOIN temp_tokens ON temp_tokens.token = all_tokens.token... Actually, that's probably what you should do. But I'm not gonna re-write this answer right now.
Using integer columns for phone and token would perform better, if that is a valid option for you.
An alternate way to do it, which would be better than inserting all the tokens into the table just for a query would be to use an IN (), like:
SELECT DISTINCT contact.phone FROM token
JOIN relation
ON relation.token_id = token.id
JOIN contact
ON relation.contact_id = contact.id
WHERE token.token IN ('123','234','345','and so on')
And here is another, improved fiddle: http://sqlfiddle.com/#!9/48d0e/2
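For readers who cannot open the fiddle, the three-table idea with the IN () query can be sketched end to end as follows. The table layout follows the answer; the tokens_of helper, the composite primary key on relation, the sample phone numbers and the minimum token length of 3 are assumptions. It runs on SQLite via Python's sqlite3, where INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE.

```python
import sqlite3

def tokens_of(phone, min_len=3):
    # Hypothetical tokenizer: all contiguous substrings of at least min_len characters.
    return {phone[i:j]
            for i in range(len(phone))
            for j in range(i + min_len, len(phone) + 1)}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contact (id INTEGER PRIMARY KEY, phone VARCHAR(16) UNIQUE);
CREATE TABLE token (id INTEGER PRIMARY KEY, token VARCHAR(16) UNIQUE);
-- The composite primary key is an addition; the answer defines no key here.
CREATE TABLE relation (token_id INT NOT NULL, contact_id INT NOT NULL,
                       PRIMARY KEY (token_id, contact_id));
""")

def add_contact(phone):
    """Insert a contact and link it to every token of its phone number."""
    contact_id = conn.execute(
        "INSERT INTO contact (phone) VALUES (?)", (phone,)).lastrowid
    for tok in tokens_of(phone):
        conn.execute("INSERT OR IGNORE INTO token (token) VALUES (?)", (tok,))
        token_id = conn.execute(
            "SELECT id FROM token WHERE token = ?", (tok,)).fetchone()[0]
        conn.execute("INSERT INTO relation VALUES (?, ?)", (token_id, contact_id))

for phone in ("9819133333", "5551234", "1234"):
    add_contact(phone)

query_tokens = sorted(tokens_of("12345"))        # tokens of the search input
marks = ", ".join("?" for _ in query_tokens)     # one placeholder per token
found = conn.execute(f"""
SELECT DISTINCT contact.phone FROM token
JOIN relation ON relation.token_id = token.id
JOIN contact ON relation.contact_id = contact.id
WHERE token.token IN ({marks})
ORDER BY contact.phone
""", query_tokens).fetchall()

print(found)
```

The lookup on token.token is an equality match against a unique column, which is what lets this approach avoid the leading-wildcard scan at query time, at the cost of extra rows written on every insert.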

Multiple "where" -s from one table into one view

I have a table called "users" with 4 fields: ID, UNAME, NAME, SHOW_NAME.
I wish to put this data into one view so that if SHOW_NAME is not set, "UNAME" should be selected as "NAME", otherwise "NAME".
My current query:
SELECT id AS id, uname AS name
FROM users
WHERE show_name != 1
UNION
SELECT id AS id, name AS name
FROM users
WHERE show_name = 1
This generally works, but it does seem to lose the primary key (NaviCat telling me "users_view does not have a primary key...") - which I think is bad.
Is there a better way?
That should be fine. I'm not sure why it's complaining about the loss of a primary key.
I will offer one piece of advice. When you know that there can be no duplicates in your union (such as the two parts being when x = 1 and when x != 1), you should use union all.
The union clause will attempt to remove duplicates which, in this case, is a waste of time.
If you want more targeted assistance, it's probably best if you post the details of the view and the underlying table. Views themselves don't tend to have primary keys or indexes, relying instead on the underlying tables.
So this may well be a problem with your "NaviCat" product (whatever that is) expecting to see a primary key (in other words, it's not built very well for views).
If I am understanding your question correctly, you should be able to use a CASE expression like the one below for your logic:
SELECT id AS id,
CASE WHEN show_name = 1 THEN name ELSE uname END AS name
FROM users
This can likely be better written as the following:
SELECT id AS id, IF(show_name = 1, name, uname) AS name
FROM users
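A small sketch comparing the CASE-based view with the UNION ALL version from the question. One difference worth seeing: if "not set" can mean NULL, the condition show_name != 1 does not match NULL rows, so the UNION query silently drops them, while CASE ... ELSE keeps them. The sample rows (including the NULL one) are assumptions, and SQLite via Python's sqlite3 stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, uname TEXT, name TEXT, show_name INTEGER);
INSERT INTO users VALUES (1, 'jd', 'John Doe', 1),
                         (2, 'al', 'Ann Lee', 0),
                         (3, 'zz', 'Zed', NULL);
CREATE VIEW users_view AS
SELECT id, CASE WHEN show_name = 1 THEN name ELSE uname END AS name
FROM users;
""")

# CASE version: one pass over the table, every row is returned.
case_rows = conn.execute("SELECT id, name FROM users_view ORDER BY id").fetchall()

# UNION ALL version from the question: the row with show_name NULL
# satisfies neither WHERE clause and disappears.
union_rows = conn.execute("""
SELECT id, uname AS name FROM users WHERE show_name != 1
UNION ALL
SELECT id, name FROM users WHERE show_name = 1
ORDER BY id
""").fetchall()

print(case_rows, union_rows)
```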