Slow performing LEFT JOIN, CONCAT_WS search (MySQL, VBscript) [duplicate]

I am trying to join two tables, but the values in the columns being joined are not identical, which is why I need to use concat. The problem is that the query takes a very long time to run. So here is the example:
I have two tables:
Table: MasterEmployee
Fields: varchar(20) id, varchar(20) name, int age, varchar(20) status
Table: Employee
Fields: varchar(20) id, varchar(20) designation, varchar(20) name, varchar(20) status
I have a constant prefix: 08080, and a postfix of constant length (1 char) whose value is random:
id in Employee = 08080 + {id in MasterEmployee} + {1 char random value}
Sample data:
MasterEmployee:
999, John, 24, approved
888, Leo, 26, pending
Employee:
080809991, developer, John, approved
080808885, Tester, Leo, approved
Here is the query that I am using:
select * from Employee e inner join MasterEmployee me
on e.id like concat('%',me.id,'%')
where e.status='approved' and me.status='approved';
Is there any better way to do the same? I need to run the same kind of query over a very large dataset.

It would certainly be better to use the static prefix 08080 so that the DBMS can use an index. It won't use an index with LIKE and a leading wildcard:
SELECT * FROM Employee e INNER JOIN MasterEmployee me
ON e.id LIKE CONCAT('08080', me.id, '_')
AND e.status = me.status
WHERE e.status = 'approved';
Note that I added status to the JOIN condition since you want Employee.status to match MasterEmployee.status.
Also, since you only have one postfix character you can use the single-character wildcard _ instead of %.
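If supporting indexes are missing, something like the following would let the optimizer exploit that constant prefix (a sketch; the index names are ours, not from the question):
CREATE INDEX idx_employee_id ON Employee (id);
-- With the pattern anchored as '08080<id>_', each probe becomes a range
-- scan on idx_employee_id instead of a full scan of Employee.
CREATE INDEX idx_master_status ON MasterEmployee (status);
-- Supports the me.status = 'approved' filter on the driving table.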

It's not concat that's the issue; scalar operations are extremely cheap. The problem is using LIKE the way you are. Anything of the form field LIKE '%...' automatically skips the index, resulting in a scan operation -- for what I think are obvious reasons.
If you have to have this code, then that's that: there's nothing you can do, and you have to be resigned to the large performance hit you'll take. If at all possible, though, I'd rethink either your database schema or the way you address it.
Edit: Rereading it, what you want is to concatenate the prefix so your query takes the form field like '08080...'. This will make use of any indices you might have.
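To verify that the rewritten predicate actually uses an index, you can run EXPLAIN (a sketch; the actual plan depends on your data and indexes):
EXPLAIN SELECT * FROM Employee e INNER JOIN MasterEmployee me
ON e.id LIKE CONCAT('08080', me.id, '_')
WHERE e.status = 'approved' AND me.status = 'approved';
-- Look for type=range with a key on e.id, rather than type=ALL (full scan).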

Related

MySQL - Using a CASE statement vs. lookup table?

I'm debating between using a CASE statement or a lookup table to replace text from table2.columnB when table1.columnB = table2.columnA. I'd rather use a lookup table because it's easier to manage.
Our database pulls all the customer order information from our online store. It receives all the state names in full and I need to replace all instances of U.S. states with their 2-character abbreviation. (e.g. Texas -> TX)
How would I use a lookup table with this query for State?
Here's my query: http://sqlfiddle.com/#!9/e44aa3/12/0
Thank you in advance!
For your question of how to add the lookup table to your code, you must add this join:
LEFT JOIN `state_abbreviations` AS `sa` ON `sa`.`shipping_zone` = `o`.`shipping_zone`
and change this line:
`o`.`shipping_zone` AS `State`
with:
COALESCE(`sa`.`zone_abbr`, `o`.`shipping_zone`) AS `State`
so you get the abbreviation returned.
See the demo.
Results:
| Order ID | Name        | State | Qty | Option | Size | Product | Ref            |
| 12345    | Mason Sklut | NC    | 1   | R      | L    | Tee     | R / Tee L      |
| 12346    | John Doe    | OH    | 2   | Bl     | S    | Hood    | 2x Bl / Hood S |
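For reference, the lookup table itself can be as simple as this (a sketch; the column names shipping_zone and zone_abbr come from the join above, while the types are assumptions):
CREATE TABLE state_abbreviations (
  shipping_zone varchar(100) NOT NULL PRIMARY KEY, -- full state name, e.g. 'Texas'
  zone_abbr char(2) NOT NULL -- 2-character abbreviation, e.g. 'TX'
);
INSERT INTO state_abbreviations (shipping_zone, zone_abbr)
VALUES ('Texas', 'TX'), ('North Carolina', 'NC'), ('Ohio', 'OH');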
Using a CASE expression is certainly an option. However, it does not scale well: there are 50+ states in the US, so you would need to write 50+ WHEN branches, like:
case state
when 'North Carolina' then 'NC'
when 'Ohio' then 'OH'
when ...
end
Creating a mapping table seems like a better idea. It is also a good way to enforce referential integrity (i.e. ensure that the names being used really are state names).
That would look like:
create table states (
code varchar(2) not null primary key,
name varchar(100) not null
);
In your original table, you want to have a column that stores the state code, with a foreign key constraint that references states(code) (you may also store the state name, but this looks like a less efficient option in terms of storage).
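As a sketch (assuming the orders table is named orders and the new column is state_code; both names are ours):
ALTER TABLE orders
  ADD COLUMN state_code varchar(2),
  ADD CONSTRAINT fk_orders_state
    FOREIGN KEY (state_code) REFERENCES states (code);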
You can do the mapping in your queries with a join:
select t.*, s.name state_name
from mytable t
inner join states s on s.code = t.state_code

Just what exactly is the performance loss in adding a table that gets joined on every request?

I'm working on an application that previously had unique handles for users only--but now we want to have handles for events, groups, places... etc. Unique string identifiers for many different first class objects. I understand the thing to do is adopt something like the Party Model, where every entity has its own unique partyId and handle. That said, that means on pretty much every data-fetching query, we're adding a join to get that handle! Certainly for every user.
So just what is the performance loss here? For a table with just three or four columns, is a join like this negligible? Or is there a better way of going about this?
Example Table Structure:
Party
int id
int party_type_id
varchar(256) handle
Events
int id
int party_id
varchar(256) name
varchar(256) time
int place_id
Users
int id
int party_id
varchar(256) first_name
varchar(256) last_name
Places
int id
int party_id
varchar(256) name
-- EDIT --
I'm getting a bad rating on this question, and I'm not sure I understand why. In PLAIN TERMS, I'm asking,
If I have three first class objects that must all share a UNIQUE HANDLE property, unique across all three objects, does adding an additional table that must be joined with on almost any request incur a significant performance hit? Is there a better way of accomplishing this in a relational database like MySQL?
-- EDIT: Proposed Queries --
Getting one user
SELECT * FROM Users u LEFT JOIN Party p ON u.party_id = p.id WHERE p.handle='foo'
Searching users
SELECT * FROM Users u LEFT JOIN Party p ON u.party_id = p.id WHERE p.handle LIKE '%foo%'
Searching all parties... I guess I'm not sure how to do this in one query. Would you have to select all Parties matching the handle and then get the individual objects in separate queries? E.g.
db.makeQuery("SELECT * FROM Party p WHERE p.handle LIKE '%foo%'")
  .then(function (results) {
    // iterate through results and assemble lists of matching parties
    // by type, then get those objects in separate queries
  })
This last example is what I'm most concerned about I think. Is this a reasonable design?
The queries you show should be blazingly fast on any modern implementation, and should scale to hundreds of thousands or even millions of records without too much trouble.
Relational Database Management Systems (of which MySQL is one) are designed explicitly for this scenario.
In fact, the slow part of your second query:
SELECT * FROM Users u LEFT JOIN Party p ON u.party_id = p.id WHERE p.handle LIKE '%foo%'
is going to be WHERE p.handle LIKE '%foo%' as this will not be able to use an index. Once you have a large table, this part of the query will be many times slower than the join.
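For the equality form of the lookup, indexes keep the extra join essentially free (a sketch; the index names are ours):
CREATE INDEX idx_party_handle ON Party (handle); -- makes WHERE p.handle = 'foo' an index lookup
CREATE INDEX idx_users_party ON Users (party_id); -- makes the join back to Users cheap
Equality on an indexed handle plus a primary-key join stays fast; it is only the leading-wildcard search that degrades to a scan.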

SELECT grouping by value in field

Given the following (greatly simplified) example table:
CREATE TABLE `permissions` (
`name` varchar(64) NOT NULL DEFAULT '',
`access` enum('read_only','read_write') NOT NULL DEFAULT 'read_only'
);
And the following example contents:
| name | access |
=====================
| foo | read_only |
| foo | read_write |
| bar | read_only |
What I want to do is run a SELECT query that fetches one row for each unique value in name, favouring those with an access value of read_write. Is there a way that this can be done? i.e. such that the results I would get are:
foo | read_write |
bar | read_only |
I may need to add new options to the access column in future, but they will always be in order of importance (lowest to highest) so, if possible, a solution that can cope with this would be especially useful.
Also, to clarify, my actual table includes other fields than these, which is why I'm not using a unique key on the name column; there will be multiple rows by name by design to suit various criteria.
The following will work on your data:
select name, max(access)
from permissions
group by name;
However, this orders by the string values, not the indexes. Here is another method:
select name,
       substring_index(group_concat(access order by access desc), ',', 1) as access
from permissions
group by name;
It is rather funky that order by goes by the index but min() and max() use the character value. Some might even call that a bug.
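If you would rather have the aggregate follow the enum index as well, one workaround (a sketch, relying on the documented ENUM-to-number conversion access+0, which yields the 1-based enum index) is:
SELECT name,
       ELT(MAX(access+0), 'read_only', 'read_write') AS access
FROM permissions
GROUP BY name;
-- MAX(access+0) picks the highest enum index, i.e. the most important value
-- since the enum is declared lowest-to-highest; the ELT list must repeat the
-- enum values in definition order to map the index back to a string.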
You can create another table with the priority of each access value (so you can add new options), and then group by and find the MIN() value from the priority table:
E.g. create a table called Priority with the values
| PriorityID| access |
========================
| 1 | read_write |
| 2 | read_only |
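In DDL form, that lookup table might look like this (a sketch; the types are assumptions):
CREATE TABLE Priority (
  PriorityID int NOT NULL PRIMARY KEY,
  access varchar(20) NOT NULL UNIQUE
);
INSERT INTO Priority (PriorityID, access)
VALUES (1, 'read_write'), (2, 'read_only');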
And then,
SELECT A.Name, B.Access
FROM (
    SELECT A.name, MIN(B.PriorityID) AS Most_Valued_Option -- This will be 1 if there is a read_write for that name
    FROM permissions A
    INNER JOIN Priority B
        ON A.Access = B.Access
    GROUP BY A.Name
) A
INNER JOIN Priority B
    ON A.Most_Valued_Option = B.PriorityID
-- Join that ID with the actual access
-- (and we will select the value of the access in the select statement)
The solution proposed by Gordon is sufficient for the current requirements.
If we anticipate a future requirement for a priority order to be other than alphabetical string order (or by enum index value)...
As a modified version of Gordon's answer, I would be tempted to use the MySQL FIELD function and (its converse) ELT function, something like this:
SELECT p.name
     , ELT(
           MIN(
               FIELD(p.access
                    ,'read_write','read_some','read_only'
                    )
              )
          ,'read_write','read_some','read_only'
          ) AS access
  FROM `permissions` p
 GROUP BY p.name
If the specification is to pull the entire row, and not just the value of the access column, we could use an inline view query to find the preferred access, and a join back to the permissions table to pull the whole row...
SELECT p.*
  FROM ( -- inline view, to get the highest priority value of access
         SELECT r.name
              , MIN(FIELD(r.access,'read_write','read_some','read_only')) AS ax
           FROM `permissions` r
          GROUP BY r.name
       ) q
  JOIN `permissions` p
    ON p.name = q.name
   AND p.access = ELT(q.ax,'read_write','read_some','read_only')
Note that this query returns not just the access with the highest priority, but can also return any columns from that row.
With the FIELD and ELT functions, we can implement any ad-hoc ordering of a list of specific, known values. Not just alphabetic ordering, or ordering by the enum index value.
That logic for "priority" can be contained within the query, and won't rely on an extra column(s) in the permissions table, or the contents of any other table(s).
To get the behavior we are looking for (just specifying a priority for access), the "list of values" used in the FIELD function needs to match the "list of values" in the ELT function, in the same order, and the lists should include all possible values of access.
Reference:
http://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_elt
http://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_field
ADVANCED USAGE
Not that you have a requirement to do this, but considering possible future requirements... we note that...
A different order of the "list of values" will result in a different ordering of priority of access. So a variety of queries could each implement their own different rules for the "priority". Which access value to look for first, second and so on, by reordering the complete "list of values".
Beyond just reordering, it is also possible to omit a possible value from the "list of values" in the FIELD and ELT functions. Consider for example, omitting the 'read_only' value from the list on this line:
, MIN(FIELD(r.access,'read_write','read_some')) AS ax
and from this line:
AND p.access = ELT(q.ax,'read_write','read_some')
That will effectively limit the name rows returned: only names that have an access value of 'read_write' or 'read_some'. Another way to look at it: a name that has only 'read_only' for access will not be returned by the query.
Other modifications to the "list of values", where the lists don't "match" are also possible, to implement even more powerful rules. For example, we could exclude a name that has a row with 'read_only'.
For example, in the ELT function, in place of the 'read_only' value, we use a value that we know does not (and cannot) exist on any rows. To illustrate,
we can include 'read_only' as the "highest priority" on this line...
, MIN(FIELD(r.access,'read_only','read_write','read_some')) AS ax
^^^^^^^^^^^
so if a row with 'read_only' is found, that will take priority. But in the ELT function in the outer query, we can translate that back to a different value...
AND p.access = ELT(q.ax,'eXcluDe','read_write','read_some')
^^^^^^^^^
If we know that 'eXcluDe' doesn't exist in the access column, we have effectively excluded any name which has a 'read_only' row, even if there is a 'read_write' row.
Not that you have a specification or current requirement to do any of that. Something to keep in mind for future queries that do have these kinds of requirements.
You can use a DISTINCT statement (or GROUP BY):
SELECT DISTINCT name, access
FROM permissions;
This works too:
SELECT name, MAX(access)
FROM permissions
GROUP BY name ORDER BY MAX(access) desc

Two-way partial search using SQL

I have a PHP snippet that looks up a MySQL table and returns the top 6 closest matches, both exact as well as partial, against a given search string. The SQL statement is:
SELECT phone, name FROM contacts_table WHERE phone LIKE :ph LIMIT 6;
Using the above example, if :ph is assigned, say, %981% it would return every entry that contains 981, e.g. 9819133333, +917981688888, 9999819999, etc. However, is it also possible to return all entries whose values are contained within the search string using the same query? Thus, if the search string is 12345, it would return all of the following:
123456789 (contains the search string)
88881234500 (contains the search string)
99912345 (contains the search string)
123 (is contained within the search string)
45 (is contained within the search string)
2345 (is contained within the search string)
You can do a lookup where the number is LIKE the column:
SELECT * FROM `test`
WHERE '123456' LIKE CONCAT('%',`stuff`,'%')
OR `stuff` LIKE '%123456%';
An index will never be used, though, because an index cannot be used with a preceding %.
An alternate way to do it would be to create a temporary table in memory and insert tokenized strings and use a JOIN on the temporary table. This will likely be much slower than my solution above, but it is a potential option.
You can try the option of dynamic SQL:
SELECT
phone
FROM
contacts_table
WHERE
phone LIKE :ph or
phone = :val1 or
phone = :val2 or
phone = :val3 or
phone = :val4 or
phone = :val5 (and so on and so forth)
LIMIT 6;
Where :ph will be your regular input (e.g. %981%) and :valX is going to be the tokenized input.
It would be a good idea to do the tokenizing smartly (say, if the input is of length 5, then go for a token size of 3 or 4). Try to limit the number of tokens to get better performance.
If you are using PHP, then do something like:
foreach (getPhoneNumberTokens($input) as $phone) {
    if ($phone != "") {
        $where_args[] = "phone = '$phone'";
    }
}
$where_clause = implode(' OR ', $where_args);
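The assembled statement would then look something like this (a sketch with example tokens; in real code, bind the values as parameters rather than interpolating them):
SELECT phone, name
FROM contacts_table
WHERE phone LIKE '%12345%'
   OR phone = '123'
   OR phone = '2345'
   OR phone = '45'
LIMIT 6;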
You could use three tables. I don't actually know how performant it will be, though. I didn't actually insert anything to test it out.
contact would contain every contact. token would contain every valid token. What I mean is that when you insert into contact, you would also tokenize the phone number and insert every single token into the token table. Tokens would be unique. Kay. So, then you would have a relation table which will contain the many<->many relationship between contact and token.
Then, you would get all contacts that have tokens that match the input phone number.
Table definitions:
CREATE TABLE contact (id int NOT NULL AUTO_INCREMENT, phone varchar(16), PRIMARY KEY (id), UNIQUE(phone));
CREATE TABLE token (id int NOT NULL AUTO_INCREMENT, token varchar(16), PRIMARY KEY (id), UNIQUE(token));
CREATE TABLE relation (token_id int NOT NULL, contact_id int NOT NULL);
The query:
There might be a better way to write this query (maybe by using a subquery rather than so many joins?), but this is what I came up with.
SELECT DISTINCT contact_list.phone FROM contact AS contact_input
JOIN relation AS relation_input
ON relation_input.contact_id = contact_input.id
JOIN token AS all_tokens
ON all_tokens.id = relation_input.token_id
JOIN relation AS relation_query
ON relation_query.token_id = all_tokens.id
JOIN contact AS contact_list
ON contact_list.id = relation_query.contact_id
WHERE contact_input.phone LIKE '123456789'
Query Plan: (the EXPLAIN output is not reproduced here.)
However, this is with no data actually in the database, so the execution plan could change if data were present. It looks promising to me, because of the eq_ref and key usage.
I also made an SQL Fiddle demonstrating this.
Notes:
I didn't add any indexes. You could probably add some indexes and make it more performant... but indexes might not actually help in this instance, since you aren't querying over any duplicated rows.
It might be possible to add compiler hints or use LEFT/RIGHT Joins to improve query plan execution. LEFT/RIGHT Joins in the wrong place could break the query, though.
As it currently stands, you'd have to insert the queried number into the contact database, tokenize it, and insert into relation and token prior to querying. Instead, you could use a temporary table for the queried tokens, then do JOIN temp_tokens ON temp_tokens.token = all_tokens.token... Actually, that's probably what you should do. But I'm not gonna re-write this answer right now.
Using integer columns for phone and token would perform better, if that is a valid option for you.
An alternate way to do it, which would be better than inserting all the tokens into the table just for a query, would be to use an IN (), like:
SELECT DISTINCT contact.phone FROM token
JOIN relation
ON relation.token_id = token.id
JOIN contact
ON relation.contact_id = contact.id
WHERE token.token IN ('123','234','345','and so on')
And here is another, improved fiddle: http://sqlfiddle.com/#!9/48d0e/2
