SQL Where Combining values - mysql

I have a table for committees and one for users.
committee:
CREATE TABLE `committee` (
`com_ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
`duties` varchar(64) COLLATE utf8_unicode_ci NOT NULL,
`duties_de` varchar(256) COLLATE utf8_unicode_ci NOT NULL,
`duties_fr` varchar(256) COLLATE utf8_unicode_ci NOT NULL,
`users_IDFS` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`com_ID`),
KEY `users_IDFS` (`users_IDFS`),
CONSTRAINT `committee_ibfk_1` FOREIGN KEY (`users_IDFS`) REFERENCES `users` (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
and users:
CREATE TABLE `users` (
`ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
`rank` tinyint(2) NOT NULL,
`name` varchar(32) NOT NULL,
`first_name` varchar(32) DEFAULT NULL,
`email` varchar(64) NOT NULL,
`passwd` varchar(128) NOT NULL,
`street` varchar(128) DEFAULT NULL,
`location` varchar(128) DEFAULT NULL,
`plz` varchar(8) DEFAULT NULL,
`m_number` varchar(32) DEFAULT NULL,
PRIMARY KEY (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8
Now my problem is that I want to update the user in the committee table where a full name, e.g. "Hans Meier", matches users.first_name and users.name.
Is there a way I can split up "Hans Meier" in an SQL statement, or combine users.first_name and users.name into one string and then compare it?

You can perform an update-join like
update committee c
join `users` u on c.name = u.name or c.name like CONCAT(u.first_name, '%')
set c.col = u.val;
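Applied to the posted schema, a minimal sketch could look like this, assuming the full name arrives as a single string such as 'Hans Meier' and that you want to store the matching users.ID in committee.users_IDFS (the com_ID in the WHERE clause is only an illustration):
update committee c
join users u on concat(u.first_name, ' ', u.name) = 'Hans Meier'
set c.users_IDFS = u.ID
where c.com_ID = 1;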

The way to combine two strings into one is the CONCAT() function.
http://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_concat
What you could do is, as you said, combine user.first_name and user.name and compare it.
select if('ab'=concat('a','b'),1,0);
What you should do is something like
select if('Hans Meier' = concat(user.first_name, ' ', user.name), <true clause>, <false clause>);
Don't forget the space, or else the comparison will fail (this returns 0):
select if('a b'=concat('a','b'),1,0);
The update statement should look something like this:
update committee
set user = <name of user>
where user.name = concat(user.first_name, ' ', user.name);
Hope this helps!

You can use something like this
SUBSTRING('Hans Meier', 1, LOCATE(' ', 'Hans Meier') - 1)
For first name
and SUBSTRING('Hans Meier', LOCATE(' ', 'Hans Meier') + 1)
For last name
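If you prefer not to compute offsets by hand, SUBSTRING_INDEX does the same split; a small sketch, assuming the full name contains exactly one space:
SELECT SUBSTRING_INDEX('Hans Meier', ' ', 1)  AS first_name,  -- 'Hans'
       SUBSTRING_INDEX('Hans Meier', ' ', -1) AS last_name;   -- 'Meier'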

I am going to suggest that you fix your data model instead. If you are going to want to use the information from one table to update another, then it is best to store that information in the same way. You are asking to update using the full name but the only field I see for users to go into is an INT. In that case you would store the ID not the name.
Further, it seems unlikely that a committee would have only one member. Likely you need a CommitteeMember table that stores the committeeID and the UserIDs of the members. It is a SQL antipattern to store multiple IDs in one column or to store the same committee record multiple times.
I see other problems with your model as well. Committees usually have names; I saw no field for that. When you add the name, add a unique index on it as well. If there are duties associated with a committee, then there should be a CommitteeDuty table, because otherwise you are again going to have multiple records for the same committee or will have to concatenate the duties in one field, both of which are serious database design errors. You really need to read up on normalization before revisiting this design.

Related

Speed Up A Large Insert From Select Query With Multiple Joins

I'm trying to denormalize a few MySQL tables I have into a new table that I can use to speed up some complex queries with lots of business logic. The problem that I'm having is that there are 2.3 million records I need to add to the new table and to do that I need to pull data from several tables and do a few conversions too. Here's my query (with names changed)
INSERT INTO database_name.log_set_logs
(offload_date, vehicle, jurisdiction, baselog_path, path,
baselog_index_guid, new_location, log_set_name, index_guid)
(
select STR_TO_DATE(logset_logs.offload_date, '%Y.%m.%d') as offload_date,
logset_logs.vehicle, jurisdiction, baselog_path, path,
baselog_trees.baselog_index_guid, new_location, logset_logs.log_set_name,
logset_logs.index_guid
from
(
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 7), '/', -1) as offload_date,
SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) as vehicle,
SUBSTRING_INDEX(path, '/', 9) as baselog_path, index_guid,
path, log_set_name
FROM database_name.baselog_and_amendment_guid_to_path_mappings
) logset_logs
left join database_name.log_trees baselog_trees
ON baselog_trees.original_location = logset_logs.baselog_path
left join database_name.baselog_offload_location location
ON location.baselog_index_guid = baselog_trees.baselog_index_guid);
The query itself works, because I was able to run it with a filter on log_set_name; however, that filter's condition only covers less than 1% of the total records, since one of the values for log_set_name accounts for 2.2 million records, the majority of them. So from what I can see there is nothing else I can use to break this query up into smaller chunks. The problem is that the query takes too long to run on the remaining 2.2 million records: it ends up timing out after a few hours, the transaction is rolled back, and nothing is added to the new table for those records. Only the 0.1 million records could be processed, and that was because I could add a filter that said where log_set_name != 'value with the 2.2 million records'.
Is there a way to make this query more performant? Am I trying to do too many joins at once, and should I perhaps populate the row's columns in their own individual queries? Or is there some way I can page this type of query so that MySQL executes it in batches? I already got rid of all my indexes on the log_set_logs table because I read that they slow down inserts. I also jacked my RDS instance up to a db.r4.4xlarge write node, and since I am using MySQL Workbench I increased all of its timeout values to their maximums (all nines). All three of these steps helped and were necessary to get the 1% of the records into the new table, but they still weren't enough to get the 2.2 million records in without timing out. I'd appreciate any insights, as I'm not adept at this type of bulk insert from a select.
CREATE TABLE `log_set_logs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`purged` tinyint(1) NOT NULL DEFAUL,
`baselog_path` text,
`baselog_index_guid` varchar(36) DEFAULT NULL,
`new_location` text,
`offload_date` date NOT NULL,
`jurisdiction` varchar(20) DEFAULT NULL,
`vehicle` varchar(20) DEFAULT NULL,
`index_guid` varchar(36) NOT NULL,
`path` text NOT NULL,
`log_set_name` varchar(60) NOT NULL,
`protected_by_retention_condition_1` tinyint(1) NOT NULL DEFAULT '1',
`protected_by_retention_condition_2` tinyint(1) NOT NULL DEFAULT '1',
`protected_by_retention_condition_3` tinyint(1) NOT NULL DEFAULT '1',
`protected_by_retention_condition_4` tinyint(1) NOT NULL DEFAULT '1',
`general_comments_about_this_log` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1736707 DEFAULT CHARSET=latin1
CREATE TABLE `baselog_and_amendment_guid_to_path_mappings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`path` text NOT NULL,
`index_guid` varchar(36) NOT NULL,
`log_set_name` varchar(60) NOT NULL,
PRIMARY KEY (`id`),
KEY `log_set_name_index` (`log_set_name`),
KEY `path_index` (`path`(42))
) ENGINE=InnoDB AUTO_INCREMENT=2387821 DEFAULT CHARSET=latin1
...
CREATE TABLE `baselog_offload_location` (
`baselog_index_guid` varchar(36) NOT NULL,
`jurisdiction` varchar(20) NOT NULL,
KEY `baselog_index` (`baselog_index_guid`),
KEY `jurisdiction` (`jurisdiction`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `log_trees` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`baselog_index_guid` varchar(36) DEFAULT NULL,
`original_location` text NOT NULL, -- This is what I have to join everything on, and since it's TEXT I cannot index it; the largest value is above 255 characters so I cannot change it to a varchar and then index it either.
`new_location` text,
`distcp_returncode` int(11) DEFAULT NULL,
`distcp_job_id` text,
`distcp_stdout` text,
`distcp_stderr` text,
`validation_attempt` int(11) NOT NULL DEFAULT '0',
`validation_result` tinyint(1) NOT NULL DEFAULT '0',
`archived` tinyint(1) NOT NULL DEFAULT '0',
`archived_at` timestamp NULL DEFAULT NULL,
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`dir_exists` tinyint(1) NOT NULL DEFAULT '0',
`random_guid` tinyint(1) NOT NULL DEFAULT '0',
`offload_date` date NOT NULL,
`vehicle` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `baselog_index_guid` (`baselog_index_guid`)
) ENGINE=InnoDB AUTO_INCREMENT=1028617 DEFAULT CHARSET=latin1
baselog_offload_location has no PRIMARY KEY; what's up?
GUIDs/UUIDs can be terribly inefficient. A partial solution is to convert them to BINARY(16) to shrink them. More details here: http://mysql.rjweb.org/doc.php/uuid ; (MySQL 8.0 has similar functions.)
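For example, on MySQL 8.0 (an assumption about your version) the built-in helpers pack a textual UUID into 16 bytes and back:
-- UUID_TO_BIN returns VARBINARY(16), suitable for a BINARY(16) column
SELECT UUID_TO_BIN('3f06af63-a93c-11e4-9797-00505690773f') AS packed,
       BIN_TO_UUID(UUID_TO_BIN('3f06af63-a93c-11e4-9797-00505690773f')) AS unpacked;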
It would probably be more efficient if you have a separate (optionally redundant) column for vehicle rather than needing to do
SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) as vehicle
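A sketch of that, assuming MySQL 5.7+ so a generated column is available (on older versions a plain column populated once by an UPDATE works too; the column name here is made up):
ALTER TABLE baselog_and_amendment_guid_to_path_mappings
  ADD COLUMN vehicle VARCHAR(20)
    AS (SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1)) STORED;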
Why JOIN baselog_offload_location? There seems to be no reference to columns in that table. If there are, be sure to qualify them so we know what is where. Preferably use short aliases.
The lack of an index on baselog_index_guid may be critical to performance.
Please provide EXPLAIN SELECT ... for the SELECT in your INSERT and for the original (slow) query.
SELECT MAX(LENGTH(original_location)) FROM .. -- to see if it really is too big to index. What version of MySQL are you using? The limit increased recently.
For the above item, we can talk about having a 'hash'.
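A sketch of the 'hash' idea, assuming MySQL 5.7+ generated columns (the column and index names are made up): store a fixed-width hash of original_location, index it, and join on the hash plus the full value to guard against collisions.
ALTER TABLE log_trees
  ADD COLUMN original_location_md5 BINARY(16)
    AS (UNHEX(MD5(original_location))) STORED,
  ADD INDEX idx_original_location_md5 (original_location_md5);
-- join on the hash first, then on the full text only to rule out collisions:
--   ON baselog_trees.original_location_md5 = UNHEX(MD5(logset_logs.baselog_path))
--  AND baselog_trees.original_location = logset_logs.baselog_path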
"paging the query". I call it "chunking". See http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks . That talks about deleting, but it can be adapted to INSERT .. SELECT since you want to "chunk" the select. If you go with chunking, Javier's comment becomes moot. Your code would be chunking the selects, hence batching the inserts:
Loop:
INSERT .. SELECT .. -- of up to 1000 rows (see link)
End loop
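A sketch of what one chunk could look like here, with these assumptions: the source table's auto-increment id is reasonably contiguous, jurisdiction comes from baselog_offload_location, and new_location comes from log_trees (as the unqualified columns in the original query suggest). Run it repeatedly, advancing the id window each pass, so each statement stays small and a timeout only loses one chunk:
INSERT INTO database_name.log_set_logs
        (offload_date, vehicle, jurisdiction, baselog_path, path,
         baselog_index_guid, new_location, log_set_name, index_guid)
SELECT STR_TO_DATE(logset_logs.offload_date, '%Y.%m.%d'),
       logset_logs.vehicle, location.jurisdiction, logset_logs.baselog_path,
       logset_logs.path, baselog_trees.baselog_index_guid,
       baselog_trees.new_location, logset_logs.log_set_name, logset_logs.index_guid
FROM (
    SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 7), '/', -1) AS offload_date,
           SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) AS vehicle,
           SUBSTRING_INDEX(path, '/', 9) AS baselog_path,
           index_guid, path, log_set_name
    FROM database_name.baselog_and_amendment_guid_to_path_mappings
    WHERE id >= 1 AND id < 50001        -- the chunk window; advance it each pass
) logset_logs
LEFT JOIN database_name.log_trees baselog_trees
       ON baselog_trees.original_location = logset_logs.baselog_path
LEFT JOIN database_name.baselog_offload_location location
       ON location.baselog_index_guid = baselog_trees.baselog_index_guid;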

How to query records that appear in one table, but not in the other two tables

I have three tables: input, results, errors.
input table:
CREATE TABLE `input` (
  `name` varchar(500) NOT NULL,
  PRIMARY KEY (`name`),
  UNIQUE KEY `domain_UNIQUE` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
The results table:
CREATE TABLE `results` (
  `name` varchar(1000) NOT NULL,
  `no` varchar(500) DEFAULT NULL,
  `description` varchar(500) DEFAULT NULL,
  `version` varchar(500) DEFAULT NULL,
  `ext` longtext,
  PRIMARY KEY (`name`),
  UNIQUE KEY `domain_UNIQUE` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
The errors table:
CREATE TABLE `erros` (
  `error` varchar(500) DEFAULT NULL,
  `name` varchar(1000) NOT NULL,
  `code` longtext,
  PRIMARY KEY (`name`),
  UNIQUE KEY `ip_UNIQUE` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
I want to query the name values that exist in the input table but do not exist in the results or errors tables.
I tried constructing the query using NOT IN, but it runs forever and then MySQL Workbench crashes. Note that the name field in the errors and results tables is always a name that exists in the input table, but with a fixed xxx prefix.
Here is my attempt:
select input.name
from myscheme.input, myscheme.results, myscheme.erros
where concat('xxx',input.name) not in (select results.name from myscheme.results)
and concat('xxx',input.name) not in (select erros.name from myscheme.erros);
Can you please help me query the name values that exist in the input table but not in the results and not in the errors tables?
I would use not exists:
select i.name
from myscheme.input i
where not exists (select 1 from myscheme.results r where r.name = concat('xxx', i.name)) and
not exists (select 1 from myscheme.erros e where e.name = concat('xxx', i.name));
Notes:
I don't know why you want to concatenate 'xxx' to the name, but presumably you have a reason.
The query is only selecting from one table, input, so that should be the only table in the FROM clause.
I strongly recommend NOT EXISTS over NOT IN, because NOT EXISTS works (as expected) even when the subquery returns NULL values.
I really don't understand the logic you are attempting in your WHERE clause. This seems like an AND condition with comparisons on the two tables.
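An equivalent formulation, if you prefer joins, is a LEFT JOIN / IS NULL anti-join (a sketch against the posted table names):
select i.name
from myscheme.input i
left join myscheme.results r on r.name = concat('xxx', i.name)
left join myscheme.erros e on e.name = concat('xxx', i.name)
where r.name is null
  and e.name is null;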

Confusion on creating index to improve join performance in mysql

I have read many posts on the forum, but I am still confused about creating indexes to speed up join queries in MySQL. Here is my doubt:
I have two tables. One is the category table, which contains only a few thousand rows and holds all the descriptive information about the data; the other is the geo_data table, which contains a huge amount of data. I join the geo_data table on two keys, s_key1 and s_key2. Following is the structure of the tables:
category table
CREATE TABLE `category` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`s_key1` int(11) DEFAULT NULL,
`s_key2` int(11) DEFAULT NULL,
`STD_DATE` datetime DEFAULT NULL,
`LATITUDE` float DEFAULT NULL,
`LONGITUDE` float DEFAULT NULL,
`COUNTRY_CD` varchar(15) DEFAULT NULL,
`INSTR_CODE` varchar(15) DEFAULT NULL,
`CANADACR_CD` varchar(15) DEFAULT NULL,
`PROBST_T` varchar(15) DEFAULT NULL,
`TYPE` varchar(15) DEFAULT NULL,
PRIMARY KEY (`Id`)
) ENGINE=MyISAM AUTO_INCREMENT=32350 DEFAULT CHARSET=latin1;
geo_data table
CREATE TABLE `geo_data` (
`s_key1` int(11) DEFAULT NULL,
`s_key2` int(11) DEFAULT NULL,
`MAGNETIC` float DEFAULT NULL,
`GRAVITY` float DEFAULT NULL,
`BATHY` float DEFAULT NULL,
`CORE` float DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
I have many tables like the geo_data table that contain s_key1, s_key2 and other columns. In my application I often use the fields std_date, latitude, longitude, country_cd and type from the category table.
I do an inner join, and sometimes a left join depending on the requirement; for example, my query looks like this:
SELECT
c.s_key1,
c.s_key2,
c.std_date,
c.latitude,
c.longitude,
g.magnetic,
g.bathy
FROM
category c, geo_data g
WHERE
c.s_key1 = g.s_key1 && c.s_key2 = g.s_key2;
and sometimes my where clause will have something like this too
WHERE
c.latitude between -30 and 30 AND
c.longitude between 10 and 140 AND
c.country_cd = 'INDIA' AND
c.type = 'NON_PROFIT';
So what's the right way of creating indexes to speed up my query? Is the one below right? Please, someone help.
create index `myindex` on
`category` (s_key1,s_key2,std_date,latitude,longitude,country_cd)
create index `myindex` on
`geo_data` (s_key1,s_key2)
And one more doubt: should both tables (category, geo_data) have an index to speed up performance, or only the geo_data table?
From the where condition it makes sense to simplify the first index as:
create index `myindex` on
`category` (s_key1,s_key2)
The original, however, can improve performance in that the query doesn't have to access the full table row to get the other values. On the other hand it makes the index bigger and therefore slower, so it depends on whether this is an optimization for only this query or there are more queries that use only s_key1 and s_key2 (or in combination with other columns).
Regarding the clarification: for the lat/lng check it makes sense to move std_date after latitude/longitude (or remove it completely):
create index `myindex` on
`category` (s_key1,s_key2,latitude,longitude,std_date,country_cd)
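To check whether MySQL actually uses an index for a given query, a quick sketch (column list trimmed) is to prefix the query with EXPLAIN; the key and rows columns show which index was chosen and roughly how many rows are examined:
EXPLAIN
SELECT c.s_key1, c.s_key2, g.magnetic, g.bathy
FROM category c
JOIN geo_data g ON g.s_key1 = c.s_key1 AND g.s_key2 = c.s_key2
WHERE c.country_cd = 'INDIA' AND c.type = 'NON_PROFIT';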

MySQL + PHP ensure unique username (auto increment suffix)

Hi, I am looking for the most performant way to ensure a unique username.
I did already check similar questions but none of them made me happy.
So here I came up with my solution. I appreciate your comments.
CREATE TABLE IF NOT EXISTS `user` (
`guid` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`firstname` varchar(48) NOT NULL,
`lastname` varchar(48) NOT NULL,
`username` varchar(128) NOT NULL,
`unique_username` varchar(128) NOT NULL,
PRIMARY KEY (`guid`),
KEY `firstname` (`firstname`),
KEY `lastname` (`lastname`),
KEY `username` (`username`),
UNIQUE KEY `unique_username` (`unique_username`),
UNIQUE KEY `email` (`email`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
username contains firstname.lastname without a numeric suffix, while unique_username contains firstname.lastname.(count of equal usernames).
To get the count of equal usernames I am performing the following query against the user table (in advance of the insert):
SELECT COUNT(*) FROM user WHERE username = 'username'
Unfortunately I can't use a lookup against firstname and lastname since they are case sensitive.
The docs say “nonbinary strings (CHAR, VARCHAR, TEXT), string searches use the collation of the comparison operands… nonbinary string comparisons are case insensitive by default”, so you should be able to do this:
SELECT COUNT(*) FROM user WHERE CONCAT_WS('.', `firstname`, `lastname`) = 'username'
To get around the case sensitivity you can use LCASE(column) to compare lower case values:
SELECT COUNT(*) FROM user
WHERE LCASE(lastname) = LCASE('Lastname')
AND LCASE(firstname) = LCASE('firstName');
You could also use LIKE to check the username field:
SELECT COUNT(*) FROM user WHERE username LIKE 'username%';
That way 'this.name', 'this.name.1' and 'this.name.2' would all get counted together.
I think neither of these solutions will let the optimizer take advantage of the indexes, so performance might go down, but that might be a non-issue.
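If it helps, the collision count and the candidate unique name can be computed in one round trip; a sketch, assuming usernames are stored as lower-cased firstname.lastname and the suffix is just the current count (the UNIQUE KEY on unique_username still protects against races at insert time):
SELECT CONCAT('hans.meier',
              IF(COUNT(*) = 0, '', CONCAT('.', COUNT(*)))) AS candidate_unique_username
FROM user
WHERE username = 'hans.meier';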

How can I remove the underscores from all database data input?

I have run this query to create one of my tables:
CREATE TABLE IF NOT EXISTS `checkout` (
`customer` varchar(32) NOT NULL,
`time` varchar(100) NOT NULL,
`productid` varchar(100) NOT NULL,
`price` decimal(10,0) NOT NULL,
`tickets` int(11) NOT NULL,
`index` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`index`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=2 ;
If, for example, a customer's name is "John Goodwill", his name is automatically converted to "John_Goodwill" in MySQL. I use the following code to revert it:
update checkout set customer = replace(customer, '_', ' ');
This works for the customer data currently in the database, but any new input goes back to underscores again.
What query can I run so that this applies to every new input?
Clean your data before it reaches the level where your database needs to manipulate it.
Solved by adding the following PHP:
$customer = str_replace("-", " ", $customer);
$customer = str_replace("_", " ", $customer);