Taking the most commonly occuring (modal) value for several columns - sql-server-2008

I am cleaning records that have poorly recorded and inconsistent socio-demographic information over time, for the same person. I want to take the most commonly occuring value (the mode) for each person.
One way to do that is to partition by id and then count how many times each value occurs, retaining the highest count for each id:
DROP TABLE dbo.table
SELECT DISTINCT [id], [ethnic_group] AS [ethnic_mode], ct INTO dbo.table
FROM (
SELECT row_number() OVER (PARTITION BY [id] ORDER BY count([ethnic_group]) DESC) as rn, count([ethnic_group]) as ct, [ethnic_group], [id]
FROM
dbo.mytable GROUP BY [id], [ethnic_group]) ranked
where rn = 1
ORDER BY ct DESC
But I want to do this for several variables (ethnic group, income group and several more).
How can I select the mode for several variables within one statement and insert into one table (rather than creating separate tables for each variable)?
The table below illustrates an example of what I want to do:
DROP TABLE mytable;
CREATE TABLE mytable(
id VARCHAR(2) NOT NULL PRIMARY KEY
,ethnic_group VARCHAR(12) NOT NULL
,ethnic_mode VARCHAR(11) NOT NULL
,income VARCHAR(6) NOT NULL
,income_mode VARCHAR(11) NOT NULL
);
INSERT INTO mytable(id,ethnic_group,ethnic_mode,income,income_mode) VALUES ('id','ethnic_group','ethnic_mode','income','income_mode');
INSERT INTO mytable(id,ethnic_group,ethnic_mode,income,income_mode) VALUES ('1','white','white','middle','middle');
INSERT INTO mytable(id,ethnic_group,ethnic_mode,income,income_mode) VALUES ('1','white','white','middle','middle');
INSERT INTO mytable(id,ethnic_group,ethnic_mode,income,income_mode) VALUES ('1','mixed','white','high','middle');
INSERT INTO mytable(id,ethnic_group,ethnic_mode,income,income_mode) VALUES ('2','asian','asian','middle','middle');
INSERT INTO mytable(id,ethnic_group,ethnic_mode,income,income_mode) VALUES ('2','mixed','asian','middle','middle');
INSERT INTO mytable(id,ethnic_group,ethnic_mode,income,income_mode) VALUES ('2','asian','asian','middle','middle');

I would use sub-queries to accomplish this in 1 insert statement.
Here is an example based on the table structure from your illustration:
/* This is the original table and contains duplicate ID's */
DECLARE #source_table TABLE(
id VARCHAR(2) NOT NULL
,ethnic_group VARCHAR(12) NULL
,ethnic_mode VARCHAR(11) NULL
,income VARCHAR(6) NULL
,income_mode VARCHAR(11) NULL
);
/* This is the destination table and will not contain duplicate ID's */
DECLARE #destination_table TABLE(
id VARCHAR(2) NOT NULL PRIMARY KEY
,ethnic_group VARCHAR(12) NULL
,income VARCHAR(6) NULL
);
/* Populate the source table with data */
INSERT INTO #source_table(id,ethnic_group,ethnic_mode,income,income_mode) VALUES ('1','white','white','middle','middle');
INSERT INTO #source_table(id,ethnic_group,ethnic_mode,income,income_mode) VALUES ('1','white','white','middle','middle');
INSERT INTO #source_table(id,ethnic_group,ethnic_mode,income,income_mode) VALUES ('1','mixed','white','high','middle');
INSERT INTO #source_table(id,ethnic_group,ethnic_mode,income,income_mode) VALUES ('2','asian','asian','middle','middle');
INSERT INTO #source_table(id,ethnic_group,ethnic_mode,income,income_mode) VALUES ('2','mixed','asian','middle','middle');
INSERT INTO #source_table(id,ethnic_group,ethnic_mode,income,income_mode) VALUES ('2','asian','asian','middle','middle');
INSERT INTO #source_table(id,ethnic_group,ethnic_mode,income,income_mode) VALUES ('3','asian', NULL, NULL, NULL);
INSERT INTO #source_table(id,ethnic_group,ethnic_mode,income,income_mode) VALUES ('3',NULL, NULL,'middle', NULL);
INSERT INTO #source_table(id,ethnic_group,ethnic_mode,income,income_mode) VALUES ('3',NULL, NULL, NULL, NULL);
/* Insert from source into destination (removing duplicates) */
INSERT INTO #destination_table
(
id
, ethnic_group
, income
)
SELECT st.id
, (
SELECT TOP 1 ethnic_group
FROM #source_table sub_st
WHERE sub_st.id = st.id
GROUP BY ethnic_group
ORDER BY COUNT(sub_st.id) DESC
)
, (
SELECT TOP 1 income
FROM #source_table sub_st
WHERE sub_st.id = st.id
GROUP BY income
ORDER BY COUNT(sub_st.id) DESC
)
FROM #source_table st
GROUP BY st.id
/* View the destination to see there are no duplicates */
SELECT id
, ethnic_group
, income
FROM #destination_table

Related

Dynamic IN+IF clause with GROUP_CONCAT?

I am filtering buyers (many) for a given property (one) with the main condition of
if the buyer requires specific exact # of bathrooms (buyers.bathroom_exact='1'), match it to the property's bathroom value.
Buyers can select multiple exact bathroom matches. Here's a sample schema:
CREATE TABLE `buyers` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`name` text NOT NULL,
`bathroomreq` tinyint(1) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE `buyers_bathrooms` (
`buyer_id` int(10) unsigned NOT NULL DEFAULT '0',
`number` decimal(10,2) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `buyers` ( name, bathroomreq ) VALUES ( 'John Doe', 1);
INSERT INTO `buyers` ( name, bathroomreq ) VALUES ( 'John Smith', 1);
INSERT INTO buyers_bathrooms ( buyer_id, number ) VALUES ( 1, 8 );
INSERT INTO buyers_bathrooms ( buyer_id, number ) VALUES ( 1, 9 );
INSERT INTO buyers_bathrooms ( buyer_id, number ) VALUES ( 1, 10 );
INSERT INTO buyers_bathrooms ( buyer_id, number ) VALUES ( 2, 5 );
The only way I can think of doing this is using a column which has an IF clause to see if the bathroomreq is 1, then match the group_concat of the buyers specified bathrooms:
SELECT IF ( bathroomreq = '1', 8 IN (
SELECT GROUP_CONCAT(number) AS bathrooms
FROM buyers_bathrooms
WHERE buyers.id = buyers_bathrooms.buyer_id
) , 1 ) AS bathroom_match,
bathroomreq,
id, name
FROM buyers
So this property basically has 8 bathrooms and that's why 8 is hardcoded in there.
Here's a mysql fiddle:
http://sqlfiddle.com/#!2/508cc/1
Is this the only way of solving the problem and is it reliable? Can I do it an alternative way with something like a dynamic INNER JOIN if bathroomreq is 1? Basically, can I avoid having to branch based on bathroom_match in my application code and filter this elegantly through SQL alone?
I think you can do what you want with an exists clause:
select b.bathroomreq, b.id, b.name,
(b.bathroomreq <> 1 or
exists (select 1
from buyers_bathrooms bb
where bb.buyer_id = b.id and bb.number = 8
)
) as bathroom_match
from buyers b;

How to use INSERT ... SELECT with a particular column auto-incrementing, starting at 1?

I am using INSERT ... SELECT to insert a data from specific columns from specific rows from a view into a table. Here's the target table:
CREATE TABLE IF NOT EXISTS `queue` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`customerId` int(11) NOT NULL,
`productId` int(11) NOT NULL,
`priority` int(11) NOT NULL,
PRIMARY KEY (`ID`),
KEY `customerId` (`customerId`),
KEY `productId` (`productId`),
KEY `priority` (`priority`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
The INSERT ... SELECT SQL I have works, but I would like to improve it if possible, as follows: I would like the inserted rows to start with 1 in the priority column, and each subsequent row to increment the priority value by 1. So, if three rows were inserted, the first would be priority 1, the second 2, and the third 3.
A exception to the "start at 1" rule: if there are existing rows in the target table for the specified customer, I would like the inserted rows to start with MAX(priority)+1 for that customer.
I thought I could use a subquery, but here's the problem: sometimes the subquery returns NULL (when there are no records in the queue table for the specified customer), which breaks the insert, as the priority column does not allow nulls.
I tried to CAST the column to an integer, but that still gave me NULL back when there are no records with that customer ID in the table.
I've hardcoded the customer ID in this example, but naturally in my application that would be an input parameter.
INSERT INTO `queue`
(
`customerId`,
`productId`,
`priority`,
`status`,
`orderId`)
SELECT
123, -- This is the customer ID
`PRODUCT_NO`,
(SELECT (MAX(`priority`)+1) FROM `queue` WHERE `customerId` = 123),
'queued',
null
FROM
`queue_eligible_products_view`
Is there a way to do this in one SQL statement, or a small number of SQL statements, i.e., less than SQL statement per row?
I do not think I can set the priority column to auto_increment, as this column is not necessarily unique, and the auto_increment attribute is used to generate a unique identity for new rows.
As Barmar mentions in the comments : use IFNULL to handle your sub query returning null. Hence:
INSERT INTO `queue`
(
`customerId`,
`productId`,
`priority`,
`status`,
`orderId`)
SELECT
123, -- This is the customer ID
`PRODUCT_NO`,
IFNULL((SELECT (MAX(`priority`)+1) FROM `queue` WHERE `customerId` = 123),1),
'queued',
null
FROM
`queue_eligible_products_view`
Here's how to do the incrementing:
INSERT INTO queue (customerId, productId, priority, status, orderId)
SELECT 123, product_no, #priority := #priority + 1, 'queued', null
FROM queue_eligible_products_view
JOIN (SELECT #priority := IFNULL(MAX(priority), 0)
FROM queue
WHERE customerId = 123) var

how to do group by and order by in one query?

I have two table one is 'tb_student' and other is 'tb_fees'
create query for 'tb_student'
CREATE TABLE `tb_student` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(255) NOT NULL,
`email` varchar(255) NOT NULL,
`class` varchar(255) NOT NULL,
`created_on` datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (`id`)
)
create query for 'tb_fees'
CREATE TABLE `tb_fees` (
`id` int(11) NOT NULL auto_increment,
`email` varchar(255) NOT NULL,
`amount` varchar(255) NOT NULL,
`created_on` datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (`id`)
)
In first table i am storing the student details and in other table storing the fees details
I want to select student details from 'tb_student' and last add fee from 'tb_fees' only for those student which are in class 6
so i tried this
SELECT *
FROM tb_student s INNER JOIN
tb_fees f on
s.email =f.email
WHERE s.class = 6 GROUP BY s.email ORDER BY f.created_on DESC
This will give result only the first created how to get last created values
fees table
insert into `tb_fees`(`id`,`email`,`amount`,`created_on`) values (5,'ram#gmail.com','5000','2013-05-01 14:20:15');
insert into `tb_fees`(`id`,`email`,`amount`,`created_on`) values (6,'Sam#gmail.com','5000','2013-05-02 14:20:23');
insert into `tb_fees`(`id`,`email`,`amount`,`created_on`) values (7,'jak#gmail.com','5000','2013-05-03 14:20:30');
insert into `tb_fees`(`id`,`email`,`amount`,`created_on`) values (8,'Sam#gmail.com','5000','2013-05-29 14:20:35');
insert into `tb_fees`(`id`,`email`,`amount`,`created_on`) values (9,'ram#gmail.com','5000','2013-05-30 14:20:39');
insert into `tb_fees`(`id`,`email`,`amount`,`created_on`) values (10,'jak#gmail.com','5000','2013-05-30 14:36:13');
insert into `tb_fees`(`id`,`email`,`amount`,`created_on`) values (11,'rose#gmail.com','5000','2013-05-30 14:36:15');
insert into `tb_fees`(`id`,`email`,`amount`,`created_on`) values (12,'nim#gmail.com','5000','2013-05-30 14:36:15');
Student table values
insert into `tb_student`(`id`,`name`,`email`,`class`,`created_on`) values (5,'Ram','ram#gmail.com','6','2013-04-30 14:00:56');
insert into `tb_student`(`id`,`name`,`email`,`class`,`created_on`) values (6,'Sam','Sam#gmail.com','6','2013-03-30 14:01:30');
insert into `tb_student`(`id`,`name`,`email`,`class`,`created_on`) values (7,'Nimmy','nim#gmail.com','7','2013-04-30 13:59:59');
insert into `tb_student`(`id`,`name`,`email`,`class`,`created_on`) values (8,'jak','jak#gmail.com','6','2013-03-30 14:07:32');
insert into `tb_student`(`id`,`name`,`email`,`class`,`created_on`) values (9,'rose','rose#gmail.com','5','2013-04-30 14:07:51');
Thank you
To get the latest fees something like this:-
SELECT s.* , f.*
FROM tb_student s
INNER JOIN
(SELECT email, MAX(created_on) AS created_on
FROM tb_fees
GROUP BY email) Sub1
ON s.email = sub1.email
INNER JOIN tb_fees f
ON s.email = f.email AND Sub1.created_on = f.created_on
WHERE s.class = 6
By the way, you probably want indexes on the email fields (or better, use the tb_student id field on the tb_fees table instead of the email field and index it)
Use MAX group function
SELECT s.*, f.amount,MAX(f.created_on)
FROM tb_student s
INNER JOIN
tb_fees f
ON
s.email =f.email
WHERE s.class = 6
GROUP BY s.email

How implementation my wants with MySQL JOIN

_
Hello everyone!
I have table
CREATE TABLE `labels` (
`id` INT NULL AUTO_INCREMENT DEFAULT NULL,
`name` VARCHAR(250) NULL DEFAULT NULL,
`score` INT NULL DEFAULT NULL,
`before_score` INT NULL DEFAULT NULL,
PRIMARY KEY (`id`)
);
And I Have This Table
CREATE TABLE `scores` (
`id` INT NULL AUTO_INCREMENT DEFAULT NULL,
`name_id` INT NULL DEFAULT NULL,
`score` INT NULL DEFAULT NULL,
`date` DATETIME DEFAULT NULL,
PRIMARY KEY (`id`)
);
And i want have result where labels.score - have value last scores.score sorted by scores.date and labels.before_score where have value penultimate scores.score sorted by scores.date. Can I do This Only on Mysql slq and how?
Thanks.
ADD
For example i have this data on first table:
INSERT INTO `labels` (id, name, score, before_score) VALUES (1, 'John', 200, 123);
INSERT INTO `labels` (id, name, score, before_score) VALUES (2, 'Eddie', 2000, 2000);
INSERT INTO `labels` (id, name, score, before_score) VALUES (3, 'Bob', 400, 3101);
And second table
INSERT INTO `scores` (`id`,`name_id`,`score`,`date`) VALUES ('1','1','12','2013-07-10');
INSERT INTO `scores` (`id`,`name_id`,`score`,`date`) VALUES ('2','2','2000','2013-05-04');
INSERT INTO `scores` (`id`,`name_id`,`score`,`date`) VALUES ('3','3','654','2012-09-12');
INSERT INTO `scores` (`id`,`name_id`,`score`,`date`) VALUES ('4','1','123','2013-12-17');
INSERT INTO `scores` (`id`,`name_id`,`score`,`date`) VALUES ('5','1','200','2014-04-25');
INSERT INTO `scores` (`id`,`name_id`,`score`,`date`) VALUES ('6','3','3101','2013-12-02');
INSERT INTO `scores` (`id`,`name_id`,`score`,`date`) VALUES ('6','2','2000','2015-12-02');
INSERT INTO `scores` (`id`,`name_id`,`score`,`date`) VALUES ('6','3','400','2013-12-02');
If I understand correctly, you need the last two scores for each name_id.
I would tackle this with temporary tables:
Step 1. Last score:
create temporary table temp_score1
select name_id, max(`date`) as lastDate
from scores
group by name_id;
-- Add the appropriate indexes
alter table temp_score1
add unique index idx_name_id(name_id),
add index idx_lastDate(lastDate);
Step 2. Penultimate score. The idea is exactly the same, but using temp_score1 to filter the data:
create temporary table temp_score2
select s.name_id, max(`date`) as penultimateDate
from scores as s
inner join temp_score1 as t on s.nameId = t.nameId
where s.`date` < t.lastDate
group by name_id;
-- Add the appropriate indexes
alter table temp_score2
add unique index idx_name_id(name_id),
add index idx_penultimateDate(penultimateDate);
Step 3. Put it all together.
select
l.id, l.name,
s1.lastScore, s2.penultimateScore
from
`labels` as l
left join temp_score1 as s1 on l.id = s1.name_id
left join temp_score2 as s2 on l.id = s2.name_id
You can put this three steps inside a stored procedure.
Hope this helps you.

Insert and set value with max()+1 problems

I am trying to insert a new row and set the customer_id with max()+1. The reason for this is the table already has a auto_increatment on another column named id and the table will have multiple rows with the same customer_id.
With this:
INSERT INTO customers
( customer_id, firstname, surname )
VALUES
((SELECT MAX( customer_id ) FROM customers) +1, 'jim', 'sock')
...I keep getting the following error:
#1093 - You can't specify target table 'customers' for update in FROM clause
Also how would I stop 2 different customers being added at the same time and not having the same customer_id?
You can use the INSERT ... SELECT statement to get the MAX()+1 value and insert at the same time:
INSERT INTO
customers( customer_id, firstname, surname )
SELECT MAX( customer_id ) + 1, 'jim', 'sock' FROM customers;
Note: You need to drop the VALUES from your INSERT and make sure the SELECT selected fields match the INSERT declared fields.
Correct, you can not modify and select from the same table in the same query. You would have to perform the above in two separate queries.
The best way is to use a transaction but if your not using innodb tables then next best is locking the tables and then performing your queries. So:
Lock tables customers write;
$max = SELECT MAX( customer_id ) FROM customers;
Grab the max id and then perform the insert
INSERT INTO customers( customer_id, firstname, surname )
VALUES ($max+1 , 'jim', 'sock')
unlock tables;
Use alias name for the inner query like this
INSERT INTO customers
( customer_id, firstname, surname )
VALUES
((SELECT MAX( customer_id )+1 FROM customers cust), 'sharath', 'rock')
SELECT MAX(col) +1 is not safe -- it does not ensure that you aren't inserting more than one customer with the same customer_id value, regardless if selecting from the same table or any others. The proper way to ensure a unique integer value is assigned on insertion into your table in MySQL is to use AUTO_INCREMENT. The ANSI standard is to use sequences, but MySQL doesn't support them. An AUTO_INCREMENT column can only be defined in the CREATE TABLE statement:
CREATE TABLE `customers` (
`customer_id` int(11) NOT NULL AUTO_INCREMENT,
`firstname` varchar(45) DEFAULT NULL,
`surname` varchar(45) DEFAULT NULL,
PRIMARY KEY (`customer_id`)
)
That said, this worked fine for me on 5.1.49:
CREATE TABLE `customers` (
`customer_id` int(11) NOT NULL DEFAULT '0',
`firstname` varchar(45) DEFAULT NULL,
`surname` varchar(45) DEFAULT NULL,
PRIMARY KEY (`customer_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1$$
INSERT INTO customers VALUES (1, 'a', 'b');
INSERT INTO customers
SELECT MAX(customer_id) + 1, 'jim', 'sock'
FROM CUSTOMERS;
insert into table1(id1) select (max(id1)+1) from table1;
Use table alias in subquery:
INSERT INTO customers
( customer_id, firstname, surname )
VALUES
((SELECT MAX( customer_id ) FROM customers C) +1, 'jim', 'sock')
None of the about answers works for my case. I got the answer from here, and my SQL is:
INSERT INTO product (id, catalog_id, status_id, name, measure_unit_id, description, create_time)
VALUES (
(SELECT id FROM (SELECT COALESCE(MAX(id),0)+1 AS id FROM product) AS temp),
(SELECT id FROM product_catalog WHERE name="AppSys1"),
(SELECT id FROM product_status WHERE name ="active"),
"prod_name_x",
(SELECT id FROM measure_unit WHERE name ="unit"),
"prod_description_y",
UNIX_TIMESTAMP(NOW())
)
Your sub-query is just incomplete, that's all. See the query below with my additions:
INSERT INTO customers ( customer_id, firstname, surname )
VALUES ((SELECT MAX( customer_id ) FROM customers) +1), 'jim', 'sock')
You can't do it in a single query, but you could do it within a transaction. Do the initial MAX() select and lock the table, then do the insert. The transaction ensures that nothing will interrupt the two queries, and the lock ensures that nothing else can try doing the same thing elsewhere at the same time.
We declare a variable 'a'
SET **#a** = (SELECT MAX( customer_id ) FROM customers) +1;
INSERT INTO customers
( customer_id, firstname, surname )
VALUES
(**#a**, 'jim', 'sock')
This is select come insert sequel.
I am trying to get serial_no maximum +1 value and its giving correct value.
SELECT MAX(serial_no)+1 into #var FROM sample.kettle;
Insert into kettle(serial_no,name,age,salary) values (#var,'aaa',23,2000);