MySQL query/Search Experiment - mysql

Puzzle puzzle, riddle me functional (MySQL query/Search Experiment)
Stored Table
--------------------------------------------
| id | namespace | key | value |
--------------------------------------------
| 1 | page | item.id | test1 |
| 1 | page | content.title | page2 |
| 1 | trigger | tag | val1 |
| 2 | page | item.id | t1 |
| 2 | page | content.title | page3 |
| 2 | trigger | tag | val2 |
| 2 | oddball | num | in |
| 3 | truck | plate | 12345 |
--------------------------------------------
Search parameter: "page" can be anywhere but not in id
Desired Request output:
---------------------------------------------------------------------
|id | page.item.id | page.content.title | trigger.tag | oddball.num |
---------------------------------------------------------------------
|1 | test1 | page2 | val1 | NULL |
|2 | t1 | page3 | val2 | in |
---------------------------------------------------------------------
Hints:
ok solution: Solution with backend language (ex: php) + SQL queries
better solution: Solution with stored procedures
best solution: Solution with single SQL query, (pivot table?, temporary table?)
Fastest solution wins! (50 bounty points)
Cheers!
The goal is to have dynamic columns from aggregated rows.

To get it working as a pivot table, you must run two queries.
First, get the columns to be used:
select distinct concat(namespace,'.',`key`) as `column`,
namespace,`key` from your_table;
+--------------------+-----------+---------------+
| column | namespace | key |
+--------------------+-----------+---------------+
| page.item.id | page | item.id |
| page.content.title | page | content.title |
| trigger.tag | trigger | tag |
| oddball.num | oddball | num |
| truck.plate | truck | plate |
+--------------------+-----------+---------------+
Then combine them with the unique ids and fetch each value with a sub-query. To keep a sub-query from returning more than one row, it must contain an aggregate function; I used max().
I created a stored procedure:
DELIMITER $$
DROP PROCEDURE IF EXISTS `get_pivot_table`$$
CREATE PROCEDURE `get_pivot_table`()
BEGIN
declare done int default 0;
declare v_sql text;
declare v_column varchar(100);
declare v_namespace varchar(100);
declare v_key varchar(100);
-- (1) getting the columns with this cursor
declare c_columns cursor for
select distinct concat(namespace,'.',`key`) as `column`
, namespace
,`key`
from your_table;
declare continue handler for not found set done = 1;
open c_columns;
-- (2) now creating the sub-queries based on cursor results
set v_sql = "select p.id ";
read_loop: loop
fetch c_columns into v_column, v_namespace, v_key;
if done then
leave read_loop;
end if;
set v_sql = concat(v_sql,", (select max(t.`value`) from your_table t
where t.id = p.id
and t.namespace = '", v_namespace ,"'
and t.`key` = '", v_key ,"') as `", v_column,"` ");
end loop;
close c_columns;
-- now run the entire query
set @sql = concat(v_sql," from (select distinct id from your_table) as p");
prepare stmt1 from @sql;
execute stmt1;
deallocate prepare stmt1;
END$$
DELIMITER ;
Then you can call the stored procedure:
mysql> call get_pivot_table();
+------+--------------+--------------------+-------------+-------------+-------------+
| id | page.item.id | page.content.title | trigger.tag | oddball.num | truck.plate |
+------+--------------+--------------------+-------------+-------------+-------------+
| 1 | test1 | page2 | val1 | NULL | NULL |
| 2 | t1 | page3 | val2 | in | NULL |
| 3 | NULL | NULL | NULL | NULL | 12345 |
+------+--------------+--------------------+-------------+-------------+-------------+
3 rows in set (0.00 sec)
The speed of that query will depend on the indexes of your_table and the amount of data.
It is based on the article "An approach to MySQL dynamic cross reference".
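The two-step idea behind get_pivot_table() (discover the columns, then emit one correlated MAX() sub-query per column) can be tried outside MySQL as well. Below is a minimal sketch in Python, assuming the standard-library sqlite3 module as a stand-in for MySQL and a table named your_table holding the question's rows. Unlike the procedure, it binds the namespace/key values as parameters instead of splicing them into the SQL text.

```python
import sqlite3

# Sample rows from the question; SQLite stands in for MySQL (assumption).
ROWS = [
    (1, "page", "item.id", "test1"),
    (1, "page", "content.title", "page2"),
    (1, "trigger", "tag", "val1"),
    (2, "page", "item.id", "t1"),
    (2, "page", "content.title", "page3"),
    (2, "trigger", "tag", "val2"),
    (2, "oddball", "num", "in"),
    (3, "truck", "plate", "12345"),
]


def make_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE your_table (id INTEGER, namespace TEXT, "key" TEXT, "value" TEXT)'
    )
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", ROWS)
    return conn


def pivot_with_subqueries(conn):
    # Query 1: the distinct namespace/key pairs become the output columns.
    pairs = conn.execute(
        'SELECT DISTINCT namespace, "key" FROM your_table ORDER BY namespace, "key"'
    ).fetchall()

    # Query 2: one correlated sub-query per column; MAX() guarantees a single value.
    select_list = ["p.id AS id"]
    params = []
    for namespace, key in pairs:
        alias = f"{namespace}.{key}".replace('"', '""')
        select_list.append(
            '(SELECT MAX(t."value") FROM your_table AS t'
            ' WHERE t.id = p.id AND t.namespace = ? AND t."key" = ?)'
            f' AS "{alias}"'
        )
        params.extend([namespace, key])
    sql = (
        "SELECT " + ", ".join(select_list)
        + " FROM (SELECT DISTINCT id FROM your_table) AS p ORDER BY p.id"
    )
    cur = conn.execute(sql, params)
    return [d[0] for d in cur.description], cur.fetchall()
```

Calling `pivot_with_subqueries(make_connection())` returns the column names and one tuple per id, with None where an id has no value for a column, matching the NULLs in the procedure's output.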

Here's my solution using a pivot table (SQL Server syntax). It is not a single query, though:
USE tempdb
GO
CREATE TABLE _temp ([id] int, [namespace] varchar(20), [key] varchar(20), [value] varchar(20))
INSERT INTO _temp VALUES (1, 'page', 'content.title', 'page2')
INSERT INTO _temp VALUES (1, 'page', 'item.id', 'test1')
INSERT INTO _temp VALUES(1, 'trigger', 'tag', 'val1')
INSERT INTO _temp VALUES (2, 'oddball', 'num', 'in')
INSERT INTO _temp VALUES (2, 'page', 'content.title', 'page3')
INSERT INTO _temp VALUES (2, 'page', 'item.id', 't1')
INSERT INTO _temp VALUES (2, 'trigger', 'tag', 'val2')
INSERT INTO _temp VALUES (3, 'truck', 'plate', '12345')
DECLARE @param AS varchar(15)
SET @param = 'page'
DECLARE @c AS nvarchar(max)
DECLARE @sql AS nvarchar(max)
SELECT @c =
ISNULL(
@c + ',[' + c + ']',
'[' + c + ']'
)
FROM (SELECT DISTINCT [namespace] + '.' + [key] AS c FROM _temp WHERE id IN (SELECT id FROM _temp WHERE ISNULL([namespace], '') + ISNULL([key], '') + ISNULL([value], '') LIKE '%' + @param + '%') ) AS col
SET @sql = N'
SELECT *
FROM
(
SELECT id,
namespace + ''.'' + [key] AS [column],
value
FROM _temp
WHERE id IN (SELECT id FROM _temp WHERE ISNULL([namespace], '''') + ISNULL([key], '''') + ISNULL([value], '''') LIKE ''%' + @param + '%'')
) AS src
PIVOT
(
MAX(value)
FOR [column]
IN (' + @c + ')
) AS piv'
EXECUTE (@sql)
DROP TABLE _temp
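MySQL has no PIVOT operator, but the same search-then-pivot logic can be written with MAX(CASE ...). Here is a minimal sketch in Python, assuming sqlite3 as a stand-in database and the question's rows in a table named your_table. It keeps only ids whose namespace/key/value concatenation contains the search term, as the LIKE test above does, and with "page" it reproduces the desired output from the question. Note that SQLite's LIKE is case-insensitive for ASCII, while SQL Server's behavior depends on the collation.

```python
import sqlite3

# Sample rows from the question; SQLite stands in for the real database (assumption).
ROWS = [
    (1, "page", "item.id", "test1"),
    (1, "page", "content.title", "page2"),
    (1, "trigger", "tag", "val1"),
    (2, "page", "item.id", "t1"),
    (2, "page", "content.title", "page3"),
    (2, "trigger", "tag", "val2"),
    (2, "oddball", "num", "in"),
    (3, "truck", "plate", "12345"),
]


def make_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE your_table (id INTEGER, namespace TEXT, "key" TEXT, "value" TEXT)'
    )
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", ROWS)
    return conn


# Ids having at least one row whose namespace + key + value contains the term.
MATCHING_IDS = (
    "id IN (SELECT id FROM your_table WHERE "
    "IFNULL(namespace, '') || IFNULL(\"key\", '') || IFNULL(\"value\", '') "
    "LIKE '%' || ? || '%')"
)


def search_pivot(conn, param):
    # Query 1: columns are the pairs that occur for matching ids,
    # in order of first appearance in the table.
    pairs = conn.execute(
        'SELECT namespace, "key" FROM your_table WHERE ' + MATCHING_IDS
        + ' GROUP BY namespace, "key" ORDER BY MIN(rowid)',
        (param,),
    ).fetchall()

    # Query 2: one MAX(CASE ...) per column, grouped by id.
    select_list = ["id"]
    params = []
    for namespace, key in pairs:
        alias = f"{namespace}.{key}".replace('"', '""')
        select_list.append(
            'MAX(CASE WHEN namespace = ? AND "key" = ? THEN "value" END)'
            f' AS "{alias}"'
        )
        params.extend([namespace, key])
    params.append(param)  # the WHERE placeholder comes after the select list
    sql = (
        "SELECT " + ", ".join(select_list) + " FROM your_table WHERE "
        + MATCHING_IDS + " GROUP BY id ORDER BY id"
    )
    cur = conn.execute(sql, params)
    return [d[0] for d in cur.description], cur.fetchall()
```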

The nature of pivot tables in SQL is that it takes two queries.
The first to discover the set of distinct values and build a dynamic SQL query with one column per distinct value.
The second runs the dynamic query to get the pivot table result.
The reason for this is that SQL requires that you define the select-list columns before it accesses any data. There is no SQL query that can dynamically expand the columns of the select-list based on the distinct data values it discovers as it scans the table.
In other words: you can't pivot in a single SQL query.
Even in SQL implementations that have a built-in PIVOT operation, like Microsoft SQL Server, you still have to name the columns in the query syntax before you run it. That means you need to know the distinct values you want to represent as columns beforehand.
You would discover the distinct values with a simple query like this:
SELECT DISTINCT namespace, `key` FROM NoOneEverNamesTheirTableInSqlQuestions;
Then use the result of that to build a dynamic SQL query.
$sql = "SELECT DISTINCT namespace, `key` FROM NoOneEverNamesTheirTableInSqlQuestions";
$stmt = $pdo->query($sql);
$results = $stmt->fetchAll(PDO::FETCH_ASSOC);
$select_list = [];
foreach ($results as $row) {
$select_list[] = sprintf(
"MAX(CASE WHEN namespace=%s AND `key`=%s THEN value END) AS `%s.%s`",
$pdo->quote($row['namespace']), $pdo->quote($row['key']),
$row['namespace'], $row['key']);
}
$dynamic_sql = sprintf(
"SELECT id, %s FROM NoOneEverNamesTheirTableInSqlQuestions GROUP BY id",
implode(', ', $select_list));
You could also use SQL to do both at the same time, by returning the result of the first query in the form of a new SQL query to do the actual pivot.
SELECT CONCAT('SELECT id, ', GROUP_CONCAT(DISTINCT CONCAT(
'MAX(CASE WHEN namespace=', QUOTE(namespace), ' AND `key`=', QUOTE(`key`),
' THEN value END) AS `', CONCAT_WS('.', namespace, `key`), '`')),
' FROM NoOneEverNamesTheirTableInSqlQuestions GROUP BY id;') AS _sql
FROM NoOneEverNamesTheirTableInSqlQuestions;
The output of the query above is the real dynamic SQL for the pivot query, with each respective column of the select-list populated:
SELECT id,
MAX(CASE WHEN namespace='page' AND `key`='content.title' THEN value END) AS `page.content.title`,
MAX(CASE WHEN namespace='page' AND `key`='item.id' THEN value END) AS `page.item.id`,
MAX(CASE WHEN namespace='trigger' AND `key`='tag' THEN value END) AS `trigger.tag`,
MAX(CASE WHEN namespace='oddball' AND `key`='num' THEN value END) AS `oddball.num`,
MAX(CASE WHEN namespace='truck' AND `key`='plate' THEN value END) AS `truck.plate`
FROM NoOneEverNamesTheirTableInSqlQuestions GROUP BY id;
Then you run the dynamic query and you get the result you asked for:
+----+--------------------+--------------+-------------+-------------+-------------+
| id | page.content.title | page.item.id | trigger.tag | oddball.num | truck.plate |
+----+--------------------+--------------+-------------+-------------+-------------+
| 1 | page2 | test1 | val1 | NULL | NULL |
| 2 | page3 | t1 | val2 | in | NULL |
| 3 | NULL | NULL | NULL | NULL | 12345 |
+----+--------------------+--------------+-------------+-------------+-------------+
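The "query that writes the pivot query" trick can be tried without a MySQL server. The following is a sketch in Python, assuming sqlite3 as a stand-in; the SQL is adapted to SQLite spelling (`||` instead of CONCAT, the two-argument group_concat, and DISTINCT moved into a sub-select because SQLite only allows DISTINCT with a single-argument group_concat). Identifiers are double-quoted and assumed not to contain double quotes themselves.

```python
import sqlite3

# Sample rows from the question; SQLite stands in for MySQL (assumption).
ROWS = [
    (1, "page", "item.id", "test1"),
    (1, "page", "content.title", "page2"),
    (1, "trigger", "tag", "val1"),
    (2, "page", "item.id", "t1"),
    (2, "page", "content.title", "page3"),
    (2, "trigger", "tag", "val2"),
    (2, "oddball", "num", "in"),
    (3, "truck", "plate", "12345"),
]


def make_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE your_table (id INTEGER, namespace TEXT, "key" TEXT, "value" TEXT)'
    )
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", ROWS)
    return conn


# Query 1 returns the text of query 2 as a single string.
GENERATE_PIVOT_SQL = """
SELECT 'SELECT id, ' || group_concat(
         'MAX(CASE WHEN namespace = ' || quote(namespace)
         || ' AND "key" = ' || quote("key")
         || ' THEN "value" END) AS "' || namespace || '.' || "key" || '"', ', ')
       || ' FROM your_table GROUP BY id ORDER BY id'
FROM (SELECT DISTINCT namespace, "key" FROM your_table)
"""


def pivot_via_generated_sql(conn):
    (pivot_sql,) = conn.execute(GENERATE_PIVOT_SQL).fetchone()
    cur = conn.execute(pivot_sql)  # query 2: the generated pivot
    return [d[0] for d in cur.description], cur.fetchall()
```

Column order follows group_concat's concatenation order, which SQLite does not guarantee, so compare results by column name rather than by position.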
Here's both steps implemented as a MySQL stored procedure:
DELIMITER ;;
CREATE PROCEDURE PivotProc()
BEGIN
SELECT CONCAT('SELECT id, ', GROUP_CONCAT(DISTINCT CONCAT(
'MAX(CASE WHEN namespace=', QUOTE(namespace), ' AND `key`=', QUOTE(`key`),
' THEN value END) AS `', CONCAT_WS('.', namespace, `key`), '`')),
' FROM NoOneEverNamesTheirTableInSqlQuestions GROUP BY id;') AS _sql
FROM NoOneEverNamesTheirTableInSqlQuestions
INTO @sql;
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END;;
DELIMITER ;
So what's the alternative if you don't want to run two queries?
The alternative is to run a single simple query to fetch the data as it exists in the database, with multiple rows per id. Then fix it up by post-processing it in your application.
$sql = "SELECT id, namespace, `key`, value FROM NoOneEverNamesTheirTableInSqlQuestions";
$stmt = $pdo->query($sql);
$results = $stmt->fetchAll(PDO::FETCH_ASSOC);
$pivot_results = [];
foreach ($results as $row) {
if (!array_key_exists($row['id'], $pivot_results)) {
$pivot_results[$row['id']] = ['id' => $row['id']];
}
$field = sprintf("%s.%s", $row['namespace'], $row['key']);
$pivot_results[$row['id']][$field] = $row['value'];
}
Once you're done post-processing, you'll have a hash array with one row per id, each pointing to a hash array of fields indexed as the namespace.key names you described.
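The same post-processing translates directly to other languages. As a sketch, here is the PHP loop in Python, assuming sqlite3 with the question's rows in a table named your_table (the table name is arbitrary).

```python
import sqlite3

# Sample rows from the question; SQLite stands in for MySQL (assumption).
ROWS = [
    (1, "page", "item.id", "test1"),
    (1, "page", "content.title", "page2"),
    (1, "trigger", "tag", "val1"),
    (2, "page", "item.id", "t1"),
    (2, "page", "content.title", "page3"),
    (2, "trigger", "tag", "val2"),
    (2, "oddball", "num", "in"),
    (3, "truck", "plate", "12345"),
]


def make_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE your_table (id INTEGER, namespace TEXT, "key" TEXT, "value" TEXT)'
    )
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", ROWS)
    return conn


def fetch_and_pivot(conn):
    conn.row_factory = sqlite3.Row  # access columns by name, like FETCH_ASSOC
    pivot = {}
    for row in conn.execute('SELECT id, namespace, "key", "value" FROM your_table'):
        # First time an id is seen, start its entry with the id itself.
        entry = pivot.setdefault(row["id"], {"id": row["id"]})
        entry[f"{row['namespace']}.{row['key']}"] = row["value"]
    return pivot
```

Unlike the SQL pivot, missing fields are simply absent from an id's dict rather than NULL, so read them with `.get()`.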

Related

Can I retrieve data just from columns whose names have in another table a certain property

I have tables products and column_names as follows:
products
+----+------+-----------+--------------+-------+
| id | code | category | description | link |
+----+------+-----------+--------------+-------+
| 1 | 1111 | category1 | description1 | link1 |
| 2 | 2222 | category1 | description2 | link2 |
| 3 | 3333 | category1 | description3 | link3 |
| 4 | 4444 | category2 | description4 | link4 |
| 5 | 5555 | category2 | description5 | link5 |
| 6 | 6666 | category3 | description6 | link6 |
+----+------+-----------+--------------+-------+
column_names
+----+-------------+-------+
| id | column | type |
+----+-------------+-------+
| 1 | id | type1 |
| 2 | code | type1 |
| 3 | category | type2 |
| 4 | description | type2 |
| 5 | link | type3 |
+----+-------------+-------+
I can make this statement:
SELECT ( SELECT `column` FROM `column_names` WHERE `column_id` = 3) FROM `products` WHERE `id` = 1
while I cannot get this statement:
SELECT ( SELECT `column` FROM `column_names` WHERE `type` = 'type2') FROM `products` WHERE `id` = 1
It gives me error #1242 - Subquery returns more than 1 row
But is it possible to perform a query like that? Namely, I would like to extract data from just those columns of the products table that have a certain type in the column_names table.
Is this the right design of the tables or should there be another approach? Of course category should be in another table but this is not what I am asking.
Thank you very much!
So, thanks to Gordon Linoff and A Paul I was able to do what I wanted. I know this is probably a clumsy solution but it works. Anyone who would like to point out clumsiness is welcome.
So, first I created a user-defined function, GetMyColumns(). I cannot tell the exact purpose of the first two lines; they are just what the phpMyAdmin function editor added based on the options I chose.
DELIMITER $$
CREATE DEFINER=`geonextp`@`localhost` FUNCTION `GetMyColumns`(`type` VARCHAR(258)) RETURNS VARCHAR(4096) CHARSET latin1 NOT DETERMINISTIC READS SQL DATA SQL SECURITY DEFINER BEGIN
DECLARE rownum INT DEFAULT 1;
DECLARE counter INT DEFAULT 1;
DECLARE columns_string VARCHAR(4096) DEFAULT '';
DECLARE col_string VARCHAR(512);
SET rownum = (SELECT COUNT(*) FROM column_names);
SET columns_string = '';
WHILE counter <= rownum DO
SET col_string = (
SELECT `column_name`
FROM `column_names`
WHERE
`column_id` = counter AND
`column_type` = type
);
IF col_string IS NULL
THEN
SET col_string = '';
END IF;
IF columns_string = '' THEN
SET columns_string = col_string;
ELSE
IF NOT (col_string = '')
THEN
SET columns_string = CONCAT(CONCAT(columns_string, ', '), col_string);
END IF;
END IF;
SET counter = counter + 1;
END WHILE;
RETURN columns_string;
END$$
DELIMITER ;
Then I added a little piece of code A Paul suggested:
SET @inner_sql = GetMyColumns('type2');
SET @sql = CONCAT(CONCAT('SELECT ', @inner_sql), ' FROM products WHERE id = 1');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
The result is exactly:
+-----------+--------------+
| category | description |
+-----------+--------------+
| category1 | description1 |
+-----------+--------------+
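The same two steps (look up the column names by type, then build the real SELECT) can also live in application code instead of a stored function. A minimal sketch in Python, assuming sqlite3 as a stand-in and the tables exactly as shown in the question (column_names with columns id, column, type; the asker's real table apparently uses column_id/column_name/column_type). Because identifiers cannot be bound as parameters, the sketch keeps only names that really exist in products before putting them into the SQL text.

```python
import sqlite3


def make_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, code TEXT, "
        "category TEXT, description TEXT, link TEXT)"
    )
    conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?, ?)", [
        (1, "1111", "category1", "description1", "link1"),
        (2, "2222", "category1", "description2", "link2"),
        (3, "3333", "category1", "description3", "link3"),
        (4, "4444", "category2", "description4", "link4"),
        (5, "5555", "category2", "description5", "link5"),
        (6, "6666", "category3", "description6", "link6"),
    ])
    conn.execute('CREATE TABLE column_names (id INTEGER PRIMARY KEY, "column" TEXT, type TEXT)')
    conn.executemany("INSERT INTO column_names VALUES (?, ?, ?)", [
        (1, "id", "type1"), (2, "code", "type1"), (3, "category", "type2"),
        (4, "description", "type2"), (5, "link", "type3"),
    ])
    return conn


def select_columns_of_type(conn, col_type, product_id):
    # Step 1: which column names have the requested type?
    names = [r[0] for r in conn.execute(
        'SELECT "column" FROM column_names WHERE type = ? ORDER BY id', (col_type,))]
    # Guard the dynamic SQL: keep only real columns of products.
    existing = {r[1] for r in conn.execute("PRAGMA table_info(products)")}
    names = [n for n in names if n in existing]
    if not names:
        return [], []
    # Step 2: build and run the SELECT with quoted identifiers.
    select_list = ", ".join('"' + n.replace('"', '""') + '"' for n in names)
    cur = conn.execute(f"SELECT {select_list} FROM products WHERE id = ?", (product_id,))
    return [d[0] for d in cur.description], cur.fetchall()
```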
It's been an experience :)

JSON merge arrays and UNIQUE or DISTINCT

In my MySQL database I have a table features with this structure:
+-------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| val | json | NO | | NULL | |
+-------+---------+------+-----+---------+----------------+
When I select everything from it, like so, I get the following:
mysql> select * from features;
+----+----------------------------------+
| id | val |
+----+----------------------------------+
| 1 | ["apple", "banana", "orange"] |
| 2 | ["apple", "orange", "pineapple"] |
| 3 | ["orange", "banana"] |
| 4 | [] |
+----+----------------------------------+
The value in the val column should always be an array of strings. This array can have any length (>= 0).
The question is:
How can I select all those array values in a single result set, not repeated? So that I get this result and use it in PHP:
+------------+
| arr_values |
+------------+
| apple |
| banana |
| orange |
| pineapple |
+------------+
The only constraint to solve this is that it should be compatible with MySQL v5.7.
If the maximum number of elements per JSON value is limited, you can cross join with a numbers table (this example handles at most 10 elements):
SELECT DISTINCT JSON_EXTRACT(features.val, CONCAT('$[', numbers.num, ']')) arr_values
FROM features, ( SELECT 0 num UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 UNION ALL
SELECT 6 UNION ALL
SELECT 7 UNION ALL
SELECT 8 UNION ALL
SELECT 9 ) numbers
HAVING arr_values IS NOT NULL;
If the maximum array size is bounded but large (for example, 1000000), you can generate the numbers table dynamically with the appropriate number of rows. However, a stored procedure with iterative parsing and a temporary table is a safer solution.
UPDATE.
Non-limited solution (stored procedure).
DELIMITER $$
CREATE PROCEDURE get_unique ()
BEGIN
-- working copy of the arrays
CREATE TEMPORARY TABLE temp (val JSON);
INSERT INTO temp
SELECT val
FROM features;
-- collects the extracted elements
CREATE TEMPORARY TABLE tmp (val JSON);
cycle: LOOP
-- take the first element of every remaining array
INSERT IGNORE INTO tmp
SELECT JSON_EXTRACT(val, '$[0]')
FROM temp;
-- arrays with at most one element are now fully consumed
DELETE
FROM temp
WHERE JSON_EXTRACT(val, '$[1]') IS NULL;
-- drop the element that was just collected
UPDATE temp
SET val = JSON_REMOVE(val, '$[0]');
IF 0 = (SELECT COUNT(*)
FROM temp) THEN
LEAVE cycle;
END IF;
END LOOP;
DROP TEMPORARY TABLE temp;
SELECT DISTINCT *
FROM tmp
WHERE val IS NOT NULL;
DROP TEMPORARY TABLE tmp;
END$$
DELIMITER ;
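To make the passes of get_unique() easier to follow, here is a pure-Python simulation of the same loop. It is a sketch for illustration, not MySQL code: json.loads stands in for the JSON column, and Python lists play the roles of the two temporary tables.

```python
import json


def unique_array_values(json_docs):
    """Simulate get_unique(): collect element [0] of every remaining array,
    discard arrays that have no element [1], strip element [0], repeat."""
    temp = [json.loads(doc) for doc in json_docs]  # role of the `temp` table
    collected = []  # role of the `tmp` table
    while temp:
        # INSERT ... SELECT JSON_EXTRACT(val, '$[0]'): NULL for an empty array
        collected.extend(arr[0] if arr else None for arr in temp)
        # DELETE ... WHERE '$[1]' IS NULL, then UPDATE ... JSON_REMOVE(val, '$[0]')
        temp = [arr[1:] for arr in temp if len(arr) > 1]
    # SELECT DISTINCT * FROM tmp WHERE val IS NOT NULL
    result = []
    for value in collected:
        if value is not None and value not in result:
            result.append(value)
    return result
```

With the four rows from the question, the loop runs three passes (the longest array has three elements) and returns apple, orange, banana, and pineapple, each once.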

MySQL Query INNER JOIN with aliases

I have two tables: users and users_info
users looks like this:
+----+----------+-------+
| id | slug | name |
+----+----------+-------+
| 1 | theploki | Kris |
+----+----------+-------+
and users_info looks like this:
+----+--------+----------+---------------+
| id | parent | info_key | info_val |
+----+--------+----------+---------------+
| 1 | 1 | email | kris@kris.com |
+----+--------+----------+---------------+
| 2 | 1 | age | 28 |
+----+--------+----------+---------------+
I want to SELECT a user who has user_info email = 'kris@kris.com'
- and -
return ALL user_info values and users values
Here's the result I'm looking for:
+----+----------+-------+---------------+-----+
| id | slug | name | email | age |
+----+----------+-------+---------------+-----+
| 1 | theploki | Kris | kris@kris.com | 28 |
+----+----------+-------+---------------+-----+
So far the closest I've gotten is with this query:
SELECT users.*, users_info.* FROM users
INNER JOIN users_info on users_info.parent = users.id
where users.id = (SELECT users_info.parent FROM users_info
WHERE users_info.parent = users.id
AND users_info.info_val = 'kris@kris.com')
And it returns this result:
+----+----------+-------+----+--------+----------+---------------+
| id | slug | name | id | parent | info_key | info_val |
+----+----------+-------+----+--------+----------+---------------+
| 1 | theploki | Kris | 1 | 1 | email | kris@kris.com |
+----+----------+-------+----+--------+----------+---------------+
| 1 | theploki | Kris | 2 | 1 | age | 28 |
+----+----------+-------+----+--------+----------+---------------+
Obviously I don't need the id of the users_info result and I want each info_key to be the "alias" (or column name) and each info_val to be the value for that "alias".
For this case, you can do it like this ;) It's just a simple table pivot:
select
users.id,
users.slug,
users.name,
max(if(users_info.info_key = 'email', users_info.info_val, null)) as email,
max(if(users_info.info_key = 'age', users_info.info_val, null)) as age
from users
inner join users_info
on users.id = users_info.parent
group by users.id
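The static pivot can also carry the email filter directly, in a HAVING clause, as the dynamic SQL in this answer does. A minimal sketch in Python, assuming sqlite3 as a stand-in database; it uses CASE instead of MySQL's IF() (CASE works in both), and the second user (Yoda) is borrowed from the test data further down.

```python
import sqlite3


def make_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, slug TEXT, name TEXT)")
    conn.execute(
        "CREATE TABLE users_info (id INTEGER PRIMARY KEY, parent INTEGER, "
        "info_key TEXT, info_val TEXT)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                     [(1, "theploki", "Kris"), (2, "Yoda", "Yoda")])
    conn.executemany(
        "INSERT INTO users_info (parent, info_key, info_val) VALUES (?, ?, ?)",
        [(1, "email", "kris@kris.com"), (1, "age", "28"),
         (2, "email", "yoda@starwars.com"), (2, "age", "910")],
    )
    return conn


# One MAX(CASE ...) per known info_key; HAVING keeps only the matching user.
PIVOT_BY_EMAIL = """
SELECT users.id, users.slug, users.name,
       MAX(CASE WHEN users_info.info_key = 'email' THEN users_info.info_val END) AS email,
       MAX(CASE WHEN users_info.info_key = 'age' THEN users_info.info_val END) AS age
FROM users
INNER JOIN users_info ON users.id = users_info.parent
GROUP BY users.id
HAVING MAX(CASE WHEN users_info.info_key = 'email' THEN users_info.info_val END) = ?
"""


def user_by_email(conn, email):
    return conn.execute(PIVOT_BY_EMAIL, (email,)).fetchall()
```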
If info_key is dynamic, you will need dynamic SQL to do this. Here is a sample:
SET @sql = NULL;
SELECT
GROUP_CONCAT(DISTINCT
CONCAT(
'max(if(users_info.info_key = ''',
users_info.info_key,
''', users_info.info_val, null)) as ',
users_info.info_key
)
) INTO @sql
FROM users
inner join users_info
on users.id = users_info.parent
;
SET @sql = CONCAT('select users.id, users.slug, users.name, ', @sql, ' FROM users
inner join users_info on users.id = users_info.parent group by users.id having email = \'kris@kris.com\'');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
This answer relies on a schema change (the datatype column in users_info) to support casting the data that comes back, and it hinges on a stored procedure.
The maximum length of a GROUP_CONCAT result is governed by the following variable, whose default is rather low (1024 bytes):
set session group_concat_max_len = 20000;
Embed that statement at the top of your stored procedure, right after BEGIN (see the group_concat_max_len entry in the MySQL manual). The value can be very large; the maximum is at least 4 GB.
Schema
drop table if exists users;
create table users
(
id int auto_increment primary key,
slug varchar(100) not null,
name varchar(100) not null
-- other indexes here like uniqueness, etc (btw none added)
);
drop table if exists users_info;
create table users_info
(
id int auto_increment primary key,
parent int not null,
info_key varchar(100) not null,
info_val varchar(100) not null,
datatype varchar(100) not null, -- see http://stackoverflow.com/a/8537070/ (DATA TYPES)
-- other indexes here (btw none added)
-- FK below:
foreign key `ui_2_users_9283` (parent) references users(id) -- I guess
);
Load test data:
-- delete from users; -- note truncate disallowed on parent with an FK (so delete !)
insert users(slug,name) values
('theploki','Kris'),
('Yoda','Yoda');
-- select * from users;
-- truncate table users_info;
insert users_info(parent,info_key,info_val,datatype) values
(1,'email','kris@kris.com','char(100)'),
(1,'age','28','unsigned'),
(2,'birthdate','1996-02-14','date'),
(2,'email','yoda@starwars.com','char(100)'),
(2,'networth','102504.12','decimal(12,2)'),
(2,'age','910','unsigned');
Stored Procedure:
drop procedure if exists fetchInfoKeysByEmailAddr;
DELIMITER $$
create procedure fetchInfoKeysByEmailAddr(emailAddr varchar(100))
BEGIN
set @parentid=-1;
select parent into @parentid
from users_info
where info_key='email' and info_val=emailAddr;
if @parentid>0 then
-- http://stackoverflow.com/a/8537070/ (DATA TYPES)
SELECT GROUP_CONCAT(concat('cast("',info_val,'" as ',datatype,') as ',info_key)
ORDER BY info_key SEPARATOR ',') into @tail
FROM users_info
where parent=@parentid
GROUP BY parent;
set @final:=concat("select id,slug,name,",@tail,' from users where id=',@parentid);
PREPARE stmt1 FROM @final;
EXECUTE stmt1;
DEALLOCATE PREPARE stmt1;
end if;
END$$
DELIMITER ;
Test:
call fetchInfoKeysByEmailAddr('x');
-- user info does not exist, empty (todo: change accordingly)
call fetchInfoKeysByEmailAddr('kris@kris.com');
+----+----------+------+-----+---------------+
| id | slug | name | age | email |
+----+----------+------+-----+---------------+
| 1 | theploki | Kris | 28 | kris@kris.com |
+----+----------+------+-----+---------------+
call fetchInfoKeysByEmailAddr('yoda@starwars.com');
+----+------+------+-----+------------+-------------------+-----------+
| id | slug | name | age | birthdate | email | networth |
+----+------+------+-----+------------+-------------------+-----------+
| 2 | Yoda | Yoda | 910 | 1996-02-14 | yoda@starwars.com | 102504.12 |
+----+------+------+-----+------------+-------------------+-----------+
Because of the CAST embedded in the SELECT, the data comes back in its native, intended data type, which means you can work with it directly.
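If you would rather keep the dynamic SQL simple and do the typing on the client, the same datatype column can drive the conversion there instead of CAST. A sketch in Python; the CONVERTERS mapping is hypothetical and only covers the datatype strings used in the test data above.

```python
from datetime import date
from decimal import Decimal

# Hypothetical mapping from the datatype strings stored in users_info to
# Python converters; extend it to match the types you actually store.
CONVERTERS = {
    "char(100)": str,
    "unsigned": int,
    "date": date.fromisoformat,
    "decimal(12,2)": Decimal,
}


def typed_info(rows):
    """rows: (info_key, info_val, datatype) tuples for one parent id."""
    result = {}
    for info_key, info_val, datatype in rows:
        convert = CONVERTERS.get(datatype, str)  # unknown types stay strings
        result[info_key] = convert(info_val)
    return result
```

Decimal is used for the money column so that values like 102504.12 are not rounded through a float.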

MySQL & InnoDB: dynamically alter the AUTO_INCREMENT of a table

I have a problem. For example, my system has the following table:
CREATE TABLE `sales` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`amount` FLOAT NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
-- the real table is more complex
With content:
+-----+-------+
| id | amount|
+-----+-------+
|2023 | 100 |
|2024 | 223 |
|2025 | 203 |
|... |
|2505 | 324 |
+-----+-------+
I don't know the current id (there are sales every day). I'm trying to normalize the table:
UPDATE sales SET id=id - 2022;
Result:
+-----+-------+
| id | amount|
+-----+-------+
| 1 | 100 |
| 2 | 223 |
| 3 | 203 |
|... |
| 482 | 324 |
+-----+-------+
The problem
My problem is changing the AUTO_INCREMENT, e.g.:
ALTER TABLE sales AUTO_INCREMENT = 483;
That works, but I don't know the current id, so I tried the following query:
ALTER TABLE sales AUTO_INCREMENT = (SELECT MAX(id) FROM sales );
This causes an error (#1064). The documentation says:
In MySQL, you cannot modify a table and select from the same table in a subquery.
http://dev.mysql.com/doc/refman/5.7/en/subqueries.html
I tried with variables:
SET @new_index = (SELECT MAX(id) FROM sales );
ALTER TABLE sales AUTO_INCREMENT = @new_index;
But this also causes an error.
ALTER TABLE must have literal values in it by the time the statement is parsed (i.e. at prepare time).
You can't put variables or parameters into the statement at parse time, but you can put variables into the statement before parse time. And that means using dynamic SQL:
SET @new_index = (SELECT MAX(id) FROM sales );
SET @sql = CONCAT('ALTER TABLE sales AUTO_INCREMENT = ', @new_index);
PREPARE st FROM @sql;
EXECUTE st;
DEALLOCATE PREPARE st;
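The same "put the literal in before parse time" rule applies when the ALTER is issued from application code: bound placeholders do not work in DDL, so the number has to be spliced into the statement text. A sketch in Python, assuming sqlite3 only to compute MAX(id); the returned string is meant for MySQL (SQLite has no `AUTO_INCREMENT =` clause) and would be sent through your MySQL driver, which is not shown. The 482 sample ids are hypothetical, mirroring the renumbered table above.

```python
import sqlite3


def make_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
    # Hypothetical rows with ids 1..482, as after the renumbering above.
    conn.executemany("INSERT INTO sales (id, amount) VALUES (?, ?)",
                     [(i, 100.0) for i in range(1, 483)])
    return conn


def auto_increment_sql(conn):
    # Step 1: an ordinary query reads the current maximum id.
    (max_id,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM sales").fetchone()
    # Step 2: DDL takes no placeholders, so build a literal. int() ensures
    # that nothing but a number ends up in the statement text.
    next_id = int(max_id) + 1
    return f"ALTER TABLE sales AUTO_INCREMENT = {next_id}"
```

The `+ 1` mirrors the asker's final version further down, which sets AUTO_INCREMENT to MAX(id) + 1.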
Thanks to Bill Karwin, my final query was:
SET @sales_init = 2022;
DELETE FROM `sales` WHERE `sales`.`id` <= @sales_init;
UPDATE sales SET id=id - @sales_init;
-- set new index for sales
SET @new_init = (SELECT MAX(id) + 1 FROM sales );
SET @query = CONCAT("ALTER TABLE sales AUTO_INCREMENT = ", @new_init);
PREPARE stmt FROM @query;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

Grouping/clustering long repeating results into columns of data

I am collecting some information in MySQL instead of Excel. Some labels are defined for each cell type, and not all labels may be present. So I have three tables: labels, information, and cells.
select cell_name, label, information from onco_celldb_information as info
left join onco_celldb_cells as cell on cell.`celldb_cell_id` = info.`celldb_cell_id`
left join onco_celldb_labels as label on info.`celldb_label_id` = label.`celldb_label_id`
order by cell.celldb_cell_id asc;
which results in:
(screenshot of the query result: http://f.cl.ly/items/0m2k1a410s3D0K2Y0l1u/Screen%20Shot%202012-08-22%20at%2011.57.36%20AM.png)
However, what I want is something like this:
CellName Species CellType Origin
---------+-----------+-----------+-----------
P-815 Murine Mastroxxxx Human
L292 Something Megatrone Mouse
That is, I want the rows grouped by cell name, with the labels as columns. If a label is not present for a cell, that column should just be NULL.
What do you suggest?
Edit: here is the database structure:
mysql> describe celldb_cells;
+----------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+------------------+------+-----+---------+----------------+
| celldb_cell_id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| cell_name | varchar(256) | YES | | NULL | |
+----------------+------------------+------+-----+---------+----------------+
describe celldb_information;
+-----------------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------+------------------+------+-----+---------+----------------+
| celldb_information_id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| celldb_cell_id | int(11) unsigned | YES | MUL | NULL | |
| celldb_label_id | int(11) unsigned | NO | MUL | NULL | |
| information | text | YES | | NULL | |
+-----------------------+------------------+------+-----+---------+----------------+
describe celldb_labels;
+-----------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+----------------+
| celldb_label_id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| label | varchar(256) | YES | | NULL | |
+-----------------+------------------+------+-----+---------+----------------+
What you are trying to do is called a PIVOT. Unfortunately, MySQL does not have a PIVOT function, but you can replicate it using CASE expressions and an aggregate function.
If you know all of the labels ahead of time and the number of them is manageable, then you could hard-code them similar to this:
SELECT cell_name,
MAX(CASE WHEN label = 'Cell Type' THEN information END) 'Cell Type',
MAX(CASE WHEN label = 'DSMZ no.' THEN information END) 'DSMZ no.'
FROM test
GROUP BY cell_name
With your query, you would do something like:
SELECT cell_name,
MAX(CASE WHEN label = 'Cell Type' THEN information END) 'Cell Type',
MAX(CASE WHEN label = 'DSMZ no.' THEN information END) 'DSMZ no.'
from onco_celldb_information as info
left join onco_celldb_cells as cell
on cell.`celldb_cell_id` = info.`celldb_cell_id`
left join onco_celldb_labels as label
on info.`celldb_label_id` = label.`celldb_label_id`
GROUP BY cell_name
However, it looks like you are going to have an unknown number of columns, so you will want to use a prepared statement:
SET @sql = NULL;
SELECT
GROUP_CONCAT(DISTINCT
CONCAT(
'MAX(case when label = ''',
label,
''' then information end) AS ''',
label, ''''
)
) INTO @sql
FROM test;
SET @sql = CONCAT('SELECT cell_name, ', @sql, ' FROM test
group by cell_name');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
So for your specific example, it would be something like:
SET @sql = NULL;
SELECT
GROUP_CONCAT(DISTINCT
CONCAT(
'MAX(case when label = ''',
label,
''' then information end) AS ''',
label, ''''
)
) INTO @sql
FROM onco_celldb_labels;
SET @sql = CONCAT('SELECT cell_name, ', @sql, '
from onco_celldb_information as info
left join onco_celldb_cells as cell
on cell.`celldb_cell_id` = info.`celldb_cell_id`
left join onco_celldb_labels as label
on info.`celldb_label_id` = label.`celldb_label_id`
group by cell_name');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
If you know the number of labels, it is possible to "pivot" the data so that the labels become columns:
select cell_name,
max(case when info.celldb_label_id = 1 then information else NULL end) as LabelForInfo1,
max(case when info.celldb_label_id = 2 then information else NULL end) as LabelForInfo2,
max(case when info.celldb_label_id = 3 then information else NULL end) as LabelForInfo3,
..
from
onco_celldb_cells as cell
left join onco_celldb_information as info on cell.celldb_cell_id = info.celldb_cell_id
group by cell.celldb_cell_id, cell.cell_name
order by cell.celldb_cell_id asc;
If the number and names of the labels are not known, you can construct the query above dynamically from the information in onco_celldb_labels. First, generate the "dynamic" columns for the above query by executing the following query:
select concat(
'max(case when info.celldb_label_id = ',
convert(celldb_label_id,char),
' then information else NULL end) as `',
label,
'`,')
from celldb_labels
Now join all the returned rows into one string (dropping the trailing comma after the last one), add the beginning and end of the main query, and execute it. This way you get dynamic labels. As far as I know, it is the only way to pivot a table in MySQL.
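The "generate the columns, join them, wrap them in the main query" procedure can be sketched in application code. Below is a minimal version in Python, assuming sqlite3 as a stand-in and the three tables from the question's schema; the sample rows reuse the values from the question's desired output, with the label names "Species", "Cell Type", and "Origin" assumed. Joining with ", " sidesteps the trailing-comma problem of the per-row `,` suffix.

```python
import sqlite3


def make_connection():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE onco_celldb_cells (celldb_cell_id INTEGER PRIMARY KEY, cell_name TEXT);
        CREATE TABLE onco_celldb_labels (celldb_label_id INTEGER PRIMARY KEY, label TEXT);
        CREATE TABLE onco_celldb_information (
            celldb_information_id INTEGER PRIMARY KEY,
            celldb_cell_id INTEGER, celldb_label_id INTEGER, information TEXT);
        INSERT INTO onco_celldb_cells VALUES (1, 'P-815'), (2, 'L292');
        INSERT INTO onco_celldb_labels VALUES (1, 'Species'), (2, 'Cell Type'), (3, 'Origin');
        INSERT INTO onco_celldb_information (celldb_cell_id, celldb_label_id, information)
        VALUES (1, 1, 'Murine'), (1, 2, 'Mastroxxxx'), (1, 3, 'Human'),
               (2, 1, 'Something'), (2, 2, 'Megatrone'), (2, 3, 'Mouse');
    """)
    return conn


def dynamic_label_pivot(conn):
    # Step 1: one MAX(CASE ...) fragment per label, keyed on the label id.
    parts = ["cell.cell_name AS cell_name"]
    for label_id, label in conn.execute(
        "SELECT celldb_label_id, label FROM onco_celldb_labels ORDER BY celldb_label_id"
    ):
        alias = '"' + label.replace('"', '""') + '"'
        parts.append(
            f"MAX(CASE WHEN info.celldb_label_id = {int(label_id)} "
            f"THEN info.information END) AS {alias}"
        )
    # Step 2: join the fragments and add the beginning and end of the main query.
    sql = (
        "SELECT " + ", ".join(parts)
        + " FROM onco_celldb_cells AS cell"
        " LEFT JOIN onco_celldb_information AS info"
        " ON cell.celldb_cell_id = info.celldb_cell_id"
        " GROUP BY cell.celldb_cell_id, cell.cell_name"
        " ORDER BY cell.celldb_cell_id"
    )
    cur = conn.execute(sql)
    return [d[0] for d in cur.description], cur.fetchall()
```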
It's not a very pretty solution, but if you only want a couple of the labels as columns and you can specify which ones, something like this should work:
SELECT
s1.cell_name AS cell_name,
s2.information AS Species,
s3.information AS Origin
-- Keep adding selects here for more columns
FROM
(SELECT distinct cell_name FROM onco_celldb_information) AS s1
LEFT JOIN onco_celldb_information AS s2
ON (s1.cell_name = s2.cell_name AND s2.label = 'Species')
LEFT JOIN onco_celldb_information AS s3
ON (s1.cell_name = s3.cell_name AND s3.label = 'Origin')
-- Keep adding more joins here for further columns you want.
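The self-join approach can be checked quickly as well. The following is a sketch in Python, assuming sqlite3 and a hypothetical flattened onco_celldb_information table with the cell_name, label, and information columns the query above relies on (for example, the result of the question's join); the values are taken from the question's desired output.

```python
import sqlite3


def make_connection():
    # Hypothetical flattened table with the columns the query above assumes.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE onco_celldb_information (cell_name TEXT, label TEXT, information TEXT)"
    )
    conn.executemany("INSERT INTO onco_celldb_information VALUES (?, ?, ?)", [
        ("P-815", "Species", "Murine"), ("P-815", "Cell Type", "Mastroxxxx"),
        ("P-815", "Origin", "Human"), ("L292", "Species", "Something"),
        ("L292", "Cell Type", "Megatrone"), ("L292", "Origin", "Mouse"),
    ])
    return conn


# One LEFT JOIN per wanted label; a missing label leaves that column NULL.
SELF_JOIN_PIVOT = """
SELECT s1.cell_name AS cell_name,
       s2.information AS Species,
       s3.information AS Origin
FROM (SELECT DISTINCT cell_name FROM onco_celldb_information) AS s1
LEFT JOIN onco_celldb_information AS s2
       ON s1.cell_name = s2.cell_name AND s2.label = 'Species'
LEFT JOIN onco_celldb_information AS s3
       ON s1.cell_name = s3.cell_name AND s3.label = 'Origin'
ORDER BY s1.cell_name
"""


def self_join_pivot(conn):
    cur = conn.execute(SELF_JOIN_PIVOT)
    return [d[0] for d in cur.description], cur.fetchall()
```

Unlike the MAX(CASE ...) pivot, this version returns duplicate rows if a cell has the same label more than once, because each join multiplies the matching rows.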