CSV formatted GROUP_CONCAT in MySQL - mysql

Let's say I have a Table A that I want to transform into Table B.
The values in Table B should always be CSV-formatted text with the same number of fields.
First, I need to know the largest number of values that any single category holds (in this case, 3 values, in categories 1, 2 and 4).
Second, I need to use that number to "pad" the end of the GROUP_CONCAT with empty fields (",") whenever a category has "missing" values.
I need this to get a "consistent" CSV in each cell; the application I'm using to process this data doesn't handle CSVs whose column count varies by row.
Table A
+----+----------+-------+
| id | category | value |
+----+----------+-------+
|  1 |        1 | a     |
|  2 |        1 | b     |
|  3 |        1 | c     |
|  4 |        2 | d     |
|  5 |        2 | e     |
|  6 |        2 | f     |
|  7 |        3 | g     |
|  8 |        3 | h     |
|  9 |        4 | i     |
| 10 |        4 | j     |
| 11 |        4 | k     |
| 12 |        5 | l     |
+----+----------+-------+
Table B
+--------------+---------------------+
| id(category) | value(group_concat) |
+--------------+---------------------+
| 1            | a,b,c               |
| 2            | d,e,f               |
| 3            | g,h,                |
| 4            | i,j,k               |
| 5            | l,,                 |
+--------------+---------------------+
EDITED (SQLFiddle):
http://sqlfiddle.com/#!2/825f8

First, to get the largest number of values that a given category holds:
select count(category) from tableA group by category order by count(category) desc limit 1;
Second, to add empty fields (",") to the end of the GROUP_CONCAT when a category has "missing" values, I created a function called unify_length. This is the function:
delimiter $$
CREATE FUNCTION `unify_length`(csv_list CHAR(255), length INT) RETURNS char(255)
DETERMINISTIC
BEGIN
WHILE (LENGTH(csv_list) - LENGTH(REPLACE(csv_list, ',', '')) < length - 1) DO /* count the number of comma occurrences in the string */
SET csv_list = CONCAT(csv_list, ',');
END WHILE;
RETURN csv_list;
END$$
And this is the function call:
select category, unify_length(GROUP_CONCAT(value), length) from tablea group by category;
where length is the value returned by the first query.
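As a runnable sanity check, the same two-step approach can be sketched with SQLite through Python's sqlite3 module (SQLite's group_concat behaves like MySQL's GROUP_CONCAT for this purpose); the padding loop plays the role of the unify_length function:

```python
import sqlite3

# A sanity check of the two-step approach using SQLite, whose
# group_concat behaves like MySQL's GROUP_CONCAT here.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tableA (id INTEGER, category INTEGER, value TEXT)")
con.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?)",
    [(1, 1, "a"), (2, 1, "b"), (3, 1, "c"), (4, 2, "d"), (5, 2, "e"),
     (6, 2, "f"), (7, 3, "g"), (8, 3, "h"), (9, 4, "i"), (10, 4, "j"),
     (11, 4, "k"), (12, 5, "l")],
)

# Step 1: the largest number of values held by any category.
(max_len,) = con.execute(
    "SELECT COUNT(*) FROM tableA GROUP BY category "
    "ORDER BY COUNT(*) DESC LIMIT 1"
).fetchone()

# Step 2: pad each group's CSV with trailing commas so every cell has
# exactly max_len fields (the role played by unify_length in MySQL).
rows = []
for category, csv in con.execute(
    "SELECT category, GROUP_CONCAT(value) FROM tableA GROUP BY category"
):
    csv += "," * (max_len - 1 - csv.count(","))
    rows.append((category, csv))

print(rows)  # every cell now contains exactly max_len (= 3) fields
```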

Related

mysql: merge multiple rows into one

I have two tables: state_current and state_snapshots. state_current contains exactly 4 rows, the current values for 4 different keys:
+-----+-------+
| key | value |
+-----+-------+
| A   |     1 |
| B   |     2 |
| C   |     3 |
| D   |     4 |
+-----+-------+
Now, I want to add a row to state_snapshots that contains the value of each key in a separate column:
+---+---+---+---+
| A | B | C | D |
+---+---+---+---+
| 1 | 2 | 3 | 4 |
| 1 | 2 | 3 | 5 |
| 1 | 2 | 4 | 5 |
...
+---+---+---+---+
Of course, the keys never change in state_current, only the values. What MySQL query will create a row with the value of A from state_current in the first column, the value of B from state_current in the second, and so on?
I'm new to MySQL, so thanks in advance for any help!
The simplest answer I can think of is the following (note that `key` must be backquoted, since KEY is a reserved word in MySQL):
insert into state_snapshots(a,b,c,d)
values ( (select value from state_current where `key`='A'),
(select value from state_current where `key`='B'),
(select value from state_current where `key`='C'),
(select value from state_current where `key`='D')
);
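As a runnable sketch, the same scalar-subquery insert can be checked with SQLite through Python's sqlite3 (the two tables follow the question; "key" is quoted because it is a reserved word):

```python
import sqlite3

# Sketch of the scalar-subquery insert on the question's two tables;
# "key" is quoted because KEY is a reserved word.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE state_current ("key" TEXT PRIMARY KEY, value INTEGER)')
con.execute("CREATE TABLE state_snapshots (a INTEGER, b INTEGER, c INTEGER, d INTEGER)")
con.executemany('INSERT INTO state_current VALUES (?, ?)',
                [("A", 1), ("B", 2), ("C", 3), ("D", 4)])

# One scalar subquery per target column pivots the 4 rows into 1 row.
con.execute("""
    INSERT INTO state_snapshots (a, b, c, d)
    VALUES ((SELECT value FROM state_current WHERE "key" = 'A'),
            (SELECT value FROM state_current WHERE "key" = 'B'),
            (SELECT value FROM state_current WHERE "key" = 'C'),
            (SELECT value FROM state_current WHERE "key" = 'D'))
""")
snapshot = con.execute("SELECT * FROM state_snapshots").fetchone()
print(snapshot)  # (1, 2, 3, 4)
```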

About MySQL executing order by 1 and rand(1)

In MySQL I built a table whose id is int type and whose name and password are varchar type.
Executing
select * from test.new_table order by rand(1);
gives a fixed result. This is because after setting a seed for RAND the sequence is fixed; I already know that. But if I execute
select * from test.new_table order by 1 and rand(1);
I get a result I do not understand. In addition, if I execute ORDER BY 'xxx', the rows come back in their original arrangement.
I don't quite understand this; I hope you can give me some pointers.
You can view the results of the expression as part of your query:
mysql> select *, 1, rand(1), 1 and rand(1) from new_table order by 1 and rand(1);
+----+------+----------+---+---------------------+---------------+
| id | name | password | 1 | rand(1) | 1 and rand(1) |
+----+------+----------+---+---------------------+---------------+
|  1 | ghi  | 111      | 1 | 0.40540353712197724 |             1 |
|  3 | abc  | 234      | 1 |  0.1418603212962489 |             1 |
|  5 | 5    | 5        | 1 | 0.04671454713373868 |             1 |
|  7 | 7    | 7        | 1 |     0.6108337804776 |             1 |
|  2 | jkl  | 123      | 1 |  0.8716141803857071 |             1 |
|  4 | def  | 555      | 1 | 0.09445909605776807 |             1 |
|  6 | 6    | 6        | 1 |  0.9501954782290342 |             1 |
+----+------+----------+---+---------------------+---------------+
See how the boolean expression always results in 1?
As @Barmar described, any expression 1 and n results in either 0 or 1, depending on the value of n being zero or nonzero.
So your expression ORDER BY 1 AND RAND(1) is just like ORDER BY true (a constant expression) which means the ordering is a tie between every row, and MySQL orders them in an arbitrary way.
But arbitrary is not the same as random.
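The tie effect is easy to reproduce outside SQL: give every row the same sort key, and the "order" is whatever the sort implementation does with a total tie (a hypothetical illustration; Python's sort happens to be stable, while MySQL makes no such promise for tied ORDER BY keys):

```python
# All rows share the sort key 1, just like ORDER BY 1 AND RAND(1),
# which evaluates to 1 for every row: the sort sees a total tie.
rows = [("ghi", 111), ("abc", 234), ("jkl", 123)]
tied = sorted(rows, key=lambda r: 1)

# Python's sort is stable, so the tie preserves input order here;
# MySQL gives no such guarantee, so its tied order is merely arbitrary.
assert tied == rows
```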

Distinct order-number sequence for every customer

I have a table of orders. Each customer (identified by the email field) has his own orders. I need a separate sequence of order numbers for each customer. Here is an example:
+-----------------+--------+
| email           | number |
+-----------------+--------+
| test@com.com    |      1 |
| example@com.com |      1 |
| test@com.com    |      2 |
| test@com.com    |      3 |
| client@aaa.com  |      1 |
| example@com.com |      2 |
+-----------------+--------+
Is it possible to do that in a simple way with MySQL?
If you want to update data in this table after an insert, first of all you need a primary key; a simple auto-increment column does the job.
After that you can try to elaborate various scripts to fill the number column, but as you can see from the other answers, they are not such a "simple way".
I suggest assigning the order number in the insert statement, obtaining the order number with this "simpler" query:
select coalesce(max(`number`), 0)+1
from orders
where email='test1@test.com'
If you want to do everything in a single insert (better for performance and to avoid concurrency problems):
insert into orders (email, `number`, other_field)
select email, coalesce(max(`number`), 0) + 1 as number, 'note...' as other_field
from orders where email = 'test1@test.com';
To be more confident about not assigning two orders with the same number to the same customer, I strongly suggest adding a unique constraint on the columns (email, number).
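A runnable sketch of this insert-select pattern, using SQLite through Python's sqlite3 (the table layout and the unique constraint on (email, number) follow the answer; place_order is a hypothetical helper name):

```python
import sqlite3

# Sketch of the single-insert approach with the suggested unique
# constraint on (email, number); place_order is a hypothetical helper.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE orders (
    email TEXT, number INTEGER, other_field TEXT,
    UNIQUE (email, number))""")

def place_order(email, note):
    # One statement: the next number is the customer's current maximum
    # plus one, or 1 for a first order (COALESCE handles the empty case).
    con.execute("""
        INSERT INTO orders (email, number, other_field)
        SELECT ?, COALESCE(MAX(number), 0) + 1, ?
        FROM orders WHERE email = ?""", (email, note, email))

place_order("test@com.com", "first")
place_order("example@com.com", "first")
place_order("test@com.com", "second")
numbers = con.execute(
    "SELECT email, number FROM orders ORDER BY rowid").fetchall()
print(numbers)
# [('test@com.com', 1), ('example@com.com', 1), ('test@com.com', 2)]
```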
Create a column order_number, then:
SELECT @i:=1000;
UPDATE yourTable SET order_number = @i:=@i+1;
This keeps incrementing the value in the order_number column, starting right after 1000. You can change the starting value, or even use the primary key as the order number, since it is unique all the time.
I think you need one more column for this type of output.
Example
+------+------+
| i    | j    |
+------+------+
|    1 |   11 |
|    1 |   12 |
|    1 |   13 |
|    2 |   21 |
|    2 |   22 |
|    2 |   23 |
|    3 |   31 |
|    3 |   32 |
|    3 |   33 |
|    4 |   14 |
+------+------+
You can get this result:
+------+------+------------+
| i    | j    | row_number |
+------+------+------------+
|    1 |   11 |          1 |
|    1 |   12 |          2 |
|    1 |   13 |          3 |
|    2 |   21 |          1 |
|    2 |   22 |          2 |
|    2 |   23 |          3 |
|    3 |   31 |          1 |
|    3 |   32 |          2 |
|    3 |   33 |          3 |
|    4 |   14 |          1 |
+------+------+------------+
By running this query, which doesn't need any variable defined:
SELECT a.i, a.j, count(*) as row_number FROM test a
JOIN test b ON a.i = b.i AND a.j >= b.j
GROUP BY a.i, a.j
Hope that helps!
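The self-join numbering can be verified on the example data with SQLite through Python's sqlite3 (the alias is shortened to rn here, since ROW_NUMBER is a reserved word in newer MySQL versions):

```python
import sqlite3

# The self-join numbering from the answer: each row's number is how
# many rows in its group (same i) have j less than or equal to its own j.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (i INTEGER, j INTEGER)")
con.executemany("INSERT INTO test VALUES (?, ?)",
                [(1, 11), (1, 12), (1, 13), (2, 21), (2, 22), (2, 23),
                 (3, 31), (3, 32), (3, 33), (4, 14)])

result = con.execute("""
    SELECT a.i, a.j, COUNT(*) AS rn
    FROM test a
    JOIN test b ON a.i = b.i AND a.j >= b.j
    GROUP BY a.i, a.j
    ORDER BY a.i, a.j
""").fetchall()
print(result)
# [(1, 11, 1), (1, 12, 2), (1, 13, 3), (2, 21, 1), (2, 22, 2),
#  (2, 23, 3), (3, 31, 1), (3, 32, 2), (3, 33, 3), (4, 14, 1)]
```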
You can add the number using a SELECT statement, without adding any columns to the orders table.
Try this:
SELECT email,
(CASE email
WHEN @email
THEN @rownumber := @rownumber + 1
ELSE @rownumber := IF(@email := email, 1, 1) END) AS number
FROM orders
JOIN (SELECT @rownumber := 0, @email := '') AS t
ORDER BY email
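Worth noting: on MySQL 8.0+ (and SQLite 3.25+) the user-variable trick can be replaced by a window function, which avoids the evaluation-order pitfalls of @-variables; a sketch using SQLite through Python's sqlite3, with the question's example data:

```python
import sqlite3

# Window-function alternative (MySQL 8.0+ / SQLite 3.25+): number the
# orders per email without any session variables.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, email TEXT)")
con.executemany("INSERT INTO orders (email) VALUES (?)",
                [("test@com.com",), ("example@com.com",), ("test@com.com",),
                 ("test@com.com",), ("client@aaa.com",), ("example@com.com",)])

numbered = con.execute("""
    SELECT email,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS number
    FROM orders
    ORDER BY id
""").fetchall()
print(numbered)
# [('test@com.com', 1), ('example@com.com', 1), ('test@com.com', 2),
#  ('test@com.com', 3), ('client@aaa.com', 1), ('example@com.com', 2)]
```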

MySQL complex nth row selection

I have 2 tables:
Types
+----+----------+
| id | name     |
+----+----------+
|  1 | name1    |
|  2 | name2    |
|  3 | name3    |
|  4 | name4    |
|  5 | name5    |
|  6 | name6    |
|  7 | name7    |
| .. | ..       |
+----+----------+
Data
+-------+-------+
| id    | type  |
+-------+-------+
|     1 |     1 |
|     2 |     5 |
|     3 |     7 |
|     4 |     4 |
|     5 |     2 |
|     6 |     6 |
|     7 |     3 |
|     8 |     5 |
|     9 |     5 |
|    10 |     4 |
|    11 |     1 |
|    12 |     2 |
|    13 |     6 |
|    14 |     5 |
|   ... |   ... |
| 1...? | 1...? |
+-------+-------+
The Data table is very large; it contains millions of rows. I need to select 1000 rows, but the result has to come from the whole table, so I select every nth row. I've done this using the answer from How to select every nth row in mySQL starting at n, but I need to add some more logic: a select query that takes every nth row of each of the types. I guess this sounds complicated, so I'll try to describe what I would like to achieve:
Let's say there are 7 types and the Data table has 7M rows: 0.5M rows for each of types 1, 2, 3 and 1.5M rows for each of types 4, 5, 6, 7 (just to be clear, the intervals may not be equal for all the types).
I need 1000 records containing equal amounts of each type, so with 7 types each type can occur FLOOR(1000/7) = 142 times in the result, meaning I need to select 142 rows per type from the Data table;
For types 1, 2, 3, which contain 0.5M rows each, that is ROUND(0.5M / 142), i.e. every 3521st row;
For types 4, 5, 6, 7, which contain 1.5M rows each, that is ROUND(1.5M / 142), i.e. every 10563rd row;
So result would look something like this:
Result
+-------+------+
| id    | type |
+-------+------+
|     1 |    1 |
|  3522 |    1 |
|  7043 |    1 |
|    .. |   .. |
|    .. |    2 |
|    .. |    2 |
|    .. |   .. |
|    .. |    3 |
|    .. |    3 |
|    .. |   .. |
|    .. |    4 |
|    .. |    4 |
|    .. |   .. |
|    .. |    5 |
|    .. |    5 |
|    .. |   .. |
|    .. |    6 |
|    .. |    6 |
|    .. |   .. |
|    .. |    7 |
|    .. |    7 |
|    .. |   .. |
+-------+------+
I could do this simply in any programming language, with multiple queries returning each type's count from the Data table and then, after doing the maths, selecting only a single type at a time.
But I would like to do this purely in MySQL, using as few queries as possible.
EDIT
I'll try to explain in more detail what I want to achieve, with a real example.
I have a table with 1437823 rows. The table schema looks like this:
+---------+----------+------+-----+---------+----------------+
| Field   | Type     | Null | Key | Default | Extra          |
+---------+----------+------+-----+---------+----------------+
| id      | int(11)  | NO   | PRI | NULL    | auto_increment |
| type    | int(11)  | NO   |     | NULL    |                |
| counter | int(11)  | NO   |     | NULL    |                |
| time    | datetime | NO   |     | NULL    |                |
+---------+----------+------+-----+---------+----------------+
That table's type statistics are:
+------+-----------+
| Type | Row Count |
+------+-----------+
|    1 |    135160 |
|    2 |    291416 |
|    3 |    149863 |
|    4 |    296293 |
|    5 |    273459 |
|    6 |    275929 |
|    7 |     15703 |
+------+-----------+
(P.S. The number of types can change over time.)
Let's say I need to select sample data from a time interval. In the first version of the question I omitted time because I thought it insignificant, but now I think it might have some significance for ordering, to improve performance.
So anyway, I need to select a sample of approximately 1000 rows in which there is an equal chunk of data for each type, so the statistics of the end result would look like this:
I am selecting 1000 rows across 7 types, so ROUND(1000 / 7) = 143 rows per type;
+------+-----------+
| Type | Row Count |
+------+-----------+
|    1 |       143 |
|    2 |       143 |
|    3 |       143 |
|    4 |       143 |
|    5 |       143 |
|    6 |       143 |
|    7 |       143 |
+------+-----------+
So now I need to select 143 rows for each type, at equal gaps within the time interval. For a single type it would look something like this:
SET @start_date := '2014-04-06 22:20:21';
SET @end_date := '2015-02-20 16:20:58';
SET @nth := ROUND(
(SELECT COUNT(*) FROM data WHERE type = 1 AND time BETWEEN @start_date AND @end_date) / ROUND(1000 / (SELECT COUNT(*) FROM types))
);
SELECT r.*
FROM (SELECT * FROM data WHERE type = 1 AND time BETWEEN @start_date AND @end_date) r
CROSS
JOIN ( SELECT @i := 0 ) s
HAVING ( @i := @i + 1) MOD @nth = 1
Statistics:
+------+-----------+
| Type | Row Count |
+------+-----------+
|    1 |       144 |
+------+-----------+
This query gives me the needed results with tolerable performance, but I would need one query per type, which would decrease performance and require concatenating the results into a single data set afterwards (which is what I need for further processing), so I would like to do it in a single query, or at least get a single result set.
P.S. I can tolerate some row-count deviation in the result set as long as the type chunks are equal.
This should do what you want (tested on a table with 100 rows with TYPE=1, 200 rows with TYPE=2, 300 rows with TYPE=3, 400 rows with TYPE=4; with the value 10 in _c / 10, I get 40 rows, 10 of each type). Please check the performance, since I'm obviously using a smaller sample table than what you really have.
select * from
(select
@n := @n + 1 _n,
_c,
data.*
from
(select
type _t,
count(*) _c
from data
group by type) _1
inner join data on(_t = data.type)
inner join (select @n := 0) _2 order by data.type) _2
where mod(_n, floor(_c / 10)) = 0
order by type, id;
Although this gets the same number from each group, it isn't guaranteed to get the exact same number from each group, since there are obviously rounding inaccuracies introduced by the floor(_c / 10).
What you want is a stratified sample. A good way to get a stratified sample is to order the rows by the type and assign a sequential number -- the numbering does not have to start over for each type.
You can then get 1000 rows by taking each nth value:
select d.*
from (select d.*, (@rn := @rn + 1) as rn
from data d cross join
(select @rn := 0) vars
order by type
) d
where mod(rn, floor( @rn / 1000 )) = 1;
Note: The final comparison is getting 1 out of n rows to approximate 1000. It might be off by one or two depending on the number of values.
EDIT:
Oops, the above does a stratified sample that matches the original distribution of the types in the data. To get equal counts for each group, enumerate them randomly and choose the first "n" for each group:
select d.*
from (select d.*,
(@rn := if(@t = type, @rn + 1,
if(@t := type, 1, 1)
)
) as rn
from data d cross join
(select @rn := 0, @t := -1) vars
order by type, rand()
) d cross join
(select count(*) as numtypes from types) as t
where rn <= 1000 / numtypes;
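The equal-count stratified sample can also be sketched in plain Python to check the logic: shuffle within each type, then keep the first n rows per type (toy data; stratified_sample is a hypothetical helper, with random.shuffle standing in for ORDER BY type, rand()):

```python
import random
from collections import defaultdict

# Equal-count stratified sample: shuffle within each type, then keep
# the first n_per_type rows of every type. The slice plays the role of
# the rn <= 1000 / numtypes filter in the SQL.
def stratified_sample(rows, total, seed=1):
    groups = defaultdict(list)
    for row in rows:
        groups[row["type"]].append(row)
    n_per_type = round(total / len(groups))
    rng = random.Random(seed)
    sample = []
    for rows_of_type in groups.values():
        rng.shuffle(rows_of_type)      # random order within the type
        sample.extend(rows_of_type[:n_per_type])
    return sample

# Hypothetical toy data: 7000 rows spread evenly over 4 types.
data = [{"id": i, "type": 1 + i % 4} for i in range(7000)]
sample = stratified_sample(data, total=1000)
counts = defaultdict(int)
for row in sample:
    counts[row["type"]] += 1
print(dict(counts))  # {1: 250, 2: 250, 3: 250, 4: 250}
```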

Excluding null results and counting number of rows based on several elements

I have the tables 'template' and 'object' in a one-to-many relationship.
I need to know how many 'objects' share the same 'template'.
This is simple enough, but there are two columns in the object table, 'theme' and 'active'.
I need to add to my query so that it returns the count of:
How many objects share the same template.
and have different 'object.active's (active is boolean, and is never null)
(so the mere fact that three 'object's share the same 'template' does not by itself increment the count)
and have different object.theme's ('theme' is varchar and can be null)
'theme' will only have a value if object.active is true
excluding null object.themes
My biggest problem is that if the 'object.active' values all share the same value, '1' or '0', then the template should not add to the count; but if they all have the value '1' and each has a different object.theme, then they do need to add to the count.
So far I am at the following, but when I go through manually and count what the figure should be, I get a different result:
SELECT sum(tmpUse) FROM(
SELECT COUNT(*) AS tmpUse,tmp.title FROM templates tmp
LEFT JOIN assessmentinstances ai ON ai.template_id = tmp.id
GROUP BY tmp.id
HAVING tmpUse>1
AND COUNT(DISTINCT ai.data_theme)>1
AND COUNT(DISTINCT ai.mobile_ready)>1
) alias
template table
______
| id |
------
|  1 |
|  2 |
|  3 |
|  4 |
|  5 |
------
object table
_____________________________________
| id | template_id | active | theme |
|----|-------------|--------|-------|
| a  |           1 |      0 | null  |
| b  |           1 |      1 | x     |
| c  |           1 |      1 | y     |
| d  |           3 |      1 | x     |
| e  |           3 |      0 | null  |
| f  |           1 |      1 | z     |
| g  |           2 |      1 | z     |
| h  |           2 |      0 | null  |
| i  |           4 |      1 | y     |
| j  |           5 |      1 | z     |
| k  |           1 |      1 | x     |
| l  |           1 |      0 | null  |
| m  |           1 |      0 | null  |
| n  |           3 |      0 | null  |
| o  |           3 |      1 | x     |
|-----------------------------------|
The result I would hope from these tables would be:
id count
1 3
2 1
3 1
4 0
5 0
= 5
Template id 1 has 7 objects; the objects include both 0 and 1 actives, so we look at the themes. The themes associated are the following: null, x, y, z, x, null, null. We ignore nulls and duplicates, so this adds 3 to the count.
Template id 2 has 2 objects, one with active 1 and one with active 0; because these are different but there is only one distinct theme, we add 1 to the count.
Template id 3 has two active 1's and two active 0's, so we know at least one will be added to the count. Looking at their themes, they are the same, so nothing more is added; only 1 is added for template id 3.
Template ids 4 and 5 each have one object, so we know they will not add to the count.
So the output from the query will be:
'5'
Try this (the condition on distinct active values must go in a HAVING clause, since aggregates can't appear in WHERE, and the derived table needs an alias):
select sum(themeCount) from (
select object.template_id,
count(distinct case when object.theme is not null then object.theme end) as themeCount
from object
group by object.template_id
having count(distinct object.active) > 1
) t;
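This variant of the query (with the DISTINCT-active test moved into a HAVING clause) can be checked against the question's object table using SQLite through Python's sqlite3, which accepts the same aggregate syntax; the expected total is 5:

```python
import sqlite3

# Load the object table from the question and run the aggregate query.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE object (
    id TEXT, template_id INTEGER, active INTEGER, theme TEXT)""")
con.executemany("INSERT INTO object VALUES (?, ?, ?, ?)", [
    ("a", 1, 0, None), ("b", 1, 1, "x"), ("c", 1, 1, "y"),
    ("d", 3, 1, "x"), ("e", 3, 0, None), ("f", 1, 1, "z"),
    ("g", 2, 1, "z"), ("h", 2, 0, None), ("i", 4, 1, "y"),
    ("j", 5, 1, "z"), ("k", 1, 1, "x"), ("l", 1, 0, None),
    ("m", 1, 0, None), ("n", 3, 0, None), ("o", 3, 1, "x"),
])

# Per template: count distinct non-null themes, but only for templates
# whose objects have more than one distinct active value.
(total,) = con.execute("""
    SELECT SUM(themeCount) FROM (
        SELECT template_id,
               COUNT(DISTINCT CASE WHEN theme IS NOT NULL
                                   THEN theme END) AS themeCount
        FROM object
        GROUP BY template_id
        HAVING COUNT(DISTINCT active) > 1
    ) t
""").fetchone()
print(total)  # 5
```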