MySQL InnoDB performance with redundant values in table - mysql

Please consider a big table with N DECIMAL columns and M rows.
N could be > 50 and M could be > 1000000.
Many rows will hold exactly the same data, so there is heavy redundancy of tuples (apart from the id column).
This is the first, simple solution:
MAIN TABLE
+-----+------+------+-----+------+
| id | col1 | col2 | ... | colN |
+-----+------+------+-----+------+
| 1 | 0 | 0 | ... | 0 |
| 2 | 0 | 0 | ... | 0 |
| ... | ... | ... | ... | ... |
| M | 0 | 0 | ... | 0 |
+-----+------+------+-----+------+
The other solution with two tables could be:
MAIN TABLE
+-----+-------+
| id | refid |
+-----+-------+
| 1 | 1 |
| 2 | 1 |
| ... | ... |
| M | 2 |
+-----+-------+
REFERENCE TABLE (distinct tuples)
+-----+------+------+-----+------+
|refid| col1 | col2 | ... | colN |
+-----+------+------+-----+------+
| 1 | 0 | 0 | ... | 0 |
| 2 | 1 | 0 | ... | 1 |
| ... | ... | ... | ... | ... |
| P | 0 | 1 | ... | 1 |
+-----+------+------+-----+------+
Here REFERENCE TABLE contains every distinct tuple of column values and is joined to MAIN TABLE through refid.
My question is: which solution performs best when selecting and summing the columns?
SELECT SUM(m.col1), SUM(m.col2) ... SUM(m.colN)
FROM maintable m
or with the other structure
SELECT SUM(r.col1), SUM(r.col2) ... SUM(r.colN)
FROM maintable m
JOIN referencetable r ON m.refid = r.refid
Please consider that the second structure may need a big index to find duplicates before inserting new rows into REFERENCE TABLE.
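As a rough sketch of the second structure, the layout and the summing query could look like the following. Three DECIMAL columns stand in for col1..colN, and the `tuple_hash` column, the index names and the column precision are assumptions made for illustration, not part of the question. The hash column is suggested because a MySQL multiple-column index can cover at most 16 columns, so a unique index over 50+ columns is not available for duplicate detection. The query aggregates maintable per refid first and then weights each distinct tuple by its row count, which gives the same totals as the plain join as long as every refid exists in referencetable. Whether this is faster than the single wide table depends on how many distinct tuples there are, so it is worth checking with EXPLAIN on real data.

```sql
-- Hypothetical schema: three DECIMAL columns stand in for col1..colN.
CREATE TABLE referencetable (
    refid      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    col1       DECIMAL(12,4) NOT NULL,
    col2       DECIMAL(12,4) NOT NULL,
    col3       DECIMAL(12,4) NOT NULL,
    -- Assumed helper column: a digest of the tuple (computed by the
    -- application or with SHA2() over a canonical text form), so duplicate
    -- detection needs one fixed-width unique index.
    tuple_hash BINARY(32) NOT NULL,
    UNIQUE KEY uq_tuple_hash (tuple_hash)
) ENGINE=InnoDB;

CREATE TABLE maintable (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    refid INT UNSIGNED NOT NULL,
    KEY idx_refid (refid)
) ENGINE=InnoDB;

-- Count main rows per refid, then weight each distinct tuple by that count.
SELECT SUM(r.col1 * c.cnt) AS sum_col1,
       SUM(r.col2 * c.cnt) AS sum_col2,
       SUM(r.col3 * c.cnt) AS sum_col3
FROM referencetable AS r
JOIN (
    SELECT refid, COUNT(*) AS cnt
    FROM maintable
    GROUP BY refid
) AS c ON c.refid = r.refid;
```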

Related

Filter every column in MySQL

I have a database with three tables right now: equipement, equipement_stats, which contains the statistics of each equipement, and stats, which contains all types of statistics.
To retrieve an equipement by filter I run this query:
SELECT
*
FROM
`equipement`
INNER JOIN `equipement_stats` ON `equipement_stats`.`id_equipement` = `equipement`.`id_equipement`
INNER JOIN `stats` ON `stats`.`id_stats` = `equipement_stats`.`id_stats`
WHERE
`stats`.`id_stats` IN(1068, 1069)
GROUP BY
`equipement`.`id_equipement`
HAVING
COUNT(DISTINCT stats.id_stats) = 1
LIMIT 10
Tables are like this :
equipement
+---------------+-----------------+
| id_equipement | name_equipement |
+---------------+-----------------+
| 1 | one |
| 2 | two |
| 3 | three |
+---------------+-----------------+
equipement_stats
+---------------+-----------+---------------+
| id_equipement | id_stats | random_number |
+---------------+-----------+---------------+
| 1 | 2 | 0 |
| 1 | 4 | 0 |
| 1 | 1069 | 1 |
| 1 | 8 | 0 |
| _____________ | _________ | _____________ |
| 2 | 1070 | 2 |
| 2 | 1069 | 3 |
| 2 | 20 | 0 |
| 2 | 40 | 0 |
+---------------+-----------+---------------+
If the stat is 1068 or 1069 I must also filter on the column random_number, but the random_number value can be different for each stat, e.g. for 1070 and 1069. How can I look only for a precise id_stats with a precise random_number?
In my case, for example, I would like to filter on equipements that have the stat 1070 with random_number 2 and the stat 1069 with random_number 3, as in the 2nd entry.
Thank you for helping!
The easiest way to filter tuples is this:
WHERE (equipement_stats.id_stats, equipement_stats.random_number) IN ( (1068,2) , (1069,3) )
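A sketch of how that row-constructor filter could be combined with the grouping from the question, assuming the goal is to keep only equipements that match both pairs. The pairs (1070, 2) and (1069, 3) come from the question's example, and the aliases `e` and `es` are introduced here. The join to `stats` is left out because the filter only needs columns from equipement_stats.

```sql
SELECT e.id_equipement, e.name_equipement
FROM equipement AS e
INNER JOIN equipement_stats AS es
        ON es.id_equipement = e.id_equipement
-- Keep only rows whose (id_stats, random_number) pair is one of the wanted pairs.
WHERE (es.id_stats, es.random_number) IN ((1070, 2), (1069, 3))
GROUP BY e.id_equipement, e.name_equipement
-- Two distinct stats must survive the filter, i.e. both pairs matched.
HAVING COUNT(DISTINCT es.id_stats) = 2
LIMIT 10;
```

With the sample rows shown in the question, only equipement 2 satisfies both pairs. Changing the HAVING comparison to `>= 1` would instead return equipements that match at least one pair.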

How to determine what's changed between database records

Presume first, that the following table exists in a MySQL Database
|----|-----|-----|----|----|-----------|--------------|----|
| id | rid | ver | n1 | n2 | s1 | s2 | b1 |
|----|-----|-----|----|----|-----------|--------------|----|
| 1 | 1 | 1 | 0 | 1 | Hello | World | 0 |
| 2 | 1 | 2 | 1 | 1 | Hello | World | 0 |
| 3 | 1 | 3 | 0 | 0 | Goodbye | Cruel World | 0 |
| 4 | 2 | 1 | 0 | 0 | Hello | Doctor | 1 |
| 5 | 2 | 2 | 0 | 0 | Hello | Nurse | 1 |
| 6 | 3 | 1 | 0 | 0 | Dippity | Doo-Dah | 1 |
|----|-----|-----|----|----|-----------|--------------|----|
Question
How do I write a query to determine, for any given rid, what changed between the most recent version and the version immediately preceding it (if any), such that it produces something like this:
|-----|-----------------|-----------------|-----------------|
| rid | numbers_changed | strings_changed | boolean_changed |
|-----|-----------------|-----------------|-----------------|
| 1 | TRUE | TRUE | FALSE |
| 2 | FALSE | TRUE | FALSE |
| 3 | n/a | n/a | n/a |
|-----|-----------------|-----------------|-----------------|
I think that I should be able to do this by doing a cross-join between the table and itself but I can't resolve how to perform this join to get the desired output.
I need to generate this "report" for a table with tens of columns and 1-10 versions of hundreds of records (resulting in thousands of rows). Note that the particular design of the database is not my own, and altering the structure of the database (at this time) is not an acceptable approach.
The actual format of the output isn't important and, if it simplifies the query, getting a "full breakdown" of what changed for each "change set" would also be acceptable, for example:
|-----|-----|-----|----|----|----|----|----|
| rid | old | new | n1 | n2 | s1 | s2 | b1 |
|-----|-----|-----|----|----|----|----|----|
| 1 | 1 | 2 | Y | N | N | N | N |
| 1 | 2 | 3 | Y | Y | Y | Y | N |
| 2 | 4 | 5 | N | N | N | Y | N |
|-----|-----|-----|----|----|----|----|----|
Note that it is also OK in this case to omit rid records which only have a single version: for the purposes of this report I only care about records that have changed, and getting a separate list of records that haven't changed is an easy query.
You can join every row with the following one like this:
select *
from history h1
join history h2
on h2.rid = h1.rid
and h2.id = (
select min(h.id)
from history h
where h.rid = h1.rid
and h.id > h1.id
);
Then you just need to compare every column from the two rows like h1.n1 <> h2.n1 as n1.
The full query would be:
select h1.rid, h1.id as old, h2.id as new
, h1.n1 <> h2.n1 as n1
, h1.n2 <> h2.n2 as n2
, h1.s1 <> h2.s1 as s1
, h1.s2 <> h2.s2 as s2
, h1.b1 <> h2.b1 as b1
from history h1
join history h2
on h2.rid = h1.rid
and h2.id = (
select min(h.id)
from history h
where h.rid = h1.rid
and h.id > h1.id
);
Result:
| rid | old | new | n1 | n2 | s1 | s2 | b1 |
|-----|-----|-----|----|----|----|----|----|
| 1 | 1 | 2 | 1 | 0 | 0 | 0 | 0 |
| 1 | 2 | 3 | 1 | 1 | 1 | 1 | 0 |
| 2 | 4 | 5 | 0 | 0 | 0 | 1 | 0 |
Demo: http://sqlfiddle.com/#!9/2e5d12/5
If the columns can contain NULLs, you might need something like NOT (h1.n1 <=> h2.n1) as n1. <=> is a NULL-safe equality check.
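To illustrate why the NULL-safe operator matters here, this small standalone query compares the two forms on literal values (the column aliases are made up for the example):

```sql
SELECT NULL <> 1           AS plain_diff,  -- NULL: the change is not reported
       NOT (NULL <=> 1)    AS safe_diff,   -- 1: NULL vs 1 counts as a change
       NOT (NULL <=> NULL) AS both_null;   -- 0: NULL vs NULL counts as unchanged
```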
If the version within a rid group is guaranteed to be consecutive, you can simplify the JOIN to
from history h1
join history h2
on h2.rid = h1.rid
and h2.ver = h1.ver + 1
Demo: http://sqlfiddle.com/#!9/2e5d12/7
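If only the latest change per rid is needed, grouped the way the first desired output in the question shows, the same join could be narrowed as in the sketch below. It assumes consecutive versions and the `history` table name used above. The aliases `cur` and `prev` are introduced here, and the plain `<>` comparisons assume the columns are never NULL.

```sql
SELECT cur.rid,
       (cur.n1 <> prev.n1) OR (cur.n2 <> prev.n2) AS numbers_changed,
       (cur.s1 <> prev.s1) OR (cur.s2 <> prev.s2) AS strings_changed,
       (cur.b1 <> prev.b1)                        AS boolean_changed
FROM history AS cur
JOIN history AS prev
  ON prev.rid = cur.rid
 AND prev.ver = cur.ver - 1          -- the version immediately before
WHERE cur.ver = (                    -- only the most recent version per rid
    SELECT MAX(h.ver)
    FROM history AS h
    WHERE h.rid = cur.rid
);
```

On the sample table this yields 1, 1, 0 for rid 1 and 0, 1, 0 for rid 2. rid 3 is omitted because it has no previous version. Note that string comparison follows the column collation, so with a case-insensitive collation a change in letter case alone is not reported.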

count record from two tables which has no relation

I have two tables, tbl_user1 and tbl_user2. Both have the same field names, but there is no relation between the tables. Now I want to find the total referred count from both tables, for example...
tbl_user1
-----------------------
UID | referenceBy | firstName | lastName | emailAddress
----------------------------------------------------------------------------
1 | NULL | aa1 | ab1 | aa1#email.com
2 | aa1#email.com | aa2 | ab2 | aa2#email.com
3 | NULL | aa3 | ab3 | aa3#email.com
4 | aa2#email.com | aa4 | ab4 | aa4#email.com
5 | aa2#email.com | aa5 | ab5 | aa5#email.com
6 | bb1#email.com | aa6 | ab6 | aa6#email.com
7 | bb2#email.com | aa7 | ab7 | aa7#email.com
8 | bb3#email.com | aa8 | ab8 | aa8#email.com
9 | bb3#email.com | aa9 | ab9 | aa9#email.com
and the second table is something like...
tbl_user2
-----------------------
UID | referenceBy | firstName | lastName | emailAddress
----------------------------------------------------------------------------
1 | NULL | bb1 | bc1 | bb1#email.com
2 | bb1#email.com | bb2 | bc2 | bb2#email.com
3 | NULL | bb3 | bc3 | bb3#email.com
4 | bb3#email.com | bb4 | bc4 | bb4#email.com
5 | bb2#email.com | bb5 | bc5 | bb5#email.com
6 | bb1#email.com | bb6 | bc6 | bb6#email.com
7 | aa2#email.com | bb7 | bc7 | bb7#email.com
8 | aa3#email.com | bb8 | bc8 | bb8#email.com
9 | bb5#email.com | bb9 | bc9 | bb9#email.com
Now, as you can see, there is no relation between these two tables, and I want a result like the following:
MAIN_RESULT_THAT_I_WANT
-----------------------
referenceEmail | referenceEmailCount
----------------------------------------------------------------------------
aa1#email.com | 1
aa2#email.com | 3
aa3#email.com | 1
aa4#email.com | 0
aa5#email.com | 0
aa6#email.com | 0
aa7#email.com | 0
aa8#email.com | 0
aa9#email.com | 0
bb1#email.com | 3
bb2#email.com | 2
bb3#email.com | 3
bb4#email.com | 0
bb5#email.com | 1
bb6#email.com | 0
bb7#email.com | 0
bb8#email.com | 0
bb9#email.com | 0
The result lists every emailAddress of every user together with the total number of users registered through that particular emailAddress.
I am guessing that the result you want is just copied and pasted, since it seems inaccurate. As HoneyBadger says, it is strange that aa6 is missing and still in the result; does that indicate you have another list you are not telling us about? Or did you just write the result in Notepad...
If you just want a list of emails and count this will work:
select referenceBy, count(1) as referenceEmailCount from (
select referenceBy from tbl_user1
union all
select referenceBy from tbl_user2
) as t
group by referenceBy
Otherwise give us more info if this is not what you need.
Since the schema is the same for both tables, you can UNION them to get the combined rows and use an outer query to get the total count.
select alias.referenceBy as referenceEmail, count(*) as referenceEmailCount from (
select * from tbl_user1
union all
select * from tbl_user2
) as alias
group by alias.referenceBy
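Neither query above lists addresses that referred nobody. If the zero rows from the desired result are needed, one possible sketch is to build the list of all email addresses first and LEFT JOIN the referrals onto it (the aliases `u` and `r` are introduced here):

```sql
SELECT u.emailAddress       AS referenceEmail,
       COUNT(r.referenceBy) AS referenceEmailCount  -- counts matches only, so 0 when none
FROM (
    -- every known address, de-duplicated across both tables
    SELECT emailAddress FROM tbl_user1
    UNION
    SELECT emailAddress FROM tbl_user2
) AS u
LEFT JOIN (
    -- every referral, duplicates kept so each one is counted
    SELECT referenceBy FROM tbl_user1
    UNION ALL
    SELECT referenceBy FROM tbl_user2
) AS r ON r.referenceBy = u.emailAddress
GROUP BY u.emailAddress
ORDER BY u.emailAddress;
```

Rows with a NULL referenceBy never match the join condition, so they do not inflate any count.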

Merge 2 different mysql tables

I have two tables:
Table a:
+----+------+
| id | data |
+----+------+
| 1 | 450 |
| 2 | 500 |
| 3 | 550 |
| 4 | 600 |
| 5 | 650 |
+----+------+
Table b:
+----+------+------+
| id | a_id | note |
+----+------+------+
| 1 | 2 | 25 |
| 2 | 5 | 10 |
+----+------+------+
I need a query that returns a table that consists of every row from table a with the notes from table b. I want 0 filled in where a note isn't available on a row. I want it to look like this:
+----+------+------+
| id | data | note |
+----+------+------+
| 1 | 450 | 0 |
| 2 | 500 | 25 |
| 3 | 550 | 0 |
| 4 | 600 | 0 |
| 5 | 650 | 10 |
+----+------+------+
How do I do that?
select a.id, a.data, coalesce(b.note, 0) as note
from a
left join b on a.id = b.a_id
What you are looking for is called a LEFT/RIGHT JOIN. This question will give you more details about what they are.
Assume you have a query like:
SELECT * FROM a LEFT JOIN b ON some_condition;
Then, its output will contain every row from table a, along with data from table b where the condition is met. For rows where the condition is not met, the columns with data from b will contain null.

MySQL - how to do string comparisons in a query?

+--------------------+---------------+------+-----+---------+-------+
| ID | GKEY |GOODS | PRI | COUNTRY | Extra |
+--------------------+---------------+------+-----+---------+-------+
| 1 | BOOK-1 | 1 | 10 | | |
| 2 | PHONE-1 | 2 | 12 | | |
| 3 | BOOK-2 | 1 | 13 | | |
| 4 | BOOK-3 | 1 | 10 | | |
| 5 | PHONE-2 | 2 | 10 | | |
| 6 | PHONE-3 | 2 | 20 | | |
| 7 | BOOK-10 | 2 | 20 | | |
| 8 | BOOK-11 | 2 | 20 | | |
| 9 | BOOK-20 | 2 | 20 | | |
| 10 | BOOK-21 | 2 | 20 | | |
| 11 | PHONE-30 | 2 | 20 | | |
+--------------------+---------------+------+-----+---------+-------+
Above is my table. I want to get all records where GKEY > BOOK-2. Who can tell me the expression in MySQL?
Using " WHERE GKEY>'BOOK-2' " does not give the correct results.
How about (something like):
(this is MSSQL - I guess it will be similar in MySQL)
select
*
from
(
select
*,
idx = convert(int,replace(GKEY,'BOOK-',''))
from table
where
GKEY like 'BOOK%'
) sub
where
sub.idx > 2
By way of explanation: The inner query basically recreates your table, but only for BOOK rows, and with an extra column containing the index in the right data type to make a greater than comparison work numerically.
Alternatively something like this:
select
*
from table
where
(
case
when GKEY like 'BOOK%' then
case when convert(int,replace(GKEY,'BOOK-','')) > 2 then 1
else 0
end
else 0
end
) = 1
Essentially the problem is that you need to check for BOOK before you turn the index into a numeric, as the other values of GKEY would create an error (without doing some clunky string handling).
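For MySQL itself, the same idea could be sketched with SUBSTRING_INDEX and CAST. Here `table` is only a placeholder name, as in the queries above, and the sketch assumes every GKEY has the form PREFIX-number.

```sql
SELECT *
FROM `table`
WHERE GKEY LIKE 'BOOK-%'
  -- take the part after the last '-' and compare it as a number, not as text
  AND CAST(SUBSTRING_INDEX(GKEY, '-', -1) AS UNSIGNED) > 2;
```

On the sample table this returns BOOK-3, BOOK-10, BOOK-11, BOOK-20 and BOOK-21. The plain string comparison fails because 'BOOK-10' sorts before 'BOOK-2' character by character.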
SELECT * FROM `table` AS `t1` WHERE `t1`.`id` > (SELECT `id` FROM `table` AS `t2` WHERE `t2`.`GKEY`='BOOK-2' LIMIT 1)