Strange results from two SQL queries - MySQL

Query #1:
SELECT SUM(size)
FROM RepoSize s
LEFT JOIN VirtualRepo v ON s.repo_id = v.repo_id
WHERE v.repo_id IS NULL;
+----------------+
| SUM(size) |
+----------------+
| 61550890457198 |
+----------------+
1 row in set (0.32 sec)
Query #2:
SELECT SUM(size)
FROM RepoSize
WHERE repo_id NOT IN (SELECT repo_id FROM VirtualRepo);
+----------------+
| SUM(size) |
+----------------+
| 61551148262106 |
+----------------+
1 row in set (0.45 sec)
I thought the two queries would return the same result, but the second value is larger than the first, even though repo_id is the primary key in both tables.
Table structure:
mysql> desc RepoSize;
+---------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+---------------------+------+-----+---------+-------+
| repo_id | char(37) | NO | PRI | NULL | |
| size | bigint(20) unsigned | YES | | NULL | |
| head_id | char(41) | YES | | NULL | |
+---------+---------------------+------+-----+---------+-------+
mysql> desc VirtualRepo;
+-------------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+----------+------+-----+---------+-------+
| repo_id | char(36) | NO | PRI | NULL | |
| origin_repo | char(36) | YES | MUL | NULL | |
| path | text | YES | | NULL | |
| base_commit | char(40) | YES | | NULL | |
+-------------+----------+------+-----+---------+-------+

As repo_id is the primary key in both tables, neither column can hold NULLs, so NULL handling cannot explain the difference. (If VirtualRepo.repo_id did contain a NULL, NOT IN would never evaluate to TRUE, and the second SUM would be NULL rather than larger.) So both queries should return the same result unless the data changed between the two executions, which is most probably what happened here.
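The NULL argument above can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (both apply the same three-valued logic to NOT IN) and hypothetical three-row data; VirtualRepo is deliberately declared nullable here, unlike the real table, only to show what a NULL would do.

```python
import sqlite3

def compare(virtual_ids):
    """Return (left_join_sum, not_in_sum) for small hypothetical tables."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE RepoSize (repo_id TEXT NOT NULL PRIMARY KEY, size INTEGER)")
    # Nullable on purpose (assumption for the demo; the real column is NOT NULL).
    con.execute("CREATE TABLE VirtualRepo (repo_id TEXT)")
    con.executemany("INSERT INTO RepoSize VALUES (?, ?)",
                    [("a", 10), ("b", 20), ("c", 30)])
    con.executemany("INSERT INTO VirtualRepo VALUES (?)", [(v,) for v in virtual_ids])

    # Anti-join: keep RepoSize rows that found no partner in VirtualRepo.
    left_join = con.execute(
        "SELECT SUM(size) FROM RepoSize s "
        "LEFT JOIN VirtualRepo v ON s.repo_id = v.repo_id "
        "WHERE v.repo_id IS NULL").fetchone()[0]

    # NOT IN: becomes UNKNOWN for every row once the subquery returns a NULL.
    not_in = con.execute(
        "SELECT SUM(size) FROM RepoSize "
        "WHERE repo_id NOT IN (SELECT repo_id FROM VirtualRepo)").fetchone()[0]
    con.close()
    return left_join, not_in

print(compare(["b"]))        # (40, 40): no NULLs, both forms agree
print(compare(["b", None]))  # (40, None): a NULL empties the NOT IN result
```

With non-NULL keys the two forms agree; a NULL makes the NOT IN sum smaller (NULL), never larger, which is consistent with ruling NULLs out here.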

Related

Why is SELECT COUNT() returning an aggregated number when I'm expecting a count of each row?

So I'm struggling with this very (very) basic MySQL query which is supposed to retrieve courrier records ordered by number of joined reactions.
I have this table:
mysql> describe courrier;
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(255) | NO | | NULL | |
| envoi | datetime | NO | | NULL | |
| intro | longtext | NO | | NULL | |
| courrier | longtext | NO | | NULL | |
| slug | varchar(255) | NO | | NULL | |
| categorie_id | int(11) | YES | MUL | NULL | |
| reponse | longtext | YES | | NULL | |
| recu | datetime | YES | | NULL | |
| published | tinyint(1) | NO | | NULL | |
| image_id | int(11) | YES | UNI | NULL | |
| like_count | int(11) | YES | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
12 rows in set (0.02 sec)
Which has:
mysql> select count(id) from courrier;
+-----------+
| count(id) |
+-----------+
| 56 |
+-----------+
1 row in set (0.00 sec)
Joined with:
mysql> describe reaction;
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| courrier_id | int(11) | YES | MUL | NULL | |
| date | datetime | NO | | NULL | |
| ip | varchar(15) | NO | | NULL | |
| reaction | longtext | NO | | NULL | |
| url | varchar(255) | YES | | NULL | |
| name | varchar(255) | NO | | NULL | |
| status | int(11) | NO | | NULL | |
| email | varchar(255) | YES | | NULL | |
+-------------+--------------+------+-----+---------+----------------+
9 rows in set (0.01 sec)
Which has:
mysql> select count(id) from reaction;
+-----------+
| count(id) |
+-----------+
| 236 |
+-----------+
1 row in set (0.00 sec)
On: ALTER TABLE reaction ADD CONSTRAINT FK_5DA165A18BF41DC7 FOREIGN KEY (courrier_id) REFERENCES courrier (id);
(backticks removed for readability)
So when I run this query:
SELECT c0_.id AS id_0,
c0_.name AS name_1,
c0_.slug AS slug_2,
c0_.envoi AS envoi_3,
c0_.intro AS intro_4,
c0_.courrier AS courrier_5,
c0_.reponse AS reponse_6,
c0_.published AS published_7,
c0_.like_count AS like_count_8,
c0_.recu AS recu_9,
COUNT(r1_.id) AS sclr_10,
c0_.image_id AS image_id_11,
c0_.categorie_id AS categorie_id_12
FROM courrier c0_
INNER JOIN reaction r1_ ON c0_.id = r1_.courrier_id
ORDER BY sclr_10 DESC LIMIT 25
I'm quite naturally expecting one row per record in courrier, along with an additional column giving the number of joined reaction records.
But I get back a single row (1 row in set (0.03 sec)): the first record inserted in courrier, with the additional column set to 242.
What did I do wrong?
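Use a GROUP BY clause; without it, COUNT() aggregates the whole result set into a single row. (With ONLY_FULL_GROUP_BY disabled, MySQL fills the non-aggregated columns from an indeterminate row, which is why one courrier record shows up.)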
SELECT c0_.id AS id_0 /*, ...*/,
COUNT(r1_.id) AS sclr_10
FROM courrier c0_
INNER JOIN reaction r1_ ON c0_.id = r1_.courrier_id
GROUP BY c0_.id
ORDER BY sclr_10 DESC
LIMIT 25
Note: if you are also interested in courrier records that have no corresponding record in reaction (count = 0), then use LEFT JOIN instead of INNER JOIN.
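The difference between the three variants can be seen in a minimal sketch. It uses Python's sqlite3 module as a stand-in for MySQL and hypothetical data: three courrier rows, four reactions, and one courrier (id 3) with no reactions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE courrier (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE reaction (id INTEGER PRIMARY KEY, courrier_id INTEGER REFERENCES courrier(id));
INSERT INTO courrier (id, name) VALUES (1, 'first'), (2, 'second'), (3, 'silent');
INSERT INTO reaction (courrier_id) VALUES (1), (2), (2), (2);
""")

# No GROUP BY: the whole join collapses into one aggregate row.
no_group = con.execute(
    "SELECT COUNT(r.id) FROM courrier c "
    "INNER JOIN reaction r ON c.id = r.courrier_id").fetchall()

# GROUP BY c.id: one row per courrier that has at least one reaction.
inner = con.execute(
    "SELECT c.id, COUNT(r.id) AS n FROM courrier c "
    "INNER JOIN reaction r ON c.id = r.courrier_id "
    "GROUP BY c.id ORDER BY n DESC LIMIT 25").fetchall()

# LEFT JOIN keeps courriers without reactions; COUNT(r.id) skips the NULLs, giving 0.
left = con.execute(
    "SELECT c.id, COUNT(r.id) AS n FROM courrier c "
    "LEFT JOIN reaction r ON c.id = r.courrier_id "
    "GROUP BY c.id ORDER BY n DESC LIMIT 25").fetchall()

print(no_group)  # [(4,)]
print(inner)     # [(2, 3), (1, 1)]
print(left)      # [(2, 3), (1, 1), (3, 0)]
```

Counting r.id rather than COUNT(*) is what makes the LEFT JOIN variant report 0 instead of 1 for a courrier without reactions.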

Empty results when the query uses SELECT * - The weirdest thing ever

I'm facing a weird issue with a query; I have never seen anything like this in more than 10 years of development.
An InnoDB table with 8M rows stores references between two other tables (pages and keywords):
+-----------------+---------------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+---------------------+------+-----+---------------------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| page_id | int(11) unsigned | NO | PRI | 0 | |
| keyword_id | int(11) unsigned | NO | PRI | 0 | |
| avg_kc | tinyint(3) | YES | | NULL | |
| rank_on_page | tinyint(2) | YES | | NULL | |
| created_at | timestamp | NO | | CURRENT_TIMESTAMP | |
| updated_at | timestamp | NO | | 0000-00-00 00:00:00 | |
+-----------------+---------------------+------+-----+---------------------+----------------+
The issue is that SELECT * returns an empty set:
mysql> select * from pages_metrics where keyword_id=2385 and page_id=6004 ;
Empty set (0.00 sec)
But if I select only some fields, the query works:
mysql> select id, page_id, keyword_id from pages_metrics where keyword_id=2385 and page_id=6004;
+---------+---------+------------+
| id | page_id | keyword_id |
+---------+---------+------------+
| 8469199 | 6004 | 2385 |
+---------+---------+------------+
1 row in set (0.00 sec)
I ran ANALYZE TABLE and everything looks fine:
mysql> analyze table pages_metrics;
+------------------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+------------------------+---------+----------+----------+
| keywords.pages_metrics | analyze | status | OK |
+------------------------+---------+----------+----------+
1 row in set (0.34 sec)
Any idea why this is happening?
Thanks in advance.

MySQL: adding a column breaks data

I have successfully created a database in MySQL using the command line and imported some data. It currently looks like this:
desc data;
+----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| code | varchar(10) | YES | | NULL | |
+----------+-------------+------+-----+---------+----------------+
SELECT * FROM data;
+----+----------+
| id | code |
+----+----------+
| 1 | 123abc
| 2 | 234def
| 3 | 567ghi
| 4 | 890jkl
I would like to add a column called timestamp to the table. I am doing this with:
alter table data add timestamp VARCHAR(20);
But then my table looks like this:
desc data;
+-----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| code | varchar(10) | YES | | NULL | |
| timestamp | varchar(20) | YES | | NULL | |
+-----------+-------------+------+-----+---------+----------------+
SELECT * FROM data;
+----+----------+-----------+
| id | code | timestamp |
+----+----------+-----------+
| NULL |
| NULL |
| NULL |
| NULL |
Where am I going wrong?
Here is the same statement with the identifiers quoted in backticks:
alter table `data` add `timestamp` VARCHAR(20);
Sample:
MariaDB []> desc data;
+-------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+----------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| e | enum('x1','x2','x3') | YES | | NULL | |
+-------+----------------------+------+-----+---------+----------------+
2 rows in set (0.01 sec)
MariaDB []> alter table `data` add `timestamp` VARCHAR(20);
Query OK, 0 rows affected (0.05 sec)
Records: 0 Duplicates: 0 Warnings: 0
MariaDB []> desc data;
+-----------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+----------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| e | enum('x1','x2','x3') | YES | | NULL | |
| timestamp | varchar(20) | YES | | NULL | |
+-----------+----------------------+------+-----+---------+----------------+
3 rows in set (0.01 sec)
Table Data
MariaDB [who]> select * from `data`;
+----+------+-----------+
| id | e | timestamp |
+----+------+-----------+
| 1 | x1 | NULL |
| 2 | x2 | NULL |
+----+------+-----------+
2 rows in set (0.00 sec)
MariaDB [who]>
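The expected effect of the ALTER TABLE on existing rows can be reproduced in a minimal sketch. It uses Python's sqlite3 module in place of MySQL/MariaDB with the four sample rows from the question; SQLite accepts MySQL-style backtick quoting for compatibility, so the statement mirrors the answer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE `data` (id INTEGER PRIMARY KEY AUTOINCREMENT, code VARCHAR(10))")
con.executemany("INSERT INTO `data` (code) VALUES (?)",
                [("123abc",), ("234def",), ("567ghi",), ("890jkl",)])

# Add the column with identifiers quoted in backticks, as in the answer.
con.execute("ALTER TABLE `data` ADD `timestamp` VARCHAR(20)")

# Existing rows keep id and code; the new column is NULL until it is filled in.
rows = con.execute("SELECT id, code, `timestamp` FROM `data` ORDER BY id").fetchall()
for row in rows:
    print(row)
```

As in the MariaDB sample, the id and code values survive and only the new column reads NULL.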

MySQL query with JOIN and GROUP BY optimization. Is it possible?

I have two tables: gpnxuser and key_value
mysql> describe gpnxuser;
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| version | bigint(20) | NO | | NULL | |
| email | varchar(255) | YES | | NULL | |
| uuid | varchar(255) | NO | MUL | NULL | |
| partner_id | bigint(20) | NO | MUL | NULL | |
| password | varchar(255) | YES | | NULL | |
| date_created | datetime | YES | | NULL | |
| last_updated | datetime | YES | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
and
mysql> describe key_value;
+----------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| version | bigint(20) | NO | | NULL | |
| date_created | datetime | YES | | NULL | |
| last_updated | datetime | YES | | NULL | |
| upkey | varchar(255) | NO | MUL | NULL | |
| user_id | bigint(20) | YES | MUL | NULL | |
| security_level | int(11) | NO | | NULL | |
+----------------+--------------+------+-----+---------+----------------+
key_value.user_id is an FK that references gpnxuser.id. I also have an index on gpnxuser.partner_id, which is an FK that references a table called "partner" (which, I think, does not matter much to this question).
For partner_id = 64, I have 500K rows in gpnxuser, which are related to approximately 6M rows in key_value.
I wanted a query that returns all distinct 'key_value.upkey' values for users belonging to a given partner. I did something like this:
select upkey from gpnxuser join key_value on gpnxuser.id=key_value.user_id where partner_id=64 group by upkey;
which takes forever to run. The EXPLAIN output for the query is:
mysql> explain select upkey from gpnxuser join key_value on gpnxuser.id=key_value.user_id where partner_id=64 group by upkey;
+----+-------------+-----------+------+----------------------------+--------------------+---------+-----------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+----------------------------+--------------------+---------+-----------------------------+--------+----------------------------------------------+
| 1 | SIMPLE | gpnxuser | ref | PRIMARY,FKB2D9FEBE725C505E | FKB2D9FEBE725C505E | 8 | const | 259640 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | key_value | ref | FK9E0C0F912D11F5A9 | FK9E0C0F912D11F5A9 | 9 | gpnx_finance_db.gpnxuser.id | 14 | Using where |
+----+-------------+-----------+------+----------------------------+--------------------+---------+-----------------------------+--------+----------------------------------------------+
My question is: is there a query that can run fast and obtain the result that I want?
Use an EXISTS subquery: for each candidate upkey it stops at the first matching row instead of scanning all of them.
select upkey from (select distinct upkey from key_value) upk
where exists
  (select 1 from gpnxuser u join key_value kv on u.id = kv.user_id
   where u.partner_id = 64 and kv.upkey = upk.upkey)
Note that in the original query GROUP BY is used only to remove duplicates; DISTINCT expresses that intent more clearly:
select distinct upkey from gpnxuser join key_value on
gpnxuser.id = key_value.user_id where partner_id = 64
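The original GROUP BY query, the DISTINCT form, and the EXISTS form should all return the same set of upkey values. This sketch checks that on hypothetical data, using Python's sqlite3 module as a stand-in for MySQL; the index names idx_user_partner and idx_kv_user are hypothetical. It says nothing about how MySQL's optimizer will execute the queries on 6M rows, which only EXPLAIN against the real data can show.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE gpnxuser (id INTEGER PRIMARY KEY, partner_id INTEGER NOT NULL);
CREATE TABLE key_value (id INTEGER PRIMARY KEY, upkey TEXT NOT NULL,
                        user_id INTEGER REFERENCES gpnxuser(id));
CREATE INDEX idx_user_partner ON gpnxuser(partner_id);
CREATE INDEX idx_kv_user ON key_value(user_id);
INSERT INTO gpnxuser (id, partner_id) VALUES (1, 64), (2, 64), (3, 7);
INSERT INTO key_value (upkey, user_id) VALUES
  ('alpha', 1), ('beta', 1), ('alpha', 2), ('gamma', 3), ('beta', 3);
""")

join = "FROM gpnxuser JOIN key_value ON gpnxuser.id = key_value.user_id WHERE partner_id = 64"

group_by = con.execute("SELECT upkey " + join + " GROUP BY upkey").fetchall()
distinct = con.execute("SELECT DISTINCT upkey " + join).fetchall()
exists = con.execute(
    "SELECT upkey FROM (SELECT DISTINCT upkey FROM key_value) upk "
    "WHERE EXISTS (SELECT 1 FROM gpnxuser u JOIN key_value kv ON u.id = kv.user_id "
    "WHERE u.partner_id = 64 AND kv.upkey = upk.upkey)").fetchall()

# Compare sorted results: without ORDER BY, row order is not guaranteed.
print(sorted(group_by), sorted(distinct), sorted(exists))
```

'gamma' belongs only to a user of partner 7, so it is absent from all three results.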
I would look into partitioning your key_value table on user_id, if you typically run queries based on this column.
http://dev.mysql.com/doc/refman/5.1/en/partitioning.html

Advanced Select

We have 4 tables:
mysql> desc Products;
+------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+----------------+
| product_id | int(11) | NO | PRI | NULL | auto_increment |
| product | varchar(30) | NO | | NULL | |
+------------+-------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)
mysql> desc Vendors;
+-----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+----------------+
| vendor_id | int(11) | NO | PRI | NULL | auto_increment |
| vendor | varchar(30) | YES | | NULL | |
+-----------+-------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)
mysql> desc Prices;
+------------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------+------+-----+---------+----------------+
| price_id | int(11) | NO | PRI | NULL | auto_increment |
| vendor_id | int(11) | NO | MUL | NULL | |
| product_id | int(11) | NO | MUL | NULL | |
| price | double | YES | | NULL | |
+------------+---------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
mysql> desc Bought;
+------------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------+------+-----+---------+----------------+
| bought_id | int(11) | NO | PRI | NULL | auto_increment |
| product_id | int(11) | NO | MUL | NULL | |
| date | date | YES | | NULL | |
| pieces | int(11) | YES | | 1 | |
+------------+---------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
Now we need some complex SELECT statements to get the tables we need:
The first table needs the columns [vendor, product, price, vendors (that offer the product)].
The second table should show what was bought between $date1 and $date2 [product, pieces, vendor, price, date].
The last table should show what could have been saved in the given period [vendor (the cheapest vendor for the product), product, pcs, price (for one product), sum (price for n products)].
As if this weren't complicated enough, the resulting tables have to show names instead of keys. We have been sitting on this the whole day, but none of us knows how to write the needed queries, so any help would be greatly appreciated.
Look into joins for selecting data from multiple tables:
SELECT * FROM Prices LEFT JOIN (Vendors, Products)
ON (Products.product_id=Prices.product_id AND Vendors.vendor_id=Prices.vendor_id)
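As a starting point for showing names instead of keys, one LEFT JOIN per lookup table can be chained. This is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL, with hypothetical products, vendors, and prices. It writes two separate LEFT JOIN clauses, the portable spelling of the LEFT JOIN (Vendors, Products) form above (a MySQL extension); because every Prices row has a non-NULL vendor_id and product_id, both forms find the same matches here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Products (product_id INTEGER PRIMARY KEY, product TEXT NOT NULL);
CREATE TABLE Vendors (vendor_id INTEGER PRIMARY KEY, vendor TEXT);
CREATE TABLE Prices (price_id INTEGER PRIMARY KEY, vendor_id INTEGER NOT NULL,
                     product_id INTEGER NOT NULL, price REAL);
INSERT INTO Products VALUES (1, 'apple'), (2, 'pear');
INSERT INTO Vendors VALUES (1, 'ShopA'), (2, 'ShopB');
INSERT INTO Prices (vendor_id, product_id, price) VALUES (1, 1, 0.50), (2, 1, 0.40), (1, 2, 0.80);
""")

# Each LEFT JOIN swaps one foreign key in Prices for the matching name.
rows = con.execute("""
SELECT v.vendor, p.product, pr.price
FROM Prices pr
LEFT JOIN Vendors v ON v.vendor_id = pr.vendor_id
LEFT JOIN Products p ON p.product_id = pr.product_id
ORDER BY p.product, pr.price
""").fetchall()

for row in rows:
    print(row)
```

Selecting named columns instead of SELECT * keeps the key columns out of the result, which is what the question asks for.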