Is INNER JOIN + INSERT INTO TABLE faster than UNION ALL in MySQL?

I have a query for a list of genes, and I am doing an INNER JOIN to retrieve the results that match those genes in a database:
SELECT t1.*, database1.*
FROM t1
INNER JOIN database1
ON t1.GeneSymbol = database1.GeneSymbol;
I have multiple databases that contain gene interactions, each with a different number of rows (from 5,000 to 70,000,000), and I would like to combine all the matching rows. I have tried a simple UNION ALL instead of an INNER JOIN, like the following:
SELECT t1.*, database1.*
FROM t1, database1
WHERE t1.GeneSymbol = database1.GeneSymbol
UNION ALL
SELECT t1.*, database2.*
FROM t1, database2
WHERE t1.GeneSymbol = database2.GeneSymbol;
However, if I add more and more databases using UNION ALL to merge the results, the query takes forever. Would it be a lot faster to do an INNER JOIN + INSERT INTO TABLE for each database, inserting the output into a table with the correct number of columns?
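The two approaches can be compared on a small scale. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, made so the example is self-contained), with tiny made-up tables named after those in the question. The `Partner` and `Source` columns and the `results` table are hypothetical. It runs one INSERT INTO ... SELECT per database and checks that the accumulated table holds the same rows as a single UNION ALL query. It says nothing about speed on 70,000,000 rows; that would have to be measured on the real data, ideally with an index on GeneSymbol.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (GeneSymbol TEXT);
    CREATE TABLE database1 (GeneSymbol TEXT, Partner TEXT);
    CREATE TABLE database2 (GeneSymbol TEXT, Partner TEXT);
    -- Hypothetical target table; column names are illustrative only.
    CREATE TABLE results (GeneSymbol TEXT, Partner TEXT, Source TEXT);
    INSERT INTO t1 VALUES ('TP53'), ('BRCA1');
    INSERT INTO database1 VALUES ('TP53', 'MDM2'), ('EGFR', 'GRB2');
    INSERT INTO database2 VALUES ('BRCA1', 'BARD1'), ('TP53', 'ATM');
""")

# One INSERT ... SELECT per source table, each a plain INNER JOIN.
# The table names come from a fixed tuple, never from user input.
for source in ("database1", "database2"):
    conn.execute(f"""
        INSERT INTO results (GeneSymbol, Partner, Source)
        SELECT t1.GeneSymbol, d.Partner, '{source}'
        FROM t1
        INNER JOIN {source} AS d ON t1.GeneSymbol = d.GeneSymbol
    """)

inserted = sorted(conn.execute("SELECT GeneSymbol, Partner, Source FROM results"))

# The same rows produced by a single UNION ALL query.
unioned = sorted(conn.execute("""
    SELECT t1.GeneSymbol, d.Partner, 'database1' FROM t1
    INNER JOIN database1 AS d ON t1.GeneSymbol = d.GeneSymbol
    UNION ALL
    SELECT t1.GeneSymbol, d.Partner, 'database2' FROM t1
    INNER JOIN database2 AS d ON t1.GeneSymbol = d.GeneSymbol
"""))

print(inserted == unioned)
```

A possible advantage of the per-database INSERT is that each statement can be run, timed, and retried on its own, whereas the UNION ALL query has to finish as a whole.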

Related

How to do a join on 2 tables, but only return the data for one table?

I am not sure if this is possible. But is it possible to do a join on 2 tables, but return the data for only one of the tables. I want to join the two tables based on a condition, but I only want the data for one of the tables. Is this possible with SQL, if so how? After reading the docs, it seems that when you do a join you get the data for both tables. Thanks for any help!
You get data from both tables because join is based on "Cartesian Product" + "Selection". But after the join, you can do a "Projection" with desired columns.
SQL has an easy syntax for this:
Select t1.* -- taking data just from one table
from one_table t1
inner join other_table t2
on t1.pk = t2.fk
You can choose the table through the alias: t1.* or t2.*. The symbol * means "all fields".
You can also include a WHERE clause, ORDER BY, or other join types such as outer join or cross join.
A typical SQL query has multiple clauses.
The SELECT clause mentions the columns you want in your result set.
The FROM clause, which includes JOIN operations, mentions the tables from which you want to retrieve those columns.
The WHERE clause filters the result set.
The ORDER BY clause specifies the order in which the rows in your result set are presented.
There are a few other clauses like GROUP BY and LIMIT. You can read about those.
To do what you ask, select the columns you want, then mention the tables you want. Something like this.
SELECT t1.id, t1.name, t1.address
FROM t1
JOIN t2 ON t2.t1_id = t1.id
This gives you data from t1 from rows that match t2.
Pro tip: Avoid the use of SELECT *. Instead, mention the columns you want.
This would typically be done using exists (or in) if you prefer:
select t1.*
from table1 t1
where exists (select 1 from table2 t2 where t2.x = t1.y);
Although you can use join, it runs the risk of multiplying the number of rows in the result set -- if there are duplicate matches in table2. There is no danger of such duplicates using exists (or in). I also find the logic to be more natural.
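The row-multiplication risk is easy to see on a toy example. This is a minimal sketch using Python's built-in sqlite3 module in place of MySQL (an assumption for self-containment); the `label` column and the sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (y INTEGER, label TEXT);
    CREATE TABLE table2 (x INTEGER);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    -- Value 1 appears twice in table2; value 3 has no match at all.
    INSERT INTO table2 VALUES (1), (1), (2);
""")

join_rows = conn.execute("""
    SELECT t1.y, t1.label
    FROM table1 t1
    JOIN table2 t2 ON t2.x = t1.y
    ORDER BY t1.y
""").fetchall()

exists_rows = conn.execute("""
    SELECT t1.y, t1.label
    FROM table1 t1
    WHERE EXISTS (SELECT 1 FROM table2 t2 WHERE t2.x = t1.y)
    ORDER BY t1.y
""").fetchall()

print(join_rows)    # the row for y = 1 appears twice
print(exists_rows)  # each matching table1 row appears once
```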
If you join two tables, you can use SELECT to choose the data you want.
If you want data from only one table, select columns from just that table:
SELECT b.title
FROM blog b
JOIN type t ON b.type_id=t.id;
If you want data from both tables, select columns from both:
SELECT b.title,t.type_name
FROM blog b
JOIN type t ON b.type_id=t.id;

Merge two tables to one and remove duplicates

I have 2 tables in the same database.
I want to merge them based on the common id column. Because the tables are huge, I am not sure whether there are duplicates.
How is it possible to merge these two tables into one based on the id and be sure that there are no duplicates?
SELECT *
FROM table1,table2
JOIN
GROUP BY id
What do you mean by merging two tables? Do you want records and columns from both tables, or columns from one and records from both?
Either way, you only need to change the join clause.
You could join on the columns you wish to match:
SELECT DISTINCT *
FROM table1 tb1
JOIN table2 tb2
ON tb1.id = tb2.id
If you want every row from table1, even those with no match in table2, use a LEFT JOIN.
If you want every row from table2, even those with no match in table1, use a RIGHT JOIN.
If you want only the rows that match in both tables, use the query as is.
DISTINCT ensures that you get a single row when several rows contain the same data; it compares the values of all selected columns in a row.
UNION won't help if the tables have a different number of columns. If you are not comfortable with joins, use a filtered Cartesian product:
select distinct *
from table1 tb1, table2 tb2
where tb1.id = tb2.id
Where id is the column that is common between the tables.
If you want columns from only table1, use:
select distinct tb1.*
Similarly, replace tb1 with tb2 in the above statement if you want only table2's columns:
select distinct tb2.*
If you want columns from both, just write '*'.
In either case (the joins or the products above), if you need only selected columns, prefix them with the table alias. E.g.
Consider :
table1 has id, foo, bar as columns
table2 has id, name, roll no, age as columns
You want only id, foo, and name from the two tables in the query result.
Do this:
select distinct tb1.id, tb1.foo, tb2.name
from table1 tb1
join table2 tb2
on tb1.id=tb2.id
The same goes for the Cartesian product query. tb1 and tb2 are called table aliases.
If you want data from both tables even if they have nothing in common, just do:
select distinct *
from table1 , table2
Note that this cannot be achieved with a join that has an ON condition, since that needs a common column to join on; the comma form above is equivalent to a CROSS JOIN.
I am not sure what exactly you want, but anyway, this is your code:
SELECT *
FROM table1,table2
JOIN
GROUP BY id
I just edited your query:
SELECT *
FROM table1 JOIN table2
on table2.id = table1.id
GROUP BY table1.id -- you have to name the table whose id
-- you group by; here that is table1
Try UNION:
https://dev.mysql.com/doc/refman/5.0/en/union.html
It is very simple. Hope it helps.
Also you should have a look at "DISTINCT".
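As a rough illustration of the UNION suggestion, here is a minimal sketch assuming both tables have the same columns (a hypothetical `id` and `name`), run through Python's built-in sqlite3 module rather than MySQL. UNION drops rows that are identical in every column, while UNION ALL keeps them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
    -- Row (2, 'b') exists in both tables.
    INSERT INTO table2 VALUES (2, 'b'), (3, 'c');
""")

merged = conn.execute("""
    SELECT id, name FROM table1
    UNION
    SELECT id, name FROM table2
    ORDER BY id
""").fetchall()

all_rows = conn.execute("""
    SELECT id, name FROM table1
    UNION ALL
    SELECT id, name FROM table2
    ORDER BY id
""").fetchall()

print(merged)    # duplicates removed
print(all_rows)  # duplicates kept
```

Note that UNION compares whole rows: two rows with the same id but a different name would both survive, so it only guarantees unique ids when duplicated ids also carry identical data.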

PHP Selecting from various database tables and ordering

I need to select data from 3 MySQL database tables, tally up various results from one table (points), order from highest to lowest points, and show only 100 results.
I have this query which I believe may be on the cusp of success but not quite.
The 3 tables are users, dealerships and sales_list.
Your assistance with achieving the above and correcting the query is appreciated.
$query = " SELECT t1.*, t2.*, t3.sales_points
FROM users t1
JOIN dealerships t2
ON t1.dealership_id = t2.dealership_id
INNER JOIN sales_list t3
ON t1.users_sales_guild_id = t3.users_sales_guild_id
ORDER BY t3.sales_points
LIMIT 100";
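One possible shape for the tally is SUM with GROUP BY, then ORDER BY ... DESC and LIMIT. The sketch below is only an illustration under assumptions: it runs on Python's built-in sqlite3 module instead of MySQL/PHP, the `user_name` and `dealership_name` columns are hypothetical, and it assumes `sales_list` holds one row per sale with its points.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dealerships (dealership_id INTEGER, dealership_name TEXT);
    CREATE TABLE users (users_sales_guild_id INTEGER, dealership_id INTEGER, user_name TEXT);
    CREATE TABLE sales_list (users_sales_guild_id INTEGER, sales_points INTEGER);
    INSERT INTO dealerships VALUES (10, 'North'), (20, 'South');
    INSERT INTO users VALUES (1, 10, 'ann'), (2, 20, 'bob');
    INSERT INTO sales_list VALUES (1, 5), (1, 7), (2, 20), (2, 1);
""")

top = conn.execute("""
    SELECT u.user_name, d.dealership_name, SUM(s.sales_points) AS total_points
    FROM users u
    JOIN dealerships d ON u.dealership_id = d.dealership_id
    JOIN sales_list s ON u.users_sales_guild_id = s.users_sales_guild_id
    GROUP BY u.users_sales_guild_id, u.user_name, d.dealership_name
    ORDER BY total_points DESC   -- highest total first
    LIMIT 100
""").fetchall()

print(top)
```

Without the GROUP BY, the original query returns one row per sale rather than one total per user, and ORDER BY without DESC sorts from lowest to highest.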

Merge 3 structurally identical tables if value in date column exists in all 3

I have a mysql db/server that has 3 tables that are identical in structure:
west, midwest and east.
I would like to create a national table with the sum of the columns of those regional tables, ONLY if the datetime row matches all 3 tables. That way if one hour is missing in a particular table, I don't end up summing 2 regions and calling it national.
Here is how I am thinking to do it:
All 3 tables have a datetime column.
Merge the tables (union?) only if the datetime row exists in all 3 tables.
Aggregate (sum) the columns grouped by datetime column. I would of course be summing all columns which carry int values.
I am not sure how to run a query that would perform this task.
These tables have 11 million rows, so an efficient approach would be great.
I am also open to other approaches to solve this problem.
I picked Neil's answer, although it would not work if the datetime column is not unique, i.e. if Table1 has multiple rows with the same datetime. With every other method the performance was horrific: hours of query time. I decided to compromise and created 3 new tables:
westh, midwesth and southh.
These 3 new tables were created by aggregating the original tables by hour.
I then used Neil's second version with a twist:
INNER JOIN Table2 USING (datetime)
Since datetime is indexed in my tables, this gives superior performance, which is a firm criterion for me.
First version:
SELECT T123.dtcol, SUM(T123.intcol) AS intcolsum
FROM (
-- UNION ALL keeps equal rows from different tables, so COUNT(*) = 3 stays reliable
SELECT Table1.dtcol, Table1.intcol FROM Table1
UNION ALL
SELECT Table2.dtcol, Table2.intcol FROM Table2
UNION ALL
SELECT Table3.dtcol, Table3.intcol FROM Table3
) T123
GROUP BY T123.dtcol
HAVING COUNT(*) = 3
Second version:
SELECT T1.dtcol, T1.intcol + T2.intcol + T3.intcol AS intcolsum
FROM Table1 T1
INNER JOIN Table2 T2 ON T2.dtcol = T1.dtcol
INNER JOIN Table3 T3 ON T3.dtcol = T1.dtcol
Use:
SELECT A.dtcol, SUM(A.intcol) intcolsum FROM
(
SELECT 'T1' T, T1.* FROM Table1 T1
UNION
SELECT 'T2' T, T2.* FROM Table2 T2
UNION
SELECT 'T3' T, T3.* FROM Table3 T3
) A
WHERE A.dtcol IN
(
SELECT T1.dtcol
FROM Table1 T1
INNER JOIN Table2 T2 ON T2.dtcol = T1.dtcol
INNER JOIN Table3 T3 ON T3.dtcol = T1.dtcol
)
GROUP BY A.dtcol
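To check that the join form and the grouped form agree, here is a minimal sketch on made-up data, assuming each datetime appears at most once per table (as with the hourly aggregates described above). It uses Python's built-in sqlite3 module as a stand-in for MySQL, and stores datetimes as text for simplicity. UNION ALL is used in the grouped form so that equal values from different tables are not collapsed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (dtcol TEXT, intcol INTEGER);
    CREATE TABLE Table2 (dtcol TEXT, intcol INTEGER);
    CREATE TABLE Table3 (dtcol TEXT, intcol INTEGER);
    INSERT INTO Table1 VALUES ('2020-01-01 00:00', 1), ('2020-01-01 01:00', 2);
    INSERT INTO Table2 VALUES ('2020-01-01 00:00', 10), ('2020-01-01 01:00', 20);
    -- Table3 is missing the 01:00 hour, so that hour must not be summed.
    INSERT INTO Table3 VALUES ('2020-01-01 00:00', 100);
""")

joined = conn.execute("""
    SELECT T1.dtcol, T1.intcol + T2.intcol + T3.intcol AS intcolsum
    FROM Table1 T1
    INNER JOIN Table2 T2 ON T2.dtcol = T1.dtcol
    INNER JOIN Table3 T3 ON T3.dtcol = T1.dtcol
""").fetchall()

grouped = conn.execute("""
    SELECT T123.dtcol, SUM(T123.intcol) AS intcolsum
    FROM (
        SELECT dtcol, intcol FROM Table1
        UNION ALL
        SELECT dtcol, intcol FROM Table2
        UNION ALL
        SELECT dtcol, intcol FROM Table3
    ) T123
    GROUP BY T123.dtcol
    HAVING COUNT(*) = 3
""").fetchall()

print(joined)
print(grouped)
```

If a table can hold several rows for the same datetime, neither form is safe as written: the join multiplies rows, and COUNT(*) = 3 no longer means "present in all three tables".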

MySQL, merging 2 or more tables before execute SELECT DISTINCT query?

I want to count the unique logins across 2 (or probably more) tables.
I tried this:
SELECT count(distinct(l1.user_id))
FROM `log_1` l1
LEFT JOIN `log_2` l2
ON l1.userid = l2.userid;
But it gives me the result for l1 only. If I don't qualify userid with l1 inside distinct, it says "ambiguous".
How do I combine the tables and then select the unique logins from the combined table?
EDIT:
Tested: I ran count(distinct(l1.userid)) and count(distinct(l2.userid)). They give me different results.
If you are using LEFT JOIN then you will get at least one row in the combined result for each row in l1, so the join is entirely unnecessary if you just want a distinct count. This would give you the same result as your query:
SELECT count(distinct(l1.user_id))
FROM `log_1` l1
Perhaps you want an INNER JOIN or UNION instead? A UNION will count a user if they appear in either table. An INNER JOIN will count them only if they appear in both tables. Here's an example of the UNION:
SELECT count(*) FROM (
SELECT distinct(user_id) FROM `log_1`
UNION
SELECT distinct(user_id) FROM `log_2`
) T1
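The difference between the three counts can be seen on a few made-up rows. This is a minimal sketch using Python's built-in sqlite3 module instead of MySQL (an assumption for self-containment), with `user_id` as the column name in both log tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE log_1 (user_id INTEGER);
    CREATE TABLE log_2 (user_id INTEGER);
    INSERT INTO log_1 VALUES (1), (1), (2);
    INSERT INTO log_2 VALUES (2), (3);
""")

# Users that appear in either table (UNION removes duplicates).
union_count = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT user_id FROM log_1
        UNION
        SELECT user_id FROM log_2
    ) T1
""").fetchone()[0]

# Users that appear in both tables.
both_count = conn.execute("""
    SELECT COUNT(DISTINCT l1.user_id)
    FROM log_1 l1
    INNER JOIN log_2 l2 ON l1.user_id = l2.user_id
""").fetchone()[0]

# LEFT JOIN keeps every log_1 row, so this only counts log_1 users.
left_count = conn.execute("""
    SELECT COUNT(DISTINCT l1.user_id)
    FROM log_1 l1
    LEFT JOIN log_2 l2 ON l1.user_id = l2.user_id
""").fetchone()[0]

print(union_count, both_count, left_count)
```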