I have an app that reads from multiple MySQL tables, but I'd like to put all the data into one table. The thing is, these tables have no linking fields... the app just processes the rows of the 3 tables sequentially, hoping that the correct rows line up in each table (i.e. that row 1 in table1 belongs with row 1 in table2 and table3, and so on).
My tables are as follows:
Table1:
Name,Surname,ID,DoB
Table2:
Address,Town,State
Table3:
password
What I want is:
Table4:
Name,Surname,ID,DoB,Address,Town,State,password
I have created Table4 and I'm now trying to insert the values with a select query...
I've tried ...
SELECT
t1.Name,
t1.Surname,
t1.ID,
t1.DoB,
t2.Address,
t2.Town,
t2.State,
t3.password
FROM table1 AS t1,table2 AS t2, table3 AS t3;
...but this gives me duplicate rows (every row combined with every other row), because there is no WHERE clause. And since there are no linking fields, I can't use a JOIN statement, right?
I'm not very experienced with SQL, so please help!
Well, officially you're messed up. There is no first or last row in an RDBMS unless you use an ORDER BY clause; the manual says so too. If you issue a
SELECT * FROM your_table;
you cannot be sure to get the rows in the order they were inserted, or even in the same order every time you run the statement.
In practice, on the other hand, most of the time you will get the same result, and usually even in insertion order.
What you can do is, first, slap whoever didn't think of putting a column in each table that determines a sort order (in the future, use an auto_increment column, a timestamp column holding the date and time of insertion, or whatever suits your needs) and, second (but really do this only if you have no other choice, because as I said it's unreliable), emulate a row number on which you can join.
-- row_number is a reserved word in MySQL 8.0, hence the alias rn
SELECT * FROM (
SELECT table1.*, @rn1 := @rn1 + 1 AS rn FROM
table1,
(SELECT @rn1 := 0) v
) a
LEFT JOIN (
SELECT table2.*, @rn2 := @rn2 + 1 AS rn FROM
table2,
(SELECT @rn2 := 0) v
) b ON a.rn = b.rn
LEFT JOIN (
SELECT table3.*, @rn3 := @rn3 + 1 AS rn FROM
table3,
(SELECT @rn3 := 0) v
) c ON a.rn = c.rn;
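For illustration, here is a minimal sketch of the same row-number join, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: SQLite, not MySQL, executes it). Instead of user variables it uses the ROW_NUMBER() window function, which MySQL 8.0+ and SQLite 3.25+ support, and it numbers rows by SQLite's implicit rowid as a proxy for insertion order, which carries exactly the unreliability described above. The sample rows are made up.

```python
import sqlite3

# Hypothetical schema mirroring the question (SQLite stand-in for MySQL).
SCHEMA = """
CREATE TABLE table1 (Name TEXT, Surname TEXT, ID INTEGER, DoB TEXT);
CREATE TABLE table2 (Address TEXT, Town TEXT, State TEXT);
CREATE TABLE table3 (password TEXT);
CREATE TABLE table4 (Name TEXT, Surname TEXT, ID INTEGER, DoB TEXT,
                     Address TEXT, Town TEXT, State TEXT, password TEXT);
"""

# Number each table's rows by rowid (a proxy for insertion order, NOT a
# guarantee) and join the numbered sets on that emulated row number.
MERGE_SQL = """
WITH a AS (SELECT *, ROW_NUMBER() OVER (ORDER BY rowid) AS rn FROM table1),
     b AS (SELECT *, ROW_NUMBER() OVER (ORDER BY rowid) AS rn FROM table2),
     c AS (SELECT *, ROW_NUMBER() OVER (ORDER BY rowid) AS rn FROM table3)
INSERT INTO table4
SELECT a.Name, a.Surname, a.ID, a.DoB, b.Address, b.Town, b.State, c.password
FROM a
LEFT JOIN b ON a.rn = b.rn
LEFT JOIN c ON a.rn = c.rn
ORDER BY a.rn
"""


def build_table4(conn):
    """Fill table4 from the three key-less tables and return its rows."""
    conn.execute(MERGE_SQL)
    conn.commit()
    return conn.execute("SELECT * FROM table4 ORDER BY rowid").fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)",
                     [("Ann", "Lee", 1, "1990-01-01"), ("Bob", "Ray", 2, "1985-05-05")])
    conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)",
                     [("1 Main St", "Springfield", "IL"), ("2 Oak Ave", "Austin", "TX")])
    conn.executemany("INSERT INTO table3 VALUES (?)", [("s3cret",), ("hunter2",)])
    return build_table4(conn)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

With a LEFT JOIN, a shorter table2 or table3 simply leaves NULLs in the trailing rows instead of dropping data from table1.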
I create a temporary table, e.g.:
CREATE TEMPORARY TABLE IF NOT EXISTS table4 AS (select * from table1)
and then I have another result set from a query, like:
select column from table2
What I would like to do is to add this column as a new column of the temporary table. An inner join would not work because they don't have a common column.
This would be like concatenate() in Python along axis=1 (placing the column side by side rather than stacking rows).
I would appreciate any help.
If I understand correctly, you want to add the concatenated results of the second query as another column of your temporary table. That doesn't make much sense without more context, since you would get the same value in the new column on every row. But here is my solution:
CREATE TEMPORARY TABLE IF NOT EXISTS table4 AS
(
select
*,
(select group_concat(column) from table2 group by null) as concatcolumn
from
table1
)
I have grouped by NULL in the group_concat subquery so that it groups over all the rows. Inside this nested query (a subquery used as a column like this is usually called a scalar subquery) you can add WHERE conditions, which would make this question make somewhat more sense. Hope this solution helps. Cheers,
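A small sketch of the scalar-subquery idea, again with Python's sqlite3 standing in for MySQL (an assumption). The column name val and the sample rows are hypothetical, since column is a reserved word. It shows that every row of table1 gets the same concatenated string, and that an aggregate without any GROUP BY already covers all rows, which is what GROUP BY NULL achieves in the query above.

```python
import sqlite3


def add_concat_column(conn):
    """Copy table1 into a temp table4 plus one column holding every value of
    table2 joined with commas (the same string repeated on each row)."""
    # An aggregate with no GROUP BY already folds all rows into one group,
    # which is what GROUP BY NULL does in the MySQL answer.
    conn.execute("""
        CREATE TEMP TABLE IF NOT EXISTS table4 AS
        SELECT t1.*,
               (SELECT group_concat(val) FROM table2) AS concatcolumn
        FROM table1 AS t1
    """)
    return conn.execute("SELECT * FROM table4 ORDER BY id").fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE table2 (val TEXT)")  # hypothetical column name
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, "a"), (2, "b")])
    conn.executemany("INSERT INTO table2 VALUES (?)", [("x",), ("y",), ("z",)])
    return add_concat_column(conn)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

The order of values inside the concatenated string is not guaranteed without an explicit ORDER BY inside the aggregate (MySQL's GROUP_CONCAT accepts one).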
EDIT:
Based on the OP's comments, and supposing that both tables have rows that are aligned (matching rows have the same row number but no matching key). This was more difficult than I expected, because the question is tagged MySQL and this DBMS (before 8.0) has no ranking function. Here is what I came up with; it is untested.
-- one counter per derived table; rank is reserved in MySQL 8.0, hence rn
CREATE TEMPORARY TABLE IF NOT EXISTS table4 AS
(
select
t1.*,
t2.column
from
(
select t.*, @rn1 := @rn1 + 1 as rn from table1 t, (select @rn1 := 0) r
) t1
join
(
select t.*, @rn2 := @rn2 + 1 as rn from table2 t, (select @rn2 := 0) r
) t2
on
t1.rn = t2.rn
)
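Since the question compares this to concatenation in Python, here is a hypothetical application-side equivalent. If both result sets can be fetched in an order you trust (for example with an explicit ORDER BY), pairing them by position is just zip(). This is a sketch, not part of the answer above; like the inner join on the emulated rank, it drops any unmatched tail.

```python
from typing import Iterable, List, Sequence, Tuple


def append_column_by_position(rows: Iterable[Sequence], column: Iterable) -> List[Tuple]:
    """Attach the n-th value of `column` to the n-th row of `rows`.

    Both inputs must already be in the order you consider "aligned".
    zip() stops at the shorter input, mirroring the inner join above.
    """
    return [tuple(row) + (value,) for row, value in zip(rows, column)]


if __name__ == "__main__":
    table1 = [(1, "Ann"), (2, "Bob"), (3, "Cy")]  # hypothetical rows
    table2_column = ["red", "green"]              # hypothetical column values
    print(append_column_by_position(table1, table2_column))
```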
I have a simple table with the following structure and a lot of rows:
id | name | title |
------------------------------
I need to replace id with other values; in other words, I need to permanently shuffle my table. What query do I need to run? I need to run this query exactly once, no matter how much time or memory it takes.
Considering your table has 800 rows, you could do something like this:
1. Create a temporary table with all the records of your table, e.g. CREATE TABLE TMP_TABLE (SELECT * FROM YOUR_TABLE);
2. DROP TABLE YOUR_TABLE;
3. CREATE TABLE YOUR_TABLE (SELECT * FROM TMP_TABLE ORDER BY RAND());
4. DROP TABLE TMP_TABLE;
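The four steps can be tried out with a minimal sketch in Python's sqlite3, where RANDOM() plays the role of MySQL's RAND() (SQLite is an assumption here; the table and column names come from the question and the sample rows are invented). Note that this only changes the order in which rows were inserted: the id values themselves stay the same, and CREATE TABLE ... AS SELECT copies data only, so keys and indexes have to be re-added afterwards.

```python
import sqlite3


def rebuild_in_random_order(conn, table="YOUR_TABLE"):
    """Copy, drop and recreate `table` with its rows inserted in random order.

    Follows the four steps above; SQLite spells RAND() as RANDOM().
    `table` is interpolated into the SQL, so it must be a trusted name.
    CREATE TABLE ... AS SELECT copies data only: keys and indexes are lost.
    """
    conn.execute(f"CREATE TABLE TMP_TABLE AS SELECT * FROM {table}")
    conn.execute(f"DROP TABLE {table}")
    conn.execute(f"CREATE TABLE {table} AS SELECT * FROM TMP_TABLE ORDER BY RANDOM()")
    conn.execute("DROP TABLE TMP_TABLE")
    conn.commit()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YOUR_TABLE (id INTEGER, name TEXT, title TEXT)")
    conn.executemany("INSERT INTO YOUR_TABLE VALUES (?, ?, ?)",
                     [(i, f"name{i}", f"title{i}") for i in range(1, 51)])
    conn.commit()
    rebuild_in_random_order(conn)
    return conn.execute("SELECT * FROM YOUR_TABLE").fetchall()


if __name__ == "__main__":
    print(demo()[:5])  # first five rows, now in random insertion order
```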
The following query should do this:
The set of ids stays the same as before; the ids are just shuffled.
tbl is the table to update.
tbl2 assigns a random row_num1 to each row of tbl.
tbl3 assigns another, independent random row_num2 to each row of tbl.
Joining on tbl2.row_num1 = tbl3.row_num2 pairs every row with a random id, which does the shuffle.
-- if id is a PRIMARY KEY or UNIQUE, MySQL checks uniqueness row by row,
-- so this swap may fail with a duplicate-key error
UPDATE tbl INNER JOIN
(SELECT *, (@rn1 := @rn1 + 1) AS row_num1 FROM tbl CROSS JOIN (SELECT @rn1 := 0) param ORDER BY RAND()) tbl2
ON tbl.id = tbl2.id
INNER JOIN
(SELECT *, (@rn2 := @rn2 + 1) AS row_num2 FROM tbl CROSS JOIN (SELECT @rn2 := 0) param ORDER BY RAND()) tbl3
ON tbl2.row_num1 = tbl3.row_num2
SET tbl.id = tbl3.id;
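Because swapping values of a primary key in a single UPDATE can collide with MySQL's row-by-row uniqueness checks, here is an alternative sketch (a swapped-in technique, not the query above): compute the permutation in application code with random.shuffle, then apply it in two phases, first parking every id on its negative and then writing the new ids. It uses Python's sqlite3 as a stand-in and assumes ids are positive integers; the table layout and seed are hypothetical.

```python
import random
import sqlite3


def shuffle_ids(conn, seed=None):
    """Permanently permute tbl.id among its existing values.

    Assumptions (hypothetical setup): id is a positive-integer primary key.
    Phase 1 moves every row to -id so no new id can collide with an old one;
    phase 2 writes the shuffled ids. Returns {old_id: new_id}.
    """
    rng = random.Random(seed)
    old_ids = [row[0] for row in conn.execute("SELECT id FROM tbl ORDER BY id")]
    new_ids = old_ids[:]
    rng.shuffle(new_ids)
    with conn:  # one transaction: committed on success, rolled back on error
        conn.execute("UPDATE tbl SET id = -id")
        conn.executemany("UPDATE tbl SET id = ? WHERE id = ?",
                         [(new, -old) for old, new in zip(old_ids, new_ids)])
    return dict(zip(old_ids, new_ids))


def demo(seed=7):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, name TEXT, title TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)",
                     [(i, f"n{i}", f"t{i}") for i in range(1, 21)])
    conn.commit()
    mapping = shuffle_ids(conn, seed)
    rows = conn.execute("SELECT id, name FROM tbl").fetchall()
    return mapping, rows


if __name__ == "__main__":
    mapping, rows = demo()
    print(mapping)
```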
You could use a database VIEW containing the shuffle logic for your purpose.
Otherwise, use another table to back up the current table and SELECT the rows in shuffled order from the backup table. Then TRUNCATE your table and insert the shuffled rows into it with that SELECT on the backup table.
I have a column into which I insert values incremented by one (the column is not AUTO_INCREMENT). How can I search for gaps in it? For instance, if the column goes 1, 2, 4, 5, 6, I want to select the missing 3. Thanks!
Simple case: only first gap
You can do that with:
SELECT
MIN(seq)
FROM
(SELECT
sequence,
@seq:=@seq+1 AS seq
FROM
t CROSS JOIN (SELECT @seq:=0) AS init) AS gap
WHERE
sequence!=seq
Here the field sequence is the column in which you're looking for a gap. See the fiddle demo.
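The running counter can be mirrored in a few lines of Python (a sketch, assuming distinct values that should start at 1, as in the query above). The Python version sorts explicitly, whereas the SQL relies on the rows being read in ascending order, since it has no ORDER BY.

```python
def first_gap(values):
    """Return the first missing number in a sequence meant to run 1, 2, 3, ...

    Walks the sorted values next to a counter starting at 1, like @seq,
    and reports the counter at the first mismatch. Returns None when the
    values are exactly 1..n. Assumes the values are distinct.
    """
    for expected, actual in enumerate(sorted(values), start=1):
        if actual != expected:
            return expected
    return None


if __name__ == "__main__":
    print(first_gap([1, 2, 4, 5, 6]))  # 3
```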
Now, the general case.
You cannot use just a JOIN or something like that, since your table may contain fewer rows than a gap in it is long. Imagine your table holds only the minimum and maximum values, for example 1 and 10, and you want to get all values, so the result should be the sequence 1, 2, ..., 10. No matter how you JOIN your table with itself, you will only get two rows, because only those two exist in your table. UNION is not an option either: even if we built one for 10, in the general case the range can be 100, 1000, etc. So for the general case you have to create a sequence table, fill it with the values from MIN(sequence) to MAX(sequence), and then use a LEFT JOIN like this:
SELECT
full_table.sequence
FROM
full_table
LEFT JOIN t
ON full_table.sequence=t.sequence
WHERE
t.sequence IS NULL
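Instead of a stored sequence table, the numbers from MIN(sequence) to MAX(sequence) can also be generated on the fly with a recursive common table expression, which MySQL 8.0+ supports as WITH RECURSIVE. The sketch below runs the LEFT JOIN idea through Python's sqlite3 (an assumption: SQLite, not MySQL, executes it), with t and sequence named as above and made-up sample values.

```python
import sqlite3

# Generate MIN(sequence)..MAX(sequence) with a recursive CTE instead of a
# stored sequence table, then keep the numbers with no match in t.
MISSING_SQL = """
WITH RECURSIVE full_table(sequence) AS (
    SELECT MIN(sequence) FROM t
    UNION ALL
    SELECT sequence + 1 FROM full_table
    WHERE sequence < (SELECT MAX(sequence) FROM t)
)
SELECT full_table.sequence
FROM full_table
LEFT JOIN t ON full_table.sequence = t.sequence
WHERE t.sequence IS NULL
ORDER BY full_table.sequence
"""


def missing_values(conn):
    # An empty t yields a single NULL from MIN(); drop it.
    return [value for (value,) in conn.execute(MISSING_SQL) if value is not None]


def make_table(values):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (sequence INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
    return conn


if __name__ == "__main__":
    print(missing_values(make_table([1, 10])))  # [2, 3, 4, 5, 6, 7, 8, 9]
```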
That's the statement I use for those tasks; id is the field you want to analyze.
SELECT t1.id+1 AS start_seq,
MIN(t2.id) - 1 AS end_seq
FROM yourTable AS t1,
yourTable AS t2
WHERE t1.id < t2.id
GROUP BY t1.id
HAVING start_seq < MIN(t2.id);
Though this one gets the job done, there might be better and more compact solutions out there.
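Here is a quick check of the self-join approach, run through Python's sqlite3 as a stand-in (an assumption). The HAVING clause repeats the expression t1.id + 1 rather than naming a quoted alias, so the comparison is numeric; yourTable and the sample ids are placeholders.

```python
import sqlite3

# For each id, MIN(t2.id) over the larger ids is the next existing id; a gap
# exists whenever id + 1 is still below it.
GAP_RANGES_SQL = """
SELECT t1.id + 1      AS start_seq,
       MIN(t2.id) - 1 AS end_seq
FROM yourTable AS t1
JOIN yourTable AS t2 ON t1.id < t2.id
GROUP BY t1.id
HAVING t1.id + 1 < MIN(t2.id)
ORDER BY start_seq
"""


def gap_ranges(conn):
    return conn.execute(GAP_RANGES_SQL).fetchall()


def make_table(ids):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourTable (id INTEGER)")
    conn.executemany("INSERT INTO yourTable VALUES (?)", [(i,) for i in ids])
    return conn


if __name__ == "__main__":
    print(gap_ranges(make_table([1, 2, 4, 5, 6, 9])))  # [(3, 3), (7, 8)]
```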
You can try something like this
SELECT (t1.id + 1) as gap_starts_at,
(SELECT MIN(t3.id) -1 FROM your_table t3 WHERE t3.id > t1.id) as gap_ends_at
FROM your_table t1
WHERE NOT EXISTS (SELECT t2.id FROM your_table t2 WHERE t2.id = t1.id + 1)
HAVING gap_ends_at IS NOT NULL
Try this (note that this is SQL Server T-SQL, not MySQL):
DECLARE @a int
SET @a = (SELECT MIN(num) FROM table1)
WHILE (SELECT MAX(num) FROM table1) > @a
BEGIN
IF @a NOT IN (SELECT num FROM table1)
PRINT @a
SET @a = @a + 1
END
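Outside SQL Server, the same walk can be written in application code. This hypothetical Python sketch steps from MIN(num) up to MAX(num) exactly like the WHILE loop, but builds a set first so each membership test doesn't re-run a NOT IN subquery.

```python
def missing_between_min_and_max(nums):
    """Every value between MIN(nums) and MAX(nums) that is absent from nums,
    mirroring the WHILE loop (counter from MIN up to, but not past, MAX)."""
    present = set(nums)
    if not present:
        return []
    return [a for a in range(min(present), max(present)) if a not in present]


if __name__ == "__main__":
    for value in missing_between_min_and_max([1, 2, 4, 5, 8]):
        print(value)  # prints 3, 6, 7 on separate lines
```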
I know it is not possible directly,
but I want to achieve it by some indirect method if possible.
Actually, I wanted to put the query below in a view, but that throws an error: subqueries are not allowed in a view.
select T1.Code,
T1.month,
T1.value,
IfNull(T2.Value,0)+IfNull(T3.value,0) as value_begin
from (select *,@rownum := @rownum + 1 as rownum
from Table1
Join (SELECT @rownum := 0) r) T1
left join (select *,@rownum1 := @rownum1 + 1 as rownum
from Table1
Join (SELECT @rownum1 := 0) r) T2
on T1.code = T2.code
and T1.rownum = T2.rownum + 1
left join (select *,@rownum2 := @rownum2 + 1 as rownum
from Table1
Join (SELECT @rownum2 := 0) r) T3
on T1.code = T3.code
and T1.rownum = T3.rownum + 2
Order by T1.Code,T1.rownum
So I thought I would make the subquery a separate view, but that again throws an error saying that variables are not allowed in a view. Please help me get around this.
Thanks in advance
You could try the method of triangle join + count for assigning row numbers. It will likely not perform well on large datasets, but instead you should be able to implement everything with a couple of views (if you think there's no other way to do what you want to do than with a view). The idea is as follows:
The dataset is joined to itself on the condition of master.key >= secondary.key, where master is the instance where detail data will actually be pulled from, and secondary is the other instance of the same table used to provide the row numbers.
Based on that condition, the first* master row would be joined with one secondary row, the second one with two, the third one with three and so on.
At this point, you can group the result set by the master key column(s) as well as the columns that you need in the output (although in MySQL it would be enough to group by the master key only). Counting the rows in every group gives you the corresponding row numbers.
So, if there was a table like this:
CREATE TABLE SomeTable (
ID int,
Value int
);
the query to assign row numbers to the table could look like this:
SELECT m.ID, m.Value, COUNT(*) AS rownum
FROM SomeTable AS m
INNER JOIN SomeTable AS s ON m.ID >= s.ID
GROUP BY m.ID, m.Value
;
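To see the row numbers come out, here is the ranking query run against a few made-up rows with Python's sqlite3 (a stand-in for MySQL, an assumption). It assumes ID values are unique; duplicate IDs would break the numbering.

```python
import sqlite3

# Triangle join + COUNT: row m is paired with every row s whose ID is <= m.ID,
# so the size of m's group is its 1-based position in ID order.
RANKING_SQL = """
SELECT m.ID, m.Value, COUNT(*) AS rownum
FROM SomeTable AS m
INNER JOIN SomeTable AS s ON m.ID >= s.ID
GROUP BY m.ID, m.Value
ORDER BY m.ID
"""


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SomeTable (ID int, Value int)")
    # Hypothetical rows, deliberately inserted out of ID order.
    conn.executemany("INSERT INTO SomeTable VALUES (?, ?)", [(30, 7), (10, 5), (20, 9)])
    return conn.execute(RANKING_SQL).fetchall()


if __name__ == "__main__":
    print(demo())  # [(10, 5, 1), (20, 9, 2), (30, 7, 3)]
```

The join produces roughly n²/2 intermediate rows, which is why this method is expected to slow down on large tables.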
Since you appear to want to self-join the ranked rowset (and twice too), that would require using the above query as a derived table, and since you also want the entire thing to be a view (which doesn't allow subqueries in the FROM clause), you would probably need to define the ranking query as a separate view:
CREATE VIEW RankingView AS
SELECT m.ID, m.Value, COUNT(*) AS rownum
FROM SomeTable AS m
INNER JOIN SomeTable AS s ON m.ID >= s.ID
GROUP BY m.ID, m.Value
;
and subsequently refer to that view in the main query:
CREATE VIEW SomeOtherView AS
SELECT ...
FROM RankingView AS t1
LEFT JOIN RankingView AS t2 ON ...
...
This SQL Fiddle demo shows the method and its usage.
One note with regard to your particular situation. Your table probably needs row numbers to be assigned in partitions, i.e. every distinct Code row group needs its own row number set. That means that your ranking view should specify the joining condition as something like this:
ON m.Code = s.Code AND m.Month >= s.Month
Please note that months in this case are assumed to be unique per Code. If that is not the case, you may first need to create a view that groups the original dataset by Code, Month and rank that view instead of the original dataset.
* According to the order of key.
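Putting the pieces together for the question's case, here is a sketch that creates a partitioned RankingView and a main view computing value_begin from the previous two rows of the same Code. It runs on Python's sqlite3 as a stand-in (an assumption; SQLite does allow subqueries in views, so this demonstrates the logic, not the MySQL restriction), and it assumes Month is unique per Code. The sample data are invented.

```python
import sqlite3

SETUP = """
CREATE TABLE Table1 (Code TEXT, Month INTEGER, Value INTEGER);

-- Row numbers restart for every Code: triangle join inside the partition.
CREATE VIEW RankingView AS
SELECT m.Code, m.Month, m.Value, COUNT(*) AS rownum
FROM Table1 AS m
INNER JOIN Table1 AS s ON m.Code = s.Code AND m.Month >= s.Month
GROUP BY m.Code, m.Month, m.Value;

-- value_begin = Value of the previous row + Value of the row before that.
CREATE VIEW SomeOtherView AS
SELECT t1.Code, t1.Month, t1.Value,
       IFNULL(t2.Value, 0) + IFNULL(t3.Value, 0) AS value_begin
FROM RankingView AS t1
LEFT JOIN RankingView AS t2 ON t1.Code = t2.Code AND t1.rownum = t2.rownum + 1
LEFT JOIN RankingView AS t3 ON t1.Code = t3.Code AND t1.rownum = t3.rownum + 2;
"""


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", [
        ("A", 1, 10), ("A", 2, 20), ("A", 3, 30), ("A", 4, 40),
        ("B", 1, 5), ("B", 2, 6),
    ])
    return conn.execute("SELECT * FROM SomeOtherView ORDER BY Code, Month").fetchall()


if __name__ == "__main__":
    for row in demo():
        print(row)
```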
I've got a couple of duplicates in a database that I want to inspect, so to see which values are duplicated, I did this:
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
This way, I get every relevant_field value occurring more than once. This query takes milliseconds to execute.
Now I wanted to inspect each of the duplicates, so I thought I could SELECT each row in some_table whose relevant_field is in the above query, so I did this:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
)
This turns out to be extreeeemely slow for some reason (it takes minutes). What exactly is going on here to make it that slow? relevant_field is indexed.
Eventually I tried creating a view "temp_view" from the first query (SELECT relevant_field FROM some_table GROUP BY relevant_field HAVING COUNT(*) > 1), and then making my second query like this instead:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT relevant_field
FROM temp_view
)
And that works just fine. MySQL does this in some milliseconds.
Any SQL experts here who can explain what's going on?
The subquery is being run for each row because older MySQL versions (before 5.6) rewrite IN (SELECT ...) internally into a correlated (dependent) subquery. You can make it non-correlated again by selecting everything from the subquery as a derived table, like so:
SELECT * FROM
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
) AS subquery
The final query would look like this:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT * FROM
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
) AS subquery
)
Rewrite the query into this
SELECT st1.*, st2.relevant_field FROM sometable st1
INNER JOIN sometable st2 ON (st1.relevant_field = st2.relevant_field)
GROUP BY st1.id /* list a unique sometable field here*/
HAVING COUNT(*) > 1
I think st2.relevant_field must be in the select, because otherwise the having clause will give an error, but I'm not 100% sure
Never use IN with a subquery; this is notoriously slow.
Only ever use IN with a fixed list of values.
More tips
If you want to make queries faster:
Don't do a SELECT *; only select the fields that you really need.
Make sure you have an index on relevant_field to speed up the equi-join.
Make sure to group by the primary key.
If you are on InnoDB and you only select indexed fields (and things are not too complex), then MySQL will resolve your query using only the indexes, speeding things way up.
General solution for 90% of your IN (SELECT ...) queries:
use this code
SELECT * FROM sometable a WHERE EXISTS (
SELECT 1 FROM sometable b
WHERE a.relevant_field = b.relevant_field
GROUP BY b.relevant_field
HAVING count(*) > 1)
SELECT st1.*
FROM some_table st1
inner join
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
)st2 on st2.relevant_field = st1.relevant_field;
I've tried your query on one of my databases, and also tried it rewritten as a join to a sub-query.
This worked a lot faster, try it!
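In a toy table the speed difference can't be observed, but the rewrites can at least be checked for equivalence. This sketch (Python's sqlite3 as a stand-in, invented data) runs the IN, EXISTS and JOIN-to-derived-table forms and compares their results; in MySQL itself, EXPLAIN showing DEPENDENT SUBQUERY is the sign that the IN form is being re-evaluated per row.

```python
import sqlite3

# Three formulations of "every row whose relevant_field occurs more than once".
IN_SQL = """
SELECT * FROM some_table
WHERE relevant_field IN (SELECT relevant_field FROM some_table
                         GROUP BY relevant_field HAVING COUNT(*) > 1)
ORDER BY id
"""

EXISTS_SQL = """
SELECT * FROM some_table AS a
WHERE EXISTS (SELECT 1 FROM some_table AS b
              WHERE a.relevant_field = b.relevant_field
              GROUP BY b.relevant_field HAVING COUNT(*) > 1)
ORDER BY a.id
"""

JOIN_SQL = """
SELECT st1.* FROM some_table AS st1
INNER JOIN (SELECT relevant_field FROM some_table
            GROUP BY relevant_field HAVING COUNT(*) > 1) AS st2
        ON st2.relevant_field = st1.relevant_field
ORDER BY st1.id
"""


def make_table():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE some_table (id INTEGER PRIMARY KEY, relevant_field TEXT)")
    conn.execute("CREATE INDEX idx_relevant_field ON some_table (relevant_field)")
    # Hypothetical data: 'a' appears 3 times, 'b' twice, 'c' once.
    conn.executemany("INSERT INTO some_table (relevant_field) VALUES (?)",
                     [("a",), ("b",), ("a",), ("c",), ("b",), ("a",)])
    return conn


def run_all(conn):
    return [conn.execute(sql).fetchall() for sql in (IN_SQL, EXISTS_SQL, JOIN_SQL)]


if __name__ == "__main__":
    for result in run_all(make_table()):
        print(result)
```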
I have reformatted your slow SQL query with www.prettysql.net:
SELECT *
FROM some_table
WHERE
relevant_field in
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
);
When using a table in both the query and the subquery, you should always alias both, like this:
SELECT *
FROM some_table as t1
WHERE
t1.relevant_field in
(
SELECT t2.relevant_field
FROM some_table as t2
GROUP BY t2.relevant_field
HAVING COUNT(t2.relevant_field) > 1
);
Does that help?
Subqueries vs joins
http://www.scribd.com/doc/2546837/New-Subquery-Optimizations-In-MySQL-6
Try this
SELECT t1.*
FROM
some_table t1,
(SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1) t2
WHERE
t1.relevant_field = t2.relevant_field;
First, you can find the duplicate rows, count how many times each value is used, and number the rows within each group, like this:
SELECT q.id,q.name,q.password,q.NID,(select count(*) from UserInfo k where k.NID= q.NID) as Count,
(
CASE q.NID
WHEN @curCode THEN
@curRow := @curRow + 1
ELSE
@curRow := 1
AND @curCode := q.NID
END
) AS No
FROM UserInfo q,
(
SELECT
@curRow := 1,
@curCode := ''
) rt
WHERE q.NID IN
(
SELECT NID
FROM UserInfo
GROUP BY NID
HAVING COUNT(*) > 1
)
After that, create a table and insert the result into it:
create table CopyTable
SELECT q.id,q.name,q.password,q.NID,(select count(*) from UserInfo k where k.NID= q.NID) as Count,
(
CASE q.NID
WHEN @curCode THEN
@curRow := @curRow + 1
ELSE
@curRow := 1
AND @curCode := q.NID
END
) AS No
FROM UserInfo q,
(
SELECT
@curRow := 1,
@curCode := ''
) rt
WHERE q.NID IN
(
SELECT NID
FROM UserInfo
GROUP BY NID
HAVING COUNT(*) > 1
)
Finally, delete the duplicate rows. No starts at 0 within each group, so deleting every row whose No is not 0 keeps only the first row of each group.
delete from CopyTable where No!= 0;
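As a simpler alternative to the user-variable numbering (a swapped-in technique, not the method above), you can keep the row with the lowest id per NID and delete the rest directly. The extra derived table k is the usual MySQL workaround for error 1093 (selecting from the table you are deleting from); SQLite, used here through Python as a stand-in, accepts it too. It assumes id is unique and that "first" means lowest id; the sample rows are hypothetical.

```python
import sqlite3

# Keep the lowest id for every NID and delete the other rows. The inner
# derived table "k" is the usual MySQL workaround for error 1093.
DEDUP_SQL = """
DELETE FROM UserInfo
WHERE id NOT IN (SELECT keep_id
                 FROM (SELECT MIN(id) AS keep_id FROM UserInfo GROUP BY NID) AS k)
"""


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE UserInfo (id INTEGER PRIMARY KEY, name TEXT, NID TEXT)")
    # Hypothetical rows: NID 'x' three times, 'y' twice, 'z' once.
    conn.executemany("INSERT INTO UserInfo VALUES (?, ?, ?)", [
        (1, "Ann", "x"), (2, "Bob", "y"), (3, "Ann2", "x"),
        (4, "Ann3", "x"), (5, "Cy", "z"), (6, "Bob2", "y"),
    ])
    conn.execute(DEDUP_SQL)
    conn.commit()
    return conn.execute("SELECT id, NID FROM UserInfo ORDER BY id").fetchall()


if __name__ == "__main__":
    print(demo())  # [(1, 'x'), (2, 'y'), (5, 'z')]
```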
Sometimes, when the data grows bigger, MySQL queries using WHERE ... IN can be pretty slow because of how the optimizer handles them. Try using STRAIGHT_JOIN to tell MySQL to join the tables in the order they are listed, e.g.
SELECT STRAIGHT_JOIN table.field FROM table WHERE table.id IN (...)
But beware: in most cases the MySQL optimizer works pretty well, so I would recommend using it only when you have this kind of problem.
This is similar to my case, where I have a table named tabel_buku_besar. What I need is:
1. Find the records in tabel_buku_besar that have account_code='101.100', companyarea='20000' and IDR as currency.
2. Get all records from tabel_buku_besar whose account_code is different from step 1 but whose transaction_number is in the step 1 result.
While using select ... from...where....transaction_number in (select transaction_number from ....), my query runs extremely slowly and sometimes causes a request timeout or makes my application unresponsive...
I tried this combination, and the result is not bad:
select DATE_FORMAT(L.TANGGAL_INPUT,'%d-%m-%y') AS TANGGAL,
L.TRANSACTION_NUMBER AS VOUCHER,
L.ACCOUNT_CODE,
C.DESCRIPTION,
L.DEBET,
L.KREDIT
from (select * from tabel_buku_besar A
where A.COMPANYAREA='$COMPANYAREA'
AND A.CURRENCY='$Currency'
AND A.ACCOUNT_CODE!='$ACCOUNT'
AND (A.TANGGAL_INPUT BETWEEN STR_TO_DATE('$StartDate','%d/%m/%Y') AND STR_TO_DATE('$EndDate','%d/%m/%Y'))) L
INNER JOIN (select * from tabel_buku_besar A
where A.COMPANYAREA='$COMPANYAREA'
AND A.CURRENCY='$Currency'
AND A.ACCOUNT_CODE='$ACCOUNT'
AND (A.TANGGAL_INPUT BETWEEN STR_TO_DATE('$StartDate','%d/%m/%Y') AND STR_TO_DATE('$EndDate','%d/%m/%Y'))) R ON R.TRANSACTION_NUMBER=L.TRANSACTION_NUMBER AND R.COMPANYAREA=L.COMPANYAREA
LEFT OUTER JOIN master_account C ON C.ACCOUNT_CODE=L.ACCOUNT_CODE AND C.COMPANYAREA=L.COMPANYAREA
ORDER BY L.TANGGAL_INPUT,L.TRANSACTION_NUMBER
I find this to be the most efficient way to check whether a value exists; the logic can easily be inverted to find values that don't exist (i.e. IS NULL):
SELECT * FROM primary_table st1
LEFT JOIN comparision_table st2 ON (st1.relevant_field = st2.relevant_field)
WHERE st2.primaryKey IS NOT NULL
*Replace relevant_field with the name of the value that you want to check exists in your table
*Replace primaryKey with the name of the primary key column on the comparison table.
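A minimal sketch of both directions, using Python's sqlite3 as a stand-in and the placeholder names from above (the sample values are invented). One caveat: if the comparison table holds several matching rows, the IS NOT NULL form returns the primary row once per match.

```python
import sqlite3


def matching_values(conn, present=True):
    """Values of primary_table that do (present=True) or do not
    (present=False) have a match in comparision_table, via LEFT JOIN."""
    null_test = "IS NOT NULL" if present else "IS NULL"  # fixed strings only
    sql = f"""
        SELECT st1.relevant_field
        FROM primary_table AS st1
        LEFT JOIN comparision_table AS st2
               ON st1.relevant_field = st2.relevant_field
        WHERE st2.primaryKey {null_test}
        ORDER BY st1.relevant_field
    """
    return [row[0] for row in conn.execute(sql)]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE primary_table (relevant_field TEXT)")
    conn.execute("CREATE TABLE comparision_table "
                 "(primaryKey INTEGER PRIMARY KEY, relevant_field TEXT)")
    conn.executemany("INSERT INTO primary_table VALUES (?)",
                     [("a",), ("b",), ("c",), ("d",)])
    conn.executemany("INSERT INTO comparision_table (relevant_field) VALUES (?)",
                     [("a",), ("c",)])
    return matching_values(conn, True), matching_values(conn, False)


if __name__ == "__main__":
    print(demo())  # (['a', 'c'], ['b', 'd'])
```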
It's slow because your sub-query is executed once for every comparison between relevant_field and your IN clause's sub-query. You can avoid that like so:
SELECT *
FROM some_table T1 INNER JOIN
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
) T2
USING(relevant_field)
This creates a derived table T2 (in memory unless it's too large to fit), then INNER JOINs it with T1. The derived table is built only once, so the subquery is executed only once.
I find this particularly handy for optimising cases where a pivot is used to associate a bulk data table with a more specific data table and you want to produce counts of the bulk table based on a subset of the more specific one's related rows. If you can narrow down the bulk rows to <5% then the resulting sparse accesses will generally be faster than a full table scan.
I.e., you have a Users table (condition), an Orders table (pivot) and a LineItems table (bulk) which references counts of Products. You want the sum of Products grouped by User in PostCode '90210'. In this case the JOIN will be orders of magnitude smaller than when using WHERE relevant_field IN( SELECT * FROM (...) T2 ), and therefore much faster, especially if that JOIN is spilling to disk!
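To make the example concrete, here is a sketch with a hypothetical schema (Users.postcode, Orders.user_id and LineItems.quantity are assumptions, as is SQLite via Python standing in for MySQL). The pivot is narrowed to the orders of users in the target postcode inside a derived table, and only then joined to the bulk LineItems table.

```python
import sqlite3

# Narrow the pivot (Orders) to the target postcode inside a derived table,
# then join that small set to the bulk table once.
SUM_SQL = """
SELECT o.user_id, SUM(li.quantity) AS products
FROM LineItems AS li
INNER JOIN (
    SELECT Orders.order_id, Orders.user_id
    FROM Orders
    INNER JOIN Users ON Users.user_id = Orders.user_id
    WHERE Users.postcode = ?
) AS o ON o.order_id = li.order_id
GROUP BY o.user_id
ORDER BY o.user_id
"""


def demo(postcode="90210"):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Users (user_id INTEGER PRIMARY KEY, postcode TEXT);
        CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, user_id INTEGER);
        CREATE TABLE LineItems (order_id INTEGER, product_id TEXT, quantity INTEGER);
        CREATE INDEX idx_lineitems_order ON LineItems (order_id);
    """)
    conn.executemany("INSERT INTO Users VALUES (?, ?)",
                     [(1, "90210"), (2, "10001"), (3, "90210")])
    conn.executemany("INSERT INTO Orders VALUES (?, ?)",
                     [(100, 1), (101, 2), (102, 1), (103, 3)])
    conn.executemany("INSERT INTO LineItems VALUES (?, ?, ?)", [
        (100, "p1", 2), (100, "p2", 1), (101, "p1", 5), (102, "p3", 4), (103, "p1", 1),
    ])
    return conn.execute(SUM_SQL, (postcode,)).fetchall()


if __name__ == "__main__":
    print(demo())  # [(1, 7), (3, 1)]
```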