How to shuffle id in my MySQL table permanently in place?

I have a simple table with the following structure and a lot of rows:
id | name | title |
------------------------------
I need to replace each id with another value, in other words I need to permanently shuffle the ids in my table. What query do I need to run? I only need to run it once, no matter how long it takes or how much memory it uses.

Considering your row count is around 800, you may do something like the following:
Create a temporary table with all the records of your table, e.g.
CREATE TABLE TMP_TABLE (SELECT * FROM YOUR_TABLE);
DROP TABLE YOUR_TABLE;
CREATE TABLE YOUR_TABLE (SELECT * FROM TMP_TABLE ORDER BY RAND());
DROP TABLE TMP_TABLE;

The following query should do the following:
The whole id set stays the same as before; only the ids are shuffled;
tbl is the table to update;
tbl2 assigns a random row_num to each row of tbl;
tbl3 assigns another, independently random row_num to each row of tbl;
With tbl2.row_num1 = tbl3.row_num2 as the join condition, the shuffle is done.
UPDATE tbl INNER JOIN
(SELECT *, (@rn1 := @rn1 + 1) as row_num1 FROM tbl CROSS JOIN (SELECT @rn1 := 0) param ORDER BY RAND()) tbl2
ON tbl.id = tbl2.id
INNER JOIN
(SELECT *, (@rn2 := @rn2 + 1) as row_num2 FROM tbl CROSS JOIN (SELECT @rn2 := 0) param ORDER BY RAND()) tbl3
ON tbl2.row_num1 = tbl3.row_num2
SET tbl.id = tbl3.id;

You could use a database VIEW with the shuffle logic for your purpose.
Otherwise, use another table to back up the current table, then TRUNCATE your table and re-insert the rows selected from the backup table in shuffled order.
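A minimal sketch of that backup-and-reinsert approach, assuming the table and columns from the question (YOUR_TABLE with id, name, title) and a hypothetical backup table name:
-- Back up the current rows.
CREATE TABLE YOUR_TABLE_BACKUP SELECT * FROM YOUR_TABLE;
-- Empty the original table.
TRUNCATE TABLE YOUR_TABLE;
-- Re-insert the rows from the backup in shuffled order.
INSERT INTO YOUR_TABLE (id, name, title)
SELECT id, name, title FROM YOUR_TABLE_BACKUP ORDER BY RAND();
-- Drop the backup once you are happy with the result.
-- DROP TABLE YOUR_TABLE_BACKUP;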

Related

How to concatenate a table resulting from a query into another temporary table on a procedure stored on mysql?

I have a temporary table created from a query, e.g.:
CREATE TEMPORARY TABLE IF NOT EXISTS table4 AS (select * from table1)
and then I have another table resulting from a query, like:
select column from table2
What I would like to do is concatenate this column as a new column on the temporary table. An inner join would not work because they don't have a common column.
This would be like concatenate() in Python with axis=0.
I would appreciate any help.
If I understand correctly, you want to add the concatenated result of the second query as another column of your temporary table. That doesn't make much sense without more context, since the new column would hold the same result on every row, but here goes my solution:
CREATE TEMPORARY TABLE IF NOT EXISTS table4 AS
(
select
*,
(select group_concat(column) from table2 group by null) as concatcolumn
from
table1
)
I have grouped by NULL on the GROUP_CONCAT so that it aggregates over all the rows. Inside this nested subquery (is it even called nested when it is inside a column definition?) you can add WHERE conditions, which would make the question make somewhat more sense. Hope this solution helps. Cheers,
EDIT:
Based on the OP's comments, and supposing that rows in both tables are aligned (matching rows share the same row number but have no matching key): this was more difficult than I expected, since the question is tagged MySQL and this DBMS has no ranking function. Here is what I came up with; it is untested.
CREATE TEMPORARY TABLE IF NOT EXISTS table4 AS
(
select
t1.*,
t2.column
from
(
select t.*, @rownum1 := @rownum1 + 1 as rank from table1 t, (select @rownum1 := 0) r
) t1
join
(
select t.*, @rownum2 := @rownum2 + 1 as rank from table2 t, (select @rownum2 := 0) r
) t2
on
t1.rank = t2.rank
)
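For what it's worth, if you happen to be on MySQL 8.0 or later there is a ranking function after all, so the same pairing-by-row-number idea can be written with ROW_NUMBER() instead of user variables. A hedged sketch using the question's placeholder names (table1, table2, table4, column):
CREATE TEMPORARY TABLE IF NOT EXISTS table4 AS
(
select
t1.*,
-- `column` is backticked because COLUMN is a reserved word; substitute your real column name.
t2.`column`
from
-- ROW_NUMBER() OVER () numbers rows in whatever order the server reads them,
-- so this relies on the same "rows happen to be aligned" assumption as above.
(select t.*, ROW_NUMBER() OVER () as rn from table1 t) t1
join
(select t.*, ROW_NUMBER() OVER () as rn from table2 t) t2
on t1.rn = t2.rn
)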

How can I do my own increment value in MySQL

I'm struggling to write a proper SQL script to increment a field in a specific way.
The two scripts below run without any exception, but nothing happens to the results.
Script 1:
UPDATE
myTable T1,
(
SELECT id,
(@s:=@s+1) AS seq
FROM myTable, (SELECT (@s:=0) AS s ) s
WHERE infotext IS NULL ORDER BY grouptext
) T2
SET sequence = seq
WHERE T1.id = T2.id
Script 2:
UPDATE myTable AS target
INNER JOIN (
SELECT supfault_id,
(@s:=@s+1) AS seq
FROM myTable, (SELECT (@s:=0) AS s ) s
WHERE infotext IS NULL ORDER BY grouptext
) AS ordered ON ordered.id = target.id
SET sequence = seq
This one gets the last value (in descending order) from tbl_1, increments it by one, and then updates tbl_2:
set @inc = 0;
select cast(valToIncrement as signed) into @inc from
(select REPLACE(fkid,' ','') as valToIncrement from tbl_1 ORDER BY fkid) as a ORDER BY valToIncrement desc limit 1;
update tbl_2 set fkid = @inc + 1 where fkid = 122;
The subqueries work well separately, so I wondered why I can't update my sequence value with seq from the subquery.
I'm no expert, but I felt that some kind of virtual table needed to be used for my subquery.
Here is a solution for the inner join case:
CREATE TEMPORARY TABLE supportGroupSeqcalculation AS
SELECT supfault_id,
(@s:=@s+1) AS seq
FROM myTable, (SELECT (@s:=0) AS s ) s
WHERE infotext IS NULL
ORDER BY grouptext;
UPDATE myTable AS target
INNER JOIN supportGroupSeqcalculation AS ordered ON ordered.supfault_id = target.supfault_id
SET sequence = seq;
DROP TEMPORARY TABLE supportGroupSeqcalculation;
We can put the rows into the temporary table in a specific order and record that order as the sequence value.
It is not strictly necessary to drop the temporary table, since it exists only in the current session.

insert unrelated values from multiple mysql tables

I have an app that reads from multiple MySQL tables, but I'd like to put all the data into one table. The thing is, these tables have no linking fields... the app just processes the rows sequentially across the 3 tables, in the hope that the correct rows are lined up in each table (i.e. that row 1 in table1 corresponds to row 1 in table2 and table3, and so on).
My tables are as follows:
Table1:
Name,Surname,ID,DoB
Table2:
Address,Town,State
Table3:
password
What I want is :
Table4:
Name,Surname,ID,DoB,Address,Town,State,password
I have created Table4 and I'm now trying to insert the values with a select query...
I've tried ...
SELECT
t1.Name,
t1.Surname,
t1.ID,
t1.DoB,
t2.Address,
t2.Town,
t2.State,
t3.password
FROM table1 AS t1,table2 AS t2, table3 AS t3;
...but this gives me duplicate rows because there is no WHERE clause. And since there are no linking fields, I can't use a JOIN statement, right?
I'm not very experienced with SQL, so please help!
Well, officially you're messed up. There is no first or last row in an RDBMS unless you use an ORDER BY clause. That's also what the manual states. If you issue a
SELECT * FROM your_table;
you cannot be sure to get the result in the order the rows were inserted, or even in the same order every time you issue the statement.
In practice, on the other hand, you will most of the time get the same result, and most of the time even in the order the rows were inserted.
What you can do is, first, slap the one who didn't think of putting a column in each table that determines a sort order (in the future use either an auto_increment column or a timestamp column that holds the date and time of insertion, or whatever suits your needs), and second (but really do this only if you have no other choice, because as I said it's unreliable), emulate a row number on which you can join.
SELECT * FROM (
SELECT table1.*, @rn1 := @rn1 + 1 as row_number FROM
table1,
(SELECT @rn1 := 0) v
) a
LEFT JOIN (
SELECT table2.*, @rn2 := @rn2 + 1 as row_number FROM
table2,
(SELECT @rn2 := 0) v
) b ON a.row_number = b.row_number
LEFT JOIN (
SELECT table3.*, @rn3 := @rn3 + 1 as row_number FROM
table3,
(SELECT @rn3 := 0) v
) c ON a.row_number = c.row_number
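To actually load Table4 from the question, the same emulated row numbers can feed an INSERT ... SELECT. A sketch using the column names from the question, with the same caveat that the implicit row order is unreliable:
INSERT INTO Table4 (Name, Surname, ID, DoB, Address, Town, State, password)
SELECT a.Name, a.Surname, a.ID, a.DoB, b.Address, b.Town, b.State, c.password
FROM (
SELECT table1.*, @rn1 := @rn1 + 1 AS row_number FROM table1, (SELECT @rn1 := 0) v
) a
LEFT JOIN (
SELECT table2.*, @rn2 := @rn2 + 1 AS row_number FROM table2, (SELECT @rn2 := 0) v
) b ON a.row_number = b.row_number
LEFT JOIN (
SELECT table3.*, @rn3 := @rn3 + 1 AS row_number FROM table3, (SELECT @rn3 := 0) v
) c ON a.row_number = c.row_number;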

percentile by COUNT(DISTINCT) with correlated WHERE only works with a view (or without DISTINCT)

I've got a weird one, and I don't know if it's my syntax (which seems straightforward) or a bug (or just unsupported).
Here's my query that works but is needlessly slow:
UPDATE table1
SET table1column1 =
(SELECT COUNT(DISTINCT table2column1) FROM table2view WHERE table2column1 <= (SELECT table2column1 FROM table2 WHERE table2.id = table1.id) )
/
(SELECT COUNT(DISTINCT table2column1) FROM table2)
+ (SELECT COUNT(DISTINCT table2column2) FROM table2view WHERE table2column2 <= (SELECT table2column2 FROM table2 WHERE table2.id = table1.id) )
/
(SELECT COUNT(DISTINCT table2column2) FROM table2)
+ (SELECT COUNT(DISTINCT table2column3) FROM table2view WHERE table2column3 <= (SELECT table2column3 FROM table2 WHERE table2.id = table1.id) )
/ (SELECT COUNT(DISTINCT table2column3) FROM table2);
It's just the sum of three percentiles (of table2column1, table2column2, and table2column3) with duplicates removed.
Here's where it gets weird. I have to use a view for this to work on the subquery with the WHERE or it will only UPDATE the first row of table1, and set the rest of the rows' table1column1 to 0. That table2view is an exact duplicate of table2. Yeah, weird.
If I don't use DISTINCT, I can do it without the view. Does that make sense? Note: I have to have DISTINCT because I have lots of duplicates.
I tried making it SELECT only from the view, but that slowed it down worse.
Does anyone know what the problem is and the best way to rework this query so it doesn't take so long? It's in a TRIGGER, and the updated data is needed pretty much on demand.
Many thanks in advance!
Details
I'm testing the speed in phpMyAdmin's command line.
I'm pretty sure the degradation is coming from the view since the more of the view and the less of the actual table I use, the slower it gets.
When I do the one without DISTINCT, it's lightning fast.
Only works on views?
OK, so I just set up a copy of table2. I tried first to do the original query substituting the view with the copy. No go.
I tried to do the query below with the copy instead of the view. No go.
Hopefully the introduction of these constants will better show what I'm trying to do.
SET @table2column1_distinct_count = (SELECT COUNT(DISTINCT table2column1) FROM table2);
SET @table2column2_distinct_count = (SELECT COUNT(DISTINCT table2column2) FROM table2);
SET @table2column3_distinct_count = (SELECT COUNT(DISTINCT table2column3) FROM table2);
UPDATE table1, table2
SET table1.table1column1 = (SELECT COUNT(DISTINCT table2column1) FROM table2view WHERE table2column1 <= table2.table2column1) / @table2column1_distinct_count
+ (SELECT COUNT(DISTINCT table2column2) FROM table2view WHERE table2column2 <= table2.table2column2) / @table2column2_distinct_count
+ (SELECT COUNT(DISTINCT table2column3) FROM table2view WHERE table2column3 <= table2.table2column3) / @table2column3_distinct_count
WHERE table1.id = table2.id;
Again, when I use table2 instead of the table2view, it only updates the first row properly and sets all other rows' table1.table1column1 = 0.
Math
I'm trying to set table1.table1column1 = to the sum of the percentiles of table2column1, table2column2, and table2column3 by id.
I compute a percentile as (the count of distinct table2columnX values <= the current table2columnX) / (the total count of distinct table2columnX values).
I use DISTINCT to get rid of the excessive duplicates.
View
Here's the SELECT for the view. Does this help?
CREATE VIEW myTable.table2view AS SELECT
table2.table2column1 AS table2column1,
table2.table2column2 AS table2column2,
table2.table2column2 AS table2column3
FROM table2
GROUP BY table2.id;
Is there something special about the GROUP BY in the view's SELECT that makes this work (that I'm not seeing)?
I would probably say that the query is slow because it is repeatedly accessing the table when the trigger fires.
I am no SQL expert but I have tried to put together a query using temporary tables. You can see if it helps speed up the query. I have used different but similar sounding column names in my code sample below.
EDIT : There was a calculation error in my earlier code. Updated now.
SELECT COUNT(id) INTO @no_of_attempts from tb2;
-- DROP TABLE IF EXISTS S1Percentiles;
-- DROP TABLE IF EXISTS S2Percentiles;
-- DROP TABLE IF EXISTS S3Percentiles;
CREATE TEMPORARY TABLE S1Percentiles (
s1 FLOAT NOT NULL,
percentile FLOAT NOT NULL DEFAULT 0.00
);
CREATE TEMPORARY TABLE S2Percentiles (
s2 FLOAT NOT NULL,
percentile FLOAT NOT NULL DEFAULT 0.00
);
CREATE TEMPORARY TABLE S3Percentiles (
s3 FLOAT NOT NULL,
percentile FLOAT NOT NULL DEFAULT 0.00
);
INSERT INTO S1Percentiles (s1, percentile)
SELECT A.s1, ((COUNT(B.s1)/@no_of_attempts)*100)
FROM (SELECT DISTINCT s1 from tb2) A
INNER JOIN tb2 B
ON B.s1 <= A.s1
GROUP BY A.s1;
INSERT INTO S2Percentiles (s2, percentile)
SELECT A.s2, ((COUNT(B.s2)/@no_of_attempts)*100)
FROM (SELECT DISTINCT s2 from tb2) A
INNER JOIN tb2 B
ON B.s2 <= A.s2
GROUP BY A.s2;
INSERT INTO S3Percentiles (s3, percentile)
SELECT A.s3, ((COUNT(B.s3)/@no_of_attempts)*100)
FROM (SELECT DISTINCT s3 from tb2) A
INNER JOIN tb2 B
ON B.s3 <= A.s3
GROUP BY A.s3;
-- select * from S1Percentiles;
-- select * from S2Percentiles;
-- select * from S3Percentiles;
UPDATE tb1 A
INNER JOIN
(
SELECT B.tb1_id AS id, (C.percentile + D.percentile + E.percentile) AS sum FROM tb2 B
INNER JOIN S1Percentiles C
ON B.s1 = C.s1
INNER JOIN S2Percentiles D
ON B.s2 = D.s2
INNER JOIN S3Percentiles E
ON B.s3 = E.s3
) F
ON A.id = F.id
SET A.sum = F.sum;
-- SELECT * FROM tb1;
DROP TABLE S1Percentiles;
DROP TABLE S2Percentiles;
DROP TABLE S3Percentiles;
What this does is that it records the percentile for each score group and then finally just updates the tb1 column with the requisite data instead of recalculating the percentile for each student row.
You should also index columns s1, s2 and s3 for optimizing the queries on these columns.
Note: Please update the column names according to your db schema. Also note that each percentile calculation has been multiplied by 100 as I believe that percentile is usually calculated that way.
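A sketch of the indexing suggestion above, assuming tb2 uses the same s1, s2 and s3 column names as the sample code (the index names are hypothetical):
ALTER TABLE tb2
ADD INDEX idx_tb2_s1 (s1),
ADD INDEX idx_tb2_s2 (s2),
ADD INDEX idx_tb2_s3 (s3);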

MySQL - SELECT WHERE field IN (subquery) - Extremely slow why?

I've got a couple of duplicates in a database that I want to inspect, so to see which rows are duplicates, I did this:
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
This way, I will get all rows with relevant_field occurring more than once. This query takes milliseconds to execute.
Now, I wanted to inspect each of the duplicates, so I thought I could SELECT each row in some_table with a relevant_field in the above query, so I did like this:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
)
This turns out to be extreeeemely slow for some reason (it takes minutes). What exactly is going on here to make it that slow? relevant_field is indexed.
Eventually I tried creating a view "temp_view" from the first query (SELECT relevant_field FROM some_table GROUP BY relevant_field HAVING COUNT(*) > 1), and then making my second query like this instead:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT relevant_field
FROM temp_view
)
And that works just fine. MySQL does this in some milliseconds.
Any SQL experts here who can explain what's going on?
The subquery is being run once for each row because MySQL treats it as a correlated (dependent) subquery. One can make it non-correlated by selecting everything from the subquery in another derived table, like so:
SELECT * FROM
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
) AS subquery
The final query would look like this:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT * FROM
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
) AS subquery
)
Rewrite the query into this
SELECT st1.*, st2.relevant_field FROM sometable st1
INNER JOIN sometable st2 ON (st1.relevant_field = st2.relevant_field)
GROUP BY st1.id /* list a unique sometable field here*/
HAVING COUNT(*) > 1
I think st2.relevant_field must be in the select, because otherwise the having clause will give an error, but I'm not 100% sure
Never use IN with a subquery; this is notoriously slow.
Only ever use IN with a fixed list of values.
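For example, IN with a fixed list known up front (the values here are placeholders) is fine:
SELECT *
FROM some_table
WHERE relevant_field IN ('value1', 'value2', 'value3');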
More tips
If you want to make queries faster, don't do a SELECT *; only select the fields that you really need.
Make sure you have an index on relevant_field to speed up the equi-join.
Make sure to GROUP BY on the primary key.
If you are on InnoDB and you only select indexed fields (and things are not too complex), then MySQL will resolve your query using only the indexes, speeding things way up.
General solution for 90% of your IN (SELECT ...) queries
Use this code
SELECT * FROM sometable a WHERE EXISTS (
SELECT 1 FROM sometable b
WHERE a.relevant_field = b.relevant_field
GROUP BY b.relevant_field
HAVING count(*) > 1)
SELECT st1.*
FROM some_table st1
inner join
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
)st2 on st2.relevant_field = st1.relevant_field;
I've tried your query on one of my databases, and also tried it rewritten as a join to a sub-query.
This worked a lot faster, try it!
I have reformatted your slow sql query with www.prettysql.net
SELECT *
FROM some_table
WHERE
relevant_field in
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT ( * ) > 1
);
When using a table in both the query and the subquery, you should always alias both, like this:
SELECT *
FROM some_table as t1
WHERE
t1.relevant_field in
(
SELECT t2.relevant_field
FROM some_table as t2
GROUP BY t2.relevant_field
HAVING COUNT ( t2.relevant_field ) > 1
);
Does that help?
Subqueries vs joins
http://www.scribd.com/doc/2546837/New-Subquery-Optimizations-In-MySQL-6
Try this
SELECT t1.*
FROM
some_table t1,
(SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT (*) > 1) t2
WHERE
t1.relevant_field = t2.relevant_field;
First, you can find the duplicate rows, count how many times each value occurs, and number the rows within each group, like this:
SELECT q.id,q.name,q.password,q.NID,(select count(*) from UserInfo k where k.NID= q.NID) as Count,
(
CASE q.NID
WHEN @curCode THEN
@curRow := @curRow + 1
ELSE
@curRow := 1
AND @curCode := q.NID
END
) AS No
FROM UserInfo q,
(
SELECT
@curRow := 1,
@curCode := ''
) rt
WHERE q.NID IN
(
SELECT NID
FROM UserInfo
GROUP BY NID
HAVING COUNT(*) > 1
)
After that, create a table and insert the result into it.
create table CopyTable
SELECT q.id,q.name,q.password,q.NID,(select count(*) from UserInfo k where k.NID= q.NID) as Count,
(
CASE q.NID
WHEN @curCode THEN
@curRow := @curRow + 1
ELSE
@curRow := 1
AND @curCode := q.NID
END
) AS No
FROM UserInfo q,
(
SELECT
@curRow := 1,
@curCode := ''
) rt
WHERE q.NID IN
(
SELECT NID
FROM UserInfo
GROUP BY NID
HAVING COUNT(*) > 1
)
Finally, delete the duplicate rows. No starts at 0 for the first row of each group, so deleting every row where No != 0 removes all duplicates except the first one in each group.
delete from CopyTable where No!= 0;
Sometimes, when the data grows bigger, MySQL WHERE IN queries can be pretty slow because of query optimization. Try using STRAIGHT_JOIN to tell MySQL to execute the query as-is, e.g.
SELECT STRAIGHT_JOIN table.field FROM table WHERE table.id IN (...)
But beware: in most cases the MySQL optimizer works pretty well, so I would recommend using this only when you have this kind of problem.
This is similar to my case, where I have a table named tabel_buku_besar. What I need is:
To find the records in tabel_buku_besar that have account_code='101.100', companyarea='20000', and IDR as the currency
To get all records from tabel_buku_besar that have the same account_code as in step 1 but whose transaction_number is in the step 1 result
When using select ... from ... where ... transaction_number in (select transaction_number from ....), my query runs extremely slowly and sometimes causes a request timeout or makes my application unresponsive...
I tried this combination and the result... not bad...
select DATE_FORMAT(L.TANGGAL_INPUT,'%d-%m-%y') AS TANGGAL,
L.TRANSACTION_NUMBER AS VOUCHER,
L.ACCOUNT_CODE,
C.DESCRIPTION,
L.DEBET,
L.KREDIT
from (select * from tabel_buku_besar A
where A.COMPANYAREA='$COMPANYAREA'
AND A.CURRENCY='$Currency'
AND A.ACCOUNT_CODE!='$ACCOUNT'
AND (A.TANGGAL_INPUT BETWEEN STR_TO_DATE('$StartDate','%d/%m/%Y') AND STR_TO_DATE('$EndDate','%d/%m/%Y'))) L
INNER JOIN (select * from tabel_buku_besar A
where A.COMPANYAREA='$COMPANYAREA'
AND A.CURRENCY='$Currency'
AND A.ACCOUNT_CODE='$ACCOUNT'
AND (A.TANGGAL_INPUT BETWEEN STR_TO_DATE('$StartDate','%d/%m/%Y') AND STR_TO_DATE('$EndDate','%d/%m/%Y'))) R ON R.TRANSACTION_NUMBER=L.TRANSACTION_NUMBER AND R.COMPANYAREA=L.COMPANYAREA
LEFT OUTER JOIN master_account C ON C.ACCOUNT_CODE=L.ACCOUNT_CODE AND C.COMPANYAREA=L.COMPANYAREA
ORDER BY L.TANGGAL_INPUT,L.TRANSACTION_NUMBER
I find this to be the most efficient way of finding whether a value exists; the logic can easily be inverted to find values that don't exist (i.e. use IS NULL instead of IS NOT NULL):
SELECT * FROM primary_table st1
LEFT JOIN comparision_table st2 ON (st1.relevant_field = st2.relevant_field)
WHERE st2.primaryKey IS NOT NULL
*Replace relevant_field with the name of the column whose value you want to check exists in your table.
*Replace primaryKey with the name of the primary key column on the comparison table.
It's slow because your sub-query is executed once for every comparison between relevant_field and your IN clause's sub-query. You can avoid that like so:
SELECT *
FROM some_table T1 INNER JOIN
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
) T2
USING(relevant_field)
This creates a derived table (in memory unless it's too large to fit) as T2, then INNER JOIN's it with T1. The JOIN happens one time, so the query is executed one time.
I find this particularly handy for optimising cases where a pivot is used to associate a bulk data table with a more specific data table and you want to produce counts of the bulk table based on a subset of the more specific one's related rows. If you can narrow down the bulk rows to <5% then the resulting sparse accesses will generally be faster than a full table scan.
i.e. you have a Users table (condition), an Orders table (pivot) and a LineItems table (bulk) which references counts of Products. You want the sum of Products grouped by User in PostCode '90210'. In this case the JOIN will be orders of magnitude smaller than when using WHERE relevant_field IN( SELECT * FROM (...) T2 ), and therefore much faster, especially if that JOIN is spilling to disk!
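A sketch of that Users/Orders/LineItems scenario, with hypothetical column names (user_id, order_id, quantity, postcode), showing the derived table narrowing the condition side before it ever touches the bulk table:
SELECT o.user_id, SUM(li.quantity) AS products
FROM (
-- Narrow the condition table first; only these users' rows are probed below.
SELECT user_id FROM Users WHERE postcode = '90210'
) u
INNER JOIN Orders o ON o.user_id = u.user_id
INNER JOIN LineItems li ON li.order_id = o.order_id
GROUP BY o.user_id;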