MySQL: The fastest way to split a table based on a condition - mysql

I have two tables:
1) is a list of all parameter-ids and the info to which set of parameters the parameter-id belongs
2) is data that includes some of the parameter-ids, and some additional data such as timestamp and values.
I'm designing a data-warehouse-like system. But instead of a summary table where I store precalculated values (which doesn't really make sense in my case), I am trying to decrease the amount of data the different reporting scripts have to look through to get their results.
I want to transfer every row in table2 into a separate table for each set of parameters, so that in the end I have "summary tables", one per set of parameters. Which parameter belongs to which set is stored in table1.
Is there a faster way than to loop over every entry in table1, get @param_id = ... and @tablename = ..., and do an INSERT INTO @tablename SELECT * FROM table2 WHERE parameter_id = @param_id? I read that a "set-based approach" would be faster (and better) than the procedural approach, but I don't quite get how that would work in my case.
Any help is appreciated!

Don't do it. Your 3rd table would be redundant with the original two tables. Instead do a JOIN between the two tables whenever you need pieces from both.
SELECT t1.foo, t2.bar
FROM t1
JOIN t2 ON t1.x = t2.x
WHERE ...;
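A minimal sketch of that JOIN, assuming hypothetical column names (set_name in table1; ts and value in table2) and using Python's built-in sqlite3 as a stand-in for MySQL. The SELECT itself is plain SQL, and the index on table2.parameter_id is an assumption about what would keep the lookup cheap:

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the JOIN is standard SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (parameter_id INTEGER PRIMARY KEY, set_name TEXT);
    CREATE TABLE table2 (parameter_id INTEGER, ts TEXT, value REAL);
    CREATE INDEX ix_table2_param ON table2 (parameter_id);
    INSERT INTO table1 VALUES (1, 'engine'), (2, 'engine'), (3, 'cabin');
    INSERT INTO table2 VALUES
        (1, '2020-01-01 00:00', 10.5),
        (2, '2020-01-01 00:00', 11.0),
        (3, '2020-01-01 00:00', 21.3),
        (1, '2020-01-01 00:05', 10.7);
""")


def rows_for_set(set_name):
    """Return the table2 rows whose parameter belongs to the given set."""
    return conn.execute(
        """
        SELECT t2.parameter_id, t2.ts, t2.value
        FROM table1 AS t1
        JOIN table2 AS t2 ON t2.parameter_id = t1.parameter_id
        WHERE t1.set_name = ?
        ORDER BY t2.ts, t2.parameter_id
        """,
        (set_name,),
    ).fetchall()


engine_rows = rows_for_set("engine")
cabin_rows = rows_for_set("cabin")
```

Each reporting script would pass its own set name, so no per-set copy of table2 has to be maintained.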

Related

Pull records from one table where 1 variable exists in a second table? Very large tables

I am completely new to database coding, and I've tried Googling but cannot seem to figure this out. I imagine there's a simple solution. I have a very large table with MemberIDs and a few other relevant variables that I want to pull records from (table1). I also have a second large table of distinct MemberIDs (table2). I want to pull rows from table 1 where the MemberID exists in table2.
Here’s how I tried to do it, and for some reason I suspect this isn’t working correctly, or there may be a much better way to do this.
proc sql;
create table tablewant as select
MemberID, var1, var2, var3
from table1
where exists (select MemberID from table2)
;
quit;
Is there anything wrong with the way I’m doing this? What's the best way to solve this when working with extremely large tables (over 100 million records)? Would doing some sort of join be better? Also, do I need to change
where exists (select MemberID from table2)
to
where exists (select MemberID from table2 where table1.MemberID = table2.MemberID)
?
You want to implement a "semi-join". Your second solution is correct:
select MemberID, var1, var2, var3
from table1
where exists (
select 1 from table2 where table1.MemberID = table2.MemberID
)
Notes:
There's no need to select anything special in the subquery, since it checks for row existence rather than for values. For example, 1 will do, as will *, or even null. I tend to use 1 for clarity.
The query needs to access table2, and that access should be optimized, especially for such large tables. You should consider adding the index below, if you haven't created it already:
create index ix1 on table2 (MemberID);
The query has no filtering criterion. That means the engine will read all 100 million rows and check each one for a matching row in the secondary table. This will unavoidably take a long time. Are you sure you want to read them all? Maybe you need to add a filtering condition, but I don't know your requirements in this respect.
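To see why the correlation matters, here is a small sketch with made-up sample rows, using Python's built-in sqlite3 in place of PROC SQL (the EXISTS semantics shown are standard SQL). Without the correlation, the subquery is true as soon as table2 contains any row at all, so every row of table1 is returned:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (MemberID INTEGER, var1 TEXT);
    CREATE TABLE table2 (MemberID INTEGER);
    CREATE INDEX ix1 ON table2 (MemberID);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (2), (3), (4);
""")

# Uncorrelated: the subquery does not look at the current table1 row.
uncorrelated = conn.execute("""
    SELECT MemberID FROM table1
    WHERE EXISTS (SELECT MemberID FROM table2)
    ORDER BY MemberID
""").fetchall()

# Correlated: the subquery is evaluated against the current table1 row.
correlated = conn.execute("""
    SELECT MemberID FROM table1
    WHERE EXISTS (
        SELECT 1 FROM table2 WHERE table2.MemberID = table1.MemberID
    )
    ORDER BY MemberID
""").fetchall()
```

Here uncorrelated contains all three members, while correlated contains only members 2 and 3.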

Determining whether each row exists in another MySQL table

I have two tables that are very similar. For example, let's say that each row has two ID numbers, and a data value. The first ID number may occur once, twice, or not be included, and the second ID number is either 1 or -1. The data value is not important, but for the sake of this example, we'll say it's an integer. For each pair of ID numbers, there can only be one data value, so if I have a data point where the ID's are 10 and 1, there won't be another 10 and 1 row with a different data value. Similarly, in the other table, the data point with ID's 10 and 1 will be the same as in the first table. I want to be able to select the rows that exist in both tables for the sake of changing the data value in all of the rows that are in both. My command for MySQL so far is as follows:
SELECT DISTINCT * FROM schema.table1
WHERE EXISTS (SELECT * from schema.table2
WHERE schema.table1.ID1 = schema.table2.ID1
and schema.table1.ID2 = schema.table2.ID2);
I want to be able to have this code select all the rows in table1 that are also in table2, but allow me to edit table1 values.
I understand that by creating a union of the two tables, I can see the rows that exist in both tables, but would this allow me to make changes to the actual data values if I changed the values in the merged set? For example, if I did:
SELECT DISTINCT * FROM schema.table1 inner join schema.table2
WHERE schema.table1.ID1 = schema.table2.ID1
AND schema.table1.ID2 = schema.table2.ID2;
If I call UPDATE on the rows that I get from this query, would the actual values in table1/table2 be changed or is this union just created in dynamic memory and I would just be changing values that get deleted when the query is over?
Update as follows:
UPDATE schema.table1 SET data = whateverupdate
WHERE ID1 IN (SELECT ID1 FROM schema.table2
WHERE schema.table1.ID1 = schema.table2.ID1
AND schema.table1.ID2 = schema.table2.ID2);
In the inner SELECT you cannot use SELECT * with IN; you have to select a particular column. This works because the inner SELECT, which reads table2 and is correlated with the row being updated, finds the matching row and feeds it to the UPDATE statement, so the change is made to the actual data in table1. That said, the inner SELECT has to return exactly the rows you need; otherwise the wrong rows will be updated. Hope this helps.
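As a runnable illustration, here is a sketch with invented sample rows, using Python's built-in sqlite3 as a stand-in for MySQL. It uses the EXISTS form of the same correlated condition, which is a substitution for the IN form, and sets data to 0 as an arbitrary new value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID1 INTEGER, ID2 INTEGER, data INTEGER);
    CREATE TABLE table2 (ID1 INTEGER, ID2 INTEGER, data INTEGER);
    INSERT INTO table1 VALUES (10, 1, 5), (10, -1, 6), (11, 1, 7);
    INSERT INTO table2 VALUES (10, 1, 5), (12, 1, 9);
""")

# Update only the table1 rows whose (ID1, ID2) pair also exists in table2.
cur = conn.execute("""
    UPDATE table1 SET data = 0
    WHERE EXISTS (
        SELECT 1 FROM table2
        WHERE table2.ID1 = table1.ID1 AND table2.ID2 = table1.ID2
    )
""")
conn.commit()

updated = cur.rowcount
table1_rows = conn.execute(
    "SELECT ID1, ID2, data FROM table1 ORDER BY ID1, ID2"
).fetchall()
```

Only the (10, 1) row changes, and a later SELECT on table1 sees the new value, so the update is applied to the stored table rather than to a temporary result set.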

Better way to get 15 tables results at a time in MySql

I have about 20 tables. These tables have only id (primary key) and description (varchar). There is quite a lot of data, up to about 400 rows per table.
Right now I have to get data of at least 15 tables at a time.
Right now I am calling them one by one. Which means that in one session I am giving 15 calls. This is making my process slow.
Can anyone suggest a better way to get the results from the database?
I am using a MySQL database with Java Spring on the server side. Will creating a view that combines all the tables help me?
The application is becoming slow because of this issue and I need a solution that will make my process faster.
It sounds like your schema isn't so great. 20 tables of id/varchar sounds like a broken EAV, which is generally considered broken to begin with. Just the same, I think a UNION query will help out. This would be the "View" to create in the database so you can just SELECT * FROM thisviewyoumade and let it worry about hitting all the tables.
A UNION query works by having multiple SELECT statements "stacked" on top of one another. It's important that each SELECT statement has the same number, order, and types of fields, so that when it stacks the results, everything matches up.
In your case, it makes sense to manufacture an extra field so you know which table each row came from. Something like the following:
SELECT 'table1' as tablename, id, col2 FROM table1
UNION ALL
SELECT 'table2', id, col2 FROM table2
UNION ALL
SELECT 'table3', id, col2 FROM table3
... and on and on
The names or aliases of the fields in the first SELECT statement are the field names used in the result set that is returned, so there is no need for a bunch of AS blahblahblah aliases in subsequent SELECT statements.
The real question is whether this union query will perform faster than 15 individual calls on such a tiny tiny tiny amount of data. I think the better option would be to change your schema so this stuff is already stored in one table just like this UNION query outputs. Then you would need a single select statement against a single table. And 400x20=8000 is still a dinky little table to query.
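A sketch of the view idea, assuming three of the lookup tables, each with id and description columns, and a hypothetical view name all_lookups. Python's built-in sqlite3 stands in for MySQL and Spring here; the point is only that the application makes one round trip instead of one per table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
table_names = ["table1", "table2", "table3"]

# Create the small id/description lookup tables with one sample row each.
for name in table_names:
    conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, description TEXT)")
    conn.execute(f"INSERT INTO {name} VALUES (1, ?)", (f"first row of {name}",))

# Stack one SELECT per table; the literal in the first column records the source.
union_sql = "\nUNION ALL\n".join(
    f"SELECT '{name}' AS tablename, id, description FROM {name}"
    for name in table_names
)
conn.execute(f"CREATE VIEW all_lookups AS {union_sql}")

# One query fetches everything; regroup by source table in application code.
rows = conn.execute(
    "SELECT tablename, id, description FROM all_lookups ORDER BY tablename, id"
).fetchall()
lookups = {}
for tablename, row_id, description in rows:
    lookups.setdefault(tablename, {})[row_id] = description
```

Whether this beats 15 separate calls on such small tables would still need to be measured.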
To get a row of all descriptions into app code in a single round trip, send a query like this:
select t1.description, ... t15.description
from t -- this should contain all needed ids
join table1 t1 on t1.id = t.t1id
...
join table15 t15 on t15.id = t.t15id
I can't give you exactly what you need, but here is a way to merge all those tables' values into a single table:
CREATE TABLE table_name AS (
-- list and alias the columns explicitly; SELECT * would yield duplicate column names
SELECT *
FROM table1 t1
LEFT JOIN table2 t2 ON t1.ID=t2.ID
...
LEFT JOIN tableN tN ON tN-1.ID=tN.ID
)

mySQL: How to identify duplicates based on four fields

I have read a few posts on SO about how to delete duplicates by comparing a table with another instance of itself. However, I don't want to delete the duplicates; I want to compare them.
eg. I have the fields "id", "sold_price", "bruksareal", "kommunenr", "Gårdsnr" ,"Bruksnr", "Festenr", "Seksjonsnr". All fields are int.
I want to identify the rows that are duplicates/identical (the same bruksareal, kommunenr, gårdsnr, bruksnr,festenr and seksjonsnr). If identical then I want to give these rows a unique reference number.
I believe this will make it easier to identify the rows that I later want to compare on other fields (e.g. "sold_price", "sold_date", etc.).
I'm open to suggestions if you believe my approach is wrong...
Perform a join of the table to itself across all the fields, then use an EXISTS query, such as:
Update Table1
Set reference = UUID()
Where exists (
Select tb1.id
from Table1 tb1 inner join Table1 tb2 on
tb1.Field1 = tb2.Field1 AND
tb1.Field2 = tb2.Field2 AND
etc
Where tb1.Id = Table1.Id
And tb1.Id != tb2.Id
)
Actually, you can simplify this with just a join (written here in MySQL's multi-table UPDATE syntax):
Update Table1 inner join Table1 tb2 on
Table1.Field1 = tb2.Field1 AND
Table1.Field2 = tb2.Field2 AND
etc
Set Table1.reference = UUID()
Where Table1.Id != tb2.Id
Depending on where you want to do that, I would go for a hash implementation. For every insert, calculate the hash of the needed columns at insert time (in a trigger, maybe). After that you should be able to find out very easily which rows are duplicated (if you index that column, the queries should be pretty fast, but remember that it is still not an int column, so it will get a little slower over time).
After this you can do whatever you please with the duplicated records, without very expensive queries on the database.
Later edit: Make sure that you convert null values into some defined value, since some MySQL functions such as MD5 simply return null if the operand is null. The same goes for concat: if one operand is null, it returns null (this does not apply to concat_ws, though).
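The hashing idea can be sketched outside the database as well. The following Python example is only an illustration under assumptions: the rows are invented, the "<NULL>" placeholder and the "|" separator are arbitrary choices, and the hash is computed in application code rather than in a MySQL trigger:

```python
import hashlib
from collections import defaultdict

NULL_MARKER = "<NULL>"  # assumed placeholder so that missing values still hash


def row_hash(values):
    """MD5 over the key fields, with None mapped to a defined marker."""
    parts = [NULL_MARKER if v is None else str(v) for v in values]
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


# (id, bruksareal, kommunenr, gaardsnr, bruksnr, festenr, seksjonsnr)
rows = [
    (1, 80, 301, 12, 5, None, None),
    (2, 80, 301, 12, 5, None, None),
    (3, 95, 301, 12, 5, 0, 1),
    (4, 80, 301, 12, 6, None, None),
]

# Group row ids by the hash of their key fields (everything except id).
groups = defaultdict(list)
for row_id, *key_fields in rows:
    groups[row_hash(key_fields)].append(row_id)

duplicates = [ids for ids in groups.values() if len(ids) > 1]
```

Rows 1 and 2 end up in the same group because their six key fields are identical, including the two missing values; the shared hash could then serve as the common reference for that group.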

Reducing MySQL Query Time (Currently running for 24 hours and still going)

I have a database with three tables and I need to cross reference the first table against the other two to create a fourth table of consolidated information. All the tables have one field which is common, this is the MSISDN (mobile / cell telephone number) and is at least 10 digits long.
Table 1 - 819,248 rows
Table 2 - 75,308,813 rows
Table 3 - 17,701,196 rows
I want to return all the rows from Table 1 and append some of the fields from Tables 2 and Table 3 when there's a matching MSISDN.
My query has been running now for over 24 hours and I have no way of knowing how long something like this should take.
This type of query may be a regular project - is there a way to significantly reduce the query time?
I have indexed tables 2 and 3 with MSISDN and the fields I need to return.
My query is like this:
create TABLE FinishedData
select
Table1.ADDRESS, table1.POSTAL, table1.MOBILE,
table1.FIRST, table1.LAST, table1.MID, table1.CARRIER,
table1.TOWN, table1.ID, table2.status as 'status1',
table2.CurrentNetworkName as 'currentnetwork1',
table2.DateChecked as 'datechecked1', table3.Status as 'status2',
table3.CurrentNetworkName 'currentnetwork2',
table3.DateChecked as 'datechecked2'
from
table1 left join (table2, table3)
on (right(table1.MOBILE, 10) = right(table2.MSISDN, 10)
AND right(table1.MOBILE,10) = right(table3.MSISDN,10))
MySQL is running on a 64-bit Windows machine with 12GB memory and 8 logical cores @ 3GHz. mysqld is only using 10% CPU and 600MB of memory when running the query.
Any help is appreciated.
The performance killer is the right function. When you use this function, MySQL can't use indexes.
My suggestion is:
Create new fields in table2 and table3 holding the reversed content of MSISDN.
Write the join using the left function on those fields instead of the right function.
With this small change, MySQL should be able to use indexes for your joins.
Explained steps:
1) Create new columns:
Alter table table2 add column r_MSISDN varchar(200);
update table2 set r_MSISDN = reverse( MSISDN );
Alter table table3 add column r_MSISDN varchar(200);
update table3 set r_MSISDN = reverse( MSISDN );
2) New join:
...
from
table1 left join (table2, table3)
-- the table1 side must be reversed too, so both sides are in the same order
on (reverse(right(table1.MOBILE, 10)) = left(table2.r_MSISDN, 10)
AND reverse(right(table1.MOBILE, 10)) = left(table3.r_MSISDN, 10))
RIGHT is a function. Applying a function to a column in a WHERE or ON clause means MySQL (and perhaps any database) cannot use an index on that column, because it has to compute the value returned by the function for each row before comparing.
If you want to make this query any faster, consider storing the MSISDN in a normalized form and comparing using the = operator.
Now, I am not sure what an MSISDN looks like. If it is a fixed-width number, then your job is easy. If it contains separators (spaces/hyphens) and the separators are only there for readability, you should remove them before storing it in the database. If the first 10 characters are important and the remaining ones are optional, you might consider storing the first 10 and the remaining characters in separate columns.
As others have already mentioned, the problem is with the right function which does not allow using any indexes.
In simple words, your current query for each row in table1 makes a full scan of table2 and for each match makes a full scan of table3. Considering how many rows you have in table2 and table3, you have a good chance to see the world before the query is finished.
Another problem is that the query starts one huge transaction which, as far as MySQL is concerned, must remain able to be rolled back, so you might also want to reconsider the isolation level.
I would not change the current tables, though. I would create subcopies of table2 and table3 with only the required columns, and add right(table2.MSISDN, 10) as a separate indexed column in the table2 copy (and right(table3.MSISDN, 10) in the table3 copy).
Then you can do the LEFT JOIN against the copies, or even first reduce the copies to the rows that match something in table1 and do the LEFT JOIN after that.
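A rough sketch of the subcopy approach with invented sample numbers, using Python's built-in sqlite3 as a stand-in for MySQL. The copy table names and the msisdn10 column are hypothetical, SQLite's substr(x, -10) plays the role of MySQL's RIGHT(x, 10), and two separate LEFT JOINs are used on the assumption that a match in either table should be reported on its own:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID INTEGER, MOBILE TEXT);
    CREATE TABLE table2 (MSISDN TEXT, Status TEXT);
    CREATE TABLE table3 (MSISDN TEXT, Status TEXT);
    INSERT INTO table1 VALUES (1, '07123456789'), (2, '07999999999');
    INSERT INTO table2 VALUES ('447123456789', 'active');
    INSERT INTO table3 VALUES ('+447123456789', 'ported'),
                              ('447999999999', 'active');

    -- Subcopies holding only the needed columns plus the precomputed key.
    CREATE TABLE table2_copy AS
        SELECT substr(MSISDN, -10) AS msisdn10, Status FROM table2;
    CREATE TABLE table3_copy AS
        SELECT substr(MSISDN, -10) AS msisdn10, Status FROM table3;
    CREATE INDEX ix_table2_copy ON table2_copy (msisdn10);
    CREATE INDEX ix_table3_copy ON table3_copy (msisdn10);
""")

# The function is now applied only to the small driving table (table1);
# the indexed columns of the copies are compared with a plain equality.
finished = conn.execute("""
    SELECT t1.ID, t2.Status AS status1, t3.Status AS status2
    FROM table1 AS t1
    LEFT JOIN table2_copy AS t2 ON t2.msisdn10 = substr(t1.MOBILE, -10)
    LEFT JOIN table3_copy AS t3 ON t3.msisdn10 = substr(t1.MOBILE, -10)
    ORDER BY t1.ID
""").fetchall()
```

Every table1 row is kept: ID 1 matches in both copies, while ID 2 matches only in the table3 copy and gets NULL for status1.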