I want to insert 5 new lines in a table if and only if none of the 5 lines are already there. If one of them is in the table, then I want to abort the insertion (without updating anything), and know which one (or which ones) were already there.
I can think of long ways to do this, such as checking whether SELECT col1 FROM mytable WHERE col1 IN (value1, value2, ...) returns anything and inserting only if it doesn't.
I also suspect transactions could do this, but I'm still learning how they work, and I don't know whether a transaction can tell me which entries are duplicates.
With or without transactions, is there any way to do this in only one or two queries?
Thanks
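I doubt there is a better way than the solution you mentioned: first run a SELECT query and, if it returns nothing, INSERT. You asked for something in one or two queries; this is exactly two queries, so it is pretty efficient in my view. I can't think of an efficient way to use transactions for this. Transactions are useful when you have multiple INSERT or UPDATE queries, and you have only one. Be aware, though, that another session could insert one of the values between your SELECT and your INSERT, unless you use a transaction with suitable locking or a UNIQUE constraint on the column.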
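A minimal sketch of that two-query approach, run inside one transaction so no other session can slip a row in between the check and the insert, and returning the duplicates the question asks about. It uses Python's built-in sqlite3 module purely for illustration; the table foo(col1) and the helper name insert_if_none_exist are hypothetical. BEGIN IMMEDIATE is SQLite's way of taking the write lock up front; in MySQL/InnoDB you would reach for something like a transaction with SELECT ... FOR UPDATE instead.

```python
import sqlite3


def insert_if_none_exist(conn, values):
    """Insert all values into foo(col1) only if none of them is already there.

    Returns the values that already existed (an empty list means the insert ran).
    Hypothetical helper; assumes conn was opened with isolation_level=None.
    """
    placeholders = ", ".join("?" for _ in values)
    conn.execute("BEGIN IMMEDIATE")  # take SQLite's write lock before checking
    try:
        rows = conn.execute(
            f"SELECT col1 FROM foo WHERE col1 IN ({placeholders})", list(values)
        ).fetchall()
        duplicates = sorted(row[0] for row in rows)
        if not duplicates:
            conn.executemany(
                "INSERT INTO foo(col1) VALUES (?)", [(v,) for v in values]
            )
        conn.execute("COMMIT")
        return duplicates
    except Exception:
        conn.execute("ROLLBACK")
        raise


conn = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions manually
conn.execute("CREATE TABLE foo(col1 INTEGER)")
print(insert_if_none_exist(conn, [1, 2, 3, 4, 5]))  # []
print(insert_if_none_exist(conn, [5, 6, 7, 8, 3]))  # [3, 5]
```

The second call inserts nothing and reports 3 and 5 as the values that were already present.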
A plain INSERT statement doesn't give you many options here. If you put a UNIQUE constraint on the column and then insert all the values in a single statement, such as
INSERT INTO FOO(col1) VALUES
(val1),
(val2),
(val3),
(val4),
(val5);
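it will raise an error because of the constraint violation, and the whole statement is aborted (in a transactional engine such as InnoDB; with MyISAM, rows inserted before the failing one are kept). Note that the error only reports the first duplicate value it hits.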
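To see the all-or-nothing behaviour, here is a small sketch using Python's sqlite3 module (an assumption; the answer targets MySQL). SQLite, like InnoDB, undoes the whole statement when one row violates the UNIQUE constraint; the helper name try_multi_row_insert is made up for the example.

```python
import sqlite3


def try_multi_row_insert(conn, values):
    """Attempt one multi-row INSERT; True on success, False on a UNIQUE violation."""
    rows_sql = ", ".join("(?)" for _ in values)
    try:
        conn.execute(f"INSERT INTO foo(col1) VALUES {rows_sql}", list(values))
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:", isolation_level=None)  # each statement autocommits
conn.execute("CREATE TABLE foo(col1 INTEGER UNIQUE)")
conn.execute("INSERT INTO foo(col1) VALUES (3)")

ok = try_multi_row_insert(conn, [1, 2, 3, 4, 5])  # 3 collides
count = conn.execute("SELECT COUNT(*) FROM foo").fetchone()[0]
print(ok, count)  # False 1
```

Even though 1 and 2 come before the duplicate, they are rolled back with the rest of the statement, so only the original row remains.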
If you want to avoid the error, the query becomes a bit convoluted:
INSERT INTO FOO(col1)
SELECT a.* FROM (SELECT val1
UNION
SELECT val2
UNION
SELECT val3
UNION
SELECT val4
UNION
SELECT val5) a
INNER JOIN
(SELECT g.* FROM (
SELECT FALSE b FROM FOO WHERE col1 IN (val1, val2, val3, val4, val5)
UNION
SELECT TRUE) g
ORDER BY g.b -- FALSE sorts first, so it wins whenever any value already exists
LIMIT 1) b ON b.b;
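What happens? The inner derived table yields a single TRUE row only when none of the values already exists in FOO; otherwise it yields FALSE. The join passes the values through only when that row is TRUE, so either all of them are inserted or none are.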
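The same idea can be written more directly with NOT EXISTS instead of the TRUE/FALSE join: the SELECT feeding the INSERT produces all the values, or none of them if any already exists. This is a sketch in Python's sqlite3 (not MySQL), with a hypothetical table foo(col1) and helper insert_all_or_nothing; the returned rowcount tells you whether the insert happened, and a follow-up SELECT ... IN (...) would tell you which values were duplicates.

```python
import sqlite3


def insert_all_or_nothing(conn, values):
    """Insert every value into foo(col1) in one statement, or none of them.

    Hypothetical helper. Returns the number of rows inserted: len(values)
    if none was there yet, 0 if at least one value already existed.
    """
    rows_sql = " UNION ALL ".join("SELECT ?" for _ in values)  # the new values
    marks = ", ".join("?" for _ in values)                     # for the existence check
    sql = (
        "INSERT INTO foo(col1) "
        f"SELECT * FROM ({rows_sql}) "
        f"WHERE NOT EXISTS (SELECT 1 FROM foo WHERE col1 IN ({marks}))"
    )
    cur = conn.execute(sql, list(values) + list(values))
    return cur.rowcount


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE foo(col1 INTEGER)")
print(insert_all_or_nothing(conn, [1, 2, 3, 4, 5]))  # 5
print(insert_all_or_nothing(conn, [5, 6, 7, 8, 9]))  # 0
```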
I am completely new to database coding, and I've tried Googling but cannot seem to figure this out. I imagine there's a simple solution. I have a very large table with MemberIDs and a few other relevant variables that I want to pull records from (table1). I also have a second large table of distinct MemberIDs (table2). I want to pull rows from table1 where the MemberID exists in table2.
Here’s how I tried to do it, and for some reason I suspect this isn’t working correctly, or there may be a much better way to do this.
proc sql;
create table tablewant as select
MemberID, var1, var2, var3
from table1
where exists (select MemberID from table2)
;
quit;
Is there anything wrong with the way I’m doing this? What's the best way to solve this when working with extremely large tables (over 100 million records)? Would doing some sort of join be better? Also, do I need to change
where exists (select MemberID from table2)
to
where exists (select MemberID from table2 where table1.MemberID = table2.MemberID)
?
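You want to implement a "semi-join". Your second solution is correct. The first one is not correlated with table1, so EXISTS is true for every row as soon as table2 contains at least one row, and you get all of table1 back: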
select MemberID, var1, var2, var3
from table1
where exists (
select 1 from table2 where table1.MemberID = table2.MemberID
)
Notes:
There's no need to select anything special in the subquery since it's not checking for values, but for row existence instead. For example, 1 will do, as well as *, or even null. I tend to use 1 for clarity.
The query needs to access table2, and this access should be optimized, especially for such large tables. You should consider adding the index below, if you haven't created it already (in SAS PROC SQL a simple index must have the same name as its column):
create index MemberID on table2 (MemberID);
The query has no filtering criteria. That means the engine will read all 100 million rows and check each one of them for matching rows in the secondary table. This will unavoidably take a long time. Are you sure you want to read them all? Maybe you need to add a filtering condition, but I don't know your requirements in this respect.
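The difference between the two EXISTS forms is easy to check on a toy dataset. This sketch uses SQLite through Python's sqlite3 rather than SAS, with made-up rows; EXISTS behaves the same way in standard SQL. The uncorrelated subquery is true for every row of table1, while the correlated one keeps only the matching MemberIDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (MemberID INTEGER, var1 TEXT);
CREATE TABLE table2 (MemberID INTEGER);
CREATE INDEX ix_table2_member ON table2 (MemberID);
INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
INSERT INTO table2 VALUES (2), (4);
""")

# Not correlated: true whenever table2 has any row at all.
uncorrelated = conn.execute(
    "SELECT MemberID FROM table1 "
    "WHERE EXISTS (SELECT MemberID FROM table2) ORDER BY MemberID"
).fetchall()

# Correlated semi-join: true only when this row's MemberID is in table2.
correlated = conn.execute(
    "SELECT MemberID FROM table1 "
    "WHERE EXISTS (SELECT 1 FROM table2 WHERE table1.MemberID = table2.MemberID) "
    "ORDER BY MemberID"
).fetchall()

print(uncorrelated)  # [(1,), (2,), (3,), (4,)]
print(correlated)    # [(2,), (4,)]
```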
I have about 20 tables. These tables have only an id (primary key) and a description (varchar). Each table holds up to about 400 rows.
Right now I need data from at least 15 tables at a time.
Currently I query them one by one, which means I make 15 calls in one session. This is making my process slow.
Can anyone suggest a better way to get the results from the database?
I am using a MySQL database and Java Spring on the server side. Would creating a view that combines them all help?
The application is becoming slow because of this issue and I need a solution that will make my process faster.
It sounds like your schema isn't so great. 20 tables of id/varchar sounds like a broken EAV (entity-attribute-value) design, and EAV is generally considered a poor pattern to begin with. Just the same, I think a UNION query will help. This would be the "view" to create in the database, so you can just SELECT * FROM thisviewyoumade and let it worry about hitting all the tables.
A UNION query works by "stacking" multiple SELECT statements on top of one another. Each SELECT statement must return the same number of fields, in the same order, with compatible types, so that everything lines up when the results are stacked.
In your case, it makes sense to manufacture an extra field so you know which table each row came from. Something like the following:
SELECT 'table1' as tablename, id, col2 FROM table1
UNION ALL
SELECT 'table2', id, col2 FROM table2
UNION ALL
SELECT 'table3', id, col2 FROM table3
... and on and on
The names or aliases of the fields in the first SELECT statement become the field names of the returned result set, so there's no need to repeat AS aliases in the subsequent SELECT statements.
The real question is whether this union query will perform faster than 15 individual calls on such a tiny tiny tiny amount of data. I think the better option would be to change your schema so this stuff is already stored in one table just like this UNION query outputs. Then you would need a single select statement against a single table. And 400x20=8000 is still a dinky little table to query.
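A small sketch of the tagged UNION ALL view, using Python's sqlite3 as a stand-in for MySQL and Java (an assumption), with three hypothetical lookup tables and a view named all_lookups. One query returns every row, and the application groups them by the tablename column.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
for name in ("table1", "table2", "table3"):
    conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, description TEXT)")
    conn.executemany(
        f"INSERT INTO {name} (description) VALUES (?)",
        [(f"{name}-a",), (f"{name}-b",)],
    )

# The view stacks the tables and tags each row with its source table.
conn.execute("""
CREATE VIEW all_lookups AS
SELECT 'table1' AS tablename, id, description FROM table1
UNION ALL
SELECT 'table2', id, description FROM table2
UNION ALL
SELECT 'table3', id, description FROM table3
""")

# One round trip, then group by table in application code.
lookups = defaultdict(dict)
for tablename, row_id, description in conn.execute(
    "SELECT tablename, id, description FROM all_lookups"
):
    lookups[tablename][row_id] = description

print(lookups["table2"])  # {1: 'table2-a', 2: 'table2-b'}
```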
To get a single row with all the descriptions into your application code in one round trip, send a query along these lines:
select t1.description, ... t15.description
from t -- this should contain all needed ids
join table1 t1 on t1.id = t.t1id
...
join table15 t15 on t15.id = t.t15id
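A runnable version of that join with three lookup tables instead of fifteen, sketched with Python's sqlite3. The driver table t and its t1id/t2id/t3id columns are hypothetical; they stand for whatever row holds the ids you need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for n in (1, 2, 3):
    conn.execute(f"CREATE TABLE table{n} (id INTEGER PRIMARY KEY, description TEXT)")
    conn.executemany(
        f"INSERT INTO table{n} (id, description) VALUES (?, ?)",
        [(1, f"t{n}-one"), (2, f"t{n}-two")],
    )

# Hypothetical driver table: one row naming the id wanted from each lookup table.
conn.execute("CREATE TABLE t (t1id INTEGER, t2id INTEGER, t3id INTEGER)")
conn.execute("INSERT INTO t VALUES (2, 1, 2)")

row = conn.execute("""
SELECT t1.description, t2.description, t3.description
FROM t
JOIN table1 t1 ON t1.id = t.t1id
JOIN table2 t2 ON t2.id = t.t2id
JOIN table3 t3 ON t3.id = t.t3id
""").fetchone()
print(row)  # ('t1-two', 't2-one', 't3-two')
```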
I can't tell exactly what you need, but here is a way to merge all those tables' values into a single table:
CREATE TABLE table_name AS (
SELECT t1.ID, t1.description AS desc1, t2.description AS desc2, ..., tN.description AS descN
FROM table1 t1
LEFT JOIN table2 t2 ON t1.ID = t2.ID
...
LEFT JOIN tableN tN ON t1.ID = tN.ID
)
I have a table with, say, 2 columns.
Column 1 is Id, column 2 is dependentID.
I want to know all Ids that depend on a given ID, and this dependency can be transitive,
i.e. if Id1 depends on Id2 and Id3 depends on Id1,
then if I want all dependents of Id2, the result should be both Id1 and Id3.
For this I have to fire multiple queries at MySQL until I get an empty set,
i.e.
SELECT Id FROM tbl WHERE dependentID = 'ID2';
This gives one set. Then I have to fire the same query again on the set of IDs it returned, and so on.
Can I do it in just one query, i.e. only one I/O, or is there a better way than the above approach?
The database I am using is MySQL.
Life will be simpler if you store the pairs in a canonical order, such as
INSERT INTO tbl (id1, id2) VALUES
(LEAST($a, $b), GREATEST($a, $b));
Back to your "recursive" query. Before MySQL 8.0 there was no way to do it in a single SQL statement; you had to write a loop in your favorite programming language, or a MySQL stored procedure, but that feels really difficult. MySQL 8.0 added recursive common table expressions (WITH RECURSIVE), which can walk the whole chain in one query.
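For reference, a recursive common table expression does the whole traversal in one statement. This sketch runs it in SQLite (3.8.3 or later) via Python's sqlite3; the same WITH RECURSIVE syntax works in MySQL 8.0. The table name deps is hypothetical; a row (id, dependentID) means id depends on dependentID, as in the question. Using UNION rather than UNION ALL discards rows already found, so a cycle cannot make the query loop forever.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: a row (id, dependentID) means "id depends on dependentID".
conn.execute("CREATE TABLE deps (id TEXT, dependentID TEXT)")
conn.executemany(
    "INSERT INTO deps (id, dependentID) VALUES (?, ?)",
    [("Id1", "Id2"), ("Id3", "Id1"), ("Id5", "Id4")],
)

query = """
WITH RECURSIVE dependents(id) AS (
    SELECT id FROM deps WHERE dependentID = ?      -- direct dependents
    UNION                                          -- UNION drops repeats
    SELECT d.id FROM deps d
    JOIN dependents p ON d.dependentID = p.id      -- dependents of dependents
)
SELECT id FROM dependents ORDER BY id
"""
result = [row[0] for row in conn.execute(query, ("Id2",))]
print(result)  # ['Id1', 'Id3']
```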
In your language, I would suggest a loop that adds to an array. Keep the array sorted. Each time through the loop, do one more SELECT using an IN clause with the array elements. Terminate the loop when you find no new items for the array.
The SELECT would be something like
( SELECT id2 FROM tbl WHERE id1 IN (...) )
UNION
( SELECT id1 FROM tbl WHERE id2 IN (...) )
It would be good to have compound indexes (id1,id2) and (id2, id1).
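The loop described above might look like this in Python with sqlite3 (an assumption; any language and driver will do). Two deliberate differences from the answer, offered only as suggestions: it follows only the direction from the question (who depends on X), since querying both columns would also return the things X depends on, and each pass queries only the ids found in the previous pass. all_dependents and the deps table are hypothetical names.

```python
import sqlite3


def all_dependents(conn, root):
    """Collect every id that depends, directly or transitively, on root.

    Hypothetical helper: one SELECT ... IN (...) per pass, stopping when a
    pass finds nothing new.
    """
    found = set()
    frontier = {root}
    while frontier:
        marks = ", ".join("?" for _ in frontier)
        rows = conn.execute(
            f"SELECT id FROM deps WHERE dependentID IN ({marks})", sorted(frontier)
        ).fetchall()
        new_ids = {row[0] for row in rows} - found - {root}
        found |= new_ids
        frontier = new_ids  # only re-query ids not expanded yet
    return sorted(found)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deps (id TEXT, dependentID TEXT)")
conn.execute("CREATE INDEX deps_dep_id ON deps (dependentID, id)")  # compound index
conn.executemany(
    "INSERT INTO deps (id, dependentID) VALUES (?, ?)",
    [("Id1", "Id2"), ("Id3", "Id1"), ("Id4", "Id3"), ("Id5", "Id9")],
)
print(all_dependents(conn, "Id2"))  # ['Id1', 'Id3', 'Id4']
```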
I've done some searching around but I haven't found a clear answer and explanation to my question.
I have 5 tables called table1, table2, table3, table4 and table5 and I want to do COUNT(*) on each of the tables to get the number of rows.
Should I try to combine these into one query or use 5 separate queries? I have always been taught that the least number of queries the better so I am guessing I should try to combine them into one query.
One way of doing it is to use UNION but does anyone know what the most efficient way of doing this is and why?
Thanks for any help.
Assuming you just want a count(*) from each one, then
SELECT
( SELECT count(*) from table1 ) AS table1,
( SELECT count(*) from table2 ) AS table2,
( SELECT count(*) from table3 ) AS table3,
( SELECT count(*) from table4 ) AS table4,
( SELECT count(*) from table5 ) AS table5;
would give you those counts as a single row. The DB server would still be running n+1 queries (n tables, 1 parent query) to get those counts, but it would be the same story if you used a UNION anyway. The UNION would produce multiple rows with one value each, vs. one row with multiple values for the subselect method.
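A quick comparison of the two shapes, sketched with Python's sqlite3 and five throwaway tables (hypothetical data). The scalar subqueries come back as one row; the UNION ALL version returns one row per table, tagged with its name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
tables = [f"table{n}" for n in range(1, 6)]
for name, size in zip(tables, [3, 0, 5, 1, 2]):  # made-up row counts
    conn.execute(f"CREATE TABLE {name} (x INTEGER)")
    conn.executemany(f"INSERT INTO {name} (x) VALUES (?)", [(i,) for i in range(size)])

# Subselect method: one row, one column per table.
one_row = conn.execute(
    "SELECT " + ", ".join(f"(SELECT COUNT(*) FROM {t}) AS {t}" for t in tables)
).fetchone()

# UNION ALL method: one row per table, tagged with the table name.
many_rows = conn.execute(
    " UNION ALL ".join(f"SELECT '{t}', COUNT(*) FROM {t}" for t in tables)
    + " ORDER BY 1"
).fetchall()

print(one_row)    # (3, 0, 5, 1, 2)
print(many_rows)  # [('table1', 3), ('table2', 0), ('table3', 5), ('table4', 1), ('table5', 2)]
```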
Provided you have read access to it (so probably not on shared hosting, where you don't have your own database instance), you could read that information from the TABLES table in INFORMATION_SCHEMA, which has a TABLE_ROWS column.
(Be aware that for InnoDB tables TABLE_ROWS is only an estimate, so if you don't use MyISAM and need precise counts, the other answer is the better solution.)
I'm trying to find the most efficient way to determine if a table row exists.
I have in mind 3 options:
SELECT EXISTS(SELECT 1 FROM table1 WHERE some_condition);
SELECT 1 FROM table1 WHERE some_condition LIMIT 0,1;
SELECT COUNT(1) FROM table1 WHERE some_condition;
It seems that for MySQL the first approach is more efficient:
Best way to test if a row exists in a MySQL table
Is it true in general for any database?
UPDATE:
I've added a third option.
UPDATE2:
Let's assume the database products are MySQL, Oracle and SQL Server.
I would do
SELECT COUNT(1) FROM table1 WHERE some_condition;
But I don't think it makes a significant difference unless you call it a lot (in which case, I'd probably use a different strategy).
If you want to test whether AT LEAST ONE row satisfying some condition exists (1 or 0, true or false), then:
select count(1) from my_table where ... and rownum < 2;
Oracle can stop counting after it gets a hit.
EXISTS is faster because it can stop at the first row that matches the subquery and just returns true or false, instead of producing the whole result or counting every matching row.
The different methods have different pros and cons:
SELECT EXISTS(SELECT 1 FROM table1 WHERE some_condition);
might be the fastest on MySQL, but
SELECT COUNT(1) FROM table1 WHERE some_condition
as in Luis's answer, gives you the count.
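To make the trade-off concrete, this sketch runs the three queries from the question against a toy SQLite table through Python's sqlite3 (illustrative only; it says nothing about performance on MySQL, Oracle or SQL Server). EXISTS yields 0 or 1, the LIMIT form yields a row or nothing, and COUNT yields the actual number of matches. The status column and three_checks helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO table1 (status) VALUES (?)", [("open",), ("open",), ("closed",)]
)


def three_checks(conn, status):
    """Run the three existence checks from the question for status = ?."""
    exists = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM table1 WHERE status = ?)", (status,)
    ).fetchone()[0]  # 0 or 1
    first = conn.execute(
        "SELECT 1 FROM table1 WHERE status = ? LIMIT 0, 1", (status,)
    ).fetchone()  # (1,) or None
    count = conn.execute(
        "SELECT COUNT(1) FROM table1 WHERE status = ?", (status,)
    ).fetchone()[0]  # number of matching rows
    return exists, first, count


print(three_checks(conn, "open"))      # (1, (1,), 2)
print(three_checks(conn, "archived"))  # (0, None, 0)
```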
More to the point, I recommend you take a look at your business logic. It is very seldom necessary to just check whether a row exists; more often you will want to
either use these rows, so just do the select and handle the 0-rows case
or you will want to change these rows, in which case just do your update and check mysql_affected_rows()
If you want to INSERT a row only if it doesn't already exist, take a look at INSERT ... ON DUPLICATE KEY UPDATE or REPLACE INTO.
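The "just do your update and check the affected rows" pattern, sketched in Python's sqlite3, where cursor.rowcount plays the role of mysql_affected_rows(). The users table and record_visit helper are hypothetical, and INSERT OR IGNORE is SQLite's counterpart to MySQL's INSERT IGNORE; it avoids a duplicate-key error if another session created the row in the meantime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY, visits INTEGER NOT NULL)")


def record_visit(conn, email):
    """Update the row if it exists, otherwise insert it; no separate existence check."""
    cur = conn.execute("UPDATE users SET visits = visits + 1 WHERE email = ?", (email,))
    if cur.rowcount == 0:  # no row matched, so it did not exist yet
        conn.execute(
            "INSERT OR IGNORE INTO users (email, visits) VALUES (?, 1)", (email,)
        )


record_visit(conn, "a@example.com")
record_visit(conn, "a@example.com")
record_visit(conn, "b@example.com")
print(conn.execute("SELECT email, visits FROM users ORDER BY email").fetchall())
# [('a@example.com', 2), ('b@example.com', 1)]
```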
EXISTS is defined in standard SQL; it isn't just a MySQL function (see http://www.techonthenet.com/sql/exists.php), and I usually use it to test whether a particular row exists.
However, in Oracle I've often seen the other approach suggested earlier:
SELECT COUNT(1) FROM table1 WHERE some_condition;