I'm writing a DB layer which talks to MS SQL Server, MySQL and Oracle. I need an operation which can update an existing row if it contains certain data, otherwise insert a new row, all in one SQL operation.
Essentially I need to save over existing data if it exists, or add it if it doesn't.
Conceptually this is the same as upsert except it only needs to work on a single table. I'm trying to make sure I don't need to delete then insert as this has a performance impact.
Is there generic SQL to do this or do I need vendor specific solutions?
Thanks.
You need vendor-specific SQL, as MySQL (unlike MS SQL Server and Oracle) doesn't support MERGE:
http://en.wikipedia.org/wiki/Merge_(SQL)
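For illustration, a minimal MERGE sketch in the SQL Server flavour (table and column names here are hypothetical; Oracle's syntax differs slightly, e.g. USING (... FROM dual) and no AS on the aliases):

MERGE INTO tasks AS target
USING (SELECT @task_id AS task_id, @priority AS priority) AS source
    ON target.task_id = source.task_id
WHEN MATCHED THEN
    UPDATE SET target.priority = source.priority
WHEN NOT MATCHED THEN
    INSERT (task_id, priority) VALUES (source.task_id, source.priority);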
I suspect that sooner rather than later, you're going to need a vendor specific implementation of your DB layer - SQL portability is pretty much a myth as soon as you do anything even slightly advanced.
I am pretty sure this is going to be vendor specific. For SQL Server, you can accomplish this using the MERGE statement.
If you are using SQL Server 2008, use the MERGE statement. But keep in mind that if your insert part involves some extra condition, MERGE cannot be used, in which case you need to write your own way of accomplishing this. And in your case it has to be custom anyway, since you are involving MySQL, which does not have a MERGE statement.
Why are you not using an ORM layer (like Entity Framework) for this purpose?
Just some pseudo-code (in C#):
public int SaveTask(tblTaskActivity task, bool isInsert)
{
    int result = 0;
    using (var tmsEntities = new TMSEntities())
    {
        if (isInsert) // for insert
        {
            tmsEntities.AddTotblTaskActivities(task);
            result = tmsEntities.SaveChanges();
        }
        else // for update
        {
            var taskActivity = tmsEntities.tblTaskActivities.Where(i => i.TaskID == task.TaskID).FirstOrDefault();
            taskActivity.Priority = task.Priority;
            taskActivity.ActualTime = task.ActualTime;
            result = tmsEntities.SaveChanges();
        }
    }
    return result;
}
In MySQL you have something similar to merge:
insert ... on duplicate key update ...
MySQL Reference - Insert on duplicate key update
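A minimal sketch against a hypothetical tasks table (it requires a PRIMARY KEY or UNIQUE index on task_id):

INSERT INTO tasks (task_id, priority)
VALUES (42, 3)
ON DUPLICATE KEY UPDATE priority = VALUES(priority);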
I looked through the docs and I didn't find anything on this subject, but I thought I'd ask, to be sure:
Is there a way for OrmLite's INSERT and UPDATE APIs to insert/update, in one query, columns that are not present in the POCO?
DateTime myTimestamp = DateTime.Now;
db.Insert<MyPoco>(myPoco, new { MyNewColumn = myTimestamp });
or something like it?
I know that I can make a custom SQL, so either make a second query, inserting the custom columns, or write the whole thing myself, but I'd like to avoid that and let OrmLite do what it's supposed to do.
OrmLite is a typed code-first ORM where each POCO is the authoritative source which maps 1:1 to their respective RDBMS tables.
You can't use OrmLite's typed APIs with an unknown or dynamic schema and would need to execute a custom SQL INSERT, e.g.:
db.ExecuteSql(
    "INSERT INTO page_stats (ref_id, fav_count) VALUES (@refId, @favCount)",
    new { refId, favCount });
JDBC allows us to fetch the value of a primary key that is automatically generated by the database (e.g. IDENTITY, AUTO_INCREMENT) using the following syntax:
PreparedStatement ps = connection.prepareStatement(
    "INSERT INTO post (title) VALUES (?)",
    Statement.RETURN_GENERATED_KEYS
);
ps.setString(1, "Post title");
ps.executeUpdate();
ResultSet resultSet = ps.getGeneratedKeys();
while (resultSet.next()) {
    LOGGER.info("Generated identifier: {}", resultSet.getLong(1));
}
I'm interested in whether the Oracle, SQL Server, PostgreSQL, or MySQL driver uses a separate round trip to fetch the identifier, or whether there is a single round trip which executes the insert and fetches the ResultSet automatically.
It depends on the database and driver.
Although you didn't ask for it, I will answer for Firebird ;). In Firebird/Jaybird the retrieval itself doesn't require extra roundtrips, but using Statement.RETURN_GENERATED_KEYS or the integer array version will require three extra roundtrips (prepare, execute, fetch) to determine the columns to request (I still need to build a form of caching for it). Using the version with a String array will not require extra roundtrips (I would love to have RETURNING * like in PostgreSQL...).
In PostgreSQL with PgJDBC there is no extra round-trip to fetch generated keys.
It sends a Parse/Describe/Bind/Execute message series followed by a Sync, then reads the results including the returned result-set. There's only one client/server round-trip required because the protocol pipelines requests.
However, batches that could otherwise be streamed to the server may sometimes be broken up into smaller chunks, or run one by one, if generated keys are requested. To avoid this, use the String[] form where you name the columns you want returned, and name only columns of fixed-width data types like integer. This only matters for batches, and it's due to a design problem in PgJDBC.
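For reference, the column-name form looks like this; the "id" column name here is just an assumption about the schema:

// Name only fixed-width columns (e.g. an integer key) so PgJDBC can keep streaming batches.
PreparedStatement ps = connection.prepareStatement(
    "INSERT INTO post (title) VALUES (?)",
    new String[] { "id" }
);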
(I posted a patch to add batch pipelining support in libpq that doesn't have that limitation, it'll do one client/server round trip for arbitrary sized batches with arbitrary-sized results, including returning keys.)
MySQL receives the generated key(s) automatically in the OK packet of the protocol in response to executing a statement. There is no communication overhead when requesting generated keys.
In my opinion, even for such a trivial thing a single approach working in all database systems will fail.
The only pragmatic solution is (in analogy to Hibernate) to find the best working solution for each target RDBMS and call it a dialect of your one-for-all solution :)
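A minimal sketch of what such a dialect seam could look like (the interface and names are made up for illustration, not an existing API):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical per-RDBMS "dialect": each target database gets its own implementation.
public interface GeneratedKeyDialect {
    // Prepare the INSERT so the generated key can be read back in a single round trip.
    PreparedStatement prepareInsert(Connection con, String insertSql) throws SQLException;

    // Read the key after executeUpdate(), however the driver exposes it.
    long readGeneratedKey(PreparedStatement ps) throws SQLException;
}

For example, a PostgreSQL implementation could rely on Statement.RETURN_GENERATED_KEYS, while an Oracle implementation could rewrite the statement to use RETURNING ... INTO, as shown below.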
Here is the information for Oracle.
I'm using a sequence to generate the key; the same behavior is observed for an IDENTITY column.
create table auto_pk
(id number,
pad varchar2(100));
This works and uses only one roundtrip:
def stmt = con.prepareStatement("insert into auto_pk values(auto_pk_seq.nextval, 'XXX')",
        Statement.RETURN_GENERATED_KEYS)
def rowCount = stmt.executeUpdate()
def generatedKeys = stmt.getGeneratedKeys()
if (null != generatedKeys && generatedKeys.next()) {
    def id = generatedKeys.getString(1);
}
But unfortunately you get the ROWID as a result, not the generated key.
How is it implemented internally? You can see it if you activate a 10046 trace (BTW this is also the best way to see how many roundtrips were performed):
PARSING IN CURSOR
insert into auto_pk values(auto_pk_seq.nextval, 'XXX')
RETURNING ROWID INTO :1
END OF STMT
So you see that the JDBC 3.0 standard is implemented, but you don't get the requested result. Under the covers the RETURNING clause is used.
The right approach to get the generated key in Oracle is therefore:
def stmt = con.prepareStatement("insert into auto_pk values(auto_pk_seq.nextval, 'XXX') returning id into ?")
stmt.registerReturnParameter(1, Types.INTEGER);
def rowCount = stmt.executeUpdate()
def generatedKeys = stmt.getReturnResultSet()
if (null != generatedKeys && generatedKeys.next()) {
    def id = generatedKeys.getLong(1);
}
Note:
Oracle Release 12.1.0.2.0
To activate the 10046 trace use
con.createStatement().execute "alter session set events '10046 trace name context forever, level 12'"
con.createStatement().execute "ALTER SESSION SET tracefile_identifier = my_identifier"
Depending on frameworks or libraries to do things that are perfectly possible in plain SQL is bad design IMHO, especially when working against a specific DBMS. (The Statement.RETURN_GENERATED_KEYS case is relatively innocuous, although it apparently does raise a question for you; but where frameworks are built on separate entities and do all sorts of joins and filters in code, or have custom-built transaction isolation logic, things get inefficient and messy very quickly.)
Why not simply:
PreparedStatement ps= connection.prepareStatement(
"INSERT INTO post (title) VALUES (?) RETURNING id");
Single trip, defined result.
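With a driver that supports RETURNING directly (e.g. PgJDBC), reading the value back is just a matter of executing the statement as a query; a minimal sketch:

// Sketch: INSERT ... RETURNING id produces a one-row result set containing the key.
PreparedStatement ps = connection.prepareStatement(
    "INSERT INTO post (title) VALUES (?) RETURNING id");
ps.setString(1, "Post title");
try (ResultSet rs = ps.executeQuery()) {
    if (rs.next()) {
        long id = rs.getLong(1);
    }
}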
Is there a way to do an insert/select with Linq that translates to this sql:
INSERT INTO TableA (...)
SELECT ...
FROM TableB
WHERE ...
Yes, @bzlm covered it first, but if you prefer something a bit more verbose:
// dc = DataContext; assumes TableA contains items of type A
var toInsert = from b in TableB
               where ...
               select new A
               {
                   ...
               };

TableA.InsertAllOnSubmit(toInsert);
dc.SubmitChanges();
I kind of prefer this from a review/maintenance point of view, as I think it's a bit more obvious what's going on in the select.
In response to the observation by @JfBeaulac:
Please note that this will not generate the SQL shown - as far as I'm aware it's not actually possible to generate it directly using LINQ (to SQL); you'd have to bypass LINQ and go straight to the database. Functionally it should achieve the same result, in that it will perform the select and will then insert the data - but it will round-trip the data from the server to the client and back, so it may not be optimal for large volumes of data.
context
.TableA
.InsertAllOnSubmit(
context
.TableB
.Where( ... )
.Select(b => new A { ... })
);
My understanding is that the LinqToSql pseudolanguage describes a set using a syntax very similar to SQL and this will allow you to efficiently update a property on a collection of objects:
from b in BugsCollection where b.status = 'closed' set b.status = 'open'
This would update the underlying database using just one SQL statement.
Normally an ORM needs to retrieve all of the rows as separate objects, update attributes on each of them and save them individually to the database (at least that's my understanding).
So, how does linq-to-sql avoid having to do this when other orms are not able to avoid it?
The syntax shown in your question is incorrect. LINQ is not intended to have side-effects; it is a query language. The proper way to accomplish what you're looking for is
var x = from b in dataContext.BugsCollection
        where b.status == "closed"
        select b;
foreach (var y in x)
    y.status = "open";
dataContext.SubmitChanges();
This would generate the single SQL statement that you're talking about. The reason it is able to accomplish this is because of deferred execution - the L2S engine doesn't actually talk to the database until it has to - in this case, because SubmitChanges() was called. L2S then sends the generated SQL statement to the database for execution.
Because LINQ to SQL uses Expression Trees to convert your Query Syntax to actual SQL...it then executes the SQL against the database (rather than pulling all of the data, executing against the in-memory data, and then writing the changes back to the database).
For example, the following Query Syntax:
var records = from r in Records
where r.Property == value
select r
Gets translated first to lambda syntax:
Records.Where(r => r.Property == value);
And finally to SQL (via expression trees):
SELECT Property, Property2, Property3 FROM Record WHERE Property = @value
...granted, the example doesn't update anything...but the process would be the same for an update query as opposed to a simple select.
Recently we turned a set of complicated C#-based scheduling logic into a SQL CLR stored procedure (running in SQL Server 2005). We believed that our code is a great SQL CLR candidate because:
The logic involves tons of data from SQL Server.
The logic is complicated and hard to do in T-SQL.
There is no threading or synchronization or accessing of resources from outside the sandbox.
The result of our sp is pretty good so far. However, since the output of our logic is in the form of several tables of data, we can't just return a single rowset as the result of the sp. Instead, in our code we have a lot of "INSERT INTO ...." statements in foreach loops in order to save each record from a C# generic collection into SQL tables. During code review, someone raised a concern about whether the inline SQL INSERT approach within the SQL CLR can cause performance problems, and wondered if there's a better way to dump the data out (from our C# generic collections).
So, any suggestion?
I ran across this while working on an SQLite project a few months back and found it enlightening. I think it might be what you're looking for.
...
Fastest universal way to insert data using standard ADO.NET constructs
Now that the slow stuff is out of the way, let's talk about some hardcore bulk loading. Aside from SqlBulkCopy and specialized constructs involving ISAM or custom bulk insert classes from other providers, there is simply no beating the raw power of ExecuteNonQuery() on a parameterized INSERT statement. I will demonstrate:
internal static void FastInsertMany(DbConnection cnn)
{
    using (DbTransaction dbTrans = cnn.BeginTransaction())
    {
        using (DbCommand cmd = cnn.CreateCommand())
        {
            cmd.CommandText = "INSERT INTO TestCase(MyValue) VALUES(?)";
            DbParameter Field1 = cmd.CreateParameter();
            cmd.Parameters.Add(Field1);
            for (int n = 0; n < 100000; n++)
            {
                Field1.Value = n + 100000;
                cmd.ExecuteNonQuery();
            }
        }
        dbTrans.Commit();
    }
}
You could return a table with 2 columns (COLLECTION_NAME nvarchar(max), CONTENT xml), filled with as many rows as you have internal collections. CONTENT will be an XML representation of the data in the collection.
Then you can use the XML features of SQL 2005/2008 to parse each collection's XML into tables, and perform your INSERT INTO's or MERGE statements on the whole table.
That should be faster than individual INSERTS inside your C# code.
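A rough T-SQL sketch of the shredding step (the table, column and XML element names are assumptions):

-- Each row's CONTENT column holds one collection, e.g. <rows><row id="1" name="a"/></rows>
DECLARE @content xml = N'<rows><row id="1" name="a"/><row id="2" name="b"/></rows>';

INSERT INTO dbo.TargetTable (id, name)
SELECT r.value('@id', 'int'),
       r.value('@name', 'nvarchar(100)')
FROM @content.nodes('/rows/row') AS t(r);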