MySQL duplicate row deletion with Perl DBI across two tables

This one is a pretty good one IMO and I have not seen a close example on SO or Google, so here you go. I need to do the following within a Perl application I am building. Unfortunately it cannot be done directly in MySQL and will require DBI. In a nutshell, I need to take Database1.tableA and locate every record with the column 'status' matching 'started'. This I can do, as it is fairly easy (not very good with DBI yet, but I have read the docs), but where I am having issues is what I have to do next.
my $started_query = "SELECT primary_ip FROM queue WHERE status='started'";
my $started = $dbh->prepare($started_query);
$started->execute();
while ( my @started = $started->fetchrow_array() ) {
# Where I am hoping to have the following occur so it can go by row
# for only rows with the status 'started'
}
So for each record in the @started array (which really only contains one value per iteration of the while loop), I need to see if it exists in Database2.tableA. IF it does exist in the other database (Database2.tableA) I need to delete it from Database1.tableA, but if it DOES NOT exist in the other database (Database2.tableA) I need to update the record in the current database (Database1.tableA).
Basically replicating the below semi-valid MySQL syntax.
DELETE FROM tableA WHERE primary_ip IN (SELECT primary_ip FROM db2.tablea) OR UPDATE tableA SET status = 'error'
I am limited to DBI to connect to the two databases, and the logic is escaping me currently. I could run the queries against both databases, store the results in @arrays, and then do a comparison, but that seems redundant; I think it should be possible within the while ( my @started = $started->fetchrow_array() ) loop, which would save on runtime and the resources required. I am also not familiar enough with passing variables between DBI instances, and since the @started array will always contain the column value I need to query for and delete, I would like to take full advantage of having that defined and passed to the DBI objects.
I am going to be working on this thing all night and have already run through a couple pots of coffee, so any help understanding this logic is greatly appreciated.

You'll be better off with fetchrow_hashref, which returns a hashref of key/value pairs, where the keys are the column names, rather than coding based on columns showing up at ordinal positions in the array.
You need an additional database handle to do the lookups and updates because you've got a live statement handle on the first one. Something like this:
my $dbh2 = DBI->connect(...same credentials...);
...
while (my $row = $started->fetchrow_hashref)
{
    if (my $found = $dbh2->selectrow_hashref("SELECT * FROM db2.t2 WHERE primary_ip = ?", undef, $row->{primary_ip}))
    {
        $dbh2->do("DELETE FROM db1.t1 WHERE primary_ip = ?", undef, $found->{primary_ip});
    }
    else
    {
        $dbh2->do("UPDATE db1.t1 SET status = 'error' WHERE primary_ip = ?", undef, $row->{primary_ip});
    }
}
Technically, we don't "need" to fetch the row from db2.t2 into my $found, since you're apparently only testing for existence; there are other ways. But using it here is a bit of insurance against doing something other than you intended, since it will be undef if we somehow get some bad logic going, and that should keep us from making potentially wrong changes.
But approaching a relational database with loop iterations is rarely the best tactic.
This "could" be done directly in MySQL with just a couple of queries.
First, the updates, where t1.status = 'started' and t2.primary_ip has no matching value for t1.primary_ip:
UPDATE db1.t1 a LEFT JOIN db2.t2 b ON b.primary_ip = a.primary_ip
SET a.status = 'error'
WHERE b.primary_ip IS NULL AND a.status = 'started';
If you are thinking "but b.primary_ip is never null" ... well, it is null in a left join where there are no matching rows.
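If you want to sanity-check that before running the UPDATE, the same LEFT JOIN can be run as a SELECT to list the rows that would be flagged:
SELECT a.primary_ip
FROM db1.t1 a LEFT JOIN db2.t2 b ON b.primary_ip = a.primary_ip
WHERE b.primary_ip IS NULL AND a.status = 'started';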
Then deleting the rows from t1 can also be accomplished with a join. A multi-table DELETE removes only the rows from the tables (or aliases) listed between DELETE and FROM. Again, we're calling t1 by the alias "a" and t2 by the alias "b".
DELETE a
FROM db1.t1 a JOIN db2.t2 b ON a.primary_ip = b.primary_ip
WHERE a.status = 'started';
This removes every row from t1 ("a") where status = 'started' AND where a matching row exists in t2.
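If you want the flagging and the deleting to succeed or fail together, one option (a minimal sketch, assuming both databases live on the same MySQL server and the tables use InnoDB) is to run the two statements inside a single transaction:
START TRANSACTION;
-- flag 'started' rows that have no match in db2.t2
UPDATE db1.t1 a LEFT JOIN db2.t2 b ON b.primary_ip = a.primary_ip
SET a.status = 'error'
WHERE b.primary_ip IS NULL AND a.status = 'started';
-- remove 'started' rows that do have a match in db2.t2
DELETE a
FROM db1.t1 a JOIN db2.t2 b ON a.primary_ip = b.primary_ip
WHERE a.status = 'started';
COMMIT;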

SQLAlchemy bulk update strategies

I am currently writing a web app (Flask) using SQLAlchemy (on GAE, connecting to Google's cloud MySQL) and needing to do bulk updates of a table. In short, a number of calculations are done resulting in a single value needing to be updated on 1000's of objects. At the moment I'm doing it all in a transaction, but still at the end, the flush/commit is taking ages.
The table has an index on id and this is all carried out in a single transaction. So I believe I've avoided the usual mistakes, but it is still very slow.
INFO 2017-01-26 00:45:46,412 log.py:109] UPDATE wallet SET balance=%(balance)s WHERE wallet.id = %(wallet_id)s
2017-01-26 00:45:46,418 INFO sqlalchemy.engine.base.Engine ({'wallet_id': u'3c291a05-e2ed-11e6-9b55-19626d8c7624', 'balance': 1.8711760000000002}, {'wallet_id': u'3c352035-e2ed-11e6-a64c-19626d8c7624', 'balance': 1.5875759999999999}, {'wallet_id': u'3c52c047-e2ed-11e6-a903-19626d8c7624', 'balance': 1.441656}
From my understanding there is no way to do a true bulk update in SQL itself, and the statement above ends up as multiple UPDATE statements being sent to the server.
I've tried using Session.bulk_update_mappings() but that doesn't seem to actually do anything :( Not sure why, but the updates never actually happen. I can't see any examples of this method actually being used (including in the performance suite) so not sure if it is intended to be used.
One technique I've seen discussed is doing a bulk insert into another table and then doing an UPDATE JOIN. I've given it a test, like below, and it seems to be significantly faster.
wallets = db_session.query(Wallet).all()
ledgers = [ Ledger(id=w.id, amount=w._balance) for w in wallets ]
db_session.bulk_save_objects(ledgers)
db_session.execute('UPDATE wallet w JOIN ledger l on w.id = l.id SET w.balance = l.amount')
db_session.execute('TRUNCATE ledger')
But the problem now is how to structure my code. I'm using the ORM and I need to somehow not 'dirty' the original Wallet objects so that they don't get committed in the old way. I could just create these Ledger objects instead and keep a list of them about and then manually insert them at the end of my bulk operation. But that almost smells like I'm replicating some of the work of the ORM mechanism.
Is there a smarter way to do this? So far my brain is going down something like:
class Wallet(Base):
    ...
    _balance = Column(Float)
    ...

    @property
    def balance(self):
        # first check if we have a ledger of the same id
        # and return the amount in that, otherwise...
        return self._balance

    @balance.setter
    def balance(self, amount):
        l = Ledger(id=self.id, amount=amount)
        # add l to a list somewhere then process later
        # At the end of the transaction, do a bulk insert of Ledgers
        # and then do an UPDATE JOIN and TRUNCATE
As I said, this all seems to be fighting against the tools I (may) have. Is there a better way to be handling this? Can I tap into the ORM mechanism to be doing this? Or is there an even better way to do the bulk updates?
EDIT: Or is there maybe something clever with events and sessions? Maybe before_flush?
EDIT 2: So I have tried to tap into the event machinery and now have this:
@event.listens_for(SignallingSession, 'before_flush')
def before_flush(session, flush_context, instances):
    ledgers = []
    if session.dirty:
        for elem in session.dirty:
            if session.is_modified(elem, include_collections=False):
                if isinstance(elem, Wallet):
                    session.expunge(elem)
                    ledgers.append(Ledger(id=elem.id, amount=elem.balance))
    if ledgers:
        session.bulk_save_objects(ledgers)
        session.execute('UPDATE wallet w JOIN ledger l on w.id = l.id SET w.balance = l.amount')
        session.execute('TRUNCATE ledger')
Which seems pretty hacky and evil to me, but appears to work OK. Any pitfalls, or better approaches?
-Matt
What you're essentially doing is bypassing the ORM in order to optimize the performance. Therefore, don't be surprised that you're "replicating the work the ORM is doing" because that's exactly what you need to do.
Unless you have a lot of places where you need to do bulk updates like this, I would recommend against the magical event approach; simply writing the explicit queries is much more straightforward.
What I recommend doing is using SQLAlchemy Core instead of the ORM to do the update:
ledger = Table("ledger", db.metadata,
    Column("wallet_id", Integer, primary_key=True),
    Column("new_balance", Float),
    prefixes=["TEMPORARY"],
)

wallets = db_session.query(Wallet).all()

# figure out new balances
balance_map = {}
for w in wallets:
    balance_map[w.id] = calculate_new_balance(w)

# create temp table with balances we need to update
ledger.create(bind=db.session.get_bind())

# insert update data
db.session.execute(ledger.insert().values([{"wallet_id": k, "new_balance": v}
                                           for k, v in balance_map.items()]))

# perform update
db.session.execute(Wallet.__table__
                   .update()
                   .values(balance=ledger.c.new_balance)
                   .where(Wallet.__table__.c.id == ledger.c.wallet_id))

# drop temp table
ledger.drop(bind=db.session.get_bind())

# commit changes
db.session.commit()
Generally it is poor schema design to need to update thousands of rows frequently. That aside...
Plan A: Write ORM code that generates
START TRANSACTION;
UPDATE wallet SET balance = ... WHERE id = ...;
UPDATE wallet SET balance = ... WHERE id = ...;
UPDATE wallet SET balance = ... WHERE id = ...;
...
COMMIT;
Plan B: Write ORM code that generates
CREATE TEMPORARY TABLE ToDo (
id ...,
new_balance ...
);
INSERT INTO ToDo -- either one row at a time, or a bulk insert
UPDATE wallet
JOIN ToDo USING(id)
SET wallet.balance = ToDo.new_balance; -- bulk update
(Check the syntax; test; etc.)
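To make that concrete, here is one hedged way Plan B could look; the column types are guesses (the question's wallet ids are UUID strings, so CHAR(36) is used here), and the sample rows reuse values from the question's log output:
CREATE TEMPORARY TABLE ToDo (
    id CHAR(36) NOT NULL PRIMARY KEY,
    new_balance DOUBLE NOT NULL
);
-- bulk insert the freshly calculated balances
INSERT INTO ToDo (id, new_balance) VALUES
    ('3c291a05-e2ed-11e6-9b55-19626d8c7624', 1.871176),
    ('3c352035-e2ed-11e6-a64c-19626d8c7624', 1.587576);
-- apply them all in a single statement
UPDATE wallet
JOIN ToDo USING (id)
SET wallet.balance = ToDo.new_balance;
DROP TEMPORARY TABLE ToDo;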

Replacing existing View but MySQL says "Table doesn't exist"

I have a table in my MySQL database, compatibility_core_rules, which essentially stores pairs of ids representing compatibility between parts that have fields with those corresponding ids. Now, my aim is to get all possible compatibility pairs by following the transitivity of the pairs (e.g. if the table has (1,2) and (2,4), then add the pair (1,4)). So, mathematically speaking, I'm trying to find the transitive closure of the compatibility_core_rules table.
E.g. if compatibility_core_rules contains (1,2), (2,4) and (4,9), then initially we can see that (1,2) and (2,4) gives a new pair (1,4). I then iterate over the updated pairs and find that (4,9) with the newly added (1,4) gives me (1,9). At this point, iterating again would add no more pairs.
So my approach is to create a view with the initial pairs from compatibility_core_rules, like so:
CREATE VIEW compatibility_core_rules_closure
AS
SELECT part_type_field_values_id_a,
part_type_field_values_id_b,
custom_builder_id
FROM compatibility_core_rules;
Then, in order to iteratively discover all pairs, I need to keep replacing that view with an updated version of itself that has additional pairs each time. However, I found MySQL doesn't like me referencing the view in its own definition, so I make a temporary view (with OR REPLACE, since this will be inside a loop):
CREATE OR REPLACE VIEW compatibility_core_rules_closure_temp
AS
SELECT part_type_field_values_id_a,
part_type_field_values_id_b,
custom_builder_id
FROM compatibility_core_rules_closure;
No problems here. I then reference this temporary view in the following CREATE OR REPLACE VIEW statement to update the compatibility_core_rules_closure view with one iteration's worth of additional pairs:
CREATE OR REPLACE VIEW compatibility_core_rules_closure
AS
SELECT
CASE WHEN ccr1.part_type_field_values_id_a = ccr2.part_type_field_values_id_a THEN ccr1.part_type_field_values_id_b
WHEN ccr1.part_type_field_values_id_a = ccr2.part_type_field_values_id_b THEN ccr1.part_type_field_values_id_b
END ccrA,
CASE WHEN ccr1.part_type_field_values_id_a = ccr2.part_type_field_values_id_a THEN ccr2.part_type_field_values_id_b
WHEN ccr1.part_type_field_values_id_a = ccr2.part_type_field_values_id_b THEN ccr2.part_type_field_values_id_a
END ccrB,
ccr1.custom_builder_id custom_builder_id
FROM compatibility_core_rules_closure_temp ccr1
INNER JOIN compatibility_core_rules_closure_temp ccr2
ON (
ccr1.part_type_field_values_id_a = ccr2.part_type_field_values_id_a OR
ccr1.part_type_field_values_id_a = ccr2.part_type_field_values_id_b
)
GROUP BY ccrA,
ccrB
HAVING -- ccrA and ccrB are in fact not the same
ccrA != ccrB
-- ccrA and ccrB do not belong to the same part type
AND (
SELECT ptf.part_type_id
FROM part_type_field_values ptfv
INNER JOIN part_type_fields ptf
ON ptfv.part_type_field_id = ptf.id
WHERE ptfv.id = ccrA
LIMIT 1
) !=
(
SELECT ptf.part_type_id
FROM part_type_field_values ptfv
INNER JOIN part_type_fields ptf
ON ptfv.part_type_field_id = ptf.id
WHERE ptfv.id = ccrB
LIMIT 1
)
Now this is where things go wrong. I get the following error:
#1146 - Table 'db509574872.compatibility_core_rules_closure' doesn't exist
I'm very confused by this error message. I literally just created the view/table only two statements ago. I'm sure the SELECT query itself is correct, since if I try it by itself it runs fine. If I change the first line to use compatibility_core_rules_closure2 instead of compatibility_core_rules_closure then it runs fine (however, that's not much use since I need to be re-updating the same view again and again). I've looked into the SQL SECURITY clauses but have not had any success. I've also been researching online but am not getting anywhere.
Does anyone have any ideas what is happening and how to solve it?
MySQL doesn't support sub-queries in views.
You'll have to separate them... i.e. by using another view containing the sub-query inside your main view.
Running the CREATE statement for that view will raise an error, so the view is never created, hence the "doesn't exist" error you get when querying it.
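As a hedged sketch of that idea (the view name part_type_lookup is made up here; the tables are the ones from the question), the repeated sub-query can be promoted to its own view:
CREATE OR REPLACE VIEW part_type_lookup AS
SELECT ptfv.id AS value_id,
       ptf.part_type_id
FROM part_type_field_values ptfv
INNER JOIN part_type_fields ptf
        ON ptfv.part_type_field_id = ptf.id;
The main view can then JOIN against part_type_lookup twice (once per side of the pair) instead of embedding the two SELECTs in its HAVING clause.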

SQL MERGE statement to update data

I've got a table with data named energydata
It has just three columns:
(webmeterID, DateTime, kWh)
I have a new set of updated data in a table temp_energydata.
The DateTime and the webmeterID stay the same, but the kWh values need updating from the temp_energydata table.
How do I write the T-SQL for this the correct way?
Assuming you want an actual SQL Server MERGE statement:
MERGE INTO dbo.energydata WITH (HOLDLOCK) AS target
USING dbo.temp_energydata AS source
ON target.webmeterID = source.webmeterID
AND target.DateTime = source.DateTime
WHEN MATCHED THEN
UPDATE SET target.kWh = source.kWh
WHEN NOT MATCHED BY TARGET THEN
INSERT (webmeterID, DateTime, kWh)
VALUES (source.webmeterID, source.DateTime, source.kWh);
If you also want to delete records in the target that aren't in the source:
MERGE INTO dbo.energydata WITH (HOLDLOCK) AS target
USING dbo.temp_energydata AS source
ON target.webmeterID = source.webmeterID
AND target.DateTime = source.DateTime
WHEN MATCHED THEN
UPDATE SET target.kWh = source.kWh
WHEN NOT MATCHED BY TARGET THEN
INSERT (webmeterID, DateTime, kWh)
VALUES (source.webmeterID, source.DateTime, source.kWh)
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
Because this has become a bit more popular, I feel like I should expand this answer a bit with some caveats to be aware of.
First, there are several blogs which report concurrency issues with the MERGE statement in older versions of SQL Server. I do not know if this issue has ever been addressed in later editions. Either way, this can largely be worked around by specifying the HOLDLOCK or SERIALIZABLE lock hint:
MERGE INTO dbo.energydata WITH (HOLDLOCK) AS target
[...]
You can also accomplish the same thing with more restrictive transaction isolation levels.
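For example, a sketch of the same upsert using the session-level isolation setting instead of the table hint:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
MERGE INTO dbo.energydata AS target
USING dbo.temp_energydata AS source
    ON target.webmeterID = source.webmeterID
   AND target.DateTime = source.DateTime
WHEN MATCHED THEN
    UPDATE SET target.kWh = source.kWh
WHEN NOT MATCHED BY TARGET THEN
    INSERT (webmeterID, DateTime, kWh)
    VALUES (source.webmeterID, source.DateTime, source.kWh);
COMMIT TRANSACTION;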
There are several other known issues with MERGE. (Note that since Microsoft nuked Connect and didn't link issues in the old system to issues in the new system, these older issues are hard to track down. Thanks, Microsoft!) From what I can tell, most of them are not common problems or can be worked around with the same locking hints as above, but I haven't tested them.
As it is, even though I've never had any problems with the MERGE statement myself, I always use the WITH (HOLDLOCK) hint now, and I prefer to use the statement only in the most straightforward of cases.
I have often used Bacon Bits' great answer, as I just cannot memorize the syntax.
But I usually add a CTE to make the DELETE part more useful, because very often you will want to apply the merge only to a part of the target table.
WITH target as (
SELECT * FROM dbo.energydata WHERE DateTime > GETDATE()
)
MERGE INTO target WITH (HOLDLOCK)
USING dbo.temp_energydata AS source
ON target.webmeterID = source.webmeterID
AND target.DateTime = source.DateTime
WHEN MATCHED THEN
UPDATE SET target.kWh = source.kWh
WHEN NOT MATCHED BY TARGET THEN
INSERT (webmeterID, DateTime, kWh)
VALUES (source.webmeterID, source.DateTime, source.kWh)
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
If you just need to update your records in energydata based on data in temp_energydata, assuming that temp_energydata doesn't contain any new records, then try this:
UPDATE e SET e.kWh = t.kWh
FROM energydata e INNER JOIN
temp_energydata t ON e.webmeterID = t.webmeterID AND
e.DateTime = t.DateTime
Here is a working sqlfiddle.
But if temp_energydata contains new records and you need to insert it to energydata preferably with one statement then you should definitely go with the answer that Bacon Bits gave.
UPDATE ed
SET ed.kWh = ted.kWh
FROM energydata ed
INNER JOIN temp_energydata ted ON ted.webmeterID = ed.webmeterID
    AND ted.DateTime = ed.DateTime
UPDATE energydata SET energydata.kWh = temp.kWh
FROM energydata INNER JOIN temp_energydata AS temp
    ON temp.webmeterID = energydata.webmeterID AND temp.DateTime = energydata.DateTime
THE CORRECT WAY IS:
UPDATE test1
INNER JOIN test2 ON (test1.id = test2.id)
SET test1.data = test2.data

Processing data with Perl - SELECT FOR UPDATE usage with MySQL

I have a table that is storing data that needs to be processed. I have id, status, data in the table. I'm currently going through and selecting id, data where status = #. I'm then doing an update immediately after the select, changing the status # so that it won't be selected again.
My program is multithreaded, and sometimes two threads grab the same id because they query the table at nearly the same time. I looked into SELECT ... FOR UPDATE; however, I either did the query wrong or I'm not understanding what it is used for.
My goal is to find a way of grabbing the id and data that I need and setting the status so that no other thread tries to grab and process the same data. Here is the code I tried. (I wrote it all together for show purposes here; I have my prepares set at the beginning of the program so as to not do a prepare each time it's run, just in case anyone was concerned.)
my $select = $db->prepare("SELECT _id, data FROM `TestTable` WHERE _status=4 LIMIT ? FOR UPDATE") or die $DBI::errstr;
if ($select->execute($limit))
{
    while ($data = $select->fetchrow_hashref())
    {
        my $update_status = $db->prepare("UPDATE `TestTable` SET _status = ?, data = ? WHERE _id=?");
        $update_status->execute(10, "", $data->{_id});
        push(@array_hash, $data);
    }
}
When I run this with multiple threads, I get many duplicate inserts when trying to do an insert after I process my transaction data.
I'm not terribly familiar with MySQL, and in the research I've done I haven't found anything that really cleared this up for me.
Thanks
As a sanity check, are you using InnoDB? MyISAM has zero transactional support, aside from faking it with full table locking.
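A quick way to check the engine (TestTable is the table name from the question's code):
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_NAME = 'TestTable';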
I don't see where you're starting a transaction. MySQL's autocommit option is on by default, so starting a transaction and later committing would be necessary unless you turned off autocommit.
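For reference, a rough sketch of the pattern SELECT ... FOR UPDATE is meant for, using the column names from the question's code (the literal ids in the UPDATE stand in for whatever the SELECT returned):
START TRANSACTION;
-- row locks are taken here; competing transactions block until COMMIT
SELECT _id, data FROM TestTable WHERE _status = 4 LIMIT 5 FOR UPDATE;
-- mark the claimed rows so no other worker picks them up after COMMIT
UPDATE TestTable SET _status = 10 WHERE _id IN (1, 2, 3, 4, 5);
COMMIT;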
It looks like you simply rely on the database locking mechanisms. I googled perl dbi locking and found this:
$dbh->do("LOCK TABLES foo WRITE, bar READ");
$sth->prepare("SELECT x,y,z FROM bar");
$sth2->prepare("INSERT INTO foo SET a = ?");
while (#ary = $sth->fetchrow_array()) {
$sth2->$execute($ary[0]);
}
$sth2->finish();
$sth->finish();
$dbh->do("UNLOCK TABLES");
Not really saying GIYF as I am also fairly novice at both MySQL and DBI, but perhaps you can find other answers that way.
Another option might be as follows, and this only works if you control all the code accessing the data. You can add a lock column to the table. When your code accesses the table it does something like this (pseudocode):
if row.lock != 1
    row.lock = 1
    read row
    update row
    row.lock = 0
    next
else
    sleep 1
    redo
Again though, this trusts that all users/scripts that access this data will agree to follow this policy. If you cannot ensure that, then this won't work.
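One rough SQL sketch of making that claim step atomic (the locked column is hypothetical here, and 42 stands in for a per-worker id; MySQL's CONNECTION_ID() could serve the same purpose):
-- test and set in one statement, so two workers cannot claim the same rows
UPDATE TestTable SET locked = 42 WHERE locked = 0 AND _status = 4 LIMIT 5;
-- read back only the rows this worker just claimed
SELECT _id, data FROM TestTable WHERE locked = 42;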
Anyway, that's all the knowledge I have on the topic. Good luck!

Can we control LINQ expression order with Skip(), Take() and OrderBy()

I'm using LINQ to Entities to display paged results. But I'm having issues with the combination of Skip(), Take() and OrderBy() calls.
Everything works fine, except that OrderBy() is applied too late. It's executed after the result set has been cut down by Skip() and Take().
So each page of results has its items in order, but the ordering is done on a single page's worth of data instead of ordering the whole set first and then limiting those records with Skip() and Take().
How do I set precedence with these statements?
My example (simplified)
var query = ctx.EntitySet.Where(/* filter */).OrderByDescending(e => e.ChangedDate);
int total = query.Count();
var result = query.Skip(n).Take(x).ToList();
One possible (but a bad) solution
One possible solution would be to apply a clustered index to the order-by column, but this column changes frequently, which would slow database performance on inserts and updates. And I really don't want to do that.
EDIT
I ran ToTraceString() on my query, where we can actually see when the ORDER BY is applied to the result set. Unfortunately, it's at the very end. :(
SELECT
-- columns
FROM (SELECT
-- columns
FROM (SELECT -- columns
FROM ( SELECT
-- columns
FROM table1 AS Extent1
WHERE EXISTS (SELECT
-- single constant column
FROM table2 AS Extent2
WHERE (Extent1.ID = Extent2.ID) AND (Extent2.userId = :p__linq__4)
)
) AS Project2
limit 0,10 ) AS Limit1
LEFT OUTER JOIN (SELECT
-- columns
FROM table2 AS Extent3 ) AS Project3 ON Limit1.ID = Project3.ID
UNION ALL
SELECT
-- columns
FROM (SELECT -- columns
FROM ( SELECT
-- columns
FROM table1 AS Extent4
WHERE EXISTS (SELECT
-- single constant column
FROM table2 AS Extent5
WHERE (Extent4.ID = Extent5.ID) AND (Extent5.userId = :p__linq__4)
)
) AS Project6
limit 0,10 ) AS Limit2
INNER JOIN table3 AS Extent6 ON Limit2.ID = Extent6.ID) AS UnionAll1
ORDER BY UnionAll1.ChangedDate DESC, UnionAll1.ID ASC, UnionAll1.C1 ASC
My workaround solution
I've managed to work around this problem. Don't get me wrong here, I haven't solved the precedence issue as of yet, but I've mitigated it.
What I did?
This is the code I've used until I get an answer from Devart. If they won't be able to overcome this issue I'll have to use this code in the end.
// get ordered list of IDs
List<int> ids = ctx.MyEntitySet
.Include(/* Related entity set that is needed in where clause */)
.Where(/* filter */)
.OrderByDescending(e => e.ChangedDate)
.Select(e => e.Id)
.ToList();
// get total count
int total = ids.Count;
if (total > 0)
{
// get a single page of results
List<MyEntity> result = ctx.MyEntitySet
.Include(/* related entity set (as described above) */)
.Include(/* additional entity set that's neede in end results */)
.Where(string.Format("it.Id in {{{0}}}", string.Join(",", ids.ConvertAll(id => id.ToString()).Skip(pageSize * currentPageIndex).Take(pageSize).ToArray())))
.OrderByDescending(e => e.ChangedOn)
.ToList();
}
First of all I'm getting the ordered IDs of my entities. Getting only IDs performs well even with a larger set of data. The MySQL query is quite simple and performs really well. In the second part I partition these IDs and use them to get the actual entity instances.
Thinking of it, this should perform even better than the way I was doing it at the beginning (as described in my question), because getting the total count is much quicker due to the simplified query. The second part is practically the same, except that my entities are returned by their IDs instead of being partitioned using Skip and Take...
Hopefully someone may find this solution helpful.
I haven't worked directly with LINQ to Entities, but it should have a way to hook specific stored procedures into certain locations when needed. (LINQ to SQL did.) If so, you could turn this query into a stored procedure, doing exactly what is required, and doing it efficiently.
Assuming from your comment that persisting the values in a List is not acceptable:
There's no way to completely minimize the iterations, as you intended (and as I would have tried too, living in hope). Cutting the iterations down by one would be nice. Is it possible to just get the Count once and cache/session it? Then you could:
int total = ctx.EntitySet.Count; // Hopefully you can not repeat doing this.
var result = ctx.EntitySet.Where(/* filter */).OrderBy(/* expression */).Skip(n).Take(x).ToList();
Hopefully you can cache the Count somehow, or avoid needing it every time. Even if you can't, this is the best you can do.
Could you please create a sample illustrating the problem and send it to us (support * devart * com, subject "EF: Skip, Take, OrderBy")?
Hope we will be able to help you.
You can also contact us using our forums or contact form.
Are you absolutely certain the ordering is off? What does the SQL look like?
Can you reorder your code as follows and post the output?
// Redefine your queries.
var query = ctx.EntitySet.Where(/* filter */).OrderBy(e => e.ChangedDate);
var skipped = query.Skip(n).Take(x);
// let's look at the SQL, shall we?
var querySQL = query.ToTraceString();
var skippedSQL = skipped.ToTraceString();
// actual execution of the queries...
int total = query.Count();
var result = skipped.ToList();
Edit:
I'm absolutely certain. You can check my "edit" to see the trace result of my query; the skipped trace result is what matters in this case. Count is not really important.
Yeah, I see it. Wow, that's a stumper. Might even be an outright bug. I note you're not using SQL Server... what DB are you using? Looks like it might be MySQL.
One way:
var query = ctx.EntitySet.Where(/* filter */).OrderBy(/* expression */).ToList();
int total = query.Count;
var result = query.Skip(n).Take(x).ToList();
Convert it to a List before skipping. It's not too efficient, mind you...