YCSB cleanup phase

YCSB cleanup phase - mysql

[CLEANUP], Operations, 1
[CLEANUP], AverageLatency(us), 627.0
[CLEANUP], MinLatency(us), 627
[CLEANUP], MaxLatency(us), 627
[CLEANUP], 95thPercentileLatency(ms), 0
[CLEANUP], 99thPercentileLatency(ms), 0
[CLEANUP], 0, 1
[CLEANUP], 1, 0
[CLEANUP], 2, 0
[CLEANUP], 3, 0
[CLEANUP], 4, 0
Taken from YCSB output, what is a database cleanup operation? I am trying to understand why MySQL takes so much longer during this phase than a MongoDB system.

The cleanup operation is needed to shut down the thread(s) which was providing the workload to the db. So, if you set the parameter -thread n, you will see exactly n cleanup operations at the end of the benchmark!

Related

Understanding the output of a pyomo model, number of solutions is zero?

I am using this code to create a solve a simple problem:
import pyomo.environ as pyo
from pyomo.core.expr.numeric_expr import LinearExpression
model = pyo.ConcreteModel()
model.nVars = pyo.Param(initialize=4)
model.N = pyo.RangeSet(model.nVars)
model.x = pyo.Var(model.N, within=pyo.Binary)
model.coefs = [1, 1, 3, 4]
model.linexp = LinearExpression(constant=0,
linear_coefs=model.coefs,
linear_vars=[model.x[i] for i in model.N])
def caprule(m):
return m.linexp <= 50
model.capme = pyo.Constraint(rule=caprule)
model.obj = pyo.Objective(expr = model.linexp, sense = maximize)
results = SolverFactory('glpk', executable='/usr/bin/glpsol').solve(model)
results.write()
And this is the output:
# ==========================================================
# = Solver Results =
# ==========================================================
# ----------------------------------------------------------
# Problem Information
# ----------------------------------------------------------
Problem:
- Name: unknown
Lower bound: 50.0
Upper bound: 50.0
Number of objectives: 1
Number of constraints: 2
Number of variables: 5
Number of nonzeros: 5
Sense: maximize
# ----------------------------------------------------------
# Solver Information
# ----------------------------------------------------------
Solver:
- Status: ok
Termination condition: optimal
Statistics:
Branch and bound:
Number of bounded subproblems: 0
Number of created subproblems: 0
Error rc: 0
Time: 0.09727835655212402
# ----------------------------------------------------------
# Solution Information
# ----------------------------------------------------------
Solution:
- number of solutions: 0
number of solutions displayed: 0
It says the number of solutions is 0, and yet it does solve the problem:
print(list(model.x[i]() for i in model.N))
Will output this:
[1.0, 1.0, 1.0, 1.0]
Which is a correct answer to the problem. what am I missing?

The interface between pyomo and glpk sometimes (always?) seems to return 0 for the number of solutions. I'm assuming there is some issue with the generalized interface between the pyomo core module and the various solvers that it interfaces with. When I use glpk and cbc solvers on this, it reports the number of solutions as zero. Perhaps those solvers don't fill that data element in the generalized interface. Somebody w/ more experience in the data glob returned from the solver may know precisely. That said, the main thing to look at is the termination condition, which I've found to be always accurate. It reports optimal.
I suspect that you have some mixed code from another model in your example. When I fix a typo or two (you missed the pyo prefix on a few things), it solves fine and gives the correct objective value as 9. I'm not sure where 50 came from in your output.
(slightly cleaned up) Code:
import pyomo.environ as pyo
from pyomo.core.expr.numeric_expr import LinearExpression
model = pyo.ConcreteModel()
model.nVars = pyo.Param(initialize=4)
model.N = pyo.RangeSet(model.nVars)
model.x = pyo.Var(model.N, within=pyo.Binary)
model.coefs = [1, 1, 3, 4]
model.linexp = LinearExpression(constant=0,
linear_coefs=model.coefs,
linear_vars=[model.x[i] for i in model.N])
def caprule(m):
return m.linexp <= 50
model.capme = pyo.Constraint(rule=caprule)
model.obj = pyo.Objective(expr = model.linexp, sense = pyo.maximize)
solver = pyo.SolverFactory('glpk') #, executable='/usr/bin/glpsol').solve(model)
results = solver.solve(model)
print(results)
model.obj.pprint()
model.obj.display()
Output:
Problem:
- Name: unknown
Lower bound: 9.0
Upper bound: 9.0
Number of objectives: 1
Number of constraints: 2
Number of variables: 5
Number of nonzeros: 5
Sense: maximize
Solver:
- Status: ok
Termination condition: optimal
Statistics:
Branch and bound:
Number of bounded subproblems: 0
Number of created subproblems: 0
Error rc: 0
Time: 0.00797891616821289
Solution:
- number of solutions: 0
number of solutions displayed: 0
obj : Size=1, Index=None, Active=True
Key : Active : Sense : Expression
None : True : maximize : x[1] + x[2] + 3*x[3] + 4*x[4]
obj : Size=1, Index=None, Active=True
Key : Active : Value
None : True : 9.0

Couchbase benchmark reveals very slow INSERTs and GETs (using KeyValue operations); slower than persisted MySQL data

I did a small benchmark test to compare Couchbase (running in Win) with Redis and MySql (EDIT: added Aerospike to test)
We are inserting 100 000 JSON "documents" into three db/stores:
Redis (just insert, there is nothing else)
Couchbase (in-memory Ephemeral buckets, JSON Index on JobId)
MySql (Simple table; Id (int), Data (MediumText), index on Id)
Aerospike (in-memory storage)
The JSON file is 67 lines, about 1800 bytes.
INSERT:
Couchbase: 60-100 seconds (EDIT: seems to vary quite a bit!)
MySql: 30 seconds
Redis: 8 seconds
Aerospike: 71 seconds
READ:
We are reading 1000 times, and we do this 10 times and look at averages.
Couchbase: 600-700 ms for 1000 GETs (Using KeyValue operations, not Query API. Using Query API, this takes about 1500 ms)
MySql: 90-100 ms for 1000 GETs
Redis: 50-60 ms for 1000 GETs
Aerospike: 750 ms for 1000 GETs
Conclusion:
Couchbase seems slowest (the INSERT times varies a lot it seems), Aerospike is also very slow. Both of these are using in-memory storage (Couchbase => Ephemeral bucket, Aerospike => storage-engine memory).
Question: Why the in-memory write and read on Couchbase so slow, even slower than using normal MySQL (on an SSD)?
CODE
Note: Using Task.WhenAll, or awaiting each call, doesn't make a difference.
INSERT
Couchbase:
IBucket bucket = await cluster.BucketAsync("halo"); // <-- ephemeral
IScope scope = bucket.Scope("myScope");
var collection = scope.Collection("myCollection");
// EDIT: Added this to avoid measuring lazy loading:
JObject t = JObject.FromObject(_baseJsonObject);
t["JobId"] = 0;
t["CustomerName"] = $"{firstnames[rand.Next(0, firstnames.Count - 1)]} {lastnames[rand.Next(0, lastnames.Count - 1)]}";
await collection.InsertAsync("0", t);
await collection.RemoveAsync("0");
List<Task> inserTasks = new List<Task>();
sw.Start();
foreach (JObject temp in jsonObjects) // jsonObjects is pre-created so its not a factor in the test
{
inserTasks.Add(collection.InsertAsync(temp.GetValue("JobId").ToString(), temp));
}
await Task.WhenAll(inserTasks);
sw.Stop();
Console.WriteLine($"Adding {nbr} to Couchbase took {sw.ElapsedMilliseconds} ms");
Redis (using ServiceStack!)
sw.Restart();
using (var client = redisManager.GetClient())
{
foreach (JObject temp in jsonObjects)
{
client.Set($"jobId:{temp.GetValue("JobId")}", temp.ToString());
}
}
sw.Stop();
Console.WriteLine($"Adding {nbr} to Redis took {sw.ElapsedMilliseconds} ms");
sw.Reset();
Mysql:
MySql.Data.MySqlClient.MySqlConnection mySqlConnection = new MySql.Data.MySqlClient.MySqlConnection("Server=localhost;Database=test;port=3306;User Id=root;password=root;");
mySqlConnection.Open();
sw.Restart();
foreach (JObject temp in jsonObjects)
{
MySql.Data.MySqlClient.MySqlCommand cmd = new MySql.Data.MySqlClient.MySqlCommand($"INSERT INTO test (id, data) VALUES ('{temp.GetValue("JobId")}', #data)", mySqlConnection);
cmd.Parameters.AddWithValue("#data", temp.ToString());
cmd.ExecuteNonQuery();
}
sw.Stop();
Console.WriteLine($"Adding {nbr} to MySql took {sw.ElapsedMilliseconds} ms");
sw.Reset();
READ
Couchbase:
IBucket bucket = await cluster.BucketAsync("halo");
IScope scope = bucket.Scope("myScope");
var collection = scope.Collection("myCollection");
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 1000; i++)
{
string key = $"{r.Next(1, 100000)}";
var result = await collection.GetAsync(key);
}
sw.Stop();
Console.WriteLine($"Couchbase Q: {q}\t{sw.ElapsedMilliseconds}");
Redis:
Stopwatch sw = Stopwatch.StartNew();
using (var client = redisManager.GetClient())
{
for (int i = 0; i < nbr; i++)
{
client.Get<string>($"jobId:{r.Next(1, 100000)}");
}
}
sw.Stop();
Console.WriteLine($"Redis Q: {q}\t{sw.ElapsedMilliseconds}");
MySQL:
MySqlConnection mySqlConnection = new MySql.Data.MySqlClient.MySqlConnection("Server=localhost;Database=test;port=3306;User Id=root;password=root;");
mySqlConnection.Open();
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < nbr; i++)
{
MySqlCommand cmd = new MySql.Data.MySqlClient.MySqlCommand($"SELECT data FROM test WHERE Id='{r.Next(1, 100000)}'", mySqlConnection);
using MySqlDataReader rdr = cmd.ExecuteReader();
while (rdr.Read())
{
}
}
sw.Stop();
Console.WriteLine($"MySql Q: {q} \t{sw.ElapsedMilliseconds} ms");
sw.Reset();
Couchbase setup:
and
and Bucket Durability:
I only have 1 Node (no cluster), it's local on my machine, running Ryzen 3900x 12 cores, M.2 SSD, Win10, 32 GB RAM.
If you made it this far, here is a GitHub repo with my benchmark code:
https://github.com/tedekeroth/CouchbaseTests

I took your CouchbaseTests, commented out the non-Couchbase bits. Fixed the query to select from the collection ( myCollection ) instead of jobcache, and removed the Metrics option. And created an index on JobId.
create index mybucket_JobId on default:myBucket.myScope.myCollection (JobId)
It inserts the 100,000 documents in 19 seconds and kv-fetches the documents on average 146 usec and query by JobId on average 965 usec.
Couchbase Q: 0 187
Couchbase Q: 1 176
Couchbase Q: 2 143
Couchbase Q: 3 147
Couchbase Q: 4 140
Couchbase Q: 5 138
Couchbase Q: 6 136
Couchbase Q: 7 139
Couchbase Q: 8 125
Couchbase Q: 9 129
average et: 146 ms per 1000 -> 146 usec / request
Couchbase Q: 0 1155
Couchbase Q: 1 1086
Couchbase Q: 2 1004
Couchbase Q: 3 901
Couchbase Q: 4 920
Couchbase Q: 5 929
Couchbase Q: 6 912
Couchbase Q: 7 911
Couchbase Q: 8 911
Couchbase Q: 9 927
average et: 965 ms per 1000 -> 965 usec / request. (coincidentally exactly the same as with the java api).
This was on 7.0 build 3739 on a Mac Book Pro with the cbserver running locally.
######################################################################
I have a small LoadDriver application for the java sdk that uses the kv api. With 4 threads, it shows an average response time of 54 micro-seconds and throughput of 73238 requests/second. It uses the travel-sample bucket on a cb server on localhost. git#github.com:mikereiche/loaddriver.git
Run: seconds: 10, threads: 4, timeout: 40000us, threshold: 8000us requests/second: 0 (max), forced GC interval: 0ms
count: 729873, requests/second: 72987, max: 2796us avg: 54us, aggregate rq/s: 73238
For the query API I get the following which is 18 times slower.
Run: seconds: 10, threads: 4, timeout: 40000us, threshold: 8000us requests/second: 0 (max), forced GC interval: 0ms
count: 41378, requests/second: 4137, max: 12032us avg: 965us, aggregate rq/s: 4144

I would have to run such a comparison myself to do a full investigation, but two things stand out.
Your parallel execution isn't truly fully parallel. async methods run synchronously up to the first await, so all of the code in InsertAsync/GetAsync before the first await is running sequentially as you add your tasks, not parallel.
CouchbaseNetClient does some lazy connection setup in the background, and you're paying that cost in the timed section. Depending on the environment, including SSL negotiation and such things, this can be a significant initial latency.
You can potentially address the first issue by using Task.Run to kick off the operation, but you may need to pre-size the default Threadpool size.
You can address the second issue by doing at least one operation on the bucket (including bucket.WaitUntilReadyAsync()) before the timed section.
60 seconds for inserts still look abnormal. How many nodes and what Durability setting are you using?

Manjaro/Arch: Disabling IRQ #31 (Thinkpad X1 7th)

I've stumbled upon the beforementioned error (disabling IRQ #31) during boot and have tried to resolve it by trying to find out what caused the interrupt.
Running lspci -v | grep 31 gives no result and a cat /proc/interrupts | grep 31 returns:
31: 0 0 0 0 100000 0 0 0 IO-APIC 31-fasteoi tpm0
122: 0 0 0 0 0 0 0 0 PCI-MSI 3162112-edge pciehp
131: 0 0 0 0 0 0 4780 0 PCI-MSI 1572871-edge nvme0q7
136: 0 0 0 0 0 0 377 0 PCI-MSI 520192-edge enp0s31f6
160: 0 0 1 0 0 0 1260 0 PCI-MSI 333831-edge iwlwifi: queue 7
RES: 25967 11451 13108 4439 3441 4165 4926 4203 Rescheduling interrupts
How should I proceed knowing that it maybe has to do something with IO-APIC 31-fasteoi tpm0?
Many thanks.

I ran into a similar problem on my Lenovo L390 (running Fedora 31). In my case the issue reproduced by waking up the laptop when in suspension mode.
For me the issue was resolved (or worked around) by disabling the Trusted Platform Module in the BIOS.

mapping j function (used defined) over a list

I've written my own version of the exponential (^) function which works fine for simple scalars :
3 : '+/ (y&^%!) i.50'
It doesn't work over a list, so I thought of modifying it with "0
3 : '+/ (y"0&^%!) i.50'
This works over a list but gives the wrong answers.
Two questions arise:
1) Given my usage of "0 doesn't work, is there one that does ?
2) If I don't have access to a functional definition like this, what is the best way to apply it to the individual elements of an array ?

You need to apply the rank conjunction "0 over the function you want to map (y&^%!), instead of its argument y:
3 : '+/(y&^%!)"0 i.50'
However, the precision isn't as good as the native ^:
a =: 3 : '+/(y&^%!)"0 i.50' 4 4 $ 10+i.20
b =: ^ 4 4 $ 10+i.20
a = b
1 1 1 1
0 0 0 0
0 0 0 0
0 0 0 0

Database update weirdness

I am developing a game in Django in which the quantity of rebels updates after a turn change. Rebels are represented as integers from 0 to 100 in the (MySQL) database.
When I save, this happens:
print world.rebels
>>> 0
rebstab = 0
world.rebels += rebstab
world.save()
print world.rebels
>>> 0
However, when I use F() expressions (as I gather I should to prevent race conditions), this happens:
print world.rebels
>>> 0
rebstab = 0
world.rebels = F('rebels') + rebstab
world.save()
print world.rebels
>>> 100
What's going on?

I don't have any experience with using F() objects like this so this answer may not be useful, but having a look at the documentation it mentions:
In order to access the new value that has been saved in this way, the object will need to be reloaded:
reporter = Reporters.objects.get(pk=reporter.pk)
so you may have to actually query the DB again to see the updated value:
print world.rebels
>>> 0
rebstab = 0
world.rebels = F('rebels') + rebstab
world.save()
print World.objects.get(pk=world.pk)
>>> 100
Edit: Also look [at the example where you don't need to pull the objects into memory and can do everythign at the DB level:
World.objects.get(pk=...).update(rebels=F('rebels) + 1)

I just figured out the reason had nothing to do with the F() object itself but rather a conflict with a custom save method. See Django F() objects and custom saves weirdness for a more indepth explanation.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

YCSB cleanup phase - mysql

The cleanup operation is needed to shut down the thread(s) which was providing the workload to the db. So, if you set the parameter -thread n, you will see exactly n cleanup operations at the end of the benchmark!

Related

Understanding the output of a pyomo model, number of solutions is zero?

Couchbase benchmark reveals very slow INSERTs and GETs (using KeyValue operations); slower than persisted MySQL data

Manjaro/Arch: Disabling IRQ #31 (Thinkpad X1 7th)

mapping j function (used defined) over a list

Database update weirdness

Categories

Resources