While testing the performance of some of our save tasks in Django (bulk_create), I noticed in a cProfile run that, prior to saving, Django calls get_db_prep_save, which in turn calls value_to_db_decimal. Because we are updating so many rows at a time (we deal with financial data), the cumulative time for this step is over 20 seconds. You can see the number of calls in the ncalls column.
cProfile
ncalls tottime percall cumtime percall filename:lineno(function)
328166 0.344 0.000 21.733 0.000 __init__.py:891(get_db_prep_save)
328166 0.334 0.000 20.521 0.000 __init__.py:860(value_to_db_decimal)
309035 1.253 0.000 20.288 0.000 util.py:142(format_number)
I've attempted creating the Django objects with values already quantized to match our models.py DecimalFields (4 decimal places), using Decimal(x).quantize(Decimal('0.0001')), but no matter what I do, this specific function still ends up taking 20+ seconds. (I was secretly hoping that by passing Decimal values, Django wouldn't need to reformat them.) Is there a way I can get around this when I create the Django objects to be saved into MySQL?
One option seems to be to override get_db_prep_save, but I'm just wondering if there's something simpler I'm missing. Thanks!
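For what it's worth, here is a minimal sketch of that override (a guess at a shortcut, not an established Django recipe beyond the documented get_db_prep_save(value, connection) hook): subclass DecimalField and skip the formatting step when the value is already a quantized Decimal, on the assumption that the MySQL driver accepts Decimal objects directly.

from decimal import Decimal

from django.db import models


class PreQuantizedDecimalField(models.DecimalField):
    """DecimalField that trusts values which are already quantized Decimals.

    Assumption: every value handed to this field has already been
    quantized to `decimal_places`, so the expensive
    value_to_db_decimal()/format_number() path can be skipped.
    """

    def get_db_prep_save(self, value, connection):
        if isinstance(value, Decimal):
            # Hand the Decimal straight to the database backend.
            return value
        # Anything else (None, str, float) goes through the normal path.
        return super(PreQuantizedDecimalField, self).get_db_prep_save(
            value, connection)

Swap PreQuantizedDecimalField in for models.DecimalField on the hot columns in models.py and profile again to confirm that the get_db_prep_save time actually drops.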
I would love to be able to wait for a random amount of time (say a number between 5 and 12 seconds, chosen at random each time) before executing my next action in Puppeteer, in order to make the behaviour seem more like a real-world user's.
I'm aware of how to do it in plain JavaScript (as detailed in the Mozilla docs here), but can't seem to get it working in Puppeteer using the waitFor call (which I assume is what I'm supposed to use?).
Any help would be greatly appreciated! :)
You can use vanilla JS to wait a random 5-12 seconds between actions:
await page.waitFor((Math.floor(Math.random() * (12 - 5 + 1)) + 5) * 1000)
Where:
5 is the start number
12 is the end number
1000 converts seconds to milliseconds
(PS: However, if your question is about waiting 5-12 seconds randomly before every action, then you would need a wrapper class around your actions, which is a different issue; please update your question if that is what you mean.)
I have a multiprocessing loop which uses urllib and BeautifulSoup to scan webpages for data, then I run if statements. Each process takes about 3 seconds to run; 2.95 of those seconds are spent getting the HTML, and the remainder is spent running the ifs and cutting up the very small amount of data that I need.
The webpages consist of about 622 lines with something like 125,000 characters. I only need two or three lines and 200-300 characters. I am looking for a way to shorten the time this loop takes. Is there a function that will allow me to skip the first 500 lines of HTML? Does anyone have other recommendations? For now I am using tags and attributes to determine what info I need, but if I could just say 'I want to read only lines 500-700', wouldn't that be faster?
Given that the entire pool of processes takes nearly three minutes to run, any amount of time I can shave off will be helpful to me. Here's what I am using so far to pick apart the HTML:
import urllib.request
import bs4 as bs

source = urllib.request.urlopen(l[y]).read()
soup = bs.BeautifulSoup(source, 'lxml')
for row in soup.html.body.find_all('table', attrs={'class': 'table-1'}):
    for i, j in zip(row.find_all('a'),
                    row.find_all('td', attrs={'width': '130', 'align': 'right'})):
        ...  # run ifs on i and j
Thank you for reading.
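Since almost all of the ~3 seconds per page is the network fetch, the biggest savings are not in parsing; still, the parsing portion can be trimmed by telling BeautifulSoup to parse only the table you care about rather than the whole document. A rough sketch of that idea, not from the original post (it assumes the same 'table-1' class and the lxml parser):

import urllib.request

import bs4 as bs
from bs4 import SoupStrainer

# Only matching <table> elements get parsed; the rest of the page is skipped.
only_table = SoupStrainer('table', attrs={'class': 'table-1'})

def scrape(url):
    source = urllib.request.urlopen(url).read()
    soup = bs.BeautifulSoup(source, 'lxml', parse_only=only_table)
    results = []
    for row in soup.find_all('table', attrs={'class': 'table-1'}):
        for i, j in zip(row.find_all('a'),
                        row.find_all('td', attrs={'width': '130', 'align': 'right'})):
            results.append((i.get_text(strip=True), j.get_text(strip=True)))
    return results

Note that soup.html.body is not available when a SoupStrainer is used, which is why the sketch calls find_all on the soup directly.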
Is there a MySQL variable or monitoring facility that tells how many writes per second are being recorded?
Can I use some variable values and compute the same result?
Let's say I need to plot a graph of this dynamically. What should I be doing?
I'm looking for command-line options, not GUI-based monitoring tools.
I have a mixed TokuDB and InnoDB use case, so something that is not storage-engine specific would be better.
( Com_insert + Com_delete + Com_delete_multi +
Com_replace + Com_update + Com_update_multi ) / Uptime
gives you "writes/sec" since startup. This is from the point of view of the user issuing queries (such as INSERT).
Or did you want "rows written / sec"?
Or "disk writes / sec"?
The values for the above expression come from either SHOW GLOBAL STATUS or the equivalent place in information_schema.
If you want "writes in the last 10 minutes", then capture the counters 10 minutes ago and now. Then subtract to get the 'change' and finally divide.
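As an illustration of that delta arithmetic, here is a rough command-line sketch of mine (not part of the original answer); it assumes the pymysql driver and a MySQL user allowed to run SHOW GLOBAL STATUS:

import time

import pymysql

# Counters from the writes/sec expression above.
COUNTERS = ("Com_insert", "Com_delete", "Com_delete_multi",
            "Com_replace", "Com_update", "Com_update_multi")

def total_writes(cur):
    cur.execute("SHOW GLOBAL STATUS")
    return sum(int(value) for name, value in cur.fetchall() if name in COUNTERS)

INTERVAL = 10  # seconds; use 600 for a 10-minute window

conn = pymysql.connect(host="localhost", user="monitor", password="secret")
with conn.cursor() as cur:
    first = total_writes(cur)
    time.sleep(INTERVAL)
    second = total_writes(cur)
conn.close()

print("writes/sec over the last %d s: %.2f" % (INTERVAL, (second - first) / float(INTERVAL)))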
There are several GUIs that will do that arithmetic and much more. Consider MonYog ($$), MySQL Enterprise Monitor ($$$), cacti, etc.
I am trying to use weka to analyze some data. I've got a dataset with 3 variables and 1000+ instances.
The dataset references movie remakes and
how similar they are (0.0-1.0)
the difference in years between the movie and the remake
and lastly whether they were made by the same studio (yes or no)
I am trying to make a decision tree to analyze the data. Using J48 (because that's all I have ever used) I only get one leaf. I'm assuming I'm doing something wrong. Any help is appreciated.
Here is a snippet from the data set:
Similarity YearDifference STUDIO TYPE
0.5 36 No
0.5 9 No
0.85 18 No
0.4 10 No
0.5 15 No
0.7 6 No
0.8 11 No
0.8 0 Yes
...
If interested the data can be downloaded as a csv here http://s000.tinyupload.com/?file_id=77863432352576044943
Your data set is not balanced, because there are almost 5 times more "No" than "Yes" values for the class attribute. That's why the J48 tree ends up being just one leaf that classifies everything as "No". You can do one of these things:
Sample your data set so you have an equal number of No and Yes instances (see the sketch below this list).
Try a better-suited classification algorithm, e.g. Random Forest (it's located a few entries below J48 in the Weka Explorer GUI).
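For the first option, here is a quick sketch of undersampling the majority class with pandas before loading the CSV into Weka; the column name 'STUDIO TYPE' and the file names are assumptions based on the snippet in the question:

import pandas as pd

df = pd.read_csv("remakes.csv")

# Undersample the majority "No" class down to the size of the "Yes" class.
yes = df[df["STUDIO TYPE"] == "Yes"]
no = df[df["STUDIO TYPE"] == "No"].sample(n=len(yes), random_state=42)

# Shuffle the balanced set and write it out for Weka.
balanced = pd.concat([yes, no]).sample(frac=1, random_state=42)
balanced.to_csv("remakes_balanced.csv", index=False)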
I want to import CSV files with about 40 million lines into Neo4j. For this I am trying to use the "batch-importer" from https://github.com/jexp/batch-import.
Maybe it's a problem that I provide my own IDs. This is the example:
nodes.csv:
i:id          l:label
315041100     Person
201215100     Person
315041200     Person

rels.csv:
start         end           type            relart
315041100     201215100     HAS_RELATION    30006
315041200     315041100     HAS_RELATION    30006
the content of batch.properties:
use_memory_mapped_buffers=true
neostore.nodestore.db.mapped_memory=1000M
neostore.relationshipstore.db.mapped_memory=5000M
neostore.propertystore.db.mapped_memory=4G
neostore.propertystore.db.strings.mapped_memory=2000M
neostore.propertystore.db.arrays.mapped_memory=1000M
neostore.propertystore.db.index.keys.mapped_memory=1500M
neostore.propertystore.db.index.mapped_memory=1500M
batch_import.node_index.node_auto_index=exact
./import.sh graph.db nodes.csv rels.csv
runs without errors, but it takes about 60 seconds!
Importing 3 Nodes took 0 seconds
Importing 2 Relationships took 0 seconds
Total import time: 54 seconds
When I use smaller IDs - for example 3150411 instead of 315041100 - it takes just 1 second!
Importing 3 Nodes took 0 seconds
Importing 2 Relationships took 0 seconds
Total import time: 1 seconds
Actually I would like to use even bigger IDs with 10 digits. I don't know what I'm doing wrong. Can anyone see an error?
JDK 1.7
batchimporter 2.1.3 (with neo4j 2.1.3)
OS: ubuntu 14.04
Hardware: 8-Core-Intel-CPU, 16GB RAM
I think the problem is that the batch importer is interpreting those IDs as actual physical IDs on disk, and so the time is spent in the file system, inflating the store files up to the size where they can fit those high IDs.
The IDs that you're giving are intended to be "internal" to the batch import, right? Although I'm not sure how to tell the batch importer that this is the case.
#michael-hunger any input there?
The problem is that those IDs are internal to Neo4j, where they represent disk record IDs. If you provide high values there, Neo4j will create a lot of empty records until it reaches your IDs.
So either you create your node IDs starting from 0 and store your original ID as a normal node property (first example below),
or you don't provide node IDs at all and only look up nodes via their "business-id-value" (second example, which needs an index).
i:id id:long l:label
0 315041100 Person
1 201215100 Person
2 315041200 Person
start:id end:id type relart
0 1 HAS_RELATION 30006
2 0 HAS_RELATION 30006
or you have to configure and use an index:
id:long:people l:label
315041100 Person
201215100 Person
315041200 Person
id:long:people id:long:people type relart
315041100 201215100 HAS_RELATION 30006
315041200 315041100 HAS_RELATION 30006
HTH Michael
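For the first option, here is a small sketch of mine (not part of the original answer) showing how the renumbering could be scripted, assuming tab-separated nodes.csv and rels.csv laid out as in the question:

import csv

mapping = {}  # original ID -> new sequential node ID (kept in memory)

# Rewrite nodes.csv so node IDs start at 0 and the original ID becomes
# an ordinary id:long property.
with open("nodes.csv") as src, open("nodes_renumbered.csv", "w", newline="") as dst:
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t")
    next(reader)  # skip the original header
    writer.writerow(["i:id", "id:long", "l:label"])
    for new_id, (old_id, label) in enumerate(reader):
        mapping[old_id] = new_id
        writer.writerow([new_id, old_id, label])

# Rewrite rels.csv so start/end reference the new 0-based IDs.
with open("rels.csv") as src, open("rels_renumbered.csv", "w", newline="") as dst:
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t")
    next(reader)
    writer.writerow(["start:id", "end:id", "type", "relart"])
    for start, end, rel_type, relart in reader:
        writer.writerow([mapping[start], mapping[end], rel_type, relart])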
Alternatively, you can also just write a small Java or Groovy program to import your data if handling those IDs with the batch-importer is too tricky.
See: http://jexp.de/blog/2014/10/flexible-neo4j-batch-import-with-groovy/