So far I have always created the structure and logic of my backend with Django, but when I inserted data into the database I always did that directly via an HTTP request to a PHP script.
As the project grows this gets pretty messy, and there were always complications with timestamps between the database and the backend.
I want to eliminate all these flaws, but could not find any good example of whether I could just make a request to a certain view in Django containing all the information I want to put into the database. Django and the database are on the same machine, but the data being inserted comes from different devices.
Maybe you could give me a hint on how to search further for this.
You can just create a Python script and run that.
Assuming you have a virtualenv, ensure you have it activated, and put this script in the root of your Django project.
#!/usr/bin/env python
import os
import django
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myapp.settings")
django.setup()
# Import after setup, to ensure they are initialized properly.
from myapp.models import MyModel, OtherModel
if __name__ == "__main__":
    # Create your objects.
    obj = MyModel.objects.create(some_value="foo")
    other_obj = OtherModel.objects.create(title="Bar", ref=obj)
You can also use a transaction to ensure it commits either everything or nothing.
from django.db import transaction
with transaction.atomic():
    obj = MyModel.objects.create(some_value="foo")
    other_obj = OtherModel.objects.create(title="Bar", ref=obj)
Should one of the creations fail, everything is rolled back. This prevents you from ending up with a half-filled or corrupt database.
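Since the question mentions that the data comes from other devices over HTTP, a plain Django view can do the same job as the script. Here is a minimal sketch; the URL wiring, the csrf_exempt shortcut and the payload field names are assumptions, not something from the question:
# views.py -- sketch of a view that accepts a JSON POST and creates the objects
import json

from django.db import transaction
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt

from myapp.models import MyModel, OtherModel

@csrf_exempt  # assumed here because the devices cannot send a CSRF token; protect the endpoint some other way
def ingest(request):
    payload = json.loads(request.body)
    with transaction.atomic():
        obj = MyModel.objects.create(some_value=payload["some_value"])
        OtherModel.objects.create(title=payload["title"], ref=obj)
    return JsonResponse({"status": "ok"})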
I have a Pyramid game server app that uses SQLAlchemy to read/write to a Postgres database. I want to read a certain table (call it games) from the database at the time this app is created. This games data will be used by one of my WSGI middlewares, which is hooked into the app, to send statsd metrics. To do this, I added a subscriber in the main function of my app like:
config.add_subscriber(init_mw_data, ApplicationCreated)
Now, I want to read the games table in the following function:
def init_mw_data(event):
    ...
    ...
Anybody know how I can read the games table inside the function init_mw_data?
It depends on how you configured your application.
The default SQLAlchemy template from pyramid-cookiecutter-starter has a dbsession_factory.
So, you can do something like this:
def init_mw_data(event):
    registry = event.app.registry
    dbsession = registry['dbsession_factory']()
    dbsession.query(...)
    ...
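In that template the dbsession_factory is put into the registry when the app is configured, so inside the subscriber you can open a session, run the query, and stash the result somewhere the middleware can reach. A minimal sketch, assuming a Game model mapped to the games table and that the middleware later reads the cached list off the registry (both of those names are assumptions):
from myapp.models import Game  # assumed model mapped to the "games" table

def init_mw_data(event):
    registry = event.app.registry
    dbsession = registry['dbsession_factory']()
    try:
        # Load the games once at ApplicationCreated time and cache them on the registry
        registry.games = dbsession.query(Game).all()
    finally:
        dbsession.close()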
I have a project that is connected to an external database, and I only have view access to it, so I created my models with the managed=False flag.
I was wondering how I can find out in Django whenever a change happens in that database. Is there a solution within Django, or should I find a method to communicate between that database and my Django app, like sockets, database triggers and so on?
More details:
Imagine my model is like this:
class Alert(models.Model):
    key = models.CharField(max_length=20)

    class Meta:
        managed = False
Now I want to be notified in Django each time the database is updated. Is there a signal that can capture database updates so I can do something in Django?
Let's say I need to create a lot of different documents/collections in Firestore. I need to add them quickly, like copy-and-pasting JSON. I can't do that with the standard Firebase console, because adding 100 documents will take me forever. Are there any solutions for bulk-creating mock data with a given structure in a Firestore DB?
If you switch to the Cloud Console (rather than Firebase Console) for your project, you can use Cloud Shell as a starting point.
From the Cloud Shell environment you'll find tools like node and python installed and available. Using whichever one you prefer, you can write a script using the server client libraries.
For example in Python:
from google.cloud import firestore
import random
MAX_DOCUMENTS = 100
SAMPLE_COLLECTION_ID = u'users'
SAMPLE_COLORS = [u'Blue', u'Red', u'Green', u'Yellow', u'White', u'Black']
# Project ID is determined by the GCLOUD_PROJECT environment variable
db = firestore.Client()
collection_ref = db.collection(SAMPLE_COLLECTION_ID)
for x in range(MAX_DOCUMENTS):
    collection_ref.add({
        u'primary': random.choice(SAMPLE_COLORS),
        u'secondary': random.choice(SAMPLE_COLORS),
        u'trim': random.choice(SAMPLE_COLORS),
        u'accent': random.choice(SAMPLE_COLORS)
    })
While this is the easiest way to get up and running with a static dataset, it leaves a little to be desired. Namely, with Firestore, live dynamic data is needed to exercise its functionality, such as real-time queries. For this task, using Cloud Scheduler & Cloud Functions is a relatively easy way to regularly update sample data.
In addition to the sample generation code, you'll specify the update frequency in Cloud Scheduler. For instance, */10 * * * * defines a frequency of every 10 minutes using the standard unix-cron format.
For non-static data, often a timestamp is useful. Firestore provides a way to have a timestamp from the database server added at write-time as one of the fields:
u'timestamp': firestore.SERVER_TIMESTAMP
It is worth noting that timestamps like this will hotspot in production systems if not sharded correctly. Typically 500 writes/second to the same collection is the maximum you will want so that the index doesn't hotspot. Sharding can be as simple as each user having their own collection (500 writes/second per user). However, for this example, writing 100 documents every minute via a scheduled Cloud Function is definitely not an issue.
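A minimal sketch of such a scheduled updater, assuming a Pub/Sub-triggered background Cloud Function that Cloud Scheduler invokes on the cron schedule above; the function name and the per-run document count are assumptions, while the collection, colour fields and server timestamp are reused from the snippets above:
import random

from google.cloud import firestore

SAMPLE_COLORS = ['Blue', 'Red', 'Green', 'Yellow', 'White', 'Black']

db = firestore.Client()

def add_sample_docs(event, context):
    """Background Cloud Function entry point, triggered via Pub/Sub by Cloud Scheduler."""
    collection_ref = db.collection('users')
    for _ in range(100):
        collection_ref.add({
            'primary': random.choice(SAMPLE_COLORS),
            'secondary': random.choice(SAMPLE_COLORS),
            'trim': random.choice(SAMPLE_COLORS),
            'accent': random.choice(SAMPLE_COLORS),
            # Server-side timestamp so each batch of writes is distinguishable
            'timestamp': firestore.SERVER_TIMESTAMP,
        })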
FireKit is a good resource to use for this purpose. It even allows sub-collections.
https://retroportalstudio.gumroad.com/l/firekit_free
I am trying to start a simple Django app. I have been on it for days. I was able to do this in Flask in a few hours.
I need advice on connecting to an external database to grab tables and display them on Django pages.
This is my code in Flask:
@app.route("/topgroups")
def topgroups():
    con = sql.connect("C:\\Users\\win10\\YandexDisk\\apps\\flask\\new_file.sqlite")
    con.row_factory = sql.Row
    cur = con.cursor()
    cur.execute("SELECT domain, whois, Traffic, Groups, LE, adddate FROM do_1 where Groups in (75,86,66,58,67,57,68,85,48,56,76,77,46,65,47,64,45,55,74,54,44,33,34,43)")
    rows = cur.fetchall()
    return render_template("index.html", rows=rows)
I will give you the Python answer, but read until the end, because you may be losing a lot of what Django offers if you follow this approach.
Python comes with SQLite capabilities, so you don't even need to install extra packages (see the Python docs):
Connect
import sqlite3
conn = sqlite3.connect('C:\\Users\\win10\\YandexDisk\\apps\\flask\\new_file.sqlite')
Want to ensure read-only access? From the docs:
conn = sqlite3.connect('file:C:\\Users\\win10\\YandexDisk\\apps\\flask\\new_file.sqlite?mode=ro', uri=True)
Use
cur = conn.cursor()
... (just like in Flask)
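Put together, a rough Django equivalent of the Flask route could look like this. This is just a sketch: the query, file path and template name come from the question, while the view name and URL wiring are assumed:
# views.py -- raw sqlite3 version of the Flask route
import sqlite3

from django.shortcuts import render

def topgroups(request):
    # Read-only connection to the external SQLite file from the question
    conn = sqlite3.connect(
        'file:C:\\Users\\win10\\YandexDisk\\apps\\flask\\new_file.sqlite?mode=ro',
        uri=True,
    )
    conn.row_factory = sqlite3.Row
    try:
        cur = conn.cursor()
        cur.execute(
            "SELECT domain, whois, Traffic, Groups, LE, adddate FROM do_1 "
            "WHERE Groups IN (75,86,66,58,67,57,68,85,48,56,76,77,46,65,47,64,45,55,74,54,44,33,34,43)"
        )
        rows = cur.fetchall()
    finally:
        conn.close()
    return render(request, "index.html", {"rows": rows})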
Note/ My recommendation
One of the biggest advantages of Django is:
Define your data models entirely in Python. You get a rich, dynamic database-access API for free — but you can still write SQL if needed.
And you'll lose a lot without it, from basic stuff like what you asked about to unit-testing capabilities.
Follow this tutorial to integrate your database: Integrating Django with a legacy database.
You may set managed = False and Django won't touch those tables; it will only create new ones to support the app.
If you just use that DB for some special purpose, then have a look at Django's multiple databases support.
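As a sketch of what the ORM route from those two links could look like for this table — the "legacy" alias, the field types and the primary-key choice are assumptions; running inspectdb would generate the real model for you:
# settings.py -- register the external SQLite file as a second database
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": BASE_DIR / "db.sqlite3",
    },
    "legacy": {  # assumed alias for the external file
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": r"C:\Users\win10\YandexDisk\apps\flask\new_file.sqlite",
    },
}

# models.py -- roughly what `python manage.py inspectdb --database legacy` would produce
from django.db import models

class Do1(models.Model):
    domain = models.TextField(primary_key=True)  # assumed primary key; adjust to the real table
    whois = models.TextField()
    traffic = models.IntegerField(db_column="Traffic")
    groups = models.IntegerField(db_column="Groups")
    le = models.TextField(db_column="LE")
    adddate = models.TextField()

    class Meta:
        managed = False
        db_table = "do_1"

# views.py -- the ORM equivalent of the Flask route
from django.shortcuts import render
from .models import Do1

GROUP_IDS = [75, 86, 66, 58, 67, 57, 68, 85, 48, 56, 76, 77,
             46, 65, 47, 64, 45, 55, 74, 54, 44, 33, 34, 43]

def topgroups(request):
    rows = Do1.objects.using("legacy").filter(groups__in=GROUP_IDS)
    return render(request, "index.html", {"rows": rows})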
My current task at hand is to figure out the best approach to load millions of documents into Solr.
The data file is an export from the DB in CSV format.
Currently, I am thinking about splitting the file into smaller files and having a script that posts these smaller ones using curl.
I have noticed that if you post a large amount of data, most of the time the request times out.
I am looking into the Data Import Handler and it seems like a good option.
Any other ideas are highly appreciated.
Thanks
Unless a database is already part of your solution, I wouldn't add that additional complexity. Quoting the SOLR FAQ: it's your servlet container that is issuing the session time-out.
As I see it, you have a couple of options (In my order of preference):
Increase container timeout
Increase the container timeout. ("maxIdleTime" parameter, if you're using the embedded Jetty instance).
I'm assuming you only occasionally index such large files? Increasing the time-out temporarily might just be the simplest option.
Split the file
Here's a simple Unix script that will do the job (splitting the file into 500,000-line chunks):
split -d -l 500000 data.csv split_files.
for file in `ls split_files.*`
do
    curl 'http://localhost:8983/solr/update/csv?fieldnames=id,name,category&commit=true' -H 'Content-type:text/plain; charset=utf-8' --data-binary @$file
done
Parse the file and load in chunks
The following groovy script uses opencsv and solrj to parse the CSV file and commit changes to Solr every 500,000 lines.
import au.com.bytecode.opencsv.CSVReader
import org.apache.solr.client.solrj.SolrServer
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer
import org.apache.solr.common.SolrInputDocument
@Grapes([
    @Grab(group='net.sf.opencsv', module='opencsv', version='2.3'),
    @Grab(group='org.apache.solr', module='solr-solrj', version='3.5.0'),
    @Grab(group='ch.qos.logback', module='logback-classic', version='1.0.0'),
])
SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr/");
new File("data.csv").withReader { reader ->
    CSVReader csv = new CSVReader(reader)
    String[] result
    Integer count = 1
    Integer chunkSize = 500000
    while (result = csv.readNext()) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", result[0])
        doc.addField("name_s", result[1])
        doc.addField("category_s", result[2])
        server.add(doc)
        if (count.mod(chunkSize) == 0) {
            server.commit()
        }
        count++
    }
    server.commit()
}
In SOLR 4.0 (currently in BETA), CSV's from a local directory can be imported directly using the UpdateHandler. Modifying the example from the SOLR Wiki
curl "http://localhost:8983/solr/update?stream.file=exampledocs/books.csv&stream.contentType=text/csv;charset=utf-8"
This streams the file from the local location, so there is no need to chunk it up and POST it via HTTP.
The answers above have explained the single-machine ingestion strategies really well.
Here are a few more options if you have big-data infrastructure in place and want to implement a distributed data ingestion pipeline.
Use Sqoop to bring the data into Hadoop, or place your CSV file in Hadoop manually.
Use one of the connectors below to ingest data:
hive-solr connector, spark-solr connector.
PS:
Make sure no firewall blocks connectivity between client nodes and solr/solrcloud nodes.
Choose the right directory factory for data ingestion; if near-real-time search is not required, then use StandardDirectoryFactory.
If you get the exception below in client logs during ingestion, then tune the autoCommit and autoSoftCommit configuration in the solrconfig.xml file.
SolrServerException: No live SolrServers available to handle this request
Definitely just load these into a normal database first. There are all sorts of tools for dealing with CSVs (for example, Postgres' COPY), so it should be easy. Using the Data Import Handler is also pretty simple, so this seems like the most friction-free way to load your data. This method will also be faster, since you won't have unnecessary network/HTTP overhead.
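For example, here is a small sketch of the "load it into Postgres first" step using psycopg2's COPY support, after which the Data Import Handler can be pointed at that table. The connection string, table and column names are assumptions:
import psycopg2

conn = psycopg2.connect("dbname=staging user=loader")
# COPY streams the whole CSV into the table in one round trip
with conn, conn.cursor() as cur, open("data.csv") as f:
    cur.copy_expert("COPY docs (id, name, category) FROM STDIN WITH CSV", f)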
The reference guide says ConcurrentUpdateSolrServer could/should be used for bulk updates.
Javadocs are somewhat incorrect (v 3.6.2, v 4.7.0):
ConcurrentUpdateSolrServer buffers all added documents and writes them into open HTTP connections.
It doesn't buffer indefinitely, but up to int queueSize, which is a constructor parameter.