How to import/export RavenDB data from a file? - json

I have an application that uses embedded RavenDB. I would like to be able to import/export a specific set of documents (a document with all its nested/referenced documents) to a file.
My ideal function would work like:
var session = store.OpenSession();
MyDocument d1 = session.Load<MyDocument>(someId);
ImportExport.Export(store, d1, "file.xyz");
and then with a different IDocumentStore:
ImportExport.Import(store, "file.xyz");
var session = store.OpenSession();
MyDocument d2 = session.Load<MyDocument>(someId);
And of course d1 should equal d2 in every way.
AFAIK the Smuggler utility exports all documents at once.
My only other idea was to use Json.NET to serialize the MyDocument object, save it to a file, and then deserialize it (and store it). I have a feeling this is the way to go, but will it work when MyDocument has many other documents nested inside?

I ended up using the "Raven.Smuggler.exe" program to get things done with "large" ravendump files. It's unclear to me, however, whether this import process [drops and replaces from scratch] or [merges] the data, so I would perform a database drop and re-create before doing the import process below, to guarantee data integrity.
Download a copy of a matching RavenDB build.
Extract it someplace simple (ex: C:\RavenDB-Build-2956)
Invoke the Smuggler.
Command (setup/replace variable placeholders as needed)
C:\RavenDB-Build-2956\Smuggler\Raven.Smuggler.exe in $instance $dump "-u=$user" "-p=$plainpassword" -d=$dbname
Example variables
$instance = http://localhost/ravendb_iis_web_app/ or maybe http://localhost:8080/
$dump = C:\dump.ravendump
$user = User
$plainpassword = Password
$dbname = MyDatabase

You have the Smuggler API to handle this. See:
Export:
https://github.com/ravendb/ravendb/blob/d54bfba11995e915cf94f35ef3887fcb7d747033/Raven.Database/Smuggler/SmugglerDatabaseApi.cs#L163
Import:
https://github.com/ravendb/ravendb/blob/d54bfba11995e915cf94f35ef3887fcb7d747033/Raven.Database/Smuggler/SmugglerDatabaseApi.cs#L90

Related

JMeter reads the old version of the CSV file

I am trying to test some APIs, but first I need to get session keys for each customer I use in the test. I have a CSV file with customer login information, which is read by each thread.
I have the following structure in my JMeter test plan.
CSV Data Set Config - User Login Info
    Set username, password - one for each iteration
Setup Thread Group
    BeanShell Sampler to delete the sessionKeys.csv
        File file = new File("C:/user/sessionKeys.csv");
        if (file.exists() && file.isFile()) {
            file.delete();
        }
Login Thread - ThreadCount = threadCount, Loop = 1
    Login Request
        BeanShell PostProcessor to create file and append Session Keys to sessionKeys.csv
            if ("${sessionKey}" != "not_found")
            {
                File file = new File("C:/user/sessionKeys.csv");
                FileWriter fWriter = new FileWriter(file, true);
                BufferedWriter buff = new BufferedWriter(fWriter);
                buff.write("${sessionKey}\n");
                buff.close();
                fWriter.close();
            }
CSV Data Set Config - Session Keys
API Call Thread - ThreadCount = threadCount, Loop = loop
    GetData Request
And I noticed that even though the file is actually deleted, recreated and filled with new session keys, the first few requests still use the old session keys that were in the file before it was deleted.
I have tried adding a Constant Timer and changing the structure of the JMeter test plan, but nothing worked.
Take a look at JMeter Test Elements Execution Order
Configuration elements
Pre-Processors
Timers
Sampler
Post-Processors (unless SampleResult is null)
Assertions (unless SampleResult is null)
Listeners (unless SampleResult is null)
CSV Data Set Config is a Configuration Element, hence it's executed long before the Beanshell Sampler, and this perfectly explains the behaviour you're facing.
So if you need to do some pre-processing of the CSV file, you will need to do it in a setUp Thread Group.
Also be aware that since JMeter 3.1 you're supposed to be using JSR223 Test Elements and the Groovy language for scripting, so it makes sense to consider migrating.
It looks like the CSV Data Set Config element in your test plan exists outside of the thread group, so it is read first, before the file is deleted and recreated.
In your case it might be simpler to just store the session key as a JMeter property so that it can be accessed in all thread groups. You can store it using Groovy like props.put("${sessionKey}", sessionKey) or via JMeter functions like ${__setProperty("sessionKey",${sessionKey})}.
The property can then be accessed again using the property function like ${__P(sessionKey,)}.

PsychoPy: how to avoid storing variables in the CSV file?

When I run my PsychoPy experiment, PsychoPy saves a CSV file that contains my trials and the values of my variables.
Among these, there are some variables I would like NOT to be included. There are some variables which I decided to include in the CSV, but many others ended up in it automatically.
Is there a way to manually force (from the code block) the exclusion of some variables from the CSV?
Is there a way to decide the order of the saved columns/variables in the CSV?
It is not really important, and I know I could just create my own output file instead of using PsychoPy's, or easily clean it up afterwards, but I was just curious.
PsychoPy spits out all the variables it thinks you could need. If you want to drop some of them, that is a task for the analysis stage, and is easily done in any processing pipeline. Unless you are analysing data in a spreadsheet (which you really shouldn't), the number of columns in the output file shouldn't really be an issue. The philosophy is that you shouldn't back yourself into a corner by discarding data at the recording stage - what about the reviewer who asks about the influence of a variable that you didn't think was important?
If you are using the Builder interface, the saving of onset & offset times for each component is optional, and is controlled in the "data" tab of each component dialog.
The order of variables is also not under direct control of the user, but again, can be easily manipulated at the analysis stage.
As you note, you can of course write code to save custom output files of your own design.
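For example, dropping or reordering columns at the analysis stage takes only a few lines with pandas. A minimal sketch (the file and column names are hypothetical, just to illustrate the idea):
import pandas as pd
# Load the CSV that PsychoPy saved (hypothetical filename)
df = pd.read_csv('my_experiment.csv')
# Drop columns you are not interested in
df = df.drop(columns=['frameRate', 'expName'], errors='ignore')
# Keep/reorder only the columns you care about (anything not listed is discarded)
wanted = ['participant', 'trials.thisN', 'response.keys', 'response.rt']
df = df[[c for c in wanted if c in df.columns]]
df.to_csv('my_experiment_clean.csv', index=False)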
There is a special block called session_variable_order: [var1, var2, var3] in the experiment_config.yaml file, which you probably should be using; also, consider these methods:
from psychopy import data
data.ExperimentHandler.saveAsWideText(fileName = 'exp_handler.csv', delim='\t', sortColumns = False, encoding = 'utf-8')
data.TrialHandler.saveAsText(fileName = 'trial_handler.txt', delim=',', encoding = 'utf-8', dataOut = ('n', 'all_mean', 'all_raw'), summarised = False)
Notice the sortColumns and dataOut params.

How to add w:altChunk and its relationship with python-docx

I have a use case that makes use of the <w:altChunk/> element in a Word document: I inject (fragments of) HTML files as alternate chunks and let Word do its work when the file gets opened. The current implementation uses XML/XSL to compose the WordML XML, modify the relationships, and do all the packaging manually, which is a real pain.
I wanted to move to python-docx, but the API doesn't support this directly. I have found a way to add the <w:altChunk/> element to the document XML, but I'm still struggling to find a way to add the relationship and the related file to the package.
I think I should make a compatible part and pass it to the document.part.relate_to function to do its job, but I still can't figure out how:
from docx import Document
from docx.oxml import OxmlElement, qn
from docx.opc.constants import RELATIONSHIP_TYPE as RT
def add_alt_chunk(doc: Document, chunk_part):
    ''' TODO: figuring how to add files and relationships'''
    r_id = doc.part.relate_to(chunk_part, RT.A_F_CHUNK)
    alt = OxmlElement('w:altChunk')
    alt.set(qn('r:id'), r_id)
    doc.element.body.sectPr.addprevious(alt)
Update:
As per scanny's advice, below is my working code. Thank you very much Steve!
from docx import Document
from docx.oxml import OxmlElement
from docx.oxml.ns import qn
from docx.opc.part import Part
from docx.opc.constants import RELATIONSHIP_TYPE as RT
def add_alt_chunk(doc: Document, html: str):
    package = doc.part.package
    partname = package.next_partname('/word/altChunk%d.html')
    alt_part = Part(partname, 'text/html', html.encode(), package)
    r_id = doc.part.relate_to(alt_part, RT.A_F_CHUNK)
    alt_chunk = OxmlElement('w:altChunk')
    alt_chunk.set(qn('r:id'), r_id)
    doc.element.body.sectPr.addprevious(alt_chunk)
doc = Document()
doc.add_paragraph('Hello')
add_alt_chunk(doc, "<body><strong>I'm an altChunk</strong></body>")
doc.add_paragraph('Have a nice day!')
doc.save('test.docx')
Note: the altChunk parts only work/appear when the document is opened using MS Word.
Well, some hints here anyway. Maybe you can post your working code at the end as a full "answer":
The alt-chunk part needs to start its life as a docx.opc.part.Part object.
The blob argument should be the bytes of the file, which is often but not always plain text. It must be bytes though, not unicode (characters), so any encoding has to happen before calling Part().
I expect you can work out the other arguments:
package is the overall OPC package, available on document.part.package.
You can use docx.opc.package.OpcPackage.next_partname() to get an available partname based on a root template like: "altChunk%s" for a name like "altChunk3". Check what partname prefix Word uses for these, possibly with unzip -l has-an-alt-chunk.docx; should be easy to spot.
The content-type is one in docx.opc.constants.CONTENT_TYPE. Check the [Content_Types].xml part in a .docx file that has an altChunk to see what they use.
Once formed, the document_part.relate_to() method will create the proper relationship. If there is more than one relationship (not common) then you need to create each one separately. There would only be one relationship from a particular part, just some parts are related to more than one other part. Check the relationships in an existing .docx to see, but pretty good guess it's only the one in this case.
So your code would look something like:
package = document.part.package
partname = package.next_partname("altChunkySomethingPrefix")
content_type = docx.opc.constants.CONTENT_TYPE.THE_RIGHT_MIME_TYPE
blob = make_the_altChunk_file_bytes()
alt_chunk_part = Part(partname, content_type, blob, package)
rId = document.part.relate_to(alt_chunk_part, RT.A_F_CHUNK)
etc.
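For the inspection steps suggested above (checking the altChunk part names and the [Content_Types].xml entries in an existing .docx), a small sketch with the standard-library zipfile module works as well; the filename here is just a placeholder:
import zipfile
# A .docx file is an ordinary zip archive, so we can list its parts directly.
with zipfile.ZipFile('has-an-alt-chunk.docx') as docx_zip:
    print(docx_zip.namelist())  # look for the word/altChunk... part names
    print(docx_zip.read('[Content_Types].xml').decode('utf-8'))  # look for the altChunk content type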

How to upload multiple JSON files into CouchDB

I am new to CouchDB. I need to get 60 or more JSON files in a minute from a server.
I have to upload these JSON files to CouchDB individually as soon as I receive them.
I installed CouchDB on my Linux machine.
I hope someone can help me with this requirement.
If possible, can someone help me with pseudo code?
My Idea:
Write a Python script that uploads all the JSON files to CouchDB. Each JSON file must become its own document, and the data present in the JSON must be inserted into CouchDB as-is (the same format and values as in the file).
Note:
These JSON files are transactional; one file is generated every second, so I need to read each file, upload it in the same format into CouchDB, and on successful upload archive the file into a different folder on the local system.
Python program to parse the JSON files and insert them into CouchDB:
import glob
import errno
import json
from pprint import pprint
import couchdb
# couch = couchdb.Server()  # Assuming localhost:5984
# couch.resource.credentials = (USERNAME, PASSWORD)
# If your CouchDB server is running elsewhere, set it up like this:
couch = couchdb.Server('http://localhost:5984/')
db = couch['mydb']
path = 'C:/Users/Desktop/CouchDB_Python/Json_files/*.json'
for name in glob.glob(path):  # 'file' is a builtin type, 'name' is a less-ambiguous variable name.
    try:
        with open(name) as f:  # No need to specify 'r': this is the default.
            data = json.load(f)
        db.save(data)  # One new CouchDB document per JSON file.
        pprint(data)
    except IOError as exc:
        if exc.errno != errno.EISDIR:  # Do not fail if a directory is found, just ignore it.
            raise  # Propagate other kinds of IOError.
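To cover the "archive the file after a successful upload" part of the requirement, one option is to move each file out of the watch folder right after db.save() succeeds. A sketch (the archive folder path is an assumption):
import os
import shutil
ARCHIVE_DIR = 'C:/Users/Desktop/CouchDB_Python/Archived_files'  # assumed destination folder
def archive(filepath):
    # Move a successfully uploaded JSON file into the archive folder.
    if not os.path.isdir(ARCHIVE_DIR):
        os.makedirs(ARCHIVE_DIR)
    shutil.move(filepath, os.path.join(ARCHIVE_DIR, os.path.basename(filepath)))
# e.g. inside the loop above, right after db.save(data):
# archive(name)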
I would use the CouchDB bulk API, even though you have specified that you need to send them to the db one by one. For example, implementing a simple queue that gets sent out every, say, 5-10 seconds via a bulk doc call will greatly increase the performance of your application.
There is obviously a quirk in that, namely that you need to know the IDs of the docs that you want to get from the DB. But for the PUTs it is perfect. (That is not entirely true either: you can get ranges of docs using a bulk operation if the IDs you are using for your docs can be sorted nicely.)
From my experience working with CouchDB, I have a hunch that you are dealing with transactional documents in order to compile them into some sort of sum result and act on that data accordingly (maybe creating the next transactional doc in the series). For that you can rely on CouchDB by using 'reduce' functions on the views you create. It takes a little practice to get a reduce function working properly, and it is highly dependent on what you actually want to achieve and what data you are prepared to emit from the view, so I can't really provide you with more detail on that.
So in the end the app logic would go something like this:
    get _design/someDesign/_view/yourReducedView
    calculate new transaction
    add transaction to queue
    onTimeout
        send all in transaction queue
If I got that first part wrong (why you are using transactional docs), all that would really change is the part of my app logic where you get those transactional docs.
Also, before writing your own 'reduce' function, have a look at the built-in ones (they are a lot faster than anything outside of the db engine can do).
http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API
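As a rough illustration of the queue-plus-bulk idea above, here is a minimal sketch using the same couchdb module as in the question (the database name and document contents are assumptions):
import couchdb
couch = couchdb.Server('http://localhost:5984/')
db = couch['mydb']  # assumed database name
queue = []  # documents waiting to be written
def enqueue(doc):
    # Collect documents instead of saving them one by one.
    queue.append(doc)
def flush():
    # Write the whole batch in one round trip (_bulk_docs under the hood).
    if queue:
        results = db.update(queue)  # couchdb-python's bulk call
        del queue[:]
        return results
    return []
# Example: queue a few docs, then call flush() every 5-10 seconds from your main loop or a timer.
enqueue({'type': 'transaction', 'amount': 10})
enqueue({'type': 'transaction', 'amount': 25})
flush()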
EDIT:
Since you are just starting, I strongly recommend having a look at the CouchDB Definitive Guide.
NOTE FOR LATER:
Here is one hidden stone (well, maybe not so much a hidden stone as something that is not obvious for a newcomer to look out for). When you write a reduce function, make sure it does not produce too much output for a query without boundaries. This will slow the entire view down dramatically, even when you provide reduce=false when getting stuff from it.
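For completeness, a rough sketch of what such a view with a built-in reduce could look like, again with the couchdb module (the design doc name, the field names, and the idea of summing an amount field are assumptions on my part):
import couchdb
couch = couchdb.Server('http://localhost:5984/')
db = couch['mydb']
# A view that emits one row per transaction doc and sums the amounts
# with the built-in _sum reduce (much faster than a hand-written JS reduce).
db.save({
    '_id': '_design/someDesign',
    'views': {
        'yourReducedView': {
            'map': "function(doc) { if (doc.type == 'transaction') { emit(doc.type, doc.amount); } }",
            'reduce': '_sum',
        }
    }
})
# Reduced result (the running total)...
for row in db.view('someDesign/yourReducedView', reduce=True):
    print(row.key, row.value)
# ...or the raw rows, if you really need them.
for row in db.view('someDesign/yourReducedView', reduce=False):
    print(row.id, row.value)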
So you need to get JSON documents from a server and send them to CouchDB as you receive them. A Python script would work fine. Here is some pseudo-code:
loop (until no more docs)
    get new JSON doc from server
    send JSON doc to CouchDB
end loop
In Python, you could use requests to send the documents to CouchDB and probably to get the documents from the server as well (if it is using an HTTP API).
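A minimal sketch of that loop with requests (the CouchDB URL, database name, and the get_new_doc() helper are placeholders for however you actually receive the documents):
import requests
COUCH_DB_URL = 'http://localhost:5984/mydb'  # assumed database URL
def get_new_doc():
    # Placeholder: return the next JSON document from your server, or None when done.
    return None
while True:
    doc = get_new_doc()
    if doc is None:
        break
    # POSTing to the database lets CouchDB assign the _id; a 201 response means success.
    resp = requests.post(COUCH_DB_URL, json=doc)
    resp.raise_for_status()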
You might want to check out the pycouchdb module for Python 3. I've used it myself to upload lots of JSON objects into a CouchDB instance. My project does pretty much the same as you describe, so you can take a look at my project Pyro on GitHub for details.
My class looks like that:
import sys
import pycouchdb
class MyCouch:
    """ COMMUNICATES WITH COUCHDB SERVER """
    def __init__(self, server, port, user, password, database):
        # ESTABLISHING CONNECTION
        self.server = pycouchdb.Server("http://" + user + ":" + password + "@" + server + ":" + port + "/")
        self.db = self.server.database(database)
    def check_doc_rev(self, doc_id):
        # CHECKS REVISION OF SUPPLIED DOCUMENT
        try:
            rev = self.db.get(doc_id)
            return rev["_rev"]
        except Exception as inst:
            return -1
    def update(self, all_computers):
        # UPDATES DATABASE WITH JSON STRING
        try:
            result = self.db.save_bulk(all_computers, transaction=False)
            sys.stdout.write(" Updating database")
            sys.stdout.flush()
            return result
        except Exception as ex:
            sys.stdout.write("Updating database")
            sys.stdout.write("Exception: ")
            print(ex)
            sys.stdout.flush()
            return None
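Usage would then look roughly like this (host, port, credentials and database name are of course placeholders):
docs = [{'name': 'doc1'}, {'name': 'doc2'}]  # the JSON objects you received
couch = MyCouch('localhost', '5984', 'admin', 'secret', 'mydb')
couch.update(docs)  # bulk upload via save_bulk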
Let me know in case of any questions - I will be more than glad to help if you will find some of my code usable.

How can I access the information associated with an object from a Mercurial plugin?

I am trying to write a small Mercurial extension which, given the path to an object stored within the repository, will tell you the revision it's at. So far, I'm working from the code in the WritingExtensions article, and I have something like this:
cmdtable = {
    # cmd name    function call
    "whichrev": (whichrev, [], "hg whichrev FILE")
}
and the whichrev function has almost no code:
def whichrev(ui, repo, node, **opts):
    # node will be the file chosen at the command line
    pass
So, for example:
hg whichrev text_file.txt
will call the whichrev function with node set to text_file.txt. With the use of the debugger, I found that I can access a filelog object by using this:
repo.file("text_file.txt")
But I don't know what I should access in order to get to the sha1 of the file. I have a feeling I may not be working with the right function.
Given a path to a tracked file (the file may or may not appear as modified under hg status), how can I get its sha1 from my extension?
A filelog object is pretty low level, you probably want a filectx:
A filecontext object makes access to data related to a particular filerevision convenient.
You can get one through a changectx:
ctx = repo['.']
fooctx = ctx['foo']
print fooctx.filenode()
Or directly through the repo:
fooctx = repo.filectx('foo', '.')
Pass None instead of . to get the working copy ones.
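Putting that together with the question's command table, the whichrev command might look roughly like this. This is an untested sketch against the pre-5.x, Python 2 era Mercurial API used above, and which details to print is my own choice:
from mercurial.node import hex
def whichrev(ui, repo, path, **opts):
    # File context for `path` at the working directory's parent revision.
    fctx = repo['.'][path]
    # The file's own sha1 (node id) and the repository revision that introduced this file revision.
    ui.write("file node: %s\n" % hex(fctx.filenode()))
    ui.write("introduced in revision %d\n" % fctx.linkrev())
cmdtable = {
    # cmd name    function call
    "whichrev": (whichrev, [], "hg whichrev FILE")
}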