I am trying to integrate ezdxf into a Flask web app, where I want to render a file and offer it as a download.
Is that possible without a database? (If not, how can I change the file directory of the saveas function to a web database?)
Thanks Jan
You can write the DXF document to a text stream with the write() method, so it can be captured in a string by using a StringIO object. StringIO.getvalue() returns a unicode string, which has to be encoded into a binary string with the correct encoding if your app needs binary encoded data.
Text encoding for DXF R2007 (AC1021) and later is always 'utf8'; for older DXF versions, the required encoding is stored in Drawing.encoding.
import io
import ezdxf

def to_binary_data(doc):
    stream = io.StringIO()
    doc.write(stream)
    dxf_data = stream.getvalue()
    stream.close()
    enc = 'utf-8' if doc.dxfversion >= 'AC1021' else doc.encoding
    return dxf_data.encode(enc)

doc = ezdxf.new()
binary_data = to_binary_data(doc)
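If the goal is only to offer the file as a browser download from Flask, a minimal sketch of a route built on to_binary_data() could look like this (hypothetical route and file names; it assumes Flask 2.x, where send_file takes download_name instead of the older attachment_filename, and no database or temporary file is needed):

import flask

app = flask.Flask(__name__)

@app.route('/download-dxf')
def download_dxf():
    doc = ezdxf.new()                     # build or load your drawing here
    return flask.send_file(
        io.BytesIO(to_binary_data(doc)),  # in-memory file, no disk or database needed
        mimetype='application/dxf',
        as_attachment=True,
        download_name='drawing.dxf',      # called attachment_filename in Flask < 2.0
    )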
Some more information and samples of your code would help. You can use the HTML A element to enable a user to download a file from their browser. You have to set the "href" property of the A element to the contents of the DXF file.
Here is an example of how to do this with ezdxf, based on Mozman's answer above:
# Export the file as string data so it can be transferred to the browser HTML A element href:
import io

# Create a StringIO object: an in-memory stream for text I/O
stream_obj = io.StringIO()
# Write the doc (i.e. the DXF file) to the stream object
doc.write(stream_obj)
# Get the stream object's value, which returns a string
dxf_text_string = stream_obj.getvalue()
# Close the stream object, as required by good practice
stream_obj.close()
file_data = "data:text/csv;charset=utf-8," + dxf_text_string
and then assign "file_data" to the href property. I use Dash - Plotly callbacks and can provide you with code on how to do it in that framework if you want.
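For example, in Dash the link could look roughly like this (a sketch with hypothetical names; newer Dash versions expose html directly, older ones use dash_html_components, and here the text is percent-encoded rather than concatenated raw as in file_data above):

import urllib.parse
from dash import html  # or: import dash_html_components as html

download_link = html.A(
    "Download DXF",
    # percent-encode the DXF text so characters such as '#' don't break the data URI
    href="data:text/plain;charset=utf-8," + urllib.parse.quote(dxf_text_string),
    download="drawing.dxf",
)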
Alternatively, you can use the flask.send_file function in a Flask route. This requires the data to be in binary format.
import io
import flask

# The following code is within a Flask route
# Create a BytesIO object: the binary file-like object that flask.send_file needs
mem = io.BytesIO()
# Create a StringIO object: an in-memory stream for text I/O
stream_obj = io.StringIO()
# Write the doc (i.e. the DXF file) to the stream object
doc.write(stream_obj)
# Get the StringIO value as a string, encode it to utf-8 and write it to the bytes object
mem.write(stream_obj.getvalue().encode('utf-8'))
mem.seek(0)
# Close the StringIO object
stream_obj.close()
return flask.send_file(
    mem,
    mimetype='text/csv',
    attachment_filename='drawing.dxf',
    as_attachment=True,
    cache_timeout=0
)
I can provide you with more information if you want, but you may need to share some of your code structure so we can see how you're encoding and passing the data around. Thanks JF
I have a PIL Image and I want to convert it to a string that is JSON serializable, and then I want to convert it back to the PIL Image as it was. I have literally read hundreds of questions and answers on Stack Overflow, but none of them do what I want.
Some answers say to use the tostring() method from PIL, which is actually deprecated and is now tobytes(); it returns a bytes object, which I can't directly put into JSON.
Then some of them used base64.b64encode(), which also returns a bytes object and is still not JSON compatible.
Mind that I don't wish to store the string directly in a file; I want to make it JSON compatible and then convert it back to a PIL Image as it was.
And yes, I don't want to save it with Image.save().
You can simply decode a b64encoded bytes object to a string, i.e.
>>> import base64
>>> image = b"1234567890"
>>> base64.b64encode(image)
b'MTIzNDU2Nzg5MA==' # bytes
>>> base64.b64encode(image).decode()
'MTIzNDU2Nzg5MA==' # string
As a side note you can also use b85encode to save space.
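For example, continuing the snippet above (the Base85 alphabet packs the same bytes into a slightly shorter string, and it round-trips the same way):

>>> s = base64.b85encode(image).decode()  # a str, slightly shorter than the Base64 one
>>> base64.b85decode(s) == image
True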
Following Selcuk's answer, I created a string and was able to convert it back to an image. I did:
from PIL import Image
import base64
image = Image.open("ptable.png")
bytes = image.tobytes()
mystr = base64.b64encode(bytes).decode()
# _dict = {"bytes": bytes}
newbytes = base64.b64decode(mystr)
image = Image.frombytes("RGBA", (image.size), newbytes, "raw")
image.show()
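For completeness, here is a minimal sketch (assuming the same "ptable.png" and Pillow, and that the image's raw mode round-trips) that packs the base64 string together with the mode and size into a JSON string and rebuilds the image from it:

import base64
import json
from PIL import Image

image = Image.open("ptable.png")

# pack everything needed to rebuild the image into a JSON-serializable dict
payload = {
    "mode": image.mode,
    "size": image.size,
    "data": base64.b64encode(image.tobytes()).decode(),
}
json_text = json.dumps(payload)

# later: rebuild the image from the JSON string
restored = json.loads(json_text)
new_image = Image.frombytes(
    restored["mode"], tuple(restored["size"]), base64.b64decode(restored["data"])
)
new_image.show()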
I have 2 directories: one with txt files and the other with corresponding JSON (metadata) files (around 90,000 of each). There is one JSON file for each txt file, and they share the same name (they don't share any other fields). I am trying to index all these files in Apache Solr.
The txt files just have plain text. I mapped each line to a field called 'sentence' and included the file name as a field using the Data Import Handler. No problems here.
The JSON files have metadata with 3 fields: a URL, author, and title (for the content in the corresponding txt file).
When I index the JSON files (I just used the _default schema and posted the fields to the schema, as explained in the official Solr tutorial), I don't know how to get the file name into the index as a field. As far as I know, there's no way to use the Data Import Handler for JSON files. I've read that I can pass a literal through the bin/post tool, but again, as far as I understand, I can't pass in the file name dynamically as a literal.
I NEED to get the file name; it is the only way I can associate the metadata with each sentence in the txt files in my downstream Python code.
So if anybody has a suggestion about how I should index the JSON file name along with the JSON content (or even some workaround), I'd be eternally grateful.
As @MatsLindh mentioned in the comments, I used Pysolr to do the indexing and get the filename into the index. It's pretty basic, but I thought I'd post what I did, as Pysolr doesn't have much documentation.
So, here's how you use Pysolr to index multiple JSON files, while also indexing the file name of the files. This method can be used if you have your files and your metadata files with the same filename (but different extensions), and you want to link them together somehow, like in my case.
Open a connection to your Solr instance using the pysolr.Solr command.
Loop through the directory containing your files, and get the filename of each file using os.path.basename and store it in a variable (after removing the extension, if necessary).
Read the file's JSON content into another variable.
Pysolr expects whatever is to be indexed to be stored in a list of dictionaries where each dictionary corresponds to one record.
Store all the fields you want to index in a dictionary (solr_content in my code below) while making sure the keys match the field names in your managed-schema file.
Append the dictionary created in each iteration to a list (list_for_solr in my code).
Outside the loop, use the solr.add command to send your list of dictionaries to be indexed in Solr.
That's all there is to it! Here's the code.
import json
import os
from glob import iglob

import pysolr

solr = pysolr.Solr('http://localhost:8983/solr/collection_name')
folderpath = 'directory-where-the-files-are-present'
list_for_solr = []
for filepath in iglob(os.path.join(folderpath, '*.meta')):
    with open(filepath, 'r') as file:
        filename = os.path.basename(filepath)
        # filename is xxxx.yyyy.meta
        filename_without_extension = '.'.join(filename.split('.')[:2])
        content = json.load(file)
    solr_content = {}
    solr_content['authors'] = content['authors']
    solr_content['title'] = content['title']
    solr_content['url'] = content['url']
    solr_content['filename'] = filename_without_extension
    list_for_solr.append(solr_content)
solr.add(list_for_solr)
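Once the documents are indexed, the metadata for a given txt file can be pulled back by the filename field in downstream code, for example (a sketch using the same solr connection; the filename value is a placeholder and the exact query syntax depends on how the filename field ends up defined in your schema):

# fetch the metadata record for one file by its (extension-stripped) name
results = solr.search('filename:"xxxx.yyyy"')
for record in results:
    print(record['title'], record['url'])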
I have downloaded an xxxx.json.lz4 file from https://censys.io/; however, when I try to read the file using the following line, I get no data out (a count of 0).
metadata_lz4 = spark.read.json("s3n://file.json.lz4")
It returns no results, although decompressing the file manually works fine and the decompressed data can be imported into Spark.
I have also tried
val metadata_lz4_2 = spark.sparkContext.newAPIHadoopFile("s3n://file.json.lz4", classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
This also returns no results.
I have multiple of these files, which are 100 GB each, so I am really keen on not having to decompress each one manually.
Any ideas?
According to this open issue, the Spark LZ4 decompressor uses a different specification than the standard LZ4 decompressor.
Hence, until this issue is resolved in apache-spark, you won't be able to use Spark's LZ4 codec to decompress standard LZ4-compressed files.
I don't think our Lz4Codec implementation actually uses the FRAME
specification (http://cyan4973.github.io/lz4/lz4_Frame_format.html)
when creating text based files. It seems it was added in as a codec
for use inside block compression formats such as
SequenceFiles/HFiles/etc., but wasn't oriented towards Text files from
the looks of it, or was introduced at a time when there was no FRAME
specification of LZ4.
Therefore, fundamentally, we are not interoperable with the lz4
utility. The difference is very similar to the GPLExtras' LzoCodec vs.
LzopCodec, the former is just the data compressing algorithm, but the
latter is an actual framed format interoperable with lzop CLI utility.
To make ourselves interoperable, we'll need to introduce a new frame
wrapping codec such as LZ4FrameCodec, and users could use that when
they want to decompress or compress text data produced/readable by
lz4/lz4cat CLI utilities.
I managed to parse LZ4-compressed files in PySpark in this way:
import lz4.frame
import pyspark.sql.functions as F
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local").getOrCreate()
sc = spark.sparkContext
list_paths = ['/my/file.json.lz4', '/my/beautiful/file.json.lz4']
rdd = sc.binaryFiles(','.join(list_paths))
df = rdd.map(lambda x: lz4.frame.decompress(x[1])).map(lambda x: str(x)).map(lambda x: (x, )).toDF()
This is usually enough for non-complex objects. But if the compressed JSON you are parsing has nested structures, then it is necessary to do extra cleaning on the parsed data before calling the function F.from_json():
schema = spark.read.json("/my/uncompressed_file.json").schema
df = df.select(
    F.regexp_replace(
        F.regexp_replace(
            F.regexp_replace(
                F.regexp_replace(
                    F.regexp_replace("_1", "None", "null"),
                    "False", "false"
                ),
                "True", "true"
            ),
            "b'", ""
        ),
        "'", ""
    ).alias("json_notation")
)
result_df = df.select(F.from_json("json_notation", schema))
where /my/uncompressed_file.json is the /my/file.json.lz4 that you have previously decompressed manually (alternatively, you can provide the schema by hand if it is not too complex; it will work either way).
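If you then want the parsed fields as top-level columns, one option (a sketch; the alias name is arbitrary) is to alias the struct returned by F.from_json and expand it:

# expand the parsed JSON struct into top-level columns
result_df = df.select(F.from_json("json_notation", schema).alias("parsed"))
flat_df = result_df.select("parsed.*")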
I want to load CSV files from a URL in PySpark; is it even possible to do so?
I keep the files on GitHub.
Thanks!
There is no native way in PySpark (see here).
However, if you have a function that takes a URL as input and returns the CSV contents:
def read_from_URL(url):
    # your logic here
    return data
You can use Spark to parallelize this operation:
URL_list = ['http://github.com/file/location/file1.csv', ...]
data = sc.parallelize(URL_list).map(read_from_URL)
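For concreteness, here is a minimal sketch of what read_from_URL could look like, using pandas (an assumption; any HTTP plus CSV parsing approach works). It assumes an existing SparkContext sc and SparkSession spark, that pandas is installed on the executors, and a placeholder GitHub URL in its raw.githubusercontent.com form:

import pandas as pd

def read_from_URL(url):
    # pandas can read a CSV straight from an HTTP(S) URL;
    # return plain row dicts so Spark can serialize them
    return pd.read_csv(url).to_dict(orient="records")

URL_list = ['https://raw.githubusercontent.com/user/repo/main/file1.csv']
rows = sc.parallelize(URL_list).flatMap(read_from_URL)
df = spark.createDataFrame(rows)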
I am using Gatling to stress test a RESTful API. I will be posting data that is JSON to a particular URI. I want to use a feed file that is a .tsv where each line is a particular JSON element. However, I get errors, and I just can't seem to find a pattern or system for adding quotation marks ("") to my .tsv JSON so the feeder will work. Attached are my code and .tsv file.
package philSim

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class eventAPISimulation extends Simulation {

  object Query {
    val feeder = tsv("inputJSON.tsv").circular
    val query = forever {
      feed(feeder)
        .exec(
          http("event")
            .post("my/URI/here")
            .body(StringBody("${json}")).asJSON
        )
    }
  }

  val httpConf = http.baseURL("my.url.here:portnumber")

  val scn = scenario("event").exec(Query.query)

  setUp(scn.inject(rampUsers(100) over (30 seconds)))
    .throttle(reachRps(2000) in (30 seconds), holdFor(3 minutes))
    .protocols(httpConf)
}
Here is an example of my unedited .tsv with JSON:
json
{"userId":"234342234","secondaryIdType":"mobileProfileId","secondaryIdValue":"66666638","eventType":"push","eventTime":"2015-01-23T23:20:50.123Z","platform":"iPhoneApp","notificationId":"123456","pushType":1,"action":"sent","eventData":{}}
{"userId":"234342234","secondaryIdType":"mobileProfileId","secondaryIdValue":"66666638","eventType":"INVALID","eventTime":"2015-01-23T23:25:20.342Z","platform":"iPhoneApp","notificationId":"123456","pushType":1,"action":"received","eventData":{"osVersion":"7.1.2","productVersion":"5.9.2"}}
{"userId":"234342234","secondaryIdType":"mobileProfileId","secondaryIdValue":"66666638","eventType":"push","eventTime":"2015-01-23T23:27:30.342Z","platform":"iPhoneApp","notificationId":"123456","pushType":1,"action":"followedLink","eventData":{"deepLinkUrl":"URL.IS.HERE","osVersion":"7.1.2","productVersion":"5.9.2"}}
{"userId":"234342234","secondaryIdType":"mobileProfileId","secondaryIdValue":"66666638","eventType":"push","eventTime":"2015-01-23T23:27:30.342Z","platform":"AndroidApp","notificationId":"123456","pushType":1,"action":"followedLink","eventData":{"deepLinkUrl":"URL.IS.HERE"}}
{"userId":"234342234","secondaryIdType":"mobileProfileId","secondaryIdValue":"66666638","eventType":"push","eventTime":"2015-01-23T23:25:20.342Z","platform":"iPhoneApp","notificationId":"123456","pushType":1,"action":"error","eventData":{}}
I have seen this blog post, which talks about manipulating quotation marks (") to get the author's JSON to work with .tsv, but the author doesn't explain the system behind it. I have tried various things and nothing I do really works. Some JSON will work when wrapped in quotation marks, similar to what the author of the post does, but this doesn't work for everything. What are the best practices for dealing with JSON and Gatling? Thanks for your help!
Straight from Gatling's documentation: use rawSplit so that Gatling's TSV parser will be able to handle your JSON entries:
tsv("inputJSON.tsv", rawSplit = true).circular