How can I pass a .txt file as a function parameter?

Say I have a function that reads a .txt file and creates arrays based on the columns of the data within that file. What I have right now inside the function looks like:
data = open("some_file_name.txt","r")
But if I want to change the .txt file that the function reads I have to manually go into the code and type in the new file name before running it again. Instead, how can I pass any file name to the function so it looks like:
my_function("/filepath/some_file_name.txt"):
    data = open("specified_file_name.txt", "r")

I think you want
def my_function(filepath):
    data = open(filepath, "r")
    ...
and then
my_function("/filepath/some_file_name.txt")
or better:
def my_function(data):
    ...
and then
with open("/filepath/some_file_name.txt", "rb") as data:
    my_function(data)
The latter version lets you pass in any file-like object to my_function().
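For example, a quick sketch where io.StringIO stands in for a real file (use io.BytesIO if the function expects binary mode):
import io

fake_file = io.StringIO("col1 col2\n1 2\n3 4\n")
my_function(fake_file)  # any object with a read() method works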
Update: if you want to get fancy and allow file names or file handles:
def my_func(data):
    # on Python 2 use basestring here instead of str
    if isinstance(data, str):
        with open(data, 'rb') as f:
            return my_func(f)
    ...
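Illustrative calls for this version (a sketch; the path is just a placeholder):
my_func("/filepath/some_file_name.txt")      # pass a file name

with open("/filepath/some_file_name.txt", "rb") as f:
    my_func(f)                               # or pass a file handle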

Related

Django FileField saving empty file to database

I have a view that should generate a temporary JSON file and save this temp file to the database. The content of this file, a dictionary named assets, is created with DRF serializers. This file should be written to the database in a model called CollectionSnapshot.
class CollectionSnapshotCreate(generics.CreateAPIView):
    permission_classes = [MemberPermission, ]

    def create(self, request, *args, **kwargs):
        collection = get_collection(request.data['collection_id'])
        items = Item.objects.filter(collection=collection)
        assets = {
            "collection": CollectionSerializer(collection, many=False).data,
            "items": ItemSerializer(items, many=True).data,
        }
        fp = tempfile.TemporaryFile(mode="w+")
        json.dump(assets, fp)
        fp.flush()
        CollectionSnapshot.objects.create(
            final=False,
            created_by=request.user,
            collection_id=collection.id,
            file=ContentFile(fp.read(), name="assets.json")
        )
        fp.close()
        return JsonResponse({}, status=200)
Printing assets shows the dictionary correctly, so the data itself is fine.
Following the solution below I do get the file saved to the db, but without any content:
copy file from one model to another
It seems that json.dump(assets, fp) is failing silently, or I am missing something needed to actually save the content to the temp file before sending it to the database.
The question is: why are the files in the db empty?
I found out that fp.read() returns content starting from the file's current pointer. After dumping the assets dict as JSON into the temp file, the pointer sits at the end of the file, so I have to bring it back to the beginning with fp.seek(0). That way, when fp.read() is called inside file=ContentFile(fp.read(), ...), it actually reads all the content. It was giving me an empty file because, with the pointer at the end, there was nothing left to read.
fp = tempfile.TemporaryFile(mode="w+")
json.dump(assets, fp)
fp.flush()
fp.seek(0)  # important
CollectionSnapshot.objects.create(...)  # stays the same
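A minimal standalone sketch of the pointer behavior (no Django involved):
import json
import tempfile

fp = tempfile.TemporaryFile(mode="w+")
json.dump({"a": 1}, fp)
fp.flush()

print(fp.read())  # '' -- the pointer is at the end, nothing left to read
fp.seek(0)        # rewind to the beginning
print(fp.read())  # '{"a": 1}'
fp.close()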

Get information out of large JSON file

I am new to JSON files and I'm struggling to get any information out of this one.
The structure of the JSON file is as following:
[screenshot: JSON file structure]
Now what I need is to access "batches" to get the data from each variable.
I did try the snippets below, which I found for reaching deeper keys, but somehow I still didn't get any results.
1.
def safeget(dct, *keys):
    for key in keys:
        try:
            dct = dct[key]
        except KeyError:
            return None
    return dct

safeget(mydata, "batches")
2.
def dict_depth(mydata):
    if isinstance(mydata, dict):
        return 1 + (max(map(dict_depth, mydata.values()))
                    if mydata else 0)
    return 0

print(dict_depth(mydata))
The final goal would then be to create a loop to extract all the information, but that's something for the future.
Any help is highly appreciated, as are any recommendations on how I should ask things here in the future to get the best answers!
As far as I understood, you simply want to extract all the data without any ordering?
Then this should work out:
# Python program to read a JSON file
import json

# Opening the JSON file
f = open('data.json')

# json.load returns the JSON contents as a dictionary
data = json.load(f)

# Iterating through one of the lists in the file
# ('emp_details' is this example's key; for your file it would be e.g. 'batches')
for i in data['emp_details']:
    print(i)

# Closing the file
f.close()
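Since the exact nesting isn't reproduced here (the structure was a screenshot), here is a hedged sketch that walks an arbitrarily nested dict and returns the value under a "batches" key, wherever it lives:
def find_key(obj, target):
    # Recursively search nested dicts/lists for the first value under `target`.
    if isinstance(obj, dict):
        if target in obj:
            return obj[target]
        for value in obj.values():
            found = find_key(value, target)
            if found is not None:
                return found
    elif isinstance(obj, list):
        for item in obj:
            found = find_key(item, target)
            if found is not None:
                return found
    return None

batches = find_key(data, "batches")  # assumes `data` was loaded with json.load as above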

How can I append id to whitelist.json so it will be added and the bot can read it without restart

How can I append an id to whitelist.json so the bot can read it without a restart? In other words, when I append an id to whitelist.json, the bot should pick it up immediately and let the new id use the bot. Here's what I've tried:
@bot.command()
async def whitelist(ctx, ids: int=None):
    data = {}
    for id in ids:
        data[id] = []
        data[id].append(id)
    with open('./whitelist.json', 'w') as file:
        json.dump(data, file)
and here's what's in whitelist.json if it will help:
{
    "Whitelist": [483686172221243402]
}
Note: I want the command to let me add more than one id at a time, and again, I want the bot to read the new data from whitelist.json directly so that a newly added id can use the bot.
You should use json.dumps() and write to the file like so:
import json

@bot.command()
async def whitelist(ctx, *ids: int):
    # load the current whitelist, add the new ids, then write it back;
    # *ids lets the command accept more than one id at a time
    with open("whitelist.json", "r") as f:
        data = json.load(f)
    data["Whitelist"].extend(ids)
    with open("whitelist.json", "w") as f:
        f.write(json.dumps(data))  # write string to file using dumps
The next time the file is opened, its contents will include the new ids.
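To have the bot actually pick the change up without a restart, one approach is to re-read whitelist.json every time it is needed instead of caching it at startup. A sketch, where is_whitelisted is a hypothetical helper name:
import json

def is_whitelisted(user_id):
    # hypothetical helper: re-reads whitelist.json on every call, so ids
    # appended by the whitelist command take effect without restarting the bot
    with open("whitelist.json", "r") as f:
        data = json.load(f)
    return user_id in data["Whitelist"]

@bot.command()
async def secret(ctx):
    if not is_whitelisted(ctx.author.id):
        return  # ignore users who are not whitelisted
    await ctx.send("You are whitelisted!")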

ValueError while trying to read json file

In Django, I am trying to read the countries to cities json file that's available here: https://raw.githubusercontent.com/David-Haim/CountriesToCitiesJSON/master/countriesToCities.json
I have downloaded the file locally into my static assets folder and I am doing the following to open, read and push all cities into another array
obj = []
filename = 'static/json/countriesToCities.json'
with open(filename, "r") as f:
    data = json.loads(f.read())
    for key, values in data:
        obj.append(key[0])
However, this gives me the following error:
ValueError at /citiesUrl/
No JSON object could be decoded
How do I push all the values of each key into a new array?
Use load instead of loads (the first is for files, the second is for strings).
I've tested your JSON and it works:
json_data = open('/Users/madzohan/Downloads/data.json', 'r')
data = json.load(json_data)
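To then push all the values of each key into one array, assuming (per the linked file) that the JSON maps country names to lists of city names, a sketch:
import json

with open('static/json/countriesToCities.json', 'r') as f:
    data = json.load(f)

# flatten every country's list of cities into a single array
cities = [city for city_list in data.values() for city in city_list]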

How to read whole file in one string

I want to read a json or xml file in pyspark. My file is split across multiple lines:
rdd = sc.textFile(json or xml)
Input
{
    "employees":
    [
        {
            "firstName": "John",
            "lastName": "Doe"
        },
        {
            "firstName": "Anna"
        }
    ]
}
Input is spread across multiple lines.
Expected output: {"employees":[{"firstName":"John",...}]}
How to get the complete file in a single line using pyspark?
There are three ways (the first two are standard built-in Spark functions; I invented the third); the solutions here are in PySpark:
textFile, wholeTextFiles, and a "labeled" textFile (key = file path, value = one line from that file; this is a kind of mix between the two standard ways to parse files).
1.) textFile
input:
rdd = sc.textFile('/home/folder_with_text_files/input_file')
output: an array containing one line of the file per entry, i.e. [line1, line2, ...]
2.) wholeTextFiles
input:
rdd = sc.wholeTextFiles('/home/folder_with_text_files/*')
output: an array of tuples, where the first item is the "key" (the file path) and the second item is that file's entire contents, i.e.
[(u'file:/home/folder_with_text_files/file1.txt', u'file1_contents'), (u'file:/home/folder_with_text_files/file2.txt', u'file2_contents'), ...]
3.) "Labeled" textFile
input:
import glob
from pyspark import SparkContext
from pyspark.sql import SQLContext

# SparkContext.stop(sc)  # stop an already-running context first, if there is one
sc = SparkContext("local", "example")  # if running locally
sqlContext = SQLContext(sc)

Spark_Full = sc.emptyRDD()
for filename in glob.glob(Data_File + "/*"):  # Data_File = folder of input files
    # the default argument pins the current filename (a bare lambda would
    # late-bind to the loop variable); RDDs are combined with union()
    Spark_Full = Spark_Full.union(sc.textFile(filename).keyBy(lambda x, fn=filename: fn))
output: an array where each entry is a tuple of filename-as-key with value = each line of the file. (Technically, using this method you can also use a different key besides the actual file path, perhaps a hashed representation to save memory.) i.e.
[('/home/folder_with_text_files/file1.txt', 'file1_contents_line1'),
('/home/folder_with_text_files/file1.txt', 'file1_contents_line2'),
('/home/folder_with_text_files/file1.txt', 'file1_contents_line3'),
('/home/folder_with_text_files/file2.txt', 'file2_contents_line1'),
...]
You can also recombine either as a list of lines:
Spark_Full.groupByKey().map(lambda x: (x[0], list(x[1]))).collect()
[('/home/folder_with_text_files/file1.txt', ['file1_contents_line1', 'file1_contents_line2','file1_contents_line3']),
('/home/folder_with_text_files/file2.txt', ['file2_contents_line1'])]
Or recombine entire files back into single strings (in this example the result is the same as what you get from wholeTextFiles, but with the string "file:" stripped from the file paths):
Spark_Full.groupByKey().map(lambda x: (x[0], ' '.join(list(x[1])))).collect()
If your data is not formed on one line as textFile expects, then use wholeTextFiles.
This will give you the whole file so that you can parse it down into whatever format you would like.
This is how you would do it in Scala:
val rdd = sc.wholeTextFiles("hdfs://nameservice1/user/me/test.txt")
rdd.collect.foreach(t => println(t._2))
"How to read whole [HDFS] file in one string [in Spark, to use as sql]":
e.g.
// Put file to hdfs from edge-node's shell...
hdfs dfs -put <filename>
// Within spark-shell...
// 1. Load file as one string
val f = sc.wholeTextFiles("hdfs:///user/<username>/<filename>")
val hql = f.take(1)(0)._2
// 2. Use string as sql/hql
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val results = hiveContext.sql(hql)
Python way
rdd = spark.sparkContext.wholeTextFiles("hdfs://nameservice1/user/me/test.txt")
json_str = rdd.collect()[0][1]  # the whole file as one string (avoid naming this `json`, which would shadow the module)
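From there, a short sketch of turning that single string into a Python dict (assuming the file holds the sample input shown earlier):
import json

data = json.loads(json_str)  # parse the whole-file string
print(data["employees"][0]["firstName"])  # 'John' for the sample input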