How to read CSV file from POST? - csv

I've been stuck for hours on this CSV problem. The following code runs after a form is posted:
fichier_en_lecture = request.FILES['fichier_csv'].read()
nom_du_fichier = request.FILES['fichier_csv'].name
importateur = request.user
traitement_du_fichier(fichier_en_lecture, nom_du_fichier, importateur)
And the "traitement_du_fichier" function goes like this:
def traitement_du_fichier(fichier_en_lecture, nom_du_fichier, importateur):
    nouveau_fichier = FichierAdhérents(importateur=importateur, fichier_csv=nom_du_fichier)
    nouveau_fichier.save()
    import csv
    lecteur = csv.reader(fichier_en_lecture, delimiter=",", quotechar='|')
    for row in lecteur:
        nouvel_adhérent = AdhérentDuFichier()
        nouvel_adhérent['fichier_adhérents'] = nouveau_fichier
        column_counter = 0
        nouvel_adhérent['fédération'] = row[column_counter]
        column_counter += 1
        nouvel_adhérent['date_première_adhésion'] = row[column_counter]
        column_counter += 1
        nouvel_adhérent['date_dernière_cotisation'] = row[column_counter]
I get the following error:
iterator should return strings, not int (did you open the file in text mode?)
I've tried to use open() but from what I understand, open() only works with a direct path to the uploaded file. However, I need to do this from memory.

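In Python 3, I used:
import csv
from io import StringIO
csvf = StringIO(xls_file.read().decode())
reader = csv.reader(csvf, delimiter=',')
Here xls_file is the file obtained from the POST form. Reading it returns bytes, so the content has to be decoded to text before csv.reader can iterate over it.
I hope it helps.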
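As a minimal sketch of the same idea, the helper below decodes an in-memory upload and returns its rows. The function name read_uploaded_csv and the default UTF-8 encoding are assumptions for illustration; any binary file-like object with a read() method (such as the one in request.FILES) should work the same way.

```python
import csv
import io


def read_uploaded_csv(uploaded_file, encoding="utf-8"):
    """Decode a binary file-like object and return its CSV rows as lists of strings.

    `uploaded_file` is assumed to expose read() returning bytes.
    """
    text = uploaded_file.read().decode(encoding)
    # newline="" lets the csv module handle line endings itself.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=","))


if __name__ == "__main__":
    fake_upload = io.BytesIO("fédération,date\nA,2020-01-01\n".encode("utf-8"))
    for row in read_uploaded_csv(fake_upload):
        print(row)
```

If the file was exported with another encoding (for example Latin-1 from a spreadsheet), pass it explicitly instead of relying on the default.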

Using CL_FDT_XL_SPREADSHEET class for a .CSV possible?

I have a program that maintains custom Z tables by exporting the table to an Excel spreadsheet; it also refreshes the table and updates it from an Excel spreadsheet using .XLSX files.
However, I also want the program to accept .CSV files.
I use the CL_GUI_FRONTEND_SERVICES=>GUI_UPLOAD method to get the raw data, but when I try to convert the raw data to an XSTRING, an error is thrown.
My question: Is the CL_FDT_XL_SPREADSHEET class suitable for .CSV file data or is it only suitable for .XLSX files?
The upload to SAP from .XLSX is done with the CL_GUI_FRONTEND_SERVICES=>GUI_UPLOAD method to get the raw data. Then converted to XSTRING and passed into the CL_FDT_XL_SPREADSHEET class and the IF_FDT_DOC_SPREADSHEET~GET_ITAB_FROM_WORKSHEET method is called to pass that data to a variable where it is used in another method to upload to SAP. This works fine.
Code:
METHOD import_excel_data.
  DATA: lt_xtab TYPE cpt_x255,
        lv_size TYPE i.
  IF i_filetype = abap_true. "******.XLSX UPLOAD*********
    cl_gui_frontend_services=>gui_upload( EXPORTING filename = i_file
                                                    filetype = 'BIN'
                                          IMPORTING filelength = lv_size
                                          CHANGING data_tab = lt_xtab
                                          EXCEPTIONS
                                            file_open_error = 1
                                            file_read_error = 2
                                            error_no_gui = 3
                                            not_supported_by_gui = 4
                                            OTHERS = 5 ).
    IF sy-subrc <> 0.
      RAISE EXCEPTION TYPE zcx_excel_exception EXPORTING i_message = |Invalid File { i_file }| ##no_text.
    ENDIF.
  ELSE."******.CSV UPLOAD*********
    cl_gui_frontend_services=>gui_upload( EXPORTING filename = i_file
                                                    filetype = 'ASC'
                                                    has_field_separator = abap_true
                                          IMPORTING filelength = lv_size
                                          CHANGING data_tab = lt_xtab
                                          EXCEPTIONS
                                            file_open_error = 1
                                            file_read_error = 2
                                            error_no_gui = 3
                                            not_supported_by_gui = 4
                                            OTHERS = 5 ).
    IF sy-subrc <> 0.
      RAISE EXCEPTION TYPE zcx_excel_exception EXPORTING i_message = |Invalid File { i_file }| ##no_text.
    ENDIF.
  ENDIF.
  cl_scp_change_db=>xtab_to_xstr( EXPORTING im_xtab = lt_xtab
                                            im_size = lv_size
                                  IMPORTING ex_xstring = DATA(lv_xstring) ).
  DATA(lo_excel) = NEW cl_fdt_xl_spreadsheet( document_name = i_file
                                              xdocument = lv_xstring ).
  lo_excel->if_fdt_doc_spreadsheet~get_worksheet_names(
    IMPORTING worksheet_names = DATA(lt_worksheets) ).
  rt_table = lo_excel->if_fdt_doc_spreadsheet~get_itab_from_worksheet( lt_worksheets[ 1 ] ).
  IF rt_table IS INITIAL.
    RAISE EXCEPTION TYPE zcx_excel_exception EXPORTING i_message = 'No Data found in Excel File' ##no_text.
  ENDIF.
ENDMETHOD.
Is the CL_FDT_XL_SPREADSHEET class suitable for .CSV file data or is it only suitable for .XLSX files?
No, it is not suitable for CSV. CL_FDT_XL_SPREADSHEET is based on the ABAP iXML framework and works purely with XML formats compliant with the OOXML specification, which XLSX is also based on.
CSV is plain delimited text and comes nowhere near meeting this prerequisite, so it won't work.
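To illustrate the format difference outside of ABAP, here is a small Python sketch (not SAP code). It relies on the fact that an XLSX file is a ZIP package of XML parts, and it assumes the workbook part sits at the conventional path xl/workbook.xml; the function name looks_like_xlsx is made up for this example.

```python
import io
import zipfile


def looks_like_xlsx(raw: bytes) -> bool:
    """Return True if the bytes are a ZIP archive containing xl/workbook.xml.

    Assumption: the workbook part is stored at its conventional location.
    """
    buffer = io.BytesIO(raw)
    if not zipfile.is_zipfile(buffer):
        # Plain text such as CSV is not a ZIP container at all.
        return False
    with zipfile.ZipFile(buffer) as archive:
        return "xl/workbook.xml" in archive.namelist()


if __name__ == "__main__":
    print(looks_like_xlsx(b"id,name\n1,foo\n"))  # CSV bytes are rejected
```

A check like this could decide which parser to route an upload to; CSV content would need a text-based parser rather than a spreadsheet class.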

CSV not importing JSON with correct format into database

Just like the title says, here is my code:
require 'csv'
require 'json'
def import_csv
  path = Rails.root.join('folder1', 'folder2', 'file.csv')
  counter = 0
  puts "inserts on table started..."
  CSV.foreach(path, headers: true) do |row|
    next if row.to_hash['deleted_at'] != nil
    counter += 1
    puts row.to_json #shows correct format
    someModel = someModel.new(row.to_hash) #imports incorrect format of json with backslash in db
    #someModel = someModel.new(row.to_json) #ArgumentError: When assigning attributes, you must pass a hash as an argument.
    someModel.skip_callbacks = true
    someModel.save!
  end
  puts "#{counter} inserts on table apps complete"
end
import_csv
I cannot import the CSV file in the correct format. The import works, but the structure of the stored JSON is wrong.
EXPECTED
{"data":{"someData":72}}
GETTING
"{\"data\":{\"someData\":72}}"
How can I import it with the correct JSON format?
If all the headers match the column names of the model, maybe you can try:
JSON.parse(row.to_json)
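One possible reading of the symptom is that the CSV cell already holds JSON text, so storing it as a plain string in a JSON column encodes it a second time. The sketch below reproduces that effect in Python (not Ruby) purely as an illustration; the column name payload is hypothetical.

```python
import json

# A CSV row as a parser would return it: every cell is a string.
# "payload" is a hypothetical column name holding JSON text.
row = {"id": "1", "payload": '{"data":{"someData":72}}'}

# Serialising the string itself escapes the quotes (double encoding).
double_encoded = json.dumps(row["payload"])

# Parsing the cell first yields a real structure that serialises cleanly.
parsed = json.loads(row["payload"])
clean = json.dumps(parsed, separators=(",", ":"))

if __name__ == "__main__":
    print(double_encoded)
    print(clean)
```

The same principle applies in Ruby: parse the JSON cell into a hash before assigning it to the attribute, rather than assigning the raw string.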

How do I store a contentfile into ImageField in Django

I am trying to convert an image uploaded by a user into a PDF and then store it in an ImageField in a MySQL database using a form, but I get an error when trying to store the PDF in the database.
My views.py is:
from django.core.files.storage import FileSystemStorage
from PIL import Image
import io
from io import BytesIO
from django.core.files.uploadedfile import InMemoryUploadedFile
from django.core.files.base import ContentFile
def formsubmit(request): #submits the form
    docs = request.FILES.getlist('photos')
    print(docs)
    section = request.POST['section']
    for x in docs:
        fs = FileSystemStorage()
        print(type(x.size))
        img = Image.open(io.BytesIO(x.read()))
        imgc = img.convert('RGB')
        pdfdata = io.BytesIO()
        imgc.save(pdfdata,format='PDF')
        thumb_file = ContentFile(pdfdata.getvalue())
        filename = fs.save('photo.pdf', thumb_file)
        linkobj = Link(link = filename.file, person = Section.objects.get(section_name = section), date = str(datetime.date.today()), time = datetime.datetime.now().strftime('%H:%M:%S'))
        linkobj.save()
        count += 1
        size += x.size
    return redirect('index')
My models.py:
class Link(models.Model):
    id = models.BigAutoField(primary_key=True)
    person = models.ForeignKey(Section, on_delete=models.CASCADE)
    link = models.ImageField(upload_to= 'images', default = None)
    date = models.CharField(max_length=80, default = None)
    time = models.CharField(max_length=80,default = None)
Error I am getting is:
AttributeError: 'str' object has no attribute 'file'
Other methods I have tried:
1) linkobj = Link(link = thumb_file, person = Section.objects.get(section_name = section), date = str(datetime.date.today()), time = datetime.datetime.now().strftime('%H:%M:%S'))
RESULT OF ABOVE METHOD:
1) Passing thumb_file doesn't throw an error, but nothing is stored in the database
Points I have noticed:
1)The file is being stored properly into the media folder, ie: I can see the pdf getting stored in the media folder
How do I solve this? Thank you
You don't (basically ever) need to initialize a Storage by yourself. This holds especially true since the storage for the field might not be a FileSystemStorage at all, but could e.g. be backed by S3.
Something like
import datetime
import io
from PIL import Image
from django.core.files.base import ContentFile
from django.shortcuts import redirect
# Link and Section are the models from the question.
def convert_image_to_pdf_data(image):
    # Read the uploaded image and render it as a PDF held in memory.
    img = Image.open(io.BytesIO(image.read()))
    imgc = img.convert("RGB")
    pdfdata = io.BytesIO()
    imgc.save(pdfdata, format="PDF")
    return pdfdata.getvalue()
def formsubmit(request):  # submits the form
    photos = request.FILES.getlist("photos")  # list of UploadedFiles
    section = request.POST["section"]
    person = Section.objects.get(section_name=section)
    date = str(datetime.date.today())
    time = datetime.datetime.now().strftime("%H:%M:%S")
    count = 0
    size = 0
    for image in photos:
        pdfdata = convert_image_to_pdf_data(image)
        # Giving the ContentFile a name lets the field save it to its storage.
        thumb_file = ContentFile(pdfdata, name="photo.pdf")
        Link.objects.create(
            link=thumb_file,
            person=person,
            date=date,
            time=time,
        )
        count += 1
        size += image.size
    return redirect("index")
should be enough here, i.e. using a ContentFile for the converted PDF content; the field should deal with saving it into the storage.
(As an aside, why are date and time stored separately as strings? Your database surely has a datetime type...)
OK, so I found an answer. To be fair, I won't accept my own answer, as it doesn't answer the exact question I asked; it is a different method. If anyone does know the exact answer, please share it so that the community can benefit.
My Solution:
Instead of using ContentFile, I used InMemoryUploadedFile to hold the converted PDF and then saved it into the database (in an ImageField).
To be honest, I am not completely sure why ContentFile was not working, but when going through the documentation I found that:
The ContentFile class inherits from File, but unlike File it operates on string content (bytes also supported), rather than an actual file.
Any detailed explanation is welcome.
My new views.py
import datetime
import io
from PIL import Image
from django.core.files.uploadedfile import InMemoryUploadedFile
from django.shortcuts import redirect
def formsubmit(request):  # submits the form
    docs = request.FILES.getlist('photos')
    section = request.POST['section']
    count = 0
    size = 0
    for x in docs:
        img = Image.open(io.BytesIO(x.read()))
        imgc = img.convert('RGB')
        pdfdata = io.BytesIO()
        imgc.save(pdfdata, format='PDF')
        # Use the buffer's byte length; sys.getsizeof would measure the Python object, not the PDF.
        pdf_size = pdfdata.getbuffer().nbytes
        thumb_file = InMemoryUploadedFile(pdfdata, None, 'photo.pdf', 'application/pdf', pdf_size, None)
        linkobj = Link(link = thumb_file, person = Section.objects.get(section_name = section), date = str(datetime.date.today()), time = datetime.datetime.now().strftime('%H:%M:%S'))
        linkobj.save()
        count += 1
        size += x.size
    return redirect('index')
If you have a question, you can leave it in the comments and I'll try to answer it. Good luck!

Iterating through multiline input, and match to database items

I need help iterating through the input to a webapp I'm writing.
The users will be inputting several hundred (or thousands) of URLs pasted from Excel documents, each on a new line. Thus far, I've created the input page and an output page, and written the code to query the database.
from flask import Flask, render_template, request
from flask_sqlalchemy import SQLAlchemy
from urllib.parse import urlparse
from sqlalchemy.ext.declarative import declarative_base
app = Flask(__name__)
app.config["DEBUG"] = True
app.config["SECRET_KEY"] = "secret_key_here"
db = SQLAlchemy(app)
SQLALCHEMY_DATABASE_URI = db.create_engine(connector_string_here)
app.config[SQLALCHEMY_DATABASE_URI] = SQLALCHEMY_DATABASE_URI
app.config["SQLALCHEMY_POOL_RECYCLE"] = 299
app.config["SQLALCHEMY_TRACK_MODIFICATIONS"] = False
db.Model = declarative_base()
class Scrapers(db.Model):
    __tablename__ = "Scrapers"
    id = db.Column(db.Integer, primary_key = True)
    scraper_dom = db.Column(db.String(255))
    scraper_id = db.Column(db.String(128))
db.Model.metadata.create_all(SQLALCHEMY_DATABASE_URI)
Session = db.sessionmaker()
Session.configure(bind=SQLALCHEMY_DATABASE_URI)
session = Session()
scrapers = session.query(Scrapers.scraper_dom, Scrapers.scraper_id).all()
@app.route("/", methods=["GET","POST"])
def index():
    if request.method == "GET":
        return render_template("url_page.html")
    else:
        return render_template("url_page.html")
@app.route("/submit", methods=["GET","POST"])
def submit():
    sites = [request.form["urls"]]
    for site in sites:
        que = urlparse(site).netloc
    return render_template("submit.html", que=que)
#scrapers.filter(Scrapers.scraper_dom.in_(
#next(x.scraper_id for x in scrapers if x.matches(self.fnetloc))
As is apparent, this is incomplete. I've omitted previous attempts at matching the input, as I realized I had issues iterating through the input. At first, I could only get it to print all of the input instead of iterating over it.
Now it just repeats the urlparse(site).netloc for the first line of input, some seemingly random number of times. It is parsing correctly and returning the value I will need later (for each urlparse(site).netloc, match scraper_dom and return the associated scraper_id). I've also tried using input(), but kept getting errors about [request.form["urls"]] not being an iterable.
Please help, it'd be much appreciated.
New output with:
que = [urlparse(site).netloc for site in request.form["urls"].split('\n')]
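Building on that split, here is a minimal sketch of iterating over the pasted text and looking up each domain. The function name match_scrapers and the plain dict standing in for the Scrapers table are assumptions for illustration; in the real app the mapping could be built from the scraper_dom and scraper_id columns.

```python
from urllib.parse import urlparse


def match_scrapers(raw_input, scraper_ids_by_domain):
    """Return (url, domain, scraper_id) for every non-empty line of the input.

    `scraper_ids_by_domain` is a hypothetical dict mapping scraper_dom values
    to scraper_id values; unknown domains yield None.
    """
    matches = []
    # splitlines() copes with "\n" and the "\r\n" that browsers submit.
    for line in raw_input.splitlines():
        url = line.strip()
        if not url:
            continue  # skip blank lines left over from pasting
        domain = urlparse(url).netloc
        matches.append((url, domain, scraper_ids_by_domain.get(domain)))
    return matches


if __name__ == "__main__":
    pasted = "https://example.com/a\r\n\r\nhttp://shop.example.org/b?x=1\n"
    for item in match_scrapers(pasted, {"example.com": "s1"}):
        print(item)
```

Note that urlparse only fills in netloc when the URL includes a scheme such as https://, so bare domains pasted from a spreadsheet would need a scheme prepended first.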

Transferring csv files into hdfs, with converting them to avro, using flume

I am new to Big Data and I have a task to transfer CSV files to HDFS using Flume, but it should also convert those CSV files to Avro. I tried to do that using the following Flume configuration:
a1.channels = dataChannel
a1.sources = dataSource
a1.sinks = dataSink
a1.channels.dataChannel.type = memory
a1.channels.dataChannel.capacity = 1000000
a1.channels.dataChannel.transactionCapacity = 10000
a1.sources.dataSource.type = spooldir
a1.sources.dataSource.spoolDir = {spool_dir}
a1.sources.dataSource.fileHeader = true
a1.sources.dataSource.fileHeaderKey = file
a1.sources.dataSource.basenameHeader = true
a1.sources.dataSource.basenameHeaderKey = basename
a1.sources.dataSource.interceptors.attach-schema.type = static
a1.sources.dataSource.interceptors.attach-schema.key = flume.avro.schema.url
a1.sources.dataSource.interceptors.attach-schema.value = {path_to_schema_in_hdfs}
a1.sinks.dataSink.type = hdfs
a1.sinks.dataSink.hdfs.path = {sink_path}
a1.sinks.dataSink.hdfs.format = text
a1.sinks.dataSink.hdfs.inUsePrefix = .
a1.sinks.dataSink.hdfs.filePrefix = drone
a1.sinks.dataSink.hdfs.fileSuffix = .avro
a1.sinks.dataSink.hdfs.rollSize = 180000000
a1.sinks.dataSink.hdfs.rollCount = 100000
a1.sinks.dataSink.hdfs.rollInterval = 120
a1.sinks.dataSink.hdfs.idleTimeout = 3600
a1.sinks.dataSink.hdfs.fileType = DataStream
a1.sinks.dataSink.serializer = avro_event
The output was an Avro file with Flume's default schema. I also tried to use AvroEventSerializer, but I just got a lot of different errors. I solved all of them except this one:
ERROR hdfs.HDFSEventSink: process failed
java.lang.ExceptionInInitializerError
at org.apache.hadoop.hdfs.DFSOutputStream.computePacketChunkSize(DFSOutputStream.java:1305)
at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1243)
at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1266)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1101)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1059)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:232)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:75)
Thank you for any help.
Sorry for the mistakes in the config. I fixed them and found a way to convert CSV to Avro. I slightly modified AvroEventSerializer this way:
public void write(Event event) throws IOException {
    if (dataFileWriter == null) {
        initialize(event);
    }
    // Split the CSV line in the event body and copy each field into the record.
    String[] items = new String(event.getBody()).split(",");
    city.put("deviceID", Long.parseLong(items[0]));
    city.put("groupID", Long.parseLong(items[1]));
    city.put("timeCounter", Long.parseLong(items[2]));
    city.put("cityCityName", items[3]);
    city.put("cityStateCode", items[4]);
    city.put("sessionCount", Long.parseLong(items[5]));
    city.put("errorCount", Long.parseLong(items[6]));
    dataFileWriter.append(city);
}
and here is the city definition:
private GenericRecord city = null;
Please reply if you know a better way to do that.
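One way to avoid the hard-coded indices is to drive the conversion from a field list. The sketch below shows that idea in plain Python; it is not Flume or Avro API code. The field names are taken from the serializer above, while FIELDS and parse_record are names invented for this example, and the simple split assumes values never contain quoted commas.

```python
# Field order and converters mirror the put() calls in the serializer above.
FIELDS = [
    ("deviceID", int),
    ("groupID", int),
    ("timeCounter", int),
    ("cityCityName", str),
    ("cityStateCode", str),
    ("sessionCount", int),
    ("errorCount", int),
]


def parse_record(line, fields=FIELDS):
    """Convert one CSV line into a dict of typed values.

    Raises ValueError when the number of columns does not match the field list
    or when a numeric column cannot be converted.
    """
    items = line.rstrip("\r\n").split(",")
    if len(items) != len(fields):
        raise ValueError(f"expected {len(fields)} columns, got {len(items)}")
    return {name: convert(value) for (name, convert), value in zip(fields, items)}


if __name__ == "__main__":
    print(parse_record("1,2,3,Austin,TX,4,5\n"))
```

Checking the column count up front turns a malformed line into a clear error instead of an index failure partway through the record.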