I'm using a Google Cloud Function to create a table, and so far everything works the way it's supposed to. But I would like to add a new field to the table, one that shows the time of its creation.
This is an example of the code that I'm using at the moment.
I don't know why it's not working, but my main goal is to get this working for one table and then replicate the process in the functions where I handle two or more tables.
Example structure of the data in the bucket: (screenshot omitted)
Function code, main.py:
from google.cloud import bigquery
import pandas as pd
from previsional_tables import table_TEST1

creation_date = pd.Timestamp.now()  # Here is where I'm supposed to get the date.
def main_function(event, context):
    dataset = 'bd_clients'
    file = event
    input_bucket_name = file['bucket']
    path_file = file['name']
    uri = 'gs://{}/{}'.format(input_bucket_name, path_file)
    path_file_list = path_file.split("/")
    file_name_ext = path_file_list[len(path_file_list) - 1]
    file_name_ext_list = file_name_ext.split(".")
    name_file = file_name_ext_list[0]
    print('file name ==> ' + name_file.upper())
    print('Getting the data from bucket "{}"'.format(uri))
    path_file_name = str(uri)
    print("path: ", path_file_name)
    if "gs://bucket_test" in path_file_name:
        client = bigquery.Client()
        job_config = bigquery.LoadJobConfig()
        table_test1(dataset, client, uri, job_config, bigquery)
tables.py:
def table_test1(dataset, client, uri, job_config, bigquery):
    table = "test1"
    dataset_ref = client.dataset(dataset)
    job_config.autodetect = True
    job_config.max_bad_records = 1000
    # create each column in BigQuery along with its type
    job_config.schema = [
        bigquery.SchemaField("NAME", "STRING"),
        bigquery.SchemaField("LAST_NAME", "STRING"),
        bigquery.SchemaField("ADDRESS", "STRING"),
        bigquery.SchemaField("DATE", bigquery.enums.SqlTypeNames.DATE),
    ]
    job_config.source_format = bigquery.SourceFormat.CSV
    job_config.field_delimiter = ';'
    job_config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND
    load_job = client.load_table_from_uri(uri, dataset_ref.table(table), job_config=job_config)
requirements.txt:
# Function dependencies, for example:
# package>=version
google-cloud-bigquery==2.25.1
pysftp==0.2.9
pandas==1.4.2
fsspec==2022.5.0
gcsfs==2022.5.0
Output structure in the database: (screenshot omitted)
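One way this could work (a sketch, not a verified fix; the helper name load_with_creation_date and the CREATION_DATE column are invented for illustration): read the CSV into pandas inside the function, stamp a timestamp column onto every row, and load the DataFrame instead of the raw URI. gcsfs is already in requirements.txt, so pd.read_csv can open the gs:// URI directly; load_table_from_dataframe additionally needs pyarrow installed.

import pandas as pd
from google.cloud import bigquery

def load_with_creation_date(client, uri, dataset, table):
    # Read the CSV from the bucket (same ';' delimiter as the load job above)
    df = pd.read_csv(uri, sep=';')
    # Stamp every row with the moment this load ran
    df['CREATION_DATE'] = pd.Timestamp.now()
    job_config = bigquery.LoadJobConfig(
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    # Load the DataFrame instead of the URI so the new column is included
    job = client.load_table_from_dataframe(
        df, '{}.{}'.format(dataset, table), job_config=job_config)
    job.result()  # block until the load job finishes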
I have datasets with identical schemas stored in folders that denote an id for the dataset, e.g.:
\11111\dataset
\11112\dataset
Here '11111' etc. indicates the dataset id. I am trying to write a transform in a code repository that loops through the datasets and appends them all together. The following code works for this:
def create_outputs(dataset_ids):
    transforms = []
    for id in dataset_ids:
        @transform_df(
            Output(output_path + "/appended_dataset"),
            input_path=Input(input_path + id + "/dataset"),
        )
        def compute(input_path):
            return input_path
        transforms.append(compute)
    return transforms

id_list = ['11111', '11112']
TRANSFORMS = create_outputs(id_list)
However, rather than having the ids hardcoded in id_list, I would like to have a separate dataset that holds the dataset ids that need to be appended. I am having difficulty getting something that works.
I have tried the following code, where id_list_dataset holds the ids to be included in the append:
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import types as T

# input dataset
id_list_dataset = ["ri.foundry.main.dataset.abcdefg"]

schema = T.StructType([
    T.StructField('ID', T.StringType())
])

sc = SparkContext.getOrCreate()
rdd = sc.parallelize(id_list_dataset)
sqlContext = SQLContext(sc)

# define dataframe
temp_df = sqlContext.createDataFrame(rdd, schema)

# get list of IDs
id_list = temp_df.select('ID').collect

TRANSFORMS = create_outputs(id_list)
However, this is giving the following error:
TypeError: 'method' object is not iterable
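For what it's worth, the error message points at the missing parentheses: temp_df.select('ID').collect is the bound method object itself, not its result, so iterating over it fails. A minimal sketch of the fix (collect() returns Row objects, so the ID field still has to be unwrapped):

# call collect() and unwrap each Row into a plain string id
rows = temp_df.select('ID').collect()
id_list = [row['ID'] for row in rows]

TRANSFORMS = create_outputs(id_list)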
I have been trying to export a result table from a SQL query, but only the records inside the table are written to the Excel file. I am using MySQL and Python (3.9.11) to run this code. Here is the code that I have been using to do so:
import pymysql

dbconn = pymysql.connect(<db details>)
cus = dbconn.cursor()
cus.execute('sql query')
res = cus.fetchall()

data = []
for i in res:
    data += list(i)

var = open('main.csv', 'w')
for i in data:
    var.write(str(i))
var.close()
Also, the data from all of the result table's columns is written into a single column.
I tried adding var.write('\n'), but that turns the rows into columns. The version below, using the csv module with the header row taken from cursor.description, writes proper rows and columns:
import pymysql
import csv

dbconn = pymysql.connect(DB Connection details)
cus = dbconn.cursor()
cus.execute("sql query here")
res = cus.fetchall()

# cursor.description holds one tuple per result column; item [0] is the name
column_names = [i[0] for i in cus.description]

fp = open('main.csv', 'w')
myFile = csv.writer(fp, lineterminator='\n')
myFile.writerow(column_names)
myFile.writerows(res)
fp.close()
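For reference, pandas can produce the same CSV, header included, in a couple of lines; a sketch that keeps the question's <db details> and 'sql query here' placeholders:

import pandas as pd
import pymysql

dbconn = pymysql.connect(<db details>)  # placeholder kept from the question
df = pd.read_sql('sql query here', dbconn)
df.to_csv('main.csv', index=False)  # header row is written automatically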
I am trying to convert an image uploaded by a user into a PDF and then store it in an ImageField in a MySQL database using a form, but I am facing an error when trying to store the PDF in the database.
My views.py is:
from django.core.files.storage import FileSystemStorage
from django.core.files.uploadedfile import InMemoryUploadedFile
from django.core.files.base import ContentFile
from django.shortcuts import redirect
from PIL import Image
import datetime
import io
from io import BytesIO

from .models import Link, Section  # assuming the models live in this app's models.py

def formsubmit(request):  # submits the form
    docs = request.FILES.getlist('photos')
    print(docs)
    section = request.POST['section']
    count = 0
    size = 0
    for x in docs:
        fs = FileSystemStorage()
        print(type(x.size))
        img = Image.open(io.BytesIO(x.read()))
        imgc = img.convert('RGB')
        pdfdata = io.BytesIO()
        imgc.save(pdfdata, format='PDF')
        thumb_file = ContentFile(pdfdata.getvalue())
        filename = fs.save('photo.pdf', thumb_file)
        # fs.save() returns a str, so filename.file raises the AttributeError below
        linkobj = Link(
            link=filename.file,
            person=Section.objects.get(section_name=section),
            date=str(datetime.date.today()),
            time=datetime.datetime.now().strftime('%H:%M:%S'),
        )
        linkobj.save()
        count += 1
        size += x.size
    return redirect('index')
My models.py:
class Link(models.Model):
    id = models.BigAutoField(primary_key=True)
    person = models.ForeignKey(Section, on_delete=models.CASCADE)
    link = models.ImageField(upload_to='images', default=None)
    date = models.CharField(max_length=80, default=None)
    time = models.CharField(max_length=80, default=None)
The error I am getting is:
AttributeError: 'str' object has no attribute 'file'
Other methods I have tried:
1) linkobj = Link(link = thumb_file, person = Section.objects.get(section_name = section), date = str(datetime.date.today()), time = datetime.datetime.now().strftime('%H:%M:%S'))
Result of the above method:
1) thumb_file doesn't throw an error; rather, it stores nothing in the database.
Points I have noticed:
1) The file is being stored properly in the media folder, i.e. I can see the PDF in the media folder.
How do I solve this? Thank you.
You don't (basically ever) need to initialize a Storage by yourself. This holds especially true since the storage for the field might not be a FileSystemStorage at all, but could e.g. be backed by S3.
Something like
import datetime
import io

from PIL import Image
from django.core.files.base import ContentFile

def convert_image_to_pdf_data(image):
    img = Image.open(io.BytesIO(image.read()))
    imgc = img.convert("RGB")
    pdfdata = io.BytesIO()
    imgc.save(pdfdata, format="PDF")
    return pdfdata.getvalue()

def formsubmit(request):  # submits the form
    photos = request.FILES.getlist("photos")  # list of UploadedFiles
    section = request.POST["section"]
    person = Section.objects.get(section_name=section)
    date = str(datetime.date.today())
    time = datetime.datetime.now().strftime("%H:%M:%S")
    count = 0
    size = 0
    for image in photos:
        pdfdata = convert_image_to_pdf_data(image)
        thumb_file = ContentFile(pdfdata, name="photo.pdf")
        Link.objects.create(
            link=thumb_file,
            person=person,
            date=date,
            time=time,
        )
        count += 1
        size += image.size
    return redirect("index")
should be enough here, i.e. using a ContentFile for the converted PDF content; the field should deal with saving it into the storage.
(As an aside, why are date and time stored separately as strings? Your database surely has a datetime type...)
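For instance, a single DateTimeField with auto_now_add would record the creation moment automatically; a sketch of what that aside suggests (the created_at field name is made up here):

class Link(models.Model):
    id = models.BigAutoField(primary_key=True)
    person = models.ForeignKey(Section, on_delete=models.CASCADE)
    link = models.ImageField(upload_to='images', default=None)
    created_at = models.DateTimeField(auto_now_add=True)  # set once, when the row is created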
OK, so I found an answer. To be fair, I won't accept my own answer, as it doesn't exactly answer the question I asked; rather, it's a different method. So if anyone does know, please share so that the community can benefit.
My solution:
Instead of using ContentFile, I used InMemoryUploadedFile to store the converted PDF, and then moved it into the database (in an ImageField).
I am going to be honest: I am not completely sure why ContentFile was not working, but while going through the documentation I found that:
The ContentFile class inherits from File, but unlike File it operates on string content (bytes also supported), rather than an actual file.
Any detailed explanation is welcome.
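One plausible explanation, offered as an assumption rather than a verified cause: in the failing attempt the ContentFile was created without a name, so when it was assigned to the ImageField there was no filename for the field to commit the bytes under; the earlier answer passes name="photo.pdf" and works. Compare:

# unnamed: the field has no filename to save the content under
thumb_file = ContentFile(pdfdata.getvalue())

# named: the field can commit the bytes to storage as photo.pdf
thumb_file = ContentFile(pdfdata.getvalue(), name="photo.pdf")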
My new views.py:
from django.core.files.storage import FileSystemStorage
from PIL import Image
import io
from io import BytesIO
from django.core.files.uploadedfile import InMemoryUploadedFile
from django.core.files.base import ContentFile
import sys

def formsubmit(request):  # submits the form
    docs = request.FILES.getlist('photos')
    print(docs)
    section = request.POST['section']
    count = 0
    size = 0
    for x in docs:
        fs = FileSystemStorage()
        print(type(x.size))
        img = Image.open(io.BytesIO(x.read()))
        imgc = img.convert('RGB')
        pdfdata = io.BytesIO()
        imgc.save(pdfdata, format='PDF')
        thumb_file = InMemoryUploadedFile(pdfdata, None, 'photo.pdf', 'pdf', sys.getsizeof(pdfdata), None)
        linkobj = Link(
            link=thumb_file,
            person=Section.objects.get(section_name=section),
            date=str(datetime.date.today()),
            time=datetime.datetime.now().strftime('%H:%M:%S'),
        )
        linkobj.save()
        count += 1
        size += x.size
    return redirect('index')
If you have a question, you can leave it in the comments and I'll try to answer it. Good luck!
First of all, sorry for my lack of knowledge regarding databases; this is my first time working with them.
I am having some issues trying to get the data from an Excel file into a database.
Using answers from this site, I managed to more or less connect to the database by doing this:
import pandas as pd
import pyodbc

server = 'XXXXX'
db = 'XXXXXdb'

# create Connection and Cursor objects
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' + db + ';Trusted_Connection=yes')
cursor = conn.cursor()

# read data from excel
data = pd.read_excel('data.csv')
But I don't really know what to do next.
I have 3 tables, which are connected by a 'productID'; my Excel file mimics the database, meaning that every column in the Excel file has a place to go in the DB.
My plan was to read the Excel file and make lists from each column, then insert each column's values into the DB, but I have no idea how to write a query that can do this.
Once I have the query, I think the data insertion can be done like this:
query = "xxxxxxxxxxxxxx"
for row in data:
    # The following is not the real code
    productID = productID
    name = name
    url = url
    values = (productID, name, url)
    cursor.execute(query, values)
conn.commit()
conn.close()
The database looks like this:
https://prnt.sc/n2d2fm
http://prntscr.com/n2d3sh
http://prntscr.com/n2d3yj
EDIT:
I tried doing something like this, but I'm getting a 'not all arguments converted during string formatting' TypeError:
import pymysql
import pandas as pd

connStr = pymysql.connect(host = 'xx.xxx.xx.xx', port = xxxx, user = 'xxxx', password = 'xxxxxxxxxxx')
df = pd.read_csv('GenericProducts.csv')
cursor = connStr.cursor()

query = "INSERT INTO [Productos]([ItemID],[Nombre])) values (?,?)"

for index, row in df.iterrows():
    #cursor.execute("INSERT INTO dbo.Productos([ItemID],[Nombre])) values (?,?,?)", row['codigoEspecificoProducto'], row['nombreProducto'])
    codigoEspecificoProducto = row['codigoEspecificoProducto']
    nombreProducto = row['nombreProducto']
    values = (codigoEspecificoProducto, nombreProducto)
    cursor.execute(query, values)

connStr.commit()
cursor.close()
connStr.close()
I think my problem is in how I'm defining the query; surely that's not the right way.
Try this. You seem to have changed the library from pyodbc to pymysql, which expects %s placeholders instead of ?. (Note also that MySQL quotes identifiers with backticks rather than [brackets], so those are fixed below too.)
import pymysql
import pandas as pd

connStr = pymysql.connect(host = 'xx.xxx.xx.xx', port = xxxx, user = 'xxxx', password = 'xxxxxxxxxxx')
df = pd.read_csv('GenericProducts.csv')
cursor = connStr.cursor()

# pymysql uses %s placeholders; backticks (or no quoting) for MySQL identifiers
query = "INSERT INTO `Productos` (`ItemID`, `Nombre`) VALUES (%s, %s)"

for index, row in df.iterrows():
    codigoEspecificoProducto = row['codigoEspecificoProducto']
    nombreProducto = row['nombreProducto']
    values = (codigoEspecificoProducto, nombreProducto)
    cursor.execute(query, values)

connStr.commit()
cursor.close()
connStr.close()
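As an aside, if the goal is simply bulk-loading a DataFrame, pandas can do the whole insert through SQLAlchemy; a minimal sketch, assuming a hypothetical dbname database and the same placeholder credentials as above:

import pandas as pd
from sqlalchemy import create_engine

# hypothetical connection string; substitute the real host/user/password
engine = create_engine('mysql+pymysql://user:password@xx.xxx.xx.xx:3306/dbname')

df = pd.read_csv('GenericProducts.csv')
df = df.rename(columns={'codigoEspecificoProducto': 'ItemID', 'nombreProducto': 'Nombre'})

# append the rows to the existing Productos table in one call
df[['ItemID', 'Nombre']].to_sql('Productos', engine, if_exists='append', index=False)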
I'm trying to introduce Slick into my code to replace some existing JDBC code.
First of all, I'd like to use a Scala worksheet to run a really simple query: I want to pass in an integer id and get back a string UUID. This is the simplest method in the whole codebase.
As I understand it, I need to make a connection to the database, set up an action, and then run the action. I have the following code:
val db = Database.forURL("jdbc:mysql://mysql-dev.${URL}/${DB}?autoReconnect=true&characterEncoding=UTF-8",
  driver = "com.mysql.jdbc.Driver", user = "${user}", password = "${pass}")

val getUUID = sql"""SELECT ${UUIDFIELD} from users u WHERE u.${IDFIELD} = ${Id}""".as[String]

val uuid: String = db.run(getUUID)
println(uuid)
I'm pretty sure I don't have the driver set up correctly in the Database.forURL call, but the worksheet is also complaining that the result of db.run is not a String. How do I get to the String UUID value?
The db.run method returns a Future[_]. You should use Await to get the result out of it.
val db = Database.forURL("jdbc:mysql://mysql-dev.${URL}/${DB}?autoReconnect=true&characterEncoding=UTF-8",
  driver = "com.mysql.jdbc.Driver", user = "${user}", password = "${pass}")

// .as[String] yields a DBIO[Vector[String]]; .head takes the single expected value
val getUUID = sql"""SELECT ${UUIDFIELD} from users u WHERE u.${IDFIELD} = ${Id}""".as[String].head

val uuidFuture: Future[String] = db.run(getUUID)

import scala.concurrent._
import scala.concurrent.duration._

val uuid: String = Await.result(uuidFuture, Duration.Inf)
println(uuid)
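Blocking with Await is fine in a worksheet, but in application code you would usually transform the Future instead of blocking on it. A minimal sketch, assuming an implicit ExecutionContext is in scope:

import scala.concurrent.ExecutionContext.Implicits.global

// non-blocking: print the UUID whenever the query completes
db.run(getUUID).foreach(uuid => println(uuid))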