Good morning all!!
Yesterday I was looking for a function that could write a single CSV file with PySpark in Azure Databricks, but I did not find anything. So I built my own function, and I wanted to share my solution with the community and, if possible, start a thread collecting different solutions to the same problem.
Sorry, I originally commented the code in Spanish, but basically the function does the following:
1) Saves the dataframe into a new temporary directory (created inside the path you define, with a name beginning with 'temp_'), using coalesce(1) so that a single partition is written there.
2) Renames the CSV file as you want and moves it to the desired path.
3) Deletes the temporary directory.
That's all! You now have a single CSV file.
def escribe_fichero_unico(dataframe, path, file_name, file_format='csv'):
    """
    Definition: (1) Creates a temporary folder to hold the partitions that Spark
    writes by default when saving files, (2) merges all partitions into a single
    file, (3) moves that file to the parent directory and (4) deletes the
    temporary folder.
    Parameters:
    dataframe: the dataframe you want to save as a single file
    file_name: string with the name of the file (without extension)
    file_format: string, either 'csv' or 'parquet'
    path: string with the directory where you want to save the file (must end with '/')
    """
    import os
    # dbutils is the global object that Databricks provides inside notebooks
    # 1) Save the dataframe into a temporary folder; coalesce(1) forces a single partition
    path_temp = path + 'temp_' + file_name + '_trash'
    if file_format == 'csv':
        dataframe.coalesce(1).write.format('csv').mode('overwrite') \
            .options(header="true", delimiter=";") \
            .save(path_temp)
    elif file_format == 'parquet':
        dataframe.coalesce(1).write.format("parquet").mode("overwrite") \
            .save(path_temp)
    else:
        raise ValueError("file_format must be 'csv' or 'parquet'")
    # 2) Locate the single part-* file that Spark wrote (coalesce(1) already merged the data)
    file_part = [file.path for file in dbutils.fs.ls(path_temp)
                 if os.path.basename(file.path).startswith('part')][0]
    # 3) Move that file to the target directory under the requested name
    dbutils.fs.mv(file_part, path + file_name + '.' + file_format)
    # 4) Delete the temporary folder (second argument: recurse=True)
    dbutils.fs.rm(path_temp, True)
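Steps 2 to 4 do not depend on Spark itself. As a minimal sketch, assuming Spark writes to a local filesystem path (for example when running Spark outside Databricks, where dbutils is not available), the same "find the part file, move it, delete the temporary folder" logic can be written with the Python standard library. promote_single_part is a hypothetical helper name, not part of any library:

```python
import shutil
from pathlib import Path


def promote_single_part(temp_dir: str, dest_file: str) -> Path:
    """Move the single part-* file out of a Spark output folder, then delete the folder.

    Assumes temp_dir was written with coalesce(1), so exactly one part file exists.
    """
    temp = Path(temp_dir)
    # Spark names data files part-00000-...; _SUCCESS and hidden .crc files do not match
    parts = sorted(temp.glob("part-*"))
    if len(parts) != 1:
        raise RuntimeError(f"expected exactly one part file in {temp}, found {len(parts)}")
    dest = Path(dest_file)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Step 3: move the data file and give it the requested name
    shutil.move(str(parts[0]), str(dest))
    # Step 4: remove the temporary folder together with its leftover metadata files
    shutil.rmtree(temp)
    return dest
```

It would be called right after the coalesce(1) write, e.g. promote_single_part(path_temp, path + file_name + '.csv'). Checking that exactly one part file exists makes the helper fail loudly if coalesce(1) was forgotten, instead of silently keeping only the first partition.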
I hope this works for you as well :)
chitown88 helped me find the JSON on this website: https://www.iwc.com/fr/fr/watch-collections/pilot-watches/iw329303-big-pilots-watch-43.html
It seems that you need to replace "html" in the URL with ".productinfo.FR.json".
Source: How to scrape specific information on a website
I would like to get the same output from this page: https://www.omegawatches.com/fr-fr/watch-omega-constellation-quartz-27-mm-12315276005001
But I cannot manage to scrape that information because the page is dynamic and I cannot find the JSON data; I searched for hours.
Do you have any solution to scrape the same output as in the source question?
There isn't anything special about this page. Just grab the right information with BeautifulSoup:
import requests
from bs4 import BeautifulSoup
url = "https://www.omegawatches.com/fr-fr/watch-omega-constellation-quartz-27-mm-12315276005001"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
title = soup.title.text.split("|")[0].strip()
description = soup.select_one(".description-content").get_text(strip=True)
price = soup.select_one(".price").text
info = "\n".join(
tag.get_text(strip=True)
for tag in soup.select(
".product-info-details-right .li, .product-info-details-right li"
)
)
print(title)
print()
print(description)
print("-" * 80)
print(price)
print("-" * 80)
print(info)
Prints:
Constellation Quartz 27 mm - 123.15.27.60.05.001
L'esthétique particulièrement remarquable et intemporelle de la collection OMEGA Constellation se caractérise par l'originalité du cadran et la présence des fameuses griffes.Ce modèle brossé se distingue par un cadran en nacre blanche protégé par un verre saphir résistant aux rayures. La lunette sertie de diamants est montée sur un boîtier de 27 mm en acier inoxydable sur un bracelet également en acier inoxydable.Cette montre est animée par le calibre OMEGA 1376, un mouvement de précision à quartz.
--------------------------------------------------------------------------------
5 000,00 €
--------------------------------------------------------------------------------
Diamants
Entre‑corne :18 mm
Bracelet :acier
Boîtier :acier
Diamètre du boîtier :27 mm
Couleur du cadran :blanc
Verre :verre saphir bombé résistant aux rayures, traité antireflet à l’intérieur
Étanchéité :10 bars (100 mètres / 330 pieds)
Type de mouvement :quartz
Calibre :OMEGA 1376
Mouvement de précision à quartz, finition rhodiée.
Durée de vie de la pile :48 mois
Type :quartz
Swiss Made
GARANTIE 5 ANS
Paiement sécurisé
Livraison et retour offerts
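If you need the specifications as structured data rather than printed text, here is a minimal sketch, assuming each spec line keeps the "Label :value" shape seen in the output above. Lines without a colon, such as "Diamants" or "Swiss Made", are kept in a separate list. parse_specs is a hypothetical helper that takes the info string built in the answer:

```python
from typing import Dict, List, Tuple


def parse_specs(info: str) -> Tuple[Dict[str, str], List[str]]:
    """Split 'Label :value' lines into a dict; keep lines without a colon as extras."""
    specs: Dict[str, str] = {}
    extras: List[str] = []
    for line in info.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so values may still contain ':'
        label, sep, value = line.partition(":")
        if sep:
            specs[label.strip()] = value.strip()
        else:
            extras.append(line)
    return specs, extras


sample = "Diamants\nBracelet :acier\nDiamètre du boîtier :27 mm\nSwiss Made"
specs, extras = parse_specs(sample)
print(specs["Diamètre du boîtier"])  # 27 mm
print(extras)  # ['Diamants', 'Swiss Made']
```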
I'm starting to work with PowerShell and my first task is to fix a problem with a script that already runs in the production environment.
This script is called via a webhook and receives a parameter that the webhook passes to it.
I need to run this script inside PowerShell ISE to debug it, but I don't know how to populate the variables that are normally filled in when it is called by the webhook.
Here is the beginning of the code where the variables are filled in. Can someone give me a tip on how to fill the "WebHookData" variable?
Thanks in advance.
I've tried to do this but it didn't work...
Sorry for putting images instead of the code, but for some reason I can't post the code.
This is the JSON that I use:
{
  "AutomationAccountName": "proj-00016-automation-account",
  "BeginPeakTime": "7:00",
  "ConnectionAssetName": "AzureRunAsConnection",
  "EndPeakTime": "17:00",
  "HostPoolName": "VDI-POOL-001",
  "LimitSecondsToForceLogOffUser": 0,
  "LogOffMessageBody": "Salve seus trabalhos! Em aproximadamente 15 minutos, este terminal virtual será desligado automaticamente devido às políticas de otimização de custos da companhia. Caso seja necessário continuar suas atividades, um novo terminal poderá ser acessado após este período.",
  "LogOffMessageTitle": "ATENÇÃO!!!",
  "MaintenanceTagName": "NO_TAG",
  "MinimumNumberOfRDSH": 20,
  "ResourceGroupName": "proj-00016-wvd-rg",
  "ResourceGroupNameAutomation": "proj-00016-automation-rg",
  "RunbookLogoffShutdown": "ARMLogoffAndShutdown",
  "SessionThresholdPerCPU": 0.75,
  "TimeDifference": "-3:00"
}
Based on the JSON you've posted and the parts of the code we can see in the screenshot, give the following mock object a try:
# Mock of the object the webhook normally passes to the runbook
$mockWebhookPayload = [pscustomobject]@{
    WebhookName   = 'NameOfWebhookGoesHere'
    RequestHeader = @{ 'Content-Type' = 'application/json' }
    # RequestBody arrives as a raw JSON string; the closing '@ must start its own line
    RequestBody   = @'
{
  "AutomationAccountName": "proj-00016-automation-account",
  "BeginPeakTime": "7:00",
  "ConnectionAssetName": "AzureRunAsConnection",
  "EndPeakTime": "17:00",
  "HostPoolName": "VDI-POOL-001",
  "LimitSecondsToForceLogOffUser": 0,
  "LogOffMessageBody": "Salve seus trabalhos! Em aproximadamente 15 minutos, este terminal virtual será desligado automaticamente devido às políticas de otimização de custos da companhia. Caso seja necessário continuar suas atividades, um novo terminal poderá ser acessado após este período.",
  "LogOffMessageTitle": "ATENÇÃO!!!",
  "MaintenanceTagName": "NO_TAG",
  "MinimumNumberOfRDSH": 20,
  "ResourceGroupName": "proj-00016-wvd-rg",
  "ResourceGroupNameAutomation": "proj-00016-automation-rg",
  "RunbookLogoffShutdown": "ARMLogoffAndShutdown",
  "SessionThresholdPerCPU": 0.75,
  "TimeDifference": "-3:00"
}
'@
}

# Run the script locally, passing the mock in place of the real webhook data
& .\path\to\webhook-script.ps1 -WebHookData $mockWebhookPayload
I'm pretty new to this JSON and AJAX stuff, so I was following a tutorial on YouTube: https://www.youtube.com/watch?v=rJesac0_Ftw&t=1029s.
The thing is that I followed the steps exactly as in the video, but I get the following error:
VM34:1 Uncaught SyntaxError: Unexpected token / in JSON at position 0
at JSON.parse (<anonymous>)
at XMLHttpRequest.theRequest.onload (loader.js:5)
My JSON file:
[
{
"name":"一",
"sound": {
"kunyomi": ["ひと.つ"],
"onyomi": ["イチ"]
},
"description":"Representaba la unidad, el absoluto. Cuando funciona como componente, este carácter adquiere el significado de suelo o de techo según su posición: si se encuentra encima de otro componente, toma el significado de techo; si está debajo, de suelo. Todas las formas antiguas de los números están asociadas a fuerzas del universo y a la mitología. Los números pares son el ying y los impares son el yang.",
"examples":["一月[いちがつ] - Enero", "一日[ついたち] - Día uno", "一回[いっかい] - Dos veces", "一階[いっかい] - Primer Piso"]
},
{
"name":"ニ",
"sound": {
"kunyomi": ["ふた.つ"],
"onyomi": ["ニ、ジ"]
},
"description":"Representa el cielo 一 y la tierra 一, el ying y el yang. Al igual que en el caso de los numerales romanos, el kanji de dos es una simple duplicación del trazo horizontal que significa uno.",
"examples":["二月[にがつ] - Febrero", "二日[ふつか] - Día dos", "二回[にかい] - Dos veces"]
}
]
And I call that JSON with the following code:
var ourRequest = new XMLHttpRequest();
ourRequest.open('GET', 'http://127.0.0.1/japones_flat/kanjis_n5.json');
ourRequest.onload = function() {
    "use strict";
    var response = JSON.parse(ourRequest.responseText);
    console.log(response[0]);
};
ourRequest.send();
I did a little research, and it looks like the problem lies in the JSON.parse method, which complains that the token "/" is causing trouble. After that, I noticed that Dreamweaver had left a default comment in my .json file, so I deleted it (because it started with "/"), but I keep getting this annoying error. Can you guys help me?
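JSON itself has no comment syntax, so anything like // or /* */ left at the start of a file makes a JSON parser stop at position 0, which matches the error above. As a minimal sketch for checking the exact text the server returns, the function below uses Python's json module as a stand-in for JSON.parse; first_parse_error is a hypothetical helper, and you would paste in the raw text you see when opening the .json URL directly:

```python
import json
from typing import Optional, Tuple


def first_parse_error(text: str) -> Optional[Tuple[int, str]]:
    """Return (position, offending character) if text is not valid JSON, else None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the character offset where parsing failed
        bad_char = text[err.pos] if err.pos < len(text) else ""
        return err.pos, bad_char
    return None


# A leftover comment fails at position 0, just like the browser error
print(first_parse_error('/* Dreamweaver comment */\n[{"name": "一"}]'))  # (0, '/')
print(first_parse_error('[{"name": "一"}]'))  # None
```

If this reports (0, '/') for the text you get from the URL, the comment is still present in the file the server is sending.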
Thanks in advance!
I have this diagram.
Diagram (1/2)
Diagram (2/2)
Just in case: Equipo (Device), Servicio (Service), Categoria (Category), Persona (Person), Comprobante (Voucher, Receipt)
I need to be able to do the following:
"Whenever a client reports a device that is going to receive one of the services, its activation date must be recorded, as well as its deactivation date, if there is one."
The first thing I thought of was to just add attributes to the Servicio (Service) table, for example act_date timestamp and dea_date timestamp. Would that be okay? Or should I create another table, like the one called "Categoria" (Category), to store those dates? Any ideas?
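For the second option, here is a minimal sketch, assuming the dates belong to each registration of a device for a service, so a device can be deactivated and later reactivated without losing history. The table and column names (equipo, servicio, equipo_servicio, descripcion) and the sample rows are hypothetical stand-ins for the tables in the diagram. SQLite, through Python's sqlite3 module, is used only so that the example runs on its own:

```python
import sqlite3

# Hypothetical minimal schema; the real Equipo/Servicio columns are in the diagram
SCHEMA = """
CREATE TABLE equipo   (id INTEGER PRIMARY KEY, descripcion TEXT NOT NULL);
CREATE TABLE servicio (id INTEGER PRIMARY KEY, descripcion TEXT NOT NULL);
-- One row per device/service registration, so history is never overwritten
CREATE TABLE equipo_servicio (
    id          INTEGER PRIMARY KEY,
    equipo_id   INTEGER NOT NULL REFERENCES equipo(id),
    servicio_id INTEGER NOT NULL REFERENCES servicio(id),
    act_date    TIMESTAMP NOT NULL,   -- activation date (fecha de alta)
    dea_date    TIMESTAMP,            -- deactivation date (fecha de baja); NULL while active
    CHECK (dea_date IS NULL OR dea_date >= act_date)
);
"""


def active_services(conn: sqlite3.Connection, equipo_id: int) -> list:
    """Return ids of services currently active for a device (no deactivation date yet)."""
    rows = conn.execute(
        "SELECT servicio_id FROM equipo_servicio "
        "WHERE equipo_id = ? AND dea_date IS NULL ORDER BY servicio_id",
        (equipo_id,),
    )
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO equipo VALUES (1, 'Router')")
conn.executemany("INSERT INTO servicio VALUES (?, ?)", [(10, "Maintenance"), (20, "Support")])
conn.executemany(
    "INSERT INTO equipo_servicio (equipo_id, servicio_id, act_date, dea_date) "
    "VALUES (?, ?, ?, ?)",
    [(1, 10, "2021-01-10 09:00:00", "2021-03-01 18:00:00"),
     (1, 20, "2021-02-15 09:00:00", None)],
)
print(active_services(conn, 1))  # [20]
```

With this layout, "currently active" is simply dea_date IS NULL. Putting the two columns directly on Servicio instead would only work if each service row were tied to a single device and a single activation period.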
Thank you.