I have a lot of JSON files in my bucket in GCS and I need to create a table for each one.
Normally, I do it manually in BigQuery: selecting the format (JSON), giving it a name, and using the automatically detected schema.
Is there any way of creating multiple tables at once using data from GCS?
Disclaimer: I have a blogpost authored on this topic at https://medium.com/p/54228d166a7d
Essentially, you can leverage Cloud Workflows to automate this process.
A sample workflow would be:
ProcessItem:
  params: [project, gcsPath]
  steps:
    - initialize:
        assign:
          - dataset: wf_samples
          - input: ${gcsPath}
    # omitted parts for simplicity
    - runLoadJob:
        call: BQJobsInsertLoadJob_FromGCS
        args:
          project: ${project}
          configuration:
            jobType: LOAD
            load:
              sourceUris: ${gcsPath}
              schema:
                fields:
                  - name: "mydate"
                    type: "TIMESTAMP"
                  - name: "col1"
                    type: "FLOAT"
                  - name: "col2"
                    type: "FLOAT"
              destinationTable:
                projectId: ${project}
                datasetId: ${dataset}
                tableId: ${"table_"+output.index}
        result: loadJobResult
    - final:
        return: ${loadJobResult}

BQJobsInsertLoadJob_FromGCS:
  params: [project, configuration]
  steps:
    - runJob:
        call: http.post
        args:
          url: ${"https://bigquery.googleapis.com/bigquery/v2/projects/"+project+"/jobs"}
          auth:
            type: OAuth2
          body:
            configuration: ${configuration}
        result: queryResult
        next: queryCompleted
    - queryCompleted:
        return: ${queryResult.body}
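To deploy and trigger this, here is a minimal sketch with the gcloud CLI, assuming the full workflow (including the omitted main block) is saved as workflow.yaml; the workflow name, location, and runtime arguments are placeholders:

# deploy the workflow from the local YAML file
gcloud workflows deploy load-gcs-json \
  --source=workflow.yaml \
  --location=us-central1

# run it, passing runtime arguments consumed by the main workflow
gcloud workflows run load-gcs-json \
  --location=us-central1 \
  --data='{"project":"my-project","gcsPath":"gs://mybucket/file1.json"}'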
In this answer you have a solution to recursively go through your bucket and load CSV files to BigQuery. You can adapt that approach, for instance:
gsutil ls gs://mybucket/**.json | \
xargs -I{} echo {} | \
awk '{n=split($1,A,"/"); q=split(A[n],B,"."); print "mydataset."B[1]" "$0}' | \
xargs -I{} sh -c 'bq --location=YOUR_LOCATION load --replace=false --autodetect --source_format=NEWLINE_DELIMITED_JSON {}'
This is if you want to run the load jobs manually as a batch.
If you want to add automation, you can use Workflows as @Pentium10 recommends, or plug the Bash command into a Cloud Run instance coupled with Cloud Scheduler, for instance (you can look at this repo for inspiration).
I am trying to create a container group and want to push its container logs to Log Analytics.
apiVersion: 2019-12-01
location: eastus2
name: mycontainergroup003
properties:
  containers:
  - name: mycontainer003
    properties:
      environmentVariables: []
      image: fluent/fluentd
      ports: []
      resources:
        requests:
          cpu: 1.0
          memoryInGB: 1.5
  osType: Linux
  restartPolicy: Always
  diagnostics:
    logAnalytics:
      workspaceId: /subscriptions/f446b796-978f-4fa0-8462-......../resourcegroups/v_deployment-docker_us/providers/microsoft.operationalinsights/workspaces/deployment-docker-logs
      workspaceKey: nEZSOUGe1huaCksRB2ahsFz/ibcaQr3WPdAHiLc............
tags: null
type: Microsoft.ContainerInstance/containerGroups
Now whenever I try to run:
az container create --resource-group rg-deployment-docker --name mycontainergroup003 --file .\azure-deploy-aci.yaml
I get the following error:
(InvalidLogAnalyticsWorkspaceId) The log analytics setting is invalid. WorkspaceId contains invalid character, e.g. '/', '.', etc.
Code: InvalidLogAnalyticsWorkspaceId
Message: The log analytics setting is invalid. WorkspaceId contains invalid character, e.g. '/', '.', etc.
Now I wish to pass the workspace ID with the help of a parameters JSON file, as described at:
https://learn.microsoft.com/en-us/azure/azure-monitor/logs/resource-manager-workspace
{
  "$schema": "https://schema.management.azure.com/schemas/2019-08-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "workspaceId": {
      "type": "string"
    }
  }
}
Now I would run the below command:
az container create --resource-group rg-deployment-docker --name mycontainergroup003 --file .\azure-deploy-aci.yaml --parameters parameters.json
but I get the error:
unrecognized arguments: --parameters parameters.json
It seems such arguments are invalid with the az container create command. Can someone please suggest an alternative?
You need to pass the Log Analytics workspace GUID instead of the entire resource ID in your YAML file. Also, as per the documentation, the az container create command doesn't have a --parameters flag.
After making the above changes, I am able to deploy the container without any issues.
Here is a sample screenshot of the output for reference:
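If it helps, a minimal sketch of fetching the workspace GUID (customerId) and key with the az CLI, assuming the workspace deployment-docker-logs in resource group v_deployment-docker_us from the YAML above:

# workspace GUID to put in logAnalytics.workspaceId
az monitor log-analytics workspace show \
  --resource-group v_deployment-docker_us \
  --workspace-name deployment-docker-logs \
  --query customerId -o tsv

# primary shared key to put in logAnalytics.workspaceKey
az monitor log-analytics workspace get-shared-keys \
  --resource-group v_deployment-docker_us \
  --workspace-name deployment-docker-logs \
  --query primarySharedKey -o tsv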
I am trying to get data with the Rundeck webhook plugin, and for this I am using a curl command:
curl -X POST -d '{"name":"John", "age":30, "car":null}' https://rundeck_server/api/12/webhook/QSxTDYd08dcYxKh1R5YJNOPQvmSJH2Z8#Netbox_Job
In the Rundeck webhook plugin options I add these two variables: whkpayload to get all the JSON data and name to get the name only (it must return John in this example):
-whkpayload ${raw} -name ${data.name}
And finally I print them with these lines:
echo #option.whkpayload#
echo #option.name#
I get an empty result and I can't figure out why. Can anyone help me, please?
Following this, you need to use an option called whkpayload in your job, and set it as ${raw} in the webhook configuration.
I made an example:
The job definition in YAML format (with the whkpayload option):
- defaultTab: nodes
  description: ''
  executionEnabled: true
  id: 0fcfca07-02f6-4583-a3eb-0002276bdf2d
  loglevel: INFO
  name: HelloWorld
  nodeFilterEditable: false
  options:
  - name: age
  - name: car
  - name: name
  - name: whkpayload
  plugins:
    ExecutionLifecycle: null
  scheduleEnabled: true
  sequence:
    commands:
    - description: command step
      exec: echo "name ${option.name} - age ${option.age} - car ${option.car} - payload
        ${option.whkpayload}"
    - description: inline-script step
      fileExtension: .sh
      interpreterArgsQuoted: false
      script: echo "name #option.name# - age #option.age# - car #option.car# - payload
        #option.whkpayload#"
      scriptInterpreter: /bin/bash
    keepgoing: false
    strategy: node-first
  uuid: 0fcfca07-02f6-4583-a3eb-0002276bdf2d
The webhook configuration.
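For reference, the option mapping in that webhook configuration would look something like the line below (the field names are taken from the cURL payload further down; the format mirrors the one used in the question):

-whkpayload ${raw} -name ${data.field1} -age ${data.field2} -car ${data.field3}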
Calling the webhook from cURL:
curl -H "Content-Type: application/json" -X POST -d '{"field1":"John", "field2":30, "field3":"chevy"}' http://localhost:4440/api/40/webhook/0vBZjWWrnXWvqENEdxkn0JRvjn5R63J0#MyWebhook
The result.
I have a JSON with key pairs and I want to access the values from Rundeck Options dynamically during the job execution.
For shell script, we can do a $RD_OPTIONS_<>.
Similarly is there some format I can use in a JSON file?
Just use #option.myoption# in an inline-script step.
You need a tool that you can use in an inline-script step to manipulate JSON files on Rundeck. I made an example using jq. Alternatively, you can use bash script-fu to reach the same goal.
For example, using this JSON file:
{
"books": [{
"fear_of_the_dark": {
"author": "John Doe",
"genre": "Mistery"
}
}]
}
Update the file with the following jq call:
To test directly in your terminal:
jq '.books[].fear_of_the_dark += { "ISBN" : "9999" }' myjson.json
On a Rundeck inline-script step:
echo "$(jq ''.books[].fear_of_the_dark += { "ISBN" : "#option.isbn#" }'' myjson.json)" > myjson.json
Check how it looks in an inline-script job (check here to learn how to import the job definition into your Rundeck instance).
- defaultTab: nodes
  description: ''
  executionEnabled: true
  id: d8f1c0e7-a7c6-43d4-91d9-25331cc06560
  loglevel: INFO
  name: JQTest
  nodeFilterEditable: false
  options:
  - label: isbn number
    name: isbn
    required: true
  plugins:
    ExecutionLifecycle: null
  scheduleEnabled: true
  sequence:
    commands:
    - description: original file content
      exec: cat myjson.json
    - description: pass the option and save the content to the json file
      fileExtension: .sh
      interpreterArgsQuoted: false
      script: 'echo "$(jq ''.books[].fear_of_the_dark += { "ISBN" : "#option.isbn#"
        }'' myjson.json)" > myjson.json'
      scriptInterpreter: /bin/bash
    - description: modified file content (after jq)
      exec: cat myjson.json
    keepgoing: false
    strategy: node-first
  uuid: d8f1c0e7-a7c6-43d4-91d9-25331cc06560
Finally, check the result.
Here you can check more about executing scripts on Rundeck and here more about the JQ tool.
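If you prefer the environment-variable form mentioned in the question, a hedged equivalent for the inline-script step (Rundeck exposes job options to scripts as RD_OPTION_<NAME>, so the isbn option above becomes RD_OPTION_ISBN):

# read the option from the environment instead of the #option.isbn# token
jq --arg isbn "$RD_OPTION_ISBN" \
  '.books[].fear_of_the_dark += { "ISBN": $isbn }' myjson.json > myjson.tmp \
  && mv myjson.tmp myjson.json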
I would like to export a value from a stack that I am running and then import it into another stack as sort of a "global" parameter, so that I can manipulate it and use it for an S3 bucket name. I already know that I can import the value individually on a line within a resource using something like:
{ "Fn::ImportValue" : { "Fn::Sub" : "${StackName}-ParameterName" } }
But is there a way to import it into my Parameters section?
Thanks for any help
But is there a way to import it into my Parameters section?
There is no such option. The closest you could get would be to save your global values in SSM Parameter Store and use dynamic references in CloudFormation as Default values in your Parameters.
There are two ways to achieve this:
1. Use SSM Parameter Store: store the value from the source stack in SSM Parameter Store
BasicParameter:
  Type: AWS::SSM::Parameter
  Properties:
    Name: AvailabilityZone
    Type: String
    Value:
      Ref: AvailabilityZone
and then reference the value directly in the Parameters section, like below:
---
AWSTemplateFormatVersion: '2010-09-09'
Parameters:
  ...
  AvailabilityZone:
    Description: Amazon EC2 instance Availability Zone
    Type: AWS::SSM::Parameter::Value<String>
    Default: AvailabilityZone
Mappings: {}
Conditions: {}
Resources:
  myinstance:
    Type: AWS::EC2::Instance
    Properties:
      AvailabilityZone:
        Ref: AvailabilityZone
  ...
The full example can be found here.
2. Consume the outputs from the source stack and pass them to the destination stack while launching it.
Configure the output in the source stack:
Outputs:
  InstanceID:
    Description: The Instance ID
    Value: !Ref EC2Instance
Consume them in the destination stack:
aws \
  --region us-east-1 \
  cloudformation deploy \
  --template-file cfn.yml \
  --stack-name mystack \
  --no-fail-on-empty-changeset \
  --tags Application=awesomeapp \
  --parameter-overrides \
  "Somevar=OUTPUT_FROM_SOURCE_STACK"
I am trying to create an OpenShift template for a Job that passes the job's command line arguments in a template parameter, using the following template:
apiVersion: v1
kind: Template
metadata:
  name: test-template
objects:
- apiVersion: batch/v2alpha1
  kind: Job
  metadata:
    name: "${JOB_NAME}"
  spec:
    parallelism: 1
    completions: 1
    autoSelector: true
    template:
      metadata:
        name: "${JOB_NAME}"
      spec:
        containers:
        - name: "app"
          image: "batch-poc/sample-job:latest"
          args: "${{JOB_ARGS}}"
parameters:
- name: JOB_NAME
  description: "Job Name"
  required: true
- name: JOB_ARGS
  description: "Job command line parameters"
Because the 'args' need to be an array, I am trying to set the template parameter using JSON syntax, e.g. from the command line:
oc process -o=yaml test-template -v=JOB_NAME=myjob,JOB_ARGS='["A","B"]'
or programmatically through the Spring Cloud Launcher OpenShift Client:
OpenShiftClient client;
Map<String,String> templateParameters = new HashMap<String,String>();
templateParameters.put("JOB_NAME", jobId);
templateParameters.put("JOB_ARGS", "[ \"A\", \"B\", \"C\" ]");
KubernetesList processed = client.templates()
    .inNamespace(client.getNamespace())
    .withName("test-template")
    .process(templateParameters);
In both cases, it seems to fail because OpenShift is interpreting the comma after the first array element as a delimiter and not parsing the remainder of the string.
The oc process command sets the parameter value to '["A"' and reports an error: "invalid parameter assignment in "test-template": "\"B\"]"".
The Java version throws an exception:
Error executing: GET at: https://kubernetes.default.svc/oapi/v1/namespaces/batch-poc/templates/test-template. Cause: Can not deserialize instance of java.util.ArrayList out of VALUE_STRING token\n at [Source: N/A; line: -1, column: -1] (through reference chain: io.fabric8.openshift.api.model.Template[\"objects\"]->java.util.ArrayList[0]->io.fabric8.kubernetes.api.model.Job[\"spec\"]->io.fabric8.kubernetes.api.model.JobSpec[\"template\"]->io.fabric8.kubernetes.api.model.PodTemplateSpec[\"spec\"]->io.fabric8.kubernetes.api.model.PodSpec[\"containers\"]->java.util.ArrayList[0]->io.fabric8.kubernetes.api.model.Container[\"args\"])
I believe this is due to a known OpenShift issue.
I was wondering if anyone has a workaround or an alternative way of setting the job's parameters?
Interestingly, if I go to the OpenShift web console, click 'Add to Project' and choose test-template, it prompts me to enter a value for the JOB_ARGS parameter. If I enter a literal JSON array there, it works, so I figure there must be a way to do this programmatically.
We worked out how to do it; template snippet:
spec:
  securityContext:
    supplementalGroups: "${{SUPPLEMENTAL_GROUPS}}"
parameters:
- description: Supplemental linux groups
  name: SUPPLEMENTAL_GROUPS
  value: "[14051, 14052, 48, 65533, 9050]"
In our case we have three files:
- an environment configuration file,
- a template YAML file,
- an sh file which runs oc process.
And the working case looks like this:
Environment file:
#-- CORS ---------------------------------------------------------
cors_origins='["*"]'
cors_acceptable_headers='["*","Authorization"]'
Template YAML:
- apiVersion: configuration.konghq.com/v1
  kind: KongPlugin
  metadata:
    name: plugin-common-cors
    annotations:
      kubernetes.io/ingress.class: ${ingress_class}
  config:
    origins: "${{origins}}"
    headers: "${{acceptable_headers}}"
    credentials: true
    max_age: 3600
  plugin: cors
The sh file running oc:
if [ -f templates/kong-plugins-template.yaml ]; then
  echo "++ Applying Global Plugin Template ..."
  oc process -f templates/kong-plugins-template.yaml \
    -p ingress_class="${kong_ingress_class}" \
    -p origins=${cors_origins} \
    -p acceptable_headers=${cors_acceptable_headers} \
    -p request_per_second=${kong_throttling_request_per_second:-100} \
    -p request_per_minute=${kong_throttling_request_per_minute:-2000} \
    -p rate_limit_by="${kong_throttling_limit_by:-ip}" \
    -o yaml \
    > yaml.tmp && \
    cat yaml.tmp | oc $param_mode -f -
  [ $? -ne 0 ] && [ "$param_mode" != "delete" ] && exit 1
  rm -f *.tmp
fi
The sh file should source the environment file first.
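For completeness, a hedged sketch of what that sourcing could look like at the top of the sh file (the file name environment.conf is an assumption, not from the original post):

# load cors_origins, cors_acceptable_headers and the kong_* variables used by oc process below
. ./environment.conf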