I have sheets that have a data range connected to a pivot table. The pivot table has a secondary row sort by the last column. I have queries that update the data range and add a column. I would like to be able to use Google Apps Script to sort by that new column. I can get the current pivot table and seem to be updating it, but the changes do not seem to be happening. I used the method here. The original table is as follows:
{
"columns":[
{
"sortOrder":"ASCENDING",
"sourceColumnOffset":2
}
],
"values":[
{
"summarizeFunction":"SUM",
"sourceColumnOffset":4
}
],
"source":{
"endColumnIndex":5,
"startRowIndex":6,
"endRowIndex":193,
"sheetId":698433721,
"startColumnIndex":0
},
"rows":[
{
"valueBucket":{
"buckets":[
{
"stringValue":"6/26/2019"
}
]
},
"showTotals":true,
"sortOrder":"DESCENDING",
"sourceColumnOffset":3
}
]
}
The "stringValue" contains the text of the header of the current column. I assumed that changing it would change the sort column, but it has no effect.
When I read the pivot table again after making the change, the new data shows up, but the UI representation of the pivot table does not change and the sort column does not change.
Are you replacing the whole pivot table? I think the documentation sets this as a requirement, no? To quote: "Essentially, editing a pivot table requires replacing it with a new one."
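If so, one way is to build a fresh spec from the one you read back and write the whole thing out again. A minimal sketch in plain JavaScript; the new column offset and the batchUpdate/updateCells call that would apply the spec are assumptions for illustration, not taken from the question:

```javascript
// Sketch: derive a replacement pivot table spec from the existing one,
// re-pointing the summarized value at a newly added source column.
// newColumnOffset is the 0-based offset of the new column inside the
// pivot source range (an assumed parameter for illustration).
function buildReplacementSpec(existing, newColumnOffset) {
  // Deep-copy so the spec read back from the API is left untouched.
  var spec = JSON.parse(JSON.stringify(existing));
  // Widen the source range so it covers the column the query added.
  spec.source.endColumnIndex =
    Math.max(spec.source.endColumnIndex, newColumnOffset + 1);
  // Re-point the SUM value at the new column.
  spec.values[0].sourceColumnOffset = newColumnOffset;
  return spec;
}
```

The replacement spec would then be written back in a single batchUpdate (e.g. an updateCells request carrying the pivotTable), rather than mutating the object you read in place.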
I have a bunch of JSON files which have an array with column names and a separate array for the rows.
I want a dynamic way of retrieving the column names and merging them with the rows for each JSON file.
I've been playing around with derived columns and column patterns, but am struggling to get it working.
I want the column names from [data.columns.shortText] and the values from each corresponding [data.rows.value], matched by order.
Example format
{
"messages":{
},
"data":{
"columns":[
{
"columnName":"SelectionCriteria1",
"shortText":"Case no."
},
{
"columnName":"SelectionCriteria2",
"shortText":"Period for periodical values",
},
{
"columnName":"SelectionCriteria3",
"shortText":"Location"
},
{
"columnName":"SelectionCriteriaAggregate",
"shortText":"Value"
}
],
"rows":[
[
{
"value":"23523"
},
{
"value":12342349
},
{
"value":"234234",
"code":3342
},
{
"value":234234234
}
]
]
}
}
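For reference, the intended position-based merge is easy to state outside ADF. A sketch in plain JavaScript, assuming every row carries exactly one cell per column, in column order:

```javascript
// Pair data.columns[i].shortText with data.rows[r][i].value, by position,
// producing one flat object per row.
function mergeColumnsWithRows(data) {
  var names = data.columns.map(function (c) { return c.shortText; });
  return data.rows.map(function (row) {
    var out = {};
    row.forEach(function (cell, i) { out[names[i]] = cell.value; });
    return out;
  });
}
```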
First, you need to fix your JSON data: I can see you have an extra comma in the second object of columns, and in rows you have value both as an int and as a string, so when I tried to parse it in ADF I got an error.
I don't quite understand why you're trying to merge by position, because normally there are more rows than columns; if you get 5 rows and 3 columns you will get an error.
Here is my approach to your problem:
The main idea is that I added an index column to both arrays and joined the JSONs with an inner join.
1. Created the source data (it's two sources, but you can make it one to simplify your data flow).
2. Added a Select activity to pick the relevant arrays from the data.
3. Flattened the arrays (in order to add an index column).
4. Added an index using a Rank activity (please read up on rank and dense rank and the difference between the two).
5. Added a Join activity, inner join on the index column.
6. Added a Select activity to remove the index column from the result.
7. Saved the output to the sink.
JSON data that I worked with:
Data Flow:
SelectRows Activity:
Flatten Activity:
Rank activity:
Join activity:
Please check these links:
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-expressions-usage#mapAssociation
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-map-functions
I am facing some difficulties while trying to create a query that can match only whole phrases but also allow wildcards.
Basically I have a field that contains a string (it is actually a list of strings, but for simplicity I am skipping that), which can contain white spaces or be null; let's call it "color".
For example:
{
...
"color": "Dull carmine pink"
...
}
My queries need to be able to do the following:
search for null values (inclusive and exclusive)
search for non null values (inclusive and exclusive)
search for and match only a whole phrase (inclusive and exclusive). For example:
dull carmine pink --> match
carmine pink --> not a match
same as the last, but with wildcards (inclusive and exclusive). For example:
?ull carmine p* --> match to "Dull carmine pink"
dull carmine* -> match to "Dull carmine pink"
etc.
I have been bumping my head against the wall for a few days with this and I have tried almost every type of query I could think of.
I have only managed to make it work partially with a span_near query with the help of this topic.
So basically I can now:
search for a whole phrase with/without wildcards like this:
{
"span_near": {
"clauses": [
{
"span_term": {"color": "dull"}
},
{
"span_term": {"color": "carmine"}
},
{
"span_multi": {"match": {"wildcard": {"color": "p*"}}}
}
],
"slop": 0,
"in_order": true
}
}
search for null values (inclusive and exclusive) by simple must/must_not queries like this:
{
"must" / "must_not": {"exists": {"field": "color"}}
}
The problem:
I cannot find a way to make an exclusive span query. The only way I can find is this, but it requires both the include and exclude fields, and I am only trying to exclude some documents; all others must be returned. Is there some analog of the "match_all": {} query that can work inside a span_not's include field? Or perhaps an entirely new, more elegant solution?
I found the solution a month ago, but I forgot to post it here.
I do not have an example at hand, but I will try to explain it.
The problem was that the fields I was trying to query were analyzed by Elasticsearch before querying. The analyzer in question was splitting them on spaces, etc. The solution to this problem is one of the following two:
1. If you do not use a custom mapping for the index.
(Meaning you let Elasticsearch dynamically create the appropriate mapping for your field when you added it.)
In this case Elasticsearch automatically creates a subfield of the text field called "keyword". This subfield uses the "keyword" analyzer, which does not process the data in any way prior to querying.
Which means that queries like:
{
"query": {
"bool": {
"must": [ // must_not
{
"match": {
"user.keyword": "Kim Chy"
}
}
]
}
}
}
and
{
"query": {
"bool": {
"must": [ // must_not
{
"wildcard": {
"user.keyword": "Kim*y"
}
}
]
}
}
}
should work as expected.
However with the default mapping, the keyword field will most likely be case-sensitive. In order for it to be case-insensitive as well, you will need to create a custom mapping, that applies a lower-case (or upper-case) normalizer to the query and keyword field prior to matching.
2. If you use a custom mapping
Basically the same as above; however, you will have to create the new subfield (or field) manually so that it uses the keyword analyzer (and possibly a normalizer in order for it to be case-insensitive).
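A sketch of an index-creation body along those lines (the index layout and field names are assumptions based on the question; the custom lowercase normalizer is what makes the keyword matching case-insensitive):

```json
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "color": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "normalizer": "lowercase_normalizer"
          }
        }
      }
    }
  }
}
```

With this in place, term and wildcard queries against "color.keyword" match the whole phrase, case-insensitively.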
P.S. As far as I am aware, changing an existing field's mapping is not possible in Elasticsearch. This means that you will have to create a new index with the appropriate mapping and then reindex your data into the new index.
I want to achieve the following use case:
I have a file as follows:
{"FirstName":6785,"Lastname":"Charles","Address":"1103 pioneer St"}
{"HouseName":67783,"Lastname":"Stevenson","Address":"Abel St"}
{"FoodName":7473,"Lastname":"luther","Address":"Half Moon Bay"}
So I want to add a "NAME" and a "Value" tag in the first column across each row, so I can easily push all the FirstName, HouseName, and FoodName attributes under one MySQL column named "Name" and their respective values under a "Value" column. For example, I want the data to look like the following:
{NAME:"FirstName","Value":6785,"Lastname":"Charles","Address":"1103 pioneer St"}
{NAME:"HouseName","Value":67783,"Lastname":"Stevenson","Address":"Abel St"}
{NAME:"FoodName","Value":7473,"Lastname":"luther","Address":"Half Moon Bay"}
My table in MySQL is as follows:
Name Value Lastname Address
I am using the following flow:
GetFile->SplitRecord->ConvertJsonToSQL ->PutSQL
Under the NAME column I want all the first-column attribute names of each row (FirstName, HouseName, FoodName), and under the Value column their respective values.
How can I achieve this use case in NiFi?
What regex should I use in ReplaceText to achieve this? Any help is appreciated. Thank you!
You can use GetFile -> JoltTransformJson (or JoltTransformRecord) -> PutDatabaseRecord for this, which avoids the split and separate SQL generation/execution. Use the following spec in JoltTransformRecord:
[
{
"operation": "shift",
"spec": {
"*": {
"Address": "[#2].Address",
"Lastname": "[#2].Lastname",
"*Name": {
"$": "[#3].Name",
"#": "[#3].Value"
}
}
}
}
]
Notice I changed the output NAME column to Name to match your MySQL column name; it makes things easier in PutDatabaseRecord when the columns match the field names.
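For reference, here is the per-record reshaping the spec performs, sketched in plain JavaScript (not part of the flow, just to make the intent concrete):

```javascript
// Sketch of what the shift does to each record: the key matching "*Name"
// becomes the Name field, its value becomes the Value field, and the
// remaining fields pass through unchanged. Like the Jolt pattern, the
// comparison is case-sensitive, so "Lastname" does not match.
function toNameValue(record) {
  var out = {};
  Object.keys(record).forEach(function (key) {
    if (key.slice(-4) === "Name") {
      out.Name = key;
      out.Value = record[key];
    } else {
      out[key] = record[key];
    }
  });
  return out;
}
```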
I'm using Google App Script to migrate data through BigQuery and I've run into an issue because the SQL I'm using to perform a WRITE_TRUNCATE load is causing the destination table to be recreated with column modes of NULLABLE rather than their previous mode of REQUIRED.
Attempting to change the modes to REQUIRED after the data is loaded using a metadata patch causes an error even though the columns don't contain any null values.
I considered working around the issue by dropping the table and recreating it again with the same REQUIRED modes, then loading the data using WRITE_APPEND instead of WRITE_TRUNCATE. But this isn't possible because a user wants to have the same source and destination table in their SQL.
Does anyone know if it's possible to define a BigQuery.Jobs.insert request that includes the output schema information/metadata?
If it's not possible, the only alternative I can see is to use my original workaround of a WRITE_APPEND but add a temporary table into the process, to allow for the destination table appearing in the source SQL. But if this can be avoided, that would be nice.
Additional Information:
I did experiment with different ways of setting the schema information but when they didn't return an error message the schema seemed to get ignored.
I.e. this is the json I'm passing into BigQuery.Jobs.insert
jsnConfig =
{
"configuration":
{
"query":
{
"destinationTable":
{
"projectId":"my-project",
"datasetId":"sandbox_dataset",
"tableId":"hello_world"
},
"writeDisposition":"WRITE_TRUNCATE",
"useLegacySql":false,
"query":"SELECT COL_A, COL_B, '1' AS COL_C, COL_TIMESTAMP, COL_REQUIRED FROM `my-project.sandbox_dataset.hello_world_2` ",
"allowLargeResults":true,
"schema":
{
"fields":
[
{
"description":"Desc of Column A",
"type":"STRING",
"mode":"NULLABLE",
"name":"COL_A"
},
{
"description":"Desc of Column B",
"type":"STRING",
"mode":"REQUIRED",
"name":"COL_B"
},
{
"description":"Desc of Column C",
"type":"STRING",
"mode":"REPEATED",
"name":"COL_C"
},
{
"description":"Desc of Column Timestamp",
"type":"INTEGER",
"mode":"NULLABLE",
"name":"COL_TIMESTAMP"
},
{
"description":"Desc of Column Required",
"type":"STRING",
"mode":"REQUIRED",
"name":"COL_REQUIRED"
}
]
}
}
}
}
var job = BigQuery.Jobs.insert(jsnConfig, "my-project");
The result is that the new or existing hello_world table is truncated and loaded with the data specified in the query (so part of the json package is being read), but the column descriptions and modes aren't added as defined in the schema section. They're just blank and NULLABLE in the table.
More
When I tested the REST request above using Google's API page for BigQuery.Jobs.insert, it highlighted the "schema" property in the request as invalid. It appears the schema can be defined if you're loading the data from a file, i.e. BigQuery.Jobs.load, but it doesn't seem to be supported if you're populating the data from a SQL source.
See the documentation here: https://cloud.google.com/bigquery/docs/schemas#specify-schema-manual-python
You can pass a schema object with your load job, meaning you can set fields to mode=REQUIRED
This is the command you should use:
bq --location=[LOCATION] load --source_format=[FORMAT] [PROJECT_ID]:[DATASET].[TABLE] [PATH_TO_DATA_FILE] [PATH_TO_SCHEMA_FILE]
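For comparison with the query-job configuration in the question, the schema can be carried on a load job's JSON configuration. A sketch only; the project, dataset, table, source URI, and field list are placeholders modeled on the question:

```json
{
  "configuration": {
    "load": {
      "destinationTable": {
        "projectId": "my-project",
        "datasetId": "sandbox_dataset",
        "tableId": "hello_world"
      },
      "sourceUris": ["gs://my-bucket/data.json"],
      "sourceFormat": "NEWLINE_DELIMITED_JSON",
      "writeDisposition": "WRITE_TRUNCATE",
      "schema": {
        "fields": [
          { "name": "COL_B", "type": "STRING", "mode": "REQUIRED" },
          { "name": "COL_REQUIRED", "type": "STRING", "mode": "REQUIRED" }
        ]
      }
    }
  }
}
```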
As @Roy answered, this is done via load only. Can you output the logs of this command?
I am trying to learn mongodb. Suppose there are two tables and they are related. For example like this -
1st table has
First name- Fred, last name- Zhang, age- 20, id- s1234
2nd table has
id- s1234, course- COSC2406, semester- 1
id- s1234, course- COSC1127, semester- 1
id- s1234, course- COSC2110, semester- 1
How do I insert this data into MongoDB? I wrote it like this, but I'm not sure whether it is correct:
db.users.insert({
given_name: 'Fred',
family_name: 'Zhang',
Age: 20,
student_number: 's1234',
Course: ['COSC2406', 'COSC1127', 'COSC2110'],
Semester: 1
});
Thank you in advance
This would be assuming that what you want to model has the "student_number" and the "Semester" as what is basically a unique identifier for the entries. But there is a way to do this without accumulating the array contents in code.
You can make use of the upsert functionality in the .update() method, with the help of a few other operators in the statement.
I am going to assume you are doing this inside a loop of sorts, so everything on the right-hand side is actually a variable:
db.users.update(
{
"student_number": student_number,
"Semester": semester
},
{
"$setOnInsert": {
"given_name": given_name,
"family_name": family_name,
"Age": age
},
"$addToSet": { "courses": course }
},
{ "upsert": true }
)
What happens in an "upsert" operation is that it first looks for a document in your collection that matches the given query criteria; in this case, a "student_number" with the current "Semester" value.
When that match is found, the document is merely "updated". So what is being done here is using the $addToSet operator in order to "update" only unique values into the "courses" array element. This would seem to make sense to have unique courses but if that is not your case then of course you can simply use the $push operator instead. So that is the operation you want to happen every time, whether the document was "matched" or not.
In the case where no "matching" document is found, a new document will then be inserted into the collection. This is where the $setOnInsert operator comes in.
So the point of that section is that it will only be called when a new document is created as there is no need to update those fields with the same information every time. In addition to this, the fields you specified in the query criteria have explicit values, so the behavior of the "upsert" is to automatically create those fields with those values in the newly created document.
After a new document is created, then the next "upsert" statement that uses the same criteria will of course only "update" the now existing document, and as such only your new course information would be added.
Overall working like this allows you to "pre-join" the two tables from your source with an appropriate query. Then you are just looping the results without needing to write code for trying to group the correct entries together and simply letting MongoDB do the accumulation work for you.
Of course you can always just write the code to do this yourself and it would result in fewer "trips" to the database in order to insert your already accumulated records if that would suit your needs.
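A sketch of that do-it-yourself accumulation in plain JavaScript (field names follow the question; the two input arrays are assumed to be the source tables read into memory):

```javascript
// Group the course rows under each student row before inserting, so each
// student becomes a single document with an embedded "courses" array.
function accumulate(studentRows, courseRows) {
  return studentRows.map(function (s) {
    return {
      given_name: s.given_name,
      family_name: s.family_name,
      Age: s.Age,
      student_number: s.student_number,
      // Keep only this student's course rows, reshaped as subdocuments.
      courses: courseRows
        .filter(function (c) { return c.id === s.student_number; })
        .map(function (c) { return { courseId: c.course, semester: c.semester }; })
    };
  });
}
```

Each resulting document can then be written with a single insert per student.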
As a final note, though it does require some additional complexity, you can get better performance out of the operation as shown by using the newly introduced "batch updates" functionality. For this, your MongoDB server version will need to be 2.6 or higher. But that is one way of still reducing the logic while keeping fewer actual "over the wire" writes to the database.
You can either have two separate collections, one with student details and the other with courses, and link them with "id".
Or you can have a single document with the courses as an inner array of documents, as below:
{
"FirstName": "Fred",
"LastName": "Zhang",
"age": 20,
"id": "s1234",
"Courses": [
{
"courseId": "COSC2406",
"semester": 1
},
{
"courseId": "COSC1127",
"semester": 1
},
{
"courseId": "COSC2110",
"semester": 1
},
{
"courseId": "COSC2110",
"semester": 2
}
]
}