CSV, Grouping Products with their parent

I've got a CSV file which will be used to import products into WooCommerce.
It was generated from Magento, and I am having severe trouble grouping parent products with their children (variations). The parent product has the Type "configurable" and the child "simple", and I cannot for the life of me find any unique identifier shared between the two that would let me import the children into their parent product.
I notice that each parent product SKU ends with 00 and increments for each of its children (01, 02, 03). I've also noticed that each parent product name is very similar to the child product name, except for the added variation attribute such as color (- Black, - Blue, - Red, etc.). With the SKUs and product names being so similar, would it be possible to match them on either one?
I'm thinking this has to be possible using OpenRefine.
Here is some example data. If you've got any idea about a method I can use to combine the two, please don't hesitate to elaborate.
ProductNo,Sku,Type,Product_name,Product_description,Price,CreatedDate,Status,categories
32,VIS00500,configurable,"Spinner II Battery","Batterycapacity: 1650mah",
33,Vis00501,simple,"Spinner II Battery - Black","Spinner II Battery - Black",0.0000,2014-10-02,Enabled,"Shop by Brand>Vision>Batteries and MODs",
And here is another, showing what I hope to achieve:
ProductNo,Sku,ParentID,Type,Product_name,Product_description,Price,CreatedDate,Status,categories
32,VIS00500, ,configurable,"Spinner II Battery","1650mah",14.1800,2014-10-02,Enabled,"Shop by Brand>Vision>Batteries and MODs",
33,Vis00501,32,simple,"Spinner II Battery - Black","Spinner II Battery - Black",0.0000,2014-10-02,Enabled,"Shop by Brand>Vision>Batteries and MODs",

I see you've provided much more detailed explanations on the OpenRefine Google group.
So, could you apply these operations (using "Undo/Redo") to your dataset and tell me whether the result is right for you?
[
{
"op": "core/row-reorder",
"description": "Reorder rows",
"mode": "row-based",
"sorting": {
"criteria": [
{
"errorPosition": 1,
"caseSensitive": false,
"valueType": "string",
"column": "Sku",
"blankPosition": 2,
"reverse": false
}
]
}
},
{
"op": "core/column-addition",
"description": "Create column isparent at index 2 based on column Sku using expression grel:if(value.match(/(.+00$)/).length() > 0, \"parent\", \"\")",
"engineConfig": {
"mode": "row-based",
"facets": []
},
"newColumnName": "isparent",
"columnInsertIndex": 2,
"baseColumnName": "Sku",
"expression": "grel:if(value.match(/(.+00$)/).length() > 0, \"parent\", \"\")",
"onError": "set-to-blank"
},
{
"op": "core/column-move",
"description": "Move column isparent to position 0",
"columnName": "isparent",
"index": 0
},
{
"op": "core/column-addition",
"description": "Create column parent at index 3 based on column Sku using expression grel:if(row.record.cells.Sku.value[0] != value, row.record.cells.Sku.value[0], \"\")",
"engineConfig": {
"mode": "row-based",
"facets": []
},
"newColumnName": "parent",
"columnInsertIndex": 3,
"baseColumnName": "Sku",
"expression": "grel:if(row.record.cells.Sku.value[0] != value, row.record.cells.Sku.value[0], \"\")",
"onError": "set-to-blank"
}
]
If you prefer human explanations:
1° Sort your column "Sku" as text in ascending order (a-z), then reorder rows permanently.
2° Create a column "isparent" based on the column "Sku" using the following GREL formula, and move this new column to the beginning (OpenRefine starts a new record at each non-blank cell in the first column, so every parent row then opens a record that contains its children):
if(value.match(/(.+00$)/).length() > 0, "parent", "")
3° Create a new column "parent" based on "Sku" using this GREL formula:
if(row.record.cells.Sku.value[0] != value, row.record.cells.Sku.value[0], "")
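For comparison, the same grouping can be sketched outside OpenRefine. This is a minimal Python illustration, not the OpenRefine method above. It assumes that the last two characters of every SKU are the variation counter, that SKUs should be compared case-insensitively (the sample mixes VIS and Vis), and that the parent's ProductNo is what belongs in ParentID, as in the desired output. The third sample row (ProductNo 34) and the function name add_parent_ids are hypothetical.

```python
import csv
import io


def add_parent_ids(rows):
    """Return copies of rows with a ParentID column (blank for parents)."""
    # Map each parent's SKU prefix (SKU minus the trailing "00") to its ProductNo.
    parents = {}
    for row in rows:
        sku = row["Sku"].strip().upper()
        if row["Type"] == "configurable" and sku.endswith("00"):
            parents[sku[:-2]] = row["ProductNo"]

    result = []
    for row in rows:
        sku = row["Sku"].strip().upper()
        parent_id = ""
        if row["Type"] == "simple":
            # A child shares its parent's prefix; unmatched children stay blank.
            parent_id = parents.get(sku[:-2], "")
        result.append({**row, "ParentID": parent_id})
    return result


# Trimmed sample; the row with ProductNo 34 is made up for illustration.
sample = """ProductNo,Sku,Type,Product_name
32,VIS00500,configurable,Spinner II Battery
33,Vis00501,simple,Spinner II Battery - Black
34,Vis00502,simple,Spinner II Battery - Blue
"""

rows = list(csv.DictReader(io.StringIO(sample)))
grouped = add_parent_ids(rows)
for row in grouped:
    print(row["ProductNo"], row["Sku"], row["ParentID"])
```

Children whose prefix matches no parent keep a blank ParentID, which makes them easy to review by hand before the import.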

Related

JSON beginner: (can I/how can I/should I) automatically convert a clean JSON array into many SQL tables in a particular way?

Background: I am a JSON beginner and advanced-beginner SQL user. I am trying to learn how to parse JSON using MySQL, and I came up with the following problem to try to solve. This might be a stupid thing to want to do--let me know.
Terminology: As I understand it, I can refer to everything in between a pair of curly braces ("{..}") as an "object," and every pair of things separated by a ":" as a "key-value pair," and everything in between a pair of brackets ("[..]") as an "array." That's the terminology I'll use below, but if that's wrong then I will change it.
Problem: This question has a lot of potentially incorrect assumptions built into it. I want to start with a JSON array and do the following things. (Can I/should I/how can I) do this?
Convert every array of the form [a,b,c,...] into an object of the form {"1":a,"2":b,"3":c,...}, including arrays that are inside of other objects or arrays. (Except don't do this to the outermost array.)
For every key whose value is sometimes an object and sometimes not an object, convert all of its associated "non-object" values x into objects of the form {"genericvalue" : x}.
Find a string that is never used as a key within the JSON file; call it f.
To every object, add a key-value pair equal to "f" : i, where i is an integer such that no two objects have the same key-value pair "f" : i. (Trying to create some kind of unique identifier.)
Define a "key path" as a sequence of keys {k_1,k_2,k_3...k_n} such that the value associated with k_1 is an object containing k_2, the value associated with k_2 is an object containing k_3, and so on. Enumerate all "key paths" associated with the JSON file.
For each "key path," do the following:
    If this key path never has an associated key-value pair whose value is an object, don't make a table.
    If this key path always has an associated key-value pair whose value is an object, make a table with one column for each key that appears in any of the objects associated with this key path, plus another column called f_parent.
    I think the above should be the only possibilities, because of step 2.
For each object o, create a row in the table associated with its key path with the following field values:
    If o is nested inside another object, the "f_parent" column should be equal to the value associated with the key "f" in the object containing o. Otherwise, the "f_parent" column should be null.
    For all of the other columns {k_1,k_2,...k_n}, the column should be a) null if k_n is not a key that appears within o, or b) equal to the value associated with k_n if k_n is a key that appears within o.
Example: I want to start with a JSON array like this:
[
{
"color": "pink",
"flavor": "strawberry",
"ingredients": {
"1": "strawberries",
"2": "chemical x"
}
},
{
"color": "green",
"flavor": "pancake",
"tags": [
"breakfast",
"green"
]
},
{
"color": "red",
"flavor": "pumpkin spice",
"price": 3,
"ingredients": {
"0": "pumpkin",
"1": "spice"
},
"tags": "seasonal"
}
]
Let's set f equal to "recordid". After steps 1-3, I think I want it to look like this:
[
{
"recordid" : 1,
"color": "pink",
"flavor": "strawberry",
"ingredients": {
"recordid" : 2,
"1": "strawberries",
"2": "chemical x"
}
},
{
"recordid" : 3,
"color": "green",
"flavor": "pancake",
"tags": {
"recordid" : 4,
"1":"breakfast",
"2":"green"
}
},
{
"recordid" : 5,
"color": "red",
"flavor": "pumpkin spice",
"price": 3,
"ingredients": {
"recordid" : 6,
"0": "pumpkin",
"1": "spice"
},
"tags": {
"recordid" : 7,
"genericvalue":"seasonal"
}
}
]
So basically the array of tags for green pancake has been turned into an object, the single "seasonal" tag for red pumpkin spice has been turned into an object, and everything now has a record id.
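A rough sketch of steps 1-3 in Python, to make the transformation concrete. It is an illustration rather than a MySQL solution, and it assumes "recordid" is the unused key f, that ids are handed out parent-first in document order (which is what the example shows), and that "sometimes an object" is judged per key name across the whole file. The function names are hypothetical.

```python
from itertools import count


def arrays_to_objects(value, outermost=True):
    """Step 1: turn every list except the outermost into a dict keyed by position."""
    if isinstance(value, list):
        items = [arrays_to_objects(item, outermost=False) for item in value]
        if outermost:
            return items
        return {str(i): item for i, item in enumerate(items, start=1)}
    if isinstance(value, dict):
        return {key: arrays_to_objects(item, outermost=False) for key, item in value.items()}
    return value


def iter_objects(value):
    """Yield every dict nested anywhere inside value."""
    if isinstance(value, dict):
        yield value
        for item in value.values():
            yield from iter_objects(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_objects(item)


def wrap_scalars(data):
    """Step 2: if a key holds an object anywhere, wrap its non-object values."""
    objects = list(iter_objects(data))
    object_keys = {k for obj in objects for k, v in obj.items() if isinstance(v, dict)}
    for obj in objects:
        for key, item in obj.items():
            if key in object_keys and not isinstance(item, dict):
                obj[key] = {"genericvalue": item}
    return data


def add_ids(value, counter, id_key="recordid"):
    """Step 3: give every object a unique integer id, parents before children."""
    if isinstance(value, list):
        return [add_ids(item, counter, id_key) for item in value]
    if isinstance(value, dict):
        tagged = {id_key: next(counter)}
        for key, item in value.items():
            tagged[key] = add_ids(item, counter, id_key)
        return tagged
    return value


flavors = [
    {"color": "pink", "flavor": "strawberry",
     "ingredients": {"1": "strawberries", "2": "chemical x"}},
    {"color": "green", "flavor": "pancake", "tags": ["breakfast", "green"]},
    {"color": "red", "flavor": "pumpkin spice", "price": 3,
     "ingredients": {"0": "pumpkin", "1": "spice"}, "tags": "seasonal"},
]

normalized = add_ids(wrap_scalars(arrays_to_objects(flavors)), count(1))
```

The sketch does not check that "recordid" is really unused as a key; a real run would need that check before step 3.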
Then I want to make the following tables:
"flavors" (I made this up); this is the table associated with the empty "key path"; the fields would be: recordid_parent (always null), recordid, color, flavor, ingredients, tags
"flavors.tags"; the fields would be: recordid_parent, recordid, genericvalue, 1, 2
"flavors.ingredients"; the fields would be: recordid_parent, recordid, 1, 2, 0
The "flavors.ingredients" table would look like this:
recordid_parent | recordid | 1            | 2          | 0
1               | 2        | strawberries | chemical x | null
5               | 6        | null         | spice      | pumpkin
The "flavors.tags" table would look like this:
recordid_parent | recordid | 1         | 2     | genericvalue
3               | 4        | breakfast | green | null
5               | 7        | null      | null  | seasonal
The "flavors" table would look like this:
recordid_parent | recordid | color | flavor        | price | ingredients   | tags
null            | 1        | pink  | strawberry    | null  | (JSON object) | (JSON object)
null            | 3        | green | pancake       | null  | (JSON object) | (JSON object)
null            | 5        | red   | pumpkin spice | 3     | (JSON object) | (JSON object)
(I didn't fill in all of the "(JSON object)" stuff).
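One way to picture the last two steps is a small Python sketch that walks the already-normalized structure and collects one row per object, grouped by key path. It is only an illustration under assumptions: the input is the post-step-3 JSON from the example, the root table is named "flavors", and a nested object is represented in its parent's row by the child's recordid rather than by the "(JSON object)" placeholder. The function name collect_rows is hypothetical.

```python
from collections import defaultdict

# The example after steps 1-3, copied from the question.
normalized = [
    {"recordid": 1, "color": "pink", "flavor": "strawberry",
     "ingredients": {"recordid": 2, "1": "strawberries", "2": "chemical x"}},
    {"recordid": 3, "color": "green", "flavor": "pancake",
     "tags": {"recordid": 4, "1": "breakfast", "2": "green"}},
    {"recordid": 5, "color": "red", "flavor": "pumpkin spice", "price": 3,
     "ingredients": {"recordid": 6, "0": "pumpkin", "1": "spice"},
     "tags": {"recordid": 7, "genericvalue": "seasonal"}},
]


def collect_rows(obj, path, parent_id, tables):
    """Append one row for obj to the table named by path, then recurse."""
    row = {"recordid_parent": parent_id}
    for key, value in obj.items():
        if isinstance(value, dict):
            # Assumption: the parent row stores the child's id as a reference.
            row[key] = value["recordid"]
            collect_rows(value, f"{path}.{key}", obj["recordid"], tables)
        else:
            row[key] = value
    tables[path].append(row)


tables = defaultdict(list)
for flavor in normalized:
    collect_rows(flavor, "flavors", None, tables)

# Columns are the union of keys seen in a table; missing keys print as None,
# which corresponds to NULL in SQL.
for name, rows in tables.items():
    columns = list(dict.fromkeys(key for row in rows for key in row))
    print(name, columns)
    for row in rows:
        print("   ", [row.get(column) for column in columns])
```

From here, each entry in tables could be turned into a CREATE TABLE plus INSERT statements, with every column nullable.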

Using Power Query to extract data from nested arrays in JSON

I'm relatively new to Power Query, but I'm pulling in this basic structure of JSON from a web api
{
"report": "Cost History",
"dimensions": [
{
"time": [
{
"name": "2019-11",
"label": "2019-11",
…
},
{
"name": "2019-12",
"label": "2019-12",
…
},
{
"name": "2020-01",
"label": "2020-01",
…
},
…
]
},
{
"Category": [
{
"name": "category1",
"label": "Category 1",
…
},
{
"name": "category2",
"label": "Category 2",
…
},
…
]
}
],
"data": [
[
[
40419.6393798211
],
[
191.44
],
…
],
[
[
2299.652439184997
],
[
0.0
],
…
]
]
}
I actually have 112 categories and 13 "times". I figured out how to do multiple queries to turn the times into column headers and the categories into row labels (I think). But the data section is eluding me. Because each item is a list within a list, I'm not sure how to expand it all out. Each object in the data array will have 112 numbers and there will be 13 objects, if that all makes sense.
So ultimately I want to make it look like
            2019-11  2019-12  2020-01  ...
Category 1  40419    2299
Category 2  191      0
...
First time asking a question on here, so hopefully this all makes sense and is clear. Thanks in advance for any help!
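To make the target shape explicit, here is a small Python sketch of the reshape (it is not Power Query M, only an illustration of the indexing). It assumes the outer index of "data" is the time period and the inner index is the category, which matches the 13 x 112 description, and it uses a two-by-two slice of the sample with the elided fields left out.

```python
report = {
    "dimensions": [
        {"time": [{"label": "2019-11"}, {"label": "2019-12"}]},
        {"Category": [{"label": "Category 1"}, {"label": "Category 2"}]},
    ],
    "data": [
        [[40419.6393798211], [191.44]],
        [[2299.652439184997], [0.0]],
    ],
}

times = [t["label"] for t in report["dimensions"][0]["time"]]
categories = [c["label"] for c in report["dimensions"][1]["Category"]]

# data[i][j] is a one-element list holding the value for time i and category j.
table = {
    category: {time: report["data"][i][j][0] for i, time in enumerate(times)}
    for j, category in enumerate(categories)
}

for category, values in table.items():
    print(category, [round(values[time]) for time in times])
```

In other words, the result is the transpose of "data" with each innermost single-element list unwrapped.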
I am also researching this exact thing and looking for a solution. In PQ, a nested array is displayed as a list, and there is a function to extract its values using a separator character of your choice. For a list of simple values, the generated step looks like this:
= Table.TransformColumns(#"Filtered Rows", {"aligned_to_ids", each Text.Combine(List.Transform(_, Text.From), ","), type text})
However, the problem I'm trying to solve is when each item in the nested JSON list has multiple values (a record). When these lists are extracted with the same kind of step, an error is raised:
= Table.TransformColumns(#"Extracted Values1", {"collaborators", each Text.Combine(List.Transform(_, Text.From), ","), type text})
Expression.Error: We cannot convert a value of type Record to type Text.
Details:
Value=
id=15890
goal_id=323
role_id=15
Type=[Type]
It seems the multiple values are not handled and PQ does not recognise the underlying structure to enable the columns to be expanded.

Use Doctrine to search into a json database column

I have a Symfony 3.2 project, and I need to filter data from a json column.
Given that we have an entity named "pack" with a json column named "settings" containing this kind of data:
{
"name": "My pack",
"blocks": [
{
"name": "Block 1",
"fields": [
{"label": "A", "value": "57"},
{"label": "B", "value": "100"}
]
},
{
"name": "Bock 2",
"fields": [
{"label": "C", "value": "80"}
]
}
]
}
I have to find packs containing a field whose label is "B" and whose value is "100", but the blocks and fields are not in the same order in every pack.
So in my repository, using Doctrine\ORM\EntityRepository and opsway/doctrine-dbal-postgresql (for GET_JSON_FIELD and GET_JSON_OBJECT functions), this kind of condition works:
use Doctrine\ORM\EntityRepository;

class Packs extends EntityRepository
{
    public function findFiltered(...)
    {
        return $this->createQueryBuilder('pack')
            ->andWhere("GET_JSON_FIELD(GET_JSON_OBJECT(pack.settings, '{blocks,0,fields,1}'), 'label') = :label")
            ->andWhere("GET_JSON_FIELD(GET_JSON_OBJECT(pack.settings, '{blocks,0,fields,1}'), 'value') = :value")
            ->setParameter('label', 'B')
            ->setParameter('value', '100')
        ;
    }
}
But the problem is that I have to specify the exact block (the first block object) and the exact field (the second field object of the first block object). Also, my two conditions aren't connected: one checks whether there is a label "B", the other checks whether there is a value "100". Instead, I would like to search all blocks and fields for a single field that has both the right label and the right value. Any ideas?
I found the right SQL query for my problem:
SELECT *
FROM pack p, json_array_elements(p.settings#>'{blocks}') blocks, json_array_elements(blocks#>'{fields}') fields
WHERE fields->>'label' = 'B' and fields->>'value' = '100';
But how do I do that with Doctrine?
Maybe this link can help you: it is a custom filter for a JSON type field that may serve as an example. The functions from this bundle solved the problem for me. I hope this helps someone else too. Cheers!
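The matching rule that the SQL query expresses can be written out in a few lines of Python, which may help when checking results from whatever Doctrine query ends up being used. This is only a sketch of the logic, not Doctrine or PostgreSQL code. The function name pack_matches is hypothetical, and the sample settings are copied from the question.

```python
def pack_matches(settings, label, value):
    """True if one field, in any block, has both the given label and value."""
    return any(
        field.get("label") == label and field.get("value") == value
        for block in settings.get("blocks", [])
        for field in block.get("fields", [])
    )


settings = {
    "name": "My pack",
    "blocks": [
        {"name": "Block 1", "fields": [
            {"label": "A", "value": "57"},
            {"label": "B", "value": "100"},
        ]},
        {"name": "Bock 2", "fields": [
            {"label": "C", "value": "80"},
        ]},
    ],
}

print(pack_matches(settings, "B", "100"))  # label and value on the same field
print(pack_matches(settings, "B", "57"))   # both exist, but on different fields
```

The second call shows why the two conditions must be tied to the same field: label "B" and value "57" both occur in the pack, yet no single field carries both.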

How to Extract a Word and insert a string containing the word into an existing longtext field?

I'm nervously migrating and updating a MySQL database and for each record I need some help to achieve the following:
Extract the last word from the 'Name' column of a record
Encapsulate the extracted word in a string.
Replace the first character in the 'elements' column (longtext) with the new string.
Example:
The 'Name' column contains the following data:
Andrew Brewer
The extracted word should be encapsulated in a string to look like this:
{
"0152cbf9-2bab-48dc-81e3-82d7b1b505ec": {
"0": {
"value": "Brewer"
}
},
The current 'elements' column begins with the following data:
{
"540aa2ad-9a8d-454d-b915-605b884e76d5": {
"file": "images\/profile-male\/ANDY_B_PROFILE.jpg",
"title": "",
"link": "",
"target": "0",
"rel": "",
"lightbox_image": "images\/profile-male\/ANDY_B_PROFILE.jpg",
"spotlight_effect": "",
"caption": "",
"width": 833,
"height": 1163
},
etc...
We need the new 'elements' column to look like this (assuming that it's easier to replace the first character in a field than to insert it):
{
"0152cbf9-2bab-48dc-81e3-82d7b1b505ec": {
"0": {
"value": "Brewer"
}
},
"540aa2ad-9a8d-454d-b915-605b884e76d5": {
"file": "images\/profile-male\/ANDY_B_PROFILE.jpg",
"title": "",
"link": "",
"target": "0",
"rel": "",
"lightbox_image": "images\/profile-male\/ANDY_B_PROFILE.jpg",
"spotlight_effect": "",
"caption": "",
"width": 833,
"height": 1163
},
etc...
No special allowances for spaced surnames etc. are required.
Any and all help gratefully received.
What would the query code be to achieve the above?
I'm a designer with HTML/CSS/jQuery skills but an absolute SQL novice and I haven't gotten very far today, despite spending 12 hours on this.
I've had intermittent results with obtaining the last name as some of the names have trailing spaces and are resisting the trim command. Maybe they're not spaces but another type of whitespace? In any case the idea was to reverse the string, find the first space, trim and reverse again, which is succeeding as far as it goes.
SELECT RTRIM(`name`),
REVERSE(SUBSTRING(REVERSE(`name`), 1, LOCATE(' ', REVERSE(`name`)) - 1)) AS LastName
FROM `artist`;
The next step would be to concatenate the string start+LastName+end then replace the first character of the 'elements' column but I have no idea how to do this.
If I'm in the wrong place to ask this type of question, please feel free to close this and I'll resort to editing the database manually.
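The string manipulation itself can be sketched in Python, for instance to check the expected result on a few exported rows before running anything against the database. This is not the MySQL query asked for. It assumes every 'elements' value starts with "{" and holds a non-empty JSON object, and it reuses the element id from the example. The function names are hypothetical.

```python
import json

# Element id taken from the example in the question.
ELEMENT_ID = "0152cbf9-2bab-48dc-81e3-82d7b1b505ec"


def last_word(name):
    """Return the last whitespace-separated word of name, or "" if there is none."""
    # str.split() with no arguments splits on any Unicode whitespace, including
    # non-breaking spaces, and ignores leading and trailing whitespace.
    parts = name.split()
    return parts[-1] if parts else ""


def prepend_name_element(name, elements):
    """Replace the leading "{" of elements with "{" plus the new entry and a comma."""
    if not elements.startswith("{"):
        raise ValueError("elements does not start with '{'")
    entry = json.dumps({"0": {"value": last_word(name)}})
    head = "{" + json.dumps(ELEMENT_ID) + ": " + entry + ","
    return head + elements[1:]


elements = '{"540aa2ad-9a8d-454d-b915-605b884e76d5": {"title": "", "width": 833}}'
updated = prepend_name_element("Andrew Brewer \u00a0", elements)
print(updated)
```

Splitting on any whitespace rather than on a plain space is one way around names whose trailing characters are not ordinary spaces.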

Access deeper elements of a JSON using postgresql 9.4

I want to be able to access deeper elements of a JSON document stored in a json field in a PostgreSQL database. For example, I would like to access the elements along the path states->events->time in the JSON provided below. Here is the PostgreSQL query I'm using:
SELECT
data#>> '{userId}' as user,
data#>> '{region}' as region,
data#>>'{priorTimeSpentInApp}' as priotTimeSpentInApp,
data#>>'{userAttributes, "Total Friends"}' as totalFriends
from game_json
WHERE game_name LIKE 'myNewGame'
LIMIT 1000
and here is an example record from the json field
{
"region": "oh",
"deviceModel": "inHouseDevice",
"states": [
{
"events": [
{
"time": 1430247045.176,
"name": "Session Start",
"value": 0,
"parameters": {
"Balance": "40"
},
"info": ""
},
{
"time": 1430247293.501,
"name": "Mission1",
"value": 1,
"parameters": {
"Result": "Win ",
"Replay": "no",
"Attempt Number": "1"
},
"info": ""
}
]
}
],
"priorTimeSpentInApp": 28989.41467999999,
"country": "CA",
"city": "vancouver",
"isDeveloper": true,
"time": 1430247044.414,
"duration": 411.53,
"timezone": "America/Cleveland",
"priorSessions": 47,
"experiments": [],
"systemVersion": "3.8.1",
"appVersion": "14312",
"userId": "ef617d7ad4c6982e2cb7f6902801eb8a",
"isSession": true,
"firstRun": 1429572011.15,
"priorEvents": 69,
"userAttributes": {
"Total Friends": "0",
"Device Type": "Tablet",
"Social Connection": "None",
"Item Slots Owned": "12",
"Total Levels Played": "0",
"Retention Cohort": "Day 0",
"Player Progression": "0",
"Characters Owned": "1"
},
"deviceId": "ef617d7ad4c6982e2cb7f6902801eb8a"
}
That SQL query works, except that it doesn't give me any return values for totalFriends (e.g. data#>>'{userAttributes, "Total Friends"}' as totalFriends). I assume that part of the problem is that events falls within a square bracket (I don't know what that indicates in the json format) as opposed to a curly brace, but I'm also unable to extract values from the userAttributes key.
I would appreciate it if anyone could help me.
I'm sorry if this question has been asked elsewhere. I'm so new to postgresql and even json that I'm having trouble coming up with the proper terminology to find the answers to this (and related) questions.
You should definitely familiarize yourself with the basics of json and json functions and operators in Postgres.
In the second source pay attention to the operators -> and ->>.
General rule: use -> to get a json object, ->> to get a json value as text.
Using these operators, you can rewrite your query so that it returns the correct value of 'Total Friends':
select
data->>'userId' as user,
data->>'region' as region,
data->>'priorTimeSpentInApp' as priotTimeSpentInApp,
data->'userAttributes'->>'Total Friends' as totalFriends
from game_json
where game_name like 'myNewGame';
Json objects in square brackets are elements of a json array.
Json arrays may have many elements.
The elements are accessed by an index.
Json arrays are indexed from 0 (the first element of an array has an index 0).
Example:
select
data->'states'->0->'events'->1->>'name'
from game_json
where game_name like 'myNewGame';
-- returns "Mission1"
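If it helps to see the same path outside SQL, here is a small Python sketch that follows it through a trimmed copy of the sample record. It only illustrates how object keys and array indexes alternate along the path; it is not PostgreSQL code, and the record is cut down to the fields used here.

```python
import json

record = json.loads("""
{
  "states": [
    {"events": [
      {"time": 1430247045.176, "name": "Session Start"},
      {"time": 1430247293.501, "name": "Mission1"}
    ]}
  ],
  "userAttributes": {"Total Friends": "0"}
}
""")

# Same path as data->'states'->0->'events'->1->>'name'
name = record["states"][0]["events"][1]["name"]

# Same path as data->'userAttributes'->>'Total Friends'
friends = record["userAttributes"]["Total Friends"]

# Every event time, walking all elements of both arrays instead of fixed indexes.
times = [event["time"] for state in record["states"] for event in state["events"]]

print(name, friends, times)
```

The last line corresponds to the states->events->time traversal from the question, where the arrays have to be iterated because their length is not known in advance.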
This did help me