Kusto Remove partial duplicate - relational-database

With the table storedata, I am trying to remove the row "Target TargetCheese 4".
The logic is: if there are two or more entries for the same product at a given store, keep the StoreNumber that best fits that store based on its other rows. If the StoreNumber doesn't match but the Product is not duplicated, the number should not change; for example, SafewayEggs keeps StoreNumber 1 even though more Safeway entries have StoreNumber 6, because there is only one SafewayEggs row.
let storedata=
datatable (Store:string, Product:string ,StoreNumber:string)
["Target", "TargetCheese", "4",
"Target", "TargetCheese", "5",
"Target", "TargetApple", "5",
"Target", "TargetCorn", "5",
"Target", "TargetEggs", "5",
"Kroger", "KrogerApple", "2",
"Kroger", "KrogerCorn", "2",
"Kroger", "KrogerEggs", "2",
"Safeway", "SafewayApple", "6",
"Safeway", "SafewayCorn", "6",
"Safeway", "SafewayEggs", "1"
];
I am hoping to see this result table from the storedata table:
Store Product StoreNumber
Target TargetCheese 5
Target TargetApple 5
Target TargetCorn 5
Target TargetEggs 5
Kroger KrogerApple 2
Kroger KrogerCorn 2
Kroger KrogerEggs 2
Safeway SafewayApple 6
Safeway SafewayCorn 6
Safeway SafewayEggs 1

You might need several steps:
(1) find the "best fit" StoreNumber per store; in my example below that is the one with the most occurrences, picked with arg_max
(2) the dataset that has to be cleaned up using (1): more than one occurrence per store and product, found with count
(3) the dataset that needs no cleanup: only one occurrence per store and product
(4) a union of (3) and the corrected dataset
let storedata=
datatable (Store:string, Product:string ,StoreNumber:string)
["Target", "TargetCheese", "5",
"Target", "TargetCheese", "4",
"Target", "TargetApple", "5",
"Target", "TargetCorn", "5",
"Target", "TargetEggs", "5",
"Kroger", "KrogerApple", "2",
"Kroger", "KrogerCorn", "2",
"Kroger", "KrogerEggs", "2",
"Safeway", "SafewayApple", "6",
"Safeway", "SafewayCorn", "6",
"Safeway", "SafewayEggs", "1"
];
// (1) evaluate best-fit StoreNumber
let storenumber =
storedata
| order by Store, StoreNumber
| summarize occ= count () by Store, StoreNumber
| summarize arg_max(occ, *) by Store;
// (2) dataset to be cleaned = more than one occurrence per store and product
let cleanup =
storedata
| summarize occ = count () by Store, Product
| where occ > 1
| project-away occ;
// (3) dataset with only one occurrence
let okdata =
storedata
| summarize occ= count () by Store, Product
| where occ==1
| project-away occ;
// (4) final dataset
let res1 =storenumber
| join cleanup on Store
| project Store, Product, StoreNumber;
let res2 = storedata
| join okdata on Store, Product
| project-away Store1, Product1;
res1
| union res2;
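
As a cross-check outside Kusto, the four steps above can be sketched in Python. This is only an illustration of the logic, not a replacement for the query: it assumes "best fit" means the most frequent StoreNumber per store, the function name dedupe is hypothetical, and ties are broken by first appearance.

```python
from collections import Counter, defaultdict

storedata = [
    ("Target", "TargetCheese", "4"),
    ("Target", "TargetCheese", "5"),
    ("Target", "TargetApple", "5"),
    ("Target", "TargetCorn", "5"),
    ("Target", "TargetEggs", "5"),
    ("Kroger", "KrogerApple", "2"),
    ("Kroger", "KrogerCorn", "2"),
    ("Kroger", "KrogerEggs", "2"),
    ("Safeway", "SafewayApple", "6"),
    ("Safeway", "SafewayCorn", "6"),
    ("Safeway", "SafewayEggs", "1"),
]


def dedupe(rows):
    # (1) best-fit StoreNumber = most frequent StoreNumber per store
    per_store = defaultdict(Counter)
    for store, _product, number in rows:
        per_store[store][number] += 1
    best_fit = {s: c.most_common(1)[0][0] for s, c in per_store.items()}

    # (2)/(3) how often each (store, product) pair occurs
    occurrences = Counter((store, product) for store, product, _ in rows)

    # (4) keep one row per pair; duplicated pairs get the best-fit number
    result, seen = [], set()
    for store, product, number in rows:
        key = (store, product)
        if key in seen:
            continue
        seen.add(key)
        if occurrences[key] > 1:
            number = best_fit[store]
        result.append((store, product, number))
    return result


for row in dedupe(storedata):
    print(*row)
```

SafewayEggs keeps "1" here because its (store, product) pair occurs only once, which is the same rule the okdata branch of the query applies.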

I don't understand the logic you want for removing the following line:
"Target", "TargetCheese", "4"
But if you want to take the highest value for Store and Product, then you can use the following approach:
storedata
| summarize max(StoreNumber) by Store, Product

Related

Select date range into different column

Name | Date | Score
A | 01-01-2023 | 100
A | 01-01-2023 | 200
A | 03-01-2023 | 300
B | 02-01-2023 | 400
B | 03-01-2023 | 100
B | 03-01-2023 | 100
I have this table and want to pivot it so that each date becomes its own column holding the SUM of the score for that date, using the Laravel Query Builder or raw SQL, so that it becomes:
Name | Day 1 | Day 2 | Day 3
A | 300 | 0 | 300
B | 0 | 400 | 200
The columns should cover every day of the current month, so Day 1 through Day 31 for January, and so on.
You haven't provided your attempted query, how you are passing the date (a range, a month only, etc.), or your desired JSON output.
It is hard to even guess how you intend to do this, especially since you are using a column value as a column name in your desired result (which doesn't make much sense in a raw SQL query unless those columns aren't dynamic).
To give you a starting point, you can group the rows by name and date in the query, then group the resulting collection by date, e.g.:
$result = DB::table('table_name')
    ->select(['name', 'date'])
    ->selectRaw('sum(score) AS score')
    ->groupBy(['name', 'date'])
    ->get();

return $result->groupBy('date');
You should then get a result in a format like the one below:
{
"01-01-2023" : [
{
"name": "A",
"date": "01-01-2023",
"score": "300"
}
],
"02-01-2023" : [
{
"name": "B",
"date": "02-01-2023",
"score": "400"
}
],
"03-01-2023" : [
.
.
.
]
}
For your desired table result, it would be better to use dynamic rows instead of dynamic columns.
EDIT
With reference to Karl's answer, you can loop through a date range and inject an additional select statement for each date.
For example, for the dates of the current month:
$dateRange = \Carbon\CarbonPeriod::create(now()->startOfMonth(), now()->endOfMonth())->toArray();
$result = DB::table('table_name')->select(['name']);
foreach ($dateRange as $date) {
    $dateFormat = $date->format('d-m-Y');
    $day = $date->format('j');
    $result->selectRaw("SUM(CASE WHEN Date = '$dateFormat' THEN Score ELSE 0 END) AS 'Day $day'");
}
return $result->groupBy('name')->get();
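
The conditional-aggregation idea in the loop above can be tried without a Laravel project. The following is a minimal sketch using Python's built-in sqlite3 module and the sample rows from the question; the table name scores and the fixed three-day range are assumptions made only for this illustration.

```python
import sqlite3

rows = [
    ("A", "01-01-2023", 100),
    ("A", "01-01-2023", 200),
    ("A", "03-01-2023", 300),
    ("B", "02-01-2023", 400),
    ("B", "03-01-2023", 100),
    ("B", "03-01-2023", 100),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, date TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", rows)

# One SUM(CASE ...) column per date; the dates are bound as parameters.
dates = ["01-01-2023", "02-01-2023", "03-01-2023"]
columns = ", ".join(
    f'SUM(CASE WHEN date = ? THEN score ELSE 0 END) AS "Day {i}"'
    for i, _ in enumerate(dates, start=1)
)
sql = f"SELECT name, {columns} FROM scores GROUP BY name ORDER BY name"

pivot = conn.execute(sql, dates).fetchall()
for row in pivot:
    print(row)
```

Grouping by name alone is what collapses the dates into one row per name; the date only appears inside the CASE expressions.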
just to keep date in group by
->groupBy('date');

Postgres: count rows in jsonb for a specific key

I have a table with two columns: id INT and value JSONB. In value I have a JSON object props with keys id_1, id_2, and so on, with their respective values.
Is there a way to count the rows where the JSON object props has a specific key, such as id_1?
In this example, there should be two results: rows 1 and 4.
id | value
1 | {"name": "Jhon", "props": {"id_1": {"role": "role1", "class": "class1"}, "id_2": {"role": "role2", "class": "class2"}}}
2 | {"name": "Frank", "role": ["role1", "role2"]}
3 | {"name": "Bob", "props": {"id_3": {"role": "role3", "class": "class3"}, "id_4": {"role": "role4"}}}
4 | {"name": "Carl", "props": {"id_5": {"role": "role5", "class": "class5"}, "id_1": {"class": "class6"}}}
I tried something like this, but to make it work, I have to also specify the value, but the value could change for every row. For example, with this query, I only get one row back.
SELECT count(value)
FROM "myTable"
where value->'props' ->> 'id_1' = '{"role": "role1", "class": "class1"}'
Try this:
SELECT COUNT(z.*)
FROM (
SELECT id, value->'props'->>'id_1' AS val
FROM "myTable"
) z
WHERE z.val IS NOT NULL;
Use the ? operator to test whether a key exists, regardless of value.
SELECT count(*)
FROM "myTable"
where value -> 'props' ? 'id_1';
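
To make the difference between testing a key and testing a value concrete, here is a small Python sketch over the four sample rows. It only mirrors the logic of the key-existence check and does not talk to Postgres; the function name count_rows_with_key is hypothetical.

```python
import json

# The sample rows from the question, as (id, JSON text) pairs.
table = [
    (1, '{"name": "Jhon", "props": {"id_1": {"role": "role1", "class": "class1"}, "id_2": {"role": "role2", "class": "class2"}}}'),
    (2, '{"name": "Frank", "role": ["role1", "role2"]}'),
    (3, '{"name": "Bob", "props": {"id_3": {"role": "role3", "class": "class3"}, "id_4": {"role": "role4"}}}'),
    (4, '{"name": "Carl", "props": {"id_5": {"role": "role5", "class": "class5"}, "id_1": {"class": "class6"}}}'),
]


def count_rows_with_key(rows, key):
    """Return the ids of rows whose props object has `key`, whatever its value."""
    matching = []
    for row_id, raw in rows:
        props = json.loads(raw).get("props")
        # A membership test looks only at the key, never at the stored value.
        if isinstance(props, dict) and key in props:
            matching.append(row_id)
    return matching


ids = count_rows_with_key(table, "id_1")
print(len(ids), ids)
```

A membership test like this also counts a key whose value is JSON null, which is how the ? operator behaves; a check on the extracted value (IS NOT NULL) would skip such a row.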

NodeJS, MySQL - JSON Stringify - Advanced query

I have an object in the table column saved using JSON.stringify and it looks like this:
[{
"id": 1,
"uID": 10
}, {
"id": 2,
"uID": 10
}, {
"id": 3,
"uID": 94
}]
I need a query that checks whether the column contains given pairs of values, e.g.:
uID = 10 and id = 2: the row is returned
uID = 10 and id = 5: the row is not returned
uID = 10 and id = 2, plus uID = 94 and id = 0: the row is not returned
(because the pair uID = 94 and id = 0 is not present)
Unless you are querying programmatically where you can parse the JSON and then do the logic, I would recommend something like this:
SELECT * FROM Table WHERE Column LIKE '%"id": 1,"uID": 10%'
The LIKE keyword allows us to use wildcards (%) but still do an exact text match for what we define.
Add a wildcard in the middle, but note that order matters with this strategy.
SELECT * FROM Table WHERE Column LIKE '%"id": 1%"id": 2%'
I need it to work in the reverse order too, e.g. I have:
[{
"id": 1,
"uID": 10
}, {
"id": 2,
"uID": 55
}, {
"id": 3,
"uID": 94
}]
SELECT * FROM Table WHERE Column LIKE '%"uID": 55%"uID": 94%' -- matches
SELECT * FROM Table WHERE Column LIKE '%"uID": 94%"uID": 55%' -- does not match
So it does not work in the reverse order.
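
The order dependence goes away if the column is parsed rather than pattern-matched, which is the "querying programmatically" route mentioned in the answer. A minimal sketch in Python, assuming the column value has already been fetched as a string; the helper name contains_all is hypothetical.

```python
import json

stored = '[{"id": 1, "uID": 10}, {"id": 2, "uID": 10}, {"id": 3, "uID": 94}]'


def contains_all(column_value, wanted):
    """True if every (uID, id) pair in `wanted` occurs in the stored array."""
    present = {(item["uID"], item["id"]) for item in json.loads(column_value)}
    return all(pair in present for pair in wanted)


print(contains_all(stored, [(10, 2)]))            # pair present
print(contains_all(stored, [(10, 5)]))            # pair missing
print(contains_all(stored, [(10, 2), (94, 0)]))   # second pair missing
print(contains_all(stored, [(94, 3), (10, 1)]))   # order does not matter
```

Because the pairs are compared as a set, the result does not depend on the order of the objects in the array or on the whitespace produced by JSON.stringify.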

Modelling multi-valued columns in RDBMS [duplicate]

I have raw data in JSON as follows:
{
"id": 1,
"tags": [{
"category": "location",
"values": ["website", "browser"]
},{
"category": "campaign",
"values": ["christmas_email"]
}]
},
{
"id": 2,
"tags": [{
"category": "location",
"values": ["website", "browser", "chrome"]
}]
},
{
"id": 3,
"tags": [{
"category": "location",
"values": ["website", "web_view"]
}]
}
The tag category and its values are dynamically generated and are not known beforehand. I need to load this data into an RDBMS table and then later make queries to the data. The queries may be as follows:
Extract all rows where location has values "website" and "browser". The output of this query should return rows with id 1 and 2.
I need some help in modelling this into a table schema to support such queries. I was thinking of tables as:
Table 1: MAIN
Columns: ID, TAG_LIST_ID
Row1: 1 TL1
Row2: 2 TL2
Row3: 3 TL3
Table 2: TAGS
Columns: TAG_ID, TAG_CATEGORY, TAG_VALUE
Row1: TID1 location website
Row2: TID2 location browser
Row3: TID3 location chrome
Row4: TID4 location web_view
Row5: TID5 campaign christmas_email
Table 3: TAG_MAPPING
Columns: TAG_MAPPING_ID, TAG_LIST_ID, TAG_ID
Row1: TMID1 TL1 TID1
Row2: TMID2 TL1 TID2
Row3: TMID3 TL1 TID5
Row4: TMID4 TL2 TID1
Row5: TMID5 TL2 TID2
Row6: TMID6 TL2 TID3
Row7: TMID7 TL3 TID1
Row8: TMID8 TL3 TID4
Now to query all rows where location has values "website" and "browser", I could write
SELECT * from MAIN m, TAGS t, TAG_MAPPING tm
WHERE m.TAG_LIST_ID=tm.TAG_LIST_ID AND
tm.TAG_ID = t.TAG_ID AND
t.TAG_CATEGORY = "location" AND
(t.TAG_VALUE="website" OR t.TAG_VALUE="browser")
However this will return all the three rows; changing the OR condition to AND will return no rows. What is the right way to design the schema?
Any pointers appreciated.
Just replace the OR by IN and a counter:
SELECT tm.TAG_LIST_ID, count(1) as cnt
FROM MAIN m, TAGS t, TAG_MAPPING tm
WHERE tm.TAG_LIST_ID= m.TAG_LIST_ID
AND tm.TAG_ID = t.TAG_ID
AND t.TAG_CATEGORY = "location"
AND t.TAG_VALUE IN ("website","browser")
GROUP by tm.TAG_LIST_ID
having count(1) > 1 -- greater than 1 because you are looking for 2 words; this value changes according to the number of words.
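
The IN-plus-counter approach can be checked against the sample rows with Python's built-in sqlite3 module. This is a sketch under a few assumptions: SQLite stands in for the target RDBMS, explicit JOIN syntax is used, and the HAVING clause counts distinct values so that a repeated mapping cannot inflate the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TAGS (TAG_ID TEXT, TAG_CATEGORY TEXT, TAG_VALUE TEXT);
CREATE TABLE TAG_MAPPING (TAG_MAPPING_ID TEXT, TAG_LIST_ID TEXT, TAG_ID TEXT);
""")
conn.executemany("INSERT INTO TAGS VALUES (?, ?, ?)", [
    ("TID1", "location", "website"),
    ("TID2", "location", "browser"),
    ("TID3", "location", "chrome"),
    ("TID4", "location", "web_view"),
    ("TID5", "campaign", "christmas_email"),
])
conn.executemany("INSERT INTO TAG_MAPPING VALUES (?, ?, ?)", [
    ("TMID1", "TL1", "TID1"), ("TMID2", "TL1", "TID2"), ("TMID3", "TL1", "TID5"),
    ("TMID4", "TL2", "TID1"), ("TMID5", "TL2", "TID2"), ("TMID6", "TL2", "TID3"),
    ("TMID7", "TL3", "TID1"), ("TMID8", "TL3", "TID4"),
])

wanted = ["website", "browser"]
placeholders = ", ".join("?" for _ in wanted)
sql = f"""
SELECT tm.TAG_LIST_ID
FROM TAG_MAPPING tm
JOIN TAGS t ON t.TAG_ID = tm.TAG_ID
WHERE t.TAG_CATEGORY = 'location'
  AND t.TAG_VALUE IN ({placeholders})
GROUP BY tm.TAG_LIST_ID
HAVING COUNT(DISTINCT t.TAG_VALUE) = ?
ORDER BY tm.TAG_LIST_ID
"""
# A tag list qualifies only if it matched every wanted value.
matches = [row[0] for row in conn.execute(sql, [*wanted, len(wanted)])]
print(matches)
```

TL3 is excluded because it matches only "website", so its count is 1 rather than 2.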

convert mysql result to array or hash

SQL query:-
require 'net/ssh'

class Test
  def self.execute_mysql(host, database, query)
    # `user` is not defined in this snippet; it is assumed to be set elsewhere
    Net::SSH.start('test.com', user, forward_agent: true) do |ssh|
      ssh.exec!("mysql -ppassword -utestuser -h #{host} #{database} -A --execute '#{query}'")
    end
  end
end
Command to run:-
result = Test.execute_mysql('app', 'sample', 'select * from foo')
result string:-
id name address age
1 ram US 25
2 sam US 30
3 jack India 32
.
.
.
.
100 Peterson US 27
The result variable is returned as a String. Suppose it contains 100 records. How can I loop through each record?
Are you looking for something like this?
> result
=> "id name address age\n1 ram US 25\n2 sam US 30\n3 jack India 32"
> result.split(" ").each_slice(4){|e| print e }
=> ["id", "name", "address", "age"]["1", "ram", "US", "25"]["2", "sam", "US", "30"]["3", "jack", "India", "32"]
The answer to your question depends on a lot of things, and you carefully need to put a lot of checks to make it robust.
I'll however post a simple answer based on some assumptions to get you started on.
res = "id name address age
1 ram US 25
2 sam US 30
3 jack India 32"
arr = []
res.each_line {|line| arr << line.split(" ")}
arr
# => [["id", "name", "address", "age"], ["1", "ram", "US", "25"], ["2", "sam", "US", "30"], ["3", "jack", "India", "32"]]
Now you can easily iterate over the arrays to access a particular attribute.
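
If a hash per record is more convenient than nested arrays, the header line can supply the keys. The sketch below shows the idea in Python; it makes the same simplifying assumption as the answers above, namely that no field contains whitespace, and the function name parse_result is hypothetical.

```python
def parse_result(text):
    """Turn whitespace-separated command output into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    # Pair each field with its column name from the header line.
    return [dict(zip(header, line.split())) for line in lines[1:]]


result = "id name address age\n1 ram US 25\n2 sam US 30\n3 jack India 32"
records = parse_result(result)

for record in records:
    print(record["id"], record["name"], record["age"])
```

Every value stays a string here; converting id or age to integers would be a separate step.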