I am storing a list of integers as a JSON array in a column called ConvertedIds in a table SelectionLogs.
The column type is MEDIUMTEXT, and an example value is [290766,319075,234525,325364,3472,34241,85643,11344556,88723,656378].
I am using the following SQL to expand the IDs in the column into rows:
SELECT hm.id FROM SelectionLogs hlog,
JSON_TABLE(ConvertedIds, '$[*]' columns (Id int path '$')) AS hm
And then the following query to pull further information from other tables, e.g.:
SELECT hm.id,hc.firstname ,hc.lastname ,hc.email FROM SelectionLogs hlog,
JSON_TABLE(ConvertedIds, '$[*]' columns (Id int path '$')) AS hm
LEFT JOIN contacts hc ON hc.Id = hm.id
Now I have to update this column based on the presence of a given ID.
If the ID exists in this column on any row, I have to update the array after removing that ID.
For example, given [1,2,3,4,5,6,7]: if ID 3 exists, remove 3 and update the column to [1,2,4,5,6,7].
I can use the following query to find the records in SelectionLogs with the given ID present in ConvertedIds:
SELECT DISTINCT hlog.Id FROM SelectionLogs hlog,
JSON_TABLE(ConvertedIds, '$[*]' columns (Id int path '$')) AS hm
WHERE hm.id=427529
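As an aside, MySQL 8 can also test membership without JSON_TABLE; a minimal sketch (the candidate value is passed as a JSON literal, and this assumes every ConvertedIds value parses as valid JSON):
SELECT hlog.Id FROM SelectionLogs hlog
WHERE JSON_CONTAINS(hlog.ConvertedIds, '427529');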
Now I plan to iterate through each row from my console program written in C#, roughly:
foreach (var row in result)
{
    // parse the ConvertedIds JSON into a list of ints (e.g. with System.Text.Json)
    List<int> columnIds = JsonSerializer.Deserialize<List<int>>(row.ConvertedIds);
    columnIds.Remove(givenId); // remove the given int from the list
    // UPDATE ConvertedIds for row.Id with the refreshed list, serialized back to JSON
}
Can I perform the update via SQL itself?
DEMO fiddle with some explanations.
-- source data
CREATE TABLE t1 (id INT, val JSON)
SELECT 1 id, '[1,2,3]' val UNION SELECT 2, '[3,4,5]' UNION SELECT 3, '[5,6,7]';
CREATE TABLE t2 (id INT) SELECT 1 id UNION SELECT 4;
-- UPDATE source table
UPDATE t1
JOIN ( SELECT t1.id, JSON_ARRAYAGG(jsontable.id) val
FROM t1
CROSS JOIN JSON_TABLE(t1.val,
'$[*]' COLUMNS (id INT PATH '$')) jsontable
LEFT JOIN t2 ON t2.id = jsontable.id
WHERE t2.id IS NULL
GROUP BY t1.id ) data_for_update USING (id)
SET t1.val = data_for_update.val;
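For the sample data above, the UPDATE strips ids 1 and 4 from each array, so t1 should end up roughly as follows (JSON_ARRAYAGG does not guarantee element order):
id | val
---+-----------
1  | [2, 3]
2  | [3, 5]
3  | [5, 6, 7]
One caveat: a row whose elements would all be removed yields no group in the subquery, so it keeps its old array rather than becoming []; handle that case separately if it can occur.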
Related
I get a list of IDs as a comma-separated JSON array; an example data set looks like [340596,340597,340595].
This list can be huge, sometimes 50k IDs.
The following query joins these IDs to a table's primary key and fetches the records that currently exist in the table:
SELECT s.id,s.contactid, s.Quantity FROM
JSON_TABLE('[340596,340597,340595]', '$[*]' columns (Id int path '$')) AS sm
LEFT JOIN mastertable s ON s.Id = sm.id
The mastertable might contain these IDs, or the records may have been erased from mastertable, so the purpose of this query is to ensure the result set contains only the active records.
I have to apply one more filter to this query, based on another JSON int array matched against the column ContactID:
SELECT s.id,s.contactid, s.Quantity FROM
JSON_TABLE('[340596,340597,340595]', '$[*]' columns (Id int path '$')) AS sm
LEFT JOIN mastertable s ON s.Id = sm.id
WHERE s.ContactId IN (
SELECT cm.id FROM
JSON_TABLE('[12345,450597,640595]', '$[*]' columns (Id int path '$')) AS cm
)
However, MySQL's IN performance is poor for large result sets. Can we replace this IN with something better?
You can dump the IDs inside the IN clause into a temporary table and then join it with the JSON_TABLE result.
Alternatively, you can use a CTE and join the same way:
with temp as (
SELECT cm.id FROM
JSON_TABLE('[12345,450597,640595]', '$[*]' columns (Id int path '$')) AS cm
)
SELECT s.id,s.contactid, s.Quantity FROM
JSON_TABLE('[340596,340597,340595]', '$[*]' columns (Id int path '$')) AS sm
LEFT JOIN mastertable s ON s.Id = sm.id
INNER JOIN temp t ON s.ContactId = t.id;
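A minimal sketch of the temporary-table variant (the temp table name is an assumption):
CREATE TEMPORARY TABLE temp_contact_ids (id INT PRIMARY KEY);
INSERT INTO temp_contact_ids (id)
SELECT cm.id FROM
JSON_TABLE('[12345,450597,640595]', '$[*]' columns (Id int path '$')) AS cm;
SELECT s.id, s.contactid, s.Quantity FROM
JSON_TABLE('[340596,340597,340595]', '$[*]' columns (Id int path '$')) AS sm
LEFT JOIN mastertable s ON s.Id = sm.id
INNER JOIN temp_contact_ids t ON s.ContactId = t.id;
The PRIMARY KEY on the temp table gives the optimizer an index to join against, which is the main advantage over a bare IN list.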
Continuing from this post, a function is created to parse JSON input and then insert values into three tables, with previously inserted IDs passed as a parameter to the last insert.
If I want to insert two arrays into the same table, I can just do
insert into t2 (car, car_type)
select json_array_elements_text(d::json -> 'car'), json_array_elements_text(d::json -> 'car_type')::int4 returning id;
How do I make it work with an ordinality index, as below?
function:
create or replace function test_func(d json)
returns void as $$
begin
with j as (select d)
, a as (
select car,brand,type, t1.id oid
from j
join json_array_elements_text(j.d->'cars') with ordinality t1(car,id) on true
join json_array_elements_text(j.d->'brands') with ordinality t2(brand,id)
on t1.id = t2.id
join json_array_elements_text(j.d->'car_type') with ordinality t2(type,id)
on t1.id = t2.id -- this line apparently doesn't work; t2 has been joined twice
)
, n as (
insert into t1 (name) values (d::json -> 'name') returning id
), c as (
insert into t2 (cars,car_type) select car,type from a order by oid returning id -- needs to insert two columns here from two arrays
)
, ag as (
select array_agg(c.id) cid from c
)
insert into t3 (id, name_id, cars_id, brand)
select 1, n.id,cid[oid], brand
from a
join n on true
join ag on true
;
end;
$$ language plpgsql;
Tables:
CREATE TABLE t1 ( "id" SERIAL PRIMARY KEY, "name" text NOT NULL );
CREATE TABLE t2 ( "id" SERIAL PRIMARY KEY, "cars" text NOT NULL, "car_type" int );
CREATE TABLE t3 ( "id" int, "name_id" int REFERENCES t1(id), "cars_id" int REFERENCES t2(id), "brand" text );
Test:
select test_func('{"name":"john", "cars":["bmw X5 xdrive","volvo v90 rdesign"], "brands":["bmw","volvo"],"car_type":[1,1]}');
You used t2 as the alias for two different sets. Try:
create or replace function test_func(d json)
returns void as $$
begin
with j as (select d)
, a as (
select car,brand,car_type, t1.id oid
from j
join json_array_elements_text(j.d->'cars') with ordinality t1(car,id) on true
join json_array_elements_text(j.d->'brands') with ordinality t2(brand,id)
on t1.id = t2.id
join json_array_elements_text(j.d->'car_type') with ordinality car_t(car_type,id)
on t1.id = car_t.id
)
, n as (
insert into t1 (name) values (d ->> 'name') returning id -- ->> extracts the value as text for the text column (d is already json)
), c as (
insert into t2 (cars,car_type) select car,car_type::int from a order by oid returning id -- needs to insert two columns here from two arrays
)
, ag as (
select array_agg(c.id) cid from c
)
insert into t3 (id, name_id, cars_id, brand)
select 1, n.id,cid[oid], brand
from a
join n on true
join ag on true
;
end;
$$ language plpgsql;
result:
t=# select * from t2;
id | cars | car_type
----+-------------------+----------
1 | bmw X5 xdrive | 1
2 | volvo v90 rdesign | 1
(2 rows)
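For reference, here is the positional-pairing trick in isolation (a minimal sketch with made-up array literals):
select t1.car, t2.brand
from json_array_elements_text('["bmw X5","volvo v90"]'::json) with ordinality t1(car, idx)
join json_array_elements_text('["bmw","volvo"]'::json) with ordinality t2(brand, idx)
  on t1.idx = t2.idx;
Each WITH ORDINALITY call appends a 1-based row number column, so joining on it zips the arrays element by element.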
I have a table called data:
create table data
(
ID int primary key,
val varchar(50),
forID int,
constraint fk_forID foreign key (forID) references otherTable(forID)
)
I have a view called dataFrequencies
create view dataFrequencies (val, freq)
as select val, COUNT(*) as freq
from data
group by val
order by freq desc
What I want is the subset of rows from table data where val is in the top fifty rows of dataFrequencies.
My current solution is somewhat roundabout. I create a table topFifty that contains the top 50 rows of dataFrequencies. Then I create a view topFiftyVals which selects all from data but inner joins on table topFifty:
create table topFifty
(
val varchar(50) primary key
)
insert into topFifty select val from dataFrequencies order by freq desc limit 50;
create view topFiftyVals (ID, val, forID)
as select d.ID, d.val, d.forID
from data d
inner join topFifty tf on d.val = tf.val
I am sure there is some kind of direct querying method that will do this! Thanks for all the help!
Yes, there is a direct way. It's the code in your topFiftyVals view, slightly altered:
select d.*, tf.freq
from data d
inner join ( select val, COUNT(*) AS freq
from data
group by val
order by freq desc
limit 50
) tf
on d.val = tf.val ;
Couldn't you just do:
SELECT *
FROM data
WHERE val IN (SELECT val
FROM dataFrequencies
ORDER BY freq DESC
LIMIT 50);
We are working on large volume data (row counts given below) :
Table 1 : 708408568 rows -- 708 million
Table 2 : 1416817136 rows -- 1.4 billion
Table 1 Schema:
----------------
ID - Int PK
column2 - Int
Table 2 Schema
----------------
Table1ID - Int FK
SomeColumn - Int
SomeColumn - Int
Table1's PK serves as the FK for Table 2 (Table1ID).
Index details :
Table1 :
PK Clustered Index on Id
Non Clustered (Non Unique) on column2
Table 2 :
Table1ID (FK) Clustered Index
Below is the query which needs to be executed :
SELECT t1.[id]
,t1.[column2]
FROM Table1 t1
inner join Table2 t2
on t2.Table1ID = t1.id
WHERE t1.[column2] in (select [id] from ConvertCsvToTable('1,2,3,4,5.......10000')) -- 10,000 comma-separated IDs
So to summarize: the inner join on ID should be handled by the clustered indexes on both the PK and the FK,
and the "huge" WHERE condition on column2 is covered by the nonclustered index.
However, the query takes 4 minutes for a small subset of just 100 IDs, and we need to pass 10,000.
Is there a better way to design this, or would table partitioning possibly help?
I just want some approaches for solving a huge-volume SELECT with an inner join and WHERE ... IN.
Note: ConvertCsvToTable is a split function which has already been determined to perform optimally.
Thanks!
This is what I would try:
Create a temp table with the structure of the function's result. Make sure to set the column ID as the primary key so that the optimizer takes it into consideration...
CREATE TABLE #temp
(id int not null
...
,PRIMARY KEY (id) )
then call the function
insert into #temp exec ConvertCsvToTable('1,2,3,4,5.......10000')
then use the temp table directly joined in the query
SELECT t1.[id], t1.[column2]
FROM Table1 t1, Table2 t2, #temp
where t1.id = t2.Table1ID
and t1.[column2] = #temp.id
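Side note: if you are on SQL Server 2016 or later, the built-in STRING_SPLIT could feed the temp table without a custom splitter (a sketch; the CSV literal is shortened here):
INSERT INTO #temp (id)
SELECT CAST(value AS int) FROM STRING_SPLIT('1,2,3,4,5', ',');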
Bring the condition into the join.
It gives the optimizer a chance to filter by t1.[column2] first.
Try different hash hints
SELECT t1.[id], t1.[column2]
FROM Table1 t1 with (nolock)
inner join Table2 t2 with (nolock)
on t2.Table1ID = t1.id
and t1.[column2] in (select [id] from ConvertCsvToTable('1,2,3,4,5.......10000'))
You may need to tell it to use that index on Column2.
But give it a chance to do the right thing.
With the condition in the WHERE, you were not giving it that chance.
If you go with #temp then try the following
(and declare a PK on the temp table, as Rodolfo stated, +1).
This will pretty much force it to start with the small table.
It could still do something stupid and join on Table2 first, but I doubt it.
SELECT t1.[id], t1.[column2]
FROM #temp
JOIN Table1 t1 with (nolock)
on t1.[column2] = #temp.ID
join Table2 t2 with (nolock)
on t2.Table1ID = t1.id
I want to show static data in MySQL when an id has no matching value in another table. I've used a LEFT JOIN, but if the id does not exist in the joined table, nothing is displayed for it. Is it possible to display a value for a specific id that has no match in the other table?
You can use COALESCE(yourLeftJoinTable.yourLeftJoinField, 0) to display 0 if the value is NULL, i.e.:
SELECT
table1.*,
COALESCE(table2.id,0) AS table2ID
FROM table1
LEFT JOIN table2
ON table2.t1_id = table1.id
The above assumes table1 has a field (id INT PK) and table2 has fields (id INT PK, t1_id INT), where table2.t1_id links to table1.id.
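A hypothetical illustration of the output, assuming table1 holds ids 1-3 and table2 has a single row (id 5, t1_id 1):
id | table2ID
---+---------
 1 |        5
 2 |        0   <- no match in table2, so COALESCE substitutes 0
 3 |        0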