Append text from one row in dataframe to all rows in another dataframe - cbind

I would like to add a unique identifier from one dataframe to all the rows in another dataframe, and do this for every dataframe I have: identifier 1 would be added to all rows in data1, identifier 2 to all rows in data2, and so on. Preferably this would be done in order, so that identifier 1 lines up with data1, identifier 2 with data2, etc.
Something like this
Data1:
Data  Identifier
a     1
b     1
Data2:
Data  Identifier
a     2
b     2
I have the dataframes loaded as data1, data2, data3, etc. via a for loop, and the identifiers live in another dataframe:
for (i in 1:length(data_files)) {
  assign(paste0("data", i),
         read.csv(paste0("~/documents/", data_files[i])))
}
Thanks!
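The question is about R, but the per-file tagging pattern can be sketched in Python with only the standard library. The file names and contents below are hypothetical stand-ins for the CSVs in ~/documents; the point is that file i contributes identifier i to every one of its rows.

```python
import csv
import io

# Hypothetical in-memory stand-ins for the CSV files read in the R loop.
data_files = {
    "data1.csv": "Data\na\nb\n",
    "data2.csv": "Data\na\nb\n",
}

tagged = []
# Sort by file name so identifier 1 lines up with data1, 2 with data2, etc.
for i, (name, contents) in enumerate(sorted(data_files.items()), start=1):
    for row in csv.DictReader(io.StringIO(contents)):
        row["Identifier"] = i  # same identifier for every row of file i
        tagged.append(row)

print(tagged)
```

In R the same effect comes from adding a column such as `data$Identifier <- i` inside the loop before (or instead of) calling `assign()`.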

Related

Sort JSON array by values

Is it possible to sort JSON arrays so that you get the same result regardless of the original value order?
Original:
id json_field
--------------
1 ["john", "mike"]
2 ["mike", "john"]
Altered:
id json_field
--------------
1 ["john", "mike"]
2 ["john", "mike"]
I need this so I could create a virtually generated index to quickly find records whose JSON fields contain the same values:
ALTER TABLE `table` ADD COLUMN sorted_json_index VARCHAR(255) GENERATED ALWAYS as FUNCTION_TO_SORT_JSON_SOMEHOW(json_field) AFTER json_field
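MySQL does not ship a built-in function to sort a JSON array in place, so one option is to normalize the value in application code before storing it (or in a stored function). A minimal sketch of the normalization step in Python:

```python
import json

def sort_json_array(text):
    # Parse the JSON array, sort its elements, and re-serialize, so that
    # arrays holding the same values always produce the same string.
    return json.dumps(sorted(json.loads(text)))

print(sort_json_array('["mike", "john"]'))  # '["john", "mike"]'
```

Storing this canonical string in an indexed column gives the fast equality lookups the generated-column idea is after, at the cost of normalizing on write.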

How to organise storage of many-to-many relations in JSON in MySQL

I have a table with a JSON field (example):
# table1
id | json_column
---+------------------------
1 | {'table2_ids':[1,2,3], 'sone_other_data':'foo'}
---+------------------------
2 | {'foo_data':'bar', 'table2_ids':[3,5,11]}
And
# table2
id | title
---+------------------------
1 | title1
---+------------------------
2 | title2
---+------------------------
...
---+------------------------
11 | title11
Yes, I know about storing a many-to-many relation in a third table. But that duplicates the data (the relations would be stored both in json_column and in the third table).
I know about generated columns in MySQL, but I don't understand how to use them to store m2m relations. Maybe I should use views to get pairs of table1.id <-> table2.id. But how would an index be used in that case?
I can't understand your explanation for why you can't use a third table to represent the many-to-many pairs. Using a third table is of course the best solution.
I think views have no relevance to this problem.
You could use JSON_EXTRACT() to access individual members of the array. You can use a generated column to pull each member out so you can easily reference it as an individual value.
create table table1 (
  id int auto_increment primary key,
  json_column json,
  first_table2_id int as (json_extract(json_column, '$.table2_ids[0]'))
);
insert into table1 set json_column = '{"table2_ids":[1,2,3], "sone_other_data":"foo"}';
(You must use double-quotes inside a JSON string, and single-quotes to delimit the whole JSON string.)
select * from table1;
+----+-----------------------------------------------------+-----------------+
| id | json_column                                         | first_table2_id |
+----+-----------------------------------------------------+-----------------+
|  1 | {"table2_ids": [1, 2, 3], "sone_other_data": "foo"} |               1 |
+----+-----------------------------------------------------+-----------------+
But this is still a problem: in SQL, the table must have the columns defined by the table metadata, and all rows therefore have the same columns. There is no such thing as a row populating additional columns based on its data.
So you need to create another extra column for each potential member of the array of table2_ids. If the array has fewer elements than the number of columns, JSON_EXTRACT() will fill in NULL when the expression returns nothing.
alter table table1 add column second_table2_id int as (json_extract(json_column, '$.table2_ids[1]'));
alter table table1 add column third_table2_id int as (json_extract(json_column, '$.table2_ids[2]'));
alter table table1 add column fourth_table2_id int as (json_extract(json_column, '$.table2_ids[3]'));
I'll query using vertical output, so the columns will be easier to read:
select * from table1\G
*************************** 1. row ***************************
id: 1
json_column: {"table2_ids": [1, 2, 3], "sone_other_data": "foo"}
first_table2_id: 1
second_table2_id: 2
third_table2_id: 3
fourth_table2_id: NULL
This is going to get very awkward. How many columns do you need? That depends on the maximum number of elements in the table2_ids array.
If you need to search for rows in table1 that reference some specific table2 id, which column should you search? Any of the columns may have that value.
select * from table1
where first_table2_id = 2
or second_table2_id = 2
or third_table2_id = 2
or fourth_table2_id = 2;
You could put an index on each of these generated columns, but the optimizer won't use them for an OR query like this.
These are some reasons why storing comma-separated lists is a bad idea, even inside a JSON string, if you need to reference individual elements.
The better solution is to use a traditional third table to store the many-to-many data. Each value is stored on its own row, so you don't need many columns or many indexes. You can search one column if you need to look up references to a given value.
select * from table1_table2 where table2_id = 2;
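The recommended junction-table layout can be sketched end to end, here using SQLite in place of MySQL; the junction table name and columns are assumed for illustration, and the inserted pairs are the ones that the JSON arrays [1,2,3] and [3,5,11] would encode.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, json_column TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, title TEXT);
    -- one row per (table1, table2) pair instead of an embedded JSON array
    CREATE TABLE table1_table2 (
        table1_id INTEGER REFERENCES table1(id),
        table2_id INTEGER REFERENCES table2(id),
        PRIMARY KEY (table1_id, table2_id)
    );
    INSERT INTO table1_table2
    VALUES (1, 1), (1, 2), (1, 3), (2, 3), (2, 5), (2, 11);
""")

# "Which table1 rows reference table2 id 3?" is now a single indexed lookup.
rows = conn.execute(
    "SELECT table1_id FROM table1_table2 WHERE table2_id = ? ORDER BY table1_id",
    (3,),
).fetchall()
print(rows)  # [(1,), (2,)]
```

One column to search, one index to maintain, no limit on how many ids a row may reference.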

Postgres JSON querying - match all pairs, ignore order

I have a Postgres table that contains a jsonb column, the data in which is arbitrarily deep.
id | jsonb_data
---|----------------------
1 | '{"a":1}'
2 | '{"a":1,"b":2}'
3 | '{"a":1,"b":2,"c":{"d":4}}'
Given a JSON object in my WHERE clause, I want to find the rows that contain objects that contain the same data and no more, but in any order. Including, preferably, nested objects.
SELECT * FROM table
WHERE json_match_ignore_order(jsonb_data, '{"b":2,"a":1}');
id | jsonb_data
---|-----------
2 | '{"a":1,"b":2}'
This would essentially work identically to the following Ruby code, but I'd really like to do it in the database if possible.
table.select { |row| row.jsonb_data_as_a_hash == {b: 2, a: 1} }
How can I do this?
With the jsonb type you can use the equality operator, even for values containing nested objects.
Thus the following will work:
create table jsonb_table(
  id serial primary key,
  jsonb_data jsonb
);
insert into jsonb_table(jsonb_data)
values
('{"a":1}'),
('{"a":{"c":5},"b":2}'),
('{"a":{"c":5},"b":2,"c":{"d":4}}');
select * from jsonb_table
where jsonb_data = '{"b":2,"a":{"c":5}}'::jsonb;
You will get the rows whose objects contain the same keys with the same values, compared recursively (in this case only the second row).
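The reason this works is that jsonb equality is structural, not textual: key order and whitespace are discarded when the value is parsed. Python dict equality behaves the same way, which makes for a quick illustration of the comparison semantics:

```python
import json

# Like jsonb, parsed JSON objects compare structurally: key order is
# irrelevant and nested objects are compared recursively.
a = json.loads('{"b":2,"a":{"c":5}}')
b = json.loads('{"a":{"c":5},"b":2}')
print(a == b)  # True

c = json.loads('{"a":{"c":5},"b":2,"c":{"d":4}}')
print(a == c)  # False: extra key "c"
```

This is also why the plain-text json type would not work here: it preserves the original formatting and key order, so equality would be textual.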

MySQL: Insert data from one file into two tables, one with auto_increment id, and one join table

I've got data with 500M+ records, in a file with 2 fields, blob and c_id.
There are also two other files with the same data in other formats:
A file with 90M unique blobs.
A file with a blob, and a comma-separated list of c_ids per record.
I've got two tables:
table_a: [id, blob] # id is auto-increment
table_b: [a_id, c_id]
For each unique blob, a record in table_a must be created. For each record in the file, a record in table_b must be created with the appropriate foreign key to table_a.
The solution I use now is to generate insert statements, using last_insert_id, but it's too slow. I'd prefer to use LOAD DATA INFILE, but the auto-increment id is making stuff complicated.
e.g.
# Raw data
c_id blob
1 aaaa
2 aaaa
3 aaaa
3 aaab
4 aaac
Desired output:
# Table_a
id blob
1 aaaa
2 aaab
3 aaac
# Table_b
c_id a_id
1 1
2 1
3 1
3 2
4 3
I am not sure how you are populating the "c_id" field for table_b, but you can do something like this:
Load all the data into table_a first, then populate table_b by executing batch queries like:
"SELECT id into outfile '/tmp/file1.csv' FIELDS TERMINATED BY ';' LINES TERMINATED BY '\n' from table_a where id > 0 limit 100000"
and using load infile on '/tmp/file1.csv'.
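Another way to sidestep last_insert_id entirely is to assign the table_a ids yourself in a preprocessing pass over the raw file, then bulk-load both tables with LOAD DATA INFILE. A sketch of the id-assignment step, with the question's sample data inlined for illustration:

```python
import csv
import io

# Stand-in for the raw 500M-row file: one c_id and one blob per record.
raw = "c_id,blob\n1,aaaa\n2,aaaa\n3,aaaa\n3,aaab\n4,aaac\n"

blob_ids = {}            # blob -> assigned table_a id
table_a, table_b = [], []
for row in csv.DictReader(io.StringIO(raw)):
    blob = row["blob"]
    if blob not in blob_ids:
        blob_ids[blob] = len(blob_ids) + 1  # mimic auto-increment
        table_a.append((blob_ids[blob], blob))
    table_b.append((int(row["c_id"]), blob_ids[blob]))

print(table_a)  # [(1, 'aaaa'), (2, 'aaab'), (3, 'aaac')]
print(table_b)  # [(1, 1), (2, 1), (3, 1), (3, 2), (4, 3)]
```

Writing table_a and table_b out as two flat files lets both loads use explicit ids, so no round-trips to the server are needed; you would then set the auto-increment counter past the highest assigned id.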

Combining two identical tables (actually table fields) into one

This is actually a follow-up to a previous question, but it contains different information, so I decided to start a new topic.
Summary:
I have 2 tables, TMP_TABLE and BKP_TABLE. Both have the same field structure and identical data types for the fields (with almost identical data).
Let's say TMP_TABLE is constructed like this:
TMP_TABLE
NAME PREFIX PARAMETERS
data data data
data data data
data data data
and BKP_TABLE looks like this:
BKP_TABLE
NAME PREFIX PARAMETERS
data1 data1 data1
data1 data1 data1
data1 data1 data1
Is it possible to combine these two tables into one that looks like this:
END_RESULTTABLE
NAME PREFIX PARAMETERS
data data1 data1
data data1 data1
data data1 data1
As you can see I wish to drop one of the fields and replace it with another.
The sequence is pretty much the same so I don't have to worry about records being incorrect.
A side question
At the moment both TMP and BKP contain the exact same data (113 records).
When I do this
SELECT * FROM TMP_TABLE
UNION ALL
SELECT * FROM BKP_TABLE
I get 226. Why does this happen? I thought that duplicate entries (which I can clearly see) would not appear in my virtual table.
EDIT:
I would like to replace one field of TMP_TABLE with a BKP_TABLE field (for example, name).
UNION ALL
will return all records from both selects (hence the ALL)
UNION
will remove duplicates
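The difference is easy to demonstrate; here with SQLite standing in for the database, and two tiny two-row tables in place of the 113-record ones:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tmp_table (name TEXT);
    CREATE TABLE bkp_table (name TEXT);
    INSERT INTO tmp_table VALUES ('a'), ('b');
    INSERT INTO bkp_table VALUES ('a'), ('b');
""")

# UNION ALL keeps every row from both selects.
all_rows = conn.execute(
    "SELECT name FROM tmp_table UNION ALL SELECT name FROM bkp_table"
).fetchall()

# UNION removes duplicate rows from the combined result.
dedup = conn.execute(
    "SELECT name FROM tmp_table UNION SELECT name FROM bkp_table"
).fetchall()

print(len(all_rows), len(dedup))  # 4 2
```

That is exactly the 226-vs-113 behaviour in the question: two identical 113-row tables give 226 rows under UNION ALL and 113 under UNION.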
Assuming your two tables have a key in common (e.g. name), you can do something like this:
create table end_resulttable as
select t.name, t.prefix, b.parameters
from tmp_table t
join bkp_table b on t.name = b.name;
Is that what you mean?