I have a simple EAV table that I want to convert to JSON/JSONB and insert into a column that I will add to the entity table.
This is meant to be used as a migration query.
My EAV:
Record ( id, ... )
RecInfos ( recordid, key, value )
For each entry in the Record table, the query should build a JSON representation of every key/value pair found for it in the RecInfos table, and write the result back to the Record table with an update.
I am using PostgreSQL 10.3.
Here is what I was searching for :
update record r
set infos = (
    select json_agg(json_build_object('name', i.key, 'value', i.value))
    from recinfos i
    where i.recordid = r.id
);
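To make the effect of that update concrete, here is a minimal sketch that reproduces it outside PostgreSQL. It assumes an in-memory SQLite database as a stand-in, does the aggregation in Python instead of with json_agg, and uses made-up sample rows; only the table and column names come from the question.

```python
import json
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption); sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE record (id INTEGER PRIMARY KEY, infos TEXT);
    CREATE TABLE recinfos (recordid INTEGER, key TEXT, value TEXT);
    INSERT INTO record (id) VALUES (1), (2);
    INSERT INTO recinfos VALUES (1, 'color', 'red'), (1, 'size', 'L'), (2, 'color', 'blue');
""")

# Group the EAV rows per record, mirroring json_agg(json_build_object(...)).
grouped = {}
for recordid, key, value in conn.execute(
    "SELECT recordid, key, value FROM recinfos ORDER BY recordid, key"
):
    grouped.setdefault(recordid, []).append({"name": key, "value": value})

# Write one JSON array per record into the new column.
conn.executemany(
    "UPDATE record SET infos = ? WHERE id = ?",
    [(json.dumps(items), recordid) for recordid, items in grouped.items()],
)
conn.commit()

infos_by_id = {
    rid: json.loads(infos) for rid, infos in conn.execute("SELECT id, infos FROM record")
}
```

Each record ends up with a JSON array of name/value objects, which is the shape the PostgreSQL query produces in a single statement.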
Let's say I have created the following dimension table:
create table schema1.DOMAIN (
ID INT AUTO_INCREMENT PRIMARY KEY NOT NULL,
DOMAIN_NAME VARCHAR(10)
);
And I have a table of logs with records where DOMAIN_NAME is a column. My goal here is to write an insert statement that will populate this dimension table with values for DOMAIN_NAME, but only when they don't already exist. For example:
INSERT INTO schema1.DOMAIN (ID, DOMAIN_NAME)
select distinct DOMAIN_NAME from LOGS l where not exists (select 1 from schema1.DOMAIN d where d.domain_name = l.domain_name);
I haven't actually run this on a MySQL db yet, but I have the following questions:
Notice I didn't supply a value for the ID column in schema1.DOMAIN for the insert. Does this matter? If it's not supplied, will it simply auto-increment the primary key? Or will it throw an error? Is there a way to avoid supplying this ID and have it auto-increment automatically? This is the desired behavior for me. What is the best way to do this?
Is there a more performant way to do this?
I want this to work whether schema1.DOMAIN is empty or already has records and we are parsing a log for new values. Are these two objectives incompatible?
1. Notice I didn't supply a value for the ID column in schema1.DOMAIN for the insert. Does this matter? If it's not supplied, will it simply auto-increment the primary key? Or will it throw an error? Is there a way to avoid supplying this ID and have it auto-increment automatically? This is the desired behavior for me. What is the best way to do this?
Ans. Leave ID out of the column list and it will be auto-incremented:
INSERT INTO schema1.DOMAIN (DOMAIN_NAME)
select distinct DOMAIN_NAME from LOGS l where not exists (select 1 from schema1.DOMAIN d where d.domain_name = l.domain_name);
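2. Is there a more performant way to do this?
Ans. A left outer join that keeps only the rows with no match in schema1.DOMAIN may perform better, depending on the MySQL version and the available indexes; compare the execution plans to be sure.
3. I want this to work whether schema1.DOMAIN is empty or already has records and we are parsing a log for new values. Are these two objectives incompatible?
Ans. They are compatible: the NOT EXISTS check works whether the table is empty or already populated.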
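The left outer join mentioned in answer 2 could look like the sketch below. It assumes SQLite in memory as a stand-in for MySQL, drops the schema1 prefix, and uses invented sample rows; whether it is actually faster than NOT EXISTS depends on the engine and indexes.

```python
import sqlite3

# SQLite in memory stands in for MySQL (assumption); the schema prefix is dropped.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE domain (id INTEGER PRIMARY KEY AUTOINCREMENT, domain_name VARCHAR(10));
    CREATE TABLE logs (domain_name VARCHAR(10));
    INSERT INTO domain (domain_name) VALUES ('a.com');
    INSERT INTO logs VALUES ('a.com'), ('b.com'), ('b.com');

    -- Anti-join: keep only log names with no matching row in domain.
    INSERT INTO domain (domain_name)
    SELECT DISTINCT l.domain_name
    FROM logs l
    LEFT OUTER JOIN domain d ON d.domain_name = l.domain_name
    WHERE d.id IS NULL;
""")
names = [row[0] for row in conn.execute("SELECT domain_name FROM domain ORDER BY id")]
```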
Here is the query you wanted to write. I just removed id from the column list of the insert; it will be auto-incremented for every inserted row:
insert into schema1.domain (domain_name)
select distinct domain_name
from logs l
where not exists (select 1 from schema1.domain d where d.domain_name = l.domain_name);
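As a quick check that this pattern works both on an empty table and on one that already has rows, here is a small sketch. It assumes SQLite in memory in place of MySQL (so AUTOINCREMENT instead of AUTO_INCREMENT, no schema prefix) and invented log rows.

```python
import sqlite3

# SQLite in memory stands in for MySQL (assumption); sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE domain (id INTEGER PRIMARY KEY AUTOINCREMENT, domain_name VARCHAR(10));
    CREATE TABLE logs (domain_name VARCHAR(10));
    INSERT INTO logs VALUES ('a.com'), ('b.com'), ('a.com');
""")

INSERT_NEW = """
    INSERT INTO domain (domain_name)
    SELECT DISTINCT domain_name FROM logs l
    WHERE NOT EXISTS (SELECT 1 FROM domain d WHERE d.domain_name = l.domain_name)
"""

conn.execute(INSERT_NEW)  # empty dimension table: both names are inserted
conn.execute("INSERT INTO logs VALUES ('c.com')")
conn.execute(INSERT_NEW)  # second run only adds the new name
rows = conn.execute("SELECT id, domain_name FROM domain ORDER BY id").fetchall()
```

The ids are generated by the database on both runs, and re-running the statement never inserts a name twice.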
You could also use the insert ... on duplicate key syntax. This requires defining a unique constraint on the domain_name column:
create table schema1.domain (
id int auto_increment primary key not null,
domain_name varchar(10) unique
);
Then you can do:
insert into schema1.domain (domain_name)
select distinct domain_name from logs l
on duplicate key update domain_name = values(domain_name);
When a domain name that already exists in the table is encountered, the query falls through to the on duplicate key clause, which performs a dummy update that leaves the row unchanged.
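The same idea, letting a unique constraint reject repeats instead of checking for them in the query, can be tried locally with the sketch below. Note the swap: SQLite has no ON DUPLICATE KEY clause, so this uses SQLite's INSERT OR IGNORE as the closest analogue; it is not MySQL syntax, and the sample rows are invented.

```python
import sqlite3

# SQLite in memory (assumption); INSERT OR IGNORE replaces MySQL's ON DUPLICATE KEY.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE domain (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        domain_name VARCHAR(10) UNIQUE
    );
    CREATE TABLE logs (domain_name VARCHAR(10));
    INSERT INTO logs VALUES ('a.com'), ('b.com'), ('a.com');
""")

# No DISTINCT and no NOT EXISTS: the unique constraint filters the repeats.
LOAD = "INSERT OR IGNORE INTO domain (domain_name) SELECT domain_name FROM logs"
conn.execute(LOAD)
conn.execute(LOAD)  # re-running is harmless
names = [r[0] for r in conn.execute("SELECT domain_name FROM domain ORDER BY domain_name")]
```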
I currently have a MySQL table that is formatted as such -
What I would like to do is move this data to a new table, but instead of storing VARCHARs for the part, location, and customer data, I would like to assign each distinct value an auto-incrementing id. For example, part "DEF" would have an id of 1 and part "GHI" would have an id of 2. This is what the table would look like:
Is there an SQL query to do this?
Do you want the values in the part and loc columns to be auto-incrementing integers, or do you have type tables for part and loc, each with an auto-incrementing integer key?
Option-1:
Create the new table with a different name than the old one. Insert the entries into the new table from the old table mapping values to integers as you go.
INSERT INTO new_table_name (part, loc, quan, date, customer)
SELECT CASE
WHEN part = 'DEF' THEN 1
WHEN part = 'GHI' THEN 2
END
, CASE
WHEN loc = '...' THEN 1
WHEN loc = '...' THEN 2
WHEN loc = '...' THEN 3
END
, quan
, date
, customer
FROM original_table
Option-2:
The following is a sample type table for part:
If you have a type table for part and loc, you can do something like this...
SELECT prt.id
, loc.id
, orig.quan
, orig.date
, orig.customer
FROM original_table orig INNER JOIN part prt
ON orig.part = prt.value
INNER JOIN loc
ON orig.loc = loc.value
As far as I know, there is no way to use the auto-increment feature to directly generate values for the table you described.
It is good practice to create new tables and populate them rather than trying to change the existing one (especially if your application is live and you are dealing with customer data).
I would suggest creating a new schema as follows:
Schema Diagram
You can populate all the new tables from your existing table and make every primary key auto-incrementing.
This helps you to scale your application and maintain it easily.
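A rough sketch of that approach: build one lookup table per text column from the distinct values, then fill the new table by joining back on the text. It assumes SQLite in memory instead of MySQL, hypothetical table names part, loc, and new_table, invented sample rows, and it leaves out the date and customer columns for brevity.

```python
import sqlite3

# SQLite in memory stands in for MySQL (assumption); names and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE original_table (part TEXT, loc TEXT, quan INTEGER);
    INSERT INTO original_table VALUES
        ('DEF', 'north', 5), ('GHI', 'south', 7), ('DEF', 'south', 2);

    -- One lookup table per text column; ids are generated on insert.
    CREATE TABLE part (id INTEGER PRIMARY KEY AUTOINCREMENT, value TEXT UNIQUE);
    CREATE TABLE loc  (id INTEGER PRIMARY KEY AUTOINCREMENT, value TEXT UNIQUE);
    INSERT INTO part (value) SELECT DISTINCT part FROM original_table ORDER BY part;
    INSERT INTO loc  (value) SELECT DISTINCT loc  FROM original_table ORDER BY loc;

    -- The new table stores ids instead of text.
    CREATE TABLE new_table (part_id INTEGER, loc_id INTEGER, quan INTEGER);
    INSERT INTO new_table (part_id, loc_id, quan)
    SELECT prt.id, l.id, orig.quan
    FROM original_table orig
    INNER JOIN part prt ON orig.part = prt.value
    INNER JOIN loc l ON orig.loc = l.value;
""")
rows = conn.execute("SELECT part_id, loc_id, quan FROM new_table ORDER BY quan").fetchall()
```

Because the ids come from the lookup tables, no hand-written CASE mapping is needed, and new parts or locations only require a new lookup row.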
I want to insert data into columns where no value may be inserted more than once. No two rows of the same type/category may be identical.
I know the easiest and best way is to define the attribute as UNIQUE / PRIMARY KEY, but are there any other ways to do this?
You can check the data before inserting it, using a GROUP BY clause, DISTINCT, or a join. It really depends on your requirements.
For example, if the rows are exactly identical, using DISTINCT is enough:
INSERT INTO <YourTable>
SELECT DISTINCT ...
FROM ...
Or directly check if the data exists in the table:
INSERT INTO <YourTable>
SELECT ....
FROM Table s
WHERE NOT EXISTS(SELECT 1 FROM YourTable t
WHERE t.type = s.type and t.category = s.category)
And so on..
I'm trying to remove doublettes (sometimes triplettes, unfortunately!) from a MySQL table. My issue is that the only unique data available is the primary key, so in order to identify doublettes, you have to take all the other columns into account.
I've managed to identify all records that have doublettes and copied them along with their doublettes (including their primary keys) to the table temp. The source table is called translation and it has an integer primary key with the name TranslationID. How do I move on from here? Thanks!
edit Available columns are:
TranslationID
LanguageID
Translation
Etymology
Type
Source
Comments
WordID
Latest
DateCreated
AuthorID
Gender
Phonetic
NamespaceID
Index
EnforcedOwner
The duplication only affects rows whose Latest column is set to 1.
edit #2 Thank you, everyone for your time! I've solved the problem by using WouterH's answer, resulting in the following query:
DELETE from translation USING translation, translation as translationTemp
WHERE translation.Latest = 1
AND (translation.TranslationID > translationTemp.TranslationID)
AND (translation.LanguageID = translationTemp.LanguageID)
AND (translation.Translation = translationTemp.Translation)
AND (translation.Etymology = translationTemp.Etymology)
AND (translation.Type = translationTemp.Type)
AND (translation.Source = translationTemp.Source)
AND (translation.Comments = translationTemp.Comments)
AND (translation.WordID = translationTemp.WordID)
AND (translation.Latest = translationTemp.Latest)
AND (translation.AuthorID = translationTemp.AuthorID)
AND (translation.NamespaceID = translationTemp.NamespaceID)
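You can remove duplicates without a temporary table or subquery. Delete every row for which another row with the same data but a smaller TranslationID exists, so that the copy with the lowest TranslationID in each group is kept. (Comparing the keys with NOT ... = ... instead would match every member of a group against the others and delete all of them.)
DELETE from translation USING translation, translation as translationTemp
WHERE (translation.TranslationID > translationTemp.TranslationID)
AND (translation.LanguageID = translationTemp.LanguageID)
AND (translation.Translation = translationTemp.Translation)
AND (translation.Etymology = translationTemp.Etymology)
-- AND (...) compare the remaining columns the same way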
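One way to try a self-comparison delete safely is on a throwaway copy first. The sketch below assumes SQLite in memory, which has no multi-table DELETE ... USING, so the self-join is written as a correlated EXISTS instead; only three of the real columns and some invented rows are used. It keeps the row with the lowest TranslationID in each group.

```python
import sqlite3

# SQLite in memory (assumption); EXISTS replaces MySQL's DELETE ... USING self-join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE translation (
        TranslationID INTEGER PRIMARY KEY,
        LanguageID INTEGER,
        Translation TEXT
    );
    INSERT INTO translation VALUES
        (1, 10, 'water'), (2, 10, 'water'), (3, 10, 'water'), (4, 20, 'fire');

    -- Delete a row when an identical row with a smaller key exists.
    DELETE FROM translation
    WHERE EXISTS (
        SELECT 1 FROM translation AS other
        WHERE other.TranslationID < translation.TranslationID
          AND other.LanguageID = translation.LanguageID
          AND other.Translation = translation.Translation
    );
""")
remaining = [r[0] for r in conn.execute("SELECT TranslationID FROM translation ORDER BY 1")]
```

Plain equality never matches NULL, so columns that can be NULL need a null-safe comparison (in MySQL, the <=> operator) to be treated as equal.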
Create a SELECT statement with your current SELECT as a sub-select, so that you can return a col of IDs that should be removed. Then apply that SELECT in a DELETE FROM statement.
Example (pseudo code):
SELECT1 = SELECT ... AS temp; # the table you have right now
SELECT2 = SELECT TranslationID FROM (SELECT1)
Final query will look like this:
DELETE FROM table_name WHERE TranslationID IN (SELECT2);
You just need to insert the SELECT with sub-select in the final query.
To stop duplicates in the future you can change your engine to the InnoDB engine like this:
ALTER TABLE table_name ENGINE=InnoDB;
Then add a Unique constraint to the TranslationID field.
If the doublettes/triplettes are identical except for the primary key, then you can select all records from temp that are identical to another record but have a larger primary key than it. This gives you every record in temp except the one with the minimum key in each group of doublettes/triplettes. You can then delete these records from translation.
Instead of identifying the lines that aren't unique, I would try to copy the valid data to a new table, and then remove the old one and replace it by this new, cleaned table.
I can see two ways:
Using the DISTINCT keyword in your SQL query;
Using a GROUP BY statement on all columns.
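A minimal sketch of the copy-and-replace idea, using the GROUP BY variant. It assumes SQLite in memory, a hypothetical table name translation_clean, a reduced column set, and invented rows. Since the primary key differs between copies, it is taken with MIN rather than included in the grouping.

```python
import sqlite3

# SQLite in memory (assumption); translation_clean is a hypothetical name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE translation (
        TranslationID INTEGER PRIMARY KEY,
        LanguageID INTEGER,
        Translation TEXT
    );
    INSERT INTO translation VALUES
        (1, 10, 'water'), (2, 10, 'water'), (3, 10, 'water'), (4, 20, 'fire');

    -- Copy one row per group of identical data, keeping the smallest key.
    CREATE TABLE translation_clean AS
    SELECT MIN(TranslationID) AS TranslationID, LanguageID, Translation
    FROM translation
    GROUP BY LanguageID, Translation;

    -- Swap the cleaned table in for the original.
    DROP TABLE translation;
    ALTER TABLE translation_clean RENAME TO translation;
""")
ids = [r[0] for r in conn.execute("SELECT TranslationID FROM translation ORDER BY 1")]
```

A table created this way does not inherit the primary key or indexes of the original, so those would need to be recreated afterwards.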
I have a table in my database which has duplicate records that I want to delete. I don't want to create a new table with distinct entries for this. What I want is to delete duplicate entries from the existing table without the creation of any new table. Is there any way to do this?
id action
L1_name L1_data
L2_name L2_data
L3_name L3_data
L4_name L4_data
L5_name L5_data
L6_name L6_data
L7_name L7_data
L8_name L8_data
L9_name L9_data
L10_name L10_data
L11_name L11_data
L12_name L12_data
L13_name L13_data
L14_name L14_data
L15_name L15_data
These are all my fields:
id is unique for every row.
L11_data should be unique within its action value.
L11_data holds company names, while action holds the names of the industries.
So in my data there are duplicate company names in L11_data for the same industry.
What I want is a single row per company, with its other data, for each industry stored in action. I hope I have stated my problem in a way that you can understand.
Yes, assuming you have a unique ID field, you can delete all records that are the same except for the ID, but don't have "the minimum ID" for their group of values.
Example query:
DELETE FROM Table
WHERE ID NOT IN
(
SELECT MIN(ID)
FROM Table
GROUP BY Field1, Field2, Field3, ...
)
Notes:
I freely chose "Table" and "ID" as representative names
The list of fields ("Field1, Field2, ...") should include all fields except for the ID
This may be a slow query depending on the number of fields and rows, however I expect it would be okay compared to alternatives
EDIT: In case you don't have a unique index, my recommendation is to simply add an auto-incremental unique index. Mainly because it's good design, but also because it will allow you to run the query above.
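Applied to the question's columns, the MIN(ID) approach can be sketched as follows. It assumes SQLite in memory, a hypothetical table name companies, only the id, action, and L11_data columns, and invented rows.

```python
import sqlite3

# SQLite in memory (assumption); table name and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (id INTEGER PRIMARY KEY, action TEXT, L11_data TEXT);
    INSERT INTO companies VALUES
        (1, 'steel', 'Acme'), (2, 'steel', 'Acme'),
        (3, 'retail', 'Acme'), (4, 'steel', 'Bolt');

    -- Keep the smallest id per (industry, company) pair and delete the rest.
    DELETE FROM companies
    WHERE id NOT IN (
        SELECT MIN(id) FROM companies GROUP BY action, L11_data
    );
""")
kept = conn.execute("SELECT id, action, L11_data FROM companies ORDER BY id").fetchall()
```

On MySQL, a subquery that reads directly from the table being deleted from is typically rejected with error 1093; wrapping the subquery in a derived table (SELECT ... FROM (SELECT MIN(id) ...) AS keep) is the usual workaround.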
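ALTER IGNORE TABLE `table` ADD UNIQUE INDEX (your cols);
With IGNORE, MySQL keeps the first row of each group of duplicates on the new unique index and drops the others. Note that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so this only works on older versions.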
DELETE
FROM table_x a
WHERE rowid < ANY (
SELECT rowid
FROM table_x b
WHERE a.someField = b.someField
AND a.someOtherField = b.someOtherField
)
AND (
a.someField,
a.someOtherField
) IN (
SELECT c.someField,
c.someOtherField
FROM table_x c
GROUP BY c.someField,
c.someOtherField
HAVING count(*) > 1
)
In the above query, the combination of someField and someOtherField must uniquely identify each group of duplicates.