Delete partial duplicate rows

I have a Dataverse table that has a few columns. One of those columns is an Order Number column. There should only be one row per order number. If there is more than 1, only the first one should be kept. How can I do this in Power Automate?
What I have tried so far: first, I created an array of all the order numbers. From there, I'm stuck. I started adding an Apply to Each action to loop through the table and count how many rows there are for each order number, but then I confused myself and didn't think that was the right way to go.
Or is there a way to keep the "duplicate" rows from being added to the Dataverse table in the first place? The data is loaded into the table from JSON. Is there a way to remove the "duplicate" items from the JSON?
Here's an example of the situation:
| OrderNumber | OrderDate | CustomerName |
| 450123| 2-24-22 | Business A |
| 450123| 2-25-22 | Business A |
| 383238| 2-24-22 | Business B |
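The thread has no answer for this, but the "keep only the first row per order number" rule can be applied to the JSON before it is loaded. The Python sketch below shows that logic only; it is not Power Automate syntax. It assumes the JSON is an array of objects whose field names match the example table, and that "first" means first in the order the records appear in the payload. The function name `keep_first_per_order` is hypothetical.

```python
import json


def keep_first_per_order(records, key="OrderNumber"):
    """Return records with only the first occurrence of each key value kept."""
    seen = set()   # order numbers already emitted
    unique = []
    for rec in records:
        order_no = rec[key]
        if order_no not in seen:
            seen.add(order_no)
            unique.append(rec)
    return unique


# Hypothetical payload built from the example table.
payload = json.loads("""[
  {"OrderNumber": "450123", "OrderDate": "2-24-22", "CustomerName": "Business A"},
  {"OrderNumber": "450123", "OrderDate": "2-25-22", "CustomerName": "Business A"},
  {"OrderNumber": "383238", "OrderDate": "2-24-22", "CustomerName": "Business B"}
]""")

deduped = keep_first_per_order(payload)
print(json.dumps(deduped, indent=2))
```

Tracking already-seen order numbers in a set keeps this to a single pass over the records, and the original order of the surviving rows is preserved.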


Best way to handle duplicated rows

I have an insurance company "dictionary" table in my database, for example:
+----+-------------------+----------+
| ID | Name | Data |
+----+-------------------+----------+
| 1 | InsuranceCompany1 | SomeData |
+----+-------------------+----------+
But I'm fetching data from another system, and as a result I get duplicate insurance companies, but without my data:
+----+-------------------+----------+
| ID | Name | Data |
+----+-------------------+----------+
| 1 | InsuranceCompany1 | SomeData |
+----+-------------------+----------+
| 2 | InsuranceCompany1 | |
+----+-------------------+----------+
Both records are referenced from a variety of models, but they represent the same data. What I want is to pair these records without changing queries or data in other tables, so that no one can tell there are two records and both resolve to one instance, which is:
+----+-------------------+----------+
| 1 | InsuranceCompany1 | SomeData |
+----+-------------------+----------+
My question is: Is there some proper way to handle situations like this?
I've come up with a solution: add a parent_id column, manually set parent_id on the duplicated rows, and then override Eloquent methods such as find() in the model to return the parent when parent_id is set.
Copying the SomeData column is not an option, because there can be conditions such as insurance_company_id == id.
You can try creating a view of your dict table, something like this:
CREATE VIEW unique_dict AS
SELECT MIN(ID) AS ID,
       Name,
       GROUP_CONCAT(Data) AS Data
FROM dict
GROUP BY Name;
That will give you one row per name.
Then, in your queries requiring one row per name, SELECT from the unique_dict view rather than the dict table.
GROUP_CONCAT() yields a list of values from Data, which helps if more than one duplicated row contains a value: you get them all.
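To see what the view returns for the example data, here is a small sketch using Python's built-in sqlite3 module instead of MySQL (SQLite also has GROUP_CONCAT, and in both databases it skips NULLs). It assumes the imported duplicate stores NULL rather than an empty string in Data; with an empty string, the concatenated value would gain a trailing separator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dict (ID INTEGER PRIMARY KEY, Name TEXT, Data TEXT);
    INSERT INTO dict VALUES (1, 'InsuranceCompany1', 'SomeData');
    INSERT INTO dict VALUES (2, 'InsuranceCompany1', NULL);  -- imported duplicate

    -- One row per Name: lowest ID wins, all non-NULL Data values are joined.
    CREATE VIEW unique_dict AS
    SELECT MIN(ID) AS ID,
           Name,
           GROUP_CONCAT(Data) AS Data
    FROM dict
    GROUP BY Name;
""")

rows = conn.execute("SELECT ID, Name, Data FROM unique_dict").fetchall()
print(rows)  # [(1, 'InsuranceCompany1', 'SomeData')]
```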
In the longer term, it would be smart to treat these duplicates as "dirty data" and clean them up as you INSERT new rows. How do you do that?
Create a unique index on Name.
CREATE UNIQUE INDEX unique_name ON dict(Name);
Then, when loading new data into dict, use Eloquent's updateOrCreate() method. For more on that, see "Laravel 5.1 Create or Update on Duplicate".
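The unique index plus "update or create" can also be expressed at the database level. The sketch below uses SQLite's upsert syntax (INSERT ... ON CONFLICT ... DO UPDATE, available since SQLite 3.24) through Python's sqlite3 module; in MySQL the comparable statement is INSERT ... ON DUPLICATE KEY UPDATE. This is a swapped-in technique, not how Eloquent's updateOrCreate() is implemented. The helper `load_company` is hypothetical, and the COALESCE is an assumption about the desired behaviour: an incoming row without Data should not wipe out existing Data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dict (ID INTEGER PRIMARY KEY, Name TEXT, Data TEXT);
    CREATE UNIQUE INDEX unique_name ON dict(Name);
""")


def load_company(conn, name, data=None):
    """Hypothetical loader: insert a company, or update the row with the same Name."""
    conn.execute(
        """
        INSERT INTO dict (Name, Data) VALUES (?, ?)
        ON CONFLICT(Name) DO UPDATE
            -- excluded.Data is the incoming value; keep the old Data if it is NULL
            SET Data = COALESCE(excluded.Data, Data)
        """,
        (name, data),
    )


load_company(conn, "InsuranceCompany1", "SomeData")
load_company(conn, "InsuranceCompany1")  # duplicate from the other system, no Data

rows = conn.execute("SELECT ID, Name, Data FROM dict").fetchall()
print(rows)  # [(1, 'InsuranceCompany1', 'SomeData')]
```

The second load hits the unique index, so no second row is created and the existing ID stays the one other tables already reference.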

How to extract relational data from a flat table using SQL?

I have a single flat table containing a list of people, which records their participation in different groups and their activities over time. The table contains the following columns:
- name (first/last)
- e-mail
- secondary e-mail
- group
- event date
+ some other data in a series of columns, relevant to a specific event (meeting, workshop).
I want to extract distinct people from that into a separate table, so that further down the road it could be used for their profiles giving them a list of what they attended and relevant info. In other words, I would like to have a list of people (profiles) and then link that to a list of groups they are in and then a list of events per group they participated in.
Obviously, the same people appear a number of times:
| Full name | email | secondary email | group | date |
| John Smith | jsmith#someplace.com | | AcOP | 2010-02-12 |
| John Smith | jsmith#gmail.com | jsmith#somplace.com | AcOP | 2010-03-14 |
| John Smith | jsmith#gmail.com | | CbDP | 2010-03-18 |
| John Smith | jsmith#someplace.com | | BDz | 2010-04-02 |
Of course, I would like to roll it into one record for John Smith, with both e-mail addresses, in the resulting People table. I can't rule out that there might be more records for the same person with e-mails other than those two; I can live with that. To make it more complex, ideally I would like to derive a list of groups, creating a Groups table (possibly with further details on the groups), and then a list of meetings/activities for each group. By linking these I would then have a clean relational model.
Now, the question: is there a way to perform such a transformation of data in SQL? Or do I need to write a procedure (program) that would traverse the database and do it?
The database is in MySQL, though I can also use MS Access (it was given to me in that format).
There is no tool that does this automatically. You will have to write a couple of queries (unless you want to write a DTS package or something proprietary). Here's a typical approach:
1. Write two SELECT statements for the two tables you wish to create: one for users and one for groups. You may need to use DISTINCT or GROUP BY to ensure you get only one row when the source table contains duplicates.
2. Run the two SELECT statements and inspect the results for problems. For example, some users may show up with two different email addresses, or different users with the same name may have been combined incorrectly. These will need to be cleaned up before you can proceed. There is no great way to do this; it's more or less a manual process requiring expert knowledge of the data.
3. Write CREATE TABLE scripts based on the two SELECT statements so that you can store the results somewhere.
4. Use INSERT INTO ... SELECT or SELECT ... INTO to populate the tables from your two SELECT statements.
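A condensed sketch of these steps, run against SQLite from Python so it is self-contained (the answer targets MySQL/Access; INSERT INTO ... SELECT works the same way in MySQL). The table names person, person_email, grp and attendance, and the column name grp_name (used instead of "group", which is a reserved word), are hypothetical. It also assumes the full name alone identifies a person, which is exactly the kind of assumption the manual inspection is meant to check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Flat source table with the example rows (NULL = no secondary e-mail).
    CREATE TABLE flat (full_name TEXT, email TEXT, secondary_email TEXT,
                       grp_name TEXT, event_date TEXT);
    INSERT INTO flat VALUES
        ('John Smith', 'jsmith#someplace.com', NULL, 'AcOP', '2010-02-12'),
        ('John Smith', 'jsmith#gmail.com', 'jsmith#somplace.com', 'AcOP', '2010-03-14'),
        ('John Smith', 'jsmith#gmail.com', NULL, 'CbDP', '2010-03-18'),
        ('John Smith', 'jsmith#someplace.com', NULL, 'BDz', '2010-04-02');

    -- Target tables (hypothetical names).
    CREATE TABLE person (id INTEGER PRIMARY KEY, full_name TEXT UNIQUE);
    CREATE TABLE person_email (person_id INTEGER REFERENCES person(id),
                               email TEXT, UNIQUE (person_id, email));
    CREATE TABLE grp (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE attendance (person_id INTEGER REFERENCES person(id),
                             grp_id INTEGER REFERENCES grp(id),
                             event_date TEXT);

    -- One row per distinct person (assumes full_name identifies a person).
    INSERT INTO person (full_name) SELECT DISTINCT full_name FROM flat;

    -- Primary and secondary addresses; UNION removes exact duplicates.
    INSERT INTO person_email (person_id, email)
    SELECT p.id, e.email
    FROM (SELECT full_name, email FROM flat
          UNION
          SELECT full_name, secondary_email FROM flat
          WHERE secondary_email IS NOT NULL) AS e
    JOIN person AS p ON p.full_name = e.full_name;

    INSERT INTO grp (name) SELECT DISTINCT grp_name FROM flat;

    -- Link table: who attended which group's event on which date.
    INSERT INTO attendance (person_id, grp_id, event_date)
    SELECT p.id, g.id, f.event_date
    FROM flat AS f
    JOIN person AS p ON p.full_name = f.full_name
    JOIN grp AS g ON g.name = f.grp_name;
""")

counts = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("person", "person_email", "grp", "attendance")
}
print(counts)
```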

Autoincrement non-unique column

I am trying to create a queue system for pre-orders for our webshop. Sometimes we have more orders than stock for a few deliveries to our warehouse, and I'm trying to organize things so that the people who ordered first get their products first.
The problem comes when a customer wants to change an order, for example by adding something to it. For booking purposes, we then return the first order in our system, which creates another order, and finally create a new order with everything on it. This puts the customer last in the queue in our current system, where we go by date created.
What I would like is for the original queue spot to be copied over to the new order without breaking the auto-increment. This also means that there will be three orders (original, return, and new) with the same queue number.
id | order | queue | ordercomment
1 | 1001 | 1 | new order
2 | 1002 | 2 | new order
3 | 1003 | 3 | new order
4 | 1004 | 1 | return order 1001
5 | 1005 | 1 | corrected order 1001
6 | 1006 | 4 | new order
Is there any way to handle this without making a manual incrementing solution that checks for the current highest number whenever an order is made?
"...where we go by date created"
But your data has no such date. You're relying on incrementing integers to determine the sort order, and that's where you're running into trouble.
If you want to sort by the date created, store the date the order was created. Any time you modify, append to, or otherwise recreate an order, you can still preserve the original order date, perhaps with two columns: the date of the current order and the date of the original order. (For most orders these two values would be the same, but there's nothing wrong with that.)
Then your order of priority would simply be the date of the original order.
Basically, don't try to use an integer as a timestamp. Use a timestamp.
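A minimal sketch of the two-column idea, using SQLite through Python. The column names created_at and original_created_at and all timestamps are hypothetical. For fulfilment you would simply ORDER BY original_created_at, created_at; the query here additionally derives a queue number from the original date, which reproduces the queue column in the question without any manual incrementing. One caveat: two unrelated orders placed at exactly the same timestamp would share a spot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        order_no INTEGER NOT NULL,
        created_at TEXT NOT NULL,           -- when this row was created
        original_created_at TEXT NOT NULL,  -- when the customer first ordered
        ordercomment TEXT
    );
    INSERT INTO orders (order_no, created_at, original_created_at, ordercomment) VALUES
        (1001, '2022-03-01 09:00', '2022-03-01 09:00', 'new order'),
        (1002, '2022-03-01 10:00', '2022-03-01 10:00', 'new order'),
        (1003, '2022-03-02 08:30', '2022-03-02 08:30', 'new order'),
        (1004, '2022-03-03 12:00', '2022-03-01 09:00', 'return order 1001'),
        (1005, '2022-03-03 12:01', '2022-03-01 09:00', 'corrected order 1001'),
        (1006, '2022-03-04 15:00', '2022-03-04 15:00', 'new order');
""")

# Queue spot = number of distinct original dates up to and including this one,
# so every order that inherits an original date shares that date's spot.
QUEUE_SQL = """
    SELECT o.order_no,
           (SELECT COUNT(DISTINCT o2.original_created_at)
              FROM orders AS o2
             WHERE o2.original_created_at <= o.original_created_at) AS queue
    FROM orders AS o
    ORDER BY o.id
"""
queue = conn.execute(QUEUE_SQL).fetchall()
print(queue)
```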

How to get the right "version" of a database entry?

I have the following table structure:
table reports:
ID | time | title | (extra columns)
1 | 1364762762 | xxx | ...
Multiple object tables have the following structure:
ID | objectID | time | title | (extra columns)
1 | 1 | 1222222222 | ... | ...
2 | 2 | 1333333333 | ... | ...
3 | 3 | 1444444444 | ... | ...
4 | 1 | 1555555555 | ... | ...
In the object tables, when an object is updated, a new version with the same objectID is inserted, so that the old versions are still available. For example, see the entries with objectID = 1.
In the reports table, a report is inserted but never updated/edited.
What I want to be able to do is the following:
For each entry in my reports table, I want to be able to query the state of all objects, like they were, when the report was created.
For example, let's look at the sample report above with ID 1. At the time it was created (see the time column), the current version of objectID 1 was the entry with ID 1 (entry ID 4 did not exist at that point).
ObjectID 2 also existed, and its current version was the entry with ID 2.
I am not sure how to achieve this.
I could use a query that selects the object versions by the time column:
SELECT *
FROM (
SELECT *
FROM objects
WHERE time < [reportTime]
ORDER BY time DESC
) AS versions
GROUP BY objectID
Let's not talk about the performance of this query; it is just to make clear what I want to do. My problem is the comparison of the time columns. I don't think this is a good way to make sure I get the right object versions, because the system time may change "for any reason", and the time column would then contain wrong data, which would lead to wrong results.
What would be another way to do so?
I thought about not using a time column for this, and instead using a GLOBAL incremental value, so that I know the insertion order across the database tables.
If you are inserting new versions of the object and your problem is the time column (I assume you are using this column to decide which version is newer), I suggest using an auto-increment ID column for the versions. Even if the time value is not reliable for you, the ID will be, since it is always increasing: higher ID, newer version.
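A sketch of that suggestion in SQLite via Python: versions are ordered by the auto-increment ID, and the state "as of" a report is the highest version ID per objectID up to a cutoff. Storing that cutoff (for example, the largest objects.ID at the moment the report is inserted) with each report is an assumption, as is the helper name `versions_as_of`. With several object tables, each table's IDs are ordered independently, so a report would need one cutoff per table or a single shared sequence, as the question suggests. Under concurrent inserts, auto-increment values reflect allocation order, which may not match commit order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (ID INTEGER PRIMARY KEY AUTOINCREMENT,
                          objectID INTEGER NOT NULL,
                          title TEXT);
    -- Same insertion order as the example: ID 4 is a newer version of objectID 1.
    INSERT INTO objects (objectID, title) VALUES
        (1, 'v1 of object 1'), (2, 'v1 of object 2'),
        (3, 'v1 of object 3'), (1, 'v2 of object 1');
""")


def versions_as_of(conn, max_version_id):
    """Hypothetical helper: latest version of each object with ID <= max_version_id."""
    return conn.execute(
        """
        SELECT o.ID, o.objectID
        FROM objects AS o
        JOIN (SELECT objectID, MAX(ID) AS max_id   -- newest version per object
              FROM objects
              WHERE ID <= ?
              GROUP BY objectID) AS latest
          ON o.ID = latest.max_id
        ORDER BY o.objectID
        """,
        (max_version_id,),
    ).fetchall()


# Report 1 was created after versions 1 and 2 existed, so its cutoff would be 2.
print(versions_as_of(conn, 2))  # [(1, 1), (2, 2)]
print(versions_as_of(conn, 4))  # [(4, 1), (2, 2), (3, 3)]
```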

mysql lookup table

Lookup table - unique row identity
The other lookup-table designs just don't make sense to me. From what I have seen, you give a row an ID, put that ID in another table which also has an ID, then add these IDs to more tables that may reference them, and still create lookup tables with more IDs (this is how all the examples I can find seem to work). What I have done is this:
product_item - table
------------------------------------------
id | title | supplier | price
1 | title11 | suuplier1 | price1
etc.
It then goes on to include more items (I'm sure you get the idea).
product_feature - table
--------------------------
id | title | iskeyfeature
1 | feature1 | true
feature_desc - table
-----------------------------
id | title | desc
1 | desc1 | text description
product_lookup - table
item_id | feature_id | feature_desc
1 | 1 | 1
1 | 2 | 2
1 | 3 | 3
1 |64 | 15
(Since these only need to be referenced in the lookup table, there can be multiple IDs per item or multiple items per feature.)
What I want to do, without adding item_id to every feature row or description row, is retrieve only the columns from the multiple tables whose IDs are referenced in the same row of the lookup table. I want to know whether it is possible to select all the referenced columns from the lookup rows if I only know the item_id. For example, for item_id = 1, return all rows where item_id = 1 together with the columns referenced in each row. Every item can have multiple features, and every feature could be attached to multiple items; this will not matter if I can get the pattern right for building this query from a single known value.
Any assistance, or just some direction, will be greatly appreciated. I'm using phpMyAdmin, and I'm sure this would be easier with some PHP voodoo, but I am learning MySQL from tutorials etc. and would like to know how to do it with SQL directly.
Having a NULL value in a column is not the major concern that would lead to this design - it's the problem with adding new attribute columns in the future, at which MySQL is disgracefully bad.
If you want to make a query that returns everything about an item in one row, you need to LEFT OUTER JOIN back to the product_lookup table for each feature_id. This is about every 10th mysql question on Stack Overflow, so you should be able to find tons of examples.
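The question as asked (all rows for one known item_id, with the columns each lookup row references) can be sketched with one join per referenced table. This uses SQLite via Python; the rows for feature 2 and description 2 are hypothetical filler, the helper name `features_for_item` is hypothetical, and "desc" is quoted because it is a reserved word. Plain JOINs drop lookup rows whose referenced row is missing; LEFT JOINs would keep them. The answer's one-row-per-item layout would instead need a LEFT OUTER JOIN per feature, as described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_item (id INTEGER PRIMARY KEY, title TEXT, supplier TEXT, price TEXT);
    CREATE TABLE product_feature (id INTEGER PRIMARY KEY, title TEXT, iskeyfeature INTEGER);
    CREATE TABLE feature_desc (id INTEGER PRIMARY KEY, title TEXT, "desc" TEXT);
    CREATE TABLE product_lookup (item_id INTEGER, feature_id INTEGER, feature_desc INTEGER);

    INSERT INTO product_item VALUES (1, 'title11', 'supplier1', 'price1');
    INSERT INTO product_feature VALUES (1, 'feature1', 1), (2, 'feature2', 0);
    INSERT INTO feature_desc VALUES (1, 'desc1', 'text description'),
                                    (2, 'desc2', 'another description');
    INSERT INTO product_lookup VALUES (1, 1, 1), (1, 2, 2);
""")


def features_for_item(conn, item_id):
    """One result row per lookup row, with the columns that row references."""
    return conn.execute(
        """
        SELECT i.title, f.title, f.iskeyfeature, d.title, d."desc"
        FROM product_lookup AS l
        JOIN product_item AS i ON i.id = l.item_id
        JOIN product_feature AS f ON f.id = l.feature_id
        JOIN feature_desc AS d ON d.id = l.feature_desc
        WHERE l.item_id = ?
        ORDER BY l.feature_id
        """,
        (item_id,),
    ).fetchall()


for row in features_for_item(conn, 1):
    print(row)
```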