Sql delete thousands of rows with text file - mysql

I have a database that has a few tables with over 150,000 rows. I want to delete all but about 18,000.
My plan is to go through and obtain the Id of every item I need to keep. I have a list but some names might not match so I plan on manually documenting the ids that I need to keep in a text file. Is there a way I can use said text file to delete everything but the IDs in this text file? Is there a better way?
Backstory: I'm running a game server eq emulator and was able to acquire a list of all items in the game up to the expansions I want from the game wiki. But the emulator was written years into the release, so items are not in order. I need to get rid of many items on this list of ids and I need to do this in multiple tables.
I tried searching the internet for similar situations and saw references to using a list to complete the task. Now my goal is generating a list of items to keep, because it is a lot smaller than the list of items I need to remove.

Alter the table(s) to add a column named something like ToDelete. Then you can start updating this column to mark the rows you want removed, and you can do it across as many steps as you need. Finally, you can run one short DELETE query that removes every marked row by filtering on the new column in the WHERE clause.
I would also be inclined to rename or copy the tables first, to preserve the data in the existing database for quick recovery of rows anywhere you make a mistake.
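As a rough sketch of the flag-then-delete approach combined with the text file of ids to keep: the code below uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name `items`, the key column `id`, and the helper names are all assumptions made for illustration. It flags every row by default, clears the flag for each id read from the file, and then deletes whatever is still flagged.

```python
import sqlite3

def read_keep_ids(lines):
    """Parse one integer id per line, skipping blank lines (an open file works)."""
    return [int(line) for line in lines if line.strip()]

def delete_all_but(conn, table, keep_ids):
    """Flag every row, unflag the ids to keep, then delete the flagged rows.

    `table` is interpolated into the SQL, so it must come from trusted code.
    Assumes the key column is called `id` (hypothetical).
    """
    cur = conn.cursor()
    # New column defaults to 1, so every existing row starts out flagged.
    cur.execute(f"ALTER TABLE {table} ADD COLUMN ToDelete INTEGER NOT NULL DEFAULT 1")
    cur.executemany(
        f"UPDATE {table} SET ToDelete = 0 WHERE id = ?",
        [(i,) for i in keep_ids],
    )
    cur.execute(f"DELETE FROM {table} WHERE ToDelete = 1")
    deleted = cur.rowcount
    conn.commit()
    return deleted
```

With a real file this would be called along the lines of `delete_all_but(conn, "items", read_keep_ids(open("keep_ids.txt")))`, where `keep_ids.txt` is a hypothetical file name. Running a `SELECT COUNT(*) ... WHERE ToDelete = 1` before the DELETE is a cheap way to check that the number of flagged rows is what you expect.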

Related

SSRS (MS Report Builder) - How do I add multiple lines of queried details to a single row of otherwise more general info?

I've been trying to figure this out, but I'm struggling.
Working in Microsoft Report Builder (latest version), I have a table that, for the most part, contains general information from a specific table, which I'll call GeneralInfo. In that table, each person has only a single row containing information about that person. However, one of my columns has a one-to-many relationship with the rest of the row. I'll call this other column DetailInfo.
This table provides an example of the kind of thing I'm going for:
In this example, all of the white cells come from the GeneralInfo table. The Orange cell may include many rows of work history, and each entry includes multiple elements from the DetailInfo table, separated by Newlines. The two tables can be matched / joined on the ID value.
This may not be the best way to go about a report, but it's part of the spec I was given. I know this can be done, but I'm having trouble learning how. Can anyone help me out?
Edit - I just found out that another column is also potentially one-to-many. In the example table, it would be saying that the "Occupation" value comes from the DetailInfo table, rather than the GeneralInfo table.
In MOST circumstances, this would just be "Construction Foreman" over and over again, and we would only want to show that once. However, in rare circumstances, an individual may have multiple concurrent (differing) Occupation values that would have to be shown. Is that possible? Should I make that a separate question?
I took Soundappan A's advice and created a sub-report in the column that needed the extra data. This video was helpful to me in learning how to set that up:
https://www.youtube.com/watch?v=LhSitVAnhyc

Why are my records not being inserted in the same order they are being executed?

I'm using Microsoft Office Access as my DBMS and I'm using VBA to write my code for this project.
I'm doing data scraping for items on a website and I encountered something that appeared odd to me after I had inserted my data into a table.
In my code I use a loop to iterate through and collect all the items that the website has to offer. Once I have all the data for one item, I insert it into my table and then move on to the next. There are 14,724 items that I need to insert into my table. If I iterate over all of them, they are all added to the table, but they are out of order when I look at them in the table. However, if I adjust the loop to collect only the first 10 items, for example, they appear in the same order in which they were collected, which is the same order in which they appear in the source code for the website.
It is important to note that my table does not have an id field because it is not required as there's one other field that serves as a unique identifier for an item in the table.
This does not seem like a big issue but I'm curious as to why this happens. Is there some kind of limitation when using MS-Access as your DBMS?
Any insight is greatly appreciated.
Thank you.
A table is not a spreadsheet.
This is by design of any relational database engine. Records in a table have no order other than what you eventually assign or apply.
If you want to sort the data (ascending or descending) use a Query. The Table doesn't have any order. Even the Fields don't have any relevant order.
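A small sketch of what that means in practice, using Python's built-in sqlite3 module rather than Access and VBA: the table and column names (`items`, `seq`, `item_code`) are made up for the example. If the insertion order matters, it has to be stored as data (here an auto-incrementing `seq` column, roughly what an AutoNumber field does in Access) and then requested explicitly with ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `seq` records the order of insertion; `item_code` is the natural unique key.
conn.execute(
    "CREATE TABLE items ("
    " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
    " item_code TEXT UNIQUE NOT NULL)"
)

scraped = ["zebra", "apple", "mango"]  # order in which the items were collected
conn.executemany("INSERT INTO items (item_code) VALUES (?)", [(c,) for c in scraped])

# The order of the result is only defined because of the ORDER BY clause.
in_insert_order = [r[0] for r in conn.execute("SELECT item_code FROM items ORDER BY seq")]
alphabetical = [r[0] for r in conn.execute("SELECT item_code FROM items ORDER BY item_code")]
```

Without the ORDER BY, the engine is free to return the rows in whatever order is convenient for it, which is why a small test run can look ordered while the full load does not.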

A way to update data in Oracle

I have a table that I need to update each day. The data comes in a text file every time. I wrote a program that extracts the data from the text file and writes it to the table, but now I want to modify it to just update the existing data. The data is mostly the same; it might differ in only a few places.
I was thinking about MERGE but I don't know very well how I could use this in my program. All the examples that I saw used a second table.
So it would be like creating a second table in which I extract the current data, after which I make the merge into the old table to update the records. I want to avoid creating a second table, so I was wondering if there is any way to do this?
Thanks!

Extract Distinct Record in SSIS

I am writing an SSIS package to import data from *.csv files into a SQL 2008 DB. The problem is that one of the files contains duplicate records, and I want to extract only the distinct values from that source. Please see the image below.
Unfortunately, the generated files are not under my control. They are owned by a third party, and I cannot change the way they are generated.
I did use the LookUp Component. But it only checks the existing data against the incoming data. It does not check the duplicate records in the incoming data.
I believe the sort component gives an option to remove duplicate rows.
Depends on how serious you want to get about the duplicates. Do you need a record of what was duplicated, or is it enough to just get rid of them? The Sort component will get rid of dups on the sort field. However, the dups may have different data in the other fields, and then you want a different strategy. Usually I load everything to staging tables and clean up from there. I send the dupes I removed to an exception table (we have to answer a lot of questions from our customers about why things don't match what they sent), and I often use a set of business rules (enforced with either Execute SQL or Data Flow tasks) to determine which one to pick if there are duplicates in one area but not another (say, two business addresses when we can only store one). I also make sure the client is aware of how we determine which of the two to pick.
Use SORT tool for that from Toolbox, then click on it. You will get all available input columns.
Check the column and change sortType direction and then check "remove rows with duplicate sort value".
Bring in the data from the csv file the way it is, then dedup it after it's loaded.
It'll be easier to debug, too.
I used Aggregate Component and Group By both QualificationID and UnitID. If you want, you can also use Sort Component too. Perhaps, my information might help others.
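To illustrate the logic behind these answers outside of SSIS, here is a minimal Python sketch of de-duplicating on a key while keeping the rejected rows for an exception table. It keeps the first row seen for each key, which is only one possible rule. QualificationID and UnitID come from the answer above; the `Title` field and the function name are invented for the example.

```python
def split_duplicates(rows, key_fields):
    """Return (kept, rejected): the first row per key is kept, later ones rejected."""
    seen = set()
    kept, rejected = [], []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key in seen:
            rejected.append(row)  # candidate for an exception table
        else:
            seen.add(key)
            kept.append(row)
    return kept, rejected

incoming = [
    {"QualificationID": 1, "UnitID": 10, "Title": "first copy"},
    {"QualificationID": 1, "UnitID": 10, "Title": "second copy"},
    {"QualificationID": 1, "UnitID": 11, "Title": "other unit"},
]
kept, rejected = split_duplicates(incoming, ["QualificationID", "UnitID"])
```

Note that the two rows sharing a key differ in `Title`, which is exactly the case where "keep the first one" may not be the right business rule and the rejected rows are worth inspecting.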

Save and get arbitrary sort order in SQL Server

My client wants to sort products by drag & drop. The drag & drop part is easy with javascript.
My problem is how do I save and get the sort order?
I'm using .net c# and SQL Server 2008.
When I move a product and drop it in a new position I get the id of the product that's moved, product in front and product behind. With this data I want to update the sort order of products.
I was thinking of adding a field with position, but then I guess I have to update every item when position changes.
In general adding an additional position field is the only thing you can do, to get truly arbitrary ordering.
But you can implement it in several ways. Here are two ways I've implemented myself some time ago.
1. Method: Update all position values, by looping over your items and performing an UPDATE statement for every position.
This is easy to implement, but because of the many updates, it's not good for many items and/or large tables. Especially if you do it via Ajax and perform a complete re-ordering on every change in the list.
2. Method: Do a smart update of only the affected rows.
SELECT all items in the current sort order (the "old list"). This is usually fast compared to an UPDATE statement.
Iterate over all items from the "new list" and compare each item to the item from the old list at the same position/index. If the items are the same, don't do anything.
If the items are different, find the item in the old list that should actually be at that position and update its position value accordingly. (Some lookup data structure might be useful here.)
That way you only have to perform minimal database updates, but you'll have more complex code.
Personally I'd go with the first way, until the database updates actually become a performance problem.
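The second method can be sketched roughly as follows, in Python rather than C# and with the database calls left out. The function name and the 0-based positions are assumptions; the function only computes which (id, position) pairs need an UPDATE by comparing the two lists index by index.

```python
def position_updates(old_order, new_order):
    """Return (item_id, new_position) for every slot whose occupant changed.

    Both arguments are lists of item ids in display order and must contain
    the same ids. Positions are 0-based list indexes.
    """
    if sorted(old_order) != sorted(new_order):
        raise ValueError("old and new lists must contain the same ids")
    return [
        (new_id, pos)
        for pos, (old_id, new_id) in enumerate(zip(old_order, new_order))
        if old_id != new_id
    ]

# Product 40 is dragged from the fourth slot to the front.
updates = position_updates([10, 20, 30, 40, 50], [40, 10, 20, 30, 50])
```

Each returned pair would then become one `UPDATE ... SET position = ? WHERE id = ?`. In the example, product 50 keeps its slot and needs no update, while the four products in front of it do.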
We have a sort column, but yes, we have to re-index all rows as things change. You could mitigate this by assigning sort values in large enough increments, such as 10s or 100s, to allow some room for moving rows before you have to re-index. That's not the best solution, though, and I'd be interested to see what other ideas people have.
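One way the increment idea might look in code, as a sketch only: given the positions of the product in front of and behind the drop target (the question says both ids are known), pick an unused integer between them, and signal that a full re-index is needed when no gap is left. The step of 100 and the function name are arbitrary choices for illustration.

```python
STEP = 100  # gap left between neighbouring rows after a full re-index

def position_between(before, after):
    """Pick a position for a row dropped between two neighbours.

    `before` / `after` are the neighbours' positions, or None at either end
    of the list. Returns None when the neighbours are adjacent integers,
    meaning the caller has to renumber all rows first.
    """
    if before is None and after is None:
        return STEP            # first row in an empty list
    if after is None:
        return before + STEP   # dropped at the end
    if before is None:
        return after - STEP    # dropped at the front (may go negative)
    mid = (before + after) // 2
    return mid if mid > before else None
```

With this scheme a move costs one UPDATE for the moved row in the common case, and the expensive renumbering only happens when a gap is exhausted.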
If you can capture each move programmatically (with up and down buttons, for example), then you can just swap the position numbers of the row being moved and the row it trades places with. Make sure that you add new rows at the max position + 1.
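A minimal sketch of the swap approach, again using Python's sqlite3 module in place of SQL Server and C#: the `products` table, its columns, and both helper names are hypothetical. The two UPDATEs run in one transaction so a failure cannot leave two rows sharing a position.

```python
import sqlite3

def add_product(conn, name):
    """Insert a new row at max position + 1 (or 1 if the table is empty)."""
    with conn:
        conn.execute(
            "INSERT INTO products (name, position) "
            "SELECT ?, COALESCE(MAX(position), 0) + 1 FROM products",
            (name,),
        )

def swap_positions(conn, id_a, id_b):
    """Exchange the position values of two rows inside one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        pos = dict(conn.execute(
            "SELECT id, position FROM products WHERE id IN (?, ?)", (id_a, id_b)
        ))
        conn.execute("UPDATE products SET position = ? WHERE id = ?", (pos[id_b], id_a))
        conn.execute("UPDATE products SET position = ? WHERE id = ?", (pos[id_a], id_b))
```

This sketch assumes there is no unique constraint on `position`; with one, the intermediate state of the swap would violate it, and the swap would have to go through a temporary value.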