Joining across 3000+ tables - mysql

Ok, strange one here
I have a database for customer data. My customers are businesses with their own customers.
I have 3000 tables (one for each business) with several thousand email addresses in each. Each table is identical, save the name.
I need a way to find where emails cross over between businesses (i.e. appear in multiple tables) and the names of the tables they sit in.
I have tried collating all entries and table names into one table and using a "group by", but the volume of data is too high to run this without our server keeling over...
Does anyone have a suggestion on how to accomplish this without running 3000 sets of joins?
Also, I cannot change the data structure AT ALL.
Thanks
EDIT: In response to those "helpful" restructure comments: not my database, not my system; I only started a couple of months ago, to analyse the data.

Multiple tables of identical structure almost never make sense; all it would take is a business field to fix this design. If at all possible you should fix the structure. If it has been foisted upon you and you cannot change it, you should still be able to work with it.
Select the distinct emails and the table name from each table, either with UNION ALL or by pulling them into a new table, then use GROUP BY and HAVING to find emails that appear in multiple tables.
SELECT email, GROUP_CONCAT(source_table) AS source_tables
FROM Combined_Table
GROUP BY email
HAVING COUNT(source_table) > 1
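If writing out 3000 SELECTs by hand is the sticking point, the UNION ALL can be generated from the data dictionary. A minimal sketch, assuming the tables live in a schema called customer_db (name hypothetical) and share an email column:
-- the default group_concat_max_len (1024) is far too small for 3000 SELECTs
SET SESSION group_concat_max_len = 1048576;
SELECT GROUP_CONCAT(
CONCAT('SELECT DISTINCT ''', table_name, ''' AS source_table, email FROM `', table_name, '`')
SEPARATOR ' UNION ALL ')
FROM information_schema.tables
WHERE table_schema = 'customer_db';
The resulting string can then be wrapped in CREATE TABLE Combined_Table AS ... and executed as a prepared statement.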

So, you say you can't change the data structure, but you might be able to provide a compatible upgrade.
Provide a new mega table:
CREATE TABLE business_email (
id_business INT(10) NOT NULL,
email VARCHAR(255) NOT NULL,
-- the key is the (business, email) pair; no UNIQUE on email alone,
-- since the same email must be allowed to appear under several businesses
PRIMARY KEY (id_business, email)
) ENGINE = MYISAM;
MyISAM engine, so you don't have to worry about transactions.
Add a trigger to every single business table to duplicate the email into the new one:
DELIMITER \\
-- MySQL has no combined "INSERT OR UPDATE" trigger event, so pair this
-- with a matching AFTER UPDATE trigger per table.
CREATE TRIGGER TRG_COPY_EMAIL_BUSINESS1 AFTER INSERT ON business1 FOR EACH ROW
BEGIN
-- 1 is the hard-coded id for business1; each table's trigger carries its own id,
-- assuming the per-business tables have no id_business column of their own
INSERT INTO `business_email` (`id_business`, `email`) VALUES (1, NEW.`email`) ON DUPLICATE KEY UPDATE `id_business` = `id_business`;
END;
\\
DELIMITER ;
Your remaining problem is to add the trigger dynamically whenever a new table is created. That shouldn't be hard, since apparently there's already dynamic DDL in your application code.
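Since the table names apparently encode the business (business1, business2, ...), the trigger DDL for the backlog of existing tables could itself be generated. A hedged sketch, assuming that naming pattern; note that CREATE TRIGGER can't run as a prepared statement, so the generated strings have to be executed from application code:
SELECT CONCAT(
'CREATE TRIGGER TRG_COPY_EMAIL_', UPPER(table_name),
' AFTER INSERT ON `', table_name, '` FOR EACH ROW',
' INSERT INTO business_email (id_business, email)',
' VALUES (', SUBSTRING(table_name, 9), ', NEW.email)',  -- numeric suffix of "businessN" as the id
' ON DUPLICATE KEY UPDATE id_business = id_business;')
FROM information_schema.tables
WHERE table_schema = 'customer_db'
AND table_name REGEXP '^business[0-9]+$';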
Copy all existing data to the new table:
INSERT INTO `business_email` (`id_business`, `email`)
SELECT 1, email FROM business1
UNION
SELECT 2, email FROM business2
...
;
Then proceed with your query on the new business_email table, which should be greatly simplified:
SELECT `email`, GROUP_CONCAT(`id_business`) AS businesses
FROM `business_email`
GROUP BY `email`
HAVING COUNT(*) > 1;
This query should be easy to cope with. If not, please detail the issue, as I don't think properly indexed tables should be a problem even for millions of rows (which I doubt you have anyway, since we are talking about emails).
The advantage of this solution is that you stay up to date all the time, while you don't change the way your application works. You just add another layer to provide additional business value.

Related

I need to link separate table IDs

I have a dashboard where I can update, delete and create. I have 3 separate tables: Developers, Absent, Date.
Developers contains developer names and personal information. Absent records which days someone is absent. Date is much the same, but for holidays instead.
I joined the tables, so absent.absent_id=developers.absent_id, date.date_id=developers.date_id
When I create a developer I need to insert values. HOWEVER, I'm having a problem: the IDs of the tables need to be manually input through the database. I would like it so that if I create a new developer on the dashboard using INSERT, the absent_id and date_id are linked between tables.
In short:
If I create a developer on submit, add a new auto-incremented ID row to all 3 tables. Any way this can be done?
You can achieve it with 3 SQL requests: first insert the new Developer and return its ID, use that to insert a new row in the Absent table and also return its ID, then use both IDs to create the Date row. I don't think this is related to the front end (React); it's rather a backend matter.
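A minimal sketch of that flow using LAST_INSERT_ID(). The column names are hypothetical, and because the question's joins put absent_id and date_id on Developers, this version inserts the child rows first:
INSERT INTO Absent (absent_day) VALUES (NULL);      -- hypothetical column
SET @absent_id = LAST_INSERT_ID();
INSERT INTO `Date` (holiday_day) VALUES (NULL);     -- hypothetical column
SET @date_id = LAST_INSERT_ID();
INSERT INTO Developers (name, absent_id, date_id)
VALUES ('Jane Doe', @absent_id, @date_id);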
Cheers.

Update a table with data from another where non FK columns match

I'm working on an online registry which was created by a previous programmer. I have to fix a bunch of data integrity issues revolving around postal codes and cities. I am trying to do a large update query using data from our table of Canadian postal codes and our table of registrants. My query seems to take literally infinite time in my development environment, and I'm not sure why.
Create Temporary Table RegistrantToChange AS (
SELECT
intID, vcCity, vcPostalCode
FROM
tblRegistrantWebsiteSignUps
WHERE
vcPostalCode NOT LIKE '00%' AND vcPostalCode!=''
AND (vcCity = '' OR vcCity = 'unspecified')
);
UPDATE RegistrantToChange, tblPostalCodes
SET
vcPostalCode = tblPostalCodes.PostalCode
WHERE
vcCity = tblPostalCodes.CityName;
Pardon the horrific and inconsistent naming. I just recently took over this project and am still in the process of refactoring the whole thing.
vcCity in your temporary table is not indexed, and if tblPostalCodes.CityName is not indexed then the JOIN in the update has a lot of work to do and may take some time.
I would suggest creating the temporary table first with an index on vcCity, then performing an INSERT...SELECT to populate it. Ensure that tblPostalCodes.CityName is indexed and then perform your update.
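Along these lines, reusing the names from the question (index names and column sizes are guesses):
CREATE TEMPORARY TABLE RegistrantToChange (
intID INT,
vcCity VARCHAR(100),
vcPostalCode VARCHAR(10),
INDEX idx_city (vcCity)
);
INSERT INTO RegistrantToChange (intID, vcCity, vcPostalCode)
SELECT intID, vcCity, vcPostalCode
FROM tblRegistrantWebsiteSignUps
WHERE vcPostalCode NOT LIKE '00%' AND vcPostalCode != ''
AND (vcCity = '' OR vcCity = 'unspecified');
-- only if CityName is not already indexed:
ALTER TABLE tblPostalCodes ADD INDEX idx_cityname (CityName);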

SQL to update column in modified table

I am a reasonably competent SQL programmer but my skills are still pretty much in the domain of simple INSERT, SELECT, UPDATE statements with an occasional LIKE etc thrown in. What I am currently trying to do is rather more complex. Here is the scenario.
I have three tables.
Table 1, *users*, identifies users via a User ID, uid. Users can have one or more subaccounts.
Table 2, *accounts*, keeps a record of subaccounts for each user with, amongst other things, the columns uid and sid, where uid is the one defined in the *users* table.
Table 3, *data*, currently stores some data in a data column that is associated with a particular subaccount, sid.
The thing I have just realized is that there is no particular reason to block users from using those data across subaccounts. No problem - I can change my data subset search SQL to work with the uid instead. However, given the frequency of such searches, it seems well worthwhile simply sticking a uid column in *data*.
To do that I would need to write some smart SQL that would get uid,sid pairs from the *accounts* table and use that information to update the newly created uid column in the data table. This I have to admit is beyond my knowledge of SQL.
I should mention that the system using these data is now in production and has several hundreds of users, so the option of just acting like they are not there is not available. Not terribly relevant, I think, but I should mention that uid and sid are alphanumeric strings, with both columns being indexed.
I would be most grateful to anyone here who might be able to help out with it.
MySQL can do updates based on joins, and based on my reading of your schema, here's what I'd do...
UPDATE data d
JOIN accounts a ON a.sid = d.sid
SET d.uid = a.uid
WHERE d.uid IS NULL;
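That assumes the uid column already exists on *data*; if it doesn't, add it (and an index) first. A sketch, with the VARCHAR size guessed, since the question only says the ids are alphanumeric strings:
ALTER TABLE data
ADD COLUMN uid VARCHAR(32) NULL,
ADD INDEX idx_data_uid (uid);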

sync records of two tables in the same database in MySQL

I have two tables with some same fields like:
Table A: fname, lname, address, age, email, mobile, website, blog
Table B: fname, lname, address, age, email
Both these tables are used by different modules on my website. I want to sync the first five fields of both tables in such a way that whenever a new row is added or an existing row is modified in Table A, the Table B is updated automatically and vice versa.
For example:
A user creates a new record in Table A. Now Table B should also be updated with this new information; and vice versa, if a user creates a new record in Table B, Table A should also be updated with it.
A user modifies a record in Table A. Now Table B should also be updated with this modified information; and vice versa, if a user modifies a record in Table B, Table A should also be updated with it.
How can I achieve this? I thought of using triggers, but would that not create an infinite loop, resulting in a server error?
Is any field among those 5 guaranteed to be unique? You could add a conditional to the trigger to check to see if that field exists before inserting the record in the table.
You might want to rethink the design also. Storing duplicate records in 2 places seems a little scary. You're going to have to have triggers for updates, inserts, and deletes.
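For the insert case, a guarded trigger might look like the sketch below (table and column names are lightly adapted from the question; the unique field is assumed to be email). Note that the mirror trigger on Table B writing back into Table A is exactly where MySQL tends to refuse the chain, so true two-way sync via triggers is shaky:
DELIMITER //
CREATE TRIGGER trg_a_after_insert AFTER INSERT ON table_a
FOR EACH ROW
BEGIN
-- guard: only copy the row if the email is not already in table_b
IF NOT EXISTS (SELECT 1 FROM table_b WHERE email = NEW.email) THEN
INSERT INTO table_b (fname, lname, address, age, email)
VALUES (NEW.fname, NEW.lname, NEW.address, NEW.age, NEW.email);
END IF;
END//
DELIMITER ;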
If you just need to update one table when the other table gets updated, then instead of creating a second table, create a View, which is like a table but virtual (not real), as sketched below.
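A minimal sketch of that idea (names hypothetical): the five shared fields become a simple projection of Table A, which stays in sync by definition and, being a plain single-table projection, even remains updatable in MySQL:
CREATE VIEW table_b AS
SELECT fname, lname, address, age, email
FROM table_a;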
But since you've asked for updates in both directions:
I believe you should step back from this problem and tell us why you need to update both tables according to each other,
because you are just keeping duplicate data in two places that is not needed.
So try to think whether it can be done without creating two tables, or with something like one table plus one view for the partial-columns requirement.
This is not a direct answer to your problem, but I am trying to solve it in an optimized way, which is good for everyone's health...
Hope you understood what I tried to tell you. :)

adding data to interrelated tables..easier way?

I am a bit rusty with mysql and trying to jump in again..So sorry if this is too easy of a question.
I basically created a data model that has a table called "Master" with required fields of a name and an IDcode and a then a "Details" table with a foreign key of IDcode.
Now here's where it's getting tricky. I am entering:
INSERT INTO Details (Name, UpdateDate) Values (name, updateDate)
I get an error saying IDcode on Details doesn't have a default value... so I add one, and then it complains that Field 'Master_IDcode' doesn't have a default value.
It all makes sense, but I'm wondering if there's an easy way to do what I am trying to do. I want to add data into Details and, if no IDcode exists, add an entry into the Master table. The problem is that I have to first add the name to the fund Master, wait for a unique ID to be generated (for IDcode), then figure that out and add it to my query when I enter the master data. As you can imagine, the queries are going to get quite long, since I have many tables.
Is there an easier way, where every time I add something it searches by name whether a foreign key exists and, if not, adds it to all the tables it's linked to? Is there a standard way people do this? I can't imagine, with all the complex databases out there, that people have not figured out an easier way.
Sorry if this question doesn't make sense. I can add more information if needed.
p.s. this may be a different question, but I have heard of Django for Python and that it helps create queries... would it help my situation?
Thanks so much in advance :-)
(decided to expand on the comments above and put it into an answer)
I suggest creating a set of staging tables in your database (one for each data set/file).
Then use LOAD DATA INFILE (or insert the rows in batches) into those staging tables.
Make sure you drop indexes before the load, and re-create what you need after the data is loaded.
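A hypothetical load into one staging table (the file path, delimiters and column list are all placeholders):
-- server-side path; subject to the secure_file_priv setting
LOAD DATA INFILE '/tmp/import.csv'
INTO TABLE staging_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(country_code, colA, colB, colC);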
You can then make a single pass over the staging table to create the missing master records. For example, let's say that one of your staging tables contains a country code that should be used as a masterID. You could add the master record by doing something along the lines of:
insert
into master_table(country_code)
select distinct s.country_code
from staging_table s
left join master_table m on(s.country_code = m.country_code)
where m.country_code is null;
Then you can proceed and insert the rows into the "real" tables, knowing that all detail rows reference a valid master record.
If you need to get reference information along with the data (such as translating some code) you can do this with a simple join. Also, if you want to filter rows by some other table this is now also very easy.
insert
into real_table_x(
`key`
,colA
,colB
,colC
,computed_column_not_present_in_staging_table
,understandableCode
)
select x.`key`
,x.colA
,x.colB
,x.colC
,(x.colA + x.colB) / x.colC
,c.understandableCode
from staging_table_x x
join code_translation c on(x.strange_code = c.strange_code);
This approach is a very efficient one and it scales very nicely. Variations of the above are commonly used in the ETL part of data warehouses to load massive amounts of data.
One caveat with MySQL is that it doesn't support hash joins, which is a join mechanism very suitable for fully joining two tables. MySQL uses nested loops instead, which means that you need to index the join columns very carefully.
InnoDB tables with their clustering feature on the primary key can help to make this a bit more efficient.
One last point: when you have the staging data inside the database, it is easy to add some analysis of the data and put aside "bad" rows in a separate table. You can then inspect the data using SQL instead of wading through csv files in your editor.
I don't think there's a one-step way to do this.
What I do is issue a
INSERT IGNORE INTO master (..) VALUES (..)
to the master table, which will either create the row if it doesn't exist, or do nothing, and then issue a
SELECT id FROM master WHERE someUniqueAttribute = ..
The other option would be stored procedures/triggers, but they are still pretty new in MySQL and I doubt whether this would help performance.
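Spelled out against the Master/Details tables from the question (column names partly guessed; INSERT IGNORE only helps here if Master.Name has a UNIQUE key):
INSERT IGNORE INTO Master (Name) VALUES ('Fund A');
-- LAST_INSERT_ID() is not reliable when IGNORE skips the insert,
-- hence the explicit lookup on the unique name:
SELECT IDcode INTO @master_id FROM Master WHERE Name = 'Fund A';
INSERT INTO Details (Master_IDcode, Name, UpdateDate)
VALUES (@master_id, 'Fund A', NOW());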