I don't have my server up and running yet, so unfortunately I can't test this myself, but I was hoping you could shed some light on a question I have.
So, Table A has an inverse one-to-many relationship with Table B. Would it be better to store the IDs of the Table B rows in Table A and search by ID, or would it be better/faster to query Table B for all rows whose Table A ID equals my Table A row's ID?
Basically, search (fetch a row by its ID) vs. query (grab all rows that have a certain value).
As long as the column containing the Table A ID in B is a (foreign) key, querying B is most certainly faster. Storing non-scalars (i.e. lists of IDs) in database columns is generally a bad idea.
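A minimal sketch of the foreign-key approach (the table and column names here are placeholders, not your actual schema):

CREATE TABLE table_a (
    id INT PRIMARY KEY
    -- ... other Table A columns ...
);

CREATE TABLE table_b (
    id   INT PRIMARY KEY,
    a_id INT NOT NULL,        -- each B row points at exactly one A row
    -- ... other Table B columns ...
    FOREIGN KEY (a_id) REFERENCES table_a (id),
    INDEX idx_b_a_id (a_id)   -- this index is what makes the lookup below fast
);

-- all B rows belonging to one A row:
SELECT * FROM table_b WHERE a_id = 42;

With the index in place this lookup is a quick index scan, and you never have to parse an ID list stored in a column of A.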
Related
I have two big tables for example:
'tbl_items' and 'tbl_items_transactions'
The first table keeps some item metadata, which may have 20 (varchar) columns and millions of rows... and the second table keeps each transaction against the first table.
For example, if a user inserts a new record into tbl_items, then a new record is automatically added to tbl_items_transactions with the same data plus the date, username, and transaction type, to keep a history of each row.
So in the above scenario the two tables have the same columns, but tbl_items_transactions has 3 extra columns (date, username, transaction_type) to keep the history of tbl_items.
Now assume we have 1000 users who want to insert, update, and delete tbl_items records through a web application, so these two tables grow very quickly (maybe billions of rows in tbl_items_transactions).
I have tried MySQL, MariaDB, and PostgreSQL... they are very good, but once the tables grow and millions of rows have been inserted, some SELECT queries on tbl_items_transactions become slow... although sometimes PostgreSQL is faster than MySQL or MariaDB.
Now I think I'm doing something wrong... If you were me, would you use MariaDB or PostgreSQL or something like that, and would you structure your database the way I did?
Your setup is wrong.
You should not duplicate the columns from tbl_items in tbl_items_transactions; rather, you should have a foreign key in the latter table pointing to the former.
That way data integrity is preserved, and tbl_items_transactions will be much smaller. This technique is called normalization.
To speed up queries when the tables get large, define indexes on them that match your WHERE and JOIN conditions.
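For example, a minimal sketch of that layout (the column names are assumptions, not your actual schema):

CREATE TABLE tbl_items (
    item_id BIGINT AUTO_INCREMENT PRIMARY KEY
    -- ... the ~20 varchar metadata columns live only here ...
);

CREATE TABLE tbl_items_transactions (
    transaction_id   BIGINT AUTO_INCREMENT PRIMARY KEY,
    item_id          BIGINT NOT NULL,  -- points at tbl_items instead of copying its columns
    transaction_type ENUM('INSERT','UPDATE','DELETE') NOT NULL,
    username         VARCHAR(64) NOT NULL,
    transaction_date DATETIME NOT NULL,
    FOREIGN KEY (item_id) REFERENCES tbl_items (item_id),
    INDEX idx_tx_item_date (item_id, transaction_date)  -- matches WHERE item_id = ? AND transaction_date >= ?
);

With billions of transaction rows you would also typically look at partitioning tbl_items_transactions by date range, but the foreign key plus matching indexes is the first step.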
I have created a database with a one-to-many relationship.
The parent table, say Master, has 2 columns: NodeId and NodeName. NodeId is the primary key and is of type int; the rest are of type varchar.
The child table, say Student, has 5 columns: NodeId, B, M, F, T; NodeId is the foreign key here.
None of the columns B, M, F, T is unique, and they can contain null values, hence none of them has been defined as a primary key.
Assume the Student table has more than 20,00,000 rows.
My fetch query is
SELECT * FROM STUDENT WHERE NODEID = 1 AND B='1-123'
I would like to improve the speed of fetching. Any suggestion regarding improvement of the DB structure or an alternative fetch query would be really helpful, and any suggestion that can improve overall efficiency is most welcome.
Since a foreign key is not necessarily indexed by default, adding indexes on NodeId and B in Student would likely improve query performance, provided insert performance is not as big of an issue (a concrete sketch follows at the end of this answer).
Update:
An index is essentially a way to keep your data sorted to speed up searches/queries. It should be good enough to just think of it as an ordered list.
An index is quite transparent, so your query would remain exactly the same.
A simple (non-unique) index does allow rows with the same indexed values, so it should be fine.
It is worth mentioning that a primary key is indexed by default; however, a PK does not allow duplicate data.
Also, since the index keeps an ordering of your data, insertion time will increase; however, if your dataset is big, query time should become noticeably faster.
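A minimal sketch of that suggestion, using the table and column names from your question:

-- one composite index covering both conditions in the WHERE clause
CREATE INDEX idx_student_nodeid_b ON Student (NodeId, B);

-- the fetch query itself stays exactly the same; the optimizer picks the index up automatically
SELECT * FROM STUDENT WHERE NODEID = 1 AND B = '1-123';

A single composite index on (NodeId, B) is usually preferable to two separate single-column indexes for this particular query, because the engine can satisfy both equality conditions from one index.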
Okay so here are my table schemas.
I have 2 tables. Say Table A and Table B. The primary key of Table A is PriKeyA bigint(50) and primary key of Table B is PriKeyB varchar(255). Both PriKeyA and PriKeyB contain the same type of data.
The relevant field of Table A for this problem is Last_login_date_in_A (date); for Table B it is the primary key itself.
What I need to do is get those PriKeyA values in A which are not present in Table B's PriKeyB column and whose Last_login_date_in_A is more than 30 days before the current date. Basically, I need the difference of Table A and Table B along with a certain condition (the date, in this problem).
Here is my SQL command
SELECT A.PriKeyA FROM A
LEFT JOIN B ON A.PriKeyA = B.PriKeyB
WHERE B.PriKeyB IS NULL AND DATEDIFF(CURRENT_DATE, A.Last_login_date_in_A) > 30;
However, when I run this MySQL query, it takes a ridiculously long time (about 3 hours). Table A has 2,50,000 records and Table B has 42,000 records. I thought this problem could arise from the fact that PriKeyA and PriKeyB are different datatypes, so I also tried CAST(PriKeyB AS UNSIGNED) in the query, but that too gave only a marginal performance improvement.
What could be the possible problems? I've used Left Joins before and they never have taken this long.
The query appears to be expensive for these reasons:
The SQL datatype for A's PK and B's PK aren't the same.
Table A probably doesn't have an index on Last_login_date_in_A
What this means is that ALL rows in table A MUST be examined one row at a time in order to determine whether the "> 30 days ago" criterion is true. This is especially true if A has 2,500,000 rows (as evidenced by how you placed the commas in A's row count) instead of 250,000.
Adding an index on Last_login_date_in_A might help you out here, but will also slightly slow down insert/update/delete statement times for the table due to needing to update the additional index.
Additionally, you should use EXPLAIN to see the query plan MySQL actually chooses for your query; see the MySQL query plan documentation.
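A sketch of those suggestions (the rewrite of the DATEDIFF filter into an index-friendly form is my addition; without it the new index cannot be used):

-- 1. index the date column used in the WHERE clause
CREATE INDEX idx_a_last_login ON A (Last_login_date_in_A);

-- 2. move the date arithmetic off the column so the index is usable,
--    and keep the join comparison free of per-row casts if the key types can be aligned
SELECT A.PriKeyA
FROM A
LEFT JOIN B ON A.PriKeyA = B.PriKeyB
WHERE B.PriKeyB IS NULL
  AND A.Last_login_date_in_A < CURRENT_DATE - INTERVAL 30 DAY;

-- 3. check which plan MySQL actually chooses
EXPLAIN SELECT A.PriKeyA
FROM A
LEFT JOIN B ON A.PriKeyA = B.PriKeyB
WHERE B.PriKeyB IS NULL
  AND A.Last_login_date_in_A < CURRENT_DATE - INTERVAL 30 DAY;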
I would like to know the difference between a materialized view and a normal table in MySQL. Though MySQL doesn't support materialized views, we can use Flexview.
Currently we are processing call detail records (CDR files) and saving them in a table (say table A), and we have created another table (say table B) which holds the records from table A but grouped, so table B has computed records of table A, e.g. grouped by country or city.
Table A keeps growing to an enormous size as it stores all the CDR files.
And table B stores all the results we got from table A using GROUP BY queries.
Now my question is how a materialized view can be implemented in this scenario and what difference it makes compared to table B as far as query performance is concerned.
In terms of comparing query performance, a SELECT from table B vs. a SELECT from a Flexview "materialized view", there will be no difference, because what Flexview does is create an actual table and maintain the contents of that table.
What is probably going to be different is how the "materialized view" table is populated. It's very likely going to be different than how you populate and maintain table B.
Depending on how flexview maintains the table, when insert/update/delete operations are occurring, there might be an impact to concurrency. i.e. a query being blocked from performing a read when the "refresh" process has rows locked.
But as far as querying the materialized view goes, it won't be any different from querying a regular table.
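To make that concrete, here is a rough sketch of what any MySQL "materialized view" boils down to, whether you maintain it by hand (your table B) or let Flexview maintain it for you (the CDR table and column names below are assumptions):

CREATE TABLE cdr_summary (
    country    VARCHAR(64) NOT NULL,
    call_count BIGINT      NOT NULL,
    total_secs BIGINT      NOT NULL,
    PRIMARY KEY (country)
);

-- full refresh: rebuild the aggregates from the detail table (your table A)
REPLACE INTO cdr_summary (country, call_count, total_secs)
SELECT country, COUNT(*), SUM(duration_secs)
FROM tbl_cdr
GROUP BY country;

-- reads are plain table reads, exactly as fast as reading your table B
SELECT * FROM cdr_summary WHERE country = 'DE';

The differences are all on the maintenance side (incremental refresh vs. full rebuild, locking during refresh), not on the read side.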
I have an application which collects data into a mysql table. The table has no unique id column, so I can't reference a specific row by id.
I want to write a dump application which every day dumps the new rows added to the table and uploads them elsewhere. I could do it by adding a unique id field and storing the last id dumped, but I don't want to add an id column to the table just for that.
So I thought I'd store the number of rows in the table at every dump and use that number as an offset the next time the table is dumped (SELECT * FROM table LIMIT verylargenumber OFFSET x). Of course, this only works if new rows are guaranteed to be inserted at the end of the table, so that all new rows come after the offset.
I assume I can rely on that. Am I right?
No, this isn't the case. The database may move rows around to optimize storage and make queries faster. You would have to add an ORDER BY clause to your query to guarantee any particular order. You should definitely consider adding a unique id to your table.
No, you aren't. There is no guarantee as to the order in which the engine will return the rows. A table without a unique ID is generally not such a good idea anyway. In this case, you definitely have reason enough to use one.
Similar to a file system, unless the table is optimized or defragmented, deleted data will free up a "slot" where new data will be inserted. It isn't always appended to the end of the table.
So say you have 3 rows: A, B, C
If you delete B, then your table will essentially look like A, [free space], C
So if you insert D into your table, it will now look like: A, D, C
Your best bet is to use a unique auto-incrementing key. This will also speed up queries.
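A minimal sketch of that approach (the table name collected_data is a placeholder for yours):

-- one-time change: add a surrogate key; existing rows are numbered as the table is rewritten
ALTER TABLE collected_data
    ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;

-- each dump: export only the rows added since the last run, and remember the highest id exported
SELECT * FROM collected_data
WHERE id > 123456        -- the last id from the previous dump
ORDER BY id;

Unlike the row-count offset, this keeps working even if older rows are deleted or the table is rebuilt, because existing id values do not change.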