How to delete the columns of a MySQL table in which every value is NULL? The table is 38 GB, so performance matters.

I have a huge table (38 GB) in which many columns contain nothing but NULL values. The problem is that when the table is created we do not know which columns will hold data, so we have to include all of them. As a result, query performance is very bad.
So I need to find all columns whose values are all NULL and drop them to reduce the size of the table. Also, inner joins on this table take far too long. Is it generally the case that inner joins on large tables take more time?

Run
SELECT COUNT(DISTINCT colName) FROM myTable
for each column. The result is 0 only if the column contains no value other than NULL. You can then run
ALTER TABLE myTable DROP COLUMN colName
to drop the column.
An alternative is
SELECT * FROM myTable PROCEDURE ANALYSE()
which gives an overview of every column in the table, including two interesting fields: Empties_or_zeros and Nulls. They hold the number of empty (or zero) values and the number of NULL values in each column. Note that PROCEDURE ANALYSE() was removed in MySQL 8.0, so this only works on older servers.
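On a 38 GB table, one query per column means one full scan per column. A possible shortcut is to check all columns in a single pass, because COUNT(col) counts only non-NULL values. Below is a minimal sketch of that idea, not a tested MySQL script: it uses Python's sqlite3 module as a stand-in for the server so that it runs anywhere, and the helper name all_null_columns and the sample columns are made up for illustration. On MySQL the column names would come from information_schema.columns instead of the SQLite-only PRAGMA.

```python
import sqlite3

def all_null_columns(conn, table):
    """Return the columns of `table` that contain no non-NULL value."""
    # PRAGMA table_info is SQLite-specific; field 1 of each row is the column name.
    # Only pass trusted table names: identifiers cannot be bound as parameters.
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    # COUNT(col) ignores NULLs, so one scan yields the non-NULL count of every column.
    select_list = ", ".join(f"COUNT({c})" for c in cols)
    counts = conn.execute(f"SELECT {select_list} FROM {table}").fetchone()
    return [c for c, n in zip(cols, counts) if n == 0]

# Hypothetical sample data: two columns are never filled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER, used TEXT, unused_a TEXT, unused_b REAL)")
conn.executemany(
    "INSERT INTO myTable (id, used) VALUES (?, ?)",
    [(1, "x"), (2, None), (3, "y")],
)

empty = all_null_columns(conn, "myTable")
for col in empty:
    # Print the statements instead of running them, so they can be reviewed first.
    print(f"ALTER TABLE myTable DROP COLUMN {col};")
```

Note that an empty table reports every column as all-NULL, so it is worth checking the row count before dropping anything.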

Related

MySQL - Does SELECT * need an index of all table fields?

I would like to know if it is necessary to create an index for all fields within a table if one of your queries will use SELECT *.
To explain: if we had a table with 10M records and ran a SELECT * query on it, would the query run faster if we had created an index on all fields of the table, or does MySQL handle SELECT * differently from SELECT first_field, a_field, last_field?
To my understanding, if I had a query that did SELECT first_field, a_field FROM table then it would bring performance benefits if we created an index on first_field, a_field but if we use SELECT * is there even a benefit from creating an index for all fields?
A SELECT * FROM mytable query has to read all the data in the table. This could, theoretically, be done from an index if you have an index on all the columns, but it would simply be faster for the database to read the table itself.
If you have a where clause, having an index on (some of) the columns you have conditions on may dramatically improve the query's performance. It's a gross simplification, but what basically happens is the following:
The appropriate rows are filtered according to the where clause. It's much faster to search for these rows in an index (which is, essentially, a sorted tree) than a table (which is an unordered set of rows).
For the columns that were in the index used in the previous step, the values are returned straight from the index.
For the columns that aren't, the table is accessed (according to a pointer kept in the index).
Indexing a column of a MySQL table improves performance when you need to search for or edit a row based on that column.
For example, if there is an 'id' column and it is the primary key, and you search for a record with a WHERE clause on 'id', you don't need to create an index for that column, because a primary key column already acts as an indexed column.
In another case, if there is a 'pid' column in the table that is not a primary key, it is better to create an index on 'pid' if you search by it. That makes the query find the expected record faster.
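The effect described in these answers can be observed by asking the database how it plans to run a query. The following is only a sketch under assumptions: it uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN, and the table layout, the index name idx_pid and the helper plan are invented for the demonstration. The exact wording of the plan differs between engines and versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, pid INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO mytable (pid, payload) VALUES (?, ?)",
    [(i % 100, f"row {i}") for i in range(1000)],
)

def plan(sql):
    """Return the planner's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last field is the readable detail

query = "SELECT * FROM mytable WHERE pid = 42"
before = plan(query)  # no index on pid yet: the whole table is scanned
conn.execute("CREATE INDEX idx_pid ON mytable (pid)")
after = plan(query)   # the WHERE clause can now be answered through idx_pid
by_pk = plan("SELECT * FROM mytable WHERE id = 42")  # primary key needs no extra index

print(before, after, by_pk, sep="\n")
```

The query still selects every column with SELECT *; only the column in the WHERE clause needed an index to avoid the full scan.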

Generic stored procedure to lag a table column

I need to calculate returns at different frequencies. In order to do so, I would like to be able to lag the values in a column by k units. While I have found different specific solutions, I have not been able to make a general stored procedure (most likely due to my inexperience with mysql). How could I best do this?
I have a table with multiple columns, amongst which columns containing info on:
ID
Date
Price
The end result should be a table with all the original columns, plus a column containing the lagged values of Price.
To keep the procedure general, I could imagine the procedure would take the table name, necessary column names (e.g. ID, Date, Price), and number of lags k as input, and append a column to the table.
You can do what you want with a correlated subquery. Here is an example:
select t.*,
(select t2.price
from <tablename> t2
where t2.date < t.date
order by t2.date desc -- most recent earlier row first
limit 1 offset 0 -- use offset k - 1 for a lag of k
) as price_lag_1
from <tablename> t;
Your desire to create a generic stored procedure is not very SQL-y. MySQL doesn't support table-valued functions, so you wouldn't be able to use the resulting table as an actual table.
If you want to put this in a stored procedure that is generic, you will need dynamic SQL to construct the SQL statement, using the particular table and columns that you pass in.
Instead, I would suggest that you simply learn how to express what you want as a query. If you have multiple tables with the same structure, then you may want to revisit your data model. Having multiple similar tables is often a sign that one entity has been inappropriately spread across too many tables.
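If the query has to be generic, one option is to build the statement in the calling code from the table name, the column names and the lag k, which is the same idea as dynamic SQL inside a stored procedure. The sketch below is illustrative only. It runs against SQLite through Python's sqlite3 module rather than MySQL, the function name lag_query and the sample table prices are hypothetical, and it assumes that ID identifies the series, so the lag is taken within each ID. Earlier rows are sorted newest first, so OFFSET k - 1 picks the k-th previous row.

```python
import sqlite3

def lag_query(table, id_col, date_col, value_col, k):
    """Build a SELECT that adds `value_col` lagged by k rows within each id."""
    if not (isinstance(k, int) and k >= 1):
        raise ValueError("k must be a positive integer")
    # Identifiers cannot be bound as parameters, so they are interpolated;
    # only pass trusted table and column names here.
    return f"""
        SELECT t.*,
               (SELECT t2.{value_col}
                  FROM {table} t2
                 WHERE t2.{id_col} = t.{id_col}
                   AND t2.{date_col} < t.{date_col}
                 ORDER BY t2.{date_col} DESC
                 LIMIT 1 OFFSET {k - 1}) AS {value_col}_lag_{k}
          FROM {table} t
         ORDER BY t.{id_col}, t.{date_col}
    """

# Hypothetical sample data: two series with daily prices.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (id INTEGER, date TEXT, price REAL)")
conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", [
    (1, "2020-01-01", 10.0), (1, "2020-01-02", 11.0),
    (1, "2020-01-03", 12.0), (1, "2020-01-04", 13.0),
    (2, "2020-01-01", 100.0), (2, "2020-01-02", 101.0),
])

lagged = conn.execute(lag_query("prices", "id", "date", "price", 2)).fetchall()
for row in lagged:
    print(row)  # (id, date, price, price_lag_2); NULL where no such earlier row exists
```

A correlated subquery runs once per outer row, so on large tables an index on (id, date) is likely to matter.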

mySQL: duplicating multiple records via temporary table, how to preserve autoincrement index?

I wish to duplicate a selection of records in a mySQL table.
The pk of the table is an autoincremented int.
I want to do this with one set of mysql queries (for performance reasons).
It seems like the fastest way to do this is to put the results of the selection into a temporary table,
make any changes needed, and reinsert the records back to the original table, like this:
CREATE TEMPORARY TABLE temp1234 ENGINE=MEMORY SELECT * FROM a_table WHERE column='my selection';
# do updates in temp1234; (altering FK's mainly)
INSERT INTO a_table SELECT * FROM temp1234;
But when I try to do this I get an error for duplicate PKs.
Now, I realise that I could alter the INSERT ... SELECT query to exclude the pk/ID column, but as I am procedurally generating these queries across multiple tables for a large data-copying function, I want to avoid having to supply column names.
What is the best way around this problem?
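One approach that avoids listing column names is to blank out the key in the temporary table, so that the target table hands out fresh auto-increment values on insert. The sketch below shows the idea with Python's sqlite3 module standing in for MySQL; the columns parent_id and label are invented, and it assumes the key column is called id. In SQLite the copied id column is nullable straight away. In MySQL the copied column probably keeps its NOT NULL attribute, so it would first have to be made nullable (or dropped) in the temporary table; treat that part as an assumption to verify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE a_table (id INTEGER PRIMARY KEY AUTOINCREMENT, parent_id INTEGER, label TEXT)"
)
conn.executemany(
    "INSERT INTO a_table (parent_id, label) VALUES (?, ?)",
    [(7, "keep"), (7, "copy me"), (8, "copy me")],
)

# Copy the selection into a temporary table (all columns, no names listed).
conn.execute(
    "CREATE TEMPORARY TABLE temp1234 AS SELECT * FROM a_table WHERE label = 'copy me'"
)
# Stand-in for the real changes, e.g. re-pointing foreign keys.
conn.execute("UPDATE temp1234 SET parent_id = parent_id + 100")
# Clear the copied keys so the target table assigns new ones.
conn.execute("UPDATE temp1234 SET id = NULL")
conn.execute("INSERT INTO a_table SELECT * FROM temp1234")

rows = conn.execute("SELECT id, parent_id, label FROM a_table ORDER BY id").fetchall()
for row in rows:
    print(row)
```

The only column name the generated queries need to know is the key itself, which is usually available from the table metadata.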

How to speed up a MySQL query that compares columns of different data types

1) I have two large tables that cannot be altered because of their size.
I have to join them on a common field and then compare one field from each table. The two fields hold the same data, but one is an int and the other is a varchar.
I know this is easy to write, but with millions of records the comparison between two different data types is slow. How can I make it fast?
2) My second, similar question: I have to join two tables on a field such as id that has a different data type in each table, for example int in one and char in the other. How can I join these two tables efficiently? I cannot wait for days.
(One solution I have tried is to create a new table as a copy of the old one via outfile/infile: I changed the data type from char to int in the CREATE TABLE statement and then loaded the file.)
If anybody has another solution, please share.
Make sure the conversion happens on the first table in the join, that way:
the conversion happens only once per row
indexes can be used to join with the second table
for example:
select *
from table1
join table2 on table2.intcol = cast(table1.varcharcol as signed)
This sample query will use an index on table2.intcol (if one exists) to join the two tables.
Yes, CAST can be used to change data types (MySQL's CAST takes SIGNED or UNSIGNED as the integer type, not INT):
select *
from table1
join table2 on table2.intcol = cast(col as signed)
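To see the shape of such a join end to end, here is a small runnable sketch. It uses Python's sqlite3 module instead of MySQL, so the cast is written CAST(... AS INTEGER) where MySQL would use SIGNED; the sample rows and the index name idx_table2_intcol are made up. The cast is applied to the table1 side only, which leaves the indexed integer column of table2 untouched so that an index lookup remains possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (varcharcol TEXT, note TEXT)")
conn.execute("CREATE TABLE table2 (intcol INTEGER, label TEXT)")
conn.execute("CREATE INDEX idx_table2_intcol ON table2 (intcol)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [("1", "a"), ("2", "b"), ("30", "c")])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(1, "one"), (2, "two"), (3, "three")])

join_sql = """
    SELECT table1.note, table2.label
      FROM table1
      JOIN table2 ON table2.intcol = CAST(table1.varcharcol AS INTEGER)
     ORDER BY table1.note
"""
matches = conn.execute(join_sql).fetchall()
print(matches)

# Show how the engine plans the join; the wording varies between versions.
for row in conn.execute("EXPLAIN QUERY PLAN " + join_sql):
    print(row[-1])
```

Wrapping table2.intcol in a function instead would generally prevent the engine from using the index on it.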

Index counter shared by multiple tables in mysql

I have two tables, each one has a primary ID column as key. I want the two tables to share one increasing key counter.
For example, suppose the two tables are empty and the counter is 1. When record A is inserted into table 1, its ID will be 1 and the counter will be increased to 2. When record B is inserted into table 2, its ID will be 2 and the counter will be increased to 3. When record C is then inserted into table 1, its ID will be 3, and so on.
I am using PHP as the outside language. Now I have two options:
Keep the counter in the database as a single-row-single-column table. But every time I add things to table A or B, I need to update this counter table.
I can keep the counter as a global variable in PHP. But then I need to initialize the counter from the maximum key of the two tables at the start of apache, which I have no idea how to do.
Any suggestion for this?
The background is, I want to display a mix of records from the two tables in either ASC or DESC order of the creation time of the records. Furthermore, the records will be displayed in page-style, say, 50 records per page. Records are only added to the database rather than being removed. Following my above implementation, I can just perform a "select ... where key between 1 and 50" from two tables and merge the select datasets together, sort the 50 records according to IDs and display them.
Is there any other idea of implementing this requirement?
Thank you very much
Well, you will gain next to nothing with this setup; if you just store the datetime of the insert you can easily do
SELECT * FROM
(
SELECT columnA, columnB, inserttime
FROM table1
UNION ALL
SELECT columnA, columnB, inserttime
FROM table2
) AS merged -- MySQL requires an alias for a derived table
ORDER BY inserttime
LIMIT 0, 50 -- offset 0, 50 rows: the first page
And it will perform decently.
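For the page-style display, the offset of the merged query can be derived from the page number. The sketch below is one way to do it, with Python's sqlite3 module standing in for MySQL and PHP; the function name fetch_page and the integer timestamps are illustrative assumptions. The derived table is given an alias (MySQL requires one) and page 1 starts at offset 0.

```python
import sqlite3

PAGE_SQL = """
    SELECT columnA, columnB, inserttime
      FROM (SELECT columnA, columnB, inserttime FROM table1
            UNION ALL
            SELECT columnA, columnB, inserttime FROM table2) AS merged
     ORDER BY inserttime
     LIMIT ? OFFSET ?
"""

def fetch_page(conn, page, page_size=50):
    """Return page number `page` (1-based) of the merged, time-ordered rows."""
    offset = (page - 1) * page_size  # page 1 starts at offset 0
    return conn.execute(PAGE_SQL, (page_size, offset)).fetchall()

# Hypothetical sample data: 60 rows per table with interleaved insert times.
conn = sqlite3.connect(":memory:")
for name in ("table1", "table2"):
    conn.execute(f"CREATE TABLE {name} (columnA TEXT, columnB TEXT, inserttime INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                 [("t1", f"row {i}", 2 * i) for i in range(60)])
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)",
                 [("t2", f"row {i}", 2 * i + 1) for i in range(60)])

first_page = fetch_page(conn, 1)
last_page = fetch_page(conn, 3)
print(len(first_page), len(last_page))
```

For descending order, ORDER BY inserttime DESC in the same query gives the newest records first.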
Alternatively (if you are chasing the last drop of performance): if you are merging the results anyway, that can be an indicator that the tables themselves should be merged (why have two tables if you always combine their rows?).
Or model it as an SQL subclass: one table maintains the IDs and other common attributes, and the other two reference the common ID sequence as a foreign key.
If you need the creation time, wouldn't it be easier to add a timestamp field to your tables and sort by that field?
I believe using IDs as a reference for creation order is bad practice.
If you really must do this, there is a way. Create a one-row, one-column table to hold the last-used row number, and set it to zero. On each of your two data tables, create a BEFORE INSERT trigger that increments that value, reads it, and assigns it to the new row's ID (MySQL only allows NEW values to be changed in a BEFORE trigger). I can't remember the exact syntax because I haven't created a trigger for years; see here http://dev.mysql.com/doc/refman/5.0/en/triggers.html
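The counter-table idea can also be driven from application code instead of a trigger. The following is a rough sketch of that variant, not the trigger itself: it uses Python's sqlite3 module in place of PHP and MySQL, and the names id_counter, last_id, body and insert_with_shared_id are hypothetical. The increment and the insert run in one transaction so that the counter and the data stay in step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE id_counter (last_id INTEGER NOT NULL);
    INSERT INTO id_counter VALUES (0);
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, body TEXT);
""")

def insert_with_shared_id(conn, table, body):
    """Insert `body` into `table` using the next value of the shared counter."""
    if table not in ("table1", "table2"):  # table names cannot be bound as parameters
        raise ValueError("unknown table")
    with conn:  # one transaction: the increment and the insert commit together
        conn.execute("UPDATE id_counter SET last_id = last_id + 1")
        (new_id,) = conn.execute("SELECT last_id FROM id_counter").fetchone()
        conn.execute(f"INSERT INTO {table} (id, body) VALUES (?, ?)", (new_id, body))
    return new_id

id_a = insert_with_shared_id(conn, "table1", "record A")
id_b = insert_with_shared_id(conn, "table2", "record B")
id_c = insert_with_shared_id(conn, "table1", "record C")
print(id_a, id_b, id_c)
```

Every insert into either table now contends for the single counter row, which is one reason the timestamp approach suggested in the other answers tends to be simpler.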