I want to find the table name and row count of every table in a database, in both MySQL and PostgreSQL, using a query. Is there a query that does this?
The SQL-standard INFORMATION_SCHEMA provides information about the structure of your database - which tables it has, etc. It does not, however, contain row counts.
For PostgreSQL, at least, you have two options for getting row counts:
Use an external program or a PL/pgSQL function that generates dynamic SQL, using EXECUTE to run SELECT count(*) FROM tablename for each table found in INFORMATION_SCHEMA (excluding system tables); or
Run ANALYZE, then read the approximate row counts from the PostgreSQL statistics tables. This is much faster, but it only gives an approximate row count based on statistical sampling and estimation.
This has been discussed in detail for PostgreSQL here.
The approach of querying INFORMATION_SCHEMA for a table list and then looping over the tables, running a count on each, should be portable across databases that implement INFORMATION_SCHEMA. Other approaches will likely require various degrees of database-specific code.
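A minimal sketch of the portable list-then-count loop, written as the "external program" option in Python. To keep it self-contained it assumes SQLite (the standard-library sqlite3 module) as a stand-in: SQLite has no INFORMATION_SCHEMA, so the table list comes from sqlite_master instead. Against MySQL or PostgreSQL you would swap in an information_schema.tables query and your own driver's connection; the function names are hypothetical.

```python
import sqlite3


def quote_ident(name):
    """Quote an identifier with standard SQL double quotes, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def list_tables(conn):
    # SQLite stand-in (assumption) for the MySQL query:
    #   SELECT table_name FROM information_schema.tables
    #   WHERE table_schema = DATABASE() AND table_type = 'BASE TABLE'
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )
    return [name for (name,) in rows]


def exact_row_counts(conn):
    # One exact SELECT COUNT(*) per table: accurate, but it scans every table.
    counts = {}
    for table in list_tables(conn):
        (n,) = conn.execute(f"SELECT COUNT(*) FROM {quote_ident(table)}").fetchone()
        counts[table] = n
    return counts


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY);
        CREATE TABLE orders (id INTEGER PRIMARY KEY);
        INSERT INTO customers (id) VALUES (1), (2);
    """)
    print(exact_row_counts(conn))
```

Each COUNT(*) is exact but scans its whole table, which is why the statistics-based option is so much faster on large databases.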
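For PostgreSQL (these row counts are the planner's estimates, refreshed by ANALYZE and VACUUM):
SELECT
    n.nspname AS schema,
    c.relname AS table_name,
    c.reltuples::bigint AS rows_count  -- estimate; bigint avoids int overflow on huge tables
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND c.relkind = 'r'    -- ordinary tables only
  AND c.reltuples > 0    -- skips empty and never-analyzed tables
ORDER BY c.relname;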
Related
Is there a simple way to get the row count of a single table from different databases (different db names) that share the same schema?
I tried this:
SELECT COUNT(id) AS db1users,
       (SELECT COUNT(id) FROM DB2.users) AS db2users,
       (SELECT COUNT(id) FROM DB3.users) AS db3users,
       (SELECT COUNT(id) FROM DB4.users) AS db4users,
       ..........
FROM DB1.users;
I get the correct result, but the query is becoming very large. Is there a simpler way to do this?
Another option, which avoids the need for dynamic SQL (and is far less expensive, hence much more scalable), is to use MySQL's INFORMATION_SCHEMA.TABLES table. It has a column named TABLE_ROWS, whose specification is as follows:
TABLE_ROWS
The number of rows. Some storage engines, such as MyISAM, store the exact count. For other storage engines, such as InnoDB, this value is an approximation, and may vary from the actual value by as much as 40% to 50%. In such cases, use SELECT COUNT(*) to obtain an accurate count.
If this matches your requirement, you can use a simple query like:
SELECT table_schema, table_rows FROM information_schema.tables WHERE table_name = 'users'
The best way to do this would be with a scripting language (e.g. Python, Ruby, PHP): run a database query to get all the database names from your server, then build a single SQL statement containing all your select count(id) from... queries, and once you've built that statement, execute it.
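A minimal sketch of that scripting approach in Python. It builds one UNION ALL statement that returns one (database name, count) row per database, instead of the nested scalar subqueries from the question. The database list is passed in; in MySQL it could come from SELECT table_schema FROM information_schema.tables WHERE table_name = 'users'. To run without a server, the sketch assumes SQLite, where each hypothetical database (db1, db2 and db3 in the demo) is an in-memory database attached under that name. Identifiers are backtick-quoted MySQL-style, which SQLite also accepts.

```python
import sqlite3


def quote_ident(name):
    """Backtick-quote an identifier MySQL-style, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_literal(value):
    """Single-quote a string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_count_query(db_names, table="users", column="id"):
    """One statement returning (db_name, row_count) per database, joined by UNION ALL."""
    parts = [
        f"SELECT {quote_literal(db)} AS db_name, COUNT({quote_ident(column)}) AS row_count "
        f"FROM {quote_ident(db)}.{quote_ident(table)}"
        for db in db_names
    ]
    return "\nUNION ALL\n".join(parts)


def demo_connection(user_counts):
    """SQLite stand-in for several databases: each name is an attached in-memory DB."""
    # Autocommit mode, because SQLite refuses ATTACH inside an open transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    for db, n in user_counts.items():
        conn.execute(f"ATTACH DATABASE ':memory:' AS {db}")  # demo names are trusted
        conn.execute(f"CREATE TABLE {quote_ident(db)}.users (id INTEGER PRIMARY KEY)")
        conn.executemany(
            f"INSERT INTO {quote_ident(db)}.users (id) VALUES (?)",
            [(i,) for i in range(n)],
        )
    return conn


if __name__ == "__main__":
    conn = demo_connection({"db1": 3, "db2": 5, "db3": 0})
    sql = build_count_query(["db1", "db2", "db3"])
    print(sql)
    for row in conn.execute(sql):
        print(row)
```

Producing rows rather than columns means the statement grows by one short line per database, and adding a database needs no change to the client code that reads the result.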
You can also do this with dynamic SQL inside MySQL (PREPARE and EXECUTE), but dynamic SQL is hard to write and debug, so I'm not a huge fan.
I have a query that gets data by joining three big tables (~1 million records each); in addition, they are very busy tables.
Is it better to do traditional joins, or to first fetch values from the first table and then run a second query that passes the retrieved values as a comma-delimited IN clause?
Option #1
SELECT *
FROM BigTable1 a
INNER JOIN BigTable2 b using(someField2)
INNER JOIN BigTable3 c using(someField3)
WHERE a.someField1 = 'value'
vs
Option #2
$values = SELECT someField2 FROM BigTable1 WHERE someField1 = 'value'; #(~20-200 values)
SELECT *
FROM BigTable2
INNER JOIN BigTable3 c using(someField3)
WHERE someField2 in ($values)
Option #3
Create a temporary table to store these values from BigTable1,
and join to it instead of joining to BigTable1 directly.
Any other options?
I think the best option is to try both approaches and run EXPLAIN on each.
One optimization you could make is to use a stored procedure for the second approach, which would reduce the overhead of running two queries from the client.
Finally, joining is quite an expensive operation for very large tables, since without suitable indexes you are essentially projecting and selecting over 1M x 1M row combinations.
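A small, self-contained way to follow the "try both approaches" advice: run Option #1 and Option #2 side by side, check that they return the same rows, and print the query plan. It assumes SQLite as a stand-in (its EXPLAIN QUERY PLAN output differs from MySQL's EXPLAIN), tiny hypothetical data, and the table and column names from the question. Option #2 binds the fetched keys as ? parameters rather than pasting a comma-delimited string into the SQL.

```python
import sqlite3

# Hypothetical schema mirroring the question's table and column names.
SCHEMA = """
CREATE TABLE BigTable1 (someField1 TEXT, someField2 INTEGER);
CREATE TABLE BigTable2 (someField2 INTEGER, someField3 INTEGER);
CREATE TABLE BigTable3 (someField3 INTEGER, payload TEXT);
CREATE INDEX ix_bt1_f1 ON BigTable1 (someField1);
CREATE INDEX ix_bt2_f2 ON BigTable2 (someField2);
CREATE INDEX ix_bt3_f3 ON BigTable3 (someField3);
"""

# Option #1: a single query with two joins.
JOIN_SQL = """
SELECT b.someField2, c.payload
FROM BigTable1 a
JOIN BigTable2 b ON b.someField2 = a.someField2
JOIN BigTable3 c ON c.someField3 = b.someField3
WHERE a.someField1 = ?
"""


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO BigTable1 VALUES (?, ?)",
        [("value" if i % 10 == 0 else "other", i) for i in range(1000)],
    )
    conn.executemany("INSERT INTO BigTable2 VALUES (?, ?)",
                     [(i, i % 50) for i in range(1000)])
    conn.executemany("INSERT INTO BigTable3 VALUES (?, ?)",
                     [(k, f"row{k}") for k in range(50)])
    return conn


def option1_join(conn, value):
    return sorted(conn.execute(JOIN_SQL, (value,)).fetchall())


def option2_two_queries(conn, value):
    # Round trip 1: fetch the keys from BigTable1.
    keys = [k for (k,) in conn.execute(
        "SELECT someField2 FROM BigTable1 WHERE someField1 = ?", (value,))]
    if not keys:
        return []  # an empty IN () list is a syntax error in MySQL
    # Round trip 2: bind the keys as parameters instead of splicing them into the SQL.
    placeholders = ", ".join(["?"] * len(keys))
    sql = ("SELECT b.someField2, c.payload FROM BigTable2 b "
           "JOIN BigTable3 c ON c.someField3 = b.someField3 "
           f"WHERE b.someField2 IN ({placeholders})")
    return sorted(conn.execute(sql, keys).fetchall())


if __name__ == "__main__":
    conn = make_demo_db()
    print(option1_join(conn, "value") == option2_two_queries(conn, "value"))
    # SQLite's plan format; MySQL's EXPLAIN output looks different.
    for row in conn.execute("EXPLAIN QUERY PLAN " + JOIN_SQL, ("value",)):
        print(row)
```

One point worth checking against real data: IN (...) matches each key once, while the join repeats rows when someField2 is duplicated in BigTable1, so the two options are only interchangeable when those keys are unique.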
There is no definitive answer to your question, since it depends on multiple factors; you could profile both approaches.
However, the first approach is usually taken and should be faster if all of the tables are correctly indexed and the sizes of the rows are "standard".
Also take into account that the second approach will suffer far more from network latency, since it needs multiple round trips to the DB.
I have a dozen tables with the same structure. All of their names match question_20%. Each table has an indexed column named loaded, which can have values 0 and 1.
I want to count all of the records where loaded = 1. If I had only one table, I would run select count(*) from question_2015 where loaded = 1.
Is there a query I can run that finds the tables in INFORMATION_SCHEMA.TABLES, sums over all of these counts, and produces a single output?
You can do what you want with dynamic SQL.
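A sketch of the dynamic-SQL route, driven from Python: look up the matching table names, then build one statement that sums the per-table counts. SQLite's sqlite_master stands in for INFORMATION_SCHEMA.TABLES here (the MySQL query it replaces is shown in a comment, as an assumption), and the table names are hypothetical. Note that _ is itself a LIKE wildcard, so the pattern escapes it; otherwise a table such as questionX2016 would also match.

```python
import sqlite3


def quote_ident(name):
    """Backtick-quote an identifier MySQL-style, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def matching_tables(conn):
    # SQLite stand-in (assumption) for the MySQL lookup:
    #   SELECT table_name FROM information_schema.tables
    #   WHERE table_schema = DATABASE() AND table_name LIKE 'question\_20%'
    # The underscore is escaped because a bare "_" in LIKE matches any one character.
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name LIKE 'question\\_20%' ESCAPE '\\' ORDER BY name"
    )
    return [name for (name,) in rows]


def build_sum_query(tables):
    """One statement adding up COUNT(*) ... WHERE loaded = 1 over every table."""
    if not tables:
        return "SELECT 0 AS total"
    parts = [f"SELECT COUNT(*) AS n FROM {quote_ident(t)} WHERE loaded = 1"
             for t in tables]
    # MySQL requires an alias on the derived table, hence "AS per_table".
    return "SELECT SUM(n) AS total FROM (\n" + "\nUNION ALL\n".join(parts) + "\n) AS per_table"


def total_loaded(conn):
    (total,) = conn.execute(build_sum_query(matching_tables(conn))).fetchone()
    return total


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE question_2014 (id INTEGER PRIMARY KEY, loaded INTEGER);
        CREATE TABLE question_2015 (id INTEGER PRIMARY KEY, loaded INTEGER);
        CREATE TABLE questionX2016 (id INTEGER PRIMARY KEY, loaded INTEGER);
        INSERT INTO question_2014 (loaded) VALUES (1), (0), (1);
        INSERT INTO question_2015 (loaded) VALUES (1), (1), (1), (0);
        INSERT INTO questionX2016 (loaded) VALUES (1), (1);
    """)
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    print(matching_tables(conn), total_loaded(conn))
```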
However, you have a problem with your data structure. Having multiple parallel tables is usually a very bad idea. SQL supports very large tables, so having all the information in one table is a great convenience, from the perspective of querying (as you are now learning) and maintainability.
SQL offers indexes and partitioning schemes for addressing performance issues on large tables.
Sometimes separate tables are necessary to meet particular system requirements. If so, then a view should be available to combine all the tables:
create view v_tables as
    select t1.*, 'table1' as which from table1 t1 union all
    select t2.*, 'table2' as which from table2 t2 union all
    . . .
If you had such a view, then your query would simply be:
select which, count(*)
from v_tables
where loaded = 1
group by which;
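If the yearly tables keep accumulating, the view itself can be generated from the table list. A minimal Python sketch, assuming SQLite as a stand-in and hypothetical question_20xx tables with an id and a loaded column; after adding a table you would drop the old view and rerun the generator.

```python
import sqlite3


def quote_ident(name):
    """Backtick-quote an identifier MySQL-style, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_literal(value):
    """Single-quote a string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_union_view(view_name, tables):
    """CREATE VIEW that stacks same-structured tables and tags rows with their source."""
    selects = [
        f"SELECT {quote_ident(t)}.*, {quote_literal(t)} AS which FROM {quote_ident(t)}"
        for t in tables
    ]
    return f"CREATE VIEW {quote_ident(view_name)} AS\n" + "\nUNION ALL\n".join(selects)


# The grouped count from the answer, run against the generated view.
COUNT_BY_SOURCE = """
SELECT which, COUNT(*)
FROM v_tables
WHERE loaded = 1
GROUP BY which
"""


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE question_2014 (id INTEGER PRIMARY KEY, loaded INTEGER);
        CREATE TABLE question_2015 (id INTEGER PRIMARY KEY, loaded INTEGER);
        INSERT INTO question_2014 (loaded) VALUES (1), (0), (1);
        INSERT INTO question_2015 (loaded) VALUES (1), (1), (1), (0);
    """)
    conn.execute(build_union_view("v_tables", ["question_2014", "question_2015"]))
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    for row in conn.execute(COUNT_BY_SOURCE):
        print(row)
```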
This query is inefficient and does not finish executing. The track and desiredspeed tables have almost a million records. After this, we want to self-join the track table for further processing. Any efficient approach to executing the query below is appreciated.
select
t_id,
route_id,
t.timestamp,
s_lat,
s_long,
longitude,
latitude,
SQRT(POW((latitude - d_lat),2) + POW((longitude - d_long),2)) as dst,
SUM(speed*18/5)/count(*) as speed,
'20' as actual_speed,
((20-(speed*18/5))/(speed*18/5))*100 as speed_variation
from
track t,
desiredspeed s
WHERE
LEFT(s_lat,6) = LEFT(latitude,6)
AND LEFT(s_long,6)=LEFT(longitude,6)
AND t_id > 53445
group by
route_id,
s_lat,
s_long
order by
t_id asc
Firstly, you are using old-style implicit (comma) join syntax; I would change that to an explicit JOIN ... ON.
You are also computing LEFT() on both sides of each join condition across large datasets, which is likely to be inefficient.
This cannot use an index, because you are applying a function to the column. Either store the data precomputed, or add a computed (generated) column based on the rule applied above, and index it accordingly.
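A minimal sketch of the "store the data precomputed" suggestion, assuming SQLite as a stand-in: substr() plays the role of MySQL's LEFT(), and the lat6/long6 key columns, the s_id column, the index names and the sample coordinates are all hypothetical. Both queries return the same pairs, but only the second compares plain columns that an index can serve.

```python
import sqlite3

SETUP = """
CREATE TABLE track (t_id INTEGER PRIMARY KEY, latitude TEXT, longitude TEXT);
CREATE TABLE desiredspeed (s_id INTEGER PRIMARY KEY, s_lat TEXT, s_long TEXT);
"""

# Precompute the 6-character prefixes once and index them (substr stands in for LEFT).
ADD_KEYS = """
ALTER TABLE track ADD COLUMN lat6 TEXT;
ALTER TABLE track ADD COLUMN long6 TEXT;
UPDATE track SET lat6 = substr(latitude, 1, 6), long6 = substr(longitude, 1, 6);
CREATE INDEX ix_track_keys ON track (lat6, long6);
ALTER TABLE desiredspeed ADD COLUMN lat6 TEXT;
ALTER TABLE desiredspeed ADD COLUMN long6 TEXT;
UPDATE desiredspeed SET lat6 = substr(s_lat, 1, 6), long6 = substr(s_long, 1, 6);
CREATE INDEX ix_ds_keys ON desiredspeed (lat6, long6);
"""

# Original style: a function applied to every column, so a plain column index cannot help.
SLOW_JOIN = """
SELECT t.t_id, s.s_id
FROM track t
JOIN desiredspeed s
  ON substr(s.s_lat, 1, 6) = substr(t.latitude, 1, 6)
 AND substr(s.s_long, 1, 6) = substr(t.longitude, 1, 6)
WHERE t.t_id > ?
"""

# Same condition on the precomputed columns: plain equality on indexed columns.
FAST_JOIN = """
SELECT t.t_id, s.s_id
FROM track t
JOIN desiredspeed s ON s.lat6 = t.lat6 AND s.long6 = t.long6
WHERE t.t_id > ?
"""


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    conn.executemany(
        "INSERT INTO track (t_id, latitude, longitude) VALUES (?, ?, ?)",
        [(i, f"12.3{i % 4}5{i}", f"77.1{i % 2}3{i}") for i in range(1, 201)],
    )
    conn.executemany(
        "INSERT INTO desiredspeed (s_lat, s_long) VALUES (?, ?)",
        [("12.305999", "77.103888"), ("12.315111", "77.113222")],
    )
    conn.executescript(ADD_KEYS)
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    for row in conn.execute("EXPLAIN QUERY PLAN " + FAST_JOIN, (100,)):
        print(row)
```

Plain key columns like these go stale when rows change; in MySQL a generated column (available since MySQL 5.7) computed from LEFT(latitude, 6) and indexed would avoid maintaining them by hand.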
Finally, it may be quicker to use temporary tables or common table expressions (although I do not know MySQL too well here; CTEs require MySQL 8.0 or later).
Related (SQL Server): Count(*) vs Count(1)
Which performs better in MySQL: COUNT(*) or COUNT(1)?
This is a MySQL answer.
They perform exactly the same, unless you are using MyISAM, where a special case exists for COUNT(*). I always use COUNT(*) anyway.
https://dev.mysql.com/doc/refman/5.6/en/aggregate-functions.html#function_count
For MyISAM tables, COUNT(*) is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause. For example:
mysql> SELECT COUNT(*) FROM student;
This optimization only applies to MyISAM tables, because an exact row count is stored for this storage engine and can be accessed very quickly. COUNT(1) is only subject to the same optimization if the first column is defined as NOT NULL.
EDIT:
Some of you may have missed the dark attempt at humour. I prefer to keep this as a non-duplicate question for any such day when MySQL will do something different to SQL Server. So I threw a vote to reopen the question (with a clearly wrong answer).
The above MyISAM optimization applies equally to
COUNT(*)
COUNT(1)
COUNT(pk-column)
COUNT(any-non-nullable-column)
So the real answer is that they are always the same.
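The equivalence holds because COUNT(*), COUNT(1) and COUNT(non-nullable column) all count rows, whereas COUNT(nullable column) counts only non-NULL values. A tiny Python/SQLite sketch of those semantics; it says nothing about MyISAM performance, and the student table contents are hypothetical.

```python
import sqlite3

# Hypothetical table: "id" is the primary key, "note" is nullable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, note TEXT);
    INSERT INTO student (id, note) VALUES (1, 'a'), (2, NULL), (3, 'c'), (4, 'd');
""")

counts = conn.execute(
    "SELECT COUNT(*), COUNT(1), COUNT(id), COUNT(note) FROM student"
).fetchone()
print(counts)  # (4, 4, 4, 3): COUNT(note) skips the NULL row
```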