Correct database operation to get the data desired - mysql

Ok, so I've got a MySQL database with several tables. One of the tables (table A) has the items of most interest to me.
It has a column called type and a column called entity_id. The primary key is something called registration_id, which is more or less irrelevant to me currently.
Ultimately, I want to gather all items of a particular type, but which have a unique entity_id. The only problem with this is that entity_id in table A is NOT a unique key. It is possible to have multiple registration_ids per entity_id.
Now, there's another table (table B) which has only a list of unique entity_ids (that is, it is the primary key on that table), however there's no information on the type in that table.
So with these two tables, what is the best way to get the data I want?
I was thinking of some way (DISTINCT, perhaps) that I could use on the first table alone, or possibly a join of some sort (I'm still relatively new to the concept of joins) between table A and table B, combining the entity_id from table B with the type from table A.
What's the most efficient database operation for this for now? And should I (eventually, not right now as I simply do not have the time, sadly) change the database structure for greater efficiency?
If anyone needs any additional information or graphics, let me know.

If I understand correctly, you can use either GROUP BY
SELECT entity_id
FROM table1
WHERE type = ?
GROUP BY entity_id
or DISTINCT
SELECT DISTINCT entity_id
FROM table1
WHERE type = ?
Here is a SQLFiddle demo
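For completeness, the join the question asks about isn't needed here, but if you did want to involve table B, it could look like this (a sketch only; the names tableA/tableB are assumed):
SELECT DISTINCT a.entity_id
FROM tableB b
JOIN tableA a ON a.entity_id = b.entity_id
WHERE a.type = ?
Assuming every entity_id in table A also appears in table B, this should return the same rows as the single-table versions above, so the simpler queries are preferable.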

Table joins are a costly operation. If you are dealing with large datasets, the time it takes to execute a join is non-negligible.
The following SQL statement will return every distinct (type, entity_id) combination, so each entity_id appears only once per type:
SELECT type, entity_id FROM TableA GROUP BY type, entity_id;

I think this is what you are looking for. Try this to get the types that have exactly one unique entity_id (COUNT(DISTINCT ...) matters here, since a single entity_id can appear under several registration_ids):
SELECT type, COUNT(DISTINCT entity_id)
FROM table1
GROUP BY type
HAVING COUNT(DISTINCT entity_id) = 1
Here is the SQL Fiddle

Related

Get the most frequent value in a linking table and refer back to it in a main table in MySQL

I have a couple of tables being used in a Star Wars database along with a joining table (this is part of a Uni project).
I'm trying to write a query that returns the most common mode of transportation, based on the characters in one table and the modes of transportation in another table, linked together by a joining table.
The primary key in both the person and transportation tables is a simple "id" column, and then both tables have a name column.
A linking table called person_transportation has been created with columns containing the ids from both the person and transportation tables.
I can run the following query:
SELECT transport_id
FROM person_transport
GROUP BY transport_id
ORDER BY COUNT(*) DESC
LIMIT 1
And that returns the value 2 (which is the correct ID for the ship I want). I can't seem to find a way to put this query into another query which will return the ship name instead of an ID number.
I have searched through plenty of similar issues in here, but they all seem to relate to a query being run on a single table, not a linking table.
Any ideas?
I guess there is a transport_table with columns like transport_id and transport_name.
Use your query in a WHERE clause like this:
SELECT transport_name
FROM transport_table
WHERE transport_id = (
SELECT transport_id
FROM person_transport
GROUP BY transport_id
ORDER BY COUNT(*) DESC
LIMIT 1
)
Change the table's name and the column names to the actual names.

MySQL: Group by query optimization

I've got a table of the following schema:
+----+--------+----------------------------+----------------------------+
| id | amount | created_timestamp | updated_timestamp |
+----+--------+----------------------------+----------------------------+
| 1 | 1.00 | 2018-01-09 12:42:38.973222 | 2018-01-09 12:42:38.973222 |
+----+--------+----------------------------+----------------------------+
Here, for id = 1, there could be multiple amount entries. I want to extract the last added entry and its corresponding amount, grouped by id.
I've written a working query with an inner join on the self table as below:
SELECT t1.id,
t1.amount,
t1.created_timestamp,
t1.updated_timestamp
FROM transactions AS t1
INNER JOIN (SELECT id,
Max(updated_timestamp) AS last_transaction_time
FROM transactions
GROUP BY id) AS latest_transactions
ON latest_transactions.id = t1.id
AND latest_transactions.last_transaction_time = t1.updated_timestamp;
I think the inner join is overkill and this can be replaced with a more optimized/efficient query. I've written the following query with WHERE, GROUP BY, and HAVING, but it isn't working. Can anyone help?
select id, any_value(`updated_timestamp`), any_value(amount) from transactions group by `id` having max(`updated_timestamp`);
There are two (good) options when performing a query like this in MySQL. You have already tried one option. Here is the other:
SELECT t1.id,
t1.amount,
t1.created_timestamp,
t1.updated_timestamp
FROM transactions AS t1
LEFT OUTER JOIN transactions later_transactions
ON later_transactions.id = t1.id
AND later_transactions.updated_timestamp > t1.updated_timestamp
WHERE later_transactions.id IS NULL
These methods are the ones in the documentation, and also the ones I use in my work basically every day. Which one is most efficient depends on a variety of factors, but usually, if one is slow the other will be fast.
Also, as Strawberry points out in the comments, you need a composite index on (id, updated_timestamp). Having separate indexes on id and updated_timestamp is not equivalent.
Why a composite index?
Be aware that an index is just a copy of the data that is in the table. In many respects, it works the same as a table does. So, creating an index is creating a copy of the table's data that the RDBMS can use to query the table's information in a more efficient manner.
An index on just updated_timestamp will create a copy of the data that contains updated_timestamp as the first column, and that data will be sorted. It will also include a hidden row ID value (that will work as a primary key) in each of those index rows, so that it can use that to look up the full rows in the actual table.
How does that help in this query (either version)? If we wanted just the latest (or earliest) updated_timestamp overall, it would help, since it can just check the first or last record in the index. But since we want the latest for each id, this index is useless.
What about just an index on id? Here we have a copy of the id column, sorted by that column, with the row ID attached to each row in the index.
How does this help this query? It doesn't: the index doesn't even include the updated_timestamp column, so MySQL won't consider using it.
Now, consider a composite index: (id,updated_timestamp).
This creates a copy of the data with the id column first, sorted, and then the second column updated_timestamp is also included, and it is also sorted within each id.
This is the same way that a phone book (if people still use those things as something more than paperweights) is sorted by last name and then first name.
Because the rows are sorted in this way, MySQL can look, for each id, at just the last record of a given id. It knows that that record contains the highest updated_timestamp value, because of how the index is defined.
So, it only has to look up one row for each id that exists. That is fast. Further explanation into why would take up a lot more space, but you can research it yourself if you like, by just looking into B-Trees. Suffice to say, finding the first (or last) record is easy.
Try the following:
ALTER TABLE transactions
ADD INDEX `LatestTransaction` (`id`,`updated_timestamp`)
And then see whether your original query or my alternate query is faster. Likely both will be faster than having no index. As your table grows, or your select statement changes it may affect which of these queries is faster, but the index is going to provide the biggest performance boost, regardless of which version of the query you use.
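If you want to confirm the index is actually being used, you can run EXPLAIN against either query. As a sketch, using the alternate query from above:
EXPLAIN
SELECT t1.id, t1.amount, t1.created_timestamp, t1.updated_timestamp
FROM transactions AS t1
LEFT OUTER JOIN transactions later_transactions
  ON later_transactions.id = t1.id
 AND later_transactions.updated_timestamp > t1.updated_timestamp
WHERE later_transactions.id IS NULL
With the composite index in place, you would expect the plan for the joined copy of the table to reference LatestTransaction rather than show a full table scan.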

When is GROUP BY required for aggregate functions?

I have a table called myEntity as follows:
- id (PK INT NOT NULL)
- account_id (FK INT NOT NULL)
- key (INT NOT NULL. UNIQUE for given account_id)
- name (VARCHAR NOT NULL. UNIQUE FOR given account_id)
I don't wish to expose the primary key id to the user, and added key for this purpose. key kind of acts as an auto-increment column for a given accounts_id which will need to be manually done by the application. I first planned on making the primary key composite id-account_id, however, the table is joined to other tables, and before I knew it, I had four columns in a table which could have been one. While account_id-name does the same as account_id-key, key is smaller and will minimize network traffic when a client requests multiple records. Yes, I know it isn't properly normalized, and while not my direct question, would appreciate any constructive criticism comments.
Sorry for the rambling... When is GROUP BY required for an aggregate function? For instance, what about the following? https://stackoverflow.com/a/1547128/1032531 doesn't show one. Is it needed?
SELECT COALESCE(MAX(`key`),0)+1 FROM myEntity WHERE accounts_id=123;
You gave a query as an example not requiring GROUP BY. For the sake of explanation, I'll simplify it as follows.
SELECT MAX(`key`)
FROM myEntity
WHERE accounts_id = 123
Why doesn't that query require GROUP BY? Because you only expect one row in the result set, describing a particular account.
What if you wanted a result set describing all your accounts with one row per account? Then you would use this:
SELECT accounts_id, MAX(`key`)
FROM myEntity
GROUP BY accounts_id
See how that goes? You get one row in this result set for each distinct value of accounts_id. By the way, MySQL's query planner knows that
SELECT accounts_id, MAX(`key`)
FROM myEntity
WHERE accounts_id = '123'
GROUP BY accounts_id
is equivalent to the same query omitting the GROUP BY clause.
One more thing to know: If you have a compound index on (accounts_id, key) in your table, all these queries will be almost miraculously fast, because the query planner will satisfy them with a very efficient loose index scan. That's specific to the MAX() and MIN() aggregate functions. Loose index scans can't be used for SUM() or AVG() or similar functions; those require tight index scans.
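As a sketch, this is the kind of index-and-query pairing that refers to, using the myEntity table from the question (the index name is made up):
ALTER TABLE myEntity ADD INDEX idx_account_key (accounts_id, `key`);

SELECT accounts_id, MAX(`key`)
FROM myEntity
GROUP BY accounts_id
If the loose index scan applies, EXPLAIN on that SELECT should mention the group-by being satisfied from the index rather than a scan of the whole table.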
GROUP BY is only needed when you want one row per group. For example, if you wanted the next key for every account rather than just one, you could use
SELECT COALESCE(MAX(`key`),0)+1 FROM myEntity GROUP BY accounts_id
rather than your select. But your select is fine (though it seems like you may have made things a little hard for yourself with your structure, but I don't know what issues you're trying to address).

If your table has more selects than inserts, are indexes always beneficial?

I have a mysql innodb table where I'm performing a lot of selects using different columns. I thought that adding an index on each of those fields could help performance, but after reading a bit on indexes I'm not sure if adding an index on a column you select on always helps.
I have far more selects than inserts/updates happening in my case.
My table 'students' looks like:
id | student_name | nickname | team | time_joined_school | honor_roll
and I have the following queries:
# The team column is varchar(32), and only has about 20 different values.
# The honor_roll field is a smallint and is only either 0 or 1.
1. select from students where team = '?' and honor_roll = ?;
# The student_name field is varchar(32).
2. select from students where student_name = '?';
# The nickname field is varchar(64).
3. select from students where nickname like '%?%';
all the results are ordered by time_joined_school, which is a bigint(20).
So I was just going to add an index on each of the columns, does that make sense in this scenario?
Thanks
Indexes help the database more efficiently find the data you're looking for. Which is to say you don't need an index simply because you're selecting a given column, but instead you (generally) need an index for columns you're selecting based on - i.e. using a WHERE clause (even if you don't end up including the searched column in your result).
Broadly, this means you should have indexes on columns that segregate your data in logical ways, and not on extraneous, simply informative columns. Before looking at your specific queries, all of these columns seem like reasonable candidates for indexing, since you could reasonably construct queries around them. Examples of columns that would make less sense would be things like phone_number, address, or student_notes - you could index such columns, but generally you don't need or want to.
Specifically based on your queries, you'll want student_name, team, and honor_roll to be indexed, since you're defining WHERE conditions based on the values of these columns. You'll also benefit from indexing time_joined_school if, as you suggest, you're ORDER BYing your queries based on that column. Your LIKE query is not actually easy for most RDBs to handle, and indexing nickname won't help. Check out How to speed up SELECT .. LIKE queries in MySQL on multiple columns? for more.
Note also that the ratio of SELECT to INSERT is not terribly relevant for deciding whether to use an index or not. Even if you only populate the table once, and it's read-only from that point on, SELECTs will run faster if you index the correct columns.
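As a concrete sketch of that advice (the index names are made up):
ALTER TABLE students ADD INDEX idx_team_honor (team, honor_roll);
ALTER TABLE students ADD INDEX idx_student_name (student_name);
ALTER TABLE students ADD INDEX idx_time_joined (time_joined_school);
The composite (team, honor_roll) index serves query 1, idx_student_name serves query 2, and idx_time_joined helps the ORDER BY; nothing here can help the leading-wildcard LIKE in query 3.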
Yes, indexes help accelerate your queries.
In your case you should have indexes on:
1) team and honor_roll from query 1 (a single composite index covering both fields)
2) student_name
3) time_joined_school, for the ORDER BY
For query 3 you can't use an index because of the leading-wildcard LIKE. Hope this helps.

Subquery for fetching table name

I have a query like this :
SELECT * FROM (SELECT linktable FROM adm_linkedfields WHERE name = 'company') as cbo WHERE group='BEST'
Basically, the table name for the main query is fetched through the subquery.
I get an error that #1054 - Unknown column 'group' in 'where clause'
When I investigate (removing the where clause), I find that the query only returns the subquery result at all times.
Subquery table adm_linkedfields has structure id | name | linktable
Currently am using MySQL with PDO but the query should be compatible with major DBs (viz. Oracle, MSSQL, PgSQL and MySQL)
Update:
The subquery should return the name of the table for the main query. In this case it will return tbl_company
The table tbl_company for the main query has this structure :
id | name | group
Thanks in advance.
Dynamic SQL doesn't work like that; what you created is an inline view (read up on that). What's more, you can't create a dynamic SQL query that will work on every DB. If you have a limited number of linktables, you could try using left joins or unions to select from all the tables, but unless you have a good reason, you don't want that.
Just select the table name in one query and then make another one to access the right table (by building the query string in PHP).
Here is an issue:
SELECT * FROM (SELECT linktable FROM adm_linkedfields WHERE name = 'company') as cbo
WHERE group='BEST';
You are selecting from a derived table that contains only one column, linktable, so you can't reference any other column in the WHERE clause of the outer block. Think in terms of blocks: the outer SELECT refers to a derived table which contains only that one column.
Your problem is similar when you try to do:
create table t1(x1 int);
select * from t1 where z1 = 7; -- error: unknown column 'z1'
Your query is:
SELECT *
FROM (SELECT linktable
FROM adm_linkedfields
WHERE name = 'company'
) cbo
WHERE group='BEST'
First, if you are interested in cross-database compatibility, do not name columns or tables after SQL reserved words. group is a really, really bad name for a column.
Second, the from clause is returning a table containing a list of names (of tables, but that is irrelevant). There is no column called group, so that is the problem you are having.
What can you do to fix this? A naive solution would be to run the subquery on its own, then use the resulting table name in a dynamic statement to execute the query you want.
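In MySQL specifically (this part is not portable to the other databases mentioned), that two-step approach can be sketched with a prepared statement; treat it as an illustration rather than a recommendation:
SELECT linktable INTO @tbl FROM adm_linkedfields WHERE name = 'company';
SET @sql = CONCAT('SELECT * FROM ', @tbl, ' WHERE `group` = ''BEST''');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
Each of the other databases has its own dynamic-SQL mechanism (EXECUTE IMMEDIATE, sp_executesql, and so on), which is exactly why this pattern doesn't travel well.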
The fundamental problem is your data structure. Having multiple tables with the same structure is generally a sign of a bad design. You basically have two choices.
One. If you have control over the database structure, put all the data in a single table, linktable for instance. This would have the information for all companies, and a column for group (or whatever you rename it). This solution is compatible across all databases. If you have lots and lots of data in the tables (think tens of millions of rows), then you might think about partitioning the data for performance reasons.
Two. If you don't have control over the data, create a view that concatenates all the tables together. Something like:
create view vw_linktable as
select 'table1' as which, t.* from table1 t union all
select 'table2', t.* from table2 t
This is also compatible across all databases.