How to add index to a mysql database to improve performance? - mysql

I'm working on a project that synchronizes contents on multiple clusters. I want to use two programs to do this task. One program periodically updates the existence status of contents on each cluster, The other program do the content copy when one content is not on all clusters.
The database tables are designed in following way.
table contents -- stores content information.
+----+----------+
| id | name |
+----+----------+
| 1 | content_1|
| 2 | content_2|
| 3 | content_3|
+----+----------+
table clusters -- stores cluster information
+----+----------+
| id | name |
+----+----------+
| 1 | cluster_1|
| 2 | cluster_2|
+----+----------+
table content_cluster -- each record indicates that one content is on one cluster
+----------+----------+-------------------+
|content_id|cluster_id| last_update_date|
+----------+----------+-------------------+
| 1 | 1 |2020-10-01T11:30:00|
| 2 | 2 |2020-10-01T11:30:00|
| 3 | 1 |2020-10-01T10:30:00|
| 3 | 2 |2020-10-01T10:30:00|
+----------+----------+-------------------+
The first program updates these tables periodically (a little part may changed, most of the table stay the same). The second program iteratively gets one content record which is not on all clusters. The select query is something as follow.
SELECT content_id
FROM content_cluster
GROUP BY content_id
HAVING COUNT(cluster_id) < <cluster_number>
LIMIT 1;
It seems not so effective as I have to group the table on my each query.
As I am not familiar with database, I'm wondering if it is a good way to design the database. Do I have to add some indices? How should I write select query to make it effective?

Related

How to structure a MySQL query to join with an exclusion

In short; we are trying to return certain results from one table based on second level criteria of another table.
I have a number of source data tables,
So:
Table DataA:
data_id | columns | stuff....
-----------------------------
1 | here | etc.
2 | here | poop
3 | here | etc.
Table DataB:
data_id | columnz | various....
-----------------------------
1 | there | you
2 | there | get
3 | there | the
4 | there | idea.
Table DataC:
data_id | column_s | others....
-----------------------------
1 | where | you
2 | where | get
3 | where | the
4 | where | idea.
Table DataD: etc. There are more and more will be added ongoing
And a relational table of visits, where there are "visits" to some of these other data rows in these other tables above.
Each of the above tables holds very different sets of data.
The way this is currently structured is like this:
Visits Table:
visit_id | reference | ref_id | visit_data | columns | notes
-------------------------------------------------------------
1 | DataC | 2 | some data | etc. | so this is a reference
| | | | | to a visit to row id
| | | | | 2 on table DataC
2 | DataC | 3 | some data | etc. | ...
3 | DataB | 4 | more data | etc. | so this is a reference
| | | | | to a visit to row id
| | | | | 4 on table DataB
4 | DataA | 1 | more data | etc. | etc. etc.
5 | DataA | 2 | more data | etc. | you get the idea
Now we currently list the visits by various user given criteria, such as visit date.
however the user can also choose which tables (ie data types) they want to view, so a user has to tick a box to show they want data from DataA table, and DataC table but not DataB, for example.
The SQL we currently have works like this; the column list in the IN conditional is dynamically generated from user choices:
SELECT visit_id,columns, visit_data, notes
FROM visits
WHERE visit_date < :maxDate AND visits.reference IN ('DataA','DataC')
The Issue:
Now, we need to go a step beyond this and list the visits by a sub-criteria of one of the "Data" tables,
So for example, DataA table has a reference to something else, so now the client wants to list all visits to numerous reference types, and IF the type is DataA then to only count the visits if the data in that table fits a value.
For example:
List all visits to DataB and all visits to DataA where DataA.stuff = poop
The way we currently work this is a secondary SQL on the results of the first visit listing, exampled above. This works but is always returning the full table of DataA when we only want to return a subset of DataA but we can't be exclusive about it outside of DataA.
We can't use LEFT JOIN because that doesn't trim the results as needed, we can't use exclusionary joins (RIGHT / INNER) because that then removes anything from DataC or any other table,
We can't find a way to add queries to the WHERE because again, that would loose any data from any other table that is not DataA.
What we kind of need is a JOIN within an IF/CASE clause.
Pseudo SQL:
SELECT visit_id,columns, visit_data, notes
FROM visits
IF(visits.reference = 'DataA')
INNER JOIN DataA ON visits.ref_id = DataA.id AND DataA.stuff = 'poop'
ENDIF
WHERE visit_date < 2020-12-06 AND visits.reference IN ('DataA','DataC')
All criteria in the WHERE clause are set by the user, none are static (This includes the DataA.stuff criteria too).
So with the above example the output would be:
visit_id | reference | ref_id | visit_data | columns | notes
-------------------------------------------------------------
1 | DataC | 2 | some data | etc. |
2 | DataC | 3 | some data | etc. |
5 | DataA | 1 | more data | etc. |
We can't use Union because the different Data tables contain lots of different details.
Questions:
There may be a very straightforward answer to this but I can't see it,
How can we approach trying to achieve this sort of partial exclusivity?
I suspect that our overarching architecture structure here could be improved (the system complexity has grown organically over a number of years). If so, what could be a better way of building this?
What we kind of need is a JOIN within an IF/CASE clause.
Well, you should know that's not possible in SQL.
Think of this analogy to function calls in a conventional programming language. You're essentially asking for something like:
What we need is a function call that calls a different function depending on the value you pass as a parameter.
As if you could do this:
call $somefunction(argument);
And which $somefunction you call would be determined by the function called, depending on the value of argument. This doesn't make any sense in any programming language.
It is similar in SQL — the tables and columns are fixed at the time the query is parsed. Rows of data are not read until the query is executed. Therefore one can't change the tables depending on the rows executed.
The simplest answer would be that you must run more than one query:
SELECT visit_id,columns, visit_data, notes
FROM visits
INNER JOIN DataA ON visits.ref_id = DataA.id AND DataA.stuff = 'poop'
WHERE visit_date < 2020-12-06 AND visits.reference = 'DataA';
SELECT visit_id,columns, visit_data, notes
FROM visits
WHERE visit_date < 2020-12-06 AND visits.reference = 'DataC';
Not every task must be done in one SQL query. If it's too complex or difficult to combine two tasks into one query, then leave them separate and write code in the client application to combine the results.

MS Access help needed forming a specific report

I have a table with a column for agent names and a column for each of the skills those agents could possibly have. Each skill the agent is assigned shows a 1 in the field under that skill.
Columns look like this:
+---------+----------+----------+----------+
| Name | 'Skill1' | 'Skill2' | 'Skill3' |
+---------+----------+----------+----------+
| John | 1 | | 1 |
| Sam | 1 | 1 | |
| Roberta | 1 | | 1 |
+---------+----------+----------+----------+
I would like to make a query that returns a list of all agent names that have a 1 for each particular skill. The query would return something like this:
+-----------+
| Skill 1 |
+-----------+
| John |
| Sam |
| Roberta |
+-----------+
Additionally I would like to be able to query a single name and retrieve all skills that agent has (all rows the Name column has a 1 in) like this:
+-----------+
| John |
+-----------+
| Skill 1 |
| Skill 3 |
+-----------+
I've done this in Excel using an index but I'm new to Access and not sure how to complete this task.
Thanks in advance.
One of the reasons that you are finding this task difficult is because your database is not normalised and so due to the way that your database is structured, you are working against MS Access, not with it.
Consequently, whilst a solution is still possible with the current data, the resulting queries will be painful to construct and will either be full of multiple messy iif statements, or several union queries performing the same operations over & over again, one for each 'skill'.
Then, if you every wish to add another Skill to the database, all of your queries have to be rewritten!
Whereas, if your database was normalised (as Gustav has suggested in the comments), the task would be a simple one-liner; and what's more, if you add a new skill later on, your queries will automatically output the results as if the skill had always been there.
Your data has a many-to-many relationship: an agent may have many skills, and a skill may be known by many agents.
As such, the most appropriate way to represent this relationship is using a junction table.
Hence, you would have a table of Agents such as:
tblAgents
+-----+-----------+----------+------------+
| ID | FirstName | LastName | DOB |
+-----+-----------+----------+------------+
| 1 | John | Smith | 1970-01-01 |
| ... | ... | ... | ... |
+-----+-----------+----------+------------+
This would only contain information unique to each agent, i.e. minimising the repeated information between records in the table.
You would then have a table of possible Skills, such as:
tblSkills
+-----+---------+---------------------+
| ID | Name | Description |
+-----+---------+---------------------+
| 1 | Skill 1 | Skill 1 Description |
| 2 | Skill 2 | Skill 2 Description |
| ... | ... | ... |
+-----+---------+---------------------+
Finally, you would have a junction table linking Agents to Skills, e.g.:
tblAgentSkills
+----+----------+----------+
| ID | Agent_ID | Skill_ID |
+----+----------+----------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 1 |
| 4 | 3 | 2 |
+----+----------+----------+
Now, say you want to find out which agents have Skill 1, the query is simple:
select Agent_ID from tblAgentSkills where Skill_ID = 1
What if you want to find out the skills known by an agent? Equally as simple:
select Skill_ID from tblAgentSkills where Agent_ID = 1
Of course, these queries will merely return the ID fields as present in the junction table - but since the ID uniquely identifies a record in the tblAgents or tblSkills tables, such ID is all you need to retrieve any other required information:
select
tblAgents.FirstName,
tblAgents.LastName
from
tblAgentSkills inner join tblAgents on
tblAgentSkills.AgentID = tblAgents.ID
where
tblAgentSkills.Skill_ID = 1
To get all agents with skill1, open the query designer and create the following query:
this will generate the following sql
SELECT Skills.AgentName
FROM Skills
WHERE (((Skills.Skill1)=1));
If you adjust the names you can also paste this query into the sql pane of the designer to get the query you want.
To get all the skills an agent has I chose a parameterized query. Open the query designer and create a new query:
When you run this query it will ask you for the name of the agent. Make sure to type the agent name exactly. Here is the resulting sql:
SELECT Skills.AgentName, Skills.Skill1, Skills.Skill2, Skills.Skill3
FROM Skills
WHERE (((Skills.AgentName)=[Agent]));
If you continue working with this query I would improve the table design by breaking your table into a skills table, agents table, skills&agents table. Then link the skills and agents tables to the skills&agents table in a many to many relationship. The query to get all an agents skills would then look like this in the designer:

Dynamic value to display numbers of entries in second table

I've got multiple entries in table A and would like to display the number of entries in a coloumn of table B. Is there a way to create a dynamic cell-content displaying the number of entries in a table?
I'm a beginner in MySQL and did not find a way to do it so far.
Example table A:
+----+------+------------+
| id | name | birthday |
+----+------+------------+
| 1 | john | 1976-11-18 |
| 2 | bill | 1983-12-21 |
| 3 | abby | 1991-03-11 |
| 4 | lynn | 1969-08-02 |
| 5 | jake | 1989-07-29 |
+----+------+------------+
What I'd like in table B:
+----+------+----------+
| id | name | numusers |
| 1 | tblA | 5 |
+----+------+----------+
In my actual database there is no incrementing ID so just taking the last value would not work - if this would've been a solution.
If MySQL can't handle this the option would be to create some kind of cronjob on my server reading the number of rows and writing them into that cell. I know how to do this - just checking if there's another way.
I'm not looking for a command to run on the mysql-console. What I'm trying to figure out is if there's some option which dynamically changes the cell's value to what I've described above.
You can create a view that will give you this information. The SQL for this view is inspired by an answer to a similar question:
CREATE VIEW table_counts AS
SELECT table_name, table_rows
FROM information_schema.tables
WHERE table_schema = '{your_db}';
The view will have the cells you speak of. As you can see, it is just a filter on an already existing table, so you might consider that this table information_schema.tables is the answer to your question.
You can do that directly with COUNT() for example SELECT COUNT(*) FROM TblA The you get all rows from that table. If you IDXs are ok then its very fast. If you write it to another table you have to make an request too to get the result of the second table. So i think your can do it directly.
If you have some performance problems there are some other possibilities like Triggers or Stored Procedures to calculate that result and save them in a memory table to get a better performance.

MYSQL query fetching DATA from two table using IN method one as composition of multiple data

I have two tables
one as td_job which has these structure
|---------|-----------|---------------|----------------|
| job_id | job_title | job_skill | job_desc |
|------------------------------------------------------|
| 1 | Job 1 | 1,2 | |
|------------------------------------------------------|
| 2 | Job 2 | 1,3 | |
|------------------------------------------------------|
The other Table is td_skill which is this one
|---------|-----------|--------------|
|skill_id |skill_title| skill_slug |
|---------------------|--------------|
| 1 | PHP | 1-PHP |
|---------------------|--------------|
| 2 | JQuery | 2-JQuery |
|---------------------|--------------|
now the job_skill in td_job is actualy the list of skill_id from td_skill
that means the job_id 1 has two skills associated with it, skill_id 1 and skill_id 2
Now I am writing a query which is this one
SELECT * FROM td_job,td_skill
WHERE td_skill.skill_id IN (SELECT td_job.job_skill FROM td_job)
AND td_skill.skill_slug LIKE '%$job_param%'
Now when the $job_param is PHP it returns one row, but if $job_param is JQuery it returns empty row.
I want to know where is the error.
The error is that you are storing a list of id's in a column rather than in an association/junction table. You should have another table, JobSkills with one row per job/skill combination.
The second and third problems are that you don't seem to understand how joins work nor how in with a subquery works. In any case, the query that you seem to want is more like:
SELECT *
FROM td_job j join
td_skill s
on find_in_set(s.skill_id, j.job_skill) > 0 and
s.skill_slug LIKE '%$job_param%';
Very bad database design. You should fix that if you can.

1 very large table or 3 large table? MySQL Performance

Assume a very large database. A table with 900 million records.
Method A:
Table: Posts
+----------+-------------- +------------------+----------------+
| id (int) | item_id (int) | post_type (ENUM) | Content (TEXT) |
+----------+---------------+------------------+----------------+
| 1 | 1 | user | some text ... |
+----------+---------------+------------------+----------------+
| 2 | 1 | page | some text ... |
+----------+---------------+------------------+----------------+
| 3 | 1 | group | some text ... |
// row 1 : User with ID 1 has a post with ID #1
// row 2 : Page with ID 1 has a post with ID #2
// row 3 : Group with ID 1 has a post with ID #3
The goal is displaying 20 records from all 3 post_types in a page.
SELECT * FROM posts LIMIT 20
But I am worried about number of records for this method
Method B:
Separate 900 million records to 3 tables with 300 millions for each one.
Table: User Posts
+----------+-------------- +----------------+
| id (int) | user_id (int) | Content (TEXT) |
+----------+---------------+----------------+
| 1 | 1 | some text ... |
+----------+---------------+----------------+
| 2 | 2 | some text ... |
+----------+---------------+----------------+
| 3 | 3 | some text ... |
Table: Page Posts
+----------+-------------- +----------------+
| id (int) | page_id (int) | Content (TEXT) |
+----------+---------------+----------------+
| 1 | 1 | some text ... |
+----------+---------------+----------------+
| 2 | 2 | some text ... |
+----------+---------------+----------------+
| 3 | 3 | some text ... |
Table: Group Posts
+----------+----------------+----------------+
| id (int) | group_id (int) | Content (TEXT) |
+----------+----------------+----------------+
| 1 | 1 | some text ... |
+----------+----------------+----------------+
| 2 | 2 | some text ... |
+----------+----------------+----------------+
| 3 | 3 | some text ... |
now to get a list of 20 posts to display
SELECT * FROM User_Posts LIMIT 10
SELECT * FROM Page_Posts LIMIT 10
SELECT * FROM group_posts LIMIT 10
// and make an array or object of result. and display in output.
In this method, I should sort them in an array in php, and then semd them to page.
Which method is preferred?
Separating a 900 million records table to three tables will affect on speed of reading and writing in mysql?
This is actually a discussion about Singe - Table - Inheritance vs. Table Per Class Inheritance and missing out joined inheritance. The former is related to Method A, the second to your Method B and Method C would be as having all IDs of your posts in one table and deferring specific attributes for group or user - posts ijto different tables.
Whilst having a big sized table always has its negativ impacts related to table full scans the approach of splitting tables has it's own , too. It depends on how often your application needs to access the whole list of posts vs. only retrieving certain post types.
Another consideration you should take into account is data partitioning which can be done with MySQL or Oracle Database e.g. which is a way of organizing your data within tables given opportunities for information lifecycle (which data is accessed when and how often, can part of it be moved and compressed reducing database size and increasing the speed for accessing the left part of the data in the table), which is basically split into three major techniques:
Range based partitioning, list based partitioning and hash based partitioning.
Other features not so commonly supported related to reducing table sizes are the ones dealing with insert's with timestamp invalidating the inserted data automatically after a certain timeperiod has expired.
What indeed is a major application design decision and can boost performance is to distinguish between read and writeaccesses to the database at application level.
Consider a MySQL - Backend: Because writeaccesses are obviously more critical to database performance then read accesses you could setup a MySQL - Instance for writing to the database and another one as replicant of this for the readaccesses, though this is also discussable, mainly when it comes to RDT (real time decisions), where absolute consistency of data at any given time is a must.
Using object pools as a layer between your application and the database also is a technique to improve application performance though I don't know of existing solutions in the PHP world yet. Oracle Hot Cache is a pretty sophisticated example of it.
You could build your own one implemented on top of a in - memory database or using memcache, though.