MySQL Fastest Way To Handle Composite Key Query - mysql

I'm having some issues with slow queries on a MySQL database with many rows and I'm just hoping to make sure I'm doing this right.
I have a table that contains a TAID with an associated Row like this:
Row | TAID
----------
1   | 1
2   | 1
3   | 1
4   | 1
1   | 2
2   | 2
3   | 2
4   | 2
Currently I have (TAID, Row) set up as a composite key, but I generally query all the rows using only the TAID column.
Is it slow because there are multiple instances of the TAID?
Am I thinking about this the right way?
Edit: I think the order of the columns is the problem.
I actually have the Row before the TAID and I'm querying on the TAID.
Going to try flipping the order.

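As suggested, the order of the columns in the composite key needed to be flipped.
Since I'm querying on TAID, it needs to be the first column in the key; an index can only be used for a lookup when the query filters on its leading column(s).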
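The effect of column order can be checked with a small experiment. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example is self-contained), and the table name t and the column name row_no (used in place of Row to avoid clashing with the ROW keyword) are hypothetical. In MySQL you would run EXPLAIN on your own query instead.

```python
import sqlite3

def plan_for(pk_columns):
    """Return SQLite's query-plan text for a lookup by taid, given the key order."""
    conn = sqlite3.connect(":memory:")
    # row_no stands in for the question's Row column (hypothetical name).
    conn.execute(
        f"CREATE TABLE t (row_no INTEGER, taid INTEGER, PRIMARY KEY ({pk_columns}))"
    )
    conn.executemany(
        "INSERT INTO t (row_no, taid) VALUES (?, ?)",
        [(r, t) for t in (1, 2) for r in (1, 2, 3, 4)],
    )
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT row_no FROM t WHERE taid = 1"
    ).fetchall()
    conn.close()
    # The human-readable plan description is the last column of each plan row.
    return " | ".join(r[-1] for r in rows)

plan_row_first = plan_for("row_no, taid")   # key order from the question
plan_taid_first = plan_for("taid, row_no")  # key order after flipping

print(plan_row_first)   # expected to describe a full SCAN
print(plan_taid_first)  # expected to describe an index SEARCH on taid
```

With the key defined as (row_no, taid) the engine has to walk every entry, while (taid, row_no) lets it seek directly to the matching TAID.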

Related

Shared foreign keys without duplication of entries?

Sorry for the beginner question.
I have an Outputs table:
ID | value
0  | x
1  | y
2  | z
And an Inputs table that is linked to the Outputs through the outputsID:
ID | outputsID | name
0  | 0         | A
1  | 1         | B
2  | 1         | C
3  | 2         | B
4  | 2         | C
Assuming that multiple outputs share at least one input (in this example, inputs 1 and 3 have the same name, as do inputs 2 and 4), is there a way to avoid the duplication of entries in my Inputs table (input IDs 3 and 4)?
The 'normal' answer to your question is no. Rows 1 and 2 address output 1, and rows 3 and 4 address output 2. They aren't duplicates; each reflects something distinct.
So if you are a beginner, I would say you shouldn't want to get rid of these rows.
That said, there are some more advanced techniques. For example, you could have the OutputsID column be an array with multiple values. This is harder, more complex, and non-standard.

MySQL query id generation

I have a table, and when I add a new row I want to calculate an id number for it.
Conditions:
If there is a missing number, use the smallest one (3).
If there is no missing number, use the next number in sequence (5).
How can I build these conditions into my query?
My query:
INSERT INTO sample(product) VALUES ('$product')
Table:
product | id
name1   | 1
name2   | 2
name4   | 4
Solution if I have a missing id:
product | id
name1   | 1
name2   | 2
name4   | 4
name5   | 3
Solution if I have no missing id:
product | id
name1   | 1
name2   | 2
name3   | 3
name4   | 4
name5   | 5
MySQL has a feature to generate an auto-increment id. See https://dev.mysql.com/doc/refman/8.0/en/example-auto-increment.html
The number is always incremented. It does not go back and fill in missing id values. This is deliberate and necessary to prevent race conditions while concurrent sessions are inserting rows.
If you want to find unused values as you insert a new row, you find that to prevent two concurrent sessions from using the same value, you end up having to lock the whole table. This hurts performance for many apps.
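For illustration only, the lookup the question asks for could be sketched as below. This is not a recommendation: as the answer explains, it is not safe under concurrent inserts unless the whole table is locked. The sketch runs against SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption), and next_free_id is a hypothetical helper name. It also does not notice a missing id 1 and returns None for an empty table.

```python
import sqlite3

# Smallest id whose successor is absent, plus one.
# Illustrative only: two sessions running this at once can pick the same value.
NEXT_FREE_ID = """
SELECT MIN(t.id) + 1
FROM sample AS t
LEFT JOIN sample AS nxt ON nxt.id = t.id + 1
WHERE nxt.id IS NULL
"""

def next_free_id(existing_ids):
    """Build a throwaway table holding existing_ids and return the next free id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, product TEXT)")
    conn.executemany(
        "INSERT INTO sample (id, product) VALUES (?, ?)",
        [(i, f"name{i}") for i in existing_ids],
    )
    (free_id,) = conn.execute(NEXT_FREE_ID).fetchone()
    conn.close()
    return free_id

print(next_free_id([1, 2, 4]))     # gap at 3
print(next_free_id([1, 2, 3, 4]))  # no gap, so the next value in sequence
```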
You should drop the requirement that the id values must be consecutive. They are not ordinal row numbers. You may always have missing id values, because you may delete rows, or roll back a transaction that inserts a row, or the auto-increment mechanism can even skip values as it generates them.
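The "never goes back" behaviour is easy to observe. The sketch below uses SQLite's AUTOINCREMENT through Python's sqlite3 module as a stand-in for MySQL's AUTO_INCREMENT (an assumption; the two engines differ in details, but both avoid reusing the id of a deleted row here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (id INTEGER PRIMARY KEY AUTOINCREMENT, product TEXT)"
)

# Three inserts receive ids 1, 2 and 3.
for name in ("name1", "name2", "name3"):
    conn.execute("INSERT INTO sample (product) VALUES (?)", (name,))

# Deleting a row leaves a gap at id 2 ...
conn.execute("DELETE FROM sample WHERE id = 2")

# ... and the next insert does not fill it; it continues after the highest id used.
conn.execute("INSERT INTO sample (product) VALUES (?)", ("name4",))

ids_after = [row[0] for row in conn.execute("SELECT id FROM sample ORDER BY id")]
print(ids_after)
```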

MYSQL DB Best method to store keywords and URL index

Which of these methods would be the most efficient way of storing, retrieving, processing and searching a large (millions of records) index of stored URLs along with their keywords?
Example 1: (Using one table)
TABLE_URLs-----------------------------------------------
ID DOMAIN KEYWORDS
1 mysite.com videos,photos,images
2 yoursite.com videos,games
3 hissite.com games,images
4 hersite.com photos,pictures
---------------------------------------------------------
Example 2: (one-to-one Relationship from one table to another)
TABLE_URLs-----------------------------------------------
ID DOMAIN KEYWORDS
1 mysite.com
2 yoursite.com
3 hissite.com
4 hersite.com
---------------------------------------------------------
TABLE_URL_KEYWORDS---------------------------------------------
ID DOMAIN_ID KEYWORDS
1 1 videos,photos,images
2 2 videos,games
3 3 games,images
4 4 photos,pictures
---------------------------------------------------------
Example 3: (one-to-one Relationship from one table to another (Using a reference table))
TABLE_URLs-----------------------------------------------
ID DOMAIN
1 mysite.com
2 yoursite.com
3 hissite.com
4 hersite.com
---------------------------------------------------------
TABLE_URL_TO_KEYWORDS------------------------------------
ID DOMAIN_ID KEYWORDS_ID
1 1 1
2 2 2
3 3 3
4 4 4
---------------------------------------------------------
TABLE_KEYWORDS-------------------------------------------
ID KEYWORDS
1 videos,photos,images
2 videos,games
3 games,images
4 photos,pictures
---------------------------------------------------------
Example 4: (many-to-many Relationship from url to keyword ID (using reference table))
TABLE_URLs-----------------------------------------------
ID DOMAIN
1 mysite.com
2 yoursite.com
3 hissite.com
4 hersite.com
---------------------------------------------------------
TABLE_URL_TO_KEYWORDS------------------------------------
ID DOMAIN_ID KEYWORDS_ID
1 1 1
2 1 2
3 1 3
4 2 1
5 2 4
6 3 4
7 3 3
8 4 2
9 4 5
---------------------------------------------------------
TABLE_KEYWORDS-------------------------------------------
ID KEYWORDS
1 videos
2 photos
3 images
4 games
5 pictures
---------------------------------------------------------
My understanding is that Example 1 would take the largest amount of storage space, but searching through the data would be quick (repeated keywords are saved multiple times, but the keywords sit next to the relevant domain),
whereas Example 4 would save a ton of storage space but searching would take longer (no duplicate keywords are stored, but resolving multiple keywords for each domain would take longer).
Could anyone give me any insight or thoughts on which method would be best when designing a database that can handle huge amounts of data? Bear in mind that you may want to display a URL with its associated keywords OR search for one or more keywords and bring up the most relevant URLs.
You do have a many-to-many relationship between URLs and keywords. The canonical way to represent this in a relational database is to use a bridge table, which corresponds to Example 4 in your question.
Using the proper data structure, you will find that the queries are much easier to write, and as efficient as it gets.
I don't know what drives you to think that searching in a structure like the first one will be faster. It requires pattern matching when searching for each single keyword, which is notably slow. On the other hand, a junction table lets you search for exact matches, which can take advantage of indexes.
Finally, maintaining such a structure is also much easier: adding or removing keywords can be done with insert and delete statements, while the other structures require string manipulation on a delimited list, which again is tedious, error-prone and inefficient.
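A minimal sketch of the bridge-table layout from Example 4, with an exact-match keyword search, might look like this. It runs on SQLite via Python's sqlite3 module as a stand-in for MySQL (an assumption); the table names urls, keywords and url_keywords, the index name and the helper domains_for are hypothetical, and the rows are the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE urls     (id INTEGER PRIMARY KEY, domain  TEXT NOT NULL UNIQUE);
CREATE TABLE keywords (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL UNIQUE);
CREATE TABLE url_keywords (
    url_id     INTEGER NOT NULL REFERENCES urls(id),
    keyword_id INTEGER NOT NULL REFERENCES keywords(id),
    PRIMARY KEY (url_id, keyword_id)
);
-- Second index so lookups that start from a keyword are also index-backed.
CREATE INDEX idx_keyword_url ON url_keywords (keyword_id, url_id);
""")
conn.executemany("INSERT INTO urls VALUES (?, ?)", [
    (1, "mysite.com"), (2, "yoursite.com"), (3, "hissite.com"), (4, "hersite.com"),
])
conn.executemany("INSERT INTO keywords VALUES (?, ?)", [
    (1, "videos"), (2, "photos"), (3, "images"), (4, "games"), (5, "pictures"),
])
conn.executemany("INSERT INTO url_keywords VALUES (?, ?)", [
    (1, 1), (1, 2), (1, 3), (2, 1), (2, 4), (3, 4), (3, 3), (4, 2), (4, 5),
])

def domains_for(keyword):
    """Exact-match lookup: all domains tagged with the given keyword."""
    rows = conn.execute(
        """
        SELECT u.domain
        FROM keywords k
        JOIN url_keywords uk ON uk.keyword_id = k.id
        JOIN urls u ON u.id = uk.url_id
        WHERE k.keyword = ?
        ORDER BY u.domain
        """,
        (keyword,),
    )
    return [r[0] for r in rows]

print(domains_for("games"))
```

Adding or removing a keyword for a domain is then a single INSERT or DELETE on url_keywords, with no string manipulation.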
None of the above.
Simply have a table with 2 string columns:
CREATE TABLE domain_keywords (
    domain VARCHAR(..) NOT NULL,
    keyword VARCHAR(..) NOT NULL,
    PRIMARY KEY(domain, keyword),
    INDEX(keyword, domain)
) ENGINE=InnoDB;
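As a rough sketch of how such a two-column table could serve both access patterns from the question, the example below lists a domain's keywords and ranks domains by how many searched keywords they match. It runs on SQLite via Python's sqlite3 module as a stand-in for MySQL/InnoDB (an assumption). The VARCHAR lengths are placeholders chosen for the example because the original leaves them unspecified, and the index name and the helpers keywords_of and rank_domains are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE domain_keywords (
    domain  VARCHAR(255) NOT NULL,  -- length is a placeholder
    keyword VARCHAR(64)  NOT NULL,  -- length is a placeholder
    PRIMARY KEY (domain, keyword)
);
-- Mirrors INDEX(keyword, domain) from the MySQL definition.
CREATE INDEX idx_keyword_domain ON domain_keywords (keyword, domain);
""")
conn.executemany("INSERT INTO domain_keywords VALUES (?, ?)", [
    ("mysite.com", "videos"), ("mysite.com", "photos"), ("mysite.com", "images"),
    ("yoursite.com", "videos"), ("yoursite.com", "games"),
    ("hissite.com", "games"), ("hissite.com", "images"),
    ("hersite.com", "photos"), ("hersite.com", "pictures"),
])

def keywords_of(domain):
    """All keywords for one domain (served by the primary key)."""
    rows = conn.execute(
        "SELECT keyword FROM domain_keywords WHERE domain = ? ORDER BY keyword",
        (domain,),
    )
    return [r[0] for r in rows]

def rank_domains(search_terms):
    """Domains matching any search term, most matches first."""
    placeholders = ", ".join("?" for _ in search_terms)
    rows = conn.execute(
        f"""
        SELECT domain, COUNT(*) AS hits
        FROM domain_keywords
        WHERE keyword IN ({placeholders})
        GROUP BY domain
        ORDER BY hits DESC, domain
        """,
        list(search_terms),
    )
    return rows.fetchall()

print(keywords_of("mysite.com"))
print(rank_domains(["videos", "games"]))
```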
Notes:
It will be faster.
It will be easier to write code.
Having a plain id is very much a waste.
Normalizing the domain and keyword buys little space savings, but at a big loss in efficiency.
"Huge database"? I predict that this table will be smaller than your Domains table. That is, this table is not your main concern for "huge".

SQL query to identify max value in an subset of records to be used as boundary condition for Batch Job partitioning

I have around 2 million records in the database and I want to use partitions in one of my batch jobs. To do this, I first need to identify the boundary records of each partition. Can anyone help me identify the boundary values using an SQL query? To illustrate, consider that I have student records as follows:
STUDENT_ID STUDENT_NAME
1 JACK
2 SPARROW
3 JONNY
4 WALKER
5 SKY
6 DANNY
Now, if I want to create 2 partitions, the boundary condition of the first partition will be STUDENT_ID between 1 and 3, and that of the second will be STUDENT_ID between 4 and 6. Consider a similar situation where student_id is a string or a random id. How do I identify the boundary conditions? Currently I am thinking of first querying all the records in the database and then partitioning them in the Java code, but with 2 million records this is strongly discouraged. What should I do in this situation?
You can use the LIMIT clause in MySQL as follows, ordering by the partition key and reading the row at the first and last offset of each partition:
SELECT...
LIMIT y OFFSET x
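One way to turn LIMIT/OFFSET into partition boundaries is sketched below: order by the key and fetch only the first and last key of each partition, so no full result set is pulled into the application. The sketch uses SQLite via Python's sqlite3 module as a stand-in for MySQL (an assumption); the table name student and the helper partition_bounds are hypothetical. Because it relies only on ordering, it also works for string keys. Large OFFSET values still make the database skip over all preceding rows, so this is a starting point rather than a tuned solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, student_name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)", [
    (1, "JACK"), (2, "SPARROW"), (3, "JONNY"),
    (4, "WALKER"), (5, "SKY"), (6, "DANNY"),
])

def partition_bounds(conn, partition_size):
    """Return (lowest_id, highest_id) for each partition of partition_size rows."""
    total = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
    # Fetch the single key found at a given position in key order.
    id_at = "SELECT student_id FROM student ORDER BY student_id LIMIT 1 OFFSET ?"
    bounds = []
    for start in range(0, total, partition_size):
        end = min(start + partition_size, total) - 1  # last row of this partition
        low = conn.execute(id_at, (start,)).fetchone()[0]
        high = conn.execute(id_at, (end,)).fetchone()[0]
        bounds.append((low, high))
    return bounds

print(partition_bounds(conn, 3))
```

Each batch worker can then select its rows with WHERE student_id BETWEEN low AND high.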

Row to column based on where condition - Mysql

I have a table with these columns:
ID|NAME|AGE
1 |name1|40
1 |name2|45
2 |name3|30
2 |name4|39
The result I want looks like this:
ID1|NAME1|AGE1|ID2|NAME2|AGE2
1 |name1|40 | 2 |name3|30
1 |name2|45 | 2 |name4|39
There are around 5k rows.
Thanks.
You can get the full (Cartesian) product of the tables:
select table1.col1, table1.col2, table1.col3, table2.col1, table2.col2, table2.col3
from table1, table2
where table1.col1 = 'test' and table2.col2 = 'test1'
The result may contain duplicate records from both tables, but as you don't have any primary keys, that's possibly not an issue for you.
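A plain product pairs every ID 1 row with every ID 2 row. To get the side-by-side layout shown in the question, the rows need a pairing rule. The sketch below assumes rows are paired by their rank within each ID when ordered by name; that rule is an assumption, since the question does not state one. It runs on SQLite via Python's sqlite3 module as a stand-in for MySQL, and the table name people is hypothetical. On MySQL 8.0 or later the rank could instead be computed with ROW_NUMBER() OVER (PARTITION BY id ORDER BY name).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?, ?)", [
    (1, "name1", 40), (1, "name2", 45), (2, "name3", 30), (2, "name4", 39),
])

# Rows of one ID group, each with its rank (1, 2, ...) by name inside the group.
RANKED = """
SELECT p.id, p.name, p.age,
       (SELECT COUNT(*) FROM people q
        WHERE q.id = p.id AND q.name <= p.name) AS rn
FROM people p
WHERE p.id = ?
"""

# Join the two ranked groups on rank so the n-th rows sit side by side.
PAIRED = f"""
SELECT a.id, a.name, a.age, b.id, b.name, b.age
FROM ({RANKED}) AS a
JOIN ({RANKED}) AS b ON a.rn = b.rn
ORDER BY a.rn
"""

paired_rows = conn.execute(PAIRED, (1, 2)).fetchall()
for row in paired_rows:
    print(row)
```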