Calling data from 2 tables - MySQL

I am kind of new to SQL. I have two MySQL tables; their structure is below.
Key_Hash Table
CREATE TABLE `key_hash` (
`primary_key` int(11) NOT NULL,
`hash` text NOT NULL,
`totalNumberOfWords` int(11) NOT NULL,
PRIMARY KEY (`primary_key`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
--
Key_Word Table
CREATE TABLE `key_word` (
`primary_key` bigint(20) NOT NULL AUTO_INCREMENT,
`indexVal` int(11) NOT NULL,
`hashed_word` char(3) NOT NULL,
PRIMARY KEY (`primary_key`),
KEY `hashed_word` (`hashed_word`,`indexVal`)
) ENGINE=InnoDB AUTO_INCREMENT=28570982 DEFAULT CHARSET=latin1
Now, below is my query
SELECT `indexVal`, COUNT(`indexVal`) FROM `key_word` WHERE `hashed_word` IN ('001','01v') GROUP BY `indexVal` LIMIT 100;
When you run the above query, you will get an output like below
The important thing to note here is that indexVal in the key_word table holds the same set of values as primary_key in the key_hash table (I think it could be a foreign key?). In other words, primary_key values from key_hash appear as indexVal in key_word. But please note that indexVal can appear any number of times in the table, because it is not a primary key in key_word.
OK, so this is not exactly the query I need. I need to count how many times each unique indexVal appears in the above search, and divide that count by the corresponding key_hash.totalNumberOfWords value.
I am providing a few examples below.
Imagine I ran the above query and the result says:
indexVal 0 appeared 10 times in search
indexVal 1 appeared 20 times in search
indexVal 300 appeared 20,000 times in search
Now keep in mind that key_hash.primary_key = key_word.indexVal. First I look up the key_hash.primary_key that matches key_word.indexVal and get the associated key_hash.totalNumberOfWords. Then I divide the COUNT() from the above query by that key_hash.totalNumberOfWords and multiply the result by 100 (to get the value as a percentage). Below is a query I tried, but it has errors.
SELECT `indexVal`,COUNT(`indexVal`), (COUNT(`indexVal`) / (select `numberOfWords` from `key_hash` where `primary_key`=`key_word.indexVal`)*100) FROM `key_word` WHERE `hashed_word` IN ('001','01v') GROUP BY `indexVal` LIMIT 100;
How can I do this job?
EDIT
This is what the key_hash table looks like:
This is what the key_word table looks like:

You can use a JOIN instead of a sub-query:
SELECT w.indexVal
, COUNT(w.indexVal) AS cnt
, COUNT(w.indexVal) / MAX(h.totalNumberOfWords) * 100 AS pct
FROM key_word w
INNER JOIN key_hash h ON h.primary_key = w.indexVal
WHERE w.hashed_word IN ('001','01v')
GROUP BY w.indexVal
LIMIT 100
(MAX() is needed only because h.totalNumberOfWords has to pass through an aggregate in a grouped query; every value within a group is the same, since primary_key is unique.)
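To sanity-check the shape of that query, here is a small self-contained sketch using SQLite via Python. The sample values are invented, and SQLite stands in for MySQL purely for illustration (note the `* 100.0` to avoid SQLite's integer division; MySQL's `/` already produces a decimal):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE key_hash (
    primary_key INTEGER PRIMARY KEY,
    hash TEXT NOT NULL,
    totalNumberOfWords INTEGER NOT NULL
);
CREATE TABLE key_word (
    primary_key INTEGER PRIMARY KEY AUTOINCREMENT,
    indexVal INTEGER NOT NULL,
    hashed_word CHAR(3) NOT NULL
);
INSERT INTO key_hash VALUES (0, 'h0', 20), (1, 'h1', 40);
-- indexVal 0 matches the search twice, indexVal 1 matches once
INSERT INTO key_word (indexVal, hashed_word)
VALUES (0, '001'), (0, '01v'), (1, '001'), (1, 'zzz');
""")

rows = conn.execute("""
SELECT w.indexVal,
       COUNT(w.indexVal) AS cnt,
       COUNT(w.indexVal) * 100.0 / MAX(h.totalNumberOfWords) AS pct
FROM key_word w
JOIN key_hash h ON h.primary_key = w.indexVal
WHERE w.hashed_word IN ('001', '01v')
GROUP BY w.indexVal
ORDER BY w.indexVal      -- make the output order deterministic
LIMIT 100
""").fetchall()
print(rows)  # [(0, 2, 10.0), (1, 1, 2.5)]
```

indexVal 0 matched 2 of its 20 words (10%), indexVal 1 matched 1 of its 40 words (2.5%), which is exactly the percentage the question asks for.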

Related

MySQL Recursive SQL select takes too long to execute

I have a table whose records reference their related records as "sponsors", and I am using this SQL SELECT to obtain a list of 10 records:
SELECT A.ref_user_id, A.ref_user_id_sponsor,
IF(A.businessname IS NULL OR A.businessname = '', LTRIM(RTRIM(CONCAT(A.name, ' ', A.surname))), A.businessname) AS namesurnamesponsor, A.level
FROM (
with recursive parent_users (ref_user_id, ref_user_id_sponsor, name, surname, businessname, level) AS (
SELECT ref_user_id, ref_user_id_sponsor, name, surname, businessname, 1 level
FROM users_details
WHERE ref_user_id = XXXXXXXXX
union all
SELECT t.ref_user_id, t.ref_user_id_sponsor, t.name, t.surname, t.businessname, level + 1
FROM users_details t INNER JOIN parent_users pu
ON t.ref_user_id = pu.ref_user_id_sponsor
)
SELECT * FROM parent_users ) A LIMIT 10
but it looks like it takes too long to extract just 10 records from a table of only 120 records in total. I also tried to create an index to speed it up:
CREATE INDEX idx_ref_user_id_ref_user_id_sponsor ON users_details (ref_user_id, ref_user_id_sponsor)
but it takes too long even to create the index that should help the SELECT give me back just those 10 results.
Do you have a suggestion? An alternative index? Or even an alternative way to obtain the 10 sponsors above the record selected by WHERE ref_user_id = XXXXXXXXX? Thanks to all! Cheers
EDIT: I ran an EXPLAIN SELECT for the above query and obtained this result:
and table structure is:
CREATE TABLE IF NOT EXISTS users_details (
ID bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT,
ref_user_id bigint(20) UNSIGNED NOT NULL,
ref_user_id_sponsor bigint(20) UNSIGNED DEFAULT NULL,
sponsorship_code varchar(6) NOT NULL,
name varchar(250) NOT NULL,
surname varchar(250) NOT NULL,
businessname varchar(300) DEFAULT NULL,
activate tinyint(1),
PRIMARY KEY (ID),
CONSTRAINT fk_users_details_id
FOREIGN KEY (ref_user_id)
REFERENCES users(ID)
ON DELETE CASCADE,
CONSTRAINT fk_users_ref_user_id_sponsor
FOREIGN KEY (ref_user_id_sponsor)
REFERENCES users(ID)
)ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
and a SELECT COUNT(*) FROM users_details returns just 120 records. The users table contains just ID, login, pwd and a couple of other user columns.
Edit 2: Maybe there is a better SELECT to obtain the same result, a list of the upline of a user by following its sponsor? Or would it just be better to add a
CREATE INDEX idx_ref_user_id_ref_user_id_sponsor ON users_details (ref_user_id, ref_user_id_sponsor)
to speed it up?
Edit 3 :
here is a photo of 20 of those users... obviously hiding sensitive information. As you can see, ref_user_id and ref_user_id_sponsor are closely linked:
and as for execution time, the query now looks like it never finishes, which is weird because some time ago, with somewhat less data (around 60 users instead of 120), it returned the 10-user result quickly enough.
Is there perhaps an alternative recursive SELECT that would give me the same result back, just to check whether the RECURSIVE clause is the problem? Or should I create an index on the two columns ref_user_id and ref_user_id_sponsor to speed it up?
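One thing worth checking when a recursive CTE "never finishes": a cycle (or duplicated sponsor links) in the data makes the recursive member regenerate rows indefinitely. The sketch below is purely illustrative, with invented data, using SQLite via Python as a stand-in; in MySQL 8 the CTE is materialized and cte_max_recursion_depth (default 1000) would eventually abort it, but a depth cap in the WHERE clause bounds the work explicitly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users_details (ref_user_id INTEGER, ref_user_id_sponsor INTEGER);
-- 3 is sponsored by 2, 2 by 1, 1 by 3: a cycle, so the chain never ends
INSERT INTO users_details VALUES (1, 3), (2, 1), (3, 2);
""")

rows = conn.execute("""
WITH RECURSIVE parent_users (ref_user_id, ref_user_id_sponsor, level) AS (
    SELECT ref_user_id, ref_user_id_sponsor, 1
    FROM users_details
    WHERE ref_user_id = 3
    UNION ALL
    SELECT t.ref_user_id, t.ref_user_id_sponsor, pu.level + 1
    FROM users_details t
    JOIN parent_users pu ON t.ref_user_id = pu.ref_user_id_sponsor
    WHERE pu.level < 10    -- depth cap: without it a cycle recurses forever
)
SELECT ref_user_id, level FROM parent_users LIMIT 10
""").fetchall()
print(len(rows))  # 10
```

SQLite evaluates recursive CTEs lazily, so LIMIT alone would also stop it here; MySQL materializes the whole CTE first, which is why the explicit level cap matters there.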

Is there any way to get this order by query to start searching from the given timestamp on a MySQL index?

I am working on a mysql 5.6 database, and I have a table looking something like this:
CREATE TABLE `items` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`account_id` int(11) NOT NULL,
`node_type_id` int(11) NOT NULL,
`property_native_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`parent_item_id` bigint(20) DEFAULT NULL,
`external_timestamp` datetime DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index_items_on_acct_node_prop` (`account_id`,`node_type_id`,`property_native_id`),
KEY `index_items_on_account_id_and_external_timestamp` (`account_id`,`external_timestamp`),
KEY `index_items_on_account_id_and_created_at` (`account_id`,`created_at`),
KEY `parent_item_external_timestamp_idx` (`parent_item_id`,`external_timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=194417315 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I am trying to optimize a query doing this:
SELECT *
FROM items
WHERE parent_item_id = ?
AND external_timestamp < ( SELECT external_timestamp
FROM items
WHERE id = ?
)
ORDER BY external_timestamp
LIMIT 5
Currently, there is an index on parent_item_id, so when I run this query with EXPLAIN, I get an "extra" of "Using where; Using filesort"
When I modify the index to be (parent_item_id, external_timestamp), then the EXPLAIN's "extra" becomes "Using index condition"
The problem is that the EXPLAIN's "rows" field is still the same (which is usually a couple thousand rows, but it could be millions in some use-cases).
I know that I can do something like AND external_timestamp > (1 week ago) or something like that, but I'd really like the number of rows to be just the number of LIMIT, so 5 in that case.
Is it possible to instruct the database to lock onto a row and then get the 5 rows before it on that (parent_item_id, external_timestamp) index?
(I'm unclear on what you are trying to do. Perhaps you should provide some sample input and output.) See if this works for you:
SELECT i.*
FROM items AS i
WHERE i.parent_item_id = ?
AND i.external_timestamp < ( SELECT external_timestamp
FROM items
WHERE id = ? )
ORDER BY i.external_timestamp
LIMIT 5
Your existing INDEX(parent_item_id, external_timestamp) will probably be used; see EXPLAIN SELECT ....
If id was supposed to match in all 5 rows, then the subquery is not needed.
SELECT items.*
FROM items
CROSS JOIN ( SELECT external_timestamp
FROM items
WHERE id = ? ) subquery
WHERE items.parent_item_id = ?
AND items.external_timestamp < subquery.external_timestamp
ORDER BY external_timestamp LIMIT 5
id is the PK, hence the subquery will return only one row (or none).
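If the goal is the 5 rows immediately before the anchor row (rather than the 5 oldest matching rows), ordering descending lets the (parent_item_id, external_timestamp) index be read backwards starting at the anchor; reverse the result afterwards if ascending order is needed. A runnable sketch with invented data, using SQLite via Python as a stand-in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    parent_item_id INTEGER,
    external_timestamp TEXT
)""")
conn.execute("""CREATE INDEX parent_item_external_timestamp_idx
                ON items (parent_item_id, external_timestamp)""")
# 20 rows under one parent, timestamps 2020-01-01 .. 2020-01-20
conn.executemany(
    "INSERT INTO items VALUES (?, 1, ?)",
    [(i, f"2020-01-{i:02d}") for i in range(1, 21)],
)

rows = conn.execute("""
SELECT id
FROM items
WHERE parent_item_id = 1
  AND external_timestamp < (SELECT external_timestamp FROM items WHERE id = ?)
ORDER BY external_timestamp DESC    -- walk the index backwards from the anchor
LIMIT 5
""", (10,)).fetchall()
print([r[0] for r in rows])  # [9, 8, 7, 6, 5]
```

With the composite index in place, the engine can stop after producing the LIMIT rows instead of examining every row under the parent.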

Should we use the "LIMIT clause" in following example?

There is a structure:
CREATE TABLE IF NOT EXISTS `categories` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`parent_id` int(11) unsigned NOT NULL DEFAULT '0',
`title` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Query_1:
SELECT * FROM `categories` WHERE `id` = 1234
Query_2:
SELECT * FROM `categories` WHERE `id` = 1234 LIMIT 1
I need to get just one row. Since we apply WHERE id=1234 (a lookup by PRIMARY KEY), the row with id=1234 is obviously the only such row in the whole table.
After MySQL has found that row, does the engine continue searching when Query_1 is used?
Thanks in advance.
Look at this SQLFiddle: http://sqlfiddle.com/#!2/a8713/4 and especially View Execution Plan.
You can see that MySQL recognizes the predicate on a PRIMARY KEY column, and therefore it does not matter whether you add LIMIT 1 or not.
PS: A little more explanation: look at the rows column of the execution plan. The number there is the number of rows the query engine thinks it has to examine. Since the column's content is unique (it is a primary key), this is 1. Compare it to this: http://sqlfiddle.com/#!2/9868b/2 (same schema, but without the primary key). Here rows says 8. (The execution plan is explained in the MySQL reference: http://dev.mysql.com/doc/refman/5.1/en/explain.html.)

MySQL SUM() with GROUP BY and LIMIT

I got this table
CREATE TABLE `votes` (
`item_id` int(10) unsigned NOT NULL,
`user_id` int(10) unsigned NOT NULL,
`vote` tinyint(4) NOT NULL DEFAULT '0',
PRIMARY KEY (`item_id`,`user_id`),
KEY `FK_vote_user` (`user_id`),
KEY `vote` (`vote`),
KEY `item` (`item_id`),
CONSTRAINT `FK_vote_item` FOREIGN KEY (`item_id`) REFERENCES `items` (`id`) ON UPDATE CASCADE,
CONSTRAINT `FK_vote_user` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
And I got this simple select
SELECT
`a`.`item_id`, `a`.`sum`
FROM
(SELECT
`item_id`, SUM(vote) AS `sum`
FROM
`votes`
GROUP BY `item_id`) AS a
ORDER BY `a`.`sum` DESC
LIMIT 10
Right now, with only 250 rows, there isn't a problem, but it's using filesort. The vote column holds either -1, 0 or 1. But will this be performant when this table has millions of rows?
If I make it a simpler query without the subquery, then "Using temporary" appears in the plan.
Explain gives (the query completes in 0.00170s):
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 33 Using filesort
2 DERIVED votes index NULL PRIMARY 8 NULL 250
No, this won't be efficient with millions of rows.
You'll have to create a supporting aggregate table which would store votes per item:
CREATE TABLE item_votes
(
item_id INT NOT NULL PRIMARY KEY,
votes INT NOT NULL,
upvotes INT UNSIGNED NOT NULL,
downvotes INT UNSIGNED NOT NULL,
KEY (votes),
KEY (upvotes),
KEY (downvotes)
)
and update it each time a vote is cast:
INSERT
INTO item_votes (item_id, votes, upvotes, downvotes)
VALUES (
$item_id,
CASE WHEN $upvote THEN 1 ELSE -1 END,
CASE WHEN $upvote THEN 1 ELSE 0 END,
CASE WHEN $upvote THEN 0 ELSE 1 END
)
ON DUPLICATE KEY UPDATE
votes = votes + VALUES(upvotes) - VALUES(downvotes),
upvotes = upvotes + VALUES(upvotes),
downvotes = downvotes + VALUES(downvotes)
then select top 10 votes:
SELECT *
FROM item_votes
ORDER BY
votes DESC, item_id DESC
LIMIT 10
efficiently using an index.
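The upsert pattern above can be sketched end to end. Here SQLite (3.24+) via Python stands in for MySQL: its ON CONFLICT ... DO UPDATE plays the role of ON DUPLICATE KEY UPDATE, with the excluded pseudo-table corresponding to VALUES(). Item ids and vote sequence are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE item_votes (
    item_id   INTEGER PRIMARY KEY,
    votes     INTEGER NOT NULL,
    upvotes   INTEGER NOT NULL,
    downvotes INTEGER NOT NULL
)""")

def cast_vote(item_id, upvote):
    # First vote for an item inserts the row; later votes fold into it.
    conn.execute("""
        INSERT INTO item_votes (item_id, votes, upvotes, downvotes)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (item_id) DO UPDATE SET
            votes     = votes + excluded.upvotes - excluded.downvotes,
            upvotes   = upvotes + excluded.upvotes,
            downvotes = downvotes + excluded.downvotes
    """, (item_id, 1 if upvote else -1, 1 if upvote else 0, 0 if upvote else 1))

cast_vote(7, True)
cast_vote(7, True)
cast_vote(7, False)
row = conn.execute(
    "SELECT votes, upvotes, downvotes FROM item_votes WHERE item_id = 7"
).fetchone()
print(row)  # (1, 2, 1)
```

Two upvotes and one downvote leave a net score of 1, and the top-10 query then reads the pre-aggregated row via the index on votes instead of summing millions of individual vote rows.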
But will this be performant when this table has millions or rows?
No, it won't.
If I make it a simpler query without a subquery, then the using temporary table appears.
Probably because the planner would turn it into the query you posted: it needs to calculate the sum to return the results in the correct order.
To quickly grab the top voted questions, you need to cache the result. Add a score field in your items table, and maintain it (e.g. using triggers). And index it. You'll then be able to grab the top 10 scores using an index scan.
First, you don't need the subquery, so you can rewrite your query as:
SELECT `item_id`, SUM(vote) AS `sum`
FROM `votes`
GROUP BY `item_id`
ORDER BY `sum` DESC
LIMIT 10
Second, you can build an index on votes(item_id, vote). The group by will then be an index scan. This will take time as the table gets bigger, but it should be manageable for reasonable data sizes.
Finally, with this structure of a query, you need a filesort for the final ORDER BY. Whether this is efficient or not depends on the number of items you have. If each item has, on average, only one or two votes, this may take some time. If you have a fixed set of items and there are only a few hundred or thousand, then it should not be a performance bottleneck, even as the data size expands.
If this summary is really something you need quickly, then a trigger with a summary table (as explained in another answer) provides a faster retrieval method.

Order by two fields - Indexing

So I've got a table with all users and their values, and I want to order them by how much "money" they have. The problem is that the money lives in two separate fields: users.money and users.bank.
So this is my table structure:
CREATE TABLE IF NOT EXISTS `users` (
`id` int(4) unsigned NOT NULL AUTO_INCREMENT,
`username` varchar(54) COLLATE utf8_swedish_ci NOT NULL,
`money` bigint(54) NOT NULL DEFAULT '10000',
`bank` bigint(54) NOT NULL DEFAULT '10000',
PRIMARY KEY (`id`),
KEY `users_all_money` (`money`,`bank`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci AUTO_INCREMENT=100 ;
And this is the query:
SELECT id, (money+bank) AS total FROM users FORCE INDEX (users_all_money) ORDER BY total DESC
Which works fine, but when I run EXPLAIN it shows "Using filesort", and I'm wondering if there is any way to optimize it?
Because you want to sort by a derived value (one that must be calculated for each row), MySQL can't use the index to help with the ordering.
The only solution that I can see would be to create an additional total_money (or similar) column and, whenever you update money or bank, update that value too. You could do this in your application code, or it would also be possible to do it in MySQL with triggers if you wanted.
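The trigger-maintained column can be sketched as follows; SQLite via Python is used here for a runnable illustration (MySQL's trigger syntax differs slightly, and the total_money column and trigger names are invented). Newer MySQL versions (5.7+) could alternatively use an indexed generated column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    money INTEGER NOT NULL DEFAULT 10000,
    bank  INTEGER NOT NULL DEFAULT 10000,
    total_money INTEGER NOT NULL DEFAULT 20000
);
CREATE INDEX users_total_money ON users (total_money);

-- Keep total_money in sync whenever money or bank changes.
CREATE TRIGGER users_total_sync AFTER UPDATE OF money, bank ON users
BEGIN
    UPDATE users SET total_money = NEW.money + NEW.bank WHERE id = NEW.id;
END;
""")

conn.execute("INSERT INTO users (id) VALUES (1), (2)")
conn.execute("UPDATE users SET money = 50000 WHERE id = 2")

rows = conn.execute(
    "SELECT id, total_money FROM users ORDER BY total_money DESC"
).fetchall()
print(rows)  # [(2, 60000), (1, 20000)]
```

Because total_money is now a stored, indexed column rather than a per-row expression, the ORDER BY can read the index in order and the filesort disappears.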