iterate mysql database from specific record - mysql

$pic = $DBcon->query("SELECT * FROM tbl_pictures WHERE type='holidays' ASC LIMIT 200");
Let's say my database has 500 records. If I use the query above, it starts at record 200 and stops at the last record.
So the first 200 won't be searched, am I getting that right? If that is true, how do I have to write my query so that it starts at record 200, goes up to 500, and afterwards searches records 0-200 as well?
Note: as soon as one match is found, the iteration can stop; it's all just about finding the next picture.
Thank you guys :)
Here's my table, in case it's needed:
CREATE TABLE IF NOT EXISTS `tbl_pictures` (
`picture_id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` varchar(60) NOT NULL,
`type` varchar(60) NOT NULL,
`filename` varchar(60) NOT NULL,
`title` mediumtext NOT NULL,
`description` mediumtext NOT NULL,
`date` varchar(60) NOT NULL,
PRIMARY KEY (`picture_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;

If you're looking for the next picture in a sequence where they're ordered by insertion ID:
SELECT * FROM tbl_pictures WHERE picture_id>?
ORDER BY picture_id LIMIT 1
LIMIT affects only the results given, not how many records are actually searched.
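If you also want the wrap-around behaviour from the question (continue from the current picture towards the end, then start over from the beginning), one way is to sort on a boolean expression; a sketch, assuming ? is the current picture_id and you still want the type filter from your original query:
SELECT * FROM tbl_pictures
WHERE type = 'holidays'
ORDER BY (picture_id > ?) DESC,   -- pictures after the current one come first ...
         picture_id ASC           -- ... then it wraps around to the lowest id
LIMIT 1;
Rows with a higher picture_id sort first (giving you the very next one), and if none exist the query falls back to the first matching picture.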


Pulling a random value out of a table is returning a null value

I have a stored procedure that I've used to 'de-identify' client information when I want to use it in a test environment. I am replacing actual names and addresses with random values. I have database tables in a database called dict (for dictionary) for female names, male names, last names, and addresses.
Each of these has a field called f_row_id that is a sequential number from 1 to x, one for each record in the table.
We recently upgraded to MySQL 8 and the stored procedure quit working. I ended up with NULL for every field where I tried filling in a random value from the other table. In trying to find what will now work, I'm unable to get the following query to work as I expect:
SELECT
f_enroll_id,
(SELECT f_name FROM dict.dummy_female_first_name fn WHERE fn.f_row_id = (FLOOR(RAND() * 850) + 1) LIMIT 1)
FROM
t_enroll
My data table (the one I eventually want to contain random names) is called t_enroll. There is an ID field in it (f_enroll_id). I want to get a list of each ID and a random first name for each record in that table.
There are 850 records in the table of random first names (dummy_female_first_name) (in my stored procedure this is a session variable that I compute at the start of the procedure).
When I first tried running this I got an error that my sub-query returned more than one value. I don't understand why it would do that since (FLOOR(RAND() * 850) + 1) should return a single integer. So I added the LIMIT 1. But when I run this, about half of the returned rows have NULL for the first name.
I have verified that all the rows in my first-name table have a row ID, that the row ID is unique, and that there are no gaps in the numbers.
What do you think is causing this?
Thanks in advance!
Here is the schema for the table that I'm updating:
CREATE TABLE `t_enroll` (
`f_enroll_id` int(15) NOT NULL AUTO_INCREMENT,
`f_status` int(2) DEFAULT NULL,
`f_date_enrolled` date NOT NULL DEFAULT '0000-00-00',
`f_first_name` varchar(20) DEFAULT NULL,
`f_mi` char(1) DEFAULT NULL,
`f_last_name` varchar(20) NOT NULL DEFAULT '',
`f_maiden_name` varchar(20) DEFAULT NULL,
`f_dob` date NOT NULL DEFAULT '0000-00-00',
`f_date_fee_received` date NOT NULL DEFAULT '0000-00-00',
`f_gender` int(11) NOT NULL DEFAULT '2',
`f_address_1` varchar(40) DEFAULT NULL,
`f_address_2` varchar(20) DEFAULT NULL,
`f_quadrant` char(2) DEFAULT NULL,
`f_city` varchar(25) DEFAULT NULL,
`f_state` char(2) NOT NULL DEFAULT '',
`f_county` varchar(3) NOT NULL,
`f_zip_code` varchar(10) DEFAULT NULL,
PRIMARY KEY (`f_enroll_id`),
KEY `f_date_enrolled` (`f_date_enrolled`),
KEY `f_last_name` (`f_last_name`),
KEY `f_first_name` (`f_first_name`),
KEY `f_dob` (`f_dob`),
KEY `f_gender` (`f_gender`)
) ENGINE=InnoDB AUTO_INCREMENT=532 DEFAULT CHARSET=latin1 COMMENT='InnoDB free: 15360 kB';
Here is the schema for the dictionary table where I pull names from:
CREATE TABLE `dummy_female_first_name` (
`f_row_id` int(11) NOT NULL,
`f_name` varchar(25) NOT NULL,
PRIMARY KEY (`f_row_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
As I mentioned in my comment, I have found an alternate approach using the ORDER BY RAND() LIMIT 1 variation. But I am still curious as to what is going on that caused my original method to fail. This is something that changed in the more recent MySQL version, because it used to work.
Thanks again.
It is a much more expensive approach, but you can use:
SELECT f_enroll_id,
(SELECT f_name FROM dict.dummy_female_first_name fn ORDER BY rand() LIMIT 1)
FROM t_enroll;
You can make this more efficient using:
SELECT f_enroll_id,
(SELECT f_name
FROM dict.dummy_female_first_name fn
WHERE rand() < 0.01
ORDER BY rand() LIMIT 1
)
FROM t_enroll;
The WHERE clause means that only about eight or nine rows (roughly 1% of 850) filter through, so the sort will be much faster.
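As for what is likely going on with the original query: RAND() in the subquery's WHERE clause is re-evaluated for every row of dummy_female_first_name, so each dictionary row is compared against a different random number. Sometimes no row matches (hence the NULLs) and sometimes several do (hence the "more than one row" error before the LIMIT 1 was added). A sketch of a workaround that computes one random row id per enrollment row and then joins on it; the NO_MERGE hint (available in recent MySQL 8 releases) keeps the derived table materialized so the RAND() expression is not pushed back into the join condition:
SELECT /*+ NO_MERGE(e) */ e.f_enroll_id, fn.f_name
FROM (
    SELECT f_enroll_id,
           FLOOR(RAND() * 850) + 1 AS rand_row_id   -- one random id per enrollment row
    FROM t_enroll
) AS e
JOIN dict.dummy_female_first_name fn
  ON fn.f_row_id = e.rand_row_id;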

Speed Up A Large Insert From Select Query With Multiple Joins

I'm trying to denormalize a few MySQL tables I have into a new table that I can use to speed up some complex queries with lots of business logic. The problem that I'm having is that there are 2.3 million records I need to add to the new table and to do that I need to pull data from several tables and do a few conversions too. Here's my query (with names changed)
INSERT INTO database_name.log_set_logs
(offload_date, vehicle, jurisdiction, baselog_path, path,
baselog_index_guid, new_location, log_set_name, index_guid)
(
select STR_TO_DATE(logset_logs.offload_date, '%Y.%m.%d') as offload_date,
logset_logs.vehicle, jurisdiction, baselog_path, path,
baselog_trees.baselog_index_guid, new_location, logset_logs.log_set_name,
logset_logs.index_guid
from
(
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 7), '/', -1) as offload_date,
SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) as vehicle,
SUBSTRING_INDEX(path, '/', 9) as baselog_path, index_guid,
path, log_set_name
FROM database_name.baselog_and_amendment_guid_to_path_mappings
) logset_logs
left join database_name.log_trees baselog_trees
ON baselog_trees.original_location = logset_logs.baselog_path
left join database_name.baselog_offload_location location
ON location.baselog_index_guid = baselog_trees.baselog_index_guid);
The query itself works: I was able to run it using a filter on log_set_name. However, that filter's condition only covers less than 1% of the total records, because a single log_set_name value accounts for 2.2 million records, the vast majority, and I don't see anything else I can use to break this query up into smaller chunks. The problem is that the query takes too long to run on those 2.2 million records; it ends up timing out after a few hours, the transaction is rolled back, and nothing is added to the new table. Only the 0.1 million records could be processed, and that was only because I could add a filter that said where log_set_name != 'value with the 2.2 million records'.
Is there a way to make this query more performant? Am I trying to do too many joins at once, and should I perhaps populate the row's columns in their own individual queries? Or is there some way I can page this type of query so that MySQL executes it in batches? I already got rid of all my indexes on the log_set_logs table because I read that those will slow down inserts. I also jacked my RDS instance up to a db.r4.4xlarge write node, and I'm using MySQL Workbench, so I increased all of its timeout values to their maximums (all nines). All three of these steps helped and were necessary for me to get the 1% of the records into the new table, but it still wasn't enough to get the 2.2 million records in without timing out. I'd appreciate any insights, as I'm not adept at this type of bulk insert from a select.
CREATE TABLE `log_set_logs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`purged` tinyint(1) NOT NULL DEFAUL,
`baselog_path` text,
`baselog_index_guid` varchar(36) DEFAULT NULL,
`new_location` text,
`offload_date` date NOT NULL,
`jurisdiction` varchar(20) DEFAULT NULL,
`vehicle` varchar(20) DEFAULT NULL,
`index_guid` varchar(36) NOT NULL,
`path` text NOT NULL,
`log_set_name` varchar(60) NOT NULL,
`protected_by_retention_condition_1` tinyint(1) NOT NULL DEFAULT '1',
`protected_by_retention_condition_2` tinyint(1) NOT NULL DEFAULT '1',
`protected_by_retention_condition_3` tinyint(1) NOT NULL DEFAULT '1',
`protected_by_retention_condition_4` tinyint(1) NOT NULL DEFAULT '1',
`general_comments_about_this_log` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1736707 DEFAULT CHARSET=latin1
CREATE TABLE `baselog_and_amendment_guid_to_path_mappings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`path` text NOT NULL,
`index_guid` varchar(36) NOT NULL,
`log_set_name` varchar(60) NOT NULL,
PRIMARY KEY (`id`),
KEY `log_set_name_index` (`log_set_name`),
KEY `path_index` (`path`(42))
) ENGINE=InnoDB AUTO_INCREMENT=2387821 DEFAULT CHARSET=latin1
...
CREATE TABLE `baselog_offload_location` (
`baselog_index_guid` varchar(36) NOT NULL,
`jurisdiction` varchar(20) NOT NULL,
KEY `baselog_index` (`baselog_index_guid`),
KEY `jurisdiction` (`jurisdiction`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `log_trees` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`baselog_index_guid` varchar(36) DEFAULT NULL,
`original_location` text NOT NULL, -- This is what I have to join everything on, and since it's text I cannot index it; the largest value is above 255 characters, so I cannot change it to a varchar and index it either.
`new_location` text,
`distcp_returncode` int(11) DEFAULT NULL,
`distcp_job_id` text,
`distcp_stdout` text,
`distcp_stderr` text,
`validation_attempt` int(11) NOT NULL DEFAULT '0',
`validation_result` tinyint(1) NOT NULL DEFAULT '0',
`archived` tinyint(1) NOT NULL DEFAULT '0',
`archived_at` timestamp NULL DEFAULT NULL,
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`dir_exists` tinyint(1) NOT NULL DEFAULT '0',
`random_guid` tinyint(1) NOT NULL DEFAULT '0',
`offload_date` date NOT NULL,
`vehicle` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `baselog_index_guid` (`baselog_index_guid`)
) ENGINE=InnoDB AUTO_INCREMENT=1028617 DEFAULT CHARSET=latin1
baselog_offload_location has no PRIMARY KEY; what's up?
GUIDs/UUIDs can be terribly inefficient. A partial solution is to convert them to BINARY(16) to shrink them. More details here: http://mysql.rjweb.org/doc.php/uuid (MySQL 8.0 has similar functions.)
It would probably be more efficient to have a separate (optionally redundant) column for vehicle rather than needing to compute
SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) as vehicle
Why JOIN baselog_offload_location? There seems to be no reference to columns in that table. If there are, be sure to qualify them so we know what is where. Preferably use short aliases.
The lack of an index on baselog_index_guid may be critical to performance.
Please provide EXPLAIN SELECT ... for the SELECT in your INSERT and for the original (slow) query.
SELECT MAX(LENGTH(original_location)) FROM .. -- to see if it really is too big to index. What version of MySQL are you using? The limit increased recently.
For the above item, we can talk about having a 'hash'.
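One way to realize that 'hash' idea without touching original_location itself would be a generated column plus an index; a sketch (the column and index names are mine, not from the original schema, and this assumes MySQL 5.7+ for generated columns):
ALTER TABLE database_name.log_trees
  ADD COLUMN original_location_md5 CHAR(32)
      GENERATED ALWAYS AS (MD5(original_location)) STORED,
  ADD INDEX idx_original_location_md5 (original_location_md5);
The join could then compare baselog_trees.original_location_md5 = MD5(logset_logs.baselog_path) first, keeping the full-text equality check as well to guard against hash collisions.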
"paging the query". I call it "chunking". See http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks . That talks about deleting, but it can be adapted to INSERT .. SELECT since you want to "chunk" the select. If you go with chunking, Javier's comment becomes moot. Your code would be chunking the selects, hence batching the inserts:
Loop:
INSERT .. SELECT .. -- of up to 1000 rows (see link)
End loop
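A rough sketch of that chunking idea, driven by the source table's auto-increment id (the procedure name and the 10,000-row batch size are mine; I've qualified jurisdiction and new_location based on the posted schemas; with autocommit on, each INSERT commits on its own, so no single transaction has to cover all 2.3 million rows):
DELIMITER //
CREATE PROCEDURE copy_log_set_logs_in_chunks()
BEGIN
  DECLARE from_id INT DEFAULT 0;
  DECLARE max_id  INT;
  SELECT MAX(id) INTO max_id
    FROM database_name.baselog_and_amendment_guid_to_path_mappings;

  WHILE from_id < max_id DO
    INSERT INTO database_name.log_set_logs
      (offload_date, vehicle, jurisdiction, baselog_path, path,
       baselog_index_guid, new_location, log_set_name, index_guid)
    SELECT STR_TO_DATE(logset_logs.offload_date, '%Y.%m.%d'),
           logset_logs.vehicle, location.jurisdiction, logset_logs.baselog_path,
           logset_logs.path, baselog_trees.baselog_index_guid,
           baselog_trees.new_location, logset_logs.log_set_name, logset_logs.index_guid
    FROM (
      SELECT id, path, index_guid, log_set_name,
             SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 7), '/', -1) AS offload_date,
             SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) AS vehicle,
             SUBSTRING_INDEX(path, '/', 9) AS baselog_path
      FROM database_name.baselog_and_amendment_guid_to_path_mappings
      WHERE id > from_id AND id <= from_id + 10000    -- one chunk of source rows
    ) logset_logs
    LEFT JOIN database_name.log_trees baselog_trees
           ON baselog_trees.original_location = logset_logs.baselog_path
    LEFT JOIN database_name.baselog_offload_location location
           ON location.baselog_index_guid = baselog_trees.baselog_index_guid;

    SET from_id = from_id + 10000;
  END WHILE;
END //
DELIMITER ;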

How to optimize a MySQL query over a large dataset

I have two tables with the following schema,
CREATE TABLE `open_log` (
`delivery_id` varchar(30) DEFAULT NULL,
`email_id` varchar(50) DEFAULT NULL,
`email_activity` varchar(30) DEFAULT NULL,
`click_url` text,
`email_code` varchar(30) DEFAULT NULL,
`on_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `sent_log` (
`email_id` varchar(50) DEFAULT NULL,
`delivery_id` varchar(50) DEFAULT NULL,
`email_code` varchar(50) DEFAULT NULL,
`delivery_status` varchar(50) DEFAULT NULL,
`tries` int(11) DEFAULT NULL,
`creation_ts` varchar(50) DEFAULT NULL,
`creation_dt` varchar(50) DEFAULT NULL,
`on_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The email_id and delivery_id columns in both tables make up a unique key.
The open_log table has 2.5 million records, whereas the sent_log table has 0.25 million records.
I want to filter out the records from the open_log table based on the unique key (email_id and delivery_id).
I'm writing the following query.
SELECT * FROM open_log
WHERE CONCAT(email_id,'^',delivery_id)
IN (
SELECT DISTINCT CONCAT(email_id,'^',delivery_id) FROM sent_log
)
The problem is the query is taking too much time to execute. I waited an hour for the query to complete, without success.
Kindly suggest what I can do to make it fast, given the large data size in the tables.
Thanks,
Faisal Nasir
First, rewrite your query using exists:
SELECT *
FROM open_log ol
WHERE EXISTS (SELECT 1
FROM sent_log sl
WHERE sl.email_id = ol.email_id and sl.delivery_id = ol.delivery_id
);
Then, add an index so this query will run faster:
create index idx_sentlog_emailid_deliveryid on sent_log(email_id, delivery_id);
Your query is slow for a variety of reasons:
The use of string concatenation makes it impossible for MySQL to use an index.
The select distinct in the subquery is unnecessary.
Exists can be faster than in.
If you run this query often, you can speed it up considerably by creating a BIGINT hash column, even if it is not unique.
For example, you can add the column and maintain it with a trigger. First add the column:
alter table sent_log add column for_get bigint;
After that, create a trigger (or run an UPDATE) to put a hash into that BIGINT:
for_get=CONV(substr(md5(concat(email_id, delivery_id)),1,10),16,10)
If you have such a column, indexed, in both tables, the query will be like:
SELECT *
FROM open_log ol
left join sent_log sl on sl.for_get = ol.for_get
WHERE sl.email_id is not null and sl.email_id = ol.email_id and sl.delivery_id = ol.delivery_id;
That query will be fast.
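A sketch of the rest of that setup (the index and trigger names are mine; since the hash is not unique, the final query above still compares email_id and delivery_id directly):
-- the same column on the other table, plus an index on both
alter table open_log add column for_get bigint;
create index idx_sentlog_for_get on sent_log (for_get);
create index idx_openlog_for_get on open_log (for_get);

-- backfill existing rows with the hash from the answer
update sent_log set for_get = CONV(SUBSTR(MD5(CONCAT(email_id, delivery_id)), 1, 10), 16, 10);
update open_log set for_get = CONV(SUBSTR(MD5(CONCAT(email_id, delivery_id)), 1, 10), 16, 10);

-- keep it maintained for new rows (a matching trigger would go on open_log too)
CREATE TRIGGER sent_log_set_for_get
BEFORE INSERT ON sent_log
FOR EACH ROW
SET NEW.for_get = CONV(SUBSTR(MD5(CONCAT(NEW.email_id, NEW.delivery_id)), 1, 10), 16, 10);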

Ordering in MySQL Bogs Down

I've been working on a small Perl program that works with a table of articles, displaying them to the user if they have not already been read. It has been working nicely, and it has been quite speedy overall. However, this afternoon the performance degraded from fast enough that I wasn't worried about optimizing the query to a glacial 3-4 seconds per query. To select articles, I present this query:
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
WHERE ciid NOT
IN (
SELECT ciid
FROM `uninet_channelitem_read`
WHERE uid = '1030'
)
AND (
cid =117
OR cid =308
OR cid =310
)
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0 , 100
The list of possible cids varies and could be quite a bit longer. In any case, I noted that about 2-3 seconds of the total query time is devoted to the ORDER BY. If I remove that, it only takes about half a second to give me the result back. If I drop the subquery, the performance goes back to normal... but the subquery didn't seem to be problematic until just this afternoon, after working fine for a week or so.
Any ideas what could be slowing it down so much? What might I do to try to get the performance back up to snuff? The table being queried has 45,000 rows. The subquery's table has fewer than 3,000 rows at present.
Update: Incidentally, if anyone has suggestions on how to do multiple queries or some other technique that would be more efficient to accomplish what I am trying to do, I am all ears. I'm really puzzled how to solve the problem at this point. Can I somehow apply the order by before the join to make it apply to the real table and not the derived table? Would that be more efficient?
Here is the latest version of the query, derived from suggestions from #Gordon, below
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
LEFT JOIN (
SELECT ciid, dateRead
FROM `uninet_channelitem_read`
WHERE uid = '1030'
)alreadyRead ON channelitem.ciid = alreadyRead.ciid
WHERE (
alreadyRead.ciid IS NULL
)
AND `cid`
IN ( 6648, 329, 323, 6654, 6647 )
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0 , 100
Also, I should mention what my db structure looks like with regards to these two tables -- maybe someone can spot something odd about the structure:
CREATE TABLE IF NOT EXISTS `channelitem` (
`newsversion` int(11) NOT NULL DEFAULT '0',
`cid` int(11) NOT NULL DEFAULT '0',
`ciid` int(11) NOT NULL AUTO_INCREMENT,
`description` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`url` varchar(222) DEFAULT NULL,
`creationdate` datetime DEFAULT NULL,
`urgent` varchar(10) DEFAULT NULL,
`name` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`lastchanged` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`author` varchar(255) NOT NULL,
PRIMARY KEY (`ciid`),
KEY `newsversion` (`newsversion`),
KEY `cid` (`cid`),
KEY `creationdate` (`creationdate`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1638554365 ;
CREATE TABLE IF NOT EXISTS `uninet_channelitem_read` (
`ciid` int(11) NOT NULL,
`uid` int(11) NOT NULL,
`dateRead` datetime NOT NULL,
PRIMARY KEY (`ciid`,`uid`),
KEY `ciid` (`ciid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
It never hurts to try the left outer join version of such a query:
SELECT ci.ciid, ci.cid, ci.name, ci.description, ci.url, ci.creationdate, ci.author
FROM `channelitem` ci left outer join
(SELECT ciid
FROM `uninet_channelitem_read`
WHERE uid = '1030'
) cr
on ci.ciid = cr.ciid
where cr.ciid is null and
ci.cid in (117, 308, 310)
ORDER BY ci.`creationdate` DESC
LIMIT 0 , 100
This query will be faster with an index on uninet_channelitem_read(ciid) and probably one on channelitem(cid, ciid, creationdate).
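For reference, the composite index suggested there could be created roughly like this (the index name is mine; the posted schema already has a single-column ciid index on uninet_channelitem_read):
CREATE INDEX idx_channelitem_cid_ciid_creationdate
    ON channelitem (cid, ciid, creationdate);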
The problem could be that you need to create an index on the channelitem table for the column creationdate. Indexes help a database to run queries faster. Here is a link about MySQL Indexing

database model for a rotating rounds system

Struggling with how to correctly model a database for a fairly simple app I'm putting together. The app is just a simple thing I'm using to learn a few frameworks. Each week someone in the office has to take a turn bringing in beers. This is done on a rotation; we take it in turns following a specific order. Occasionally someone isn't around and they skip their turn, but will have to take it the following week (or indeed the first subsequent week they can). Obviously we also have staff come and go, so this needs to be accounted for. Initially I'd modelled this in terms of drinkers and rounds. Each time someone bought a round they would be entered into the rounds table, and whoever is first in the order with the fewest rounds is up next. This works perfectly until someone new joins: without inserting dummy data to cover all their "missing" rounds, it will be their turn every time until they have caught up on the number of rounds everyone else has done.
This is currently what I have:
CREATE TABLE `rounds` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`drinker_id` int(11) NOT NULL,
`description` text COLLATE utf8_unicode_ci NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`)
)
CREATE TABLE `drinkers` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`email` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
)
Any ideas how I can re-model this in a way that allows for people coming and going, and for people skipping their turn?
Thanks.
You only need one table, like this:
CREATE TABLE `drinkers` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`email` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`is_available` TINYINT(1) NOT NULL DEFAULT 0,
`last_buy` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
INDEX(`last_buy`)
)
and you can get the person who needs to buy drinks like this
SELECT * FROM drinkers
WHERE is_available
ORDER BY last_buy ASC
LIMIT 1
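Under this design, recording that someone has just bought a round would presumably be a single update (a sketch; ? is the chosen drinker's id):
UPDATE drinkers SET last_buy = NOW() WHERE id = ?;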
I would keep the drinkers table (but add a 0/1 flag for whether someone is sick) and define rounds like this:
CREATE TABLE `rounds` (
`drinker_id` int(11) NOT NULL,
`passed` int(11) NOT NULL,
`passed_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
UNIQUE KEY (`drinker_id`)
)
Then put a record in rounds for every user, with their ID, 0 for passed, and '0000-00-00 00:00:00' for passed_at. When someone buys drinks, mark it in the rounds table by setting passed to passed + 1 and updating the date. If one of the drinkers leaves work, remove them from rounds. If a new one joins, add them to rounds with passed equal to the maximum existing passed value (give them a fresh start :) ). If someone is sick, they obviously cannot buy drinks, so someone else should. To determine who should buy drinks, run this:
SELECT r.passed, r.drinker_id
FROM drinkers d
JOIN rounds r ON d.id = r.drinker_id
WHERE d.sick = 0
ORDER BY r.passed ASC, d.id ASC
This will give you (in the first row) the drinker who has bought drinks the fewest times.
In the rounds table you will have something like this:
drinker_id   passed   passed_at
===============================
1            0        0000-00-00
2            1        2013-03-01
3            0        0000-00-00
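The "fresh start" for a new joiner described above could then be implemented roughly like this (a sketch against the rounds table from this answer; NOW() is used for passed_at because strict SQL modes reject the zero date):
-- give a new drinker the current maximum passed count so they don't
-- immediately owe every historical round
INSERT INTO rounds (drinker_id, passed, passed_at)
SELECT ?, COALESCE(MAX(passed), 0), NOW()
FROM rounds;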