Use a specific MySQL index with Rails

I have this ActiveRecord query
issue = Issue.find(id)
issue.articles.includes(:category).merge(Category.where(permalink: perma))
It translates to the following MySQL query:
SELECT `articles`.`id` AS t0_r0, `articles`.`title` AS t0_r1,
`articles`.`hypertitle` AS t0_r2, `articles`.`html` AS t0_r3,
`articles`.`author` AS t0_r4, `articles`.`published` AS t0_r5,
`articles`.`category_id` AS t0_r6, `articles`.`issue_id` AS t0_r7,
`articles`.`date` AS t0_r8, `articles`.`created_at` AS t0_r9,
`articles`.`updated_at` AS t0_r10, `articles`.`photo_file_name` AS t0_r11,
`articles`.`photo_content_type` AS t0_r12, `articles`.`photo_file_size` AS t0_r13,
`articles`.`photo_updated_at` AS t0_r14, `categories`.`id` AS t1_r0,
`categories`.`name` AS t1_r1, `categories`.`permalink` AS t1_r2,
`categories`.`created_at` AS t1_r3, `categories`.`updated_at` AS t1_r4,
`categories`.`issued` AS t1_r5, `categories`.`order_articles` AS t1_r6
FROM `articles` LEFT OUTER JOIN `categories` ON
`categories`.`id` = `articles`.`category_id` WHERE
`articles`.`issue_id` = 409 AND `categories`.`permalink` = 'Διεθνή' LIMIT 1
The EXPLAIN output for this query shows that it uses the wrong index:
+----+-------------+------------+-------+---------------------------------------------------------------------------+-------------------------------+---------+-------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+-------+---------------------------------------------------------------------------+-------------------------------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | categories | const | PRIMARY,index_categories_on_permalink | index_categories_on_permalink | 768 | const | 1 | 100.00 | |
| 1 | SIMPLE | articles | ref | index_articles_on_issue_id_and_category_id, index_articles_on_category_id | index_articles_on_category_id | 2 | const | 10 | 100.05 | Using where |
+----+-------------+------------+-------+---------------------------------------------------------------------------+-------------------------------+---------+-------+------+----------+-------------+
I have two indexes: one on category_id alone and one on (issue_id, category_id).
This query filters on both issue_id and category_id, which is much faster with index_articles_on_issue_id_and_category_id than with index_articles_on_category_id.
How can I make an ActiveRecord query use the correct index?

You can leverage Arel like this to use an index:
class Issue
  def self.use_index(index)
    # update: OP fixed my mistake
    from("#{self.table_name} USE INDEX(#{index})")
  end
end
# then
Issue.use_index("bla").where(some_condition: true)

Add use_index to ActiveRecord::Relation.
Adding this to Rails core was discussed for several years, but it looks like the PR and branch were abandoned.
If you know which database you're using and understand the limitations of index hints, you can easily add this to your own codebase.
This is very similar to @krichard's solution, except that it is generalized for all models instead of just Issue.
config/initializers/active_record_relation.rb
class ActiveRecord::Relation
  # Allow passing index hints to MySQL in case the query planner gets confused.
  #
  # Example:
  #   Message.first.events.use_index( :index_events_on_eventable_type_and_eventable_id )
  #   #=> Event Load (0.5ms) SELECT `events`.* FROM `events` USE INDEX (index_events_on_eventable_type_and_eventable_id) WHERE `events`.`eventable_id` = 123 AND `events`.`eventable_type` = 'Message'
  #
  # MySQL documentation:
  #   https://dev.mysql.com/doc/refman/5.7/en/index-hints.html
  #
  # See https://github.com/rails/rails/pull/30514
  #
  def use_index( index_name )
    self.from( "#{ self.quoted_table_name } USE INDEX ( #{ index_name } )" )
  end
end
This will allow you to use something like:
issue.articles.includes( :category ).use_index( :index_articles_on_issue_id_and_category_id )
And the resulting SQL will include:
FROM articles USE INDEX( index_articles_on_issue_id_and_category_id )


Loading quoted numbers into snowflake table from CSV with COPY TO <TABLE>

I have a problem loading CSV data into a Snowflake table. The fields are wrapped in double quotes, which causes problems when importing them into the table.
I know that COPY INTO has the CSV-specific option FIELD_OPTIONALLY_ENCLOSED_BY = '"', but it's not working at all.
Here are some pieces of the table definition and the COPY command:
CREATE TABLE ...
(
GamePlayId NUMBER NOT NULL,
etc...
....);
COPY INTO ...
FROM ...csv.gz'
FILE_FORMAT = (TYPE = CSV
STRIP_NULL_VALUES = TRUE
FIELD_DELIMITER = ','
SKIP_HEADER = 1
error_on_column_count_mismatch=false
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
ON_ERROR = "ABORT_STATEMENT"
;
Csv file looks like this:
"3922000","14733370","57256","2","3","2","2","2019-05-23 14:14:44",",00000000",",00000000",",00000000",",00000000","1000,00000000","1000,00000000","1317,50400000","1166,50000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000"
I get this error:
Numeric value '"3922000"' is not recognized
I'm pretty sure it's because the NUMBER value is interpreted as a string when Snowflake reads the "" marks, but since I use
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
they shouldn't even be there... Does anyone have a solution to this?
Maybe something is incorrect with your file? I was just able to run the following without issue.
1. create the test table:
CREATE OR REPLACE TABLE
dbNameHere.schemaNameHere.stacko_58322339 (
num1 NUMBER,
num2 NUMBER,
num3 NUMBER);
2. create test file, contents as follows
1,2,3
"3922000","14733370","57256"
3,"2",1
4,5,"6"
3. create stage and put file in stage
4. run the following copy command
COPY INTO dbNameHere.schemaNameHere.STACKO_58322339
FROM @stageNameHere/stacko_58322339.csv.gz
FILE_FORMAT = (TYPE = CSV
STRIP_NULL_VALUES = TRUE
FIELD_DELIMITER = ','
SKIP_HEADER = 0
ERROR_ON_COLUMN_COUNT_MISMATCH=FALSE
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
ON_ERROR = "CONTINUE";
5. results
+-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| stageNameHere/stacko_58322339.csv.gz | LOADED | 4 | 4 | 4 | 0 | NULL | NULL | NULL | NULL |
+-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
1 Row(s) produced. Time Elapsed: 2.436s
6. view the records
>SELECT * FROM dbNameHere.schemaNameHere.stacko_58322339;
+---------+----------+-------+
| NUM1    | NUM2     | NUM3  |
|---------+----------+-------|
|       1 |        2 |     3 |
| 3922000 | 14733370 | 57256 |
|       3 |        2 |     1 |
|       4 |        5 |     6 |
+---------+----------+-------+
Can you try with a similar test as this?
EDIT: A quick look at your data shows that many of your numeric fields appear to start with commas, so something is definitely amiss with the data.
Assuming your numbers are European-formatted ("," as the decimal separator and "." as the thousands separator), the numeric formatting documentation suggests that Snowflake does not support this as input. I'd open a feature request.
But you can read the column in as text and then use REPLACE, like this:
SELECT '100,1234'::text as A
,REPLACE(A,',','.') as B
,TRY_TO_DECIMAL(b, 20,10 ) as C;
gives:
A B C
100,1234 100.1234 100.1234000000
It would be safer to strip the thousands separators first, like this:
SELECT '1.100,1234'::text as A
,REPLACE(A,'.') as B
,REPLACE(B,',','.') as C
,TRY_TO_DECIMAL(C, 20,10 ) as D;
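If the values need to be converted before loading (for example in a preprocessing script), the same two-step replacement can be sketched in Python. This is only an illustration: the function name parse_european_decimal is hypothetical, and it assumes "." only ever appears as a thousands separator and "," only as the decimal separator.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_european_decimal(text: str) -> Optional[Decimal]:
    """Convert text such as '1.100,1234' to a Decimal; return None if it cannot be parsed."""
    # Step 1: drop thousands separators. Step 2: turn the decimal comma into a dot.
    normalized = text.strip().replace(".", "").replace(",", ".")
    try:
        return Decimal(normalized)
    except InvalidOperation:
        # Mirrors TRY_TO_DECIMAL, which yields NULL instead of raising an error.
        return None


if __name__ == "__main__":
    for raw in ("100,1234", "1.100,1234", ",00000000", "abc"):
        print(repr(raw), "->", parse_european_decimal(raw))
```

Returning None for unparseable input is a design choice that follows the TRY_ variant used in the SQL above; raising an exception would be the stricter alternative.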

Django query on related model

I have models like below
class Scheduler(models.Model):
id = <this is primary key>
last_run = <referencing to id in RunLogs below>
class RunLogs(models.Model):
id = <primary key>
scheduler = <referencing to id in Scheduler above>
overall_status = <String>
A RunLogs entry is created only when the scheduler reaches the job's scheduled time.
I am querying RunLogs to show running schedules, as below.
current = RunLogs.objects\
    .filter(Q(overall_status__in=("RUNNING", "ON-HOLD", "QUEUED")) |
            Q(scheduler__last_run__isnull=True))
The above query gives me all records with a matching status from RunLogs, but it does not give me records from Scheduler where last_run is null.
I understand why the query behaves this way, but is there a way to also get the records from Scheduler where last_run is null?
I followed the same steps you did and found the reason why you were getting all the records after running your query. Here are the exact steps and a solution.
Steps
Created models
from django.db import models

class ResourceLog(models.Model):
    id = models.BigIntegerField(primary_key=True)
    resource_mgmt = models.ForeignKey('ResourceMgmt', on_delete=models.DO_NOTHING,
                                      related_name='cpe_log_resource_mgmt')
    overall_status = models.CharField(max_length=8, blank=True, null=True)

class ResourceMgmt(models.Model):
    id = models.BigIntegerField(primary_key=True)
    last_run = models.ForeignKey(ResourceLog, on_delete=models.DO_NOTHING, blank=True, null=True)
Added the data as following:
resource_log
+----+----------------+------------------+
| id | overall_status | resource_mgmt_id |
+----+----------------+------------------+
|  1 | RUNNING        |                1 |
|  2 | QUEUED         |                1 |
|  3 | QUEUED         |                1 |
+----+----------------+------------------+
resource_mgmt
+----+-------------+
| id | last_run_id |
+----+-------------+
|  1 |        NULL |
|  2 |        NULL |
|  3 |        NULL |
|  4 |           3 |
+----+-------------+
According to the tables above, resource_mgmt(4) refers to resource_log(3). Note, however, that resource_log(3) does not refer to resource_mgmt(4).
I ran the following commands in the Python shell:
In [1]: resource_log1 = ResourceLog.objects.get(id=1)
In [2]: resource_log1.resource_mgmt
Out[2]: <ResourceMgmt: ResourceMgmt object (1)>
In [3]: resource_log2 = ResourceLog.objects.get(id=2)
In [4]: resource_log2.resource_mgmt
Out[4]: <ResourceMgmt: ResourceMgmt object (1)>
In [5]: resource_log3 = ResourceLog.objects.get(id=3)
In [6]: resource_log3.resource_mgmt
Out[6]: <ResourceMgmt: ResourceMgmt object (1)>
From this we can see that all the resource_log objects refer to the first resource_mgmt object (i.e., id=1).
Q) Why do all the objects refer to the first object in resource_mgmt?
resource_mgmt is a foreign key field that is not nullable. In this setup its default value is 1, so when you create a resource_log object without specifying resource_mgmt, the default value of 1 is stored there.
Run your query
In [60]: ResourceLog.objects.filter(resource_mgmt__last_run__isnull = True)
Out[60]: <QuerySet [<ResourceLog: ResourceLog object (1)>, <ResourceLog: ResourceLog object (2)>, <ResourceLog: ResourceLog object (3)>]>
This query returns all three ResourceLog objects because all three refer to the first resource_mgmt object, whose last_run is NULL.
Solution
You actually want to check the reverse relationship.
We can achieve this using two queries:
rm_ids = ResourceMgmt.objects.exclude(last_run=None).values_list('last_run', flat=True)
current = ResourceLog.objects.filter(overall_status__in=("RUNNING", "QUEUED")).exclude(id__in=rm_ids)
The output is:
<QuerySet [<ResourceLog: ResourceLog object (1)>, <ResourceLog: ResourceLog object (2)>]>
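The logic of the two queries can be checked without a database. The sketch below is plain Python, not Django ORM code: it replays the sample rows from the tables above using hypothetical dataclasses (ResourceLogRow, ResourceMgmtRow) and applies the same filter and exclude steps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceLogRow:
    id: int
    overall_status: str
    resource_mgmt_id: int


@dataclass
class ResourceMgmtRow:
    id: int
    last_run_id: Optional[int]


# Sample data copied from the resource_log and resource_mgmt tables.
logs = [
    ResourceLogRow(1, "RUNNING", 1),
    ResourceLogRow(2, "QUEUED", 1),
    ResourceLogRow(3, "QUEUED", 1),
]
mgmts = [
    ResourceMgmtRow(1, None),
    ResourceMgmtRow(2, None),
    ResourceMgmtRow(3, None),
    ResourceMgmtRow(4, 3),
]

# Step 1: ids of logs that some ResourceMgmt row points to via last_run.
rm_ids = {m.last_run_id for m in mgmts if m.last_run_id is not None}

# Step 2: logs with a matching status that are NOT referenced as a last_run.
current = [
    log.id
    for log in logs
    if log.overall_status in ("RUNNING", "QUEUED") and log.id not in rm_ids
]

print(current)  # [1, 2]
```

Log 3 is dropped because resource_mgmt(4) points to it as its last_run, which matches the QuerySet output shown above.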
Hope that helps!

Two things to do in MySQL IF()

I have a problem concerning the IF() function in MySQL.
I would like to return a string and change the value of a variable. Somewhat like:
IF(@firstRow=1, "Dear" AND @firstRow:=0, "dear")
This outputs only '0' instead of 'Dear'...
I would be very thankful for some input on ways I could solve this problem!
Louis :)
AND is a boolean operator, not a "also do this other thing" operator.
"Dear" AND 0 returns 0 because 0 is treated as false in MySQL and <anything> AND false will return false.
Also because the integer/boolean value of "Dear" is 0 as well. Using a string in a numeric context just reads initial digits in the string, if any, and ignores the rest.
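The coercion described above can be modelled roughly in Python. This is a simplified sketch and not MySQL's actual implementation: the helper names mysql_numeric_value and mysql_and are hypothetical, and NULL handling, exponents, and warnings are ignored.

```python
import re

# Optional sign, then digits with an optional fraction; anything after is ignored.
_LEADING_NUMBER = re.compile(r"\s*([+-]?(?:\d+\.?\d*|\.\d+))")


def mysql_numeric_value(value) -> float:
    """Approximate how a value is read in a numeric context."""
    if isinstance(value, (int, float)):
        return float(value)
    match = _LEADING_NUMBER.match(value)
    # A string with no leading digits (such as 'Dear') counts as 0.
    return float(match.group(1)) if match else 0.0


def mysql_and(left, right) -> int:
    """Boolean AND on the numeric values: 1 only if both operands are non-zero."""
    return int(mysql_numeric_value(left) != 0 and mysql_numeric_value(right) != 0)


print(mysql_and("Dear", 0))   # 0
print(mysql_and("Dear", 1))   # 0, because 'Dear' itself is treated as 0
print(mysql_and("12abc", 1))  # 1, because the leading digits 12 are non-zero
```

The second call illustrates the point made above: even if the right-hand side were true, the expression would still not yield the string 'Dear'.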
It's not clear what your problem is, but I guess you want to capitalize the word "dear" if the row is the first one in the result set.
Instead of being too clever by half trying to fit the side-effect into your expression, do yourself a favor and break it out into a separate column:
mysql> SELECT IF(@firstRow=1, 'Dear', 'dear'), @firstRow:=0 AS _ignoreThis
    -> FROM (SELECT @firstRow:=1) AS _init
    -> CROSS JOIN
    -> mytable;
+---------------------------------+-------------+
| IF(@firstRow=1, 'Dear', 'dear') | _ignoreThis |
+---------------------------------+-------------+
| Dear                            |           0 |
| dear                            |           0 |
| dear                            |           0 |
+---------------------------------+-------------+
But if you really want to make your code as confusing and unreadable as possible, you can do something like this:
SELECT IF(@firstRow=1, CONCAT('Dear', IF(@firstRow:=0, '', '')), 'dear')
FROM (SELECT @firstRow:=1) AS _init
CROSS JOIN
...
But remember this important metric of code quality: WTFs per minute.
Use a case expression instead of IF() as the syntax is far easier to follow e.g.
select
    case when @firstRow = 1 then 'Dear' else 'dear' end AS Salutation
  , @firstRow := 0
from (
    select 1 n union all
    select 2 n union all
    select 3
) d
cross join (SELECT @firstRow:=1) var
+---+------------+----------------+
|   | Salutation | @firstRow := 0 |
+---+------------+----------------+
| 1 | Dear       |              0 |
| 2 | dear       |              0 |
| 3 | dear       |              0 |
+---+------------+----------------+

Periodic "Opening tables" on MySQL Insert

I have a long-running insert that shows a status that toggles between "NULL" and "Opening tables"
| 5 | mckelvey | mushroom.jpl.nasa.gov:57050 | smap_ampcs_v5_2_0 | Query | 7105 | Opening tables | INSERT INTO ChannelValue
| 5 | mckelvey | mushroom.jpl.nasa.gov:57050 | smap_ampcs_v5_2_0 | Query | 7114 | NULL | INSERT INTO ChannelValue
It does this continually. Show open tables shows 10 opened tables, 5 in use.
Global status "Opened_tables" is 190 and does not increase. table_open_cache is 1024.
Is something wrong? Why does it constantly open tables for an insert already in progress?
The insert is below. The tables in the v4 database are MyISAM, and v5_2 are InnoDB. ComboParent is temporary and MyISAM. No errors in the server log.
INSERT INTO ChannelValue
SELECT
cv.sessionId,
cv.hostId,
cv.sessionFragment,
1 AS id,
cv.id AS channelDataId,
cp.parentId,
NULL AS uniqueId,
IF (cv.dnUnsignedValue IS NOT NULL,
CONVERT(cv.dnUnsignedValue, SIGNED),
cv.dnIntegerValue) AS dnPackedValue,
cv.dnDoubleValue,
sv.stringId,
cv.eu,
SET_FLAGS(cv.dnDoubleFlag,
cv.euFlag,
cv.dnAlarmState,
cv.euAlarmState,
cv.dnUnsignedValue) AS flags
FROM smap_ampcs_v4_0_0.ChannelValue AS cv
STRAIGHT_JOIN ComboParent AS cp
ON ((cv.sessionId = cp.sessionId) AND
(cv.hostId = cp.hostId) AND
(cv.sessionFragment = cp.sessionFragment) AND
(cv.sclkCoarse = cp.sclkCoarse) AND
(cv.sclkFine = cp.sclkFine) AND
(cv.ertCoarse = cp.ertCoarse) AND
(cv.ertFine = cp.ertFine) AND
(cv.scetCoarse = cp.scetCoarse) AND
(cv.scetFine = cp.scetFine) AND
(cv.dssId = cp.dssId) AND
(cv.vcid <=> cp.vcid) AND
(cv.isRealtime = cp.isRealtime))
LEFT JOIN StringValue AS sv
ON (
(cv.hostId = sv.hostId) AND
(cv.sessionId = sv.sessionId) AND
(cv.sessionFragment = sv.sessionFragment) AND
(0 = sv.fromSse) AND
(cv.dnStringValue = sv.stringValue)
)
WHERE (cv.packetId IS NULL)

SQL Server 2008: U and X locks - deadlock on one table without any indexes. How?

I observe really strange behavior of my DB.
I have one small table (about 300 rows) where one field is continuously updated.
I was getting a lot of deadlocks there: an update of the table was deadlocking with a similar update of the same table (U lock vs X lock).
So I decided to remove the clustered index (the table has no indexes now) to fix the deadlocks. But it didn't help, and I'm still getting deadlocks between the U and X lock modes.
So: one table, no indexes, and two sessions updating that table.
Victim:
update dbo.MyNumber set
    @nextno = nextno = nextno + 1
where [type] = @type
    and yearid = @yearid
Winning query:
update dbo.MyNumber set
    @nextno = nextno = nextno + 1
where [type] = @TYPE
    and yrclosedyn = 0
Rows are definitely different but the page is the same.
How is this possible? Maybe it is connected to lock escalation, or ...?
I really appreciate any suggestions.
Thanks in advance
Mike
DEADLOCK XML:
<deadlock-list>
<deadlock victim="process6c492e8">
<process-list>
<process id="processb6a988" taskpriority="0" logused="1848" waitresource="RID: 5:1:127478:16" waittime="3478" ownerId="17153439" transactionname="user_transaction" lasttranstarted="2012-12-18T12:31:40.147" XDES="0xffffffff89482258" lockMode="U" schedulerid="7" kpid="4248" status="suspended" spid="98" sbid="0" ecid="0" priority="0" transcount="2" lastbatchstarted="2012-12-18T12:31:49.913" lastbatchcompleted="2012-12-18T12:31:49.913" clientapp="PenAIR" hostname="S16047425" hostpid="9300" loginname="sa" isolationlevel="read committed (2)" xactid="17153439" currentdb="5" lockTimeout="4294967295" clientoption1="673185824" clientoption2="128056">
<executionStack>
<frame procname="MYDATABASE.dbo.MyStoredProcedure" line="92" stmtstart="9062" stmtend="9388" sqlhandle="0x030005002d15a05e58b5710016a100000100000000000000">
UPDATE dbo.MyNumber Set
@NEXTNO = NEXTNO = NEXTNO + 1
WHERE (TYPE = @TYPE) AND (YRCLOSEDYN = 0) </frame>
</executionStack>
<inputbuf>
Proc [Database Id = 5 Object Id = 1587549485] </inputbuf>
</process>
<process id="process6c492e8" taskpriority="0" logused="192" waitresource="RID: 5:1:127478:20" waittime="8252" ownerId="17153562" transactionname="user_transaction" lasttranstarted="2012-12-18T12:31:45.140" XDES="0x6583b1e0" lockMode="U" schedulerid="13" kpid="19824" status="suspended" spid="143" sbid="0" ecid="0" priority="0" transcount="2" lastbatchstarted="2012-12-18T12:31:45.140" lastbatchcompleted="2012-12-18T12:31:45.140" clientapp="PenAIR" hostname="S16047425" hostpid="4760" loginname="sa" isolationlevel="read committed (2)" xactid="17153562" currentdb="5" lockTimeout="4294967295" clientoption1="673185824" clientoption2="128056">
<executionStack>
<frame procname="MYDATABASE.dbo.MyStoredProcedure" line="92" stmtstart="9062" stmtend="9388" sqlhandle="0x030005002d15a05e58b5710016a100000100000000000000">
UPDATE dbo.MyNumber Set
@NEXTNO = NEXTNO = NEXTNO + 1
WHERE ([TYPE] = @TYPE) AND (YRCLOSEDYN = 0) </frame>
</executionStack>
<inputbuf>
Proc [Database Id = 5 Object Id = 1587549485] </inputbuf>
</process>
</process-list>
<resource-list>
<ridlock fileid="1" pageid="127478" dbid="5" objectname="MYDATABASE.dbo.MyNumber" id="lock464f2640" mode="X" associatedObjectId="72057594131120128">
<owner-list>
<owner id="processb6a988" mode="X"/>
</owner-list>
<waiter-list>
<waiter id="process6c492e8" mode="U" requestType="wait"/>
</waiter-list>
</ridlock>
<ridlock fileid="1" pageid="127478" dbid="5" objectname="MYDATABASE.dbo.MyNumber" id="lockfffffffff1974980" mode="X" associatedObjectId="72057594131120128">
<owner-list>
<owner id="process6c492e8" mode="X"/>
</owner-list>
<waiter-list>
<waiter id="processb6a988" mode="U" requestType="wait"/>
</waiter-list>
</ridlock>
</resource-list>
</deadlock>
</deadlock-list>
Shredding your deadlock graph into tabular form shows the following.
+----------+-------------------------+-----------+-----------+------------+----------+--------------------+--------------------+---------+
| LockMode | LockedObject | TranCount | LockEvent | LockedMode | WaitMode | WaitResource | IsolationLevel | LogUsed |
+----------+-------------------------+-----------+-----------+------------+----------+--------------------+--------------------+---------+
| U | MYDATABASE.dbo.MyNumber | NULL | rid | X | U | RID: 5:1:127478:20 | read committed (2) | 192 |
| U | MYDATABASE.dbo.MyNumber | NULL | rid | X | U | RID: 5:1:127478:16 | read committed (2) | 1848 |
+----------+-------------------------+-----------+-----------+------------+----------+--------------------+--------------------+---------+
You still haven't answered my question in the comments as to whether the sequence generation code is only called once in every transaction.
It is easy to generate a deadlock graph similar to the one in your post if not.
Setup
CREATE TABLE dbo.MyNumber
(
[TYPE] CHAR(1),
YRCLOSEDYN INT,
NEXTNO INT
)
INSERT INTO dbo.MyNumber
VALUES ('X', 0, 1),
('Y', 0, 1)
GO
CREATE PROC MyStoredProcedure @TYPE CHAR(1),
                              @NEXTNO INT OUTPUT
AS
UPDATE dbo.MyNumber
SET @NEXTNO = NEXTNO = NEXTNO + 1
WHERE ( [TYPE] = @TYPE )
  AND ( YRCLOSEDYN = 0 )
Connection 1
BEGIN TRAN
DECLARE @NEXTNO INT
EXEC MyStoredProcedure 'Y', @NEXTNO OUTPUT
WAITFOR DELAY '00:00:05'
EXEC MyStoredProcedure 'X', @NEXTNO OUTPUT
ROLLBACK
Connection 2
(Run immediately after executing the code in connection 1)
BEGIN TRAN
DECLARE @NEXTNO INT
EXEC MyStoredProcedure 'X', @NEXTNO OUTPUT
EXEC MyStoredProcedure 'Y', @NEXTNO OUTPUT
ROLLBACK
The deadlock graph output from that is very similar to the one above
+----------+-------------------------+-----------+-----------+------------+----------+-----------------+--------------------+---------+
| LockMode | LockedObject | TranCount | LockEvent | LockedMode | WaitMode | WaitResource | IsolationLevel | LogUsed |
+----------+-------------------------+-----------+-----------+------------+----------+-----------------+--------------------+---------+
| U | MYDATABASE.dbo.MyNumber | 2 | rid | X | U | RID: 11:1:144:1 | read committed (2) | 248 |
| U | MYDATABASE.dbo.MyNumber | 2 | rid | X | U | RID: 11:1:144:0 | read committed (2) | 248 |
+----------+-------------------------+-----------+-----------+------------+----------+-----------------+--------------------+---------+
If this is the explanation for your issue, you will need to ensure that you update the sequences in the same order in all transactions. (I assume there must be some good reason why you can't just use an IDENTITY column based solution.)
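The same-order rule is not specific to SQL Server. As a rough analogy in Python (a sketch with hypothetical names, where in-process threading.Lock objects stand in for the row locks on MyNumber), two workers that request the sequences in opposite orders cannot deadlock as long as the locks are always acquired in sorted key order.

```python
import threading

# One lock and one counter per sequence type, standing in for the rows of MyNumber.
locks = {"X": threading.Lock(), "Y": threading.Lock()}
next_no = {"X": 1, "Y": 1}


def bump_sequences(types):
    """Increment each requested sequence, always locking in sorted order."""
    ordered = sorted(set(types))
    for t in ordered:
        locks[t].acquire()
    try:
        for t in ordered:
            next_no[t] += 1
    finally:
        for t in reversed(ordered):
            locks[t].release()


def worker(types, rounds):
    for _ in range(rounds):
        bump_sequences(types)


def run_demo(rounds=1000):
    """Run two workers that ask for the sequences in opposite orders."""
    threads = [
        threading.Thread(target=worker, args=(["Y", "X"], rounds)),
        threading.Thread(target=worker, args=(["X", "Y"], rounds)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(next_no)


if __name__ == "__main__":
    print(run_demo())
```

If bump_sequences locked in the caller's order instead of sorted order, the two workers could each hold one lock while waiting for the other, which is the pattern shown in the deadlock graph above.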