I am trying to think through this programming project and I would like some input.
I have two tables with records that can relate to other records in the same table or records in the other table.
Lets call table 1 book and table 2 bank. Both tables have the same layout.
+----+----------+--------+--------+-----------------+
| id | tranDate | refNum | amount | relationship_id |
+----+----------+--------+--------+-----------------+
What I am trying to figure out is how I can get an incremented relationship_id for every relationship and if I can do it only using MySQL code.
For example: To find records that can relate to other records in the same database I look for all records with a common refNum and see if the sum of their amount equals zero. If it does I want to relate them.
UPDATE book
LEFT JOIN (SELECT refnum AS matchedref
FROM book
WHERE relationship_id IS NULL
GROUP BY refnum
HAVING Sum(amount) = 0) AS t1
ON refnum = matchedref
SET relationship_id = ???
WHERE matchedref IS NOT NULL;
Related
I'm post-processing traces for two different kinds of events, where the data is stored in table A and B. Both tables have an producer ID and a time index value. While the same producer can trigger a record in both tables, the time when the different events occur are independent, and much more frequent for table B.
I want to update table A such that, for every row in table A, a column value from table B is taken for the most recent row in table B for the same producer.
Example mappings between two tables:
Here is a simplified example with just one producer in both tables. The goal is not to get the oldest entry in table B, but rather the most recent entry in table B relative to a row in table A. I'm showing B.tIdx < A.tIdx in this example, but <= is just as good for my purposes; just a detail.
Table A Table B
+----+------+----------------------+ +------+------+-------+
| ID | tIdx | NEW value SET FROM B | | ID | tIdx | value |
+----+------+----------------------+ +------+------+-------+
| 1 | 2 | 12.5 | | 1 | 1 | 12.5 |
| 1 | 4 | 4.3 | | 1 | 2 | 9.0 |
+----+------+----------------------+ | 1 | 3 | 4.3 |
| 1 | 4 | 7.8 |
| 1 | 5 | 6.2 |
+------+------+-------+
The actual tables have thousands of different IDs, millions of rows, and nearly as many distinct time index values as rows. I'm having trouble to come up with an UPDATE that doesn't take days to complete.
The following UPDATE works, but executes far too slowly; it starts off at a rate of 100s of updates/s, but soon slows to roughly 5 updates/s.
UPDATE A AS x SET value =
(SELECT value
FROM B AS y
WHERE x.ID = y.ID AND x.tIdx > y.tIdx
ORDER BY y.tIdx DESC
LIMIT 1);
I've tried creating indexes for ID and tIdx separately, but also multi-column indexes with both orders (ID,tIdx) and (tIdx,ID). But even when the multi-column indexes exist, EXPLAIN shows that it only ever indexes on ID or tIdx, but not both together.
I was wondering if the solution is to create nested SELECTs, to first get a temporary table with a particular ID, and then find the 1 row in table B that will meet the time constraint for each tIdx for that particular ID. The following SELECT, with hardcoded ID and tIdx, works and is very fast, completing in 0.00 sec.
SELECT value, ID, tIdx
FROM (
SELECT value, ID, tIdx
FROM B
WHERE ID = 5216
) y
WHERE tIdx < 1253707
ORDER BY tIdx DESC LIMIT 1;
I'd like to incorporate this into an UPDATE somehow, but replace the hardcoded ID and tIdx with the ID,tIdx pair for each row in A.
Or try any other suggestion for a more efficient UPDATE statement.
This is my first post to stackoverflow. Sincere apologizes in advance if I have violated any etiquette.
Update with Inner Join should do it, but it's going to get nasty to do this.
Update a INNER JOIN
(Select b.ID, maxb.atIdx, b.value
From b INNER JOIN (Select a.ID, a.tIdx as atIdx, max(b.tIdx) as bigb
From b INNER JOIN a
ON b.ID=a.ID
Where b.tIdx<=a.tIdx
Group By a.ID,a.tIdx) maxb
ON b.ID=maxb.ID and b.tIdx=maxb.bigb
) bestb ON a.ID=bestb.ID and a.tIdx=bestb.atIdx
Set a.value=bestb.value
To explain this it's best to start with the innermost SQL and work your way to the outermost UPDATE. To start, we need to join every record in table A to every record in table B for each ID. We can filter out the B records that are too recent and summarize that result for each table A record. That leaves us with the tIdx of the B table whose value goes into A for every record key in A. So then we join that to the B table to select the values to update, preserving the A-table's keys. That result is joined back to A to perform the update.
You'll have to see whether this is fast enough for you - I'm worried that this accesses the B table twice and the inner query creates A LOT of join combinations. I would pull out that inner query and see how long it runs by itself. On the positive side, they are all very simple, straightforward queries and they are connected by Inner Joins so there is some opportunity for efficiency in the query optimizer. I think indexes on a(ID,TIdx) [fast lookup to get the Update row] and b(ID) would be useful here.
One thing you can try is lead() to see if that helps the performance:
UPDATE A JOIN
(SELECT b.*,
LEAD(tIDx) OVER (PARTITION BY id order by tIDx) as next_tIDx
FROM b
) b
ON a.id = b.id AND
a.tIDx >= b.tIDx AND
(b.next_tIDx IS NULL or a.tIDx < b.next_tIDx)
SET a.value = b.value;
And for this you want an index on b(id, tidx).
I have a database called global and in that database I have groups table and reservations table, and the relation is one to many (one group can have many reservations).
What I wanted to do was to display the group name along with the first check in date and the last check out date of the reservations.
I am developing using codeigniter 3.x.
Assuming you have this structure of tables:
GROUPS
+--------+-----------+
| id | Name |
+--------+-----------+
and:
RESERVATIONS
+--------+-----------+------------------------------+------------------------------+
| id | Groupid | ReservationCheckinDate | ReservationCheckoutDate |
+--------+-----------+------------------------------+------------------------------+
the SQL to get the desired result would be:
SELECT MIN(ReservationCheckinDate), MAX(ReservationCheckoutDate), Name FROM RESERVATIONS
JOIN GROUPS ON GROUPS.id = RESERVATIONS.Groupid
GROUP BY Name
ORDER BY Name;
hope it helps
I have two tables - books and images. books has columns like id, name, releasedate, purchasecount. images has bookid (which is same as the id in books, basically one book can have multiple images. Although I haven't set any foreign key constraint), bucketid, poster (each record points to an image file in a certain bucket, for a certain bookid).
Table schema:
poster is unique in images, hence it is a primary key.
Covering index on books: (name, id, releasedate)
Covering index on images: (bookid,poster,bucketid)
My query is, given a name, find the top ten books (sorted by number of purchasecount) from the books table whose name matches that name, and for that book, return any (preferably the first) record (bucketid and poster) from the images table.
Obviously this can be solved by two queries by running the first, and using its results to query the images table, but that will be slow, so I want to use 'join' and subquery to do it in one go. However, what I am trying is not giving me correct results:
select books.id,books.name,year(releasedate),purchasecount,bucketid,poster from books
inner join (select bucketid,bookid, poster from images) t on
t.bookid = books.id where name like "%foo%" order by purchasecount desc limit 2;
Can anybody suggest an optimal query to fetch the result set as desired here (including any suggestion to change the table schema to improve search time) ?
Updated fiddle: http://sqlfiddle.com/#!9/17c5a8/1.
The example query should return two results - fooe and fool, and one (any of the multiple posters corresponding to each book) poster for each result. However I am not getting correct results. Expected:
fooe - 1973 - 459 - 11 - swt (or fooe - 1973 - 459 - 11 - pqr)
fool - 1963 - 456 - 12 - xxx (or fool - 1963 - 456 - 111 - qwe)
I agree with Strawberry about the schema. We can discuss ideas for better performance and all that. But here is my take on how to solve this after a few chats and changes to the question.
Note below the data changes to deal with various boundary conditions which include books with no images in that table, and tie-breaks. Tie-breaks meaning using the max(upvotes). The OP changed the question a few times and added a new column in the images table.
Modified quetion became return 1 row make per book. Scratch that, always 1 row per book even if there are no images. The image info to return would be the one with max upvotes.
Books table
create table books
( id int primary key,
name varchar(1000),
releasedate date,
purchasecount int
) ENGINE=InnoDB;
insert into books values(1,"fool","1963-12-18",456);
insert into books values(2,"foo","1933-12-18",11);
insert into books values(3,"fooherty","1943-12-18",77);
insert into books values(4,"eoo","1953-12-18",678);
insert into books values(5,"fooe","1973-12-18",459);
insert into books values(6,"qoo","1983-12-18",500);
Data Changes from original question.
Mainly the new upvotes column.
The below includes a tie-break row added.
create table images
( bookid int,
poster varchar(150) primary key,
bucketid int,
upvotes int -- a new column introduced by OP
) ENGINE=InnoDB;
insert into images values (1,"xxx",12,27);
insert into images values (5,"pqr",11,0);
insert into images values (5,"swt",11,100);
insert into images values (2,"yyy",77,65);
insert into images values (1,"qwe",111,69);
insert into images values (1,"blah_blah_tie_break",111,69);
insert into images values (3,"qwqqe",14,81);
insert into images values (1,"qqawe",8,45);
insert into images values (2,"z",81,79);
Visualization of a Derived Table
This is just to assist in visualizing an inner piece of the final query. It demonstrates the gotcha for tie-break situations, thus the rownum variable. That variable is reset to 1 each time the bookid changes otherwise it increments. In the end (our final query) we only want rownum=1 rows so that max 1 row is returned per book (if any).
Final Query
select b.id,b.purchasecount,xDerivedImages2.poster,xDerivedImages2.bucketid
from books b
left join
( select i.bookid,i.poster,i.bucketid,i.upvotes,
#rn := if(#lastbookid = i.bookid, #rn + 1, 1) as rownum,
#lastbookid := i.bookid as dummy
from
( select bookid,max(upvotes) as maxup
from images
group by bookid
) xDerivedImages
join images i
on i.bookid=xDerivedImages.bookid and i.upvotes=xDerivedImages.maxup
cross join (select #rn:=0,#lastbookid:=-1) params
order by i.bookid
) xDerivedImages2
on xDerivedImages2.bookid=b.id and xDerivedImages2.rownum=1
order by b.purchasecount desc
limit 10
Results
+----+---------------+---------------------+----------+
| id | purchasecount | poster | bucketid |
+----+---------------+---------------------+----------+
| 4 | 678 | NULL | NULL |
| 6 | 500 | NULL | NULL |
| 5 | 459 | swt | 11 |
| 1 | 456 | blah_blah_tie_break | 111 |
| 3 | 77 | qwqqe | 14 |
| 2 | 11 | z | 81 |
+----+---------------+---------------------+----------+
The significance of the cross join is merely to introduce and set starting values for 2 variables. That is all.
The results are the top ten books in descending order of purchasecount with the info from images if it exists (otherwise NULL) for the most upvoted image. The image selected honors tie-break rules picking the first one as mentioned above in the Visualization section with rownum.
Final Thoughts
I leave it to the OP to wedge in the appropriate where clause at the end as the sample data given had no useful book name to search on. That part is trivial. Oh, and do something about the schema for the large width of your primary keys. But that is off-topic at the moment.
I have three tables namely the class_record, class_violation, violation.
The table class_record has these columns and data:
Class Violation CR No. | Class ID
000-000 | A30-000
000-001 | A30-000
The table class_violation has these columns and data:
Class Violation CR No. | Violation ID
000-000 | 2
000-000 | 1
000-001 | 2
000-001 | 4
000-001 | 3
The table violation has these columns and data:
Violation ID | First Amount | Second Amount
1 | 1000 | 2000
2 | 200 | 400
3 | 500 | 1000
4 | 500 | 1000
The table class_record contains the information of the class record.
The table class_violation is the table that contains what are the violations that is committed. And lastly, the table violation contains the information about the violations.
If the violation is committed twice, the second amount will be triggered instead of the first amount. As you can see on table class_violation, on Violation ID column, number 2 violation id is committed twice. The second amount of the must be the charged amount instead of the first amount. So the total charged amount will be the first amount plus the second amount if committed twice. My question is that how do I get the get the second amount instead of the first amount and get its total amount of the violations committed? So far here is my SQL query but is terribly wrong:
SELECT SUM(`First Amount`)
FROM violation
WHERE `Violation ID`
IN (SELECT `Violation ID` FROM class_violation
WHERE `Class Violation No.`
IN (SELECT `Class Violation CR No.`
FROM class_record WHERE `Class ID` = 'A30-000'))
Please help me. Sorry for my english. The result of the query must be:
SUM
2600
Here is my sqlfiddle link: http://sqlfiddle.com/#!2/2712a
See this query:
SELECT class_record.ClassID,SUM(A.VioAmount),GROUP_CONCAT(A.VioAmount)
FROM class_record
INNER JOIN (SELECT class_record.ClassID, class_record.CR_No, IF(COUNT(violation.ViolationID)=1, SUM(FAmount),(FAmount+SAmount)) AS VioAmount
FROM violation
INNER JOIN class_violation ON violation.ViolationID = class_violation.ViolationID
INNER JOIN class_record ON class_violation.CR_No = class_record.CR_No
WHERE ClassID = 'A30-000'
GROUP BY violation.ViolationID, class_record.ClassID) A ON A.ClassID = class_record.ClassID
AND A.CR_No = class_record.CR_No
GROUP BY class_record.ClassID
You can check the SQLFiddle too at http://sqlfiddle.com/#!2/0d855b/35
Now I am taking the values in a separate Select and then JOINing it with another to obtain the SUM. Hope this solves your problem. All the Best.
You need return column "Second amount" if count(Violation ID)=2 and "First amount" if count(Violation ID)=1?
If I understand your question, look for "CASE" operator.
I have a fairly complicated operation that I'm trying to perform with just one SQL query but I'm not sure if this would be more or less optimal than breaking it up into n queries. Basically, I have a table called "Users" full of user ids and their associated fb_ids (id is the pk and fb_id can be null).
+-----------------+
| id | .. | fb_id |
|====|====|=======|
| 0 | .. | 12345 |
| 1 | .. | 31415 |
| .. | .. | .. |
+-----------------+
I also have another table called "Friends" that represents a friend relationship between two users. This uses their ids (not their fb_ids) and should be a two-way relationship.
+----------------+
| id | friend_id |
|====|===========|
| 0 | 1 |
| 1 | 0 |
| .. | .. |
+----------------+
// user 0 and user 1 are friends
So here's the problem:
We are given a particular user's id ("my_id") and an array of that user's Facebook friends (an array of fb_ids called fb_array). We want to update the Friends table so that it honors a Facebook friendship as a valid friendship among our users. It's important to note that not all of their Facebook friends will have an account in our database, so those friends should be ignored. This query will be called every time the user logs in so it can update our data if they've added any new friends on Facebook. Here's the query I wrote:
INSERT INTO Friends (id, friend_id)
SELECT "my_id", id FROM Users WHERE id IN
(SELECT id FROM Users WHERE fb_id IN fb_array)
AND id NOT IN
(SELECT friend_id FROM Friends WHERE id = "my_id")
The point of the first IN clause is to get the subset of all Users who are also your Facebook friends, and this is the main part I'm worried about. Because the fb_ids are given as an array, I have to parse all of the ids into one giant string separated by commas which makes up "fb_array." I'm worried about the efficiency of having such a huge string for that IN clause (a user may have hundreds or thousands of friends on Facebook). Can you think of any better way to write a query like this?
It's also worth noting that this query doesn't maintain the dual nature of a friend relationship, but that's not what I'm worried about (extending it for this would be trivial).
If I am not mistaken, your query can be simplified, if you have a UNIQUE constraint on the combination (id, friend_id), to:
INSERT IGNORE INTO Friends
(id, friend_id)
SELECT "my_id", id
FROM Users
WHERE fb_id IN fb_array ;
You should have index on User (fb_id, id) and test for efficiency. if the number of the itmes in the array is too big (more than a few thousands), you may have to split the array and run the query more than once. Profile with your data and settings.
Depends on if if the following columns are nullable (value can be NULL):
USERS.id
FRIENDS.friend_id
Nullable:
SELECT DISTINCT
"my_id", u.id
FROM Users u
WHERE u.fb_id IN fb_array
AND u.id NOT IN (SELECT f.friend_id
FROM FRIENDS f
WHERE f.id = "my_id")
Not Nullable:
SELECT "my_id", u.id
FROM Users u
LEFT JOIN FRIENDS f ON f.friend_id = u.id
AND f.id = "my_id"
WHERE u.fb_id IN fb_array
AND f.fried_id IS NULL
For more info:
http://explainextended.com/2010/05/27/left-join-is-null-vs-not-in-vs-not-exists-nullable-columns/
http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/
Speaking to the number of values in your array
The tests run in the two articles mentioned above contain 1 million rows, with 10,000 distinct values.