A little background for the issue:
In Venezuela, there is a law that defines how a special document called a Withholding Receipt will be identified and what information it must present. (It is issued when a company designated by the Tax Administration withholds taxes that are then declared by that company rather than by the client; a really confusing legal thing.) The law says the receipt has to be numbered with the following format:
YYYYMMXXXXXXXX
Where YYYY represents the year, MM the month and XXXXXXXX represents an incremental number (up to 8 digits wide) that will be refreshed (start from 0 again) if overflowed.
I could have used a plain AUTO_INCREMENT field to solve this puzzle; however, the real issue begins here.
According to the agents of the Tax Administration, the incremental numbering restarts automatically each month, meaning Receipt No. 20151200000001 and No. 20160100000001 can both exist in the database without colliding.
This makes it impossible to use an AUTO_INCREMENT field, since its value would have to be reset each month.
What options are there to solve this puzzle, using database features only?
PS: It can be in any database (including NoSQL).
PS2: Year and month can be separate fields on the table/document/entity.
Edit
I did some research on MySQL based on @Gordon Linoff's answer; here is a working example:
CREATE TABLE IF NOT EXISTS test (
id int(11) NOT NULL AUTO_INCREMENT,
invoice_no varchar(12) NOT NULL,
year int(11) NOT NULL,
month int(11) NOT NULL,
identifier int(11) DEFAULT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB;
DELIMITER //
CREATE TRIGGER ins_tr BEFORE INSERT ON test
FOR EACH ROW BEGIN
SET @maxID = (SELECT COALESCE(MAX(identifier), 0)
FROM test
WHERE CONCAT(year, lpad(month, 2, '0')) = CONCAT(NEW.year, lpad(NEW.month, 2, '0'))
);
SET NEW.identifier = @maxID + 1;
END
//
DELIMITER ;
INSERT INTO test (invoice_no, year, month) VALUES (1, 2015, 12), (2, 2015, 12), (3, 2016, 1), (4, 2016, 1);
Result:
+----+------------+------+-------+------------+
| id | invoice_no | year | month | identifier |
+----+------------+------+-------+------------+
| 1 | 1 | 2015 | 12 | 1 |
| 2 | 2 | 2015 | 12 | 2 |
| 3 | 3 | 2016 | 1 | 1 |
| 4 | 4 | 2016 | 1 | 2 |
+----+------------+------+-------+------------+
I'm still researching a way to do this in MongoDB or any other NoSQL engine.
You would implement this in a relational database using a trigger. The trigger would implement logic such as:
SET NEW.TaxReceiptNumber = (
    SELECT CONCAT(DATE_FORMAT(CURDATE(), '%Y%m'),
                  -- highest number used so far this month, plus one, zero-padded to 8 digits
                  LPAD(COALESCE(MAX(RIGHT(TaxReceiptNumber, 8) + 0), 0) + 1, 8, '0'))
    FROM t
    WHERE LEFT(TaxReceiptNumber, 6) = DATE_FORMAT(CURDATE(), '%Y%m')
);
I might be tempted to store the incremental number and date of the receipt in different columns. However, given that you have to work with tax authorities, it might be better to just have the number as a single column.
First, don't make this your primary key.
Second, store the current number somewhere. When you use it, increment it by 1.
Finally, on the first of the month or when it reaches a certain value, reset it to 1.
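The "store the current number somewhere" idea can be sketched as a small counter table keyed by year and month, so that a new month simply starts a new row instead of needing an explicit reset job. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 as a stand-in for the real database, and the `receipt_counter` table and `next_receipt_number` function are hypothetical names, not something from the question.

```python
import sqlite3


def next_receipt_number(conn, year, month):
    """Return the next YYYYMMXXXXXXXX number for the given period."""
    with conn:  # one transaction: commit on success, roll back on error
        updated = conn.execute(
            "UPDATE receipt_counter SET last_no = (last_no + 1) % 100000000 "
            "WHERE year = ? AND month = ?",
            (year, month),
        ).rowcount
        if updated == 0:
            # First receipt of this period: the sequence starts over at 1.
            conn.execute(
                "INSERT INTO receipt_counter (year, month, last_no) "
                "VALUES (?, ?, 1)",
                (year, month),
            )
        (seq,) = conn.execute(
            "SELECT last_no FROM receipt_counter WHERE year = ? AND month = ?",
            (year, month),
        ).fetchone()
    return f"{year:04d}{month:02d}{seq:08d}"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipt_counter ("
    " year INTEGER NOT NULL, month INTEGER NOT NULL, last_no INTEGER NOT NULL,"
    " PRIMARY KEY (year, month))"
)
numbers = [
    next_receipt_number(conn, 2015, 12),
    next_receipt_number(conn, 2015, 12),
    next_receipt_number(conn, 2016, 1),
]
print(numbers)  # ['20151200000001', '20151200000002', '20160100000001']
```

The modulo wraps the counter back to 0 after 99999999, matching the overflow rule in the question. On a multi-user server the read-modify-write would also have to be protected against concurrent callers (for example by running it in a transaction that locks the counter row); this single-connection sketch does not show that.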
I am running this procedure a few million times, and although each call takes only a few ms, running all of them ends up taking a couple of weeks. I was wondering if anyone could help me optimize it or improve its performance. Any improvement might save days!
CREATE PROCEDURE process_parameters(IN parameter1 VARCHAR(128), IN parameter2 VARCHAR(128), IN combination_type CHAR(1))
BEGIN
SET @parameter1_id := NULL, @parameter2_id := NULL;
SET @parameter1_hash := "", @parameter2_hash := "";
IF parameter1 IS NOT NULL THEN
SET @parameter1_hash := parameter1;
INSERT IGNORE INTO `collection1` (`parameter`) VALUES (parameter1);
SET @parameter1_id := (SELECT `id` FROM `collection1` WHERE `parameter` = parameter1);
END IF;
IF parameter2 IS NOT NULL THEN
SET @parameter2_hash := parameter2;
INSERT IGNORE INTO `collection2` (`parameter`) VALUES (parameter2);
SET @parameter2_id := (SELECT `id` FROM `collection2` WHERE `parameter` = parameter2);
END IF;
SET @hash := MD5(CONCAT(@parameter1_hash, @parameter2_hash));
INSERT IGNORE INTO `combinations` (`hash`,`type`,`parameter1`,`parameter2`) VALUES (@hash, combination_type, @parameter1_id, @parameter2_id);
END
The logic behind it is: I store unique combinations of (parameter1, parameter2) in combinations, where parameter1 or parameter2 can be NULL (but never both at the same time). I store a type in combinations so that I know later which parameter has a value. To ensure that a combination is unique I added an MD5 field (a primary key on (parameter1, parameter2) will not work, because a comparison with NULL always yields NULL). Each parameter has a separate table (collection1 and collection2 respectively) that stores its unique id. There are hundreds/thousands of unique parameter1 and parameter2 values, but their combinations are highly repeated, and the number of distinct combinations is far below the size of the full Cartesian product.
As an example, ("A", "1"), ("A", "2"), ("B", "1"), ("A", "1"), ("A", NULL), (NULL, "2") would yield:
`collection1` (`id`, `parameter`)
1, "A"
2, "B"
`collection2` (`id`, `parameter`)
1, "1"
2, "2"
`combinations` (`type`, `parameter1`, `parameter2`)
"P1andP2", 1, 1,
"P1andP2", 1, 2,
"P1andP2", 2, 1,
"P1Only", 1, NULL
"P2Only", NULL, 2
These are the definitions of the tables:
DESCRIBE `combinations`;
+-------------+-----------------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-----------------------------------+------+-----+---------+----------------+
| combination | int(11) | NO | PRI | NULL | auto_increment |
| hash | char(32) | NO | UNI | NULL | |
| type | enum('P1andP2','P1Only','P2Only') | NO | | NULL | |
| parameter1 | int(11) | YES | | NULL | |
| parameter2 | int(11) | YES | | NULL | |
+-------------+-----------------------------------+------+-----+---------+----------------+
DESCRIBE `collection1`; (`collection2` is identical)
+-----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| parameter | varchar(255) | NO | UNI | NULL | |
+-----------+--------------+------+-----+---------+----------------+
Any help will be appreciated!
Please use SHOW CREATE TABLE; it is more descriptive than DESCRIBE.
Use LAST_INSERT_ID()
SET @parameter1_id := (SELECT `id` FROM `collection1`
WHERE `parameter` = parameter1);
can be replaced by
SELECT @parameter1_id := LAST_INSERT_ID();
It will avoid a round trip to the server.
Oops... The OP points out that the id won't be returned if the row is a dup. This is a workaround that might run faster:
INSERT INTO `collection1` (`parameter`)
VALUES (parameter1)
ON DUPLICATE KEY UPDATE
id = LAST_INSERT_ID(id);
SET @parameter1_id := LAST_INSERT_ID();
It's a kludgy trick, but it is described in the MySQL documentation. More below...
Shrink table
Do you really need combination? You have another UNIQUE key that could be used as the PRIMARY KEY. This might cut in half the time taken for the final INSERT.
This may (or may not) speed things up, but only because the row size shrinks: Instead of storing the md5 into CHAR(32), store UNHEX(md5) into BINARY(16).
Batch INSERT
Can you gather a bunch of these to INSERT at once? If you gather 1000 rows and string them into a single INSERT (actually 3 INSERTs, since 3 tables are involved), it will run literally 10 times as fast.
Because of needing the ids, it gets more complicated. You would need to batch things into collection1 and collection2; then work on combinations.
Since the "combination*" tables are essentially "normalization", see my discussion of how to batch them very efficiently: http://mysql.rjweb.org/doc.php/staging_table#normalization It involves 2 statements, one to insert new rows, the other to grab all the ids for the batch.
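The two-step batching idea (insert whatever is new, then fetch the ids for the whole batch) can be sketched as follows. This is a rough illustration, not the code from the linked article: it uses Python's sqlite3 as a stand-in for MySQL, so `INSERT OR IGNORE` plays the role of `INSERT IGNORE`, and `ids_for_batch` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collection1 ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " parameter TEXT NOT NULL UNIQUE)"
)


def ids_for_batch(conn, values):
    """Insert unseen values, then map every value in the batch to its id."""
    distinct = sorted(set(values))
    with conn:
        # Step 1: add only the values that are not stored yet.
        conn.executemany(
            "INSERT OR IGNORE INTO collection1 (parameter) VALUES (?)",
            [(v,) for v in distinct],
        )
        # Step 2: fetch the ids for the whole batch in one query.
        placeholders = ",".join("?" * len(distinct))
        rows = conn.execute(
            "SELECT parameter, id FROM collection1 "
            f"WHERE parameter IN ({placeholders})",
            distinct,
        ).fetchall()
    return dict(rows)


first = ids_for_batch(conn, ["A", "B", "A"])
second = ids_for_batch(conn, ["B", "C"])
print(first)   # {'A': 1, 'B': 2}
print(second)
```

With the value-to-id map in hand, the rows for `combinations` can be built in memory and inserted as one batch as well, instead of one procedure call per row.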
COALESCE
Get rid of @parameter*_hash and @hash completely. Change the final INSERT so that it computes the hash inline:
INSERT IGNORE INTO combinations (...) VALUES
( MD5(CONCAT(COALESCE(parameter1,''), COALESCE(parameter2, ''))),
...)
Think of it this way... Each statement takes a non-trivial amount of time. (This shows up significantly in batching of inserts.) I'm getting rid of 4 statements at some expense due to adding complexity to one statement.
Settings
The most important might be innodb_flush_log_at_trx_commit = 2.
3 Streams
Write 3 procedures, each one with the code simplified to the particular type. Combining this with batching should further speed things up.
Potential issues
I think these two will get the same hash. Hence, only one row for these two:
("xyz", NULL)
(NULL, "xyz")
Be aware that INSERT IGNORE will burn ids if there is already a row with the given unique key. Because of this, keep an eye on running out of values with INT (only 2 billion). Changing to INT UNSIGNED would up it to 4B, still in 4 bytes.
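The hash collision mentioned under "Potential issues" is easy to reproduce outside the database. The sketch below mimics the concatenate-then-MD5 scheme in Python and contrasts it with one possible fix, which marks NULLs and length-prefixes each value before hashing. The `tagged_hash` encoding is a hypothetical suggestion, not something from the question or the answer.

```python
import hashlib


def naive_hash(p1, p2):
    """Mimics MD5(CONCAT(COALESCE(p1, ''), COALESCE(p2, '')))."""
    joined = (p1 or "") + (p2 or "")
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def tagged_hash(p1, p2):
    """Hypothetical fix: mark NULLs and length-prefix each value."""
    def part(value):
        return "N" if value is None else f"S{len(value)}:{value}"
    joined = part(p1) + "|" + part(p2)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


print(naive_hash("xyz", None) == naive_hash(None, "xyz"))    # True: collision
print(naive_hash("ab", "c") == naive_hash("a", "bc"))        # True: collision
print(tagged_hash("xyz", None) == tagged_hash(None, "xyz"))  # False
print(len(hashlib.md5(b"xyz").digest()))                     # 16
```

The last line shows why BINARY(16) is enough for the raw digest: the 32-character form is just its hexadecimal rendering.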
I want to get the greatest (or lowest) value in a field for a specific value of a different field but I am a bit lost. I am already aware of answered questions on the topic, but I already have a join in my query and I can't apply the terrific answers I found on my specific problem.
I have two tables, namely register and records. Records has all (weather) stations listed once for each month (each stationid represented 12 times, if complete data exists, a stationid can thus not be presented more than 12 times), and register has all stations listed with some of their characteristics. For the sake of the example, the two tables look pretty much like this:
CREATE TABLE IF NOT EXISTS `records` (
`stationid` varchar(30),
`month` int(11),
`tmin` decimal(3,1),
`tmax` decimal(3,1),
`commentsmax` text,
`commentsmin` text,
UNIQUE KEY `webcode` (`stationid`,`month`)
);
INSERT INTO `records` (`stationid`, `month`, `tmin`, `tmax`, `commentsmin`, `commentsmax`) VALUES
('station1', 7, '10.0', '46.0', 'Extremely low temperature.', 'Very high temperature.'),
('station2', 7, '15.0', '48.0', 'Very low temperature.', 'Extremely low temperature.'),
('station1', 1, '-10', '15', 'Extremely low temperature.', 'Somewhat high temperature.');
CREATE TABLE IF NOT EXISTS `register` (
`stationid` varchar(30),
`stationname` varchar(40),
`stationowner` varchar(10),
`georegion` varchar(40),
`altitude` int(4),
KEY `stationid` (`stationid`)
);
INSERT INTO `register` (`stationid`, `stationname`, `stationowner`, `georegion`, `altitude`) VALUES
('station1', 'Halifax', 'Maria', 'the North', 16),
('station2', 'Leeds', 'Peter', 'the South', 240);
The desired output is:
+-------------+-------+-------+---------------+-----------+----------+-----------------------------+
| stationname | month | tmin | stationowner | georegion | altitude | commentsmin |
+-------------+-------+-------+---------------+-----------+----------+-----------------------------+
| Leeds | 7 | 15.0 | Peter | the South | 240 | Very low temperature |
| Halifax | 1 | -10.0 | Maria | the North | 16 | Extremely low temperature |
+-------------+-------+-------+---------------+-----------+----------+-----------------------------+
where each station appears only once, with its lowest temperature from table 'records' and some station properties from table 'register'. I am using the following code:
SELECT register.stationname, records.month, min(records.tmin), register.stationowner, register.georegion, register.altitude, records.commentsmin FROM records INNER JOIN register ON records.stationid=register.stationid GROUP BY records.stationid ORDER BY min(tmin) ASC
but it doesn't give the correct bits of the records table corresponding to the lowest tmin values BY stationid when there are many records in the tables.
I have seen solutions like this one here: MySQL Greatest N Results with Join Tables, but I just can't get my head around applying it on my two tables. I would be grateful for any ideas!
SELECT stuff
FROM some_table x
JOIN some_other_table y
ON y.something = x.something
JOIN
( SELECT something
, MIN(something_other_thing) min_a
FROM that_other_table
GROUP
BY something
) z
ON z.something = y.something
AND z.min_a = y.another_thing;
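Applied to the two tables from the question, the pattern above might look like the sketch below. It runs against Python's sqlite3 rather than MySQL purely so that it is self-contained, and it keeps only the columns needed to show the idea; the derived table `z` finds the minimum per station, and joining it back on both the station and the value retrieves the rest of that row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (stationid TEXT, month INTEGER, tmin REAL, commentsmin TEXT);
CREATE TABLE register (stationid TEXT, stationname TEXT);
INSERT INTO records VALUES
  ('station1', 7, 10.0, 'Extremely low temperature.'),
  ('station2', 7, 15.0, 'Very low temperature.'),
  ('station1', 1, -10.0, 'Extremely low temperature.');
INSERT INTO register VALUES ('station1', 'Halifax'), ('station2', 'Leeds');
""")

rows = conn.execute("""
SELECT g.stationname, r.month, r.tmin, r.commentsmin
FROM records r
JOIN register g ON g.stationid = r.stationid
JOIN (SELECT stationid, MIN(tmin) AS min_tmin
      FROM records
      GROUP BY stationid) z
  ON z.stationid = r.stationid AND z.min_tmin = r.tmin
ORDER BY r.tmin
""").fetchall()

for row in rows:
    print(row)
# ('Halifax', 1, -10.0, 'Extremely low temperature.')
# ('Leeds', 7, 15.0, 'Very low temperature.')
```

If a station reaches its minimum in more than one month, this approach returns one row per tied month.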
Consider two tables like this:
TABLE: current
-------------------
| id | dept | value |
|----|------|-------|
| 4| A | 20 |
| 5| B | 15 |
| 6| A | 25 |
-------------------
TABLE: history
-------------------
| id | dept | value |
|----|------|-------|
| 1| A | 10 |
| 2| C | 10 |
| 3| B | 20 |
-------------------
These are just simple examples... in the actual system both tables have considerably more columns and considerably more rows (10k+ rows in current and 1M+ rows in history).
A client application is continuously (several times a second) inserting new rows into the current table, and 'moving' older existing rows from current to history (delete/insert within a single transaction).
Without blocking the client in this activity we need to take a consistent sum of values per dept across the two tables.
With transaction isolation level set to REPEATABLE READ we could just do:
SELECT dept, sum(value) FROM current GROUP BY dept;
followed by
SELECT dept, sum(value) FROM history GROUP BY dept;
and add the two sets of results together. BUT each query would block inserts on its respective table.
Changing the isolation level to READ COMMITTED and doing the same two SQLs would avoid blocking inserts, but now there is a risk of entries being double counted if moved from current to history while we are querying (since each SELECT creates its own snapshot).
Here's the question then.... what happens with isolation level READ COMMITTED if I do a UNION:
SELECT dept, sum(value) FROM current GROUP BY dept
UNION ALL
SELECT dept, sum(value) FROM history GROUP BY dept;
Will MySQL generate a consistent snapshot of both tables at the same time (thereby removing the risk of double counting), or will it still take a snapshot of one table first and then, some time later, a snapshot of the second?
I have not yet found any conclusive documentation to answer my question, so I went about trying to prove it instead. Although not proof in the scientific sense, my findings suggest a consistent snapshot is created for all tables in a UNION query.
Here's what I did.
Create the tables
DROP TABLE IF EXISTS `current`;
CREATE TABLE IF NOT EXISTS `current` (
`id` BIGINT NOT NULL COMMENT 'Unique numerical ID.',
`dept` BIGINT NOT NULL COMMENT 'Department',
`value` BIGINT NOT NULL COMMENT 'Value',
PRIMARY KEY (`id`));
DROP TABLE IF EXISTS `history`;
CREATE TABLE IF NOT EXISTS `history` (
`id` BIGINT NOT NULL COMMENT 'Unique numerical ID.',
`dept` BIGINT NOT NULL COMMENT 'Department',
`value` BIGINT NOT NULL COMMENT 'Value',
PRIMARY KEY (`id`));
Create a procedure that sets up 10 entries in the current table (id = 0, .. 9), then sits in a tight loop inserting 1 new row into current and 'moving' the oldest row from current to history. Each iteration is performed in a transaction, as a result the current table remains at a steady 10 rows, while the history table grows quickly. At any point in time min(current.id) = max(history.id) + 1
DROP PROCEDURE IF EXISTS `idLoop`;
DELIMITER $$
CREATE PROCEDURE `idLoop`()
BEGIN
DECLARE n bigint;
-- Populate initial 10 rows in current table if not already there
SELECT IFNULL(MAX(id), -1) + 1 INTO n from current;
START TRANSACTION;
WHILE n < 10 DO
INSERT INTO current VALUES (n, n % 10, n % 1000);
SET n = n + 1;
END WHILE;
COMMIT;
-- In tight loop, insert new row and 'move' oldest current row to history
WHILE n < 10000000 DO
START TRANSACTION;
-- Insert new row to current
INSERT INTO current values(n, n % 10, n % 1000);
-- Move oldest row from current to history
INSERT INTO history SELECT * FROM current WHERE id = (n - 10);
DELETE FROM current where id = (n - 10);
COMMIT;
SET n = n + 1;
END WHILE;
END$$
DELIMITER ;
Start this procedure running (this call won't return for some time - which is intentional)
call idLoop();
In another session on the same database we can now try out a variation on the UNION ALL query in my original posting.
I have modified it to (a) slow down execution, and (b) return a simple result set (two rows) that indicates whether any entries 'moved' while the query was running have been missed or double counted.
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT 'HST' AS src, MAX(id) AS idx, COUNT(*) AS cnt, SUM(value) FROM history WHERE dept IN (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
UNION ALL
SELECT 'CRT' AS src, MIN(id) AS idx, COUNT(*) AS cnt, SUM(value) FROM current WHERE dept IN (0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
The sum(value) and where dept in (...) are just there to add work to the query and slow it down.
The indication of a positive outcome is if the two idx values are adjacent, like this:
+-----+--------+--------+------------+
| src | idx | cnt | SUM(value) |
+-----+--------+--------+------------+
| HST | 625874 | 625875 | 312569875 |
| CRT | 625875 | 10 | 8795 |
+-----+--------+--------+------------+
2 rows in set (1.43 sec)
I'd still be happy to hear any authoritative information on this.
I asked this question previously, then someone suggested that it was a duplicate of another previously answered question. However, I could not adapt that solution to what I need despite 3 hours of trying.
So, my new question is how to adapt that solution to my own needs.
A simplified version of my category/subcategory database schema looks like this:
tblAllCategories
record_id title level parent_cat_id parent_id keywords
-------------------------------------------------------------------------------------------
1 Antiques & Collectables 0 NULL NULL junk
2 Art 0 NULL NULL
25 Furniture 1 1 1
59 Office Furniture 2 1 25 retro,shabby chic
101 Chairs 3 1 59
Notes:
Level 0 = top-level category, level 1 = second level, etc
parent_cat_id is the top-level category (i.e. having level 0)
parent_id refers to the level immediately above the relevant level
I added the keyword column to assist keyword searches so that items in certain relevant categories would be returned if the user entered a keyword but did not select a category to drill down into.
So, at the front end, after the user enters keyword, e.g., "Retro", I need to return not only the category that has the term "retro" in its keyword column, but also all higher level categories. So, according to the schema above, a search on "retro" would return category 59 along with its super-categories - 25 and 1.
The query should be sorted by level, such that the front end search results would look something like this (after necessary coding):
The solution offered is from this question
And the query is as follows:
SELECT T2.id, T2.title,T2.controller,T2.method,T2.url
FROM (
SELECT
@r AS _id,
(SELECT @r := parent_id FROM menu WHERE id = _id) AS parent_id,
@l := @l + 1 AS lvl
FROM
(SELECT @r := 31, @l := 0) vars,
menu m
WHERE @r <> 0) T1
JOIN menu T2
ON T1._id = T2.id
ORDER BY T1.lvl DESC;
I need to adapt this query to work off a passed keyword, not an ID.
Edit the vars subquery so that @r is set to the record_id of the row with the keyword, something like:
SELECT T2.record_id, T2.title,T2.level,T2.keywords
FROM (SELECT @r AS _id
, (SELECT @r := parent_id
FROM tblAllCategories
WHERE record_id = _id) AS parent_id
, @l := @l + 1 AS lvl
FROM (SELECT @r := record_id, @l := 0
FROM tblAllCategories
WHERE keywords like '%retro%') vars
, tblAllCategories m
WHERE @r <> 0) T1
JOIN tblAllCategories T2 ON T1._id = T2.record_id
ORDER BY T1.lvl DESC;
SQLFiddle demo
Storing the keywords as comma-separated values is not ideal; a many-to-many relationship between this table and a keyword table (with the compulsory junction table) would be better, as it avoids the use of LIKE. In this example, if there were another category with the keyword 'retrobike', that category and its whole hierarchy would also be in the result.
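The 'retrobike' problem comes from substring matching. The small Python sketch below contrasts what `LIKE '%retro%'` effectively does with matching whole keywords only, which is what a separate keyword table would give you. Both function names and the second sample string are made up for illustration.

```python
def like_match(keywords, term):
    """Mimics keywords LIKE '%term%': a plain substring search."""
    return term.lower() in (keywords or "").lower()


def token_match(keywords, term):
    """Matches only whole comma-separated keywords."""
    tokens = [k.strip().lower() for k in (keywords or "").split(",")]
    return term.lower() in tokens


print(like_match("retro,shabby chic", "Retro"))   # True
print(like_match("retrobike,bmx", "Retro"))       # True: false positive
print(token_match("retro,shabby chic", "Retro"))  # True
print(token_match("retrobike,bmx", "Retro"))      # False
```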
This is going to take a while so get some coffee.
There are a lot of good resources available for hierarchical development. Most of what you will see below comes from sites like this, and it refers you to Celko, whom I heartily recommend.
The first thing you'll have to do is remove the keywords field. The extra effort in development, use and maintenance is nowhere near the benefit received. I'll show you how to implement it later.
In this design, think of a row as a node. Each node has two values, the left boundary and the right boundary. These form a range or span of influence. If a node has boundaries of 1:4 and another node has 2:3, the second node is a subnode of the first as its span is contained in the span of the first. Also, as the boundaries of the second node are consecutive, there can be no node below it, so it must be a leaf node. This may sound complicated at first, especially when considering many levels of nodes, but you will see how the SQL is relatively easy to write and the maintenance effort for the table is minimal.
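The containment rule can be written down as two tiny predicates. This is a sketch of the idea only, using plain `(lBound, rBound)` tuples in Python; the function names are invented for illustration.

```python
def is_descendant(node, ancestor):
    """True when node's (lBound, rBound) span lies strictly inside ancestor's."""
    return ancestor[0] < node[0] and node[1] < ancestor[1]


def is_leaf(node):
    """A node with consecutive bounds has no room for children."""
    return node[1] == node[0] + 1


outer, inner = (1, 4), (2, 3)
print(is_descendant(inner, outer))  # True
print(is_descendant(outer, inner))  # False
print(is_leaf(inner))               # True
print(is_leaf(outer))               # False
```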
The complete script is here.
CREATE TABLE categories (
id INT not null auto_increment PRIMARY KEY,
name VARCHAR( 50 ) NOT NULL,
lBound INT NOT NULL,
rBound INT NOT NULL,
-- MySQL versions before 8.0.16 parse but do not enforce CHECK constraints. These are here for illustration.
-- The functionality will be implemented via trigger.
CONSTRAINT cat_ptr_incr_chk CHECK ( lBound < rBound ), -- basic integrity check
CONSTRAINT cat_ptr_root_chk CHECK ( lBound >= 0 ) -- eliminates negative values
);
create unique index ndx_cat_lBound on categories( lBound );
create unique index ndx_cat_rBound on categories( rBound );
Notice there is nothing here that says "I'm a leaf node", "I'm a root" or "My root node is such-and-such." This information is all encompassed by the lBound and rBound (left boundary, right boundary) values. Let's build a few nodes so we can see what this looks like.
INSERT INTO categories( name, lBound, rBound )
values( 'Categories', 0, 1 );
ID name lBound rBound
== ========== ====== ======
1 Categories 0 1
We do this before creating the triggers on the table, so that the insert trigger doesn't need special code to recognize when the first row (the root node of the entire structure) is being inserted. That code would only be executed when the first row is inserted and never again. Now we don't have to worry about it.
So now we have the root of the structure. Notice that its bounds are 0 and 1. Nothing can fit between 0 and 1, so this is a leaf node. The tree root is also a leaf. That means the tree is empty.
So now we write the triggers and dml procedures. The code is in the script so I won't duplicate it here, just say that the insert and delete triggers will not allow just anyone to issue an Insert or Delete statement. Anyone may issue an Update, but only the name is allowed to be changed. The only way Inserts, Deletes and complete Updates may be performed is through the procedures. With that in mind, let's create the first node under the root.
call ins_category( 'Electronics', 1 );
This creates a node with the name 'Electronics' as a subnode of the node with ID=1 (the root).
ID name lBound rBound
== ========== ====== ======
1 Categories 0 3
2 Electronics 1 2
Notice how the trigger has expanded the right boundary of the root to allow for the new node. The next node will be yet another level.
call ins_category( 'Televisions', 2 );
Node 2 is Electronics so the new node will be its subnode.
ID name lBound rBound
== ========== ====== ======
1 Categories 0 5
2 Electronics 1 4
3 Televisions 2 3
Let's create a new upper level node -- still it must be under the root, but will be the start of a subtree beside Electronics.
call ins_category( 'Antiques & Collectibles', 1 );
ID name lBound rBound
== ========== ====== ======
1 Categories 0 7
2 Electronics 1 4
3 Televisions 2 3
4 Antiques & Collectibles 5 6
Notice the 5-6 does not fit between any boundary range except for the root. So it is a subnode directly under the root, just like Electronics, but is independent of the other subnodes.
The SQL to give a clearer picture of the structure is not complicated. After completing the tree with a lot more nodes, let's see what it looks like:
-- Examine the tree or subtree using pre-order traversal. We start at the node
-- specified in the where clause. The root of the entire tree has lBound = 0.
-- Any other ID will show just the subtree starting at that node.
SELECT n.ID, n.NAME, n.lBound, n.rBound
FROM categories p
join categories n
on n.lBound BETWEEN p.lBound AND p.rBound
where p.lBound = 0
ORDER BY n.lBound;
+----+----------------------------+--------+--------+
| id | name | lBound | rBound |
+----+----------------------------+--------+--------+
| 1 | >Categories | 0 | 31 |
| 2 | -->Electronics | 1 | 20 |
| 3 | ---->Televisions | 2 | 9 |
| 4 | ------>Tube | 3 | 4 |
| 5 | ------>LCD | 5 | 6 |
| 6 | ------>Plasma | 7 | 8 |
| 7 | ---->Portable Electronics | 10 | 19 |
| 8 | ------>MP3 Players | 11 | 14 |
| 9 | -------->Flash | 12 | 13 |
| 10 | ------>CD Players | 15 | 16 |
| 11 | ------>2-Way Radios | 17 | 18 |
| 12 | -->Antiques & Collectibles | 21 | 28 |
| 14 | ---->Furniture | 22 | 27 |
| 15 | ------>Office Furniture | 23 | 26 |
| 16 | -------->Chairs | 24 | 25 |
| 13 | -->Art | 29 | 30 |
+----+----------------------------+--------+--------+
The output above is actually from a view defined in the script, but it shows clearly the hierarchical structure. This may easily be converted to a set of nested menus or navigational nodes.
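Going the other way, from a node up to the root, is what the original question needs (a matched category plus all of its super-categories). A sketch of that lookup is below, assuming the same `categories` columns and a subset of the rows listed above; it runs on Python's sqlite3 only to keep the example self-contained, and it is not taken from the script.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE categories ("
    " id INTEGER PRIMARY KEY, name TEXT, lBound INTEGER, rBound INTEGER)"
)
conn.executemany(
    "INSERT INTO categories VALUES (?, ?, ?, ?)",
    [
        (1, "Categories", 0, 31),
        (2, "Electronics", 1, 20),
        (12, "Antiques & Collectibles", 21, 28),
        (14, "Furniture", 22, 27),
        (15, "Office Furniture", 23, 26),
        (16, "Chairs", 24, 25),
        (13, "Art", 29, 30),
    ],
)

# Every row whose span encloses the target's span is the target or an ancestor.
path = [
    name
    for (name,) in conn.execute(
        """
        SELECT p.name
        FROM categories n
        JOIN categories p
          ON p.lBound <= n.lBound AND p.rBound >= n.rBound
        WHERE n.name = ?
        ORDER BY p.lBound
        """,
        ("Office Furniture",),
    )
]
print(path)
# ['Categories', 'Antiques & Collectibles', 'Furniture', 'Office Furniture']
```

Ordering by lBound returns the path from the root down to the matched node, which is the level order the question asks for.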
There are enhancements that may be made, but they needn't change this basic structure. You'll find it reasonably easy to maintain. I had started out thinking this would be a whole lot easier in a DBMS such as Oracle, SQL Server or PostgreSQL, which allow triggers on views. Then access could be limited to only the views, so triggers would take care of everything. That would eliminate the need for separate stored procedures. But this way isn't half bad. I could happily live with it. In fact, there is a simplicity and flexibility to using the stored procedures that wouldn't be available through views alone (you can't pass parameters to views).
The keyword feature is also defined but I won't show that here. Look at the script. Execute it a little at a time to get a clear picture of what is taking place. If you have any questions, you know where to find me.
[Edit] Added a few enhancements, including working with the keywords.
A simple option in SQL Server is to add a new column of type hierarchyid to manage the tree.
This follows Microsoft's example.
This is a sample table:
create table [EmployeeTB]
(
employee int identity primary key,
name nvarchar(50),
hourlyrate money,
managerid int -- parent in personnel tree
);
set identity_insert dbo.[EmployeeTB] on;
insert into [EmployeeTB] (employee, name, hourlyrate, managerid)
values
(1, 'Big Boss', 1000.00, 1),
(2, 'Joe', 10.00, 1),
(8, 'Mary', 20.00, 1),
(14, 'Jack', 15.00, 1),
(3, 'Jane', 10.00, 2),
(5, 'Max', 35.00, 2),
(9, 'Lynn', 15.00, 8),
(10, 'Miles', 60.00, 8),
(12, 'Sue', 15.00, 8),
(15, 'June', 50.00, 14),
(18, 'Jim', 55.00, 14),
(19, 'Bob', 40.00, 14),
(4, 'Jayne', 35.00, 3),
(6, 'Ann', 45.00, 5),
(7, 'Art', 10.00, 5),
(11, 'Al', 70.00, 10),
(13, 'Mike', 50.00, 12),
(16, 'Marty', 55.00, 15),
(17, 'Barb', 60.00, 15),
(20, 'Bart', 1000.00, 19);
set identity_insert dbo.[EmployeeTB] off;
select * from [EmployeeTB]
order by managerid
--Big Boss /
--Joe /1/
--Jane /1/1/
--Max /1/2/
--Ann /1/2/1/
--Art /1/2/2/
Now add the new hierarchyid column and fill it:
alter table [EmployeeTB]
add [Chain] hierarchyid;
-- fills all Chains
with sibs
as
(
select managerid,
employee,
cast(row_number() over (partition by managerid order by employee) as varchar) + '/' as sib
from [EmployeeTB]
where employee != managerid
)
--select * from sibs
,[noChain]
as
(
select managerid, employee, hierarchyid::GetRoot() as Chain from [EmployeeTB]
where employee = managerid
UNION ALL
select P.managerid, P.employee, cast([noChain].Chain.ToString() + sibs.sib as hierarchyid) as Chain
from [EmployeeTB] as P
join [noChain] on P.managerid = [noChain].employee
join sibs on
P.employee = sibs.employee
)
--select Chain.ToString(), * from [noChain]
update [EmployeeTB]
set Chain = [noChain].Chain
from [EmployeeTB] as P join [noChain]
on P.employee = [noChain].employee
select Chain.ToString(), * from [EmployeeTB]
order by managerid
With the Chain column filled in, any view of the hierarchy can be queried from this example.
I have a script which uploads a file and stores the details of the file name in the database. When a document gets uploaded and its DOCUMENT_ID already exists, I want to update the name of the file in the database so that it is suffixed with an incremental number such as _1, _2, _3 (before the file extension). The table structure looks like this:
ID | DOCUMENT_ID | NAME | MODIFIED | USER_ID
33 | 81 | document.docx | 2014-03-21 | 1
34 | 82 | doc.docx | 2014-03-21 | 1
35 | 82 | doc.docx | 2014-03-21 | 1
36 | 82 | doc.docx | 2014-03-21 | 1
So in the case above I would want ID 35 NAME to be doc_1.docx and ID 36 NAME to be doc_2.docx.
This is where I have got to so far. I have retrieved the last file details that have been uploaded:
$result1 = mysqli_query($con,"SELECT ID, DOCUMENT_ID, NAME, MODIFIED
FROM b_bp_history ORDER BY ID DESC LIMIT 1");
while($row = mysqli_fetch_array($result1))
{
$ID = $row['ID'];
$documentID = $row['DOCUMENT_ID'];
$documentName = $row['NAME'];
$documentModified = $row['MODIFIED'];
}
So this will give me the details I need to see whether the DOCUMENT_ID exists already. Now I thought it would be best to see if it does exist then by carrying out the following:
$sql = "SELECT ID, DOCUMENT_ID
FROM b_bp_history WHERE DOCUMENT_ID = $documentID";
$result2 = mysqli_query($con, $sql);
if(mysqli_num_rows($result2) >0){
/* This is where I need my update */
} else {
/* I don't need an update in here as it will automatically add to the database
table with no number after it. Not sure if I should always add the first one
with a _1 after it so the increment is easy? */
}
As you can see from the above, I need an update in there that checks whether a number already exists after the name and, if it does, increments it by one. In the else branch, i.e. if the DOCUMENT_ID doesn't already exist, I could add the first one as _1.docx so that the increment is easier?
If the DOCUMENT_ID does already exist, the update in the first half will need to check the last number before the extension and increment it by 1, so if it's _1 then the next will be _2. I'm not sure how to do this either. The end result I want is:
ID | DOCUMENT_ID | NAME | MODIFIED | USER_ID
33 | 81 | document.docx | 2014-03-21 | 1
34 | 82 | doc.docx | 2014-03-21 | 1
35 | 82 | doc_1.docx | 2014-03-21 | 1
36 | 82 | doc_2.docx | 2014-03-21 | 1
Generating a Sequence ID Value in MySQL to Represent a Revision ID Based Naming Convention
I used MySQL 5.5.32 to develop and test this solution. Be sure to review the bottom section of my solution for a few homework assignments for future consideration in your overall design approach.
Summary of Requirements and Initial Comments
An external script writes to a document history table. Meta information about a user-submitted file is kept in this table, including its user-assigned name. The OP requests a SQL update statement or a procedural block of DML operations that will change the original document name into one that represents the concept of a discrete REVISION ID.
The original table design contains an independent primary key: ID
An implied business key also exists in the relationship between DOCUMENT_ID (a numerical id possibly assigned externally by the script itself) and MODIFIED (a DATE typed value representing when the latest revision of a document was submitted/recorded).
Although other RDBMS systems have useful objects and built-in features, such as Oracle's SEQUENCE object and ANALYTICAL FUNCTIONS, there are options available with MySQL's SQL based capabilities.
Setting up a Working Schema
Below is the DDL script used to build the environment discussed in this solution. It should match the OP description with an exception (discussed below):
CREATE TABLE document_history
(
id int auto_increment primary key,
document_id int,
name varchar(100),
modified datetime,
user_id int
);
INSERT INTO document_history (document_id, name, modified,
user_id)
VALUES
(81, 'document.docx', convert('2014-03-21 05:00:00',datetime),1),
(82, 'doc.docx', convert('2014-03-21 05:30:00',datetime),1),
(82, 'doc.docx', convert('2014-03-21 05:35:00',datetime),1),
(82, 'doc.docx', convert('2014-03-21 05:50:00',datetime),1);
COMMIT;
The MODIFIED column of DOCUMENT_HISTORY is typed DATETIME rather than DATE; this is the exception mentioned above. With a date-only value, several revisions checked in on the same day would be indistinguishable, so queries organized around the composite business key of DOCUMENT_ID and MODIFIED would be likely to return multiple records.
How to Provide a Sequenced Revision ID Assignment
A creative solution to SQL-based, partitioned row counts is in an older post: ROW_NUMBER() in MySQL by @bobince.
A SQL query adapted for this task:
select t0.document_id, t0.modified, count(*) as revision_id
from document_history as t0
join document_history as t1
on t0.document_id = t1.document_id
and t0.modified >= t1.modified
group by t0.document_id, t0.modified
order by t0.document_id asc, t0.modified asc;
The resulting output of this query using the supplied test data:
| DOCUMENT_ID | MODIFIED | REVISION_ID |
|-------------|------------------------------|-------------|
| 81 | March, 21 2014 05:00:00+0000 | 1 |
| 82 | March, 21 2014 05:30:00+0000 | 1 |
| 82 | March, 21 2014 05:35:00+0000 | 2 |
| 82 | March, 21 2014 05:50:00+0000 | 3 |
Note that the revision id sequence follows the correct order that each version was checked in and the revision sequence properly resets when it is counting a new series of revisions related to a different document id.
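For anyone who wants to try the self-join without a MySQL server at hand, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. SQLite is only a stand-in here (my assumption, not part of the original setup), and the datetimes are stored as text, which still compares correctly in this format.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document_history ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " document_id INTEGER, name TEXT, modified TEXT, user_id INTEGER)"
)
conn.executemany(
    "INSERT INTO document_history (document_id, name, modified, user_id)"
    " VALUES (?, ?, ?, ?)",
    [
        (81, "document.docx", "2014-03-21 05:00:00", 1),
        (82, "doc.docx", "2014-03-21 05:30:00", 1),
        (82, "doc.docx", "2014-03-21 05:35:00", 1),
        (82, "doc.docx", "2014-03-21 05:50:00", 1),
    ],
)

# Each row is joined to every row of the same document that is not newer,
# so COUNT(*) is the row's position in that document's history.
REVISION_SQL = """
    SELECT t0.document_id, t0.modified, COUNT(*) AS revision_id
    FROM document_history AS t0
    JOIN document_history AS t1
      ON t0.document_id = t1.document_id
     AND t0.modified >= t1.modified
    GROUP BY t0.document_id, t0.modified
    ORDER BY t0.document_id, t0.modified
"""
revisions = conn.execute(REVISION_SQL).fetchall()
for row in revisions:
    print(row)
```

Because the count depends on MODIFIED being distinct within a document, two rows with the same DOCUMENT_ID and the same timestamp would receive the same revision id.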
EDIT: A good comment from @ThomasKöhne is to consider keeping this REVISION_ID as a persistent attribute of your version tracking table. It could be derived from the assigned file name, but a dedicated column may be preferable because index optimization on a single-value column is more likely to work. The Revision ID alone may also be useful for other purposes, such as creating an accurate SORT column for querying a document's history.
Using MySQL String Manipulation Functions
Revision identification can also benefit from an additional convention: the NAME column should be wide enough to accommodate the appended revision id suffix as well. Some MySQL string operations that will help:
-- Resizing String Values:
SELECT SUBSTR('EXTRALONGFILENAMEXXX',1,17) FROM DUAL
| SUBSTR('EXTRALONGFILENAMEXXX',1,17) |
|-------------------------------------|
| EXTRALONGFILENAME |
-- Substituting and Inserting Text Within Existing String Values:
SELECT REPLACE('THE QUICK <LEAN> FOX','<LEAN>','BROWN') FROM DUAL
| REPLACE('THE QUICK <LEAN> FOX','<LEAN>','BROWN') |
|--------------------------------------------------|
| THE QUICK BROWN FOX |
-- Combining Strings Using Concatenation
SELECT CONCAT(id, '-', document_id, '-', name)
FROM document_history
| CONCAT(ID, '-', DOCUMENT_ID, '-', NAME) |
|-----------------------------------------|
| 1-81-document.docx |
| 2-82-doc.docx |
| 3-82-doc.docx |
| 4-82-doc.docx |
Pulling it All Together: Constructing a New File Name Using Revision Notation
Using the previous query as an inline view (or subquery), the next step is to generate the new file name for a given revision log record:
SQL Query With Revised File Name
select replace(docrec.name, '.', CONCAT('_', rev.revision_id, '.')) as new_name,
rev.document_id, rev.modified
from (
select t0.document_id, t0.modified, count(*) as revision_id
from document_history as t0
join document_history as t1
on t0.document_id = t1.document_id
and t0.modified >= t1.modified
group by t0.document_id, t0.modified
order by t0.document_id asc, t0.modified asc
) as rev
join document_history as docrec
on docrec.document_id = rev.document_id
and docrec.modified = rev.modified;
Output With Revised File Name
| NEW_NAME | DOCUMENT_ID | MODIFIED |
|-----------------|-------------|------------------------------|
| document_1.docx | 81 | March, 21 2014 05:00:00+0000 |
| doc_1.docx | 82 | March, 21 2014 05:30:00+0000 |
| doc_2.docx | 82 | March, 21 2014 05:35:00+0000 |
| doc_3.docx | 82 | March, 21 2014 05:50:00+0000 |
These (NEW_NAME) values are the ones required to update the DOCUMENT_HISTORY table. An inspection of the MODIFIED column for DOCUMENT_ID = 82 shows that the check-in revisions are numbered in the correct order with respect to this part of the composite business key.
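One caveat about the REPLACE approach: REPLACE substitutes every occurrence of '.', so a name containing more than one dot would receive the suffix more than once. The sketch below shows the intended rule (insert the suffix before the last dot only) in Python; the function name with_revision is hypothetical and only illustrates the logic, it is not part of the SQL solution.

```python
def with_revision(name: str, revision_id: int) -> str:
    """Insert '_<revision_id>' before the final extension of a file name."""
    # rpartition splits on the LAST dot: 'a.b.docx' -> ('a.b', '.', 'docx')
    base, dot, ext = name.rpartition(".")
    if not dot:
        # No dot at all: there is no extension, so just append the suffix.
        return f"{name}_{revision_id}"
    return f"{base}_{revision_id}.{ext}"


print(with_revision("doc.docx", 2))
print(with_revision("my.report.docx", 1))
```

For the sample data in this answer every name has exactly one dot, so both approaches give the same result.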
Finding Un-processed Document Records
If the file name format is fairly consistent, a SQL LIKE operator may be enough to identify the records whose names have already been altered. MySQL also offers filtering through REGULAR EXPRESSIONS, which give more flexibility when parsing document name values.
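As a rough illustration of the regular expression idea, here is a small Python sketch. The pattern is my own assumption about what a processed name looks like (an underscore, digits, then an optional extension at the end of the name); in MySQL something along the lines of name REGEXP '_[0-9]+([.][^.]+)?$' should behave similarly, but test it against your own data.

```python
import re

# Hypothetical pattern: '_<digits>' followed by an optional '.ext' at the end.
REVISED_NAME = re.compile(r"_\d+(\.[^.]+)?$")


def is_revised(name: str) -> bool:
    """Return True if the name already appears to carry a revision suffix."""
    return REVISED_NAME.search(name) is not None


for candidate in ["doc.docx", "doc_2.docx", "doc_1_final.docx", "report_2014.docx"]:
    print(candidate, is_revised(candidate))
```

Note the last candidate: a user-supplied name that happens to end in an underscore and digits is indistinguishable from a processed one. That ambiguity is one more argument for storing the revision id in its own column.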
What remains is figuring out how to update just a single record or a set of records. The appropriate place to put the filter criteria would be on the outermost part of the query right after the join between aliased tables:
...
and docrec.modified = rev.modified
WHERE docrec.id = ??? ;
There are other places where you can optimize for faster response times, such as inside the subquery that derives the revision id value. The more you know about the specific set of records you are interested in, the more you can narrow the inner SQL statements to look only at those records.
Homework: Some Closing Comments on the Solution
This section is purely optional; it collects some side thoughts on design and usability that came to mind while writing this up.
Two-Step or One-Step?
With the current design, there are two discrete operations per record: an INSERT by the script and then an UPDATE of the value via a SQL DML call. It may be annoying to have to remember two SQL commands. Consider adding a second table built for insert-only operations.
Use the second table (DOCUMENT_LIST) to hold nearly identical information, except possibly two columns:
BASE_FILE_NAME (i.e., doc.docx or document.docx) which may apply for multiple HISTORY_ID values.
FILE_NAME (i.e., doc_1.docx, doc_2.docx, etc.) which will be unique for each record.
Set a database TRIGGER on the source table: DOCUMENT_HISTORY and put the SQL query we've developed inside of it. This will automatically populate the correct revision file name at roughly the same moment after the script fills the history table.
WHY BOTHER? This suggestion mainly fits under the category of SCALABILITY of your database design. The assignment of a revision name is still a two step process, but the second step is now handled automatically within the database, whereas you'd have to remember to include it everywhere you invoked a DML operation on top of the history table.
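To make the trigger idea concrete, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption; MySQL trigger syntax differs, e.g. DELIMITER, FOR EACH ROW and CONCAT instead of ||). The DOCUMENT_LIST layout and the trigger name document_history_ai are hypothetical, and the sketch assumes rows for a document are inserted in chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE document_history (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        document_id INTEGER, name TEXT, modified TEXT, user_id INTEGER
    );
    -- Hypothetical layout for the insert-only table.
    CREATE TABLE document_list (
        history_id INTEGER PRIMARY KEY,
        base_file_name TEXT,
        file_name TEXT
    );
    -- After each history insert, derive the revision id by counting the
    -- rows of the same document that are not newer, then store both names.
    CREATE TRIGGER document_history_ai
    AFTER INSERT ON document_history
    BEGIN
        INSERT INTO document_list (history_id, base_file_name, file_name)
        VALUES (
            NEW.id,
            NEW.name,
            replace(NEW.name, '.', '_' || (
                SELECT COUNT(*) FROM document_history
                WHERE document_id = NEW.document_id
                  AND modified <= NEW.modified
            ) || '.')
        );
    END;
""")

# The script only ever inserts; the trigger fills document_list.
conn.executemany(
    "INSERT INTO document_history (document_id, name, modified, user_id)"
    " VALUES (?, ?, ?, ?)",
    [
        (81, "document.docx", "2014-03-21 05:00:00", 1),
        (82, "doc.docx", "2014-03-21 05:30:00", 1),
        (82, "doc.docx", "2014-03-21 05:35:00", 1),
        (82, "doc.docx", "2014-03-21 05:50:00", 1),
    ],
)
file_names = [
    row[0]
    for row in conn.execute(
        "SELECT file_name FROM document_list ORDER BY history_id"
    )
]
print(file_names)
```

The history table keeps the name exactly as submitted, and the revision name lives only in DOCUMENT_LIST, so the calling script never needs a second statement.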
Managing Aliases
I didn't see it anywhere, but I assume that the USER initially assigns some name to the file being tracked. In the end, it appears that it may not matter as it is an internally tracked thing that the end user of the system would never see.
For your information, this information isn't portrayed to the customer, it is saved in a table in the database as a version history...
Reading the history of a given document would be easier if the "base" name was kept the same once it has been given:
In the data sample above, unless the DOCUMENT_ID is known, it may not be clear that all the file names listed are related. This may not necessarily be a problem, but it is a good practice from a semantic point of view to separate user assigned file names as ALIASES that can be changed and assigned at will at any time.
Consider setting up a separate table for tracking the "User-Friendly" name given by the end user, and associating it with the document id it is supposed to represent. A user may make hundreds or thousands of rename requests... while the back end file system uses a simpler, more consistent naming approach.
I had similar trouble recently, but I'm using MSSQL and I don't know MySQL syntax, so here is some T-SQL code. I hope it helps!
declare
    @id int,
    @document_id int,
    @document_name varchar(255),
    @append_name int,
    @name varchar(255),
    @extension varchar(10)
set @append_name = 1
select top 1
    @id = ID,
    @document_id = DOCUMENT_ID,
    @document_name = NAME
from
    b_bp_history
while exists (
    select *
    from b_bp_history
    where
        NAME = @document_name and
        DOCUMENT_ID = @document_id and
        ID <> @id)
begin
    set @name = ''
    set @extension = ''
    declare @dot_index int -- index of dot-symbol in document name
    set @dot_index = charindex('.', reverse(@document_name))
    if (@dot_index > 0)
    begin
        set @name = substring(@document_name, 0, len(@document_name) - @dot_index + 1)
        set @extension = substring(@document_name, len(@document_name) - @dot_index + 2, len(@document_name) - len(@name))
    end
    else
        set @name = @document_name
    if (@append_name > 1) -- if not first try to rename file
    begin
        if (right(@name, len(cast(@append_name - 1 as varchar)) + 1)) = '_' + cast(@append_name - 1 as varchar)
        begin
            set @name = substring(@name, 0, len(@name) - (len(cast(@append_name - 1 as varchar))))
        end
    end
    set @name = @name + '_' + cast(@append_name as varchar)
    if (len(@extension) > 0)
        set @document_name = @name + '.' + @extension
    else
        set @document_name = @name
    set @append_name = @append_name + 1
end
update b_bp_history
set NAME = @document_name
where ID = @id
Here is a working UPDATE query:
UPDATE document_history
INNER JOIN (SELECT dh.id, IF(rev.revision_id = 0, dh.name,REPLACE(dh.name, '.', CONCAT('_', rev.revision_id, '.'))) AS new_name,
rev.document_id, rev.modified
FROM (
SELECT t0.document_id, t0.modified, count(*) - 1 AS revision_id
FROM document_history as t0
JOIN document_history as t1
ON t0.document_id = t1.document_id
AND t0.modified >= t1.modified
GROUP BY t0.document_id, t0.modified
ORDER BY t0.document_id ASC, t0.modified ASC) AS rev
JOIN document_history dh
ON dh.document_id = rev.document_id
AND dh.modified = rev.modified) update_record
ON document_history.id = update_record.id
SET document_history.name = update_record.new_name;
You can see the SQL Fiddle at http://www.sqlfiddle.com/#!2/9b3cda/1
I used the information available on this page on UPDATE to assemble my query:
MySQL - UPDATE query based on SELECT Query
Used the page below for generating a Revision ID:
ROW_NUMBER() in MySQL
Also used the schema provided by Richard Pascual in his elaborate answer.
I hope this query helps you name your documents the way you want.