I have cumulative input values that start life as smallints.
I read these values from an Access database and aggregate them into a MySQL database.
Now I'm faced with input values of type smallint that are cumulative, thus always increasing, but that wrap around when they overflow.
 Input    Required output
-------------------------
     0          0
 10000      10000
 32000      32000
-31536      34000    // overflow in the input
-11536      54000
  8464      74000
I process these values by inserting the raw data into a blackhole table; in the trigger on the blackhole table I correct the data before inserting it into the actual table.
I know how to store the previous input and output, or, if there is none, how to select the latest (and highest) inserted value.
But what's the easiest/fastest way to deal with the overflow, so that I get the correct output?
Given a table named test with a primary key called id and a column named value, just do this:
SELECT
    id,
    test.value,
    (SELECT SUM(a.value) FROM test AS a WHERE a.id <= test.id) AS output
FROM test;
This would be the output:
------------------------
| id | value | output |
------------------------
| 1 | 10000 | 10000 |
| 2 | 32000 | 42000 |
| 3 | -31536 | 10464 |
| 4 | -11536 | -1072 |
| 5 | 8464 | 7392 |
------------------------
Hope this helps.
If it doesn't work, just convert your data to INT (or BIGINT for lots of data). It does not hurt, and memory is cheap these days.
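The conversion itself is a one-liner per column; for the example table above it would be something like:

ALTER TABLE test MODIFY value BIGINT NOT NULL;

And if you do want to correct the smallint wraparound inside the blackhole trigger described in the question, here is a minimal sketch. All names (raw_input, readings, raw_value, out_value) are placeholders, and the only assumption is that the raw counter never advances by more than 65536 between rows, so a drop in the raw value means exactly one wraparound:

DELIMITER //
CREATE TRIGGER raw_input_bi BEFORE INSERT ON raw_input
FOR EACH ROW
BEGIN
    DECLARE prev_raw INT DEFAULT 0;
    DECLARE prev_out BIGINT DEFAULT 0;
    DECLARE delta INT;

    -- Fetch the latest corrected row, if any; on an empty table the
    -- variables keep their (0, 0) defaults.
    SELECT raw_value, out_value INTO prev_raw, prev_out
    FROM readings
    ORDER BY id DESC
    LIMIT 1;

    -- A smallint spans 65536 distinct values, so a drop in the raw
    -- input means the counter wrapped around.
    SET delta = NEW.value - prev_raw;
    IF delta < 0 THEN
        SET delta = delta + 65536;
    END IF;

    INSERT INTO readings (raw_value, out_value)
    VALUES (NEW.value, prev_out + delta);
END//
DELIMITER ;

Running the sample input through this logic reproduces the required output column (0, 10000, 32000, 34000, 54000, 74000).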
Related
I have a requirement wherein I will be getting records which I need to insert into a database table (MariaDB 10.3). For each record I have two base values, viz. name and amount, and one processed value, viz. action (imagine this action is something acted upon by a user from the UI).
+---------+--------+----------------+----------------+
| name | amount | action | created_at |
+---------+--------+----------------+----------------+
| Akshay | 1000 | processed | 2019-08-01 |
+---------+--------+----------------+----------------+
Now, what I want to achieve is: the next time I receive a record with a name and amount that already exist in the table, populate action automatically by reference to the previous entry from this same table whose name and amount match.
And if the name and amount combination does not exist in the table, then do not populate action.
Desired end result is depicted in structure below:
+---------+--------+----------------+----------------+
| name | amount | action | created_at |
+---------+--------+----------------+----------------+
| Akshay | 1000 | processed | 2019-08-04 |
| Akshay | 1001 | | 2019-08-03 |
| Saanvi | 1000 | | 2019-08-02 |
| Akshay | 1000 | processed | 2019-08-01 |
+---------+--------+----------------+----------------+
Any clues how I can achieve this functionality?
DROP PROCEDURE IF EXISTS db.SP_CREATE_VALUE;
DELIMITER //
CREATE PROCEDURE db.`SP_CREATE_VALUE`(IN `in_name` VARCHAR(50), IN `in_amount` INT)
BEGIN
    DECLARE numAlreadyExists INT(11) DEFAULT 0;
    DECLARE strExistingAction VARCHAR(20) DEFAULT "";
    -- Does this name/amount combination already exist?
    SET numAlreadyExists = (SELECT COUNT(*) FROM table_name
                            WHERE name = in_name AND amount = in_amount);
    IF (numAlreadyExists > 0) THEN
        -- Reuse the action of the latest matching row.
        SET strExistingAction = (SELECT action FROM table_name
                                 WHERE name = in_name AND amount = in_amount
                                 ORDER BY table_id DESC LIMIT 1);
    END IF;
    INSERT INTO table_name
    SET name = in_name, amount = in_amount, action = strExistingAction;
END//
DELIMITER ;
And then when you want to create a new record, simply...
CALL SP_CREATE_VALUE('Akshay',1000);
You can use window functions (available in MariaDB 10.2 and later). For instance:
select name, amount,
       max(action) over (partition by name, amount order by created_at) as action,
       created_at
from t;
I have a table with more than 50 million rows.
trackpoint:
+----+------------+-------------------+
| id | created_at | tag |
+----+------------+-------------------+
| 1 | 1484407910 | visitorDevice643 |
| 2 | 1484407913 | visitorDevice643 |
| 3 | 1484407916 | visitorDevice643 |
| 4 | 1484393575 | anonymousDevice16 |
| 5 | 1484393578 | anonymousDevice16 |
+----+------------+-------------------+
where 'created_at' is the timestamp at which the row was added,
and I have a list of timestamps, for example like this one:
timestamps = [1502744400, 1502830800, 1502917200]
I need to count the distinct tags in every interval between timestamps[i] and timestamps[i+1].
Using the Django ORM it looks like this:
step = 86400
for ts in timestamps[:-1]:
trackpoint_set.filter(created_at__gte=ts,created_at__lt=ts + step).values('tag').distinct().count()
Because the timestamps list is actually very long and the table has many rows, I eventually get a 500 timeout.
So, my question is: how do I join the rows and the list of values in ONE raw SQL query, so the result looks like [(1502744400, 650), (1502830800, 1550), ...],
where the first value is the timestamp and the second is the count of unique tags in each interval?
First, add an index on created_at. Then build a query that restricts created_at to the range between one timestamp and the next, and run that query for each timestamp one by one rather than all at once.
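That said, if you want it in a single round trip as asked, one common pattern is to turn the timestamp list into a derived table and join on the interval bounds. A sketch, assuming the 86400-second step from the question and only the three example timestamps (in practice you would generate the UNION ALL list, or load the timestamps into a temporary table):

SELECT b.ts,
       COUNT(DISTINCT t.tag) AS unique_tags
FROM (
    SELECT 1502744400 AS ts
    UNION ALL SELECT 1502830800
    UNION ALL SELECT 1502917200
) AS b
LEFT JOIN trackpoint t
       ON t.created_at >= b.ts
      AND t.created_at < b.ts + 86400
GROUP BY b.ts
ORDER BY b.ts;

The LEFT JOIN keeps intervals with no rows in the result (with a count of 0), and the index on created_at still applies to the range condition.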
I am seeing this behavior on Windows 7 with MySQL v5.6.17. When I try to view table records, the field with the auto-incrementing primary key (here, the column 'id') does not show values in all rows, but those values actually were populated after loading them from a data file. My initial thought was that some of the values in the ID column were missing, but they are not!
If I query an individual row, it shows the value in the id column. Is this normal behavior? Is there a better way to display table records?
Please see the image enclosed below.
The data certainly appears corrupted.
An 'uncorrupted' result might look like this...
+----+-------------+-----+-----+-------+
| id | PlateNumber | Row | Col | Orf |
+----+-------------+-----+-----+-------+
| 1 | 1 | A | 1 | Empty |
| 2 | etc... | | | |
In my projects I often need to store the result of a SELECT in another table (we call this a "resultset"). The reason is to dynamically display a large number of rows in a web application while loading only small chunks as necessary.
Typically, this is done by queries such as this one:
SET #counter := 0;
INSERT INTO resultsetdata
SELECT "12345", #counter:=#counter+1, a.ID
FROM sometable a
JOIN bigtable b
WHERE (a.foo = b.bar)
ORDER BY a.whatever DESC;
The fixed "12345" value is just a value to identify the "resultset" as a whole and changes for each query. The second column is a incrementing index counter that is meant to allow direct access to a specific row in the result and the ID column references the specific row in the source data table.
When the application needs a certain range of the result I just join resultsetdata with the source table to get the detailed data - which is quick as opposed to the resultsetdata query above which may take 2-3 seconds to complete (which explains why I need this intermediary table).
The SELECT query itself is not relevant for this question.
resultsetdata has the following structure:
CREATE TABLE `resultsetdata` (
`ID` int(11) NOT NULL,
`ContIdx` int(11) NOT NULL,
`Value` int(11) NOT NULL,
PRIMARY KEY (`ID`,`ContIdx`)
) ENGINE=InnoDB;
This usually works like a charm but lately we noticed that in some cases the ORDER of the result is not correct. This depends on the query itself (for example, adding DISTINCT is a typical cause), the server version and the data contained in the source tables, so I guess one can say that the row order is unpredictable with this method. Probably it depends on internal optimizations.
However, the problem is now that I can't think of any alternative solution that gives me the expected result.
Since the resultset can get several thousands of rows, loading all data in memory and then manually INSERTing it is not feasible.
Any suggestions?
EDIT: For further clarification, have a look at these queries:
DROP TABLE IF EXISTS test;
CREATE TABLE test (ID INT NOT NULL, PRIMARY KEY(ID)) ENGINE=InnoDB;
INSERT INTO test (ID) VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
SET #counter:=0;
SELECT "12345", #counter:=#counter+1, ID
FROM test
ORDER BY ID DESC;
This produces the following result as "expected":
+-------+----------------------+----+
| 12345 | #counter:=#counter+1 | ID |
+-------+----------------------+----+
| 12345 | 1 | 10 |
| 12345 | 2 | 9 |
| 12345 | 3 | 8 |
| 12345 | 4 | 7 |
| 12345 | 5 | 6 |
| 12345 | 6 | 5 |
| 12345 | 7 | 4 |
| 12345 | 8 | 3 |
| 12345 | 9 | 2 |
| 12345 | 10 | 1 |
+-------+----------------------+----+
10 rows in set (0.00 sec)
As said, in some cases (I can't provide a testcase here, sorry), this may lead to a result similar to this:
+-------+----------------------+----+
| 12345 | #counter:=#counter+1 | ID |
+-------+----------------------+----+
| 12345 | 10 | 10 |
| 12345 | 9 | 9 |
| 12345 | 8 | 8 |
| 12345 | 7 | 7 |
| 12345 | 6 | 6 |
| 12345 | 5 | 5 |
| 12345 | 4 | 4 |
| 12345 | 3 | 3 |
| 12345 | 2 | 2 |
| 12345 | 1 | 1 |
+-------+----------------------+----+
I'm not saying this is a MySQL bug and I fully understand that my method currently provides unpredictable results. Still, I don't know how to tweak this to get predictable results.
This is because the order in which records are sorted when they are inserted is unrelated to the order in which they are retrieved.
When you retrieve them, a query plan is created. If no ORDER BY is specified in your SELECT statement, the order depends on the query plan produced. This is why it is unpredictable, and why adding DISTINCT can change the order.
The solution is to store enough data that you can retrieve them in the correct order using an ORDER BY clause. In your case you have ordered your data by a.whatever. Can a.whatever be stored in resultsetdata? If so then you can read the records out in the correct order.
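If it can, a sketch of that approach (assuming a.whatever is an integer; use whatever type it really has):

ALTER TABLE resultsetdata ADD COLUMN SortKey INT NOT NULL;

SET @counter := 0;
INSERT INTO resultsetdata
SELECT "12345", @counter := @counter + 1, a.ID, a.whatever
FROM sometable a
JOIN bigtable b ON a.foo = b.bar
ORDER BY a.whatever DESC;

-- Reads now carry their own ORDER BY instead of trusting insert order:
SELECT Value
FROM resultsetdata
WHERE ID = "12345"
ORDER BY SortKey DESC
LIMIT 50;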
Maybe you could wrap the select in another select:
SET @counter := 0;
INSERT INTO resultsetdata
SELECT tmp.ResultsetID, @counter := @counter + 1, tmp.ID
FROM (
    SELECT "12345" AS ResultsetID, a.ID
    FROM sometable a
    JOIN bigtable b
    WHERE a.foo = b.bar
    ORDER BY a.whatever DESC
) AS tmp;
(The explicit column list keeps the counter in the ContIdx position; a plain SELECT *, @counter := @counter + 1 would shift it into the Value column.)
... but you are still at the mercy of the dumbness of MySQL's optimizer.
That's all I found on this topic, but I couldn't find a hard guarantee:
Pure-SQL Technique for Auto-Numbering Rows in Result Set
http://www.xaprb.com/blog/2006/12/02/how-to-number-rows-in-mysql/
http://www.xaprb.com/blog/2005/09/27/simulating-the-sql-row_number-function/
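For completeness: on servers that do support window functions (MySQL 8.0+, MariaDB 10.2+), ROW_NUMBER() gives the hard guarantee that the user-variable trick lacks, because the numbering is defined by its own ORDER BY clause rather than by evaluation order:

INSERT INTO resultsetdata
SELECT "12345",
       ROW_NUMBER() OVER (ORDER BY a.whatever DESC),
       a.ID
FROM sometable a
JOIN bigtable b ON a.foo = b.bar;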
I have a MySQL table with many numeric columns (some INT, some FLOAT). I would like to query it with the MySQL command-line client (specifically, mysql Ver 14.14 Distrib 5.1.41, for debian-linux-gnu (x86_64) using readline 6.1), like so:
SELECT * FROM table WHERE foo;
Unfortunately, if the value of any numeric field exceeds 10^6, this client displays the result in scientific notation, which makes reading the results difficult.
I could correct the problem by FORMAT-ing each of the fields in my query, but there are many of them and many tables I would like to query. Instead I'm hoping to find a client variable or flag I can set to disable scientific notation for all queries.
I have not been able to find one in the --help output or the man page, nor by searching Google or this site. Instead, all I find are discussions of preserving/removing scientific notation when using <insert-programming-language>'s MySQL API.
Thank you for any tips.
::edit::
Here's an example table ...
mysql> desc foo;
+--------------+-------------+------+-----+-------------------+
| Field | Type | Null | Key | Default |
+--------------+-------------+------+-----+-------------------+
| date | date | NO | PRI | NULL |
| name | varchar(20) | NO | PRI | NULL |
| val | float | NO | | NULL |
| last_updated | timestamp | NO | | CURRENT_TIMESTAMP |
+--------------+-------------+------+-----+-------------------+
and some example values ...
mysql> select * from foo where date='20120207';
+------------+--------+--------------+---------------------+
| date | name | val | last_updated |
+------------+--------+--------------+---------------------+
| 2012-02-07 | A | 88779.5 | 2012-02-07 13:38:14 |
| 2012-02-07 | B | 1.00254e+06 | 2012-02-07 13:38:14 |
| 2012-02-07 | C | 78706.5 | 2012-02-07 13:38:15 |
+------------+--------+--------------+---------------------+
Now, the actual values I loaded into the third field are:
88779.5, 1002539.25, 78706.5390625
and they can be seen exactly if I manipulate the value:
mysql> select date, name, ROUND(val, 10), last_updated from foo where ...
+------------+---+--------------------+---------------------+
| 2012-02-07 | A | 88779.5000000000 | 2012-02-07 13:38:14 |
| 2012-02-07 | B | 1002539.2500000000 | 2012-02-07 13:38:14 |
| 2012-02-07 | C | 78706.5390625000 | 2012-02-07 13:38:15 |
Something in the client seems to enforce that I only see six significant figures, even though there are more in the table.
If a query such as
mysql> select ROUND(*, 2) from foo ...
were possible, that would be great! Otherwise I can't really take the time to individually wrap 100 column names in "ROUND()" whenever I need to inspect some data.
Interestingly, I occasionally use a phpMyAdmin interface to browse the contents of some of these tables, and that interface also has this 6 significant figure limitation. So it's not limited to just the CLI.
Well, after reading the documentation more thoroughly, I still can't see any reason why a client would limit itself to displaying only 6 sig figs from a FLOAT (especially when the table itself is definitely storing more).
Nonetheless, an acceptable solution (for this weary user) is to change all my tables to use DECIMAL(16,4) instead of FLOAT. Unfortunately, this makes all my numbers show up with 4 decimal places (even if they're all '0'). But at least all numbers have the same width now, and my client never displays them in scientific notation or limits the number of sig figs in its output.
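For reference, the conversion is one statement per column; for the foo table above it would be:

ALTER TABLE foo MODIFY val DECIMAL(16,4) NOT NULL;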
Wouldn't the CAST function allow you to request that the values for a certain field be returned as DECIMAL? Not an expert and haven't tried it, but that would be the first thing I'd try.
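Something along these lines (an untested sketch against the foo table from the question):

SELECT date, name, CAST(val AS DECIMAL(16,4)) AS val, last_updated
FROM foo
WHERE date = '20120207';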
I know this is old, but this helped me: I used a view.
create view foo2 as select date, name, ROUND(val, 10) val, last_updated from foo;
Then just run your queries on foo2. It also works in phpMyAdmin.