My use case is creating a stacked column chart from a logs table stored in MySQL.
Currently I have a regular column chart, but I would like to aggregate the data by site_id so that I can see which parts of each bar are attributable to which site.
My current technique for the existing column chart is to get a list of dates and count the records while grouping by date. Then I use a for loop to create the 14-day period I need, and loop over my data to populate the matching counts into the correct days.
SELECT DATE(`created`) AS `day`,
COUNT(`id`) AS `count`
FROM `api_logs` `ApiLogs`
WHERE DATE(created) BETWEEN DATE_SUB(CURDATE(), INTERVAL 14 day) AND CURDATE()
GROUP BY DATE(`created`)
ORDER BY DATE(`created`)
For the stacked chart though, I can't think of a way to collate the data in MySQL without performing a number of queries or building a subquery to collate the count per site.
Is there an established pattern for querying for a result which is easily compatible with the stacked column chart?
My front end is built in PHP, in case there are any post-query processing solutions.
The problem is that you need a column for each site.
If you have a fixed set of sites, then you can build the columns manually in the SQL:
SELECT DATE(`created`) AS `day`,
SUM(CASE WHEN `site_id` = 'A' THEN 1 ELSE 0 END) AS `site A`,
SUM(CASE WHEN `site_id` = 'B' THEN 1 ELSE 0 END) AS `site B`
FROM `api_logs` `ApiLogs`
WHERE DATE(created) BETWEEN DATE_SUB(CURDATE(), INTERVAL 14 day) AND CURDATE()
GROUP BY DATE(`created`)
ORDER BY DATE(`created`)
Otherwise, you can build the columns dynamically and aggregate
using Google's DataView and google.visualization.data.group.
First, add site_id to the SQL:
SELECT DATE(`created`) AS `day`,
`site_id` AS `site_id`,
COUNT(`id`) AS `count`
FROM `api_logs` `ApiLogs`
WHERE DATE(created) BETWEEN DATE_SUB(CURDATE(), INTERVAL 14 day) AND CURDATE()
GROUP BY DATE(`created`), `site_id`
ORDER BY DATE(`created`), `site_id`
which should produce results similar to the following...
['Date', 'Site', 'Count'],
[new Date('11/17/2016'), 'A', 10],
[new Date('11/17/2016'), 'B', 15],
[new Date('11/17/2016'), 'C', 22],
See the following working snippet for building the columns dynamically...
google.charts.load('current', {
callback: function () {
// raw table data
var data = google.visualization.arrayToDataTable([
['Date', 'Site', 'Count'],
[new Date('11/17/2016'), 'A', 10],
[new Date('11/17/2016'), 'B', 15],
[new Date('11/17/2016'), 'C', 22],
[new Date('11/17/2016'), 'D', 8],
[new Date('11/16/2016'), 'A', 12],
[new Date('11/16/2016'), 'B', 6],
[new Date('11/16/2016'), 'C', 13],
[new Date('11/16/2016'), 'E', 14],
[new Date('11/15/2016'), 'A', 9],
[new Date('11/15/2016'), 'B', 16],
[new Date('11/15/2016'), 'D', 11]
]);
// create view with columns for each site, then agg view
var view = new google.visualization.DataView(data);
var aggColumns = [];
var viewColumns = [0];
data.getDistinctValues(1).forEach(function (site, index) {
viewColumns.push({
calc: function (dt, row) {
if (dt.getValue(row, 1) === site) {
return dt.getValue(row, 2);
}
return null;
},
label: site,
type: 'number'
});
aggColumns.push({
aggregation: google.visualization.data.sum,
column: index + 1,
label: site,
type: 'number'
});
});
view.setColumns(viewColumns);
var group = google.visualization.data.group(
view,
[0],
aggColumns
);
var chart = new google.visualization.ColumnChart(document.getElementById('chart_div'));
chart.draw(group, {
isStacked: true
});
},
packages: ['corechart']
});
<script src="https://www.gstatic.com/charts/loader.js"></script>
<div id="chart_div"></div>
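As an aside: if you would rather keep the pivot entirely in MySQL, here is a hedged sketch of the usual dynamic-pivot pattern (my addition, not part of the snippet above; it assumes the default sql_mode without ANSI_QUOTES and that site_id values contain no quote characters). It builds the CASE list with GROUP_CONCAT and runs it as a prepared statement:
SET SESSION group_concat_max_len = 8192; -- the default of 1024 can truncate a long column list
SELECT GROUP_CONCAT(DISTINCT
         CONCAT("SUM(CASE WHEN `site_id` = '", `site_id`, "' THEN 1 ELSE 0 END) AS `site ", `site_id`, "`"))
  INTO @cols
  FROM `api_logs`;
SET @sql = CONCAT(
  'SELECT DATE(`created`) AS `day`, ', @cols,
  ' FROM `api_logs`',
  ' WHERE DATE(created) BETWEEN DATE_SUB(CURDATE(), INTERVAL 14 DAY) AND CURDATE()',
  ' GROUP BY DATE(`created`) ORDER BY DATE(`created`)');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;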
I am working with an NBA script in MySQL and I have to find out who is the shortest player in the database. I am using feet as the measurement, and after executing the query I found out that the player the query was giving me was not the right answer.
The query is
select * from players where height=(select min(height) from players);
And it gives me:
'420', 'Carlos Arroyo', 'Florida International', ' 6-2', '202', 'G', 'Magic'
where 6-2 is the height.
Instead of giving me one of these results:
'26', 'Brevin Knight', 'Stanford', '5-10', '170', 'G', 'Clippers'
'113', 'Nate Robinson', 'Washington', '5-9', '180', 'G', 'Knicks'
'182', 'Earl Boykins', 'Eastern michigan', '5-5', '133', 'G', 'Bobcats'
'372', 'Damon Stoudamire', 'Arizona', '5-10', '171', 'G', 'Spurs'
'482', 'Chucky Atkins', 'South Florida', '5-11', '185', 'G', 'Nuggets'
And if I order the players by height, the result is a bit odd:
'Carlos Arroyo', ' 6-2'
'Shareef Abdur-Rahim', ' 6-9'
'Louis Amundson', ' 6-9'
'Brevin Knight', '5-10'
'Damon Stoudamire', '5-10'
'Chucky Atkins', '5-11'
'Earl Boykins', '5-5'
'Nate Robinson', '5-9'
'Aaron Brooks', '6-0'
'Allen Iverson', '6-0'
'Kyle Lowry', '6-0'
'Jammer Nelson', '6-0'
'Sebastian Telfair', '6-0'
'Chris Paul', '6-0'
Convert the height string to a number which you can use for numeric comparison.
select player, height
from players
where cast(substring_index(height, '-', 1) as unsigned)*100+
cast(right(concat('0', substring_index(height, '-', -1)),2) as unsigned)
in (
select min(cast(substring_index(height, '-', 1) as unsigned)*100+
cast(right(concat('0', substring_index(height, '-', -1)),2) as unsigned))
from players
)
See dbfiddle
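To see why the encoding works: SUBSTRING_INDEX(height, '-', 1) takes the feet part and SUBSTRING_INDEX(height, '-', -1) the inches part; padding the inches with a leading '0' and keeping the rightmost two characters turns '5-9' into 5*100 + 9 = 509 and ' 6-2' into 6*100 + 2 = 602 (CAST also ignores the stray leading space), so the composite number orders heights correctly.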
...
where 6-2 is the height. Instead of giving me one of these results
...
You say that all '5-xx' values are equivalent to each other, i.e. only the value before the dash is taken into account.
You also say that you need only one output row, and that any of the five rows shown would match - i.e. you do not need a secondary sort.
If so, then you may simply do:
SELECT *
FROM players
ORDER BY CAST(height AS UNSIGNED) LIMIT 1
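If you do need the inches taken into account after all, here is a hedged sketch combining both ideas above (it assumes every height is consistently formatted as 'F-I'): convert to total inches and order by that.
select player, height
from players
order by cast(substring_index(height, '-', 1) as unsigned) * 12
       + cast(substring_index(height, '-', -1) as unsigned)
limit 1;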
I currently have a table in Snowflake as below:
Address Zip State
123 St. 94143 CA
3432 St. 93059 TX
I want to create a JSON block as below:
{
"Address" : 123 St.,
"Zip" : 93059,
"State" : CA
},
{
"Address" : 3432 St.,
"Zip" : 94143,
"State" : TX
}
I have the code below:
select
OBJECT_CONSTRUCT (
'Address', Address ,
'Zip', Zip,
'State', State )
as json_value
from example_table_above;
The above currently returns single records of each key/value pair at times, but not one blob at all times.
I.e. it returns just this - no real pattern or separation by record:
"JSON_VALUE"
{"Address":adsf}
{"Address":"Triang St"}
{"Zip":949}
{"State":CA}
This is a subtle point, but the JSON block you want is not valid. In order to make it valid, it needs to be wrapped with an outer array:
[
{
"Address": "123 St.",
"State": "CA",
"Zip": 94143
},
{
"Address": "3422 St",
"State": "TX",
"Zip": 93059
}
]
To get Snowflake to construct a JSON document like that, simply wrap the OBJECT_CONSTRUCT function with ARRAY_AGG like this:
select array_agg(object_construct('Address', address, 'Zip', zip, 'State', state)) from T1;
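If you need the result as plain text rather than a Snowflake ARRAY value, a hedged addition (not part of the original answer) is to cast the aggregate to VARCHAR:
select array_agg(object_construct('Address', address, 'Zip', zip, 'State', state))::varchar from T1;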
Can you provide a bit more detail on what you need to do with this large JSON object (or objects)? Are you looking to output it from Snowflake as a result set or to a file, or do you want to insert it back into another table in Snowflake?
With the unconstrained ARRAY_AGG, all qualifying rows from your query are grouped into a single object, and if your table exceeds the 16777216-byte array/object limit, as in your case, you get the array size error you're seeing.
ARRAY_AGG and OBJECT_AGG are aggregate functions, so if there is a grouping you can use that keeps each array/object within the size limit, that is the simplest fix.
E.g. grouping by STATE cuts each JSON object to roughly 1/50th of the size:
Select
State,
ARRAY_AGG(
OBJECT_CONSTRUCT (
'Address', Address ,
'Zip', Zip,
'State', State )
)
as json_value
from T
Group By State;
How to unload a single large JSON object > 16777216 bytes
Create some chunky test data
Create or Replace Table T as
Select
(MOD(ABS(RANDOM()),999)||' '||
UPPER(RANDSTR(
ABS(MOD(RANDOM(),20)),
RANDOM() ))||' '||
ARRAY_CONSTRUCT( 'Rd.', 'St.', 'Av.', 'Way', 'Dr.','Cl.')[MOD(ABS(RANDOM()),6)]) ADDRESS,
RIGHT(RANDOM(),5) ZIP,
ARRAY_CONSTRUCT( 'AL', 'AK', 'AS', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'DC',
'FM', 'FL', 'GA', 'GU', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS',
'KY', 'LA', 'ME', 'MH', 'MD', 'MA', 'MI', 'MN', 'MS', 'MO',
'MT', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY', 'NC', 'ND', 'MP',
'OH', 'OK', 'OR', 'PW', 'PA', 'PR', 'RI', 'SC', 'SD', 'TN',
'TX', 'UT', 'VT', 'VI', 'VA', 'WA', 'WV', 'WI', 'WY')[MOD(ABS(RANDOM()),59)]::VARCHAR STATE
from
table(GENERATOR( ROWCOUNT => 400000))
;
Create a view to construct JSON data as multi-line text
Create or Replace View V as
With
rc as (Select count(*) row_count from T),
ro as (Select
row_number() over ( order by NULL) as RID,
(object_construct(*))::VARCHAR row_obj
from T),
json_data as (
Select 0 as RID, '['::VARCHAR as JSON_LINE
UNION ALL
Select RID, row_obj as JSON_LINE from ro where RID = 1
UNION ALL
Select RID, ','||row_obj as JSON_LINE from ro where RID > 1
UNION ALL
Select row_count+1 as RID, ']'::VARCHAR as JSON_LINE from rc
)
Select RID, JSON_LINE from json_data
order by RID
;
Check the view output.
Select the first and last ten records, including the '[' and ']' array rows:
Select RID, JSON_LINE from V where RID <=10 or RID >= 399990;
Create a CSV stage to unload into.
Note: the file_format options ensure escape characters aren't introduced.
create or replace stage T_STAGE_CSV file_format = (
type = 'CSV'
skip_header = 1
ESCAPE = ' '
COMPRESSION = NONE);
Copy data from the View into the Stage.
copy into @TEST.TEST.T_STAGE_CSV from (Select JSON_LINE from V);
Check the stage, as the output may get split across more than one file; in that case you will need to concatenate the files together outside of Snowflake!
list @TEST.TEST.T_STAGE_CSV;
Pull files to your client.
GET @TEST.TEST.T_STAGE_CSV/ file:///Users/fieldy/Documents/_sql_code/data;
Concatenate the files together externally
e.g.
cat /Users/fieldy/Documents/_sql_code/data/data* > /Users/fieldy/Documents/_sql_code/datadata.json
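As an alternative to concatenating the files client-side, a hedged option: COPY INTO a stage accepts SINGLE = TRUE together with a larger MAX_FILE_SIZE (up to 5 GB when unloading to cloud storage), which may avoid the split entirely. Something like:
copy into @TEST.TEST.T_STAGE_CSV from (Select JSON_LINE from V)
single = true max_file_size = 4900000000;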
I have a SQL table with discontinuous dates:
CREATE TABLE IF NOT EXISTS date_test1 ( items CHAR ( 8 ), trade_date date );
INSERT INTO `date_test1` VALUES ( 'a', '2020-03-20');
INSERT INTO `date_test1` VALUES ( 'b', '2020-03-20');
INSERT INTO `date_test1` VALUES ('a', '2020-03-21');
INSERT INTO `date_test1` VALUES ( 'c', '2020-03-22');
INSERT INTO `date_test1` VALUES ( 'd', '2020-03-22');
INSERT INTO `date_test1` VALUES ('a', '2020-03-25');
INSERT INTO `date_test1` VALUES ( 'e', '2020-03-26');
In this table, '2020-03-23' and '2020-03-24' are missing. I want to fill them with their previous day's data - in this table, the '2020-03-22' rows.
Expected result: the table above plus the rows ('c', '2020-03-23'), ('d', '2020-03-23'), ('c', '2020-03-24') and ('d', '2020-03-24').
The number of consecutive missing dates and the number of records in one day are both uncertain.
So how can I do this in MySQL?
This solution uses Python and assumes that there aren't so many rows that they cannot be read into memory. I do not warrant this code free from defects; use at your own risk. So I suggest you run this against a copy of your table or make a backup first.
This code uses the pymysql driver.
import pymysql
from datetime import timedelta
from itertools import groupby
import sys

conn = pymysql.connect(db='x', user='x', password='x', charset='utf8mb4', use_unicode=True)
cursor = conn.cursor()
# must be sorted by date:
cursor.execute('select items, trade_date from date_test1 order by trade_date, items')
rows = cursor.fetchall()  # tuples of (str, datetime.date)
if len(rows) == 0:
    sys.exit(0)

# group the rows by trade_date
groups = []
for k, g in groupby(rows, key=lambda row: row[1]):
    groups.append(list(g))

one_day = timedelta(days=1)
previous_group = groups.pop(0)
next_date = previous_group[0][1]
for group in groups:
    next_date = next_date + one_day
    while group[0][1] != next_date:
        # missing date: re-insert the previous day's rows under the missing date
        for prev_row in previous_group:
            cursor.execute('insert into date_test1(items, trade_date) values(%s, %s)', (prev_row[0], next_date))
            print('inserting', prev_row[0], next_date)
        conn.commit()
        next_date = next_date + one_day
    previous_group = group
Prints:
inserting c 2020-03-23
inserting d 2020-03-23
inserting c 2020-03-24
inserting d 2020-03-24
Discussion
With your sample data, after the rows are fetched, rows is:
(('a', datetime.date(2020, 3, 20)), ('b', datetime.date(2020, 3, 20)), ('a', datetime.date(2020, 3, 21)), ('c', datetime.date(2020, 3, 22)), ('d', datetime.date(2020, 3, 22)), ('a', datetime.date(2020, 3, 25)), ('e', datetime.date(2020, 3, 26)))
After the following is run:
groups = []
for k, g in groupby(rows, key=lambda row: row[1]):
    groups.append(list(g))
groups is:
[[('a', datetime.date(2020, 3, 20)), ('b', datetime.date(2020, 3, 20))], [('a', datetime.date(2020, 3, 21))], [('c', datetime.date(2020, 3, 22)), ('d', datetime.date(2020, 3, 22))], [('a', datetime.date(2020, 3, 25))], [('e', datetime.date(2020, 3, 26))]]
That is, all the tuples with the same date are grouped together in a list, so it becomes easier to detect missing dates.
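For completeness, since the question asks how to do this in MySQL: here is a set-based sketch of the same fill in pure SQL. This is my addition, and it assumes MySQL 8+ for the recursive CTE (the question does not state a version). It generates every calendar day between the first and last trade_date, then copies each missing day's rows from the most recent earlier day.
INSERT INTO date_test1 (items, trade_date)
WITH RECURSIVE all_days AS (
  -- every calendar day between the first and last trade_date
  SELECT MIN(trade_date) AS d FROM date_test1
  UNION ALL
  SELECT d + INTERVAL 1 DAY FROM all_days
  WHERE d < (SELECT MAX(trade_date) FROM date_test1)
)
SELECT t.items, a.d
FROM all_days a
JOIN date_test1 t
  -- take the rows of the latest existing day at or before each gap
  ON t.trade_date = (SELECT MAX(trade_date) FROM date_test1 WHERE trade_date <= a.d)
WHERE NOT EXISTS (SELECT 1 FROM date_test1 WHERE trade_date = a.d);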
I have the following (pseudo)JSON in a JSON-typed (LONGTEXT) column in my MariaDB 10.2:
{"order":
{"otherstuff":...},
{"dates":
[
{
"typeId":2,
"date":"2019-05-21 09:00:00"
},
{
"typeId":4,
"date":"2019-05-21 10:00:00"
}
]
}
}
What I need is the order's date, where I already know which typeId I need (4).
An order can have a number of dates identified by their typeId. typeId 4 is not always in second position.
SELECT JSON_UNQUOTE(JSON_SEARCH(`json`, 'one', 4, NULL, '$.dates[*].typeId'))
// gives me: $.dates[1].typeId
My first thought now was to REPLACE typeId with date, but that complains about mixed collations.
How would I (more elegantly) reference the 'date' value here?
Also, the query is supposed to be the expression of a GENERATED column in my table. Since a typeId 4 date is not necessarily there for every order, I tried this:
SELECT IF(4 IN (JSON_EXTRACT(json, '$.dates[*].typeId')), 'yes', 'no')
// above condition evaluates to [2, 4]
I have trimmed away '[' and ']' but then it only gives me a 'yes' if 4 is first in the array (is it an array?).
So (without brackets):
[4, 7] -> yes
[2, 4] -> no
I'm assuming this doesn't get recognized as an array of values but as a string. Then why does it give me 'yes' if my needle is in first position?
Instead of yes and no I obviously want to use the date and NULL.
The MySQL JSON functions are quite new to me. So maybe someone could point me in the right direction?
Try:
Option 1:
SELECT
JSON_UNQUOTE(
JSON_EXTRACT(
`json`,
REPLACE(
JSON_UNQUOTE(
JSON_SEARCH(
`json`,
'one',
4,
NULL,
'$.order.dates[*].typeId'
)
),
'typeId',
'date'
)
)
) `date`;
Option 2:
SELECT
IF(
JSON_CONTAINS(
JSON_EXTRACT(
`json`,
'$.order.dates[*].typeId'
),
4
),
'yes',
'no'
) `exists`;
See dbfiddle.
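And since the goal is a generated column: here is a hedged sketch wiring Option 1 into the table definition (the table name `orders` and column name `date_id4` are my assumptions; the expression yields NULL when there is no typeId 4 entry, which is what you want for orders without that date; use VARCHAR instead of DATETIME if you would rather avoid the implicit cast):
ALTER TABLE `orders`
  ADD COLUMN `date_id4` DATETIME AS (
    JSON_UNQUOTE(
      JSON_EXTRACT(
        `json`,
        REPLACE(
          JSON_UNQUOTE(
            JSON_SEARCH(`json`, 'one', 4, NULL, '$.order.dates[*].typeId')
          ),
          'typeId',
          'date'
        )
      )
    )
  ) VIRTUAL;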
Fair warning: just a heads up that I may have mixed MySQL and SQL Server syntax. I work with SQL Server at work (aka, right now) and then use MySQL for side projects, so I may not have used proper syntax on everything (the query's intent should be correct), so please bear with me lol
I have a database consisting of prices for various parts aggregated by day (PartPrices). I want to run a query to see which parts grew in price the most and which declined in price the most over a date range. This date range could be from a start date to now or a start date to any period in time.
The schema looks as follows:
PartID | DateTime (Time stamp from start of day) | Price
Essentially what I have is as follows:
DECLARE @StartDate AS DATETIMEOFFSET
DECLARE @EndDate AS DATETIMEOFFSET
WITH LastPartPrices AS (
SELECT *
FROM
(SELECT
PartID
,DateTime
,Price
FROM PartPrices
WHERE PartPrices.DateTime <= @EndDate
GROUP BY `PartID`, DateTime DESC, Price) t
GROUP BY `PartID`),
HighestPartPrices AS (
SELECT *
FROM
(SELECT
PartID
,DateTime
,Price
FROM PartPrices
WHERE PartPrices.DateTime BETWEEN @StartDate AND @EndDate
GROUP BY `PartID`, Price DESC, DateTime) t
GROUP BY `PartID`),
LowestPartPrices AS (
SELECT *
FROM
(SELECT
PartID
,DateTime
,Price
FROM PartPrices
WHERE PartPrices.DateTime BETWEEN @StartDate AND @EndDate
GROUP BY `PartID`, Price ASC, DateTime) t
GROUP BY `PartID`)
SELECT
Parts.ID
,Parts.Name
,Parts.Description
,LastPartPrices.Price AS LastPrice
,( (LastPartPrices.Price - HighestPartPrices.Price) / HighestPartPrices.Price ) AS HighCurrentDifference
,( (LastPartPrices.Price - LowestPartPrices.Price) / LowestPartPrices.Price ) AS LowCurrentDifference
FROM Parts
INNER JOIN LastPartPrices ON Parts.ID = LastPartPrices.PartID
INNER JOIN HighestPartPrices ON Parts.ID = HighestPartPrices.PartID
INNER JOIN LowestPartPrices ON Parts.ID = LowestPartPrices.PartID
I don't feel this query is optimized which is why I'm reaching out to the SO community for input. If you think I should handle this data differently, I'm open to suggestions as well. Thanks in advance. Here is some raw sample data from PartPrices (keep in mind, these are exaggerated to avoid having to add a ton of data):
{1, '2016-03-01T00:00:00+00:00', 150.40 },
{1, '2016-03-02T00:00:00+00:00', 170.50 },
{1, '2016-03-03T00:00:00+00:00', 160.00 },
{2, '2016-03-01T00:00:00+00:00', 80.30 },
{2, '2016-03-02T00:00:00+00:00', 100.00 },
{2, '2016-03-03T00:00:00+00:00', 120.00 },
{3, '2016-03-01T00:00:00+00:00', 10.50 },
{3, '2016-03-02T00:00:00+00:00', 20.10 },
{3, '2016-03-03T00:00:00+00:00', 30.00 }
What I would expect is to get the following:
{ 'ID', 'Name', 'Description', 'LastPrice', 'HighCurrentDifference', 'LowCurrentDifference' }
{ 1, 'Advil', 'Pain Killer', 160.00, -0.06, 0.06 },
{ 2, 'Bud Light', 'Beer', 120.00, 0.2, 0.49 },
{ 3, 'XBox One', 'Game Console', 30.00, 0.49, 1.85 }
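For what it's worth, here is a hedged sketch of how this is often written with window functions instead of the three GROUP BY subqueries (assuming SQL Server, going by the DECLARE/DATETIMEOFFSET syntax, and taking the last, highest and lowest prices all within the same @StartDate-@EndDate range, whereas the original bounds the last price only from above):
WITH Ranked AS (
    SELECT
        PartID,
        Price,
        ROW_NUMBER() OVER (PARTITION BY PartID ORDER BY [DateTime] DESC) AS rn,
        MAX(Price) OVER (PARTITION BY PartID) AS HighPrice,
        MIN(Price) OVER (PARTITION BY PartID) AS LowPrice
    FROM PartPrices
    WHERE [DateTime] BETWEEN @StartDate AND @EndDate
)
SELECT
    Parts.ID,
    Parts.Name,
    Parts.Description,
    Ranked.Price AS LastPrice,
    (Ranked.Price - Ranked.HighPrice) / Ranked.HighPrice AS HighCurrentDifference,
    (Ranked.Price - Ranked.LowPrice) / Ranked.LowPrice AS LowCurrentDifference
FROM Parts
INNER JOIN Ranked ON Parts.ID = Ranked.PartID AND Ranked.rn = 1;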