I wrote the following MySQL query:
select trade_dt,
ticker_id,
settle_price,
volume
from hist
where volume > 0 and trade_dt between '06/22/2011' and '06/30/2011';
But unfortunately, it returns dates outside that window:
+------------+------------+--------------+--------+
| trade_dt | ticker_id | settle_price | volume |
+------------+------------+--------------+--------+
| 06/23/2006 | N (Jul 06) | 156.900000 | 90 |
| 06/26/2006 | N (Jul 06) | 155.600000 | 63 |
| 06/27/2006 | N (Jul 06) | 159.300000 | 79 |
| 06/28/2006 | N (Jul 06) | 159.600000 | 57 |
| 06/29/2006 | N (Jul 06) | 143.400000 | 511 |
| 06/30/2006 | N (Jul 06) | 140.200000 | 342 |
| 06/23/2005 | V (Oct 05) | 151.200000 | 61 |
| 06/23/2011 | U (Sep 11) | 22.500000 | 6284 |
| 06/24/2011 | U (Sep 11) | 23.100000 | 4505 |
| 06/27/2011 | U (Sep 11) | 22.650000 | 3118 |
| 06/28/2011 | U (Sep 11) | 22.100000 | 3707 |
| 06/29/2011 | U (Sep 11) | 21.500000 | 5830 |
| 06/30/2011 | U (Sep 11) | 20.750000 | 9207 |
| 06/23/2008 | F (Jan 09) | 23.260000 | 2 |
I wonder if that is because trade_dt is defined as a string in the hist table. EDIT: I have changed the column from char(10) to date:
desc hist;
+-----------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+----------------+
| futures_id | int(11) | NO | PRI | NULL | auto_increment |
| trade_dt | date | NO | | NULL | |
| ticker_id | varchar(46) | NO | MUL | NULL | |
| settle_price | decimal(10,6) | NO | | NULL | |
| change_in_price | decimal(10,6) | NO | | NULL | |
| volume | bigint(11) | NO | | NULL | |
| open_int | bigint(11) | NO | | NULL | |
+-----------------+---------------+------+-----+---------+----------------+
How do I fix my date problem?
OK, I changed the trade_dt field to date instead of char(10), but now when I run the statement below to load the data into the DB, it inserts blank dates, apparently because of how the dates are formatted in the data file:
LOAD DATA LOCAL INFILE '$fn' INTO TABLE $tn FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n' (trade_dt,ticker_id,settle_price,change_in_price, volume, open_int);
This is a sample of the raw data:
03/30/2012,Z (Dec 12),25.81,25.81,25.50,25.70,25.60,-0.45,24,0,318
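As you suspected, it's because the column is a string type, so BETWEEN compares the values character by character rather than as dates. You could get the correct result by converting the column to a date inside the query (e.g. with STR_TO_DATE), but that is very inefficient: the conversion has to run on every row, and an index on trade_dt cannot be used. The only sound fix is to change the type of the trade_dt column in the table definition.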
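For a table that still stores the MM/DD/YYYY strings, here is a minimal migration sketch. It assumes every existing value is in MM/DD/YYYY form (as in the output above); trade_dt_new and idx_trade_dt are hypothetical names. A plain ALTER ... MODIFY to DATE would not interpret that format correctly, so the values are converted with STR_TO_DATE first:

```sql
-- 1. Find values that do not parse as MM/DD/YYYY (ideally returns no rows).
SELECT futures_id, trade_dt
FROM hist
WHERE STR_TO_DATE(trade_dt, '%m/%d/%Y') IS NULL;

-- 2. Add a DATE column (hypothetical name) and fill it from the string column.
ALTER TABLE hist ADD COLUMN trade_dt_new DATE NULL AFTER trade_dt;
UPDATE hist SET trade_dt_new = STR_TO_DATE(trade_dt, '%m/%d/%Y');

-- 3. Replace the old string column with the new DATE column.
ALTER TABLE hist DROP COLUMN trade_dt;
ALTER TABLE hist CHANGE COLUMN trade_dt_new trade_dt DATE NOT NULL;

-- Optional: an index that the BETWEEN range filter can use.
ALTER TABLE hist ADD INDEX idx_trade_dt (trade_dt);
```

Run step 1 before the UPDATE: in strict SQL mode an unparseable value can make the UPDATE fail rather than just store NULL.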
select trade_dt, ticker_id, settle_price, volume from
hist where volume > 0 and trade_dt between '2011-06-22' and '2011-06-30';
MySQL retrieves and displays DATE values in 'YYYY-MM-DD' format.
The problem really is the string field type.
You have two options:
Change field to date (preferred)
Store string dates in 'YYYY/MM/DD' format
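To see why the original query matched 2006 rows, compare the literals directly. These are plain string comparisons, so the results follow from character-by-character ordering:

```sql
-- Compared as strings, a 2006 date falls "between" the 2011 bounds:
SELECT '06/23/2006' BETWEEN '06/22/2011' AND '06/30/2011';  -- 1
-- With the year first, string order matches date order:
SELECT '2006/06/23' BETWEEN '2011/06/22' AND '2011/06/30';  -- 0
-- Comparing real DATE values also works:
SELECT STR_TO_DATE('06/23/2006', '%m/%d/%Y')
       BETWEEN '2011-06-22' AND '2011-06-30';               -- 0
```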
If the dates you receive when saving data are in dd/mm/yyyy form, just explode the string and save it in the format MySQL accepts (YYYY-MM-DD):
$date = '26/07/2001';
$data= explode("/",$date);
$dateField = $data[2]."-".$data[1]."-".$data[0];
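Then save $dateField into the DATE column. For the mm/dd/yyyy dates in the question, swap the indices: $dateField = $data[2]."-".$data[0]."-".$data[1];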
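Alternatively, MySQL can do the conversion during the load itself: read the first field into a user variable and convert it in the SET clause. This is a sketch that has not been tested against your file; $fn and $tn are the PHP placeholders from the question, and the column list is copied from it unchanged:

```sql
LOAD DATA LOCAL INFILE '$fn'
INTO TABLE $tn
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n'
-- The date field goes into a user variable instead of straight into trade_dt.
(@trade_dt, ticker_id, settle_price, change_in_price, volume, open_int)
-- Convert MM/DD/YYYY text into a proper DATE value.
SET trade_dt = STR_TO_DATE(@trade_dt, '%m/%d/%Y');
```

Note that fields are assigned to the listed columns by position. The sample line has 11 fields while the list names 6, so only the first 6 fields are used. To skip a field, put a throwaway user variable such as @dummy in its position in the list.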
Related
I'm Marc, and I'm new to coding and databases. For my studies I need to create a table of geographical data collected by GPS, and it has to be done in MySQL 5. After importing the measurements from a .csv file, I came up with the following table:
+-----------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+----------+------+-----+---------+-------+
| meting_nr | int(11) | NO | PRI | 0 | |
| y_coord | double | YES | | NULL | |
| x_coord | double | YES | | NULL | |
| height | double | YES | | NULL | |
| type | char(40) | YES | | NULL | |
| type_nr | int(11) | YES | | NULL | |
| pt | point | YES | | NULL | |
+-----------+----------+------+-----+---------+-------+
7 rows in set (0.00 sec)
I determined the minimum and maximum coordinates using the following query:
select meting_nr, astext(pt) from gps where (x_coord = (select min(x_coord) from gps)) or (x_coord = (select max(x_coord) from gps)) or (y_coord = (select min(y_coord) from gps)) or (y_coord = (select max(y_coord) from gps));
This results in the following points:
+-----------+--------------------------------+
| meting_nr | astext(pt) |
+-----------+--------------------------------+
| 101 | POINT(138235.3123 452751.2959) |
| 104 | POINT(138238.6632 452749.3718) |
| 161 | POINT(138207.704 452714.8049) |
| 190 | POINT(138197.9728 452715.1304) |
+-----------+--------------------------------+
I want an MBR (minimum bounding rectangle) around ALL of these points, but with the following query I get an MBR around each separate point:
select meting_nr, astext(envelope(pt)) from gps where (x_coord = (select min(x_coord) from gps)) or (x_coord = (select max(x_coord) from gps)) or (y_coord = (select min(y_coord) from gps)) or (y_coord = (select max(y_coord) from gps));
Resulting in:
+-----------+------------------------------------------------------------------------------------------------------------------------------------+
| meting_nr | astext(envelope(pt)) |
+-----------+------------------------------------------------------------------------------------------------------------------------------------+
| 101 | POLYGON((138235.3123 452751.2959,138235.3123 452751.2959,138235.3123 452751.2959,138235.3123 452751.2959,138235.3123 452751.2959)) |
| 104 | POLYGON((138238.6632 452749.3718,138238.6632 452749.3718,138238.6632 452749.3718,138238.6632 452749.3718,138238.6632 452749.3718)) |
| 161 | POLYGON((138207.704 452714.8049,138207.704 452714.8049,138207.704 452714.8049,138207.704 452714.8049,138207.704 452714.8049)) |
| 190 | POLYGON((138197.9728 452715.1304,138197.9728 452715.1304,138197.9728 452715.1304,138197.9728 452715.1304,138197.9728 452715.1304)) |
+-----------+------------------------------------------------------------------------------------------------------------------------------------+
What am I doing wrong?
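Envelope() works on one geometry value per row, so applied to single points it returns a degenerate polygon for each point, as in the output above. To get one rectangle around all points, the coordinates have to be aggregated first. A minimal sketch, assuming the MySQL 5.x function names used in the question (MySQL 8.0 requires the ST_-prefixed versions: ST_AsText, ST_Envelope, ST_GeomFromText, ST_X, ST_Y): build a two-point LINESTRING from the lower-left and upper-right corners and take its envelope. It uses every row, so the subqueries that pick out the extreme points are not needed:

```sql
SELECT AsText(Envelope(GeomFromText(CONCAT(
           'LINESTRING(',
           MIN(X(pt)), ' ', MIN(Y(pt)), ',',   -- lower-left corner
           MAX(X(pt)), ' ', MAX(Y(pt)),        -- upper-right corner
           ')')))) AS mbr
FROM gps
WHERE pt IS NOT NULL;
```

With the extreme points listed above, the corners of the rectangle are (138197.9728 452714.8049) and (138238.6632 452751.2959).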
We have hit the maximum row size because the table has too many columns.
I've found a way to list the mediumtext columns together with the maximum number of characters stored in each; I intend to convert them to varchar(200) or smaller to reduce the footprint.
I would like to refine this query further so that it only shows the mediumtext columns whose longest value is under 255 characters.
SELECT CONCAT(
GROUP_CONCAT(
CONCAT('(SELECT \'',COLUMN_NAME,'\' AS `column`,MAX(CHAR_LENGTH(`',COLUMN_NAME,'`)) AS `max_length` ','FROM`',TABLE_SCHEMA,'`.`',TABLE_NAME,'` ORDER BY `max_length` DESC LIMIT 1)')
SEPARATOR ' UNION ALL '
), ';'
) AS _SQL
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'myTable'
AND COLUMN_TYPE='mediumtext';
I've been replacing the final ; with ORDER BY max_length DESC; to get what I need and then eliminating the rows I don't want by hand. However, I'd like to tidy it up.
This is the output I'm getting right now.
+--------------------------+------------+
| column | max_length |
+--------------------------+------------+
| a_str_7 | 291 |
| description | 268 |
| a_str_8 | 160 |
| a_str_9 | 93 |
| close_notes | 46 |
| a_str_6 | 2 |
| comments | NULL |
| work_notes | NULL |
| group_list | NULL |
| a_str_10 | NULL |
| comments_and_work_notes | NULL |
| additional_assignee_list | NULL |
| user_input | NULL |
| a_str_11 | NULL |
| work_notes_list | NULL |
| approval_history | NULL |
| watch_list | NULL |
| a_str_12 | NULL |
| mgt_only | NULL |
+--------------------------+------------+
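One way to do the filtering in SQL is to wrap the generated UNION in an outer query with a WHERE clause and run it through a prepared statement instead of copying the text. This sketch assumes the table is in the current database and that all-NULL columns should be kept, since NULL means nothing is stored. group_concat_max_len is raised because its default of 1024 characters is too short for this many columns and would truncate the generated SQL:

```sql
SET SESSION group_concat_max_len = 1000000;

SELECT CONCAT(
    'SELECT * FROM (',
    GROUP_CONCAT(
        CONCAT('SELECT \'', COLUMN_NAME, '\' AS `column`, ',
               'MAX(CHAR_LENGTH(`', COLUMN_NAME, '`)) AS `max_length` ',
               'FROM `', TABLE_SCHEMA, '`.`', TABLE_NAME, '`')
        SEPARATOR ' UNION ALL '
    ),
    ') AS lengths ',
    'WHERE IFNULL(`max_length`, 0) < 255 ',   -- also keeps all-NULL columns
    'ORDER BY `max_length` DESC'
) INTO @sql
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()               -- assumption: current database
  AND TABLE_NAME = 'myTable'
  AND COLUMN_TYPE = 'mediumtext';

-- Run the generated statement directly.
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```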
I have the following query:
SELECT final_query.chr
, final_query.start
, final_query.end
, co.chr
, co.start
, co.end
, final_query.count
FROM (SELECT ed.chr
, ed.start
, ed.end
, case when e.bin1=ed.bin then e.bin2 else e.bin1 end AS target
, count
FROM (SELECT * FROM coordinates
WHERE chr="chr1" AND (start between 3960000 AND 4000000 OR end between 3960000 AND 4000000)
) ed
JOIN counts e ON (e.bin1 = ed.bin OR e.bin2=ed.bin)
SORT BY count LIMIT 1,20)
AS final_query
JOIN coordinates co ON final_query.target=co.bin;
The output of EXPLAIN is:
+------+-------------+-------------+--------+---------------+---------+---------+-------+----------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------------+--------+---------------+---------+---------+-------+----------+------------------------------------+
| 1 | SIMPLE | e | ALL | bin1,bin2 | NULL | NULL | NULL | 30763816 | Using filesort |
| 1 | SIMPLE | coordinates | ref | PRIMARY,chr | chr | 22 | const | 4929 | Using index condition; Using where |
| 1 | SIMPLE | co | eq_ref | PRIMARY | PRIMARY | 22 | func | 1 | Using where |
+------+-------------+-------------+--------+---------------+---------+---------+-------+----------+------------------------------------+
First, I query the coordinates table, whose chr field is indexed; the subquery below filters the rows that match my conditions:
... (SELECT * FROM coordinates
WHERE chr="chr1" AND (start between 3960000 AND 4000000 OR end between 3960000 AND 4000000)
) ...
This subquery outputs the bin field, which is also indexed. bin links to both bin1 and bin2 in the counts table, which are indexed as well. So in this step I want every row of counts whose bin1 or bin2 equals coordinates.bin. Why is no index used in this step?
Besides that, I would like to add an ORDER BY just before the LIMIT clause, but it slows the query down far too much. I don't understand why, because it has to sort at most 4000 rows...
How can I optimize my query?
My tables, from the DESCRIBE statement:
Table counts
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| bin1 | varchar(20) | NO | MUL | NULL | |
| bin2 | varchar(20) | NO | MUL | NULL | |
| count | float(6,2) | NO | | NULL | |
+-------+-------------+------+-----+---------+----------------+
Table coordinates
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| bin | varchar(20) | NO | PRI | NULL | |
| chr | varchar(20) | NO | MUL | NULL | |
| start | int(11) | NO | | NULL | |
| end | int(11) | NO | | NULL | |
+-------+-------------+------+-----+---------+-------+
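The EXPLAIN output (type ALL on e, with bin1 and bin2 only listed as possible keys) is consistent with the OR in the join condition: a single index on bin1 or on bin2 cannot serve both branches, so MySQL may scan all of counts. A common workaround is to split the OR into two joins combined with UNION ALL, so each branch can look up its own index; once the join is index-driven, the ORDER BY only has to sort the matched rows. This is a sketch under two assumptions: you want the 20 highest counts (the question's SORT BY ... LIMIT 1,20 also skipped the first row), and the rewrite still needs to be checked with EXPLAIN on your data:

```sql
SELECT fq.chr, fq.start, fq.`end`,
       co.chr, co.start, co.`end`,
       fq.`count`
FROM (
    -- Branch 1: ed.bin matches bin1, so the partner bin is bin2.
    SELECT ed.chr, ed.start, ed.`end`, e.bin2 AS target, e.`count`
    FROM coordinates AS ed
    JOIN counts AS e ON e.bin1 = ed.bin
    WHERE ed.chr = 'chr1'
      AND (ed.start BETWEEN 3960000 AND 4000000
           OR ed.`end` BETWEEN 3960000 AND 4000000)
    UNION ALL
    -- Branch 2: ed.bin matches bin2 only, so the partner bin is bin1.
    -- The original CASE picks bin2 whenever bin1 = ed.bin, so those rows
    -- are excluded here to avoid returning them twice.
    SELECT ed.chr, ed.start, ed.`end`, e.bin1 AS target, e.`count`
    FROM coordinates AS ed
    JOIN counts AS e ON e.bin2 = ed.bin AND e.bin1 <> ed.bin
    WHERE ed.chr = 'chr1'
      AND (ed.start BETWEEN 3960000 AND 4000000
           OR ed.`end` BETWEEN 3960000 AND 4000000)
    -- Unparenthesized, ORDER BY/LIMIT apply to the whole UNION result.
    ORDER BY `count` DESC   -- assumption: highest counts first
    LIMIT 20
) AS fq
JOIN coordinates AS co ON co.bin = fq.target;
```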
I have the following tables:
**visitors**
+---------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+--------------+------+-----+---------+----------------+
| visitors_id | int(11) | NO | PRI | NULL | auto_increment |
| visitors_path | varchar(255) | NO | | | |
+---------------------+--------------+------+-----+---------+----------------+
**fedora_info**
+----------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| pid | varchar(255) | NO | PRI | | |
| owner_uid | int(11) | YES | | NULL | |
+----------------+--------------+------+-----+---------+-------+
First, I look for the visitors_path values that refer to specific pages with:
SELECT visitors_id, visitors_path
FROM visitors
WHERE visitors_path REGEXP '[[:<:]]fedora/repository/.*:[0-9]+$';
The above query returns the expected result.
The .*:[0-9]+ part of that pattern corresponds to pid in the second table. Now I want to count the rows returned by the above query, grouped by owner_uid from the second table.
How can I JOIN these tables?
EDIT
Sample data:
visitors
+-------------+---------------------------------+
| visitors_id | visitors_path |
+-------------+---------------------------------+
| 4574 | fedora/repository/islandora:123 |
| 4575 | fedora/repository/islandora:123 |
| 4580 | fedora/repository/islandora:321 |
| 4681 | fedora/repository/islandora:321 |
| 4682 | fedora/repository/islandora:321 |
| 4704 | fedora/repository/islandora:321 |
| 4706 | fedora/repository/islandora:456 |
| 4741 | fedora/repository/islandora:456 |
| 4743 | fedora/repository/islandora:789 |
| 4769 | fedora/repository/islandora:789 |
+-------------+---------------------------------+
fedora_info
+-----------------+-----------+
| pid | owner_uid |
+-----------------+-----------+
| islandora:123 | 1 |
| islandora:321 | 2 |
| islandora:456 | 3 |
| islandora:789 | 4 |
+-----------------+-----------+
Expected result:
+-----------------+-----------+
| count | owner_uid |
+-----------------+-----------+
| 2 | 1 |
| 4 | 2 |
| 3 | 3 |
| 2 | 4 |
| 0 | 5 |
+-----------------+-----------+
I suggest you normalize your database: when inserting rows into visitors, extract the pid in your front-end language and store it in a separate column (e.g. fi_pid). Then you can join on it easily.
In the meantime, the following query might work for you, but it will be somewhat CPU-intensive.
SELECT
COUNT(a.visitors_id) as `count`,
f.owner_uid
FROM (SELECT visitors_id,
visitors_path,
SUBSTRING(visitors_path, ( LENGTH(visitors_path) -
LOCATE('/', REVERSE(visitors_path)) )
+ 2) AS
pid
FROM visitors
WHERE visitors_path REGEXP '[[:<:]]fedora/repository/.*:[0-9]+$') AS `a`
JOIN fedora_info AS f
ON ( a.pid = f.pid )
GROUP BY f.owner_uid
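Here is a sketch of the normalization suggested above. It assumes the hypothetical column name fi_pid from the answer, typed like fedora_info.pid, and that the pid is always the last '/'-separated segment of visitors_path (true for the sample rows); SUBSTRING_INDEX(..., '/', -1) returns everything after the last '/'. The LEFT JOIN from fedora_info keeps owners whose pids have no visits, with a count of 0. An owner that does not appear in fedora_info at all (such as 5 in the expected result) can only be listed if it comes from some other table of owners:

```sql
-- Hypothetical column from the answer above, indexed so the join can use it.
ALTER TABLE visitors
    ADD COLUMN fi_pid VARCHAR(255) NULL,
    ADD INDEX idx_fi_pid (fi_pid);

-- One-off backfill of existing rows: the pid is the text after the last '/'.
-- ([[:<:]] works in MySQL 5.x; MySQL 8.0.4+ uses \\b instead.)
UPDATE visitors
SET fi_pid = SUBSTRING_INDEX(visitors_path, '/', -1)
WHERE visitors_path REGEXP '[[:<:]]fedora/repository/.*:[0-9]+$';

-- Counts per owner; COUNT(v.visitors_id) is 0 when no visitor row matches.
SELECT COUNT(v.visitors_id) AS `count`, f.owner_uid
FROM fedora_info AS f
LEFT JOIN visitors AS v ON v.fi_pid = f.pid
GROUP BY f.owner_uid;
```

With the sample rows shown, this gives 2, 4, 2 and 2 for owners 1 to 4. New rows still need fi_pid filled in when they are inserted, as the answer describes.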
The following query returns the expected result, but it is very slow (it took 9.67 sec):
SELECT COUNT(t2.pid), t1.owner_uid
FROM fedora_info t1
JOIN (SELECT TRIM(LEADING 'fedora/repository/' FROM visitors_path) as pid
FROM visitors
WHERE visitors_path REGEXP '[[:<:]]fedora/repository/.*:[0-9]+$') t2 ON t1.pid = t2.pid
GROUP BY t1.owner_uid
I am running a query that retrieves data from the past three months. The only problem is that some labels have no entries in certain months, and for those months I'd like to show 0.
My first thought was to create a temporary table and LEFT JOIN the labels I need against it, but that hasn't worked.
Can anyone think of a way to do this?
Example: I want the last 3 months of data, and I am getting:
'Component', 1325.1988
'Component', 554.1652
'Component', 103.6668
'Development', 203.4163
'Development', 59.4500
'Development', 19.7498
'Flash Assets', 285.5334
'Flash Assets', 302.1501
'Flash Assets', 61.1836
'Release', 0.6000
'Release', 2.3666
'Repackage', 416.2169
'Repackage', 5195.0839
'Repackage', 4.5667
'Source Diff', 1.9000
'Source Diff' and 'Release' don't have 3 entries.
Thanks
Query
SELECT bt.name as 'Labels',
SUM(TIME_TO_SEC(TIMEDIFF(bs.eventtime, b.submittime))/60) AS 'Data'
FROM builds b JOIN buildstatuses bs ON bs.buildid = b.id JOIN buildtypes bt
ON bt.id = b.buildtype WHERE DATE(b.submittime)
BETWEEN DATE_SUB(CURDATE(), INTERVAL 2 MONTH) AND DATE(CURDATE())
AND bs.status LIKE 'Started HANDLER' AND b.buildtype != 11
AND b.buildtype != 5 AND b.buildtype != 4 GROUP BY bt.name, MONTH(b.submittime);
Table Schema
builds
+---------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| submittime | datetime | NO | | NULL | |
| buildstatus | int(11) | NO | | NULL | |
| buildtype | varchar(20) | NO | | NULL | |
| buildid | int(11) | NO | | NULL | |
+---------------+------------------+------+-----+---------+----------------+
buildtypes
+---------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(200) | NO | | NULL | |
+---------------+------------------+------+-----+---------+----------------+
buildstatuses
+------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| buildid | int(11) | NO | MUL | NULL | |
| eventtime | datetime | NO | | NULL | |
+------------+----------+------+-----+---------+----------------+
Here are some similar questions:
How to get values for every day in a month
Group by day and still show days without rows?
MySQL: filling empty fields with zeroes when using GROUP BY
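The usual approach from those questions is to generate every (label, month) combination first and LEFT JOIN the aggregated data onto it, turning missing combinations into 0 with COALESCE. Here is a sketch against the schema above. It rests on three assumptions: "the last 3 months" means the current calendar month and the two before it (the original filter started exactly two months before today); the status column used in the original query exists on buildstatuses even though it is not in the schema shown; and a build type with no builds at all should still appear with zeros. Grouping by the first day of each month also avoids merging the same month from different years, which MONTH() alone would do:

```sql
SELECT bt.name              AS `Labels`,
       m.month_start,
       COALESCE(d.data, 0)  AS `Data`      -- 0 when the month has no rows
FROM buildtypes AS bt
CROSS JOIN (
    -- First day of the current month and of the two months before it.
    SELECT DATE_FORMAT(CURDATE(), '%Y-%m-01') AS month_start
    UNION ALL SELECT DATE_FORMAT(CURDATE() - INTERVAL 1 MONTH, '%Y-%m-01')
    UNION ALL SELECT DATE_FORMAT(CURDATE() - INTERVAL 2 MONTH, '%Y-%m-01')
) AS m
LEFT JOIN (
    -- Same aggregation as the original query, keyed by build type and month.
    SELECT b.buildtype,
           DATE_FORMAT(b.submittime, '%Y-%m-01') AS month_start,
           SUM(TIME_TO_SEC(TIMEDIFF(bs.eventtime, b.submittime)) / 60) AS data
    FROM builds AS b
    JOIN buildstatuses AS bs ON bs.buildid = b.id
    WHERE b.submittime >= DATE_FORMAT(CURDATE() - INTERVAL 2 MONTH, '%Y-%m-01')
      AND b.submittime < CURDATE() + INTERVAL 1 DAY
      AND bs.status LIKE 'Started HANDLER'
    GROUP BY b.buildtype, month_start
) AS d ON d.buildtype = bt.id AND d.month_start = m.month_start
WHERE bt.id NOT IN (4, 5, 11)               -- same exclusions as the original
ORDER BY bt.name, m.month_start;
```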