I have a table with just one row and one column which stores a JSON array with about 30MB/16k objects in it:
CREATE TABLE [dbo].[CitiesTable]
(
[CitiesJson] [NVARCHAR](MAX) NOT NULL
) ON [PRIMARY]
GO
INSERT INTO [dbo].[CitiesTable] ([CitiesJson])
VALUES ('{"cities":[{"cityName": "London","residentCount": 8961989},{"cityName": "Paris","residentCount": 2165423},{"cityName": "Berlin","residentCount": 3664088}]}')
I use this query to parse the JSON and bring it into a relational structure:
SELECT x.[CityName], x.[ResidentCount]
FROM
OPENJSON((SELECT [CitiesJson] FROM dbo.CitiesTable), '$.cities')
WITH
(
[CityName] [NVARCHAR] (50) '$.cityName',
[ResidentCount] [INT] '$.residentCount'
) AS x
Which yields:
CityName ResidentCount
---------- -------------
London 8961989
Paris 2165423
Berlin 3664088
I'd like to create a view for this so that I don't have to include the bulky query in several places.
But using this query inside a view has the downside that the JSON has to be parsed every time the view is queried... So I'm considering creating an Indexed View, which has the advantage that the view only has to be re-materialized when the underlying table data changes.
Unfortunately, an indexed view comes with quite a few prerequisites, one of them being that no subqueries are allowed.
Hence the view itself can be created...
CREATE VIEW dbo.Cities_IndexedView
WITH SCHEMABINDING
AS
SELECT x.[CityName], x.[ResidentCount]
FROM
OPENJSON((SELECT [CitiesJson] FROM dbo.CitiesTable), '$.cities')
WITH
(
[CityName] [NVARCHAR] (10) '$.cityName',
[ResidentCount] [INT] '$.residentCount'
) AS x
But the following index creation fails:
CREATE UNIQUE CLUSTERED INDEX Cities_IndexedView_ucidx
ON dbo.Cities_IndexedView([CityName]);
Cannot create index on view "MyTestDb.dbo.Cities_IndexedView" because it contains one or more subqueries. Consider changing the view to use only joins instead of subqueries. Alternatively, consider not indexing this view.
Is there any way to work around this? I don't know how to access the CitiesJson column within the OPENJSON without using a sub-select...
EDIT:
Zhorov had a nice idea to eliminate the subquery:
SELECT x.[CityName], x.[ResidentCount]
FROM [dbo].[CitiesTable] c
CROSS APPLY OPENJSON(c.[CitiesJson], '$.cities') WITH ([CityName] [NVARCHAR] (10) '$.cityName', [ResidentCount] [INT] '$.residentCount') AS x
But unfortunately APPLY can't be used in indexed views (see here):
Cannot create index on view "MyTestDb.dbo.Cities_IndexedView" because it contains an APPLY. Consider not indexing the view, or removing APPLY.
The additional requirements also state that OPENXML and table-valued functions aren't allowed either. So I guess OPENJSON just isn't mentioned in the docs yet, but isn't allowed as well :-(
Locally I use SQL Server 2016. I created a db fiddle over here which uses SQL Server 2019. And yep, OPENJSON just seems to be impossible to use:
Cannot create index on the view 'fiddle_cf57a9b555f74ea1ada4c5d0d277cf95.dbo.Cities_IndexedView' because it uses OPENJSON.
An indexed view can only be built from a query that maps the original data directly onto the target view (a surjection), which is never the case when the query contains subqueries, outer joins, or UNION, INTERSECT or EXCEPT.
You must use different logic, such as a target table maintained by a trigger.
As for the table structure, just use SELECT INTO to create the "snapshot" table, with or without a formal primary key:
SELECT IDENTITY(INT, 1, 1) AS JSON_ID, x.[CityName], x.[ResidentCount]
INTO MySQL_Schema.MyJSON_Data
FROM
OPENJSON((SELECT [CitiesJson] FROM dbo.CitiesTable), '$.cities')
WITH
(
[CityName] [NVARCHAR] (10) '$.cityName',
[ResidentCount] [INT] '$.residentCount'
) AS x
Then make JSON_ID a primary key:
ALTER TABLE MySQL_Schema.MyJSON_Data ADD PRIMARY KEY (JSON_ID);
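To round this out, here is a minimal sketch of the trigger half (the trigger name and the rebuild-everything strategy are mine; a full rebuild is acceptable here because the source table holds a single row):
CREATE TRIGGER dbo.trg_CitiesTable_RefreshSnapshot
ON dbo.CitiesTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- Full rebuild: clear the snapshot, then re-shred the JSON
    DELETE FROM MySQL_Schema.MyJSON_Data;
    INSERT INTO MySQL_Schema.MyJSON_Data ([CityName], [ResidentCount])
    SELECT x.[CityName], x.[ResidentCount]
    FROM dbo.CitiesTable c
    CROSS APPLY OPENJSON(c.[CitiesJson], '$.cities')
    WITH ([CityName] [NVARCHAR](10) '$.cityName', [ResidentCount] [INT] '$.residentCount') AS x;
END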
Related
This may seem like a dumb question. I want to set up a SQL database with records containing numbers. I would like to run a query to select a group of records, do some basic arithmetic on the numbers in that group, and then save the results to a different table, still linked by a foreign key to the original records. Is that possible in SQL without taking the data to another application and then importing it back? If so, what is the basic function/procedure to complete this action?
I'm coming from an Excel/macro/basic Python background and want to investigate whether it's worth the switch to SQL.
PS. I'm wanting to stay open source.
A tiny example using PostgreSQL (9.6):
-- Create tables
CREATE TABLE initialValues(
id serial PRIMARY KEY,
value int
);
CREATE TABLE addOne(
id serial,
id_init_val int REFERENCES initialValues(id),
value int
);
-- Init values
INSERT INTO initialValues(value)
SELECT a.n
FROM generate_series(1, 100) as a(n);
-- Insert values into the second table by selecting them
-- from the first one.
WITH init_val as (SELECT i.id,i.value FROM initialValues i)
INSERT INTO addOne(id_init_val,value)
(SELECT id,value+1 FROM init_val);
In MySQL you can use CREATE TABLE ... SELECT (https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html)
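For instance, a rough MySQL counterpart to the addOne example above might look like this (a sketch; CREATE TABLE ... SELECT copies data but not constraints, so the foreign key is added afterwards):
-- Build the derived table straight from a query
CREATE TABLE addOne
SELECT id AS id_init_val, value + 1 AS value
FROM initialValues;

-- Re-establish the link to the original records
ALTER TABLE addOne
ADD CONSTRAINT fk_addOne_initialValues
FOREIGN KEY (id_init_val) REFERENCES initialValues(id);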
Subquery in the IN clause:
SELECT * FROM TABLE1 WHERE Field1 IN (SELECT Field1 FROM TABLE2)
Literals in the IN clause:
SELECT * FROM TABLE1 WHERE Field1 IN (1,2,3,4)
Which query is better?
Addendum
OK, let me elaborate on my database
-- `BOARD` is main board table
CREATE TABLE BOARD (
BoardKey INT UNSIGNED,
Content TEXT,
PRIMARY KEY (BoardKey)
)
-- `VALUE` is extra value table
CREATE TABLE VALUE (
BoardKey INT UNSIGNED,
Value TEXT
)
This example searches board records using EAV fields: the first step extracts the needed board keys from the VALUE table, and the next step searches the BOARD table using the extracted keys.
This is just an example, so I don't need to restructure the table design.
Subquery in the IN clause:
SELECT * FROM BOARD WHERE BoardKey IN (SELECT BoardKey FROM VALUE WHERE Value='SOME')
Literals in the IN clause:
SELECT BoardKey FROM VALUE WHERE Value='SOME'
Get the list of BoardKey values, put it into some variable, and then:
SELECT * FROM BOARD WHERE BoardKey IN (1,2,3,4)
It all depends on your initial requirements. If you know the values (here 1,2,3,4) are static, you may hard-code them. But if they will change in the future, it is better to use the subquery. Normally a subquery is more durable but more resource-consuming.
Please elaborate on your requirements, so that we can understand the problem and answer you better.
EDIT 1:
OK, first of all, I have never seen an EAV model split across two tables; basically it is done with one table. In your case you will have difficulty searching for the key across two tables when you could combine them into one. Ideally, your table should be like this:
CREATE TABLE BOARD
(
BoardKey INT UNSIGNED,
Content TEXT,
Value TEXT,
PRIMARY KEY (BoardKey)
)
Finally, you can do
SELECT * FROM BOARD WHERE Value='SOME'
If the value 'SOME' will change in the future, better stick with the subquery. Hope it helped.
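As a side note, if lookups on Value are frequent, an index keeps the single-table query cheap. A sketch (the index name and the prefix length are mine; MySQL requires a prefix length when indexing a TEXT column):
-- Index a prefix of the TEXT column, then verify the plan
ALTER TABLE BOARD ADD INDEX idx_board_value (Value(100));
EXPLAIN SELECT * FROM BOARD WHERE Value = 'SOME';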
I have the following table in PostgreSQL:
CREATE TABLE index_test
(
id int PRIMARY KEY NOT NULL,
text varchar(2048) NOT NULL,
last_modified timestamp NOT NULL,
value int,
item_type varchar(2046)
);
CREATE INDEX idx_index_type ON index_test ( item_type );
CREATE INDEX idx_index_value ON index_test ( value );
I make the following selects:
explain select * from index_test r where r.item_type='B';
explain select r.value from index_test r where r.value=56;
The execution plan looks like this:
Seq Scan on index_test r (cost=0.00..1.04 rows=1 width=1576)
Filter: ((item_type)::text = 'B'::text)
As far as I understand, this is a full table scan. The question is: why are my indexes not used?
Maybe the reason is that I have too few rows in my table? I have only 20 of them. Could you please provide me with a SQL statement to easily populate my table with random data so I can check the index behavior?
I have found this article: http://it.toolbox.com/blogs/db2luw/how-to-easily-populate-a-table-with-random-data-7888, but it doesn't work for me. The efficiency of the statement does not matter, only the simplicity.
Maybe, the reason is that I have too few rows in my table?
Yes. For a total of 20 rows in a table a seq scan is always going to be faster than an index scan. Chances are that those rows are located in a single database block anyway, so the seq scan would only need a single I/O operation.
If you use
explain (analyze true, verbose true, buffers true) select ....
you can see a bit more details about what is really going on.
Btw: you shouldn't use text as a column name, as that is also a datatype in Postgres (and thus a keyword).
The example you found is for DB2; in Postgres you can use generate_series to do it, for example like this:
-- Assumes the "text" column has been renamed to "data" (per the note above)
-- and that ids 1..1000 are free; id must be supplied since it has no default.
INSERT INTO index_test(id, data, last_modified, value, item_type)
SELECT
    n, md5(random()::text), now(), floor(random()*100), md5(random()::text)
FROM generate_series(1,1000) AS s(n);
SELECT max(value) from index_test;
http://sqlfiddle.com/#!12/52641/3
The second query in the above fiddle should use an index-only scan.
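If you want to confirm that the seq scan on the 20-row table is purely a cost-based choice and not a broken index, you can temporarily discourage sequential scans (a diagnostic toggle only, not for production use):
SET enable_seqscan = off;  -- makes seq scans prohibitively expensive for the planner
EXPLAIN SELECT * FROM index_test r WHERE r.item_type = 'B';
SET enable_seqscan = on;   -- restore the default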
I'm just starting out with MySQL (I come from using SQL Server previously). I haven't yet started implementing anything in MySQL, just researching how to do things and what problems I might encounter.
In SQL Server I've used CTEs to successfully recurse through an adjacency list table structure to produce the desired result set. From what I can tell so far, MySQL does not support CTEs. I've got a fairly simple table structure to hold my hierarchy (written in SQL Server syntax because of my familiarity with it):
CREATE TABLE TreeNodes (
NodeId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
ParentNodeId int NULL,
Name varchar(50) NOT NULL,
FullPathName varchar(MAX) NOT NULL, -- '/' delimited names from root to current node
IsLeaf bit NOT NULL -- is this node a leaf?
)
Side Note: I realize that FullPathName and IsLeaf are not required and could be determined at query time, but the insert of a tree node will be a very uncommon occurrence as opposed to the queries against this table - which is why I plan to compute those two values as part of the insert SP (will make the queries that need those two values less costly).
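For illustration, a minimal sketch of what that insert SP might do, in the same SQL Server syntax (the procedure name is mine, and root-node handling is omitted):
CREATE PROCEDURE InsertTreeNode
    @ParentNodeId int,
    @Name varchar(50)
AS
BEGIN
    -- Derive FullPathName from the parent's path; a new node starts as a leaf
    INSERT INTO TreeNodes (ParentNodeId, Name, FullPathName, IsLeaf)
    SELECT @ParentNodeId, @Name, p.FullPathName + '/' + @Name, 1
    FROM TreeNodes p
    WHERE p.NodeId = @ParentNodeId;

    -- The parent now has a child, so it can no longer be a leaf
    UPDATE TreeNodes SET IsLeaf = 0 WHERE NodeId = @ParentNodeId;
END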
With CTE (in SQL Server), I would have a function like the following to find leaf nodes of current node:
CREATE FUNCTION fn_GetLeafNodesBelowNode (
@TreeNodeId int
)
RETURNS TABLE
AS
RETURN
WITH Tree (NodeId, Name, FullPathName, IsLeaf)
AS (
SELECT NodeId, Name, FullPathName, IsLeaf FROM TreeNodes WHERE NodeId = @TreeNodeId
UNION ALL
SELECT c.NodeId, c.Name, c.FullPathName, c.IsLeaf FROM Tree t
INNER JOIN TreeNodes c ON t.NodeId = c.ParentNodeId
)
SELECT * FROM Tree WHERE IsLeaf = 1
How would I do the same with MySQL?
Thanks in advance.
You can get it done with some sort of stored function and a bit of logic.
Here is one example.
Have a try.
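For what it's worth, MySQL 8.0 later added WITH RECURSIVE, so on a current server the SQL Server function body ports almost directly. A sketch (the start node is hard-coded here; wrap it in a stored procedure to parameterize it):
WITH RECURSIVE Tree (NodeId, Name, FullPathName, IsLeaf) AS (
    SELECT NodeId, Name, FullPathName, IsLeaf
    FROM TreeNodes WHERE NodeId = 1  -- starting node
    UNION ALL
    SELECT c.NodeId, c.Name, c.FullPathName, c.IsLeaf
    FROM Tree t
    INNER JOIN TreeNodes c ON t.NodeId = c.ParentNodeId
)
SELECT * FROM Tree WHERE IsLeaf = 1;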
This might be a basic question: I am using a temporary table in some of my php code like so:
CREATE TEMPORARY TABLE ttable( `d` DATE NOT NULL , `p` DECIMAL( 11, 2 ) NOT NULL , UNIQUE KEY `date` ( `d` ) );
INSERT INTO ttable( d, p ) VALUES ( '$d' , '$p' );
SELECT * FROM ttable;
As we scale up our site, will this ever be a problem? I.e., will user1's ttable and user2's ttable ever get mixed up, so that user1 sees user2's ttable and vice versa? Is it better to create a unique name for each temporary table?
thx
Temporary tables are session-specific. Every time you connect to a host (in PHP, this is done with mysql_connect), temporary tables that you create exist only within that session/connection.
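A quick way to convince yourself (a sketch; each connection below is a separate mysql_connect call):
-- Connection 1:
CREATE TEMPORARY TABLE ttable (d DATE NOT NULL);
SELECT * FROM ttable;  -- works: same session

-- Connection 2:
SELECT * FROM ttable;  -- fails: the table does not exist in this session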
It is almost always better to find a different way than using temporary tables.
The only time I would consider them is under the following conditions:
The activity is rare. Meaning, a given user MIGHT do this once a week.
It is used as a holding container prior to doing a regular full import of data.
It deals with data whose structure is unknown prior to being filled.
All three of those really go with building some type of generic bulk import routines where the data mapping is defined at run time.
If you find yourself creating temp tables frequently in the application, there's probably a better way.
Scalability is going to depend on the amount of data being loaded and frequency of temp table usage. For a low trafficked site it might be okay.
We're in the process of ripping out a ton of temp table usage by a client's app. 90% of the queries in their system result in a temp table being created. Analysis of all the queries has shown that the original dev used this mechanism simply because they didn't understand SQL. We're doing this because performance has radically dropped off as new users are added to the system.
Can you post a use case? Maybe we can help provide an alternate mechanism.
UPDATE:
Now that we have a use case, here is a simple table structure to accomplish what you need.
Table ZipCodes
    ZipCode char(5) [or char(10) depending on need]
    CityName varchar(50)
    (other columns as necessary, such as latitude)

Table TempReadings
    ZipCode char(5) [foreign key to the ZipCodes table]
    ReadingDate datetime
    Temperature float (or some equivalent)
To get all the temp readings for a given zip code you would do something like:
select ZipCode, ReadingDate, Temperature
from TempReadings
where ZipCode = '12345'  -- the zip code of interest
If you need info from the main ZipCodes table:
select Z.ZipCode, Z.CityName, TR.ReadingDate, TR.Temperature
from ZipCodes Z
inner join TempReadings TR on (TR.ZipCode = Z.ZipCode)
Add where clauses as necessary. Note that none of the above requires a separate table per zip code.