Delete JSON data from a table in MySQL - mysql

Hi, I need to remove from a JSON column everything that contains the name WEAPON_PISTOL50. This is one of the values stored in my MySQL table:
{"weapons":[{"ammo":74,"name":"WEAPON_PISTOL50"},{"ammo":118,"name":"WEAPON_PISTOL50"},{"ammo":54,"name":"WEAPON_PISTOL"}]}
The table is named datastore_data, and the column that holds the JSON is called data.
I want to update all the rows by deleting this from the JSON: '{"ammo":118,"name":"WEAPON_PISTOL50"}'
I haven't tested many approaches so far, but the above is what I need to do.

Here's a solution tested on MySQL 8.0:
update datastore_data
cross join json_table(
    data, '$.weapons[*]'
    columns(
        i for ordinality,
        ammo int path '$.ammo',
        name varchar(20) path '$.name'
    )
) as j
set data = json_remove(data, concat('$.weapons[', j.i - 1, ']'))
where j.ammo = 118 and j.name = 'WEAPON_PISTOL50';
If you are using a version of MySQL too old to support JSON_TABLE(), then it's a lot harder.
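For what it's worth, here is a sketch of a workaround for MySQL 5.7, which has JSON_SEARCH() and JSON_REMOVE() but not JSON_TABLE(). It matches on the name only (JSON_SEARCH() searches string values, so it can't also test ammo = 118), and it removes at most one matching array element per row per run, so you would have to rerun it until it affects zero rows:
-- sketch for MySQL 5.7: JSON_SEARCH() returns the path of one matching name value,
-- e.g. "$.weapons[1].name"; stripping the trailing .name gives the array element to remove
update datastore_data
set data = json_remove(
    data,
    replace(
        json_unquote(json_search(data, 'one', 'WEAPON_PISTOL50', NULL, '$.weapons[*].name')),
        '.name', ''
    )
)
where json_search(data, 'one', 'WEAPON_PISTOL50', NULL, '$.weapons[*].name') is not null;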
Frankly, it would be far easier if you didn't use JSON. Instead, store one weapon per row in a second table with normal columns named ammo and name.
create table weapons (
    id serial primary key,
    owner_id int,
    ammo int,
    name varchar(20)
);
Then you could do this task much more simply:
delete from weapons
where ammo = 118 and name = 'WEAPON_PISTOL50';
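If you do normalise, a one-time migration of the existing JSON into that table could look roughly like this (a sketch assuming MySQL 8.0 and that datastore_data has an id column identifying the owner of each document, which the question doesn't show):
-- sketch: copy each element of the weapons array into the new table
-- assumption: datastore_data.id identifies the owner of each JSON document
insert into weapons (owner_id, ammo, name)
select d.id, j.ammo, j.name
from datastore_data d
cross join json_table(
    d.data, '$.weapons[*]'
    columns(
        ammo int path '$.ammo',
        name varchar(20) path '$.name'
    )
) as j;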
Storing data in JSON documents might seem like a convenient way to load complex data into a single row of a single table, but virtually every task you have to do with that data afterwards becomes a lot harder than if you had used normal tables and columns.

Related

How to Parse JSON Column When Hierarchy Key Changes Every Row SQL Server

I have a requirement to build an automated report from a SQL Server table that has a column with nested JSON data. There is a top-level key called distributionOrders, and beneath it another level where the key is an order number that changes on every row. The value for this key is an array of key-value pairs. I am trying to extract the keys of this array as columns along with their respective values, using OPENJSON and CROSS APPLY. I have found plenty of material on how to do this, but the problem I am facing is getting past the second-level key that changes on every row. After performing CROSS APPLY OPENJSON twice, the key is the order number string, which will be different for every row of data that I select. I cannot hardcode this value, as I have hundreds of rows to parse. Here is example JSON data to illustrate my problem:
{"distributionOrders":{"3000283984":[{"orderNumber":"3000283984","orderType":"STC","itemNumber":"W01874"}]}}
{"distributionOrders":{"3000308956":[{"orderNumber":"3000308956","orderType":"EVA","itemNumber":"S28741"}]}}
{"distributionOrders":{"3000308961":[{"orderNumber":"3000308961","orderType":"EXP","itemNumber":"W09234"}]}}
{"distributionOrders":{"3000309119":[{"orderNumber":"3000309119","orderType":"STC","itemNumber":"W01874"}]}}
I am trying to get to orderNumber, orderType, and itemNumber. In the first example, "3000283984" is the key I am trying to get past without hardcoding its current name.
This query works great for one row of data:
SELECT p.orderNumber, p.orderType, p.itemNumber
FROM myDatabase
CROSS APPLY OPENJSON(shipment_details)
     WITH (distributionOrders NVARCHAR(max) AS JSON) do
CROSS APPLY OPENJSON(do.distributionOrders)
     WITH ("3000325050" NVARCHAR(max) AS JSON) nu
OUTER APPLY OPENJSON(nu."3000325050")
     WITH (orderNumber varchar(20), orderType varchar(20), itemNumber varchar(20)) p
Now, any ideas on how I can get this to scale to hundreds of rows? Modifying the original JSON to use a generic key name might be possible, but it is not something I have control over. Thanks!
With no expected results provided this is a bit of a guess, but perhaps this is what you are after?
CREATE TABLE dbo.YourTable (YourJSON nvarchar(MAX));
GO
INSERT INTO dbo.YourTable(YourJSON)
VALUES
(N'{"distributionOrders":{"3000283984":[{"orderNumber":"3000283984","orderType":"STC","itemNumber":"W01874"}]}}'),
(N'{"distributionOrders":{"3000308956":[{"orderNumber":"3000308956","orderType":"EVA","itemNumber":"S28741"}]}}'),
(N'{"distributionOrders":{"3000308961":[{"orderNumber":"3000308961","orderType":"EXP","itemNumber":"W09234"}]}}'),
(N'{"distributionOrders":{"3000309119":[{"orderNumber":"3000309119","orderType":"STC","itemNumber":"W01874"}]}}');
GO
SELECT dO.orderNumber,
       dO.orderType,
       dO.itemNumber
FROM dbo.YourTable YT
     CROSS APPLY OPENJSON(YT.YourJSON, '$.distributionOrders') J
     CROSS APPLY OPENJSON(J.[value])
          WITH (orderNumber bigint,
                orderType varchar(3),
                itemNumber varchar(6)) dO;
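If you also need the variable key itself (the order number used as the property name), OPENJSON without a WITH clause already exposes it as the [key] column, so a variant of the same query could return it as well (a sketch along the same lines, untested):
-- Sketch: also surface the per-row property name via OPENJSON's [key] column.
SELECT J.[key] AS distributionKey,
       dO.orderNumber,
       dO.orderType,
       dO.itemNumber
FROM dbo.YourTable YT
     CROSS APPLY OPENJSON(YT.YourJSON, '$.distributionOrders') J
     CROSS APPLY OPENJSON(J.[value])
          WITH (orderNumber bigint,
                orderType varchar(3),
                itemNumber varchar(6)) dO;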
GO
DROP TABLE dbo.YourTable;

SQL database structure for time series data

I wonder if someone could take a minute out of their day to give some suggestions on my database structure design.
I have time-series sensor data (e.g. temperature, humidity, ...) sampled at 10 Hz, from sensors installed on different floors of different houses in different cities. So, something like this:
City Paris --> House A --> Floor 1 --> Sensor Humidity & temp --> CSV file with time series for hours, days, years
City Paris --> House B --> Floor 3 --> Sensor Humidity --> CSV file with time series for hours, days, years
So now I would like to answer these questions:
1- What would be the most efficient way to store this data in a SQL database?
2- Would it make sense to store the timestamps in the SQL database but keep the sensor data in CSV files, and then link them to the database?
3- What about scalability and the possibility of adding new sensors?
Many thanks in advance for your help and suggestions.
If your objective is to run time-series analytics, I would recommend breaking the data down so that each reading is in its own row, and using a time-series database.
The schema proposed in the other answer is good, but I personally find storing the data in 3 tables too complex, as you need to write and check constraints across 3 different tables, and most of your queries will require JOIN clauses.
There are ways to make this schema simpler, for example by leveraging the symbol type in QuestDB. A symbol stores repetitive strings as a map of integers: on the surface you are manipulating strings, but the storage cost and operational complexity are those of an int.
This means you can store all your data in a single, simpler table with no performance or storage penalty. That simplifies both ingestion, since you write to only one table, and queries, since there is no need for multiple joins.
Here is what the DDL would look like:
CREATE TABLE measurements (
    id INT,
    ts TIMESTAMP,
    sensor_name SYMBOL,
    floor_name SYMBOL,
    building_name SYMBOL,
    building_city SYMBOL,
    type SYMBOL,
    value DOUBLE
) timestamp (ts);
If you want to add more sensors or buildings, all you need to do is write to the same table.
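To illustrate the kind of query this layout is aimed at, a typical downsampling query could look like the following (a sketch using QuestDB's SAMPLE BY syntax; the 'temperature' value for type is just an assumed example):
-- Sketch: hourly average per building for one sensor type (QuestDB dialect).
SELECT ts, building_name, avg(value) AS avg_value
FROM measurements
WHERE type = 'temperature'
SAMPLE BY 1h;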
First of all, you should not store the CSV in the database as a single varchar or text value. You should break everything down into the smallest parts possible. My suggestion is that you first create a table like this:
CREATE TABLE measurements (
    measurement_id INT PRIMARY KEY,
    floor_id INT,
    type VARCHAR(50),
    value FLOAT
);
Then you create a table for floors
CREATE TABLE floors (
    floor_id INT PRIMARY KEY,
    building_id INT,
    floor_name INT
);
And finally, the connection to the buildings:
CREATE TABLE buildings (
    building_id INT PRIMARY KEY,
    building_name VARCHAR(200),
    building_city VARCHAR(200)
);
You should create foreign keys from measurements.floor_id to floors.floor_id and from floors.building_id to buildings.building_id.
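A sketch of those constraints in MySQL syntax (the constraint names are arbitrary):
-- Sketch: the foreign keys described above.
ALTER TABLE measurements
    ADD CONSTRAINT fk_measurements_floor
    FOREIGN KEY (floor_id) REFERENCES floors (floor_id);

ALTER TABLE floors
    ADD CONSTRAINT fk_floors_building
    FOREIGN KEY (building_id) REFERENCES buildings (building_id);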
You can even break things down into more tables to keep cities and/or addresses in tables of their own if you like.

Import from Excel to SQL with conditional check for duplicates

I have a huge amount of data stored in PDF files which I would like to convert into a SQL database. I can extract the tables from the PDF files with some online tools. I also know how to import this into MySQL. BUT:
The list contains users with names, birth dates and some other properties. A user may exist in other PDF files too. So when I'm about to convert the next file into Excel and import it into MySQL, I want to check whether that user already exists in my table. The check should be based on several properties: we may have the same user name but with a different date of birth, and that would be a new record. But if all the selected properties match, then that user is a duplicate and shouldn't be imported.
I guess this is something I can do with a copy from a temporary table, but I'm not sure what the selection should be. Let's say the user name is stored in column A, the date of birth in column B and the city in column C. What would be the right script to check these against the existing table and skip the copy if all three match an existing record?
Thanks!
1- Create a permanent table
CREATE TABLE UploadData
(
    id int NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name varchar(50),
    dob datetime,
    city varchar(30)
);
2- Import your data from Excel into your SQL DB. The steps below are for SQL Server; I'm not sure about MySQL, but it should be something similar. You said in your question that you already know how to do this, which is why I'm not spelling out each step for MySQL.
Right-click your DB, go to Tasks -> Import Data, From: Microsoft Excel, To: your DB name, select the UploadData table (check Edit Columns to make sure the columns match), then finish uploading from Excel to your SQL DB.
3- Check whether the data already exists in your main table; if not, add it.
CREATE TEMPORARY TABLE matchingData (id int, name varchar(50), dob datetime, city varchar(30));

INSERT INTO matchingData
select u.id, u.name, u.dob, u.city
from main_table m
inner join UploadData u on u.name = m.name
    and u.dob = m.dob
    and u.city = m.city;

insert into main_table (name, dob, city)
select name, dob, city
from UploadData
where id not in (select id from matchingData);
4- You don't need the UploadData table anymore, so: DROP TABLE UploadData;
Alternatively, add a composite primary key (or unique) constraint across column A, column B and column C.
That will prevent duplicate rows while still allowing duplicate values within any single column.
Note: a table can have only one primary key, and there is a limit on how many columns it can include.
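A minimal sketch of that approach in MySQL, assuming the target table and columns from the previous answer (main_table with name, dob and city, so treat those names as placeholders) and using a unique constraint, which blocks duplicates the same way a composite primary key would:
-- Sketch: enforce uniqueness across the three columns, then let MySQL skip duplicates.
ALTER TABLE main_table
    ADD CONSTRAINT uq_name_dob_city UNIQUE (name, dob, city);

-- Rows whose (name, dob, city) already exists are silently ignored.
INSERT IGNORE INTO main_table (name, dob, city)
SELECT name, dob, city
FROM UploadData;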

automatically retrieve data from related tables

I'm working with a database that contains a table called model_meta which contains metadata about all the various models in use by the application. Here is the relevant data structure:
CREATE TABLE model_meta (
    id INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(64),
    oo INT(11),
    om INT(11),
    mo INT(11),
    mm INT(11),
    INDEX (name)
);
CREATE TABLE inventory (
    id INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
    type VARCHAR(255),
    customers_id INT(11)
);
CREATE TABLE customers (
    id INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255),
    contact VARCHAR(255)
);
The columns oo, om, mo, and mm in the model_meta table contain a comma-separated list of ids to which that model has the specified relationship (i.e. one-to-one, one-to-many, etc.).
When a client requests data, all I'm given is the name of the table they're requesting (e.g. 'inventory') - from that, I need to determine what relationships exist and query those tables to return the appropriate result set.
Given a single variable (let's call it $input) that contains the name of the requested model, here are the steps:
1- Get the model metadata: SELECT model_meta.* FROM model_meta WHERE model_meta.name = $input;
2- Determine which, if any, of the relationship columns (oo, om, mo, mm) contain values, keeping in mind that they can contain a comma-separated list of values.
3- Use the values from step 2 to determine the name of the related model(s). For the sake of example, let's say that only mo contains a value and refer to it as $mo. So: SELECT model_meta.name FROM model_meta WHERE model_meta.id = $mo; Let's call this result $related.
4- Finally, select data from the requested table and all tables that are related to it, keeping in mind that we may be dealing with a one-to-one, one-to-many, many-to-one, or many-to-many relationship. For this specific example, in pseudo-SQL: SELECT $input.*, $related.* FROM $input LEFT JOIN $related ON ($related.id = $input.$related_id); (expanded concretely in the sketch below)
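For the inventory example, with $related resolving to customers, that pseudo-SQL would expand to something like the following (a sketch; the customers_id column comes from the inventory DDL above):
-- Sketch: the expanded pseudo-SQL for $input = inventory, $related = customers.
SELECT inventory.*, customers.*
FROM inventory
LEFT JOIN customers ON customers.id = inventory.customers_id;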
This method uses three separate queries - the first to gather metadata about the requested table, the second to gather the names of related tables, and the third to query those tables and return the actual data.
My question: Is there an elegant way to combine any of these queries, reducing their number from 3 to 2, or even down to one single query?
The real goal, of course, is to somehow automate the retrieval of data from related tables, without the client having to know how the tables are related.

Copying certain data from one table's columns into another through a link table

As part of a very slow refactoring process of an inherited system, I need to eliminate a couple of slow joins and subqueries. As I'm familiarising myself with the system, I'm slowly sanitising the database structure, to get rid of the held-together-by-duct-tape feeling, making incremental improvements, hoping nothing breaks in the meantime. Part of this involves combining data from two tables linked by a third into one.
Table structure is similar to this:
CREATE TABLE groups
(
group_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
-- various other fields that are appropriate to groups
...
-- these fields need to be filled
a ENUM(...) NOT NULL,
b INTEGER NOT NULL,
c VARCHAR(...) NOT NULL
);
CREATE TABLE items
(
-- key is determined by an external data source
item_id INTEGER NOT NULL PRIMARY KEY,
-- various other fields that are appropriate to items
...
-- these fields shouldn't be here, but in the groups table
a ENUM(...) NOT NULL,
b INTEGER NOT NULL,
c VARCHAR(...) NOT NULL
);
CREATE TABLE group_items
(
item_id INTEGER NOT NULL,
group_id INTEGER NOT NULL,
PRIMARY KEY (item_id,group_id)
);
An item may be in more than one group. Each record in the table "items" has values for columns a, b and c, which are actually not properties of the items, but of the groups of which the items are a part. (This is causing problems, as the values may be different if the item is in another group).
I can't remove the fields from the items table yet, as they are filled by an insane import process from an almost-as-insane data source. Until I get around to fixing the import process, I'm stuck with having the fields exist in the items table, but in the short term at least I can eliminate the slow lookups to get them.
Right now I have a loop in PHP that runs over each group, takes the values from the first item it encounters (which is fine -- all items in a group will have the same values for a, b and c) and places them into the group. This process is rather slow and laborious and unfortunately runs very frequently on an overloaded and underpowered server. Is there a smart way to copy these (and only these) values from the items table into the groups table and have MySQL do the heavy lifting, rather than relying on a PHP script?
Looks like I found my own answer. As the number of items in each group is relatively small, there may be some duplicate work being done, but it's not a bottleneck and it's much faster than the PHP loop:
UPDATE
groups g
INNER JOIN group_items USING(group_id)
INNER JOIN items i USING(item_id)
SET
g.a = i.a,
g.b = i.b,
g.c = i.c;
Seems to do what I need.
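If the duplicate work ever did become a concern, one variant would be to join on just one item per group, along these lines (a sketch, untested, using the same tables):
-- Sketch: pick a single representative item per group before joining.
UPDATE
    groups g
    INNER JOIN (
        SELECT group_id, MIN(item_id) AS item_id
        FROM group_items
        GROUP BY group_id
    ) gi ON gi.group_id = g.group_id
    INNER JOIN items i ON i.item_id = gi.item_id
SET
    g.a = i.a,
    g.b = i.b,
    g.c = i.c;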