What's the best way to retrieve array data from MySQL?

I'm storing an object / data structure like this inside a MySQL (actually a MariaDB) database:
{
  idx: 7,
  a: "content A",
  b: "content B",
  c: ["entry c1", "entry c2", "entry c3"]
}
And to store it I'm using 2 tables, very similar to the method described in this answer: https://stackoverflow.com/a/17371729/3958875
i.e.
Table 1:
+-----+---+---+
| idx | a | b |
+-----+---+---+
Table 2:
+------------+-------+
| owning_obj | entry |
+------------+-------+
And then made a view that joins them together, so I get this:
+-----+------------+------------+-----------+
| idx | a          | b          | c         |
+-----+------------+------------+-----------+
| 7   | content A1 | content B1 | entry c11 |
| 7   | content A1 | content B1 | entry c21 |
| 7   | content A1 | content B1 | entry c31 |
| 8   | content A2 | content B2 | entry c12 |
| 8   | content A2 | content B2 | entry c22 |
| 8   | content A2 | content B2 | entry c32 |
+-----+------------+------------+-----------+
My question is: what is the best way to get it back into my object form? (e.g. I want an array of objects of the type specified above for all entries with idx between 5 and 20.)
There are two ways I can think of, but neither seems very efficient.
First, we can send the whole view back to the application server, which can build a hashmap keyed on the primary key (or some other unique index), collect the different c values, and rebuild the objects that way. But that means sending a lot of duplicate data and spending extra memory and processing time on the rebuild. This method also won't scale pleasantly if we have multiple arrays, or arrays within arrays.
The second method would be to do multiple queries: filter Table 1 to get the list of idx values you want, and then for each idx send a query against Table 2 where owning_obj = that idx. This means sending a whole lot more queries.
Neither of these options seems very good, so I'm wondering if there is a better way. Currently I'm thinking it could be something that uses JSON_OBJECT(), but I'm not sure how.
This seems like a common situation, but I can't seem to find the exact wording to search for to get the answer.
PS: The server interfacing with MySQL/MariaDB is written in Rust, but I don't think that's relevant to this question.

You can use GROUP_CONCAT to combine all the c values into a comma-separated string.
SELECT t1.idx, t1.a, t1.b, GROUP_CONCAT(t2.entry) AS c
FROM table1 AS t1
LEFT JOIN table2 AS t2 ON t1.idx = t2.owning_obj
GROUP BY t1.idx, t1.a, t1.b;
Then explode the string in PHP:
$result_array = [];
while ($row = $result->fetch_assoc()) {
    $row['c'] = explode(',', $row['c']);
    $result_array[] = $row;
}
However, if the entries can be long, make sure you increase group_concat_max_len.
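For example, a minimal sketch (the value below is only an illustration; pick a limit that fits your data):
-- Raise the per-session cap on GROUP_CONCAT() output (the default is 1024 bytes).
SET SESSION group_concat_max_len = 1000000;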
If you're using MySQL 8.0 you can also use JSON_ARRAYAGG(). This will create a JSON array of the entry values, which you can convert to a PHP array using json_decode(). This is a little safer, since GROUP_CONCAT() will mess up if any of the values contain a comma. You can change the separator, but you need a separator that will never appear in any value. Unfortunately, this isn't available in MariaDB (at least not in older versions).
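A hedged sketch of that approach (MySQL 8.0+, reusing the table and column names from the question; combined with JSON_OBJECT() it can return one JSON document per object, ready for json_decode() in PHP or serde_json on the Rust side):
SELECT JSON_OBJECT(
         'idx', t1.idx,
         'a',   t1.a,
         'b',   t1.b,
         'c',   JSON_ARRAYAGG(t2.entry)
       ) AS obj
FROM table1 AS t1
LEFT JOIN table2 AS t2 ON t1.idx = t2.owning_obj
WHERE t1.idx BETWEEN 5 AND 20
GROUP BY t1.idx, t1.a, t1.b;
-- Note: objects with no entries come back as "c": [null] because of the LEFT JOIN;
-- switch to an INNER JOIN or post-process if that matters.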

Related

MySQL - Reiteratively count rows that have a particular CSV string

2-column MySQL Table:
| id | class   |
|----|---------|
| 1  | A,B     |
| 2  | B,C,D   |
| 3  | C,D,A,G |
| 4  | E,F,G   |
| 5  | A,F,G   |
| 6  | E,F,G,B |
The requirement is to generate a report/output which tells, for each individual CSV value in the class column, how many rows it appears in.
For example, A is present in 3 rows (with id 1,3,5), C is present in 2 rows (with id 2,3), and G is in 4 rows (3,4,5,6), so the output report should be
A - 3
B - 3
C - 2
...
...
G - 4
Essentially, column id can be ignored.
The draft that I can think of: first, all the values of the class column need to be picked and split on commas; then create a distinct list of the unique values (A, B, C, ...); and then count how many rows contain each value from that distinct list.
While I know basic SQL queries, this is way too complex for me. I'm unable to match it with a CSV split function in MySQL. (I'm new to SQL, so I don't know much.)
An alternative approach that I made work: download the class column values into a file and feed it to a Perl script, which creates a distinct array of A, B, C, ..., then reads the downloaded CSV file again for each element in the distinct array and increments the count, and finally publishes the report. But this is Perl and a separate execution, while the client needs it as a SQL report.
Help will be appreciated.
Thanks
You may try a split-string-into-rows function to get the individual values and then use the COUNT function to find the number of occurrences of each.
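A minimal sketch of that idea without a helper function (assuming the table is named t, no row holds more than 8 comma-separated values, and a value never appears twice in the same row):
SELECT cls, COUNT(*) AS row_count
FROM (
  SELECT id,
         SUBSTRING_INDEX(SUBSTRING_INDEX(class, ',', n.n), ',', -1) AS cls
  FROM t
  JOIN (SELECT 1 AS n UNION SELECT 2 UNION SELECT 3 UNION SELECT 4
        UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8) AS n
    ON n.n <= 1 + LENGTH(class) - LENGTH(REPLACE(class, ',', ''))
) AS split_vals
GROUP BY cls
ORDER BY cls;
-- The join against the 1..8 numbers list splits each CSV string into one row per value;
-- the outer GROUP BY then counts how many rows each value occurs in.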

More than 255 Fields in Access 2000/2010

I am converting a 20-year-old system from dBase IV into Access 2010, via Access 2000, in order to be more suitable for Windows 10. However, I have about 350 fields in the database, as it is a parameters table, and MS Access 2000 and MS Access 2010 are complaining about it. I have repaired the database to remove the internal count problem, but am rather surprised that Windows 10 software would have such a low restriction. Does anyone know how to bypass this? Obviously I can break it into 2 tables, but this seems rather archaic.
When you start to run up against limitations such as this, it reeks of poor database design.
Given that you state that the table in question is a 'parameters' table, with so many parameters, have you considered structuring the table such that each parameter occupies its own record?
For example, consider the following approach, where ParamName is the primary key for the table:
+----------------+------------+
| ParamName (PK) | ParamValue |
+----------------+------------+
| Param1         | Value1     |
| Param2         | Value2     |
| ...            |            |
| ParamN         | ValueN     |
+----------------+------------+
Alternatively, if there is the possibility that each parameter may have multiple values, you can simply add one additional field to differentiate between multiple values for the same parameter, e.g.:
+----------------+--------------+------------+
| ParamName (PK) | ParamID (PK) | ParamValue |
+----------------+--------------+------------+
| Param1         | 1            | Value1     |
| Param1         | 2            | Value2     |
| Param1         | 3            | Value3     |
| Param2         | 1            | Value2     |
| ...            | ...          | ...        |
| ParamN         | 1            | Value1     |
| ParamN         | N            | ValueN     |
+----------------+--------------+------------+
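With that structure, reading a setting becomes a keyed lookup instead of referencing one of 350 columns. A minimal sketch (the table name tblParams is an assumption):
SELECT ParamValue
FROM tblParams
WHERE ParamName = 'Param2' AND ParamID = 1;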
I had a similar problem - we have more than 300 fields in one Contact table on SQL Server linked to Access. You probably do not need to display 255 fields on one form - that would not be user friendly. You can split it into several sub-forms with different underlying queries, each with fewer fields than the limit. All sub-forms would be linked by the ID.
Sometimes splitting tables as suggested above is not the best idea because of performance.
As Lee Mac described, a change in the structure of the "parameters" table really would be your better choice. You could then define constants for each parameter name to be used in code, to prevent accidental misspelling later if they are referenced in many places.
Then you could create a function (or functions) that takes the name of the parameter setting you are looking for, queries the table using that as the key, and returns the value. Not being a VB/Access developer, I would think you can't overload functions to have a single function return different data types such as string, int, date, etc., so you may want functions something like the samples below (in C#, but the principle would be the same):
public int GetAppParmInt( string whatField )
public DateTime GetAppParmDate( string whatField )
public string GetAppParmString( string whatField )
etc...
Then you could get the values by calling the function whose sole purpose is to query the parameters table for that one key and return the value as stored.
Hopefully a combination of the solutions offered here can help you in your upgrade. Expanding a bit on Lee Mac's answer, your parameter table could even have a column for each data type you are storing, corresponding to the "GetAppParm[type]" functions:
ParmsTable
+------+-----------------+---------+------------+--------------+
| PkID | ParmDescription | ParmInt | ParmDate   | ParmString   |
+------+-----------------+---------+------------+--------------+
| 1    | CompanyName     |         |            | Your Company |
| 2    | StartFiscalYear |         | 2019-06-22 |              |
| 3    | CurrentQuarter  | 4       |            |              |
| 4    | ...             |         |            |              |
+------+-----------------+---------+------------+--------------+
Then you don't have to worry about conversions all over the place: values are stored in the proper data type you expect, and the functions return that type.

Parsing multiple JSON schemas with Spark

I need to collect a few key pieces of information from a large number of somewhat complex nested JSON messages which are evolving over time. Each message refers to the same type of event but the messages are generated by several producers and come in two (and likely more in the future) schemas. The key information from each message is similar but the mapping to those fields is dependent on the message type.
I can’t share the actual data but here is an example:
Message A
-header:
|-attribute1
|-attribute2
-typeA:
|-typeAStruct1:
||-property1
|-typeAStruct2:
||-property2
Message B
-attribute1
-attribute2
-contents:
|-message:
||-TypeB:
|||-property1
|||-TypeBStruct:
||||-property2
I want to produce a table of data which looks something like this regardless of message type:
| MessageSchema | Property1 | Property2 |
| :------------ | :-------- | :-------- |
| MessageA      | A1        | A2        |
| MessageB      | B1        | B2        |
| MessageA      | A3        | A4        |
| MessageB      | B3        | B4        |
My current strategy is to read the data with schema A and union it with the data read with schema B. Then I can filter out the nulls that result from parsing a type A message with the B schema and vice versa. This seems very inefficient, especially once a third or fourth schema emerges. I would like to be able to parse each message correctly on the first pass and apply the correct schema.
As I see it, there is only one way:
For each message type you create an 'adapter' that builds a dataframe from the input and transforms it to the common-schema dataframe.
Then union the outputs of the adapters (see the sketch below).
Obviously, if you change the 'common' schema, you will need to tailor your 'adapters' as well.
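A hedged sketch of the adapter-plus-union pattern in Spark SQL (the view names msg_a and msg_b are assumptions; the field paths follow the structures above):
-- Adapter for Message A
SELECT 'MessageA'                   AS MessageSchema,
       typeA.typeAStruct1.property1 AS Property1,
       typeA.typeAStruct2.property2 AS Property2
FROM msg_a
UNION ALL
-- Adapter for Message B
SELECT 'MessageB'                                   AS MessageSchema,
       contents.message.TypeB.property1             AS Property1,
       contents.message.TypeB.TypeBStruct.property2 AS Property2
FROM msg_b;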

How do I turn a list of interconnected pairs of ids into a cluster of ids?

I have a table with pairs (and sometimes triples) of ids, which act as a sort of link in a chain:
+------+-----+
| from | to  |
+------+-----+
| id1  | id2 |
| id2  | id3 |
| id4  | id5 |
+------+-----+
I want to create a new table where all the links are clustered into chains/families:
+-----+----------+
| id  | familyid |
+-----+----------+
| id1 | 1        |
| id2 | 1        |
| id3 | 1        |
| id4 | 2        |
| id5 | 2        |
+-----+----------+
i.e. collect all the links in a chain into a single family, and give it an id.
In the example above, the first 2 rows of the first table create one family, and the last row creates another family.
Solution
I will use node.js to query big batches of rows (a few thousand per batch), process them, and insert them into my own table with a family id.
The issue
The problem is I have a few tens of thousands of id pairs, I will need to add new ids over time after the initial creation of the families table, and I will need to add those ids to existing families.
Are there good algorithms for clustering pairs of data into families/clusters, keeping my issue in mind?
Not sure if this is an answer so much as some ideas...
I created two tables similar to the ones you have, the first one I populated with the same data as you have.
Table Base, fromID, toID
Table chain, fromID, chainID (numeric, null allowed)
I then inserted all unique values from Base into chain with a null value for chainID, the idea being that these are the rows not yet processed (see the sketch below).
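A sketch of that setup (the column type is an assumption; both the from and to side are seeded so the later joins can find every id):
CREATE TABLE chain (
  fromID  VARCHAR(32) PRIMARY KEY,
  chainID INT NULL
);

-- Seed chain with every id that appears in base, all unprocessed (chainID = NULL).
INSERT INTO chain (fromID, chainID)
SELECT b.fromID, NULL FROM base b
UNION
SELECT b.toID, NULL FROM base b;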
It was then a case of repeatedly running a couple of statements...
update chain c
set chainID = n
where chainid is null and exists ( select 1 from base b where b.fromID = c.fromID )
order by fromID
limit 1
This would allocate the next chain ID to the first row without one (n needs to be generated from somewhere and incremented each time you run this)
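One hedged way to supply and increment n is a session variable, for example:
-- Pick the next unused chain number, then start a new chain with it.
SET @n = COALESCE((SELECT MAX(chainID) FROM chain), 0) + 1;

UPDATE chain
SET chainID = @n
WHERE chainID IS NULL
  AND EXISTS (SELECT 1 FROM base b WHERE b.fromID = chain.fromID)
ORDER BY fromID
LIMIT 1;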
Then the one that relates all of the records...
update chain c
join base b on b.toID = c.fromID
join chain c1 on b.fromID = c1.fromID
set c.chainID = c1.chainID
where c.chainID is null and c1.chainID is not null
This is run repeatedly until it affects 0 rows (i.e. there is nothing more to do).
Then run the first update to create the next chain, and so on. When the first update itself affects 0 rows, every row has been assigned to a chain.
Would be interested if you want to try this and see if it stands up with more complex scenarios.
This looks a lot like clustering over a graph dataset, where 'familyid' is the cluster number.
Here is a question I think is relevant.
Here is the algorithm description. You will need to implement it under the conditions you described.

MySQL Natural Sort (like OSX Finder)

I've searched for this for a long time, but the solutions I've found aren't working as I need.
Let me explain: I have a table containing a couple of thousand products, each with an alphanumeric SKU that is also used for sorting.
This SKU consists of:
Category Code (variable number of alphabetic characters),
Product Number (integer),
Product Model Variation (optional, variable number of alphabetic characters)
For example: MANT 12 CL (without spaces)
Now, I need to get them ordered like this (and if these were filenames, OSX Finder would order them perfectly):
MANT1
MANT2
MANT2C
MANT2D
MANT2W
MANT3
MANT4C
MANT9
MANT12
MANT12C
MANT12CL
MANT12P
MANT13
MANT21
MANT24
MANT24D
MANT29
Of course ORDER BY sku is plainly wrong:
MANT1
MANT12
MANT12C
MANT12CL
MANT12P
MANT13
MANT2
MANT21
MANT24
MANT24D
MANT29
MANT2C
MANT2D
MANT2W
MANT3
MANT4C
MANT9
And ORDER BY LENGTH(sku), sku has problems sorting the model variations:
MANT1
MANT2
MANT3
MANT9
MANT12
MANT13
MANT21
MANT24
MANT29
MANT2C
MANT2D
MANT2W
MANT4C
MANT12C
MANT12P
MANT24D
MANT12CL
So, is there a way to sort this stuff like Finder would?
(Also, once sorted, is there a way to get the next and previous product? I don't mind using several queries: at this point elegance is the last of my problems...)
Thanks everybody in advance.
One last thing: during my searches I found this answer to a similar question, but I have no idea how to use it in PHP, so I don't know if it works and whether it actually answers my question.
Are you using PHP when fetching the data?
If so, try using PHP's natural sort function (natsort) for an in-memory sort after the data is already loaded.
The order is not 'plainly wrong'; it simply depends on what collation you use. In your case, you might try a binary collation, for example 'latin1_bin'.
The following example shows ORDER BY using COLLATE on UTF8 data:
mysql> SELECT c1 FROM t1 ORDER BY c1;
+------+
| c1   |
+------+
| a1   |
| a12  |
| a13c |
| a2   |
| a21  |
+------+
mysql> SELECT c1 FROM t1 ORDER BY c1 COLLATE 'utf8_bin';
+------+
| c1   |
+------+
| a1   |
| a12  |
| a2   |
| a21  |
| a13c |
+------+
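If the sorting has to happen in SQL rather than in PHP, one hedged alternative (assuming MySQL 8.0+ or MariaDB 10.0.5+ for REGEXP_SUBSTR, and a table named products with the SKU in a column named sku) is to split the SKU into the three parts described in the question and order by each in turn:
SELECT sku
FROM products
ORDER BY
  REGEXP_SUBSTR(sku, '^[A-Za-z]+'),                -- category code
  CAST(REGEXP_SUBSTR(sku, '[0-9]+') AS UNSIGNED),  -- product number, compared numerically
  REGEXP_SUBSTR(sku, '[A-Za-z]+$');                -- model variation; NULL (none) sorts first
For the previous/next product, the same three expressions can be compared against the current SKU's parts, though that takes a somewhat more involved WHERE clause.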