Is there any kind of non text interface to MySQL? - mysql

I have a MySQL query that returns a result with a single column of integers. Is there any way to get the MySQL C API to transfer this as actually integers rather than as ASCII text? For that matter is there a way to get MySQL to do /any/ of the API stuff as other than ASCII text. I'm thinking this would save a bit of time in sprintf/sscanf or whatever else is used as well as in bandwidth.

You're probably out of luck, to be honest. Looking at the MySQL C API (http://dev.mysql.com/doc/refman/5.0/en/mysql-fetch-row.html, http://dev.mysql.com/doc/refman/5.0/en/c-api-datatypes.html, look at MYSQL_ROW) there doesn't seem to be a mechanism for returning data in its actual type... the joys of using structs I guess.
You could always implement a wrapper which checks against the MYSQL_ROW's type attribute (http://dev.mysql.com/doc/refman/5.0/en/c-api-datatypes.html) and returns a C union, but that's probably poor advice; don't do that.

Related

Is Athena a viable/sensible choice for infrequently searching *unstructured* JSON?

I'm recording request and response headers and bodies for all traffic to our API and from our API to 3rd party services into S3 as into tiny objects.
I want to be able to query this data infrequently. For example (pseudo-code):
select $.cars[0].color from "objects" where object_path in (....);
Other info:
Many "objects" in S3 won't have a valid path to $.cars[0].color (it's just one example).
I hope to not use Glue.
Cost is important - this is something that will be queried very infrequently. Configuring some ElasticSearch/similar solution is terribly out of budget for the use case.
I hope to not define my own set of schemas (this is simply not feasible).
Athena says it can search unstructured JSON. I'm having trouble creating a proof-of-concept to show this is true.
Is Athena right fr me? Am I missing a better solution?
I think Athena will work for your case.
Athena handles missing properties in JSON objects. For example, if you define the cars column as array<struct<color:string>>:
the property can be missing ⇒ SELECT cars … will be NULL
it can be an empty list ⇒ SELECT cars[1] … (Athena arrays start at 1) will result in an error, but element_at(cars, 1) and try(cars[1]) will return NULL
the object may be missing the color property ⇒ SELECT cars[1].color … will be NULL
for completely free-form JSON define the column as string and use the JSON functions to query it.
Glue is not necessary. Create the table manually, from your application, or with CloudFormation, and configure it to use partition projection and you will not have to think about using Glue crawlers at all.
Athena doesn't cost anything when you don't use it, and if you will query only infrequently this is key. Make sure to compress your data, and partition it in a way that supports your query patterns (e.g. by date or month if you most often will query recent data).
Not sure what you mean by having to define your "own set of schemas", so perhaps you can clarify that part?

Remove JSON keys with wildcards from a MySQL field

I have a MySQL 8.0.20 database with a table that describes metadata about uploaded image files. One column contains a JSON object with a whole bunch of auto-generated data that I'm trying to clean up.
This JSON object sometimes contains one or more variable key names that match a specific pattern. Something like
{
"image_name": "P10043983",
"image_size": "60138",
"image_original_exifdata": "{
'FileName':'P10043983.jpg',
'MimeType':'image/jpeg',
'UndefinedTag:0xA435':'\u0000\u0000\u0000\u0000\u0000\u0000'
}"
}
That UndefinedTag:0xA435 (with many permutations) is the problem. It's referring to various image Exif details like lens type, GPS data, etc. It's stuff that I'm not interested in and that these cameras mostly don't provide, so I've ended up with a table full of long strings of useless characters just taking up space. I want those JSON fields gone for performance and cleanliness.
Is there a way to run a SQL query that would use wildcards or regular expressions to find (and, ideally, remove) all of these pesky variable keys? I'd like to avoid manually making a list of all of the possible "UndefinedTag" keys to search against, and I also didn't like the results when I just treated the whole thing as a string and did REGEXP_REPLACE calls (it sometimes left trailing commas that broke my JSON and were difficult for me to avoid/resolve).
I know some of the JSON functions like JSON_SEARCH() accept wildcards, but it explicitly says the search path can't end in a wildcard (so no UndefinedTag:0x** allowed). Many of the functions I'm after (e.g., JSON_REMOVE()) don't accept wildcards at all. Hell, I've even had trouble finding known keys, and I suspect that silly colon in the key name might have something to do with it.
So, how can I clean up my table and remove the many forms of this UndefinedTag problem? Maybe it's easier to just go back to the regex_replace plan and deal instead with the trailing commas?

Convert comma to dot in Python or MySQL

I have a Python script which collects data and sends it to my MySQL table.
I noticed that the "Cost" sometimes is 0,95 which results in 0 in my table since my table use "0.95" instead of "0,95".
I assume the best solution is to convert the , to . in my Python script by using:
variable.replace(",", ".")
However, couldn't one solution be to change format in my MySQL table? So that I store numbers in this format:
1100
0,95
0,1
150000
My Django Model
cost = models.DecimalField(max_digits=10, decimal_places=4, default=None)
Any feedback on how to best solve this issue?
Thanks
Your first instinct is correct: convert the "unusual" (comma-decimal) input into the standard format that MySQL used by default (dot-decimal) at the first point where you receive it.
there's lots of ways to write numbers
Be careful, though that you don't get stung by people using commas as thousands separators like "3,203,907.23", or the European form "3.203.907,23", the Swiss "3'203'907,23' or even this form, which is widely used in India: "32,03,907.71" (yes, I did mean to type only two digits there!)
To make your life easier, the rule for currencies is relatively simple:
where a dot or comma is followed by only two digits at the end of the string, that character is acting as the decimal separator.
Once you know which is the decimal separator, you can safely remove all other non-digits from the string, change the decimal separator you found to . then use any standard library string-to-number conversion.
Storage format isn't presentation format
Yes, you can tell MySQL to use comma as its decimal separator, but doing that will break so much of your code - including the parts of the framework that read from the database and expect dot-decimal numbers - that you'll regret doing it that way very quickly...
There's a general principle at work here: you should do your data storage and processing using a format that is easy to process, interchangeable with other systems, and understood by other software developers.
Consider what happens if you need to allow a different framework to access your MySQL database to generate reports... whoever develops that software (and it may be you) will be glad that the numbers are all stored the way numbers are "always" stored in databases.
Convert on the way in, re-convert on the way out
Where you need to accept input in a different format, convert that input into your standardised format as early as possible.
When you need to use an output format, do the conversion to that format as late as possible.
The idea is to keep as much of your system "unexceptional" as possible. A programmer who has to remember what numeric format will in force at the time when a given method is called is not a happy programmer.
P.S.
The option you're talking about in MySQL is an example of this pattern: it doesn't change how numeric data is stored. All that changes is how you pass numbers to MySQL and how it presents them back to you.

How do I identify this JSON-like data structure?

I just came across a JSON wannabe that decides to "improve" it by adding datatypes... of course, the syntax makes it nearly impossible to google.
a:4:{
s:3:"cmd";
s:4:"save";
s:5:"token";
s:22:"5a7be6ad267d1599347886";
}
Full data is... much larger...
The first letter seems to be a for array, s for string, then the quantity of data (# of array items or length of string), then the actual piece of data.
With this type of syntax, I currently can't Google meaningful results. Does anyone recognize what god-forsaken language or framework this is from?
Note: some genius decided to stuff this data as a single field inside a database, and it included critical fields that I need to perform aggregate functions on. The rest I can handle if I can get a way to parse this data without resorting to ugly serial processing.
If this can be parsed using MSSQL 2008 that results in a view, I'll throw in a bounty...
I would parse it with a UDF written in .NET - https://learn.microsoft.com/en-us/sql/relational-databases/clr-integration-database-objects-user-defined-functions/clr-user-defined-functions
You can either write a custom aggregate function to parse and calculate these nutty fields, or a scalar value function that returns the field as JSON.
I'd probably opt for the latter in the name of separation of concerns.

QR codes Limits

I have to generate codes with custom fields: id of field+name of field+values of the field.
How long is the data I can encode inside the QRcode? I need to know how many fields\values I can insert.
Should I use XML or JSON or CSV? What is most generic and efficient?
XML / JSON will not qualify for a QR code's alphanumeric mode since it will include lower-case letters. You'll have to use byte mode. The max is 2,953 characters. But, the practical limit is far less -- perhaps a few hundred characters.
It is far better to encode a hyperlink to data if you can.
As Terence says, no reader will do anything with XML/JSON except show it. You need a custom reader anyway to do something useful with that data. (Which suggests this is not a good use case for QR codes.) But if you're making your own reader, you can use gzip compression to make the payload much smaller. Your reader would know to unzip it.
You might get away with something workable but this is not a good approach in general.
The maximum number of alphanumeric characters you can have is 4,296. Although this will require the lowest form of error correction and will be very hard to scan.
JSON is generally more efficient at data storage than XML.
However, you will need to write your own app to scan the code - I don't know of any which will process raw JSON or XML. All the scanners will show you the text, though.