MySQL table with {"Twitter": 28, "Total": 28, "Facebook": 1} - mysql

There is a table with one column, named "info", whose content looks like {"Twitter": 28, "Total": 28, "Facebook": 1}. I want to select the rows where "Total" is larger than 10. Could someone help me write the query? (The table name is landslides_7d.)
This is what I have:
SELECT * FROM landslides_7d WHERE info.Total > 10;
Thanks.

The data format seems to be JSON. If you have MySQL 5.7 or later you can use JSON_EXTRACT or its short form, the -> operator. Those functions don't exist in older versions. Note that JSON keys are case-sensitive, so the path has to be '$.Total', not '$.total'.
SELECT * FROM landslides_7d WHERE JSON_EXTRACT(info, '$.Total') > 10;
or
SELECT * FROM landslides_7d WHERE info->'$.Total' > 10;
See http://dev.mysql.com/doc/refman/5.7/en/json-search-functions.html#function_json-extract
Mind that this is a full table scan. On a larger table you will want an index, for example on a generated column that extracts the value.
If you're on an older version of MySQL you should add an extra column to your table and manually store the total value in that column.

You are probably storing the JSON in a single blob or string column. This is very inefficient, since you can't make use of indexes and MySQL has to parse the entire JSON structure for every row on every WHERE query. I'm not sure how much flexibility you need, but if the JSON attributes are relatively fixed, I recommend running a script (Ruby, Python, etc.) on the table contents and storing "Total" in a traditional columnar format. For example, you could add a new column "total" which contains the Total attribute as an INT.
A side benefit of using a script is that you can catch any improperly formatted JSON - something you can't do in a single query.
You can also keep the "total" column maintained with a trigger (on update/insert of "info"), using the JSON_EXTRACT function referenced in @johannes's answer.
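A minimal sketch of the script idea above, in Python. It only shows the parsing step: fetching the rows and writing the new column back are left out, and the function name extract_total and the sample rows are made up for illustration.

```python
import json

def extract_total(info_text):
    """Return the integer "Total" from one info value, or None if unusable."""
    try:
        doc = json.loads(info_text)
    except (TypeError, ValueError):
        # Malformed JSON (or a NULL column value) ends up here
        return None
    if not isinstance(doc, dict):
        return None
    total = doc.get("Total")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(total, int) and not isinstance(total, bool):
        return total
    return None

# Hypothetical sample values for the "info" column
rows = [
    '{"Twitter": 28, "Total": 28, "Facebook": 1}',
    '{"Twitter": 3, "Total": 4, "Facebook": 1}',
    '{"Twitter": 28, "Total": ',  # truncated, malformed JSON
]
totals = [extract_total(r) for r in rows]
over_ten = [r for r, t in zip(rows, totals) if t is not None and t > 10]
```

Rows for which extract_total returns None are the ones worth inspecting by hand before the new column is trusted.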

Related

MySQL virtual column and wildcard

I was trying out MySQL secondary indexing, following the MySQL documentation, and something weird happened.
First, I created a table, slightly modified from the example in the documentation:
create table jemp(
c JSON,
g VARCHAR(20) GENERATED ALWAYS AS (c->"$.name"),
INDEX i (g)
)
Second, I inserted the values from the example in the documentation:
INSERT INTO jemp (c) VALUES
('{"id": "1", "name": "Fred"}'), ('{"id": "2", "name": "Wilma"}'),
('{"id": "3", "name": "Barney"}'), ('{"id": "4", "name": "Betty"}');
Then I tried a fuzzy search using LIKE with wildcards. The index is not used here because it cannot support a leading %, but the query does return results.
select c->"$.name" as name from jemp where g like "%F%"
Here is the weird thing: when I removed the leading %, the index was used, but I didn't get any results. As far as I understand MySQL, this should work.
select c->"$.name" as name from jemp where g like "F%"
I would very much appreciate it if anyone could help me with this.
For your query to work, you want a generated column that extracts the name as text rather than JSON. That is, use ->> instead of ->:
g VARCHAR(20) GENERATED ALWAYS AS (c ->> '$.name')
Then the index may help with both of the following conditions:
where g like 'F%'
where g = 'F'
Whether MySQL decides to use it or not is another story; basically, the database assesses whether using the index will be faster than a full scan. If it believes that the condition will match a large number of rows, it will probably choose a full scan.
Note that I consistently use single quotes for string literals; although MySQL tolerates otherwise, this is what the SQL standard specifies. In some other databases, double quotes stand for identifiers (this also is compliant with the standard).
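To see why the -> version returned nothing for 'F%', here is a small Python illustration. It is only an analogy, not MySQL itself: json.dumps stands in for the JSON-formatted text that -> produces (quotes included), and the two helper functions are hypothetical stand-ins for LIKE 'F%' and LIKE '%F%'.

```python
import json

name = "Fred"
as_json = json.dumps(name)  # the JSON string, quotes included (like ->)
as_text = name              # the unquoted text (like ->>)

def like_prefix(value, prefix):
    """Rough stand-in for: value LIKE 'prefix%'."""
    return value.startswith(prefix)

def like_contains(value, part):
    """Rough stand-in for: value LIKE '%part%'."""
    return part in value

quoted_prefix_match = like_prefix(as_json, "F")      # first character is a double quote
quoted_contains_match = like_contains(as_json, "F")  # F is still inside the value
unquoted_prefix_match = like_prefix(as_text, "F")
```

With the quoted value the prefix test fails because the first character is the double quote, while the contains test still succeeds, which matches what the question observed.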

How to read the value in a BINARY column in MySQL

I want to add a BINARY(35) column to a table in which values are written whose bits are each assigned to a specific meaning.
i.e. "000110001010..."
1st bit: day 1,
2nd bit: day 2,
etc..
I've found out how to write the value into the table
INSERT INTO MYTABLE VALUES(x'03011...');
but how do I retrieve it from the database?
If I cast the column as a character string, I'll lose everything past the first x'00' (NULL) in the value. In my application, it's entirely possible that there will still be '1's past this.
Because I'm using the C++ connector, I've only its API functions to retrieve the data so I'll need to know the type of the data retrieved. The API does not have a getBinary() function. If any of you can tell me which function to use, I'd really appreciate it.
Got the answer from another Q&A site.
SELECT HEX(mycolumn) FROM MYTABLE;
If anyone wants to read more about this:
Hexadecimal Literals: https://dev.mysql.com/doc/refman/5.7/en/hexadecimal-literals.html
Bit-Field Literals: https://dev.mysql.com/doc/refman/5.7/en/bit-field-literals.html
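Once the value arrives as a hex string, it can be fetched with any ordinary string getter and decoded on the client. A rough sketch in Python (the question uses the C++ connector, so treat this as the logic only). The function name days_set is made up, and the mapping "most significant bit of the first byte is day 1" is an assumption about the layout, not something stated in the question.

```python
def days_set(hex_text, width_bytes=35):
    """Return the 1-based day numbers whose bit is set in a HEX() result."""
    raw = bytes.fromhex(hex_text)
    # BINARY(n) values are right-padded with 0x00, so pad short input the same way
    raw = raw.ljust(width_bytes, b"\x00")
    days = []
    for byte_index, byte in enumerate(raw):
        for bit_index in range(8):
            # Assumed layout: highest bit of each byte comes first
            if byte & (0x80 >> bit_index):
                days.append(byte_index * 8 + bit_index + 1)
    return days

example = days_set("0301")
with_zero_bytes = days_set("00008000")
```

Zero bytes in the middle of the value are not a problem here, because the hex string contains no embedded NUL characters.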
SUBSTRING(CAST(mycolumn AS CHAR), 1, 1)

Converting CSV string to multiple columns in Apache Drill

Using: Apache Drill
I am trying to bring the following data in a more structured form:
"apple","juice", "box:12,shipment_id:143,pallet:B12"
"mango", "pulp", "box:7,shipment_id:133,pallet:B19,route:11"
"grape", "jam", "box:10"
Desired output:
fruit, product, box_id, shipment_id, pallet_id, route_id
apple,juice, 12, 143, B12, null
mango, pulp, 7, 133, B19, 11
grape, jam, 10, null, null, null
The dataset runs to a couple of GB. Drill reads the input into three columns, with the last string in one single column. I have achieved the desired output by performing string manipulation (REGEXP_REPLACE and CONCAT) on the last column, then reading the column as JSON (CONVERT_FROM), and finally separating it into different columns using KVGEN and FLATTEN.
The execution time is pretty high due to the regex functions. Is there a better approach?
(PS: execution time is compared to using a pyspark job to achieve the desired output).
I do not see any other way to do it 100% with Apache Drill without any intermediate storage.
You could try a custom function written in Java, which would make the parsing easier to write.
Since you have already done the work, have you tried saving the data in a Parquet file? CTAS command: http://drill.apache.org/docs/create-table-as-ctas-command/
This would make subsequent queries a lot faster.
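For what it's worth, the parsing such a custom function would have to do needs no regex at all. A sketch of the logic in Python (not Drill's function API, and not Java; the name split_attributes and the fixed key order are assumptions taken from the desired output in the question):

```python
def split_attributes(attr_text, keys=("box", "shipment_id", "pallet", "route")):
    """Split 'k1:v1,k2:v2' into one value per expected key, None when absent."""
    found = {}
    for pair in attr_text.split(","):
        key, sep, value = pair.partition(":")
        if sep:  # skip fragments that contain no colon
            found[key.strip()] = value.strip()
    return [found.get(k) for k in keys]

full_row = split_attributes("box:7,shipment_id:133,pallet:B19,route:11")
partial_row = split_attributes("box:12,shipment_id:143,pallet:B12")
single_row = split_attributes("box:10")
```

This assumes that values never contain a comma or a colon, which holds for the sample data shown.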

MySQL increment value in a text field

Say I have a text field with JSON data like this:
{
"id": {
"name": "value",
"votes": 0
}
}
Is there a way to write a query which would find id and then increment the votes value?
I know I could just retrieve the JSON data, update what I need, and reinsert the updated version, but I wonder whether there is a way to do this without running two queries.
UPDATE `sometable`
SET `somefield` = JSON_REPLACE(`somefield`, '$.id.votes', JSON_EXTRACT(`somefield` , '$.id.votes')+1)
WHERE ...
Edit
As of MySQL 5.7.8, MySQL supports a native JSON data type that enables efficient access to data in JSON documents.
JSON_EXTRACT will allow you to access a particular JSON element in a JSON field, while JSON_REPLACE will allow you to update it.
To specify the JSON element you wish to access, use a string with the format
'$.[top element].[sub element].[...]'
So in your case, to access id.votes, use the string '$.id.votes'.
The SQL code above demonstrates putting all this together to increment the value of a JSON field by 1.
I think for a task like this you're stuck using a plain old SELECT followed by an UPDATE (after you parse the JSON, increment the value you want, and then serialize the JSON back).
You should wrap these operations in a single transaction, and if you're using InnoDB then you might also consider using SELECT ... FOR UPDATE : http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html
This is sort of a tangent, but I thought I'd also mention that this is the type of operation that a NoSQL database like MongoDB is quite good at.
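The parse, increment, serialize step from this answer could look roughly like this in Python. The function name increment_votes is made up, and the SELECT ... FOR UPDATE before it and the UPDATE after it (both inside one transaction) are not shown.

```python
import json

def increment_votes(json_text, item_id="id"):
    """Parse the stored JSON, add 1 to <item_id>.votes, return the new JSON text."""
    doc = json.loads(json_text)
    doc[item_id]["votes"] += 1
    return json.dumps(doc)

stored = '{"id": {"name": "value", "votes": 0}}'
updated = increment_votes(stored)
```

The returned string is what would be written back to the text field by the UPDATE.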

MySQL - Extracting numbers out of strings

In a MySQL database, I have a table which contains itemID, itemName and some other fields.
Sample records (respectively itemID and itemName):
vaX652bp_X987_foobar, FooBarItem
X34_bar, BarItem
tooX56, TOOX_What
I want to write a query which gives me an output like:
652, FooBarItem
34, BarItem
56, TOOX_What
In other words, I want to extract the number from the itemID column. The condition is that the extracted number should be the one that occurs after the first occurrence of the character "X" in the itemID column.
I am currently trying out LOCATE() and SUBSTRING() but have not (yet) achieved what I want.
EDIT:
Unrelated to the question - Can any one see all the answers (currently two) to this question ? I see only the first answer by "soulmerge". Any ideas why ? And the million dollar question - Did I just find a bug ?!
That's a horrible thing to do in MySQL, since versions before 8.0 do not support extraction of regex matches (REGEXP_SUBSTR only arrived in MySQL 8.0). I would rather recommend pulling the data into your language of choice and processing it there. If you really must do this in MySQL, using unreadable combinations of LOCATE and SUBSTRING with multiple CASEs is the only thing I can think of.
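As a sketch of the "process it in your language of choice" route, here is what the extraction could look like in Python. The function name number_after_first_x is made up, and returning None when no digits follow the first "X" is a choice made for this example, not something the question specifies.

```python
import re

def number_after_first_x(item_id):
    """Return the digits directly after the first 'X' as an int, else None."""
    first_x = item_id.find("X")
    if first_x == -1:
        return None
    # re.match anchors at the start, so only digits right after the X count
    match = re.match(r"\d+", item_id[first_x + 1:])
    return int(match.group()) if match else None

samples = ["vaX652bp_X987_foobar", "X34_bar", "tooX56"]
numbers = [number_after_first_x(s) for s in samples]
```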
Why don't you add a third column in which you store the number alone at the moment the record is inserted (extracting the number in PHP or similar)? That way you use a little more space but save a lot of processing.
Table:
vaX652bp_X987_foobar, 652, FooBarItem
X34_bar, 34, BarItem
tooX56, 56, TOOX_What
This isn't so unreadable. Adding 0 forces MySQL to convert the substring to a number, which keeps only its leading digits:
SELECT 0+SUBSTRING(itemID, LOCATE('X', itemID)+1), itemName FROM tableName;