Django mysql count distinct gives different result to postgres - mysql

I'm trying to count distinct string values for a fitered set of results in a django query against a mysql database versus the same data in a postgres database. However, I'm getting really confusing results.
In the code below, NewOrder represents queries against the same data in a postgres database, and OldOrder is the same data in a MYSQL instance.
( In the old database, completed orders had status=1, in the new DB complete status = 'Complete'. In both the 'email' field is the same )
OldOrder.objects.filter(status=1).count()
6751
NewOrder.objects.filter(status='Complete').count()
6751
OldOrder.objects.filter(status=1).values('email').distinct().count()
3747
NewOrder.objects.filter(status='Complete').values('email').distinct().count()
3825
print NewOrder.objects.filter(status='Complete').values('email').distinct().query
SELECT DISTINCT "order_order"."email" FROM "order_order" WHERE "order_order"."status" = Complete
print OldSale.objects.filter(status=1).values('email').distinct().query
SELECT DISTINCT "order_order"."email" FROM "order_order" WHERE "order_order"."status" = 1
And here is where it gets really bizarre
new_orders = NewOrder.objects.filter(status='Complete').values_list('email', flat=True)
len(set(new_orders))
3825
old_orders = OldOrder.objects.filter(status=1).values_list('email',flat=True)
len(set(old_orders))
3825
Can anyone explain this discrepancy? And possibly point me as to why results would be different between postgres and mysql? My only guess is a character encoding issue, but I'd expect the results of the python set() to also be different?

Sounds like you're probably using a case-insensitive collation in MySQL. There's no equivalent in PostgreSQL; the closest is the citext data type, but usually you just compare lower(...) of strings, or use ILIKE for pattern matching.
I don't know how to say it in Django, but I'd see if the count of the set of distinct lowercased email addresses is the same as the old DB.
According to the Django docs something like this might work:
NewOrder.objects.filter(status='Complete').values(Lower('email')).distinct()

Related

how to include hard-coded value to output from mysql query?

I've created a MySQL sproc which returns 3 separate result sets. I'm implementing the npm mysql package downstream to exec the sproc and get a result structured in json with the 3 result sets. I need the ability to filter the json result sets that are returned based on some type of indicator in each result set. For example, if I wanted to get the result set from the json response which deals specifically with Suppliers then I could use some type of js filter similar to this:
var supplierResultSet = mySqlJsonResults.filter(x => x.ResultType === 'SupplierResults');
I think SQL Server provides the ability to include a hard-coded column value in a SQL result set like this:
select
'SupplierResults',
*
from
supplier
However, this approach appears to be invalid in MySQL b/c MySQL Workbench is telling me that the sproc syntax is invalid and won't let me save the changes. Do you know if something like what I'm trying to achieve is possible in MySQL and if not then can you recommend alternative approaches that would help me achieve my ultimate goal of including some type of fixed indicator in each result set to provide a handle for downstream filtering of the json response?
If I followed you correctly, you just need to prefix * with the table name or alias:
select 'SupplierResults' hardcoded, s.* from supplier s
As far as I know, this is the SQL Standard. select * is valid only when no other expression is added in the selec clause; SQL Server is lax about this, but most other databases follow the standard.
It is also a good idea to assign a name to the column that contains the hardcoded value (I named it hardcoded in the above query).
In MySQL you can simply put the * first:
SELECT *, 'SupplierResults'
FROM supplier
Demo on dbfiddle
To be more specific, in your case, in your query you would need to do this
select
'SupplierResults',
supplier.* -- <-- this
from
supplier
Try this
create table a (f1 int);
insert into a values (1);
select 'xxx', f1, a.* from a;
Basically, if there are other fields in select, prefix '*' with table name or alias

nested "select " query in mysql

hi i am executing nested "select" query in mysql .
the query is
SELECT `btitle` FROM `backlog` WHERE `bid` in (SELECT `abacklog_id` FROM `asprint` WHERE `aid`=184 )
I am not getting expected answer by the above query. If I execute:
SELECT abacklog_id FROM asprint WHERE aid=184
separately
I will get abacklog_id as 42,43,44,45;
So if again I execute:
SELECT `btitle` FROM `backlog` WHERE `bid` in(42,43,44,45)
I will get btitle as scrum1 scrum2 scrum3 msoffice
But if I combine those queries I will get only scrum1 remaining 3 atitle will not get.
You Can Try As Like Following...
SELECT `age_backlog`.`ab_title` FROM `age_backlog` LEFT JOIN `age_sprint` ON `age_backlog`.`ab_id` = `age_sprint`.`as_backlog_id` WHERE `age_sprint`.`as_id` = 184
By using this query you will get result with loop . You will be able to get all result with same by place with comma separated by using IMPLODE function ..
May it will be helpful for you... If you get any error , Please inform me...
What you did is to store comma separated values in age_sprint.as_backlog_id, right?
Your query actually becomes
SELECT `ab_title` FROM `age_backlog` WHERE `ab_id` IN ('42,43,44,45')
Note the ' in the IN() function. You don't get separate numbers, you get one string.
Now, when you do
SELECT CAST('42,43,44,45' AS SIGNED)
which basically is the implicit cast MySQL does, the result is 42. That's why you just get scrum1 as result.
You can search for dozens of answers to this problem here on SO.
You should never ever store comma separated values in a database. It violates the first normal form. In most cases databases are in third normal form or BCNF or even higher. Lower normal forms are just used in some special cases to get the most performance, usually for reporting issues. Not for actually working with data. You want 1 row for every as_backlog_id.
Again, your primary goal should be to get a better database design, not to write some crazy functions to get each comma separated number out of the field.

get all records matching the query string and display like in google suggest by google

What I want to do is, somewhat similar to google suggest.
My client page will submit the search text through ajax to server. The server will grab that text and query all the records matching that string and return back to client page.
e.g.
text_frm_client = "Ba". The query will show all the records beginning with "Ba"
The raw sql query to achieve my problem is
**Select * from table_name where column1 LIKE "Ba%" or column2 LIKE "Ba%"**
Now I want to port this query to django model. What I found is somewhat somewhat similar.
https://docs.djangoproject.com/en/dev/ref/models/querysets/#std:fieldlookup-contains
But this is only for one field. How can I accomplish the sql query with django model.
You can use the Q
data = MyModel.objects.filter(
Q(column1__contains="Ba") |
Q(column2__contains="Ba")
)

How best to retrieve result of SELECT COUNT(*) from SQL query in Java/JDBC - Long? BigInteger?

I'm using Hibernate but doing a simple SQLQuery, so I think this boils down to a basic JDBC question. My production app runs on MySQL but my test cases use an in memory HSQLDB. I find that a SELECT COUNT operation returns BigInteger from MySQL but Long from HSQLDB.
MySQL 5.5.22
HSQLDB 2.2.5
The code I've come up with is:
SQLQuery tq = session.createSQLQuery(
"SELECT COUNT(*) AS count FROM calendar_month WHERE date = :date");
tq.setDate("date", eachDate);
Object countobj = tq.list().get(0);
int count = (countobj instanceof BigInteger) ?
((BigInteger)countobj).intValue() : ((Long)countobj).intValue();
This problem of the return type negates answers to other SO questions such as getting count(*) using createSQLQuery in hibernate? where the advice is to use setResultTransformer to map the return value into a bean. The bean must have a type of either BigInteger or Long, and fails if the type is not correct.
I'm reluctant to use a cast operator on the 'COUNT(*) AS count' portion of my SQL for fear of database interoperability. I realise I'm already using createSQLQuery so I'm already stepping outside the bounds of Hibernates attempts at database neutrality, but having had trouble before with the differences between MySQL and HSQLDB in terms of database constraints
Any advice?
I don't known a clear solution for this problem, but I will suggest you to use H2 database for your tests.
H2 database has a feature that you can connect using a compatibility mode to several different databases.
For example to use MySQL mode you connect to the database using this jdbc:h2:~/test;MODE=MySQL URL.
You can downcast to Number and then call the intValue() method. E.g.
SQLQuery tq = session.createSQLQuery("SELECT COUNT(*) AS count FROM calendar_month WHERE date = :date");
tq.setDate("date", eachDate);
Object countobj = tq.list().get(0);
int count = ((Number) countobj).intValue();
Two ideas:
You can get result value as String and then parse it to Long or BigInteger
Do not use COUNT(*) AS count FROM ..., better use something like COUNT(*) AS cnt ... but in your example code you do not use name of result column but it index, so you can use simply COUNT(*) FROM ...

Query on custom metadata field?

This is a request from my client to tweak an existing Perl script. However, it is the actual database structure on their end that confuses me.
The requirement looks pretty simple:
only pull records where _X begins with 1, 2, or 9.
However, the underlying database is not that simple, here is the guideline from their DBA:
"_X is a custom metadata field. The database stores this data in rows, not columns, within the customData table. In order to query the custom data table in an efficient manner you need to know the Field_ID for the custom field you get that from the fielddef table:
SELECT Field_ID FROM FieldDef WHERE Name = "_X";
This returns:
10012
"Now you can query CustomData. For example:
SELECT Record_ID FROM CustomData where Field_ID="10012" AND StringValue="2012-04";
He also suggests that in my case, probably it would be:
"SELECT Record_ID FROM CustomData where Field_ID="10012" AND (StringValue LIKE '1%' || StringValue LIKE '2%' || StringValue LIKE '9%')
The weird thing is that the existing Perl script doesn't contain anything like "Select Record_ID FROM" but all like "SELECT StringValue FROM".
So that is why I am very confused here: What is "store in rows, not in columns"? Why first query the Field_ID table then CustomData? I would not be able to communicate with any of them during this weekend but really wish to get some idea on the whole thing, hope experts can help me a little on sorting out the whole structure.
More info(Table schema):
http://pastebin.com/ZiDTCCC0
The existing perl script:(focus on lines 72-136)
http://pastebin.com/JHpikTeZ
Thanks in advance.
What they seem to be using is some kind of Entity-Attribute-Value model, with the entities stored as ints and explained in another table (FieldDef).
You explained pretty well how you queried it (although you can do it in one query, with a join or a subquery), and your problem seems to be that you don't know how the Perl script does it. Unfortunately, without us seeing the Perl script, we can't either :]