We are facing a big problem with string length control in SQL Server 2008.
A brief recap of our system:
- data is imported into a persistent staging area (PSA) from *.txt files (semicolon-separated) using BULK INSERT in SQL Server;
- in the PSA table all columns are varchar(MAX);
- cleaning is done with an INSERT statement based on a SELECT with multiple WHERE conditions.
The problem concerns the type and length of a single column: at the data warehouse level it has to be numeric and must not exceed 13 digits.
The select is the following:
select cast(LTRIM(RTRIM(data_giacenza)) as numeric),
LTRIM(RTRIM(codice_socio)),
LTRIM(RTRIM(codice_gln)),
LTRIM(RTRIM(tipo_gln)),
LTRIM(RTRIM(codice_articolo_socio)),
LTRIM(RTRIM(codice_ean_prodotto)),
LTRIM(RTRIM(codice_ecat_prodotto)),
LTRIM(RTRIM(famiglia)),
LTRIM(RTRIM(marca)),
LTRIM(RTRIM(classificazione_liv_1)),
LTRIM(RTRIM(classificazione_liv_2)),
LTRIM(RTRIM(classificazione_liv_3)),
LTRIM(RTRIM(classificazione_liv_4)),
LTRIM(RTRIM(modello)),
LTRIM(RTRIM(descrizione_articolo)),
cast(LTRIM(RTRIM(giacenza)) as numeric),
cast(LTRIM(RTRIM(acquistato)) as numeric), 'X' FROM psa_stock a
where EXISTS
(
SELECT 0
FROM(
SELECT
data_giacenza
,codice_socio
,codice_gln
,codice_articolo_socio
FROM psa_stock
where
LEN(LTRIM(RTRIM(data_giacenza))) = 8 and LEN(LTRIM(RTRIM(codice_socio))) = 3
and LEN(LTRIM(RTRIM(codice_gln))) = 13 and LEN(LTRIM(RTRIM(tipo_gln))) = 3
and LEN(LTRIM(RTRIM(codice_articolo_socio))) <= 15
and (LEN(LTRIM(RTRIM(codice_ean_prodotto))) <= 13 or LEN(ISNULL(codice_ean_prodotto, '')) = 0)
and (LEN(LTRIM(RTRIM(codice_ecat_prodotto))) = 9 or LEN(ISNULL(codice_ecat_prodotto, '')) = 0)
and LEN(LTRIM(RTRIM(famiglia))) = 2
and (LEN(LTRIM(RTRIM(marca))) <= 20 or LEN(ISNULL(marca, '')) = 0)
and (LEN(LTRIM(RTRIM(modello))) <= 30 or LEN(ISNULL(modello, '')) = 0)
and (LEN(LTRIM(RTRIM(descrizione_articolo))) <= 50 or LEN(ISNULL(descrizione_articolo, '')) = 0)
and LEN(LTRIM(RTRIM(giacenza))) <= 5
and LEN(LTRIM(RTRIM(acquistato))) <= 5
and (LEN(LTRIM(RTRIM(classificazione_liv_1))) <= 15 or LEN(ISNULL(classificazione_liv_1, '')) = 0)
and (LEN(LTRIM(RTRIM(classificazione_liv_2))) <= 15 or LEN(ISNULL(classificazione_liv_2, '')) = 0)
and (LEN(LTRIM(RTRIM(classificazione_liv_3))) <= 15 or LEN(ISNULL(classificazione_liv_3, '')) = 0)
and (LEN(LTRIM(RTRIM(classificazione_liv_4))) <= 15 or LEN(ISNULL(classificazione_liv_4, '')) = 0)
and ISNUMERIC(ltrim(rtrim(REPLACE(data_giacenza, ' ', '')))) = 1
and ISNUMERIC(ltrim(rtrim(REPLACE(codice_gln, ' ', '')))) = 1
and ISNUMERIC(LTRIM(RTRIM(REPLACE(giacenza, ' ', '')))) = 1 and charindex(',', giacenza) = 0
and ISNUMERIC(LTRIM(RTRIM(REPLACE(acquistato, ' ', '')))) = 1
and ISNUMERIC(ltrim(rtrim(REPLACE(codice_ean_prodotto, ' ', '')))) = 1
and ISNUMERIC(ltrim(rtrim(REPLACE(codice_ecat_prodotto, ' ', '')))) = 1
and codice_socio in (select codice_socio from ana_socio)
and tipo_gln in (select tipo from ana_gln)
and codice_gln in (select codice_gln from dw_key_gln)
group by
data_giacenza
,codice_socio
,codice_gln
,codice_articolo_socio
having COUNT (*) = 1
) b
where
a.data_giacenza = b.data_giacenza and
a.codice_articolo_socio = b.codice_articolo_socio and
a.codice_socio = b.codice_socio and
a.codice_gln = b.codice_gln)
The critical field is codice_ean_prodotto. The query still lets through values such as SEAGAT7636490026751, NE20000003039 and NE20000002168, which are not numeric; the first one also exceeds the maximum length.
As a result, the INSERT statement fails with the error:
String or binary data would be truncated
Thanks in advance! I look forward to your help!
Enrico
Have you tried executing that query and adding codice_ean_prodotto = 'NE20000003039' to the WHERE clause? Make sure this is actually the field that is giving you the problem. If the SELECT returns a row with that condition, then something is wrong with the logic.
I suspect the HAVING COUNT(*) = 1 clause in the EXISTS subquery: is it possible to have more than one record for these specific keys? If your PK is made up of those 4 fields (data_giacenza, codice_articolo_socio, codice_socio, codice_gln), you shouldn't need the GROUP BY and HAVING clauses at all. If you're not joining on your primary key, that could be the culprit.
It's hard to tell without seeing your data model, however.
I figured out what was wrong.
In the inner SELECT we excluded all records that violated the format constraints or were duplicated (that is the meaning of COUNT(*) = 1), extracting only the PK of the destination table.
But when the outer query matched rows by PK, it also retrieved duplicate records that shared a key with a valid record but had themselves been excluded by the format constraints, so the INSERT failed on their length.
Now I have split the process into two steps:
1. Duplicate lookup and deletion
2. Selection with format constraints
It works!
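A minimal sketch of the failure and of the two-step fix, run through Python's built-in sqlite3 module so it is self-contained. Assumptions: the table is cut down to a hypothetical single key column k (standing in for the 4-column PK) plus the EAN column; SQLite's NOT GLOB '*[^0-9]*' stands in for the usual T-SQL digits-only idiom NOT LIKE '%[^0-9]%'; and "duplicate deletion" is taken to mean dropping every row whose key occurs more than once.

```python
import sqlite3

# Stand-in for the format constraint: digits only, 1 to 13 characters.
VALID = "ean NOT GLOB '*[^0-9]*' AND length(ean) BETWEEN 1 AND 13"


def load(conn):
    """Hypothetical cut-down psa_stock: k plays the role of the 4-column PK."""
    conn.execute("CREATE TABLE psa_stock (k INTEGER, ean TEXT)")
    conn.executemany("INSERT INTO psa_stock VALUES (?, ?)", [
        (1, "8001234567890"),        # valid row...
        (1, "SEAGAT7636490026751"),  # ...sharing its key with an invalid one
        (2, "NE20000003039"),        # unique key, non-numeric EAN
        (3, "8009876543210"),        # unique key, valid EAN
    ])


def original_select(conn):
    # Format filter and COUNT(*) = 1 live only inside EXISTS; the outer query
    # then matches every row with that key, including the invalid duplicate.
    return conn.execute(f"""
        SELECT a.k, a.ean FROM psa_stock a
        WHERE EXISTS (
            SELECT 0 FROM (
                SELECT k FROM psa_stock
                WHERE {VALID}
                GROUP BY k HAVING COUNT(*) = 1
            ) b
            WHERE a.k = b.k)
        ORDER BY a.k, a.ean""").fetchall()


def two_step(conn):
    # Step 1: look up and delete duplicated keys.
    conn.execute("""
        DELETE FROM psa_stock WHERE k IN (
            SELECT k FROM psa_stock GROUP BY k HAVING COUNT(*) > 1)""")
    # Step 2: select only the rows that satisfy the format constraints.
    return conn.execute(
        f"SELECT k, ean FROM psa_stock WHERE {VALID} ORDER BY k").fetchall()


conn = sqlite3.connect(":memory:")
load(conn)
print(original_select(conn))  # key 1's invalid EAN slips through
print(two_step(conn))
```

Another option under the same assumptions would be to repeat the format conditions in the outer WHERE, so that only the validated row itself is picked up.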
We have a scenario where users answer some questions related to a parent entity that we'll call a widget. Each question has both a numeric and word answer. Multiple users answer each question for a given widget.
We then display a row for each widget with the average numeric answer for each question. We do that using a MySQL pseudo-pivot with dynamic columns, as detailed here. So we end up with something like:
SELECT widget_id, ...
ROUND(IFNULL(AVG(CASE
WHEN LOWER(REPLACE(RQ.question, ' ', '_')) = 'overall_size' THEN
if(RA.num = '', 0, RA.num) END),0) + .0001, 2) AS `raw_avg_overall_size`,
...
...where overall_size is one of the question types related to the widget, and might have answers from 5 users such as 1, 2, 2, 3, 1 for a given widget_id, based on the answer options below:
Answers
answer_id | answer_type  | num | word
111       | overall_size | 1   | x-large
112       | overall_size | 2   | large
113       | overall_size | 3   | medium
114       | overall_size | 4   | small
115       | overall_size | 5   | x-small
So we would end up with a row like this:
widget_id | average_overall_size
115       | 1.80
If we round 1.80 to zero decimal places we get 2, which in this example is the word value 'large' from the data above. We would like to include that word in the query output too, so that we end up with:
widget_id | raw_average_overall_size | average_overall_size
115       | 1.80                     | large
The issue is that we do not know the average for the row until the query runs. So how can we then reference the word value for that average answer in the same row when executing the query?
As mentioned, we build the pivot into a variable and then run another query for the full execution. If we join in the pivot section, that subquery looks something like this:
SET @phase_id = 1;
SET SESSION group_concat_max_len = 100000;
SET @SQL = NULL;
SET @NSQL = NULL;
SELECT GROUP_CONCAT(DISTINCT
CONCAT(
'ROUND(IFNULL(AVG(CASE
WHEN LOWER(REPLACE(RQ.short_question, '' '', ''_'')) = ''',
nsq,
''' THEN
if(RA.answer = '''', 0, RA.answer) END),0) + .0001, 2) AS `',
CONCAT('avg_raw_',nsq), '`,
REF.value, -- <- ******* THIS FAILS **** --
ROUND(IFNULL(STDDEV(CASE
WHEN LOWER(REPLACE(RQ.short_question, '' '', ''_'')) = ''',
nsq,
''' THEN RA.answer END), 0) + .0001, 3) AS `',
CONCAT('std_dev_', nsq), '`
'
)
ORDER BY display_order
) INTO @NSQL
FROM (
SELECT FD.ref_value, FD.element_name, RQ.display_order, LOWER(REPLACE(RQ.short_question, ' ', '_')) as nsq
FROM review_questions RQ
LEFT JOIN form_data FD ON FD.id = RQ.form_data_id
LEFT JOIN ref_values RV on FD.ref_value = RV.type
WHERE RQ.phase_id = @phase_id
AND FD.element_type = 'select'
AND RQ.is_active > 0
GROUP BY FD.element_name
HAVING MAX(RV.key_name) REGEXP '^[0-9]+$'
) nq
/****** suggested in 1st answer ******/
LEFT JOIN ref_values REF ON REF.`type` = nq.ref_value
AND REF.key_name = ROUND(CONCAT('avg_raw_',nsq), 0);
So we need the word answer (the REF.value field from the REF join in the code above) in the pivot output, but it fails with 'Unknown column REF.value'. If we put REF.value in its parent query's field list, that also fails with the same error.
You'll need to join the table/view/query again to get the 'large' value.
For example:
select a.*, b.word
from (
-- your query here
) a
join my_table b on b.answer_id = a.answer_id
and b.num = round(a.num);
An index on my_table (answer_id, num) will speed up the extra search.
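A runnable sketch of that approach with Python's sqlite3 module, assuming a hypothetical responses table that holds one row per user answer (the question's real schema goes through review_questions and ref_values). The average is computed in its own derived table first, then joined back to the answers table on the rounded value:

```python
import sqlite3

SETUP = """
CREATE TABLE answers (answer_id INTEGER, answer_type TEXT, num INTEGER, word TEXT);
INSERT INTO answers VALUES
  (111, 'overall_size', 1, 'x-large'), (112, 'overall_size', 2, 'large'),
  (113, 'overall_size', 3, 'medium'),  (114, 'overall_size', 4, 'small'),
  (115, 'overall_size', 5, 'x-small');
-- Hypothetical table of individual user responses.
CREATE TABLE responses (widget_id INTEGER, answer_type TEXT, num INTEGER);
INSERT INTO responses VALUES
  (115, 'overall_size', 1), (115, 'overall_size', 2), (115, 'overall_size', 2),
  (115, 'overall_size', 3), (115, 'overall_size', 1);
"""

QUERY = """
SELECT a.widget_id,
       ROUND(a.raw_avg, 2) AS raw_average_overall_size,
       b.word              AS average_overall_size
FROM (
    -- the aggregate is computed in its own derived table first...
    SELECT widget_id, answer_type, AVG(num) AS raw_avg
    FROM responses
    GROUP BY widget_id, answer_type
) a
-- ...then joined back to answers on the rounded average
JOIN answers b
  ON b.answer_type = a.answer_type
 AND b.num = ROUND(a.raw_avg)
"""


def averages_with_words():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn.execute(QUERY).fetchall()


print(averages_with_words())  # one row: widget 115, average 1.8, word 'large'
```

Joining on answer_type as well as num keeps the lookup from matching word values that belong to a different question.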
This fails, leading to the default of "2":
LOWER(REPLACE(RQ.question, ' ', '_')) = 'overall_size'
That is because the question seems to be "average_overall_size", not "overall_size".
String parsing and manipulation is painful in SQL; I suggest handling it in the application.
Also, be aware that you may need a separate subquery to compute aggregates (e.g. AVG()); otherwise they might not be computed over the set of values you expect.
Query into temp table, then join
The first query should produce a temporary table, temp_average_size, like this:
widget_id | average_overall_size | rounded_average_size
115       | 1.80                 | 2
Then LEFT JOIN it to the answers:
select s.*, a.word
from temp_average_size s LEFT JOIN answers a
ON (s.rounded_average_size = a.num AND a.answer_type = 'overall_size')
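A sketch of the two steps through Python's sqlite3 module, with made-up responses in a hypothetical responses table. Blank answers are counted as 0, mirroring the question's if(RA.num = '', 0, RA.num). That is why the LEFT JOIN matters: a widget whose average rounds to 0 has no matching word and still appears, with NULL (None in Python) in place of the word.

```python
import sqlite3


def rounded_words():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE answers (answer_id INTEGER, answer_type TEXT, num INTEGER, word TEXT);
    INSERT INTO answers VALUES
      (111, 'overall_size', 1, 'x-large'), (112, 'overall_size', 2, 'large'),
      (113, 'overall_size', 3, 'medium'),  (114, 'overall_size', 4, 'small'),
      (115, 'overall_size', 5, 'x-small');
    -- Hypothetical raw responses; '' is a blank answer.
    CREATE TABLE responses (widget_id INTEGER, num TEXT);
    INSERT INTO responses VALUES
      (115, '1'), (115, '2'), (115, '2'), (115, '3'), (115, '1'),
      (116, ''),  (116, ''),  (116, '1');
    -- Step 1: materialise the averages (blank counted as 0).
    CREATE TEMP TABLE temp_average_size AS
    SELECT widget_id,
           ROUND(AVG(CASE WHEN num = '' THEN 0 ELSE CAST(num AS INTEGER) END), 2)
               AS average_overall_size,
           CAST(ROUND(AVG(CASE WHEN num = '' THEN 0 ELSE CAST(num AS INTEGER) END))
                AS INTEGER) AS rounded_average_size
    FROM responses
    GROUP BY widget_id;
    """)
    # Step 2: LEFT JOIN keeps widgets whose rounded average has no word.
    return conn.execute("""
        SELECT s.widget_id, s.average_overall_size, a.word
        FROM temp_average_size s
        LEFT JOIN answers a
          ON s.rounded_average_size = a.num AND a.answer_type = 'overall_size'
        ORDER BY s.widget_id""").fetchall()


for row in rounded_words():
    print(row)
```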
Question:
I looked at various other examples of incrementing a counter over rows, but they all produced the same wrong output. The problem I encountered was that my code did not increment correctly over the rows, so it did not build a correct index for each new row of the result set per episode (highlighted in red below).
My first try was:
SET @ep_1 = "Peaky Blinders";
SET @curRow_1 = 0;
SELECT
DATE_FORMAT(created_at, "%Y%m%d") AS year_month_day,
@curRow_1 := @curRow_1 + 1 AS row_number,
@ep_1 AS episode_title,
COUNT(id) AS episode_plays
FROM netflix.episode_plays
WHERE
episode_id = "xyz"
AND created_at >= "2019-07-01" AND created_at <= "2019-07-07"
GROUP BY 1
Besides the rows not incrementing correctly, I also got the following error when I tried setting some variables at the beginning of my code:
Error running query: Illegal mix of collations (utf8_unicode_ci,IMPLICIT) and (utf8_general_ci,IMPLICIT) for operation '='
(Note: I have no affiliation with Netflix, I just used Netflix dummy data to answer my question)
I broke my question down into several parts and arrived at the final answer below.
The most important part was to wrap the initial result sets in subqueries and then select the data from the derived tables x1, x2, etc.
The second part of the question was how to combine multiple datasets (in my case: how to do this not only for one specific Netflix episode, but for multiple episodes). I settled on the UNION ALL clause.
In the first iteration I hard-coded the dates; afterwards I found DATE_ADD with INTERVAL very helpful.
Finally, I fixed the collation error by adding COLLATE utf8_unicode_ci when setting my variables.
If you find mistakes in my code or have any other suggestions, please feel free to share them.
-- SET DATA
-- variables for table x1
SET @ep_1 = "Peaky Blinders" COLLATE utf8_unicode_ci;
SET @id_1 = (SELECT id FROM netflix.episodes WHERE episode_title = @ep_1);
SET @date_1 = (SELECT created_at FROM netflix.episodes WHERE episode_title = @ep_1);
SET @curRow_1 = 0;
-- variables for table x2
SET @ep_2 = "Brooklyn Nine-Nine" COLLATE utf8_unicode_ci;
SET @id_2 = (SELECT id FROM netflix.episodes WHERE episode_title = @ep_2);
SET @date_2 = (SELECT created_at FROM netflix.episodes WHERE episode_title = @ep_2);
SET @curRow_2 = 0;
-- QUERY DATA
SELECT
x1.year_month_day,
@curRow_1 := @curRow_1 + 1 AS row_number,
x1.episode_title,
x1.episode_plays
FROM (
SELECT
DATE_FORMAT(created_at, "%Y%m%d") AS year_month_day,
@ep_1 AS episode_title,
COUNT(id) AS episode_plays
FROM netflix.episode_plays
WHERE
episode_id = @id_1
AND created_at >= @date_1 AND created_at <= DATE_ADD(@date_1, INTERVAL 7 DAY)
GROUP BY 1) x1
UNION ALL
SELECT
x2.year_month_day,
@curRow_2 := @curRow_2 + 1 AS row_number,
x2.episode_title,
x2.episode_plays
FROM (
SELECT
DATE_FORMAT(created_at, "%Y%m%d") AS year_month_day,
@ep_2 AS episode_title,
COUNT(id) AS episode_plays
FROM netflix.episode_plays
WHERE
episode_id = @id_2
AND created_at >= @date_2 AND created_at <= DATE_ADD(@date_2, INTERVAL 7 DAY)
GROUP BY 1) x2;
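As one further suggestion: on MySQL 8.0 or later, ROW_NUMBER() OVER (PARTITION BY ...) can number the daily rows per episode in a single query, without user variables or one UNION ALL branch per episode. This is a swapped-in technique, not the approach above. The sketch runs the same window-function syntax through Python's sqlite3 module (SQLite 3.25+); the plays and the episode ids ep1/ep2 are made up, and the date-range filter is left out to keep the focus on the numbering:

```python
import sqlite3


def daily_plays_numbered():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE episode_plays (id INTEGER, episode_id TEXT, created_at TEXT)")
    conn.executemany("INSERT INTO episode_plays VALUES (?, ?, ?)", [
        (1, "ep1", "2019-07-01 10:00"), (2, "ep1", "2019-07-01 12:00"),
        (3, "ep1", "2019-07-02 09:00"), (4, "ep2", "2019-07-03 08:00"),
        (5, "ep2", "2019-07-05 08:00"), (6, "ep2", "2019-07-05 09:00"),
    ])
    # Inner query: plays per episode per day (strftime plays DATE_FORMAT's role).
    # Outer query: the counter restarts at 1 for every episode.
    return conn.execute("""
        SELECT episode_id,
               year_month_day,
               ROW_NUMBER() OVER (PARTITION BY episode_id
                                  ORDER BY year_month_day) AS row_num,
               episode_plays
        FROM (
            SELECT episode_id,
                   strftime('%Y%m%d', created_at) AS year_month_day,
                   COUNT(id) AS episode_plays
            FROM episode_plays
            GROUP BY episode_id, year_month_day
        ) x
        ORDER BY episode_id, row_num""").fetchall()


for row in daily_plays_numbered():
    print(row)
```

The alias row_num is used instead of row_number to avoid clashing with the function name.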
I've got a rather large query that gets a list of carriers and compares the amount of insurance they have on record, to identify carriers that do not meet a minimum threshold. If I run the SELECT on its own, it works fine with no errors. But when I use it in an INSERT into a table, it returns this error message:
[Err] 1366 - Incorrect decimal value: '' for column '' at row -1
I have to use the cast as decimal at the bottom of this query because the value that is being stored in the database is a varchar and I cannot change that.
Does anyone have any ideas?
set @cw_days = 15;
INSERT INTO carrier_dnl (carrier_id, dnl_reason_id, status_id)
SELECT work_cw_carrier_status_update.carrier_id, company_dnl_schema.dnl_reason_id,
CASE
WHEN work_cw_carrier_status_update.comparison_date > @cw_days THEN 1
ELSE 4
END as status
FROM work_cw_carrier_status_update
JOIN company_dnl_schema
ON company_dnl_schema.dnl_reason_id = 51
LEFT OUTER JOIN carrier_insurance
ON carrier_insurance.carrier_id = work_cw_carrier_status_update.carrier_id
WHERE ifnull(carrier_insurance.insurance_type_id,4) = 4
AND date(now()) BETWEEN IFNULL(carrier_insurance.insurance_effective_date,DATE_SUB(now(),INTERVAL 1 day)) AND IFNULL(carrier_insurance.insurance_expiration_date,DATE_ADD(now(),INTERVAL 1 day))
AND CASE WHEN NULLIF(carrier_insurance.insurance_bipdto_amount,'') is null THEN 0 < company_dnl_schema.value
ELSE
ifnull(cast(replace(carrier_insurance.insurance_bipdto_amount, '*','') as decimal),0) < company_dnl_schema.value
END
AND ( work_cw_carrier_status_update.b_bulk = 0 OR work_cw_carrier_status_update.b_bulk = 1 )
AND ( work_cw_carrier_status_update.b_otr = 1 OR work_cw_carrier_status_update.b_ltl = 1
OR work_cw_carrier_status_update.b_dray = 1 OR work_cw_carrier_status_update.b_rail = 1
OR work_cw_carrier_status_update.b_intermodal = 1 OR work_cw_carrier_status_update.b_forwarder = 1
OR work_cw_carrier_status_update.b_broker = 1 )
group by work_cw_carrier_status_update.carrier_id;
If the select seems to work, then there are two possible problems. The first is that the select doesn't really work and the problem appears further down in the data. Returning one or a handful of rows is not always the same as "working".
The second is an incompatibility with the types for the insert. You can try to use silent conversion to convert the values in the select to numbers:
SELECT work_cw_carrier_status_update.carrier_id + 0, company_dnl_schema.dnl_reason_id + 0,
(CASE WHEN work_cw_carrier_status_update.comparison_date > @cw_days THEN 1
ELSE 4
END) as status
This may look ugly, but it is not nearly as ugly as storing ids as strings in one table and as numbers in another.
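To check the first possibility (the problem is further down in the data), it can help to list the rows the CAST would choke on. One plausible culprit, given that the error reports the value '', is an amount such as '*' that is not empty (so the NULLIF(..., '') branch does not catch it) but becomes '' after REPLACE(..., '*', ''). That is a hypothesis, not something confirmed in the thread. A sketch with Python's sqlite3 module and made-up rows; GLOB stands in for a check that in MySQL could be written with REGEXP:

```python
import sqlite3


def suspicious_amounts(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE carrier_insurance "
                 "(carrier_id INTEGER, insurance_bipdto_amount TEXT)")
    conn.executemany("INSERT INTO carrier_insurance VALUES (?, ?)", rows)
    # Rows that pass the NULLIF(..., '') check but, once the '*' characters
    # are stripped (what the CAST actually receives), are empty or contain
    # something other than digits and '.'.
    return conn.execute("""
        SELECT carrier_id, insurance_bipdto_amount
        FROM carrier_insurance
        WHERE NULLIF(insurance_bipdto_amount, '') IS NOT NULL
          AND (REPLACE(insurance_bipdto_amount, '*', '') = ''
               OR REPLACE(insurance_bipdto_amount, '*', '') GLOB '*[^0-9.]*')
        ORDER BY carrier_id""").fetchall()


sample = [(1, "750000"), (2, ""), (3, None),
          (4, "*"), (5, "1,000,000"), (6, "*500000")]
print(suspicious_amounts(sample))
```

Rows 2 and 3 are handled by the existing NULLIF branch, and row 6 casts cleanly after the REPLACE, so only rows 4 and 5 are reported.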
Here is my Input Control:
Note that the data in "Course Group" (a single-select query input control) is in English.
Here is the query that fetches the data for "Course Group":
select distinct(cog.cog_id) id, concat(cd.cd_shortdescription, '
(', cn.cn_shortname,' - ', cog.cog_org_be_id, ' - ', cd_code.cd_code, ')')
coursegroup
from es_exam_statistics_ft es, cg_classgroup cg, org_organisation org,
cn_campusname cn, cog_coursegroup cog, cd_codedescription cd, cd_code
where es.es_cg_id = cg.cg_id
and es.es_cog_id = cog.cog_id
and cog.cog_coursegroup_cd_id = cd.cd_id
and cd.cd_id = cd_code.cd_id
and org.org_be_id = cog.cog_org_be_id
and org.org_campusid = cn.cn_campusid
and cg.cg_startdate >= $P{startDate}
and cg.cg_enddate <= $P{endDate}
and cd.cd_language_id = 3
and cn.cn_language_id = 3
order by coursegroup
The problem is with the lines I had made bold (the two language_id conditions).
Language IDs:
2 = Afrikaans
3 = English
As you can see, the query is hard-coded so that the language is always English, so if a user logs in with a different language, the data in the input control will still be in English.
I tried replacing the value 3 in the query (and cd.cd_language_id = 3) with "$P{REPORT_LOCALE}.getDisplayLanguage().equals("English") ? new Integer(3): new Integer(2)"
This works in the XML of a report, but doesn't work in the input control's query.
How do I solve this issue?
You can't change the language of the input control's values directly, because those values are stored in the database: an input control only fetches whatever data is stored there. You can show the values in another language only if the database also stores them in that language.
To change the input values at the JasperReports Server level:
1. Create one more input control (p_language) that asks for the language (value column: id; visible column: language description).
2. Then create a cascading input control that fetches the course group values using this query:
select distinct(cog.cog_id) id, concat(cd.cd_shortdescription, '
(', cn.cn_shortname,' - ', cog.cog_org_be_id, ' - ', cd_code.cd_code, ')')
coursegroup
from es_exam_statistics_ft es, cg_classgroup cg, org_organisation org,
cn_campusname cn, cog_coursegroup cog, cd_codedescription cd, cd_code
where es.es_cg_id = cg.cg_id
and es.es_cog_id = cog.cog_id
and cog.cog_coursegroup_cd_id = cd.cd_id
and cd.cd_id = cd_code.cd_id
and org.org_be_id = cog.cog_org_be_id
and org.org_campusid = cn.cn_campusid
and cg.cg_startdate >= $P{startDate}
and cg.cg_enddate <= $P{endDate}
and cd.cd_language_id = $P{p_language}
and cn.cn_language_id = $P{p_language}
order by coursegroup
There are two types of records in the same column, DICIPLINE (table DEG): values such as MS-NW and values such as CS. If a value is a two-letter code (CS, TE or the like), I want to wrap it as BS(CS) (or BS(TE), etc.); if it is like MS-NW (or MS-CS, MS-TE and the like), I want to convert it to MS(NW).
I have successfully updated the two-letter values with the following query. How can I do the same for values like MS-NW or MS-CS, converting them to the format MS(NW)?
UPDATE DEG set DICIPLINE = concat("BS(",DICIPLINE,")") where CHAR_LENGTH(DICIPLINE) = 2
The query below updates your data:
update deg set DICIPLINE = if(length(DICIPLINE) = 2, concat('BS(', DICIPLINE, ')'),
concat('MS(', substr(DICIPLINE, 4, 4), ')'));
See Sqlfiddle demo.
For safety, create a temporary column of the same type and perform an update like this:
UPDATE deg
SET dicipline_temp = CASE
WHEN CHAR_LENGTH(dicipline) = 2
THEN CONCAT('BS(', dicipline, ')')
WHEN CHAR_LENGTH(dicipline) = 5 AND SUBSTRING(dicipline, 3, 1) = '-'
THEN CONCAT(REPLACE(dicipline, '-', '('), ')')
END
WHERE CHAR_LENGTH(dicipline) = 2 OR (CHAR_LENGTH(dicipline) = 5 AND SUBSTRING(dicipline, 3, 1) = '-')
If results are acceptable, update the actual column.
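A quick way to preview the CASE on sample values before touching real data: the same UPDATE run through Python's sqlite3 module, with LENGTH, SUBSTR and || standing in for MySQL's CHAR_LENGTH, SUBSTRING and CONCAT. The sample values are hypothetical; anything that matches neither pattern (BSCS here) keeps a NULL temp value.

```python
import sqlite3


def convert(values):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE deg (dicipline TEXT, dicipline_temp TEXT)")
    conn.executemany("INSERT INTO deg (dicipline) VALUES (?)",
                     [(v,) for v in values])
    # Two-letter codes become BS(xx); 'XX-YY' becomes XX(YY).
    conn.execute("""
        UPDATE deg
        SET dicipline_temp = CASE
            WHEN LENGTH(dicipline) = 2
                THEN 'BS(' || dicipline || ')'
            WHEN LENGTH(dicipline) = 5 AND SUBSTR(dicipline, 3, 1) = '-'
                THEN REPLACE(dicipline, '-', '(') || ')'
        END
        WHERE LENGTH(dicipline) = 2
           OR (LENGTH(dicipline) = 5 AND SUBSTR(dicipline, 3, 1) = '-')""")
    return conn.execute(
        "SELECT dicipline, dicipline_temp FROM deg ORDER BY rowid").fetchall()


for before, after in convert(["CS", "TE", "MS-NW", "MS-CS", "BSCS"]):
    print(before, "->", after)
```

Once the previewed values look right, copying them over could be done with UPDATE deg SET dicipline = dicipline_temp WHERE dicipline_temp IS NOT NULL, which leaves unmatched rows untouched.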