I have a table with the following fields:
id, type, date, changelog.
The changelog field holds 10 useful pieces of information that I would like to split out into their own fields: the old and new values of name, month, year, zipcode, and status.
So I would like to create a table with the following fields:
id, type, date, old_name, new_name, old_month, new_month, old_year, new_year, old_zipcode, new_zipcode, old_status, new_status.
When all 5 pieces of information exist it is easy, but when some are missing I can’t get it to work. Any help is appreciated.
A typical changelog field doesn't have all of these pieces of information, just the ones being updated.
For example:
id type date changelog
101 upd 1/1/2019 ---!hash:ActiveSupport
name:
- Adam
- Chris
month:
- 7
- 12
status:
- 1
- 3
Which would translate to:
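id  | type | date   | old_name | new_name | old_month | new_month | old_year | new_year | old_zipcode | new_zipcode | old_status | new_status
----+------+--------+----------+----------+-----------+-----------+----------+----------+-------------+-------------+------------+-----------
101 | upd  | 1/1/19 | Adam     | Chris    | 7         | 12        |          |          |             |             | 1          | 3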
This is not a complete solution (it assumes you can already parse out the values when you know they are present), but it addresses how to handle when those values are missing:
INSERT INTO tableV2 (id, type, date, old_name, new_name, and so on....)
SELECT id, type, date
, CASE WHEN INSTR(changelog, 'name:') = 0 THEN NULL
ELSE (parse the value out here)
END AS old_name
, CASE WHEN INSTR(changelog, 'name:') = 0 THEN NULL
ELSE (parse the value out here)
END AS new_name
, and so on....
FROM tableV1
;
The parsing, while not trivial, probably won't be too difficult, just tedious. You'll need to take the location of the found "tag", find the 3 newlines following it (the first ends the tag line, the other two end the old and new values), and then use those positions with other string functions such as SUBSTR and LEFT, and maybe CHAR_LENGTH of the tag string, like CHAR_LENGTH('name:'), so that the parsing can be repeated for each tag with only minor modification.
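To make the "tag, then two value lines" logic concrete, here is a minimal sketch outside the database, in Python. It assumes every tag sits on its own line and is followed by exactly two "- value" lines (old first, then new), as in the example above; parse_changelog and FIELDS are names made up for this illustration.

```python
# Hypothetical helper: split a changelog into old_/new_ values per field.
# Assumes each tag ("name:", "month:", ...) is on its own line and is
# followed by two "- value" lines: the old value, then the new value.
FIELDS = ("name", "month", "year", "zipcode", "status")


def parse_changelog(changelog):
    lines = [line.strip() for line in changelog.splitlines()]
    row = {}
    for field in FIELDS:
        old = new = None  # stays None when the tag is missing
        tag = field + ":"
        if tag in lines:
            i = lines.index(tag)
            values = lines[i + 1:i + 3]
            if len(values) == 2 and all(v.startswith("-") for v in values):
                old, new = (v[1:].strip() for v in values)
        row["old_" + field] = old
        row["new_" + field] = new
    return row


sample = (
    "---!hash:ActiveSupport\n"
    "name:\n- Adam\n- Chris\n"
    "month:\n- 7\n- 12\n"
    "status:\n- 1\n- 3"
)
row = parse_changelog(sample)
print(row)
```

Missing tags simply come back as None, which plays the same role as the THEN NULL branch of the CASE expressions.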
I have a table that for an ID, will have data in several bucket fields. I want a function to pull out a sum of buckets, but the function parameters will include the start and end bucket field.
So, if I had a table like this:
ID Bucket0 Bucket30 Bucket60 Bucket90 Bucket120
10 5.00 12.00 10.00 0.0 8.00
If I send in the ID and the parameters Bucket0, Bucket0, it would return only the value in the Bucket0 field: 5.00
If I send in the ID and the parameters Bucket30, Bucket120, it would return the sum of the buckets from 30 to 120: 12 + 10 + 0 + 8 = 30.00.
Is there a nicer way to write this other than a huge ugly
if parameter1=bucket0 and parameter2=bucket0
then select bucket0
else if parameter1=bucket0 and parameter2=bucket30
then select bucket0 + bucket30
else if parameter1=bucket0 and parameter2=bucket60
then select bucket0 + bucket30 + bucket60
and so on?
The table already exists, so I don't have a lot of control over that. I can define the parameters of the function however I want. I can safely say that if a set of buckets is wanted, none in the middle will be skipped, so specifying start and end buckets would work. Alternatively, I could pass a single comma-delimited string of all the buckets wanted.
It would have been better if your table had been normalised, like this:
id | bucket | value
---+-----------+------
10 | bucket000 | 5
10 | bucket030 | 12
10 | bucket060 | 10
10 | bucket090 | 0
10 | bucket120 | 8
Also, the buckets should have names that are easy to compare in ranges, so that bucket030 comes between bucket000 and bucket120 in normal alphabetical order, which is not the case if you leave out the padded zeroes.
If the above normalisation is not possible, then use an unpivot clause to turn your current table into the structure depicted above:
select id, sum(value)
from (
    select *
    from mytable
    unpivot (value for bucket_id in (bucket0 as 'bucket000',
                                     bucket30 as 'bucket030',
                                     bucket60 as 'bucket060',
                                     bucket90 as 'bucket090',
                                     bucket120 as 'bucket120'))
) normalised
where bucket_id between 'bucket000' and 'bucket060'
group by id
When you do this with parameter variables, make sure those parameters have the padded zeroes as well.
You could for instance ensure that as follows for parameter1:
if parameter1 like 'bucket%' then
parameter1 := 'bucket' || lpad(+substr(parameter1, 7), 3, '0');
end if;
...etc.
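The same idea (zero-pad the bucket names, then filter with a plain range comparison) can be sketched in Python to check the expected results from the question. The padded and sum_buckets helpers and the dict layout of the row are assumptions made for this illustration, not part of the existing table or function.

```python
import re


def padded(name):
    """Turn 'Bucket30' into 'bucket030' so names sort in numeric order."""
    match = re.fullmatch(r"bucket(\d+)", name, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("not a bucket name: " + name)
    return "bucket" + match.group(1).zfill(3)


def sum_buckets(row, start, end):
    """Sum every bucket column whose padded name lies in [start, end]."""
    lo, hi = padded(start), padded(end)
    return sum(
        value
        for column, value in row.items()
        if column.lower().startswith("bucket") and lo <= padded(column) <= hi
    )


# One row of the table from the question, as a plain dict.
row = {"ID": 10, "Bucket0": 5.0, "Bucket30": 12.0,
       "Bucket60": 10.0, "Bucket90": 0.0, "Bucket120": 8.0}

print(sum_buckets(row, "Bucket0", "Bucket0"))
print(sum_buckets(row, "Bucket30", "Bucket120"))
```

Without the padding, 'bucket120' would sort before 'bucket30' as a string, which is exactly the problem the padded zeroes avoid.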
I have a table with the columns 'ID' and 'File_Name'.
Table
ID File_Name
123 ROSE1234_LLDAtIInstance_03012014_04292014_190038.zip
456 ROSE1234_LLDAtIInstance_08012014_04292014_190038.zip
All I need is to pick up the first date given in the file name.
Required:
ID Date
123 03012014
456 08012014
Here's one method, assuming the date is always the 8 characters after the second _.
It finds the position of the first _, searches for the second _ starting one character past the first, and then takes the 8 characters after the second _.
SELECT Id
, substr(File_name, instr(File_name, '_', instr(File_name, '_') + 1) + 1, 8) as "Date"
FROM Table
Alternatively, a more elegant way is to use the REGEXP_INSTR function, which can search for the second occurrence directly and so eliminates the nested instr calls.
SELECT Id, substr(File_name, REGEXP_INSTR(File_name, '_', 1, 2) + 1, 8) as "Date"
FROM Table;
Why don't you simply put the date in a separate column? Then you could, for example, query the (indexed) date. In theory the date is a property of the file. It's about avoiding errors, maintainability and so on. What is in the zip files? Excel sheets, I suppose :-)
Use a much simplified call to REGEXP_SUBSTR( ):
with tbl(ID, File_name) as (
  select 123, 'ROSE1234_LLDAtIInstance_03012014_04292014_190038.zip' from dual
  union
  select 456, 'ROSE1234_LLDAtIInstance_08012014_04292014_190038.zip' from dual
)
select ID,
       REGEXP_SUBSTR(File_name, '_(\d{8})_', 1, 1, NULL, 1) "Date"
from tbl;

        ID Date
---------- ----------------------------------------------------
       123 03012014
       456 08012014
For 11g, see the Oracle documentation for the parameters to REGEXP_SUBSTR( ).
EDIT: Making this a virtual column would be another way to handle it. Thanks to Epicurist's post for the idea. The virtual column will contain a date value holding the filename date once the ID and filename are committed. Add it like this:
alter table X_TEST add (filedate date generated always as (TO_DATE(REGEXP_SUBSTR(Filename, '_(\d{8})_', 1, 1, NULL, 1), 'MMDDYYYY')) virtual);
So now just insert the ID and Filename, commit, and there's your filedate. Note that it's read-only.
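For anyone who wants to try the pattern outside the database, here is a small Python sketch of the same capture-group approach. It assumes, like the TO_DATE call above, that the 8 digits are in MMDDYYYY order; first_file_date is a hypothetical helper name.

```python
import re
from datetime import date, datetime


def first_file_date(file_name):
    """Return the first _MMDDYYYY_ block in the file name as a date, or None."""
    match = re.search(r"_(\d{8})_", file_name)
    if match is None:
        return None
    # Assumption: the digits are month, day, year (MMDDYYYY).
    return datetime.strptime(match.group(1), "%m%d%Y").date()


names = {
    123: "ROSE1234_LLDAtIInstance_03012014_04292014_190038.zip",
    456: "ROSE1234_LLDAtIInstance_08012014_04292014_190038.zip",
}
for file_id, name in names.items():
    print(file_id, first_file_date(name))
```

Because the pattern requires an underscore on both sides of the 8 digits, the 4 digits in ROSE1234 are never picked up by mistake.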
If I have a typical MySQL query such as:
SELECT CommentNo, CreatedDate, CreatedTime, Text FROM Comment
The results will be displayed in a simple table. But is it possible to have MySQL format the output so that a blank line can be inserted into the results based upon a change in a column?
So if my query above returns the following by default:
CommentNo CreatedDate CreatedTime Text
1 2012-08-02 15:33:27 This.
2 2012-08-02 15:34:40 That.
3 2013-06-30 19:45:48 Something else.
4 2013-06-30 21:26:01 Nothing.
5 2013-06-30 21:26:43 Was.
6 2013-07-01 13:40:32 Hello.
7 2013-07-01 14:08:25 Goodbye.
Is it possible to insert a blank row upon change of column value for CreatedDate, so I get:
CommentNo CreatedDate CreatedTime Text
1 2012-08-02 15:33:27 This.
2 2012-08-02 15:34:40 That.

3 2013-06-30 19:45:48 Something else.
4 2013-06-30 21:26:01 Nothing.
5 2013-06-30 21:26:43 Was.

6 2013-07-01 13:40:32 Hello.
7 2013-07-01 14:08:25 Goodbye.
The above is not how the data would be stored in the Comment table; it is purely a matter of formatting the query output.
Or would it be better to use a BufferedWriter in Java?
First, let me say that this type of formatting should definitely be done at the application layer and not as a query.
That said, this seems like an amusing exercise, and it isn't so hard:
select CommentNo, CreatedDate, CreatedTime, Text
from (select c.*, CreatedDate as key1, 0 as ordering
      from comment c
      union all
      select c2.*, c.CreatedDate, 1
      from (select distinct CreatedDate from comment c) c left join
           comment c2
           on 1 = 0
     ) c
order by key1, ordering, CommentNo;
Note the use of the left join in the second subquery to bring in all the columns, so it matches the select * in the first subquery. However, getting rid of the last two columns still requires listing all of them.
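Since the formatting really belongs in the application layer, here is a minimal sketch of what that could look like, in Python for brevity (the same loop translates directly to Java). It assumes the rows arrive already ordered by CreatedDate; with_blank_lines is a name made up for this example.

```python
from itertools import groupby

# Rows as they might come back from the query, ordered by CreatedDate.
rows = [
    (1, "2012-08-02", "15:33:27", "This."),
    (2, "2012-08-02", "15:34:40", "That."),
    (3, "2013-06-30", "19:45:48", "Something else."),
    (4, "2013-06-30", "21:26:01", "Nothing."),
    (5, "2013-06-30", "21:26:43", "Was."),
    (6, "2013-07-01", "13:40:32", "Hello."),
    (7, "2013-07-01", "14:08:25", "Goodbye."),
]


def with_blank_lines(rows):
    """Format rows as tab-separated lines, with an empty line between dates."""
    out = []
    for index, (_, group) in enumerate(groupby(rows, key=lambda r: r[1])):
        if index > 0:
            out.append("")  # CreatedDate changed: emit the separator line
        out.extend("\t".join(str(col) for col in row) for row in group)
    return out


lines = with_blank_lines(rows)
print("\n".join(lines))
```

groupby only groups consecutive rows, so the ORDER BY in the query is what guarantees one block per date.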
I have adapted some code I found, which can easily be extended to insert a blank line while writing the parsed data to the .csv file.
// Requires java.io.FileWriter, java.util.ArrayList, java.util.List
// and org.apache.commons.csv.CSVPrinter (Apache Commons CSV).
// Initialise the FileWriter object.
fileWriter = new FileWriter(fileName);
// Initialise the CSVPrinter object.
csvFilePrinter = new CSVPrinter(fileWriter, csvFileFormat);
// Create the .csv file header.
csvFilePrinter.printRecord(FILE_HEADER);
// Write each student in the list to the .csv file.
for (Student student : students) {
    List<Object> studentDataRecord = new ArrayList<>();
    studentDataRecord.add(String.valueOf(student.getId()));
    studentDataRecord.add(student.getFirstName());
    studentDataRecord.add(student.getLastName());
    studentDataRecord.add(student.getGender());
    studentDataRecord.add(String.valueOf(student.getAge()));
    csvFilePrinter.printRecord(studentDataRecord);
    // Emit an empty line; as written, this happens after every record.
    csvFilePrinter.println();
}
I have yet to add the code that detects the change in the column value, but that is straightforward.
I was given the task of showing the CPU usage trend as part of a build process that also runs regression tests.
Each individual test case run has a record in the table RegrCaseResult. The RegrCaseResult table looks something like this:
id projectName ProjectType returnCode startTime endTime totalMetrics
1 'first' 'someType' 16 'someTime' 'someOtherTime' 222
The RegrCaseResult.totalMetrics is a special key which links to another table called ThreadMetrics through ThreadMetrics.id.
Here is what ThreadMetrics looks like:
id componentType componentName cpuTime linkId
1 'Job Totals' 'Job Totals' 'totalTime' 34223
2 'parser1' 'parser1' 'time1' null
3 'parser2' 'generator1' 'time2' null
4 'generator1' 'generator1' 'time3' null
------------------------------------------------------
5 'Job Totals' 'Job Totals' 'totalTime' 9899
...
The rows with the componentName 'Job Totals' are what totalMetrics from the RegrCaseResult table links to, and 'totalTime' is what I really want to get for a given projectType. 'Job Totals' is actually a summation of the other records; in the above example, it is the summation of time1 through time3. The linkId at the end of the ThreadMetrics table links back to RegrCaseResult.id.
The requirements also state that I should have a way to enforce a condition that includes only those projects which have a consistent return code during a certain period. That is where my initial question comes from:
I created the following simple table to show what I am trying to achieve:
id projectName returnCode
1 'first' 16
2 'second' 16
3 'third' 8
4 'first' 16
5 'second' 8
6 'first' 16
Basically I want to get all the projects which have a consistent returnCode, no matter what the returnCode values are. In the above sample, I should get only one project, which is "first". I think this should be simple, but I am bad when it comes to databases. Any help would be great.
I tried my best to make it clear. Hope I have achieved my goal.
Here is an easy way:
select projectname
from table t
group by projectname
having min(returncode) = max(returncode);
If the min() and max() values are the same, then all the values are the same (unless you have NULL values).
EDIT:
To keep 'third' out, you need some other rule, such as having more than one return code. So, you can do this:
select projectname
from table t
group by projectname
having min(returncode) = max(returncode) and count(*) > 1;
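Both variants can be checked against the sample data from the question. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table name results is an assumption, since the queries above use a placeholder table name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; columns follow the sample in the question.
conn.execute(
    "CREATE TABLE results ("
    "id INTEGER PRIMARY KEY, projectName TEXT, returnCode INTEGER)"
)
conn.executemany(
    "INSERT INTO results (projectName, returnCode) VALUES (?, ?)",
    [("first", 16), ("second", 16), ("third", 8),
     ("first", 16), ("second", 8), ("first", 16)],
)

# Variant 1: every return code of the project is the same.
consistent = [row[0] for row in conn.execute(
    "SELECT projectName FROM results GROUP BY projectName "
    "HAVING MIN(returnCode) = MAX(returnCode) ORDER BY projectName"
)]

# Variant 2: same, but the project must also have more than one run.
repeated = [row[0] for row in conn.execute(
    "SELECT projectName FROM results GROUP BY projectName "
    "HAVING MIN(returnCode) = MAX(returnCode) AND COUNT(*) > 1 "
    "ORDER BY projectName"
)]

print(consistent)
print(repeated)
```

On this data the first query also returns 'third' (a single run is trivially consistent), while the second returns only 'first'.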
select projectName from projects
group by projectName having count(distinct(returnCode)) = 1
This would also return projects which have only one entry.
How do you want to handle them?
Working example: http://www.sqlfiddle.com/#!2/e7338/8
This should do it:
SELECT COUNT(ProjectName) AS numCount, ProjectName FROM (
SELECT ProjectName FROM Foo
GROUP BY ProjectName, ReturnCode
) AS Inside
GROUP BY Inside.ProjectName
HAVING numCount = 1
This groups all the ProjectNames by their names and return codes, then selects those that only have a single return code listed.
SQLFiddle Link: http://sqlfiddle.com/#!2/c52b6/11/0
You can try something like this with Not Exists:
Select Distinct ProjectName
From Table A
Where Not Exists
(
Select 1
From Table B
Where B.ProjectName = A.ProjectName
And B.ReturnCode <> A.ReturnCode
)
I'm not sure exactly what you're selecting, so you can change the Select statement to what you need.
I have a very large csv log file with the following header:
CustomerID , Date , URL , ....
I want to find all those customers who had visited at least 2 distinct URLS exactly in 2 days within the last 3 days.
What would the SQL command be?
I thought of this one (how the date part looks, GETDATE() - 4, is not important at the moment):
SELECT CustomerID FROM log
WHERE DATE > (GETDATE() - 4)
GROUP BY (CustomerID, DATE, URL)
HAVING COUNT(DISTINCT(DATE)) = 2
AND HAVING (COUNT(DISTINCT(URL))) > 2
Just leave out the second HAVING keyword, like so:
HAVING condition1 > val1 AND condition2 > val2
Sorry, I'm on a phone, so I can't copy and paste that well.
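As a rough illustration of a single HAVING clause with two conditions, here is a sketch using Python's built-in sqlite3 module. It rests on several assumptions: the requirement is read as "exactly 2 distinct days and at least 2 distinct URLs inside the window", the rows are grouped by CustomerID only (grouping by Date and URL as well would leave one distinct value per group), the sample rows are invented, and a fixed cutoff date stands in for the GETDATE() arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (CustomerID INTEGER, Date TEXT, URL TEXT)")
# Invented sample rows, only there to exercise the query.
conn.executemany("INSERT INTO log VALUES (?, ?, ?)", [
    (1, "2024-01-08", "/a"), (1, "2024-01-09", "/b"),  # 2 days, 2 URLs
    (2, "2024-01-08", "/a"), (2, "2024-01-09", "/a"),  # 2 days, 1 URL
    (3, "2024-01-08", "/a"), (3, "2024-01-08", "/b"),  # 1 day, 2 URLs
    (4, "2024-01-01", "/a"), (4, "2024-01-09", "/b"),  # 1 day in window
])

query = """
SELECT CustomerID
FROM log
WHERE Date > ?
GROUP BY CustomerID
HAVING COUNT(DISTINCT Date) = 2
   AND COUNT(DISTINCT URL) >= 2
ORDER BY CustomerID
"""
# Fixed cutoff instead of GETDATE() so the example is repeatable.
matches = [row[0] for row in conn.execute(query, ("2024-01-06",))]
print(matches)
```

Only customer 1 satisfies both conditions here; the others fail on the URL count, the day count, or the date window.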