How to remove duplicate rows in Google Sheets using script - csv

I currently have a column of data titled JobID. It contains duplicates because a daily import grabs the latest data for the JobIDs in question and appends the rows to the top of the sheet.
Therefore the most recent JobID rows are the ones with the data we need.
I'd like to know if there is a script that can be run on the sheet called 'History' to look up the column JobID, search every row below for duplicates and remove them, leaving the top, most recent JobID rows in the sheet.
I know that it is really easy to remove duplicates using the "Remove Duplicates" tool in Google Sheets... but I'm lazy and I'm trying to automate as much of this process as possible.
The script I have below runs without an error but is still not doing what I need it to. Wondering where I am going wrong here:
function removeDuplicates() {
  //Get current active Spreadsheet
  var sheet = SpreadsheetApp.getActive();
  var history = sheet.getSheetByName("History");
  //Get all values from the spreadsheet's rows
  var data = history.getDataRange().getValues();
  //Create an array for non-duplicates
  var newData = [];
  //Iterate through a row's cells
  for (var i in data) {
    var row = data[i];
    var duplicate = false;
    for (var j in newData) {
      if (row.join() == newData[j].join()) {
        duplicate = true;
      }
    }
    //If not a duplicate, put in newData array
    if (!duplicate) {
      newData.push(row);
    }
  }
  //Delete the old Sheet and insert the newData array
  history.clearContents();
  history.getRange(1, 1, newData.length, newData[0].length).setValues(newData);
}

Remove Duplicate JobIDs
This function keeps the rows nearest the top of the list. To keep the bottom-most rows instead, re-sort the list in reverse order first.
function removeDuplicates() {
  var ss = SpreadsheetApp.getActive();
  var sh = ss.getSheetByName("History");
  var vA = sh.getDataRange().getValues();
  var hA = vA[0];
  var hObj = {};
  hA.forEach(function(e, i) { hObj[e] = i; }); // header title to column index
  var uA = []; // JobIDs seen so far
  var d = 0;   // rows deleted so far
  for (var i = 0; i < vA.length; i++) {
    if (uA.indexOf(vA[i][hObj['JobID']]) == -1) {
      uA.push(vA[i][hObj['JobID']]);
    } else {
      // Earlier deletions shift the remaining rows up, so subtract d
      sh.deleteRow(i + 1 - d++);
    }
  }
}
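As a possible alternative to deleting rows one at a time, the same "keep the first JobID seen" rule can be applied in memory and written back in one call. This is only a sketch: `dedupeByHeader` and `removeDuplicatesInMemory` are hypothetical names that are not part of the answer above, `Set` assumes the V8 runtime, and writing values back replaces any formulas in the range with their current values.

```javascript
// Pure helper: keeps the first row seen for each value in the named column.
// `values` is a 2D array as returned by getValues(); row 0 is the header.
function dedupeByHeader(values, headerName) {
  var col = values[0].indexOf(headerName);
  if (col === -1) {
    throw new Error('Column not found: ' + headerName);
  }
  var seen = new Set();
  return values.filter(function (row, i) {
    if (i === 0) return true;             // always keep the header row
    if (seen.has(row[col])) return false; // later duplicate: drop it
    seen.add(row[col]);
    return true;
  });
}

// Hypothetical Apps Script wrapper (assumption, not from the original answer).
function removeDuplicatesInMemory() {
  var sh = SpreadsheetApp.getActive().getSheetByName('History');
  var range = sh.getDataRange();
  var kept = dedupeByHeader(range.getValues(), 'JobID');
  range.clearContent(); // clears values only, formatting stays
  sh.getRange(1, 1, kept.length, kept[0].length).setValues(kept);
}
```

Unlike the script in the question, which compares whole rows with `row.join()`, this compares only the JobID column, so rows with the same JobID but different data still count as duplicates.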

Remove Duplicate JobIDs in Python
Based on Cooper's answer I wrote the same function in Python:
gsheet_id = "the-gsheet-id"
sh = gc.open_by_url("https://docs.google.com/spreadsheets/d/%s/edit#gid=0" % gsheet_id)
wks = sh[0]
def removeDuplicates(gwks):
    headerRow = gwks[1]
    columnToIndex = {}
    i = 0
    for column in headerRow:
        columnToIndex[column] = i
        i += 1
    uniqueArray = []
    d = 0
    row_i = 0
    for row in gwks:
        row_i += 1
        if gwks[row_i][columnToIndex['JobID']] not in uniqueArray:
            uniqueArray.append(gwks[row_i][columnToIndex['JobID']])
        else:
            d += 1
            gwks.delete_rows(row_i + 1 - d, 1)
removeDuplicates(wks)

Related

AutoFill Data with Blank Rows - Google Sheets / Google Apps Script

I have the spreadsheet below and would like to autofill each person's name. The issue is that there are blank rows between the names. Each name is in line with a sku2 and needs to be in line with all of its locations. There can be up to 10 blank rows (depending on how many locations there are).
Maybe I could loop something like this:
function LoopTillLr() {
  var spreadsheet = SpreadsheetApp.getActive();
  spreadsheet.getRange('A2').activate();
  spreadsheet.getActiveRange().autoFillToNeighbor(SpreadsheetApp.AutoFillSeries.DEFAULT_SERIES);
  spreadsheet.getCurrentCell().getNextDataCell(SpreadsheetApp.Direction.DOWN).activate();
};
Appreciate any help
If you only want to replicate the NAME values against variable LOCATION values then use this script:
function myFunction() {
  var ss = SpreadsheetApp.getActiveSheet();
  var lastRow = ss.getDataRange().getLastRow();
  for (var i = 1; i < lastRow+1; i++) {
    if (ss.getRange(i,1).getValue() == "") {
      var value = ss.getRange(i-1,1).getValue();
      ss.getRange(i,1).setValue(value);
    }
  }
}
Ensure that A2 is not empty, or else the script will fail.
If there are a lot of records, you can create a function and run it. The following fills down to the end of the sheet, so either delete the rows at the bottom that you do not need or adjust the range on the second line of the function.
function autoFillDown() {
  const range = SpreadsheetApp.getActiveSheet().getRange("A:A");
  const rows = range.getValues();
  let outputArray = [];
  rows.forEach(row => {
    if (row[0] !== "" || outputArray.length === 0) {
      // if it contains a name (or there is no row above it yet), leave it
      outputArray.push([row[0]]);
    } else {
      // otherwise replace it with the value above it
      outputArray.push([outputArray[outputArray.length - 1][0]]);
    }
  });
  range.setValues(outputArray);
}
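The fill-down logic can also be separated from the sheet access, which makes it easy to check without a spreadsheet. The following is a minimal sketch under the assumption that names live in column A with a header in row 1; `fillDown` and `autoFillNames` are hypothetical names introduced here for illustration.

```javascript
// Pure helper: `column` is a single-column 2D array such as
// [['Ann'], [''], ['Bob']]. Blank cells take the last non-blank value above.
function fillDown(column) {
  var last = '';
  return column.map(function (row) {
    if (row[0] !== '' && row[0] !== null) {
      last = row[0]; // remember the most recent name
    }
    return [last];
  });
}

// Hypothetical Apps Script wrapper: fills column A from row 2 to the last row.
function autoFillNames() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var lastRow = sheet.getLastRow();
  if (lastRow < 2) return; // nothing below the header
  var range = sheet.getRange(2, 1, lastRow - 1, 1);
  range.setValues(fillDown(range.getValues()));
}
```

Reading and writing the column with one `getValues()` and one `setValues()` call avoids a separate request for every cell, which tends to matter on longer sheets.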

Google Script to remove duplicates from top to bottom

I have a script which combines three scripts to do the following:
1) Insert rows from one tab to the top of another tab
2) Remove duplicates from the tab in which the data was just added
3) Clear out the old tab from which the data was just ported over from
For the De-dupe script, it deletes rows starting at the bottom and then goes up. So I'm having established and existing data deleted. What I need it to do is start at the top and go down. So if new row records ported over from the first script are found to be a duplicate, it should delete those instead.
How can I get the de-dupe script to essentially process the opposite way?
I did find reverse logic with the below link, but I can't find a way to make it work with my script and keep getting errors. I'm also not sure if this would be the best methodology to fit in with my overall script.
Link: Removing Duplicate Rows in a google Spreadsheet from the end row
function Run(){
  insert();
  removeDuplicates();
  clear1();
}
function insert() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var source = ss.getSheetByName('Candidate Refresh'); // change here
  var des = ss.getSheetByName('Candidate Listing'); // change here
  var sv = source
    .getDataRange()
    .getValues();
  sv.shift();
  des.insertRowsAfter(1, sv.length);
  des.getRange(2, 1, sv.length, source.getLastColumn()).setValues(sv);
}
//Code in Question Start
function removeDuplicates() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var rows = sheet.getLastRow();
  var firstColumn = sheet.getRange(1, 2, rows, 1).getValues();
  firstColumn = firstColumn.map(function (e) {return e[0]})
  for (var i = rows; i >0; i--) {
    if (firstColumn.indexOf(firstColumn[i-1]) != i-1) {
      sheet.deleteRow(i);
    }
  }
}
//Code in Question End
function clear1() {
  var sheet = SpreadsheetApp.getActive().getSheetByName('Candidate Refresh');
  sheet.getRange('A2:K100').clearContent()
}
If new rows at the top of the sheet are found to be a duplicate, delete the new rows at the top.
Try this:
function removeDuplicates() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var rows = sheet.getLastRow();
  var firstColumn = sheet.getRange(1, 2, rows, 1).getValues();
  firstColumn = firstColumn.map(function(e) { return e[0]; });
  var uA = []; // values already seen while walking up from the bottom
  for (var i = rows; i > 0; i--) {
    if (uA.indexOf(firstColumn[i - 1]) != -1) {
      sheet.deleteRow(i); // a lower row already has this value, so drop the upper one
    } else {
      uA.push(firstColumn[i - 1]);
    }
  }
}
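To see which rows a bottom-up pass like this removes, the decision can be pulled out into a function that only returns row numbers. This is a sketch, not the answer's code: `rowsToDeleteKeepingLast` and `removeNewerDuplicates` are hypothetical names, and `Set` assumes the V8 runtime.

```javascript
// Returns the 1-based sheet rows to delete so that, for each value, only the
// occurrence nearest the bottom survives. Rows come back in descending order.
function rowsToDeleteKeepingLast(columnValues) {
  var seen = new Set();
  var toDelete = [];
  for (var i = columnValues.length - 1; i >= 0; i--) {
    if (seen.has(columnValues[i])) {
      toDelete.push(i + 1); // array index i is sheet row i + 1
    } else {
      seen.add(columnValues[i]);
    }
  }
  return toDelete;
}

// Hypothetical Apps Script wrapper using column B, as in the question.
function removeNewerDuplicates() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var col = sheet.getRange(1, 2, sheet.getLastRow(), 1).getValues()
    .map(function (r) { return r[0]; });
  rowsToDeleteKeepingLast(col).forEach(function (row) {
    sheet.deleteRow(row);
  });
}
```

Because the row numbers are returned from the bottom up, each deletion only shifts rows that have already been handled, so no offset counter is needed.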

Delete duplicate rows and add first column to original row Google script

I've been trying to look on the internet for a piece of code or a direction that could help me solve this issue.
Basically, I have a set of data across four columns, where the last column holds a description of what the data represents.
I import new data every day; the description may be the same, but it comes with a different set of numbers.
I would like my script to find the duplicates, add their data to the original row, and remove the duplicate.
So I only want to accumulate column one and delete the rest of the duplicate's data.
I'm aware of how to find duplicates, delete the entire set of data and push the new set of data without duplicates to the sheet.
However, I cannot find anything like this online. It is probably possible to get the values of a range in the newData variable, copy them to the row where the duplicate is found, and then push that array in its entirety.
However, everything I've tried to incorporate that gives me multiple bugs and endless calculation times.
I hope someone can help me with this.
function DuplicateRemoval(){
  var sheet = SpreadsheetApp.getActiveSheet();
  var data = sheet.getDataRange().getValues();
  var newData = new Array();
  for(i in data){
    var row = data[i];
    var duplicate = false;
    for(j in newData){
      if(row.join() == newData[j].join()){
        duplicate = true;
      }
    }
    if(!duplicate){
      newData.push(row);
    }
  }
  sheet.clearContents();
  sheet.getRange(1, 1, newData.length, newData[0].length).setValues(newData);
}
Something like this?
function overwriteWithNew() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var data = sheet.getDataRange().getValues();
  var newData = [];
  data.forEach(function(row, rowI) {
    if (!newData.some(function(row2) {return row[0] === row2[0];})) { // If the key is not in the output yet
      for (var row2I = data.length - 1; row2I >= rowI; row2I--) { // Then, starting from the last row
        if (data[row2I][0] === row[0]) { // Find the latest data with the same key
          newData.push(data[row2I]); // Add it to the output
          break; // And continue with the next row
        }
      }
    }
  });
  sheet.clearContents();
  sheet.getRange(1, 1, newData.length, newData[0].length).setValues(newData);
}
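If the goal is to accumulate the numbers instead of overwriting them, one way to read the question is: rows sharing a description (fourth column) are merged into the first such row, and their first-column values are summed. The sketch below works under that assumption; `mergeDuplicates` is a hypothetical helper, and the column positions are guesses based on the question's wording.

```javascript
// Merges rows that share the same key, summing one numeric column.
// keyCol and sumCol are 0-based indexes into each row.
function mergeDuplicates(values, keyCol, sumCol) {
  var indexByKey = new Map(); // key -> position of its row in `merged`
  var merged = [];
  values.forEach(function (row) {
    var key = row[keyCol];
    if (indexByKey.has(key)) {
      // Duplicate key: add its number to the row that was kept
      var target = merged[indexByKey.get(key)];
      target[sumCol] = Number(target[sumCol]) + Number(row[sumCol]);
    } else {
      indexByKey.set(key, merged.length);
      merged.push(row.slice()); // copy so the input is not mutated
    }
  });
  return merged;
}

// Example: description in the fourth column (index 3), number in the first.
var example = mergeDuplicates(
  [[5, 'a', 'b', 'apples'], [2, 'c', 'd', 'pears'], [3, 'e', 'f', 'apples']],
  3,
  0
);
```

In a sheet this would slot in between `getDataRange().getValues()` and the `clearContents()` / `setValues()` pair used in the scripts above. Each key is looked up in a `Map`, so every row is visited only once.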

Google Sheets - App Script - Deleting Row(s) based on Matching ID Leaving Header Alone

Getting close to the final tweaks of the Open Sales Order System Online.
Google Example Sheet - OSOSO
I am trying to REMOVE/DELETE line items from 'Orders' sheet that are in the 'SHIPPED' sheet.
I have created an ID column so that every line item entered into the 'Order' Sheet will have a unique ID, this Unique ID is carried through the Packing Slip system and on into the 'SHIPPED' sheet.
I would like to REMOVE/DELETE these line items from the 'Orders' sheet once they have been transferred to the 'SHIPPED' sheet.
Because I am a neophyte at coding, I am having some troubles.
Here is the script I have been tweaking to use for the above purpose:
function deleteRowInOrders() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var shipped = ss.getSheetByName("SHIPPED");
  var orders = ss.getSheetByName("Orders");
  var shipVal = shipped.getDataRange().getValues();
  var orderVal = orders.getDataRange().getValues();
  var resultArray = [];
  for(var n in orderVal){
    var keep = true
    for(var p in shipVal){
      if( orderVal[n][0] == shipVal[p][0]){
        keep=false ; break ;
      }
    }
    if(keep){ resultArray.push(orderVal[n])};
  }
  orders.clear()
  orders.getRange(1,1,resultArray.length,resultArray[0].length).setValues(resultArray);
}
The problems I have:
It strips any formatting on the 'Orders' Sheet.
It deletes the 1st row (header) if they match the 'SHIPPED' sheet.
Thank you for any help and guidance you can offer.
M
My approach is to first collect the indexes of the matching rows in an array, then loop through them and delete the rows one by one. Keep in mind that deleting a row shifts the row numbers of everything below it; the rowsDeleted variable compensates for that.
function deleteRowInOrders() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var shipped = ss.getSheetByName("SHIPPED");
  var orders = ss.getSheetByName("Orders");
  var shipVal = shipped.getDataRange().getValues();
  var orderVal = orders.getDataRange().getValues();
  var rowIDs = [];
  for (var n in orderVal) {
    for (var p in shipVal) {
      if (orderVal[n][0] == shipVal[p][0]) {
        rowIDs.push(n);
        break; // record each order row once, even if its ID appears several times in SHIPPED
      }
    }
  }
  var rowsDeleted = 0;
  for (var row in rowIDs) {
    var deleteRowID = parseInt(rowIDs[row]) + 1 - rowsDeleted;
    if (deleteRowID > 1) { // skip the header row
      orders.deleteRow(deleteRowID);
      rowsDeleted++;
    }
  }
}
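The nested loops compare every order against every shipped row. A possible variation, sketched here with the hypothetical helper `orderRowsToDelete`, puts the shipped IDs into a `Set` first. It assumes both sheets have a header in row 1 and the ID in the first column, and that IDs have the same type on both sheets, since `Set` lookups do not coerce types the way `==` does.

```javascript
// Returns the 1-based 'Orders' rows whose ID also appears in 'SHIPPED',
// in descending order and never including the header row.
function orderRowsToDelete(orderVal, shipVal) {
  var shippedIds = new Set(
    shipVal.slice(1).map(function (row) { return row[0]; }) // skip SHIPPED header
  );
  var rows = [];
  for (var i = orderVal.length - 1; i >= 1; i--) { // stop before the header at index 0
    if (shippedIds.has(orderVal[i][0])) {
      rows.push(i + 1);
    }
  }
  return rows;
}

// Hypothetical Apps Script wrapper; deleteRow leaves other rows' formatting alone.
function deleteShippedOrders() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var orders = ss.getSheetByName('Orders');
  var rows = orderRowsToDelete(
    orders.getDataRange().getValues(),
    ss.getSheetByName('SHIPPED').getDataRange().getValues()
  );
  rows.forEach(function (row) { orders.deleteRow(row); });
}
```

Deleting from the bottom up means no offset counter is needed, and since the loop never reaches index 0 the header survives even when both sheets share the same header text.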

Automatically move data from one sheet to another in google docs

I have a spreadsheet in which I keep track of tasks I need to do; once a task is complete I enter a date in the last column. What I want is for that completed task to be moved to sheet 2.
At present I have sheet 1 named SUD_schedule and I want the completed row of data to be moved to sheet 2, named SUD_archive. I've looked through the forum posts already and tried several scripts, but so far no luck. The closest I have come is this script:
function onEdit() {
  var sheet1 = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet(); // Original sheet
  var sheet2 = SpreadsheetApp.getActiveSpreadsheet().getSheets()[1]; // Target sheet
  // To act on only one sheet, check the sheet name here:
  // If it is not the first sheet, it will do nothing
  if (sheet1.getSheetName() != "SUD_schedule") {
    return;
  }
  // Get row and column index of the active cell.
  var rowIndex = sheet1.getActiveRange().getRowIndex();
  var colIndex = sheet1.getActiveRange().getColumnIndex();
  // If the selected column is the 16th and it is not the header row
  if (colIndex == 16 && rowIndex > 1) {
    // Get the data from the current row
    var data = sheet1.getRange(rowIndex,1,1,9).getValues();
    var lastRow2;
    (sheet2.getLastRow()==0)?lastRow2=1:lastRow2=sheet2.getLastRow()+1;
    // Copy the data to the lastRow+1th row in the target sheet
    sheet2.getRange(lastRow2,1,1,data[0].length).setValues(data);
  }
}
Column P (16) is the task complete date, row 1 is frozen and contains column headers.
Can anybody help show where I'm going wrong?
Kind regards
Den
Your code is not generic and is more complicated than it needs to be. The code below should do what you need.
function onEdit() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet1 = ss.getSheetByName('SUD_schedule');
  var sheet2 = ss.getSheetByName('SUD_archive');
  var dateColumn = 16; // column P holds the completion date
  var array = [];
  var range = sheet1.getRange(1, 1, sheet1.getLastRow(), dateColumn);
  for (var i = 2; i <= sheet1.getLastRow(); i++) { // i starts at 2 because row 1 is the header
    if (isValidDate(range.getCell(i, dateColumn).getValue()) == true) { // is the value in column 16 a valid date?
      var data = sheet1.getRange(i, 1, 1, dateColumn).getValues(); // values of the row whose column 16 holds a date
      for (var j = 0; j < dateColumn; j++) { // add the row's cells to the array
        array.push(data[0][j]);
      }
    }
    if (array.length > 0) {
      sheet2.appendRow(array); // append the row to sheet2
      array = [];
      sheet1.deleteRow(i); // delete the row from sheet1 to move it; to copy instead, remove this line and the next
      i = i - 1; // step back one row because the rows below shifted up
    }
  }
}
// Returns true if the given value is a valid Date, else false
function isValidDate(d) {
  if (Object.prototype.toString.call(d) !== "[object Date]")
    return false;
  return !isNaN(d.getTime());
}
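The same move can be sketched as a single in-memory split followed by two bulk writes, which avoids reading one cell per loop iteration. This is only an illustration under the question's layout (header in row 1, completion date in column P); `partitionByDate` and `archiveCompleted` are hypothetical names, and rewriting SUD_schedule with `setValues` replaces any formulas there with plain values.

```javascript
// True only for Date objects that hold a real date.
function isValidDate(d) {
  return Object.prototype.toString.call(d) === '[object Date]' && !isNaN(d.getTime());
}

// Splits sheet values into rows to keep (header included) and rows to archive.
// dateCol is the 0-based index of the completion-date column.
function partitionByDate(values, dateCol) {
  var result = { keep: [values[0]], archive: [] };
  for (var i = 1; i < values.length; i++) {
    if (isValidDate(values[i][dateCol])) {
      result.archive.push(values[i]);
    } else {
      result.keep.push(values[i]);
    }
  }
  return result;
}

// Hypothetical Apps Script wrapper (assumption, not the answer's code).
function archiveCompleted() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var schedule = ss.getSheetByName('SUD_schedule');
  var archive = ss.getSheetByName('SUD_archive');
  var parts = partitionByDate(schedule.getDataRange().getValues(), 15); // column P
  if (parts.archive.length === 0) return; // nothing completed yet
  archive
    .getRange(archive.getLastRow() + 1, 1, parts.archive.length, parts.archive[0].length)
    .setValues(parts.archive);
  schedule.clearContents();
  schedule.getRange(1, 1, parts.keep.length, parts.keep[0].length).setValues(parts.keep);
}
```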
I am not sure that the syntax you used below is entirely correct.
(sheet2.getLastRow()==0)?lastRow2=1:lastRow2=sheet2.getLastRow()+1;
sheet2.getRange(lastRow2,1,1,data[0].length).setValues(data);
What I know will work for certain is omitting the variable lastRow2 altogether and using this instead:
sheet2.getRange(sheet2.getLastRow()+1,1,1,data[0].length).setValues(data);
To complement Joachin's answer, here is how you can adapt that code if the date is not in the last column. In the part of the code shown below, replace LASTCOLUMNNUMBER with the number of your last column.
//Getting the range values of particular row where C16 is date
data = sheet1.getRange(i, 1, 1, LASTCOLUMNNUMBER).getValues();
//Adding the row in array
for (var j = 0; j < LASTCOLUMNNUMBER; j++)