Properly using chrome.tabCapture in a manifest v3 extension - google-chrome

Edit:
As the end of the year and the end of Manifest V2 are approaching, I did a bit more research on this and found the following workarounds:
The example here that uses the desktopCapture API:
https://github.com/GoogleChrome/chrome-extensions-samples/issues/627
The problem with this approach is that it requires the user to select a capture source via some UI, which can be disruptive. The --auto-select-desktop-capture-source command-line switch can apparently be used to bypass this, but I haven't been able to use it successfully.
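For reference, that desktopCapture flow looks roughly like this (a sketch only; chooseDesktopMedia and the non-standard mandatory constraints are real Chrome APIs, but the exact source list and constraint handling may vary by Chrome version):
// Popup or extension page: ask the user to pick a source, then capture it
chrome.desktopCapture.chooseDesktopMedia(['tab', 'audio'], (streamId) => {
  navigator.mediaDevices.getUserMedia({
    audio: {
      mandatory: {
        chromeMediaSource: 'desktop',
        chromeMediaSourceId: streamId
      }
    },
    video: {
      mandatory: {
        chromeMediaSource: 'desktop',
        chromeMediaSourceId: streamId
      }
    }
  }).then((stream) => {
    // Record the stream as in the examples below
  });
});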
The example extension here that works around tabCapture not working in service workers by creating its own inactive tab, from which it accesses the tabCapture API and records the currently active tab:
https://github.com/zhw2590582/chrome-audio-capture
So far this seems to be the best solution I've found in terms of UX. The background page provided in Manifest V2 is essentially replaced with a phantom tab.
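A minimal sketch of that phantom-tab idea (recorder.html and its script are hypothetical names, not taken from the linked extension):
// Service worker: open an inactive extension page that is allowed to
// use chrome.tabCapture on the user's behalf.
chrome.action.onClicked.addListener(async (activeTab) => {
  await chrome.tabs.create({
    url: chrome.runtime.getURL('recorder.html'), // hypothetical page
    active: false // the user stays on the tab being recorded
  });
  // recorder.html's script can then call
  // chrome.tabCapture.getMediaStreamId({ targetTabId: activeTab.id }, ...)
  // and record the originally active tab, much like the content script below.
});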
The roundabout nature of the second solution also seems to suggest that the tabCapture API is essentially not intended for use in Manifest V3, or else there would have been a more straightforward way to use it. I am disappointed that Manifest V3 is being enforced while essentially leaving behind Manifest V2 features such as this one.
Original Post:
I'm trying to write a Manifest V3 Chrome extension that captures tab audio. However, as far as I can tell, Manifest V3 introduces some changes that make this difficult:
Background scripts are replaced by service workers.
Service workers do not have access to the chrome.tabCapture API.
Despite this, I managed to get something that nearly works, as popup scripts still have access to chrome.tabCapture. However, there is a drawback: the audio of the tab is muted, and there doesn't seem to be a way to unmute it. This is what I have so far:
Query the service worker for the current tab from the popup script.
let tabId;
// Fetch tab immediately
chrome.runtime.sendMessage({command: 'query-active-tab'}, (response) => {
  tabId = response.id;
});
This is the service worker, which responds with the current tab ID.
chrome.runtime.onMessage.addListener(
  (request, sender, sendResponse) => {
    // Popup asks for current tab
    if (request.command === 'query-active-tab') {
      chrome.tabs.query({active: true}, (tabs) => {
        if (tabs.length > 0) {
          sendResponse({id: tabs[0].id});
        }
      });
      return true;
    }
    ...
Again in the popup script, from a keyboard shortcut command, use chrome.tabCapture.getMediaStreamId to get a media stream ID to be consumed by the current tab, and send that stream ID back to the service worker.
// On command, get the stream ID and forward it back to the service worker
chrome.commands.onCommand.addListener((command) => {
  chrome.tabCapture.getMediaStreamId({consumerTabId: tabId}, (streamId) => {
    chrome.runtime.sendMessage({
      command: 'tab-media-stream',
      tabId: tabId,
      streamId: streamId
    });
  });
});
The service worker forwards that stream ID to the content script.
chrome.runtime.onMessage.addListener(
  (request, sender, sendResponse) => {
    ...
    // Popup sent back media stream ID, forward it to the content script
    if (request.command === 'tab-media-stream') {
      chrome.tabs.sendMessage(request.tabId, {
        command: 'tab-media-stream',
        streamId: request.streamId
      });
    }
  }
);
The content script uses navigator.mediaDevices.getUserMedia to get the stream.
// Service worker sent us the stream ID, use it to get the stream
chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
  navigator.mediaDevices.getUserMedia({
    video: false,
    audio: {
      mandatory: {
        chromeMediaSource: 'tab',
        chromeMediaSourceId: request.streamId
      }
    }
  })
  .then((stream) => {
    // Once we're here, the audio in the tab is muted
    // However, recording the audio works!
    const recorder = new MediaRecorder(stream);
    const chunks = [];
    recorder.ondataavailable = (e) => {
      chunks.push(e.data);
    };
    recorder.onstop = (e) => saveToFile(new Blob(chunks), "test.wav");
    recorder.start();
    setTimeout(() => recorder.stop(), 5000);
  });
});
Here is the code that implements the above: https://github.com/killergerbah/-test-tab-capture-extension
This actually does produce a MediaStream, but the drawback is that the sound of the tab is muted. I've tried playing the stream through an audio element, but that seems to do nothing.
Is there a way to obtain a stream of the tab audio in a manifest v3 extension without muting the audio in the tab?
I suspect that this approach might be completely wrong as it's so roundabout, but this is the best I could come up with after reading through the docs and various StackOverflow posts.
I've also read that the tabCapture API is going to be moved for manifest v3 at some point, so maybe the question doesn't even make sense to ask - however if there is a way to still properly use it I would like to know.

I found your post very useful in progressing my implementation of an audio tab recorder.
Regarding the specific muting issue you were running into, I resolved it by looking here: Original audio of tab gets muted while using chrome.tabCapture.capture() and MediaRecorder()
// Service worker sent us the stream ID, use it to get the stream
chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
  navigator.mediaDevices.getUserMedia({
    video: false,
    audio: {
      mandatory: {
        chromeMediaSource: 'tab',
        chromeMediaSourceId: request.streamId
      }
    }
  })
  .then((stream) => {
    // To resolve the original audio muting: route the captured
    // stream back to the speakers via the Web Audio API
    const context = new AudioContext();
    const audio = context.createMediaStreamSource(stream);
    audio.connect(context.destination);
    const recorder = new MediaRecorder(stream);
    const chunks = [];
    recorder.ondataavailable = (e) => {
      chunks.push(e.data);
    };
    recorder.onstop = (e) => saveToFile(new Blob(chunks), "test.wav");
    recorder.start();
    setTimeout(() => recorder.stop(), 5000);
  });
});

This may not be exactly what you are looking for, but perhaps it may provide some insight.
I've tried playing the stream through an audio element, but that seems to do nothing.
Ironically, this is how I managed to get around the issue: by creating an <audio> element in the popup itself. When using tabCapture in the popup script, it returns the stream, and I set the audio element's srcObject to that stream.
HTML:
<audio id="audioObject" autoplay> No source detected </audio>
JS:
chrome.tabCapture.capture({audio: true, video: false}, function(stream) {
  var audio = document.getElementById("audioObject");
  audio.srcObject = stream;
});
According to this post on Manifest V3, chrome.capture will be the new namespace for tabCapture and the like, but I haven't seen anything beyond that.

I had this problem too, and I resolved it by using the Web Audio API. Just create a new context and connect it to a media stream source using the captured MediaStream. Here is an example:
avoidSilenceInTab: (desktopStream: MediaStream) => {
  var contextTab = new AudioContext();
  contextTab
    .createMediaStreamSource(desktopStream)
    .connect(contextTab.destination);
}
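A hypothetical call site, assuming avoidSilenceInTab is in scope as a plain function and the stream comes from chrome.tabCapture in a popup or extension page:
chrome.tabCapture.capture({audio: true, video: false}, (stream) => {
  if (stream) {
    avoidSilenceInTab(stream); // keep the tab audible while capturing
  }
});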

Related

Chrome manifest V3 extensions and externally_connectable documentation

After preparing the migration of my Chrome Manifest V2 extension to Manifest V3 and reading about the problems with persistent service workers, I prepared myself for a battle with the unknown. My V2 background script uses a whole bunch of globally declared variables and I expected I'd need to refactor that.
But to my great surprise, my extension background script seems to work out of the box without any trouble in Manifest V3. My extension uses externally_connectable. The typical use case for my extension is that the user navigates to my website 'bla.com' and from there can send jobs to the extension background script.
My manifest says:
"externally_connectable": {
"matches": [
"*://localhost/*",
"https://*.bla.com/*"
]
}
My background script listens for external messages and connections:
chrome.runtime.onMessageExternal.addListener((message, sender, sendResponse) => {
  log('received external message', message);
});
chrome.runtime.onConnectExternal.addListener(function(port) {
  messageExternalPort = port;
  if (messageExternalPort) {
    // onDisconnect is an event; register a listener rather than calling it
    messageExternalPort.onDisconnect.addListener(function () {
      messageExternalPort = null;
    });
  }
});
From bla.com I send messages to the extension as follows
chrome.runtime.sendMessage(EXTENSION_ID, { type: "collect" });
From bla.com I receive messages from the extension as follows
const setUpExtensionListener = () => {
  // Connect to chrome extension
  this.port = chrome.runtime.connect(EXTENSION_ID, { name: 'query' });
  // Add listener
  this.port.onMessage.addListener(handleExtensionMessage);
}
I tested all scenarios, including anticipating the famous service worker unload after 5 minutes (or 30 seconds of inactivity), but it all seems to work. Good for me, but something is itchy. I cannot find any documentation that explains precisely under which circumstances the service worker is unloaded. I do not understand why things seem to work out of the box in my situation while so many others experience problems. Can anybody explain or refer to proper documentation? Thanks in advance.

Can I make a Chrome extension work automatically when Chrome opens? [duplicate]

I'm writing a Chrome extension and trying to overlay a <div> over the current webpage as soon as a button is clicked in the popup.html file.
When I access the document.body.insertBefore method from within popup.html it overlays the <div> on the popup, rather than the current webpage.
Do I have to use messaging between background.html and popup.html in order to access the web page's DOM? I would like to do everything in popup.html, and to use jQuery too, if possible.
ManifestV3 service worker doesn't have any DOM/document/window.
ManifestV3/V2 extension pages (and the scripts inside) have their own DOM, document, window, and a chrome-extension:// URL (use devtools for that part of the extension to inspect it).
You need a content script to access the DOM of web pages and interact with a tab's contents. Content scripts execute in the tab as part of that page, not as part of the extension, so don't load your content script(s) in the extension page; use one of the following methods:
Method 1. Declarative
manifest.json:
"content_scripts": [{
"matches": ["*://*.example.com/*"],
"js": ["contentScript.js"]
}],
It will run once when the page loads. After that happens, use messaging, but note that it can't send DOM elements, Map, Set, ArrayBuffer, classes, functions, and so on; it can only send JSON-compatible simple objects and types, so you'll need to manually extract the required data and pass it as a simple array or object.
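For example, a minimal sketch of that extraction step (the message shape is illustrative):
// contentScript.js: send only JSON-compatible data
chrome.runtime.sendMessage({
  command: 'page-links', // hypothetical message name
  links: [...document.querySelectorAll('a')].map(a => a.href)
});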
Method 2. Programmatic
ManifestV2:
Use chrome.tabs.executeScript in the extension script (like the popup or background) to inject a content script into a tab on demand.
The callback of this method receives the result of the last expression in the content script, so it can be used to extract data. The data must be JSON-compatible; see the Method 1 note above.
Required permissions in manifest.json:
Best case: "activeTab", suitable for a response to a user action (usually a click on the extension icon in the toolbar). Doesn't show a permission warning when installing the extension.
Usual: "*://*.example.com/" plus any other sites you want.
Worst case: "<all_urls>" or "*://*/", "http://*/", "https://*/" - when submitting into Chrome Web Store all of these put your extension in a super slow review queue because of broad host permissions.
ManifestV3 differences to the above:
Use chrome.scripting.executeScript.
Required permissions in manifest.json:
"scripting" - mandatory
"activeTab" - ideal scenario, see notes for ManifestV2 above.
If the ideal scenario is impossible, add the allowed sites to host_permissions in manifest.json, as in the excerpt below.
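An illustrative manifest.json excerpt combining these (example.com stands in for your sites):
{
  "manifest_version": 3,
  "permissions": ["scripting", "activeTab"],
  "host_permissions": ["*://*.example.com/*"]
}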
Some examples of the extension popup script that use programmatic injection to add that div.
ManifestV3
Don't forget to add the permissions in manifest.json; see the permission notes above.
Simple call:
(async () => {
  const [tab] = await chrome.tabs.query({active: true, currentWindow: true});
  await chrome.scripting.executeScript({
    target: {tabId: tab.id},
    func: inContent1,
  });
})();
// executeScript runs this code inside the tab
function inContent1() {
  const el = document.createElement('div');
  el.style.cssText = 'position:fixed; top:0; left:0; right:0; background:red';
  el.textContent = 'DIV';
  document.body.appendChild(el);
}
Note: in Chrome 91 or older func: should be function:.
Calling with parameters and receiving a result
Requires Chrome 92 as it implemented args.
Example 1:
res = await chrome.scripting.executeScript({
  target: {tabId: tab.id},
  func: (a, b) => { return [window[a], window[b]]; },
  args: ['foo', 'bar'],
});
Example 2:
(async () => {
  const [tab] = await chrome.tabs.query({active: true, currentWindow: true});
  let res;
  try {
    res = await chrome.scripting.executeScript({
      target: {tabId: tab.id},
      func: inContent2,
      args: [{ foo: 'bar' }], // arguments must be JSON-serializable
    });
  } catch (e) {
    console.warn(e.message || e);
    return;
  }
  // res[0] contains results for the main page of the tab
  document.body.textContent = JSON.stringify(res[0].result);
})();
// executeScript runs this code inside the tab
function inContent2(params) {
  const el = document.createElement('div');
  el.style.cssText = 'position:fixed; top:0; left:0; right:0; background:red';
  el.textContent = params.foo;
  document.body.appendChild(el);
  return {
    success: true,
    html: document.body.innerHTML,
  };
}
ManifestV2
Simple call:
// uses inContent1 from ManifestV3 example above
chrome.tabs.executeScript({ code: `(${ inContent1 })()` });
Calling with parameters and receiving a result:
// uses inContent2 from ManifestV3 example above
chrome.tabs.executeScript({
  code: `(${ inContent2 })(${ JSON.stringify({ foo: 'bar' }) })`
}, ([result] = []) => {
  if (!chrome.runtime.lastError) {
    console.log(result); // shown in devtools of the popup window
  }
});
This example uses automatic conversion of the inContent function's code to a string. The benefit is that the IDE can apply syntax highlighting and linting. The obvious drawback is that the browser wastes time parsing the code, but it's usually less than a millisecond and thus negligible.

Ways to capture incoming WebRTC video streams (client side)

I am currently looking for the best way to store incoming WebRTC video streams. I am joining the video call using WebRTC (via Chrome) and I would like to record every incoming video stream from each participant in the browser.
The solutions I am researching are:
Intercept network packets coming to the browser, e.g. using Wireshark, and then decode them, following this article: https://webrtchacks.com/video_replay/
Modifying a browser to store the recording as a file, e.g. by modifying Chromium itself.
Any screen recorders or solutions like xvfb & ffmpeg are not an option due to resource constraints. Is there any other way that could let me capture packets or encoded video as a file? The solution must work on Linux.
If the media stream is what you want, one method is to override the browser's RTCPeerConnection. Here is an example:
In an extension manifest add the following content script:
content_scripts": [
{
"matches": ["http://*/*", "https://*/*"],
"js": ["payload/inject.js"],
"all_frames": true,
"match_about_blank": true,
"run_at": "document_start"
}
]
inject.js
var inject = '(' + function() {
  // Override the browser's default RTCPeerConnection.
  var origPeerConnection = window.RTCPeerConnection || window.webkitRTCPeerConnection || window.mozRTCPeerConnection;
  // Make sure it is supported
  if (origPeerConnection) {
    // Our own RTCPeerConnection
    var newPeerConnection = function(config, constraints) {
      console.log('PeerConnection created with config', config);
      // Proxy the original peer connection
      var pc = new origPeerConnection(config, constraints);
      // Store the old addStream
      var oldAddStream = pc.addStream;
      // addStream is called when a local stream is added.
      // arguments[0] is a local media stream
      pc.addStream = function() {
        console.log("our add stream called!");
        // Our MediaStream object
        console.dir(arguments[0]);
        return oldAddStream.apply(this, arguments);
      };
      // ontrack is called when a remote track is added.
      // The media stream(s) are located in event.streams
      pc.ontrack = function(event) {
        console.log("ontrack got a track");
        console.dir(event);
      };
      window.ourPC = pc;
      return pc;
    };
    ['RTCPeerConnection', 'webkitRTCPeerConnection', 'mozRTCPeerConnection'].forEach(function(obj) {
      // Override objects if they exist in the window object
      if (window.hasOwnProperty(obj)) {
        window[obj] = newPeerConnection;
        // Copy the static methods
        Object.keys(origPeerConnection).forEach(function(x) {
          window[obj][x] = origPeerConnection[x];
        });
        window[obj].prototype = origPeerConnection.prototype;
      }
    });
  }
} + ')();';
var script = document.createElement('script');
script.textContent = inject;
(document.head || document.documentElement).appendChild(script);
script.parentNode.removeChild(script);
I tested this with a voice call in Google Hangouts and saw that two MediaStreams were added via pc.addStream and one track was added via pc.ontrack. addStream would seem to receive local media streams, and the event object in ontrack is an RTCTrackEvent, which has a streams object - which I assume is what you are looking for.
To access these streams from your extension's content script, you will need to create audio elements and set their srcObject property to the media stream, e.g.:
pc.ontrack = function(event) {
  // Check if our element exists
  var elm = document.getElementById("remoteStream");
  if (elm == null) {
    // Create an audio element
    elm = document.createElement("audio");
    elm.id = "remoteStream";
  }
  // Set the srcObject to our stream. Not sure if you need to clone it
  elm.srcObject = event.streams[0].clone();
  // Write the element to the body
  document.body.appendChild(elm);
  // Fire a custom event so our content script knows the stream is available.
  // You could pass the id in the "detail" object, for example:
  // new CustomEvent("remoteStreamAdded", {"detail": {"id": "audio_element_id"}})
  // then access it via e.detail.id in your event listener.
  var e = new CustomEvent("remoteStreamAdded");
  window.dispatchEvent(e);
}
Then in your content script you can listen for that event and access the MediaStream like so:
window.addEventListener("remoteStreamAdded", function(e) {
  var elm = document.getElementById("remoteStream");
  var stream = elm.captureStream();
});
With the capture stream available to your content script you can do pretty much anything you want with it. For example, MediaRecorder works really well for recording the stream(s) or you could use something like peer.js or maybe binary.js to stream to another source.
I haven't tested this, but it should also be possible to override the local streams. For example, in inject.js you could establish some blank MediaStream, override navigator.mediaDevices.getUserMedia, and instead of returning the local MediaStream return your own.
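An untested sketch of that idea (all names here are illustrative; it would sit in inject.js alongside the RTCPeerConnection override):
var origGetUserMedia = navigator.mediaDevices.getUserMedia.bind(navigator.mediaDevices);
navigator.mediaDevices.getUserMedia = function(constraints) {
  return origGetUserMedia(constraints).then(function(stream) {
    // Stash the real local stream so the injected code can use it,
    // or return a substitute MediaStream here instead.
    window.ourLocalStream = stream;
    return stream;
  });
};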
This method should work in Firefox and maybe other browsers as well, assuming you use an extension/app to load the inject.js script at the start of the document. Loading it before any of the target's libraries is key to making this work.
Capturing packets will only give you the network packets which you would then need to turn into frames and put into a container. A server such as Janus can record videos.
Running headless Chrome and using the JavaScript MediaRecorder API is another option, but it is much heavier on resources.

getUserMedia() Screen share with Audio and update tray

When I use getUserMedia() for screen share, I don't get audio.
Things I would like to do, but couldn't find any relevant information about:
I want to capture both the screen and audio at the same time. How can I achieve this?
When my screen share starts, the tray shown below appears. What is it called, and how can I modify it (like its looks)?
Screenshot: (image not included)
If you want one stream made of your screen share for the video track and your webcam/mic audio for the audio track, you will need to make two calls to getUserMedia, with constraints set to screen and audio respectively. Then you will have to put the tracks into a common stream. Eventually, you can attach that stream to a peer connection.
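A minimal sketch of that combination step, using the modern getDisplayMedia API for the screen track (this answer predates it, so treat the exact constraints as an assumption):
async function getScreenWithMic() {
  const screen = await navigator.mediaDevices.getDisplayMedia({video: true});
  const mic = await navigator.mediaDevices.getUserMedia({audio: true});
  // Put both tracks into one common stream
  return new MediaStream([
    ...screen.getVideoTracks(),
    ...mic.getAudioTracks()
  ]);
}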
As peveuve said, you can also use two peer connections, but that comes with at least two problems:
you will not have synchronization between audio and video (not so important for screen sharing)
you will need two connections => twice the number of ports => more chances to fail. That is more likely to be a problem.
This is a mandatory security feature of the browser (to prevent a rogue page from broadcasting your screen without you knowing it). I do not know of a way to manipulate it at all.
It's possible with npm-msr (MediaStreamRecorder) on Chrome:
getScreenId(function (error, sourceId, screen_constraints) {
  navigator.getUserMedia = navigator.getUserMedia || navigator.mozGetUserMedia || navigator.webkitGetUserMedia;
  navigator.getUserMedia(screen_constraints, function (stream) {
    navigator.getUserMedia({audio: true}, function (audioStream) {
      stream.addTrack(audioStream.getAudioTracks()[0]);
      var mediaRecorder = new MediaStreamRecorder(stream);
      mediaRecorder.mimeType = 'video/mp4';
      mediaRecorder.stream = stream;
      document.querySelector('video').src = URL.createObjectURL(stream);
      var video = document.getElementById('screen-video');
      if (video) {
        video.src = URL.createObjectURL(stream);
        video.width = 360;
        video.height = 300;
      }
    }, function (error) {
      alert(error);
    });
  }, function (error) {
    alert(error);
  });
});
Check this answer: Is it possible broadcast audio with screensharing with WebRTC
You can't share both screen and audio in the same peer; you have to open two peers.

Web Audio API: How to load another audio file?

I want to write a basic script using the HTML5 Web Audio API that can play some audio files. But I don't know how to unload a playing audio file and load another one. In my script, two audio files play at the same time, which is not what I wanted.
Here is my code:
var context,
    soundSource,
    soundBuffer;

// Step 1 - Initialise the Audio Context
context = new webkitAudioContext();

// Step 2: Load our Sound using XHR
function playSound(url) {
  // Note: this loads asynchronously
  var request = new XMLHttpRequest();
  request.open("GET", url, true);
  request.responseType = "arraybuffer";
  // Our asynchronous callback
  request.onload = function() {
    var audioData = request.response;
    audioGraph(audioData);
  };
  request.send();
}

// This is the code we are interested in
function audioGraph(audioData) {
  // create a sound source
  soundSource = context.createBufferSource();
  // The Audio Context handles creating source buffers from raw binary
  soundBuffer = context.createBuffer(audioData, true /* make mono */);
  // Add the buffered data to our object
  soundSource.buffer = soundBuffer;
  // Plug the cable from one thing to the other
  soundSource.connect(context.destination);
  // Finally
  soundSource.noteOn(context.currentTime);
}

// Stop all sounds
function stopSounds() {
  // How can I do this?
}

// Events for audio buttons
document.querySelector('.pre').addEventListener('click',
  function () {
    stopSounds();
    playSound('http://thelab.thingsinjars.com/web-audio-tutorial/hello.mp3');
  }
);
document.querySelector('.next').addEventListener('click',
  function() {
    stopSounds();
    playSound('http://thelab.thingsinjars.com/web-audio-tutorial/nokia.mp3');
  }
);
You should be pre-loading sounds into buffers once, at launch, and simply creating a fresh AudioBufferSourceNode whenever you want to play one back.
To play multiple sounds in sequence, you need to schedule them using noteOn(time), one after the other, based on the buffers' respective lengths.
To stop sounds, use noteOff.
It sounds like you are missing some fundamental Web Audio concepts. These (and more) are described in detail and shown with samples in this HTML5Rocks tutorial and the FAQ.
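A minimal sketch of that pattern, keeping the question's era-appropriate webkit API (modern code would use AudioContext, promise-based decodeAudioData, and start()/stop() instead of noteOn()/noteOff()):
var context = new webkitAudioContext();
var buffers = {};        // url -> decoded AudioBuffer, filled once at launch
var currentSource = null;

function preload(url) {
  var request = new XMLHttpRequest();
  request.open("GET", url, true);
  request.responseType = "arraybuffer";
  request.onload = function() {
    context.decodeAudioData(request.response, function(buffer) {
      buffers[url] = buffer;
    });
  };
  request.send();
}

function stopSounds() {
  if (currentSource) {
    currentSource.noteOff(0); // stop() in the modern API
    currentSource = null;
  }
}

function playSound(url) {
  stopSounds();
  currentSource = context.createBufferSource(); // source nodes are one-shot
  currentSource.buffer = buffers[url];
  currentSource.connect(context.destination);
  currentSource.noteOn(0); // start() in the modern API
}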