
Memory leak and high CPU usage caused by frequent Data Channel restarts #357

Open
sirzooro opened this issue Nov 29, 2024 · 5 comments

@sirzooro (Contributor) commented Nov 29, 2024

Your environment.

  • Version: sctp v1.8.16, webrtc v3.2.42
  • Browser: Chrome

What did you do?

I have two apps: a server written in Go and a JS client running in Chrome. They open a WebRTC session with a DataChannel and no RTP streams. The client then requests a stream of data from the server, sent over the data channel. When data stops flowing (e.g. due to a network issue), the client closes the data channel and requests a new one from the server (the WebRTC session itself is not restarted when this happens). In one case this data channel restart was triggered roughly every 10-20 minutes.
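
For reference, a minimal sketch of the kind of server-side handling described above (signaling is omitted, and nextChunk is a made-up placeholder for the real data source, not code from my app):

```go
package main

import "github.com/pion/webrtc/v3"

// nextChunk stands in for the real data source; it is made up for this sketch.
func nextChunk() []byte { return []byte("payload") }

func main() {
	pc, err := webrtc.NewPeerConnection(webrtc.Configuration{})
	if err != nil {
		panic(err)
	}

	// The client opens a fresh data channel whenever the previous one stalls;
	// the server simply streams on whichever channel arrives.
	pc.OnDataChannel(func(dc *webrtc.DataChannel) {
		done := make(chan struct{})
		dc.OnClose(func() { close(done) })
		dc.OnOpen(func() {
			go func() {
				for {
					select {
					case <-done:
						return
					default:
						if err := dc.Send(nextChunk()); err != nil {
							return
						}
					}
				}
			}()
		})
	})

	select {} // signaling (offer/answer exchange) omitted
}
```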

What did you expect?

The server app should be able to run 24/7 for long periods without issues.

What happened?

Memory and CPU usage grow over time, and the server app has to be restarted periodically as a workaround. Before a restart, the server app had lots of memory allocated from pion/sctp/Stream.packetize. It looks like data that the client had not yet received before it closed the data channel got stuck in some pion SCTP queue and was never dropped after the data channel became closed. It stays there until the WebRTC session closes or the whole app is restarted.
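
A quick way to confirm where the memory sits is Go's standard heap profiler; a minimal example (port chosen arbitrarily):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers
)

func main() {
	// In the real server this runs alongside the WebRTC code. Inspect
	// retained allocations with:
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	// and look for memory held under github.com/pion/sctp.(*Stream).packetize.
	log.Println(http.ListenAndServe("localhost:6060", nil))
}
```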

Additionally, I noticed that new data channels are added to the pion/webrtc/SCTPTransport.dataChannels list but never removed from it. This also caused a small memory leak in my case, although a much smaller one than described above.

CC @enobufs @edaniels

Edit: I have tried to recreate it and was sometimes able to get the following error:
sctp ERROR: 2024/11/29 23:01:24 [0xc001132000] stream 99 not found)
It is logged from here: https://github.com/pion/pion/blob/94171946f00b6acd784fa5c520acdc96aaea5a8b/sctp/association.go#L2259

Perhaps these chunks should be marked as abandoned?

@sirzooro (Contributor, Author) commented Dec 3, 2024

@jerry-tao it looks like we need something like #239. Could you return to that issue? If we cannot remove chunks when a stream closes, maybe we need some timeout after which the queues are checked for them (see the sketch below). Also keep in mind that when the CPU is busy, some chunks may sit in the pending queue for longer than usual, so some extra protection against that may be needed too.
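
To illustrate what I mean, purely a sketch of the idea with made-up types, not the actual pion/sctp queues:

```go
package sweepsketch

import "time"

// Illustrative only: a periodic sweep that abandons pending chunks whose
// stream has been closed for longer than a grace period. The grace period is
// there so chunks that are merely late (e.g. because the CPU is busy) are not
// dropped too eagerly.
type queuedChunk struct {
	streamID uint16
	payload  []byte
}

func sweepPending(pending []queuedChunk, closedAt map[uint16]time.Time, grace time.Duration) []queuedChunk {
	now := time.Now()
	kept := pending[:0]
	for _, c := range pending {
		if t, ok := closedAt[c.streamID]; ok && now.Sub(t) > grace {
			continue // stream closed long ago; abandon the chunk
		}
		kept = append(kept, c)
	}
	return kept
}
```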

@sirzooro (Contributor, Author) commented Dec 4, 2024

I have performed some tests trying to reproduce this and found that Chrome sends SACKs for chunks enqueued after the stream was closed, and pion then removes them from the pending queue. So this may be caused by retransmissions on the pion side or by delayed SACKs from Chrome. This needs more testing.

@jerry-tao (Member) commented:

It seems @edaniels and @enobufs are planning V2 in #314; you could attach this to it.
Could you try the #239 patch to see if it solves your problem?

@sirzooro (Contributor, Author) commented Dec 5, 2024

Hi @jerry-tao, I have tried your patch. Memory is now reclaimed much faster after a stream is closed (I also lowered the max RTO value to a few seconds). So my problem was caused by retransmissions.

I saw that your patch was reverted because it caused issues with some tests. Could you add it again, but this time together with a configuration option so it would be disabled by default? That way the tests would not break, and people like me who need this feature could enable it at runtime via webrtc.SettingEngine.
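
For reference, this is how such an opt-in would typically be consumed from the application side. The setter name in the comment is hypothetical; only the SettingEngine/NewAPI wiring is the existing pion/webrtc pattern:

```go
package main

import "github.com/pion/webrtc/v3"

func main() {
	se := webrtc.SettingEngine{}
	// Hypothetical setter for the proposed opt-in; the name below is made up.
	// se.SetSCTPDropQueuedChunksOnStreamClose(true)

	// Real pion/webrtc pattern: build an API from the SettingEngine and use it
	// to create peer connections.
	api := webrtc.NewAPI(webrtc.WithSettingEngine(se))
	pc, err := api.NewPeerConnection(webrtc.Configuration{})
	if err != nil {
		panic(err)
	}
	defer pc.Close()
}
```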

@jerry-tao (Member) commented:

The approach discussed in #314, blocking I/O, could resolve this issue in a better way, if I’ve understood it correctly.
