Memory leak and high CPU usage caused by frequent Data Channel restarts #357
Comments
@jerry-tao it looks like we need something like #239. Could you return to that issue? If we cannot remove chunks when a stream closes, maybe we need some timeout-based check of the queues for them? Also keep in mind that when the CPU is busy, some chunks may be stuck in the pending queue for longer than usual, so some extra protection against this may be needed too. A rough sketch of the idea is below.
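To make that concrete, here is a minimal sketch of dropping a closed stream's chunks from the pending queue. The types and field names are hypothetical and for illustration only; pion/sctp's real queues track more state (ordered/unordered chunks, TSNs, timers):

```go
package sketch

// Hypothetical, simplified types; not pion/sctp's actual internals.
type chunk struct {
	streamIdentifier uint16
	userData         []byte
}

type pendingQueue struct {
	chunks []*chunk
}

// dropStream discards queued chunks belonging to a stream that has
// closed, so they are never sent (and therefore never retransmitted).
func (q *pendingQueue) dropStream(si uint16) {
	kept := q.chunks[:0]
	for _, c := range q.chunks {
		if c.streamIdentifier != si {
			kept = append(kept, c)
		}
	}
	// Zero the tail so the dropped chunks become garbage-collectable.
	for i := len(kept); i < len(q.chunks); i++ {
		q.chunks[i] = nil
	}
	q.chunks = kept
}
```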
I have performed some tests trying to reproduce this and found that Chrome sends SACKs for chunks enqueued after the stream was closed, and pion removes them from the pending queue. So this may be caused by retransmissions on pion's side or by delayed SACKs from Chrome. This needs more testing.
Hi @jerry-tao, I have tried your patch. Memory is now reclaimed much faster after the stream is closed (I also lowered the max RTO value to a few seconds). So my problem was caused by retransmissions. I saw that your patch was reverted because it caused issues with some tests. Could you add it again, but this time together with a configuration option so it is disabled by default? By doing it this way the tests would not break, and people like me who need this feature could enable it at runtime via that option.
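For illustration, the default-off option could look something like the sketch below; the field name is hypothetical and not part of pion/sctp's actual Config:

```go
package sketch

// Hypothetical option, sketching only the "disabled by default" idea.
type Config struct {
	// DropPendingChunksOnStreamClose, when true, purges a closed stream's
	// chunks from the pending and retransmit queues instead of retrying
	// them until the maximum RTO is reached. It defaults to false, so
	// existing tests and applications keep their current behavior.
	DropPendingChunksOnStreamClose bool
}
```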
The approach discussed in #314, blocking I/O, could resolve this issue in a better way, if I've understood it correctly.
Your environment.
What did you do?
I have two apps: a server written in Go and a JS client running in Chrome. They open a WebRTC session with a DataChannel and no RTP streams. The client then requests a stream of data from the server, which is sent via the data channel. When data stops flowing (e.g. due to some network issue), the client closes the data channel and requests a new one from the server (the WebRTC session is not restarted when this happens). In one case this data channel restart was triggered roughly every 10-20 minutes.
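For context, a minimal sketch of the server-side pattern described above, using pion/webrtc (`pc` is an existing `*webrtc.PeerConnection` and `nextPayload` is a hypothetical data source; error handling trimmed):

```go
// The client creates each data channel, so the server reacts in
// OnDataChannel and streams until the channel closes.
pc.OnDataChannel(func(dc *webrtc.DataChannel) {
	done := make(chan struct{})
	dc.OnClose(func() { close(done) }) // client restarts by closing + reopening

	dc.OnOpen(func() {
		go func() {
			ticker := time.NewTicker(20 * time.Millisecond)
			defer ticker.Stop()
			for {
				select {
				case <-done:
					return // stop producing for this channel
				case <-ticker.C:
					if err := dc.Send(nextPayload()); err != nil {
						return
					}
				}
			}
		}()
	})
})
```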
What did you expect?
The server app can run 24/7 for long periods without issues.
What happened?
Memory and CPU usage grow over time, and the server app has to be restarted periodically as a workaround. Before a restart, the server app had lots of memory allocated from `pion/sctp/Stream.packetize`. It looks like data that the client had not received before it closed the data channel got stuck in some pion SCTP queue and was never dropped after the data channel was closed. It stays there until the WebRTC session closes or the whole app is restarted.

Additionally, I noticed that new data channels are added to the `pion/webrtc/SCTPTransport.dataChannels` list but never removed from it. This also caused a memory leak in my case, although a much smaller one than described above.

CC @enobufs @edaniels
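As an aside, an allocation site like the one above is the kind of thing a Go heap profile shows directly. A minimal setup for anyone trying to reproduce this, using only the standard library's net/http/pprof package:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on DefaultServeMux
)

func main() {
	// Expose profiling endpoints on localhost only.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... run the WebRTC server as usual ...
	select {}
}
```

Then `go tool pprof http://localhost:6060/debug/pprof/heap` shows allocations per function, which is how a hot spot like `Stream.packetize` stands out.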
Edit: I have tried to recreate it and was sometimes able to get the following error:
```
sctp ERROR: 2024/11/29 23:01:24 [0xc001132000] stream 99 not found)
```
It is logged from here: https://github.com/pion/pion/blob/94171946f00b6acd784fa5c520acdc96aaea5a8b/sctp/association.go#L2259
Probably these chunks should be marked as abandoned?
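For reference, "abandoned" here refers to the mechanism from SCTP partial reliability (RFC 3758): the sender gives up on a chunk and moves the peer past it with a FORWARD TSN instead of retransmitting it indefinitely. A rough sketch of the idea, with hypothetical types rather than pion's actual internals:

```go
package sketch

// Hypothetical, simplified in-flight chunk; pion's real payload data
// chunk carries TSNs, flags, retransmission timers, and more.
type inflightChunk struct {
	tsn              uint32
	streamIdentifier uint16
	abandoned        bool
}

// abandonStream marks every in-flight chunk of a closed stream as
// abandoned. Abandoned chunks are no longer retransmitted; the next
// FORWARD TSN tells the receiver to skip over them, after which the
// sender can free them.
func abandonStream(inflight []*inflightChunk, si uint16) {
	for _, c := range inflight {
		if c.streamIdentifier == si {
			c.abandoned = true
		}
	}
}
```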