Dapr 1.12.4
This update includes bug fixes:
- Mitigate race condition in placement table during sidecar restarts
- Fixes in service invocation when target app has multiple replicas on Kubernetes
Mitigate race condition in placement table during sidecar restarts
Problem
When restarting a significant deployment on Kubernetes (> 20 pods), many pods are removed from the placement table at once. When these restarts happen concurrently with leadership changes in the placement service, the placement table can get into a corrupt state that requires a restart of the placement service. #7311
Impact
Impacts users running Dapr 1.12.0-1.12.3 who use actors and have deployments with many sidecar instances.
Root cause
Race condition between placement leadership changes and placement table updates due to pods being terminated.
Solution
Mitigated by reducing the chance of leadership changes in the placement service, and by fixing a bug that could cause terminated pods to remain in the placement table indefinitely.
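The kind of guard described above can be sketched in Go. This is purely illustrative and not Dapr's actual placement code: the `placementTable`, `upsert`, and `pruneExpired` names and the leadership-term check are hypothetical, showing how a table update from a stale leader can be rejected and how hosts that stop reporting can be pruned instead of lingering forever.

```go
package main

import (
	"sync"
	"time"
)

// hostEntry is a hypothetical record for a sidecar registered with placement.
type hostEntry struct {
	name     string
	lastSeen time.Time
}

// placementTable is a sketch of a table guarded against the two issues above:
// updates from a stale leadership term are rejected, and hosts that stopped
// reporting (e.g. terminated pods) are eventually pruned.
type placementTable struct {
	mu    sync.Mutex
	term  uint64               // current leadership term known to this node
	hosts map[string]hostEntry // registered sidecars, keyed by host name
}

// upsert applies an update only if it comes from the current leadership term,
// so a delayed write from a previous leader cannot corrupt the table.
func (t *placementTable) upsert(term uint64, host string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if term != t.term {
		return false // stale update from an old leader: ignore
	}
	t.hosts[host] = hostEntry{name: host, lastSeen: time.Now()}
	return true
}

// pruneExpired removes hosts that have not reported within maxAge, so
// terminated pods do not stay in the table indefinitely.
func (t *placementTable) pruneExpired(maxAge time.Duration) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for name, h := range t.hosts {
		if time.Since(h.lastSeen) > maxAge {
			delete(t.hosts, name)
		}
	}
}

func main() {
	tbl := &placementTable{term: 3, hosts: map[string]hostEntry{}}
	tbl.upsert(3, "pod-a") // accepted: matches current term
	tbl.upsert(2, "pod-b") // rejected: stale leadership term
	tbl.pruneExpired(time.Minute)
}
```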
Fixes in service invocation when target app has multiple replicas on Kubernetes
Problem
When performing service invocation to another app using Dapr (using either HTTP or gRPC), the caller sidecar establishes a gRPC connection with the target sidecar. When the target app is deployed with multiple replicas (scaled horizontally), a connection is established with each replica and requests are load-balanced across all replicas.
In Dapr 1.12.3 and lower running on Kubernetes, due to the way this logic was implemented, if the connection with one of the replicas became idle, the connections between the caller sidecar and all replicas were severed, possibly interrupting other in-flight service invocation calls.
Impact
This issue impacts users running Dapr 1.12.3 and lower on Kubernetes who use service invocation to invoke apps that are scaled horizontally.
This issue does not impact users running outside of Kubernetes or using other Dapr name resolution components (such as mDNS or Consul).
Root cause
On Kubernetes, the gRPC connection was established with the DNS name of the target Service, and the gRPC-Go library was allowed to perform name resolution against the Kubernetes DNS server and establish a connection with each replica. When the connection with one of the replicas became idle, the caller app received a "GOAWAY" message, which was interpreted as a signal to terminate the connections with all replicas of the app.
Solution
We have changed the internals of service invocation so that DNS resolution is always performed in the caller's Dapr sidecar, including on Kubernetes. The gRPC-Go library receives an individual IP address to connect to, so "GOAWAY" messages only impact the connection with that individual replica.
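The difference can be sketched in Go. This is not Dapr's actual code: the `targetSvc` name and `sidecarPort` are hypothetical, the connection is left insecure for brevity, and the DNS name is assumed to resolve to individual pod IPs (as with a headless Service). The first function shows the old pattern of dialing the Service name and letting gRPC-Go resolve and load-balance; the second resolves the name in the caller and opens one connection per replica IP, so a "GOAWAY" only closes that replica's connection.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// Hypothetical target: a Service whose DNS name resolves to the sidecars of
// the destination app's replicas.
const targetSvc = "myapp-dapr.default.svc.cluster.local"
const sidecarPort = 50002

// Old pattern (sketch): dial the Service DNS name and let gRPC-Go resolve it
// and load-balance across replicas. A GOAWAY on one idle subchannel could be
// treated as a signal to tear down the whole logical connection to the app.
func dialByServiceName(ctx context.Context) (*grpc.ClientConn, error) {
	return grpc.DialContext(ctx,
		fmt.Sprintf("dns:///%s:%d", targetSvc, sidecarPort),
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig":[{"round_robin":{}}]}`),
	)
}

// New pattern (sketch): resolve the DNS name in the caller and open one
// connection per replica IP, so a GOAWAY only affects that single connection.
func dialEachReplica(ctx context.Context) ([]*grpc.ClientConn, error) {
	addrs, err := net.DefaultResolver.LookupIPAddr(ctx, targetSvc)
	if err != nil {
		return nil, err
	}
	conns := make([]*grpc.ClientConn, 0, len(addrs))
	for _, a := range addrs {
		conn, err := grpc.DialContext(ctx,
			net.JoinHostPort(a.IP.String(), fmt.Sprint(sidecarPort)),
			grpc.WithTransportCredentials(insecure.NewCredentials()),
		)
		if err != nil {
			return nil, err
		}
		conns = append(conns, conn)
	}
	return conns, nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if conns, err := dialEachReplica(ctx); err == nil {
		for _, c := range conns {
			defer c.Close()
		}
	}
}
```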