Victoria Series Release Notes
2.26.0
New Features
- Extend concurrent reads to erasure-coded policies. Previously, the concurrent_gets and concurrency_timeout options only applied to replicated policies.
- Add a new concurrent_ec_extra_requests option to allow the proxy to make some extra backend requests immediately. The proxy will respond as soon as there are enough responses available to reconstruct.
- The concurrent read options (concurrent_gets, concurrency_timeout, and concurrent_ec_extra_requests) may now be configured per storage policy, as in the sketch below.
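A minimal proxy-server.conf sketch; the policy index and the values shown are illustrative assumptions, not recommendations:

    [app:proxy-server]
    use = egg:swift#proxy
    # Issue backend GET requests concurrently and use the first
    # responses that arrive.
    concurrent_gets = true
    # How long to wait before firing off the next concurrent request.
    concurrency_timeout = 0.5
    # For EC policies: extra fragment requests to make up front.
    concurrent_ec_extra_requests = 2

    # New in this release: per-policy overrides (index 1 is an example).
    [proxy-server:policy:1]
    concurrent_ec_extra_requests = 0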
- Replication servers can now handle all request methods. This allows ssync to work with a separate replication network.
- All background daemons now use the replication network. This allows better isolation between external, client-facing traffic and internal, background traffic. Note that during a rolling upgrade, replication servers may respond with 405 Method Not Allowed. To avoid this, operators should remove the replication_server = true option from their replication servers before upgrading; this allows those servers to handle all request methods.
- S3 API improvements:
  - Fixed some SignatureDoesNotMatch errors when using the AWS .NET SDK.
  - Add basic read support for object tagging. This improves compatibility with AWS CLI version 2. Write support is not yet implemented, so the tag set will always be empty. (An example follows this list.)
  - CompleteMultipartUpload requests may now be safely retried.
  - Improved quota-exceeded error messages.
  - Improved logging and statsd metrics. Be aware that this will cause an increase in the proxy-logging statsd metrics emitted for S3 responses. However, this should more accurately reflect the state of the system.
  - S3 requests are now less demanding on the container layer.
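For example, with AWS CLI v2 pointed at a Swift cluster (the endpoint URL, bucket, and key below are placeholders), reading a tag set now succeeds and returns the always-empty result:

    $ aws --endpoint-url http://saio:8080 s3api get-object-tagging \
        --bucket my-bucket --key my-object
    {
        "TagSet": []
    }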
- Servers now open one listen socket per worker, ensuring each worker serves roughly the same number of concurrent connections.
- Server workers may now be gracefully terminated via SIGHUP or SIGUSR1. The parent process will then spawn a fresh worker, as sketched below.
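A sketch of recycling a single worker (PIDs are illustrative, and how the processes appear in ps depends on how the services were started):

    # Identify a proxy-server worker process, then signal it directly.
    $ ps -o pid,ppid,cmd -C swift-proxy-server
      PID  PPID CMD
     1234     1 swift-proxy-server ...   # parent
     1240  1234 swift-proxy-server ...   # worker
    $ kill -HUP 1240   # or: kill -USR1 1240
    # The worker finishes gracefully and the parent spawns a fresh one.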
- Allow proxy-logging middlewares to be configured more independently.
- Improve performance when increasing partition power.
Known Issues
In a rolling upgrade from liberasurecode 1.5.0 or earlier to 1.6.0 or later, object-servers may quarantine newly-written data, leading to availability issues or even data loss. See bug 1886088 for more information, including how to determine whether you are affected. Several mitigations are available to operators:
- If proxy and object layers can be upgraded independently and proxies can be upgraded quickly:
  1. Stop and disable the object-reconstructor before upgrading (see the sketch after this list). This ensures no upgraded object server starts writing new fragments that old object servers would quarantine.
  2. Upgrade liberasurecode on all object servers. Object servers can now read both old and new fragments.
  3. Upgrade liberasurecode on all proxy servers. Newly-written data will now use new fragments. Note that not-yet-upgraded proxies will not be able to read these newly-written fragments but will instead respond 500 Internal Server Error.
  4. After upgrading, re-enable and restart the object-reconstructor.
- If your users can tolerate it, consider a read-only rolling upgrade. Before upgrading, enable the read-only middleware cluster-wide to prevent new writes during the upgrade. Additionally, stop and disable the object-reconstructor as above. Upgrade normally, then disable the read-only middleware and re-enable and restart the object-reconstructor.
- Avoid upgrading liberasurecode until swift and liberasurecode better support a rolling upgrade. Swift remains compatible with liberasurecode 1.5.0 and earlier.
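A minimal sketch of the reconstructor steps above using swift-init; if your cluster manages services with systemd or distro init scripts, use the equivalent unit commands:

    # Before upgrading: make sure no reconstructor is running anywhere.
    $ swift-init object-reconstructor stop

    # ... upgrade liberasurecode on all object servers, then proxies ...

    # Once every node is upgraded, bring the reconstructor back:
    $ swift-init object-reconstructor start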
Note
Ubuntu 18.04 and RDO’s CentOS 7 repos package liberasurecode 1.5.0, while Ubuntu 20.04 and RDO’s CentOS 8 repos currently package liberasurecode 1.6.0 or 1.6.1. Take care when upgrading major distro versions!
Upgrade Notes
- If your cluster has encryption enabled and is still running Swift under Python 2, we recommend upgrading Swift before transitioning to Python 3. Otherwise, new writes to objects with non-ASCII characters in their paths may result in corrupted downloads when read from a proxy-server still running old swift on Python 2. See bug 1888037 for more information. Note that new tags including a fix for the bug are planned for all maintained stable branches; upgrading to any one of those should be sufficient to ensure a smooth upgrade to the latest Swift.
- The above bug was caused by a difference in string types that resulted in ambiguity when decrypting. To prevent the ambiguity for new data, set meta_version_to_write = 3 in your keymaster configuration after upgrading all proxy servers. If upgrading from Swift 2.20.0 or Swift 2.19.1 or earlier, set meta_version_to_write = 1 in your keymaster configuration prior to upgrading. See the provided keymaster.conf-sample for more information about this setting, and the sketch below.
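A keymaster configuration sketch; keymaster.conf-sample remains the authoritative reference:

    [keymaster]
    # After ALL proxy servers run upgraded Swift, write unambiguous
    # crypto metadata for new data:
    meta_version_to_write = 3

    # When upgrading from Swift 2.20.0 / 2.19.1 or earlier, instead set
    # this BEFORE upgrading, then raise it once the upgrade completes:
    # meta_version_to_write = 1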
- If your cluster is configured with a separate replication network, note that background daemons will switch to using this network for all traffic. If your account, container, or object replication servers are configured with replication_server = true, these daemons may log a flood of 405 Method Not Allowed messages during a rolling upgrade. To avoid this, comment out the option and restart replication servers before upgrading, as sketched below.
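For example, in an object replication server's configuration (the section shown is illustrative; adjust to wherever your config currently sets the option):

    [DEFAULT]
    bind_port = 6200
    # Comment out before the rolling upgrade so this server accepts all
    # request methods; background daemons now use this network too.
    # replication_server = true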
Bug Fixes
- Python 3 bug fixes:
  - Fixed an error when reading encrypted data that was written while running Python 2 for a path that includes non-ASCII characters.
  - Object expiration respects the expiring_objects_container_divisor config option.
  - fallocate_reserve may be specified as a percentage in more places (see the example after this list).
  - The ETag-quoting middleware no longer raises TypeErrors.
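For example (placement in [DEFAULT] of an object-server config shown; absolute byte counts remain valid):

    [DEFAULT]
    # Reserve 1% of each disk instead of a fixed number of bytes.
    fallocate_reserve = 1%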
- Sharding improvements:
  - Prevent object updates from auto-creating shard containers. This ensures more consistent listings for sharded containers during rebalances.
  - Deleted shard containers are no longer considered root containers. This prevents unnecessary sharding audit failures and allows the deleted shard database to actually be unlinked.
  - swift-container-info now summarizes shard range information. Pass -v/--verbose if you want to see all of them.
  - Improved container-sharder stat reporting to reduce load on root container databases.
  - Don't inject shard ranges when the user quits.
- During rebalances, clients should no longer get 404s for data that exists but whose replicas are overloaded.
- Improved cache management for account and container responses.
- Allow operators to pass either raw or URL-quoted paths to swift-get-nodes. Notably, this allows swift-get-nodes to work with the reserved namespace used for object versioning. An illustrative invocation follows below.
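A sketch; names are placeholders, and the quoted form uses %00 for the null byte that prefixes reserved-namespace names (the exact reserved layout shown is an assumption):

    # Raw path components:
    $ swift-get-nodes /etc/swift/object.ring.gz AUTH_test container obj
    # URL-quoted components reaching the versioning reserved namespace:
    $ swift-get-nodes /etc/swift/object.ring.gz AUTH_test \
        '%00versions%00container' '%00obj%00...'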
- Container read ACLs now work with object versioning. This only allows access to the most-recent version via an unversioned URL.
- Improved how containers reclaim deleted rows, reducing database locking that could slow object updates.
- Large object reads log fewer client disconnects.
- Allow ratelimit to be placed multiple times in a proxy pipeline, such as both before s3api and auth (to handle swift requests without needing to make an auth decision) and after (to limit S3 requests). A pipeline sketch follows below.
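A pipeline sketch (other middlewares elided; both placements can share one filter section):

    [pipeline:main]
    pipeline = catch_errors cache ratelimit s3api tempauth ratelimit proxy-logging proxy-server

    [filter:ratelimit]
    use = egg:swift#ratelimit
    # The first instance limits plain Swift requests before any auth
    # decision; the second limits requests arriving via the S3 API.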
- Shuffle object-updater work. This somewhat reduces the impact a single overloaded database has on other containers' listings.
- Fix a proxy-server error when retrieving erasure coded data when there are durable fragments but not enough to reconstruct.
- Fix an error in the proxy server when finalising data.
- Various other minor bug fixes and improvements.