This minor release of the Sun ZFS Storage Appliance software contains significant bug fixes for all supported platforms. Please carefully review the list of CRs that have been addressed and all known issues prior to updating.
Appliances must be running the 2011.1.4.2 micro release or higher before updating to this release. In addition, this release includes update health checks that are performed automatically when an update is started, prior to the actual update; a failed health check can cause the update to abort. The update health checks help ensure that component issues that may impact an update are addressed. It is important to resolve all hardware component issues prior to performing an update.
This release includes all fixes from prior releases. Prior release notes may be found here: Software Updates.
This release includes support for DE2-24C/P drive enclosures. For more information on these high-capacity, high-performance drive enclosures, contact your Oracle Sales representative or see the Sun ZFS Storage Appliance Product Webpage.
NOTE: THIS RELEASE DOES NOT SUPPORT MIXING DE2-24C/P DRIVE ENCLOSURES WITH OTHER DRIVE ENCLOSURE TYPES.
If an appliance is running a release prior to 2011.1.5, the following upgrade procedures should be used to configure a new system with DE2-24C/P drive enclosures. The first procedure applies to a standalone controller; the second applies to clustered controllers. A CLI sketch of the software update step appears after the procedures.
1. Rack all components and cable everything with the exception of the SAS cables between the DE2-24C/P drive enclosures and the controller.
2. Power on the controller and perform the initial configuration (networking, password, etc.).
3. Download the 2011.1.7 package and perform the software update.
4. After the controller has rebooted and is running 2011.1.7, attach the DE2-24C/P drive enclosures to the controller (see the Online Help Installation:Cabling section).
5. The controller will see the drive enclosures and will automatically start the DE2-24C/P IOM firmware upgrades to version 0010.
6. After the DE2-24C/P IOM firmware upgrades are complete, perform the standard storage configuration.
1. Rack all of the components and cable everything with the exception of the SAS cables between the DE2-24C/P drive enclosures and the controllers.
2. Power on each controller and perform the initial configuration (networking, password, etc.) for each as a standalone appliance.
NOTE: DO NOT PERFORM THE CLUSTER CONFIGURATION UNTIL THE LAST STEP.
3. Download the 2011.1.7 package to each controller and perform the software update.
4. After the controllers have rebooted, log into one controller and perform a factory reset.
5. After the factory reset has been completed and the controller is waiting for initial configuration, connect the DE2-24C/P drive enclosures to both controllers (see the Online Help Installation:Cabling section).
6. The controller that was not factory reset will see the drive enclosures and will automatically start the DE2-24C/P IOM firmware upgrades to version 0010.
7. After the DE2-24C/P IOM firmware upgrades are complete, perform the standard storage and cluster configuration.
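The software update step (step 3 in each procedure) may be performed from either the BUI or the CLI. The following is a minimal CLI sketch; the URL, credentials, and the media name shown (ak-nas@2011.04.24.7.0) are hypothetical, and the exact prompts may differ by release, so consult the online help Maintenance:System:Updates section for the authoritative procedure.

    hostname:> maintenance system updates download
    hostname:maintenance system updates download (uncommitted)> set url=http://server.example.com/ak-nas-2011.1.7.pkg.gz
    hostname:maintenance system updates download (uncommitted)> set user=admin
    hostname:maintenance system updates download (uncommitted)> set password=*******
    hostname:maintenance system updates download (uncommitted)> commit
    hostname:> maintenance system updates
    hostname:maintenance system updates> select ak-nas@2011.04.24.7.0
    hostname:maintenance system updates ak-nas@2011.04.24.7.0> upgrade

After the upgrade completes, the controller reboots into the new release.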
When updating from a 2010.Q3 release to a 2011.1 release, the following deferred updates are available and may be reviewed in the Maintenance System BUI screen. See the "Maintenance:System:Updates#Deferred_Updates" section in the online help for important information on deferred updates before applying them.
NOTE: APPLYING 2011.1 DEFERRED UPDATES WILL PREVENT ROLLING BACK TO PREVIOUS VERSIONS OF 2010.Q3 SOFTWARE OR EARLIER.
1. RAIDZ/Mirror Deferred Update (Improved RAID performance)
This deferred update improves both latency and throughput on several important workloads. These improvements rely on a ZFS pool upgrade provided by this update.
2. Optional Child Directory Deferred Update (Improved snapshot performance)
This deferred update improves list retrieval performance and replication deletion performance by improving dataset rename speed. These improvements rely on a ZFS pool upgrade provided by this update. Until this update is applied, the system can still retrieve lists and delete replications, but does so using the old, much slower, recursive rename code.
Shares snapshots interface
The numclones property is now decremented correctly in the shares snapshots interface.
15796248 SUNBT7174561 NAS cache doesn't update "numclones" upon clone deletion properly (16506645)
This release may be installed on the following platforms:
- Sun Storage 7110
- Sun Storage 7210
- Sun Storage 7310
- Sun Storage 7410
- Sun ZFS Storage 7120
- Sun ZFS Storage 7320
- Sun ZFS Storage 7420
- Sun ZFS Backup Appliance
- Sun ZFS 7000 Storage Appliance Simulator
The following CRs have been fixed in this release. The release-specific CR ID is shown in parentheses.
|15399422||SUNBT6561524 savecore should denote that a particular vmcore file i (16323544)|
|15706330||SUNBT7032737 publish and perish: race between cte_copy() and cte_publish_all() (16901441)|
|15707133||SUNBT7033822 contract_exit/contract_abandon need to learn to let go (16901377)|
|15715549||SUNBT7044697 arc_p is inflated. (16470266)|
|15724669||SUNBT7059268 ndmpd should not reset the record size after mover_stop (16634542)|
|15738023||SUNBT7083517 savecore gives wrong error when dump being copied (16323531)|
|15744871||SUNBT7096240 savecore can print garbled text for failure-reason (16297277)|
|15746216||SUNBT7098077 mdb won't open a vmdump file even with -f (16323554)|
|15748087||SUNBT7100477 savecore misses out zero filled pages at the end of a (16323563)|
|15755414||SUNBT7112465 vhci_scsi_start() should return the return value of sc (16739462)|
|15758100||SUNBT7116711 FC ports are frequently reset while the system is under heavy load (16764118)|
|15758422||SUNBT7117076 savecore hanging/looping while unpacking vmdump file (16323568)|
|15758740||SUNBT7117528 vHCI pm_pre_config() needs to check for DEVI_DEVICE_REMOVED client (16744726)|
|15761877||SUNBT7122398 Use after free in savecore (16323572)|
|15784735||SUNBT7160420 Inconsistency in the client commands: unable to discover eligi (15805541)|
|15796248||SUNBT7174561 NAS cache doesn't update "numclones" upon clone deletion properly (16506645)|
|15810219||SUNBT7191667 kmem_cache_alloc(KM_PUSHPAGE) seems to be returning NULL|
|15845236||libak should set FD_CLOEXEC on file descriptors for log files (16547088)|
|15968295||Devices get unconfigured for false positive on xpterr check (15972325)|
|16099913||ndi_devi_config_one can leak a node reference if NDI_CONFIG is used (16739527)|
|16267536||RW2 SIM upgrade failure: DAM overloads use of DAM_SPEND (16733042)|
|16275675||7420C FC Luns inaccessible; PWWN disappeared and then reappeared in fabric (16764114)|
|16292939||install creates root partition with incorrect size (16608076)|
|16299119||Add Windows Server 2012 AD to the support list against the ZFSSA (16776423)|
|16304739||fdisk -B should clear the on-disk vtoc first. (16608098)|
|16344072||libtopo xml processing halts child processing if a duplicate node is found (16594853)|
|16367940||NDMP Tape to Tape copy is failing using DMA Netbackup 7.5 (16609372)|
|16405284||Bump AKV_REQUIRED to 2011.1.4.2 (16595985)|
|16447269||reinserted system disk was not detected by ak (16703622)|
|16456751||can should propagate up both pHCI and vHCI stick (16739552)|
|16493179||stat tick should do more to avoid livelock (16703723)|
|16597450||appliance FC port flapped due to qlt mbox timeout (16764129)|
|16610668||'fdisk -g /dev/rdsk/c#t#d#p0' fails for labeled lofi device on Sparc (16624844)|
|16615986||failed-demotion-re-promotion .vs. hotplug-promotion (16739586)|
|16689910||mptsas/lsc/scu return incorrect value from tgtmap_deactivate_cb (16739705)|
|16702815||3-way dump-NDMP restore fails with OSB 10.4.0.2 / catalog restore (16719477)|
|16733600||Include timing data of import, NAS cache discovery failback in ak-2011 rm_log|
|Title||Network Datalink Modifications Do Not Rename Routes|
|Related Bug IDs||15488020|
The Configuration/Network view permits a wide variety of networking configuration changes on the Sun Storage system. One such change is taking an existing network interface and associating it with a different network datalink, effectively moving the interface's IP addresses to a different physical link (or links, in the case of an aggregation). In this scenario, the network routes associated with the original interface are automatically deleted, and must be re-added by the administrator to the new interface. In some situations this may imply loss of a path to particular hosts until those routes are restored.
|Title||Appliance doesn't boot after removing first system disk|
|Related Bug IDs||15546043|
In a 7210 system, removing the first system disk will make the system unbootable, despite the presence of a second mirrored disk. To work around this issue, enter the BIOS boot menu and, under 'HDD boot order', modify the list so that the first item is "[SCSI:#0300 ID00 LU]".
|Title||Network interfaces may fail to come up in large jumbogram configurations|
|Related Bug IDs||15573843|
In systems with large numbers of network interfaces using jumbo frames, some network interfaces may fail to come up due to hardware resource limitations. Such network interfaces will be unavailable, but will not be shown as faulted in the BUI or CLI. If this occurs, turn off jumbo frames on some of the network interfaces.
|Title||Multi-pathed connectivity issues with SRP initiators|
|Related Bug IDs||15609172, 15611632, 15618166, 15618253, 15618436, 15621220, 15621562, 15622079|
In cluster configurations, Linux multi-path clients have experienced loss of access to shares on the appliance. If this happens, a new session or connection to the appliance may be required to resume I/O activity.
|Title||Rolling back after storage reconfiguration results in faulted pools|
|Related Bug IDs||15586706|
Rolling back to a previous release after reconfiguring storage will result in pool(s) appearing to be faulted. These pools are those that existed when the rollback target release was in use, and are not the same pools that were configured using the more recent software. The software does not warn about this issue, and does not attempt to preserve pool configuration across rollback. To work around this issue, after rolling back, unconfigure the storage pool(s) and then import the pools you had created using the newer software. Note that this will not succeed if there was a pool format change between the rollback target release and the newer release under which the pools were created; in that case, an error will result on import and the only solution will be to perform the upgrade successfully. In general, this issue is best avoided by not reconfiguring storage after an upgrade until the functionality of the new release has been validated.
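The unconfigure-and-import workaround can be performed from the CLI's storage configuration context. The following is a minimal sketch, assuming a single pool named pool-0 (a hypothetical name); the unconfig and import actions are assumed to behave as described in the online help, which should be consulted before use.

    hostname:> configuration storage
    hostname:configuration storage> set pool=pool-0
    hostname:configuration storage> unconfig
    hostname:configuration storage> import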
|Title||Unanticipated error when cloning replicated projects with CIFS shares|
|Related Bug IDs||15615612|
When cloning replicated projects that are exported using the new "exported" property and shared via CIFS, you will see an error and the clone will fail. You can work around this by unexporting the project or share or by unsharing it via CIFS before attempting to create the clone.
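For example, a minimal CLI sketch of unsharing a project via SMB before cloning (the project name is hypothetical, and for a replicated project the property may need to be changed from within the corresponding replication package context):

    hostname:> shares select myproject
    hostname:shares myproject> set sharesmb=off
    hostname:shares myproject> commit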
|Title||Some FC paths may not be rediscovered after takeover/failback|
|Related Bug IDs||15618238|
After a takeover and subsequent failback of shared storage, Qlogic FC HBAs on Windows 2008 will occasionally not rediscover all paths. When observed in lab conditions, at least one path was always rediscovered. Moreover, when this did occur the path was always rediscovered upon initiator reboot. Other HBAs on Windows 2008 and Qlogic HBAs on other platforms do not exhibit this problem.
|Title||Unable to change resource allocation during initial cluster setup when using CLI|
|Platforms||7310C, 7410C, 7320C, 7420C|
|Related Bug IDs||15667251|
When performing initial cluster setup via the CLI, any attempt to change the storage controller to which a resource is allocated will result in an error message of the form error: bad property value "(other_controller)" (expecting "(controller)"). To work around this problem, use the BUI to perform initial cluster setup. Alternately, complete cluster setup, log out of the CLI, log back in and return to the configuration cluster resources context to finish resource allocation and initial failback.
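Once cluster setup is complete and you have logged back into the CLI, resource allocation can be finished from the cluster resources context. A minimal sketch, assuming a resource named net/igb0 and a peer controller named head2 (both hypothetical names):

    hostname:> configuration cluster resources
    hostname:configuration cluster resources> select net/igb0
    hostname:configuration cluster resources net/igb0> set owner=head2
    hostname:configuration cluster resources net/igb0> commit
    hostname:> configuration cluster failback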
|Title||Chassis service LED is not always illuminated in response to hardware faults|
|Platforms||7120, 7320, 7320C, 7420, 7420C|
|Related Bug IDs||15646092|
In some situations the chassis service LED on the controller will not be illuminated following a failure condition. Notification of the failure will still function normally via the user interface, via alerts (including email, syslog, and SNMP, if configured), and via Oracle Automatic Service Request ("Phone Home").
|Title||iSCSI IOs sometimes fail on cluster takeover/failback when using Solaris MPxIO clients|
|Platforms||7310C, 7320C, 7410C, 7420C|
|Related Bug IDs||15648589|
iSCSI I/O failures have been seen during takeover/failback when iSCSI targets that are separately owned by each controller in a cluster are part of the same target group. To work around this issue, iSCSI target groups should only contain iSCSI targets owned by a single controller. This also implies that the default target group should not be used in this case.
|Title||HCA port may be reported as down|
|Related Bug IDs||15698685|
HCA ports may be reported as down after a reboot. If the datalinks and interfaces overlaid on the port are functioning, this reported state is incorrect.
|Title||nearly full storage pool impairs performance and manageability|
|Related Bug IDs||15378956, 15661408, 15663845|
Storage pools at more than 80% capacity may experience degraded I/O performance, especially when performing write operations. This degradation can become severe when the pool exceeds 90% full and can result in impaired manageability as the free space available in the storage pool approaches zero. This impairment may include very lengthy boot times, slow BUI/CLI operation, management hangs, inability to cancel an in-progress scrub, and very lengthy or indefinite delays while restarting services such as NFS and SMB. Best practices, as described in the product documentation, call for expanding available storage or deleting unneeded data when a storage pool approaches these thresholds. Storage pool consumption can be tracked via the BUI or CLI; refer to the product documentation for details.
|Title||Solaris 10 iSCSI client failures under heavy load|
|Related Bug IDs||15662377|
If using Solaris 10 as the iSCSI initiator, you must use Solaris 10 Update 10 or later.
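Solaris 10 Update 10 corresponds to the "Oracle Solaris 10 8/11" release string, which can be verified on the client (the output shown is for x86 and is illustrative):

    client$ cat /etc/release
                       Oracle Solaris 10 8/11 s10x_u10wos_17b X86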
|Title||management UI hangs on takeover or management restart with thousands of shares or LUNs|
|Related Bug IDs||15665874, 15699950|
When a cluster takeover occurs or the management subsystem is restarted either following an internal error or via the maintenance system restart CLI command, management functionality may hang in the presence of thousands of shares or LUNs. The likelihood of this is increased if the controller is under heavy I/O load. The threshold at which this occurs will vary with load and system model and configuration; smaller systems such as the 7110 and 7120 may hit these limits at lower levels than controllers with more CPUs and DRAM, which can support more shares and LUNs and greater loads. Best Practices include testing cluster takeover and failback times under realistic workloads prior to placing the system into production. If you have a very large number of shares or LUNs, avoid restarting the management subsystem unless directed to do so by your service provider.
|Title||moving shares between projects can disrupt client I/O|
|Related Bug IDs||15664600|
When moving a share from one project to another, client I/O may be interrupted. Do not move shares between projects while client I/O is under way unless the client-side application is known to be resilient to temporary interruptions of this type.
|Title||repair of faulted pool does not trigger sharing|
|Related Bug IDs||15661166|
When a faulted pool is repaired, the shares and LUNs on the pool are not automatically made available to clients. There are two main ways to enter this state:
- Booting the appliance with storage enclosures disconnected, powered off, or missing disks
- Performing a cluster takeover at a time when some or all of the storage enclosures and/or disks making up one or more pool were detached from the surviving controller or powered off
When the missing devices become available, controllers with SAS-1 storage subsystems will automatically repair the affected storage pools. Controllers with SAS-2 storage subsystems will not; the administrator must repair the storage pool resource using the resource management CLI or BUI functionality. See product documentation for details. In neither case, however, will the repair of the storage pool cause the shares and LUNs to become available. To work around this issue, restart the management subsystem on the affected controller using the maintenance system restart command in the CLI. This is applicable ONLY following repair of a faulted pool as described above.
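For example, from the CLI of the affected controller (this restarts only the appliance management software, not the operating system; the command is described in the product documentation):

    hostname:> maintenance system restart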
|Title||DFS links may be inaccessible from some Windows clients|
|Related Bug IDs||15650980|
Under rare circumstances, some Microsoft Windows 2008/Vista and Microsoft Windows XP clients may be unable to access DFS links on the appliance, receiving the error Access is denied. Windows 2003 is believed not to be affected. The proximate cause of this problem is that the client incorrectly communicates with the DFS share as if it were an ordinary share; however, the root cause is not known. The problem has been observed with other DFS root servers and is not specific to the Storage 7000 appliance family. At present the only known way to resolve this issue is via reinstallation of the affected client system. If you encounter this problem, please contact your storage service provider and your Microsoft Windows service provider.
|Title||NDMP service may enter the maintenance state when changing properties|
|Related Bug IDs||15664828|
When changing properties of the NDMP service, it may enter the maintenance state due to a timeout. This will be reflected in the NDMP service log with an entry of the form stop method timed out. If this occurs, restart the NDMP service as described in the product documentation. The changes made to service properties will be preserved and do not need to be made again.
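The service can be restarted from the CLI by disabling and re-enabling it, as sketched below (service contexts support enable and disable commands; see the product documentation):

    hostname:> configuration services ndmp
    hostname:configuration services ndmp> disable
    hostname:configuration services ndmp> enable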
|Title||Solaris/VxVM FC initiator timeouts|
|Platforms||7310C, 7410C, 7320C, 7420C|
|Related Bug IDs||15642153|
Symantec has enhanced its code to handle I/O delays during takeover and/or failback. This work was covered under Symantec bug number e2046696 (fixes for dmp_lun_retry_timeout handling issues found during SUN7x10 array qualification). Symantec created hot fix VRTSvxvm 5.1RP1_HF3 (for Solaris SPARC and x86) with this fix. The subsequent patch (5.1RP2) and major update (5.1SP1) will include these changes. Obtain and install these patches from Symantec if you are using VxVM on Solaris as an FC initiator attached to a clustered appliance.
|Title||Missing data in certain Analytics drilldowns|
|Related Bug IDs||15648562|
When drilling down on a statistic that existed prior to the current system startup, certain statistics may show no data in the drilldown. This can occur if the original statistic required looking up a DNS name or initiator alias, as would typically be the case for statistics broken down by client hostname (for files) or initiator (for blocks). The problem occurs only intermittently and only with some statistics. To work around this issue, disable or delete the affected dataset(s), then restart the management software stack using the 'maintenance system restart' command in the CLI. Once the statistics are recreated or reenabled, subsequent drilldowns should contain the correct data.
|Title||Solaris initiators may lose access to FC LUNs during cluster takeover|
|Platforms||7310C, 7410C, 7320C, 7420C|
|Related Bug IDs||15648815|
If using Solaris 10 as the FC initiator, you must use Solaris 10 Update 10 or later.
|Title||Multiple SMB DFS roots can be created|
|Related Bug IDs||15664518|
It is possible to create more than the maximum of one standalone DFS root on an appliance if multiple pools are available. Do not create multiple DFS roots.
|Title||Intermittent probe-based IPMP link failures|
|Related Bug IDs||15664567|
An appliance under heavy load may occasionally detect spurious IPMP link failures. This is part of the nature of probe-based failure detection and is not a defect. The product documentation explains the algorithm used in determining link failure; the probe packets it uses may be delayed if the system is under heavy load.
|Title||Backup of "system" pool|
|Related Bug IDs||15671861|
The NDMP backup subsystem may incorrectly allow backup operations involving filesystems on the system pool, which contains the appliance software. Attempting to back up these filesystems will result in exhaustion of space on the system pool, which will interfere with correct operation of the appliance. Do not attempt to back up any filesystem in the system pool. If you have done so in the past, check the utilization of the pool as described in the Maintenance/System section of the product documentation. If the pool is full or nearly full, contact your authorized service provider.
|Title||Configuration restore does not work on clustered systems|
|Platforms||7310C, 7320C, 7410C, 7420C|
|Related Bug IDs||15666733, 15700466, 15700693|
If a system is configured in a cluster, or has ever been configured in a cluster, the configuration restore feature does not work properly. This can lead to appliance panics, incorrect configuration, or a hung system. At the present time, configuration restore should only be used on stand-alone systems.
|Title||Node fails to join cluster after root password change prior to rollback|
|Platforms||7310C, 7410C, 7320C, 7420C|
|Related Bug IDs||15649957|
In a cluster configuration, if the root password is changed prior to a rollback to an older release, a cluster join failure can occur on that node. If this occurs, change the root password on the node that was rolled back to match the other node and perform a reboot. Once both nodes are at the same version and operating as a cluster, the root password can be changed again as needed.
|Title||SMB operation during AD outages may create damaged ACLs|
|Related Bug IDs||15565116|
SMB operations while the Active Directory (AD) domain controllers are unavailable can yield damaged ACL entries. In particular, problems can arise if Windows group SIDs are present in the ACLs on dataset roots. Access control may not function correctly, but will recover once the domain controllers become available and the cache entries expire, which normally takes approximately 10 minutes. SMB client sessions started during that period might need to be restarted. Any ACL that includes a Windows group and is written during such an outage will be damaged. A CIFS client can be used to repair damaged entries.
|Title||Cannot modify MTU of datalink in an IPMP group|
|Related Bug IDs||15708978|
As of the 2011.1 release, the datalink MTU can be explicitly set via the appliance BUI. However, attempting to change the MTU of a datalink with an IP interface in an IPMP group causes the datalink to enter the maintenance state. To avoid this, destroy and recreate the datalink with the desired MTU before placing it into the IPMP group.
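A minimal CLI sketch of recreating a device datalink with a specific MTU (the device name, label, and MTU value are hypothetical, and the mtu property is assumed to be settable from the CLI as it is from the BUI):

    hostname:> configuration net datalinks device
    hostname:configuration net datalinks device (uncommitted)> set links=igb0
    hostname:configuration net datalinks device (uncommitted)> set label=datalink-igb0
    hostname:configuration net datalinks device (uncommitted)> set mtu=9000
    hostname:configuration net datalinks device (uncommitted)> commit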
|Title||Resilver can severely impact I/O latency|
|Related Bug IDs||15701038|
During a disk resilver operation (e.g., due to activating a spare after a disk failure), latency for I/O associated with the containing pool may be severely impacted. For example, the “NFSv3 operations broken down by latency” Analytics statistic can show 2-4 second response times. After the resilver completes, I/O latency returns to normal.
|Title||Windows 2008 R2 IB client may fail to ping appliance|
|Related Bug IDs||15746292|
Due to what appears to be an initiator-side problem, Windows 2008 R2 InfiniBand (IB) initiators may be initially unable to access the appliance. If this occurs, disable and re-enable the IB port on the initiator side by navigating to Network Connections, right-clicking on the appropriate IB port, selecting “disable,” right-clicking again, and selecting “enable.”
|Title||NDMP-ZFS backup limitations for clones|
|Related Bug IDs||15716003|
First, to successfully back up and restore a clone by itself (i.e., without backing up its containing project), ZFS_MODE=dataset must be set in the data management application. Second, to successfully back up and restore a project that contains a clone whose origin resides in the same project as the clone, use ZFS_MODE=recursive (the default mode). Third, to successfully back up a project containing a clone whose origin resides in a project different from the clone, back up the shares of the project individually using ZFS_MODE=dataset. (This even applies to shares that are not clones, although at least one will be a clone.) These limitations may be lifted in a future release. For more information on NDMP and the “zfs” backup type, refer to http://www.oracle.com/technetwork/articles/systems-hardware-architecture/ndmp-whitepaper-192164.pdf
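How ZFS_MODE is set depends on the data management application. As an illustration only, NDMP environment variables are commonly supplied as SET directives in a DMA's backup selection list, as in this hypothetical NetBackup-style example (the share path is hypothetical; consult your DMA documentation for its actual mechanism):

    SET ZFS_MODE=dataset
    pool-0/local/projectA/clone_share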
|Title||Revision B3 SAS HBAs not permitted with 2011.1 release|
|Platforms||7210, 7310, 7410|
|Related Bug IDs||15749140|
Due to a defect with Revision B3 SAS HBAs, which is exposed by changes in the 2011.1 release, 7210, 7310, and 7410 appliances with Revision B3 SAS HBAs will be prevented by the appliance kit update health check software from upgrading to 2011.1. If this occurs, please contact Oracle Support about an upgrade to Revision C0 SAS HBAs.
|Title||"ZFS" should be used instead of "Sun" in /etc/multipath.conf|
|Platforms||7120, 7320, 7420|
|Related Bug IDs||15760277, 15761179|
When configuring Linux FC Multipath client initiators for use with 7120, 7320 and 7420 platforms, the product string in the /etc/multipath.conf file should be "ZFS Storage 7x20". 7x10 platforms should continue to use the "Sun Storage 7x10" product string.
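A minimal /etc/multipath.conf device stanza illustrating the product string for a 7420 (a sketch only; the vendor string and anything beyond the vendor/product match are assumptions, so consult the appliance documentation for the complete recommended settings):

    devices {
            device {
                    vendor  "SUN"
                    product "ZFS Storage 7420"
            }
    }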
|Title||Following a SIM upgrade, a Logzilla may be left with a single path|
|Related Bug IDs||15754494|
If this problem occurs, SIM upgrades will stop and the UI will report a single path to the affected device from both heads in a cluster system. To re-enable the path and continue with SIM upgrades, the affected Logzilla device must be re-seated: pull the device from its bay, wait 10 seconds, and re-insert it.
|Title||Shadow migration issues|
|Related Bug IDs||15654495, 15661918, 15595857, 15702398|
Avoid shadow migrating filesystems that have thousands of files and/or directories in the root directory of the source filesystem. If errors are encountered while migrating the root directory of the source filesystem, the migration may fail to make progress; cancel the migration and restart it after fixing the errors. Losing a shadow migration source can have a severe negative impact on the sharing of all other filesystems. Restore access to the source as soon as possible, or cancel the migration if access to the source cannot be re-established.
|Title||DE2-24C/P drive enclosure is not discovered after attaching multiple initiators|
|Platforms||All platforms using DE2-24C/P Drive Enclosure|
|Related Bug IDs||16594972|
DE2-24C/P drive enclosures maintain a table of up to seven known initiators. If the SES target receives a request from an initiator that it does not already know about and the initiator table is full, it will refuse the connection. This is very unlikely to happen unless numerous drive enclosure configuration changes are made. To work around this issue, safely shut down the appliance and power cycle the drive enclosure.
|Title||Deleting multiple update images can cause CLI/BUI to hang|
|Related Bug IDs||16306188|
When multiple update images are deleted too quickly, the AK CLI and BUI can become unresponsive. To work around this issue, wait for each update image deletion to complete before deleting the next update image.
|Title||USB stick changes boot order during PXE installs|
|Related Bug IDs||17230434|
If an external USB stick is attached during a PXE install, the default boot device ordering will be changed during the automatic reboot after the install, and the system will repeat the PXE install instead of booting from the internal system HDD. To work around this issue, remove any connected external USB sticks prior to PXE installs.