GlassFish Server Open Source Edition 3.1 - Shoal Group Management Service (GMS) for Runtime Clustering Services
- GlassFish Server Open Source Edition 3.1 - Shoal Group Management Service (GMS) for Runtime Clustering Services
- Scope of the Project
- Feature Overview
- GMS-01 Shoal GMS over Grizzly implementation
- GMS-02 Integrate Shoal GMS into GF v3.1 using dev. stop-gap clustering cmds
- GMS-07 GMS over Grizzly virtual multicast
- GMS-09 Monitoring Stat Providers
- One-pager / Functional Specification
- Dev Tests
- Milestone Schedule
- References / Links
- Email Alias
Shoal GMS is a clustering framework that provides the infrastructure for building fault-tolerant, reliable, and highly available services.
The clustering framework provides the following capabilities to GlassFish services and user applications.
- GMS event notification
- register callbacks for GMS event notifications
- GMS runtime notifies registered clients when changes in group membership occur.
- Cluster-wide Messaging
- send a message to one member, to a sublist of members, or broadcast to all members of the cluster
- register message callbacks to process messages sent by other group members.
- GMS status methods
- retrieve the list of all members or just the list of CORE members
- request the status of any member in the cluster
- GMS Group Configuration
- configure the multicast address and port used by a group
- configure the GMS listener BIND_INTERFACE_ADDRESS (the IP address of a network interface) to use on a multihomed machine
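As a concrete illustration of the group-configuration points above, the following self-contained sketch builds the kind of `Properties` object a GMS client could pass when joining a group. `BIND_INTERFACE_ADDRESS` is the property named in this document; the multicast key names are illustrative assumptions and may differ from the actual Shoal configuration keys.

```java
import java.util.Properties;

public class GmsGroupConfig {

    // Builds GMS group-configuration properties. BIND_INTERFACE_ADDRESS is
    // the property named in this document; the multicast key names here are
    // illustrative assumptions, not confirmed Shoal key names.
    public static Properties buildConfig(String multicastAddress,
                                         int multicastPort,
                                         String bindInterfaceAddress) {
        Properties props = new Properties();
        props.setProperty("MULTICASTADDRESS", multicastAddress);
        props.setProperty("MULTICASTPORT", Integer.toString(multicastPort));
        if (bindInterfaceAddress != null) {
            // On a multihomed machine, pin the GMS listener to one interface.
            props.setProperty("BIND_INTERFACE_ADDRESS", bindInterfaceAddress);
        }
        return props;
    }
}
```

Such a properties object would typically be handed to the GMS runtime when the instance joins its group.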
A recent enhancement to GMS is a pluggable transport layer, together with a Grizzly implementation of that transport.
The GlassFish HA subsystem is built on top of the GMS clustering framework and its messaging.
|Feature-ID||Priority||Description||Eng Response||Owner(s)||Estimate (Person Days)||Source of Requirement||Status/Comments|
|GMS-01||P1||Shoal GMS over Grizzly implementation||YES||Joe Fialli, Bobby Bissett||DONE||switch to a well supported transport||shoal gms dev level testing confirmed working|
|GMS-02||P1||Integrate Shoal GMS into GF v3.1 using dev. stop-gap clustering cmds||YES||Joe Fialli, Bobby Bissett||near complete||earlier gms in gf v3.1 testing||This is a transitional integration. Building of gms-adapter is not enabled in cluster pom.xml yet. Temporary gms cluster config files/asadmin cluster commands are provided so that GMS in GF v3.1 testing can start.|
|GMS-03||P1||Shoal GMS integration via domain.xml configuration||YES||Bobby/Joe||estimateManDays||impl v2.1 functionality||initial gmsconfig document presented to admin team, updating doc with feedback. Send out update to admin dev alias.|
|GMS-04||P1||Introduce GMS GroupHandle.getPreviousAliveOrReadyMembers()||YES||Joe Fialli||5 days||HA request||Mahesh needs to use this during HA M3 development. Higher priority than rejoin.|
|GMS-05||P1||Introduce GMS rejoin subevent in JOIN and JOINED_AND_READY notification||YES||Bobby||estimateManDays||compensate for loss of v2.1 NodeAgent as gms watchdog||subevent informs GMS client that a clustered instance failed and was restarted quicker than GMS heartbeat failure detection would have been able to detect failure. In GF v3.1, local initrc technique for monitoring a process and restarting it when it fails will cause this to occur.|
|GMS-06||P2||asadmin get-health clustered-instance or cluster||YES||Joe||3-7 days||feature parity||note: only works for gms-enabled cluster|
|GMS-07||P3||GMS over Grizzly virtual multicast||NO||ownerTBD||estimateManDays||identified as desirable feature during GF v3.1 launch||status/comments|
|GMS-08||P2||multicast enabled diagnostic utility||YES||Bobby||estimate?||ease of support request||test already exist in shoal gms just needs to be properly packaged|
|GMS-09||P2||Monitoring Stat Providers||YES||Bobby/Joe||estimateManDays||Provide stats that can be used for both monitoring and debugging: messaging throughput and event notification counters|
|GMS-10||P2||Upgrade from v2.1 cluster and group-management-service element's attributes/properties to v3.1 cluster/group-management-service||YES||Bobby||??||feature parity|
|GMS-11||P2||Update external library Shoal GMS to meet GF v3.1 logging requirements||YES||Bobby/Joe||??||feature parity|
Replace JXTA as the transport provider for GMS with an implementation over Grizzly, using Grizzly as the transport provider for TCP and UDP messages. The basic implementation is being provided by Bongjae Chang, a Shoal community member, but needs further enhancements to make it production quality.
- Tune default Grizzly NetworkManager parameters. These are settable via the following GMS property parameters: <b>MAX_POOLSIZE, CORE_POOLSIZE, KEEP_ALIVE_TIME, POOL_QUEUE_SIZE, HIGH_WATER_MARK, MAX_PARALLEL, WRITE_TIMEOUT, MAX_WRITE_SELECTOR_POOL_SIZE</b>. These parameters manage the resources available for processing incoming Grizzly traffic.
- The GMS over Grizzly implementation will be tested over several of the critical test scenarios to establish its viability and stability for usage in a production quality GlassFish deployment. Initial stability tests will be based on module level distributed tests.
- Requires QE to port over earlier GMS tests that were based on the app server and depended on app server asadmin commands, reverting to module-level tests as was done for initial GlassFish v2 testing.
- Fix P1 and P2 type bugs
- Grizzly framework and util jars version 1.9.19
- Add GMS messaging load testing, since HA relies directly on GMS over Grizzly messaging. (In v2.1, HA used JXTA messaging directly.)
- QE uses the stop-gap clustering commands/configurations so that GMS over Grizzly testing in GF v3.1 does not have to wait until the GMS-03 functionality is complete. When GMS-03 is ready, it should be a simple matter of replacing each stop-gap cluster command with the equivalent asadmin cluster command.
- Make GMS OSGi compliant - GMS was not an OSGi-compliant module. Changes are being incorporated to mavenize GMS, publish it to the Maven repository, and use GF v3 Maven commands to build an OSGi-compliant module.
- Create a GMS Service in GlassFish that will act as a delegate to start and stop the GMS module in each GF 3.1 instance
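The Grizzly NetworkManager tuning parameters listed earlier in this section can be overridden through GMS properties before the group is started. The sketch below builds such an override set; the numeric values are purely illustrative placeholders chosen for this example, not recommended production defaults.

```java
import java.util.Properties;

public class GrizzlyTuning {

    // Overrides selected Grizzly NetworkManager settings via the GMS
    // property names listed in this document. All numeric values below are
    // illustrative placeholders, not recommended defaults.
    public static Properties tunedProperties() {
        Properties props = new Properties();
        props.setProperty("CORE_POOLSIZE", "20");
        props.setProperty("MAX_POOLSIZE", "50");
        props.setProperty("KEEP_ALIVE_TIME", "60000");       // ms an idle surplus thread is kept
        props.setProperty("POOL_QUEUE_SIZE", "4096");        // bound on pending tasks
        props.setProperty("HIGH_WATER_MARK", "1024");
        props.setProperty("MAX_PARALLEL", "1");
        props.setProperty("WRITE_TIMEOUT", "10000");         // ms before a blocked write fails
        props.setProperty("MAX_WRITE_SELECTOR_POOL_SIZE", "30");
        return props;
    }
}
```

Tuning work under GMS-01 would determine which default values are appropriate for production deployments.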
Create a GMS Service in GlassFish that will delegate start and stop operations on the GMS module in each GF 3.1 instance. This is an @Startup class that will be part of the startup services in a clustered GF instance. Requirements still need to be finalized regarding when the GMS module will be started; architects need to consider the risk that lazy startup guidelines introduce instability and unpredictability into group memberships. COMPLETED by Sheetal.
GMS for JoinedAndReady (instance start and restart), failure detection notifications and all Messaging pertaining to session replication.
DOL (possibly for deployment framework support to correctly handle automatic timer creation)
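The delegate idea behind the GMS service, a startup component that joins the group only when the instance belongs to a gms-enabled cluster and leaves it on shutdown, can be sketched as below. This is not the actual hk2 @Startup implementation; `GmsModule` is a stand-in interface for the Shoal GMS lifecycle.

```java
// Minimal sketch of the GMS service delegate: start the GMS module only for
// gms-enabled clustered instances, and stop it on instance shutdown.
// NOT the actual GlassFish @Startup class; GmsModule is a hypothetical stand-in.
public class GmsAdapterService {

    /** Stand-in for the Shoal GMS module lifecycle. */
    public interface GmsModule {
        void join();
        void leave();
    }

    private final GmsModule gms;
    private final boolean gmsEnabled;   // from the cluster's gms-enabled setting
    private boolean joined;

    public GmsAdapterService(GmsModule gms, boolean gmsEnabled) {
        this.gms = gms;
        this.gmsEnabled = gmsEnabled;
    }

    /** Called during instance startup. */
    public void postConstruct() {
        if (gmsEnabled) {               // skip GMS entirely when not enabled
            gms.join();
            joined = true;
        }
    }

    /** Called during instance shutdown. */
    public void preDestroy() {
        if (joined) {
            gms.leave();
            joined = false;
        }
    }

    public boolean isJoined() { return joined; }
}
```

The real service additionally has to resolve the lazy-startup questions noted above, since joining the group late affects what membership views other members observe.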
Multicast is not always enabled in production environments. Additionally, it is not available in cloud environments,
so this feature is desirable for those configurations.
Implement as a static list of IP addresses and ports that make up the cluster; there are no dynamic discovery capabilities when multicast is not enabled in the network.
- Yet another configuration to test. All existing Shoal tests really need to be run with this configuration to be sure it is working.
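A static member list implies configuration along the lines of a comma-separated list of `host:port` endpoints. The exact format for virtual multicast is not specified in this document, so the format parsed below is an illustrative assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class StaticMemberList {

    /** One statically configured cluster endpoint (no dynamic discovery). */
    public static final class Endpoint {
        public final String host;
        public final int port;
        public Endpoint(String host, int port) { this.host = host; this.port = port; }
    }

    // Parses a "host:port,host:port" list into endpoints. This configuration
    // format is a hypothetical illustration of the static-list approach, not
    // the actual virtual-multicast configuration syntax.
    public static List<Endpoint> parse(String configured) {
        List<Endpoint> members = new ArrayList<Endpoint>();
        for (String token : configured.split(",")) {
            String[] parts = token.trim().split(":");
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected host:port, got: " + token);
            }
            members.add(new Endpoint(parts[0], Integer.parseInt(parts[1])));
        }
        return members;
    }
}
```

With such a list, the transport would connect point-to-point to each configured member instead of discovering members via UDP multicast.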
Provide monitoring statistics that will help with diagnosing GMS membership-related issues.
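The kinds of statistics GMS-09 calls for (messaging throughput, event-notification counts) can be kept with simple thread-safe counters. The stat names and structure below are hypothetical, not the actual GlassFish monitoring stat providers.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of GMS-09-style counters (messaging throughput and
// event-notification counts). Names and structure are hypothetical, not the
// actual GlassFish stat-provider implementation.
public class GmsStats {
    private final AtomicLong messagesSent = new AtomicLong();
    private final AtomicLong bytesSent = new AtomicLong();
    private final AtomicLong suspectedEvents = new AtomicLong();
    private final AtomicLong failureEvents = new AtomicLong();

    public void onMessageSent(int sizeInBytes) {
        messagesSent.incrementAndGet();
        bytesSent.addAndGet(sizeInBytes);
    }

    public void onSuspected() { suspectedEvents.incrementAndGet(); }
    public void onFailure()   { failureEvents.incrementAndGet(); }

    public long getMessagesSent()    { return messagesSent.get(); }
    public long getBytesSent()       { return bytesSent.get(); }
    public long getSuspectedEvents() { return suspectedEvents.get(); }
    public long getFailureEvents()   { return failureEvents.get(); }
}
```

Counters like these support both monitoring dashboards and after-the-fact debugging of membership churn (for example, comparing SUSPECTED to FAILURE counts).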
- Information on GMS junit dev tests is posted on the Dev Tests page
- TBD. Instructions for running the automated distributed Shoal GMS developer unit tests will be added to the Dev Tests page
- 14 junit tests. Run as part of "mvn install", and also via "ant run-junit-tests".
- distributed developer level tests (runnable with multiple instances on one machine AND distributed as one instance per machine.)
- 4 gms event notification tests (join/joinAndReady/suspected/failure/plannedShutdown)
- 2 gms messaging tests
- Automation of the nightly runs of the Shoal GMS distributed developer-level tests (both single-machine and distributed) via Hudson is in progress.
- Link to Test Plans
- Link to Documentation
|Item #||Date/Milestone||GF Issue Tracker||Feature-ID||Description||QA/Docs Handover?||Status / Comments|
|01||DONE||GMS-01||GMS over Grizzly||QA handover done, no doc handover needed||Initial distributed unit level testing done.|
|02||DONE||gfit 12189||GMS-02||Integrate Shoal GMS into GF v3.1 using dev. stop-gap clustering cmds|
|03||DONE||gfit 12190||GMS-03||Shoal GMS integration via domain.xml configuration||QA Handoff: PENDING(will complete on 6/28) Docs: yes||AS ARCH completed on gms config doc|
|04||M3||DONE gfit 12191||GMS-04||Introduce GMS GroupHandle.getPreviousAliveOrReadyMembers()||No||Dev test is sufficient. Only useful for consistent hash calculation in HA.|
|05||M3||DONE gfit 12192||GMS-05||Introduce GMS rejoin subevent in JOIN and JOINED_AND_READY notification||QA Handover YES, DOC: javadoc, Project shoal web pages||Status/Comments|
|06||M?||v3.2||GMS-07||GMS over Grizzly virtual multicast||N/A||while desirable, impl and testing resources not identified yet|
|07||M4||DONE gfit 12195||GMS-08||multicast enabled diagnostic utility||YES||DONE|
|08||moved to v3.2||gfit 12194||GMS-09||Monitoring Stat Providers||YES||This feature was moved from M4 to v3.2. Stats: message throughput, thread utilization, number of detected SUSPECTED events, number of FAILURES.|
|09||M4(8/16)||DONE gfit 12563||GMS-10||Upgrade from v2.1 cluster and group-management-service element's attributes/properties to v3.1 cluster/group-management-service||YES||manual testing completed. automated testing to be done in admin devtest.|
|10||M5(9/13)||DONE gfit 12196||GMS-11||Update external library Shoal GMS to meet GF v3.1 logging requirements||No QA handoff or doc||Status/Comments|
|11||M1(5/24)||DONE||GMS one-pager||Identifying dependencies and new methods being added.|
|12||M4||DONE GFIT 12193||GMS-06||asadmin get-health cluster or clustered-instance||YES||automated test in admin devtest|
|Task||Target Milestone||Start||End Date||Owner(s)||Feature ID||Status / Comments|
|Test GMS over Grizzly in shoal unit testing based on evaluation criteria||DONE||Kazem, Steve DiMilla||maintain GF v2.1 quality. Run 6-8 critical scenarios over several iterations to establish the feasibility of the Grizzly transport-based implementation and stability levels equivalent to the JXTA-based implementation. Still need scenario 51.|
|Automate running shoal distributed dev tests via Hudson||M1||05/03||05/17||Steve DiMilla||GMS-01||automation of standard shoal notification validation nearly complete. msg throughput test still being worked on.|
|OSGI shoal-gms.jar||M1||DONE||DONE||Sheetal||GMS-02||initial pass completed.|
|Load shoal-gms.jar only when gms-enabled||M2||5/2?||endDate||Bobby or Joe (whoever is free first to work on it)||GMS-02 or GMS-03||only load shoal-gms.jar when a clustered instance/DAS has a gms-enabled cluster. Check with Jerome if okay to enable gms in gfv3.1 before implementing this task.|
|Meet GF v3.1 OSGI requirements||M2||start||end||Bobby||GMS-02||Export minimal gms pks based on OSGI requirements documented in How to run OSGI Pkg Dep Analyzer|
|stop-gap clustering control in gf v3.1||M1||DONE||DONE||Steve DiMilla||GMS-02||dev handoff to QE was completed on April 29. Note: cluster/pom.xml is not building gms-adapter yet. Check with Jerome if okay to enable it before completing Lazy loading of shoal-gms.jar only when cluster gms-enabled is on.|
|DAS joins all of its domain.xml clusters when started||DONE||Start||endDate||Joe||GMS-03||task documents v2.1 impl behavior , may need to be revisited in v3.1|
|DAS dynamically joins cluster created by "asadmin create-cluster"||DONE||Start||endDate||Joe||GMS-03||task docs v2.1 impl behavior, may need to be revisited in v3.1|
|Test GMS within gf v3.1 with stop-gap clustering||M2||start||DONE||Kazem/Steve||featureID||stop-gap clustering implemented and initial gms integration into gf is completed. cluster/gms-adapter checked in but building not enabled yet by cluster/pom.xml.|
|Test GMS within gf v3.1 with domain.xml/asadmin cmds||M?||start||endDate||Kazem||featureID||use domain.xml config and GF v3.1 asadmin cmds. Depends on asadmin start-cluster, start-instance, stop-instance, stop-cluster.|
|asadmin get-health clustered-instance or cluster||M4||startDate||3-7 days||Joe||GMS-06||note: only works for a gms-enabled cluster. Leverage the existing GMS get-member-status method; most of the time will be spent implementing the Admin CLI command side.|
|Developer unit test for Distributed State Cache||M3||CANCELLED||5 day task||Joe Fialli||featureId||source of NPE and hangs in v2.1, IIOP will use DSC as it did in v2.1 to get IIOP address of other members in cluster|
|Disable Distributed State Cache for GF v3.1||M3||1 day task||Joe Fialli||featureId||more efficient for IIOP to just read IIOP port info directly from domain.xml. Transactions no longer use fencing, so add a property to disable DSC completely.|
|Performance Tuning: Tune Grizzly NetworkManager default settings||M4||StartDate||EndDate||Joe Fialli||featureId||tune default values|
|Place ALL SEVERE, WARNING, INFO log messages into logstrings.properties. Must have event id (GMS-050)||M5||startDate||endDate||Bobby||GMS-11||status/comments|
|Add Diagnostic messages for SEVERE and WARNING||M5||startDate||endDate||Joe||GMS-11||Minimally required for GMS properties that, when mis-set, cause an instance to not come up (e.g. BIND_INTERFACE_ADDRESS)|
Old URL (read-only): http://wiki.glassfish.java.net/Wiki.jsp?page=GlassFishv3.1GMS
New Page: https://wikis.sun.com/display/glassfish/GlassFishv3.1GMS