Testing: Performance and analysis

Testing: Performance and analysis

Objectives
Test Environment
Portal Infrastructure Baseline
Tuning Guide

User scenarios
Think time
Cookies and sessions
Metrics
Repeatability principle

Saturation
Bottlenecks

Note on ramp rates
Priming the Portal

Analysis Techniques
Common Problems

JVM heap utilization
Logging
Java class and variable synchronization
Database contention
LDAP responsiveness
Excessive session sizes
Exceptions being thrown
Dynacache concerns DRS replication modes
Dynacache eviction concerns
Operating system concerns
Customer portlets

Capacity Planning
Load testing
Extrapolating Results
Test With Full Cluster
Failover Testing
Ongoing Capacity Planning
Cost of Vertical Clustering

Objectives

Determine load level at which system fails
Finding bottlenecks that throttle throughput
Extrapolate future growth requirements

IBM defines a system as the complete end-to-end set of components required to deliver requested Web pages to a requesting user's browser.
Backend components such as databases and LDAP are generally maintained by separate organizations.

Test Environment

The performance test environment needs to be either the production environment itself or a mirror of production, ideally with the same hardware, topology, and back-end systems.
Testing is all about iterations and repeatability.
Isolate the portal system on a separate network to reduce variability in results, and to reduce stress on the corporate network itself. If an isolated network is not feasible, try to ensure that the components of the test are all collocated on the same subnet of a network router. WebSphere Portal best practice recommends using a gigabit Ethernet connection between the portal and its database. Optimally, this connection extends to the LDAP servers, the Web servers, and important back-end services.
Load generators should be on a LAN segment local to the Web server and/or the portal itself.
A common customer concern involves the load generators being on the same local LAN segment as the portal itself. The user experience, which initiates from outside the local LAN segment, is excluded. However, we are tuning and resolving issues with the portal and its surrounding components. Trying to tune the network between the users and the portals makes analysis needlessly complex. We therefore remove it from the test. There are better tools and processes for network tuning than the processes used here.

Portal Infrastructure Baseline

First conduct incremental sets of baseline tests on out-of-the-box portal. Then conduct tests after transfering the database, enabling security, and configuring front-end Web servers, firewalls and load balancers. Configure any external Security managers such s SiteMinder or WebSeal). Create a simple home page with a couple of portlets that do not access any back-end systems (for example, the World Clock portlet). Create a simple load testing script that accesses the unauthenticated home page and then logs in (authenticates) and idles without logging out. From this point, to add simulated users (Vusers) until the system is saturated. Using the bottleneck analysis techniques described below, find and fix any bottlenecks in the infrastructure. Note the performance baseline of this system.
Now, add to the system any customized themes and skins, and repeat the previous test. Find and fix any important bottlenecks in the revised system. Finally, as described below, add the actual portlets to be used on the home page and perform bottleneck analysis.
This type of baseline is effective in finding bottlenecks in the infrastructure that are independent of the application. Further, it can provide a reference when analyzing the extent to which the applications place additional load above and beyond the basic WebSphere Portal infrastructure.

Tuning Guide

Apply the recommendations outlined in the WebSphere Portal Tuning Guide to all systems before you embark on any performance testing.

Considerations

Select a load generator to produce simulated user requests for Web pages, and metrics such as response time and page views per second. Ideally the generator can aggregate data such as CPU utilization on the portal and HTTP servers, as well as mod_status data from the HTTP server.
Load generators for driving load include...

Mercury LoadRunner
Borland SilkPerformer
IBM Rational Performance Tester

The load generator should have enough virtual users to drive the system to saturation.
For authenticated access, generate sufficient unique test user IDs exist in the LDAP. The portal artifacts are generally cached on a per-user basis. If the load simulation uses the same user ID for all tests, performance appears artificially high because the artifacts do not need to be loaded from the LDAP directory and the database.

User scenarios

To tune the WebSphere Portal system to handle large numbers of users and to accurately predict its ability to handle specific numbers of users correctly, it is important to determine the most probable scenarios for users of the system. The test must then accurately simulate those user scenarios using the load generator. One effective way to do this step is to list the most likely use cases.
Write a script for each use case (for example, buyers vs. browsers) and assign set a percentage of users for each scenario a probability of likelihood that a percentage of (for example, 15% vs. 85%) the whole user population will execute that scenario.
A vUser represents an active channel, (TCP socket) over which requests are made and returned.

Think time

As think time is reduced, the number of requests per second increases, which in turn increases the load on the system. Reducing think time generally increases the average response time for WebSphere Portal login and page-to-page navigation.
A think time of 10 seconds can be set for experienced users. 30 seconds for inexperienced users.

Cookies and sessions

Users rarely logout via the logout button. Browsers sit idle until sessions time out. Session cleanup occurs via WAS session timeout. Sessions available memory in the JVM heap risking heap exhaustion.
As each individual simulation executes a particular use case, the use case should go idle as opposed to logging out. As the script cycles back around to log in a new user on this particular Vuser, the cookies for old session (typically JSESSIONID) and LTPA along with any application-specific cookies need to be cleaned up appropriately before logging in the next user using that script. This model also implies that sufficient test IDs need to exist so that a test ID can sit idle for the length of the WAS session timeout without risk of being reused until the previous session times out.

Metrics

Important metrics...

Page views per second (PVs).
Request response times, including login and page-to-page.

Most of the load generators already provide aggregate Page View per second (PVs) metrics.
At the conclusion of each test, a graph of Vusers ramp rate versus the three metrics is required for doing analysis.
Leverage system monitoring tool such as...

IBM Tivoli Composite Application Manager (ITCAM) for WebSphere
Computer Associates Wily IntroScope

Repeatability principle

For all runs of a particular scenario, the metrics produced converge to the same results if the runs are sufficiently long.

Saturation

Number of active Vusers at which point adding more Vusers does not result in an increase in the number of PVs.
To effectively drive a system to saturation, add Vusers a few at a time, let the system stabilize, observe whether PVs increase, and add more Vusers as possible. ("Stabilize," in this context, means that the response times are steady within a window of several minutes.) On LoadRunner, if you plot Vusers against throughput (PVs), the PVs initially rises linearly with the number of Vusers, then reaches a maximum and actually decreases slightly from that point. The saturation point is the number of Vusers at which the PVs is at maximum.

Bottlenecks

Generally the result of two issues...

Contention for shared resources. This contention can be the result of synchronized Java classes, methods, or data structures, contention for serial resources (for example SystemOut.log).
Excessive response times in back-end databases, remote IBM content management systems, or Web servers. Or the network itself. Routers and firewalls can impose congestion control or can be poorly tuned.

As load increases, contention for these resources increases, making contention locks easier to detect and correct. This detail is why effective load testing is a requirement for bottleneck analysis. Bottleneck analysis is iterative. Fix one bottleneck, go to the next. A single JVM is used to avoid having to resolv cross-JVM contention.
Note that response time optimization is generally more appropriately done in a non-loaded system and with tooling specific to the task (for example, JProbe).

Note on ramp rates

Ramp a small fixed number (for example, two Vusers per minute) for a set period of time (for example, five minutes). Then wait for a time to let the system stabilize (for example, five minutes) at which time loop back and add another batch of Vusers in the same fashion.
This technique gives the portal time to fill the various caches in an orderly fashion and provides for the ability to more accurately detect saturation points.

Priming the Portal

After a portal restart, a short script should be executed prior to the main test to preload the access control and anonymous page caches. Failure to do so can skew the initial response times inordinately.

Analysis Techniques

At saturation, to determine the cause, take a Java thread dump (kill -3) against the portal Java process under test. Look for threads that are blocked or in a wait state.

Common Problems

JVM heap utilization

Apply initial JVM tuning recommendations as outlined in the Portal Tuning Guide.
Enable verboseGC
Leave it enabled, even during production. The amount of log data is not large.
To force verboseGC log rolling...
-Xverbosegclog:{SERVER_LOG_ROOT}/verboseGC#.log,5,10000

To cause any Java object allocations greater than 1M to be recorded in native_stderr.log, go to...
Servers | Application Servers | WebSphere_Portal | Java and process management | Process definition | Java Virtual Machine | Custom properties

...and set...
ALLOCATION_THRESHOLD = 1000000

For Out of Memory errors that coincide with large object allocations when verboseGC.log shows plenty of available heap, fragmentation is the likely culprit. To fix, set -Xloratio0.1.
If verboseGC.log indicates a large number of mark stack overflows (MSOs), to override the default, and provide additional mark stack space, set -Xgcthreads.

Logging

Logging using direct writes to SystemOut.log or using a logging class such as log4j causes serialization between running threads and significantly degrades portal performance. In production portal systems, log only what is absolutely needed. When using log4j, log only errors; do not log warnings or informational messages. If logging is required for audit purposes, consider using a portal service or a different service running in a separate JVM.
Turn off all logging and remove all debug code that writes to files before doing performance testing.

Java class and variable synchronization

Use of method-level synchronization blocks where a method is in a monitor wait (MW) state with one method holding a lock can be problematic. In this case, you have Java code that is synchronized and is causing serialization in the system.
Use of synchronized class variables or synchronized HashMaps can also cause this problem.
In both cases (method or variable synchronization), the problem can be exacerbated by arbitrarily increasing the number of WAS transport threads in which the portal runs. By increasing the number of threads, you increase the probability of hitting portal code that is synchronized in this fashion, which ultimately serializes all the threads.

Database contention

If the thread dump indicates numerous threads waiting in JDBC classes in Socket.read() methods, then there are likely response time issues in the database itself.
At initial database transfer time, Portal sets up the databases with indexes that should be good initial starting points. The DBA should monitor the database and effect changes to remove bottlenecks in the system.
Common problems...

Queries taking excessive time due to table scans
Insufficient processor and memory resources on the DB server itself
Insufficient allowed connections as opposed to the configured JDBC pool sizes on Portal and WCM

When thread dumps indicate excessive JDBC wait times, take snapshots for long queries. Generally, Portal and WCM queries all execute in subseconds, if not in milliseconds. Look at the execution plans for long-running queries, and see if additional indexes might be required to improve response times on problematic queries.
When threads are waiting on JDBC pool resources in WAS, you see the threads in a condition wait (CW) state in the WAS connection pool (J2C) classes. In this case, we might need to increase the pool size for this data source. Note that in doing so, we might need to increase the number of connections that the database server can handle concurrently.

LDAP responsiveness

If several threads are in the Socket.read() method of the JNDI classes, they are likely waiting on results from the LDAP directory.

Excessive session sizes

If customer-written portlets are storing too much data in the session, that condition invariably leads to memory and performance issues.

Exceptions being thrown

Unchecked exceptions slow down JVM down and causes serial I/O to the SystemOut.log print stream, serializing the WAS transport threads.
Results on systems with flaws are considered non-repeatable.
WebSphere Portal should not be allowed to enter a high-load production environment with any errors in the logs.

Dynacache concerns DRS replication modes

WebSphere Portal requires that the WAS Dynamic Cache Service (dynacache) and cache replication be enabled. The default mode of replication, PUSH, can cause performance problems in the portal environment. WebSphere Portal V6.0.1.5 and V6.1 change the default for all Portal and WCM dynacaches to be NOT SHARED instead of PUSH.
The use of NOT SHARED is strongly recommended for the vast majority of WebSphere Portal configurations. Three actions are needed to ensure that each Portal cluster member is fully optimized for WebSphere Portal V6.0.1.4 and earlier. The first is to set the replication mode to NOT SHARED using the WAS console for each cluster member. The second is to install Portal PK64925. The third is to install WMM PK62457 and add the parameter cachesSharingPolicy with a value of NOT_SHARED to the LDAP section of the wmm.xml files on each node. We can check out further details here.
WCM dynacaches also should be set to NOT SHARED with the exception of the "menu" cache. To complete this task, in the dmgr console, navigate to...
Resources | Cache Instances | Object Cache Instances

...and change each of the individual cache instances to a mode of NOT SHARED. As of the time of this writing, there are 11 instances for WebSphere Content Manager.
Finally, there are WAS changes that can further, although marginally, reduce the amount of network traffic between cluster members due to replication events. For each cluster member (either WCM or portal), navigate to...
Servers | Application Servers | WebSphere_Portal | Java and process management | Process definition | Java Virtual Machine | Custom properties | New

...and define the following properties:
com.ibm.ws.cache.CacheConfig.filterLRUInvalidation=true
com.ibm.ws.cache.CacheConfig.filterTimeOutInvalidation=true
com.ibm.ws.cache.CacheConfig.cacheInvalidateEntryWindow=2
com.ibm.ws.cache.CacheConfig.cacheEntryWindow=2

Dynacache eviction concerns

Install the advanced dynacache monitor. If one or more of the caches seem to have large amounts of least recently used (LRU) evictions, the size of that cache might need to be increased. The sizes of the WebSphere Portal caches are mostly located in the WAS Resource Environment Provider WP_CacheManagerService. The size of WCM dynacaches is controlled from the dmgr console in the Object Caches section.

Operating system concerns

Under no circumstances should memory paging occur on an operating system hosting Portal or WCM. If it is, actions must be taken to alleviate this situation. Performance will immediately and dramatically degrade in the presence of paging.
Enable large page support on AIX and set the JVM property "-Xlp to dramatically improve memory utilization.
On AIX, consider setting the memory management option "lru_file_repage" to 0 to ensure that computational memory is prioritized over file I/O buffers. This setting ensures that in situations where physical memory becomes limited, AIX will not swap out the Java processes in favor of file I/O buffers.

Customer portlets

Common issues...

Use of synchronized class variables.
Excessive database calls. Consider using DB caching layers or dynacache to reduce the load on application databases or back-end services.
Unsynchronized use of HashMaps. There are timing scenarios in which these classes get into infinite loops if separate threads hit the same HashMap without being synchronized.

Capacity Planning

The goal of capacity planning is to estimate the total number of WebSphere Portal JVMs required that satisfy a certain user population within predetermined SLA metrics prior to entering production.
Typical metrics include:

Portal login response time (typically around four seconds)
Page-to-page response times after being already logged in (typically around two seconds)

Load testing

Run tests for saturation and failure of any of the SLA metrics.
If the test reaches saturation before any of the SLA metrics are exceeded and if it has already been determined that there are no bottlenecks that can or will be excised, then we can immediately calculate the number of nodes required.
If the SLA metrics are exceeded before reaching saturation, then you must analyze the failure to determine the next course of action. If you determine that you do not need to resolve the response time issues, then proceed directly to calculating the number of nodes, as discussed in the next section of this article.

Extrapolating Results

In general, if a single WebSphere Portal node can sustain n users within given SLA metrics, then 2 nodes can sustain 1.95 * n users. The accepted horizontal scaling factor for a portal is .95. Thus, if a single WebSphere Portal node can sustain n users within given SLA metrics, then m nodes can sustain

n (1 + .95 + .95² + .95³ + ... + .95^m)

Thus, the horizontal scaling factor is slightly less than linear.
This scaling factor assumes that the database capacity does not bottleneck the system. In fact, this scaling factor is primarily a metric of the degeneration of the WebSphere Portal database for logging in users.
Vertical cloning (scaling) is somewhat different. Vertical cloning is indicated when a single JVM saturates a node at a processor utilization around 80 percent or less. Note that in most cases, bottleneck analysis usually provides relief. In the absence of Java heap issues, a single JVM can usually be tuned to saturate a node at 85 to 90 percent processor utilization.

Test With Full Cluster

If sufficient load generation capacity exists (including test IDs), it is wise to do a final series of tests in which the whole user community is simulated against the full cluster to ensure viability of the entire system.

Failover Testing

If there is a system requirement for full performance during a failover, this scenario should also be scripted and tested.
Before running this scenario, review plugin-cfg.xml at the HTTP server to ensure that the cluster definitions are correct. Consider adding the parameter ServerIOTimeOut to the cluster members. This parameter augments the ConnectIOTimeout parameter, which is the amount of time before a cluster member is marked as down in the event that the remote server fails to open a socket connection upon request. The parameter is normally present in plugin-cfg.xml and defaults to 0, which means it relies on the operating system to return timeout status to the plug-in instead of the plug-in explicitly timing the connection itself.
The parameter ServerIOTimeout is, by default, not included in plugin-cfg.xml. This parameter sets a time-out on the actual HTTP requests. If the portal does not answer in the allotted time, the server is marked down. This step is useful because there are certain classes of failures whereby the portal cluster member accepts a socket open request, but the JVM has hung and will not respond to HTTP requests. Without ServerIOTimeout, the plug-in does not mark the cluster member as down; however, it is not able to handle requests. This situation results in requests being routed to a hung server.
During this test, start with the cluster fully operational. Enable Vusers in the simulation to the maximum number that the SLA mandates. Then, stop one or more cluster members. We can do this step gracefully by stopping the cluster members from the deployment manager or by simulating a network failure by removing the Ethernet cable from a cluster node. Many other failure modes might be worth investigating (for example, database failures, Web service failures, and so on). After the simulated cluster member outage, ensure that the surviving cluster members handle the remaining load according to the system requirements. Then, restart the offline cluster members to ensure that the load returns to a balanced state over time.

Ongoing Capacity Planning

If a system is already in production and is meeting its current SLA goals, you also want to plan for future growth in the number of users of the system. Assuming that the applications on the WebSphere Portal do not significantly change, we can derive the necessary measurements and calculations from a running production system. You need proper tooling, though, to take the measurements.
In short, if n JVM can support x users, then each JVM can support (x/n)^(1/.95) users.

Cost of Vertical Clustering

When additional cluster members are active on the same physical node, costs are associated with it. First, there is process context switching. The operating system must now manage additional processes (JVMs).
Second, there is more contention for processor resources. Generally, vertically clustering is always a bad choice if the number of active cluster members exceeds the number of processors in the node less one. You should never have three cluster members on a three-processor node, for example. Two cluster members on a three-processor node might be acceptable under certain conditions.
Apart from performance concerns, having additional cluster members might make sense strictly for reliability reasons. If a WebSphere Portal installation is on a single node, then in the event of a software failure that crashes one JVM (without crashing the operating system), we can mitigate the effect of the crash by adding vertical cluster members. The assumption is that most software failures are localized to a single JVM and do not affect the others on the same node. Therefore, the cluster continues serving requests while the failing JVM is restarted.
In a 32-bit operating system, process address spaces are limited to 4 gigabytes of memory. Most operating systems split this space as 2 gigabytes of user space and 2 gigabytes of kernel space. There are exceptions whereby the user space can be increased to ~3 gigabytes and the kernel reduced to 1 gigabyte (Solaris, AIXÂ, and MicrosoftÂ WindowsÂ 2003 Enterprise, for example).
If the address space available to the JVM is 2 gigabytes, then the JVM can allocate approximately a 1.5-gigabyte heap space.
There are cases when the combination of the WebSphere Portal base memory working set, along with the total memory required for all the portlets running during stress, could approach and exhaust the 1.5-gigabyte heap. When this happens, and if there is still a significant amount of processor resource available (20 to 30 percent or more), then vertical cloning could increase the total throughput of the box by effectively creating 3 gigabytes of JVM heap and dividing the workload evenly between the two 1.5-gigabyte heap JVMs.
If the WebSphere Portal application (and the portal itself) uses enough synchronized methods or class variables, we can, under load, end up with a high and frequent number of blocked threads in the application server. We can identify this situation by taking thread dumps under load and noticing that there are lots of Web container threads sitting in MW state waiting for these synchronized artifacts.
In this case, reducing the maximum number of Web container threads on a per-cluster-member basis reduces these stalls. If, after that change, the processor is not consumed as described previously, then vertical cloning can increase the aggregate throughput for the whole node.

Testing: Performance and analysis

Objectives

Test Environment

Portal Infrastructure Baseline

Tuning Guide

Considerations

User scenarios

Think time

Cookies and sessions

Metrics

Repeatability principle

Saturation

Bottlenecks

Note on ramp rates

Priming the Portal

Analysis Techniques

Common Problems

JVM heap utilization

Logging

Java class and variable synchronization

Database contention

LDAP responsiveness

Excessive session sizes

Exceptions being thrown

Dynacache concerns DRS replication modes

Dynacache eviction concerns

Operating system concerns

Customer portlets

Capacity Planning

Load testing

Extrapolating Results

Test With Full Cluster

Failover Testing

Ongoing Capacity Planning

Cost of Vertical Clustering