From 1fc0ab0e98484ad833bd5ce06da024fe964e2a18 Mon Sep 17 00:00:00 2001
From: Sayali Bhavsar <sayalibhavsar9009@gmail.com>
Date: Mon, 15 Jun 2026 15:21:55 +0530
Subject: [PATCH] docs: rewrite README to match wrapper documentation standard

Closes #38
---
 README.md | 243 ++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 220 insertions(+), 23 deletions(-)
diff --git a/README.md b/README.md
index ddb9673..259f3e9 100755
--- a/README.md
+++ b/README.md
@@ -1,36 +1,233 @@
-Automation wrapper for linpack
+# Intel LINPACK Benchmark Wrapper
 
-Description:
-         The Linpack Benchmark is a measure of a computer's floating-point rate of execution.
-         It is determined by running a computer program that solves a dense system of linear
-         equations.
+## Description
+
+This wrapper facilitates the automated execution of the Intel LINPACK benchmark. LINPACK is a standard measure of a computer's floating-point rate of execution, determined by solving a dense system of linear equations. Performance is reported in GFLOPS (billions of floating-point operations per second).
+
+The wrapper provides:
+- Automated LINPACK execution with CPU topology-aware configuration.
+- Automatic detection of sockets, cores, and hyperthreading.
+- Per-socket and multi-socket thread scaling.
+- NUMA interleave control via numactl.
+- Result collection, processing, and verification.
+- CSV and JSON output formats.
+- System configuration metadata capture.
+- Integration with test_tools framework.
+- Optional Performance Co-Pilot (PCP) integration.
+
+## Command-Line Options
+
+```
+Linpack Options:
+  --interleave <value>: numactl interleave option. Defaults to "all".
+
+General test_tools options:
+  --home_parent <value>: Parent home directory. If not set, defaults to current working directory.
+  --host_config <value>: Host configuration name, defaults to current hostname.
+  --iterations <value>: Number of times to run the test, defaults to 1.
+  --run_user: User that is actually running the test on the test system. Defaults to current user.
+  --sys_type: Type of system working with (aws, azure, hostname). Defaults to hostname.
+  --sysname: Name of the system running, used in determining config files. Defaults to hostname.
+  --tuned_setting: Used in naming the results directory. For RHEL, defaults to current active tuned profile.
+      For non-RHEL systems, defaults to 'none'.
+  --use_pcp: Enable Performance Co-Pilot monitoring during test execution.
+  --tools_git <value>: Git repo to retrieve the required tools from.
+      Default: https://github.com/redhat-performance/test_tools-wrappers
+  --usage: Display this usage message.
+```
+
+## What the Script Does
+
+The `linpack_run` script performs the following workflow:
+
+1. **Environment Setup**:
+   - Clones the test_tools-wrappers repository if not present (default: ~/test_tools).
+   - Sources error codes and general setup utilities.
+
+2. **Package Installation**:
+   - Installs required dependencies via package_tool.
+   - Dependencies are defined in linpack.json for different OS variants (RHEL, Ubuntu, SLES, Amazon Linux).
+
+3. **Hardware Detection**:
+   - Detects CPU count, cores per socket, threads per core, and number of sockets.
+   - Identifies hyperthreading configuration and socket-to-CPU mappings.
+   - Determines NUMA topology for memory interleaving.
+
+4. **LINPACK Binary Setup**:
+   - Copies the pre-built `xlinpack_xeon64` binary from the `uploads/` directory.
+   - Uses a `linpack.dat` configuration file to define problem size (default: N=20000).
+
+5. **Test Execution**:
+   - Runs LINPACK across socket configurations, scaling from single-socket to all sockets.
+   - Handles both hyperthreaded and non-hyperthreaded configurations separately.
+   - Uses `numactl --interleave` for memory placement control.
+   - Sets `OMP_NUM_THREADS` and `GOMP_CPU_AFFINITY` for thread binding.
+   - Executes for the specified number of iterations per configuration.
+
+6. **Data Collection**:
+   - Captures system configuration (CPU, memory, NUMA topology, kernel version).
+   - Records LINPACK configuration parameters and thread/socket settings.
+   - Logs timestamps for test runs.
+   - Optionally records PCP performance data.
+
+7. **Result Processing**:
+   - Extracts performance metrics (GFLOPS) from LINPACK output.
+   - Generates CSV files with configuration and performance data.
+   - Creates JSON output for verification.
+   - Validates results against Pydantic schema.
+
+8. **Verification**:
+   - Validates results against Pydantic schema (results_schema.py).
+   - Ensures all required fields are present and valid.
+   - Uses csv_to_json and verify_results from test_tools.
+
+9. **Output**:
+   - Saves all raw output files, processed CSV/JSON, and system metadata.
+   - Optionally saves PCP performance data.
+   - Archives results to configured storage location.
+
+## Dependencies
+
+Location of underlying workload: Requires the licensed Intel LINPACK binary (`xlinpack_xeon64`). The binary must be placed in the `uploads/` directory as `xlinpack_xeon64-NEW`.
 
 Location of useful documentation: https://www.netlib.org/utk/people/JackDongarra/faq-linpack.html
-Location of underlying workload: Requires the licensed linpack kit.
 
-Packages required: bc,numactl
+**Packages required**:
+- RHEL: bc, numactl, perf, unzip, git, zip
+- Ubuntu: unzip, zip
+- Amazon Linux: bc, unzip, zip
+- SLES: bc, libnuma1, git, unzip, zip
 
 To run:
+```bash
+git clone https://github.com/redhat-performance/linpack-wrapper
+cd linpack-wrapper/linpack
+./linpack_run
+```
+
+## The LINPACK Benchmark
+
+Intel LINPACK solves a dense system of linear equations:
+
+**Ax = b**
+
+Where:
+- **A** is an N x N matrix of double-precision floating-point numbers
+- **x** and **b** are vectors of length N
+- The benchmark measures the time to solve for x using LU factorization
+
+### Key Parameters (linpack.dat)
+
+1. **Problem Size (N)**: The dimension of the matrix. Default is 20000. Larger values use more memory and take longer but can achieve higher performance.
+
+2. **Leading Dimension**: Slightly larger than N (default: 20016) to avoid cache conflicts during matrix operations.
+
+3. **Alignment**: Memory alignment in KBytes (default: 4) for optimal cache line usage.
+
+### Thread and Socket Scaling
+
+The wrapper automatically runs LINPACK across multiple configurations:
+
+- **Non-hyperthreaded systems**: Runs per-socket, then progressively adds sockets.
+- **Hyperthreaded systems**: Runs with non-hyperthread cores first, then with hyperthread pairs, scaling across sockets.
+
+Thread affinity is controlled via `GOMP_CPU_AFFINITY` to pin threads to specific cores.
+
+### Performance Metric
+
+LINPACK reports performance in **GFLOPS** (billions of floating-point operations per second). Higher values indicate better floating-point computational capability.
+
+## Output Files
+
+The results directory (`linpack_results/`) contains:
+
+- **results_linpack.csv**: CSV file with LINPACK configuration and performance metrics per socket/thread configuration.
+- **results_linpack.json**: JSON output generated from the CSV for verification.
+- **linpack.out.\***: Raw output files from individual LINPACK runs, including CPU affinity and GFLOPS results.
+- **test_results_report**: Overall test pass/fail status.
+- **hw_info.out**: System hardware metadata (CPU info, memory, NUMA topology, kernel version).
+- **PCP data** (if --use_pcp option used): Performance Co-Pilot monitoring data.
+
+## Examples
+
+### Basic run with defaults
+```bash
+./linpack_run
 ```
-[root@hawkeye ~]# git clone https://github.com/redhat-performance/linpack-wrapper
-[root@hawkeye ~]# linpack-wrapper/linpack/linpack_run
+This runs with:
+- numactl interleave across all NUMA nodes
+- 1 iteration
+- Automatic socket/thread scaling based on detected hardware
+
+### Run with specific NUMA interleave
+```bash
+./linpack_run --interleave 0,1
 ```
+Interleaves memory across NUMA nodes 0 and 1 only.
 
+### Run multiple iterations
+```bash
+./linpack_run --iterations 3
+```
+Runs the benchmark 3 times per thread/socket configuration.
 
+### Run with PCP monitoring
+```bash
+./linpack_run --use_pcp
 ```
-Options
-linpack Usage:
-  --interleave: numactl interleave option
+Collects Performance Co-Pilot data during the run.
 
-General options
-  --home_parent <value>: Our parent home directory.  If not set, defaults to current working directory.
-  --host_config <value>: default is the current host name.
-  --iterations <value>: Number of times to run the test, defaults to 1.
-  --run_user: user that is actually running the test on the test system. Defaults to user running wrapper.
-  --sys_type: Type of system working with, aws, azure, hostname.  Defaults to hostname.
-  --sysname: name of the system running, used in determining config files.  Defaults to hostname.
-  --tuned_setting: used in naming the tar file, default for RHEL is the current active tuned.  For non
-    RHEL systems, default is none.
-  --usage: this usage message.
-  --use_pcp: Enables use of Performance Co-Pilot in wrappers, defaults to 0.
+### Combination example
+```bash
+./linpack_run --iterations 5 --interleave all --use_pcp
 ```
+Runs 5 iterations per configuration with NUMA interleave across all nodes and PCP monitoring.
+
+## Result Schema
+
+The Pydantic validation schema (`results_schema.py`) defines the following fields:
+
+| Field | Type | Description |
+|-------|------|-------------|
+| ht_config | str | Hyperthreading configuration (e.g., hyper_no, ht_yes_1_socket_nh) |
+| sockets | int (>0) | Number of sockets used |
+| threads | int (>0) | Number of threads used |
+| unit | str | Performance unit (GFLOPS) |
+| MB_per_sec | int (>0) | Throughput in the reported unit |
+| cpu_affin | str | CPU affinity string used for the run |
+| Start_Date | datetime | Test start timestamp |
+| End_Date | datetime | Test end timestamp |
+
+## Return Codes
+
+The script uses standardized error codes from test_tools error_codes:
+- **0**: Success
+- **101**: Git clone failure
+- **E_GENERAL**: General execution errors (binary not found, test execution failures, validation failures).
+- **E_USAGE**: Invalid usage/arguments
+- **E_PARSE_ARGS**: Argument parsing errors
+
+## Notes
+
+### Licensed Binary
+LINPACK requires a pre-built Intel binary (`xlinpack_xeon64`). This binary is not included in the repository and must be obtained separately and placed as `uploads/xlinpack_xeon64-NEW` relative to the working directory.
+
+### Architecture Support
+This wrapper targets x86_64 Intel Xeon systems. The binary name (`xlinpack_xeon64`) reflects this architecture requirement.
+
+### NUMA Interleave
+The `--interleave` option controls how memory is distributed across NUMA nodes via `numactl --interleave`. The default value of "all" distributes memory round-robin across all available NUMA nodes, which is generally optimal for LINPACK's memory access patterns.
+
+### Hyperthreading Behavior
+The wrapper automatically detects hyperthreading and tests multiple configurations:
+- With HT: Tests non-HT cores only, then HT pairs, across increasing socket counts.
+- Without HT: Tests all cores per socket, then scales across sockets.
+
+This provides a comprehensive view of performance across different threading configurations.
+
+### Performance Tips
+- Run multiple iterations to verify consistency.
+- Ensure system is idle (no other workloads) for best results.
+- Disable CPU frequency scaling (use performance governor) for reproducible results.
+- Consider the active tuned profile on RHEL systems.
+- Custom `linpack.dat` files can be placed per-host using the host_config mechanism.