Announcement
January 15, 2021 – The Solicitation due date is extended two calendar days to January 21, 2021, 4pm ET.
Responding to climate change requires understanding, adaptation, and mitigation to achieve the transition to a low-carbon society and global sustainability objectives. Our common research goal is to develop, test, and apply state-of-the-science computer-based global climate simulation models, built on a strong scientific foundation while leveraging leading-edge high performance computing and information technologies. The objective is to dramatically increase the skill, resolution, complexity, and throughput of computer model-based projections of climate variability and change, enabling sound decision-making on issues of national importance. As part of this overall strategic partnership, ORNL provides dedicated leadership-class high-performance computing and data capabilities that enable climate prediction and projection at the modeled resolutions and levels of complexity essential for producing accurate regional climate variability and change.
ORNL seeks proposals associated with the acquisition, delivery, installation, integration, and operation of high performance computing (HPC) resources that can fulfill the global and regional numerical climate modeling requirements of the National Oceanic and Atmospheric Administration (NOAA). All computing resources associated with this procurement are collectively referenced as NOAA-C5.
ORNL will allow Q&A, in writing, through the Procurement Office, while the Request for Proposals (RFP) remains active. Interested Offerors must submit all questions, comments, or other communication regarding the NOAA-C5 RFP, including the benchmarks, via email to the UT-Battelle, LLC (UT-Battelle) Procurement Officer, Georgia Stone, at [email protected].
Offerors are advised to monitor this website for NOAA-C5 RFP amendments and other NOAA-C5 RFP information updates. ORNL may notify Offerors of updated NOAA-C5 RFP information via email; however, ORNL is under no obligation to do so.
Most Recent Q&A
Last updated: January 15, 2021
[Q1] In CM4, the existing code

call mpp_send(msgsize, plen = 1, to_pe = send_pelist(p), tag=COMM_TAG_2)

maps the mpp_send interface to MPI_Isend, a non-blocking send. The application launches several outstanding non-blocking send operations to different peers but reuses the same send buffer (the msgsize Fortran variable) for all operations without waiting for completion. The size of the send is 4 bytes, so in most cases it proceeds through the eager (bcopy) protocol (i.e., the MPI library copies the input to internal buffers), and the application continues successfully. However, when the number of outstanding requests exceeds internal library resources, the requests may become pending; in that case the same buffer is modified by the next mpp_send call, causing an error. Please advise.
[A1] One suggested solution to this specific issue, which eliminates the reliance on eager-protocol buffering, is the following diff (the hunk numbers refer to line numbers in the benchmark source file). The same change can eliminate similar issues in CM4, ESM, and SPEAR; a standalone illustration of the corrected pattern follows the diff.
581c581
< integer, allocatable, dimension(:) :: send_count, recv_count, recv_size2
---
> integer, allocatable, dimension(:) :: send_count, recv_count, recv_size2, send_arr
586c586
<
---
>
1140a1141,1142
>
>
1142a1145
> if (nsend_update > 0) allocate(send_arr(nsend_update))
1152c1155,1156
< call mpp_send(msgsize, plen = 1, to_pe = send_pelist(p), tag=COMM_TAG_2)
---
> send_arr(p) = msgsize
> call mpp_send(send_arr(p), plen = 1, to_pe = send_pelist(p), tag=COMM_TAG_2)
1254c1258
<
---
> if (allocated(send_arr)) deallocate(send_arr)
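For reference, here is a minimal, self-contained Fortran sketch of the pattern the diff establishes: each outstanding non-blocking send owns its own buffer element, and the buffer is neither reused nor deallocated until the requests complete. This program is not taken from the benchmark source; the program name, communication pattern, and tag value are illustrative.

! Minimal sketch: why each outstanding MPI_Isend needs its own buffer
! element (the send_arr fix above). All names here are illustrative.
program isend_buffer_demo
  use mpi
  implicit none
  integer :: ierr, rank, nprocs, p
  integer, allocatable :: send_arr(:), recv_arr(:)
  integer, allocatable :: sreq(:), rreq(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  allocate(send_arr(nprocs), recv_arr(nprocs), sreq(nprocs), rreq(nprocs))

  ! Post a receive from every rank (including self).
  do p = 1, nprocs
    call MPI_Irecv(recv_arr(p), 1, MPI_INTEGER, p-1, 2, &
                   MPI_COMM_WORLD, rreq(p), ierr)
  end do

  ! One buffer element per outstanding send: the buffer handed to
  ! MPI_Isend must not be modified until its request completes.
  do p = 1, nprocs
    send_arr(p) = rank
    call MPI_Isend(send_arr(p), 1, MPI_INTEGER, p-1, 2, &
                   MPI_COMM_WORLD, sreq(p), ierr)
  end do

  call MPI_Waitall(nprocs, sreq, MPI_STATUSES_IGNORE, ierr)
  call MPI_Waitall(nprocs, rreq, MPI_STATUSES_IGNORE, ierr)

  ! Only after MPI_Waitall is it safe to reuse or free send_arr.
  deallocate(send_arr, recv_arr, sreq, rreq)
  call MPI_Finalize(ierr)
end program isend_buffer_demo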
[Q2] In ESM4, the application needs to specify rank counts for the "atm" and "ocean" models. Each of them has its own decomposition/inputs for any rank count.
[A2] The atm and ocean components run in parallel, so the shortest runtime requires finding the balance across the offered hardware and PE solution that optimizes the runtime of both. The Offeror should initially execute ESM4 using the current/default decomposition/inputs for the rank counts of both the atm and ocean models to establish an improvement over the baseline (NOAA C4's Intel Broadwell 18-core) hardware configuration, and document that result as described in Table 4-2 of the Statement of Work. The stdout for each of these runs provides the component runtimes for atmosphere, land, ice, and ocean, so the Offeror may assess how well the initial configuration is balanced.
The Offeror may then adjust the horizontal domain decomposition to further optimize or balance the problem as they see fit. The intent of these adjustments is to tune the ranks/threads for each component so that the component runtimes balance, avoiding overall load imbalance. The Offeror may experiment with different task and thread layouts to find the optimal configuration. While the Offeror may change the horizontal decomposition at will, the number of vertical levels and the vertical decomposition must not be altered. Note that, in general, if a given component's runtime is dominated by parallel overheads, increasing the tasks and/or threads for that component will not improve overall runtime. Further, increasing the parallelism for each component in isolation may help determine which change gives the greatest reduction in overall runtime for a given increase in parallelism.
While the calculation of the Figure of Merit V uses the main loop time (the longest of these components), Company will also evaluate the independent component runtimes of both the baseline and optimized cases. Please ensure that both the baseline and optimized cases are reported in the format shown in Table 4-2, and that the stdout for both baseline and optimized runs is included in your response. Calculation of V is expected to use the optimized result.
Demonstrating the impact of hardware features such as multi-threading is encouraged. The number of threads can be changed by altering coupler_nml::atmos_nthreads in the input.nml for the specific runs. As provided, the benchmarks take advantage of an internal framework for process pinning, rank affinity, and similar features that interoperate with the Slurm resource manager. If the Offeror is using Slurm and wants to take advantage of this internal framework, the namelist parameter coupler_nml::use_hyper_thread may be set to .TRUE. If the Offeror prefers to accomplish this via external tools or command-line options, the internal framework can be disabled by adding a namelist section to the input.nml (see the sketch below):

&fms_affinity_nml
  affinity = .false.
/
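For illustration, a hypothetical input.nml fragment showing where the options mentioned above live. The values are placeholders, not recommendations; use_hyper_thread applies when relying on the internal framework under Slurm, while affinity = .false. disables that framework in favor of external pinning tools, so the two are alternatives rather than a combined recipe.

&coupler_nml
  atmos_nthreads   = 2        ! placeholder thread count per atm rank
  use_hyper_thread = .TRUE.   ! only when using the internal framework with Slurm
/

&fms_affinity_nml
  affinity = .false.          ! only when pinning via external tools instead
/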
[Q3] With SHiELD, we observe that validation may pass or fail depending on the decomposition used for layout_x, layout_y, and io_layout.
[A3] The only validation requirement for these tests is that the answer is bit-wise reproducible given the same parallel decomposition, the same binary executable, and the same inputs. There is no specific capability for verifying correctness if the compiler flags, inputs, or decomposition are changed. To ensure accuracy of the result, the Offeror should ensure that the baseline problem validates successfully, and note this in their response.
Beyond the validation of the baseline problem: if the Offeror changes the code, the Offeror should demonstrate that the same code base, with the same decomposition and compiler flags, still matches the baseline result. If the Offeror uses different compiler options and/or adjusts the horizontal decomposition, changes to the floating-point numerics may produce a different result and, consequently, a validation failure using the provided script. In that case, the Offeror may simply delete the cksums and INITIAL files in the ${BASEDIR}/SHiELD/VERIFICATION/small/ directory, and then rerun that case to generate a "new" result in the VERIFICATION directory (see the sketch below). The Offeror should then repeat the same run, with successful verification.
The SHiELD benchmark is an atmosphere-only production sample at 3 km average grid spacing, using 3072x3072x79(L) degrees of freedom per cubed-sphere tile. The Offeror may assess different parallel decompositions in the horizontal dimensions. Generally, the application performs better when the size of each decomposed domain is similar in the two dimensions. MPI overheads are not expected to play a significant role in this benchmark.
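A minimal command-line sketch of the re-baselining procedure above; the run-script name is an assumption, and the exact verification file names may differ from the patterns shown:

# discard the old reference results for the small case
cd ${BASEDIR}/SHiELD/VERIFICATION/small/
rm -f cksums* INITIAL*
# rerun the small case once to write a new reference, then repeat the
# identical run; the second run should now verify successfully
cd ${BASEDIR}/SHiELD
./run_small.sh     # hypothetical run-script name
./run_small.sh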
[Q4] From SOW v.1.0.2, Table 4-1 (Benchmark Baselines and Weighting) shows TBD for the Baseline Times for the new SHiELD benchmark.
[A4] The baseline times for the small SHiELD case are: main loop time 10,325 seconds; total runtime 10,510 seconds; on 6,144 cores. This is corrected in SOW v.1.0.3 (Amendment 1).
[Q5] For ESM4, please identify the parameters in the input file(s) that relate to modification/configuration of the horizontal domain decomposition.
[A5] ESM4 contains multiple inter-related dependencies that can be quite nuanced. Namelist options that control the horizontal (x and y) domain decomposition may also have dependencies with related namelist options. Further, if any of the components use MPI rank masking, modifications to the domain decomposition become still more nuanced. Offerors may adjust namelist options, but should proceed cautiously, as there is no expedient answer to this question. Given that the baseline case provided is balanced for the current C4 system, Offerors may choose to report the baseline without devoting significant effort to additional decomposition changes.
[Q6] For SPEAR, is it permitted to modify the input.nml parameter "ocean_npes", currently configured to a value of 180?
[A6] The ocean component of SPEAR does not use MPI-rank masking, which makes this significantly easier to accomplish. The baseline case is structured as follows: SPEAR runs 4 ensembles concurrently, each using 108 PEs, for a total of 4*108 = 432 PEs. This is described in INPUT/MOM_layout as:

#override LAYOUT = 12,9

where the product of the two numbers (here 12*9 = 108) is the number of PEs per ensemble. As a specific example, the Offeror could adjust the number of PEs per ensemble: a target of 45 PEs per ensemble (4 ensembles, 180 PEs total) would require changing INPUT/MOM_layout to, for example:

#override LAYOUT = 5,9

and changing the namelist variable "ocean_npes" under the namelist "coupler_nml" to "45" (see the combined sketch below). The Offeror should evaluate the impact on the main loop time, as changes to the ocean component may generate or exacerbate an imbalance.
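Putting the two edits from A6 together, a sketch of the changed files for the 45-PEs-per-ensemble example (surrounding entries in each file omitted):

In INPUT/MOM_layout:
#override LAYOUT = 5,9

In input.nml:
&coupler_nml
  ocean_npes = 45   ! must equal the LAYOUT product (5*9) per ensemble
/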
[Q7] In ESM4, in src/esm4.1_libs_compile/GFDL_atmos_cubed_sphere/driver/GFDL/atmosphere.F90, the specific error reported is that the extent of dimension 4 of array QTEND is 4 while the corresponding extent of array ATM is 88. What remedy exists that can eliminate this condition?
[A7] The Offeror may eliminate this LHS/RHS shape-conformance error raised by a specific compiler by modifying the source code in src/esm4.1_libs_compile/GFDL_atmos_cubed_sphere/driver/GFDL/atmosphere.F90 from

if ( id_tdt_dyn>0 .or. query_cmip_diag_id(ID_tnta) ) ttend(:, :, : ) = Atm(mygrid)%pt(isc:iec, jsc:jec, : )
if ( any((/ id_qdt_dyn, id_qldt_dyn, id_qidt_dyn, id_qadt_dyn /) > 0) .or. &
     query_cmip_diag_id(ID_tnhusa) ) qtend(:, :, :, : ) = Atm(mygrid)%q (isc:iec, jsc:jec, :, : )

to

if ( id_tdt_dyn>0 .or. query_cmip_diag_id(ID_tnta) ) ttend(:, :, : ) = Atm(mygrid)%pt(isc:iec, jsc:jec, : )

[Q8] We have proposal data that will be over the 25MB email limit. Do you have a method for receiving a file over the 25MB email limit?
[A8] An Offeror may still choose to use email; the limit on the size of an individual email is 25MB for the ORNL email system. Refer to the instructions in the Solicitation and Offer for further details. If the Offeror intends to use the File Upload Service, please upload a sample document no later than 4pm Eastern Time on January 15, 2021, and ORNL will verify receipt.
[Q9] Section 4.3, Table 4-1 describes individual weights that are associated with the five benchmarks.
[A9] Table 4-2 indicates that the ESM Large benchmark does not contribute to the calculation of performance figure V and is (effectively) optional. However, as the Offeror notes, Table 4-1 (through Amendment 3) incorrectly stipulated that ESM Large would contribute 20% of the improvement above the baseline to the performance figure V. Laboratory has revised Table 4-1 to reflect the contribution of each benchmark to the overall improvement. This increases the contributions from CM4, ESM Small, SHiELD, and SPEAR; UFS remains 5%. Please reference Attachment 1 version 1.0.4 for the revised weights. As this question was received very late in the solicitation process, Laboratory will extend the due date two calendar days, to January 21, 2021, to allow Offerors to revisit the changes to the calculation of V.
RFP Documents
(1) The Subcontract Administrator/Procurement Officer Contact is changed to: Georgia Stone.
(2) Section J is revised to extend the date for Questions to January 12, 2021, 4pm ET.
Benchmarks
ftp://ftp.gfdl.noaa.gov/perm/GFDL_pubrelease/Benchmarks/SHiELD_6144
Deprecated Items
Attachment 1 – NOAA-C5 Statement of Work
Anticipated Structure
The information in this section is provided for consideration only, prior to release of the planned RFP. The final proposal instructions will be included in the RFP when it is published on this website. An award resulting from the planned RFP may be made to the responsible Offeror who submits a proposal that is determined to provide the best value to the Laboratory, considering both technical merit and price and the trade-offs between them (e.g., the value of selecting a higher-priced proposal against the technical merit of that proposal).
Laboratory reserves the right to: (1) make a selection on the basis of an initial proposal; (2) negotiate with any or all Offerors for any reason; (3) award a subcontract based on all or part of an Offeror's proposal, including any options contained in the proposal; (4) reject any or all proposals and make no award; (5) waive any minor irregularities in any proposal; (6) cancel the request for proposal (RFP) at any time prior to award without cost to the Laboratory or the Government.
The total price of the base award, including options, shall not exceed $24,500,000. An Offeror may submit more than one proposal; however, each submission must stand alone as an independent response, meeting all proposal criteria. The RFP documents and attachments will be published on this website.
Procurement Schedule
Prior to the release of the RFP, ORNL will provide a representative set of benchmarks that will form the performance guarantee for an Offeror's proposal. The early release of the benchmarks provides Offerors with more time to assess the applications, the data sets, the correctness criteria, and how these applications may perform on candidate systems. The benchmarks are hosted and provided by NOAA.
Prior to the release of the RFP, ORNL will also provide a separate DRAFT Statement of Work giving the technical specifications anticipated for the NOAA-C5 system. This Statement of Work will describe Program Background; High Level Design Objectives; Benchmarks; Compute, I/O, Interconnect, Network, and OS; Maintenance and Support; Facilities; Program Management; and Acceptance Test Requirements. Offerors are encouraged to read the DRAFT SOW carefully and to ask clarifying questions that can strengthen the document or resolve any initial inconsistencies.
ORNL will use a competitive RFP selection process. The release of the RFP begins a quiet period for the procurement, with all communication between a potential Offeror and ORNL required to go through ORNL Procurement. The official Solicitation and Offer will include the current/revised SOW (subject to any necessary modifications), the Proposal Preparation Instructions, the Price Schedule, and other supporting documents. The RFP is available only via ORNL-maintained web services and requires authorization and authentication for access.
Procurement Officer Contact
Georgia Stone, Procurement Officer, UT-Battelle
Email: [email protected]
Phone: (865)341-0638
1. Download the ESM4 benchmark package.
2. Unzip and untar that package.
3. Grab the existing errata: ftp://ftp.gfdl.noaa.gov/perm/GFDL_pubrelease/Benchmarks/ESM4_errata.tar
4. Untar that small tarball.
5. Follow the instructions in ESM4/ERRATA/README.errata for patching ESM4 (see the sketch below).
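A minimal command-line sketch of steps 2 through 5; the ESM4 package archive name is an assumption:

tar -xzf ESM4_package.tar.gz    # step 2: unzip and untar (hypothetical archive name)
wget ftp://ftp.gfdl.noaa.gov/perm/GFDL_pubrelease/Benchmarks/ESM4_errata.tar   # step 3
tar -xf ESM4_errata.tar         # step 4
cat ESM4/ERRATA/README.errata   # step 5: then apply the patch steps it describes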
Comments and questions on the DRAFT are due no later than Friday, October 2, 2020, 5:00pm Pacific Time.