[NMC PRObE] PRObE Update & Call for Kodiak destructive testing proposals

Andree Jacobson andree at newmexicoconsortium.org
Tue Jun 2 07:54:48 MDT 2015


Greetings PRObE users.

  Summer is here and work on the next large PRObE cluster, Nanek, is well
under way. However, Kodiak's power distribution and serial console
aggregation units, popularly referred to as Iceboxes, are failing at
a rapid rate (perhaps not surprising after 8+ years of continuous
operation). A few months back, we brought Kodiak down to 500 nodes because
we had 50 Iceboxes left. Unfortunately, many of these have now also failed.
To keep Kodiak stable, we are forced to cut another 300 nodes from
the pool. This will leave just one row of 200 Kodiak nodes that we can keep
up until Nanek is ready to come online.

  With this announcement, we also invite you to submit proposals for
destructive testing of the remaining 200 nodes of Kodiak. Send your
suggestions on how to put Kodiak nodes under fatal pressure, and how you
propose to instrument and analyse the resulting failures, to
probe at newmexicoconsortium.org by June 30th, 2015.

  We thank you for your patience during this rebuilding phase and remind
you that Nome and Susitna are fully operational. Nome has 256x 16-core
nodes, and Susitna has 32x 64-core nodes (and K20 GPUs, SSDs, a high-speed
interconnect, etc.).

  The PRObE team wishes you a great summer and we look forward to hearing
from you with your proposals.

/Andree

*Andree Jacobson*
Chief Information Officer
PRObE Project Manager

New Mexico Consortium
4200 W Jemez Rd, Ste. 200
Los Alamos, NM 87544

*Phone:* (505) 412-4180, *Fax:* (505) 212-0049
*Email:* andree at newmexicoconsortium.org