Revisiting the fan failures

…And it’s been over a year again.

In my defense, during that year we’ve been working pretty hard to get the Exactum Greenhouse to produce food, with more than moderate success. But this post is not about the Greenhouse or the chilies contained therein. This post is about the Nidec DR04XLG-12PUS1 dual fan units that have been failing systematically in the Helsinki Chamber.

For over three years, some of our servers have withstood the harsh winters and exceptionally sunny summers of our Nordic climate. But not all. Every server in a batch of HP DL360 G3s failed, and in every failure the root cause was the same: the CPU fan block.

Now that the defense of my Ph.D. thesis looms nearer, we are trying to finalize an article about the long-term effects of free air cooling. Generally speaking, no other components have exhibited systematic failures. Since our set of hardware is quite diverse, it seems that our experiment has been something of a success.

But for the article, we need to delve deeper, and try to explain what precisely goes wrong with the CPU fan blocks.

The outer part of the dual fan cut open for multimeter measurements. One of the units is working; the other two are dead.

As mentioned before, the Nidec dual fan model is designated DR04XLG-12PUS1. It consists of a three-blade “inner” unit designated R04G-12PS1 B, and a five-blade “outer” unit designated D04XL-12PU B. It is the outer unit which fails. Previously, I had thought that the outer unit still spins up and rotates, but this proved to be false. Of the failing CPU fan blocks, most of the outer fans fail completely and do not spin up. Occasional spasms may be seen, but that is all.

Luckily, there wasn’t just a single failure. Of the 5-6 CPU fan blocks, almost every one displayed the “1611-CPU Zone Fan Assembly Failure” POST error. This means that we have plenty of broken fans to experiment with, and that is precisely what we did.

With the help of Mikko “dogo” Rantanen, we cut open several of the broken fan units. We also cut open a fan unit that still worked, to get a point of comparison. Then we set out to determine what distinguishes a live unit from a dead one.

Our initial hypothesis was that the coil wires might be faulty: the insulation around the wires might have oxidized and caused a short circuit, preventing the motor from spinning up. To test this hypothesis, we measured the resistance across the circuit board’s connections to the coil wires.

The three solder points to the coil wires are marked A, B, and C.

The circuit board has three connection points to the coil wires. We designated these points as A, B, and C. In a working unit, their pairwise resistances are as follows:

  • A-B ~4,8 Ω
  • B-C ~2,4 Ω
  • A-C ~4,8 Ω

For a unit that refuses to spin up, the pairwise resistances are all higher, though not by a uniform factor:

  • A-B ~8,2 Ω
  • B-C ~16,2 Ω
  • A-C ~8,2 Ω

What does this mean? We don’t precisely know yet. What we know at this point is this:

  1. There seemed to be no short circuit, so our initial hypothesis was abandoned
  2. The soldering points seem to be working as they should, so it’s not a problem with the solder
  3. We can deduce if a unit has failed based on the resistance readouts
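Point 3 can be turned into a quick bench check. The sketch below is a minimal illustration, not our actual procedure: it takes the working unit’s readings above as the reference and flags a unit whose pairwise resistances deviate by more than a relative tolerance. The 25 % tolerance and the function name `looks_failed` are hypothetical choices, not measured limits.

```python
# Sketch: flag a dead outer fan from multimeter readings across A, B and C.
# Reference values are the working unit's readings quoted above (ohms).
HEALTHY_OHMS = {"A-B": 4.8, "B-C": 2.4, "A-C": 4.8}

# Hypothetical tolerance: a reading more than 25 % off its reference is abnormal.
TOLERANCE = 0.25


def looks_failed(readings: dict, tolerance: float = TOLERANCE) -> bool:
    """Return True if any pairwise resistance deviates from the healthy
    reference by more than `tolerance` (as a fraction of the reference)."""
    for pair, reference in HEALTHY_OHMS.items():
        deviation = abs(readings[pair] - reference) / reference
        if deviation > tolerance:
            return True
    return False


working_unit = {"A-B": 4.8, "B-C": 2.4, "A-C": 4.8}
dead_unit = {"A-B": 8.2, "B-C": 16.2, "A-C": 8.2}
print(looks_failed(working_unit))  # False
print(looks_failed(dead_unit))     # True
```

With only one working unit measured so far, the tolerance would need calibrating against more samples before relying on it.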

Perhaps we will find out more next week.

Posted in Helsinki Chamber

Greenhouse over Exactum

Today, we finally brought the plants into our greenhouse. I’ve intentionally kept this project in the dark until we had proper pictures, but that changes now:

Lassi placing the first plants.

On December 5th, 2012, we finished the construction of our experimental rooftop greenhouse. The idea is to harvest the exhaust heat from the computer servers we’ve been free cooling with outside air since 2010. In this, our project is similar to Dr. Paul Brenner’s Green Cloud supercomputing project, which uses server exhaust heat to warm a greenhouse at the South Bend Conservatory.

The main differences between South Bend and us are that

  1. we constructed the greenhouse from scratch in order to examine the viability of rooftop gardening
  2. the greenhouse is part of a larger green roof installation under construction for the World Design Capital Helsinki year 2012
  3. our climate is somewhat harsher, raising additional questions about the sustainability of the plants (see below)

Greenhouse surrounded by snow

Due to its prototype nature, our greenhouse is also much smaller than that of the South Bend Conservatory. The full size of our greenhouse is only 9,4 m². We heat our greenhouse using an experimental 26 U rack presented earlier in this blog.
I will present the greenhouse dimensions, heat characteristics, and plant selection in more detail in upcoming posts. Right now, we are looking for volunteers to help out with the project. If you are interested, don’t hesitate to contact Mikko Pervilä and/or Lassi Remes at the University of Helsinki.
Posted in greenhouse

Systematic fan failures in the HC

I confess to having had a bad case of writer’s block with this blog. Consequently, there have been no updates for the past seven months. I’m correcting this now.

The cause of the blog blackout has been a number of fan failures affecting a specific brand and model of server in the Helsinki Chamber (HC). As a scientist, I’ve tried to find conclusive proof of the cause of the faults, and having been unable to do so, I have been unwilling to publish partial results. My hope is that this blog post will reach a system administrator who has experienced similar failures and been able to deduce their root cause.

In our case, the failing servers are 1U HP DL360 G3 models. In the two years we’ve been running experiments with free air-side cooling using direct, unconditioned outside air, these servers have been the only ones to exhibit systematic failures. Such failures are also called common-mode failures (CMF) or common-cause failures (CCF). They are of particular interest to me, as this phenomenon is the very one that I originally set out to study with the direct free air cooling experiments.

Figure 1: HP DL360 G3 CPU fan block

Caveat emptor

We have been using the same HP DL360 G3 models in our regular data center for years, and they have been neither more nor less prone to failures than other server brands or models. I am not claiming that these models are flawed, although they do fail regularly when cooled with outside air. Similarly, I cannot claim that direct free air cooling is an unfeasible technique: the numerous other models we have used have not exhibited CMFs. What can be said is that, with reasonable probability, there exists a server fan type that is unsuitable for direct free air cooling, and a number of other server fan types that remain suitable.

Symptoms

The servers in question start to fail with the following warnings in their HP Integrated Management Log (IML).

Event: 20 Added: 06/14/2011 02:32
CRITICAL: Machine Environment - Fan Failure (Fan 1, Location CPU).
Event: 21 Added: 06/14/2011 02:32
CRITICAL: OS Class - Automatic Operating System Shutdown Initiated Due to Fan Failure.
Event: 22 Added: 06/13/2011 23:59
CAUTION: POST Messages - POST Error: 1611-CPU Zone Fan Assembly Failure Detected.
Event: 23 Added: 06/13/2011 23:59
CAUTION: POST Messages - POST Error: Fan Solution Not Sufficient.
Event: 24 Added: 06/14/2011 00:59

The errors are duplicated into syslog through the hpasmd daemon, if it is running:

Oct 23 09:00:15 lost25 hpasmd[747]: CRITICAL: hpasmd: Fan Failure (Fan 1, Location CPU)
Oct 23 09:00:15 lost25 hpasmd[747]: CRITICAL: hpasmd: Automatic Operating System Shutdown Initiated Due to Fan Failure
Oct 23 09:00:22 lost25 hpasmd[747]: NOTICE: hpasmd: Fan Failure (Fan 1, Location CPU) has been repaired
Oct 23 09:00:22 lost25 hpasmd[747]: NOTICE: hpasmd: Automatic Operating System Shutdown Initiated Due to Fan Failure has been repaired
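When several servers log to one syslog host, counting these messages per host shows how often each server fails and recovers. A minimal sketch, assuming the hpasmd message layout in the excerpt above with each message on a single line; `count_fan_events` is a hypothetical helper, not part of HP’s tools.

```python
import re
from collections import Counter

# Assumed hpasmd line layout, one message per line, as in the excerpt:
#   "<Mon> <day> <hh:mm:ss> <host> hpasmd[<pid>]: <SEVERITY>: hpasmd: <message>"
HPASMD_LINE = re.compile(
    r"^(?P<month>\w{3}) +(?P<day>\d+) (?P<time>\d\d:\d\d:\d\d) (?P<host>\S+) "
    r"hpasmd\[\d+\]: (?P<severity>[A-Z]+): hpasmd: (?P<message>.*)$"
)


def count_fan_events(lines):
    """Count fan failures and fan repairs per host from hpasmd syslog lines."""
    failures, repairs = Counter(), Counter()
    for line in lines:
        match = HPASMD_LINE.match(line.rstrip("\n"))
        if match is None or not match["message"].startswith("Fan Failure"):
            continue  # not an hpasmd fan message
        if match["message"].endswith("has been repaired"):
            repairs[match["host"]] += 1
        else:
            failures[match["host"]] += 1
    return failures, repairs


sample = [
    "Oct 23 09:00:15 lost25 hpasmd[747]: CRITICAL: hpasmd: Fan Failure (Fan 1, Location CPU)",
    "Oct 23 09:00:15 lost25 hpasmd[747]: CRITICAL: hpasmd: Automatic Operating System "
    "Shutdown Initiated Due to Fan Failure",
    "Oct 23 09:00:22 lost25 hpasmd[747]: NOTICE: hpasmd: Fan Failure (Fan 1, Location CPU) "
    "has been repaired",
]
failures, repairs = count_fan_events(sample)
print(failures["lost25"], repairs["lost25"])  # 1 1
```

A falling ratio of repairs to failures for a host would match the pattern where recoveries become rarer before the reboot loop starts.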

What happens is that the CPU fan block depicted in Fig. 1 reports a failing fan to the system management board. As there does not seem to be any redundancy in the block despite its four fan assemblies, a single faulty fan is enough for the block to raise the errors. I have verified this experimentally by moving fans one by one from a malfunctioning server to a correctly functioning one, until the latter starts to show the same symptoms as the former.

Initially, the errors are simply warnings: the malfunctioning server recovers and aborts the automatic operating system shutdown. Over time, it recovers less and less often and eventually falls into a delayed reboot loop. The OS shuts down, the server stays off for a varying number of minutes, and then the system board tries again. After a few minutes of operation, a new critical warning is logged and the OS is shut down again.

Attempted repairs

Despite the logged errors, visual inspection reveals that the fan assemblies remain operational and keep rotating until the shutdown. Since we have been running the servers in pretty low temperatures, even a completely dead fan block would probably have been sufficient for normal operation.

In our case, a total of seven HP DL360 G3 fan blocks have failed with this type of problem. We initially installed five units in the HC, and after discovering these problems, I replaced two fan blocks with used fan blocks from spare servers. I also reconstructed three more “correct” fan blocks by marking the failed fans and shifting unmarked fans until a fan block no longer reported problems.
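Because a single bad assembly trips the whole block, faulty fans can be isolated by testing one suspect at a time in an otherwise known-good block. The sketch below models that procedure; `block_reports_failure` stands in for physically installing the block and watching for the 1611 POST error, and the fan names and simulated fault set are hypothetical.

```python
from typing import Callable, List, Sequence


def find_faulty_fans(
    candidates: Sequence[str],
    good_block: Sequence[str],
    block_reports_failure: Callable[[List[str]], bool],
) -> List[str]:
    """Swap each candidate fan, one at a time, into the first slot of a
    known-good fan block and return the candidates that make it fail."""
    faulty = []
    for fan in candidates:
        trial = [fan, *good_block[1:]]  # replace one known-good fan
        if block_reports_failure(trial):
            faulty.append(fan)
    return faulty


# Simulated bench for illustration only: the block has no redundancy,
# so a single bad fan assembly is enough to trip it.
HYPOTHETICAL_BAD = {"fan-3", "fan-6"}


def simulated_block(fans: List[str]) -> bool:
    return any(fan in HYPOTHETICAL_BAD for fan in fans)


known_good = ["ok-1", "ok-2", "ok-3", "ok-4"]
suspects = [f"fan-{i}" for i in range(1, 9)]
print(find_faulty_fans(suspects, known_good, simulated_block))  # ['fan-3', 'fan-6']
```

On real hardware each trial costs a boot, so testing one suspect per trial trades speed for an unambiguous answer.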

These replacements rule out individual events like power spikes which might have destroyed the fans. Likewise, web searches reveal no error reports supporting the idea that the problems would be caused by faulty firmware, perhaps solvable through an upgrade. As only the fan blocks placed in the HC have failed, the root cause does not seem to be a manufacturing or handling error either. Finally, the problems are not caused by the fan assemblies clogging with pollen or lint, which I have verified by breaking down the assemblies into their base components.

The end result is that all seven fan blocks ultimately caused delayed reboot loops, and we were forced to remove the HP DL360 G3:s from the HC. The identical models purchased at the same time in our regular data center have not caused problems, and neither has the 2U-sized HP DL380 G3 which is still installed in the HC.

Figure 2: Nidec DR04XLG-12PUS1 fan units

Failing fans

The fans used in the CPU fan block are Nidec DR04XLG-12PUS1 40×40×48 mm units. A web search will turn up other users who have had problems with this fan type, but not enough to claim that the model is systematically faulty. Our own control groups also disprove this idea, as the servers in our regular data centers have not failed.

What is peculiar about these fans is that they are dual fan units, i.e., two fans connected in series. In addition, the second fan is reversed and rotates in the opposite direction. This arrangement is our current best guess at what goes wrong.

Best guess at cause

The motor section of the Nidec fans is visible in Figure 2. As in normal fans, the motor is located in the center of the fan unit and the fan blades rotate in front of the motor unit. My theory is that in a normal fan, the blades work somewhat like an umbrella, pushing the humidity in the air away from the motor shielded in the middle of the unit.

Since the second fan in each unit is reversed, its motor internals are more exposed to any humidity in the air. This might cause transient faults, which the fan unit then reports to the fan block, and onward to the system management board.

Solutions?

So far, we have found no solution to the problem. I am planning an emulated experiment: removing one of the DL360 G3s from the control group and trying to make it fail indoors by raising the relative humidity of the air. If it does fail, the likely cause of the problems is humidity combined with the reversed air flow.

If there are any readers with hands-on experience with this type of errors, we are interested in hearing from you. My user account is pervila at the Department of CS servers, so you can easily figure out my e-mail address.

Posted in Helsinki Chamber

Keyboards and dishwashers

Now, this is certainly nothing new, as multiple sources have already reported that some keyboards are dishwasher-safe, or that they can at least take one round of dishwashing:

  1. http://www.npr.org/templates/story/story.php?storyId=11029793
  2. http://www.wikihow.com/Clean-a-Keyboard-in-a-Dishwasher
  3. http://boingboing.net/2005/05/30/clean-your-keyboard-.html
  4. http://voices.washingtonpost.com/fasterforward/2009/04/can_you_clean_a_keyboard_in_a.html

What piqued my curiosity was that some comments reported problems, a few reported total failures, and drying times and techniques varied widely, such as shaking the keyboards every 1-2 days for a full week. Sunlight was often mentioned as a quick way to dry the keyboards.

Why did I choose to delve into this? In Finland, keyboards are categorized as electronic waste (“SER-jäte”), meaning that they contain hazardous materials. Granted, most of the chassis is plastic, but the circuit board and the connector sheets require special handling on disposal.

Now, a single keyboard might not seem like much of a problem, and certainly not much of an investment, but multiply that by every computer we have at the Computer Science Department and you get multiple cubic metres of wasted materials. What’s worse, these keyboards are often thrown away not because they are faulty but because cleaning them would cost too much!

 

A particularly gruesome case. The picture doesn’t do it full justice.

What if there were a way to get the keyboards clean in an automated fashion, requiring nothing but a dishwasher and some spare time to wait for drying? Long story short, I gathered six keyboards and an optical mouse to find out what exactly happens to these devices during a normal dishwashing cycle. I got the following devices as test subjects:

  1. Dell RT7D60 USB with electronic card reader
  2. Keytronic E06102SV019-C PS/2
  3. Logitech Y-ST39 PS/2
  4. Logitech Y-ST39 PS/2
  5. Logitech Y-SG13 PS/2
  6. Logitech-branded Liteon-HIS Y-UT76
  7. Logitech M-BT58 Optical USB Cord Mouse

I washed them in a max. 50 C washing program for glassware, and deselected drying. After the program finished, the keyboards were positively drenched, and had to be drained before removal from the dishwasher. I left the keyboards to dry over the weekend for over 86 hours.

On the following Monday, units #1, #3, #4, and #7 worked perfectly without additional complications. It was especially encouraging that the mouse belonged to this category, since with prolonged use, mice tend to become almost as grimy as keyboards.

The other three worked very poorly or mostly not at all. I opened the keyboards and noticed that most of the devices had accumulated water between the connector sheets used to signal key presses. Interestingly, even the working units had water droplets inside, yet still worked perfectly! This leads me to believe that a low-cost technique could make all keyboards dishwasher-safe.

 

Pens and pencils inserted between sheets for additional air circulation

For the nonworking units, additional maneuvering was necessary to gently peel the sheets apart and leave them to dry for an additional 24 hours. I’m quite sure that all units would’ve dried a lot faster if they were opened to begin with, but the point of the exercise was to get the keyboards clean with minimal user intervention.

Finally, after the remaining three keyboards had dried out completely, I reassembled them (with one flaw) and retried the testing procedure. Now, all six units plus the mouse were working perfectly. We had now removed a number of keyboards from the waste pile and put them back into active use.

Additional things to research:

  • Effect of microbes and other growths before and after dishwashing
  • Medical recommendations for cleaning keyboards, especially in multi-user environments like the department
  • How many times a keyboard can be put through the washing cycle
  • What is the minimum change necessary to allow for fully automatic washing, i.e., no chassis opening required
Posted in Meta

Improving chamber air flow, part 3: Improvements

I contacted Halton’s Risto Kosonen and asked whether Halton would be interested in participating in our research. After a few phone discussions, Risto took to the idea and promised to consider it. Roughly a week later, I got a call from Exactum’s doorman. We had received quite a hefty package.

Ventilation grille made of steel

The outdoor ventilation grille in the picture above is designed to cover a building’s supply air intake. The fins form roughly 45-degree angles and let air through with minimal resistance. The sharp upward turn blocks rain almost completely, and the water gathers at the bottom of the grille, from where it simply flows down the adjoining wall.

We used the grille for the exact opposite purpose, replacing our home-grown exhaust cover with it. As we didn’t want to waste the exhaust cover, we repurposed it as the supply cover. Thus, exhaust air flows straight out through the grille, but supply air is forced to make the 90-degree turns described earlier. The following video shows the current configuration.

Helsinki Chamber with improved air flow (on YouTube)

With these changes, we are able to supply air to within 1 C of the ambient temperature. The drawbacks are that we can no longer see the computers’ LEDs through the clear plexiglass windows, and that the wing nuts we have been using to secure the plexiglass covers are clearly suboptimal for the grille and our cover hat. Finally, snow remains a problem with both covers: given sufficient wind speed, very fine snow could be carried in either with the intake suction or through the rear grille.

The next step is to mount both covers on sliders so that an administrator can easily see what is going on inside the chamber. As for snow, I have no idea yet. Switching to the plexiglass covers avoids the problem, but this is troublesome if an admin operates more than a single chamber.

Posted in Helsinki Chamber

Improving chamber air flow, part 2: Solutions

After quite a lot of design work, I decided to ask Ville Hautakangas for help. Ville helped me improve the design and solved most of the manufacturing problems. Our solution consisted of a “cover hat” for the exhaust chamber, similar in design to some ventilation pipes used in houses and apartment buildings.

A master craftsman at work

As materials, we used 5 mm thick corrugated plastic sheets from Zymotec. Model airplane enthusiasts have used the same material for miniature construction, since the sheets are easy to handle and quite strong. We had previous experience with corrugated plastic sheets from the second version of our home-grown cold aisle containment setup, which I will describe later on in this blog.

Our idea for the cover was to force the exhaust air flow around two 90-degree angles and out of the exhaust chamber. For the air flow, this is a moderate impediment, but still manageable. Rain, on the other hand, can only reach the inner wall of the exhaust cover, from where it runs down and out through the open bottom section of the cover.

Sensor temperatures relative to ambient, May 2011

The graph above depicts sensor temperatures from the supply and exhaust chambers relative to the ambient (climate) temperature, shown in green. The first part of the graph shows that the exhaust section gradually became quite warm, with elevations spiking to 35 C above an ambient temperature of 10 to 15 C, resulting in absolute temperatures of 45 to 50 C in the exhaust section.
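Plotting elevation instead of raw readings takes the weather out of the graph. The conversion is a plain subtraction, sketched below with the figures from the paragraph above; the function names are illustrative and not taken from our graphing scripts.

```python
def elevations(sensor_c, ambient_c):
    """Per-sample elevation above ambient (degrees C) for two series
    sampled at the same instants."""
    if len(sensor_c) != len(ambient_c):
        raise ValueError("sensor and ambient series must have equal length")
    return [sensor - ambient for sensor, ambient in zip(sensor_c, ambient_c)]


def absolute(ambient_c, elevation_c):
    """Recover absolute temperatures from ambient plus elevation."""
    return [ambient + rise for ambient, rise in zip(ambient_c, elevation_c)]


# Figures quoted above: ambient 10 to 15 C, exhaust spikes of 35 C above it.
print(absolute([10.0, 15.0], [35.0, 35.0]))    # [45.0, 50.0]
print(elevations([45.0, 50.0], [10.0, 15.0]))  # [35.0, 35.0]
```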

The first drop in the graph occurs on 5.5.2011 (middle of week 18), after the installation of our home-grown rear cover. For many hours, everything seemed perfect, with drops of over 20 C for the lower exhaust sensor, for example. Unfortunately, the following day showed that temperatures were mostly returning to their previous levels. The effect is visible as the spike that immediately follows the dip.

This caused much head-scratching, as the exhaust air flow had definitely improved. Ville turned out to be right in his guesstimate: the supply chamber had to be improved as well. We had encountered both exhaust overpressure and supply underpressure, and both had to be resolved separately.

(The graph is somewhat of a spoiler, for in the end, our solutions did work.)

Posted in Helsinki Chamber

Improving chamber air flow, part 1: New problems

Over the past two months, we have seen steadily rising temperatures within the prototype Helsinki Chamber. After some investigation, we had to conclude that both the exhaust and supply air flows were insufficient. During experiments, two interesting new phenomena occurred:

  1. Supply underpressure, whenever the supply is insufficient
  2. Exhaust overpressure, whenever the exhaust is insufficient

During supply underpressure, the servers are simply provided insufficient supply air. This causes suction on the supply chamber, which forces some of the exhaust air to reverse flow through the servers and back into the supply chamber. Exhaust overpressure works similarly, but the cause is the insufficient air flow out of the exhaust chamber.
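Both effects can be summarized as a simple mass balance: the servers try to move a certain volume of air, and whichever side cannot keep up produces the reversed flow described above. The sketch below is a deliberately simplified model assuming steady state and no leakage; all names and flow figures are hypothetical, and we have not measured these flows in the HC.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChamberFlows:
    """Volumetric air flows through the chamber, in m^3/h (hypothetical)."""
    server_demand: float     # air the server fans try to move
    supply_capacity: float   # air the supply side can deliver without suction
    exhaust_capacity: float  # air the exhaust side can remove without back-pressure


def diagnose(flows: ChamberFlows) -> List[str]:
    """Steady-state mass balance: any side that cannot keep up with the
    servers forces air to recirculate backwards through them."""
    problems = []
    if flows.supply_capacity < flows.server_demand:
        problems.append("supply underpressure")
    if flows.exhaust_capacity < flows.server_demand:
        problems.append("exhaust overpressure")
    return problems


# Hypothetical numbers, not measurements from the HC.
print(diagnose(ChamberFlows(400, 300, 450)))  # ['supply underpressure']
print(diagnose(ChamberFlows(400, 450, 350)))  # ['exhaust overpressure']
print(diagnose(ChamberFlows(400, 450, 450)))  # []
```

The model also reflects why the two effects must be fixed separately: raising one capacity does nothing for the other check.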

Both effects can occur on their own, and both can be verified by allowing the HC to run without either the front or back covers. Moreover, both effects have their counterparts in production environments using either cold or hot aisle containment. Supply underpressure can happen whenever the CRACs do not supply enough cool air into the cold aisle. Exhaust overpressure is more elusive, but will happen if a hot aisle has insufficient exhaust air suction.

These effects were initially very depressing, because from an engineering perspective they signalled that only a full “wind tunnel” would be ideal for server air flow. Such a tunnel would be very tricky to construct without also allowing rain or snow inside the chambers. This meant that a certain researcher had to return to the design whiteboard.

Initial sketches on how to fix the problem

 

Posted in Helsinki Chamber

Measuring temperatures: part 3

While waiting for the DS9490R USB adapters to arrive, we designed the cabling, the pinouts, the conductor colors, and the sensor housing (inside an RJ11 female-female adapter). The USB adapter pinout is:

  1. Vdd, +5 VDC output
  2. GND, power ground
  3. OW, 1-Wire data
  4. GND_OW, 1-Wire return
  5. SUSO, USB suspend output
  6. NC (not connected)

DS18B20 pinout (TO-92, 8-pin 150mil SO and μSOP)

and the DS18B20 pinout is

  1. GND, common ground
  2. DQ, 1-Wire data
  3. Vdd, +3.0V..+5.5V for non-parasitic mode of operation

We used the TO-92 form factor device (leftmost in the picture).

Between the RJ11 housings we use two-pair Category 5 cabling. There is a 6-pin connector on the USB adapter, 4-pin jacks to connect to the RJ11 housings, and 3 pins on the temperature sensors. We decided to build a 6-to-4-pin conversion cable and use straight 4-pin cabling between the RJ11 sensor housings. Within minutes, we had the following schema laid out:

Pinout and coloring for cabling

The conversion cable can be just 10-15 cm long and then connected to one of the RJ11 female-female adapters (without a sensor inside) or measured and cut to custom length so that it connects to the first RJ11 sensor housing on the wire.

Building the sensor housings took some iterative work. To allow air to flow through the adapter, each RJ11 adapter housing was drilled with two holes on the opposing, larger sides. We recommend the following workflow:

  1. Open one adapter.
  2. Examine the adapter and decide on best hole positions. Some places are structurally impossible.
  3. Use a mounted drill to gingerly puncture the housing of each adapter. Be careful not to damage the internal wiring by drilling too deep.

In the pictures below, drill size of 5mm has been used. The centers of the drill holes are 10mm apart and 4,5mm from the centerline (where the housing breaks into two halves).

We tried soldering the DS18B20 sensors to the wires inside the RJ11 adapters, but that proved to be an exercise in futility. Though the sensor did work, the soldered leads needed to be covered in some dielectric (i.e., insulation tape) to avoid short-circuiting them. With 20 × 3 = 60 wires to strip of insulation, carefully solder to sensor leads, and finally insulate, this would have been an unnecessarily laborious task.

In the end, we settled on a purely mechanical connection with the following workflow:

  1. Open the RJ11 female-female adapter (some are tight and holding with pliers helps).
  2. Use 3-4mm wide flat-head screwdriver or similar to lift female pin housing from the adapter.
  3. Straighten the pins and draw them a few millimeters out of their sockets.
  4. Insert DS18B20 into same holes with the pins. Our pin ordering was: 1=yellow, 2=green, 3=black.
  5. Force the pins back into their sockets. The flat-head screwdriver works well here. Try to apply force directly along the insertion direction, and if you can’t get the pin back in, withdraw the DS18B20 slightly so that it does not sit too deep in the hole.
  6. Bend the pins back to their original shape.
  7. Insert the pin housing back to the RJ11 adapter frame.
  8. Close the sensor housing.

It takes some 150-180 seconds per sensor after some practice. For reference, please see this YouTube video.

Finally, connect the parts together, blacklist the Linux ds2490 kernel module, run digitemp_DS2490 from cron, create RRD files, and start graphing your datacenter temperatures.
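The last step can be sketched as a small filter between digitemp and rrdtool. This is a minimal sketch, assuming digitemp’s default log line format (e.g. `Apr 26 14:00:01 Sensor 0 C: 22.19 F: 71.94`) and one existing RRD file per sensor; the file naming `sensor<N>.rrd` and the function name are hypothetical. The blacklisting itself is typically a single `blacklist ds2490` line in a file under `/etc/modprobe.d/`, presumably so that the kernel’s 1-Wire driver does not claim the adapter before digitemp can.

```python
import re
from typing import Iterable, List

# Assumed digitemp output line, e.g. "Apr 26 14:00:01 Sensor 0 C: 22.19 F: 71.94"
READING = re.compile(r"Sensor (\d+) C: (-?\d+(?:\.\d+)?)")


def rrd_update_commands(lines: Iterable[str], rrd_name: str = "sensor{}.rrd") -> List[List[str]]:
    """Build one `rrdtool update <file> N:<celsius>` argument list per reading."""
    commands = []
    for line in lines:
        match = READING.search(line)
        if match is None:
            continue  # skip banners and blank lines
        sensor, celsius = match.groups()
        commands.append(["rrdtool", "update", rrd_name.format(sensor), f"N:{celsius}"])
    return commands


output = [
    "Apr 26 14:00:01 Sensor 0 C: 22.19 F: 71.94",
    "Apr 26 14:00:02 Sensor 1 C: -3.50 F: 25.70",
]
for cmd in rrd_update_commands(output):
    print(" ".join(cmd))
```

From cron, each argument list could be handed to `subprocess.run(cmd, check=True)`; the `N` timestamp tells rrdtool to use the current time.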

Posted in Meta

Measuring temperatures: part 2

A 1-Wire network seemed really promising, so we tentatively decided to use it for temperature instrumentation. We set out to list the parts needed: at least a computer<->1-Wire adapter, some sensors, and cabling to put it all together. For the 1-Wire adapter, serial port connectivity was out of the question, as a serial port cannot provide operating power and is becoming obsolete, so USB was the next logical choice. As luck would have it, Maxim manufactures part number DS9490R, a USB<->1-Wire adapter that can provide up to 42 mA of operating current to the bus.

As for the sensors, we knew from previous experience that DS18B20 sensors would be suitable for our purposes. The DS18B20 is a 1-Wire temperature sensor with a selectable resolution of 0.5, 0.25, 0.125 or 0.0625 °C (9..12 bits), ±0.5 °C accuracy from –10 °C to +85 °C, and a measurement drift of 0.2 °C.
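The resolution figures follow from the sensor’s register format: readings are stored as a 16-bit two’s complement value in units of 1/16 °C, and lower resolutions simply leave the lowest bits undefined. A minimal decoding sketch based on the register layout in the DS18B20 datasheet; tools such as digitemp handle this conversion internally, so it is only needed when talking to the sensor directly. The function names are illustrative.

```python
def ds18b20_to_celsius(raw: int, bits: int = 12) -> float:
    """Convert the DS18B20's 16-bit temperature register to degrees C.

    The register is sign-extended two's complement with 1/16 C per LSB.
    At 9, 10 or 11 bits the lowest 3, 2 or 1 bits are undefined, so they
    are masked off here.
    """
    if not 9 <= bits <= 12:
        raise ValueError("resolution must be 9..12 bits")
    raw &= 0xFFFF & ~((1 << (12 - bits)) - 1)  # drop undefined low bits
    if raw & 0x8000:                            # negative temperature
        raw -= 1 << 16
    return raw / 16


def resolution(bits: int) -> float:
    """Temperature step in degrees C: 0.5, 0.25, 0.125 or 0.0625."""
    return 0.5 / 2 ** (bits - 9)


print(ds18b20_to_celsius(0x0191))          # 25.0625
print(ds18b20_to_celsius(0xFF5E))          # -10.125
print(ds18b20_to_celsius(0x0191, bits=9))  # 25.0
```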

Maxim is a nice company and provides free samples of many of its products. However, their lead times, even when buying from their web shop, were on the order of months rather than weeks, so we set out to find suppliers in Finland. Having prior experience with sourcing parts, we quickly found that Bebek in Hakaniemi and Partco supply DS18B20 temperature sensors and other parts off the shelf. The USB adapters were harder to come by at short notice, but Tapio Haapala from F-Solutions lent us a hand.

Some effort was put into thinking about how to actually connect the sensors to the wire. After considerable deliberation, we decided to use RJ11 female-female adapters (with holes drilled in the sides for air flow) to house the sensors (more about this in part 3). We can then use the adapters both to extend the bus as needed and to multidrop the sensors wherever needed.

Cabling (and to some extent, connectors) is the backbone of the 1-Wire network, so we picked 100 m of 2-pair Cat5 cable, about 10 6P6C modular connectors, enough 6P4C modular connectors, and about 25 RJ11 female-female adapters. The 6P6C jacks connect to the USB adapter, the 6P4C jacks are used to connect the RJ11 adapters together, and the RJ11 adapters are used to house the DS18B20 sensors. The term RJ11 is used here liberally to mean a 6-position modular jack with a variable number of connected pins (Wikipedia has an article about Registered Jack naming).

Our final shopping list looked roughly like:

  • 2 DS9490R USB <-> 1-Wire adapters
  • 20 DS18B20 TO-92 packaged temperature sensors
  • 25 modular RJ11 female-female adapters
  • 5 6P6C modular connectors
  • 50 6P4C modular connectors
  • 100 m reel of Cat5 2-pair data cable

The total price for these items is somewhere under 250 euros. The cable reel is about 65-70 euros, the USB adapters about 35-40 euros per unit, and the temperature sensors can be had for about 3-4 euros per unit. The RJ11 adapters and connectors cost next to nothing.
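Summing the quoted price ranges shows where the “under 250 euros” figure comes from. The arithmetic below uses only the prices above and leaves out the RJ11 adapters and connectors; the helper name is illustrative.

```python
# Price ranges quoted above, in euros: (low, high).
CABLE_REEL = (65, 70)   # one 100 m reel of Cat5 2-pair cable
USB_ADAPTER = (35, 40)  # per DS9490R
SENSOR = (3, 4)         # per DS18B20


def estimate(adapters: int = 2, sensors: int = 20) -> tuple:
    """Low and high cost estimate, ignoring the near-free RJ11 parts."""
    low = CABLE_REEL[0] + adapters * USB_ADAPTER[0] + sensors * SENSOR[0]
    high = CABLE_REEL[1] + adapters * USB_ADAPTER[1] + sensors * SENSOR[1]
    return low, high


print(estimate())  # (195, 230)
```

That leaves roughly 20 to 55 euros of the 250-euro budget for the connectors and adapters.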

  1. http://datasheets.maxim-ic.com/en/ds/DS18B20.pdf
  2. http://datasheets.maxim-ic.com/en/ds/DS9490-DS9490R.pdf
Posted in Meta

Measuring temperatures: part 1

This is the first post in a series on how to graph datacenter temperatures from scratch. It is an easy building project suitable for almost all age groups, competence levels and even budgets. Parts 2 and 3 will be published on 24 and 26 April, respectively.

Machine room temperatures are often measured using built-in thermal sensors in computers. It is relatively easy to use SNMP to get data from CPU thermal diodes, inlet and outlet air flow temperatures, and then integrate them into some monitoring system with alerts. But the big picture can’t be conveyed using sensor points inside the computers. Design and implementation of a machine room should really include ambient temperature measurements to determine air flow and cooling system supply air temperatures.

Manufacturers tend to have solutions that are prohibitively expensive for bulk use (“starting at $325”), big (the size of a cigarette pack or more), and unwieldy (requiring separate cabling and power supply per unit). Enter summer 1989 and Dallas Semiconductor’s (DS) 1-Wire network (also known as MicroLAN™ or µLAN). This is a low-current, low-voltage bus which requires, at minimum, two conductors for data and power, simplifying cabling, design, and implementation. By 1990, the 1-Wire network protocol had matured, and DS introduced the first rugged, stainless-steel-packaged, battery-like memory devices, readable by contact with a reader connected to a 1-Wire network. Small, inexpensive, TO-92-packaged (small transistor package) 1-Wire temperature and humidity sensors have been available at least since the early 1990s.

1-Wire can use a multitude of topologies, but reliable networks are most easily implemented using a bus topology, as in old coaxial Ethernet networks. Star topology is not recommended unless 1-Wire switches are used. The term “1-Wire” is a bit misleading, because the network requires at least two conductors: one for data and operating power (for so-called parasitic powered devices) and another for signal/power ground. A common reference level (GND, ground) is required, just as with a regular RS232 serial port, which requires at least TX, RX and GND. Most 1-Wire devices can also be externally powered. They then have at least one more conductor for a separate operating power supply (usually between 2.5V and 5.5V) and possibly, but not necessarily, a fourth conductor for an additional ground lead. Usually signal and power ground can be common.

In other words, 1-Wire networks use either two or three conductors. Parasitic powered networks (two conductors, data and data ground) have stricter limitations than externally powered networks with regard to network size and the number of devices that can be attached. Externally powered networks can have lengths of up to 300 m and contain tens of sensors. There is probably some upper limit, but DS writes in their application notes that the number of sensors is virtually unlimited, because every sensor has a unique 64-bit ID code.
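The 64-bit ID has a fixed layout: an 8-bit family code (0x28 for the DS18B20), a 48-bit serial number, and an 8-bit CRC over the first seven bytes. Checking that CRC is a cheap way to catch corrupted reads on a long bus. A minimal sketch of the Dallas/Maxim CRC-8; the serial number in the example is made up, and the helper names are illustrative.

```python
def crc8_maxim(data: bytes) -> int:
    """Dallas/Maxim 1-Wire CRC-8: polynomial x^8 + x^5 + x^4 + 1,
    processed LSB first (reflected constant 0x8C), initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x8C if crc & 1 else crc >> 1
    return crc


DS18B20_FAMILY_CODE = 0x28


def is_valid_ds18b20_rom(rom: bytes) -> bool:
    """Check an 8-byte ROM code as read from the bus: family code first,
    48-bit serial number next, CRC of the first seven bytes last."""
    return (
        len(rom) == 8
        and rom[0] == DS18B20_FAMILY_CODE
        and crc8_maxim(rom[:7]) == rom[7]
    )


# Hypothetical serial number, for illustration only.
body = bytes([0x28, 0x12, 0x34, 0x56, 0x78, 0x9A, 0x00])
rom = body + bytes([crc8_maxim(body)])
print(rom.hex(), is_valid_ds18b20_rom(rom))
```

A useful property of this CRC: running it over all eight bytes, CRC included, yields zero for an intact ROM code.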

Voltages and currents used in 1-Wire networks are very small. The devices draw less than 1000 nA when idle, and active current is typically less than 1.5 mA. Voltage swing is from -0.8V to +2.2V (minimum for externally powered) or +3.0V (minimum for parasitic powered devices). Suffice it to say, for long network runs it is advisable to use good-quality, low-capacitance (<50 pF/m), low-resistance twisted-pair cable and to fit the cable connectors professionally. With a little practice and effort, it is easy to build a reliable network a hundred meters long with 15 or more externally powered 1-Wire devices.

  1. http://www.avtech.com/About/Articles/AVT/NA/All/-/DD-NN-AN-TN/Recommended_Computer_Room_Temperature_Humidity.htm
  2. http://www.javaworld.com/javaworld/jw-04-1998/jw-04-javadev.html?page=1
  3. http://www.datanab.com/sensors/sensors_1wire.htm
  4. http://www.maxim-ic.com/appnotes10.cfm
  5. http://pdfserv.maxim-ic.com/en/an/AN148.pdf
Posted in Meta