Fixing the Western Digital excessive load/unload problem with idle3-tools

I have an always-on machine (self made Ubuntu 16.04 based using retail parts) that is lightly loaded in CPU but is continually writing to the hard disks. I recently added a Western Digital 2TB drive ( WDC WD20EZRZ-00Z5HB0 ) about 2 weeks ago (368 hours) and noticed that the Load/Unload cycles was over 30,000.
The other two drives, both WD Blue 1TB ( WDC WD10EZEX-08M2NA0 ) showed up with less than 100 on 15,000 hours so something was very wrong with this new drive in my system.
The fix is to adjust the wdidle3 setting from its default of 80, which means 8.0 seconds, to a bigger number such as 129, which means 30 seconds.

The number on newer drives is not just divided by 10 but is staggered scale so 1-128 is divided by 10 but 129-255 is in 30 seconds increments (129 = 30sec, 130 = 60sec and so on) for newer drives but it is just divided by 10 for older drives. I do not know what is deemed new or old manufacturing date for WD drives.

You can read and set the wdidle3 parameter using hdparm -J option but it has a very cautious warning message as the option is experimental.

You can read the S.M.A.R.T value for the Load/Unload using smartctl from the command line, (or GSmartControl using a GUI) and the ID is 193 thus,

sudo smartctl -A /dev/sdc | grep "^193"

I get…

193 Load_Cycle_Count 0x0032 190 190 000 Old_age Always - 31283

where the 31283 is my current, but now stable, excessive count.

To set the parameter I downloaded and used the idle3-tools from http://idle3-tools.sourceforge.net/ This is a small program that you just untar and make or you can install it with,

sudo apt-get install idle3-tools

I set my drive with…

sudo idle3ctl -s 129 /dev/sdc

…which for new drives means 30 seconds ( and -s 130 would mean 60 seconds and so on) . After you set the value in the drive you must turn off the power to your computer (drive) as the value is read at power up. So do a shutdown and then power off.

I have no idea why Western Digital do what they do but it is not that friendly unless you delve deep. Reading the WD support forums the problem is from many years ago and is still taking place today. I expect to be able to install a disk and not have it artificially age itself but adapt to the load. Equally I have no idea yet if the 30 second value I have set is a better compromise. It has stopped the value increasing and if my programs stall then the disk will unload reasonably early.

yarn install fails as it is using cmdtest package and not yarn

yarn was previously provided by cmttest and these are very different programs to yarn the package manager.  If you are using software sources (e.g. FarmBot.io)  that are managed through yarn (you will find a yarn.lock file with the software source) but when you try and run yarn install and then you get e.g.

yarn install
ERROR: [Errno 2] No such file or directory: 'install'

then firstly check to make sure that cmdtest is not installed.

sudo apt remove cmdtest

You would install yarn with

sudo apt-get install yarn

See the yarn packager web site for more details on installation.

Acer Aspire ONE no WIFI in Ubuntu due to hardware switch state

Intermittently when an Acer Aspire ONE suspends in Ubuntu 14.10 then the WIFI does not come back. The hardware switch (a non-latching slider switch on the front right hand side of the laptop) has no effect. Rebooting and disabling/enabling Networking has no effect.

The rfkill list command will show,
0: phy0: Wireless LAN
Soft blocked: no
Hard blocked: yes

One way I found to clear this is to power down the laptop and then hold the WIFI switch to the right i.e. in the on position and then use the power switch to turn on the laptop as normal but keep the WIFI switch held on. You should see the little orange WIFI  LED blink once and once the laptop is starting to boot up after the BIOS display then release the WIFI switch so it flips back to the left/off position.

The WIFI should be back to normal now and the rfkill list will show Hard blocked: no.

Older Transmission-gtk stops working after Ubuntu 15.04 upgrade

Upgraded my development machine from Ubuntu 14.10 to 15.04 and found an odd quirk with Transmission-gtk. The symptom was that Magnet links would not load from Firefox even though magnet is associated with transmission-gtk (to see this go to about:preferences#applications in Firefox and search for magnet)

I ran the transmission-gtk from a terminal and got,

transmission-gtk: error while loading shared libraries: libminiupnpc.so.8: cannot open shared object file: No such file or directory

This was strange as the packages depended upon libminiupnpc.so.10 so I then did a which transmission-gtk and it was /usr/local/bin/transmission-gtk and not the expected /usr/bin/transmission-gtk and I then remembered that I had manually installed Transmission version 2.83 on top of the older package manager version because the older package manager version of Transmission-gtk  in Ubuntu 14.10 would intermittently crash. As an aside if you really want the manually installed 2.83 version to run in Ubuntu 15.04 then you can symbolic link the relevant library with this,

cd /usr/lib/x86_64-linux-gnu
sudo ln -s libminiupnpc.so.10 libminiupnpc.so.8
sudo ldconfig -v

I eventually decided to uninstall the manually entered version by going to my source built directory and doing a sudo make uninstall

After you have uninstalled the manually entered version then a which transmission-gtk should return /usr/bin/transmission-gtk

The Transmission version with Ubuntu 15.04 is now 2.84 (onwards).

Once you have fixed it that the transmission-gtk can launch then clicking magnet links in Firefox now works.

Opera slow Flash due to multiple plugin locations enabled

I use Opera as one of a number of browsers for testing purposes. Noticed that Flash was jumpy (Opera 12.16 with flash 11.2 r202 on Ubuntu 14.04 64 bit). It took a while to find the problem but I believe that it was due to multiple flash plugins enabled. To see what plugins are enabled go to,

Opera -> Page -> Developer Tools -> Plug-ins

(or the shortcut URL of opera:plugins )

If you see multiple locations of the Shockwave flash enabled then disable all except one e.g. leave the one located at,  /usr/lib/mozilla/plugins/libflashplayer.so enabled.

Restart Opera and hopefully this may clear your problem.

memtest86+ cannot load a ramdisk with an old kernel image

This error happens when you use UNetbootin to create an Ubuntu disk and it incorrectly adds a ramdisk to the memtest86+ boot option.

Until UNetbootin fix their code then cursor down to the “Test memory” option and hit tab and then at the boot options remove the “initrd=/ubninit” so that the command line is now just…

/install/mt86plus

and then hit enter and Memtest86+ will now run as expected.

My Ubuntu 14.04 currently has UNetbootin 585-2ubuntu1 and this quirk will possibly be fixed in newer releases but sometimes all you have lying around is an emergency install USB/disk so always good to know how to get around  a problem rather than downloading new code.

Broken upstart causes Internal Error, No file name for udev

I was upgrading x2goserver and it stalled on * Cleaning up stale X2Go sessions. This is a normal log message within the /etc/init.d/x2goserver start() and it then runs x2gocleansessions after it logs this message. There shouldn’t have been any problems with this but it was just stuck there so I killed the dpkg and then I retried to add or remove anything but found that udev would not configure e.g. when I did sudo apt-get autoremove then I got,

Setting up udev (175-0ubuntu13.1) ...
invoke-rc.d: unknown initscript, /etc/init.d/udev not found.
dpkg: error processing udev (--configure):
 subprocess installed post-installation script returned error exit status 100!

Within synaptic when I tried to re-install udev then I got,

E: Internal Error, No file name for udev:amd64

The trick is that you can’t just re-install udev but must also re-install upstart.

This is because udev files link to upstart files and it is possible that a broken install has udev pointing at /etc/init.d/udev but that file is a link to /lib/init/upstart-job but the upstart is missing for some reason.

There may be other packages that have this kind of dependency e.g. winbind ufw squid3 and so on and certainly the x2goserver didn’t want to start properly. If you look in the /etc/init and see broken links to /lib/init/upstart-job then your problem should be fixed if you re-install upstart first.

As an aside after the upstart and udev was all cleaned up then the x2goserver removal and installation then worked.

Fixing held packages after repository change (apt-get upgrade verses dist-upgrade)

This example shows a fundamental difference to remember between apt-get upgrade and apt-get dist-upgrade

The example is that I altered the repository for the 0ad game package from the default to the ppa for the 0ad (i.e. sudo add-apt-repository ppa:wfg/0ad) . I then did a sudo apt-get update and then I did a sudo apt-get upgrade 0ad

Nothing wrong with that except that there is a new package dependancy for the 0ad game and this new to be installed package causes apt-get upgrade to hold the package upgrade.

To fix this you use sudo apt-get dist-upgrade instead. It will ask if you want to upgrade the packages and more importantly ask if you want to install any new packages rather than just holding them and not doing anything.

Mailfilter fails to POP timestamp in message-ID invalid

Mailfilter POP timestamp in message-ID invalid and potentially also causes mailfilter to stall at 100% CPU.

I have a fetchmail daemon that call mailfilter as a postconnect (defined in the .fetchmailrc file). I got the following error message,
mailfilter: Examining 297 message(s).

mailfilter: Error: POP timestamp in message-ID invalid.
mailfilter: Error: Parsing the header of message 292 failed.
mailfilter: Error: Scanning of mail account failed.
mailfilter: Error: Skipping account xxxxxx@example.com@mail.example.com due to earlier errors.

and I noticed that the mailfilter process was running at 100% CPU though that may be unrelated.

I found that mailfilter 0.8.3 has a new option of -i to ignore POP timestamp. This is probably what I want to make this more stable.

As I’m adding this to a Parallels based server it is unlikely to have this version of mailfilter as this package has very little development activity as it is a stable application so I had to build from source.

My server didn’t have svn so I browsed the Sourceforge svn for mailfilter on my local PC and at the bottom there is a linkfor “Download GNU tarball” I copied that link and then pasted into my console and used wget to get this latest tarball from Sourceforge. It has a odd name so moved that to a tar.gz file name e.g. mv index.html\?view\=tar mailfilter.0.8.3.tar.gz and then ran tar xvfz mailfilter.0.8.3.tar.gz

Perquisite packages for building,

  • g++
  • bison
  • flex
  • libssl-dev

There may be more but that is the ones I needed to add to my server.

Making this you cd to the mailfilter directory and then run

./autogen.sh
make
sudo make install

If that works then this’ll probably install the mailfilter to /usr/local/bin/mailfilter so now in the .fetchmailrc change the postconnect line to have,

postconnect ''/usr/local/bin/mailfilter -i ''

use double quotes if you pass the new -i option to ignore timestamps. The -i is a new feature in Mailfilter 0.8.3 (not in 0.8.2).

Killall the mailfilter and fetchmail and then re-launch your fetchmail daemon.

Errors

Missing g++

If you see,

checking whether the C++ compiler works... no
configure: error: in `/root/sources/mailfilter':
configure: error: C++ compiler cannot create executables
See `config.log' for more details.

Then check the log file and look for g++ line e.g.

configure:2879: checking for g++
configure:2909: result: no

Do a g++ and if it comes back with -bash: g++: command not found then install the g++ package and then it will work. Re-run the ./autogen.sh and then make

Missing bison

If you get the make fail and you can see /bin/sh: yacc: command not found then you need a YACC of some kind. I installed bison re-run the ./autogen.sh and then make

Missing flex

If you get an error in the make e.g. it crashes out with g++: rcfile.cc: No such file or directory

g++: no input files
make[2]: *** [rcfile.o] Error 1

then check back and see if you see error: FlexLexer.h: No such file or directory . If so then check flex is installed. Install and then re-run ./autogen.sh and then make

Missing openssl header files
If you see openssl/ssl.h: No such file or directory and similar openssl/rand.h: No such file or directory then you need to install libssl-dev. Install that package and then re-run ./autogen.sh and then make.

After a month and a few weeks of use it has been stable. The erroneous timestamps are also suspected to stall Outlook 2003. I pick up emails in parallel to my customer to provide emergency support when they are on holiday. As Microsoft support for both XP and Office 2003 is finishing in April 2014 I plan to migrate the customer to a newer OS and Office version.

Ubuntu 10.10 package download of large files can fail with OverflowError: signed integer is less than minimum

I was doing a distribution upgrade on a 10.10 system to 11.04 via do-release-upgrade. The system has games installed, so the total files to download are over 2.5 Gigabytes e.g. games like Nexuiz has a data file that is about 273 Megabytes. The Internet access is low speed broadband (about 70 KBytes per second download maximum) with other machines using this ADSL line so that’s about 10 hours for the whole release.

The  do-release-upgrade downloads can fail on these larger files on congested lines at and if you look at the log file, i.e. tail /var/log/dist-upgrade/main.log ,  it will say something like,

  File "/tmp/update-manager-HXahEI/DistUpgradeViewText.py", line 42, in pulse
    apt.progress.text.AcquireProgress.pulse(self, owner)

  File "/usr/lib/python2.6/dist-packages/apt/progress/text.py", line 164, in pulse
    apt_pkg.time_to_str(eta))

OverflowError: signed integer is less than minimum

If you look at that code in /usr/lib/python2.6/dist-packages/apt/progress/text.py (it’s Python) around line 161 onwards and think about what can happen on …

            eta = int(float(self.total_bytes - self.current_bytes) /
                      self.current_cps)

There are a number of issues here with this. That eta value isn’t checked before it is passed to the apt_pkg.time_to_str() and that’s not good because,

1) I think the self.current_cps can be a float less than 1 and as the size of an int on this system (64bit Athlon with Python 2.6.6 ) is,

>>> import sys
>>> print sys.maxint
9223372036854775807

then the eta could be quite large e.g. if that maximum value was seconds and converted to years would be just under 300 billion years.

2) But the actual error is “signed integer is less than minimum” not an overflow of a maximum so this bug seems to be about some magic number. Now if you enter in the python program,

>>> import apt_pkg
>>> print apt_pkg.time_to_str(-1)
213503982334601d 7h 0min 15s

and if you enter in other odd numbers then you can trigger the “signed integer is less than minimum” e.g. see these examples,

>>> apt_pkg.TimeToStr(-2147483648)
'213503982309746d 3h 46min 8s'
>>> apt_pkg.TimeToStr(-2147483649)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: signed integer is less than minimum

so it clearly can have values over 24855 days i.e. apt_pkg.time_to_str(2147483647) so looks like there are some odd boundaries that cause OverflowError: signed integer is less than minimum as well as a OverflowError: signed integer is greater than maximum. I ran a loop that incremented an integer by 1 and gave up and Control-C it when it went through 836477585 which is 9681d 10h 53min 5s so if there are any boundary conditions it’s not obvious. I suspect that the garbage is with what is fed to the TimeToStr() and not a flaw in TimeToStr(). I suspect that if the download process resets itself then the file size self.total_bytes is temporarily nonsensical e.g. 0 whilst the program resets the download.

So that code section needs sanity limits on the eta because looks like we can’t trust any of self.current_cps to be reasonable or self.total_bytes to be accurate but I think self.current_bytes may always be fine, e.g. I changed line 164 end =… to have some range checking,

            if eta < 0:
                end = " %sB/s ~%s" % (apt_pkg.size_to_str(self.current_cps),apt_pkg.time_to_str(0))
            elif eta > (30 * 24 *60 * 60):
                end = " %sB/s >%s" % (apt_pkg.size_to_str(self.current_cps),apt_pkg.time_to_str(30 * 24 *60 * 60))
            else:
                end = " %sB/s %s" % (apt_pkg.size_to_str(self.current_cps),
                                 apt_pkg.time_to_str(eta))

where the 30*24*60*60 means 30 days but there are other ways of doing this e.g. check  self.total_bytes is greater or equal to self.current_bytes or limit eta to a range and then keep existing calculation.

Once you edit that file you can simply restart the do-release-upgrade console and it will use your new code on the fly.

The bug is in other distributions e.g. see this bug report https://bugs.launchpad.net/ubuntu/+source/update-manager/+bug/884625 but not fixed. The problem is that the apt_pkg.time_to_str(), which is actually apt_pkg.TimeToStr() is probably being passed nonsensical values for self.total_bytes due to file downloads being reset for large and/or poor circuits. The apt_pkg.TimeToStr() should also reasonably handle negative times and not display nonsense but that’s another problem.

Recovering from power loss-interrupted Ubuntu distribution upgrade

Was doing an Ubuntu maverick (10.10) to natty (11.04) upgrade on a test system (desktop) and the power was lost. No UPS. On reboot it didn’t come back properly and I got,

The disk drive for / is not ready yet or not present

As far as I know this was part way through the package installations. If it fails during the package download then there is never any problem but with a part-install then parts of the system are running the new version and parts are on the old distribution.

The fix is fairly easy – at the prompt above then type M to get to the manual recovery.

The disk will be mounted as read-only so then remount this,

sudo mount -o remount,rw /

then try these commands,

sudo apt-get update
sudo dpkg --configure -a

then reboot and then the system should come back in a partially upgraded but stable state and you can continue with the distribution upgrade with,

sudo apt-get upgrade -f
sudo apt-get dist-upgrade

After the reboot it will be on the new distribution. Note that if a package is corrupted then you may need to delete that one package file from the /var/cache/apt/… location and re-run the sudo apt-get upgrade -f command. It is a pity the distribution upgrade process doesn’t have this kind of logic built in to facilitate unattended completion or recovery of a partial distribution upgrade.

Other useful commands that are related but not part of the repair: To start off the upgrade process (which was interrupted by the power supply problem),

sudo do-release-upgrade

To verify what version you are running,

lsb_release -r

Enabling IPv6 in Ubuntu ufw

I was creating a new web site dogstarplanet.com and as I was installing it on my IPv6 enabled host I thought I would setup the A and AAAA records for the same CNAME.

Windows based PCs without any IPv6 routing obviously ignore any AAAA records and the browser connects to the site as expected but an Ubuntu desktop I was using was unable to get to the site – both Firefox and Opera not connecting.

I loaded Wireshark to see if my traffic was leaving and though I could see the DNS queries for AAAA and A records there was no TSP traffic (Tunnel Setup Protocol) to the freenet6.net IPv4 address  (I’m using gogoc package out of the box). This means that the browser connection was not getting to the tunnel interface. This means firewalling or kernel.

If I run the Firestarter then I also see the tun (routed IP tunnel) but no traffic passes (note: that I have since removed Firestarter and now run Gufw).

Well the IPv6 is in the kernel but I had ufw enabled and that doesn’t have IPv6 enabled by default so you get the error message if you try and use ping6 of e.g.

ping6 ipv6.google.com
 ping: sendmsg: Operation not permitted
 ping: sendmsg: Operation not permitted
 ping: sendmsg: Operation not permitted
 ...

If it is safe you can quickly test this is your problem by turning off the ufw with the command,

sudo ufw disable

Now your ping6 should work. If it does not then you have a tunnel problem. Use the command netstat  -rn6  to see if you have tun entries.

It is easy to enable IPv6 in ufw by editing /etc/default/ufw and towards the top there is a line of IPV6=no which you change to IPV6=yes

Save that and then disable and then enable the firewall i.e. sudo ufw enable or do a sudo ufw reload if it was still running.

Now you will be able to ping6 and connect to IPv6 enable hosts using a browser. Note that when you ping6 then there is a PTR query (that you would only see in wireshark) and you may get a no such name response if you have not configured your host DNS records right so if you are committed to setting up IPv6 on your host then please check you have added a suitable DNS PTR entry for the dotted nibble PTR part of your IPv6 address. Very few protocols, perhaps only mail connections and obviously ping6, use IPv6 PTR queries.

Ubuntu

We are currently Ubuntu fans. Before I was a Mandrake/Mandriva fan and before that a RedHat fan and before that a Slackware fan…..

What have I noticed in the past 14 years of using various GNU/Linux distributions ?  Three things:

  1. The distributions evolve best by doing better than their competition within their niche in the GNU/Linux world not competing against the Windows world. Make it simple for  applications to provide equal cross-platform  performance and by proxy then GNU/Linux competes with Windows.
  2. My children don’t actually care what distribution they can play games on. They know that certain applications run well in Windows, others are only found on Ubuntu (Linux) and that browser-based games are more or less cross-platform.
  3. New installations and upgrades have got easier and faster but as Familiarity breeds contempt, this means that it is trivial to wipe a machine, clean install a new distribution that you got from a Torrent if something has annoyed you with your current favourite distribution.

IBM is 100 years old this week in June 2011. Will Open Source technology such as GNU/Linux be with us in 100 years time ? You can bet your bottom fiat currency it will.