11/30/14

Storage Spaces - an update

Today I noticed that the StorageSpaces event log is full of dire notifications:


So I tried to move my virtual machines to another host (so I could safely work on the Storage Space), but was told "Virtual machine migration operation failed at migration source.  Failed to establish a connection with host. No credentials are available in the security package", even though I'd long since configured constrained delegation with these commands from Aidan Finn:

$HostName = "host1"
$HostFQDN = "$HostName.demo.internal"
Get-ADComputer host2| Set-ADObject -Add @{"msDS-AllowedToDelegateTo"="Microsoft Virtual System Migration Service/$HostFQDN", "Microsoft Virtual System Migration Service/$HostName", "cifs/$HostFQDN", "cifs/$HostName"}

Restarting the NETLOGON service on the source Hyper-V server fixed this.

With my virtual machines moved off, I wanted to remove (from the pool) the physical disk that StorageSpaces reported an I/O error on (to run Seagate diagnostics on it)...but because a virtual disk which used that physical disk was in a degraded state, the Server Manager wouldn't let me do that.

In a production environment, you might just pull out the suspected bad drive and put in a new one, but here I really wanted to run the Seagate diagnostics while the drive was still in the computer case (you could say I'm lazy).  So I deleted the virtual disk, removed the physical drive from the storage pool, launched the Seagate diagnostic tool (it was able to see the drive) and started the "Fix All - Long" test, which took 3.5 hours and reported the drive as good.  I then added the physical disk back into the pool and recreated the virtual disk and volume on it.

Upon trying to move virtual machines back to this host, I got the same error message as above!  In fact, I couldn't even RDP into the VM host by hostname...although doing so by IP address worked.  The fix for both problems was to point the host's NIC at a DNS server that was actually online, so hostnames could resolve again.

In the process of re-learning Storage Spaces a bit, I found this nice overview of the technology and this in-depth explanation of how to replace a failed disk.

OK, so life is back to normal.  In the future, I want an email alert when Storage Spaces writes an error or warning into its Windows event log...here are the scripts:

c:\email.ps1
Function Send-Email($EmailSubject, $EmailBody)
{
    # Plain-text password for brevity; anyone who can read this file can read it
    $Username = "jeremy@comcast.net"
    $Password = ConvertTo-SecureString 'MyPassword' -AsPlainText -Force
    $cred = New-Object System.Management.Automation.PSCredential $Username, $Password

    # The display-name form of -From must be quoted, otherwise PowerShell parses < as an operator
    Send-MailMessage -From "VM-HOST1 <jeremy@comcast.net>" -To "jeremy@mydomain.com" -Subject $EmailSubject -Body $EmailBody -SmtpServer smtp.comcast.net -Port 587 -UseSsl -Credential $cred
}


c:\monitor-storagespaces.ps1
#Import the function above
. "C:\Email.ps1"

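# Critical, error and warning events (levels 1-3) from the last 5 minutes
$MyFilter = @{LogName='Microsoft-Windows-StorageSpaces-Driver/Operational';Level=1,2,3;StartTime=(Get-Date).AddMinutes(-5)}

# Get-WinEvent reports an error when nothing matches the filter, so silence that case
Get-WinEvent -FilterHashtable $MyFilter -ErrorAction SilentlyContinue | ForEach-Object {
    $EmailSubject = "Storage Spaces " + $_.LevelDisplayName + " (" + $_.Id + ")"
    $EmailBody = $_.Message
    Send-Email $EmailSubject $EmailBody
}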


c:\create-task.ps1
# Start 5 minutes from now and repeat every 5 minutes for roughly 10 years
$T = New-ScheduledTaskTrigger -Once -At (Get-Date).AddMinutes(5) -RepetitionInterval (New-TimeSpan -Minutes 5) -RepetitionDuration (New-TimeSpan -Days 3650)

$A = New-ScheduledTaskAction -execute "Powershell.exe" -argument "-nologo -noprofile -noninteractive -ExecutionPolicy Bypass -File C:\Monitor-StorageSpaces.ps1"
Register-ScheduledTask -TaskName "Monitor Storage Spaces (setup by Jeremy)" -Trigger $T -Action $A -User "NT AUTHORITY\SYSTEM" -RunLevel Highest

11/28/14

Misc notes

Beginning on Wednesday night and continuing today, I’ve been working on my home lab setup.  Here are things I’ve been learning.

– You can’t set a “default domain” for email addresses in Office 365 if you’re using AD FS.  Instead, you’re supposed to define a user’s primary email address in the on-prem Active Directory and let DirSync send it up to the cloud.

– So I set the correct email address for my account in Active Directory, went to bed, woke up the next morning and wondered “Why hasn’t it been updated in O365?”!  Well, for one, the “Forefront Identity Manager Synchronization Service” was stopped on the box that has DirSync installed…so I started it…and then wondered why the status screen had so many “stopped-extension-dll-exception” lines…that was because the username was incorrect for the O365 account that DirSync was trying to use…fixed that…then I *thought* I was executing some manual syncs, but actually wasn’t – here’s the correct sequence of syncs when DirSync is working properly:


– The AD FS sign-on page wasn’t loading after I’d been tinkering with Failover Clustering…found that the AD FS service had been set to manual!  Fixed that.

– Glanced at a current sFlow report from the switch and thought “Why is my laptop sending so much UDP traffic to 10.10.10.22?”!  It turns out that Windows enables SNMP on TCP/IP printer ports so it can see if they’re online or offline…however, you then wind up with this chatty behavior…so I turned off SNMP on all my TCP/IP printer ports.

– Spent a bunch of time tinkering with IGMP multicast in both the NLB cluster for ADFS and inside the HP switch…found it to be temperamental and essentially worthless when I tested it by pulling the plug on a node.  Switched back to basic multicast on a dedicated VLAN…which works great.

– Placed the wireless network on its own VLAN and isolated it with Sonicwall firewall rules…added a rule to allow my laptop’s IP address to pass through to the LAN…created a DHCP reservation for the MAC address of my laptop.

– The clock on my laptop (a domain workstation) had drifted several minutes behind, so I logged into a domain controller and followed instructions to run these w32tm commands:

w32tm /config /manualpeerlist:0.us.pool.ntp.org /syncfromflags:MANUAL
w32tm /config /update
w32tm /resync


…this worked fine on the domain controller, but when I ran w32tm /resync on my laptop, there was no change to the clock…why?!  Well, I had assumed that I knew which server held the PDC emulator role on my network, but I assumed wrong.  In ADUC, I right-clicked the domain at the top, clicked Operations Masters, and realized that my *other* domain controller is the PDC emulator, which sits at the top of the domain’s time hierarchy and is ultimately where domain workstations get their time…so I corrected its time with the commands above and then ran w32tm /resync on my laptop successfully.

11/21/2014 -

Last Wednesday night Dustin or Kalen told me about “NetFlow”, a traffic reporting technology baked into Cisco hardware.  “sFlow” is a similar technology, originally developed by InMon and supported on HP switches, which samples packets rather than reporting on every single one: its advocates describe this as a more “scalable” approach, its detractors state that this reduces precision.  Anyway, I have a shiny new HP switch on my hands and am eager to learn this sFlow stuff!

This blog post introduces sFlow by comparing it to jelly beans.  This sflow.org essay describes the math behind sampling.  On the switch, I need to configure a “sampling rate” and a “polling interval”.  The sampling rate is the X in “sample 1 packet out of every X”; the polling interval is how often the switch sends its interface counters to a collector server.
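To get a feel for the precision trade-off, here is a minimal Python sketch of the sampling arithmetic. It assumes the rule of thumb from the sflow.org sampling material as I recall it (the error at 95% confidence is at most about 196 * sqrt(1/c) percent, where c is the number of samples seen for a class of traffic); the function names are my own and are not part of any sFlow tool.

```python
import math


def estimate_total(samples_seen: int, sampling_rate: int) -> int:
    """Scale a sample count up to an estimated packet total for 1-in-N sampling."""
    return samples_seen * sampling_rate


def percent_error_bound(samples_seen: int) -> float:
    """Approximate 95%-confidence error bound in percent: 196 * sqrt(1 / c).

    Assumption: this is the rule of thumb from the sflow.org sampling essay.
    """
    if samples_seen <= 0:
        raise ValueError("need at least one sample")
    return 196.0 * math.sqrt(1.0 / samples_seen)


if __name__ == "__main__":
    rate = 50  # a 1-in-50 sampling rate, used here purely as an illustration
    for c in (100, 1_000, 10_000):
        total = estimate_total(c, rate)
        bound = percent_error_bound(c)
        print(f"{c} samples -> about {total} packets, within roughly {bound:.1f}%")
```

The point to notice is that the error depends on how many samples were collected, not on the sampling rate itself, so a busy link can be sampled sparsely and still give a usable estimate.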

To configure sFlow on my ProCurve switch, I found a PDF file called “Traffic monitoring on ProCurve switches with sFlow and InMon Traffic Sentinel”.

sFlow is more efficient than SNMP.  Here’s a guide to sampling rates.

SSH’d to the switch’s IP with PuTTY and logged in with “admin” and no password.  The HP CLI includes tab-completion and instantly shows command options when you append a ? to any valid command.

First, wanted to enable sFlow:

enable
config
sflow 1 destination 192.168.x.x
sflow 1 sampling all 50 (the lowest allowed value in packets)
sflow 1 polling all 20 (the lowest allowed value in seconds)

…then set the date/time:

time 11/21/2014 17:28
time timezone -480
(for Pacific Standard)

Lastly, saved the running configuration to the startup configuration:

write memory

To view all the sFlow data coming out of the switch, you need software.  In an enterprise, I think I’d like to try “Traffic Sentinel”.  Tonight, I tried out Plixer’s Scrutinizer, but found it too resource-intensive and slow.  By contrast, the free Java-based sFlowTrend by InMon works well.  One way of viewing the traffic on your switch(es) is with a “network circle”, and that made me wonder why my computer was connecting to interesting hostnames ending in 1e100.net.  Here’s why.

11/20/2014 -

This evening I received a Sonicwall TZ 105 ($193).  Created a MySonicwall.com account which let me immediately register the device, but it took several minutes before I could log in to the web portal to download firmware.  It appears that I’m licensed to download firmware updates for only 90 days.  Applied latest firmware to the Sonicwall (5.9.6).

Applied latest firmware to the HP ProCurve switch (YA.15.16), downloaded from here.

8/25/2014 -

Recent projects at work:
  • Migrated a metal-roofing company’s email to Office 365 (10 users) and virtualized their servers onto new hardware.  Installed Veeam with High-Rely drives.
  • Migrated a concrete washout company’s email to Office 365 (28 users).
  • Installed a Bluesocket wireless access point for a winery (it was plagued by so many delays that I bought flowers for the manager as an apology).
  • Moved the servers of a company that monitors vibrations for the Alaskan Way Viaduct.
  • Troubleshot dropped VoIP calls for an Alaskan seafood company.
  • Audited Microsoft licensing compliance at a flooring company.
  • Upgraded Asterisk to fix poor call quality for a property management client.
  • Set up an automatically-deployed SSL VPN for a satellite company.