Re: GroupStudy Server crash

From: Elliott Reyes <fontananetworkengineer_at_gmail.com>
Date: Thu, 12 May 2011 15:35:26 -0700

We should take up a collection to help Paul out.

On Thu, May 12, 2011 at 3:32 PM, Haroon <itguy.pro_at_gmail.com> wrote:

> Paul,
>
> That's some ordeal... sorry to hear!
>
> Your backup strategy is great. I also use rsync to back up over 110 GB to
> remote servers twice a day, but my stuff (Linux server, MySQL DBs, etc.) is
> not on a VM.
>
> ESXi 3.5 comes with very few tools for recovery.
>
> Have you tried enabling SSH on ESXi? That may give you more control and
> let you get under the hood as far as VMware is concerned.
>
> As for your datacenters, which datacenter do you use in Dallas? SoftLayer?
> Instead of copying 200 GB to your house (residential cable/DSL?), I can
> recommend a vendor that I use to back up data right in the Dallas
> datacenter using rsync.
>
> Please keep us posted.
>
> Thanks,
>
> Haroon
>
>
> On Thu, May 12, 2011 at 2:00 PM, ccieagent <ccieagent_at_verizon.net> wrote:
>
> > Paul,
> > Good to see you got it working again. I was beginning to wonder if the
> > pressure of our joining GroupStudy and the CCIE Flyer was going to
> > happen! LOL
> > Sorry to hear about your ordeal. Talk to you soon.
> >
> > -----Original Message-----
> > From: nobody_at_groupstudy.com [mailto:nobody_at_groupstudy.com] On Behalf Of
> > Paul Borghese
> > Sent: Thursday, May 12, 2011 4:40 PM
> > To: ccielab_at_groupstudy.com
> > Subject: GroupStudy Server crash
> >
> > On Monday one of the GroupStudy servers in Atlanta had a catastrophic
> > disk failure. This brought down the entire site and every mailing list.
> > The good news is that this list was only minimally affected by the
> > failure and should (may?) be back to normal shortly. I apologize for any
> > inconvenience.
> >
> > Since this is a technical list, I will go into the details for those who
> > are interested. Please comment if you have any suggestions. On Monday I
> > decided it had been too long since I backed up the Atlanta server, which
> > now runs as a VM on a VMware ESXi 3.5 server. At one time I had an
> > elaborate backup mechanism where every evening critical database files
> > were copied to a directory and then rsynced to a Linux server at my
> > house. But over the last two years we moved a number of times, and the
> > remote server is still packed away in a storage unit.
> >
> > I decided that since it was a VM I could simply scp the .vmdk file to my
> > house, thus creating a perfect backup. The problem was that the VM
> > contained multiple 200 GB files. I noticed that part of the storage size
> > was due to a snapshot that had been taken of the GroupStudy server. To
> > reduce the size of the backup, I decided to delete the snapshot. This
> > action should have merged the snapshot into the primary disk. Instead it
> > removed all the .vmdk disks and left me with a .vmdk file that was a
> > mere 26k (instead of 200+ GB)!! Oops. The GroupStudy disk was totally
> > destroyed.
> >
> > If the VMware server had been running on top of another OS, I could have
> > simply gone in with an undelete program and tried to recover the files.
> > But VMware ESXi runs a proprietary, locked-down OS with very few tools. I
> > hate to say this, but my last backup of the server was made two years
> > ago, just before I moved (yeah, I know, but hey, it is a hobby). I called
> > a number of people asking for advice. I would like to thank in particular
> > Darby Weaver, who found a VMware guy who was quite knowledgeable. But
> > even he was stumped.
> >
> > I decided to take it slow and not do anything that might prevent a
> > recovery. In the evenings (I still have a day job :-) ) I started reading
> > everything I could find about VMware ESXi. I have now read more VMware
> > knowledge base articles than I care to admit. I am thinking about seeing
> > what certifications VMware offers and simply taking the test.
> >
> > VMware does offer support, for $300/call, which I really did not want to
> > spend. But it was obvious I was not getting anywhere trying to figure it
> > out myself, so I reluctantly plunked down my credit card. In my research
> > I found that VMware did offer an undelete program for the ESX platform. I
> > was hopeful my support request would, at minimum, give me access to the
> > undelete program for ESXi, or maybe some internal-use-only magic VMware
> > has in its back pocket. But in all honesty, they were not much help.
> > The VMware tech support is not that great. Cisco TAC will escalate your
> > problem until it is fixed. VMware support seems to be for people who
> > don't know how to read manuals. In all fairness, it could be that my
> > $300 support call does not go to the same people who handle high-paying
> > corporate outages. VMware suggested I call a data recovery service. The
> > data recovery service said this happens all the time and that they could
> > recover the data for $3k-$5k. I simply do not want to spend that kind of
> > money.
> >
> > The VMware server contains two 1 TB disks, a primary and an extra disk.
> > The original GroupStudy VM was running on the primary. I used the
> > two-year-old backup disks to create a new GroupStudy VM on the extra
> > disk, thus preserving the primary to the best of my ability. Of course
> > the backups were created before migrating to VMware, so none of the
> > kernel drivers worked out of the box.
> >
> > After I fixed the kernel and initrd boot files, the GroupStudy website
> > was restored, literally to the date of the Obama inauguration. So
> > welcome back to January 2009 (quick, buy Apple stock and gold!). With
> > regard to the CCIE list, this actually has less impact than you would
> > think. The actual mailing list runs off a Linux server in Dallas and has
> > been unaffected. What we lost was two years of archives. But I may be
> > able to get them back, as the Dallas server has copies of the archives
> > in a MySQL DB. If they are complete, I can simply write a Perl script to
> > extract the archives to a format the website can use.
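
A rough sketch of what such an extraction script might look like, written in Python rather than the Perl mentioned above. Everything specific here is an assumption: the row layout (sender, subject, sent_at, body) is hypothetical, the MySQL query that would produce the rows is left out, and mbox is only assumed as the target because archive tools such as hypermail can read it.

```python
import mailbox
from datetime import datetime, timezone
from email.message import EmailMessage
from email.utils import format_datetime


def rows_to_mbox(rows, mbox_path):
    """Append (sender, subject, sent_at, body) rows to an mbox file.

    The row layout is hypothetical; the real database schema is unknown.
    Returns the number of messages written.
    """
    box = mailbox.mbox(mbox_path)  # creates the file if it does not exist
    count = 0
    try:
        for sender, subject, sent_at, body in rows:
            msg = EmailMessage()
            msg["From"] = sender
            msg["Subject"] = subject
            msg["Date"] = format_datetime(sent_at)
            msg.set_content(body)
            box.add(msg)
            count += 1
        box.flush()  # write pending messages to disk
    finally:
        box.close()
    return count


# Example rows standing in for the result of a database query.
sample_rows = [
    ("alice@example.com", "OSPF question",
     datetime(2010, 3, 1, 12, 0, tzinfo=timezone.utc), "First message.\n"),
    ("bob@example.com", "Re: OSPF question",
     datetime(2010, 3, 1, 13, 0, tzinfo=timezone.utc), "A reply.\n"),
]
```

Calling rows_to_mbox(sample_rows, "ccielab.mbox") would append both sample messages to that file; whether the archives in the database are complete enough for this is exactly the open question.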
> >
> > But there are other lists that are affected more, and frankly, being a
> > techie, I hate to give up. We know the data is most likely still on the
> > disk. We just need to find it. I feel that with the disk, a hex editor,
> > and some voodoo I could recover the data I need. Frankly, I only need
> > one of the backup files that were created daily, not the entire disk.
> > The problem is that the primary disk is where the VMware OS is located,
> > so I can't simply remove it. And it currently resides in a data center
> > in Atlanta, a 10-hour drive from my house!
> >
> > So now I am trying to MacGyver my way to the disk. The extra disk
> > contains about 400 GB of free space. It turns out VMware does offer disk
> > dump (dd) and gzip on the ESXi platform. I am disk dumping the entire
> > primary hard drive to the extra drive, using gzip to compress the data.
> > I am praying for a much better than 2:1 compression ratio! If that works
> > I will download the dd image and restore it to another 1 TB hard drive,
> > thus creating a copy of the primary drive. Then I need to figure out the
> > VMware partition tables and vmdk disk formats. If the 400 GB of free
> > space is not enough, I am considering mounting an Amazon EC2 NFS server
> > on the VMware file system and trying again. I also called the VMware
> > support engineer (that poor guy) and asked him to send me any
> > documentation he can find about the VMFS and VMDK file structures. I
> > also found an open source VMFS driver (http://code.google.com/p/vmfs/)
> > that may be of use.
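
A quick way to sanity-check the space question, sketched in Python. It assumes the primary is a 1 TB disk (as the plan to restore to "another 1 TB hard drive" suggests) and that about 400 GB is free. The function names are made up for illustration, and this is not what runs on the ESXi console, where dd and gzip do the actual work.

```python
import gzip
import os


def required_ratio(source_bytes, free_bytes):
    """Minimum compression ratio (source:compressed) for the image to fit."""
    if free_bytes <= 0:
        raise ValueError("no free space available")
    return source_bytes / free_bytes


def gzip_image(src_path, dst_path, chunk_size=1024 * 1024):
    """Stream src_path into a gzip file, similar in spirit to dd piped to gzip.

    Returns (bytes_read, compressed_size_in_bytes).
    """
    bytes_read = 0
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            bytes_read += len(chunk)
    # Measure the output only after the gzip stream has been closed.
    return bytes_read, os.path.getsize(dst_path)


GB = 10**9
# 1 TB to image, 400 GB free (both figures are assumptions, see above).
needed = required_ratio(1000 * GB, 400 * GB)  # 2.5
```

Under those assumptions the image only fits at 2.5:1 or better, so 2:1 would not be enough. How well a real disk compresses depends heavily on how much of it is empty or zero-filled.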
> >
> > So if you have any suggestions, please send them to me! No matter how
> > bad it got, I kept on thinking at least I am not Sony!
> >
> > Paul Borghese
> >
> >
> > Blogs and organic groups at http://www.ccie.net
> >
> > _______________________________________________________________________
> > Subscription information may be found at:
> > http://www.groupstudy.com/list/CCIELab.html
> >
>

Blogs and organic groups at http://www.ccie.net
Received on Thu May 12 2011 - 15:35:26 ART

This archive was generated by hypermail 2.2.0 : Wed Jun 01 2011 - 09:01:11 ART