Posts

Showing posts from 2017

ESXi host not reachable on Network – How to Troubleshoot

This post describes a recent troubleshooting experience in which an ESXi host dropped off the network because of a pause flood on HP Virtual Connect. Most VMware administrators already know the basic network checks: ping the host, then look for physical NIC failures, cabling problems, switch port failures, or even a router failure. This post does not walk through those basics. I got an alert from the monitoring team that one of the ESXi hosts was not reachable on the network. My first thought was a PSOD (Purple Screen of Death) on the host, so I planned to reboot the host to clear it. When I connected to the host's iLO, however, the host was up, yet it still did not respond to ping. I suspected the network adapter, but it was fine. I then checked the host's physical cabling, which was also fine, and the network team confirmed that there were no switch port failures.

Troubleshooting an ESXi/ESX host in non responding state

Error: Unable to access the specified host, either it doesn't exist, the server software is not responding, or there is a network problem
Resolution
Validate that each troubleshooting step below is true for your environment. Each step provides instructions or a link to a document to eliminate possible causes and take corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. After each step, try to connect to vCenter Server. Do not skip a step.
VMware ESX/ESXi host that is in a Not Responding state
1. Verify that the ESXi host is in a powered ON state.
2. Verify that the ESXi host can be reconnected, or if reconnecting the ESXi host resolves the issue.
3. Verify that the ESXi host is able to respond back to vCenter Server at the correct IP address. If vCenter Server does not receive heartbeats from the ESXi host, it goes into a not responding state.
4. Verify that network connectivity exists from
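A connectivity check like the ones listed above can be approximated from a management workstation with a plain TCP probe. The following is a minimal sketch, assuming the host's management interface listens on TCP 443; the function name, default port, and timeout are illustrative choices, not part of the original article. A successful probe only shows that the port is reachable, not that heartbeats are arriving at vCenter Server.

```python
import socket


def tcp_port_open(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: port 443 is an assumed default for the
    management interface, not a value taken from the article.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False
```

Calling tcp_port_open with the host's management IP gives a quick yes/no answer before moving on to the later steps.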

VPXD, VPXA and HOSTD

VPXD: This is the vCenter Server service. If this service is stopped, you cannot connect to vCenter Server through the vSphere Client.
VPXA: This is the vCenter Server agent, also known as the mini vCenter Server. It is installed on each ESX server that is managed by vCenter Server. Management actions performed through vCenter Server (for example, increasing or decreasing RAM and disk, changing cluster settings, or running vMotion) reach the host through this agent: it receives the request from vCenter Server and passes it on to hostd.
HOSTD: This is the agent of the ESX server itself. VPXA passes the request to hostd, and hostd passes it to the kernel of the ESX server.
Hope this gives you a clear explanation of the difference between them.

"I copied it" VS "I moved it"

To make a backup copy of a virtual machine created with VMware, just copy its folder to another location. When you power on the copy (open the VM), you will be asked whether you moved the virtual machine or copied it. Select "I moved it" to keep all of the settings the same. If you select "I copied it", a new UUID and MAC address are generated, which can trigger Windows Activation and can cause Linux machines to have problems with their Ethernet devices. A backup opened with "I moved it" cannot run at the same time as the original, because the two machines would have conflicting MAC addresses on your network. If you need to run the backed-up virtual machine at the same time as the original, open it with the "I copied it" option. Alternatively, you can run a backup script through the VMware Consolidated Backup framework.

Configuring vSphere Replication fails with the error: Target disk UUID validation failed

Symptoms
Cannot configure vSphere Replication
Configuring vSphere Replication fails with the error
You see the error: Target disk UUID validation failed
Purpose
This article provides steps to resolve the issue when configuring vSphere Replication fails with the Target disk UUID validation failed error.
Cause
vSphere Replication allows you to copy your virtual disk files to the remote datacenter and point to them as replication seeds when configuring replication, to avoid network bandwidth consumption. vSphere Replication compares the differences and replicates only the changed blocks. This error occurs if the virtual machine's disk UUID is different from that of the source disk.
This issue occurs during common procedures such as:
Cloning the source virtual machine from vCenter Server and then transferring the files to the DR site
If the copied virtual machine was registered manually and powered on with virtual machine answer I copie

The available memory resources in the parent resource pool are insufficient for the operation

I deployed a new VM but was unable to power it on; vCenter showed the error named in the title. This issue is caused by high memory reservations set on a few virtual machines. That memory is held for those machines even when it is not being used, so the new virtual machine does not have enough memory to power on. The suggestion is to reduce the reservations on those virtual machines so that some memory becomes available for the new virtual machine to power on. For more information about the Admission Control slot calculation, refer to page 20 of the vSphere HA guide at the following URL: http://pubs.vmware.com/vsphere-55/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-server-55-availability-guide.pdf
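The reasoning above comes down to simple arithmetic, sketched here under simplifying assumptions. The function names and all figures are hypothetical, and the real power-on check also takes expandable reservations and HA admission control into account, which this sketch ignores.

```python
def unreserved_mb(pool_capacity_mb, reservations_mb):
    """Memory in the parent pool that is not yet claimed by a reservation."""
    return pool_capacity_mb - sum(reservations_mb)


def can_power_on(pool_capacity_mb, reservations_mb, vm_reservation_mb, vm_overhead_mb):
    """Simplified check: the new VM fits only if its own reservation plus
    its memory overhead is covered by the unreserved memory in the pool."""
    needed = vm_reservation_mb + vm_overhead_mb
    return needed <= unreserved_mb(pool_capacity_mb, reservations_mb)


# Hypothetical pool of 42 GB with three heavily reserved VMs
pool_mb = 43008
before = [16384, 16384, 8192]   # reservations as originally configured
after = [8192, 8192, 8192]      # reservations after being reduced

blocked = can_power_on(pool_mb, before, vm_reservation_mb=4096, vm_overhead_mb=200)
allowed = can_power_on(pool_mb, after, vm_reservation_mb=4096, vm_overhead_mb=200)
```

With the original reservations only 2048 MB is unreserved, so the new VM is blocked; after the reservations are reduced, 18432 MB is free and the same VM fits.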

"Unable to apply DRS resource settings on host" error in vCenter Server

If the problem persists after restarting the management agents:
1. Put the host into maintenance mode. Note: If DRS does not migrate the virtual machines to other hosts, migrate them manually.
2. Verify that EVC is enabled on the cluster.
3. If EVC is enabled, navigate to the Edit Settings of the cluster.
4. Click VMware EVC and confirm that it is currently enabled and has the proper setting for the cluster.
5. Ensure there are no virtual machines running on the host.
6. Remove the ESXi/ESX host by dragging it out of the cluster.
7. Re-add the ESXi/ESX host by dragging it into the cluster.
8. Exit from maintenance mode.
To work around this issue, run the rm /etc/vmware/hostd/pools.xml command on the ESXi/ESX host, then restart the management agents (hostd and vpxa) on the ESXi/ESX host. If the issue persists, remove th

HA agent on the host failed – what means the different possible messages?

I have long wanted to write down, for myself, the different possible states of the HA agent and the messages shown when the HA agent on a host fails. If you find this elsewhere, don’t worry; there are many VMware resources out there written by many competent folks. For my own study purposes I need to have it written on my website. The VMware vSphere HA agent can be in several different states. What do the different messages mean when one of the hosts inside an HA cluster fails? Depending on the failure scenario, different alert messages are shown. Alerts concerning the HA agent include: network partitioned state, agent unreachable state, host failed state, network isolated state, uninitialization/initialization state … While some of them are quite self-explanatory, I will try to focus on those which aren’t so clear. In the article How to configure VMware High Availability (HA) cluster I walk you through the components which are part o

ESXi : Lost uplink redundancy on virtual switch "vSwitch0". Physical NIC vmnic0 is down

We suddenly got this alert: Lost uplink redundancy on virtual switch "vSwitch0". Physical NIC vmnic0 is down. Affected portgroups: "vMotion", "Management Network".
Troubleshooting steps:
1. Select the affected ESXi host, click Configuration, and select Networking.
2. Click vSwitch0 Properties, click Network Adapters, select vmnic0, click Remove, and click Close.
3. Select Networking again, click vSwitch0 Properties, click Network Adapters, click Add, select vmnic0, click Next, click Next, and click Finish.
4. On the Ports tab of the vSwitch properties, select vMotion, click Edit, click NIC Teaming, select vmnic1, click Move Up to move it to Active Adapters, select vmnic0, click Move Down to move it to Standby Adapters, and click OK.
5. On the Ports tab of the vSwitch properties, select Management Network, click Edit, click NIC Teaming, select vmnic1, click Move Up to move it to Active Adapters, select o

"Host IPMI system event log status" alarm in vCenter Server

To resolve this issue, ensure that the NTP settings are correct on the ESXi host, clear the IPMI System Event log file, and reset the sensors.
To clear the IPMI System Event log file and reset the sensors:
1. Open vCenter Server using the vSphere Client.
2. In the vCenter Server Inventory, select the ESXi/ESX host.
3. Click the Hardware Status tab.
4. Click System Event log under View.
5. Click Reset Event Log. The red alert is removed from the System Event log.
6. Click Reset Sensors to reset the host sensors.
Starting with ESXi 5.1 U2 and 5.5 P01, you can use the localcli command line to clear the IPMI SEL logs.
To clear the IPMI SEL logs in ESXi 5.1 and later:
1. Connect to the ESXi host through SSH.
2. Run this command: localcli hardware ipmi sel clear
If you find an incorrect date and if you are unable to reset the logs:
In ESX - Restart the management agents and sfcbd-watchdog. To restart sfcbd-watchdog, run this command from the ESX console: /etc

Upgrade vCenter Agent task in progress – vCenter 5.0 Update1

A common issue in vSphere 5.0 Update 1 is the task “Upgrade vCenter Agents on cluster hosts” staying In Progress with no status change, even after a few days. The status remains the same even after restarting the management agents, the host, and vCenter Server. The issue is related to the AgentUpgrade.autoUpgradeAgents parameter, which is set to False by default while vCenter keeps trying to upgrade the VPXA agent on the hosts. Setting AgentUpgrade.autoUpgradeAgents to True and restarting vCenter Server or the vCenter services fixes the problem, and the status returns to normal.
To change the setting:
1. Log in to vCenter Server.
2. Click Administration > vCenter Server Settings > Advanced Settings.
3. Change AgentUpgrade.autoUpgradeAgents to “True”.
4. Restart the vCenter services or vCenter Server.

Raw Device Mapping option is grayed out

The option is grayed out because no LUN is available to map as an RDM. To fix it, disable storage I/O and rescan the datastore.

Troubleshooting VPX_HIST_STAT table sizes in VMware vCenter Server 5.1

In previous versions of vCenter Server, the vCenter database contained four VPX_HIST_STATx tables where past Day, Week, Month and Year performance statistics were collected and stored. In vCenter Server 5.1, performance metrics are now stored with multiple dynamic tables.
To create a temporary table that contains the table sizes of all tables in the vCenter Server database, run the query below. After the temporary table is completed, query with the select statement included below to determine table sizes for VPX_HIST_STATx tables.
1. Open a new SQL Query.
2. Verify that the database selected is the vCenter Server database.
3. From the SQL Management Studio, click New Query.
4. Copy the query below into the query pane.
5. Click Execute.
Note: Replace the string VPX_HIST_STAT1% with the VPX_HIST_STATx table you wish to query, for example: 'VPX_HIST_STAT2%', 'VPX_HIST_STAT3%', 'VPX_HIST_STAT4%'
For SQL:
create table #TEMP ([ NAME] NVARCHAR(128), [ROWS

ESXi hosts failing to authenticate against Active Directory

You may notice that an ESXi host fails to authenticate against AD when the lsassd service fails. To resolve the issue:
1. Place the host in maintenance mode.
2. Connect to the host using SSH.
3. Stop the lsassd service by running /etc/init.d/lsassd stop
4. Copy the file /etc/krb5.conf from a host where authentication works fine.
5. Start the lsassd service by running /etc/init.d/lsassd start

How to shrink size of a vmdk file in ESXi 5.0?

When we try to reduce or shrink the size of an existing vmdk file, the operation fails.
Resolution
There is no option to reduce the size using the vSphere Client; you need to use PuTTY or the CLI. Remember to first delete unwanted data from the OS and shrink the partition from inside the guest using the diskmgmt.msc tool. After shrinking, perform the steps below:
1. Log in to the ESXi host using PuTTY.
2. Browse to the vmdk location (e.g. cd /vmfs/volumes/datastore1/VMname).
3. Take a backup of the existing vmname.vmdk and vmname-flat.vmdk files using the cp command (cp filename backup_filename).
4. Open the vmdk file in the vi editor: vi vmname.vmdk
5. Modify the value corresponding to RW to the required disk size. To shrink the file to x GB, use the value x*1024*1024*2. For example, to shrink the disk to 25 GB, use 25*1024*1024*2 = 52428800.
6. Once finished, save the file and use the vmkfstools command to clone the disk using the new settings. vmkfstools -i vmname.
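The x*1024*1024*2 formula in the steps above works because the RW value in the descriptor file is a count of 512-byte sectors, so one GB equals 1024*1024*2 sectors. The following is a small sketch of that arithmetic; the function name is illustrative and not part of any VMware tool.

```python
SECTOR_BYTES = 512  # the RW value in the vmdk descriptor counts 512-byte sectors


def vmdk_sectors(size_gb: int) -> int:
    """Return the sector count to write next to RW for a disk of size_gb GB."""
    size_bytes = size_gb * 1024 * 1024 * 1024
    return size_bytes // SECTOR_BYTES


sectors_25gb = vmdk_sectors(25)  # 52428800, matching 25*1024*1024*2
```

Computing the value this way avoids typing errors when editing the descriptor by hand.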

VMs will be shown as inaccessible in the vCenter

Reason
A VM can become inaccessible for any of the reasons below:
Issue with the ESXi servers
Issue with the vCenter
Issue with the datastore
Resolution
In all three cases, the same three troubleshooting steps apply.
The first step is to restart the management agents on the ESXi host.
1. Log in to the ESXi host using SSH.
2. Run either of the commands below to restart the management agents:
/etc/init.d/hostd restart
/etc/init.d/vpxa restart
OR
services.sh restart
If this step does not resolve the issue, try the second step.
The second step is to remove the VM from the inventory and add it back using the vmx file.
1. Right-click the affected VM.
2. Choose the option 'Remove from the Inventory' (be cautious with this action; do not delete the VM).
3. After this step, go to the vmx location of the VM.
4. Right-click the VM's vmx file and choose 'Add to the Inventory'.
This step will definitely resolve your issue. But this step works fine only when we

Switch Configuration for storage and vSphere

Fabric OS (goals)
Fabos Version 5.3.1
Welcome to VMware Consultancy Services(vCloud Automation Center).
goals login: admin
Password:
goals:admin> switchshow
switchName:     goals
switchType:     26.2
switchState:    Online
switchMode:     Native
switchRole:     Principal
switchDomain:   1
switchId:       fffc01
switchWwn:      10:00:00:05:1e:35:7f:aa
zoning:         ON (configfile)
switchBeacon:   OFF
Area Port Media Speed State     Proto
=====================================
  0   0   id    N2   Online           F-Port  50:06:01:68:3e:e0:38:39
  1   1   --    N2   No_Module
  2   2   id    N2   Online           F-Port  10:00:00:00:c9:44:f1:af
  3   3   --    N2   No_Module
  4   4   --    N2   No_Module
  5   5   --    N2   No_Module
  6   6   --    N2   No_Module
  7   7   --    N2   No_Module
  8   8   id    N2   Online           F-Port  50:06:01:60:3e:e0:38:39
  9   9   --    N2   No_Module
 10  10   --    N2   No_Module
 11  11   --    N2   No_M
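When checking which ports have a device logged in, the port table of a saved switchshow capture can be filtered with a few lines of Python. This is a rough sketch that assumes the whitespace-separated column layout shown above (Area, Port, Media, Speed, State, Proto, WWN); the function name and the embedded sample are illustrative only.

```python
SAMPLE = """\
Area Port Media Speed State     Proto
=====================================
  0   0   id    N2   Online           F-Port  50:06:01:68:3e:e0:38:39
  1   1   --    N2   No_Module
  2   2   id    N2   Online           F-Port  10:00:00:00:c9:44:f1:af
  8   8   id    N2   Online           F-Port  50:06:01:60:3e:e0:38:39
"""


def online_ports(switchshow_text):
    """Return (port, wwn) tuples for every port whose State column is Online."""
    ports = []
    for line in switchshow_text.splitlines():
        fields = line.split()
        # Port rows start with a numeric Area column; skip headers and rulers
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        if fields[4] != "Online":
            continue
        # The WWN is the seventh column when a device is logged in
        wwn = fields[6] if len(fields) > 6 else None
        ports.append((int(fields[1]), wwn))
    return ports


logged_in = online_ports(SAMPLE)
```

The resulting list pairs each online port number with the WWN attached to it, which is the information needed when building zones for the storage array and the host HBAs.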