Creating Dynamic VMs Infrastructure With Xen Hosts

Introduction

In today’s world, the server infrastructure machines are either in on-premise data centers, private data centers or public cloud data centers. These machines are either physical bare metal machines, virtual machines (VMs) with hypervisors or small containers like docker containers on top of physical or virtual machines. These machines physically might be in the local lab. In a private data center scenario, where your own procured hosts are placed in a physical space that is shared in a third party data center and connected remotely. Whereas in the public data centers like AWS, GCP, Azure, OCI the machines are either reserved or created on-demand for the highly scalable needs connecting remotely. Each of these have its own advantages w.r.t scalability, security, reliability, management and costs associated with those infrastructures.

The product development environment teams might need many servers during the SDLC process. Let us say if one had chosen the private data center with their own physical machines along with Xen Servers. Now, the challenge is how the VMs lifecycle is managed for provision or termination in similar to cloud environments with lean and agile processes.

This document is aiming to provide the basic infrastructure model, architecture, minimum APIs and sample code snippets so that one can easily build dynamic infrastructure environments.

Benefits

First let us understand the typical sequence of the steps followed in this server’s infrastructure process. You can recollect it as below.

1. Procurement of the new machines by IT
2. Host virtualization – Install Xen Server and Create VM Templates by IT
3. Static VMs request by Dev and test teams via (say JIRA) tickets to IT
4. Maintain the received VM IPs in a database or a static file or hardcoded in configuration files or CI tools like in Jenkins config.xml
5. Monitor the VMs for health checks to make sure these are healthy before using to install the products
6. Cleanup or uninstall before or after Server installations
7. Windows might need some registry cleanup before installing the product
8. Fixed allocation of VMs to an area or a team or dedicated to an engineer might have been done

Now, how can you make this process more lean and agile? Can you eliminate most of the above steps with simple automation?

Yes. In our environment, we had more than 1000 VMs and tried to achieved and mainly the below.

“Disposable VMs on-demand as required during tests execution. Solve Windows cleanup issues with regular test cycles.”

As you see below, using the dynamic VMs server manager API service, 6 out of 8 steps can be eliminated and it gives the unlimited infrastructure view for the entire product team. Only the first 2 steps – procure and host virtualization are needed. In effect, this saves in terms of time and cost!

Typical flow to get infrastructure

Dynamic Infrastructure model

The below picture shows our proposed infrastructure for a typical server product environments where 80% of docker containers, 15% as dynamic VMs and 5% as static pooled VMs for special cases. This distribution can be adjusted based on what works most for your environment.

Infrastructure model

From here on, we will discuss more about Dynamic VM server manager part.

Dynamic Server Manager architecture

In the dynamic VMs server manager, a simple API service where the below REST APIs can be exposed and can be used anywhere in the automated process. As the tech stack shows, python 3 and Python based Xen APIs are used for actual creation of VMs with XenServer host. Flask is being used for the REST service layer creation. The OS can be any of your product supported platforms like windows2016, centos7, centos8, debian10, ubuntu18, oel8, suse15.

Dynamic VMs server manager architecture

Save the history of the VMs to track the usage and time to provision or termination can be analyzed further. For storing the json document, Couchbase enterprise server, which is a nosql document database can be used.

Simple REST APIs

*Method*	*URI(s)*	*Purpose*
GET	/showall	Lists all VMs in json format
GET	/getavailablecount/<os>	Gets the list of available VMs count for the given <os>
GET	/getservers/<name>?os=<os> /getservers/<name>?os=<os>&count=<count> /getservers/<name>?os=<os>&count=<count>&cpus=<cpus>&mem=<memsize> /getservers/<name>?os=<os>&expiresin=<minutes>	Provisions given <count> VMs of <os>. cpus count and mem size also can be supported. expiresin parameter in minutes to get expiry (auto termination) of the VMs.
GET	/releaseservers/<name>?os=<os> /releaseservers/<name>?os=<os>&count=<count>	Terminates given <count> VMs of <os>

Pre-requirements for dynamic VM targeted Xen Hosts

Identify targeted dynamic VM Xen Hosts
Copy/create the VM templates
Move these Xen Hosts a separate VLAN/Subnet (work with IT) for IPs recycle

Implementation

At a high level –

Create functions each REST API
Call a common service to perform different REST actions.
Understand the Xen Session creation, getting the records, cloning VM from template, attaching the right disk, waiting for the VM creation and IP received; deletion of VMs, deletion of disks
Start a thread for expiry of VMs automatically
Read the common configuration such as .ini format
Understand working with Couchbase database and save documents
Test all APIs with required OSes and parameters
Fix issues if any
Perform a POC with few Xen Hosts

The below code snippets can help you to understand even better.

APIs creation

@app.route('/showall/<string:os>')
@app.route("/showall")
def showall_service(os=None):
    count, _ = get_all_xen_hosts_count(os)
    log.info("--> count: {}".format(count))
    all_vms = {}
    for xen_host_ref in range(1, count + 1):
        log.info("Getting xen_host_ref=" + str(xen_host_ref))
        all_vms[xen_host_ref] = perform_service(xen_host_ref, service_name='listvms', os=os)
    return json.dumps(all_vms, indent=2, sort_keys=True)


@app.route('/getavailablecount/<string:os>')
@app.route('/getavailablecount')
def getavailable_count_service(os='centos'):
    """
    Calculate the available count:
        Get Total CPUs, Total Memory
        Get Free CPUs, Free Memory
        Get all the VMs - CPUs and Memory allocated
        Get each OS template - CPUs and Memory
        Available count1 = (Free CPUs - VMs CPUs)/OS_Template_CPUs
        Available count2 = (Free Memory - VMs Memory)/OS_Template_Memory
        Return min(count1,count2)
    """
    count, available_counts, xen_hosts = get_all_available_count(os)
    log.info("{},{},{},{}".format(count, available_counts, xen_hosts, reserved_count))
    if count > reserved_count:
        count -= reserved_count
    log.info("Less reserved count: {},{},{},{}".format(count, available_counts, xen_hosts,
                                                 reserved_count))
    return str(count)


# /getservers/username?count=number&os=centos&ver=6&expiresin=30
@app.route('/getservers/<string:username>')
def getservers_service(username):
    global reserved_count
    if request.args.get('count'):
        vm_count = int(request.args.get('count'))
    else:
        vm_count = 1
    os_name = request.args.get('os')
    if request.args.get('cpus'):
        cpus_count = request.args.get('cpus')
    else:
        cpus_count = "default"

    if request.args.get('mem'):
        mem = request.args.get('mem')
    else:
        mem = "default"

    if request.args.get('expiresin'):
        exp = int(request.args.get('expiresin'))
    else:
        exp = MAX_EXPIRY_MINUTES

    if request.args.get('format'):
        output_format = request.args.get('format')
    else:
        output_format = "servermanager"

    xhostref = None
    if request.args.get('xhostref'):
        xhostref = request.args.get('xhostref')
    reserved_count += vm_count
    if xhostref:
        log.info("-->  VMs on given xenhost" + xhostref)
        vms_ips_list = perform_service(xhostref, 'createvm', os_name, username, vm_count,
                                       cpus=cpus_count, maxmemory=mem, expiry_minutes=exp,
                                       output_format=output_format)
        return json.dumps(vms_ips_list)

 ...


# /releaseservers/{username}
@app.route('/releaseservers/<string:username>/<string:available>')
@app.route('/releaseservers/<string:username>')
def releaseservers_service(username):
    if request.args.get('count'):
        vm_count = int(request.args.get('count'))
    else:
        vm_count = 1
    os_name = request.args.get('os')
    delete_vms_res = []
    for vm_index in range(vm_count):
        if vm_count > 1:
            vm_name = username + str(vm_index + 1)
        else:
            vm_name = username
        xen_host_ref = get_vm_existed_xenhost_ref(vm_name, 1, None)
        log.info("VM to be deleted from xhost_ref=" + str(xen_host_ref))
        if xen_host_ref != 0:
            delete_per_xen_res = perform_service(xen_host_ref, 'deletevm', os_name, vm_name, 1)
            for deleted_vm_res in delete_per_xen_res:
                delete_vms_res.append(deleted_vm_res)
    if len(delete_vms_res) < 1:
        return "Error: VM " + username + " doesn't exist"
    else:
        return json.dumps(delete_vms_res, indent=2, sort_keys=True)


def perform_service(xen_host_ref=1, service_name='list_vms', os="centos", vm_prefix_names="",
                    number_of_vms=1, cpus="default", maxmemory="default",
                    expiry_minutes=MAX_EXPIRY_MINUTES, output_format="servermanager",
                    start_suffix=0):
    xen_host = get_xen_host(xen_host_ref, os)
...

def main():
    # options = parse_arguments()
    set_log_level()
    app.run(host='0.0.0.0', debug=True)

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

@app.route('/showall/<string:os>')

@app.route("/showall")

def showall_service(os=None):

count, _ = get_all_xen_hosts_count(os)

log.info("--> count: {}".format(count))

all_vms = {}

for xen_host_ref in range(1, count + 1):

log.info("Getting xen_host_ref=" + str(xen_host_ref))

all_vms[xen_host_ref] = perform_service(xen_host_ref, service_name='listvms', os=os)

return json.dumps(all_vms, indent=2, sort_keys=True)

@app.route('/getavailablecount/<string:os>')

@app.route('/getavailablecount')

def getavailable_count_service(os='centos'):

"""

Calculate the available count:

Get Total CPUs, Total Memory

Get Free CPUs, Free Memory

Get all the VMs - CPUs and Memory allocated

Get each OS template - CPUs and Memory

Available count1 = (Free CPUs - VMs CPUs)/OS_Template_CPUs

Available count2 = (Free Memory - VMs Memory)/OS_Template_Memory

Return min(count1,count2)

"""

count, available_counts, xen_hosts = get_all_available_count(os)

log.info("{},{},{},{}".format(count, available_counts, xen_hosts, reserved_count))

if count > reserved_count:

count -= reserved_count

log.info("Less reserved count: {},{},{},{}".format(count, available_counts, xen_hosts,

reserved_count))

return str(count)

# /getservers/username?count=number&os=centos&ver=6&expiresin=30

@app.route('/getservers/<string:username>')

def getservers_service(username):

global reserved_count

if request.args.get('count'):

vm_count = int(request.args.get('count'))

else:

vm_count = 1

os_name = request.args.get('os')

if request.args.get('cpus'):

cpus_count = request.args.get('cpus')

else:

cpus_count = "default"

if request.args.get('mem'):

mem = request.args.get('mem')

else:

mem = "default"

if request.args.get('expiresin'):

exp = int(request.args.get('expiresin'))

else:

exp = MAX_EXPIRY_MINUTES

if request.args.get('format'):

output_format = request.args.get('format')

else:

output_format = "servermanager"

xhostref = None

if request.args.get('xhostref'):

xhostref = request.args.get('xhostref')

reserved_count += vm_count

if xhostref:

log.info("--> VMs on given xenhost" + xhostref)

vms_ips_list = perform_service(xhostref, 'createvm', os_name, username, vm_count,

cpus=cpus_count, maxmemory=mem, expiry_minutes=exp,

output_format=output_format)

return json.dumps(vms_ips_list)

...

# /releaseservers/{username}

@app.route('/releaseservers/<string:username>/<string:available>')

@app.route('/releaseservers/<string:username>')

def releaseservers_service(username):

if request.args.get('count'):

vm_count = int(request.args.get('count'))

else:

vm_count = 1

os_name = request.args.get('os')

delete_vms_res = []

for vm_index in range(vm_count):

if vm_count > 1:

vm_name = username + str(vm_index + 1)

else:

vm_name = username

xen_host_ref = get_vm_existed_xenhost_ref(vm_name, 1, None)

log.info("VM to be deleted from xhost_ref=" + str(xen_host_ref))

if xen_host_ref != 0:

delete_per_xen_res = perform_service(xen_host_ref, 'deletevm', os_name, vm_name, 1)

for deleted_vm_res in delete_per_xen_res:

delete_vms_res.append(deleted_vm_res)

if len(delete_vms_res) < 1:

return "Error: VM " + username + " doesn't exist"

else:

return json.dumps(delete_vms_res, indent=2, sort_keys=True)

def perform_service(xen_host_ref=1, service_name='list_vms', os="centos", vm_prefix_names="",

number_of_vms=1, cpus="default", maxmemory="default",

expiry_minutes=MAX_EXPIRY_MINUTES, output_format="servermanager",

start_suffix=0):

xen_host = get_xen_host(xen_host_ref, os)

...

def main():

# options = parse_arguments()

set_log_level()

app.run(host='0.0.0.0', debug=True)

Creation Xen session

def get_xen_session(xen_host_ref=1, os="centos"):
    xen_host = get_xen_host(xen_host_ref, os)
    if not xen_host:
        return None
    url = "http://" + xen_host['host.name']
    log.info("\nXen Server host: " + xen_host['host.name'] + "\n")
    try:
        session = XenAPI.Session(url)
        session.xenapi.login_with_password(xen_host['host.user'], xen_host['host.password'])
    except XenAPI.Failure as f:
        error = "Failed to acquire a session: {}".format(f.details)
        log.error(error)
        return error
    return session

def get_xen_session(xen_host_ref=1, os="centos"):

xen_host = get_xen_host(xen_host_ref, os)

if not xen_host:

return None

url = "http://" + xen_host['host.name']

log.info("\nXen Server host: " + xen_host['host.name'] + "\n")

try:

session = XenAPI.Session(url)

session.xenapi.login_with_password(xen_host['host.user'], xen_host['host.password'])

except XenAPI.Failure as f:

error = "Failed to acquire a session: {}".format(f.details)

log.error(error)

return error

return session

List VMs

def list_vms(session):
    vm_count = 0
    vms = session.xenapi.VM.get_all()
    log.info("Server has {} VM objects (this includes templates):".format(len(vms)))
    log.info("-----------------------------------------------------------")
    log.info("S.No.,VMname,PowerState,Vcpus,MaxMemory,Networkinfo,Description")
    log.info("-----------------------------------------------------------")

    vm_details = []

    for vm in vms:
        network_info = 'N/A'
        record = session.xenapi.VM.get_record(vm)
        if not (record["is_a_template"]) and not (record["is_control_domain"]):
            log.debug(record)
            vm_count = vm_count + 1
            name = record["name_label"]
            name_description = record["name_description"]
            power_state = record["power_state"]
            vcpus = record["VCPUs_max"]
            memory_static_max = record["memory_static_max"]
            if record["power_state"] != 'Halted':
                ip_ref = session.xenapi.VM_guest_metrics.get_record(record['guest_metrics'])
                network_info = ','.join([str(elem) for elem in ip_ref['networks'].values()])
            else:
                continue  # Listing only Running VMs

            vm_info = {'name': name, 'power_state': power_state, 'vcpus': vcpus,
                       'memory_static_max': memory_static_max, 'networkinfo': network_info,
                       'name_description': name_description}
            vm_details.append(vm_info)
            log.info(vm_info)

    log.info("Server has {} VM objects and {} templates.".format(vm_count, len(vms) - vm_count))
    log.debug(vm_details)
    return vm_details

def list_vms(session):

vm_count = 0

vms = session.xenapi.VM.get_all()

log.info("Server has {} VM objects (this includes templates):".format(len(vms)))

log.info("-----------------------------------------------------------")

log.info("S.No.,VMname,PowerState,Vcpus,MaxMemory,Networkinfo,Description")

log.info("-----------------------------------------------------------")

vm_details = []

for vm in vms:

network_info = 'N/A'

record = session.xenapi.VM.get_record(vm)

if not (record["is_a_template"]) and not (record["is_control_domain"]):

log.debug(record)

vm_count = vm_count + 1

name = record["name_label"]

name_description = record["name_description"]

power_state = record["power_state"]

vcpus = record["VCPUs_max"]

memory_static_max = record["memory_static_max"]

if record["power_state"] != 'Halted':

ip_ref = session.xenapi.VM_guest_metrics.get_record(record['guest_metrics'])

network_info = ','.join([str(elem) for elem in ip_ref['networks'].values()])

else:

continue # Listing only Running VMs

vm_info = {'name': name, 'power_state': power_state, 'vcpus': vcpus,

'memory_static_max': memory_static_max, 'networkinfo': network_info,

'name_description': name_description}

vm_details.append(vm_info)

log.info(vm_info)

log.info("Server has {} VM objects and {} templates.".format(vm_count, len(vms) - vm_count))

log.debug(vm_details)

return vm_details

Create VM

def create_vm(session, os_name, template, new_vm_name, cpus="default", maxmemory="default",
              expiry_minutes=MAX_EXPIRY_MINUTES):
    error = ''
    vm_os_name = ''
    vm_ip_addr = ''
    prov_start_time = time.time()
    try:
        log.info("\n--- Creating VM: " + new_vm_name + " using " + template)
        pifs = session.xenapi.PIF.get_all_records()
        lowest = None
        for pifRef in pifs.keys():
            if (lowest is None) or (pifs[pifRef]['device'] < pifs[lowest]['device']):
                lowest = pifRef
        log.debug("Choosing PIF with device: {}".format(pifs[lowest]['device']))
        ref = lowest
        mac = pifs[ref]['MAC']
        device = pifs[ref]['device']
        mode = pifs[ref]['ip_configuration_mode']
        ip_addr = pifs[ref]['IP']
        net_mask = pifs[ref]['IP']
        gateway = pifs[ref]['gateway']
        dns_server = pifs[ref]['DNS']
        log.debug("{},{},{},{},{},{},{}".format(mac, device, mode, ip_addr, net_mask, gateway,
                                                dns_server))
        # List all the VM objects
        vms = session.xenapi.VM.get_all_records()
        log.debug("Server has {} VM objects (this includes templates)".format(len(vms)))

        templates = []
        all_templates = []
        for vm in vms:
            record = vms[vm]
            res_type = "VM"
            if record["is_a_template"]:
                res_type = "Template"
                all_templates.append(vm)
                # Look for a given template
                if record["name_label"].startswith(template):
                    templates.append(vm)
                    log.debug(" Found %8s with name_label = %s" % (res_type, record["name_label"]))

        log.debug("Server has {} Templates and {} VM objects.".format(len(all_templates),
                                                                      len(vms) - len(
                                                                          all_templates)))

        log.debug("Choosing a {} template to clone".format(template))
        if not templates:
            log.error("Could not find any {} templates. Exiting.".format(template))
            sys.exit(1)

        template_ref = templates[0]
        log.debug("  Selected template: {}".format(session.xenapi.VM.get_name_label(template_ref)))

        # Retries when 169.x address received
        ipaddr_max_retries = 3
        retry_count = 1
        is_local_ip = True
        vm_ip_addr = ""
        while is_local_ip and retry_count != ipaddr_max_retries:
            log.info("Installing new VM from the template - attempt #{}".format(retry_count))
            vm = session.xenapi.VM.clone(template_ref, new_vm_name)

            network = session.xenapi.PIF.get_network(lowest)
            log.debug("Chosen PIF is connected to network: {}".format(
                session.xenapi.network.get_name_label(network)))
            vifs = session.xenapi.VIF.get_all()
            log.debug(("Number of VIFs=" + str(len(vifs))))
            for i in range(len(vifs)):
                vmref = session.xenapi.VIF.get_VM(vifs[i])
                a_vm_name = session.xenapi.VM.get_name_label(vmref)
                log.debug(str(i) + "." + session.xenapi.network.get_name_label(
                    session.xenapi.VIF.get_network(vifs[i])) + " " + a_vm_name)
                if a_vm_name == new_vm_name:
                    session.xenapi.VIF.move(vifs[i], network)

            log.debug("Adding non-interactive to the kernel commandline")
            session.xenapi.VM.set_PV_args(vm, "non-interactive")
            log.debug("Choosing an SR to instantiate the VM's disks")
            pool = session.xenapi.pool.get_all()[0]
            default_sr = session.xenapi.pool.get_default_SR(pool)
            default_sr = session.xenapi.SR.get_record(default_sr)
            log.debug("Choosing SR: {} (uuid {})".format(default_sr['name_label'], default_sr['uuid']))
            log.debug("Asking server to provision storage from the template specification")
            description = new_vm_name + " from " + template + " on " + str(datetime.datetime.utcnow())
            session.xenapi.VM.set_name_description(vm, description)
            if cpus != "default":
                log.info("Setting cpus to " + cpus)
                session.xenapi.VM.set_VCPUs_max(vm, int(cpus))
                session.xenapi.VM.set_VCPUs_at_startup(vm, int(cpus))
            if maxmemory != "default":
                log.info("Setting memory to " + maxmemory)
                session.xenapi.VM.set_memory(vm, maxmemory)  # 8GB="8589934592" or 4GB="4294967296"
            session.xenapi.VM.provision(vm)
            log.info("Starting VM")
            session.xenapi.VM.start(vm, False, True)
            log.debug("  VM is booting")

            log.debug("Waiting for the installation to complete")

            # Get the OS Name and IPs
            log.info("Getting the OS Name and IP...")
            config = read_config()
            vm_network_timeout_secs = int(config.get("common", "vm.network.timeout.secs"))
            if vm_network_timeout_secs > 0:
                TIMEOUT_SECS = vm_network_timeout_secs

            log.info("Max wait time in secs for VM OS address is {0}".format(str(TIMEOUT_SECS)))
            if "win" not in template:
                maxtime = time.time() + TIMEOUT_SECS
                while read_os_name(session, vm) is None and time.time() < maxtime:
                    time.sleep(1)
                vm_os_name = read_os_name(session, vm)
                log.info("VM OS name: {}".format(vm_os_name))
            else:
                # TBD: Wait for network to refresh on Windows VM
                time.sleep(60)

            log.info("Max wait time in secs for IP address is " + str(TIMEOUT_SECS))
            maxtime = time.time() + TIMEOUT_SECS
            # Wait until IP is not None or 169.xx (when no IPs available, this is default) and timeout
            # is not reached.
            while (read_ip_address(session, vm) is None or read_ip_address(session, vm).startswith(
                    '169')) and \
                    time.time() < maxtime:
                time.sleep(1)
            vm_ip_addr = read_ip_address(session, vm)
            log.info("VM IP: {}".format(vm_ip_addr))

            if vm_ip_addr.startswith('169'):
                log.info("No Network IP available. Deleting this VM ... ")
                record = session.xenapi.VM.get_record(vm)
                power_state = record["power_state"]
                if power_state != 'Halted':
                    session.xenapi.VM.hard_shutdown(vm)
                delete_all_disks(session, vm)
                session.xenapi.VM.destroy(vm)
                time.sleep(5)
                is_local_ip = True
                retry_count += 1
            else:
                is_local_ip = False

        log.info("Final VM IP: {}".format(vm_ip_addr))

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

def create_vm(session, os_name, template, new_vm_name, cpus="default", maxmemory="default",

expiry_minutes=MAX_EXPIRY_MINUTES):

error = ''

vm_os_name = ''

vm_ip_addr = ''

prov_start_time = time.time()

try:

log.info("\n--- Creating VM: " + new_vm_name + " using " + template)

pifs = session.xenapi.PIF.get_all_records()

lowest = None

for pifRef in pifs.keys():

if (lowest is None) or (pifs[pifRef]['device'] < pifs[lowest]['device']):

lowest = pifRef

log.debug("Choosing PIF with device: {}".format(pifs[lowest]['device']))

ref = lowest

mac = pifs[ref]['MAC']

device = pifs[ref]['device']

mode = pifs[ref]['ip_configuration_mode']

ip_addr = pifs[ref]['IP']

net_mask = pifs[ref]['IP']

gateway = pifs[ref]['gateway']

dns_server = pifs[ref]['DNS']

log.debug("{},{},{},{},{},{},{}".format(mac, device, mode, ip_addr, net_mask, gateway,

dns_server))

# List all the VM objects

vms = session.xenapi.VM.get_all_records()

log.debug("Server has {} VM objects (this includes templates)".format(len(vms)))

templates = []

all_templates = []

for vm in vms:

record = vms[vm]

res_type = "VM"

if record["is_a_template"]:

res_type = "Template"

all_templates.append(vm)

# Look for a given template

if record["name_label"].startswith(template):

templates.append(vm)

log.debug(" Found %8s with name_label = %s" % (res_type, record["name_label"]))

log.debug("Server has {} Templates and {} VM objects.".format(len(all_templates),

len(vms) - len(

all_templates)))

log.debug("Choosing a {} template to clone".format(template))

if not templates:

log.error("Could not find any {} templates. Exiting.".format(template))

sys.exit(1)

template_ref = templates[0]

log.debug(" Selected template: {}".format(session.xenapi.VM.get_name_label(template_ref)))

# Retries when 169.x address received

ipaddr_max_retries = 3

retry_count = 1

is_local_ip = True

vm_ip_addr = ""

while is_local_ip and retry_count != ipaddr_max_retries:

log.info("Installing new VM from the template - attempt #{}".format(retry_count))

vm = session.xenapi.VM.clone(template_ref, new_vm_name)

network = session.xenapi.PIF.get_network(lowest)

log.debug("Chosen PIF is connected to network: {}".format(

session.xenapi.network.get_name_label(network)))

vifs = session.xenapi.VIF.get_all()

log.debug(("Number of VIFs=" + str(len(vifs))))

for i in range(len(vifs)):

vmref = session.xenapi.VIF.get_VM(vifs[i])

a_vm_name = session.xenapi.VM.get_name_label(vmref)

log.debug(str(i) + "." + session.xenapi.network.get_name_label(

session.xenapi.VIF.get_network(vifs[i])) + " " + a_vm_name)

if a_vm_name == new_vm_name:

session.xenapi.VIF.move(vifs[i], network)

log.debug("Adding non-interactive to the kernel commandline")

session.xenapi.VM.set_PV_args(vm, "non-interactive")

log.debug("Choosing an SR to instantiate the VM's disks")

pool = session.xenapi.pool.get_all()[0]

default_sr = session.xenapi.pool.get_default_SR(pool)

default_sr = session.xenapi.SR.get_record(default_sr)

log.debug("Choosing SR: {} (uuid {})".format(default_sr['name_label'], default_sr['uuid']))

log.debug("Asking server to provision storage from the template specification")

description = new_vm_name + " from " + template + " on " + str(datetime.datetime.utcnow())

session.xenapi.VM.set_name_description(vm, description)

if cpus != "default":

log.info("Setting cpus to " + cpus)

session.xenapi.VM.set_VCPUs_max(vm, int(cpus))

session.xenapi.VM.set_VCPUs_at_startup(vm, int(cpus))

if maxmemory != "default":

log.info("Setting memory to " + maxmemory)

session.xenapi.VM.set_memory(vm, maxmemory) # 8GB="8589934592" or 4GB="4294967296"

session.xenapi.VM.provision(vm)

log.info("Starting VM")

session.xenapi.VM.start(vm, False, True)

log.debug(" VM is booting")

log.debug("Waiting for the installation to complete")

# Get the OS Name and IPs

log.info("Getting the OS Name and IP...")

config = read_config()

vm_network_timeout_secs = int(config.get("common", "vm.network.timeout.secs"))

if vm_network_timeout_secs > 0:

TIMEOUT_SECS = vm_network_timeout_secs

log.info("Max wait time in secs for VM OS address is {0}".format(str(TIMEOUT_SECS)))

if "win" not in template:

maxtime = time.time() + TIMEOUT_SECS

while read_os_name(session, vm) is None and time.time() < maxtime:

time.sleep(1)

vm_os_name = read_os_name(session, vm)

log.info("VM OS name: {}".format(vm_os_name))

else:

# TBD: Wait for network to refresh on Windows VM

time.sleep(60)

log.info("Max wait time in secs for IP address is " + str(TIMEOUT_SECS))

maxtime = time.time() + TIMEOUT_SECS

# Wait until IP is not None or 169.xx (when no IPs available, this is default) and timeout

# is not reached.

while (read_ip_address(session, vm) is None or read_ip_address(session, vm).startswith(

'169')) and \

time.time() < maxtime:

time.sleep(1)

vm_ip_addr = read_ip_address(session, vm)

log.info("VM IP: {}".format(vm_ip_addr))

if vm_ip_addr.startswith('169'):

log.info("No Network IP available. Deleting this VM ... ")

record = session.xenapi.VM.get_record(vm)

power_state = record["power_state"]

if power_state != 'Halted':

session.xenapi.VM.hard_shutdown(vm)

delete_all_disks(session, vm)

session.xenapi.VM.destroy(vm)

time.sleep(5)

is_local_ip = True

retry_count += 1

else:

is_local_ip = False

log.info("Final VM IP: {}".format(vm_ip_addr))

Delete VM

def delete_vm(session, vm_name):
    log.info("Deleting VM: " + vm_name)
    delete_start_time = time.time()
    vm = session.xenapi.VM.get_by_name_label(vm_name)
    log.info("Number of VMs found with name - " + vm_name + " : " + str(len(vm)))
    for j in range(len(vm)):
        record = session.xenapi.VM.get_record(vm[j])
        power_state = record["power_state"]
        if power_state != 'Halted':
            # session.xenapi.VM.shutdown(vm[j])
            session.xenapi.VM.hard_shutdown(vm[j])

        # print_all_disks(session, vm[j])
        delete_all_disks(session, vm[j])

        session.xenapi.VM.destroy(vm[j])

        delete_end_time = time.time()
        delete_duration = round(delete_end_time - delete_start_time)

        # delete from CB
        uuid = record["uuid"]
        doc_key = uuid
        cbdoc = CBDoc()
        doc_result = cbdoc.get_doc(doc_key)
        if doc_result:
            doc_value = doc_result.value
            doc_value["state"] = 'deleted'
            current_time = time.time()
            doc_value["deleted_time"] = current_time
            if doc_value["created_time"]:
                doc_value["live_duration_secs"] = round(current_time - doc_value["created_time"])
            doc_value["delete_duration_secs"] = delete_duration
            cbdoc.save_dynvm_doc(doc_key, doc_value)

def delete_vm(session, vm_name):

log.info("Deleting VM: " + vm_name)

delete_start_time = time.time()

vm = session.xenapi.VM.get_by_name_label(vm_name)

log.info("Number of VMs found with name - " + vm_name + " : " + str(len(vm)))

for j in range(len(vm)):

record = session.xenapi.VM.get_record(vm[j])

power_state = record["power_state"]

if power_state != 'Halted':

# session.xenapi.VM.shutdown(vm[j])

session.xenapi.VM.hard_shutdown(vm[j])

# print_all_disks(session, vm[j])

delete_all_disks(session, vm[j])

session.xenapi.VM.destroy(vm[j])

delete_end_time = time.time()

delete_duration = round(delete_end_time - delete_start_time)

# delete from CB

uuid = record["uuid"]

doc_key = uuid

cbdoc = CBDoc()

doc_result = cbdoc.get_doc(doc_key)

if doc_result:

doc_value = doc_result.value

doc_value["state"] = 'deleted'

current_time = time.time()

doc_value["deleted_time"] = current_time

if doc_value["created_time"]:

doc_value["live_duration_secs"] = round(current_time - doc_value["created_time"])

doc_value["delete_duration_secs"] = delete_duration

cbdoc.save_dynvm_doc(doc_key, doc_value)

Historic Usage of VMs

It is better to maintain the history of all the VMs created and terminated along with other useful data. Here is the example of json document stored in the Couchbase, a free Nosql database server. Insert a new document using the key as the xen opac reference uuid whenever a new VM is provisioned and update the same whenever VM is terminated. Track the live usage time of the VM and also how the provisioning/termination done by each user.

       record = session.xenapi.VM.get_record(vm)
       uuid = record["uuid"]

       # Save as doc in CB
        state = "available"
        username = new_vm_name
        pool = "dynamicpool"
        doc_value = {"ipaddr": vm_ip_addr, "origin": xen_host_description, "os": os_name,
                     "state": state, "poolId": pool, "prevUser": "", "username": username,
                     "ver": "12", "memory": memory_static_max, "os_version": vm_os_name,
                     "name": new_vm_name, "created_time": prov_end_time,
                     "create_duration_secs": create_duration, "cpu": vcpus, "disk": disks_info}
        # doc_value["mac_address"] = mac_address
        doc_key = uuid

        cb_doc = CBDoc()
        cb_doc.save_dynvm_doc(doc_key, doc_value)

record = session.xenapi.VM.get_record(vm)

uuid = record["uuid"]

# Save as doc in CB

state = "available"

username = new_vm_name

pool = "dynamicpool"

doc_value = {"ipaddr": vm_ip_addr, "origin": xen_host_description, "os": os_name,

"state": state, "poolId": pool, "prevUser": "", "username": username,

"ver": "12", "memory": memory_static_max, "os_version": vm_os_name,

"name": new_vm_name, "created_time": prov_end_time,

"create_duration_secs": create_duration, "cpu": vcpus, "disk": disks_info}

# doc_value["mac_address"] = mac_address

doc_key = uuid

cb_doc = CBDoc()

cb_doc.save_dynvm_doc(doc_key, doc_value)

class CBDoc:
    def __init__(self):
        config = read_config()
        self.cb_server = config.get("couchbase", "couchbase.server")
        self.cb_bucket = config.get("couchbase", "couchbase.bucket")
        self.cb_username = config.get("couchbase", "couchbase.username")
        self.cb_userpassword = config.get("couchbase", "couchbase.userpassword")
        try:
            self.cb_cluster = Cluster('couchbase://' + self.cb_server)
            self.cb_auth = PasswordAuthenticator(self.cb_username, self.cb_userpassword)
            self.cb_cluster.authenticate(self.cb_auth)
            self.cb = self.cb_cluster.open_bucket(self.cb_bucket)
        except Exception as e:
            log.error('Connection Failed: %s ' % self.cb_server)
            log.error(e)

    def get_doc(self, doc_key):
        try:
            return self.cb.get(doc_key)
        except Exception as e:
            log.error('Error while getting doc %s !' % doc_key)
            log.error(e)

    def save_dynvm_doc(self, doc_key, doc_value):
        try:
            log.info(doc_value)
            self.cb.upsert(doc_key, doc_value)
            log.info("%s added/updated successfully" % doc_key)
        except Exception as e:
            log.error('Document with key: %s saving error' % doc_key)
            log.error(e)

class CBDoc:

def __init__(self):

config = read_config()

self.cb_server = config.get("couchbase", "couchbase.server")

self.cb_bucket = config.get("couchbase", "couchbase.bucket")

self.cb_username = config.get("couchbase", "couchbase.username")

self.cb_userpassword = config.get("couchbase", "couchbase.userpassword")

try:

self.cb_cluster = Cluster('couchbase://' + self.cb_server)

self.cb_auth = PasswordAuthenticator(self.cb_username, self.cb_userpassword)

self.cb_cluster.authenticate(self.cb_auth)

self.cb = self.cb_cluster.open_bucket(self.cb_bucket)

except Exception as e:

log.error('Connection Failed: %s ' % self.cb_server)

log.error(e)

def get_doc(self, doc_key):

try:

return self.cb.get(doc_key)

except Exception as e:

log.error('Error while getting doc %s !' % doc_key)

log.error(e)

def save_dynvm_doc(self, doc_key, doc_value):

try:

log.info(doc_value)

self.cb.upsert(doc_key, doc_value)

log.info("%s added/updated successfully" % doc_key)

except Exception as e:

log.error('Document with key: %s saving error' % doc_key)

log.error(e)

"dynserver-pool": {
  "cpu": "6",
  "create_duration_secs": 65,
  "created_time": 1583518463.8943903,
  "delete_duration_secs": 5,
  "deleted_time": 1583520211.8498628,
  "disk": "75161927680",
  "ipaddr": "x.x.x.x",
  "live_duration_secs": 1748,
  "memory": "6442450944",
  "name": "Win2019-Server-1node-DynVM",
  "origin": "s827",
  "os": "win16",
  "os_version": "",
  "poolId": "dynamicpool",
  "prevUser": "",
  "state": "deleted",
  "username": "Win2019-Server-1node-DynVM",
  "ver": "12"
}

"dynserver-pool": {

"cpu": "6",

"create_duration_secs": 65,

"created_time": 1583518463.8943903,

"delete_duration_secs": 5,

"deleted_time": 1583520211.8498628,

"disk": "75161927680",

"ipaddr": "x.x.x.x",

"live_duration_secs": 1748,

"memory": "6442450944",

"name": "Win2019-Server-1node-DynVM",

"origin": "s827",

"os": "win16",

"os_version": "",

"poolId": "dynamicpool",

"prevUser": "",

"state": "deleted",

"username": "Win2019-Server-1node-DynVM",

"ver": "12"

}

Configuration

The Dynamic VM server manager service configuration such as couchbase server, xenhost servers, template details, default expiry and network timeout values can be maintained in a simple .ini format. Any new Xen Host received, then just add as a separate section.The config is dynamically loaded without restarting the Dynamic VM SM service.

Sample config file: .dynvmservice.ini

[couchbase]
couchbase.server=<couchbase-hostIp>
couchbase.bucket=<bucket-name>
couchbase.username=<username>
couchbase.userpassword=<password>

[common]
vm.expiry.minutes=720
vm.network.timeout.secs=400

[xenhost1]
host.name=<xenhostip1>
host.user=root
host.password=xxxx
host.storage.name=Local Storage 01
centos.template=tmpl-cnt7.7
centos7.template=tmpl-cnt7.7
windows.template=tmpl-win16dc - PATCHED (1)
centos8.template=tmpl-cnt8.0
oel8.template=tmpl-oel8
deb10.template=tmpl-deb10-too
debian10.template=tmpl-deb10-too
ubuntu18.template=tmpl-ubu18-4cgb
suse15.template=tmpl-suse15

[xenhost2]
host.name=<xenhostip2>
host.user=root
host.password=xxxx
host.storage.name=ssd
centos.template=tmpl-cnt7.7
windows.template=tmpl-win16dc - PATCHED (1)
centos8.template=tmpl-cnt8.0
oel8.template=tmpl-oel8
deb10.template=tmpl-deb10-too
debian10.template=tmpl-deb10-too
ubuntu18.template=tmpl-ubu18-4cgb
suse15.template=tmpl-suse15

[xenhost3]
...


[xenhost4]
...

[couchbase]

couchbase.server=<couchbase-hostIp>

couchbase.bucket=<bucket-name>

couchbase.username=<username>

couchbase.userpassword=<password>

[common]

vm.expiry.minutes=720

vm.network.timeout.secs=400

[xenhost1]

host.name=<xenhostip1>

host.user=root

host.password=xxxx

host.storage.name=Local Storage 01

centos.template=tmpl-cnt7.7

centos7.template=tmpl-cnt7.7

windows.template=tmpl-win16dc - PATCHED (1)

centos8.template=tmpl-cnt8.0

oel8.template=tmpl-oel8

deb10.template=tmpl-deb10-too

debian10.template=tmpl-deb10-too

ubuntu18.template=tmpl-ubu18-4cgb

suse15.template=tmpl-suse15

[xenhost2]

host.name=<xenhostip2>

host.user=root

host.password=xxxx

host.storage.name=ssd

centos.template=tmpl-cnt7.7

windows.template=tmpl-win16dc - PATCHED (1)

centos8.template=tmpl-cnt8.0

oel8.template=tmpl-oel8

deb10.template=tmpl-deb10-too

debian10.template=tmpl-deb10-too

ubuntu18.template=tmpl-ubu18-4cgb

suse15.template=tmpl-suse15

[xenhost3]

...

[xenhost4]

...

Examples

Sample REST API calls using curl

$ curl 'http://127.0.0.1:5000/showall'
{
  "1": [
    {
      "memory_static_max": "6442450944",
      "name": "tunable-rebalance-out-Apr-29-13:43:59-7.0.0-19021",
      "name_description": "tunable-rebalance-out-Apr-29-13:43:59-7.0.0-19021 from tmpl-win16dc - PATCHED (1) on 2020-04-29 20:43:33.785778",
      "networkinfo": "172.23.137.20,172.23.137.20,fe80:0000:0000:0000:0585:c6e8:52f9:91d1",
      "power_state": "Running",
      "vcpus": "6"
    },
    {
      "memory_static_max": "6442450944",
      "name": "nserv-nserv-rebalanceinout_P0_Set1_compression-Apr-29-06:36:12-7.0.0-19022",
      "name_description": "nserv-nserv-rebalanceinout_P0_Set1_compression-Apr-29-06:36:12-7.0.0-19022 from tmpl-win16dc - PATCHED (1) on 2020-04-29 13:36:53.717776",
      "networkinfo": "172.23.136.142,172.23.136.142,fe80:0000:0000:0000:744d:fd63:1a88:2fa8",
      "power_state": "Running",
      "vcpus": "6"
    }
  ],
  "2": [
    ..
  ]
  "3": [
    ..
  ]
  "4": [
    ..
  ]
  "5": [
    ..
  ]
  "6": [
    ..
  ]
}

$ curl 'http://127.0.0.1:5000/getavailablecount/windows'
2
$ curl 'http://127.0.0.1:5000/getavailablecount/centos'
10 
$ curl 'http://127.0.0.1:5000/getservers/demoserver?os=centos&count=3'
["172.23.137.73", "172.23.137.74", "172.23.137.75"]
$ curl 'http://127.0.0.1:5000/getavailablecount/centos'
7
$ curl 'http://127.0.0.1:5000/releaseservers/demoserver?os=centos&count=3'
[
  "demoserver1",
  "demoserver2",
  "demoserver3"
]
$ curl 'http://127.0.0.1:5000/getavailablecount/centos'
10 
$

$ curl 'http://127.0.0.1:5000/showall'

{

"1": [

{

"memory_static_max": "6442450944",

"name": "tunable-rebalance-out-Apr-29-13:43:59-7.0.0-19021",

"name_description": "tunable-rebalance-out-Apr-29-13:43:59-7.0.0-19021 from tmpl-win16dc - PATCHED (1) on 2020-04-29 20:43:33.785778",

"networkinfo": "172.23.137.20,172.23.137.20,fe80:0000:0000:0000:0585:c6e8:52f9:91d1",

"power_state": "Running",

"vcpus": "6"

{

"memory_static_max": "6442450944",

"name": "nserv-nserv-rebalanceinout_P0_Set1_compression-Apr-29-06:36:12-7.0.0-19022",

"name_description": "nserv-nserv-rebalanceinout_P0_Set1_compression-Apr-29-06:36:12-7.0.0-19022 from tmpl-win16dc - PATCHED (1) on 2020-04-29 13:36:53.717776",

"networkinfo": "172.23.136.142,172.23.136.142,fe80:0000:0000:0000:744d:fd63:1a88:2fa8",

"power_state": "Running",

"vcpus": "6"

}

"2": [

]

"3": [

]

"4": [

]

"5": [

]

"6": [

]

}

$ curl 'http://127.0.0.1:5000/getavailablecount/windows'

$ curl 'http://127.0.0.1:5000/getavailablecount/centos'

$ curl 'http://127.0.0.1:5000/getservers/demoserver?os=centos&count=3'

["172.23.137.73", "172.23.137.74", "172.23.137.75"]

$ curl 'http://127.0.0.1:5000/getavailablecount/centos'

$ curl 'http://127.0.0.1:5000/releaseservers/demoserver?os=centos&count=3'

[

"demoserver1",

"demoserver2",

"demoserver3"

]

$ curl 'http://127.0.0.1:5000/getavailablecount/centos'

Jenkins jobs with single VM

curl -s -o ${BUILD_TAG}_vm.json "http://<host:port>/getservers/${BUILD_TAG}?os=windows&format=detailed"
VM_IP_ADDRESS="`cat ${BUILD_TAG}_vm.json |egrep ${BUILD_TAG}|cut -f2 -d':'|xargs`"
if [[ $VM_IP_ADDRESS =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}$ ]]; then
  echo Valid IP received
else
  echo NOT a valid IP received
  exit 1
fi

curl -s -o ${BUILD_TAG}_vm.json "http://<host:port>/getservers/${BUILD_TAG}?os=windows&format=detailed"

VM_IP_ADDRESS="`cat ${BUILD_TAG}_vm.json |egrep ${BUILD_TAG}|cut -f2 -d':'|xargs`"

if [[ $VM_IP_ADDRESS =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}$ ]]; then

echo Valid IP received

else

echo NOT a valid IP received

exit 1

Jenkins jobs with multiple VMs needed

curl -s -o ${BUILD_TAG}.json "${SERVER_API_URL}/getservers/${BUILD_TAG}?os=${OS}&count=${COUNT}&format=detailed"
cat ${BUILD_TAG}.json
VM_IP_ADDRESS="`cat ${BUILD_TAG}.json |egrep ${BUILD_TAG}|cut -f2 -d':'|xargs|sed 's/,//g'`"
echo $VM_IP_ADDRESS
ADDR=()
INDEX=1
for IP in `echo $VM_IP_ADDRESS`
do
  if [[ $IP =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}$ ]]; then
    echo Valid IP=$IP received
    ADDR[$INDEX]=${IP}
    INDEX=`expr ${INDEX} + 1`
  else
    echo NOT a valid IP=$IP received
    exit 1
  fi
done

echo ${ADDR[1]}
echo ${ADDR[2]}
echo ${ADDR[3]}
echo ${ADDR[4]}

curl -s -o ${BUILD_TAG}.json "${SERVER_API_URL}/getservers/${BUILD_TAG}?os=${OS}&count=${COUNT}&format=detailed"

cat ${BUILD_TAG}.json

VM_IP_ADDRESS="`cat ${BUILD_TAG}.json |egrep ${BUILD_TAG}|cut -f2 -d':'|xargs|sed 's/,//g'`"

echo $VM_IP_ADDRESS

ADDR=()

INDEX=1

for IP in `echo $VM_IP_ADDRESS`

if [[ $IP =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}$ ]]; then

echo Valid IP=$IP received

ADDR[$INDEX]=${IP}

INDEX=`expr ${INDEX} + 1`

else

echo NOT a valid IP=$IP received

exit 1

done

echo ${ADDR[1]}

echo ${ADDR[2]}

echo ${ADDR[3]}

echo ${ADDR[4]}

Key considerations

Here are few of my observations noted during the process and it is better to handle to make it more reliable.

Handle Storage name/ID different among different Xen Hosts
- Keep track of VM storage device name in the service input config file.
Handle partial templates only available on some Xen Hosts while provisioning
When Network IPs not available and Xen APIs gets the default 169.254.xx.yy on Windows. Wait until getting the non 169 address or timeout.
Release servers should ignore os template as some of the templates might not be there Xen Hosts
Provision on a specific given Xen Host reference
Handle No IPs available or not getting network IPs for some of the VMs created.
- Plan to have a different subnet for dynamic VMs targeted Xen Hosts. The default network DHCP IP lease expiry might be in days (say 7 days) and no new IPs are provided.
Handle the capacity check to count the in progress as Reserved IPs and should show less count than full at the moment. Otherwise, both in-progress and incoming requests might have issues. One or two VMs (cpus/memory/disk sizes) can be in buffer while creating and checking if any parallel requests.

References

Some of the key references that help while creating the dynamic VM server manager service.

Hope you had a good reading time!

Disclaimer: Please view this as a reference if you are dealing with Xen Hosts. Feel free to share if you learned something new that can help us. Your positive feedback is appreciated!

Thanks to Raju Suravarjjala, Ritam Sharma, Wayne Siu, Tom Thrush, James Lee for their help during the process.

Jagadesh Munta, Principal Software Engineer, Couchbase

Platform

Self-Managed

Services

Capabilities

Why Couchbase?

Migrate to Capella

By Use Case

By Industry

By Application Need

Popular Docs

By Developer Role

Quickstart

Resource Center

About

Partnerships

Our Services

Partners: Register a Deal

Ready to register a deal with Couchbase?

Marriott

All Posts

Creating Dynamic VMs Infrastructure With Xen Hosts

Introduction

Benefits

Dynamic Infrastructure model

Dynamic Server Manager architecture

Simple REST APIs

Pre-requirements for dynamic VM targeted Xen Hosts

Implementation

Historic Usage of VMs

Configuration

Examples

Sample REST API calls using curl

Jenkins jobs with single VM

Jenkins jobs with multiple VMs needed

Key considerations

References

Author

Posted by Jagadesh Munta, Principal Software Engineer, Couchbase

Leave a reply Cancel reply