Azure Virtual Machine Operation Manual v1.0

Suntory Azure Managed Service Standard Document

Azure Virtual Machine

Operation Manual

Document ID	AZ-VM-OPS-001
Version	1.0
Status	RELEASED
Created	2026-05-18
Revised	2026-05-18
Service Manager	Suntory Holdings Limited (SHD)
Operations Team	TCS (Responsible for procedure execution and proactive updates to this document)
Author	Tomoki Koyama

This document defines the day-to-day operational procedures for Azure Virtual Machines, including procedures for incident and failure alert response.
TCS shall perform all operations in accordance with this document and proactively keep it up to date whenever changes occur.

Revision History

Ver.	Revision Date	Author	Description	Approver
1.0	2026-05-18	Tomoki Koyama (SHD)	Initial release	—

📋 Related Documents

Document ID	Document Name	Type	Notes
AZ-VM-OVERVIEW-001	Azure Virtual Machine Service Overview	Service Overview	—
AZ-VM-DESIGN-001	Azure Virtual Machine Design Document	Design Document	Design rationale and standard value details
AZ-VM-PARAM-001	Azure Virtual Machine Parameter Sheet	Parameter Sheet	Entry and approval form for build-time configuration
AZ-VM-BUILD-001	Azure Virtual Machine Build Procedure	Build Procedure	Portal / Terraform build procedures
AZ-VM-OPS-001	Azure Virtual Machine Operation Manual (this document)	Operation Manual	—

Common Rules / Pre-Work Checklist

Responsibility Matrix (RACI)

Task / Activity	SHD	TCS	BU Representative
Operation Manual management and final approval	A	R	I
ServiceNow request submission (Spec Upgrade / Disk Addition / Deletion / Restore)	I	I	R
ServiceNow CR / Incident closure	I	R	I
VM Restart approval (Normal Operations)	I	C	A
VM Restart approval (Emergency: Incident / Failure)	A	R	I
Pre-Work Backup execution	I	R	I
VM Restart execution	I	R	I
VM Spec Upgrade execution	I	R	I
Disk Addition execution	I	R	I
Disk Expansion execution (TCS decision based on monitoring alert)	I	A	I
VM Deletion execution	I	R	I
VM Restore execution	I	R	I
Work Completion Report	I	R	I

Common Pre-Work Checklist (applies to all operations)

☐	No.	Verification Item	Details / Reference
☐	1	Verify target VM and Resource Group name	Confirm information stated in the ServiceNow ticket or alert
☐	2	Obtain pre-work backup	Refer to "Procedure 1: Pre-Work Backup" in this document. Not required for emergency restarts.
☐	3	Log in to Azure Portal and confirm permissions	Verify that the Contributor role or higher is assigned for the target Subscription
☐	4	Confirm maintenance window	As a rule, perform work within the agreed maintenance window to minimize business impact
☐	5	Confirm ServiceNow ticket	Ensure the ticket includes work details, requester, and approval status (for request-triggered operations)
☐	6	Confirm where to record completion	Record the work result in the ServiceNow ticket and ensure it is properly closed
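
Checklist item 3 can also be confirmed from the command line. The following is a minimal sketch rather than part of the standard procedure. It assumes Bash with the Azure CLI already signed in. The function names `has_required_role` and `check_subscription_role` are hypothetical, and treating Owner as the only built-in role "higher" than Contributor is an assumption about how this document uses the phrase.

```bash
#!/usr/bin/env bash

# has_required_role: read role names from stdin (one per line) and succeed
# if Contributor or Owner is among them. Whole-line matching means that
# narrower roles such as "Virtual Machine Contributor" do not qualify.
has_required_role() {
  grep -qxE 'Owner|Contributor'
}

# check_subscription_role ASSIGNEE SUBSCRIPTION_ID
# ASSIGNEE is the operator's sign-in name or object ID.
# --include-inherited also lists assignments made at parent scopes.
# Assignments granted through group membership are not listed unless
# --include-groups is added.
check_subscription_role() {
  local assignee="$1" subscription_id="$2"
  az role assignment list \
    --assignee "$assignee" \
    --scope "/subscriptions/${subscription_id}" \
    --include-inherited \
    --query "[].roleDefinitionName" \
    --output tsv | has_required_role
}
```

Example call (both values are placeholders): `check_subscription_role <operator-sign-in-name> <subscription-id> && echo "OK"`.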

Trigger Types and Work Flows

Trigger Type	Source	Basic Flow
ServiceNow CR	BU Representative submits a request via ServiceNow	Review CR → Pre-work checks → Obtain backup → Execute work → Verify → Close CR
Monitoring Alert	Azure Monitor / New Relic alert	Review and assess alert → Emergency decision → Notify BU (if possible) → Execute work → Record incident
Incident (Emergency)	Failure detected / Escalation	Assess situation → Immediate response (post-hoc BU notification) → Root cause investigation → Record incident

Procedure 1: Pre-Work Backup

Trigger	TCS executes this as a rule before performing any CR (except emergency restarts)
Operator	TCS
Estimated Duration	Approx. 10–30 minutes (until the backup job completes)

Take an On-Demand Backup from Azure Portal

☐	Step	Action	Details / Verification Points
☐	1	Open the Recovery Services Vault	Azure Portal → "Recovery Services vaults" → Select the target Vault
☐	2	Confirm the backup target VM	"Backup items" → "Azure Virtual Machine" → Confirm the target VM is displayed
☐	3	Execute "Backup Now"	Select the target VM → "Backup Now" → Set retention period (30 days or more recommended for work backups) → "OK"
☐	4	Confirm backup job completion	In "Backup Jobs", wait until the job status shows "Completed". Proceed to the next step only after completion.
☐	5	Record the backup point	Record the date and time of the completed backup point in the ServiceNow ticket

Azure CLI (Alternative Procedure)

    # Trigger an on-demand backup
    az backup protection backup-now \
      --resource-group <RG-name> \
      --vault-name <Vault-name> \
      --container-name <VM-name> \
      --item-name <VM-name> \
      --backup-management-type AzureIaasVM \
      --retain-until 30-06-2026   # Backup retention date (DD-MM-YYYY)

    # Check backup job status
    az backup job list \
      --resource-group <RG-name> \
      --vault-name <Vault-name> \
      --status InProgress
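
The retention date and the wait for job completion (Steps 3 and 4) can be scripted. The sketch below is one possible approach, assuming Bash, GNU `date`, and a signed-in Azure CLI. The function names `retain_until` and `backup_and_wait` and the 3600-second timeout are illustrative assumptions, not values defined by this document.

```bash
#!/usr/bin/env bash

# retain_until DAYS [BASE_DATE]
# Print the date DAYS after BASE_DATE (YYYY-MM-DD, default: today in UTC)
# in the DD-MM-YYYY format expected by --retain-until. Requires GNU date.
retain_until() {
  local days="$1" base="${2:-$(date -u +%Y-%m-%d)}"
  date -u -d "${base} +${days} days" +%d-%m-%Y
}

# backup_and_wait RG VAULT VM
# Trigger an on-demand backup retained for 30 days, block until the job
# finishes or the timeout expires, then print the final job status.
backup_and_wait() {
  local rg="$1" vault="$2" vm="$3" job
  job=$(az backup protection backup-now \
    --resource-group "$rg" \
    --vault-name "$vault" \
    --container-name "$vm" \
    --item-name "$vm" \
    --backup-management-type AzureIaasVM \
    --retain-until "$(retain_until 30)" \
    --query name --output tsv) || return 1

  az backup job wait \
    --resource-group "$rg" --vault-name "$vault" \
    --name "$job" --timeout 3600 --output none

  az backup job show \
    --resource-group "$rg" --vault-name "$vault" \
    --name "$job" --query properties.status --output tsv
}
```

For example, `retain_until 30 2026-05-18` prints `17-06-2026`. Proceed with the work only when `backup_and_wait` prints `Completed`.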

Procedure 2: Azure VM Restart

Trigger	When an incident or failure alert occurs, or when TCS determines that a restart is necessary for service recovery
Operator	TCS
Important Rules	[Normal Operations] Before restarting, TCS must always contact the BU Representative for the target system, obtain approval for execution and the scheduled date/time, and then proceed. [Emergency] In the event of a complete VM outage where service continuation is impossible, an immediate restart without prior BU notification is permitted. However, TCS must provide a post-hoc report to the BU promptly after the restart.

Normal Operations Restart (after BU approval) [ServiceNow CR or TCS-initiated]

Step A: Pre-Confirmation with BU Representative

☐	Step	Action	Details / Verification Points
☐	1	Prepare restart reason, target VM, and proposed date/time	Confirm and record the basis for the restart need (alert details, error logs, etc.)
☐	2	Send a confirmation message to the BU Representative	Contact via Teams or Email using the confirmation template below
☐	3	Obtain approval from BU	Receive the approval message and record the approval details (scheduled date/time) in the ServiceNow ticket. If approval cannot be obtained, escalate to the next level.

Confirmation Message Template for BU Representative

Subject: [Action Required / Response Needed] VM Restart Approval Request (VM Name: <Target VM Name>)

Dear <BU Representative Name>,

I hope this message finds you well. This is <Operator Name> from the TCS Operations Team.

We have identified an issue with the following VM and would like to perform a restart. We apologize for the inconvenience, and would appreciate your confirmation on whether we may proceed, along with your preferred date and time.

■ Target VM Name     : <VM Name>
■ Target RG Name     : <Resource Group Name>
■ Reason for Restart : <Describe specifically, e.g., sustained CPU spike, OS freeze, etc.>
■ Proposed Date/Time : <e.g., May 18, 2026, 22:00–22:30 (JST)>
■ Estimated Downtime : Approx. 5–10 minutes
■ Operator           : <TCS Operator Name / Contact>

To approve, please reply with "Approved (Scheduled time: XX:XX)" at your earliest convenience.

Thank you for your cooperation.

Step B: Executing the Restart

☐	Step	Action	Details / Verification Points
☐	1	Obtain pre-work backup	Execute "Procedure 1: Pre-Work Backup"
☐	2	Open the target VM in the Portal	Azure Portal → "Virtual machines" → Select the target VM
☐	3	Execute "Restart"	Click "Restart" in the top menu → Click "Yes" in the confirmation dialog
☐	4	Confirm VM returns to Running state	Wait until the VM "Status" shows "Running" (typically 3–5 minutes)
☐	5	Verify application and service operation	Confirm that the application is functioning normally via the BU Representative or system monitoring
☐	6	Record results in ServiceNow ticket and close	Record the restart date/time, result, and verification details

Azure CLI

    # Restart the VM (OS shutdown followed by start)
    az vm restart \
      --resource-group <RG-name> \
      --name <VM-name>

    # Check VM status
    az vm get-instance-view \
      --resource-group <RG-name> \
      --name <VM-name> \
      --query "instanceView.statuses[?starts_with(code,'PowerState')]" \
      --output table
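
Waiting for the "Running" state can be automated with a polling loop. This is a minimal sketch under stated assumptions: the function names `power_state` and `wait_until_running`, the 600-second timeout, and the 15-second interval are all illustrative and are not defined by this document.

```bash
#!/usr/bin/env bash

# power_state RG VM
# Print the display form of the current power state, e.g. "VM running".
power_state() {
  az vm get-instance-view \
    --resource-group "$1" \
    --name "$2" \
    --query "instanceView.statuses[?starts_with(code,'PowerState/')].displayStatus | [0]" \
    --output tsv
}

# wait_until_running RG VM [TIMEOUT_SEC] [INTERVAL_SEC]
# Poll until the VM reports "VM running". Returns 0 on success and 1 if the
# timeout is reached first.
wait_until_running() {
  local rg="$1" vm="$2" timeout="${3:-600}" interval="${4:-15}" waited=0
  while [ "$waited" -le "$timeout" ]; do
    if [ "$(power_state "$rg" "$vm")" = "VM running" ]; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 1
}
```

A non-zero return from `wait_until_running` means the VM did not come back within the timeout, which is a candidate for the escalation path described in this document.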

Emergency Restart (when service is down due to failure) [Emergency Response]

⚠️ Prior BU confirmation may be omitted. However, TCS must promptly provide a post-hoc report to the BU Representative after the restart is complete.
If the issue is not resolved after the restart, escalate to the next level.

☐	Step	Action	Details / Verification Points
☐	1	Confirm and record the failure situation	Record alert details, logs, and error messages. Register the incident in ServiceNow.
☐	2	Execute "Restart" on the VM	Perform an immediate restart via Azure Portal or CLI (backup not required)
☐	3	Confirm VM returns to Running state	Wait until Status shows "Running"
☐	4	Confirm service recovery	Confirm the service is operating normally via system monitoring
☐	5	Send post-hoc report to BU Representative	Promptly report the reason for restart, execution date/time, result, and service recovery status
☐	6	Root cause investigation and incident recording	Investigate the root cause that necessitated the restart and record it in ServiceNow

Procedure 3: Azure VM Spec Upgrade (VM Resize)

Trigger	[ServiceNow CR] Performed upon receipt of a ServiceNow request from a BU Representative
Operator	TCS
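Important Notes	This procedure stops (deallocates) the VM before resizing. A resize always restarts the VM, and sizes that are not available on the current hardware cluster can be selected only after deallocation. The VM is down for the duration of the operation (service interruption), so always agree on a downtime window with the BU before proceeding.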

Notes on VM Family Change

⚠️ When changing VM families (e.g., D-series → E-series), always verify the following items.

Verification Item	Details
Availability Zone compatibility	Confirm that the new VM size is available in the current Availability Zone (Zone 1 / 2 / 3). Check "Zone availability" on the VM size selection screen in the Portal.
Trusted Launch (vTPM / Secure Boot) support	Confirm that the new VM size supports Trusted Launch (check via Basics tab → Security type)
Accelerated Networking support	Confirm that the new size supports SR-IOV. Re-verify that Accelerated Networking remains enabled in the NIC settings.
Ultra Disk compatibility	If Ultra Disk is in use, confirm that the new VM size supports Ultra Disk.
Data disk cache settings	Confirm that data disk cache settings are maintained after the family change.
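
Part of the verification above can be done from the CLI before the maintenance window. The following is a sketch only. The function name `size_capabilities` is hypothetical, and the capability names in the query (`AcceleratedNetworkingEnabled`, `TrustedLaunchDisabled`, `HyperVGenerations`) are assumptions based on the values the compute SKU listing is understood to return, so confirm them against the actual output. Ultra Disk support and disk cache settings are not covered and still need to be checked in the Portal.

```bash
#!/usr/bin/env bash

# size_capabilities LOCATION SIZE
# Print the availability zones in which SIZE is offered in LOCATION,
# together with the capability entries relevant to a VM family change.
size_capabilities() {
  local location="$1" size="$2"
  az vm list-skus \
    --location "$location" \
    --size "$size" \
    --resource-type virtualMachines \
    --query "[?name=='${size}'] | [0].{zones: locationInfo[0].zones, capabilities: capabilities[?name=='AcceleratedNetworkingEnabled' || name=='TrustedLaunchDisabled' || name=='HyperVGenerations']}" \
    --output json
}
```

Example call (the region is a placeholder): `size_capabilities <region> Standard_E4s_v5`. A `TrustedLaunchDisabled` entry with the value `True` would indicate that the size does not support Trusted Launch.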

VM Resize Procedure

☐	Step	Action	Details / Verification Points
☐	1	Review the ServiceNow CR	Confirm that the new VM size, scheduled date/time, requester, and BU approval status are all documented
☐	2	Review Family Change notes	Check the "Notes on VM Family Change" section above
☐	3	Obtain pre-work backup	Execute "Procedure 1: Pre-Work Backup"
☐	4	Notify BU Representative of work start	Notify the planned start time and expected downtime duration
☐	5	Stop (deallocate) the VM	Azure Portal → "Stop" → "Yes" → Wait until Status shows "Stopped (deallocated)"
☐	6	Resize the VM	Portal → VM → "Settings" → "Size" → Select the new size → Click "Resize"
☐	7	Start the VM	Click "Start" → Wait until Status shows "Running"
☐	8	Verify the size change	Confirm the new size is displayed in VM "Overview" → "Size"
☐	9	Re-verify Accelerated Networking	Confirm NIC "Overview" → "Accelerated networking" shows "Enabled"
☐	10	Verify application operation	Coordinate with the BU Representative to confirm the service is operating normally
☐	11	Complete and close the ServiceNow CR	Record the before/after sizes, execution date/time, and verification results, then close

Azure CLI

    # Stop (deallocate) the VM
    az vm deallocate --resource-group <RG-name> --name <VM-name>

    # Check available sizes (sizes available in the current zone)
    az vm list-vm-resize-options \
      --resource-group <RG-name> \
      --name <VM-name> \
      --query "[?name=='Standard_D4s_v5']" --output table

    # Resize the VM
    az vm resize \
      --resource-group <RG-name> \
      --name <VM-name> \
      --size Standard_D4s_v5

    # Start the VM
    az vm start --resource-group <RG-name> --name <VM-name>

Procedure 4: Disk Addition and Expansion

Disk Addition Trigger	[ServiceNow CR] Performed upon receipt of a ServiceNow request from a BU Representative
Disk Expansion Trigger	[ServiceNow CR] Request from BU, or [Monitoring Alert] when a disk capacity exhaustion alert is triggered and TCS determines that an emergency expansion is needed (TCS may proceed at its own discretion)
Operator	TCS

4-A: New Data Disk Addition [ServiceNow CR Trigger]

☐	Step	Action	Details / Verification Points
☐	1	Review the ServiceNow CR	Confirm the disk size, storage type, and intended use
☐	2	Obtain pre-work backup	Execute "Procedure 1: Pre-Work Backup"
☐	3	Open the VM's Disks settings in the Portal	Azure Portal → Target VM → "Settings" → "Disks" → Click "+ Add data disk"
☐	4	Create and configure the new disk	Configure the following in "Create disk": Name: `<hostname>_data<N>` (e.g., JZJP1WAPSP001_data01); Size: Size (GiB) as per the request; Storage type: Standard SSD LRS (general) / Premium SSD LRS (DB data); Source type: None (empty disk); Key management: Platform-managed key; Shared disk: No; Delete with VM: ON
☐	5	Click "Save" to attach the disk	On the Disks screen, click "Save" → Confirm the disk appears under Data disks
☐	6	Initialize and format the disk at the OS level	Follow the OS-specific procedures below (requires connection to the VM)
☐	7	Verify operation and close the ServiceNow CR	Report disk addition completion to the BU Representative and close the CR
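
Steps 3 to 5 can also be performed with the Azure CLI. The sketch below is an assumption-laden example rather than the standard method: the function names are hypothetical, it assumes the Azure VM resource name equals the OS hostname used in the disk naming rule, and it does not set the "Delete with VM" option, which would still need to be confirmed in the Portal.

```bash
#!/usr/bin/env bash

# data_disk_name HOSTNAME INDEX
# Build a disk name following <hostname>_data<N>, zero-padded to two digits
# as in the example JZJP1WAPSP001_data01.
data_disk_name() {
  local host="$1" index="$2"
  # Strip one leading zero so that inputs such as "08" are not read as octal.
  printf '%s_data%02d\n' "$host" "${index#0}"
}

# attach_new_data_disk RG VM INDEX SIZE_GIB [SKU]
# Create an empty managed disk and attach it to the VM in a single call.
# SKU defaults to StandardSSD_LRS (general use); pass Premium_LRS for DB data.
attach_new_data_disk() {
  local rg="$1" vm="$2" index="$3" size_gib="$4" sku="${5:-StandardSSD_LRS}"
  az vm disk attach \
    --resource-group "$rg" \
    --vm-name "$vm" \
    --name "$(data_disk_name "$vm" "$index")" \
    --new \
    --size-gb "$size_gib" \
    --sku "$sku"
}
```

For example, `data_disk_name JZJP1WAPSP001 1` prints `JZJP1WAPSP001_data01`. OS-level initialization is still required after the disk is attached.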

Disk Initialization at OS Level (Windows)

Windows PowerShell (Administrator privileges)

    # Check for uninitialized disks
    Get-Disk | Where-Object PartitionStyle -eq 'RAW'

    # Initialize the disk (GPT)
    Initialize-Disk -Number <disk-number> -PartitionStyle GPT

    # Create a partition and assign a drive letter
    # (e.g., drive F; D: is often already taken by the Azure temporary disk)
    New-Partition -DiskNumber <disk-number> -UseMaximumSize -DriveLetter F

    # Format with NTFS
    Format-Volume -DriveLetter F -FileSystem NTFS -NewFileSystemLabel "Data" -Confirm:$false

Disk Initialization at OS Level (Linux)

Linux Bash (root / sudo)

    # Check the newly added disk
    lsblk

    # Create a GPT label and a single partition (e.g., /dev/sdc)
    sudo parted /dev/sdc --script mklabel gpt
    sudo parted /dev/sdc --script mkpart primary ext4 0% 100%

    # Format with ext4
    sudo mkfs.ext4 /dev/sdc1

    # Create mount point and mount (e.g., /data01)
    sudo mkdir -p /data01
    sudo mount /dev/sdc1 /data01

    # Configure auto-mount on startup (append to /etc/fstab).
    # Use the UUID because /dev/sdX names can change between boots on Azure;
    # nofail lets the VM boot even if the disk is missing.
    DISK_UUID=$(sudo blkid -s UUID -o value /dev/sdc1)
    echo "UUID=${DISK_UUID} /data01 ext4 defaults,nofail 0 2" | sudo tee -a /etc/fstab

4-B: Existing Disk Capacity Expansion [Monitoring Alert Response or ServiceNow CR]

💡 TCS self-initiated expansion: If monitoring alerts show disk utilization has reached a critical level (e.g., over 90%), TCS may perform an immediate expansion at their own discretion. However, TCS must promptly notify the BU Representative afterward.
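
When an alert arrives, the utilization can be spot-checked on a Linux VM before deciding on an emergency expansion. This is a minimal sketch: the function names are hypothetical, and the default threshold of 90 simply mirrors the example in the note above and is not a confirmed standard value.

```bash
#!/usr/bin/env bash

# usage_percent MOUNTPOINT
# Print the used percentage of the file system containing MOUNTPOINT as a
# bare integer, using the portable (POSIX) df output format.
usage_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# is_critical USED_PERCENT [THRESHOLD]
# Succeed when utilization is strictly over the threshold (default: 90).
is_critical() {
  [ "$1" -gt "${2:-90}" ]
}
```

Example: `if is_critical "$(usage_percent /data01)"; then echo "Expansion candidate"; fi`. The mount point `/data01` is only an example.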

☐	Step	Action	Details / Verification Points
☐	1	Review the alert details and identify the target disk	Determine the target VM name, disk name, current utilization, and the size after expansion
☐	2	Obtain pre-work backup	Execute "Procedure 1: Pre-Work Backup" (if possible)
☐	3	Expand the Azure Managed Disk size	Portal → Target Disk (Managed Disk) → "Size + performance" → Enter the new size → "Save". Both OS disks and data disks support online expansion while the VM is running (no VM stop required).
☐	4	Expand the partition and file system at the OS level	Follow the OS-specific procedures below (perform immediately after expansion)
☐	5	Verify disk capacity after expansion	Confirm that the disk capacity has increased at the OS level (Windows: File Explorer / Linux: df -h)
☐	6	Report completion to BU Representative and update ServiceNow	Record the before/after sizes, execution date/time, and verification results

Azure Managed Disk Expansion (CLI)

Azure CLI

    # Expand disk size (e.g., expand to 200 GiB)
    az disk update \
      --resource-group <RG-name> \
      --name <disk-name> \
      --size-gb 200

File System Expansion at OS Level (Windows)

Windows PowerShell (Administrator privileges)

    # Check the expandable size
    $size = (Get-PartitionSupportedSize -DriveLetter C)
    Write-Host "Max size: $($size.SizeMax / 1GB) GB"

    # Expand the partition to maximum size (e.g., drive C)
    Resize-Partition -DriveLetter C -Size $size.SizeMax

File System Expansion at OS Level (Linux)

Linux Bash (root / sudo)

    # Check current partition status
    lsblk
    df -h

    # Expand the partition (e.g., partition 1 of /dev/sda); required for both ext4 and xfs
    sudo growpart /dev/sda 1

    # [For ext4 (e.g., Ubuntu default)]
    # Expand the file system (online expansion, no reboot required)
    sudo resize2fs /dev/sda1

    # [For xfs (e.g., RHEL default)]
    # xfs_growfs takes the mount point of the file system (e.g., /)
    sudo xfs_growfs /

    # Verify after expansion
    df -h
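
Which grow command applies depends on the file system type, which can be looked up instead of assumed. The sketch below is illustrative only: the function names `grow_command` and `plan_grow` are hypothetical, and it prints the command for review rather than running it. It passes the mount point to `xfs_growfs`, which is the form documented in its manual page.

```bash
#!/usr/bin/env bash

# grow_command FSTYPE DEVICE MOUNTPOINT
# Print the command that grows the given file system type.
grow_command() {
  local fstype="$1" device="$2" mountpoint="$3"
  case "$fstype" in
    ext2|ext3|ext4) echo "resize2fs ${device}" ;;
    xfs)            echo "xfs_growfs ${mountpoint}" ;;
    *)              echo "unsupported file system: ${fstype}" >&2; return 1 ;;
  esac
}

# plan_grow MOUNTPOINT
# Look up the backing device and file system type with findmnt, then print
# the matching grow command so the operator can review it before running.
plan_grow() {
  local mountpoint="$1" device fstype
  device=$(findmnt -n -o SOURCE --target "$mountpoint") || return 1
  fstype=$(findmnt -n -o FSTYPE --target "$mountpoint") || return 1
  grow_command "$fstype" "$device" "$mountpoint"
}
```

For example, `grow_command xfs /dev/sda2 /data01` prints `xfs_growfs /data01`. Run the printed command with `sudo` after `growpart` has extended the partition.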

Procedure 5: Azure VM Deletion

VM Deletion Procedure

☐	Step	Action	Details / Verification Points
☐	1	Review the ServiceNow CR	Confirm the target VM name, RG name, deletion reason, requester, and BU approval status
☐	2	Double-check the target VM name and RG name	Open the VM in the Portal and confirm that the VM name, RG, and tags match the details in the CR
☐	3	Obtain a final backup	Execute "Procedure 1: Pre-Work Backup" to preserve the final state before deletion
☐	4	Notify the BU Representative of the deletion	Notify the deletion date/time and target VM via Teams / Email
☐	5	Delete the VM	Portal → Target VM → "Delete" → Confirm that the target resources (VM, NIC, OS disk) are checked → Enter VM name → "Delete"
☐	6	Confirm related resources are deleted	Verify in the Portal that the NIC, OS disk, and data disks (if Delete with VM is ON) have been deleted
☐	7	Remove the VM from CyberArk	Delete the target VM entry from CyberArk
☐	8	Update ServiceNow CMDB	Retire/delete the corresponding record in the ServiceNow CMDB
☐	9	Complete and close the ServiceNow CR	Record the deletion completion date/time and verification results, then close

Azure CLI

    # Pre-verify VM name and RG name
    az vm show --resource-group <RG-name> --name <VM-name> --output table

    # Delete the VM (automatic deletion of related resources depends on "Delete with VM" settings)
    az vm delete \
      --resource-group <RG-name> \
      --name <VM-name> \
      --yes

    # Check if any NICs remain (delete manually if found)
    az network nic list --resource-group <RG-name> --output table

    # Check if the OS disk remains (delete manually if found)
    az disk list --resource-group <RG-name> --output table
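
In a Resource Group that hosts several VMs, the full NIC and disk lists can be long. The following sketch narrows the Step 6 check to resources that are no longer attached to any VM. The function names are hypothetical, and the queries assume the `diskState` and `virtualMachine` properties are present in the CLI output. The functions only list names; review each result against the CR before deleting anything, because an unattached resource may belong to another system.

```bash
#!/usr/bin/env bash

# list_unattached_disks RG
# Print the names of managed disks that are not attached to any VM.
list_unattached_disks() {
  az disk list \
    --resource-group "$1" \
    --query "[?diskState=='Unattached'].name" \
    --output tsv
}

# list_unattached_nics RG
# Print the names of NICs that are not bound to any VM.
# Single quotes keep the JMESPath backtick literal away from the shell.
list_unattached_nics() {
  az network nic list \
    --resource-group "$1" \
    --query '[?virtualMachine==`null`].name' \
    --output tsv
}
```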

Procedure 6: Azure VM Restore

6-A: File / Folder Restore

☐	Step	Action	Details / Verification Points
☐	1	Review the ServiceNow CR	Confirm the file path to restore, backup point (date/time), and restore destination
☐	2	Open the Recovery Services Vault	Azure Portal → "Recovery Services vaults" → Select the target Vault
☐	3	Select "File Recovery"	"Backup items" → "Azure Virtual Machine" → Target VM → "File Recovery"
☐	4	Select the restore point	Select the backup point (date/time) specified in the CR
☐	5	Download and execute the recovery script	Get the script via "Download Executable" → Run it on the target VM → The backup disk will be mounted
☐	6	Copy the target files to the restore destination	Copy the target files from the mounted backup disk to the original path
☐	7	Unmount the disk	After recovery is complete, click "Unmount disks" in the Portal to unmount the backup disk (auto-released after 12 hours)
☐	8	Verify restore results and close the ServiceNow CR	Have the BU Representative confirm file restore completion, then close the CR
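
Before Step 4, the available restore points can be listed from the CLI to confirm that the date/time named in the CR actually exists. This is a sketch under assumptions: the function name `list_recovery_points` is hypothetical, and the container and item are assumed to be addressable by the VM name, as in the backup commands in Procedure 1.

```bash
#!/usr/bin/env bash

# list_recovery_points RG VAULT VM
# Print the recovery points of a protected VM with their time and
# consistency type, for comparison with the backup point named in the CR.
list_recovery_points() {
  local rg="$1" vault="$2" vm="$3"
  az backup recoverypoint list \
    --resource-group "$rg" \
    --vault-name "$vault" \
    --container-name "$vm" \
    --item-name "$vm" \
    --backup-management-type AzureIaasVM \
    --query "[].{name: name, time: properties.recoveryPointTime, type: properties.recoveryPointType}" \
    --output table
}
```

Example call with placeholders: `list_recovery_points <RG-name> <Vault-name> <VM-name>`.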

6-B: Full VM Restore (Restore VM)

⚠️ A full VM restore cannot overwrite an existing VM. Always specify a new VM name / NIC / disk for the restore.
After the restore, re-registration in CyberArk and CMDB update are required for the restored VM.

☐	Step	Action	Details / Verification Points
☐	1	Review the ServiceNow CR	Confirm the target VM, backup point (date/time), destination RG, and new VM name
☐	2	Open the Recovery Services Vault	Portal → "Recovery Services vaults" → Target Vault → "Backup items" → "Azure Virtual Machine"
☐	3	Select "Restore VM" for the target VM	Target VM → Click "Restore VM"
☐	4	Select the restore point	Select the backup point (date/time) specified in the CR
☐	5	Configure the restore settings	Configure the following under "Create new": Restore type: "Create new virtual machine"; Virtual machine name: New VM name (follow naming conventions); Resource group: Destination RG; Virtual network / Subnet: Select from existing VNets / subnets; Staging location: Specify a temporary storage account
☐	6	Click "Restore" to start the restore	In "Backup Jobs", wait until the job shows "Completed" (typically 30–60 minutes)
☐	7	Assign the Common NSG to the restored VM	Re-assign the Common NSG (si2-securitygroup-shd-cs-tokyo-cmn-01) to the NIC of the restored VM
☐	8	Reconfigure the backup policy for the VM	Enable backup for the restored VM in the Recovery Services Vault
☐	9	Reconfigure tags	Set Suntory standard tags (Subsidiary / ServiceName / Environment, etc.) on the restored VM
☐	10	Re-register in CyberArk	Register the restored VM in CyberArk and configure/change the administrator password
☐	11	Update the ServiceNow CMDB	Update the corresponding CMDB record with the new VM name and IP address
☐	12	Verify operation and close the ServiceNow CR	Have the BU Representative confirm service recovery, then close the CR
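
Steps 7 to 9 are easy to miss after a long restore job, so they can be grouped into one script. The sketch below is illustrative and not the confirmed method: the function name `post_restore`, the parameter order, and the policy name passed in are assumptions. It also assumes the Common NSG can be resolved by name from the NIC's Resource Group; if the NSG lives elsewhere, pass its full resource ID instead.

```bash
#!/usr/bin/env bash

# Common NSG name as stated in Step 7 of this procedure.
COMMON_NSG="si2-securitygroup-shd-cs-tokyo-cmn-01"

# post_restore VM_RG VM NIC VAULT_RG VAULT POLICY TAG...
# TAG arguments are key=value pairs, e.g. Environment=<value>.
post_restore() {
  if [ "$#" -lt 7 ]; then
    echo "usage: post_restore VM_RG VM NIC VAULT_RG VAULT POLICY TAG..." >&2
    return 2
  fi
  local vm_rg="$1" vm="$2" nic="$3" vault_rg="$4" vault="$5" policy="$6" vm_id
  shift 6

  # Resolve the VM resource ID once; it is reused for backup and tagging.
  vm_id=$(az vm show --resource-group "$vm_rg" --name "$vm" \
    --query id --output tsv) || return 1

  # Step 7: re-assign the Common NSG to the NIC of the restored VM.
  az network nic update --resource-group "$vm_rg" --name "$nic" \
    --network-security-group "$COMMON_NSG" --output none || return 1

  # Step 8: enable backup for the restored VM in the Recovery Services Vault.
  az backup protection enable-for-vm --resource-group "$vault_rg" \
    --vault-name "$vault" --vm "$vm_id" --policy-name "$policy" \
    --output none || return 1

  # Step 9: merge the standard tags into the VM's existing tags.
  az tag update --resource-id "$vm_id" --operation Merge \
    --tags "$@" --output none
}
```

CyberArk registration and the CMDB update (Steps 10 and 11) are outside the Azure CLI and remain manual.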

Escalation Flow

Escalation Criteria

Situation	Escalation Target	Response Method
An event occurs that is not covered by the documented procedures	TCS Azure Operation Team → SHD	Create a ServiceNow incident, record the situation, and escalate
Service does not recover after restart or restore	TCS Azure Operation Team → Azure Support	Follow the Azure support inquiry procedure (refer to separate document)
Cannot obtain restart approval from the BU Representative	TCS Azure Operation Team → SHD	Request SHD to coordinate with the BU
An error occurs during Azure Portal operations	TCS Azure Operation Team → Azure Support	Capture the error message and operation logs via screenshots and report
Disk capacity shortage continues after expansion	SHD → BU	Escalate to SHD for an architecture review

Contact Information (update separately)

Role	Person / Team	Contact Method
TCS Operations Team Lead	—	Teams channel / Phone
SHD Service Manager	Tomoki Koyama (SHD / Digital & AI Global ITG)	Teams / Email
Azure Technical Support	—	Azure Portal Support Request (refer to separate document)