XML Traversal in PowerShell for dummies
The Object: System.Xml.XmlDocument
This is a .NET class implementing the W3C DOM (Document Object Model). When you load XML into it, it parses the entire document into a tree of nodes in memory. Every node in that tree is a .NET object. The types you encounter:
| .NET Type | What it is |
|---|---|
| XmlDocument | The root container: the document itself |
| XmlElement | A tag: `<EntityType Name="Foo">` |
| XmlAttribute | An attribute on a tag: `Name="Foo"` |
| XmlText | Text content inside a tag |
| XmlNodeList | A collection of nodes (returned by SelectNodes) |
$xml = [System.Xml.XmlDocument]::new()
$xml.Load("C:\path\to\file.xml")
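::new() is PowerShell's syntax for calling the .NET constructor (available from PowerShell 5.0). You could also write New-Object System.Xml.XmlDocument: same result, uglier. Load() parses the file from disk into the in-memory tree; to parse XML you already hold in a string, use LoadXml() instead.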
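To see the node types from the table in a live session, here is a minimal sketch. The inline XML string is a made-up sample, not taken from any real metadata file, and LoadXml() is used in place of Load() so nothing has to exist on disk.

```powershell
# Hypothetical sample document; any well-formed XML would do.
$xml = [System.Xml.XmlDocument]::new()
$xml.LoadXml('<Root><EntityType Name="Foo">hello</EntityType></Root>')

# Walk down from the document to an element, an attribute and a text node.
$element   = $xml.DocumentElement.FirstChild      # the <EntityType> element
$attribute = $element.GetAttributeNode('Name')    # the Name="Foo" attribute
$text      = $element.FirstChild                  # the text "hello"

$xml.GetType().Name         # XmlDocument
$element.GetType().Name     # XmlElement
$attribute.GetType().Name   # XmlAttribute
$text.GetType().Name        # XmlText
```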
Querying: XPath
You query the tree using XPath — a query language specifically for XML. Two methods:
$xml.SelectSingleNode("//EntityType[@Name='Foo']") # returns first match
$xml.SelectNodes("//EntityType") # returns XmlNodeList (all matches)
XPath syntax you need to know:
| Syntax | Meaning |
|---|---|
| `//TagName` | Find this tag anywhere in the document |
| `/Root/Child` | Exact path from root |
| `[@Attr='Value']` | Filter where attribute equals value |
| `[@Attr]` | Filter where attribute exists |
| `*` | Any element |
| `.` | Current node |
| `..` | Parent node |
So //edm:EntityType[@Name='PurchaseOrderLineV2'] means: anywhere in the document, find an EntityType element (in the edm namespace) where the Name attribute is exactly PurchaseOrderLineV2.
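The syntax in the table can be tried on a small document without namespaces, which keeps the queries free of prefixes. This is only a sketch: the Schema, EntityType, Property and EnumType content below is invented for illustration.

```powershell
# Hypothetical sample with no namespace, so no namespace manager is needed.
$xml = [System.Xml.XmlDocument]::new()
$xml.LoadXml(@'
<Schema>
  <EntityType Name="Foo"><Property Name="Id" Type="Edm.Int32" /></EntityType>
  <EntityType Name="Bar" />
  <EnumType Name="Status" />
</Schema>
'@)

$anywhere = $xml.SelectNodes('//EntityType')                    # tag anywhere in the document
$children = $xml.SelectNodes('/Schema/*')                       # any element directly under the root
$bar      = $xml.SelectSingleNode("//EntityType[@Name='Bar']")  # attribute equals value
$typed    = $xml.SelectSingleNode('//Property[@Type]')          # attribute exists
$parent   = $typed.SelectSingleNode('..')                       # parent of the current node

$anywhere.Count                 # 2
$children.Count                 # 3
$bar.GetAttribute('Name')       # Bar
$parent.GetAttribute('Name')    # Foo
```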
XML Namespaces and XmlNamespaceManager
XML namespaces exist to prevent name collisions when multiple XML vocabularies are mixed in one document. A namespace is declared on an element like this:
<Schema xmlns="http://docs.oasis-open.org/odata/ns/edm">
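That URI is the namespace. Because the declaration above has no prefix, it is a default namespace: Schema and every unprefixed element beneath it belong to that namespace. edm is just a prefix alias, a shorthand you define yourself for use in XPath queries so you do not have to type the full URI every time; the document itself does not have to use it. The prefix is meaningless on its own; it is the URI that matters.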
In .NET, XmlNamespaceManager holds a registry of prefix → URI mappings:
$ns = [System.Xml.XmlNamespaceManager]::new($xml.NameTable)
$ns.AddNamespace("edm", "http://docs.oasis-open.org/odata/ns/edm")
NameTable is the document's internal string pool — the namespace manager needs it to intern strings efficiently. You could name the prefix anything:
$ns.AddNamespace("x", "http://docs.oasis-open.org/odata/ns/edm")
$xml.SelectSingleNode("//x:EntityType[@Name='Foo']", $ns)
Same result. The prefix edm is convention, not law.
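When do you need a namespace manager? Whenever the elements you query belong to a namespace, whether it is declared as a default namespace with xmlns= or with a prefix such as xmlns:edm=. In XPath 1.0 an unprefixed name matches only elements in no namespace, so without the manager a query like //EntityType finds nothing: SelectNodes returns an empty list and SelectSingleNode returns $null.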
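The effect is easy to reproduce. The sketch below loads a one-line sample document (invented for this example) that uses the OData edm namespace as its default namespace, then runs the same query with and without a registered prefix. The last query uses the standard XPath local-name() function, which can serve as a workaround when you deliberately want to ignore namespaces.

```powershell
# Hypothetical sample: a default namespace and a single entity.
$xml = [System.Xml.XmlDocument]::new()
$xml.LoadXml('<Schema xmlns="http://docs.oasis-open.org/odata/ns/edm"><EntityType Name="Foo" /></Schema>')

$ns = [System.Xml.XmlNamespaceManager]::new($xml.NameTable)
$ns.AddNamespace('edm', 'http://docs.oasis-open.org/odata/ns/edm')

$withoutPrefix = $xml.SelectNodes('//EntityType')                     # looks in "no namespace"
$withPrefix    = $xml.SelectNodes('//edm:EntityType', $ns)            # looks in the edm namespace
$byLocalName   = $xml.SelectNodes("//*[local-name()='EntityType']")   # ignores the namespace

$withoutPrefix.Count   # 0
$withPrefix.Count      # 1
$byLocalName.Count     # 1
```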
Working With Results
Once you have a node:
$entity = $xml.SelectSingleNode("//edm:EntityType[@Name='PurchaseOrderLineV2']", $ns)
# Read an attribute
$entity.Name # "PurchaseOrderLineV2"
$entity.GetAttribute("Name") # same thing — more explicit
# Get child elements
$entity.ChildNodes # all direct children as XmlNodeList
$entity.SelectNodes("edm:Property", $ns) # children matching tag name
# Walk the XmlNodeList
$entity.SelectNodes("edm:Property", $ns) | ForEach-Object {
    [PSCustomObject]@{
        Name = $_.Name
        Type = $_.GetAttribute("Type")
    }
}
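$_.Name on an XmlElement gives you the value of the element's Name attribute. This comes from PowerShell's XML adapter, which exposes attributes and child elements as properties; it is not an XPath feature. The adapted property shadows the .NET XmlElement.Name property (the tag name), so on an element that has no Name attribute the same expression returns the tag name instead. GetAttribute() is more explicit and unambiguous.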
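The difference between property access and GetAttribute() shows up as soon as one element lacks the attribute. A small sketch, using two made-up Property elements of which only the first has a Name attribute:

```powershell
# Hypothetical sample: the second <Property> has no Name attribute.
$xml = [System.Xml.XmlDocument]::new()
$xml.LoadXml('<Root><Property Name="Id" Type="Edm.Int32" /><Property Type="Edm.String" /></Root>')

$props = @($xml.SelectNodes('//Property'))   # copy the XmlNodeList into an array

$props[0].Name                    # Id        (the attribute shadows the tag name)
$props[1].Name                    # Property  (no such attribute, so the tag name)
$props[0].LocalName               # Property  (always the tag name, without prefix)
$props[1].GetAttribute('Name')    # empty string, never the tag name
$props[1].HasAttribute('Name')    # False
```

When a script needs to tell "attribute missing" apart from "attribute empty", HasAttribute() is the safer check.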
The Alternative: Select-Xml
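PowerShell has a built-in cmdlet, Select-Xml. It is more verbose, because each match comes back wrapped in a SelectXmlInfo object whose Node property holds the actual node, and it can be slower for repeated queries, because every call with -Path parses the file again. It is still sometimes convenient:
Select-Xml -Path "file.xml" -XPath "//edm:EntityType" -Namespace @{edm="http://docs.oasis-open.org/odata/ns/edm"} |
    Select-Object -ExpandProperty Node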
Select-Xml wraps XmlDocument internally. The direct approach gives full access to the object model without the abstraction overhead.
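Select-Xml also accepts XML as a string through its -Content parameter, which makes it easy to try out without a file. The sketch below uses an invented two-entity sample and unwraps each result through its Node property.

```powershell
# Hypothetical sample content; -Content takes the XML as a string instead of a file path.
$content = '<Schema xmlns="http://docs.oasis-open.org/odata/ns/edm"><EntityType Name="Foo" /><EntityType Name="Bar" /></Schema>'

$names = Select-Xml -Content $content -XPath '//edm:EntityType' -Namespace @{ edm = 'http://docs.oasis-open.org/odata/ns/edm' } |
    ForEach-Object { $_.Node.GetAttribute('Name') }   # .Node is the underlying XmlElement

$names   # Foo, Bar
```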
When NOT to Use DOM
XmlDocument loads the entire file into memory. For very large files (hundreds of MB), use XmlReader instead — it is a forward-only streaming parser that does not materialise the whole tree. The trade-off: no XPath, no random access.
For one-time lookups on files that fit in RAM, DOM is correct.
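For comparison, a streaming read with XmlReader might look like the sketch below. It collects the Name attribute of every EntityType element while moving forward through the input once. The sample string is invented and read through a StringReader to keep the example self-contained; for a real file you would pass the path to XmlReader.Create() instead.

```powershell
# Hypothetical sample; a large file would be opened with [System.Xml.XmlReader]::Create($path).
$content = '<Schema><EntityType Name="Foo" /><EntityType Name="Bar" /><EnumType Name="Status" /></Schema>'
$reader  = [System.Xml.XmlReader]::Create([System.IO.StringReader]::new($content))
$names   = [System.Collections.Generic.List[string]]::new()

try {
    # Read() advances one node at a time; nothing behind the cursor is kept in memory.
    while ($reader.Read()) {
        if ($reader.NodeType -eq [System.Xml.XmlNodeType]::Element -and $reader.LocalName -eq 'EntityType') {
            $names.Add($reader.GetAttribute('Name'))
        }
    }
}
finally {
    $reader.Dispose()
}

$names   # Foo, Bar
```

The filtering that XPath did in one expression now has to be written by hand, which is the trade-off described above.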
Complete Example: Querying OData Metadata
# Load the document
$xml = [System.Xml.XmlDocument]::new()
$xml.Load("C:\path\to\metadata.xml")
# Register the namespace
$ns = [System.Xml.XmlNamespaceManager]::new($xml.NameTable)
$ns.AddNamespace("edm", "http://docs.oasis-open.org/odata/ns/edm")
# Find a specific entity and its properties
$entity = $xml.SelectSingleNode("//edm:EntityType[@Name='PurchaseOrderLineV2']", $ns)
$entity.SelectNodes("edm:Property", $ns) |
    Where-Object { $_.Name -like "*Status*" } |
    Select-Object Name, Type
# Resolve an enum type
$enum = $xml.SelectSingleNode("//edm:EnumType[@Name='PurchStatus']", $ns)
$enum.SelectNodes("edm:Member", $ns) | Select-Object Name, Value