XPath
XPath (XML Path Language) is a query language for navigating XML documents and selecting nodes such as elements, attributes, and text. It is commonly used in web scraping, XML processing, and automated testing.
XPath Syntax
XPath expressions use a path notation to identify and navigate XML elements. The basic syntax includes:
- Node Selection: / selects the root node, and element selects all child elements of the current node with the specified name. Example: /root/element
- Wildcards: The * symbol is a wildcard that matches any element. Example: /root/*
- Predicates: Square brackets [] specify conditions for node selection. Example: /root/element[@attribute='value']
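The three syntax forms above can be tried out with a minimal sketch using Python's standard xml.etree.ElementTree module. The sample document and its element and attribute names are hypothetical. ElementTree supports only a subset of XPath and evaluates paths relative to the element it is called on, so /root/element is written here as ./element from the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; names are illustrative only.
XML = """
<root>
  <element attribute="value">first</element>
  <element attribute="other">second</element>
  <note>third</note>
</root>
"""

root = ET.fromstring(XML)

# /root/element: child elements named "element" (relative to root here)
elements = root.findall("./element")

# /root/*: every child element of root, whatever its name
children = root.findall("./*")

# /root/element[@attribute='value']: only elements whose attribute matches
matches = root.findall("./element[@attribute='value']")

print([e.text for e in elements])   # ['first', 'second']
print([c.tag for c in children])    # ['element', 'element', 'note']
print([m.text for m in matches])    # ['first']
```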
Absolute XPath
Absolute XPath provides the complete path from the root node to the desired element. It starts with a single forward slash /.
/html/body/div[1]/p[2]
Relative XPath
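Relative XPath selects elements based on their relationship to other elements rather than spelling out every step from the root. It usually begins with a double slash //, which matches nodes at any depth, so the expression tends to keep working when unrelated parts of the document change.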
//div[@class='example']/a
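To illustrate why the relative form is usually more robust, the sketch below runs both expressions against a hypothetical page before and after a layout change. It uses Python's standard xml.etree.ElementTree, which evaluates paths relative to the element it is called on, so the two expressions are adapted with a leading dot. The markup and the texts helper are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical page markup, used only for illustration.
PAGE = """
<html>
  <body>
    <div><p>first</p><p>second</p></div>
    <div class="example"><a href="/docs">Docs</a></div>
  </body>
</html>
"""

# The same page after a layout change: a <main> wrapper now surrounds the content.
REDESIGNED = """
<html>
  <body>
    <main>
      <div><p>first</p><p>second</p></div>
      <div class="example"><a href="/docs">Docs</a></div>
    </main>
  </body>
</html>
"""

# /html/body/div[1]/p[2], written relative to the <html> element
ABSOLUTE = "./body/div[1]/p[2]"
# //div[@class='example']/a, written relative to the <html> element
RELATIVE = ".//div[@class='example']/a"


def texts(markup, path):
    """Return the text of every element that `path` matches in `markup`."""
    root = ET.fromstring(markup)
    return [el.text for el in root.findall(path)]


print(texts(PAGE, ABSOLUTE))        # ['second']
print(texts(PAGE, RELATIVE))        # ['Docs']
print(texts(REDESIGNED, ABSOLUTE))  # [] (the extra wrapper breaks the fixed path)
print(texts(REDESIGNED, RELATIVE))  # ['Docs']
```

The absolute path depends on every ancestor staying in place, while the relative one depends only on the class attribute and the link inside it.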
XPath Functions
XPath provides functions for more complex queries, such as contains() and position(), which are often combined with the text() node test.
//h2[contains(text(),'XPath')]
Some functions include:
- text(): Selects the text nodes of an element. Strictly speaking it is a node test rather than a function, but it is written with the same call-like syntax.
- contains(string, substring): Checks whether a string contains a specific substring.
- starts-with(string, prefix): Checks whether a string starts with a specified prefix.
- concat(string1, string2, ...): Concatenates two or more strings.
- not(expression): Negates a given expression.
- position(): Returns the position of the current node in the selection.
- last(): Returns the position of the last node in the selection, which equals the number of nodes in it.
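The functions listed above can be exercised with a short sketch. Python's standard library does not implement these XPath functions, so this example assumes the third-party lxml package is installed. The markup is hypothetical and exists only to give each function something to match.

```python
from lxml import etree  # third-party; assumed installed (pip install lxml)

# Hypothetical markup for illustration only.
PAGE = """
<html>
  <body>
    <h2>XPath Basics</h2>
    <h2>CSS Selectors</h2>
    <ul>
      <li class="item new">alpha</li>
      <li class="item">beta</li>
      <li class="archived">gamma</li>
    </ul>
  </body>
</html>
"""

root = etree.fromstring(PAGE)

# contains(): headings whose text includes the substring "XPath"
headings = root.xpath("//h2[contains(text(), 'XPath')]/text()")

# starts-with(): list items whose class attribute begins with "item"
items = root.xpath("//li[starts-with(@class, 'item')]/text()")

# not(): list items whose class attribute does not contain "item"
others = root.xpath("//li[not(contains(@class, 'item'))]/text()")

# position() and last(): the second list item, then the final one
second = root.xpath("//li[position() = 2]/text()")
final = root.xpath("//li[last()]/text()")

# concat(): joins strings and returns a plain string, not a list of nodes
joined = root.xpath("concat(//h2[1], ' / ', //h2[2])")

print(headings)  # ['XPath Basics']
print(items)     # ['alpha', 'beta']
print(others)    # ['gamma']
print(second)    # ['beta']
print(final)     # ['gamma']
print(joined)    # XPath Basics / CSS Selectors
```

Note that position() and last() are evaluated among siblings matched under the same parent, which is why they behave predictably here: all three li elements share one ul.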