XPath Cheatsheet for Links and URLs
Combine attributes and functions to target specific link nodes. To start, select all <a> link nodes:
//a
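To try the expressions in this cheatsheet, they need an XPath engine to run in. Below is a minimal sketch using Python with the third-party lxml package (an assumption; any XPath 1.0 engine should behave similarly). The sample markup and its URLs are hypothetical and exist only for illustration.

```python
from lxml import html  # third-party; assumed installed (pip install lxml)

# Hypothetical sample page, used only for illustration.
SAMPLE_HTML = """
<html><body>
  <a href="https://example.com/blog/food/pasta/">Pasta</a>
  <a href="http://example.com/news/today">News</a>
  <a href="https://example.com/audio/episode-1.mp3">Episode 1</a>
  <a name="top">Named anchor without href</a>
</body></html>
"""

doc = html.fromstring(SAMPLE_HTML)

# Every <a> element, whether or not it carries an href attribute.
all_links = doc.xpath("//a")

# Appending /@href returns the attribute values instead of the elements.
hrefs = doc.xpath("//a/@href")

print(len(all_links))
for href in hrefs:
    print(href)
```

Note that `//a` also returns anchors that have no `href`, so the number of elements can exceed the number of URLs.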
Exact Matching
Get all HTTPS links using the starts-with() function:
//a[starts-with(@href, 'https')]
Get all non-HTTPS links. As above, but negated with the not() function:
//a[not(starts-with(@href, 'https'))]
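One subtlety of the negated form: an <a> element with no href at all also satisfies not(starts-with(...)), because a missing attribute is treated as an empty string. The sketch below, again assuming Python with lxml and hypothetical sample markup, compares the two expressions above with a variant that adds an `@href` test to the predicate.

```python
from lxml import html  # third-party; assumed installed (pip install lxml)

# Hypothetical sample markup for illustration only.
SAMPLE_HTML = """
<html><body>
  <a href="https://example.com/">Secure</a>
  <a href="http://example.com/">Plain HTTP</a>
  <a href="mailto:hello@example.com">Email</a>
  <a name="top">Named anchor, no href</a>
</body></html>
"""

doc = html.fromstring(SAMPLE_HTML)

https_links = doc.xpath("//a[starts-with(@href, 'https')]")

# Matches the HTTP link, the mailto link, and the anchor without href.
non_https = doc.xpath("//a[not(starts-with(@href, 'https'))]")

# Requiring @href first skips <a> elements that have no href attribute.
non_https_with_href = doc.xpath(
    "//a[@href and not(starts-with(@href, 'https'))]"
)

print(len(https_links), len(non_https), len(non_https_with_href))
```

Whether anchors without href should count as "non-HTTPS links" depends on the task; the `@href and ...` form is one way to exclude them.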
Get all links to MP3 audio files using the ends-with() function (XPath 2.0 and later; XPath 1.0 engines do not provide it):
//a[ends-with(@href, '.mp3')]
Get all links without a trailing slash by combining the not() and ends-with() functions:
//a[not(ends-with(@href, '/'))]
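ends-with() was introduced in XPath 2.0, so XPath 1.0 engines such as the one behind lxml do not provide it. A common workaround is to compare the tail of the string using substring() and string-length(). The sketch below assumes Python with lxml; `ends_with` is a hypothetical helper that only builds the predicate text, and it assumes the suffix contains no single quote.

```python
from lxml import html  # third-party; assumed installed (pip install lxml)

# Hypothetical sample markup for illustration only.
SAMPLE_HTML = """
<html><body>
  <a href="https://example.com/blog/food/pasta/">Pasta</a>
  <a href="http://example.com/news/today">News</a>
  <a href="https://example.com/audio/episode-1.mp3">Episode 1</a>
</body></html>
"""

doc = html.fromstring(SAMPLE_HTML)


def ends_with(attr: str, suffix: str) -> str:
    """Build an XPath 1.0 predicate that emulates ends-with(attr, suffix).

    Hypothetical helper: assumes `suffix` contains no single quote.
    """
    # Take the last string-length(suffix) characters and compare them.
    return (
        f"substring({attr}, string-length({attr}) - string-length('{suffix}') + 1)"
        f" = '{suffix}'"
    )


mp3_links = doc.xpath(f"//a[{ends_with('@href', '.mp3')}]/@href")
no_trailing_slash = doc.xpath(f"//a[not({ends_with('@href', '/')})]/@href")

print(mp3_links)
print(no_trailing_slash)
```

On an XPath 2.0 or later engine, the plain ends-with() expressions above can be used directly and this workaround is unnecessary.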
Non-Exact Matching
Get all blog links using the contains() function:
//a[contains(@href, 'blog')]
Negate the above to get all non-blog links using the not() function:
//a[not(contains(@href, 'blog'))]
Get all blog links about food using the and operator:
//a[contains(@href, 'blog') and contains(@href, 'food')]
Get all blog links that aren't about food by combining the and operator with the not() function:
//a[contains(@href, 'blog') and not(contains(@href, 'food'))]
Get all blog or news links using the or operator:
//a[contains(@href, 'blog') or contains(@href, 'news')]
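The contains() combinations above can be checked side by side against a small document. This is a sketch assuming Python with lxml; the sample URLs are hypothetical. Keep in mind that contains() is a case-sensitive substring test on the whole href, so 'food' also matches a path such as /news/food-prices.

```python
from lxml import html  # third-party; assumed installed (pip install lxml)

# Hypothetical sample markup for illustration only.
SAMPLE_HTML = """
<html><body>
  <a href="/blog/food/pasta">Pasta</a>
  <a href="/blog/travel/rome">Rome</a>
  <a href="/news/food-prices">Food prices</a>
  <a href="/about">About</a>
</body></html>
"""

doc = html.fromstring(SAMPLE_HTML)

# Each expression is taken from the cheatsheet, with /@href appended
# so that the result is a list of URL strings rather than elements.
queries = {
    "blog": "//a[contains(@href, 'blog')]/@href",
    "not blog": "//a[not(contains(@href, 'blog'))]/@href",
    "blog and food": "//a[contains(@href, 'blog') and contains(@href, 'food')]/@href",
    "blog, not food": "//a[contains(@href, 'blog') and not(contains(@href, 'food'))]/@href",
    "blog or news": "//a[contains(@href, 'blog') or contains(@href, 'news')]/@href",
}

results = {label: doc.xpath(expr) for label, expr in queries.items()}

for label, hrefs in results.items():
    print(f"{label}: {hrefs}")
```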
Other
Get all links with a URL longer than 55 characters using the string-length() function:
//a[string-length(@href) > 55]
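The threshold of 55 does not have to be hard-coded. As a sketch, assuming Python with lxml, the limit can be passed as an XPath variable, which lxml binds from a keyword argument. The function name `links_longer_than` and the sample URLs are hypothetical.

```python
from lxml import html  # third-party; assumed installed (pip install lxml)

# Hypothetical sample markup for illustration only.
SAMPLE_HTML = """
<html><body>
  <a href="https://example.com/">Short</a>
  <a href="https://example.com/blog/2020/01/a-very-long-article-slug-about-xpath/">Long</a>
</body></html>
"""

doc = html.fromstring(SAMPLE_HTML)


def links_longer_than(tree, max_length):
    """Return href values whose length exceeds max_length characters."""
    # $limit is an XPath variable; lxml fills it from the keyword argument.
    return tree.xpath(
        "//a[string-length(@href) > $limit]/@href", limit=max_length
    )


long_links = links_longer_than(doc, 55)
print(long_links)
```

Using a variable avoids building the expression with string formatting, which keeps the XPath text constant across calls.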