XPath Cheatsheet for Links and URLs
Combine attribute tests and XPath functions to target specific links. To start, select every <a> element in the document:
//a
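To try these expressions outside a browser, one option is the third-party Python library lxml, whose `xpath()` method evaluates XPath 1.0. This is a minimal sketch that assumes lxml is installed. The sample HTML is made up for illustration.

```python
from lxml import html

# Hypothetical sample page, used only for illustration.
PAGE = """
<html><body>
  <a href="https://example.com/blog/food/pasta">Pasta</a>
  <a href="http://example.com/news/today">News</a>
  <a href="/blog/travel/">Travel</a>
  <a href="https://cdn.example.com/audio/track.mp3">Track</a>
</body></html>
"""

tree = html.fromstring(PAGE)

# //a selects every <a> element anywhere in the document, in document order.
links = tree.xpath("//a")
hrefs = [a.get("href") for a in links]
print(hrefs)
```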
Exact Matching
Get all HTTPS links using the starts-with() function:
//a[starts-with(@href, 'https')]
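Get all links whose href does not start with https. This is the same expression negated with the not() function. It also matches relative URLs and <a> elements that have no href: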
//a[not(starts-with(@href, 'https'))]
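Get all links to MP3 audio files using the ends-with() function. Note that ends-with() exists only in XPath 2.0 and later. XPath 1.0 engines, including browsers' document.evaluate() and lxml, do not provide it: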
//a[ends-with(@href, '.mp3')]
Get all links without a trailing slash by combining the not() and ends-with() functions:
//a[not(ends-with(@href, '/'))]
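In an XPath 1.0 engine such as lxml, the two ends-with() expressions above fail with an error. A common workaround takes the tail of the string with substring() and string-length() and compares it to the suffix. The following sketch assumes lxml is installed. The sample page and the `hrefs` helper are hypothetical.

```python
from lxml import html

# Hypothetical sample page, used only for illustration.
PAGE = """
<html><body>
  <a href="https://example.com/">Home</a>
  <a href="http://example.com/about">About</a>
  <a href="https://example.com/audio/track.mp3">Track</a>
  <a href="/docs/guide">Guide</a>
</body></html>
"""

tree = html.fromstring(PAGE)

def hrefs(expr):
    """Hypothetical helper: return the href of every node matched by expr."""
    return [a.get("href") for a in tree.xpath(expr)]

# starts-with() is part of XPath 1.0, so these work as written.
https_links = hrefs("//a[starts-with(@href, 'https')]")
non_https_links = hrefs("//a[not(starts-with(@href, 'https'))]")

# XPath 1.0 stand-in for ends-with(@href, '.mp3'): take the last
# string-length('.mp3') characters of @href and compare them to '.mp3'.
mp3_links = hrefs(
    "//a[substring(@href, string-length(@href) - string-length('.mp3') + 1) = '.mp3']"
)

# XPath 1.0 stand-in for not(ends-with(@href, '/')): test only the last character.
no_trailing_slash = hrefs("//a[not(substring(@href, string-length(@href)) = '/')]")

print(https_links)
print(non_https_links)
print(mp3_links)
print(no_trailing_slash)
```

The substring() form is longer, but it runs unchanged in any XPath 1.0 engine. Like ends-with(), it is case-sensitive, so a link ending in .MP3 would not match.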
Non-Exact Matching
Get all links whose href contains "blog" using the contains() function:
//a[contains(@href, 'blog')]
Negate the above to get all non-blog links using the not() function:
//a[not(contains(@href, 'blog'))]
Get all blog links about food using the and operator:
//a[contains(@href, 'blog') and contains(@href, 'food')]
Get all blog links that aren't about food by combining the and operator with the not() function:
//a[contains(@href, 'blog') and not(contains(@href, 'food'))]
Get all blog or news links using the or operator:
//a[contains(@href, 'blog') or contains(@href, 'news')]
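The contains(), not(), and, and or expressions above are all valid XPath 1.0, so they can be run unchanged in lxml. This is a minimal sketch that assumes lxml is installed. The sample paths and the `hrefs` helper are hypothetical.

```python
from lxml import html

# Hypothetical sample page, used only for illustration.
PAGE = """
<html><body>
  <a href="/blog/food/pasta">Pasta</a>
  <a href="/blog/travel/rome">Rome</a>
  <a href="/news/today">Today</a>
  <a href="/shop/cart">Cart</a>
</body></html>
"""

tree = html.fromstring(PAGE)

def hrefs(expr):
    """Hypothetical helper: return the href of every node matched by expr."""
    return [a.get("href") for a in tree.xpath(expr)]

blog = hrefs("//a[contains(@href, 'blog')]")
non_blog = hrefs("//a[not(contains(@href, 'blog'))]")
blog_food = hrefs("//a[contains(@href, 'blog') and contains(@href, 'food')]")
blog_not_food = hrefs("//a[contains(@href, 'blog') and not(contains(@href, 'food'))]")
blog_or_news = hrefs("//a[contains(@href, 'blog') or contains(@href, 'news')]")

print(blog, non_blog, blog_food, blog_not_food, blog_or_news, sep="\n")
```

contains() matches any substring, so 'blog' would also match a path such as /weblogs/. If that matters, include the delimiters in the search string, for example contains(@href, '/blog/').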
Other
Get all links with a URL longer than 55 characters using the string-length() function:
//a[string-length(@href) > 55]
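string-length() is also part of XPath 1.0. The sketch below runs the expression in lxml and cross-checks the result with Python's len(). It assumes lxml is installed, and the sample URLs are made up.

```python
from lxml import html

# Hypothetical sample page, used only for illustration.
PAGE = """
<html><body>
  <a href="/short">Short</a>
  <a href="https://example.com/a/very/long/path/that/keeps/going/and/going">Long</a>
</body></html>
"""

tree = html.fromstring(PAGE)

# Select links whose href is longer than 55 characters.
long_links = tree.xpath("//a[string-length(@href) > 55]")

# Cross-check: the same lengths measured in Python.
lengths = [(a.get("href"), len(a.get("href"))) for a in long_links]
print(lengths)
```

string-length() measures the href exactly as written in the markup. A relative URL is therefore measured before it is resolved against the page's base URL.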