XPath Cheatsheet for Links and URLs

Combine attribute tests and XPath functions to target specific link nodes. First, here is how to select all <a> link nodes:

//a
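To try these expressions outside a browser, one option is Python's third-party lxml library (pip install lxml), which implements XPath 1.0. The sketch below is only an illustration: the sample page and the `select_links` helper are hypothetical, made up for this example.

```python
from lxml import html

# Hypothetical sample page used only for this illustration.
PAGE = """
<html><body>
  <a href="https://example.com/blog/food/pasta">Pasta</a>
  <a href="http://example.com/news/">News</a>
  <a href="https://example.com/podcast/ep1.mp3">Episode 1</a>
  <a name="top">Back to top</a>
</body></html>
"""


def select_links(page: str, expr: str) -> list:
    """Parse an HTML string and return the elements matched by an XPath expression."""
    tree = html.fromstring(page)
    return tree.xpath(expr)


links = select_links(PAGE, "//a")
print(len(links))                      # 4, including the <a> that has no href
print([a.get("href") for a in links])  # the last entry is None
```

Note that //a matches every anchor element, including named anchors with no href at all; //a[@href] limits the result to elements that actually carry a URL.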

Exact Matching

Get all HTTPS links using the starts-with() function:

//a[starts-with(@href, 'https')]
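XPath string comparisons are case-sensitive, while URL schemes are not (HTTPS://example.com is a valid link), so starts-with(@href, 'https') misses upper-case schemes. A common XPath 1.0 workaround lower-cases the attribute with translate() before comparing. A minimal sketch using the third-party lxml library; the sample page is hypothetical.

```python
from lxml import html

# Hypothetical sample page used only for this illustration.
PAGE = """<html><body>
  <a href="https://example.com/a">Lower</a>
  <a href="HTTPS://example.com/b">Upper</a>
  <a href="http://example.com/c">Plain</a>
</body></html>"""

UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"

tree = html.fromstring(PAGE)

# Case-sensitive, as in the cheatsheet: misses the upper-case scheme.
exact = tree.xpath("//a[starts-with(@href, 'https')]")

# translate() maps each upper-case ASCII letter to its lower-case counterpart
# before starts-with() compares the prefix.
folded = tree.xpath(
    f"//a[starts-with(translate(@href, '{UPPER}', '{LOWER}'), 'https')]"
)

print([a.text for a in exact])   # ['Lower']
print([a.text for a in folded])  # ['Lower', 'Upper']
```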

Get all non-HTTPS links. As above, but negated by the not() function:

//a[not(starts-with(@href, 'https'))]
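One caveat with negation: when an <a> element has no href, starts-with() sees an empty string and returns false, so not() returns true and the href-less anchor is selected too. The same applies to the other not() patterns in this cheatsheet. Adding an @href test excludes such elements. A sketch with the third-party lxml library and a hypothetical page:

```python
from lxml import html

# Hypothetical sample page used only for this illustration.
PAGE = """<html><body>
  <a href="https://example.com/">Secure</a>
  <a href="http://example.com/">Plain</a>
  <a name="top">Anchor without href</a>
</body></html>"""

tree = html.fromstring(PAGE)

# The bare negation also matches the anchor that has no href attribute.
loose = tree.xpath("//a[not(starts-with(@href, 'https'))]")

# Requiring @href first restricts the result to elements that carry a URL.
strict = tree.xpath("//a[@href and not(starts-with(@href, 'https'))]")

print([a.text for a in loose])   # ['Plain', 'Anchor without href']
print([a.text for a in strict])  # ['Plain']
```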

Get all links to MP3 audio files using the ends-with() function (available in XPath 2.0 and later; XPath 1.0 engines such as browsers and lxml do not provide it):

//a[ends-with(@href, '.mp3')]
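In an XPath 1.0 engine, the usual substitute for ends-with() compares the tail of the string using substring() and string-length(). The sketch below assumes the third-party lxml library; the `ends_with_xpath` helper and the sample page are hypothetical, and the helper assumes the suffix contains no single quotes.

```python
from lxml import html

# Hypothetical sample page used only for this illustration.
PAGE = """<html><body>
  <a href="https://example.com/ep1.mp3">Episode 1</a>
  <a href="https://example.com/ep1.mp3.html">Show notes</a>
  <a href="https://example.com/ep2.MP3">Episode 2</a>
</body></html>"""


def ends_with_xpath(attr: str, suffix: str) -> str:
    """Build an XPath 1.0 predicate equivalent to ends-with(attr, suffix).

    substring() is 1-indexed, so the suffix starts at
    string-length(attr) - string-length(suffix) + 1.
    """
    return (f"substring({attr}, string-length({attr}) - string-length('{suffix}') + 1)"
            f" = '{suffix}'")


tree = html.fromstring(PAGE)
mp3_links = tree.xpath(f"//a[{ends_with_xpath('@href', '.mp3')}]")
print([a.text for a in mp3_links])  # ['Episode 1']
```

The comparison is case-sensitive like ends-with(), so .MP3 is not matched.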

Get all links without a trailing slash by combining the not() and ends-with() functions:

//a[not(ends-with(@href, '/'))]
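For a one-character suffix such as '/', an XPath 1.0 rewrite is simpler: substring(@href, string-length(@href)) is the last character of the URL. A sketch with the third-party lxml library and a hypothetical page:

```python
from lxml import html

# Hypothetical sample page used only for this illustration.
PAGE = """<html><body>
  <a href="https://example.com/about/">About</a>
  <a href="https://example.com/contact">Contact</a>
  <a href="/">Home</a>
</body></html>"""

# XPath 1.0 equivalent of //a[not(ends-with(@href, '/'))]:
# compare the last character of @href with '/' and negate the result.
NO_TRAILING_SLASH = "//a[not(substring(@href, string-length(@href)) = '/')]"

tree = html.fromstring(PAGE)
print([a.text for a in tree.xpath(NO_TRAILING_SLASH)])  # ['Contact']
```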

Get all blog links (URLs containing 'blog') using the contains() function:

//a[contains(@href, 'blog')]
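contains() is a plain substring test, so 'blog' also matches URLs such as /weblogs or a blog. subdomain. Including the slashes narrows the match to a /blog/ path segment. A sketch with the third-party lxml library and a hypothetical page:

```python
from lxml import html

# Hypothetical sample page used only for this illustration.
PAGE = """<html><body>
  <a href="https://example.com/blog/pasta">Pasta post</a>
  <a href="https://example.com/weblogs">Weblogs page</a>
  <a href="https://blog.example.org/">Blog subdomain</a>
</body></html>"""

tree = html.fromstring(PAGE)

# Matches 'blog' anywhere in the URL, including inside 'weblogs'.
anywhere = tree.xpath("//a[contains(@href, 'blog')]")

# Matches only URLs with a /blog/ path segment.
segment = tree.xpath("//a[contains(@href, '/blog/')]")

print([a.text for a in anywhere])  # ['Pasta post', 'Weblogs page', 'Blog subdomain']
print([a.text for a in segment])   # ['Pasta post']
```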

Negate the above to get all non-blog links using the not() function:

//a[not(contains(@href, 'blog'))]

Get all blog links about food using the and operator:

//a[contains(@href, 'blog') and contains(@href, 'food')]

Get all blog links that aren't about food by combining the and operator with the not() function:

//a[contains(@href, 'blog') and not(contains(@href, 'food'))]
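The and and and not() variants split the blog links into two disjoint groups: every blog link is either about food or not. A sketch with the third-party lxml library and a hypothetical page that checks this:

```python
from lxml import html

# Hypothetical sample page used only for this illustration.
PAGE = """<html><body>
  <a href="https://example.com/blog/food/pasta">Pasta</a>
  <a href="https://example.com/blog/travel/rome">Rome</a>
  <a href="https://example.com/news/food-prices">Prices</a>
</body></html>"""

tree = html.fromstring(PAGE)

blog = tree.xpath("//a[contains(@href, 'blog')]")
food_blog = tree.xpath("//a[contains(@href, 'blog') and contains(@href, 'food')]")
other_blog = tree.xpath("//a[contains(@href, 'blog') and not(contains(@href, 'food'))]")

print([a.text for a in food_blog])   # ['Pasta']
print([a.text for a in other_blog])  # ['Rome']

# The two filtered groups together account for every blog link.
assert len(food_blog) + len(other_blog) == len(blog)
```

The 'Prices' link mentions food but not blog, so neither expression selects it.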

Get all blog or news links using the or operator:

//a[contains(@href, 'blog') or contains(@href, 'news')]
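The same set of links can also be written with the XPath union operator |, which merges two node-sets; a link matched by both sides appears only once. A sketch comparing the two forms with the third-party lxml library and a hypothetical page:

```python
from lxml import html

# Hypothetical sample page used only for this illustration.
PAGE = """<html><body>
  <a href="https://example.com/blog/pasta">Pasta</a>
  <a href="https://example.com/news/today">Today</a>
  <a href="https://example.com/blog/news-roundup">Roundup</a>
  <a href="https://example.com/about">About</a>
</body></html>"""

tree = html.fromstring(PAGE)

# One predicate with 'or'.
with_or = tree.xpath("//a[contains(@href, 'blog') or contains(@href, 'news')]")

# Two paths joined by '|'; 'Roundup' matches both sides but is returned once.
with_union = tree.xpath("//a[contains(@href, 'blog')] | //a[contains(@href, 'news')]")

print(sorted(a.text for a in with_or))     # ['Pasta', 'Roundup', 'Today']
print(sorted(a.text for a in with_union))  # ['Pasta', 'Roundup', 'Today']
```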

Get all links with a URL longer than 55 characters using the string-length() function:

//a[string-length(@href) > 55]
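string-length() counts the characters of the href value; a missing href counts as an empty string of length 0, so href-less anchors never match this expression. A sketch with the third-party lxml library that cross-checks the XPath result against Python's len(); the URLs are hypothetical.

```python
from lxml import html

# Hypothetical URLs used only for this illustration.
LONG_URL = "https://example.com/" + "x" * 40   # 60 characters
SHORT_URL = "https://example.com/short"        # 25 characters

PAGE = f"""<html><body>
  <a href="{LONG_URL}">Long</a>
  <a href="{SHORT_URL}">Short</a>
  <a name="top">No href</a>
</body></html>"""

tree = html.fromstring(PAGE)
long_links = tree.xpath("//a[string-length(@href) > 55]")

# Cross-check in Python: filter the anchors that have an href by len() of its value.
expected = [a for a in tree.xpath("//a[@href]") if len(a.get("href")) > 55]

print([a.text for a in long_links])  # ['Long']
assert [a.get("href") for a in long_links] == [a.get("href") for a in expected]
```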