How to Get the HTML Source
Sometimes it's useful to get the HTML source of a page or specific elements. With Browserist, that's easily done.
For the examples below, let's imagine this boilerplate page source:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Example.com</title>
</head>
<body>
<h1>Welcome</h1>
<p>This is a paragraph.</p>
</body>
</html>
Page Source
To get all the HTML source of a page, call `browser.get.html.page_source()` on a `Browser` instance that has the page open.
Printing the result will output the full page source as above.
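As a minimal sketch, assuming a Browserist `Browser` instance that already has the page open, the `browser.get.html.page_source()` call can be wrapped in a small helper that also saves the source to disk for later inspection. The helper name `save_page_source` and its `path` parameter are hypothetical, not part of Browserist.

```python
from __future__ import annotations

from pathlib import Path
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for type hints, so this sketch loads without Browserist installed.
    from browserist import Browser


def save_page_source(browser: Browser, path: Path) -> str:
    """Hypothetical helper: fetch the full HTML source of the open page and save it to `path`."""
    page_source = browser.get.html.page_source()
    # Write as UTF-8 to match the <meta charset="UTF-8"> of typical pages.
    path.write_text(page_source, encoding="utf-8")
    return page_source
```

In a script, you might create a `Browser()`, open the page with `browser.open.url("https://example.com")`, and then call `save_page_source(browser, Path("page.html"))`.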
How to Get the HTML from Single-Page Applications (SPAs) or Lazy-Loading Pages
When working with single-page applications (SPAs) or other pages that load their content dynamically or lazily, it's notoriously difficult to get the page source, because the source changes depending on the page's state. Instead, you can get the HTML of a particular state, for example once all of its content has loaded.
Use `browser.scroll.page.to_end()` to trigger loading of all the page content. Depending on the page, you can replace this step with other interactions, such as clicking a button or selecting an item in a drop-down menu.
Then, instead of `browser.get.html.page_source()`, use `browser.get.html.element_outer("//html")` to get the HTML of the current state.
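The two steps can be combined into one helper. This is a minimal sketch assuming a Browserist `Browser` instance with the SPA already open; `snapshot_current_html` is a hypothetical name, and depending on the page you may also need to wait for new content to appear after scrolling before taking the snapshot.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for type hints, so this sketch loads without Browserist installed.
    from browserist import Browser


def snapshot_current_html(browser: Browser) -> str:
    """Hypothetical helper: trigger lazy loading, then return the HTML of the current state."""
    # Scroll to the bottom so lazily loaded content starts loading. Replace this
    # with a click or drop-down selection if that is what loads your content.
    browser.scroll.page.to_end()
    # Read the outer HTML of the root <html> element to capture the current state,
    # rather than calling page_source().
    return browser.get.html.element_outer("//html")
```

Keeping the interaction and the snapshot in one function makes the order explicit: the HTML is only read after the content-loading step has run.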
Source by Element
Inner HTML
How to get the inner HTML source of an element, that is, the content between its opening and closing tags.
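For the example page above, the inner HTML of the `<body>` tag contains the `<h1>` heading and the `<p>` paragraph, without the surrounding `<body>` and `</body>` tags.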
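To read the inner HTML of several elements at once, a minimal sketch can loop over XPath expressions. It assumes a Browserist `Browser` instance with the page open, and it assumes the inner-HTML counterpart of `element_outer()` is named `browser.get.html.element_inner()`; that method name is an assumption, so check it against the Browserist API reference. `collect_inner_html` is a hypothetical helper.

```python
from __future__ import annotations

from collections.abc import Iterable
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for type hints, so this sketch loads without Browserist installed.
    from browserist import Browser


def collect_inner_html(browser: Browser, xpaths: Iterable[str]) -> dict[str, str]:
    """Hypothetical helper: map each XPath to the inner HTML of the element it selects."""
    # Assumption: element_inner() is the inner-HTML counterpart of element_outer().
    return {xpath: browser.get.html.element_inner(xpath) for xpath in xpaths}
```

For the example page, `collect_inner_html(browser, ["//h1", "//p"])` should map `//h1` to `Welcome` and `//p` to `This is a paragraph.`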
Outer HTML
How to get the outer HTML source of an element, that is, the element itself including its opening and closing tags.
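For the example page above, the outer HTML of the `<body>` tag contains the `<h1>` heading and the `<p>` paragraph wrapped in the `<body>` and `</body>` tags themselves.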
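Because inner and outer HTML differ only in whether the element's own tags are included, one minimal sketch is a single helper with a flag. It assumes a Browserist `Browser` instance with the page open; `get_element_html` and its `include_tag` parameter are hypothetical, and `browser.get.html.element_inner()` is assumed by analogy with `element_outer()`.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for type hints, so this sketch loads without Browserist installed.
    from browserist import Browser


def get_element_html(browser: Browser, xpath: str, *, include_tag: bool = True) -> str:
    """Hypothetical helper: return an element's outer HTML, or its inner HTML if include_tag is False."""
    if include_tag:
        # Outer HTML: the element itself, including its opening and closing tags.
        return browser.get.html.element_outer(xpath)
    # Inner HTML: only the content between the tags (element_inner() is assumed).
    return browser.get.html.element_inner(xpath)
```

Defaulting to outer HTML keeps any attributes on the selected element, such as `class` or `id`, which is often what you want when saving a fragment of a page.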