
Python Basics for Hackers, Part 5: Creating a Web Site Scraper to Find Potential Passwords

Welcome back, my aspiring cyberwarriors!


Creating password lists is a key element of a successful password-cracking strategy. In nearly every case, we need to provide a list of potential passwords to the password-cracking tool, whether it be hashcat, John the Ripper, BurpSuite, cameradar, or others (the exception being a true brute-force attack, where you attempt all possible character combinations).


Human beings, in general, will use a password they can remember; if a person can't remember the password, it doesn't serve its purpose. People often use such things as their birth date, wedding date, or children's or spouse's names to create a password. Then they will often munge it (replacing letters with numbers or special characters, such as p@$$w0rd) or add numbers and special characters to meet minimum password requirements (one lowercase letter, one uppercase letter, one number, and one special character). We have several tools capable of creating such password lists, including cupp and crunch.


In some cases, people will use special words from their business or industry, assuming that no one would possibly suspect these are their passwords. For instance, someone in the pastry industry might use words particular to that industry, such as strudel or croissant. Someone in the biotech industry might use crispr or polymerase. To build a potential password list for these industries (assuming you have already tried the most common passwords), you may want to scrape key words from their website.





Although we already have a tool named cewl for this purpose, let's create our own web scraper in Python. This way, we can learn a bit more Python and also develop a framework for scraping websites in general. Web scraping is used by many IT tools and technologies, such as search engines (Google and Bing) and artificial intelligence engines such as ChatGPT.



Step #1: Getting Started


To begin, you will need two key packages, requests and BeautifulSoup. You can install them using pip.


Beautiful Soup is a Python package for parsing HTML and XML documents, including those with malformed markup such as non-closed tags (it is named after "tag soup"). It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping.


The requests library in Python is a popular tool for making HTTP requests. It abstracts the complexities of making requests behind a simple API, providing an easier method to send HTTP/1.1 requests.


kali > pip install beautifulsoup4 requests


Step #2: Begin Writing our Python Code


Let's now open a text editor in Kali and begin building our password scraper. Here I am using mousepad, but you can use any text editor or IDE.


To begin, we will need to import requests and BeautifulSoup, along with the regular expressions library (for more on regular expressions, see this article).
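The imports described above might look like this (a minimal sketch; the original screenshot of the code is not reproduced here):

```python
# The three libraries the scraper relies on: requests fetches the page,
# BeautifulSoup parses the returned HTML, and re (regular expressions)
# pulls the individual words out of the text.
import re

import requests
from bs4 import BeautifulSoup
```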




Next, we need to prompt the user for:


(1) URL to scrape,


(2) the minimum and maximum word length,


(3) the file name to write the potential passwords to.


Minimum and maximum word length are crucial, as we do not want to scrape small words such as "is", "a", "the", or other similar words. At the same time, we do not want to capture words that are obviously too long to be likely passwords.
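The prompts might be gathered as follows. This is a sketch: the variable names and prompt wording are my own, and the `ask` parameter (defaulting to `input`) is an addition so the helper can be exercised without a live terminal.

```python
def prompt_options(ask=input):
    """Collect the four scrape options from the user. The length bounds
    are converted to int so they can be compared against word lengths."""
    url = ask("Enter the URL to scrape: ")
    min_len = int(ask("Minimum word length: "))
    max_len = int(ask("Maximum word length: "))
    outfile = ask("File to write the potential passwords to: ")
    return url, min_len, max_len, outfile
```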




Now we need to grab the content from the URL chosen by the user. We use an if statement to make certain the URL is actually online and functioning. If the status code is anything other than 200, the script prints an error message and exits.
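A sketch of that check is below. The `get` parameter is my own addition (it defaults to `requests.get`) so the status-code logic can be demonstrated without a live network request; the original script simply calls `requests.get(url)` directly.

```python
import sys


def fetch_page(url, get=None):
    """Fetch the URL and return its body. If the server answers with any
    status code other than 200, print an error message and exit."""
    if get is None:
        import requests  # deferred so the helper is testable offline
        get = requests.get
    response = get(url)
    if response.status_code != 200:
        print(f"Error: {url} returned status {response.status_code}")
        sys.exit(1)
    return response.text
```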




Here we use BeautifulSoup to grab the content, parse it, and place it into a variable named "soup".
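For instance (the stand-in HTML string below is my own; in the real script the input is the page body returned by requests):

```python
from bs4 import BeautifulSoup

# A stand-in page to demonstrate the parsing step.
html = "<html><body><h1>Artisan strudel</h1><p>croissant recipes</p></body></html>"

soup = BeautifulSoup(html, "html.parser")  # the parse tree lands in "soup"
page_text = soup.get_text(separator=" ")   # strip the tags, keep the text
```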



Next, we use regex to extract the words from the page content.


r'\b\w+\b'


In this expression, the r prefix marks a raw string, so the backslashes are passed to the regex engine literally. The pattern itself looks for a word boundary (\b), then one or more word characters (\w+), and finally ends at another word boundary (\b).
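A quick demonstration of the pattern (the sample string is my own):

```python
import re

sample = "Our p@ssw0rd policy covers croissant and polymerase research."

# \b\w+\b: word boundary, one or more word characters, word boundary.
words = re.findall(r'\b\w+\b', sample)

# Note that "@" is not a word character, so "p@ssw0rd" is split
# into the two words "p" and "ssw0rd".
```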




Next, our code filters by the minimum and maximum number of characters input by the user.
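The length filter can be written as a single list comprehension. The word list here is illustrative, and the bounds 8 and 12 mirror the values used in the test run later in the article:

```python
words = ["a", "the", "croissant", "polymerase", "supercalifragilistic"]
min_len, max_len = 8, 12  # in the real script, these come from the user

# Keep only words whose length falls within the user's bounds.
candidates = [w for w in words if min_len <= len(w) <= max_len]
```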





Finally, our code writes the words meeting the minimum and maximum criteria to a file specified by the user.
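The write step might look like this (the file name `demo_words.txt` is a placeholder; in the real script it comes from the user's earlier input):

```python
candidates = ["croissant", "polymerase"]  # the filtered words
outfile = "demo_words.txt"                # placeholder output file name

# Write each candidate password on its own line.
with open(outfile, "w") as f:
    for word in candidates:
        f.write(word + "\n")
```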




You can see the completed script below. Save it as HackersArisePasswordScraper.py
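Assembled from the steps above, a minimal reconstruction of the script could look like the following. The exact code in the original may differ; the function names, prompt wording, and the interactive-terminal guard at the bottom are my own choices.

```python
#!/usr/bin/env python3
# A reconstruction of HackersArisePasswordScraper.py from the steps above.
import re
import sys

import requests
from bs4 import BeautifulSoup


def fetch_page(url):
    """Return the page body, exiting if the server does not answer 200."""
    response = requests.get(url)
    if response.status_code != 200:
        print(f"Error: {url} returned status {response.status_code}")
        sys.exit(1)
    return response.text


def extract_words(html):
    """Parse the HTML with BeautifulSoup and pull every word from the text."""
    soup = BeautifulSoup(html, "html.parser")
    # \b\w+\b: word boundary, one or more word characters, word boundary.
    return re.findall(r'\b\w+\b', soup.get_text(separator=" "))


def filter_words(words, min_len, max_len):
    """Keep only words whose length falls inside the user's bounds."""
    return [w for w in words if min_len <= len(w) <= max_len]


def main():
    url = input("Enter the URL to scrape: ")
    min_len = int(input("Minimum word length: "))
    max_len = int(input("Maximum word length: "))
    outfile = input("File to write the potential passwords to: ")

    words = filter_words(extract_words(fetch_page(url)), min_len, max_len)
    with open(outfile, "w") as f:
        for word in words:
            f.write(word + "\n")
    print(f"Wrote {len(words)} words to {outfile}")


if __name__ == "__main__" and sys.stdin.isatty():
    main()  # prompt only when run from an interactive terminal
```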




Step #3: Test our Script


Finally, let's test this script.


You will need to give yourself execute permissions.


kali > sudo chmod 755 HackersArisePasswordScraper.py


We can now run this script to test its effectiveness.


kali > python3 ./HackersArisePasswordScraper.py



When prompted for the URL, I entered the site of the artificial intelligence and graphics chip developer, http://nvidia.com. When prompted for the minimum word length, I entered 8, and for the maximum word length, 12. Finally, I specified the name of the output file as nvidia.txt when prompted by the script.


When it completed running, I viewed the words in the output file using the more command.


kali > more nvidia.txt


Fabulous!


Now that you have collected these words, you will likely want to create variations on them by using various tools such as cupp, BurpSuite and crunch.


Summary


Finding potential passwords is a key task in cracking passwords. Nearly every password-cracking tool requires that you input a password list. In many cases, users will utilize a word from their industry or business in creating their password and then munge it and/or add numbers and special characters to meet minimum password requirements. We can collect these key words by scraping the target's website and then use the resulting list to crack their passwords. It's likely you will want to munge these words using BurpSuite, cupp, or crunch.


Web scraping is a key tool and technology within the IT, AI, and cybersecurity industries. By building a web-scraping tool with BeautifulSoup, we have opened up a large variety of capabilities for the aspiring cyberwarrior!

