Website categorization API and use cases

Why do we need website categorization?

Limiting access to risky websites is a cybersecurity best practice, but many enterprise cybersecurity solutions leave gaps: they rely on unreliable data, use methods that are easily evaded, or simply lack adequate coverage. With the right website categorization tools and policy enforcement, you can gather intelligence on the types of websites your employees visit and use it to uncover hidden threats. In today’s world of heightened digital risk, it is important to extend your approach beyond prevention and discover where your organization is vulnerable, so you can eliminate those risks from within.

Website categorization plays an important role in the fight against corporate, private, and government cybersecurity breaches. For example, to prepare properly for a cyberattack, an organization must first understand which websites its employees are using. This lets it mitigate threats with actionable data before malicious sites can infect internal systems with malware, or phishing pages can steal credentials that grant unauthorized access.

Whether an organization is strict or lenient about what its employees can do online, security and IT teams often need help detecting when a user visits a website unrelated to business needs. Categorizing every website that exists is impossible for most security teams, so security services and products must do it for them.

Suppose you want to stop employees from spending work time on non-work-related websites, such as watching cat videos or reading the news. You apply web content filtering, but then you run into a problem: how do you know when an employee is visiting a website that is not work-related? Fortunately, there are services that categorize websites for you automatically.

Categorization definitions

A website’s category is partly subjective: one web content filter may classify a site as Financial Services, while another classifies it as Finance. Many services claim they can help organizations categorize every site on the internet, but in reality this is not possible, because new sites and pages appear constantly. That is why you need a solution that lets you categorize websites for your organization without being tied to someone else’s categorization system.
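
One way to avoid being tied to a single vendor’s naming is to translate every vendor label into a taxonomy your organization owns. The Python sketch below assumes two hypothetical vendors (`vendor_a`, `vendor_b`) with hypothetical labels; the mapping tables and the `internal_category` helper are illustrative, not any product’s API. Local overrides are checked first, so the organization’s own decisions always win over vendor data.

```python
"""Sketch: normalize category labels from several vendors into one
organization-owned taxonomy. Vendor names and labels are hypothetical."""

# Hypothetical mapping from (vendor, vendor_label) to an internal category.
VENDOR_TO_INTERNAL = {
    ("vendor_a", "Financial Services"): "finance",
    ("vendor_b", "Finance"): "finance",
    ("vendor_a", "Online Gambling"): "gambling",
    ("vendor_b", "Gambling"): "gambling",
}

# Hypothetical organization-specific overrides, keyed by lower-case domain.
LOCAL_OVERRIDES = {
    "intranet.example.com": "business",
}


def internal_category(domain: str, vendor: str, vendor_label: str) -> str:
    """Return the internal category for a domain.

    Local overrides win; otherwise the vendor label is translated through
    VENDOR_TO_INTERNAL, and any unmapped label becomes "uncategorized".
    """
    domain = domain.lower()
    if domain in LOCAL_OVERRIDES:
        return LOCAL_OVERRIDES[domain]
    return VENDOR_TO_INTERNAL.get((vendor, vendor_label), "uncategorized")


if __name__ == "__main__":
    print(internal_category("bank.example", "vendor_a", "Financial Services"))  # finance
    print(internal_category("bank.example", "vendor_b", "Finance"))             # finance
    print(internal_category("intranet.example.com", "vendor_a", "Finance"))     # business
```

Both vendors’ labels for the same kind of site end up in the single internal category `finance`, so downstream policies only ever deal with the organization’s own names.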

Here is an example set of website categories a company might use for content filtering:

  • Malicious
  • Phishing
  • DDNS
  • Proxies
  • Prohibited drugs
  • Gambling
  • SE
  • Job search
  • Copyright infringement
  • Legal issues
  • Adult content
  • Downloads
  • Music
  • News
  • Sports
  • Games
  • Shopping
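
A content-filtering policy built on such a list can be as simple as a table from category to action. This minimal sketch assumes hypothetical lower-case category keys loosely based on the list above, a three-level `Action` scale, and a default of `WARN` for uncategorized sites; none of these come from a specific product. Because one site can carry several categories, `decide` applies the most restrictive action among them.

```python
from enum import IntEnum


class Action(IntEnum):
    # Higher value means more restrictive, so max() picks the strictest action.
    ALLOW = 0
    WARN = 1
    BLOCK = 2


# Hypothetical policy, loosely based on the example category list.
POLICY = {
    "malicious": Action.BLOCK,
    "phishing": Action.BLOCK,
    "proxies": Action.BLOCK,
    "gambling": Action.BLOCK,
    "adult content": Action.BLOCK,
    "games": Action.WARN,
    "shopping": Action.WARN,
    "sports": Action.WARN,
    "news": Action.ALLOW,
}

# Assumption: categories missing from the policy (or no category at all) get a warning.
DEFAULT_ACTION = Action.WARN


def decide(categories: list[str]) -> Action:
    """Return the most restrictive action across all categories of a site."""
    if not categories:
        return DEFAULT_ACTION
    return max(POLICY.get(c.lower(), DEFAULT_ACTION) for c in categories)
```

For example, `decide(["news", "gambling"])` returns `Action.BLOCK`, because blocking outranks allowing. Ranking actions numerically keeps the "most restrictive wins" rule to a single `max` call.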

What is Website categorization?

Website categorization (also known as website classification or URL classification) is the process of assigning websites to categories. Companies use it to classify the sites they access frequently for marketing, cybersecurity, and brand protection purposes. Examples of website categories include entertainment, shopping, and games.
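
In practice, classifying a URL usually means extracting its hostname and looking it up in a category database. The sketch below uses Python’s `urllib.parse.urlsplit`; `CATEGORY_DB` and `classify_url` are hypothetical stand-ins for a real categorization service. If the exact host is unknown, it falls back to parent domains, so `shop.example-shop.com` inherits the category of `example-shop.com`. It expects absolute URLs with a scheme, and a production version would consult the Public Suffix List so that suffixes such as `co.uk` are never treated as ordinary domains.

```python
from urllib.parse import urlsplit

# Hypothetical category database keyed by domain name.
CATEGORY_DB = {
    "example-shop.com": ["shopping"],
    "news.example.org": ["news"],
    "example.org": ["business"],
}


def classify_url(url: str) -> list[str]:
    """Return categories for a URL, falling back to parent domains.

    "a.b.example.org" is tried as "a.b.example.org", then "b.example.org",
    then "example.org". Returns [] if nothing matches.
    """
    host = urlsplit(url).hostname  # lower-cased; port and credentials removed
    if not host:
        return []
    labels = host.rstrip(".").split(".")
    # Stop before the bare top-level label (e.g. "org").
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in CATEGORY_DB:
            return CATEGORY_DB[candidate]
    return []
```

With this table, `classify_url("https://blog.example.org/")` finds no entry for `blog.example.org` and falls back to the `business` category of `example.org`.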

Cybersecurity and website categorization go hand in hand. Millions of end users around the world access many types of websites every day, so organizations need to monitor the kinds of sites their employees visit most often. Identifying these sites and who visits them builds a complete picture of activity in the web browser, which security teams can then use to spot risky behavior.

A related application of text classification is product categorization, for example in e-commerce, where assigning products to the right categories helps online stores improve product discoverability on their websites.
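
As an illustration, product titles can be categorized with a multinomial Naive Bayes classifier, a standard baseline for text classification. This is a swapped-in textbook technique rather than a method described here; the categories and training titles are hypothetical, and the sketch uses only the Python standard library. Tokens never seen in training are skipped, and add-one (Laplace) smoothing keeps a token that is absent from one category from zeroing out that category’s score.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesClassifier":
        self.class_counts = Counter(labels)           # documents per label
        self.word_counts = defaultdict(Counter)       # label -> token counts
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_label_docs in self.class_counts.items():
            # log P(label) + sum of smoothed log P(token | label)
            score = math.log(n_label_docs / n_docs)
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Hypothetical training data: product titles and their categories.
    titles = [
        "wireless bluetooth headphones",
        "usb c charging cable",
        "cotton crew neck t shirt",
        "slim fit denim jeans",
    ]
    labels = ["electronics", "electronics", "clothing", "clothing"]
    clf = NaiveBayesClassifier().fit(titles, labels)
    print(clf.predict("denim shirt"))  # clothing
```

Four titles are only enough to show the mechanics; a real product catalog would need far more labeled data, or a ready-made implementation such as scikit-learn’s `MultinomialNB`.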

Useful resources and libraries for website categorization:

  • Yarn package for website categorization
  • Useful npm trends package