In this third episode of "This Is How We Do It," we explore artificial intelligence (AI) and machine learning (ML) with Billy Hewlett, leader of the AI research team here at Palo Alto Networks and grandson of Hewlett-Packard co-founder Bill Hewlett. Billy and his team are responsible for developing machine learning models to combat malware and other cyberthreats.
Billy’s journey in the field of AI for security began when he first programmed AI systems to protect innocent players from trolls in popular video games, like World of Warcraft®. Today, his work focuses on applying machine learning to identify and stop malicious activities, such as malware, phishing and other cyberthreats, ensuring the safety of Palo Alto Networks customers.
One of the first topics discussed is the alarming growth of malware over the years. Billy explains that the number of unique malware samples has skyrocketed from around 85 million in 2012 to over a billion today. This more than tenfold increase calls for innovative approaches to detecting and mitigating these threats.
Billy then highlights some exciting applications of their AI-powered products. For instance, their machine learning models can analyze various aspects of a webpage, such as its content, images and URL, to determine if it is a phishing page. The ability to automate this process is particularly crucial considering the massive scale at which these analyses need to be performed. With millions of potential threats encountered daily, relying solely on human experts is impractical, making machine learning a vital tool in ensuring effective security.
The interview further explores the challenge of phishing detection, where attackers continually evolve their tactics. Initially, the focus was on identifying suspicious webpage content, but attackers began using JavaScript to create convincing replicas of legitimate login pages. To address this, Palo Alto Networks implemented machine learning models capable of analyzing webpage images. For instance, models trained on known images from different banks and organizations can identify phishing attempts by detecting a mismatch between the organization a page appears to represent and the site that actually serves it. Billy explains further:
“You can look at the URL of the webpage. All of these things will allow you to
make a decision whether or not this is a phishing page. And all of this is done
by machine learning. But now imagine that you have to do this 50-60 million
times a day. That's the scale that we're talking about. So, obviously you can't do
that at that scale with a human expert. So instead, you have a machine that you
can train to do that.”
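The mismatch idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Palo Alto Networks' implementation: the brand name is assumed to come from some upstream image or text model, and `KNOWN_BRAND_DOMAINS`, `host_belongs_to` and `is_brand_mismatch` are hypothetical names with made-up example domains.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: brand name -> domains that brand legitimately uses.
KNOWN_BRAND_DOMAINS = {
    "examplebank": {"examplebank.com"},
    "exampleshop": {"exampleshop.com", "exampleshop.co.uk"},
}


def host_belongs_to(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def is_brand_mismatch(detected_brand: str, page_url: str) -> bool:
    """Flag a page whose apparent brand does not match the host serving it.

    `detected_brand` stands in for the output of an image or text model.
    """
    host = (urlparse(page_url).hostname or "").lower()
    domains = KNOWN_BRAND_DOMAINS.get(detected_brand.lower())
    if domains is None:
        return False  # unknown brand: no basis for a mismatch verdict
    return not any(host_belongs_to(host, d) for d in domains)
```

A page that looks like ExampleBank but is served from `examplebank.com.evil.example` would be flagged, while `login.examplebank.com` would not. In practice a check like this would be one signal among several rather than a verdict on its own.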
Another crucial aspect of threat detection is analyzing URL strings. A human expert can often spot a misspelling or other irregularity at a glance, but manual inspection cannot keep up with the volume of URLs involved. Palo Alto Networks therefore uses machine learning to assess the characteristics of a URL string and classify it as either benign or malicious. This approach enhances the company's ability to detect phishing attempts and other credential-based attacks:
“So, in this case, we have machine learning applied to both the text on the page
and the image present. … often an expert can quickly identify misspellings or
fraudulent URLs, like if someone misspelled 'Amazon' in the URL. They can
visually inspect it and recognize the issue. However, we can also apply machine
learning to analyze the URL string itself, such as 'www.go0gle.com.' By training
the model with various strings, we can determine if a URL is malicious or benign.”
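To make the URL-string idea concrete, here is a small Python sketch of hand-written features that a classifier could consume. It is not the trained model described in the interview: the substitution table, the brand list and the `url_features` helper are all assumptions chosen for illustration, and a real model would learn patterns like these from labeled strings rather than from a fixed table.

```python
# Hypothetical substitution table and brand list, for illustration only.
LOOKALIKES = str.maketrans({"0": "o", "1": "l", "3": "e", "5": "s"})
BRANDS = ["google", "amazon"]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def url_features(hostname: str) -> dict:
    """Turn a hostname into a few features a classifier could use."""
    labels = hostname.lower().strip(".").split(".")
    # Crude guess at the registered name: the second-to-last label.
    # This ignores multi-part suffixes such as "co.uk".
    name = labels[-2] if len(labels) >= 2 else labels[0]
    normalized = name.translate(LOOKALIKES)
    nearest = min(BRANDS, key=lambda b: edit_distance(normalized, b))
    return {
        "digit_count": sum(ch.isdigit() for ch in name),
        "nearest_brand": nearest,
        "raw_distance": edit_distance(name, nearest),
        "normalized_distance": edit_distance(normalized, nearest),
    }
```

For 'www.go0gle.com', the raw name is one edit away from "google" but matches it exactly once the digit is normalized. That combination (a nonzero raw distance with a zero normalized distance) is the kind of signal that separates a lookalike from the genuine 'www.google.com', where both distances are zero.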
David Szabo, who conducted the interview, raises an interesting point about the computational requirements of running machine learning models on firewalls. Billy explains that while the processing power required is significant, the main limiting factor is memory. Firewalls need to handle massive volumes of data, making memory optimization essential. By designing lightweight models and leveraging efficient memory usage, Palo Alto Networks successfully implements machine learning at the edge, allowing for real-time threat detection without compromising performance.
“I'm most proud of machine learning in the firewall. Taking this huge ML
problem and running it in our edge device. The idea is we're going to take all
of our big machine learning in the cloud, which makes sense since you have all
the resources of the cloud to do it there, and we're actually going to push it
down until it's running in the firewall.
I describe this as if you have this huge firehose of data — all this information
that's coming from WildFire, from URL filtering, from all these different places,
and we're going to winnow that down and get to a very, very, very tiny model
that we can run in real time on the firewall. And this model is going to actually
run at packet speed.”
The conversation then shifts to the process of training the machine learning models. Billy explains that the training is conducted in the cloud, utilizing vast amounts of data collected from various sources. Palo Alto Networks builds a new version of the model every day, incorporating data from the previous two to three weeks. These models undergo rigorous testing on separate datasets spanning two months, ensuring their effectiveness and adaptability to evolving threat landscapes. Once a new model proves superior to the existing one, it is distributed to all the firewalls within the network.
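The daily build-and-promote cycle described above can be sketched as follows. This is a simplified illustration, not the company's pipeline: "two to three weeks" is assumed to mean 21 days, the scores are assumed to come from evaluating each model on the separate two-month dataset, and `training_window` and `should_promote` are hypothetical names.

```python
from datetime import date, timedelta

TRAIN_WINDOW_DAYS = 21  # assumption: "two to three weeks" taken as 21 days


def training_window(build_day: date) -> tuple:
    """Inclusive (start, end) dates of the data used for build_day's model."""
    end = build_day - timedelta(days=1)  # most recent full day of data
    start = end - timedelta(days=TRAIN_WINDOW_DAYS - 1)
    return start, end


def should_promote(candidate_score: float, current_score: float) -> bool:
    """Promote only when the new model strictly beats the deployed one.

    Both scores are assumed to be measured on the same held-out dataset,
    with higher meaning better.
    """
    return candidate_score > current_score
```

A model built on March 22 would train on data from March 1 through March 21. If the new model does not beat the deployed one on the held-out data, the existing model simply stays in place until a later daily build does.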
Billy expresses his pride in the achievement of deploying machine learning on firewalls, allowing for real-time threat detection at the packet level. This innovation enables Palo Alto Networks to swiftly identify and block malicious activities without requiring the entire file for analysis. By continually updating its models and distributing them to the firewalls, the company stays at the forefront of security technology.
In conclusion, the use of artificial intelligence and machine learning has become a critical component in the battle against evolving cyberthreats and threat actors, who employ their own versions of these tools. Palo Alto Networks, under the leadership of Billy and his AI research team, has made significant strides in using these technologies to protect customers from malware and other malicious activities. With the exponential growth of malware over the years, it has become essential to find innovative ways to identify and prevent attacks.
Watch the full interview on the Cortex YouTube Channel!