Likely the single most controversial part of the Mirage NAC product is its use of ARP. Stiennon referred to "ARP twiddling" as our means of quarantine during our recent, er, discussion on the usefulness of NAC. As tempting as it was to respond to the comment in the thread, I thought it better to give it its own post. So, what's up with our use of ARP? Why do we do it this way, and why haven't we moved more quickly to one of the more common means of quarantining endpoints? Here's the lowdown that, until now, has been kept mostly on the down-low.
More than just quarantine
One of the first discussions we (at least I) end up having with prospects surrounds the notion of device state. The knowledge of network entry and exit forms the bookends of the NAC process. Any NAC solution must have this basic notion in order to provide the necessary governance. To be sure, many options exist. Some solutions take alerts from the switching infrastructure (SNMP traps, syslog messages, whatever) based upon either the link state of the port or the population of the bridging tables. Others may have a RADIUS hook, on the assumption that any device that's connecting is authenticating in one way or another. Still others may have a DHCP hook and use address assignment as the state trigger. Finally, some of the inline solutions simply wait to see traffic flow through them.
There are three basic reasons we like ARP for this. First, and most importantly, it's immediate, since it's part of the initial stack initialization. Second, it's independent of the switching infrastructure: it works the same way regardless of the downstream switches, whether those switches can send traps, and so on. Third, it's independent of endpoint characteristics, such as OS type and whether the address is statically or dynamically assigned.
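To make the "ARP as a state trigger" idea concrete, here is a minimal sketch of passive entry detection. It is not how the Mirage product is implemented; it only assumes that a listener on the segment hands raw Ethernet frames to `observe`. The names `parse_arp`, `classify`, `DeviceTable`, and `build_arp_request` are made up for illustration, and the "probe" and "announcement" labels follow the ARP usage described in RFC 5227.

```python
import struct

ETHERTYPE_ARP = 0x0806
ARP_REQUEST = 1


def parse_arp(frame: bytes):
    """Return the ARP fields of a raw Ethernet frame, or None if it is not IPv4-over-Ethernet ARP."""
    if len(frame) < 42:  # 14-byte Ethernet header + 28-byte ARP payload
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_ARP:
        return None
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", frame[14:22])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return None
    sha, spa, _tha, tpa = struct.unpack("!6s4s6s4s", frame[22:42])
    return {
        "oper": oper,
        "sha": ":".join(f"{b:02x}" for b in sha),  # sender MAC
        "spa": ".".join(str(b) for b in spa),      # sender IP
        "tpa": ".".join(str(b) for b in tpa),      # target IP
    }


def classify(arp) -> str:
    """Label the ARP packet; probes and announcements typically appear as a stack comes up."""
    if arp["oper"] == ARP_REQUEST and arp["spa"] == "0.0.0.0":
        return "probe"
    if arp["spa"] == arp["tpa"]:
        return "announcement"
    return "request" if arp["oper"] == ARP_REQUEST else "reply"


class DeviceTable:
    """Hypothetical state table keyed by sender MAC address."""

    def __init__(self):
        self.last_seen = {}

    def observe(self, frame: bytes, now: float):
        arp = parse_arp(frame)
        if arp is None:
            return None
        mac = arp["sha"]
        event = "seen" if mac in self.last_seen else "entered"
        self.last_seen[mac] = now
        return (event, mac, classify(arp))


def build_arp_request(sha: bytes, spa: bytes, tpa: bytes) -> bytes:
    """Build a broadcast ARP request frame; used here only to exercise the parser."""
    eth = b"\xff" * 6 + sha + struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack("!HHBBH6s4s6s4s", 1, 0x0800, 6, 4, ARP_REQUEST,
                      sha, spa, b"\x00" * 6, tpa)
    return eth + arp
```

Nothing in the sketch depends on the switch or on how the endpoint got its address, which is the point of the second and third reasons above. Detecting exit would need something more, such as a timeout on `last_seen`, and that is left out here.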
But also quarantine
Yes, we use ARP for quarantine. The marketing side of the house prefers "ARP Management" to "ARP Poisoning" or "ARP Twiddling." I don't especially care. The best way to enumerate why we do it this way is to back up and review, at a slightly higher level, what we think quarantining should be and mean:
Quarantining should be fast
This almost seems like it could go without saying. Whether effected for the purposes of device-specific remediation or more generalized network protection, time is both money and risk. A quarantining method that takes, for example, seconds to put in place is, at least to us, a non-starter.
Quarantining should be holistic
One of the fundamental disagreements I have with Stiennon's notion of network security (rant alert) is his view that dropping packets (as opposed to dropping the endpoint's connection) is sufficient as a mitigation method. With a threat blended into, say, a bot (with an HTTPS control channel), a file-sharing based worm, a keystroke logger that may be logging data but not sending it, and a spam relay, the very notion of separating "good" traffic from "bad" traffic is silly. Take even highly outdated (and, by today's standards, quaint) threats like Blaster and Welchia. The "bad traffic" in those cases was Windows networking traffic, whose primary usage was *inside* the infrastructure. So a "good traffic/bad traffic" policy removes Windows networking functions from the connection, which removes access to both file shares and (assuming Exchange) corporate email. Now then, how useful, really, is that endpoint's connection? Well, they can get their stock quotes. And they can get to internal portal-based applications. Boy, there's a great idea. "I see, Mr./Ms. User, that you're infected with malware; welcome to my Oracle Financials application." How does that idea pass muster with anyone?
Quarantining should be full-cycle
This may or may not be a point of debate, but we continue to believe that a quarantine mechanism that is only available at admission time is wholly inadequate. This is the "main" thing keeping us from leveraging 802.1X as a quarantine mechanism (the lack of this capability in 802.1X environments has been a rant of mine before, since the RFC that would allow for this is 5 years old). As much as I like the idea of enforcement at the point of access, I simply do not see how it's workable unless and until it's possible to revoke previously granted access based on policy.
Quarantining should be transparent
I put this last since I think it's the one with the most wiggle room. The thing I always liked the least about SNMP- and CLI-based VLAN moving is that there remains the need to get the endpoint to go request a new address as a result of the VLAN change. Similarly, the best that DHCP can offer is to play around with the timing elements, but that's not the same as the ability to quarantine on demand, per policy. Not to beat a dead horse or anything, but RFC 3576 would get us there if the switch vendors would just implement it. Did I mention that the RFC is 5 years old?
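For readers who haven't looked at RFC 3576, the message that would revoke a previously granted session is small. Below is a rough sketch of building a Disconnect-Request (packet code 40), with the Request Authenticator computed the way the RFC specifies: an MD5 digest over the header, sixteen zero octets, the attributes, and the shared secret. The function names are made up for illustration, and the choice of User-Name and Calling-Station-Id to identify the session is an assumption; which attributes a given switch expects will vary.

```python
import hashlib
import struct

DISCONNECT_REQUEST = 40       # RFC 3576 packet code
ATTR_USER_NAME = 1            # RADIUS attribute types (RFC 2865)
ATTR_CALLING_STATION_ID = 31


def radius_attr(attr_type: int, value: bytes) -> bytes:
    """Encode one RADIUS attribute as type, length, value."""
    return struct.pack("!BB", attr_type, len(value) + 2) + value


def build_disconnect_request(identifier: int, secret: bytes,
                             user_name: str, calling_station_id: str) -> bytes:
    """Build a Disconnect-Request packet for the session named by the two attributes."""
    attrs = (radius_attr(ATTR_USER_NAME, user_name.encode())
             + radius_attr(ATTR_CALLING_STATION_ID, calling_station_id.encode()))
    header = struct.pack("!BBH", DISCONNECT_REQUEST, identifier, 20 + len(attrs))
    # Request Authenticator: MD5(Code + Identifier + Length + 16 zero octets + attributes + secret)
    authenticator = hashlib.md5(header + bytes(16) + attrs + secret).digest()
    return header + authenticator + attrs
```

The packet would be sent over UDP to the switch (the RFC assigns port 3799), which answers with a Disconnect-ACK (code 41) or a Disconnect-NAK (code 42). That exchange is what would let a policy engine pull access after admission, without waiting on the endpoint to renew anything.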
So, there you have it. We use ARP for state because it's fast, robust and hard to bypass. We use ARP for quarantine because, quite frankly, we've yet to see any other quarantine mechanism that can fit the bill.
PS
I almost forgot. The main knock against our "ARP twiddling" approach, beyond just the philosophical objection, is that it's easy to bypass. Is it? Some methods may be. Ours is not. Not from our own internal testing, and not from installations spanning 550 customers across 38 countries. And, yes, we thought of the static cache-entry trick already.