Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experimental DosHandler #12068

Open
wants to merge 24 commits into
base: jetty-12.1.x
Choose a base branch
from
Open

Conversation

gregw
Copy link
Contributor

@gregw gregw commented Jul 22, 2024

Fix #10478. This is a simple DosHandler that delays and then rejects all requests over a limit.

@gregw
Copy link
Contributor Author

gregw commented Jul 22, 2024

@sbordet @lorban Can you give this a quick initial review before I commit time to tests, config and doco.

@lorban
Copy link
Contributor

lorban commented Jul 23, 2024

Speaking about the design, I see two problems if you configure a moderately large throughput (say 10K req/s):

  • You need to allocate an array with one 64-bit entry per request per second. For 10K req/s that's ~80 KB of memory that's constantly scanned and updated. That alone will totally trash all CPU caches.
  • There's a single lock protecting that array. I'm not sure this could even reach 10K req/s.

@gregw
Copy link
Contributor Author

gregw commented Jul 24, 2024

@lorban how about this version that uses an exponential moving average?

@gregw gregw marked this pull request as ready for review July 25, 2024 21:34
@gregw
Copy link
Contributor Author

gregw commented Jul 26, 2024

@sbordet Can you take this one on?

@sbordet
Copy link
Contributor

sbordet commented Jul 29, 2024

What I'd like:

  • Pluggable algorithm for rate exceeded -- this will reduce the number of parameters to the constructor, by separating the parameters for the algorithm from those just related to the DoSHandler like maxTrackers.
  • Now each tracker is a CyclicTimeout, but DosHandler should handle all the Trackers via CyclicTimeouts.
  • usePort seems weird, as using the ephemeral port from the client seems going against the purpose of the DoS defense: the client will use many ephemeral ports.
  • Not sure I understand the current algorithm: if a client sends 11 requests, and the 11th exceeds the rate, it is queued, but I'd say it's simpler to reject it immediately. Right now there is a hard-coded 2 s timeout that when expires rejects the queued request. Rejecting immediately would simplify (no queue, no queue parameters), and I see no point waiting 2 seconds to reject?

I would keep this Handler as simple as possible: if rate is exceeded, reject.

@gregw
Copy link
Contributor Author

gregw commented Jul 31, 2024

@sbordet

What I'd like:

  • Pluggable algorithm for rate exceeded -- this will reduce the number of parameters to the constructor, by separating the parameters for the algorithm from those just related to the DoSHandler like maxTrackers.

OK, but I don't see replacing two algorithm arg (samplePeriodMs and alpha) with one new ExponentialMovingAverage(samplePeriodMs, alpha) as much of a saving in parameters. But I'm OK to have the algorithm pluggable.

But then perhaps we should do the same for the ID extraction, which is currently a protected method and two constructor arguments. I'll have a play and see...

  • Now each tracker is a CyclicTimeout, but DosHandler should handle all the Trackers via CyclicTimeouts.

OK.

  • usePort seems weird, as using the ephemeral port from the client seems going against the purpose of the DoS defense: the client will use many ephemeral ports.

It if for the use case when the server is behind some intermediary, so the remote IP is the intermediary and not the client. Sometimes you cannot trust the Forwarded-for headers because not all intermediaries are smart enough to police that they are not sent from the client itself. So using the source port on the intermediary is a proxy for identifying the connection and thus the client. This is (was?) commonly used by Cisco smart routers. But if we make the ID algorithm pluggable, then this can be done at no cost.

  • Not sure I understand the current algorithm: if a client sends 11 requests, and the 11th exceeds the rate, it is queued, but I'd say it's simpler to reject it immediately. Right now there is a hard-coded 2 s timeout that when expires rejects the queued request. Rejecting immediately would simplify (no queue, no queue parameters), and I see no point waiting 2 seconds to reject?

The idea of delay rather than rejecting is to delay additional requests on the same connection. An attacker can pipeline many HTTP/1 requests in a single TCP frame that is buffered. If you just reject the request, then the next one will be there and you will need to do work to reject that. Closing the connection can avoid that, but then that tells the attacker that they need to open another connection to continue the attack. By delaying, the attacker does not know if they are causing DOS or not, and they have to hold open resources to keep the connection alive.

I would keep this Handler as simple as possible: if rate is exceeded, reject.

Reject is not good enough. We'd have to close the connection to avoid issues of many pipelined h1 requests. But then we don't have that semantic for H2 and H3 connections, i.e. we can send a response to a single request that will close all other streams on the same h2 connection.

Delay is expensive for the server, so perhaps we should come up with a way of closing h2 and h3 connections?

@gregw
Copy link
Contributor Author

gregw commented Jul 31, 2024

@sbordet I've pushed a change to make the ID a pluggable function rather than fixed. It is OK, but might a bit of effort to make it configurable from a module ini file.

I'll try the same approach for the rate algorithm...

@sbordet
Copy link
Contributor

sbordet commented Jul 31, 2024

@gregw are you planning of integrating request handling with connection acceptance? I.e. stop accepting connections from the suspicious IP?

@gregw
Copy link
Contributor Author

gregw commented Jul 31, 2024

@gregw are you planning of integrating request handling with connection acceptance? I.e. stop accepting connections from the suspicious IP?

I wasn't..... but that could be a good idea. Would need new semantic in our connectors, but we need to add some new semantic any if we are to be able to close a connection for h2/h3.

@gregw
Copy link
Contributor Author

gregw commented Jul 31, 2024

@sbordet I've made the Rate pluggable now as well. It is all a bit messy and lacks javadoc, but give me some feedback on the direction before I commit any more effort. I might work a little bit tomorrow as well.

Copy link
Contributor

@sbordet sbordet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gregw I think Rate could be renamed to RateControl (which we already have in jetty-http2) or similar, but rather than having a getRate() and having to pass the maxRequestPerSecond as an extra parameter, I'd prefer a RateControl.isExceeded(Request), so all parameters end up in the RateControl.Factory specific implementation.

Also, given that reject+block connections, or wait+reject are valid strategies, then perhaps we need to abstract that too, introducing a RejectHandler that also can be implementation specific.
One of the implementations can be linked to the Connector to block connections (which perhaps requires more changes -- I think we have suspend accepting for all remote clients, but not from specific IPs).

DoSHandler(Handler, PeerMapper, RateControl.Factory, RejectHandler) { ... }

where PeerMapper is your Function<Request, String> to map the remote peer.

@gregw
Copy link
Contributor Author

gregw commented Jul 31, 2024

@sbordet A wise programmer once said:

I would keep this Handler as simple as possible

By making this handler entirely pluggable, I think perhaps we are adding complexity where none is needed. Writing an entirely new custom handler is not that difficult and is better documented than writing three implementations of unknown interfaces to plug into this Handler.
This pluggability will also make XML configuration really difficult.

I suggest we consider going back to a simple handler, with the algorithms in protected methods where possible, and keep it simple. If anybody wants something different, they can extend or replace.
I'd prefer reject+block as the default algorithm, but that needs wider changes and less simplicity. So I think the delay+reject approach is fine for a simple DosHandler.

@gregw
Copy link
Contributor Author

gregw commented Aug 1, 2024

@sbordet I've added in pluggable rejection. It is not too ugly. I'll try the xml configuration soon. If you have time for some more feedback/review, it would be appreciated before I commit too much to the XML

@gregw
Copy link
Contributor Author

gregw commented Aug 1, 2024

@sbordet I've added XmlConfiguration and filled out the DosHandler a bit more. It's a little more complex than I'd like, but it is not too bad.

@gregw gregw requested review from sbordet and lorban August 1, 2024 08:16
@gregw
Copy link
Contributor Author

gregw commented Aug 23, 2024

@sbordet @lachlan-roberts @lorban nudge!!! 3 weeks is too long to wait for feedback!

@gregw gregw changed the base branch from jetty-12.0.x to jetty-12.1.x August 23, 2024 01:57
@gregw
Copy link
Contributor Author

gregw commented Sep 16, 2024

@sbordet nudge... no shove.... no POKE!!!!

protected boolean onConditionsMet(Request request, Response response, Callback callback) throws Exception
{
// Reject if we have too many Trackers
if (_maxTrackers > 0 && _trackers.size() >= _maxTrackers)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think _maxTrackers > 0 is always true because it is defaulted in the constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention was to have 0 being unlimited... but perhaps Integer.MAX_VALUE is better for that? For now I've changed the constructor to allow a 0 value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value 0 can also be used as an "emergency button", if it could be set via JMX, to stop all requests.
In this way, 0 is not special, -1 is default, and large values are large values.

/**
* A Handler to reject DoS requests after first delaying them.
*/
public static class DelayedRejectHandler extends Handler.Abstract
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like this class could have a CyclicTimeouts<Exchange> field that does all the expiration logic -- it should simplify the implementation by a lot and avoid bugs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a go at doing this, but I don't think it is right or simpler. CyclicTimeouts is optimized for timeouts that are expected to be cancelled. It also does fine grained timeouts that strive to be accurate. For this delayed handler, we just tick with half the delay period with a timeout that we know will expire and then we process the queue. I think it is best this way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fail to understand why it is best having to rewrite the loop, checking for expirations, rescheduling, when it would have been enough to implement onExpired() { Response.writeError(); }, leaving all of the above to existing JDK or Jetty utility classes?
If you don't want to use CyclicTimeouts just use a plain Scheduler, but don't rewrite a scheduler.

jetty-core/jetty-server/src/main/config/etc/jetty-dos.xml Outdated Show resolved Hide resolved
Comment on lines +10 to +11
<Name><Property name="jetty.dos.id.type" default="ID_FROM_REMOTE_ADDRESS"/></Name>
<Class><Property name="jetty.dos.id.class" default="org.eclipse.jetty.server.handler.DoSHandler"/></Class>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd swap these 2 lines, so it looks more Java like as in DoSHandler.ID_FROM_REMOTE_ADDRESS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately that order does not validate again the DTD

Comment on lines 17 to 19
<Arg name="samplePeriodMs" type="long"><Property name="jetty.dos.samplePeriodMs" default="-1"/></Arg>
<Arg name="alpha" type="double"><Property name="jetty.dos.alpha" default="-1.0"/></Arg>
<Arg name="maxRequestsPerSecond" type="int"><Property name="jetty.dos.maxRequestsPerSecond" default="100"/></Arg>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rename these properties as jetty.dos.rateControlFactory.expMovingAvg.samplePeriodMs etc. so they are scoped to the RateControlFactory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK for alpha, but sample periods are likely to be common to all trackers and the maxRequestsPerSecond is definitely common to any tracker

Comment on lines 25 to 29
<Arg name="delayMs" type="long"><Property name="jetty.dos.delayMs" default="1000"/></Arg>
<Arg name="maxDelayQueue" type="int"><Property name="jetty.dos.maxDelayQueue" default="1000"/></Arg>
<Arg name="reject">
<New class="org.eclipse.jetty.server.handler.DoSHandler$StatusRejectHandler">
<Arg name="status"><Property name="jetty.dos.rejectStatus" default="429"/></Arg>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rename these properties as jetty.dos.rejectHandler.delayed.delayMs etc. so they are scoped to the RejectHandler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK for delayMs, but the reject status will be common to all/most reject handlers

protected boolean onConditionsMet(Request request, Response response, Callback callback) throws Exception
{
// Reject if we have too many Trackers
if (_maxTrackers > 0 && _trackers.size() >= _maxTrackers)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value 0 can also be used as an "emergency button", if it could be set via JMX, to stop all requests.
In this way, 0 is not special, -1 is default, and large values are large values.

*/
class Tracker implements CyclicTimeouts.Expirable
{
private final AutoLock _lock = new AutoLock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strict need for a lock -- it penalizes normal requests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see below

Comment on lines 303 to 309
CyclicTimeouts<Tracker> cyclicTimeouts = _cyclicTimeouts;
if (cyclicTimeouts != null)
{
// schedule a check to remove this tracker if idle
_expireAt = now + _idleCheck.toNanos();
cyclicTimeouts.schedule(this);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this. This class is already Expirable, so it just needs to update the _expireAt.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ummm no! You still need to actually schedule it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You just schedule it at creation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ummmm no! The whole point of a cyclic timeout is that it is optimised to be cancelled. If we are getting requests, we don't need to expire the timeout, we want to set it again into the future. We only want to expire if we have not received any requests for a period of time.

In fact, I think I'm missing a schedule for when it expires, but is not idle!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need to expire the timeout, we want to set it again into the future

That's exactly what CyclicTimeouts does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need to expire the timeout, we want to set it again into the future

That's exactly what CyclicTimeouts does.

Then the contract is strange and we have gotten it "wrong" all over the code base:

I can see a few places where schedule is called from just a setIdleTimeout method, so it is relying on the ability to just mutate the return of getExpireNanoTime, but then the schedule method is named incorrectly. It should just be called add. This is also contradictory to the behaviour in CyclicTimeout, where a call to schedule is needed after a timeout has expired.

So I do not think it wrong to call schedule, but it may be unnecessary. The contract/naming of CyclicTimeouts needs to be improved so that it is used consistently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public boolean isRateExceededBySample(long now)
{
// Count the request
_sampleCount++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use atomics and remove the need for a lock in Tracker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot use atomicS plural without it being a race. There is too much info to record into a single atomic long. We could use an atomicReference, but then we'd be allocating... maybe that is better than a lock? @lorban???

Let's get the rest of the class agreed first and we can consider a better lock

@Override
protected boolean onExpired(Tracker tracker)
{
return tracker.isIdle(NanoTime.now());
Copy link
Contributor

@sbordet sbordet Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand.

The idleness is either determined by time, and/or by the specific rate control algorithm.

For example, with a moving window, if you are idle for as long as the sample period, you have no more samples, so you are idle.
The implementation can just set the _expireAt are the right time, so there is no need to have a method to ask.

For another algorithm, it would know differently when it is idle, and therefore set _expireAt at the right time.

So perhaps we need to have the Tracker delegate to the RateControl, either always (RateControl extends Expirable) or optionally by having Tracker test RateControl instanceof Expirable (otherwise defaulting).

So maybe we don't need Tracker at all, since it's just a wrapper to a RateControl.

}

/**
* @param status The status used to reject a request, or 0 to fail the request or -1 for a default ({@link HttpStatus#TOO_MANY_REQUESTS_429}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this javadoc be moved to the class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class javadoc does say pretty much this?

/**
* A Handler to reject DoS requests after first delaying them.
*/
public static class DelayedRejectHandler extends Handler.Abstract
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fail to understand why it is best having to rewrite the loop, checking for expirations, rescheduling, when it would have been enough to implement onExpired() { Response.writeError(); }, leaving all of the above to existing JDK or Jetty utility classes?
If you don't want to use CyclicTimeouts just use a plain Scheduler, but don't rewrite a scheduler.

@gregw gregw requested a review from sbordet September 24, 2024 23:22
Copy link
Contributor

@sbordet sbordet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still have concerns about using exponential moving average (does not seem the right algorithm) for rate control, and about RateControl.isIdle() which I think unnecessary.
Let's discuss this face to face.

@lorban
Copy link
Contributor

lorban commented Sep 25, 2024

Let's discuss this face to face.

Count me in.

@gregw
Copy link
Contributor Author

gregw commented Sep 27, 2024

@sbordet I've looked at having the rate tracker calculate when it will next go idle, but it involves multiple Math.log calculations, so I'm not sure it is worthwhile. I've delegated the default idleCheckPeriod to the rateTrackerFactory class instead. It can set a period based on the alpha, but currently I'm not doing that,, as it will probably need to know the max rate as well.

Why don't you take over this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

Introduce DoSHandler
4 participants