Parallel.ForEach scales its concurrency to roughly the number of cores on your computer, so it’s good for processor-intensive work but not great for things that spend most of their time waiting on the network.
My suggestion would be a modification of option 3. Instead of spinning up all the tasks at once, use a SemaphoreSlim to make sure you’re not doing more than a few dozen at a time. I’ve seen that trying to start too many tasks at once can have a negative perf impact.
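Here’s a rough sketch of what I mean by throttling with a SemaphoreSlim. The ThrottledRunner class, the RunAsync helper, and the default limit of 32 are names and values I made up for illustration; the work delegate stands in for whatever async call you’re making per item, so tune the limit for your own workload.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledRunner
{
    // Hypothetical helper: runs `work` for every item, keeping at most
    // `maxConcurrency` operations in flight at any one time.
    public static async Task RunAsync<T>(
        IEnumerable<T> items, Func<T, Task> work, int maxConcurrency = 32)
    {
        using (var throttle = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = new List<Task>();

            foreach (var item in items)
            {
                // Asynchronously wait for a free slot before starting more work.
                await throttle.WaitAsync();
                tasks.Add(RunOneAsync(item, work, throttle));
            }

            // Wait for the remaining in-flight operations before disposing the semaphore.
            await Task.WhenAll(tasks);
        }
    }

    private static async Task RunOneAsync<T>(
        T item, Func<T, Task> work, SemaphoreSlim throttle)
    {
        try
        {
            await work(item);
        }
        finally
        {
            // Always free the slot, even if the operation throws.
            throttle.Release();
        }
    }
}
```

You’d call it with something like `await ThrottledRunner.RunAsync(keys, key => FetchAsync(key), 32);`, where FetchAsync is a placeholder for your own async operation. Because the loop awaits WaitAsync before creating each task, you never have more than the limit started at once, rather than creating them all up front and letting them queue.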
Also, make sure you’re using multiplex IO (default after 2.4.0) or that you have a very high maximum number of connections.
Brant