@atomicpoet and that's yet another reason why we need Blocklist Feed support [github.com]...
@kkarhan@infosec.space @atomicpoet@atomicpoet.org anyone can use a proxy, so it could still be them. Look at residential IP providers, they like to boast about scraping. Also think of Tor or something more obfuscated, or simply hiding behind an AT&T connection with a Mozilla user agent.
@gameplayer @atomicpoet it's not that simple.
In fact, many ISPs will forcibly disconnect customers if they detect that they run an open proxy or Tor exit node.
@kkarhan@infosec.space @atomicpoet@atomicpoet.org @torproject@mastodon.social 1 Mb/s spread across multiple Tor daemons is enough. Their goal is not to download one 1 GB file but many small files at 1 to 80 KB/s each.
@gameplayer @atomicpoet Given the cost and overhead of facilitating a private network at scale to do so, I'd say that doesn't fly.
#aws is way more convenient for such a job...
@kkarhan@infosec.space @atomicpoet@atomicpoet.org It's true, you can make it unpleasant for them to obtain such data, especially when you poison their model with useless data instead of just blocking.
But once they've got the money, they will use that money to get even more data, just like a farmer.
Btw I have déjà vu, like this conversation already happened a few days ago.
@gameplayer @atomicpoet could've been.
But for better or worse I'm not...
@gameplayer @atomicpoet @kkarhan
"AI IS USELESS COZ IT GIVES YOU DUMB ANSWERS!"
Also
"Let's poison AI training data!"
@n_dimension@infosec.exchange @atomicpoet@atomicpoet.org @kkarhan@infosec.space are you for or against ML? I don't allow unauthorized use for profit.