
Spent 3 hours refactoring a Python loop and cut runtime from 12 seconds to 0.4

I had this scraper script that was taking forever to run on a list of 5000 URLs. Was using a regular for loop with requests.get one by one. Switched to using multiprocessing with Pool and bam, went from 12 seconds to under half a second. Did anyone else not realize how big a difference parallel processing makes until they actually tried it?
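For anyone who wants to try the same switch, here is a rough sketch of the pattern described above, not the actual script. It assumes each URL can be fetched independently. The names `fetch_status` and `fetch_all` are made up for illustration, and `urllib.request` stands in for `requests.get` so the snippet needs nothing outside the standard library. The `pool_cls` parameter is also an addition of this sketch, so the same code can run on threads instead of processes.

```python
from multiprocessing import Pool
from urllib.request import urlopen


def fetch_status(url):
    """Fetch one URL and return (url, HTTP status), or (url, None) on any failure."""
    try:
        with urlopen(url, timeout=10) as resp:
            return url, resp.status
    except Exception:
        # Sketch-level error handling: a real scraper would log or retry here.
        return url, None


def fetch_all(urls, workers=8, fetch=fetch_status, pool_cls=Pool):
    """Run fetch over urls with a pool of workers; results keep the input order."""
    with pool_cls(processes=workers) as pool:
        return pool.map(fetch, urls)
```

Call it as `fetch_all(list_of_urls, workers=8)` from under an `if __name__ == "__main__":` guard, which process pools need on platforms that spawn workers instead of forking. Since waiting on HTTP responses is I/O-bound, passing `pool_cls=ThreadPool` (from `multiprocessing.pool`) will likely give a similar speedup without starting extra processes.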
3 comments

lewis.mila 14h ago
Right, but here's the thing nobody's talking about - my laptop almost caught fire the first time I tried this on a machine with only 4GB of RAM. Python's multiprocessing copies everything to each worker process, so if you've got a huge list of URLs or something like that, you can actually slow things down with all that memory copying. Plus some websites will straight up block you if you hit them with 8 requests at once. Had to learn the hard way to add small delays and respect rate limits. The speed boost is real but you gotta be smart about it or you'll just trade one bottleneck for another.
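One way to do the "small delays" part is to cap both the number of workers and the pace at which requests start. This is only a minimal sketch under some assumptions: it swaps processes for threads (`multiprocessing.pool.ThreadPool`), so the URL list is shared rather than copied, and the names `MinInterval` and `throttled_map` plus the 0.25 second default are invented for illustration. The right interval depends on the site's own limits.

```python
import threading
import time
from multiprocessing.pool import ThreadPool


class MinInterval:
    """Space out callers so request starts are at least `interval` seconds apart."""

    def __init__(self, interval):
        self.interval = interval
        self._lock = threading.Lock()
        self._next_ok = 0.0  # earliest monotonic time the next request may start

    def wait(self):
        # Reserve the next free start slot under the lock, then sleep outside it
        # so other threads can reserve their own slots in the meantime.
        with self._lock:
            now = time.monotonic()
            start = max(now, self._next_ok)
            self._next_ok = start + self.interval
        time.sleep(start - now)


def throttled_map(fetch, urls, workers=4, interval=0.25):
    """Apply fetch to every URL with at most `workers` threads and a shared delay."""
    limiter = MinInterval(interval)

    def task(url):
        limiter.wait()  # blocks until this request's slot comes up
        return fetch(url)

    with ThreadPool(processes=workers) as pool:
        return pool.map(task, urls)
```

With `workers=4` and `interval=0.25`, at most four requests are in flight and a new one starts roughly every quarter second, however fast the site responds. That gives up some of the raw speedup in exchange for not getting blocked.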
1
dylan_stone33
6 times I've had to explain to non-tech friends why their "make it faster" trick actually broke everything, it's the same lesson every time. Reminds me of how @lewis.mila was saying you can't just brute force things without understanding the system underneath. Speed is useless if you crash the whole thing doing it.
2
felixb25 6h ago (Most Upvoted)
Ngl, that rate limit thing got me too. Speed is nothing if you get yourself blocked.
1