I am still working on a way to do this. To keep it short, I can fork out the server resources to "crawl" every thread that is currently viewable on ITM, but there are a few things I am not 100% sure of.

1. Would ITM allow it? (Instant show stopper if they don't.)
2. Just how much space is required? (Keeping in mind that all the images/uploads/gifs/avatars etc. can take up a fair chunk of space.) Long term the amount is irrelevant as I'll probably just dump it on S3, but for the crawl itself I will need to know what will be required.
3. How to program some logic into the crawl that knows the # of pages per thread and crawls said pages? (I am no web dev program haxor, so not 100% sure on this part.) FYI, wget with several options seems to be the way to do this.
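
For point 3, here is a rough sketch of the "how many pages does this thread have" logic in Python. It is only a guess at how it could work, not something tested against ITM. It assumes the first page of a thread contains pager text like "Page 1 of 12" and that pages are reachable as showthread.php?t=ID&page=N, which is common on vBulletin but would need checking against the real markup. The names count_pages and thread_page_urls, and the forum.example address, are made up for illustration.

```python
import re
from typing import List

# Assumption: the pager on a thread's first page reads "Page 1 of 12".
# Adjust the pattern if the real forum markup differs.
PAGE_COUNT_RE = re.compile(r"Page\s+\d+\s+of\s+(\d+)", re.IGNORECASE)


def count_pages(first_page_html: str) -> int:
    """Return the number of pages in a thread, given the HTML of page 1.

    Threads short enough to fit on one page usually have no pager at all,
    so a missing match is treated as a single page.
    """
    match = PAGE_COUNT_RE.search(first_page_html)
    return int(match.group(1)) if match else 1


def thread_page_urls(base_url: str, thread_id: int, first_page_html: str) -> List[str]:
    """Build the list of page URLs to fetch for one thread.

    Assumption: pages follow the showthread.php?t=ID&page=N scheme.
    """
    total = count_pages(first_page_html)
    return [
        f"{base_url}/showthread.php?t={thread_id}&page={n}"
        for n in range(1, total + 1)
    ]


if __name__ == "__main__":
    # Hypothetical sample of what the pager markup might look like.
    sample = '<td class="vbmenu_control">Page 1 of 3</td>'
    for url in thread_page_urls("https://forum.example", 12345, sample):
        print(url)
```

The resulting list could be written to a text file and handed to wget with its -i (input file) option, so wget does the downloading and the script only works out which pages exist.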

RE question 1: I had a thought. Gruso, if you could pass this on that would be great. I am guessing the bulk of the issues come from table locking whenever a user posts/updates data? If that is the case, would it be possible to drop the entire ITM archive (pre-2013 or whatever) into an "Archive" forum that is 100% read only? And I don't mean just create another forum within ITM. I mean drop it into its own DB + vBulletin installation, separate from the current ITM forums. Lock it up, make it read only and let it sit there with no post/user updates at all. Disable registrations, effectively preventing any further updates. Allow searching of course, but that is about it.
This would:
A) Allow us to look back over old threads, remember the good (or bad?) times and get a laugh every so often
B) Allow us to continue running on a spanking new forum with only 2013/relevant threads in it.

Again, happy to provide resources if required.


