
Traffic congestion at ChoralWiki - restricted access

Posted: 02 Nov 2009 20:14
by choralia
We are currently experiencing severe traffic congestion at ChoralWiki. It may be a bot attack. The main website has been temporarily disabled, and access to the back-up website has been intentionally restricted by username and password, to make life more difficult for bots.

The current username and password are quite obvious: ChoralWiki for both. We may change them at a later time.

We apologize for the inconvenience, and we hope we can remove these exceptional security measures as soon as possible.

Max

Re: Traffic congestion at ChoralWiki - restricted access

Posted: 02 Nov 2009 21:46
by bobnotts
Thanks for the update, Max. I'm sure everyone appreciates the hard work you and Carlos put into keeping CPDL accessible even in difficult times such as this.

Rob

Re: Traffic congestion at ChoralWiki - restricted access

Posted: 02 Nov 2009 22:42
by choralia
Thank you, Rob. The main website is now available again; however, to prevent further problems, access is still restricted by password, both on the main website and on the back-up website. I plan to re-open both to unrestricted access during off-peak hours, about 7 - 8 hours from now. I'm going to sleep now!

Max

Re: Traffic congestion at ChoralWiki - restricted access

Posted: 03 Nov 2009 03:15
by elena
I still cannot access the website, with or without the password. Any suggestions?

Re: Traffic congestion at ChoralWiki - restricted access

Posted: 03 Nov 2009 05:42
by choralia
Unrestricted access is now restored.

Max

Re: Traffic congestion at ChoralWiki - restricted access

Posted: 19 Nov 2009 13:55
by Antonyx
Since the beginning of November I have been encountering the following notice every time I log on:

'ChoralWiki has a problem
Sorry! This site is experiencing technical difficulties.
Try waiting a few minutes and reloading.

(Can't contact the database server: Unknown error (localhost))'

The above notice is from today's log-on attempt. There seems to have been no posting since the beginning of November updating the position - or have I missed it?

Would it be possible, please, to say what is happening? I am loath to try to upload anything unless I can be sure the effort will not be wasted.

Thanks.

Antonyx/David Monks

Re: Traffic congestion at ChoralWiki - restricted access

Posted: 19 Nov 2009 15:05
by choralia
In one year, traffic at ChoralWiki has more than doubled. We have implemented some configuration tricks to accommodate this growth while remaining within affordable hosting solutions; however, the traffic remains huge, and we expect it to stay that way until Christmas.

The system logs show that currently about 4% of page requests are being denied due to congestion, so most requests are still being handled correctly. Please be patient and try again. We could re-activate restricted access as a further countermeasure (especially to temporarily block crawlers, which visit the whole website and thus produce much more traffic than humans), but I think we should consider it only if the failure rate grows beyond its current level.

We are now considering a server upgrade; however, hosting costs would increase by quite a large factor, and this needs to be carefully evaluated as CPDL's budget is rather limited.

Max

Re: Traffic congestion at ChoralWiki - restricted access

Posted: 19 Nov 2009 20:17
by anaigeon
Hi Choralia,

What exactly are these "block crawlers", and what do their visits bring to the site?

Re: Traffic congestion at ChoralWiki - restricted access

Posted: 19 Nov 2009 21:40
by choralia
Whenever you use a search engine such as Google and get some search results, it is because Google continuously navigates the whole internet, visiting billions of pages and storing their contents in a huge database, so that the pages matching a given search query can be retrieved quickly. The Google program that continuously navigates the internet is a "crawler" (or "spider", as it goes through the web), and its official name is "googlebot" ("bot" stands for "robot", as it is a fully automated task).

Many other crawlers exist, and they regularly scan the entire ChoralWiki (several thousand pages - far more than any human visit covers). Anybody who searches for the score of a certain work may find ChoralWiki listed among the results thanks to the visits made by these crawlers, so they are very useful. However, they create a lot of traffic, which adds up to the traffic generated by human users.
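To make this concrete - a minimal sketch only, not how googlebot actually works - a crawler's core job is just "fetch a page, pull out its links, queue them for later visits". The link-extraction step can be shown with nothing but Python's standard library:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag - this is how a crawler
    discovers new pages to queue for later visits."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    """Return all link targets found in one HTML page."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links

# A crawler fetches each extracted link in turn, extracts its links
# too, and so on - which is why a full scan of a wiki with thousands
# of pages generates so much traffic.
page = '<p><a href="/wiki/Main_Page">Home</a> <a href="/wiki/Scores">Scores</a></p>'
print(extract_links(page))  # ['/wiki/Main_Page', '/wiki/Scores']
```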

Hackers also use bots to inject heavy traffic into their target websites so as to possibly cause downtime. From time to time I do something like that on ChoralWiki during off-peak traffic hours, to carry out load tests and evaluate how much growth margin we still have with the current hosting solutions.
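The load-testing idea can be sketched in a few lines. In this hypothetical example (my actual tooling is different), `fetch` is any callable that requests one page; passing a stub instead of a real HTTP call lets you see the concurrency logic without hitting a live server:

```python
from concurrent.futures import ThreadPoolExecutor

def _attempt(fetch):
    """Run one request; report success or failure instead of raising."""
    try:
        fetch()
        return True
    except Exception:
        return False

def load_test(fetch, n_requests, concurrency):
    """Fire n_requests calls to `fetch` from `concurrency` worker
    threads and count successes vs. failures - the same kind of
    success/failure ratio the server logs report (e.g. the ~4% of
    denied requests mentioned earlier in this thread)."""
    ok = failed = 0
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for succeeded in pool.map(lambda _: _attempt(fetch), range(n_requests)):
            if succeeded:
                ok += 1
            else:
                failed += 1
    return ok, failed
```

Against a real site one would pass something like `fetch=lambda: urllib.request.urlopen(url).read()` and watch how the failure count grows with concurrency - during off-peak hours only, of course.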

A "trick" to (temporarily) block bots during traffic peaks is to restrict access through username and password, and provide username and password in a way that can be easily understood by humans, but not by bots. This is what I temporarily activated at the beginning of November. In that case, we probably had a bot attack from hackers, and this simple trick was very effective to solve the problem.

Max

Re: Traffic congestion at ChoralWiki - restricted access

Posted: 19 Nov 2009 22:07
by anaigeon
I see, "crawler" is the word I wasn't remembering (but indeed I've already seen "google" bot mentionned as a visitor) !
BTW, would it be possible to list the most useful and famous ones, and reject visits by all the others (including spammers)?
I'm not very familiar with these technical matters; it's just an idea, so please don't spend your time on it if it's completely irrelevant ;-)

Re: Traffic congestion at ChoralWiki - restricted access

Posted: 20 Nov 2009 07:11
by choralia
No, your question is not irrelevant: it's something we had to consider when ChoralWiki was moved to the current servers.

All crawlers "should" read a specific file, named robots.txt, which is specifically intended to give crawlers the rules they "should" apply when visiting the website. Our robots.txt file contains one very simple instruction directed at all crawlers: please wait 20 seconds between page visits. This would let a crawler scan the entire website in about one week (a reasonable interval for refreshing its database) without creating a large amount of traffic.
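Concretely, the instruction looks like this (using the non-standard but widely recognized Crawl-delay directive - a sketch of the idea, not necessarily our exact file):

```
# robots.txt, served from the site root
# Applies to all crawlers
User-agent: *
# Please wait 20 seconds between successive page requests
Crawl-delay: 20
```

At 20 seconds per page, a site on the order of 30,000 pages (an assumed figure for illustration) would take 30,000 × 20 s = 600,000 s, i.e. roughly 7 days to scan - consistent with the "about one week" above.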

Unfortunately, it seems that most crawlers simply disregard the instructions provided by robots.txt, so they do whatever they want... :evil:

Max

Re: Traffic congestion at ChoralWiki - restricted access

Posted: 22 Nov 2009 23:05
by anaigeon
I see, it's not that simple - at least until they create a web police :wink: