[00:00:00] An infinite queue of 300-second pauses. Is that normal?
[00:00:10] binbot: should be ok again, no?
[00:00:19] binbot: On hu.wikipedia?
[00:00:27] At this moment it is OK, at last.
[00:00:43] I could not go to sleep because of this.
[00:00:55] Yes, huwiki
[00:01:15] Maxlag seems to be back to 0: http://hu.wikipedia.org/wiki/Szerkeszt%C5%91:BinBot/munk?action=edit&maxlag=-1
[00:01:45] I am just angry and wish some dirty things on the server.
[00:02:05] There were db updates happening earlier, iirc, and those would have caused the replication lag issues
[00:02:15] binbot: I guess it was a schema update
[00:02:23] OK.
[00:02:32] those are partly atomic operations which need to lock whole tables
[00:02:34] * tables
[00:02:40] How can I get information about these in advance?
[00:03:03] New patchset: Lcarr; "switched /etc/nagios-plugins to /etc/nagios-plugins/config" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2782
[00:03:10] Is there a noticeboard to see when not to begin bot work?
[00:03:44] binbot: If it's very important that your bot run on time, you should adjust your bot's maxlag parameter (there's a config option if you're using pywikipedia). You should adjust it based on the priority; most bots can have a low value, since it's fine if they wait around a while to do automated edits.
[00:04:08] Well, work like this is on the rarer side, so there isn't much use for a noticeboard; people would forget and not pay much attention to it
[00:04:49] binbot: You can also check what the maxlag value is right now with the URL I gave above. (&maxlag=-1 will always fail and tell you what it's currently at.) That way you know whether there will be delays.
[00:04:54] I almost always do manual edits.
[00:05:05] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2782
[00:05:06] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2782
[00:05:17] I correct spelling and such things that can't be done in automatic mode.
[00:05:24] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2709
[00:05:24] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2709
[00:06:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:06:53] PROBLEM - DPKG on searchidx1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[00:06:54] @Pathoschild: thank you, I will use that! I didn't know.
[00:07:02] Welcome. :)
[00:11:51] https://www.mediawiki.org/w/index.php?title=Manual:Maxlag_parameter&diff=503651&oldid=501547
[00:12:26] PROBLEM - Host cp1019 is DOWN: PING CRITICAL - Packet loss = 100%
[00:12:35] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 6.450 seconds
[00:13:47] RECOVERY - Host cp1019 is UP: PING OK - Packet loss = 0%, RTA = 30.93 ms
[00:18:08] PROBLEM - Backend Squid HTTP on cp1019 is CRITICAL: Connection refused
[00:18:53] PROBLEM - Frontend Squid HTTP on cp1019 is CRITICAL: Connection refused
[00:26:43] Anyone got an idea about https://en.wikipedia.org/wiki/MediaWiki:Wikimediaplayer.js ?
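The maxlag check discussed above (around 00:03-00:07) can be scripted. The sketch below is an illustration only: it assumes the standard api.php endpoint for hu.wikipedia.org, and the exact fields of the maxlag error response are assumptions based on the public MediaWiki API rather than anything shown in this log. Pywikipedia exposes the same behaviour as a config option, as noted above.

```python
# Minimal sketch: ask the API with maxlag=-1, which always fails and reports
# the current replication lag (the trick described in the discussion above).
# The endpoint and the error fields ("lag", "info") are assumptions based on
# the public MediaWiki API, not taken from this log.
import requests

API_URL = "https://hu.wikipedia.org/w/api.php"

def current_replication_lag():
    resp = requests.get(
        API_URL,
        params={"action": "query", "maxlag": -1, "format": "json"},
        headers={"User-Agent": "maxlag-check-example/0.1"},
        timeout=10,
    )
    error = resp.json().get("error", {})
    if error.get("code") == "maxlag":
        # Prefer a numeric "lag" field if present, otherwise fall back to
        # the human-readable "info" string.
        return error.get("lag", error.get("info"))
    return 0.0  # maxlag=-1 should never pass, but handle it gracefully

if __name__ == "__main__":
    print("Current replication lag:", current_replication_lag())
```

A bot would run the same check, or simply send a small positive maxlag value with every write and sleep/retry whenever this error comes back, which is what the maxlag parameter is designed to automate.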
[00:35:14] RECOVERY - DPKG on searchidx1001 is OK: All packages OK
[00:39:53] RECOVERY - Backend Squid HTTP on cp1019 is OK: HTTP OK HTTP/1.0 200 OK - 27400 bytes in 0.179 seconds
[00:40:47] RECOVERY - Frontend Squid HTTP on cp1019 is OK: HTTP OK HTTP/1.0 200 OK - 27546 bytes in 0.107 seconds
[00:44:47] !log aaron synchronized php-1.19/includes/filerepo/backend/FileBackend.php 'deployed r112377'
[00:44:49] Logged the message, Master
[00:44:51] aaron cleared profiling data
[00:48:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:48:35] PROBLEM - SSH on sq39 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:50:50] PROBLEM - Host sq39 is DOWN: PING CRITICAL - Packet loss = 100%
[00:52:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 9.475 seconds
[00:53:17] New patchset: Catrope; "Point the l10nupdate script to git instead of SVN" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2783
[00:53:32] ^demon|away, Ryan_Lane: ---^^
[00:57:13] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2783
[00:57:14] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2783
[00:57:17] RoanKattouw: ^^
[00:57:30] DUDE
[00:57:35] Did you read the commit message?
[00:58:09] New patchset: Catrope; "Revert "Point the l10nupdate script to git instead of SVN"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2784
[00:59:06] Ryan_Lane: So please revert that
[00:59:15] oh?
[00:59:20] The git repos don't actually work yet, this commit breaks LU in its current state
[00:59:21] New patchset: Ryan Lane; "Revert "Point the l10nupdate script to git instead of SVN"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2785
[00:59:30] I already did that
[00:59:32] See 2784
[00:59:39] I'll abandon mine then
[00:59:44] Change abandoned: Ryan Lane; "dupe" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2785
[00:59:49] Oh OK
[00:59:49] bahahaha
[00:59:50] PROBLEM - Host cp1043 is DOWN: PING CRITICAL - Packet loss = 100%
[00:59:55] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2784
[00:59:56] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2784
[01:00:01] RoanKattouw: I hadn't pushed it all the way out
[01:00:10] RoanKattouw: don't point me to a change unless you need it merged :)
[01:00:12] I figured
[01:00:13] heh
[01:00:24] Do you read commit summaries? At all?
[01:00:34] I read the title, not the message
[01:00:58] hmm, of course Gerrit won't let me remerge
[01:01:01] I'll have to resubmit it
[01:01:04] yeah
[01:01:06] With a new Change-Id too
[01:01:08] yep
[01:01:12] !log aaron synchronized php-1.19/includes/StreamFile.php 'deployed r112379'
[01:01:15] Logged the message, Master
[01:01:16] aaron cleared profiling data
[01:02:21] New patchset: Catrope; "Point the l10nupdate script to git instead of SVN" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2786
[01:02:44] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2786
[01:09:15] !log catrope synchronized wmf-config/CommonSettings.php 'Remove outdated itwiki lockdown bypass code'
[01:09:18] Logged the message, Master
[01:14:20] does anyone know of wikitechwiki having issues sending mail?
[01:14:53] I'm not getting enotifs, nor am I getting the confirmation mail re-send
[01:15:08] PROBLEM - Host ms-be4 is DOWN: PING CRITICAL - Packet loss = 100%
[01:17:06] RECOVERY - Host cp1043 is UP: PING OK - Packet loss = 0%, RTA = 30.82 ms
[01:21:08] PROBLEM - Varnish traffic logger on cp1043 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishncsa
[01:23:14] RECOVERY - Varnish traffic logger on cp1043 is OK: PROCS OK: 2 processes with command name varnishncsa
[01:23:25] New review: Catrope; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2787
[01:23:25] Change merged: Catrope; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/2787
[01:25:47] PROBLEM - Host cp1043 is DOWN: PING CRITICAL - Packet loss = 100%
[01:28:02] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:28:11] RECOVERY - Host cp1043 is UP: PING OK - Packet loss = 0%, RTA = 30.81 ms
[01:31:56] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 3.506 seconds
[01:32:05] robla: pcache hits look better
[01:32:59] PROBLEM - Varnish HTTP mobile-frontend on cp1043 is CRITICAL: Connection refused
[01:33:26] PROBLEM - Varnish traffic logger on cp1043 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishncsa
[01:34:56] RECOVERY - Varnish HTTP mobile-frontend on cp1043 is OK: HTTP OK HTTP/1.1 200 OK - 634 bytes in 0.062 seconds
[01:35:03] robla: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/89706
[01:35:10] I noticed that says "post1.18"
[01:35:23] RECOVERY - Varnish traffic logger on cp1043 is OK: PROCS OK: 2 processes with command name varnishncsa
[01:42:34] robla: probably a good idea to glance through the post1.18 tag
[01:44:50] !log aaron synchronized php-1.19/includes/StreamFile.php 'deployed r112382'
[01:44:52] Logged the message, Master
[01:45:15] !log aaron synchronized php-1.19/includes/filerepo/backend/FileBackend.php 'deployed r112382'
[01:45:18] Logged the message, Master
[01:45:19] aaron cleared profiling data
[02:08:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:12:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 2.733 seconds
[02:17:41] !log LocalisationUpdate completed (1.18) at Sat Feb 25 02:17:41 UTC 2012
[02:17:44] Logged the message, Master
[02:33:27] !log LocalisationUpdate completed (1.19) at Sat Feb 25 02:33:27 UTC 2012
[02:33:29] Logged the message, Master
[02:54:53] RECOVERY - Puppet freshness on ms-be5 is OK: puppet ran at Sat Feb 25 02:54:43 UTC 2012
[04:50:03] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours
[05:34:18] PROBLEM - Disk space on search1018 is CRITICAL: DISK CRITICAL - free space: /a 3691 MB (2% inode=99%):
[05:34:36] PROBLEM - Disk space on search1017 is CRITICAL: DISK CRITICAL - free space: /a 3692 MB (2% inode=99%):
[06:07:11] /query domas
[06:25:11] PROBLEM - Puppet freshness on mw1010 is CRITICAL: Puppet has not run in the last 10 hours
[06:31:11] PROBLEM - Puppet freshness on mw1110 is CRITICAL: Puppet has not run in the last 10 hours
[06:31:11] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours
[08:14:44] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[08:20:44] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[08:20:45] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[09:02:18] New review: Hashar; "(no comment)" [analytics/reportcard] (master) C: 1; - https://gerrit.wikimedia.org/r/2417
[10:28:01] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:30:14] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s)
[11:23:54] New patchset: Hashar; "use MWScript in 'sql' script for centralauth DB" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2788
[14:52:04] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours
[16:26:38] PROBLEM - Puppet freshness on mw1010 is CRITICAL: Puppet has not run in the last 10 hours
[16:32:38] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours
[16:32:38] PROBLEM - Puppet freshness on mw1110 is CRITICAL: Puppet has not run in the last 10 hours
[16:39:14] Anyone here with lists.wm.o admin access?
[16:40:07] I'm list admin of cvn-private but lost my login a while ago; I couldn't choose a password myself and I can't seem to find a reset function.
[16:40:35] keep getting '# cvn-private moderator bounce waiting' mails
[17:03:23] PROBLEM - MySQL Slave Delay on db16 is CRITICAL: CRIT replication delay 205 seconds
[17:04:52] PROBLEM - Host ms-be5 is DOWN: PING CRITICAL - Packet loss = 100%
[18:15:53] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[18:21:49] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[18:21:50] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[18:31:07] RECOVERY - MySQL Slave Delay on db16 is OK: OK replication delay 6 seconds
[20:15:14] for some reason apparently logged actions get done twice... https://commons.wikimedia.org/w/index.php?title=Special:Log&page=File%3AAfrican+Cup+of+Nations+2012.png I just reverted once: two reverts.
[20:15:14] Also noted: the revert takes ages (~30 seconds to 2:30 minutes) to return (browser waits...). The long revert duration was confirmed three times (with two different browsers and accounts) by me at https://commons.wikimedia.org/w/index.php?title=File:AaatestSonnepalmenstrand-portrait_new.jpg&action=history
[20:19:28] Saibo, Chromium?
[20:19:48] (for the first)
[20:20:00] FF10 and Opera11.6
[20:20:06] ok
[20:20:20] @replag
[20:20:22] Nemo_bis: No replag currently. See also "replag all".
[20:20:28] @replag all
[20:20:29] Nemo_bis: [s1] db36: 0s, db12: 0s, db32: 0s, db38: 0s, db52: 0s, db53: 0s; [s2] db13: 0s, db30: 0s, db24: 0s, db54: 0s; [s3] db39: 0s, db34: 0s, db25: 0s, db11: 0s
[20:20:30] Nemo_bis: [s4] db22: 0s, db31: 0s, db33: 0s, db51: 0s; [s5] db45: 0s, db35: 0s, db44: 0s, db55: 0s; [s6] db47: 0s, db43: 0s, db46: 0s, db50: 0s; [s7] db37: 0s, db16: 1s, db18: 0s, db26: 0s
[20:20:49] well, at least this was an improvement
[20:20:53] just do a revert at this test file and see what you get. Note that the revert is done "fast"; it just takes ages for the server feedback
[20:21:22] not sure why it did two reverts. I think I did not click twice ;)
[20:21:50] I'm not a rollbacker
[20:22:16] ah, but you mean only file version restores?
[20:22:18] you do not need to
[20:22:20] yes
[20:22:35] I mean file version "reverts" ;)
[20:22:48] sorry - that was not clear
[20:24:23] well, I guess that's not browser- or location-specific
[20:24:37] let's leave it to the devs :p
[20:25:42] did you test it?
[20:25:49] no
[20:25:55] an independent confirmation would be good
[20:26:05] Roan is killing cadmium http://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&m=&c=Miscellaneous+eqiad&h=cadmium.eqiad.wmnet&tab=m&vn=&mc=2&z=medium&metric_group=ALLGROUPS let's hope we'll have the videos soon :p
[20:26:17] I will put it on Bugzilla then
[20:26:35] I thought it was named Candyium? ;)
[20:27:52] candyium?!
[20:28:05] sugary servers
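The per-shard lag that the @replag bot reports above can also be pulled straight from the API. The sketch below is not the bot's actual code: it uses the public siteinfo module's dbrepllag property against an assumed Commons endpoint, and the field names come from the documented API rather than this log.

```python
# Sketch of querying per-database replication lag, similar to what the
# @replag bot reports above. siprop=dbrepllag is part of the public
# MediaWiki API; this is an illustration, not the bot's implementation.
import requests

API_URL = "https://commons.wikimedia.org/w/api.php"

resp = requests.get(
    API_URL,
    params={
        "action": "query",
        "meta": "siteinfo",
        "siprop": "dbrepllag",
        "sishowalldb": 1,   # list every replica, not just the most lagged one
        "format": "json",
    },
    headers={"User-Agent": "replag-example/0.1"},
    timeout=10,
)
for db in resp.json()["query"]["dbrepllag"]:
    print(f"{db['host']}: {db['lag']}s")
```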
[21:40:22] hi, are there any known problems with thumbnail generation on Commons at the moment?
[21:41:14] Akoopal: would not be the only server problem ... ;)
[21:41:22] which image? all?
[21:41:29] what do you observe?
[21:41:39] well, I see complaints that images are not generated
[21:41:53] like on this article: https://nl.wikipedia.org/wiki/%C4%BDubom%C3%ADr_Ft%C3%A1%C4%8Dnik
[21:42:11] confirm
[21:42:16] did you try a purge?
[21:42:47] https://commons.wikimedia.org/wiki/File:Ftacnik_lubomir_20081025_berlin_bundesliga.jpg the 450px thumb works
[21:43:13] yep
[21:43:18] the 264 one doesn't
[21:43:27] which is our default for infoboxes
[21:43:31] http://upload.wikimedia.org/wikipedia/commons/thumb/1/11/Ftacnik_lubomir_20081025_berlin_bundesliga.jpg/264px-Ftacnik_lubomir_20081025_berlin_bundesliga.jpg
[21:43:33] yes
[21:43:37] don't see it
[21:44:13] message: "The image "http://upload.wikimedia.org/wikipedia/commons/thumb/1/11/Ftacnik_lubomir_20081025_berlin_bundesliga.jpg/264px-Ftacnik_lubomir_20081025_berlin_bundesliga.jpg" cannot be displayed because it contains errors."
[21:44:32] we could try a purge of the file page
[21:44:48] maybe it got broken during the transition to SWIFT
[21:44:59] go ahead
[21:45:16] purged
[21:45:20] takes ...
[21:45:21] done
[21:45:30] does not work
[21:45:36] nope :-(
[21:45:39] no 264 thumb
[21:45:50] http://upload.wikimedia.org/wikipedia/commons/thumb/1/11/Ftacnik_lubomir_20081025_berlin_bundesliga.jpg/265px-Ftacnik_lubomir_20081025_berlin_bundesliga.jpg works...
[21:45:58] hmm, interesting
[21:46:27] http://upload.wikimedia.org/wikipedia/commons/thumb/1/21/Ftacnik_lubomir_20081025_berlin_bundesliga.jpg/264px-Ftacnik_lubomir_20081025_berlin_bundesliga.jpg wrong URL (directory) works ;)
[21:46:38] replaced 1/11 with 1/21
[21:46:57] hmmmm
[21:47:14] seems there is some broken image cached or something
[21:47:25] can you purge a specific size somehow?
[21:47:47] probably no operations staff are here currently... so I can submit it to Bugzilla (if you are not registered)
[21:47:52] hm.. not really
[21:48:14] hoped somebody was peeking in :-)
[21:48:23] with a purge all sizes are deleted on the servers
[21:48:28] at least they should be...
[21:49:04] I just tried the 263 one (just to try), and that one took a while (so probably generated), but worked
[21:49:24] if it was something important (e.g. with many/all images) we could ping someone ;) But I would rather leave them to their weekend ;)
[21:49:39] yes, if it takes time it usually is freshly generated
[21:49:52] *trying with wget*
[21:50:26] 0 bytes ;)
[21:50:31] as I said, I see more complaints popping up, this was an example
[21:50:36] mark: perhaps around?
[21:50:47] okay
[21:52:25] all thumbs at the new files look good to me
[22:00:32] Saibo: ok, guess I better file a bug
[22:01:22] yup, okay
[22:10:11] Saibo: https://bugzilla.wikimedia.org/show_bug.cgi?id=34718
[22:14:12] Akoopal: okay. I have added the observation with the wrong URL
[22:14:24] ok
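For reference, the purge attempted above (around 21:44-21:50) and the follow-up wget test can be reproduced with a short script. This is a minimal sketch using the MediaWiki API's purge action against the Commons endpoint; whether a token or POST is required can vary by wiki configuration, and the thumbnail URL is simply the broken 264px one from the log, so treat this as an illustration rather than the exact commands used above.

```python
# Sketch: purge the file page via the API (regenerating cached thumbnails),
# then re-fetch the problematic thumbnail size to see whether it recovered.
# A zero-byte or error response means the broken thumbnail is still cached.
import requests

API_URL = "https://commons.wikimedia.org/w/api.php"
TITLE = "File:Ftacnik_lubomir_20081025_berlin_bundesliga.jpg"

purge = requests.post(
    API_URL,
    data={"action": "purge", "titles": TITLE, "format": "json"},
    headers={"User-Agent": "thumb-purge-example/0.1"},
    timeout=30,
)
print(purge.json())

thumb = ("http://upload.wikimedia.org/wikipedia/commons/thumb/1/11/"
         "Ftacnik_lubomir_20081025_berlin_bundesliga.jpg/"
         "264px-Ftacnik_lubomir_20081025_berlin_bundesliga.jpg")
check = requests.get(thumb, timeout=30)
print(check.status_code, len(check.content), "bytes")
```

As the discussion above notes, a page purge is supposed to drop all cached sizes; the bug filed at 22:10 covers the case where one specific size stays broken anyway.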
[22:45:16] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , frwiktionary (10223)
[23:02:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:04:35] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 6.079 seconds
[23:38:11] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[23:40:53] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:44:47] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 2.949 seconds