[00:04:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:18:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.917 seconds
[00:21:35] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.033 second response time
[00:31:21] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[00:52:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:56:41] New review: Jdlrobson; "yes... this should be merged still." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/11963
[01:04:39] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.493 seconds
[01:07:57] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.022 second response time
[01:39:54] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:41:42] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 260 seconds
[01:42:09] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 288 seconds
[01:48:36] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 680s
[01:52:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.302 seconds
[01:55:57] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 18 seconds
[01:58:21] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 0 seconds
[01:59:24] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 5s
[02:27:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:30:31] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours
[02:30:31] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours
[02:30:31] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours
[02:30:31] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[02:30:31] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[02:30:32] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours
[02:30:32] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours
[02:30:33] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[02:30:33] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours
[02:32:37] PROBLEM - SSH on sodium is CRITICAL: Server answer:
[02:38:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.021 seconds
[02:53:55] RECOVERY - SSH on sodium is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[03:11:46] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[03:13:25] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[03:35:09] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[05:15:29] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[06:30:50] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours
[06:30:50] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours
[06:30:50] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[06:36:52] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[07:08:49] PROBLEM - swift-account-reaper on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:10:37] PROBLEM - swift-container-replicator on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:10:55] PROBLEM - swift-object-server on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:11:04] PROBLEM - swift-container-server on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:11:13] PROBLEM - swift-account-replicator on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:11:13] PROBLEM - swift-object-replicator on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:11:13] PROBLEM - swift-container-updater on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:11:13] PROBLEM - swift-object-updater on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:11:13] PROBLEM - swift-account-server on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:11:40] PROBLEM - swift-account-auditor on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:11:40] PROBLEM - swift-object-auditor on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:11:40] PROBLEM - swift-container-auditor on ms-be7 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:02:22] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[08:02:22] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[08:39:20] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Puppet has not run in the last 10 hours
[08:58:59] PROBLEM - SSH on ms-be7 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:38:02] New review: Siebrand; "In my experience, ops doesn't scan open patch sets. You have to find an ops friend, or create an RT ..." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/11963
[11:52:23] PROBLEM - Puppet freshness on snapshot1001 is CRITICAL: Puppet has not run in the last 10 hours
[12:11:08] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[12:23:26] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.047 second response time
[12:31:23] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours
[12:31:23] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours
[12:31:23] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[12:31:23] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[12:31:23] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours
[12:31:24] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours
[12:31:24] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours
[12:31:25] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours
[12:31:25] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[12:35:35] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[12:55:04] PROBLEM - Puppet freshness on ms-be6 is CRITICAL: Puppet has not run in the last 10 hours
[13:01:04] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.027 second response time
[13:36:10] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[13:36:15] GODDAMNIT
[14:05:25] PROBLEM - ps1-d1-sdtpa-infeed-load-tower-A-phase-Y on ps1-d1-sdtpa is CRITICAL: ps1-d1-sdtpa-infeed-load-tower-A-phase-Y CRITICAL - *2588*
[14:08:25] RECOVERY - ps1-d1-sdtpa-infeed-load-tower-A-phase-Y on ps1-d1-sdtpa is OK: ps1-d1-sdtpa-infeed-load-tower-A-phase-Y OK - 2350
[14:14:07] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , itwiki (24106)
[14:14:52] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , itwiki (23176)
[14:16:04] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100%
[14:17:07] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[14:18:35] :O
[14:21:10] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[14:25:10] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.040 second response time
[14:38:31] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[14:38:49] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100%
[14:39:16] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000
[14:44:31] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.53 ms
[14:48:52] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100%
[14:51:34] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms
[14:55:01] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[15:02:40] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.046 second response time
[15:17:04] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[15:21:16] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[15:25:55] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.038 second response time
[15:33:32] PROBLEM - Apache HTTP on mw8 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:41:11] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.036 second response time
[16:04:45] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100%
[16:04:53] PROBLEM - ps1-d1-sdtpa-infeed-load-tower-A-phase-Y on ps1-d1-sdtpa is CRITICAL: ps1-d1-sdtpa-infeed-load-tower-A-phase-Y CRITICAL - *2650*
[16:06:32] RECOVERY - ps1-d1-sdtpa-infeed-load-tower-A-phase-Y on ps1-d1-sdtpa is OK: ps1-d1-sdtpa-infeed-load-tower-A-phase-Y OK - 2013
[16:06:50] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[16:10:26] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[16:10:58] New patchset: Dereckson; "(bug 39905) Create interface editor group on pt.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22370
[16:11:54] New patchset: Dereckson; "(bug 39905) Create interface editor group on pt.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22370
[16:12:47] New review: Dereckson; "PS2: fixing commit message (Change-Id: were in the middle of the text)." [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/22370
[16:22:44] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.036 second response time
[16:27:41] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100%
[16:30:14] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.50 ms
[16:31:53] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours
[16:31:53] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours
[16:31:53] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[16:33:14] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[16:37:52] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[16:41:10] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100%
[16:44:01] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.42 ms
[16:59:28] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.040 second response time
[17:11:55] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[17:16:34] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.021 second response time
[17:33:49] PROBLEM - Apache HTTP on mw8 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:41:58] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.031 second response time
[18:03:07] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[18:03:07] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[18:09:13] New review: Hashar; "Thanks for the cleanup, we probably want to rebase this against latest version." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/8438
[18:40:10] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Puppet has not run in the last 10 hours
[18:42:07] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[18:54:35] Reedy: What's up with fenari, why won't git pull work?
[18:54:44] (re your jq 1.8.1 bugzilla reply)
[18:54:51] Same reason as it wouldn't earlier in the week
[18:54:57] MaxSem has a bad umask
[18:55:06] so no group write on some of the object folders/files
[18:55:08] I get public key warning, not user permissions
[18:55:30] oh, not anymore
[18:55:31] error: insufficient permission for adding an object to repository database .git/objects
[18:55:32] error: insufficient permission for adding an object to repository database .git/objects
[18:55:32] fatal: failed to write object
[18:55:32] fatal: unpack-objects failed
[18:55:33] indeed
[18:55:44] any roots online?
[18:55:49] Nope
[18:56:00] Seemingly none all day
[18:56:08] maybe we can replace git with a wrapper that disallows anything if umask is wrong
[18:56:20] Doesn't seem worth the effort
[18:56:23] can't we change the default mask somewhere?
[18:56:25] kick the people who have bad umasks
[18:56:33] I believe mutante did something in puppet for that
[18:57:19] this is going to happen again unless we either force everybody to change their bashrc (which is unlikely to happen or stay that way for all new people getting access) or fix the default.
[18:57:36] Twice in one week
[18:57:43] But yet it hasn't happened for a while
[18:57:51] Yeah, he has
[18:57:51] https://gerrit.wikimedia.org/r/#/c/22111/
[18:58:06] Just needs reviewing and merging
[19:00:04] and then it needs to be fixed retroactively
[19:00:28] Shouldn't really need to be any cleanup to be done
[19:01:46] well, obviously php-1.20wmf10 needs to be fixed
[19:01:52] nothing can be deployed right now, that's unacceptable
[19:02:11] If it had been an emergency, roots could be raised
[19:02:39] there's several people patrolling edits over the weekend that can't because of the bug in jquery 1.8
[19:02:47] (at least not using their favorite tools)
[19:02:53] which is fixed in 1.8.1
[19:02:56] (a regression that is)
[19:03:07] * Reedy shrug
[19:03:09] Shit happens
[19:04:49] Having said that, if it was really needed, it could be live hacked and fixed up later
[19:04:58] it's not like there's no way around it
[19:06:24] Burn MaxSem at the stake!
[19:07:36] * Damianz throws an apple at Brooke
[19:07:54] I could use an Apple.
[19:07:58] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.032 second response time
[19:08:43] I'm kicking https://bugzilla.wikimedia.org/show_bug.cgi?id=29902 a bit, trying to get some people to start committing some changes for it.
[19:49:13] Krinkle: Reedy: so what are the permissions exactly now?
[19:49:27] no group write
[19:49:34] but what else?
[19:49:35] drwxr-xr-x 2 maxsem wikidev 4096 2012-08-30 22:35 09
[19:49:52] hrmm
[19:50:02] the dir is called 09 ?
[19:50:14] there's 4 of them
[19:50:20] nothing has changed recently, it's just that people forget to make sure that /h/w/common needs to be writeable by wikidev and not exclusively owned and writable by the creator of a file (which is the linux default)
[19:50:25] jeremyb: git object internals
[19:50:32] Krinkle: right
[19:50:33] grouped in dirs by hash characters
[19:50:40] first two I think
[19:51:03] same way as we do image hashing in MW
[19:51:13] well, sort of
[19:51:23] x/y/xyfoobar
[19:55:09] so, it's already setgid but that's not good enough because it's not also g+w
[19:55:33] you could do a git wrapper pretty easily
[19:56:12] you could do a suid script to chmod. or a cron
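The workaround floated above ("a suid script to chmod, or a cron") amounts to periodically re-asserting group write on the shared tree. A minimal sketch of that idea, assuming "/h/w/common" expands to /home/wikipedia/common and that the job runs with enough privilege to chmod files owned by other deployers; the actual fix under review is the puppet umask change at https://gerrit.wikimedia.org/r/#/c/22111/, not this script:

    #!/usr/bin/env python3
    # Sketch of the "cron / suid script to chmod" idea from the discussion above:
    # re-assert group write (and setgid on directories) under the shared checkout
    # so one deployer's bad umask can't break "git pull" for the wikidev group.
    # The path is an assumed expansion of /h/w/common; this is NOT the puppet fix.

    import os
    import stat

    COMMON = '/home/wikipedia/common'  # assumption: /h/w/common

    for root, dirs, files in os.walk(COMMON):
        for name in dirs:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue
            mode = os.stat(path).st_mode
            # directories: group rwx plus setgid so new entries inherit the group
            os.chmod(path, mode | stat.S_IRGRP | stat.S_IWGRP | stat.S_IXGRP | stat.S_ISGID)
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue
            mode = os.stat(path).st_mode
            # files: group read/write is enough
            os.chmod(path, mode | stat.S_IRGRP | stat.S_IWGRP)

Either variant only repairs the damage after the fact; a git wrapper that refuses to run with a non-group-writable umask, as also suggested above, would prevent it instead.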
[21:33:14] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22295
[21:53:44] PROBLEM - Puppet freshness on snapshot1001 is CRITICAL: Puppet has not run in the last 10 hours
[21:54:09] Reedy: ping
[21:54:15] Hi
[22:05:28] New review: Reedy; "I can't remember if we need to set a default... (ie does it make noise when it's trying to export th..." [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/20876
[22:09:19] Reedy: http://en.wikipedia.org/w/index.php?title=User:Keelan717&diff=prev&oldid=510308755
[22:09:25] oops
[22:09:27] https://bugzilla.wikimedia.org/show_bug.cgi?id=39780
[22:09:32] wrong copy/paste
[22:12:30] Does it work for ipv4 addresses?
[22:12:36] yeah
[22:12:52] (according to the steward who reported it to me)
[22:13:59] I don't think that's a bug
[22:14:07] $6 has been changed in usage from the original message
[22:14:08] Your current IP address is $6.
[22:14:24] (that part is something else)
[22:14:38] (unrelated but causing users to face generic messages when globally blocked
[22:14:41] )
[22:14:59] What?
[22:15:08] basically I was filing two bugs in one
[22:15:36] The part beginning with "furthermore" is unrelated to the other part
[22:15:51] I can't see it working with ipv4 addresses either
[22:16:26] hhm...
[22:16:36] in either case, I think we should show the range
[22:16:45] as with the local Blockedtext
[22:17:23] which makes it an enhancement, not a bug
[22:17:43] the steward who told me to file it will be interested
[22:17:53] PROBLEM - Apache HTTP on mw8 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:19:16] Reedy: for the record, the /44 of that proxy used as an example is rangeblocked globally
[22:19:42] I think the same problem shows up for the generic message on enwiki
[22:20:06] http://www.1proxy.de/index.php?q=aHR0cDovL2VuLndpa2lwZWRpYS5vcmcvd2lraS9TcGVjaWFsOlVzZXJMb2dpbi9zaWdudXA%3D
[22:20:19] (same range, different wiki)
[22:21:33] still an enhancement, I guess
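The "show the range" enhancement asked for above is essentially: find which globally blocked range covers the visitor's address and put that range in the block message, as the local Blockedtext does. A rough sketch on invented data (the ranges and address below are hypothetical, and the real lookup belongs to the GlobalBlocking extension, which this does not reproduce):

    #!/usr/bin/env python3
    # Sketch of "show the range": report which blocked range covers an address,
    # rather than only saying the address is blocked. All data here is made up.

    import ipaddress

    BLOCKED_RANGES = [
        ipaddress.ip_network('2001:db8:1230::/44'),  # hypothetical IPv6 range block
        ipaddress.ip_network('192.0.2.0/24'),        # hypothetical IPv4 range block
    ]

    def blocking_range(ip_text):
        """Return the blocked range covering ip_text, or None if it is not blocked."""
        ip = ipaddress.ip_address(ip_text)
        for net in BLOCKED_RANGES:
            if ip in net:
                return net
        return None

    rng = blocking_range('2001:db8:1234::abcd')
    if rng is not None:
        print('Your IP address is inside the globally blocked range %s.' % rng)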
[22:32:44] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours
[22:32:44] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[22:32:44] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours
[22:32:44] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours
[22:32:44] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[22:32:45] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours
[22:32:45] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours
[22:32:46] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours
[22:32:46] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[22:49:50] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.054 second response time
[22:55:41] PROBLEM - Puppet freshness on ms-be6 is CRITICAL: Puppet has not run in the last 10 hours
[23:05:20] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[23:23:38] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , enwiki (42653)
[23:24:23] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , enwiki (42198)
[23:26:05] srsly.
[23:28:26] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.301 second response time
[23:36:59] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours