[00:00:08] PROBLEM - Puppet freshness on cp1024 is CRITICAL: Puppet has not run in the last 10 hours
[00:06:08] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[00:06:08] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[00:43:11] PROBLEM - Puppet freshness on cp1026 is CRITICAL: Puppet has not run in the last 10 hours
[00:43:11] PROBLEM - Puppet freshness on cp1027 is CRITICAL: Puppet has not run in the last 10 hours
[00:51:08] PROBLEM - Puppet freshness on cp1028 is CRITICAL: Puppet has not run in the last 10 hours
[01:26:58] New patchset: Tim Starling; "More rights for mediawikiwiki sysops" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22536
[01:27:26] Change merged: Tim Starling; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22536
[01:35:00] morning TimStarling
[01:35:10] hello
[01:40:47] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 252 seconds
[01:41:05] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 268 seconds
[01:47:14] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 638s
[01:56:14] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 4 seconds
[01:59:41] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 7s
[02:00:53] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 12 seconds
[02:50:14] RECOVERY - Puppet freshness on ms-be6 is OK: puppet ran at Tue Sep 4 02:50:05 UTC 2012
[02:51:28] !log powering down ms-be6, broken hardware
[02:51:36] Logged the message, Master
[02:54:26] PROBLEM - swift-account-reaper on ms-be6 is CRITICAL: Connection refused by host
[02:54:35] PROBLEM - swift-object-server on ms-be6 is CRITICAL: Connection refused by host
[02:54:44] PROBLEM - swift-object-updater on ms-be6 is CRITICAL: Connection refused by host
[02:54:44] PROBLEM - swift-account-auditor on ms-be6 is CRITICAL: Connection refused by host
[02:54:53] PROBLEM - swift-container-updater on ms-be6 is CRITICAL: Connection refused by host
[02:54:53] PROBLEM - SSH on ms-be6 is CRITICAL: Connection refused
[02:55:02] PROBLEM - swift-object-replicator on ms-be6 is CRITICAL: Connection refused by host
[02:55:02] PROBLEM - swift-account-server on ms-be6 is CRITICAL: Connection refused by host
[02:55:20] PROBLEM - swift-container-replicator on ms-be6 is CRITICAL: Connection refused by host
[02:55:29] PROBLEM - swift-container-auditor on ms-be6 is CRITICAL: Connection refused by host
[02:55:29] PROBLEM - swift-object-auditor on ms-be6 is CRITICAL: Connection refused by host
[02:55:47] PROBLEM - swift-account-replicator on ms-be6 is CRITICAL: Connection refused by host
[02:55:56] PROBLEM - swift-container-server on ms-be6 is CRITICAL: Connection refused by host
[03:17:50] RECOVERY - Puppet freshness on snapshot1001 is OK: puppet ran at Tue Sep 4 03:17:18 UTC 2012
[04:04:38] PROBLEM - Puppet freshness on cp1023 is CRITICAL: Puppet has not run in the last 10 hours
[04:09:44] PROBLEM - Puppet freshness on cp1022 is CRITICAL: Puppet has not run in the last 10 hours
[04:33:08] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100%
[04:34:11] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 2.94 ms
[04:35:41] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours
[04:35:41] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours
[04:35:41] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[04:35:41] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[04:35:41] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours
[04:35:42] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours
[04:35:42] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours
[04:35:43] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours
[04:35:43] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[04:38:23] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused
[05:05:23] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.034 second response time
[05:40:11] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[08:36:42] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours
[08:36:42] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours
[08:36:42] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[08:42:42] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[09:54:24] PROBLEM - Puppet freshness on cp1025 is CRITICAL: Puppet has not run in the last 10 hours
[10:01:18] PROBLEM - Puppet freshness on cp1024 is CRITICAL: Puppet has not run in the last 10 hours
[10:07:18] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[10:07:18] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[10:43:51] PROBLEM - Puppet freshness on cp1026 is CRITICAL: Puppet has not run in the last 10 hours
[10:43:51] PROBLEM - Puppet freshness on cp1027 is CRITICAL: Puppet has not run in the last 10 hours
[10:51:57] PROBLEM - Puppet freshness on cp1028 is CRITICAL: Puppet has not run in the last 10 hours
[11:58:41] paravoid: ping
[12:38:10] PROBLEM - Puppet freshness on mw74 is CRITICAL: Puppet has not run in the last 10 hours
[12:51:13] PROBLEM - Puppet freshness on ms-be6 is CRITICAL: Puppet has not run in the last 10 hours
[12:55:06] !log Pushed varnish 3.0.3plus-rc1 packages into the precise-wikimedia APT repository
[12:55:17] Logged the message, Master
[13:05:07] New patchset: Mark Bergsma; "Import debian/ dir from testing/persistent" [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/22545
[13:05:07] New patchset: Mark Bergsma; "varnish (3.0.3plus~rc1-wm1) precise; urgency=low" [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/22546
[13:07:17] Change merged: Mark Bergsma; [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/22545
[13:07:44] Change merged: Mark Bergsma; [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/22546
[13:20:37] RECOVERY - Varnish HTTP upload-backend on cp1021 is OK: HTTP OK HTTP/1.1 200 OK - 634 bytes in 0.053 seconds
[13:20:55] RECOVERY - Varnish HTTP upload-frontend on cp1021 is OK: HTTP OK HTTP/1.1 200 OK - 643 bytes in 0.053 seconds
[13:20:55] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[13:21:40] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000
[13:29:58] New patchset: Hashar; "(bug 38299) alias 'cmr10' font to 'Computer Modern'" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22533
[13:31:01] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22533
[13:34:58] New review: Hashar; "Patchset 2 enforce the "Roman" style for cmr10 alias:" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/22533
[13:35:10] font madness
[13:42:04] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:43:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.064 seconds
[13:47:14] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , zhwiki (66170)
[13:47:59] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , zhwiki (66291)
[13:52:20] refreshLinks2: 65399
[13:52:24] oh wonderbar
[13:53:06] hashar: it's alright, they'll OOM ;)
[14:01:46] I am not sure why there are soooo many of them
[14:04:04] Reedy: seems like the zhwiki has implemented wikidata using templates :-]
[14:04:15] ...
[14:04:15] Country_data_Japan Country_data_Spain ...
[14:04:19] I shouldn't be suprised...
[14:05:00] there are tons of duplicates too
[14:05:13] isn't job_namespace,job_title a primary key?
[14:05:23] PROBLEM - Puppet freshness on cp1023 is CRITICAL: Puppet has not run in the last 10 hours
[14:05:24] na its not
[14:05:25] hmm
[14:05:38] New patchset: Mark Bergsma; "Temporarily disable cp1029-1036" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22553
[14:06:30] New patchset: Mark Bergsma; "Send originals and thumbs/temps to Swift" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22554
[14:07:16] maybe they are just pilling up
[14:07:19] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22553
[14:07:19] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22554
[14:10:20] PROBLEM - Puppet freshness on cp1022 is CRITICAL: Puppet has not run in the last 10 hours
[14:11:54] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22553
[14:12:45] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22554
[14:14:49] New patchset: Demon; "Only set up SMTP for gerrit production host" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22217
[14:15:38] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22217
[14:17:59] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:25:47] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.027 seconds
[14:32:07] New review: Demon; "Not sure what this error means:" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/22215
[14:36:17] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours
[14:36:17] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours
[14:36:17] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours
[14:36:17] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[14:36:17] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[14:36:18] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours
[14:36:18] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours
[14:36:19] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours
[14:36:19] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[14:43:04] New patchset: Demon; "Puppetized gerrit replication config" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22215
[14:43:55] New review: Demon; "PS4 is based on testing I did on the labs install--worked as expected." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/22215
[14:43:56] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22215
[14:54:58] New patchset: Mark Bergsma; "Don't do changes on cp1029-1036 for now" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22565
[14:55:50] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22565
[14:56:06] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22565
[14:59:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:12:08] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.309 seconds
[15:16:51] RECOVERY - Puppet freshness on cp1028 is OK: puppet ran at Tue Sep 4 15:16:26 UTC 2012
[15:19:05] RECOVERY - Varnish HTTP upload-backend on cp1028 is OK: HTTP OK HTTP/1.1 200 OK - 634 bytes in 0.054 seconds
[15:20:17] RECOVERY - Varnish HTCP daemon on cp1028 is OK: PROCS OK: 1 process with UID = 997 (varnishhtcpd), args varnishhtcpd worker
[15:20:44] RECOVERY - Varnish HTTP upload-frontend on cp1028 is OK: HTTP OK HTTP/1.1 200 OK - 641 bytes in 0.061 seconds
[15:21:11] RECOVERY - Varnish traffic logger on cp1028 is OK: PROCS OK: 3 processes with command name varnishncsa
[15:27:20] RECOVERY - Puppet freshness on cp1027 is OK: puppet ran at Tue Sep 4 15:27:07 UTC 2012
[15:30:11] RECOVERY - Varnish HTCP daemon on cp1027 is OK: PROCS OK: 1 process with UID = 997 (varnishhtcpd), args varnishhtcpd worker
[15:30:29] RECOVERY - Varnish HTTP upload-backend on cp1027 is OK: HTTP OK HTTP/1.1 200 OK - 632 bytes in 0.054 seconds
[15:30:56] RECOVERY - Varnish HTTP upload-frontend on cp1027 is OK: HTTP OK HTTP/1.1 200 OK - 643 bytes in 0.053 seconds
[15:31:50] RECOVERY - Varnish traffic logger on cp1027 is OK: PROCS OK: 3 processes with command name varnishncsa
[15:36:29] RECOVERY - NTP on cp1028 is OK: NTP OK: Offset -0.05854594707 secs
[15:40:59] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[15:42:20] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Tue Sep 4 15:42:06 UTC 2012
[15:47:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:47:53] RECOVERY - NTP on cp1027 is OK: NTP OK: Offset -0.05533313751 secs
[16:00:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.043 seconds
[16:01:14] RECOVERY - Varnish HTCP daemon on cp1023 is OK: PROCS OK: 1 process with UID = 997 (varnishhtcpd), args varnishhtcpd worker
[16:01:14] RECOVERY - Varnish HTTP upload-backend on cp1023 is OK: HTTP OK HTTP/1.1 200 OK - 634 bytes in 0.056 seconds
[16:02:53] RECOVERY - Puppet freshness on cp1024 is OK: puppet ran at Tue Sep 4 16:02:21 UTC 2012
[16:03:29] RECOVERY - NTP on cp1023 is OK: NTP OK: Offset -0.04468381405 secs
[16:04:32] RECOVERY - Varnish HTCP daemon on cp1024 is OK: PROCS OK: 1 process with UID = 997 (varnishhtcpd), args varnishhtcpd worker
[16:06:02] RECOVERY - Varnish HTTP upload-backend on cp1024 is OK: HTTP OK HTTP/1.1 200 OK - 632 bytes in 0.054 seconds
[16:10:23] RECOVERY - Puppet freshness on cp1026 is OK: puppet ran at Tue Sep 4 16:10:15 UTC 2012
[16:12:38] RECOVERY - Varnish HTTP upload-backend on cp1026 is OK: HTTP OK HTTP/1.1 200 OK - 634 bytes in 0.982 seconds
[16:14:08] RECOVERY - Varnish HTCP daemon on cp1026 is OK: PROCS OK: 1 process with UID = 997 (varnishhtcpd), args varnishhtcpd worker
[16:21:56] RECOVERY - NTP on cp1024 is OK: NTP OK: Offset -0.04961526394 secs
[16:26:08] !log reimaging searchidx1001
[16:26:17] Logged the message, notpeter
[16:26:33] !log temp stopping puppet on brewster
[16:26:42] Logged the message, notpeter
[16:29:53] RECOVERY - NTP on cp1026 is OK: NTP OK: Offset -0.05717396736 secs
[16:30:11] PROBLEM - Host searchidx1001 is DOWN: PING CRITICAL - Packet loss = 100%
[16:32:53] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:35:53] RECOVERY - Host searchidx1001 is UP: PING OK - Packet loss = 0%, RTA = 26.51 ms
[16:39:47] PROBLEM - Lucene disk space on searchidx1001 is CRITICAL: Connection refused by host
[16:40:23] PROBLEM - SSH on searchidx1001 is CRITICAL: Connection refused
[16:40:38] Change abandoned: Hashar; "we use role::upload::cache on labs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16115
[16:41:34] Change abandoned: Demon; "Not going to get back to this soon -- abandoning for now. Can always restore if I change my mind abo..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21725
[16:46:10] New review: Hashar; "We need to rebuild the l10n cache. There is code for it hidden in scap, that need to be extracted ou..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/22116
[16:46:32] RECOVERY - SSH on searchidx1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[16:46:59] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.036 seconds
[16:58:14] PROBLEM - NTP on searchidx1001 is CRITICAL: NTP CRITICAL: No response from NTP server
[17:15:20] RECOVERY - Lucene disk space on searchidx1001 is OK: DISK OK
[17:21:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:22:44] !log search32 shutting down to replace processor
[17:22:53] Logged the message, Master
[17:29:26] New patchset: RobH; "adding helium and potassium as poolcounter hosts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22585
[17:30:18] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22585
[17:32:26] RECOVERY - NTP on searchidx1001 is OK: NTP OK: Offset -0.04581332207 secs
[17:35:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.018 seconds
[17:39:36] New review: RobH; "self review is the best review!" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/22585
[17:39:36] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22585
[17:41:04] !log pulled latest update of branch 1.20wmf10 on labsconsole
[17:41:13] Logged the message, Master
[17:41:17] Ryan_Lane: you working from home?
[17:41:21] yep
[17:41:25] cheater ;p
[17:41:34] I'll be coming in the rest of the week
[17:42:56] heh
[17:43:27] RECOVERY - Host search32 is UP: PING WARNING - Packet loss = 80%, RTA = 0.23 ms
[17:45:15] New patchset: RobH; "tweaking the partman file for helium install" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22587
[17:46:03] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22587
[17:46:27] PROBLEM - Host search32 is DOWN: PING CRITICAL - Packet loss = 100%
[17:47:57] RECOVERY - Host search32 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[17:49:45] New patchset: RobH; "tweaking the partman file for helium install, and netboot listing" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22587
[17:50:33] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22587
[17:50:55] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22587
[17:53:02] notpeter: search32….processor is in….ran a puppet update…we'll see
[17:53:47] cmjohnson1: sweet! thank you
[17:54:07] !log helium being reinstalled to poolcounter server per RT#3407
[17:54:16] Logged the message, RobH
[18:07:45] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:08:07] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22495
[18:09:09] Change merged: Demon; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16866
[18:20:05] anyone with RT access about? Can you tell me if there's a request for the mailing list rename of chaptercommittee-l to affcom?
[18:20:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.417 seconds
[18:21:24] mark: pong, a little late :)
[18:22:09] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[18:22:54] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000
[18:37:27] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours
[18:37:27] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[18:37:27] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours
[18:38:01] !log srv281 going down for troubleshooting
[18:38:10] Logged the message, Master
[18:43:27] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[18:54:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:57:15] New review: Umherirrender; "Nobody here, to get a +2 for this patch set? Is there a problem with this patch set, that there is n..." [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/21326
[19:00:35] New review: Dzahn; "I manually copied the config from patch set 3 to srv193 and tried to reload Apache." [operations/apache-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/13293
[19:08:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.027 seconds
[19:09:46] !log swift consistency check (heads of various objects) running from bastion1001 as ariel in screen session, about 10 HEADS/sec, if this starts to impact swift just shoot it
[19:09:55] Logged the message, Master
[19:11:39] New review: Reedy; "I'm still catching up from a week without internet access" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/21326
[19:35:44] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21061
[19:36:24] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , enwiki (25596), zhwiki (29899)
[19:37:45] PROBLEM - ps1-d1-sdtpa-infeed-load-tower-A-phase-Y on ps1-d1-sdtpa is CRITICAL: ps1-d1-sdtpa-infeed-load-tower-A-phase-Y CRITICAL - *2713*
[19:41:48] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:42:24] PROBLEM - ps1-d1-sdtpa-infeed-load-tower-A-phase-Y on ps1-d1-sdtpa is CRITICAL: ps1-d1-sdtpa-infeed-load-tower-A-phase-Y CRITICAL - *2588*
[19:43:54] RECOVERY - ps1-d1-sdtpa-infeed-load-tower-A-phase-Y on ps1-d1-sdtpa is OK: ps1-d1-sdtpa-infeed-load-tower-A-phase-Y OK - 2363
[19:51:15] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , enwiki (16751), zhwiki (32287)
[19:53:39] RECOVERY - Host srv281 is UP: PING OK - Packet loss = 0%, RTA = 0.39 ms
[19:55:27] PROBLEM - Puppet freshness on cp1025 is CRITICAL: Puppet has not run in the last 10 hours
[19:55:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.027 seconds
[19:55:43] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22484
[19:56:00] ottomata: ^
[19:56:45] danke!
[19:56:58] np
[19:57:04] something still isn't quite right with that, but I need rfaulkner to help me fix
[19:57:06] PROBLEM - Apache HTTP on srv281 is CRITICAL: Connection refused
[19:57:06] not sure what is up
[19:57:10] but that is all def needed
[19:57:11] so thanks
[19:57:23] alright, sure
[20:00:29] New patchset: Demon; "Make SSL unconditional for gerrit" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22619
[20:01:22] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22619
[20:04:21] New patchset: Demon; "Make SSL unconditional for gerrit" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22619
[20:05:07] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/22619
[20:05:37] New patchset: Demon; "Make SSL unconditional for gerrit" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22619
[20:06:06] RECOVERY - Apache HTTP on srv281 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.070 second response time
[20:06:26] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22619
[20:08:30] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[20:08:30] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[20:21:43] New patchset: Demon; "Make SSL unconditional for gerrit" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22619
[20:22:35] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/22619
[20:22:49] Hi. Have you touched the SUL databases? - My global unified login isn't working anymore on some wikis and I've not touched anything.
[20:23:05] New patchset: Demon; "Make SSL unconditional for gerrit" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22619
[20:23:16] What error are you getting?
[20:23:20] "you" is vague
[20:23:44] you = Dear Wikimedia Tech Team :)
[20:23:56] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/22619
[20:24:11] I can't log in anymore on, for example, bnwiki in spite of having a SUL there
[20:24:12] New review: Asher; "Why is WLM reliant on toolserver at all? If the api portion is being moved to wmf, why not the data..." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/17964
[20:24:41] Incorrect password message, that's the error I get.
[20:25:23] New patchset: Demon; "Make SSL unconditional for gerrit" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22619
[20:26:20] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22619
[20:27:18] When do you know you could last logon?
[20:27:35] Yesterday.
[20:27:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:30:51] brb, phone call
[20:31:18] PROBLEM - Lucene on search1021 is CRITICAL: Connection refused
[20:34:27] RECOVERY - Lucene on search1021 is OK: TCP OK - 0.027 second response time on port 8123
[20:37:54] !log srv291 shutting down for troubleshooting
[20:38:03] Logged the message, Master
[20:38:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.332 seconds
[20:42:17] New patchset: Pyoungmeister; "lucene.php: moving en search back to eqiad" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22662
[20:44:16] heya paravoid, you around? if so, I have a probably easy to answer .deb packaging q
[20:45:20] Change merged: Pyoungmeister; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22662
[20:45:34] !log moving en search back to eqiad
[20:45:44] Logged the message, notpeter
[20:50:12] ottomata: yes
[20:50:43] heya
[20:50:58] so I think i've got things pretty good for my libanon stuff, need to build .deb
[20:51:04] i want to build packages for both lucid and precise
[20:51:29] do I need to modify changelog and then build each? or just build each and then rename .deb files to include ~lucid or ~precise?
[20:51:37] paravoid: thanks for you reply to the syslog mail :)
[20:52:03] never ever rename .debs
[20:52:48] you should add stanzas in debian/changelog like 1.0-1~lucid1 with a message that says e.g. "Backport to lucid" and rebuild
"Backport to lucid" and rebuild [20:53:02] this will create proper source/binary packages [20:53:35] hm, ok cool [20:53:51] is it proper to commit that to the repository with my debian/ stuff? [20:54:18] what repository? [20:54:20] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22619 [20:54:54] ah paravoid, some good news about the swift backend, I'm running a consistency check (= head on all object sin a few containers), and did half of one container with everything looking very good, that's about 30k objects. [20:54:58] the git way of doing that is having a master branch (upstream code only), a debian branch (up to date debian packaging, frequent merges with master) and a lucid and a precise branch [20:55:11] ahh cool [20:55:18] hmmmmm [20:55:34] that does sound nicer [20:55:43] this is the repo [20:55:44] https://gerrit.wikimedia.org/r/gitweb?p=analytics/libanon.git;a=tree [20:55:44] do you use git-buildpackage? [20:55:49] no, just dpkg-buildpackage [20:56:07] well... :) git-bp is nice [20:56:13] handles the master/debian branches nicely [20:56:20] reading... [20:56:21] cool [20:56:34] btw, feel free to add me as a reviewer on such commits if you like [20:57:18] cool, will do! i would love if you could do a post review on the debian/ dir that is already there [20:58:08] looks good; have you seen dh? [20:58:14] you could use it for even simpler debian/rules [20:58:46] i think so…but not really, i've seen it do the default stuff, but, since this came with an autogen.sh script [20:58:55] i wanted to just call that in the configure step [20:59:02] can I still use dh to do that? [20:59:50] you need dh-autoreconf, which is a separate package [20:59:55] Reedy: You might want to have a look at https://gerrit.wikimedia.org/r/22534 those were your files (according to git blame) [20:59:56] New patchset: Demon; "Puppetized gerrit replication config" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22215 [21:00:01] but, yes, it handles this case [21:00:38] hm. [21:00:48] hoo: they were only "mine" so much as in I added them to the git repo [21:00:55] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22215 [21:01:15] Suspected that as they were both May this year :/ [21:01:25] Do you think they're safe to delete, though? [21:01:52] !log moving en search traffic back to pmtpa [21:02:01] New patchset: Demon; "Only set up SMTP for gerrit production host" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22217 [21:02:02] Logged the message, notpeter [21:02:49] New patchset: Pyoungmeister; "lucene.php: moving search traffic back to pmtpa" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22666 [21:02:50] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22217 [21:03:06] paravoid: still no wait-io on virt6-8 [21:03:19] Ryan_Lane: hmmmm, interesting. 
[21:00:38] hm.
[21:00:48] hoo: they were only "mine" so much as in I added them to the git repo
[21:00:55] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22215
[21:01:15] Suspected that as they were both May this year :/
[21:01:25] Do you think they're safe to delete, though?
[21:01:52] !log moving en search traffic back to pmtpa
[21:02:01] New patchset: Demon; "Only set up SMTP for gerrit production host" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22217
[21:02:02] Logged the message, notpeter
[21:02:49] New patchset: Pyoungmeister; "lucene.php: moving search traffic back to pmtpa" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22666
[21:02:50] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22217
[21:03:06] paravoid: still no wait-io on virt6-8
[21:03:19] Ryan_Lane: hmmmm, interesting.
[21:03:23] yep
[21:03:31] Change merged: Pyoungmeister; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22666
[21:03:34] I'm not going to complain, but I'd like to know why :)
[21:04:00] RECOVERY - Host srv266 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms
[21:05:10] notpeter: can throw srv266 back into the pool later today as well
[21:05:20] cmjohnson1: yep
[21:05:57] thx
[21:08:21] PROBLEM - Apache HTTP on srv266 is CRITICAL: Connection refused
[21:11:21] RECOVERY - Apache HTTP on srv266 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.348 second response time
[21:12:51] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:22:29] New patchset: Pyoungmeister; "Revert "Using log4j to log Lucene results to udp2log."" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22667
[21:23:17] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22667
[21:23:54] New review: Pyoungmeister; "for now..." [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/22667
[21:23:55] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22667
[21:24:27] I caught someone's changes
[21:24:30] pertaining to gerrit
[21:25:09] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.810 seconds
[21:26:01] TomDaley's
[21:26:07] notpeter: maybe related to ^demon's mail just now?
[21:26:23] ahh, yes, TomDaley == ^demon
[21:26:44] where just now is 15 mins back
[21:27:26] I don't see it... well, it's live now!
[21:27:30] hope that's a positive thing...
[21:28:05] * Damianz waits for prod gerrit to break :D
[21:28:27] notpeter: actually it was labs-l
[21:28:54] ah, gotcha
[21:29:40] New patchset: Ottomata; "SearchDaemon.java - moving null checking into logResults() and out of encode() method." [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/22668
[21:29:41] New patchset: preilly; "Orange Cameroon added new IP ranges" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22669
[21:30:32] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22669
[21:31:47] notpeter: can you merge this change set https://gerrit.wikimedia.org/r/#/c/22669/
[21:31:53] preilly: usre
[21:32:13] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22669
[21:32:21] preilly: need force puppet run/
[21:32:23] TomDaley: anything we should have in here that we currently dont? https://labsconsole.wikimedia.org/wiki/MediaWiki:Titleblacklist
[21:32:31] TomDaley: for instance, characters we shouldn't allow?
[21:32:34] notpeter: yes
[21:32:36] preilly: kk
[21:33:25] doing so now
[21:33:36] Ryan_Lane: gerrit2? :)
[21:33:43] Ryan_Lane: cron(d) ?
[21:34:00] is cron a user?
[21:34:21] TomDaley: gerrit2 is a user
[21:34:33] so is novaadmin
[21:34:39] I thought that was what we were blocking, not letting people register the name.
[21:34:58] it's already registered :)
[21:35:15] "User account "Gerrit2" is not registered."
[21:35:20] you already can't register names that are already registered
[21:35:25] huh
[21:35:34] https://labsconsole.wikimedia.org/wiki/User:Gerrit2 says no way
[21:35:46] Ryan_Lane: perfect timing for 2-factor
[21:35:58] tech days?
[21:35:59] yeah
[21:36:01] no
[21:36:03] gerrit-dev ldap login
[21:36:10] ah
[21:36:11] heh
[21:36:12] yeah
[21:36:22] root@gerrit-dev can basically escalate to anyone else
[21:36:27] yep
[21:36:36] to whoever's using it anyway (I didn't in fear of that)
[21:36:43] ugh, why are we using that for auth?
[21:36:44] not a huge fan of allowing apps to ldap auth in labs
[21:36:50] me neither
[21:36:56] we should really have another ldap server locally
[21:36:58] to test against
[21:37:05] Yes, please.
[21:37:23] and anyone can set their own ldap passwd in that other instance as much as they want (from the shell)
[21:37:37] can we also make ishmael/graphite use that new ldap too?
[21:37:45] eh?
[21:37:46] new ldap for what?
[21:37:49] graphite is in producton
[21:38:01] paravoid: an ldap on localhost on that one box
[21:38:14] so what. it's not a big deal if someone can break into graphite
[21:38:17] i think
[21:38:24] 04 21:36:56 < Ryan_Lane> we should really have another ldap server locally
[21:38:27] graphite already ldap auths
[21:38:31] locally = new ldap
[21:38:35] no no no
[21:38:40] ldap::self? :)
[21:39:00] I'm saying that apps that want to test ldap should each have their own ldap
[21:39:01] TomDaley: what's up with your blog? 404 ?
[21:39:16] I killed it a long time ago.
[21:39:21] Like...a year and a half ago?
[21:39:42] k. was just looking at the planet changes
[21:39:55] I put a comment on one of the changes saying I could be removed.
[21:41:45] Ryan_Lane: "adm", "shutdown" and "halt" might be users per http://refspecs.linuxbase.org/LSB_3.0.0/LSB-Core-generic/LSB-Core-generic/usernames.html
[21:42:02] mutante: do you have sysop on labsconsole?
[21:42:13] would be good if you could add names as you notice them
[21:42:13] yep
[21:42:30] TomDaley: we need to exclude certain characters right?
[21:42:38] Like?
[21:42:38] TomDaley: any clue which ones are disallowed by gerrit?
[21:42:50] Offhand, don't know of any.
[21:42:53] I know we're utf-8 safe.
[21:43:09] ok. we'll find out quickly enough when I enable self-registration monday
[21:43:18] did I mention we're coming out of beta on monday? :)
[21:43:18] lemme ask
[21:43:20] paravoid: ^^
[21:43:53] yep, I saw that
[21:44:13] I would prefer still calling it "beta" but still :)
[21:44:22] heh
[21:44:33] well, it's hard to call something beta when you allow anyone to sign up
[21:44:43] Gmail did for a long time.
[21:44:46] open beta, I guess?
[21:44:55] what TomDaley said :)
[21:44:56] I'd consider it stable enough right now to come out of beta
[21:45:03] it's very web 2.0 to call everything beta :>
[21:45:03] it's been in beta for like a year
[21:45:08] gamma?
[21:45:13] well, a year in october, anyway
[21:45:18] When it's beta, you can claim "it's beta" when it breaks :)
[21:45:21] Ryan_Lane: putting me out of a job?
[21:45:30] jeremyb: yep :)
[21:45:42] Someone can write a gadget to make it say beta again
[21:45:43] TomDaley: I can just say "it's labs"
[21:46:00] Reedy: Do we have gadgets on labsconsole?
[21:46:05] yes
[21:46:40] Ryan_Lane: Just think how much money minecraft made in 'alpha' ;)
[21:46:49] hwh
[21:46:51] err
[21:46:52] heh
[21:49:34] Titleblacklist sorted alphabetically and added a few
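As background for the Titleblacklist exchange above: that labsconsole page uses the TitleBlacklist extension's format of one regular expression per line, with flags in angle brackets and # starting comments. Entries covering the reserved names discussed here might look like the following; these lines are illustrative, not a copy of the real page.

```
# Service accounts mentioned above (gerrit2, novaadmin)
(Gerrit2|Novaadmin) <newaccountonly>
# LSB-reserved system user names (adm, shutdown, halt)
(Adm|Shutdown|Halt) <newaccountonly>
```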
[21:52:09] New patchset: Jeremyb; "add new GLAM-WIKI US blog to en planet" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22671
[21:52:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22671
[21:53:05] thanks
[21:55:07] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22671
[21:55:09] * jeremyb pokes Ryan_Lane with an ircecho
[21:55:15] ah
[21:55:16] wow, fast mutante ;)
[21:55:20] lemme do some reviews today
[21:55:57] jeremyb: i just happened to be in my inbox
[21:56:01] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/19786
[21:56:06] mutante: heh
[21:56:11] preilly: i also merged the ACL changes to Orange Niger
[21:56:17] on sockpuppet that is
[21:57:15] mutante: so it gets deployed on puppet run? or is there some cron job?
[21:57:37] * jeremyb can't remember how it worked with svn
[21:57:48] New patchset: Hashar; "extract l10n update from scap" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22673
[21:58:12] jeremyb: the URL will be added to config on puppet run, but then there is a cron at 0:00 UTC to actually run planet and use that config
[21:58:37] and.. that is just in the new planet ..on labs or zirconium so far..
[21:58:39] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22673
[21:58:45] need to run to bank.. bbl
[21:58:54] mutante: ahhh. but what about the list of feeds on the side of the page? that's also only once a day?
[21:59:14] and what about the old planet then?
[21:59:38] jeremyb: conflict :(
[21:59:53] Heh, totally knew that would conflict.
[21:59:54] mind rebasing?
[21:59:58] yeah, me too
[22:00:01] That hasn't been rebased since the gerrit.pp overhaul.
[22:00:19] Ryan_Lane: surely. just review for rubber stampability and then i'll rebase
[22:00:24] did
[22:00:28] k
[22:00:51] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:00:58] Ryan_Lane: which # exactly?
[22:01:18] https://gerrit.wikimedia.org/r/#/c/8344/
[22:01:29] ok
[22:01:32] jeremyb: eh yeah, old planet is still svn but svn should not be used if it can wait just a little while longer. we already synced the configs
[22:01:33] Ryan_Lane: can you do 8120 too?
[22:01:35] jeremyb: Actually, we can test out https://gerrit.wikimedia.org/r/#/c/11589/ on labs now too if we want.
[22:01:40] looking at it no
[22:01:41] now
[22:02:03] jeremyb: currently the thing is getting translations for the menu for the non-English languages.. need to run right now though
[22:02:09] TomDaley: oh that. i saw you comment on google code today ;)
[22:02:19] Yeah, somebody brought it up.
[22:02:19] Again
[22:02:25] mutante: bank!
[22:02:45] I'll be honest
[22:02:52] I have no clue what 8120 is doing
[22:03:03] New review: Hashar; "Will need https://gerrit.wikimedia.org/r/22673 to update the l10n cache." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/22116
[22:03:06] didn't you say that before ;/
[22:03:30] didn't it already support multiple files per project?
[22:03:31] iirc TomDaley reviewed that one
[22:03:57] I never actually *tested* it.
[22:04:09] every time the hooks get changed they break and I spend 30 mins to an hour fixing them
[22:04:13] every single time
[22:04:17] right, just reviewed
[22:04:25] do we have hooks working in labs?
[22:04:32] Well, they fire.
[22:04:37] But we don't have ircecho enabled on labs
[22:04:51] TomDaley: [commentlinks] so, i did already test it myself in labs and gave up on it because i couldn't get it to work the way i wanted. maybe i need to try harder or maybe we need to wait for upstream to fix it.
[22:04:55] sure but we could
[22:05:07] Ryan_Lane: i could test in labs if you like
[22:05:16] iswildcard = lambda string: "*" in string
[22:05:18] ^^??
[22:05:21] wtf?
[22:05:30] where is string being defined?
[22:05:54] wtf is match doing?
[22:06:53] all this so that we can support wildcards in the names?
[22:07:00] why don't we just explicitly name the files?
[22:07:21] i think it was wildcards for project names
[22:07:27] ahhhh
[22:07:27] don't remember for sure
[22:07:27] ok
[22:07:33] that makes sense, then
[22:07:36] !log adding srv266 and srv281 back to apaches pool (to see if they break...)
[22:07:45] Logged the message, notpeter
[22:08:17] Ryan_Lane: i don't remember the code that well (and haven't opened it recently) but in the excerpt you pasted 'string' is the only parameter for a function called iswildcard. iswildcard('foo') would be false and iswildcard('foo*') would be true
[22:08:47] https://gerrit.wikimedia.org/r/#/c/8120/6/files/gerrit/hooks/hookhelper.py,unified
[22:09:17] ah. crap
[22:09:19] ignore me
[22:09:23] I'm reading that porly
[22:09:25] poorly
[22:09:44] this is why I hate lambdas. this code is really not easy to read
[22:10:00] it's all lambdas and regexes
[22:10:03] You know...come 2.5 all this crap should be plugins anyway.
[22:10:09] Move IRC to a plugin.
[22:10:20] that doesn't make it any easier to manage
[22:11:50] and things like this: logs = isinstance(logs, basestring) and [logs] or logs
[22:11:52] anyway, i certainly would be happy to have it tested and i'm willing to do it. just don't know if i can do it today
[22:12:27] what does that line even do?
[22:12:47] 133, 134 and 135 make no sense to me
[22:13:04] doesn't 134 duck type to a boolean?
[22:13:28] which means it isn't iterable
[22:14:48] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.027 seconds
[22:18:49] when I tell my friends I know some basic regex they stars at me
[22:18:56] "you are such a hacker" ...
[22:19:17] I should run a regex 101 one day
[22:20:03] is: "logs = isinstance(logs, basestring) and [logs] or logs" like "if blah ? something : somethingelse"
[22:20:04] ?
[22:20:05] hashar: everyone knows that 99% of hackers use grep
[22:20:11] jeremyb: ?
[22:20:47] Ryan_Lane: sorry, IRL distraction. back in a bit
[22:20:49] PROBLEM - poolcounter on helium is CRITICAL: PROCS CRITICAL: 0 processes with command name poolcounterd
[22:20:49] I really don't mind the extra lines of code that would make that easier to read
[22:23:29] Ryan_Lane: That's confusing also because there are no parentheses indicating the priority of and relative to or
[22:23:37] I mean you can look that stuff up, but it's often unclear when you read it
[22:23:42] exactly
[22:23:52] I'm asking for it to be changed in the review
[22:24:19] Generally, in languages like these, "foo or bar" means foo ? foo : bar and "foo and bar" means foo ? bar : false
[22:24:31] not a fan
[22:24:44] that one liner can be written more simply:
[22:24:45] if isinstance(logs, basestring):
[22:24:45] logs = [logs]
[22:24:46] JS has it and I really like being able to do index = index || 0;
[22:25:14] But doing it with and is rare, and mixing and and or like that without parentheses is a bit too weird for me
[22:25:32] my two liner does the same thing and is way clearer
[22:25:37] Yeah
[22:26:05] the original syntax can be mistaken to think you're duck typing to a boolean
[22:26:08] It's barely more code, too
[22:26:28] Actually, discounting whitespace it's actually less code
[22:26:34] yeah
[22:27:32] New review: Ryan Lane; "I'd like some of the syntax to be simplified. The logic looks fine, though." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/8120
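To make the hookhelper.py complaints concrete, here is the idiom under review next to the two-line rewrite Ryan_Lane asks for, as a standalone Python 2 sketch (basestring is Python 2 only; the normalize_logs function names are invented for illustration):

```python
# The one-liner under review: "a and b or c" yields b when a is truthy,
# else c. It works here only because [logs] can never be falsy, and at
# first glance it reads like a duck-typed boolean expression.
def normalize_logs_old(logs):
    return isinstance(logs, basestring) and [logs] or logs

# The suggested rewrite: same behavior, obvious intent.
def normalize_logs_new(logs):
    if isinstance(logs, basestring):
        logs = [logs]
    return logs

# The lambda that prompted "where is string being defined?": 'string' is
# simply the lambda's parameter name, not the stdlib module.
iswildcard = lambda string: "*" in string

assert normalize_logs_old("x") == normalize_logs_new("x") == ["x"]
assert normalize_logs_old(["x", "y"]) == normalize_logs_new(["x", "y"])
assert iswildcard("foo*") and not iswildcard("foo")
```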
[22:28:01] RoanKattouw: still need this? https://gerrit.wikimedia.org/r/#/c/10669/
[22:28:19] I don't actually know
[22:28:19] oh
[22:28:20] ori-l-away: ?
[22:28:26] you have a review already
[22:28:31] Right
[22:28:42] I also submitted it for someone else (Ori) a long time ago and I don't know if he still needs it
[22:28:45] https://gerrit.wikimedia.org/r/#/c/11979/ ….?
[22:28:52] is this still waiting on someone's approval?
[22:29:11] yeah
[22:29:13] waiting ct's
[22:31:12] Yeah
[22:31:14] Ryan_Lane: does CT not have a gerrit account?
[22:31:25] we should add him as reviewer to shell requests like that, but i tried and no dice.
[22:31:34] yeah
[22:33:01] * AaronSchulz reminds tim about ScanSet :)
[22:33:59] what about it?
[22:34:20] Ryan_Lane, RoanKattouw, I usually send a private IRC message to woosters with th RT ticket URL and then it gets approved real fast.
[22:35:18] CT is aware of this ticket
[22:35:18] TimStarling: is it staying on nfs?
[22:35:29] * AaronSchulz is aware of Roan
[22:35:35] It's caught up in weird-ass internal politics about how Peter isn't technically in engineering
[22:36:42] in that case, we should retroactively make THREE gerrit accounts for each person who's been left out. so they can catch up.
[22:36:46] well, paravoid was telling me that he doesn't want anything to stay on NFS
[22:37:04] ideally
[22:37:19] I would have thought it would be easier to leave EasyTimeline, ExtensionDistributor and ScanSet on NFS for now
[22:37:32] I was saying that we want to move as much upload as possible to swift then see what to do about the rest
[22:37:38] since the required resources would be negligible
[22:39:25] PROBLEM - Puppet freshness on mw74 is CRITICAL: Puppet has not run in the last 10 hours
[22:39:26] TimStarling: Got a second?
[22:40:59] TimStarling: well, in any case, I'd say let's separate content that we upload vs. content that users upload; isn't timeline something that users do?
[22:41:25] (separate it because if we opt in keeping it in NFS, we might adjust our processes for replicating the content, e.g. git)
[22:41:53] timeline is the same as math, it has images generated by the parser based on user input
[22:42:09] ExtensionDistributor also needs to be written to on user requests
[22:42:12] aha
[22:42:17] ScanSet is the only one that's truly static
[22:42:32] so, if we're putting math on swift, why not put timeline too?
[22:42:52] we can, with a config change
[22:43:09] !log running heads on commons d/d7 on bast1001 as ariel from screen, data consistency check
[22:43:17] Logged the message, Master
[22:43:20] TimStarling, on the OTRS login screen there's a section called "News" which was last updated in Jan 2009, an OTRS admin told me that you need to have db access to modify this, are there instructions somewhere that others can refer to to be able to add to it if an RT/bugzilla request is opened? I ask you because you appear to be the one that patched in 3 years ago
[22:43:48] Thehelpfulone: actually it's a file, it's not in the DB
[22:44:18] Ah okay - who can edit it?
[22:44:42] any root
[22:44:58] bz needs puppetization so anyone can do it ;_;
[22:45:18] And would they know where it is?
[22:45:22] paravoid: want to schedule math/timeline windows?
[22:45:31] didn't Tim just say that we should keep timeline on NFS?
[22:45:39] wikitech.wikimedia.org says Source is in /opt/otrs
[22:45:49] I would love it not to be on nfs in the mid term
[22:45:57] he said it would be "easier" ;)
[22:46:26] true :)
[22:46:29] paravoid: the code is already done (math already uses the abstractions)
[22:46:30] well, if the code is already written then it's not easier
[22:46:39] same with timeline
[22:47:07] okay, great
[22:47:22] TimStarling: depending on cache hit rate, I guess we might want to pre-copy the files for good measure
[22:47:24] so, how does this work? are you going to do an initial sync of the content first?
[22:47:32] though there are already scripts for that
[22:47:36] could someone do a labs user creation for me? I *think* it's not done yet. not sure if i'm reading ldaplist properly
[22:47:40] svn/labs shell/git name/MW user: akshay , email akshay.leadindia at gmail dotcom
[22:47:49] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:47:54] math is generated from the wikitext, how does timeline work (i.e.what is it)? I ask in the case that we move to swift and then wind up having ot move back to ms7 cause too many servers crap out
[22:48:00] hopefully not but... it's possible
[22:48:00] maybe TomDaley
[22:48:12] apergos: with timeline, lol
[22:48:17] Thehelpfulone: /opt/otrs/Kernel/Output/HTML/Standard/Motd.dtl
[22:48:27] Thanks
[22:48:52] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000
[22:49:01] is it something that will be regenerated from wikitext if the content is not found?
[22:49:12] it's also in patches/30-wm-brand.patch
[22:49:34] AaronSchulz: the window is for switching the config, right? are you planning to do an initial sync or to regenerate everything as we go?
[22:49:36] TimStarling: do you know which revision/tag/release we're on for OTRS?
[22:49:51] apergos: it should regen, same with math
[22:49:55] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[22:49:57] ok great
[22:50:00] Thehelpfulone: heh, TimStarling just said where it was
[22:50:06] but i was curious so i found it right when he did
[22:50:09] paravoid: I don't know what the cache hit rate is
[22:50:11] you need it changed?
[22:50:22] RobH, yes but the wording is yet to be decided
[22:50:28] paravoid: I can always copy to be safe
[22:50:34] ok, well, I know where it is now, so I am happy to help
[22:50:40] those copies shouldn't take very long
[22:50:43] relatively speaking
[22:50:48] Ryan_Lane: so i have some backscroll to read (and have to reread the patch) but I think maybe I get what you want for 8120. i'll work on that and on the rebase in a couple hrs
[22:50:52] jeremyb: a heavily patched version of a 2.4 pre-release development version, see /opt/otrs/README.wikimedia
[22:50:57] great, I'll let you know when we have a message
[22:51:02] jeremyb: cool, thanks
[22:51:09] RobH: If you're in the mood for something OTRSey, RT # 3515 should only take a minute :-)
[22:51:14] Ryan_Lane: do you want to do that labs account above?
[22:51:17] they told me that my security patch was accepted, that's where most of the patching was
[22:51:28] which labs account?
[22:51:31] * 3513
[22:51:47] Ryan_Lane: 22:47:40 UTC. so 4 mins ago
[22:51:57] in this channel
[22:52:13] which user name?
[22:52:21] 04 22:47:40 < jeremyb> svn/labs shell/git name/MW user: akshay , email akshay.leadindia at gmail dotcom
[22:52:28] PROBLEM - Puppet freshness on ms-be6 is CRITICAL: Puppet has not run in the last 10 hours
[22:52:46] it's one that already has an svn account?
[22:53:07] I don't see a request for it on the developer access page
[22:53:26] TimStarling, jeremy asks because https://bugzilla.wikimedia.org/show_bug.cgi?id=22622#c24 - there are some XSS vulnerabilities pre 2.4.14
[22:53:45] jeremyb: all of the names should be akshay?
[22:53:46] Ryan_Lane: yeah. i'm not certain I'm reading ldaplist right but it's not in [[special:listusers]]
[22:53:55] Ryan_Lane: yes
[22:54:01] RD, is this about the news item update or something else (I was just going to send an email to the otrs mailing list)
[22:54:03] AaronSchulz: something like 95% misses it seems. that sounds wrong...
[22:54:07] Thehelpfulone: So in case you have to ask another root to do this, the actual MOTD file is located on the OTRS server (williams) in /opt/otrs/Kernel/Output/HTML/Standard/Motd.dtl
[22:54:15] Thehelpfulone: Nope, something different
[22:54:20] the patch tim listed references that file for content
[22:54:27] paravoid: lets like at timeline first
[22:54:41] yep I figured that out :)
[22:54:47] Ryan_Lane: also there's 2 different people named (as in birth cert) akshay!
[22:54:59] what do you mean?
[22:55:03] timeline is all misses
[22:55:26] high or low rate?
[22:55:33] *request rate
[22:55:35] 0% hit rate
[22:55:39] jeremyb: ?
[22:55:44] :-D
[22:55:51] Ryan_Lane: just they both exist. that confused me for a bit.
[22:56:05] I still don't understand what you mean
[22:56:09] paravoid: I was talking about qps
[22:56:12] Change merged: Pyoungmeister; [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/22668
[22:56:18] are you saying that this svn user may not be the same person?
[22:56:22] are there fucked up accounts?
[22:56:28] please be specific when you say stuff like that
[22:56:29] Ryan_Lane: no, i think the accounts are fine
[22:56:35] 0% hit rate but only a few gets/sec is fine too
[22:56:35] Thehelpfulone: looks pretty easy to fix
[22:56:43] so I still have no clue what you mean
[22:56:47] Ryan_Lane: just saying there's a kaldari and a lane, so there's also 2 akshays
[22:56:48] not really my job though
[22:57:01] that still makes no sense to me
[22:57:09] TimStarling: I thought everything was your job ;)
[22:57:16] TimStarling, whose job is it in engineering so I can nicely ask them? :)
[22:57:16] AaronSchulz: timeline's qps is almost negligible
[22:57:23] figures
[22:57:32] jeremyb: this is just an observation that two different people have the same last name?
[22:58:09] math is more, but still just a few qps
[22:58:17] Ryan_Lane: it's a first name i think. but sure. was just a warning because i recognized the name because i'd seen the other akshay before and i had them mixed up at first
[22:58:33] ok
[22:58:38] you could ask ops but they will take 6 months and upgrade the server to precise as a prerequisite ;)
[22:58:38] well, that request is done
[22:58:45] no clue how to tell the user that, though
[22:58:46] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.793 seconds
[22:58:54] paravoid: ok, sounds simple then
[22:59:01] Ryan_Lane: he's on irc, i'll tell him
[22:59:10] Ryan_Lane: i see the change in ldaplist
[22:59:15] danke
[22:59:28] TimStarling, ask RD, we've been waiting for years :P
[22:59:40] yw
[22:59:56] And I was told today it is "not on the roadmap" :P Not promising
[22:59:58] Shrug
[23:00:02] told him
[23:00:15] That's why I ask for little things like database queries in the meantime :P
[23:00:18] for upgrade to 3.0 you mean?
[23:00:43] to fix that security advisory all you have to do is change two files, they tell you which two files to change and what to change them to on http://www.otrs.com/de/open-source/community-news/security-advisories/security-advisory-2012-02/
[23:00:53] Thehelpfulone: TimStarling: actually i didn't really ask because of the bug. it was more of a "how to reproduce the same state that's currently in prod using labs" thing
[23:01:01] (or using some other staging env)
[23:01:56] paravoid: well the script is still needed, since they re-render on parse, not view
[23:02:06] AaronSchulz: right
[23:02:10] AaronSchulz: I was just thinking about that
[23:03:01] so, 1. make MW write on both 2. run the sync script 3. switch squids
[23:03:45] yeah
[23:04:37] and just to clarify, you'll sync only /math, not /wikipedia/en/math, right?
[23:04:51] oh yeah, since we were getting requests for those....
[23:05:19] paravoid: well we'd do timeline first...but yes, for math, just /math
[23:05:22] we're getting requests for both, /math is the path iirc
[23:05:35] the new*
[23:06:01] yeah we switched back to just math/ a while back...though some cached versions of pages may still refer to site/lang/math for a while...
[23:06:27] but timeline is going to stay in site/lang/timeline, correct?
[23:06:39] yep
[23:06:45] great
[23:06:51] some still do, I am looking at the live requests right now
[23:06:57] since our squid cache will just affect /math, we should be good there
[23:07:06] GET /wikipedia/es/math/...
[23:07:20] the site/lang ones would still hit nfs
[23:07:21] apergos: cached pages
[23:07:26] uh huh
[23:07:35] we'll leave ms7 up until those hits gradually vanish
[23:07:36] so how long before those pages fall out of the cache?
[23:07:52] 30 days since that change?
[23:07:52] we could even rewrite them if needed
[23:08:11] I presume the hash on the right-hand side remained the same?
[23:08:46] yes
[23:08:53] RD: that sql query doesnt work per 3513
[23:09:05] hash is a function of the mathml text (to be rendered) and options
[23:09:08] ok, well having the rewrite in our back pocket is good, because ms7 does have limited space
[23:09:34] I dont' think we'll hit the wall but if performance turns out to be an issue before then, we might have to hustle
[23:10:03] (*cough* zfs issues *cough*)
[23:10:44] RobH: Damn it! Ok, thanks
[23:10:55] there is system_address
[23:11:07] but the sql query joins system_user, which is a non-existent table
[23:11:16] there is also role_user
[23:12:12] i updated the rt with details.
[23:12:53] what prep work are we going to need? I'm thinking from the ops side, none really
[23:13:09] AaronSchulz: so, squid switch is trivial; (1) and (2) above sound like your territory, so I'd say schedule it as you please and just ping us?
[23:13:13] New patchset: Aaron Schulz; "Write timeline files to swift and NFS." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22683
[23:13:18] hah
[23:14:19] New review: Ryan Lane; "hostname needs to be changed into an array called hostnames." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/15561
[23:14:23] RoanKattouw_away: ^^
[23:18:34] PROBLEM - Host srv281 is DOWN: PING CRITICAL - Packet loss = 100%
[23:19:20] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17640
[23:21:14] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/21895
[23:22:32] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22111
[23:23:31] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22215
[23:24:10] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22217
[23:25:43] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/21326
[23:26:35] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/21475
[23:27:22] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/21561
[23:27:49] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22683
[23:27:50] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/19230
[23:32:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:32:53] New review: Thehelpfulone; "So did we delete sep11 wiki back in 2003?" [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/22534
[23:34:50] correct me if I mis understand, but doesn't "diff all unified" mean that gerrit should only open one diff?
[23:35:08] lol
[23:35:13] at least I thought that's what it did the last time I pressed it? :P
[23:35:34] it never has on gerrit ;)
[23:35:35] I think it's always been horribly broken
[23:36:41] well it is the "unified diff" of "all files"
[23:37:16] more broken is looking at a diff where a rebase happened in the middle
[23:37:36] so how many browser tabs would it open?
[23:41:19] see now https://codereview.qt-project.org/#patch,all,26062,9 is nice
[23:43:13] ugh, gerrit uses a 1024 bit host ssh key? how did i never notice that before?
[23:43:16] ;-(
[23:43:42] * jeremyb runs away
[23:45:07] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.587 seconds
[23:46:46] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , zhwiki (43815)
[23:47:49] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , enwikisource (25332), zhwiki (43427)
[23:55:42] AaronSchulz: so?
[23:56:04] did we agree on the plan/deploy windows?
[23:56:52] paravoid: you mean me pinging you? :)
[23:57:37] Planning? Pfft.
[23:57:48] I guess when the script is done we can do (3)
[23:59:28] Someone thinks their wiki SUL password is being cracked. What shall I tell them?
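A closing note on the math/timeline plan sketched at 23:03 (make MW write to both stores, run the sync script, switch the squids): step 1 is what AaronSchulz's merged change 22683 ("Write timeline files to swift and NFS") does on the MediaWiki side. A double-write setup of that era, using MediaWiki 1.20's FileBackendMultiWrite class, has roughly the shape below. This is an illustrative sketch, not the actual wmf-config; the backend names, paths, and credential variables are all placeholders.

```php
<?php
// Rough sketch of a double-write file backend (placeholder values only).
$wgFileBackends[] = array(
	'name'        => 'timeline-multiwrite',
	'class'       => 'FileBackendMultiWrite',
	'lockManager' => 'nullLockManager',
	'masterIndex' => 0, // NFS stays authoritative while cached hits drain off ms7
	'backends'    => array(
		array(
			'name'     => 'timeline-nfs',
			'class'    => 'FSFileBackend',
			'basePath' => '/mnt/upload6/timeline', // placeholder path
		),
		array(
			'name'         => 'timeline-swift',
			'class'        => 'SwiftFileBackend',
			'swiftAuthUrl' => $swiftAuthUrl, // placeholder credentials
			'swiftUser'    => $swiftUser,
			'swiftKey'     => $swiftKey,
		),
	),
);
```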