[00:00:14] jeremyb: today I'm working on self-registration
[00:00:18] for labsconsole
[00:00:29] I promise I'll do reviews monday
[00:00:33] Ryan_Lane: i saw something about that
[00:00:38] Ryan_Lane: you're working monday?
[00:00:43] oh
[00:00:45] tuesday
[00:00:50] that's better ;)
[00:00:59] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:01:18] https://labsconsole.wikimedia.org/wiki/Help:Contents#Requesting_Shell_Access
[00:01:19] :)
[00:01:23] Ryan_Lane: Grr, looks like the LDAP certs moved
[00:01:36] yep
[00:01:42] LDAP certs? as in TLS for LDAP?
[00:01:43] things in openstack changed too
[00:01:51] he means in the manifesta
[00:01:53] manifests
[00:01:57] they are in the role classes now
[00:02:31] oh, he's talking about the rebase i guess
[00:03:04] Ryan_Lane: what about that already has svn but wants a rename one?
[00:04:21] maybe we can hide the completed flag? or do i need to try logged out
[00:05:06] completed?
[00:05:17] completed means the admin has completed it
[00:05:19] Yeah, I found them
[00:05:40] I added hostname => $fqdn, port => 636 to the install_certificate calls in the role file, and removed the cert monitoring from manifests/ldap.pp
[00:05:50] * Ryan_Lane nods
[00:06:25] Submitting rebased version
[00:06:27] New patchset: Catrope; "Clean up the mess that is SSL certificate installation" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/15561
[00:07:07] Ryan_Lane: but when submitting a new one you don't need to see the box. all new requests are by definition not completed
[00:07:13] ah
[00:07:14] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/15561
[00:07:14] right
[00:07:16] it can be hidden
[00:07:18] yeah. lemme do that
[00:08:11] hm
[00:08:23] it needs to be hidden on creation only
[00:13:11] hm
[00:13:33] I guess I could have one form for creating and another for editing
[00:15:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.098 seconds
[00:15:26] yep
[00:15:27] that works
[00:16:52] much simpler process now
[00:18:01] * Damianz looks at ryan having a converation with himself and just yays at self registration
[00:18:27] I'm basically asking for feedback ;)
[00:19:52] I don't really see why you'd need to edit it tbh
[00:21:22] Like once marked as complete being able to edit the "justification" seems wrong somehow, granted it's just editing a page so not really stopable anyway
[00:21:49] the user may want to edit her/her own justification
[00:22:11] and they are just screwing themselves if they mark it comple
[00:22:13] complete
[00:23:29] her/her, no guys allowed
[00:23:52] And yeah, form wise it looks ok tbh
[00:27:14] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours
[00:27:14] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[00:27:14] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours
[00:27:31] abartov: the ticket's in the queue now (or should be)
[00:33:14] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[00:47:56] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:48:23] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , zhwiki (10680)
[00:48:40] zhwiki again...
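The exchange above settles on hiding the "completed" flag on the creation form only, since new requests are by definition not completed, while the edit form still shows it for admins. A minimal sketch of that rule follows; the function and field names are hypothetical illustrations, not taken from the actual labsconsole code:

```python
def form_fields(base_fields, is_new_request):
    """Return the fields to render for a shell-access request form.

    New requests are by definition not completed, so the 'completed'
    flag (set later by an admin) is hidden on the creation form and
    only shown when editing an existing request.
    """
    if is_new_request:
        return [f for f in base_fields if f != "completed"]
    return list(base_fields)
```

This keeps a single field list while varying its presentation, which is in the spirit of the "one form for creating and another for editing" idea discussed above.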
[01:00:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.650 seconds
[01:03:59] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[01:34:45] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:38:39] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[01:39:23] jeremyb: thanks!
[01:41:57] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 258 seconds
[01:41:57] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 260 seconds
[01:47:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.379 seconds
[01:47:39] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 601s
[01:52:54] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , zhwiki (35844)
[01:53:21] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , zhwiki (35676)
[01:55:45] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 24 seconds
[01:58:27] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 6s
[01:58:45] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 23 seconds
[01:59:21] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[01:59:21] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[02:22:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.364 seconds
[02:47:35] New patchset: Catrope; "(bug 39877) Set X-Frame-Options: SAMEORIGIN if UploadWizard enabled" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22290
[02:48:03] Change merged: Catrope; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22290
[02:51:24] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.046 second response time
[03:25:00] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[03:40:09] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.034 second response time
[03:48:06] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[04:09:15] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.043 second response time
[04:32:32] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[04:47:32] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time
[05:27:37] PROBLEM - MySQL Slave Delay on db1042 is CRITICAL: CRIT replication delay 181 seconds
[05:29:07] RECOVERY - MySQL Slave Delay on db1042 is OK: OK replication delay 0 seconds
[05:39:01] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[05:48:01] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.026 second response time
[05:50:16] PROBLEM - MySQL Replication Heartbeat on db1042 is CRITICAL: CRIT replication delay 189 seconds
[05:50:43] PROBLEM - MySQL Slave Delay on db1042 is CRITICAL: CRIT replication delay 213 seconds
[05:56:34] RECOVERY - MySQL Replication Heartbeat on db1042 is OK: OK replication delay 0 seconds
[05:57:01] RECOVERY - MySQL Slave Delay on db1042 is OK: OK replication delay 0 seconds
[06:27:57] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours
[06:27:57] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[06:27:57] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours
[06:27:57] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours
[06:27:57] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours
[06:27:58] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours
[06:27:58] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours
[06:27:59] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[06:27:59] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[07:33:08] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[09:13:23] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[09:36:29] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[09:36:38] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[09:37:14] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000
[09:49:41] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100%
[09:50:53] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.85 ms
[10:24:52] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.031 second response time
[10:28:01] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[10:28:01] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours
[10:28:01] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours
[10:34:01] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[10:39:08] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[10:43:37] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.044 second response time
[12:00:48] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[12:00:48] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[12:09:46] New patchset: Alex Monk; "(bug 39878) Add dewikiversity and ltwiki to betawikiversity import sources." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22295
[12:24:16] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[12:37:10] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Puppet has not run in the last 10 hours
[12:53:22] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.037 second response time
[13:14:16] abartov: you should have the same email i got, you're all set
[13:18:42] i guess maybe Thehelpfulone was gone then? too bad no one caught and fixed my "only roots" ;/
[13:19:51] (it definitely was only sometime in the last 6 months - 3 years!)
[13:53:08] jeremyb, hmm?
[14:00:47] 31 23:53:58 < abartov> who has the power to do this?
[14:00:47] 31 23:55:49 < jeremyb> abartov: only roots
[14:01:12] Thehelpfulone: obviously I was wrong. just was kinda surprised no one told me so here ;)
[14:01:28] ah
[14:01:45] yeah the site admin password can allow you to get into the admin intereface for any mailing list - and thus you can reset the password
[14:02:03] yeah, i knew that. just obviously didn't know who had the passwd
[14:02:20] i've had some mailman instances where i was the only admin ;)
[14:02:40] other than ops, there's only one person that I know that has it
[14:03:20] and i didn't know he did. anyway, whatever, it's done
[15:54:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:55:53] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.041 seconds
[16:29:11] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours
[16:29:11] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours
[16:29:11] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[16:29:11] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours
[16:29:11] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[16:29:12] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours
[16:29:12] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours
[16:29:13] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours
[16:29:13] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[17:05:07] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:16:04] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.821 seconds
[17:34:13] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[17:49:58] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:02:34] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.854 seconds
[18:37:45] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:50:03] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.707 seconds
[19:14:48] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[19:23:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:39:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.032 seconds
[19:47:45] hello
[19:48:08] any database guys around?
[19:49:16] not really, Platonides was looking for one earlier. but you're looking for a slightly different group of people
[19:49:23] (he was trying to optimise a query i think)
[19:49:57] * jeremyb wonders who's lurking out there
[19:50:03] jeremyb: true - i am looking for some ops
[19:52:05] nosy: i can't find the problem in your munin
[19:52:15] tried all the replication links
[19:52:34] jeremyb: pardon how do you mean?
[19:53:20] its not monitoring this irc channel :D
[19:53:41] i don't follow
[19:53:46] e.g. http://munin.toolserver.org/Database/rosemary/mysql_replication.html
[19:53:54] i dont too
[19:54:05] i tried for like 10 different hosts and found nothing else broken besides rosemary
[19:54:21] so where's the s2/s4 problem then? it's not monitored at all?
[19:54:23] ah it does not monitor trainwreck replags
[19:54:28] or in nagios but not munin?
[19:54:28] k
[19:54:44] no i ask the tsbot for @replag all
[19:55:02] thats about the same as doing the mysql query i just sent you
[19:55:03] right
[19:55:06] k
[19:55:19] but rosemary...i will check...
[19:56:24] nosy: lag is going up on s2
[19:56:31] but not at 1 s/s
[19:56:38] so *something* is happening
[20:01:37] true but why so slowly? i still hope it is on the side of db54 :D
[20:02:05] nosy: then wouldn't it's slaves also be lagged?
[20:03:52] because there are not so many meters of cable inbetween? no i think you must be right...
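The s2 diagnosis above turns on the lag's growth rate: lag climbing at 1 s/s would mean the slave is applying nothing at all, while the slower growth observed means it is applying events but falling behind. A minimal sketch of that calculation, assuming (timestamp, Seconds_Behind_Master) samples; the function name and sample format are hypothetical, not from any Toolserver tooling:

```python
def lag_growth_rate(samples):
    """Estimate replication-lag growth in seconds of lag per second
    of wall time, from (timestamp, seconds_behind_master) samples.

    A rate near 1.0 means the slave is applying nothing (e.g. a
    stalled replication thread); a smaller positive rate means it is
    applying events but not keeping up with the master.
    """
    (t0, lag0), (t1, lag1) = samples[0], samples[-1]
    if t1 == t0:
        raise ValueError("need samples spanning a time interval")
    return (lag1 - lag0) / (t1 - t0)
```

For example, lag going from 10 s to 70 s over 60 s of wall time gives a rate of 1.0 (fully stalled), while 10 s to 30 s over 100 s gives 0.2 (falling behind slowly), matching the "going up, but not at 1 s/s" observation.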
[20:10:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:23:09] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.755 seconds
[20:29:36] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[20:29:36] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours
[20:29:36] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours
[20:35:36] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[20:42:48] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100%
[20:44:27] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[20:47:45] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[20:58:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:09:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.882 seconds
[21:13:25] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.038 second response time
[21:26:27] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[21:44:18] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:56:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.373 seconds
[21:58:24] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.033 second response time
[22:01:42] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[22:01:42] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[22:30:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:38:36] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Puppet has not run in the last 10 hours
[22:42:58] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.089 seconds
[23:14:59] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100%
[23:16:56] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms
[23:18:08] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:20:23] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[23:32:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.023 seconds
[23:35:05] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100%
[23:35:59] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.94 ms
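The recurring "Puppet freshness" alerts throughout this log encode a simple staleness rule: flag CRITICAL when Puppet has not run on a host within the last 10 hours. A hypothetical sketch of such a check (not the actual Nagios plugin used here):

```python
import time

# 10 hours, matching the threshold quoted in the alerts above
FRESHNESS_THRESHOLD = 10 * 3600


def puppet_freshness(last_run_epoch, now=None):
    """Return 'OK' or 'CRITICAL' depending on whether Puppet last ran
    within FRESHNESS_THRESHOLD seconds of `now` (epoch seconds)."""
    now = time.time() if now is None else now
    if now - last_run_epoch > FRESHNESS_THRESHOLD:
        return "CRITICAL"
    return "OK"
```

In practice the last-run timestamp would come from the puppet agent's state on the host; the check itself is just this threshold comparison.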