[00:12:34] PROBLEM - HTTPS_m.wikipedia.org on cp1061 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [00:12:37] PROBLEM - HTTPS_m.wikimediafoundation.org on cp4006 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [00:12:37] PROBLEM - HTTPS_m.wikivoyage.org on cp1069 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [00:12:37] PROBLEM - HTTPS_wikimedia.org on cp3019 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [00:12:37] PROBLEM - HTTPS_wikisource.org on cp3022 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [00:12:37] PROBLEM - HTTPS_m.wikivoyage.org on cp3012 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [00:12:37] PROBLEM - HTTPS_m.wiktionary.org on amssq60 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [00:15:17] RECOVERY - HTTPS_m.wikipedia.org on cp1061 is OK: SSL_CERT OK - X.509 certificate for *.m.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:16:02 2015 GMT (expires in 336 days) [00:15:22] RECOVERY - HTTPS_m.wikivoyage.org on cp1069 is OK: SSL_CERT OK - X.509 certificate for *.m.wikivoyage.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:02 2015 GMT (expires in 336 days) [00:15:44] RECOVERY - HTTPS_m.wikimediafoundation.org on cp4006 is OK: SSL_CERT OK - X.509 certificate for *.m.wikimediafoundation.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:31:07 2015 GMT (expires in 336 days) [00:15:44] RECOVERY - HTTPS_m.wiktionary.org on amssq60 is OK: SSL_CERT OK - X.509 certificate for *.m.wiktionary.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:07 2015 GMT (expires in 336 days) [00:15:44] RECOVERY - HTTPS_m.wikivoyage.org on cp3012 is OK: SSL_CERT OK - X.509 certificate for *.m.wikivoyage.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:02 2015 GMT (expires in 336 days) [00:15:45] RECOVERY - 
HTTPS_wikimedia.org on cp3019 is OK: SSL_CERT OK - X.509 certificate for *.wikimedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 4 21:06:06 2015 GMT (expires in 318 days) [00:15:45] RECOVERY - HTTPS_wikisource.org on cp3022 is OK: SSL_CERT OK - X.509 certificate for *.wikisource.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:36:09 2015 GMT (expires in 336 days) [02:56:24] PROBLEM - puppet last run on cp4001 is CRITICAL: CRITICAL: puppet fail [03:09:15] RECOVERY - puppet last run on cp4001 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [03:33:11] PROBLEM - Router interfaces on mr1-esams is CRITICAL: CRITICAL: host 91.198.174.247, interfaces up: 36, down: 1, dormant: 0, excluded: 1, unused: 0; ge-0/0/0: down - Core: msw-oe12-esams [04:27:35] (PS3) KartikMistry: Beta: Add support for more language pairs [puppet] - https://gerrit.wikimedia.org/r/180724 [04:38:01] (PS1) KartikMistry: Quote file modes [puppet] - https://gerrit.wikimedia.org/r/181380 [04:41:39] (PS4) KartikMistry: Beta: Add support for more language pairs [puppet] - https://gerrit.wikimedia.org/r/180724 [04:47:41] (PS5) KartikMistry: Beta: Add support for more language pairs [puppet] - https://gerrit.wikimedia.org/r/180724 [04:52:36] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [04:57:25] (PS1) KartikMistry: Properly align indentation of => [puppet] - https://gerrit.wikimedia.org/r/181381 [05:28:32] (PS1) Andrew Bogott: Our custom build of bootstrap-vz is called python-bootstrap-vz [puppet] - https://gerrit.wikimedia.org/r/181382 [05:29:18] (CR) Andrew Bogott: [C: 2] Our custom build of bootstrap-vz is called python-bootstrap-vz [puppet] - https://gerrit.wikimedia.org/r/181382 (owner: Andrew Bogott) [05:37:59] PROBLEM - Router interfaces on mr1-esams is CRITICAL: CRITICAL: 
host 91.198.174.247, interfaces up: 36, down: 1, dormant: 0, excluded: 1, unused: 0; ge-0/0/0: down - Core: msw-oe12-esams [05:38:33] (PS1) Andrew Bogott: Removed a useless dependency [puppet] - https://gerrit.wikimedia.org/r/181384 [05:39:49] (CR) Andrew Bogott: [C: 2] Removed a useless dependency [puppet] - https://gerrit.wikimedia.org/r/181384 (owner: Andrew Bogott) [06:10:04] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [06:10:16] PROBLEM - salt-minion processes on virt1012 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [06:34:47] PROBLEM - puppet last run on db2018 is CRITICAL: CRITICAL: Puppet has 1 failures [06:34:48] PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: Puppet has 2 failures [06:35:19] PROBLEM - puppet last run on search1018 is CRITICAL: CRITICAL: Puppet has 2 failures [06:35:30] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 1 failures [06:36:27] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: Puppet has 1 failures [06:44:46] <_joe_> good morning [06:45:55] RECOVERY - puppet last run on db2018 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [06:46:03] RECOVERY - puppet last run on cp1061 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [06:47:13] RECOVERY - puppet last run on search1018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:47:14] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:48:48] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:52:30] ACKNOWLEDGEMENT - salt-minion processes on virt1012 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion andrew bogott Working on 
this now -- same failure on 1010, 1011, 1012 [06:55:12] RECOVERY - salt-minion processes on virt1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [06:57:47] RECOVERY - salt-minion processes on virt1012 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [06:58:12] RECOVERY - salt-minion processes on virt1011 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [06:58:39] <_joe_> andrewbogott: you may need to delete their salt key from palladium and restart the salt minion [06:58:47] <_joe_> oh I see you did that probably [06:58:51] _joe_: yep! Just did. [07:02:09] (PS1) Giuseppe Lavagetto: Installation of the orientdb test cluster [puppet] - https://gerrit.wikimedia.org/r/181385 [07:04:37] (CR) Giuseppe Lavagetto: [C: 2 V: 2] Installation of the orientdb test cluster [puppet] - https://gerrit.wikimedia.org/r/181385 (owner: Giuseppe Lavagetto) [07:35:47] * YuviPanda waves at people [07:58:51] PROBLEM - HTTPS_unified on cp4020 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [07:58:54] PROBLEM - HTTPS_m.wikisource.org on amssq46 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [07:59:03] PROBLEM - HTTPS_wiktionary.org on cp3007 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [07:59:03] PROBLEM - HTTPS_m.wikinews.org on amssq47 is CRITICAL: SSL_CERT CRITICAL: Error: [07:59:03] PROBLEM - HTTPS_wiktionary.org on cp1037 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [07:59:04] PROBLEM - HTTPS_m.wikiquote.org on amssq55 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [07:59:04] PROBLEM - HTTPS_m.wikimedia.org on cp4003 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [07:59:04] PROBLEM - HTTPS_wikidata.org on cp4008 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [07:59:04] PROBLEM - HTTPS_wikidata.org on cp3003 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [08:02:29] RECOVERY - 
HTTPS_m.wiktionary.org on cp1040 is OK: SSL_CERT OK - X.509 certificate for *.m.wiktionary.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:07 2015 GMT (expires in 335 days) [08:02:29] RECOVERY - HTTPS_wiktionary.org on cp1037 is OK: SSL_CERT OK - X.509 certificate for *.wiktionary.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:05 2015 GMT (expires in 335 days) [08:02:29] RECOVERY - HTTPS_m.wikivoyage.org on cp1052 is OK: SSL_CERT OK - X.509 certificate for *.m.wikivoyage.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:02 2015 GMT (expires in 335 days) [08:02:29] RECOVERY - HTTPS_wikidata.org on cp4008 is OK: SSL_CERT OK - X.509 certificate for *.wikidata.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:26:02 2015 GMT (expires in 335 days) [08:02:29] RECOVERY - HTTPS_wikidata.org on cp3003 is OK: SSL_CERT OK - X.509 certificate for *.wikidata.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:26:02 2015 GMT (expires in 335 days) [08:02:30] RECOVERY - HTTPS_m.wikimediafoundation.org on cp1057 is OK: SSL_CERT OK - X.509 certificate for *.m.wikimediafoundation.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:31:07 2015 GMT (expires in 335 days) [08:02:30] RECOVERY - HTTPS_wikisource.org on cp4006 is OK: SSL_CERT OK - X.509 certificate for *.wikisource.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:36:09 2015 GMT (expires in 335 days) [08:07:14] <_joe_> paravoid: ping [08:10:37] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [08:25:55] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [08:30:42] PROBLEM - DPKG on capella is CRITICAL: Connection refused by host 
[08:31:14] PROBLEM - Disk space on capella is CRITICAL: Connection refused by host [08:33:01] (PS1) Andrew Bogott: Fix the projectgid fact. [puppet] - https://gerrit.wikimedia.org/r/181387 [08:33:06] (PS1) Yuvipanda: tools: Add initial set of uwsgi nodes [puppet] - https://gerrit.wikimedia.org/r/181388 [08:33:16] PROBLEM - RAID on capella is CRITICAL: Connection refused by host [08:33:43] PROBLEM - configured eth on capella is CRITICAL: Connection refused by host [08:33:46] (CR) Yuvipanda: [C: 2] tools: Add initial set of uwsgi nodes [puppet] - https://gerrit.wikimedia.org/r/181388 (owner: Yuvipanda) [08:33:52] PROBLEM - dhclient process on capella is CRITICAL: Connection refused by host [08:34:03] PROBLEM - puppet last run on capella is CRITICAL: CRITICAL: puppet fail [08:34:38] RECOVERY - DPKG on capella is OK: All packages OK [08:34:54] (PS2) Andrew Bogott: Fix the projectgid fact. [puppet] - https://gerrit.wikimedia.org/r/181387 [08:35:02] RECOVERY - Disk space on capella is OK: DISK OK [08:35:43] (CR) Andrew Bogott: [C: 2] Fix the projectgid fact. [puppet] - https://gerrit.wikimedia.org/r/181387 (owner: Andrew Bogott) [08:36:06] andrewbogott: merged yours too [08:36:12] thanks! [08:36:33] RECOVERY - RAID on capella is OK: OK: optimal, 1 logical, 2 physical [08:37:03] RECOVERY - configured eth on capella is OK: NRPE: Unable to read output [08:37:12] RECOVERY - dhclient process on capella is OK: PROCS OK: 0 processes with command name dhclient [08:37:22] RECOVERY - puppet last run on capella is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [08:38:25] (CR) Alexandros Kosiaris: "Two stray } characters, otherwise looks fine" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/180724 (owner: KartikMistry) [08:40:58] greetings [08:41:39] _joe_: pong [08:43:03] Can I please get access to https://phabricator.wikimedia.org/dashboard/view/45/ ? 
[08:43:18] <_joe_> matanya: uh, yes, one sec [08:43:58] <_joe_> so, I wanted you advice: I have new hhvm packages ready to be tested, and CI (for instance) would love to use it. But if I put it on the apt repo now, it's going to be there even if I'm still not confident it should go in production [08:44:22] <_joe_> is there a "good" way to solve this that is not using puppet? [08:45:30] <_joe_> (basically, beta and CI should receive the packages before they go in prod and both testing in CI and beta are required before I'm confident it's ready for production) [08:46:08] yeah we haven't solved this properly yet [08:46:14] <_joe_> I know [08:46:17] you can dpkg -i them [08:46:24] <_joe_> ehe thanks :P [08:46:32] or you can put them in apt and don't upgrade prod [08:46:39] both aren't great solutions :) [08:47:03] <_joe_> I was searching for a third way [08:47:35] <_joe_> we may add one "experimental" repository we can add only to test hosts maybe? [08:47:41] could create a new testing repo... [08:47:44] ah, as you say :) [08:47:45] <_joe_> so a different distribution from trusty-wikimedia [08:48:16] <_joe_> (I don't remember if reprepro handles that decently) [08:48:45] <_joe_> andrewbogott: my proposal is quite lame, that's why I was searching for advice [08:49:34] <_joe_> I think I'll go the "update apt, don't upgrade prod" way [08:49:37] <_joe_> for now [08:50:22] a new distribution won't be great, because then you'll have to the matrix of distro x stability [08:50:22] if the changes aren't too invasive I'd say that's a good option currently at least [08:50:31] trusty-experimental, precise-experimental etc. [08:50:47] we could add a component [08:50:47] <_joe_> paravoid: that's why I said it's a lame solution [08:51:05] but even then, something that's experimental for one service may not be for another [08:51:11] <_joe_> paravoid: but we can have the same package at different versions in two components in reprepro? 
[08:51:26] <_joe_> I remember it's not possible, but I may be wrong [08:51:50] it's not [08:52:00] <_joe_> paravoid: yes it's clear there is no magic bullet [08:53:15] <_joe_> but having something that works like backports in debian ("don't install from here unless explicitly stated") would be handy. Maybe too much effort [08:54:05] that's a flag in Release [08:54:18] NotAutomatic: yes [08:54:27] (and there's also a newish flag ButAutomaticUpgrades: yes) [08:54:40] but that's for a distribution, not a component :) [08:54:47] <_joe_> eh :P [08:54:58] but we pin our repository anyway [08:59:04] back later! [09:02:59] <_joe_> matanya: you should be able to see the dashboard btw [09:03:12] <_joe_> I thought I told you that I was done, sorry [09:03:33] np, thanks much [09:04:03] _joe_: You do not have permission to view this object. [09:05:35] <_joe_> matanya: which is your user? [09:05:43] matanya [09:05:52] <_joe_> maybe it has less permissions than I can grant for that dashboard? [09:06:06] maybe [09:06:51] <_joe_> matanya: you signed an NDA right? [09:07:01] yeah, have access to RT [09:07:18] <_joe_> so I'm pretty sure if you log in with your LDAP account instead than with mediawiki oauth you'll have the right permissions [09:08:07] oh, so i need ldap [09:08:46] I didn't know i even have a password for that. _joe_ where can i find that ? [09:08:47] <_joe_> matanya: gerrit [09:08:52] ok [09:08:54] <_joe_> the gerrit account [09:10:39] _joe_: this user already exists [09:10:50] does this mean my 0auth and my ldap conflict? 
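A note on the Release flags mentioned at 08:54: the following is a minimal Python sketch of the default priorities that apt_preferences(5) documents for versions coming from an archive, depending on those two flags. The function name is hypothetical and used only for illustration. It ignores the target release and any explicit pins, which (as noted at 08:54:58) take precedence when a repository is pinned.

```python
# Sketch of APT's documented default priorities for versions from an archive.
# `default_pin_priority` is a hypothetical helper, not part of any APT library.

def default_pin_priority(not_automatic: bool, but_automatic_upgrades: bool) -> int:
    """Return the default priority for an archive with the given Release flags."""
    if not_automatic and but_automatic_upgrades:
        # Same priority as an installed version: not picked for new installs
        # over a normal archive, but upgrades are followed (Debian backports).
        return 100
    if not_automatic:
        # Only installed when explicitly requested (Debian experimental).
        return 1
    # Ordinary archive that is not the target release.
    return 500


for flags in [(False, False), (True, False), (True, True)]:
    print(flags, default_pin_priority(*flags))
```

Because the flags live in the Release file of a distribution, this behaviour applies per distribution rather than per component, which is the limitation raised at 08:54:40.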
[09:11:24] matanya: nope [09:12:24] matanya: https://phabricator.wikimedia.org/settings/panel/external/ [09:12:34] Attach your LDAP account via there [09:13:29] _joe_: FYI - Matanya is in the WMF-NDA project so LDAP vs OAuth makes no difference (unless I'm misunderstanding what you're discussing that is :) ) [09:14:56] Request: POST http://phabricator.wikimedia.org/auth/login/ldap:self/, from 10.64.0.172 via cp1044 cp1044 ([10.64.0.172]:80), Varnish XID 1600070746 [09:14:56] Forwarded for: MYIP, 10.64.0.172 [09:14:57] Error: 503, Service Unavailable at Mon, 22 Dec 2014 09:14:31 GMT [09:15:00] <_joe_> JohnFLewis: he is in LDAP, is he in phabricator? [09:15:17] it happens to me quite often [09:15:32] _joe_: Yeah. He has the NDA group which ops use [09:15:54] <_joe_> uhm, the dashboard is now open to WMF-NDA... [09:16:09] still no permission :) [09:16:26] <_joe_> matanya: ok I'll take a look in a few [09:17:07] Ditto [09:17:12] is hitting tools.wmflabs.org/* slow for everyone else or just me? [09:17:50] YuviPanda: wfm [09:18:15] YuviPanda: define slow. Not really 'slow' for me but not 'quick' [09:23:09] (PS6) KartikMistry: Beta: Add support for more language pairs [puppet] - https://gerrit.wikimedia.org/r/180724 [09:24:11] what is packagist-admin? [09:25:06] paravoid: the mail alias? [09:25:11] yes [09:27:30] I believe an email alias for a new packagist account. So multiple people can receive packagist emails or so? [09:30:05] (CR) Alexandros Kosiaris: [C: 2] Beta: Add support for more language pairs [puppet] - https://gerrit.wikimedia.org/r/180724 (owner: KartikMistry) [09:30:18] <_joe_> JohnFLewis: so that bryan is not the only one able to manage it, see T84767 [09:31:15] _joe_: that's where I got my 'I think it is this' from :p [09:31:58] <_joe_> JohnFLewis: oh sorry, I didn't see you were /answering/ paravoid [09:32:44] :p how is 'learning to manage dashboard permissions' coming _joe_ ? 
[09:33:14] <_joe_> JohnFLewis: I think I had given view permissions to the dashboard but not to individual panels [09:33:19] <_joe_> matanya: can you try now? [09:33:28] Oh nevermind - I see it _joe_ [09:33:36] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [09:33:39] Likely [09:34:33] trying [09:35:00] works! thank you! [09:37:03] <_joe_> yeah phab has no concept of hierarchical permissions, it seems. Which probably makes sense [09:37:27] why is monitoring a project [09:37:29] ugh [09:37:59] paravoid: so we can monitor monitoring ;) [09:38:09] just a tag would have been fine [09:41:15] (PS1) Yuvipanda: dynamicproxy: Enable uwsgi and fastcgi for urlproxy [puppet] - https://gerrit.wikimedia.org/r/181389 [09:43:22] (CR) Yuvipanda: [C: 2] dynamicproxy: Enable uwsgi and fastcgi for urlproxy [puppet] - https://gerrit.wikimedia.org/r/181389 (owner: Yuvipanda) [09:53:35] _joe_: reminds me; still on the 'joyful' ops duty/willing to review two patches? [09:59:42] paravoid: heh there seem to be so much overlap between tags and projects and so on, somebody's tag is somebody else's project [10:00:17] godog: see my poke for Joe, I got the wrong guy :) [10:02:02] JohnFLewis: hehe I am on indeed this week [10:03:44] godog: https://gerrit.wikimedia.org/r/#/c/181268/ if you don't mind - I hope it's correct at least :) [10:05:47] <_joe_> we don't have a way to manage lvm via puppet, right? [10:07:00] _joe_: there’s a bunch of classes for doing that, but the module is called labs_lvm [10:07:01] not sure why [10:07:12] <_joe_> yeah, apart from that [10:10:27] PROBLEM - check_puppetrun on db1008 is CRITICAL: CRITICAL: Puppet has 2 failures [10:10:42] godog: as I just said on the list, let's use https://phabricator.wikimedia.org/T1147 :) [10:11:22] goddamit, uwsgi_pass is different from proxy_pass, different fastcgi_pass... 
[10:11:35] I'm getting all OCD-y about this so please do call me on it if I start becoming counterproductive :) [10:15:00] <_joe_> I prefer #debs too, but I'll abstain from any naming discussions [10:15:16] <_joe_> I don't care, I think having the right tags is more important [10:15:17] PROBLEM - check_puppetrun on db1008 is CRITICAL: CRITICAL: Puppet has 2 failures [10:15:47] <_joe_> (I hate that phab doesn't allow you to search for tickets "tagged operations only" [10:15:55] what do you mean? [10:16:48] JohnFLewis: I'll take a look [10:17:45] godog: kk, a stab I took looking at other stuff by mutante so, I hope what I put there does something :p [10:19:20] <_joe_> paravoid: tickets tagged operations but with no other tags [10:19:57] <_joe_> so they're not project-related (or need to be routed) and they surely need attention from the onduty opsen [10:20:31] <_joe_> because right now it's like ops-requests and ops-core are merged in "operations" [10:20:31] PROBLEM - check_puppetrun on db1008 is CRITICAL: CRITICAL: Puppet has 2 failures [10:20:53] shouldn’t we just use the ‘needs triage’ ‘priority’ field? [10:21:09] so triaging simply involves looking at tickets with that set [10:21:11] <_joe_> YuviPanda: what if people set a priority when opening the ticket? [10:21:35] they could just as well add appropriate projects :) [10:21:42] <_joe_> also, one can set a priority then leave a ticket unattended and unassigned [10:22:28] <_joe_> YuviPanda: in that case, if we really use phab to track our work, individual opsens following a project should be aware [10:22:50] <_joe_> IDK I'm just guessing how to handle this, I don't have a good solution [10:24:06] hmm, in general you just let the default priority be (unless you are the person working on it) when filing. then someone triages, and people assign it to themselves if they are. 
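The query asked for at 10:19:20 ("tagged operations but with no other tags") can be sketched in Python as the difference between a membership test and an exact match on the tag set. This is only an illustration over hypothetical in-memory task records; the field names, task ids, and helper functions are assumptions and do not correspond to any Phabricator API.

```python
# Hypothetical task records; ids and tag names are made up for illustration.
tasks = [
    {"id": "A", "tags": ["operations"]},
    {"id": "B", "tags": ["operations", "labs"]},
    {"id": "C", "tags": ["labs"]},
]


def has_tag(tasks, tag):
    """Tasks that carry `tag`, possibly alongside other tags."""
    return [t for t in tasks if tag in t["tags"]]


def only_tag(tasks, tag):
    """Tasks whose complete tag set is exactly {tag}."""
    return [t for t in tasks if set(t["tags"]) == {tag}]


print([t["id"] for t in has_tag(tasks, "operations")])
print([t["id"] for t in only_tag(tasks, "operations")])
```

The second helper is the one that would surface tickets still waiting to be routed to a more specific project.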
[10:24:23] and higher priority tickets without someone assigned is just things that need working on yet don’t have anyone working on [10:24:35] (or no ‘single’ person, at least) [10:25:18] YuviPanda: heh in theory you are right, in practice I'm noticing that it isn't obvious who "owns" a ticket in multiple projects (and thus its priority) [10:25:30] but it is something we'll need to discuss after we get some practice [10:25:31] PROBLEM - check_puppetrun on db1008 is CRITICAL: CRITICAL: Puppet has 2 failures [10:25:51] godog: indeed, and I usually look at comments / related activity than the ‘assigned’ field anyway. [10:27:54] _joe_: I always thought that "in all projects: operations" would do this [10:27:57] but apparently it's not [10:28:01] some tickets are actually solved best with a little bit of team work, trying to find a single owner for everything can even be counter-productive in some cases..imho [10:29:01] (there's a different search field for "all" and "any", but apparently it's only about the operands) [10:29:27] <_joe_> paravoid: it just gets all tickets that are ALSO tagged operations [10:29:32] yeah [10:30:12] <_joe_> mutante: I strongly disagree. 
It may be teamwork, still someone should be responsible to take one task to completion [10:30:23] RECOVERY - check_puppetrun on db1008 is OK: OK: Puppet is currently enabled, last run 60 seconds ago with 0 failures [10:30:29] I agree with both of you :P [10:30:36] <_joe_> yes [10:30:46] depends on the ticket [10:30:56] i agree with the "let's use phab ticket instead of the list":) [10:30:56] some tickets are actionable, others are more general goals [10:31:32] <_joe_> so, the simplest thing would be to require people to tag requests to ops as "ops-requests" [10:31:37] that won't work [10:31:47] <_joe_> but that will not happen anytime soon [10:31:54] s/anytime soon// :) [10:31:58] (good thing we're meeting in a ~month, that'll require some discussion) [10:32:04] in Bugzilla there was a thing called "shell" [10:32:22] it doesn't make much sense nowadays anymore, but historically it meant "needs ops" [10:32:30] because back in the days having shell was the same thing [10:32:45] so people still used that in BZ to tag it as an ops thing [10:33:00] <_joe_> paravoid: those should be tagged "Epic" in the scrum lingo [10:33:20] <_joe_> because scrum doesn't work :) [10:33:20] there's an #epic :) [10:34:35] <_joe_> paravoid: I know [10:34:41] <_joe_> that's why I suggested it [10:34:52] I think that first of all, we should tag everything as #operations [10:35:01] (except labs, possibly fundraising, possibly analytics) [10:35:13] <_joe_> (or, it would be nice to be able to assign tickets to the whole team) [10:35:27] there's currently no query that can give you all of our team's tickets other than handcrafting a query that has ops-foo ops-bar op-baz [10:36:00] yes, assigning to a team alias sounds good [10:36:06] <_joe_> ewww I never looked inside the openjdk package [10:36:14] (CR) Filippo Giunchedi: [C: -1] "we should do this in stages:" [puppet] - https://gerrit.wikimedia.org/r/181268 (owner: John F. 
Lewis) [10:36:18] why would you assign to the team alias instead of leaving it unassigned? [10:36:40] because then we can search for what is assigned to us [10:37:10] (CR) John F. Lewis: "I'll split this patch later then." [puppet] - https://gerrit.wikimedia.org/r/181268 (owner: John F. Lewis) [10:37:42] godog: fair enough. I'll split the patch later :) [10:38:21] JohnFLewis: cool, thanks for your work! [10:39:25] godog: but the config I've provided looks like it would work in some universe right? [10:40:36] <_joe_> nice, the openjdk package needs... openjdk in order to be installed [10:40:43] <_joe_> s/installed/build/ [10:40:46] <_joe_> *built [10:40:54] <_joe_> (I have a horrible lag right now) [10:41:24] openjdk 8 won't even be in jessie btw [10:41:30] JohnFLewis: absolutely, the changes themselves seem good [10:41:35] there were hundreds of bugs with openjdk so they decided to not ship it [10:42:39] (CR) Dzahn: "re: comments from godog: if you switch DNS config before removing Apache https config you can run into redirect loops when the existing co" [puppet] - https://gerrit.wikimedia.org/r/181268 (owner: John F. Lewis) [10:43:05] PROBLEM - Varnishkafka Delivery Errors on cp3007 is CRITICAL: Return code of 137 is out of bounds [10:43:23] do we really need java 8? 
:) [10:43:54] <_joe_> paravoid: it's needed for the new titan version, so I'm backporting from vivid [10:45:27] <_joe_> paravoid: for Titan, yes; [10:46:32] <_joe_> better, for the latest titan beta, yes [10:47:10] <_joe_> since they allegedly fix some of our issues there, we want to try it out [10:47:33] RECOVERY - Varnishkafka Delivery Errors on cp3007 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [10:47:33] RECOVERY - Varnishkafka Delivery Errors on amssq51 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [10:50:54] andrewbogott_afk: ping me when you're around [10:51:03] PROBLEM - puppet last run on cerium is CRITICAL: CRITICAL: Puppet last ran 9 days ago [10:52:30] ok, puppet's broken [10:53:46] PROBLEM - puppet last run on xenon is CRITICAL: CRITICAL: Puppet last ran 9 days ago [10:53:58] 9 days ? [10:54:06] that's me reenabling it [10:54:16] PROBLEM - puppet last run on mw1191 is CRITICAL: CRITICAL: Puppet last ran 3 days ago [10:54:42] I don't remember disabling it ... [10:54:52] was there a comment or smt ? [10:55:42] (PS1) Faidon Liambotis: Confine projectgid fact to the labs realm [puppet] - https://gerrit.wikimedia.org/r/181393 [10:55:47] - defaultConsistency: one [10:55:48] + defaultConsistency: localQuorum [10:56:38] gabriel probably :) [10:56:39] (PS2) Faidon Liambotis: Confine projectgid fact to the labs realm [puppet] - https://gerrit.wikimedia.org/r/181393 [10:56:56] RECOVERY - puppet last run on xenon is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:57:02] but this was bad for various reasons, so force-running it [10:57:21] does that confine make sense? opinions? [10:58:04] (PS3) Faidon Liambotis: Confine projectgid fact to the labs realm [puppet] - https://gerrit.wikimedia.org/r/181393 [10:58:10] do facts have access to puppet variables like that ? 
[10:58:26] RECOVERY - puppet last run on cerium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:59:18] hm, is realm a variable and not a fact? [10:59:22] before touching planet to move it behind misc-web: why? planet already has its own certificate [10:59:35] it has a *.planet wildcard cert [10:59:41] meh, you're probably right [10:59:51] paravoid: it is a variable not a fact [11:00:27] RECOVERY - puppet last run on mw1191 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:01:22] <_joe_> mw1191 is me [11:01:40] (PS1) Faidon Liambotis: Revert "Fix the projectgid fact" [puppet] - https://gerrit.wikimedia.org/r/181395 [11:01:45] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [11:01:59] (PS2) Faidon Liambotis: Revert "Fix the projectgid fact" [puppet] - https://gerrit.wikimedia.org/r/181395 [11:02:24] I think facter can access environment information from puppet probably [11:02:39] I doubt it [11:02:47] (Abandoned) Faidon Liambotis: Confine projectgid fact to the labs realm [puppet] - https://gerrit.wikimedia.org/r/181393 (owner: Faidon Liambotis) [11:02:53] (PS3) Faidon Liambotis: Revert "Fix the projectgid fact" [puppet] - https://gerrit.wikimedia.org/r/181395 [11:03:12] (CR) Faidon Liambotis: [C: 2] Revert "Fix the projectgid fact" [puppet] - https://gerrit.wikimedia.org/r/181395 (owner: Faidon Liambotis) [11:03:18] paravoid, well, the "environment" setting is local to a node [11:03:34] (CR) Faidon Liambotis: [V: 2] Revert "Fix the projectgid fact" [puppet] - https://gerrit.wikimedia.org/r/181395 (owner: Faidon Liambotis) [11:04:33] (or, add the fact directly to the relevant environment) [11:04:53] <_joe_> paravoid: looking into it [11:05:11] uh that's the other me that reconnected :P [11:06:58] (CR) Dzahn: [C: 1] Quote file modes [puppet] - 
https://gerrit.wikimedia.org/r/181380 (owner: KartikMistry) [11:07:08] (CR) Filippo Giunchedi: "revised list of steps:" [puppet] - https://gerrit.wikimedia.org/r/181268 (owner: John F. Lewis) [11:09:17] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [11:09:54] svn.. pff :) [11:10:04] i'll look though [11:10:25] look what ? it is just that the cert is expiring ... [11:10:27] it's _actually_ svn though [11:10:30] not just web [11:11:53] PROBLEM - HTTPS_m.wiktionary.org on cp4012 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [11:12:36] akosiaris: yea, the cert is expiring and if i ask for a new one i guarantee it takes about 10 seconds for "why don't you move behind misc-web" [11:12:55] and then i wanted to do that, expecting it to be just http [11:14:06] <_joe_> paravoid, what about confining that fact to "fqdn ends in wmflabs"? would that work? [11:14:27] then we realized it also has "svn-authz" and stuff [11:16:13] but if you want to just get a new cert, please do [11:19:45] mutante: yeah I was going to say misc-web too. Q: svn is read-only right now isn't it ? Do we need authentication for anything ? [11:20:53] ha 2 [11:21:26] i don't know, i don't think we do [11:29:18] (CR) Dzahn: "quoting file modes is good - want to also align the => while at it?" [puppet] - https://gerrit.wikimedia.org/r/181380 (owner: KartikMistry) [11:31:55] (CR) KartikMistry: "Yes. https://gerrit.wikimedia.org/r/#/c/181381/ for that." [puppet] - https://gerrit.wikimedia.org/r/181380 (owner: KartikMistry) [11:33:30] (CR) Alexandros Kosiaris: [C: 2] Properly align indentation of => [puppet] - https://gerrit.wikimedia.org/r/181381 (owner: KartikMistry) [11:38:24] did we lose icinga? 
[11:38:39] yeah we did [11:39:55] and mgmt doesn't work [11:39:58] awesome [11:41:14] <_joe_> :/ [11:45:42] <_joe_> do we have someone we can call to powercycle one server while chris is sleeping? [11:45:58] <_joe_> meaning physically powercycle [11:46:30] <_joe_> else, we can reinstall on a spare maybe. [11:47:52] i think we might be able to call eqiad for smart hands [11:48:06] we have smart hands and there's also an expedited option [11:48:09] but it charges us [11:49:01] maybe 2 more hours and we could call Chris? [11:49:04] so we'd at least need mark for the preapproval [11:49:07] it's like 7 in the morning [11:49:23] <_joe_> 2 hours without monitoring? [11:49:36] do it [11:49:40] the smart hands [11:49:45] oh hey [11:50:05] doing it [11:50:49] <_joe_> we have some spares we can use in case of need [11:50:52] A4 would be 0104 I assume [11:51:22] there's 0101-0108, 0201-0208 etc. [11:51:38] safe bet :) [11:56:38] <_joe_> we should really set up a different architecture for nagios [11:56:46] that's well established [11:57:41] at least we still have watchmouse .. [11:58:28] grrr this portal [11:58:34] "Submitting" and waits [12:03:25] so I can see it now under submitted requests as "requested" but with no order ID [12:03:30] time for a phone call I guess [12:06:31] (PS1) KartikMistry: Beta: Fix language codes for nno and nob [puppet] - https://gerrit.wikimedia.org/r/181397 [12:09:44] (PS1) Dzahn: add dev.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/181398 [12:13:49] ^ do you think dev.wikimedia.org as a portal is a good idea ? 
--> https://phabricator.wikimedia.org/T67074
[12:15:44] (PS1) KartikMistry: Properly align indentation of => [puppet] - https://gerrit.wikimedia.org/r/181399
[12:16:04] (CR) Giuseppe Lavagetto: [C: 2 V: 2] [WMF] New Package Version with various bugfixes [debs/hhvm] - https://gerrit.wikimedia.org/r/180752 (owner: Giuseppe Lavagetto)
[12:20:10] (CR) Alexandros Kosiaris: [C: 2] Beta: Fix language codes for nno and nob [puppet] - https://gerrit.wikimedia.org/r/181397 (owner: KartikMistry)
[12:23:43] (CR) Hoo man: [C: -1] Removed 'OTRS-member' user group on commons. (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/180560 (owner: Dereckson)
[12:23:50] (CR) Dzahn: [C: 2] Quote file modes [puppet] - https://gerrit.wikimedia.org/r/181380 (owner: KartikMistry)
[12:50:52] (CR) Dereckson: Removed 'OTRS-member' user group on commons. (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/180560 (owner: Dereckson)
[12:53:38] <_joe_> g | On Ops duty: _joe_ | On Product duty: James_F
[12:53:50] (PS2) Dereckson: Removed 'OTRS-member' user group on commons. [mediawiki-config] - https://gerrit.wikimedia.org/r/180560
[12:54:03] _joe_: nice try ;)
[12:54:10] (PS3) Dzahn: Added ang.wikibooks and ie.wikibooks to closed.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/180451
[12:54:27] <_joe_> JohnFLewis: I have such an horrible connection today :(
[12:54:36] <_joe_> I'm blind-typing now
[12:54:51] (CR) Dereckson: "PS 2: addressed Hoo comment" [mediawiki-config] - https://gerrit.wikimedia.org/r/180560 (owner: Dereckson)
[12:54:53] ah, how about that email address? does it redirect yet?
[12:55:11] @rt.wikimedia.org
[12:58:29] yay :)
[12:59:43] RECOVERY - puppet last run on xenon is OK: OK: Puppet is currently enabled, last run 16 minutes ago with 0 failures
[13:00:13] ah, they powercycled?
[13:00:24] yes
[13:00:36] mgmt still doesn't work
[13:04:48] (PS2) Qgil: add dev.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/181398 (owner: Dzahn)
[13:05:13] (PS3) Qgil: add dev.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/181398 (owner: Dzahn)
[13:10:51] (PS3) 20after4: phabricator: Change security_topic from "default: none" to "default: default" [puppet] - https://gerrit.wikimedia.org/r/179407
[13:11:15] (CR) 20after4: "This is totally deployable, just needs anyone from ops to +2 it." [puppet] - https://gerrit.wikimedia.org/r/179407 (owner: 20after4)
[13:11:54] ^ famous last words
[13:11:58] :)
[13:12:03] hehe
[13:12:44] i'm not familiar with the term. what is a "paper cut bug"?
[13:13:03] * mutante reads WP article
[13:13:27] ok:)
[13:16:01] "require a phd daemon restart to be fully affective"
[13:16:22] _fully_?
[13:16:26] <_joe_> brb, lunch
[13:18:07] (CR) Dzahn: "but it also says "DO NOT CHANGE THESE VALUES. Must be updated to match in security extensions. " and "require a phd daemon restart to be f" [puppet] - https://gerrit.wikimedia.org/r/179407 (owner: 20after4)
[13:48:13] (CR) 20after4: [C: 1] "This doesn't actually change the values, it just changes the default... and the previous default was an invalid value. Restarting phd woul" [puppet] - https://gerrit.wikimedia.org/r/179407 (owner: 20after4)
[13:48:22] PROBLEM - Host capella is DOWN: CRITICAL - Plugin timed out after 15 seconds
[14:06:50] (PS1) KartikMistry: Fixed spacing and alignment [puppet] - https://gerrit.wikimedia.org/r/181404
[14:18:16] (PS2) KartikMistry: Fixed spacing and alignment [puppet] - https://gerrit.wikimedia.org/r/181404
[14:27:06] (CR) Andrew Bogott: "Yeah, there's no reason for this fact to run on production at all."
[puppet] - https://gerrit.wikimedia.org/r/181395 (owner: Faidon Liambotis)
[14:27:28] andrewbogott: but I couldn't find an easy way to do this
[14:27:36] (PS3) KartikMistry: Fixed spacing and alignment [puppet] - https://gerrit.wikimedia.org/r/181404
[14:27:46] andrewbogott: I didn't research it much though, I didn't have the time
[14:29:11] <_joe_> is there a definitive way to tell that an host is a labs host? the fqdn maybe?
[14:30:11] fqdn gets fucked up easily, especially if people mess with /etc/hosts
[14:30:17] * YuviPanda sadly looks at tools project
[14:30:22] yeah, .wmflabs
[14:30:42] YuviPanda: are tools instances not .wmflabs?
[14:30:55] andrewbogott: they are now :)
[14:31:06] (PS1) Unicodesnowman: Make upload.wikimedia.org set Timing-Allow-Origin [puppet] - https://gerrit.wikimedia.org/r/181405
[14:31:08] andrewbogott: tools-webproxy was just tools-webproxy for a long time, fixed now.
[14:37:44] PROBLEM - Host 2620:0:861:1:7a2b:cbff:fe09:c21 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[14:39:22] wait what
[14:39:29] where does that IPv6 come from?
[14:39:56] wtf
[14:40:11] ah, eh.. weird. Icinga says host is down but service is up on it
[14:40:33] RECOVERY - Host 2620:0:861:1:7a2b:cbff:fe09:c21 is UP: PING OK - Packet loss = 0%, RTA = 0.37 ms
[14:45:35] this was hydrogen
[14:45:47] but all four recursors have host_name in icinga
[14:46:35] yep, thanks i saw it in Icinga
[14:48:12] RECOVERY - Host capella is UP: PING OK - Packet loss = 0%, RTA = 44.41 ms
[14:51:33] re: SVN "ViewVC uses Subversion's Python bindings to interact with and pull information out of your Subversion repositories.
" [14:52:05] so if we remove the client , viewvc wouldn't work [14:54:45] PROBLEM - configured eth on capella is CRITICAL: Connection refused by host [14:54:45] PROBLEM - RAID on capella is CRITICAL: Connection refused by host [14:55:11] PROBLEM - dhclient process on capella is CRITICAL: Connection refused by host [14:56:21] PROBLEM - puppet last run on capella is CRITICAL: Connection refused by host [14:56:50] PROBLEM - salt-minion processes on capella is CRITICAL: Timeout while attempting connection [14:56:58] PROBLEM - DPKG on capella is CRITICAL: Timeout while attempting connection [14:57:16] PROBLEM - Disk space on capella is CRITICAL: Timeout while attempting connection [15:05:28] (03CR) 10Dzahn: [C: 032] Properly align indentation of => [puppet] - 10https://gerrit.wikimedia.org/r/181399 (owner: 10KartikMistry) [15:12:58] (03PS1) 10Yuvipanda: tools: Add Mono fastcgi server package to exec_environ [puppet] - 10https://gerrit.wikimedia.org/r/181409 [15:13:04] Coren: ^ +1? [15:13:35] <_joe_> mono [15:13:42] <_joe_> YuviPanda: do you support plain old cgis? [15:13:53] <_joe_> I want to write a tool in pure fortran77 [15:14:06] _joe_: you’d be surprised... [15:14:16] oh [15:14:19] we don’t actually have fortran [15:14:28] _joe_: we actually have a bunch of stuff running on tcl [15:15:13] <_joe_> YuviPanda: http://www.nber.org/sys-admin/fortran-cgi/ [15:16:22] _joe_: my eyesss... 
[15:16:33] _joe_: although, I personally do *like* mono and C#
[15:16:44] <_joe_> YuviPanda: that's actually pretty decently written fortran 77
[15:17:42] _joe_: I think it was mostly the reaction to 'well we can not actually parse the query string because the separators could be ANYWHERE!!1'
[15:17:48] <_joe_> YuviPanda: I never used it, whenever I took a look I was underimpressed
[15:18:01] it's been nice since C#3
[15:20:15] https://translate.google.com/translate?hl=en&sl=auto&tl=en&u=http%3A%2F%2Fwww.roland-illig.de%2Flang.bf-cgi.html
[15:20:25] <_joe_> YuviPanda: and about F77 string handling... it simply is not there.
[15:20:43] yup. I realized that just then...
[15:23:34] <_joe_> YuviPanda: there's cobol-cgi too, in case someone wanted to use it...
[15:24:40] (PS2) Yuvipanda: tools: Add Mono fastcgi server package to exec_environ [puppet] - https://gerrit.wikimedia.org/r/181409
[15:24:54] _joe_: not to mention bash on balls
[15:25:08] (https://github.com/jneen/balls)
[15:25:13] https://packages.debian.org/wheezy/beef
[15:27:17] (CR) Yuvipanda: [C: 2] tools: Add Mono fastcgi server package to exec_environ [puppet] - https://gerrit.wikimedia.org/r/181409 (owner: Yuvipanda)
[15:42:38] (PS1) Yuvipanda: tools: Add uwsgi machines to the appropriate queue [puppet] - https://gerrit.wikimedia.org/r/181411
[15:43:37] Coren: ^
[15:43:57] (PS2) coren: tools: Add uwsgi machines to the appropriate queue [puppet] - https://gerrit.wikimedia.org/r/181411 (owner: Yuvipanda)
[15:45:03] (CR) coren: [C: 2] tools: Add uwsgi machines to the appropriate queue [puppet] - https://gerrit.wikimedia.org/r/181411 (owner: Yuvipanda)
[15:45:24] Coren: so that adds them to the queue? do I need to add anything else?
[15:46:18] No, it doesn't. Right now, the gridengine configuration stanzas create the config files but do not apply them - I'm waiting for codfw to test them thoroughly before I make them live.
You still have to apply the host to the queue by hand for now.
[15:47:04] It's a simple matter: just 'qconf -mq webgrid-uwsgi' and add the hostname to the hostlist line
[15:47:35] Coren: yeah, I still have that in history...
[15:48:55] YuviPanda: If you're curious, the resulting config will end up in /data/project/.system/gridengine/etc/queues
[15:55:00] Coren: gah, got disconnected
[15:55:17] 15:48‧ YuviPanda: If you're curious, the resulting config will end up in /data/project/.system/gridengine/etc/queues
[15:55:20] Coren: I see no tools-uwsgi queue there...
[15:55:26] oh, maybe puppet hasn't run on that yet?
[15:55:30] * YuviPanda checks
[15:55:41] YuviPanda: It needs a few puppet runs; one to generate the "resource" and one to collect it.
[15:55:49] ah, right
[15:55:54] the poor man's resource collection thing
[15:56:58] Coren: right, so I gave it 256 slots and also added it to the queue. let me make it a submit host too
[15:59:14] (PS1) John F. Lewis: etherpad: add Varnish misc config [puppet] - https://gerrit.wikimedia.org/r/181412
[15:59:33] (PS1) John F. Lewis: etherpad: remove SSL stanza [puppet] - https://gerrit.wikimedia.org/r/181413
[15:59:47] (Abandoned) John F. Lewis: etherpad: move behind misc-lb.eqiad [puppet] - https://gerrit.wikimedia.org/r/181268 (owner: John F. Lewis)
[16:00:58] godog: Separated the patches + added you as a reviewer.
[16:01:06] (CR) Dzahn: [C: 1] etherpad: add Varnish misc config [puppet] - https://gerrit.wikimedia.org/r/181412 (owner: John F. Lewis)
[16:03:26] JohnLewis: cool! I have to leave shortly, will take a look tomorrow tho
[16:03:57] (CR) Dzahn: [C: -1] "RewriteRule /p/*$ https://<%= @etherpad_host %>/ [NC,L]" [puppet] - https://gerrit.wikimedia.org/r/181413 (owner: John F. Lewis)
[16:04:41] mutante: ah. Will do in a moment
[16:08:02] Coren: hmm, added it to the queue, restarted gridengine minion, but nothing seems to be scheduled there still/
[16:08:03] ?
[16:08:35] Lemme take a look at things.
[16:09:42] Coren: oh, nevermind. it does work :)
[16:09:46] YuviPanda: I see nothing on the queue.
[16:10:01] Coren: because of the defaults of webservice2, it was trying to find a precise uwsgi queue, which doesn't exist, and got stuck.
[16:10:10] I killed it and restarted it and now it runs
[16:10:21] Ah! Yep, that makes sense.
[16:11:13] and yes, works! :) https://tools.wmflabs.org/wp-signpost/feed
[16:11:25] (PS2) John F. Lewis: etherpad: remove SSL stanza [puppet] - https://gerrit.wikimedia.org/r/181413
[16:17:51] (PS1) Dzahn: cache: install the planet SSL cert on misc-web [puppet] - https://gerrit.wikimedia.org/r/181415
[16:21:07] (PS2) John F. Lewis: cache: install the planet SSL cert on misc-web [puppet] - https://gerrit.wikimedia.org/r/181415 (owner: Dzahn)
[16:21:53] mutante: gerritbot wants 'Bug: T*' instead of 'Bugs: T*' :)
[16:22:28] (CR) John F. Lewis: "Looks sane and is what I want for moving planet stuff behind Varnish and off Zirconium's public side :)" [puppet] - https://gerrit.wikimedia.org/r/181415 (owner: Dzahn)
[16:22:42] (CR) John F. Lewis: [C: 1] cache: install the planet SSL cert on misc-web [puppet] - https://gerrit.wikimedia.org/r/181415 (owner: Dzahn)
[16:22:53] lovely idea if I select '+1' >.>
[16:23:46] JohnLewis: ah, typo. thanks
[16:34:13] (PS3) Dzahn: cache: install the planet SSL cert on misc-web [puppet] - https://gerrit.wikimedia.org/r/181415
[16:36:16] (CR) Dzahn: "the file i want is /files/ssl/star.planet.wikimedia.org.crt" [puppet] - https://gerrit.wikimedia.org/r/181415 (owner: Dzahn)
[16:37:33] <_joe_> !log uploading java8 packages for trusty
[16:37:41] Logged the message, Master
[16:39:31] _joe_: yay! :)
[16:42:26] (CR) Steinsplitter: [C: 1] Removed 'OTRS-member' user group on commons.
[mediawiki-config] - https://gerrit.wikimedia.org/r/180560 (owner: Dereckson)
[16:58:19] I am taking neon down for a few mins this will affect icinga. Any objections?
[16:59:53] paravoid, akosiaris: I disabled puppet on the cassandra test hosts in order to be able to tweak the parameters in a timely manner
[17:00:33] PROBLEM - puppetmaster https on virt1000 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 8140: HTTP/1.1 500 Internal Server Error
[17:04:04] !log powering down neon (icinga) to drain flea power and reset idrac
[17:04:05] (CR) BBlack: [C: 1] cache: install the planet SSL cert on misc-web [puppet] - https://gerrit.wikimedia.org/r/181415 (owner: Dzahn)
[17:04:06] Logged the message, Master
[17:05:06] planet is a second-level wildcard
[17:05:12] so our regular wildcard can't handle it
[17:05:17] so what do we gain by moving it to misc-web?
[17:05:28] huh?
[17:05:31] that's why we install the wildcard on misc-web
[17:05:44] gain = no public IP on zirconium
[17:05:59] so, SNI for misc-web then?
[17:06:08] we already do SNI for misc-web
[17:06:12] for phabricator
[17:06:17] ah right, wmfusercontent
[17:06:19] https://phabricator.wikimedia.org/T60048
[17:06:57] I can't say I understand this much tbh :)
[17:06:59] probably a valid question, though, is whether we're ok with requiring SNI for SSL for planet, like we are for the more-internal things behind misc-web
[17:07:13] the attack surface isn't changed much
[17:07:23] the only thing that changes is the certificate cost
[17:07:26] it would be great if you can comment on T60048
[17:07:29] and, well, the cost of an IP :)
[17:07:39] i expected this when talking to John
[17:07:42] the cert cost stays the same, it's the same cert on either host
[17:08:41] oh and I guess the SNI bit doesn't change either, since zirconium was already SNI as well
[17:08:43] not for etherpad, but who cares really :)
[17:08:58] and contacts
[17:09:13] which is a really weird domain anyway and I'm not even sure it belongs to prod
[17:09:55] it's unpuppetized anyway, iirc
[17:10:02] this change doesn't change those though. I guess you're looking ahead to "how do we move them all?"
[17:10:06] what is bad about the change?
[17:10:22] (PS1) John F. Lewis: planets: add Varnish statement [puppet] - https://gerrit.wikimedia.org/r/181419
[17:10:32] JohnLewis: before you continue see above please
[17:11:52] well
[17:11:58] i would have bet money on "why is planet not behind misc-web"
[17:12:07] in the opposite case
[17:12:11] so we have had this idea that misc-web is internal-ish stuff
[17:12:11] ?
[17:12:23] of all the things on zirconium, that seems borderline for misc-web actually :)
[17:12:33] i never knew that misc-web is internal-ish
[17:12:35] well internal and tooling, but not the primary public sites
[17:13:06] well we certainly treat it differently, e.g. using higher-security settings, right?
[17:13:12] (which break some older browsers)
[17:13:58] well I thought we did, I don't see the ref for strong now
[17:14:56] but in any case, we made that case about mathoid/citoid, that they shouldn't be on misc if they're a real public/production service as opposed to tools/internal
[17:15:13] personally - I see no gain nor loss from doing either.
[17:15:40] if the answer is to not do it , T60048 should be rejected
[17:15:54] I don't know
[17:16:13] the main thing I'm thinking right now is that if we're drawing a distinction about what goes on misc-web, it's certainly not a very well defined one.
[17:16:28] (for these odd cases)
[17:16:40] i guess that's because the string "misc" in itself is already like "anything else"
[17:16:50] bblack: the name 'misc-web' to me just says 'anything that we can't say is something in its own right'
[17:17:08] e.g. etherpad is misc, planets would technically be misc etc.
[17:17:34] well yeah, but the functional part is probably more important. "something in its own right" functionally is probably something we care more about uptime for and/or may consume significant resources
[17:18:00] e.g. not planets :)
[17:18:13] and maybe "is something considered a general public production service as opposed to internal/tools stuff", but I'm not even sure what that means right now even though I said it earlier.
[17:19:29] (Abandoned) John F. Lewis: planets: add Varnish statement [puppet] - https://gerrit.wikimedia.org/r/181419 (owner: John F. Lewis)
[17:20:05] phab is misc, that's probably an important datapoint in defining how important things on misc can be
[17:20:14] For what it's worth, I filed that bug a long time ago when I deployed Scholarships on zirconium. At the time I was a bit amazed that ssh access to the host didn't require using a bastion.
If it's a dumb idea feel free to wontfix it
[17:21:08] I do like the general idea we've been moving towards of using misc as a way to get service machines off of public IPs
[17:21:16] bd808: currently everything but planets no one is disputing moving over. So it wouldn't fall under 'dumb' just 'touches on a topic no one is certain on'
[17:21:34] I'd much rather that all our "public" endpoints be our standard frontends with e.g. nginx/varnish, instead of some arbitrary software.
[17:22:13] JohnLewis: *nod* just wanted to be clear where I was coming from when I opened the ticket.
[17:22:50] currently only planets, etherpad and bug-attachment stand on zirconium via DNS anyway
[17:23:04] so either all of the misc things go on misc and there's no distinction other than "probably low bandwidth", or if we want to keep moving random "public" servers back into the private network over time, we need another misc that's for things we don't like on misc.
[17:23:46] bblack: 'misc-but-no-misc-web-lb.eqiad.wikimedia.org' :p
[17:24:26] to misc or not to misc? that is the question
[17:24:37] +1
[17:25:25] whether tis nobler in men's hearts to suffer the exposure of random ports or hide behind the known shield of nginx+varnish
[17:33:20] PROBLEM - puppet last run on mw1161 is CRITICAL: CRITICAL: Puppet has 1 failures
[17:37:20] (PS1) Giuseppe Lavagetto: admin: add krenair to deployers [puppet] - https://gerrit.wikimedia.org/r/181421
[17:41:38] RECOVERY - puppet last run on mw1161 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:41:59] !log taking neon down again to reseat idrac nic card
[17:42:01] Logged the message, Master
[17:43:43] ottomata: hey!
[17:43:54] ottomata: jdlrobson hasn’t been able to ssh in to stat1003 since the VLAN move.
[17:45:05] (PS2) Giuseppe Lavagetto: admin: add krenair to deployers [puppet] - https://gerrit.wikimedia.org/r/181421
[17:45:37] yuvipanda: is he going through a bastion and using its new fqdn, stat1003.eqiad.wmnet?
[17:47:32] jdlrobson: can you paste your .ssh/config?
[17:47:53] jdlrobson: apparently the address changed to stat1003.eqiad.wmnet and stat1003.wikimedia.org no longer resolves
[17:48:55] Host stat1003.wikimedia.org \ ProxyCommand ssh -e none bast1001.wikimedia.org exec nc -w 3600 %h %p
[17:49:21] ForwardAgent no \ Host *.eqiad.wmnet \ ProxyCommand ssh -a -W %h:%p bast1001.wikimedia.org
[17:49:38] ^ although the lines are the other way round in my config, i guess those are the relevant ones
[17:50:55] jdlrobson: stat1003.eqiad.wmnet
[17:50:56] is the url
[17:51:00] hostname*
[17:51:06] not .wikimedia.or
[17:51:52] g
[17:52:49] jdlrobson: ^ change that and it should work?
[17:53:28] looking
[17:53:40] wooo!!
[17:53:41] it works
[18:03:04] qchris: I see you’ve shut down your qchris-* labs machines. are you planning on powering them back up at some point or are they deaaad
[18:03:41] YuviPanda: They would get powered up again in about 2 weeks or so.
[18:03:58] Would you prefer me to kill them, and recreate when needed?
[18:04:30] qchris: yeah, preferably in a project by themselves :D
[18:04:38] new projects are freeee!
[18:04:41] moar the better
[18:04:52] But they gonna die for good in ~1 month.
[18:05:01] hmm
[18:05:05] that’s fine too
[18:41:42] (PS4) Rush: phabricator: Change security_topic from "default: none" to "default: default" [puppet] - https://gerrit.wikimedia.org/r/179407 (owner: 20after4)
[18:42:34] (CR) Rush: [C: 2 V: 2] "tested and confirmed, it was confused before. where the default key was set to an option value and not a key.
seems reasonable and mostly" [puppet] - https://gerrit.wikimedia.org/r/179407 (owner: 20after4)
[18:53:30] (CR) Bartosz Dziewoński: "BEST CHRISTMAS GIFT <3" [puppet] - https://gerrit.wikimedia.org/r/179407 (owner: 20after4)
[18:53:43] (PS2) Rush: phabricator: strip Ubuntu 12.04 (precise) support [puppet] - https://gerrit.wikimedia.org/r/179882 (owner: Faidon Liambotis)
[18:59:34] !log running sync-common on virt1000
[18:59:36] Logged the message, Master
[19:00:26] (CR) Gergő Tisza: [C: 1] Make upload.wikimedia.org set Timing-Allow-Origin [puppet] - https://gerrit.wikimedia.org/r/181405 (owner: Unicodesnowman)
[19:04:40] !log deployed T85113
[19:04:42] Logged the message, Master
[19:10:06] PROBLEM - BGP status on cr1-ulsfo is CRITICAL: CRITICAL: No response from remote host 198.35.26.192
[19:15:41] RECOVERY - BGP status on cr1-ulsfo is OK: OK: host 198.35.26.192, sessions up: 10, down: 0, shutdown: 0
[19:17:45] !log reedy Purged l10n cache for 1.25wmf7
[19:18:07] !log reedy Purged l10n cache for 1.25wmf8
[19:18:25] !log reedy Purged l10n cache for 1.25wmf9
[19:19:09] !log reedy Purged l10n cache for 1.25wmf10
[19:20:14] !log reedy Purged l10n cache for 1.25wmf11
[19:22:07] (CR) Rush: [C: 2] "@faidon....I'm assuming since this is wholly Phab specific you are ok w/ me merging and babysitting?
Normally I don't like to merge other" [puppet] - https://gerrit.wikimedia.org/r/179882 (owner: Faidon Liambotis)
[19:22:35] (PS1) Reedy: Remove 1.25wmf2 through 1.25wmf4 [mediawiki-config] - https://gerrit.wikimedia.org/r/181431
[19:24:16] (PS2) Reedy: Remove 1.25wmf2 through 1.25wmf5 [mediawiki-config] - https://gerrit.wikimedia.org/r/181431
[19:26:47] (CR) Reedy: [C: 2] Remove 1.25wmf2 through 1.25wmf5 [mediawiki-config] - https://gerrit.wikimedia.org/r/181431 (owner: Reedy)
[19:26:52] (Merged) jenkins-bot: Remove 1.25wmf2 through 1.25wmf5 [mediawiki-config] - https://gerrit.wikimedia.org/r/181431 (owner: Reedy)
[19:27:12] greg-g: I'd like the deployment mantle... How are we coordinating this week?
[19:27:36] awight: There's no train this week
[19:28:10] Reedy: that makes sense. But when people such as yours truly need to do deployments, how do we avoid conflict?
[19:28:24] Put it on the calendar as usual I guess
[19:28:29] there is none :)
[19:28:33] I'll start something...
[19:46:29] There are HHVM errors on mediawiki-core... anyone recognize these? https://integration.wikimedia.org/ci/job/mediawiki-phpunit-hhvm/452/consoleFull
[19:46:44] It's all, "MWTidy::clean: error text return from HHVM tidy is not supported"
[19:47:57] Also--Korean?
[19:48:33] awight: ori- might know about them, he was working on that recently afaik
[19:48:34] silly joke, which makes Jenkins unusable?
[19:48:47] gwicke: ok thx!
[19:49:30] ori-: u know if this HHVM failure above is a known issue?
[19:49:40] I'm gonna override jenkins in this case...
[19:53:24] !log awight Synchronized php-1.25wmf12/extensions/CentralNotice: RecordImpression logging for CentralNotice hide cookie bug (duration: 00m 06s)
[19:53:30] Logged the message, Master
[19:53:35] awight: That thing is broken for at least a week now
[19:53:36] !log awight Synchronized php-1.25wmf13/extensions/CentralNotice: RecordImpression logging for CentralNotice hide cookie bug (duration: 00m 06s)
[19:53:37] ignore it...
[19:53:39] Logged the message, Master
[19:54:12] hoo: thanks for the news!
[19:54:27] Seems like I should just mark those tests as skipped...
[19:54:30] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:56:48] (PS1) Gergő Tisza: [WIP] Deploy Sentry on the beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/181439
[20:05:16] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: Puppet has 1 failures
[20:10:14] RECOVERY - check_puppetrun on boron is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[20:10:33] awight: I fixed the ui language for Jenkins. The trick is to go to https://integration.wikimedia.org/ci/configure, change Locale > Default Language from "en" to "en-US" (or vice versa) and hit "Save" (blue button). It's some strange bug that recurs semi-frequently with various languages taking over of the configured one.
[20:13:17] hah!
[20:49:32] PROBLEM - puppet last run on ms-fe2004 is CRITICAL: CRITICAL: puppet fail
[21:04:23] RECOVERY - puppet last run on ms-fe2004 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[21:14:23] Reedy: can we do a test server upload if wikimania2014 video ?
[21:14:34] *server side *of
[21:14:45] we could do.. :P
[21:15:41] Reedy: you can grab a file below 5GB encoding02.eqiad.wmflabs at /data/project/wikimania2014/ready/
[21:15:48] Where are they?
[21:16:13] on encoding02.eqiad.wmflabs at /data/project/wikimania2014/ready/
[21:16:24] e.g Evaluation_I_Metrics.webm
[21:16:47] Reedy: i guess you have access, if you don't i can add you
[21:16:54] I won't by default
[21:17:32] adding
[21:19:18] Reedy: done
[21:26:06] !log ran delete from localnames where ln_name="Nonoh" and ln_wiki="ruwiki" limit 1; on centralauth for https://phabricator.wikimedia.org/T85041
[21:26:10] Logged the message, Master
[21:51:25] Reedy: ?
[22:05:46] (PS1) Ori.livneh: Revert "admin: revoke ori's key temporarily" [puppet] - https://gerrit.wikimedia.org/r/181516
[22:06:02] _joe|meeting_: ^
[22:07:17] (PS2) Giuseppe Lavagetto: Revert "admin: revoke ori's key temporarily" [puppet] - https://gerrit.wikimedia.org/r/181516 (owner: Ori.livneh)
[22:07:38] (CR) Giuseppe Lavagetto: [C: 2] Revert "admin: revoke ori's key temporarily" [puppet] - https://gerrit.wikimedia.org/r/181516 (owner: Ori.livneh)
[22:08:02] (CR) Nuria: "Analytics team respectfully requests that if the user has opted out from sending data this header is not sent." [puppet] - https://gerrit.wikimedia.org/r/180812 (owner: Dr0ptp4kt)
[22:08:48] (CR) OliverKeyes: "That doesn't seem to be something we should be pushing to the varnish layer, however." [puppet] - https://gerrit.wikimedia.org/r/180812 (owner: Dr0ptp4kt)
[22:09:51] robh, is the edit policy on https://phabricator.wikimedia.org/T84072 a mistake?
[22:10:30] wmf nda is the edit
[22:10:33] that is not a mistake.
[22:10:48] it seems thats what the default is for the onsite queues for now
[22:11:17] (I agree it seems odd, i initially tried to change it to operations, but it caused issues mid-adjustment to phabricator on week 1 so i reverted it per mark's request)
[22:11:47] well, you didnt say it seemed odd, but i think it does =]
[22:16:10] robh, phabricator should not allow this really
[22:16:25] not allow what?
[22:16:28] edit by nda?
edit policy to be different from view policy
[22:16:42] uh, it absolutely should
[22:16:49] why?
[22:16:52] i dont mind folks seeing what my onsite folks do
[22:17:00] but under no reason should they be cluttering up tickets with questions
[22:17:11] but, its not my call, and i didnt make that decision
[22:17:20] and i have no idea if thats why it is
[22:17:22] but i happen to like it
[22:17:45] I filed a ticket against phabricator anyway.
[22:18:02] There is no reason for volunteer involvement in the day to day onsite technician ticketing and tasks for physical work.
[22:18:14] There is a ton of reason for volunteer involvement in like 90% of the rest of ops =]
[22:18:26] (so please dont take my stance on this as anti-volunteer)
[22:18:35] If it's visible, it must be editable.
[22:18:53] our puppet repo disagrees
[22:18:57] * matanya does take it as anti-volunteer approach :)
[22:19:10] this is a task, not a git repository
[22:21:17] speaking of which, robh what happened to maint announce in phab ?
[22:21:31] matanya: its not in phab yet
[22:21:36] its planned migration is later
[22:21:47] like access and procure?
[22:21:59] yep, but likely all three will go at different times
[22:22:11] i expect access will go first, but thats due to its nature involving other linked tickets
[22:22:15] and lots of folks
[22:22:23] and thats NOT official, just what i think will happen
[22:22:26] and private stuff
[22:22:36] yea, there is ongoing discussion on how to implement that
[22:22:52] "no, don't give him access, he is dumb!" :D
[22:22:58] we have to have the option of an out of band security discussion and still do better (than we do in RT) relaying the information back to the initial requestor
[22:23:27] yeah, that makes sense
[22:23:40] Krenair: So I think I understand your stance from a philosophical standpoint; I just disagree with it from a practical one.
[22:24:01] But not so strongly that if I turn out to be the only voice I'd tilt at that particular windmill ;]
[22:24:38] I created a task, hopefully other people will join in
[22:26:08] yep, i'll comment as well
[22:26:13] but i agree with sharing them
[22:26:25] case in point, im the one who set 90% of the onsite queue tickets to public so far ;D
[22:26:30] public view that is
[22:26:59] robh, your restriction doesn't even work.
[22:27:21] its not MY restriction
[22:27:26] It doesn't prevent questions, just prevents editing task description, priority and things
[22:27:27] let me be clear
[22:27:29] I didnt set any of this
[22:27:33] and I just started using it last week.
[22:28:02] Krenair: does it prevent commenting?
[22:28:08] No
[22:28:15] bah, i wish it did
[22:28:18] Pff.
[22:31:17] hey, if the rest of ops wants it open to everyone on earth to change our priorities, it'll change
[22:31:21] but i doubt that is going to be cool
[22:31:28] we use the priorities and set them on a team level
[22:32:07] Every other WMF team manages.
[22:33:25] hey Coren , how hard would it be for me to work on https://phabricator.wikimedia.org/T81543 without knowing wmf network setup ?
[22:35:38] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[22:38:47] matanya i am working on that one, will update the ticket later today :)
[22:38:56] and i'd welcome your help
[22:41:22] great timing, please do let me know what you need jgage
[22:41:41] thanks amigo
[23:00:16] PROBLEM - check_missing_thank_yous on db1025 is CRITICAL: CRITICAL missing_thank_yous=535 [critical =500]
[23:05:13] PROBLEM - check_missing_thank_yous on db1025 is CRITICAL: CRITICAL missing_thank_yous=535 [critical =500]
[23:10:15] PROBLEM - check_missing_thank_yous on db1025 is CRITICAL: CRITICAL missing_thank_yous=534 [critical =500]
[23:15:19] PROBLEM - check_missing_thank_yous on db1025 is CRITICAL: CRITICAL missing_thank_yous=534 [critical =500]
[23:20:23] PROBLEM - check_missing_thank_yous on db1025 is CRITICAL: CRITICAL missing_thank_yous=534 [critical =500]
[23:25:27] PROBLEM - check_missing_thank_yous on db1025 is CRITICAL: CRITICAL missing_thank_yous=534 [critical =500]
[23:30:17] PROBLEM - check_missing_thank_yous on db1025 is CRITICAL: CRITICAL missing_thank_yous=534 [critical =500]
[23:33:05] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[23:35:14] PROBLEM - check_missing_thank_yous on db1025 is CRITICAL: CRITICAL missing_thank_yous=534 [critical =500]
[23:36:07] (PS1) Springle: add db2030 to m1 temporarily for T85150 [puppet] - https://gerrit.wikimedia.org/r/181524
[23:37:59] (CR) Springle: [C: 2] add db2030 to m1 temporarily for T85150 [puppet] - https://gerrit.wikimedia.org/r/181524 (owner: Springle)
[23:38:28] springle: woo :)
[23:39:32] JohnLewis: seemed easiest way
[23:39:44] will take ~hour for cloning
[23:39:48] springle: hey - if it works, who cares :P
[23:40:11] I don't think that will matter as Daniel won't be here for a while
[23:40:18] ah np then
[23:40:50] PROBLEM - check_missing_thank_yous on db1025 is CRITICAL: CRITICAL missing_thank_yous=534 [critical =500]
[23:43:02] !log xtrabackup clone db2010 to db2030
[23:43:05] Logged the message, Master
[23:45:18] PROBLEM - check_missing_thank_yous on db1025 is CRITICAL: CRITICAL missing_thank_yous=534 [critical =500]
[23:50:12] PROBLEM - check_missing_thank_yous on db1025 is CRITICAL: CRITICAL missing_thank_yous=534 [critical =500]
[23:53:26] ACKNOWLEDGEMENT - check_missing_thank_yous on db1025 is CRITICAL: CRITICAL missing_thank_yous=534 [critical =500] Jeff_Green awight is looking into it
[23:56:44] springle, we going to have this done for phabricator as well?
[23:58:04] Krenair: i'm missing something, sorry. "this done" ?