[00:18:30] PROBLEM - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out [00:20:28] RECOVERY - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 17511 bytes in 0.055 second response time [00:29:37] odder: what's up? [00:36:07] 6operations, 10Deployment-Systems, 5Patch-For-Review: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1428520 (10Krenair) Shall we open a separate ticket against scap for that? [00:43:00] ori: Not sure exactly. [00:43:23] ori: I blocked a user with the hide option to hide their user name from the log, and ended up suppressing all (?) edits made by them [00:43:32] which isn't what I thought I would achieve [00:43:43] although hoo tells me this is what the code does in fact [00:44:33] so I have in effect suppressed edits without meaning to do so, which is... interesting [00:45:09] so I think it's a failure in terms of documentation and manuals, which don't say anywhere this is actually what you're gonna get when you hide-block a user on a wiki [00:46:45] OK, but nothing is on fire, then [00:47:17] not making light of the problem, just wondering if I should drop things and help out or go back to my weekend [00:47:30] opting for the latter :) Do file a bug if you think the behavior is unreasonable though. [00:52:26] 6operations, 10Deployment-Systems, 5Patch-For-Review: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1428539 (10Krenair) [00:52:44] ori: Yes, definitely do go back [00:52:51] I'll file a bug in the morning UTC time [00:53:03] (loosely defined) [00:56:52] 7Puppet, 6Labs, 6Phabricator: Create puppet role for Phabricator hosted repo testing - https://phabricator.wikimedia.org/T104827#1428544 (10Negative24) 3NEW [00:56:58] PROBLEM - LVS HTTPS IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out [00:57:22] 7Puppet, 6Labs, 6Phabricator: Create puppet role for Phabricator hosted repo testing - https://phabricator.wikimedia.org/T104827#1428551 (10Negative24) a:3Negative24 [00:57:39] PROBLEM - puppet last run on oxygen is CRITICAL Puppet has 1 failures [00:57:43] 7Puppet, 6Labs, 6Phabricator: Create puppet role for Phabricator hosted repo testing - https://phabricator.wikimedia.org/T104827#1428544 (10Negative24) [00:58:18] 7Puppet, 6Labs, 6Phabricator: Create puppet role for Phabricator hosted repo testing - https://phabricator.wikimedia.org/T104827#1428544 (10Negative24) [00:58:49] RECOVERY - LVS HTTPS IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 70201 bytes in 1.095 second response time [01:01:37] (03PS1) 10Negative24: [WIP] Phabricator: Create differential puppet role [puppet] - 10https://gerrit.wikimedia.org/r/222987 (https://phabricator.wikimedia.org/T104827) [01:02:35] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Phabricator: Create differential puppet role [puppet] - 10https://gerrit.wikimedia.org/r/222987 (https://phabricator.wikimedia.org/T104827) (owner: 10Negative24) [01:05:54] (03PS2) 10Negative24: [WIP] Phabricator: Create differential puppet role [puppet] - 10https://gerrit.wikimedia.org/r/222987 (https://phabricator.wikimedia.org/T104827) [01:06:28] heh, I'm so good at writing puppet... 
[01:06:34] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Phabricator: Create differential puppet role [puppet] - 10https://gerrit.wikimedia.org/r/222987 (https://phabricator.wikimedia.org/T104827) (owner: 10Negative24) [01:08:18] PROBLEM - LVS HTTP IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out [01:10:14] (03PS3) 10Negative24: [WIP] Phabricator: Create differential puppet role [puppet] - 10https://gerrit.wikimedia.org/r/222987 (https://phabricator.wikimedia.org/T104827) [01:10:58] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Phabricator: Create differential puppet role [puppet] - 10https://gerrit.wikimedia.org/r/222987 (https://phabricator.wikimedia.org/T104827) (owner: 10Negative24) [01:11:49] RECOVERY - LVS HTTP IPv6 on text-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 301 TLS Redirect - 495 bytes in 3.009 second response time [01:12:40] RECOVERY - puppet last run on oxygen is OK Puppet is currently enabled, last run 3 seconds ago with 0 failures [01:14:51] (03PS4) 10Negative24: [WIP] Phabricator: Create differential puppet role [puppet] - 10https://gerrit.wikimedia.org/r/222987 (https://phabricator.wikimedia.org/T104827) [01:17:35] (03CR) 10Negative24: "Now that we're past the stupid mistake stage..." [puppet] - 10https://gerrit.wikimedia.org/r/222987 (https://phabricator.wikimedia.org/T104827) (owner: 10Negative24) [01:19:18] (03CR) 10Negative24: "Adding a few reviewers to critique..." [puppet] - 10https://gerrit.wikimedia.org/r/222987 (https://phabricator.wikimedia.org/T104827) (owner: 10Negative24) [01:52:06] (03CR) 10BBlack: [C: 032] tlsproxy: add negotiated cipher to conn props [puppet] - 10https://gerrit.wikimedia.org/r/222842 (owner: 10BBlack) [01:55:38] PROBLEM - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out [01:57:29] RECOVERY - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 301 TLS Redirect - 496 bytes in 0.004 second response time [02:09:29] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds [02:11:09] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60557 bytes in 0.093 second response time [02:18:54] !log l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 06m 07s) [02:19:01] Logged the message, Master [02:22:12] !log LocalisationUpdate completed (1.26wmf12) at 2015-07-06 02:22:12+00:00 [02:22:17] Logged the message, Master [04:35:45] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Jul 6 04:35:45 UTC 2015 (duration 35m 44s) [04:35:49] Logged the message, Master [04:43:17] (03PS1) 10Springle: repool db1034; depool db1041 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222989 [04:45:42] 6operations: Firewall configurations for database hosts - https://phabricator.wikimedia.org/T104699#1428594 (10Springle) Wasn't sure if you wanted to change that process :) 4444 would be fine. All sounds good, then. [04:50:11] 6operations, 7Database: codfw frontends cannot connect to mysql at db2029 - https://phabricator.wikimedia.org/T104573#1428596 (10Springle) A bunch of "unauthenticated user" in processlist still makes me suspect the thread pool, since that symptom has been seen on prod slaves with thread_pool_size=16 (but not t... 
[04:53:14] (03CR) 10Springle: [C: 032] repool db1034; depool db1041 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222989 (owner: 10Springle) [04:53:20] (03Merged) 10jenkins-bot: repool db1034; depool db1041 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222989 (owner: 10Springle) [04:55:59] legoktm: any info on CommonSettings.php, modified on tin? (only asking as your uid is on it, but perhaps that was from a stash?) [05:00:27] !log stash/pull/apply CommonSettings.php on tin, which was left with modifications [05:00:32] Logged the message, Master [05:01:27] !log springle Synchronized wmf-config/db-eqiad.php: repool db1034, depool db1041 (duration: 00m 12s) [05:01:31] Logged the message, Master [05:03:30] Hi folks! [05:03:37] Any active here? [05:04:45] https://www.mediawiki.org/w/index.php?title=MediaWiki_1.26/wmf12/Changelog [05:05:16] (03PS1) 10KartikMistry: CX: Enable ContentTranslation in enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222991 [05:05:20] All wikimedia wikis have wmf12 version, but the changelog is empty! [05:13:29] 6operations, 7discovery-system: conftools: hostname creation validation, set != create - https://phabricator.wikimedia.org/T104574#1428609 (10Joe) p:5Triage>3Normal [05:23:02] (03PS2) 10KartikMistry: CX: Enable ContentTranslation in enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222991 (https://phabricator.wikimedia.org/T94123) [05:42:53] (03PS3) 10Ori.livneh: misc-varnish: allow HTTP DELETE (for deleting dashboards in tessera) [puppet] - 10https://gerrit.wikimedia.org/r/222542 [05:54:08] PROBLEM - puppet last run on mw1211 is CRITICAL Puppet has 1 failures [05:55:43] (03CR) 10BBlack: [C: 031] misc-varnish: allow HTTP DELETE (for deleting dashboards in tessera) [puppet] - 10https://gerrit.wikimedia.org/r/222542 (owner: 10Ori.livneh) [05:56:27] (03CR) 10Ori.livneh: [C: 032] misc-varnish: allow HTTP DELETE (for deleting dashboards in tessera) [puppet] - 10https://gerrit.wikimedia.org/r/222542 (owner: 10Ori.livneh) [06:08:49] RECOVERY - puppet last run on mw1211 is OK Puppet is currently enabled, last run 46 seconds ago with 0 failures [06:24:00] springle: hmm, shouldn't be me...lemme see what the diff is [06:24:47] springle: not me. since it's image scaler related, maybe ori or _joe_? (dirty CommonSettings.php on tin) [06:25:05] oh could be me, let me see [06:25:27] yes, that was me. reverted, sorry. 
[06:29:32] tcp slow start impact on page load, with enwiki's [[white house]] as an example: https://www.youtube.com/watch?v=C8orjQLacTo [06:33:09] PROBLEM - puppet last run on cp4014 is CRITICAL Puppet has 1 failures [06:35:48] PROBLEM - puppet last run on db2065 is CRITICAL Puppet has 1 failures [06:35:49] PROBLEM - puppet last run on db1015 is CRITICAL Puppet has 1 failures [06:36:09] PROBLEM - puppet last run on db1021 is CRITICAL Puppet has 1 failures [06:39:09] PROBLEM - puppet last run on mw1228 is CRITICAL Puppet has 1 failures [06:39:49] PROBLEM - puppet last run on mw2093 is CRITICAL Puppet has 1 failures [06:39:50] PROBLEM - puppet last run on mw1092 is CRITICAL Puppet has 1 failures [06:40:19] PROBLEM - puppet last run on mw1170 is CRITICAL Puppet has 1 failures [06:40:20] PROBLEM - puppet last run on mw1046 is CRITICAL Puppet has 1 failures [06:41:08] PROBLEM - puppet last run on mw1061 is CRITICAL Puppet has 1 failures [06:41:39] PROBLEM - puppet last run on mw2096 is CRITICAL Puppet has 1 failures [06:41:40] PROBLEM - puppet last run on mw2003 is CRITICAL Puppet has 1 failures [06:41:49] PROBLEM - puppet last run on mw2206 is CRITICAL Puppet has 1 failures [06:46:09] RECOVERY - puppet last run on cp4014 is OK Puppet is currently enabled, last run 2 seconds ago with 0 failures [06:46:58] RECOVERY - puppet last run on db1015 is OK Puppet is currently enabled, last run 31 seconds ago with 0 failures [06:47:18] RECOVERY - puppet last run on db1021 is OK Puppet is currently enabled, last run 32 seconds ago with 0 failures [06:47:48] RECOVERY - puppet last run on mw1046 is OK Puppet is currently enabled, last run 3 seconds ago with 0 failures [06:48:29] RECOVERY - puppet last run on mw1228 is OK Puppet is currently enabled, last run 57 seconds ago with 0 failures [06:48:29] RECOVERY - puppet last run on mw1061 is OK Puppet is currently enabled, last run 25 seconds ago with 0 failures [06:48:48] RECOVERY - puppet last run on db2065 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:49:09] RECOVERY - puppet last run on mw2003 is OK Puppet is currently enabled, last run 6 seconds ago with 0 failures [06:49:09] RECOVERY - puppet last run on mw1092 is OK Puppet is currently enabled, last run 18 seconds ago with 0 failures [06:49:29] RECOVERY - puppet last run on mw1170 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:50:49] RECOVERY - puppet last run on mw2093 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:50:50] RECOVERY - puppet last run on mw2096 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:50:59] RECOVERY - puppet last run on mw2206 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:51:05] (03PS5) 10Tobias Gritschacher: Add Phragile module. [puppet] - 10https://gerrit.wikimedia.org/r/218930 (https://phabricator.wikimedia.org/T101235) (owner: 10Jakob) [07:42:24] 6operations, 7HHVM: HHVM memory leaks result in OOMs & 500 spikes - https://phabricator.wikimedia.org/T104769#1428716 (10Joe) Re-doing the same analysis we've done for T99525 I found that little to nothing has changed: - On an standard appserver I see: ``` -------------------- /tmp/heaps/1.heap => /tmp/heaps/... 
[07:47:44] 6operations, 7HHVM: HHVM memory leaks result in OOMs & 500 spikes - https://phabricator.wikimedia.org/T104769#1428742 (10Joe) a:3Joe [07:51:34] (03CR) 10Alexandros Kosiaris: [C: 031] static-bugzilla: update Apache config for 2.4 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/222692 (owner: 10Dzahn) [07:54:17] 6operations, 7HHVM: HHVM memory leaks result in OOMs & 500 spikes - https://phabricator.wikimedia.org/T104769#1428774 (10Joe) This surely looks like something doesn't work well within HHVM - in the code for StribngData::MakeUncounted I see the following comment: ``` // create either a static or an uncounted st... [07:58:49] 6operations, 10Traffic, 10fundraising-tech-ops, 5Patch-For-Review: Decide what to do with *.donate.wikimedia.org subdomain + TLS - https://phabricator.wikimedia.org/T102827#1428781 (10Krinkle) >>! In T102827#1374685, @faidon wrote: > Moreover, I'd like to ask if there is any point of having all of the dona... [07:59:02] 6operations, 7Database: Replicate the Phabricator database to labsdb - https://phabricator.wikimedia.org/T52422#1428787 (10jcrespo) [08:08:48] PROBLEM - Cassandra database on restbase1004 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (cassandra), command name java, args CassandraDaemon [08:08:49] PROBLEM - Cassanda CQL query interface on restbase1004 is CRITICAL: Connection refused [08:28:49] PROBLEM - puppet last run on mw2112 is CRITICAL Puppet has 1 failures [08:30:09] (03CR) 10Santhosh: [C: 031] CX: Enable ContentTranslation in enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222991 (https://phabricator.wikimedia.org/T94123) (owner: 10KartikMistry) [08:30:49] RECOVERY - Cassandra database on restbase1004 is OK: PROCS OK: 1 process with UID = 113 (cassandra), command name java, args CassandraDaemon [08:31:07] <_joe_> !log restarted cassandra on rb1004. again. [08:31:10] Logged the message, Master [08:32:39] RECOVERY - Cassanda CQL query interface on restbase1004 is OK: TCP OK - 0.001 second response time on port 9042 [08:43:17] 6operations, 7HHVM: HHVM memory leaks result in OOMs & 500 spikes - https://phabricator.wikimedia.org/T104769#1428960 (10Joe) I just noticed we don't set hhvm.server.apc.expire_on_sets which defaults to false... this is most probably the culprit here. [08:43:26] * _joe_ headdesks ^^ [08:43:39] RECOVERY - puppet last run on mw2112 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [08:43:47] <_joe_> paravoid: I'm not /sure/ this will solve the issues, but I'm pretty confident it will [08:47:59] 6operations, 10Continuous-Integration-Infrastructure: Provide Jessie package to fullfil Mediawiki::Packages requirement - https://phabricator.wikimedia.org/T95002#1428988 (10hashar) [08:48:02] 6operations, 10Continuous-Integration-Infrastructure, 6Multimedia: Investigate impact of switching from ffmpeg to libav (ffmpeg is not in Jessie) - https://phabricator.wikimedia.org/T103335#1428989 (10hashar) [08:51:53] (03PS1) 10Hashar: Use libav instead of ffmpeg on Jessie [puppet] - 10https://gerrit.wikimedia.org/r/222999 (https://phabricator.wikimedia.org/T95002) [08:51:56] (03PS1) 10Giuseppe Lavagetto: hhvm: enable apc items expiration on the canary appservers [puppet] - 10https://gerrit.wikimedia.org/r/223000 (https://phabricator.wikimedia.org/T104769) [08:53:56] _joe_: maybe you want to do the same on beta cluster? 
[08:54:19] <_joe_> hashar: yes, but it won't show any difference [08:55:46] (03CR) 10Hashar: "Cherry picked on CI puppet master:" [puppet] - 10https://gerrit.wikimedia.org/r/222999 (https://phabricator.wikimedia.org/T95002) (owner: 10Hashar) [08:56:28] (03CR) 10Giuseppe Lavagetto: "this setting is used here:" [puppet] - 10https://gerrit.wikimedia.org/r/223000 (https://phabricator.wikimedia.org/T104769) (owner: 10Giuseppe Lavagetto) [08:57:00] <_joe_> hashar: I'm pretty sure this will solve our problems of "leaking" memory [08:58:09] _joe_: what I meant is that beta probably suffers from the same issue [08:58:29] <_joe_> hashar: yes, but you'll never notice it in practice [08:58:47] <_joe_> it takes one week and something of production traffic to make this OOM an appserver [08:58:55] ah ok [08:59:01] <_joe_> you maybe get 1 day of prod traffic in 10 years on beta :P [09:01:34] 6operations, 10Continuous-Integration-Infrastructure, 5Patch-For-Review: Provide Jessie package to fullfil Mediawiki::Packages requirement - https://phabricator.wikimedia.org/T95002#1429013 (10hashar) [09:01:59] <_joe_> !log restarted the appserver on mw1059 with hhvm.server.apc.expire_on_sets = true, restarted the heap profiling to confirm my hypothesis on T104769 [09:02:03] Logged the message, Master [09:02:21] 6operations, 10Continuous-Integration-Infrastructure, 6Multimedia, 5Patch-For-Review: Investigate impact of switching from ffmpeg to libav (ffmpeg is not in Jessie) - https://phabricator.wikimedia.org/T103335#1429015 (10MoritzMuehlenhoff) For jessie I recommend we use a backport of ffmpeg 2.7.1 as current... [09:07:50] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 50.00% of data above the critical threshold [500.0] [09:08:35] <_joe_> seems worse than usual ^^ [09:10:26] <_joe_> it [09:10:35] <_joe_> *it's some bot bugging us [09:11:39] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL Anomaly detected: 12 data above and 0 below the confidence bounds [09:28:19] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [09:33:45] moritzm: https://phabricator.wikimedia.org/T103335 <-- did you mean libav --> ffmpeg in you comment ? [09:46:42] 6operations, 10Continuous-Integration-Infrastructure, 6Multimedia, 5Patch-For-Review: Investigate impact of switching from ffmpeg to libav (ffmpeg is not in Jessie) - https://phabricator.wikimedia.org/T103335#1429108 (10MoritzMuehlenhoff) >>! In T103335#1429015, @MoritzMuehlenhoff wrote: > In Debian the De... [09:46:52] matanya: yeah, just corrected [09:47:04] thanks [09:47:06] (for the second section) [10:13:01] (03CR) 10JanZerebecki: Add Phragile module. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/218930 (https://phabricator.wikimedia.org/T101235) (owner: 10Jakob) [10:25:22] 6operations, 6Phabricator, 6Security: Phabricator dependence on wmfusercontent.org - https://phabricator.wikimedia.org/T104730#1429135 (10faidon) What kind of Javascript do you see loaded from wmfusercontent.org? Could you give an example? I see multiple resources loaded from there myself. It is unfortunat... [10:26:35] 6operations, 10Continuous-Integration-Infrastructure, 6Multimedia, 5Patch-For-Review: Investigate impact of switching from ffmpeg to libav (ffmpeg is not in Jessie) - https://phabricator.wikimedia.org/T103335#1429138 (10Bawolff) Umm, we already use libav. In debian (unless you go back far enough), the ffmp... 
[10:40:02] 6operations, 6Phabricator, 6Security: Phabricator dependence on wmfusercontent.org - https://phabricator.wikimedia.org/T104730#1429158 (10jcrespo) Just FYI (I am not worried about this), the ones I can see being loaded on a sample uncached request to this very same ticket: https://phab.wmfusercontent.org/re... [10:41:48] 6operations, 6Phabricator, 6Security: Phabricator dependence on wmfusercontent.org - https://phabricator.wikimedia.org/T104730#1429160 (10Aklapper) > Alternatively, the domain should redirect to a valid website That aspect is kind of covered in {T104735} [10:44:19] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK No anomaly detected [11:16:39] PROBLEM - Cassanda CQL query interface on restbase1001 is CRITICAL: Connection refused [11:16:58] PROBLEM - Cassandra database on restbase1001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 111 (cassandra), command name java, args CassandraDaemon [11:30:04] hoo: Respected human, time to deploy WikibaseQuality/Constraints to wikidata (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150706T1130). Please do the needful. [11:33:39] PROBLEM - puppet last run on mw1042 is CRITICAL Puppet has 1 failures [11:33:58] (03PS1) 10Hoo man: Enable WikibaseQuality and WikibaseQualityConstraints on wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223011 (https://phabricator.wikimedia.org/T99351) [11:40:29] !log Created the `wbqc_constraints` table on wikidatawiki [11:40:34] Logged the message, Master [11:44:48] (03CR) 10JanZerebecki: [C: 031] Enable WikibaseQuality and WikibaseQualityConstraints on wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223011 (https://phabricator.wikimedia.org/T99351) (owner: 10Hoo man) [11:48:29] RECOVERY - puppet last run on mw1042 is OK Puppet is currently enabled, last run 58 seconds ago with 0 failures [11:49:15] !log hoo Started scap: Update WikibaseQuality and WikibaseQualityConstraint [11:49:18] Logged the message, Master [12:01:25] (03PS1) 10Krinkle: varnish: Update default varnish error page [puppet] - 10https://gerrit.wikimedia.org/r/223012 [12:07:04] (03PS2) 10Hashar: Remove Gerrit replication to lanthanum.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/222595 (https://phabricator.wikimedia.org/T86658) [12:07:55] anyone could please merge a Gerrit config change for me? 
It is to disable replication of git repos on an host we are dismantling https://gerrit.wikimedia.org/r/#/c/222595/ [12:08:13] when puppet apply the conf change, it will reload/restart Gerrit automatically [12:11:36] * mobrovac on cassandra issue on rb1001 [12:15:12] !log hoo Finished scap: Update WikibaseQuality and WikibaseQualityConstraint (duration: 25m 56s) [12:15:16] Logged the message, Master [12:15:43] (03PS2) 10Yuvipanda: labstore: Minor code cleanup of the exports daemon [puppet] - 10https://gerrit.wikimedia.org/r/222690 [12:15:50] (03CR) 10Yuvipanda: [C: 032 V: 032] labstore: Minor code cleanup of the exports daemon [puppet] - 10https://gerrit.wikimedia.org/r/222690 (owner: 10Yuvipanda) [12:16:09] RECOVERY - Cassandra database on restbase1001 is OK: PROCS OK: 1 process with UID = 111 (cassandra), command name java, args CassandraDaemon [12:16:38] (03CR) 10Hoo man: [C: 032] Enable WikibaseQuality and WikibaseQualityConstraints on wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223011 (https://phabricator.wikimedia.org/T99351) (owner: 10Hoo man) [12:16:44] (03Merged) 10jenkins-bot: Enable WikibaseQuality and WikibaseQualityConstraints on wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223011 (https://phabricator.wikimedia.org/T99351) (owner: 10Hoo man) [12:17:39] RECOVERY - Cassanda CQL query interface on restbase1001 is OK: TCP OK - 0.000 second response time on port 9042 [12:18:12] !log hoo Synchronized wmf-config/: Enable WikibaseQuality and WikibaseQualityConstraints on wikidata (duration: 00m 13s) [12:18:16] Logged the message, Master [12:21:21] (03PS1) 10Yuvipanda: labstore: Fix stupid typo [puppet] - 10https://gerrit.wikimedia.org/r/223013 [12:21:34] (03CR) 10Yuvipanda: [C: 032 V: 032] labstore: Fix stupid typo [puppet] - 10https://gerrit.wikimedia.org/r/223013 (owner: 10Yuvipanda) [12:28:16] 6operations, 5Patch-For-Review: puppetmaster self: Could not find dependency Class[Role::Access_new_install] for File[/usr/local/sbin/install-console] at /etc/puppet/modules/puppetmaster/manifests/scripts.pp:62 - https://phabricator.wikimedia.org/T103499#1429434 (10hashar) 5Open>3Resolved a:3hashar Appar... [12:36:40] As of 30 minutes ago there is a 1000x spike in memcached errors, mostly from zh wiki [12:36:44] https://logstash.wikimedia.org/#/dashboard/elasticsearch/memcached [12:37:08] All about the same key zhwiki:preprocess-hash:* [12:37:24] It also seems none of the servers can access db1033:lag_times. 
[12:38:24] (not sure the /topic duty is still used, but ping paravoid if so) [12:40:32] 6operations: Add --no-autoloader_layout-check to operations-puppet-puppetlint-lenient - https://phabricator.wikimedia.org/T75117#1429470 (10hashar) [12:48:02] 6operations, 10RESTBase, 6Services, 7RESTBase-API: Expose RESTBase monitoring examples in Swagger spec - https://phabricator.wikimedia.org/T104850#1429505 (10mobrovac) 3NEW a:3Pchelolo [12:48:23] (03PS1) 10Yuvipanda: [WIP] labstore: Consolidate NFS exporting into daemon [puppet] - 10https://gerrit.wikimedia.org/r/223020 [12:49:30] 6operations, 10RESTBase, 6Services, 7RESTBase-API: Expose RESTBase monitoring examples in Swagger spec - https://phabricator.wikimedia.org/T104850#1429520 (10mobrovac) [12:49:33] 6operations, 6Services, 7Service-Architecture: Set up monitoring automation for services - https://phabricator.wikimedia.org/T94821#1429521 (10mobrovac) [12:49:49] 6operations, 10Continuous-Integration-Infrastructure, 7HHVM: HHVM Jenkins job throw: Unable to set CoreFileSize to 8589934592: Operation not permitted (1) - https://phabricator.wikimedia.org/T78799#1429522 (10hashar) [12:57:12] !log installed python security updates on mw*, es* and db* [12:57:16] Logged the message, Master [13:11:54] 6operations, 10RESTBase, 6Services, 7RESTBase-API: Expose RESTBase monitoring examples in Swagger spec - https://phabricator.wikimedia.org/T104850#1429576 (10mobrovac) [13:15:32] hashar: you might want to add to the phab tickets blocked by ops the "blocked on operations" project [13:15:37] 6operations, 6Phabricator, 6Security: Phabricator dependence on wmfusercontent.org - https://phabricator.wikimedia.org/T104730#1429586 (10BBlack) Yeah it sounds to me like these are assets that should be moved to the primary phabricator site... [13:19:13] (03PS2) 10Yuvipanda: [WIP] labstore: Consolidate NFS exporting into daemon [puppet] - 10https://gerrit.wikimedia.org/r/223020 [13:19:38] (03PS3) 10Yuvipanda: [WIP] labstore: Consolidate NFS exporting into daemon [puppet] - 10https://gerrit.wikimedia.org/r/223020 [13:19:50] (03CR) 10BBlack: [C: 031] varnish: Update default varnish error page [puppet] - 10https://gerrit.wikimedia.org/r/223012 (owner: 10Krinkle) [13:20:34] oh, error page! [13:20:41] this is awesome [13:21:02] * YuviPanda sets some designers on paravoid [13:21:09] yeah I would've just merged, but figured maybe some other reviewers want a chance to stare at it first [13:21:33] (03CR) 10Faidon Liambotis: "See also T76560." [puppet] - 10https://gerrit.wikimedia.org/r/223012 (owner: 10Krinkle) [13:23:56] (03CR) 10He7d3r: [C: 031] varnish: Update default varnish error page [puppet] - 10https://gerrit.wikimedia.org/r/223012 (owner: 10Krinkle) [13:24:11] ^nice [13:24:25] even if it doesn't do all the other fancy things, it's an improvement on the status quo :) [13:24:29] yes :) [13:24:32] definitely! [13:24:58] Yeah, I agree we can do lots of interesting things, but I'd rather start by unifying what we have to something more minimal. [13:25:08] That will also make it easier to change and redesign in the future [13:25:18] hashar: regarding: "3 patches related to linting gdnsd config file on Jessie:" I had previously pinged you that the changes need a rebase and don't rebase cleanly [13:25:37] hashar: and since you probably have done so already at the integration puppetmaster, I was hoping we'd avoid the roundtrip [13:26:00] I am doubtful about what projects to assign T104853 to? #Mediawiki-databases? [13:28:16] paravoid: ah sorry. 
Looks like I missed the notifications :-/ [13:31:19] bblack: ooooh, nice [13:31:27] now I have to fix the one I built for tool labs as well :D [13:31:39] cmjohnson1: good morning! :) [13:31:49] RECOVERY - Kafka Broker Messages In on analytics1021 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 3032.83028606 [13:32:34] (03PS4) 10Hashar: contint: authdns::lint on light Jessie slave [puppet] - 10https://gerrit.wikimedia.org/r/217467 (https://phabricator.wikimedia.org/T98003) [13:32:36] (03PS3) 10Hashar: contint: do not install zuul on light slaves [puppet] - 10https://gerrit.wikimedia.org/r/217476 (https://phabricator.wikimedia.org/T94836) [13:32:38] (03PS3) 10Hashar: contint: role::ci::slave::labs::light [puppet] - 10https://gerrit.wikimedia.org/r/217466 (https://phabricator.wikimedia.org/T94836) [13:33:16] paravoid: they needed a trivial rebase. Not sure why Gerrit flagged as not mergeable. Anyway I rebased them [13:33:20] 7Blocked-on-Operations, 6operations, 10Continuous-Integration-Infrastructure, 7Jenkins: Please refresh Jenkins package on apt.wikimedia.org to 1.609.1 - https://phabricator.wikimedia.org/T103343#1429630 (10faidon) 5Open>3Resolved a:3faidon Done! [13:33:25] 6operations, 10ops-eqiad, 10Analytics-Cluster: analytics1020 down - https://phabricator.wikimedia.org/T104856#1429635 (10Ottomata) 3NEW a:3Cmjohnson [13:33:41] (03CR) 10Faidon Liambotis: [C: 032] contint: role::ci::slave::labs::light [puppet] - 10https://gerrit.wikimedia.org/r/217466 (https://phabricator.wikimedia.org/T94836) (owner: 10Hashar) [13:33:56] 6operations, 10ops-eqiad, 10Analytics-Cluster: analytics1020 down - https://phabricator.wikimedia.org/T104856#1429646 (10Ottomata) Related: T95263 [13:33:56] (03CR) 10Faidon Liambotis: [C: 032] contint: authdns::lint on light Jessie slave [puppet] - 10https://gerrit.wikimedia.org/r/217467 (https://phabricator.wikimedia.org/T98003) (owner: 10Hashar) [13:34:06] (03CR) 10Faidon Liambotis: [C: 032] contint: do not install zuul on light slaves [puppet] - 10https://gerrit.wikimedia.org/r/217476 (https://phabricator.wikimedia.org/T94836) (owner: 10Hashar) [13:34:25] sorry, I missed your IRC notification :( [13:34:33] no, my bad [13:34:38] (03PS3) 10Faidon Liambotis: Remove Gerrit replication to lanthanum.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/222595 (https://phabricator.wikimedia.org/T86658) (owner: 10Hashar) [13:34:44] (03CR) 10Faidon Liambotis: [C: 032 V: 032] Remove Gerrit replication to lanthanum.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/222595 (https://phabricator.wikimedia.org/T86658) (owner: 10Hashar) [13:36:27] paravoid: you have techops meeting today ? [13:36:31] we do! [13:36:48] paravoid: please make sure https://gerrit.wikimedia.org/r/221968 is broght up [13:37:33] and probably https://gerrit.wikimedia.org/r/222255 too [13:40:49] matanya: Rob's already on that [13:40:58] ah, ok [13:41:02] as he's on duty this week so :) [13:42:13] ah, monday, new on duty. 
should change topic from paravoid to robh [13:42:28] bd808: I ended up spending most of the weekend in random fbos waiting for weather to pass [13:42:42] wrong channel, ffs [13:43:47] matanya: they do usually around meeting time I've seen or whenever the guy gets here :p [13:44:01] yeah, i know [13:45:33] (03PS4) 10Yuvipanda: labstore: Consolidate NFS exporting into daemon [puppet] - 10https://gerrit.wikimedia.org/r/223020 [13:47:45] matanya: btw, https://github.com/subrosa-io/subrosa-server and subrosa.io (I think you were experimenting with conference solutions) [13:48:33] ah, yeah YuviPanda thanks. played with it a bit in the past [13:51:19] PROBLEM - puppet last run on mw1166 is CRITICAL Puppet has 1 failures [13:51:51] (03PS5) 10Yuvipanda: labstore: Consolidate NFS exporting into daemon [puppet] - 10https://gerrit.wikimedia.org/r/223020 [13:52:59] 6operations, 10Wikimedia-Mailing-lists: Blacklist badoo.com globally (★ fake emails and other spam) - https://phabricator.wikimedia.org/T48021#1429687 (10JohnLewis) Adding operations as this is asking for a central blacklist involving exim. [13:53:05] (03PS6) 10Yuvipanda: labstore: Consolidate NFS exporting into daemon [puppet] - 10https://gerrit.wikimedia.org/r/223020 [13:53:15] paravoid: ^ you might like to weigh in there [13:54:50] 6operations, 10Wikimedia-Mailing-lists: Ban *@utdliving.com from sending any email to the mailman server - https://phabricator.wikimedia.org/T68318#1429694 (10JohnLewis) Asking for a central blacklist as with T48021. [13:54:53] (03PS7) 10Yuvipanda: labstore: Consolidate NFS exporting into daemon [puppet] - 10https://gerrit.wikimedia.org/r/223020 [13:55:17] (03PS8) 10Yuvipanda: labstore: Consolidate NFS exporting into daemon [puppet] - 10https://gerrit.wikimedia.org/r/223020 [13:55:23] (03CR) 10Yuvipanda: [C: 032 V: 032] labstore: Consolidate NFS exporting into daemon [puppet] - 10https://gerrit.wikimedia.org/r/223020 (owner: 10Yuvipanda) [14:02:03] 6operations, 10Gather, 10MobileFrontend, 7HHVM, and 2 others: [facebook/hhvm] Incorrect return value from eval, Closure generated in first eval pass is returned in the second eval pass #5502 - https://phabricator.wikimedia.org/T102937#1429708 (10phuedx) What news @Joe? [14:02:12] 6operations, 5Continuous-Integration-Isolation, 5Patch-For-Review: Backport python-diskimage-builder 0.1.46 from testing to jessie-wikimedia - https://phabricator.wikimedia.org/T102880#1429709 (10MoritzMuehlenhoff) a:3MoritzMuehlenhoff [14:03:13] 6operations, 10Wikimedia-Mailing-lists: Let public archives be indexed and archived - https://phabricator.wikimedia.org/T90407#1429714 (10JohnLewis) Re-poke for legal mostly. Since mailman has awkward methods of removing private content which sometimes happens (well - all of the time) per legal request, indexi... [14:07:37] !log restart apache on labcontrol1001 to pick up parser function change [14:07:41] Logged the message, Master [14:08:20] RECOVERY - puppet last run on mw1166 is OK Puppet is currently enabled, last run 24 seconds ago with 0 failures [14:08:54] 6operations, 10Continuous-Integration-Infrastructure, 5Patch-For-Review: Phase out lanthanum.eqiad.wmnet - https://phabricator.wikimedia.org/T86658#1429721 (10hashar) I have disconnected the server from Jenkins master/slave config https://integration.wikimedia.org/ci/computer/lanthanum/ [14:11:15] (03CR) 10Tobias Gritschacher: Add Phragile module. 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/218930 (https://phabricator.wikimedia.org/T101235) (owner: 10Jakob) [14:13:12] (03PS1) 10Yuvipanda: labstore: Delete unneeded exports.d files [puppet] - 10https://gerrit.wikimedia.org/r/223028 [14:13:17] (03CR) 10jenkins-bot: [V: 04-1] labstore: Delete unneeded exports.d files [puppet] - 10https://gerrit.wikimedia.org/r/223028 (owner: 10Yuvipanda) [14:13:37] (03PS1) 10Giuseppe Lavagetto: varnish: enable dynamic directors for a subset of ulsfo hosts [puppet] - 10https://gerrit.wikimedia.org/r/223029 (https://phabricator.wikimedia.org/T97029) [14:13:39] (03PS1) 10Giuseppe Lavagetto: varnish: enable dynamic directors in ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/223030 (https://phabricator.wikimedia.org/T97029) [14:13:44] andrewbogott: so ^ handles deleted projects (most of them) [14:13:45] <_joe_> bblack: ^^ [14:14:12] andrewbogott: one more patch is neeeded to make it support new projects with NFS [14:14:19] (03PS2) 10Yuvipanda: labstore: Delete unneeded exports.d files [puppet] - 10https://gerrit.wikimedia.org/r/223028 [14:14:21] (03CR) 10BBlack: [C: 031] varnish: enable dynamic directors for a subset of ulsfo hosts [puppet] - 10https://gerrit.wikimedia.org/r/223029 (https://phabricator.wikimedia.org/T97029) (owner: 10Giuseppe Lavagetto) [14:14:50] <_joe_> bblack: let's go? [14:15:23] yeah [14:16:47] (03CR) 10Giuseppe Lavagetto: [C: 032] varnish: enable dynamic directors for a subset of ulsfo hosts [puppet] - 10https://gerrit.wikimedia.org/r/223029 (https://phabricator.wikimedia.org/T97029) (owner: 10Giuseppe Lavagetto) [14:17:16] <_joe_> bblack: I'll run puppet manually on 4008 (the text host) [14:17:29] (03PS6) 10Tobias Gritschacher: Add Phragile module. [puppet] - 10https://gerrit.wikimedia.org/r/218930 (https://phabricator.wikimedia.org/T101235) (owner: 10Jakob) [14:17:35] (03CR) 10Yuvipanda: [C: 032] labstore: Delete unneeded exports.d files [puppet] - 10https://gerrit.wikimedia.org/r/223028 (owner: 10Yuvipanda) [14:17:36] ok [14:17:42] (03PS3) 10Yuvipanda: labstore: Delete unneeded exports.d files [puppet] - 10https://gerrit.wikimedia.org/r/223028 [14:18:05] (03CR) 10Yuvipanda: [V: 032] labstore: Delete unneeded exports.d files [puppet] - 10https://gerrit.wikimedia.org/r/223028 (owner: 10Yuvipanda) [14:18:13] !log restbase started thinning out parsoid data (local_group_wikipedia_T_parsoid_dataDVIsgzJSne8k) for >= 22 days [14:18:17] Logged the message, Master [14:18:55] <_joe_> bblack: the vcl loaded correctly [14:20:47] (03PS1) 10Yuvipanda: labstore: Remove NFS from cephtest project [puppet] - 10https://gerrit.wikimedia.org/r/223032 (https://phabricator.wikimedia.org/T102381) [14:21:01] (03PS2) 10Yuvipanda: labstore: Remove NFS from cephtest project [puppet] - 10https://gerrit.wikimedia.org/r/223032 (https://phabricator.wikimedia.org/T102381) [14:21:08] (03CR) 10Yuvipanda: [C: 032 V: 032] labstore: Remove NFS from cephtest project [puppet] - 10https://gerrit.wikimedia.org/r/223032 (https://phabricator.wikimedia.org/T102381) (owner: 10Yuvipanda) [14:22:17] chasemp: YuviPanda: who from ops folks could merge the patch for adding the Phragile module? https://gerrit.wikimedia.org/r/#/c/218930/ [14:22:31] it currently blocks a bunch of stuff for further development and it is only intended for labs. so I don't see a big issue with it.. [14:22:46] chasemp's call, I think. [14:22:52] if there's anything we could still improve about this patch, comments are welcome! 
[14:23:09] andrewbogott: w00t, it deletes unneeded exports.d files properly! [14:23:36] YuviPanda: ok, thx. [14:24:23] cool. There used to be a system to track and archive files from projects which had been deleted — it should be pretty easy to keep that. Like, that cleanup stage could just cat a line to a file when it deletes somethin [14:24:23] 6operations, 7Database: investigate performance_schema for wmf prod - https://phabricator.wikimedia.org/T99485#1429773 (10jcrespo) P_S OFF: ``` # 1094.7s user time, 9.4s system time, 141.22M rss, 205.21M vsz # Current date: Wed Jul 1 07:32:28 2015 # Hostname: db1018 # Files: STDIN # Overall: 4.66M total, 640... [14:24:24] g [14:24:29] <_joe_> uhm we have higher number of 503s than usual since 12:20 [14:24:35] <_joe_> anyone looked into it? [14:25:03] andrewbogott: heh, I just created https://phabricator.wikimedia.org/T104857 [14:25:26] which is different, I guess. [14:25:26] ok, let me figure out how this used to work and I’ll add to the bug [14:25:31] andrewbogott: ok! [14:25:36] it’s roughly the same [14:25:51] <_joe_> bblack: 4499.14 RxURL /w/load.php?debug=false&lang=ja&modules=startup&only=scripts&skin=minerva&target=mobile&* [14:26:07] <_joe_> debug=false, someone's been naughty [14:26:14] <_joe_> as a side note [14:26:25] 6operations, 10Traffic, 5Patch-For-Review: Sort out DHE for Forward Secrecy w/ older clients - https://phabricator.wikimedia.org/T104281#1429797 (10BBlack) Now that we're logging negotiated ciphersuites, that makes it easier to clear up issues like these. I've been checking varnish logs on this, and it does... [14:26:29] <_joe_> YuviPanda: any idea who could call us like that? [14:26:42] _joe_: not me this time! [14:26:53] <_joe_> yeah I know that now :P [14:26:54] well, that doesn't sound like the apps at least. [14:27:26] <_joe_> oh debug = false [14:27:29] <_joe_> uhm [14:27:35] <_joe_> so that's correct, sorry [14:27:36] <_joe_> meh [14:27:44] ottomata: I will look at an1020 and get back to you [14:28:36] well debug=false is "correct", but it's also the default and shouldn't be specified at all :P [14:28:38] <_joe_> so what is calling furiously that url [14:28:39] PROBLEM - DPKG on labmon1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:58] but I'm pretty sure it's our own stuff that's doing that, probably an over-reaction to the earlier bug where our own stuff was deployed with debug=true [14:29:01] <_joe_> anyways, cp4008 seems healthy to me [14:29:23] <_joe_> and the other hosts too [14:29:38] <_joe_> meaning the vcl reloaded correctly and I don't see outliers atm [14:30:20] thanks cmjohnson1 [14:32:58] (03PS1) 10Muehlenhoff: Add openstack-pkg-tools to default packages [puppet] - 10https://gerrit.wikimedia.org/r/223033 [14:33:21] 6operations, 10ops-eqiad, 10Traffic, 5Patch-For-Review: eqiad: investigate thermal issues with some cp10xx machines - https://phabricator.wikimedia.org/T103226#1429829 (10BBlack) cp1065 over the weekend: Zero thermal events and temperature still looks good! Can we set up some schedule/time to try this on... 
[14:34:10] RECOVERY - DPKG on labmon1001 is OK: All packages OK [14:35:24] <_joe_> the 503 plateau we see now seems to coincide with 12:18 logmsgbot: hoo Synchronized wmf-config/: Enable WikibaseQuality and WikibaseQualityConstraints on wikidata (duration: 00m 13s) [14:36:31] <_joe_> actually, that might be a coincidence [14:39:11] 6operations, 10Traffic, 10fundraising-tech-ops, 5Patch-For-Review: Decide what to do with *.donate.wikimedia.org subdomain + TLS - https://phabricator.wikimedia.org/T102827#1429852 (10faidon) >>! In T102827#1428781, @Krinkle wrote: > Seems unlikely indeed, though it's not unlikely people may mistype donate... [14:42:21] 6operations, 10Wikimedia-Mailing-lists, 7Mail: Blacklist badoo.com globally (★ fake emails and other spam) - https://phabricator.wikimedia.org/T48021#1429877 (10Krenair) [14:43:26] <_joe_> !log depooled the HHVM imagescaler, spitting 503s again. [14:43:31] Logged the message, Master [14:44:10] <_joe_> ori: ^^ this has happened again, out of the blue, around 12:30 UTC. Depooling the hhvm imagescaler solved the problem [14:46:10] !log added python-diskimage-builder 0.1.46-1+wmf1 for jessie-wikimedia on carbon [14:46:14] Logged the message, Master [14:47:31] 6operations, 10Traffic, 5Patch-For-Review: Sort out DHE for Forward Secrecy w/ older clients - https://phabricator.wikimedia.org/T104281#1429897 (10Matanya) just fyi: this is the java applet for playing ogg on older ie browsers. the best part is it is not working and being replaced by video.js (iirc). @brion... [14:47:36] 6operations, 5Continuous-Integration-Isolation, 5Patch-For-Review: Backport python-diskimage-builder 0.1.46 from testing to jessie-wikimedia - https://phabricator.wikimedia.org/T102880#1429900 (10MoritzMuehlenhoff) python-diskimage-builder 0.1.46-1+wmf1 has been added to apt.wikimedia.org @hashar: Please m... [14:48:53] !log installed python security updates on analytics*, lab* and virt* [14:48:57] Logged the message, Master [14:53:00] Who's swatting? [14:53:04] <_joe_> !log restarted hhvm on mw1152 after wiping the bytecode cache, repooling [14:53:22] jouncebot, next [14:53:22] In 0 hour(s) and 6 minute(s): Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150706T1500) [14:53:43] Krenair: CX patch needs scap (or l10-update) [14:54:02] Krenair: I'll merge patch in 2 minutes. Is that fine? [14:54:27] It sounds like you've decided I'm doing it :p [14:54:31] heh [14:54:32] okay then [14:59:07] Krenair: heh [15:00:04] manybubbles anomie ostriches thcipriani marktraceur Krenair: Dear anthropoid, the time has come. Please deploy Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150706T1500). [15:00:04] James_F: A patch you scheduled for Morning SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [15:01:12] 6operations, 10Traffic, 5Patch-For-Review: Sort out DHE for Forward Secrecy w/ older clients - https://phabricator.wikimedia.org/T104281#1429966 (10BBlack) Yeah after talking it over with @faidon, it seems like the right general approach here is either to announce with big lead-time to wikitech if we think t... [15:01:15] James_F, you here? [15:01:23] Yes.
[15:02:00] (03PS3) 10Alex Monk: Enable VisualEditor for the Portal namespace on jawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222617 (https://phabricator.wikimedia.org/T97313) (owner: 10Jforrester) [15:02:05] (03CR) 10Alex Monk: [C: 032] Enable VisualEditor for the Portal namespace on jawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222617 (https://phabricator.wikimedia.org/T97313) (owner: 10Jforrester) [15:02:11] (03Merged) 10jenkins-bot: Enable VisualEditor for the Portal namespace on jawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222617 (https://phabricator.wikimedia.org/T97313) (owner: 10Jforrester) [15:02:41] (03PS7) 10BBlack: tlsproxy: enable DHE-2048 FS for Android 2.x, etc. [puppet] - 10https://gerrit.wikimedia.org/r/222023 (https://phabricator.wikimedia.org/T104281) [15:02:46] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222617/ (duration: 00m 12s) [15:02:50] Logged the message, Master [15:03:30] (03CR) 10BBlack: [C: 031] "Moving forward on this today most likely!" [puppet] - 10https://gerrit.wikimedia.org/r/222023 (https://phabricator.wikimedia.org/T104281) (owner: 10BBlack) [15:04:08] (03CR) 10BBlack: [C: 031] varnish: enable dynamic directors in ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/223030 (https://phabricator.wikimedia.org/T97029) (owner: 10Giuseppe Lavagetto) [15:04:48] Krenair: I merged the patch, so fill free to scap for ContentTranslation. [15:05:01] James_F, does that work? I haven't been able to convince VE to open in that namespace [15:05:09] oh, there it is [15:05:14] okay, great [15:06:07] Yeah. [15:07:02] !log krenair Started scap: https://gerrit.wikimedia.org/r/#/c/222993/ [15:07:06] Logged the message, Master [15:07:06] kart_, ^ [15:07:26] <_joe_> bblack: let's merge this second patch then! [15:07:29] Krenair: thanks [15:07:45] 6operations, 10Traffic, 7HTTPS, 5HTTPS-by-default: HTTPS Plans (tracking / high-level info) - https://phabricator.wikimedia.org/T104681#1430026 (10BBlack) [15:08:41] 6operations, 10Traffic, 7HTTPS, 5HTTPS-by-default: HTTPS Plans (tracking / high-level info) - https://phabricator.wikimedia.org/T104681#1423896 (10BBlack) Notable changes today: Switch to compat-dhe: T104281 - pending gerrit commit to switch: https://gerrit.wikimedia.org/r/#/c/222023/ . Probably going out... [15:09:27] <_joe_> !log depooled the HHVM imagescaler again [15:09:31] Logged the message, Master [15:10:25] _joe_: yup [15:11:39] <_joe_> ok then, if everything seems sane to you on those hosts, let's extend it to ulsfo in general [15:12:16] (03PS2) 10Giuseppe Lavagetto: varnish: enable dynamic directors in ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/223030 (https://phabricator.wikimedia.org/T97029) [15:12:47] (03CR) 10Giuseppe Lavagetto: [C: 032] varnish: enable dynamic directors in ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/223030 (https://phabricator.wikimedia.org/T97029) (owner: 10Giuseppe Lavagetto) [15:13:52] (03CR) 10Faidon Liambotis: [C: 04-1] varnish: Update default varnish error page (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/223012 (owner: 10Krinkle) [15:14:35] _joe_: I think I know what it is [15:14:41] (03CR) 10Faidon Liambotis: [C: 031] tlsproxy: enable DHE-2048 FS for Android 2.x, etc. [puppet] - 10https://gerrit.wikimedia.org/r/222023 (https://phabricator.wikimedia.org/T104281) (owner: 10BBlack) [15:14:53] <_joe_> ori: what could it be? 
[15:15:17] <_joe_> ori: I wiped out the bytecode cache earlier, it didn't help [15:16:39] (03PS1) 10Ori.livneh: Set $wgDisableOutputCompression = true for HHVM scaler [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223038 [15:16:44] (03CR) 10Faidon Liambotis: [C: 031] Move dhparam support from tlsproxy to sslcert/ciphersuite (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/222839 (owner: 10BBlack) [15:17:06] (03CR) 10Ori.livneh: [C: 032] Set $wgDisableOutputCompression = true for HHVM scaler [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223038 (owner: 10Ori.livneh) [15:17:32] (03Merged) 10jenkins-bot: Set $wgDisableOutputCompression = true for HHVM scaler [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223038 (owner: 10Ori.livneh) [15:18:15] ori.... [15:18:47] it won't affect your scap [15:18:59] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds [15:19:07] I don't think it will work while scap is running anyway? [15:19:32] it's gated by if (scaler) { if (hhvm) { } } [15:19:40] which is currently true for mw1152 and no other host [15:19:43] so i'll just sync-common there [15:19:51] and codfw, right :) [15:20:04] [15:20:07] <_joe_> !log attempting dump-apc on mw1060 [15:20:12] Logged the message, Master [15:20:45] ori: yeah, that'd also be true for codfw scalers [15:20:48] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60566 bytes in 2.593 second response time [15:20:58] although I don't know if they're actually running any mw code at the moment? [15:21:02] _joe_: try repooling mw1152? [15:21:13] Krenair: codfw is not serving mw traffic [15:21:26] are the scalers etc. running there? [15:21:46] I know it's not serving user requests [15:21:47] <_joe_> !log repooling mw1152 [15:21:51] Logged the message, Master [15:25:01] 6operations, 7HHVM, 5Patch-For-Review: HHVM memory leaks result in OOMs & 500 spikes - https://phabricator.wikimedia.org/T104769#1430120 (10Joe) tried an apc-dump on mw1060 (a server that shows significant use of memory) and it created a 5.3 GB file, while on the API appservers the dump is less than 200 MBs... [15:25:13] <_joe_> ori: also, take a look here ^^ [15:25:18] (03PS5) 10Merlijn van Deen: dynamicproxy/tools: set up outage error system [puppet] - 10https://gerrit.wikimedia.org/r/222753 (https://phabricator.wikimedia.org/T102971) [15:25:21] <_joe_> I might have a solution (maybe) [15:26:01] (03CR) 10jenkins-bot: [V: 04-1] dynamicproxy/tools: set up outage error system [puppet] - 10https://gerrit.wikimedia.org/r/222753 (https://phabricator.wikimedia.org/T102971) (owner: 10Merlijn van Deen) [15:28:00] <_joe_> ori: still seeing 503s [15:28:06] _joe_: wait [15:28:20] <_joe_> yes, I'm doing nothing :) [15:28:24] (03PS2) 10BBlack: enable ipsec for all codfw caches [puppet] - 10https://gerrit.wikimedia.org/r/219813 (https://phabricator.wikimedia.org/T81543) [15:29:11] !log krenair Finished scap: https://gerrit.wikimedia.org/r/#/c/222993/ (duration: 22m 09s) [15:29:15] Logged the message, Master [15:29:18] kart_, ^ [15:29:57] And I can visit https://fa.wikipedia.org/wiki/%D9%88%DB%8C%DA%98%D9%87:%D8%A2%D9%85%D8%A7%D8%B1_%D8%AA%D8%B1%D8%AC%D9%85%D9%87%D9%94_%D9%85%D8%AD%D8%AA%D9%88%D8%A7 [15:30:03] So I guess that fixed it? [15:31:43] can you confirm kart_? 
[15:32:23] 6operations, 6Phabricator, 6Security: Phabricator dependence on wmfusercontent.org - https://phabricator.wikimedia.org/T104730#1430138 (10Negative24) The Phabricator's `res` dir are generated files based on either the `phabricator.base-uri` or `security.alternate-file-domain`. [15:34:39] 6operations, 10ops-codfw: Equip osm-cp200{1,2,3,4} with 2 1.2TB SSDs each - https://phabricator.wikimedia.org/T104610#1430143 (10Papaul) Replaced the exsisting 4X300GB SSD drives with 2x1.2TB SSD in osm-cp200(1-3) [15:35:09] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 7.14% of data above the critical threshold [500.0] [15:36:46] kart_, ...? [15:37:24] 6operations, 10ops-eqiad, 10Traffic, 5Patch-For-Review: eqiad: investigate thermal issues with some cp10xx machines - https://phabricator.wikimedia.org/T103226#1430144 (10Cmjohnson) Yes, but I will need to buy more thermal paste first. I only had enough to do the one server on-site. [15:38:21] Krenair: oooo [15:38:34] ? [15:38:36] Krenair: Looks Good. [15:38:39] great [15:39:15] (03PS1) 10Jforrester: Set wmgVisualEditorNamespaces to array() if null [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223040 [15:39:35] Krenair: ^^^ Seem sane? [15:39:46] 6operations, 10ops-eqiad, 10Traffic, 5Patch-For-Review: eqiad: investigate thermal issues with some cp10xx machines - https://phabricator.wikimedia.org/T103226#1430155 (10BBlack) ok great, let me know! if it makes this easier, we can probably chunk this up into 4 sets of 2 machines at a time, just have to... [15:40:02] James_F, I'm not sure. [15:40:19] Krenair: There might be a more PHP-y way to do it. [15:40:50] 6operations, 6Phabricator, 6Security: Phabricator dependence on wmfusercontent.org - https://phabricator.wikimedia.org/T104730#1430156 (10mmodell) I think the assets are served from the separate domain in order to allow them to be more easily separated from dynamically generated content. Phabricator JavaScri... [15:41:27] legoktm, can you take a look at this for us? [15:44:14] 6operations, 7Easy: server admin log should include year in date (again) - https://phabricator.wikimedia.org/T85803#1430160 (10Elee) Currently, the log goes `== Month 01 ==`. Would the year go after or before? As in: `== Month 01 2015 ==` or `== 2015 Month 01 ==`? [15:44:41] o/ if anyone wants to take a quick peek at me stabbing easy bugs/tasks [15:44:42] https://phabricator.wikimedia.org/T85803 [15:45:55] (03PS1) 10KartikMistry: CX: Add 'en' as target wikis and MT support [puppet] - 10https://gerrit.wikimedia.org/r/223042 (https://phabricator.wikimedia.org/T94123) [15:46:06] also I'm going to ask a really good stupid question [15:46:09] what do I clone? =p [15:49:01] (03PS2) 10Alex Monk: Update README to remove pmtpa references [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222941 [15:49:35] elee: looks like, https://git.wikimedia.org/git/operations/debs/adminbot.git [15:49:51] ARGH [15:49:54] what the actual fuck, hhvm [15:49:57] 6operations, 7Easy: server admin log should include year in date (again) - https://phabricator.wikimedia.org/T85803#1430187 (10JanZerebecki) The later: `== 2015 July 06 ==` [15:50:07] * elee throws table into air [15:50:09] kart_: thank <3 [15:50:17] 6operations, 7Easy: server admin log should include year in date (again) - https://phabricator.wikimedia.org/T85803#1430188 (10Krenair) YMD. MDY is silly [15:52:42] yes please YMD [15:52:57] YMD seems strange to me [15:53:10] <_joe_> ori: whassup? [15:53:15] DMY would be better as is year really the most important information? 
[15:53:38] (03CR) 10Rush: Add Phragile module. (0310 comments) [puppet] - 10https://gerrit.wikimedia.org/r/218930 (https://phabricator.wikimedia.org/T101235) (owner: 10Jakob) [15:53:40] please YMD for anything everywhere, even though I have no idea what we're talking about [15:53:46] YMD is easier for sorting (biggest -> smallest) [15:53:57] hm true [15:53:59] bblack: adminbot [15:54:14] DMY also gets confusing with local conventions on the popularity of DMY-vs-MDY [15:54:27] agree with bblack on YMD, tho if pains my confused american heart [15:54:33] 06062015 means two different things in different places [15:54:34] https://xkcd.com/1179/ [15:54:35] that's US vs EU again [15:55:36] _joe_: yeah, it's not fixed. fuck my life. sorry, can you depool it again? [15:55:44] i'll give it another shot later today. [15:56:11] oh wait [15:56:14] _joe_: hang on [15:56:22] <_joe_> lol [15:58:06] bblack: that wasn't a very good example since 6/6/15 and 6/6/15 are the same :P [15:58:54] it's 6/7/15 today [15:59:05] (03PS1) 10BBlack: Remove (www\.)?donate from most domains [dns] - 10https://gerrit.wikimedia.org/r/223044 (https://phabricator.wikimedia.org/T102827) [15:59:27] no, it's 7/6/15! [15:59:42] (03PS1) 10Elee: added year into logging [debs/adminbot] - 10https://gerrit.wikimedia.org/r/223046 [15:59:55] okay hopefully I didn't fuck that up [16:00:11] elee, that's... not going to work [16:00:20] * elee groans [16:00:49] heh [16:00:52] ha [16:01:07] (03CR) 10Faidon Liambotis: [C: 032] "Don't forget the redirects themselves (redirects.dat)" [dns] - 10https://gerrit.wikimedia.org/r/223044 (https://phabricator.wikimedia.org/T102827) (owner: 10BBlack) [16:01:10] uh okay what did I do wrong lets see [16:01:21] you need to update the format string [16:01:42] elee: also, you might also want to link the bug https://www.mediawiki.org/wiki/Gerrit/Commit_message_guidelines#Auto-linking_and_cross-referencing [16:01:44] wait did I read this wrong... [16:01:48] 6operations, 10Incident-20150205-SiteOutage, 10MediaWiki-Debug-Logger, 6Reading-Infrastructure-Team, and 2 others: Decouple logging infrastructure failures from MediaWiki logging - https://phabricator.wikimedia.org/T88732#1430245 (10bd808) [16:02:16] 6operations, 7Easy: server admin log should include year in date (again) - https://phabricator.wikimedia.org/T85803#1430253 (10RobH) I hate the entire date format, as I like YYYY-MM-DD, (reference: https://xkcd.com/1179/ ). That being said, the old use had July 19th 2012. I suppose it should put in the trail... [16:02:39] 6operations, 6Phabricator, 6Security: Phabricator dependence on wmfusercontent.org - https://phabricator.wikimedia.org/T104730#1430254 (10Mike_Peel) The dependencies I see when accessing phabricator are (including images, CSS and Javascript, looking at the main page): wait I thought the current way it inserted was July 06? [16:04:19] do we want it to be 2015-July-06 now? [16:04:29] 6operations, 6Phabricator, 6Security: Phabricator dependence on wmfusercontent.org - https://phabricator.wikimedia.org/T104730#1430255 (10Mike_Peel) > I don't understand the logic behing this — what appears to be a "valid website" would not really give you any guarantees that it is operated by the WMF. If th... [16:05:00] elee: you have earlier on: now = datetime.datetime.utcnow(), so now is a datetime object. 
[16:05:04] >>> now = datetime.datetime.utcnow() [16:05:04] >>> print now.strftime("%Y-%m-%d") [16:05:05] 2015-07-06 [16:05:06] right [16:05:12] so what I need to do is [16:05:13] oh [16:05:13] wait [16:05:21] robh: is alive :D [16:05:23] I need to instantiate year = str(now.year) [16:05:27] and check that in line 43 [16:05:37] if months[now.month - 1] != month or now.day != int(day): [16:05:39] turns into [16:05:39] or just use strftime as above and let it format the string :) [16:06:38] JohnFLewis: Oh, I've been working for awhile now but i didnt wanna change the topic and get a bunch of PMs for reviews ;D [16:06:39] if now.year !int(year) or months[now.month - 1] != month or now.day != int(day): [16:06:39] im still in my monday morning pre meeting prep stages [16:06:39] bblack: wait, can we? [16:06:39] we still need to do that test in 43 [16:06:39] i just felt guilty leaving it for faidon in topic, hehe [16:06:39] also I need to figure out what 41 is doing [16:06:40] robh: well not reviews but I have two tickets that probably need a gentle 'no' from ops :) [16:06:40] (im looking at https://git.wikimedia.org/blob/operations%2Fdebs%2Fadminbot.git/579ea5abfc7b4b70d7ed82fa6b45956dd30090e9/adminlog.py#L34) [16:06:55] oh, i dont promise to weigh in on them before this afternoon at the soonest [16:07:16] but if you wanna link them to me in here i'll flag for review later [16:07:20] (flag for review = the actual private flagging system in phab so no one can track when im lurking ;) [16:07:37] robh: https://phabricator.wikimedia.org/T48021 and https://phabricator.wikimedia.org/T68318 - mailman (the bane) but wants mail filtering at exim level which has been discussed before but probably not the best/gonna happen solution [16:07:41] I'm still waiting for flags to be named [16:07:55] for now I just flag with a random color and then never check them :P [16:08:01] Negative24: i use the color coding, the closer to red it gets, the more likelyhood i dont want to touch the task [16:08:45] ;D [16:08:45] ha [16:08:45] blue or green are nice calm colors, those are low drama tasks. [16:08:45] elee: ok beats me then, I don't know the rest of the context. Seems like you're extracting stamps from the lines first? [16:08:45] it... appears to be? [16:08:47] uh hold on [16:08:48] 6operations, 6Phabricator, 6Security: Phabricator dependence on wmfusercontent.org - https://phabricator.wikimedia.org/T104730#1430266 (10faidon) >>! In T104730#1430255, @Mike_Peel wrote: >> I don't understand the logic behing this — what appears to be a "valid website" would not really give you any guarante... [16:08:49] i joke, i kinda just random flag [16:08:51] so my problem is [16:08:59] robh: this is where you have basic logic, if project = wikimedia-mailing-list, set colour RED :p [16:09:01] the getting started with gerrit only speaks of sending a review just once [16:09:03] how does one... [16:09:06] send another patch? [16:09:10] or rather [16:09:14] a revised one? [16:09:14] elee: git review -R [16:09:22] oh so commit again [16:09:24] quash [16:11:12] then -R? [16:11:13] git commit --amend; git review -R [16:11:14] JohnFLewis: hrmm, i'll read these more in detail later, but my first reaction for massive mail config changes is to tend to lump it into our mail system revamp this quarter [16:11:14] but please dont quote me for that on this task, as i havent read it in detail, it just seems like its not small. 
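A minimal sketch of the strftime-based approach suggested just above, assuming the `== 2015 July 06 ==` header format from the task; the function names are illustrative, not the actual adminlog.py code:

    import datetime

    def current_header():
        # Let strftime build the whole date string instead of comparing
        # year, month and day separately ("%B" is the full month name).
        now = datetime.datetime.utcnow()
        return now.strftime("== %Y %B %d ==")   # e.g. "== 2015 July 06 =="

    def header_is_current(header_line):
        # A single string comparison can then replace the separate
        # months[now.month - 1] / now.day / now.year checks discussed above.
        return header_line.strip() == current_header()

Headers written before the year was added would still fail this comparison, which is exactly the back-compat problem raised a little further down.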
[16:11:14] robh: hey if you want to, do it - just add more work to the goal where I'll end up going 'not worth it but sure' ;) [16:11:14] (03PS2) 10Elee: added year into logging [debs/adminbot] - 10https://gerrit.wikimedia.org/r/223046 [16:11:14] elee: gerrit doesn't do multiple commits generally [16:11:14] haha [16:11:14] 6operations: salt-minion dies if /var is full - https://phabricator.wikimedia.org/T104866#1430277 (10Krenair) Assuming this is something for operations [16:11:14] okay I swear to god [16:11:18] I'm just stupid then [16:11:20] you're on record on task saying that right? [16:11:34] 6operations, 10Traffic, 10fundraising-tech-ops, 5Patch-For-Review: Decide what to do with *.donate.wikimedia.org subdomain + TLS - https://phabricator.wikimedia.org/T102827#1430287 (10BBlack) ^ Patch above is a cleanup on this based on stats/arguments above. Seems like a near-zero-impact change to me, and... [16:11:46] robh: saying its not worth it? [16:11:54] elee, so we're checking... [16:12:01] now.year != int(str(now.year)) ? [16:12:06] JohnFLewis: yep, but i was joking =] [16:12:27] I'm not but mutante is for saying 'deal with it yourselves via mailman' ;) [16:12:29] and you still didn't update the format string [16:12:55] Krenair: yeah, I'm following what the existing code does for day [16:13:49] er, clue me in on that? [16:13:50] I'm normally not this bad, just an fyi. =p the entire workflow for this is just throwing me off balance [16:14:09] robh: or tldr; "It's up to list admins to decide ... and what they want to block. I really don't want to get into global blocks if avoidable ... Mailman stuff really needs to be decentralized. If we do this once we'll get a ton of follow-ups..." :p [16:14:28] thats a legit observation that i get. [16:14:46] elee, "%s %s %d %s" - you're formatting this same string with an extra piece of information? [16:16:52] robh: but yeah - defer it to the goal and have those helpless guys deal with it (when do the goals roll over also? this week?) [16:17:21] paravoid: Is there a way to reference the data-uri from elsewhere? E.g. from a config file? [16:17:32] (03Abandoned) 10Chmarkine: Remove www.donate.wikimediafoundation.org from DNS [dns] - 10https://gerrit.wikimedia.org/r/222876 (https://phabricator.wikimedia.org/T102827) (owner: 10Chmarkine) [16:17:33] or in a puppet key/value thingy [16:17:42] I'd rather not inline 5K and 10K images [16:17:58] (03Abandoned) 10Chmarkine: Remove www.donate.mediawiki.org from DNS [dns] - 10https://gerrit.wikimedia.org/r/222877 (https://phabricator.wikimedia.org/T102827) (owner: 10Chmarkine) [16:17:59] 6operations, 6Phabricator, 6Security: Phabricator dependence on wmfusercontent.org - https://phabricator.wikimedia.org/T104730#1430362 (10Mike_Peel) > We're digressing a bit but if someone else would own wmfusercontent.org, they would be able to issue a signed SSL certificate as well. They wouldn't be able t... 
[16:18:11] (03Abandoned) 10Chmarkine: Remove www.donate.wiktionary.org from DNS [dns] - 10https://gerrit.wikimedia.org/r/222880 (https://phabricator.wikimedia.org/T102827) (owner: 10Chmarkine) [16:18:12] 6operations: on bootup, salt-minion should not start with -d - https://phabricator.wikimedia.org/T104867#1430376 (10Krenair) Again, lack of projects, assuming this is something for operations [16:18:27] (03Abandoned) 10Chmarkine: Remove www.donate.wikipedia.org from DNS [dns] - 10https://gerrit.wikimedia.org/r/222883 (https://phabricator.wikimedia.org/T102827) (owner: 10Chmarkine) [16:18:34] back [16:18:35] ah damnit [16:18:39] so er lets see [16:18:42] string string int string [16:18:46] Tobi_WMDE_SW_NA: YuviPanda|afk I hope this is helpful https://phabricator.wikimedia.org/T101235#1430294 [16:18:52] I'd want [16:18:58] string int string int string [16:19:02] okay so let me do that right now [16:19:07] wait [16:19:09] that was trivial [16:19:13] several disconected email threads and lots of ambiguity so I tried to lay it all out as I understand it [16:19:15] why did that fall out of my head thanks Krenair [16:20:15] (03PS3) 10Elee: added year into logging [debs/adminbot] - 10https://gerrit.wikimedia.org/r/223046 (https://phabricator.wikimedia.org/T85803) [16:20:24] 6operations, 10ops-codfw: Rename osm-cp2001, osm-cp2002, osm-cp2003, osm-cp2004 - https://phabricator.wikimedia.org/T104869#1430392 (10akosiaris) 3NEW a:3Papaul [16:20:28] please pass [16:20:44] (03PS1) 10BBlack: Remove dead donate hostnames from redirects [puppet] - 10https://gerrit.wikimedia.org/r/223048 (https://phabricator.wikimedia.org/T102827) [16:22:04] (03CR) 10Faidon Liambotis: [C: 032] Remove dead donate hostnames from redirects [puppet] - 10https://gerrit.wikimedia.org/r/223048 (https://phabricator.wikimedia.org/T102827) (owner: 10BBlack) [16:22:29] not sure how well this will work with existing entries elee [16:22:40] ah I'm reading the ticket over [16:22:54] so apparently people want... month day year instead? [16:23:00] wait uh right there are existing entries to deal with [16:23:05] that's not good [16:23:11] paravoid: timezone in MW log files is finally fixed -- https://phabricator.wikimedia.org/T99581#1430406 [16:23:21] that means we'd need to kludge the header reading [16:23:29] Krenair: there could be a bot that checks the time the revision was made and translate them all over to the new format [16:23:29] and I think the maxsplit parameter to line.split needs to be updated [16:23:33] that also means I'd need to deal with line 41 [16:23:43] right [16:23:45] argh [16:25:02] bd808: better late than ever :) [16:25:02] bd808: thanks! [16:25:03] yw [16:25:03] bd808: better late than ever :) [16:25:03] i see what you did there [16:25:10] :P [16:26:27] 6operations, 10ops-codfw: Equip osm-cp200{1,2,3,4} with 2 1.2TB SSDs each - https://phabricator.wikimedia.org/T104610#1430455 (10Papaul) 5Open>3Resolved replaced the 4x300GB SSD drives from osm-cp2004 with 2x1.2TB SSD drives [16:26:45] so I guess maybe one way to deal is to see if there are 4 or 5 elements in a header [16:26:52] and if there are 4, assume its the month day schema [16:26:56] and for 5, the year month day one [16:28:23] but just bashing this out would be... poor form. 
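A rough sketch of the four-versus-five token idea floated just above (and caveated just below) for handling existing `== July 06 ==` headers alongside new `== 2015 July 06 ==` ones; this is illustrative only, not the actual adminlog.py parsing code:

    def parse_header(line):
        # "== July 06 ==" splits into 4 tokens, "== 2015 July 06 ==" into 5,
        # so the token count tells us which schema an existing header uses.
        parts = line.split()
        if len(parts) == 4:        # old schema: == <month> <day> ==
            year, month, day = None, parts[1], parts[2]
        elif len(parts) == 5:      # new schema: == <year> <month> <day> ==
            year, month, day = parts[1], parts[2], parts[3]
        else:
            raise ValueError("not a date header: %r" % line)
        return year, month, int(day)

On the write side this corresponds to the "string int string int string" format described above, i.e. one extra integer placeholder for the year; the exact placeholders adminlog.py uses are not shown in the log, so that part is left out here.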
[16:28:24] (yell at me if I'm going about this a bad way) [16:28:25] RECOVERY - SSH on analytics1020 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [16:28:28] RECOVERY - Host analytics1020 is UPING OK - Packet loss = 0%, RTA = 1.62 ms [16:29:39] (03PS2) 10Faidon Liambotis: Add loopback IPs for cr1-eqord and cr1-eqdfw [dns] - 10https://gerrit.wikimedia.org/r/220776 [16:29:41] (03PS3) 10Faidon Liambotis: (WIP) Allocate neighbor blocks for cr1-eqord/cr1-eqdfw [dns] - 10https://gerrit.wikimedia.org/r/220777 [16:29:43] (03PS2) 10Faidon Liambotis: Repurpose s/cr2-eqiad/cr1-eqord/ to link with codfw [dns] - 10https://gerrit.wikimedia.org/r/220811 [16:29:45] (03PS2) 10Faidon Liambotis: Fix cr2-eqiad/cr1-esams GRE's PTR typos [dns] - 10https://gerrit.wikimedia.org/r/220775 [16:29:47] (03PS1) 10Faidon Liambotis: More dead subdomain cleanups [dns] - 10https://gerrit.wikimedia.org/r/223051 [16:29:54] argh [16:29:58] RECOVERY - Disk space on Hadoop worker on analytics1020 is OK: DISK OK [16:29:58] RECOVERY - Disk space on analytics1020 is OK: DISK OK [16:30:09] RECOVERY - salt-minion processes on analytics1020 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [16:30:09] RECOVERY - configured eth on analytics1020 is OK - interfaces up [16:30:09] RECOVERY - RAID on analytics1020 is OK no disks configured for RAID [16:30:09] RECOVERY - dhclient process on analytics1020 is OK: PROCS OK: 0 processes with command name dhclient [16:30:09] RECOVERY - Hadoop DataNode on analytics1020 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode [16:30:10] heh [16:30:10] RECOVERY - DPKG on analytics1020 is OK: All packages OK [16:30:27] (03PS2) 10Faidon Liambotis: More dead subdomain cleanups [dns] - 10https://gerrit.wikimedia.org/r/223051 [16:30:31] that :) [16:31:00] oo doctor, cmjohnson, how's he doin? [16:31:18] (03PS6) 10Merlijn van Deen: dynamicproxy/tools: set up outage error system [puppet] - 10https://gerrit.wikimedia.org/r/222753 (https://phabricator.wikimedia.org/T102971) [16:31:22] (03CR) 10BBlack: [C: 032] More dead subdomain cleanups [dns] - 10https://gerrit.wikimedia.org/r/223051 (owner: 10Faidon Liambotis) [16:31:55] (03PS1) 10Alexandros Kosiaris: Name maps-test-{db,web}2XXX [dns] - 10https://gerrit.wikimedia.org/r/223052 (https://phabricator.wikimedia.org/T104869) [16:32:01] ottomata: ^^ [16:32:02] James_F, I think this unset VisualEditorNamespaces thing will break production [16:32:12] cmjohnson: what happened? [16:33:05] paravoid: Hm.. wasn't mul.wikisource.org specifically requested by the community? I found quite a few mentions of it. [16:33:05] I'm glad the rest is going though :) [16:34:55] (03PS1) 10Chmarkine: HSTS preload for Mediawiki and Wikimediafoundation [puppet] - 10https://gerrit.wikimedia.org/r/223054 (https://phabricator.wikimedia.org/T104244) [16:34:56] ottmata: the idrac license was not installed. Didn't fix after the last system board change [16:35:35] (03CR) 10Krinkle: varnish: Update default varnish error page (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/223012 (owner: 10Krinkle) [16:35:45] hm,m ok.... [16:35:55] !log depooled mw1152 [16:35:59] Logged the message, Master [16:36:34] cmjohnson: can I get an update re: labnet1002? [16:38:30] labnet1002 should be ready to install now [16:38:33] doing it now andrewbogott [16:38:33] ok! 
[16:38:33] !log upgrade and restart of db2029 [16:38:33] Logged the message, Master [16:39:30] 6operations, 10Traffic, 7HTTPS, 5Patch-For-Review: Decom www.$lang hostnames/redirects - https://phabricator.wikimedia.org/T102815#1430554 (10BBlack) So, working off of @faidon's per-domain-count used in the donate ticket: ``` faidon@oxygen:~$ egrep '"www\.[a-z]{2}\.' per-domain-count |head -10 10750 "w... [16:40:03] (03PS2) 10BBlack: Remove www.$lang DNS T102815 [dns] - 10https://gerrit.wikimedia.org/r/218909 [16:41:17] 6operations, 10Traffic, 7HTTPS, 5Patch-For-Review: Decom www.$lang hostnames/redirects - https://phabricator.wikimedia.org/T102815#1430565 (10BBlack) Basically, I don't know what else we can do here except go ahead and shut these off (patch ref'd earlier). They're as dead as we can make them in terms of e... [16:42:11] Krenair: Unset? [16:42:39] James_F, well, it's null [16:43:23] bblack: jfdi :) [16:44:04] that's always my inclination, but I'm trying to be accomodating and not seem like a complete loose cannon :) [16:44:36] I can +1 your patch if it'll make you feel better :p [16:44:43] 6operations, 10ops-codfw, 5Patch-For-Review: Rename osm-cp2001, osm-cp2002, osm-cp2003, osm-cp2004 - https://phabricator.wikimedia.org/T104869#1430575 (10Papaul) maps-test-db2001 port = ge-5/0/4 maps-test-db2002 port = ge-5/0/3 maps-test-web2001 port = ge-5/0/4 maps-test-web2002 port = ge-5/0/3 Rack table u... [16:45:10] (03CR) 10Chad: [C: 031] "Yay" [dns] - 10https://gerrit.wikimedia.org/r/218909 (owner: 10BBlack) [16:45:13] bblack: just for you ^ [16:45:16] lol [16:45:19] thanks :) [16:45:37] (03CR) 10Glaisher: varnish: Update default varnish error page (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/223012 (owner: 10Krinkle) [16:46:04] James_F, with the current situation we'll have a ton of warnings, and then, assuming it loads wgVisualEditorNamespaces later, we'll just get the content namespaces rather than our extra configured ones [16:46:20] 6operations, 10Traffic, 10fundraising-tech-ops, 5Patch-For-Review: Decide what to do with *.donate.wikimedia.org subdomain + TLS - https://phabricator.wikimedia.org/T102827#1430593 (10Chmarkine) Actually http://www.email.donate.wikimedia.org/ can be removed too. [16:47:33] Krenair: null -> array() isn't unsetting… [16:48:19] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [16:48:22] It's currently null [16:48:39] Which produces warnings when we try to merge stuff into it [16:49:19] (03CR) 10Alexandros Kosiaris: [C: 032] Name maps-test-{db,web}2XXX [dns] - 10https://gerrit.wikimedia.org/r/223052 (https://phabricator.wikimedia.org/T104869) (owner: 10Alexandros Kosiaris) [16:49:26] Assuming it gets set later on in the process after CommonSettings is run, VE would effectively be disabled in some namespaces? [16:50:18] 6operations, 10ops-codfw, 5Patch-For-Review: Rename osm-cp2001, osm-cp2002, osm-cp2003, osm-cp2004 - https://phabricator.wikimedia.org/T104869#1430619 (10akosiaris) switches updated, dns change merged. 
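The wmgVisualEditorNamespaces discussion above boils down to a null default being merged into later. The actual fix is the one-line PHP mediawiki-config change linked above ("Set wmgVisualEditorNamespaces to array() if null"); the same pattern, sketched in Python with made-up variable names, looks like this:

    default_namespaces = [0, 2]    # stand-in for the content namespaces
    extra_namespaces = None        # a wiki with no per-wiki override configured

    # Merging straight into the null default is what throws the warnings:
    # combined = default_namespaces + extra_namespaces   # fails on None

    # Coalescing null to an empty container first makes the merge safe and
    # changes nothing for wikis that configure no extra namespaces.
    combined = default_namespaces + (extra_namespaces or [])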
[16:50:19] (03CR) 10BBlack: [C: 032] HSTS preload for Mediawiki and Wikimediafoundation [puppet] - 10https://gerrit.wikimedia.org/r/223054 (https://phabricator.wikimedia.org/T104244) (owner: 10Chmarkine) [16:50:42] (03PS1) 10Cmjohnson: Adding labnet1002 to netboot.cfg [puppet] - 10https://gerrit.wikimedia.org/r/223055 [16:50:48] (03CR) 10BBlack: [V: 032] HSTS preload for Mediawiki and Wikimediafoundation [puppet] - 10https://gerrit.wikimedia.org/r/223054 (https://phabricator.wikimedia.org/T104244) (owner: 10Chmarkine) [16:53:29] PROBLEM - puppet last run on mw1125 is CRITICAL puppet fail [16:54:40] (03PS2) 10Cmjohnson: Adding labnet1002 to netboot.cfg [puppet] - 10https://gerrit.wikimedia.org/r/223055 [16:55:30] (03CR) 10Cmjohnson: [C: 032] Adding labnet1002 to netboot.cfg [puppet] - 10https://gerrit.wikimedia.org/r/223055 (owner: 10Cmjohnson) [17:02:20] 6operations, 10ops-codfw, 5Patch-For-Review: Rename osm-cp2001, osm-cp2002, osm-cp2003, osm-cp2004 - https://phabricator.wikimedia.org/T104869#1430686 (10Papaul) physical label update [17:02:32] 6operations, 3Discovery-Cirrus-Sprint: Import Elasticsearch 1.6.0 deb into wmf apt - https://phabricator.wikimedia.org/T102008#1430687 (10Manybubbles) >>! In T102008#1394212, @fgiunchedi wrote: >>>! In T102008#1394186, @Manybubbles wrote: >>>>! In T102008#1394166, @fgiunchedi wrote: >>> @manybubbles, we can im... [17:03:49] 6operations, 6Phabricator, 6Security: Phabricator dependence on wmfusercontent.org - https://phabricator.wikimedia.org/T104730#1430700 (10matmarex) >>! In T104730#1429135, @faidon wrote: > What kind of Javascript do you see loaded from wmfusercontent.org? Could you give an example? > > I see multiple resou... [17:04:36] (03PS1) 10Jcrespo: Repool db2029 after upgrade and config update [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223058 [17:04:49] PROBLEM - puppet last run on db2041 is CRITICAL puppet fail [17:07:39] Krenair, can a test a repool, doing something on tin? 
[17:07:48] nope [17:07:57] * Krenair logs out [17:09:12] !log jynus Synchronized wmf-config/db-codfw.php: repool db2029 again after conf upgrade (duration: 00m 11s) [17:09:16] Logged the message, Master [17:09:47] (03CR) 10Jcrespo: [C: 032] Repool db2029 after upgrade and config update [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223058 (owner: 10Jcrespo) [17:10:30] RECOVERY - puppet last run on mw1125 is OK Puppet is currently enabled, last run 16 seconds ago with 0 failures [17:10:46] !log jynus Synchronized wmf-config/db-codfw.php: repool db2029 again after conf upgrade(2/2) (duration: 00m 11s) [17:10:50] Logged the message, Master [17:15:08] 6operations, 6Discovery, 10Wikidata, 10Wikidata-Query-Service, 3Discovery-Wikidata-Query-Service-Sprint: Wikidata Query Service hardware - https://phabricator.wikimedia.org/T86561#1430746 (10Smalyshev) 5stalled>3Open [17:19:13] (03CR) 10Faidon Liambotis: [C: 031] Remove www.$lang DNS T102815 [dns] - 10https://gerrit.wikimedia.org/r/218909 (owner: 10BBlack) [17:21:37] (03PS1) 10Faidon Liambotis: (WIP) Make project domains template-based/DRY [dns] - 10https://gerrit.wikimedia.org/r/223059 [17:21:49] RECOVERY - puppet last run on db2041 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [17:22:07] http://deployment.wikimedia.beta.wmflabs.org/wiki/User:Poetlister [17:22:13] [58a785be] /wiki/User:Poetlister MWException from line 337 of /srv/mediawiki/php-master/includes/MagicWord.php: Error: invalid magic word 'bidi' [17:22:42] (03PS7) 10Merlijn van Deen: dynamicproxy/tools: set up outage error system [puppet] - 10https://gerrit.wikimedia.org/r/222753 (https://phabricator.wikimedia.org/T102971) [17:24:45] (03PS1) 10Jcrespo: Fully bring back up db2029 with normal load [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223060 [17:25:38] 6operations, 6Discovery, 10Wikidata, 10Wikidata-Query-Service, 3Discovery-Wikidata-Query-Service-Sprint: Define the details of the hardware we need to run WDQS - https://phabricator.wikimedia.org/T104879#1430809 (10Smalyshev) 3NEW a:3Joe [17:25:41] 6operations, 7Database: codfw frontends cannot connect to mysql at db2029 - https://phabricator.wikimedia.org/T104573#1430817 (10jcrespo) a:3jcrespo I've reseted configuration to puppet defaults, upgraded and restarted the server and now it seems it works as it should. [17:25:44] (03PS3) 10BBlack: Remove www.$lang DNS T102815 [dns] - 10https://gerrit.wikimedia.org/r/218909 [17:25:55] (03CR) 10BBlack: [C: 032] Remove www.$lang DNS T102815 [dns] - 10https://gerrit.wikimedia.org/r/218909 (owner: 10BBlack) [17:27:55] 6operations, 10Traffic, 5HTTPS-by-default, 5Patch-For-Review: Preload HSTS - https://phabricator.wikimedia.org/T104244#1430829 (10BBlack) [17:27:57] 6operations, 10Traffic: Clean up DNS/redirects for TLS - https://phabricator.wikimedia.org/T102824#1430830 (10BBlack) [17:28:00] 6operations, 10Traffic, 7HTTPS, 5Patch-For-Review: Decom www.$lang hostnames/redirects - https://phabricator.wikimedia.org/T102815#1430827 (10BBlack) 5Open>3Resolved a:3BBlack [17:29:57] 6operations, 10Wikimedia-Mailing-lists: Ban *@utdliving.com from sending any email to the mailman server - https://phabricator.wikimedia.org/T68318#1430836 (10Dzahn) https://wikitech.wikimedia.org/wiki/Lists.wikimedia.org#Fighting_spam_in_mailman [17:31:36] ^ how am I going to find an apartment now? 
:P [17:32:16] <_joe_> lol [17:35:25] !log restarting cassandra instance on restbase1005: out of heap [17:35:30] Logged the message, Master [17:35:59] (03CR) 10Dzahn: static-bugzilla: update Apache config for 2.4 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/222692 (owner: 10Dzahn) [17:37:27] !log upgraded restbase1005 to jdk8 [17:37:32] Logged the message, Master [17:37:43] <_joe_> gwicke: wait, were did you get that jdk8? [17:37:50] <_joe_> didn't we remove it from the repo? [17:37:59] it's still in the repo [17:38:10] <_joe_> uhm, I'll advise against using it [17:38:12] and you're doing that without puppet? [17:38:27] we have been running it on 1004 since last week [17:38:37] $ grep jdk modules/cassandra/manifests/init.pp package { ['cassandra', 'openjdk-7-jdk']: [17:38:45] did you just apt-get install'ed it manually? [17:38:56] yeah, this is to test whether it really helps [17:39:09] once we are satisfied we'll switch over puppet for all [17:39:26] <_joe_> a test in prod? [17:39:34] _joe_: what is the issue with jdk8? [17:39:40] Krenair: It gets set before CommonSettings, I believe. [17:39:44] _joe_: yes, as it's all peachy in staging [17:39:57] we can't reproduce prod loads in staging, sadly [17:40:00] <_joe_> gwicke: it's a package we built for trying out titan [17:40:12] <_joe_> with little testing if any at all [17:40:14] akosiaris: re: static-BZ , i think nothing sets "env=nobots" anymore so it didnt do anything anymore [17:40:29] _joe_: I see, so it's not meant for prod [17:40:35] <_joe_> and no security updates too [17:40:44] <_joe_> I was sure I removed it, my bad [17:40:45] which you would probably hear if you had pushed this through puppet and code review [17:41:13] 6operations, 3Labs-Sprint-104, 3Labs-Sprint-105: Setup/Install/Deploy labnet1002 - https://phabricator.wikimedia.org/T99701#1430906 (10Andrew) [17:41:16] (03CR) 10Jcrespo: [C: 032] Fully bring back up db2029 with normal load [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223060 (owner: 10Jcrespo) [17:41:19] we talked about it last week with filippo and Eric [17:41:21] 7Puppet, 6Labs, 3Labs-Sprint-104, 3Labs-Sprint-105: Allow per-host hiera overrides via wikitech - https://phabricator.wikimedia.org/T104202#1430907 (10yuvipanda) [17:42:39] !log jynus Synchronized wmf-config/db-codfw.php: increase db2029 traffic to normal levels (duration: 00m 12s) [17:42:44] Logged the message, Master [17:43:39] _joe_: we are now using the g1gc collector, which performs better for large heaps; its performance has been improved in jdk8, which is probably why it's been looking like a win on 1004 [17:44:15] <_joe_> I personally restarted cassandra on rb1004 at least 5 times last week [17:44:17] gwicke: where's that package coming from? backport from sid? jessie doesn't have openjdk-8 [17:44:20] with the large instances we are looking for anything to get us through the worst, until more data is deleted and compacted away [17:44:32] <_joe_> moritzm: yes, backport from sid [17:45:32] 1004 is the node with the largest load, followed by 1005 [17:45:48] 6operations, 10Traffic, 7HTTPS: Decom old multiple-subdomain wikis in wikipedia.org - https://phabricator.wikimedia.org/T102814#1430965 (10faidon) ``` faidon@oxygen:~$ egrep '(arbcom|wg.en)' per-domain-count 134 "arbcom-de.wikipedia.org" 100 "arbcom-nl.wikipedia.org" 74 "arbcom-en.wikipedia.org... 
[17:46:07] 6operations, 10ops-codfw, 5Patch-For-Review: Rename osm-cp2001, osm-cp2002, osm-cp2003, osm-cp2004 - https://phabricator.wikimedia.org/T104869#1430966 (10akosiaris) 5Open>3Resolved Resolving since all items have been done. Thanks! [17:47:15] 6operations, 10Traffic: Fix/decom multiple-subdomain wikis in wikimedia.org - https://phabricator.wikimedia.org/T102826#1430981 (10faidon) ``` faidon@oxygen:~$ egrep 'www\.(commons|meta|nl)\.wikimedia.org' per-domain-count 126 "www.commons.wikimedia.org" 10 "www.meta.wikimedia.org" 1 "www.nl.wi... [17:47:53] _joe_: so before we can move to jdk8 in general we'd need to update the package; should I create a ticket for that? [17:48:16] gwicke: all in all, !log is great already, but it'd be better to have a task and do it via gerrit so that a change is visible beforehand to others [17:48:40] if openjdk-8 turns out to be needed in prod, we can make it work, but it's a pile of work four times a year (per Oracle Critical patch update for Java) [17:48:43] (03PS2) 10Dzahn: static-bugzilla: update Apache config for 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/222692 [17:48:54] <_joe_> gwicke: what moritzm said [17:50:15] (03PS1) 10Jgreen: DNS A record for rigel.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/223063 [17:51:03] paravoid: agreed on more visibility = good [17:51:21] :) [17:51:59] (03CR) 10Jgreen: [C: 032 V: 031] DNS A record for rigel.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/223063 (owner: 10Jgreen) [17:52:09] (03PS4) 10Dzahn: switch static-bugzilla to backend bromine [puppet] - 10https://gerrit.wikimedia.org/r/222200 (https://phabricator.wikimedia.org/T101734) [17:52:18] _joe_, moritzm: how much work is the backport? is it a matter of importing the package from sid, or does it need a full rebuild? [17:52:43] <_joe_> it does need a full rebuild and some adjustments [17:53:02] it needs to be rebuild, but it has some logic to regenrate the debain/control file for various older suites [17:53:05] <_joe_> what we have now was a dodgy one-off [17:53:23] <_joe_> moritzm: AFAIR, some work was needed anyways [17:54:15] it's not too bad, I've done that multiple times for security updates for openjdk-6 and -7, (which are based on backports from sid as well due to Oracle being Oracle) [17:54:24] there are occasional bugs [17:54:27] !log authdns-update for new rigel A record [17:54:32] Logged the message, Master [17:54:45] and the build is frigging huge, the dbg package is a couple hundred megabytes [17:55:18] java land is so much fun [17:55:34] if we need this, I can build this tomorrow [17:56:08] moritzm: that would be awesome [17:56:24] should I create a ticket & assign to you? [17:56:54] <_joe_> gwicke: we'll discuss this shortly [17:57:19] kk [18:01:22] phabricator feed is gone.. 
hrmmmm [18:04:13] 6operations, 10RESTBase: Test JDK8 with Cassandra - https://phabricator.wikimedia.org/T104888#1431055 (10GWicke) 3NEW [18:04:26] 6operations, 10RESTBase: Update JDK 8 package in backports repo - https://phabricator.wikimedia.org/T104887#1431066 (10GWicke) [18:05:05] (03PS1) 10Chad: Phabricator: Turn on diffusion.allow-http-auth [puppet] - 10https://gerrit.wikimedia.org/r/223067 [18:05:05] 6operations, 10RESTBase: Test JDK8 with Cassandra - https://phabricator.wikimedia.org/T104888#1431068 (10GWicke) [18:05:31] (03PS2) 10Chad: Phabricator: Turn on diffusion.allow-http-auth [puppet] - 10https://gerrit.wikimedia.org/r/223067 [18:06:40] (03PS1) 10Yuvipanda: labstore: Remove nfs from dwl project [puppet] - 10https://gerrit.wikimedia.org/r/223068 (https://phabricator.wikimedia.org/T103864) [18:06:55] (03PS2) 10Yuvipanda: labstore: Remove nfs from dwl project [puppet] - 10https://gerrit.wikimedia.org/r/223068 (https://phabricator.wikimedia.org/T103864) [18:07:03] (03CR) 10Yuvipanda: [C: 032 V: 032] labstore: Remove nfs from dwl project [puppet] - 10https://gerrit.wikimedia.org/r/223068 (https://phabricator.wikimedia.org/T103864) (owner: 10Yuvipanda) [18:12:12] 6operations, 10Deployment-Systems, 6Performance-Team, 6Release-Engineering, 7HHVM: Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352#1431094 (10mmodell) p:5High>3Low [18:12:43] 6operations, 10Deployment-Systems, 6Performance-Team, 6Release-Engineering, 7HHVM: Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352#1414314 (10mmodell) p:5Low>3Normal [18:14:41] 6operations, 10Deployment-Systems, 5Patch-For-Review: Trebuchet doesn't like when a deployer server is also a minion, a edge case for scap - https://phabricator.wikimedia.org/T67549#1431113 (10thcipriani) [18:15:30] (03PS1) 10Dzahn: static-bugzilla: remove role from zirconium [puppet] - 10https://gerrit.wikimedia.org/r/223071 (https://phabricator.wikimedia.org/T101734) [18:15:46] 6operations, 10Deployment-Systems, 5Patch-For-Review: Trebuchet doesn't like when a deployer server is also a minion, a edge case for scap - https://phabricator.wikimedia.org/T67549#1431124 (10thcipriani) Are there any updates that need to happen on this patch? Could pull to deployment-prep for a sanity check. [18:21:20] 6operations, 6Discovery, 10Wikidata, 10Wikidata-Query-Service, 3Discovery-Wikidata-Query-Service-Sprint: Define the details of the hardware we need to run WDQS - https://phabricator.wikimedia.org/T104879#1431165 (10Smalyshev) Initial estimate: - 64G RAM at least - SSD drive. DB size is about 70G now, if... [18:21:53] 6operations, 10Gather, 7Database, 7Schema-change: Update Gather DB schema for flagging backend - https://phabricator.wikimedia.org/T103611#1431168 (10Jdlrobson) [18:22:05] !log restarted cassandra on restbase1004 with jdk8 [18:22:10] Logged the message, Master [18:28:38] 6operations, 10RESTBase: Test JDK8 with Cassandra - https://phabricator.wikimedia.org/T104888#1431231 (10GWicke) It turns out that restbase1004 was actually downgraded to jdk7 on Friday (please log such changes!). This seems to have negatively affected its stability, with lots of restarts over the weekend, and... 
[18:30:37] (03PS1) 10Yuvipanda: labstore: Use safe_load vs load for yaml loading [puppet] - 10https://gerrit.wikimedia.org/r/223074 [18:32:49] (03PS1) 10Yuvipanda: labstore: Be less noisy in logging [puppet] - 10https://gerrit.wikimedia.org/r/223075 [18:32:54] (03CR) 10jenkins-bot: [V: 04-1] labstore: Be less noisy in logging [puppet] - 10https://gerrit.wikimedia.org/r/223075 (owner: 10Yuvipanda) [18:33:11] (03PS2) 10Yuvipanda: labstore: Use safe_load vs load for yaml loading [puppet] - 10https://gerrit.wikimedia.org/r/223074 [18:38:38] 6operations, 10Gather, 7Database, 7Schema-change: Update Gather DB schema for flagging backend - https://phabricator.wikimedia.org/T103611#1431287 (10Tgr) 5Open>3Resolved Ran `UPDATE gather_list SET gl_perm = 0 where gl_perm_override = 1` on Gather wikis (hewiki: 2 rows affected, enwiki: 124 rows). Ver... [18:40:39] 6operations, 10Deployment-Systems, 6Performance-Team, 6Release-Engineering, and 2 others: Translation cache exhaustion caused by changes to PHP code in file scope - https://phabricator.wikimedia.org/T103886#1431310 (10mmodell) p:5High>3Normal [18:40:50] 6operations, 10Gather, 7Database, 7Schema-change: Update Gather DB schema for flagging backend - https://phabricator.wikimedia.org/T103611#1431314 (10Jdlrobson) [18:42:35] chasemp: re: phragile, I won't have time to do anything at all with it, unfortunately. You should merge when you feel comfortable with it - nothing labs specific needs to be done. [18:42:43] let me put that on the ticket too [18:42:47] what about module vs labs role? [18:43:01] i.e. how do they take this module and make it applicable in labs [18:43:28] chasemp: oh, I can link to docs for that, yeah. [18:43:50] https://wikitech.wikimedia.org/wiki/Special:NovaPuppetGroup basically [18:44:10] (03CR) 10Giuseppe Lavagetto: [C: 04-1] add ferm rules for redis (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/222554 (owner: 10Muehlenhoff) [18:44:27] chasemp: responded. [18:45:40] 6operations, 10Deployment-Systems, 6Performance-Team, 7Varnish: Verify traffic to static resources from past branches does indeed drain - https://phabricator.wikimedia.org/T102991#1431345 (10mmodell) [18:46:33] yuvipanda: where you said "can make the role available by" they haven't defined a role at all [18:46:40] that's what I meant :) [18:46:53] chasemp: ah, then they should :) but they can just put a class there too [18:48:38] 6operations, 10Traffic, 7HTTPS: Decom old multiple-subdomain wikis in wikipedia.org - https://phabricator.wikimedia.org/T102814#1431376 (10Reedy) >>! In T102814#1430965, @faidon wrote: > ``` > faidon@oxygen:~$ egrep '(arbcom|wg.en)' per-domain-count > 134 "arbcom-de.wikipedia.org" > 100 "arbcom-nl.w... [18:51:48] !log restarted cassandra on restbase1001 with jdk8, see T104888 [18:51:52] Logged the message, Master [18:53:19] 6operations, 6Phabricator, 6Security: Phabricator dependence on wmfusercontent.org - https://phabricator.wikimedia.org/T104730#1431426 (10mmodell) @matmarex: The reason for our current setup is simply that it was the only convenient way to do it. We don't have a dedicated phabricator team and it's simply not... [18:55:46] 6operations, 10RESTBase: Test JDK8 with Cassandra - https://phabricator.wikimedia.org/T104888#1431441 (10GWicke) As latencies and timeout rates seem to have improved since switching 1004 and 1005, I went ahead and switched 1001 to jdk8 as well. This means that 1/2 the cluster and all the largest (by storage si... 
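For context on the safe_load-vs-load patch near the top of this block: in PyYAML, yaml.safe_load only builds plain Python types, while plain yaml.load with the default loader can be made to construct arbitrary objects from tagged input. A minimal example; the YAML content here is invented and is not from the labstore script:

    import yaml

    doc = "project: dwl\nnfs: false\n"

    # safe_load restricts the result to dicts, lists, strings, numbers and
    # booleans, so semi-trusted YAML cannot trigger arbitrary object
    # construction the way the full loader can.
    config = yaml.safe_load(doc)
    print(config)   # -> {'project': 'dwl', 'nfs': False}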
[18:58:13] (03PS2) 10Jforrester: Set wmgVisualEditorNamespaces to array() if null [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223040 [18:58:58] PROBLEM - puppet last run on mw2064 is CRITICAL Puppet has 1 failures [19:03:40] (03PS1) 10Negative24: hiera: phab-pup-testing ssh on port 222 [puppet] - 10https://gerrit.wikimedia.org/r/223081 [19:04:00] 6operations, 7Database: codfw frontends cannot connect to mysql at db2029 - https://phabricator.wikimedia.org/T104573#1431512 (10jcrespo) 5Open>3Resolved Upgrade and reaplication of grants fixed the issue. [19:05:42] (03CR) 10Yuvipanda: [C: 032 V: 032] hiera: phab-pup-testing ssh on port 222 [puppet] - 10https://gerrit.wikimedia.org/r/223081 (owner: 10Negative24) [19:07:49] (03PS1) 10Negative24: Revert "hiera: phab-pup-testing ssh on port 222" [puppet] - 10https://gerrit.wikimedia.org/r/223084 [19:08:24] mark, hi, any updates on the ssd drives? [19:08:39] i heard they just arrived [19:09:17] suitably sweet delivery? ;) [19:10:16] MaxSem, woot! ^^ [19:10:39] w00000000000000 [19:10:56] mark, whom should we bug next :) [19:11:47] !log reduced compaction throughput from 160 to 100 mb/s across the cassandra cluster via 'nodetool -h setcompactionthroughput 100' [19:11:49] (03PS3) 10Legoktm: Set wmgVisualEditorNamespaces to array() if null [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223040 (owner: 10Jforrester) [19:11:52] Logged the message, Master [19:12:32] (03PS2) 10Andrew Bogott: Limit LDAP access to internal [puppet] - 10https://gerrit.wikimedia.org/r/222567 (https://phabricator.wikimedia.org/T102481) (owner: 10Muehlenhoff) [19:12:55] yurik: alex and jaime will be working with you on that [19:13:11] so you should be getting them soon [19:14:10] (03CR) 10Andrew Bogott: [C: 032] Limit LDAP access to internal [puppet] - 10https://gerrit.wikimedia.org/r/222567 (https://phabricator.wikimedia.org/T102481) (owner: 10Muehlenhoff) [19:14:26] mark, thanks! will be happy to work with them on the project, should we set up an intro meeting soon? [19:14:30] MaxSem, ^ [19:14:31] (03PS2) 10Yuvipanda: Revert "hiera: phab-pup-testing ssh on port 222" [puppet] - 10https://gerrit.wikimedia.org/r/223084 (owner: 10Negative24) [19:14:37] yes you should [19:14:46] (03CR) 10Jforrester: [C: 031] Set wmgVisualEditorNamespaces to array() if null [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223040 (owner: 10Jforrester) [19:14:56] (03CR) 10Yuvipanda: [C: 032 V: 032] Revert "hiera: phab-pup-testing ssh on port 222" [puppet] - 10https://gerrit.wikimedia.org/r/223084 (owner: 10Negative24) [19:16:52] (03PS1) 10Negative24: hiera: phab-pup-test ssh on port 222 [puppet] - 10https://gerrit.wikimedia.org/r/223088 [19:17:29] RECOVERY - puppet last run on mw2064 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [19:18:11] (03PS1) 10Ottomata: Symlink .wars for git-fat in Archiva too [puppet] - 10https://gerrit.wikimedia.org/r/223089 [19:24:07] mark, i'm guessing Jaime's last name is Crespo, but which Alex are you referring to? [19:24:26] (03Abandoned) 10Negative24: hiera: phab-pup-test ssh on port 222 [puppet] - 10https://gerrit.wikimedia.org/r/223088 (owner: 10Negative24) [19:24:50] Ops usually mean Alexandros Kosiaris [19:25:45] yurik: Krenair has root as of yesterday, so he'll be assisting you with hardware setup. [19:25:55] lol [19:26:04] yuvipanda, so we get 3 engineers! 
yepii [19:27:24] (03CR) 10Alex Monk: [C: 032] Set wmgVisualEditorNamespaces to array() if null [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223040 (owner: 10Jforrester) [19:27:48] (03Merged) 10jenkins-bot: Set wmgVisualEditorNamespaces to array() if null [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223040 (owner: 10Jforrester) [19:28:44] !log krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/223040/ (duration: 00m 12s) [19:28:47] James_F, ^ [19:28:48] Logged the message, Master [19:28:49] legoktm: Should we be changing require_once( "$IP/extensions/VisualEditor/VisualEditor.php" ); et al. to wfLoadExtension at some point? Does it still have to wait for i18n fixes? [19:30:05] James_F: once things stabilize sure. extension-list should be updated to point to extension.json at the same time [19:32:13] legoktm: When is "once things stabilize"? Do you mean, only once all production extensions are converted? Once you've got a moment to think? Something else? [19:32:26] Krenair: Looks like it's working as anticipated. [19:32:30] great [19:32:40] Krenair: Are the fatals going down/gone? [19:32:44] James_F: once wmf13 is on all wikis [19:33:04] legoktm: Hmm. Let's not break things during Wikimania though. :-) [19:33:16] I can use the sql command in beta again now [19:33:17] yeah, no rush [19:33:25] so yes, it's fixed [19:34:26] Krenair: Success. :-) [19:41:29] yurik, um, yuvipanda was sort of making a joke when he said I had root :) [19:41:47] Krenair, that's why i marked you optional :D [19:41:49] 6operations, 10Wikimedia-Mailing-lists: Ban *@utdliving.com from sending any email to the mailman server - https://phabricator.wikimedia.org/T68318#1431736 (10Jalexander) I'm sorry, but that link is not a helpful one for this particular issue (which is a chronic one and really quite bad). The issue is that hug... [19:42:02] I probably can't actually contribute much to your meeting [19:43:02] Krenair: attend to kill time [19:43:11] (if you have some that is of course) [19:43:29] Krenair, YOU'VE BEEN VOLUNTEERED, RESISTANCE IS FUTILE [19:54:05] 7Puppet, 6Labs, 6Phabricator: On labs phabricator references security extension even though it isn't present - https://phabricator.wikimedia.org/T104904#1431755 (10Negative24) 3NEW a:3Negative24 [19:54:25] (03PS1) 10Yuvipanda: labstore: Remove NFS from puppet3-diffs project [puppet] - 10https://gerrit.wikimedia.org/r/223148 (https://phabricator.wikimedia.org/T103760) [19:54:31] (03CR) 10jenkins-bot: [V: 04-1] labstore: Remove NFS from puppet3-diffs project [puppet] - 10https://gerrit.wikimedia.org/r/223148 (https://phabricator.wikimedia.org/T103760) (owner: 10Yuvipanda) [19:55:22] (03PS2) 10Dzahn: static-bugzilla: remove role from zirconium [puppet] - 10https://gerrit.wikimedia.org/r/223071 (https://phabricator.wikimedia.org/T101734) [19:56:05] (03PS2) 10Yuvipanda: labstore: Remove NFS from puppet3-diffs project [puppet] - 10https://gerrit.wikimedia.org/r/223148 (https://phabricator.wikimedia.org/T103760) [19:56:18] JohnFLewis: what meeting is this? 
[19:56:28] (03CR) 10Yuvipanda: [C: 032 V: 032] labstore: Remove NFS from puppet3-diffs project [puppet] - 10https://gerrit.wikimedia.org/r/223148 (https://phabricator.wikimedia.org/T103760) (owner: 10Yuvipanda) [19:56:34] Negative24: context :) [19:56:52] (give context I mean) [19:57:34] I probably can't actually contribute much to your meeting [19:58:09] Negative24: yurik/max's next steps for using recently delivered SSDs [19:58:14] (03PS3) 10Dzahn: static-bugzilla: remove role from zirconium [puppet] - 10https://gerrit.wikimedia.org/r/223071 (https://phabricator.wikimedia.org/T101734) [19:58:37] (03PS4) 10Dzahn: static-bugzilla: remove role from zirconium [puppet] - 10https://gerrit.wikimedia.org/r/223071 (https://phabricator.wikimedia.org/T101734) [19:58:43] (03CR) 10Dzahn: [C: 032] static-bugzilla: remove role from zirconium [puppet] - 10https://gerrit.wikimedia.org/r/223071 (https://phabricator.wikimedia.org/T101734) (owner: 10Dzahn) [19:59:01] Negative24, maps service :) [19:59:22] your using them for maps? [20:00:03] yurik: I thought we used paper for maps? :p [20:00:04] gwicke cscott arlolra subbu: Dear anthropoid, the time has come. Please deploy Services – Parsoid / OCG / Citoid / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150706T2000). [20:00:24] JohnFLewis, we draw on sand, thank you very much [20:00:26] can't exactly read a SSD walking around a place [20:00:36] Negative24, use whom? [20:00:41] sand? reusable maps :D [20:00:41] i use everyone for maps [20:00:59] yurik: SSDs [20:01:01] (03PS2) 10Yuvipanda: [WIP] ores: worker role [puppet] - 10https://gerrit.wikimedia.org/r/222919 [20:01:18] JohnFLewis, well, technically I use computers for maps, but yes, SSDs do play a role there :) [20:01:23] nm I am completely walking into a conversation [20:01:48] yurik: computers? 
rackmount servers more like [20:01:58] get the specific terms ;) [20:02:13] (03PS3) 10Dzahn: static-bugzilla: update Apache config for 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/222692 [20:02:36] (03PS3) 10Yuvipanda: ores: worker role to do celery processing [puppet] - 10https://gerrit.wikimedia.org/r/222919 [20:03:39] PROBLEM - Cassanda CQL query interface on restbase1005 is CRITICAL: Connection refused [20:03:55] (03PS4) 10Dzahn: static-bugzilla: update Apache config for 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/222692 (https://phabricator.wikimedia.org/T101734) [20:03:57] * mobrovac on it ^^ [20:04:07] !log restbase restart cassandra on rb1005 [20:04:11] Logged the message, Master [20:04:39] PROBLEM - Cassandra database on restbase1005 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (cassandra), command name java, args CassandraDaemon [20:05:43] mobrovac: whats the status of rb1009 btw [20:06:00] because i see it in icinga [20:06:02] mutante: not in prod, afaik [20:06:22] mutante: rb100[7-9] can be safely silenced [20:06:29] RECOVERY - Cassandra database on restbase1005 is OK: PROCS OK: 1 process with UID = 113 (cassandra), command name java, args CassandraDaemon [20:06:35] mobrovac: alright [20:07:19] RECOVERY - Cassanda CQL query interface on restbase1005 is OK: TCP OK - 0.004 second response time on port 9042 [20:08:26] ACKNOWLEDGEMENT - Cassanda CQL query interface on restbase1009 is CRITICAL: Connection timed out daniel_zahn not in production [20:08:26] ACKNOWLEDGEMENT - Cassandra database on restbase1009 is CRITICAL: Timeout while attempting connection daniel_zahn not in production [20:08:26] ACKNOWLEDGEMENT - DPKG on restbase1009 is CRITICAL: Timeout while attempting connection daniel_zahn not in production [20:08:26] ACKNOWLEDGEMENT - Disk space on restbase1009 is CRITICAL: Timeout while attempting connection daniel_zahn not in production [20:08:26] ACKNOWLEDGEMENT - NTP on restbase1009 is CRITICAL: NTP CRITICAL: No response from NTP server daniel_zahn not in production [20:08:26] ACKNOWLEDGEMENT - RAID on restbase1009 is CRITICAL: Timeout while attempting connection daniel_zahn not in production [20:08:26] ACKNOWLEDGEMENT - SSH on restbase1009 is CRITICAL: Connection timed out daniel_zahn not in production [20:08:27] ACKNOWLEDGEMENT - configured eth on restbase1009 is CRITICAL: Timeout while attempting connection daniel_zahn not in production [20:08:27] ACKNOWLEDGEMENT - dhclient process on restbase1009 is CRITICAL: Timeout while attempting connection daniel_zahn not in production [20:08:28] ACKNOWLEDGEMENT - puppet last run on restbase1009 is CRITICAL: Timeout while attempting connection daniel_zahn not in production [20:08:28] ACKNOWLEDGEMENT - salt-minion processes on restbase1009 is CRITICAL: Timeout while attempting connection daniel_zahn not in production [20:09:08] ACKNOWLEDGEMENT - Host restbase1009 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn not in production [20:09:27] Acknowledge mee too you bot! 
:p [20:09:30] PROBLEM - puppet last run on restbase1005 is CRITICAL Puppet last ran 7 hours ago [20:10:28] PROBLEM - Host labvirt1005 is DOWN: PING CRITICAL - Packet loss = 100% [20:10:49] 6operations, 6Analytics-Backlog, 10Deployment-Systems, 6Performance-Team, 7Varnish: Verify traffic to static resources from past branches does indeed drain - https://phabricator.wikimedia.org/T102991#1431847 (10Krinkle) [20:13:08] PROBLEM - Cassanda CQL query interface on restbase1005 is CRITICAL: Connection refused [20:14:05] 6operations, 6Analytics-Backlog, 10Deployment-Systems, 6Performance-Team, 7Varnish: Verify traffic to static resources from past branches does indeed drain - https://phabricator.wikimedia.org/T102991#1431863 (10Krinkle) Presumably by adding a tail subscriber to the varnish stream. Basically we'd collect... [20:14:08] PROBLEM - Cassandra database on restbase1005 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (cassandra), command name java, args CassandraDaemon [20:14:50] RECOVERY - Host labvirt1005 is UPING OK - Packet loss = 0%, RTA = 1.20 ms [20:14:59] ACKNOWLEDGEMENT - Disk space on labstore2001 is CRITICAL: DISK CRITICAL - free space: /srv/backup-others-20150703 8550 MB (0% inode=98%): daniel_zahn backup - not growing - 8.4G free [20:15:54] !log restart cassandra instance on 1005 [20:15:58] Logged the message, Master [20:15:59] RECOVERY - Cassandra database on restbase1005 is OK: PROCS OK: 1 process with UID = 113 (cassandra), command name java, args CassandraDaemon [20:16:48] RECOVERY - Cassanda CQL query interface on restbase1005 is OK: TCP OK - 0.005 second response time on port 9042 [20:16:55] (03CR) 10Yuvipanda: [C: 032 V: 032] ores: worker role to do celery processing [puppet] - 10https://gerrit.wikimedia.org/r/222919 (owner: 10Yuvipanda) [20:17:01] Hello Analytics channel. I just joined you because i was requested in T96928. This is a test but from now on i should report actual Icinga issues here but only the ones analytics is a contact for. [20:17:44] ACKNOWLEDGEMENT - ToAruShiroiNeko is OK - Chat loss = 0% [20:17:52] * yuvipanda pats mutante [20:17:59] :D [20:18:23] I am tempted to screenshot this :D [20:18:35] I've never seen an "ACKNOWLEDGEMENT" from icinga. Can someone explain? [20:18:37] its nice to know donations goes to good use :p [20:18:53] Negative24: its where someone acknowledges an issue within icinga [20:18:57] * yuvipanda trouts JohnFLewis [20:18:58] PROBLEM - Host labvirt1005 is DOWN: PING CRITICAL - Packet loss = 100% [20:18:58] PROBLEM - Host labvirt1005 is DOWN: PING CRITICAL - Packet loss = 100% [20:19:02] hehe [20:19:18] yuvipanda: what? :D [20:19:19] complimentary second spam [20:19:34] hashar: are you really? [20:19:35] someone, break puppet quick! [20:19:42] Negative24: http://docs.icinga.org/latest/en/extcommands2.html [20:20:03] it turns off notifications but only until the state changes again [20:20:33] mutante: you've got to break puppet while we get double spam - it'll look... awesome? cool? fun? 
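As a side note on the acknowledgements shown above: they are plain external commands written to Icinga's command pipe (see the extcommands documentation linked in the discussion), and they only silence notifications until the state changes again. A minimal sketch; the command-file path is an assumption and depends on the local Icinga configuration:

    import time

    # Assumed path; the real location is whatever command_file is set to.
    COMMAND_FILE = "/var/lib/icinga/rw/icinga.cmd"

    def ack_service(host, service, author, comment):
        # Field order per the Icinga 1.x external command reference:
        # ACKNOWLEDGE_SVC_PROBLEM;<host>;<service>;<sticky>;<notify>;<persistent>;<author>;<comment>
        line = "[%d] ACKNOWLEDGE_SVC_PROBLEM;%s;%s;1;1;0;%s;%s\n" % (
            int(time.time()), host, service, author, comment)
        with open(COMMAND_FILE, "w") as cmd:
            cmd.write(line)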
[20:22:17] RECOVERY - Host labvirt1005 is UPING OK - Packet loss = 0%, RTA = 1.29 ms [20:22:17] RECOVERY - Host labvirt1005 is UPING OK - Packet loss = 0%, RTA = 1.29 ms [20:22:18] PROBLEM - puppet last run on labvirt1005 is CRITICAL: Connection refused by host [20:22:19] PROBLEM - puppet last run on labvirt1005 is CRITICAL: Connection refused by host [20:22:19] PROBLEM - nova-compute process on labvirt1005 is CRITICAL: Connection refused by host [20:22:19] PROBLEM - nova-compute process on labvirt1005 is CRITICAL: Connection refused by host [20:22:28] PROBLEM - salt-minion processes on labvirt1005 is CRITICAL: Connection refused by host [20:22:28] PROBLEM - salt-minion processes on labvirt1005 is CRITICAL: Connection refused by host [20:22:32] <3 [20:23:03] (03CR) 10Dzahn: [C: 032] static-bugzilla: update Apache config for 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/222692 (https://phabricator.wikimedia.org/T101734) (owner: 10Dzahn) [20:23:17] PROBLEM - Cassandra database on restbase1005 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (cassandra), command name java, args CassandraDaemon [20:23:18] PROBLEM - Cassandra database on restbase1005 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (cassandra), command name java, args CassandraDaemon [20:23:24] "Project policy requires all submissions to be a fast-forward." hhrrr,, yes yes.. [20:23:37] PROBLEM - Cassanda CQL query interface on restbase1005 is CRITICAL: Connection refused [20:23:37] PROBLEM - Cassanda CQL query interface on restbase1005 is CRITICAL: Connection refused [20:23:49] (03PS1) 10Yuvipanda: uwsgi: Do not setup nrpe monitor in labs [puppet] - 10https://gerrit.wikimedia.org/r/223153 [20:23:51] (03PS5) 10Dzahn: static-bugzilla: update Apache config for 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/222692 (https://phabricator.wikimedia.org/T101734) [20:25:08] RECOVERY - Cassandra database on restbase1005 is OK: PROCS OK: 1 process with UID = 113 (cassandra), command name java, args CassandraDaemon [20:25:08] RECOVERY - Cassandra database on restbase1005 is OK: PROCS OK: 1 process with UID = 113 (cassandra), command name java, args CassandraDaemon [20:25:08] PROBLEM - Host labvirt1005 is DOWN: PING CRITICAL - Packet loss = 100% [20:25:08] PROBLEM - Host labvirt1005 is DOWN: PING CRITICAL - Packet loss = 100% [20:25:28] RECOVERY - Cassanda CQL query interface on restbase1005 is OK: TCP OK - 0.011 second response time on port 9042 [20:25:28] RECOVERY - Cassanda CQL query interface on restbase1005 is OK: TCP OK - 0.011 second response time on port 9042 [20:26:58] (03CR) 10Yuvipanda: [C: 032 V: 032] uwsgi: Do not setup nrpe monitor in labs [puppet] - 10https://gerrit.wikimedia.org/r/223153 (owner: 10Yuvipanda) [20:27:27] 6operations, 10ops-eqiad, 10Analytics-Cluster: analytics1020 down - https://phabricator.wikimedia.org/T104856#1431916 (10Cmjohnson) 5Open>3Resolved Fixed. 
Idrac license was missing [20:28:09] (03PS6) 10Dzahn: static-bugzilla: update Apache config for 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/222692 (https://phabricator.wikimedia.org/T101734) [20:28:16] omfg, how many rebases can it possible ask for [20:29:38] RECOVERY - Host labvirt1005 is UPING OK - Packet loss = 0%, RTA = 0.99 ms [20:29:38] RECOVERY - Host labvirt1005 is UPING OK - Packet loss = 0%, RTA = 0.99 ms [20:30:21] mutante: ummm, https://static-bugzilla.wikimedia.org/ [20:30:26] (03PS1) 10Yuvipanda: ores: Fix conflict on including ores::base [puppet] - 10https://gerrit.wikimedia.org/r/223156 [20:30:32] mutante: I'm getting redirected to the annual report?? [20:30:44] legoktm: it will be fixed in a few minutes [20:30:45] lol. [20:30:54] (03PS2) 10Yuvipanda: ores: Fix conflict on including ores::base [puppet] - 10https://gerrit.wikimedia.org/r/223156 [20:30:57] ok :) [20:30:58] i am switching it right now, i had to remove the role from the old host first [20:30:59] legoktm: it's sending you a message. [20:31:02] Listen to it [20:31:08] annual report just because it's the first vhost [20:31:21] ..because i need to change Apache config [20:31:24] between 2.2 and 2.4 [20:31:28] RECOVERY - puppet last run on restbase1005 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:31:29] RECOVERY - puppet last run on restbase1005 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:31:32] (03CR) 10Yuvipanda: [C: 032 V: 032] ores: Fix conflict on including ores::base [puppet] - 10https://gerrit.wikimedia.org/r/223156 (owner: 10Yuvipanda) [20:32:40] (03PS5) 10Dzahn: switch static-bugzilla to backend bromine [puppet] - 10https://gerrit.wikimedia.org/r/222200 (https://phabricator.wikimedia.org/T101734) [20:32:54] (03PS1) 10Yuvipanda: ores: Vim has its problems, I must admit [puppet] - 10https://gerrit.wikimedia.org/r/223160 [20:32:56] now that everybody clicked on it it may be in cache.. grmbl [20:32:58] (03CR) 10jenkins-bot: [V: 04-1] ores: Vim has its problems, I must admit [puppet] - 10https://gerrit.wikimedia.org/r/223160 (owner: 10Yuvipanda) [20:33:16] (03PS2) 10Yuvipanda: ores: Vim has its problems, I must admit [puppet] - 10https://gerrit.wikimedia.org/r/223160 [20:33:19] PROBLEM - dhclient process on labvirt1005 is CRITICAL: Connection refused by host [20:33:19] PROBLEM - dhclient process on labvirt1005 is CRITICAL: Connection refused by host [20:33:34] (03CR) 10Yuvipanda: [C: 032 V: 032] ores: Vim has its problems, I must admit [puppet] - 10https://gerrit.wikimedia.org/r/223160 (owner: 10Yuvipanda) [20:33:42] (03CR) 10Dzahn: [C: 032] switch static-bugzilla to backend bromine [puppet] - 10https://gerrit.wikimedia.org/r/222200 (https://phabricator.wikimedia.org/T101734) (owner: 10Dzahn) [20:33:48] (03PS6) 10Dzahn: switch static-bugzilla to backend bromine [puppet] - 10https://gerrit.wikimedia.org/r/222200 (https://phabricator.wikimedia.org/T101734) [20:33:57] "lab sores" [20:34:08] PROBLEM - Disk space on labvirt1005 is CRITICAL: Connection refused by host [20:34:08] PROBLEM - Disk space on labvirt1005 is CRITICAL: Connection refused by host [20:34:16] andrewbogott: ^ [20:34:19] PROBLEM - RAID on labvirt1005 is CRITICAL: Connection refused by host [20:34:19] PROBLEM - RAID on labvirt1005 is CRITICAL: Connection refused by host [20:34:20] PROBLEM - DPKG on labvirt1005 is CRITICAL: Connection refused by host [20:34:20] PROBLEM - DPKG on labvirt1005 is CRITICAL: Connection refused by host [20:34:25] mutante: that’s me, rebuilding. 
[20:34:33] Sorry, it shouldn’t have thrown alerts but puppet is acting goofy [20:34:52] andrewbogott: alright [20:34:57] PROBLEM - configured eth on labvirt1005 is CRITICAL: Connection refused by host [20:34:57] PROBLEM - configured eth on labvirt1005 is CRITICAL: Connection refused by host [20:35:29] ACKNOWLEDGEMENT - DPKG on labvirt1005 is CRITICAL: Connection refused by host andrew bogott rebuilding [20:35:29] ACKNOWLEDGEMENT - DPKG on labvirt1005 is CRITICAL: Connection refused by host andrew bogott rebuilding [20:35:29] ACKNOWLEDGEMENT - Disk space on labvirt1005 is CRITICAL: Connection refused by host andrew bogott rebuilding [20:35:29] ACKNOWLEDGEMENT - NTP on labvirt1005 is CRITICAL: NTP CRITICAL: No response from NTP server andrew bogott rebuilding [20:35:29] ACKNOWLEDGEMENT - RAID on labvirt1005 is CRITICAL: Connection refused by host andrew bogott rebuilding [20:35:29] ACKNOWLEDGEMENT - configured eth on labvirt1005 is CRITICAL: Connection refused by host andrew bogott rebuilding [20:35:29] ACKNOWLEDGEMENT - Disk space on labvirt1005 is CRITICAL: Connection refused by host andrew bogott rebuilding [20:35:29] ACKNOWLEDGEMENT - NTP on labvirt1005 is CRITICAL: NTP CRITICAL: No response from NTP server andrew bogott rebuilding [20:35:29] ACKNOWLEDGEMENT - RAID on labvirt1005 is CRITICAL: Connection refused by host andrew bogott rebuilding [20:35:29] ACKNOWLEDGEMENT - configured eth on labvirt1005 is CRITICAL: Connection refused by host andrew bogott rebuilding [20:35:29] ACKNOWLEDGEMENT - dhclient process on labvirt1005 is CRITICAL: Connection refused by host andrew bogott rebuilding [20:35:29] ACKNOWLEDGEMENT - nova-compute process on labvirt1005 is CRITICAL: Connection refused by host andrew bogott rebuilding [20:35:29] ACKNOWLEDGEMENT - dhclient process on labvirt1005 is CRITICAL: Connection refused by host andrew bogott rebuilding [20:35:30] ACKNOWLEDGEMENT - nova-compute process on labvirt1005 is CRITICAL: Connection refused by host andrew bogott rebuilding [20:35:30] ACKNOWLEDGEMENT - puppet last run on labvirt1005 is CRITICAL: Connection refused by host andrew bogott rebuilding [20:35:30] ACKNOWLEDGEMENT - salt-minion processes on labvirt1005 is CRITICAL: Connection refused by host andrew bogott rebuilding [20:35:31] ACKNOWLEDGEMENT - puppet last run on labvirt1005 is CRITICAL: Connection refused by host andrew bogott rebuilding [20:35:31] ACKNOWLEDGEMENT - salt-minion processes on labvirt1005 is CRITICAL: Connection refused by host andrew bogott rebuilding [20:35:38] :) [20:35:47] PROBLEM - puppet last run on mw1043 is CRITICAL puppet fail [20:35:47] PROBLEM - puppet last run on mw1043 is CRITICAL puppet fail [20:35:48] PROBLEM - puppet last run on db1070 is CRITICAL puppet fail [20:35:48] PROBLEM - puppet last run on db1070 is CRITICAL puppet fail [20:36:08] PROBLEM - puppet last run on elastic1023 is CRITICAL puppet fail [20:36:08] PROBLEM - puppet last run on elastic1023 is CRITICAL puppet fail [20:36:28] PROBLEM - puppet last run on nitrogen is CRITICAL puppet fail [20:36:28] PROBLEM - puppet last run on nitrogen is CRITICAL puppet fail [20:36:28] PROBLEM - puppet last run on elastic1017 is CRITICAL puppet fail [20:36:29] PROBLEM - puppet last run on elastic1017 is CRITICAL puppet fail [20:36:29] PROBLEM - puppet last run on db1030 is CRITICAL puppet fail [20:36:29] PROBLEM - puppet last run on dbstore1002 is CRITICAL puppet fail [20:36:29] PROBLEM - puppet last run on db1030 is CRITICAL puppet fail [20:36:29] PROBLEM - puppet last run on dbstore1002 is CRITICAL puppet 
fail [20:36:29] PROBLEM - puppet last run on oxygen is CRITICAL puppet fail [20:36:29] PROBLEM - puppet last run on oxygen is CRITICAL puppet fail [20:36:29] PROBLEM - puppet last run on wtp1003 is CRITICAL puppet fail [20:36:29] PROBLEM - puppet last run on wtp1003 is CRITICAL puppet fail [20:36:29] PROBLEM - puppet last run on db2063 is CRITICAL puppet fail [20:36:29] PROBLEM - puppet last run on db2063 is CRITICAL puppet fail [20:36:37] PROBLEM - puppet last run on mw1016 is CRITICAL puppet fail [20:36:38] PROBLEM - puppet last run on mw1016 is CRITICAL puppet fail [20:36:38] PROBLEM - puppet last run on mw1024 is CRITICAL puppet fail [20:36:38] PROBLEM - puppet last run on mw1024 is CRITICAL puppet fail [20:36:38] PROBLEM - puppet last run on ganeti2003 is CRITICAL puppet fail [20:36:38] PROBLEM - puppet last run on ganeti2003 is CRITICAL puppet fail [20:36:47] PROBLEM - puppet last run on analytics1028 is CRITICAL puppet fail [20:36:47] PROBLEM - puppet last run on analytics1028 is CRITICAL puppet fail [20:36:48] PROBLEM - puppet last run on cp3039 is CRITICAL puppet fail [20:36:48] PROBLEM - puppet last run on cp3039 is CRITICAL puppet fail [20:36:48] PROBLEM - puppet last run on ms-be1011 is CRITICAL puppet fail [20:36:48] PROBLEM - puppet last run on ms-be1011 is CRITICAL puppet fail [20:36:48] PROBLEM - puppet last run on db2053 is CRITICAL puppet fail [20:36:48] PROBLEM - puppet last run on db2053 is CRITICAL puppet fail [20:36:48] PROBLEM - puppet last run on ganeti2002 is CRITICAL puppet fail [20:36:48] PROBLEM - puppet last run on ganeti2002 is CRITICAL puppet fail [20:36:48] PROBLEM - puppet last run on mw1201 is CRITICAL puppet fail [20:36:48] PROBLEM - puppet last run on mw1201 is CRITICAL puppet fail [20:36:49] PROBLEM - puppet last run on mw1022 is CRITICAL puppet fail [20:36:49] PROBLEM - puppet last run on mw1022 is CRITICAL puppet fail [20:36:49] PROBLEM - puppet last run on mw1077 is CRITICAL puppet fail [20:36:49] PROBLEM - puppet last run on mw1077 is CRITICAL puppet fail [20:36:49] PROBLEM - puppet last run on mw1105 is CRITICAL puppet fail [20:36:49] PROBLEM - puppet last run on mw1105 is CRITICAL puppet fail [20:36:50] PROBLEM - puppet last run on wtp2020 is CRITICAL puppet fail [20:36:50] PROBLEM - puppet last run on wtp2020 is CRITICAL puppet fail [20:36:57] PROBLEM - puppet last run on db2050 is CRITICAL puppet fail [20:36:57] PROBLEM - puppet last run on wtp2006 is CRITICAL puppet fail [20:36:57] PROBLEM - puppet last run on db2050 is CRITICAL puppet fail [20:36:57] PROBLEM - puppet last run on wtp2006 is CRITICAL puppet fail [20:36:57] lol [20:36:58] PROBLEM - puppet last run on mw1231 is CRITICAL puppet fail [20:36:58] PROBLEM - puppet last run on mw1231 is CRITICAL puppet fail [20:36:58] PROBLEM - puppet last run on mw2204 is CRITICAL puppet fail [20:36:58] PROBLEM - puppet last run on mw2204 is CRITICAL puppet fail [20:36:58] PROBLEM - puppet last run on ms-fe2002 is CRITICAL puppet fail [20:36:58] PROBLEM - puppet last run on ms-fe2002 is CRITICAL puppet fail [20:36:58] PROBLEM - puppet last run on mw2153 is CRITICAL puppet fail [20:36:58] PROBLEM - puppet last run on mw2153 is CRITICAL puppet fail [20:36:58] PROBLEM - puppet last run on mw2175 is CRITICAL puppet fail [20:36:58] PROBLEM - puppet last run on mw2175 is CRITICAL puppet fail [20:36:59] PROBLEM - puppet last run on iodine is CRITICAL puppet fail [20:36:59] PROBLEM - puppet last run on labvirt1008 is CRITICAL puppet fail [20:36:59] PROBLEM - puppet last run on iodine is CRITICAL puppet fail 
[20:36:59] PROBLEM - puppet last run on labvirt1008 is CRITICAL puppet fail [20:37:00] PROBLEM - puppet last run on mw1108 is CRITICAL puppet fail [20:37:00] PROBLEM - puppet last run on es1010 is CRITICAL puppet fail [20:37:00] PROBLEM - puppet last run on mw1108 is CRITICAL puppet fail [20:37:00] PROBLEM - puppet last run on es1010 is CRITICAL puppet fail [20:37:00] PROBLEM - puppet last run on analytics1004 is CRITICAL puppet fail [20:37:01] PROBLEM - puppet last run on analytics1004 is CRITICAL puppet fail [20:37:01] PROBLEM - puppet last run on db1063 is CRITICAL puppet fail [20:37:02] PROBLEM - puppet last run on graphite1001 is CRITICAL puppet fail [20:37:02] Can we at least kill the dupe bot? [20:37:17] PROBLEM - puppet last run on mw1225 is CRITICAL puppet fail [20:37:17] PROBLEM - puppet last run on mw1225 is CRITICAL puppet fail [20:37:18] PROBLEM - puppet last run on db2011 is CRITICAL puppet fail [20:37:18] PROBLEM - puppet last run on db2011 is CRITICAL puppet fail [20:37:18] PROBLEM - puppet last run on db2012 is CRITICAL puppet fail [20:37:18] PROBLEM - puppet last run on db2012 is CRITICAL puppet fail [20:37:18] PROBLEM - puppet last run on fluorine is CRITICAL puppet fail [20:37:18] PROBLEM - puppet last run on fluorine is CRITICAL puppet fail [20:37:18] PROBLEM - puppet last run on mw2001 is CRITICAL puppet fail [20:37:18] PROBLEM - puppet last run on mw2001 is CRITICAL puppet fail [20:37:18] PROBLEM - puppet last run on mw2064 is CRITICAL puppet fail [20:37:18] PROBLEM - puppet last run on mw2064 is CRITICAL puppet fail [20:37:18] PROBLEM - puppet last run on mw2071 is CRITICAL puppet fail [20:37:18] PROBLEM - puppet last run on mw2071 is CRITICAL puppet fail [20:37:19] PROBLEM - puppet last run on db2030 is CRITICAL puppet fail [20:37:19] PROBLEM - puppet last run on db2030 is CRITICAL puppet fail [20:37:19] PROBLEM - puppet last run on mw2009 is CRITICAL puppet fail [20:37:19] PROBLEM - puppet last run on mw2009 is CRITICAL puppet fail [20:37:20] PROBLEM - puppet last run on mw2031 is CRITICAL puppet fail [20:37:20] PROBLEM - puppet last run on mw2031 is CRITICAL puppet fail [20:37:20] PROBLEM - puppet last run on db2028 is CRITICAL puppet fail [20:37:20] PROBLEM - puppet last run on db2028 is CRITICAL puppet fail [20:37:21] PROBLEM - puppet last run on iridium is CRITICAL puppet fail [20:37:21] PROBLEM - puppet last run on iridium is CRITICAL puppet fail [20:37:22] PROBLEM - puppet last run on mw2010 is CRITICAL puppet fail [20:37:22] PROBLEM - puppet last run on mw2010 is CRITICAL puppet fail [20:37:22] PROBLEM - puppet last run on mw2063 is CRITICAL puppet fail [20:37:22] PROBLEM - puppet last run on mw2063 is CRITICAL puppet fail [20:37:23] ugh [20:37:24] that's me [20:37:27] PROBLEM - puppet last run on mw1093 is CRITICAL puppet fail [20:37:27] PROBLEM - puppet last run on mw1093 is CRITICAL puppet fail [20:37:27] PROBLEM - puppet last run on mw1185 is CRITICAL puppet fail [20:37:27] PROBLEM - puppet last run on mw1185 is CRITICAL puppet fail [20:37:27] PROBLEM - puppet last run on mc2009 is CRITICAL puppet fail [20:37:27] PROBLEM - puppet last run on mc2009 is CRITICAL puppet fail [20:37:27] PROBLEM - puppet last run on mw1121 is CRITICAL puppet fail [20:37:27] PROBLEM - puppet last run on mw1121 is CRITICAL puppet fail [20:37:28] PROBLEM - puppet last run on mw2020 is CRITICAL puppet fail [20:37:28] PROBLEM - puppet last run on mw2020 is CRITICAL puppet fail [20:37:28] PROBLEM - puppet last run on mc2012 is CRITICAL puppet fail [20:37:28] PROBLEM - puppet last 
run on mc2012 is CRITICAL puppet fail [20:37:48] PROBLEM - puppet last run on mw1209 is CRITICAL puppet fail [20:37:48] PROBLEM - puppet last run on mw1219 is CRITICAL puppet fail [20:37:48] PROBLEM - puppet last run on ganeti2001 is CRITICAL puppet fail [20:37:51] Reedy: there, fixed itself [20:37:56] no I killed one [20:37:56] yuvipanda: :D [20:37:57] PROBLEM - puppet last run on mw1139 is CRITICAL puppet fail [20:37:57] PROBLEM - puppet last run on cp3049 is CRITICAL puppet fail [20:37:57] PROBLEM - puppet last run on labvirt1004 is CRITICAL puppet fail [20:37:57] PROBLEM - puppet last run on cp1052 is CRITICAL puppet fail [20:38:01] lol [20:38:02] oh heh [20:38:09] killed both [20:38:28] (03PS2) 10RobH: access: add DCausse shell account [puppet] - 10https://gerrit.wikimedia.org/r/221967 (owner: 10Matanya) [20:39:10] !log upload php5_5.3.10-1ubuntu3.19-wmf1 on apt.wikimedia.org/precise-wikimedia [20:39:14] Logged the message, Master [20:39:31] (03CR) 10RobH: [C: 032] access: add DCausse shell account [puppet] - 10https://gerrit.wikimedia.org/r/221967 (owner: 10Matanya) [20:39:38] (03PS1) 10Chad: Gerrit: remove ::old role [puppet] - 10https://gerrit.wikimedia.org/r/223161 [20:39:41] (03PS2) 10RobH: access: Grant dcausse root on the search cluster [puppet] - 10https://gerrit.wikimedia.org/r/221968 (owner: 10Matanya) [20:39:56] (03PS1) 10Yuvipanda: celery: Make celery::worker a define rather than a class [puppet] - 10https://gerrit.wikimedia.org/r/223162 [20:40:00] role::gerrit::production::old [20:40:03] lol, wtf? [20:40:15] (03PS2) 10Yuvipanda: celery: Make celery::worker a define rather than a class [puppet] - 10https://gerrit.wikimedia.org/r/223162 [20:40:22] (03CR) 10Yuvipanda: [C: 032 V: 032] celery: Make celery::worker a define rather than a class [puppet] - 10https://gerrit.wikimedia.org/r/223162 (owner: 10Yuvipanda) [20:40:32] role::gitblit::deprecate ?:) [20:40:55] 6operations, 10Continuous-Integration-Infrastructure, 5Patch-For-Review: Phase out lanthanum.eqiad.wmnet - https://phabricator.wikimedia.org/T86658#1431985 (10hashar) [20:41:25] robh: I merged you [20:42:02] legoktm: it's back (https://static-bugzilla.wikimedia.org/12345) except what i was predicting, startpage now cached in varnish ..hrmm [20:42:24] mutante: thanks :) maybe just purge that url in varnish? [20:42:31] legoktm: the problem is the "just" part [20:42:36] :P [20:42:51] 6operations, 10Continuous-Integration-Infrastructure, 5Patch-For-Review: Phase out lanthanum.eqiad.wmnet - https://phabricator.wikimedia.org/T86658#1431990 (10hashar) a:3RobH Removed the blockers that have been achieved for lanthanum.eqiad.wmnet There will be some puppet cleanup to conduct. Some part sti... [20:42:55] (03PS1) 10Yuvipanda: celery: Specify template_name for systemd unit explicitly [puppet] - 10https://gerrit.wikimedia.org/r/223164 [20:42:57] legoktm: what's better now.. it's on jessie.. 
and on a virtual machine but in prod [20:43:00] (03CR) 10jenkins-bot: [V: 04-1] celery: Specify template_name for systemd unit explicitly [puppet] - 10https://gerrit.wikimedia.org/r/223164 (owner: 10Yuvipanda) [20:43:08] legoktm: and apache 2.4 [20:43:15] yay :) [20:43:18] PROBLEM - Host labvirt1005 is DOWN: PING CRITICAL - Packet loss = 100% [20:43:32] robh: seeing the above, I'm gonna patch a reclaim for lanthanum :) [20:43:44] (03PS2) 10Yuvipanda: celery: Specify template_name for systemd unit explicitly [puppet] - 10https://gerrit.wikimedia.org/r/223164 [20:43:46] hashar: ^ is why I was wondering earlier ;) [20:44:17] did someone merge on palladium? [20:44:29] robh: yuvipanda [20:44:40] (03CR) 10Yuvipanda: [C: 032 V: 032] celery: Specify template_name for systemd unit explicitly [puppet] - 10https://gerrit.wikimedia.org/r/223164 (owner: 10Yuvipanda) [20:44:42] yuvipanda: you merged my change yes? [20:44:49] robh: yup [20:44:57] RECOVERY - Host labvirt1005 is UPING OK - Packet loss = 0%, RTA = 0.45 ms [20:44:59] stop messing with the old folks damn you! ;D [20:45:18] * yuvipanda displays shocking-red hair at robh [20:45:22] * yuvipanda gets on robh's lawn [20:45:40] (03PS3) 10RobH: access: Grant dcausse root on the search cluster [puppet] - 10https://gerrit.wikimedia.org/r/221968 (owner: 10Matanya) [20:45:52] yuvipanda: when your hair falls out due to the color abuse [20:45:57] i'll feel bad for you. [20:46:13] robh: I'm pretty sure I've genetically inherited MPB, so might as well abuse it while I still have it! [20:46:14] Then you'll laugh at him? [20:46:18] legoktm: why do i say "just" is the problem? because the striked out part on https://static-bugzilla.wikimedia.org/12345 [20:46:19] nah, feel his pain [20:46:22] im not a monster! [20:46:29] legoktm: eh, wrong paste. https://wikitech.wikimedia.org/wiki/Varnish#One-off_purges [20:47:52] JohnFLewis: you probably want to get the machine renamed though [20:48:11] hashar: nope [20:48:29] its misc so it'll be called lanthanum still [20:48:38] hey _joe_, how did you fix this HHVM load.php GZip issue? [20:48:48] RECOVERY - nova-compute process on labvirt1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-compute [20:48:54] werdna: stop stealing our secrets! [20:49:05] (03PS1) 10John F. Lewis: reclaim lanthanum: remove lanthanum.eqaid.wmnet [dns] - 10https://gerrit.wikimedia.org/r/223167 (https://phabricator.wikimedia.org/T86658) [20:49:28] hashar: ^ see the comment :) [20:49:50] JohnFLewis: works for me :-} [20:50:10] (03PS1) 10Yuvipanda: uwsgi: Fix ruby plugin version for jessie [puppet] - 10https://gerrit.wikimedia.org/r/223168 [20:50:21] (03PS2) 10Yuvipanda: uwsgi: Fix ruby plugin version for jessie [puppet] - 10https://gerrit.wikimedia.org/r/223168 [20:50:27] (03CR) 10Yuvipanda: [C: 032 V: 032] uwsgi: Fix ruby plugin version for jessie [puppet] - 10https://gerrit.wikimedia.org/r/223168 (owner: 10Yuvipanda) [20:50:29] (03PS1) 10Chad: Gerrit: Remove $extra_groups from replicationdest, nothing uses it [puppet] - 10https://gerrit.wikimedia.org/r/223169 [20:50:31] (03PS1) 10Chad: Gitblit: Remove ssl cert stuff [puppet] - 10https://gerrit.wikimedia.org/r/223170 [20:50:33] (03PS1) 10Chad: Gerrit: Remove ::labs role [puppet] - 10https://gerrit.wikimedia.org/r/223171 [20:51:32] 6operations, 10Continuous-Integration-Infrastructure, 5Patch-For-Review: Phase out lanthanum.eqiad.wmnet - https://phabricator.wikimedia.org/T86658#1432015 (10hashar) Removed it from the Jenkins slaves configuration. 
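(For the one-off purge discussed at [20:42:24]-[20:46:29]: the authoritative procedure is the wikitech "One-off purges" section linked above, normally run from the cache hosts themselves. Purely as a sketch, and assuming a cache frontend that accepts HTTP PURGE from trusted clients, it could look like this in Python:)

    import requests

    # Hypothetical one-off purge: send an HTTP PURGE for the cached start page.
    # The URL is the page from the conversation; whether PURGE is accepted, and
    # from where, depends entirely on the cache configuration.
    url = "https://static-bugzilla.wikimedia.org/"
    resp = requests.request("PURGE", url, timeout=5)
    print(resp.status_code, resp.reason)  # 200/204 on success, 403/405 if PURGE is refused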
[20:51:38] (03PS1) 10BryanDavis: logstash: Enable user & group authz modules for Kibana [puppet] - 10https://gerrit.wikimedia.org/r/223172 (https://phabricator.wikimedia.org/T103804) [20:52:22] (03PS4) 10RobH: access: Grant dcausse root on the search cluster [puppet] - 10https://gerrit.wikimedia.org/r/221968 (owner: 10Matanya) [20:52:29] deploying new parsoid code [20:53:27] RECOVERY - puppet last run on oxygen is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures [20:53:38] RECOVERY - puppet last run on planet1001 is OK Puppet is currently enabled, last run 28 seconds ago with 0 failures [20:53:47] RECOVERY - puppet last run on ganeti2002 is OK Puppet is currently enabled, last run 13 seconds ago with 0 failures [20:53:48] RECOVERY - puppet last run on cp3039 is OK Puppet is currently enabled, last run 13 seconds ago with 0 failures [20:53:49] RECOVERY - puppet last run on db1006 is OK Puppet is currently enabled, last run 4 seconds ago with 0 failures [20:53:51] (03CR) 10RobH: [C: 032] access: Grant dcausse root on the search cluster [puppet] - 10https://gerrit.wikimedia.org/r/221968 (owner: 10Matanya) [20:53:54] (03PS1) 10Hashar: Reclaim lanthanum: remove related puppet conf [puppet] - 10https://gerrit.wikimedia.org/r/223175 (https://phabricator.wikimedia.org/T86658) [20:53:57] RECOVERY - puppet last run on es1010 is OK Puppet is currently enabled, last run 12 seconds ago with 0 failures [20:53:58] RECOVERY - puppet last run on db1063 is OK Puppet is currently enabled, last run 28 seconds ago with 0 failures [20:53:58] RECOVERY - puppet last run on mc2002 is OK Puppet is currently enabled, last run 53 seconds ago with 0 failures [20:53:59] RECOVERY - puppet last run on wtp1007 is OK Puppet is currently enabled, last run 21 seconds ago with 0 failures [20:53:59] RECOVERY - puppet last run on maerlant is OK Puppet is currently enabled, last run 50 seconds ago with 0 failures [20:54:15] (03PS2) 10Hashar: Reclaim lanthanum: remove related puppet conf [puppet] - 10https://gerrit.wikimedia.org/r/223175 (https://phabricator.wikimedia.org/T86658) [20:54:17] RECOVERY - puppet last run on db2011 is OK Puppet is currently enabled, last run 16 seconds ago with 0 failures [20:54:17] RECOVERY - puppet last run on db2030 is OK Puppet is currently enabled, last run 18 seconds ago with 0 failures [20:54:18] RECOVERY - puppet last run on db2028 is OK Puppet is currently enabled, last run 13 seconds ago with 0 failures [20:54:18] RECOVERY - puppet last run on mc2009 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:54:19] RECOVERY - puppet last run on logstash1005 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:54:27] RECOVERY - puppet last run on calcium is OK Puppet is currently enabled, last run 37 seconds ago with 0 failures [20:54:27] RECOVERY - puppet last run on mc2012 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:54:27] RECOVERY - puppet last run on cp4002 is OK Puppet is currently enabled, last run 0 seconds ago with 0 failures [20:54:28] RECOVERY - puppet last run on db1011 is OK Puppet is currently enabled, last run 11 seconds ago with 0 failures [20:54:28] RECOVERY - puppet last run on mc1018 is OK Puppet is currently enabled, last run 18 seconds ago with 0 failures [20:54:31] (03CR) 10Hashar: [C: 031] "lanthanum is no more used by CI. 
puppet cleanup is https://gerrit.wikimedia.org/r/#/c/223175/" [dns] - 10https://gerrit.wikimedia.org/r/223167 (https://phabricator.wikimedia.org/T86658) (owner: 10John F. Lewis) [20:54:37] RECOVERY - puppet last run on db1070 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:54:47] RECOVERY - puppet last run on cp1052 is OK Puppet is currently enabled, last run 59 seconds ago with 0 failures [20:54:48] RECOVERY - puppet last run on ganeti2001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:54:48] RECOVERY - puppet last run on cp3049 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:54:57] RECOVERY - puppet last run on db1038 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:54:58] RECOVERY - puppet last run on elastic1023 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:54:59] JohnFLewis: I did the puppet patch to clean lanthanum. Have an happy reclaim :) [20:55:08] RECOVERY - puppet last run on nitrogen is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:55:08] RECOVERY - puppet last run on virt1008 is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures [20:55:08] RECOVERY - puppet last run on db1061 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:55:08] RECOVERY - puppet last run on elastic1017 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:55:18] RECOVERY - puppet last run on dbstore1002 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:55:18] RECOVERY - puppet last run on db1030 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:55:18] RECOVERY - puppet last run on wtp1003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:55:18] RECOVERY - puppet last run on db1033 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:55:19] RECOVERY - puppet last run on db2063 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:55:27] RECOVERY - puppet last run on cp4009 is OK Puppet is currently enabled, last run 37 seconds ago with 0 failures [20:55:27] RECOVERY - puppet last run on cp4012 is OK Puppet is currently enabled, last run 20 seconds ago with 0 failures [20:55:27] RECOVERY - puppet last run on cp4020 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:55:27] RECOVERY - puppet last run on cp3047 is OK Puppet is currently enabled, last run 43 seconds ago with 0 failures [20:55:27] RECOVERY - puppet last run on cp3021 is OK Puppet is currently enabled, last run 57 seconds ago with 0 failures [20:55:28] RECOVERY - puppet last run on ganeti2003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:55:28] RECOVERY - puppet last run on analytics1027 is OK Puppet is currently enabled, last run 44 seconds ago with 0 failures [20:55:29] RECOVERY - puppet last run on analytics1028 is OK Puppet is currently enabled, last run 44 seconds ago with 0 failures [20:55:37] RECOVERY - puppet last run on db1072 is OK Puppet is currently enabled, last run 59 seconds ago with 0 failures [20:55:37] RECOVERY - puppet last run on ms-be1011 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:55:38] RECOVERY - puppet last run on baham is OK Puppet is currently enabled, last run 30 seconds ago with 0 failures [20:55:38] RECOVERY - puppet last run on db2053 is OK Puppet is currently 
enabled, last run 1 minute ago with 0 failures [20:55:38] RECOVERY - puppet last run on mw1077 is OK Puppet is currently enabled, last run 10 seconds ago with 0 failures [20:55:38] RECOVERY - puppet last run on mw1022 is OK Puppet is currently enabled, last run 2 seconds ago with 0 failures [20:55:38] RECOVERY - puppet last run on mw1105 is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures [20:55:39] RECOVERY - puppet last run on db2050 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:55:39] RECOVERY - puppet last run on wtp2020 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:55:39] RECOVERY - puppet last run on wtp2006 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:55:40] RECOVERY - puppet last run on ganeti1002 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:55:41] hashar: you've got to remove the entire site.pp stanza and remove it from install_server (checks the last bit quickly) [20:55:47] RECOVERY - puppet last run on iodine is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:55:48] RECOVERY - puppet last run on labvirt1008 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:55:48] RECOVERY - puppet last run on ms-fe2002 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:55:48] RECOVERY - puppet last run on mw2204 is OK Puppet is currently enabled, last run 20 seconds ago with 0 failures [20:55:48] RECOVERY - puppet last run on mw1108 is OK Puppet is currently enabled, last run 26 seconds ago with 0 failures [20:55:48] RECOVERY - puppet last run on analytics1004 is OK Puppet is currently enabled, last run 49 seconds ago with 0 failures [20:55:48] RECOVERY - puppet last run on graphite1001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:55:49] RECOVERY - puppet last run on mw1223 is OK Puppet is currently enabled, last run 21 seconds ago with 0 failures [20:55:50] RECOVERY - puppet last run on mw1142 is OK Puppet is currently enabled, last run 22 seconds ago with 0 failures [20:55:50] RECOVERY - puppet last run on mw1167 is OK Puppet is currently enabled, last run 7 seconds ago with 0 failures [20:55:50] RECOVERY - puppet last run on cp1067 is OK Puppet is currently enabled, last run 46 seconds ago with 0 failures [20:55:51] RECOVERY - puppet last run on wtp2013 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:55:51] RECOVERY - puppet last run on labvirt1002 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:55:57] RECOVERY - puppet last run on cp3019 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:55:58] RECOVERY - puppet last run on mw1225 is OK Puppet is currently enabled, last run 42 seconds ago with 0 failures [20:56:07] RECOVERY - puppet last run on analytics1003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:56:07] RECOVERY - puppet last run on fluorine is OK Puppet is currently enabled, last run 20 seconds ago with 0 failures [20:56:07] RECOVERY - puppet last run on db2012 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:56:08] RECOVERY - puppet last run on iridium is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:56:08] RECOVERY - puppet last run on wtp1010 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:56:17] nope - just site.pp [20:56:17] 
RECOVERY - puppet last run on mw1229 is OK Puppet is currently enabled, last run 53 seconds ago with 0 failures [20:56:18] RECOVERY - puppet last run on db1047 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:56:38] RECOVERY - puppet last run on labvirt1004 is OK Puppet is currently enabled, last run 14 seconds ago with 0 failures [20:56:40] (03CR) 10John F. Lewis: [C: 04-1] "needs full site.pp stanza to be removed" [puppet] - 10https://gerrit.wikimedia.org/r/223175 (https://phabricator.wikimedia.org/T86658) (owner: 10Hashar) [20:56:48] PROBLEM - puppet last run on mw1152 is CRITICAL Puppet last ran 6 hours ago [20:56:50] (03PS2) 10RobH: access: grant David Causse deployment rights [puppet] - 10https://gerrit.wikimedia.org/r/222255 (owner: 10Matanya) [20:56:57] RECOVERY - puppet last run on einsteinium is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:56:58] PROBLEM - HHVM rendering on mw1025 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50350 bytes in 9.767 second response time [20:56:58] RECOVERY - puppet last run on mw1010 is OK Puppet is currently enabled, last run 52 seconds ago with 0 failures [20:57:01] (03CR) 10RobH: [C: 032] access: grant David Causse deployment rights [puppet] - 10https://gerrit.wikimedia.org/r/222255 (owner: 10Matanya) [20:57:18] RECOVERY - puppet last run on mw1016 is OK Puppet is currently enabled, last run 34 seconds ago with 0 failures [20:57:18] RECOVERY - puppet last run on mw1091 is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures [20:57:26] thank you for all of those robh ^ [20:57:27] RECOVERY - puppet last run on elastic1003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:57:28] RECOVERY - puppet last run on mw1201 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:57:37] RECOVERY - puppet last run on mw1231 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:57:38] RECOVERY - puppet last run on mw2153 is OK Puppet is currently enabled, last run 54 seconds ago with 0 failures [20:57:47] RECOVERY - puppet last run on mw2175 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:57:47] RECOVERY - puppet last run on mw2154 is OK Puppet is currently enabled, last run 32 seconds ago with 0 failures [20:57:48] RECOVERY - puppet last run on mw2148 is OK Puppet is currently enabled, last run 40 seconds ago with 0 failures [20:57:48] RECOVERY - puppet last run on mw2157 is OK Puppet is currently enabled, last run 20 seconds ago with 0 failures [20:57:48] RECOVERY - puppet last run on mw2141 is OK Puppet is currently enabled, last run 34 seconds ago with 0 failures [20:57:48] RECOVERY - puppet last run on mw2180 is OK Puppet is currently enabled, last run 26 seconds ago with 0 failures [20:57:48] RECOVERY - puppet last run on mw2188 is OK Puppet is currently enabled, last run 44 seconds ago with 0 failures [20:57:48] RECOVERY - puppet last run on mw1122 is OK Puppet is currently enabled, last run 59 seconds ago with 0 failures [20:57:57] RECOVERY - puppet last run on mw2115 is OK Puppet is currently enabled, last run 43 seconds ago with 0 failures [20:57:57] RECOVERY - puppet last run on mw2185 is OK Puppet is currently enabled, last run 45 seconds ago with 0 failures [20:57:57] RECOVERY - puppet last run on elastic1020 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:57:58] RECOVERY - puppet last run on bast1001 is OK Puppet is currently 
enabled, last run 3 seconds ago with 0 failures [20:57:58] RECOVERY - puppet last run on mw1090 is OK Puppet is currently enabled, last run 43 seconds ago with 0 failures [20:57:58] RECOVERY - puppet last run on mw2004 is OK Puppet is currently enabled, last run 5 seconds ago with 0 failures [20:57:58] RECOVERY - puppet last run on mw2021 is OK Puppet is currently enabled, last run 1 second ago with 0 failures [20:58:07] RECOVERY - puppet last run on mw2064 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:58:07] RECOVERY - puppet last run on mw2071 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:58:07] RECOVERY - puppet last run on mw1093 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:58:08] RECOVERY - puppet last run on mw1185 is OK Puppet is currently enabled, last run 51 seconds ago with 0 failures [20:58:08] RECOVERY - puppet last run on mw2031 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:58:08] RECOVERY - puppet last run on mw1121 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:58:08] RECOVERY - puppet last run on mw2009 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:58:09] RECOVERY - puppet last run on mw2063 is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures [20:58:09] RECOVERY - puppet last run on mw2010 is OK Puppet is currently enabled, last run 29 seconds ago with 0 failures [20:58:10] RECOVERY - puppet last run on mw2020 is OK Puppet is currently enabled, last run 45 seconds ago with 0 failures [20:58:10] RECOVERY - puppet last run on mw2069 is OK Puppet is currently enabled, last run 43 seconds ago with 0 failures [20:58:26] (03PS3) 10Hashar: Reclaim lanthanum: remove related puppet conf [puppet] - 10https://gerrit.wikimedia.org/r/223175 (https://phabricator.wikimedia.org/T86658) [20:58:28] RECOVERY - puppet last run on mw1219 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:58:28] RECOVERY - puppet last run on mw1209 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:58:28] RECOVERY - puppet last run on mw1139 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:58:47] RECOVERY - puppet last run on terbium is OK Puppet is currently enabled, last run 25 seconds ago with 0 failures [20:58:47] RECOVERY - puppet last run on mw1193 is OK Puppet is currently enabled, last run 41 seconds ago with 0 failures [20:58:47] RECOVERY - puppet last run on mw1220 is OK Puppet is currently enabled, last run 32 seconds ago with 0 failures [20:58:48] RECOVERY - HHVM rendering on mw1025 is OK: HTTP OK: HTTP/1.1 200 OK - 69697 bytes in 0.128 second response time [20:58:48] RECOVERY - puppet last run on mw1086 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:58:50] (03CR) 10Hashar: "I removed lanthanum entirely from site.pp (previously it still had `include standard`)." [puppet] - 10https://gerrit.wikimedia.org/r/223175 (https://phabricator.wikimedia.org/T86658) (owner: 10Hashar) [20:58:56] bed time! 
[20:58:57] RECOVERY - puppet last run on mw1236 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:59:08] RECOVERY - puppet last run on mw1024 is OK Puppet is currently enabled, last run 28 seconds ago with 0 failures [20:59:08] RECOVERY - puppet last run on mw1241 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:59:08] RECOVERY - puppet last run on mw1112 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:59:22] how did this box get puppetized enough to give me a login but not enough to give me sudo? [20:59:43] (03CR) 10John F. Lewis: [C: 031] Reclaim lanthanum: remove related puppet conf [puppet] - 10https://gerrit.wikimedia.org/r/223175 (https://phabricator.wikimedia.org/T86658) (owner: 10Hashar) [21:00:14] andrewbogott: because the box looked and went 'andrew? He's on the naighty list' [21:00:16] :p [21:00:17] (03CR) 10BryanDavis: "Applied in beta cluster via cherry-pick" [puppet] - 10https://gerrit.wikimedia.org/r/223172 (https://phabricator.wikimedia.org/T103804) (owner: 10BryanDavis) [21:00:26] I think maybe it’s just that the first puppet run is epic and still not done [21:00:41] (03CR) 10Hashar: [C: 031] "Yup does not provide anything useful. Each time I went setting up a Gerrit in labs I did it manually." [puppet] - 10https://gerrit.wikimedia.org/r/223171 (owner: 10Chad) [21:01:11] (03CR) 10Hashar: [C: 031] Gerrit: remove ::old role [puppet] - 10https://gerrit.wikimedia.org/r/223161 (owner: 10Chad) [21:01:57] RECOVERY - puppet last run on labvirt1005 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [21:02:15] !log purging static-bz URL on varnish ... [21:02:15] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: MediaWiki deployment shell access request - https://phabricator.wikimedia.org/T104546#1419955 (10RobH) [21:02:19] Logged the message, Master [21:02:24] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: MediaWiki deployment shell access request - https://phabricator.wikimedia.org/T104546#1432091 (10RobH) 5stalled>3Resolved The access for sudo/root on search (T104222) , as well as your initial shell access (T104222) & deployment access (T104546) were... [21:03:04] legoktm: i think i got it purged now [21:03:16] wfm after closing browser [21:04:42] !log ori Synchronized php-1.26wmf12/thumb.php: cdc75debaf: Add Content-Length header to thumb.php error responses (duration: 00m 13s) [21:04:46] Logged the message, Master [21:06:49] (03PS1) 10Alex Monk: labs: Make it possible for AbuseFilters to block anywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223179 (https://phabricator.wikimedia.org/T103060) [21:08:53] 6operations, 5Patch-For-Review: Move static-bugzilla from zirconium to ganeti - https://phabricator.wikimedia.org/T101734#1432142 (10Dzahn) done and switched over. deleted on zirconium. server from bromine.eqiad.wmnet now [21:09:19] 6operations, 5Patch-For-Review: Move static-bugzilla from zirconium to ganeti - https://phabricator.wikimedia.org/T101734#1432143 (10Dzahn) 5Open>3Resolved now also on jessie and Apache 2.4 as a side-effect [21:09:29] 6operations: Move static-bugzilla from zirconium to ganeti - https://phabricator.wikimedia.org/T101734#1432146 (10Dzahn) [21:11:09] RECOVERY - salt-minion processes on labvirt1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [21:11:45] mutante: wfm too. 
thanks :D [21:12:38] !log deployed parsoid version 87a746e6 [21:12:43] Logged the message, Master [21:16:58] _joe_: I found this: https://phabricator.wikimedia.org/T69928 — but that’s double GZipping – what appears to be happening is that the Content-Encoding heading isn’t being output [21:30:33] !log rebooting labvirt1005, again. Somehow virtualization is turned off again [21:30:38] Logged the message, Master [21:30:55] (03PS1) 10BryanDavis: beta: Replace deployment-logstash1 with deployment-logstash2 [puppet] - 10https://gerrit.wikimedia.org/r/223184 (https://phabricator.wikimedia.org/T101541) [21:31:47] PROBLEM - Host labvirt1005 is DOWN: PING CRITICAL - Packet loss = 100% [21:32:58] PROBLEM - puppet last run on ms-be3002 is CRITICAL Puppet has 1 failures [21:33:37] (03CR) 10BryanDavis: "cherry-picked to beta cluster" [puppet] - 10https://gerrit.wikimedia.org/r/223184 (https://phabricator.wikimedia.org/T101541) (owner: 10BryanDavis) [21:35:08] PROBLEM - puppet last run on ms-be3001 is CRITICAL Puppet has 1 failures [21:37:23] (03PS1) 10BryanDavis: beta: Replace deployment-logstash1 with deployment-logstash2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223185 (https://phabricator.wikimedia.org/T101541) [21:41:38] RECOVERY - Host labvirt1005 is UPING OK - Packet loss = 0%, RTA = 1.54 ms [21:43:48] cassandra on restbase1003 needs a restart as per http://grafana.wikimedia.org/#/dashboard/db/restbase-cassandra-thread-pools and edsanders is reporting getting 500s from the api [21:45:54] * mobrovac handling rb1003 [21:46:02] thnx subbu for the report [21:46:04] mobrovac: restarted it already [21:46:20] k [21:46:29] i am becoming the PR here [21:46:29] haha [21:46:40] !log restarted cassandra instance on restbase1003; was low on memory and constantly writing small chunks [21:47:08] (03PS8) 10BBlack: tlsproxy: enable DHE-2048 FS for Android 2.x, etc. [puppet] - 10https://gerrit.wikimedia.org/r/222023 (https://phabricator.wikimedia.org/T104281) [21:47:23] its disk write rate had dropped close to zero [21:47:57] RECOVERY - puppet last run on ms-be3002 is OK Puppet is currently enabled, last run 1 second ago with 0 failures [21:48:00] (03CR) 10BBlack: [C: 032] tlsproxy: enable DHE-2048 FS for Android 2.x, etc. [puppet] - 10https://gerrit.wikimedia.org/r/222023 (https://phabricator.wikimedia.org/T104281) (owner: 10BBlack) [21:49:58] RECOVERY - puppet last run on ms-be3001 is OK Puppet is currently enabled, last run 44 seconds ago with 0 failures [21:50:47] (03PS1) 10Ottomata: Temporarily disable puppetization of EL processor on analytics1010 for testing [puppet] - 10https://gerrit.wikimedia.org/r/223186 [21:51:03] (03CR) 10Ottomata: [C: 032 V: 032] Temporarily disable puppetization of EL processor on analytics1010 for testing [puppet] - 10https://gerrit.wikimedia.org/r/223186 (owner: 10Ottomata) [21:54:53] whos the master(s) of chapters website hosting? [21:55:08] umm [21:55:11] master(s)? [21:55:23] what do you need exactly? I might be able to point you to the right people [21:55:56] hey domas, what was https://wikitech.wikimedia.org/wiki/Project_XX ? [21:56:36] master or masters :P wm portugal must change hosting, and i'm searching for solutions, i would like to know what wmf platform can give [21:58:05] Alchimista, wmf-hosted affiliate sites are all just mediawikis hosted in the same way as the rest of the projects [21:58:25] lol [21:58:41] Was that the move of the ParserCache from mc to mysql? [21:58:52] bits was db19, right? [21:58:58] so it isn't bits. 
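(On the [21:16:58] question about HHVM and load.php: the reported symptom is gzip-compressed responses going out without a Content-Encoding header. A quick way to check for that from any client, assuming the Python requests library; the URL is only an example load.php endpoint:)

    import requests

    # Ask for gzip explicitly and inspect the response headers. A compressed body
    # with no Content-Encoding header would reproduce the symptom described above.
    url = "https://en.wikipedia.org/w/load.php?modules=startup&only=scripts"
    resp = requests.get(url, headers={"Accept-Encoding": "gzip"}, timeout=10)
    print("Content-Encoding:", resp.headers.get("Content-Encoding"))
    print("Content-Length: ", resp.headers.get("Content-Length"))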
[22:00:08] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 23.08% of data above the critical threshold [500.0] [22:00:16] Alchimista, I don't really know what you're looking for [22:00:57] it was right after NFS got moved from it? [22:01:01] I really don't remember! [22:01:44] it sounds like this page can safely be marked historical [22:02:36] domas: what was db20? [22:02:56] Krenair: yah, i know, but for example, we have our own address, and email system. with wmf can we still use our old adress? dns issues isn't my beach [22:02:57] db, were you testing WMSQL? :o [22:03:02] JohnFLewis: Do you want him to remeber the server's specs? [22:03:16] hoo: no, just the service tag is fine [22:03:23] can look the specs up with that :) [22:03:34] Alchimista, your own email system? as in, mailing lists? or private inboxes? [22:04:08] PROBLEM - Host labnet1002 is DOWN: PING CRITICAL - Packet loss = 100% [22:04:11] or group inboxes of some sort like otrs? [22:04:24] JohnFLewis: db* class hardware was frequently nabbed for special projects [22:04:37] JohnFLewis: for an extended period all JS/CSS was served off db19 [22:04:52] Alchimista, you can get mailing lists on lists.wikimedia.org, you can get group inboxes via otrs, but not private inboxes [22:04:56] Krenair: Krenair: both, we've for example info@wikimedia.pt, and personal emails [22:04:56] domas: too lazy to update the hostname? terrible :P [22:05:03] JohnFLewis: what for? [22:05:22] domas: db19 which wasn't a db [22:05:32] JohnFLewis: it was db-class hardware [22:05:50] I think if you own a domain you should be able to point it at OTRS (perhaps with a little bit of wmf ops help?), and OTRS should be able to accept mail for it [22:06:01] as for whether you can get personal queues in that, you'd have to ask the otrs admins, but I doubt it [22:06:03] but not a db machine :) but with that logic - fine :p [22:06:22] JohnFLewis: we used to run mysqls for external store on apaches [22:06:30] JohnFLewis: did we have to rename all apaches to 'dbs' then? [22:06:54] RD might be able to tell you more, Alchimista [22:07:04] I don't think OTRS was meant as a mail service was it? Then again I donno, I've never looked/messed with it. [22:07:13] thanks Krenair :) [22:07:18] domas: shall we call this discussion finished? [22:07:25] :-) [22:07:40] bblack, umm... OTRS definitely accepts mail :) [22:07:44] it's a ticketing/support system [22:07:50] it'll be easier than going 'we used to run all of Europe on america, do we rename it africa?' [22:08:03] talking about 'too lazy', yeah, that machine didn't have valid network configuration [22:08:08] we certainly use it almost entirely email-based. [22:08:09] so when it was rebooted, all wikipedia's css/js stopped loading [22:08:10] :) [22:08:20] Krenair: I meant, it's not meant as a generic mail hosting system for all purposes, as in "move your chapter domain's SMTP here", right? [22:08:29] right [22:08:38] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [22:08:53] I suspect the OTRS admins won't just let you use it for whatever your chapter needs. But I'm not the OTRS admins! [22:09:12] me either [22:09:18] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). 
[22:09:52] 6operations, 10ops-codfw, 10Incident-20150617-LabsNFSOutage: Labstore2001 controler or shelf failure - https://phabricator.wikimedia.org/T102626#1432311 (10Aklapper) @Papaul, @Coren: Which specific items are left to do to close this task as resolved? Asking as this task has had [[ https://www.mediawiki.org/w... [22:09:58] ottomata: I think the unmerged is you? [22:10:23] sigh yes sorry [22:10:28] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge. [22:10:28] yet another case of forgetting to type yes [22:10:31] after typing puppet-merge [22:11:08] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [22:12:28] 6operations, 6Discovery, 10Wikidata, 10Wikidata-Query-Service, 3Discovery-Wikidata-Query-Service-Sprint: Define the details of the hardware we need to run WDQS - https://phabricator.wikimedia.org/T104879#1432315 (10Jdouglas) What are the requirements that this hardware is needed to satisfy? [22:13:08] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [22:24:26] 6operations, 3Labs-Sprint-104, 3Labs-Sprint-105: Setup/Install/Deploy labnet1002 - https://phabricator.wikimedia.org/T99701#1432338 (10Andrew) I disabled the internal nic and got the console unstuck. I assume some cables need to be switched around now, so that that the 10g interface can act as the new eth0... [22:27:37] 6operations, 10RESTBase-Cassandra, 5Patch-For-Review: consider moving Cassandra to G1GC in production - https://phabricator.wikimedia.org/T103161#1432343 (10GWicke) See also T104888 for ongoing JDK8 testing. [22:37:49] 6operations, 10ops-codfw, 10Incident-20150617-LabsNFSOutage: Labstore2001 controler or shelf failure - https://phabricator.wikimedia.org/T102626#1432367 (10Papaul) @Aklapper as Coren mentioned "I will provide an explcit wiring diagram shortly." haven' t received any diagram yet. [22:45:16] (03CR) 10Ragesoss: [C: 031] Autocreate accounts on meta, mediawiki.org, loginwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/220970 (https://phabricator.wikimedia.org/T74469) (owner: 10Gergő Tisza) [22:47:29] Alchimista: About a dozen chapters actively use Wikimedia OTRS for their general information addresses. [22:47:39] I'm happy to answer any questions, as an OTRS admin, and/or you [22:47:50] can email otrs-admins@lists.wikimedia.org [22:51:38] 6operations, 6Phabricator, 10Wikimedia-Bugzilla: Sanitise a Bugzilla database dump - https://phabricator.wikimedia.org/T85141#1432418 (10Aklapper) [23:00:04] RoanKattouw ostriches Krenair: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150706T2300). [23:00:04] tgr bd808: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [23:00:19] oh I was supposed to stick a couple of things up for this [23:01:05] tgr, you around? [23:01:15] present [23:01:33] looks like you have +1s from all the right people :) [23:01:41] you set up to be able to test this? 
[23:01:51] yes [23:02:50] tgr, oh, merge conflict [23:04:46] (03PS3) 10Gergő Tisza: Autocreate accounts on meta, mediawiki.org, loginwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/220970 (https://phabricator.wikimedia.org/T74469) [23:04:49] Krenair: rebased [23:05:15] (03CR) 10Alex Monk: [C: 032] Autocreate accounts on meta, mediawiki.org, loginwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/220970 (https://phabricator.wikimedia.org/T74469) (owner: 10Gergő Tisza) [23:05:21] (03Merged) 10jenkins-bot: Autocreate accounts on meta, mediawiki.org, loginwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/220970 (https://phabricator.wikimedia.org/T74469) (owner: 10Gergő Tisza) [23:05:37] 7Puppet, 6Labs, 6Phabricator: On labs phabricator references security extension even though it isn't present - https://phabricator.wikimedia.org/T104904#1432451 (10mmodell) I don't quite understand why the git repo isn't getting cloned. [23:06:19] !log krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/220970/ (duration: 00m 14s) [23:06:20] tgr, please test ^ [23:06:24] Logged the message, Master [23:10:34] Krenair: seems OK [23:12:56] tgr, shall we move on then? [23:13:02] (03PS1) 10EBernhardson: Add statsd reporting plugin [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/223202 [23:13:11] Krenair: yes, I'm done, thanks [23:13:22] (03PS2) 10Alex Monk: beta: Replace deployment-logstash1 with deployment-logstash2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223185 (https://phabricator.wikimedia.org/T101541) (owner: 10BryanDavis) [23:13:26] (03CR) 10Alex Monk: [C: 032] beta: Replace deployment-logstash1 with deployment-logstash2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223185 (https://phabricator.wikimedia.org/T101541) (owner: 10BryanDavis) [23:15:40] is jenkins broken? [23:15:53] meh [23:15:57] (03CR) 10Alex Monk: [V: 032] beta: Replace deployment-logstash1 with deployment-logstash2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223185 (https://phabricator.wikimedia.org/T101541) (owner: 10BryanDavis) [23:16:54] (03PS2) 10EBernhardson: Add statsd reporting plugin [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/223202 (https://phabricator.wikimedia.org/T100500) [23:17:03] !log krenair Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/223185/ (duration: 00m 12s) [23:17:07] Logged the message, Master [23:18:47] Krenair: hmmm... looks like the force merge on https://gerrit.wikimedia.org/r/#/c/223195/ that legoktm needed to do (vendor version bumps always need a force right now) may have confused zuul [23:19:10] looks like it finally cleared [23:20:02] bd808, is that change synced to beta now? 
[23:20:38] looks like it [23:20:49] (I checked deployment-mediawiki02) [23:21:55] yeah it's on mw01 there too [23:22:28] (03PS3) 10Alex Monk: Remove wmgUseXAnalytics and wgAjaxEditStash override, other random cleanup [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221808 (https://phabricator.wikimedia.org/T31902) [23:22:33] (03CR) 10Alex Monk: [C: 032] Remove wmgUseXAnalytics and wgAjaxEditStash override, other random cleanup [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221808 (https://phabricator.wikimedia.org/T31902) (owner: 10Alex Monk) [23:22:36] and MW log events are still showing up at https://logstash-beta.wmflabs.org so looks good [23:22:39] (03Merged) 10jenkins-bot: Remove wmgUseXAnalytics and wgAjaxEditStash override, other random cleanup [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221808 (https://phabricator.wikimedia.org/T31902) (owner: 10Alex Monk) [23:24:34] checked ^ on tin and mw1017, seems fine [23:24:57] PROBLEM - puppet last run on eventlog1001 is CRITICAL puppet fail [23:25:10] !log krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/221808/ (duration: 00m 13s) [23:25:14] Logged the message, Master [23:26:01] (03PS2) 10Alex Monk: Standardise remaining ticket comments I could find [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221809 (https://phabricator.wikimedia.org/T31902) [23:26:10] (03CR) 10Alex Monk: [C: 032] Standardise remaining ticket comments I could find [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221809 (https://phabricator.wikimedia.org/T31902) (owner: 10Alex Monk) [23:26:16] (03Merged) 10jenkins-bot: Standardise remaining ticket comments I could find [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221809 (https://phabricator.wikimedia.org/T31902) (owner: 10Alex Monk) [23:26:31] nutcracker looks to be hosed on snapshot1002. Tons of "A TIMEOUT OCCURRED" errors in logstash [23:27:20] !log krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/221809/ - should be a noop, just doc changes (duration: 00m 13s) [23:27:25] Logged the message, Master [23:28:02] Actually it may be whatever mc server "db1033:lag_times" maps to that is sick [23:29:53] (03CR) 10Alex Monk: "Andrew, Yuvi: Could one of you guys comment here about the added rights please?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222776 (owner: 10Alex Monk) [23:30:19] 5126 repeats of 'Memcached error for key "db1033:lag_times" on server "/var/run/nutcracker/nutcracker.sock:0": A TIMEOUT OCCURRED' in the last hour [23:30:32] from all across the cluster it looks like [23:30:50] (03CR) 10Alex Monk: [C: 032] Update README to remove pmtpa references [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222941 (owner: 10Alex Monk) [23:31:19] (03Merged) 10jenkins-bot: Update README to remove pmtpa references [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222941 (owner: 10Alex Monk) [23:31:46] ori: how do I troubleshoot a sick memcached server? [23:32:39] !log krenair Synchronized README: https://gerrit.wikimedia.org/r/#/c/222941/ - ... 
(duration: 00m 13s) [23:32:44] Logged the message, Master [23:33:17] I think my outgoing changes list in gerrit just about fits on my screen now [23:35:03] (03CR) 10Alex Monk: [C: 032] labs: Make it possible for AbuseFilters to block anywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223179 (https://phabricator.wikimedia.org/T103060) (owner: 10Alex Monk) [23:35:10] (03Merged) 10jenkins-bot: labs: Make it possible for AbuseFilters to block anywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/223179 (https://phabricator.wikimedia.org/T103060) (owner: 10Alex Monk) [23:35:58] !log krenair Synchronized wmf-config/abusefilter.php: https://gerrit.wikimedia.org/r/#/c/223179/ - should be labs-only (duration: 00m 12s) [23:36:03] Logged the message, Master [23:40:08] Krenair: https://phabricator.wikimedia.org/T74469#1432515 [23:40:17] not sure if this warrants a revert [23:40:54] tgr: memcache maybe? [23:41:47] RECOVERY - puppet last run on eventlog1001 is OK Puppet is currently enabled, last run 39 seconds ago with 0 failures [23:42:35] legoktm: nothing should be cached at this point as far as I can see [23:42:57] (03CR) 10Alex Monk: add HTTPS variants for wmfblog in feed whitelists (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222691 (https://phabricator.wikimedia.org/T104727) (owner: 10Jeremyb) [23:43:25] tgr, I'm not sure it does, but if you're not comfortable with leaving it enabled, then I'll revert [23:44:21] I'll poke around a bit, see if I can find the cause, if that's OK with you [23:44:45] I'd rather not leave it this way since it's very confusing UX-wise [23:45:04] ok [23:45:14] (03PS1) 10BBlack: ciphersuites: experimentally expand DHE options [puppet] - 10https://gerrit.wikimedia.org/r/223204 [23:46:11] (03CR) 10BBlack: [C: 032 V: 032] ciphersuites: experimentally expand DHE options [puppet] - 10https://gerrit.wikimedia.org/r/223204 (owner: 10BBlack) [23:47:54] (03PS2) 10Alex Monk: Add localised logo for Marathi Wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221989 (https://phabricator.wikimedia.org/T103655) (owner: 10Odder) [23:48:00] (03CR) 10Alex Monk: [C: 032] Add localised logo for Marathi Wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221989 (https://phabricator.wikimedia.org/T103655) (owner: 10Odder) [23:48:06] (03Merged) 10jenkins-bot: Add localised logo for Marathi Wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221989 (https://phabricator.wikimedia.org/T103655) (owner: 10Odder) [23:49:37] !log krenair Synchronized w/static/images/project-logos/mrwikisource.png: https://gerrit.wikimedia.org/r/#/c/221989/ (duration: 00m 13s) [23:49:41] Logged the message, Master [23:50:24] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/221989/ (duration: 00m 12s) [23:50:28] Logged the message, Master [23:52:38] legoktm, I think you were talking about something like https://gerrit.wikimedia.org/r/#/c/222581/ recently? [23:54:16] it looks fine to me, shall we just do it? 
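(Regarding the [23:26:31]-[23:31:46] nutcracker timeouts for "db1033:lag_times": one low-level way to see whether the backend behind that key answers at all is to speak the memcached text protocol over the local nutcracker socket named in the error. A minimal sketch, with the socket path and key copied from the log and everything else an assumption:)

    import socket

    SOCK = "/var/run/nutcracker/nutcracker.sock"  # path from the error message
    KEY = "db1033:lag_times"                      # key from the error message

    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(2.0)  # arbitrary; the point is to notice hangs
    try:
        s.connect(SOCK)
        s.sendall(("get %s\r\n" % KEY).encode())
        data = b""
        while not data.endswith(b"END\r\n"):
            chunk = s.recv(4096)
            if not chunk:
                break
            data += chunk
        print(data.decode(errors="replace") or "(connection closed with no data)")
    except socket.timeout:
        print("timed out, which matches the A TIMEOUT OCCURRED symptom")
    finally:
        s.close()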
[23:56:38] 6operations, 6Discovery, 10Wikidata, 10Wikidata-Query-Service, 3Discovery-Wikidata-Query-Service-Sprint: Define the details of the hardware we need to run WDQS - https://phabricator.wikimedia.org/T104879#1432540 (10Smalyshev) a:5Joe>3Smalyshev [23:57:20] 6operations, 6Discovery, 10Wikidata, 10Wikidata-Query-Service, 3Discovery-Wikidata-Query-Service-Sprint: Define the details of the hardware we need to run WDQS - https://phabricator.wikimedia.org/T104879#1430809 (10Smalyshev) 1. Running updates //fast// - i.e., much faster than the update stream - for co...