[00:00:04] twentyafterfour: It is that lovely time of the day again! You are hereby commanded to deploy Phabricator update. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170928T0000).
[00:00:04] No GERRIT patches in the queue for this window AFAICS.
[00:36:30] !log upgrading phabricator, expect minimal downtime
[00:36:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:42:01] !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.19) (duration: 15m 24s)
[02:42:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:20:09] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.1) (duration: 15m 44s)
[03:20:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:26:19] !log l10nupdate@tin ResourceLoader cache refresh completed at Thu Sep 28 03:26:18 UTC 2017 (duration 6m 10s)
[03:26:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:30:02] is there anyone online at this hour who could tell whether it is possible to purge the entire HTML page cache of hewiki (and a few smaller wikis), or alternatively to purge the entries generated after 1.31.0-wmf.1 was deployed to those wikis? see https://phabricator.wikimedia.org/T48947#3641618 (at the bottom of the comment)
[03:30:54] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[03:31:14] PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[03:32:04] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[03:35:04] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[03:40:04] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[03:42:24] RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[03:46:14] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[03:47:37] PROBLEM - LVS HTTPS IPv6 on upload-lb.ulsfo.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:48:35] PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [1000.0]
[03:50:34] (PS1) BBlack: depool ulsfo [dns] - https://gerrit.wikimedia.org/r/381158
[03:50:37] RECOVERY - LVS HTTPS IPv6 on upload-lb.ulsfo.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 905 bytes in 0.321 second response time
[03:51:57] (CR) BBlack: [C: 2] depool ulsfo [dns] - https://gerrit.wikimedia.org/r/381158 (owner: BBlack)
[03:57:00] !log ulsfo dns-depooled at ~03:51
[03:57:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:02:54] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[04:04:05] RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[04:59:05] !log tstarling@tin Started scap: updating Vector skin for gerrit 381153
[04:59:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:15:24] (PS1) Marostegui: Revert "db-codfw.php: Depool db2076, db2067, db2060" [mediawiki-config] - https://gerrit.wikimedia.org/r/381163
[05:15:27] (PS2) Marostegui: Revert "db-codfw.php: Depool db2076, db2067, db2060" [mediawiki-config] - https://gerrit.wikimedia.org/r/381163
[05:18:22] (CR) Marostegui: [C: 2] Revert "db-codfw.php: Depool db2076, db2067, db2060" [mediawiki-config] - https://gerrit.wikimedia.org/r/381163 (owner: Marostegui)
[05:19:45] !log tstarling@tin Finished scap: updating Vector skin for gerrit 381153 (duration: 20m 39s)
[05:19:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:20:34] (Merged) jenkins-bot: Revert "db-codfw.php: Depool db2076, db2067, db2060" [mediawiki-config] - https://gerrit.wikimedia.org/r/381163 (owner: Marostegui)
[05:20:47] (CR) jenkins-bot: Revert "db-codfw.php: Depool db2076, db2067, db2060" [mediawiki-config] - https://gerrit.wikimedia.org/r/381163 (owner: Marostegui)
[05:21:49] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2076 db2067 db2060 - T174509 (duration: 00m 48s)
[05:21:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:21:55] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[05:22:16] (PS1) Marostegui: Revert "db-codfw.php: Depool db2064 and db2063" [mediawiki-config] - https://gerrit.wikimedia.org/r/381164
[05:22:18] (PS2) Marostegui: Revert "db-codfw.php: Depool db2064 and db2063" [mediawiki-config] - https://gerrit.wikimedia.org/r/381164
[05:24:12] (PS1) BryanDavis: wmcs: update rabbitmq drain_queue script [puppet] - https://gerrit.wikimedia.org/r/381165 (https://phabricator.wikimedia.org/T170492)
[05:24:36] (CR) jerkins-bot: [V: -1] wmcs: update rabbitmq drain_queue script [puppet] - https://gerrit.wikimedia.org/r/381165 (https://phabricator.wikimedia.org/T170492) (owner: BryanDavis)
[05:25:19] (CR) Marostegui: [C: 2] Revert "db-codfw.php: Depool db2064 and db2063" [mediawiki-config] - https://gerrit.wikimedia.org/r/381164 (owner: Marostegui)
[05:28:06] (Merged) jenkins-bot: Revert "db-codfw.php: Depool db2064 and db2063" [mediawiki-config] - https://gerrit.wikimedia.org/r/381164 (owner: Marostegui)
[05:28:15] (CR) jenkins-bot: Revert "db-codfw.php: Depool db2064 and db2063" [mediawiki-config] - https://gerrit.wikimedia.org/r/381164 (owner: Marostegui)
[05:29:22] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2064 db2063 - T174509 (duration: 00m 48s)
[05:29:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:29:30] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[05:32:25] (PS1) Marostegui: db-codfw.php: Depool db2049 and db2041 [mediawiki-config] - https://gerrit.wikimedia.org/r/381166 (https://phabricator.wikimedia.org/T174509)
[05:34:36] (CR) Marostegui: [C: 2] db-codfw.php: Depool db2049 and db2041 [mediawiki-config] - https://gerrit.wikimedia.org/r/381166 (https://phabricator.wikimedia.org/T174509) (owner: Marostegui)
[05:36:53] (Merged) jenkins-bot: db-codfw.php: Depool db2049 and db2041 [mediawiki-config] - https://gerrit.wikimedia.org/r/381166 (https://phabricator.wikimedia.org/T174509) (owner: Marostegui)
[05:37:12] (CR) jenkins-bot: db-codfw.php: Depool db2049 and db2041 [mediawiki-config] - https://gerrit.wikimedia.org/r/381166 (https://phabricator.wikimedia.org/T174509) (owner: Marostegui)
[05:38:02] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2049 and db2041 to optimize templatelinks and pagelinks tables - T174509 (duration: 00m 47s)
[05:38:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:38:08] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[05:43:46] !log Run maintain-views on labsdb1001, labsdb1003, labsdb1009, labsdb1010, labsdb1011 for hiwikivoyage - T173027
[05:43:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:43:52] T173027: Prepare and check storage layer for hi.wikivoyage - https://phabricator.wikimedia.org/T173027
[05:44:43] (PS2) BryanDavis: wmcs: update rabbitmq drain_queue script [puppet] - https://gerrit.wikimedia.org/r/381165 (https://phabricator.wikimedia.org/T170492)
[05:51:34] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[05:52:34] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[05:53:54] !log Optimize tables pagelinks and templatelinks on dbstore2002 (s2) - T174509
[05:54:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:54:01] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[05:54:28] !log Optimize tables pagelinks and templatelinks on dbstore2001 (s2) - T174509
[05:54:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:03:15] (PS1) Elukey: hadoop: increase mapreduce history server heap settings [puppet] - https://gerrit.wikimedia.org/r/381168
[06:07:54] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[06:08:54] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[06:09:35] (CR) Merlijn van Deen: "I worked on that a while ago (https://phabricator.wikimedia.org/T97081#2843075). The host seems to be still online, so running the steps u" [puppet] - https://gerrit.wikimedia.org/r/379239 (https://phabricator.wikimedia.org/T175964) (owner: Herron)
[06:18:03] Operations, DBA: Decommission db1015, db1035, db1044 and db1038 - https://phabricator.wikimedia.org/T148078#3641783 (Marostegui)
[06:18:32] Operations, DBA: Decommission db1015, db1035, db1044 and db1038 - https://phabricator.wikimedia.org/T148078#2714228 (Marostegui) db1035 scheduled for decommission: T176931
[06:22:14] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0]
[06:23:14] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [1000.0]
[06:25:54] PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is CRITICAL: Test Transform wikitext to html returned the unexpected status 503 (expecting: 200)
[06:26:54] RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy
[06:27:07] cp3032 this time
[06:28:03] https://grafana.wikimedia.org/dashboard/db/varnish-failed-fetches?orgId=1&from=now-3h&to=now&var-datasource=esams%20prometheus%2Fops&var-cache_type=text
[06:28:04] ah.
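The graphite alerts above classify a recent window of datapoints by the share that crosses a threshold. CRITICAL messages cite the 1000 reqs/min critical threshold, while recoveries read "Less than 1.00% above the threshold [250.0]". The sketch below shows one way such a percentage-over-threshold rule could work. It is an illustration only, not the actual check plugin. The function name `evaluate`, the 1% trigger percentage and the WARNING wording are assumptions inferred from the messages in this log.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CheckResult:
    status: str
    message: str


def evaluate(points: Sequence[Optional[float]], warning: float,
             critical: float, percentage: float) -> CheckResult:
    """Classify a window of datapoints by the share that exceed each threshold.

    Hypothetical sketch: `percentage` is the minimum share (in %) of valid
    points that must exceed a threshold before the check leaves OK.
    """
    valid = [p for p in points if p is not None]  # treat gaps (None) as missing data
    if not valid:
        return CheckResult("UNKNOWN", "UNKNOWN: no datapoints")
    over_crit = 100.0 * sum(p > critical for p in valid) / len(valid)
    over_warn = 100.0 * sum(p > warning for p in valid) / len(valid)
    if over_crit >= percentage:
        return CheckResult(
            "CRITICAL",
            f"CRITICAL: {over_crit:.2f}% of data above the critical threshold [{critical}]")
    if over_warn >= percentage:
        return CheckResult(
            "WARNING",
            f"WARNING: {over_warn:.2f}% of data above the warning threshold [{warning}]")
    return CheckResult(
        "OK", f"OK: Less than {percentage:.2f}% above the threshold [{warning}]")


# Nine one-minute samples of 5xx reqs/min, two of them above 1000.
window = [120.0, 90.0, 300.0, 80.0, 110.0, 95.0, 100.0, 1500.0, 1200.0]
print(evaluate(window, warning=250.0, critical=1000.0, percentage=1.0).message)
# prints: CRITICAL: 22.22% of data above the critical threshold [1000.0]
```

With a nine-point window, the percentages seen in the log fall out directly. One bad minute gives 11.11%, two give 22.22%, and six give 66.67%. A single spike is therefore enough to page under this assumed 1% rule.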
[06:28:16] <_joe_> elukey: let's restart it
[06:28:21] !log restart varnish backend on cp3032
[06:28:24] <_joe_> thanks
[06:28:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:39:34] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[06:40:34] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[06:41:34] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[06:58:00] (CR) Muehlenhoff: [C: 1] Access for Slaporte (Stephen LaPorte) to stat1005 [puppet] - https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: Zoranzoki21)
[06:59:45] !log installing apache updates on graphite* hosts
[06:59:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:14:02] !log installing apache updates on silver/wikitech.wikimedia.org
[07:14:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:16:27] (PS2) Muehlenhoff: Remove role::salt::masters::labs::project_master [puppet] - https://gerrit.wikimedia.org/r/379763
[07:17:11] Operations, Operations-Software-Development, Goal, Patch-For-Review, Technical-Debt: Sunset our use of Salt - https://phabricator.wikimedia.org/T164780#3641811 (MoritzMuehlenhoff)
[07:26:56] (PS2) Elukey: hadoop: increase mapreduce history server heap settings [puppet] - https://gerrit.wikimedia.org/r/381168
[07:27:32] (CR) Elukey: [C: 2] hadoop: increase mapreduce history server heap settings [puppet] - https://gerrit.wikimedia.org/r/381168 (owner: Elukey)
[07:27:39] !log upgrading app servers in deployment-prep to HHVM 3.18.5
[07:27:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:39:09] (PS1) Giuseppe Lavagetto: First commit of the puppet-lint plugin to enforce the WMF style guide [puppet-lint/wmf_styleguide-check] - https://gerrit.wikimedia.org/r/381171
[07:40:13] Operations, ops-eqiad, Patch-For-Review, User-Elukey, User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3641820 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['mw1308.eqiad.wmnet'] ``` The log can be...
[07:43:12] (PS1) Elukey: hieradata::regex: remove some new appservers from the downtime list [puppet] - https://gerrit.wikimedia.org/r/381172 (https://phabricator.wikimedia.org/T165519)
[07:43:40] (CR) Elukey: [C: 2] hieradata::regex: remove some new appservers from the downtime list [puppet] - https://gerrit.wikimedia.org/r/381172 (https://phabricator.wikimedia.org/T165519) (owner: Elukey)
[07:48:25] (PS7) Elukey: role::prometheus::ops: add kafka metrics [puppet] - https://gerrit.wikimedia.org/r/380744 (https://phabricator.wikimedia.org/T175922)
[07:49:08] (CR) Elukey: [C: 2] role::prometheus::ops: add kafka metrics [puppet] - https://gerrit.wikimedia.org/r/380744 (https://phabricator.wikimedia.org/T175922) (owner: Elukey)
[08:00:19] !log elukey@puppetmaster1001 conftool action : set/pooled=yes; selector: name=mw1288.eqiad.wmnet
[08:00:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:02:15] PROBLEM - puppet last run on prometheus1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:09:34] RECOVERY - puppet last run on prometheus1003 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[08:28:26] Operations, monitoring, Patch-For-Review: Check long-running screen/tmux sessions - https://phabricator.wikimedia.org/T165348#3263810 (Volans) @Dzahn some comments: - `ms-fe1005` should be whitelisted until T162123 is done - I don't think puppetmasters should be whitelisted, all reimage stuff is now...
[08:31:54] (CR) Volans: [C: 1] "LGTM \o/" [puppet] - https://gerrit.wikimedia.org/r/381006 (owner: Muehlenhoff)
[08:33:14] Operations, Wikidata, wikiba.se, Wikidata-Sprint, Wikidata-Sprint-2016-11-08: Create wikibase/wikiba.se-deploy repo - https://phabricator.wikimedia.org/T176841#3641912 (ema)
[08:36:19] elukey: good morning. If you are around, deployment-kafka-jumbo-1.deployment-prep.eqiad.wmflabs has a disabled puppet. I would like to complete a puppet run there to deploy cumin on that instance :]
[08:36:27] elukey: then I guess there is a good reason for puppet to be disabled there
[08:37:15] hashar: morning! I am going to re-enable it, doesn't need to be disabled now
[08:37:29] awesome
[08:39:09] just ran puppet in there!
[08:39:21] (CR) Volans: [C: 1] "LGTM \o/" [puppet] - https://gerrit.wikimedia.org/r/379763 (owner: Muehlenhoff)
[08:40:47] elukey: and cumin works! [69/69] instances reacheable
[08:42:08] (PS2) Muehlenhoff: Remove role::salt::masters::labs [puppet] - https://gerrit.wikimedia.org/r/381006
[08:42:09] I've heard that it is a nice piece of software
[08:42:12] not sure who wrote it
[08:42:17] :P
[08:42:25] elukey: troller!
[08:42:27] :D
[08:42:33] 10 seconds!
[08:42:36] hahahahaah
[08:43:13] yeah it is kind of nice. Has some oddities still: sudo cumin '*' 'apt-get remove --purge salt-common salt-minion'
[08:43:25] that stalls. Presumably because apt-get asks for a confirmation :]
[08:43:45] yes, you need the -y
[08:43:49] in apt
[08:44:05] not sure why apt-get does not bail out since it is supposedly not a tty
[08:44:08] hashar: not needed, that will be removed with https://gerrit.wikimedia.org/r/#/c/380498/
[08:44:08] or what was it, cannot remember
[08:44:39] (CR) Muehlenhoff: [C: 2] Remove role::salt::masters::labs [puppet] - https://gerrit.wikimedia.org/r/381006 (owner: Muehlenhoff)
[08:44:42] moritzm: why twice? package { ['salt-minion', 'salt-minion']
[08:45:51] we really, really need to make it's gone: https://www.youtube.com/watch?v=aCbfMkh940Q
[08:46:07] no, that's actually a typo and was meant to state salt-common, fixing that now
[08:46:16] rotfl
[08:47:31] (CR) Hashar: [C: -1] "\o/" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/380498 (owner: Muehlenhoff)
[08:47:39] moritzm: ah nice. And reviewed :]
[08:48:11] added a : sudo cumin '*' 'rm -fR /var/cache/salt /var/log/salt /usr/lib/python2.7/dist-packages/salt'
[08:48:48] hashar: Andrew rebuilt the wmcs cloud images yesterday
[08:48:57] new instances should no longer have salt by default
[08:49:13] (PS2) Muehlenhoff: Purge salt-minion and salt-common in labs [puppet] - https://gerrit.wikimedia.org/r/380498
[08:49:27] hashar: careful with parallel rm -rf ;)
[08:49:29] (CR) jerkins-bot: [V: -1] Purge salt-minion and salt-common in labs [puppet] - https://gerrit.wikimedia.org/r/380498 (owner: Muehlenhoff)
[08:49:46] (PS3) Muehlenhoff: Purge salt-minion and salt-common in labs [puppet] - https://gerrit.wikimedia.org/r/380498
[08:50:59] hashar: for deployment-prep and integration, shall we simply remove the role::salt::masters::labs::project_master and have them gain the cumin counterpart, or will they be replaced by an entirely new instance used for cumin?
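The exchange above makes two practical points. First, `apt-get remove --purge` run through cumin stalls because apt-get waits for confirmation, and `-y` avoids that. Second, a package list with the same name twice was really a typo for `salt-common`. The Python sketch below builds a non-interactive purge command and rejects duplicate names. The helper names `purge_command` and `cumin_invocation` are hypothetical, and so is the added `DEBIAN_FRONTEND=noninteractive`. That variable assumes the command string is run through a remote shell, as with plain SSH. `shlex.join` requires Python 3.8+.

```python
import shlex
from typing import List


def purge_command(packages: List[str]) -> str:
    """Build a non-interactive apt-get purge command for the given packages."""
    # A repeated name is most likely a typo (e.g. 'salt-minion' listed twice
    # when 'salt-common' was meant), so refuse it instead of silently continuing.
    dupes = sorted({p for p in packages if packages.count(p) > 1})
    if dupes:
        raise ValueError("duplicate package names (typo?): " + ", ".join(dupes))
    # -y answers apt-get's confirmation prompt, which otherwise waits for input;
    # DEBIAN_FRONTEND=noninteractive keeps debconf from prompting as well.
    apt = ["apt-get", "-y", "remove", "--purge", *packages]
    return "DEBIAN_FRONTEND=noninteractive " + shlex.join(apt)


def cumin_invocation(query: str, packages: List[str]) -> List[str]:
    """argv for running the purge on every host matching the cumin query."""
    return ["sudo", "cumin", query, purge_command(packages)]


print(shlex.join(cumin_invocation("*", ["salt-common", "salt-minion"])))
# prints: sudo cumin '*' 'DEBIAN_FRONTEND=noninteractive apt-get -y remove --purge salt-common salt-minion'
```

Failing loudly on duplicates is a deliberate choice. Silently deduplicating would have hidden exactly the kind of typo found in review.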
[08:51:51] moritzm: I did the setup last night and iirc by 23:30 CEST everything was working fine
[08:52:28] I have deleted both salt instances, stripped the hiera config in horizon/wikitech, merged the few patches to migrate from salt to cumin
[08:52:34] so I think it is done
[08:52:36] ah, perfect :-)
[08:53:03] record time yesterday was 14 minutes to set up cumin on deployment-prep, from reading the Help:Cumin master page until I got a host responding
[08:53:30] and that is all included (creating an instance, fixing puppet, generating ssh keys, updating doc, running puppet on some agent etc)
[08:53:35] and https://tools.wmflabs.org/openstack-browser/puppetclass/role::salt::masters::labs::project_master confirms, proceeding with the merge now
[08:53:51] (PS3) Muehlenhoff: Remove role::salt::masters::labs::project_master [puppet] - https://gerrit.wikimedia.org/r/379763
[08:54:06] \O/
[08:55:12] Operations, Operations-Software-Development, Goal, Patch-For-Review, Technical-Debt: Sunset our use of Salt - https://phabricator.wikimedia.org/T164780#3641968 (hashar)
[08:55:25] (CR) Muehlenhoff: [C: 2] Remove role::salt::masters::labs::project_master [puppet] - https://gerrit.wikimedia.org/r/379763 (owner: Muehlenhoff)
[08:55:29] thank you very much for delaying the sunset by a few days!
[08:55:35] yw!
[08:56:39] Operations, MediaWiki-API, Wikimedia-General-or-Unknown: HTTP 500 on action=query&format=xml&list=logevents&letype=move&ledir=newer&lelimit=max on Meta - https://phabricator.wikimedia.org/T176938#3641969 (Base)
[08:56:47] Operations, Operations-Software-Development, Goal, Patch-For-Review, Technical-Debt: Sunset our use of Salt - https://phabricator.wikimedia.org/T164780#3245193 (hashar) deployment-prep / integration WMCS projects no more rely on salt (thank you @Volans / @MoritzMuehlenhoff / @bd808 / @elukey...
[08:57:53] (PS2) Gehel: base: switch to logrotate::rule [puppet] - https://gerrit.wikimedia.org/r/381030
[09:03:24] (CR) Gehel: [C: 2] base: switch to logrotate::rule [puppet] - https://gerrit.wikimedia.org/r/381030 (owner: Gehel)
[09:04:45] (PS1) Muehlenhoff: Remove obsolete salt classes [puppet] - https://gerrit.wikimedia.org/r/381176
[09:07:06] (CR) Volans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/381176 (owner: Muehlenhoff)
[09:10:08] (CR) WMDE-leszek: "Sorry guys for a late reaction, I was away for a couple of days. This change is of course fine for WMDE. Thanks for doing this!" [puppet] - https://gerrit.wikimedia.org/r/379499 (owner: Elukey)
[09:15:14] (PS1) Elukey: role::kafka::jumbo::broker: allow ganglia configuration [puppet] - https://gerrit.wikimedia.org/r/381177 (https://phabricator.wikimedia.org/T175922)
[09:15:17] (PS2) Muehlenhoff: Remove obsolete salt classes [puppet] - https://gerrit.wikimedia.org/r/381176
[09:15:58] (CR) Muehlenhoff: [C: 2] Remove obsolete salt classes [puppet] - https://gerrit.wikimedia.org/r/381176 (owner: Muehlenhoff)
[09:16:41] (PS3) Gehel: elasticsearch: use the logstash LVS endpoint [puppet] - https://gerrit.wikimedia.org/r/380991 (https://phabricator.wikimedia.org/T175242)
[09:17:52] (CR) Volans: [C: 1] "The cheker is not super easy to follow and could be improved a bit in readability and comments but the structure looks sane to me." [puppet-lint/wmf_styleguide-check] - https://gerrit.wikimedia.org/r/381171 (owner: Giuseppe Lavagetto)
[09:21:41] Operations, Operations-Software-Development, Goal, Patch-For-Review, Technical-Debt: Sunset our use of Salt - https://phabricator.wikimedia.org/T164780#3642041 (MoritzMuehlenhoff)
[09:21:43] Operations, Operations-Software-Development, Goal, Technical-Debt, cloud-services-team (Kanban): Remove salt master (and related packages) from labcontrol1001 - https://phabricator.wikimedia.org/T176632#3642038 (MoritzMuehlenhoff) Open>Resolved a:MoritzMuehlenhoff Salt keys have b...
[09:27:24] (CR) Alexandros Kosiaris: [C: 1] elasticsearch: use the logstash LVS endpoint [puppet] - https://gerrit.wikimedia.org/r/380991 (https://phabricator.wikimedia.org/T175242) (owner: Gehel)
[09:27:46] (CR) Muehlenhoff: "I cherrypicked the patch on deployment-prep's puppetmaster and it works as expected:" [puppet] - https://gerrit.wikimedia.org/r/380498 (owner: Muehlenhoff)
[09:27:52] (PS2) Elukey: role::kafka::jumbo::broker: allow ganglia configuration [puppet] - https://gerrit.wikimedia.org/r/381177 (https://phabricator.wikimedia.org/T175922)
[09:28:32] (CR) Elukey: [C: 2] role::kafka::jumbo::broker: allow ganglia configuration [puppet] - https://gerrit.wikimedia.org/r/381177 (https://phabricator.wikimedia.org/T175922) (owner: Elukey)
[09:35:34] (PS1) Elukey: Add kafka-jumbo to the list of Ganglia clusters [puppet] - https://gerrit.wikimedia.org/r/381178 (https://phabricator.wikimedia.org/T175922)
[09:38:49] (PS2) Elukey: Add kafka-jumbo to the list of Ganglia clusters [puppet] - https://gerrit.wikimedia.org/r/381178 (https://phabricator.wikimedia.org/T175922)
[09:40:13] (PS4) Muehlenhoff: Purge salt-minion and salt-common in labs [puppet] - https://gerrit.wikimedia.org/r/380498
[09:40:24] (CR) Giuseppe Lavagetto: [V: 2 C: 2] First commit of the puppet-lint plugin to enforce the WMF style guide [puppet-lint/wmf_styleguide-check] - https://gerrit.wikimedia.org/r/381171 (owner: Giuseppe Lavagetto)
[09:40:33] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [5000.0]
[09:40:48] mmm
[09:41:29] mw1308, that is the new jobrunner
[09:41:30] sigh
[09:41:33] (CR) Muehlenhoff: [C: 2] Purge salt-minion and salt-common in labs [puppet] - https://gerrit.wikimedia.org/r/380498 (owner: Muehlenhoff)
[09:41:51] Warning: unable to connect to unix:///var/run/nutcracker/redis_eqiad.sock [2]: No such file or directory
[09:41:54] wonderful
[09:42:12] it is running puppet now so it should fix itself in a bit
[09:42:26] !log upgrade pybal to 1.14.0 on lvs400[13]
[09:42:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:44:17] (PS5) Aaron Schulz: Enable logging of post-send DB updates [mediawiki-config] - https://gerrit.wikimedia.org/r/377412 (https://phabricator.wikimedia.org/T166199)
[09:45:02] (CR) Aaron Schulz: [C: 2] Enable logging of post-send DB updates [mediawiki-config] - https://gerrit.wikimedia.org/r/377412 (https://phabricator.wikimedia.org/T166199) (owner: Aaron Schulz)
[09:45:10] (CR) Elukey: [C: 2] Add kafka-jumbo to the list of Ganglia clusters [puppet] - https://gerrit.wikimedia.org/r/381178 (https://phabricator.wikimedia.org/T175922) (owner: Elukey)
[09:45:17] (PS3) Elukey: Add kafka-jumbo to the list of Ganglia clusters [puppet] - https://gerrit.wikimedia.org/r/381178 (https://phabricator.wikimedia.org/T175922)
[09:47:27] (Merged) jenkins-bot: Enable logging of post-send DB updates [mediawiki-config] - https://gerrit.wikimedia.org/r/377412 (https://phabricator.wikimedia.org/T166199) (owner: Aaron Schulz)
[09:47:36] (CR) jenkins-bot: Enable logging of post-send DB updates [mediawiki-config] - https://gerrit.wikimedia.org/r/377412 (https://phabricator.wikimedia.org/T166199) (owner: Aaron Schulz)
[09:49:30] !log aaron@tin Synchronized wmf-config/CommonSettings.php: Enable logging of post-send DB updates (duration: 00m 55s)
[09:49:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:52:23] !log upgrade pybal to 1.14.0 on lvs1009-10
[09:52:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:53:43] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0]
[09:55:46] Operations, ops-eqiad, Patch-For-Review, User-Elukey, User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3642140 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['mw1308.eqiad.wmnet'] ``` and were **ALL** successful.
[09:59:41] !log added new mediawiki jobrunner - mw1308
[09:59:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:00:04] !log elukey@puppetmaster1001 conftool action : set/pooled=yes; selector: name=mw1308.eqiad.wmnet
[10:00:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:01:41] (PS1) Muehlenhoff: Remove salt::minion class [puppet] - https://gerrit.wikimedia.org/r/381182
[10:04:30] (Abandoned) ArielGlenn: draft notifier to delete salt keys of a labs instance when it's deleted [puppet] - https://gerrit.wikimedia.org/r/172700 (owner: ArielGlenn)
[10:04:48] (Abandoned) ArielGlenn: bump version to 0.5.7-1 [software/deployment/trebuchet-trigger] - https://gerrit.wikimedia.org/r/276191 (owner: ArielGlenn)
[10:05:12] (Abandoned) ArielGlenn: make fetch/checkout report a little clearer [software/deployment/trebuchet-trigger] - https://gerrit.wikimedia.org/r/219841 (https://phabricator.wikimedia.org/T103013) (owner: ArielGlenn)
[10:05:32] (Abandoned) ArielGlenn: 2014.7.5 patch to catch failure to retrieve/decrypt master response [debs/salt] (jessie) - https://gerrit.wikimedia.org/r/273875 (owner: ArielGlenn)
[10:05:44] PROBLEM - puppet last run on uranium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:06:13] (Abandoned) ArielGlenn: 2014.7.5 patch to use user specified timeout for runner publish [debs/salt] (jessie) - https://gerrit.wikimedia.org/r/273876 (owner: ArielGlenn)
[10:06:25] (Abandoned) ArielGlenn: bump version number for wmf build, 2014.7.5+ds-1+wm3 [debs/salt] (jessie) - https://gerrit.wikimedia.org/r/273879 (owner: ArielGlenn)
[10:09:03] !log incrementally add mw1312-17 api-appservers to serve live traffic (weights will be raised incrementally)
[10:09:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:12:32] (PS1) Muehlenhoff: Remove further obsolete salt files [puppet] - https://gerrit.wikimedia.org/r/381185
[10:23:50] (PS4) Gehel: elasticsearch: use the logstash LVS endpoint [puppet] - https://gerrit.wikimedia.org/r/380991 (https://phabricator.wikimedia.org/T175242)
[10:27:46] (PS1) Muehlenhoff: Stop including role::salt::minions [puppet] - https://gerrit.wikimedia.org/r/381187
[10:27:48] (PS1) Muehlenhoff: Remove role::salt::minions [puppet] - https://gerrit.wikimedia.org/r/381188
[10:29:02] (CR) Alexandros Kosiaris: [C: 1] "I have to say I don't really know fluentd configuration files yet, the rest LGTM" [docker-images/production-images] - https://gerrit.wikimedia.org/r/380697 (owner: Giuseppe Lavagetto)
[10:29:29] (CR) Gehel: [C: 2] elasticsearch: use the logstash LVS endpoint [puppet] - https://gerrit.wikimedia.org/r/380991 (https://phabricator.wikimedia.org/T175242) (owner: Gehel)
[10:30:45] !log restart elasticsearch on relforge to validate new logging config - T175242
[10:30:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:30:51] T175242: all log producers need to use the logstash LVS endpoint - https://phabricator.wikimedia.org/T175242
[10:31:14] (CR) Alexandros Kosiaris: [C: 1] "I 'll merge this later on, unless anyone objects" [puppet] - https://gerrit.wikimedia.org/r/380992 (https://phabricator.wikimedia.org/T175242) (owner: Gehel)
[10:31:37] (CR) Alexandros Kosiaris: [C: 1] mediawiki: switch to LVS endpoint for logstash [puppet] - https://gerrit.wikimedia.org/r/380994 (https://phabricator.wikimedia.org/T175242) (owner: Gehel)
[10:34:52] (CR) Volans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/381185 (owner: Muehlenhoff)
[10:35:45] (CR) Volans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/381187 (owner: Muehlenhoff)
[10:36:26] (CR) Volans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/381182 (owner: Muehlenhoff)
[10:45:12] (PS8) ArielGlenn: Template-ise rsync/public.pp hosts allow [puppet] - https://gerrit.wikimedia.org/r/379517 (owner: Reedy)
[10:49:09] (PS2) Muehlenhoff: Remove salt::minion class [puppet] - https://gerrit.wikimedia.org/r/381182
[10:55:59] (CR) Muehlenhoff: [C: 2] Remove salt::minion class [puppet] - https://gerrit.wikimedia.org/r/381182 (owner: Muehlenhoff)
[11:02:09] (PS9) ArielGlenn: Template-ise rsync/public.pp hosts allow [puppet] - https://gerrit.wikimedia.org/r/379517 (owner: Reedy)
[11:02:22] (PS2) Muehlenhoff: Remove further obsolete salt files [puppet] - https://gerrit.wikimedia.org/r/381185
[11:06:44] (CR) Muehlenhoff: [C: 2] Remove further obsolete salt files [puppet] - https://gerrit.wikimedia.org/r/381185 (owner: Muehlenhoff)
[11:09:16] (CR) Volans: [C: 1] "LGTM we still have 32 instances with the package but those will require manual intervention (broken apt, broken puppet, etc.)" [puppet] - https://gerrit.wikimedia.org/r/381188 (owner: Muehlenhoff)
[11:09:44] (PS2) Muehlenhoff: Stop including role::salt::minions [puppet] - https://gerrit.wikimedia.org/r/381187
[11:12:09] (CR) Muehlenhoff: [C: 2] Stop including role::salt::minions [puppet] - https://gerrit.wikimedia.org/r/381187 (owner: Muehlenhoff)
[11:12:19] Operations, Pybal, Traffic, fundraising-tech-ops, Patch-For-Review: pybal vs firewall failover - BGP session down - https://phabricator.wikimedia.org/T173028#3642334 (ema) Open>Resolved a:ema Fixed in pybal 1.14.0.
[11:18:30] (CR) Elukey: [C: 1] aqs: switch to LVS endpoint for logstash [puppet] - https://gerrit.wikimedia.org/r/380992 (https://phabricator.wikimedia.org/T175242) (owner: Gehel)
[11:21:07] (PS2) Muehlenhoff: Remove role::salt::minions [puppet] - https://gerrit.wikimedia.org/r/381188
[11:21:54] (CR) Zfilipin: "Is there anything blocking this from being merged?" [puppet] - https://gerrit.wikimedia.org/r/331856 (https://phabricator.wikimedia.org/T78342) (owner: Hashar)
[11:23:18] (CR) Muehlenhoff: [C: 2] Remove role::salt::minions [puppet] - https://gerrit.wikimedia.org/r/381188 (owner: Muehlenhoff)
[11:24:09] (PS2) Muehlenhoff: Remove beta::saltmaster::tools and /usr/local/bin/beta-apaches [puppet] - https://gerrit.wikimedia.org/r/379502
[11:28:45] Operations, monitoring, Patch-For-Review: Check long-running screen/tmux sessions - https://phabricator.wikimedia.org/T165348#3642386 (jcrespo) Note our usage of neodymium/sarin is not required- I thought at the time that mixing mysql and salt quering could be interesting for scripting purposes. We c...
[11:32:19] (PS1) Addshore: Remove unused wmgUseWikibasePropertySuggester [mediawiki-config] - https://gerrit.wikimedia.org/r/381193
[11:39:34] (CR) Hoo man: [C: 1] Remove unused wmgUseWikibasePropertySuggester [mediawiki-config] - https://gerrit.wikimedia.org/r/381193 (owner: Addshore)
[11:39:43] yay hoo!
[11:40:01] I'm making a patch for moving the extension loading of wikibase extensions to mediawiki-config right now
[11:42:47] Yay :)
[11:45:44] (PS1) Addshore: WIP DNM: Add loading of wikibase extensions from build [mediawiki-config] - https://gerrit.wikimedia.org/r/381194 (https://phabricator.wikimedia.org/T176948)
[11:46:49] (PS1) Addshore: Load wikibase build from mediawiki-config for beta [mediawiki-config] - https://gerrit.wikimedia.org/r/381195 (https://phabricator.wikimedia.org/T176948)
[11:46:58] aude: hoo ^^ I was thinking something like that, so we can try it on beta first :)
[11:47:20] (CR) jerkins-bot: [V: -1] WIP DNM: Add loading of wikibase extensions from build [mediawiki-config] - https://gerrit.wikimedia.org/r/381194 (https://phabricator.wikimedia.org/T176948) (owner: Addshore)
[11:48:52] (CR) jerkins-bot: [V: -1] Load wikibase build from mediawiki-config for beta [mediawiki-config] - https://gerrit.wikimedia.org/r/381195 (https://phabricator.wikimedia.org/T176948) (owner: Addshore)
[11:53:36] (CR) Addshore: "Need to figure out how to solve the sharedCacheKeyPrefix issue! :)" (6 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/381194 (https://phabricator.wikimedia.org/T176948) (owner: Addshore)
[11:55:19] hmm, hoo, could use the mediawiki version for the cache key?
[11:56:23] (CR) Addshore: Load wikibase build from mediawiki-config for beta (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/381195 (https://phabricator.wikimedia.org/T176948) (owner: Addshore)
[11:57:18] addshore: hm, I guess we can
[11:57:34] given Wikipedias and Wikidata is not switched at a time
[11:57:48] this should be fine and not cause to many misses
[11:58:02] (PS1) Muehlenhoff: Remove Salt Hiera settings for WMCS [puppet] - https://gerrit.wikimedia.org/r/381197
[11:58:47] > echo $wgVersion;
[11:58:47] 1.31.0-wmf.1
[11:58:51] Yup, that should do...
[12:00:33] (CR) Hoo man: WIP DNM: Add loading of wikibase extensions from build (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/381194 (https://phabricator.wikimedia.org/T176948) (owner: Addshore)
[12:01:08] (PS1) Muehlenhoff: Remove obsolete labtest salt classes [puppet] - https://gerrit.wikimedia.org/r/381198
[12:07:00] (PS2) Addshore: WIP DNM: Add loading of wikibase extensions from build [mediawiki-config] - https://gerrit.wikimedia.org/r/381194 (https://phabricator.wikimedia.org/T176948)
[12:07:31] (PS2) Addshore: Load wikibase build from mediawiki-config for beta wikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/381195 (https://phabricator.wikimedia.org/T176948)
[12:07:57] Operations, Operations-Software-Development: wmf-reimage and handling of "-n" option - https://phabricator.wikimedia.org/T144264#3642427 (Volans) Open>declined The `wmf-reimage` script has been superseeded by the `wmf-auto-reimage` scripts, see https://wikitech.wikimedia.org/wiki/Server_Lifecycle...
[12:08:17] (PS1) Addshore: Load wikibase build from mediawiki-config for beta wikidataclient [mediawiki-config] - https://gerrit.wikimedia.org/r/381199 (https://phabricator.wikimedia.org/T176948)
[12:08:34] (CR) jerkins-bot: [V: -1] WIP DNM: Add loading of wikibase extensions from build [mediawiki-config] - https://gerrit.wikimedia.org/r/381194 (https://phabricator.wikimedia.org/T176948) (owner: Addshore)
[12:09:25] (PS3) Addshore: WIP DNM: Add loading of wikibase extensions from build [mediawiki-config] - https://gerrit.wikimedia.org/r/381194 (https://phabricator.wikimedia.org/T176948)
[12:09:31] (PS3) Addshore: Load wikibase build from mediawiki-config for beta wikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/381195 (https://phabricator.wikimedia.org/T176948)
[12:09:33] (PS1) Volans: wmf-auto-reimage: add support for rename [puppet] - https://gerrit.wikimedia.org/r/381200 (https://phabricator.wikimedia.org/T176955)
[12:09:35] (PS2) Addshore: Load wikibase build from mediawiki-config for beta wikidataclient [mediawiki-config] - https://gerrit.wikimedia.org/r/381199 (https://phabricator.wikimedia.org/T176948)
[12:11:41] (PS1) Muehlenhoff: Remove profile::openstack::base::salt [puppet] - https://gerrit.wikimedia.org/r/381201
[12:11:54] (CR) Volans: "https://gerrit.wikimedia.org/r/#/c/381200/ should remove the script completely" [puppet] - https://gerrit.wikimedia.org/r/381008 (owner: Dzahn)
[12:12:29] (CR) Volans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/381198 (owner: Muehlenhoff)
[12:13:43] (PS1) Muehlenhoff: Remove salt references in openstack mitaka wmfsink [puppet] - https://gerrit.wikimedia.org/r/381202
[12:14:25] (CR) Volans: [C: 1] "LGTM but I'd wait WMCS feedback too" [puppet] - https://gerrit.wikimedia.org/r/381201 (owner: Muehlenhoff)
[12:16:05] (PS1) Muehlenhoff: Remove salt references in openstack liberty wmfsink [puppet] - https://gerrit.wikimedia.org/r/381203
[12:16:07] (PS1) Ema: Revert "apply numa isolation to all new cp4 hosts" [puppet] - https://gerrit.wikimedia.org/r/381204
[12:18:47] Operations, Operations-Software-Development, Goal, Patch-For-Review, Technical-Debt: Sunset our use of Salt - https://phabricator.wikimedia.org/T164780#3642472 (MoritzMuehlenhoff)
[12:19:38] Operations, Operations-Software-Development, Goal, Patch-For-Review, Technical-Debt: Sunset our use of Salt - https://phabricator.wikimedia.org/T164780#3245193 (MoritzMuehlenhoff) Open>Resolved a: MoritzMuehlenhoff This is done (there's a few pending openstack puppet cleanups, but...
[12:21:07] (PS2) Ema: Revert "apply numa isolation to all new cp4 hosts" [puppet] - https://gerrit.wikimedia.org/r/381204
[12:21:09] \o/
[12:21:45] (CR) Ema: [C: 2] Revert "apply numa isolation to all new cp4 hosts" [puppet] - https://gerrit.wikimedia.org/r/381204 (owner: Ema)
[12:22:17] moritzm: ok to merge your changes too?
[12:22:30] Muehlenhoff: Remove role::salt::minions (f4e7164)
[12:23:08] ema: oh, yes, please go ahead
[12:23:37] done
[12:27:28] !log reboot cp402[2-6] to disable numa isolation
[12:27:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:30:42] Operations, vm-requests, Patch-For-Review: EQIAD: 1 VM request for package building - https://phabricator.wikimedia.org/T176607#3642493 (akosiaris) Open>Resolved boron is up and running, some test builds of my own went fine. The VM is equipped with 8 vCPUs and 16GB and 300GB disk (without LVM...
[12:32:07] (PS13) Elukey: Improvements to druid profiles, move druid role out of analytics_cluster [puppet] - https://gerrit.wikimedia.org/r/380800 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[12:32:31] (PS1) Alexandros Kosiaris: package_builder: Add dh-systemd [puppet] - https://gerrit.wikimedia.org/r/381209
[12:36:40] (CR) Zoranzoki21: [C: 1] "@Framawiki Please rebase this patch" [mediawiki-config] - https://gerrit.wikimedia.org/r/377864 (https://phabricator.wikimedia.org/T154371) (owner: Framawiki)
[12:38:23] (PS14) Elukey: Improvements to druid profiles, move druid role out of analytics_cluster [puppet] - https://gerrit.wikimedia.org/r/380800 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[12:44:39] (PS1) Ema: Revert "depool ulsfo" [dns] - https://gerrit.wikimedia.org/r/381211
[12:45:04] (CR) Ema: [C: 2] Revert "depool ulsfo" [dns] - https://gerrit.wikimedia.org/r/381211 (owner: Ema)
[12:46:49] Operations, Packaging: Deprecate host copper.eqiad.wmnet - https://phabricator.wikimedia.org/T176957#3642514 (akosiaris)
[12:47:36] Operations, Packaging: Deprecate host copper.eqiad.wmnet - https://phabricator.wikimedia.org/T176957#3642531 (akosiaris) Open>stalled p: Triage>Low Stalling until Oct 30th
[12:48:42] !log restarting elastic on relforge servers to test a new version of the ltr plugin
[12:48:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:49:07] (CR) Alexandros Kosiaris: [C: 2] package_builder: Add dh-systemd [puppet] - https://gerrit.wikimedia.org/r/381209 (owner: Alexandros Kosiaris)
[12:50:02] moritzm: godog: boron is up and running and I 've set copper to "deprecated" status
[12:51:11] (PS15) Elukey: Improvements to druid profiles, move druid role out of analytics_cluster [puppet] - https://gerrit.wikimedia.org/r/380800 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[12:53:57] (PS1) Marostegui: db-codfw.php: Depool db2053 and db2046 [mediawiki-config] - https://gerrit.wikimedia.org/r/381213 (https://phabricator.wikimedia.org/T174509)
[12:54:41] akosiaris: thanks! I'll try it for my next package build and will move over my existing /home
[12:56:14] yeah I was thinking about rsyncing over /home but then realized that of those 27GB, 26GB+ is probably build artifacts or just git repos
[12:56:44] so, it's probably better to start over clean
[12:57:16] (CR) Marostegui: [C: 2] db-codfw.php: Depool db2053 and db2046 [mediawiki-config] - https://gerrit.wikimedia.org/r/381213 (https://phabricator.wikimedia.org/T174509) (owner: Marostegui)
[12:57:35] (PS4) Addshore: Add loading of wikibase extensions from build [mediawiki-config] - https://gerrit.wikimedia.org/r/381194 (https://phabricator.wikimedia.org/T176948)
[12:57:40] (PS4) Addshore: Load wikibase build from mediawiki-config for beta wikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/381195 (https://phabricator.wikimedia.org/T176948)
[12:57:44] (PS3) Addshore: Load wikibase build from mediawiki-config for beta wikidataclient [mediawiki-config] - https://gerrit.wikimedia.org/r/381199 (https://phabricator.wikimedia.org/T176948)
[12:57:53] This user is now online in #wikimedia-dev. I'll let you know when they show some activity (talk, etc.)
[12:57:53] @notify aude
[12:58:55] (Merged) jenkins-bot: db-codfw.php: Depool db2053 and db2046 [mediawiki-config] - https://gerrit.wikimedia.org/r/381213 (https://phabricator.wikimedia.org/T174509) (owner: Marostegui)
[12:59:05] (CR) jenkins-bot: db-codfw.php: Depool db2053 and db2046 [mediawiki-config] - https://gerrit.wikimedia.org/r/381213 (https://phabricator.wikimedia.org/T174509) (owner: Marostegui)
[13:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Your horoscope predicts another unfortunate European Mid-day SWAT(Max 8 patches) deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170928T1300).
[13:00:04] MatmaRex: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:14] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2053 db2046 - T174509 (duration: 00m 49s)
[13:00:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:00:19] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[13:00:40] I can SWAT today!
[13:00:49] MatmaRex: around for SWAT?
[13:01:02] hi zeljkof
[13:01:21] MatmaRex: want to deploy yourself, or should I do it?
[13:01:36] (CR) Elukey: "The refactoring looks good, I took the liberty to make a couple of small changes:" [puppet] - https://gerrit.wikimedia.org/r/380800 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[13:01:50] (PS16) Elukey: Improvements to druid profiles, move druid role out of analytics_cluster [puppet] - https://gerrit.wikimedia.org/r/380800 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[13:02:37] zeljkof: i don't have deployment access, please do it :)
[13:02:59] !log Optimize template links and pagelinks on db2053 and db2046 - T174509
[13:03:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:22] MatmaRex: on it, will ping you when the commit is at mwdebug, in a few minutes
[13:04:05] (CR) Elukey: [C: 2] Improvements to druid profiles, move druid role out of analytics_cluster [puppet] - https://gerrit.wikimedia.org/r/380800 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[13:05:22] (PS3) Ema: pybal: bind BGP TCP port to private addresses [puppet] - https://gerrit.wikimedia.org/r/380508 (https://phabricator.wikimedia.org/T103882)
[13:05:27] (CR) Ema: [V: 2 C: 2] pybal: bind BGP TCP port to private addresses [puppet] - https://gerrit.wikimedia.org/r/380508 (https://phabricator.wikimedia.org/T103882) (owner: Ema)
[13:06:08] MatmaRex: merging the commit https://integration.wikimedia.org/zuul/
[13:07:46] (PS2) Gehel: base: switch to logrotate::rule (cleanup) [puppet] - https://gerrit.wikimedia.org/r/381031
[13:08:01] !log bounce pybal on ulsfo secondaries to pick up bgp-local-ips config change T103882
[13:08:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:09:51] (CR) Gehel: [C: 2] base: switch to logrotate::rule (cleanup) [puppet] - https://gerrit.wikimedia.org/r/381031 (owner: Gehel)
[13:10:11] !log upgrading mw1261 to HHVM 3.18.5
[13:10:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:12:05] !log Drop redundant indexes from pagelinks and templatelinks on s4 - T174509
[13:12:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:12:12] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[13:13:19] !log bounce pybal on ulsfo primaries to pick up bgp-local-ips config change T103882
[13:13:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:15:04] Operations, MediaWiki-API, Wikimedia-General-or-Unknown: HTTP 500 on action=query&format=xml&list=logevents&letype=move&ledir=newer&lelimit=max on Meta - https://phabricator.wikimedia.org/T176938#3642620 (Zoranzoki21) Open>Resolved All is ok now
[13:15:30] MatmaRex: finally merged, stand by https://gerrit.wikimedia.org/r/#/c/381210/
[13:15:59] standing by :)
[13:16:00] Operations, fundraising-tech-ops, netops: remove fundraising firewall rules related to ganglia - https://phabricator.wikimedia.org/T176319#3642627 (Jgreen) a: Jgreen
[13:18:38] MatmaRex: it should be at mwdebug1002, please test and let me know if I can continue
[13:19:30] looking
[13:22:26] zeljkof: one more minute, turns out the wiki i wanted to test on has different configuration that doesn't trigger this. grumble
[13:22:41] MatmaRex: no rush, take your time :)
[13:26:22] !log Optimize table on s4 codfw master (db2051), with replication (lag might be generated) - T174509
[13:26:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:26:28] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[13:31:28] zeljkof: ok, this is very frustrating, i can't get this stupid code to run
[13:31:55] MatmaRex: is there anything I could help with?
[13:32:06] the code is only at mwdebug1002 so far
[13:32:12] do you know how to test there?
[13:32:44] docs: https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug
[13:34:11] (PS1) BBlack: numa: remove isolcpus explicitly in non-isolate mode [puppet] - https://gerrit.wikimedia.org/r/381214
[13:35:34] (CR) BBlack: [C: 2] numa: remove isolcpus explicitly in non-isolate mode [puppet] - https://gerrit.wikimedia.org/r/381214 (owner: BBlack)
[13:35:36] zeljkof: yes, i see the new code
[13:35:46] !log uploaded HHVM 3.18.5 for jessie-wikimedia to apt.wikimedia.org
[13:35:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:36:18] zeljkof: so actually, it looks like this entire feature i'm trying to fix is already harmlessly broken and does not work. i think you can safely finish the sync, it's just a no-op. eh
[13:36:19] MatmaRex: so the code is there, but it does not want to run? :)
[13:36:32] MatmaRex: ok, deploying then
[13:37:34] !log upgrading mw1262 to mw1265 (canary app servers) to HHVM 3.18.5
[13:37:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:38:12] !log zfilipin@tin Synchronized php-1.31.0-wmf.1/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.DesktopArticleTarget.init.js: SWAT: [[gerrit:381210|ve.init.mw.DesktopArticleTarget: Remove hack for reversed tabs in RTL in Vector (T50017)]] (duration: 00m 50s)
[13:38:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:38:16] T50017: VisualEditor: In RTL, the "Edit" tab appears down-tab (to the left) rather than up-tab (to the right) of "Edit source" - https://phabricator.wikimedia.org/T50017
[13:38:58] MatmaRex: deployed, please check, if possible
[13:39:57] no further commits, so...
[13:40:04] !log EU SWAT finished
[13:40:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:40:41] thanks zeljkof
[13:40:54] MatmaRex: thanks for releasing with #releng! ;)
[13:44:23] (CR) Pmiazga: Implement Schema:Print purging strategy (1 comment) [puppet] - https://gerrit.wikimedia.org/r/379829 (https://phabricator.wikimedia.org/T175395) (owner: Bmansurov)
[13:45:29] (PS1) Marostegui: db-eqiad,db-codfw.php: Remove db1035 [mediawiki-config] - https://gerrit.wikimedia.org/r/381216 (https://phabricator.wikimedia.org/T176931)
[13:45:54] (CR) Bmansurov: Implement Schema:Print purging strategy (1 comment) [puppet] - https://gerrit.wikimedia.org/r/379829 (https://phabricator.wikimedia.org/T175395) (owner: Bmansurov)
[13:50:20] (CR) Marostegui: [C: 2] db-eqiad,db-codfw.php: Remove db1035 [mediawiki-config] - https://gerrit.wikimedia.org/r/381216 (https://phabricator.wikimedia.org/T176931) (owner: Marostegui)
[13:50:37] PROBLEM - DPKG on mw1262 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[13:51:37] RECOVERY - DPKG on mw1262 is OK: All packages OK
[13:54:17] (Merged) jenkins-bot: db-eqiad,db-codfw.php: Remove db1035 [mediawiki-config] - https://gerrit.wikimedia.org/r/381216 (https://phabricator.wikimedia.org/T176931) (owner: Marostegui)
[13:55:59] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Remove db1035 from config as it will be decommissioned - T176931 (duration: 00m 50s)
[13:56:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:56:04] T176931: Decommission db1035 - https://phabricator.wikimedia.org/T176931
[13:56:53] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Remove db1035 from config as it will be decommissioned - T176931 (duration: 00m 49s)
[13:56:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:57:19] (PS1) Marostegui: mariadb: Remove db1035 from config files [puppet] - https://gerrit.wikimedia.org/r/381218 (https://phabricator.wikimedia.org/T176931)
[13:59:15] (CR) Marostegui: "Puppet compiler looks happy: https://puppet-compiler.wmflabs.org/compiler02/8081/" [puppet] - https://gerrit.wikimedia.org/r/381218 (https://phabricator.wikimedia.org/T176931) (owner: Marostegui)
[14:08:25] (PS2) Rush: Remove obsolete labtest salt classes [puppet] - https://gerrit.wikimedia.org/r/381198 (owner: Muehlenhoff)
[14:11:03] (CR) jenkins-bot: db-eqiad,db-codfw.php: Remove db1035 [mediawiki-config] - https://gerrit.wikimedia.org/r/381216 (https://phabricator.wikimedia.org/T176931) (owner: Marostegui)
[14:11:12] (CR) Rush: [C: 2] Remove obsolete labtest salt classes [puppet] - https://gerrit.wikimedia.org/r/381198 (owner: Muehlenhoff)
[14:11:38] PROBLEM - puppet last run on mw1263 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[hhvm],Package[hhvm-dbg]
[14:12:28] ^that's harmless, icinga check ran during upgrade (while depooled)
[14:12:45] (PS2) Rush: Remove profile::openstack::base::salt [puppet] - https://gerrit.wikimedia.org/r/381201 (owner: Muehlenhoff)
[14:14:48] RECOVERY - puppet last run on mw1263 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[14:21:33] (PS1) Elukey: Add fake druid passwords [labs/private] - https://gerrit.wikimedia.org/r/381220
[14:21:48] (CR) Elukey: [V: 2 C: 2] Add fake druid passwords [labs/private] - https://gerrit.wikimedia.org/r/381220 (owner: Elukey)
[14:22:44] (PS2) Alexandros Kosiaris: Decouple profile::ci::docker and zuul-cloner install [puppet] - https://gerrit.wikimedia.org/r/379727 (https://phabricator.wikimedia.org/T176267) (owner: Hashar)
[14:22:48] (CR) Alexandros Kosiaris: [V: 2 C: 2] Decouple profile::ci::docker and zuul-cloner install [puppet] - https://gerrit.wikimedia.org/r/379727 (https://phabricator.wikimedia.org/T176267) (owner: Hashar)
[14:23:59] (PS2) Alexandros Kosiaris: Decouple profile::ci::docker and worker_localhost [puppet] - https://gerrit.wikimedia.org/r/379728 (https://phabricator.wikimedia.org/T176267) (owner: Hashar)
[14:24:42] (PS1) Elukey: role::druid::analytics::worker: split properties into public/private [puppet] - https://gerrit.wikimedia.org/r/381221 (https://phabricator.wikimedia.org/T176223)
[14:25:10] (CR) Alexandros Kosiaris: [C: 2] Decouple profile::ci::docker and worker_localhost [puppet] - https://gerrit.wikimedia.org/r/379728 (https://phabricator.wikimedia.org/T176267) (owner: Hashar)
[14:26:23] (PS2) Marostegui: mariadb: Remove db1035 from config files [puppet] - https://gerrit.wikimedia.org/r/381218 (https://phabricator.wikimedia.org/T176931)
[14:26:58] (CR) Rush: [C: 2] Remove profile::openstack::base::salt [puppet] - https://gerrit.wikimedia.org/r/381201 (owner: Muehlenhoff)
[14:27:40] (PS3) Hashar: Decouple profile::ci::docker and arcanist install [puppet] - https://gerrit.wikimedia.org/r/379726 (https://phabricator.wikimedia.org/T176267)
[14:27:52] (PS3) Marostegui: mariadb: Remove db1035 from config files [puppet] - https://gerrit.wikimedia.org/r/381218 (https://phabricator.wikimedia.org/T176931)
[14:28:01] haha such a long queue to merge puppet today eh :p
[14:28:33] !log upgrading mwdebug* to HHVM 3.18.5
[14:28:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:30:59] Operations, fundraising-tech-ops, netops: reconfigure networking on frack-eqiad management interfaces - https://phabricator.wikimedia.org/T176972#3643004 (Jgreen)
[14:31:35] (CR) Marostegui: [C: 2] mariadb: Remove db1035 from config files [puppet] - https://gerrit.wikimedia.org/r/381218 (https://phabricator.wikimedia.org/T176931) (owner: Marostegui)
[14:31:53] !log upgrading mw1276 (API canary) to HHVM 3.18.5
[14:31:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:34:33] (PS1) Marostegui: site.php: Add db1035 to spare [puppet] - https://gerrit.wikimedia.org/r/381224 (https://phabricator.wikimedia.org/T176931)
[14:34:37] PROBLEM - DPKG on mw1276 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[14:34:56] (PS3) Andrew Bogott: wmcs: update rabbitmq drain_queue script [puppet] - https://gerrit.wikimedia.org/r/381165 (https://phabricator.wikimedia.org/T170492) (owner: BryanDavis)
[14:35:37] RECOVERY - DPKG on mw1276 is OK: All packages OK
[14:35:51] (CR) Alexandros Kosiaris: [C: -1] "I am not entirely sold on the global hiera variable. I see it get's reused in multiple places but we tend to avoid global hiera variables " (8 comments) [puppet] - https://gerrit.wikimedia.org/r/379729 (owner: Hashar)
[14:36:03] (CR) Marostegui: [C: 2] "https://puppet-compiler.wmflabs.org/compiler02/8084/" [puppet] - https://gerrit.wikimedia.org/r/381224 (https://phabricator.wikimedia.org/T176931) (owner: Marostegui)
[14:36:40] (PS3) Alexandros Kosiaris: contint: docker-ce on labs docker slaves [puppet] - https://gerrit.wikimedia.org/r/379556 (https://phabricator.wikimedia.org/T176267) (owner: Hashar)
[14:37:01] (CR) jerkins-bot: [V: -1] contint: docker-ce on labs docker slaves [puppet] - https://gerrit.wikimedia.org/r/379556 (https://phabricator.wikimedia.org/T176267) (owner: Hashar)
[14:37:10] (CR) Alexandros Kosiaris: [C: 1] "Would +2ed but this is not readily mergeable yet due to the dependencies" [puppet] - https://gerrit.wikimedia.org/r/379556 (https://phabricator.wikimedia.org/T176267) (owner: Hashar)
[14:37:18] :(
[14:37:49] (CR) Andrew Bogott: [C: 2] wmcs: update rabbitmq drain_queue script [puppet] - https://gerrit.wikimedia.org/r/381165 (https://phabricator.wikimedia.org/T170492) (owner: BryanDavis)
[14:37:55] (PS1) Marostegui: s3.hosts: Remove db1035 [software] - https://gerrit.wikimedia.org/r/381225 (https://phabricator.wikimedia.org/T176931)
[14:37:57] (PS4) Andrew Bogott: wmcs: update rabbitmq drain_queue script [puppet] - https://gerrit.wikimedia.org/r/381165 (https://phabricator.wikimedia.org/T170492) (owner: BryanDavis)
[14:37:59] (PS4) Alexandros Kosiaris: Decouple profile::ci::docker and arcanist install [puppet] - https://gerrit.wikimedia.org/r/379726 (https://phabricator.wikimedia.org/T176267) (owner: Hashar)
[14:38:04] (CR) Alexandros Kosiaris: [V: 2 C: 2] Decouple profile::ci::docker and arcanist install [puppet] - https://gerrit.wikimedia.org/r/379726 (https://phabricator.wikimedia.org/T176267) (owner: Hashar)
[14:38:32] (PS3) Alexandros Kosiaris: Decouple profile::ci::docker and zuul-cloner install [puppet] - https://gerrit.wikimedia.org/r/379727 (https://phabricator.wikimedia.org/T176267) (owner: Hashar)
[14:38:35] (CR) Alexandros Kosiaris: [V: 2 C: 2] Decouple profile::ci::docker and zuul-cloner install [puppet] - https://gerrit.wikimedia.org/r/379727 (https://phabricator.wikimedia.org/T176267) (owner: Hashar)
[14:38:53] (PS3) Alexandros Kosiaris: Decouple profile::ci::docker and worker_localhost [puppet] - https://gerrit.wikimedia.org/r/379728 (https://phabricator.wikimedia.org/T176267) (owner: Hashar)
[14:38:56] (CR) Alexandros Kosiaris: [V: 2 C: 2] Decouple profile::ci::docker and worker_localhost [puppet] - https://gerrit.wikimedia.org/r/379728 (https://phabricator.wikimedia.org/T176267) (owner: Hashar)
[14:39:10] (PS1) Elukey: Fix typo in druid private properties [labs/private] - https://gerrit.wikimedia.org/r/381226
[14:39:26] (CR) Elukey: [V: 2 C: 2] Fix typo in druid private properties [labs/private] - https://gerrit.wikimedia.org/r/381226 (owner: Elukey)
[14:39:30] Operations, MediaWiki-API, Wikimedia-General-or-Unknown: HTTP 500 on action=query&format=xml&list=logevents&letype=move&ledir=newer&lelimit=max on Meta - https://phabricator.wikimedia.org/T176938#3641969 (Anomie) I can't say for certain that it's related, but I see suggestive messages in the logs at...
[14:41:55] (PS2) Elukey: role::druid::analytics::worker: split properties into public/private [puppet] - https://gerrit.wikimedia.org/r/381221 (https://phabricator.wikimedia.org/T176223)
[14:42:12] !log Remove db1035 from tendril - T176931
[14:42:15] (PS5) Andrew Bogott: wmcs: update rabbitmq drain_queue script [puppet] - https://gerrit.wikimedia.org/r/381165 (https://phabricator.wikimedia.org/T170492) (owner: BryanDavis)
[14:42:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:42:19] T176931: Decommission db1035 - https://phabricator.wikimedia.org/T176931
[14:42:46] (CR) Marostegui: [C: 2] s3.hosts: Remove db1035 [software] - https://gerrit.wikimedia.org/r/381225 (https://phabricator.wikimedia.org/T176931) (owner: Marostegui)
[14:43:36] (Merged) jenkins-bot: s3.hosts: Remove db1035 [software] - https://gerrit.wikimedia.org/r/381225 (https://phabricator.wikimedia.org/T176931) (owner: Marostegui)
[14:44:13] (CR) Elukey: [C: 2] "https://puppet-compiler.wmflabs.org/compiler02/8085/druid1001.eqiad.wmnet/ looks good!" [puppet] - https://gerrit.wikimedia.org/r/381221 (https://phabricator.wikimedia.org/T176223) (owner: Elukey)
[14:44:28] (PS3) Elukey: role::druid::analytics::worker: split properties into public/private [puppet] - https://gerrit.wikimedia.org/r/381221 (https://phabricator.wikimedia.org/T176223)
[14:45:42] !log installing apache updates on puppet mastes (will trigger a few failing puppet runs, but ircecho disabled temporarily)
[14:45:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:47:15] (CR) Ottomata: "Why can't we require profile::druid::common from a service class? The class DOES require profile::druid::common, or it won't work. Why s" [puppet] - https://gerrit.wikimedia.org/r/380800 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[14:48:20] (CR) Ottomata: "Also, what's wrong with renaming the network constants to match the cluster name. AND, why hieradata in eqiad? If we were to set up a co" [puppet] - https://gerrit.wikimedia.org/r/380800 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[14:48:22] (PS2) Rush: Remove Salt Hiera settings for WMCS [puppet] - https://gerrit.wikimedia.org/r/381197 (owner: Muehlenhoff)
[14:49:00] Operations, Traffic, netops, Patch-For-Review: eqiad row D switch upgrade - https://phabricator.wikimedia.org/T172459#3643083 (ema) p: Triage>Normal
[14:49:21] Operations, Android-app-feature-Compilations, Traffic, Wikipedia-Android-App-Backlog, Reading-Infrastructure-Team-Backlog (Kanban): Determine URL paths for Zim files - https://phabricator.wikimedia.org/T172148#3643085 (ema) p: Triage>Normal
[14:49:24] (PS1) BBlack: grub::bootparam: fixup interface issues [puppet] - https://gerrit.wikimedia.org/r/381227
[14:49:27] (CR) Ottomata: role::druid::analytics::worker: split properties into public/private (1 comment) [puppet] - https://gerrit.wikimedia.org/r/381221 (https://phabricator.wikimedia.org/T176223) (owner: Elukey)
[14:49:48] Operations, Android-app-feature-Compilations, Reading-Infrastructure-Team-Backlog, Traffic, and 2 others: Determine how to upload Zim files to Swift infrastructure - https://phabricator.wikimedia.org/T172123#3643087 (ema) p: Triage>Normal
[14:49:49] (CR) jerkins-bot: [V: -1] grub::bootparam: fixup interface issues [puppet] - https://gerrit.wikimedia.org/r/381227 (owner: BBlack)
[14:50:00] Operations, Phabricator, Traffic, Zero: Missing IP addresses for Maroc Telecom - https://phabricator.wikimedia.org/T174342#3643088 (ema) p: Triage>Normal
[14:50:20] Operations, MediaWiki-API, Wikimedia-General-or-Unknown, Patch-For-Review: HTTP 500 on action=query&format=xml&list=logevents&letype=move&ledir=newer&lelimit=max on Meta - https://phabricator.wikimedia.org/T176938#3643089 (Zoranzoki21) >>! In T176938#3643062, @Anomie wrote: > I can't say for cert...
[14:50:25] Operations, DNS, Traffic, Wikimedia-Apache-configuration: Like nan.wikipedia.org, redirect other nan.*.org to the proper zh-min-nan.*.org domains - https://phabricator.wikimedia.org/T173966#3643092 (ema) p: Triage>Normal
[14:50:30] (CR) Andrew Bogott: [C: 2] Remove salt references in openstack mitaka wmfsink [puppet] - https://gerrit.wikimedia.org/r/381202 (owner: Muehlenhoff)
[14:50:54] Operations, ops-ulsfo, Traffic, hardware-requests: Decom cp4005-8,13-16 (8 nodes) - https://phabricator.wikimedia.org/T176366#3643093 (ema) p: Triage>Normal
[14:51:20] !log Stop MySQL on db1035 as server is going to be decommissioned - T176931
[14:51:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:26] T176931: Decommission db1035 - https://phabricator.wikimedia.org/T176931
[14:51:26] (CR) Andrew Bogott: [C: 2] Remove salt references in openstack liberty wmfsink [puppet] - https://gerrit.wikimedia.org/r/381203 (owner: Muehlenhoff)
[14:51:42] Operations, Traffic, Wikimedia-Logstash: Varnish does not vary elasticsearch query by request body - https://phabricator.wikimedia.org/T174960#3643097 (ema) p: Triage>Normal
[14:52:05] Operations, Analytics, Traffic: Invalid "wikimedia" family in unique devices data due to misplaced WMF-Last-Access-Global cookie - https://phabricator.wikimedia.org/T174640#3643098 (ema) p: Triage>Normal
[14:52:29] Operations, Puppet, Cloud-Services, Traffic, Technical-Debt: Convert all of our site.pp/roles to the role/profile paradigm - https://phabricator.wikimedia.org/T159412#3643099 (ema) p: Triage>Normal
[14:52:43] Operations, Puppet, Cloud-Services, Traffic, Technical-Debt: Uniform cluster nomenclature across puppet - https://phabricator.wikimedia.org/T159411#3643100 (ema) p: Triage>Normal
[14:53:16] (PS4) Hashar: contint: docker-ce on labs docker slaves [puppet] - https://gerrit.wikimedia.org/r/379556 (https://phabricator.wikimedia.org/T176267)
[14:53:17] Operations, DNS, Traffic, User-fgiunchedi: Use DNS discovery record for deployment CNAME - https://phabricator.wikimedia.org/T164460#3643101 (ema) p: Triage>Normal
[14:53:35] (PS2) BBlack: grub::bootparam: fixup interface issues [puppet] - https://gerrit.wikimedia.org/r/381227
[14:54:25] (PS2) Andrew Bogott: Remove salt references in openstack liberty wmfsink [puppet] - https://gerrit.wikimedia.org/r/381203 (owner: Muehlenhoff)
[14:54:45] Operations, ops-codfw, DBA: db2044 HW RAID failure - https://phabricator.wikimedia.org/T174764#3643107 (Papaul) After reviewing all the logs , HP decide to send a new mainboard because the controller is on the mainboard. The tech will be onsite to perform the replacement on Monday 2nd between 9am and...
[14:54:52] !log restarting elastic on relforge servers to reinstall prod ltr plugin
[14:54:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:55:02] (PS5) Hashar: contint: docker-ce on labs docker slaves [puppet] - https://gerrit.wikimedia.org/r/379556 (https://phabricator.wikimedia.org/T176267)
[14:56:21] Operations, ops-eqiad, DBA: Decommission db1035 - https://phabricator.wikimedia.org/T176931#3643109 (Marostegui) a: Marostegui>Cmjohnson db1035 is ready to be decommissioned by @Cmjohnson All the configuration references have been removed, it has been set to spare, alerts disabled and MySQL i...
[14:57:10] (CR) Hashar: "Cherry picked on tip of production branch to drop the dependency on ( https://gerrit.wikimedia.org/r/#/c/379729/ ). So this change is sta" [puppet] - https://gerrit.wikimedia.org/r/379556 (https://phabricator.wikimedia.org/T176267) (owner: Hashar)
[14:57:17] (CR) Elukey: "Hey Andrew, as I wrote above (probably my fault) I have stepped into the issue of not remembering about the has in the private repo, havin" [puppet] - https://gerrit.wikimedia.org/r/381221 (https://phabricator.wikimedia.org/T176223) (owner: Elukey)
[14:57:53] (PS2) BBlack: browsersec: bump to 23% 2017-09-28 [puppet] - https://gerrit.wikimedia.org/r/376313 (https://phabricator.wikimedia.org/T163251)
[14:58:21] Operations, ops-codfw, DBA: db2044 HW RAID failure - https://phabricator.wikimedia.org/T174764#3643117 (Marostegui) >>! In T174764#3643107, @Papaul wrote: > After reviewing all the logs , HP decide to send a new mainboard because the controller is on the mainboard. The tech will be onsite to perform...
[14:58:27] PROBLEM - puppet last run on labsdb1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:58:37] PROBLEM - puppet last run on mw1280 is CRITICAL: CRITICAL: Puppet has 41 failures. Last run 2 minutes ago with 41 failures. Failed resources (up to 3 shown): File[/home/milimetric],File[/home/twentyafterfour],File[/home/daniel],File[/home/esanders]
[14:58:39] RECOVERY - puppet last run on einsteinium is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[14:58:39] PROBLEM - puppet last run on etcd1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:58:44] (CR) BBlack: [C: 2] browsersec: bump to 23% 2017-09-28 [puppet] - https://gerrit.wikimedia.org/r/376313 (https://phabricator.wikimedia.org/T163251) (owner: BBlack)
[14:58:47] PROBLEM - puppet last run on mw1280 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [14:58:58] PROBLEM - puppet last run on mw1199 is CRITICAL: CRITICAL: Puppet has 4 failures. Last run 2 minutes ago with 4 failures. Failed resources (up to 3 shown): File[/usr/local/bin/mediawiki-firejail-lilypond],File[/etc/apache2/mods-available/setenvif.conf],File[/usr/lib/nagios/plugins/check-fresh-files-in-dir.py],File[/usr/local/bin/puppet-enabled] [14:59:08] PROBLEM - puppet last run on aqs1009 is CRITICAL: CRITICAL: Puppet has 12 failures. Last run 2 minutes ago with 12 failures. Failed resources (up to 3 shown): File[/etc/cassandra-b/prometheus_jmx_exporter.yaml],File[/etc/cassandra-b/logback-tools.xml],File[/usr/local/bin/phaste],File[/usr/local/lib/nagios/plugins/check_raid] [14:59:17] RECOVERY - puppet last run on db1064 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [14:59:22] 10Operations, 10ops-eqiad, 10fundraising-tech-ops, 10netops: connect second interface for each frack to opposite switch for each eqiad host - https://phabricator.wikimedia.org/T176975#3643120 (10Jgreen) [14:59:38] RECOVERY - puppet last run on etcd1002 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [15:00:17] RECOVERY - puppet last run on labsdb1003 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [15:01:47] RECOVERY - puppet last run on mw1270 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [15:01:49] (03PS6) 10Alexandros Kosiaris: contint: docker-ce on labs docker slaves [puppet] - 10https://gerrit.wikimedia.org/r/379556 (https://phabricator.wikimedia.org/T176267) (owner: 10Hashar) [15:01:54] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] contint: docker-ce on labs docker slaves [puppet] - 10https://gerrit.wikimedia.org/r/379556 (https://phabricator.wikimedia.org/T176267) (owner: 10Hashar) [15:02:18] RECOVERY - puppet last run on conf2002 is OK: OK: Puppet is currently enabled, last run 34 seconds ago 
with 0 failures [15:02:37] (03CR) 10Elukey: "> Why can't we require profile::druid::common from a service class?" [puppet] - 10https://gerrit.wikimedia.org/r/380800 (https://phabricator.wikimedia.org/T176223) (owner: 10Ottomata) [15:04:32] (03PS2) 10Andrew Bogott: Remove salt references in openstack mitaka wmfsink [puppet] - 10https://gerrit.wikimedia.org/r/381202 (owner: 10Muehlenhoff) [15:04:43] 10Operations, 10ops-codfw, 10DBA: db2044 HW RAID failure - https://phabricator.wikimedia.org/T174764#3643143 (10Papaul) @Marostegui yes I will make sure PXE is disable. [15:05:06] 10Operations, 10ops-codfw, 10DBA: db2044 HW RAID failure - https://phabricator.wikimedia.org/T174764#3643144 (10Marostegui) Thank you! [15:06:39] (03CR) 10Andrew Bogott: [C: 032] Remove Salt Hiera settings for WMCS [puppet] - 10https://gerrit.wikimedia.org/r/381197 (owner: 10Muehlenhoff) [15:06:45] (03PS3) 10Andrew Bogott: Remove Salt Hiera settings for WMCS [puppet] - 10https://gerrit.wikimedia.org/r/381197 (owner: 10Muehlenhoff) [15:08:17] (03CR) 10Ottomata: "It feels somehow unclean to do this if we don't have to. We can just add a comment about it in the profile and in the hieradata file too." [puppet] - 10https://gerrit.wikimedia.org/r/381221 (https://phabricator.wikimedia.org/T176223) (owner: 10Elukey) [15:08:37] PROBLEM - puppet last run on contint2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[jenkins-deploy docker membership] [15:09:51] (03CR) 10Elukey: "> It feels somehow unclean to do this if we don't have to. We can" [puppet] - 10https://gerrit.wikimedia.org/r/381221 (https://phabricator.wikimedia.org/T176223) (owner: 10Elukey) [15:11:26] (03CR) 10Ottomata: "> Why can't we require profile::druid::common from a service class?" 
[puppet] - 10https://gerrit.wikimedia.org/r/380800 (https://phabricator.wikimedia.org/T176223) (owner: 10Ottomata) [15:13:12] (03PS4) 10Rush: fullstack: optionally clean up leaked VMs after a point [puppet] - 10https://gerrit.wikimedia.org/r/379388 (https://phabricator.wikimedia.org/T167556) (owner: 10Andrew Bogott) [15:14:08] (03PS1) 10Elukey: hieradata: add sites for kafka_jumbo in ganglia_clusters [puppet] - 10https://gerrit.wikimedia.org/r/381230 (https://phabricator.wikimedia.org/T176223) [15:14:19] (03CR) 10Rush: [C: 031] "neat" [puppet] - 10https://gerrit.wikimedia.org/r/379388 (https://phabricator.wikimedia.org/T167556) (owner: 10Andrew Bogott) [15:16:17] RECOVERY - puppet last run on ores2001 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [15:16:18] RECOVERY - puppet last run on cp2018 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [15:16:33] (03CR) 10Elukey: [C: 032] hieradata: add sites for kafka_jumbo in ganglia_clusters [puppet] - 10https://gerrit.wikimedia.org/r/381230 (https://phabricator.wikimedia.org/T176223) (owner: 10Elukey) [15:18:48] RECOVERY - puppet last run on uranium is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [15:18:58] RECOVERY - puppet last run on analytics1029 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [15:22:48] !log Ran scap pull on mwdebug1001 after experiments with T176844 [15:22:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:22:55] T176844: Look into truthy nt dump performance - https://phabricator.wikimedia.org/T176844 [15:24:07] PROBLEM - puppet last run on contint1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[jenkins-deploy docker membership] [15:25:48] RECOVERY - puppet last run on aqs1009 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [15:26:18] RECOVERY - puppet last run on mw1280 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [15:26:47] RECOVERY - puppet last run on mw1199 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [15:32:30] (03CR) 10Hashar: "> I am not entirely sold on the global hiera variable. I see it get's reused in multiple places but we tend to avoid global hiera variable" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/379729 (owner: 10Hashar) [15:40:10] (03PS5) 10Nuria: Add cron to purge old mediawiki data snapshots [puppet] - 10https://gerrit.wikimedia.org/r/376640 (https://phabricator.wikimedia.org/T162034) [15:40:15] (03PS6) 10Nuria: Add cron to purge old mediawiki data snapshots [puppet] - 10https://gerrit.wikimedia.org/r/376640 (https://phabricator.wikimedia.org/T162034) [15:42:27] 10Operations, 10ops-ulsfo, 10Traffic: cp4024 kernel errors - https://phabricator.wikimedia.org/T174891#3643328 (10RobH) This did fail hardware testing. Service Tag : 3NC7KH2 Error Code : 2000-0251 Validation : 127076 I'll open a dispatch for whatever this error code is. 
[15:45:02] (Draft4) Zoranzoki21: Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (https://phabricator.wikimedia.org/T176979)
[15:46:04] (PS5) Zoranzoki21: Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (https://phabricator.wikimedia.org/T176979)
[15:46:15] (CR) jerkins-bot: [V: -1] Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (https://phabricator.wikimedia.org/T176979) (owner: Zoranzoki21)
[15:46:27] PROBLEM - Nginx local proxy to apache on mw2143 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:47:15] (PS6) Zoranzoki21: Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (https://phabricator.wikimedia.org/T176979)
[15:47:18] RECOVERY - Nginx local proxy to apache on mw2143 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.200 second response time
[15:47:20] (PS1) Marostegui: db-eqiad,db-codfw.php: Remove db1036 [mediawiki-config] - https://gerrit.wikimedia.org/r/381233 (https://phabricator.wikimedia.org/T176311)
[15:47:26] (CR) jerkins-bot: [V: -1] Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (https://phabricator.wikimedia.org/T176979) (owner: Zoranzoki21)
[15:48:53] (Abandoned) Zoranzoki21: Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (https://phabricator.wikimedia.org/T176979) (owner: Zoranzoki21)
[15:49:11] (CR) Elukey: [C: 2] Add cron to purge old mediawiki data snapshots [puppet] - https://gerrit.wikimedia.org/r/376640 (https://phabricator.wikimedia.org/T162034) (owner: Nuria)
[15:50:42] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3643401 (alanajjar)
[15:51:15] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3643413 (alanajjar)
[15:51:50] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3643417 (Marostegui) p:Triage>Normal When would you like to do this rename? In which timezone are you?
[15:52:19] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3643420 (alanajjar) p:Normal>Triage
[15:52:26] (Restored) Zoranzoki21: Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (https://phabricator.wikimedia.org/T176979) (owner: Zoranzoki21)
[15:52:45] (PS5) Andrew Bogott: fullstack: optionally clean up leaked VMs after a point [puppet] - https://gerrit.wikimedia.org/r/379388 (https://phabricator.wikimedia.org/T167556)
[15:52:46] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3643401 (alanajjar) @Marostegui if you available, we can do it now?
[15:53:03] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3643427 (Marostegui) >>! In T176985#3643420, @alanajjar wrote: > @Marostegui if you available, we can do it now? Sure, give me 5 minutes to get ready
[15:53:37] (PS7) Zoranzoki21: Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (https://phabricator.wikimedia.org/T176979)
[15:53:44] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3643429 (alanajjar) @Marostegui okay, when you being ready ping me here :)
[15:53:50] (CR) jerkins-bot: [V: -1] Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (https://phabricator.wikimedia.org/T176979) (owner: Zoranzoki21)
[15:53:52] (PS6) Andrew Bogott: fullstack: optionally clean up leaked VMs after a point [puppet] - https://gerrit.wikimedia.org/r/379388 (https://phabricator.wikimedia.org/T167556)
[15:54:02] Operations, ops-ulsfo, Traffic: cp4024 kernel errors - https://phabricator.wikimedia.org/T174891#3643431 (BBlack) Nice. Dell.com has a page on these here: http://www.dell.com/support/manuals/us/en/19/poweredge-vrtx/servers_tsg/psaepsa-diagnostics-error-codes?guid=guid-9afeed67-a47c-4afd-83d8-04301eb...
[15:54:27] (CR) Andrew Bogott: [C: 2] fullstack: optionally clean up leaked VMs after a point [puppet] - https://gerrit.wikimedia.org/r/379388 (https://phabricator.wikimedia.org/T167556) (owner: Andrew Bogott)
[15:55:56] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3643447 (Marostegui) >>! In T176985#3643427, @Marostegui wrote: >>>! In T176985#3643420, @alanajjar wrote: >> @Marostegui if you available, we can do it now? > >...
[15:56:41] (PS8) Zoranzoki21: Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (https://phabricator.wikimedia.org/T176979)
[15:56:51] (CR) jerkins-bot: [V: -1] Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (https://phabricator.wikimedia.org/T176979) (owner: Zoranzoki21)
[15:58:45] Operations, fundraising-tech-ops, netops: reconfigure networking on frack-eqiad management interfaces - https://phabricator.wikimedia.org/T176972#3643480 (Jgreen)
[15:59:03] (PS9) Zoranzoki21: Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (https://phabricator.wikimedia.org/T176979)
[15:59:14] (CR) jerkins-bot: [V: -1] Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (https://phabricator.wikimedia.org/T176979) (owner: Zoranzoki21)
[16:00:04] godog, moritzm, and _joe_: It is that lovely time of the day again! You are hereby commanded to deploy Puppet SWAT(Max 8 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170928T1600).
[16:00:04] thcipriani: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[16:00:11] Operations, fundraising-tech-ops, netops: reconfigure networking on frack-eqiad management interfaces - https://phabricator.wikimedia.org/T176972#3643004 (Jgreen) I did frdb1003 first: /admin1-> racadm setniccfg -s 10.64.40.199 255.255.255.192 10.64.40.193 Static IP configuration enabled and modifie...
[16:00:14] o/
[16:01:17] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3643494 (alanajjar) @Marostegui [[https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/Rua|the progress]]
[16:01:47] !log Global rename of CodeCat → Rua - T176985
[16:01:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:01:52] T176985: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985
[16:01:59] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3643500 (Marostegui) Thanks!
[16:02:53] puppet swat patch is already cherry-picked on beta (which is the only domain it effects). One line change to scap config file: https://gerrit.wikimedia.org/r/#/c/378750/
[16:03:01] (PS10) Zoranzoki21: Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (https://phabricator.wikimedia.org/T176979)
[16:03:11] (CR) jerkins-bot: [V: -1] Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (https://phabricator.wikimedia.org/T176979) (owner: Zoranzoki21)
[16:05:45] (PS2) Thcipriani: Beta: Scap: canary_dashboard_url to beta logstash [puppet] - https://gerrit.wikimedia.org/r/378750 (https://phabricator.wikimedia.org/T168211)
[16:05:47] (PS11) Zoranzoki21: Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (https://phabricator.wikimedia.org/T176979)
[16:06:00] (CR) jerkins-bot: [V: -1] Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (https://phabricator.wikimedia.org/T176979) (owner: Zoranzoki21)
[16:09:03] !log cp4024 bios update in progress (has been for last 5 minutes), letting it alone for a solid 30 to let it finish
[16:09:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:09:08] (PS1) Arlolra: Update Parsoid rttest linter config [puppet] - https://gerrit.wikimedia.org/r/381238
[16:09:27] of course i log that and it reports back the second i do that its done
[16:09:27] woo.
[16:10:00] Operations, Analytics, hardware-requests, Patch-For-Review: Decommission stat1002.eqiad.wmnet - https://phabricator.wikimedia.org/T173097#3643514 (Nuria)
[16:10:23] Operations, ops-ulsfo, Traffic: cp4024 kernel errors - https://phabricator.wikimedia.org/T174891#3643517 (BBlack) Oh sorry, my comment was redundant to your edit :)
[16:11:02] (PS12) Zoranzoki21: Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (https://phabricator.wikimedia.org/T176979)
[16:11:34] (CR) Zoranzoki21: "Thank you god!" [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (https://phabricator.wikimedia.org/T176979) (owner: Zoranzoki21)
[16:12:04] (PS13) Zoranzoki21: Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381232
[16:12:31] (PS11) Dzahn: Access for Slaporte (Stephen LaPorte) to stat1005 [puppet] - https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: Zoranzoki21)
[16:12:57] (PS1) Dzahn: screen-monitor: whitelist ms-fe1005, rm netmon2001 [puppet] - https://gerrit.wikimedia.org/r/381240 (https://phabricator.wikimedia.org/T165348)
[16:13:09] (CR) Dzahn: [C: 2] Access for Slaporte (Stephen LaPorte) to stat1005 [puppet] - https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: Zoranzoki21)
[16:13:58] (CR) Dzahn: "thanks, now this will need part 2. this technically does NOT give access to stat1005, it just creates the user account, it's right though," [puppet] - https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: Zoranzoki21)
[16:15:12] (CR) Dzahn: [C: 2] screen-monitor: whitelist ms-fe1005, rm netmon2001 [puppet] - https://gerrit.wikimedia.org/r/381240 (https://phabricator.wikimedia.org/T165348) (owner: Dzahn)
[16:15:19] (PS2) Dzahn: screen-monitor: whitelist ms-fe1005, rm netmon2001 [puppet] - https://gerrit.wikimedia.org/r/381240 (https://phabricator.wikimedia.org/T165348)
[16:15:25] (PS3) BBlack: grub::bootparam: fixup interface issues [puppet] - https://gerrit.wikimedia.org/r/381227
[16:16:47] (CR) BBlack: [C: 2] "https://puppet-compiler.wmflabs.org/compiler02/8088/ confirms functional no-op except in the new isolcpus case (where it's a bugfix). If " [puppet] - https://gerrit.wikimedia.org/r/381227 (owner: BBlack)
[16:16:58] (PS4) BBlack: grub::bootparam: fixup interface issues [puppet] - https://gerrit.wikimedia.org/r/381227
[16:17:16] Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to stat1005 for Slaporte - https://phabricator.wikimedia.org/T176518#3628320 (Dzahn) @Zoranzoki21 Thanks! Part 1 is merged. Feel free to add part 2 now that adds him to the right groups. This is just partially solved for now.
[16:17:35] Operations, fundraising-tech-ops, netops: reconfigure networking on frack-eqiad management interfaces - https://phabricator.wikimedia.org/T176972#3643556 (ayounsi)
[16:22:47] (CR) Jcrespo: [C: 1] Beta: Scap: canary_dashboard_url to beta logstash [puppet] - https://gerrit.wikimedia.org/r/378750 (https://phabricator.wikimedia.org/T168211) (owner: Thcipriani)
[16:23:15] (PS3) Elukey: Beta: Scap: canary_dashboard_url to beta logstash [puppet] - https://gerrit.wikimedia.org/r/378750 (https://phabricator.wikimedia.org/T168211) (owner: Thcipriani)
[16:23:25] (PS5) Jcrespo: Beta: Scap: canary_dashboard_url to beta logstash [puppet] - https://gerrit.wikimedia.org/r/378750 (https://phabricator.wikimedia.org/T168211) (owner: Thcipriani)
[16:23:36] ups
[16:23:38] Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to stat1005 for Slaporte - https://phabricator.wikimedia.org/T176518#3643587 (Zoranzoki21) >>! In T176518#3643552, @Dzahn wrote: > @Zoranzoki21 Thanks! Part 1 is merged. Feel free to add part 2 now that adds him to the right groups....
[16:23:54] jynus: I'll let you do it :)
[16:24:03] nono
[16:24:08] :)
[16:24:17] (CR) Jayprakash12345: [C: -1] "There are some error. Please Check Cade again." [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (owner: Zoranzoki21)
[16:24:22] please do, you will have better connection
[16:24:37] thcipriani: see all the opsens wants to merge your changes
[16:24:37] (CR) Jcrespo: [C: 2] Beta: Scap: canary_dashboard_url to beta logstash [puppet] - https://gerrit.wikimedia.org/r/378750 (https://phabricator.wikimedia.org/T168211) (owner: Thcipriani)
[16:25:02] ahahah
[16:25:12] elukey: exciting changes inspire folks :P
[16:25:35] thcipriani, your change will revolutionize the way user interact with wikipedia
[16:25:40] just wait
[16:26:03] !log cp4024 has had ilom and bios firmware updaed per epsa error checking, now re-running epsa utility T174891
[16:26:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:26:12] T174891: cp4024 kernel errors - https://phabricator.wikimedia.org/T174891
[16:26:12] :P
[16:26:22] * elukey feels inspired by Tyler
[16:26:30] jynus: I sure hope it won't :)
[16:27:29] (CR) Zoranzoki21: "What?" [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (owner: Zoranzoki21)
[16:28:07] (CR) Zoranzoki21: "Which error you found?" [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (owner: Zoranzoki21)
[16:28:43] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3643613 (Marostegui) @alanajjar any way to check the progress of the "in progress" status? :-) It's been there for a while now
[16:30:01] (CR) Jcrespo: [C: -1] "unmatched bracket?" [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (owner: Zoranzoki21)
[16:30:57] (CR) Jayprakash12345: [C: -1] "On 9530 line. Add + before viwiktionary" [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (owner: Zoranzoki21)
[16:31:42] (CR) Jayprakash12345: [C: -1] "Remove line 10394 and 11162" [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (owner: Zoranzoki21)
[16:32:39] lol, it tragic peope have comments involving line numbers in the ten thousands
[16:34:11] (PS1) Elukey: profile::druid::*: include profile::druid::common [puppet] - https://gerrit.wikimedia.org/r/381241 (https://phabricator.wikimedia.org/T176223)
[16:35:36] (CR) Jayprakash12345: [C: -1] "And also this patch have some exta ordinary work. Please revise them." [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (owner: Zoranzoki21)
[16:35:50] (PS1) Herron: Add slaporte to group analytics-privatedata-users for stat1005 access [puppet] - https://gerrit.wikimedia.org/r/381243 (https://phabricator.wikimedia.org/T176518)
[16:36:34] (CR) Elukey: [C: 2] "https://puppet-compiler.wmflabs.org/compiler02/8092/druid1001.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/381241 (https://phabricator.wikimedia.org/T176223) (owner: Elukey)
[16:36:45] (PS2) Elukey: profile::druid::*: include profile::druid::common [puppet] - https://gerrit.wikimedia.org/r/381241 (https://phabricator.wikimedia.org/T176223)
[16:37:46] (PS14) Zoranzoki21: Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381232
[16:38:12] (CR) Zoranzoki21: "I will resolve other problems for 10-15 minutes" [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (owner: Zoranzoki21)
[16:40:02] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3643682 (alanajjar) @Marostegui if you see [[https://en.wiktionary.org/w/index.php?title=Special%3ACentralAuth&target=Rua|the Central Auth]] of the new name (Rua) y...
[16:40:54] (CR) Elukey: "> > Why can't we require profile::druid::common from a service class?" [puppet] - https://gerrit.wikimedia.org/r/380800 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[16:41:26] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3643684 (Marostegui) >>! In T176985#3643682, @alanajjar wrote: > @Marostegui if you see [[https://en.wiktionary.org/w/index.php?title=Special%3ACentralAuth&target=R...
[16:47:27] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3643701 (alanajjar) @Marostegui I think there's a problem here! If you see [[https://en.wiktionary.org/w/index.php?limit=50&title=Special%3AContributions&contribs=...
[16:48:15] (CR) Zoranzoki21: ">Remove line 10394 and 11162" [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (owner: Zoranzoki21)
[16:48:38] (CR) Zoranzoki21: "On 9530 line added +" [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (owner: Zoranzoki21)
[16:48:58] !log Created centralauth.{analytics,web}.db.svc.eqiad.wmflabs Designate CNAME (T176978)
[16:49:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:49:04] T176978: Service name aliases in *.{analytics,web}.db.svc.eqiad.wmflabs missing for non-wiki databases - https://phabricator.wikimedia.org/T176978
[16:49:06] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3643712 (Marostegui) >>! In T176985#3643701, @alanajjar wrote: > @Marostegui I think there's a problem here! > > If you see [[https://en.wiktionary.org/w/index.php...
[16:49:30] (CR) Zoranzoki21: [C: 1] Add slaporte to group analytics-privatedata-users for stat1005 access [puppet] - https://gerrit.wikimedia.org/r/381243 (https://phabricator.wikimedia.org/T176518) (owner: Herron)
[16:49:55] Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to stat1005 for Slaporte - https://phabricator.wikimedia.org/T176518#3643715 (Zoranzoki21) >>! In T176518#3643645, @gerritbot wrote: > Change 381243 had a related patch set uploaded (by Herron; owner: Herron): > [operations/puppet@p...
[16:51:31] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3643720 (Marostegui) @MarcoAurelio could this be something related to T173419? This rename got stuck in enwiktionary (where most of the edits are)
[16:51:48] (CR) Jayprakash12345: [C: -1] "Please give up or see this once again it has mismatch bracket" [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (owner: Zoranzoki21)
[16:53:48] (Abandoned) Zoranzoki21: Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381232 (owner: Zoranzoki21)
[16:54:28] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3643730 (alanajjar) >>! In T176985#3643720, @Marostegui wrote: > @MarcoAurelio could this be something related to T173419? > This rename got stuck in enwiktionary (...
[17:00:04] gwicke, cscott, arlolra, subbu, halfak, and Amir1: #bothumor I ♥ Unicode. All rise for Services – Graphoid / Parsoid / OCG / Citoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170928T1700).
[17:00:05] No GERRIT patches in the queue for this window AFAICS.
[17:00:11] (CR) Dzahn: [C: 1] "makes sense, same group that zhousquared has and this was all about doing zhou's work" [puppet] - https://gerrit.wikimedia.org/r/381243 (https://phabricator.wikimedia.org/T176518) (owner: Herron)
[17:00:50] (PS2) Dzahn: admin: Add slaporte to analytics-privatedata-users for stat1005 access [puppet] - https://gerrit.wikimedia.org/r/381243 (https://phabricator.wikimedia.org/T176518) (owner: Herron)
[17:00:51] (CR) Herron: [C: 2] admin: Add slaporte to analytics-privatedata-users for stat1005 access [puppet] - https://gerrit.wikimedia.org/r/381243 (https://phabricator.wikimedia.org/T176518) (owner: Herron)
[17:01:02] (PS3) Herron: admin: Add slaporte to analytics-privatedata-users for stat1005 access [puppet] - https://gerrit.wikimedia.org/r/381243 (https://phabricator.wikimedia.org/T176518)
[17:02:19] (CR) Zoranzoki21: [C: 1] "All is ok.. Hello jenkins-bot, Zoranzoki21," [puppet] - https://gerrit.wikimedia.org/r/381243 (https://phabricator.wikimedia.org/T176518) (owner: Herron)
[17:05:28] !log bios firmware uploaded and awaiting reboot for application on cp402[1235678]
[17:05:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:05:44] (CR) Herron: [C: 2] admin: Add slaporte to analytics-privatedata-users for stat1005 access [puppet] - https://gerrit.wikimedia.org/r/381243 (https://phabricator.wikimedia.org/T176518) (owner: Herron)
[17:08:48] !log cp4021 - depool->reboot for bios update
[17:08:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:09:16] (Abandoned) Dzahn: puppetmaster: drop salt support from wmf-reimage [puppet] - https://gerrit.wikimedia.org/r/381008 (owner: Dzahn)
[17:11:14] (Draft2) Jayprakash12345: Create rollbacker group at viwiktionary. [mediawiki-config] - https://gerrit.wikimedia.org/r/381251 (https://phabricator.wikimedia.org/T176979)
[17:12:29] Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to stat1005 for Slaporte - https://phabricator.wikimedia.org/T176518#3643800 (herron) >! In T176518#3643715, @Zoranzoki21 wrote: > Thank you! No problem :) @Slaporte you should now be able to access stat1005 ``` stat1005:~$ id sl...
[17:13:18] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3643804 (MarcoAurelio) @Marostegui Please run mwscript showJobs.php --wiki=enwiktionary on terbium so we can know if there are queued jobs there.
[17:13:54] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3643805 (Marostegui) >>! In T176985#3643804, @MarcoAurelio wrote: > @Marostegui Please run > > mwscript showJobs.php --wiki=enwiktionary > > on terbium so we ca...
[17:15:28] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3643809 (Marostegui) I saw that this query timed out on enwiktionary - on the logs: ``` SELECT log_timestamp FROM `logging` WHERE log_user_text = 'CodeCat' AND...
[17:15:53] (CR) Brian Wolff: [C: 1] "Security is good with adding 2FA to this group" [mediawiki-config] - https://gerrit.wikimedia.org/r/379985 (https://phabricator.wikimedia.org/T176554) (owner: Ladsgroup)
[17:16:43] marostegui: hi
[17:16:52] marostegui: with --group which is the output?
[17:16:57] let me see
[17:17:15] I editted the message shortly after my bad sorry
[17:17:35] tabbycat: https://phabricator.wikimedia.org/P6057 ?
[17:18:20] okay marostegui so there's no localRenameJobs queued
[17:18:31] there's a maintenance script to use to fix this
[17:18:40] let me check
[17:18:49] thanks :)
[17:19:32] docs says we have to wait at least 3 hours before using it because failed jobs will restart themselves
[17:19:38] ah
[17:19:48] I'm not sure we've to wait here since there's no job at all
[17:20:02] did you check fatalmonitor or exception.log?
[17:20:30] No, i was checking db logs
[17:20:40] to look for db errors for enwiktionary
[17:20:55] !log cp4027 - depool->reboot for bios updates, isolcpus fix
[17:21:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:21:08] in any case: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiktionary --logwiki=metawiki "CodeCat" "Rua" is the magic command to unstuck the rename job
[17:21:17] shall we try it?
[17:21:26] (CR) Zoranzoki21: [C: 1] "I seen what is problem with my patch now.. OK, this is good. CR: +1" [mediawiki-config] - https://gerrit.wikimedia.org/r/381251 (https://phabricator.wikimedia.org/T176979) (owner: Jayprakash12345)
[17:21:50] if legoktm approves
[17:22:01] (due to skipping the 3 hours limit)
[17:22:11] let me update the ticket btw
[17:22:26] hi
[17:22:32] tabbycat: what's wrong?
[17:22:37] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3643848 (MarcoAurelio) (conversation being continued on IRC)
[17:22:46] MatmaRex: just to be clear: are we good to roll forward today with all wikis? The cache busting still needs to happen (cc bblack ) but afaict the underlying issue is resolved, right?
[17:22:53] (CR) Zoranzoki21: [C: 1] "Note: With change on this patch (adding rollbacker group at viwiktionary) is ok, but file InitaliseSettings.php need cleaning.." [mediawiki-config] - https://gerrit.wikimedia.org/r/381251 (https://phabricator.wikimedia.org/T176979) (owner: Jayprakash12345)
[17:22:53] legoktm: stuck global rename, no localrenamejob ever registered for enwiktionary
[17:23:03] +200k edits there (...)
[17:23:21] fatalmonitor and exception.log does not show anything marostegui ?
[17:23:39] and showJobs.php does not show any rename jobs there
[17:23:40] greg-g: yes, tim deployed it
[17:23:43] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3643849 (Marostegui) brief update: @MarcoAurelio and myself have been chatting on IRC ``` ˜/tabbycat 19:16> marostegui: hi ˜/tabbycat 19:16> marostegui: with --grou...
[17:23:47] so I was thinking mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiktionary --logwiki=metawiki "CodeCat" "Rua"
[17:23:53] what's the output of mwscript showJobs.php --wiki=enwiktionary --group ?
[17:24:00] MatmaRex: thanks
[17:24:08] (PS1) BryanDavis: wmcs: wikireplica_dns: add support for non-wikidb CNAMES [puppet] - https://gerrit.wikimedia.org/r/381260 (https://phabricator.wikimedia.org/T176978)
[17:24:14] legoktm: https://phabricator.wikimedia.org/P6057
[17:24:20] that
[17:24:36] that's fishy
[17:24:51] yes, I think it should be safe to resume then
[17:25:17] mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiktionary --logwiki=metawiki "CodeCat" "Rua" then?
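The waiting rule quoted above can be written as a small pre-flight check. The rule is: do not force a fix until at least 3 hours have passed, because failed rename jobs may restart themselves. This is a minimal Python sketch, not part of CentralAuth. `safe_to_force_fix` is a hypothetical helper. It assumes the caller supplies the rename start time and the number of rename jobs still queued, for example read by hand from `showJobs.php --group`.

```python
from datetime import datetime, timedelta, timezone

# Waiting period quoted in the channel: failed rename jobs may restart
# themselves within this window, so a forced fix should wait it out.
MIN_WAIT = timedelta(hours=3)


def safe_to_force_fix(rename_started: datetime, now: datetime,
                      queued_rename_jobs: int) -> bool:
    """Hypothetical pre-flight check before running fixStuckGlobalRename.php.

    Returns True only when no rename job is still queued and the documented
    waiting period has elapsed since the rename started.
    """
    if queued_rename_jobs > 0:
        # A queued job may still run; forcing a fix now could race with it.
        return False
    return now - rename_started >= MIN_WAIT


# Usage with made-up times (not taken from the log):
start = datetime(2017, 1, 1, 12, 0, tzinfo=timezone.utc)
print(safe_to_force_fix(start, start + timedelta(hours=1), 0))  # False
print(safe_to_force_fix(start, start + timedelta(hours=4), 0))  # True
```

In this log the wait was deliberately skipped, pending legoktm's approval, because no rename job existed at all. The sketch encodes only the documented default.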
[17:25:20] legoktm: okay thanks
[17:25:29] on terbium
[17:25:51] !log Run fixStuckGlobalRename.php --wiki=enwiktionary --logwiki=metawiki "CodeCat" "Rua" on terbium - T176985
[17:25:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:25:57] T176985: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985
[17:26:07] and now let's see if the jobs are running :)
[17:26:12] done
[17:26:32] https://phabricator.wikimedia.org/P6057#33473
[17:27:04] I guess it's the enqueue one
[17:27:08] ?
[17:27:43] hopefully :)
[17:28:16] i can see stuff going on on the master
[17:28:18] related to the rename
[17:28:23] we will see if it finishes
[17:28:56] marostegui: https://wikitech.wikimedia.org/wiki/Stuck_global_renames can be useful next time
[17:29:10] feel free to add there :)
[17:29:28] I'm checking on-wiki logs
[17:29:47] (PS1) RobH: labs-announce to cloud-announce migration alias [puppet] - https://gerrit.wikimedia.org/r/381262 (https://phabricator.wikimedia.org/T175191)
[17:29:59] not really sure it is doing anything yet
[17:30:07] can't see anything
[17:30:29] (CR) Zoranzoki21: [C: 1] labs-announce to cloud-announce migration alias [puppet] - https://gerrit.wikimedia.org/r/381262 (https://phabricator.wikimedia.org/T175191) (owner: RobH)
[17:30:30] anything in showJobs ?
[17:30:46] root@terbium:~# mwscript showJobs.php --wiki=enwiktionary
[17:30:46] 8708
[17:30:50] !log arlolra@tin Started deploy [parsoid/deploy@84342fe]: Updating Parsoid to 2f4b9a8c
[17:30:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:31:01] there are still 2 enqueued
[17:31:03] not active
[17:31:11] so maybe it is still working out its magic
[17:31:39] I'll ask people not to continue renaming until this is resolved
[17:32:30] yeah, not sure this is going to work out :(
[17:32:37] we can always leave it for hours there
[17:32:45] let me check if hte query timedout again
[17:32:48] !log cp4022 - depool->reboot for bios updates, isolcpus fix
[17:32:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:34:15] (CR) RobH: [C: 2] labs-announce to cloud-announce migration alias [puppet] - https://gerrit.wikimedia.org/r/381262 (https://phabricator.wikimedia.org/T175191) (owner: RobH)
[17:34:23] no, so far no queries timing out
[17:34:25] on logtash
[17:34:42] (PS1) Cmjohnson: adding dns entries for flerovium [dns] - https://gerrit.wikimedia.org/r/381265
[17:34:55] oh root@terbium, that's major words bro :P
[17:35:08] (PS1) Chad: Group2 to 1.31.0-wmf.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/381266
[17:35:09] xddd
[17:35:10] (CR) Chad: [C: -2] Group2 to 1.31.0-wmf.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/381266 (owner: Chad)
[17:35:28] always check with --group, otherwise we don't know which jobs are running
[17:35:48] yeah, forgot that time
[17:35:53] but still enqueue
[17:35:57] no more details about it
[17:36:11] on deployment-tin I'm wikiadmin@...
[17:36:26] btw
[17:37:00] this was the ouput of the fixStuckGlobalRename.php: https://phabricator.wikimedia.org/P6057#33474
[17:37:00] heh, a rename just happened at en.wiktionary
[17:37:13] haha
[17:37:15] not the one we want!
[17:37:24] maybe that is the one I saw on the master?
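The bare `showJobs.php` run returns only a total (8708 here), which cannot show whether any of those jobs are rename jobs. Breaking the `--group` output down per job type makes that visible. This is a minimal Python sketch. It assumes each `--group` line has the shape `<type>: N queued; N claimed (N active, N abandoned); N delayed`. That format is an assumption about the script's output and should be checked against a real run. Lines that do not match are skipped.

```python
import re
from typing import Dict

# Assumed shape of one `showJobs.php --group` line (verify against real output):
#   <type>: N queued; N claimed (N active, N abandoned); N delayed
LINE_RE = re.compile(
    r"^(?P<type>[^:\s]+): (?P<queued>\d+) queued; "
    r"(?P<claimed>\d+) claimed \((?P<active>\d+) active, "
    r"(?P<abandoned>\d+) abandoned\); (?P<delayed>\d+) delayed$"
)


def parse_show_jobs_group(output: str) -> Dict[str, Dict[str, int]]:
    """Map each job type to its counters; unrecognised lines are ignored."""
    counts: Dict[str, Dict[str, int]] = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        fields = match.groupdict()
        job_type = fields.pop("type")
        counts[job_type] = {name: int(value) for name, value in fields.items()}
    return counts


# Hypothetical sample output, not the paste referenced in the log:
sample = (
    "LocalRenameUserJob: 2 queued; 0 claimed (0 active, 0 abandoned); 0 delayed\n"
    "refreshLinks: 40 queued; 3 claimed (3 active, 0 abandoned); 1 delayed\n"
)
jobs = parse_show_jobs_group(sample)
print(jobs["LocalRenameUserJob"])
# {'queued': 2, 'claimed': 0, 'active': 0, 'abandoned': 0, 'delayed': 0}
```

A rename job type with queued entries but zero active ones matches the "2 enqueued, not active" state reported above.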
[17:37:40] output looks good, if it fails again we can re-run with --ignorestatus to force
[17:38:07] how can we know whether it has failed? :)
[17:38:47] I supose fatalmonitor or exception.log should show
[17:39:17] (CR) Cmjohnson: [C: 2] adding dns entries for flerovium [dns] - https://gerrit.wikimedia.org/r/381265 (owner: Cmjohnson)
[17:40:28] !log installing apache updates on cobalt
[17:40:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:40:44] PROBLEM - HTTPS on netmon1002 is CRITICAL: SSL CRITICAL - Certificate librenms.wikimedia.org valid until 2017-10-01 17:40:00 +0000 (expires in 2 days)
[17:41:05] ^ would expect auto-renewal
[17:42:14] but normally it would do it before alert triggers.. threshold is already different for LE
[17:43:04] tabbycat: nothing on fatalmonitor
[17:43:15] or I don't see it :)
[17:43:28] !log cp4028 - depool->reboot for bios updates, isolcpus fix
[17:43:29] marostegui: yeah, it's definitelly stuck apparently
[17:43:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:43:58] marostegui: try re-running the script with --ignorestatus this time
[17:44:21] tabbycat: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiktionary --logwiki=metawiki "CodeCat" "Rua" --ignorestatus ?
[17:44:41] I think --ignorestatus goes before the names in ""
[17:44:53] yeah makes sense :)
[17:45:12] running
[17:45:49] The ouput says done already, let's see if it is true this time :)
[17:45:53] [[Special:Log/renameuser]] renameuser * Global rename script * Global rename script renamed user [[User:CodeCat]] (280301 edits) to [[User:Rua]]: Per [[:m:Special:Permalink/17275795|:m:SRUC]]
[17:46:01] weird
[17:46:05] well
[17:46:10] it says done
[17:46:11] now
[17:46:15] https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/Rua
[17:46:16] !log arlolra@tin Finished deploy [parsoid/deploy@84342fe]: Updating Parsoid to 2f4b9a8c (duration: 15m 25s)
[17:46:19] so apparently went thru
[17:46:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:46:40] And the DB master looks good, the new user is there
[17:46:51] the edit count listed on https://meta.wikimedia.org/wiki/Special:CentralAuth/Rua for en.wikt looks accurate
[17:47:45] (PS1) Dzahn: netmon1002: re-enable Letsencrypt cert creation [puppet] - https://gerrit.wikimedia.org/r/381267
[17:47:46] legoktm: so --ignorestatus made the rename be done by global rename script, but pages moves are being done by the global renamer; I think it's working fine now?
[17:48:04] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3643932 (Marostegui) After more chatting and checks @marcoaurelio suggested: ``` mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwikti...
[17:48:31] looks suspicious that it just took seconds to rename 200k edits
[17:48:52] tabbycat: maybe it already renamed some on the first iteration?
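Building the command as an argument list keeps the positional user names last, as suggested above, instead of relying on hand-typed ordering. This is a minimal Python sketch. The script path and the `--wiki`, `--logwiki` and `--ignorestatus` options come from this log. The helper name is hypothetical. The options-before-names ordering follows the channel's suggestion and has not been checked against the maintenance script's argument parser. The sketch only builds and prints the command; it runs nothing.

```python
import shlex
from typing import List

SCRIPT = "extensions/CentralAuth/maintenance/fixStuckGlobalRename.php"


def fix_stuck_rename_cmd(old_name: str, new_name: str, wiki: str,
                         logwiki: str = "metawiki",
                         ignore_status: bool = False) -> List[str]:
    """Build (but do not execute) a fixStuckGlobalRename.php invocation."""
    argv = ["mwscript", SCRIPT, f"--wiki={wiki}", f"--logwiki={logwiki}"]
    if ignore_status:
        # Options go before the positional user names.
        argv.append("--ignorestatus")
    argv += [old_name, new_name]
    return argv


print(shlex.join(fix_stuck_rename_cmd("CodeCat", "Rua", "enwiktionary",
                                      ignore_status=True)))
# mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiktionary --logwiki=metawiki --ignorestatus CodeCat Rua
```

`shlex.join` needs Python 3.8 or later. Passing the list to `subprocess.run` instead would avoid shell quoting altogether.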
[17:49:14] f-ck --> https://meta.wikimedia.org/wiki/Special:CentralAuth/Rua
[17:49:25] legoktm: all accounts are being left unnatached :|
[17:49:32] (PS2) Dzahn: netmon1002: re-enable Letsencrypt cert creation [puppet] - https://gerrit.wikimedia.org/r/381267 (https://phabricator.wikimedia.org/T159756)
[17:49:48] (PS3) Dzahn: netmon1002: re-enable Letsencrypt cert creation [puppet] - https://gerrit.wikimedia.org/r/381267 (https://phabricator.wikimedia.org/T159756)
[17:50:57] tabbycat: it has definitely done stuff because I am seeing lag and the UPDATE query on s2 (enwikationary)
[17:51:16] so it's renaming right?
[17:51:16] (CR) Dzahn: [C: 2] netmon1002: re-enable Letsencrypt cert creation [puppet] - https://gerrit.wikimedia.org/r/381267 (https://phabricator.wikimedia.org/T159756) (owner: Dzahn)
[17:51:29] yes, i can see the update query
[17:51:31] hope that the accounts re-attach
[17:51:50] tabbycat: but the ones it already marks as done, are still not attached
[17:52:10] yeah, and I never saw such a thing
[17:52:11] basically all the ones being done after enwiktionary
[17:52:22] and that's... serious
[17:52:42] we can use uh, migrateAccount.php properly to reattach
[17:52:45] (PS1) RobH: labs-l to cloud alias [puppet] - https://gerrit.wikimedia.org/r/381268 (https://phabricator.wikimedia.org/T175190)
[17:52:48] !log librenms - renewed LE TLS cert ..via puppet, after Icinga alerted and accidentally still had "do_acme: false" in Hiera
[17:52:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:53:05] but something seems wrong
[17:53:18] RECOVERY - HTTPS on netmon1002 is OK: SSL OK - Certificate librenms.wikimedia.org valid until 2017-12-27 16:52:16 +0000 (expires in 89 days)
[17:53:33] maybe the --ignorestatus thing messed things up :(
[17:53:43] (CR) RobH: [C: 2] labs-l to cloud alias [puppet] - https://gerrit.wikimedia.org/r/381268 (https://phabricator.wikimedia.org/T175190) (owner: RobH)
[17:54:16] how do you guys suggest we re-attach them then?
[17:54:56] maybe wait until the rename is fully done
[17:55:06] but this is an unknown scenario for me
[17:55:16] seems unlikely ignorestatus did anything
[17:55:31] once the rename finishes we can run migrateAccount.php to re-attach
[17:55:33] jouncebot: next
[17:55:33] In 0 hour(s) and 4 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170928T1800)
[17:55:34] let's wait then until it is fully done it should be fast
[17:55:52] legoktm: you've got the full command handy or will you run it yourself?
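The "expires in N days" figures in the two netmon1002 alerts are consistent with rounding the remaining validity down to whole days. The minimal Python sketch below reproduces both numbers. It assumes the IRC timestamps are UTC on 2017-09-28, the date the deployment-calendar links in this log point to. How the monitoring check itself rounds is not shown here, and the helper names are hypothetical.

```python
from datetime import datetime, timezone


def parse_valid_until(text: str) -> datetime:
    """Parse the 'valid until' stamp as printed in the alerts."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S %z")


def days_until_expiry(not_after: datetime, now: datetime) -> int:
    """Whole days of validity left, rounded down (timedelta.days floors)."""
    return (not_after - now).days


# PROBLEM alert at 17:40:44 and RECOVERY at 17:53:18 (assumed UTC, 2017-09-28).
problem_at = datetime(2017, 9, 28, 17, 40, 44, tzinfo=timezone.utc)
recovery_at = datetime(2017, 9, 28, 17, 53, 18, tzinfo=timezone.utc)

print(days_until_expiry(parse_valid_until("2017-10-01 17:40:00 +0000"), problem_at))   # 2
print(days_until_expiry(parse_valid_until("2017-12-27 16:52:16 +0000"), recovery_at))  # 89
```

The log attributes the missed automatic renewal to `do_acme: false` still being set in Hiera, not to the alert threshold.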
[17:56:16] I don't have my ssh keys with me, let me figure out the command
[17:56:22] thanks :)
[17:56:58] https://phabricator.wikimedia.org/diffusion/ECAU/browse/master/maintenance/migrateAccount.php
[17:57:06] there's also an attachAccount.php
[17:57:59] !log Updated Parsoid to 2f4b9a8c (T176363, T176151, T170832)
[17:58:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:58:06] T176151: Parsed error in linter: Cannot read property '2' of undefined - https://phabricator.wikimedia.org/T176151
[17:58:06] T176363: New high-priority Linter category for a subset of misnested tags whose behavior will change with a HTML5 parser - https://phabricator.wikimedia.org/T176363
[17:58:06] T170832: Linting reports errors detected in extensions as multiPartTemplateBlock template issues - https://phabricator.wikimedia.org/T170832
[17:58:14] oh
[17:58:17] that looks even better
[17:59:14] jouncebot: refresh
[17:59:17] I refreshed my knowledge about deployments.
[17:59:19] attachAccount seems to need some sort of a file
[17:59:22] $this->addOption( 'userlist',
[17:59:22] 'List of usernames to attach, one per line', true, true );
[17:59:22] $this->addOption( 'dry-run', 'Do not update database' );
[17:59:28] marostegui: echo "Rua" >> ~/tmp.txt; mwscript extensions/CentralAuth/maintenance/attachAccount.php --wiki=metawiki ~/tmp.txt
[17:59:35] legoktm: <3
[17:59:41] let's wait until the rename is fully done
[18:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor My software never has bugs. It just develops random features. Rise for Morning SWAT (Max 8 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170928T1800).
[18:00:04] Jayprakash12345, raynor, and mutante: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[18:00:23] I'm here
[18:00:25] marostegui: I suggest an incident report so if there's anything broken with global renames it be addressed, unless others think otherwise?
[18:01:08] tabbycat: not sure if this is an incident report, wasn't there an stuck globalrenames task?
[18:01:15] we can probably update it once we have fixed this
[18:01:23] good idea
[18:02:26] (PS3) Dzahn: copy squid.php->reverse-proxy.php, squid-labs->reverse-proxy-staging [mediawiki-config] - https://gerrit.wikimedia.org/r/376337 (https://phabricator.wikimedia.org/T104148)
[18:05:55] (CR) Dzahn: "added to morning swat just now" [mediawiki-config] - https://gerrit.wikimedia.org/r/376337 (https://phabricator.wikimedia.org/T104148) (owner: Dzahn)
[18:08:36] what's up with SWAT?
[18:08:55] apparently no one around?
[18:09:10] (swatters I mean)
[18:09:21] I can babysit the first patch
[18:09:55] (CR) MarcoAurelio: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/381251 (https://phabricator.wikimedia.org/T176979) (owner: Jayprakash12345)
[18:13:33] tabbycat: not long left to finish the rename :)
[18:13:40] (CR) jerkins-bot: [V: -1] Create rollbacker group at viwiktionary. [mediawiki-config] - https://gerrit.wikimedia.org/r/381251 (https://phabricator.wikimedia.org/T176979) (owner: Jayprakash12345)
[18:13:43] I'd swat if my internet wasn't so darn slow. Can't wait for the office to reopen.
[18:13:46] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3644046 (MarcoAurelio) However renames happening after en.wiktionary are leaving unnatached local accounts. @Legoktm suggest running after the rename has finished:...
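The `addOption( 'userlist', ..., true, true )` call pasted above marks `--userlist` as a required option that takes a value. The file path therefore has to be passed through that option rather than as a bare argument, as the channel notes later. This is a minimal Python sketch that writes the one-name-per-line file and builds the command. It defaults to `--dry-run`, which is how the script was first run here. The helper names and the `userlist.txt` file name are hypothetical, and nothing is executed.

```python
import shlex
import tempfile
from pathlib import Path
from typing import Iterable, List

SCRIPT = "extensions/CentralAuth/maintenance/attachAccount.php"


def write_userlist(usernames: Iterable[str], directory: Path) -> Path:
    """Write one username per line, as the option's help text asks."""
    path = directory / "userlist.txt"
    path.write_text("".join(f"{name}\n" for name in usernames), encoding="utf-8")
    return path


def attach_account_cmd(userlist: Path, wiki: str = "metawiki",
                       dry_run: bool = True) -> List[str]:
    """Build (but do not execute) an attachAccount.php invocation."""
    argv = ["mwscript", SCRIPT, f"--wiki={wiki}", "--userlist", str(userlist)]
    if dry_run:
        argv.append("--dry-run")  # 'Do not update database'
    return argv


with tempfile.TemporaryDirectory() as tmp:
    userlist = write_userlist(["Rua"], Path(tmp))
    print(shlex.join(attach_account_cmd(userlist)))
```

Writing a fresh file also avoids a side effect of the `>>` in the command above. `>>` appends to any existing `~/tmp.txt`, so stale names from an earlier run would be attached too.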
[18:14:01] I can SWAT
[18:15:07] thcipriani: I'm voluntary babbysitting first patch scheduled
[18:15:10] thcipriani: I'm around
[18:15:14] tabbycat: looks like the first patch needs a ' ' after //
[18:15:16] but it has failed full jenkins test
[18:15:30] yep, I was going to ask if you could give me a minute to fix it, please?
[18:15:35] yep, np
[18:15:39] thanks
[18:15:55] raynor: looks like jdlrobson abandoned that patch?
[18:16:13] hmm, sec
[18:16:24] we had two patches, first we cherry-picked to wrong branch, one sec
[18:16:48] (PS3) MarcoAurelio: Create rollbacker group at viwiktionary. [mediawiki-config] - https://gerrit.wikimedia.org/r/381251 (https://phabricator.wikimedia.org/T176979) (owner: Jayprakash12345)
[18:16:55] ah, damn, sorry, wrong gerrit id. fixing it now
[18:16:56] (PS1) Legoktm: admin: Remove legoktm's current ssh keys [puppet] - https://gerrit.wikimedia.org/r/381272
[18:16:58] (PS1) Legoktm: admin: Add legoktm's new ssh key≈ [puppet] - https://gerrit.wikimedia.org/r/381273
[18:17:00] (CR) BryanDavis: [C: 1] "Tested from my homedir on labcontrol1001." [puppet] - https://gerrit.wikimedia.org/r/381260 (https://phabricator.wikimedia.org/T176978) (owner: BryanDavis)
[18:17:01] Wrong one
[18:17:07] ah, np :)
[18:17:24] (PS4) MarcoAurelio: Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381251 (https://phabricator.wikimedia.org/T176979) (owner: Jayprakash12345)
[18:17:26] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/376337 (https://phabricator.wikimedia.org/T104148) (owner: Dzahn)
[18:17:29] (PS2) Legoktm: admin: Add legoktm's new ssh key [puppet] - https://gerrit.wikimedia.org/r/381273
[18:17:41] (PS1) Krinkle: errorpage: Migrate from back-compat wmf.png to wmf-logo.png [puppet] - https://gerrit.wikimedia.org/r/381274
[18:17:43] (PS1) Krinkle: errorpage: Set explicit height on logo image [puppet] - https://gerrit.wikimedia.org/r/381275
[18:18:05] thcipriani: should be fixed, waiting for jenkins
[18:18:08] (PS2) Krinkle: errorpage: Migrate from back-compat wmf.png to wmf-logo.png [puppet] - https://gerrit.wikimedia.org/r/381274
[18:18:17] (PS2) Krinkle: errorpage: Set explicit height on logo image [puppet] - https://gerrit.wikimedia.org/r/381275
[18:18:22] thcipriani: done, I commented recheck because of unknown reason previous CI check failed
[18:18:31] * thcipriani checks
[18:19:04] (Merged) jenkins-bot: copy squid.php->reverse-proxy.php, squid-labs->reverse-proxy-staging [mediawiki-config] - https://gerrit.wikimedia.org/r/376337 (https://phabricator.wikimedia.org/T104148) (owner: Dzahn)
[18:19:09] marostegui: 44 wikis left :)
[18:19:15] (CR) jenkins-bot: copy squid.php->reverse-proxy.php, squid-labs->reverse-proxy-staging [mediawiki-config] - https://gerrit.wikimedia.org/r/376337 (https://phabricator.wikimedia.org/T104148) (owner: Dzahn)
[18:19:23] thanks! i'm here, but this is just adding new files that i copied from existing files
[18:19:27] so there is no test i think
[18:19:42] tabbycat: almost there! :)
[18:19:43] besides "the site is still up"
[18:19:59] (CR) Krinkle: "This is still cherry-picked on Beta, but abandoned here?" [puppet] - https://gerrit.wikimedia.org/r/312504 (https://phabricator.wikimedia.org/T146469) (owner: Hashar)
[18:20:17] mutante: gotcha, I figured, I'll go ahead and just sync that change.
[18:20:21] (CR) Krinkle: "Task is closed, I've removed the cherry-pick." [puppet] - https://gerrit.wikimedia.org/r/312504 (https://phabricator.wikimedia.org/T146469) (owner: Hashar)
[18:20:45] thcipriani: :)
[18:21:11] Yeah, the next patch is to switch production from using the old files to the new ones, but that can happen later.
[18:21:36] Especially, not during a big meeting. :-)
[18:23:13] (CR) RobH: [C: 2] admin: Remove legoktm's current ssh keys [puppet] - https://gerrit.wikimedia.org/r/381272 (owner: Legoktm)
[18:23:31] James_F: definitely not today, yes :)
[18:23:34] (CR) Krinkle: "This is also applied on Beta (although it was missing a Change-Id, appeared like a local commit). I've replaced it with a proper cherry-pi" [puppet] - https://gerrit.wikimedia.org/r/316512 (owner: Alex Monk)
[18:25:18] (PS1) Krinkle: puppetmaster: hacks to fix puppet logstash [puppet] - https://gerrit.wikimedia.org/r/381279
[18:25:22] !log thcipriani@tin Synchronized wmf-config: SWAT: [[gerrit:376337|copy squid.php->reverse-proxy.php, squid-labs->reverse-proxy-staging]] T104148 (duration: 00m 51s)
[18:25:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:25:28] T104148: Change Squid references in Wikimedia configuration files - https://phabricator.wikimedia.org/T104148
[18:25:45] (PS2) Rush: wmcs: wikireplica_dns: add support for non-wikidb CNAMES [puppet] - https://gerrit.wikimedia.org/r/381260 (https://phabricator.wikimedia.org/T176978) (owner: BryanDavis)
[18:26:04] (Abandoned) Krinkle: puppetmaster: hacks to fix puppet logstash [puppet] - https://gerrit.wikimedia.org/r/381279 (owner: Krinkle)
[18:26:21] (CR) Rush: [C: 2] wmcs: wikireplica_dns: add support for non-wikidb CNAMES [puppet] - https://gerrit.wikimedia.org/r/381260 (https://phabricator.wikimedia.org/T176978) (owner: BryanDavis)
[18:26:22] marostegui: 5 y final :)
[18:26:33] 3
[18:26:44] 1
[18:26:52] !log thcipriani@tin Synchronized docroot/noc: SWAT: [[gerrit:376337|copy squid.php->reverse-proxy.php, squid-labs->reverse-proxy-staging]] PART II T104148 (duration: 00m 49s)
[18:26:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:27:00] ^ mutante is everywhere
[18:27:11] finished and accounts unattached, tabbycat legoktm I will run the command suggested to attach the account
[18:27:53] ok, sounds good to me :)
[18:27:57] marostegui: okay
[18:27:59] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 232, down: 1, dormant: 0, excluded: 0, unused: 0
[18:28:23] thcipriani: thanks :)
[18:28:26] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/381251 (https://phabricator.wikimedia.org/T176979) (owner: Jayprakash12345)
[18:28:29] cause it'd be unlikely that the system tried to attach them now right?
[18:28:51] I am going to run a dry-run first
[18:28:59] +1
[18:29:21] Operations, fundraising-tech-ops, netops: reconfigure networking on frack-eqiad management interfaces - https://phabricator.wikimedia.org/T176972#3644090 (Jgreen) Here's what I ended up doing for the HP boxes: 1) install hponcfg from HP deb repo http://downloads.linux.hpe.com/SDR/repo/mcp/ 2) hponcf...
[18:30:34] Looks like we need to use --userlist
[18:31:03] (Merged) jenkins-bot: Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381251 (https://phabricator.wikimedia.org/T176979) (owner: Jayprakash12345)
[18:31:17] (CR) jenkins-bot: Create rollbacker group at viwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/381251 (https://phabricator.wikimedia.org/T176979) (owner: Jayprakash12345)
[18:32:16] tabbycat: rollbacker for viwiktionary is on mwdebug1002, check please (thanks for babysitting this patch, by the way :))
[18:32:27] thcipriani: checking, np
[18:33:10] thcipriani: https://gerrit.wikimedia.org/r/#/c/381259/ - main test build succeeded
[18:33:25] raynor: nice, just +2'd
[18:33:37] thcipriani: change looks good on mwdebug
[18:33:48] tabbycat: ok, going live
[18:33:56] chachi :)
[18:33:58] tabbycat legoktm https://phabricator.wikimedia.org/P6057#33475
[18:33:59] let me know when to test it, I have one failing link on italian wikipedia
[18:34:05] it doesn't look too good to me that dry run
[18:34:12] "attached: 0"
[18:34:39] marostegui: https://meta.wikimedia.org/wiki/Special:CentralAuth/Rua
[18:34:52] tabbycat: it was a dry-run
[18:34:57] aaah
[18:35:02] maybe that's why?
[18:35:04] could be
[18:35:07] let's see
[18:35:24] done without dry-run
[18:35:27] and still the same :(
[18:35:34] crappy script
[18:35:36] xD
[18:35:51] if ( count( $unattached ) === 0 ) {
[18:35:51] $this->ok++;
[18:35:51] if ( !$this->quiet ) {
[18:35:51] $this->output( "OK: {$username}\n" );
[18:35:51] }
[18:35:53] return;
[18:35:55] }
[18:36:10] so $central->listUnattached() is returning 0
[18:36:16] thing is that local logs display the accounts being "created automatically"
[18:36:29] but on centralauth the dates of attachment are displayed as of "today"
[18:36:41] I have to leave for ~30/40'
[18:36:50] bbl
[18:37:04] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:381251|Create rollbacker group at viwiktionary]] T176979 (duration: 00m 50s)
[18:37:05] legoktm maybe we should try your original idea of the mgiration script?
[18:37:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:37:09] T176979: Create rollbacker group at viwiktionary - https://phabricator.wikimedia.org/T176979
[18:37:25] no, I don't think that will work if this one didn't
[18:37:29] ah :(
[18:38:08] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3644119 (Marostegui) The command to attach the accounts didn't fix the issue: ``` root@terbium:~# cat /tmp/rua.txt Rua root@terbium:~# mwscript extensions/CentralA...
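The pasted fragment of attachAccount.php explains why both the dry run and the real run reported nothing attached. When `listUnattached()` returns an empty list, the script counts the user as OK and returns before any attach step. The Python re-sketch below covers only that control flow and is for illustration. `AttachStats` and `attach_user` are hypothetical stand-ins for the PHP class state. The branch that actually attaches accounts is not shown in the log, so here it is only counted.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AttachStats:
    ok: int = 0
    would_attach: int = 0
    output: List[str] = field(default_factory=list)


def attach_user(username: str, list_unattached: Callable[[str], List[str]],
                stats: AttachStats, quiet: bool = False) -> None:
    """Mirror of the pasted early-return logic (illustrative, not CentralAuth code)."""
    unattached = list_unattached(username)
    if len(unattached) == 0:
        # Same as the PHP: count as OK, optionally print, and stop here.
        stats.ok += 1
        if not quiet:
            stats.output.append(f"OK: {username}")
        return
    # The real attach step is not in the log; this sketch only counts wikis.
    stats.would_attach += len(unattached)


stats = AttachStats()
attach_user("Rua", lambda name: [], stats)  # central record lists nothing unattached
print(stats)  # AttachStats(ok=1, would_attach=0, output=['OK: Rua'])
```

This fits legoktm's later finding in the log: the accounts themselves were fine, and only `lu_attached_method` was NULL. The script therefore had nothing it considered unattached.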
[18:38:33] legoktm: then I am out of ideas :(
[18:42:16] !log cp4023 - depool->reboot for bios updates, isolcpus fix
[18:42:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:42:22] raynor: ok, change is live on mwdebug1002, check please
[18:43:22] ok
[18:44:07] hmm, no difference
[18:44:19] I'm still getting http_503
[18:44:48] * thcipriani ensures the code made it
[18:45:10] mwdebug1002 [Wc1CpgpAAC4AADPz7PQAAAAO] /wiki/Discussioni_utente:FrancescoMissarino ErrorException from line 749 of /srv/mediawiki/php-1.30.0-wmf.19/skins/MinervaNeue/includes/skins/SkinMinerva.php: PHP Error: Argument 1 passed to SkinMinerva::getRevisionEditor() must be an instance of Revision, null given
[18:45:23] this error should go away
[18:46:12] raynor: hrm, code is definitely up-to-date on mwdebug1002 :\
[18:46:28] Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to stat1005 for Slaporte - https://phabricator.wikimedia.org/T176518#3644178 (Slaporte) Open>Resolved Thanks, working great!
[18:46:43] Operations, fundraising-tech-ops, netops: reconfigure networking on frack-eqiad management interfaces - https://phabricator.wikimedia.org/T176972#3644180 (Jgreen) Open>Resolved This is done, passwords are updated too.
[18:47:01] raynor: oh! you're testing on a wiki that's on wmf.19
[18:47:02] thcipriani: give me 2 mins, I'll check the code once again
[18:47:16] but the skin change was for 1.31.wmf.1
[18:47:29] https://tools.wmflabs.org/versions/
[18:47:34] yup, the italian wiki is broken
[18:48:07] we should have similar error for english one, let me find the url
[18:48:09] hrm well wmf.1 should hit the wikipedias later today, it's on all wikis except wikipedias right now
[18:48:17] and also - what should I do to fix the italian?
[18:48:23] (CR) MarkTraceur: [C: 1] "This would be nice to push through before we go to Commons, where the chance of having a big queue is much greater." [puppet] - https://gerrit.wikimedia.org/r/381045 (https://phabricator.wikimedia.org/T166699) (owner: Gilles)
[18:48:28] (PS1) BBlack: Software decom for cp4005-8,13-16 [puppet] - https://gerrit.wikimedia.org/r/381285 (https://phabricator.wikimedia.org/T176366)
[18:48:33] the only two wikipedias that wmf.1 are on are cawiki and hewiki
[18:48:57] I saw those errors only on enwiki and itwiki
[18:49:38] thcipriani: we also have a patch for wmf.19
[18:49:44] (the first one which was broken) - https://gerrit.wikimedia.org/r/#/c/381258/
[18:49:56] (CR) BBlack: [C: 2] Software decom for cp4005-8,13-16 [puppet] - https://gerrit.wikimedia.org/r/381285 (https://phabricator.wikimedia.org/T176366) (owner: BBlack)
[18:50:03] Can I restore it and add it to current SWAT?
[18:50:09] raynor: sure
[18:50:43] raynor: up to you if you want to do the older branch
[18:50:51] raynor: is the same change for wmf.1 change fine? Can it be sync'd out?
[18:51:04] raynor: it will only be up for a few more hours
[18:51:05] yes, it can, but I'm not able to test it right now
[18:51:09] FWIW, the train should make wmf.19 obsolete in 10ish minutes
[18:51:29] raynor: i thought you can point itwiki at the old branch.. but maybe i was wrong
[18:51:30] * thcipriani hasn't checked the blockers though
[18:51:34] legoktm: any ideas on what to do next?
[18:51:37] ok, so lets close this task and I'll test it later, change is pretty simple, will not cause problems
[18:51:57] ok, I'll sync that out, post-train you should be able to to test on itwiki
[18:52:10] awesome, lets do that
[18:53:14] marostegui: not sure. once I get shell access back (https://gerrit.wikimedia.org/r/#/c/381273/) I can probably poke around the database a bit
[18:53:37] legoktm: you want me merge that?
[18:53:42] sure :)
[18:53:54] (PS3) Marostegui: admin: Add legoktm's new ssh key [puppet] - https://gerrit.wikimedia.org/r/381273 (owner: Legoktm)
[18:54:44] !log thcipriani@tin Synchronized php-1.31.0-wmf.1/skins/MinervaNeue/includes/skins/SkinMinerva.php: SWAT: [[gerrit:381259|Revision::newFromTitle may return null]] T176882 (duration: 00m 50s)
[18:54:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:54:52] T176882: null passed to getRevisionEditor() on itwiki - https://phabricator.wikimedia.org/T176882
[18:55:00] ^ raynor jdlrobson live on wmf.1 now
[18:55:17] er for rather
[18:55:34] (CR) Marostegui: [C: 2] admin: Add legoktm's new ssh key [puppet] - https://gerrit.wikimedia.org/r/381273 (owner: Legoktm)
[18:55:46] thcipriani: thx
[18:56:06] sure, yw :)
[18:56:08] legoktm: done! :)
[19:00:05] no_justification: #bothumor I � Unicode. All rise for MediaWiki train deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170928T1900).
[19:00:05] No GERRIT patches in the queue for this window AFAICS.
[19:00:46] !log cp4025 - depool->reboot for bios updates, isolcpus fix
[19:00:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:06:57] marostegui: I see the script still didn't worked right?
[19:07:45] marostegui: has puppet run on terbium? it's still prompting me for a password
[19:08:50] https://phabricator.wikimedia.org/diffusion/ECAU/browse/master/maintenance/migrateAccount.php <-- maybe we should try this one
[19:10:53] thcipriani: swat all done?
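The itwiki error above is "Argument 1 passed to SkinMinerva::getRevisionEditor() must be an instance of Revision, null given". It arises when a lookup that can return nothing is passed straight to a function that requires a revision. The synced fix is titled "Revision::newFromTitle may return null". The Minerva patch itself is not shown in this log, so the code below is only a Python analogue of the null-guard pattern, with hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Revision:
    user_text: str


# Hypothetical stand-in for a title -> latest revision lookup.
REVISIONS: Dict[str, Revision] = {"Main_Page": Revision(user_text="Example")}


def revision_from_title(title: str) -> Optional[Revision]:
    """Like the lookup named in the fix, this may return None."""
    return REVISIONS.get(title)


def get_revision_editor(revision: Optional[Revision]) -> Optional[str]:
    """Accept a missing revision instead of failing on it."""
    if revision is None:
        return None
    return revision.user_text


print(get_revision_editor(revision_from_title("Main_Page")))         # Example
print(get_revision_editor(revision_from_title("Nonexistent_talk")))  # None
```

The exchange also involved version skew. The fix went to 1.31.0-wmf.1, while itwiki was still serving 1.30.0-wmf.19, so the change could only show up there once the train moved the wikipedias.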
[19:11:00] no_justification: yeap
[19:11:09] all yours :)
[19:12:29] PROBLEM - Apache HTTP on mw2150 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:13:19] RECOVERY - Apache HTTP on mw2150 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.121 second response time
[19:13:53] Operations, MediaWiki-API, Wikimedia-General-or-Unknown, Patch-For-Review: HTTP 500 on action=query&format=xml&list=logevents&letype=move&ledir=newer&lelimit=max on Meta - https://phabricator.wikimedia.org/T176938#3644329 (Base) I just opened it to check and got Request from *removed* via cp3031...
[19:14:04] !log cp4024 initial tests done (takes about 2 hours or so) then prompted for more in depth testing (hit yes, no failures so far) T174891
[19:14:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:14:09] T174891: cp4024 kernel errors - https://phabricator.wikimedia.org/T174891
[19:16:56] Operations, MediaWiki-API, Wikimedia-General-or-Unknown, Patch-For-Review: HTTP 500 on action=query&format=xml&list=logevents&letype=move&ledir=newer&lelimit=max on Meta - https://phabricator.wikimedia.org/T176938#3644351 (Base) Resolved>Open Oh. Works anonymously. I forgot to mention, my...
[19:17:35] legoktm: just ran it there
[19:17:44] actually still running :)
[19:17:57] (CR) Chad: [C: 2] Group2 to 1.31.0-wmf.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/381266 (owner: Chad)
[19:18:13] tabbycat: I asked legoktm but he said he is unsure whether it would work if the other script didn't
[19:18:25] legoktm: your key just got added, you should be able to login
[19:19:11] marostegui: well, we'll always have Paris eval.php
[19:19:26] hmm, it still wants a password for me.
[19:19:29] Operations, ops-eqiad, fundraising-tech-ops, netops: connect second interface for each frack to opposite switch for each eqiad host - https://phabricator.wikimedia.org/T176975#3644371 (Jgreen) a:Jgreen>None
[19:19:52] legoktm: I can see it there :|
[19:20:09] I can get into bast1001 properly
[19:20:16] * legoktm checks proxycommand config
[19:20:19] haha
[19:20:24] tabbycat: XDDDDD
[19:20:26] you are trying to get to terbium you say? there is no failed login there
[19:20:34] so seems more like proxycommand, ack
[19:21:14] Host *.wikimedia.org *.wmnet !gerrit.wikimedia.org !git-ssh.wikimedia.org
[19:21:14] User your_username_here
[19:21:26] (Merged) jenkins-bot: Group2 to 1.31.0-wmf.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/381266 (owner: Chad)
[19:21:33] I probably should have read more carefully what I copy and pasted -.-
[19:21:36] (CR) jenkins-bot: Group2 to 1.31.0-wmf.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/381266 (owner: Chad)
[19:21:49] legoktm: or get your username changed to that :)
[19:21:51] marostegui: I'm in now
[19:21:52] \o/
[19:22:00] \o/
[19:23:18] the good news is that nothing is wrong with the account, just how it is displayed on Special:CentralAuth :)
[19:23:26] \o/
[19:23:28] oh?
[19:23:41] for whatever reason lu_attached_method is set to NULL and not login or new or whatever
[19:23:55] legoktm: you mind if i quote you on the ticket? so everyone can have an update
[19:23:59] yeah
[19:24:05] I'm just going to fix this with an SQL query
[19:24:07] thanks :)
[19:24:13] sql powah
[19:24:38] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3644406 (Marostegui) ``` ˜/legoktm 21:23> the good news is that nothing is wrong with the account, just how it is displayed on Special:CentralAuth :) ˜/legoktm 21:2...
[19:24:55] UPDATE localuser SET lu_attached_method="new" WHERE lu_attached_method=NULL AND lu_name="Rua";
[19:25:08] marostegui: ^ does that look ok?
[19:25:16] legoktm: now that you're doing sql stuff you could handle T176798 ?
[19:25:17] T176798: Remove 'monitor' group from enwiki - https://phabricator.wikimedia.org/T176798
[19:25:27] legoktm: to be run where? on centralauth master?
[19:25:32] marostegui: yes
[19:26:14] lu_attached_method is NULL I think.
[19:26:17] marostegui: 215 rows
[19:26:29] I was checking the number of rows indeed
[19:27:53] marostegui: is it OK if I run it?
[19:28:02] legoktm: yup
[19:28:18] !log mysql:wikiadmin@db1062 [centralauth]> UPDATE localuser SET lu_attached_method="new" WHERE lu_attached_method is NULL AND lu_name="Rua";
[19:28:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:28:22] Query OK, 215 rows affected (0.01 sec)
[19:28:22] Rows matched: 215 Changed: 215 Warnings: 0
[19:28:40] https://meta.wikimedia.org/wiki/Special:CentralAuth/Rua looks good now
[19:28:52] yep, although dates and methods don't match
[19:28:52] legoktm is it fine if they look as new account?
[19:28:56] yeah
[19:28:58] all of the attach information is wrong, but that doesn't matter
[19:29:14] maybe if the user runs Special:MergeAccount that'd fix it?
[19:29:50] nah
[19:29:56] it'll just be wrong forever
[19:30:21] I'm glad it is not my account ;) thanks!!
[19:31:08] !log cp4026 - depool->reboot for bios updates, isolcpus fix
[19:31:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:31:42] Operations, DBA, Wikimedia-Site-requests: Global rename of CodeCat → Rua: supervision needed - https://phabricator.wikimedia.org/T176985#3644420 (Marostegui) Open>Resolved a:Legoktm Everything is fixed after running the UPDATE below ``` ˜/legoktm 21:28> !log mysql:wikiadmin@db1062 [centr...
[19:31:52] tabbycat legoktm thanks a lot for getting this fixed. I am going to log off now, as it is late here :-)
[19:32:09] o/
[19:32:21] okay marostegui thanks for your help, sleep well
[19:32:45] (PS1) Rush: openstack: pdns auth module/role/profile [puppet] - https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494)
[19:33:11] (CR) jerkins-bot: [V: -1] openstack: pdns auth module/role/profile [puppet] - https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) (owner: Rush)
[19:34:10] !log mysql:wikiadmin@db1052 [enwiki]> delete from user_groups where ug_user =17629530 limit 1; (T176798)
[19:34:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:34:17] T176798: Remove 'monitor' group from enwiki - https://phabricator.wikimedia.org/T176798
[19:35:58] (PS2) Rush: openstack: pdns auth module/role/profile [puppet] - https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494)
[19:36:32] !log demon@tin rebuilt wikiversions.php and synchronized wikiversions files: group2 to wmf.1
[19:36:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:37:45] Operations, MediaWiki-API, Wikimedia-General-or-Unknown, Patch-For-Review: HTTP 500 on action=query&format=xml&list=logevents&letype=move&ledir=newer&lelimit=max on Meta - https://phabricator.wikimedia.org/T176938#3644480 (Anomie) Open>Resolved a:Anomie >>! In T176938#3644351, @Base w...
[19:41:04] PROBLEM - Nginx local proxy to apache on mw1200 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.007 second response time
[19:41:34] PROBLEM - puppet last run on eventlog1001 is CRITICAL: CRITICAL: Puppet has 26 failures. Last run 2 minutes ago with 26 failures. Failed resources (up to 3 shown): Service[prometheus-node-exporter],Exec[git_pull_mediawiki/event-schemas],Service[apparmor],Service[diamond]
[19:42:03] (PS1) Mridubhatnagar: Explaination of what scripts/webservice does when a user runs it as: webservice --backend kubernetes start webservice --backend kubernetes stop Along with this the file contains line by line explaination of source code present in scripts/webservice. [software/tools-webservice] - https://gerrit.wikimedia.org/r/381296
[19:42:04] RECOVERY - Nginx local proxy to apache on mw1200 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.045 second response time
[19:43:51] (PS3) Rush: openstack: pdns auth module/role/profile [puppet] - https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494)
[19:57:44] PROBLEM - puppet last run on netmon1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[acme-setup-acme-librenms]
[19:59:30] (Abandoned) Dzahn: mariadb::misc: allow connections from gerrit servers [puppet] - https://gerrit.wikimedia.org/r/380827 (https://phabricator.wikimedia.org/T168562) (owner: Dzahn)
[20:02:51] thcipriani: FYI - I just tested the IT wiki - it works as expected \o/
[20:03:33] nice :)
[20:07:35] (PS2) Herron: Change check_ipmi_temp to check_ipmi_sensor and monitor Power_Supply [puppet] - https://gerrit.wikimedia.org/r/376048 (https://phabricator.wikimedia.org/T109903)
[20:09:14] RECOVERY - puppet last run on eventlog1001 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[20:13:13] (CR) Pmiazga: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/379829 (https://phabricator.wikimedia.org/T175395) (owner: Bmansurov)
[20:13:45] Operations, Gerrit: Upload gerrit package to stretch apt.wm.org repo - https://phabricator.wikimedia.org/T165620#3644590 (Dzahn)
[20:13:49] Operations, Gerrit, Patch-For-Review, Release-Engineering-Team (Backlog): Reimage gerrit2001 as stretch - https://phabricator.wikimedia.org/T168562#3644589 (Dzahn) Open>stalled
[20:14:05] Operations, Gerrit, Patch-For-Review, Release-Engineering-Team (Backlog): Reimage gerrit2001 as stretch - https://phabricator.wikimedia.org/T168562#3368352 (Dzahn) Stalled by firewall on DB.
[20:14:24] Operations, Gerrit, Patch-For-Review, Release-Engineering-Team (Backlog): Reimage gerrit2001 as stretch - https://phabricator.wikimedia.org/T168562#3644595 (Dzahn)
[20:14:26] Operations, Gerrit, Patch-For-Review, Release-Engineering-Team (Next): Gerrit is failing to start gerrit-ssh on gerrit2001 - https://phabricator.wikimedia.org/T176532#3644593 (Dzahn) Open>stalled stalled by firewall on DB
[20:18:41] Operations, DBA, Gerrit, Patch-For-Review, Release-Engineering-Team (Next): Gerrit is failing to start gerrit-ssh on gerrit2001 - https://phabricator.wikimedia.org/T176532#3644612 (Paladox) Adding DBA as we need the firewall to allow connection from m2-master.codfw.wmnet. Since the eqiad has...
[20:20:21] Operations, Datasets-General-or-Unknown, Patch-For-Review, User-ArielGlenn: logrotate issue (cron spam) on dumps hosts - https://phabricator.wikimedia.org/T176810#3644614 (ArielGlenn) Open>Resolved
[20:22:50] Operations, DBA, Gerrit, Patch-For-Review, Release-Engineering-Team (Next): Gerrit is failing to connect to db on gerrit2001 thus preventing systemd from working - https://phabricator.wikimedia.org/T176532#3644618 (Paladox)
[20:48:56] (PS1) Rush: hiera keys for pdns_server in openstack deployments [labs/private] - https://gerrit.wikimedia.org/r/381345
[20:49:59] (CR) Rush: [V: 2 C: 2] hiera keys for pdns_server in openstack deployments [labs/private] - https://gerrit.wikimedia.org/r/381345 (owner: Rush)
[20:52:03] Operations, ops-eqiad, DC-Ops, netops: Move eqiad frack to new infra - https://phabricator.wikimedia.org/T174218#3644703 (ayounsi) > To be escalated to JTAC JTAC noticed that the control link went down as the same time as the data/fabric link because of missed heartbeats, which shouldn't happen a...
[20:54:05] ACKNOWLEDGEMENT - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 232, down: 1, dormant: 0, excluded: 0, unused: 0: Ayounsi Telia outage in progress
[21:03:52] (PS4) Rush: openstack: pdns auth module/role/profile [puppet] - https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494)
[21:11:20] (CR) Rush: "http://puppet-compiler.wmflabs.org/8097/" [puppet] - https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) (owner: Rush)
[21:16:58] (PS5) Rush: openstack: pdns auth module/role/profile [puppet] - https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494)
[21:24:21] Operations, netops: Merge AS14907 with AS43821 - https://phabricator.wikimedia.org/T167840#3644811 (ayounsi) It seems like Junos' `local-as` feature isn't working as expected. Global AS of 43821, remote side with `peer-as 43821`, and the local side with: `local-as 14907` -> BGP session doesn't establish...
[21:28:05] Operations, ops-eqiad, DC-Ops, netops: Move eqiad frack to new infra - https://phabricator.wikimedia.org/T174218#3644816 (Jgreen) >>! In T174218#3644703, @ayounsi wrote: >> To be escalated to JTAC > JTAC noticed that the control link went down as the same time as the data/fabric link because of m...
[21:35:36] Operations, ops-ulsfo, Traffic: cp4024 kernel errors - https://phabricator.wikimedia.org/T174891#3644832 (RobH) a:BBlack Followup testing passed without further errors: All tests passed. @bblack: Perhaps it was a transient error solved by the bios firmware updates? It went from 2.4.2 to 2.5.4 i...
[21:35:55] elukey: ^
[21:36:00] i just realized you made the task not brandon
[21:36:35] but cp4024 failed initial epsa testing, but it had an error in the log about power loss. clearing the log and updating the firmware and now it passes epsa testing
[21:39:33] Operations, ops-eqiad: rack/setup/install flerovium.eqiad.wmnet - https://phabricator.wikimedia.org/T176505#3644840 (RobH) Chris, when you rack this can you plug one md1400 into each port on the controller? Dell advises this may be the best way to wire both, and I'd like to compare it to codfw (which we...
[21:41:14] Operations, ops-eqiad: rack/setup/install flerovium.eqiad.wmnet - https://phabricator.wikimedia.org/T176505#3644849 (RobH)
[21:42:07] Operations, ops-eqiad: rack/setup/install flerovium.eqiad.wmnet - https://phabricator.wikimedia.org/T176505#3627849 (RobH)
[21:53:11] Operations, netops: Merge AS14907 with AS43821 - https://phabricator.wikimedia.org/T167840#3644900 (ayounsi)
[22:18:32] !log Deployed fix for T176247
[22:18:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:46:15] Operations, Patch-For-Review: create endowment.wm.org microsite - https://phabricator.wikimedia.org/T136735#3644975 (Dzahn) Resolved>Open
[22:46:40] Operations, Patch-For-Review: create endowment.wm.org microsite - https://phabricator.wikimedia.org/T136735#2346094 (Dzahn) p:Low>High
[22:49:49] Operations, Patch-For-Review: create endowment.wm.org microsite - https://phabricator.wikimedia.org/T136735#3644996 (Dzahn) received personal email that there is now http://wikimediaendowment.org/ and content is supposed to be hosted on Wordpress servers, while we still handle email and the time-frame is...
[23:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor I � Unicode. All rise for Evening SWAT (Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170928T2300).
[23:00:05] No GERRIT patches in the queue for this window AFAICS.
[23:00:23] =o
[23:00:33] empty swat
[23:06:34] PROBLEM - puppet last run on stat1005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[cdh::hadoop::directory /user/spark]
[23:07:04] PROBLEM - Apache HTTP on mw1279 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time
[23:07:14] PROBLEM - HHVM rendering on mw1279 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time
[23:08:04] RECOVERY - Apache HTTP on mw1279 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.034 second response time
[23:08:14] ok, i was about to restart that
[23:08:14] RECOVERY - HHVM rendering on mw1279 is OK: HTTP OK: HTTP/1.1 200 OK - 74625 bytes in 0.108 second response time
[23:08:18] but didnt
[23:20:15] PROBLEM - Long running screen/tmux on stat1005 is CRITICAL: Return code of 255 is out of bounds
[23:21:44] PROBLEM - MD RAID on stat1005 is CRITICAL: Return code of 255 is out of bounds
[23:21:45] PROBLEM - DPKG on stat1005 is CRITICAL: Return code of 255 is out of bounds
[23:21:45] PROBLEM - Disk space on stat1005 is CRITICAL: Return code of 255 is out of bounds
[23:21:54] PROBLEM - configured eth on stat1005 is CRITICAL: Return code of 255 is out of bounds
[23:21:54] PROBLEM - Check systemd state on stat1005 is CRITICAL: Return code of 255 is out of bounds
[23:22:04] PROBLEM - dhclient process on stat1005 is CRITICAL: Return code of 255 is out of bounds
[23:24:39] host is still alive...
[23:25:45] R is keeping CPU busy
[23:28:04] PROBLEM - Check the NTP synchronisation status of timesyncd on stat1005 is CRITICAL: Return code of 255 is out of bounds
[23:29:51] (PS1) Dzahn: releases/jenkins: add ProxyPassReverse config line [puppet] - https://gerrit.wikimedia.org/r/381365 (https://phabricator.wikimedia.org/T164030)
[23:30:29] (CR) Dzahn: [C: 2] "like you said.." [puppet] - https://gerrit.wikimedia.org/r/381365 (https://phabricator.wikimedia.org/T164030) (owner: Dzahn)
[23:30:40] (PS2) Dzahn: releases/jenkins: add ProxyPassReverse config line [puppet] - https://gerrit.wikimedia.org/r/381365 (https://phabricator.wikimedia.org/T164030)
[23:32:54] RECOVERY - MD RAID on stat1005 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[23:32:55] RECOVERY - DPKG on stat1005 is OK: All packages OK
[23:32:55] RECOVERY - Disk space on stat1005 is OK: DISK OK
[23:33:04] RECOVERY - configured eth on stat1005 is OK: OK - interfaces up
[23:33:14] RECOVERY - dhclient process on stat1005 is OK: PROCS OK: 0 processes with command name dhclient
[23:34:05] RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[23:45:43] !log awight@tin Started deploy [ores/deploy@42c5663]: Cause ORES service restart
[23:45:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:46:01] !log awight@tin Finished deploy [ores/deploy@42c5663]: Cause ORES service restart (duration: 00m 20s)
[23:46:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:47:09] Operations, ORES, Patch-For-Review, Scoring-platform-team (Current), and 2 others: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3645110 (awight)
[23:47:13] Operations, ORES, Patch-For-Review, Scoring-platform-team (Current), User-Ladsgroup: Review and fix file handle management in worker and celery processes - https://phabricator.wikimedia.org/T174402#3645108 (awight) Open>Resolved @akosiaris This fixed the problem, thanks!
[23:56:34] PROBLEM - puppet last run on releases1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:56:45] ^ me and fixing
[23:57:34] RECOVERY - puppet last run on releases1001 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[23:58:04] RECOVERY - Check the NTP synchronisation status of timesyncd on stat1005 is OK: OK: synced at Thu 2017-09-28 23:57:59 UTC.