[00:05:32] Operations, netops: review and fix scs config - https://phabricator.wikimedia.org/T185926 (ayounsi) Open>Resolved a:ayounsi Hey, I've done that long ago. All are also in LibreNMS and Rancid. I re-audited their config and only ulsfo had telnet enabled. We're all good here.
[00:06:25] Operations, ops-eqiad, decommission, netops: unrack/decom pfw1-eqiad and pfw2-eqiad - https://phabricator.wikimedia.org/T183390 (ayounsi) a:Cmjohnson
[00:21:15] !log ms-be2043 - mount -a ; re-enabled puppet, running puppet
[00:21:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:21:23] RECOVERY - swift-container-updater on ms-be2043 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[00:21:23] RECOVERY - swift-container-replicator on ms-be2043 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[00:21:24] RECOVERY - swift-account-auditor on ms-be2043 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[00:21:24] RECOVERY - swift-account-replicator on ms-be2043 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[00:21:43] RECOVERY - swift-object-updater on ms-be2043 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[00:21:53] RECOVERY - swift-container-auditor on ms-be2043 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[00:21:54] RECOVERY - swift-object-server on ms-be2043 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[00:21:54] RECOVERY - swift-object-replicator on ms-be2043 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[00:22:03] RECOVERY - swift-container-server on ms-be2043 is OK: PROCS OK: 49 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[00:22:04] RECOVERY - swift-account-server on ms-be2043 is OK: PROCS OK: 49 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[00:22:13] RECOVERY - swift-account-reaper on ms-be2043 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[00:22:14] RECOVERY - swift-object-auditor on ms-be2043 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[00:39:48] Operations, Beta-Cluster-Infrastructure, Jenkins, Patch-For-Review, Release-Engineering-Team (Kanban): Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561 (Krenair) During cherry-pick review today I realised that my attempt (`b59add730544b922e1fb6...
[00:47:58] Operations, netops: Security audit for tftp on install1001 - https://phabricator.wikimedia.org/T122210 (ayounsi) Open>Resolved a:ayounsi To answer the original request, tftp isn't reachable from cloud, neither the internal subnet nor the public IPs are in the ACL. In addition Daniel confirmed...
[00:48:03] Operations, Puppet, puppet-compiler, Continuous-Integration-Config, Release-Engineering-Team (Someday): Figure out a way to enable volunteers to use the puppet compiler - https://phabricator.wikimedia.org/T192532 (Dzahn) i just noticed T97580 Is that the same thing and it got solved??
[01:05:14] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[01:05:33] RECOVERY - Router interfaces on cr1-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 65, down: 0, dormant: 0, excluded: 0, unused: 0
[01:06:41] Operations, Cloud-VPS, netops: dmz_cidr only includes some wikimedia public IP ranges, leading to some very strange behaviour - https://phabricator.wikimedia.org/T174596 (ayounsi) Seems like even after T167357 I can reproduce the tests from the description. My guess is that it's setup like that to m...
[09:04:28] !log Deploy schema change on db1095:3312
[09:04:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:04:40] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1122" [mediawiki-config] - https://gerrit.wikimedia.org/r/455105 (owner: Marostegui)
[09:04:52] Operations: Onboarding Mathew Onipe - https://phabricator.wikimedia.org/T202708 (Gehel)
[09:06:01] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1122" [mediawiki-config] - https://gerrit.wikimedia.org/r/455105 (owner: Marostegui)
[09:06:03] (CR) Marostegui: [C: 2] db2035.yaml: Fixing typo [puppet] - https://gerrit.wikimedia.org/r/455103 (owner: Marostegui)
[09:08:08] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1122" [mediawiki-config] - https://gerrit.wikimedia.org/r/455105 (owner: Marostegui)
[09:09:59] !log Restart MySQL on db2035 (codfw s2 master) for mysql upgrade and pick up new mysql options after merging https://gerrit.wikimedia.org/r/455103
[09:10:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:38] arturo: should be ready for testing!
[09:10:45] Operations: Onboarding Mathew Onipe - https://phabricator.wikimedia.org/T202708 (Gehel) Note: @Mathew.onipe does not have an @wikimedia.org email yet. Some of the checklist items above would make more sense with an @wikimedia.org email (like exim email aliases), so those might be delayed a bit.
[09:13:24] elukey: rebuilding...
[09:13:47] * elukey follows https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/12220/console
[09:14:58] uff same error
[09:15:30] -_-
[09:15:34] maybe I haven't updated pcc correctly
[09:15:47] yeah
[09:15:47] File "build/bdist.linux-x86_64/egg/puppet_compiler/presentation/html.py", line 41, in htmlpage f.write(page)
[09:15:54] f.write is the old one
[09:15:59] silly me
[09:18:18] <_joe_> did you run puppet on the compilers?
[09:18:50] <_joe_> if you updated hiera, that ought to deploy the new version
[09:18:55] yep yep, but it was half broken (fixing it now) so I might have failed there
[09:19:01] (PS2) Vgutierrez: [WIP] Implement DNS01 challenge support [software/certcentral] - https://gerrit.wikimedia.org/r/454845
[09:19:35] also there is some time for the labs puppet master to sync with prod right?
[09:20:08] <_joe_> not much AIUI
[09:20:22] (CR) jerkins-bot: [V: -1] [WIP] Implement DNS01 challenge support [software/certcentral] - https://gerrit.wikimedia.org/r/454845 (owner: Vgutierrez)
[09:20:53] ah also the pcc hosts are trying to use component/puppetdb4 that is defined only for stretch-wikimedia (they are jessies0
[09:21:14] but I can see Scheduling refresh of Exec[git_checkout_operations/software/puppet-compiler] anyway
[09:21:58] <_joe_> elukey: talk to herron about that I guess?
[09:22:07] <_joe_> I'm not maintaining those hosts anymore
[09:22:18] nice, I've managed to break fluke8, pylint and unittests in a single commit
[09:22:22] *flake8
[09:25:20] nice one!
[09:27:33] a single syntax error is enough for that right?
[09:28:41] !log reimage wtp2005-wtp2007 to stretch
[09:28:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:29:09] Operations, netops: Intermittent connectivity issues in eqiad's row C - https://phabricator.wikimedia.org/T201139 (Aklapper)
[09:30:31] mmm so I can see the 0.4.1 version under /usr/local/lib/python2.7/dist-packages
[09:30:52] but running pcc returns the old code
[09:31:22] (f.write(page) instead of the new one)
[09:33:03] (PS3) Vgutierrez: [WIP] Implement DNS01 challenge support [software/certcentral] - https://gerrit.wikimedia.org/r/454845
[09:33:04] <_joe_> elukey: need me to take a look?
[09:33:59] _joe_ if you have a min yes please
[09:34:34] <_joe_> where is the console output that shows this still using f.write ?
[09:34:57] <_joe_> the code is at 0.4.1 indeed
[09:35:06] so the last one should be https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/12221/console
[09:35:21] <_joe_> load_entry_point('puppet-compiler==0.4.1'
[09:35:37] where is it? (to learn how to find it)
[09:35:41] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[09:35:59] <_joe_> elukey: /usr/local/bin/puppet-compiler
[09:36:05] <_joe_> elukey: try again please?
[09:36:13] sure
[09:36:20] <_joe_> the file is updated 1 minute ago
[09:36:43] <_joe_> can someone look at mediawiki while we work on this?
[09:37:34] <_joe_> Translate/stringmangler/StringMatcher.php again, already known IIRC
[09:37:51] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[09:40:25] _joe_ I unzipped /usr/local/lib/python2.7/dist-packages/puppet_compiler-0.4.1-py2.7.egg and html.py doesn't contain my fix, so I must have missed a step
[09:40:47] <_joe_> elukey: did you check that change was merged properly?
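The graphite1001 alert above reports the share of recent datapoints that exceed a threshold. The CRITICAL line cites 50.0, and the RECOVERY line cites 25.0 and a 70% trigger. The sketch below is a minimal illustration of that kind of percentage-over-threshold check. It assumes the numbers combine this way and treats 25.0 as the warning threshold, which is an assumption. `check_threshold` is a hypothetical helper, not the actual Icinga/graphite plugin code.

```python
def check_threshold(points, warning, critical, percentage):
    """Classify a window of datapoints by the share exceeding a threshold.

    Hypothetical sketch: returns (status, message) in the style of the
    alert text above, not the real plugin's code or exact output.
    """
    # Graphite represents empty buckets as null/None; skip them.
    valid = [p for p in points if p is not None]
    if not valid:
        return "UNKNOWN", "no valid datapoints"

    def share_above(threshold):
        # Percentage of valid datapoints strictly above `threshold`.
        return 100.0 * sum(1 for p in valid if p > threshold) / len(valid)

    crit_share = share_above(critical)
    if crit_share >= percentage:
        return "CRITICAL", "%.2f%% of data above the critical threshold [%s]" % (crit_share, critical)
    warn_share = share_above(warning)
    if warn_share >= percentage:
        return "WARNING", "%.2f%% of data above the warning threshold [%s]" % (warn_share, warning)
    return "OK", "Less than %.2f%% above the threshold [%s]" % (percentage, warning)


# A 10-point window where 7 points exceed 50.0:
print(check_threshold([60, 70, 80, 55, 65, 90, 51, 10, 20, 30],
                      warning=25.0, critical=50.0, percentage=70.0))
# ('CRITICAL', '70.00% of data above the critical threshold [50.0]')
```

The check looks at a fraction of the window rather than a single sample. Under this reading, a short spike like the one at 09:35:41 pages only while most of the recent datapoints stay high, and it clears as soon as the window drains.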
[09:41:00] <_joe_> the whole gate-and-submit thing might be broken
[09:41:27] it didn't happen only for the setup.py change now that I see
[09:41:51] (I tagged that commit to 0.4.1)
[09:42:12] (PS1) Marostegui: production-m5.sql.erb: Add GRANTS for nova user [puppet] - https://gerrit.wikimedia.org/r/455114 (https://phabricator.wikimedia.org/T202549)
[09:42:37] <_joe_> elukey: in /var/lib/catalog-differ/compiler I do see your change
[09:43:01] <_joe_> so I am not sure what happened there
[09:43:32] (CR) Jcrespo: [C: 1] "Disclaimer: I didn't check the IPs" [puppet] - https://gerrit.wikimedia.org/r/455114 (https://phabricator.wikimedia.org/T202549) (owner: Marostegui)
[09:43:34] I don't in /var/lib/catalog-differ/compiler/puppet_compiler/presentation/html.py no?
[09:43:59] ah snap there are two f.writes
[09:44:16] ahhhhhh
[09:44:17] <_joe_> I was about to tell you
[09:44:22] * elukey cries in a corner
[09:44:26] I fixed the wrong one
[09:44:28] <_joe_> you changed the wrong one, even :D
[09:44:36] <_joe_> but it's still ok as a change
[09:44:46] <_joe_> ok mistery solved
[09:44:48] lemme apply it again on the other
[09:44:50] thanks _joe_
[09:44:53] <_joe_> I can go back to my changes
[09:45:00] <_joe_> self-merge the change at will
[09:45:48] "One host or software cannot be fixed or worked by Luca at the same time. When one of the two things happen, the other is irrevocably changed (especially after the second one)"
[09:46:20] <_joe_> ?
[09:46:56] joking about me not fixing things :D
[09:47:10] nevermind, sending the cr in a bit
[09:48:57] (PS1) Elukey: html.py: force utf-8 encoding for generated html [software/puppet-compiler] - https://gerrit.wikimedia.org/r/455116 (https://phabricator.wikimedia.org/T173518)
[09:49:55] (CR) Elukey: [C: 2] html.py: force utf-8 encoding for generated html [software/puppet-compiler] - https://gerrit.wikimedia.org/r/455116 (https://phabricator.wikimedia.org/T173518) (owner: Elukey)
[09:51:04] (PS1) Elukey: Bump setup.py version to 0.4.2 [software/puppet-compiler] - https://gerrit.wikimedia.org/r/455117 (https://phabricator.wikimedia.org/T173518)
[09:51:50] (CR) Elukey: [C: 2] Bump setup.py version to 0.4.2 [software/puppet-compiler] - https://gerrit.wikimedia.org/r/455117 (https://phabricator.wikimedia.org/T173518) (owner: Elukey)
[09:51:54] (PS2) Jcrespo: db backup statistics: Initial implementation of the backup stats [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/449469 (https://phabricator.wikimedia.org/T198987)
[09:52:18] (Merged) jenkins-bot: Bump setup.py version to 0.4.2 [software/puppet-compiler] - https://gerrit.wikimedia.org/r/455117 (https://phabricator.wikimedia.org/T173518) (owner: Elukey)
[09:52:20] (CR) jerkins-bot: [V: -1] db backup statistics: Initial implementation of the backup stats [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/449469 (https://phabricator.wikimedia.org/T198987) (owner: Jcrespo)
[09:53:59] (PS1) Elukey: Deploy version 0.4.2 of the puppet compiler [puppet] - https://gerrit.wikimedia.org/r/455119 (https://phabricator.wikimedia.org/T173518)
[09:54:58] (CR) Elukey: [C: 2] Deploy version 0.4.2 of the puppet compiler [puppet] - https://gerrit.wikimedia.org/r/455119 (https://phabricator.wikimedia.org/T173518) (owner: Elukey)
[09:55:17] (PS3) Jcrespo: db backup statistics: Initial implementation of the backup stats [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/449469 (https://phabricator.wikimedia.org/T198987)
[09:56:10] (PS3) Jcrespo: WMFMariaDB refactoring and adding tests [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/449185
[09:56:32] (CR) Jcrespo: [C: 2] db backup statistics: Initial implementation of the backup stats [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/449469 (https://phabricator.wikimedia.org/T198987) (owner: Jcrespo)
[09:56:35] (CR) jerkins-bot: [V: -1] WMFMariaDB refactoring and adding tests [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/449185 (owner: Jcrespo)
[10:06:48] arturo: https://puppet-compiler.wmflabs.org/compiler02/12223/cloudcontrol1003.wikimedia.org/
[10:06:51] \o/
[10:07:24] I am not sure what "linuxnet_interface_driver = nova.network.linux_net.LinuxBridgeInterfaceDriver" is though
[10:08:06] elukey: thanks you are amazing! :-) also _joe_
[10:10:13] (PS6) Arturo Borrero Gonzalez: cloudvps: eqiad1: move nova DBs to m5-master [puppet] - https://gerrit.wikimedia.org/r/454774 (https://phabricator.wikimedia.org/T202549)
[10:16:12] arturo: any idea why those chars are there nova.conf?
[10:16:15] it looks very weird
[10:16:39] not sure, but I'm fixing them. I bet that was copy&pate issue from a PDF or something like that
[10:17:11] ahh okok
[10:18:44] Operations, Puppet, puppet-compiler, User-herron: Upgrade Puppet compilers to Stretch - https://phabricator.wikimedia.org/T191438 (elukey)
[10:19:34] Operations, Puppet, puppet-compiler, User-herron: Upgrade Puppet compilers to Stretch - https://phabricator.wikimedia.org/T191438 (elukey) p:Normal>High As explained in T202717, this task should be prioritized, so I am raising its priority to High.
[10:23:32] PROBLEM - parsoid on wtp2007 is CRITICAL: connect to address 10.192.16.49 and port 8000: Connection refused
[10:25:31] PROBLEM - puppet last run on wtp2007 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[parsoid/deploy]
[10:28:51] RECOVERY - parsoid on wtp2007 is OK: HTTP OK: HTTP/1.1 200 OK - 1051 bytes in 0.191 second response time
[10:30:24] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 90.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[10:30:35] RECOVERY - puppet last run on wtp2007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[10:31:24] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[10:39:36] Operations, Traffic, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on neodymium.eqiad.wmnet for hosts: ``` ['cp2013.codfw.wmnet', 'cp3030.esams.wmnet'] ``` The log can be found in `/var/l...
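The puppet-compiler fix above ("html.py: force utf-8 encoding for generated html") concerns how the rendered page is written to disk. The sketch below is a minimal illustration of forcing UTF-8 for generated HTML. It assumes the page is held as unicode text. `write_html_page` and `roundtrip_bytes` are hypothetical helpers, not the code in puppet_compiler/presentation/html.py.

```python
import io
import os
import shutil
import tempfile


def write_html_page(path, page):
    # Open in text mode with an explicit encoding, so non-ASCII characters
    # in `page` are always written as UTF-8. Python 3's open() would
    # otherwise use the locale default. Under Python 2.7, a plain
    # file.write() of unicode text implicitly encodes it as ASCII.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(page)


def roundtrip_bytes(page):
    # Write `page` to a temporary file and return the raw bytes on disk.
    tmpdir = tempfile.mkdtemp()
    try:
        path = os.path.join(tmpdir, "index.html")
        write_html_page(path, page)
        with io.open(path, "rb") as f:
            return f.read()
    finally:
        shutil.rmtree(tmpdir)


print(roundtrip_bytes(u"<p>caf\u00e9</p>"))
```

Under Python 2.7, `page` must be a `unicode` object, because `io.open` in text mode rejects byte strings. Under Python 3, any `str` works. Encoding at the file boundary keeps the rest of the rendering code working with text only.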
[10:41:21] (PS1) Elukey: archiva::proxy: fix tls cert names according to letsencrypt::cert::integrated [puppet] - https://gerrit.wikimedia.org/r/455137 (https://phabricator.wikimedia.org/T192639)
[10:41:52] vgutierrez: I think I found what it is failing on archiva1001 --^
[10:42:06] (needed more coffee)
[10:42:10] (PS4) Vgutierrez: [WIP] Implement DNS01 challenge support [software/certcentral] - https://gerrit.wikimedia.org/r/454845
[10:43:37] (CR) jerkins-bot: [V: -1] [WIP] Implement DNS01 challenge support [software/certcentral] - https://gerrit.wikimedia.org/r/454845 (owner: Vgutierrez)
[10:43:41] fuuu :)
[10:45:42] (PS2) Elukey: archiva::proxy: fix tls cert names according to letsencrypt::cert::integrated [puppet] - https://gerrit.wikimedia.org/r/455137 (https://phabricator.wikimedia.org/T192639)
[10:47:56] !log reimage wtp2008-wtp2010 to stretch
[10:48:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:48:07] (PS3) Elukey: archiva::proxy: fix tls cert names according to letsencrypt::cert::integrated [puppet] - https://gerrit.wikimedia.org/r/455137 (https://phabricator.wikimedia.org/T192639)
[10:48:47] (CR) jerkins-bot: [V: -1] archiva::proxy: fix tls cert names according to letsencrypt::cert::integrated [puppet] - https://gerrit.wikimedia.org/r/455137 (https://phabricator.wikimedia.org/T192639) (owner: Elukey)
[10:51:06] (PS4) Elukey: archiva::proxy: fix tls cert names according to letsencrypt::cert::integrated [puppet] - https://gerrit.wikimedia.org/r/455137 (https://phabricator.wikimedia.org/T192639)
[10:51:21] (PS3) Ema: prometheus: Job definition for trafficserver_exporter [puppet] - https://gerrit.wikimedia.org/r/454784 (https://phabricator.wikimedia.org/T202381)
[10:52:22] (CR) Ema: [C: 2] prometheus: Job definition for trafficserver_exporter [puppet] - https://gerrit.wikimedia.org/r/454784 (https://phabricator.wikimedia.org/T202381) (owner: Ema)
[10:52:35] PROBLEM - IPsec on cp2007 is CRITICAL: Strongswan CRITICAL - ok: 50 not-conn: cp3030_v4, cp3030_v6
[10:53:01] (PS5) Elukey: archiva::proxy: fix tls cert names according to letsencrypt::cert::integrated [puppet] - https://gerrit.wikimedia.org/r/455137 (https://phabricator.wikimedia.org/T192639)
[10:53:03] (CR) Arturo Borrero Gonzalez: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/455114 (https://phabricator.wikimedia.org/T202549) (owner: Marostegui)
[10:53:55] (CR) Elukey: [C: 2] "Finally looking good! https://puppet-compiler.wmflabs.org/compiler02/12229/" [puppet] - https://gerrit.wikimedia.org/r/455137 (https://phabricator.wikimedia.org/T192639) (owner: Elukey)
[10:56:05] RECOVERY - Check systemd state on archiva1001 is OK: OK - running: The system is fully operational
[10:57:05] RECOVERY - puppet last run on archiva1001 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[10:59:08] https://archiva-new.wikimedia.org/ \o/
[11:00:20] Operations, Security-Team, Traffic, Wikimedia-General-or-Unknown: Add restrictive CSP to upload.wikimedia.org - https://phabricator.wikimedia.org/T117618 (ema) p:Triage>Normal
[11:00:30] Operations, Beta-Cluster-Infrastructure, Traffic, HTTPS: https://sv.wikipedia.beta.wmflabs.org/ has invalid certificate - https://phabricator.wikimedia.org/T202564 (ema) p:Triage>Normal
[11:00:59] Operations, Performance-Team, Traffic, Wikimedia-General-or-Unknown, and 2 others: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest - https://phabricator.wikimedia.org/T199252 (ema) p:Triage>Normal
[11:04:00] (PS5) Vgutierrez: [WIP] Implement DNS01 challenge support [software/certcentral] - https://gerrit.wikimedia.org/r/454845
[11:06:37] (CR) Arturo Borrero Gonzalez: [C: -2] "It seems other database names use '_' as separator." [puppet] - https://gerrit.wikimedia.org/r/455114 (https://phabricator.wikimedia.org/T202549) (owner: Marostegui)
[11:06:54] (PS6) Vgutierrez: Implement DNS01 challenge support [software/certcentral] - https://gerrit.wikimedia.org/r/454845
[11:12:16] (PS7) Arturo Borrero Gonzalez: cloudvps: eqiad1: move nova DBs to m5-master [puppet] - https://gerrit.wikimedia.org/r/454774 (https://phabricator.wikimedia.org/T202549)
[11:19:44] !log powercycle cp3030, stuck rebooting T200445
[11:19:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:19:50] T200445: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445
[11:23:44] (CR) Alexandros Kosiaris: [C: 2] wikilabels: Add zlib1g-dev package and cronjob to remove expired tasks [puppet] - https://gerrit.wikimedia.org/r/454546 (https://phabricator.wikimedia.org/T168478) (owner: Ladsgroup)
[11:23:51] (PS2) Alexandros Kosiaris: wikilabels: Add zlib1g-dev package and cronjob to remove expired tasks [puppet] - https://gerrit.wikimedia.org/r/454546 (https://phabricator.wikimedia.org/T168478) (owner: Ladsgroup)
[11:23:53] (CR) Alexandros Kosiaris: [V: 2 C: 2] wikilabels: Add zlib1g-dev package and cronjob to remove expired tasks [puppet] - https://gerrit.wikimedia.org/r/454546 (https://phabricator.wikimedia.org/T168478) (owner: Ladsgroup)
[11:36:40] (PS8) Arturo Borrero Gonzalez: cloudvps: eqiad1: move nova DBs to m5-master [puppet] - https://gerrit.wikimedia.org/r/454774 (https://phabricator.wikimedia.org/T202549)
[11:37:40] (PS9) Arturo Borrero Gonzalez: cloudvps: eqiad1: move nova DBs to m5-master [puppet] - https://gerrit.wikimedia.org/r/454774 (https://phabricator.wikimedia.org/T202549)
[11:42:43] (CR) Arturo Borrero Gonzalez: "Compiler is finally happy:" [puppet] - https://gerrit.wikimedia.org/r/454774 (https://phabricator.wikimedia.org/T202549) (owner: Arturo Borrero Gonzalez)
[11:43:13] RECOVERY - IPsec on cp2007 is OK: Strongswan OK - 52 ESP OK
[11:44:47] (PS2) Arturo Borrero Gonzalez: cloudvps: cleanup openstack liberty files [puppet] - https://gerrit.wikimedia.org/r/451315
[11:45:41] (CR) Arturo Borrero Gonzalez: [C: 2] cloudvps: cleanup openstack liberty files [puppet] - https://gerrit.wikimedia.org/r/451315 (owner: Arturo Borrero Gonzalez)
[11:50:44] Operations, Traffic, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['cp2013.codfw.wmnet', 'cp3030.esams.wmnet'] ``` and were **ALL** successful.
[11:52:05] (Abandoned) Arturo Borrero Gonzalez: hieradata: eqiad1: fix public IP address [puppet] - https://gerrit.wikimedia.org/r/445204 (https://phabricator.wikimedia.org/T196633) (owner: Arturo Borrero Gonzalez)
[11:52:49] (Abandoned) Arturo Borrero Gonzalez: labs_bootstrapvz: firstboot.sh: bring back some resolv.conf magic [puppet] - https://gerrit.wikimedia.org/r/429211 (https://phabricator.wikimedia.org/T181523) (owner: Arturo Borrero Gonzalez)
[11:58:20] !log upgrade nodejs packages on aqs* for security upgrade (rolling restart of aqs daemon included)
[11:58:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:01:33] PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet operation_type={create_container,start_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[12:01:58] Operations, Traffic, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on neodymium.eqiad.wmnet for hosts: ``` ['cp2014.codfw.wmnet', 'cp3033.esams.wmnet'] ``` The log can be found in `/var/l...
[12:02:33] RECOVERY - kubelet operational latencies on kubernetes1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[12:03:02] Operations, Cloud-VPS, netops: dmz_cidr only includes some wikimedia public IP ranges, leading to some very strange behaviour - https://phabricator.wikimedia.org/T174596 (aborrero) a:aborrero Our plan is to keep using the `dmz_cidr` mechanism with the new `172.16` addressing space. This is alrea...
[12:03:23] (PS15) Vgutierrez: Refactor certcentral.certificate_management() [software/certcentral] - https://gerrit.wikimedia.org/r/451867 (https://phabricator.wikimedia.org/T199711)
[12:03:25] (PS6) Vgutierrez: Implement different Certificate.save() modes [software/certcentral] - https://gerrit.wikimedia.org/r/453124 (https://phabricator.wikimedia.org/T199711)
[12:03:27] (PS10) Vgutierrez: Certcentral integration tests [software/certcentral] - https://gerrit.wikimedia.org/r/454045 (https://phabricator.wikimedia.org/T199711)
[12:03:29] (PS4) Vgutierrez: Deliver certificates in every save mode [software/certcentral] - https://gerrit.wikimedia.org/r/454794 (https://phabricator.wikimedia.org/T199711)
[12:03:31] (PS7) Vgutierrez: Implement DNS01 challenge support [software/certcentral] - https://gerrit.wikimedia.org/r/454845 (https://phabricator.wikimedia.org/T199711)
[12:06:41] Operations, ops-eqiad, Cloud-VPS, Patch-For-Review, cloud-services-team (Kanban): rack/setup/install cloudvirt102[34] - https://phabricator.wikimedia.org/T199125 (MoritzMuehlenhoff) a:Muehlenhoff>MoritzMuehlenhoff
[12:09:43] Operations, ORES, Scoring-platform-team (Current): Spin up a new poolcounter node for ores - https://phabricator.wikimedia.org/T201824 (akosiaris) Yes, it does make sense indeed to not share the infrastructure with mediawiki. I 'll file a task for creating 4 VMs (2 poolcounter instances per DC) for this
[12:12:43] (PS2) Volans: cookbook: fix BaseCookbooksItem interface [software/spicerack] - https://gerrit.wikimedia.org/r/454801 (https://phabricator.wikimedia.org/T199079)
[12:12:45] (PS3) Volans: cookbook: fix links to parent in interactive menu [software/spicerack] - https://gerrit.wikimedia.org/r/454802 (https://phabricator.wikimedia.org/T199079)
[12:12:47] (PS3) Volans: cookbook: properly handle KeyboardInterrupt [software/spicerack] - https://gerrit.wikimedia.org/r/454803 (https://phabricator.wikimedia.org/T199079)
[12:12:49] (PS3) Volans: cookbook: allow to pass parameters in the menu [software/spicerack] - https://gerrit.wikimedia.org/r/454804 (https://phabricator.wikimedia.org/T199079)
[12:12:51] (PS3) Volans: cookbook: handle SystemExit exceptions [software/spicerack] - https://gerrit.wikimedia.org/r/454805 (https://phabricator.wikimedia.org/T199079)
[12:12:53] (PS2) Volans: mediawiki: fix Remote queries for maintenance host [software/spicerack] - https://gerrit.wikimedia.org/r/455098 (https://phabricator.wikimedia.org/T199079)
[12:12:57] (CR) Volans: "done" (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/454801 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[12:14:01] (CR) Volans: "done" (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/454804 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[12:14:16] (CR) Volans: "done" (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/455098 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[12:14:59] Operations, ops-eqiad, Cloud-VPS, Patch-For-Review, cloud-services-team (Kanban): rack/setup/install cloudvirt102[34] - https://phabricator.wikimedia.org/T199125 (MoritzMuehlenhoff) I've backported support for that driver the stretch 4.9 kernel, it's a series of 18 patches, kernel is at https...
[12:19:30] (CR) Alexandros Kosiaris: [C: 2] apertium-eo-fr: Use --no-parallel flag [debs/contenttranslation/apertium-eo-fr] - https://gerrit.wikimedia.org/r/446717 (https://phabricator.wikimedia.org/T199962) (owner: KartikMistry)
[12:20:34] PROBLEM - IPsec on cp4025 is CRITICAL: Strongswan CRITICAL - ok: 34 not-conn: cp2014_v4, cp2014_v6
[12:21:49] RECOVERY - IPsec on cp4025 is OK: Strongswan OK - 36 ESP OK
[12:23:11] (PS1) Vgutierrez: Provide support in the API for different certificate save modes [software/certcentral] - https://gerrit.wikimedia.org/r/455153 (https://phabricator.wikimedia.org/T199711)
[12:27:49] (PS8) Giuseppe Lavagetto: php: add service management for php-fpm [puppet] - https://gerrit.wikimedia.org/r/454478 (https://phabricator.wikimedia.org/T201140)
[12:27:51] (PS1) Giuseppe Lavagetto: profile::mediawiki::php: add support for php-fpm [puppet] - https://gerrit.wikimedia.org/r/455154 (https://phabricator.wikimedia.org/T201140)
[12:28:55] Operations, Wikimedia-Mailing-lists: Password reset request for wikimedia-nd mailing list - https://phabricator.wikimedia.org/T202247 (Geekdidi) Hi hi, I'm still expecting the link. The thing is, there's a moderation request on the mailing list - and I keep geeting an email once a day to moderate it, I...
[12:29:30] (CR) Alexandros Kosiaris: [C: 2] apertium-kaz-tat: Fix Build-Dep [debs/contenttranslation/apertium-kaz-tat] - https://gerrit.wikimedia.org/r/446525 (https://phabricator.wikimedia.org/T199962) (owner: KartikMistry)
[12:33:39] !log upload apertium-eo-fr to apt.wikimedia.org/jessie-wikimedia/main T199962
[12:33:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:33:44] T199962: Apertium maintenance updates (July-September) - https://phabricator.wikimedia.org/T199962
[12:33:44] Operations, Traffic, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['cp2014.codfw.wmnet', 'cp3033.esams.wmnet'] ``` and were **ALL** successful.
[12:33:45] !log upload apertium-kaz-tat to apt.wikimedia.org/jessie-wikimedia/main T199962
[12:33:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:47:23] (CR) Jcrespo: "Let's also add a MAX_USER_CONNECTIONS 50 to avoid further issues on wikitech and other services. We should later add those to all other ac" [puppet] - https://gerrit.wikimedia.org/r/455114 (https://phabricator.wikimedia.org/T202549) (owner: Marostegui)
[12:51:20] (PS10) Arturo Borrero Gonzalez: cloudvps: eqiad1: move nova DBs to m5-master [puppet] - https://gerrit.wikimedia.org/r/454774 (https://phabricator.wikimedia.org/T202549)
[12:57:32] PROBLEM - Check systemd state on kubestage1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:59:32] PROBLEM - Check systemd state on kubestage1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[13:00:00] RECOVERY - Check systemd state on kubestage1002 is OK: OK - running: The system is fully operational
[13:00:41] RECOVERY - Check systemd state on kubestage1001 is OK: OK - running: The system is fully operational
[13:01:06] (PS1) Ladsgroup: wikilabels: install postgresql package as well [puppet] - https://gerrit.wikimedia.org/r/455157
[13:10:06] (CR) Bstorm: "The quirk is that openstack is very much designed to keep lots of connections open. If it can't, it'll crash. Right now, this user is gu" [puppet] - https://gerrit.wikimedia.org/r/455114 (https://phabricator.wikimedia.org/T202549) (owner: Marostegui)
[13:10:37] (PS1) Vgutierrez: [WIP] Validate challenges before pushing them to the ACME directory [software/certcentral] - https://gerrit.wikimedia.org/r/455159 (https://phabricator.wikimedia.org/T199711)
[13:11:27] (CR) jerkins-bot: [V: -1] [WIP] Validate challenges before pushing them to the ACME directory [software/certcentral] - https://gerrit.wikimedia.org/r/455159 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[13:11:45] (PS1) Alexandros Kosiaris: kubernetes: Dumping --require-kubeconfig, make it param [puppet] - https://gerrit.wikimedia.org/r/455160
[13:12:34] Operations, ops-eqiad, Analytics, Patch-For-Review: rack/setup/install analyticsmaster100[12].eqiad.wmnet - https://phabricator.wikimedia.org/T201939 (elukey) @Cmjohnson hi! A couple of (probably stupid) questions: * are the final node names analytics-master100[12] or analyticsmaster100[12]? I s...
[13:15:30] (CR) Bstorm: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/455114 (https://phabricator.wikimedia.org/T202549) (owner: Marostegui)
[13:15:42] (PS2) Vgutierrez: [WIP] Validate challenges before pushing them to the ACME directory [software/certcentral] - https://gerrit.wikimedia.org/r/455159 (https://phabricator.wikimedia.org/T199711)
[13:16:08] (CR) Jcrespo: "> The quirk is that openstack is very much designed to keep lots of" [puppet] - https://gerrit.wikimedia.org/r/455114 (https://phabricator.wikimedia.org/T202549) (owner: Marostegui)
[13:16:41] (Abandoned) Bstorm: Revert "nova: reduce the pool size for database connections a lot" [puppet] - https://gerrit.wikimedia.org/r/454850 (owner: Bstorm)
[13:17:12] (PS2) Ladsgroup: wikilabels: install postgresql package as well [puppet] - https://gerrit.wikimedia.org/r/455157
[13:17:27] (CR) jerkins-bot: [V: -1] [WIP] Validate challenges before pushing them to the ACME directory [software/certcentral] - https://gerrit.wikimedia.org/r/455159 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[13:18:06] (CR) Bstorm: "I think we really should give it a dedicated database because of its particularly unhelpful behaviors, and its needs. I don't know that w" [puppet] - https://gerrit.wikimedia.org/r/455114 (https://phabricator.wikimedia.org/T202549) (owner: Marostegui)
[13:23:45] (CR) Bstorm: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/455114 (https://phabricator.wikimedia.org/T202549) (owner: Marostegui)
[13:23:58] (CR) Alexandros Kosiaris: [C: 2] wikilabels: install postgresql package as well [puppet] - https://gerrit.wikimedia.org/r/455157 (owner: Ladsgroup)
[13:27:05] (CR) Alexandros Kosiaris: [C: 2] kubernetes: Dumping --require-kubeconfig, make it param [puppet] - https://gerrit.wikimedia.org/r/455160 (owner: Alexandros Kosiaris)
[13:27:13] (PS2) Alexandros Kosiaris: kubernetes: Dumping --require-kubeconfig, make it param [puppet] - https://gerrit.wikimedia.org/r/455160
[13:27:15] (CR) Alexandros Kosiaris: [V: 2 C: 2] kubernetes: Dumping --require-kubeconfig, make it param [puppet] - https://gerrit.wikimedia.org/r/455160 (owner: Alexandros Kosiaris)
[13:33:57] (PS2) Marostegui: production-m5.sql.erb: Add GRANTS for nova user [puppet] - https://gerrit.wikimedia.org/r/455114 (https://phabricator.wikimedia.org/T202549)
[13:36:23] (PS3) Marostegui: production-m5.sql.erb: Add GRANTS for nova user [puppet] - https://gerrit.wikimedia.org/r/455114 (https://phabricator.wikimedia.org/T202549)
[13:41:42] (CR) Marostegui: "> > Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/455114 (https://phabricator.wikimedia.org/T202549) (owner: Marostegui)
[13:45:27] !log Deploy schema change on s8 codfw master with replication (this will generate lag on s8 codfw)
[13:45:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:59:48] (CR) Gehel: [C: 1] "LGTM, minor comments inline, but feel free to ignore." (3 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/454801 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[14:00:31] (CR) Bstorm: "> Either way, I don't think we are in a position were we can decide and execute neither #1 or #2, so we should try to minimize issues." [puppet] - https://gerrit.wikimedia.org/r/455114 (https://phabricator.wikimedia.org/T202549) (owner: Marostegui)
[14:03:54] (CR) Bstorm: ">" [puppet] - https://gerrit.wikimedia.org/r/455114 (https://phabricator.wikimedia.org/T202549) (owner: Marostegui)
[14:06:40] (CR) Gehel: [C: 1] "LGTM" [software/spicerack] - https://gerrit.wikimedia.org/r/454804 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[14:09:01] (CR) Gehel: mediawiki: fix Remote queries for maintenance host (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/455098 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[14:14:58] Operations, Wikimedia-Mailing-lists: Creation of a mailing list for the "Wiki Labs Culture" initiative - https://phabricator.wikimedia.org/T202737 (InsaneHacker)
[14:21:46] Operations, ops-eqiad, Cloud-VPS, Patch-For-Review, cloud-services-team (Kanban): rack/setup/install cloudvirt102[34] - https://phabricator.wikimedia.org/T199125 (faidon) p:Normal>High
[14:28:48] Operations, ops-eqiad: rack/setup/install cloudservices1004.wikimedia.org - https://phabricator.wikimedia.org/T201341 (Andrew) *bump* -- I'm happy to do the OS install &c. if that helps move this along. Thanks!
[14:31:48] (PS3) Volans: cookbook: fix BaseCookbooksItem interface [software/spicerack] - https://gerrit.wikimedia.org/r/454801 (https://phabricator.wikimedia.org/T199079)
[14:31:50] (PS4) Volans: cookbook: fix links to parent in interactive menu [software/spicerack] - https://gerrit.wikimedia.org/r/454802 (https://phabricator.wikimedia.org/T199079)
[14:31:52] (PS4) Volans: cookbook: properly handle KeyboardInterrupt [software/spicerack] - https://gerrit.wikimedia.org/r/454803 (https://phabricator.wikimedia.org/T199079)
[14:31:55] (PS4) Volans: cookbook: allow to pass parameters in the menu [software/spicerack] - https://gerrit.wikimedia.org/r/454804 (https://phabricator.wikimedia.org/T199079)
[14:31:56] (PS4) Volans: cookbook: handle SystemExit exceptions [software/spicerack] - https://gerrit.wikimedia.org/r/454805 (https://phabricator.wikimedia.org/T199079)
[14:31:58] (PS3) Volans: mediawiki: fix Remote queries for maintenance host [software/spicerack] - https://gerrit.wikimedia.org/r/455098 (https://phabricator.wikimedia.org/T199079)
[14:32:14] (CR) Volans: "done" (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/455098 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[14:32:36] (CR) Volans: "done" (3 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/454801 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[14:35:21] (CR) Gehel: [C: 1] "Nice!" [software/spicerack] - https://gerrit.wikimedia.org/r/454801 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[14:36:20] (CR) Marostegui: "> >" [puppet] - https://gerrit.wikimedia.org/r/455114 (https://phabricator.wikimedia.org/T202549) (owner: Marostegui)
[14:37:21] (CR) Gehel: [C: 1] "LGTM" [software/spicerack] - https://gerrit.wikimedia.org/r/454804 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[14:39:51] (CR) Gehel: [C: 1] mediawiki: fix Remote queries for maintenance host (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/455098 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[14:43:03] (CR) Volans: "See reply inline" (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/455098 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[14:43:31] (CR) Volans: [C: 2] cookbook: fix BaseCookbooksItem interface [software/spicerack] - https://gerrit.wikimedia.org/r/454801 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[14:43:54] (CR) Gehel: [C: 1] mediawiki: fix Remote queries for maintenance host (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/455098 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[14:44:41] (Merged) jenkins-bot: cookbook: fix BaseCookbooksItem interface [software/spicerack] - https://gerrit.wikimedia.org/r/454801 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[14:45:30] (CR) Volans: [C: 2] cookbook: fix links to parent in interactive menu [software/spicerack] - https://gerrit.wikimedia.org/r/454802 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[14:47:03] (Merged) jenkins-bot: cookbook: fix links to parent in interactive menu [software/spicerack] - https://gerrit.wikimedia.org/r/454802 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[14:48:11] (CR) Volans: [C: 2] cookbook: properly handle KeyboardInterrupt [software/spicerack] - https://gerrit.wikimedia.org/r/454803 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[14:49:45] (Merged) jenkins-bot: cookbook: properly handle KeyboardInterrupt [software/spicerack] - https://gerrit.wikimedia.org/r/454803 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[14:50:05] (CR) Volans: [C: 2] cookbook: allow to pass parameters in the menu [software/spicerack] - https://gerrit.wikimedia.org/r/454804 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[14:51:04] (Merged) jenkins-bot: cookbook: allow to pass parameters in the menu [software/spicerack] - https://gerrit.wikimedia.org/r/454804 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[14:51:37] (CR) Volans: [C: 2] cookbook: handle SystemExit exceptions [software/spicerack] - https://gerrit.wikimedia.org/r/454805 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[14:52:38] (Merged) jenkins-bot: cookbook: handle SystemExit exceptions [software/spicerack] - https://gerrit.wikimedia.org/r/454805 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[15:06:16] (CR) Volans: [C: 2] mediawiki: fix Remote queries for maintenance host [software/spicerack] - https://gerrit.wikimedia.org/r/455098 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[15:07:17] (Merged) jenkins-bot: mediawiki: fix Remote queries for maintenance host [software/spicerack] - https://gerrit.wikimedia.org/r/455098 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[15:07:24] !log Deploy schema change on dbstore1002:s8
[15:07:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:23:05] (PS2) Dzahn: Update 2030.wikimedia.org redirect [puppet] - https://gerrit.wikimedia.org/r/454498 (https://phabricator.wikimedia.org/T202498) (owner: Reedy)
[15:23:49] (CR) Dzahn: [C: 2] Update 2030.wikimedia.org redirect [puppet] - https://gerrit.wikimedia.org/r/454498 (https://phabricator.wikimedia.org/T202498) (owner: Reedy)
[15:25:41] (PS3) Volans: Add README [cookbooks] - https://gerrit.wikimedia.org/r/454559 (https://phabricator.wikimedia.org/T199079)
[15:25:43] (PS2) Volans: Initial structure for the cookbooks hierarchy [cookbooks] - https://gerrit.wikimedia.org/r/454800 (https://phabricator.wikimedia.org/T199079)
[15:26:58] (CR) Dzahn: "does your last comment mean it's not ready to merge yet?" [puppet] - https://gerrit.wikimedia.org/r/454724 (https://phabricator.wikimedia.org/T202479) (owner: Krinkle)
[15:27:25] (CR) Volans: "done" (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/454559 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[15:27:43] (PS2) Dzahn: authdns::server: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/454448
[15:30:32] Operations, Beta-Cluster-Infrastructure, Jenkins, Patch-For-Review, Release-Engineering-Team (Kanban): Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561 (Dzahn) Production deployment servers don't have a nodejs installed. I don't know why deploy...
[15:34:05] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted production access for Bill Pirkle - https://phabricator.wikimedia.org/T202546 (Fjalapeno) Approves
[15:36:32] Operations, Cloud-VPS, netops: dmz_cidr only includes some wikimedia public IP ranges, leading to some very strange behaviour - https://phabricator.wikimedia.org/T174596 (ayounsi) So first, why maintain 4 different lists instead of 1? (or at least have the same subnets in each lists). Then 185.15.56....
[15:38:00] Operations, Patch-For-Review: Netbox: postgres cannot be restarted w/ current config - https://phabricator.wikimedia.org/T184634 (Dzahn)
[15:39:26] Operations, Patch-For-Review: Netbox: postgres cannot be restarted w/ current config - https://phabricator.wikimedia.org/T184634 (Dzahn) Checked the monitoring check box. We have that now. Details in subtask.
[15:41:32] Operations, Patch-For-Review: reinstall rdb100[56] with RAID - https://phabricator.wikimedia.org/T140442 (elukey) Yes definitely, now only ChangeProp uses Redis and it should be easy enough to reimage rdb100[56]. I added an overview of how our Redis cluster will look like in T196685#4267110 (from what I...
[15:44:48] (CR) Mobrovac: [C: -1] "One comment in-lined." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/455036 (https://phabricator.wikimedia.org/T202682) (owner: Ppchelko)
[15:45:15] PROBLEM - IPMI Sensor Status on cp3030 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical]
[15:48:35] Operations, ops-eqiad, Cloud-VPS, Patch-For-Review, cloud-services-team (Kanban): rack/setup/install cloudvirt102[34] - https://phabricator.wikimedia.org/T199125 (RobH) @MoritzMuehlenhoff: Just to confirm, you'd like us to take a third SSD or SATA SFF disk, install into the system, and cable...
[15:49:22] (CR) Alex Monk: [C: 2] "I'm merging this now, I think there's a minor thing that we maybe should follow up but it's not a big deal." (2 comments) [software/certcentral] - https://gerrit.wikimedia.org/r/451867 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[15:50:26] (Merged) jenkins-bot: Refactor certcentral.certificate_management() [software/certcentral] - https://gerrit.wikimedia.org/r/451867 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[15:51:23] (CR) jenkins-bot: Refactor certcentral.certificate_management() [software/certcentral] - https://gerrit.wikimedia.org/r/451867 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[15:52:16] (CR) Alex Monk: [C: 2] Implement different Certificate.save() modes [software/certcentral] - https://gerrit.wikimedia.org/r/453124 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[15:53:25] (Merged) jenkins-bot: Implement different Certificate.save() modes [software/certcentral] - https://gerrit.wikimedia.org/r/453124 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[15:54:13] (PS1) Aklapper: Order list of extensions by alphabet [mediawiki-config] - https://gerrit.wikimedia.org/r/455188
[15:54:28] (CR) jenkins-bot: Implement different Certificate.save() modes [software/certcentral] - https://gerrit.wikimedia.org/r/453124 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[15:57:32] Operations, Discovery-Search (Current work), Patch-For-Review: migrate elasticsearch to stretch (from jessie) - https://phabricator.wikimedia.org/T193649 (debt) Open>Resolved
[15:58:28] Operations, Discovery, Discovery-Search (Current work), Patch-For-Review: migrate elasticsearch cirrus cluster to RAID0 - https://phabricator.wikimedia.org/T198391 (debt) Open>Resolved
[15:59:02] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted production access and analytics-privatedata-users for Samuel Guebo - https://phabricator.wikimedia.org/T202362 (RobH) I neglected to note restrited is a sudo group, and thus this will require approval in our weekly SRE...
[15:59:12] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted production access and analytics-privatedata-users for Kalliope Tsouroupidou - https://phabricator.wikimedia.org/T202486 (RobH) I neglected to note restrited is a sudo group, and thus this will require approval in our w...
[16:00:21] Operations, Discovery-Search (Current work), Patch-For-Review: mjolnir-kafka-bulk-daemon failed on all elastic / eqiad nodes - https://phabricator.wikimedia.org/T202120 (debt) Open>Resolved
[16:23:22] (PS1) Andrew Bogott: Neutron: copy security group settings to neutron.conf [puppet] - https://gerrit.wikimedia.org/r/455193 (https://phabricator.wikimedia.org/T202150)
[16:23:58] (CR) Andrew Bogott: [C: 2] Neutron: copy security group settings to neutron.conf [puppet] - https://gerrit.wikimedia.org/r/455193 (https://phabricator.wikimedia.org/T202150) (owner: Andrew Bogott)
[16:25:01] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted production access for Bill Pirkle - https://phabricator.wikimedia.org/T202546 (RobH)
[16:26:43] (CR) Krinkle: "It's ready." [puppet] - https://gerrit.wikimedia.org/r/454724 (https://phabricator.wikimedia.org/T202479) (owner: Krinkle)
[17:01:25] Operations, Patch-For-Review: reinstall rdb100[56] with RAID - https://phabricator.wikimedia.org/T140442 (Dzahn) Thanks! Could we continue on the ticket? That means it doesn't have to be real time in the same timezone which isn't that easy to organize for me.
[17:12:14] I have the conch.
[17:17:35] (CR) Alex Monk: [C: -1] Certcentral integration tests (4 comments) [software/certcentral] - https://gerrit.wikimedia.org/r/454045 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[17:21:23] Operations, Legalpad: Update terms "Labs" and "Operations" in L3 - https://phabricator.wikimedia.org/T202617 (RobH) >>! In T202617#4528278, @MarcoAurelio wrote: > @RobH The "Scope" section at L3 still mentions Labs. Not much of an issue as the page redirects to Cloud Services Terms of Use though. Regards...
[17:21:45] Operations, Wikimedia-Apache-configuration, Patch-For-Review: Redirect 2030.wikimedia.org to the new movement strategy portal - https://phabricator.wikimedia.org/T202498 (Dzahn) The changes has been deployed. The redirect should change once varnish cashes are updated. ( i think within 24 hours max)
[17:22:58] (PS1) Andrew Bogott: Neutron: Make services subscribe to their config files [puppet] - https://gerrit.wikimedia.org/r/455208
[17:23:35] (CR) jerkins-bot: [V: -1] Neutron: Make services subscribe to their config files [puppet] - https://gerrit.wikimedia.org/r/455208 (owner: Andrew Bogott)
[17:24:26] Operations: Update & standardize Platform-specific_documentation for HP servers - https://phabricator.wikimedia.org/T138866 (RobH) Open>Resolved
[17:24:38] Operations: Update & standardize Platform-specific_documentation for HP servers - https://phabricator.wikimedia.org/T138866 (RobH) keeping this open as these are living docs is silly, closing.
[17:25:11] (PS2) Andrew Bogott: Neutron: Make services subscribe to their config files [puppet] - https://gerrit.wikimedia.org/r/455208
[17:29:03] (CR) Jforrester: [C: -2] Enable the WikibaseMediaInfo extension in Beta Cluster Commons (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/446845 (https://phabricator.wikimedia.org/T180981) (owner: Jforrester)
[17:33:13] (CR) Alex Monk: Deliver certificates in every save mode (1 comment) [software/certcentral] - https://gerrit.wikimedia.org/r/454794 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[17:41:59] (PS3) Andrew Bogott: Neutron: Make services subscribe to their config files [puppet] - https://gerrit.wikimedia.org/r/455208
[17:43:26] (CR) Alex Monk: Implement DNS01 challenge support (1 comment) [software/certcentral] - https://gerrit.wikimedia.org/r/454845 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[17:43:53] Updates from me:
[17:44:06] !log scap sync-file php-1.32.0-wmf.18/extensions/WikiEditor/modules/jquery.wikiEditor.dialogs.config.js Hot-deploy of I364ac118255 to fix missing icon
[17:44:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:44:18] !log scap sync-file php-1.32.0-wmf.18/extensions/Echo/includes/gateway/UserNotificationGateway.php Hot-deploy of Ifdfa93059 to fix T202672 log error
[17:44:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:44:23] T202672: Echo/includes/gateway/UserNotificationGateway.php: PHP Notice: Undefined variable: success - https://phabricator.wikimedia.org/T202672
[17:44:25] (CR) Andrew Bogott: [C: 2] Neutron: Make services subscribe to their config files [puppet] - https://gerrit.wikimedia.org/r/455208 (owner: Andrew Bogott)
[17:44:28] !log
[17:44:28] James_F: Message missing. Nothing logged.
[17:44:35] !log scap sync-dir php-1.32.0-wmf.18/extensions/WikimediaEvents/ No-op bump this code to avoid dirty deploy repo
[17:44:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:45:19] OK, hot-deploy over, I give up the conch.
[17:47:37] Operations, Cloud-VPS, netops: dmz_cidr only includes some wikimedia public IP ranges, leading to some very strange behaviour - https://phabricator.wikimedia.org/T174596 (chasemp) Yeah it would be best to have this list of prod networks to preserve 172 source IP for: a) a fixed list of required end...
[17:49:26] (CR) Alex Monk: [C: -1] Certcentral integration tests (1 comment) [software/certcentral] - https://gerrit.wikimedia.org/r/454045 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[17:50:01] (PS1) Andrew Bogott: region-migrate-security-groups: handle 10.0.0.0/8 as a special case [puppet] - https://gerrit.wikimedia.org/r/455212
[17:50:56] (CR) Alex Monk: [C: 2] Provide support in the API for different certificate save modes [software/certcentral] - https://gerrit.wikimedia.org/r/455153 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez)
[17:51:23] (CR) Andrew Bogott: [C: 2] region-migrate-security-groups: handle 10.0.0.0/8 as a special case [puppet] - https://gerrit.wikimedia.org/r/455212 (owner: Andrew Bogott)
[17:52:37] Operations, Cloud-VPS, netops: dmz_cidr only includes some wikimedia public IP ranges, leading to some very strange behaviour - https://phabricator.wikimedia.org/T174596 (Krenair)
[17:54:02] robh: Not sure if you're on ops clinic duty today (/topic vandalism is fun); Logmsgbot seems to be down, and I think only SRExen can fix it?
[17:54:59] logmsgbot will live in prod yeah
[17:55:37] probably just needs someone in ops to run the restart instructions at https://wikitech.wikimedia.org/wiki/Logmsgbot
[18:02:23] yeah cool ill check now
[18:03:10] showed running, but i restarted
[18:03:15] there we go
[18:03:40] James_F: sorry for delay i had my sound turns too far down to hear the ping
[18:04:15] has there been a lot of topic vandalism in here?
[18:04:21] we can relock it to @ again
[18:04:51] I don't know, just saw the one time.
[18:04:58] Thanks, robh.
[18:05:36] well, everyone who can do clinic duty or should be updating the topic has ops in here
[18:05:48] so relocked for now
[18:06:05] Kk.
[18:06:06] its just annoying during a outage to have to op yourself to change the optic i think, its also more annoying to deal with topic spam.
[18:06:15] heh
[18:06:26] Yeah. No reason to not have SRExen perma-opped in here, though.
[18:09:02] (PS1) Andrew Bogott: region-migrate-security-groups: also look for 10.4.0.0/16 cidrs [puppet] - https://gerrit.wikimedia.org/r/455213
[18:11:32] Operations, CommRel-Internals, Wikimedia-Mailing-lists: Close https://lists.wikimedia.org/mailman/listinfo/cep and keep the archive for now - https://phabricator.wikimedia.org/T155683 (Quiddity) Hi @Dzahn, We already have and are using a new replacement team-list at `commrel-support@wikimedia.org` (n...
[18:12:21] Operations, Data-Services, Tracking: overhaul labstore setup [tracking] - https://phabricator.wikimedia.org/T126083 (Bstorm)
[18:13:00] (CR) Andrew Bogott: [C: 2] region-migrate-security-groups: also look for 10.4.0.0/16 cidrs [puppet] - https://gerrit.wikimedia.org/r/455213 (owner: Andrew Bogott)
[18:22:05] Operations, CommRel-Internals, Wikimedia-Mailing-lists: Close https://lists.wikimedia.org/mailman/listinfo/cep and keep the archive for now - https://phabricator.wikimedia.org/T155683 (Dzahn) Gotcha @Quiddity It woud be an easy request if you were asking for an alias from one mailman list to another....
[18:44:13] I'm not sure there's any point having this channel be +p but okay
[18:47:44] Operations, monitoring: rack/setup/install icinga1001.wikimedia.org - https://phabricator.wikimedia.org/T201344 (RobH) a:RobH>None
[18:47:58] (PS1) Bstorm: nfs-exportd: gratuitous conversion to python3 [puppet] - https://gerrit.wikimedia.org/r/455219 (https://phabricator.wikimedia.org/T202294)
[18:48:10] Operations, monitoring: rack/setup/install icinga1001.wikimedia.org - https://phabricator.wikimedia.org/T201344 (RobH) a:Dzahn This is ready for @dzahn to take over for service implementation.
[18:51:05] PROBLEM - Device not healthy -SMART- on db2058 is CRITICAL: cluster=mysql device=cciss,10 instance=db2058:9100 job=node site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2058&var-datasource=codfw%2520prometheus%252Fops
[18:55:39] (PS2) Ppchelko: Replace the semver patch version in Accept with x [puppet] - https://gerrit.wikimedia.org/r/455036 (https://phabricator.wikimedia.org/T202682)
[18:58:01] (CR) Ppchelko: "> One thing to note here is that this silently ignores versioning errors in cases where client supply a higher-than-available patch versio" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/455036 (https://phabricator.wikimedia.org/T202682) (owner: Ppchelko)
[19:12:03] (CR) Alex Monk: nfs-exportd: gratuitous conversion to python3 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/455219 (https://phabricator.wikimedia.org/T202294) (owner: Bstorm)
[19:17:19] Operations, CommRel-Internals, Wikimedia-Mailing-lists: Close https://lists.wikimedia.org/mailman/listinfo/cep and keep the archive for now - https://phabricator.wikimedia.org/T155683 (Dzahn) I ran this on the mailman server, nevertheless. that should just mean you don't get as much spam now. ``` /...
[19:23:23] Operations, CommRel-Internals, Wikimedia-Mailing-lists: Close https://lists.wikimedia.org/mailman/listinfo/cep and keep the archive for now - https://phabricator.wikimedia.org/T155683 (Dzahn) @Herron Do you see an easy way to alias something @lists.wikimedia.org to something @wikimedia.org? I looked...
[19:29:30] robh: ping?
[19:30:04] robh: not sure if you could help but maybe you know who could with https://phabricator.wikimedia.org/T202764 (since you're on clinic duty :)
[19:30:41] gehel: You about?
[19:30:51] I think he handles wdqs stuff
[19:30:57] robh: almost
[19:31:01] though its a Friday midday here ;]
[19:31:02] robh: it's not wdqs stuff, it's wikidata stuff
[19:31:06] oh, my bad
[19:31:16] so its wikidata slamming wdqs
[19:31:32] it's wikidata not wanting to talk to wdqs
[19:32:07] and it increased 3x in last 2 days
[19:32:11] which is worrying
[19:32:54] after increasing about 8x at the start of the week
[19:35:08] robh: it's only slamming the updates, an async flow, so no direct user impact
[19:35:20] except that the data is somewhat out of date
[19:35:44] yep. What worries me is like 10x jump of fails recently
[19:35:52] without any change on our side
[19:36:35] e.g. see https://logstash.wikimedia.org/goto/37f4fa1f831439d0d422878e41cc69eb
[19:37:00] yep, definitly not great!
[19:37:42] WDQS can recover from it, but it worries me that something is wrong on wikidata side
[19:38:15] Operations, Performance-Team, Wikidata, Wikidata-Query-Service: Wikidata produces a lot of failed requests for recentchanges API - https://phabricator.wikimedia.org/T202764 (Smalyshev)
[19:41:09] sorry, dell called me got distracted
[19:41:26] gehel: do we think this is a 'start calling random sre folks who work on wikidata' kinda bad
[19:41:31] or a wait until monday meeting bad?
[19:41:44] not trying to pass the buck, i honestly have zero clue as to the impact of this issue.
[19:41:57] robh: I don't either :/
[19:41:58] since its friday night for the majority of our team at this point, its relevant ;]
[19:42:07] hrmm
[19:42:31] Do we have any SRE members familiar with wikidata (and its comminication with wdqs) https://phabricator.wikimedia.org/T202764
[19:42:31] it probably can wait till monday meeting
[19:42:33] it is clear that there is an issue, affecting wikidata, most probably at least slow down, probably for all wikidata traffic
[19:42:41] bah, imeant to ask that in back channel
[19:42:42] heh
[19:42:44] or at least api traffic to wikidata
[19:42:57] nothing to hide about that!
[19:43:07] SMalyshev: Well, at the worst I can certainly bring it up in the monday meeting and ensure we get everyone alerted to the issue!
[19:43:23] but i feel like that is the bare minimum =P
[19:43:44] yeah I think it's not as bad as getting people working on weekend
[19:43:49] Operations, Wikimedia-Mailing-lists: Creation of a mailing list for the "Wiki Labs Culture" initiative - https://phabricator.wikimedia.org/T202737 (Dzahn) a:Dzahn
[19:43:54] robh: I can do that as well, remind me if I forget!
[19:43:58] I have a few more infos
[19:44:10] I just wanted somebody to be aware of the issue and start handling it
[19:44:13] * gehel needs to go on weekend. Ping me if needed
[19:44:21] SMalyshev: indeed! thank you for filing the task and bringing it up!
[19:44:40] I've flagged it for my pointing it out in meeting (i have a coiuple tasks for this review ;)
[19:45:04] for our blockers/attention section
[19:45:40] Operations, Wikimedia-Mailing-lists: Creation of a mailing list for the "Wiki Labs Culture" initiative - https://phabricator.wikimedia.org/T202737 (Dzahn) Happy to support anything that moves user groups off of Facebook. That's really great for inclusion. Will create it today.
[19:59:20] Operations, Cloud-Services, cloud-services-team (Kanban): tools-k8s-master-01 has two floating IPs - https://phabricator.wikimedia.org/T164123 (Andrew) a:Andrew
[20:11:49] Operations, Wikimedia-Mailing-lists: Creation of a mailing list for the "Wiki Labs Culture" initiative - https://phabricator.wikimedia.org/T202737 (Dzahn) You have successfully created the mailing list wikilabskultur and notification has been sent to the list owner steen@thomassen.net. You can now: [[ h...
[20:18:41] Operations, Wikimedia-Mailing-lists: Creation of a mailing list for the "Wiki Labs Culture" initiative - https://phabricator.wikimedia.org/T202737 (Dzahn) Open>Resolved Hi @InsaneHacker See the links above, you should have received 2 emails with automatically generated passwords. The second one i...
[20:20:26] Operations, Legalpad: Update terms "Labs" and "Operations" in L3 - https://phabricator.wikimedia.org/T202617 (MarcoAurelio) >>! In T202617#4530736, @RobH wrote: >>>! In T202617#4528278, @MarcoAurelio wrote: >> @RobH The "Scope" section at L3 still mentions Labs. Not much of an issue as the page redirects...
[20:30:06] (CR) MarcoAurelio: Add correct sitename for satwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/450469 (https://phabricator.wikimedia.org/T198400) (owner: Urbanecm)
[20:35:38] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[20:37:57] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[20:38:16] was about to say,, already over on the graph
[20:42:17] (CR) Alex Monk: [C: -1] ircecho: Support auth over irc (3 comments) [puppet] - https://gerrit.wikimedia.org/r/405594 (owner: Paladox)
[20:48:16] (PS1) Urbanecm: Permissions changes in ruwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/455232 (https://phabricator.wikimedia.org/T201265)
[20:52:14] (PS1) Urbanecm: yphc.ir to the wgCopyUploadsDomains whitelist of Wikimedia Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/455234 (https://phabricator.wikimedia.org/T201237)
[20:56:12] (PS13) Paladox: ircecho: Support auth over irc [puppet] - https://gerrit.wikimedia.org/r/405594
[21:04:03] paladox: you can attach that https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/405594/ to the old bug T48254 :)
[21:04:04] T48254: ircecho should support nickserv registration - https://phabricator.wikimedia.org/T48254
[21:05:40] hashar thanks, i have a open change edit so when i publish it :)
[21:15:04] (PS1) Dzahn: icinga: make service_ensure status configurable in Hiera [puppet] - https://gerrit.wikimedia.org/r/455238 (https://phabricator.wikimedia.org/T201344)
[21:15:41] (CR) jerkins-bot: [V: -1] icinga: make service_ensure status configurable in Hiera [puppet] - https://gerrit.wikimedia.org/r/455238 (https://phabricator.wikimedia.org/T201344) (owner: Dzahn)
[21:21:38] (PS1) Urbanecm: Upload HD logos for various wikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/455240 (https://phabricator.wikimedia.org/T177506)
[21:21:40] (PS1) Urbanecm: Use HD logos for various Wikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/455241 (https://phabricator.wikimedia.org/T177506)
[21:27:48] (CR) Dzahn: "yes... "Found hiera call in class 'role::icinga'" but those hiera calls should all move together when this is converted to profile..out of" [puppet] - https://gerrit.wikimedia.org/r/455238 (https://phabricator.wikimedia.org/T201344) (owner: Dzahn)
[21:47:03] Operations, Discovery, Wikidata, Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint: add SSDs to wdqs200[12] - https://phabricator.wikimedia.org/T202777 (RobH) p:Triage>Normal
[21:47:06] Operations, Discovery, Wikidata, Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint: add more ssds to wdqs2003 - https://phabricator.wikimedia.org/T202778 (RobH) p:Triage>Normal
[21:48:37] Operations, ops-codfw, Discovery, Wikidata, and 2 others: add SSDs to wdqs200[12] - https://phabricator.wikimedia.org/T202777 (RobH)
[21:48:50] Operations, ops-codfw, Discovery, Wikidata, and 2 others: add more ssds to wdqs2003 - https://phabricator.wikimedia.org/T202778 (RobH)
[21:49:09] Operations, ops-eqiad, Discovery, Wikidata, and 2 others: add SSDs to wdqs100[45] - https://phabricator.wikimedia.org/T202779 (RobH) p:Triage>Normal
[21:50:39] Operations, Discovery, Wikidata, Wikidata-Query-Service: WDQS diskspace is low - https://phabricator.wikimedia.org/T196485 (RobH) Please note all sub-tasks for additions to the wdqs cluster have been created and are linked off this task. Next steps are for coordination with @gehel and onsites to...
[21:51:13] Operations, ops-codfw, Discovery, Wikidata, and 2 others: add ssds to wdqs2003 - https://phabricator.wikimedia.org/T202778 (RobH)
[21:52:22] (PS1) Dzahn: icinga: rename role::icinga to profile::icinga [puppet] - https://gerrit.wikimedia.org/r/455247
[21:52:24] (PS1) Dzahn: icinga: move Hiera calls to profile parameters [puppet] - https://gerrit.wikimedia.org/r/455248
[21:53:06] (CR) jerkins-bot: [V: -1] icinga: rename role::icinga to profile::icinga [puppet] - https://gerrit.wikimedia.org/r/455247 (owner: Dzahn)
[21:53:22] (CR) jerkins-bot: [V: -1] icinga: move Hiera calls to profile parameters [puppet] - https://gerrit.wikimedia.org/r/455248 (owner: Dzahn)
[21:55:00] (PS1) MarcoAurelio: Use translated MetaNamespace for fy.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/455249 (https://phabricator.wikimedia.org/T202769)
[21:56:22] * Krinkle staging on mwdebug1002 and deploy1001
[21:58:08] (PS2) Dzahn: icinga: rename role::icinga to profile::icinga [puppet] - https://gerrit.wikimedia.org/r/455247
[22:01:31] (PS2) Dzahn: icinga: make service_ensure status configurable in Hiera [puppet] - https://gerrit.wikimedia.org/r/455238 (https://phabricator.wikimedia.org/T201344)
[22:01:33] (PS3) Dzahn: icinga: rename role::icinga to profile::icinga [puppet] - https://gerrit.wikimedia.org/r/455247
[22:01:35] (PS2) Dzahn: icinga: move Hiera calls to profile parameters [puppet] - https://gerrit.wikimedia.org/r/455248
[22:02:05] (CR) jerkins-bot: [V: -1] icinga: make service_ensure status configurable in Hiera [puppet] - https://gerrit.wikimedia.org/r/455238 (https://phabricator.wikimedia.org/T201344) (owner: Dzahn)
[22:02:27] (CR) jerkins-bot: [V: -1] icinga: rename role::icinga to profile::icinga [puppet] - https://gerrit.wikimedia.org/r/455247 (owner: Dzahn)
[22:02:43] (CR) jerkins-bot: [V: -1] icinga: move Hiera calls to profile parameters [puppet] - https://gerrit.wikimedia.org/r/455248 (owner: Dzahn)
[22:07:21] !log krinkle@deploy1001 Synchronized php-1.32.0-wmf.18/extensions/WikimediaEvents/: I17452c980f2588bbd (duration: 00m 50s)
[22:07:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:07:50] (PS4) Dzahn: icinga: rename role::icinga to profile::icinga [puppet] - https://gerrit.wikimedia.org/r/455247
[22:11:44] Krinkle: before you leave, would it be possible for you to make a *dry-run* of namespaceDupes.php for me for fywiktionary cf. T202769 ?
[22:11:45] T202769: Request to rename a namespace on fy.wiktionary.org - https://phabricator.wikimedia.org/T202769
[22:11:57] mwscript namespaceDupes.php --wiki=fywiktionary
[22:12:12] (that's dry-run, won't do anything)
[22:13:54] (PS3) Dzahn: icinga: make service_ensure status configurable in Hiera [puppet] - https://gerrit.wikimedia.org/r/455238 (https://phabricator.wikimedia.org/T201344)
[22:14:32] (CR) jerkins-bot: [V: -1] icinga: make service_ensure status configurable in Hiera [puppet] - https://gerrit.wikimedia.org/r/455238 (https://phabricator.wikimedia.org/T201344) (owner: Dzahn)
[22:17:47] Hauskatze: I'll try. Does it need an additional parameter to find things relating to the proposed rename?
[22:19:27] Krinkle: thanks - the command I gave to you should be enough to find the instances of broken links and namespaces
[22:19:39] (on mwmaint1/2 ofc, forgot to say)
[22:19:53] Hauskatze: It will search based on current namespace configuration.
[22:20:00] yup
[22:20:05] unless a parameter indicates otherwise
[22:20:21] I assume you want to know about pages currently in NS_0 that conflict with the proposed name for the project namespace, right?
[22:20:22] I guess there'll be no conflicts for now, then
[22:20:34] yep, that
[22:20:50] but maybe we can leave it for another moment as I'm heading to bed
[22:21:00] I'm very tired and I don't see things very clearly atm
[22:21:01] It will only find those if it is run after the change, or if the new name is passed as a parameter.
[22:21:17] I've run it, anyway
[22:21:19] 0 pages to fix, 0 were resolvable.
[22:21:19] 0 links to fix, 0 were resolvable.
[22:21:34] Which is unsurprising, as otherwise there would currently be pages that we cannot view.
[22:21:41] well this is good, there are no conflicts to fix before the merge
[22:21:49] :)
[22:22:01] there's https://fy.wiktionary.org/wiki/Wiki:Alle_titels?from=Wikiwurdboek&to=&namespace=0 however
[22:22:37] the patch sets Wiktionary -> Wikiwurdboek, so I'm not sure about those redirects
[22:22:49] If a page exists as "Foo:Bar" and "Foo:" is not a namespace, then that page belongs to the Main namespace. After you create the "Foo" namespace, links to "Foo:Bar" will be broken, because MediaWiki will look for NS_FOO title "Bar", which does not exist.
[22:23:10] those redirects should be deleted then
[22:23:22] It is impossible to create a page in the main namespace with the prefix of another namespace. This can only happen after a namespace is added/changed.
[22:23:40] Yeah, that is what the script does, it renames them so that someone can delete them if needed.
[22:23:56] so I can do it now or wait until after the patch is applied
[22:24:00] but that is after the change. Before the change, you'll need to add the parameter with the proposed prefix.
[22:24:12] which is basically the same as the prefixindex you used.
[22:24:25] except it also scans WhatLinksHere for would-be redlinks.
[22:24:44] okay! I'll think about this during my sleep
[22:25:06] Yeah, the process is usually that before changing or adding a namespace, this maintenance script is run to look for conflicts with the proposed name. That means the proposed name must be set as a parameter.
[22:25:22] Feel free to ping me tomorrow and we can try again. No problem.
[22:25:52] Thanks :)
[22:25:54] good night
[22:26:34] (PS7) MarcoAurelio: Modify gender namespaces for pl.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/454213 (https://phabricator.wikimedia.org/T202347)
[22:33:42] Operations, monitoring: upgrade icinga server to stretch and replace einsteinium - https://phabricator.wikimedia.org/T202782 (Dzahn)
[22:34:38] Operations, monitoring: upgrade icinga server to stretch and replace einsteinium - https://phabricator.wikimedia.org/T202782 (Dzahn)
[22:34:44] Operations, monitoring, Patch-For-Review: rack/setup/install icinga1001.wikimedia.org - https://phabricator.wikimedia.org/T201344 (Dzahn)
[22:41:04] Operations, monitoring: add icinga1001 to allowed hosts for AQL SMS gateway - https://phabricator.wikimedia.org/T202784 (Dzahn)
[22:41:28] Operations, monitoring: upgrade icinga server to stretch and replace einsteinium - https://phabricator.wikimedia.org/T202782 (Dzahn)
[22:41:30] Operations, monitoring: add icinga1001 to allowed hosts for AQL SMS gateway - https://phabricator.wikimedia.org/T202784 (Dzahn)
[22:41:41] Operations, monitoring: add icinga1001 to allowed hosts for AQL SMS gateway - https://phabricator.wikimedia.org/T202784 (Dzahn) p:Triage>Low
[23:02:50] (PS4) Dzahn: icinga: make service_ensure status configurable in Hiera [puppet] - https://gerrit.wikimedia.org/r/455238 (https://phabricator.wikimedia.org/T201344)
[23:06:14] Operations, Icinga: register a nickserv account for icinga-wm - https://phabricator.wikimedia.org/T22771 (Krenair)
[23:06:18] Operations, IRCecho: ircecho should support nickserv registration - https://phabricator.wikimedia.org/T48254 (Krenair) declined>Open I think it's fine to reopen this without needing another task. This is also for shinken-wm and we've recently found that the `$~a` ban applied everywhere via #wikim...
[23:07:05] (CR) Dzahn: "http://puppet-compiler.wmflabs.org/12234/" [puppet] - https://gerrit.wikimedia.org/r/455247 (owner: Dzahn)
[23:11:28] (PS1) Dzahn: icinga: move Hiera data from hosts to role [puppet] - https://gerrit.wikimedia.org/r/455262
[23:16:30] (PS1) Dzahn: icinga: add Hieradata for icinga1001, set to passive/disabled [puppet] - https://gerrit.wikimedia.org/r/455264
[23:17:29] (PS1) Aklapper: Phab: Clarify that spaces are not allowed in user account names [puppet] - https://gerrit.wikimedia.org/r/455265 (https://phabricator.wikimedia.org/T179126)
[23:18:39] (CR) Aklapper: "No idea if keeping the %d like in the code is correct here." [puppet] - https://gerrit.wikimedia.org/r/455265 (https://phabricator.wikimedia.org/T179126) (owner: Aklapper)
[23:33:29] (PS1) Aklapper: Phab: Use our custom Priority field value in tooltip on Reports page [puppet] - https://gerrit.wikimedia.org/r/455271 (https://phabricator.wikimedia.org/T91428)
[23:45:43] (PS1) Dzahn: netbox: move IP addresses from class to Hiera [puppet] - https://gerrit.wikimedia.org/r/455273
[23:46:32] (CR) jerkins-bot: [V: -1] netbox: move IP addresses from class to Hiera [puppet] - https://gerrit.wikimedia.org/r/455273 (owner: Dzahn)
[23:48:10] (PS2) Dzahn: netbox: move IP addresses from class to Hiera [puppet] - https://gerrit.wikimedia.org/r/455273
[23:48:52] (CR) jerkins-bot: [V: -1] netbox: move IP addresses from class to Hiera [puppet] - https://gerrit.wikimedia.org/r/455273 (owner: Dzahn)
[23:52:00] (PS3) Dzahn: netbox: move IP addresses from class to Hiera [puppet] - https://gerrit.wikimedia.org/r/455273
[23:56:32] (CR) Alex Monk: "Uh, wasn't that for pmtpa?" [puppet] - https://gerrit.wikimedia.org/r/455213 (owner: Andrew Bogott)
[23:58:20] (CR) Alex Monk: "-; 10.4.0.0/21 - guest VMs subnet" [puppet] - https://gerrit.wikimedia.org/r/455213 (owner: Andrew Bogott)