[00:07:26] PROBLEM - puppet last run on ganeti1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:09:56] PROBLEM - puppet last run on cp3020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:35:26] RECOVERY - puppet last run on ganeti1003 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[00:38:56] RECOVERY - puppet last run on cp3020 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[01:15:05] Operations, MediaWiki-Vagrant, Release-Engineering-Team, Epic: [EPIC] Migrate base image to Debian Jessie - https://phabricator.wikimedia.org/T136429#2960088 (Reedy)
[02:02:37] !log l10nupdate@tin LocalisationUpdate failed: git pull of extensions failed
[02:02:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:09:59] That's fun
[02:14:06] l10nupdate@tin:/var/lib/l10nupdate/mediawiki/extensions$ git submodule update --init --recursive
[02:14:06] Cloning into 'modules/phpflickr'...
[02:14:07] No
[02:15:55] Ah, not that
[02:16:00] Submodule path 'YouTube': checked out '704c07ba28c0cfeb1f15a9414019f581fb73e318'
[02:16:00] error: Your local changes to the following files would be overwritten by checkout:
[02:16:04] .gitignore
[02:23:29] !log cleaned up reCaptcha extension in l10ncache dirs
[02:23:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:23:46] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:26:03] !log running l10nupdate manually
[02:26:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:46:56] PROBLEM - puppet last run on cp3042 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:47:26] PROBLEM - puppet last run on ms-be1015 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[02:52:46] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[02:57:48] !log reedy@tin scap sync-l10n completed (1.29.0-wmf.8) (duration: 13m 14s)
[02:57:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:02:32] !log l10nupdate@tin ResourceLoader cache refresh completed at Mon Jan 23 03:02:32 UTC 2017 (duration 4m 44s)
[03:02:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:13:05] Operations, MediaWiki-Vagrant, Release-Engineering-Team, Epic: [EPIC] Migrate base image to Debian Jessie - https://phabricator.wikimedia.org/T136429#2960258 (bd808)
[03:13:46] PROBLEM - puppet last run on dbstore1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:14:46] Reedy: thanks for noticing and fixing that l10nupdate failure :)
[03:15:10] I noticed in SAL->twitter last night...
And it happened again
[03:15:56] RECOVERY - puppet last run on cp3042 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[03:16:26] RECOVERY - puppet last run on ms-be1015 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[03:41:46] RECOVERY - puppet last run on dbstore1001 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[04:38:04] Operations, MediaWiki-Vagrant, Release-Engineering-Team, Epic: [EPIC] Migrate base image to Debian Jessie - https://phabricator.wikimedia.org/T136429#2960364 (bd808)
[04:38:34] Operations, MediaWiki-Vagrant, Release-Engineering-Team, Epic: [EPIC] Migrate base image to Debian Jessie - https://phabricator.wikimedia.org/T136429#2334744 (bd808)
[04:39:24] Operations, MediaWiki-Vagrant, Release-Engineering-Team, Epic: [EPIC] Migrate base image to Debian Jessie - https://phabricator.wikimedia.org/T136429#2334744 (bd808)
[04:46:06] PROBLEM - puppet last run on cp4002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:14:08] (CR) Niharika29: [C: 1] Amend category collation for de.wikisource to 'uca-de-u-kn' [mediawiki-config] - https://gerrit.wikimedia.org/r/333488 (https://phabricator.wikimedia.org/T155916) (owner: MarcoAurelio)
[05:15:06] RECOVERY - puppet last run on cp4002 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[05:52:46] PROBLEM - puppet last run on snapshot1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:13:16] PROBLEM - puppet last run on mw1243 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[06:21:46] RECOVERY - puppet last run on snapshot1005 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[06:29:26] PROBLEM - Check HHVM threads for leakage on mw1169 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[06:34:26] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:41:16] RECOVERY - puppet last run on mw1243 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[06:43:37] Operations, Traffic, HTTPS, Patch-For-Review: Monitor Certificate Transparency (CT) logs - https://phabricator.wikimedia.org/T155807#2960473 (faidon) I prepared packages for certspotter and filed a Debian ITP to maintain the package upstream (I'll wait for 0.3 to upload). I also built it (and its...
[06:44:16] PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[06:54:46] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures.
Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[zotero/translators],Package[zotero/translation-server],Exec[chown /srv/deployment/zotero for deploy-service]
[07:00:06] PROBLEM - check_puppetrun on americium is CRITICAL: CRITICAL: Puppet has 1 failures
[07:02:26] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[07:03:16] PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[07:03:58] Operations, DBA, Wikimedia-General-or-Unknown: Spurious completely empty `image` table row on commonswiki - https://phabricator.wikimedia.org/T155769#2960504 (Marostegui) >>! In T155769#2956568, @Legoktm wrote: > Let's just delete it? Seems similar to T96233. If you guys consider it is safe to delet...
[07:05:06] RECOVERY - check_puppetrun on americium is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[07:12:35] (PS10) Giuseppe Lavagetto: base: move to profile [puppet] - https://gerrit.wikimedia.org/r/332355
[07:18:12] _joe_: since you're online, can I ask what's the eta of https://phabricator.wikimedia.org/T155098 ?
[07:18:46] <_joe_> I guess whenever brion wants to merge the mediawiki changes
[07:19:07] <_joe_> https://gerrit.wikimedia.org/r/#/c/331668/ see my comment here
[07:19:42] (PS11) Giuseppe Lavagetto: base: move to profile [puppet] - https://gerrit.wikimedia.org/r/332355
[07:20:04] k
[07:22:46] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[07:28:16] PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[07:28:33] !log Compressing cebwiki.templatelinks on db1015 (224G table) - T153739
[07:28:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:28:38] T153739: Defragment db1015 - https://phabricator.wikimedia.org/T153739
[07:30:16] PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[07:32:27] !log Deploy gtid_domain_id db1043 (passive master) - last host pending in m3 - T149418
[07:32:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:32:30] T149418: Deploy gtid_domain_id flag in our mysql hosts - https://phabricator.wikimedia.org/T149418
[07:38:24] (PS2) Marostegui: eventlogging: Enable gtid_domainid on eventlogging [puppet] - https://gerrit.wikimedia.org/r/332965 (https://phabricator.wikimedia.org/T149418)
[07:40:07] (CR) Marostegui: [C: 2] eventlogging: Enable gtid_domainid on eventlogging [puppet] - https://gerrit.wikimedia.org/r/332965 (https://phabricator.wikimedia.org/T149418) (owner: Marostegui)
[07:43:31] !log Enabling gtid_domain_id on db1046 (eventlogging master) - T149418
[07:43:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:43:34] T149418: Deploy gtid_domain_id flag in our mysql hosts - https://phabricator.wikimedia.org/T149418
[07:47:10] !log Enabling
gtid_domain_id on db1047 (eventlogging host) - T149418
[07:47:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:51:55] (PS3) Muehlenhoff: Stick with node 4.6 on maps due to karthotherian not being ready for node 6 [puppet] - https://gerrit.wikimedia.org/r/332768 (https://phabricator.wikimedia.org/T149331)
[08:08:16] RECOVERY - Check HHVM threads for leakage on mw1168 is OK: OK
[08:13:58] (CR) Muehlenhoff: "Since this service unit is generic useful it should rather be part of the Debian package. In addition to shipping the systemd unit you als" (2 comments) [debs/gerrit] - https://gerrit.wikimedia.org/r/333475 (owner: Paladox)
[08:24:26] (PS1) ArielGlenn: in snapshot manifests fix up more template to shell script conversions [puppet] - https://gerrit.wikimedia.org/r/333572
[08:25:35] (CR) ArielGlenn: [C: 2] in snapshot manifests fix up more template to shell script conversions [puppet] - https://gerrit.wikimedia.org/r/333572 (owner: ArielGlenn)
[08:29:48] (PS1) ArielGlenn: snapshot hosts need one more directory defined for sh scripts [puppet] - https://gerrit.wikimedia.org/r/333573
[08:29:50] Operations, ops-codfw, netops: asw-a7-codfw is down - https://phabricator.wikimedia.org/T154758#2960554 (faidon)
[08:30:56] PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Puppet has 10 failures. Last run 2 minutes ago with 10 failures. Failed resources (up to 3 shown): Exec[uncomment root bash aliases],Service[exim4],Exec[bump nf_conntrack hash table size],Service[ganglia-monitor]
[08:31:13] (CR) ArielGlenn: [C: 2] snapshot hosts need one more directory defined for sh scripts [puppet] - https://gerrit.wikimedia.org/r/333573 (owner: ArielGlenn)
[08:39:36] Operations, ops-eqiad: Degraded RAID on ms1001 - https://phabricator.wikimedia.org/T152367#2960556 (ArielGlenn) @Volans This host is in use but it's a secondary.
It may be taken offline for disk replacement (if hotswap is not possible) at any time without the need for scheduling.
[08:40:04] Operations, ops-eqiad, Datasets-General-or-Unknown: Degraded RAID on ms1001 - https://phabricator.wikimedia.org/T152367#2960557 (ArielGlenn)
[08:43:44] (CR) Hashar: "Indeed that is an annoyance of puppet/ruby when one uses a non UTF-8 locale such as C. I can reproduce setting LC_ALL=C" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/331239 (https://phabricator.wikimedia.org/T154915) (owner: Hashar)
[08:43:52] Operations, fundraising-tech-ops: Port fundraising stats off Ganglia - https://phabricator.wikimedia.org/T152562#2960558 (MoritzMuehlenhoff) p:Triage>Normal
[08:55:57] Operations, OTRS, Wikimedia-Incident: OTRS error (back up, now monitoring) - https://phabricator.wikimedia.org/T154841#2960565 (akosiaris) The upstream bug has been marked as FIXED. Seems like it will make it to 5.0.17. It's more of a workaround than a bug fix since it just mutes GenericAgent tasks t...
[08:57:44] (PS9) Hashar: puppet parse validate from rake [puppet] - https://gerrit.wikimedia.org/r/331239 (https://phabricator.wikimedia.org/T154915)
[08:57:46] (PS1) Hashar: Fix invalid byte sequence in US-ASCII [puppet] - https://gerrit.wikimedia.org/r/333576
[08:58:14] (CR) Hashar: "Non ASCII errors should all be fixed with https://gerrit.wikimedia.org/r/333576 which I made a parent of this change."
[puppet] - https://gerrit.wikimedia.org/r/331239 (https://phabricator.wikimedia.org/T154915) (owner: Hashar)
[08:59:56] RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[09:03:46] (PS9) Juniorsys: geowiki module: Lint changes + modes/umask quoting [puppet] - https://gerrit.wikimedia.org/r/332101 (https://phabricator.wikimedia.org/T93645)
[09:03:59] (PS8) Juniorsys: mediawiki module: Linting changes [puppet] - https://gerrit.wikimedia.org/r/332103 (https://phabricator.wikimedia.org/T93645)
[09:04:11] (PS8) Juniorsys: postgresql module: Linting changes [puppet] - https://gerrit.wikimedia.org/r/332104 (https://phabricator.wikimedia.org/T93645)
[09:04:39] (PS8) Juniorsys: toollabs role modules: Linting changes [puppet] - https://gerrit.wikimedia.org/r/332110 (https://phabricator.wikimedia.org/T93645)
[09:04:45] (PS8) Juniorsys: toollabs module: Linting changes [puppet] - https://gerrit.wikimedia.org/r/332111
[09:06:23] !log Compress s2 on dbstore2001 - T151552
[09:06:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:06:27] T151552: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552
[09:08:19] (PS7) Juniorsys: role analytics_cluster: Linting changes [puppet] - https://gerrit.wikimedia.org/r/332106 (https://phabricator.wikimedia.org/T93645)
[09:09:25] (CR) Alexandros Kosiaris: [C: 2] Puppetmaster: remove temporary logging for debugging [puppet] - https://gerrit.wikimedia.org/r/333162 (https://phabricator.wikimedia.org/T128895) (owner: Volans)
[09:09:31] (PS2) Alexandros Kosiaris: Puppetmaster: remove temporary logging for debugging [puppet] - https://gerrit.wikimedia.org/r/333162 (https://phabricator.wikimedia.org/T128895) (owner: Volans)
[09:09:34] (CR) Alexandros Kosiaris: [V: 2 C: 2] Puppetmaster: remove temporary logging for debugging [puppet] -
https://gerrit.wikimedia.org/r/333162 (https://phabricator.wikimedia.org/T128895) (owner: Volans)
[09:10:15] Operations, Patch-For-Review: Randomly failing puppetmaster sync to strontium - https://phabricator.wikimedia.org/T128895#2960601 (akosiaris) Agreed. Patch merged. Also resolving this. We haven't witnessed this for quite a while (>3months) so I think we are done with this
[09:18:21] (PS3) Paladox: Gerrit: Add a systemd init script fro gerrit [debs/gerrit] - https://gerrit.wikimedia.org/r/333475
[09:19:27] (CR) Paladox: Gerrit: Add a systemd init script fro gerrit (2 comments) [debs/gerrit] - https://gerrit.wikimedia.org/r/333475 (owner: Paladox)
[09:24:16] (PS4) Paladox: Gerrit: Add a systemd init script fro gerrit [debs/gerrit] - https://gerrit.wikimedia.org/r/333475
[09:24:39] (PS9) Paladox: Phabricator: Fix phd init script, also use systemd script if the os is cable of it [puppet] - https://gerrit.wikimedia.org/r/333358
[09:26:46] (PS10) Juniorsys: geowiki module: Lint changes + modes/umask quoting [puppet] - https://gerrit.wikimedia.org/r/332101 (https://phabricator.wikimedia.org/T93645)
[09:26:50] (PS9) Juniorsys: mediawiki module: Linting changes [puppet] - https://gerrit.wikimedia.org/r/332103 (https://phabricator.wikimedia.org/T93645)
[09:26:53] (PS9) Juniorsys: postgresql module: Linting changes [puppet] - https://gerrit.wikimedia.org/r/332104 (https://phabricator.wikimedia.org/T93645)
[09:26:57] (PS8) Juniorsys: role analytics_cluster: Linting changes [puppet] - https://gerrit.wikimedia.org/r/332106 (https://phabricator.wikimedia.org/T93645)
[09:27:02] (PS9) Juniorsys: toollabs role modules: Linting changes [puppet] - https://gerrit.wikimedia.org/r/332110 (https://phabricator.wikimedia.org/T93645)
[09:27:05] (PS9) Juniorsys: toollabs module: Linting changes [puppet] - https://gerrit.wikimedia.org/r/332111
[09:30:11] Operations, Labs, kubernetes: docker-engine pulled
into our repositories only keeps the latest version - https://phabricator.wikimedia.org/T153416#2960621 (akosiaris) Yeah, I thought about the section workaround too, but at best it's a hack. And tbh, I am not in love with the idea of a "labs" section...
[09:30:48] Operations: Package the next LTS kernel (4.9) - https://phabricator.wikimedia.org/T154934#2960626 (MoritzMuehlenhoff) 4.9 introduced a new security hardening feature around stack handling [1], which caused fallout in form of memory corruption all over the kernel tree. Brad Spengler of grsecurity reported a f...
[09:32:20] (CR) Giuseppe Lavagetto: [C: 2] "http://puppet-compiler.wmflabs.org/5178 everything looks good, both deployment-prep and tools have been shown to be noops, so I'll just mo" [puppet] - https://gerrit.wikimedia.org/r/332355 (owner: Giuseppe Lavagetto)
[09:32:32] (PS12) Giuseppe Lavagetto: base: move to profile [puppet] - https://gerrit.wikimedia.org/r/332355
[09:32:58] (CR) Giuseppe Lavagetto: [V: 2 C: 2] base: move to profile [puppet] - https://gerrit.wikimedia.org/r/332355 (owner: Giuseppe Lavagetto)
[09:33:07] <_joe_> let's go
[09:34:44] \o/
[09:35:04] Operations, Patch-For-Review: Randomly failing puppetmaster sync to strontium - https://phabricator.wikimedia.org/T128895#2960632 (akosiaris) stalled>Resolved a:akosiaris
[09:36:56] PROBLEM - puppet last run on mw2104 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:39:34] Operations, DC-Switchover-Prep-Q3-2016-17, Epic, Wikimedia-Multiple-active-datacenters: Prepare and improve the datacenter switchover procedure - https://phabricator.wikimedia.org/T154658#2960635 (Gilles) Can you make a subtask for the warmup tool with details about what you need and add the #per...
[09:40:18] (CR) Filippo Giunchedi: "Nice! I've tried this on osx and worked fine after 'bundle install'.
It took around 5min for 'rake lint' to complete here: 331.41 real " [puppet] - https://gerrit.wikimedia.org/r/331239 (https://phabricator.wikimedia.org/T154915) (owner: Hashar)
[09:41:32] (CR) Filippo Giunchedi: [C: 1] icinga: critical on ripe atlas check exceptions [puppet] - https://gerrit.wikimedia.org/r/333093 (owner: Ema)
[09:56:26] RECOVERY - Check HHVM threads for leakage on mw1169 is OK: OK
[10:03:56] RECOVERY - puppet last run on mw2104 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[10:09:56] PROBLEM - puppet last run on sca2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:15:42] Operations, ops-codfw: mw2098 drac offline - system unreachable - https://phabricator.wikimedia.org/T155688#2960723 (MoritzMuehlenhoff) @Papaul, the host is depooled and can be rebooted at your discretion.
[10:16:54] (PS3) Ema: icinga: critical on ripe atlas check exceptions [puppet] - https://gerrit.wikimedia.org/r/333093
[10:17:55] (CR) Ema: [V: 2 C: 2] icinga: critical on ripe atlas check exceptions [puppet] - https://gerrit.wikimedia.org/r/333093 (owner: Ema)
[10:25:09] (PS2) Jcrespo: Remove remnant files and templates of mha classes [puppet] - https://gerrit.wikimedia.org/r/325509 (owner: Tim Landscheidt)
[10:26:48] (PS10) Yuvipanda: postgresql module: Linting changes [puppet] - https://gerrit.wikimedia.org/r/332104 (https://phabricator.wikimedia.org/T93645) (owner: Juniorsys)
[10:26:51] (CR) Yuvipanda: [C: 2] postgresql module: Linting changes [puppet] - https://gerrit.wikimedia.org/r/332104 (https://phabricator.wikimedia.org/T93645) (owner: Juniorsys)
[10:27:00] (CR) Yuvipanda: [V: 2] "https://puppet-compiler.wmflabs.org/5179/ NOOP" [puppet] - https://gerrit.wikimedia.org/r/332104 (https://phabricator.wikimedia.org/T93645) (owner: Juniorsys)
[10:27:41] (CR) Yuvipanda: [V: 2 C: 2] postgresql module: Linting
changes [puppet] - https://gerrit.wikimedia.org/r/332104 (https://phabricator.wikimedia.org/T93645) (owner: Juniorsys)
[10:28:23] (CR) Jcrespo: [C: 2] Remove remnant files and templates of mha classes [puppet] - https://gerrit.wikimedia.org/r/325509 (owner: Tim Landscheidt)
[10:28:32] (PS3) Jcrespo: Remove remnant files and templates of mha classes [puppet] - https://gerrit.wikimedia.org/r/325509 (owner: Tim Landscheidt)
[10:38:00] RECOVERY - puppet last run on sca2004 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[10:38:53] !log installing openjpeg security updates
[10:38:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:44:13] Operations, Citoid, Services, VisualEditor: Package and test Zotero for Jessie - https://phabricator.wikimedia.org/T107302#2960843 (Mvolz)
[10:45:23] Operations, MediaWiki-Vagrant, Release-Engineering-Team, Epic: [EPIC] Migrate base image to Debian Jessie - https://phabricator.wikimedia.org/T136429#2960848 (Mvolz)
[10:50:34] !log installing pdns-recursor security updates on trusty systems
[10:50:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:55:59] (PS10) Yuvipanda: toollabs module: Linting changes [puppet] - https://gerrit.wikimedia.org/r/332111 (owner: Juniorsys)
[10:56:05] (CR) Yuvipanda: [V: 2 C: 2] toollabs module: Linting changes [puppet] - https://gerrit.wikimedia.org/r/332111 (owner: Juniorsys)
[10:56:30] (PS10) Yuvipanda: toollabs role modules: Linting changes [puppet] - https://gerrit.wikimedia.org/r/332110 (https://phabricator.wikimedia.org/T93645) (owner: Juniorsys)
[10:56:44] (CR) Yuvipanda: [V: 2 C: 2] toollabs role modules: Linting changes [puppet] - https://gerrit.wikimedia.org/r/332110 (https://phabricator.wikimedia.org/T93645) (owner: Juniorsys)
[10:57:09] (PS5) Paladox: Gerrit: Add a systemd init script fro gerrit [debs/gerrit] -
https://gerrit.wikimedia.org/r/333475
[10:59:18] PROBLEM - DPKG on aqs1007 is CRITICAL: Return code of 255 is out of bounds
[10:59:28] PROBLEM - Disk space on aqs1007 is CRITICAL: Return code of 255 is out of bounds
[10:59:48] PROBLEM - MD RAID on aqs1007 is CRITICAL: Return code of 255 is out of bounds
[11:00:12] hello aqs1007!
[11:00:15] (CR) Alexandros Kosiaris: [C: -1] ssh: Don't add IPv6 address as an alias in exported resource if it's undefined (1 comment) [puppet] - https://gerrit.wikimedia.org/r/333472 (https://phabricator.wikimedia.org/T72792) (owner: Alex Monk)
[11:00:28] PROBLEM - configured eth on aqs1007 is CRITICAL: Return code of 255 is out of bounds
[11:00:38] PROBLEM - dhclient process on aqs1007 is CRITICAL: Return code of 255 is out of bounds
[11:00:41] I just installed the os, ran puppet from install-console and got locked out
[11:00:47] let me silence it
[11:00:58] PROBLEM - puppet last run on aqs1007 is CRITICAL: Return code of 255 is out of bounds
[11:01:08] PROBLEM - salt-minion processes on aqs1007 is CRITICAL: Return code of 255 is out of bounds
[11:03:07] (PS1) Ema: varnishstatsd: exclude piped requests [puppet] - https://gerrit.wikimedia.org/r/333591 (https://phabricator.wikimedia.org/T151643)
[11:03:12] (PS1) Gehel: elasticsearch - configure new servers in codfw [puppet] - https://gerrit.wikimedia.org/r/333592 (https://phabricator.wikimedia.org/T154251)
[11:03:27] (PS6) Paladox: Gerrit: Add a systemd init script fro gerrit [debs/gerrit] - https://gerrit.wikimedia.org/r/333475
[11:03:44] (PS2) Ema: varnishstatsd: exclude piped requests [puppet] - https://gerrit.wikimedia.org/r/333591 (https://phabricator.wikimedia.org/T151643)
[11:04:28] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures.
Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[zotero/translators],Package[zotero/translation-server],Exec[chown /srv/deployment/zotero for deploy-service]
[11:06:27] (CR) Alexandros Kosiaris: [C: 2] Move postgres tuning.conf into module [puppet] - https://gerrit.wikimedia.org/r/333470 (owner: Alex Monk)
[11:06:32] (PS2) Alexandros Kosiaris: Move postgres tuning.conf into module [puppet] - https://gerrit.wikimedia.org/r/333470 (owner: Alex Monk)
[11:06:38] (PS7) Paladox: Gerrit: Add a systemd init script fro gerrit [debs/gerrit] - https://gerrit.wikimedia.org/r/333475
[11:06:44] (CR) Alexandros Kosiaris: [V: 2 C: 2] Move postgres tuning.conf into module [puppet] - https://gerrit.wikimedia.org/r/333470 (owner: Alex Monk)
[11:07:56] (PS8) Paladox: Gerrit: Add a systemd init script fro gerrit [debs/gerrit] - https://gerrit.wikimedia.org/r/333475
[11:10:42] (CR) DCausse: elasticsearch - configure new servers in codfw (2 comments) [puppet] - https://gerrit.wikimedia.org/r/333592 (https://phabricator.wikimedia.org/T154251) (owner: Gehel)
[11:11:08] RECOVERY - salt-minion processes on aqs1007 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[11:11:18] RECOVERY - DPKG on aqs1007 is OK: All packages OK
[11:11:28] RECOVERY - configured eth on aqs1007 is OK: OK - interfaces up
[11:11:28] RECOVERY - Disk space on aqs1007 is OK: DISK OK
[11:11:38] RECOVERY - dhclient process on aqs1007 is OK: PROCS OK: 0 processes with command name dhclient
[11:11:48] RECOVERY - MD RAID on aqs1007 is OK: OK: Active: 12, Working: 12, Failed: 0, Spare: 0
[11:11:58] RECOVERY - puppet last run on aqs1007 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[11:12:32] better
[11:15:02] (CR) Alexandros Kosiaris: [C: -1] Allow use of PuppetDB in labs for sshknowngen (1 comment) [puppet] - https://gerrit.wikimedia.org/r/333471 (https://phabricator.wikimedia.org/T72792)
(owner: Alex Monk)
[11:22:49] (PS3) Ema: varnishstatsd: exclude piped requests [puppet] - https://gerrit.wikimedia.org/r/333591 (https://phabricator.wikimedia.org/T151643)
[11:26:29] Operations, MediaWiki-extensions-InterwikiSorting, Wikidata, Wikimedia-Extension-setup, User-Addshore: Deploy InterwikiSorting extension to beta - https://phabricator.wikimedia.org/T155995#2960964 (Addshore)
[11:26:46] Operations, MediaWiki-extensions-InterwikiSorting, Wikidata, Wikimedia-Extension-setup, User-Addshore: Deploy InterwikiSorting extension to beta - https://phabricator.wikimedia.org/T155995#2960979 (Addshore)
[11:27:14] (PS3) Addshore: Enable InterwikiSorting on beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/332917 (https://phabricator.wikimedia.org/T155995)
[11:33:28] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[11:34:39] (PS4) Addshore: Enable InterwikiSorting on beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/332917 (https://phabricator.wikimedia.org/T155995)
[11:35:23] (PS5) Addshore: Prepare to enable InterwikiSorting on beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/332917 (https://phabricator.wikimedia.org/T155995)
[11:36:40] (CR) Alexandros Kosiaris: [C: 2] puppetdb: Allow tuning.conf to have a different shared_buffers value [puppet] - https://gerrit.wikimedia.org/r/333473 (https://phabricator.wikimedia.org/T72792) (owner: Alex Monk)
[11:36:46] (PS2) Alexandros Kosiaris: puppetdb: Allow tuning.conf to have a different shared_buffers value [puppet] - https://gerrit.wikimedia.org/r/333473 (https://phabricator.wikimedia.org/T72792) (owner: Alex Monk)
[11:36:50] (CR) Alexandros Kosiaris: [V: 2 C: 2] puppetdb: Allow tuning.conf to have a different shared_buffers value [puppet] - https://gerrit.wikimedia.org/r/333473 (https://phabricator.wikimedia.org/T72792) (owner:
Alex Monk)
[11:44:51] (CR) Alex Monk: "really, a -1 for that?" [puppet] - https://gerrit.wikimedia.org/r/333472 (https://phabricator.wikimedia.org/T72792) (owner: Alex Monk)
[11:45:13] (PS3) Alex Monk: ssh: Don't add IPv6 address as an alias in exported resource if it's undefined [puppet] - https://gerrit.wikimedia.org/r/333472 (https://phabricator.wikimedia.org/T72792)
[11:45:50] (CR) Gehel: elasticsearch - configure new servers in codfw (2 comments) [puppet] - https://gerrit.wikimedia.org/r/333592 (https://phabricator.wikimedia.org/T154251) (owner: Gehel)
[11:46:45] (CR) Hashar: "> Nice! I've tried this on osx and worked fine after 'bundle install'. It took around 5min for 'rake lint' to complete here:" [puppet] - https://gerrit.wikimedia.org/r/331239 (https://phabricator.wikimedia.org/T154915) (owner: Hashar)
[11:49:48] (CR) Alex Monk: "I don't think the sshkey in role::ci::slave::labs for gerrit will cause exported resources to be generated." [puppet] - https://gerrit.wikimedia.org/r/333471 (https://phabricator.wikimedia.org/T72792) (owner: Alex Monk)
[11:52:30] (PS1) Addshore: Move InterwikiSortOrders to own file [mediawiki-config] - https://gerrit.wikimedia.org/r/333596 (https://phabricator.wikimedia.org/T150183)
[11:54:33] !log whitelisted dbproxy1010 on cr1/cr2 for analytics-in4 input filter
[11:54:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:55:27] (PS2) Alex Monk: Allow use of PuppetDB in labs for ssh_known_hosts [puppet] - https://gerrit.wikimedia.org/r/333471 (https://phabricator.wikimedia.org/T72792)
[12:06:42] (PS4) Ema: varnishstatsd: exclude piped requests [puppet] - https://gerrit.wikimedia.org/r/333591 (https://phabricator.wikimedia.org/T151643)
[12:06:49] (CR) Ema: [V: 2 C: 2] varnishstatsd: exclude piped requests [puppet] - https://gerrit.wikimedia.org/r/333591 (https://phabricator.wikimedia.org/T151643) (owner: Ema)
[12:10:26] (PS6)
Addshore: Prepare to enable InterwikiSorting on beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/332917 (https://phabricator.wikimedia.org/T155995)
[12:11:24] (PS1) Addshore: Enable InterwikiSorting on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/333603 (https://phabricator.wikimedia.org/T155995)
[12:20:49] (PS3) Hashar: rsync: allow extra settings in rsyncd.conf [puppet] - https://gerrit.wikimedia.org/r/290895 (https://phabricator.wikimedia.org/T136276)
[12:20:51] (PS3) Hashar: contint: disable DNS lookup for castor rsyncd [puppet] - https://gerrit.wikimedia.org/r/290896 (https://phabricator.wikimedia.org/T136276)
[12:21:04] (CR) Thiemo Mättig (WMDE): [C: -1] Move InterwikiSortOrders to own file (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/333596 (https://phabricator.wikimedia.org/T150183) (owner: Addshore)
[12:23:31] (CR) Thiemo Mättig (WMDE): [C: -1] Prepare to enable InterwikiSorting on beta cluster (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/332917 (https://phabricator.wikimedia.org/T155995) (owner: Addshore)
[12:25:20] (CR) Hashar: [C: 1] "Cherry picked on CI puppet master:" [puppet] - https://gerrit.wikimedia.org/r/290895 (https://phabricator.wikimedia.org/T136276) (owner: Hashar)
[12:25:36] (CR) Hashar: [C: 1] "Cherry picked on CI puppet master:" [puppet] - https://gerrit.wikimedia.org/r/290896 (https://phabricator.wikimedia.org/T136276) (owner: Hashar)
[12:28:12] (PS2) Addshore: Move InterwikiSortOrders to own file [mediawiki-config] - https://gerrit.wikimedia.org/r/333596 (https://phabricator.wikimedia.org/T150183)
[12:29:51] (CR) Addshore: Prepare to enable InterwikiSorting on beta cluster (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/332917 (https://phabricator.wikimedia.org/T155995) (owner: Addshore)
[12:30:16] (PS7) Addshore: Prepare to enable InterwikiSorting on beta cluster
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/332917 (https://phabricator.wikimedia.org/T155995) [12:32:57] (03CR) 10Hashar: "recheck" [software] - 10https://gerrit.wikimedia.org/r/325762 (https://phabricator.wikimedia.org/T152549) (owner: 10Hashar) [12:33:14] (03CR) 10Hashar: "Can we get this patch a tiny bit of attention to get it merged? :]" [software] - 10https://gerrit.wikimedia.org/r/325762 (https://phabricator.wikimedia.org/T152549) (owner: 10Hashar) [12:33:58] (03Draft1) 10Hashar: (wip) Use make to find tabulations (wip) [dns] - 10https://gerrit.wikimedia.org/r/322920 [12:34:01] (03Abandoned) 10Hashar: (wip) Use make to find tabulations (wip) [dns] - 10https://gerrit.wikimedia.org/r/322920 (owner: 10Hashar) [12:39:27] Hi, is gerrit broken? I'm getting this error response every time I try to clone a repo. https://www.irccloud.com/pastebin/bEqTwtz4/ [12:47:55] Niharika: your clone command worked for me [12:48:19] Hmm. Maybe I should restart vagrant. [12:48:49] I cloned into a filesystem on a laptop, no vm or anything in between [12:49:58] apergos: Thanks. I will give that a try. 
[12:50:07] :-) [12:52:04] (03PS4) 10Giuseppe Lavagetto: Add schema support [software/conftool] - 10https://gerrit.wikimedia.org/r/288881 [12:55:34] (03CR) 10DCausse: [C: 031] elasticsearch - configure new servers in codfw [puppet] - 10https://gerrit.wikimedia.org/r/333592 (https://phabricator.wikimedia.org/T154251) (owner: 10Gehel) [13:00:07] (03PS7) 10Giuseppe Lavagetto: Generalize entities definitions [software/conftool] - 10https://gerrit.wikimedia.org/r/288609 (https://phabricator.wikimedia.org/T155823) [13:00:09] (03PS5) 10Giuseppe Lavagetto: Add schema support [software/conftool] - 10https://gerrit.wikimedia.org/r/288881 (https://phabricator.wikimedia.org/T155823) [13:11:01] 06Operations, 10ops-eqiad, 13Patch-For-Review: rack and set up aqs100[7-9] - https://phabricator.wikimedia.org/T155654#2961089 (10elukey) @Cmjohnson: I wanted to help and I installed the OS and signed puppet/salt keys for aqs100[78], but wasn't able to PXE boot aqs1009 due to a DHCP lease failure. I didn't... [13:13:20] (03PS1) 10Muehlenhoff: Add remaining staff email addresses to data.yaml [puppet] - 10https://gerrit.wikimedia.org/r/333611 (https://phabricator.wikimedia.org/T142826) [13:16:55] (03CR) 10Alexandros Kosiaris: [C: 032] Add missing comment for some Ganeti instances [dns] - 10https://gerrit.wikimedia.org/r/332710 (owner: 10Volans) [13:16:58] (03PS4) 10Alexandros Kosiaris: Add missing comment for some Ganeti instances [dns] - 10https://gerrit.wikimedia.org/r/332710 (owner: 10Volans) [13:17:02] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Add missing comment for some Ganeti instances [dns] - 10https://gerrit.wikimedia.org/r/332710 (owner: 10Volans) [13:28:59] (03PS2) 10Muehlenhoff: Add remaining staff email addresses to data.yaml [puppet] - 10https://gerrit.wikimedia.org/r/333611 (https://phabricator.wikimedia.org/T142826) [13:38:50] (03CR) 10Alexandros Kosiaris: osm: Add a prometheus textfile exporter (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/331623 (owner: 
10Alexandros Kosiaris) [13:40:50] 06Operations, 10DBA, 06Labs, 10netops: DBA plan to mitigate asw-c2-eqiad reboots - https://phabricator.wikimedia.org/T155999#2961118 (10Marostegui) [13:45:30] (03PS2) 10Alexandros Kosiaris: osm: Add a prometheus textfile exporter [puppet] - 10https://gerrit.wikimedia.org/r/331623 [13:45:48] (03CR) 10jerkins-bot: [V: 04-1] osm: Add a prometheus textfile exporter [puppet] - 10https://gerrit.wikimedia.org/r/331623 (owner: 10Alexandros Kosiaris) [13:46:45] (03PS2) 10Zfilipin: Add *.finds.org.uk to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333294 (https://phabricator.wikimedia.org/T155844) (owner: 10Urbanecm) [13:47:16] hashar: should I do eu swat today? https://wikitech.wikimedia.org/wiki/Deployments#Monday.2C.C2.A0January.C2.A023 [13:47:20] looks easy enough [13:47:52] I'm here BTW :) [13:48:31] 06Operations, 06Security-Team: Allow the production cluster to access *.wmflabs.org IPs - https://phabricator.wikimedia.org/T95714#2961139 (10akosiaris) 05Open>03Resolved a:03akosiaris Resolving per the comment above. [13:49:04] (03PS3) 10Alexandros Kosiaris: osm: Add a prometheus textfile exporter [puppet] - 10https://gerrit.wikimedia.org/r/331623 [13:50:18] (03CR) 10Alexandros Kosiaris: [C: 032] Fix invalid byte sequence in US-ASCII [puppet] - 10https://gerrit.wikimedia.org/r/333576 (owner: 10Hashar) [13:50:23] (03PS2) 10Alexandros Kosiaris: Fix invalid byte sequence in US-ASCII [puppet] - 10https://gerrit.wikimedia.org/r/333576 (owner: 10Hashar) [13:50:26] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Fix invalid byte sequence in US-ASCII [puppet] - 10https://gerrit.wikimedia.org/r/333576 (owner: 10Hashar) [13:53:24] (03CR) 10Addshore: [C: 031] Move InterwikiSortOrders to own file [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333596 (https://phabricator.wikimedia.org/T150183) (owner: 10Addshore) [13:53:31] PROBLEM - puppet last run on ms-be1024 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [13:54:01] zeljkof: fyi I just added one to the calanderrrr (easy one) should be a noop, but need to make sure the files are synced in the correct order ;) [13:54:47] addshore: in that case, would _you_ like to do the the eu swat? ;) [13:55:19] ooh, can do! [13:55:58] addshore: great, it's all yours then :) [13:57:54] ack [14:00:05] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170123T1400). [14:00:05] Urbanecm, tto, Niharika, and Addshore: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [14:00:07] (03CR) 10Addshore: [C: 032] Add *.finds.org.uk to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333294 (https://phabricator.wikimedia.org/T155844) (owner: 10Urbanecm) [14:00:12] Urbanecm: still here? :) [14:00:14] Present [14:00:18] G'day [14:00:19] o/ [14:00:19] yours is going first :) [14:00:26] ack [14:00:49] Just FYI, my patch is untestable (it's another no-op) so there's no real need to ping me about it [14:00:58] tto: okay! [14:00:58] I'll be here regardless [14:01:24] (03PS3) 10Addshore: Temporarily set $wgDisableUserGroupExpiry to true on labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333476 (owner: 10TTO) [14:01:30] (03CR) 10Addshore: [C: 032] Temporarily set $wgDisableUserGroupExpiry to true on labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333476 (owner: 10TTO) [14:01:39] tto: yours is going too :) [14:01:55] You'll have to wait for the beta update anyway! 
:) [14:03:16] (03Merged) 10jenkins-bot: Add *.finds.org.uk to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333294 (https://phabricator.wikimedia.org/T155844) (owner: 10Urbanecm) [14:03:27] (03CR) 10jenkins-bot: Add *.finds.org.uk to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333294 (https://phabricator.wikimedia.org/T155844) (owner: 10Urbanecm) [14:03:32] (03Merged) 10jenkins-bot: Temporarily set $wgDisableUserGroupExpiry to true on labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333476 (owner: 10TTO) [14:03:42] Urbanecm: is yours testable? [14:04:54] Urbanecm: yours is live on mwdebug1002 [14:05:10] If it is applied at both beta and prod then yes otherwise not. [14:05:23] (I have the rights to upload using the toolset only at beta) [14:05:28] addshore, ^^^ [14:05:29] 06Operations, 10Continuous-Integration-Config, 06Operations-Software-Development, 13Patch-For-Review: E901 SyntaxError: invalid syntax is wrongly raised on using python's abc by jenkins python CI linter - https://phabricator.wikimedia.org/T152950#2864426 (10hashar) 05Open>03stalled [14:05:34] (03CR) 10jenkins-bot: Temporarily set $wgDisableUserGroupExpiry to true on labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333476 (owner: 10TTO) [14:05:40] ahh okay, It will take some time to get to beta.. [14:05:50] Hi. [14:05:54] Addshore can you ping me when you're done? I'll have something to deploy later in the window. [14:06:01] Dereckson: ack [14:06:03] addshore, how many time? 
[14:07:12] Urbanecm: it doesnt look like it has broken anything so syncing now to prod [14:07:34] ack [14:07:38] !log addshore@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:333294]] Add *.finds.org.uk to wgCopyUploadsDomains (duration: 00m 41s) [14:07:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:07:46] Urbanecm: ^^ [14:07:53] thx addshore [14:07:54] (03PS3) 10Addshore: Add n, n:es and n:fr as import sources in test2wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333362 (https://phabricator.wikimedia.org/T155906) (owner: 10Urbanecm) [14:08:16] Note: I am not a sysop at test2wiki nor global transwiki importer so I can't test it. [14:09:12] (03PS2) 10Addshore: Amend category collation for de.wikisource to 'uca-de-u-kn' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333488 (https://phabricator.wikimedia.org/T155916) (owner: 10MarcoAurelio) [14:09:32] !log addshore@tin Synchronized wmf-config/InitialiseSettings-labs.php: [[gerrit:333476]] (NOOP) Temporarily set $wgDisableUserGroupExpiry to true on labs (duration: 00m 40s) [14:09:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:10:38] (03CR) 10Addshore: [C: 032] Add n, n:es and n:fr as import sources in test2wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333362 (https://phabricator.wikimedia.org/T155906) (owner: 10Urbanecm) [14:10:47] Urbanecm: ^^ your next one is going next :) [14:10:51] ack [14:11:46] (03PS1) 10Dereckson: Set site name and meta namespace for Sanskrit wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333640 (https://phabricator.wikimedia.org/T101634) [14:12:15] (03Merged) 10jenkins-bot: Add n, n:es and n:fr as import sources in test2wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333362 (https://phabricator.wikimedia.org/T155906) (owner: 10Urbanecm) [14:12:28] (03CR) 10jenkins-bot: Add n, n:es and n:fr as import sources in test2wiki [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/333362 (https://phabricator.wikimedia.org/T155906) (owner: 10Urbanecm) [14:13:00] Urbanecm: that one is live on mwdebug1002! please check :) [14:13:04] (03Abandoned) 10Hashar: (WIP) Lame rake / vcl / erb stuff (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/276733 (owner: 10Hashar) [14:13:11] addshore, did you see my note upper? [14:13:12] :) [14:13:22] ahh! [14:13:22] Note: I am not a sysop at test2wiki nor global transwiki importer so I can't test it. [14:14:00] !log Fix namespaces dupes on sa.wikisource to prepare T101634 / [[Gerrit:333640]] [14:14:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:14:06] T101634: Correction of namespace names in Sanskrit - https://phabricator.wikimedia.org/T101634 [14:14:33] Urbanecm: I am, and it looks like they appeared :) syncing now! [14:14:45] Great! [14:15:39] (03PS3) 10Addshore: Move InterwikiSortOrders to own file [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333596 (https://phabricator.wikimedia.org/T150183) [14:15:49] !log addshore@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:333362|T155906 Add n, n:es and n:fr as import sources in test2wiki]] (duration: 00m 39s) [14:15:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:15:53] T155906: Add en.wikinews, es.wikinews and fr.wikinews as import sources in test2wiki - https://phabricator.wikimedia.org/T155906 [14:15:55] Urbanecm: ^^ done [14:16:04] thx [14:16:08] Niharika: your up next! here? :) [14:16:16] addshore: Yup! [14:16:19] (03CR) 10Addshore: [C: 032] Amend category collation for de.wikisource to 'uca-de-u-kn' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333488 (https://phabricator.wikimedia.org/T155916) (owner: 10MarcoAurelio) [14:17:01] 06Operations, 10ops-eqiad, 10media-storage: Degraded RAID on ms-be1013 - https://phabricator.wikimedia.org/T155907#2961222 (10fgiunchedi) >>! 
In T155907#2958803, @Volans wrote: > Also puppet is broken because of this, is this //expected// @fgiunchedi ? Shouldn't we detect the failed disk and let puppet conti... [14:17:02] (03PS9) 10Elukey: role analytics_cluster: Linting changes [puppet] - 10https://gerrit.wikimedia.org/r/332106 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [14:17:25] (03CR) 10Addshore: [C: 032] Move InterwikiSortOrders to own file [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333596 (https://phabricator.wikimedia.org/T150183) (owner: 10Addshore) [14:17:50] (03Merged) 10jenkins-bot: Amend category collation for de.wikisource to 'uca-de-u-kn' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333488 (https://phabricator.wikimedia.org/T155916) (owner: 10MarcoAurelio) [14:18:14] Niharika: is this testable on mwdebug1002? [14:18:16] (03CR) 10jenkins-bot: Amend category collation for de.wikisource to 'uca-de-u-kn' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333488 (https://phabricator.wikimedia.org/T155916) (owner: 10MarcoAurelio) [14:18:35] addshore: Gimme a moment, all I can check is nothing is broke. [14:18:54] (03Merged) 10jenkins-bot: Move InterwikiSortOrders to own file [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333596 (https://phabricator.wikimedia.org/T150183) (owner: 10Addshore) [14:19:25] addshore: Looks sane. good to deploy everywhere. 
[14:20:18] !log depool ms-fe200[1234] T152612 [14:20:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:20:23] Niharika: syncing [14:20:23] T152612: codfw: rack/setup ms-fe200[5-8] - https://phabricator.wikimedia.org/T152612 [14:20:40] (03CR) 10jenkins-bot: Move InterwikiSortOrders to own file [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333596 (https://phabricator.wikimedia.org/T150183) (owner: 10Addshore) [14:20:57] !log addshore@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:333488|T155916 Amend category collation for de.wikisource to uca-de-u-kn]] (duration: 00m 39s) [14:21:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:21:01] T155916: German Wikisource: Alphabetical order in the categories (collation) - https://phabricator.wikimedia.org/T155916 [14:21:08] * Dereckson has added 333640 to [[Deployments]]. [14:21:13] Niharika: ^^ all done [14:21:24] addshore: Thanks! [14:21:31] RECOVERY - puppet last run on ms-be1024 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [14:22:52] addshore, hm, looks I made a mistake. Or maybe not. WIll *.domain.tld cater for domain.tld? [14:23:03] !log disabling puppet on elastic20(2[5-9]|3[0-6]) prior to reimage - T154251 [14:23:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:23:05] T154251: rack/setup/install elastic2025-2036 - https://phabricator.wikimedia.org/T154251 [14:23:16] Urbanecm: give me 2 secs for this final patch [14:23:31] !log addshore@tin Synchronized wmf-config/InterwikiSortOrders.php: [[gerrit:T150183|T150183 Move InterwikiSortOrders to own file]] PT 1/2 (duration: 00m 40s) [14:23:34] addshore, sure. 
[14:23:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:23:36] T150183: Deploy InterwikiSorting extension to production - https://phabricator.wikimedia.org/T150183 [14:24:55] !log addshore@tin Synchronized wmf-config/Wikibase.php: [[gerrit:T150183|T150183 Move InterwikiSortOrders to own file]] PT 2/2 (duration: 00m 39s) [14:24:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:25:43] (03PS2) 10Gehel: elasticsearch - configure new servers in codfw [puppet] - 10https://gerrit.wikimedia.org/r/333592 (https://phabricator.wikimedia.org/T154251) [14:25:44] Dereckson: that is me done up to the patch that you have just added! [14:26:24] (03CR) 10Elukey: [C: 032] "PCC looks good! https://puppet-compiler.wmflabs.org/5181" [puppet] - 10https://gerrit.wikimedia.org/r/332106 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [14:26:30] Urbanecm: it looks like you need to add it without the wildcard too! [14:26:46] You can see some examples further up in the array! [14:26:52] addshore, I'm going to make a patch, will you deploy it please? [14:26:57] Urbanecm: sure! [14:27:22] ok, give me a sec [14:28:13] (03PS2) 10Addshore: Set site name and meta namespace for Sanskrit wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333640 (https://phabricator.wikimedia.org/T101634) (owner: 10Dereckson) [14:28:15] (03PS2) 10Hashar: zuul: rspec tests [puppet] - 10https://gerrit.wikimedia.org/r/299151 [14:28:25] Dereckson: ^^ will you deploy that one? :) [14:28:29] (just rebased it) [14:28:34] okay, in 10 minutes [14:28:37] okay! 
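Editor's sketch for the wildcard question above (whether `*.domain.tld` also covers `domain.tld`): the snippet assumes a label-by-label comparison in which `*` stands for exactly one host label, which would be consistent with addshore's advice to list the bare domain separately. The function name `host_allowed` and the matching rule are illustrative assumptions, not a copy of MediaWiki's actual code.

```python
def host_allowed(host, allowed_domains):
    """Return True if host matches any allowed pattern, label by label.

    Assumption: '*' matches exactly one label, so '*.finds.org.uk'
    (four labels) cannot match the three-label host 'finds.org.uk'.
    """
    host_labels = host.lower().split(".")
    for pattern in allowed_domains:
        pattern_labels = pattern.lower().split(".")
        # Patterns with a different number of labels can never match.
        if len(pattern_labels) != len(host_labels):
            continue
        if all(p == "*" or p == h for p, h in zip(pattern_labels, host_labels)):
            return True
    return False


wildcard_only = ["*.finds.org.uk"]
with_bare_domain = ["*.finds.org.uk", "finds.org.uk"]

print(host_allowed("www.finds.org.uk", wildcard_only))
print(host_allowed("finds.org.uk", wildcard_only))
print(host_allowed("finds.org.uk", with_bare_domain))
```

Under that assumption the wildcard entry alone does not admit the bare domain, so both entries are needed in the list.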
[14:29:05] (03PS1) 10Urbanecm: [fix] Add finds.org.uk without wildcard too [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333650 (https://phabricator.wikimedia.org/T155844) [14:30:04] (03PS2) 10Addshore: [fix] Add finds.org.uk without wildcard too [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333650 (https://phabricator.wikimedia.org/T155844) (owner: 10Urbanecm) [14:30:11] (03PS3) 10Urbanecm: [fix] Add finds.org.uk without wildcard too [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333650 (https://phabricator.wikimedia.org/T155844) [14:30:27] (03PS3) 10Gehel: elasticsearch - configure new servers in codfw [puppet] - 10https://gerrit.wikimedia.org/r/333592 (https://phabricator.wikimedia.org/T154251) [14:31:11] PROBLEM - puppet last run on cp3032 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:31:14] (03CR) 10Addshore: [C: 032] [fix] Add finds.org.uk without wildcard too [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333650 (https://phabricator.wikimedia.org/T155844) (owner: 10Urbanecm) [14:31:26] Dereckson: just going to send this one out too ^^ [14:32:51] (03Merged) 10jenkins-bot: [fix] Add finds.org.uk without wildcard too [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333650 (https://phabricator.wikimedia.org/T155844) (owner: 10Urbanecm) [14:33:10] (03CR) 10jenkins-bot: [fix] Add finds.org.uk without wildcard too [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333650 (https://phabricator.wikimedia.org/T155844) (owner: 10Urbanecm) [14:33:36] addshore: Thanks for that! 
[14:34:29] tto: np :) [14:34:57] !log addshore@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:333294|T155844 [fix] Add finds.org.uk without wildcard too]] (duration: 00m 39s) [14:34:59] Urbanecm: ^^ [= [14:35:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:35:00] T155844: Please add to $wgCopyUploadsDomains - https://phabricator.wikimedia.org/T155844 [14:35:05] Thanks [14:35:07] Dereckson: really all yours this time :) [14:38:15] (03CR) 10Gehel: [C: 032] elasticsearch - configure new servers in codfw [puppet] - 10https://gerrit.wikimedia.org/r/333592 (https://phabricator.wikimedia.org/T154251) (owner: 10Gehel) [14:40:21] (03CR) 10Alexandros Kosiaris: [C: 031] puppet parse validate from rake [puppet] - 10https://gerrit.wikimedia.org/r/331239 (https://phabricator.wikimedia.org/T154915) (owner: 10Hashar) [14:45:15] addshore: ok [14:47:31] PROBLEM - puppet last run on ms-be1015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:49:24] Urbanecm: tto: addshore: what do you think about the Wiktionary line at https://phabricator.wikimedia.org/T101634? [14:49:31] Is the last character a : or not? [14:49:56] शः [14:50:01] (03PS1) 10Niharika29: Disable logins on loginwiki to support LoginNotify [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333653 (https://phabricator.wikimedia.org/T154064) [14:50:05] tto: That seems to have a meaning in Devanagari script [14:50:19] ok [14:50:21] If it is a ASCII colon it will be bad [14:50:30] Have you checked the Unicode code point? [14:50:58] 06Operations, 07Puppet, 10Deployment-Systems, 06Release-Engineering-Team, 05Mediawiki SWAT Deployments: mwdebug1002 should have PHP extensions - https://phabricator.wikimedia.org/T153316#2961290 (10MoritzMuehlenhoff) p:05Triage>03Normal >>! In T153316#2881845, @greg wrote: > Let's do that. It makes s... [14:51:24] How do you that? It's one of the code point you can backspace-deleted but not select. 
[14:52:06] Dereckson: Really odd, isn't it? It appears to be u+0903 [14:52:23] Dereckson, I don't think so. If I copy the character and then use arrows, it seems it is only one character. [14:52:28] (so no colon at the end) [14:52:53] (03CR) 10Niharika29: [C: 04-1] "On hold till LoginNotify is nearing deployment." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333653 (https://phabricator.wikimedia.org/T154064) (owner: 10Niharika29) [14:53:40] For the selection/backspace pattern, http://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/ is interesting [14:54:27] (03PS3) 10Dereckson: Set site name and meta namespace for Sanskrit wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333640 (https://phabricator.wikimedia.org/T101634) [14:54:34] Dereckson, for the record I pasted the text into Windows WordPad and pressed Alt+X after the last character to get the Unicode codepoint [14:54:35] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333640 (https://phabricator.wikimedia.org/T101634) (owner: 10Dereckson) [14:54:44] tto: ok [14:55:08] (03CR) 10Milimetric: "bump on this, should I add someone else? 
I did reply to a concern about security review: https://phabricator.wikimedia.org/T125403#295421" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333086 (https://phabricator.wikimedia.org/T125403) (owner: 10Milimetric) [14:55:57] (03Merged) 10jenkins-bot: Set site name and meta namespace for Sanskrit wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333640 (https://phabricator.wikimedia.org/T101634) (owner: 10Dereckson) [14:56:33] Live on mwdebug1002 [14:57:36] https://sa.wikisource.org/wiki/%E0%A4%B5%E0%A4%BF%E0%A4%95%E0%A4%BF%E0%A4%B8%E0%A5%8D%E0%A4%B0%E0%A5%8B%E0%A4%A4%E0%A4%83:Foo [14:57:48] We see the difference between the ':' and the तः [14:58:07] (03CR) 10jenkins-bot: Set site name and meta namespace for Sanskrit wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333640 (https://phabricator.wikimedia.org/T101634) (owner: 10Dereckson) [14:58:18] And that's coherent with https://sa.wikisource.org/wiki/%E0%A4%B8%E0%A4%9E%E0%A5%8D%E0%A4%9A%E0%A4%BF%E0%A4%95%E0%A4%BE:Wikisource-logo-sa.svg logo [14:58:24] So yes, it looks good to me. 
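Editor's sketch for the code point check discussed above: instead of the WordPad Alt+X trick, the last character of a string can be inspected with Python's standard `unicodedata` module. The helper name `describe_last_char` is hypothetical, and the sample string is assumed to be the two characters pasted in the channel (U+0936 followed by U+0903).

```python
import unicodedata


def describe_last_char(text):
    """Return (code point, Unicode name) for the last character of text."""
    last = text[-1]
    return "U+%04X" % ord(last), unicodedata.name(last)


# Assumed sample from the channel: DEVANAGARI LETTER SHA + the trailing sign.
sample = "\u0936\u0903"

print(describe_last_char(sample))   # the character in question
print(describe_last_char("Foo:"))   # an ASCII colon, for comparison
```

If the trailing character reports as U+0903 rather than U+003A, it is the Devanagari visarga and not an ASCII colon, which matches what tto found.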
[14:58:27] 06Operations, 10Monitoring: Fix permissions for systemd file - https://phabricator.wikimedia.org/T155869#2957345 (10MoritzMuehlenhoff) That's a generic bug in base::service_unit, the rest for $initscript in service_unit.pp:108 needs to be extended to also cover system_override [14:58:36] 06Operations, 10Monitoring: Fix permissions for systemd file - https://phabricator.wikimedia.org/T155869#2961319 (10MoritzMuehlenhoff) p:05Triage>03Normal [15:00:07] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Set site name and meta namespace for Sanskrit wikis (T101634) (duration: 00m 40s) [15:00:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:00:11] T101634: Correction of namespace names in Sanskrit - https://phabricator.wikimedia.org/T101634 [15:00:21] RECOVERY - puppet last run on cp3032 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [15:01:22] 06Operations, 10ops-codfw, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic2025-2036 - https://phabricator.wikimedia.org/T154251#2905389 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by oblivian on neodymium.eqiad.wmnet for hosts: ``` ['elastic2025.codfw.wmnet'] ``` T... [15:02:09] !log EU SWAT done (handled by addshore) [15:02:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:05:05] 06Operations, 10ops-eqiad, 13Patch-For-Review, 15User-Joe: Decommission mw1152 - https://phabricator.wikimedia.org/T149185#2961351 (10Joe) @Cmjohnson any news on this? [15:05:29] (03PS1) 10Alexandros Kosiaris: Remove redundant calls to standard from install_server [puppet] - 10https://gerrit.wikimedia.org/r/333658 [15:08:21] PROBLEM - puppet last run on elastic1022 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:09:02] 06Operations, 06Performance-Team, 10Traffic: Support brotli compression - https://phabricator.wikimedia.org/T137979#2385879 (10Eustaz) FYI: SDCH is on its way to [[ https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/nQl0ORHy7sw | unshippment ]]. [15:09:17] (03PS3) 10Andrew Bogott: wmf_sink: Remove all ldap handling [puppet] - 10https://gerrit.wikimedia.org/r/333098 (https://phabricator.wikimedia.org/T148781) [15:10:09] (03CR) 10Alexandros Kosiaris: [C: 032] Remove redundant calls to standard from install_server [puppet] - 10https://gerrit.wikimedia.org/r/333658 (owner: 10Alexandros Kosiaris) [15:10:15] (03PS2) 10Alexandros Kosiaris: Remove redundant calls to standard from install_server [puppet] - 10https://gerrit.wikimedia.org/r/333658 [15:10:28] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] "noop in https://puppet-compiler.wmflabs.org/5182/" [puppet] - 10https://gerrit.wikimedia.org/r/333658 (owner: 10Alexandros Kosiaris) [15:10:38] !installing mysql 5.5 security updates (as packaged by jessie/trusty, not the internal mariadb packages) [15:10:56] (03CR) 10Alexandros Kosiaris: [C: 04-2] "Actually, the inclusion of standard was wrong in the first place. 
Fixed in https://gerrit.wikimedia.org/r/#/c/333658/" [puppet] - 10https://gerrit.wikimedia.org/r/331635 (owner: 10Hashar) [15:11:30] (03Abandoned) 10Hashar: apt: skip commenting non existent old comment [puppet] - 10https://gerrit.wikimedia.org/r/325570 (owner: 10Hashar) [15:11:53] (03CR) 10Andrew Bogott: [C: 032] wmf_sink: Remove all ldap handling [puppet] - 10https://gerrit.wikimedia.org/r/333098 (https://phabricator.wikimedia.org/T148781) (owner: 10Andrew Bogott) [15:12:01] (03PS4) 10Andrew Bogott: wmf_sink: Remove all ldap handling [puppet] - 10https://gerrit.wikimedia.org/r/333098 (https://phabricator.wikimedia.org/T148781) [15:12:09] (03CR) 10Alexandros Kosiaris: [C: 032] mirrors: fix spec [puppet] - 10https://gerrit.wikimedia.org/r/331639 (owner: 10Hashar) [15:12:14] (03PS3) 10Alexandros Kosiaris: mirrors: fix spec [puppet] - 10https://gerrit.wikimedia.org/r/331639 (owner: 10Hashar) [15:12:18] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] mirrors: fix spec [puppet] - 10https://gerrit.wikimedia.org/r/331639 (owner: 10Hashar) [15:13:23] (03PS5) 10Andrew Bogott: wmf_sink: Remove all ldap handling [puppet] - 10https://gerrit.wikimedia.org/r/333098 (https://phabricator.wikimedia.org/T148781) [15:14:47] (03CR) 10Filippo Giunchedi: [C: 031] puppet parse validate from rake [puppet] - 10https://gerrit.wikimedia.org/r/331239 (https://phabricator.wikimedia.org/T154915) (owner: 10Hashar) [15:15:18] !log reimage elastic2025 - T154251 [15:15:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:15:21] T154251: rack/setup/install elastic2025-2036 - https://phabricator.wikimedia.org/T154251 [15:15:23] 06Operations, 10ops-eqiad, 10DBA: Move db1051 to row D2 - https://phabricator.wikimedia.org/T156004#2961384 (10Marostegui) [15:15:31] RECOVERY - puppet last run on ms-be1015 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [15:15:40] (03Abandoned) 10Hashar: install_server: mock standard class for tests [puppet] - 
10https://gerrit.wikimedia.org/r/331635 (owner: 10Hashar) [15:16:25] (03Abandoned) 10Hashar: (DO NOT SUBMIT) Octopus merge of spec fixes [puppet] - 10https://gerrit.wikimedia.org/r/331850 (owner: 10Hashar) [15:16:39] 06Operations, 10DBA: Reimage db1065 and db1066 - https://phabricator.wikimedia.org/T156005#2961400 (10Marostegui) [15:17:36] !log Fixed namespaces dupes following NS_PROJECT update on sa.wikisource (T101634) [15:17:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:17:39] T101634: Correction of namespace names in Sanskrit - https://phabricator.wikimedia.org/T101634 [15:17:42] !log installing pdns-recursor security update on labservices1002 [15:17:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:18:07] moritzm: you missed the previous one: 15:10:37 <@moritzm> !installing mysql 5.5 ... [15:18:25] Dereckson: thanks! [15:18:28] !log installing mysql 5.5 security updates (as packaged by jessie/trusty, not the internal mariadb packages) [15:18:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:19:23] !log whitelisted dbproxy1011 on cr1/cr2 for analytics-in4 input filter [15:19:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:20:45] moritzm, where is there a mysql 5.5 ? [15:20:49] 06Operations, 10ops-codfw, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic2025-2036 - https://phabricator.wikimedia.org/T154251#2961415 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic2025.codfw.wmnet'] ``` and were **ALL** successful. 
[15:20:52] 06Operations, 10ops-eqiad, 10DBA: Move db1051 to row D2 - https://phabricator.wikimedia.org/T156004#2961418 (10Marostegui) [15:21:50] 06Operations, 10ops-eqiad, 10DBA: Move db1052 to row B3 - https://phabricator.wikimedia.org/T156006#2961422 (10Marostegui) [15:22:34] (03PS15) 10Hashar: Modification of Rakefile spec entry point [puppet] - 10https://gerrit.wikimedia.org/r/282484 (https://phabricator.wikimedia.org/T78342) (owner: 10Nicko) [15:22:36] (03PS10) 10Hashar: Use rake tasks to run modules spec [puppet] - 10https://gerrit.wikimedia.org/r/307223 [15:22:38] (03PS5) 10Hashar: Jenkins integration of rspec [puppet] - 10https://gerrit.wikimedia.org/r/331856 (https://phabricator.wikimedia.org/T78342) [15:22:40] jynus: yubiauth servers, bacula web UI, piwik and some service on labtestcontrol I don't remember [15:22:58] interesting [15:23:11] shall I open tickets for those? [15:23:25] to evaluate migrating it to the standard mysql clusters? [15:23:31] not really [15:23:34] ok [15:23:46] I just didn't know they exist [15:24:23] upstream mysql should be the standard [15:24:38] 06Operations, 10DBA, 10netops: Switchover s1 master db1057 -> db1052 - https://phabricator.wikimedia.org/T156008#2961450 (10Marostegui) [15:24:47] 06Operations, 10DBA, 06Labs, 10netops: DBA plan to mitigate asw-c2-eqiad reboots - https://phabricator.wikimedia.org/T155999#2961118 (10Marostegui) [15:24:51] (03PS1) 10Andrew Bogott: Revert "wmf_sink: Remove all ldap handling" [puppet] - 10https://gerrit.wikimedia.org/r/333660 [15:25:18] I upgraded our package recently, too, BTW (see packages log) [15:26:22] (03CR) 10Filippo Giunchedi: "LGTM modulo metric name" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/331623 (owner: 10Alexandros Kosiaris) [15:27:39] 06Operations, 10ops-codfw, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic2025-2036 - https://phabricator.wikimedia.org/T154251#2961482 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by 
oblivian on neodymium.eqiad.wmnet for hosts: ``` ['elastic2026.codfw.wmnet', 'elas... [15:29:13] 06Operations, 05DC-Switchover-Prep-Q3-2016-17, 07Epic, 07Wikimedia-Multiple-active-datacenters: Create an etcd cluster in codfw - https://phabricator.wikimedia.org/T156009#2961483 (10Joe) [15:29:46] 06Operations, 10DBA: Reimage db1065 and db1066 - https://phabricator.wikimedia.org/T156005#2961499 (10jcrespo) a:03jcrespo [15:29:55] 06Operations, 10ops-eqiad: ms-be1016 controller cache failure - https://phabricator.wikimedia.org/T150206#2961501 (10Cmjohnson) @robh: The s/n MXQ50702Q5 for ms-be1016 is not showing up as having a contract or a warranty. Could you look into this please. Thanks [15:30:20] (03PS1) 10Andrew Bogott: wmfsink: Only handle delete messages [puppet] - 10https://gerrit.wikimedia.org/r/333661 (https://phabricator.wikimedia.org/T148781) [15:30:35] 06Operations, 10ops-eqiad, 06Labs, 10Labs-Infrastructure, 07Wikimedia-Incident: Replace fans (or paste) on labservices1001 - https://phabricator.wikimedia.org/T154391#2961504 (10Cmjohnson) @andrew any other issues with this? Can we close the task? [15:30:44] 06Operations, 05DC-Switchover-Prep-Q3-2016-17, 07Epic, 15User-Joe, 07Wikimedia-Multiple-active-datacenters: Create an etcd cluster in codfw - https://phabricator.wikimedia.org/T156009#2961483 (10Joe) [15:31:09] (03CR) 10Andrew Bogott: [C: 04-2] "Probably no need to revert, this should be addressed by https://gerrit.wikimedia.org/r/#/c/333661/" [puppet] - 10https://gerrit.wikimedia.org/r/333660 (owner: 10Andrew Bogott) [15:31:16] (03CR) 10Andrew Bogott: [C: 032] wmfsink: Only handle delete messages [puppet] - 10https://gerrit.wikimedia.org/r/333661 (https://phabricator.wikimedia.org/T148781) (owner: 10Andrew Bogott) [15:33:28] 06Operations, 10ops-eqiad, 13Patch-For-Review: rack and set up aqs100[7-9] - https://phabricator.wikimedia.org/T155654#2961507 (10Cmjohnson) @elukey, the vlan wasn't set in the switch. Please try again at your convenience. 
Thanks [15:33:51] (03PS1) 10Marostegui: x1.hosts: Remove old hosts [software] - 10https://gerrit.wikimedia.org/r/333662 [15:34:05] RECOVERY - Elasticsearch HTTPS on elastic2025 is OK: SSL OK - Certificate elastic2025.codfw.wmnet valid until 2022-01-22 15:23:02 +0000 (expires in 1824 days) [15:35:29] (03CR) 10Marostegui: [C: 032] x1.hosts: Remove old hosts [software] - 10https://gerrit.wikimedia.org/r/333662 (owner: 10Marostegui) [15:35:35] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[zotero/translators],Package[zotero/translation-server],Exec[chown /srv/deployment/zotero for deploy-service] [15:36:18] (03Merged) 10jenkins-bot: x1.hosts: Remove old hosts [software] - 10https://gerrit.wikimedia.org/r/333662 (owner: 10Marostegui) [15:36:25] RECOVERY - puppet last run on elastic1022 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [15:38:05] !log Alter tables: flow_topic_list and flow_tree_node on db1031 (x1 master) - T149819 [15:38:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:38:10] T149819: Add primary keys to remaining Flow tables - https://phabricator.wikimedia.org/T149819 [15:41:47] 06Operations, 10ops-eqiad, 10DBA: Move db1051 to row D2 - https://phabricator.wikimedia.org/T156004#2961542 (10faidon) D 2 is a potentially bad choice, as it will soon become a 10G rack once the T148506 migration is (finally…) done. Any other rack except D 2 & D 7 (new 10G) and D 6-8 (current 10G) is probabl... 
[15:42:53] (03PS1) 10Gehel: elasticsearch - notifiy nginx of SSL certificate changes [puppet] - 10https://gerrit.wikimedia.org/r/333664 [15:44:00] (03PS1) 10Alexandros Kosiaris: WIP: redis monitoring: Use the correct variable [puppet] - 10https://gerrit.wikimedia.org/r/333665 [15:44:32] 06Operations, 10DBA, 06Labs, 10netops: DBA plan to mitigate asw-c2-eqiad reboots - https://phabricator.wikimedia.org/T155999#2961547 (10Marostegui) [15:44:36] 06Operations, 10ops-eqiad, 10DBA: Move db1052 to row B3 - https://phabricator.wikimedia.org/T156006#2961546 (10Marostegui) 05stalled>03Open [15:46:53] 06Operations, 10ops-eqiad, 10DBA: Move db1051 to row D2 - https://phabricator.wikimedia.org/T156004#2961548 (10Marostegui) Thanks Faidon - let's go for B3 then? [15:48:27] 06Operations, 10ops-codfw, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic2025-2036 - https://phabricator.wikimedia.org/T154251#2961583 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic2026.codfw.wmnet', 'elastic2027.codfw.wmnet'] ``` Of which those **FAILED**... 
[15:48:38] 06Operations, 10ops-eqiad, 10DBA: Move db1051 to row B3 - https://phabricator.wikimedia.org/T156004#2961584 (10Marostegui) [15:48:52] 06Operations, 10ops-eqiad, 10DBA: Move db1051 to row B3 - https://phabricator.wikimedia.org/T156004#2961586 (10faidon) Sure, no objections from a networking/DC design perspective :) [15:51:37] 06Operations, 10ops-eqiad, 10DBA: Move db1052 to row B3 - https://phabricator.wikimedia.org/T156006#2961587 (10jcrespo) ``` marostegui and jynus plan for 1400UTC ``` [15:54:30] (03PS2) 10Alexandros Kosiaris: WIP: redis monitoring: Use the correct variable [puppet] - 10https://gerrit.wikimedia.org/r/333665 [15:54:55] RECOVERY - Elasticsearch HTTPS on elastic2026 is OK: SSL OK - Certificate elastic2026.codfw.wmnet valid until 2022-01-22 15:53:52 +0000 (expires in 1824 days) [15:55:15] RECOVERY - Elasticsearch HTTPS on elastic2027 is OK: SSL OK - Certificate elastic2027.codfw.wmnet valid until 2022-01-22 15:53:57 +0000 (expires in 1824 days) [15:58:50] (03PS1) 10Giuseppe Lavagetto: profile::base::labs: do not require profile::base [puppet] - 10https://gerrit.wikimedia.org/r/333667 [15:59:00] <_joe_> andrewbogott: ^^ [15:59:16] 06Operations, 06Labs, 10Labs-Infrastructure, 07Wikimedia-Incident: labservices1001 down, suspected overheating - https://phabricator.wikimedia.org/T152340#2961652 (10Andrew) [15:59:19] 06Operations, 10ops-eqiad, 06Labs, 10Labs-Infrastructure, 07Wikimedia-Incident: Replace fans (or paste) on labservices1001 - https://phabricator.wikimedia.org/T154391#2961650 (10Andrew) 05Open>03Resolved The box has been solid since you worked on it. Unfortunately, the issue we seek to fix is VERY r... [15:59:44] _joe_: is the stuff from base included elsewhere? 
[15:59:52] <_joe_> yes [16:00:00] <_joe_> role::labs::instance includes standard [16:00:00] ok then :) [16:00:06] <_joe_> that includes profile::base [16:00:12] (03CR) 10Andrew Bogott: [C: 032] profile::base::labs: do not require profile::base [puppet] - 10https://gerrit.wikimedia.org/r/333667 (owner: 10Giuseppe Lavagetto) [16:01:06] <_joe_> sorry, this is honestly black puppet magic :P [16:01:16] 06Operations, 10ops-codfw, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic2025-2036 - https://phabricator.wikimedia.org/T154251#2961654 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by oblivian on neodymium.eqiad.wmnet for hosts: ``` ['elastic2027.codfw.wmnet'] ``` T... [16:01:17] <_joe_> didn't expect a change of distros to cause this [16:01:28] (03PS3) 10Hashar: wmflib: switch to puppetlabs_spec_helper/rake_tasks [puppet] - 10https://gerrit.wikimedia.org/r/332475 [16:02:41] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [16:04:02] (03CR) 10jerkins-bot: [V: 04-1] wmflib: switch to puppetlabs_spec_helper/rake_tasks [puppet] - 10https://gerrit.wikimedia.org/r/332475 (owner: 10Hashar) [16:04:52] (03PS4) 10Tim Landscheidt: Tools: Disable automatic backups of aptly repositories [puppet] - 10https://gerrit.wikimedia.org/r/328031 (https://phabricator.wikimedia.org/T150726) [16:05:49] (03CR) 10BBlack: [C: 04-1] "Can we use something other than 443, so we don't run into the same problem with https conflicts? We have the same basic issue on new cach" [puppet] - 10https://gerrit.wikimedia.org/r/333247 (owner: 10Filippo Giunchedi) [16:06:09] (03PS6) 10BBlack: TLS: reduce scope of stream.wm.o redirect exception [puppet] - 10https://gerrit.wikimedia.org/r/328193 (https://phabricator.wikimedia.org/T143925) [16:07:40] cmjohnson1: thanks for aqs1009! [16:08:41] PROBLEM - configured eth on labstore1004 is CRITICAL: eth1 reporting no carrier. 
[16:08:51] 06Operations, 10ops-eqiad, 10DBA: Move db1052 to row B3 - https://phabricator.wikimedia.org/T156006#2961687 (10Marostegui) Reminder: - we need to silence labsdb1009, 1010 and 1011 as they replicate from it. - we need to stop replication on them - once it is back, we need to repoint them to the new IP [16:13:44] 06Operations, 10ops-eqiad, 10DBA: Move db1052 to row B3 - https://phabricator.wikimedia.org/T156006#2961719 (10jcrespo) > we need to silence labsdb1009, 1010 and 1011 as they replicate from it. But they replicate from db1094 :-/ [16:14:28] 06Operations, 10ops-eqiad, 10DBA: Move db1052 to row B3 - https://phabricator.wikimedia.org/T156006#2961733 (10Marostegui) >>! In T156006#2961719, @jcrespo wrote: >> we need to silence labsdb1009, 1010 and 1011 as they replicate from it. > > But they replicate from db1094 :-/ Gah, sorry! Yes, I meant db109... [16:16:34] ^me attempting to turn this over to a /30 private space but I'm going to shutdown and look at this later [16:16:41] RECOVERY - configured eth on labstore1004 is OK: OK - interfaces up [16:18:26] (03CR) 10BBlack: [C: 032] TLS: reduce scope of stream.wm.o redirect exception [puppet] - 10https://gerrit.wikimedia.org/r/328193 (https://phabricator.wikimedia.org/T143925) (owner: 10BBlack) [16:20:39] 06Operations, 10ops-codfw, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic2025-2036 - https://phabricator.wikimedia.org/T154251#2961755 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic2027.codfw.wmnet'] ``` and were **ALL** successful. [16:22:01] PROBLEM - puppet last run on cp2015 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:22:01] PROBLEM - puppet last run on cp1065 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures.
Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:22:11] PROBLEM - puppet last run on cp3031 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:22:31] PROBLEM - puppet last run on cp2019 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:22:40] 06Operations, 10DBA, 13Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#2961764 (10jcrespo) [16:22:43] 06Operations, 10DBA: Reimage db1065 and db1066 - https://phabricator.wikimedia.org/T156005#2961763 (10jcrespo) [16:22:51] PROBLEM - puppet last run on mw1198 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:23:03] 06Operations, 10ops-codfw, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic2025-2036 - https://phabricator.wikimedia.org/T154251#2961765 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by oblivian on neodymium.eqiad.wmnet for hosts: ``` ['elastic2028.codfw.wmnet'] ``` T... [16:23:09] 06Operations, 10MediaWiki-Configuration, 06Performance-Team, 05DC-Switchover-Prep-Q3-2016-17, and 6 others: Expand conftool to support multiple objects via a schema definition. - https://phabricator.wikimedia.org/T155823#2961766 (10Joe) [16:23:43] PROBLEM - puppet last run on cp2017 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:23:52] PROBLEM - puppet last run on cp3046 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:24:52] PROBLEM - puppet last run on cp1055 is CRITICAL: CRITICAL: Puppet has 1 failures. 
Last run 1 minute ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:25:12] PROBLEM - puppet last run on cp3045 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:26:02] PROBLEM - puppet last run on cp3039 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:26:41] bleh [16:27:02] PROBLEM - puppet last run on cp2002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:27:02] PROBLEM - puppet last run on cp4009 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:27:32] PROBLEM - puppet last run on cp1068 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:27:32] PROBLEM - puppet last run on cp1054 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:27:52] PROBLEM - puppet last run on cp2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:28:02] PROBLEM - puppet last run on cp3048 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:28:32] PROBLEM - puppet last run on cp1053 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:28:42] (03PS1) 10BBlack: syntax fixup for 3ca8aeec (double-quote) [puppet] - 10https://gerrit.wikimedia.org/r/333672 [16:28:52] PROBLEM - puppet last run on cp2013 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:28:52] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:28:54] (03CR) 10BBlack: [V: 032 C: 032] syntax fixup for 3ca8aeec (double-quote) [puppet] - 10https://gerrit.wikimedia.org/r/333672 (owner: 10BBlack) [16:29:02] PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:29:02] PROBLEM - puppet last run on cp4016 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:29:02] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:29:47] (03CR) 10Ottomata: [C: 031] Remove otto and elukey from eventlogging-admins [puppet] - 10https://gerrit.wikimedia.org/r/333242 (https://phabricator.wikimedia.org/T142836) (owner: 10Muehlenhoff) [16:30:02] PROBLEM - puppet last run on cp1064 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:30:12] PROBLEM - puppet last run on cp3036 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:30:52] PROBLEM - puppet last run on cp4013 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:31:03] !log shutting down mw2098 for maintenance [16:31:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:31:12] PROBLEM - puppet last run on cp4018 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:31:12] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:31:32] PROBLEM - puppet last run on cp1048 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:32:12] PROBLEM - puppet last run on cp3032 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file-frontend] [16:34:32] bblack: I think that there might be an issue with your last patch --^ [16:35:06] it says VCL compilation failure, but not sure if it is temporary or not [16:36:06] ahhh I saw the last code reviews, nevermind [16:36:30] works fine now, just tried on 1048 :) [16:36:32] RECOVERY - puppet last run on cp1048 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [16:37:01] yeah [16:38:02] RECOVERY - puppet last run on cp1065 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [16:40:18] (03CR) 10Fjalapeno: [C: 031] Text VCL: consolidate mobile hostname rewrite regex [puppet] - 10https://gerrit.wikimedia.org/r/333158 (https://phabricator.wikimedia.org/T155504) (owner: 10Ema) [16:41:14] (03CR) 10Fjalapeno: [C: 031] "@_joe_ does this look good to you?" [puppet] - 10https://gerrit.wikimedia.org/r/333158 (https://phabricator.wikimedia.org/T155504) (owner: 10Ema) [16:42:39] 06Operations, 10ops-codfw, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic2025-2036 - https://phabricator.wikimedia.org/T154251#2961806 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic2028.codfw.wmnet'] ``` and were **ALL** successful. 
[16:43:28] (03PS1) 10Dzahn: aptrepo: setup rsync between 2 APT servers [puppet] - 10https://gerrit.wikimedia.org/r/333676 (https://phabricator.wikimedia.org/T84380) [16:44:59] (03PS3) 10Alexandros Kosiaris: redis monitoring: Use the correct variable [puppet] - 10https://gerrit.wikimedia.org/r/333665 [16:45:01] (03PS1) 10Alexandros Kosiaris: redis monitoring: Add a more descriptive description [puppet] - 10https://gerrit.wikimedia.org/r/333677 [16:46:42] RECOVERY - Elasticsearch HTTPS on elastic2028 is OK: SSL OK - Certificate elastic2028.codfw.wmnet valid until 2022-01-22 16:44:51 +0000 (expires in 1824 days) [16:49:48] 06Operations, 05DC-Switchover-Prep-Q3-2016-17, 07Epic, 07Wikimedia-Multiple-active-datacenters: Prepare and improve the datacenter switchover procedure - https://phabricator.wikimedia.org/T154658#2961828 (10Joe) @Gilles will do today or tomorrow [16:50:02] RECOVERY - puppet last run on cp2015 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [16:50:12] (03PS1) 10Hashar: squid3: switch to puppetlabs_spec_helper/rake_tasks [puppet] - 10https://gerrit.wikimedia.org/r/333678 [16:50:12] RECOVERY - puppet last run on cp3031 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [16:50:32] RECOVERY - puppet last run on cp2019 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [16:51:31] 06Operations, 05DC-Switchover-Prep-Q3-2016-17, 07Epic, 07Wikimedia-Multiple-active-datacenters: Check the size of every cluster in codfw to see if it matches eqiad's capacity - https://phabricator.wikimedia.org/T156023#2961839 (10Joe) [16:51:42] RECOVERY - puppet last run on cp2017 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [16:51:52] RECOVERY - puppet last run on mw1198 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:51:53] RECOVERY - puppet last run on cp3046 is OK: OK: Puppet is currently enabled, last run 5 seconds ago 
with 0 failures [16:52:12] RECOVERY - puppet last run on cp3045 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [16:53:28] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] redis monitoring: Use the correct variable [puppet] - 10https://gerrit.wikimedia.org/r/333665 (owner: 10Alexandros Kosiaris) [16:53:33] (03PS4) 10Alexandros Kosiaris: redis monitoring: Use the correct variable [puppet] - 10https://gerrit.wikimedia.org/r/333665 [16:53:37] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] redis monitoring: Use the correct variable [puppet] - 10https://gerrit.wikimedia.org/r/333665 (owner: 10Alexandros Kosiaris) [16:53:52] RECOVERY - puppet last run on cp1055 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [16:53:59] (03PS2) 10Alexandros Kosiaris: redis monitoring: Add a more descriptive description [puppet] - 10https://gerrit.wikimedia.org/r/333677 [16:54:02] RECOVERY - puppet last run on cp3039 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [16:54:06] (03CR) 10jerkins-bot: [V: 04-1] aptrepo: setup rsync between 2 APT servers [puppet] - 10https://gerrit.wikimedia.org/r/333676 (https://phabricator.wikimedia.org/T84380) (owner: 10Dzahn) [16:54:08] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] redis monitoring: Add a more descriptive description [puppet] - 10https://gerrit.wikimedia.org/r/333677 (owner: 10Alexandros Kosiaris) [16:55:02] RECOVERY - puppet last run on cp4009 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [16:55:32] RECOVERY - puppet last run on cp1054 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [16:55:32] RECOVERY - puppet last run on cp1053 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [16:55:52] RECOVERY - puppet last run on cp2013 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [16:56:02] RECOVERY - puppet last run on cp2002 is 
OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [16:56:02] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [16:56:26] RECOVERY - puppet last run on cp1068 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [16:56:56] RECOVERY - puppet last run on cp2001 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [16:56:56] RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [16:56:56] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [16:57:06] RECOVERY - puppet last run on cp4016 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [16:57:06] RECOVERY - puppet last run on cp3048 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [16:57:07] 06Operations, 06Analytics-Kanban: Periodic 500s from piwik.wikimedia.org - https://phabricator.wikimedia.org/T154558#2961899 (10Nuria) [16:57:16] RECOVERY - puppet last run on cp3036 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [16:57:56] RECOVERY - puppet last run on cp4013 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [16:58:06] RECOVERY - puppet last run on cp1064 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [16:59:11] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [16:59:11] RECOVERY - puppet last run on cp3032 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [16:59:11] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [17:00:04] godog, moritzm, and _joe_: Dear anthropoid, the time has come. 
Please deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170123T1700). [17:00:31] PROBLEM - Redis replication status tcp_6379 on oresrdb1002 is CRITICAL: Return code of 255 is out of bounds [17:00:51] PROBLEM - Redis replication status tcp_6380 on oresrdb1002 is CRITICAL: Return code of 255 is out of bounds [17:01:26] (03CR) 10Hashar: [C: 04-1] "Time for me to come back on this change :-} Will hopefully have something cleaned up this week!" [puppet] - 10https://gerrit.wikimedia.org/r/327695 (https://phabricator.wikimedia.org/T150771) (owner: 10Dzahn) [17:09:25] 06Operations, 10Analytics, 10netops, 13Patch-For-Review: Open temporary access from analytics vlan to new-labsdb one - https://phabricator.wikimedia.org/T155487#2944637 (10Nuria) ping @elukey is this a duplicate [17:09:36] 06Operations, 06Analytics-Kanban, 10netops, 13Patch-For-Review: Open temporary access from analytics vlan to new-labsdb one - https://phabricator.wikimedia.org/T155487#2961934 (10Nuria) [17:10:35] brion, did you see my last comment? https://phabricator.wikimedia.org/T155750 [17:11:02] is it part of the same issue? or should I open another report? [17:14:36] (03PS2) 10Alexandros Kosiaris: squid3: switch to puppetlabs_spec_helper/rake_tasks [puppet] - 10https://gerrit.wikimedia.org/r/333678 (owner: 10Hashar) [17:14:43] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] squid3: switch to puppetlabs_spec_helper/rake_tasks [puppet] - 10https://gerrit.wikimedia.org/r/333678 (owner: 10Hashar) [17:15:10] 06Operations, 10Pybal, 10Traffic, 15User-Joe: Pybal not happy with DNS delays - https://phabricator.wikimedia.org/T154759#2961957 (10Joe) [17:18:25] 06Operations, 07Puppet, 07Epic, 07Need-volunteer, 13Patch-For-Review: align puppet-lint config with coding style - https://phabricator.wikimedia.org/T93645#2961964 (10scfc) It would be nice if we had a `puppet-lint` check for the full names and trailing commas that @Juniorsys just fixed. 
[17:21:11] (03PS2) 10Alexandros Kosiaris: redis: Allow specifying password for monitoring [puppet] - 10https://gerrit.wikimedia.org/r/332436 [17:24:20] 06Operations, 06Analytics-Kanban, 10netops, 13Patch-For-Review: Open temporary access from analytics vlan to new-labsdb one - https://phabricator.wikimedia.org/T155487#2961977 (10elukey) @Nuria no sorry this was probably the good one, I commented in https://phabricator.wikimedia.org/T155658... Sorry @JAlle... [17:26:12] 06Operations, 10Traffic: Enable Service in Asia Cache DC - https://phabricator.wikimedia.org/T156026#2961980 (10BBlack) [17:26:20] 06Operations, 10Traffic: Configuration for Asia Cache DC hosts - https://phabricator.wikimedia.org/T156027#2961994 (10BBlack) [17:26:23] 06Operations, 10Traffic: Name Asia Cache DC site - https://phabricator.wikimedia.org/T156028#2962007 (10BBlack) [17:26:26] 06Operations, 10Traffic: Select location for Asia Cache DC - https://phabricator.wikimedia.org/T156029#2962020 (10BBlack) [17:26:31] 06Operations, 10Traffic: Turn up network links for Asia Cache DC - https://phabricator.wikimedia.org/T156031#2962044 (10BBlack) [17:26:34] 06Operations, 10Traffic: Hardware installation for Asia Cache DC - https://phabricator.wikimedia.org/T156032#2962057 (10BBlack) [17:26:37] 06Operations, 10Traffic: Hardware purchasing for Asia Cache DC - https://phabricator.wikimedia.org/T156033#2962070 (10BBlack) [17:26:53] PROBLEM - restbase endpoints health on restbase2008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[17:27:02] 06Operations, 10Traffic: Configuration for Asia Cache DC hosts - https://phabricator.wikimedia.org/T156027#2962084 (10BBlack) [17:27:03] PROBLEM - cassandra-b SSL 10.192.16.177:7001 on restbase2007 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused [17:27:05] 06Operations, 10Traffic: Enable Service in Asia Cache DC - https://phabricator.wikimedia.org/T156026#2962083 (10BBlack) [17:27:28] 06Operations, 10Traffic: Name Asia Cache DC site - https://phabricator.wikimedia.org/T156028#2962086 (10BBlack) [17:27:31] 06Operations, 10Traffic: Configuration for Asia Cache DC hosts - https://phabricator.wikimedia.org/T156027#2961994 (10BBlack) [17:27:43] RECOVERY - restbase endpoints health on restbase2008 is OK: All endpoints are healthy [17:27:53] 06Operations, 10Traffic: Select location for Asia Cache DC - https://phabricator.wikimedia.org/T156029#2962088 (10BBlack) [17:27:56] 06Operations, 10Traffic: Name Asia Cache DC site - https://phabricator.wikimedia.org/T156028#2962007 (10BBlack) [17:28:04] PROBLEM - cassandra-b CQL 10.192.16.177:9042 on restbase2007 is CRITICAL: connect to address 10.192.16.177 and port 9042: Connection refused [17:29:03] PROBLEM - cassandra-a SSL 10.192.32.137:7001 on restbase2004 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused [17:29:03] PROBLEM - Check systemd state on restbase2007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [17:29:33] PROBLEM - cassandra-a CQL 10.192.32.137:9042 on restbase2004 is CRITICAL: connect to address 10.192.32.137 and port 9042: Connection refused [17:29:33] PROBLEM - Check systemd state on restbase2004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[17:29:33] PROBLEM - cassandra-b service on restbase2007 is CRITICAL: CRITICAL - Expecting active but unit cassandra-b is failed [17:29:33] PROBLEM - cassandra-a service on restbase2004 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed [17:30:37] 06Operations, 10Traffic: Select location for Asia Cache DC - https://phabricator.wikimedia.org/T156029#2962108 (10BBlack) [17:31:33] RECOVERY - Check systemd state on restbase2004 is OK: OK - running: The system is fully operational [17:31:33] RECOVERY - cassandra-a service on restbase2004 is OK: OK - cassandra-a is active [17:31:34] 06Operations, 10Traffic: Select location for Asia Cache DC - https://phabricator.wikimedia.org/T156029#2962020 (10BBlack) [17:31:37] 06Operations, 10Traffic: Turn up network links for Asia Cache DC - https://phabricator.wikimedia.org/T156031#2962111 (10BBlack) [17:33:03] RECOVERY - cassandra-a SSL 10.192.32.137:7001 on restbase2004 is OK: SSL OK - Certificate restbase2004-a valid until 2017-09-12 15:35:23 +0000 (expires in 231 days) [17:33:12] <_joe_> uh what's up with cassandra? 
[17:33:33] RECOVERY - cassandra-a CQL 10.192.32.137:9042 on restbase2004 is OK: TCP OK - 0.039 second response time on 10.192.32.137 port 9042 [17:33:40] 06Operations, 10Traffic: Enable Service in Asia Cache DC - https://phabricator.wikimedia.org/T156026#2962115 (10BBlack) [17:33:43] 06Operations, 10Traffic: Hardware installation for Asia Cache DC - https://phabricator.wikimedia.org/T156032#2962116 (10BBlack) [17:33:54] 06Operations, 10Traffic: Hardware purchasing for Asia Cache DC - https://phabricator.wikimedia.org/T156033#2962118 (10BBlack) [17:33:57] 06Operations, 10Traffic: Hardware installation for Asia Cache DC - https://phabricator.wikimedia.org/T156032#2962057 (10BBlack) [17:35:48] 06Operations, 10Traffic: Turn up network links for Asia Cache DC - https://phabricator.wikimedia.org/T156031#2962126 (10BBlack) [17:35:51] 06Operations, 10Traffic: Enable Service in Asia Cache DC - https://phabricator.wikimedia.org/T156026#2962125 (10BBlack) [17:35:53] PROBLEM - restbase endpoints health on restbase2008 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:36:37] 06Operations, 10Traffic: Hardware purchasing for Asia Cache DC - https://phabricator.wikimedia.org/T156033#2962132 (10BBlack) [17:36:39] 06Operations, 05DC-Switchover-Prep-Q3-2016-17, 07Epic, 07Wikimedia-Multiple-active-datacenters: Prepare and improve the datacenter switchover procedure - https://phabricator.wikimedia.org/T154658#2962135 (10fgiunchedi) [17:36:43] RECOVERY - restbase endpoints health on restbase2008 is OK: All endpoints are healthy [17:36:56] _joe_ java.lang.OutOfMemoryError: Java heap space [17:37:10] maybe the wide rows issue? 
cc: mobrovac, urandom [17:37:27] <_joe_> yeah I guess so [17:37:38] (03PS1) 10Gehel: elasticsearch - increase size of GC logs [puppet] - 10https://gerrit.wikimedia.org/r/333696 [17:37:44] "puppet will "fix" it" [17:39:47] it may probably need a blacklist fix in restbase [17:40:33] RECOVERY - cassandra-b service on restbase2007 is OK: OK - cassandra-b is active [17:41:03] RECOVERY - Check systemd state on restbase2007 is OK: OK - running: The system is fully operational [17:41:16] (to avoid recurrences) [17:42:04] RECOVERY - cassandra-b CQL 10.192.16.177:9042 on restbase2007 is OK: TCP OK - 0.036 second response time on 10.192.16.177 port 9042 [17:42:13] RECOVERY - cassandra-b SSL 10.192.16.177:7001 on restbase2007 is OK: SSL OK - Certificate restbase2007-b valid until 2017-09-12 15:35:53 +0000 (expires in 231 days) [17:46:17] elukey: yeah likely [17:47:33] PROBLEM - Check systemd state on elastic2032 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [17:48:03] PROBLEM - Elasticsearch HTTPS on elastic2032 is CRITICAL: SSL CRITICAL - failed to verify search.svc.codfw.wmnet against elastic2032.codfw.wmnet [18:00:04] gehel: Dear anthropoid, the time has come. Please deploy Weekly Wikidata query service deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170123T1800). [18:01:02] 06Operations, 10ops-codfw: mw2098 drac offline - system unreachable - https://phabricator.wikimedia.org/T155688#2962264 (10Papaul) This is the third time this happen. https://phabricator.wikimedia.org/T148719 https://phabricator.wikimedia.org/T85286 Since the system is out of warranty , I replaced the IDRAC... 
[18:01:11] (03PS1) 10Yuvipanda: labs: Dump instance info somewhere that exists [puppet] - 10https://gerrit.wikimedia.org/r/333700 [18:01:33] 06Operations, 13Patch-For-Review: Remote IPMI doesn't work for ~2% of the fleet - https://phabricator.wikimedia.org/T150160#2962269 (10Papaul) [18:01:35] 06Operations, 10ops-codfw: mw2098 drac offline - system unreachable - https://phabricator.wikimedia.org/T155688#2962268 (10Papaul) 05Open>03Resolved [18:01:53] (03CR) 10Alexandros Kosiaris: [C: 032] redis: Allow specifying password for monitoring [puppet] - 10https://gerrit.wikimedia.org/r/332436 (owner: 10Alexandros Kosiaris) [18:02:00] (03PS3) 10Alexandros Kosiaris: redis: Allow specifying password for monitoring [puppet] - 10https://gerrit.wikimedia.org/r/332436 [18:02:03] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] redis: Allow specifying password for monitoring [puppet] - 10https://gerrit.wikimedia.org/r/332436 (owner: 10Alexandros Kosiaris) [18:06:17] (03PS1) 10Jcrespo: mariadb: Depool db1065 (enwiki api server) for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333701 (https://phabricator.wikimedia.org/T156005) [18:09:46] (03CR) 10Jcrespo: [C: 032] mariadb: Depool db1065 (enwiki api server) for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333701 (https://phabricator.wikimedia.org/T156005) (owner: 10Jcrespo) [18:09:59] (03CR) 10jenkins-bot: mariadb: Depool db1065 (enwiki api server) for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333701 (https://phabricator.wikimedia.org/T156005) (owner: 10Jcrespo) [18:10:35] 06Operations, 10DBA, 10Wikimedia-General-or-Unknown: Spurious completely empty `image` table row on commonswiki - https://phabricator.wikimedia.org/T155769#2962307 (10matmarex) >>! In T155769#2960504, @Marostegui wrote: > If you guys consider it is safe to delete, go ahead, but please remember to use mediawi... 
[18:11:03] 06Operations, 10Phabricator, 06Release-Engineering-Team: reinstall iridium (phabricator) as phab1001 with jessie - https://phabricator.wikimedia.org/T152129#2962309 (10Dzahn) I think this could be done anytime. We'll just need an agreement with releng, decision which way to go and a larger maintenance window. [18:11:06] (03CR) 10Chad: "Why do we need this? One of the benefits of our current init script is we're pulling it straight from upstream and not having to maintain " [debs/gerrit] - 10https://gerrit.wikimedia.org/r/333475 (owner: 10Paladox) [18:11:19] 06Operations, 10Phabricator, 06Release-Engineering-Team: reinstall iridium (phabricator) as phab1001 with jessie - https://phabricator.wikimedia.org/T152129#2962310 (10Paladox) I Spoke with @dzahn who said we can do this any time but is now blocked on @mmodell scheduling the maint window for this. [18:12:15] (03CR) 10Paladox: "> Why do we need this? One of the benefits of our current init script" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/333475 (owner: 10Paladox) [18:12:57] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1065 (duration: 00m 39s) [18:13:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:13:09] 39 seconds? [18:13:15] that is really good [18:13:18] new scap? [18:13:31] new servers? [18:13:34] (03PS1) 10Alexandros Kosiaris: Revert "redis: Allow specifying password for monitoring" [puppet] - 10https://gerrit.wikimedia.org/r/333703 [18:14:52] (03CR) 10Chad: "Works fine in production." 
[debs/gerrit] - 10https://gerrit.wikimedia.org/r/333475 (owner: 10Paladox) [18:14:57] (03CR) 10Alexandros Kosiaris: [C: 032] Revert "redis: Allow specifying password for monitoring" [puppet] - 10https://gerrit.wikimedia.org/r/333703 (owner: 10Alexandros Kosiaris) [18:15:12] 06Operations, 10Phabricator, 06Release-Engineering-Team: reinstall iridium (phabricator) as phab1001 with jessie - https://phabricator.wikimedia.org/T152129#2962332 (10Paladox) We can migrate to phab2001 first then upgrade iridium to debian and rename it to phab1001. Reason we should do it like this is if we... [18:15:30] (03CR) 10Paladox: "Oh, it fails on labs." [debs/gerrit] - 10https://gerrit.wikimedia.org/r/333475 (owner: 10Paladox) [18:18:27] !log shutting down ms-be2010 for maintenance [18:18:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:18:59] (03CR) 10Chad: [C: 032] Test mediawiki-Dashiki on the beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333086 (https://phabricator.wikimedia.org/T125403) (owner: 10Milimetric) [18:19:41] (03PS6) 10Ottomata: Configure RCFeeds to use EventBus extension in beta to send recentchange events [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332807 (https://phabricator.wikimedia.org/T152030) [18:19:55] (03CR) 10Ottomata: [V: 032 C: 032] Configure RCFeeds to use EventBus extension in beta to send recentchange events [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332807 (https://phabricator.wikimedia.org/T152030) (owner: 10Ottomata) [18:20:09] (03CR) 10jenkins-bot: Configure RCFeeds to use EventBus extension in beta to send recentchange events [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332807 (https://phabricator.wikimedia.org/T152030) (owner: 10Ottomata) [18:21:14] ottomata: Was there a need to verify+2 and not wait for jenkins to do it? [18:21:15] PROBLEM - Host ms-be2010 is DOWN: PING CRITICAL - Packet loss = 100% [18:21:28] ^real? 
[18:21:45] PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 185 bytes in 0.348 second response time [18:22:21] ostriches: sorry, it was a rebase and i was impatient, jenkins had verified it previously [18:22:45] RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.424 second response time [18:22:54] jynus: no that's papaul's maint [18:23:18] ottomata: It's ok. You've just motivated me to remove the ability for humans to verify themselves :) [18:23:27] * ostriches rubs hands together evilly [18:23:57] (03PS4) 10Chad: Test mediawiki-Dashiki on the beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333086 (https://phabricator.wikimedia.org/T125403) (owner: 10Milimetric) [18:24:26] ok [18:25:45] PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 185 bytes in 0.202 second response time [18:27:45] RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 1.504 second response time [18:30:04] !log reimaging db1065 to jessie [18:30:05] RECOVERY - Host ms-be2010 is UP: PING OK - Packet loss = 0%, RTA = 36.16 ms [18:30:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:31:10] (03CR) 10jenkins-bot: Test mediawiki-Dashiki on the beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333086 (https://phabricator.wikimedia.org/T125403) (owner: 10Milimetric) [18:31:55] PROBLEM - Host ms-be2010 is DOWN: PING CRITICAL - Packet loss = 100% [18:32:07] !log demon@tin Synchronized wmf-config/extension-list-labs: no-op, completeness (duration: 00m 40s) [18:32:09] Logged the 
message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:33:08] (03PS2) 10Yuvipanda: labs: Dump instance info somewhere that exists [puppet] - 10https://gerrit.wikimedia.org/r/333700 [18:33:44] ottomata: Oh btw in the future when merging changes to -labs files in mw-config, we also sync them in prod even if they're no-ops there. [18:33:45] (03CR) 10Paladox: [C: 031] "@BBlack hi, any update on this please?" [puppet] - 10https://gerrit.wikimedia.org/r/324797 (https://phabricator.wikimedia.org/T137928) (owner: 10Dzahn) [18:33:49] (don't like things getting outta sync) [18:33:52] !log demon@tin Synchronized wmf-config/CommonSettings-labs.php: no-op, completeness (duration: 00m 40s) [18:33:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:33:58] But already did it for you just now because I was also doing it [18:34:17] oh, thanks ostriches sorry about that [18:34:22] No worries :) [18:34:26] (03CR) 10Yuvipanda: [C: 032] labs: Dump instance info somewhere that exists [puppet] - 10https://gerrit.wikimedia.org/r/333700 (owner: 10Yuvipanda) [18:34:29] i was talking to timo on friday and I thought i just had to merge this for beta only [18:35:12] ostriches: is there some mw config wikitech docs about how to do that, or process? 
[18:35:15] i found group_size: 2 [18:35:16] oops [18:35:21] i found https://wikitech.wikimedia.org/wiki/Configuration_files [18:35:31] but it mostly just describes the repo and config [18:41:22] ottomata: Yeah, it's beta only (which jenkins handles automatically) [18:41:36] Best practice has just been to also sync to prod so it's not weirdly outta sync with the repo [18:41:47] (I don't think *that* is documented tho) [18:43:49] !log nuria@tin Starting deploy [analytics/aqs/deploy@56ab863]: (no message) [18:43:50] !log nuria@tin Finished deploy [analytics/aqs/deploy@56ab863]: (no message) (duration: 00m 01s) [18:43:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:43:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:44:35] !log nuria@tin Starting deploy [analytics/aqs/deploy@56ab863]: (no message) [18:44:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:45:43] !log nuria@tin Finished deploy [analytics/aqs/deploy@56ab863]: (no message) (duration: 01m 08s) [18:45:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:47:41] !log nuria@tin Starting deploy [analytics/aqs/deploy@56ab863]: (no message) [18:47:43] !log nuria@tin Finished deploy [analytics/aqs/deploy@56ab863]: (no message) (duration: 00m 01s) [18:47:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:47:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:47:47] !log nuria@tin Starting deploy [analytics/aqs/deploy@56ab863]: (no message) [18:47:49] !log nuria@tin Finished deploy [analytics/aqs/deploy@56ab863]: (no message) (duration: 00m 01s) [18:47:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:47:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:49:58] nuria: Everything ok? 
[18:50:21] ostriches: i think so, i just deployed pageview api to every host 1 by 1 [18:50:38] Ah ok, wasn't sure why so many syncs :) [18:50:47] ostriches: thus logging [18:51:06] Also, dunno if you know about this feature, but you can type "scap deploy 'this is my awesome description'" and that's what shows up in IRC and the SAL [18:51:12] Instead of (no message) :) [18:51:17] ostriches: ya, there is an issue as it is trying to deploy to no longer existing host aqs100[1,2,3] will talk to team tomorrow [18:51:20] ostriches: i haven't done many wmf-config syncs ever, nor mw deploys [18:51:23] so i'm pretty green on that [18:51:43] No worries. Just using it as a chance to spread the knowledge :) [18:51:55] #imageaworld #themoreyouknow [18:52:03] Eh, #imagineaworld [18:52:05] ostriches: is it a full mw deploy? [18:52:06] sync [18:52:09] Nope :D [18:52:10] or just config? [18:52:14] how do you sync it? [18:52:16] are there docs on that? [18:52:23] `scap sync-file wmf-config/InitialiseSettings-labs.php` [18:52:26] And yes [18:52:39] https://wikitech.wikimedia.org/wiki/How_to_deploy_code should cover it [18:53:02] perfect thank you [18:55:35] PROBLEM - puppet last run on stat1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:56:02] ostriches: is there a way to see what mw config values are [18:56:08] (in beta) [18:56:08] ? [18:56:15] my change isn't doing what it should [18:56:16] php config? 
[18:56:19] yes [18:56:28] if you ssh to the tin in deployment prep [18:56:35] mwscript eval.php dbname [18:56:38] oo [18:56:40] var_dump( $wgFooBar ); [18:56:43] oh cool [18:56:44] nice [18:56:44] thanks [18:56:46] !log nuria@tin Starting deploy [analytics/aqs/deploy@025ef23]: (no message) [18:56:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:57:21] !log nuria@tin Finished deploy [analytics/aqs/deploy@025ef23]: (no message) (duration: 00m 35s) [18:57:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:57:25] Reedy: where is mwscript? [18:57:31] should be in path [18:57:50] OH [18:57:52] sorry, wrong host, hang on [18:58:02] (was on an app server) [18:58:10] heh [18:58:32] Ahhh already looking fishy [18:58:32] Notice: Undefined variable: wgEventServiceUrl in /srv/mediawiki-staging/wmf-config/CommonSettings-labs.php on line 39 [18:58:36] bastion.wmflabs.org -> deployment-tin.eqiad.wmflabs [18:59:15] PROBLEM - Redis replication status tcp_6479 on rdb2005 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.32.133 on port 6479 [18:59:20] CommonSettings.php should be evaled before CommonsSettings-labs.php, no? [19:00:04] Not all of it, but most of it, yeah [19:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170123T1900). Please do the needful. 
[19:00:07] if ( $wmfRealm === 'labs' ) { [19:00:07] require( "$wmfConfigDir/CommonSettings-labs.php" ); [19:00:07] } [19:00:09] near the bottom [19:00:15] RECOVERY - Redis replication status tcp_6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 10.192.32.133:6479 has 1 databases (db0) with 3085968 keys, up 84 days 10 hours - replication_delay is 60 [19:00:26] Nothing for swat today [19:00:45] ok [19:00:54] that seems what'd i'd expect [19:01:10] Wehre is wgEventServiceUrl defined? [19:01:21] Ah, only in if ( $wmgUseEventBus ) { [19:01:27] Does beta use event bus? [19:01:36] yes [19:01:39] the default is true [19:01:59] i guess that comes from InitialiseSettings.php [19:02:21] 'wmgUseEventBus' => [ [19:02:21] 'default' => true, [19:02:33] and [19:02:34] > var_dump( $wmgUseEventBus); [19:02:34] bool(true) [19:02:37] so that's good [19:03:31] yeah, its just $wgEventServiceUrl that is not set, looking [19:03:45] PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 185 bytes in 0.605 second response time [19:04:33] ACKNOWLEDGEMENT - Elasticsearch HTTPS on elastic2032 is CRITICAL: SSL CRITICAL - failed to verify search.svc.codfw.wmnet against elastic2032.codfw.wmnet Volans new host pending installation https://phabricator.wikimedia.org/T154251 [19:04:45] RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.297 second response time [19:04:53] ACKNOWLEDGEMENT - Check systemd state on elastic2032 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
Volans new host pending installation https://phabricator.wikimedia.org/T154251 [19:05:11] (03PS24) 10BBlack: cache_misc app_directors/req_handling split [puppet] - 10https://gerrit.wikimedia.org/r/300574 (https://phabricator.wikimedia.org/T110717) [19:05:13] (03PS24) 10BBlack: cache_misc req_handling: sort entries [puppet] - 10https://gerrit.wikimedia.org/r/300579 (https://phabricator.wikimedia.org/T110717) [19:05:15] (03PS25) 10BBlack: cache_misc req_handling: subpaths, cache policy, defaulting [puppet] - 10https://gerrit.wikimedia.org/r/300581 (https://phabricator.wikimedia.org/T110717) [19:05:17] (03PS9) 10BBlack: cache_misc: stream.wm.o subpathing for eventstreams [puppet] - 10https://gerrit.wikimedia.org/r/327550 (https://phabricator.wikimedia.org/T143925) [19:05:21] :O :) [19:07:14] hm Reedy, there must be some require order problem: [19:07:24] > otto@deployment-tin:~$ mwscript eval.php enwiki [19:07:24] PHP Notice: Undefined variable: wgEventServiceUrl in /srv/mediawiki-staging/wmf-config/CommonSettings-labs.php on line 39 [19:07:24] Notice: Undefined variable: wgEventServiceUrl in /srv/mediawiki-staging/wmf-config/CommonSettings-labs.php on line 39 [19:07:24] > var_dump( $wgEventServiceUrl ); [19:07:24] string(77) "http://deployment-eventlogging04.deployment-prep.eqiad.wmflabs:8085/v1/events" [19:08:01] but, it is CommonSettings.php that does [19:08:04] $wgEventServiceUrl = "{$wmfLocalServices['eventbus']}/v1/events"; [19:08:51] could CommonSettings-labs.php ever be required by something outside of CommonSettings.php? 
[19:10:08] It shouldn't be [19:10:33] yea i don't see that either [19:10:37] but, when i load mwscript [19:10:51] it says $wgEventServiceUrl is undefined when referenced in CommonsSettings-labs.php [19:11:05] but by the first command I enter at the prompt, it is defined [19:12:53] wonder if something is out of sync [19:15:14] $wmgUseEventBus is set properly [19:15:24] but $wgEventServiceUrl is not [19:15:32] when line 39 in CommonSettings-labs.php runs [19:16:58] Reedy: maybe, but i'm doing this all on deployment-tin right now [19:17:07] and the files look like they are in the right place [19:17:34] unless there is some /srv/mediawiki vs /srv/mediawiki-staging problem? I don't know the difference between those, except that my change is not in /srv/mediawiki on tin [19:17:38] but it is on app servers [19:18:34] likely [19:18:38] just trying to get it to fix that [19:19:45] ottomata: oh [19:20:02] EventBus *is* after the -labs [19:20:05] Why shit is down there [19:20:06] * Reedy fixes [19:20:14] 06Operations, 10Analytics, 10Analytics-Cluster, 06Research-and-Data, and 2 others: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#2962587 (10DarTar) @Ottomata from a budget perspective I think we can move forward with this immediately, we should be able to fund the expense but I'... [19:20:16] OH [19:20:18] sorry [19:20:23] was just grepping shoulda checked that [19:20:41] Not your fault :P [19:21:34] (03Draft2) 10Reedy: Load CommonSettings-labs last! [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333709 [19:23:05] (03CR) 10Reedy: [C: 032] Load CommonSettings-labs last! 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/333709 (owner: 10Reedy) [19:23:35] RECOVERY - puppet last run on stat1004 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [19:23:38] ottomata: It's easy to lose context Ctrl + F in a browser :D [19:23:48] ya, i shoulda had the file open [19:23:57] * Reedy waits for jenkins [19:24:48] (03Merged) 10jenkins-bot: Load CommonSettings-labs last! [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333709 (owner: 10Reedy) [19:24:59] (03CR) 10jenkins-bot: Load CommonSettings-labs last! [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333709 (owner: 10Reedy) [19:25:44] 06Operations, 10Phabricator, 06Release-Engineering-Team: reinstall iridium (phabricator) as phab1001 with jessie - https://phabricator.wikimedia.org/T152129#2839436 (10Dzahn) [19:26:06] !log reedy@tin Synchronized wmf-config/CommonSettings.php: Make sure CommonSettings-labs is one of the last things loaded so we don't get problems from things being included after (duration: 00m 40s) [19:26:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:27:21] 06Operations, 07Puppet, 07Epic, 07Need-volunteer, 13Patch-For-Review: align puppet-lint config with coding style - https://phabricator.wikimedia.org/T93645#2962613 (10Dzahn) @scfc +1 It should be a bug/feature request with upstream puppet-lint. Care to make one? Also, we should check if it's not just ab... [19:27:43] ottomata: That's fixed the notices on deployment tin [19:27:47] jenkins should be deploying it now [19:28:19] CooOOoL [19:29:25] it works! 
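The require-order problem debugged above can be modelled with a minimal sketch. It is a Python stand-in, not the actual PHP in mediawiki-config, and it assumes the config is a sequence of steps run in order. The names `event_bus_block`, `labs_overrides`, `load`, `eventbus_host`, `url_seen_by_labs` and the host `http://eventbus.example` are hypothetical. The sketch shows why the notice fired at load time while `var_dump( $wgEventServiceUrl )` at the prompt still printed a value: the labs overrides ran before the EventBus block, but by the end of loading the variable had been set.

```python
def event_bus_block(cfg, notices):
    # Stand-in for the EventBus block in CommonSettings.php:
    # it only sets the URL when the feature flag is on.
    if cfg.get("wmgUseEventBus"):
        cfg["wgEventServiceUrl"] = cfg["eventbus_host"] + "/v1/events"


def labs_overrides(cfg, notices):
    # Stand-in for CommonSettings-labs.php: it reads the URL and
    # records a notice if the variable has not been defined yet.
    if "wgEventServiceUrl" not in cfg:
        notices.append("Undefined variable: wgEventServiceUrl")
    cfg["url_seen_by_labs"] = cfg.get("wgEventServiceUrl")


def load(steps):
    # Run the config steps in the given order against fresh state.
    cfg = {"wmgUseEventBus": True, "eventbus_host": "http://eventbus.example"}
    notices = []
    for step in steps:
        step(cfg, notices)
    return cfg, notices


# Order before the fix: labs overrides run before the EventBus block.
before_cfg, before_notices = load([labs_overrides, event_bus_block])
# Order after the fix: labs overrides are loaded last.
after_cfg, after_notices = load([event_bus_block, labs_overrides])

print(before_notices, before_cfg["url_seen_by_labs"], before_cfg["wgEventServiceUrl"])
print(after_notices, after_cfg["url_seen_by_labs"])
```

In the first ordering the labs step sees no URL and a notice is recorded, yet the final state has the URL defined, which matches what the eval.php prompt showed. In the second ordering the labs step sees the URL and no notice is recorded.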
[19:29:26] thanks Reedy [19:29:34] :) [19:29:36] np [19:29:47] Hopefully where I moved it should prevent these problems in future [19:30:34] great [19:41:05] (03CR) 10Dzahn: "I checked that all 130 email addresses are deliverable Google addresses (router = ldap_account, transport = remote_smtp, host aspmx.l.goog" [puppet] - 10https://gerrit.wikimedia.org/r/333611 (https://phabricator.wikimedia.org/T142826) (owner: 10Muehlenhoff) [19:41:19] (03PS3) 10Dzahn: Add remaining staff email addresses to data.yaml [puppet] - 10https://gerrit.wikimedia.org/r/333611 (https://phabricator.wikimedia.org/T142826) (owner: 10Muehlenhoff) [19:42:22] (03CR) 10Dzahn: [C: 032] Add remaining staff email addresses to data.yaml [puppet] - 10https://gerrit.wikimedia.org/r/333611 (https://phabricator.wikimedia.org/T142826) (owner: 10Muehlenhoff) [19:54:45] PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 185 bytes in 0.519 second response time [19:55:45] RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.582 second response time [19:56:36] Morning / afternoon greg-g! [19:57:15] (03PS2) 10Dzahn: kartotherian: optional parameter listed before required [puppet] - 10https://gerrit.wikimedia.org/r/332956 [19:59:45] (03PS4) 10Dzahn: kartotherian: optional parameter listed before required [puppet] - 10https://gerrit.wikimedia.org/r/332956 [20:02:36] !log otto@tin Starting deploy [analytics/aqs/deploy@025ef23]: (no message) [20:02:36] (03CR) 10Gehel: [C: 031] "I don't really like that rule. 
In a language like Puppet, it seems to me that it makes more sense to group parameters semantically than by" [puppet] - 10https://gerrit.wikimedia.org/r/332956 (owner: 10Dzahn) [20:02:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:04:20] (03CR) 10Dzahn: [C: 032] "thanks!. yea, i know. i thought this too but basically just for the special case of user/pass parameters where 2 parameters kind of belon" [puppet] - 10https://gerrit.wikimedia.org/r/332956 (owner: 10Dzahn) [20:04:45] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[zotero/translators],Package[zotero/translation-server],Exec[chown /srv/deployment/zotero for deploy-service] [20:04:52] !log otto@tin Finished deploy [analytics/aqs/deploy@025ef23]: (no message) (duration: 02m 16s) [20:04:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:04:55] PROBLEM - aqs endpoints health on aqs1004 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.107, port=7232): Max retries exceeded with url: /analytics.wikimedia.org/v1/?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused))) [20:04:55] PROBLEM - AQS root url on aqs1004 is CRITICAL: connect to address 10.64.0.107 and port 7232: Connection refused [20:05:12] ottomata: --^ [20:05:35] PROBLEM - Check systemd state on aqs1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[20:05:39] (03PS2) 10Dzahn: proxysql: optional parameter before required parameter [puppet] - 10https://gerrit.wikimedia.org/r/332957 [20:06:09] yeah [20:06:11] i just depooled it [20:06:17] !log deplyoing latest wdqs version (2h behind planned schedule) [20:06:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:06:21] the deploy failed, and then the process wouldn't start up [20:06:22] elukey: [20:06:25] dunno what's going on yet [20:06:39] (03PS3) 10Dzahn: proxysql: optional parameter before required parameter [puppet] - 10https://gerrit.wikimedia.org/r/332957 [20:06:55] ottomata: just deployed to 1004? [20:07:24] (03CR) 10Dzahn: [C: 032] proxysql: optional parameter before required parameter [puppet] - 10https://gerrit.wikimedia.org/r/332957 (owner: 10Dzahn) [20:07:30] yes [20:07:31] just 1004 [20:08:48] (03PS2) 10Dzahn: labspuppetbackend: optional parameter before required [puppet] - 10https://gerrit.wikimedia.org/r/332958 [20:08:54] (03PS3) 10Dzahn: labspuppetbackend: optional parameter before required [puppet] - 10https://gerrit.wikimedia.org/r/332958 [20:09:06] !log gehel@tin Starting deploy [wdqs/wdqs@fd88fda]: (no message) [20:09:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:09:41] !log otto@tin Starting deploy [analytics/aqs/deploy@025ef23]: (no message) [20:09:43] !log otto@tin Finished deploy [analytics/aqs/deploy@025ef23]: (no message) (duration: 00m 01s) [20:09:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:09:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:10:46] (03CR) 10Dzahn: [C: 032] labspuppetbackend: optional parameter before required [puppet] - 10https://gerrit.wikimedia.org/r/332958 (owner: 10Dzahn) [20:11:03] !log gehel@tin Finished deploy [wdqs/wdqs@fd88fda]: (no message) (duration: 01m 56s) [20:11:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:11:52] SMalyshev: wdqs 
deployment completed, tests are passing, all looks good [20:12:05] PROBLEM - puppet last run on cp3035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:12:35] (03PS3) 10Dzahn: openstack: instancersync not in autoload module layout [puppet] - 10https://gerrit.wikimedia.org/r/332954 [20:12:40] gehel: cool, thanks! [20:12:46] (03PS2) 10Dzahn: openstack: designate/glance/keystone not in autoload module [puppet] - 10https://gerrit.wikimedia.org/r/332955 [20:13:55] (03PS2) 10Dzahn: interface: rps::modparams, aggregate_member not in autoload layout [puppet] - 10https://gerrit.wikimedia.org/r/332959 [20:15:45] PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 185 bytes in 1.431 second response time [20:16:29] (03CR) 10Dzahn: [C: 04-1] "@Faidon should apt.wikimedia.org.conf.erb be in modules "aptrepo" rather than "install_server"? What about the web server setup. 
Right no" [puppet] - 10https://gerrit.wikimedia.org/r/325864 (https://phabricator.wikimedia.org/T132757) (owner: 10Dzahn) [20:16:45] RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.536 second response time [20:17:35] RECOVERY - Check systemd state on aqs1004 is OK: OK - running: The system is fully operational [20:17:55] RECOVERY - aqs endpoints health on aqs1004 is OK: All endpoints are healthy [20:17:55] RECOVERY - AQS root url on aqs1004 is OK: HTTP OK: HTTP/1.1 200 - 727 bytes in 0.003 second response time [20:18:43] !log otto@tin Starting deploy [analytics/aqs/deploy@025ef23]: (no message) [20:18:44] !log otto@tin Finished deploy [analytics/aqs/deploy@025ef23]: (no message) (duration: 00m 01s) [20:18:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:18:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:28:35] !log nuria@tin Starting deploy [analytics/aqs/deploy@025ef23]: (no message) [20:28:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:28:42] !log nuria@tin Finished deploy [analytics/aqs/deploy@025ef23]: (no message) (duration: 00m 07s) [20:28:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:29:03] !log nuria@tin Starting deploy [analytics/aqs/deploy@025ef23]: (no message) [20:29:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:29:11] !log nuria@tin Finished deploy [analytics/aqs/deploy@025ef23]: (no message) (duration: 00m 08s) [20:29:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:31:45] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [20:37:08] 06Operations: upgrade netmon1001 to jessie - https://phabricator.wikimedia.org/T125020#1972424 (10Dzahn) [20:37:40] 06Operations: upgrade netmon1001 to jessie - 
https://phabricator.wikimedia.org/T125020#1972424 (10Dzahn) procurement ticket for replacement hardware for this T156040 [20:38:30] 06Operations, 10Analytics, 10Analytics-Cluster, 06Research-and-Data, and 2 others: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#2962931 (10Ottomata) We guesstimated that a stat1002 like replacement would cost around $10K (these machines have a lot of storage...we may reevaluate... [20:39:24] 06Operations, 10Analytics, 10Analytics-Cluster, 06Research-and-Data, and 2 others: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#2962934 (10Ottomata) That is, if you all are ok with waiting until sometime in Q4 for this. If not, we'd have to get a smaller form factor GPU and pu... [20:41:05] RECOVERY - puppet last run on cp3035 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [20:41:18] (03CR) 10Dzahn: "I don't know if that is the only reason for access control on grafana. Alex?" [puppet] - 10https://gerrit.wikimedia.org/r/333024 (owner: 10Addshore) [20:46:11] (03PS2) 10Dzahn: aptrepo: setup rsync between 2 APT servers [puppet] - 10https://gerrit.wikimedia.org/r/333676 (https://phabricator.wikimedia.org/T84380) [20:46:58] (03PS3) 10Dzahn: aptrepo: setup rsync between 2 APT servers [puppet] - 10https://gerrit.wikimedia.org/r/333676 (https://phabricator.wikimedia.org/T84380) [20:47:50] 06Operations, 13Patch-For-Review: Split carbon's install/mirror roles, provision install1001 - https://phabricator.wikimedia.org/T132757#2963006 (10Dzahn) [20:50:15] PROBLEM - puppet last run on wasat is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [20:51:40] checks wasat [20:52:44] false alarm as mostly expected [20:53:15] RECOVERY - puppet last run on wasat is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [20:54:44] (03CR) 10jerkins-bot: [V: 04-1] aptrepo: setup rsync between 2 APT servers [puppet] - 10https://gerrit.wikimedia.org/r/333676 (https://phabricator.wikimedia.org/T84380) (owner: 10Dzahn) [20:56:47] (03CR) 10Alex Monk: "project-bastion, the most open group in our LDAP system, is on there..." [puppet] - 10https://gerrit.wikimedia.org/r/333024 (owner: 10Addshore) [20:57:38] Krenair: ^^ is it? I dont see it... [20:57:51] hang on [20:58:46] (03CR) 10Alex Monk: [C: 031] "oh, right, that's only the labs.pp one. Anyway grafana-admins is on both, and they don't need any NDA, so this should be fine" [puppet] - 10https://gerrit.wikimedia.org/r/333024 (owner: 10Addshore) [21:00:04] gwicke, cscott, arlolra, subbu, bearND, mdholloway, halfak, Amir1, and yurik: Respected human, time to deploy Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170123T2100). Please do the needful. [21:03:50] (03PS4) 10Dzahn: aptrepo: setup rsync between 2 APT servers [puppet] - 10https://gerrit.wikimedia.org/r/333676 (https://phabricator.wikimedia.org/T84380) [21:06:00] (03PS5) 10Dzahn: aptrepo: setup rsync between 2 APT servers [puppet] - 10https://gerrit.wikimedia.org/r/333676 (https://phabricator.wikimedia.org/T84380) [21:09:23] (03CR) 10Jcrespo: [C: 04-1] "there is already the grafana-admin group, which wmde users were added on requests, to solve this issue. 
IF this was to be granted, it woul" [puppet] - 10https://gerrit.wikimedia.org/r/333024 (owner: 10Addshore) [21:09:57] (03CR) 10jerkins-bot: [V: 04-1] aptrepo: setup rsync between 2 APT servers [puppet] - 10https://gerrit.wikimedia.org/r/333676 (https://phabricator.wikimedia.org/T84380) (owner: 10Dzahn) [21:12:21] 07Puppet, 10Continuous-Integration-Infrastructure, 06Labs, 10Labs-Infrastructure, 07Beta-Cluster-reproducible: New instance have broken puppet configuration when using puppetmaster standalone - https://phabricator.wikimedia.org/T148929#2963092 (10hashar) [21:13:35] (03CR) 10Addshore: "> there is already the grafana-admin group, which wmde users were added on requests, to solve this issue. IF this was to be granted, it wo" [puppet] - 10https://gerrit.wikimedia.org/r/333024 (owner: 10Addshore) [21:14:15] (03PS6) 10Dzahn: aptrepo: setup rsync between 2 APT servers [puppet] - 10https://gerrit.wikimedia.org/r/333676 (https://phabricator.wikimedia.org/T84380) [21:15:42] (03CR) 10Dzahn: "> If you have a volunteer that wanted to be able to edit grafana dashboards, they would not be added to the wmde group." 
[puppet] - 10https://gerrit.wikimedia.org/r/333024 (owner: 10Addshore) [21:16:53] 07Puppet, 10Continuous-Integration-Infrastructure, 06Labs, 10Labs-Infrastructure, 07Beta-Cluster-reproducible: New instance have broken puppet configuration when using puppetmaster standalone - https://phabricator.wikimedia.org/T148929#2736876 (10hashar) [21:16:54] (03CR) 10Dzahn: "..unless we can do something like "grafana-admins: wmde, user1, user" including the group in another group" [puppet] - 10https://gerrit.wikimedia.org/r/333024 (owner: 10Addshore) [21:17:37] (03CR) 10Addshore: "> ..unless we can do something like "grafana-admins: wmde, user1, user" including the group in another group" [puppet] - 10https://gerrit.wikimedia.org/r/333024 (owner: 10Addshore) [21:18:03] (03CR) 10Jcrespo: [C: 04-1] "> Which means we should simply use the grafana-admin group to add grafana-admins. Right?" [puppet] - 10https://gerrit.wikimedia.org/r/333024 (owner: 10Addshore) [21:18:59] 06Operations, 10hardware-requests: hardware request for netmon1001 - https://phabricator.wikimedia.org/T156040#2963118 (10RobH) [21:19:34] 07Puppet, 10Continuous-Integration-Infrastructure, 06Labs, 10Labs-Infrastructure, 07Beta-Cluster-reproducible: New instance have broken puppet configuration when using puppetmaster standalone - https://phabricator.wikimedia.org/T148929#2963120 (10hashar) [21:19:35] RECOVERY - Host ms-be2010 is UP: PING OK - Packet loss = 0%, RTA = 36.14 ms [21:20:12] 06Operations, 10hardware-requests: eqiad: (4) worker servers for kubernetes - https://phabricator.wikimedia.org/T141624#2963127 (10RobH) [21:20:15] 06Operations, 10ops-eqiad, 10hardware-requests: Return wmf4747/wmf4748/wmf4749/wmf4750 to spares - https://phabricator.wikimedia.org/T146171#2963126 (10RobH) 05Open>03Resolved [21:24:30] (03CR) 10Chad: [C: 031] Move some production apache config files to templates [puppet] - 10https://gerrit.wikimedia.org/r/322602 (https://phabricator.wikimedia.org/T1256) (owner: 10Alex Monk) [21:27:38] 
(03PS7) 10Dzahn: aptrepo: setup rsync between 2 APT servers [puppet] - 10https://gerrit.wikimedia.org/r/333676 (https://phabricator.wikimedia.org/T84380) [21:32:02] (03CR) 10Alex Monk: [C: 031] Escape period in wiki.phtml rewrites [puppet] - 10https://gerrit.wikimedia.org/r/331944 (owner: 10Reedy) [21:32:13] 06Operations, 10hardware-requests: hardware request for netmon1001 - https://phabricator.wikimedia.org/T156040#2963146 (10RobH) a:05RobH>03mark So netmon1001 has the following: * Dual [[ https://ark.intel.com/products/47923/Intel-Xeon-Processor-E5640-12M-Cache-2_66-GHz-5_86-GTs-Intel-QPI | Intel Xeon E564... [21:32:36] addshore: hey, sorry, was afk for a bit this morning, what's up? [21:35:10] greg-g: Just to let you know that I am going to schedule a slot for https://phabricator.wikimedia.org/T155995 tomorrow (probably before the EU mid day swat). Don't think I have to let you know for things on beta but figured I'd give you a poke anyway! :) [21:35:55] PROBLEM - Check whether ferm is active by checking the default input chain on ms-be2010 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly [21:36:37] !log bsitzmann@tin Starting deploy [mobileapps/deploy@7615bf9]: Update mobileapps to 66ef3c2 [21:36:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:37:02] (03CR) 10Chad: [C: 032] Remove wiki.phtml [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332698 (owner: 10Chad) [21:37:05] addshore: thanks, I like knowing :) [21:37:10] Reedy: ^ [21:37:20] Yeah, I just saw at the time [21:37:23] And then It'll probably head out to prod in another slot in a few weeks :) [21:37:28] 06Operations, 10Beta-Cluster-Infrastructure, 10MediaWiki-extensions-InterwikiSorting, 10Wikidata, and 3 others: Deploy InterwikiSorting extension to beta - https://phabricator.wikimedia.org/T155995#2963153 (10greg) [21:37:30] addshore: behave [21:37:38] (03CR) 10Reedy: [C: 031] Remove wiki.phtml 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/332698 (owner: 10Chad) [21:37:53] 06Operations, 10ops-esams: Degraded RAID on bast3001 - https://phabricator.wikimedia.org/T154603#2917240 (10Dzahn) ``` This message was generated by the smartd daemon running on: host name: bast3001 DNS domain: wikimedia.org The following warning/error was logged by the smartd daemon: Device: /dev/sd... [21:38:07] Reedy: hush now, where is my chocolate? [21:38:15] do you deserve any? [21:38:26] :o [21:38:42] (03Merged) 10jenkins-bot: Remove wiki.phtml [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332698 (owner: 10Chad) [21:39:09] Reedy: phanobviously.... (doesn't really work there....) [21:39:11] (03CR) 10jenkins-bot: Remove wiki.phtml [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332698 (owner: 10Chad) [21:39:24] obphiously? [21:39:40] phantastic... [21:39:54] !log bsitzmann@tin Finished deploy [mobileapps/deploy@7615bf9]: Update mobileapps to 66ef3c2 (duration: 03m 16s) [21:39:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:40:05] PROBLEM - puppet last run on sca2004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. 
Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[zotero/translators],Package[zotero/translation-server],Exec[chown /srv/deployment/zotero for deploy-service] [21:40:17] addshore: go down to the grocery store and ask them for a candy bar as an in-kind donation to WMDE and wikidata ;) [21:40:41] bd808: thanks for the tshirt ;) [21:40:42] Reedy: Here goes nothing [21:40:50] uh oh [21:41:01] ostriches: "Well, it's your own stupid fault for using this" [21:41:08] !log demon@tin Synchronized w: Removing wiki.phtml, apache does the rewrites (duration: 00m 48s) [21:41:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:41:32] "This URL was only deprecated like 12 years ago" [21:42:22] Heh, resourceloader makes urls that are too long now for apache :p [21:42:23] addshore: you are most welcome. greg-g found the budget to pay for this batch of them so we should both thank him too. (thanks greg-g!) [21:42:33] anytime [21:42:45] bd808: I want to get mine framed [21:42:52] now, let's not all go breaking (and fixing!) production to eat through all of my remaining budget! [21:43:45] Reedy: I thought you already had? I've at least told people that your mum framed it for you. :) [21:43:49] !log sca2004 was out of memory but also fixed itself and i could run puppet again a few minutes later [21:43:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:44:05] RECOVERY - puppet last run on sca2004 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [21:44:06] bd808: She was going to... But I'm not sure what happened to doing so [21:44:12] Need to decide if I frame the whole thing [21:44:13] greg-g, there are different colours right? (Like the youtube play buttons they send to people) silver for 1 breakage, gold for 10, diamond for 50 and ruby for 100? 
[21:44:21] Or just the important part [21:44:37] * Reedy thinks addshore doesn't understand rankings of precious gems [21:44:43] Reedy: I'm not seeing any real change in 4xx levels or errors in logstash generally, seems to have worked :) [21:45:05] woo [21:45:13] * addshore thinks Reedy doesn't know the hierarchy youtube created [21:46:41] fsck youtube [21:46:46] addshore: I've broken the site enough times that I ran out of space for them :P [21:46:58] I've got like 40 diamonds by now [21:47:02] (that's 200 breakages?) [21:48:18] (03PS1) 10Dzahn: add install1002 to site [puppet] - 10https://gerrit.wikimedia.org/r/333780 [21:52:20] (03PS1) 10Chad: Updating from meta, removes wiki.phtml that bugs me [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333788 [21:52:30] (03CR) 10Chad: [C: 032] Updating from meta, removes wiki.phtml that bugs me [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333788 (owner: 10Chad) [21:53:47] (03Merged) 10jenkins-bot: Updating from meta, removes wiki.phtml that bugs me [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333788 (owner: 10Chad) [21:53:58] (03CR) 10jenkins-bot: Updating from meta, removes wiki.phtml that bugs me [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333788 (owner: 10Chad) [21:54:25] (03PS1) 10Rush: nfs_mount: protoct absent the same as present [puppet] - 10https://gerrit.wikimedia.org/r/333791 [21:54:59] !log demon@tin Synchronized wmf-config: interwiki update, dropping some old ExtensionMessages files (duration: 00m 41s) [21:55:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:55:35] (03PS2) 10Rush: nfs_mount: protoct absent the same as present [puppet] - 10https://gerrit.wikimedia.org/r/333791 [21:57:05] PROBLEM - Redis replication status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479 [21:58:05] PROBLEM - NTP on ms-be2010 is CRITICAL: NTP CRITICAL: Offset unknown [21:58:05] RECOVERY - Redis replication 
status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3089227 keys, up 84 days 13 hours - replication_delay is 55 [21:59:19] (03PS3) 10Rush: nfs_mount: protect absent the same as present [puppet] - 10https://gerrit.wikimedia.org/r/333791 [22:00:04] dapatrick, bawolff, and Reedy: Dear anthropoid, the time has come. Please deploy Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170123T2200). [22:00:15] PROBLEM - puppet last run on cp4013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:03:25] PROBLEM - Redis replication status tcp_6479 on rdb2005 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.32.133 on port 6479 [22:04:25] RECOVERY - Redis replication status tcp_6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 10.192.32.133:6479 has 1 databases (db0) with 3089925 keys, up 84 days 13 hours - replication_delay is 0 [22:05:08] (03CR) 10Dzahn: "http://puppet-compiler.wmflabs.org/5190/" [puppet] - 10https://gerrit.wikimedia.org/r/333676 (https://phabricator.wikimedia.org/T84380) (owner: 10Dzahn) [22:07:37] (03PS1) 10Rush: nfsclient: remove temporary absents from migration [puppet] - 10https://gerrit.wikimedia.org/r/333794 [22:07:45] (03PS4) 10Rush: nfs_mount: protect absent the same as present [puppet] - 10https://gerrit.wikimedia.org/r/333791 [22:07:57] (03PS2) 10Rush: nfsclient: remove temporary absents from migration [puppet] - 10https://gerrit.wikimedia.org/r/333794 [22:09:42] (03CR) 10Madhuvishy: [C: 031] nfsclient: remove temporary absents from migration [puppet] - 10https://gerrit.wikimedia.org/r/333794 (owner: 10Rush) [22:11:23] (03CR) 10Madhuvishy: [C: 031] nfs_mount: protect absent the same as present [puppet] - 10https://gerrit.wikimedia.org/r/333791 (owner: 10Rush) [22:11:38] (03CR) 10Rush: [C: 032] nfs_mount: protect absent the same as present [puppet] - 10https://gerrit.wikimedia.org/r/333791 (owner: 
10Rush) [22:13:55] 06Operations: Select site vendor for Asia Cache Datacenter - https://phabricator.wikimedia.org/T156030#2963216 (10Krenair) [22:15:41] (03CR) 10Rush: [C: 032] nfsclient: remove temporary absents from migration [puppet] - 10https://gerrit.wikimedia.org/r/333794 (owner: 10Rush) [22:18:06] !log update RESTBase to d1663345c: staging [22:18:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:21:23] !log update RESTBase to d1663345c: canary on restbase1007 [22:21:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:22:23] !log update RESTBase to d1663345c [22:22:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:28:15] RECOVERY - puppet last run on cp4013 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [22:37:05] !log update RESTBase to 598fa56f: canary on restbase1007 [22:37:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:38:45] isn't restbase the DB or am i thinking of something else, sorry if that's a stupid question, it's just i can't remember what it's for [22:38:53] !log update RESTBase to 598fa56f [22:38:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:46:35] PROBLEM - mediawiki-installation DSH group on mw2098 is CRITICAL: Host mw2098 is not in mediawiki-installation dsh group [22:54:55] PROBLEM - puppet last run on cp3039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:03:28] 06Operations, 10ops-codfw: troubleshoot drac on ms-be2010.codfw.wmnet - https://phabricator.wikimedia.org/T155690#2963437 (10Papaul) Cannot flash firmware of drac/bios getting error message when trying to update the drac/bios. Resetting the IDRAC to factory hangs at 1% for 45 minutes have to cancel. I will res... [23:04:55] PROBLEM - puppet last run on db1038 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [23:22:55] PROBLEM - puppet last run on snapshot1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:23:55] RECOVERY - puppet last run on cp3039 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [23:32:55] RECOVERY - puppet last run on db1038 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [23:35:37] (03CR) 10Volans: [C: 032] "LGTM" [software] - 10https://gerrit.wikimedia.org/r/325762 (https://phabricator.wikimedia.org/T152549) (owner: 10Hashar) [23:37:04] (03CR) 10Volans: [C: 032] "recheck" [software] - 10https://gerrit.wikimedia.org/r/325762 (https://phabricator.wikimedia.org/T152549) (owner: 10Hashar) [23:37:44] (03Merged) 10jenkins-bot: Use local tox instead of installing a new one [software] - 10https://gerrit.wikimedia.org/r/325762 (https://phabricator.wikimedia.org/T152549) (owner: 10Hashar) [23:41:44] (03PS8) 10Dzahn: aptrepo: setup rsync between 2 APT servers [puppet] - 10https://gerrit.wikimedia.org/r/333676 (https://phabricator.wikimedia.org/T84380) [23:43:05] PROBLEM - Redis replication status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 619 600 - REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3095361 keys, up 84 days 15 hours - replication_delay is 619 [23:43:15] PROBLEM - Redis replication status tcp_6479 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 636 600 - REDIS 2.8.17 on 10.192.32.133:6479 has 1 databases (db0) with 3095558 keys, up 84 days 15 hours - replication_delay is 636 [23:45:09] (03PS9) 10Dzahn: aptrepo: setup rsync between 2 APT servers [puppet] - 10https://gerrit.wikimedia.org/r/333676 (https://phabricator.wikimedia.org/T84380) [23:46:53] (03CR) 10Volans: [C: 04-1] "Hashar, what about the warnings I'm getting in the submodules?" 
[puppet] - 10https://gerrit.wikimedia.org/r/331239 (https://phabricator.wikimedia.org/T154915) (owner: 10Hashar) [23:48:15] RECOVERY - Redis replication status tcp_6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 10.192.32.133:6479 has 1 databases (db0) with 3091539 keys, up 84 days 15 hours - replication_delay is 0 [23:49:05] RECOVERY - Redis replication status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3091380 keys, up 84 days 15 hours - replication_delay is 0 [23:50:12] (03PS1) 10Jcrespo: mariadb: repool db1065 with low weight after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333812 (https://phabricator.wikimedia.org/T156005) [23:50:55] RECOVERY - puppet last run on snapshot1005 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [23:51:43] (03CR) 10jerkins-bot: [V: 04-1] aptrepo: setup rsync between 2 APT servers [puppet] - 10https://gerrit.wikimedia.org/r/333676 (https://phabricator.wikimedia.org/T84380) (owner: 10Dzahn) [23:54:02] (03CR) 10Volans: "See my comment inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/327388 (https://phabricator.wikimedia.org/T151632) (owner: 10Dzahn) [23:56:30] 06Operations, 10Analytics, 10Analytics-Cluster, 06Research-and-Data, and 2 others: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#2963557 (10ellery) I'm in no rush, especially if I can get some budget to rent GPUs on AWS in the meantime. [23:57:51] 06Operations, 07Puppet, 10Deployment-Systems, 06Release-Engineering-Team, 05Mediawiki SWAT Deployments: mwdebug1002 should have PHP extensions - https://phabricator.wikimedia.org/T153316#2963564 (10greg) >>! In T153316#2961290, @MoritzMuehlenhoff wrote: >>>! In T153316#2881845, @greg wrote: >> Let's do t... [23:59:35] PROBLEM - puppet last run on snapshot1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues