[00:00:06] !log bblack@neodymium conftool action : set/pooled=yes; selector: name=chromium.wikimedia.org
[00:00:10] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: I, the Bot under the Fountain, allow thee, The Deployer, to do Evening SWAT (Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171122T0000).
[00:00:10] bawolff: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[00:00:10] Deploy window No deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171122T0000)
[00:00:10] No GERRIT patches in the queue for this window AFAICS.
[00:00:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:00:34] Woo
[00:00:44] I'll do it
[00:00:46] addshore hashar anomie RainbowSprinkles aude MaxSem twentyafterfour RoanKattouw Dereckson thcipriani Niharika zeljkof hey actually may have a patch to deploy for CN
[00:00:52] Thanks
[00:01:03] !log bblack@neodymium conftool action : set/pooled=yes; selector: name=maerlant.wikimedia.org
[00:01:04] in about 15 or 20 min
[00:01:06] !log bblack@neodymium conftool action : set/pooled=no; selector: name=maerlant.wikimedia.org
[00:01:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:01:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:01:21] just some quick instrumentation before a FR test tomorrow
[00:02:43] AndyRussG: sure, just add it to the page
[00:02:57] !log bblack@neodymium conftool action : set/pooled=yes; selector: name=maerlant.wikimedia.org
[00:03:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:03:06] (PS2) MaxSem: Enable Timeless everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/392576 (https://phabricator.wikimedia.org/T154371) (owner: Brian Wolff)
[00:03:22] (CR) MaxSem: [C: 2] Enable Timeless everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/392576 (https://phabricator.wikimedia.org/T154371) (owner: Brian Wolff)
[00:03:44] MaxSem: thanks!!!!
[00:04:33] (Merged) jenkins-bot: Enable Timeless everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/392576 (https://phabricator.wikimedia.org/T154371) (owner: Brian Wolff)
[00:04:41] (I guess it's fine to break the cluster if it's FR telling you to do that :P)
[00:04:42] !log bblack@neodymium conftool action : set/pooled=no; selector: name=nescio.wikimedia.org
[00:04:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:07:01] (CR) jenkins-bot: Enable Timeless everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/392576 (https://phabricator.wikimedia.org/T154371) (owner: Brian Wolff)
[00:07:07] !log bblack@neodymium conftool action : set/pooled=yes; selector: name=nescio.wikimedia.org
[00:07:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:07:32] bawolff: both changes pulled on mwdebug1002, please test
[00:08:04] confirmed timeless works
[00:08:40] confirmed that https://zh.wikipedia.org/wiki/Wikipedia:%E4%BA%92%E5%8A%A9%E5%AE%A2%E6%A0%88/%E6%8A%80%E6%9C%AF#inputbox.E7.9A.84.E7.B9.81.E7.B0.A1.E8.BD.89.E6.8F.9B.E5.A4.B1.E6.95.88.E4.BA.86 now works
[00:08:44] Looks good :)
[00:08:49] wee
[00:10:13] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/392576/ (duration: 00m 46s)
[00:10:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:10:31] forgive my ignorance - this means s/vector/timeless/ is possible per-user on all wikis now, or?
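The conftool `!log` entries above all share one shape: `user@host conftool action : set/KEY=VALUE; selector: name=TARGET`. The following is a minimal Python sketch, assuming that shape holds exactly as it appears in this log. It parses such lines and replays them to get each target's last pool state. The regex, `ConftoolAction` and `parse_conftool_log` are illustrative names and are not part of conftool itself.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for the conftool !log lines as they appear in this channel, e.g.
# "!log bblack@neodymium conftool action : set/pooled=yes; selector: name=maerlant.wikimedia.org"
CONFTOOL_LOG = re.compile(
    r"!log (?P<user>[\w.-]+)@(?P<host>[\w.-]+) conftool action : "
    r"set/(?P<key>\w+)=(?P<value>\w+); selector: name=(?P<target>\S+)"
)


class ConftoolAction(NamedTuple):
    user: str
    host: str
    key: str
    value: str
    target: str


def parse_conftool_log(line: str) -> Optional[ConftoolAction]:
    """Return the parsed action, or None if the line is not a conftool !log entry."""
    m = CONFTOOL_LOG.search(line)
    if m is None:
        return None
    return ConftoolAction(**m.groupdict())


if __name__ == "__main__":
    lines = [
        "[00:01:03] !log bblack@neodymium conftool action : set/pooled=yes; selector: name=maerlant.wikimedia.org",
        "[00:01:06] !log bblack@neodymium conftool action : set/pooled=no; selector: name=maerlant.wikimedia.org",
        "[00:01:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log",
    ]
    # Replay pool/depool actions in order; the last one seen per target wins.
    state = {}
    for line in lines:
        action = parse_conftool_log(line)
        if action is not None:
            state[action.target] = action.value
    print(state)  # {'maerlant.wikimedia.org': 'no'}
```

Non-conftool `!log` lines, such as the gerrit restart entries, simply return `None`.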
[00:11:13] bblack: There's now an option in preferences where you can enable timeless
[00:11:28] https://en.wikipedia.org/w/index.php?title=Main_Page&useskin=timeless looks really nice :)
[00:11:28] The individual users have to select it
[00:11:43] And the useskin= url parameter will now work on all wikis
[00:11:46] MaxSem: <3
[00:11:49] !log maxsem@tin Synchronized php-1.31.0-wmf.8/extensions/InputBox/: https://gerrit.wikimedia.org/r/#/c/392745/ (duration: 00m 45s)
[00:11:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:12:06] no, <3<3<3 to you Isarra :)
[00:12:07] awesome!
[00:12:32] There are so many bugs...
[00:12:38] And now there will be more!
[00:12:41] Muahahahah!
[00:13:11] I mean...
[00:15:26] page icons look weird: https://en.wikipedia.org/wiki/Robert_Mugabe?useskin=timeless
[00:15:55] infobox margins are gone
[00:17:32] BUT BESIDES THAT IT'S AWESOME
[00:20:09] Much to fix. Um.
[00:20:16] I'll get on that at some point.
[00:20:33] Also yeah, what is with the infoboxes? I never did figure that one out...
[00:21:13] MaxSem: changes are on the Deployments page, Gerrit just merged, it seems :)
[00:21:16] thanks much!!
[00:23:12] AndyRussG: how is it deployed? I don't see wmf branches
[00:24:40] MaxSem: ah yes sorry CentralNotice is still a snowflake
[00:24:54] in core, the submodule points to the head of the wmf_deploy branch
[00:25:33] MaxSem: CN should now be at c3593256c1ae8af7d128446adea3298602718178
[00:25:40] which is what's to be deployed
[00:26:07] AndyRussG: so checkout that hash manually?
[00:27:44] MaxSem: I think there's some magic... if you fetch and rebase it should put CN submodule to that automatically
[00:28:01] ughhhh
[00:28:29] the wmf branch of mediawiki core, just like for all other extensions, tracks a certain branch. For most extensions, it tracks the same-name wmf branch
[00:28:34] for CN it is wmf_deploy, hardcoded
[00:28:52] Submodule path 'extensions/CentralNotice': rebased into 'c3593256c1ae8af7d128446adea3298602718178'
[00:28:54] And just like for other extensions, a commit to that branch should result in an automatic commit by Gerrit to update the submodule pointer, so git pull in mw core install should provide a submodule update.
[00:28:59] Did it not do that?
[00:29:10] it did
[00:29:18] Krinkle: MaxSem: cool thanks!
[00:29:19] Looks like it did https://github.com/wikimedia/mediawiki/commits/wmf/1.31.0-wmf.8
[00:29:34] Yeah this needs un-snowflaking
[00:29:40] apologies for the cruft
[00:29:46] AndyRussG: live on mwdebug1002, please test
[00:30:09] For SWAT it behaves the same as any other extension, the abstraction is moved to Gerrit internals, so not that different anymore for what it's worth. Still tech debt though.
[00:32:22] MaxSem: K!
[00:36:14] AndyRussG: so we're fine to deploy?
[00:36:28] MaxSem: yes all good :)
[00:36:29] PROBLEM - Apache HTTP on mw2113 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:37:19] RECOVERY - Apache HTTP on mw2113 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.119 second response time
[00:38:51] jouncebot: next
[00:38:51] In 130 hour(s) and 21 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171127T1100)
[00:39:08] !log maxsem@tin Synchronized php-1.31.0-wmf.8/extensions/CentralNotice/: https://gerrit.wikimedia.org/r/#/c/392754/ (duration: 00m 47s)
[00:39:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:39:40] AndyRussG: ^
[00:40:24] !log gerrit - re-enabling puppet to apply logstash change on cobalt, gerrit restart incoming (T141324)
[00:40:26] MaxSem: excellent!!!
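Krinkle explains above that each mediawiki/core wmf branch tracks a branch of every extension: the same-name wmf branch for most of them, and `wmf_deploy` for CentralNotice. That setup can be sketched in Python by reading the `branch` key from a `.gitmodules` file. git uses `submodule.<name>.branch` for `git submodule update --remote`, and this sketch assumes the same key is what drives Gerrit's automatic submodule-pointer commits. The sample file contents, including the URLs and exact branch values, are illustrative assumptions and were not copied from mediawiki/core.

```python
import re
from typing import Dict, Optional

# Illustrative .gitmodules excerpt (an assumption, not the real mediawiki/core file):
# InputBox tracks the same-name wmf branch, CentralNotice tracks wmf_deploy.
SAMPLE_GITMODULES = """\
[submodule "extensions/InputBox"]
\tpath = extensions/InputBox
\turl = https://gerrit.wikimedia.org/r/mediawiki/extensions/InputBox
\tbranch = wmf/1.31.0-wmf.8
[submodule "extensions/CentralNotice"]
\tpath = extensions/CentralNotice
\turl = https://gerrit.wikimedia.org/r/mediawiki/extensions/CentralNotice
\tbranch = wmf_deploy
"""

SECTION = re.compile(r'^\[submodule "(?P<name>[^"]+)"\]$')
KEY_VALUE = re.compile(r"^\s*(?P<key>\w+)\s*=\s*(?P<value>.*?)\s*$")


def tracked_branches(gitmodules: str) -> Dict[str, str]:
    """Map each submodule name to the branch its `branch` key says it tracks."""
    branches: Dict[str, str] = {}
    current: Optional[str] = None
    for line in gitmodules.splitlines():
        section = SECTION.match(line.strip())
        if section:
            current = section.group("name")
            continue
        kv = KEY_VALUE.match(line)
        if current is not None and kv and kv.group("key") == "branch":
            branches[current] = kv.group("value")
    return branches


def snowflakes(branches: Dict[str, str], core_branch: str) -> Dict[str, str]:
    """Submodules whose tracked branch differs from the core wmf branch."""
    return {name: b for name, b in branches.items() if b != core_branch}


if __name__ == "__main__":
    branches = tracked_branches(SAMPLE_GITMODULES)
    print(snowflakes(branches, "wmf/1.31.0-wmf.8"))  # {'extensions/CentralNotice': 'wmf_deploy'}
```

For the sample, `snowflakes()` singles out `extensions/CentralNotice`, which is the extension MaxSem could not find a wmf branch for.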
[00:40:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:40:32] T141324: Look into shoving gerrit logs into logstash - https://phabricator.wikimedia.org/T141324
[00:40:51] MaxSem: thanks much!!!!!
[00:41:06] Krinkle: thx!!!
[00:43:40] !log gerrit restarting service to apply config change
[00:43:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:44:49] gerrit is back
[00:46:47] (PS11) TerraCodes: $wmf* -> $wmg* [mediawiki-config] - https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956)
[01:09:00] !log Cleaned out remaining T180934 related log blow up on snapshot1007 (dumpwikidatajson-wikidata-20171120-all-0.log)
[01:09:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:09:06] T180934: Wikidata json dumps filling /var/log - https://phabricator.wikimedia.org/T180934
[01:13:09] Operations: Migrate racktables to servermon - https://phabricator.wikimedia.org/T88424#3780052 (faidon)
[01:13:11] Operations, DC-Ops: Information missing from racktables - https://phabricator.wikimedia.org/T150651#3780050 (faidon) Open>Resolved Good enough for now. Thanks everyone!
[01:26:30] PROBLEM - Host cp3048 is DOWN: PING CRITICAL - Packet loss = 100%
[01:28:39] RECOVERY - Host cp3048 is UP: PING OK - Packet loss = 0%, RTA = 83.77 ms
[01:36:03] (PS1) Iniquity: Revert "Enable Timeless everywhere" [mediawiki-config] - https://gerrit.wikimedia.org/r/392762
[01:37:05] (CR) Paladox: [C: -1] "I liked it enable on english wikipedia." [mediawiki-config] - https://gerrit.wikimedia.org/r/392762 (owner: Iniquity)
[01:39:28] (CR) Brian Wolff: "Why?" [mediawiki-config] - https://gerrit.wikimedia.org/r/392762 (owner: Iniquity)
[01:41:40] (PS1) Dzahn: Revert "Revert "apache: remove ganglia monitoring"" [puppet] - https://gerrit.wikimedia.org/r/392763
[01:41:47] (PS1) Dzahn: Revert "Revert "hhvm: remove ganglia monitoring"" [puppet] - https://gerrit.wikimedia.org/r/392764
[01:41:58] (CR) jerkins-bot: [V: -1] Revert "Revert "apache: remove ganglia monitoring"" [puppet] - https://gerrit.wikimedia.org/r/392763 (owner: Dzahn)
[01:44:26] (CR) Dereckson: "> If you want it disabled on a wiki please create a patch for that wiki." [mediawiki-config] - https://gerrit.wikimedia.org/r/392762 (owner: Iniquity)
[01:47:38] (CR) MaxSem: [C: -2] "Please provide an explanation. Revert warring in source repositories is a big no-no." [mediawiki-config] - https://gerrit.wikimedia.org/r/392762 (owner: Iniquity)
[01:49:21] (PS1) Kaldari: Enable MP3 uploading on Commons on Beta Cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/392765 (https://phabricator.wikimedia.org/T120288)
[01:50:25] (CR) jerkins-bot: [V: -1] Enable MP3 uploading on Commons on Beta Cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/392765 (https://phabricator.wikimedia.org/T120288) (owner: Kaldari)
[01:51:49] (PS2) Kaldari: Enable MP3 uploading on Commons on Beta Cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/392765 (https://phabricator.wikimedia.org/T120288)
[02:35:43] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.8) (duration: 11m 34s)
[02:35:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:04:11] Operations, Domains, Traffic: Acquire wikidirectory.org - https://phabricator.wikimedia.org/T181114#3780141 (Legoktm) Open>declined If/when https://meta.wikimedia.org/wiki/Wikidirectory is approved (or it looks likely to be approved) then it would make sense to buy the domain. At this time fi...
[03:08:52] Operations, Domains, Traffic: Acquire wikidirectory.org - https://phabricator.wikimedia.org/T181114#3780145 (KATMAKROFAN) 19 support, 2 oppose is not "unlikely to be approved".
[03:15:21] Operations, Domains, Traffic: Acquire wikidirectory.org - https://phabricator.wikimedia.org/T181114#3780146 (Legoktm) Please compare that to https://meta.wikimedia.org/wiki/Wikivoyage/Archive/2012-11-16#People_interested and https://meta.wikimedia.org/wiki/Requests_for_comment/Travel_Guide
[03:16:06] Operations, Domains, Traffic: Acquire wikidirectory.org - https://phabricator.wikimedia.org/T181114#3780124 (Dzahn) I suggest contacting the [[ https://meta.wikimedia.org/wiki/Legal | Legal ]] team directly about this.
[03:24:59] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 773.34 seconds
[03:30:14] (PS1) Dzahn: dumps/site: remove outdated comment about firewall [puppet] - https://gerrit.wikimedia.org/r/392767
[03:30:43] (CR) jerkins-bot: [V: -1] dumps/site: remove outdated comment about firewall [puppet] - https://gerrit.wikimedia.org/r/392767 (owner: Dzahn)
[03:30:56] (PS2) Dzahn: dumps/site: remove outdated comment about firewall [puppet] - https://gerrit.wikimedia.org/r/392767
[03:33:49] PROBLEM - Host mw2251 is DOWN: PING CRITICAL - Packet loss = 100%
[03:36:10] !log powercycled mw2251 which had gone down without further comment
[03:36:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:38:19] RECOVERY - Host mw2251 is UP: PING OK - Packet loss = 0%, RTA = 36.15 ms
[03:42:17] (PS1) Dzahn: mw_canary_appserver: mv firewall incl to role, use profile [puppet] - https://gerrit.wikimedia.org/r/392768
[03:46:20] (PS1) Dzahn: labnodepool: move standard/firewall includes to role [puppet] - https://gerrit.wikimedia.org/r/392769
[03:50:49] Operations, Data-Services, Patch-For-Review, cloud-services-team (Kanban): Reimage labstore1001 and labstore1002 for DRBD storage setup - https://phabricator.wikimedia.org/T158196#3029409 (Dzahn) @chasemp Should we reinstall these with stretch? I noticed them in site.pp with a comment leading to...
[03:55:09] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 247.16 seconds
[03:55:47] (PS1) Dzahn: archiva: move standard include, use profile::b::firewall [puppet] - https://gerrit.wikimedia.org/r/392770
[03:56:39] ACKNOWLEDGEMENT - Long running screen/tmux on puppetcompiler1001 is CRITICAL: CRIT: Long running SCREEN process. (PID: 12278, 2864405s 1728000s). daniel_zahn going to be removed soon
[04:01:10] ACKNOWLEDGEMENT - Long running screen/tmux on graphite1001 is CRITICAL: CRIT: Long running SCREEN process. (PID: 36516, 1921668s 1728000s). daniel_zahn krinkle
[04:39:18] (PS1) Dzahn: nagios_common: remove krinkle from perf-team contactgroup [puppet] - https://gerrit.wikimedia.org/r/392771
[04:40:38] (CR) Dzahn: [C: 2] nagios_common: remove krinkle from perf-team contactgroup [puppet] - https://gerrit.wikimedia.org/r/392771 (owner: Dzahn)
[04:55:46] (CR) Krinkle: "@Lego Hm.. if this is not used by the import, perhaps it would make more sense to reduce these pieces of text to just the name and e-mail?" [mediawiki-config] - https://gerrit.wikimedia.org/r/392461 (owner: Legoktm)
[04:57:19] (CR) Legoktm: "I included the fingerprint specifically because Tim has two keys, and neither of them are his current key (all 3 with same email and name)" [mediawiki-config] - https://gerrit.wikimedia.org/r/392461 (owner: Legoktm)
[05:12:32] PROBLEM - HHVM rendering on mw1294 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:13:23] RECOVERY - HHVM rendering on mw1294 is OK: HTTP OK: HTTP/1.1 200 OK - 74705 bytes in 0.133 second response time
[05:34:54] (PS3) Kaldari: Enable MP3 uploading on Commons on Beta Cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/392765 (https://phabricator.wikimedia.org/T120288)
[06:22:12] RECOVERY - cassandra-b CQL 10.192.32.138:9042 on restbase2004 is OK: TCP OK - 0.036 second response time on 10.192.32.138 port 9042
[06:29:12] PROBLEM - puppet last run on mw2222 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/furl]
[06:31:22] PROBLEM - puppet last run on mw2165 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/conftool/schema.yaml]
[06:34:49] (PS1) Marostegui: db-eqiad.php: Move db1051 to s5 [mediawiki-config] - https://gerrit.wikimedia.org/r/392785 (https://phabricator.wikimedia.org/T177208)
[06:38:19] (PS1) Marostegui: mariadb: Move db1051 from s1 to s5 [puppet] - https://gerrit.wikimedia.org/r/392786 (https://phabricator.wikimedia.org/T177208)
[06:39:42] (CR) Marostegui: [C: 2] db-eqiad.php: Move db1051 to s5 [mediawiki-config] - https://gerrit.wikimedia.org/r/392785 (https://phabricator.wikimedia.org/T177208) (owner: Marostegui)
[06:40:55] (Merged) jenkins-bot: db-eqiad.php: Move db1051 to s5 [mediawiki-config] - https://gerrit.wikimedia.org/r/392785 (https://phabricator.wikimedia.org/T177208) (owner: Marostegui)
[06:41:06] (CR) jenkins-bot: db-eqiad.php: Move db1051 to s5 [mediawiki-config] - https://gerrit.wikimedia.org/r/392785 (https://phabricator.wikimedia.org/T177208) (owner: Marostegui)
[06:42:28] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Move db1051 from s1 to s5 - T177208 (duration: 00m 53s)
[06:42:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:42:36] T177208: Provide dedicated database resources for wikidata - https://phabricator.wikimedia.org/T177208
[06:43:27] !log Stop MySQL on db1063 and db1051 (which is going to be recloned) - T177208
[06:43:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:51:11] (CR) Marostegui: "Puppet looks good: https://puppet-compiler.wmflabs.org/compiler02/8899/" [puppet] - https://gerrit.wikimedia.org/r/392786 (https://phabricator.wikimedia.org/T177208) (owner: Marostegui)
[06:51:13] (CR) Marostegui: [C: 2] mariadb: Move db1051 from s1 to s5 [puppet] - https://gerrit.wikimedia.org/r/392786 (https://phabricator.wikimedia.org/T177208) (owner: Marostegui)
[06:55:27] Operations, ops-codfw, DBA, Patch-For-Review: db2068 storage crash - https://phabricator.wikimedia.org/T180927#3780283 (Marostegui) Open>Resolved
[06:56:22] RECOVERY - puppet last run on mw2165 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[06:59:12] RECOVERY - puppet last run on mw2222 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[06:59:55] (PS1) Marostegui: s1,s5.hosts: Move db1051 from s1 to s5 [software] - https://gerrit.wikimedia.org/r/392787 (https://phabricator.wikimedia.org/T177208)
[07:01:00] (CR) Marostegui: [C: 2] s1,s5.hosts: Move db1051 from s1 to s5 [software] - https://gerrit.wikimedia.org/r/392787 (https://phabricator.wikimedia.org/T177208) (owner: Marostegui)
[07:01:41] (Merged) jenkins-bot: s1,s5.hosts: Move db1051 from s1 to s5 [software] - https://gerrit.wikimedia.org/r/392787 (https://phabricator.wikimedia.org/T177208) (owner: Marostegui)
[07:12:54] PROBLEM - puppet last run on es2016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:13:14] PROBLEM - puppet last run on db2053 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:13:54] PROBLEM - puppet last run on wtp1037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:13:54] PROBLEM - puppet last run on maps-test2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:14:54] PROBLEM - puppet last run on ms-be2028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:16:14] PROBLEM - puppet last run on mw1281 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:16:14] PROBLEM - puppet last run on elastic2035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:16:15] PROBLEM - puppet last run on analytics1030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:16:15] PROBLEM - puppet last run on db1043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:16:15] PROBLEM - puppet last run on releases2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:16:24] PROBLEM - Check Varnish expiry mailbox lag on cp4024 is CRITICAL: CRITICAL: expiry mailbox lag is 2013688
[07:16:34] PROBLEM - puppet last run on logstash1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:16:44] PROBLEM - puppet last run on elastic1029 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:16:45] PROBLEM - puppet last run on cp2010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:16:56] <_joe_> uhm
[07:16:59] <_joe_> nitrogen again?
[07:17:14] PROBLEM - puppet last run on cp4026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:17:15] PROBLEM - puppet last run on mw1229 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:17:35] <_joe_> puppetdb restarted 7 minutes ago
[07:17:38] <_joe_> so I guess so
[07:18:17] <_joe_> indeed
[07:29:46] <_joe_> !log stopping the additional workers for htmlCacheUpdate (commons and ruwiki), adding one additional runner for refreshLinks on ruwiki
[07:29:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:35:13] (PS1) Elukey: profile::mariadb::misc::el::replication: fix el cleaner cronjob [puppet] - https://gerrit.wikimedia.org/r/392788 (https://phabricator.wikimedia.org/T156933)
[07:35:38] (CR) jerkins-bot: [V: -1] profile::mariadb::misc::el::replication: fix el cleaner cronjob [puppet] - https://gerrit.wikimedia.org/r/392788 (https://phabricator.wikimedia.org/T156933) (owner: Elukey)
[07:36:24] PROBLEM - Check Varnish expiry mailbox lag on cp4024 is CRITICAL: CRITICAL: expiry mailbox lag is 2029221
[07:38:04] (PS2) Elukey: profile::mariadb::misc::el::replication: fix el cleaner cronjob [puppet] - https://gerrit.wikimedia.org/r/392788 (https://phabricator.wikimedia.org/T156933)
[07:41:14] RECOVERY - puppet last run on elastic2035 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:41:15] RECOVERY - puppet last run on analytics1030 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[07:41:15] RECOVERY - puppet last run on db1043 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:41:15] RECOVERY - puppet last run on releases2001 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[07:41:34] RECOVERY - puppet last run on logstash1004 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[07:41:44] RECOVERY - puppet last run on elastic1029 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[07:41:44] RECOVERY - puppet last run on cp2010 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[07:42:14] RECOVERY - puppet last run on cp4026 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[07:42:15] RECOVERY - puppet last run on mw1229 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:42:55] RECOVERY - puppet last run on es2016 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:43:14] RECOVERY - puppet last run on db2053 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:43:54] RECOVERY - puppet last run on wtp1037 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[07:43:54] RECOVERY - puppet last run on maps-test2004 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[07:44:54] RECOVERY - puppet last run on ms-be2028 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:46:14] RECOVERY - puppet last run on mw1281 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:48:04] PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve selected the events for Jan 01) timed out before a response was received
[07:48:48] !log Drop index from ores_classification on s7 - T180045
[07:48:54] RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy
[07:48:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:48:55] T180045: Review and deploy schema change on dropping oresc_rev_predicted_model index - https://phabricator.wikimedia.org/T180045
[07:50:21] !log Drop index from ores_classification on s6 - T180045
[07:50:24] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[07:50:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:51:44] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[07:56:22] !log Drop index from ores_classification on s3 - T180045
[07:56:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:56:30] T180045: Review and deploy schema change on dropping oresc_rev_predicted_model index - https://phabricator.wikimedia.org/T180045
[08:02:45] marostegui: let's stop a second to drop indexes until the 5xx are solved ok? Just as precaution
[08:02:58] yep!
[08:03:44] PROBLEM - Host actinium is DOWN: PING CRITICAL - Packet loss = 100%
[08:03:44] PROBLEM - Host bohrium is DOWN: PING CRITICAL - Packet loss = 100%
[08:03:54] PROBLEM - Host etcd1004 is DOWN: PING CRITICAL - Packet loss = 100%
[08:04:04] PROBLEM - Host releases1001 is DOWN: PING CRITICAL - Packet loss = 100%
[08:04:05] PROBLEM - Host etcd1002 is DOWN: PING CRITICAL - Packet loss = 100%
[08:04:05] PROBLEM - Host hassium is DOWN: PING CRITICAL - Packet loss = 100%
[08:04:05] PROBLEM - Host netmon1003 is DOWN: PING CRITICAL - Packet loss = 100%
[08:04:14] PROBLEM - Host planet1001 is DOWN: PING CRITICAL - Packet loss = 100%
[08:04:17] what?
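The `HTTP 5xx reqs/min` alerts report what percentage of recent Graphite datapoints crossed a threshold. In these alerts the critical threshold is 1000 and the warning threshold is 250. A minimal Python sketch of that logic follows. The 1% trigger level is inferred from the `Less than 1.00% above the threshold` recovery text and is an assumption. The function names, the WARNING wording and the sample series are also hypothetical.

```python
from typing import Optional, Sequence


def percent_above(datapoints: Sequence[Optional[float]], threshold: float) -> float:
    """Percentage of non-null datapoints strictly above `threshold` (None marks a Graphite gap)."""
    values = [v for v in datapoints if v is not None]
    if not values:
        return 0.0
    above = sum(1 for v in values if v > threshold)
    return 100.0 * above / len(values)


def check_5xx(
    datapoints: Sequence[Optional[float]],
    warning: float = 250.0,
    critical: float = 1000.0,
    trigger_pct: float = 1.0,  # assumed, inferred from "Less than 1.00% above the threshold"
) -> str:
    """Return an alert-style status line for a series of 5xx reqs/min samples."""
    crit_pct = percent_above(datapoints, critical)
    if crit_pct >= trigger_pct:
        return f"CRITICAL: {crit_pct:.2f}% of data above the critical threshold [{critical}]"
    warn_pct = percent_above(datapoints, warning)
    if warn_pct >= trigger_pct:
        # Hypothetical wording: no WARNING state appears in this log.
        return f"WARNING: {warn_pct:.2f}% of data above the warning threshold [{warning}]"
    return f"OK: Less than {trigger_pct:.2f}% above the threshold [{warning}]"


if __name__ == "__main__":
    # Hypothetical series: one spike out of nine samples.
    series = [100, 120, 90, 1500, 110, 95, 130, 105, 100]
    print(check_5xx(series))  # CRITICAL: 11.11% of data above the critical threshold [1000.0]
```

As arithmetic, 11.11% and 22.22% are exactly 1/9 and 2/9. That would fit a nine-sample window, but the log does not state the window size.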
[08:04:44] PROBLEM - SSH on ganeti1005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:05:46] those are all ganeti instances, probably all running on ganeti1005
[08:09:25] RECOVERY - Host releases1001 is UP: PING WARNING - Packet loss = 93%, RTA = 3.99 ms
[08:09:25] RECOVERY - Host etcd1002 is UP: PING WARNING - Packet loss = 93%, RTA = 3.16 ms
[08:09:25] RECOVERY - Host hassium is UP: PING WARNING - Packet loss = 73%, RTA = 1.33 ms
[08:09:25] RECOVERY - Host netmon1003 is UP: PING WARNING - Packet loss = 86%, RTA = 2.20 ms
[08:09:25] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[08:09:25] RECOVERY - Host planet1001 is UP: PING WARNING - Packet loss = 44%, RTA = 3.15 ms
[08:09:34] RECOVERY - Host etcd1004 is UP: PING OK - Packet loss = 0%, RTA = 1.15 ms
[08:09:34] RECOVERY - Host actinium is UP: PING OK - Packet loss = 0%, RTA = 1.23 ms
[08:09:35] RECOVERY - SSH on ganeti1005 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0)
[08:11:05] there were a lot of 5xx from piwik raising up all of a sudden, so those are probably related to ganeti1005
[08:12:09] it has broken hardware I think
[08:12:39] initially I couldn't connect via ssh and when I tried mgmt it worked fine
[08:12:52] but dmesg is full of "bad page state" errors
[08:13:23] probably broken memory
[08:13:25] PROBLEM - Misc HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=misc&var-status_type=5
[08:13:33] this one is for piwik --^
[08:15:22] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=2&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5&from=1511330216444&to=1511338505693
[08:15:32] maybe lost in noise above, but there's text 5xx recently too ^
[08:16:14] RECOVERY - Host bohrium is UP: PING OK - Packet loss = 0%, RTA = 0.46 ms
[08:16:31] !log backend restart on cp4024 (upload@ulsfo) - mailbox lag
[08:16:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:19:38] bblack: yep, did you find anything interesting among those text 50x?
[08:20:16] in the meantime, bohrium/piwik vm seems to be stable-ish (https://grafana.wikimedia.org/dashboard/db/piwik?orgId=1&from=now-3h&to=now)
[08:21:56] even if ganeti1005 doesn't seem healthy
[08:22:30] elukey: I'm creating a ticket, we should probably take it down for a dc-ops mem test
[08:23:41] I'm not doing a "gnt-node evacuate" on ganeti1005 yet, I guess Alex will be around soon, wanted to doublecheck with him first
[08:23:56] (since the VMs running on it seem to be fine ATM)
[08:26:24] RECOVERY - Check Varnish expiry mailbox lag on cp4024 is OK: OK: expiry mailbox lag is 0
[08:27:34] RECOVERY - Misc HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=misc&var-status_type=5
[08:28:02] as far as I can see from https://logstash.wikimedia.org/app/kibana#/dashboard/Varnish-Webrequest-50X we still have a background 50x noise for text though
[08:28:56] Operations, ops-eqiad: Possible memory errors on ganeti1005 - https://phabricator.wikimedia.org/T181121#3780358 (MoritzMuehlenhoff)
[08:37:51] Operations, Electron-PDFs, Security-Reviews, Services-next, and 2 others: Restrict outgoing network connections from Electron render service - https://phabricator.wikimedia.org/T148567#3780384 (mobrovac) >>! In T148567#3778518, @dpatrick wrote: > Just following up on some lingering security revie...
[08:38:18] yeah we do have some new text 5xx since ~07:04
[08:38:26] it's not very high rate, but it's new and notable
[08:38:54] I'm in the midst of an all night pumpkin pie baking fiasco, but peeking between steps :P
[08:41:00] :)
[08:48:54] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[08:58:46] (PS1) BBlack: lvs@ulsfo - switch primaries to new hardware [puppet] - https://gerrit.wikimedia.org/r/392792
[09:00:46] !log lvs4005 - reboot to clear experimental stuff
[09:00:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:01:52] (PS1) Hashar: Fix flake8 issues [software/conftool] - https://gerrit.wikimedia.org/r/392793
[09:04:06] Operations, ops-codfw: Degraded RAID on wtp2017 - https://phabricator.wikimedia.org/T180211#3780451 (akosiaris) Open>Resolved I've added the disk to the RAID array. For those interested the commands were ``` dd if=/dev/sda1 of=/dev/sdb bs=512 count=1 # (copy the MBR from the first disk) fdisk...
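akosiaris's RAID note copies a boot sector with `dd ... bs=512 count=1`, that is, exactly one 512-byte sector. The Python sketch below does the same on disk image files. As an extra safety step that `dd` itself does not perform, it checks for the `0x55 0xAA` boot signature that a valid MBR carries at byte offset 510. It is only an illustration of what the `dd` line does, and `copy_first_sector` is a hypothetical helper. An MBR lives in the first sector of a whole-disk device rather than of a partition.

```python
import os

SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes (offsets 510-511) of a valid MBR


def copy_first_sector(src_path: str, dst_path: str) -> bytes:
    """Copy the first 512-byte sector of src over the start of dst, like `dd bs=512 count=1`.

    dst is opened "r+b", so everything after the first sector is left untouched. That
    matches dd writing to a block device. On a regular file, dd would truncate the
    output unless conv=notrunc is given.
    """
    with open(src_path, "rb") as src:
        sector = src.read(SECTOR_SIZE)
    if len(sector) != SECTOR_SIZE:
        raise ValueError(f"{src_path}: short read ({len(sector)} bytes)")
    if sector[510:512] != BOOT_SIGNATURE:
        raise ValueError(f"{src_path}: no 0x55AA boot signature at offset 510")
    with open(dst_path, "r+b") as dst:
        dst.write(sector)
        dst.flush()
        os.fsync(dst.fileno())  # push the written sector out to the file/device
    return sector
```

The signature check makes the helper refuse to copy a sector that does not look like a boot record.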
[09:04:29] !log akosiaris@puppetmaster1001 conftool action : set/pooled=yes; selector: name=wtp2017.codfw.wmnet [09:04:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:05:43] (03CR) 10Hashar: Fix flake8 issues (033 comments) [software/conftool] - 10https://gerrit.wikimedia.org/r/392793 (owner: 10Hashar) [09:06:22] !log akosiaris@tin Started deploy [parsoid/deploy@b150764]: T180211 [09:06:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:06:30] T180211: Degraded RAID on wtp2017 - https://phabricator.wikimedia.org/T180211 [09:08:16] !log puppet disabled on lvs400[1256] for switching primaries [09:08:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:08:57] (03CR) 10BBlack: [C: 032] lvs@ulsfo - switch primaries to new hardware [puppet] - 10https://gerrit.wikimedia.org/r/392792 (owner: 10BBlack) [09:09:59] (03PS1) 10Hashar: Add .gitreview [software/conftool] - 10https://gerrit.wikimedia.org/r/392795 [09:11:26] !log akosiaris@tin Finished deploy [parsoid/deploy@b150764]: T180211 (duration: 05m 05s) [09:11:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:11:33] T180211: Degraded RAID on wtp2017 - https://phabricator.wikimedia.org/T180211 [09:14:21] (03PS1) 10Marostegui: db-eqiad,db-codfw.php: Add db1101 as rc to s5,s7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392797 (https://phabricator.wikimedia.org/T178359) [09:21:12] 10Operations, 10OTRS: Upgrade OTRS to 5.0.24 - https://phabricator.wikimedia.org/T181127#3780465 (10akosiaris) [09:21:23] 10Operations, 10OTRS, 10Security: Upgrade OTRS to 5.0.24 - https://phabricator.wikimedia.org/T181127#3780478 (10akosiaris) [09:27:05] !log lvs@ulsfo - done switching primaries (host MED config) - lvs400[56] now primary for text/upload traffic [09:27:08] 10Operations, 10Electron-PDFs, 10OfflineContentGenerator, 10Services (designing): Improve stability and maintainability of our browser-based 
PDF render service - https://phabricator.wikimedia.org/T172815#3780481 (phuedx)
[09:27:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:28:51] (PS1) Alexandros Kosiaris: Update Templates for 5.0.24 OTRS version [software/otrs] - https://gerrit.wikimedia.org/r/392800 (https://phabricator.wikimedia.org/T181127)
[09:35:59] !log cr[12]-ulsfo - switch static fallback LVS routes from lvs400[12] to lvs400[56]
[09:36:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:36:22] (CR) Hashar: "recheck" [software/conftool] - https://gerrit.wikimedia.org/r/300524 (owner: Hashar)
[09:37:57] (CR) Jcrespo: [C: 1] db-eqiad,db-codfw.php: Add db1101 as rc to s5,s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/392797 (https://phabricator.wikimedia.org/T178359) (owner: Marostegui)
[09:40:34] Operations, Electron-PDFs, OfflineContentGenerator, Services (designing): Improve stability and maintainability of our browser-based PDF render service - https://phabricator.wikimedia.org/T172815#3780510 (phuedx) A brief update: Readers Web are currently building a Chromium-based PDF render serv...
[09:41:59] (PS3) Elukey: profile::mariadb::misc::el::replication: fix el cleaner cronjob [puppet] - https://gerrit.wikimedia.org/r/392788 (https://phabricator.wikimedia.org/T156933)
[09:42:01] (CR) Volans: "FYI those fixes are already included in Iad20092cbd0eb48a347a288cdccde4b80302d3e7" [software/conftool] - https://gerrit.wikimedia.org/r/392793 (owner: Hashar)
[09:42:03] Operations, ops-ulsfo, Traffic: replace ulsfo aging servers - https://phabricator.wikimedia.org/T164327#3780516 (BBlack)
[09:42:06] Operations, Traffic, Patch-For-Review: rack/setup/install lvs400[567].ulsfo.wmnet - https://phabricator.wikimedia.org/T178436#3780513 (BBlack) Open>Resolved a:BBlack These are fully in-service now
[09:42:48] (CR) Elukey: [C: 2] profile::mariadb::misc::el::replication: fix el cleaner cronjob [puppet] - https://gerrit.wikimedia.org/r/392788 (https://phabricator.wikimedia.org/T156933) (owner: Elukey)
[09:43:27] Operations, ops-ulsfo, Traffic: decommission lvs400[1-4].ulsfo.wmnet - https://phabricator.wikimedia.org/T178535#3780533 (BBlack) These are now non-primary, but still active as backups for now. Will switch to spare role and remove from router configs post-Thanksgiving and then real decom can start.
[09:45:38] (CR) Hashar: "I seen that change, seems python 3 support is a bit more work to review/fix :)" [software/conftool] - https://gerrit.wikimedia.org/r/392793 (owner: Hashar)
[09:48:49] (PS6) MarcoAurelio: Extension:Translate default permissions for Wikimedia wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/385953 (https://phabricator.wikimedia.org/T178793)
[09:51:27] (PS3) ArielGlenn: dumps/site: remove outdated comment about firewall [puppet] - https://gerrit.wikimedia.org/r/392767 (owner: Dzahn)
[09:52:18] (CR) ArielGlenn: [C: 2] dumps/site: remove outdated comment about firewall [puppet] - https://gerrit.wikimedia.org/r/392767 (owner: Dzahn)
[09:58:00] (PS1) Elukey: profile::mariadb::misc::eventlogging: use /var/log/eventlogging in el_sync [puppet] - https://gerrit.wikimedia.org/r/392807 (https://phabricator.wikimedia.org/T156933)
[09:58:04] (PS1) ArielGlenn: include all log files in cirrussearch cron cleanup [puppet] - https://gerrit.wikimedia.org/r/392808 (https://phabricator.wikimedia.org/T162688)
[09:59:11] (CR) ArielGlenn: [C: 2] include all log files in cirrussearch cron cleanup [puppet] - https://gerrit.wikimedia.org/r/392808 (https://phabricator.wikimedia.org/T162688) (owner: ArielGlenn)
[10:03:48] (CR) Elukey: [C: 2] "https://puppet-compiler.wmflabs.org/compiler02/8902/dbstore1002.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/392807 (https://phabricator.wikimedia.org/T156933) (owner: Elukey)
[10:03:54] (PS2) Elukey: profile::mariadb::misc::eventlogging: use /var/log/eventlogging in el_sync [puppet] - https://gerrit.wikimedia.org/r/392807 (https://phabricator.wikimedia.org/T156933)
[10:04:44] !log running "scap pull" on mw1191, it's depooled and marked as "inactive", but health checks are triggering db errors
[10:04:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:05:32] (Abandoned) ArielGlenn: move hardcoded hostnames out
of script for rsync of dumps to peers [puppet] - https://gerrit.wikimedia.org/r/381760 (owner: ArielGlenn)
[10:09:32] Operations, Cloud-Services: Rename @thiemowmde's account in LDAP, Wikitech, and Gerrit - https://phabricator.wikimedia.org/T181130#3780669 (thiemowmde)
[10:15:11] (PS7) Elukey: role::analytics_cluster::hadoop::worker: move to role/profiles [puppet] - https://gerrit.wikimedia.org/r/392658 (https://phabricator.wikimedia.org/T167790)
[10:20:03] (CR) Marostegui: [C: 2] db-eqiad,db-codfw.php: Add db1101 as rc to s5,s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/392797 (https://phabricator.wikimedia.org/T178359) (owner: Marostegui)
[10:21:11] (Merged) jenkins-bot: db-eqiad,db-codfw.php: Add db1101 as rc to s5,s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/392797 (https://phabricator.wikimedia.org/T178359) (owner: Marostegui)
[10:21:21] (CR) jenkins-bot: db-eqiad,db-codfw.php: Add db1101 as rc to s5,s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/392797 (https://phabricator.wikimedia.org/T178359) (owner: Marostegui)
[10:22:59] (CR) Elukey: "I added some profiles directly to the new hadoop::worker one since from what I can see they are strictly required to make everything worki" [puppet] - https://gerrit.wikimedia.org/r/392658 (https://phabricator.wikimedia.org/T167790) (owner: Elukey)
[10:23:48] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Add db1101 to s5 and s7 as recentchanges multi-instance slave - T178359 (duration: 00m 45s)
[10:23:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:23:54] T178359: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359
[10:30:26] (PS1) Marostegui: db-eqiad.php: Pool db1063 and db1051 as vslow [mediawiki-config] - https://gerrit.wikimedia.org/r/392813 (https://phabricator.wikimedia.org/T177208)
[10:31:21] (CR) ArielGlenn: [V: 2 C: 2] "Thanks Hashar!"
[dumps/mwbzutils] - https://gerrit.wikimedia.org/r/392442 (owner: Hashar)
[10:38:51] (PS1) Marostegui: db1101.yaml: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/392815 (https://phabricator.wikimedia.org/T178359)
[10:39:51] (CR) Marostegui: [C: 2] db1101.yaml: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/392815 (https://phabricator.wikimedia.org/T178359) (owner: Marostegui)
[10:41:50] (CR) Marostegui: [C: 2] db-eqiad.php: Pool db1063 and db1051 as vslow [mediawiki-config] - https://gerrit.wikimedia.org/r/392813 (https://phabricator.wikimedia.org/T177208) (owner: Marostegui)
[10:42:56] (Merged) jenkins-bot: db-eqiad.php: Pool db1063 and db1051 as vslow [mediawiki-config] - https://gerrit.wikimedia.org/r/392813 (https://phabricator.wikimedia.org/T177208) (owner: Marostegui)
[10:43:14] (CR) jenkins-bot: db-eqiad.php: Pool db1063 and db1051 as vslow [mediawiki-config] - https://gerrit.wikimedia.org/r/392813 (https://phabricator.wikimedia.org/T177208) (owner: Marostegui)
[10:44:14] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Pool db1051 and db1063 in vslow service for s5 to warm them up for the s8 split - T177208 (duration: 00m 45s)
[10:44:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:44:22] T177208: Provide dedicated database resources for wikidata - https://phabricator.wikimedia.org/T177208
[10:51:16] !log Drop index from ores_classification on s5 - T180045
[10:51:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:51:28] T180045: Review and deploy schema change on dropping oresc_rev_predicted_model index - https://phabricator.wikimedia.org/T180045
[10:54:44] !log gnt-node migrate -f ganeti1005.
T181121
[10:54:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:54:52] T181121: Possible memory errors on ganeti1005 - https://phabricator.wikimedia.org/T181121
[10:55:05] (PS1) Volans: Varnish instance: fix child restarted check [puppet] - https://gerrit.wikimedia.org/r/392819
[10:56:22] (PS1) Marostegui: db-eqiad.php: Increase weight on db1101.s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/392820 (https://phabricator.wikimedia.org/T178359)
[10:58:21] (CR) Ema: [C: 1] Varnish instance: fix child restarted check [puppet] - https://gerrit.wikimedia.org/r/392819 (owner: Volans)
[10:59:13] (PS2) Volans: Varnish instance: fix child restarted check [puppet] - https://gerrit.wikimedia.org/r/392819
[10:59:24] (CR) Alexandros Kosiaris: [C: 1] apache: Add http2 to mod [puppet] - https://gerrit.wikimedia.org/r/392495 (owner: Paladox)
[10:59:40] (CR) Marostegui: [C: 2] db-eqiad.php: Increase weight on db1101.s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/392820 (https://phabricator.wikimedia.org/T178359) (owner: Marostegui)
[10:59:46] (CR) Volans: [C: 2] Varnish instance: fix child restarted check [puppet] - https://gerrit.wikimedia.org/r/392819 (owner: Volans)
[11:00:54] (Merged) jenkins-bot: db-eqiad.php: Increase weight on db1101.s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/392820 (https://phabricator.wikimedia.org/T178359) (owner: Marostegui)
[11:01:04] (CR) jenkins-bot: db-eqiad.php: Increase weight on db1101.s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/392820 (https://phabricator.wikimedia.org/T178359) (owner: Marostegui)
[11:02:07] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic for db1101.s7 - T178359 (duration: 00m 45s)
[11:02:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:02:14] T178359: Support multi-instance on core hosts -
https://phabricator.wikimedia.org/T178359
[11:19:40] !log gnt-node evacuate -s -f ganeti1005. T181121
[11:19:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:19:48] T181121: Possible memory errors on ganeti1005 - https://phabricator.wikimedia.org/T181121
[11:24:20] (CR) Alexandros Kosiaris: [V: 2 C: 2] Update Templates for 5.0.24 OTRS version [software/otrs] - https://gerrit.wikimedia.org/r/392800 (https://phabricator.wikimedia.org/T181127) (owner: Alexandros Kosiaris)
[11:37:34] (PS1) Marostegui: db-eqiad.php: Increase traffic for db1101.s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/392823
[11:39:31] (CR) Marostegui: [C: 2] db-eqiad.php: Increase traffic for db1101.s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/392823 (owner: Marostegui)
[11:40:37] (Merged) jenkins-bot: db-eqiad.php: Increase traffic for db1101.s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/392823 (owner: Marostegui)
[11:40:47] (CR) jenkins-bot: db-eqiad.php: Increase traffic for db1101.s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/392823 (owner: Marostegui)
[11:41:39] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic for db1101.s7 - T178359 (duration: 00m 45s)
[11:41:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:41:47] T178359: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359
[11:47:08] PROBLEM - cassandra-c service on restbase2004 is CRITICAL: NRPE: Command check_cassandra-c-state not defined
[11:47:18] PROBLEM - cassandra-c SSL 10.192.32.139:7001 on restbase2004 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[11:47:27] PROBLEM - cassandra-c CQL 10.192.32.139:9042 on restbase2004 is CRITICAL: connect to address 10.192.32.139 and port 9042: Connection refused
[11:48:36] known ^, expired downtime
[11:49:38] PROBLEM - puppet last run on restbase2004 is CRITICAL:
CRITICAL: Puppet has 1 failures. Last run 10 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[cassandra-c]
[11:49:39] we should really have icinga ping on irc/email the owner of a downtime when only 10% of it's downtime period is left :D
[11:50:02] oh puppet tried to run there? dang
[11:50:03] :/
[11:50:13] volans: +1
[11:51:44] !log starting dropping incorrectly created database on s7 amwikimedia (not to be confused with production wiki s3 amwikimedia)
[11:51:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:52:42] (CR) Hashar: "The sole warning got fixed by https://gerrit.wikimedia.org/r/#/c/392442/ :)" [dumps/mwbzutils] - https://gerrit.wikimedia.org/r/392443 (owner: Hashar)
[12:00:17] (PS1) Marostegui: db-eqiad.php: Fully pool db1101 in s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/392825 (https://phabricator.wikimedia.org/T178359)
[12:02:01] (CR) Marostegui: [C: 2] db-eqiad.php: Fully pool db1101 in s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/392825 (https://phabricator.wikimedia.org/T178359) (owner: Marostegui)
[12:03:11] (Merged) jenkins-bot: db-eqiad.php: Fully pool db1101 in s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/392825 (https://phabricator.wikimedia.org/T178359) (owner: Marostegui)
[12:03:23] (CR) jenkins-bot: db-eqiad.php: Fully pool db1101 in s7 [mediawiki-config] - https://gerrit.wikimedia.org/r/392825 (https://phabricator.wikimedia.org/T178359) (owner: Marostegui)
[12:04:09] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Fully pool db1101.s7 - T178359 (duration: 00m 45s)
[12:04:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:04:16] T178359: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359
[12:11:53] (PS1) Marostegui: db-eqiad.php: Start building db1097 as rc in s4,s5 [mediawiki-config] - https://gerrit.wikimedia.org/r/392826
(https://phabricator.wikimedia.org/T178359)
[12:12:49] (PS2) Marostegui: db-eqiad.php: Start building db1097 as rc in s4,s5 [mediawiki-config] - https://gerrit.wikimedia.org/r/392826 (https://phabricator.wikimedia.org/T178359)
[12:14:40] (CR) Marostegui: [C: 2] db-eqiad.php: Start building db1097 as rc in s4,s5 [mediawiki-config] - https://gerrit.wikimedia.org/r/392826 (https://phabricator.wikimedia.org/T178359) (owner: Marostegui)
[12:15:51] (Merged) jenkins-bot: db-eqiad.php: Start building db1097 as rc in s4,s5 [mediawiki-config] - https://gerrit.wikimedia.org/r/392826 (https://phabricator.wikimedia.org/T178359) (owner: Marostegui)
[12:16:54] (CR) jenkins-bot: db-eqiad.php: Start building db1097 as rc in s4,s5 [mediawiki-config] - https://gerrit.wikimedia.org/r/392826 (https://phabricator.wikimedia.org/T178359) (owner: Marostegui)
[12:17:06] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Start adapting the config to move db1097 to s4 and s5 as multi-instance rc slave T178359 (duration: 00m 45s)
[12:17:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:17:13] T178359: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359
[12:20:30] (PS1) Marostegui: mariadb: Convert db1097 to multi-instance [puppet] - https://gerrit.wikimedia.org/r/392829 (https://phabricator.wikimedia.org/T178359)
[12:23:51] (CR) Marostegui: [C: 2] "Puppet looks good: https://puppet-compiler.wmflabs.org/compiler02/8905/" [puppet] - https://gerrit.wikimedia.org/r/392829 (https://phabricator.wikimedia.org/T178359) (owner: Marostegui)
[12:24:43] (PS8) Elukey: role::analytics_cluster::hadoop: move worker and masters to role/profiles [puppet] - https://gerrit.wikimedia.org/r/392658 (https://phabricator.wikimedia.org/T167790)
[12:25:09] (CR) jerkins-bot: [V: -1] role::analytics_cluster::hadoop: move worker and masters to role/profiles [puppet] -
https://gerrit.wikimedia.org/r/392658 (https://phabricator.wikimedia.org/T167790) (owner: Elukey)
[12:31:56] (PS1) Marostegui: db-eqiad.php: Depool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/392833 (https://phabricator.wikimedia.org/T178359)
[12:33:42] (PS2) Marostegui: db-eqiad.php: Depool db1053 [mediawiki-config] - https://gerrit.wikimedia.org/r/392833 (https://phabricator.wikimedia.org/T178359)
[12:36:30] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1053 [mediawiki-config] - https://gerrit.wikimedia.org/r/392833 (https://phabricator.wikimedia.org/T178359) (owner: Marostegui)
[12:37:41] (Merged) jenkins-bot: db-eqiad.php: Depool db1053 [mediawiki-config] - https://gerrit.wikimedia.org/r/392833 (https://phabricator.wikimedia.org/T178359) (owner: Marostegui)
[12:37:51] (CR) jenkins-bot: db-eqiad.php: Depool db1053 [mediawiki-config] - https://gerrit.wikimedia.org/r/392833 (https://phabricator.wikimedia.org/T178359) (owner: Marostegui)
[12:38:45] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1053 - T178359 (duration: 00m 45s)
[12:38:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:38:53] T178359: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359
[12:39:32] !log Stop MySQL on db1053 to clone db1097.s4 - T178359
[12:39:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:48:38] (PS9) Elukey: role::analytics_cluster::hadoop: move worker and masters to role/profiles [puppet] - https://gerrit.wikimedia.org/r/392658 (https://phabricator.wikimedia.org/T167790)
[12:50:34] (CR) Elukey: "Got carried by the puppet refactoring and started adding more things.
Currently testing the changes via pcc :D" [puppet] - https://gerrit.wikimedia.org/r/392658 (https://phabricator.wikimedia.org/T167790) (owner: Elukey)
[12:56:16] (PS3) ArielGlenn: rsync misc dumps (everything but xml/sql) to fallback hosts, labstore1006 [puppet] - https://gerrit.wikimedia.org/r/392625 (https://phabricator.wikimedia.org/T179942)
[13:12:52] PROBLEM - HHVM rendering on mw2245 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:13:42] RECOVERY - HHVM rendering on mw2245 is OK: HTTP OK: HTTP/1.1 200 OK - 74582 bytes in 0.678 second response time
[13:15:57] (PS4) ArielGlenn: rsync misc dumps (everything but xml/sql) to fallback hosts, labstore1006 [puppet] - https://gerrit.wikimedia.org/r/392625 (https://phabricator.wikimedia.org/T179942)
[13:16:36] (CR) ArielGlenn: [C: 2] rsync misc dumps (everything but xml/sql) to fallback hosts, labstore1006 [puppet] - https://gerrit.wikimedia.org/r/392625 (https://phabricator.wikimedia.org/T179942) (owner: ArielGlenn)
[13:28:49] (PS10) Elukey: role::analytics_cluster::hadoop: move worker and masters to role/profiles [puppet] - https://gerrit.wikimedia.org/r/392658 (https://phabricator.wikimedia.org/T167790)
[13:34:14] (CR) Elukey: "No big changes from https://puppet-compiler.wmflabs.org/compiler02/8908/" [puppet] - https://gerrit.wikimedia.org/r/392658 (https://phabricator.wikimedia.org/T167790) (owner: Elukey)
[13:37:51] !log installing imagemagick security updates
[13:37:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:38:24] (PS1) Alexandros Kosiaris: Allow specifying kubelet/kubeproxy username/token [puppet] - https://gerrit.wikimedia.org/r/392838
[13:38:26] (PS1) Alexandros Kosiaris: Add kubelet_username, kubeproxy_username hieradata [puppet] - https://gerrit.wikimedia.org/r/392839
[13:43:26] (PS1) Elukey: role:prometheus::analytics: add druid_exporter targets [puppet] - https://gerrit.wikimedia.org/r/392841
(https://phabricator.wikimedia.org/T177459)
[13:43:47] !log one more round of labstore1006 <-- ms1001 rsync catchup
[13:43:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:48:26] (PS1) Alexandros Kosiaris: Use kubelet/kubeproxy specific configs [puppet] - https://gerrit.wikimedia.org/r/392842
[13:58:22] PROBLEM - puppet last run on mw2184 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[imagemagick]
[13:59:09] (PS1) Alexandros Kosiaris: Specify new hiera keys for kubernetes [labs/private] - https://gerrit.wikimedia.org/r/392843
[14:01:05] (CR) Alexandros Kosiaris: [C: 2] Specify new hiera keys for kubernetes [labs/private] - https://gerrit.wikimedia.org/r/392843 (owner: Alexandros Kosiaris)
[14:01:07] (CR) Alexandros Kosiaris: [V: 2 C: 2] Specify new hiera keys for kubernetes [labs/private] - https://gerrit.wikimedia.org/r/392843 (owner: Alexandros Kosiaris)
[14:02:37] (PS8) Jcrespo: mariadb: Setup s8 replica set on codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/391835 (https://phabricator.wikimedia.org/T177208)
[14:03:14] (PS9) Jcrespo: mariadb: Setup s8 replica set on codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/391835 (https://phabricator.wikimedia.org/T177208)
[14:03:36] I have one thing for SWAT
[14:04:02] oh, no SWAT
[14:04:05] :(
[14:08:22] (CR) Marostegui: [C: 1] mariadb: Setup s8 replica set on codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/391835 (https://phabricator.wikimedia.org/T177208) (owner: Jcrespo)
[14:11:21] !log bootstrapping cassandra, restbase2004-c.codfw.wmnet - T179422
[14:11:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:11:28] T179422: Reshape RESTBase Cassandra clusters - https://phabricator.wikimedia.org/T179422
[14:13:22] RECOVERY - cassandra-c SSL 10.192.32.139:7001 on restbase2004 is OK: SSL OK -
Certificate restbase2004-c valid until 2018-08-17 16:11:56 +0000 (expires in 268 days)
[14:21:35] !log starting database topology changes for s8 on codfw T177208
[14:21:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:21:42] T177208: Provide dedicated database resources for wikidata - https://phabricator.wikimedia.org/T177208
[14:23:13] Operations, Release Pipeline, Release-Engineering-Team (Watching / External): Update Debian package for Blubber - https://phabricator.wikimedia.org/T179984#3781320 (akosiaris) OK, I git pulled and refreshed tags. Unfortunately we are still at a no-go state. I now have ``` dh_auto_build -O--buildsyst...
[14:23:22] RECOVERY - puppet last run on mw2184 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:25:02] (CR) ArielGlenn: [V: 2 C: 2] gcc warning are now fatals (-Werror) [dumps/mwbzutils] - https://gerrit.wikimedia.org/r/392443 (owner: Hashar)
[14:27:54] !log installing libxml-libxml-perl security updates
[14:28:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:54] (CR) Jcrespo: [C: 1] puppet: point db2* hosts at puppet 4 master puppetmaster2001 [puppet] - https://gerrit.wikimedia.org/r/392664 (https://phabricator.wikimedia.org/T177254) (owner: Herron)
[14:40:23] RECOVERY - cassandra-c service on restbase2004 is OK: OK - cassandra-c is active
[14:44:43] RECOVERY - puppet last run on restbase2004 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[14:45:25] (CR) Volans: [C: -2] "According to puppet compiler, seems to do the right thing:" [puppet] - https://gerrit.wikimedia.org/r/392606 (https://phabricator.wikimedia.org/T170353) (owner: Volans)
[14:47:13] (CR) Jcrespo: [C: 2] mariadb: Setup s8 replica set on codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/391835 (https://phabricator.wikimedia.org/T177208) (owner: Jcrespo)
[14:49:39] !log jynus@tin
Synchronized wmf-config/db-codfw.php: mariadb: Setup s8 replica set on codfw (duration: 00m 45s)
[14:49:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:53:18] (CR) Ema: vcl: distinguish between hfp and hfm (1 comment) [puppet] - https://gerrit.wikimedia.org/r/391171 (https://phabricator.wikimedia.org/T180434) (owner: Ema)
[14:58:21] Operations, ops-eqiad, DC-Ops, cloud-services-team (Kanban): labvirt1015 crashes - https://phabricator.wikimedia.org/T171473#3781414 (chasemp) Open>Resolved https://gerrit.wikimedia.org/r/#/c/392514/
[14:58:53] (CR) BBlack: vcl: distinguish between hfp and hfm (1 comment) [puppet] - https://gerrit.wikimedia.org/r/391171 (https://phabricator.wikimedia.org/T180434) (owner: Ema)
[15:03:30] (CR) jenkins-bot: mariadb: Setup s8 replica set on codfw [mediawiki-config] - https://gerrit.wikimedia.org/r/391835 (https://phabricator.wikimedia.org/T177208) (owner: Jcrespo)
[15:04:44] Operations, ops-eqiad, Analytics, User-Elukey: Check analytics1037 power supply status - https://phabricator.wikimedia.org/T179192#3781423 (elukey) ping :)
[15:05:48] (CR) Muehlenhoff: kmod::blacklist: prevent manual install, update initramfs (2 comments) [puppet] - https://gerrit.wikimedia.org/r/392644 (owner: BBlack)
[15:22:00] !log beginning cut over of codfw db servers (^db2.*) to codfw puppet 4 masters
[15:22:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:22:51] (CR) Herron: [C: 2] puppet: point db2* hosts at puppet 4 master puppetmaster2001 [puppet] - https://gerrit.wikimedia.org/r/392664 (https://phabricator.wikimedia.org/T177254) (owner: Herron)
[15:22:57] (PS2) Herron: puppet: point db2* hosts at puppet 4 master puppetmaster2001 [puppet] - https://gerrit.wikimedia.org/r/392664 (https://phabricator.wikimedia.org/T177254)
[15:22:59] (PS1) Elukey: geowiki::mysql_conf: fix db host configuration in the research cnf
file [puppet] - https://gerrit.wikimedia.org/r/392852
[15:24:15] (PS2) Elukey: geowiki::mysql_conf: fix db host configuration in the research cnf file [puppet] - https://gerrit.wikimedia.org/r/392852
[15:25:30] Operations, Gerrit, Traffic, Patch-For-Review: Switch on http/2 in apache for gerrit - https://phabricator.wikimedia.org/T180978#3775204 (MoritzMuehlenhoff) >>! In T180978#3777053, @elukey wrote: > 2) Since the experimental tag has been removed only recently I strongly suggest to use a recent ver...
[15:26:53] I was checking a wikidata- cold wikidata hompage (reparsing, etc.) take like 20 seconds
[15:27:28] probably because databases are also cold
[15:30:49] (PS8) Rush: openstack: dualing regex matches for labtest [puppet] - https://gerrit.wikimedia.org/r/392711 (https://phabricator.wikimedia.org/T171494)
[15:31:10] (CR) jerkins-bot: [V: -1] openstack: dualing regex matches for labtest [puppet] - https://gerrit.wikimedia.org/r/392711 (https://phabricator.wikimedia.org/T171494) (owner: Rush)
[15:34:34] I generated some errors on eqiad mediawikis
[15:34:38] sorry
[15:34:43] codfw
[15:34:49] == not user visible
[15:34:52] PROBLEM - Varnish HTTP text-backend - port 3128 on cp4027 is CRITICAL: connect to address 10.128.0.127 and port 3128: Connection refused
[15:35:52] RECOVERY - Varnish HTTP text-backend - port 3128 on cp4027 is OK: HTTP OK: HTTP/1.1 200 OK - 178 bytes in 0.157 second response time
[15:36:25] (PS9) Rush: openstack: consolidate labtest regex matches [puppet] - https://gerrit.wikimedia.org/r/392711 (https://phabricator.wikimedia.org/T171494)
[15:36:46] (CR) jerkins-bot: [V: -1] openstack: consolidate labtest regex matches [puppet] - https://gerrit.wikimedia.org/r/392711 (https://phabricator.wikimedia.org/T171494) (owner: Rush)
[15:38:10] (PS10) Rush: openstack: consolidate labtest regex matches [puppet] - https://gerrit.wikimedia.org/r/392711 (https://phabricator.wikimedia.org/T171494)
[15:38:18] I think I can fake s8 by starting heartbeat on db1071
[15:38:34] (CR) jerkins-bot: [V: -1] openstack: consolidate labtest regex matches [puppet] - https://gerrit.wikimedia.org/r/392711 (https://phabricator.wikimedia.org/T171494) (owner: Rush)
[15:41:51] !log starting manually pt-heartbeat for s8 on db1071
[15:41:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:43:15] ok, trying
[15:43:52] PROBLEM - Long running screen/tmux on snapshot1001 is CRITICAL: CRIT: Long running SCREEN process. (PID: 62786, 1734519s 1728000s).
[15:45:24] (CR) Ottomata: [C: 1] geowiki::mysql_conf: fix db host configuration in the research cnf file [puppet] - https://gerrit.wikimedia.org/r/392852 (owner: Elukey)
[15:46:21] (CR) Elukey: [C: 2] geowiki::mysql_conf: fix db host configuration in the research cnf file [puppet] - https://gerrit.wikimedia.org/r/392852 (owner: Elukey)
[15:53:08] (PS11) Rush: openstack: consolidate labtest regex matches [puppet] - https://gerrit.wikimedia.org/r/392711 (https://phabricator.wikimedia.org/T171494)
[15:53:16] (CR) Rush: [V: 2 C: 2] openstack: consolidate labtest regex matches [puppet] - https://gerrit.wikimedia.org/r/392711 (https://phabricator.wikimedia.org/T171494) (owner: Rush)
[16:04:18] Operations, MediaWiki-Platform-Team, TechCom-RfC, HHVM, and 2 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#3781587 (ArielGlenn)
[16:06:14] (PS1) Rush: openstack: remove labtest per host values [labs/private] - https://gerrit.wikimedia.org/r/392859 (https://phabricator.wikimedia.org/T171494)
[16:06:32] (CR) Rush: [V: 2 C: 2] openstack: remove labtest per host values [labs/private] - https://gerrit.wikimedia.org/r/392859 (https://phabricator.wikimedia.org/T171494) (owner: Rush)
[16:08:07] Operations, ops-eqiad, Analytics, User-Elukey: Check analytics1037 power supply status -
https://phabricator.wikimedia.org/T179192#3781597 (RobH) I think replacing bad powersupplies on out of warranty servers is likely a waste of money (as other parts will also go bad with older systems), however...
[16:13:10] (PS1) Rush: openstack: remove todo for horizon [puppet] - https://gerrit.wikimedia.org/r/392861 (https://phabricator.wikimedia.org/T171494)
[16:13:20] (PS2) Rush: openstack: remove todo for horizon [puppet] - https://gerrit.wikimedia.org/r/392861 (https://phabricator.wikimedia.org/T171494)
[16:13:40] (CR) jerkins-bot: [V: -1] openstack: remove todo for horizon [puppet] - https://gerrit.wikimedia.org/r/392861 (https://phabricator.wikimedia.org/T171494) (owner: Rush)
[16:13:42] (CR) Rush: "http://puppet-compiler.wmflabs.org/8916/" [puppet] - https://gerrit.wikimedia.org/r/392711 (https://phabricator.wikimedia.org/T171494) (owner: Rush)
[16:16:51] !log restart druid broker,coordinator,historical daemons on druid100[123] to pick up new logging settings
[16:16:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:17:52] Operations, Goal, Patch-For-Review: Provide dedicated database resources for wikidata - https://phabricator.wikimedia.org/T177208#3781631 (jcrespo) I've tested wikidatawiki and dewiki with the new config, and everthing seems to work as intended. One issue I saw is that there are errors if heartbeat...
[16:20:00] (PS1) Herron: puppet: point codfw scb hosts at codfw puppet 4 masters [puppet] - https://gerrit.wikimedia.org/r/392863 (https://phabricator.wikimedia.org/T177254)
[16:28:46] !log starting canary deploy/cutover of codfw scb hosts to codfw puppet 4 masters
[16:28:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:31:17] (CR) Herron: [C: 2] puppet: point codfw scb hosts at codfw puppet 4 masters [puppet] - https://gerrit.wikimedia.org/r/392863 (https://phabricator.wikimedia.org/T177254) (owner: Herron)
[16:31:54] (CR) Ottomata: "Some nit and Qs, but this looks great!" (5 comments) [puppet] - https://gerrit.wikimedia.org/r/392658 (https://phabricator.wikimedia.org/T167790) (owner: Elukey)
[16:34:37] !log Compress s4 on db1097 - T178359
[16:34:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:34:44] T178359: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359
[16:36:09] PROBLEM - puppet last run on puppetmaster1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:41:06] RECOVERY - puppet last run on puppetmaster1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:45:33] !log disable puppet accross labtest things
[16:45:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:51:55] PROBLEM - puppet last run on scb2001 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [16:55:26] PROBLEM - eventstreams on scb1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:55:35] PROBLEM - SSH on scb1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:55:55] PROBLEM - pdfrender on scb1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:56:25] RECOVERY - SSH on scb1001 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0) [16:56:26] (03Abandoned) 10Andrew Bogott: rabbitmq: increase vm_memory_high_watermark_paging_ratio [puppet] - 10https://gerrit.wikimedia.org/r/375823 (https://phabricator.wikimedia.org/T170492) (owner: 10Andrew Bogott) [16:56:46] RECOVERY - pdfrender on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.002 second response time [16:56:46] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: /{domain}/v1/page/summary/{title}{/revision}{/tid} (Get summary for Barack Obama) timed out before a response was received [16:56:50] (03PS4) 10Rush: rabbitmq: add a giant default config [puppet] - 10https://gerrit.wikimedia.org/r/375822 (https://phabricator.wikimedia.org/T170492) (owner: 10Andrew Bogott) [16:56:55] RECOVERY - puppet last run on scb2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:57:21] (03Abandoned) 10Chad: wikidatawiki to wmf.8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392466 (owner: 10Chad) [16:57:25] RECOVERY - eventstreams on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 929 bytes in 0.126 second response time [16:57:55] PROBLEM - cxserver endpoints health on scb1001 is CRITICAL: /v2/translate/{from}/{to}{/provider} (Machine translate an HTML fragment using Apertium, adapt the links to target language wiki.) 
is CRITICAL: Could not fetch url http://10.64.0.16:8080/v2/translate/en/es/Apertium: Generic connection error: HTTPConnectionPool(host=u10.64.0.16, port=8080): Max retries exceeded with url: /v2/translate/en/es/Apertium (Caused by NewConnec [16:57:55] onnection.HTTPConnection object at 0x7f88f33fc810: Failed to establish a new connection: [Errno 111] Connection refused,)): /robots.txt (robots.txt check) is CRITICAL: Could not fetch url http://10.64.0.16:8080/robots.txt: Generic connection error: HTTPConnectionPool(host=u10.64.0.16, port=8080): Max retries exceeded with url: /robots.txt (Caused by NewConnectionError(urllib3.connection.HTTPConnection object at 0x7f88f33fc710: [16:57:55] a new connection: [Errno 111] Connection refused,)): /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) is CRITICAL: Could not fetch url http://10.64.0.16 [16:57:56] the oom killer acted and killed celery [16:58:00] on scb1001 [16:58:07] (03CR) 10Andrew Bogott: [C: 032] rabbitmq: add a giant default config [puppet] - 10https://gerrit.wikimedia.org/r/375822 (https://phabricator.wikimedia.org/T170492) (owner: 10Andrew Bogott) [16:58:08] memory consumption seems to be high [16:58:11] cc mobrovac [16:58:45] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [16:58:46] RECOVERY - cxserver endpoints health on scb1001 is OK: All endpoints are healthy [16:59:42] sigh [16:59:52] looking [17:00:03] 10Operations, 10DBA, 10Wikimedia-Site-requests: Global rename of JeanBono → Rexcornot: supervision needed - https://phabricator.wikimedia.org/T181170#3781800 (10alanajjar) [17:00:20] (03PS1) 10Ottomata: Add statistics::explorer role to stat1004 to install R and other packages [puppet] - 10https://gerrit.wikimedia.org/r/392865 (https://phabricator.wikimedia.org/T181094) [17:00:49] (03CR) 10jerkins-bot: [V: 04-1] Add statistics::explorer role to stat1004 to install R and other packages [puppet] - 10https://gerrit.wikimedia.org/r/392865 
(https://phabricator.wikimedia.org/T181094) (owner: 10Ottomata) [17:01:00] 10Operations, 10DBA, 10Wikimedia-Site-requests: Global rename of JeanBono → Rexcornot: supervision needed - https://phabricator.wikimedia.org/T181170#3781817 (10Marostegui) if you want to do it now, go ahead. [17:01:13] elukey: not sure what happened there, it's not trending edits this time [17:02:15] (03PS2) 10Ottomata: Add statistics::explorer role to stat1004 to install R and other packages [puppet] - 10https://gerrit.wikimedia.org/r/392865 (https://phabricator.wikimedia.org/T181094) [17:02:28] (03CR) 10Elukey: role::analytics_cluster::hadoop: move worker and masters to role/profiles (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/392658 (https://phabricator.wikimedia.org/T167790) (owner: 10Elukey) [17:02:38] (03CR) 10jerkins-bot: [V: 04-1] Add statistics::explorer role to stat1004 to install R and other packages [puppet] - 10https://gerrit.wikimedia.org/r/392865 (https://phabricator.wikimedia.org/T181094) (owner: 10Ottomata) [17:02:47] (03PS1) 10Marostegui: s4.hosts: db1097 is now a rc slave in s4 and s5 [software] - 10https://gerrit.wikimedia.org/r/392867 (https://phabricator.wikimedia.org/T178359) [17:03:36] (03CR) 10Marostegui: [C: 032] s4.hosts: db1097 is now a rc slave in s4 and s5 [software] - 10https://gerrit.wikimedia.org/r/392867 (https://phabricator.wikimedia.org/T178359) (owner: 10Marostegui) [17:04:28] (03Merged) 10jenkins-bot: s4.hosts: db1097 is now a rc slave in s4 and s5 [software] - 10https://gerrit.wikimedia.org/r/392867 (https://phabricator.wikimedia.org/T178359) (owner: 10Marostegui) [17:06:23] mobrovac: I did a quick top and then ordered by MEM, and the processes with most of RES are celery.. 
[17:06:54] thought so elukey
[17:06:54] (PS3) Ottomata: Add statistics::explorer role to stat1004 to install R and other packages [puppet] - https://gerrit.wikimedia.org/r/392865 (https://phabricator.wikimedia.org/T181094)
[17:08:50] Operations, Gerrit, Traffic, Patch-For-Review: Switch on http/2 in apache for gerrit - https://phabricator.wikimedia.org/T180978#3781833 (demon) p:Triage>Lowest gerrit2001 is running stretch, but we haven't reimaged the master cobalt yet (cf T176774). Given that, plus the fact that this...
[17:09:12] Operations, Gerrit, Release-Engineering-Team (Someday): Reimage cobalt as stretch - https://phabricator.wikimedia.org/T176774#3636304 (demon)
[17:09:15] Operations, Gerrit, Traffic, Patch-For-Review: Switch on http/2 in apache for gerrit - https://phabricator.wikimedia.org/T180978#3781837 (demon)
[17:09:48] Operations, ops-codfw: Degraded RAID on wtp2017 - https://phabricator.wikimedia.org/T180211#3781839 (Papaul) @akosiaris you welcome
[17:16:36] (PS11) Elukey: role::analytics_cluster::hadoop: move worker and masters to role/profiles [puppet] - https://gerrit.wikimedia.org/r/392658 (https://phabricator.wikimedia.org/T167790)
[17:29:15] (PS4) Ottomata: Add statistics::explorer role to stat1004 to install R and other packages [puppet] - https://gerrit.wikimedia.org/r/392865 (https://phabricator.wikimedia.org/T181094)
[17:30:54] !log demon@tin Synchronized php-1.31.0-wmf.8/extensions/AdvancedSearch/: fixing layout issues in timeless (duration: 00m 46s)
[17:31:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:32:09] (PS5) Ottomata: Add statistics::explorer role to stat1004 to install R and other packages [puppet] - https://gerrit.wikimedia.org/r/392865 (https://phabricator.wikimedia.org/T181094)
[17:32:35] (PS1) ArielGlenn: add firewall rules to permit rsync to/from labstore1006 [puppet] - https://gerrit.wikimedia.org/r/392870
[17:34:24] no_justification: thanks for the backport!
[17:35:35] (CR) Ottomata: role::analytics_cluster::hadoop: move worker and masters to role/profiles (2 comments) [puppet] - https://gerrit.wikimedia.org/r/392658 (https://phabricator.wikimedia.org/T167790) (owner: Elukey)
[17:37:37] (PS2) ArielGlenn: add firewall rules to permit rsync to/from labstore1006 [puppet] - https://gerrit.wikimedia.org/r/392870
[17:37:39] (PS6) Ottomata: Add statistics::explorer role to stat1004 to install R and other packages [puppet] - https://gerrit.wikimedia.org/r/392865 (https://phabricator.wikimedia.org/T181094)
[17:38:22] (CR) ArielGlenn: [C: 2] add firewall rules to permit rsync to/from labstore1006 [puppet] - https://gerrit.wikimedia.org/r/392870 (owner: ArielGlenn)
[17:41:04] addshore: I was shamelessly abusing my powers for my own benefit :p
[17:41:09] (but you're welcome, heh)
[17:41:25] :D
[17:41:35] (PS7) Ottomata: Add statistics::explorer role to stat1004 to install R and other packages [puppet] - https://gerrit.wikimedia.org/r/392865 (https://phabricator.wikimedia.org/T181094)
[17:46:28] (PS1) ArielGlenn: fix up usage message and dest path for xml/sql dumps rsync [puppet] - https://gerrit.wikimedia.org/r/392873
[17:48:47] (CR) ArielGlenn: [C: 2] fix up usage message and dest path for xml/sql dumps rsync [puppet] - https://gerrit.wikimedia.org/r/392873 (owner: ArielGlenn)
[17:53:06] (PS1) ArielGlenn: rsync all dumps status files to web servers and unpack them periodically [puppet] - https://gerrit.wikimedia.org/r/392875 (https://phabricator.wikimedia.org/T179857)
[17:53:26] (CR) jerkins-bot: [V: -1] rsync all dumps status files to web servers and unpack them periodically [puppet] - https://gerrit.wikimedia.org/r/392875 (https://phabricator.wikimedia.org/T179857) (owner: ArielGlenn)
[17:55:16] (PS8) Ottomata: Add statistics::explorer role to stat1004 to install R and other packages [puppet] - https://gerrit.wikimedia.org/r/392865 (https://phabricator.wikimedia.org/T181094)
[17:59:28] (PS1) Ladsgroup: labs: Enable ORES extension in ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/392876 (https://phabricator.wikimedia.org/T181168)
[17:59:58] (PS9) Ottomata: Add statistics::explorer role to stat1004 to install R and other packages [puppet] - https://gerrit.wikimedia.org/r/392865 (https://phabricator.wikimedia.org/T181094)
[18:01:09] (CR) Ladsgroup: [C: 2] labs: Enable ORES extension in ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/392876 (https://phabricator.wikimedia.org/T181168) (owner: Ladsgroup)
[18:02:02] !log disabling puppet agents for puppetdb postgres security update
[18:02:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:02:15] (Merged) jenkins-bot: labs: Enable ORES extension in ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/392876 (https://phabricator.wikimedia.org/T181168) (owner: Ladsgroup)
[18:02:30] (CR) jenkins-bot: labs: Enable ORES extension in ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/392876 (https://phabricator.wikimedia.org/T181168) (owner: Ladsgroup)
[18:03:09] (PS2) ArielGlenn: rsync all dumps status files to web servers and unpack them periodically [puppet] - https://gerrit.wikimedia.org/r/392875 (https://phabricator.wikimedia.org/T179857)
[18:03:32] (CR) jerkins-bot: [V: -1] rsync all dumps status files to web servers and unpack them periodically [puppet] - https://gerrit.wikimedia.org/r/392875 (https://phabricator.wikimedia.org/T179857) (owner: ArielGlenn)
[18:05:20] !log installing postgres security updates on nihal/puppetdb
[18:05:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:06:23] (PS3) ArielGlenn: rsync all dumps status files to web servers and unpack them periodically [puppet] - https://gerrit.wikimedia.org/r/392875 (https://phabricator.wikimedia.org/T179857)
[18:08:44] (PS10) Ottomata: Add statistics::explorer role to stat1004 to install R and other packages [puppet] - https://gerrit.wikimedia.org/r/392865 (https://phabricator.wikimedia.org/T181094)
[18:09:12] !log installing postgres security updates on nitrogen/puppetdb
[18:09:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:36] (CR) Ottomata: "Looking good (finally!) https://puppet-compiler.wmflabs.org/compiler02/8931/stat1004.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/392865 (https://phabricator.wikimedia.org/T181094) (owner: Ottomata)
[18:11:46] (PS11) Ottomata: Add statistics::explorer role to stat1004 to install R and other packages [puppet] - https://gerrit.wikimedia.org/r/392865 (https://phabricator.wikimedia.org/T181094)
[18:12:39] !log re-enabling puppet agents after puppetdb postgres security updates
[18:12:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:13:09] (CR) Ottomata: [C: 2] Add statistics::explorer role to stat1004 to install R and other packages [puppet] - https://gerrit.wikimedia.org/r/392865 (https://phabricator.wikimedia.org/T181094) (owner: Ottomata)
[18:14:59] (PS1) Jcrespo: mariadb: s5 master switchover from db2023 to db2052 [mediawiki-config] - https://gerrit.wikimedia.org/r/392879 (https://phabricator.wikimedia.org/T177208)
[18:19:50] (PS1) Ladsgroup: labs: Enable goodfaith in ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/392880 (https://phabricator.wikimedia.org/T181168)
[18:20:28] Amir1: If you're doing beta merges, please also sync to production for consistency
[18:22:47] (PS1) Jcrespo: mariadb: Switchover s5 codfw master (db2023) to db2052 [puppet] - https://gerrit.wikimedia.org/r/392881 (https://phabricator.wikimedia.org/T176243)
[18:23:10] (CR) Chad: [C: 2] updatewikiversions: Only attempt symlink change if needed [mediawiki-config] - https://gerrit.wikimedia.org/r/392723 (owner: Chad)
[18:23:18] Operations, Patch-For-Review, Release-Engineering-Team (Watching / External), Scoring-platform-team (Current), Wikimedia-Incident: Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#3782175 (awight)
[18:23:44] (CR) Jcrespo: "I've prepared this but I do not want to deploy so late in the day, will do tomorrow." [puppet] - https://gerrit.wikimedia.org/r/392881 (https://phabricator.wikimedia.org/T176243) (owner: Jcrespo)
[18:24:17] Operations, Release Pipeline, Release-Engineering-Team (Watching / External): Update Debian package for Blubber - https://phabricator.wikimedia.org/T179984#3782184 (thcipriani) >>! In T179984#3781320, @akosiaris wrote: > Which means it complains about not finding https://github.com/docker/distributio...
[18:24:21] (CR) Ladsgroup: [C: 2] labs: Enable goodfaith in ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/392880 (https://phabricator.wikimedia.org/T181168) (owner: Ladsgroup)
[18:24:23] (Merged) jenkins-bot: updatewikiversions: Only attempt symlink change if needed [mediawiki-config] - https://gerrit.wikimedia.org/r/392723 (owner: Chad)
[18:24:42] (PS2) Jcrespo: mariadb: Switchover codfw s5 master from db2023 to db2052 [mediawiki-config] - https://gerrit.wikimedia.org/r/392879 (https://phabricator.wikimedia.org/T177208)
[18:25:31] (Merged) jenkins-bot: labs: Enable goodfaith in ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/392880 (https://phabricator.wikimedia.org/T181168) (owner: Ladsgroup)
[18:26:57] (CR) jenkins-bot: updatewikiversions: Only attempt symlink change if needed [mediawiki-config] - https://gerrit.wikimedia.org/r/392723 (owner: Chad)
[18:28:03] (PS4) ArielGlenn: rsync all dumps status files to web servers and unpack them periodically [puppet] - https://gerrit.wikimedia.org/r/392875 (https://phabricator.wikimedia.org/T179857)
[18:28:05] PROBLEM - puppet last run on stat1004 is CRITICAL: CRITICAL: Puppet has 4 failures. Last run 4 minutes ago with 4 failures. Failed resources (up to 3 shown): Mount[/mnt/data],Package[r-base],Package[r-recommended],Package[r-base-dev]
[18:28:27] no_justification: sorry, do you want to rebase tin?
[18:28:39] I wanted to rebase and then I saw your change coming in
[18:29:21] I'll do it, nbd
[18:29:55] thanks
[18:30:56] !log demon@tin Pruned MediaWiki: 1.31.0-wmf.7 [keeping static files] (duration: 01m 46s)
[18:31:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:32:52] !log demon@tin Synchronized scap/plugins/updatewikiversions.py: minor fix (duration: 00m 45s)
[18:32:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:33:14] Operations, Electron-PDFs, Security-Reviews, Services-next, and 3 others: Productize the Electron PDF render service & create a REST API end point - https://phabricator.wikimedia.org/T142226#3782256 (dpatrick)
[18:33:17] Operations, Electron-PDFs, Security-Reviews, Services-next, and 2 others: Restrict outgoing network connections from Electron render service - https://phabricator.wikimedia.org/T148567#3782253 (dpatrick) Open>Resolved a:dpatrick >>! In T148567#3780384, @mobrovac wrote: >>>! In T148567...
[18:34:33] !log demon@tin Synchronized wmf-config/InitialiseSettings-labs.php: no-op (duration: 00m 45s)
[18:34:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:35:24] (PS1) Ottomata: Move dataset_mount class out of statistics::compute so exporler can use compute class [puppet] - https://gerrit.wikimedia.org/r/392884 (https://phabricator.wikimedia.org/T181094)
[18:35:49] (CR) jerkins-bot: [V: -1] Move dataset_mount class out of statistics::compute so exporler can use compute class [puppet] - https://gerrit.wikimedia.org/r/392884 (https://phabricator.wikimedia.org/T181094) (owner: Ottomata)
[18:36:52] (PS2) Ottomata: Move dataset_mount class out of statistics::compute [puppet] - https://gerrit.wikimedia.org/r/392884 (https://phabricator.wikimedia.org/T181094)
[18:37:15] (CR) jerkins-bot: [V: -1] Move dataset_mount class out of statistics::compute [puppet] - https://gerrit.wikimedia.org/r/392884 (https://phabricator.wikimedia.org/T181094) (owner: Ottomata)
[18:39:25] (CR) Chad: [C: 2] Close transitionteamwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/392620 (https://phabricator.wikimedia.org/T181000) (owner: MarcoAurelio)
[18:39:46] (CR) Ottomata: [V: 2 C: 2] Move dataset_mount class out of statistics::compute [puppet] - https://gerrit.wikimedia.org/r/392884 (https://phabricator.wikimedia.org/T181094) (owner: Ottomata)
[18:40:32] (Merged) jenkins-bot: Close transitionteamwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/392620 (https://phabricator.wikimedia.org/T181000) (owner: MarcoAurelio)
[18:43:05] RECOVERY - puppet last run on stat1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[18:47:39] !log demon@tin Synchronized dblists/closed.dblist: closed transitionteamwiki (duration: 00m 45s)
[18:47:40] (PS1) Jforrester: BetaFeatures whitelist: Update date comments [mediawiki-config] - https://gerrit.wikimedia.org/r/392885
[18:47:41] (PS1) Jforrester: Switch submit button from 'save' to 'publish' on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/392886
[18:47:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:48:23] (CR) Jforrester: [C: -2] "Not yet." [mediawiki-config] - https://gerrit.wikimedia.org/r/392886 (owner: Jforrester)
[18:50:13] (CR) jenkins-bot: labs: Enable goodfaith in ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/392880 (https://phabricator.wikimedia.org/T181168) (owner: Ladsgroup)
[18:51:18] (CR) jenkins-bot: Close transitionteamwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/392620 (https://phabricator.wikimedia.org/T181000) (owner: MarcoAurelio)
[19:06:38] (PS16) Zoranzoki21: Enable the ArticlePlaceholder for sewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/387077 (https://phabricator.wikimedia.org/T179241)
[19:07:55] Operations, DBA, Wikimedia-Site-requests: Global rename of JeanBono → Rexcornot: supervision needed - https://phabricator.wikimedia.org/T181170#3782418 (alanajjar) @Marostegui sorry for late! When you here ping me again (Y)
[19:08:00] Operations, Analytics-Kanban, hardware-requests: eqiad: (2) hardware access request for jupyter notebook refresh (SWAP) - https://phabricator.wikimedia.org/T175603#3782420 (Ottomata) p:Triage>Normal
[19:09:10] (CR) Hashar: "There are several profiles requiring others. My point is that profile::ci::firewall requires profile::base::firewall." [puppet] - https://gerrit.wikimedia.org/r/391742 (owner: Dzahn)
[19:12:10] Operations, Analytics-Kanban, hardware-requests: eqiad: (2) hardware access request for jupyter notebook refresh (SWAP) - https://phabricator.wikimedia.org/T175603#3782444 (RobH) A few questions: * Do you have a minimum clock speed you need on the CPU, or just whatever the best price point is? ** We...
[19:18:00] (PS1) Hashar: Move contint::package_builder to a profile [puppet] - https://gerrit.wikimedia.org/r/392893
[19:24:41] Operations, Analytics-Kanban, hardware-requests: eqiad: (2) hardware access request for jupyter notebook refresh (SWAP) - https://phabricator.wikimedia.org/T175603#3782479 (Ottomata) > Do you have a minimum clock speed you need on the CPU, or just whatever the best price point is? No, but as always,...
[19:25:46] RECOVERY - cassandra-c CQL 10.192.32.139:9042 on restbase2004 is OK: TCP OK - 0.036 second response time on 10.192.32.139 port 9042
[19:26:51] (PS1) Smalyshev: Add stemming languages settings for description indexing [mediawiki-config] - https://gerrit.wikimedia.org/r/392894 (https://phabricator.wikimedia.org/T176903)
[19:29:15] Operations, ORES, Scoring-platform-team, Traffic, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3782495 (awight) @hoo Wondering if you wrote an incident report, that I can add to with an explanation of ORES's involvement?
[19:31:18] (PS2) Dzahn: ci::master/ferm: require base::firewall in ci::firewall [puppet] - https://gerrit.wikimedia.org/r/391742
[19:32:31] (PS3) Dzahn: ci::master/ferm: require base::firewall in ci::firewall [puppet] - https://gerrit.wikimedia.org/r/391742
[19:32:51] (CR) MaxSem: [C: 1] extract2.php: Stop allowing the www portal templates [mediawiki-config] - https://gerrit.wikimedia.org/r/391353 (owner: Chad)
[19:34:16] (CR) Dzahn: "> What I am afraid of is one day someone apply the ci::firewall profile and forget about the base::firewall one :]" [puppet] - https://gerrit.wikimedia.org/r/391742 (owner: Dzahn)
[19:37:49] (PS1) Zoranzoki21: Add domain *nil.org.il to $wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/392895 (https://phabricator.wikimedia.org/T181179)
[19:38:01] (CR) Chad: [C: 2] extract2.php: Stop allowing the www portal templates [mediawiki-config] - https://gerrit.wikimedia.org/r/391353 (owner: Chad)
[19:38:11] Operations, ORES, Scoring-platform-team, Traffic, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3782508 (BBlack) No, we never made an incident rep on this one, and I don't think it would be fair at this time to implicate...
[19:38:23] MaxSem: Re that ^, I wonder if we could actually simplify the API doc page to not need extract2 somehow
[19:38:29] (PS2) Zoranzoki21: Add domain *nil.org.il to $wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/392895 (https://phabricator.wikimedia.org/T181179)
[19:38:31] Like, doesn't seem necessary for mostly static content
[19:39:03] ask services? I would totally support getting rid of extract2 completely
[19:39:51] (Merged) jenkins-bot: extract2.php: Stop allowing the www portal templates [mediawiki-config] - https://gerrit.wikimedia.org/r/391353 (owner: Chad)
[19:39:52] (CR) jenkins-bot: extract2.php: Stop allowing the www portal templates [mediawiki-config] - https://gerrit.wikimedia.org/r/391353 (owner: Chad)
[19:39:58] (PS6) Dzahn: apache: Add http2 to mod [puppet] - https://gerrit.wikimedia.org/r/392495 (owner: Paladox)
[19:40:56] Operations, ORES, Scoring-platform-team, Traffic, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3782516 (awight) @BBlack Thanks for the detailed notes! All I was going to add was my understanding of how Ext:ORES has the...
[19:41:09] !log demon@tin Synchronized w/extract2.php: removing old portal support (duration: 00m 45s)
[19:41:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:41:23] MaxSem: Page has exactly 10 edits. 9 from 2015, one from 2016
[19:41:36] Yeahhh
[19:42:12] Operations, ORES, Scoring-platform-team, Traffic, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3715229 (Zoranzoki21) Does it made problem with high sleep times in pywiki?
[19:42:58] mobrovac: About? Was wondering about your thoughts on the static page at /api/ where we list the REST and action API endpoints.
[19:43:19] Right now we dynamically fetch that from meta via extract2.php. That seems like overkill for content that changes so rarely
[19:43:53] * bawolff has never liked the fact that those templates are controllable on wiki
[19:44:09] * 1 template
[19:44:20] the reast are dead, yay
[19:44:21] Oh did we kill the rest of them
[19:45:52] Operations, ORES, Scoring-platform-team, Traffic, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3782522 (demon) >>! In T179156#3782516, @awight wrote: > @BBlack Thanks for the detailed notes! All I was going to add was m...
[19:47:02] Yay!
[19:47:28] (CR) Dzahn: [C: 2] "this doesn't actually change any stretch apache, it just makes it possible to include it (http://puppet-compiler.wmflabs.org/8935/netmon10" [puppet] - https://gerrit.wikimedia.org/r/392495 (owner: Paladox)
[19:47:28] for, you know, like 2 years now:)
[19:48:19] Well can I give a +1 to moving that page to be fully committed, and get rid of the feature where random admins can inject javascript to literally all of our sites including things like office
[19:49:14] (CR) Chad: [C: 2] Remove last vestigates of weird wmfwiki-specific docroot [mediawiki-config] - https://gerrit.wikimedia.org/r/385113 (owner: Chad)
[19:49:27] mooeypoo: I see James_F already cherrypicked it
[19:49:46] !log starting cassandra cleanups, restbase-200{1,3,5}-a - T179422
[19:49:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:49:54] T179422: Reshape RESTBase Cassandra clusters - https://phabricator.wikimedia.org/T179422
[19:50:12] MaxSem: Yeah, RoanKattouw scheduled it for Monday.
[19:50:22] (Merged) jenkins-bot: Remove last vestigates of weird wmfwiki-specific docroot [mediawiki-config] - https://gerrit.wikimedia.org/r/385113 (owner: Chad)
[19:50:29] Well, I said I would, I haven't done it yet
[19:50:32] (CR) jenkins-bot: Remove last vestigates of weird wmfwiki-specific docroot [mediawiki-config] - https://gerrit.wikimedia.org/r/385113 (owner: Chad)
[19:51:14] RoanKattouw, you have until ... Monday.
[19:51:26] * no_justification crosses fingers he doesn't break anything rn
[19:51:45] !log demon@tin Synchronized docroot/: removing old foundation docroot (duration: 00m 46s)
[19:51:46] (CR) Dzahn: "@jynus Looks like this one is ready to go nowadays, afaict." [puppet] - https://gerrit.wikimedia.org/r/375347 (https://phabricator.wikimedia.org/T113842) (owner: Tpt)
[19:51:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:53:01] (CR) Chad: [C: 2] multiversion: Assume --wiki=aawiki for purgeUrls.php [mediawiki-config] - https://gerrit.wikimedia.org/r/392561 (owner: Krinkle)
[19:54:13] (Merged) jenkins-bot: multiversion: Assume --wiki=aawiki for purgeUrls.php [mediawiki-config] - https://gerrit.wikimedia.org/r/392561 (owner: Krinkle)
[19:54:23] (CR) jenkins-bot: multiversion: Assume --wiki=aawiki for purgeUrls.php [mediawiki-config] - https://gerrit.wikimedia.org/r/392561 (owner: Krinkle)
[19:55:43] (CR) Chad: [C: 2] BetaFeatures whitelist: Update date comments [mediawiki-config] - https://gerrit.wikimedia.org/r/392885 (owner: Jforrester)
[19:56:19] (CR) Dzahn: [C: 1] "http://puppet-compiler.wmflabs.org/8936/mwdebug1001.eqiad.wmnet/ and "violations delta" = -3" [puppet] - https://gerrit.wikimedia.org/r/392768 (owner: Dzahn)
[19:56:21] !log demon@tin Synchronized multiversion/MWScript.php: assume aawiki for purgeUrls (duration: 00m 45s)
[19:56:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:56:59] (Merged) jenkins-bot: BetaFeatures whitelist: Update date comments [mediawiki-config] - https://gerrit.wikimedia.org/r/392885 (owner: Jforrester)
[19:57:07] (CR) Chad: [C: -1] "I thought we left them there so people could verify old releases too" [mediawiki-config] - https://gerrit.wikimedia.org/r/392463 (owner: Legoktm)
[19:58:12] (CR) jenkins-bot: BetaFeatures whitelist: Update date comments [mediawiki-config] - https://gerrit.wikimedia.org/r/392885 (owner: Jforrester)
[19:58:15] (CR) Legoktm: "See T180615 - the keys are still listed on keys.html, just not in the txt version that people should be automatically importing." [mediawiki-config] - https://gerrit.wikimedia.org/r/392463 (owner: Legoktm)
[19:58:19] !log demon@tin Synchronized wmf-config/InitialiseSettings.php: comments (duration: 00m 45s)
[19:58:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:58:39] (CR) Legoktm: "Why were all the dates extended?" [mediawiki-config] - https://gerrit.wikimedia.org/r/392885 (owner: Jforrester)
[19:58:59] (PS5) Chad: multiversion: Fix PHP notice when no argument is given [mediawiki-config] - https://gerrit.wikimedia.org/r/371967 (https://phabricator.wikimedia.org/T173342) (owner: Dereckson)
[20:00:03] (CR) Chad: [C: 2] multiversion: Fix PHP notice when no argument is given [mediawiki-config] - https://gerrit.wikimedia.org/r/371967 (https://phabricator.wikimedia.org/T173342) (owner: Dereckson)
[20:00:16] (CR) Dzahn: [C: 1] "seaborgium = labs LDAP , dubnium = corp LDAP, no-op http://puppet-compiler.wmflabs.org/8937/ and violation delta -8" [puppet] - https://gerrit.wikimedia.org/r/391737 (owner: Dzahn)
[20:01:16] (Merged) jenkins-bot: multiversion: Fix PHP notice when no argument is given [mediawiki-config] - https://gerrit.wikimedia.org/r/371967 (https://phabricator.wikimedia.org/T173342) (owner: Dereckson)
[20:01:40] (CR) jenkins-bot: multiversion: Fix PHP notice when no argument is given [mediawiki-config] - https://gerrit.wikimedia.org/r/371967 (https://phabricator.wikimedia.org/T173342) (owner: Dereckson)
[20:02:11] (PS3) Dzahn: openldap: move firewall/standard to roles, use profile [puppet] - https://gerrit.wikimedia.org/r/391737
[20:02:23] !log demon@tin Synchronized multiversion/bin/expanddblist: fix param warning (duration: 00m 45s)
[20:02:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:02:34] (CR) jerkins-bot: [V: -1] openldap: move firewall/standard to roles, use profile [puppet] - https://gerrit.wikimedia.org/r/391737 (owner: Dzahn)
[20:02:47] (Draft1) Paladox: logstash: Add filter_log4j [puppet] - https://gerrit.wikimedia.org/r/392897
[20:02:50] (PS2) Paladox: logstash: Add filter_log4j [puppet] - https://gerrit.wikimedia.org/r/392897
[20:03:12] (CR) Paladox: logstash: Add filter_log4j (1 comment) [puppet] - https://gerrit.wikimedia.org/r/392897 (owner: Paladox)
[20:03:43] (PS4) Dzahn: openldap: move firewall/standard to roles, use profile [puppet] - https://gerrit.wikimedia.org/r/391737
[20:03:46] (PS3) Paladox: logstash: Add filter_log4j [puppet] - https://gerrit.wikimedia.org/r/392897
[20:03:51] (CR) Chad: Disable EducationProgram on cs.wikipedia (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/391163 (https://phabricator.wikimedia.org/T180426) (owner: Urbanecm)
[20:04:06] (CR) jerkins-bot: [V: -1] openldap: move firewall/standard to roles, use profile [puppet] - https://gerrit.wikimedia.org/r/391737 (owner: Dzahn)
[20:04:19] (CR) EBernhardson: logstash: Add filter_log4j (1 comment) [puppet] - https://gerrit.wikimedia.org/r/392897 (owner: Paladox)
[20:05:25] (PS3) Chad: Disable DisableAccount on wikis where there are no disabled users [mediawiki-config] - https://gerrit.wikimedia.org/r/338792 (https://phabricator.wikimedia.org/T106067) (owner: Reedy)
[20:05:38] (PS4) Paladox: logstash: Add filter_log4j [puppet] - https://gerrit.wikimedia.org/r/392897
[20:05:40] (CR) Paladox: logstash: Add filter_log4j (1 comment) [puppet] - https://gerrit.wikimedia.org/r/392897 (owner: Paladox)
[20:06:10] (CR) Chad: [C: 2] robots.txt: block MJ12bot [mediawiki-config] - https://gerrit.wikimedia.org/r/382494 (owner: BBlack)
[20:07:15] (Merged) jenkins-bot: robots.txt: block MJ12bot [mediawiki-config] - https://gerrit.wikimedia.org/r/382494 (owner: BBlack)
[20:07:37] (CR) Paladox: openldap: move firewall/standard to roles, use profile (1 comment) [puppet] - https://gerrit.wikimedia.org/r/391737 (owner: Dzahn)
[20:08:48] (PS17) Zoranzoki21: Enable the ArticlePlaceholder for sewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/387077 (https://phabricator.wikimedia.org/T179241)
[20:08:55] (PS3) Zoranzoki21: Add domain *nil.org.il to $wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/392895 (https://phabricator.wikimedia.org/T181179)
[20:09:13] !log demon@tin Synchronized robots.txt: block a nasty bot 💔 (duration: 00m 44s)
[20:09:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:09:28] (CR) jenkins-bot: robots.txt: block MJ12bot [mediawiki-config] - https://gerrit.wikimedia.org/r/382494 (owner: BBlack)
[20:10:13] (CR) Dzahn: [C: 1] "http://puppet-compiler.wmflabs.org/8938/ and "violations delta" -9" [puppet] - https://gerrit.wikimedia.org/r/391731 (owner: Dzahn)
[20:10:50] (PS5) Paladox: openldap: move firewall/standard to roles, use profile [puppet] - https://gerrit.wikimedia.org/r/391737 (owner: Dzahn)
[20:10:50] (CR) Dzahn: openldap: move firewall/standard to roles, use profile (1 comment) [puppet] - https://gerrit.wikimedia.org/r/391737 (owner: Dzahn)
[20:11:10] (PS2) Zoranzoki21: Disable EducationProgram on cs.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/391163 (https://phabricator.wikimedia.org/T180426) (owner: Urbanecm)
[20:11:27] (CR) Zoranzoki21: ">" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/391163 (https://phabricator.wikimedia.org/T180426) (owner: Urbanecm)
[20:11:34] (CR) Chad: [C: 2] Add domain *nil.org.il to $wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/392895 (https://phabricator.wikimedia.org/T181179) (owner: Zoranzoki21)
[20:12:39] (Merged) jenkins-bot: Add domain *nil.org.il to $wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/392895 (https://phabricator.wikimedia.org/T181179) (owner: Zoranzoki21)
[20:12:48] (CR) jenkins-bot: Add domain *nil.org.il to $wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/392895 (https://phabricator.wikimedia.org/T181179) (owner: Zoranzoki21)
[20:13:13] !log otto@tin Started deploy [eventlogging/analytics@57234e7]: no-op: removing now unneeded code that might accidentally serialize userAgent to json string: T179625
[20:13:17] !log otto@tin Finished deploy [eventlogging/analytics@57234e7]: no-op: removing now unneeded code that might accidentally serialize userAgent to json string: T179625 (duration: 00m 04s)
[20:13:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:13:21] T179625: Resolve EventCapsule / MySQL / Hive schema discrepancies - https://phabricator.wikimedia.org/T179625
[20:13:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:13:57] hmm visting http://mj12bot.com/ crashed my browser (safari)
[20:14:23] !log demon@tin Synchronized wmf-config/InitialiseSettings.php: adding national library of Israel to copy domains (duration: 00m 45s)
[20:14:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:15:04] (PS3) Chad: Use hosts in wmfAllServices [mediawiki-config] - https://gerrit.wikimedia.org/r/361055 (owner: Dereckson)
[20:15:29] paladox: i
didnt crash [20:15:33] Safari [20:15:44] hmm [20:16:00] (03PS6) 10Dzahn: openldap: move firewall/standard to roles, use profile [puppet] - 10https://gerrit.wikimedia.org/r/391737 [20:16:06] PROBLEM - eventstreams on scb1001 is CRITICAL: connect to address 10.64.0.16 and port 8092: Connection refused [20:16:23] (03CR) 10Chad: [C: 032] Use hosts in wmfAllServices [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361055 (owner: 10Dereckson) [20:16:55] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: /{domain}/v1/page/summary/{title}{/revision}{/tid} (Get summary for Barack Obama) timed out before a response was received: /{domain}/v1/page/media/{title} (retrieve images and videos of en.wp Cat page via media route) timed out before a response was received: /{domain}/v1/feed/onthisday/{type}/{mm}/{dd} (retrieve all events on January 15) timed out before a response [20:16:55] main}/v1/page/definition/{title} (retrieve en-wiktionary definitions for cat) timed out before a response was received [20:17:06] RECOVERY - eventstreams on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 929 bytes in 0.028 second response time [20:17:14] (03CR) 10Chad: "Still needed?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/377929 (https://phabricator.wikimedia.org/T175868) (owner: 10Gergő Tisza) [20:17:25] (03PS7) 10Dzahn: openldap: move firewall/standard to roles, use profile [puppet] - 10https://gerrit.wikimedia.org/r/391737 [20:17:36] (03Merged) 10jenkins-bot: Use hosts in wmfAllServices [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361055 (owner: 10Dereckson) [20:17:55] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [20:18:09] (03CR) 10Zoranzoki21: [C: 031] "Looks good to me, but someone else must approve" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338792 (https://phabricator.wikimedia.org/T106067) (owner: 10Reedy) [20:18:27] (03Abandoned) 10Dzahn: ci::master/ferm: require base::firewall in ci::firewall [puppet] - 10https://gerrit.wikimedia.org/r/391742 (owner: 10Dzahn) [20:18:30] (03PS2) 10Chad: Check config variables are set before applying [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383179 (https://phabricator.wikimedia.org/T169732) (owner: 10Jdlrobson) [20:18:38] (03CR) 10Chad: [C: 032] Check config variables are set before applying [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383179 (https://phabricator.wikimedia.org/T169732) (owner: 10Jdlrobson) [20:19:05] !log demon@tin Synchronized wmf-config/LabsServices.php: no-op (duration: 00m 45s) [20:19:06] (03PS4) 10Chad: Allow to regenerate computed dblists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371968 (https://phabricator.wikimedia.org/T173342) (owner: 10Dereckson) [20:19:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:19:15] (03CR) 10Chad: [C: 032] Allow to regenerate computed dblists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371968 (https://phabricator.wikimedia.org/T173342) (owner: 10Dereckson) [20:19:49] (03Merged) 10jenkins-bot: Check config variables are set before applying [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383179 
(https://phabricator.wikimedia.org/T169732) (owner: 10Jdlrobson) [20:20:07] (03CR) 10jenkins-bot: Use hosts in wmfAllServices [mediawiki-config] - 10https://gerrit.wikimedia.org/r/361055 (owner: 10Dereckson) [20:20:32] (03CR) 10Jforrester: "> Why were all the dates extended?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392885 (owner: 10Jforrester) [20:20:57] (03CR) 10Chad: "I wonder if we could write a quick and dirty test that'll yell at us if people add entries to dblists that aren't alphabetical?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383999 (owner: 10Hoo man) [20:20:59] (03Merged) 10jenkins-bot: Allow to regenerate computed dblists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371968 (https://phabricator.wikimedia.org/T173342) (owner: 10Dereckson) [20:21:26] (03CR) 10Chad: "Wait wtf?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383179 (https://phabricator.wikimedia.org/T169732) (owner: 10Jdlrobson) [20:21:31] (03CR) 10Chad: "Gerrit....." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383179 (https://phabricator.wikimedia.org/T169732) (owner: 10Jdlrobson) [20:21:32] 10Operations, 10Gerrit, 10Traffic, 10Patch-For-Review: Switch on http/2 in apache for gerrit - https://phabricator.wikimedia.org/T180978#3782577 (10Dzahn) >>! In T180978#3781833, @demon wrote: >> I'm proposing we lower the priority on this and let another service (preferably one with less depending on it)... [20:21:39] (03CR) 10Chad: "Why would you allow me to merge an empty commit?!" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/383179 (https://phabricator.wikimedia.org/T169732) (owner: 10Jdlrobson) [20:22:21] (03CR) 10Dzahn: "http://puppet-compiler.wmflabs.org/8939/labnodepool1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/392769 (owner: 10Dzahn) [20:22:34] (03PS2) 10Dzahn: archiva: move standard include, use profile::b::firewall [puppet] - 10https://gerrit.wikimedia.org/r/392770 [20:23:22] !log demon@tin Synchronized tests/noc-conf/NOCDblistTest.php: no-op (duration: 00m 45s) [20:23:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:23:50] (03CR) 10Zoranzoki21: "> I wonder if we could write a quick and dirty test that'll yell at" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383999 (owner: 10Hoo man) [20:24:29] (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/8940/meitnerium.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/392770 (owner: 10Dzahn) [20:25:14] (03CR) 10Chad: "Yes yes, they are in this change. I mean for future changes." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383999 (owner: 10Hoo man) [20:25:25] !log demon@tin Synchronized dblists/Makefile: no-op (duration: 00m 45s) [20:25:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:26:03] (03CR) 10Dzahn: "no-op on meitnerium.wikimedia.org" [puppet] - 10https://gerrit.wikimedia.org/r/392770 (owner: 10Dzahn) [20:27:52] Huge spike in fcgi errors from mw1264 about 5mins ago [20:28:11] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (127.0.0.1) failed[proxy:error] [pid 37276:tid 139868361422592] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (127.0.0.1) failed [20:28:23] Seems to have just been a spike, maybe transient [20:28:24] (03CR) 10Zoranzoki21: "> Yes yes, they are in this change. I mean for future changes." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/383999 (owner: 10Hoo man) [20:30:18] (03PS4) 10Chad: Sort dblists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383999 (owner: 10Hoo man) [20:32:26] (03CR) 10Dzahn: "but will Gerrit really always be the only service using "type log4j"? It seems there might be other and then it'd be bad to replace the ho" [puppet] - 10https://gerrit.wikimedia.org/r/392897 (owner: 10Paladox) [20:32:54] (03CR) 10Paladox: "> but will Gerrit really always be the only service using "type" [puppet] - 10https://gerrit.wikimedia.org/r/392897 (owner: 10Paladox) [20:33:03] (03CR) 10Chad: [C: 032] Sort dblists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383999 (owner: 10Hoo man) [20:34:02] (03CR) 10Dzahn: "also we might want to know what is from cobalt and what is from gerrit2001, right" [puppet] - 10https://gerrit.wikimedia.org/r/392897 (owner: 10Paladox) [20:34:12] (03Merged) 10jenkins-bot: Sort dblists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383999 (owner: 10Hoo man) [20:34:25] (03CR) 10Dzahn: "while at the same time of course it would be nice being able to just search for "gerrit" and get them all" [puppet] - 10https://gerrit.wikimedia.org/r/392897 (owner: 10Paladox) [20:35:24] !log demon@tin Synchronized dblists/: now with more alphabeticalizedness (duration: 00m 45s) [20:35:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:38:17] (03CR) 10Chad: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367465 (owner: 10Reedy) [20:40:00] (03CR) 10Krinkle: "Can we enforce this with a unit test? 
Looks like we already have some unit tests for dblists, should be relatively straight forward to enf" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383999 (owner: 10Hoo man) [20:43:42] (03CR) 10Zoranzoki21: "@Chad Deploy patch" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/391163 (https://phabricator.wikimedia.org/T180426) (owner: 10Urbanecm) [20:46:13] (03PS18) 10Zoranzoki21: Enable the ArticlePlaceholder for sewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/387077 (https://phabricator.wikimedia.org/T179241) [20:49:10] (03CR) 10jerkins-bot: [V: 04-1] Update mediawiki-codesniffer to 14.1.0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367465 (owner: 10Reedy) [20:49:34] (03CR) 10Paladox: "@Dzahn i've been searching around but carn't find anything that will allow us to set this client side. But also looking, no one is using t" [puppet] - 10https://gerrit.wikimedia.org/r/392897 (owner: 10Paladox) [20:50:43] (03CR) 10Dzahn: "what if we just add the tag but don't do anything with hostnames, wouldn't that fix it too, so that logs show up?" [puppet] - 10https://gerrit.wikimedia.org/r/392897 (owner: 10Paladox) [20:50:57] (03CR) 10Chad: "Deploy patch? How about saying please next time." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/391163 (https://phabricator.wikimedia.org/T180426) (owner: 10Urbanecm) [20:51:06] (03CR) 10Paladox: "> what if we just add the tag but don't do anything with hostnames," [puppet] - 10https://gerrit.wikimedia.org/r/392897 (owner: 10Paladox) [20:51:19] (03PS5) 10Paladox: logstash: Add filter_log4j [puppet] - 10https://gerrit.wikimedia.org/r/392897 [20:53:31] 10Operations, 10ops-ulsfo, 10Traffic: replace ulsfo aging servers - https://phabricator.wikimedia.org/T164327#3782634 (10BBlack) [20:53:33] 10Operations, 10ops-ulsfo, 10Traffic, 10Patch-For-Review: rack/setup/install cp40(29|3[012]).ulsfo.wmnet - https://phabricator.wikimedia.org/T178423#3782633 (10BBlack) 05Open>03Resolved [20:55:15] (03CR) 10MaxSem: Sort dblists (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383999 (owner: 10Hoo man) [20:57:03] (03CR) 10Chad: "It shouldn't affect those paths, as they don't include these directories." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/391355 (owner: 10Chad) [20:57:27] (03CR) 10Zoranzoki21: "> Deploy patch? How about saying please next time." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/391163 (https://phabricator.wikimedia.org/T180426) (owner: 10Urbanecm) [21:01:35] (03PS1) 10Hashar: Move contint::hhvm to a profile [puppet] - 10https://gerrit.wikimedia.org/r/392925 [21:01:54] 10Operations, 10ops-ulsfo, 10Traffic: replace ulsfo aging servers - https://phabricator.wikimedia.org/T164327#3782641 (10BBlack) Recapping where we're at on all things here, because even I get lost sometimes: Of the old hosts being decommed, the only one still in live use are: * bast4001 (blocking on bast40... 
[21:02:03] (CR) jerkins-bot: [V: -1] Move contint::hhvm to a profile [puppet] - https://gerrit.wikimedia.org/r/392925 (owner: Hashar)
[21:02:57] (CR) Marostegui: mariadb: Switchover s5 codfw master (db2023) to db2052 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/392881 (https://phabricator.wikimedia.org/T176243) (owner: Jcrespo)
[21:05:36] (CR) Dzahn: [C: 1] "lgtmafaict, a +1 from ebernhardson would make me merge it :)" [puppet] - https://gerrit.wikimedia.org/r/392897 (owner: Paladox)
[21:06:26] (CR) Marostegui: mariadb: Switchover codfw s5 master from db2023 to db2052 (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/392879 (https://phabricator.wikimedia.org/T177208) (owner: Jcrespo)
[21:07:16] (CR) Marostegui: [C: 1] "Works for me!" [puppet] - https://gerrit.wikimedia.org/r/392400 (https://phabricator.wikimedia.org/T170662) (owner: Jcrespo)
[21:08:00] (PS1) Chad: noc: Use local static favicon location instead of external one [mediawiki-config] - https://gerrit.wikimedia.org/r/392927
[21:08:02] (CR) Chad: [C: 2] noc: Use local static favicon location instead of external one [mediawiki-config] - https://gerrit.wikimedia.org/r/392927 (owner: Chad)
[21:08:26] (CR) Jforrester: "This really is unhelpful for the visualeditor-nondefault list which has comments there to explain both the order and why things are in the" [mediawiki-config] - https://gerrit.wikimedia.org/r/383999 (owner: Hoo man)
[21:09:15] (Merged) jenkins-bot: noc: Use local static favicon location instead of external one [mediawiki-config] - https://gerrit.wikimedia.org/r/392927 (owner: Chad)
[21:10:06] (PS2) Hashar: Move contint::hhvm to a profile [puppet] - https://gerrit.wikimedia.org/r/392925
[21:10:08] (PS1) Hashar: contint: libcurl4-gnutls-dev is now absent [puppet] - https://gerrit.wikimedia.org/r/392928
[21:10:10] (PS1) Hashar: Install hhvm dev packages from the profile [puppet] - https://gerrit.wikimedia.org/r/392929
[21:11:35] !log demon@tin Synchronized docroot/noc/conf/index.php: favicon fix (duration: 00m 46s)
[21:11:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:11:41] (CR) EBernhardson: [C: 1] "This can be improved, but this is enough to get things going. We might need a log4j expert to tweak the gerrit config to get anythung furt" [puppet] - https://gerrit.wikimedia.org/r/392897 (owner: Paladox)
[21:11:53] (CR) Chad: "Tbh, we have too many dblists as it is :\" [mediawiki-config] - https://gerrit.wikimedia.org/r/383999 (owner: Hoo man)
[21:12:03] (CR) Chad: "But we can revert that particular file" [mediawiki-config] - https://gerrit.wikimedia.org/r/383999 (owner: Hoo man)
[21:12:34] (PS6) Dzahn: logstash: Add filter_log4j [puppet] - https://gerrit.wikimedia.org/r/392897 (owner: Paladox)
[21:12:49] (CR) Dzahn: [C: 2] "thanks for the help, ebernhardson!" [puppet] - https://gerrit.wikimedia.org/r/392897 (owner: Paladox)
[21:14:15] MaxSem: Would love your thoughts on https://gerrit.wikimedia.org/r/#/c/391355/, if you have a moment
[21:14:31] ugh
[21:14:50] I've seen it, but I've completely forgotten by now how this stuff works
[21:15:08] Worst case we deploy and everything on the portals breaks
[21:15:09] !log running puppet on logstash hosts to apply config change 392897 (add log4j filter)
[21:15:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:15:15] Best case, my assumption is right :p
[21:21:10] (CR) MaxSem: [C: 1] Remove www.*.org symlinks [mediawiki-config] - https://gerrit.wikimedia.org/r/391355 (owner: Chad)
[21:21:19] looks right but...
[21:21:40] today is the best day to break shit, right? :P
[21:21:52] i was just thinking that too :)
[21:22:07] +1
[21:22:10] MaxSem: Hey, I already almost broke things once :p
[21:22:37] The foundationwiki docroot thing was also a possible explosion :p
[21:22:56] (PS1) Andrew Bogott: Horizon puppet: Merge role-associated hiera values with freeform hiera [puppet] - https://gerrit.wikimedia.org/r/392939 (https://phabricator.wikimedia.org/T181196)
[21:23:25] Granted, portals probably get more traffic (sorry wmfwiki)
[21:24:11] (Draft1) Paladox: gerrit: Add ReconnectionDelay to socket [puppet] - https://gerrit.wikimedia.org/r/392943
[21:24:15] (PS2) Paladox: gerrit: Add ReconnectionDelay to socket [puppet] - https://gerrit.wikimedia.org/r/392943
[21:27:11] no_justification: do you want to take another look at my keys patch series?
[21:27:20] After lunch, yeah
[21:27:21] :)
[21:27:45] (PS3) Paladox: gerrit: Add ReconnectionDelay to socket [puppet] - https://gerrit.wikimedia.org/r/392943
[21:27:48] Thanks :)
[21:28:12] (CR) Dzahn: "turns out the delay is in milliseconds, not seconds" [puppet] - https://gerrit.wikimedia.org/r/392943 (owner: Paladox)
[21:29:06] (CR) Paladox: "> turns out the delay is in milliseconds, not seconds" [puppet] - https://gerrit.wikimedia.org/r/392943 (owner: Paladox)
[21:29:33] (CR) Dzahn: [C: 2] gerrit: Add ReconnectionDelay to socket [puppet] - https://gerrit.wikimedia.org/r/392943 (owner: Paladox)
[21:35:09] paladox: IT WORKS :)
[21:35:14] yay
[21:35:15] type:log4j search
[21:35:18] no_justification ^^
[21:35:19] from gerrit2001
[21:35:27] :)
[21:35:36] Error in listener com.google.gerrit.server.events.StreamEventsApiListener for event com.google.gerrit.server.extensions.events.GitReferenceUpdated: null
[21:35:46] :)
[21:35:51] ^ normal noise, just saying it's in logstash, heh
[21:35:53] right
[21:36:21] paladox: the host name shows as IP address
[21:36:31] omg yay
[21:36:34] yayayayayayay
[21:36:37] that is even cobalt
[21:36:39] :)
[21:36:41] and i did NOT restart there, heh
[21:36:49] * no_justification needs to set up some dashboards now
[21:36:53] :) yea
[21:37:22] :)
[21:37:25] (PS1) Hashar: Move contint::browsertests to a profile [puppet] - https://gerrit.wikimedia.org/r/392956
[21:38:15] (PS2) Andrew Bogott: Horizon puppet: Merge role-associated hiera values with freeform hiera [puppet] - https://gerrit.wikimedia.org/r/392939 (https://phabricator.wikimedia.org/T181196)
[21:38:17] (PS1) Andrew Bogott: labtest: move hiera puppetmaster to labtestpuppetmaster2001 [puppet] - https://gerrit.wikimedia.org/r/392958
[21:38:30] no_justification you may need to filter on type:log4j for now
[21:38:39] as host: will be specifying logstash ip.
[21:39:06] though i see no one else using log4j. so it's safe for now :)
[21:39:16] (CR) Andrew Bogott: [C: 2] labtest: move hiera puppetmaster to labtestpuppetmaster2001 [puppet] - https://gerrit.wikimedia.org/r/392958 (owner: Andrew Bogott)
[21:41:45] (CR) Jcrespo: [C: 1] "The problems is not merging, the problem is it requires some manual stuff after deploying, and I need to get some free time to deploy it (" [puppet] - https://gerrit.wikimedia.org/r/375347 (https://phabricator.wikimedia.org/T113842) (owner: Tpt)
[21:42:14] (CR) Rush: "spoke with madhu today and she agreed to merge and pilot this on labstore1006/7 in service of revamping dumps. maybe as soon as next week " [puppet] - https://gerrit.wikimedia.org/r/353508 (https://phabricator.wikimedia.org/T165136) (owner: Muehlenhoff)
[21:45:56] (PS2) Rush: labnodepool: move standard/firewall includes to role [puppet] - https://gerrit.wikimedia.org/r/392769 (owner: Dzahn)
[21:48:45] (CR) Thiemo Mättig (WMDE): "Automation is fine, but manually sorting this list once a year will still be faster than writing any tool. ;-)" [mediawiki-config] - https://gerrit.wikimedia.org/r/383999 (owner: Hoo man)
[21:49:38] (CR) Jforrester: "> Tbh, we have too many dblists as it is :\" [mediawiki-config] - https://gerrit.wikimedia.org/r/383999 (owner: Hoo man)
[21:50:02] (CR) Jcrespo: mariadb: Switchover s5 codfw master (db2023) to db2052 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/392881 (https://phabricator.wikimedia.org/T176243) (owner: Jcrespo)
[21:50:57] mutante: WIP, but at least groups all type:log4j together https://logstash.wikimedia.org/goto/8e6b60e186e4409e3a1b0d02cc96fda3
[21:53:44] (CR) Jcrespo: mariadb: Switchover codfw s5 master from db2023 to db2052 (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/392879 (https://phabricator.wikimedia.org/T177208) (owner: Jcrespo)
[21:54:34] (PS1) Hashar: Move contint::browsers to a profile [puppet] - https://gerrit.wikimedia.org/r/392976
[22:01:17] (PS1) Ottomata: [WIP] Puppetization for superset [puppet] - https://gerrit.wikimedia.org/r/392978 (https://phabricator.wikimedia.org/T166689)
[22:01:40] (CR) jerkins-bot: [V: -1] [WIP] Puppetization for superset [puppet] - https://gerrit.wikimedia.org/r/392978 (https://phabricator.wikimedia.org/T166689) (owner: Ottomata)
[22:05:47] (PS2) Ottomata: [WIP] Puppetization for superset [puppet] - https://gerrit.wikimedia.org/r/392978 (https://phabricator.wikimedia.org/T166689)
[22:06:06] (PS3) Ottomata: [WIP] Puppetization for superset [puppet] - https://gerrit.wikimedia.org/r/392978 (https://phabricator.wikimedia.org/T166689)
[22:11:37] Operations, Analytics, DBA, Patch-For-Review, User-Elukey: Decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3782789 (Nuria) @Nettrom Right, is not only that dashboard but all the ones that are feed data via reportupdater that needed the new configura...
[22:16:30] no_justification: cool :)!
[22:20:39] (CR) Andrew Bogott: [C: 2] Horizon puppet: Merge role-associated hiera values with freeform hiera [puppet] - https://gerrit.wikimedia.org/r/392939 (https://phabricator.wikimedia.org/T181196) (owner: Andrew Bogott)
[22:24:53] paladox: take the honors and close the Phab task for logstash. You earned that one, thanks so much for pushing it through
[22:25:08] Your welcome but dzahn beat me to it :)
[22:26:20] Gj paladox
[22:26:34] :)
[22:29:06] i assigned it to you because he is right, you earned that one
[22:29:23] (PS1) Andrew Bogott: horizon pupettab: Handle the user passing in an empty hiera block [puppet] - https://gerrit.wikimedia.org/r/392980 (https://phabricator.wikimedia.org/T181196)
[22:29:45] thanks :).
[22:31:29] (PS2) Andrew Bogott: horizon pupettab: Handle the user passing in an empty hiera block [puppet] - https://gerrit.wikimedia.org/r/392980 (https://phabricator.wikimedia.org/T181196)
[22:33:06] (CR) Andrew Bogott: [C: 2] horizon pupettab: Handle the user passing in an empty hiera block [puppet] - https://gerrit.wikimedia.org/r/392980 (https://phabricator.wikimedia.org/T181196) (owner: Andrew Bogott)
[22:42:18] (CR) jenkins-bot: Check config variables are set before applying [mediawiki-config] - https://gerrit.wikimedia.org/r/383179 (https://phabricator.wikimedia.org/T169732) (owner: Jdlrobson)
[22:43:02] (CR) jenkins-bot: Allow to regenerate computed dblists [mediawiki-config] - https://gerrit.wikimedia.org/r/371968 (https://phabricator.wikimedia.org/T173342) (owner: Dereckson)
[22:43:04] (CR) jenkins-bot: Sort dblists [mediawiki-config] - https://gerrit.wikimedia.org/r/383999 (owner: Hoo man)
[22:43:06] (CR) jenkins-bot: noc: Use local static favicon location instead of external one [mediawiki-config] - https://gerrit.wikimedia.org/r/392927 (owner: Chad)
[22:45:26] (CR) Dzahn: "since https://gerrit.wikimedia.org/r/#/c/391241/ has been merged.. this should now be fine and not result in new dashboards being affected" [puppet] - https://gerrit.wikimedia.org/r/392764 (owner: Dzahn)
[22:45:44] (PS2) Dzahn: Revert "Revert "hhvm: remove ganglia monitoring"" [puppet] - https://gerrit.wikimedia.org/r/392764
[22:46:57] (PS3) Dzahn: Revert "Revert "hhvm: remove ganglia monitoring"" [puppet] - https://gerrit.wikimedia.org/r/392764 (https://phabricator.wikimedia.org/T177225)
[22:47:35] (Draft1) Paladox: planet: Enable http/2 mod for apache [puppet] - https://gerrit.wikimedia.org/r/392983
[22:47:38] (PS2) Paladox: planet: Enable http/2 mod for apache [puppet] - https://gerrit.wikimedia.org/r/392983
[22:47:49] (CR) Dzahn: "re-revert in https://gerrit.wikimedia.org/r/#/c/392764/ since https://gerrit.wikimedia.org/r/#/c/391241/ has been merged the reason to re" [puppet] - https://gerrit.wikimedia.org/r/382915 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[22:47:59] (CR) jerkins-bot: [V: -1] planet: Enable http/2 mod for apache [puppet] - https://gerrit.wikimedia.org/r/392983 (owner: Paladox)
[22:49:43] (PS3) Paladox: planet: Enable http/2 mod for apache [puppet] - https://gerrit.wikimedia.org/r/392983
[22:50:04] (CR) jerkins-bot: [V: -1] planet: Enable http/2 mod for apache [puppet] - https://gerrit.wikimedia.org/r/392983 (owner: Paladox)
[22:51:24] (PS4) Paladox: planet: Enable http/2 mod for apache [puppet] - https://gerrit.wikimedia.org/r/392983
[22:51:45] (CR) jerkins-bot: [V: -1] planet: Enable http/2 mod for apache [puppet] - https://gerrit.wikimedia.org/r/392983 (owner: Paladox)
[22:51:52] (CR) Dzahn: [C: 2] "graphs replaced by https://grafana.wikimedia.org/dashboard/db/prometheus-apache-hhvm-dc-stats?orgId=1" [puppet] - https://gerrit.wikimedia.org/r/392764 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[22:53:28] (PS3) TerraCodes: Enable local uploads for tcywiki [mediawiki-config] - https://gerrit.wikimedia.org/r/390303 (https://phabricator.wikimedia.org/T166763)
[22:54:31] (PS12) TerraCodes: $wmf* -> $wmg* [mediawiki-config] - https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956)
[22:54:56] (CR) Dzahn: "nothing happens on appservers because this had already been merged before, it was just reverted because we needed the $cluster setting fro" [puppet] - https://gerrit.wikimedia.org/r/392764 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[22:56:06] PROBLEM - puppet last run on mw1224 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/usr/lib/ganglia/python_modules/hhvm_mem.py],File[/etc/ganglia/conf.d/hhvm_health.pyconf]
[22:56:59] ^ that is caused by my change but no reason to worry , it's decom'ing ganglia
[22:57:11] and just some race due to the large number of servers
[22:57:22] (PS5) Paladox: planet: Enable http/2 mod for apache [puppet] - https://gerrit.wikimedia.org/r/392983
[22:57:25] PROBLEM - puppet last run on snapshot1005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/ganglia/conf.d/hhvm_mem.pyconf]
[22:57:35] PROBLEM - puppet last run on mw2118 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/usr/lib/ganglia/python_modules/hhvm_mem.py],File[/usr/lib/ganglia/python_modules/hhvm_health.py]
[22:57:42] (PS6) Paladox: planet: Enable http/2 mod for apache [puppet] - https://gerrit.wikimedia.org/r/392983
[22:58:08] runs puppet on those to show the recoveries
[22:58:36] PROBLEM - puppet last run on mw2213 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/usr/lib/ganglia/python_modules/hhvm_mem.py],File[/usr/lib/ganglia/python_modules/hhvm_health.py]
[22:59:25] sees entirely unrelated issue with jobchron unit
[22:59:35] PROBLEM - puppet last run on mw2150 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/usr/lib/ganglia/python_modules/hhvm_mem.py],File[/usr/lib/ganglia/python_modules/hhvm_health.py]
[23:00:50] Operations, MediaWiki-General-or-Unknown, TechCom-RfC: Bump PHP requirement to 5.6 in 1.31 - https://phabricator.wikimedia.org/T178538#3782867 (tstarling) Open>declined
[23:02:25] RECOVERY - puppet last run on snapshot1005 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[23:02:35] RECOVERY - puppet last run on mw2118 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[23:03:36] RECOVERY - puppet last run on mw2213 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[23:04:35] RECOVERY - puppet last run on mw2150 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[23:05:03] (PS7) Paladox: planet: Add support for http/2 on stretch [puppet] - https://gerrit.wikimedia.org/r/392983
[23:08:17] (PS8) Paladox: planet: Add support for http/2 on stretch [puppet] - https://gerrit.wikimedia.org/r/392983
[23:09:45] (CR) Dzahn: "https://grafana.wikimedia.org/dashboard/db/prometheus-apache-hhvm-dc-stats?orgId=1 still working this time :)" [puppet] - https://gerrit.wikimedia.org/r/392764 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[23:11:41] (CR) Dzahn: [C: 2] "compiler nicely shows the diff between jessie (1001) and stretch (2001) http://puppet-compiler.wmflabs.org/8942/" [puppet] - https://gerrit.wikimedia.org/r/392983 (owner: Paladox)
[23:11:55] (PS9) Dzahn: planet: Add support for http/2 on stretch [puppet] - https://gerrit.wikimedia.org/r/392983 (owner: Paladox)
[23:14:12] (PS10) Paladox: planet: Add support for http/2 on stretch [puppet] - https://gerrit.wikimedia.org/r/392983 (https://phabricator.wikimedia.org/T168490)
[23:14:59] Operations, Wikimedia-Planet: Enable http/2 for planet apache - https://phabricator.wikimedia.org/T181202#3782900 (Paladox)
[23:15:19] (PS11) Paladox: planet: Add support for http/2 on stretch [puppet] - https://gerrit.wikimedia.org/r/392983 (https://phabricator.wikimedia.org/T181202)
[23:16:27] Operations, monitoring, Patch-For-Review: Uninstall ganglia from the fleet - https://phabricator.wikimedia.org/T177225#3782919 (Dzahn) Since https://gerrit.wikimedia.org/r/#/c/391241/ was merged (thanks Filippo!) i was able to re-revert the "remove HHVM ganglia from appservers" change, so files (htt...
[23:17:44] Operations, Wikimedia-Planet, Patch-For-Review: Enable http/2 for planet apache - https://phabricator.wikimedia.org/T181202#3782920 (Dzahn)
[23:18:31] Operations, Wikimedia-Planet, Patch-For-Review: Enable http/2 for planet apache - https://phabricator.wikimedia.org/T181202#3782900 (Dzahn)
[23:19:28] Operations, HHVM, User-Elukey: Provide a forward port of ICU 52 for stretch / Investigate best ICU update strategy - https://phabricator.wikimedia.org/T177498#3660812 (Bawolff) [I was asked to post this here] We need to do the libicu transition because sort order and in particular the binary sort ke...
[23:20:34] (CR) Dzahn: "prod planet is unaffected. planet2001 doesn't get traffic but now:" [puppet] - https://gerrit.wikimedia.org/r/392983 (https://phabricator.wikimedia.org/T181202) (owner: Paladox)
[23:24:25] (PS2) Dzahn: Revert "Revert "apache: remove ganglia monitoring"" [puppet] - https://gerrit.wikimedia.org/r/392763
[23:25:01] (CR) Dzahn: "same as the previous HHVM change, since https://gerrit.wikimedia.org/r/#/c/391241/ was merged this should now be just fine and good to go " [puppet] - https://gerrit.wikimedia.org/r/392763 (owner: Dzahn)
[23:26:06] RECOVERY - puppet last run on mw1224 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[23:26:32] Operations, Wikimedia-Planet, Patch-For-Review: Enable http/2 for planet apache - https://phabricator.wikimedia.org/T181202#3782937 (Dzahn) //Yeah! planet-hotdog.wmflabs.org supports HTTP/2.0.// from https://tools.keycdn.com/http2-test using URL https://planet-hotdog.wmflabs.org after the above...
[23:35:57] (PS3) Dzahn: Revert "Revert "apache: remove ganglia monitoring"" [puppet] - https://gerrit.wikimedia.org/r/392763
[23:54:34] (CR) Dzahn: [C: 2] Revert "Revert "apache: remove ganglia monitoring"" [puppet] - https://gerrit.wikimedia.org/r/392763 (owner: Dzahn)
[23:55:18] This might cause some more false alerts, but hard to avoid, besides killing the bot, closely watching it.
[23:55:27] (not talking about paging or anything)
[23:59:15] PROBLEM - puppet last run on mw2258 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/lib/ganglia/python_modules/apache_status.py]