[00:09:58] <wikibugs>	 Operations, ops-eqiad, Cloud-VPS, decommission: Decommission iron - https://phabricator.wikimedia.org/T220505 (Papaul) ` apaul@asw2-b-eqiad# show | compare  [edit interfaces] -   ge-4/0/8 { -       description iron; -   }
[00:10:20] <wikibugs>	 Operations, ops-eqiad, Cloud-VPS, decommission: Decommission iron - https://phabricator.wikimedia.org/T220505 (Papaul)
[00:13:38] <wikibugs>	 (PS1) Papaul: DNS: Remove mgmt DNS for iron [dns] - https://gerrit.wikimedia.org/r/542635
[00:20:47] <wikibugs>	 (CR) Papaul: [C: +2] DNS: Remove mgmt DNS for iron [dns] - https://gerrit.wikimedia.org/r/542635 (owner: Papaul)
[00:22:10] <wikibugs>	 Operations, ops-eqiad, Cloud-VPS, decommission, Patch-For-Review: Decommission iron - https://phabricator.wikimedia.org/T220505 (Papaul)
[00:22:33] <wikibugs>	 Operations, ops-eqiad, Cloud-VPS, decommission, Patch-For-Review: Decommission iron - https://phabricator.wikimedia.org/T220505 (Papaul) Open→Resolved Complete
[00:31:59] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 36 probes of 464 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[00:37:41] <icinga-wm>	 RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 23 probes of 464 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
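The RIPE Atlas alerts compare the number of failed probes against a fixed threshold ("alerts on 35"). A minimal sketch of that comparison, assuming the check goes CRITICAL when the failure count is strictly above the threshold (whether exactly 35 failures would alert is not visible in this log); `atlas_state` is a hypothetical helper, not the actual Icinga plugin:

```python
def atlas_state(failed: int, total: int, alert_on: int = 35) -> str:
    """Classify a RIPE Atlas ping measurement the way the alert text reads.

    Assumption: CRITICAL when more than `alert_on` probes failed; the real
    check's boundary condition (> vs >=) is not shown in the log.
    """
    if not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total")
    state = "CRITICAL" if failed > alert_on else "OK"
    return f"{state} - failed {failed} probes of {total} (alerts on {alert_on})"


# Values taken from the eqiad PROBLEM and RECOVERY lines.
print(atlas_state(36, 464))  # CRITICAL - failed 36 probes of 464 (alerts on 35)
print(atlas_state(23, 464))  # OK - failed 23 probes of 464 (alerts on 35)
```

Because some probes are always failing somewhere, a count threshold rather than "any failure" keeps the check quiet on routine noise; the short PROBLEM/RECOVERY cycles in this log are counts crossing that line and dropping back.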
[01:52:55] <icinga-wm>	 PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[02:12:13] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 55 probes of 464 (alerts on 35) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[02:14:09] <icinga-wm>	 RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[02:16:17] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 45 probes of 464 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[02:16:21] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 65 probes of 464 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[02:19:03] <wikibugs>	 (PS1) Krinkle: Remove unused wmgReduceStartupExpiry logic in CommonSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/542641 (https://phabricator.wikimedia.org/T235314)
[02:19:05] <wikibugs>	 (PS1) Krinkle: Remove wmgReduceStartupExpiry (no longer used) [mediawiki-config] - https://gerrit.wikimedia.org/r/542642 (https://phabricator.wikimedia.org/T235314)
[02:21:55] <icinga-wm>	 RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 25 probes of 464 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[02:21:59] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 23 probes of 464 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[02:23:25] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 27 probes of 464 (alerts on 35) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[02:36:23] <wikibugs>	 (PS1) Niharika29: Auto-set global prefs for email-blacklist and echo-notifications-blacklist [mediawiki-config] - https://gerrit.wikimedia.org/r/542644
[02:39:38] <wikibugs>	 (CR) Niharika29: [C: +2] "Patch for labs only." [mediawiki-config] - https://gerrit.wikimedia.org/r/542644 (owner: Niharika29)
[02:40:28] <wikibugs>	 (Merged) jenkins-bot: Auto-set global prefs for email-blacklist and echo-notifications-blacklist [mediawiki-config] - https://gerrit.wikimedia.org/r/542644 (owner: Niharika29)
[02:49:31] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 30486840 and 2 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[02:59:17] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 2104 and 85 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[03:30:05] <icinga-wm>	 PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[03:40:43] <icinga-wm>	 RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[04:12:44] <wikibugs>	 (PS1) Krinkle: [WIP] Convert frankenstein vendor/ into thin local lib/ [mediawiki-config] - https://gerrit.wikimedia.org/r/542658
[04:13:28] <wikibugs>	 (CR) jerkins-bot: [V: -1] [WIP] Convert frankenstein vendor/ into thin local lib/ [mediawiki-config] - https://gerrit.wikimedia.org/r/542658 (owner: Krinkle)
[04:14:01] <wikibugs>	 (PS2) Krinkle: [WIP] Convert frankenstein vendor/ into thin local lib/ [mediawiki-config] - https://gerrit.wikimedia.org/r/542658
[04:14:42] <wikibugs>	 (CR) jerkins-bot: [V: -1] [WIP] Convert frankenstein vendor/ into thin local lib/ [mediawiki-config] - https://gerrit.wikimedia.org/r/542658 (owner: Krinkle)
[04:14:58] <wikibugs>	 (PS3) Krinkle: [WIP] Convert frankenstein vendor/ into thin local lib/ [mediawiki-config] - https://gerrit.wikimedia.org/r/542658
[04:15:36] <wikibugs>	 (CR) jerkins-bot: [V: -1] [WIP] Convert frankenstein vendor/ into thin local lib/ [mediawiki-config] - https://gerrit.wikimedia.org/r/542658 (owner: Krinkle)
[04:15:40] <wikibugs>	 (PS4) Krinkle: [WIP] Convert frankenstein vendor/ into thin local lib/ [mediawiki-config] - https://gerrit.wikimedia.org/r/542658
[04:16:21] <wikibugs>	 (CR) jerkins-bot: [V: -1] [WIP] Convert frankenstein vendor/ into thin local lib/ [mediawiki-config] - https://gerrit.wikimedia.org/r/542658 (owner: Krinkle)
[04:21:47] <icinga-wm>	 PROBLEM - Check systemd state on an-coord1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:22:58] <wikibugs>	 (PS3) Krinkle: Drop HHVM XHProf and Arclamp code, no longer called [mediawiki-config] - https://gerrit.wikimedia.org/r/542185 (https://phabricator.wikimedia.org/T235142) (owner: Jforrester)
[04:23:09] <wikibugs>	 (CR) Krinkle: [C: +1] "Updated to retain some of the comments only present in the original copy." [mediawiki-config] - https://gerrit.wikimedia.org/r/542185 (https://phabricator.wikimedia.org/T235142) (owner: Jforrester)
[04:24:24] <wikibugs>	 (PS4) Krinkle: Drop HHVM XHProf and Arclamp code, no longer called [mediawiki-config] - https://gerrit.wikimedia.org/r/542185 (https://phabricator.wikimedia.org/T235142) (owner: Jforrester)
[04:25:45] <wikibugs>	 (CR) Krinkle: "While the code only conditionally does something, it is included from PhpAutoPrepend on all web requests. Splitting up so that there is no" [mediawiki-config] - https://gerrit.wikimedia.org/r/542185 (https://phabricator.wikimedia.org/T235142) (owner: Jforrester)
[04:26:30] <wikibugs>	 (CR) Krinkle: [C: +2] Drop HHVM XHProf and Arclamp code, no longer called [mediawiki-config] - https://gerrit.wikimedia.org/r/542185 (https://phabricator.wikimedia.org/T235142) (owner: Jforrester)
[04:27:18] <wikibugs>	 (Merged) jenkins-bot: Drop HHVM XHProf and Arclamp code, no longer called [mediawiki-config] - https://gerrit.wikimedia.org/r/542185 (https://phabricator.wikimedia.org/T235142) (owner: Jforrester)
[04:29:13] * Krinkle staging X-Wikimedia-Debug fix on mwdebug1002
[04:37:30] <logmsgbot>	 !log krinkle@deploy1001 Synchronized wmf-config/profiler.php: 29d846938c898dd (duration: 00m 57s)
[04:37:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:38:30] <wikibugs>	 (PS5) Krinkle: [WIP] Convert frankenstein vendor/ into thin local lib/ [mediawiki-config] - https://gerrit.wikimedia.org/r/542658
[04:38:45] <wikibugs>	 (CR) jerkins-bot: [V: -1] [WIP] Convert frankenstein vendor/ into thin local lib/ [mediawiki-config] - https://gerrit.wikimedia.org/r/542658 (owner: Krinkle)
[04:39:37] <wikibugs>	 (PS6) Krinkle: [WIP] Convert frankenstein vendor/ into thin local lib/ [mediawiki-config] - https://gerrit.wikimedia.org/r/542658
[04:40:26] <wikibugs>	 (CR) jerkins-bot: [V: -1] [WIP] Convert frankenstein vendor/ into thin local lib/ [mediawiki-config] - https://gerrit.wikimedia.org/r/542658 (owner: Krinkle)
[05:08:13] <wikibugs>	 (PS7) Krinkle: [WIP] Convert frankenstein vendor/ into thin local lib/ [mediawiki-config] - https://gerrit.wikimedia.org/r/542658
[05:47:45] <wikibugs>	 (PS1) Ammarpad: Add custom Minerva wordmark for Hebrew wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/542660 (https://phabricator.wikimedia.org/T234278)
[05:50:42] <wikibugs>	 (CR) Ammarpad: "I don't know how these width and height figures are calculated, so I use the generic wikivoyage values as default" [mediawiki-config] - https://gerrit.wikimedia.org/r/542660 (https://phabricator.wikimedia.org/T234278) (owner: Ammarpad)
[06:24:18] <wikibugs>	 (CR) Masumrezarock100: [C: +1] "I just hope its dimension is correct. Rest of the code looks OK to me." [mediawiki-config] - https://gerrit.wikimedia.org/r/542660 (https://phabricator.wikimedia.org/T234278) (owner: Ammarpad)
[08:57:33] <icinga-wm>	 RECOVERY - Check systemd state on an-coord1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:00:43] <icinga-wm>	 PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[09:11:21] <icinga-wm>	 RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[10:23:14] <wikibugs>	 Operations, ops-eqiad, Cloud-VPS, decommission: Decommission iron - https://phabricator.wikimedia.org/T220505 (Peachey88)
[11:41:29] <icinga-wm>	 PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[11:45:47] <wikibugs>	 Operations, serviceops, Patch-For-Review: Make the parsoid cluster support parsoid/PHP - https://phabricator.wikimedia.org/T233654 (mobrovac) We have to resolve the same problem here to the one we encountered in Beta. Namely, both php-fpm and parsoid services use port 8000 to listen to incoming reque...
[11:52:07] <icinga-wm>	 RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[14:18:24] <wikibugs>	 Operations, serviceops, Patch-For-Review: Make the parsoid cluster support parsoid/PHP - https://phabricator.wikimedia.org/T233654 (Dzahn) @mobrovac Yes, i agree. Making 2 new LVS and DNS services, one parsoid-php and one parsoid-js and then switching first from old parsoid to parsoid-js seems like t...
[15:08:31] <wikibugs>	 Operations, SRE-Access-Requests: Requesting access to 'analytics-privatedata-users' and 'researchers' for Jerrie Kumalah - https://phabricator.wikimedia.org/T234433 (Aklapper) @jkumalah: Please use `ssh -v`, `ssh -vv`, or `ssh -vvv` to get verbose debug output to potentially identify the underlying issue.
[15:15:27] <wikibugs>	 Operations, Phabricator, Release-Engineering-Team-TODO, Traffic, and 2 others: Prepare Phame to support heavy traffic for a Tech Department blog - https://phabricator.wikimedia.org/T226044 (Aklapper) @BMueller: Could you answer the last question, please?
[15:32:27] <icinga-wm>	 PROBLEM - Logstash rate of ingestion percent change compared to yesterday on icinga1001 is CRITICAL: 220.6 ge 210 https://phabricator.wikimedia.org/T202307 https://grafana.wikimedia.org/dashboard/db/logstash?orgId=1&panelId=2&fullscreen
[15:39:00] <cdanis>	 that's interesting
[15:39:19] <cdanis>	 ErrorException from line 596 of /srv/mediawiki/php-1.35.0-wmf.1/includes/api/ApiQueryBase.php: PHP Notice: Undefined property: stdClass::$page_namespace
[15:45:51] <wikibugs>	 Operations, Wikimedia-production-error: ErrorException from line 596 of /srv/mediawiki/php-1.35.0-wmf.1/includes/api/ApiQueryBase.php: PHP Notice: Undefined property: stdClass::$page_namespace - https://phabricator.wikimedia.org/T235334 (CDanis)
[15:50:43] <wikibugs>	 Operations, Core Platform Team, MediaWiki-API, Wikimedia-production-error: ErrorException from line 596 of /srv/mediawiki/php-1.35.0-wmf.1/includes/api/ApiQueryBase.php: PHP Notice: Undefined property: stdClass::$page_namespace - https://phabricator.wikimedia.org/T235334 (Reedy)
[15:52:09] <wikibugs>	 Operations, Core Platform Team, MediaWiki-API, Wikimedia-production-error: ErrorException from line 596 of /srv/mediawiki/php-1.35.0-wmf.1/includes/api/ApiQueryBase.php: PHP Notice: Undefined property: stdClass::$page_namespace - https://phabricator.wikimedia.org/T235334 (Reedy)
[15:54:23] <cdanis>	 ty Reedy
[15:54:36] <Reedy>	 I'm guessing this is mostly someone has started hammering a query that's broken
[15:54:56] <cdanis>	 yeah that was my impression as well; at a quick glance a lot of the same URLs were repeated
[15:55:06] <cdanis>	 the interval also suggests that
[15:58:11] <Reedy>	 cdanis: I blame https://github.com/wikimedia/mediawiki/commit/a8525d7201dada88f3117142fe1919d0b9b80d4e
[15:59:05] <wikibugs>	 Operations, Core Platform Team, MediaWiki-API, Wikimedia-production-error: ErrorException from line 596 of /srv/mediawiki/php-1.35.0-wmf.1/includes/api/ApiQueryBase.php: PHP Notice: Undefined property: stdClass::$page_namespace - https://phabricator.wikimedia.org/T235334 (Reedy) I think https://g...
[15:59:38] <wikibugs>	 Operations, Core Platform Team, MediaWiki-API, MW-1.34-release, Wikimedia-production-error: ErrorException from line 596 of /srv/mediawiki/php-1.35.0-wmf.1/includes/api/ApiQueryBase.php: PHP Notice: Undefined property: stdClass::$page_namespace - https://phabricator.wikimedia.org/T235334 (Reed...
[15:59:47] <Reedy>	 cdanis: Question is whether it's spammy enough to backport a revert of that to shut it up
[16:00:14] <Reedy>	 240K errors in 4 hours suggests maybe so
[16:00:27] <cdanis>	 it's worse than that, it's 240K errors in about 1.5 hours
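The two estimates above (240K errors in 4 hours versus the same 240K in about 1.5 hours) imply quite different error rates. The arithmetic, as a quick sketch:

```python
def errors_per_second(count: int, hours: float) -> float:
    """Average error rate over a window of `hours` hours."""
    return count / (hours * 3600)


four_hour_rate = errors_per_second(240_000, 4)     # about 16.7 errors/s
ninety_min_rate = errors_per_second(240_000, 1.5)  # about 44.4 errors/s
print(f"{four_hour_rate:.1f}/s vs {ninety_min_rate:.1f}/s")  # 16.7/s vs 44.4/s
```

The shorter window means a rate roughly 2.7 times higher than the first estimate.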
[16:00:36] <Reedy>	 ah, ok
[16:00:38] <Reedy>	 Let's do that
[16:02:04] <wikibugs>	 Operations, Core Platform Team, MediaWiki-API, MW-1.34-release, and 2 others: ErrorException from line 596 of /srv/mediawiki/php-1.35.0-wmf.1/includes/api/ApiQueryBase.php: PHP Notice: Undefined property: stdClass::$page_namespace - https://phabricator.wikimedia.org/T235334 (Reedy) Reverting out...
[16:06:01] <wikibugs>	 Operations, Release-Engineering-Team, Scap, Wikimedia-General-or-Unknown: "Currently active MediaWiki versions:" broken on noc/conf - https://phabricator.wikimedia.org/T235338 (Reedy) Current implementation:  `lang=html <p>Currently active MediaWiki versions: <?php         echo str_replace( ' ',...
[16:09:44] <wikibugs>	 Operations, Core Platform Team, MediaWiki-API, MW-1.34-release, and 2 others: ErrorException from line 596 of /srv/mediawiki/php-1.35.0-wmf.1/includes/api/ApiQueryBase.php: PHP Notice: Undefined property: stdClass::$page_namespace - https://phabricator.wikimedia.org/T235334 (Umherirrender) Alread...
[16:15:36] <logmsgbot>	 !log reedy@deploy1001 Synchronized php-1.35.0-wmf.1/includes/api/ApiQueryBacklinksprop.php: T235334 (duration: 00m 56s)
[16:15:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:15:43] <stashbot>	 T235334: ErrorException from line 596 of /srv/mediawiki/php-1.35.0-wmf.1/includes/api/ApiQueryBase.php: PHP Notice: Undefined property: stdClass::$page_namespace - https://phabricator.wikimedia.org/T235334
[16:16:42] <logmsgbot>	 !log reedy@deploy1001 Synchronized php-1.35.0-wmf.1/includes/api/ApiQueryBase.php: T235334 (duration: 00m 51s)
[16:16:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:18:36] <Reedy>	 cdanis: That should shut it up
[16:43:58] <wikibugs>	 (CR) Krinkle: [C: +1] Document Apache gzip sidestepping [puppet] - https://gerrit.wikimedia.org/r/539842 (https://phabricator.wikimedia.org/T232615) (owner: Gilles)
[16:44:34] <wikibugs>	 Operations, observability, serviceops, Performance-Team (Radar): Messages in Logstash from php-fatal-error.php are missing from type:mediawiki/channel:fatal - https://phabricator.wikimedia.org/T234283 (Krinkle)
[16:47:24] <wikibugs>	 Operations, Core Platform Team, MediaWiki-API, MW-1.34-release, and 2 others: ErrorException from line 596 of /srv/mediawiki/php-1.35.0-wmf.1/includes/api/ApiQueryBase.php: PHP Notice: Undefined property: stdClass::$page_namespace - https://phabricator.wikimedia.org/T235334 (Krinkle)
[16:49:37] <wikibugs>	 Operations, Commons, Multimedia, media-storage, User-Josve05a: Specific revisions of multiple files missing from Swift - 404 Not Found returned - https://phabricator.wikimedia.org/T124101 (AlexisJazz) https://commons.wikimedia.org/wiki/File:Busan_tower_by_night.jpg First four revisions missing!
[16:50:33] <wikibugs>	 Operations, CPT Initiatives (PHP7 (TEC4)), HHVM, MW-1.34-notes (1.34.0-wmf.22; 2019-09-10), and 3 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370 (Krinkle)
[16:53:19] <wikibugs>	 Operations, CPT Initiatives (PHP7 (TEC4)), HHVM, MW-1.34-notes (1.34.0-wmf.22; 2019-09-10), and 3 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370 (Krinkle) Removed a few unrelated tasks from the tree that need to happen after this, but aren't part of the sam...
[16:55:54] <wikibugs>	 Operations, CPT Initiatives (PHP7 (TEC4)), HHVM, MW-1.34-notes (1.34.0-wmf.22; 2019-09-10), and 3 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370 (Krinkle)
[16:59:50] <wikibugs>	 Operations, CPT Initiatives (PHP7 (TEC4)), HHVM, MW-1.34-notes (1.34.0-wmf.22; 2019-09-10), and 3 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370 (Krinkle)
[16:59:59] <wikibugs>	 Operations, CPT Initiatives (PHP7 (TEC4)), HHVM, MW-1.34-notes (1.34.0-wmf.22; 2019-09-10), and 3 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370 (Krinkle)
[17:33:03] <icinga-wm>	 RECOVERY - Logstash rate of ingestion percent change compared to yesterday on icinga1001 is OK: (C)210 ge (W)150 ge 104.2 https://phabricator.wikimedia.org/T202307 https://grafana.wikimedia.org/dashboard/db/logstash?orgId=1&panelId=2&fullscreen
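The Logstash ingestion alert reads as a two-threshold comparison: the recovery text joins the critical threshold (210), the warning threshold (150) and the current value with `ge`. A minimal sketch of that logic, assuming inclusive (`>=`) comparisons as the `ge` suggests; this is an illustration, not the actual Icinga/Graphite check code:

```python
def rising_metric_state(value: float, warning: float = 150, critical: float = 210) -> str:
    """Classify a metric that alerts when it rises to or above a threshold.

    Mirrors the "(C)210 ge (W)150 ge 104.2" reading: CRITICAL if
    value >= critical, WARNING if value >= warning, otherwise OK.
    The defaults are the thresholds printed in the alert text.
    """
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "OK"


print(rising_metric_state(220.6))  # CRITICAL (the 15:32 alert)
print(rising_metric_state(104.2))  # OK (the 17:33 recovery)
```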
[18:00:10] <wikibugs>	 (CR) Thiemo Kreuz (WMDE): "Uh, allow me to question this:" [dns] - https://gerrit.wikimedia.org/r/521966 (owner: Jforrester)
[18:27:37] <icinga-wm>	 PROBLEM - LVS HTTP IPv4 #page on zotero.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[18:29:31] * godog shakes fist at zotero
[18:29:58] <godog>	 I'm taking a look
[18:30:51] <_joe_>	 godog: I would bet it's the usual memleak
[18:31:03] <XioNoX>	 on my phone far from my laptop, let me know if there is anything network related
[18:31:34] <_joe_>	 godog: you would need to go kill the pods
[18:31:54] <godog>	 _joe_: I wouldn't be surprised if it was memleak indeed, what's the easiest way to confirm/deny?
[18:32:11] <godog>	 and/or kill the pods?
[18:32:25] <_joe_>	 for killing the pods https://wikitech.wikimedia.org/wiki/Mathoid#Restarting_all_pods_for_the_service
[18:32:37] <_joe_>	 just use zotero instead of mathoid
[18:33:29] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Scrapes sample page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[18:33:32] <_joe_>	 lemme get to a computer
[18:33:40] <godog>	 found https://grafana.wikimedia.org/d/000000620/xxxx-zotero-debugging-kubernetes?orgId=1&from=now-3h&to=now so indeed seems memory
[18:33:47] <godog>	 I'll kill the pods now, thanks _joe_ 
[18:33:59] <icinga-wm>	 RECOVERY - LVS HTTP IPv4 #page on zotero.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 138 bytes in 0.006 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[18:34:09] <_joe_>	 uh that was fast :P
[18:34:16] <godog>	 or not? didn't touch anything :(
[18:34:58] <_joe_>	 https://grafana.wikimedia.org/d/000000620/xxxx-zotero-debugging-kubernetes?orgId=1&from=now-1h&to=now
[18:35:01] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[18:35:07] <_joe_>	 yeah just got a less hogged zotero pod
[18:35:15] <_joe_>	 ok lemme see what's going on
[18:36:13] <godog>	 _joe_: I'm going to hold off then for now
[18:36:24] <_joe_>	 godog: what I am doing is looking at grafana
[18:36:34] <_joe_>	 seeing which pods are high on memory usage
[18:36:38] <_joe_>	 and go to delete them only
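The approach described here (read per-pod memory from Grafana, delete only the outliers, let the Deployment replace them) can be sketched as a selection step. The pod names, memory figures and the 1.5 GiB cutoff below are hypothetical; the log does not say which limit was used:

```python
from typing import Dict, List

GIB = 1024 ** 3


def pods_to_delete(memory_by_pod: Dict[str, int], limit_bytes: int) -> List[str]:
    """Return pods whose memory usage exceeds `limit_bytes`, worst first.

    Only these pods would be deleted; the Deployment's ReplicaSet then
    starts fresh replacements, so the healthy pods keep serving traffic.
    """
    over = [(mem, pod) for pod, mem in memory_by_pod.items() if mem > limit_bytes]
    return [pod for mem, pod in sorted(over, reverse=True)]


# Hypothetical snapshot of per-pod memory usage.
usage = {
    "zotero-a": int(0.6 * GIB),
    "zotero-b": int(2.4 * GIB),
    "zotero-c": int(0.7 * GIB),
    "zotero-d": int(1.9 * GIB),
}
print(pods_to_delete(usage, int(1.5 * GIB)))  # ['zotero-b', 'zotero-d']
```

Each selected pod would then be removed with `kubectl delete pod <pod> -n <namespace>`; the namespace is not given in this log.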
[18:37:53] <godog>	 ok
[18:38:05] <_joe_>	 !log deleting zotero pods with excessive memory usage in eqiad
[18:38:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:42:37] <_joe_>	 ok crisis averted I think
[18:42:44] <_joe_>	 I ended up deleting a few pods
[18:43:29] <_joe_>	 interestingly the pods that went crazy were the only ones that were never restarted
[18:45:03] <godog>	 hah! restarted automatically by k8s ?
[18:46:33] <godog>	 looks like we're back and can go afk again
[18:49:07] <_joe_>	 yes
[18:49:26] <_joe_>	 the way to restart pods in a Deployment is to delete the running ones
[18:49:34] <_joe_>	 k8s just starts new ones
[18:50:43] <onimisionipe>	 Or a rollout restart command. Not sure if it's supported in our version of k8s
[18:56:02] <_joe_>	 I don't think that's supported by any current version. If you want a rolling restart the best way is to bump your deployment version and run helm
[18:56:23] <_joe_>	 a rolling restart can be easily created with some bash / awk.
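The "bash / awk" rolling restart boils down to taking the first column of `kubectl get pods --no-headers` (what `awk '{print $1}'` would do) and deleting those pods one at a time. A Python sketch of the planning step; the namespace `zotero` and the sample pod names are hypothetical, and the commands are only built, not executed:

```python
import shlex
from typing import List


def rolling_restart_plan(get_pods_output: str, namespace: str) -> List[List[str]]:
    """Turn `kubectl get pods --no-headers` output into one delete command per pod.

    Deleting pods one at a time and letting the Deployment recreate each
    approximates a rolling restart. Running the commands, and waiting for
    each replacement to become Ready before the next delete, is left to
    the caller.
    """
    plan = []
    for line in get_pods_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        pod_name = fields[0]  # first column is NAME
        plan.append(["kubectl", "delete", "pod", pod_name, "-n", namespace])
    return plan


# Hypothetical output (columns: NAME READY STATUS RESTARTS AGE).
sample = """\
zotero-production-7d9c6b5f4-abcde   2/2   Running   0   12d
zotero-production-7d9c6b5f4-fghij   2/2   Running   3   12d
"""
for argv in rolling_restart_plan(sample, "zotero"):
    print(shlex.join(argv))
```

Bumping the deployment version and re-running helm, as suggested above, instead uses Kubernetes' own rolling-update logic, which honours the Deployment's `maxUnavailable` and `maxSurge` settings.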
[18:57:51] <cdanis>	 two questions: 1) if some pods were okay, why did the service address fail?  is the liveness check not working?  2) should zotero page?
[18:58:38] <onimisionipe>	 It's supported in 1.16
[19:00:26] <onimisionipe>	 With kubectl rollout restart deployment <deployment> -n <namespace>
[19:02:59] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at eqiad on icinga1001 is CRITICAL: 30.41 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[19:04:23] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at eqsin on icinga1001 is CRITICAL: 55.81 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[19:06:13] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at eqiad on icinga1001 is OK: (C)60 le (W)70 le 99.47 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[19:07:35] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at eqsin on icinga1001 is OK: (C)60 le (W)70 le 83.47 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
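The Varnish alerts use the opposite direction (`le`): they fire when a value falls to or below a threshold (critical 60, warning 70). Assuming the value is current request rate expressed as a percentage of the rate 30 minutes earlier (the alert name suggests this, but the exact Graphite query is not shown here), the check can be sketched as follows; the request rates in the usage lines are hypothetical:

```python
from typing import Tuple


def traffic_drop_state(now_rps: float, then_rps: float,
                       warning: float = 70, critical: float = 60) -> Tuple[str, float]:
    """Compare the current request rate with the rate 30 minutes ago.

    Assumption: the metric is now / then * 100. CRITICAL if that
    percentage is <= critical, WARNING if <= warning, otherwise OK.
    """
    if then_rps <= 0:
        raise ValueError("baseline rate must be positive")
    pct = now_rps / then_rps * 100
    if pct <= critical:
        return "CRITICAL", pct
    if pct <= warning:
        return "WARNING", pct
    return "OK", pct


state, pct = traffic_drop_state(3041, 10000)
print(state, round(pct, 2))  # CRITICAL 30.41
state, pct = traffic_drop_state(9947, 10000)
print(state, round(pct, 2))  # OK 99.47
```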
[19:17:08] <_joe_>	 cdanis: 1) it's a known problem that the liveness check is not effective, and there is IIRC a request to upstream for a non-POST url 2) definitely, it breaks functionality
[20:12:48] <wikibugs>	 (CR) Masumrezarock100: [C: +1] "> Patch Set 1:" [mediawiki-config] - https://gerrit.wikimedia.org/r/542660 (https://phabricator.wikimedia.org/T234278) (owner: Ammarpad)
[20:57:03] <Urbanecm>	 !log Reset user email of User:Gardini (T235318)
[20:57:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:57:59] <wikibugs>	 Operations, Phabricator, Release-Engineering-Team-TODO, Traffic, and 2 others: Prepare Phame to support heavy traffic for a Tech Department blog - https://phabricator.wikimedia.org/T226044 (Bmueller) Hey @CDanis, sorry that I missed your question! (thanks for the ping, @AKlapper :-)  >>! In T2260...
[20:58:03] <icinga-wm>	 PROBLEM - ElasticSearch unassigned shard check - 9243 on search.svc.eqiad.wmnet is CRITICAL: CRITICAL - enwiki_content_1546970425[3](2019-10-09T14:42:44.498Z) https://wikitech.wikimedia.org/wiki/Search%23Administration
[21:07:25] <logmsgbot>	 !log krinkle@deploy1001 Synchronized php-1.35.0-wmf.1/includes/resourceloader/ResourceLoaderStartUpModule.php: 8c6baeae2 (duration: 00m 53s)
[21:07:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:09:30] <wikibugs>	 (CR) Krinkle: [C: -1] "Keeping as redirect, like we do for the root m.wikipedia.org, seems preferred." [dns] - https://gerrit.wikimedia.org/r/521966 (owner: Jforrester)
[21:57:49] <icinga-wm>	 PROBLEM - cassandra-b CQL 10.192.48.122:9042 on restbase2017 is CRITICAL: connect to address 10.192.48.122 and port 9042: Connection refused https://phabricator.wikimedia.org/T93886
[21:57:51] <icinga-wm>	 PROBLEM - cassandra-b service on restbase2017 is CRITICAL: CRITICAL - Expecting active but unit cassandra-b is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[21:58:47] <icinga-wm>	 PROBLEM - Check systemd state on restbase2017 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:58:59] <icinga-wm>	 PROBLEM - cassandra-b SSL 10.192.48.122:7001 on restbase2017 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://phabricator.wikimedia.org/T120662
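The cassandra-b CQL (9042) and SSL (7001) alerts are port probes. "Connection refused" is what a TCP connect returns when nothing is listening on the port, which fits the simultaneous "unit cassandra-b is failed" alert. A minimal Python sketch of such a probe, with messages modelled on the alert text; it is an illustration, not the Nagios `check_tcp` plugin itself:

```python
import socket
import time


def tcp_check(host: str, port: int, timeout: float = 10.0) -> str:
    """Try a TCP connect and summarise the result in Icinga-like wording.

    Refusals, timeouts and other socket errors are reported as CRITICAL;
    a completed handshake is reported as OK with the elapsed time.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
    except ConnectionRefusedError:
        # Nothing is listening: the SYN was answered with a RST.
        return (f"CRITICAL - connect to address {host} and port {port}: "
                "Connection refused")
    except socket.timeout:
        # No answer at all within the timeout (host down or filtered).
        return f"CRITICAL - Socket timeout after {timeout:g} seconds"
    except OSError as exc:
        return f"CRITICAL - {exc}"
    return f"TCP OK - {elapsed:.3f} second response time on {host} port {port}"
```

Called as `tcp_check("10.192.48.122", 9042)`, it would only give a meaningful answer from inside the production network.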
[22:10:13] <icinga-wm>	 RECOVERY - Check systemd state on restbase2017 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:10:55] <icinga-wm>	 RECOVERY - cassandra-b service on restbase2017 is OK: OK - cassandra-b is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[22:11:59] <icinga-wm>	 RECOVERY - cassandra-b SSL 10.192.48.122:7001 on restbase2017 is OK: SSL OK - Certificate restbase2017-b valid until 2020-11-29 09:26:18 +0000 (expires in 413 days) https://phabricator.wikimedia.org/T120662
[22:12:27] <icinga-wm>	 RECOVERY - cassandra-b CQL 10.192.48.122:9042 on restbase2017 is OK: TCP OK - 0.036 second response time on 10.192.48.122 port 9042 https://phabricator.wikimedia.org/T93886
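The SSL recovery line also reports the certificate's remaining lifetime ("valid until 2020-11-29 09:26:18 +0000 (expires in 413 days)"). That figure is consistent with counting whole days between the check time and the notAfter date. A sketch with `datetime`, assuming a check time of 2019-10-12 22:11:59 UTC (the log's date is not part of this excerpt, so treat it as hypothetical):

```python
from datetime import datetime, timezone


def days_until_expiry(not_after: datetime, now: datetime) -> int:
    """Whole days remaining before a certificate's notAfter time."""
    return (not_after - now).days


expiry = datetime(2020, 11, 29, 9, 26, 18, tzinfo=timezone.utc)
checked = datetime(2019, 10, 12, 22, 11, 59, tzinfo=timezone.utc)  # assumed check time
print(days_until_expiry(expiry, checked))  # 413
```

Truncating to whole days (`timedelta.days`) drops the remaining 11 hours or so, which is why the result is 413 rather than 414.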
[22:15:47] <wikibugs>	 Operations, media-storage, serviceops, Patch-For-Review, User-jijiki: Swift object servers become briefly unresponsive on a regular basis - https://phabricator.wikimedia.org/T226373 (CDanis) Is this still an issue?
[23:11:40] <wikibugs>	 (PS1) Krinkle: profiler.php: Remove trigger_error call [mediawiki-config] - https://gerrit.wikimedia.org/r/542732 (https://phabricator.wikimedia.org/T231564)
[23:14:33] <wikibugs>	 (CR) Krinkle: [C: +2] profiler.php: Remove trigger_error call [mediawiki-config] - https://gerrit.wikimedia.org/r/542732 (https://phabricator.wikimedia.org/T231564) (owner: Krinkle)
[23:15:20] <wikibugs>	 (Merged) jenkins-bot: profiler.php: Remove trigger_error call [mediawiki-config] - https://gerrit.wikimedia.org/r/542732 (https://phabricator.wikimedia.org/T231564) (owner: Krinkle)
[23:17:02] <wikibugs>	 Operations, Release-Engineering-Team, serviceops, Performance-Team (Radar), and 2 others: PHP Fatal error:  The UdpSocket to 127.0.0.1:10514 has been closed (from Monolog/SyslogUdp on mwdebug1002) - https://phabricator.wikimedia.org/T214734 (Krinkle)
[23:19:06] <wikibugs>	 Operations, Release-Engineering-Team, serviceops, Performance-Team (Radar), and 2 others: PHP Fatal error:  The UdpSocket to 127.0.0.1:10514 has been closed (from Monolog/SyslogUdp on mwdebug1002) - https://phabricator.wikimedia.org/T214734 (Krinkle) Tagging RelEng for awareness as this means whe...
[23:19:24] <wikibugs>	 Operations, Release-Engineering-Team, serviceops, Performance-Team (Radar), and 2 others: PHP Fatal error:  The UdpSocket to 127.0.0.1:10514 has been closed (from Monolog/SyslogUdp on mwdebug1002) - https://phabricator.wikimedia.org/T214734 (Krinkle) p:Low→Normal
[23:21:02] <logmsgbot>	 !log krinkle@deploy1001 Synchronized wmf-config/profiler.php: bfa8bb69c1f, T231564 (duration: 00m 51s)
[23:21:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:21:06] <stashbot>	 T231564: profiler.php: PHP Notice: RedisException: Connection timed out - https://phabricator.wikimedia.org/T231564