[00:28:01] (CR) AndyRussG: [C: +1] "lgtm!" [mediawiki-config] - https://gerrit.wikimedia.org/r/527183 (https://phabricator.wikimedia.org/T225261) (owner: Ejegg)
[01:10:38] Operations, Core Platform Team Workboards (Green): Install wrk and lua-cjson packages on deploy1001 - https://phabricator.wikimedia.org/T230178 (Eevans)
[01:11:29] mutante: are you still around? If so, I'm wondering if you could help with T230178
[01:11:30] T230178: Install wrk and lua-cjson packages on deploy1001 - https://phabricator.wikimedia.org/T230178
[01:11:50] Operations, Core Platform Team Workboards (Green): Install wrk and lua-cjson packages on deploy1001 - https://phabricator.wikimedia.org/T230178 (Eevans) p:Triage→Normal
[01:41:53] urandom: i was out but taking a look now
[01:42:08] i know what it is about , yea
[01:46:03] mutante: _joe_ pinged you, I guess?
[01:46:17] yes, he mentioned it in our meeting
[01:47:33] urandom: are we going to keep it permanently or remove it again?
[01:48:00] mutante: well, I don't know
[01:48:25] I see siege is already installed
[01:48:26] this is like ehm.. sessionstorage::testing_tools
[01:48:33] ok
[01:48:39] on it
[01:48:40] so I guess there is some precedent
[01:49:03] I mean, I think if the host were reimaged and it didn't make it back on, that wouldn't be a huge problem
[01:49:08] if that's what you mean
[01:49:16] oh, dont worry, i am doing it in code
[01:49:26] it was just about where to put it
[01:49:41] is siege done in code?
[01:49:46] * urandom didn't even check
[01:50:08] it looks like not..meh
[01:50:09] seems no
[01:50:17] ok, making patch
[01:52:04] mutante: if this is the place we test services running in k8s staging, then maybe it shouldn't be associated with sessionstore
[01:52:29] urandom: yea, i am putting it in deployment_server context
[01:52:39] but i say they are testing tools for sessionstorage
[01:52:41] the deploy host seems questionable for such a thing, but no one had a better alternative
[01:52:46] k
[01:53:07] that was my question in the meeting as well
[01:53:18] is it not the mwmaint, but no, we are doing deploy
[01:53:50] assuming that the way you connect to your staging service is via a port-forward ala kubectl, it's really the only choice from the sound of it
[01:53:58] the only choice for mere mortals such as myself
[01:54:19] yea, i dont have a better alternative either
[01:54:33] at least it's not fenari the bastion host anymore
[01:58:57] Operations, Core Platform Team Workboards (Green): Install wrk and lua-cjson packages on deploy1001 - https://phabricator.wikimedia.org/T230178 (Dzahn) a:Eevans→Dzahn
[01:59:05] (PS1) Dzahn: deployment_server: install benchmarking tools, wrk, siege [puppet] - https://gerrit.wikimedia.org/r/529193 (https://phabricator.wikimedia.org/T230178)
[01:59:39] (CR) jerkins-bot: [V: -1] deployment_server: install benchmarking tools, wrk, siege [puppet] - https://gerrit.wikimedia.org/r/529193 (https://phabricator.wikimedia.org/T230178) (owner: Dzahn)
[02:00:37] (PS2) Dzahn: deployment_server: install benchmarking tools, wrk, siege [puppet] - https://gerrit.wikimedia.org/r/529193 (https://phabricator.wikimedia.org/T230178)
[02:00:50] (PS3) Dzahn: deployment_server: install benchmarking tools, wrk, siege [puppet] - https://gerrit.wikimedia.org/r/529193 (https://phabricator.wikimedia.org/T230178)
[02:01:46] (CR) Eevans: [C: +1] deployment_server: install benchmarking tools, wrk, siege [puppet]
- https://gerrit.wikimedia.org/r/529193 (https://phabricator.wikimedia.org/T230178) (owner: Dzahn)
[02:03:25] (PS4) Dzahn: deployment_server: install benchmarking tools, wrk, siege [puppet] - https://gerrit.wikimedia.org/r/529193 (https://phabricator.wikimedia.org/T230178)
[02:04:38] (CR) Dzahn: [C: +2] deployment_server: install benchmarking tools, wrk, siege [puppet] - https://gerrit.wikimedia.org/r/529193 (https://phabricator.wikimedia.org/T230178) (owner: Dzahn)
[02:05:07] ha, I looked right at 'lua-csjon' and didn't spot the typo
[02:05:25] i saw it but by that time i had used the online editor to change commit message
[02:05:45] and doing git-review took time
[02:06:05] so then i switched back to old Gerrit UI because there i have the web editor and in the new one i don't . heh
[02:07:19] Notice: /Stage[main]/Packages::Wrk/Package[wrk]/ensure: created
[02:07:24] Notice: /Stage[main]/Packages::Lua_cjson/Package[lua-cjson]/ensure: created
[02:08:15] https://www.irccloud.com/pastebin/DNq7Gnzg/
[02:08:16] urandom: btw it's not just deploy1001, you also have deploy2001 automatically, role based
[02:08:27] mutante: yup; that's great, thanks!
[02:08:30] in case you want to test from different locations
[02:09:43] Operations, Core Platform Team Workboards (Green), Patch-For-Review: Install wrk and lua-cjson packages on deploy1001 - https://phabricator.wikimedia.org/T230178 (Dzahn) Open→Resolved on deploy1001 and deploy2001: Notice: /Stage[main]/Packages::Wrk/Package[wrk]/ensure: created Notice: /Stage...
[02:10:37] Operations, Core Platform Team Workboards (Green): Install wrk, siege and lua-cjson packages on deploy1001 - https://phabricator.wikimedia.org/T230178 (Dzahn)
[02:10:54] Operations, Security-Team: jalexander should be removed from security@ as his emails are bouncing - https://phabricator.wikimedia.org/T212621 (Aklapper) @chasemp: I'm also wondering, looking at https://phabricator.wikimedia.org/project/?member=PHID-USER-f6l3bchko2ogbi4poj24&status=active#R
[02:17:45] !log mwmaint1002 - manually running purge_abusefilter maintenance cron
[02:17:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:24:26] !log mwmaint1002 - manually running purge_expired_userrights maintenance cron to confirm no issues with PHP 7.2 in maintenance/purgeExpiredUserrights.php
[02:24:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:29:35] PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 34430224 and 2 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[02:30:24] !log mwmaint1002 - manually running cleanup_upload_stash maintenance cron to confirm no issues with PHP 7.2 in maintenance/cleanupUploadStash.php
[02:30:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:40:59] RECOVERY - Postgres Replication Lag on maps2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 255216 and 74 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[02:43:08] Operations, cloud-services-team, serviceops, Core Platform Team Legacy (Watching / External), and 4 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (Dzahn)
[03:47:03] (CR) Mholloway: [C: -2] "> > Patch Set 1:" [mediawiki-config] - https://gerrit.wikimedia.org/r/526541 (https://phabricator.wikimedia.org/T227348) (owner: Mholloway)
[04:35:36]
(PS12) Vgutierrez: Backport required OCSP commits [debs/trafficserver] - https://gerrit.wikimedia.org/r/528716 (https://phabricator.wikimedia.org/T220383)
[04:35:38] (PS3) Vgutierrez: Backport commits required to report SSL stats to an origin server [debs/trafficserver] - https://gerrit.wikimedia.org/r/528984 (https://phabricator.wikimedia.org/T228135)
[04:40:20] (CR) Vgutierrez: [C: +2] Backport required OCSP commits [debs/trafficserver] - https://gerrit.wikimedia.org/r/528716 (https://phabricator.wikimedia.org/T220383) (owner: Vgutierrez)
[04:40:29] (CR) Vgutierrez: [C: +2] Backport commits required to report SSL stats to an origin server (1 comment) [debs/trafficserver] - https://gerrit.wikimedia.org/r/528984 (https://phabricator.wikimedia.org/T228135) (owner: Vgutierrez)
[05:11:32] (PS2) Subramanya Sastry: Make scandium a read-only appserver + enable exception logging [mediawiki-config] - https://gerrit.wikimedia.org/r/529173 (https://phabricator.wikimedia.org/T228069)
[05:32:34] (PS4) Marostegui: maintain-views.yaml: Remove math table [puppet] - https://gerrit.wikimedia.org/r/528724 (https://phabricator.wikimedia.org/T196055)
[05:33:53] (CR) Marostegui: [C: +2] maintain-views.yaml: Remove math table [puppet] - https://gerrit.wikimedia.org/r/528724 (https://phabricator.wikimedia.org/T196055) (owner: Marostegui)
[05:37:15] !log Run maintain-views script with --clean to clean up math table views - T196055
[05:37:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:37:24] T196055: Remove table `math` from the database - https://phabricator.wikimedia.org/T196055
[05:47:01] Operations, Maps, SRE-Access-Requests: Remove MaxSem from map servers - https://phabricator.wikimedia.org/T230183 (MaxSem)
[05:48:20] (PS1) MaxSem: Remove me (maxsem) from maps groups [puppet] - https://gerrit.wikimedia.org/r/529200 (https://phabricator.wikimedia.org/T230183)
[06:04:54] (PS1)
Marostegui: mariadb: Decommission db2069 [puppet] - https://gerrit.wikimedia.org/r/529201 (https://phabricator.wikimedia.org/T230107)
[06:06:18] (PS2) Vgutierrez: x509: Expose the OCSP URI of a Certificate as a property [software/acme-chief] - https://gerrit.wikimedia.org/r/516604 (https://phabricator.wikimedia.org/T219765)
[06:06:20] (PS1) Vgutierrez: ocsp: Provide basic functionality to perform OCSP requests [software/acme-chief] - https://gerrit.wikimedia.org/r/529202 (https://phabricator.wikimedia.org/T219765)
[06:09:04] (CR) jerkins-bot: [V: -1] ocsp: Provide basic functionality to perform OCSP requests [software/acme-chief] - https://gerrit.wikimedia.org/r/529202 (https://phabricator.wikimedia.org/T219765) (owner: Vgutierrez)
[06:10:50] ACKNOWLEDGEMENT - Host thumbor2004 is DOWN: PING CRITICAL - Packet loss = 100% Effie Mouzeli Investigating
[06:12:00] (PS2) Vgutierrez: ocsp: Provide basic functionality to perform OCSP requests [software/acme-chief] - https://gerrit.wikimedia.org/r/529202 (https://phabricator.wikimedia.org/T219765)
[06:12:16] jijiki: o/
[06:12:32] I am in serial console now, do you want me to log off?
[06:12:38] (PS14) Effie Mouzeli: profile::mediawiki default php to php7 [puppet] - https://gerrit.wikimedia.org/r/425027 (https://phabricator.wikimedia.org/T195392) (owner: Giuseppe Lavagetto)
[06:12:54] the host seems frozen, some errors in racadm getsel
[06:13:05] I can reboot it if you want
[06:13:17] (I thought it was going to be decommed at first)
[06:19:24] !log powercycle thumbor2004 (no ssh, serial console showing a frozen os)
[06:19:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:21:17] RECOVERY - Host thumbor2004 is UP: PING OK - Packet loss = 0%, RTA = 36.14 ms
[06:21:49] (PS8) Vgutierrez: ATS: Include TLS instance in cache upload role [puppet] - https://gerrit.wikimedia.org/r/513970 (https://phabricator.wikimedia.org/T221594)
[06:23:52] the only thing that I can see is
[06:23:54] Description: CPU 1 machine check error detected.
[06:24:02] at around 20:53 UTC yesterday
[06:29:51] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1004 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [50.0] https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[06:30:55] PROBLEM - puppet last run on cp2024 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/apt2xml] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:33:03] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1004 is OK: OK: Less than 70.00% above the threshold [25.0] https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[06:34:35] seems due to a hhvm slowness?
[06:34:36] https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?orgId=1&panelId=21&fullscreen&from=1565329469427&to=1565332505459
[06:35:38] (saying that after checking logstash)
[06:36:37] (CR) Muehlenhoff: [C: +1] "Looks good." [puppet] - https://gerrit.wikimedia.org/r/529200 (https://phabricator.wikimedia.org/T230183) (owner: MaxSem)
[06:44:56] (CR) Elukey: [C: +2] profile::kerberos::kdc: add daily backup for the KDC database [puppet] - https://gerrit.wikimedia.org/r/528775 (https://phabricator.wikimedia.org/T226089) (owner: Elukey)
[06:45:04] (PS3) Elukey: profile::kerberos::kdc: add daily backup for the KDC database [puppet] - https://gerrit.wikimedia.org/r/528775 (https://phabricator.wikimedia.org/T226089)
[06:45:47] (CR) Elukey: "Wrong one, waiting for review :)" [puppet] - https://gerrit.wikimedia.org/r/528775 (https://phabricator.wikimedia.org/T226089) (owner: Elukey)
[06:46:48] elukey: I will powercycle it once more and then open a task to either check or decom it
[06:46:53] thank you!
[06:47:33] jijiki: why another powercycle? :D
[06:47:37] (curious)
[06:48:24] sigh sorry
[06:48:48] I will wait a bit more and then open a task
[06:48:56] I am not awake yet :p
[06:50:09] PROBLEM - High CPU load on API appserver on mw1342 is CRITICAL: CRITICAL - load average: 74.33, 38.26, 25.09 https://wikitech.wikimedia.org/wiki/Application_servers
[06:50:25] it's chances are thing, CPU 1 machine check error detected.
[06:50:28] thin*
[06:53:21] RECOVERY - High CPU load on API appserver on mw1342 is OK: OK - load average: 35.11, 36.95, 27.18 https://wikitech.wikimedia.org/wiki/Application_servers
[06:57:12] (CR) Muehlenhoff: [C: +1] "Looks good, those dumps could just as well also end up in Bacula, though? The size is tiny after all."
[puppet] - https://gerrit.wikimedia.org/r/528775 (https://phabricator.wikimedia.org/T226089) (owner: Elukey)
[06:57:45] (CR) Ema: [C: +2] ATS: use TLS and discovery hostname for bromine [puppet] - https://gerrit.wikimedia.org/r/529035 (https://phabricator.wikimedia.org/T210411) (owner: Ema)
[06:57:52] (PS4) Ema: ATS: use TLS and discovery hostname for bromine [puppet] - https://gerrit.wikimedia.org/r/529035 (https://phabricator.wikimedia.org/T210411)
[06:58:53] RECOVERY - puppet last run on cp2024 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[07:06:00] Operations, Traffic, serviceops, Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (ema)
[07:08:28] Operations, Traffic, serviceops, Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (ema) We have added TLS termination to bromine/vega with `profile::tlsproxy::envoy`. In the upcoming days I'll use the profile to add termination to all remaining services....
[07:10:47] (CR) Elukey: "> Looks good, those dumps could just as well also end up in Bacula," [puppet] - https://gerrit.wikimedia.org/r/528775 (https://phabricator.wikimedia.org/T226089) (owner: Elukey)
[07:10:48] (PS4) Elukey: profile::kerberos::kdc: add daily backup for the KDC database [puppet] - https://gerrit.wikimedia.org/r/528775 (https://phabricator.wikimedia.org/T226089)
[07:15:13] (CR) Muehlenhoff: [C: +1] "Ack, let's merge this and then we can check in a followup with Alex and Jaime whether they agree with also injecting this to Bacula."
[puppet] - https://gerrit.wikimedia.org/r/528775 (https://phabricator.wikimedia.org/T226089) (owner: Elukey)
[07:15:32] (PS2) Muehlenhoff: Remove me (maxsem) from maps groups [puppet] - https://gerrit.wikimedia.org/r/529200 (https://phabricator.wikimedia.org/T230183) (owner: MaxSem)
[07:15:43] (CR) Elukey: [C: +2] profile::kerberos::kdc: add daily backup for the KDC database [puppet] - https://gerrit.wikimedia.org/r/528775 (https://phabricator.wikimedia.org/T226089) (owner: Elukey)
[07:17:10] (PS3) Muehlenhoff: Remove me (maxsem) from maps groups [puppet] - https://gerrit.wikimedia.org/r/529200 (https://phabricator.wikimedia.org/T230183) (owner: MaxSem)
[07:18:49] (CR) Muehlenhoff: [C: +2] Remove me (maxsem) from maps groups [puppet] - https://gerrit.wikimedia.org/r/529200 (https://phabricator.wikimedia.org/T230183) (owner: MaxSem)
[07:19:28] Thanks, moritzm :)
[07:20:19] (CR) Muehlenhoff: [C: +1] "Great, +1 then :-)" [puppet] - https://gerrit.wikimedia.org/r/528719 (https://phabricator.wikimedia.org/T229262) (owner: Filippo Giunchedi)
[07:21:28] (CR) Muehlenhoff: [C: +1] admin: add Jaime Anstee to ldap only users [puppet] - https://gerrit.wikimedia.org/r/529184 (https://phabricator.wikimedia.org/T229959) (owner: Cwhite)
[07:22:35] (PS2) Marostegui: mariadb: Decommission db2069 [puppet] - https://gerrit.wikimedia.org/r/529201 (https://phabricator.wikimedia.org/T230107)
[07:23:57] yw :-)
[07:24:33] (PS1) Ema: Add discovery CNAME phabricator -> phab1003 [dns] - https://gerrit.wikimedia.org/r/529306 (https://phabricator.wikimedia.org/T210411)
[07:25:50] (PS1) Elukey: profile::kerberos::kdc: use absolute path in timer command [puppet] - https://gerrit.wikimedia.org/r/529307
[07:27:47] PROBLEM - puppet last run on kerberos1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures.
Failed resources (up to 3 shown): Service[delete-old-backups-kdc-database.timer] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[07:27:48] (CR) Elukey: [C: +2] profile::kerberos::kdc: use absolute path in timer command [puppet] - https://gerrit.wikimedia.org/r/529307 (owner: Elukey)
[07:31:45] !log uploaded trafficserver-8.0.3wm3 to apt.wikimedia.org (stretch) - T220383 T228135
[07:31:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:31:55] T228135: ATS lacks the possibility of reporting SSL stats to an origin server via HTTP Headers - https://phabricator.wikimedia.org/T228135
[07:31:56] T220383: Evaluate ATS TLS stack - https://phabricator.wikimedia.org/T220383
[07:33:23] RECOVERY - puppet last run on kerberos1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[07:35:24] (PS1) Ema: phabricator.discovery.wmnet: add certificate [puppet] - https://gerrit.wikimedia.org/r/529308 (https://phabricator.wikimedia.org/T210411)
[07:35:58] (PS1) Ema: Add TLS termination for phabricator.discovery.wmnet [puppet] - https://gerrit.wikimedia.org/r/529309 (https://phabricator.wikimedia.org/T210411)
[07:36:19] (CR) Vgutierrez: [C: +2] fifo_log_demux: Remove socket activation [puppet] - https://gerrit.wikimedia.org/r/527075 (owner: Vgutierrez)
[07:36:45] (PS1) Elukey: profile::kerberos::kdc: fix source content of file [puppet] - https://gerrit.wikimedia.org/r/529310
[07:37:17] rebase issues.. sigh :_)
[07:37:27] PROBLEM - Check systemd state on kerberos1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:38:28] (PS1) Ema: secret: dummy key for phabricator [labs/private] - https://gerrit.wikimedia.org/r/529311 (https://phabricator.wikimedia.org/T210411)
[07:40:56] (PS2) Vgutierrez: fifo_log_demux: Remove socket activation [puppet] - https://gerrit.wikimedia.org/r/527075
[07:41:17] (CR) Elukey: [C: +2] profile::kerberos::kdc: fix source content of file [puppet] - https://gerrit.wikimedia.org/r/529310 (owner: Elukey)
[07:42:15] (CR) Ema: [V: +2 C: +2] secret: dummy key for phabricator [labs/private] - https://gerrit.wikimedia.org/r/529311 (https://phabricator.wikimedia.org/T210411) (owner: Ema)
[07:42:36] (PS2) Ema: phabricator.discovery.wmnet: add certificate [puppet] - https://gerrit.wikimedia.org/r/529308 (https://phabricator.wikimedia.org/T210411)
[07:43:19] (CR) Ema: [C: +2] phabricator.discovery.wmnet: add certificate [puppet] - https://gerrit.wikimedia.org/r/529308 (https://phabricator.wikimedia.org/T210411) (owner: Ema)
[07:43:25] (CR) Vgutierrez: [C: +2] fifo_log_demux: Remove socket activation [puppet] - https://gerrit.wikimedia.org/r/527075 (owner: Vgutierrez)
[07:43:35] (PS3) Vgutierrez: fifo_log_demux: Remove socket activation [puppet] - https://gerrit.wikimedia.org/r/527075
[07:45:10] (CR) Ema: "pcc seems reasonable: https://puppet-compiler.wmflabs.org/compiler1001/17826/" [puppet] - https://gerrit.wikimedia.org/r/529309 (https://phabricator.wikimedia.org/T210411) (owner: Ema)
[07:50:03] (PS1) Ema: ATS: add remap rule bugs.wikimedia.org -> phabricator [puppet] - https://gerrit.wikimedia.org/r/529312 (https://phabricator.wikimedia.org/T227432)
[07:51:41] PROBLEM - puppet last run on kerberos1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures.
Failed resources (up to 3 shown): File[/usr/local/sbin/dump_kdc_database] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[07:52:48] (PS2) Ema: ATS: add remap rule bugs.wikimedia.org -> phabricator [puppet] - https://gerrit.wikimedia.org/r/529312 (https://phabricator.wikimedia.org/T227432)
[07:53:13] (CR) jerkins-bot: [V: -1] ATS: add remap rule bugs.wikimedia.org -> phabricator [puppet] - https://gerrit.wikimedia.org/r/529312 (https://phabricator.wikimedia.org/T227432) (owner: Ema)
[07:54:51] (PS1) Elukey: profile::kerberos::kdc: fix (again) source of script [puppet] - https://gerrit.wikimedia.org/r/529313
[08:01:30] (CR) Elukey: [C: +2] profile::kerberos::kdc: fix (again) source of script [puppet] - https://gerrit.wikimedia.org/r/529313 (owner: Elukey)
[08:04:26] (CR) Ema: "recheck" [puppet] - https://gerrit.wikimedia.org/r/529312 (https://phabricator.wikimedia.org/T227432) (owner: Ema)
[08:04:59] (PS9) Vgutierrez: ATS: Include TLS instance in cache upload role [puppet] - https://gerrit.wikimedia.org/r/513970 (https://phabricator.wikimedia.org/T221594)
[08:06:11] RECOVERY - Check systemd state on kerberos1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:07:07] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to Mwmaint1002, Stat1007 for Abijeet Patro - https://phabricator.wikimedia.org/T230020 (Arrbee) >>! In T230020#5403373, @colewhite wrote: > @RStallman-legalteam Would you mind confirming NDA on file for Abijeet? > > @Arrbee From th...
[08:07:09] (PS1) Elukey: profile::kerberos::kdc: add 'dump' to krb5_util's script [puppet] - https://gerrit.wikimedia.org/r/529314
[08:07:37] RECOVERY - puppet last run on kerberos1001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:10:15] (CR) Elukey: [C: +2] profile::kerberos::kdc: add 'dump' to krb5_util's script [puppet] - https://gerrit.wikimedia.org/r/529314 (owner: Elukey)
[08:15:29] (CR) Gergő Tisza: [C: +1] "> Do the following patches in this series actually depend on this one?" [mediawiki-config] - https://gerrit.wikimedia.org/r/526541 (https://phabricator.wikimedia.org/T227348) (owner: Mholloway)
[08:15:31] (PS3) Marostegui: mariadb: Decommission db2069 [puppet] - https://gerrit.wikimedia.org/r/529201 (https://phabricator.wikimedia.org/T230107)
[08:16:27] (CR) Marostegui: [C: +2] mariadb: Decommission db2069 [puppet] - https://gerrit.wikimedia.org/r/529201 (https://phabricator.wikimedia.org/T230107) (owner: Marostegui)
[08:21:55] (PS1) Marostegui: dbproxy1010: Depool labsdb1011 [puppet] - https://gerrit.wikimedia.org/r/529317 (https://phabricator.wikimedia.org/T196055)
[08:24:06] (CR) Marostegui: [C: +2] dbproxy1010: Depool labsdb1011 [puppet] - https://gerrit.wikimedia.org/r/529317 (https://phabricator.wikimedia.org/T196055) (owner: Marostegui)
[08:24:54] !log Reload haproxy on dbproxy1010 to depool labsdb1011 - T196055
[08:25:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:03] T196055: Remove table `math` from the database - https://phabricator.wikimedia.org/T196055
[08:29:11] Operations, ops-codfw, DBA, decommission, Patch-For-Review: Decommission db2069 - https://phabricator.wikimedia.org/T230107 (Marostegui)
[08:29:19] !log Remove db2069 from tendril and zarcillo T230107
[08:29:26] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:29:26] T230107: Decommission db2069 - https://phabricator.wikimedia.org/T230107
[08:30:26] (PS3) Ema: ATS: add remap rule bugs.wikimedia.org -> phabricator [puppet] - https://gerrit.wikimedia.org/r/529312 (https://phabricator.wikimedia.org/T227432)
[08:31:59] (CR) Ema: [C: +2] ATS: add remap rule bugs.wikimedia.org -> phabricator [puppet] - https://gerrit.wikimedia.org/r/529312 (https://phabricator.wikimedia.org/T227432) (owner: Ema)
[08:32:36] !log Stop MySQL on db2069 T230107
[08:32:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:32:54] (CR) Muehlenhoff: "This would work. However, policy-rc.d is still a rather brittle mechanism. It's from the dark ages where sysvinit was thing and was adapte" [puppet] - https://gerrit.wikimedia.org/r/529109 (owner: Ema)
[08:36:37] Operations, ops-codfw, decommission: Decommission db2069 - https://phabricator.wikimedia.org/T230107 (Marostegui) a:Marostegui→RobH
[08:36:54] Operations, ops-codfw, DC-Ops, decommission: Decommission db2069 - https://phabricator.wikimedia.org/T230107 (Marostegui) This host is ready for #dc-ops to decommission
[08:38:04] Operations, DBA: Predictive failures on disk S.M.A.R.T.
status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[08:38:23] RECOVERY - MegaRAID on helium is OK: OK: optimal, 1 logical, 12 physical https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[08:39:36] (PS1) Ema: ATS: use TLS and discovery hostname for phabricator [puppet] - https://gerrit.wikimedia.org/r/529318 (https://phabricator.wikimedia.org/T210411)
[08:45:20] (PS1) Marostegui: Revert "dbproxy1010: Depool labsdb1011" [puppet] - https://gerrit.wikimedia.org/r/529320
[08:51:46] !log upgrading ghostscript on thumbor1001
[08:51:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:53:46] (PS1) Elukey: Add kerberos support to Analytics Spark Refine [puppet] - https://gerrit.wikimedia.org/r/529322 (https://phabricator.wikimedia.org/T226698)
[08:55:05] (CR) jerkins-bot: [V: -1] Add kerberos support to Analytics Spark Refine [puppet] - https://gerrit.wikimedia.org/r/529322 (https://phabricator.wikimedia.org/T226698) (owner: Elukey)
[08:57:16] (PS2) Marostegui: Revert "dbproxy1010: Depool labsdb1011" [puppet] - https://gerrit.wikimedia.org/r/529320
[08:58:21] (CR) Marostegui: [C: +2] Revert "dbproxy1010: Depool labsdb1011" [puppet] - https://gerrit.wikimedia.org/r/529320 (owner: Marostegui)
[08:58:52] !log Reload haproxy on dbproxy1010 to repool labsdb1011 - T196055
[08:58:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:59:00] T196055: Remove table `math` from the database - https://phabricator.wikimedia.org/T196055
[09:01:31] (PS2) Elukey: Add kerberos support to Analytics Spark Refine [puppet] - https://gerrit.wikimedia.org/r/529322 (https://phabricator.wikimedia.org/T226698)
[09:04:18] !log Drop math table from s4 - T196055
[09:04:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:04:26] T196055: Remove table `math` from the database - https://phabricator.wikimedia.org/T196055
[09:06:21] (PS3) Elukey:
Add kerberos support to Analytics Spark Refine [puppet] - https://gerrit.wikimedia.org/r/529322 (https://phabricator.wikimedia.org/T226698)
[09:17:07] (PS4) Elukey: Add kerberos support to Analytics Spark Refine [puppet] - https://gerrit.wikimedia.org/r/529322 (https://phabricator.wikimedia.org/T226698)
[09:23:47] (CR) Elukey: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1002/17832/" [puppet] - https://gerrit.wikimedia.org/r/529322 (https://phabricator.wikimedia.org/T226698) (owner: Elukey)
[09:23:54] (PS5) Elukey: Add kerberos support to Analytics Spark Refine [puppet] - https://gerrit.wikimedia.org/r/529322 (https://phabricator.wikimedia.org/T226698)
[09:24:45] (PS10) Vgutierrez: ATS: Include TLS instance in cache upload role [puppet] - https://gerrit.wikimedia.org/r/513970 (https://phabricator.wikimedia.org/T221594)
[09:25:54] (CR) jerkins-bot: [V: -1] ATS: Include TLS instance in cache upload role [puppet] - https://gerrit.wikimedia.org/r/513970 (https://phabricator.wikimedia.org/T221594) (owner: Vgutierrez)
[09:27:42] (PS11) Vgutierrez: ATS: Include TLS instance in cache upload role [puppet] - https://gerrit.wikimedia.org/r/513970 (https://phabricator.wikimedia.org/T221594)
[09:30:39] (PS1) Elukey: Add spark config to the Hadoop Test cluster's yarn-site config [puppet] - https://gerrit.wikimedia.org/r/529324 (https://phabricator.wikimedia.org/T226698)
[09:31:37] (PS2) Elukey: Add spark config to the Hadoop Test cluster's yarn-site config [puppet] - https://gerrit.wikimedia.org/r/529324 (https://phabricator.wikimedia.org/T226698)
[09:32:50] (CR) Elukey: [C: +2] Add spark config to the Hadoop Test cluster's yarn-site config [puppet] - https://gerrit.wikimedia.org/r/529324 (https://phabricator.wikimedia.org/T226698) (owner: Elukey)
[09:35:59] !log Drop math table from s7 T196055
[09:36:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:36:07] T196055:
Remove table `math` from the database - https://phabricator.wikimedia.org/T196055
[09:41:59] (PS1) Ema: systemd: add support for mask and unmask [puppet] - https://gerrit.wikimedia.org/r/529328
[09:42:54] (Abandoned) Ema: Add class base::policy_rcd_not_allowed [puppet] - https://gerrit.wikimedia.org/r/529109 (owner: Ema)
[09:55:25] (PS2) Ema: systemd: add support for mask and unmask [puppet] - https://gerrit.wikimedia.org/r/529328
[10:02:09] Operations, netops: csw2-esams's VCP link flapped - https://phabricator.wikimedia.org/T229755 (mark) EX4200 can also have any port converted as VC - just won't be as fast, max 10Gbps.
[10:02:53] (PS1) Marostegui: mariadb: Promote db1133 to m5 master [puppet] - https://gerrit.wikimedia.org/r/529331 (https://phabricator.wikimedia.org/T229657)
[10:03:16] (PS3) Ema: systemd: add support for mask and unmask [puppet] - https://gerrit.wikimedia.org/r/529328
[10:03:43] (CR) Marostegui: [C: -2] "Wait for the failover day" [puppet] - https://gerrit.wikimedia.org/r/529331 (https://phabricator.wikimedia.org/T229657) (owner: Marostegui)
[10:05:15] (PS12) Vgutierrez: ATS: Include TLS instance in cache upload role [puppet] - https://gerrit.wikimedia.org/r/513970 (https://phabricator.wikimedia.org/T221594)
[10:05:17] (PS1) Vgutierrez: ATS: Fix OCSP stapling configuration [puppet] - https://gerrit.wikimedia.org/r/529332 (https://phabricator.wikimedia.org/T221594)
[10:05:20] Operations, Math, Patch-For-Review: Remove unused configuration variables for Math Extension from codebase - https://phabricator.wikimedia.org/T228547 (Physikerwelt) p:Triage→Low
[10:10:01] (PS1) Marostegui: wmnet: Promote db1133 to m5 master [dns] - https://gerrit.wikimedia.org/r/529333 (https://phabricator.wikimedia.org/T229657)
[10:10:16] (CR) Marostegui: [C: -2] "Puppet looks good: https://puppet-compiler.wmflabs.org/compiler1002/17833/" [puppet] - https://gerrit.wikimedia.org/r/529331
(https://phabricator.wikimedia.org/T229657) (owner: 10Marostegui) [10:11:09] (03CR) 10Marostegui: [C: 04-2] "Wait for the failover day" [dns] - 10https://gerrit.wikimedia.org/r/529333 (https://phabricator.wikimedia.org/T229657) (owner: 10Marostegui) [10:23:35] 10Operations, 10DBA, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133 - https://phabricator.wikimedia.org/T229657 (10Marostegui) I have submitted the patches for review, I would appreciate if the #cloud-services-team folks can gi... [10:26:14] (03PS13) 10Vgutierrez: ATS: Include TLS instance in cache upload role [puppet] - 10https://gerrit.wikimedia.org/r/513970 (https://phabricator.wikimedia.org/T221594) [10:26:16] (03PS1) 10Vgutierrez: ATS: Allow writing OCSP responses in /etc/acmecerts [puppet] - 10https://gerrit.wikimedia.org/r/529335 [10:26:51] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good to me" [puppet] - 10https://gerrit.wikimedia.org/r/529328 (owner: 10Ema) [10:28:50] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/529184 (https://phabricator.wikimedia.org/T229959) (owner: 10Cwhite) [10:38:05] !log gerrit restart on cobalt. [10:38:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:43:00] (03PS14) 10Vgutierrez: ATS: Include TLS instance in cache upload role [puppet] - 10https://gerrit.wikimedia.org/r/513970 (https://phabricator.wikimedia.org/T221594) [10:44:45] PROBLEM - puppet last run on notebook1004 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): Exec[git_pull_operations/mediawiki-config],Exec[git_pull_mediawiki/event-schemas] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:47:01] PROBLEM - puppet last run on kafka1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:47:09] PROBLEM - puppet last run on webperf1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_operations/software/xhgui] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:48:44] (03PS1) 10Marostegui: filtered_tables: Remove math table [puppet] - 10https://gerrit.wikimedia.org/r/529346 (https://phabricator.wikimedia.org/T196055) [11:09:17] RECOVERY - puppet last run on kafka1002 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [11:12:41] RECOVERY - puppet last run on notebook1004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [11:12:48] PROBLEM - LVS HTTP IPv4 #page on cloudelastic.wikimedia.org is CRITICAL: connect to address 208.80.154.84 and port 8643: Connection refused https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [11:15:03] RECOVERY - puppet last run on webperf1002 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [11:17:31] (03CR) 10Ema: ATS: Fix OCSP stapling configuration (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/529332 (https://phabricator.wikimedia.org/T221594) (owner: 10Vgutierrez) [11:18:14] (03CR) 10Ema: [C: 03+1] ATS: Allow writing OCSP responses in /etc/acmecerts (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/529335 (owner: 10Vgutierrez) [11:18:33] * gehel is looking at cloudelastic [11:21:48] (03PS4) 10Ema: systemd: add support for mask and unmask [puppet] - 10https://gerrit.wikimedia.org/r/529328 [11:50:50] !log root@puppetmaster2001:/srv/private# su -c 
"export GIT_SSH=/srv/private/.git/ssh_wrapper.sh ; git push ssh://puppetmaster1001.eqiad.wmnet/srv/private master" gitpuppet [11:50:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:00:29] (03PS3) 10Muehlenhoff: Initial stub role for the IDP (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/528487 [12:01:28] (03CR) 10jerkins-bot: [V: 04-1] Initial stub role for the IDP (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/528487 (owner: 10Muehlenhoff) [12:04:48] (03PS4) 10Muehlenhoff: Initial stub role for the IDP (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/528487 [12:52:55] (03PS5) 10Ema: systemd: add support for mask and unmask [puppet] - 10https://gerrit.wikimedia.org/r/529328 [13:09:10] (03PS1) 10Phamhi: Mark hpham as absent and added phamhi as per T230126 [puppet] - 10https://gerrit.wikimedia.org/r/529353 [13:10:24] 10Operations, 10DBA, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133 - https://phabricator.wikimedia.org/T229657 (10aborrero) That plan sounds good. Remember that you may need to manually restart ferm in some places because we p... [13:12:27] (03PS6) 10Ema: systemd: add support for mask and unmask [puppet] - 10https://gerrit.wikimedia.org/r/529328 [13:12:53] (03CR) 10jerkins-bot: [V: 04-1] systemd: add support for mask and unmask [puppet] - 10https://gerrit.wikimedia.org/r/529328 (owner: 10Ema) [13:14:03] 10Operations, 10DBA, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133 - https://phabricator.wikimedia.org/T229657 (10CDanis) > @Marostegui to pool db1133 with weight 0 on wikitech section via dbctl instance db1133 edit so it can... 
[13:14:17] (03PS7) 10Ema: systemd: add support for mask and unmask [puppet] - 10https://gerrit.wikimedia.org/r/529328 [13:15:21] 10Operations, 10DBA, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133 - https://phabricator.wikimedia.org/T229657 (10Marostegui) >>! In T229657#5404778, @Marostegui wrote: > I have submitted the patches for review, I would apprec... [13:17:44] (03PS2) 10Phamhi: admin: fix and cleanup hpham/phamhi users [puppet] - 10https://gerrit.wikimedia.org/r/529353 (https://phabricator.wikimedia.org/T230126) [13:18:58] 10Operations, 10DBA, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133 - https://phabricator.wikimedia.org/T229657 (10Marostegui) >>! In T229657#5405047, @CDanis wrote: >> @Marostegui to pool db1133 with weight 0 on wikitech secti... [13:20:23] (03CR) 10Mholloway: [C: 04-2] "> > Do the following patches in this series actually depend on this" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/526541 (https://phabricator.wikimedia.org/T227348) (owner: 10Mholloway) [13:22:22] (03PS3) 10Phamhi: admin: fix and cleanup hpham/phamhi users [puppet] - 10https://gerrit.wikimedia.org/r/529353 (https://phabricator.wikimedia.org/T230126) [13:23:44] (03PS4) 10ArielGlenn: look at dumps logs every so often for exceptions and report them [puppet] - 10https://gerrit.wikimedia.org/r/528995 (https://phabricator.wikimedia.org/T230099) [13:25:04] (03CR) 10ArielGlenn: [C: 03+2] look at dumps logs every so often for exceptions and report them [puppet] - 10https://gerrit.wikimedia.org/r/528995 (https://phabricator.wikimedia.org/T230099) (owner: 10ArielGlenn) [13:30:14] (03PS4) 10Arturo Borrero Gonzalez: admin: fix and cleanup hpham/phamhi users [puppet] - 10https://gerrit.wikimedia.org/r/529353 (https://phabricator.wikimedia.org/T230126) (owner: 10Phamhi) [13:30:46] (03PS1) 10Elukey: Add Hadoop Native 
libraries path to spark2 default [puppet] - 10https://gerrit.wikimedia.org/r/529355 (https://phabricator.wikimedia.org/T226698) [13:31:34] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] admin: fix and cleanup hpham/phamhi users [puppet] - 10https://gerrit.wikimedia.org/r/529353 (https://phabricator.wikimedia.org/T230126) (owner: 10Phamhi) [13:31:56] (03PS1) 10ArielGlenn: dump exception checker uses python rather than bash [puppet] - 10https://gerrit.wikimedia.org/r/529356 (https://phabricator.wikimedia.org/T230099) [13:32:22] (03CR) 10jerkins-bot: [V: 04-1] dump exception checker uses python rather than bash [puppet] - 10https://gerrit.wikimedia.org/r/529356 (https://phabricator.wikimedia.org/T230099) (owner: 10ArielGlenn) [13:32:36] (03PS2) 10ArielGlenn: dump exception checker uses python rather than bash [puppet] - 10https://gerrit.wikimedia.org/r/529356 (https://phabricator.wikimedia.org/T230099) [13:33:11] (03CR) 10ArielGlenn: [C: 03+2] dump exception checker uses python rather than bash [puppet] - 10https://gerrit.wikimedia.org/r/529356 (https://phabricator.wikimedia.org/T230099) (owner: 10ArielGlenn) [13:33:57] I'm merging multiple puppet patches [13:34:06] (03PS1) 10Giuseppe Lavagetto: systemd::timer::job: fix the default logging paths [puppet] - 10https://gerrit.wikimedia.org/r/529358 [13:34:10] arturo: [13:34:20] (03PS2) 10Elukey: Add Hadoop Native libraries path to spark2 default [puppet] - 10https://gerrit.wikimedia.org/r/529355 (https://phabricator.wikimedia.org/T226698) [13:34:22] can merge your fix and cleanup hpham/phamhi users ? [13:34:29] apergos: wait, Hieu is merging it [13:34:38] (but yes, is safe to merge anyway) [13:34:44] apergos: try again in a few seconds please [13:34:51] waiting [13:35:29] (03PS2) 10CDanis: systemd::timer::job: fix the default logging paths [puppet] - 10https://gerrit.wikimedia.org/r/529358 (https://phabricator.wikimedia.org/T230127) (owner: 10Giuseppe Lavagetto) [13:36:09] apergos: you should be ready to go now. 
Thanks for your patience :-) [13:36:26] you merged mine I think [13:36:35] so there's nothing to be ready for now :-D [13:36:57] (03PS3) 10Elukey: Add Hadoop Native libraries path to spark2 default [puppet] - 10https://gerrit.wikimedia.org/r/529355 (https://phabricator.wikimedia.org/T226698) [13:37:49] apergos: that seems to be true. Here is the output https://www.irccloud.com/pastebin/Mo0IBlGw/ [13:38:01] merging both was unintended anyway [13:38:21] fix and cleanup hpham/phamhi users [13:38:24] woops [13:38:31] Merge these changes? (multiple/no)? multiple [13:38:37] that right there was the multiple merge [13:38:54] anyways my puppet run did what it needed to do, all good. thanks [13:39:30] sorry..I should have picked no [13:40:30] no problem [13:41:01] we were just doing your first puppet-merge, that's why I was trying to avoid some more complex cases of multi-patches, etc [13:41:02] :-) [13:41:19] hehe [13:41:42] (03CR) 10Elukey: [C: 03+2] Add Hadoop Native libraries path to spark2 default [puppet] - 10https://gerrit.wikimedia.org/r/529355 (https://phabricator.wikimedia.org/T226698) (owner: 10Elukey) [13:45:53] PROBLEM - puppet last run on bast1002 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. 
Failed resources (up to 3 shown): User[hpham],User[phamhi] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [13:46:08] ^^^ investigating that [13:46:31] that doesn't sound good [13:46:57] phamhi: please log out from any SSH connection to our servers [13:47:11] arturo: done [13:47:12] the problem is the users is holding some files in the bastion (ssh connections) [13:47:26] oh ok [13:47:27] (03PS3) 10Giuseppe Lavagetto: systemd::timer::job: fix the default logging paths [puppet] - 10https://gerrit.wikimedia.org/r/529358 [13:47:36] Error: /Stage[main]/Admin/Admin::Hashuser[hpham]/Admin::User[hpham]/User[hpham]/ensure: change from 'present' to 'absent' failed: Could not delete user hpham: Execution of '/usr/sbin/userdel hpham' returned 8: userdel: user hpham is currently used by process 7594 [13:47:51] ah ok.. I have logged out [13:48:00] cool [13:48:08] does it retry automatically? [13:48:52] I'm manually running the puppet agent in those servers [13:49:02] (03CR) 10CDanis: systemd::timer::job: fix the default logging paths (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/529358 (owner: 10Giuseppe Lavagetto) [13:49:32] phamhi: it does, but only every half hour [13:49:53] cdanis: good to know..thanks [13:50:03] I've always wondered in the 30 mins thing is an upstream puppet thing or our own thing [13:50:12] it's our own [13:50:24] (03PS18) 10ArielGlenn: refactor wikidata entity dumps into wikibase + wikidata specific bits [puppet] - 10https://gerrit.wikimedia.org/r/517670 (https://phabricator.wikimedia.org/T221917) [13:50:35] oh [13:51:01] (03CR) 10Giuseppe Lavagetto: [C: 03+1] "https://puppet-compiler.wmflabs.org/compiler1002/17837/ lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/529358 (owner: 10Giuseppe Lavagetto) [13:51:29] RECOVERY - puppet last run on bast1002 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [13:51:29] (03CR) 
10Giuseppe Lavagetto: [C: 03+1] systemd::timer::job: fix the default logging paths (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/529358 (owner: 10Giuseppe Lavagetto) [13:51:40] ^^^ here we go, CC phamhi [13:51:59] arturo: awesome..thanks [13:52:26] arturo: if you're curious, modules/base/manifests/puppet.pp and modules/base/templates/puppet.cron.erb [13:52:38] * arturo reading [13:52:40] (03PS4) 10Giuseppe Lavagetto: systemd::timer::job: fix the default logging paths [puppet] - 10https://gerrit.wikimedia.org/r/529358 [13:56:21] RECOVERY - Memory correctable errors -EDAC- on thumbor1004 is OK: (C)4 ge (W)2 ge 1 https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=thumbor1004&var-datasource=eqiad+prometheus/ops [13:57:20] (03PS19) 10ArielGlenn: refactor wikidata entity dumps into wikibase + wikidata specific bits [puppet] - 10https://gerrit.wikimedia.org/r/517670 (https://phabricator.wikimedia.org/T221917) [13:58:21] (03CR) 10ArielGlenn: [C: 03+2] refactor wikidata entity dumps into wikibase + wikidata specific bits [puppet] - 10https://gerrit.wikimedia.org/r/517670 (https://phabricator.wikimedia.org/T221917) (owner: 10ArielGlenn) [13:59:34] (03PS1) 10Jbond: cloudelastic: fix monitored ip addresses [puppet] - 10https://gerrit.wikimedia.org/r/529362 (https://phabricator.wikimedia.org/T229621) [14:00:22] jbond42: ^ Oh, that's the part that's missing! [14:01:19] 10Operations, 10Elasticsearch, 10Traffic, 10Discovery-Search (Current work), 10Patch-For-Review: Icinga check defined from LVS configuration for cloudelastic are borked - https://phabricator.wikimedia.org/T229621 (10jbond) When this alerted today and ended up going right down the rabbit hole. Anyway i t... 
[14:01:55] gehel: i think so i have been debugging the monitoring::lvs class pretty much since the alert fired earlier [14:02:05] i have added a description to the phab task [14:02:07] PROBLEM - Mediawiki Cirrussearch update rate - codfw on icinga1001 is CRITICAL: CRITICAL: 30.00% of data under the critical threshold [50.0] https://wikitech.wikimedia.org/wiki/Search%23No_updates https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1 [14:02:12] reading it right now [14:02:39] there is some "creative" problem solving going on in monitoring::lvs [14:02:49] I got lost in it a few times... [14:03:45] yes there are a lot of layers of abstraction yaml -> erb -> yaml -> puppet [14:04:03] and yaml pointer just in case it was too clear ;) [14:04:07] PROBLEM - Mediawiki Cirrussearch update rate - eqiad on icinga1001 is CRITICAL: CRITICAL: 50.00% of data under the critical threshold [50.0] https://wikitech.wikimedia.org/wiki/Search%23No_updates https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1 [14:05:42] 10Operations, 10DBA, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133 - https://phabricator.wikimedia.org/T229657 (10JHedden) The plan looks good to me. In the pre-failover stage I'll be shutting down the OpenStack scheduler and... 
[14:05:51] the cirrus update rate is genuine, but not much we can do right now [14:06:03] I'll increase the alert threshold until the underlying issue is fixed [14:06:49] (03PS1) 10Marostegui: check_depooled: Wrapper to find out depooled hosts [software] - 10https://gerrit.wikimedia.org/r/529366 [14:08:09] (03PS1) 10CDanis: dbctl: clarify some CLI help messages [software/conftool] - 10https://gerrit.wikimedia.org/r/529367 [14:08:38] (03CR) 10Marostegui: [C: 03+2] check_depooled: Wrapper to find out depooled hosts [software] - 10https://gerrit.wikimedia.org/r/529366 (owner: 10Marostegui) [14:09:05] (03Merged) 10jenkins-bot: check_depooled: Wrapper to find out depooled hosts [software] - 10https://gerrit.wikimedia.org/r/529366 (owner: 10Marostegui) [14:11:38] (03PS1) 10Gehel: cirrus: increase alerting threshold for Cirrus update rate check. [puppet] - 10https://gerrit.wikimedia.org/r/529368 (https://phabricator.wikimedia.org/T224425) [14:12:50] (03PS2) 10Gehel: cirrus: increase alerting threshold for Cirrus update rate check. 
[puppet] - 10https://gerrit.wikimedia.org/r/529368 (https://phabricator.wikimedia.org/T224425) [14:14:45] ACKNOWLEDGEMENT - Mediawiki Cirrussearch update rate - codfw on icinga1001 is CRITICAL: CRITICAL: 100.00% of data under the critical threshold [50.0] Gehel known issue -https://phabricator.wikimedia.org/T224425 https://wikitech.wikimedia.org/wiki/Search%23No_updates https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1 [14:14:45] ACKNOWLEDGEMENT - Mediawiki Cirrussearch update rate - eqiad on icinga1001 is CRITICAL: CRITICAL: 90.00% of data under the critical threshold [50.0] Gehel known issue -https://phabricator.wikimedia.org/T224425 https://wikitech.wikimedia.org/wiki/Search%23No_updates https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1 [14:17:32] (03PS3) 10ArielGlenn: add more public tables for xml/sql dumps [puppet] - 10https://gerrit.wikimedia.org/r/527505 (https://phabricator.wikimedia.org/T226167) [14:23:15] RECOVERY - Mediawiki Cirrussearch update rate - eqiad on icinga1001 is OK: OK: Less than 1.00% under the threshold [80.0] https://wikitech.wikimedia.org/wiki/Search%23No_updates https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1 [14:24:23] RECOVERY - Mediawiki Cirrussearch update rate - codfw on icinga1001 is OK: OK: Less than 1.00% under the threshold [80.0] https://wikitech.wikimedia.org/wiki/Search%23No_updates https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1 [14:27:55] (03PS2) 10Jbond: cloudelastic: fix monitored ip addresses [puppet] - 10https://gerrit.wikimedia.org/r/529362 (https://phabricator.wikimedia.org/T229621) [14:28:41] (03CR) 10CDanis: [C: 03+1] "LGTM one nit" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/529358 (owner: 10Giuseppe Lavagetto) [14:30:50] (03CR) 10Giuseppe Lavagetto: [C: 03+2] systemd::timer::job: fix the default logging paths [puppet] 
- 10https://gerrit.wikimedia.org/r/529358 (owner: 10Giuseppe Lavagetto) [14:31:45] (03PS1) 10ArielGlenn: New US mirror of last 5 dumps [puppet] - 10https://gerrit.wikimedia.org/r/529374 [14:33:05] (03CR) 10ArielGlenn: [C: 03+2] New US mirror of last 5 dumps [puppet] - 10https://gerrit.wikimedia.org/r/529374 (owner: 10ArielGlenn) [14:33:24] (03PS5) 10Giuseppe Lavagetto: systemd::timer::job: fix the default logging paths [puppet] - 10https://gerrit.wikimedia.org/r/529358 [14:35:13] (03PS6) 10Giuseppe Lavagetto: systemd::timer::job: fix the default logging paths [puppet] - 10https://gerrit.wikimedia.org/r/529358 [14:36:12] <_joe_> oh come on jenkins [14:36:14] 10Operations, 10Elasticsearch, 10Traffic, 10Discovery-Search (Current work), 10Patch-For-Review: Icinga check defined from LVS configuration for cloudelastic are borked - https://phabricator.wikimedia.org/T229621 (10Mathew.onipe) @jbond Thank you! You fix is way better than mine. I will look at the patch... [14:36:25] (03CR) 10Giuseppe Lavagetto: [C: 03+2] systemd::timer::job: fix the default logging paths [puppet] - 10https://gerrit.wikimedia.org/r/529358 (owner: 10Giuseppe Lavagetto) [14:44:39] 10Operations, 10puppet-compiler, 10User-jijiki: Remove nginx submodule from puppet - https://phabricator.wikimedia.org/T230206 (10jijiki) [14:44:50] 10Operations, 10puppet-compiler, 10User-jijiki: Remove nginx submodule from puppet - https://phabricator.wikimedia.org/T230206 (10jijiki) p:05Triage→03Normal [14:46:54] (03PS1) 10ArielGlenn: Add new dumps mirror to html list [puppet] - 10https://gerrit.wikimedia.org/r/529376 [14:48:17] (03CR) 10ArielGlenn: [C: 03+2] Add new dumps mirror to html list [puppet] - 10https://gerrit.wikimedia.org/r/529376 (owner: 10ArielGlenn) [14:55:43] (03PS3) 10Mathew.onipe: cloudelastic: fix monitored ip addresses [puppet] - 10https://gerrit.wikimedia.org/r/529362 (https://phabricator.wikimedia.org/T229621) (owner: 10Jbond) [14:55:45] (03PS2) 10Mathew.onipe: lvs: allow access to wdqs lvs on 
port 8888 [puppet] - 10https://gerrit.wikimedia.org/r/529053 (https://phabricator.wikimedia.org/T176875) [14:56:17] (03PS1) 10Giuseppe Lavagetto: run_ci_locally: bump default image version to the latest [puppet] - 10https://gerrit.wikimedia.org/r/529379 [14:59:48] (03CR) 10Giuseppe Lavagetto: [C: 03+2] run_ci_locally: bump default image version to the latest [puppet] - 10https://gerrit.wikimedia.org/r/529379 (owner: 10Giuseppe Lavagetto) [15:03:17] (03CR) 10Mathew.onipe: [C: 03+1] cirrus: increase alerting threshold for Cirrus update rate check. [puppet] - 10https://gerrit.wikimedia.org/r/529368 (https://phabricator.wikimedia.org/T224425) (owner: 10Gehel) [15:03:33] (03PS3) 10Gehel: cirrus: increase alerting threshold for Cirrus update rate check. [puppet] - 10https://gerrit.wikimedia.org/r/529368 (https://phabricator.wikimedia.org/T224425) [15:04:13] PROBLEM - Mediawiki Cirrussearch update rate - codfw on icinga1001 is CRITICAL: CRITICAL: 20.00% of data under the critical threshold [50.0] https://wikitech.wikimedia.org/wiki/Search%23No_updates https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1 [15:04:20] (03CR) 10Gehel: [C: 03+2] cirrus: increase alerting threshold for Cirrus update rate check. [puppet] - 10https://gerrit.wikimedia.org/r/529368 (https://phabricator.wikimedia.org/T224425) (owner: 10Gehel) [15:04:39] PROBLEM - Mediawiki Cirrussearch update rate - eqiad on icinga1001 is CRITICAL: CRITICAL: 40.00% of data under the critical threshold [50.0] https://wikitech.wikimedia.org/wiki/Search%23No_updates https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1 [15:05:59] _joe_: can I merge your change as well? Looks trivial enough [15:06:15] <_joe_> gehel: damn, sure [15:06:23] done! [15:06:25] <_joe_> I was distracted by an irc conversation :/ [15:06:40] blaming IRC, always works! 
[15:07:12] and that merge should get rid of that cirrus update rate noise [15:10:10] RECOVERY - Mediawiki Cirrussearch update rate - eqiad on icinga1001 is OK: OK: Less than 1.00% under the threshold [80.0] https://wikitech.wikimedia.org/wiki/Search%23No_updates https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1 [15:10:10] RECOVERY - Mediawiki Cirrussearch update rate - codfw on icinga1001 is OK: OK: Less than 1.00% under the threshold [80.0] https://wikitech.wikimedia.org/wiki/Search%23No_updates https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1 [15:17:52] 10Operations, 10cloud-services-team, 10serviceops, 10Core Platform Team Legacy (Watching / External), and 4 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (10jijiki) @Dzahn I think we should continue by switching all jobs on Monday and see what gives. We fo... [15:23:11] 04Critical Alert for device cr1-codfw.wikimedia.org - Juniper alarm active [15:26:24] XioNoX: is that expected? ^ [15:27:03] jijiki: it's a known issue nagging me [15:27:13] thanks for the head's up [15:27:24] librenms always demanding attention [15:28:57] XioNo.X is really trying to love all crs equally [15:30:46] (I set hilights for all doted version of my nick :) [15:30:47] ) [15:30:54] highlight* [15:30:56] aha [15:31:06] x4x [15:31:18] (added) [15:31:32] the annoying thing is that the alert doesn't show up on librenms dashboard, so I can't ACK it... [15:31:38] looks like a bug [15:37:11] 04Critical Alert for device cr1-codfw.wikimedia.org - Juniper alarm active got acknowledged [15:39:03] it is acknowledging itself [15:39:05] it is like [15:39:14] it has a life of its own! 
[15:41:18] 10Operations, 10fundraising-tech-ops: rack/setup/install Prometeuse/Grafana host frmon2001 for fr-tech - https://phabricator.wikimedia.org/T196476 (10Jgreen) [15:42:00] (03CR) 10Cwhite: [C: 03+2] admin: add Jaime Anstee to ldap only users [puppet] - 10https://gerrit.wikimedia.org/r/529184 (https://phabricator.wikimedia.org/T229959) (owner: 10Cwhite) [15:42:05] 10Operations, 10ops-codfw, 10fundraising-tech-ops: rack/setup/install Prometeuse/Grafana host frmon2001 for fr-tech - https://phabricator.wikimedia.org/T196476 (10Jgreen) 05Open→03Resolved prometheus is up and running and collecting, there's still work to be done on the monitoring infrastructure around r... [15:42:08] (03PS2) 10Cwhite: admin: add Jaime Anstee to ldap only users [puppet] - 10https://gerrit.wikimedia.org/r/529184 (https://phabricator.wikimedia.org/T229959) [15:47:26] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to Mwmaint1002, Stat1007 for Abijeet Patro - https://phabricator.wikimedia.org/T230020 (10colewhite) [15:47:55] 10Operations, 10puppet-compiler, 10User-jijiki: Remove nginx submodule from puppet - https://phabricator.wikimedia.org/T230206 (10elukey) When I folded the Analytics modules I used a procedure suggested by Joe, here an example: https://gerrit.wikimedia.org/r/#/c/520267/ https://gerrit.wikimedia.org/r/#/c/52... 
[15:48:13] !log Disable puppet on mw1222 and depool [15:48:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:48:38] 10Operations, 10Analytics, 10Analytics-Kanban, 10Cleanup, 10Patch-For-Review: Archive zookeeper puppet submodule - https://phabricator.wikimedia.org/T227164 (10elukey) [15:48:45] 10Operations, 10Analytics, 10Analytics-Kanban, 10Cleanup, 10Patch-For-Review: Archive zookeeper puppet submodule - https://phabricator.wikimedia.org/T227164 (10elukey) 05Open→03Resolved [15:49:46] (03PS4) 10Cwhite: admin: admin data and access for Abijeet Patro [puppet] - 10https://gerrit.wikimedia.org/r/529125 (https://phabricator.wikimedia.org/T230020) [15:50:00] 10Operations, 10Analytics, 10Analytics-Kanban, 10Cleanup: Archive cdh puppet submodule - https://phabricator.wikimedia.org/T226474 (10elukey) 05Open→03Resolved [16:01:11] (03PS1) 10Subramanya Sastry: Update parsoid-rt-client.config.js.erb to fetch test ids from a function [puppet] - 10https://gerrit.wikimedia.org/r/529391 (https://phabricator.wikimedia.org/T230166) [16:02:02] (03CR) 10Subramanya Sastry: "This depends on https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/529191" [puppet] - 10https://gerrit.wikimedia.org/r/529391 (https://phabricator.wikimedia.org/T230166) (owner: 10Subramanya Sastry) [16:04:50] (03CR) 10Gergő Tisza: [C: 03+1] "Maybe CDB files are generated via some different mechanism in beta and the extension list is ignored. I'm not really sure." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/526541 (https://phabricator.wikimedia.org/T227348) (owner: 10Mholloway) [16:14:59] !log add phamhi to 'wmf' and 'ops' LDAP groups (T228942) [16:15:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:15:09] T228942: Onboard Hieu Pham to Wikimedia Foundation as SRE in Cloud Services - https://phabricator.wikimedia.org/T228942 [16:20:12] 10Operations, 10Maps, 10SRE-Access-Requests: Remove MaxSem from map servers - https://phabricator.wikimedia.org/T230183 (10colewhite) 05Open→03Resolved p:05Triage→03Normal a:03Muehlenhoff [16:20:40] 10Operations, 10Product-Analytics: Help speed up onboarding for new Analysts - https://phabricator.wikimedia.org/T230173 (10colewhite) [16:20:56] (03PS1) 10CDanis: dbctl: add note & candidate_master fields [software/conftool] - 10https://gerrit.wikimedia.org/r/529396 (https://phabricator.wikimedia.org/T229677) [16:22:13] (03CR) 10Gehel: [C: 03+1] "Looks great! And much simpler than what we thought we needed. Thanks for diving into this!" 
[puppet] - 10https://gerrit.wikimedia.org/r/529362 (https://phabricator.wikimedia.org/T229621) (owner: 10Jbond) [16:30:28] 10Operations, 10Traffic, 10wikimediafoundation.org, 10Security: Setting up static maintenance page on Foundation servers for Foundation website - https://phabricator.wikimedia.org/T230075 (10colewhite) [16:30:58] (03PS1) 10Phamhi: admin: add phamhi public key [labs/private] - 10https://gerrit.wikimedia.org/r/529398 (https://phabricator.wikimedia.org/T230126) [16:34:56] (03CR) 10Dzahn: design.wikimedia.org: add new dir and repo for strategy site (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/528922 (https://phabricator.wikimedia.org/T230053) (owner: 10Dzahn) [16:35:11] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] admin: add phamhi public key [labs/private] - 10https://gerrit.wikimedia.org/r/529398 (https://phabricator.wikimedia.org/T230126) (owner: 10Phamhi) [16:36:10] (03CR) 10CDanis: check_depooled: Wrapper to find out depooled hosts (032 comments) [software] - 10https://gerrit.wikimedia.org/r/529366 (owner: 10Marostegui) [16:37:43] (03CR) 10Phamhi: [V: 03+2 C: 03+2] admin: add phamhi public key [labs/private] - 10https://gerrit.wikimedia.org/r/529398 (https://phabricator.wikimedia.org/T230126) (owner: 10Phamhi) [16:37:56] (03CR) 10Gehel: [C: 04-1] "Beside the naming, this looks good to my limited understanding of LVS." 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/529053 (https://phabricator.wikimedia.org/T176875) (owner: 10Mathew.onipe) [16:38:47] 10Operations, 10observability: Remove logster from cp* hosts - https://phabricator.wikimedia.org/T229357 (10colewhite) a:03colewhite [16:40:13] (03PS2) 10Dzahn: mediawiki:maintenance: switch pagetriage cron to PHP 7.2 [puppet] - 10https://gerrit.wikimedia.org/r/528609 (https://phabricator.wikimedia.org/T195392) [16:41:16] (03PS5) 10Cwhite: admin: admin data and access for Abijeet Patro [puppet] - 10https://gerrit.wikimedia.org/r/529125 (https://phabricator.wikimedia.org/T230020) [16:41:18] (03PS1) 10Cwhite: logster: add ensure parameter [puppet] - 10https://gerrit.wikimedia.org/r/529399 (https://phabricator.wikimedia.org/T229357) [16:43:28] (03PS1) 10Dzahn: parsoid::testing: remove parameter use_parsoid_php again [puppet] - 10https://gerrit.wikimedia.org/r/529400 (https://phabricator.wikimedia.org/T228069) [16:44:33] (03CR) 10jerkins-bot: [V: 04-1] parsoid::testing: remove parameter use_parsoid_php again [puppet] - 10https://gerrit.wikimedia.org/r/529400 (https://phabricator.wikimedia.org/T228069) (owner: 10Dzahn) [16:52:46] (03PS1) 10Effie Mouzeli: profile:templates:services_proxy: Enable ipv6 [puppet] - 10https://gerrit.wikimedia.org/r/529401 (https://phabricator.wikimedia.org/T224538) [16:52:54] !log mwmaint - manually running updatePageTriageQueue maintenance cron with PHP 7.2 [16:53:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:53:30] (03CR) 10Dzahn: [C: 03+2] mediawiki:maintenance: switch pagetriage cron to PHP 7.2 [puppet] - 10https://gerrit.wikimedia.org/r/528609 (https://phabricator.wikimedia.org/T195392) (owner: 10Dzahn) [16:53:38] (03PS2) 10Effie Mouzeli: profile:templates:services_proxy: Enable ipv6 [puppet] - 10https://gerrit.wikimedia.org/r/529401 (https://phabricator.wikimedia.org/T224538) [17:06:55] (03PS8) 10Ema: systemd: add support for mask and unmask [puppet] - 
10https://gerrit.wikimedia.org/r/529328 [17:07:24] 10Operations, 10Discovery-Search (Current work): Prometheus not collecting cloudelastic metrics - https://phabricator.wikimedia.org/T229937 (10debt) 05Open→03Resolved [17:15:00] (03CR) 10Elukey: [C: 04-1] "PCC doesn't like it https://puppet-compiler.wmflabs.org/compiler1001/17842/cp2010.codfw.wmnet/ :(" [puppet] - 10https://gerrit.wikimedia.org/r/529399 (https://phabricator.wikimedia.org/T229357) (owner: 10Cwhite) [17:18:48] (03PS2) 10Cwhite: logster: add ensure parameter [puppet] - 10https://gerrit.wikimedia.org/r/529399 (https://phabricator.wikimedia.org/T229357) [17:19:36] !log mwmaint - running purgeParserCache maintenance cron manually with PHP 7.2 - ..slowly [17:19:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:23:27] !log set BGP peer "BrightRidge" on cr2-eqiad [17:23:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:24:19] (03PS1) 10Ema: ATS: ensure trafficserver is not auto-started upon installation [puppet] - 10https://gerrit.wikimedia.org/r/529402 [17:24:28] (03CR) 10Cwhite: "> Patch Set 1: Code-Review-1" [puppet] - 10https://gerrit.wikimedia.org/r/529399 (https://phabricator.wikimedia.org/T229357) (owner: 10Cwhite) [17:25:49] hi mutante [17:27:15] 10Operations, 10MediaWiki-extensions-Mailgun, 10cloud-services-team, 10serviceops, and 5 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (10jijiki) [17:29:31] 10Operations, 10Puppet: clean up systemd::timer::job logging basedir mess - https://phabricator.wikimedia.org/T230127 (10MarcoAurelio) >>! In T230127#5405065, @gerritbot wrote: > Change 529358 had a related patch set uploaded (by CDanis; owner: Giuseppe Lavagetto): > [operations/puppet@production] systemd::tim... 
[17:31:48] (03PS2) 10Ema: ATS: ensure trafficserver is not auto-started upon installation [puppet] - 10https://gerrit.wikimedia.org/r/529402 [17:50:20] (03CR) 10Dzahn: "thanks for doing this, i would like to start converting them, just separately from the PHP 7.2 switch as we already agreed on IRC. and i w" [puppet] - 10https://gerrit.wikimedia.org/r/529077 (owner: 10MarcoAurelio) [17:54:21] !log mwmaint - running initsitestats maintenance job - initializes or updates statistics table on all wikis [17:54:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:03:56] 10Operations, 10MediaWiki-extensions-Mailgun, 10cloud-services-team, 10serviceops, and 5 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (10Dzahn) [18:14:06] !log add BGP peer for AS 38758 on cr1-eqsin [18:14:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:25:36] 10Operations, 10MediaWiki-extensions-Mailgun, 10cloud-services-team, 10serviceops, and 5 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (10Dzahn) [18:42:30] 10Operations, 10ops-eqiad, 10DC-Ops, 10netbox: Missing Netbox Info for New PDUs - https://phabricator.wikimedia.org/T229680 (10wiki_willy) 05Open→03Resolved Info entered into Netbox by @RobH Resolving task [18:42:32] 10Operations, 10ops-codfw, 10ops-eqiad, 10DC-Ops, and 2 others: Triage and resolve all outstanding Netbox report errors - https://phabricator.wikimedia.org/T223450 (10wiki_willy) [18:43:11] 10Operations, 10ops-eqsin: msw1-eqsin/msw2-eqsin missing serial number - https://phabricator.wikimedia.org/T227911 (10RobH) 05Open→03Resolved netbox updated [18:44:46] 10Operations, 10ops-codfw, 10ops-eqiad, 10DC-Ops, and 2 others: Triage and resolve all outstanding Netbox report errors - https://phabricator.wikimedia.org/T223450 (10wiki_willy) [19:07:02] (03PS1) 10Dzahn: mw-maintenance/cirrussearch: switch cron jobs to PHP 7.2 
[puppet] - 10https://gerrit.wikimedia.org/r/529417 (https://phabricator.wikimedia.org/T195392) [19:07:14] (03PS1) 10Fomafix: Add redirects for https://nan.wik{tionary,iquote,ibooks,isource}.org [puppet] - 10https://gerrit.wikimedia.org/r/529418 (https://phabricator.wikimedia.org/T86915) [19:08:24] (03CR) 10Dzahn: [C: 03+2] mw-maintenance/cirrussearch: switch cron jobs to PHP 7.2 [puppet] - 10https://gerrit.wikimedia.org/r/529417 (https://phabricator.wikimedia.org/T195392) (owner: 10Dzahn) [19:08:35] (03PS2) 10Dzahn: mw-maintenance/cirrussearch: switch cron jobs to PHP 7.2 [puppet] - 10https://gerrit.wikimedia.org/r/529417 (https://phabricator.wikimedia.org/T195392) [19:11:13] 10Operations, 10Traffic, 10Wikidata, 10serviceops, and 4 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (10Addshore) It sounds like we should just close this ticket as Declined then @WMDE-leszek ? [19:13:00] PROBLEM - Check the Netbox report-s- librenms for fail status. on netmon1002 is CRITICAL: librenms.LibreNMS CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports [19:14:28] 10Operations, 10Traffic, 10Wikidata, 10serviceops, and 4 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (10Dzahn) If that is really the outcome that would be unfortunate but please leave it open or ideally create a subtask to revert/remove the code... [19:15:29] (03PS7) 10Smalyshev: Set up dumps for mediainfo RDF generation [puppet] - 10https://gerrit.wikimedia.org/r/516444 (https://phabricator.wikimedia.org/T221917) [19:18:13] Hi, I tried to login on gerrit but it just shows me "Cannot assign user name "viztor" to account 7460; name already in use." [19:18:24] anyone have an idea what might be happening? 
[19:20:13] try using lowercase [19:20:29] (03PS8) 10Smalyshev: Set up dumps for mediainfo RDF generation [puppet] - 10https://gerrit.wikimedia.org/r/516444 (https://phabricator.wikimedia.org/T221917) [19:20:44] viztor_: instead of logging in as Viztor, use viztor [19:20:50] or vice versa [19:20:55] it has worked in the past [19:21:59] It should work with either case [19:22:05] thcipriani ^ [19:22:14] Doesn't seem to make a difference... [19:22:59] viztor_: could you file a task? I'll take a look at logs vs database and see if I can figure out what's going on. [19:23:00] Task or Tyler then. I'm leaving for dinner :) [19:23:03] 10Operations, 10Traffic, 10Wikidata, 10serviceops, and 4 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (10Addshore) It seems like we have been in a [[ https://en.wikipedia.org/wiki/Mexican_standoff | Mexican standoff ]] / impasse here since [[ http... [19:24:11] Ok.. [19:24:20] Is there a way I can get on asap? [19:24:27] Like reset the account or something. [19:26:20] 10Operations, 10Traffic, 10Wikidata, 10serviceops, and 4 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (10Dzahn) Tbh it seems the main issue is that the c-level communication is not happening on this task and therefore not visible.
[19:26:31] https://twitter.com/Flaxtonboy1/status/1159862884056387584/photo/1 [19:26:32] lol [19:26:36] uh [19:26:38] wrong channel [19:27:19] 10Operations, 10Traffic, 10Wikidata, 10serviceops, and 4 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (10Dzahn) a:05Dzahn→03None [19:37:46] !log mwmaint - running cirrussearch maintenance jobs manually (completion indices, sanitize cirrus jobs) [19:37:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:46:29] !log mwdebug1001 - temp stopped puppet, editing nginx config to test making it listen on IPv6 for upstream proxies (529401) (T224538) [19:46:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:46:37] T224538: Socket Errors on PHP7 - https://phabricator.wikimedia.org/T224538 [19:47:52] (03CR) 10Dzahn: [C: 04-1] "it's missing a : which would lead to a syntax error, fixing it" [puppet] - 10https://gerrit.wikimedia.org/r/529401 (https://phabricator.wikimedia.org/T224538) (owner: 10Effie Mouzeli) [19:51:33] (03CR) 10Dzahn: [C: 04-1] profile:templates:services_proxy: Enable ipv6 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/529401 (https://phabricator.wikimedia.org/T224538) (owner: 10Effie Mouzeli) [19:53:30] (03PS3) 10Dzahn: profile:templates:services_proxy: Enable ipv6 [puppet] - 10https://gerrit.wikimedia.org/r/529401 (https://phabricator.wikimedia.org/T224538) (owner: 10Effie Mouzeli) [19:54:04] (03CR) 10Dzahn: [C: 03+1] "tested on mwdebug1001" [puppet] - 10https://gerrit.wikimedia.org/r/529401 (https://phabricator.wikimedia.org/T224538) (owner: 10Effie Mouzeli) [19:59:55] (03PS2) 10Dzahn: design.wikimedia.org: add new dir and repo for strategy site [puppet] - 10https://gerrit.wikimedia.org/r/528922 (https://phabricator.wikimedia.org/T230053) [20:05:53] (03CR) 10Dzahn: [C: 03+2] "dir name fixed. thanks. i see the repo exists now and just has no content yet. merging this to unblock you." 
[puppet] - 10https://gerrit.wikimedia.org/r/528922 (https://phabricator.wikimedia.org/T230053) (owner: 10Dzahn) [20:06:13] (03PS3) 10Dzahn: design.wikimedia.org: add new dir and repo for strategy site [puppet] - 10https://gerrit.wikimedia.org/r/528922 (https://phabricator.wikimedia.org/T230053) [20:11:46] (03PS1) 10Dzahn: design.wikimedia.org: fix directory name and rename httpd config file [puppet] - 10https://gerrit.wikimedia.org/r/529425 (https://phabricator.wikimedia.org/T230053) [20:12:03] (03Abandoned) 10Dzahn: planet: re-add support for https for traffic server [puppet] - 10https://gerrit.wikimedia.org/r/524621 (https://phabricator.wikimedia.org/T210411) (owner: 10Dzahn) [20:12:51] (03CR) 10Dzahn: [C: 03+2] design.wikimedia.org: fix directory name and rename httpd config file [puppet] - 10https://gerrit.wikimedia.org/r/529425 (https://phabricator.wikimedia.org/T230053) (owner: 10Dzahn) [20:13:53] (03PS2) 10Dzahn: design.wikimedia.org: fix directory name and rename httpd config file [puppet] - 10https://gerrit.wikimedia.org/r/529425 (https://phabricator.wikimedia.org/T230053) [20:21:51] 10Operations, 10SRE-Access-Requests: Requesting access to machines [stat1004, stat1007, stat1006, notebook1003 and notebook1004] and groups for cchen - https://phabricator.wikimedia.org/T228447 (10cchen) Thank you @colewhite! I am still having trouble logging into [[ https://superset.wikimedia.org/ | Superset...
[20:22:13] 10Operations, 10SRE-Access-Requests: Requesting access to machines [stat1004, stat1007, stat1006, notebook1003 and notebook1004] and groups for cchen - https://phabricator.wikimedia.org/T228447 (10cchen) 05Resolved→03Open [20:25:41] 10Operations, 10Analytics, 10Core Platform Team Legacy (Watching / External), 10Patch-For-Review, and 2 others: Replace and expand kafka main hosts (kafka[12]00[123]) with kafka-main[12]00[12345] - https://phabricator.wikimedia.org/T225005 (10herron) [20:31:39] !log contint1001 - added entry to /etc/fstab for /mnt/docker to survive reboots ( 13 /dev/mapper/contint1001--data-docker /mnt/docker ext4 defaults 0 2$ [20:31:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:32:23] 10Operations, 10Continuous-Integration-Infrastructure, 10serviceops, 10Release-Engineering-Team (CI & Testing services), 10Release-Engineering-Team-TODO (201907): contint1001 store docker images on separate partition or disk - https://phabricator.wikimedia.org/T207707 (10Dzahn) on contint1001: added entr... 
[20:32:27] (03PS1) 10Herron: eventgate-main: replace broker kafka1001 with kafka-main1001 [deployment-charts] - 10https://gerrit.wikimedia.org/r/529428 (https://phabricator.wikimedia.org/T225005) [20:38:03] (03PS1) 10Dzahn: design.wikimedia.org: add httpd alias for /strategy URL [puppet] - 10https://gerrit.wikimedia.org/r/529429 (https://phabricator.wikimedia.org/T230053) [20:38:29] (03CR) 10jerkins-bot: [V: 04-1] design.wikimedia.org: add httpd alias for /strategy URL [puppet] - 10https://gerrit.wikimedia.org/r/529429 (https://phabricator.wikimedia.org/T230053) (owner: 10Dzahn) [20:39:39] (03PS2) 10Dzahn: design.wikimedia.org: add httpd alias for /strategy URL [puppet] - 10https://gerrit.wikimedia.org/r/529429 (https://phabricator.wikimedia.org/T230053) [20:40:24] 10Operations, 10Analytics, 10Core Platform Team Legacy (Watching / External), 10Patch-For-Review, and 2 others: Replace and expand kafka main hosts (kafka[12]00[123]) with kafka-main[12]00[12345] - https://phabricator.wikimedia.org/T225005 (10herron) >>! In T225005#5396148, @Ottomata wrote: > In addition t... [20:40:41] (03CR) 10Dzahn: [C: 03+2] "..because you wanted the /strategy URL and not /design-strategy while the dir has that name as you pointed out" [puppet] - 10https://gerrit.wikimedia.org/r/529429 (https://phabricator.wikimedia.org/T230053) (owner: 10Dzahn) [20:47:15] 10Operations, 10Domains, 10Product-Design-Strategy, 10Traffic, 10Patch-For-Review: Add a repo reference to Design Strategy web address - https://phabricator.wikimedia.org/T230053 (10Dzahn) 05Open→03Resolved a:03Dzahn done! here you go: https://design.wikimedia.org/strategy/ all you need is get yo... [20:55:10] !log mwmaint - running update_special_pages maintenance cron manually [20:55:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:02:44] (03CR) 10Gergő Tisza: "Who can access scandium? In theory, exception traces might leak private data." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/529173 (https://phabricator.wikimedia.org/T228069) (owner: 10Subramanya Sastry) [21:06:34] 10Operations, 10LDAP-Access-Requests: Membership to 'wmf' LDAP group request for Connie Chen - https://phabricator.wikimedia.org/T230242 (10colewhite) [21:06:38] (03CR) 10Dzahn: "the access groups: ops, parsoid-admin, parsoid-roots, parsoid-test-admins and parsoid-test-roots which resolves to:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/529173 (https://phabricator.wikimedia.org/T228069) (owner: 10Subramanya Sastry) [21:07:37] (03CR) 10Subramanya Sastry: "> Patch Set 2:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/529173 (https://phabricator.wikimedia.org/T228069) (owner: 10Subramanya Sastry) [21:11:35] 10Operations, 10SRE-Access-Requests: Requesting access to machines [stat1004, stat1007, stat1006, notebook1003 and notebook1004] and groups for cchen - https://phabricator.wikimedia.org/T228447 (10colewhite) Access to Superset and Turnilo are managed by the 'wmf' LDAP group. Since it is beyond the scope of th... 
[21:11:49] 10Operations, 10SRE-Access-Requests: Requesting access to machines [stat1004, stat1007, stat1006, notebook1003 and notebook1004] and groups for cchen - https://phabricator.wikimedia.org/T228447 (10colewhite) 05Open→03Resolved [21:12:38] 10Operations, 10LDAP-Access-Requests: Membership to 'wmf' LDAP group request for Connie Chen - https://phabricator.wikimedia.org/T230242 (10colewhite) p:05Triage→03Normal [21:13:09] 10Operations, 10LDAP-Access-Requests: Membership to 'wmf' LDAP group request for Connie Chen - https://phabricator.wikimedia.org/T230242 (10colewhite) [21:14:43] 10Operations, 10Mail, 10OTRS: check OTRS wiki for email addresses no longer used - https://phabricator.wikimedia.org/T230243 (10Dzahn) [21:15:05] 10Operations, 10LDAP-Access-Requests: Membership to 'wmf' LDAP group request for Connie Chen - https://phabricator.wikimedia.org/T230242 (10colewhite) @cchen is now in the wmf ldap group. Resolving task. Please feel free to reopen if you encounter any related issue. [21:15:17] 10Operations, 10LDAP-Access-Requests: Membership to 'wmf' LDAP group request for Connie Chen - https://phabricator.wikimedia.org/T230242 (10colewhite) 05Open→03Resolved [21:19:35] 10Operations, 10LDAP-Access-Requests: Membership to 'wmf' LDAP group request for Connie Chen - https://phabricator.wikimedia.org/T230242 (10kzimmerman) Thanks for creating this and resolving it, @colewhite! 
[21:21:23] 10Operations, 10MediaWiki-extensions-Mailgun, 10cloud-services-team, 10serviceops, and 5 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (10Dzahn) [21:23:58] (03PS1) 10Dzahn: mw-maintenance: switch generatecaptcha cron to PHP 7.2 [puppet] - 10https://gerrit.wikimedia.org/r/529433 (https://phabricator.wikimedia.org/T195392) [21:24:31] (03CR) 10Dzahn: [C: 03+2] mw-maintenance: switch generatecaptcha cron to PHP 7.2 [puppet] - 10https://gerrit.wikimedia.org/r/529433 (https://phabricator.wikimedia.org/T195392) (owner: 10Dzahn) [21:28:21] !log mwmaint - generating new captchas for ConfirmEdit extension by running generatecaptcha maintenance cron command [21:28:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:34:54] 10Operations, 10ops-eqiad, 10DC-Ops, 10Wikimedia-Logstash, and 2 others: Decommission old eqiad logstash hardware hosts logstash100[456] - https://phabricator.wikimedia.org/T217556 (10Jclark-ctr) [21:39:57] 10Operations, 10Discovery: elastic2054 unresponsive - https://phabricator.wikimedia.org/T227298 (10Papaul) You have successfully submitted request SR995910217. Your dispatch request has been successfully created and will be reviewed by our team. You can monitor its progress on your Dell EMC TechDirect dashbo... [21:48:19] 10Operations, 10ops-eqsin: update PDUs for eqsin (asset tag and other info) - https://phabricator.wikimedia.org/T211368 (10RobH) 05Open→03Resolved I've gone ahead and populated the serial number with the asset tags. That will remove it off our error reporting. As we plan to eventually replace these with... 
[21:53:56] 10Operations, 10MediaWiki-extensions-Mailgun, 10cloud-services-team, 10serviceops, and 5 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (10Dzahn) [21:55:35] (03PS1) 10Jhedden: openstack: initial haproxy profile [puppet] - 10https://gerrit.wikimedia.org/r/529436 (https://phabricator.wikimedia.org/T223907) [21:56:50] PROBLEM - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is CRITICAL: /{domain}/v1/page/most-read/{year}/{month}/{day} (retrieve the most-read articles for January 1, 2016 (with aggregated=true)) timed out before a response was received https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29 [21:58:18] RECOVERY - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29 [21:59:59] 10Operations, 10netops: Cleanup confed BGP peerings and policies - https://phabricator.wikimedia.org/T167841 (10ayounsi) For the eqord issue, this should work. The `208.80.152.0/22` prefix gets created only if the router has (or learns) at least one contributing prefix (included in the /22) with a next-hop (ig...
[22:04:33] (03PS1) 10Dzahn: mw-maintenance: switch characterEditStatsTranslate to PHP 7.2 [puppet] - 10https://gerrit.wikimedia.org/r/529437 (https://phabricator.wikimedia.org/T195392) [22:08:52] (03PS2) 10Jhedden: openstack: initial haproxy profile [puppet] - 10https://gerrit.wikimedia.org/r/529436 (https://phabricator.wikimedia.org/T223907) [22:14:05] (03PS3) 10Jhedden: openstack: initial haproxy profile [puppet] - 10https://gerrit.wikimedia.org/r/529436 (https://phabricator.wikimedia.org/T223907) [22:17:35] (03CR) 10Dzahn: [C: 03+2] mw-maintenance: switch characterEditStatsTranslate to PHP 7.2 [puppet] - 10https://gerrit.wikimedia.org/r/529437 (https://phabricator.wikimedia.org/T195392) (owner: 10Dzahn) [22:19:44] PROBLEM - Router interfaces on cr2-knams is CRITICAL: CRITICAL: host 91.198.174.246, interfaces up: 54, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [22:25:37] 10Operations, 10MediaWiki-extensions-Mailgun, 10cloud-services-team, 10serviceops, and 5 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (10Dzahn) [22:27:56] 10Operations, 10MediaWiki-extensions-Mailgun, 10cloud-services-team, 10serviceops, and 5 others: Switch cronjobs on maintenance hosts to PHP7 - https://phabricator.wikimedia.org/T195392 (10Dzahn) [22:35:15] (03PS1) 10Ayounsi: Ignore SJ Manufacturing ThruPower devices for LibreNMS report [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/529439 [22:37:42] ACKNOWLEDGEMENT - Check the Netbox report-s- librenms for fail status. 
on netmon1002 is CRITICAL: librenms.LibreNMS CRITICAL Ayounsi https://gerrit.wikimedia.org/r/c/operations/software/netbox-reports/+/529439 https://wikitech.wikimedia.org/wiki/Netbox%23Reports [22:40:05] (03CR) 10CRusnov: [C: 03+1] "LGTM" [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/529439 (owner: 10Ayounsi) [22:41:50] PROBLEM - HHVM rendering on mw1277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [22:43:18] RECOVERY - HHVM rendering on mw1277 is OK: HTTP OK: HTTP/1.1 200 OK - 91689 bytes in 0.626 second response time https://wikitech.wikimedia.org/wiki/Application_servers [22:44:38] (03CR) 10Gergő Tisza: "It has DB access though, so a stack trace could expose info about oversighted revisions for example. Probably not a big deal if it's not e" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/529173 (https://phabricator.wikimedia.org/T228069) (owner: 10Subramanya Sastry) [23:22:37] (03PS1) 10Smalyshev: Disable DCAT-AP updates - will be moved to separate endpoint [puppet] - 10https://gerrit.wikimedia.org/r/529443 (https://phabricator.wikimedia.org/T228297) [23:23:42] (03CR) 10Smalyshev: [C: 04-1] "Setting to -1 temporarily until the VPS server is ready, then it can be merged." [puppet] - 10https://gerrit.wikimedia.org/r/529443 (https://phabricator.wikimedia.org/T228297) (owner: 10Smalyshev)