[00:02:20] Ah :) [00:08:38] (03PS1) 10RobH: Offboarding Trevor Parscal [puppet] - 10https://gerrit.wikimedia.org/r/375094 [00:10:36] PROBLEM - pivot on thorium is CRITICAL: connect to address 10.64.53.26 and port 9090: Connection refused [00:10:39] PROBLEM - Check systemd state on thorium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [00:10:40] (03CR) 10RobH: [C: 032] Offboarding Trevor Parscal [puppet] - 10https://gerrit.wikimedia.org/r/375094 (owner: 10RobH) [00:11:17] PROBLEM - SSH on thorium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:13:08] !log gerrit: flushed all non-login caches, things might be sluggish for the next ~15mins or so [00:13:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:17:47] PROBLEM - Check systemd state on thorium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [00:17:56] PROBLEM - salt-minion processes on thorium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [00:18:07] PROBLEM - Hue Server on thorium is CRITICAL: PROCS CRITICAL: 0 processes with command name python2.7, args /usr/lib/hue/build/env/bin/hue [00:18:26] RECOVERY - SSH on thorium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0) [00:21:36] PROBLEM - SSH on thorium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:22:06] PROBLEM - salt-minion processes on thorium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [00:24:50] 10Operations, 10Security-Team, 10vm-requests: provide ganeti VM for security team sectools - https://phabricator.wikimedia.org/T138650#2406322 (10EddieGP) It seems this vm is still in site.pp and the role is still present in puppet. Should this be removed? 
[00:27:26] PROBLEM - Hue Server on thorium is CRITICAL: PROCS CRITICAL: 0 processes with command name python2.7, args /usr/lib/hue/build/env/bin/hue [00:27:36] RECOVERY - SSH on thorium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0) [00:27:46] RECOVERY - pivot on thorium is OK: TCP OK - 0.000 second response time on 10.64.53.26 port 9090 [00:28:06] PROBLEM - Check systemd state on thorium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [00:28:07] PROBLEM - salt-minion processes on thorium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [00:40:07] RECOVERY - Check systemd state on thorium is OK: OK - running: The system is fully operational [00:40:17] RECOVERY - salt-minion processes on thorium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [00:57:01] !log tstarling@tin Synchronized php-1.30.0-wmf.16/includes/parser/Parser.php: (no justification provided) (duration: 00m 44s) [00:57:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:18:57] PROBLEM - Hue Server on thorium is CRITICAL: PROCS CRITICAL: 0 processes with command name python2.7, args /usr/lib/hue/build/env/bin/hue [01:24:15] (03PS2) 10Andrew Bogott: labpuppetmaster: add back the wmcs-roots group [puppet] - 10https://gerrit.wikimedia.org/r/375084 (owner: 10Rush) [01:25:23] (03CR) 10Andrew Bogott: [C: 032] labpuppetmaster: add back the wmcs-roots group [puppet] - 10https://gerrit.wikimedia.org/r/375084 (owner: 10Rush) [01:29:06] RECOVERY - Hue Server on thorium is OK: PROCS OK: 1 process with command name python2.7, args /usr/lib/hue/build/env/bin/hue [01:35:47] PROBLEM - Check systemd state on thorium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[01:35:57] PROBLEM - salt-minion processes on thorium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [01:36:26] PROBLEM - pivot on thorium is CRITICAL: connect to address 10.64.53.26 and port 9090: Connection refused [01:37:27] RECOVERY - pivot on thorium is OK: TCP OK - 0.000 second response time on 10.64.53.26 port 9090 [01:42:56] RECOVERY - Check systemd state on thorium is OK: OK - running: The system is fully operational [01:43:06] RECOVERY - salt-minion processes on thorium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [02:17:26] PROBLEM - SSH on thorium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:17:57] PROBLEM - pivot on thorium is CRITICAL: connect to address 10.64.53.26 and port 9090: Connection refused [02:18:37] PROBLEM - Check systemd state on thorium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [02:18:46] PROBLEM - salt-minion processes on thorium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [02:19:17] RECOVERY - SSH on thorium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0) [02:19:26] PROBLEM - Check whether ferm is active by checking the default input chain on thorium is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly [02:22:27] PROBLEM - SSH on thorium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:31:26] (03PS1) 10Krinkle: webperf: Refactor ve.py and add unit tests [puppet] - 10https://gerrit.wikimedia.org/r/375105 (https://phabricator.wikimedia.org/T110903) [02:31:27] (03PS1) 10Krinkle: webperf: Convert ve.py from ZMQ to KafkaConsumer [puppet] - 10https://gerrit.wikimedia.org/r/375106 (https://phabricator.wikimedia.org/T110903) [02:32:33] (03CR) 10jerkins-bot: [V: 04-1] webperf: Refactor ve.py and add unit tests [puppet] - 10https://gerrit.wikimedia.org/r/375105 
(https://phabricator.wikimedia.org/T110903) (owner: 10Krinkle) [02:32:33] (03CR) 10jerkins-bot: [V: 04-1] webperf: Convert ve.py from ZMQ to KafkaConsumer [puppet] - 10https://gerrit.wikimedia.org/r/375106 (https://phabricator.wikimedia.org/T110903) (owner: 10Krinkle) [02:34:12] (03PS2) 10Krinkle: webperf: Refactor ve.py and add unit tests [puppet] - 10https://gerrit.wikimedia.org/r/375105 (https://phabricator.wikimedia.org/T110903) [02:34:13] (03PS2) 10Krinkle: webperf: Convert ve.py from ZMQ to KafkaConsumer [puppet] - 10https://gerrit.wikimedia.org/r/375106 (https://phabricator.wikimedia.org/T110903) [02:34:17] PROBLEM - Hue Server on thorium is CRITICAL: PROCS CRITICAL: 0 processes with command name python2.7, args /usr/lib/hue/build/env/bin/hue [02:34:30] (03CR) 10jerkins-bot: [V: 04-1] webperf: Convert ve.py from ZMQ to KafkaConsumer [puppet] - 10https://gerrit.wikimedia.org/r/375106 (https://phabricator.wikimedia.org/T110903) (owner: 10Krinkle) [02:34:36] RECOVERY - SSH on thorium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0) [02:34:46] RECOVERY - Check whether ferm is active by checking the default input chain on thorium is OK: OK ferm input default policy is set [02:34:57] PROBLEM - Check systemd state on thorium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [02:35:06] PROBLEM - salt-minion processes on thorium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [02:35:16] RECOVERY - pivot on thorium is OK: TCP OK - 0.000 second response time on 10.64.53.26 port 9090 [02:39:44] (03CR) 10Krinkle: [C: 031] "Bump :) Would like to drop support for PHP 5.5. 
in my tools but can't since for using Composer, the bastion is required until this lands, " [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/369838 (https://phabricator.wikimedia.org/T172358) (owner: 10BryanDavis) [02:41:17] PROBLEM - HHVM rendering on mw2251 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:42:16] RECOVERY - HHVM rendering on mw2251 is OK: HTTP OK: HTTP/1.1 200 OK - 78878 bytes in 0.335 second response time [03:15:11] (03PS1) 10BBlack: Deprecation of 3DES: bump to 11% [puppet] - 10https://gerrit.wikimedia.org/r/375107 (https://phabricator.wikimedia.org/T163251) [03:15:28] (03CR) 10BBlack: [C: 032] Deprecation of 3DES: bump to 11% [puppet] - 10https://gerrit.wikimedia.org/r/375107 (https://phabricator.wikimedia.org/T163251) (owner: 10BBlack) [03:28:17] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 741.98 seconds [03:35:56] (03PS8) 10BryanDavis: Deploy scholarships with scap3 [puppet] - 10https://gerrit.wikimedia.org/r/326461 (https://phabricator.wikimedia.org/T129134) (owner: 10Niharika29) [03:37:45] (03PS1) 10Tim Starling: Re-enable EtcdConfig in beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375108 [03:46:45] (03PS2) 10Krinkle: Enable jQuery 3 on nlwiki sister projects (b, n, q, s, wikt) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/374850 (https://phabricator.wikimedia.org/T124742) [03:48:36] (03CR) 10Krinkle: Re-enable EtcdConfig in beta cluster (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375108 (owner: 10Tim Starling) [03:49:04] (03CR) 10Krinkle: [C: 032] Enable jQuery 3 on nlwiki sister projects (b, n, q, s, wikt) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/374850 (https://phabricator.wikimedia.org/T124742) (owner: 10Krinkle) [03:50:38] (03Merged) 10jenkins-bot: Enable jQuery 3 on nlwiki sister projects (b, n, q, s, wikt) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/374850 
(https://phabricator.wikimedia.org/T124742) (owner: 10Krinkle) [03:50:48] (03CR) 10jenkins-bot: Enable jQuery 3 on nlwiki sister projects (b, n, q, s, wikt) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/374850 (https://phabricator.wikimedia.org/T124742) (owner: 10Krinkle) [03:51:30] (03CR) 10Krinkle: "fixme" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/374922 (https://phabricator.wikimedia.org/T69931) (owner: 10Mattflaschen) [03:51:49] (03CR) 10Krinkle: "Fixed (yay)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/374922 (https://phabricator.wikimedia.org/T69931) (owner: 10Mattflaschen) [03:54:29] !log krinkle@tin Synchronized wmf-config/InitialiseSettings.php: I4dfc33f66c3 - Enable jQuery 3 on nlwiki sister projects (duration: 00m 43s) [03:54:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:59:17] (03CR) 10Krinkle: "https://phabricator.wikimedia.org/T174758" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375051 (owner: 10Mattflaschen) [04:05:30] (03PS1) 10BryanDavis: Deploy iegreview with scap3 [puppet] - 10https://gerrit.wikimedia.org/r/375112 (https://phabricator.wikimedia.org/T129154) [04:13:47] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 208.05 seconds [04:57:16] PROBLEM - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [140.0] [04:59:21] ^ that's me [06:29:07] PROBLEM - MariaDB Slave IO: s4 on db2044 is CRITICAL: CRITICAL slave_io_state could not connect [06:29:19] PROBLEM - mysqld processes on db2044 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [06:29:22] PROBLEM - MariaDB disk space on db2044 is CRITICAL: DISK CRITICAL - /srv is not accessible: Input/output error [06:29:22] PROBLEM - Disk space on db2044 is CRITICAL: DISK CRITICAL - /srv is not accessible: Input/output error 
[06:29:36] PROBLEM - MariaDB Slave SQL: s4 on db2044 is CRITICAL: CRITICAL slave_sql_state could not connect [06:30:55] (03PS2) 10Giuseppe Lavagetto: profile::docker::storage: fix guard around vg_to_remove [puppet] - 10https://gerrit.wikimedia.org/r/371023 [06:31:53] (03Abandoned) 10Giuseppe Lavagetto: confd: fix templates for the future parser. [puppet] - 10https://gerrit.wikimedia.org/r/371022 (owner: 10Giuseppe Lavagetto) [06:32:13] (03PS3) 10Giuseppe Lavagetto: profile::docker::storage: fix guard around vg_to_remove [puppet] - 10https://gerrit.wikimedia.org/r/371023 [06:36:06] PROBLEM - MariaDB Slave Lag: s4 on db2044 is CRITICAL: CRITICAL slave_sql_lag could not connect [06:36:07] PROBLEM - puppet last run on labcontrol1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tzdata] [06:46:27] (03CR) 10Giuseppe Lavagetto: [C: 032] profile::docker::storage: fix guard around vg_to_remove [puppet] - 10https://gerrit.wikimedia.org/r/371023 (owner: 10Giuseppe Lavagetto) [06:54:47] 10Operations, 10ops-eqiad, 10DBA: Decommission db1026 - https://phabricator.wikimedia.org/T174763#3572381 (10Marostegui) [06:55:05] 10Operations, 10ops-eqiad, 10DBA: Decommission db1026 - https://phabricator.wikimedia.org/T174763#3572394 (10Marostegui) p:05Triage>03Normal [07:04:46] RECOVERY - puppet last run on labcontrol1002 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [07:06:19] !log Power reset db2044 as it is unresponsive - T174764 [07:06:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:06:33] T174764: db2044 HW issues - https://phabricator.wikimedia.org/T174764 [07:06:58] <_joe_> marostegui: ouch [07:11:01] RECOVERY - MariaDB disk space on db2044 is OK: DISK OK [07:11:02] RECOVERY - Disk space on db2044 is OK: DISK OK [07:15:36] RECOVERY - Hue Server on thorium is OK: PROCS OK: 1 process with command name python2.7, args 
/usr/lib/hue/build/env/bin/hue [07:16:33] 10Operations, 10ops-codfw, 10DBA: db2044 HW RAID failure - https://phabricator.wikimedia.org/T174764#3572438 (10jcrespo) [07:19:07] PROBLEM - recommendation_api endpoints health on scb1001 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target) timed out before a response was received [07:21:07] RECOVERY - recommendation_api endpoints health on scb1001 is OK: All endpoints are healthy [07:22:19] !log restart apache2 and hue on thorium, Analytics sites down, investigating [07:22:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:24:17] RECOVERY - salt-minion processes on thorium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [07:24:34] the host froze for a bit tonight - https://grafana.wikimedia.org/dashboard/file/server-board.json?var-server=thorium&refresh=1m&orgId=1&from=now-6h&to=now [07:25:25] and oom killer party [07:27:27] definitely something happened yesterday ~20:30 UTC and it changed its behavior [07:27:41] will continue debugging with my team [07:30:26] PROBLEM - recommendation_api endpoints health on scb1004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) timed out before a response was received [07:30:48] 10Operations, 10Puppet, 10Patch-For-Review, 10User-Joe: Switch all hosts to the future parser - https://phabricator.wikimedia.org/T171704#3572465 (10Joe) [07:30:49] 10Operations, 10Puppet, 10Patch-For-Review, 10User-Joe: Fix the `base::service_unit` template scoping problem - https://phabricator.wikimedia.org/T173078#3572463 (10Joe) 05Open>03Resolved a:03Joe [07:32:17] RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy [07:35:27] RECOVERY - Check systemd state on thorium is OK: OK - running: The system is fully operational [07:46:07] PROBLEM - recommendation_api endpoints health on scb1003 is CRITICAL: 
/{domain}/v1/translation/articles/{source}{/seed} (normal source and target) timed out before a response was received [07:47:07] RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy [07:49:47] PROBLEM - recommendation_api endpoints health on scb1001 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) timed out before a response was received [07:50:46] RECOVERY - recommendation_api endpoints health on scb1001 is OK: All endpoints are healthy [07:53:36] 10Operations, 10Goal, 10Kubernetes: Standardize on the "default" pod setup - https://phabricator.wikimedia.org/T170120#3572511 (10Joe) As far as logging goes, we have basically two big options: - Run a log collector as a sidecar. For this to work, all logs from the application must be sent to a specific por... [07:56:05] (03CR) 10Filippo Giunchedi: "> It actually conflicts with that change being merged, leaving puppet" [puppet] - 10https://gerrit.wikimedia.org/r/371582 (owner: 10Filippo Giunchedi) [07:58:31] (03PS1) 10Phedenskog: Make values stackable [puppet] - 10https://gerrit.wikimedia.org/r/375345 (https://phabricator.wikimedia.org/T104902) [07:59:03] (03CR) 10jerkins-bot: [V: 04-1] Make values stackable [puppet] - 10https://gerrit.wikimedia.org/r/375345 (https://phabricator.wikimedia.org/T104902) (owner: 10Phedenskog) [08:01:36] (03PS2) 10Phedenskog: Make values stackable [puppet] - 10https://gerrit.wikimedia.org/r/375345 (https://phabricator.wikimedia.org/T104902) [08:02:02] (03CR) 10jerkins-bot: [V: 04-1] Make values stackable [puppet] - 10https://gerrit.wikimedia.org/r/375345 (https://phabricator.wikimedia.org/T104902) (owner: 10Phedenskog) [08:02:29] (03PS1) 10Filippo Giunchedi: cassandra: fix jbod_device invocation [puppet] - 10https://gerrit.wikimedia.org/r/375346 [08:03:03] (03PS1) 10Tpt: pp_index table is not private [puppet] - 10https://gerrit.wikimedia.org/r/375347 (https://phabricator.wikimedia.org/T113842) [08:03:37] 
(03PS3) 10Phedenskog: Make values stackable [puppet] - 10https://gerrit.wikimedia.org/r/375345 (https://phabricator.wikimedia.org/T104902) [08:03:57] (03PS2) 10Tpt: pr_index table is not private [puppet] - 10https://gerrit.wikimedia.org/r/375347 (https://phabricator.wikimedia.org/T113842) [08:04:10] (03CR) 10jerkins-bot: [V: 04-1] Make values stackable [puppet] - 10https://gerrit.wikimedia.org/r/375345 (https://phabricator.wikimedia.org/T104902) (owner: 10Phedenskog) [08:05:17] (03PS4) 10Phedenskog: Make values stackable [puppet] - 10https://gerrit.wikimedia.org/r/375345 (https://phabricator.wikimedia.org/T104902) [08:08:25] (03CR) 10Filippo Giunchedi: [C: 032] cassandra: fix jbod_device invocation [puppet] - 10https://gerrit.wikimedia.org/r/375346 (owner: 10Filippo Giunchedi) [08:17:30] (03PS3) 10Jcrespo: mariadb: Implement regular logical backups using mydumper [puppet] - 10https://gerrit.wikimedia.org/r/374560 (https://phabricator.wikimedia.org/T169516) [08:17:32] (03PS1) 10Jcrespo: mariadb: Add script to generate watchlist_count table on labs [puppet] - 10https://gerrit.wikimedia.org/r/375349 (https://phabricator.wikimedia.org/T59617) [08:18:46] (03CR) 10Jcrespo: "This no longer works "as is" because it needs adapting to the new sanitarium hosts." 
[puppet] - 10https://gerrit.wikimedia.org/r/375349 (https://phabricator.wikimedia.org/T59617) (owner: 10Jcrespo) [08:26:56] PROBLEM - recommendation_api endpoints health on scb1003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) timed out before a response was received [08:27:46] RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy [08:33:28] (03CR) 10Jcrespo: [C: 031] "This seems to agree with security audit: https://phabricator.wikimedia.org/T103011#3536648" [puppet] - 10https://gerrit.wikimedia.org/r/375347 (https://phabricator.wikimedia.org/T113842) (owner: 10Tpt) [08:34:16] (03PS5) 10Filippo Giunchedi: Use absolute paths for `data_file_directories` [puppet] - 10https://gerrit.wikimedia.org/r/372469 (https://phabricator.wikimedia.org/T169939) (owner: 10Eevans) [08:38:24] (03PS6) 10Filippo Giunchedi: Use absolute paths for `data_file_directories` [puppet] - 10https://gerrit.wikimedia.org/r/372469 (https://phabricator.wikimedia.org/T169939) (owner: 10Eevans) [08:40:33] (03CR) 10Filippo Giunchedi: [C: 032] Use absolute paths for `data_file_directories` [puppet] - 10https://gerrit.wikimedia.org/r/372469 (https://phabricator.wikimedia.org/T169939) (owner: 10Eevans) [08:49:21] (03PS4) 10Filippo Giunchedi: Instance-configurable `heapdump_directory` [puppet] - 10https://gerrit.wikimedia.org/r/375048 (https://phabricator.wikimedia.org/T169939) (owner: 10Eevans) [08:50:46] PROBLEM - High lag on wdqs1002 is CRITICAL: CRITICAL: 64.71% of data above the critical threshold [1800.0] [08:51:09] (03CR) 10Filippo Giunchedi: [C: 032] Instance-configurable `heapdump_directory` [puppet] - 10https://gerrit.wikimedia.org/r/375048 (https://phabricator.wikimedia.org/T169939) (owner: 10Eevans) [08:55:20] (03PS4) 10Filippo Giunchedi: Configure `disk_failure_policy: best_effort` [puppet] - 10https://gerrit.wikimedia.org/r/375049 (https://phabricator.wikimedia.org/T169939) (owner: 10Eevans) [08:56:22] 
(03CR) 10Filippo Giunchedi: [C: 032] Configure `disk_failure_policy: best_effort` [puppet] - 10https://gerrit.wikimedia.org/r/375049 (https://phabricator.wikimedia.org/T169939) (owner: 10Eevans) [08:57:39] 10Operations, 10Goal, 10Kubernetes: Standardize on the "default" pod setup - https://phabricator.wikimedia.org/T170120#3572659 (10Joe) For metrics collection, my proposal (after a chat with @fgiunchedi) have another sidecar running `prometheus-statsd-exporter` in the modifed version we maintain. This will a... [08:58:47] PROBLEM - High lag on wdqs1002 is CRITICAL: CRITICAL: 44.00% of data above the critical threshold [1800.0] [08:59:43] ^ looking into wdqs1002, probably related to T174161 [08:59:44] T174161: updater failed during ~40 minutes on wdqs1002 - https://phabricator.wikimedia.org/T174161 [08:59:52] !log depool restbase200[135] before reimage - T169939 [09:00:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:00:05] T169939: End of August milestone: Cassandra 3 cluster in production - https://phabricator.wikimedia.org/T169939 [09:04:05] !log lvs1007 upgrade to pybal 1.13.11 - one-packet-scheduling, instrumentation fixes. T104442, T103882 [09:04:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:04:19] T104442: Investigate better DNS cache/lookup solutions - https://phabricator.wikimedia.org/T104442 [09:04:47] PROBLEM - graphite.wikimedia.org on graphite1001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 398 bytes in 0.001 second response time [09:05:07] ugh, I'll take a look at graphite [09:05:16] 10Operations, 10Goal, 10Kubernetes: Standardize on the "default" pod setup - https://phabricator.wikimedia.org/T170120#3572667 (10Joe) A containerized microservice environment should make developing and deploying applications as easy as possible. Given this goal, it is a good idea to abstract from the sing... 
[09:05:56] PROBLEM - High lag on wdqs1002 is CRITICAL: CRITICAL: 39.29% of data above the critical threshold [1800.0] [09:10:06] !log bounce graphite-web on graphite1001, problematic query? using all CPU [09:10:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:10:56] RECOVERY - graphite.wikimedia.org on graphite1001 is OK: HTTP OK: HTTP/1.1 200 OK - 1547 bytes in 0.008 second response time [09:10:57] RECOVERY - High lag on wdqs1002 is OK: OK: Less than 30.00% above the threshold [600.0] [09:11:06] PROBLEM - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [140.0] [09:12:20] there will be 500s alerts, that's graphite [09:14:10] (03CR) 10Ema: [C: 032] varnish: remove varnishtest-runner [puppet] - 10https://gerrit.wikimedia.org/r/374973 (https://phabricator.wikimedia.org/T150660) (owner: 10Ema) [09:14:15] (03PS2) 10Ema: varnish: remove varnishtest-runner [puppet] - 10https://gerrit.wikimedia.org/r/374973 (https://phabricator.wikimedia.org/T150660) [09:14:18] (03CR) 10Ema: [V: 032 C: 032] varnish: remove varnishtest-runner [puppet] - 10https://gerrit.wikimedia.org/r/374973 (https://phabricator.wikimedia.org/T150660) (owner: 10Ema) [09:25:34] 10Operations, 10ops-codfw: Degraded RAID on ms-be2023 - https://phabricator.wikimedia.org/T174777#3572690 (10ops-monitoring-bot) [09:26:37] 10Operations, 10ops-codfw: Degraded RAID on ms-be2023 - https://phabricator.wikimedia.org/T174777#3572694 (10fgiunchedi) This failed yesterday but for some reason `raid_handler` either didn't get called or got called but didn't work (cc @volans) ``` 23:46 -icinga-wm:#wikimedia-operations- PROBLEM - HP RAID on... 
[09:27:12] 10Operations, 10Discovery, 10Discovery-Analysis, 10Maps, and 2 others: What is a reasonable per-IP ratelimit for maps - https://phabricator.wikimedia.org/T169175#3572699 (10ema) [09:27:16] 10Operations, 10Analytics, 10Traffic, 10Patch-For-Review: Implement Varnish-level rough ratelimiting - https://phabricator.wikimedia.org/T163233#3572696 (10ema) 05Open>03Resolved a:03ema We've been using `vsthrottle` in prod for a while now, closing. [09:28:04] 10Operations, 10ops-codfw: Degraded RAID on ms-be2023 - https://phabricator.wikimedia.org/T174777#3572700 (10fgiunchedi) [09:28:29] 10Operations, 10ops-codfw: Degraded RAID on ms-be2023 - https://phabricator.wikimedia.org/T174777#3572690 (10fgiunchedi) a:03Papaul @papaul please replace the 3TB disk, thanks! [09:29:23] ACKNOWLEDGEMENT - puppet last run on ms-be2023 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 14 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[parted-/dev/sdc] Filippo Giunchedi T174777 [09:31:01] (03PS7) 10Giuseppe Lavagetto: role::mediawiki::memcached: switch to the future parser [puppet] - 10https://gerrit.wikimedia.org/r/371021 [09:35:20] (03PS3) 10Elukey: Add QuickSurvey schemas to EventLogging white-list [puppet] - 10https://gerrit.wikimedia.org/r/368769 (https://phabricator.wikimedia.org/T172112) (owner: 10Mforns) [09:36:06] (03CR) 10Elukey: [C: 032] Add QuickSurvey schemas to EventLogging white-list [puppet] - 10https://gerrit.wikimedia.org/r/368769 (https://phabricator.wikimedia.org/T172112) (owner: 10Mforns) [09:49:56] PROBLEM - salt-minion processes on thorium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [09:50:06] PROBLEM - Check systemd state on thorium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[09:50:27] PROBLEM - pivot on thorium is CRITICAL: connect to address 10.64.53.26 and port 9090: Connection refused [09:50:38] (03PS2) 10Elukey: Add Kartographer schema to EventLogging white-list [puppet] - 10https://gerrit.wikimedia.org/r/369350 (https://phabricator.wikimedia.org/T171622) (owner: 10Mforns) [09:50:43] * elukey cries [09:51:22] (03CR) 10Elukey: [C: 032] Add Kartographer schema to EventLogging white-list [puppet] - 10https://gerrit.wikimedia.org/r/369350 (https://phabricator.wikimedia.org/T171622) (owner: 10Mforns) [09:51:27] RECOVERY - pivot on thorium is OK: TCP OK - 0.000 second response time on 10.64.53.26 port 9090 [09:54:24] (03PS1) 10Ema: varnish: introduce rate limiting for maps [puppet] - 10https://gerrit.wikimedia.org/r/375354 (https://phabricator.wikimedia.org/T169175) [10:01:27] (03PS8) 10Giuseppe Lavagetto: role::mediawiki::memcached: switch to the future parser [puppet] - 10https://gerrit.wikimedia.org/r/371021 [10:02:57] RECOVERY - salt-minion processes on thorium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [10:03:01] (03CR) 10Giuseppe Lavagetto: [C: 032] role::mediawiki::memcached: switch to the future parser [puppet] - 10https://gerrit.wikimedia.org/r/371021 (owner: 10Giuseppe Lavagetto) [10:03:07] RECOVERY - Check systemd state on thorium is OK: OK - running: The system is fully operational [10:11:27] PROBLEM - HHVM rendering on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:12:19] Sep 1 10:11:01 mw1226 systemd[1]: hhvm.service: main process exited, code=killed, status=11/SEGV [10:12:29] RECOVERY - HHVM rendering on mw1226 is OK: HTTP OK: HTTP/1.1 200 OK - 80011 bytes in 0.735 second response time [10:14:16] (03PS2) 10Ema: varnish: introduce rate limiting for maps [puppet] - 10https://gerrit.wikimedia.org/r/375354 (https://phabricator.wikimedia.org/T169175) [10:18:30] 10Operations, 10ops-codfw, 10DBA: db2044 HW RAID failure - https://phabricator.wikimedia.org/T174764#3572787 
(10Marostegui) 05Open>03Resolved a:03Marostegui After rebooting the server again, everything looks good again and I see no more HW errors. I have started mysql and replication and everything is... [10:19:22] 10Operations, 10ops-codfw, 10DBA: db2044 HW RAID failure - https://phabricator.wikimedia.org/T174764#3572790 (10Marostegui) p:05Triage>03Normal [10:22:59] (03PS1) 10Marostegui: db-eqiad.php: Depool db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375358 (https://phabricator.wikimedia.org/T168661) [10:25:47] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375358 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [10:28:39] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375358 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [10:28:49] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375358 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [10:29:11] (03PS2) 10Giuseppe Lavagetto: parsoid: test commmit for T149432 [puppet] - 10https://gerrit.wikimedia.org/r/370168 [10:29:45] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1081 - T168661 (duration: 00m 43s) [10:29:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:30:00] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661 [10:30:23] (03PS1) 10Marostegui: mariadb: Update db1081 socket location [puppet] - 10https://gerrit.wikimedia.org/r/375359 (https://phabricator.wikimedia.org/T148507) [10:31:18] !log Upgrade MariaDB to 10.0.32 on db1081 - T168661 [10:31:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:35:31] !log stop puppet on thorium and disable root rsyncs - T174756 [10:35:44] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:35:44] T174756: rsync-published-datasets cron should not launch multiple rsync processes - https://phabricator.wikimedia.org/T174756 [10:38:25] (03CR) 10Marostegui: [C: 032] mariadb: Update db1081 socket location [puppet] - 10https://gerrit.wikimedia.org/r/375359 (https://phabricator.wikimedia.org/T148507) (owner: 10Marostegui) [10:49:21] (03PS1) 10Filippo Giunchedi: prometheus: fix jmx_exporter_config query [puppet] - 10https://gerrit.wikimedia.org/r/375361 [10:51:14] (03CR) 10Giuseppe Lavagetto: [C: 031] prometheus: fix jmx_exporter_config query [puppet] - 10https://gerrit.wikimedia.org/r/375361 (owner: 10Filippo Giunchedi) [10:51:24] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM, to be merged next week" [puppet] - 10https://gerrit.wikimedia.org/r/365619 (https://phabricator.wikimedia.org/T150734) (owner: 10Gilles) [10:53:07] (03CR) 10Filippo Giunchedi: "err, I re-read my earlier comment, it'd be cleaner (e.g. in case of revert) to logically split the patch between adding thumbor "support" " [puppet] - 10https://gerrit.wikimedia.org/r/365619 (https://phabricator.wikimedia.org/T150734) (owner: 10Gilles) [11:06:48] (03PS2) 10Filippo Giunchedi: prometheus: fix jmx_exporter_config query [puppet] - 10https://gerrit.wikimedia.org/r/375361 [11:07:30] (03CR) 10Filippo Giunchedi: [C: 032] prometheus: fix jmx_exporter_config query [puppet] - 10https://gerrit.wikimedia.org/r/375361 (owner: 10Filippo Giunchedi) [11:10:50] PROBLEM - puppet last run on prometheus1003 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [11:11:56] that's me ^ [11:12:44] (03PS1) 10Marostegui: db-eqiad.php: Repool db1081 with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375362 (https://phabricator.wikimedia.org/T168661) [11:20:00] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Repool db1081 with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375362 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [11:21:38] (03Merged) 10jenkins-bot: db-eqiad.php: Repool db1081 with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375362 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [11:21:48] (03CR) 10jenkins-bot: db-eqiad.php: Repool db1081 with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375362 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [11:23:26] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1081 with low weight - T168661 (duration: 00m 44s) [11:23:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:23:39] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661 [11:24:27] PROBLEM - puppet last run on prometheus1004 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [11:32:47] (03PS5) 10Phedenskog: Make values stackable [puppet] - 10https://gerrit.wikimedia.org/r/375345 (https://phabricator.wikimedia.org/T104902) [11:34:47] PROBLEM - salt-minion processes on thorium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [11:35:56] (03PS6) 10Phedenskog: Make values stackable [puppet] - 10https://gerrit.wikimedia.org/r/375345 (https://phabricator.wikimedia.org/T104902) [11:36:27] PROBLEM - pivot on thorium is CRITICAL: connect to address 10.64.53.26 and port 9090: Connection refused [11:37:47] PROBLEM - salt-minion processes on thorium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [11:38:06] PROBLEM - Check systemd state on thorium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [11:38:27] RECOVERY - pivot on thorium is OK: TCP OK - 0.000 second response time on 10.64.53.26 port 9090 [11:43:15] (03PS1) 10Marostegui: db-eqiad.php: Increase traffic for db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375363 (https://phabricator.wikimedia.org/T168661) [11:46:09] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase traffic for db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375363 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [11:47:44] (03Merged) 10jenkins-bot: db-eqiad.php: Increase traffic for db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375363 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [11:47:53] (03CR) 10jenkins-bot: db-eqiad.php: Increase traffic for db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375363 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [11:48:04] (03CR) 10Reedy: [C: 031] pr_index table is not private [puppet] - 10https://gerrit.wikimedia.org/r/375347 (https://phabricator.wikimedia.org/T113842) (owner: 10Tpt) 
[11:50:00] PROBLEM - salt-minion processes on thorium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [11:50:03] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase db1081 weight - T168661 (duration: 00m 43s) [11:50:10] PROBLEM - puppet last run on rdb1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [11:50:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:50:16] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661 [11:52:59] (03CR) 10Jcrespo: [C: 031] "The patch seems to be fine, but let's block its deployment on deleting unnecesary empty tables: https://phabricator.wikimedia.org/T113842#" [puppet] - 10https://gerrit.wikimedia.org/r/375347 (https://phabricator.wikimedia.org/T113842) (owner: 10Tpt) [12:04:08] Reedy: are yo around? [12:05:21] PROBLEM - Check systemd state on thorium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[12:07:16] 10Operations, 10ops-eqiad, 10DBA: Decommission db1026 - https://phabricator.wikimedia.org/T174763#3572931 (10Marostegui) [12:08:20] (03PS1) 10Marostegui: db-codfw,db-eqiad.php: Remove db1026 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375364 (https://phabricator.wikimedia.org/T174763) [12:13:59] (03CR) 10Marostegui: [C: 032] db-codfw,db-eqiad.php: Remove db1026 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375364 (https://phabricator.wikimedia.org/T174763) (owner: 10Marostegui) [12:16:27] (03Merged) 10jenkins-bot: db-codfw,db-eqiad.php: Remove db1026 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375364 (https://phabricator.wikimedia.org/T174763) (owner: 10Marostegui) [12:16:37] (03CR) 10jenkins-bot: db-codfw,db-eqiad.php: Remove db1026 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375364 (https://phabricator.wikimedia.org/T174763) (owner: 10Marostegui) [12:17:43] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Remove db1026 as it will be decommissioned - T174763 (duration: 00m 43s) [12:17:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:17:57] T174763: Decommission db1026 - https://phabricator.wikimedia.org/T174763 [12:18:21] RECOVERY - puppet last run on rdb1001 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [12:19:51] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Remove db1026 as it will be decommissioned - T174763 (duration: 00m 43s) [12:20:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:22:59] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: Decommission db1026 - https://phabricator.wikimedia.org/T174763#3572948 (10Marostegui) [12:23:11] RECOVERY - salt-minion processes on thorium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [12:23:13] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: Decommission db1026 - 
https://phabricator.wikimedia.org/T174763#3572381 (10Marostegui) [12:23:31] RECOVERY - Check systemd state on thorium is OK: OK - running: The system is fully operational [12:24:19] (03PS1) 10Marostegui: mariadb: Decommission db1026 [puppet] - 10https://gerrit.wikimedia.org/r/375365 (https://phabricator.wikimedia.org/T174763) [12:26:27] (03PS1) 10Marostegui: db-eqiad.php: Restore db1081 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375367 (https://phabricator.wikimedia.org/T168661) [12:26:36] (03CR) 10Marostegui: "https://puppet-compiler.wmflabs.org/compiler02/7686/" [puppet] - 10https://gerrit.wikimedia.org/r/375365 (https://phabricator.wikimedia.org/T174763) (owner: 10Marostegui) [12:26:41] RECOVERY - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] [12:27:25] (03PS2) 10Marostegui: mariadb: Decommission db1026 [puppet] - 10https://gerrit.wikimedia.org/r/375365 (https://phabricator.wikimedia.org/T174763) [12:28:35] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Restore db1081 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375367 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [12:31:25] (03Merged) 10jenkins-bot: db-eqiad.php: Restore db1081 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375367 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [12:31:38] (03CR) 10jenkins-bot: db-eqiad.php: Restore db1081 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375367 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [12:32:41] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Restore db1081 original weight - T168661 (duration: 00m 43s) [12:32:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:32:55] T168661: Apply schema change to add 3D filetype for STL files - 
https://phabricator.wikimedia.org/T168661 [12:33:34] 10Operations, 10Puppet, 10Patch-For-Review, 10User-Joe: Switch all hosts to the future parser - https://phabricator.wikimedia.org/T171704#3572968 (10Joe) [12:35:52] 10Operations, 10Puppet, 10Patch-For-Review, 10User-Joe: Switch all hosts to the future parser - https://phabricator.wikimedia.org/T171704#3572973 (10Joe) After my series of changes the situation looks much better: https://puppet-compiler.wmflabs.org/compiler02/7683/index-future.html there are 52 classes... [12:36:26] (03CR) 10Marostegui: [C: 032] mariadb: Decommission db1026 [puppet] - 10https://gerrit.wikimedia.org/r/375365 (https://phabricator.wikimedia.org/T174763) (owner: 10Marostegui) [12:39:01] (03PS1) 10Marostegui: s5.hosts: Remove db1026 [software] - 10https://gerrit.wikimedia.org/r/375370 (https://phabricator.wikimedia.org/T174763) [12:42:33] (03Abandoned) 10Giuseppe Lavagetto: mediawiki: fixes for the future parser [WiP] [puppet] - 10https://gerrit.wikimedia.org/r/356539 (owner: 10Giuseppe Lavagetto) [12:43:05] (03Abandoned) 10Giuseppe Lavagetto: profile::mariadb::maintenance: switch to codfw [puppet] - 10https://gerrit.wikimedia.org/r/350382 (owner: 10Giuseppe Lavagetto) [12:48:39] (03CR) 10Krinkle: Make values stackable (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/375345 (https://phabricator.wikimedia.org/T104902) (owner: 10Phedenskog) [12:51:02] (03CR) 10Qgil: [C: 031] Enable Newsletter on mediawikiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/364734 (https://phabricator.wikimedia.org/T110170) (owner: 10Addshore) [12:54:35] (03PS3) 10Krinkle: webperf: Convert ve.py from ZMQ to KafkaConsumer [puppet] - 10https://gerrit.wikimedia.org/r/375106 (https://phabricator.wikimedia.org/T110903) [12:56:32] (03CR) 10Gehel: [C: 031] "LGTM (as far as I understand VCL)" [puppet] - 10https://gerrit.wikimedia.org/r/375354 (https://phabricator.wikimedia.org/T169175) (owner: 10Ema) [12:59:50] (03CR) 10Gehel: [C: 031] "LGTM" [puppet] 
- 10https://gerrit.wikimedia.org/r/365619 (https://phabricator.wikimedia.org/T150734) (owner: 10Gilles) [13:08:47] (03CR) 10Marostegui: [C: 032] s5.hosts: Remove db1026 [software] - 10https://gerrit.wikimedia.org/r/375370 (https://phabricator.wikimedia.org/T174763) (owner: 10Marostegui) [13:09:30] (03Merged) 10jenkins-bot: s5.hosts: Remove db1026 [software] - 10https://gerrit.wikimedia.org/r/375370 (https://phabricator.wikimedia.org/T174763) (owner: 10Marostegui) [13:10:27] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: Decommission db1026 - https://phabricator.wikimedia.org/T174763#3573026 (10Marostegui) [13:10:40] !log Stop MySQL on db1026 as it will be decommissioned - T174763 [13:10:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:10:53] T174763: Decommission db1026 - https://phabricator.wikimedia.org/T174763 [13:13:34] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: Decommission db1026 - https://phabricator.wikimedia.org/T174763#3573040 (10Marostegui) a:03Cmjohnson db1026 is now ready to be decommissioned and all the pending steps are DC Ops ones, so I am handing this over to @Cmjohnson [13:26:08] (03PS1) 10Filippo Giunchedi: prometheus: fix target collection for jmx_exporter_config [puppet] - 10https://gerrit.wikimedia.org/r/375375 [13:42:33] !log Rename table pr_index on enwiki on db1089 - T174782 [13:42:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:42:48] T174782: Drop pr_index from wikis where ProofreadPage isn't enabled - https://phabricator.wikimedia.org/T174782 [13:46:40] (03CR) 10Giuseppe Lavagetto: [C: 031] prometheus: fix target collection for jmx_exporter_config [puppet] - 10https://gerrit.wikimedia.org/r/375375 (owner: 10Filippo Giunchedi) [13:47:00] 10Operations, 10fundraising-tech-ops, 10netops: bonded/redundant network connections for fundraising hosts - https://phabricator.wikimedia.org/T171962#3573128 (10Jgreen) a:03Jgreen [13:47:50] (03PS2) 10Filippo 
Giunchedi: prometheus: fix target collection for jmx_exporter_config [puppet] - 10https://gerrit.wikimedia.org/r/375375 [13:48:03] 10Operations, 10fundraising-tech-ops, 10netops: bonded/redundant network connections for fundraising hosts - https://phabricator.wikimedia.org/T171962#3481495 (10Jgreen) @jgreen configured active-backup for frdb2001. I stopped mysql first, and the reconfigure seemed to go smoothly without a reboot this time [13:48:33] (03CR) 10Filippo Giunchedi: [C: 032] prometheus: fix target collection for jmx_exporter_config [puppet] - 10https://gerrit.wikimedia.org/r/375375 (owner: 10Filippo Giunchedi) [13:50:21] RECOVERY - puppet last run on prometheus1003 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [13:51:52] RECOVERY - puppet last run on prometheus1004 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [14:28:09] !log Add 150G to /srv partition on labsdb1001 [14:28:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:33:13] jdlrobson: A few hours later.. Yeah I am [14:38:02] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3573274 (10Joe) To recap quickly the plan: Servers already present (in parentheses, servers to be decommissioned): Appserver: 20 row D, 15 row A,  (3 row d,... [14:46:46] (03PS1) 10Filippo Giunchedi: role: drop restbase table/cf 'meta' metrics in Prometheus [puppet] - 10https://gerrit.wikimedia.org/r/375388 [14:51:20] Reedy: just emailed releng [14:51:36] missed your ping!.. i don't suppose you are able to help with an emergency deploy? [14:51:57] Probably, yeah [14:52:00] What's up? [14:52:10] https://phabricator.wikimedia.org/T174724 [14:52:20] https://gerrit.wikimedia.org/r/375360 fixes it [14:53:25] Reedy: but i dont have any deployers available and i'm guessing greg-g will not be around until maybe an hour? 
[14:53:46] * Reedy CR+2's it [14:55:09] Amusingly, you pinged me originally literally just after I'd gone out [14:55:17] Reedy: hah [14:55:41] 10Operations, 10monitoring: Investigate check_nrpe -u option to reduce critical alerts - https://phabricator.wikimedia.org/T172131#3573320 (10herron) [14:55:51] PROBLEM - Host cp4024 is DOWN: PING CRITICAL - Packet loss = 100% [14:56:07] 10Operations, 10monitoring: Investigate check_nrpe -u option to reduce critical alerts - https://phabricator.wikimedia.org/T172131#3486701 (10herron) 05Open>03Resolved [14:56:11] RECOVERY - Host cp4024 is UP: PING OK - Packet loss = 0%, RTA = 78.57 ms [14:57:39] (03PS2) 10Filippo Giunchedi: role: drop restbase table/cf 'meta' metrics in Prometheus [puppet] - 10https://gerrit.wikimedia.org/r/375388 [14:58:46] (03CR) 10Filippo Giunchedi: [C: 032] role: drop restbase table/cf 'meta' metrics in Prometheus [puppet] - 10https://gerrit.wikimedia.org/r/375388 (owner: 10Filippo Giunchedi) [15:00:58] !log reedy@tin Synchronized php-1.30.0-wmf.16/extensions/Popups/: T174724 (duration: 00m 46s) [15:01:06] jdlrobson: done [15:01:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:01:11] T174724: [Regression] Page previews stopped working on wikis where jQuery 3 is not available - https://phabricator.wikimedia.org/T174724 [15:01:17] Reedy: all swatteed?! [15:01:52] It's merged and deployed, ja [15:02:17] Reedy: okay just waiting for js to updte.. [15:05:03] (03CR) 10ArielGlenn: "Welp, found something else during a dry run: most mw hosts don't have /usr/local/bin/expanddblist and neither do the snapshots. 
Can you u" [puppet] - 10https://gerrit.wikimedia.org/r/373354 (https://phabricator.wikimedia.org/T173892) (owner: 10Smalyshev) [15:05:25] w00t [15:05:28] thank you Reedy thank you [15:06:53] !log Restarting Cassandra: restbase-dev1004-{a,b} [15:07:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:07:40] np [15:10:33] thanks a lot Reedy [15:11:17] did verification on ca.wikipedia.org, all good now [15:15:44] !log Restarting Cassandra: restbase-dev100[5-6]-{a,b} [15:15:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:18:51] (03PS1) 10Eevans: Pin current Cassandra build (3.11.0-wmf3) [puppet] - 10https://gerrit.wikimedia.org/r/375392 (https://phabricator.wikimedia.org/T169939) [15:24:28] sweet [15:30:51] ACKNOWLEDGEMENT - puppet last run on restbase-dev1004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[cassandra] eevans Pending https://gerrit.wikimedia.org/r/375392 to fix Cassandra package pinning [15:30:51] ACKNOWLEDGEMENT - puppet last run on restbase-dev1005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[cassandra] eevans Pending https://gerrit.wikimedia.org/r/375392 to fix Cassandra package pinning [15:30:51] ACKNOWLEDGEMENT - puppet last run on restbase-dev1006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 14 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[cassandra] eevans Pending https://gerrit.wikimedia.org/r/375392 to fix Cassandra package pinning [15:32:42] <_joe_> urandom: ^^ [15:32:55] _joe_: ?? [15:33:14] <_joe_> should we do that? [15:33:36] you mean the ack? 
[15:33:42] <_joe_> no, merging the change [15:33:46] oh, yeah [15:33:53] it just fixes the pinning [15:33:59] <_joe_> I wanted to know if you prefer to do it on monday or now [15:34:00] to match the package that is installed [15:34:12] it won't hit any prod machines [15:34:23] <_joe_> yeah I was checking [15:34:39] it also won't help anything other than removing some red from icinga [15:34:49] (03CR) 10Giuseppe Lavagetto: [C: 032] Pin current Cassandra build (3.11.0-wmf3) [puppet] - 10https://gerrit.wikimedia.org/r/375392 (https://phabricator.wikimedia.org/T169939) (owner: 10Eevans) [15:34:53] (because those machines aren't production) [15:35:00] _joe_: thanks! [15:37:21] RECOVERY - puppet last run on restbase-dev1004 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [15:37:56] (03CR) 10Jcrespo: "Andrew: This is blocked on you greenlighting it." [puppet] - 10https://gerrit.wikimedia.org/r/362217 (owner: 10Jcrespo) [15:38:21] RECOVERY - puppet last run on restbase-dev1006 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [15:38:30] RECOVERY - puppet last run on restbase-dev1005 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [15:41:14] (03CR) 10Andrew Bogott: [C: 031] "Sorry for the slow response -- this looks totally fine to me. Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/362217 (owner: 10Jcrespo) [15:43:10] 10Operations, 10ops-eqiad, 10Analytics-Cluster, 10Analytics-Kanban, and 2 others: kafka-jumbo.cfg partman recipe creation/troubleshooting - https://phabricator.wikimedia.org/T174457#3573471 (10elukey) Assuming that the 90% maximum limit of guided_size is not applied in the expert recipe (to be verified), I... 
[15:45:31] (03PS2) 10Elukey: Tune the kafka-jumbo.cfg partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/375002 (https://phabricator.wikimedia.org/T174457) [15:47:11] (03CR) 10Elukey: [C: 032] Tune the kafka-jumbo.cfg partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/375002 (https://phabricator.wikimedia.org/T174457) (owner: 10Elukey) [15:52:49] (03CR) 10Thcipriani: [C: 031] "Looks good!" [puppet] - 10https://gerrit.wikimedia.org/r/375112 (https://phabricator.wikimedia.org/T129154) (owner: 10BryanDavis) [16:15:46] PROBLEM - Check Varnish expiry mailbox lag on cp1074 is CRITICAL: CRITICAL: expiry mailbox lag is 2038941 [16:18:08] !log restart varnish backend on cp1074 (mailbox lag) [16:18:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:19:28] (03PS1) 10Eevans: Fixup (obviously) typo'd data_file_directories entries [puppet] - 10https://gerrit.wikimedia.org/r/375400 (https://phabricator.wikimedia.org/T169939) [16:25:46] RECOVERY - Check Varnish expiry mailbox lag on cp1074 is OK: OK: expiry mailbox lag is 0 [16:30:58] 10Operations, 10ops-eqiad, 10DBA: Decommission db1045 - https://phabricator.wikimedia.org/T174806#3573553 (10Marostegui) [16:31:16] 10Operations, 10ops-eqiad, 10DBA: Decommission db1045 - https://phabricator.wikimedia.org/T174806#3573566 (10Marostegui) p:05Triage>03Normal [16:53:25] (03CR) 10BryanDavis: "App changes have landed. How do we coordinate the switch over?" [puppet] - 10https://gerrit.wikimedia.org/r/375112 (https://phabricator.wikimedia.org/T129154) (owner: 10BryanDavis) [16:54:47] (03CR) 10Chad: "I can't speak for public/index.php, but the rest of that can just go out whenever using Trebuchet--it'll just ignore it. 
Then you can land" [puppet] - 10https://gerrit.wikimedia.org/r/375112 (https://phabricator.wikimedia.org/T129154) (owner: 10BryanDavis) [16:56:50] 10Operations, 10Commons, 10MediaWiki-extensions-Scribunto, 10Patch-For-Review, 10Wikimedia-log-errors: Some Commons pages transcluding Template:Countries_of_Europe HTTP 500/503 when accessed from non-English languages specified in the template - https://phabricator.wikimedia.org/T171392#3573615 (10zhuyife... [16:56:59] 10Operations, 10ops-eqiad, 10Cloud-Services, 10Patch-For-Review: rack/setup/install labstore100[67].wikimedia.org - https://phabricator.wikimedia.org/T167984#3573619 (10RobH) I can see the issue, both labstore1006 and labstore1007 have it. When checking the boot order in bios, it lists the following: ``... [16:57:40] (03CR) 10BryanDavis: "> Then you can land this bit any time after." [puppet] - 10https://gerrit.wikimedia.org/r/375112 (https://phabricator.wikimedia.org/T129154) (owner: 10BryanDavis) [17:00:19] (03CR) 10Alexandros Kosiaris: [C: 032] Rebuild for stretch [debs/cni] - 10https://gerrit.wikimedia.org/r/375000 (owner: 10Alexandros Kosiaris) [17:04:47] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Minor comment" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/375354 (https://phabricator.wikimedia.org/T169175) (owner: 10Ema) [17:16:05] (03CR) 10Chad: "Nope, it should Just Work after puppet runs" [puppet] - 10https://gerrit.wikimedia.org/r/375112 (https://phabricator.wikimedia.org/T129154) (owner: 10BryanDavis) [17:35:05] PROBLEM - HHVM rendering on mw1289 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.919 second response time [17:35:06] PROBLEM - Apache HTTP on mw1289 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time [17:36:06] RECOVERY - HHVM rendering on mw1289 is OK: HTTP OK: HTTP/1.1 200 OK - 79916 bytes in 0.291 second response time [17:36:06] RECOVERY - Apache HTTP on mw1289 is OK: HTTP OK: HTTP/1.1 301 
Moved Permanently - 612 bytes in 0.028 second response time [17:57:56] (03CR) 10Herron: WIP: Add standalone letsencrypt nginx template (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/375071 (https://phabricator.wikimedia.org/T174720) (owner: 10Herron) [17:58:00] (03PS1) 10Urbanecm: Update logo for sr.wikibooks.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375409 (https://phabricator.wikimedia.org/T172284) [18:06:45] (03PS2) 10Herron: WIP: Add standalone letsencrypt nginx template [puppet] - 10https://gerrit.wikimedia.org/r/375071 (https://phabricator.wikimedia.org/T174720) [18:07:06] (03CR) 10jerkins-bot: [V: 04-1] WIP: Add standalone letsencrypt nginx template [puppet] - 10https://gerrit.wikimedia.org/r/375071 (https://phabricator.wikimedia.org/T174720) (owner: 10Herron) [18:07:48] (03PS3) 10Herron: WIP: Add standalone letsencrypt nginx template [puppet] - 10https://gerrit.wikimedia.org/r/375071 (https://phabricator.wikimedia.org/T174720) [18:08:29] (03CR) 10Zoranzoki21: [C: 031] Update logo for sr.wikibooks.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/375409 (https://phabricator.wikimedia.org/T172284) (owner: 10Urbanecm) [18:22:44] (03PS1) 10Eevans: Allow more per-instance overrides [puppet] - 10https://gerrit.wikimedia.org/r/375414 (https://phabricator.wikimedia.org/T169939) [18:22:46] (03PS1) 10Eevans: Configure restbase2001 instance data paths [puppet] - 10https://gerrit.wikimedia.org/r/375415 (https://phabricator.wikimedia.org/T169939) [18:34:35] (03PS2) 10Eevans: Allow more per-instance overrides [puppet] - 10https://gerrit.wikimedia.org/r/375414 (https://phabricator.wikimedia.org/T169939) [18:34:37] (03PS2) 10Eevans: Configure restbase2001 instance data paths [puppet] - 10https://gerrit.wikimedia.org/r/375415 (https://phabricator.wikimedia.org/T169939) [18:34:57] (03CR) 10jerkins-bot: [V: 04-1] Allow more per-instance overrides [puppet] - 10https://gerrit.wikimedia.org/r/375414 (https://phabricator.wikimedia.org/T169939) 
(owner: 10Eevans) [18:37:53] (03PS3) 10Eevans: Allow more per-instance overrides [puppet] - 10https://gerrit.wikimedia.org/r/375414 (https://phabricator.wikimedia.org/T169939) [18:37:55] (03PS3) 10Eevans: Configure restbase2001 instance data paths [puppet] - 10https://gerrit.wikimedia.org/r/375415 (https://phabricator.wikimedia.org/T169939) [18:41:52] (03CR) 10Eevans: [C: 031] "This is a no-op; See [PC output](http://puppet-compiler.wmflabs.org/7688)" [puppet] - 10https://gerrit.wikimedia.org/r/375414 (https://phabricator.wikimedia.org/T169939) (owner: 10Eevans) [18:45:59] (03CR) 10Eevans: "[PC output](http://puppet-compiler.wmflabs.org/7689)" [puppet] - 10https://gerrit.wikimedia.org/r/375415 (https://phabricator.wikimedia.org/T169939) (owner: 10Eevans) [18:46:57] (03PS7) 10Phedenskog: Make values stackable [puppet] - 10https://gerrit.wikimedia.org/r/375345 (https://phabricator.wikimedia.org/T104902) [19:30:50] 10Operations, 10Traffic: Fix broken referer categorization for visits from Safari browsers - https://phabricator.wikimedia.org/T154702#3574065 (10JKatzWMF) @TheDJ Thanks for the heads up! [19:39:37] (03PS1) 10Herron: WIP: Add letsencrypt certs to mx servers [puppet] - 10https://gerrit.wikimedia.org/r/375427 (https://phabricator.wikimedia.org/T174081) [19:39:57] (03CR) 10jerkins-bot: [V: 04-1] WIP: Add letsencrypt certs to mx servers [puppet] - 10https://gerrit.wikimedia.org/r/375427 (https://phabricator.wikimedia.org/T174081) (owner: 10Herron) [20:20:29] (03PS2) 10Herron: WIP: Add letsencrypt certs to mx servers [puppet] - 10https://gerrit.wikimedia.org/r/375427 (https://phabricator.wikimedia.org/T174081) [20:20:53] (03CR) 10jerkins-bot: [V: 04-1] WIP: Add letsencrypt certs to mx servers [puppet] - 10https://gerrit.wikimedia.org/r/375427 (https://phabricator.wikimedia.org/T174081) (owner: 10Herron) [20:24:18] Could someone please give access to http://logstash.wikimedia.org/ for myself and DMaza (both WMF) ? 
[20:24:47] logstash and grafana [20:25:22] I think you have to fill a task out im not quite sure if that applies to wmf staff though [20:25:29] in other words, they're missing the WMF group or something [20:25:44] we should reeeally do this as part of onboarding [20:25:55] what could possibly go wrong [20:26:15] Krenair: well... [20:27:00] Zppix where should the task go? [20:27:20] I maybe wrong... let me double check wikitech [20:27:37] (03PS3) 10Herron: WIP: Add letsencrypt certs to mx servers [puppet] - 10https://gerrit.wikimedia.org/r/375427 (https://phabricator.wikimedia.org/T174081) [20:28:03] (03CR) 10jerkins-bot: [V: 04-1] WIP: Add letsencrypt certs to mx servers [puppet] - 10https://gerrit.wikimedia.org/r/375427 (https://phabricator.wikimedia.org/T174081) (owner: 10Herron) [20:29:27] Krenair do you know if they have to file an access req? [20:29:35] yes [20:29:46] Ok [20:30:02] LDAP-Access-Requests [20:30:12] davidwbarratt: ^ [20:30:31] On phab [20:30:43] you'll want to apply to be added to the 'wmf' group [20:30:44] https://wikitech.wikimedia.org/wiki/LDAP_Groups#Specific_groups [20:30:52] kk thanks! do you want seperate tickets for DMaza and myself or just one ticket? [20:31:08] I believe seperate [20:31:20] Zppix kk, thanks! [20:31:28] Np [20:45:28] Zppix done. [20:45:36] Ok [20:46:11] Someone will be around to look at it at some point i dont have any access to do anything to act on it (im just a volunteer dev) [20:50:44] Zppix no problem. thanks! [20:52:21] davidwbarratt, you may need to get dmaza to post there [20:52:22] not sure [21:19:56] (03PS1) 10RobH: add wmf employee dmaza to admin module [puppet] - 10https://gerrit.wikimedia.org/r/375445 (https://phabricator.wikimedia.org/T174828) [21:20:56] (03CR) 10RobH: [C: 032] add wmf employee dmaza to admin module [puppet] - 10https://gerrit.wikimedia.org/r/375445 (https://phabricator.wikimedia.org/T174828) (owner: 10RobH) [21:21:41] Krenair, you need me to post where? 
[21:22:03] on the ticket [21:22:12] but it's not me that needs anything [21:22:16] it seems they're doing it anyway [21:22:55] oh ok [21:25:13] DMaza: heh you are all set [21:25:25] i knew you were legit cuz i could see what email account is tied to that ldap account [21:31:07] robh, are those even verified before they go into ldap? [21:31:32] what do you mean? I look up there is a valid wikitech account that is activated and tied to a wikimedia.org email address [21:31:36] and that they are indeed staff [21:31:50] and they have to then be listed in the admins module, so i add them [21:32:14] when someone uses an account that isnt tied to @wikimedia.org and they want the wmf staff flag, then i tend to ask for more verification [21:32:27] like the user stating they dont ahve a staff wikimediaorg account and they want this account used, etc... [21:33:21] if its an nda flag request, i am one of the opsen which legal/hr shared a google sheet listing everyone with an nda on file [21:33:30] so i can confirm against that. [21:33:42] (there is one ldap request that has no nda on file, so its blocked on that user) [21:34:37] all requests ive done are for wmf or nda flags, so thats about it? (the nda flag also tends to have a wmf staff requesting or sponsoring the request since the nda flag is typically for volunteers) [21:34:39] robh, thank you [21:34:47] you too Krenair [21:35:27] welcome =] [21:36:52] we should reeeally do this as part of onboarding (03CR) 10Zoranzoki21: [C: 031] Block vandalism IP that repeatedly added comments / uploaded files [puppet] - 10https://gerrit.wikimedia.org/r/370630 (owner: 10Aklapper) [22:17:44] (03CR) 10Smalyshev: "@ArielGlenn sure, what would you recommend? Where would one store the list of the hosts?" [puppet] - 10https://gerrit.wikimedia.org/r/373354 (https://phabricator.wikimedia.org/T173892) (owner: 10Smalyshev) [22:18:07] apergos: ping? 
[22:18:12] (03PS1) 10Andrew Bogott: labmon: prometheus classes to monitor the keystone api endpoint [puppet] - 10https://gerrit.wikimedia.org/r/375452 [22:18:34] (03CR) 10jerkins-bot: [V: 04-1] labmon: prometheus classes to monitor the keystone api endpoint [puppet] - 10https://gerrit.wikimedia.org/r/375452 (owner: 10Andrew Bogott) [22:21:08] (03PS2) 10Andrew Bogott: labmon: prometheus classes to monitor the keystone api endpoint [puppet] - 10https://gerrit.wikimedia.org/r/375452 [22:23:03] (03CR) 10Andrew Bogott: "@filippo, this role is applied to andrewclient.puppet.wmflabs.org but I can't make the dashboard load at all... am I making an obvious mis" [puppet] - 10https://gerrit.wikimedia.org/r/375452 (owner: 10Andrew Bogott) [22:24:09] 10Operations, 10Traffic, 10Browser-Support-Apple-Safari, 10Upstream: Fix broken referer categorization for visits from Safari browsers - https://phabricator.wikimedia.org/T154702#3574476 (10TheDJ) [22:27:57] (03PS3) 10GeoffreyT2000: Rename Wikisaurus namespace on Wiktionary to "Thesaurus" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/374063 (https://phabricator.wikimedia.org/T174264) [22:31:55] (03CR) 10Smalyshev: "I've converted it to explicitly stating the dblist and using cat, since I don't really need the expand functionality." [puppet] - 10https://gerrit.wikimedia.org/r/373354 (https://phabricator.wikimedia.org/T173892) (owner: 10Smalyshev) [22:32:09] (03PS7) 10Smalyshev: Add RDF dumps for categories [puppet] - 10https://gerrit.wikimedia.org/r/373354 (https://phabricator.wikimedia.org/T173892)