[00:01:00] RECOVERY - Check systemd state on gerrit2001 is OK: OK - running: The system is fully operational [00:01:20] RECOVERY - gerrit process on gerrit2001 is OK: PROCS OK: 1 process with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [00:09:20] PROBLEM - gerrit process on gerrit2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [00:10:00] PROBLEM - Check systemd state on gerrit2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [00:31:10] RECOVERY - Check systemd state on gerrit2001 is OK: OK - running: The system is fully operational [00:31:31] RECOVERY - gerrit process on gerrit2001 is OK: PROCS OK: 1 process with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [00:40:11] PROBLEM - Check systemd state on gerrit2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [00:40:31] PROBLEM - gerrit process on gerrit2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [01:00:41] RECOVERY - gerrit process on gerrit2001 is OK: PROCS OK: 1 process with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [01:01:20] RECOVERY - Check systemd state on gerrit2001 is OK: OK - running: The system is fully operational [01:09:30] PROBLEM - Check systemd state on gerrit2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[01:09:50] PROBLEM - gerrit process on gerrit2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [01:31:40] RECOVERY - Check systemd state on gerrit2001 is OK: OK - running: The system is fully operational [01:32:00] RECOVERY - gerrit process on gerrit2001 is OK: PROCS OK: 1 process with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [01:40:00] PROBLEM - gerrit process on gerrit2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [01:40:31] what is it? [01:40:40] PROBLEM - Check systemd state on gerrit2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [01:42:02] I think it may be because downtime has either expired or something changed today [01:42:31] See T176532 [01:42:31] T176532: Gerrit is failing to connect to db on gerrit2001 thus preventing systemd from working - https://phabricator.wikimedia.org/T176532 [02:01:50] RECOVERY - Check systemd state on gerrit2001 is OK: OK - running: The system is fully operational [02:02:10] RECOVERY - gerrit process on gerrit2001 is OK: PROCS OK: 1 process with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [02:08:26] !log l10nupdate@tin LocalisationUpdate failed: git pull of extensions failed [02:08:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:08:52] Hmm, failed? 
[02:09:05] thcipriani: ^^ [02:10:20] PROBLEM - gerrit process on gerrit2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [02:11:00] PROBLEM - Check systemd state on gerrit2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [02:31:01] RECOVERY - Check systemd state on gerrit2001 is OK: OK - running: The system is fully operational [02:31:21] RECOVERY - gerrit process on gerrit2001 is OK: PROCS OK: 1 process with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [02:40:10] PROBLEM - Check systemd state on gerrit2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [02:40:30] PROBLEM - gerrit process on gerrit2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [03:01:20] RECOVERY - Check systemd state on gerrit2001 is OK: OK - running: The system is fully operational [03:01:40] RECOVERY - gerrit process on gerrit2001 is OK: PROCS OK: 1 process with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [03:09:41] PROBLEM - gerrit process on gerrit2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [03:10:21] PROBLEM - Check systemd state on gerrit2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[03:27:20] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 797.08 seconds [03:31:31] RECOVERY - Check systemd state on gerrit2001 is OK: OK - running: The system is fully operational [03:32:00] RECOVERY - gerrit process on gerrit2001 is OK: PROCS OK: 1 process with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [03:40:00] PROBLEM - gerrit process on gerrit2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [03:40:40] PROBLEM - Check systemd state on gerrit2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [03:55:15] (PS1) Andrew Bogott: labweb wikitech: greatly simplify vhost file [puppet] - https://gerrit.wikimedia.org/r/416377 [03:56:01] (CR) Andrew Bogott: [C: 2] labweb wikitech: greatly simplify vhost file [puppet] - https://gerrit.wikimedia.org/r/416377 (owner: Andrew Bogott) [04:01:10] RECOVERY - gerrit process on gerrit2001 is OK: PROCS OK: 1 process with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [04:01:50] RECOVERY - Check systemd state on gerrit2001 is OK: OK - running: The system is fully operational [04:08:30] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 247.73 seconds [04:09:51] PROBLEM - Check systemd state on gerrit2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[04:10:20] PROBLEM - gerrit process on gerrit2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [04:31:01] RECOVERY - Check systemd state on gerrit2001 is OK: OK - running: The system is fully operational [04:31:30] RECOVERY - gerrit process on gerrit2001 is OK: PROCS OK: 1 process with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [04:40:11] PROBLEM - Check systemd state on gerrit2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [04:40:30] PROBLEM - gerrit process on gerrit2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [05:01:20] RECOVERY - Check systemd state on gerrit2001 is OK: OK - running: The system is fully operational [05:01:40] RECOVERY - gerrit process on gerrit2001 is OK: PROCS OK: 1 process with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [05:10:21] PROBLEM - Check systemd state on gerrit2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[05:10:50] PROBLEM - gerrit process on gerrit2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [05:31:40] RECOVERY - Check systemd state on gerrit2001 is OK: OK - running: The system is fully operational [05:32:00] RECOVERY - gerrit process on gerrit2001 is OK: PROCS OK: 1 process with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [05:33:21] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 23 probes of 294 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [05:38:21] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 10 probes of 294 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [05:40:01] PROBLEM - gerrit process on gerrit2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [05:40:40] PROBLEM - Check systemd state on gerrit2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[05:50:21] PROBLEM - ensure kvm processes are running on labvirt1015 is CRITICAL: PROCS CRITICAL: 76 processes with regex args /usr/bin/kvm [05:51:21] RECOVERY - ensure kvm processes are running on labvirt1015 is OK: PROCS OK: 75 processes with regex args /usr/bin/kvm [06:01:11] RECOVERY - gerrit process on gerrit2001 is OK: PROCS OK: 1 process with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [06:01:22] PROBLEM - ensure kvm processes are running on labvirt1015 is CRITICAL: PROCS CRITICAL: 78 processes with regex args /usr/bin/kvm [06:01:50] RECOVERY - Check systemd state on gerrit2001 is OK: OK - running: The system is fully operational [06:03:22] RECOVERY - ensure kvm processes are running on labvirt1015 is OK: PROCS OK: 75 processes with regex args /usr/bin/kvm [06:09:25] (PS1) Andrew Bogott: Change the labvirt kvm test to allow for many more processes [puppet] - https://gerrit.wikimedia.org/r/416380 [06:10:00] PROBLEM - Check systemd state on gerrit2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [06:10:20] PROBLEM - gerrit process on gerrit2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [06:31:01] RECOVERY - Check systemd state on gerrit2001 is OK: OK - running: The system is fully operational [06:31:30] RECOVERY - gerrit process on gerrit2001 is OK: PROCS OK: 1 process with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [06:40:10] PROBLEM - Check systemd state on gerrit2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[06:40:30] PROBLEM - gerrit process on gerrit2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [07:01:20] RECOVERY - Check systemd state on gerrit2001 is OK: OK - running: The system is fully operational [07:01:40] RECOVERY - gerrit process on gerrit2001 is OK: PROCS OK: 1 process with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site [07:05:01] added a week of downtime to gerrit2 to avoid icinga spam [07:05:42] Operations, DBA, Gerrit, Patch-For-Review, Release-Engineering-Team (Next): Gerrit is failing to connect to db on gerrit2001 thus preventing systemd from working - https://phabricator.wikimedia.org/T176532#3628916 (elukey) Just added a week of downtime to gerrit2001 since icinga was spamming. [07:06:32] !log Deploy schema change on s2 primary master db1054 - T185128 T153182 [07:06:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:06:48] T153182: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182 [07:06:49] T185128: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128 [07:08:18] mutante: o/ - not sure what's best for gerrit2001, probably a week of downtime is only temp and not the best one [07:10:33] Operations, ops-eqiad, DBA: Move db1069 to A1 - https://phabricator.wikimedia.org/T186699#4022894 (Marostegui) @Cmjohnson let us know if you have time to do this sometime this week. Thanks! 
[07:13:41] (PS1) Marostegui: db-eqiad,db-codfw.php: Remove db1073 [mediawiki-config] - https://gerrit.wikimedia.org/r/416383 (https://phabricator.wikimedia.org/T183469) [07:15:38] (CR) Marostegui: [C: 2] db-eqiad,db-codfw.php: Remove db1073 [mediawiki-config] - https://gerrit.wikimedia.org/r/416383 (https://phabricator.wikimedia.org/T183469) (owner: Marostegui) [07:15:45] (PS1) Marostegui: db1073: Disable notification [puppet] - https://gerrit.wikimedia.org/r/416384 [07:16:27] (CR) Marostegui: [C: 2] db1073: Disable notification [puppet] - https://gerrit.wikimedia.org/r/416384 (owner: Marostegui) [07:17:38] (Merged) jenkins-bot: db-eqiad,db-codfw.php: Remove db1073 [mediawiki-config] - https://gerrit.wikimedia.org/r/416383 (https://phabricator.wikimedia.org/T183469) (owner: Marostegui) [07:17:53] (CR) jenkins-bot: db-eqiad,db-codfw.php: Remove db1073 [mediawiki-config] - https://gerrit.wikimedia.org/r/416383 (https://phabricator.wikimedia.org/T183469) (owner: Marostegui) [07:18:52] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Remove db1073 from config (duration: 00m 59s) [07:19:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:20:04] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Remove db1073 from config (duration: 00m 58s) [07:20:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:26:24] (PS1) Marostegui: mariadb: Move db1073 to m5 [puppet] - https://gerrit.wikimedia.org/r/416385 (https://phabricator.wikimedia.org/T183469) [07:26:59] (CR) jerkins-bot: [V: -1] mariadb: Move db1073 to m5 [puppet] - https://gerrit.wikimedia.org/r/416385 (https://phabricator.wikimedia.org/T183469) (owner: Marostegui) [07:28:44] (PS2) Marostegui: mariadb: Move db1073 to m5 [puppet] - https://gerrit.wikimedia.org/r/416385 (https://phabricator.wikimedia.org/T183469) [07:29:15] (CR) jerkins-bot: [V: -1] mariadb: Move db1073 to 
m5 [puppet] - https://gerrit.wikimedia.org/r/416385 (https://phabricator.wikimedia.org/T183469) (owner: Marostegui) [07:35:27] (CR) Marostegui: [V: 2 C: 2] "Going to override the -1 as this requires a refactor for the whole misc role, and we are kind of under a rush to replace m5 master as it i" [puppet] - https://gerrit.wikimedia.org/r/416385 (https://phabricator.wikimedia.org/T183469) (owner: Marostegui) [07:38:10] (PS1) Marostegui: install_server: Allow reinstall db1073 [puppet] - https://gerrit.wikimedia.org/r/416386 (https://phabricator.wikimedia.org/T183469) [07:38:51] (CR) Marostegui: [C: 2] install_server: Allow reinstall db1073 [puppet] - https://gerrit.wikimedia.org/r/416386 (https://phabricator.wikimedia.org/T183469) (owner: Marostegui) [07:40:30] Operations, DBA, Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4022936 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` ['db1073.eqiad.wmnet'] ``` Th... 
[07:51:54] Operations, DBA, Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4022944 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db1073.eqiad.wmnet'] ``` Of which those **FAILED**: ``` ['db1073.eqiad.wmnet'] ``` [07:53:15] (PS1) Marostegui: install_server: Reimage db1073 as jessie [puppet] - https://gerrit.wikimedia.org/r/416387 [07:53:53] (CR) Marostegui: [C: 2] install_server: Reimage db1073 as jessie [puppet] - https://gerrit.wikimedia.org/r/416387 (owner: Marostegui) [07:55:40] Operations, DBA, Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4022950 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` ['db1073.eqiad.wmnet'] ``` Th... [07:58:24] (PS1) Marostegui: db1073: Move it to m5.hosts [software] - https://gerrit.wikimedia.org/r/416388 (https://phabricator.wikimedia.org/T183469) [07:59:41] (CR) Marostegui: [C: 2] db1073: Move it to m5.hosts [software] - https://gerrit.wikimedia.org/r/416388 (https://phabricator.wikimedia.org/T183469) (owner: Marostegui) [08:00:25] (Merged) jenkins-bot: db1073: Move it to m5.hosts [software] - https://gerrit.wikimedia.org/r/416388 (https://phabricator.wikimedia.org/T183469) (owner: Marostegui) [08:09:17] Operations, DBA, Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4022961 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` ['db1073.eqiad.wmnet'] ``` Th... 
[08:14:48] (PS1) Elukey: profile::tcpircbot: remove eventlog1001 references [puppet] - https://gerrit.wikimedia.org/r/416389 (https://phabricator.wikimedia.org/T114199) [08:17:50] Operations, Analytics-Kanban, Patch-For-Review, Performance-Team (Radar), and 2 others: Deprecation of mw.errors.* metrics - https://phabricator.wikimedia.org/T188749#4022964 (elukey) [08:25:51] !log Stop MySQL on db2078 for mariadb and kernel upgrade [08:26:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:30:43] Operations, Contributors-Analysis, Mail, Surveys: Qualtrics cannot send email to wikimedia.org addresses - https://phabricator.wikimedia.org/T176666#4022988 (Neil_P._Quinn_WMF) p:Normal>Triage [08:35:40] Operations, DBA, Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4022999 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db1073.eqiad.wmnet'] ``` and were **ALL** successful. [08:51:41] Operations, ops-eqiad: Degraded RAID on stat1004 - https://phabricator.wikimedia.org/T188863#4023015 (elukey) [08:51:45] Operations, ops-eqiad: Degraded RAID on stat1004 - https://phabricator.wikimedia.org/T188861#4023017 (elukey) [08:51:59] Operations, ops-eqiad: Degraded RAID on stat1004 - https://phabricator.wikimedia.org/T188863#4022030 (elukey) Open>Resolved a:elukey @Addshore thanks! 
[08:54:08] !log Stop mariadb on db2037 to copy it to db1073 [08:54:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:56:42] (PS2) Filippo Giunchedi: Point private Thumbor Swift user to existing user for now [puppet] - https://gerrit.wikimedia.org/r/416240 (https://phabricator.wikimedia.org/T188834) (owner: Gilles) [08:57:13] (CR) Filippo Giunchedi: [C: 2] Point private Thumbor Swift user to existing user for now [puppet] - https://gerrit.wikimedia.org/r/416240 (https://phabricator.wikimedia.org/T188834) (owner: Gilles) [09:01:21] PROBLEM - haproxy failover on dbproxy1005 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [09:01:22] !log roll-restart thumbor to apply https://gerrit.wikimedia.org/r/416240 [09:01:32] ^ that haproxy thingy is me [09:01:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:15:04] !log Deploy schema change on s7 codfw master (db2040), this will generate lag on codfw - T187089 T185128 T153182 [09:15:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:15:21] T187089: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089 [09:15:21] T153182: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182 [09:15:21] T185128: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128 [09:18:03] (PS2) Gehel: Stephane Bisson should be able to deploy maps. [puppet] - https://gerrit.wikimedia.org/r/415845 (https://phabricator.wikimedia.org/T188720) [09:19:42] (CR) Gehel: "@Dzahn: actually, comparing with the groups of pnorman, the additional group is "deployment". 
I also added the kartotherian / tilerator ad" [puppet] - https://gerrit.wikimedia.org/r/415845 (https://phabricator.wikimedia.org/T188720) (owner: Gehel) [09:22:30] Operations, Ops-Access-Requests, Patch-For-Review: Give maps deployment rights to sbisson - https://phabricator.wikimedia.org/T188720#4023099 (Gehel) Note that in addition to deploying, @SBisson should also have the rights to restart the various services (it does not really make sense to have one per... [09:26:00] (PS2) Filippo Giunchedi: hieradata: add private wikis thumbor swift user [puppet] - https://gerrit.wikimedia.org/r/415263 (https://phabricator.wikimedia.org/T187822) [09:27:03] (CR) Filippo Giunchedi: [C: 2] hieradata: add private wikis thumbor swift user [puppet] - https://gerrit.wikimedia.org/r/415263 (https://phabricator.wikimedia.org/T187822) (owner: Filippo Giunchedi) [09:29:19] Operations, Datasets-General-or-Unknown, Dumps-Generation, hardware-requests: Give misc dump crons their own host - https://phabricator.wikimedia.org/T181936#4023117 (ArielGlenn) Hey @RobH, what are next steps on this? [09:31:32] Operations, Dumps-Generation: determine hardware needs for dumps in eqiad and codfw - https://phabricator.wikimedia.org/T118154#4023121 (ArielGlenn) Open>Resolved Time to close this ticket. At this point we have: labstore boxes coming on line soon, dumpsdata hosts deployed months ago, snapshot te... 
[09:31:35] Operations, Patch-For-Review, Release-Engineering-Team (Watching / External): Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937#4023123 (ArielGlenn) [09:32:46] (PS1) Muehlenhoff: Remove access for joewalsh [puppet] - https://gerrit.wikimedia.org/r/416402 [09:33:10] PROBLEM - Swift HTTP backend on ms-fe2005 is CRITICAL: connect to address 10.192.0.28 and port 80: Connection refused [09:33:20] PROBLEM - Swift HTTP frontend on ms-fe2005 is CRITICAL: connect to address 10.192.0.28 and port 80: Connection refused [09:33:30] !log roll restart swift in codfw to add thumbor private user [09:33:36] that's me ^ [09:33:40] PROBLEM - Check systemd state on ms-fe2005 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [09:33:40] PROBLEM - Swift HTTPS on ms-fe2005 is CRITICAL: connect to address 10.192.0.28 and port 80: Connection refused [09:33:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:34:11] PROBLEM - Swift HTTP backend on ms-fe2006 is CRITICAL: connect to address 10.192.16.190 and port 80: Connection refused [09:34:20] PROBLEM - Swift HTTP frontend on ms-fe2006 is CRITICAL: connect to address 10.192.16.190 and port 80: Connection refused [09:34:30] PROBLEM - Check systemd state on ms-fe2006 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[09:34:51] PROBLEM - Swift HTTPS on ms-fe2006 is CRITICAL: connect to address 10.192.16.190 and port 80: Connection refused [09:35:20] PROBLEM - Swift HTTPS on ms-fe2008 is CRITICAL: connect to address 10.192.48.72 and port 80: Connection refused [09:35:31] RECOVERY - haproxy failover on dbproxy1005 is OK: OK check_failover servers up 2 down 0 [09:35:40] PROBLEM - PyBal backends health check on lvs2003 is CRITICAL: PYBAL CRITICAL - CRITICAL - swift-https_443: Servers ms-fe2008.codfw.wmnet are marked down but pooled [09:35:40] PROBLEM - Swift HTTP backend on ms-fe2008 is CRITICAL: connect to address 10.192.48.72 and port 80: Connection refused [09:35:50] PROBLEM - Docker registry HTTPS interface on darmstadtium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:35:50] PROBLEM - Swift HTTPS on ms-fe2007 is CRITICAL: connect to address 10.192.32.155 and port 80: Connection refused [09:35:51] PROBLEM - Swift HTTP frontend on ms-fe2008 is CRITICAL: connect to address 10.192.48.72 and port 80: Connection refused [09:35:51] PROBLEM - PyBal backends health check on lvs2006 is CRITICAL: PYBAL CRITICAL - CRITICAL - swift-https_443: Servers ms-fe2008.codfw.wmnet are marked down but pooled [09:36:00] PROBLEM - Check systemd state on ms-fe2008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [09:36:10] (CR) Muehlenhoff: [C: 2] Remove access for joewalsh [puppet] - https://gerrit.wikimedia.org/r/416402 (owner: Muehlenhoff) [09:36:11] PROBLEM - Swift HTTP backend on ms-fe2007 is CRITICAL: connect to address 10.192.32.155 and port 80: Connection refused [09:36:20] PROBLEM - Check systemd state on ms-fe2007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[09:36:20] PROBLEM - Swift HTTP frontend on ms-fe2007 is CRITICAL: connect to address 10.192.32.155 and port 80: Connection refused [09:36:40] RECOVERY - Check systemd state on ms-fe2005 is OK: OK - running: The system is fully operational [09:36:41] RECOVERY - Swift HTTPS on ms-fe2005 is OK: HTTP OK: HTTP/1.1 200 OK - 185 bytes in 0.077 second response time [09:37:11] RECOVERY - Swift HTTP backend on ms-fe2005 is OK: HTTP OK: HTTP/1.1 200 OK - 396 bytes in 0.091 second response time [09:37:20] RECOVERY - Swift HTTP frontend on ms-fe2005 is OK: HTTP OK: HTTP/1.1 200 OK - 185 bytes in 0.074 second response time [09:37:40] RECOVERY - PyBal backends health check on lvs2003 is OK: PYBAL OK - All pools are healthy [09:37:50] RECOVERY - PyBal backends health check on lvs2006 is OK: PYBAL OK - All pools are healthy [09:37:51] RECOVERY - Swift HTTPS on ms-fe2006 is OK: HTTP OK: HTTP/1.1 200 OK - 185 bytes in 0.077 second response time [09:38:11] RECOVERY - Swift HTTP backend on ms-fe2006 is OK: HTTP OK: HTTP/1.1 200 OK - 396 bytes in 0.092 second response time [09:38:20] RECOVERY - Swift HTTP frontend on ms-fe2006 is OK: HTTP OK: HTTP/1.1 200 OK - 185 bytes in 0.074 second response time [09:38:30] RECOVERY - Check systemd state on ms-fe2006 is OK: OK - running: The system is fully operational [09:38:50] RECOVERY - Swift HTTPS on ms-fe2007 is OK: HTTP OK: HTTP/1.1 200 OK - 185 bytes in 0.077 second response time [09:39:11] RECOVERY - Swift HTTP backend on ms-fe2007 is OK: HTTP OK: HTTP/1.1 200 OK - 396 bytes in 0.093 second response time [09:39:20] RECOVERY - Check systemd state on ms-fe2007 is OK: OK - running: The system is fully operational [09:39:20] RECOVERY - Swift HTTP frontend on ms-fe2007 is OK: HTTP OK: HTTP/1.1 200 OK - 185 bytes in 0.077 second response time [09:41:50] RECOVERY - Docker registry HTTPS interface on darmstadtium is OK: HTTP OK: HTTP/1.1 200 OK - 2460 bytes in 0.369 second response time [09:42:24] (PS1) Filippo Giunchedi: hieradata: rename 
thumbor-private user [puppet] - https://gerrit.wikimedia.org/r/416403 (https://phabricator.wikimedia.org/T187822) [09:44:14] (PS1) Marostegui: install_server: Reimage db1073 as stretch [puppet] - https://gerrit.wikimedia.org/r/416404 [09:44:54] (CR) Gilles: [C: 1] hieradata: rename thumbor-private user [puppet] - https://gerrit.wikimedia.org/r/416403 (https://phabricator.wikimedia.org/T187822) (owner: Filippo Giunchedi) [09:44:56] (CR) Filippo Giunchedi: [C: 2] hieradata: rename thumbor-private user [puppet] - https://gerrit.wikimedia.org/r/416403 (https://phabricator.wikimedia.org/T187822) (owner: Filippo Giunchedi) [09:45:10] PROBLEM - puppet last run on wtp1036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:45:39] (PS1) Elukey: eventlogging: add eventloggingctl script for systemd [puppet] - https://gerrit.wikimedia.org/r/416405 (https://phabricator.wikimedia.org/T114199) [09:45:50] PROBLEM - puppet last run on conf1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:45:50] PROBLEM - puppet last run on mwdebug1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:46:00] PROBLEM - puppet last run on mc1029 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:46:01] PROBLEM - puppet last run on cp4032 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:46:07] (PS2) Elukey: eventlogging: add eventloggingctl script for systemd [puppet] - https://gerrit.wikimedia.org/r/416405 (https://phabricator.wikimedia.org/T114199) [09:46:14] (PS2) Marostegui: install_server: Reimage db1073 as stretch [puppet] - https://gerrit.wikimedia.org/r/416404 [09:46:40] PROBLEM - puppet last run on mw1241 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:47:00] the puppet fails is nitrogen afaict [09:47:10] puppetdb 2234 95.8 15.5 12296736 2563856 ? Ssl 09:42 3:37 [09:47:36] yep [09:47:36] [Mon Mar 5 09:42:30 2018] Out of memory: Kill process 21665 (java) score 392 or sacrifice child [09:47:49] restarted 4min 47s ago [09:48:11] PROBLEM - puppet last run on mc1022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:48:17] (CR) Elukey: [C: 2] "https://puppet-compiler.wmflabs.org/compiler02/10248/" [puppet] - https://gerrit.wikimedia.org/r/416405 (https://phabricator.wikimedia.org/T114199) (owner: Elukey) [09:48:21] RECOVERY - Swift HTTPS on ms-fe2008 is OK: HTTP OK: HTTP/1.1 200 OK - 185 bytes in 0.074 second response time [09:48:21] PROBLEM - puppet last run on elastic1049 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:48:31] (PS4) Gilles: Add Thumbor private container user configuration keys [puppet] - https://gerrit.wikimedia.org/r/414631 (https://phabricator.wikimedia.org/T187822) [09:48:41] PROBLEM - puppet last run on chlorine is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:48:41] RECOVERY - Swift HTTP backend on ms-fe2008 is OK: HTTP OK: HTTP/1.1 200 OK - 396 bytes in 0.098 second response time [09:48:50] PROBLEM - puppet last run on dns4001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:49:00] RECOVERY - Swift HTTP frontend on ms-fe2008 is OK: HTTP OK: HTTP/1.1 200 OK - 185 bytes in 0.074 second response time [09:49:01] RECOVERY - Check systemd state on ms-fe2008 is OK: OK - running: The system is fully operational [09:49:10] PROBLEM - puppet last run on hydrogen is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:49:19] (03CR) 10Gilles: Add Thumbor private container user configuration keys (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/414631 (https://phabricator.wikimedia.org/T187822) (owner: 10Gilles) [09:49:20] PROBLEM - puppet last run on restbase1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:49:33] (03CR) 10Marostegui: [C: 032] install_server: Reimage db1073 as stretch [puppet] - 10https://gerrit.wikimedia.org/r/416404 (owner: 10Marostegui) [09:49:40] (03PS3) 10Marostegui: install_server: Reimage db1073 as stretch [puppet] - 10https://gerrit.wikimedia.org/r/416404 [09:49:50] PROBLEM - puppet last run on snapshot1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:49:50] PROBLEM - puppet last run on mw1240 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:05:44] (03PS3) 10Gehel: wdqs: switch alterting to prometheus instead of icinga [puppet] - 10https://gerrit.wikimedia.org/r/415884 [10:09:40] (03PS3) 10Gehel: wdqs: propagate rename of updater_option to wdqs-test [puppet] - 10https://gerrit.wikimedia.org/r/415906 [10:10:22] (03CR) 10Gehel: [C: 032] wdqs: propagate rename of updater_option to wdqs-test [puppet] - 10https://gerrit.wikimedia.org/r/415906 (owner: 10Gehel) [10:12:25] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4023189 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` ['db1073.eqiad.wmnet'] ``` Th... 
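The kernel line pasted at 09:47:36 is what ties the catalog fetch failures to the puppetdb restart. A minimal sketch, assuming exactly that message format (other kernel versions may word it differently), of pulling the killed PID, process name, and score out of dmesg-style output. `OOM_RE` and `parse_oom_line` are illustrative names, not part of any tool used in the channel:

```python
import re

# Matches the OOM-killer wording quoted in the channel at 09:47:36.
# Assumption: this exact phrasing; it is not a general-purpose dmesg parser.
OOM_RE = re.compile(
    r"Out of memory: Kill process (?P<pid>\d+) \((?P<comm>[^)]+)\) "
    r"score (?P<score>\d+) or sacrifice child"
)


def parse_oom_line(line):
    """Return (pid, command, score) for an OOM-killer line, or None."""
    match = OOM_RE.search(line)
    if match is None:
        return None
    return int(match.group("pid")), match.group("comm"), int(match.group("score"))


sample = (
    "[Mon Mar 5 09:42:30 2018] Out of memory: "
    "Kill process 21665 (java) score 392 or sacrifice child"
)
print(parse_oom_line(sample))
```

Lines that do not match return None, so the function can be mapped over a whole log without special-casing unrelated kernel messages.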
[10:13:30] RECOVERY - puppet last run on elastic1049 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [10:14:10] RECOVERY - puppet last run on hydrogen is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [10:14:20] RECOVERY - puppet last run on restbase1008 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [10:14:50] RECOVERY - puppet last run on snapshot1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:14:50] RECOVERY - puppet last run on mw1240 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:15:11] RECOVERY - puppet last run on wtp1036 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:15:50] RECOVERY - puppet last run on conf1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:15:50] RECOVERY - puppet last run on mwdebug1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:16:00] RECOVERY - puppet last run on mc1029 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:16:01] RECOVERY - puppet last run on cp4032 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:16:40] RECOVERY - puppet last run on mw1241 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [10:18:11] RECOVERY - puppet last run on mc1022 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [10:18:41] RECOVERY - puppet last run on chlorine is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [10:18:50] RECOVERY - puppet last run on dns4001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [10:19:27] (03PS4) 10Gehel: wdqs: switch alterting to prometheus instead of icinga [puppet] - 10https://gerrit.wikimedia.org/r/415884 [10:23:03] (03PS1) 10Gehel: wdqs: split updater options 
between generic and recent change [puppet] - 10https://gerrit.wikimedia.org/r/416408 [10:23:49] !log rolling reboot of logstash* for kernel security update [10:24:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:24:32] !log drain + reboot analytics10[46-49] for kernel updates [10:24:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:24:51] (03CR) 10Gehel: [C: 032] wdqs: split updater options between generic and recent change [puppet] - 10https://gerrit.wikimedia.org/r/416408 (owner: 10Gehel) [10:32:29] 10Operations, 10DBA, 10Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4023215 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db1073.eqiad.wmnet'] ``` and were **ALL** successful. [10:41:36] _joe_: hey, you might like this: https://grafana.wikimedia.org/dashboard/db/wikidata-change-propagation?orgId=1 and https://grafana.wikimedia.org/dashboard/db/wikidata-change-propagation?orgId=1&from=now-7d&to=now The second graph says, this week last month Wikidata triggered 2K refresh jobs per minute and this week it triggered 574 per minute [10:42:26] <_joe_> wow [10:42:28] <_joe_> <3 [10:42:41] <_joe_> that's great mainly in prespective [10:43:02] <_joe_> I'm not going to lose my sleep thinking of enwiki embracing wikidata items usage more [10:43:11] <_joe_> thanks a lot [10:43:47] yep thanks a lot ! [10:45:43] that's very interesting [10:46:58] !log rebooting lithium for kernel security update [10:47:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:47:36] Amir1: \o/ [10:48:26] _joe_: yeah, I'm working on it ;) [10:48:39] Lucas_WMDE: Amir1 woo! 
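For scale, the two figures quoted at 10:41:36 can be turned into a relative drop. This is only a rough sketch: "2K" is taken as 2000, which is an approximation, and `relative_drop` is an illustrative helper name:

```python
def relative_drop(before, after):
    """Fraction by which a rate fell, relative to its earlier value."""
    return (before - after) / before


before_per_min = 2000  # "2K" in the message, so approximate
after_per_min = 574    # figure quoted for the current week

drop = relative_drop(before_per_min, after_per_min)
print(f"roughly {drop:.0%} fewer refresh jobs per minute")
```

Under that assumption the rate fell by about 71 percent, or to a bit under a third of the earlier level.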
[10:49:01] Lucas_WMDE: addshore \o/ [10:50:44] (PS5) Jayprakash12345: Add import sources on pawikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/414451 (https://phabricator.wikimedia.org/T185982) (owner: Tulsi Bhagat) [10:51:58] jouncebot: next [10:51:58] In 0 hour(s) and 8 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180305T1100) [10:52:24] (CR) Jayprakash12345: [C: 1] Add import sources on pawikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/414451 (https://phabricator.wikimedia.org/T185982) (owner: Tulsi Bhagat) [10:53:41] !log rebooting bast2001 for kernel security update [10:53:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:00:04] jan_drewniak: (Dis)respected human, time to deploy Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180305T1100). Please do the needful. [11:00:04] No GERRIT patches in the queue for this window AFAICS.
[11:09:13] !log drain + reboot analytics10[50,51,53,54] for kernel updates [11:09:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:19:18] !log running "racadm racreset" on rhenium, mgmt inaccessible [11:19:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:23:52] Operations, Ops-Access-Requests, Patch-For-Review: Give sbisson the rights to deploy maps and restart maps-related services - https://phabricator.wikimedia.org/T188720#4023307 (MoritzMuehlenhoff) [11:24:30] (CR) Muehlenhoff: [C: 1] "Change is fine, but pending approval in ops meeting" [puppet] - https://gerrit.wikimedia.org/r/415845 (https://phabricator.wikimedia.org/T188720) (owner: Gehel) [11:27:39] Operations, Puppet: Setting packages on 'hold' breaks puppet runs - https://phabricator.wikimedia.org/T187651#4023309 (MoritzMuehlenhoff) p:Triage>Low [11:28:40] Operations, Analytics, Traffic: Update documentation for "https" field in X-Analytics - https://phabricator.wikimedia.org/T188807#4023315 (MoritzMuehlenhoff) p:Triage>Normal [11:39:44] (PS1) Vgutierrez: Release PyBal 1.15.0 [debs/pybal] (1.15) - https://gerrit.wikimedia.org/r/416412 [11:40:06] !log updating tor packages to 0.3.2.10 [11:40:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:59:36] !log upgrading tor on radium [11:59:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:06:38] Operations: netfilter software at WMF: iptables vs nftables - https://phabricator.wikimedia.org/T187994#4023412 (aborrero) EOF. I propose we follow up in the future.
[12:08:00] 10Operations: netfilter software at WMF: iptables vs nftables - https://phabricator.wikimedia.org/T187994#4023431 (10aborrero) 05Open>03stalled [12:08:13] !log installing freexl security updates [12:08:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:13:39] !log installing wavpack security updates [12:13:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:19:01] !log installing libvpx security updates [12:19:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:21:47] PROBLEM - Disk space on stat1005 is CRITICAL: DISK CRITICAL - /mnt/hdfs is not accessible: Transport endpoint is not connected [12:30:36] PROBLEM - dhclient process on stat1005 is CRITICAL: Return code of 255 is out of bounds [12:30:37] PROBLEM - configured eth on stat1005 is CRITICAL: Return code of 255 is out of bounds [12:30:50] !log Remove db1011 from tendril as it will be decommissioned - T184703 [12:30:56] PROBLEM - MD RAID on stat1005 is CRITICAL: Return code of 255 is out of bounds [12:31:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:31:07] PROBLEM - DPKG on stat1005 is CRITICAL: Return code of 255 is out of bounds [12:31:07] T184703: Decommission db1011 - https://phabricator.wikimedia.org/T184703 [12:31:17] PROBLEM - Check systemd state on stat1005 is CRITICAL: Return code of 255 is out of bounds [12:33:56] PROBLEM - puppet last run on stat1005 is CRITICAL: Return code of 255 is out of bounds [12:38:42] (03PS1) 10Marostegui: install_server: Remove db1011 [puppet] - 10https://gerrit.wikimedia.org/r/416423 (https://phabricator.wikimedia.org/T184703) [12:38:58] (03PS1) 10Marostegui: db-eqiad,db-codfw.php: Remove db1011 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416424 (https://phabricator.wikimedia.org/T184703) [12:39:44] (03CR) 10Marostegui: [C: 032] install_server: Remove db1011 [puppet] - 10https://gerrit.wikimedia.org/r/416423 
(https://phabricator.wikimedia.org/T184703) (owner: 10Marostegui) [12:40:27] PROBLEM - Check the NTP synchronisation status of timesyncd on stat1005 is CRITICAL: Return code of 255 is out of bounds [12:40:39] !log rebooting bast4001 for kernel security update [12:40:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:41:59] (03CR) 10Marostegui: [C: 032] db-eqiad,db-codfw.php: Remove db1011 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416424 (https://phabricator.wikimedia.org/T184703) (owner: 10Marostegui) [12:43:31] (03Merged) 10jenkins-bot: db-eqiad,db-codfw.php: Remove db1011 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416424 (https://phabricator.wikimedia.org/T184703) (owner: 10Marostegui) [12:45:00] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Remove db1011 from config - T184703 (duration: 01m 02s) [12:45:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:45:17] T184703: Decommission db1011 - https://phabricator.wikimedia.org/T184703 [12:46:10] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Remove db1011 from config - T184703 (duration: 01m 02s) [12:46:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:46:50] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: Decommission db1011 - https://phabricator.wikimedia.org/T184703#4023515 (10Marostegui) a:03RobH db1011 is now ready to be decommissioned by DC Ops - assigning it to @RobH [12:48:17] (03CR) 10jenkins-bot: db-eqiad,db-codfw.php: Remove db1011 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416424 (https://phabricator.wikimedia.org/T184703) (owner: 10Marostegui) [12:54:27] (03PS1) 10Marostegui: db-eqiad.php: Depool db1098:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416426 (https://phabricator.wikimedia.org/T187089) [12:55:17] RECOVERY - DPKG on stat1005 is OK: All packages OK [12:55:46] RECOVERY - dhclient process on stat1005 is OK: PROCS OK: 0 
processes with command name dhclient [12:55:56] RECOVERY - configured eth on stat1005 is OK: OK - interfaces up [12:56:06] RECOVERY - MD RAID on stat1005 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0 [12:56:56] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1098:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416426 (https://phabricator.wikimedia.org/T187089) (owner: 10Marostegui) [12:58:26] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1098:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416426 (https://phabricator.wikimedia.org/T187089) (owner: 10Marostegui) [12:58:40] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1098:3317 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416426 (https://phabricator.wikimedia.org/T187089) (owner: 10Marostegui) [12:58:56] RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:00:00] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1098:3317 for alter table (duration: 00m 57s) [13:00:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:00:16] !log Deploy schema change on db1098:3317 - T187089 T185128 T153182 [13:00:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:00:31] T187089: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089 [13:00:32] T153182: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182 [13:00:32] T185128: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128 [13:04:35] !log rebooting bast4002 for kernel security update [13:04:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:07:16] (03PS2) 10Jcrespo: Revert "labsdb: Depool labsdb1010 in preparation for its recovery" [puppet] - 10https://gerrit.wikimedia.org/r/415923 
[13:10:26] RECOVERY - Check the NTP synchronisation status of timesyncd on stat1005 is OK: OK: synced at Mon 2018-03-05 13:10:21 UTC. [13:30:43] (03PS1) 10Rush: openstack: kvm monitoring threshold 75=>90 [puppet] - 10https://gerrit.wikimedia.org/r/416432 (https://phabricator.wikimedia.org/T178405) [13:31:59] (03CR) 10Rush: "Note hightest I see now is still labvirt1015 at 72 instances." [puppet] - 10https://gerrit.wikimedia.org/r/416432 (https://phabricator.wikimedia.org/T178405) (owner: 10Rush) [13:34:29] (03CR) 10Rush: [C: 04-1] "I don't think 998 is a valid case here. Let's set to something we would never expect to see or would necessity investigation. I propse 9" [puppet] - 10https://gerrit.wikimedia.org/r/416380 (owner: 10Andrew Bogott) [13:36:29] 10Operations, 10Beta-Cluster-Infrastructure: Beta cluster Obama page often responds with 503 - https://phabricator.wikimedia.org/T188913#4023646 (10Aklapper) [13:36:43] (03PS2) 10Rush: openstack: kvm monitoring threshold 75=>90 [puppet] - 10https://gerrit.wikimedia.org/r/416432 (https://phabricator.wikimedia.org/T178405) [13:37:41] !log mobrovac@tin Started restart [cpjobqueue/deploy@b5255f0]: Force RecordLintJob rebalance in Kakfa - T188870 [13:37:52] !log rebooting neon for kernel security update [13:37:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:37:56] T188870: Pages that have linter errors fixed aren't getting updated in Special:LintErrors - https://phabricator.wikimedia.org/T188870 [13:38:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:40:57] (03CR) 10Rush: [C: 032] openstack: kvm monitoring threshold 75=>90 [puppet] - 10https://gerrit.wikimedia.org/r/416432 (https://phabricator.wikimedia.org/T178405) (owner: 10Rush) [13:41:37] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM! 
just a nit" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/415884 (owner: 10Gehel) [13:41:57] (03CR) 10Rush: [C: 04-1] "I merged https://gerrit.wikimedia.org/r/#/c/416432/ to get us up over 75 (1008 is at 72 now) as that seems like hte most conservative pote" [puppet] - 10https://gerrit.wikimedia.org/r/416380 (owner: 10Andrew Bogott) [13:44:36] (03PS5) 10Gehel: wdqs: switch alerting to prometheus instead of icinga [puppet] - 10https://gerrit.wikimedia.org/r/415884 [13:44:44] (03CR) 10Filippo Giunchedi: Add Thumbor private container user configuration keys (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/414631 (https://phabricator.wikimedia.org/T187822) (owner: 10Gilles) [13:46:04] (03PS6) 10Gehel: wdqs: switch alerting to prometheus instead of icinga [puppet] - 10https://gerrit.wikimedia.org/r/415884 [13:46:12] (03CR) 10Rush: [C: 04-1] "small" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/415178 (owner: 10Legoktm) [13:46:17] (03PS2) 10Rush: wmcs: Notify legoktm for codesearch alerts [puppet] - 10https://gerrit.wikimedia.org/r/415178 (owner: 10Legoktm) [13:46:53] (03PS1) 10Filippo Giunchedi: hieradata: match swift account name with username for thumbor [puppet] - 10https://gerrit.wikimedia.org/r/416437 [13:49:39] !log rebooting releases2001 for kernel security update [13:49:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:51:30] (03PS2) 10Filippo Giunchedi: hieradata: match swift account name with username for thumbor [puppet] - 10https://gerrit.wikimedia.org/r/416437 [13:52:15] (03CR) 10Filippo Giunchedi: [C: 032] hieradata: match swift account name with username for thumbor [puppet] - 10https://gerrit.wikimedia.org/r/416437 (owner: 10Filippo Giunchedi) [13:53:13] godog: around this morning? I need advice/ideas regarding wikitech vs. swift [13:54:00] zeljkof: I have CR+2 the three patches for extensions. 
i guess I will do the swat, I havent done it in a while :] [13:54:08] andrewbogott: sure, what's up? [13:54:43] hashar: FYI: my patches (that you've already +2) can only be tested together [13:54:50] I'm in the process of rebuilding wikitech (again) with a somewhat more standard setup. I have a pair of servers behind lvs. [13:55:00] stephanebisson: guess I will pull/deploy both at the same time :] [13:55:08] But — currently (on silver) most images are hosted locally. That clearly won't work if there are two servers. [13:55:17] RECOVERY - Disk space on stat1005 is OK: DISK OK [13:55:47] So I have two questions: 1) is it possible to do some kind of wholesale import of images from a local server to swift? 2) Having done that, is there a way to get a dump of all images associated with a given wiki? [13:55:55] !log rolling reboot of swift backends in codfw for kernel security update [13:56:08] (If the answer to 2 is 'no' then I may just drop this whole idea, since I need to sync images to wikitech-static) [13:56:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:56:25] (03PS2) 10Hashar: 2017 wikitext editor: Simplify config part 2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/413652 (owner: 10Jforrester) [13:57:03] godog: do my questions make sense, at least? [13:57:41] andrewbogott: yeah they do! I don't know offhand but IIRC yes there's bulk import/export between different FileRepo, perhaps mediawiki maintenance scripts have something [13:58:14] godog: great, I will dig. Thanks. [13:58:22] andrewbogott: no problem! [13:58:25] in theory File:xxx pages in the local wiki reflect the existence of actual images locally, unless someone uploaded a file, it was moved to commons, and then the File: page was not removed. I think. [13:58:29] hashar: cool :) [13:59:05] apergos: yeah, my worry is that even after I import everything to swift I'll have to rewrite ever single file reference... 
[13:59:18] oh dear [13:59:52] (03PS7) 10Gehel: wdqs: switch alerting to prometheus instead of icinga [puppet] - 10https://gerrit.wikimedia.org/r/415884 [13:59:56] I guess another question is… should all these images go into commons or should there be a special swift namespace just for wikitech? (I don't really know if swift has namespaces like this) [14:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: That opportune time is upon us again. Time for a European Mid-day SWAT(Max 8 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180305T1400). [14:00:04] James_F, Jayprakash12345, jan_drewniak, and stephanebisson: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [14:00:10] (03CR) 10Filippo Giunchedi: "> Patch Set 4:" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/414631 (https://phabricator.wikimedia.org/T187822) (owner: 10Gilles) [14:00:20] apergos: but surely this is something that's been done before… we've certainly gobbled up pre-existing external wikis befoer. [14:00:20] as long as there is a wikitech static which has the images on disk, in case we are looking stuff up when commons is gone, [14:00:34] I don't see why we can't have the imiages on the 'non-static' one be on commons [14:00:38] (03CR) 10Gehel: [C: 032] wdqs: switch alerting to prometheus instead of icinga [puppet] - 10https://gerrit.wikimedia.org/r/415884 (owner: 10Gehel) [14:00:48] if the idea is to replace wikitech-static completely, then that's another thing [14:01:11] I'm coming into the middle of the discussion, if there's a ticket I could read or otherwise get the background... 
[14:01:11] jan_drewniak: I am going to deploy your Popups change https://gerrit.wikimedia.org/r/#/c/415934/ [14:01:15] apergos: the main reason to not do that is that we don't want commons people looking at wikitech docs with screenshots and saying "this looks copyrighted" and destroying our docs :) [14:01:19] hashar: thanks! [14:01:43] good point [14:02:07] well then "local wiki" it is [14:02:17] apergos: wikitech-static will remain. It's the existence of wikitech-static that makes this safe (previously when wikitech-static was unreliable I intentionally kept our diagrams &c. off of swift to avoid losing them) [14:02:23] agreed, and the files in swift will end up in their own container [14:02:27] swift container that is [14:02:32] apergos: I'm going to make a ticket and cc you, it sounds like you have more of a clue than I have [14:02:57] I've generated some media use lists before, so maybe I'll be able to say something useful [14:03:27] pfff [14:03:33] ValueError: /srv/mediawiki-staging/php-1.31.0-wmf.23/extensions/Popups/.eslintrc.json is an invalid JSON file [14:03:41] * hashar blames eslint [14:04:34] godog: remind me I wanna chat with you about the research team request for commons thumbnails at some point [14:05:04] can't find the ticket right now, but you're on it [14:05:37] apergos: yeah I'm on it, LMK when and I'd be happy to chat about that request [14:05:46] (03PS3) 10Jcrespo: Revert "labsdb: Depool labsdb1010 in preparation for its recovery" [puppet] - 10https://gerrit.wikimedia.org/r/415923 [14:06:03] apergos: https://phabricator.wikimedia.org/T188915 [14:06:03] sweet [14:06:03] !log hashar@tin Started scap: Popups: Remove client side formatters in the REST formatter - T183833 [14:06:16] (03CR) 10Jcrespo: [C: 032] Revert "labsdb: Depool labsdb1010 in preparation for its recovery" [puppet] - 10https://gerrit.wikimedia.org/r/415923 (owner: 10Jcrespo) [14:06:19] pff [14:06:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log 
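The ValueError at 14:03:33 appears to come from a strict JSON check on the staged tree. The log does not say what was actually wrong with .eslintrc.json. One common cause, assumed here purely for illustration, is that ESLint tolerates JavaScript-style comments in that file while strict JSON parsers reject them. A minimal sketch of such a check, with `json_error` as an illustrative helper name rather than anything from the deploy tooling:

```python
import json


def json_error(text):
    """Return None if text parses as strict JSON, else the parser's message."""
    try:
        json.loads(text)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        return str(exc)
    return None


strict = '{"root": true}'
# Hypothetical config content: valid for ESLint, invalid as strict JSON.
with_comment = '{\n  // lint settings\n  "root": true\n}'

print(json_error(strict))
print(json_error(with_comment) is not None)
```

Running a check like this on a config file before staging would surface the same class of error earlier than the deploy step does.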
[14:06:20] !log hashar@tin scap aborted: Popups: Remove client side formatters in the REST formatter - T183833 (duration: 00m 16s) [14:06:20] T183833: [Bug report] Removing parentheses breaks chemical formulas - https://phabricator.wikimedia.org/T183833 [14:06:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:06:39] jan_drewniak: ok it is no mwdebug1001 now [14:08:09] (03CR) 10Gilles: "I'm going to wait until the mediawiki change is deployed, the latest branch cut happened before the change to SwiftFileBackend was merged" [puppet] - 10https://gerrit.wikimedia.org/r/414631 (https://phabricator.wikimedia.org/T187822) (owner: 10Gilles) [14:08:25] andrewbogott: I might not get comments on there until tomorrow, is that ok? [14:08:27] (03PS1) 10Gehel: wdqs: query used for alerting should be a scalar [puppet] - 10https://gerrit.wikimedia.org/r/416441 [14:08:28] jan_drewniak: sorry my english is crap today [14:08:33] hashar: Yup! looks like it's working, thanks [14:08:33] sure [14:08:40] !log hashar@tin Started scap: Popups: Remove client side formatters in the REST formatter - T183833 [14:08:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:09:07] stephanebisson: I will do your changes next [14:09:46] (03CR) 10Filippo Giunchedi: [C: 031] wdqs: query used for alerting should be a scalar [puppet] - 10https://gerrit.wikimedia.org/r/416441 (owner: 10Gehel) [14:09:52] (03CR) 10Gehel: [C: 032] wdqs: query used for alerting should be a scalar [puppet] - 10https://gerrit.wikimedia.org/r/416441 (owner: 10Gehel) [14:11:32] !log mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=bdwikimedia translate # T188853 [14:11:45] (03CR) 10Hashar: [C: 031] "Creating translate tables...done!" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/416338 (https://phabricator.wikimedia.org/T188853) (owner: 10Jayprakash12345) [14:11:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:11:50] T188853: Install translate extension in bd.wikimedia.org - https://phabricator.wikimedia.org/T188853 [14:13:52] waiting on LocalisationCache again :( [14:17:32] sync-masters [14:19:04] hello [14:19:14] sorry [14:20:07] check-canaries ! [14:21:00] hashar: You can SWAT today [14:21:32] (03PS1) 10Elukey: role::analytics_cluster::client: force remount of HDFS mountpoint [puppet] - 10https://gerrit.wikimedia.org/r/416442 (https://phabricator.wikimedia.org/T187073) [14:22:11] Jayprakash12345: yeah I am doing it [14:22:16] but the first deploy is taking a while :( [14:23:06] 10Operations, 10Analytics-Kanban, 10Patch-For-Review, 10Performance-Team (Radar), and 2 others: Deprecation of mw.errors.* metrics - https://phabricator.wikimedia.org/T188749#4023747 (10fgiunchedi) >>! In T188749#4021055, @elukey wrote: > Thanks @Krinkle! @fgiunchedi I think we are ready to go, what do you... 
[14:24:03] (03PS2) 10Elukey: role::eventlogging::analytics: deprecate mw.errors.* metrics [puppet] - 10https://gerrit.wikimedia.org/r/415887 (https://phabricator.wikimedia.org/T188749) [14:26:19] (03CR) 10Elukey: [C: 032] role::eventlogging::analytics: deprecate mw.errors.* metrics [puppet] - 10https://gerrit.wikimedia.org/r/415887 (https://phabricator.wikimedia.org/T188749) (owner: 10Elukey) [14:27:32] pff [14:30:07] (03PS3) 10Hashar: Enable rollbacker user right at arwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416224 (https://phabricator.wikimedia.org/T188633) (owner: 10Jayprakash12345) [14:30:10] (03PS4) 10Hashar: Enable translate extension in bdwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416338 (https://phabricator.wikimedia.org/T188853) (owner: 10Jayprakash12345) [14:30:12] (03PS6) 10Hashar: Add import sources on pawikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414451 (https://phabricator.wikimedia.org/T185982) (owner: 10Tulsi Bhagat) [14:31:48] !log hashar@tin Finished scap: Popups: Remove client side formatters in the REST formatter - T183833 (duration: 23m 08s) [14:32:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:32:05] T183833: [Bug report] Removing parentheses breaks chemical formulas - https://phabricator.wikimedia.org/T183833 [14:32:45] stephanebisson: finally I can do your changes :D [14:32:56] hashar: I am ready to test patch at mwdebug1002. [14:32:58] hashar: I'm ready [14:33:55] syncing [14:33:57] stephanebisson: ok they are both on mwdebug1001 :) [14:34:05] Jayprakash12345: yeah will do yours in a bit :) [14:34:05] hashar: testing... [14:34:16] 10Operations, 10Analytics, 10Traffic: Update documentation for "https" field in X-Analytics - https://phabricator.wikimedia.org/T188807#4023770 (10Tbayer) Very informative, thanks @BBlack! So I understand that these cases would explain the 1% of requests in T188807#4021737 that have NULL for `x_analytics_map... 
[14:34:35] !log graphite metrics mw.error.* deprecated in T188749 [14:34:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:34:50] T188749: Deprecation of mw.errors.* metrics - https://phabricator.wikimedia.org/T188749 [14:35:03] 10Operations, 10Analytics-Kanban, 10Patch-For-Review, 10Performance-Team (Radar), and 2 others: Deprecation of mw.errors.* metrics - https://phabricator.wikimedia.org/T188749#4023773 (10elukey) [14:35:52] hashar: all good [14:36:40] !log hashar@tin Started scap: core + Flow, master/replicate race condition - T182358 T184670 [14:36:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:36:57] T182358: Restore all default settings (in all sections)" will not opt-out of SD on User talk page - https://phabricator.wikimedia.org/T182358 [14:36:58] T184670: [wmf.16-regression] Fatal exception of type "Flow\Exception\InvalidDataException" for opting out from "Structured Discussions on user talk" - https://phabricator.wikimedia.org/T184670 [14:38:31] (03PS1) 10Gehel: wdqs: wrong escaping of quotes in prometheus check [puppet] - 10https://gerrit.wikimedia.org/r/416443 [14:38:45] (03PS2) 10Gehel: wdqs: wrong escaping of quotes in prometheus check [puppet] - 10https://gerrit.wikimedia.org/r/416443 [14:39:05] oh man [14:40:38] Jayprakash12345: I will do you rchanges now [14:40:53] (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416224 (https://phabricator.wikimedia.org/T188633) (owner: 10Jayprakash12345) [14:40:55] ok [14:40:57] (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416338 (https://phabricator.wikimedia.org/T188853) (owner: 10Jayprakash12345) [14:41:05] !log hashar@tin Finished scap: core + Flow, master/replicate race condition - T182358 T184670 (duration: 04m 24s) [14:41:12] (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414451 
(https://phabricator.wikimedia.org/T185982) (owner: 10Tulsi Bhagat) [14:41:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:41:23] stephanebisson: they are live on the whole cluster ! [14:41:29] hashar: thanks! [14:42:06] (03Merged) 10jenkins-bot: Enable rollbacker user right at arwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416224 (https://phabricator.wikimedia.org/T188633) (owner: 10Jayprakash12345) [14:42:20] (03CR) 10jenkins-bot: Enable rollbacker user right at arwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416224 (https://phabricator.wikimedia.org/T188633) (owner: 10Jayprakash12345) [14:42:26] (03Merged) 10jenkins-bot: Enable translate extension in bdwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416338 (https://phabricator.wikimedia.org/T188853) (owner: 10Jayprakash12345) [14:43:48] Jayprakash12345: deploying the fist "rollbacker right for arwikiversity" [14:44:20] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Enable rollbacker user right at arwikiversity - T188633 (duration: 00m 57s) [14:44:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:44:36] ok, looks good [14:44:36] T188633: Creation of Rollbacker group on ar.wikiversity - https://phabricator.wikimedia.org/T188633 [14:44:53] Jayprakash12345: and on mwdebug1001, Translate should be activated for bdwikimedia now [14:44:58] (03CR) 10Gehel: [C: 032] wdqs: wrong escaping of quotes in prometheus check [puppet] - 10https://gerrit.wikimedia.org/r/416443 (owner: 10Gehel) [14:45:02] hashar: ping me when swat is done, please [14:45:09] mobrovac: roger [14:46:28] Jayprakash12345: onmwdebug1001 https://bd.wikimedia.org/wiki/Special:Translate?uselang=en looks mroe or less good :) [14:46:39] it does not offer any translations, but I guess it is to be expected [14:46:50] (03CR) 10jenkins-bot: Enable translate extension in bdwikimedia [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/416338 (https://phabricator.wikimedia.org/T188853) (owner: 10Jayprakash12345) [14:47:00] yes [14:47:03] Deploy [14:47:33] \o/ [14:48:08] and the last is https://gerrit.wikimedia.org/r/#/c/414451/ "Add import sources on pawikisource" [14:48:34] (03PS3) 10Hashar: 2017 wikitext editor: Simplify config part 2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/413652 (owner: 10Jforrester) [14:48:34] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Enable translate extension in bdwikimedia - T188853 (duration: 00m 57s) [14:48:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:48:49] T188853: Install translate extension in bd.wikimedia.org - https://phabricator.wikimedia.org/T188853 [14:49:22] (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414451 (https://phabricator.wikimedia.org/T185982) (owner: 10Tulsi Bhagat) [14:50:25] (03PS4) 10Hashar: 2017 wikitext editor: Simplify config part 2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/413652 (owner: 10Jforrester) [14:50:30] hashar: Deploy 414451, If there is no Log error. Because I cant test it. 
[14:50:49] Jayprakash12345: yeah I am going to deploy it directly [14:50:55] (03Merged) 10jenkins-bot: Add import sources on pawikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414451 (https://phabricator.wikimedia.org/T185982) (owner: 10Tulsi Bhagat) [14:51:56] (03CR) 10jenkins-bot: Add import sources on pawikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414451 (https://phabricator.wikimedia.org/T185982) (owner: 10Tulsi Bhagat) [14:52:24] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Enable translate extension in bdwikimedia - T188853 (duration: 00m 57s) [14:52:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:52:49] (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/413652 (owner: 10Jforrester) [14:54:02] (03Merged) 10jenkins-bot: 2017 wikitext editor: Simplify config part 2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/413652 (owner: 10Jforrester) [14:54:35] !log hashar@tin Started scap: 2017 wikitext editor: Simplify config part 2 [14:54:48] (03CR) 10Hashar: [C: 032] "Deployed :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/413652 (owner: 10Jforrester) [14:54:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:56:29] (03CR) 10jenkins-bot: 2017 wikitext editor: Simplify config part 2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/413652 (owner: 10Jforrester) [14:57:28] (03CR) 10Andrew Bogott: "This code is in the midst of a major overhaul so I'd advise that you just avoid it for a couple of weeks. 
Soon there will be new wikitech" [puppet] - 10https://gerrit.wikimedia.org/r/415768 (owner: 10Dzahn) [14:57:32] !log hashar@tin Finished scap: 2017 wikitext editor: Simplify config part 2 (duration: 02m 57s) [14:57:45] !log European SWAT completed [14:57:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:57:47] mobrovac: done :) [14:58:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:58:20] 10Puppet, 10monitoring: puppetization of check_prometheus is not robust to the use of single quotes - https://phabricator.wikimedia.org/T188917#4023818 (10Gehel) [14:58:31] hashar: Thanks for being here. Anything else for me. Can I quit? [14:58:44] Jayprakash12345: all good. Thank you for the patches :) [14:58:48] hashar: ack, merci! [14:59:37] (03PS2) 10Gehel: wdqs: remove diamond collectors which have been replaced by prometheus [puppet] - 10https://gerrit.wikimedia.org/r/415889 [15:00:04] mobrovac and Pchelolo: #bothumor I � Unicode. All rise for JobQueue Deployment Window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180305T1500). [15:00:04] No GERRIT patches in the queue for this window AFAICS. [15:00:34] yes there are [15:03:11] lol [15:03:21] now that is bot humor [15:05:38] 10Operations, 10ops-codfw: rack/setup/install mw2259-mw2290 - https://phabricator.wikimedia.org/T188301#4002973 (10elukey) This is the current layout of our mw codfw servers: |role|A|B|C|D| |appserver| 20|25 (20)|37|0| |api|12|28 (15)|15|0| |videoscaler|1|2 (2) | 1|0| |jobrunner|12|0|5|0| In parentheses ther... 
[15:06:43] mobrovac: ahhhh it is good to see stuff moving to kafka :] [15:11:38] (03CR) 10Gehel: [C: 032] wdqs: remove diamond collectors which have been replaced by prometheus [puppet] - 10https://gerrit.wikimedia.org/r/415889 (owner: 10Gehel) [15:12:14] 10Operations, 10ops-codfw: rack/setup/install mw2259-mw2290 - https://phabricator.wikimedia.org/T188301#4023857 (10Joe) My racking recommendation would be to put the new servers in row B in place of the ones we're decommissioning. That would maintain a better balance between clusters and racks/rows. I can wo... [15:14:48] !log rebooting webperf2001 for kernel security update [15:15:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:15:09] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1068 - https://phabricator.wikimedia.org/T188187#4023865 (10Cmjohnson) @Marostegui Feel free to fail the disk...I am ready w/a replacement [15:16:26] 10Operations, 10ops-eqiad, 10Analytics-Kanban: DIMM errors for analytics1062 - https://phabricator.wikimedia.org/T187164#4023867 (10Cmjohnson) @elukey I will need to shutdown the server down and move the dimm around. Let me know when it's safe to do this [15:19:39] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1068 - https://phabricator.wikimedia.org/T188187#4023869 (10Marostegui) >>! In T188187#4023865, @Cmjohnson wrote: > @Marostegui Feel free to fail the disk...I am ready w/a replacement Thanks - I will do in a sec once I get someone to double check the comm... 
[15:20:15] !log mobrovac@tin Synchronized php-1.31.0-wmf.23/extensions/EventBus/includes/JobExecutor.php: [JobExecutor] Wait for the replicas if the transaction takes too long (duration: 00m 57s) [15:20:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:20:52] (03PS2) 10Gehel: wdqs: cleanup after removing diamond collectors [puppet] - 10https://gerrit.wikimedia.org/r/415891 [15:22:55] (03CR) 10Gehel: [C: 032] wdqs: cleanup after removing diamond collectors [puppet] - 10https://gerrit.wikimedia.org/r/415891 (owner: 10Gehel) [15:23:51] 10Operations, 10ops-eqiad, 10Analytics-Kanban: DIMM errors for analytics1062 - https://phabricator.wikimedia.org/T187164#4023874 (10elukey) >>! In T187164#4023867, @Cmjohnson wrote: > @elukey I will need to shutdown the server down and move the dimm around. Let me know when it's safe to do this The server... [15:28:53] !log Mark as failed disk 32:9 on db1068 (s4 primary master) - T188187 [15:29:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:29:09] T188187: Degraded RAID on db1068 - https://phabricator.wikimedia.org/T188187 [15:29:16] (03PS1) 10Muehlenhoff: Create a new group for basic Graphite service supervision [puppet] - 10https://gerrit.wikimedia.org/r/416452 (https://phabricator.wikimedia.org/T188649) [15:29:18] (03PS1) 10Muehlenhoff: Add imarlier to graphite-admins [puppet] - 10https://gerrit.wikimedia.org/r/416453 (https://phabricator.wikimedia.org/T188649) [15:31:11] !log ppchelko@tin Started deploy [cpjobqueue/deploy@fe5b1f3]: Enable refreshLinks for 50% of the jobs [15:31:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:31:41] (03CR) 10Mobrovac: [C: 032] Switch 50% for refreshLinks to kafka job queue [mediawiki-config] - 10https://gerrit.wikimedia.org/r/415877 (https://phabricator.wikimedia.org/T185052) (owner: 10Ppchelko) [15:31:49] !log ppchelko@tin Finished deploy [cpjobqueue/deploy@fe5b1f3]: Enable refreshLinks for 50% of 
the jobs (duration: 00m 39s) [15:32:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:33:59] (03CR) 10Paladox: [C: 031] Add a thirdparty/php71 component for use by Phabricator [puppet] - 10https://gerrit.wikimedia.org/r/415856 (owner: 10Muehlenhoff) [15:35:15] (03PS3) 10Mobrovac: Switch 50% for refreshLinks to kafka job queue [mediawiki-config] - 10https://gerrit.wikimedia.org/r/415877 (https://phabricator.wikimedia.org/T185052) (owner: 10Ppchelko) [15:35:51] marostegui: will you be syncing any mw-config changes in the next 30 mins or so? [15:36:14] (03CR) 10Paladox: [C: 031] Add repository configuration for thirdparty/php71 [puppet] - 10https://gerrit.wikimedia.org/r/415857 (owner: 10Muehlenhoff) [15:36:16] mobrovac: yeah, one in a sec, is that a problem? [15:36:37] (03PS1) 10Marostegui: db-eqiad.php: Depool db1069 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416454 (https://phabricator.wikimedia.org/T186699) [15:36:40] mobrovac: this one only ^ [15:36:47] marostegui: can it wait 2 mins? I need to sync something and already merged it [15:37:14] sure! [15:37:17] kk thnx [15:37:18] :) [15:37:37] 10Operations, 10Analytics, 10Traffic: Update documentation for "https" field in X-Analytics - https://phabricator.wikimedia.org/T188807#4023886 (10BBlack) >>! In T188807#4023770, @Tbayer wrote: > And is it correct to assume besides those HTTP --> HTTPS redirects, there are other cases where we send a 301 rep... 
[15:38:09] (03CR) 10jenkins-bot: Switch 50% for refreshLinks to kafka job queue [mediawiki-config] - 10https://gerrit.wikimedia.org/r/415877 (https://phabricator.wikimedia.org/T185052) (owner: 10Ppchelko) [15:38:54] !log mobrovac@tin Synchronized wmf-config/InitialiseSettings.php: Switch 50% for refreshLinks to EventBus - T185052 (duration: 00m 57s) [15:39:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:39:10] T185052: Migrate RefreshLinks job to kafka - https://phabricator.wikimedia.org/T185052 [15:39:17] marostegui: ok, you can go now (but will need back the control of tin after that) [15:39:29] thanks! [15:39:34] (no problem) [15:39:39] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1069 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416454 (https://phabricator.wikimedia.org/T186699) (owner: 10Marostegui) [15:40:54] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1069 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416454 (https://phabricator.wikimedia.org/T186699) (owner: 10Marostegui) [15:41:06] !log drain + reboot analytics 1055->57 for kernel updates [15:41:16] (03CR) 10Andrew Bogott: [C: 031] "Yes please!" [puppet] - 10https://gerrit.wikimedia.org/r/358896 (https://phabricator.wikimedia.org/T146285) (owner: 10Chad) [15:41:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:42:10] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1069 - T186699 (duration: 00m 57s) [15:42:11] mobrovac: all yours [15:42:19] gracias marostegui [15:42:20] !log stop and poweroff db1069 for rack change - T186699 [15:42:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:42:25] T186699: Move db1069 to A1 - https://phabricator.wikimedia.org/T186699 [15:42:28] mobrovac: de nada! 
[15:42:32] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1069 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416454 (https://phabricator.wikimedia.org/T186699) (owner: 10Marostegui) [15:42:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:43:16] (03PS2) 10Mobrovac: Switch all of the cdnPurge to kafka. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/415870 (https://phabricator.wikimedia.org/T188540) (owner: 10Ppchelko) [15:43:25] mobrovac: \o/ [15:43:35] :) [15:43:58] it will be interesting to plan the migration to Kafka 1.0 [15:44:15] yup elukey, looking fwd to it [15:45:10] !log ppchelko@tin Started deploy [cpjobqueue/deploy@346a2b6]: Switch all cdnPurge jobs to kafka [15:45:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:45:45] !log ppchelko@tin Finished deploy [cpjobqueue/deploy@346a2b6]: Switch all cdnPurge jobs to kafka (duration: 00m 35s) [15:45:54] (03CR) 10Mobrovac: [C: 032] Switch all of the cdnPurge to kafka. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/415870 (https://phabricator.wikimedia.org/T188540) (owner: 10Ppchelko) [15:45:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:46:30] (03PS1) 10Cmjohnson: update db1069 dns for rack change [dns] - 10https://gerrit.wikimedia.org/r/416455 (https://phabricator.wikimedia.org/T186699) [15:47:25] (03Merged) 10jenkins-bot: Switch all of the cdnPurge to kafka. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/415870 (https://phabricator.wikimedia.org/T188540) (owner: 10Ppchelko) [15:48:14] (03PS1) 10Sbisson: Revert "Hide Flow beta feature everywhere but testwiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416456 [15:48:27] (03CR) 10Cmjohnson: [C: 032] update db1069 dns for rack change [dns] - 10https://gerrit.wikimedia.org/r/416455 (https://phabricator.wikimedia.org/T186699) (owner: 10Cmjohnson) [15:48:29] (03PS2) 10Sbisson: Revert "Hide Flow beta feature everywhere but testwiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416456 [15:48:45] (03PS1) 10Sbisson: Revert "Enable log channel T184670" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416457 [15:48:50] (03PS2) 10Sbisson: Revert "Enable log channel T184670" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416457 [15:48:56] (03CR) 10Imarlier: [C: 031] Add imarlier to graphite-admins [puppet] - 10https://gerrit.wikimedia.org/r/416453 (https://phabricator.wikimedia.org/T188649) (owner: 10Muehlenhoff) [15:49:30] (03CR) 10Imarlier: [C: 031] Create a new group for basic Graphite service supervision [puppet] - 10https://gerrit.wikimedia.org/r/416452 (https://phabricator.wikimedia.org/T188649) (owner: 10Muehlenhoff) [15:49:44] !log mobrovac@tin Synchronized wmf-config/jobqueue.php: Switch all of the cdnPurge to EventBus, file 1/2 - T188540 (duration: 00m 57s) [15:49:55] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1068 - https://phabricator.wikimedia.org/T188187#4023913 (10Marostegui) This has been replaced by Chris: ``` root@db1068:~# megacli -PDRbld -ShowProg -PhysDrv [32:9] -aALL Rebuild Progress on Device at Enclosure 32, Slot 9 Completed 1% in 12 Minutes. 
``` [15:50:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:50:01] T188540: Switch cdnPurge to Kafka - https://phabricator.wikimedia.org/T188540 [15:50:10] !log Deploy schema change on dbstore1002 - T187089 T185128 T153182 [15:50:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:50:26] T187089: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089 [15:50:27] T153182: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182 [15:50:27] T185128: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128 [15:50:46] PROBLEM - Host db1069.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [15:51:04] !log mobrovac@tin Synchronized wmf-config/InitialiseSettings.php: Switch all of the cdnPurge to EventBus, file 2/2 - T188540 (duration: 00m 57s) [15:51:05] ^ that is expected, db1069 is being moved to another rack [15:51:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:52:00] (03CR) 10jenkins-bot: Switch all of the cdnPurge to kafka. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/415870 (https://phabricator.wikimedia.org/T188540) (owner: 10Ppchelko) [15:52:28] !log updating `system_traces` keyspace replication strategy, restbase cassandra cluster [15:52:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:54:55] (03CR) 10Elukey: [C: 032] "After a chat with Alex it seems that this is what originally introduced the rule for vanadium:" [puppet] - 10https://gerrit.wikimedia.org/r/416389 (https://phabricator.wikimedia.org/T114199) (owner: 10Elukey) [15:55:01] (03PS2) 10Elukey: profile::tcpircbot: remove eventlog1001 references [puppet] - 10https://gerrit.wikimedia.org/r/416389 (https://phabricator.wikimedia.org/T114199) [15:55:40] !log setting trace probability to 0.001 (.1%), eqiad datacenter, restbase cassandra cluster [15:55:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:55:56] RECOVERY - Host db1069.mgmt is UP: PING OK - Packet loss = 0%, RTA = 2.80 ms [15:56:16] PROBLEM - MegaRAID on db1068 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) [15:56:17] ACKNOWLEDGEMENT - MegaRAID on db1068 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T188918 [15:56:23] !log upload helm on apt.wikimedia.org Component: main distros: jessie-wikimedia, stretch-wikimedia T189919 [15:56:28] !log upload tiller on apt.wikimedia.org Component: main distros: jessie-wikimedia, stretch-wikimedia T189919 [15:56:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:56:39] (03PS2) 10Vgutierrez: Release PyBal 1.15.0 [debs/pybal] (1.15) - 10https://gerrit.wikimedia.org/r/416412 [15:56:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:59:04] (03CR) 10ArielGlenn: "This is probably ok for a first take but see comments on the ticket." 
[puppet] - 10https://gerrit.wikimedia.org/r/416442 (https://phabricator.wikimedia.org/T187073) (owner: 10Elukey) [16:00:26] !log test [16:00:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:04:40] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: Move db1069 to A1 - https://phabricator.wikimedia.org/T186699#4023961 (10Marostegui) 05Open>03Resolved This is all done now - Chris will update racktables Thanks @Cmjohnson [16:05:34] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1098:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416460 [16:06:29] (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1098:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416460 [16:10:21] (03PS1) 10Andrew Bogott: labweb/wikitech/silver: one and only one host syncing with wikitech-static [puppet] - 10https://gerrit.wikimedia.org/r/416463 [16:11:01] (03CR) 10jerkins-bot: [V: 04-1] labweb/wikitech/silver: one and only one host syncing with wikitech-static [puppet] - 10https://gerrit.wikimedia.org/r/416463 (owner: 10Andrew Bogott) [16:14:13] (03PS2) 10Andrew Bogott: labweb/wikitech/silver: one and only one host syncing with wikitech-static [puppet] - 10https://gerrit.wikimedia.org/r/416463 [16:16:49] mobrovac: you around? 
[16:19:23] (03CR) 10Dzahn: [C: 031] Add imarlier to graphite-admins [puppet] - 10https://gerrit.wikimedia.org/r/416453 (https://phabricator.wikimedia.org/T188649) (owner: 10Muehlenhoff) [16:19:26] (03PS2) 10Herron: initial commit of 4.4.0-1 [debs/puppetdb] (4.4.0-1) - 10https://gerrit.wikimedia.org/r/415591 [16:20:17] (03CR) 10Dzahn: [C: 031] Create a new group for basic Graphite service supervision [puppet] - 10https://gerrit.wikimedia.org/r/416452 (https://phabricator.wikimedia.org/T188649) (owner: 10Muehlenhoff) [16:21:42] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1098:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416460 (owner: 10Marostegui) [16:23:19] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1098:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416460 (owner: 10Marostegui) [16:24:33] (03PS3) 10Andrew Bogott: labweb: one and only one host syncing with wikitech-static [puppet] - 10https://gerrit.wikimedia.org/r/416463 [16:24:52] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1098:3317 after alter table (duration: 00m 57s) [16:25:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:26:04] elukey: a downtime that is really long, like a year [16:27:31] (03PS1) 10Marostegui: db-eqiad.php: Repool db1069 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416468 (https://phabricator.wikimedia.org/T186699) [16:28:12] (03PS2) 10Marostegui: db-eqiad.php: Repool db1069 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416468 (https://phabricator.wikimedia.org/T186699) [16:29:46] (03PS2) 10Elukey: role::analytics_cluster::client: force remount of HDFS mountpoint [puppet] - 10https://gerrit.wikimedia.org/r/416442 (https://phabricator.wikimedia.org/T187073) [16:29:57] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Repool db1069 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416468 (https://phabricator.wikimedia.org/T186699) (owner: 10Marostegui) [16:30:14] 
10Operations, 10DC-Ops, 10monitoring: memory errors not showing in icinga - https://phabricator.wikimedia.org/T183177#3845956 (10fgiunchedi) Outcome from today's monitoring meeting: needs more investigation wrt we can get the hardware errors status from e.g. ipmi or linux directly. Another option is also loo... [16:30:34] (03CR) 10Andrew Bogott: [C: 032] labweb: one and only one host syncing with wikitech-static [puppet] - 10https://gerrit.wikimedia.org/r/416463 (owner: 10Andrew Bogott) [16:31:11] (03Merged) 10jenkins-bot: db-eqiad.php: Repool db1069 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416468 (https://phabricator.wikimedia.org/T186699) (owner: 10Marostegui) [16:31:49] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1098:3317" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416460 (owner: 10Marostegui) [16:31:53] (03CR) 10jenkins-bot: db-eqiad.php: Repool db1069 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416468 (https://phabricator.wikimedia.org/T186699) (owner: 10Marostegui) [16:32:21] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1069 - T186699 (duration: 00m 57s) [16:32:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:32:37] T186699: Move db1069 to A1 - https://phabricator.wikimedia.org/T186699 [16:33:26] 10Operations, 10monitoring, 10User-fgiunchedi: Better organization for ops grafana dashboards - https://phabricator.wikimedia.org/T178690#4024073 (10akosiaris) [16:34:37] (03PS15) 10Bstorm: wiki-replicas: Accommodate new comments table with rules and compatibility [puppet] - 10https://gerrit.wikimedia.org/r/415384 (https://phabricator.wikimedia.org/T181650) [16:42:52] (03CR) 10Jcrespo: "While this is the "right way" to do it, will the comments table be usable (I am talking performance, as it could need to touch so many tab" [puppet] - 10https://gerrit.wikimedia.org/r/415384 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [16:45:55] (03PS1) 10Giuseppe 
Lavagetto: Fetch the last modified index in etcd.php, and expose it via siteinfo. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416470 [16:46:11] 10Operations, 10MediaWiki-Configuration, 10monitoring: EtcdConfig: add Icinga check - https://phabricator.wikimedia.org/T188922#4024115 (10Volans) p:05Triage>03Normal [16:46:53] (03PS1) 10Elukey: eventlogging: remove zmq-forwarder [puppet] - 10https://gerrit.wikimedia.org/r/416471 (https://phabricator.wikimedia.org/T114199) [16:47:51] (03PS1) 10Jcrespo: tendril: Only mark as high QPS > 25000 [software/tendril] - 10https://gerrit.wikimedia.org/r/416472 [16:49:19] (03PS2) 10Giuseppe Lavagetto: Fetch the last modified index in etcd.php, and expose it via siteinfo. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416470 [16:50:33] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler02/10261/eventlog1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/416471 (https://phabricator.wikimedia.org/T114199) (owner: 10Elukey) [16:50:35] (03CR) 10Marostegui: [C: 031] tendril: Only mark as high QPS > 25000 [software/tendril] - 10https://gerrit.wikimedia.org/r/416472 (owner: 10Jcrespo) [16:51:14] (03CR) 10Jcrespo: [V: 032 C: 032] tendril: Only mark as high QPS > 25000 [software/tendril] - 10https://gerrit.wikimedia.org/r/416472 (owner: 10Jcrespo) [16:51:18] 10Operations, 10Analytics: setup/install eventlog1002.eqiad.wmnet - https://phabricator.wikimedia.org/T185667#4024131 (10elukey) [16:53:43] 10Operations, 10Analytics, 10ChangeProp, 10EventBus, and 4 others: Select candidate jobs for transferring to the new infrastucture - https://phabricator.wikimedia.org/T175210#4024138 (10mobrovac) [16:53:48] 10Operations, 10Analytics, 10ChangeProp, 10EventBus, and 5 others: Switch cdnPurge to Kafka - https://phabricator.wikimedia.org/T188540#4024136 (10mobrovac) [16:54:35] 10Operations, 10Analytics, 10ChangeProp, 10EventBus, and 4 others: Switch cdnPurge to Kafka - https://phabricator.wikimedia.org/T188540#4011588 
(10mobrovac) 05Open>03Resolved The `cdnPurge` has been successfully moved over to EventBus. Resolving. [16:59:55] (03PS1) 10Alexandros Kosiaris: Add tiller image used to run services [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/416473 [17:04:50] (03PS1) 10Gehel: wdqs: enable kafka poller on all production nodes [puppet] - 10https://gerrit.wikimedia.org/r/416475 (https://phabricator.wikimedia.org/T188252) [17:04:52] (03CR) 10Bstorm: "> While this is the "right way" to do it, will the comments table be" [puppet] - 10https://gerrit.wikimedia.org/r/415384 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [17:07:30] 10Operations, 10netops: cr1-eqsin faulty interfaces - https://phabricator.wikimedia.org/T187807#4024176 (10BBlack) The shipping company has updated: `05-Mar-2018 18:34:00 SGT Proof of Delivery Rcvd` [17:13:36] 10Operations, 10Patch-For-Review: setup/install bast1002(WMF4749) - https://phabricator.wikimedia.org/T186623#4024188 (10Dzahn) could get console again, installer fails at: │ Input/output error during write on /dev/sdb │ [17:15:46] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#4024193 (10Dzahn) installer is in install loop, fails at partioner (like bast1002 T186623#4024188) [17:19:19] (03PS6) 10Volans: Icinga: add sync check for MW config on etcd [puppet] - 10https://gerrit.wikimedia.org/r/413355 (https://phabricator.wikimedia.org/T182597) [17:19:21] (03PS6) 10Volans: Icinga: add EtcdConfig sync check on MW hosts [puppet] - 10https://gerrit.wikimedia.org/r/413356 (https://phabricator.wikimedia.org/T182597) [17:20:41] 10Operations, 10ops-eqsin, 10Traffic, 10netops: replace sfp+ in use in eqsin EX4600 - https://phabricator.wikimedia.org/T188923#4024207 (10RobH) p:05Triage>03High [17:22:40] 10Operations, 10Analytics, 10Traffic: Investigate and fix odd uri_host values - 
https://phabricator.wikimedia.org/T188804#4024228 (10fdans) [17:23:52] 10Operations, 10Domains, 10Traffic, 10WMF-Design, and 3 others: Create subdomain for Design and Wikimedia User Interface Style Guide - https://phabricator.wikimedia.org/T185282#4024232 (10Volker_E) @MarcoAurelio As you've accomplished T188887#4023319 I'd like to ask you for support here as well… :) [17:25:46] (03CR) 10BryanDavis: [C: 031] "The forcing of php5 was an "optimization" for not starting HHVM to do one-off jobs, but this can be managed with the `PHP` environment var" [puppet] - 10https://gerrit.wikimedia.org/r/358896 (https://phabricator.wikimedia.org/T146285) (owner: 10Chad) [17:27:54] 10Operations, 10ops-codfw, 10User-Elukey: rack/setup/install mw2259-mw2290 - https://phabricator.wikimedia.org/T188301#4024261 (10elukey) [17:30:06] (03PS1) 10Ppchelko: Swith all refreshLinks jobs to Kafka. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416476 (https://phabricator.wikimedia.org/T185052) [17:32:05] !log Added zhuyifei1999_ and chicocvenancio to the "toollabs-trusted" gerrit group [17:32:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:32:53] bd808: thanks [17:33:21] zhuyifei1999_: I'm just making it easier for me to push work on you, but you're welcome :) [17:33:30] brion: as the presumed expert in obscure mediawiki maintenance scripts… do you know if there's a way to get a dump (or even a list) of all images used on a given wiki? This is for dumping wikitech and duplicating it on wikitech-static. Best I can tell, 'dumpUploads.php' only gives me files that are stored locally, not things in swift and/or commons. [17:33:41] 10Operations, 10Domains, 10Traffic, 10WMF-Design, and 3 others: Create subdomain for Design and Wikimedia User Interface Style Guide - https://phabricator.wikimedia.org/T185282#4024287 (10MarcoAurelio) @Volker_E Sorry but I don't think I'm able to do what it's requested here (that is, create page under the... 
[17:33:46] lol [17:34:39] andrewbogott: well, you can dump imagelinks table manually :) [17:34:40] andrewbogott: hmm... listing the File namespace? I think that every instantcommons usage ends up making a local shadow File:.... page [17:34:47] !log drain + reboot analytics10[58-60] for kernel updates [17:35:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:35:31] bd808: no actual page entry unless you create it locally, just the subbed-in data [17:35:56] brion, bd808, one thing I want to do (ideally) is exclude files that were uploaded to the wiki but aren't actually on any pages. It sounds like brion's suggestion might do that? [17:36:14] in theory i think yea [17:36:30] yeah, imagelinks should be up to date if the jobs are all running correctly I think [17:36:32] That'd be great, right now I'm duplicating a fair bit of porn spam to wikitech-static :) [17:36:40] PROBLEM - Hadoop NodeManager on analytics1059 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [17:37:00] PROBLEM - Hadoop NodeManager on analytics1058 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [17:37:21] PROBLEM - Hadoop NodeManager on analytics1060 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [17:37:23] I was almost sure they were downtimed [17:37:30] OK, so probably I'll write my own maintenance script. D'you think I should try to commit that upstream or just assume this is only useful for my one use case? 
[17:37:42] I suppose actually copying the files will be more complicated once we move from local thumbs to thumbor/swift [17:37:49] sorry for the spam [17:38:29] andrewbogott: https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/extensions/WikimediaMaintenance is made for this kind of weirdly one-off thing [17:38:46] cool [17:39:35] So, 1) list image files, 2) wikitech-static wgets each of those files one by one, 3) wikitech-static mucks up the imagelinks table to change everything to a local file ref [17:39:56] 2) could be 'grab and compress all images into a tarball locally' instead, not sure if that's better though [17:42:30] andrewbogott: what is the substitute of silver in terms of application server, the regulars mediawiki cluster? [17:42:42] or will it be labswebSOMETHING [17:43:09] andrewbogott: once they are coming from swift I'm not sure that there is lot of difference between pulling from inside or outside the network. If you could do some kind of revision check then you could keep the traffic down to the deltas from each dump [17:43:18] jynus: labweb100[12] [17:43:23] thanks [17:43:29] jynus: the new hosts are labweb1001 and labweb1002, in lvs as 'labweb' [17:43:34] cool [17:43:54] we have to kill off OpenStackManager completely before we move to the main cluster [17:44:14] bd808: yeah, not sure how to do version checking but hopefully that's something included in the imagelinks table [17:44:24] https://phabricator.wikimedia.org/T188926 [17:44:34] bd808: Kill it with fire! 
[17:44:38] bd808: I wasn't sure if that was scheduled for short term or only long term, that was my confusion [17:45:16] jynus: *nod* it's a "mid-term" move and we don't have any firm timeline for the next step [17:46:06] 10Operations, 10Analytics, 10Traffic: Investigate and fix odd uri_host values - https://phabricator.wikimedia.org/T188804#4020291 (10BBlack) The bottom line is that the value of `uri_host` is entirely up to the client, and therefore subject to client-side stupidity. It's legal (in all protocol senses) for a... [17:46:25] James_F: working on it :) Do you want to write a Horizon dashboard for us to take over the last project membership management bits? Good Django + weird framework practice :) [17:47:02] (03PS1) 10Ppchelko: Remove special jobrunners for refreshLinks. [puppet] - 10https://gerrit.wikimedia.org/r/416481 (https://phabricator.wikimedia.org/T185052) [17:47:10] bd808: I've never used Django. This might not be the best time to start, given the time-criticality. ;-) [17:47:21] (03CR) 10Anomie: "> In my local copy of the setup (using Mediawiki Vagrant), the temp" [puppet] - 10https://gerrit.wikimedia.org/r/415384 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [17:47:35] (03CR) 10jerkins-bot: [V: 04-1] Remove special jobrunners for refreshLinks. [puppet] - 10https://gerrit.wikimedia.org/r/416481 (https://phabricator.wikimedia.org/T185052) (owner: 10Ppchelko) [17:48:01] James_F: it's not blocking the HHVM/PHP7 move so it's not too critical [17:48:08] * James_F nods. [17:48:18] it would be "nice" though [17:48:34] Yeah, less stuff in the cluster at large is always good. [17:49:21] (03PS2) 10Ppchelko: Remove special jobrunners for refreshLinks. 
[puppet] - 10https://gerrit.wikimedia.org/r/416481 (https://phabricator.wikimedia.org/T185052) [17:49:43] (03PS3) 10Rush: openstack: keystone bootstrap setup for mitaka [puppet] - 10https://gerrit.wikimedia.org/r/415392 (https://phabricator.wikimedia.org/T188266) [17:50:03] (03PS4) 10Rush: openstack: keystone bootstrap setup for mitaka [puppet] - 10https://gerrit.wikimedia.org/r/415392 (https://phabricator.wikimedia.org/T188266) [17:50:34] (03CR) 10jerkins-bot: [V: 04-1] openstack: keystone bootstrap setup for mitaka [puppet] - 10https://gerrit.wikimedia.org/r/415392 (https://phabricator.wikimedia.org/T188266) (owner: 10Rush) [17:51:39] (03PS1) 10Giuseppe Lavagetto: Fetch data from etcd on every server, but use them only for labs/x-wikimedia-debug hosts. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416482 [17:51:41] (03PS1) 10Giuseppe Lavagetto: Enable use of EtcdConfig everywhere. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416483 [17:53:28] (03CR) 10Bstorm: ">" [puppet] - 10https://gerrit.wikimedia.org/r/415384 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [17:53:50] RECOVERY - Hadoop NodeManager on analytics1059 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [17:53:52] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Give sbisson the rights to deploy maps and restart maps-related services - https://phabricator.wikimedia.org/T188720#4024357 (10Gehel) approved in Ops meeting [17:53:53] (03PS3) 10Gehel: Stephane Bisson should be able to deploy maps. [puppet] - 10https://gerrit.wikimedia.org/r/415845 (https://phabricator.wikimedia.org/T188720) [17:54:01] RECOVERY - Hadoop NodeManager on analytics1058 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [17:56:30] (03CR) 10Gehel: [C: 032] Stephane Bisson should be able to deploy maps. 
[puppet] - 10https://gerrit.wikimedia.org/r/415845 (https://phabricator.wikimedia.org/T188720) (owner: 10Gehel) [17:58:45] (03PS5) 10Rush: openstack: keystone bootstrap setup for mitaka [puppet] - 10https://gerrit.wikimedia.org/r/415392 (https://phabricator.wikimedia.org/T188266) [17:59:10] (03PS6) 10Rush: openstack: keystone bootstrap setup for mitaka [puppet] - 10https://gerrit.wikimedia.org/r/415392 (https://phabricator.wikimedia.org/T188266) [17:59:48] (03CR) 10jerkins-bot: [V: 04-1] openstack: keystone bootstrap setup for mitaka [puppet] - 10https://gerrit.wikimedia.org/r/415392 (https://phabricator.wikimedia.org/T188266) (owner: 10Rush) [18:00:04] gehel: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Wikidata Query Service weekly deploy deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180305T1800). [18:00:04] No GERRIT patches in the queue for this window AFAICS. [18:00:40] RECOVERY - Check systemd state on stat1005 is OK: OK - running: The system is fully operational [18:12:14] (03CR) 10Jcrespo: "You can test directly on the final hosts, no problem- these are not like the ones serving to production application servers, and even on t" [puppet] - 10https://gerrit.wikimedia.org/r/415384 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [18:12:18] (03PS1) 10Gehel: wdqs: fix updater options for wdqs-test [puppet] - 10https://gerrit.wikimedia.org/r/416484 [18:12:54] (03CR) 10Smalyshev: [C: 031] wdqs: fix updater options for wdqs-test [puppet] - 10https://gerrit.wikimedia.org/r/416484 (owner: 10Gehel) [18:13:03] (03CR) 10Gehel: [C: 032] wdqs: fix updater options for wdqs-test [puppet] - 10https://gerrit.wikimedia.org/r/416484 (owner: 10Gehel) [18:15:17] (03CR) 10Imarlier: [C: 031] "Will let you know when the coal update is rolled out" [puppet] - 10https://gerrit.wikimedia.org/r/416471 (https://phabricator.wikimedia.org/T114199) (owner: 10Elukey) [18:15:24] 10Operations, 10Ops-Access-Requests, 
10Patch-For-Review: Give sbisson the rights to deploy maps and restart maps-related services - https://phabricator.wikimedia.org/T188720#4024481 (10Gehel) 05Open>03Resolved a:03Gehel [18:16:55] !log gehel@tin Started deploy [wdqs/wdqs@11c73f0]: new WDQS GUI and updater version [18:17:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:17:26] (03CR) 10Anomie: [C: 031] "Seems sane, unless we somehow backslid on SUL finalization. Haven't tested." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416331 (https://phabricator.wikimedia.org/T57420) (owner: 10Gergő Tisza) [18:20:03] !log gehel@tin Finished deploy [wdqs/wdqs@11c73f0]: new WDQS GUI and updater version (duration: 03m 08s) [18:20:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:21:40] SMalyshev: ^ wdqs deployment completed, all tests are green [18:21:58] SMalyshev: ouch, UI is broken [18:22:06] rolling back [18:23:13] !log gehel@tin Started deploy [wdqs/wdqs@11c73f0]: rolling back to previous state, UI is broken [18:23:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:23:46] gehel: hmm what's broken in UI? [18:24:07] !log gehel@tin Finished deploy [wdqs/wdqs@11c73f0]: rolling back to previous state, UI is broken (duration: 00m 54s) [18:24:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:24:58] SMalyshev: a bunch of JS + CSS not loading [18:25:16] but still the same issue after rollback (at least for me) [18:25:32] gehel: hmm loading fine for me... can you specify? [18:26:11] SMalyshev: I might have a cached version of the HTML... but it also fails in incognito mode... [18:26:11] gehel: everything loads fine for me [18:26:20] SMalyshev: I trust you! 
[18:26:49] incognito's fine too [18:26:52] SMalyshev: just give me 5' to see if I can understand what is broken on my side [18:26:54] looks good to me [18:26:59] gehel: okie [18:27:16] but I also don’t see the latest GUI changes being reflected… but I guess that was after the last GUI update in the deploy repo [18:27:17] gehel: try developer console with disable cache checked and see if it makes a difference [18:27:51] Lucas_WMDE: I just rolled back the deployment, I'll push it again once I understand why things are broken on my side [18:28:31] gehel: I tried it before your latest !log as well [18:28:36] but I might’ve had a cached version [18:30:54] Ok, looks like we have a max-age=300 on the HTML, so I was probably unlucky with caching. I'll open a task about that... [18:31:24] SMalyshev: I'll deploy again after our standup, no need to rush right now... [18:31:50] (03PS1) 10Subramanya Sastry: Enable RemexHTML on kowiki, mznwiki, warwiki, cebwiki, nlwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416489 (https://phabricator.wikimedia.org/T188869) [18:33:25] bstorm_: I am very sorry for my accident- I am recovering everything [18:33:49] No worries! Awesome that you can recover it! [18:33:55] it is taking me more than it should because I found some holes in our backups [18:34:19] (03PS1) 10Ppchelko: Switch dynamic and prioritized refreshLinks to kafka. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416490 (https://phabricator.wikimedia.org/T185052) [18:34:22] So it turned out to be useful then ;-) [18:34:30] For finding holes, that is [18:34:32] I am embarrassed [18:34:51] this is the first time I did this- I clicked the edit button by accident [18:34:57] the edit the patch [18:35:07] so I panicked and clicked the delete change [18:35:16] instead of the discard patch [18:35:29] (03CR) 10jerkins-bot: [V: 04-1] Switch dynamic and prioritized refreshLinks to kafka. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/416490 (https://phabricator.wikimedia.org/T185052) (owner: 10Ppchelko) [18:35:41] * bstorm_ hugs jynus. Is ok. Also, I just learned something not to do myself. [18:36:34] then I just realized I can only revert changes up to 2 days on the master because of bad configuration [18:36:47] but we have 1 week on the replica + backups [18:36:57] I've certainly deleted things on web interfaces like this one that I didn't intend to. I ended up rebuilding a build configuration on TeamCity (which is a lot like Jenkins) for over an hour because I did that. [18:37:43] can you confirm our first change is from 2018-02-28 20:51:40 ? [18:38:47] Looks like Wed, Feb 28, 13:51 from Phab's link [18:38:50] So yep [18:39:09] thanks, I have everything here, just have to make sure I do not re-delete it again [18:39:15] Cool [18:42:08] bstorm_: apparently, I have just been told that the "delete" button is a new functionality [18:42:36] :) [18:47:17] (03PS3) 10Ppchelko: Remove special jobrunners for refreshLinks and htmlCacheUpdate. [puppet] - 10https://gerrit.wikimedia.org/r/416481 (https://phabricator.wikimedia.org/T185052) [18:47:29] (03CR) 10jerkins-bot: [V: 04-1] Remove special jobrunners for refreshLinks and htmlCacheUpdate. [puppet] - 10https://gerrit.wikimedia.org/r/416481 (https://phabricator.wikimedia.org/T185052) (owner: 10Ppchelko) [18:48:37] (03PS2) 10Ppchelko: Switch dynamic and prioritized refreshLinks to kafka. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416490 (https://phabricator.wikimedia.org/T185052) [18:51:05] (03PS2) 10Krinkle: scap prep: Scap-ify the creation of beta's StartProfiler.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416334 (https://phabricator.wikimedia.org/T180766) [18:54:23] !log stop slave on db2044 [18:54:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:54:42] (03PS4) 10Ppchelko: Remove special jobrunners for refreshLinks and htmlCacheUpdate. 
[puppet] - 10https://gerrit.wikimedia.org/r/416481 (https://phabricator.wikimedia.org/T185052) [18:54:54] (03CR) 10jerkins-bot: [V: 04-1] Remove special jobrunners for refreshLinks and htmlCacheUpdate. [puppet] - 10https://gerrit.wikimedia.org/r/416481 (https://phabricator.wikimedia.org/T185052) (owner: 10Ppchelko) [18:55:23] (03Abandoned) 10Ppchelko: Remove jobrunner config specific to htmlCacheUpdate. [puppet] - 10https://gerrit.wikimedia.org/r/408576 (https://phabricator.wikimedia.org/T182023) (owner: 10Ppchelko) [18:58:11] 10Operations, 10Analytics, 10ChangeProp, 10EventBus, and 4 others: Select candidate jobs for transferring to the new infrastucture - https://phabricator.wikimedia.org/T175210#4024653 (10Pchelolo) [19:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Morning SWAT (Max 8 patches) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180305T1900). [19:00:05] Lucas_WMDE, stephanebisson, and James_F: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [19:00:18] hello [19:00:25] hi :) [19:00:27] my patch is still in the same situation as last Thursday (where it was discussed with zeljkof and hashar) – CI still failing and I don’t know how to fix it [19:01:03] either convince yourself the failure is unrelated and ignore it, or skip the change :) [19:01:39] (if the change is skipped, the non-backported version will be rolled out with the next train) [19:01:50] Hey. [19:03:44] Do we have a volunteer to do the deployment? 
[19:04:32] I'll do it [19:06:25] (03PS3) 10MaxSem: Revert "Enable log channel T184670" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416457 (owner: 10Sbisson) [19:07:27] Lucas_WMDE: I'm personally not comfortable about your patch. your team has deployers, including swatters - maybe they can look into this? [19:07:36] (03CR) 10MaxSem: [C: 032] Revert "Enable log channel T184670" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416457 (owner: 10Sbisson) [19:08:10] MaxSem: ^ this one is not testable, it's just cleaning up an unused logging channel [19:08:12] addshore: do you want to look into that backport? (https://gerrit.wikimedia.org/r/415319) [19:08:49] (03Merged) 10jenkins-bot: Revert "Enable log channel T184670" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416457 (owner: 10Sbisson) [19:09:08] (03CR) 10jenkins-bot: Revert "Enable log channel T184670" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416457 (owner: 10Sbisson) [19:10:42] (03PS1) 10Nuria: Archiving data on piwik every 8 hrs [puppet] - 10https://gerrit.wikimedia.org/r/416494 (https://phabricator.wikimedia.org/T188939) [19:14:04] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/416457/ (duration: 00m 58s) [19:14:10] stephanebisson: ^ [19:14:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:14:59] MaxSem: thanks [19:16:24] (03PS3) 10MaxSem: Revert "Hide Flow beta feature everywhere but testwiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416456 (owner: 10Sbisson) [19:16:32] (03CR) 10MaxSem: [C: 032] Revert "Hide Flow beta feature everywhere but testwiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416456 (owner: 10Sbisson) [19:18:06] (03Merged) 10jenkins-bot: Revert "Hide Flow beta feature everywhere but testwiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416456 (owner: 10Sbisson) [19:19:02] (03CR) 10jenkins-bot: Revert "Hide Flow beta feature everywhere but 
testwiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416456 (owner: 10Sbisson) [19:19:10] stephanebisson: pulled on mwdebug1002, please test [19:19:19] yes sir [19:20:36] !log gehel@tin Started deploy [wdqs/wdqs@11c73f0]: new WDQS GUI and updater version [19:20:47] MaxSem: all good [19:20:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:21:59] !log gehel@tin Finished deploy [wdqs/wdqs@11c73f0]: new WDQS GUI and updater version (duration: 01m 23s) [19:22:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:22:47] SMalyshev: re-deployment completed, tests are green, GUI looks good, sorry for the delay [19:22:52] Lucas_WMDE: cc ^ [19:23:21] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/416456/ (duration: 00m 58s) [19:23:32] (03PS2) 10Gehel: wdqs: enable kafka poller on all production nodes [puppet] - 10https://gerrit.wikimedia.org/r/416475 (https://phabricator.wikimedia.org/T188252) [19:23:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:23:37] gehel: thanks – multilingual templates still not in place, but I guess that was just merged too recently (only a few hours ago) [19:23:55] stephanebisson: ^ [19:24:07] MaxSem: thanks [19:24:18] gehel: we seem to have a breakage in the GUI :( when I do a query, result is not displayed [19:24:30] Uncaught ReferenceError: pluralRuleParser is not defined [19:24:36] SMalyshev: damn... this time it worked for me :) [19:24:49] gehel: did you try a query in the GUI? [19:24:55] SMalyshev: yep, I did [19:25:06] hmm let me clean the caches [19:25:11] James_F: ready? [19:25:11] maybe it's my fault [19:25:21] SMalyshev, gehel: might be my fault [19:25:33] MaxSem: Always. 
[19:25:36] sounds like it could be https://gerrit.wikimedia.org/r/c/412698/ [19:26:11] yeah still fails [19:26:28] Lucas_WMDE, SMalyshev: ok, let's roll back just the GUI change, and keep the removal of jolokia [19:26:47] gehel: yep let me rollback GUI [19:27:08] SMalyshev: I can just move the deployment server back to the previous commit [19:27:26] gehel: ok, let's try that [19:28:12] !log gehel@tin Started deploy [wdqs/wdqs@11c73f0]: rolling back previous GUI update [19:28:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:28:56] (03PS1) 10Bstorm: wiki-replicas: Accommodate new comments table with rules and compatibility [puppet] - 10https://gerrit.wikimedia.org/r/416496 (https://phabricator.wikimedia.org/T181650) [19:29:37] Lucas_WMDE: it works on my local install from master :( [19:29:45] same here :/ [19:30:17] not sure what's wrong... maybe something messed up with the build [19:30:45] let me set it up on test [19:30:48] !log gehel@tin Finished deploy [wdqs/wdqs@11c73f0]: rolling back previous GUI update (duration: 02m 36s) [19:30:55] SMalyshev, Lucas_WMDE: rollback completed. Let's take a bit of time to check everything before we try again... [19:31:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:31:10] you can run $.i18n( 'wdqs-app-resultbrowser-response-summary', 10, 20 ) in the console to test it btw [19:31:11] (03PS6) 10MaxSem: beta: remove $wgReadingListsCentralWiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414017 [19:31:16] (03CR) 10MaxSem: [C: 032] beta: remove $wgReadingListsCentralWiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414017 (owner: 10MaxSem) [19:31:40] Lucas_WMDE: let me try it [19:31:44] SMalyshev, Lucas_WMDE: for the record, wdqs-test still has the latest GUI version [19:32:07] Lucas_WMDE: yep that fails on prod [19:32:26] gehel: where is that? 
I didn’t even know that existed [19:32:28] (03Merged) 10jenkins-bot: beta: remove $wgReadingListsCentralWiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414017 (owner: 10MaxSem) [19:32:40] Lucas_WMDE: wdqs-test.wmflabs.org [19:32:42] Lucas_WMDE: but works on wdqs-test... let me check it's the same files [19:32:43] (03CR) 10jenkins-bot: beta: remove $wgReadingListsCentralWiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414017 (owner: 10MaxSem) [19:32:58] nope not the same files [19:33:04] hashes are different [19:33:14] (03PS6) 10MaxSem: beta: remove $wmgUseReadingLists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414018 [19:33:18] (03CR) 10MaxSem: [C: 032] beta: remove $wmgUseReadingLists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414018 (owner: 10MaxSem) [19:33:34] SMalyshev: atm prod is 1 commit behind wdqs-test [19:33:35] SMalyshev: I can still reproduce the error in a local query-gui-deploy clone [19:33:51] but not on wdqs-test, you’re right [19:34:35] (03Merged) 10jenkins-bot: beta: remove $wmgUseReadingLists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414018 (owner: 10MaxSem) [19:34:58] (03PS5) 10Awight: Split out retrieving globals and use a more machine-readable format [dumps] - 10https://gerrit.wikimedia.org/r/348002 (https://phabricator.wikimedia.org/T185116) [19:35:05] gehel: on wdqs-test vendor file is .2bbc... but on prod it's .8e36 [19:35:40] gehel: deploy one is .8e36 so wdqs-test is wrong [19:35:52] (03CR) 10Bstorm: "On this next version of the review, I have another question on the indexes. 
In my testing, whenever running the query that handles the co" [puppet] - 10https://gerrit.wikimedia.org/r/416496 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [19:35:53] SMalyshev, elukey: I'm holding off on activating kafka poller until we know where we stand (it should not be related to the current uncertainty in any way, but still) [19:37:01] yeah let's figure out the gui first [19:37:07] one thing at a time [19:37:15] (03CR) 10jenkins-bot: beta: remove $wmgUseReadingLists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414018 (owner: 10MaxSem) [19:37:28] (03PS2) 10MaxSem: beta: remove $wgAutoloadAttemptLowercase [mediawiki-config] - 10https://gerrit.wikimedia.org/r/415759 (https://phabricator.wikimedia.org/T166759) [19:37:32] (03CR) 10MaxSem: [C: 032] beta: remove $wgAutoloadAttemptLowercase [mediawiki-config] - 10https://gerrit.wikimedia.org/r/415759 (https://phabricator.wikimedia.org/T166759) (owner: 10MaxSem) [19:38:01] SMalyshev: yep, prod and wdqs-test do not have the same version. I rolledback the GUI deployment on prod, but not on wdqs-test. [19:38:16] I can roll it back on wdqs-test as well if that helps... [19:38:48] gehel: nope I want the broken one on wdqs-test to test [19:38:57] gehel: right now it's not there [19:39:00] (03Merged) 10jenkins-bot: beta: remove $wgAutoloadAttemptLowercase [mediawiki-config] - 10https://gerrit.wikimedia.org/r/415759 (https://phabricator.wikimedia.org/T166759) (owner: 10MaxSem) [19:40:23] SMalyshev: I'm not sure I follow. Do you need anything from me? Would it be easier in a hangout? [19:40:27] ok I copied the files, now it reproduces on wdqs-test [19:40:52] gehel: not right now, I got it, but fyi the version you tested on wdqs-test was not production version [19:41:06] anyone from the parsing team around? [19:41:19] it was some other build. Doesn't matter now as I copied the right files already, but in the future we may want to verify the hashes are right [19:41:33] MaxSem: Yeah, LGTM. 
[19:41:42] MaxSem: Sorry, debug mode is really slow when not at the office. [19:42:10] Lucas_WMDE: any ideas how to fix the thing? should we try to revert https://gerrit.wikimedia.org/r/c/412698/? [19:42:37] SMalyshev: I don’t think that’s the reason [19:42:50] I'll try to make a new build and see if it improves matters [19:42:54] that change did something to moment.js, the error now is in $.i18n [19:43:49] !log maxsem@tin Synchronized php-1.31.0-wmf.23/extensions/Cite: https://gerrit.wikimedia.org/r/#/c/416467/ (duration: 00m 58s) [19:44:00] James_F: ^ [19:44:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:44:11] Thanks MaxSem. [19:44:15] looks something weird, like I built from wrong version or something... https://gerrit.wikimedia.org/r/c/416498/ has too many changes which means previous build was somehow messed up [19:44:17] (03PS2) 10MaxSem: Remove $wgBrowserBlacklist, does nothing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416219 [19:44:21] (03CR) 10MaxSem: [C: 032] Remove $wgBrowserBlacklist, does nothing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416219 (owner: 10MaxSem) [19:44:24] SMalyshev: you seem to have the things well in hand... ping me when there is something else to deploy :) [19:44:46] gehel: ok, I'll make a new build and ping you [19:45:50] (03Merged) 10jenkins-bot: Remove $wgBrowserBlacklist, does nothing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416219 (owner: 10MaxSem) [19:47:25] (03PS1) 10Krinkle: beta: Remove redundant CentralNotice overrides [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416500 [19:48:02] MaxSem: nice work removing all those [19:48:17] :) thanks [19:49:22] Lucas_WMDE: can you test on http://wdqs-test.wmflabs.org/ - it should be ok now? [19:49:39] looks good [19:49:43] so did you fix it? [19:49:55] I was just about to write some analysis at you, but it looks like you got farther than me :) [19:50:25] I made a new build after cleaning up my repo... 
seems to have worked. Not sure how I managed to break it in the first place [19:50:31] ok [19:50:48] this pluralRuleParser seems to be injected in a strange fashion, somewhere in node_modules/jquery.i18n/libs/CLDRPluralRuleParser/src/CLDRPluralRuleParser.js [19:50:55] perhaps the build didn’t include that [19:51:01] (that's why we need to make https://gerrit.wikimedia.org/r/c/415769/ / T160943 work so I won't mess it up ) [19:51:02] T160943: Automate WDQS GUI deployment - https://phabricator.wikimedia.org/T160943 [19:51:26] Lucas_WMDE: I think it somehow picked up wrong version. Not sure why [19:51:49] (03CR) 10Jcrespo: [C: 031] "Regarding the question- I would check with equivalent data (even on the wikireplicas themselves), but yes, if they have the exact same col" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/416496 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [19:51:51] !log maxsem@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/416219/ (duration: 00m 57s) [19:52:04] gehel: I think we can try deploying GUI from head again, it should be ok now [19:52:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:53:55] Lucas_WMDE: did you get your change deployed? [19:55:01] hasharAway: no [19:55:13] Lucas_WMDE: that was https://gerrit.wikimedia.org/r/#/c/415319/ right? 
[19:55:19] "Fix empty condition list in metadata lookup" [19:55:22] I’m just going to abandon it, it’s not worth spending more time on when the train will reach the non-backported commit in two days anyways IMO [19:55:24] (03CR) 10jenkins-bot: beta: remove $wgAutoloadAttemptLowercase [mediawiki-config] - 10https://gerrit.wikimedia.org/r/415759 (https://phabricator.wikimedia.org/T166759) (owner: 10MaxSem) [19:55:26] yes, that one [19:55:28] (03CR) 10jenkins-bot: Remove $wgBrowserBlacklist, does nothing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416219 (owner: 10MaxSem) [19:55:36] Lucas_WMDE: I dont mind force merging and deploying it right now [19:55:39] 10Operations, 10hardware-requests: eqiad/codfw: (4)+(4) hardware access request for videoscalers - https://phabricator.wikimedia.org/T188075#4024800 (10RobH) >>! In T188075#4010635, @brion wrote: > @RobH we'd still like to buy 2 new machines with this configuration, so if/when the ones taken from the image sca... [19:55:47] as I said last week, that is a phan/CI failure and it is unrelated imho [19:56:00] okay :) [19:56:05] but I won’t be able to test it either way [19:57:52] MaxSem: if you are done, I will do the Wikibase patch [19:58:05] hashar: sure [19:58:41] (03PS7) 10Rush: openstack: keystone bootstrap setup for mitaka [puppet] - 10https://gerrit.wikimedia.org/r/415392 (https://phabricator.wikimedia.org/T188266) [19:59:20] (03CR) 10jerkins-bot: [V: 04-1] openstack: keystone bootstrap setup for mitaka [puppet] - 10https://gerrit.wikimedia.org/r/415392 (https://phabricator.wikimedia.org/T188266) (owner: 10Rush) [20:00:11] Lucas_WMDE: syncing it [20:00:12] (03PS2) 10Ppchelko: Swith all refreshLinks jobs to Kafka. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416476 (https://phabricator.wikimedia.org/T185052) [20:01:07] (03PS3) 10Ppchelko: Swith all refreshLinks jobs to Kafka. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/416476 (https://phabricator.wikimedia.org/T185052) [20:02:00] !log hashar@tin Synchronized php-1.31.0-wmf.23/extensions/Wikibase: Fix empty condition list in metadata lookup - T188313 (duration: 01m 58s) [20:02:02] Lucas_WMDE: done [20:02:07] hashar: thanks! [20:02:10] :] [20:02:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:02:16] T188313: Fix safeguard for empty condition list in WikiPageEntityMetaDataLookup - https://phabricator.wikimedia.org/T188313 [20:03:12] !log gehel@tin Started deploy [wdqs/wdqs@1983ddf]: wdqs GUI update [20:03:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:03:31] 10Operations, 10hardware-requests: eqiad/codfw: (4)+(4) hardware access request for videoscalers - https://phabricator.wikimedia.org/T188075#4024816 (10brion) Yes, the R430 with 20/40 cores/threads and 64GB ram, roughly matching the existing ones from the old image scalers pool. As long as they can all be use... 
[20:04:48] !log gehel@tin Finished deploy [wdqs/wdqs@1983ddf]: wdqs GUI update (duration: 01m 36s) [20:05:01] SMalyshev, Lucas_WMDE: ^deployment completed [20:05:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:05:33] gehel: thanks, everything looks good now [20:05:53] (03CR) 10Anomie: "If the image_comment_temp table on your local wiki is very small, the planner might decide that it's quicker to scan the whole primary ind" [puppet] - 10https://gerrit.wikimedia.org/r/416496 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [20:06:24] 10Operations, 10hardware-requests: codfw: (1) videoscaler server - https://phabricator.wikimedia.org/T188943#4024838 (10RobH) p:05Triage>03Normal [20:07:47] (03PS8) 10Rush: openstack: keystone bootstrap setup for mitaka [puppet] - 10https://gerrit.wikimedia.org/r/415392 (https://phabricator.wikimedia.org/T188266) [20:10:15] 10Operations, 10hardware-requests: eqiad/codfw: (4)+(4) hardware access request for videoscalers - https://phabricator.wikimedia.org/T188075#4024871 (10RobH) >>! In T188075#4024816, @brion wrote: > Yes, the R430 with 20/40 cores/threads and 64GB ram, roughly matching the existing ones from the old image scaler... 
[20:16:58] (03CR) 10Krinkle: [C: 04-1] coal: Process from Kafka instead of from ZMQ (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/415218 (https://phabricator.wikimedia.org/T110903) (owner: 10Imarlier) [20:17:32] (03CR) 10Krinkle: [C: 04-1] coal: Process from Kafka instead of from ZMQ (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/415218 (https://phabricator.wikimedia.org/T110903) (owner: 10Imarlier) [20:19:50] (03PS11) 10Imarlier: coal: Process from Kafka instead of from ZMQ [puppet] - 10https://gerrit.wikimedia.org/r/415218 (https://phabricator.wikimedia.org/T110903) [20:20:13] 10Operations, 10Analytics, 10ChangeProp, 10EventBus, and 3 others: Create an LVS endpoint for videoscalers - https://phabricator.wikimedia.org/T188947#4024921 (10Pchelolo) p:05Triage>03Normal [20:20:17] (03CR) 10Mobrovac: Swith all refreshLinks jobs to Kafka. (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416476 (https://phabricator.wikimedia.org/T185052) (owner: 10Ppchelko) [20:20:24] 10Operations, 10Analytics, 10ChangeProp, 10EventBus, and 2 others: Create an LVS endpoint for videoscalers - https://phabricator.wikimedia.org/T188947#4024934 (10Pchelolo) [20:21:37] (03CR) 10Ppchelko: Swith all refreshLinks jobs to Kafka. (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416476 (https://phabricator.wikimedia.org/T185052) (owner: 10Ppchelko) [20:23:02] 10Operations, 10hardware-requests: eqiad/codfw: (4)+(4) hardware access request for videoscalers - https://phabricator.wikimedia.org/T188075#4024937 (10dr0ptp4kt) @RobH would you please grant me access on those tix? 
[20:23:36] 10Operations, 10Analytics, 10ChangeProp, 10EventBus, and 3 others: Create an LVS endpoint for jobrunners on videoscalers - https://phabricator.wikimedia.org/T188947#4024939 (10mobrovac) [20:23:55] 10Operations, 10Analytics, 10ChangeProp, 10EventBus, and 4 others: Create an LVS endpoint for jobrunners on videoscalers - https://phabricator.wikimedia.org/T188947#4024921 (10mobrovac) [20:25:32] (03PS12) 10Imarlier: coal: Process from Kafka instead of from ZMQ [puppet] - 10https://gerrit.wikimedia.org/r/415218 (https://phabricator.wikimedia.org/T110903) [20:25:48] (03CR) 10Mobrovac: Swith all refreshLinks jobs to Kafka. (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416476 (https://phabricator.wikimedia.org/T185052) (owner: 10Ppchelko) [20:27:30] 10Operations, 10hardware-requests: eqiad/codfw: (4)+(4) hardware access request for videoscalers - https://phabricator.wikimedia.org/T188075#4024957 (10RobH) >>! In T188075#4024937, @dr0ptp4kt wrote: > @RobH would you please grant me access on those tix? Done, you can now view the contents of the S4 space. A... [20:28:28] (03PS4) 10Ppchelko: Swith all refreshLinks jobs to Kafka. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416476 (https://phabricator.wikimedia.org/T185052) [20:28:43] (03CR) 10Imarlier: "Puppet compiler run showing corrected output: https://puppet-compiler.wmflabs.org/compiler02/10266/graphite1001.eqiad.wmnet/" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/415218 (https://phabricator.wikimedia.org/T110903) (owner: 10Imarlier) [20:29:15] (03Abandoned) 10Ppchelko: Switch dynamic and prioritized refreshLinks to kafka. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/416490 (https://phabricator.wikimedia.org/T185052) (owner: 10Ppchelko) [20:29:17] (03PS1) 10Madhuvishy: dumps: Refactor profiles and hierakeys in web/ [puppet] - 10https://gerrit.wikimedia.org/r/416502 [20:29:26] 10Operations, 10hardware-requests: eqiad/codfw: (4)+(4) hardware access request for videoscalers - https://phabricator.wikimedia.org/T188075#4024959 (10dr0ptp4kt) Thanks @RobH, understood. [20:30:04] (03PS2) 10Madhuvishy: dumps: Refactor profiles and hierakeys in web/ [puppet] - 10https://gerrit.wikimedia.org/r/416502 (https://phabricator.wikimedia.org/T168486) [20:41:45] (03PS3) 10Madhuvishy: dumps: Refactor profiles and hierakeys in web/ [puppet] - 10https://gerrit.wikimedia.org/r/416502 (https://phabricator.wikimedia.org/T168486) [20:47:09] (03PS4) 10Madhuvishy: dumps: Refactor profiles and hierakeys in web/ [puppet] - 10https://gerrit.wikimedia.org/r/416502 (https://phabricator.wikimedia.org/T168486) [20:52:30] (03PS5) 10Madhuvishy: dumps: Refactor profiles and hierakeys in web/ [puppet] - 10https://gerrit.wikimedia.org/r/416502 (https://phabricator.wikimedia.org/T168486) [20:57:45] 10Operations, 10Cloud-Services, 10cloud-services-team (Kanban): Rebuild raids on labvirt1019 and 1020 - https://phabricator.wikimedia.org/T187373#4025018 (10Cmjohnson) I don't what is wrong with these servers, I count 10 disks but the controller is only seeing 8. I don't see any settings that would change t... [21:00:04] cscott, arlolra, subbu, bearND, halfak, and Amir1: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Services – Parsoid / Citoid / Mobileapps / ORES / … deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180305T2100). [21:00:05] No GERRIT patches in the queue for this window AFAICS. [21:00:16] No ORES surprises today. 
[21:06:19] 10Operations, 10ops-eqiad, 10Discovery, 10Wikidata, and 2 others: wdqs1004 broken - https://phabricator.wikimedia.org/T188045#4025043 (10Cmjohnson) a ticket has been placed with Dell You have successfully submitted request SR961706803. [21:07:12] 10Operations, 10Jouncebot, 10Tools, 10Patch-For-Review, and 2 others: Jouncebot: Crashes when issued a command. - https://phabricator.wikimedia.org/T158448#4025047 (10Framawiki) [21:08:17] 10Operations, 10DBA, 10Jouncebot, 10MediaWiki-Maintenance-scripts, and 3 others: Add section for long-running tasks on the Deployment page (specially for database maintenance) - https://phabricator.wikimedia.org/T144661#4025055 (10Framawiki) [21:13:41] !log arlolra@tin Started deploy [parsoid/deploy@232631f]: Updating Parsoid to d115592 [21:13:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:25:53] !log arlolra@tin Finished deploy [parsoid/deploy@232631f]: Updating Parsoid to d115592 (duration: 12m 12s) [21:26:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:27:20] (03CR) 10Nuria: [C: 031] Parse raw user_agent out of raw eventlogging client side event [puppet] - 10https://gerrit.wikimedia.org/r/415691 (https://phabricator.wikimedia.org/T188673) (owner: 10Ottomata) [21:28:12] (03PS6) 10Madhuvishy: dumps: Refactor profiles and hierakeys in web/ [puppet] - 10https://gerrit.wikimedia.org/r/416502 (https://phabricator.wikimedia.org/T168486) [21:32:02] !log Updated Parsoid to d115592 (T188591) [21:32:05] when reading Phabricator notifications these all seem normal "commented" "moved", "triaged" .. but what i had not noticed before was "shifted". User "shifted" T.. 
from the public space to the procurement space [21:32:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:32:18] T188591: token.getAttribute(...)[0].match is not a function - https://phabricator.wikimedia.org/T188591 [21:34:41] (03CR) 10Bstorm: wiki-replicas: Accommodate new comments table with rules and compatibility (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/416496 (https://phabricator.wikimedia.org/T181650) (owner: 10Bstorm) [21:34:49] (03PS7) 10Madhuvishy: dumps: Refactor profiles and hierakeys in web/ [puppet] - 10https://gerrit.wikimedia.org/r/416502 (https://phabricator.wikimedia.org/T168486) [21:35:04] (03PS2) 10Bstorm: wiki-replicas: Accommodate new comments table with rules and compatibility [puppet] - 10https://gerrit.wikimedia.org/r/416496 (https://phabricator.wikimedia.org/T181650) [21:40:12] (03PS1) 10Krinkle: beta: Remove wgCentralBannerRecorder override for old special page [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416584 [21:40:38] AndyRussG: Could you +1 https://gerrit.wikimedia.org/r/#/c/416500/ and https://gerrit.wikimedia.org/r/#/c/416584/ ? [21:41:23] Krinkle: u bet! 
[21:49:08] (03CR) 10Dzahn: [C: 032] Add imarlier to graphite-admins [puppet] - 10https://gerrit.wikimedia.org/r/416453 (https://phabricator.wikimedia.org/T188649) (owner: 10Muehlenhoff) [21:49:22] (03PS2) 10Dzahn: Create a new group for basic Graphite service supervision [puppet] - 10https://gerrit.wikimedia.org/r/416452 (https://phabricator.wikimedia.org/T188649) (owner: 10Muehlenhoff) [21:49:42] (03CR) 10Dzahn: [C: 032] Create a new group for basic Graphite service supervision [puppet] - 10https://gerrit.wikimedia.org/r/416452 (https://phabricator.wikimedia.org/T188649) (owner: 10Muehlenhoff) [21:53:53] 10Operations, 10Cloud-Services, 10Developer-Relations: Use the term "developer account" for Wikimedia LDAP accounts - https://phabricator.wikimedia.org/T179461#3725481 (10Harej) @bd808 emailed wikitech on February 23 asking for input. Unless serious objections are raised, should we consider it a done deal b... [21:54:28] (03CR) 10Smalyshev: [C: 031] "Need to verify it doesn't apply to labs (should be the case, but double check won't hurt)" [puppet] - 10https://gerrit.wikimedia.org/r/416475 (https://phabricator.wikimedia.org/T188252) (owner: 10Gehel) [21:54:32] (03PS2) 10Dzahn: Add imarlier to graphite-admins [puppet] - 10https://gerrit.wikimedia.org/r/416453 (https://phabricator.wikimedia.org/T188649) (owner: 10Muehlenhoff) [21:55:33] (03CR) 10Dzahn: [C: 032] "approved in ops meeting" [puppet] - 10https://gerrit.wikimedia.org/r/416453 (https://phabricator.wikimedia.org/T188649) (owner: 10Muehlenhoff) [21:57:14] oops.. there is a typo somewhere [21:58:00] or the order of things. yea. i had to merge, run puppet, merge [21:58:09] not merge, merge, puppet [22:00:04] bawolff and Reedy: Your horoscope predicts another unfortunate Weekly Security deployment window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180305T2200). [22:00:04] No GERRIT patches in the queue for this window AFAICS. 
[22:01:54] lol [22:02:39] (03PS1) 10Urbanecm: New throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416585 (https://phabricator.wikimedia.org/T188626) [22:02:41] PROBLEM - puppet last run on graphite1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:02:56] (03PS1) 10Dzahn: admins/graphite: fix group name, graphite-users -> graphite-admins [puppet] - 10https://gerrit.wikimedia.org/r/416586 (https://phabricator.wikimedia.org/T188649) [22:03:01] PROBLEM - puppet last run on graphite2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:03:02] ;) [22:03:37] (03CR) 10Dzahn: [C: 032] admins/graphite: fix group name, graphite-users -> graphite-admins [puppet] - 10https://gerrit.wikimedia.org/r/416586 (https://phabricator.wikimedia.org/T188649) (owner: 10Dzahn) [22:05:33] (03CR) 10Dzahn: [C: 032] "https://gerrit.wikimedia.org/r/#/c/416586/" [puppet] - 10https://gerrit.wikimedia.org/r/416452 (https://phabricator.wikimedia.org/T188649) (owner: 10Muehlenhoff) [22:07:04] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: performance-team/imarlier need access to graphite servers - https://phabricator.wikimedia.org/T188649#4025177 (10Dzahn) ``` [graphite1001:~] $ id imarlier uid=18334(imarlier) gid=500(wikidev) groups=500(wikidev),800(graphite-admins) [graphite1001:~] $... 
[22:07:41] RECOVERY - puppet last run on graphite1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [22:08:01] RECOVERY - puppet last run on graphite2002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [22:08:33] (03PS9) 10Rush: openstack: keystone bootstrap setup for mitaka [puppet] - 10https://gerrit.wikimedia.org/r/415392 (https://phabricator.wikimedia.org/T188266) [22:08:54] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: performance-team/imarlier need access to graphite servers - https://phabricator.wikimedia.org/T188649#4025205 (10Dzahn) - new group graphite-admins has been created - imarlier account has been created on graphite machines (pending puppet run, max 30 mi... [22:09:10] PROBLEM - puppet last run on graphite1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:11:54] (03CR) 10Gehel: "Lab is set in https://github.com/wikimedia/puppet/blob/production/hieradata/labs/wikidata-query/common.yaml#L12 so this should not affect " [puppet] - 10https://gerrit.wikimedia.org/r/416475 (https://phabricator.wikimedia.org/T188252) (owner: 10Gehel) [22:19:39] (03PS10) 10Rush: openstack: keystone bootstrap setup for mitaka [puppet] - 10https://gerrit.wikimedia.org/r/415392 (https://phabricator.wikimedia.org/T188266) [22:20:57] (03CR) 10Rush: [C: 032] openstack: keystone bootstrap setup for mitaka [puppet] - 10https://gerrit.wikimedia.org/r/415392 (https://phabricator.wikimedia.org/T188266) (owner: 10Rush) [22:26:07] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: performance-team/imarlier need access to graphite servers - https://phabricator.wikimedia.org/T188649#4025257 (10Imarlier) 05Open>03Resolved a:03Imarlier Works like a charm! ``` (coal) imarlier@WMF2024 ~/dev/src/puppet (coal-kafka●)$ ssh graphit... 
[22:27:33] * bawolff is going to go deploy a security patch [22:31:07] * Hauskatze is curious as all cats [22:32:02] Hauskatze: honestly, it's not as exciting as it sounds [22:33:48] that's, of course, what he would say if it was extremely exciting [22:34:10] RECOVERY - puppet last run on graphite1003 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [22:35:47] SECURITY: Fix bug giving every user a million dollars [22:39:23] bawolff: really? [22:39:37] There’s an actual bug doing that? /me hopes not :) [22:39:38] umm, no [22:44:36] !log bawolff@tin Synchronized php-1.31.0-wmf.23/includes/logging/LogPager.php: T188145 (duration: 00m 58s) [22:44:40] (03PS1) 10Rush: openstack: keystone bootstrap notes [puppet] - 10https://gerrit.wikimedia.org/r/416594 (https://phabricator.wikimedia.org/T188266) [22:44:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:45:48] (03PS2) 10Rush: openstack: keystone bootstrap notes [puppet] - 10https://gerrit.wikimedia.org/r/416594 (https://phabricator.wikimedia.org/T188266) [22:46:28] (03CR) 10Rush: [C: 032] openstack: keystone bootstrap notes [puppet] - 10https://gerrit.wikimedia.org/r/416594 (https://phabricator.wikimedia.org/T188266) (owner: 10Rush) [22:47:53] Hauskatze: All your questions are now answered: https://phabricator.wikimedia.org/T188145 [22:51:36] bawolff: :) tho T187638 is still a mystery :) [22:52:06] Yeah, and it will remain so until next mediawiki release [22:52:34] Also, I was stupid, and wrote my patch against an old checkout of mediawiki, so it got -1'd, so I'm not deploying the patch for that one today [22:55:58] don't you normally DOLOGMSGNOLOG those syncs? 
[22:56:43] Krenair: usually yes, except i was going to make the bug public in 5 minutes anyways so I thought there wasn't much point [22:56:47] ah [23:09:56] (03PS1) 10Andrew Bogott: role::mariadb::ferm: Allow db access to labweb [puppet] - 10https://gerrit.wikimedia.org/r/416598 (https://phabricator.wikimedia.org/T188915) [23:10:34] (03CR) 10jerkins-bot: [V: 04-1] role::mariadb::ferm: Allow db access to labweb [puppet] - 10https://gerrit.wikimedia.org/r/416598 (https://phabricator.wikimedia.org/T188915) (owner: 10Andrew Bogott) [23:14:09] 10Operations, 10Domains, 10Traffic, 10WMF-Design, and 3 others: Create subdomain for Design and Wikimedia User Interface Style Guide - https://phabricator.wikimedia.org/T185282#3911827 (10Volker_E) Sorry @MarcoAurelio, identified that you were just the courier on the other change. Have reached out to @demo... [23:22:14] (03PS5) 10Aaron Schulz: [WIP] Add dynomite module and dynomite_wancache profile [puppet] - 10https://gerrit.wikimedia.org/r/415789 [23:22:50] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Add dynomite module and dynomite_wancache profile [puppet] - 10https://gerrit.wikimedia.org/r/415789 (owner: 10Aaron Schulz) [23:30:01] 10Operations, 10DNS, 10Release-Engineering-Team, 10Traffic, and 2 others: Move Foundation Wiki to new URL when new Wikimedia Foundation website launches - https://phabricator.wikimedia.org/T188776#4019041 (10Platonides) Please remember to 301 /wiki/.* to the new url… [23:36:29] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting shell access and access to groups 'analytics-privatedata-users' and 'researchers' for katielin (katie) - https://phabricator.wikimedia.org/T187623#3981063 (10Dzahn) @MeganHernandez_WMF Can you confirm that you "sponsor" Katielin in getting t... 
[23:38:37] (03PS6) 10Aaron Schulz: [WIP] Add dynomite module and dynomite_wancache profile [puppet] - 10https://gerrit.wikimedia.org/r/415789 [23:39:25] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Add dynomite module and dynomite_wancache profile [puppet] - 10https://gerrit.wikimedia.org/r/415789 (owner: 10Aaron Schulz) [23:43:05] (03PS1) 10Rush: openstack: set region to codfw1dev-r for labtestn [puppet] - 10https://gerrit.wikimedia.org/r/416606 (https://phabricator.wikimedia.org/T188266) [23:43:24] (03CR) 10jerkins-bot: [V: 04-1] openstack: set region to codfw1dev-r for labtestn [puppet] - 10https://gerrit.wikimedia.org/r/416606 (https://phabricator.wikimedia.org/T188266) (owner: 10Rush) [23:44:32] (03PS2) 10Rush: openstack: set region to codfw1dev-r for labtestn [puppet] - 10https://gerrit.wikimedia.org/r/416606 (https://phabricator.wikimedia.org/T188266) [23:45:06] (03CR) 10jerkins-bot: [V: 04-1] openstack: set region to codfw1dev-r for labtestn [puppet] - 10https://gerrit.wikimedia.org/r/416606 (https://phabricator.wikimedia.org/T188266) (owner: 10Rush) [23:46:03] (03PS3) 10Rush: openstack: set region to codfw1dev-r for labtestn [puppet] - 10https://gerrit.wikimedia.org/r/416606 (https://phabricator.wikimedia.org/T188266) [23:46:31] (03PS2) 10Andrew Bogott: multiversion: add a transitional mapping for newwikitech.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/415914 (https://phabricator.wikimedia.org/T168470) [23:46:33] (03PS1) 10Andrew Bogott: wikitech: use files from swift rather than local uploads. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416607 (https://phabricator.wikimedia.org/T188915) [23:46:46] (03CR) 10Rush: [C: 032] openstack: set region to codfw1dev-r for labtestn [puppet] - 10https://gerrit.wikimedia.org/r/416606 (https://phabricator.wikimedia.org/T188266) (owner: 10Rush) [23:47:45] (03CR) 10jerkins-bot: [V: 04-1] wikitech: use files from swift rather than local uploads. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/416607 (https://phabricator.wikimedia.org/T188915) (owner: 10Andrew Bogott) [23:48:01] (03CR) 10jerkins-bot: [V: 04-1] multiversion: add a transitional mapping for newwikitech.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/415914 (https://phabricator.wikimedia.org/T168470) (owner: 10Andrew Bogott) [23:48:56] (03PS2) 10Andrew Bogott: wikitech: use files from swift rather than local uploads. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416607 (https://phabricator.wikimedia.org/T188915) [23:48:58] (03PS3) 10Andrew Bogott: multiversion: add a transitional mapping for newwikitech.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/415914 (https://phabricator.wikimedia.org/T168470) [23:50:06] (03CR) 10jerkins-bot: [V: 04-1] wikitech: use files from swift rather than local uploads. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416607 (https://phabricator.wikimedia.org/T188915) (owner: 10Andrew Bogott) [23:50:26] (03CR) 10jerkins-bot: [V: 04-1] multiversion: add a transitional mapping for newwikitech.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/415914 (https://phabricator.wikimedia.org/T168470) (owner: 10Andrew Bogott) [23:54:13] (03CR) 10BryanDavis: wikitech: use files from swift rather than local uploads. 
(031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416607 (https://phabricator.wikimedia.org/T188915) (owner: 10Andrew Bogott) [23:56:05] (03PS4) 10Andrew Bogott: multiversion: add a transitional mapping for newwikitech.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/415914 (https://phabricator.wikimedia.org/T168470) [23:57:40] (03PS1) 10Rush: wip: openstack: initial nova setup for mitaka [puppet] - 10https://gerrit.wikimedia.org/r/416608 (https://phabricator.wikimedia.org/T188266) [23:58:05] (03PS8) 10Dzahn: icinga: script to send custom SMS to Icinga contacts [puppet] - 10https://gerrit.wikimedia.org/r/400615 (https://phabricator.wikimedia.org/T82937) [23:59:49] (03CR) 10AndyRussG: "Thanks!!! Removing the override in principle seems fine. Note that $wgCentralPagePath was removed from CentralNotice a while ago. (We shou" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/416500 (owner: 10Krinkle)