[00:09:57] (03CR) 10BryanDavis: [C: 031] "> Snapshot hosts are done. There's one lab instance to worry about," [puppet] - 10https://gerrit.wikimedia.org/r/430912 (owner: 10Muehlenhoff) [00:21:52] !log pnorman@tin Started deploy [tilerator/deploy@a194185]: Deploy test of stretch build of tilerator to test2004 [00:21:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:22:13] !log pnorman@tin Finished deploy [tilerator/deploy@a194185]: Deploy test of stretch build of tilerator to test2004 (duration: 00m 21s) [00:22:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:22:45] !log pnorman@tin Started deploy [tilerator/deploy@a194185]: Deploy test of stretch build of tilerator to test2004 [00:22:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:25:01] !log pnorman@tin Finished deploy [tilerator/deploy@a194185]: Deploy test of stretch build of tilerator to test2004 (duration: 02m 17s) [00:25:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:26:51] 10Operations, 10Wikimedia-Planet, 10Patch-For-Review: planet.wikimedia.org: replace planet-venus software with rawdog - https://phabricator.wikimedia.org/T180498#4241688 (10Dzahn) This is the .deb built from that: https://people.wikimedia.org/~dzahn/rawdog/ [00:28:12] !log pnorman@tin Started deploy [tilerator/deploy@a194185]: Deploy test of stretch build of tilerator to test2004 [00:28:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:28:18] !log pnorman@tin Finished deploy [tilerator/deploy@a194185]: Deploy test of stretch build of tilerator to test2004 (duration: 00m 05s) [00:28:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:41:07] !log pnorman@tin Started deploy [tilerator/deploy@a194185]: Deploy test of stretch build of tilerator to test2004 [00:41:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log 
[00:41:19] !log pnorman@tin Finished deploy [tilerator/deploy@a194185]: Deploy test of stretch build of tilerator to test2004 (duration: 00m 12s) [00:41:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:41:46] RECOVERY - tileratorui on maps-test2004 is OK: HTTP OK: HTTP/1.1 200 OK - 305 bytes in 0.092 second response time [00:43:06] RECOVERY - tilerator on maps-test2004 is OK: HTTP OK: HTTP/1.1 200 OK - 305 bytes in 0.096 second response time [00:45:36] !log pnorman@tin Started deploy [tilerator/deploy@bc35971]: Use parameterized dbname on test2004 [00:45:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:45:57] !log pnorman@tin Finished deploy [tilerator/deploy@bc35971]: Use parameterized dbname on test2004 (duration: 00m 21s) [00:46:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:08:26] !log built added rawdog_2.22-1-wmf1 to apt.wikimedia.org, upgraded rawdog on planet2001. Unpacking rawdog (2.22-1-wmf1) over (2.22-1) (T180498) [01:08:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:08:30] T180498: planet.wikimedia.org: replace planet-venus software with rawdog - https://phabricator.wikimedia.org/T180498 [01:41:46] (03PS1) 10Zoranzoki21: Add sites to the wgCopyUploadsDomains whitelist of Wikimedia Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436211 (https://phabricator.wikimedia.org/T195270) [02:41:50] (03PS1) 10Zoranzoki21: Add filemover right to the groups of "patroller" and "autoreviewer on zhwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436213 (https://phabricator.wikimedia.org/T195247) [02:42:53] (03PS2) 10Zoranzoki21: Add filemover right to the groups of "patroller" and "autoreviewer on zhwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436213 (https://phabricator.wikimedia.org/T195247) [02:43:46] (03PS3) 10Zoranzoki21: Add filemover right to the groups of patroller and autoreviewer on 
zhwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436213 (https://phabricator.wikimedia.org/T195247) [03:01:41] !log l10nupdate@tin scap sync-l10n completed (1.32.0-wmf.5) (duration: 12m 57s) [03:01:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:15:44] !log l10nupdate@tin ResourceLoader cache refresh completed at Wed May 30 03:15:44 UTC 2018 (duration 14m 3s) [03:15:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:27:36] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 847.42 seconds [04:04:35] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0 [04:04:46] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 224, down: 1, dormant: 0, excluded: 0, unused: 0 [04:11:15] RECOVERY - MariaDB Slave Lag: s8 on dbstore2001 is OK: OK slave_sql_lag Replication lag: 53.34 seconds [04:11:18] (03CR) 10A2093064: [C: 04-1] "It's wrong. See https://phabricator.wikimedia.org/T195247#4241823" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436213 (https://phabricator.wikimedia.org/T195247) (owner: 10Zoranzoki21) [04:12:05] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 243.32 seconds [04:15:26] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0 [04:15:45] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0 [04:23:10] 10Puppet, 10Cloud-VPS: role::puppetmaster::standalone on stretch: Unable to locate package geoipupdate - https://phabricator.wikimedia.org/T171916#4241832 (10bd808) 05Open>03Invalid Marking as invalid due to resolution by indirect means. 
[04:27:55] (03CR) 10Zoranzoki21: "I am totally confused." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436213 (https://phabricator.wikimedia.org/T195247) (owner: 10Zoranzoki21) [05:06:13] !log Deploy schema change on s1 primary master db1052 - T188299 [05:06:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:06:19] T188299: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299 [05:17:14] (03PS1) 10Marostegui: db-eqiad.php: Depool s2,s4,s6,s7 sanitarium masters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436217 (https://phabricator.wikimedia.org/T190704) [05:19:40] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool s2,s4,s6,s7 sanitarium masters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436217 (https://phabricator.wikimedia.org/T190704) (owner: 10Marostegui) [05:20:49] (03Merged) 10jenkins-bot: db-eqiad.php: Depool s2,s4,s6,s7 sanitarium masters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436217 (https://phabricator.wikimedia.org/T190704) (owner: 10Marostegui) [05:21:05] (03CR) 10jenkins-bot: db-eqiad.php: Depool s2,s4,s6,s7 sanitarium masters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436217 (https://phabricator.wikimedia.org/T190704) (owner: 10Marostegui) [05:22:39] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool sanitariums masters for s2, s4, s6, s7 - T190704 (duration: 01m 03s) [05:22:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:22:43] T190704: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704 [05:23:31] !log Stop db1074 and db2035 in sync - T190704 [05:23:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:31:28] !log Stop db1085 and db2039 in sync - T190704 [05:31:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:31:32] T190704: Convert all 
sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704 [05:41:42] !log Stop db1079 and db2040 in sync - T190704 [05:41:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:41:46] T190704: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704 [05:42:06] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1972 bytes in 0.093 second response time [05:50:17] (03Abandoned) 10Marostegui: mariadb: Promote db1093 to master [puppet] - 10https://gerrit.wikimedia.org/r/435756 (https://phabricator.wikimedia.org/T187962) (owner: 10Marostegui) [05:50:27] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool s2,s4,s6,s7 sanitarium masters" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436218 [05:50:35] PROBLEM - Check Varnish expiry mailbox lag on cp5008 is CRITICAL: CRITICAL: expiry mailbox lag is 7423385 [05:50:41] !log restart Kafka mirror maker on kafka10[12-23] - failures to consume after rebalance [05:50:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:56:28] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool s2,s4,s6,s7 sanitarium masters" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436218 (owner: 10Marostegui) [05:56:34] 10Operations, 10DBA, 10Goal, 10Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4241894 (10Marostegui) [05:57:53] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool s2,s4,s6,s7 sanitarium masters" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436218 (owner: 10Marostegui) [05:59:03] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool sanitariums masters for s2, s4, s6, s7 - T190704 (duration: 01m 00s) [05:59:07] Logged 
the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:59:07] T190704: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704 [05:59:40] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool s2,s4,s6,s7 sanitarium masters" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436218 (owner: 10Marostegui) [06:05:04] (03PS1) 10Marostegui: db-eqiad.php: Depool sanitarium masters for s1,s3,s5,s8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436219 (https://phabricator.wikimedia.org/T190704) [06:08:00] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool sanitarium masters for s1,s3,s5,s8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436219 (https://phabricator.wikimedia.org/T190704) (owner: 10Marostegui) [06:09:06] (03Merged) 10jenkins-bot: db-eqiad.php: Depool sanitarium masters for s1,s3,s5,s8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436219 (https://phabricator.wikimedia.org/T190704) (owner: 10Marostegui) [06:10:46] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool sanitariums masters for s1, s3, s5, s8 - T190704 (duration: 01m 00s) [06:10:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:10:51] T190704: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704 [06:11:26] !log Stop db1106 and db2048 in sync - T190704 [06:11:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:11:46] (03CR) 10jenkins-bot: db-eqiad.php: Depool sanitarium masters for s1,s3,s5,s8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436219 (https://phabricator.wikimedia.org/T190704) (owner: 10Marostegui) [06:12:36] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1948 bytes in 0.066 second response time [06:17:31] !log reimage druid1001 to Debian stretch 
[06:17:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:19:26] PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet operation_type={container_status,create_container,start_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [06:20:06] (03PS1) 10Elukey: role::druid::analytics::worker: set zookeeper version to 3.4.9-3 [puppet] - 10https://gerrit.wikimedia.org/r/436220 (https://phabricator.wikimedia.org/T192636) [06:20:26] RECOVERY - kubelet operational latencies on kubernetes1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [06:20:43] (03CR) 10Elukey: [C: 032] role::druid::analytics::worker: set zookeeper version to 3.4.9-3 [puppet] - 10https://gerrit.wikimedia.org/r/436220 (https://phabricator.wikimedia.org/T192636) (owner: 10Elukey) [06:24:12] 10Operations, 10DBA, 10Goal, 10Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4241913 (10Marostegui) [06:25:26] !log Stop db1077 and db2043 in sync - T190704 [06:25:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:25:31] T190704: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704 [06:28:55] PROBLEM - puppet last run on sodium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/share/ca-certificates/GlobalSign_Organization_Validation_CA_-_SHA256_-_G2.crt] [06:29:35] PROBLEM - puppet last run on analytics1071 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/etc/R/biocLite.R] [06:29:54] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1972 bytes in 0.068 second response time [06:30:34] PROBLEM - puppet last run on labvirt1014 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/apt2xml] [06:31:08] 10Operations, 10DBA, 10Goal, 10Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4241917 (10Marostegui) [06:31:51] !log Stop db1082 and db2052 in sync - T190704 [06:31:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:31:55] T190704: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704 [06:43:27] 10Operations, 10DBA, 10Goal, 10Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4241923 (10Marostegui) [06:43:46] !log Stop db1087 and db2045 in sync - T190704 [06:43:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:43:50] T190704: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704 [06:50:48] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool sanitarium masters for s1,s3,s5,s8" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436221 [06:51:59] 10Operations, 10DBA, 10Goal, 10Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4241926 (10Marostegui) [06:53:28] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool sanitarium masters for s1,s3,s5,s8" [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/436221 (owner: 10Marostegui) [06:53:34] 10Operations, 10DBA, 10Goal, 10Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4163218 (10Marostegui) db2094 and db2095 (sanitarium in codfw) have been moved to replicate under codfw sanitarium masters. The... [06:54:51] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool sanitarium masters for s1,s3,s5,s8" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436221 (owner: 10Marostegui) [06:55:21] (03PS1) 10Elukey: role::configcluster_stretch: page if a zookeeper node goes down [puppet] - 10https://gerrit.wikimedia.org/r/436222 (https://phabricator.wikimedia.org/T182924) [06:55:42] RECOVERY - puppet last run on labvirt1014 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [06:55:53] (03CR) 10Elukey: [C: 032] role::configcluster_stretch: page if a zookeeper node goes down [puppet] - 10https://gerrit.wikimedia.org/r/436222 (https://phabricator.wikimedia.org/T182924) (owner: 10Elukey) [06:56:19] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool sanitariums masters for s1, s3, s5, s8 - T190704 (duration: 01m 01s) [06:56:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:56:23] T190704: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704 [06:59:05] RECOVERY - puppet last run on sodium is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [06:59:37] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool sanitarium masters for s1,s3,s5,s8" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436221 (owner: 10Marostegui) [06:59:44] RECOVERY - puppet last run on analytics1071 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [07:14:26] !log installing git security updates [07:14:29] Logged the 
message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:18:32] 10Operations, 10DBA, 10Goal, 10Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4241957 (10Marostegui) [07:30:49] (03PS1) 10Elukey: network::constants: remove decommed zookeeper nodes [puppet] - 10https://gerrit.wikimedia.org/r/436226 (https://phabricator.wikimedia.org/T182924) [07:31:41] (03CR) 10Elukey: [C: 032] network::constants: remove decommed zookeeper nodes [puppet] - 10https://gerrit.wikimedia.org/r/436226 (https://phabricator.wikimedia.org/T182924) (owner: 10Elukey) [07:32:56] 10Operations, 10ops-codfw, 10ops-eqiad, 10netops: Audit switch ports/descriptions/enable - https://phabricator.wikimedia.org/T189519#4241965 (10ayounsi) [07:37:56] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 23 probes of 302 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map [07:38:03] !log running refresh-translatable-pages.php for wikis having Translate [07:38:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:43:17] PROBLEM - puppet last run on kafkamon2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[git] [07:43:56] PROBLEM - puppet last run on mw1225 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[git] [07:47:57] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 18 probes of 302 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map [07:50:26] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1956 bytes in 0.092 second response time [08:00:05] (03PS1) 10Alexandros Kosiaris: Reimage ganeti1002, ganeti1005 as stretch [puppet] - 10https://gerrit.wikimedia.org/r/436227 [08:07:46] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1967 bytes in 0.082 second response time [08:08:46] RECOVERY - puppet last run on kafkamon2001 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [08:09:32] (03PS4) 10Jcrespo: mariadb: Remove old references to db105* and codfw hosts at dns [dns] - 10https://gerrit.wikimedia.org/r/434920 (https://phabricator.wikimedia.org/T186320) [08:10:37] (03CR) 10Alexandros Kosiaris: [C: 04-1] Allow use of PuppetDB in labs for ssh_known_hosts (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/333471 (https://phabricator.wikimedia.org/T72792) (owner: 10Alex Monk) [08:12:47] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1948 bytes in 0.078 second response time [08:13:02] (03CR) 10Alexandros Kosiaris: [C: 032] Reimage ganeti1002, ganeti1005 as stretch [puppet] - 10https://gerrit.wikimedia.org/r/436227 (owner: 10Alexandros Kosiaris) [08:14:07] RECOVERY - puppet last run on mw1225 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [08:14:40] (03Abandoned) 10Alexandros Kosiaris: Depool poolcounter1001 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435968 (https://phabricator.wikimedia.org/T187962) (owner: 10Alexandros Kosiaris) [08:15:03] (03PS2) 10Alexandros Kosiaris: apertium: Pin 
python3-tornado to jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/435159 (https://phabricator.wikimedia.org/T194883) [08:15:06] (03CR) 10Marostegui: [C: 031] mariadb: Remove old references to db105* and codfw hosts at dns [dns] - 10https://gerrit.wikimedia.org/r/434920 (https://phabricator.wikimedia.org/T186320) (owner: 10Jcrespo) [08:15:23] (03CR) 10Alexandros Kosiaris: [C: 032] apertium: Pin python3-tornado to jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/435159 (https://phabricator.wikimedia.org/T194883) (owner: 10Alexandros Kosiaris) [08:16:42] (03CR) 10Candalua: [C: 031] Set wgProofreadPagePageSeparator to '' for jawikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436168 (https://phabricator.wikimedia.org/T195873) (owner: 10Urbanecm) [08:17:11] (03CR) 10Candalua: [C: 031] Set wgProofreadPagePageSeparator='' on zhwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436169 (https://phabricator.wikimedia.org/T194875) (owner: 10Urbanecm) [08:18:02] (03CR) 10Jcrespo: [C: 032] mariadb: Remove old references to db105* and codfw hosts at dns [dns] - 10https://gerrit.wikimedia.org/r/434920 (https://phabricator.wikimedia.org/T186320) (owner: 10Jcrespo) [08:21:57] 10Operations, 10Datasets-General-or-Unknown, 10Dumps-Generation, 10hardware-requests, 10Patch-For-Review: Give misc dump crons their own host - https://phabricator.wikimedia.org/T181936#4242006 (10ArielGlenn) 05stalled>03Resolved a:03ArielGlenn Everything looks good, let's see if I can close this w... [08:22:19] !log upgrade python3-tornado on scb1001 and restart apertium-apy. T194883 [08:22:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:22:23] T194883: Backport python3-tornado for jessie-wikimedia - https://phabricator.wikimedia.org/T194883 [08:27:08] (03CR) 10ArielGlenn: [C: 031] "let's do this then!" 
[puppet] - 10https://gerrit.wikimedia.org/r/435171 (owner: 10Muehlenhoff) [08:30:04] (03PS1) 10Muehlenhoff: Record extended MOU dates for arnad and piccardi [puppet] - 10https://gerrit.wikimedia.org/r/436231 [08:30:56] (03PS2) 10Muehlenhoff: Record extended MOU dates for arnad and piccardi [puppet] - 10https://gerrit.wikimedia.org/r/436231 [08:31:56] (03CR) 10Muehlenhoff: [C: 032] Record extended MOU dates for arnad and piccardi [puppet] - 10https://gerrit.wikimedia.org/r/436231 (owner: 10Muehlenhoff) [08:38:54] !log Stop and reboot db2094 and db2095 for testing - T190704 [08:38:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:38:58] T190704: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704 [08:40:16] (03PS2) 10Muehlenhoff: Remove support for trusty in mediawiki classes [puppet] - 10https://gerrit.wikimedia.org/r/430912 [08:46:23] (03PS4) 10Jcrespo: mariadb: Add extra_port on port + 20 for multiinstance hosts [puppet] - 10https://gerrit.wikimedia.org/r/435751 [08:46:25] (03PS1) 10Jcrespo: mariadb-backups: Add misc eqiad missing backups from db1117 [puppet] - 10https://gerrit.wikimedia.org/r/436235 (https://phabricator.wikimedia.org/T192358) [08:46:41] (03PS2) 10Jcrespo: mariadb-backups: Add misc eqiad missing backups from db1117 [puppet] - 10https://gerrit.wikimedia.org/r/436235 (https://phabricator.wikimedia.org/T192358) [08:47:02] !log reimage druid1004 to Debian Stretch [08:47:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:47:46] (03PS1) 10Elukey: Override zookeeper version for druid1004 [puppet] - 10https://gerrit.wikimedia.org/r/436236 (https://phabricator.wikimedia.org/T192636) [08:48:14] (03CR) 10Elukey: [C: 032] Override zookeeper version for druid1004 [puppet] - 10https://gerrit.wikimedia.org/r/436236 (https://phabricator.wikimedia.org/T192636) (owner: 10Elukey) [08:48:16] (03CR) 10Marostegui: [C: 031] 
mariadb-backups: Add misc eqiad missing backups from db1117 [puppet] - 10https://gerrit.wikimedia.org/r/436235 (https://phabricator.wikimedia.org/T192358) (owner: 10Jcrespo) [08:50:31] (03CR) 10ArielGlenn: [C: 031] "this would have been easier if tin were gone. ah well..." [puppet] - 10https://gerrit.wikimedia.org/r/430912 (owner: 10Muehlenhoff) [08:51:32] (03CR) 10Alexandros Kosiaris: [C: 04-1] Allow PuppetDB use on standalone puppetmasters (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/435631 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk) [08:53:28] (03PS3) 10Muehlenhoff: Remove support for trusty in mediawiki classes [puppet] - 10https://gerrit.wikimedia.org/r/430912 [08:54:42] (03CR) 10Muehlenhoff: [C: 032] Remove support for trusty in mediawiki classes [puppet] - 10https://gerrit.wikimedia.org/r/430912 (owner: 10Muehlenhoff) [09:05:10] 10Operations, 10ops-eqiad, 10Traffic, 10Patch-For-Review: rack/setup/install lvs101[3-6] - https://phabricator.wikimedia.org/T184293#4242095 (10ayounsi) For lvs1015, @Cmjohnson can you cable the following? |host|hostport|switch:switchport| |---|---|---|---| |lvs1015|enp4s0f0 (primary)|asw2-c-eqiad:xe-7/0/... 
[09:13:05] (03PS1) 10Giuseppe Lavagetto: utils: add script to generate mcrouter-related certs [puppet] - 10https://gerrit.wikimedia.org/r/436240 (https://phabricator.wikimedia.org/T192771) [09:13:42] (03CR) 10jerkins-bot: [V: 04-1] utils: add script to generate mcrouter-related certs [puppet] - 10https://gerrit.wikimedia.org/r/436240 (https://phabricator.wikimedia.org/T192771) (owner: 10Giuseppe Lavagetto) [09:16:23] (03PS1) 10Muehlenhoff: Remove now obsolete os conditional [puppet] - 10https://gerrit.wikimedia.org/r/436241 [09:23:24] (03PS1) 10Muehlenhoff: Remove obsolete mediawiki Upstart jobs [puppet] - 10https://gerrit.wikimedia.org/r/436242 [09:24:10] !log peering setup with RIPE RIS in eqsin [09:24:13] (03PS1) 10Volans: Reducing max length for varchar columns [software/debmonitor] - 10https://gerrit.wikimedia.org/r/436243 (https://phabricator.wikimedia.org/T191299) [09:24:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:24:15] (03PS1) 10Volans: MySQL config fine-tuning [software/debmonitor] - 10https://gerrit.wikimedia.org/r/436244 (https://phabricator.wikimedia.org/T191299) [09:27:19] (03PS1) 10Marostegui: db-codfw.php: Depool db2059 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436246 (https://phabricator.wikimedia.org/T191316) [09:28:19] (03PS1) 10Addshore: Revert "Disable search integration with Article Placeholder temporarily" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436247 [09:30:47] (03CR) 10Marostegui: [C: 032] db-codfw.php: Depool db2059 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436246 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui) [09:32:21] (03Merged) 10jenkins-bot: db-codfw.php: Depool db2059 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436246 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui) [09:32:37] (03CR) 10jenkins-bot: db-codfw.php: Depool db2059 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436246 
(https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui) [09:33:17] (03PS1) 10Ppchelko: Disable redis queue for cirrus except wikipedia, commons and wikidata. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436249 (https://phabricator.wikimedia.org/T189137) [09:33:17] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [09:33:44] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2059 for alter table (duration: 01m 02s) [09:33:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:34:36] topmost exceptions [09:34:37] Could not wait for replica DBs to catch up to db1071 [09:34:43] Could not wait for replica DBs to catch up to db1054 [09:35:12] marostegui: ---^ [09:37:14] (grabbed from the fatal monitor) [09:37:17] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [09:39:19] elukey: checking [09:40:26] (03PS3) 10Jcrespo: mariadb-backups: Add misc eqiad missing backups from db1117 [puppet] - 10https://gerrit.wikimedia.org/r/436235 (https://phabricator.wikimedia.org/T192358) [09:40:57] elukey: maybe a spike? 
it looks gone now (those are s8 and s2 btw) [09:42:51] yep yep [09:42:55] I was only reporting :) [09:43:45] yeah, thanks I took a look and it was gone by then [09:44:31] (03CR) 10Jcrespo: [C: 032] mariadb-backups: Add misc eqiad missing backups from db1117 [puppet] - 10https://gerrit.wikimedia.org/r/436235 (https://phabricator.wikimedia.org/T192358) (owner: 10Jcrespo) [09:47:31] !log Deploy schema change on db2059 - T191316 T192926 T89737 T195193 [09:47:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:47:40] T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737 [09:47:40] T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926 [09:47:40] T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193 [09:47:40] T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316 [09:57:43] (03PS1) 10Jcrespo: mariadb: Depool db1089 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436250 [09:59:12] 10Operations, 10DBA, 10Epic: DB meta task for next DC failover issues - https://phabricator.wikimedia.org/T189107#4242311 (10Marostegui) [09:59:13] (03PS1) 10Aaron Schulz: [WIP] Add "memcached-mcrouter" to $wgObjectCaches as default for testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436252 [09:59:36] 10Operations, 10DBA, 10Epic: DB meta task for next DC failover issues - https://phabricator.wikimedia.org/T189107#4031557 (10Marostegui) [09:59:49] 10Operations, 10DBA, 10Epic: DB meta task for next DC failover issues - https://phabricator.wikimedia.org/T189107#4031557 (10Marostegui) [10:03:02] (03PS1) 10Muehlenhoff: Re-add Siddarth Parmar to LDAP access [puppet] - 10https://gerrit.wikimedia.org/r/436254 [10:03:40] (03PS2) 10Addshore: Revert "Disable search integration with Article Placeholder temporarily" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/436247 [10:05:34] (03Draft2) 10Ayounsi: Allow netbox-deploy to subscribe to netbox [software/netbox] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/436253 [10:05:34] marostegui: s8, anything to worry about? [10:06:27] addshore: not that I can see [10:06:47] =] [10:07:40] (03PS2) 10Muehlenhoff: Re-add Siddarth Parmar to LDAP access [puppet] - 10https://gerrit.wikimedia.org/r/436254 [10:09:15] (03PS7) 10Addshore: Update Wikidata wgPropertySuggesterDeprecatedIds [mediawiki-config] - 10https://gerrit.wikimedia.org/r/430045 (owner: 10Matěj Suchánek) [10:09:49] (03CR) 10Addshore: "I removed the word 'blacklist' from the commit message, as the wikibase blacklisted entities / ids is a very different setting and this co" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/430045 (owner: 10Matěj Suchánek) [10:10:03] (03PS3) 10Muehlenhoff: Re-add Siddarth Parmar to LDAP access [puppet] - 10https://gerrit.wikimedia.org/r/436254 [10:10:35] (03PS4) 10Muehlenhoff: Re-add Siddarth Parmar to LDAP access [puppet] - 10https://gerrit.wikimedia.org/r/436254 [10:12:57] (03PS3) 10Addshore: Revert "Disable search integration with Article Placeholder temporarily" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436247 (https://phabricator.wikimedia.org/T195751) [10:12:59] (03PS4) 10Mark Bergsma: Implement common base class for "looping check" monitors [debs/pybal] - 10https://gerrit.wikimedia.org/r/435764 [10:14:04] (03CR) 10Mark Bergsma: [C: 032] Implement common base class for "looping check" monitors [debs/pybal] - 10https://gerrit.wikimedia.org/r/435764 (owner: 10Mark Bergsma) [10:14:22] (03CR) 10Muehlenhoff: [C: 032] Re-add Siddarth Parmar to LDAP access [puppet] - 10https://gerrit.wikimedia.org/r/436254 (owner: 10Muehlenhoff) [10:14:42] (03Merged) 10jenkins-bot: Implement common base class for "looping check" monitors [debs/pybal] - 10https://gerrit.wikimedia.org/r/435764 (owner: 10Mark Bergsma) [10:19:10] 
10Operations, 10monitoring, 10Patch-For-Review: Evaluate Grafana's LDAP group options and deprecate grafana-admin if possible - https://phabricator.wikimedia.org/T170150#4242370 (10akosiaris) Information about the 14 grafana database duplicate users can be found at P7183 (WMF-NDA protected currently) [10:22:48] (03CR) 10ArielGlenn: [C: 031] "Yep, looks like you got all the cruft out." [puppet] - 10https://gerrit.wikimedia.org/r/436242 (owner: 10Muehlenhoff) [10:27:18] (03PS1) 10Mark Bergsma: Restore testing of the UDP monitor check method [debs/pybal] - 10https://gerrit.wikimedia.org/r/436261 [10:29:44] (03PS5) 10Jcrespo: mariadb: Add extra_port on port + 20 for multiinstance hosts [puppet] - 10https://gerrit.wikimedia.org/r/435751 [10:29:46] (03PS1) 10Jcrespo: mariadb: Allow reimage of db108* hosts, set db1089 for stretch [puppet] - 10https://gerrit.wikimedia.org/r/436262 [10:30:33] (03PS2) 10Jcrespo: mariadb: Allow reimage of db108* hosts, set db1089 for stretch [puppet] - 10https://gerrit.wikimedia.org/r/436262 [10:30:40] (03PS1) 10Marostegui: Revert "db-codfw.php: Depool db2059" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436263 [10:31:59] (03PS3) 10Jcrespo: mariadb: Allow reimage of db108* hosts, set db1089 for stretch [puppet] - 10https://gerrit.wikimedia.org/r/436262 [10:32:35] (03CR) 10Marostegui: [C: 032] Revert "db-codfw.php: Depool db2059" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436263 (owner: 10Marostegui) [10:33:58] (03Merged) 10jenkins-bot: Revert "db-codfw.php: Depool db2059" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436263 (owner: 10Marostegui) [10:34:27] (03PS5) 10Jcrespo: phabricator/mariadb: Update database configuration for stretch/10.1 [puppet] - 10https://gerrit.wikimedia.org/r/377693 (https://phabricator.wikimedia.org/T175679) [10:35:13] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2059 after alter table (duration: 01m 02s) [10:35:18] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:36:00] (03CR) 10Jcrespo: [C: 032] mariadb: Allow reimage of db108* hosts, set db1089 for stretch [puppet] - 10https://gerrit.wikimedia.org/r/436262 (owner: 10Jcrespo) [10:36:36] (03PS2) 10Jcrespo: mariadb: Depool db1089 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436250 [10:37:55] (03PS1) 10Marostegui: db-eqiad.php: Depool db2089:3315 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436264 (https://phabricator.wikimedia.org/T191316) [10:38:28] (03CR) 10Jcrespo: [C: 032] mariadb: Depool db1089 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436250 (owner: 10Jcrespo) [10:39:35] (03CR) 10jenkins-bot: Revert "db-codfw.php: Depool db2059" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436263 (owner: 10Marostegui) [10:39:45] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db2089:3315 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436264 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui) [10:39:58] (03Merged) 10jenkins-bot: mariadb: Depool db1089 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436250 (owner: 10Jcrespo) [10:41:09] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db2089:3315 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436264 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui) [10:42:18] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1089 (duration: 01m 01s) [10:42:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:43:22] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2089:3315 for alter table (duration: 01m 01s) [10:43:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:44:06] (03PS1) 10Marostegui: db2094,db2095: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/436268 (https://phabricator.wikimedia.org/T190704) [10:44:10] !log Deploy schema change on db2089:3315 
- T191316 T192926 T89737 T195193 [10:44:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:44:16] T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737 [10:44:16] T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926 [10:44:17] T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193 [10:44:17] T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316 [10:45:16] (03CR) 10jenkins-bot: mariadb: Depool db1089 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436250 (owner: 10Jcrespo) [10:45:22] (03CR) 10jenkins-bot: db-eqiad.php: Depool db2089:3315 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436264 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui) [10:46:29] (03CR) 10Marostegui: [C: 032] db2094,db2095: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/436268 (https://phabricator.wikimedia.org/T190704) (owner: 10Marostegui) [10:47:44] (03PS2) 10Giuseppe Lavagetto: utils: add script to generate mcrouter-related certs [puppet] - 10https://gerrit.wikimedia.org/r/436240 (https://phabricator.wikimedia.org/T192771) [10:50:42] (03PS1) 10Muehlenhoff: Enable microcode updates for all mediawiki servers [puppet] - 10https://gerrit.wikimedia.org/r/436271 (https://phabricator.wikimedia.org/T127825) [11:04:38] !log stop and reimage db1089 [11:04:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:18:51] (03CR) 10KartikMistry: "recheck" [debs/contenttranslation/apertium-apy] - 10https://gerrit.wikimedia.org/r/433318 (https://phabricator.wikimedia.org/T194342) (owner: 10KartikMistry) [11:19:22] (03CR) 10jerkins-bot: [V: 04-1] WIP: apertium-apy: New upstream release [debs/contenttranslation/apertium-apy] - 10https://gerrit.wikimedia.org/r/433318 
(https://phabricator.wikimedia.org/T194342) (owner: 10KartikMistry) [11:21:35] akosiaris: ^ What can be reason for this even we pinnned python3-tornado? :/ [11:22:54] (03PS3) 10KartikMistry: WIP: apertium-apy: New upstream release [debs/contenttranslation/apertium-apy] - 10https://gerrit.wikimedia.org/r/433318 (https://phabricator.wikimedia.org/T194342) [11:23:19] (03PS2) 10Mark Bergsma: Add unit testing for BGP Factory classes [debs/pybal] - 10https://gerrit.wikimedia.org/r/433735 [11:23:24] (03CR) 10jerkins-bot: [V: 04-1] WIP: apertium-apy: New upstream release [debs/contenttranslation/apertium-apy] - 10https://gerrit.wikimedia.org/r/433318 (https://phabricator.wikimedia.org/T194342) (owner: 10KartikMistry) [11:25:16] (03CR) 10Mark Bergsma: [C: 031] Add unit testing for BGP Factory classes [debs/pybal] - 10https://gerrit.wikimedia.org/r/433735 (owner: 10Mark Bergsma) [11:26:51] 10Operations, 10Patch-For-Review: Re-add intel-microcode - https://phabricator.wikimedia.org/T127825#4242519 (10MoritzMuehlenhoff) Test servers for the elastic clusters at https://phabricator.wikimedia.org/P7184 [11:35:17] (03CR) 10R4q3NWnUx2CEhVyr: "Any new about this patch ?" [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/430069 (owner: 10R4q3NWnUx2CEhVyr) [11:39:23] (03CR) 10Thiemo Kreuz (WMDE): [C: 031] "Ladsgroup, you said you wanted to schedule this for SWAT, but apparently missed it. Do you have the capacity to try again?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/430045 (owner: 10Matěj Suchánek) [11:43:10] (03CR) 10Ladsgroup: "Oh my sentence was imperative because I don't have much context here but I think it's not that hard. I can do it." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/430045 (owner: 10Matěj Suchánek) [11:46:20] (03PS3) 10Mark Bergsma: Clarify interface of buildProtocol and setEnabledAddressFamilies [debs/pybal] - 10https://gerrit.wikimedia.org/r/433736 [11:58:41] (03PS2) 10Mark Bergsma: Fix BGP collision detection [debs/pybal] - 10https://gerrit.wikimedia.org/r/434161 [11:59:29] (03CR) 10Mark Bergsma: [C: 031] "Done" [debs/pybal] - 10https://gerrit.wikimedia.org/r/433736 (owner: 10Mark Bergsma) [12:02:54] (03CR) 10Mark Bergsma: [C: 031] Fix BGP collision detection [debs/pybal] - 10https://gerrit.wikimedia.org/r/434161 (owner: 10Mark Bergsma) [12:03:22] kart_: CI has nothing to do with was is in production. hasharAway might know how to enable backports in CI [12:08:48] PROBLEM - DPKG on stat1005 is CRITICAL: Return code of 255 is out of bounds [12:08:49] PROBLEM - dhclient process on stat1005 is CRITICAL: Return code of 255 is out of bounds [12:08:58] PROBLEM - MD RAID on stat1005 is CRITICAL: Return code of 255 is out of bounds [12:09:19] PROBLEM - configured eth on stat1005 is CRITICAL: Return code of 255 is out of bounds [12:09:19] PROBLEM - Disk space on stat1005 is CRITICAL: Return code of 255 is out of bounds [12:09:29] PROBLEM - Check systemd state on stat1005 is CRITICAL: Return code of 255 is out of bounds [12:09:42] hello stat1005 [12:10:18] PROBLEM - puppet last run on stat1005 is CRITICAL: Return code of 255 is out of bounds [12:10:23] (03CR) 10Mark Bergsma: [C: 031] "WiP" [debs/pybal] - 10https://gerrit.wikimedia.org/r/434162 (owner: 10Mark Bergsma) [12:13:29] RECOVERY - configured eth on stat1005 is OK: OK - interfaces up [12:13:29] RECOVERY - Disk space on stat1005 is OK: DISK OK [12:13:39] RECOVERY - Check systemd state on stat1005 is OK: OK - running: The system is fully operational [12:13:58] RECOVERY - DPKG on stat1005 is OK: All packages OK [12:13:59] RECOVERY - dhclient process on stat1005 is OK: PROCS OK: 0 processes with command name dhclient [12:14:08] 
RECOVERY - MD RAID on stat1005 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0 [12:14:36] Amir1: o/ - the oom killer on stat1005 killed a python process, and judging from the current workload it might have been yours :) [12:15:01] stat1005 is still a bit overloaded, are you planning to let it run for a long time? [12:15:05] yup, I'm training a model for ORES :D [12:15:13] hmm [12:15:19] RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 15 minutes ago with 0 failures [12:15:21] https://grafana.wikimedia.org/dashboard/db/prometheus-machine-stats?orgId=1&var-server=stat1005&var-datasource=eqiad%20prometheus%2Fops [12:15:24] :) [12:15:28] I can stop once this part is done [12:15:36] akosiaris: oh, Ok! [12:15:54] next time I run it with nice [12:16:08] elukey: would it help? [12:16:42] (03PS1) 10Hoo man: Dumps: Add groups to dumper_misc_crons_only hosts [puppet] - 10https://gerrit.wikimedia.org/r/436276 (https://phabricator.wikimedia.org/T181936) [12:17:05] Amir1: maybe even reducing the number of processes/concurrency would be more effective [12:17:44] (03PS2) 10Mark Bergsma: Add tests that similate client or server sessions initial connection [debs/pybal] - 10https://gerrit.wikimedia.org/r/434162 [12:19:48] (03CR) 10Mark Bergsma: [C: 031] Add tests that similate client or server sessions initial connection [debs/pybal] - 10https://gerrit.wikimedia.org/r/434162 (owner: 10Mark Bergsma) [12:21:01] (03PS1) 10Gergő Tisza: Remove warning about using labs-vagrant on stretch [puppet] - 10https://gerrit.wikimedia.org/r/436277 (https://phabricator.wikimedia.org/T180377) [12:24:32] (03CR) 10Elukey: [C: 031] "LGTM, let's review https://puppet-compiler.wmflabs.org/compiler02/11320/ just to be sure before merging!" 
[puppet] - 10https://gerrit.wikimedia.org/r/436012 (owner: 10Ottomata) [12:24:48] (03CR) 10Alexandros Kosiaris: MySQL config fine-tuning (032 comments) [software/debmonitor] - 10https://gerrit.wikimedia.org/r/436244 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [12:24:59] 10Operations, 10Patch-For-Review: Re-add intel-microcode - https://phabricator.wikimedia.org/T127825#4242633 (10MoritzMuehlenhoff) Test servers for Hadoop cluster at https://phabricator.wikimedia.org/P7186 [12:25:12] (03CR) 10jerkins-bot: [V: 04-1] Create profile::analytics::cluster::packages class [puppet] - 10https://gerrit.wikimedia.org/r/436012 (owner: 10Ottomata) [12:26:13] (03PS1) 10Jcrespo: mariadb: Repool db1089 with low load after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436278 [12:26:30] (03CR) 10Alexandros Kosiaris: [C: 04-1] "pedanticness alert!" (032 comments) [debs/pybal] - 10https://gerrit.wikimedia.org/r/434162 (owner: 10Mark Bergsma) [12:28:36] akosiaris: <3 [12:28:56] :-) [12:30:48] i will avoid the problem altogether [12:30:57] (03PS3) 10Mark Bergsma: Add tests that emulate client or server sessions initial connection [debs/pybal] - 10https://gerrit.wikimedia.org/r/434162 [12:30:58] (03PS2) 10Mark Bergsma: Move FSM connect state handling to the FSM itself [debs/pybal] - 10https://gerrit.wikimedia.org/r/434163 [12:31:06] lol [12:34:03] (03CR) 10Volans: MySQL config fine-tuning (031 comment) [software/debmonitor] - 10https://gerrit.wikimedia.org/r/436244 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [12:34:21] !log starting elasticsearch cluster restart on codfw - T193734 [12:34:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:34:27] T193734: Move Serbian language wikis from extra-analysis to extra-analysis-serbian plugin - https://phabricator.wikimedia.org/T193734 [12:34:36] (03CR) 10Dbarratt: [C: 031] Enable $wgCookieSetOnIpBlock on test wiki [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/436193 (https://phabricator.wikimedia.org/T195930) (owner: 10Dmaza) [12:35:14] !log reboot analytics1029,1042,1070 to pick up the new cpu-microcode [12:35:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:35:26] (03CR) 10Mark Bergsma: [C: 031] Move FSM connect state handling to the FSM itself [debs/pybal] - 10https://gerrit.wikimedia.org/r/434163 (owner: 10Mark Bergsma) [12:38:10] (03CR) 10Muehlenhoff: [C: 031] "Sounds good" [software/debmonitor] - 10https://gerrit.wikimedia.org/r/436243 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [12:39:38] PROBLEM - ganeti-confd running on ganeti1002 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 110 (gnt-confd), command name ganeti-confd [12:39:38] PROBLEM - ganeti-mond running on ganeti1005 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-mond [12:39:38] PROBLEM - ganeti-noded running on ganeti1005 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-noded [12:39:39] PROBLEM - ganeti-noded running on ganeti1002 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-noded [12:39:58] PROBLEM - ganeti-confd running on ganeti1005 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 111 (gnt-confd), command name ganeti-confd [12:39:59] akosiaris: ^^^ expected? 
[12:40:09] PROBLEM - ganeti-mond running on ganeti1002 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-mond [12:40:37] !log reimage ganeti1002, ganeti1005 as stretch [12:40:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:40:42] volans: yup [12:40:47] ack :) [12:41:09] icinga was faster than me starting the reimage it seems [12:41:12] (03CR) 10Ema: [C: 031] Restore testing of the UDP monitor check method [debs/pybal] - 10https://gerrit.wikimedia.org/r/436261 (owner: 10Mark Bergsma) [12:41:14] lol [12:41:21] \o/ [12:42:05] (03CR) 10ArielGlenn: [C: 032] Dumps: Add groups to dumper_misc_crons_only hosts [puppet] - 10https://gerrit.wikimedia.org/r/436276 (https://phabricator.wikimedia.org/T181936) (owner: 10Hoo man) [12:44:42] (03CR) 10Jcrespo: MySQL config fine-tuning (032 comments) [software/debmonitor] - 10https://gerrit.wikimedia.org/r/436244 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [12:45:51] (03CR) 10Mark Bergsma: [C: 032] Restore testing of the UDP monitor check method [debs/pybal] - 10https://gerrit.wikimedia.org/r/436261 (owner: 10Mark Bergsma) [12:46:24] (03Merged) 10jenkins-bot: Restore testing of the UDP monitor check method [debs/pybal] - 10https://gerrit.wikimedia.org/r/436261 (owner: 10Mark Bergsma) [12:47:40] (03PS3) 10Hoo man: Wikidata dispatching: Update comment [puppet] - 10https://gerrit.wikimedia.org/r/432778 [12:48:21] (03PS3) 10Mark Bergsma: Remove server.is_pooled as it isn't actually used [debs/pybal] - 10https://gerrit.wikimedia.org/r/428901 [12:48:27] (03CR) 10ArielGlenn: [C: 032] Wikidata dispatching: Update comment [puppet] - 10https://gerrit.wikimedia.org/r/432778 (owner: 10Hoo man) [12:49:58] (03CR) 10Mark Bergsma: [C: 032] Remove server.is_pooled as it isn't actually used [debs/pybal] - 10https://gerrit.wikimedia.org/r/428901 (owner: 10Mark Bergsma) [12:50:05] (03CR) 10Ayounsi: [V: 032 C: 032] Allow netbox-deploy to subscribe to netbox 
[software/netbox] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/436253 (owner: 10Ayounsi) [12:50:35] (03Merged) 10jenkins-bot: Remove server.is_pooled as it isn't actually used [debs/pybal] - 10https://gerrit.wikimedia.org/r/428901 (owner: 10Mark Bergsma) [12:53:12] (03PS1) 10Muehlenhoff: Cleanup after migration of deployment servers to stretch [puppet] - 10https://gerrit.wikimedia.org/r/436284 [12:57:00] (03CR) 10Volans: MySQL config fine-tuning (032 comments) [software/debmonitor] - 10https://gerrit.wikimedia.org/r/436244 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [12:57:33] 10Operations, 10Maps-Sprint, 10Patch-For-Review: reimage maps-test2004 to stretch and cassandra 2.2 - https://phabricator.wikimedia.org/T195741#4242713 (10Gehel) @Eevans packaged a new cassandra-2.2.6-wmf4 on https://people.wikimedia.org/~eevans/debian, we can test it and see how it works... [12:57:43] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db2089:3315" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436285 [12:57:51] jouncebot: nextg [12:57:55] jouncebot: next [12:57:55] In 0 hour(s) and 2 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180530T1300) [12:58:58] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db2089:3315" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436285 (owner: 10Marostegui) [12:59:48] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db2089:3315" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436285 (owner: 10Marostegui) [13:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for European Mid-day SWAT(Max 6 patches) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180530T1300). 
[13:00:04] CFisch_WMDE, tgr, Jdlrobson, and tarrow: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [13:00:19] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2089:3315 after alter table (duration: 01m 02s) [13:00:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:00:38] \o [13:00:41] o/ [13:00:42] \o/ [13:00:47] o/ [13:01:24] (03PS2) 10Volans: MySQL config fine-tuning [software/debmonitor] - 10https://gerrit.wikimedia.org/r/436244 (https://phabricator.wikimedia.org/T191299) [13:01:46] I can SWAT today [13:01:54] :-D [13:02:08] anybody wants to deploy their own patches? [13:03:00] I've just been asked to cancel my patch last minute :) so don't worry about me [13:03:26] tarrow: please remove it from the calendar then [13:03:27] I'll just pull it off Deployments [13:03:33] sure! [13:04:34] CFisch_WMDE: you are first, will ping you in a few minutes when the patch is at mwdebug [13:04:43] tgr: you are next, please stand by [13:04:45] cool [13:04:54] (03PS2) 10Zfilipin: Enable detection of changes in moved paragraphs on most wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436017 (https://phabricator.wikimedia.org/T195375) (owner: 10WMDE-Fisch) [13:05:34] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436017 (https://phabricator.wikimedia.org/T195375) (owner: 10WMDE-Fisch) [13:06:52] (03Merged) 10jenkins-bot: Enable detection of changes in moved paragraphs on most wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436017 (https://phabricator.wikimedia.org/T195375) (owner: 10WMDE-Fisch) [13:07:13] (03PS1) 10Volans: debmonitor: specify MySQL connection options [puppet] - 10https://gerrit.wikimedia.org/r/436286 (https://phabricator.wikimedia.org/T191299) [13:07:21] zeljkof, I can deploy [13:07:30] I finally have 
+2 rights, but it will be my first SWAT [13:07:48] so maybe I can go last?, also I need to check the production shell access, I didn't do that for months [13:07:52] raynor: do you have anything to deploy? [13:08:06] I don't see it in the calendar? [13:08:12] (03CR) 10Volans: "This patch should be straightforward now (the config.json is just an example file)." [software/debmonitor] - 10https://gerrit.wikimedia.org/r/436244 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans) [13:08:14] jdlrobson added it to calendar as himself: [13:08:14] Remove unused PopupsAnonsExperimentalGroupSize config variable [13:08:25] https://gerrit.wikimedia.org/r/#/c/431759/ -> it's a config change [13:08:33] raynor: ah, I see it, sure, get ready, I'll leave it for you [13:08:42] \o/ [13:09:29] CFisch_WMDE: your patch is at mwdebug1002, let me know if I can deploy [13:09:55] (03PS2) 10Zfilipin: Add WMDS support question feed to mediawikiwiki RSS whitelist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434901 (https://phabricator.wikimedia.org/T185087) (owner: 10Gergő Tisza) [13:10:08] !log Deploy schema change on dbstore2001:3315 - T191316 T192926 T89737 T195193 [13:10:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:10:15] T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737 [13:10:15] T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926 [13:10:15] T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193 [13:10:15] T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316 [13:10:29] zeljkof: nice, looks good [13:10:48] you can deploy [13:11:18] (03CR) 10jenkins-bot: Enable detection of changes in moved paragraphs on most wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436017 (https://phabricator.wikimedia.org/T195375) 
(owner: 10WMDE-Fisch) [13:11:21] CFisch_WMDE: deploying [13:12:31] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:436017|Enable detection of changes in moved paragraphs on most wikis (T195375)]] (duration: 01m 02s) [13:12:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:12:37] T195375: Show changes in moved paragraphs on most wikis - https://phabricator.wikimedia.org/T195375 [13:12:51] CFisch_WMDE: deployed, please check and thanks for deploying with #releng! ;) [13:13:23] tgr: will ping you in a few minutes when your patch is at mwdebug for testing [13:14:24] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434901 (https://phabricator.wikimedia.org/T185087) (owner: 10Gergő Tisza) [13:14:28] zeljkof: seems all good caching is a bit in the way but with purge=now the changes are visible and work as expected [13:14:43] thanks [13:14:50] CFisch_WMDE: thumbs up 👍 [13:15:50] (03Merged) 10jenkins-bot: Add WMDS support question feed to mediawikiwiki RSS whitelist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434901 (https://phabricator.wikimedia.org/T185087) (owner: 10Gergő Tisza) [13:16:09] (03CR) 10Jcrespo: [C: 032] mariadb: Repool db1089 with low load after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436278 (owner: 10Jcrespo) [13:17:13] (03CR) 10jenkins-bot: Add WMDS support question feed to mediawikiwiki RSS whitelist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434901 (https://phabricator.wikimedia.org/T185087) (owner: 10Gergő Tisza) [13:17:25] (03Merged) 10jenkins-bot: mariadb: Repool db1089 with low load after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436278 (owner: 10Jcrespo) [13:17:40] (03CR) 10jenkins-bot: mariadb: Repool db1089 with low load after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436278 (owner: 10Jcrespo) [13:18:07] tgr: your commit is at mwdebug1002, please 
test and let me know if I can deploy it [13:19:21] raynor: please get ready, I'm not sure if tgr is around for SWAT, you are next [13:19:35] ok [13:19:47] zeljkof: it doesn't seem to be doing anything, but maybe it's just some kind of caching [13:20:09] it does not break anything, in any case [13:20:14] tgr: should I wait, deploy, or revert? :) [13:20:54] zeljkof: probably easiest to just deploy it [13:21:07] tgr: ok, deploying [13:22:15] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:434901|Add WMDS support question feed to mediawikiwiki RSS whitelist (T185087)]] (duration: 01m 01s) [13:22:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:22:20] T185087: Whitelist discourse-mediawiki.wmflabs.org for Extension:RSS in MediaWiki.org - https://phabricator.wikimedia.org/T185087 [13:22:34] tgr: deployed, please test if possible, and thanks for deploying with #releng! :) [13:22:42] raynor: swat is yours! :D [13:22:43] thanks zeljkof! [13:22:59] I'm around, if you have any questions or if you need help [13:23:05] ok, cool zeljkof, you'll have to tell me step by step. 
I' [13:23:09] I'm here: https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment#Change_wiki_configuration [13:23:18] raynor: oh noes :) [13:23:22] and I'm logged on tin [13:23:31] raynor: this is all I know :) https://wikitech.wikimedia.org/wiki/SWAT_deploys/Deployers [13:23:44] ok, first - rebase [13:24:05] !log emptying ganeti1003, ganeti1007 for stretch upgrade [13:24:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:26:10] !log ppchelko@tin Started deploy [changeprop/deploy@43310d4]: Emit revision-score event T167180 [13:26:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:26:15] T167180: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180 [13:26:32] (03PS3) 10Pmiazga: Remove unused PopupsAnonsExperimentalGroupSize config variable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431759 (https://phabricator.wikimedia.org/T173952) [13:26:52] FWIW, I was testing it with VisualEditor preview which was dumb [13:27:19] ok, I think I'm ready, zeljkof I'll go pv so I don't flood this channel [13:27:42] !log ppchelko@tin Finished deploy [changeprop/deploy@43310d4]: Emit revision-score event T167180 (duration: 01m 32s) [13:27:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:27:50] raynor: this channel is a good place for deployment questions :) [13:28:25] (03CR) 10Pmiazga: [C: 032] Remove unused PopupsAnonsExperimentalGroupSize config variable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431759 (https://phabricator.wikimedia.org/T173952) (owner: 10Pmiazga) [13:28:45] I just want to verify each step with you [13:29:06] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1089 with low load (duration: 00m 59s) [13:29:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:29:43] (03Merged) 10jenkins-bot: Remove unused PopupsAnonsExperimentalGroupSize config 
variable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431759 (https://phabricator.wikimedia.org/T173952) (owner: 10Pmiazga) [13:30:01] (03PS1) 10Jcrespo: Revert "mariadb: Depool db1089 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436288 [13:30:08] (03CR) 10jenkins-bot: Remove unused PopupsAnonsExperimentalGroupSize config variable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/431759 (https://phabricator.wikimedia.org/T173952) (owner: 10Pmiazga) [13:30:11] (03CR) 10jerkins-bot: [V: 04-1] Revert "mariadb: Depool db1089 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436288 (owner: 10Jcrespo) [13:32:24] !log reboot analytics1002 (Hadoop master node standby) to pick up new cpu microcode [13:32:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:34:27] 10Operations, 10Ops-Access-Reviews, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access for bmansurov to run mwscript in terbium - https://phabricator.wikimedia.org/T189285#4242884 (10bmansurov) 05Resolved>03Open [13:34:30] (03PS2) 10Jcrespo: Revert "mariadb: Depool db1089 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436288 [13:34:39] (03CR) 10jerkins-bot: [V: 04-1] Revert "mariadb: Depool db1089 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436288 (owner: 10Jcrespo) [13:36:35] https://gerrit.wikimedia.org/r/#/c/431759 tested on mwdebug1002, works as expected [13:37:07] !log ppchelko@tin Started deploy [eventstreams/deploy@14e0b03]: Recreate config with new puppet and restart service T167180 [13:37:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:37:12] T167180: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180 [13:37:15] 10Operations, 10Patch-For-Review: Re-add intel-microcode - https://phabricator.wikimedia.org/T127825#4242900 (10ema) It would be useful to check if a new 
microcode is available (and thus a system restart is needed). Something along these lines should do the trick: ``` #!/bin/bash running="$(awk '/microcode/... [13:39:14] !log ppchelko@tin Finished deploy [eventstreams/deploy@14e0b03]: Recreate config with new puppet and restart service T167180 (duration: 02m 07s) [13:39:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:40:32] (03PS3) 10Jcrespo: Revert "mariadb: Depool db1089 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436288 [13:40:38] !log ppchelko@tin Started deploy [changeprop/deploy@4503987]: Fix a bug with no content in ores response [13:40:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:42:17] !log ppchelko@tin Finished deploy [changeprop/deploy@4503987]: Fix a bug with no content in ores response (duration: 01m 38s) [13:42:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:42:24] 10Operations, 10Maps-Sprint, 10Patch-For-Review: reimage maps-test2004 to stretch and cassandra 2.2 - https://phabricator.wikimedia.org/T195741#4242913 (10Eevans) >>! In T195741#4238201, @Gehel wrote: > I can't find a repo in gerrit for cassandra packaging. @elukey / @Eevans any idea where it could be? For... 
[13:44:43] !log pmiazga@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:431759|Remove unused PopupsAnonsExperimentalGroupSize config variable (T173952)]] (duration: 01m 02s) [13:44:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:44:47] T173952: Remove A/B testing instrumentation code - https://phabricator.wikimedia.org/T173952 [13:46:36] !log pmiazga@tin Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: [[gerrit:431759|Remove unused PopupsAnonsExperimentalGroupSize config variable (T173952)]] (duration: 01m 01s) [13:46:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:48:59] !log EU SWAT finished [13:49:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:49:22] (03CR) 10ArielGlenn: "terbium is still running some php5 crapola on it, as seen via lastcomm >_<" [puppet] - 10https://gerrit.wikimedia.org/r/436284 (owner: 10Muehlenhoff) [13:49:52] swat is done. zeljkof thanks for your help. I appreciate that! [13:51:28] (03CR) 10Herron: "Planning on merging this shortly" [puppet] - 10https://gerrit.wikimedia.org/r/431860 (https://phabricator.wikimedia.org/T193766) (owner: 10Herron) [13:51:59] raynor: welcome to deployers! :D [13:52:23] \o/ [13:52:24] yay [13:53:14] raynor: boom! https://phabricator.wikimedia.org/people/badges/8417/ [13:53:36] you have just been awarded with legendary SWAT deployer phabricator badge! :D [13:55:06] wow, I didn't know it exists. 
I'll wear it with pride [13:55:27] I did not know about badges until a few days ago too :D [13:56:18] it's first time I see that feature [13:58:05] wheeee [14:10:53] (03PS2) 10Ottomata: Enable Kafka SSL listener for main-codfw [puppet] - 10https://gerrit.wikimedia.org/r/436171 [14:11:19] !log enabling SSL port for Kafka main-codfw cluster (take 2 :) ) T193778 [14:11:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:11:24] T193778: SSL and inter broker encryption for Kafka main - https://phabricator.wikimedia.org/T193778 [14:11:29] (03CR) 10Ottomata: [C: 032] Enable Kafka SSL listener for main-codfw [puppet] - 10https://gerrit.wikimedia.org/r/436171 (owner: 10Ottomata) [14:13:16] (03PS1) 10Dbarratt: Revert "Disable Datetime Selector on Special:Block on all wikis except Meta, MediaWiki," [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436292 (https://phabricator.wikimedia.org/T193785) [14:13:50] (03PS2) 10Dbarratt: Revert "Disable Datetime Selector on Special:Block on all wikis except Meta, MediaWiki," [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436292 (https://phabricator.wikimedia.org/T193785) [14:14:58] (03CR) 10jerkins-bot: [V: 04-1] Revert "Disable Datetime Selector on Special:Block on all wikis except Meta, MediaWiki," [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436292 (https://phabricator.wikimedia.org/T193785) (owner: 10Dbarratt) [14:16:36] (03PS1) 10Ottomata: Don't enable auth_acls until all brokers have SSL ports open [puppet] - 10https://gerrit.wikimedia.org/r/436294 (https://phabricator.wikimedia.org/T193778) [14:16:56] (03CR) 10Ottomata: [V: 032 C: 032] Don't enable auth_acls until all brokers have SSL ports open [puppet] - 10https://gerrit.wikimedia.org/r/436294 (https://phabricator.wikimedia.org/T193778) (owner: 10Ottomata) [14:18:30] (03PS5) 10Herron: ELK: change elasticsearch index prefix to logstash-syslog for syslog type [puppet] - 10https://gerrit.wikimedia.org/r/431860 
(https://phabricator.wikimedia.org/T193766) [14:18:51] (03PS2) 10Ottomata: Enable SSL inter.broker communication for Kafka main-codfw [puppet] - 10https://gerrit.wikimedia.org/r/434362 (https://phabricator.wikimedia.org/T193778) [14:19:12] (03PS3) 10Dbarratt: Revert "Disable Datetime Selector on Special:Block on all wikis except Meta, MediaWiki," [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436292 (https://phabricator.wikimedia.org/T193785) [14:20:01] (03CR) 10Herron: [C: 032] ELK: change elasticsearch index prefix to logstash-syslog for syslog type [puppet] - 10https://gerrit.wikimedia.org/r/431860 (https://phabricator.wikimedia.org/T193766) (owner: 10Herron) [14:21:31] !log changing logstash elasticsearch index prefix for syslogs to 'logstash-syslog' https://gerrit.wikimedia.org/r/431860 [14:21:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:22:19] 10Operations, 10WMDE-QWERTY-Team, 10wikidiff2, 10Patch-For-Review: Update wikidiff2 library on the WMF production cluster - https://phabricator.wikimedia.org/T190717#4243061 (10MoritzMuehlenhoff) Given that the change is now live, shall we close this ticket or do you expect another update soon for the dete... 
[14:24:38] (03CR) 10Ottomata: [C: 032] Enable SSL inter.broker communication for Kafka main-codfw [puppet] - 10https://gerrit.wikimedia.org/r/434362 (https://phabricator.wikimedia.org/T193778) (owner: 10Ottomata) [14:24:42] (03PS3) 10Ottomata: Enable SSL inter.broker communication for Kafka main-codfw [puppet] - 10https://gerrit.wikimedia.org/r/434362 (https://phabricator.wikimedia.org/T193778) [14:24:44] (03CR) 10Ottomata: [V: 032 C: 032] Enable SSL inter.broker communication for Kafka main-codfw [puppet] - 10https://gerrit.wikimedia.org/r/434362 (https://phabricator.wikimedia.org/T193778) (owner: 10Ottomata) [14:26:57] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1968 bytes in 0.072 second response time [14:28:08] 10Operations, 10Patch-For-Review: Re-add intel-microcode - https://phabricator.wikimedia.org/T127825#4243063 (10ema) The following cache hosts have been running with updated microcodes for the past two days: | **Host** | **CPU** | **Microcode version on die** | **Updated microcode versi... [14:28:43] dispatch :/ [14:29:00] 10Operations, 10Patch-For-Review: Ship host syslogs to ELK - https://phabricator.wikimedia.org/T193766#4243068 (10herron) The updated logstash-syslog prefix is looking good. Beginning to see results in Kibana with the new prefix: {F18615782} [14:30:30] (03PS2) 10Muehlenhoff: Enable Intel microcode installation for labvirt [puppet] - 10https://gerrit.wikimedia.org/r/433359 (https://phabricator.wikimedia.org/T194258) [14:36:03] (03CR) 10DCausse: [C: 031] Disable redis queue for cirrus except wikipedia, commons and wikidata. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/436249 (https://phabricator.wikimedia.org/T189137) (owner: 10Ppchelko) [14:36:58] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1966 bytes in 0.075 second response time [14:37:32] 10Operations, 10Maps-Sprint, 10Patch-For-Review: reimage maps-test2004 to stretch and cassandra 2.2 - https://phabricator.wikimedia.org/T195741#4243095 (10Mholloway) >>! In T195741#4241483, @Pnorman wrote: > Just to note so it doesn't disappear into a long .bash_history, someone trying to follow the stretch... [14:38:58] (03PS3) 10Mark Bergsma: Add unit testing for BGP Factory classes [debs/pybal] - 10https://gerrit.wikimedia.org/r/433735 [14:39:00] (03PS4) 10Mark Bergsma: Clarify interface of buildProtocol and setEnabledAddressFamilies [debs/pybal] - 10https://gerrit.wikimedia.org/r/433736 [14:39:02] (03PS3) 10Mark Bergsma: Fix BGP collision detection [debs/pybal] - 10https://gerrit.wikimedia.org/r/434161 [14:39:04] (03PS4) 10Mark Bergsma: Add tests that emulate client or server sessions initial connection [debs/pybal] - 10https://gerrit.wikimedia.org/r/434162 [14:39:06] (03PS3) 10Mark Bergsma: Move FSM connect state handling to the FSM itself [debs/pybal] - 10https://gerrit.wikimedia.org/r/434163 [14:39:08] (03PS1) 10Mark Bergsma: Implement BGP FSM events 4 and 5 (passive start) [debs/pybal] - 10https://gerrit.wikimedia.org/r/436297 [14:39:10] (03PS1) 10Mark Bergsma: Implement BGP FSM event 14 [debs/pybal] - 10https://gerrit.wikimedia.org/r/436298 [14:39:12] (03PS1) 10Mark Bergsma: Correct incoming connection interaction with BGP FSM [debs/pybal] - 10https://gerrit.wikimedia.org/r/436299 [14:43:14] /o\ [14:45:28] oh i have more [14:45:32] lovely [14:45:58] but the first few are just rebases, don't worry ;p [14:45:59] vgutierrez: you were about to ask for holidays right? :-P [14:46:10] volans: you read my mind! 
[14:46:23] 'ask' [14:54:13] 10Operations, 10WMDE-QWERTY-Team, 10wikidiff2, 10Patch-For-Review: Update wikidiff2 library on the WMF production cluster - https://phabricator.wikimedia.org/T190717#4243129 (10Lea_WMDE) 05Open>03Resolved We are going to need to deploy the bugfix, but I am going to start a new ticket for that once that... [14:56:19] (03PS2) 10Rush: openstack: labtest use labtestcontrol2003 for keystone [puppet] - 10https://gerrit.wikimedia.org/r/433734 (https://phabricator.wikimedia.org/T167559) [14:58:20] (03CR) 10Muehlenhoff: "Yeah, but this is limited to the deployment classes which are not used on terbium." [puppet] - 10https://gerrit.wikimedia.org/r/436284 (owner: 10Muehlenhoff) [15:00:21] (03PS1) 10Marostegui: db-codfw.php: Depool db2038 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436301 (https://phabricator.wikimedia.org/T191316) [15:00:23] (03PS6) 10Jcrespo: mariadb: Add extra_port on port + 20 for multiinstance hosts [puppet] - 10https://gerrit.wikimedia.org/r/435751 [15:00:25] (03PS1) 10Jcrespo: mariadb: Reimage db1082 with stretch [puppet] - 10https://gerrit.wikimedia.org/r/436302 [15:02:20] (03CR) 10Marostegui: [C: 032] db-codfw.php: Depool db2038 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436301 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui) [15:03:31] (03PS2) 10Jcrespo: mariadb: Reimage db1082 with stretch [puppet] - 10https://gerrit.wikimedia.org/r/436302 [15:03:44] (03Merged) 10jenkins-bot: db-codfw.php: Depool db2038 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436301 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui) [15:05:03] (03PS4) 10Jcrespo: Revert "mariadb: Depool db1089 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436288 [15:05:06] (03PS1) 10Jcrespo: mariadb: Depool db1082 for reimage to stretch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436303 [15:05:36] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2038 for 
alter table (duration: 01m 01s) [15:05:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:07:50] (03CR) 10Jcrespo: [C: 032] mariadb: Reimage db1082 with stretch [puppet] - 10https://gerrit.wikimedia.org/r/436302 (owner: 10Jcrespo) [15:09:08] (03CR) 10ArielGlenn: "Ah yes! I retract that. However in its place I leave deployment-tin in beta, still on jessie; it too must be converted. After that this lo" [puppet] - 10https://gerrit.wikimedia.org/r/436284 (owner: 10Muehlenhoff) [15:09:25] !log Deploy schema change on db2038 - T191316 T192926 T89737 T195193 [15:09:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:09:36] T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737 [15:09:36] T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926 [15:09:36] T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193 [15:09:36] T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316 [15:09:56] (03CR) 10Jcrespo: [C: 032] mariadb: Depool db1082 for reimage to stretch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436303 (owner: 10Jcrespo) [15:10:56] (03PS2) 10Jcrespo: mariadb: Depool db1082 for reimage to stretch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436303 [15:11:35] 10Operations, 10Puppet: require_package should mark packages as manually installed - https://phabricator.wikimedia.org/T195981#4243184 (10ema) [15:11:47] 10Operations, 10Puppet: require_package should mark packages as manually installed - https://phabricator.wikimedia.org/T195981#4243194 (10ema) p:05Triage>03Normal [15:12:25] 10Operations, 10Puppet: require_package should mark packages as manually installed - https://phabricator.wikimedia.org/T195981#4243184 (10ema) [15:19:08] !log jynus@tin Synchronized wmf-config/db-eqiad.php: 
Repool db1089 with higher load, depool db1082 (duration: 01m 01s) [15:19:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:21:36] 10Operations, 10Puppet: require_package should mark packages as manually installed - https://phabricator.wikimedia.org/T195981#4243232 (10MoritzMuehlenhoff) Ack, the analysis and the proposed fix seem entirely correct. [15:22:29] !log stop and reimage db1082 [15:22:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:22:38] (03PS1) 10Gehel: maps: upgrade to cassandra 2.2.6-wmf4 [puppet] - 10https://gerrit.wikimedia.org/r/436308 (https://phabricator.wikimedia.org/T195741) [15:23:08] (03CR) 10jerkins-bot: [V: 04-1] maps: upgrade to cassandra 2.2.6-wmf4 [puppet] - 10https://gerrit.wikimedia.org/r/436308 (https://phabricator.wikimedia.org/T195741) (owner: 10Gehel) [15:24:58] PROBLEM - MariaDB Slave IO: s5 on db2092 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2003, Errmsg: error reconnecting to master repl@db1082.eqiad.wmnet:3306 - retry-time: 60 maximum-retries: 86400 message: Cant connect to MySQL server on db1082.eqiad.wmnet (111 Connection refused) [15:25:08] gehel: o/ - do you have the debdiff for cassandra 2.2 wmf3/wmf4 by any chance? 
[15:25:15] otherwise I'll get by myself [15:25:23] curious to see if I can deploy it to AQS or not [15:25:29] ^that is ok, but apparently it wasn't downtimed correctly or it expired [15:26:02] elukey: https://github.com/wikimedia/cassandra/commit/fc2b0a53e22bb4c9846b3193cd6475377fe3ee74 (from Eric) [15:26:32] 10Operations, 10Patch-For-Review: Re-add intel-microcode - https://phabricator.wikimedia.org/T127825#4243244 (10MoritzMuehlenhoff) Test servers for Parsoid: https://phabricator.wikimedia.org/P7187 [15:26:56] (03CR) 10Krinkle: mediawiki/hhvm: Move fatal-error.php to Puppet (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/379953 (https://phabricator.wikimedia.org/T113114) (owner: 10Krinkle) [15:27:57] 10Operations, 10Discovery, 10Maps: disk usage increase on maps servers - https://phabricator.wikimedia.org/T194966#4243247 (10Gehel) p:05Triage>03Normal Disk usage has been stable over the last week, so it looks like the tuning we did, while not recovering much space helped stabilize. I'll keep an eye on... [15:28:18] (03CR) 10Eevans: [C: 031] "CI failure notwithstanding, LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/436308 (https://phabricator.wikimedia.org/T195741) (owner: 10Gehel) [15:29:09] (03PS2) 10Gehel: maps: upgrade to cassandra 2.2.6-wmf4 [puppet] - 10https://gerrit.wikimedia.org/r/436308 (https://phabricator.wikimedia.org/T195741) [15:30:12] (03PS1) 10Urbanecm: Temporarily enable MFMobileMainPageCss in ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436310 (https://phabricator.wikimedia.org/T195905) [15:31:43] (03PS7) 10Krinkle: mediawiki/hhvm: Move fatal-error.php to Puppet [puppet] - 10https://gerrit.wikimedia.org/r/379953 (https://phabricator.wikimedia.org/T113114) [15:32:02] (03CR) 10Krinkle: "Rebased to resolve conflict with the modules/profile rename." 
[puppet] - 10https://gerrit.wikimedia.org/r/379953 (https://phabricator.wikimedia.org/T113114) (owner: 10Krinkle) [15:40:15] (03CR) 10jenkins-bot: db-codfw.php: Depool db2038 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436301 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui) [15:40:23] (03CR) 10jenkins-bot: mariadb: Depool db1082 for reimage to stretch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436303 (owner: 10Jcrespo) [15:40:35] (03Draft1) 10Paladox: WIP: Planet: Update libs and some ui tweeks for rawdog [puppet] - 10https://gerrit.wikimedia.org/r/436312 [15:40:37] (03Draft2) 10Paladox: WIP: Planet: Update libs and some ui tweeks for rawdog [puppet] - 10https://gerrit.wikimedia.org/r/436312 [15:43:55] !log installing spice security updates [15:43:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:45:03] (03PS1) 10Marostegui: Revert "db-codfw.php: Depool db2038" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436316 [15:46:21] (03PS1) 10Muehlenhoff: Add library hint for spice [puppet] - 10https://gerrit.wikimedia.org/r/436317 [15:46:57] (03PS3) 10Paladox: WIP: Planet: Update libs and some ui tweeks for rawdog [puppet] - 10https://gerrit.wikimedia.org/r/436312 [15:47:05] (03CR) 10Marostegui: [C: 032] Revert "db-codfw.php: Depool db2038" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436316 (owner: 10Marostegui) [15:47:26] (03CR) 10Muehlenhoff: [C: 032] Add library hint for spice [puppet] - 10https://gerrit.wikimedia.org/r/436317 (owner: 10Muehlenhoff) [15:47:29] hello, is there a known issue with IRC and channels that are kicking folks out? 
[15:47:54] I've been kicked off several channels this morning [15:48:07] RECOVERY - MariaDB Slave IO: s5 on db2092 is OK: OK slave_io_state Slave_IO_Running: Yes [15:48:15] (03Merged) 10jenkins-bot: Revert "db-codfw.php: Depool db2038" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436316 (owner: 10Marostegui) [15:49:37] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2038 after alter table (duration: 01m 01s) [15:49:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:50:02] (03CR) 10Vgutierrez: [C: 031] "nice work!, check inline comment, LGTM otherwise." (031 comment) [debs/pybal] - 10https://gerrit.wikimedia.org/r/433735 (owner: 10Mark Bergsma) [15:50:29] (03CR) 10jenkins-bot: Revert "db-codfw.php: Depool db2038" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436316 (owner: 10Marostegui) [15:53:56] debt: yes, there have been. make sure you are "identified" to freenode (/msg nickserv help identify) then you can rejoin those channels usually [15:55:42] (03PS1) 10Arturo Borrero Gonzalez: openstack: neutron: disable l2population [puppet] - 10https://gerrit.wikimedia.org/r/436319 (https://phabricator.wikimedia.org/T195786) [15:55:46] (03PS5) 10Jcrespo: Revert "mariadb: Depool db1089 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436288 [15:55:48] (03PS1) 10Jcrespo: mariadb: Repool db1082 with low weight after reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436320 [15:57:06] thanks greg-g - that worked. :) [15:57:07] (03CR) 10Arturo Borrero Gonzalez: [C: 032] openstack: neutron: disable l2population [puppet] - 10https://gerrit.wikimedia.org/r/436319 (https://phabricator.wikimedia.org/T195786) (owner: 10Arturo Borrero Gonzalez) [15:57:27] debt: np! 
[15:59:30] (03PS3) 10Giuseppe Lavagetto: utils: add script to generate mcrouter-related certs [puppet] - 10https://gerrit.wikimedia.org/r/436240 (https://phabricator.wikimedia.org/T192771) [16:02:59] (03PS4) 10Mark Bergsma: Add unit testing for BGP Factory classes [debs/pybal] - 10https://gerrit.wikimedia.org/r/433735 [16:03:03] (03CR) 10Mark Bergsma: [C: 031] Add unit testing for BGP Factory classes (031 comment) [debs/pybal] - 10https://gerrit.wikimedia.org/r/433735 (owner: 10Mark Bergsma) [16:03:25] (03CR) 10Andrew Bogott: [C: 04-1] "Looks good. One comment inline." (031 comment) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/435692 (owner: 10Nehajha) [16:04:09] (03CR) 10Mark Bergsma: [C: 032] Add unit testing for BGP Factory classes [debs/pybal] - 10https://gerrit.wikimedia.org/r/433735 (owner: 10Mark Bergsma) [16:04:33] (03Merged) 10jenkins-bot: Add unit testing for BGP Factory classes [debs/pybal] - 10https://gerrit.wikimedia.org/r/433735 (owner: 10Mark Bergsma) [16:06:43] (03CR) 10Andrew Bogott: [C: 04-1] "This is definitely an improvement!" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/435691 (owner: 10Nehajha) [16:08:32] 10Operations, 10Puppet: require_package should mark packages as manually installed - https://phabricator.wikimedia.org/T195981#4243309 (10ema) There's an open upstream bug for this too: https://tickets.puppetlabs.com/browse/PUP-6631 [16:08:53] (03CR) 10Giuseppe Lavagetto: [C: 031] "LGTM; would it make sense to do the same with the 404 page as well?" [puppet] - 10https://gerrit.wikimedia.org/r/379953 (https://phabricator.wikimedia.org/T113114) (owner: 10Krinkle) [16:09:14] (03CR) 10Vgutierrez: Implement BGP FSM event 14 (032 comments) [debs/pybal] - 10https://gerrit.wikimedia.org/r/436298 (owner: 10Mark Bergsma) [16:09:51] (03CR) 10Krinkle: "Yep, and many more per T113114. Starting with just one to try it out and see how it works." 
[puppet] - 10https://gerrit.wikimedia.org/r/379953 (https://phabricator.wikimedia.org/T113114) (owner: 10Krinkle) [16:16:52] (03CR) 10Jcrespo: [C: 032] Revert "mariadb: Depool db1089 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436288 (owner: 10Jcrespo) [16:17:58] (03Merged) 10jenkins-bot: Revert "mariadb: Depool db1089 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436288 (owner: 10Jcrespo) [16:18:24] (03CR) 10Jcrespo: [C: 032] mariadb: Repool db1082 with low weight after reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436320 (owner: 10Jcrespo) [16:19:29] (03Merged) 10jenkins-bot: mariadb: Repool db1082 with low weight after reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436320 (owner: 10Jcrespo) [16:19:36] (03CR) 10jenkins-bot: Revert "mariadb: Depool db1089 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436288 (owner: 10Jcrespo) [16:21:47] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1089 with full weight, repool db1089 with low (duration: 01m 02s) [16:21:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:23:10] (03PS2) 10Nehajha: Print the type of webservice running [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/435692 (https://phabricator.wikimedia.org/T158244) [16:25:42] (03CR) 10jenkins-bot: mariadb: Repool db1082 with low weight after reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436320 (owner: 10Jcrespo) [16:28:16] (03CR) 10Dmaza: [C: 031] Revert "Disable Datetime Selector on Special:Block on all wikis except Meta, MediaWiki," [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436292 (https://phabricator.wikimedia.org/T193785) (owner: 10Dbarratt) [16:34:29] (03CR) 10Andrew Bogott: [C: 032] Print the type of webservice running [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/435692 (https://phabricator.wikimedia.org/T158244) (owner: 10Nehajha) [16:34:52] 
(03CR) 10Andrew Bogott: [C: 032] "We haven't quite agreed on the right way to roll out these changes, but this looks good." [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/435692 (https://phabricator.wikimedia.org/T158244) (owner: 10Nehajha) [16:35:08] (03Merged) 10jenkins-bot: Print the type of webservice running [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/435692 (https://phabricator.wikimedia.org/T158244) (owner: 10Nehajha) [16:36:07] (03PS1) 10Jcrespo: Revert "mariadb: Depool db1089 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436321 [16:38:21] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: db1065 storage crash - https://phabricator.wikimedia.org/T195444#4243358 (10Marostegui) p:05Triage>03Normal >>! In T195444#4230827, @jcrespo wrote: > db1065 storage has been rebuilt and data cloned to it again. However, there is a smart error on the... [16:38:34] !log upgrading openjdk-7 on conf100[1-3] [16:38:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:38:47] (03PS7) 10Jcrespo: mariadb: Add extra_port on port + 20 for multiinstance hosts [puppet] - 10https://gerrit.wikimedia.org/r/435751 [16:38:48] (03PS1) 10Jcrespo: Revert "mariadb: Depool db1089 for maintenance" [puppet] - 10https://gerrit.wikimedia.org/r/436322 [16:40:38] (03Abandoned) 10Jcrespo: Revert "mariadb: Depool db1089 for maintenance" [puppet] - 10https://gerrit.wikimedia.org/r/436322 (owner: 10Jcrespo) [16:41:06] (03Abandoned) 10Jcrespo: Revert "mariadb: Depool db1089 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436321 (owner: 10Jcrespo) [16:42:32] (03PS1) 10Jcrespo: Revert "mariadb: Depool db1082 for reimage to stretch" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436323 [16:42:42] (03CR) 10jerkins-bot: [V: 04-1] Revert "mariadb: Depool db1082 for reimage to stretch" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436323 (owner: 10Jcrespo) [16:43:34] (03PS2) 
10Jcrespo: Revert "mariadb: Depool db1082 for reimage to stretch" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436323 [16:47:02] (03CR) 10Zhuyifei1999: Print the type of webservice running (031 comment) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/435692 (https://phabricator.wikimedia.org/T158244) (owner: 10Nehajha) [16:50:48] (03PS4) 10Paladox: WIP: Planet: Update libs and some ui tweeks for rawdog [puppet] - 10https://gerrit.wikimedia.org/r/436312 [16:54:53] (03PS5) 10Paladox: WIP: Planet: Update libs and some ui tweeks for rawdog [puppet] - 10https://gerrit.wikimedia.org/r/436312 [17:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: That opportune time is upon us again. Time for a Morning SWAT (Max 6 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180530T1700). [17:00:05] dmaza, Zoranzoki21, Amir1, and davidwbarratt: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [17:00:14] o/ [17:00:23] oops .. i forgot to submit a patch for swat. :) [17:01:10] (03PS6) 10Paladox: WIP: Planet: Update libs and some ui tweeks for rawdog [puppet] - 10https://gerrit.wikimedia.org/r/436312 [17:01:59] here! 
[17:02:10] * subbu submitted a patch [17:02:40] (03PS7) 10Paladox: WIP: Planet: Update libs and some ui tweeks for rawdog [puppet] - 10https://gerrit.wikimedia.org/r/436312 [17:04:49] (03PS8) 10Paladox: WIP: Planet: Update libs and some ui tweeks for rawdog [puppet] - 10https://gerrit.wikimedia.org/r/436312 [17:05:23] (03PS9) 10Paladox: Planet: Update libs and some ui tweeks for rawdog [puppet] - 10https://gerrit.wikimedia.org/r/436312 [17:09:08] (03CR) 10Vgutierrez: Implement BGP FSM event 14 (031 comment) [debs/pybal] - 10https://gerrit.wikimedia.org/r/436298 (owner: 10Mark Bergsma) [17:10:20] who's swatting [17:10:32] (03CR) 10Alex Monk: Allow PuppetDB use on standalone puppetmasters (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/435631 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk) [17:10:51] (03PS2) 10Alex Monk: Allow PuppetDB use on standalone puppetmasters [puppet] - 10https://gerrit.wikimedia.org/r/435631 (https://phabricator.wikimedia.org/T194962) [17:11:26] (03CR) 10jerkins-bot: [V: 04-1] Allow PuppetDB use on standalone puppetmasters [puppet] - 10https://gerrit.wikimedia.org/r/435631 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk) [17:14:45] 10Operations, 10Wikimedia-Mailing-lists: Mailman issues a "403 Forbidden" error when subscribing to a list - https://phabricator.wikimedia.org/T195750#4243456 (10Sylvain_WMFr) One coworker tried to the glam list and had the same issue (Firefox 60 too). Another coworker spoke about the issue to @Sadads who says... [17:20:14] anyone swatting? 
:) [17:20:29] I do it if no one is around [17:20:32] let's do it [17:20:36] yay [17:20:58] (03CR) 10Vgutierrez: Implement BGP FSM event 14 (031 comment) [debs/pybal] - 10https://gerrit.wikimedia.org/r/436298 (owner: 10Mark Bergsma) [17:21:15] nice monologue on that CR /o\ [17:21:56] (03PS2) 10Ladsgroup: Enable $wgCookieSetOnIpBlock on test wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436193 (https://phabricator.wikimedia.org/T195930) (owner: 10Dmaza) [17:23:16] (03PS1) 10Bstorm: wiki replicas: refactor some python and systemd stuff for maintain-dbusers [puppet] - 10https://gerrit.wikimedia.org/r/436328 (https://phabricator.wikimedia.org/T188681) [17:23:42] dmaza: Is your patch testable on mwdebug1002? [17:23:52] (03CR) 10Ladsgroup: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436193 (https://phabricator.wikimedia.org/T195930) (owner: 10Dmaza) [17:24:16] sort of I guess [17:24:18] I can try [17:25:01] let me know when it's ready [17:25:14] (03Merged) 10jenkins-bot: Enable $wgCookieSetOnIpBlock on test wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436193 (https://phabricator.wikimedia.org/T195930) (owner: 10Dmaza) [17:25:52] dmaza: It's live there, please test [17:25:58] on it. 
thanks [17:27:52] (03PS1) 10Jforrester: Drop the UnicodeConverter extension from production, part 1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436331 (https://phabricator.wikimedia.org/T195941) [17:27:54] (03PS1) 10Jforrester: Drop the UnicodeConverter extension from production, part 2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436332 (https://phabricator.wikimedia.org/T195941) [17:27:56] (03PS1) 10Jforrester: Drop the UnicodeConverter extension from production, part 3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436333 (https://phabricator.wikimedia.org/T195941) [17:27:58] (03PS1) 10Jforrester: Drop the UnicodeConverter extension from production, part 4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436334 (https://phabricator.wikimedia.org/T195941) [17:29:15] (03CR) 10Vgutierrez: Correct incoming connection interaction with BGP FSM (031 comment) [debs/pybal] - 10https://gerrit.wikimedia.org/r/436299 (owner: 10Mark Bergsma) [17:29:27] (03CR) 10jenkins-bot: Enable $wgCookieSetOnIpBlock on test wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436193 (https://phabricator.wikimedia.org/T195930) (owner: 10Dmaza) [17:31:30] 10Operations, 10Wikimedia-Mailing-lists: Mailman issues a "403 Forbidden" error when subscribing to a list - https://phabricator.wikimedia.org/T195750#4243495 (10herron) Hi @Sylvain_WMFr could you share that IP address with me? I can trace logs on the lists server. By email would be great (03PS1) 10Rush: WIP: openstack: eqiad1 deployment (neutron in eqiad) [puppet] - 10https://gerrit.wikimedia.org/r/436337 [17:32:47] (03CR) 10jerkins-bot: [V: 04-1] WIP: openstack: eqiad1 deployment (neutron in eqiad) [puppet] - 10https://gerrit.wikimedia.org/r/436337 (owner: 10Rush) [17:34:30] dmaza: ? [17:34:52] I don't see my changes working.. 
trying to figure out why [17:36:55] (03CR) 10Alex Monk: Allow use of PuppetDB in labs for ssh_known_hosts (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/333471 (https://phabricator.wikimedia.org/T72792) (owner: 10Alex Monk) [17:36:58] dmaza: I just double checked and it's live in mwdebug1002. One thing would be that the addon/extension is not enabled or it's going to job tracker [17:37:02] *job queue [17:37:15] which means it'll go to another node [17:37:34] but I highly doubt that, your changes doesn't look like it [17:38:46] I think the problem is that the feature I'm enabling didn't go out in the train [17:39:17] dmaza: the train hasn't been applied on group0 (yet) [17:39:31] https://tools.wmflabs.org/versions/ [17:39:35] It's still on .5 [17:39:56] well that would be why. :) [17:40:01] yup [17:40:12] is it wrong to leave the config on? [17:40:19] or should I re-schedule it? [17:40:27] I think it's safe [17:40:34] it's only testwikis [17:40:39] it is only on test so [17:41:04] Thanks Amir1 [17:41:56] Thank you! This feature is very valuable (my volunteer hats on) [17:42:02] syncing [17:42:18] davidwbarratt: you're next. 
[17:42:35] (03PS4) 10Ladsgroup: Revert "Disable Datetime Selector on Special:Block on all wikis except Meta, MediaWiki," [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436292 (https://phabricator.wikimedia.org/T193785) (owner: 10Dbarratt) [17:42:37] 👍 [17:42:37] !log ladsgroup@tin Synchronized wmf-config/InitialiseSettings.php: Enable $wgCookieSetOnIpBlock on test wiki (T195930) (duration: 01m 03s) [17:42:37] ready [17:42:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:42:42] T195930: Enable set cookie with IP/IP-Range blocks when blocking logged-out users - https://phabricator.wikimedia.org/T195930 [17:42:58] (03CR) 10Ladsgroup: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436292 (https://phabricator.wikimedia.org/T193785) (owner: 10Dbarratt) [17:44:08] (03Merged) 10jenkins-bot: Revert "Disable Datetime Selector on Special:Block on all wikis except Meta, MediaWiki," [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436292 (https://phabricator.wikimedia.org/T193785) (owner: 10Dbarratt) [17:45:03] davidwbarratt: it's live in mwdebug1002 [17:45:10] kk [17:45:56] works as expected on English Wikipedia [17:46:50] noted, syncing [17:47:47] !log ladsgroup@tin Synchronized wmf-config/InitialiseSettings.php: Enable the datetime selector on Sp:Block on all Wikimedia wikis (T193785) (duration: 01m 03s) [17:47:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:47:52] T193785: Revert T192962 thereby enabling the datetime selector on Sp:Block on all Wikimedia wikis - https://phabricator.wikimedia.org/T193785 [17:47:53] (03PS5) 10Alex Monk: Allow use of PuppetDB in labs for ssh_known_hosts [puppet] - 10https://gerrit.wikimedia.org/r/333471 (https://phabricator.wikimedia.org/T72792) [17:47:53] davidwbarratt: it's live everywhere, please test [17:47:55] (03PS1) 10Alex Monk: Delete sshknowngen [puppet] - 10https://gerrit.wikimedia.org/r/436341 [17:48:32] (03PS3) 10Ladsgroup: Enable 
RemexHtml on a bunch of additional wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435829 (https://phabricator.wikimedia.org/T195263) (owner: 10Subramanya Sastry) [17:48:40] Amir1 works as expected. :) [17:48:49] nice! [17:48:52] subbu: you're next [17:49:09] ok. [17:49:37] (03CR) 10jenkins-bot: Revert "Disable Datetime Selector on Special:Block on all wikis except Meta, MediaWiki," [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436292 (https://phabricator.wikimedia.org/T193785) (owner: 10Dbarratt) [17:49:39] (03CR) 10Ladsgroup: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435829 (https://phabricator.wikimedia.org/T195263) (owner: 10Subramanya Sastry) [17:50:44] (03Merged) 10jenkins-bot: Enable RemexHtml on a bunch of additional wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435829 (https://phabricator.wikimedia.org/T195263) (owner: 10Subramanya Sastry) [17:51:23] subbu: it's live in mwdebug1002 [17:51:48] ok. one moment [17:52:40] Amir1, lgtm .. good to go if there are no errors in logs. [17:53:21] it's clean [17:53:24] let's go [17:54:47] alright. ty! 
[17:54:53] !log ladsgroup@tin Synchronized wmf-config/InitialiseSettings.php: Enable RemexHtml on a bunch of additional wikis (T195263) (duration: 01m 02s) [17:54:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:54:57] T195263: Do another round of Tidy replacement on May 30th and June 13th (last deploys before final switch) - https://phabricator.wikimedia.org/T195263 [17:55:06] subbu: It's live ^ [17:55:09] (03CR) 10jenkins-bot: Enable RemexHtml on a bunch of additional wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435829 (https://phabricator.wikimedia.org/T195263) (owner: 10Subramanya Sastry) [17:55:12] I wait for a couple of minutes for logs [17:58:28] 10Operations, 10hardware-requests, 10Patch-For-Review: request to assign wmf6937 (mw1298, former imagescaler) as phab1002 - https://phabricator.wikimedia.org/T195623#4243588 (10RobH) a:03mark @mark: Is this something you would want to approve? If it was a permanent allocation, I know it would be. Since... [17:58:58] Logs are clean, that's great [17:59:01] moving to mine [17:59:10] (03CR) 10Ladsgroup: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/430045 (owner: 10Matěj Suchánek) [18:00:05] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180530T1800) [18:00:19] oops [18:00:31] (03Merged) 10jenkins-bot: Update Wikidata wgPropertySuggesterDeprecatedIds [mediawiki-config] - 10https://gerrit.wikimedia.org/r/430045 (owner: 10Matěj Suchánek) [18:00:47] thcipriani: Is it okay if I finish this deployment? or I should revert it? [18:01:45] (03CR) 10jenkins-bot: Update Wikidata wgPropertySuggesterDeprecatedIds [mediawiki-config] - 10https://gerrit.wikimedia.org/r/430045 (owner: 10Matěj Suchánek) [18:01:57] Amir1: please finish, thank you for handling swat :) [18:02:16] Thanks! 
[18:02:48] (03CR) 10Mark Bergsma: Implement BGP FSM event 14 (032 comments) [debs/pybal] - 10https://gerrit.wikimedia.org/r/436298 (owner: 10Mark Bergsma) [18:04:24] !log ladsgroup@tin Synchronized wmf-config/Wikibase-production.php: Update Wikidata wgPropertySuggesterDeprecatedIds (duration: 01m 01s) [18:04:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:05:35] Logs are fine! [18:05:42] !log Morning SWAT is done [18:05:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:06:01] 10Operations, 10hardware-requests, 10Patch-For-Review: request to assign wmf6937 (mw1298, former imagescaler) as phab1002 - https://phabricator.wikimedia.org/T195623#4243640 (10mark) >>! In T195623#4243588, @RobH wrote: > @mark: Is this something you would want to approve? If it was a permanent allocation,... [18:06:06] 10Operations, 10Wikimedia-Mailing-lists: Mailman issues a "403 Forbidden" error when subscribing to a list - https://phabricator.wikimedia.org/T195750#4243641 (10herron) IP received by email. Thanks! Unfortunately it appears this address is listed on a few spam blocklists. Details about which lists and link... 
[18:06:25] (03PS5) 10Mark Bergsma: Clarify interface of buildProtocol and setEnabledAddressFamilies [debs/pybal] - 10https://gerrit.wikimedia.org/r/433736 [18:06:27] (03PS4) 10Mark Bergsma: Fix BGP collision detection [debs/pybal] - 10https://gerrit.wikimedia.org/r/434161 [18:06:29] (03PS5) 10Mark Bergsma: Add tests that emulate client or server sessions initial connection [debs/pybal] - 10https://gerrit.wikimedia.org/r/434162 [18:06:31] (03PS4) 10Mark Bergsma: Move FSM connect state handling to the FSM itself [debs/pybal] - 10https://gerrit.wikimedia.org/r/434163 [18:06:33] (03PS2) 10Mark Bergsma: Implement BGP FSM events 4 and 5 (passive start) [debs/pybal] - 10https://gerrit.wikimedia.org/r/436297 [18:06:35] (03PS2) 10Mark Bergsma: Implement BGP FSM event 14 [debs/pybal] - 10https://gerrit.wikimedia.org/r/436298 [18:06:37] (03PS2) 10Mark Bergsma: Correct incoming connection interaction with BGP FSM [debs/pybal] - 10https://gerrit.wikimedia.org/r/436299 [18:07:16] (03CR) 10jerkins-bot: [V: 04-1] Clarify interface of buildProtocol and setEnabledAddressFamilies [debs/pybal] - 10https://gerrit.wikimedia.org/r/433736 (owner: 10Mark Bergsma) [18:09:43] 10Operations, 10Wikimedia-Mailing-lists: Mailman issues a "403 Forbidden" error when subscribing to a list - https://phabricator.wikimedia.org/T195750#4243663 (10herron) p:05Triage>03Normal [18:16:48] (03PS10) 10Paladox: Planet: Update libs and some ui tweeks for rawdog [puppet] - 10https://gerrit.wikimedia.org/r/436312 [18:17:08] (03PS11) 10Paladox: Planet: Update libs and some ui tweeks for rawdog [puppet] - 10https://gerrit.wikimedia.org/r/436312 [18:20:50] marostegui: FYI, https://phabricator.wikimedia.org/T192926#4243687 [18:22:25] and https://phabricator.wikimedia.org/T191316#4243745 [18:27:12] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1970 bytes in 0.082 second response time [18:27:37] anomie: I just replied 
[18:30:58] (03PS4) 10Herron: role::mail::mx: Permit changing certificate [puppet] - 10https://gerrit.wikimedia.org/r/435814 (owner: 10Alex Monk) [18:31:00] anomie: Maybe a mariadb bug? [18:31:18] marostegui: It looks like you need STRICT_ALL_TABLES or STRICT_TRANS_TABLES in the sql_mode for INPLACE to work when adding NOT NULL. [18:32:46] anomie: Ah right, we do not have that one enabled [18:33:08] marostegui: Can it be set for the session without setting it for other sessions, or does that break stuff? [18:33:52] (03PS2) 10Zoranzoki21: Add sites to the wgCopyUploadsDomains whitelist of Wikimedia Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436211 (https://phabricator.wikimedia.org/T195270) [18:34:51] anomie: I just tested and it can be enabled on a session, but I am not sure what the implications can be. I will need to test it further [18:35:09] 10Operations, 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team: Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561#4243815 (10Dzahn) deployment-deploy1001 has been deleted by thcipriani. deployment-deploy-01 has been created with x-large flavor for mor... [18:35:11] I will comment on the task [18:36:45] anomie: It will "only" need to be enabled on the masters, as we are depooling the slaves anyways [18:38:47] marostegui: I was mostly concerned that you said there were data type changes going on when there shouldn't have been. Now that we have that figured out, I'm not concerned anymore. ;) [18:39:11] anomie: Yeah, sorry I didn't express myself correctly :) [18:39:52] No problem. 
I'd much rather look and find no issue than not look and later find something like T187089 (: [18:39:53] T187089: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089 [18:40:14] * anomie grumbles about MySQL/MariaDB not warning us about that one [18:40:21] anomie: Yeah, that was a bad one :( [18:41:02] Also, that's why I like not having deadlines for getting stuff released ;) [18:41:35] anomie: Yeah, we were counting with the dc failover as well to get it 100% done, but it was postponed :( [18:43:42] (03CR) 10Herron: [C: 032] "Thanks for refactoring!" [puppet] - 10https://gerrit.wikimedia.org/r/435814 (owner: 10Alex Monk) [18:43:49] (03PS5) 10Herron: role::mail::mx: Permit changing certificate [puppet] - 10https://gerrit.wikimedia.org/r/435814 (owner: 10Alex Monk) [18:44:43] (03PS1) 10Thcipriani: Group0 to 1.32.0-wmf.6 ref T191052 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436352 [18:45:11] (03PS1) 10Bstorm: wiki replicas: maintain-dbusers to skip offline labsdb servers [puppet] - 10https://gerrit.wikimedia.org/r/436353 (https://phabricator.wikimedia.org/T188681) [18:45:29] (03CR) 10Gehel: [C: 032] maps: upgrade to cassandra 2.2.6-wmf4 [puppet] - 10https://gerrit.wikimedia.org/r/436308 (https://phabricator.wikimedia.org/T195741) (owner: 10Gehel) [18:45:35] (03PS2) 10Thcipriani: Group0 to 1.32.0-wmf.6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436352 (https://phabricator.wikimedia.org/T191052) [18:45:37] (03PS3) 10Gehel: maps: upgrade to cassandra 2.2.6-wmf4 [puppet] - 10https://gerrit.wikimedia.org/r/436308 (https://phabricator.wikimedia.org/T195741) [18:47:35] (03CR) 10Thcipriani: [C: 032] Group0 to 1.32.0-wmf.6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436352 (https://phabricator.wikimedia.org/T191052) (owner: 10Thcipriani) [18:48:52] RECOVERY - cassandra service on maps-test2004 is OK: OK - cassandra is active [18:49:06] (03Merged) 10jenkins-bot: Group0 to 1.32.0-wmf.6 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/436352 (https://phabricator.wikimedia.org/T191052) (owner: 10Thcipriani) [18:49:21] RECOVERY - Check systemd state on maps-test2004 is OK: OK - running: The system is fully operational [18:49:48] (03CR) 10jenkins-bot: Group0 to 1.32.0-wmf.6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436352 (https://phabricator.wikimedia.org/T191052) (owner: 10Thcipriani) [18:51:40] !log thcipriani@tin Started scap: testwiki to php-1.32.0-wmf.6 and rebuild l10n cache [18:51:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:52:32] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1974 bytes in 0.073 second response time [18:53:20] 10Operations, 10Maps-Sprint, 10Patch-For-Review: reimage maps-test2004 to stretch and cassandra 2.2 - https://phabricator.wikimedia.org/T195741#4243867 (10Gehel) cassandra and cassandra-tools-wmf are manually installed from https://people.wikimedia.org/~eevans/debian/ on maps-test2004, after testing I'll upl... [18:53:41] 10Operations, 10hardware-requests, 10Patch-For-Review: request to assign wmf6937 (mw1298, former imagescaler) as phab1002 - https://phabricator.wikimedia.org/T195623#4243868 (10Dzahn) It has been suggested by Moritz because the hardware specs are quite similar to the existing phab server. ("The specs are rou... [18:57:24] 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#4243879 (10Dzahn) The new planned window for this migration is the upcoming Friday, June 1st. (with thcipriani hoping he gets t... [19:00:04] thcipriani: It is that lovely time of the day again! You are hereby commanded to deploy MediaWiki train. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180530T1900). 
[19:00:41] * thcipriani continues [19:01:25] (03PS6) 10Andrew Bogott: wikitech: remove OpenStackManager private settings [puppet] - 10https://gerrit.wikimedia.org/r/432703 (https://phabricator.wikimedia.org/T161553) [19:01:26] (03PS1) 10Andrew Bogott: Designate policy.json: Rename 'domain' to 'zone' in Ocata [puppet] - 10https://gerrit.wikimedia.org/r/436354 (https://phabricator.wikimedia.org/T195059) [19:02:10] 10Operations, 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team: Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561#4243899 (10Dzahn) applied the "role(deployment_server)" on it via instance puppet. (like in prod, no other roles yet that would differ fr... [19:03:41] 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#4243902 (10Dzahn) deployment-prep now also has a new instance using stretch with more disk space to match this (T192561#4243810) [19:04:40] (03CR) 10Framawiki: [C: 031] Set wgProofreadPagePageSeparator to '' for jawikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436168 (https://phabricator.wikimedia.org/T195873) (owner: 10Urbanecm) [19:04:56] (03CR) 10Framawiki: [C: 031] Set wgProofreadPagePageSeparator='' on zhwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436169 (https://phabricator.wikimedia.org/T194875) (owner: 10Urbanecm) [19:08:04] 10Operations, 10ops-codfw, 10Cloud-VPS: move/setup/install labtestnet2002(WMF6469) - https://phabricator.wikimedia.org/T196000#4243908 (10RobH) p:05Triage>03Normal [19:08:51] 10Operations, 10ops-codfw, 10Cloud-VPS: move/setup/install labtestnet2002(WMF6469) - https://phabricator.wikimedia.org/T196000#4243924 (10RobH) a:05Papaul>03RobH Hold off on this until I get @mark approval for this allocation, I meant to keep it assigned to me for now. 
[19:13:40] 10Operations, 10WMF-Blog-Social-Team, 10Wikimedia-Mailing-lists: Request mailman list for upcoming affiliate campaign - https://phabricator.wikimedia.org/T196003#4243975 (10MelodyKramer) [19:14:08] (03PS12) 10Dzahn: Planet: Update libs and some ui tweeks for rawdog [puppet] - 10https://gerrit.wikimedia.org/r/436312 (owner: 10Paladox) [19:14:34] (03CR) 10Andrew Bogott: [C: 032] Designate policy.json: Rename 'domain' to 'zone' in Ocata [puppet] - 10https://gerrit.wikimedia.org/r/436354 (https://phabricator.wikimedia.org/T195059) (owner: 10Andrew Bogott) [19:19:44] (03CR) 10Andrew Bogott: "@Jcrespo, I'm still blocked for lack of this patch. Any thoughts?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429841 (https://phabricator.wikimedia.org/T192339) (owner: 10Andrew Bogott) [19:31:23] !log pnorman@tin Started deploy [tilerator/deploy@bc35971] (cleartables): Use parameterized dbname on test2004 [19:31:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:31:34] (03CR) 10Dzahn: Planet: Update libs and some ui tweeks for rawdog (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/436312 (owner: 10Paladox) [19:31:36] !log pnorman@tin Finished deploy [tilerator/deploy@bc35971] (cleartables): Use parameterized dbname on test2004 (duration: 00m 13s) [19:31:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:33:15] !log pnorman@tin Started deploy [tilerator/deploy@bc35971] (cleartables): Use parameterized dbname on test2004 [19:33:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:35:22] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1962 bytes in 0.069 second response time [19:36:31] PROBLEM - tilerator on maps-test2002 is CRITICAL: connect to address 10.192.0.129 and port 6534: Connection refused [19:36:41] PROBLEM - tileratorui on maps-test2001 is CRITICAL: connect 
to address 10.192.0.128 and port 6535: Connection refused [19:36:42] PROBLEM - tileratorui on maps-test2002 is CRITICAL: connect to address 10.192.0.129 and port 6535: Connection refused [19:36:43] PROBLEM - tilerator on maps-test2001 is CRITICAL: connect to address 10.192.0.128 and port 6534: Connection refused [19:37:53] !log pnorman@tin Started deploy [tilerator/deploy@9e40702]: Restore test2001 test2002 [19:37:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:38:07] ACKNOWLEDGEMENT - tilerator on maps-test2001 is CRITICAL: connect to address 10.192.0.128 and port 6534: Connection refused Gehel wrong config deployed on test servers [19:38:07] ACKNOWLEDGEMENT - tileratorui on maps-test2001 is CRITICAL: connect to address 10.192.0.128 and port 6535: Connection refused Gehel wrong config deployed on test servers [19:38:07] ACKNOWLEDGEMENT - tilerator on maps-test2002 is CRITICAL: connect to address 10.192.0.129 and port 6534: Connection refused Gehel wrong config deployed on test servers [19:38:07] ACKNOWLEDGEMENT - tileratorui on maps-test2002 is CRITICAL: connect to address 10.192.0.129 and port 6535: Connection refused Gehel wrong config deployed on test servers [19:38:20] !log pnorman@tin Finished deploy [tilerator/deploy@9e40702]: Restore test2001 test2002 (duration: 00m 27s) [19:38:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:38:41] RECOVERY - tilerator on maps-test2002 is OK: HTTP OK: HTTP/1.1 200 OK - 305 bytes in 0.089 second response time [19:38:51] RECOVERY - tileratorui on maps-test2002 is OK: HTTP OK: HTTP/1.1 200 OK - 305 bytes in 0.095 second response time [19:39:21] PROBLEM - pdfrender on scb1001 is CRITICAL: connect to address 10.64.0.16 and port 5252: Connection refused [19:41:55] (03CR) 10Dzahn: Planet: Update libs and some ui tweeks for rawdog (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/436312 (owner: 10Paladox) [19:43:28] (03CR) 10Paladox: Planet: Update libs 
and some ui tweeks for rawdog (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/436312 (owner: 10Paladox) [19:43:30] (03CR) 10Dzahn: Planet: Update libs and some ui tweeks for rawdog (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/436312 (owner: 10Paladox) [19:49:32] (03PS7) 10Andrew Bogott: wikitech: remove OpenStackManager private settings [puppet] - 10https://gerrit.wikimedia.org/r/432703 (https://phabricator.wikimedia.org/T161553) [19:49:34] (03PS1) 10Andrew Bogott: Designate: remove the 'zone_primary_or_admin' rule. [puppet] - 10https://gerrit.wikimedia.org/r/436357 (https://phabricator.wikimedia.org/T195059) [19:50:34] (03CR) 10Andrew Bogott: [C: 032] Designate: remove the 'zone_primary_or_admin' rule. [puppet] - 10https://gerrit.wikimedia.org/r/436357 (https://phabricator.wikimedia.org/T195059) (owner: 10Andrew Bogott) [19:57:35] 10Operations, 10hardware-requests, 10Patch-For-Review: request to assign wmf6937 (mw1298, former imagescaler) as phab1002 - https://phabricator.wikimedia.org/T195623#4244152 (10RobH) We don't have any spare systems in eqiad with 64GB of RAM. WMF4727 has 32GB of RAM, dual 3GHz/4C and 4*4TB, so more than enou... [20:00:04] cscott, arlolra, subbu, bearND, halfak, and Amir1: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Services – Parsoid / Citoid / Mobileapps / ORES / … . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180530T2000). 
[20:00:06] (03CR) 10Paladox: Planet: Update libs and some ui tweeks for rawdog (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/436312 (owner: 10Paladox) [20:00:56] no parsoid deploy today [20:02:46] 10Operations, 10Cloud-VPS, 10Patch-For-Review: Cannot add or update records under DNS zones in Horizon - https://phabricator.wikimedia.org/T195059#4244162 (10Andrew) 05Open>03Resolved a:03Andrew @Krenair confirms that this is now fixed [20:04:34] (03CR) 10Paladox: Planet: Update libs and some ui tweeks for rawdog (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/436312 (owner: 10Paladox) [20:14:42] RECOVERY - Check Varnish expiry mailbox lag on cp5008 is OK: OK: expiry mailbox lag is 0 [20:14:52] PROBLEM - Improperly owned -0:0- files in /srv/mediawiki-staging on deploy2001 is CRITICAL: Improperly owned (0:0) files in /srv/mediawiki-staging [20:20:03] ACKNOWLEDGEMENT - HP RAID on labvirt1019 is CRITICAL: CRITICAL: Slot 0: no logical drives --- Slot 0: no drives --- Slot 1: OK: 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:1:1, 2I:1:2, 2I:1:3, 2I:1:4, 2I:2:1, 2I:2:2 - Controller: OK - Battery count: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T196014 [20:20:10] 10Operations, 10ops-eqiad: Degraded RAID on labvirt1019 - https://phabricator.wikimedia.org/T196014#4244219 (10ops-monitoring-bot) [20:25:01] RECOVERY - Improperly owned -0:0- files in /srv/mediawiki-staging on deploy2001 is OK: Files ownership is ok. [20:25:10] 10Operations, 10hardware-requests, 10Patch-For-Review: request to assign wmf6937 (mw1298, former imagescaler) as phab1002 - https://phabricator.wikimedia.org/T195623#4244224 (10Dzahn) >>! In T195623#4243640, @mark wrote: > However I dislike the use of a server from the existing MediaWiki appserver cluster fo... 
[20:26:01] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1972 bytes in 0.134 second response time [20:38:01] PROBLEM - High CPU load on API appserver on mw1227 is CRITICAL: CRITICAL - load average: 52.15, 23.98, 16.65 [20:40:11] RECOVERY - High CPU load on API appserver on mw1227 is OK: OK - load average: 22.42, 22.76, 17.19 [20:42:44] 10Operations, 10Mail, 10Patch-For-Review: Upgrade mx1001/mx2001 to stretch - https://phabricator.wikimedia.org/T175361#4244275 (10herron) Crickets! Ok, I'll plan on merging the localhost smtp listener part tomorrow and get to work depooling mx1001 for reimage next week. [20:45:42] !log thcipriani@tin Finished scap: testwiki to php-1.32.0-wmf.6 and rebuild l10n cache (duration: 114m 02s) [20:45:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:47:49] (03PS4) 10Herron: logstash: add tcp tls input for syslogs [puppet] - 10https://gerrit.wikimedia.org/r/431830 (https://phabricator.wikimedia.org/T193766) [20:50:13] (03CR) 10Herron: logstash: add tcp tls input for syslogs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/431830 (https://phabricator.wikimedia.org/T193766) (owner: 10Herron) [20:58:19] !log thcipriani@tin rebuilt and synchronized wikiversions files: group0 to 1.32.0-wmf.6 [20:58:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:59:42] PROBLEM - kartotherian endpoints health on maps-test2004 is CRITICAL: /osm-intl/9/207/163@1.5x.png (default scaled tile) is CRITICAL: Test default scaled tile returned the unexpected status 400 (expecting: 200): /osm-intl/11/828/655.png (get a tile in the middle of the ocean, with overzoom) is CRITICAL: Test get a tile in the middle of the ocean, with overzoom returned the unexpected status 400 (expecting: 200): /img/osm-intl,1 [20:59:42] 5x.png (Small scaled map) is CRITICAL: Test Small scaled map returned the unexpected status 400 (expecting: 200): 
/osm-intl/info.json (tile service info for osm-intl) is CRITICAL: Test tile service info for osm-intl returned the unexpected status 400 (expecting: 200) [20:59:53] 10Operations, 10hardware-requests, 10Patch-For-Review: request to assign wmf6937 (mw1298, former imagescaler) as phab1002 - https://phabricator.wikimedia.org/T195623#4244316 (10RobH) IRC sync update: Ok, the non-mw spare pool system with 32GB for @dzahn's use is WMF4727. I'll create the setup task now. [21:02:36] 10Operations, 10ops-eqiad: setup/install phab1002(WMF4727) - https://phabricator.wikimedia.org/T196019#4244341 (10RobH) p:05Triage>03Normal [21:04:58] 10Operations, 10hardware-requests, 10Patch-For-Review: request to assign wmf6937 (mw1298, former imagescaler) as phab1002 - https://phabricator.wikimedia.org/T195623#4244355 (10RobH) 05Open>03Resolved [21:07:03] (03PS1) 10RobH: phab1002 mgmt dns [dns] - 10https://gerrit.wikimedia.org/r/436406 (https://phabricator.wikimedia.org/T196019) [21:07:36] 10Operations, 10ops-eqiad, 10Patch-For-Review: setup/install phab1002(WMF4727) - https://phabricator.wikimedia.org/T196019#4244385 (10RobH) [21:08:01] (03CR) 10RobH: [C: 032] phab1002 mgmt dns [dns] - 10https://gerrit.wikimedia.org/r/436406 (https://phabricator.wikimedia.org/T196019) (owner: 10RobH) [21:09:08] (03PS13) 10Dzahn: Planet: Update libs and some ui tweeks for rawdog [puppet] - 10https://gerrit.wikimedia.org/r/436312 (owner: 10Paladox) [21:10:16] (03CR) 10Dzahn: [C: 032] Planet: Update libs and some ui tweeks for rawdog [puppet] - 10https://gerrit.wikimedia.org/r/436312 (owner: 10Paladox) [21:10:21] thanks! 
[21:12:28] thanks for the theme work [21:12:41] bbiaw [21:16:54] your welcome :) [21:16:55] we can roll it out now that the ui is finished [21:16:57] can be improved later if needed [21:18:51] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1970 bytes in 0.069 second response time [21:19:30] facepalm [21:25:48] 10Puppet, 10Beta-Cluster-Infrastructure: Puppet broken on deployment-mx due to systemd on trusty - https://phabricator.wikimedia.org/T184244#4244425 (10Krenair) With help from @herron on https://gerrit.wikimedia.org/r/#/c/435814/ and Andrew on T195059 (and a weird labsaliaser problem) I've managed to get the p... [21:30:04] 10Operations, 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team: Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561#4244432 (10EddieGP) Woohoo! Thanks Daniel and Tyler! :) [21:35:19] 10Operations, 10ops-eqiad, 10Cloud-Services, 10Patch-For-Review: Connect or troubleshoot eth1 on labvirt1019 and labvirt1020 - https://phabricator.wikimedia.org/T194964#4244447 (10Bstorm) @Cmjohnson I currently see labvirt1020 with both ports live, but labvirt1019 shows NO-CARRIER no matter what I do from... 
[21:40:42] (03PS1) 10Thcipriani: Group1 to 1.32.0-wmf.6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436409 (https://phabricator.wikimedia.org/T191052) [21:43:14] (03CR) 10Thcipriani: [C: 032] Group1 to 1.32.0-wmf.6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436409 (https://phabricator.wikimedia.org/T191052) (owner: 10Thcipriani) [21:44:50] (03Merged) 10jenkins-bot: Group1 to 1.32.0-wmf.6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436409 (https://phabricator.wikimedia.org/T191052) (owner: 10Thcipriani) [21:47:07] !log thcipriani@tin rebuilt and synchronized wikiversions files: Group1 to 1.32.0-wmf.6 [21:47:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:47:24] (03CR) 10Volans: "The script looks ok, few minor comments inline." (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/436240 (https://phabricator.wikimedia.org/T192771) (owner: 10Giuseppe Lavagetto) [21:49:28] (03CR) 10jenkins-bot: Group1 to 1.32.0-wmf.6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436409 (https://phabricator.wikimedia.org/T191052) (owner: 10Thcipriani) [21:59:31] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1977 bytes in 0.166 second response time [22:07:55] 10Operations, 10ops-eqiad, 10Patch-For-Review: setup/install phab1002(WMF4727) - https://phabricator.wikimedia.org/T196019#4244528 (10Dzahn) a:03Dzahn [22:08:37] thcipriani: I added two train blockers [22:09:37] legoktm: rollback group1-type blockers? or don't move to group2-type? [22:09:41] * thcipriani looks [22:10:02] thcipriani: the globalprefs one is rollback worthy I think, but I'll have a patch in a second [22:10:40] ok [22:10:43] thank you! 
[22:12:22] thcipriani: (PS1) Legoktm: Don't type hint PreferencesFormPreSave hook against PreferencesForm [extensions/GlobalPreferences] - https://gerrit.wikimedia.org/r/436411 (https://phabricator.wikimedia.org/T196023) [22:12:51] aaand looks like MatmaRex is working on the timeless one :) [22:13:21] legoktm: already done :) [22:14:06] <3 [22:14:12] +2'd [22:14:26] (03CR) 10BryanDavis: [C: 031] wiki replicas: maintain-dbusers to skip offline labsdb servers [puppet] - 10https://gerrit.wikimedia.org/r/436353 (https://phabricator.wikimedia.org/T188681) (owner: 10Bstorm) [22:15:48] legoktm: merging your patch to wmf.6, will be staged momentarily [22:16:36] 10Operations, 10ops-eqiad: setup/install phab1002(WMF4727) - https://phabricator.wikimedia.org/T196019#4244552 (10RobH) [22:18:01] legoktm: pulled your patch over to mwdebug1002 [22:19:25] thcipriani: tested, works [22:19:40] awesome, going live [22:19:57] 10Operations, 10hardware-requests, 10Patch-For-Review: request to assign wmf6937 (mw1298, former imagescaler) (now: wmf4727) as phab1002 - https://phabricator.wikimedia.org/T195623#4244561 (10Dzahn) [22:21:04] (03PS3) 10Dzahn: assign wmf4727 as phab1002 [dns] - 10https://gerrit.wikimedia.org/r/435211 (https://phabricator.wikimedia.org/T190568) [22:22:24] !log thcipriani@tin Synchronized php-1.32.0-wmf.6/extensions/GlobalPreferences/includes/Hooks.php: SWAT: [[gerrit:436413|Do not type hint PreferencesFormPreSave hook against PreferencesForm]] T196023 (duration: 01m 22s) [22:22:31] ^ legoktm live now [22:22:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:22:32] T196023: Attempt to change skins throwing error on MediaWiki.org - https://phabricator.wikimedia.org/T196023 [22:22:50] (03CR) 10RobH: [C: 04-1] "the patch shouldn't include renaming mw1297, its not being touched by this change any longer." 
[dns] - 10https://gerrit.wikimedia.org/r/435211 (https://phabricator.wikimedia.org/T190568) (owner: 10Dzahn) [22:23:32] (03CR) 10BryanDavis: [C: 031] wiki replicas: refactor some python and systemd stuff for maintain-dbusers [puppet] - 10https://gerrit.wikimedia.org/r/436328 (https://phabricator.wikimedia.org/T188681) (owner: 10Bstorm) [22:30:31] (03CR) 10Dzahn: "it doesn't. i wanted to amend to this change to adjust it to the new situation after my original request was rejected." [dns] - 10https://gerrit.wikimedia.org/r/435211 (https://phabricator.wikimedia.org/T190568) (owner: 10Dzahn) [22:30:36] (03Abandoned) 10Dzahn: assign wmf4727 as phab1002 [dns] - 10https://gerrit.wikimedia.org/r/435211 (https://phabricator.wikimedia.org/T190568) (owner: 10Dzahn) [22:30:45] (03CR) 10BryanDavis: [C: 031] Remove warning about using labs-vagrant on stretch [puppet] - 10https://gerrit.wikimedia.org/r/436277 (https://phabricator.wikimedia.org/T180377) (owner: 10Gergő Tisza) [22:31:53] (03PS2) 10Andrew Bogott: Remove warning about using labs-vagrant on stretch [puppet] - 10https://gerrit.wikimedia.org/r/436277 (https://phabricator.wikimedia.org/T180377) (owner: 10Gergő Tisza) [22:32:40] (03CR) 10Andrew Bogott: [C: 032] Remove warning about using labs-vagrant on stretch [puppet] - 10https://gerrit.wikimedia.org/r/436277 (https://phabricator.wikimedia.org/T180377) (owner: 10Gergő Tisza) [22:34:12] MatmaRex: your timeless change is on mwdebug1002 if you've got time to look [22:35:36] 10Operations, 10Citoid, 10Code-Stewardship-Reviews, 10VisualEditor, 10Services (watching): zotero translation server: code stewardship request - https://phabricator.wikimedia.org/T187194#4244596 (10Jrbranaa) @danstillman thanks for the additional info regarding your developments. As timing is of the ess... 
[22:36:05] seems to be working for me, I'll go ahead and sync :) [22:36:10] 10Operations, 10Phabricator, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568#4244599 (10Dzahn) Using the mw server has not been approved (T195623). We will have to use another spare machine with... [22:40:42] !log thcipriani@tin Synchronized php-1.32.0-wmf.6/skins/Timeless/includes/TimelessTemplate.php: [[gerrit:436415|Fix condition for "emptyPortlet" class]] T196026 (duration: 01m 21s) [22:40:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:40:49] T196026: Problem With Timeless Skin - https://phabricator.wikimedia.org/T196026 [22:43:41] PROBLEM - puppet last run on labtestweb2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[horizon/deploy] [22:49:33] (03Draft1) 10Paladox: Planet: Redirect atom.xml to rss20.xml [puppet] - 10https://gerrit.wikimedia.org/r/436416 [22:49:34] (03PS2) 10Paladox: Planet: Redirect atom.xml to rss20.xml [puppet] - 10https://gerrit.wikimedia.org/r/436416 [22:49:38] (03PS3) 10Paladox: Planet: Redirect atom.xml to rss20.xml [puppet] - 10https://gerrit.wikimedia.org/r/436416 [22:55:28] (03PS4) 10Paladox: Planet: Redirect atom.xml to rss20.xml [puppet] - 10https://gerrit.wikimedia.org/r/436416 [22:57:14] (03PS5) 10Dzahn: Planet: Redirect atom.xml to rss20.xml [puppet] - 10https://gerrit.wikimedia.org/r/436416 (https://phabricator.wikimedia.org/T180498) (owner: 10Paladox) [22:57:24] (03CR) 10Dzahn: [C: 032] Planet: Redirect atom.xml to rss20.xml [puppet] - 10https://gerrit.wikimedia.org/r/436416 (https://phabricator.wikimedia.org/T180498) (owner: 10Paladox) [22:57:27] :) [23:00:16] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Time to snap out of that daydream and deploy 
Evening SWAT (Max 6 patches). Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180530T2300). [23:00:17] No GERRIT patches in the queue for this window AFAICS. [23:09:01] RECOVERY - puppet last run on labtestweb2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [23:18:00] (PS2) Bstorm: wiki replicas: refactor some python and systemd stuff for maintain-dbusers [puppet] - https://gerrit.wikimedia.org/r/436328 (https://phabricator.wikimedia.org/T188681) [23:19:00] (CR) Bstorm: [C: 2] wiki replicas: refactor some python and systemd stuff for maintain-dbusers [puppet] - https://gerrit.wikimedia.org/r/436328 (https://phabricator.wikimedia.org/T188681) (owner: Bstorm) [23:23:32] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1004 is CRITICAL: CRITICAL - Expecting active but unit maintain-dbusers is failed [23:24:01] PROBLEM - Check systemd state on labstore1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [23:26:48] (PS1) Bstorm: Revert "wiki replicas: refactor some python and systemd stuff for maintain-dbusers" [puppet] - https://gerrit.wikimedia.org/r/436424 [23:27:12] (CR) Bstorm: "Made a typo. Need to update this patch." 
[puppet] - https://gerrit.wikimedia.org/r/436424 (owner: Bstorm) [23:27:23] (CR) jerkins-bot: [V: -1] Revert "wiki replicas: refactor some python and systemd stuff for maintain-dbusers" [puppet] - https://gerrit.wikimedia.org/r/436424 (owner: Bstorm) [23:29:31] (PS3) Alex Monk: Allow PuppetDB use on standalone puppetmasters [puppet] - https://gerrit.wikimedia.org/r/435631 (https://phabricator.wikimedia.org/T194962) [23:33:02] (PS1) Bstorm: wiki replicas: fix a mistake in the maintain-dbusers refactor [puppet] - https://gerrit.wikimedia.org/r/436425 [23:33:24] (PS2) Bstorm: wiki replicas: fix a mistake in the maintain-dbusers refactor [puppet] - https://gerrit.wikimedia.org/r/436425 [23:34:14] (CR) Bstorm: [C: 2] wiki replicas: fix a mistake in the maintain-dbusers refactor [puppet] - https://gerrit.wikimedia.org/r/436425 (owner: Bstorm) [23:34:50] (PS1) Dzahn: planet/misc-varnish: add codfw backend to director [puppet] - https://gerrit.wikimedia.org/r/436426 (https://phabricator.wikimedia.org/T168490) [23:34:52] (PS1) Dzahn: planet/misc-varnish: comment out eqiad backend for reinstall [puppet] - https://gerrit.wikimedia.org/r/436427 (https://phabricator.wikimedia.org/T168490) [23:35:52] RECOVERY - Check systemd state on labstore1004 is OK: OK - running: The system is fully operational [23:36:01] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1005 is CRITICAL: CRITICAL - Expecting active but unit maintain-dbusers is failed [23:36:02] PROBLEM - Check systemd state on labstore1005 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[23:36:41] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1004 is OK: OK - maintain-dbusers is active [23:38:21] (CR) Bstorm: "Fixed with another commit instead" [puppet] - https://gerrit.wikimedia.org/r/436424 (owner: Bstorm) [23:38:27] (Abandoned) Bstorm: Revert "wiki replicas: refactor some python and systemd stuff for maintain-dbusers" [puppet] - https://gerrit.wikimedia.org/r/436424 (owner: Bstorm) [23:38:41] (PS2) Bstorm: wiki replicas: maintain-dbusers to skip offline labsdb servers [puppet] - https://gerrit.wikimedia.org/r/436353 (https://phabricator.wikimedia.org/T188681) [23:39:44] (CR) Paladox: [C: 1] planet/misc-varnish: add codfw backend to director [puppet] - https://gerrit.wikimedia.org/r/436426 (https://phabricator.wikimedia.org/T168490) (owner: Dzahn) [23:39:53] (CR) Paladox: [C: 1] planet/misc-varnish: comment out eqiad backend for reinstall [puppet] - https://gerrit.wikimedia.org/r/436427 (https://phabricator.wikimedia.org/T168490) (owner: Dzahn) [23:40:41] (CR) Dzahn: [C: 2] planet/misc-varnish: add codfw backend to director [puppet] - https://gerrit.wikimedia.org/r/436426 (https://phabricator.wikimedia.org/T168490) (owner: Dzahn) [23:40:58] (PS2) Dzahn: planet/misc-varnish: add codfw backend to director [puppet] - https://gerrit.wikimedia.org/r/436426 (https://phabricator.wikimedia.org/T168490) [23:42:34] (PS2) Dzahn: planet/misc-varnish: comment out eqiad backend for reinstall [puppet] - https://gerrit.wikimedia.org/r/436427 (https://phabricator.wikimedia.org/T168490) [23:45:12] Operations, ops-ulsfo, Traffic, netops: troubleshoot cr3/cr4 link - https://phabricator.wikimedia.org/T196030#4244733 (RobH) Ok, so no good news: I went ahead and did the following things, in the following order. Each step was followed by checking the optics diagnostics to see the send/rcv powe... 
[23:47:17] (CR) Dzahn: [C: 2] planet/misc-varnish: comment out eqiad backend for reinstall [puppet] - https://gerrit.wikimedia.org/r/436427 (https://phabricator.wikimedia.org/T168490) (owner: Dzahn) [23:47:54] (CR) Dzahn: [C: 2] "i ran puppet on all cache::misc between this and the previous change that added codfw." [puppet] - https://gerrit.wikimedia.org/r/436427 (https://phabricator.wikimedia.org/T168490) (owner: Dzahn) [23:57:25] (Draft1) Paladox: Planet: Fix path to libs [puppet] - https://gerrit.wikimedia.org/r/436428 [23:57:28] (PS2) Paladox: Planet: Fix path to libs [puppet] - https://gerrit.wikimedia.org/r/436428 [23:58:02] PROBLEM - Host elastic2018 is DOWN: PING CRITICAL - Packet loss = 100%