[00:02:59] PROBLEM - SSH on stat1005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:06:08] RECOVERY - SSH on stat1005 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u3 (protocol 2.0) [00:10:28] PROBLEM - Check the NTP synchronisation status of timesyncd on stat1005 is CRITICAL: Return code of 255 is out of bounds [00:25:48] PROBLEM - IPMI Sensor Status on stat1005 is CRITICAL: Return code of 255 is out of bounds [00:30:29] RECOVERY - dhclient process on stat1005 is OK: PROCS OK: 0 processes with command name dhclient [00:30:29] RECOVERY - DPKG on stat1005 is OK: All packages OK [00:30:29] RECOVERY - Check systemd state on stat1005 is OK: OK - running: The system is fully operational [00:31:08] RECOVERY - configured eth on stat1005 is OK: OK - interfaces up [00:31:09] RECOVERY - MD RAID on stat1005 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0 [00:35:08] RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [00:40:29] RECOVERY - Check the NTP synchronisation status of timesyncd on stat1005 is OK: OK: synced at Mon 2018-08-27 00:40:25 UTC. 
[00:55:48] RECOVERY - IPMI Sensor Status on stat1005 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK [01:24:33] (03CR) 10Krinkle: [C: 031] [Wikimania] Create year namespaces for each Wikimania, 2005–2019 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455049 (https://phabricator.wikimedia.org/T202683) (owner: 10Jforrester) [02:03:59] PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on einsteinium is CRITICAL: cluster=cache_text site=esams https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [02:04:48] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [02:07:18] RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on einsteinium is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [02:15:19] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [02:36:45] !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.18) (duration: 15m 13s) [02:36:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:46:57] !log l10nupdate@deploy1001 ResourceLoader cache refresh completed at Mon Aug 27 02:46:56 UTC 2018 (duration 10m 12s) [02:47:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:28:59] PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on einsteinium is CRITICAL: cluster=cache_text site=esams https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [03:29:39] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 841.05 seconds [03:32:18] RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [04:00:59] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 272.84 seconds [04:47:25] (03CR) 10Andrew Bogott: [C: 04-1] "One minor concern, inline. Otherwise, this looks right to me. 
I tested it with the puppet compiler and the diff there seems correct as w" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/454774 (https://phabricator.wikimedia.org/T202549) (owner: 10Arturo Borrero Gonzalez) [05:04:58] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [05:09:08] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [05:14:37] !log Deploy schema change on db1066 (s2 primary master) [05:14:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:15:29] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [05:16:57] !log reimaging wtp2011-wtp2013 to stretch [05:17:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:17:48] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [05:27:51] (03PS1) 10Marostegui: db-eqiad.php: Depool db1099:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455488 [05:29:23] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1099:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455488 (owner: 10Marostegui) [05:30:44] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1099:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455488 (owner: 10Marostegui) [05:31:35] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1099:3318 [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/455488 (owner: 10Marostegui) [05:31:48] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1099:3318 (duration: 00m 50s) [05:31:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:46:12] (03PS4) 10Marostegui: production-m5.sql.erb: Add GRANTS for nova user [puppet] - 10https://gerrit.wikimedia.org/r/455114 (https://phabricator.wikimedia.org/T202549) [05:46:54] (03CR) 10Marostegui: "I have given it 100 connections for now to keep this moving - I assume not 100 will be used straightaway and we can discuss this further a" [puppet] - 10https://gerrit.wikimedia.org/r/455114 (https://phabricator.wikimedia.org/T202549) (owner: 10Marostegui) [05:53:59] 10Operations, 10Analytics, 10Analytics-Kanban, 10netops, 10Patch-For-Review: Review analytics-in4/6 rules on cr1/cr2 eqiad - https://phabricator.wikimedia.org/T198623 (10elukey) From my tcpdumps it seems that no more https calls are made via ipv6 without going through the proxy. @ayounsi, we can proceed... [06:01:13] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10Patch-For-Review, 10cloud-services-team (Kanban): rack/setup/install cloudvirt102[34] - https://phabricator.wikimedia.org/T199125 (10MoritzMuehlenhoff) >>! In T199125#4530463, @RobH wrote: > @MoritzMuehlenhoff: Just to confirm, you'd like us to take a third SSD o... [06:27:55] PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/sbin/disable-puppet] [06:28:44] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1099:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455499 [06:29:04] PROBLEM - puppet last run on mw1307 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/usr/local/bin/puppet-enabled] [06:31:38] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1099:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455499 (owner: 10Marostegui) [06:32:24] PROBLEM - puppet last run on dbproxy1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/lib/nagios/plugins/check_ferm] [06:32:56] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1099:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455499 (owner: 10Marostegui) [06:34:11] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1099:3318 (duration: 00m 49s) [06:34:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:34:19] (03PS1) 10Marostegui: db-eqiad.php: Depool db1101:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455500 [06:35:23] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1099:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455499 (owner: 10Marostegui) [06:35:46] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1101:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455500 (owner: 10Marostegui) [06:37:00] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1101:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455500 (owner: 10Marostegui) [06:38:17] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1101:3318 (duration: 00m 48s) [06:38:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:42:56] RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [06:50:15] !log Deploy schema change on labtestwiki - T187089 [06:50:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:50:24] T187089: Fix WMF schemas to not break when comment store goes WRITE_NEW - 
https://phabricator.wikimedia.org/T187089 [06:50:58] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1101:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455500 (owner: 10Marostegui) [06:52:10] !log installing discover updates from stretch 9.5 point release [06:52:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:53:18] !log Deploy schema change on labtestwiki - T89737 [06:53:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:53:23] T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737 [06:57:35] RECOVERY - puppet last run on dbproxy1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:59:24] RECOVERY - puppet last run on mw1307 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [07:00:28] (03PS1) 10Muehlenhoff: Add library hint for discover [puppet] - 10https://gerrit.wikimedia.org/r/455501 [07:01:51] (03CR) 10Filippo Giunchedi: mtail: Escape the '.' 
in /w/load.php for varnishrls.mtail (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/454724 (https://phabricator.wikimedia.org/T202479) (owner: 10Krinkle) [07:08:07] 10Operations, 10monitoring, 10netops, 10User-fgiunchedi: Update ACLs for newer graphite hosts - https://phabricator.wikimedia.org/T202846 (10fgiunchedi) p:05Triage>03Normal [07:08:54] (03CR) 10Muehlenhoff: [C: 032] Add library hint for discover [puppet] - 10https://gerrit.wikimedia.org/r/455501 (owner: 10Muehlenhoff) [07:10:36] !log reimaging wtp2014-wtp2016 to stretch [07:10:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:22:59] 10Operations, 10User-fgiunchedi: rack/setup/install centrallog1001.eqiad.wmnet - https://phabricator.wikimedia.org/T200706 (10fgiunchedi) Steps for service implementation: [] Include centrallog1001 in router ACLs [] Add centrallog1001 to `remote_syslog(_tls)` destinations so logs start flowing to that host too... [07:23:19] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: db2088 rebooted itself and came back sick - https://phabricator.wikimedia.org/T202822 (10jcrespo) Thanks @Andrew for taking the time, I owe you a drink of your preference next time we meet. 
[07:23:21] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1101:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455502 [07:24:23] PROBLEM - Request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 verb={PATCH,PUT} https://grafana.wikimedia.org/dashboard/db/kubernetes-api [07:24:32] PROBLEM - Request latencies on argon is CRITICAL: instance=10.64.32.133:6443 verb=PATCH https://grafana.wikimedia.org/dashboard/db/kubernetes-api [07:25:13] PROBLEM - Disk space on elastic1025 is CRITICAL: DISK CRITICAL - free space: /srv 49900 MB (10% inode=99%) [07:25:33] PROBLEM - etcd request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 operation={compareAndSwap,get,list} https://grafana.wikimedia.org/dashboard/db/kubernetes-api [07:25:52] PROBLEM - etcd request latencies on argon is CRITICAL: instance=10.64.32.133:6443 operation={compareAndSwap,get,list} https://grafana.wikimedia.org/dashboard/db/kubernetes-api [07:29:34] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1101:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455502 (owner: 10Marostegui) [07:30:51] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1101:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455502 (owner: 10Marostegui) [07:30:59] RECOVERY - Request latencies on argon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [07:30:59] RECOVERY - etcd request latencies on chlorine is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [07:31:10] RECOVERY - etcd request latencies on argon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [07:31:49] 10Operations, 10Discovery-Search (Current work): Onboarding Mathew Onipe - https://phabricator.wikimedia.org/T202708 (10Gehel) [07:31:59] RECOVERY - Request latencies on chlorine is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/kubernetes-api [07:33:55] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1101:3318 (duration: 00m 48s) [07:33:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:36:03] (03PS1) 10Marostegui: db-eqiad.php: Depool db1104 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455503 [07:38:16] (03PS1) 10Smalyshev: Move daily dump load - last one finished at 5:58 [puppet] - 10https://gerrit.wikimedia.org/r/455504 [07:38:34] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1104 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455503 (owner: 10Marostegui) [07:38:51] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1101:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455502 (owner: 10Marostegui) [07:39:55] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1104 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455503 (owner: 10Marostegui) [07:42:37] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1104 (duration: 00m 50s) [07:42:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:44:52] !log force remount of /mnt/hdfs on stat1005 (transport not connected errors) [07:44:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:45:09] RECOVERY - Disk space on stat1005 is OK: DISK OK [07:52:09] RECOVERY - Disk space on elastic1025 is OK: DISK OK [07:53:05] (03CR) 10Filippo Giunchedi: [C: 04-1] ircecho: Support auth over irc (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/405594 (owner: 10Paladox) [07:54:14] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/454613 (https://phabricator.wikimedia.org/T201630) (owner: 10Ayounsi) [07:54:36] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1104 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455503 (owner: 10Marostegui) [07:54:46] (03CR) 10Mobrovac: [C: 04-1] Replace the semver 
patch version in Accept with x (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/455036 (https://phabricator.wikimedia.org/T202682) (owner: 10Ppchelko) [07:56:46] PROBLEM - parsoid on wtp2014 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:59:03] ^reimage, silenced [07:59:27] !log installing nodejs security updates on maps* hosts [07:59:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:05:33] !log restarting kartotherian / tilerator on maps* for nodejs upgrade [08:05:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:05:50] (03CR) 10Mobrovac: [C: 031] "PCC OK - https://puppet-compiler.wmflabs.org/compiler02/12240/" [puppet] - 10https://gerrit.wikimedia.org/r/454574 (owner: 10Ppchelko) [08:07:43] 10Operations, 10Analytics, 10Analytics-Kanban, 10netops, 10Patch-For-Review: Review analytics-in4/6 rules on cr1/cr2 eqiad - https://phabricator.wikimedia.org/T198623 (10elukey) [08:07:46] RECOVERY - parsoid on wtp2014 is OK: HTTP OK: HTTP/1.1 200 OK - 1051 bytes in 0.144 second response time [08:09:07] RECOVERY - Filesystem available is greater than filesystem size on ms-be2042 is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=ms-be2042&var-datasource=codfw%2520prometheus%252Fops [08:09:34] sorry in advance for a bit of gerrit spam ;) [08:09:50] (03PS1) 10Volans: remote: ensure host list is a copy [software/spicerack] - 10https://gerrit.wikimedia.org/r/455506 (https://phabricator.wikimedia.org/T199079) [08:09:52] (03PS1) 10Volans: remote, confctl: raise on no-match [software/spicerack] - 10https://gerrit.wikimedia.org/r/455507 (https://phabricator.wikimedia.org/T199079) [08:09:54] (03PS1) 10Volans: Add mysql module to interact with the core MySQLs [software/spicerack] - 10https://gerrit.wikimedia.org/r/455508 (https://phabricator.wikimedia.org/T199079) [08:09:56] (03PS1) 10Volans: cookbook: add help option to the interactive menu [software/spicerack] - 10https://gerrit.wikimedia.org/r/455509 (https://phabricator.wikimedia.org/T199079) [08:09:58] (03PS1) 10Volans: config: refactor to explicitly pass the file [software/spicerack] - 10https://gerrit.wikimedia.org/r/455510 (https://phabricator.wikimedia.org/T199079) [08:10:00] (03PS1) 10Volans: mediawiki: set timeout for requests [software/spicerack] - 10https://gerrit.wikimedia.org/r/455511 (https://phabricator.wikimedia.org/T199079) [08:10:02] (03PS1) 10Volans: tests: skip tests when fixture is not available [software/spicerack] - 10https://gerrit.wikimedia.org/r/455512 (https://phabricator.wikimedia.org/T199079) [08:10:04] (03PS1) 10Volans: confctl: fix dry-run log message [software/spicerack] - 10https://gerrit.wikimedia.org/r/455513 (https://phabricator.wikimedia.org/T199079) [08:10:06] (03PS1) 10Volans: dnsdisc: add retry decorator to check_ttl() [software/spicerack] - 10https://gerrit.wikimedia.org/r/455514 (https://phabricator.wikimedia.org/T199079) [08:10:08] (03PS1) 10Volans: cookbook: simplify cookbook return value [software/spicerack] - 10https://gerrit.wikimedia.org/r/455515 (https://phabricator.wikimedia.org/T199079) [08:10:50] (03PS1) 10Volans: Initial 
debian packaging [software/spicerack] - 10https://gerrit.wikimedia.org/r/455516 (https://phabricator.wikimedia.org/T199079) [08:11:54] (03CR) 10Gehel: [C: 031] Add README [cookbooks] - 10https://gerrit.wikimedia.org/r/454559 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:12:12] 10Operations: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (10jcrespo) 05stalled>03Open [08:12:20] (03CR) 10Gehel: [C: 031] Initial structure for the cookbooks hierarchy [cookbooks] - 10https://gerrit.wikimedia.org/r/454800 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:13:06] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1104" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455517 [08:13:21] 10Operations: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (10jcrespo) a:03Marostegui [08:14:42] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1104" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455517 (owner: 10Marostegui) [08:16:08] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1104" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455517 (owner: 10Marostegui) [08:17:37] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1104 (duration: 00m 52s) [08:17:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:17:45] (03PS1) 10Marostegui: db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455518 [08:18:51] (03PS4) 10Volans: Add README [cookbooks] - 10https://gerrit.wikimedia.org/r/454559 (https://phabricator.wikimedia.org/T199079) [08:18:53] (03PS3) 10Volans: Initial structure for the cookbooks hierarchy [cookbooks] - 10https://gerrit.wikimedia.org/r/454800 (https://phabricator.wikimedia.org/T199079) [08:19:24] (03PS11) 10Vgutierrez: Certcentral integration tests [software/certcentral] - 10https://gerrit.wikimedia.org/r/454045 (https://phabricator.wikimedia.org/T199711) [08:19:26] (03PS5) 
10Vgutierrez: Deliver certificates in every save mode [software/certcentral] - 10https://gerrit.wikimedia.org/r/454794 (https://phabricator.wikimedia.org/T199711) [08:19:29] (03PS8) 10Vgutierrez: Implement DNS01 challenge support [software/certcentral] - 10https://gerrit.wikimedia.org/r/454845 (https://phabricator.wikimedia.org/T199711) [08:19:31] (03PS2) 10Vgutierrez: Provide support in the API for different certificate save modes [software/certcentral] - 10https://gerrit.wikimedia.org/r/455153 (https://phabricator.wikimedia.org/T199711) [08:19:33] (03PS3) 10Vgutierrez: [WIP] Validate challenges before pushing them to the ACME directory [software/certcentral] - 10https://gerrit.wikimedia.org/r/455159 (https://phabricator.wikimedia.org/T199711) [08:19:40] (03CR) 10Vgutierrez: Certcentral integration tests (034 comments) [software/certcentral] - 10https://gerrit.wikimedia.org/r/454045 (https://phabricator.wikimedia.org/T199711) (owner: 10Vgutierrez) [08:19:42] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455518 (owner: 10Marostegui) [08:20:52] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455518 (owner: 10Marostegui) [08:21:07] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Validate challenges before pushing them to the ACME directory [software/certcentral] - 10https://gerrit.wikimedia.org/r/455159 (https://phabricator.wikimedia.org/T199711) (owner: 10Vgutierrez) [08:22:00] (03CR) 10Gehel: [C: 031] "Comments inline, but minor enough." 
(033 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/455506 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:22:02] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1087 (duration: 00m 48s) [08:22:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:22:14] !log deploy schema change on db1087 with replication (lag on labs:s8 will be generated) [08:22:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:23:29] (03CR) 10Vgutierrez: Refactor certcentral.certificate_management() (031 comment) [software/certcentral] - 10https://gerrit.wikimedia.org/r/451867 (https://phabricator.wikimedia.org/T199711) (owner: 10Vgutierrez) [08:23:39] (03CR) 10Gehel: [C: 031] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/455507 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:25:06] 10Operations, 10Traffic, 10Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on sarin.codfw.wmnet for hosts: ``` ['cp2019.codfw.wmnet', 'cp2017.codfw.wmnet'] ``` The log can be found in `/var/log/w... 
[08:26:39] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1104" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455517 (owner: 10Marostegui) [08:26:41] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455518 (owner: 10Marostegui) [08:27:45] !log reimaging wtp2017-wtp2019 to stretch [08:27:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:27:50] (03PS1) 10Filippo Giunchedi: logstash: add plugin_id to outputs [puppet] - 10https://gerrit.wikimedia.org/r/455520 (https://phabricator.wikimedia.org/T200362) [08:29:11] (03CR) 10Vgutierrez: Implement DNS01 challenge support (031 comment) [software/certcentral] - 10https://gerrit.wikimedia.org/r/454845 (https://phabricator.wikimedia.org/T199711) (owner: 10Vgutierrez) [08:31:07] (03CR) 10Gehel: [C: 04-1] "minor comments inline" (032 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/455508 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:31:20] !log installing ca-certificates updates for jessie/stretch [08:31:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:32:06] (03CR) 10Gehel: [C: 031] cookbook: add help option to the interactive menu [software/spicerack] - 10https://gerrit.wikimedia.org/r/455509 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:32:12] (03PS9) 10Vgutierrez: Implement DNS01 challenge support [software/certcentral] - 10https://gerrit.wikimedia.org/r/454845 (https://phabricator.wikimedia.org/T199711) [08:32:14] (03PS3) 10Vgutierrez: Provide support in the API for different certificate save modes [software/certcentral] - 10https://gerrit.wikimedia.org/r/455153 (https://phabricator.wikimedia.org/T199711) [08:32:16] (03PS4) 10Vgutierrez: [WIP] Validate challenges before pushing them to the ACME directory [software/certcentral] - 10https://gerrit.wikimedia.org/r/455159 (https://phabricator.wikimedia.org/T199711) [08:32:18] (03CR) 
10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler02/12241/" [puppet] - 10https://gerrit.wikimedia.org/r/455520 (https://phabricator.wikimedia.org/T200362) (owner: 10Filippo Giunchedi) [08:32:30] (03CR) 10jerkins-bot: [V: 04-1] Provide support in the API for different certificate save modes [software/certcentral] - 10https://gerrit.wikimedia.org/r/455153 (https://phabricator.wikimedia.org/T199711) (owner: 10Vgutierrez) [08:33:18] (03CR) 10Gehel: [C: 031] config: refactor to explicitly pass the file [software/spicerack] - 10https://gerrit.wikimedia.org/r/455510 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:33:44] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Validate challenges before pushing them to the ACME directory [software/certcentral] - 10https://gerrit.wikimedia.org/r/455159 (https://phabricator.wikimedia.org/T199711) (owner: 10Vgutierrez) [08:35:42] (03CR) 10Gehel: mediawiki: set timeout for requests (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/455511 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:35:56] (03CR) 10Vgutierrez: "recheck" [software/certcentral] - 10https://gerrit.wikimedia.org/r/455153 (https://phabricator.wikimedia.org/T199711) (owner: 10Vgutierrez) [08:37:36] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455521 [08:38:05] (03CR) 10Volans: "Replies inline (no code change yet)" (033 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/455506 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:38:32] (03CR) 10Gehel: [C: 031] "Good enough, minor comment inline." 
(031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/455512 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:38:55] (03CR) 10Gehel: [C: 031] confctl: fix dry-run log message [software/spicerack] - 10https://gerrit.wikimedia.org/r/455513 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:39:15] (03CR) 10Muehlenhoff: [C: 031] "Haven't tried to build it, but looks good." [software/spicerack] - 10https://gerrit.wikimedia.org/r/455516 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:40:19] (03CR) 10Gehel: [C: 031] dnsdisc: add retry decorator to check_ttl() [software/spicerack] - 10https://gerrit.wikimedia.org/r/455514 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:41:05] (03CR) 10Gehel: [C: 031] cookbook: simplify cookbook return value [software/spicerack] - 10https://gerrit.wikimedia.org/r/455515 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:43:45] (03PS3) 10Elukey: piwik: convert apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/453546 (owner: 10Dzahn) [08:44:33] (03CR) 10Elukey: [C: 032] piwik: convert apache to httpd module [puppet] - 10https://gerrit.wikimedia.org/r/453546 (owner: 10Dzahn) [08:46:01] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455521 (owner: 10Marostegui) [08:46:23] (03CR) 10Volans: "reply inline" (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/455511 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:47:18] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455521 (owner: 10Marostegui) [08:47:39] (03PS1) 10Elukey: piwik: update notify apache dependencies [puppet] - 10https://gerrit.wikimedia.org/r/455522 [08:47:46] marostegui: for unrelated reasons I was watching mediawiki-errors on logstash and it looks like dumpJson.php logs like crazy 
"Wikimedia\Rdbms\LoadBalancer::pickReaderIndex: all replica DBs lagged. Switch to read-only mode" after db1087 depool, known? doesn't seem to be user impacting tho as it is only from snapshot1008 [08:48:16] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1087 (duration: 00m 48s) [08:48:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:48:30] (03CR) 10Elukey: [C: 032] piwik: update notify apache dependencies [puppet] - 10https://gerrit.wikimedia.org/r/455522 (owner: 10Elukey) [08:48:32] (03PS1) 10Marostegui: db-eqiad.php: Depool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455523 [08:49:01] (03CR) 10Volans: "reply inline" (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/455512 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:50:15] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455523 (owner: 10Marostegui) [08:50:44] (03CR) 10Volans: [V: 032 C: 032] Add README [cookbooks] - 10https://gerrit.wikimedia.org/r/454559 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:51:05] (03CR) 10Volans: [V: 032 C: 032] Initial structure for the cookbooks hierarchy [cookbooks] - 10https://gerrit.wikimedia.org/r/454800 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:51:32] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455523 (owner: 10Marostegui) [08:52:35] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1109 (duration: 00m 48s) [08:52:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:53:19] 10Operations, 10Traffic, 10Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['cp2019.codfw.wmnet', 'cp2017.codfw.wmnet'] ``` and were **ALL** successful. 
[08:53:29] !log rebooting bast5001 for kernel security update [08:53:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:54:56] (03CR) 10Gehel: [C: 031] remote: ensure host list is a copy (032 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/455506 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:55:43] (03CR) 10Gehel: [C: 031] mediawiki: set timeout for requests (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/455511 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:56:06] (03CR) 10Gehel: [C: 031] tests: skip tests when fixture is not available (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/455512 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [08:56:23] !log Drop eventlogcleaner user from dbstore1002 [08:56:26] elukey: ^ [08:56:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:57:29] ack! [08:57:46] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455521 (owner: 10Marostegui) [08:57:51] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1109 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455523 (owner: 10Marostegui) [08:59:57] !log Drop sul, phadmin and phuser from dbstore1002 [09:00:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:00:04] addshore: It is that lovely time of the day again! You are hereby commanded to deploy Wikidata link formatter config. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180827T0900). [09:00:04] leszek_wmde: A patch you scheduled for Wikidata link formatter config is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [09:00:27] addshore: fashionably late but I am here [09:00:37] hi! 
[09:00:42] * addshore logs in to places [09:01:39] leszek_wmde: so there wasn't some other patch that was also needed right? [09:02:04] What was it that stopped it from working last week on test? [09:02:11] addshore: yeah, but it get in when you deployed config for testwikidatawiki on friday [09:02:26] (03PS2) 10Addshore: Wikidata: Use new item ID formatter for Q1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455389 (https://phabricator.wikimedia.org/T201832) (owner: 10WMDE-leszek) [09:02:36] (03CR) 10Addshore: [C: 032] Wikidata: Use new item ID formatter for Q1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455389 (https://phabricator.wikimedia.org/T201832) (owner: 10WMDE-leszek) [09:02:49] addshore: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/454510/ is the "prerequisite" patch [09:03:17] addshore: I thought it all works. Or do you mean the logging not showing up in logstash confusion? [09:03:25] yes, the logging issue! [09:05:06] addshore: so I think we didn't investigate it really [09:06:54] marostegui: not sure if you saw my earlier message [09:06:55] 09:47 marostegui: for unrelated reasons I was watching mediawiki-errors on logstash and it looks like dumpJson.php logs like crazy "Wikimedia\Rdbms\LoadBalancer::pickReaderIndex: all replica DBs lagged. Switch to read-only mode" after db1087 depool, known? doesn't seem to be user impacting tho as it is only from snapshot1008 [09:09:56] heh leszek_wmde the patch was marked as WIP again! [09:10:03] (03CR) 10Addshore: [C: 032] Wikidata: Use new item ID formatter for Q1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455389 (https://phabricator.wikimedia.org/T201832) (owner: 10WMDE-leszek) [09:10:25] addshore: logging mystery solved. apparently the relevant code was not merged yet, although we though it was :) [09:10:27] addshore: sorrey! [09:10:49] does git-review push everything as default to WIP or something? 
[09:11:04] addshore: I set it do it in my gerrit settings [09:11:12] aaaah [09:11:12] godog: That host isn't delayed and when it was repooled it wasn't delayed either - let me check anyways [09:11:13] addshore: will learn to un-wip patches finally [09:11:20] (03Merged) 10jenkins-bot: Wikidata: Use new item ID formatter for Q1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455389 (https://phabricator.wikimedia.org/T201832) (owner: 10WMDE-leszek) [09:11:52] leszek_wmde: the patch is on mwdebug1002 [09:11:54] please test [09:11:55] godog: Oh, I see it is from 09:47 - I missed that message indeed sorry [09:13:53] marostegui: np, it recovered as soon as db1087 was repooled tho [09:14:06] addshore: looks good! [09:14:27] (03CR) 10jenkins-bot: Wikidata: Use new item ID formatter for Q1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455389 (https://phabricator.wikimedia.org/T201832) (owner: 10WMDE-leszek) [09:14:35] godog: strange... [09:14:52] leszek_wmde: cool, will sync! [09:15:40] syncing [09:16:22] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[gerrit:455389]] Wikidata: Use new item ID formatter for Q1 T201832 (duration: 00m 49s) [09:16:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:16:27] T201832: Use link formatter that uses cache instead of wb_terms for item Q1 - https://phabricator.wikimedia.org/T201832 [09:16:56] !log rebooting bast4002 for kernel security update [09:16:58] leszek_wmde: done [09:16:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:17:08] addshore: checking [09:18:36] 10Operations, 10Traffic, 10Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on sarin.codfw.wmnet for hosts: ``` ['cp3034.esams.wmnet', 'cp2020.codfw.wmnet'] ``` The log can be found in `/var/log/w... 
[09:18:42] addshore: looks we're good [09:18:47] addshore: thank you sir! [09:18:53] :) [09:19:07] leszek_wmde: I guess https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/438245/ can be abandoned? [09:19:39] addshore: I didn't even remember this patch existed :) [09:19:44] (03Abandoned) 10WMDE-leszek: Only enable repo-specific parts of WikibaseLexeme on wikidata wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/438245 (https://phabricator.wikimedia.org/T195615) (owner: 10WMDE-leszek) [09:19:44] :D [09:20:03] !log deploy slot done [09:20:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:20:20] (03PS1) 10Gehel: Introduce specific RemoteHosts subclass for MySql [software/spicerack] - 10https://gerrit.wikimedia.org/r/455527 [09:21:22] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1109" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455528 [09:21:24] (03CR) 10jerkins-bot: [V: 04-1] Introduce specific RemoteHosts subclass for MySql [software/spicerack] - 10https://gerrit.wikimedia.org/r/455527 (owner: 10Gehel) [09:22:52] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1109" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455528 (owner: 10Marostegui) [09:24:08] (03PS2) 10Aleksey Bekh-Ivanov (WMDE): Wikidata: Use new item ID formatter for Q1-Q100 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455390 (https://phabricator.wikimedia.org/T201833) (owner: 10WMDE-leszek) [09:24:11] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1109" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455528 (owner: 10Marostegui) [09:25:11] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1109 (duration: 00m 49s) [09:25:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:25:18] (03PS1) 10Marostegui: db-eqiad.php: Depool db1092 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455529 [09:27:02] (03CR) 10Marostegui: [C: 032] 
db-eqiad.php: Depool db1092 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455529 (owner: 10Marostegui) [09:28:20] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1092 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455529 (owner: 10Marostegui) [09:28:57] PROBLEM - IPsec on cp5004 is CRITICAL: Strongswan CRITICAL - ok: 34 not-conn: cp2020_v4, cp2020_v6 [09:29:28] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1092 (duration: 00m 48s) [09:29:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:31:22] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1109" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455528 (owner: 10Marostegui) [09:31:24] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1092 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455529 (owner: 10Marostegui) [09:35:51] (03PS2) 10Gehel: Introduce specific RemoteHosts subclass for MySql [software/spicerack] - 10https://gerrit.wikimedia.org/r/455527 [09:36:55] (03CR) 10jerkins-bot: [V: 04-1] Introduce specific RemoteHosts subclass for MySql [software/spicerack] - 10https://gerrit.wikimedia.org/r/455527 (owner: 10Gehel) [09:38:28] RECOVERY - IPsec on cp5004 is OK: Strongswan OK - 36 ESP OK [09:43:06] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1092" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455531 [09:47:37] (03CR) 10Filippo Giunchedi: [C: 032] logstash: add plugin_id to outputs [puppet] - 10https://gerrit.wikimedia.org/r/455520 (https://phabricator.wikimedia.org/T200362) (owner: 10Filippo Giunchedi) [09:50:10] (03PS2) 10Filippo Giunchedi: logstash: add plugin_id to outputs [puppet] - 10https://gerrit.wikimedia.org/r/455520 (https://phabricator.wikimedia.org/T200362) [09:50:40] (03CR) 10Filippo Giunchedi: [V: 032 C: 032] logstash: add plugin_id to outputs [puppet] - 10https://gerrit.wikimedia.org/r/455520 (https://phabricator.wikimedia.org/T200362) (owner: 10Filippo Giunchedi) [09:50:46] 10Operations, 
10Traffic, 10Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['cp2020.codfw.wmnet', 'cp3034.esams.wmnet'] ``` and were **ALL** successful. [09:54:34] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1092" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455531 (owner: 10Marostegui) [09:55:54] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1092" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455531 (owner: 10Marostegui) [09:56:59] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1092 (duration: 00m 48s) [09:57:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:06:27] (03PS1) 10Filippo Giunchedi: logstash: fix default plugin_id name for elasticsearch [puppet] - 10https://gerrit.wikimedia.org/r/455533 [10:07:14] (03CR) 10Filippo Giunchedi: [C: 032] logstash: fix default plugin_id name for elasticsearch [puppet] - 10https://gerrit.wikimedia.org/r/455533 (owner: 10Filippo Giunchedi) [10:09:42] !log reimaging wtp2020 to stretch [10:09:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:15:10] !log disable puppet on maps*, scb*, restbase* hosts for merge of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/454574/ [10:15:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:17:48] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1092" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455531 (owner: 10Marostegui) [10:19:27] !log disable puppet on hosts including service::configuration for merge of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/454574/. See https://phabricator.wikimedia.org/P7486 [10:19:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:20:25] (03CR) 10Alexandros Kosiaris: [C: 032] Switch services to MW connection to https. 
[puppet] - 10https://gerrit.wikimedia.org/r/454574 (owner: 10Ppchelko) [10:20:33] (03PS2) 10Alexandros Kosiaris: Switch services to MW connection to https. [puppet] - 10https://gerrit.wikimedia.org/r/454574 (owner: 10Ppchelko) [10:20:36] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Switch services to MW connection to https. [puppet] - 10https://gerrit.wikimedia.org/r/454574 (owner: 10Ppchelko) [10:29:06] !log enable and run puppet on restbase2001, restart restbase [10:29:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:29:19] !log enable and run puppet on scb1001, restart services [10:29:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:30:05] jan_drewniak: #bothumor My software never has bugs. It just develops random features. Rise for Wikimedia Portals Update. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180827T1030). [10:32:04] (03PS1) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455534 (https://phabricator.wikimedia.org/T128546) [10:33:47] (03CR) 10Jdrewniak: [C: 032] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455534 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:34:20] 10Operations, 10Traffic, 10Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on sarin.codfw.wmnet for hosts: ``` ['cp3035.esams.wmnet', 'cp2024.codfw.wmnet'] ``` The log can be found in `/var/log/w... 
[10:35:08] (03Merged) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455534 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:36:31] (03CR) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455534 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:38:09] !log jdrewniak@deploy1001 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:455534|Bumping portals to master (T128546)]] (duration: 00m 49s) [10:38:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:38:14] T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546 [10:38:58] !log jdrewniak@deploy1001 Synchronized portals: Wikimedia Portals Update: [[gerrit:455534|Bumping portals to master (T128546)]] (duration: 00m 48s) [10:39:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:58:07] !log Deploy schema change on db2051 (s4 codfw master) with replication, this will generate lag on s4 codfw [10:58:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:58:40] PROBLEM - Varnish HTTP upload-frontend - port 3120 on cp3035 is CRITICAL: connect to address 10.20.0.170 and port 3120: Connection refused [10:58:40] PROBLEM - Freshness of OCSP Stapling files on cp3035 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [10:58:40] PROBLEM - Varnish traffic logger - varnishreqstats on cp3035 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [11:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: I, the Bot under the Fountain, allow thee, The Deployer, to do European Mid-day SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180827T1100). 
[11:00:04] tgr and Urbanecm: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:09] Here [11:00:16] o/ [11:00:19] o/ [11:00:23] I can SWAT today [11:00:31] tgr: go ahead while I review other patches [11:00:35] ack [11:01:41] (03PS12) 10Gergő Tisza: Remove sitewide and user CSS/JS editing from old groups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421124 (https://phabricator.wikimedia.org/T190015) [11:01:56] (03PS14) 10Gergő Tisza: Enforce that interface-admin is the only group that can edit non-own CSS/JS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421125 (https://phabricator.wikimedia.org/T190015) [11:02:22] please ignore the alerts about cp3035, that's me [11:02:23] PROBLEM - HTTPS Unified ECDSA on cp3035 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused [11:03:23] RECOVERY - HTTPS Unified ECDSA on cp3035 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 595558 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2019-01-24 12:00:00 +0000 (expires in 150 days) [11:04:22] (03CR) 10Gergő Tisza: [C: 032] Remove sitewide and user CSS/JS editing from old groups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421124 (https://phabricator.wikimedia.org/T190015) (owner: 10Gergő Tisza) [11:04:29] 10Operations, 10Dumps-Generation: Reboots of dumps/snapshot hosts for L1TF/microcode updates - https://phabricator.wikimedia.org/T202623 (10ArielGlenn) [11:05:08] (03CR) 10Gergő Tisza: [C: 032] Enforce that interface-admin is the only group that can edit non-own CSS/JS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421125 (https://phabricator.wikimedia.org/T190015) (owner: 10Gergő Tisza) [11:05:56] (03Merged) 10jenkins-bot: Remove sitewide and user CSS/JS editing from old groups 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/421124 (https://phabricator.wikimedia.org/T190015) (owner: 10Gergő Tisza) [11:06:24] (03Merged) 10jenkins-bot: Enforce that interface-admin is the only group that can edit non-own CSS/JS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421125 (https://phabricator.wikimedia.org/T190015) (owner: 10Gergő Tisza) [11:06:31] 10Operations, 10Traffic, 10Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['cp2024.codfw.wmnet', 'cp3035.esams.wmnet'] ``` and were **ALL** successful. [11:06:53] RECOVERY - Varnish traffic logger - varnishreqstats on cp3035 is OK: PROCS OK: 1 process with args /usr/local/bin/varnishreqstats, UID = 0 (root) [11:06:54] * addshore will hang around and perhaps tag a patch on the end that he can deploy himself [11:07:43] RECOVERY - Varnish HTTP upload-frontend - port 3120 on cp3035 is OK: HTTP OK: HTTP/1.1 200 OK - 498 bytes in 0.167 second response time [11:07:43] marostegui: are we good to go with T202549 ? 
[11:07:44] T202549: cloudvps: eqiad1: move nova db to m5-master - https://phabricator.wikimedia.org/T202549 [11:07:45] addshore: if it's just one patch, and if you don't need a lot of time to test it, you can go after tgr [11:07:58] bbiab [11:08:11] zeljkof: in a meeting until the end of the slot / midway :) [11:08:26] :) [11:08:35] (03CR) 10jenkins-bot: Remove sitewide and user CSS/JS editing from old groups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421124 (https://phabricator.wikimedia.org/T190015) (owner: 10Gergő Tisza) [11:08:37] (03CR) 10jenkins-bot: Enforce that interface-admin is the only group that can edit non-own CSS/JS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421125 (https://phabricator.wikimedia.org/T190015) (owner: 10Gergő Tisza) [11:09:37] (03CR) 10Zfilipin: [C: 031] Enable AbuseFilter 'block' on it.wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455307 (https://phabricator.wikimedia.org/T202808) (owner: 10Zoranzoki21) [11:12:06] (03PS3) 10Zfilipin: Fix "seperated" typo in MWMultiVersion.php file [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455354 (https://phabricator.wikimedia.org/T201491) (owner: 10Zoranzoki21) [11:12:44] PROBLEM - DPKG on bast1002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [11:12:49] (03CR) 10Zfilipin: [C: 031] Fix "seperated" typo in MWMultiVersion.php file [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455354 (https://phabricator.wikimedia.org/T201491) (owner: 10Zoranzoki21) [11:13:03] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 21 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [11:13:53] RECOVERY - DPKG on bast1002 is OK: All packages OK [11:14:50] (03CR) 10Zfilipin: [C: 031] *.pensoft.net should be in wgCopyUploadsDomains whitelist instead of pensoft.net [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455393 (https://phabricator.wikimedia.org/T202832) (owner: 10Urbanecm) 
[11:15:46] !log tgr@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:421124|Remove sitewide and user CSS/JS editing from old groups]] [[gerrit:421125|Enforce that interface-admin is the only group that can edit non-own CSS/JS]] (T190015) (duration: 00m 48s) [11:15:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:16:27] Urbanecm: the first patch should be testable, the second patch is just a typo in comments, nothing to test, can you test the third patch? [11:16:50] Please push the third patch directly to prod [11:16:56] !log tgr@deploy1001 Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:421124|Remove sitewide and user CSS/JS editing from old groups]] [[gerrit:421125|Enforce that interface-admin is the only group that can edit non-own CSS/JS]] (T190015) (duration: 00m 48s) [11:17:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:18:04] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 15 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [11:18:31] zeljkof: the core patch is still merging, you can go on [11:18:53] PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 500 (expecting: 200): /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) is CRITICAL: Test [11:18:53] from en.wp.org in A4 format using optimized for reading on mobile devices returned the unexpected status 500 (expecting: 200) [11:19:29] !log Deploy schema change on dbstore1002:s4 [11:19:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:22:12] tgr: thanks, starting with config changes cc Urbanecm [11:22:18] ack [11:22:41] (03CR) 
10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455307 (https://phabricator.wikimedia.org/T202808) (owner: 10Zoranzoki21) [11:24:02] (03Merged) 10jenkins-bot: Enable AbuseFilter 'block' on it.wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455307 (https://phabricator.wikimedia.org/T202808) (owner: 10Zoranzoki21) [11:24:20] (03PS1) 10Arturo Borrero Gonzalez: cloudvps: nova.conf: fix weird character [puppet] - 10https://gerrit.wikimedia.org/r/455543 [11:25:04] (03CR) 10jenkins-bot: Enable AbuseFilter 'block' on it.wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455307 (https://phabricator.wikimedia.org/T202808) (owner: 10Zoranzoki21) [11:25:18] Urbanecm: the first patch is at mwdebug1002 [11:25:24] ack [11:25:30] (03CR) 10Arturo Borrero Gonzalez: [C: 032] cloudvps: nova.conf: fix weird character [puppet] - 10https://gerrit.wikimedia.org/r/455543 (owner: 10Arturo Borrero Gonzalez) [11:25:42] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455354 (https://phabricator.wikimedia.org/T201491) (owner: 10Zoranzoki21) [11:25:49] please push it to prod zeljkof [11:25:58] arturo: I am still waiting for you code review on the GRANTS ;-) [11:26:00] Urbanecm: ok [11:26:52] !log zfilipin@deploy1001 Synchronized wmf-config/abusefilter.php: SWAT: [[gerrit:455307|Enable AbuseFilter "block" on it.wikibooks (T202808)]] (duration: 00m 48s) [11:26:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:26:57] T202808: Enable AbuseFilter 'block' on it.wikibooks - https://phabricator.wikimedia.org/T202808 [11:27:06] (03Merged) 10jenkins-bot: Fix "seperated" typo in MWMultiVersion.php file [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455354 (https://phabricator.wikimedia.org/T201491) (owner: 10Zoranzoki21) [11:27:12] Urbanecm: the first patch is deployed, will ping you when the rest is deployed [11:27:16] ack [11:27:29] 10Operations, 
10Wikimedia-Mailing-lists: Creation of a mailing list for the "Wiki Labs Culture" initiative - https://phabricator.wikimedia.org/T202737 (10Fnielsen) Thanks! I can confirm subscription works for me. [11:28:29] (03PS11) 10Arturo Borrero Gonzalez: cloudvps: eqiad1: move nova DBs to m5-master [puppet] - 10https://gerrit.wikimedia.org/r/454774 (https://phabricator.wikimedia.org/T202549) [11:28:41] (03PS2) 10Zfilipin: *.pensoft.net should be in wgCopyUploadsDomains whitelist instead of pensoft.net [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455393 (https://phabricator.wikimedia.org/T202832) (owner: 10Urbanecm) [11:28:49] !log zfilipin@deploy1001 Synchronized multiversion/MWMultiVersion.php: SWAT: [[gerrit:455354|Fix "seperated" typo in MWMultiVersion.php file (T201491)]] (duration: 00m 48s) [11:28:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:28:54] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455393 (https://phabricator.wikimedia.org/T202832) (owner: 10Urbanecm) [11:28:55] T201491: Fix common typos in code - https://phabricator.wikimedia.org/T201491 [11:30:27] (03Merged) 10jenkins-bot: *.pensoft.net should be in wgCopyUploadsDomains whitelist instead of pensoft.net [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455393 (https://phabricator.wikimedia.org/T202832) (owner: 10Urbanecm) [11:31:27] (03PS1) 10Alexandros Kosiaris: proton: Force HTTP endpoint for mwapi [puppet] - 10https://gerrit.wikimedia.org/r/455546 [11:31:38] !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:455393|*.pensoft.net should be in wgCopyUploadsDomains whitelist instead of pensoft.net (T202832)]] (duration: 00m 48s) [11:31:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:31:43] T202832: *.pensoft.net should be in wgCopyUploadsDomains whitelist instead of pensoft.net - https://phabricator.wikimedia.org/T202832 [11:31:52] Urbanecm: 
all patches deployed [11:31:58] ack, thank you [11:32:07] tgr, addshore: I'm done with swat, go ahead with your patches [11:32:46] I'll go then [11:32:46] marostegui: it still needs a refresh `nova_api-eqiad1` vs `nova_api_eqiad1` [11:33:09] (03CR) 10Arturo Borrero Gonzalez: [C: 04-1] "This still needs a refresh:" [puppet] - 10https://gerrit.wikimedia.org/r/455114 (https://phabricator.wikimedia.org/T202549) (owner: 10Marostegui) [11:33:12] Ah! [11:33:59] BTW do I need to edit anything in the dabatase dump to reference these new DBs? [11:34:12] I see this [11:34:14] https://www.irccloud.com/pastebin/qSlgD8oS/ [11:34:28] (03PS5) 10Marostegui: production-m5.sql.erb: Add GRANTS for nova user [puppet] - 10https://gerrit.wikimedia.org/r/455114 (https://phabricator.wikimedia.org/T202549) [11:34:46] arturo: you'd need to do a grep for: "USE nova_api" [11:34:48] And change that [11:34:53] To reflect the new DBs [11:35:26] is not present [11:35:33] (03CR) 10Alexandros Kosiaris: [C: 032] proton: Force HTTP endpoint for mwapi [puppet] - 10https://gerrit.wikimedia.org/r/455546 (owner: 10Alexandros Kosiaris) [11:35:51] arturo: where are the backups? [11:36:03] PROBLEM - swift-account-reaper on ms-be2042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [11:36:03] PROBLEM - swift-container-replicator on ms-be2042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [11:36:05] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] "Proton needed to be excluded while we work out how to make chromium trust Puppet CA. 
See https://gerrit.wikimedia.org/r/#/c/operations/pup" [puppet] - 10https://gerrit.wikimedia.org/r/454574 (owner: 10Ppchelko) [11:36:13] PROBLEM - swift-account-auditor on ms-be2042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [11:36:13] PROBLEM - swift-object-replicator on ms-be2042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [11:36:13] PROBLEM - swift-account-server on ms-be2042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [11:36:23] PROBLEM - swift-account-replicator on ms-be2042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [11:36:32] PROBLEM - swift-object-auditor on ms-be2042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [11:36:42] PROBLEM - swift-container-server on ms-be2042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [11:36:42] PROBLEM - swift-object-updater on ms-be2042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater [11:36:52] PROBLEM - swift-object-server on ms-be2042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [11:36:53] PROBLEM - swift-container-updater on ms-be2042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-updater [11:37:40] marostegui: which backups? currently is a local database, I doubt this has any backup [11:37:48] zeljkof: thanks! 
[11:37:51] arturo: the mysqldumps I mean [11:38:04] oh, I just scp'ed into m5-master, my home dir [11:38:12] ah ok, let me see [11:40:08] !log tgr@deploy1001 Synchronized php-1.32.0-wmf.18/includes/diff/DifferenceEngine.php: SWAT: [[gerrit:455254|Fix DifferenceEngine revision loading logic (T201218, T202454)]] (duration: 00m 49s) [11:40:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:40:15] T201218: Viewing page's first revision via diff gives error - https://phabricator.wikimedia.org/T201218 [11:40:15] T202454: Call to a member function getRevisionRecord() on a non-object (boolean) - https://phabricator.wikimedia.org/T202454 [11:40:23] addshore: I'm done [11:40:59] arturo: they look fine, yeah, so we need to: 1) merge the grants (check them again), 2) create the new databases 3) manually apply the grants 4) import the databases [11:41:02] (03CR) 10jenkins-bot: Fix "seperated" typo in MWMultiVersion.php file [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455354 (https://phabricator.wikimedia.org/T201491) (owner: 10Zoranzoki21) [11:41:04] (03CR) 10jenkins-bot: *.pensoft.net should be in wgCopyUploadsDomains whitelist instead of pensoft.net [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455393 (https://phabricator.wikimedia.org/T202832) (owner: 10Urbanecm) [11:41:40] arturo: I can do 1, 2, 3 now and 4) tomorrow morning with you? I am going to start an onboarding soon [11:42:06] tgr: thanks! [11:42:19] * addshore waits for jenkins to merge his on the branch [11:43:03] RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy [11:43:52] marostegui: ACK, I can do the import myself if you want [11:44:13] arturo: We can do it together :) [11:44:23] arturo: Give the grants another review and let me know if it is good to merge [11:45:01] addshore: you might want to add the patch to the wikitech page, just for records [11:45:09] can do! 
[11:45:25] (03CR) 10Arturo Borrero Gonzalez: [C: 032] "Let me know which passwd do you use finally." [puppet] - 10https://gerrit.wikimedia.org/r/455114 (https://phabricator.wikimedia.org/T202549) (owner: 10Marostegui) [11:45:41] (03PS6) 10Marostegui: production-m5.sql.erb: Add GRANTS for nova user [puppet] - 10https://gerrit.wikimedia.org/r/455114 (https://phabricator.wikimedia.org/T202549) [11:45:41] marostegui: good to merge [11:45:51] arturo: Cool - doing it now [11:45:54] !log enable puppet on all hosts including puppet class service::configuration and start a rolling puppet run on them [11:45:57] (03PS2) 10Ladsgroup: mediawiki: Remove unneeded file decleration on wikidata maintenance script [puppet] - 10https://gerrit.wikimedia.org/r/454543 [11:45:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:46:01] {{done}] [11:48:41] (03CR) 10Ladsgroup: "Rebased, I think we can merge this now :)" [puppet] - 10https://gerrit.wikimedia.org/r/454543 (owner: 10Ladsgroup) [11:48:45] 10Operations, 10Discovery-Search (Current work): Migrate elasticsearch scripts to spicerack cookbooks - https://phabricator.wikimedia.org/T202885 (10Gehel) [11:48:53] PROBLEM - tileratorui on maps-test2003 is CRITICAL: connect to address 10.192.16.34 and port 6535: Connection refused [11:49:03] PROBLEM - tilerator on maps-test2003 is CRITICAL: connect to address 10.192.16.34 and port 6534: Connection refused [11:49:32] * gehel is checking maps-test2003 [11:49:42] !log rebooting labweb* for kernel security update [11:49:43] RECOVERY - swift-account-replicator on ms-be2042 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [11:49:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:49:52] RECOVERY - swift-object-auditor on ms-be2042 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [11:50:02] RECOVERY - swift-container-server on ms-be2042 is OK: PROCS 
OK: 49 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [11:50:02] RECOVERY - swift-object-updater on ms-be2042 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [11:50:07] that's me ^ [11:50:12] RECOVERY - swift-object-server on ms-be2042 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [11:50:13] RECOVERY - swift-container-updater on ms-be2042 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [11:50:32] RECOVERY - swift-account-reaper on ms-be2042 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [11:50:32] RECOVERY - swift-container-replicator on ms-be2042 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [11:50:33] RECOVERY - swift-account-auditor on ms-be2042 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [11:50:33] RECOVERY - swift-object-replicator on ms-be2042 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [11:50:33] RECOVERY - swift-account-server on ms-be2042 is OK: PROCS OK: 49 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [11:50:43] !log repair on ms-be2042 sdd - T199198 [11:50:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:50:48] T199198: Some swift filesystems reporting negative disk usage - https://phabricator.wikimedia.org/T199198 [11:51:38] looks like tilerator and tileratorui just restarted, not sure why yet [11:52:15] gehel: service::configuration [11:52:59] aside the restart it should be a noop [11:53:40] akosiaris: ok, all good then! thanks! [11:54:57] * addshore still waits or CI on his patch, next time I'll remember to hit +2 25 mins before wishing to deploy.... 
[11:55:01] jouncebot: next [11:55:01] In 5 hour(s) and 4 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180827T1700) [11:55:08] * addshore will overrun slightly [11:57:50] 10Operations, 10SRE-Access-Requests, 10User-Addshore: Requesting Access to view EventLogging data for gabriel-wmde / gbirke - https://phabricator.wikimedia.org/T202072 (10Tim_WMDE) a:05gabriel-wmde>03None [11:59:01] !log Create empty databases nova_api_eqiad1 and nova_eqiad1 on m5 master (db1073) - T202549 [11:59:03] marostegui: Failed to log message to wiki. Somebody should check the error logs. [11:59:03] T202549: cloudvps: eqiad1: move nova db to m5-master - https://phabricator.wikimedia.org/T202549 [11:59:58] !log addshore@deploy1001 Synchronized php-1.32.0-wmf.18/extensions/Wikibase/lib/includes: [[gerrit:455542]] Pass IDBAccessObject flag to RevisionStore in WikiPageEntityRevisionLookup T202706 (duration: 00m 51s) [12:00:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:00:04] T202706: wmf.18 - "Failed to load blob from address" while merging entities - https://phabricator.wikimedia.org/T202706 [12:00:10] !log Create empty databases nova_api_eqiad1 and nova_eqiad1 on m5 master (db1073) - T202549 [12:00:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:00:29] 10Operations, 10Traffic, 10Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on neodymium.eqiad.wmnet for hosts: ``` ['cp3037.esams.wmnet', 'cp3041.esams.wmnet'] ``` The log can be found in `/var/l... [12:01:09] !log swat done [12:01:09] addshore: Failed to log message to wiki. Somebody should check the error logs. 
[12:02:48] !log swat done [12:02:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:10:43] 10Operations, 10docker-pkg, 10Patch-For-Review: releng/mediawiki-phpcs-dryrun fails to upload to docker-registry.wikimedia.org - https://phabricator.wikimedia.org/T200722 (10hashar) [12:12:11] !log rolling restart of parsoid service on wtp hosts for picking up https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/454574/ [12:12:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:14:32] 10Operations, 10Discovery, 10Wikidata, 10Wikidata-Query-Service: WDQS diskspace is low - https://phabricator.wikimedia.org/T196485 (10Gehel) To not duplicate infos on each of the child tasks, I'll add anything that is common to all on this task. We'll take this occasion to reimage the systems, so that we... [12:16:58] 10Operations, 10ops-codfw, 10Discovery, 10Wikidata, and 2 others: add ssds to wdqs2003 - https://phabricator.wikimedia.org/T202778 (10Gehel) @Papaul: we'll start by reimaging wdqs2003 (wdqs200[12] to follow). We'll reimage them one by one, to ensure that we have at most 1 host down in the cluster at any ti... [12:17:54] PROBLEM - Memcached on labweb1002 is CRITICAL: connect to address 208.80.155.109 and port 11000: Connection refused [12:18:04] 10Operations, 10Discovery, 10Wikidata, 10Wikidata-Query-Service: WDQS diskspace is low - https://phabricator.wikimedia.org/T196485 (10Gehel) Note that data import after reimage can be done by copying over data from wdqs1010, which has been reimported recently. Procedure is documented on https://wikitech.wi... 
[12:21:26] (03PS4) 10Alexandros Kosiaris: Update links to github repos of scoring platform team [puppet] - 10https://gerrit.wikimedia.org/r/454577 (https://phabricator.wikimedia.org/T194212) (owner: 10Ladsgroup) [12:21:28] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Update links to github repos of scoring platform team [puppet] - 10https://gerrit.wikimedia.org/r/454577 (https://phabricator.wikimedia.org/T194212) (owner: 10Ladsgroup) [12:25:28] RECOVERY - Memcached on labweb1002 is OK: TCP OK - 0.000 second response time on 208.80.155.109 port 11000 [12:31:34] ACKNOWLEDGEMENT - tilerator on maps-test2003 is CRITICAL: connect to address 10.192.16.34 and port 6534: Connection refused Gehel tracked in https://phabricator.wikimedia.org/T202888 - this is a test server, so not critical [12:31:35] ACKNOWLEDGEMENT - tileratorui on maps-test2003 is CRITICAL: connect to address 10.192.16.34 and port 6535: Connection refused Gehel tracked in https://phabricator.wikimedia.org/T202888 - this is a test server, so not critical [12:33:17] 10Operations, 10Traffic, 10Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['cp3037.esams.wmnet', 'cp3041.esams.wmnet'] ``` and were **ALL** successful. [12:34:49] re [12:35:42] Hey banyek [12:38:47] hi :) [12:41:45] hey banyek welcome :-) [12:42:06] banyek: welcome! 
[12:48:50] (03PS2) 10Bstorm: nfs-exportd: gratuitous conversion to python3 [puppet] - 10https://gerrit.wikimedia.org/r/455219 (https://phabricator.wikimedia.org/T202294) [12:59:15] (03PS2) 10Volans: remote: ensure host list is a copy [software/spicerack] - 10https://gerrit.wikimedia.org/r/455506 (https://phabricator.wikimedia.org/T199079) [12:59:17] (03PS2) 10Volans: remote, confctl: raise on no-match [software/spicerack] - 10https://gerrit.wikimedia.org/r/455507 (https://phabricator.wikimedia.org/T199079) [12:59:19] (03PS2) 10Volans: Add mysql module to interact with the core MySQLs [software/spicerack] - 10https://gerrit.wikimedia.org/r/455508 (https://phabricator.wikimedia.org/T199079) [12:59:21] (03PS2) 10Volans: cookbook: add help option to the interactive menu [software/spicerack] - 10https://gerrit.wikimedia.org/r/455509 (https://phabricator.wikimedia.org/T199079) [12:59:23] (03PS2) 10Volans: config: refactor to explicitly pass the file [software/spicerack] - 10https://gerrit.wikimedia.org/r/455510 (https://phabricator.wikimedia.org/T199079) [12:59:25] (03PS2) 10Volans: mediawiki: set timeout for requests [software/spicerack] - 10https://gerrit.wikimedia.org/r/455511 (https://phabricator.wikimedia.org/T199079) [12:59:27] (03PS2) 10Volans: tests: skip tests when fixture is not available [software/spicerack] - 10https://gerrit.wikimedia.org/r/455512 (https://phabricator.wikimedia.org/T199079) [12:59:29] (03PS2) 10Volans: confctl: fix dry-run log message [software/spicerack] - 10https://gerrit.wikimedia.org/r/455513 (https://phabricator.wikimedia.org/T199079) [12:59:31] (03PS2) 10Volans: dnsdisc: add retry decorator to check_ttl() [software/spicerack] - 10https://gerrit.wikimedia.org/r/455514 (https://phabricator.wikimedia.org/T199079) [12:59:33] (03PS2) 10Volans: cookbook: simplify cookbook return value [software/spicerack] - 10https://gerrit.wikimedia.org/r/455515 (https://phabricator.wikimedia.org/T199079) [12:59:41] (03CR) 10Volans: "done" (031 comment) 
[software/spicerack] - 10https://gerrit.wikimedia.org/r/455506 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [13:00:11] (03CR) 10Volans: "reply inline" (032 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/455508 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [13:03:20] 10Operations: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (10Marostegui) [13:04:33] 10Operations: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (10Marostegui) I have verified that banyek is the holder of the @wikimedia.org email [13:05:43] (03CR) 10Gehel: [C: 031] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/455506 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [13:08:55] (03CR) 10Bstorm: [C: 032] nfs-exportd: gratuitous conversion to python3 [puppet] - 10https://gerrit.wikimedia.org/r/455219 (https://phabricator.wikimedia.org/T202294) (owner: 10Bstorm) [13:09:50] (03PS1) 10Filippo Giunchedi: graphite: alert when eqiad and codfw drift in number of thumbnails [puppet] - 10https://gerrit.wikimedia.org/r/455553 (https://phabricator.wikimedia.org/T199073) [13:10:18] (03CR) 10Gehel: [C: 031] "LGTM, minor documentation comment inline." 
(031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/455508 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [13:10:34] (03CR) 10Volans: "as agreed this approach has been integrated into Ibcdbd1cee11c106579bb94c27c87560566e348cf" [software/spicerack] - 10https://gerrit.wikimedia.org/r/455527 (owner: 10Gehel) [13:10:59] (03Abandoned) 10Gehel: Introduce specific RemoteHosts subclass for MySql [software/spicerack] - 10https://gerrit.wikimedia.org/r/455527 (owner: 10Gehel) [13:11:04] (03CR) 10Bstorm: nfs-exportd: gratuitous conversion to python3 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/455219 (https://phabricator.wikimedia.org/T202294) (owner: 10Bstorm) [13:16:33] (03PS1) 10Alexandros Kosiaris: noc: Also allow cumin masters to access site [puppet] - 10https://gerrit.wikimedia.org/r/455555 [13:17:10] 10Operations: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (10Banyek) You signed this document on Mon, Aug 27, 3:16 PM. [13:19:09] (03CR) 10Volans: [C: 032] remote: ensure host list is a copy [software/spicerack] - 10https://gerrit.wikimedia.org/r/455506 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [13:20:11] (03Merged) 10jenkins-bot: remote: ensure host list is a copy [software/spicerack] - 10https://gerrit.wikimedia.org/r/455506 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [13:20:35] (03CR) 10Volans: [C: 032] remote, confctl: raise on no-match [software/spicerack] - 10https://gerrit.wikimedia.org/r/455507 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [13:20:37] 10Operations: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (10Marostegui) [13:21:34] (03Merged) 10jenkins-bot: remote, confctl: raise on no-match [software/spicerack] - 10https://gerrit.wikimedia.org/r/455507 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [13:22:25] (03CR) 10Alexandros Kosiaris: [C: 032] noc: Also allow cumin masters to access site [puppet] - 
10https://gerrit.wikimedia.org/r/455555 (owner: 10Alexandros Kosiaris) [13:23:58] PROBLEM - IPMI Sensor Status on cp3035 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [13:25:18] 10Operations: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (10Marostegui) [13:25:20] 10Operations, 10SRE-Access-Requests: Requesting access to RESOURCE for USER[S] - https://phabricator.wikimedia.org/T202892 (10Mathew.onipe) [13:25:26] !log anomie@deploy1001 Synchronized php-1.32.0-wmf.18/includes/Storage/RevisionStore.php: Backport for T202032 (duration: 00m 49s) [13:25:31] (03PS3) 10Volans: Add mysql module to interact with the core MySQLs [software/spicerack] - 10https://gerrit.wikimedia.org/r/455508 (https://phabricator.wikimedia.org/T199079) [13:25:33] (03PS3) 10Volans: cookbook: add help option to the interactive menu [software/spicerack] - 10https://gerrit.wikimedia.org/r/455509 (https://phabricator.wikimedia.org/T199079) [13:25:35] (03PS3) 10Volans: config: refactor to explicitly pass the file [software/spicerack] - 10https://gerrit.wikimedia.org/r/455510 (https://phabricator.wikimedia.org/T199079) [13:25:37] (03PS3) 10Volans: mediawiki: set timeout for requests [software/spicerack] - 10https://gerrit.wikimedia.org/r/455511 (https://phabricator.wikimedia.org/T199079) [13:25:39] (03PS3) 10Volans: tests: skip tests when fixture is not available [software/spicerack] - 10https://gerrit.wikimedia.org/r/455512 (https://phabricator.wikimedia.org/T199079) [13:25:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:25:40] T202032: Duplicate ar_rev_id values in several wikis - https://phabricator.wikimedia.org/T202032 [13:25:41] (03PS3) 10Volans: confctl: fix dry-run log message [software/spicerack] - 10https://gerrit.wikimedia.org/r/455513 (https://phabricator.wikimedia.org/T199079) [13:25:43] (03PS3) 10Volans: dnsdisc: add retry decorator to check_ttl() 
[software/spicerack] - 10https://gerrit.wikimedia.org/r/455514 (https://phabricator.wikimedia.org/T199079) [13:25:45] (03PS3) 10Volans: cookbook: simplify cookbook return value [software/spicerack] - 10https://gerrit.wikimedia.org/r/455515 (https://phabricator.wikimedia.org/T199079) [13:25:51] (03CR) 10Volans: "done" (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/455508 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [13:26:57] (03CR) 10Volans: "Please take into account the TODO node when reviewing ;)" [software/spicerack] - 10https://gerrit.wikimedia.org/r/455508 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [13:28:03] 10Operations, 10SRE-Access-Requests: Requesting access to RESOURCE for USER[S] - https://phabricator.wikimedia.org/T202892 (10Mathew.onipe) [13:40:41] (03Abandoned) 10Hashar: test: puppet-syntax now fails on deprecation notices [puppet] - 10https://gerrit.wikimedia.org/r/333012 (https://phabricator.wikimedia.org/T154915) (owner: 10Hashar) [13:42:00] (03PS1) 10Gergő Tisza: Add editsitejson to everyone who has editinterface [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455561 (https://phabricator.wikimedia.org/T190015) [13:42:45] 10Operations: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (10Marostegui) [13:47:10] 10Operations, 10Analytics, 10Documentation: Remove data from Hadoop's HDFS as part of the user offboard workflow - https://phabricator.wikimedia.org/T200312 (10elukey) The users might leave PII data in the following places: * /home/$USER dir on the stat boxes * /user/$USER dir on HDFS * Hive databases on HDFS [13:50:10] 10Operations, 10SRE-Access-Requests: Requesting access to RESOURCE for USER[S] - https://phabricator.wikimedia.org/T202892 (10Gehel) 05Open>03declined Actually, this will be tracked as part of T202708 [13:50:44] (03PS1) 10Volans: spicerack, cookbooks: install and configure [puppet] - 10https://gerrit.wikimedia.org/r/455562 
(https://phabricator.wikimedia.org/T199079) [13:51:28] (03CR) 10jerkins-bot: [V: 04-1] spicerack, cookbooks: install and configure [puppet] - 10https://gerrit.wikimedia.org/r/455562 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [13:52:52] (03PS1) 10Mathew.onipe: configured wqds to use RAID10 [puppet] - 10https://gerrit.wikimedia.org/r/455563 [13:53:20] (03PS2) 10Volans: spicerack, cookbooks: install and configure [puppet] - 10https://gerrit.wikimedia.org/r/455562 (https://phabricator.wikimedia.org/T199079) [13:53:32] (03CR) 10Hashar: "I think I made that patch to run the puppet tests with ruby 2.4 on Mac. Not sure why I cherry picked it on beta, we can probably remove it" [puppet] - 10https://gerrit.wikimedia.org/r/336840 (owner: 10Hashar) [13:53:59] (03CR) 10jerkins-bot: [V: 04-1] spicerack, cookbooks: install and configure [puppet] - 10https://gerrit.wikimedia.org/r/455562 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [13:54:14] what's wrong with me today... 
:( [13:55:03] ah ensure_first_param, fixing [13:55:07] (03PS3) 10Volans: spicerack, cookbooks: install and configure [puppet] - 10https://gerrit.wikimedia.org/r/455562 (https://phabricator.wikimedia.org/T199079) [13:57:03] (03CR) 10Volans: "question for the reviewers inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/455562 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [13:57:09] !log mobrovac@deploy1001 Started deploy [proton/deploy@17fc7bb]: Ignore HTTPS errors for the time being [13:57:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:57:42] !log mobrovac@deploy1001 Finished deploy [proton/deploy@17fc7bb]: Ignore HTTPS errors for the time being (duration: 00m 33s) [13:57:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:58:31] !log mobrovac@deploy1001 Started deploy [citoid/deploy@fe96789]: Resolve DOIs/URLs all the way to end - T197242 [13:58:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:58:36] T197242: Transition citoid to use Zotero's translation-server-v2 - https://phabricator.wikimedia.org/T197242 [14:02:01] (03PS1) 10Bstorm: nfs-exportd: Remove deleted projects to stop errors [puppet] - 10https://gerrit.wikimedia.org/r/455564 (https://phabricator.wikimedia.org/T202294) [14:04:08] !log mobrovac@deploy1001 Finished deploy [citoid/deploy@fe96789]: Resolve DOIs/URLs all the way to end - T197242 (duration: 05m 37s) [14:04:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:04:13] T197242: Transition citoid to use Zotero's translation-server-v2 - https://phabricator.wikimedia.org/T197242 [14:04:19] (03PS1) 10Alexandros Kosiaris: Revert "proton: Force HTTP endpoint for mwapi" [puppet] - 10https://gerrit.wikimedia.org/r/455565 [14:04:27] (03PS2) 10Alexandros Kosiaris: Revert "proton: Force HTTP endpoint for mwapi" [puppet] - 10https://gerrit.wikimedia.org/r/455565 [14:04:32] (03CR) 10Alexandros 
Kosiaris: [V: 032 C: 032] Revert "proton: Force HTTP endpoint for mwapi" [puppet] - 10https://gerrit.wikimedia.org/r/455565 (owner: 10Alexandros Kosiaris) [14:06:21] (03CR) 10Mathew.onipe: "Please review" [puppet] - 10https://gerrit.wikimedia.org/r/455563 (owner: 10Mathew.onipe) [14:06:57] (03Abandoned) 10Hashar: systemd: allow isequal to match programname in/rsyslog [puppet] - 10https://gerrit.wikimedia.org/r/337411 (owner: 10Hashar) [14:06:57] PROBLEM - IPMI Sensor Status on cp3034 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [14:06:58] PROBLEM - Memory correctable errors -EDAC- on scb1002 is CRITICAL: 5.001 ge 4 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=scb1002&var-datasource=eqiad%2520prometheus%252Fops [14:07:42] (03Abandoned) 10Hashar: Jenkins integration of rspec [puppet] - 10https://gerrit.wikimedia.org/r/331856 (https://phabricator.wikimedia.org/T78342) (owner: 10Hashar) [14:08:57] PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 404 (expecting: 200): /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) is CRITICAL: Test [14:08:57] from en.wp.org in A4 format using optimized for reading on mobile devices returned the unexpected status 404 (expecting: 200) [14:10:35] (03CR) 10Gehel: "minor comments in line" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/455562 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [14:10:53] akosiaris: this error makes 0 sense ^ [14:10:58] PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print 
the Foo page from en.wp.org in letter format returned the unexpected status 404 (expecting: 200): /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) is CRITICAL: Test [14:10:58] from en.wp.org in A4 format using optimized for reading on mobile devices returned the unexpected status 404 (expecting: 200) [14:10:58] PROBLEM - proton endpoints health on proton2001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 404 (expecting: 200): /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) is CRITICAL: Test [14:10:58] from en.wp.org in A4 format using optimized for reading on mobile devices returned the unexpected status 404 (expecting: 200) [14:11:12] according to the logs, rb returned 404 for Foo [14:11:22] this is something suspicious [14:11:28] PROBLEM - proton endpoints health on proton2002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 404 (expecting: 200): /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) is CRITICAL: Test [14:11:28] from en.wp.org in A4 format using optimized for reading on mobile devices returned the unexpected status 404 (expecting: 200) [14:11:49] indeed [14:12:08] * mobrovac looking [14:12:41] (03PS4) 10Volans: spicerack, cookbooks: install and configure [puppet] - 10https://gerrit.wikimedia.org/r/455562 (https://phabricator.wikimedia.org/T199079) [14:12:52] (03CR) 10Volans: "replies inline" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/455562 
(https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [14:13:13] (03CR) 10Gehel: [C: 04-1] "Congratulation on your first CR! See comment inline." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/455563 (owner: 10Mathew.onipe) [14:13:25] ok, rb doesn't return 404 for these [14:13:30] so it's something about proton [14:13:34] looking at the code [14:14:29] (03CR) 10Gehel: [C: 04-1] configured wqds to use RAID10 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/455563 (owner: 10Mathew.onipe) [14:16:03] (03PS1) 10Banyek: admin: added user banyek [puppet] - 10https://gerrit.wikimedia.org/r/455566 (https://phabricator.wikimedia.org/T202521) [14:16:05] (03CR) 10Welcome, new contributor!: "Thank you for making your first contribution to Wikimedia! :) To learn how to get your code changes reviewed faster and more likely to get" [puppet] - 10https://gerrit.wikimedia.org/r/455566 (https://phabricator.wikimedia.org/T202521) (owner: 10Banyek) [14:20:14] (03CR) 10Gilles: [C: 031] Extend Imagemagick policy file to disable Postscript/PDF [puppet] - 10https://gerrit.wikimedia.org/r/454544 (owner: 10Muehlenhoff) [14:21:39] 10Operations, 10Traffic, 10Continuous-Integration-Config, 10Patch-For-Review: Add CI to all operations/software/varnish/* repositories and archive obsolete ones - https://phabricator.wikimedia.org/T180329 (10hashar) 05Open>03Resolved a:03ema CI has been configured by @ema via various tasks [14:22:29] (03CR) 10Mathew.onipe: "> Patch Set 1: Code-Review-1" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/455563 (owner: 10Mathew.onipe) [14:29:11] (03CR) 10Jcrespo: "Looks ok, including the gid, but remember it adds not privileges, will have to be done on a separate task." 
[puppet] - 10https://gerrit.wikimedia.org/r/455566 (https://phabricator.wikimedia.org/T202521) (owner: 10Banyek) [14:29:25] (03CR) 10Jcrespo: "s/gid/uid/" [puppet] - 10https://gerrit.wikimedia.org/r/455566 (https://phabricator.wikimedia.org/T202521) (owner: 10Banyek) [14:29:43] akosiaris: hm, it seems https is causing some weirdness with the request chromium makes, the 404 suggests that either the query disappears or the host header, or both [14:29:47] not sure at this point what exactly [14:30:09] (03CR) 10Jcrespo: [C: 031] admin: added user banyek [puppet] - 10https://gerrit.wikimedia.org/r/455566 (https://phabricator.wikimedia.org/T202521) (owner: 10Banyek) [14:31:20] (03CR) 10Marostegui: "The data itself looks good." [puppet] - 10https://gerrit.wikimedia.org/r/455566 (https://phabricator.wikimedia.org/T202521) (owner: 10Banyek) [14:31:29] (03PS2) 10Bstorm: nfs-exportd: Remove deleted projects to stop errors [puppet] - 10https://gerrit.wikimedia.org/r/455564 (https://phabricator.wikimedia.org/T202294) [14:33:40] (03PS2) 10Mathew.onipe: configured wqds to use RAID10 `Bug: T196485` Change-Id: I1abd8d4c2f8431704dadc1b39d4f34629b2ad099 [puppet] - 10https://gerrit.wikimedia.org/r/455563 (https://phabricator.wikimedia.org/T196485) [14:33:58] (03CR) 10Bstorm: [C: 032] nfs-exportd: Remove deleted projects to stop errors [puppet] - 10https://gerrit.wikimedia.org/r/455564 (https://phabricator.wikimedia.org/T202294) (owner: 10Bstorm) [14:35:30] (03PS2) 10Banyek: admin: add user banyek [puppet] - 10https://gerrit.wikimedia.org/r/455566 (https://phabricator.wikimedia.org/T202521) [14:36:27] (03PS3) 10Marostegui: admin: add user banyek [puppet] - 10https://gerrit.wikimedia.org/r/455566 (https://phabricator.wikimedia.org/T202521) (owner: 10Banyek) [14:36:57] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [50.0] 
https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [14:38:21] (03CR) 10Ottomata: [C: 031] archiva::proxy: use certificate_name rather than only 'archiva' [puppet] - 10https://gerrit.wikimedia.org/r/455082 (https://phabricator.wikimedia.org/T192639) (owner: 10Elukey) [14:38:58] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [14:39:45] (03PS1) 10Bstorm: nfs-exportd: change to warning log level [puppet] - 10https://gerrit.wikimedia.org/r/455574 (https://phabricator.wikimedia.org/T202294) [14:40:37] (03PS1) 10Filippo Giunchedi: prometheus: alert on unusual day-over-day logstash ingestion rate change [puppet] - 10https://gerrit.wikimedia.org/r/455576 (https://phabricator.wikimedia.org/T202307) [14:40:47] akosiaris: i think we should revert back to http for now, until i get to the bottom of this ... 
[14:40:58] ok doing so [14:41:11] (03PS1) 10Alexandros Kosiaris: Revert "Revert "proton: Force HTTP endpoint for mwapi"" [puppet] - 10https://gerrit.wikimedia.org/r/455577 [14:41:21] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Revert "Revert "proton: Force HTTP endpoint for mwapi"" [puppet] - 10https://gerrit.wikimedia.org/r/455577 (owner: 10Alexandros Kosiaris) [14:41:30] (03PS2) 10Alexandros Kosiaris: Revert "Revert "proton: Force HTTP endpoint for mwapi"" [puppet] - 10https://gerrit.wikimedia.org/r/455577 [14:41:32] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Revert "Revert "proton: Force HTTP endpoint for mwapi"" [puppet] - 10https://gerrit.wikimedia.org/r/455577 (owner: 10Alexandros Kosiaris) [14:42:44] !log Running deduplicateArchiveRevId.php on aawikibooks for T202032 [14:42:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:42:50] T202032: Duplicate ar_rev_id values in several wikis - https://phabricator.wikimedia.org/T202032 [14:43:07] mobrovac: done [14:43:18] (03PS1) 10Alexandros Kosiaris: Display etcd /mediawiki-config values in noc.w.o [puppet] - 10https://gerrit.wikimedia.org/r/455578 [14:43:27] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy [14:43:28] RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy [14:43:51] !log Running deduplicateArchiveRevId.php on gotwikibooks, kswikiquote, lvwikibooks, nostalgiawiki, wawikibooks and wikimania2005wiki for T202032 [14:43:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:43:57] RECOVERY - proton endpoints health on proton2002 is OK: All endpoints are healthy [14:44:17] 10Operations, 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Move internal sites hosted on thorium to ganeti instance(s) - https://phabricator.wikimedia.org/T202011 (10Ottomata) > I'm not sure if it's feasible to keep the rest of Cloudera and only install Hue from the upstream sources? This might be... 
[14:44:56] (03PS1) 10Elukey: archiva1001: enable bacula backups for /var/lib/archiva [puppet] - 10https://gerrit.wikimedia.org/r/455579 (https://phabricator.wikimedia.org/T192639) [14:45:37] 10Operations, 10Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (10Marostegui) [14:47:28] 10Operations, 10ops-codfw: Degraded RAID on db2033 - https://phabricator.wikimedia.org/T201757 (10Papaul) a:05Papaul>03Marostegui Done [14:47:46] 10Operations, 10Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (10jcrespo) >>! In T202521#4534387, @Banyek wrote: > https://phabricator.wikimedia.org/L3: You signed this document on Mon, Aug 27, 3:16 PM. Can confirm: `Verified, Current Banyek Balazs Pocze Mon, Au... [14:48:34] 10Operations, 10Analytics, 10Traffic, 10Services (blocked): Add Accept header to webrequest logs - https://phabricator.wikimedia.org/T170606 (10Ottomata) Hm, either of these solutions is fine, but even if Accept isn't requested from others, it might be something fairly interesting to just include in the fu... 
[14:49:57] (03PS2) 10Alexandros Kosiaris: Display etcd /mediawiki-config values in noc.w.o [puppet] - 10https://gerrit.wikimedia.org/r/455578 [14:50:01] (03CR) 10Marostegui: [C: 032] admin: add user banyek [puppet] - 10https://gerrit.wikimedia.org/r/455566 (https://phabricator.wikimedia.org/T202521) (owner: 10Banyek) [14:50:14] (03PS4) 10Marostegui: admin: add user banyek [puppet] - 10https://gerrit.wikimedia.org/r/455566 (https://phabricator.wikimedia.org/T202521) (owner: 10Banyek) [14:52:02] !log up Shutting down db2088 for BIOS upgrade [14:52:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:52:53] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler02/12247/" [puppet] - 10https://gerrit.wikimedia.org/r/455579 (https://phabricator.wikimedia.org/T192639) (owner: 10Elukey) [14:53:45] (03CR) 10Alexandros Kosiaris: [C: 031] archiva1001: enable bacula backups for /var/lib/archiva [puppet] - 10https://gerrit.wikimedia.org/r/455579 (https://phabricator.wikimedia.org/T192639) (owner: 10Elukey) [14:55:08] akosiaris: o/ - just to be sure, this one will trigger a completely different backup for archiva1001, not touching the one currently running for meitnerium right? (better safe than sorry) [14:56:18] 10Operations, 10Discovery, 10Wikidata, 10Wikidata-Query-Service: WDQS diskspace is low - https://phabricator.wikimedia.org/T196485 (10Mathew.onipe) @Gehel Alright then. [14:56:21] elukey: yes [14:56:26] super thanks :) [14:58:22] 10Operations, 10ops-codfw: Degraded RAID on db2033 - https://phabricator.wikimedia.org/T201757 (10Marostegui) Thanks Papaul ``` physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, Rebuilding) ``` Let's see if it goes well this time Thanks! 
[14:59:17] (03CR) 10Gehel: configured wqds to use RAID10 `Bug: T196485` Change-Id: I1abd8d4c2f8431704dadc1b39d4f34629b2ad099 (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/455563 (https://phabricator.wikimedia.org/T196485) (owner: 10Mathew.onipe) [14:59:27] RECOVERY - Long running screen/tmux on analytics1003 is OK: OK: No SCREEN or tmux processes detected. [14:59:29] 10Operations, 10Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (10Marostegui) [15:03:19] (03CR) 10Gehel: spicerack, cookbooks: install and configure (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/455562 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [15:04:42] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2058 - https://phabricator.wikimedia.org/T202824 (10Papaul) a:05Papaul>03Marostegui Disk replacement complete [15:05:42] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2058 - https://phabricator.wikimedia.org/T202824 (10Marostegui) Thanks! ``` physicaldrive 1I:1:11 (port 1I:box 1:bay 11, SAS, 600 GB, Rebuilding) ``` [15:07:18] (03PS1) 10Sbisson: Enable 'PageTriage' log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455581 (https://phabricator.wikimedia.org/T202815) [15:09:22] 10Operations, 10Wikimedia-Apache-configuration, 10Patch-For-Review: Redirect 2030.wikimedia.org to the new movement strategy portal - https://phabricator.wikimedia.org/T202498 (10nebber) Super cool, thanks a lot! (Can I like comments here, or thank people?) [15:09:52] (03PS3) 10Mathew.onipe: configured wqds to use RAID10 [puppet] - 10https://gerrit.wikimedia.org/r/455563 (https://phabricator.wikimedia.org/T196485) [15:11:41] 10Operations, 10Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (10Marostegui) [15:11:46] (03CR) 10Andrew Bogott: [C: 031] "This seems just fine -- whatever keeps the logs readable." 
[puppet] - https://gerrit.wikimedia.org/r/455574 (https://phabricator.wikimedia.org/T202294) (owner: Bstorm) [15:12:13] !log set transient low watermark to 80% for elasticsearch logstash cluster to allow shard replica allocation - T201971 [15:12:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:12:18] T201971: Shorten logstash retention temporarily - https://phabricator.wikimedia.org/T201971 [15:12:40] (CR) Andrew Bogott: [C: 1] nfs-exportd: change to warning log level (1 comment) [puppet] - https://gerrit.wikimedia.org/r/455574 (https://phabricator.wikimedia.org/T202294) (owner: Bstorm) [15:12:44] !log decreasing logstash elasticsearch index replica count to 1 on indices older than 1 day [15:12:46] Operations, SRE-Access-Requests, Patch-For-Review, User-Addshore: Requesting access to view EventLogging data for Tim WMDE - https://phabricator.wikimedia.org/T202063 (Tim_WMDE) Works fine, thanks! [15:12:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:12:48] (PS2) Bstorm: nfs-exportd: change to warning log level [puppet] - https://gerrit.wikimedia.org/r/455574 (https://phabricator.wikimedia.org/T202294) [15:12:50] (CR) Ottomata: [C: 1] archiva1001: enable bacula backups for /var/lib/archiva [puppet] - https://gerrit.wikimedia.org/r/455579 (https://phabricator.wikimedia.org/T192639) (owner: Elukey) [15:12:52] (CR) Mathew.onipe: "> Patch Set 2:" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/455563 (https://phabricator.wikimedia.org/T196485) (owner: Mathew.onipe) [15:14:11] (CR) Gehel: [C: 1] "LGTM, let's wait until we have a schedule on when to install the new disks to merge it." 
[puppet] - https://gerrit.wikimedia.org/r/455563 (https://phabricator.wikimedia.org/T196485) (owner: Mathew.onipe) [15:14:54] (CR) Bstorm: nfs-exportd: change to warning log level (1 comment) [puppet] - https://gerrit.wikimedia.org/r/455574 (https://phabricator.wikimedia.org/T202294) (owner: Bstorm) [15:15:13] Operations, Wikimedia-Logstash, Goal, Patch-For-Review, and 2 others: Shorten logstash retention temporarily - https://phabricator.wikimedia.org/T201971 (herron) decreased logstash elasticsearch index replica count to 1 on indices older than 1 day: ``` health status index uu... [15:15:37] Operations, ops-eqiad, Patch-For-Review: setup backup1001.eqiad.wmnet - https://phabricator.wikimedia.org/T189801 (akosiaris) [15:15:39] Operations, ops-codfw, Patch-For-Review: rack/setup/install backup2001 - https://phabricator.wikimedia.org/T196477 (akosiaris) [15:19:55] (PS2) Elukey: archiva1001: enable bacula backups for /var/lib/archiva [puppet] - https://gerrit.wikimedia.org/r/455579 (https://phabricator.wikimedia.org/T192639) [15:20:42] (PS3) Bstorm: nfs-exportd: change to warning log level [puppet] - https://gerrit.wikimedia.org/r/455574 (https://phabricator.wikimedia.org/T202294) [15:20:44] (CR) Elukey: [C: 2] archiva1001: enable bacula backups for /var/lib/archiva [puppet] - https://gerrit.wikimedia.org/r/455579 (https://phabricator.wikimedia.org/T192639) (owner: Elukey) [15:21:19] (CR) Bstorm: "But now that you mention it, I'm going to put that up in its own patch. It was bothering me, them more I thought of it." 
[puppet] - https://gerrit.wikimedia.org/r/455574 (https://phabricator.wikimedia.org/T202294) (owner: Bstorm) [15:22:22] (PS4) Bstorm: nfs-exportd: change to warning log level [puppet] - https://gerrit.wikimedia.org/r/455574 (https://phabricator.wikimedia.org/T202294) [15:23:39] (CR) Bstorm: [C: 2] nfs-exportd: change to warning log level [puppet] - https://gerrit.wikimedia.org/r/455574 (https://phabricator.wikimedia.org/T202294) (owner: Bstorm) [15:23:46] Operations, Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (Marostegui) @Banyek has confirmed he's access to the bastions. [15:24:58] PROBLEM - Check systemd state on archiva1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [15:26:43] !log taking wdqs2003 offline for SSD installation - T202778 [15:26:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:26:48] T202778: add ssds to wdqs2003 - https://phabricator.wikimedia.org/T202778 [15:26:52] !log gehel@puppetmaster1001 conftool action : set/pooled=inactive; selector: dc=codfw,cluster=wdqs,name=wdqs2003.eqiad.wmnet [15:26:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:27:08] RECOVERY - Check systemd state on archiva1001 is OK: OK - running: The system is fully operational [15:28:30] Operations, ops-codfw, DBA, Patch-For-Review: db2088 rebooted itself and came back sick - https://phabricator.wikimedia.org/T202822 (Papaul) a:Papaul>Marostegui BIOS upgrade from version 2.4.3 to 2.8.0 IDRAC upgrade from version 2.40 to 2.60 [15:29:27] (PS1) Bstorm: nfs-exportd: switch iteration from items to values for IPs [puppet] - https://gerrit.wikimedia.org/r/455582 [15:30:07] RECOVERY - Device not healthy -SMART- on db2058 is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2058&var-datasource=codfw%2520prometheus%252Fops [15:30:21] Operations, Wikimedia-Apache-configuration, Patch-For-Review: Redirect 2030.wikimedia.org to the new movement strategy portal - https://phabricator.wikimedia.org/T202498 (Dzahn) Open>Resolved a:Dzahn @nebber :) You can try "Award token" if you like. In general in Phabricator you can direc... [15:30:24] (PS1) Volans: readme: fix typo [cookbooks] - https://gerrit.wikimedia.org/r/455583 (https://phabricator.wikimedia.org/T199079) [15:32:35] (CR) Volans: [V: 2 C: 2] readme: fix typo [cookbooks] - https://gerrit.wikimedia.org/r/455583 (https://phabricator.wikimedia.org/T199079) (owner: Volans) [15:32:45] Operations, monitoring: add icinga1001 to allowed hosts for AQL SMS gateway - https://phabricator.wikimedia.org/T202784 (Dzahn) Thanks @volans! Noted that i could have had access myself via pwstore. [15:34:32] Operations, ops-codfw, Discovery, Wikidata, and 2 others: add ssds to wdqs2003 - https://phabricator.wikimedia.org/T202778 (Papaul) a:Papaul>Gehel Disks are in place [15:39:19] (PS4) Gehel: configured wqds to use RAID10 [puppet] - https://gerrit.wikimedia.org/r/455563 (https://phabricator.wikimedia.org/T196485) (owner: Mathew.onipe) [15:40:47] RECOVERY - Device not healthy -SMART- on db2033 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2033&var-datasource=codfw%2520prometheus%252Fops [15:41:03] Operations, Discovery, Wikidata, Wikidata-Query-Service, Patch-For-Review: WDQS diskspace is low - https://phabricator.wikimedia.org/T196485 (ops-monitoring-bot) Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['wdqs2003.codfw.wmnet'] ``` The log can be f... 
[15:44:24] (CR) Gehel: [C: 2] configured wqds to use RAID10 [puppet] - https://gerrit.wikimedia.org/r/455563 (https://phabricator.wikimedia.org/T196485) (owner: Mathew.onipe) [15:48:40] !log reenabling puppet on install[12]002 [15:48:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:49:02] !log Re-running populateContentTables.php on aawikibooks, gotwikibooks, kswikiquote, lvwikibooks, nostalgiawiki, wawikibooks and wikimania2005wiki for T183488 [15:49:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:49:07] T183488: MCR schema migration stage 2: populate new fields - https://phabricator.wikimedia.org/T183488 [15:49:58] Operations, ops-codfw, DBA, Patch-For-Review: db2088 rebooted itself and came back sick - https://phabricator.wikimedia.org/T202822 (Marostegui) Thanks Papaul. I have upgraded kernel, mysql and started it. Once it has caught up I will do a data check before repooling it. [15:50:11] !log Upgrade kernel and mariadb on db2088 [15:50:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:50:52] Operations, Discovery, Wikidata, Wikidata-Query-Service, Patch-For-Review: WDQS diskspace is low - https://phabricator.wikimedia.org/T196485 (ops-monitoring-bot) Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['wdqs2003.codfw.wmnet'] ``` The log can be f... 
[15:51:02] Operations, SRE-Access-Requests: Please add everyone on the performance team to perf-roots - https://phabricator.wikimedia.org/T202648 (RobH) [15:51:04] Operations, SRE-Access-Requests, Patch-For-Review: Please add aaron to perf-team - https://phabricator.wikimedia.org/T202650 (RobH) [15:54:11] Operations, SRE-Access-Requests: Access to restbase servers (including sudo) for Imarlier - https://phabricator.wikimedia.org/T202563 (RobH) @Imarlier: We've not gotten any feedback on which of the two groups you need, or if it is both? As such, I'm not sure what to present in our SRE meeting for approv... [15:54:50] (CR) Gehel: [C: 1] "LGTM" [software/spicerack] - https://gerrit.wikimedia.org/r/455508 (https://phabricator.wikimedia.org/T199079) (owner: Volans) [15:55:13] (CR) Volans: [C: 2] Add mysql module to interact with the core MySQLs [software/spicerack] - https://gerrit.wikimedia.org/r/455508 (https://phabricator.wikimedia.org/T199079) (owner: Volans) [15:56:19] (Merged) jenkins-bot: Add mysql module to interact with the core MySQLs [software/spicerack] - https://gerrit.wikimedia.org/r/455508 (https://phabricator.wikimedia.org/T199079) (owner: Volans) [15:56:34] (CR) Volans: [C: 2] cookbook: add help option to the interactive menu [software/spicerack] - https://gerrit.wikimedia.org/r/455509 (https://phabricator.wikimedia.org/T199079) (owner: Volans) [15:57:37] (Merged) jenkins-bot: cookbook: add help option to the interactive menu [software/spicerack] - https://gerrit.wikimedia.org/r/455509 (https://phabricator.wikimedia.org/T199079) (owner: Volans) [15:58:26] (CR) Volans: [C: 2] config: refactor to explicitly pass the file [software/spicerack] - https://gerrit.wikimedia.org/r/455510 (https://phabricator.wikimedia.org/T199079) (owner: Volans) [15:59:28] (Merged) jenkins-bot: config: refactor to explicitly pass the file [software/spicerack] - 
https://gerrit.wikimedia.org/r/455510 (https://phabricator.wikimedia.org/T199079) (owner: Volans) [15:59:47] (CR) Volans: [C: 2] mediawiki: set timeout for requests [software/spicerack] - https://gerrit.wikimedia.org/r/455511 (https://phabricator.wikimedia.org/T199079) (owner: Volans) [15:59:48] bwahahahahaha ffreeeeeeeee [15:59:50] im freeeeeeee [15:59:57] its all apergos problem now! [16:00:01] * robh runs away cackling [16:00:28] ah good, I was just looking at the bazillion access requests [16:00:40] lol [16:01:37] (Merged) jenkins-bot: mediawiki: set timeout for requests [software/spicerack] - https://gerrit.wikimedia.org/r/455511 (https://phabricator.wikimedia.org/T199079) (owner: Volans) [16:02:11] (CR) Volans: [C: 2] tests: skip tests when fixture is not available [software/spicerack] - https://gerrit.wikimedia.org/r/455512 (https://phabricator.wikimedia.org/T199079) (owner: Volans) [16:03:58] (Merged) jenkins-bot: tests: skip tests when fixture is not available [software/spicerack] - https://gerrit.wikimedia.org/r/455512 (https://phabricator.wikimedia.org/T199079) (owner: Volans) [16:04:36] PROBLEM - puppet last run on db1083 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [16:05:21] (CR) Volans: [C: 2] confctl: fix dry-run log message [software/spicerack] - https://gerrit.wikimedia.org/r/455513 (https://phabricator.wikimedia.org/T199079) (owner: Volans) [16:06:15] (Merged) jenkins-bot: confctl: fix dry-run log message [software/spicerack] - https://gerrit.wikimedia.org/r/455513 (https://phabricator.wikimedia.org/T199079) (owner: Volans) [16:06:37] (CR) Volans: [C: 2] dnsdisc: add retry decorator to check_ttl() [software/spicerack] - https://gerrit.wikimedia.org/r/455514 (https://phabricator.wikimedia.org/T199079) (owner: Volans) [16:07:50] (Merged) jenkins-bot: dnsdisc: add retry decorator to check_ttl() [software/spicerack] - https://gerrit.wikimedia.org/r/455514 (https://phabricator.wikimedia.org/T199079) (owner: Volans) [16:08:13] (CR) Volans: [C: 2] cookbook: simplify cookbook return value [software/spicerack] - https://gerrit.wikimedia.org/r/455515 (https://phabricator.wikimedia.org/T199079) (owner: Volans) [16:09:15] (Merged) jenkins-bot: cookbook: simplify cookbook return value [software/spicerack] - https://gerrit.wikimedia.org/r/455515 (https://phabricator.wikimedia.org/T199079) (owner: Volans) [16:09:44] (CR) Kosta Harlan: [C: 1] Enable 'PageTriage' log channel (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/455581 (https://phabricator.wikimedia.org/T202815) (owner: Sbisson) [16:12:34] (CR) Andrew Bogott: [C: 1] nfs-exportd: switch iteration from items to values for IPs [puppet] - https://gerrit.wikimedia.org/r/455582 (owner: Bstorm) [16:14:57] Operations, SRE-Access-Requests, Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (Marostegui) [16:16:16] Operations, ops-codfw: Degraded RAID on db2033 - https://phabricator.wikimedia.org/T201757 (Marostegui) This finished fine! 
``` logicaldrive 1 (3.3 TB, RAID 1+0, OK) physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, OK) physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 600 GB, OK)... [16:18:01] Operations, Discovery, Wikidata, Wikidata-Query-Service, Patch-For-Review: WDQS diskspace is low - https://phabricator.wikimedia.org/T196485 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['wdqs2003.codfw.wmnet'] ``` and were **ALL** successful. [16:18:28] RECOVERY - HP RAID on db2033 is OK: OK: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Controller: OK - Battery/Capacitor: OK [16:18:47] (PS1) Anomie: Test that all wikis are in one of the shard dblists [mediawiki-config] - https://gerrit.wikimedia.org/r/455587 (https://phabricator.wikimedia.org/T202904) [16:19:56] (CR) Anomie: "PS1 is going to fail the new test. Which shard is advisorswiki in, s3?" [mediawiki-config] - https://gerrit.wikimedia.org/r/455587 (https://phabricator.wikimedia.org/T202904) (owner: Anomie) [16:20:40] (CR) jerkins-bot: [V: -1] Test that all wikis are in one of the shard dblists [mediawiki-config] - https://gerrit.wikimedia.org/r/455587 (https://phabricator.wikimedia.org/T202904) (owner: Anomie) [16:22:58] (CR) Anomie: "See T202904" [mediawiki-config] - https://gerrit.wikimedia.org/r/426762 (https://phabricator.wikimedia.org/T189181) (owner: Reedy) [16:30:00] (CR) Alex Monk: "Well this is strange. I removed the cherry-pick but shinken's puppet checks blew up in -releng. 
But puppet still looked okay when I ran it" [puppet] - https://gerrit.wikimedia.org/r/336840 (owner: Hashar) [16:31:06] (PS1) Arturo Borrero Gonzalez: cloudvps: add PTR record for labtestcontrol2003.wikimedia.org IPv6 [dns] - https://gerrit.wikimedia.org/r/455591 [16:31:47] (CR) Arturo Borrero Gonzalez: [C: 2] cloudvps: add PTR record for labtestcontrol2003.wikimedia.org IPv6 [dns] - https://gerrit.wikimedia.org/r/455591 (owner: Arturo Borrero Gonzalez) [16:31:53] !log Running populateContentTables.php on advisorswiki for T183488 and T202904 [16:31:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:31:59] T183488: MCR schema migration stage 2: populate new fields - https://phabricator.wikimedia.org/T183488 [16:32:00] T202904: advisorswiki is not in any s?.dblist - https://phabricator.wikimedia.org/T202904 [16:32:14] robh: btw, I reviewed the maintenance notification in the calendar (and renamed them), they don't overlap (different sites) [16:34:53] RECOVERY - puppet last run on db1083 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [16:37:05] (CR) Dzahn: "thank you very much for merging :)" [puppet] - https://gerrit.wikimedia.org/r/453546 (owner: Dzahn) [16:37:32] (CR) Dzahn: "ah, didn't spot this. thank you!" [puppet] - https://gerrit.wikimedia.org/r/455522 (owner: Elukey) [16:38:05] XioNoX: cool [16:38:24] (PS1) Arturo Borrero Gonzalez: cloudvps: labtestn: allow keystone IPv6 connections from labtest [puppet] - https://gerrit.wikimedia.org/r/455592 [16:38:50] Operations, SRE-Access-Requests, User-Addshore: Requesting Access to view EventLogging data for gabriel-wmde / gbirke - https://phabricator.wikimedia.org/T202072 (RStallman-legalteam) The NDA is fully signed and on file with legal. Thanks! 
[16:39:22] (CR) Arturo Borrero Gonzalez: [C: 2] cloudvps: labtestn: allow keystone IPv6 connections from labtest [puppet] - https://gerrit.wikimedia.org/r/455592 (owner: Arturo Borrero Gonzalez) [16:46:52] Operations, SRE-Access-Requests, Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (Marostegui) Root shell access has been approved for @banyek on the SRE meeting of 27th August 2018 [16:46:53] (CR) Krinkle: Enforce that interface-admin is the only group that can edit non-own CSS/JS (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/421125 (https://phabricator.wikimedia.org/T190015) (owner: Gergő Tisza) [16:49:58] (PS6) Vgutierrez: Deliver certificates in every save mode [software/certcentral] - https://gerrit.wikimedia.org/r/454794 (https://phabricator.wikimedia.org/T199711) [16:50:00] (PS10) Vgutierrez: Implement DNS01 challenge support [software/certcentral] - https://gerrit.wikimedia.org/r/454845 (https://phabricator.wikimedia.org/T199711) [16:50:02] (PS4) Vgutierrez: Provide support in the API for different certificate save modes [software/certcentral] - https://gerrit.wikimedia.org/r/455153 (https://phabricator.wikimedia.org/T199711) [16:50:04] (PS5) Vgutierrez: [WIP] Validate challenges before pushing them to the ACME directory [software/certcentral] - https://gerrit.wikimedia.org/r/455159 (https://phabricator.wikimedia.org/T199711) [16:50:30] (CR) Vgutierrez: Deliver certificates in every save mode (1 comment) [software/certcentral] - https://gerrit.wikimedia.org/r/454794 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez) [16:51:40] (CR) jerkins-bot: [V: -1] [WIP] Validate challenges before pushing them to the ACME directory [software/certcentral] - https://gerrit.wikimedia.org/r/455159 (https://phabricator.wikimedia.org/T199711) (owner: Vgutierrez) [16:51:52] Operations, Discovery-Search (Current work): Onboarding 
Mathew Onipe - https://phabricator.wikimedia.org/T202708 (Gehel) It is not entirely clear what access we want to give @Mathew.onipe at this point. Constraints: * Matt is a contractor with a more junior profile than our usual Opsen * We need Matt to... [16:56:10] (CR) Gergő Tisza: Enforce that interface-admin is the only group that can edit non-own CSS/JS (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/421125 (https://phabricator.wikimedia.org/T190015) (owner: Gergő Tisza) [16:57:23] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [16:59:33] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [17:00:04] gehel: It is that lovely time of the day again! You are hereby commanded to deploy Wikidata Query Service weekly deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180827T1700). 
[17:00:13] jouncebot: o/ [17:00:53] (PS2) Gehel: Move daily dump load - last one finished at 5:58 [puppet] - https://gerrit.wikimedia.org/r/455504 (owner: Smalyshev) [17:01:38] (PS2) Bstorm: nfs-exportd: switch iteration from items to values for IPs [puppet] - https://gerrit.wikimedia.org/r/455582 [17:01:40] (CR) Volans: "one question inline" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/455553 (https://phabricator.wikimedia.org/T199073) (owner: Filippo Giunchedi) [17:01:47] (CR) Gehel: [C: 2] Move daily dump load - last one finished at 5:58 [puppet] - https://gerrit.wikimedia.org/r/455504 (owner: Smalyshev) [17:04:50] !log gehel@deploy1001 Started deploy [wdqs/wdqs@22869f0]: new version of wdqs GUI and updater (wdqs1009 only) [17:04:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:05:17] !log gehel@deploy1001 Finished deploy [wdqs/wdqs@22869f0]: new version of wdqs GUI and updater (wdqs1009 only) (duration: 00m 27s) [17:05:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:06:50] (PS3) Bstorm: nfs-exportd: switch iteration from items to values for IPs [puppet] - https://gerrit.wikimedia.org/r/455582 [17:07:15] (CR) Krinkle: Enable 'PageTriage' log channel (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/455581 (https://phabricator.wikimedia.org/T202815) (owner: Sbisson) [17:07:18] !log gehel@deploy1001 Started deploy [wdqs/wdqs@22869f0]: new version of wdqs GUI and updater [17:07:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:07:59] (CR) Bstorm: [C: 2] nfs-exportd: switch iteration from items to values for IPs [puppet] - https://gerrit.wikimedia.org/r/455582 (owner: Bstorm) [17:09:02] (PS1) Volans: Initial debian packaging [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/455597 (https://phabricator.wikimedia.org/T199079) [17:10:23] (CR) Volans: "This CR was 
reference to the master branch, I've rebased it to track the debian one but gerrit has opened a new CR, so abandoning this, mo" [software/spicerack] - https://gerrit.wikimedia.org/r/455516 (https://phabricator.wikimedia.org/T199079) (owner: Volans) [17:10:30] (Abandoned) Volans: Initial debian packaging [software/spicerack] - https://gerrit.wikimedia.org/r/455516 (https://phabricator.wikimedia.org/T199079) (owner: Volans) [17:12:18] (PS2) Sbisson: Enable 'PageTriage' log channel [mediawiki-config] - https://gerrit.wikimedia.org/r/455581 (https://phabricator.wikimedia.org/T202815) [17:13:25] (CR) Volans: [C: 2] "Merging as already +1ed in https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/455516" [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/455597 (https://phabricator.wikimedia.org/T199079) (owner: Volans) [17:13:33] PROBLEM - Keyholder SSH agent on netmon1002 is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. [17:13:42] (CR) Sbisson: Enable 'PageTriage' log channel (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/455581 (https://phabricator.wikimedia.org/T202815) (owner: Sbisson) [17:15:16] (Merged) jenkins-bot: Initial debian packaging [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/455597 (https://phabricator.wikimedia.org/T199079) (owner: Volans) [17:16:41] should recover ^ [17:16:44] RECOVERY - Keyholder SSH agent on netmon1002 is OK: OK: Keyholder is armed with all configured keys. [17:17:42] !log gehel@deploy1001 Finished deploy [wdqs/wdqs@22869f0]: new version of wdqs GUI and updater (duration: 10m 24s) [17:17:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:17:46] (CR) Alex Monk: "Yeah so I don't know what happened there but this is no longer cherry-picked and puppet appears happy and shinken appears happy. 
Feel free" [puppet] - https://gerrit.wikimedia.org/r/336840 (owner: Hashar) [17:18:56] (PS1) Niharika29: Add TemplateWizard extension [mediawiki-config] - https://gerrit.wikimedia.org/r/455598 [17:21:13] (Abandoned) Hashar: interface: IPAddr.new() requires an address family [puppet] - https://gerrit.wikimedia.org/r/336840 (owner: Hashar) [17:21:52] (CR) Arturo Borrero Gonzalez: [C: 1] Delegate 185.15.56.0/24 to labs-ns0/ns1 [dns] - https://gerrit.wikimedia.org/r/445303 (https://phabricator.wikimedia.org/T199374) (owner: Andrew Bogott) [17:22:07] SMalyshev: wdqs deployment completed, tests are green [17:22:10] Operations: add perf-root admins to webserver misc static servers - https://phabricator.wikimedia.org/T202910 (Krenair) [17:23:06] Operations, Performance-Team, SRE-Access-Requests: add perf-root admins to webserver misc static servers - https://phabricator.wikimedia.org/T202910 (Dzahn) [17:23:30] (PS1) Niharika29: Add TemplateWizard extension [mediawiki-config] - https://gerrit.wikimedia.org/r/455600 [17:25:08] (PS1) Niharika29: Add TemplateWizard extension [mediawiki-config] - https://gerrit.wikimedia.org/r/455601 [17:25:23] (PS1) Dzahn: admin: add perf-roots to webserver_misc_static [puppet] - https://gerrit.wikimedia.org/r/455602 (https://phabricator.wikimedia.org/T202910) [17:26:01] Operations, Performance-Team, SRE-Access-Requests: add perf-root admins to webserver misc static servers - https://phabricator.wikimedia.org/T202910 (Dzahn) [17:27:24] !log uploaded spicerack_0.0.1-1_amd64.deb to apt.wikimedia.org jessie-wikimedia - T199079 [17:27:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:27:29] T199079: Refactor the switchdc script - https://phabricator.wikimedia.org/T199079 [17:30:07] (PS5) Volans: spicerack, cookbooks: install and configure [puppet] - https://gerrit.wikimedia.org/r/455562 (https://phabricator.wikimedia.org/T199079) [17:32:02] 
Operations, Analytics, Traffic, Services (blocked): Add Accept header to webrequest logs - https://phabricator.wikimedia.org/T170606 (Ottomata) @Pchelolo we discussed this in standup today. If the data you need is small enough (can we filter on a URI?) and you only need a sample (say from a sing... [17:32:10] Operations, Performance-Team, SRE-Access-Requests: add perf-root admins to webserver misc static servers - https://phabricator.wikimedia.org/T202910 (Krinkle) @Dzahn I think you meant `perf-team`, not `perf-roots`. The perf-team group is for services owned/maintained by Performance that all team memb... [17:32:22] (CR) Legoktm: "> Patch Set 1:" [mediawiki-config] - https://gerrit.wikimedia.org/r/455587 (https://phabricator.wikimedia.org/T202904) (owner: Anomie) [17:35:00] (CR) Krinkle: admin: add perf-roots to webserver_misc_static (1 comment) [puppet] - https://gerrit.wikimedia.org/r/455602 (https://phabricator.wikimedia.org/T202910) (owner: Dzahn) [17:36:43] Operations, Analytics, Analytics-Kanban, netops, Patch-For-Review: Review analytics-in4/6 rules on cr1/cr2 eqiad - https://phabricator.wikimedia.org/T198623 (ayounsi) ```lang=diff [edit firewall family inet filter analytics-in4 term default then] - reject; + discard; [edit firewal... [17:36:49] (CR) Volans: "Compiler results here:" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/455562 (https://phabricator.wikimedia.org/T199079) (owner: Volans) [17:37:05] !log pushing the above analytics-in changes to cr1/2-eqiad - T198623 [17:37:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:37:10] T198623: Review analytics-in4/6 rules on cr1/cr2 eqiad - https://phabricator.wikimedia.org/T198623 [17:40:01] Operations, Analytics, Traffic, Services (blocked): Add Accept header to webrequest logs - https://phabricator.wikimedia.org/T170606 (Pchelolo) > If the data you need is small enough (can we filter on a URI?) 
and you only need a sample (say from a single cache host), AND if traffic folks don't mi... [17:43:33] (PS2) Dzahn: admin: add perf-team to webserver_misc_static [puppet] - https://gerrit.wikimedia.org/r/455602 (https://phabricator.wikimedia.org/T202910) [17:44:13] Operations, Performance-Team, SRE-Access-Requests: add perf-team admins to webserver misc static servers - https://phabricator.wikimedia.org/T202910 (Dzahn) [17:44:44] Operations, Performance-Team, SRE-Access-Requests: add perf-team admins to webserver misc static servers - https://phabricator.wikimedia.org/T202910 (Dzahn) Ok, thanks. Renamed the ticket and amended the patch accordingly. [17:45:23] PROBLEM - DPKG on analytics-tool1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [17:45:49] (PS3) Dzahn: admin: add perf-team to webserver_misc_static [puppet] - https://gerrit.wikimedia.org/r/455602 (https://phabricator.wikimedia.org/T202910) [17:46:04] PROBLEM - Check systemd state on analytics-tool1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [17:46:56] Operations, Performance-Team, SRE-Access-Requests: add perf-team admins to webserver misc static servers - https://phabricator.wikimedia.org/T202910 (Dzahn) for the record. "perf-team" also means "root access" in this context. 
[17:48:29] (PS3) Ppchelko: Replace the semver patch version in Accept with x [puppet] - https://gerrit.wikimedia.org/r/455036 (https://phabricator.wikimedia.org/T202682) [17:49:09] (PS3) Dzahn: Phab: Allow aklapper to delete personal Herald filter rules [puppet] - https://gerrit.wikimedia.org/r/448505 (https://phabricator.wikimedia.org/T202503) (owner: Aklapper) [17:50:22] Operations, ops-eqsin, Traffic: cp5001 unreachable since 2018-07-14 17:49:21 - https://phabricator.wikimedia.org/T199675 (RobH) > Equinix Support, > > Please note this ticket is for both a site access/visitor ticket for two Unisys Engineers as well as a SmartHands Escort & Supervise ticket for those... [17:50:59] (CR) Smalyshev: [C: -1] "Temporary blocked by yet another bug." [puppet] - https://gerrit.wikimedia.org/r/454067 (https://phabricator.wikimedia.org/T201217) (owner: Smalyshev) [17:54:50] (PS1) Fdans: Add druid snapshot removal cron job [puppet] - https://gerrit.wikimedia.org/r/455605 (https://phabricator.wikimedia.org/T197889) [17:55:42] (CR) jerkins-bot: [V: -1] Add druid snapshot removal cron job [puppet] - https://gerrit.wikimedia.org/r/455605 (https://phabricator.wikimedia.org/T197889) (owner: Fdans) [17:57:33] (PS2) Fdans: Add druid snapshot removal cron job [puppet] - https://gerrit.wikimedia.org/r/455605 (https://phabricator.wikimedia.org/T197889) [17:58:46] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted production access for Bill Pirkle - https://phabricator.wikimedia.org/T202546 (RobH) This was approved in today's SRE team meeting. [17:58:49] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted production access and analytics-privatedata-users for Samuel Guebo - https://phabricator.wikimedia.org/T202362 (RobH) This was approved in today's SRE team meeting. 
[17:58:53] Operations, SRE-Access-Requests, Patch-For-Review: Phabricator: Allow aklapper to delete personal Herald filter rules - https://phabricator.wikimedia.org/T202503 (ArielGlenn) Approved in SRE meeting. [17:58:55] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted production access and analytics-privatedata-users for Kalliope Tsouroupidou - https://phabricator.wikimedia.org/T202486 (RobH) This was approved in today's SRE team meeting. [17:59:02] Operations, SRE-Access-Requests: Requesting access to restricted production access and analytics-privatedata-users for Ty Hargrove - https://phabricator.wikimedia.org/T202363 (RobH) This was approved in today's SRE team meeting. [17:59:09] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted production access and analytics-privatedata-users for Karen Brown - https://phabricator.wikimedia.org/T201668 (RobH) This was approved in today's SRE team meeting. [17:59:43] Operations, SRE-Access-Requests: request to add phendeskog to perf-roots - https://phabricator.wikimedia.org/T202658 (ArielGlenn) Approved in SRE meeting. [18:00:01] Operations, SRE-Access-Requests: request to add imarlier to perf-roots - https://phabricator.wikimedia.org/T202657 (ArielGlenn) Approved in SRE meeting. [18:00:05] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: My dear minions, it's time we take the moon! Just kidding. Time for Morning SWAT (Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180827T1800). [18:00:05] stephanebisson: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[18:00:14] hello [18:00:23] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Please add aaron to perf-team - https://phabricator.wikimedia.org/T202650 (10ArielGlenn) Approved in SRE meeting. [18:00:27] o/ [18:01:17] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to restricted production access for Bill Pirkle - https://phabricator.wikimedia.org/T202546 (10ArielGlenn) Approved in SRE meeting. [18:01:22] greetings [18:01:56] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to restricted production access and analytics-privatedata-users for Kalliope Tsouroupidou - https://phabricator.wikimedia.org/T202486 (10ArielGlenn) Approved in SRE meeting. [18:02:34] (03PS2) 10RobH: adding user Bill Pirkle [puppet] - 10https://gerrit.wikimedia.org/r/454709 (https://phabricator.wikimedia.org/T202546) [18:03:19] (03CR) 10jerkins-bot: [V: 04-1] adding user Bill Pirkle [puppet] - 10https://gerrit.wikimedia.org/r/454709 (https://phabricator.wikimedia.org/T202546) (owner: 10RobH) [18:03:33] yea yea missed a << [18:03:36] (03PS3) 10RobH: adding user Bill Pirkle [puppet] - 10https://gerrit.wikimedia.org/r/454709 (https://phabricator.wikimedia.org/T202546) [18:03:52] (03CR) 10RobH: [C: 032] adding user Bill Pirkle [puppet] - 10https://gerrit.wikimedia.org/r/454709 (https://phabricator.wikimedia.org/T202546) (owner: 10RobH) [18:05:32] (03CR) 10Dzahn: [C: 032] "was approved in SRE meeting" [puppet] - 10https://gerrit.wikimedia.org/r/448505 (https://phabricator.wikimedia.org/T202503) (owner: 10Aklapper) [18:05:37] (03PS2) 10Framawiki: Add mhs.ox.ac.uk to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454616 (https://phabricator.wikimedia.org/T201604) [18:05:41] (03PS4) 10Dzahn: Phab: Allow aklapper to delete personal Herald filter rules [puppet] - 10https://gerrit.wikimedia.org/r/448505 (https://phabricator.wikimedia.org/T202503) (owner: 10Aklapper) [18:08:15] (03Abandoned) 10RobH: adding bill pirkle 
to groups [puppet] - 10https://gerrit.wikimedia.org/r/454710 (https://phabricator.wikimedia.org/T202546) (owner: 10RobH) [18:08:24] Anyone available to SWAT? [18:08:24] (03PS3) 10Framawiki: Throttle exception for 2018-08-29 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454160 (https://phabricator.wikimedia.org/T202288) [18:09:38] (03PS1) 10RobH: adding bpirkle to restricted [puppet] - 10https://gerrit.wikimedia.org/r/455609 (https://phabricator.wikimedia.org/T202546) [18:10:02] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Phabricator: Allow aklapper to delete personal Herald filter rules - https://phabricator.wikimedia.org/T202503 (10Framawiki) Merged, can this task be closed ? [18:10:54] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Phabricator: Allow aklapper to delete personal Herald filter rules - https://phabricator.wikimedia.org/T202503 (10RobH) a:03Aklapper Once @aklapper confirms the rule works as intended and he can delete heralds, yep. [18:11:00] (03CR) 10RobH: [C: 032] adding bpirkle to restricted [puppet] - 10https://gerrit.wikimedia.org/r/455609 (https://phabricator.wikimedia.org/T202546) (owner: 10RobH) [18:12:00] !log trunking cloud-instances1-b-codfw to labtestneutron2002:eth1 and labtestneutron2001:eth1 - T202636 [18:12:04] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Phabricator: Allow aklapper to delete personal Herald filter rules - https://phabricator.wikimedia.org/T202503 (10Dzahn) 05Open>03Resolved Yes, now that puppet ran on phab1001 and phab2001 and edited the sudo privileges. +%phabricator-admin ALL =... 
[18:12:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:12:05] T202636: Allow routing between eqiad and eqiad1 regions - https://phabricator.wikimedia.org/T202636 [18:12:32] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to restricted production access for Bill Pirkle - https://phabricator.wikimedia.org/T202546 (10RobH) Access is now live. Please note it can take up to 30 minutes for all affected hosts to receive the update. If there are any issues,... [18:12:45] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to restricted production access for Bill Pirkle - https://phabricator.wikimedia.org/T202546 (10RobH) 05Open>03Resolved [18:13:26] Anybody for a swat window ? [18:14:22] (03PS2) 10RobH: adding user Samuel Guebo to admin module [puppet] - 10https://gerrit.wikimedia.org/r/454578 (https://phabricator.wikimedia.org/T202362) [18:14:30] kostajh and I need to SWAT a fix for a production issue (T202815). It's kind of important. Anyone? Please? [18:14:31] T202815: [wmf.18] enwiki NPP page - no scroll - https://phabricator.wikimedia.org/T202815 [18:15:04] (03PS3) 10RobH: adding user Samuel Guebo to admin module [puppet] - 10https://gerrit.wikimedia.org/r/454578 (https://phabricator.wikimedia.org/T202362) [18:15:06] (03CR) 10jerkins-bot: [V: 04-1] adding user Samuel Guebo to admin module [puppet] - 10https://gerrit.wikimedia.org/r/454578 (https://phabricator.wikimedia.org/T202362) (owner: 10RobH) [18:15:25] (03CR) 10RobH: [C: 032] adding user Samuel Guebo to admin module [puppet] - 10https://gerrit.wikimedia.org/r/454578 (https://phabricator.wikimedia.org/T202362) (owner: 10RobH) [18:15:59] 10Operations, 10SRE-Access-Requests: Access to restbase servers (including sudo) for Imarlier - https://phabricator.wikimedia.org/T202563 (10Dzahn) The comment on T202910#4535341 describes the difference between the use-cases for perf-roots and perf-team. 
Even though the name might suggest otherwise it doesn't... [18:18:09] (03PS3) 10Ayounsi: Per DC alerting on sudden traffic drop [puppet] - 10https://gerrit.wikimedia.org/r/454613 (https://phabricator.wikimedia.org/T201630) [18:19:11] godog: going to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/454613 [18:19:33] (03CR) 10Ayounsi: [C: 032] Per DC alerting on sudden traffic drop [puppet] - 10https://gerrit.wikimedia.org/r/454613 (https://phabricator.wikimedia.org/T201630) (owner: 10Ayounsi) [18:20:14] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to restricted production access and analytics-privatedata-users for Karen Brown - https://phabricator.wikimedia.org/T201668 (10Dzahn) 05stalled>03Open [18:21:32] !log merge Per DC alerting on sudden traffic drop (454613) - T201630 [18:21:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:21:37] T201630: False alarms on varnish-http-requests 70% GET drop in 30 min alert - https://phabricator.wikimedia.org/T201630 [18:22:38] (03Abandoned) 10RobH: adding user Samuel Guebo to groups in the admin module [puppet] - 10https://gerrit.wikimedia.org/r/454581 (https://phabricator.wikimedia.org/T202362) (owner: 10RobH) [18:23:16] (03PS1) 10RobH: adding sguebo to restricted and analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/455613 (https://phabricator.wikimedia.org/T202362) [18:23:41] (03CR) 10RobH: [C: 032] adding sguebo to restricted and analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/455613 (https://phabricator.wikimedia.org/T202362) (owner: 10RobH) [18:27:05] (03PS1) 10Arturo Borrero Gonzalez: cloudvps: labtestn: add network link with labtest deployment network [puppet] - 10https://gerrit.wikimedia.org/r/455615 (https://phabricator.wikimedia.org/T202636) [18:27:21] (03CR) 10Catrope: [C: 032] Enable 'PageTriage' log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455581 
(https://phabricator.wikimedia.org/T202815) (owner: 10Sbisson) [18:27:28] (03PS1) 10Dzahn: admins: add karen to restricted, analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/455616 (https://phabricator.wikimedia.org/T201668) [18:27:42] I'll do the SWAT [18:28:28] RoanKattouw: thank you [18:31:09] RECOVERY - Check systemd state on analytics-tool1001 is OK: OK - running: The system is fully operational [18:31:28] RECOVERY - DPKG on analytics-tool1001 is OK: All packages OK [18:31:39] (03Merged) 10jenkins-bot: Enable 'PageTriage' log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455581 (https://phabricator.wikimedia.org/T202815) (owner: 10Sbisson) [18:34:28] RECOVERY - Hue Server on analytics-tool1001 is OK: PROCS OK: 1 process with command name python2.7, args /usr/lib/hue/build/env/bin/hue [18:34:46] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable PageTriage log channel (T202815) (duration: 00m 57s) [18:34:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:34:52] T202815: [wmf.18] enwiki NPP page - no scroll - https://phabricator.wikimedia.org/T202815 [18:35:58] RECOVERY - puppet last run on analytics-tool1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:37:57] (03CR) 10Dzahn: [C: 032] admins: add karen to restricted, analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/455616 (https://phabricator.wikimedia.org/T201668) (owner: 10Dzahn) [18:38:19] (03CR) 10jenkins-bot: Enable 'PageTriage' log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455581 (https://phabricator.wikimedia.org/T202815) (owner: 10Sbisson) [18:38:36] (03CR) 10Dzahn: [C: 032] "access copies what James Alexander has" [puppet] - 10https://gerrit.wikimedia.org/r/455616 (https://phabricator.wikimedia.org/T201668) (owner: 10Dzahn) [18:39:10] kostajh: The logging channel patch is deployed. 
The PageTriage one is still finishing Jenkins [18:39:25] (03CR) 10Catrope: [C: 032] Add mhs.ox.ac.uk to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454616 (https://phabricator.wikimedia.org/T201604) (owner: 10Framawiki) [18:39:41] (03CR) 10Catrope: [C: 032] Throttle exception for 2018-08-29 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454160 (https://phabricator.wikimedia.org/T202288) (owner: 10Framawiki) [18:40:06] RoanKattouw: thanks [18:42:59] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to restricted production access and analytics-privatedata-users for Karen Brown - https://phabricator.wikimedia.org/T201668 (10Dzahn) 05Open>03Resolved a:03Dzahn @Jalexander @Kbrown You have the same admin groups now. (Or at lea... [18:43:08] 10Operations, 10SRE-Access-Requests: Access to restbase servers (including sudo) for Imarlier - https://phabricator.wikimedia.org/T202563 (10Imarlier) @RobH Sorry about that, I missed your followup. I _think_ that restbase-root is most appropriate, but could be misevaluating. The questions that have been r... 
[18:43:14] (03Merged) 10jenkins-bot: Throttle exception for 2018-08-29 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454160 (https://phabricator.wikimedia.org/T202288) (owner: 10Framawiki) [18:43:18] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to restricted production access and analytics-privatedata-users for Karen Brown - https://phabricator.wikimedia.org/T201668 (10Dzahn) [18:45:01] (03PS3) 10Catrope: Add mhs.ox.ac.uk to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454616 (https://phabricator.wikimedia.org/T201604) (owner: 10Framawiki) [18:45:30] 10Operations, 10Traffic, 10monitoring, 10Patch-For-Review: False alarms on varnish-http-requests 70% GET drop in 30 min alert - https://phabricator.wikimedia.org/T201630 (10ayounsi) 05Open>03Resolved All checks are green: https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=traffic+drop... [18:45:32] (03CR) 10Catrope: [C: 032] Add mhs.ox.ac.uk to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454616 (https://phabricator.wikimedia.org/T201604) (owner: 10Framawiki) [18:47:03] !log catrope@deploy1001 Synchronized wmf-config/throttle.php: Throttle exception for 2018-08-29 (T202288) (duration: 00m 48s) [18:47:04] godog: any concerns about https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/402758/ ? 
was close to merging [18:47:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:47:08] T202288: Temporary lift of IP cap for WikiGap Göteborg 2018-08-29 - https://phabricator.wikimedia.org/T202288 [18:47:18] (03Merged) 10jenkins-bot: Add mhs.ox.ac.uk to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454616 (https://phabricator.wikimedia.org/T201604) (owner: 10Framawiki) [18:50:41] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Add mhs.ox.ac.uk to $wgCopyUploadsDomains (T201604) (duration: 00m 48s) [18:50:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:50:46] T201604: Add mhs.ox.ac.uk to $wgCopyUploadsDomains - https://phabricator.wikimedia.org/T201604 [18:51:08] RoanKattouw: Thanks ! [18:52:51] kostajh: stephanebisson The PageTriage patch is now on mwdebug1002, please test it there (or tell me if that's not feasible) [18:54:12] RoanKattouw: looking now... [18:54:15] RoanKattouw: tested, seems to work. 
[18:54:28] stephanebisson: load the feed, switch to sort by Oldest, you should be able to pull in more results [18:54:47] (03CR) 10jenkins-bot: Throttle exception for 2018-08-29 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454160 (https://phabricator.wikimedia.org/T202288) (owner: 10Framawiki) [18:54:49] (03CR) 10jenkins-bot: Add mhs.ox.ac.uk to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454616 (https://phabricator.wikimedia.org/T201604) (owner: 10Framawiki) [18:55:03] yeah, seem to work [18:55:07] RoanKattouw: ^ [18:55:17] OK deploying [18:56:08] !log catrope@deploy1001 Synchronized php-1.32.0-wmf.18/extensions/PageTriage/includes/ArticleMetadata.php: Allow deferred writes on GET for pages with missing metadata (T202815) (duration: 00m 47s) [18:56:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:56:13] T202815: [wmf.18] enwiki NPP page - no scroll - https://phabricator.wikimedia.org/T202815 [18:57:50] 10Operations, 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Move internal sites hosted on thorium to ganeti instance(s) - https://phabricator.wikimedia.org/T202011 (10Ottomata) a:03Ottomata [18:58:15] !log pushing labs-instance-in4 changes to cr1/2-eqiad - T199437 [18:58:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:58:55] RoanKattouw: thanks! 
[18:59:16] (03PS1) 10Ottomata: Add comment about Jessie deps in Stretch for Hue [puppet] - 10https://gerrit.wikimedia.org/r/455618 (https://phabricator.wikimedia.org/T202011) [18:59:36] (03PS2) 10Ottomata: Add comment about Jessie deps in Stretch for Hue [puppet] - 10https://gerrit.wikimedia.org/r/455618 (https://phabricator.wikimedia.org/T202011) [19:01:01] (03CR) 10Ottomata: [C: 032] Add comment about Jessie deps in Stretch for Hue [puppet] - 10https://gerrit.wikimedia.org/r/455618 (https://phabricator.wikimedia.org/T202011) (owner: 10Ottomata) [19:02:29] Are people aware that labstestwiki is throwing errors due to a read-only DB? (cc mutante ) [19:04:32] (03CR) 10Krinkle: Test that all wikis are in one of the shard dblists (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455587 (https://phabricator.wikimedia.org/T202904) (owner: 10Anomie) [19:05:46] RoanKattouw: i am not aware, but i dont know about labtestwiki, it's testing for new horizon afaik. i'll forward it to cloud [19:06:54] OK. Nothing urgent, it just jumped out at me when looking at logstash post-deploy [19:06:57] RoanKattouw: yes, people are aware i hear [19:07:35] (03PS1) 10Andrew Bogott: wmf_sink: handle exceptions during proxy cleanup [puppet] - 10https://gerrit.wikimedia.org/r/455619 [19:08:13] (03CR) 10jerkins-bot: [V: 04-1] wmf_sink: handle exceptions during proxy cleanup [puppet] - 10https://gerrit.wikimedia.org/r/455619 (owner: 10Andrew Bogott) [19:09:16] (03PS2) 10Andrew Bogott: wmf_sink: handle exceptions during proxy cleanup [puppet] - 10https://gerrit.wikimedia.org/r/455619 [19:09:30] 10Operations, 10Performance-Team, 10Wikidata, 10Wikidata-Query-Service: Wikidata produces a lot of failed requests for recentchanges API - https://phabricator.wikimedia.org/T202764 (10Ladsgroup) CC @Lydia_Pintscher @WMDE-leszek [19:09:56] (03CR) 10Herron: [C: 031] "Looks like a noop in prod, and seems a nice way to add some flexibility to this profile. Thanks for this Alex! 
Will merge and perform a " [puppet] - 10https://gerrit.wikimedia.org/r/439791 (https://phabricator.wikimedia.org/T87338) (owner: 10Alex Monk) [19:11:28] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 26 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map [19:11:49] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 27 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/11645088/#!map [19:12:22] (03CR) 10Andrew Bogott: [C: 032] wmf_sink: handle exceptions during proxy cleanup [puppet] - 10https://gerrit.wikimedia.org/r/455619 (owner: 10Andrew Bogott) [19:12:40] (03CR) 10Smalyshev: [C: 031] Enable daily category diffs on test [puppet] - 10https://gerrit.wikimedia.org/r/454067 (https://phabricator.wikimedia.org/T201217) (owner: 10Smalyshev) [19:12:58] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 23 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map [19:13:08] (03PS2) 10Gehel: Enable daily category diffs on test [puppet] - 10https://gerrit.wikimedia.org/r/454067 (https://phabricator.wikimedia.org/T201217) (owner: 10Smalyshev) [19:14:09] (03CR) 10Gehel: [C: 032] Enable daily category diffs on test [puppet] - 10https://gerrit.wikimedia.org/r/454067 (https://phabricator.wikimedia.org/T201217) (owner: 10Smalyshev) [19:15:55] (03PS2) 10Anomie: Test that all wikis are in one of the shard dblists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455587 (https://phabricator.wikimedia.org/T202904) [19:15:59] (03CR) 10Anomie: Test that all wikis are in one of the shard dblists (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455587 (https://phabricator.wikimedia.org/T202904) (owner: 10Anomie) [19:17:39] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 20 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map 
[19:17:58] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 18 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map [19:19:55] !log adding IX bgp session to AS10089 on cr1-eqsin [19:19:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:20:14] (03CR) 10Alex Monk: "I doubt it's being used in many places outside prod, deployment-prep just tries to copy prod so a lot of use cases that don't come up in t" [puppet] - 10https://gerrit.wikimedia.org/r/439791 (https://phabricator.wikimedia.org/T87338) (owner: 10Alex Monk) [19:21:38] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 15 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map [19:21:58] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 15 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/11645088/#!map [19:22:25] (03CR) 10Legoktm: [C: 031] Test that all wikis are in one of the shard dblists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455587 (https://phabricator.wikimedia.org/T202904) (owner: 10Anomie) [19:22:48] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 18 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map [19:23:47] (03PS1) 10Urbanecm: Upload new logos for advisorswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455622 (https://phabricator.wikimedia.org/T202844) [19:23:49] (03PS1) 10Urbanecm: Use new logos for advisorywiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455623 (https://phabricator.wikimedia.org/T202844) [19:24:48] (03PS1) 10Jforrester: Follow-up 0629eb9: Fix outdated reference to user group name [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455624 [19:32:03] (03PS3) 10Herron: exim: Permit DKIM domain to be changed by hiera [puppet] - 10https://gerrit.wikimedia.org/r/439791 (https://phabricator.wikimedia.org/T87338) (owner: 
10Alex Monk) [19:32:13] (03PS1) 10Urbanecm: Translation of scnwiktionary sitename was removed, add it back [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455627 (https://phabricator.wikimedia.org/T202926) [19:33:58] (03CR) 10Herron: [C: 032] exim: Permit DKIM domain to be changed by hiera [puppet] - 10https://gerrit.wikimedia.org/r/439791 (https://phabricator.wikimedia.org/T87338) (owner: 10Alex Monk) [19:34:10] 10Operations, 10monitoring, 10netops, 10User-fgiunchedi: Update ACLs for newer graphite hosts - https://phabricator.wikimedia.org/T202846 (10ayounsi) ```lang=diff [edit firewall family inet filter analytics-in4 term graphite from destination-address] + /* graphite1004 */ + 10.64.16.149/32;... [19:34:17] (03PS1) 10Urbanecm: Remove uzwiki from commonsuploads.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455628 (https://phabricator.wikimedia.org/T202847) [19:34:55] !log adding graphite1004 and graphite2003 to analytics-in4 - T202846 [19:35:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:35:00] T202846: Update ACLs for newer graphite hosts - https://phabricator.wikimedia.org/T202846 [19:35:47] composer install seems to be failing a ton more than normal today [19:36:06] er, wrong channel [19:36:59] 10Operations, 10monitoring, 10netops, 10User-fgiunchedi: Update ACLs for newer graphite hosts - https://phabricator.wikimedia.org/T202846 (10ayounsi) 05Open>03Resolved Feel free to reopen or file a new task for the deletion part. [19:37:01] 10Operations, 10monitoring, 10Patch-For-Review, 10User-fgiunchedi: rack/setup/install graphite1004 - https://phabricator.wikimedia.org/T196484 (10ayounsi) [19:49:51] 10Operations, 10ops-ulsfo, 10Traffic, 10netops: ulsfo migration tracking - https://phabricator.wikimedia.org/T202433 (10RobH) @ayounsi: Can you review https://docs.google.com/spreadsheets/d/19f8XkjqQIKZ66uCY8vcEvqOdooTFxR8guLY6m_5yzXM/edit?usp=sharing and update the ulsfo transit/x-connections and/or confi... 
[19:52:47] (03PS2) 10Dzahn: piwik: add support for stretch/PHP7 [puppet] - 10https://gerrit.wikimedia.org/r/453553 [19:58:31] 10Operations, 10Performance-Team, 10Wikidata, 10Wikidata-Query-Service: Wikidata produces a lot of failed requests for recentchanges API - https://phabricator.wikimedia.org/T202764 (10Gehel) Digging into this a bit more from the WDQS side, we see a few interesting things: * The NoHttpResponseException see... [20:00:04] cscott, arlolra, subbu, bearND, halfak, and Amir1: It is that lovely time of the day again! You are hereby commanded to deploy Services – Parsoid / Citoid / Mobileapps / ORES / …. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180827T2000). [20:00:12] no parsoid deploy today [20:00:19] I have deploy for ores [20:04:00] 10Operations, 10SRE-Access-Requests, 10Performance-Team (Radar): add perf-team admins to webserver misc static servers - https://phabricator.wikimedia.org/T202910 (10Imarlier) [20:04:27] !log ladsgroup@deploy1001 Started deploy [ores/deploy@0e8cc73]: Deploy renaming wp10 models to articlequality (T196240) [20:04:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:04:33] T196240: Rename wp10 ORES model - https://phabricator.wikimedia.org/T196240 [20:07:05] 10Operations, 10Performance-Team, 10Wikidata, 10Wikidata-Query-Service: Wikidata produces a lot of failed requests for recentchanges API - https://phabricator.wikimedia.org/T202764 (10Imarlier) Hey, @Smalyshev -- Did you tag perf team on this because you're hoping that we can help with the investigation? 
[20:09:08] !log rolling back [20:09:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:10:06] !log ladsgroup@deploy1001 Finished deploy [ores/deploy@0e8cc73]: Deploy renaming wp10 models to articlequality (T196240) (duration: 05m 38s) [20:10:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:10:10] T196240: Rename wp10 ORES model - https://phabricator.wikimedia.org/T196240 [20:11:55] trying again [20:12:04] !log ladsgroup@deploy1001 Started deploy [ores/deploy@0e8cc73]: Deploy renaming wp10 models to articlequality (T196240) [20:12:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:13:20] you might see some down time errors here for ores [20:13:33] it's expected [20:16:42] as web and worker components of ores are different in almost all requests, we need to support blue/green system in a better way [20:17:57] Amir1: This seems like an edge case though, we rarely change the ORES key scheme. [20:18:16] awight: yeah but I have seen this type of issue before [20:18:48] one big thing is that it's pretty stressful IMO [20:18:51] Amir1: True, it's annoying that we can't easily send requests through the canary backend. Maybe we should do something like the X-Debug header. [20:20:05] Another improvement would be a mechanism to choose what percentage of load goes through the canary. [20:20:41] But for this specific deployment, we probably should have included back-compat code. [20:23:28] I don't see how it would go well deploying to all nodes, since the problem you identified a minute ago will happen with all mismatched front vs. backend code. [20:30:45] 10Operations, 10ops-ulsfo, 10Traffic, 10netops: ulsfo migration tracking - https://phabricator.wikimedia.org/T202433 (10RobH) Ok, @ayounsi and I reviewed the xconnection listing on the google sheet. It is now accurate. Do we need to have new LoAs generated for this migration, or is it enough to simply pr... 
[20:37:05] !log ladsgroup@deploy1001 Finished deploy [ores/deploy@0e8cc73]: Deploy renaming wp10 models to articlequality (T196240) (duration: 25m 01s) [20:37:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:37:10] T196240: Rename wp10 ORES model - https://phabricator.wikimedia.org/T196240 [20:37:28] 10Operations, 10ops-eqiad, 10DC-Ops, 10Data-Services, 10cloud-services-team: Decommission labstore100[12] and their disk shelves - https://phabricator.wikimedia.org/T187456 (10Bstorm) ``` labstore1001 is a Unused spare system (spare::system) labstore1002 is a Unused spare system (spare::system) ``` They... [20:38:34] 10Operations, 10ops-eqiad, 10DC-Ops, 10Data-Services, 10cloud-services-team: Decommission labstore100[12] and their disk shelves - https://phabricator.wikimedia.org/T187456 (10Bstorm) Apparently they are being held for a reason, though. They are thought of as a possible backup for labstore1003 if we can... [20:39:51] !log the ORES deployment is done [20:39:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:43:01] 10Operations, 10ops-eqiad, 10DC-Ops, 10Data-Services, 10cloud-services-team: Decommission labstore100[12] and their disk shelves - https://phabricator.wikimedia.org/T187456 (10Bstorm) So, we are waiting on T193655 The issues we've had with these new Dell systems gives me pause. So far, so good, and the... [20:46:51] 10Operations, 10Performance-Team, 10Wikidata, 10Wikidata-Query-Service: Wikidata produces a lot of failed requests for recentchanges API - https://phabricator.wikimedia.org/T202764 (10Smalyshev) @Imarlier yes, I suspect it might be Wikidata recentchanges API being slow, and I wonder if there's a way to che... 
[20:55:06] 10Operations, 10ops-ulsfo, 10Traffic, 10netops: ulsfo migration tracking - https://phabricator.wikimedia.org/T202433 (10RobH) Also the most recent copy of the Unitedlayer invoice may help track down the links we dont pay for xconnects on, or that seem to terminate with UL directly rather than DR/Telx . [21:00:04] bawolff and Reedy: I, the Bot under the Fountain, allow thee, The Deployer, to do Weekly Security deployment window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180827T2100). [21:04:11] !log krinkle@deploy1001 Synchronized wmf-config/mc.php: doc - Ic7349f4ce (duration: 00m 48s) [21:04:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:36:05] !log repooling wdqs2003, new SSD, reimaged and data loaded - T195285 [21:36:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:15:07] 10Operations, 10SRE-Access-Requests, 10wikidiff2, 10User-Addshore: Give thiemowmde permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202476 (10RStallman-legalteam) Just seeing this now, as I was on vacation last week. @thiemowmde: I'll set up the NDA for you... 
[22:29:20] (03PS2) 10Dzahn: toolserver_legacy: apache -> httpd module [puppet] - 10https://gerrit.wikimedia.org/r/448811 [22:30:34] (03CR) 10Dzahn: [C: 032] toolserver_legacy: apache -> httpd module [puppet] - 10https://gerrit.wikimedia.org/r/448811 (owner: 10Dzahn) [22:42:30] (03PS1) 10Dzahn: openstack::network: update private IP of relic.toolserver instance [puppet] - 10https://gerrit.wikimedia.org/r/455737 [22:56:35] (03PS1) 10Catrope: Revert "Create copyviobot group in beta labs" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455741 [22:56:55] (03CR) 10Catrope: [C: 04-2] "Holding this back until the linked commit in PageTriage is merged" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455741 (owner: 10Catrope) [23:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Time to snap out of that daydream and deploy Evening SWAT (Max 6 patches). Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180827T2300). [23:00:04] No GERRIT patches in the queue for this window AFAICS. [23:11:08] 10Operations, 10CommRel-Internals, 10Wikimedia-Mailing-lists: Close https://lists.wikimedia.org/mailman/listinfo/cep and keep the archive for now - https://phabricator.wikimedia.org/T155683 (10Dzahn) 05Open>03Resolved We talked about it briefly on IRC. It's possible but we'd rather not starting adding th... 
[23:19:21] (03PS1) 10Dzahn: fix typo: torrealy -> torrelay [dns] - 10https://gerrit.wikimedia.org/r/455742 (https://phabricator.wikimedia.org/T196701) [23:20:28] (03CR) 10Dzahn: [C: 032] fix typo: torrealy -> torrelay [dns] - 10https://gerrit.wikimedia.org/r/455742 (https://phabricator.wikimedia.org/T196701) (owner: 10Dzahn) [23:21:02] (03PS1) 10Ottomata: Add cache app directors for analytics_ui, superset and thorium [puppet] - 10https://gerrit.wikimedia.org/r/455743 (https://phabricator.wikimedia.org/T202011) [23:25:38] (03PS1) 10Dzahn: move tor_relay role to torrelay1001, decom radium [puppet] - 10https://gerrit.wikimedia.org/r/455744 (https://phabricator.wikimedia.org/T196701) [23:32:26] 10Operations, 10Patch-For-Review, 10Tor: rack/setup/install torrelay1001.wikimedia.org - https://phabricator.wikimedia.org/T196701 (10Dzahn) migration plan: goal: keep the same fingerprints - stop tor service on radium - rsync datadir contents (/var/lib/tor/ from radium to torrelay1001 - delete datadir and... [23:32:44] 10Operations, 10Patch-For-Review, 10Tor: rack/setup/install torrelay1001.wikimedia.org - https://phabricator.wikimedia.org/T196701 (10Dzahn) [23:38:41] (03PS1) 10Dzahn: tor_relay: temp allow rsync of datadir for migration [puppet] - 10https://gerrit.wikimedia.org/r/455745 (https://phabricator.wikimedia.org/T196701) [23:39:17] (03CR) 10jerkins-bot: [V: 04-1] tor_relay: temp allow rsync of datadir for migration [puppet] - 10https://gerrit.wikimedia.org/r/455745 (https://phabricator.wikimedia.org/T196701) (owner: 10Dzahn) [23:42:59] (03PS2) 10Dzahn: tor_relay: temp allow rsync of datadir for migration [puppet] - 10https://gerrit.wikimedia.org/r/455745 (https://phabricator.wikimedia.org/T196701) [23:43:35] (03CR) 10jerkins-bot: [V: 04-1] tor_relay: temp allow rsync of datadir for migration [puppet] - 10https://gerrit.wikimedia.org/r/455745 (https://phabricator.wikimedia.org/T196701) (owner: 10Dzahn)