[00:23:17] (PS3) Dzahn: icinga: enhance check for screen sessions, also detect tmux [puppet] - https://gerrit.wikimedia.org/r/373687 (https://phabricator.wikimedia.org/T165348)
[00:40:03] (PS4) Dzahn: icinga: enhance check for screen sessions, also detect tmux [puppet] - https://gerrit.wikimedia.org/r/373687 (https://phabricator.wikimedia.org/T165348)
[00:42:51] (PS5) Dzahn: icinga: enhance check for screen sessions, also detect tmux [puppet] - https://gerrit.wikimedia.org/r/373687 (https://phabricator.wikimedia.org/T165348)
[00:43:29] (CR) Dzahn: [C: 2] icinga: enhance check for screen sessions, also detect tmux [puppet] - https://gerrit.wikimedia.org/r/373687 (https://phabricator.wikimedia.org/T165348) (owner: Dzahn)
[01:47:56] (PS1) Dzahn: icinga/base: add screen-monitoring, whitelist hosts [puppet] - https://gerrit.wikimedia.org/r/373992 (https://phabricator.wikimedia.org/T165348)
[01:50:27] (PS2) Dzahn: icinga/base: add screen-monitoring, whitelist hosts [puppet] - https://gerrit.wikimedia.org/r/373992 (https://phabricator.wikimedia.org/T165348)
[01:57:56] (CR) Dzahn: [C: 2] "http://puppet-compiler.wmflabs.org/7602/" [puppet] - https://gerrit.wikimedia.org/r/373992 (https://phabricator.wikimedia.org/T165348) (owner: Dzahn)
[02:44:51] (PS1) Dzahn: icinga/base: add sudo privileges for screen process check [puppet] - https://gerrit.wikimedia.org/r/373994 (https://phabricator.wikimedia.org/T165348)
[02:46:44] (PS2) Dzahn: icinga/base: add sudo privileges for screen process check [puppet] - https://gerrit.wikimedia.org/r/373994 (https://phabricator.wikimedia.org/T165348)
[02:48:59] (CR) Dzahn: [C: 2] icinga/base: add sudo privileges for screen process check [puppet] - https://gerrit.wikimedia.org/r/373994 (https://phabricator.wikimedia.org/T165348) (owner: Dzahn)
[02:56:28] Operations, monitoring, Patch-For-Review: Check long-running screen/tmux sessions - https://phabricator.wikimedia.org/T165348#3554994 (Dzahn) An example works now on netmon2001: https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=netmon2001&service=Long+running+screen%2Ftmux Current...
[03:01:45] (PS2) Dzahn: Releases: Proxy jenkins to main apache instance [puppet] - https://gerrit.wikimedia.org/r/373380 (owner: Chad)
[03:02:14] (CR) Dzahn: [C: 2] Releases: Proxy jenkins to main apache instance [puppet] - https://gerrit.wikimedia.org/r/373380 (owner: Chad)
[03:03:53] (CR) Dzahn: "Could not restart Service[apache2]: Execution of '/usr/sbin/service apache2 reload' returned 1: Job for apache2.service failed because the" [puppet] - https://gerrit.wikimedia.org/r/373380 (owner: Chad)
[03:03:58] (PS1) Dzahn: Revert "Releases: Proxy jenkins to main apache instance" [puppet] - https://gerrit.wikimedia.org/r/373996
[03:04:22] (CR) Dzahn: [C: 2] "please test - Could not restart Service[apache2]: Execution of '/usr/sbin/service apache2 reload' returned 1: Job for apache2.service fail" [puppet] - https://gerrit.wikimedia.org/r/373996 (owner: Dzahn)
[03:13:12] (PS1) Dzahn: releases: load Apache mod_proxy/mod_proxy_http [puppet] - https://gerrit.wikimedia.org/r/373997
[03:15:12] (PS2) Dzahn: releases: load Apache mod_proxy/mod_proxy_http [puppet] - https://gerrit.wikimedia.org/r/373997
[03:25:16] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 700.69 seconds
[03:36:18] Operations, Analytics, Analytics-Wikistats, Wikidata, and 6 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3555004 (Jayprakash12345) Who can connect hiwikiversity to labs. xtools.wmflabs.org does not work yet.
[03:39:53] (PS3) Dzahn: releases: load Apache mod_proxy/mod_proxy_http [puppet] - https://gerrit.wikimedia.org/r/373997
[03:42:00] Operations, Analytics, Analytics-Wikistats, Wikidata, and 6 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3480814 (Dzahn) >>! In T168765#3555004, @Jayprakash12345 wrote: > Who can connect hiwikiversity to labs. xtools.wmflabs.org does not work yet. If it's about #x...
[03:44:00] (CR) Dzahn: [C: 2] releases: load Apache mod_proxy/mod_proxy_http [puppet] - https://gerrit.wikimedia.org/r/373997 (owner: Dzahn)
[03:46:41] (PS1) Dzahn: Revert "Revert "Releases: Proxy jenkins to main apache instance"" [puppet] - https://gerrit.wikimedia.org/r/373998
[03:47:01] (PS2) Dzahn: Revert "Revert "Releases: Proxy jenkins to main apache instance"" [puppet] - https://gerrit.wikimedia.org/r/373998
[03:48:31] (CR) Dzahn: [C: 2] Revert "Revert "Releases: Proxy jenkins to main apache instance"" [puppet] - https://gerrit.wikimedia.org/r/373998 (owner: Dzahn)
[03:49:12] (CR) Dzahn: "https://gerrit.wikimedia.org/r/#/c/373997/ -->" [puppet] - https://gerrit.wikimedia.org/r/373380 (owner: Chad)
[03:51:15] (CR) Dzahn: "https://releases.wikimedia.org/ci/" [puppet] - https://gerrit.wikimedia.org/r/373380 (owner: Chad)
[03:52:22] (Abandoned) Dzahn: shiladsen shell: try RSA key instead, add expiry [puppet] - https://gerrit.wikimedia.org/r/373177 (https://phabricator.wikimedia.org/T171988) (owner: Jeremyb)
[03:54:08] (CR) Dzahn: "please add me once we are actually on 2.15 and you have a +1 from Chad. thanks" [puppet] - https://gerrit.wikimedia.org/r/373520 (owner: Paladox)
[04:01:26] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 273.92 seconds
[04:07:58] (CR) Dzahn: "@Chad @Platonides the link has been added. good?" [puppet] - https://gerrit.wikimedia.org/r/356645 (https://phabricator.wikimedia.org/T43608) (owner: Paladox)
[04:08:54] Operations, Ops-Access-Requests, Analytics, Research, and 2 others: NDA, MOU and LDAP (analytics cluster) for Shilad Sen - https://phabricator.wikimedia.org/T171988#3555051 (Dzahn) Resolved>Open
[04:10:00] Operations, Ops-Access-Requests, Analytics, Research, and 2 others: NDA, MOU and LDAP (analytics cluster) for Shilad Sen - https://phabricator.wikimedia.org/T171988#3482327 (Dzahn) @Robh @herron Please see above, reopened because of the follow-up questions from Shilad.
[04:19:56] (CR) Dzahn: "please re-add me once we are actually on gerrit 2.15+" [puppet] - https://gerrit.wikimedia.org/r/368547 (owner: Paladox)
[05:15:26] PROBLEM - Router interfaces on cr1-eqdfw is CRITICAL: CRITICAL: host 208.80.153.198, interfaces up: 33, down: 1, dormant: 0, excluded: 0, unused: 0
[05:21:36] RECOVERY - Router interfaces on cr1-eqdfw is OK: OK: host 208.80.153.198, interfaces up: 35, down: 0, dormant: 0, excluded: 0, unused: 0
[06:37:14] Operations, Commons, MediaWiki-extensions-Scribunto, Patch-For-Review, Wikimedia-log-errors: Some Commons pages transcluding Template:Countries_of_Europe HTTP 500/503 when accessed from non-English languages specified in the template - https://phabricator.wikimedia.org/T171392#3555246 (Vachove...
[07:49:41] PROBLEM - Host text-lb.ulsfo.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[07:52:46] PROBLEM - Host lvs4001 is DOWN: PING CRITICAL - Packet loss = 100%
[07:54:04] * volans checking
[08:18:07] RECOVERY - Host lvs4001 is UP: PING WARNING - Packet loss = 50%, RTA = 78.54 ms
[08:18:35] RECOVERY - Host text-lb.ulsfo.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 78.53 ms
[09:39:52] Operations, TCB-Team, Two-Column-Edit-Conflict-Merge, Patch-For-Review, and 2 others: Deploy TwoColConflict extension to production - https://phabricator.wikimedia.org/T150184#3555338 (MarcoAurelio)
[09:39:55] Operations, MediaWiki-extensions-InterwikiSorting, Wikidata, Wikimedia-Extension-setup, and 4 others: Deploy InterwikiSorting extension to production - https://phabricator.wikimedia.org/T150183#3555339 (MarcoAurelio)
[09:42:41] Operations, Wikimedia-Extension-setup, Shell, Tamil-Sites, wikimedia-extension-review-queue: Enable Extension:ShortUrl on or.wikipedia, ta.wikipedia... - https://phabricator.wikimedia.org/T3450#3555422 (MarcoAurelio)
[09:44:16] Operations, Wikimedia-Extension-setup, Shell, Tamil-Sites, wikimedia-extension-review-queue: Enable Extension:ShortUrl on or.wikipedia, ta.wikipedia... - https://phabricator.wikimedia.org/T3450#3555550 (MarcoAurelio)
[11:34:40] https://commons.wikimedia.org/w/index.php?title=Commons:Help_desk&curid=523680&diff=256393273&oldid=256390042 thumbnail issue on Commons
[11:35:48] with technical details and screenshot
[11:46:20] Operations, DBA, Gerrit, Patch-For-Review, Release-Engineering-Team (Backlog): Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#3555784 (Paladox) I don't think this is worth it now since reviewdb is being removed from gerrit soon...
[11:46:58] Operations, DBA, Gerrit, Patch-For-Review, Release-Engineering-Team (Backlog): Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#3555786 (Paladox)
[11:47:48] Operations, DBA, Gerrit, Patch-For-Review, Release-Engineering-Team (Backlog): Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#2897017 (Paladox)
[11:47:52] Operations, DBA, Gerrit, Patch-For-Review, Release-Engineering-Team (Backlog): Gerrit: Schedule downtime to migrate db to utf8mb4 - https://phabricator.wikimedia.org/T155764#3555788 (Paladox) Open>declined
[11:48:02] Operations, DBA, Gerrit, Patch-For-Review, Release-Engineering-Team (Backlog): Gerrit: Schedule downtime to migrate db to utf8mb4 - https://phabricator.wikimedia.org/T155764#2953844 (Paladox) T174034 Will fix the issue.
[13:37:35] (PS2) Alexandros Kosiaris: Generate kubernetes manpages for kubectl [debs/kubernetes] - https://gerrit.wikimedia.org/r/373917 (https://phabricator.wikimedia.org/T170346)
[15:27:15] PROBLEM - Check Varnish expiry mailbox lag on cp1073 is CRITICAL: CRITICAL: expiry mailbox lag is 2037732
[15:46:09] !log bounce varnish on cp1073 - mailbox lag
[15:46:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:47:15] RECOVERY - Check Varnish expiry mailbox lag on cp1073 is OK: OK: expiry mailbox lag is 0
[17:00:35] Operations, CirrusSearch, Discovery, Discovery-Search, and 6 others: Job queue is increasing non-stop - https://phabricator.wikimedia.org/T173710#3556169 (Esc3300)
[17:20:51] Operations, DBA, Gerrit, Patch-For-Review, Release-Engineering-Team (Backlog): Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#3556196 (Dzahn) p:Normal>Low
[17:37:33] (PS1) Dzahn: icinga/base: screen monitoring by default. whitelist copper, terbium [puppet] - https://gerrit.wikimedia.org/r/374050 (https://phabricator.wikimedia.org/T165348)
[18:55:15] (PS1) 20after4: Move phabricator conf files outside of source tree [puppet] - https://gerrit.wikimedia.org/r/374054
[18:55:39] (CR) jerkins-bot: [V: -1] Move phabricator conf files outside of source tree [puppet] - https://gerrit.wikimedia.org/r/374054 (owner: 20after4)
[18:58:28] (PS2) 20after4: Move phabricator conf files outside of source tree [puppet] - https://gerrit.wikimedia.org/r/374054
[18:58:51] (CR) jerkins-bot: [V: -1] Move phabricator conf files outside of source tree [puppet] - https://gerrit.wikimedia.org/r/374054 (owner: 20after4)
[18:59:41] PROBLEM - LVS HTTP IPv4 on thumbor.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:59:59] I'm taking a look
[19:00:01] (PS3) 20after4: Move phabricator conf files outside of source tree [puppet] - https://gerrit.wikimedia.org/r/374054 (https://phabricator.wikimedia.org/T172847)
[19:00:41] RECOVERY - LVS HTTP IPv4 on thumbor.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 173 bytes in 0.220 second response time
[19:01:50] this looks like an instance of T172930 btw
[19:01:51] T172930: Long running thumbnail requests locking up Thumbor instances - https://phabricator.wikimedia.org/T172930
[19:02:04] godog: We need a ComputersSuck phab tag
[19:03:10] Operations, Cassandra, Epic, Goal, and 2 others: End of August milestone: Cassandra 3 cluster in production - https://phabricator.wikimedia.org/T169939#3556267 (Eevans)
[19:03:21] Reedy: seriously!
[19:05:19] but all phab task have such tag implicitly
[19:05:25] *tasks
[19:06:15] Nah
[19:06:19] Not if someone is asking for a new feature
[19:06:27] Anything that's a bug, or a failure etc, sure ;)
[19:08:21] I'm not seeing any suckage out of the ordinary in 500s btw, it was mostly icinga thinking thumbor was unwell
[19:08:30] sorry about the page
[19:09:16] * godog off