[00:24:40] PROBLEM - Long running screen/tmux on labcontrol1001 is CRITICAL: CRIT: Long running SCREEN process. (user: andrew PID: 3224, 1734525s 1728000s).
[01:23:45] someone reset the channel topic?
[01:25:00] (don't remember the rest)
[01:29:43] AlexZ: wonder is the block to the @wikimedia cloak intended?
[01:29:57] (Just wondering in case you did that by mistake :))
[01:30:07] I did on mobile rn
[01:30:18] I think AlexZ just banned themselves
[01:30:23] Ok
[01:31:04] Didn’t realise that was his cloak :)
[01:31:34] (You should unblock your self :))
[01:33:24] _joe_: *nudge* T190893#4193076
[01:33:25] T190893: Setup the webservice-related instances in toolsbeta - https://phabricator.wikimedia.org/T190893
[01:37:10] Hmm they joined again and left
[03:11:00] !log l10nupdate@tin scap sync-l10n completed (1.32.0-wmf.3) (duration: 16m 20s)
[03:11:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:18:25] !log l10nupdate@tin ResourceLoader cache refresh completed at Mon May 14 03:18:25 UTC 2018 (duration 7m 25s)
[03:18:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:29:50] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 842.83 seconds
[04:16:10] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 240.00 seconds
[04:34:56] (PS1) BryanDavis: toolforge: Redirect GET & HEAD to https [puppet] - https://gerrit.wikimedia.org/r/432935 (https://phabricator.wikimedia.org/T102367)
[05:18:22] (PS1) Marostegui: db-eqiad.php: Depool db1091 [mediawiki-config] - https://gerrit.wikimedia.org/r/432938 (https://phabricator.wikimedia.org/T190148)
[05:19:58] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1091 [mediawiki-config] - https://gerrit.wikimedia.org/r/432938 (https://phabricator.wikimedia.org/T190148) (owner: Marostegui)
[05:21:16] (Merged) jenkins-bot: db-eqiad.php: Depool db1091 [mediawiki-config] - https://gerrit.wikimedia.org/r/432938 (https://phabricator.wikimedia.org/T190148) (owner: Marostegui)
[05:21:31] (CR) jenkins-bot: db-eqiad.php: Depool db1091 [mediawiki-config] - https://gerrit.wikimedia.org/r/432938 (https://phabricator.wikimedia.org/T190148) (owner: Marostegui)
[05:22:31] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1091 for alter table (duration: 01m 03s)
[05:22:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:22:44] !log Deploy schema change on db1091 - T191519 T188299 T190148
[05:22:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:22:50] T191519: Schema change for rc_namespace_title_timestamp index - https://phabricator.wikimedia.org/T191519
[05:22:50] T190148: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148
[05:22:50] T188299: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299
[06:16:08] !log Drop unused flaggedrevs from s3 testwiki - T174801
[06:16:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:16:12] T174801: Drop flaggedrevs tables on wikis where it is not enabled - https://phabricator.wikimedia.org/T174801
[06:24:16] Operations, ops-eqiad, DBA, decommission: Decommission db1029 and db1031 - https://phabricator.wikimedia.org/T184054#4203578 (Marostegui) a:RobH>Cmjohnson Assigning it to Chris to reflect current status
[06:25:21] * elukey checks for alter tables
[06:31:40] PROBLEM - puppet last run on labstore1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/apt2xml]
[06:35:36] !log installing wget security updates on trusty (Debian already fixed)
[06:35:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:36:33] elukey: you are late: ˜/marostegui 7:22> !log Deploy schema change on db1091 - T191519 T188299 T190148
[06:36:33] T191519: Schema change for rc_namespace_title_timestamp index - https://phabricator.wikimedia.org/T191519
[06:36:34] T190148: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148
[06:36:34] T188299: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299
[06:38:15] marostegui: <3
[06:38:42] !log rolling restart of cassandra on aqs* for openjdk-8 upgrades
[06:38:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:47:52] (CR) Muehlenhoff: [C: 1] Cumin masters in WMCS: upgrade to python3 [puppet] - https://gerrit.wikimedia.org/r/419131 (https://phabricator.wikimedia.org/T188112) (owner: Volans)
[06:55:40] PROBLEM - cassandra-b CQL 10.64.0.127:9042 on aqs1004 is CRITICAL: connect to address 10.64.0.127 and port 9042: Connection refused
[06:56:51] RECOVERY - cassandra-b CQL 10.64.0.127:9042 on aqs1004 is OK: TCP OK - 0.000 second response time on 10.64.0.127 port 9042
[06:58:00] RECOVERY - puppet last run on labstore1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[07:07:44] Operations, ops-codfw, DC-Ops: mw2139 failed to boot - hardware check - https://phabricator.wikimedia.org/T194426#4203628 (MoritzMuehlenhoff) a:Papaul
[07:19:00] PROBLEM - cassandra-a CQL 10.64.48.148:9042 on aqs1006 is CRITICAL: connect to address 10.64.48.148 and port 9042: Connection refused
[07:19:31] all good in here, sorry for the noise
[07:20:20] RECOVERY - cassandra-a CQL 10.64.48.148:9042 on aqs1006 is OK: TCP OK - 0.000 second response time on 10.64.48.148 port 9042
[07:46:27] (CR) Filippo Giunchedi: "> Patch Set 2:" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/432712 (https://phabricator.wikimedia.org/T190978) (owner: Krinkle)
[07:51:11] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1091" [mediawiki-config] - https://gerrit.wikimedia.org/r/432956
[07:52:11] Operations, Wikimedia-Mailing-lists: Create wikibaseug mailing list - https://phabricator.wikimedia.org/T189674#4203737 (samuwmde) @Dzahn thank you really much could you put also Léa Lacroix as Mailinglist Admin to the list.
[07:53:13] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1091" [mediawiki-config] - https://gerrit.wikimedia.org/r/432956 (owner: Marostegui)
[07:54:28] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1091" [mediawiki-config] - https://gerrit.wikimedia.org/r/432956 (owner: Marostegui)
[07:55:51] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1091 after alter table (duration: 01m 02s)
[07:55:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:56:53] (PS1) Jcrespo: dbhosts: Remove x1 from dbstore2001 [software] - https://gerrit.wikimedia.org/r/432957
[07:57:22] !log Deploy schema change on s7 codfw master (db2040) with replication, this will generate lag on codfw - T191519 T188299 T190148
[07:57:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:57:27] T191519: Schema change for rc_namespace_title_timestamp index - https://phabricator.wikimedia.org/T191519
[07:57:27] T190148: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148
[07:57:28] T188299: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299
[07:58:37] (CR) Jcrespo: [C: 2] dbhosts: Remove x1 from dbstore2001 [software] - https://gerrit.wikimedia.org/r/432957 (owner: Jcrespo)
[08:02:21] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1091" [mediawiki-config] - https://gerrit.wikimedia.org/r/432956 (owner: Marostegui)
[08:15:51] (PS1) Marostegui: mariadb: Convert db1120 to a temporary sanitarium [puppet] - https://gerrit.wikimedia.org/r/432959 (https://phabricator.wikimedia.org/T192979)
[08:20:05] (CR) Marostegui: [C: 2] "https://puppet-compiler.wmflabs.org/compiler02/11194/" [puppet] - https://gerrit.wikimedia.org/r/432959 (https://phabricator.wikimedia.org/T192979) (owner: Marostegui)
[08:22:55] Operations, Federated-Wikibase-Workshops, Wikimedia-Mailing-lists: Shall we set up a mailing list or something for the Wikibase community? - https://phabricator.wikimedia.org/T192799#4203799 (Lydia_Pintscher) Open>Resolved Jep this should be fine then. Thank you!
[08:41:02] !log uploaded HHVM 3.18.5+dfsg-1+wmf8+deb9u1 to apt.wikimedia.org
[08:41:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:42:02] Operations, Wikimedia-Mailing-lists: Create wikibaseug mailing list - https://phabricator.wikimedia.org/T189674#4203851 (Aklapper) Any of the two existing admins can add Léa as an admin under "owner" (second field) at https://lists.wikimedia.org/mailman/admin/wikibaseug/general
[08:46:00] (PS6) Elukey: Swap conf1001 with conf1004 in Zookeeper main-eqiad [puppet] - https://gerrit.wikimedia.org/r/425238 (https://phabricator.wikimedia.org/T182924)
[08:58:21] Operations, ops-eqiad, Traffic, Patch-For-Review: rack/setup/install lvs101[3-6] - https://phabricator.wikimedia.org/T184293#4203893 (Vgutierrez)
[09:03:50] (PS1) Muehlenhoff: Add library hint for wavpack [puppet] - https://gerrit.wikimedia.org/r/432961
[09:07:08] (CR) Muehlenhoff: [C: 2] Add library hint for wavpack [puppet] - https://gerrit.wikimedia.org/r/432961 (owner: Muehlenhoff)
[09:07:52] (PS1) Urbanecm: New throttle rule for edit-a-thon on 18 May [mediawiki-config] - https://gerrit.wikimedia.org/r/432962 (https://phabricator.wikimedia.org/T194630)
[09:11:23] !log installing wavpack security updates for stretch (jessie/trusty not affected)
[09:11:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:23:43] !log installing libmad security updates
[09:23:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:28:57] !log installing ghostscript security updates on trusty
[09:29:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:30:26] (PS1) Jcrespo: mariadb-package: Update systemd unit [software] - https://gerrit.wikimedia.org/r/432965 (https://phabricator.wikimedia.org/T194516)
[09:41:30] !log installing gunicorn security updates
[09:41:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:48:44] Operations, Traffic: Identify bots using AES128-SHA maintainers running on toolforge - https://phabricator.wikimedia.org/T194380#4204037 (Vgutierrez)
[09:59:03] (PS6) Volans: debmonitor: add server side puppettization [puppet] - https://gerrit.wikimedia.org/r/430881 (https://phabricator.wikimedia.org/T191299)
[10:08:28] Operations, Traffic: Identify bots using AES128-SHA maintainers running on toolforge - https://phabricator.wikimedia.org/T194380#4204062 (Vgutierrez)
[10:13:16] (PS3) Giuseppe Lavagetto: profile::mediawiki::mcrouter_wancache: add ssl, proxy support [puppet] - https://gerrit.wikimedia.org/r/431737 (https://phabricator.wikimedia.org/T192370)
[10:13:18] (PS3) Giuseppe Lavagetto: puppet_ecdsacert: allow IP-based SANs [puppet] - https://gerrit.wikimedia.org/r/431738 (https://phabricator.wikimedia.org/T192370)
[10:17:57] !log installing systemd updates from stretch SUA update
[10:18:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:28:44] (CR) Alexandros Kosiaris: [C: -1] "git-lfs is related to how the software is deployed so I don't think it's place is in base.pp. I gather this is for the labs environment (n" [puppet] - https://gerrit.wikimedia.org/r/432432 (owner: Halfak)
[10:38:16] Operations, Traffic: Identify bots using AES128-SHA maintainers running on toolforge - https://phabricator.wikimedia.org/T194380#4204122 (Vgutierrez)
[10:43:42] (CR) Alexandros Kosiaris: [C: -1] debmonitor: add server side puppettization (1 comment) [puppet] - https://gerrit.wikimedia.org/r/430881 (https://phabricator.wikimedia.org/T191299) (owner: Volans)
[10:49:55] (PS7) Volans: debmonitor: add server side puppettization [puppet] - https://gerrit.wikimedia.org/r/430881 (https://phabricator.wikimedia.org/T191299)
[10:59:19] (CR) Alexandros Kosiaris: [C: 2] apertium-streamparser: Initial Debian packaging [debs/contenttranslation/apertium-streamparser] - https://gerrit.wikimedia.org/r/431553 (https://phabricator.wikimedia.org/T192987) (owner: KartikMistry)
[11:00:04] jan_drewniak: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Wikimedia Portals Update . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180514T1100).
[11:03:45] Operations, Traffic: Identify bots using AES128-SHA maintainers running on toolforge - https://phabricator.wikimedia.org/T194380#4204216 (Vgutierrez)
[11:04:42] Operations, Patch-For-Review, Release-Engineering-Team (Watching / External): setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#4204232 (MoritzMuehlenhoff) I suggest we do the following: - Pick a date/time frame of a few hours where no deployments are ha...
[11:07:01] Operations, Scap, Patch-For-Review, Release-Engineering-Team (Kanban): mwscript rebuildLocalisationCache.php takes 40 minutes - https://phabricator.wikimedia.org/T191921#4204237 (MoritzMuehlenhoff) >>! In T191921#4194970, @Legoktm wrote: > (..) I'm not sure all of that is necessary if we're just...
[11:08:03] (CR) Alexandros Kosiaris: [C: 1] CLI: use lsb_release for OS detection [software/debmonitor] - https://gerrit.wikimedia.org/r/432395 (owner: Volans)
[11:08:33] akosiaris: when you upload apertium-streamparser, please use T192978 instead of T192987, I fixed commit msg, but it got lost in between patch sets :/
[11:08:34] T192987: wdqs-frontend BRAND_TITLE env var doesn't appear to work - https://phabricator.wikimedia.org/T192987
[11:08:34] T192978: Package apertium-streamparser - https://phabricator.wikimedia.org/T192978
[11:19:48] (CR) Muehlenhoff: [C: 1] "That works. For debdeploy I went with parsing /etc/os-release instead, which is shipped/owned by base-file which is an essential package (" [software/debmonitor] - https://gerrit.wikimedia.org/r/432395 (owner: Volans)
[11:22:21] (CR) Alexandros Kosiaris: [C: 1] Make model validation stronger [software/debmonitor] - https://gerrit.wikimedia.org/r/432377 (owner: Volans)
[11:23:37] !log upload apertium-streamparser to apt.wikimedia.org/jessie-wikimedia/main T192978
[11:23:40] kart_: ^
[11:23:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:23:42] T192978: Package apertium-streamparser - https://phabricator.wikimedia.org/T192978
[11:26:58] (PS1) Alexandros Kosiaris: Reimage as stretch ganeti2003, ganeti2007 [puppet] - https://gerrit.wikimedia.org/r/432973
[11:38:32] (CR) Krinkle: mtail: Add xcachestatus to varnishrls (1 comment) [puppet] - https://gerrit.wikimedia.org/r/432712 (https://phabricator.wikimedia.org/T190978) (owner: Krinkle)
[11:39:59] (PS2) Rduran: [WIP] Refactor code in transfer.py [puppet] - https://gerrit.wikimedia.org/r/432569 (https://phabricator.wikimedia.org/T156462)
[11:47:45] (PS2) Volans: Initial working version [software/debmonitor/deploy] - https://gerrit.wikimedia.org/r/432597 (https://phabricator.wikimedia.org/T191299)
[12:11:20] PROBLEM - puppet last run on ms-be1016 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[initramfs-tools]
[12:19:15] Operations, Availability (MediaWiki-MultiDC), Patch-For-Review, Performance-Team (Radar), User-Joe: mcrouter production architecture - https://phabricator.wikimedia.org/T192771#4204404 (Joe)
[12:22:34] Operations, Discovery, Wikidata, Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint: Ban clients of WDQS which don't follow throttling directives for some time - https://phabricator.wikimedia.org/T194653#4204417 (Gehel)
[12:26:16] akosiaris: thanks!
[12:35:26] !log kartik@tin Started deploy [cxserver/deploy@a7ef01b]: Update cxserver to 176b507
[12:35:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:37:30] RECOVERY - puppet last run on ms-be1016 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[12:38:53] !log kartik@tin Finished deploy [cxserver/deploy@a7ef01b]: Update cxserver to 176b507 (duration: 03m 26s)
[12:38:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:50:12] Operations, Traffic: Identify bots using AES128-SHA maintainers running on toolforge - https://phabricator.wikimedia.org/T194380#4204462 (Vgutierrez)
[12:56:22] (PS1) Jcrespo: mariadb: Allow reimage to stretch of db206* hosts [puppet] - https://gerrit.wikimedia.org/r/432985
[12:57:22] (CR) Jcrespo: [C: 2] mariadb: Allow reimage to stretch of db206* hosts [puppet] - https://gerrit.wikimedia.org/r/432985 (owner: Jcrespo)
[13:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for European Mid-day SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180514T1300).
[13:00:04] dcausse and Urbanecm: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:15] I'm here
[13:01:01] o/
[13:01:06] (PS2) Alexandros Kosiaris: Reimage as stretch ganeti2003, ganeti2007 [puppet] - https://gerrit.wikimedia.org/r/432973
[13:04:26] (CR) Alexandros Kosiaris: [C: 2] Reimage as stretch ganeti2003, ganeti2007 [puppet] - https://gerrit.wikimedia.org/r/432973 (owner: Alexandros Kosiaris)
[13:04:51] zeljkof: no SWAT today?
[13:05:09] PROBLEM - Host ganeti2008 is DOWN: PING CRITICAL - Packet loss = 100%
[13:05:52] Operations, Traffic: Identify bots using AES128-SHA maintainers running on toolforge - https://phabricator.wikimedia.org/T194380#4204514 (Vgutierrez)
[13:05:59] RECOVERY - Host ganeti2008 is UP: PING OK - Packet loss = 0%, RTA = 36.88 ms
[13:06:11] !log reboot ganeti2008 for kernel ugprade
[13:06:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:08:19] !log remove ganeti2003, ganeti2007 from the ganeti cluster. stretch reimaging in progress
[13:08:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:08:28] Operations, Traffic: Identify bots using AES128-SHA maintainers running on toolforge - https://phabricator.wikimedia.org/T194380#4204518 (MaxBioHazard) Do we need to stop using DotNetWikiBot framework, because you will disable encryption method, used in it?
[13:09:29] PROBLEM - ganeti-noded running on ganeti2003 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-noded
[13:09:39] PROBLEM - ganeti-confd running on ganeti2003 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 111 (gnt-confd), command name ganeti-confd
[13:09:49] PROBLEM - ganeti-confd running on ganeti2007 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 111 (gnt-confd), command name ganeti-confd
[13:09:59] PROBLEM - ganeti-noded running on ganeti2007 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-noded
[13:10:09] PROBLEM - ganeti-mond running on ganeti2003 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-mond
[13:10:20] PROBLEM - ganeti-mond running on ganeti2007 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-mond
[13:12:25] zeljkof, are you around?
[13:12:57] Reedy, Dereckson, somebody in SWAT team?
[13:15:55] Operations, Traffic: Identify bots using AES128-SHA maintainers running on toolforge - https://phabricator.wikimedia.org/T194380#4204523 (BBlack) It's more likely that DotNetWikiBot just needs to be built against a newer .NET version, or needs .NET configuration tweaks, to support better encryption (or p...
[13:18:26] !log Deploy schema change on dbstore1002:s7 - T191519 T188299 T190148
[13:18:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:18:32] T191519: Schema change for rc_namespace_title_timestamp index - https://phabricator.wikimedia.org/T191519
[13:18:32] T190148: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148
[13:18:32] T188299: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299
[13:20:48] Operations, SRE-Access-Requests: Give Seddon access to the analytics cluster - https://phabricator.wikimedia.org/T194445#4204532 (Ottomata) The requested group makes sense. Nuria, 2-3 hours delay is kinda 'near real time'? :)
[13:29:42] <_joe_> ottomata: I have a patch for cergen (also, I'd need to deploy it soon, if you like it)
[13:30:09] <_joe_> https://gerrit.wikimedia.org/r/#/c/432977/
[13:30:34] <_joe_> which is the sister of the puppet_ecdsacert patch at https://gerrit.wikimedia.org/r/#/c/431738/
[13:32:03] <_joe_> tox fails in CI for $reasons, but passes in my stretch environment, FWIW
[13:32:39] _joe_: $reasons is written there, setuptools too old :-P
[13:33:06] <_joe_> volans: yeah, I intentionally didn't want to get into that ;)
[13:41:20] _joe_: i saw will look!
[13:41:34] <_joe_> ottomata: <3
[13:41:44] !log stop db2063 for reimage
[13:41:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:44:17] dcausse, Urbanecm sorry traveling
[13:44:36] And are you able to swat?
[13:45:10] Urbanecm: sorry, no, probably all week
[13:45:24] Ok.
[13:45:46] zeljkof, who is able to SWAT during EU SWAT?
[13:46:10] Urbanecm: I'm not in the SWAT team technically P
[13:46:11] :P
[13:46:32] dcausse: Abotu?
[13:46:33] Reedy, and can you sync two patches? :D
[13:46:36] Though, that's CR-1
[13:46:46] Reedy, don't understand, what's CR-1?
[13:46:54] The other patch in SWAT
[13:47:00] (CR) Reedy: [C: 2] New throttle rule for edit-a-thon on 18 May [mediawiki-config] - https://gerrit.wikimedia.org/r/432962 (https://phabricator.wikimedia.org/T194630) (owner: Urbanecm)
[13:47:05] It's VR-1 :D
[13:47:16] Urbanecm: sorry, no idea
[13:47:25] zeljkof, Reedy's SWATting, thank you
[13:47:41] Reedy: yes mine is blocked on T194632
[13:47:41] T194632: mediawiki-extensions-hhvm-jessie failures on EchoDiscussionParserTest (wmf/1.32.0-wmf.3) - https://phabricator.wikimedia.org/T194632
[13:47:59] dcausse: You can force it through if that's the only failure
[13:48:02] It's obviously unrelated
[13:48:07] I can swat the one from Urbanecm if someone is willing to give a +1
[13:48:13] Operations, ops-eqiad, DBA: Move db1067 to row C - https://phabricator.wikimedia.org/T193835#4204590 (Marostegui) This has been scheduled for Wed 16th
[13:48:15] I've just CR+d'd it
[13:48:19] (Merged) jenkins-bot: New throttle rule for edit-a-thon on 18 May [mediawiki-config] - https://gerrit.wikimedia.org/r/432962 (https://phabricator.wikimedia.org/T194630) (owner: Urbanecm)
[13:48:30] Reedy: yes but I'd love to have some blessing from releng before
[13:49:12] Just apply common sense
[13:49:16] I'll merge it if you want :)
[13:49:56] a +1 might be sufficient I suppose?
[13:50:01] !log reedy@tin Synchronized wmf-config/throttle.php: T194630 (duration: 01m 02s)
[13:50:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:50:05] T194630: Request to lift IP cap for Le Rosey edit-a-thon on May 18, 2018 - https://phabricator.wikimedia.org/T194630
[13:50:16] Reedy: you rock :)
[13:53:20] !log reedy@tin Synchronized php-1.32.0-wmf.3/extensions/CirrusSearch: Partially revert deprecation of global namespace handling in prefix (duration: 01m 20s)
[13:53:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:53:51] Reedy: thanks!!
[13:53:59] np :)
[13:58:19] !log added temporary iptables drop rules on fermium for IPs with many hits logged against the list subscribe rate limit
[13:58:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:01:58] PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: instance=kubernetes1004.eqiad.wmnet operation_type={create_container,image_status,podsandbox_status,remove_container,start_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[14:02:58] RECOVERY - kubelet operational latencies on kubernetes1004 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[14:04:07] (PS2) Milimetric: Drop private data used for geoeditor aggregation [puppet] - https://gerrit.wikimedia.org/r/432104 (https://phabricator.wikimedia.org/T193165)
[14:04:46] (CR) jerkins-bot: [V: -1] Drop private data used for geoeditor aggregation [puppet] - https://gerrit.wikimedia.org/r/432104 (https://phabricator.wikimedia.org/T193165) (owner: Milimetric)
[14:05:59] Operations, Discovery, Wikidata, Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint: Ban clients of WDQS which don't follow throttling directives for some time - https://phabricator.wikimedia.org/T194653#4204678 (Gehel)
[14:08:51] Operations, Patch-For-Review: revisit swift (sys)logging - https://phabricator.wikimedia.org/T137397#4204685 (fgiunchedi) Open>Resolved Resolving, swift (sys)log has been fixed a while ago but this task never resolved.
[14:10:23] !log milimetric@tin Started deploy [analytics/refinery@541823e]: deploying refinery to update python logic for cron jobs
[14:10:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:12:07] (CR) Filippo Giunchedi: mtail: Add xcachestatus to varnishrls (1 comment) [puppet] - https://gerrit.wikimedia.org/r/432712 (https://phabricator.wikimedia.org/T190978) (owner: Krinkle)
[14:12:42] (PS4) Ottomata: Puppetize Turnilo (Pivot replacement) [puppet] - https://gerrit.wikimedia.org/r/432530 (https://phabricator.wikimedia.org/T194427)
[14:13:16] (CR) jerkins-bot: [V: -1] Puppetize Turnilo (Pivot replacement) [puppet] - https://gerrit.wikimedia.org/r/432530 (https://phabricator.wikimedia.org/T194427) (owner: Ottomata)
[14:13:55] (CR) Ottomata: [V: 2 C: 2] Puppetize Turnilo (Pivot replacement) [puppet] - https://gerrit.wikimedia.org/r/432530 (https://phabricator.wikimedia.org/T194427) (owner: Ottomata)
[14:15:54] !log mobrovac@tin Started deploy [restbase/deploy@75dc661]: API: Add /transform/list/tool/{tool}{/from}{/to} - T163203
[14:15:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:15:59] T163203: Update CX to use the new Restbase provided public API instead of CXServer - https://phabricator.wikimedia.org/T163203
[14:17:04] Operations, Traffic: Identify bots using AES128-SHA maintainers running on toolforge - https://phabricator.wikimedia.org/T194380#4204704 (MaxBioHazard) E-mail of DNWB author is codedriller@gmail.com . You can write a letter to he and explain, what he should do to fix this problem.
[14:18:25] PROBLEM - puppet last run on thorium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[analytics/turnilo/deploy]
[14:18:38] ah!
[14:19:08] !log rebooting silver for some microcode/kernel tests
[14:19:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:19:25] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 51 seconds ago with 1 failures. Failed resources (up to 3 shown): Scap_source[analytics/turnilo/deploy]
[14:20:45] ^ me
[14:20:53] missing some content in a deploy repo
[14:20:54] elukey: sorry
[14:20:55] fixing
[14:22:44] ottomata: my "ah" was like a "nice!", nothing more :D
[14:23:11] :)
[14:23:57] ya elukey that was pretty easy did it real fast end my last work day
[14:24:01] mostly copy/paste pivot stuff
[14:24:40] Operations, ops-codfw, DBA: Degraded RAID on db2067 - https://phabricator.wikimedia.org/T194103#4204745 (Papaul) a:Papaul>jcrespo @jcrespo disk replacement complete
[14:25:47] Operations, ops-codfw, DBA: Degraded RAID on db2067 - https://phabricator.wikimedia.org/T194103#4204748 (Marostegui) Thanks! Let's see how it goes ``` physicaldrive 1I:1:10 (port 1I:box 1:bay 10, SAS, 600 GB, Rebuilding) ```
[14:27:41] (CR) jenkins-bot: New throttle rule for edit-a-thon on 18 May [mediawiki-config] - https://gerrit.wikimedia.org/r/432962 (https://phabricator.wikimedia.org/T194630) (owner: Urbanecm)
[14:29:32] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[14:29:35] !log otto@tin Started deploy [analytics/turnilo/deploy@9b2c8f0]: initial deploy
[14:29:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:30:03] !log otto@tin Finished deploy [analytics/turnilo/deploy@9b2c8f0]: initial deploy (duration: 00m 28s)
[14:30:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:30:09] !log milimetric@tin Finished deploy [analytics/refinery@541823e]: deploying refinery to update python logic for cron jobs (duration: 19m 46s)
[14:30:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:31:33] what is the issue with etcd?
[14:32:31] PROBLEM - puppet last run on deploy1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Scap_source[analytics/turnilo/deploy]
[14:32:52] PROBLEM - etcd request latencies on argon is CRITICAL: instance=10.64.32.133:6443 operation=compareAndSwap https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[14:33:11] PROBLEM - puppet last run on naos is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Scap_source[analytics/turnilo/deploy]
[14:33:22] PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb=PATCH https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[14:33:36] there was a spike at 14:27
[14:33:41] RECOVERY - puppet last run on thorium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[14:34:01] RECOVERY - etcd request latencies on argon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[14:34:31] RECOVERY - Request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[14:39:29] !log mobrovac@tin Finished deploy [restbase/deploy@75dc661]: API: Add /transform/list/tool/{tool}{/from}{/to} - T163203 (duration: 23m 35s)
[14:39:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:39:37] T163203: Update CX to use the new Restbase provided public API instead of CXServer - https://phabricator.wikimedia.org/T163203
[14:40:48] !log mobrovac@tin Started deploy [restbase/deploy@75dc661]: API: Add /transform/list/tool/{tool}{/from}{/to}, take #2
[14:40:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:42:27] (PS1) Ottomata: Fix yaml rendering of druid_clusters for turnilo [puppet] - https://gerrit.wikimedia.org/r/432989 (https://phabricator.wikimedia.org/T194427)
[14:44:06] (PS2) Ottomata: Fix yaml rendering of druid_clusters for turnilo [puppet] - https://gerrit.wikimedia.org/r/432989 (https://phabricator.wikimedia.org/T194427)
[14:44:16] (CR) Ottomata: [V: 2 C: 2] Fix yaml rendering of druid_clusters for turnilo [puppet] - https://gerrit.wikimedia.org/r/432989 (https://phabricator.wikimedia.org/T194427) (owner: Ottomata)
[14:44:39] zeljkof: hi! Would you by any chance have some time soon to have a look at https://gerrit.wikimedia.org/r/#/c/432983/ ?
[14:44:45] !log mobrovac@tin Finished deploy [restbase/deploy@75dc661]: API: Add /transform/list/tool/{tool}{/from}{/to}, take #2 (duration: 03m 58s)
[14:44:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:45:27] zeljkof: tldr: we want a daily job for browser tests for new extensions. I would especially appreciated advice on how to make the existing job templates reused without copying&pasting everything N times :)
[14:45:27] !log rebooting labtestnet2002 for some microcode/kernel tests
[14:45:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:49:50] (CR) Milimetric: [C: 1] "this can now be merged, the code it depends on has been deployed." [puppet] - https://gerrit.wikimedia.org/r/432104 (https://phabricator.wikimedia.org/T193165) (owner: Milimetric)
[14:50:07] (PS1) Ottomata: Fix bad comment line wrap for turnilo config [puppet] - https://gerrit.wikimedia.org/r/432990 (https://phabricator.wikimedia.org/T194427)
[14:50:56] (CR) Ottomata: [C: 2] Fix bad comment line wrap for turnilo config [puppet] - https://gerrit.wikimedia.org/r/432990 (https://phabricator.wikimedia.org/T194427) (owner: Ottomata)
[14:51:08] (PS1) Elukey: profile::cumin:aliases: add analytics-all-eqiad [puppet] - https://gerrit.wikimedia.org/r/432991
[14:56:30] (CR) Muehlenhoff: profile::cumin:aliases: add analytics-all-eqiad (1 comment) [puppet] - https://gerrit.wikimedia.org/r/432991 (owner: Elukey)
[14:56:30] !log Drop unused table long_run_profiling from enwiki - T194661
[14:56:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:36] T194661: Drop long_run_profiling unused table - https://phabricator.wikimedia.org/T194661
[14:58:02] (PS1) Ottomata: More turnilo yaml config fixes [puppet] - https://gerrit.wikimedia.org/r/432993 (https://phabricator.wikimedia.org/T194427)
[14:59:10] (CR) Ottomata: [C: 2] More turnilo yaml config fixes [puppet] - https://gerrit.wikimedia.org/r/432993 (https://phabricator.wikimedia.org/T194427) (owner: Ottomata)
[15:01:32] (PS1) Ottomata: Proper cluster names in turnilo datasources [puppet] - https://gerrit.wikimedia.org/r/432994 (https://phabricator.wikimedia.org/T194427)
[15:02:09] (CR) Jcrespo: [C: 2] mariadb-package: Update systemd unit [software] - https://gerrit.wikimedia.org/r/432965 (https://phabricator.wikimedia.org/T194516) (owner: Jcrespo)
[15:02:18] PROBLEM - Check the NTP synchronisation status of timesyncd on ganeti2003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[15:02:18] PROBLEM - puppet last run on ganeti2003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[15:02:28] (CR) Ottomata: [C: 2] Proper cluster names in turnilo datasources [puppet] - https://gerrit.wikimedia.org/r/432994 (https://phabricator.wikimedia.org/T194427) (owner: Ottomata)
[15:02:59] Operations, Traffic: Identify bots using AES128-SHA maintainers running on toolforge - https://phabricator.wikimedia.org/T194380#4204891 (Vgutierrez) @MaxBioHazard I just tested DotNetWikiBot/3.15 on a docker container with mono 4.8.0 and it's able to use recent TLS ciphersuites, so you should be able to...
[15:03:47] PROBLEM - Check whether ferm is active by checking the default input chain on ganeti2003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[15:04:38] RECOVERY - Device not healthy -SMART- on db2067 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2067&var-datasource=codfw%2520prometheus%252Fops
[15:05:17] PROBLEM - DPKG on ganeti2003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[15:08:08] PROBLEM - Disk space on ganeti2003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[15:08:25] Operations, ops-eqiad, Cloud-VPS: Update and move labnet1001/1002 - https://phabricator.wikimedia.org/T193579#4204901 (Cmjohnson) @chasemp @andrewbogott .
no, it has not been moved yet [15:09:15] (03CR) 10Elukey: Drop private data used for geoeditor aggregation (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/432104 (https://phabricator.wikimedia.org/T193165) (owner: 10Milimetric) [15:09:18] (03PS5) 10Rush: openstack: neutron: nova.conf: enable options [puppet] - 10https://gerrit.wikimedia.org/r/432130 (https://phabricator.wikimedia.org/T193657) (owner: 10Arturo Borrero Gonzalez) [15:09:48] 10Operations, 10ops-codfw, 10DC-Ops: mw2139 failed to boot - hardware check - https://phabricator.wikimedia.org/T194426#4204903 (10Papaul) a:05Papaul>03Dzahn @Dzahn after unplugging the PSU from the server and boot the server I get the error below. I have no option to reset the IDRAC when I go to the... [15:10:18] RECOVERY - Disk space on ganeti2003 is OK: DISK OK [15:10:28] RECOVERY - DPKG on ganeti2003 is OK: All packages OK [15:10:38] (03CR) 10Rush: "I've uploaded what I think is the fix but I'll let you validate. Two issues I saw here: there were dupe keys for the scheduler_pool for la" [puppet] - 10https://gerrit.wikimedia.org/r/432130 (https://phabricator.wikimedia.org/T193657) (owner: 10Arturo Borrero Gonzalez) [15:11:07] RECOVERY - Check whether ferm is active by checking the default input chain on ganeti2003 is OK: OK ferm input default policy is set [15:11:39] 10Operations, 10ops-eqiad, 10Cloud-VPS: Update and move labnet1001/1002 - https://phabricator.wikimedia.org/T193579#4204911 (10Andrew) OK. In theory we can move it after the outage window tomorrow, since we're planning to switch all traffic back to labnet1001 after it gets re-racked. The only risk I can th... 
[15:11:56] !log stop db2064 for reimage [15:11:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:12:27] RECOVERY - puppet last run on ganeti2003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:14:03] 10Operations, 10ops-codfw, 10DC-Ops: mw2139 failed to boot - hardware check - https://phabricator.wikimedia.org/T194426#4204915 (10Papaul) {F18254197} [15:16:39] 10Operations, 10Traffic: Identify bots using AES128-SHA maintainers running on toolforge - https://phabricator.wikimedia.org/T194380#4196798 (10Reedy) AutoWikiBrowser isn't showing in your requests... And that is .NET based, though mostly on Windows, rather than mono (though, there are some users that do..) D... [15:21:14] (03CR) 10Ottomata: [C: 031] "NICCE! YEEHAW" [puppet] - 10https://gerrit.wikimedia.org/r/432564 (https://phabricator.wikimedia.org/T192557) (owner: 10Elukey) [15:22:21] 10Operations, 10Traffic: Identify bots using AES128-SHA maintainers running on toolforge - https://phabricator.wikimedia.org/T194380#4204922 (10Vgutierrez) >>! In T194380#4204918, @Reedy wrote: > AutoWikiBrowser isn't showing in your requests... And that is .NET based, though mostly on Windows, rather than mon... [15:24:01] (03PS1) 10Herron: lists: disable list subscription via email [puppet] - 10https://gerrit.wikimedia.org/r/432998 (https://phabricator.wikimedia.org/T194032) [15:24:53] (03PS2) 10Elukey: profile::cumin:aliases: add analytics-all-eqiad [puppet] - 10https://gerrit.wikimedia.org/r/432991 [15:24:59] 10Operations, 10Traffic: Identify bots using AES128-SHA maintainers running on toolforge - https://phabricator.wikimedia.org/T194380#4204926 (10Reedy) >>! In T194380#4204922, @Vgutierrez wrote: >>>! In T194380#4204918, @Reedy wrote: >> AutoWikiBrowser isn't showing in your requests... And that is .NET based, t... 
[15:29:00] RECOVERY - puppet last run on deploy1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:29:49] RECOVERY - puppet last run on naos is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:31:39] PROBLEM - Host ganeti2007 is DOWN: PING CRITICAL - Packet loss = 100% [15:32:09] RECOVERY - Host ganeti2007 is UP: PING WARNING - Packet loss = 54%, RTA = 37.22 ms [15:32:19] RECOVERY - Check the NTP synchronisation status of timesyncd on ganeti2003 is OK: OK: synced at Mon 2018-05-14 15:32:14 UTC. [15:34:49] PROBLEM - Host ganeti2003 is DOWN: PING CRITICAL - Packet loss = 100% [15:35:49] RECOVERY - Host ganeti2003 is UP: PING OK - Packet loss = 0%, RTA = 36.21 ms [15:38:07] (03Abandoned) 10Niedzielski: New: add chromium_render service [puppet] - 10https://gerrit.wikimedia.org/r/409996 (https://phabricator.wikimedia.org/T178166) (owner: 10Niedzielski) [15:41:10] PROBLEM - Check whether ferm is active by checking the default input chain on ganeti2007 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly [15:42:58] (03CR) 10Ottomata: [C: 031] Kafka: increase group.initial.rebalance.delay.ms to 10s. [puppet] - 10https://gerrit.wikimedia.org/r/432615 (https://phabricator.wikimedia.org/T189618) (owner: 10Ppchelko) [15:43:39] RECOVERY - Check whether ferm is active by checking the default input chain on ganeti2007 is OK: OK ferm input default policy is set [15:44:18] leszek_wmde: sorry traveling, will take a look later [15:52:55] 10Operations, 10Analytics, 10Patch-For-Review: Puppet admin module should support adding system users to managed groups - https://phabricator.wikimedia.org/T174465#4205006 (10fdans) p:05Normal>03High [15:58:28] (03CR) 10Muehlenhoff: [C: 031] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/432991 (owner: 10Elukey) [15:58:42] zeljkof: thanks! 
[16:00:15] (03CR) 10Elukey: [C: 032] profile::cumin:aliases: add analytics-all-eqiad [puppet] - 10https://gerrit.wikimedia.org/r/432991 (owner: 10Elukey) [16:05:12] (03PS1) 10Elukey: profile::cumin: fix analytics-misc alias, missing or [puppet] - 10https://gerrit.wikimedia.org/r/433003 [16:05:48] (03CR) 10Elukey: [C: 032] profile::cumin: fix analytics-misc alias, missing or [puppet] - 10https://gerrit.wikimedia.org/r/433003 (owner: 10Elukey) [16:08:30] 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#3845702 (10Joe) >>! In T175288#4204232, @MoritzMuehlenhoff wrote: > I suggest we do the following: > - Pick a date/time frame of... [16:11:11] hey, is this a known error in the puppet catalog compiler webpage? https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/11197/console [16:11:22] https://www.irccloud.com/pastebin/wTpfB1vc/ [16:12:28] arturo: isn't it the same one that we were checking the other day? [16:12:44] elukey: is the error the same? [16:12:52] do you have the phab number? [16:13:10] I ran this same job the other day with no issues [16:13:13] yeah that weird UnicodeEncodeError: 'ascii' codec can't encode characters in position 5397-5398: ordinal not in range(128) [16:13:44] https://phabricator.wikimedia.org/T173518 [16:14:02] ok [16:14:12] and do you know of some workaround? :-P [16:14:56] ah this is the bad part, no :D [16:15:41] it seems as if python tries to write html but it fails with the above error [16:16:19] it might be a quick fix in the complier's python code [16:18:37] 10Operations, 10SRE-Access-Requests: Give Seddon access to the analytics cluster - https://phabricator.wikimedia.org/T194445#4205079 (10Jseddon) So the 2-3 hours turnaround is certainly fulfilling the needs of A/B testing requirements and vastly improves the ability to verify a campaign setup. Reducing both fr... 
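[note on the 16:13-16:16 UnicodeEncodeError thread] The compiler's actual code is not shown in this log, so what follows is only a minimal sketch of the failure mode described ("python tries to write html but it fails"). It is written for Python 3, where the ascii codec has to be requested explicitly to reproduce the error; the traceback quoted above looks like the implicit ascii encoding of older Python. The names render_page and render_page_ascii_safe are hypothetical and do not come from the compiler.

```python
# Hypothetical helpers: none of these names come from the puppet compiler.

def render_page(title: str, encoding: str = "utf-8") -> bytes:
    """Build a tiny HTML page and encode it for writing to disk."""
    html = f"<html><body><h1>{title}</h1></body></html>\n"
    return html.encode(encoding)


def render_page_ascii_safe(title: str) -> bytes:
    """Keep ASCII output but turn non-ASCII text into HTML character references."""
    html = f"<html><body><h1>{title}</h1></body></html>\n"
    return html.encode("ascii", errors="xmlcharrefreplace")


# Reproduce the failure: non-ASCII text pushed through the ascii codec.
failure = None
try:
    render_page("Zürich", encoding="ascii")
except UnicodeEncodeError as exc:
    failure = exc.reason  # the codec's explanation of why it gave up

# Two possible workarounds: encode as UTF-8, or escape for an ASCII-only sink.
utf8_page = render_page("Zürich")
ascii_page = render_page_ascii_safe("Zürich")
```

If the real cause is similar, passing an explicit encoding where the HTML file is opened would be the smaller change; the character-reference variant only makes sense if the output really has to stay ASCII.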
[16:19:53] !log umount/remount /mnt/hdfs on stat1005 to pick up new openjdk upgrades [16:19:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:19:58] apergos: --^ [16:20:08] ah ha [16:20:10] thank you [16:25:52] (03PS6) 10Arturo Borrero Gonzalez: openstack: neutron: nova.conf: enable options [puppet] - 10https://gerrit.wikimedia.org/r/432130 (https://phabricator.wikimedia.org/T193657) [16:26:58] (03CR) 10Arturo Borrero Gonzalez: [C: 032] openstack: neutron: nova.conf: enable options [puppet] - 10https://gerrit.wikimedia.org/r/432130 (https://phabricator.wikimedia.org/T193657) (owner: 10Arturo Borrero Gonzalez) [16:30:56] 10Operations, 10ops-eqiad, 10netops, 10Patch-For-Review: Rack/cable/configure asw2-c-eqiad switch stack - https://phabricator.wikimedia.org/T187962#4205104 (10Joe) The following servers: ``` mc1012 mc1011 mc1010 mc1009 mc1008 mc1007 ``` should all be decommissioned by now, and definitely don't need any s... [16:31:18] 10Operations, 10ops-eqiad, 10netops, 10Patch-For-Review: Rack/cable/configure asw2-c-eqiad switch stack - https://phabricator.wikimedia.org/T187962#4205105 (10Marostegui) From a DB point of view, these servers need special care: db1061 - s6 primary master. We'd need the less downtime possible. Writes to f... [16:31:34] (03PS1) 10Ottomata: Re-enable job topic mirroring main-eqiad -> jumbo [puppet] - 10https://gerrit.wikimedia.org/r/433005 (https://phabricator.wikimedia.org/T189464) [16:34:45] 10Operations, 10Wikimedia-Mailing-lists: Provide a mean to mass discard/reject subscription requests on Wikimedia mailing lists - https://phabricator.wikimedia.org/T194669#4205111 (10MarcoAurelio) [16:37:46] 10Operations, 10ops-eqiad, 10netops, 10Patch-For-Review: Rack/cable/configure asw2-c-eqiad switch stack - https://phabricator.wikimedia.org/T187962#4205127 (10jcrespo) Also most db hosts will need to be depooled (but that can be done for an extended time) due to Mediawiki bugs with timed out requests: T180... 
[16:42:48] 10Operations, 10SRE-Access-Requests: Give Seddon access to the analytics cluster - https://phabricator.wikimedia.org/T194445#4205134 (10Nuria) Access approved [16:50:02] 10Operations, 10Reading-Infrastructure-Team-Backlog, 10SRE-Access-Requests, 10Patch-For-Review: Add Michael Holloway (Reading Infrastructure) to maps admin groups - https://phabricator.wikimedia.org/T194404#4205170 (10Dzahn) a:03Dzahn [16:56:01] 10Operations, 10ops-eqiad, 10netops, 10Patch-For-Review: Rack/cable/configure asw2-c-eqiad switch stack - https://phabricator.wikimedia.org/T187962#4205217 (10akosiaris) ganeti hosts are the housing for multiple VMs. Those will experience an outage during the recabling. Listing them here ``` aluminium.wi... [16:56:07] 10Operations, 10SRE-Access-Requests: Give Seddon access to the analytics cluster - https://phabricator.wikimedia.org/T194445#4205218 (10Dzahn) Thank you all for explaining and approval. I'll hand this over to @Herron now to finish it since we are on a rotating clinic duty. [16:59:28] RECOVERY - HP RAID on db2067 is OK: OK: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Controller: OK - Battery/Capacitor: OK [17:00:05] gehel: Your horoscope predicts another unfortunate Wikidata Query Service weekly deploy deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180514T1700). [17:00:16] jouncebot: o/ [17:00:19] 10Operations, 10ops-eqiad, 10netops, 10Patch-For-Review: Rack/cable/configure asw2-c-eqiad switch stack - https://phabricator.wikimedia.org/T187962#3991415 (10Volans) Yeah, `puppetdb1001` will probably just generate some spam on IRC for failing puppet runs, transient. Regarding `neodymium` the only thing... 
[17:03:44] 10Operations, 10Wikimedia-Mailing-lists: Archive "wiki-offline-reader-l" - https://phabricator.wikimedia.org/T194575#4205256 (10Dzahn) p:05Triage>03Normal @Herron fyi, this is how to disable a list (in a consistent way compared to 'manually' using the web ui) and keeping the archives, for tickets like thi... [17:03:53] !log gehel@tin Started deploy [wdqs/wdqs@ef37bf7]: new WDQS GUI version [17:03:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:04:31] (03PS5) 10Alexandros Kosiaris: grafana: Add migration script from proxy to LDAP auth [puppet] - 10https://gerrit.wikimedia.org/r/404651 (https://phabricator.wikimedia.org/T170150) [17:04:33] (03PS9) 10Alexandros Kosiaris: grafana: Enable grafana LDAP in production [puppet] - 10https://gerrit.wikimedia.org/r/404321 (https://phabricator.wikimedia.org/T170150) [17:12:35] !log gehel@tin Finished deploy [wdqs/wdqs@ef37bf7]: new WDQS GUI version (duration: 08m 43s) [17:12:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:13:45] SMalyshev: ^ deployment completed, tests are green [17:15:46] 10Operations, 10Wikimedia-Mailing-lists: Archive "wiki-offline-reader-l" - https://phabricator.wikimedia.org/T194575#4205264 (10Dzahn) The archives are still here: https://lists.wikimedia.org/pipermail/wiki-offline-reader-l/ But you should not get any mail from it anymore.. though i'm not 100% sure if you wo... [17:16:11] 10Operations, 10Wikimedia-Mailing-lists: Archive "wiki-offline-reader-l" - https://phabricator.wikimedia.org/T194575#4205265 (10Dzahn) 05Open>03Resolved a:03Dzahn please reopen if you get any more mails related to this list [17:17:43] 10Operations, 10Wikimedia-Mailing-lists: Provide a mean to mass discard/reject subscription requests on Wikimedia mailing lists - https://phabricator.wikimedia.org/T194669#4205269 (10Aklapper) https://bugs.launchpad.net/mailman/+bug/266746 is probably the closest I could find in upstream when it comes to //mas... 
[17:19:37] 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#4205272 (10Dzahn) The plan sounds good. How do you guys feel about the maintenance server, terbium -> mwmaint1001 (T192092), sh... [17:27:16] (03CR) 10Ottomata: [C: 032] Re-enable job topic mirroring main-eqiad -> jumbo [puppet] - 10https://gerrit.wikimedia.org/r/433005 (https://phabricator.wikimedia.org/T189464) (owner: 10Ottomata) [17:28:19] 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#4205305 (10MoritzMuehlenhoff) >>! In T175288#4205272, @Dzahn wrote: > How do you guys feel about the maintenance server, terbium... [17:30:12] 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#4205307 (10MoritzMuehlenhoff) > And, let's do this on friday, that leaves us until monday's SWAT (if any). How about Friday, 25... [17:34:23] 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#4205322 (10Dzahn) @20after4 @greg We are trying to find a time to replace tin with deploy1001 (this time for real) and are think... [17:36:13] (03PS3) 10Elukey: Kafka: increase group.initial.rebalance.delay.ms to 10s. [puppet] - 10https://gerrit.wikimedia.org/r/432615 (https://phabricator.wikimedia.org/T189618) (owner: 10Ppchelko) [17:40:30] 10Operations, 10ops-codfw, 10DC-Ops: mw2139 failed to boot - hardware check - https://phabricator.wikimedia.org/T194426#4205332 (10Dzahn) a:05Dzahn>03Papaul @Papaul thank you ! We should try the mainboard replacement but only if it's relatively easy thing to do and we have one around. 
If it causes a con... [17:47:27] (03PS3) 10Subramanya Sastry: Enable RemexHtml on wikis with < 100 ns0 errors in high priority cats [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432621 (https://phabricator.wikimedia.org/T193685) [17:53:07] 10Operations, 10Icinga, 10monitoring, 10Tor: Icinga check for Tor - https://phabricator.wikimedia.org/T148614#2727820 (10Dzahn) [[ https://github.com/goodvikings/tor_nagios/ | tor_nagios ]] is [[ https://en.wikipedia.org/wiki/Beerware | beerware ]] licensed which means if any of us meet [[https://en.wikip... [17:53:34] (03PS3) 10Milimetric: Drop private data used for geoeditor aggregation [puppet] - 10https://gerrit.wikimedia.org/r/432104 (https://phabricator.wikimedia.org/T193165) [18:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor My software never has bugs. It just develops random features. Rise for Morning SWAT (Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180514T1800). [18:00:04] framawiki: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:00:54] o/ [18:01:37] (03PS1) 10Jcrespo: mariadb: Depool all row C databases (except s6 master) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433014 (https://phabricator.wikimedia.org/T187962) [18:02:15] (03CR) 10Jcrespo: [C: 04-2] "Do not deploy yet, but please review." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/433014 (https://phabricator.wikimedia.org/T187962) (owner: 10Jcrespo) [18:06:32] 10Operations, 10ops-eqiad, 10netops, 10Patch-For-Review: Rack/cable/configure asw2-c-eqiad switch stack - https://phabricator.wikimedia.org/T187962#4205406 (10jcrespo) We should be able to failover logically dbproxy1007,8,9 to its hot spare, too. 
[18:06:39] (03CR) 10Elukey: "Extended Pcc: https://puppet-compiler.wmflabs.org/compiler02/11198/" [puppet] - 10https://gerrit.wikimedia.org/r/425238 (https://phabricator.wikimedia.org/T182924) (owner: 10Elukey) [18:07:35] framawiki: I can SWAT once kaldari reviews it. I don't have a good idea of how permissions work. [18:07:41] I asked him to review it just now. [18:07:44] Should be quick. [18:09:01] (03PS1) 10Jcrespo: mariadb: Failover dbproxy1007,8 and 9 and make them passive [dns] - 10https://gerrit.wikimedia.org/r/433015 (https://phabricator.wikimedia.org/T187962) [18:10:04] (03PS4) 10Milimetric: Drop private data used for geoeditor aggregation [puppet] - 10https://gerrit.wikimedia.org/r/432104 (https://phabricator.wikimedia.org/T193165) [18:10:16] (03CR) 10Kaldari: [C: 031] Create the 'eventcoordinator' user group on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/430418 (https://phabricator.wikimedia.org/T193075) (owner: 10Framawiki) [18:10:42] (03CR) 10Elukey: [C: 032] Drop private data used for geoeditor aggregation [puppet] - 10https://gerrit.wikimedia.org/r/432104 (https://phabricator.wikimedia.org/T193165) (owner: 10Milimetric) [18:10:50] (03PS4) 10Niharika29: Create the 'eventcoordinator' user group on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/430418 (https://phabricator.wikimedia.org/T193075) (owner: 10Framawiki) [18:11:06] (03CR) 10Niharika29: [C: 032] "SWAT." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/430418 (https://phabricator.wikimedia.org/T193075) (owner: 10Framawiki) [18:12:31] (03Merged) 10jenkins-bot: Create the 'eventcoordinator' user group on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/430418 (https://phabricator.wikimedia.org/T193075) (owner: 10Framawiki) [18:12:46] (03CR) 10jenkins-bot: Create the 'eventcoordinator' user group on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/430418 (https://phabricator.wikimedia.org/T193075) (owner: 10Framawiki) [18:13:19] 10Operations, 10ops-eqiad, 10netops, 10Patch-For-Review: Rack/cable/configure asw2-c-eqiad switch stack - https://phabricator.wikimedia.org/T187962#4205417 (10jcrespo) With the only above patches, the only special requirement for us is to handle db1061 (s6 master) on its own separate window- provide a real... [18:15:18] framawiki: Available to test? [18:15:24] Or should I just sync? [18:16:07] Niharika: I prefer verify if possible, on what host should I look at ? [18:16:17] framawiki: mwdebug1001. [18:17:05] not 1002 today? [18:17:42] (btw thanks for the check kaldari) [18:17:46] Niharika: ok for me [18:17:49] Hauskatze: It's got an ssh key change and I didn't bother my known_hosts file. [18:17:59] :) [18:18:03] framawiki: Perfect. Deploying now... [18:20:07] !log niharika29@tin Synchronized wmf-config/InitialiseSettings.php: Create the eventcoordinator user group on enwiki - T193075 (duration: 01m 02s) [18:20:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:20:12] T193075: Create the 'Event coordinator' user group on English Wikipedia - https://phabricator.wikimedia.org/T193075 [18:21:31] framawiki: Done. [18:23:22] Niharika: thanks ! [18:23:38] Thank YOU for the patch. :) [18:28:10] Hi everyone, is it possible to deploy a new, unscheduled patch in this SWAT window? [18:30:22] Daimona: Depends on the patch. :) [18:30:33] We do have time. 
[18:30:44] Sure, thanks :) [18:30:47] The patch is this one: https://gerrit.wikimedia.org/r/#/c/432519/ [18:31:06] A pretty straightforward change, already live in master [18:32:31] I'm sorry for not adding myself to SWAT page but I wasn't sure to be on time here [18:45:54] Daimona: Sorry I got distracted. [18:46:00] Daimona: Will you be able to test it? [18:46:07] No problem :) [18:46:14] Yeah, the test is really quick [18:46:33] Okay, waiting for it to merge. [18:47:09] Many thanks [18:47:22] It'll take some time [18:47:37] Yeah, no worries. [18:52:36] Woah, it's failing [18:52:46] But I'm highly unsure about the reason [18:52:58] Might be something unrelated to the patch. [18:53:12] But we won't be able to SWAT it. [18:53:59] Hmm, some errors with EchoDiscussionParserTests. [18:54:19] Yeah, sounds like unrelated [18:54:34] Not sure if it's a false positive or something wrong with Echo, though [18:55:12] I'm creating a bug about this. [18:55:35] Daimona: Is it alright if we delay the patch until evening swat? [18:55:49] Daimona: Leon says he will be available then, I think. [18:56:17] Unfortunately I won't be there due to time zone :) [18:56:31] But I'll reschedule it for tomorrow [18:57:06] Daimona: You don't have to be present if someone else can be there in your stead. [18:57:17] Oh [18:57:21] Who will? [18:57:37] Daimona: I dunno. musikanimal? [18:57:38] Anyway it would be fine indeed [18:57:41] Alright. [18:57:53] Oh, right. Well then :) [18:59:17] BTW those Echo failures are known, T194632 [18:59:17] T194632: mediawiki-extensions-hhvm-jessie failures on EchoDiscussionParserTest (wmf/1.32.0-wmf.3) - https://phabricator.wikimedia.org/T194632 [19:00:10] Oh, I filed another ticket. [19:00:12] Will merge. 
[19:09:43] Yes I'll be around for evening SWAT [19:10:11] Cool, thanks [19:10:22] Although I'm worried for that Echo test [19:11:01] !log rolling cassandra restart, restbase dev environment [19:11:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:15:41] 10Operations, 10Cloud-VPS, 10cloud-services-team: rack/setup/install labstore1008 & labstore1009 - https://phabricator.wikimedia.org/T193655#4205548 (10chasemp) Planning to talk to the team about this during the normal meeting tomorrow. [19:20:09] 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#3853129 (10mmodell) @MoritzMuehlenhoff @dzahn: The 25th should be fine. [19:20:57] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2067 - https://phabricator.wikimedia.org/T194103#4205552 (10Marostegui) 05Open>03Resolved This time it worked fine Thanks Papaul! ``` logicaldrive 1 (3.3 TB, RAID 1+0, OK) physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, OK)... 
[19:20:58] PROBLEM - Disk space on elastic1026 is CRITICAL: DISK CRITICAL - free space: /srv 61673 MB (12% inode=99%) [19:26:29] (03PS1) 10Jgreen: add check_ipsec to nsca_frack.cfg.erb [puppet] - 10https://gerrit.wikimedia.org/r/433019 [19:28:13] (03CR) 10Jgreen: [C: 032] add check_ipsec to nsca_frack.cfg.erb [puppet] - 10https://gerrit.wikimedia.org/r/433019 (owner: 10Jgreen) [19:29:48] RECOVERY - Disk space on elastic1026 is OK: DISK OK [19:35:07] (03PS5) 10Andrew Bogott: keystonehooks: Update our create_project monkeypatch to match Mitaka upstream [puppet] - 10https://gerrit.wikimedia.org/r/432040 [19:51:09] (03PS1) 10Dzahn: admins: add mholloway to maps/tilerator/kartotherian admins [puppet] - 10https://gerrit.wikimedia.org/r/433021 (https://phabricator.wikimedia.org/T194404) [19:51:57] 10Operations, 10Reading-Infrastructure-Team-Backlog, 10SRE-Access-Requests, 10Patch-For-Review: Add Michael Holloway (Reading Infrastructure) to maps admin groups - https://phabricator.wikimedia.org/T194404#4205655 (10Dzahn) This requested has been approved in today's SRE meeting. [19:53:22] 10Operations, 10Toolforge-standards-committee, 10Wikimedia-Mailing-lists: Rename (recreate) mailing list for Toolforge-standards-committee - https://phabricator.wikimedia.org/T172624#4205662 (10Quiddity) <3 [19:55:54] 10Operations, 10Reading-Infrastructure-Team-Backlog, 10SRE-Access-Requests, 10Patch-For-Review: Add Michael Holloway (Reading Infrastructure) to maps admin groups - https://phabricator.wikimedia.org/T194404#4205667 (10Dzahn) [19:56:38] 10Operations, 10Reading-Infrastructure-Team-Backlog, 10SRE-Access-Requests, 10Patch-For-Review: Add Michael Holloway (Reading Infrastructure) to maps admin groups - https://phabricator.wikimedia.org/T194404#4197487 (10Dzahn) Everything checked off except the waiting period. Will be merged and resolved tomo... 
[19:59:44] 10Operations, 10SRE-Access-Requests: Give Seddon access to the analytics cluster - https://phabricator.wikimedia.org/T194445#4205680 (10Dzahn) a:03herron [19:59:46] 10Operations, 10Analytics, 10SRE-Access-Requests: Access to usergroups for Marshall Miller - https://phabricator.wikimedia.org/T194550#4205682 (10Dzahn) a:03herron [20:00:05] cscott, arlolra, subbu, bearND, halfak, and Amir1: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Services – Parsoid / Citoid / Mobileapps / ORES / … deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180514T2000). [20:00:25] 10Operations, 10Wikimedia-Mailing-lists: wikitech-l is mangling my PGP/MIME emails, causing signature validation to fail - https://phabricator.wikimedia.org/T186311#4205683 (10Dzahn) p:05Triage>03Normal [20:01:14] (03CR) 10Andrew Bogott: [C: 032] keystonehooks: Update our create_project monkeypatch to match Mitaka upstream [puppet] - 10https://gerrit.wikimedia.org/r/432040 (owner: 10Andrew Bogott) [20:01:57] !log bsitzmann@tin Started deploy [mobileapps/deploy@ccffa6b]: Update mobileapps to 39c16e4 (T193440 T193439 T194065) [20:02:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:02:05] T193439: PCS metadata: add description_source - https://phabricator.wikimedia.org/T193439 [20:02:05] T194065: /page/media endpoint fails for a specific case on enwp: "Cannot read property '0' of undefined" - https://phabricator.wikimedia.org/T194065 [20:02:05] T193440: MCS random: use prop=description instead of prop=pageterms - https://phabricator.wikimedia.org/T193440 [20:09:46] !log bsitzmann@tin Finished deploy [mobileapps/deploy@ccffa6b]: Update mobileapps to 39c16e4 (T193440 T193439 T194065) (duration: 07m 49s) [20:09:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:09:52] T193439: PCS metadata: add description_source - https://phabricator.wikimedia.org/T193439 [20:09:52] T194065: 
/page/media endpoint fails for a specific case on enwp: "Cannot read property '0' of undefined" - https://phabricator.wikimedia.org/T194065 [20:09:52] T193440: MCS random: use prop=description instead of prop=pageterms - https://phabricator.wikimedia.org/T193440 [20:10:10] (03PS1) 10Herron: admin: add user seddon to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/433025 (https://phabricator.wikimedia.org/T194445) [20:22:29] 10Operations, 10Analytics-Cluster, 10Analytics-Kanban, 10Traffic, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#4205765 (10Ottomata) @bblack did this end up being a Q4 goal for traffic team? [20:25:12] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Give Seddon access to the analytics cluster - https://phabricator.wikimedia.org/T194445#4205769 (10herron) @Jseddon I don't think we have an SSH key on file for you yet. Could you please provide the unique ssh public key that you'll be using to acce... [20:25:37] 10Operations, 10Analytics-Cluster, 10Analytics-Kanban, 10Traffic, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#4205770 (10BBlack) I think this ended up being an Analytics Q4 goal? It's not on our goals list, but we agree to alot some time to it in this Q... [20:26:17] 10Operations, 10Analytics, 10Patch-For-Review: Puppet admin module should support adding system users to managed groups - https://phabricator.wikimedia.org/T174465#4205772 (10Ottomata) @akosiaris @MoritzMuehlenhoff, I need to resurrect this task. We also need this in order for the druid user to access `hdfs... [20:40:55] 10Operations, 10Analytics-Cluster, 10Analytics-Kanban, 10Traffic, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#4205810 (10Ottomata) Ok, great! From our side, we're mostly looking on either more TODOs and/or approval to remove IPSec from jumbo + varnishka... 
[20:42:40] !log arlolra@tin Started deploy [parsoid/deploy@28fcc4e]: Updating Parsoid to 945ed23 [20:42:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:54:08] !log arlolra@tin Finished deploy [parsoid/deploy@28fcc4e]: Updating Parsoid to 945ed23 (duration: 11m 29s) [20:54:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:00:04] bawolff and Reedy: That opportune time is upon us again. Time for a Weekly Security deployment window deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180514T2100). [21:03:06] !log Updated Parsoid to 945ed23 (T194082, T194083, T194084) [21:03:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:03:12] T194084: Expecting : in parser function definiton - https://phabricator.wikimedia.org/T194084 [21:03:13] T194083: Found nested inserted dom-diff flags! - https://phabricator.wikimedia.org/T194083 [21:03:13] T194082: Cannot read property 'insertBefore' of null - https://phabricator.wikimedia.org/T194082 [21:10:59] (PS1) Herron: admin: add mmiller to analytics-privatedata-users and researchers [puppet] - https://gerrit.wikimedia.org/r/433083 (https://phabricator.wikimedia.org/T194550) [21:44:12] (PS4) Dzahn: mediawiki/apache: seperate line for each chapter ServerAlias [puppet] - https://gerrit.wikimedia.org/r/429863 [21:50:14] (CR) Dzahn: [C: 1] "confirmed UID in LDAP, lgtm" [puppet] - https://gerrit.wikimedia.org/r/433083 (https://phabricator.wikimedia.org/T194550) (owner: Herron) [21:50:41] Operations, ops-eqiad, netops, Patch-For-Review: Rack/cable/configure asw2-c-eqiad switch stack - https://phabricator.wikimedia.org/T187962#4205997 (ayounsi) [21:54:26] !log mwdebug1001 - temp modifying apache 08-wikimedia.conf to test gerrit:429863 [21:54:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:57:30] (CR) Dzahn: [C: 2] "tested with
apache-fast-test and all the urls on mwdebug1001" [puppet] - https://gerrit.wikimedia.org/r/429863 (owner: Dzahn) [21:58:32] (PS1) Bstorm: wiki replicas: return page to a full view [puppet] - https://gerrit.wikimedia.org/r/433085 (https://phabricator.wikimedia.org/T174047) [22:03:01] (CR) Dzahn: [C: 2] "apache2ctl -S before and after shows same number of aliases too" [puppet] - https://gerrit.wikimedia.org/r/429863 (owner: Dzahn) [22:05:17] Operations, SRE-Access-Requests, Patch-For-Review: Give Seddon access to the analytics cluster - https://phabricator.wikimedia.org/T194445#4206012 (Jseddon) ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQC8dbGveSAbYpf9dfFbSbNcIlxnQwhBOZJ+y8MIIX650ujHRofwZEcnUFDIGqEHCMO6clYuFFxN8GUcBzSQI+R9IoIH+GSmx3SIdMVtjM... [22:11:46] (PS5) Dzahn: base: update version of gen_fingerprints script [puppet] - https://gerrit.wikimedia.org/r/429114 [22:32:02] (PS6) Dzahn: base: update version of gen_fingerprints script [puppet] - https://gerrit.wikimedia.org/r/429114 [22:36:43] (PS7) Dzahn: base: update version of gen_fingerprints script [puppet] - https://gerrit.wikimedia.org/r/429114 [22:38:17] (CR) Dzahn: [C: 2] base: update version of gen_fingerprints script [puppet] - https://gerrit.wikimedia.org/r/429114 (owner: Dzahn) [22:38:21] (PS2) Bstorm: wiki replicas: return page to a full view [puppet] - https://gerrit.wikimedia.org/r/433085 (https://phabricator.wikimedia.org/T174047) [22:42:50] (CR) Dzahn: [C: 2] "we are now also getting ASCII art. example:" [puppet] - https://gerrit.wikimedia.org/r/429114 (owner: Dzahn) [22:55:52] oh i totally forgot I was going to deploy https://gerrit.wikimedia.org/r/#/c/432334/ in the security window... oh well I'll just add it to SWAT [22:57:18] no_justification: Can I re-add csp logs to logstash?
We're planning to do a big push related to csp in the upcoming weeks [22:58:52] (CR) Dzahn: [C: -1] "too early" [puppet] - https://gerrit.wikimedia.org/r/430530 (https://phabricator.wikimedia.org/T192092) (owner: Dzahn) [22:59:00] (CR) Dzahn: [C: -1] "too early" [puppet] - https://gerrit.wikimedia.org/r/431039 (https://phabricator.wikimedia.org/T192092) (owner: Dzahn) [22:59:11] (CR) Dzahn: [C: -1] "too early" [puppet] - https://gerrit.wikimedia.org/r/431041 (https://phabricator.wikimedia.org/T192092) (owner: Dzahn) [22:59:18] (CR) Dzahn: [C: -1] "too early" [puppet] - https://gerrit.wikimedia.org/r/431042 (https://phabricator.wikimedia.org/T192092) (owner: Dzahn) [23:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: It is that lovely time of the day again! You are hereby commanded to deploy Evening SWAT (Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180514T2300). [23:00:04] bawolff: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:10] Woo [23:00:28] If its easier for swatters, I could do the deployment myself [23:02:19] bawolff: I'm out of office [23:02:29] You do you boo [23:02:38] Cool cool :) [23:02:50] Thanks [23:02:59] Yw [23:05:11] (CR) Dzahn: "merge once it's Tuesday morning in S.F."
[puppet] - https://gerrit.wikimedia.org/r/433021 (https://phabricator.wikimedia.org/T194404) (owner: Dzahn) [23:06:59] (CR) Dzahn: "could i add this to a DB SWAT window?:)" [puppet] - https://gerrit.wikimedia.org/r/430524 (https://phabricator.wikimedia.org/T192092) (owner: Dzahn) [23:08:03] (PS1) Brian Wolff: Re-enable sending csp logs to logstash [mediawiki-config] - https://gerrit.wikimedia.org/r/433089 [23:17:27] (CR) Dzahn: [C: -1] "affects: thorium.eqiad.wmnet" [puppet] - https://gerrit.wikimedia.org/r/416742 (owner: Dzahn) [23:18:35] SWAT people: Was that a yes to I can deploy my own patch during the swat window? [23:24:02] Based on lack of response, I'm going to take that as a yes, and deploy that patch [23:27:04] (CR) Brian Wolff: [C: 2] Use the english message in badpass logging [mediawiki-config] - https://gerrit.wikimedia.org/r/432334 (owner: Brian Wolff) [23:28:21] (Merged) jenkins-bot: Use the english message in badpass logging [mediawiki-config] - https://gerrit.wikimedia.org/r/432334 (owner: Brian Wolff) [23:29:54] (CR) jenkins-bot: Use the english message in badpass logging [mediawiki-config] - https://gerrit.wikimedia.org/r/432334 (owner: Brian Wolff) [23:30:06] and while i'm here, I'm probably going to deploy https://gerrit.wikimedia.org/r/#/c/433089/ as well [23:32:26] (CR) Brian Wolff: [C: 2] Re-enable sending csp logs to logstash [mediawiki-config] - https://gerrit.wikimedia.org/r/433089 (owner: Brian Wolff) [23:33:38] (Merged) jenkins-bot: Re-enable sending csp logs to logstash [mediawiki-config] - https://gerrit.wikimedia.org/r/433089 (owner: Brian Wolff) [23:36:00] (CR) jenkins-bot: Re-enable sending csp logs to logstash [mediawiki-config] - https://gerrit.wikimedia.org/r/433089 (owner: Brian Wolff) [23:36:19] !log bawolff@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/432334/ Use english messages in badpass log (duration: 01m
16s) [23:36:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:39:57] !log bawolff@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/433089/ - re-enable sending csp logs to logstash (duration: 01m 01s) [23:40:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:40:51] * bawolff done now [23:51:09] Operations, ops-eqiad: Degraded RAID on db1065 - https://phabricator.wikimedia.org/T194698#4206146 (ops-monitoring-bot) [23:53:08] (PS1) Ottomata: Increase MirrorMaker max request size to message.max.bytes + 1Mb [puppet] - https://gerrit.wikimedia.org/r/433092 (https://phabricator.wikimedia.org/T189464) [23:53:54] (CR) Ottomata: [C: 2] Increase MirrorMaker max request size to message.max.bytes + 1Mb [puppet] - https://gerrit.wikimedia.org/r/433092 (https://phabricator.wikimedia.org/T189464) (owner: Ottomata)