[00:17:44] RECOVERY - puppet last run on db1049 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[01:16:04] PROBLEM - MegaRAID on db1046 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough
[02:21:51] !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.2) (duration: 08m 14s)
[02:22:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:27:54] !log l10nupdate@tin ResourceLoader cache refresh completed at Mon Jun 5 02:27:53 UTC 2017 (duration 6m 2s)
[02:28:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:46:04] PROBLEM - puppet last run on labtestservices2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[/usr/local/bin/labs-ip-alias-dump.py]
[04:16:44] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=565.80 Read Requests/Sec=580.00 Write Requests/Sec=11.00 KBytes Read/Sec=37642.00 KBytes_Written/Sec=62.80
[04:24:44] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=17.30 Read Requests/Sec=1.60 Write Requests/Sec=0.30 KBytes Read/Sec=16.40 KBytes_Written/Sec=5.60
[05:18:24] PROBLEM - HHVM rendering on mw2133 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:19:14] RECOVERY - HHVM rendering on mw2133 is OK: HTTP OK: HTTP/1.1 200 OK - 76844 bytes in 0.164 second response time
[05:50:57] (PS1) Marostegui: db-eqiad.php: Add comment about db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/357176
[05:54:42] (CR) Marostegui: [C: 2] db-eqiad.php: Add comment about db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/357176 (owner: Marostegui)
[05:55:43] (Merged) jenkins-bot: db-eqiad.php: Add comment about db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/357176 (owner: Marostegui)
[05:55:52] (CR) jenkins-bot: db-eqiad.php: Add comment about db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/357176 (owner: Marostegui)
[05:56:04] RECOVERY - MegaRAID on db1046 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[05:56:53] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Add coments to db1089's current status (duration: 00m 39s)
[05:57:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:57:46] Operations, ops-eqiad, Analytics-Kanban, DBA, User-Elukey: db1046 BBU looks faulty - https://phabricator.wikimedia.org/T166141#3314068 (Marostegui) This BBU failed again and the policy went back to WriteThrough: ``` Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad...
[05:58:55] !log Stop MySQL on db1095 to take a backup - T153743
[05:59:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:59:03] T153743: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743
[06:07:04] (PS1) Marostegui: db-eqiad.php: Depool db1053 [mediawiki-config] - https://gerrit.wikimedia.org/r/357182 (https://phabricator.wikimedia.org/T166206)
[06:09:26] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1053 [mediawiki-config] - https://gerrit.wikimedia.org/r/357182 (https://phabricator.wikimedia.org/T166206) (owner: Marostegui)
[06:10:47] (Merged) jenkins-bot: db-eqiad.php: Depool db1053 [mediawiki-config] - https://gerrit.wikimedia.org/r/357182 (https://phabricator.wikimedia.org/T166206) (owner: Marostegui)
[06:10:55] (CR) jenkins-bot: db-eqiad.php: Depool db1053 [mediawiki-config] - https://gerrit.wikimedia.org/r/357182 (https://phabricator.wikimedia.org/T166206) (owner: Marostegui)
[06:12:51] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1053 - T166206 (duration: 00m 39s)
[06:12:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:12:59] T166206: Convert unique keys into primary keys for some wiki tables on s4 - https://phabricator.wikimedia.org/T166206
[06:13:51] !log Deploy alter table on s4 - db1053 - T166206
[06:13:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:15:24] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0; xe-4/3/1: down - Transit: Telia (IC-308845) {#3861} [10Gbps]
[06:15:37] !log Deploy alter table on s3 - db1069 - T166278
[06:15:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:15:44] T166278: Unify revision table on s3 - https://phabricator.wikimedia.org/T166278
[06:19:14] PROBLEM - puppet last run on lvs3003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:20:06] !log Deploy alter table s4 - on labsdb1001 - T166206
[06:20:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:20:16] T166206: Convert unique keys into primary keys for some wiki tables on s4 - https://phabricator.wikimedia.org/T166206
[06:24:24] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0
[06:41:13] (CR) Elukey: "dbstore1002 (anaytics-store) does not have '::role::mariadb::misc::eventlogging', is the change intended only for db1047/1046?" [puppet] - https://gerrit.wikimedia.org/r/356648 (owner: Jcrespo)
[06:48:14] RECOVERY - puppet last run on lvs3003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:55:55] RECOVERY - Disk space on labtestcontrol2001 is OK: DISK OK
[06:58:44] PROBLEM - designate-api http on labtestservices2001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:10:44] PROBLEM - swift-object-updater on ms-be1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:10:44] PROBLEM - dhclient process on ms-be1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:10:44] PROBLEM - swift-object-auditor on ms-be1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:10:45] PROBLEM - swift-account-replicator on ms-be1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:10:45] PROBLEM - swift-account-auditor on ms-be1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:10:45] PROBLEM - salt-minion processes on ms-be1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:10:54] PROBLEM - swift-object-server on ms-be1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:10:54] PROBLEM - swift-container-updater on ms-be1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:10:54] PROBLEM - swift-container-auditor on ms-be1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:10:54] PROBLEM - swift-container-server on ms-be1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:10:54] PROBLEM - swift-container-replicator on ms-be1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:10:55] PROBLEM - swift-object-replicator on ms-be1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:10:55] PROBLEM - swift-account-reaper on ms-be1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:10:56] PROBLEM - swift-account-server on ms-be1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:11:34] RECOVERY - swift-object-updater on ms-be1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[07:11:34] RECOVERY - dhclient process on ms-be1002 is OK: PROCS OK: 0 processes with command name dhclient
[07:11:35] RECOVERY - salt-minion processes on ms-be1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[07:11:35] RECOVERY - swift-account-replicator on ms-be1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[07:11:35] RECOVERY - swift-account-auditor on ms-be1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[07:11:35] RECOVERY - swift-object-auditor on ms-be1002 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[07:11:44] RECOVERY - swift-container-replicator on ms-be1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[07:11:44] RECOVERY - swift-object-server on ms-be1002 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[07:11:44] RECOVERY - swift-object-replicator on ms-be1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[07:11:44] RECOVERY - swift-container-updater on ms-be1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[07:11:44] RECOVERY - swift-container-auditor on ms-be1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[07:11:45] RECOVERY - swift-container-server on ms-be1002 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[07:11:45] RECOVERY - swift-account-server on ms-be1002 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[07:11:46] RECOVERY - swift-account-reaper on ms-be1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[07:14:14] RECOVERY - puppet last run on labtestservices2001 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[07:15:12] !log Deploy alter table in s2 (codfw master) this will generate lag in codfw - T166205
[07:15:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:15:26] T166205: Convert unique keys into primary keys for some wiki tables on s2 - https://phabricator.wikimedia.org/T166205
[07:18:19] (Abandoned) Giuseppe Lavagetto: Use directory manifests when enabling the future parser [software/puppet-compiler] - https://gerrit.wikimedia.org/r/356379 (owner: Giuseppe Lavagetto)
[07:40:24] (PS1) Framawiki: Enable SandboxLink extension on euwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/357186 (https://phabricator.wikimedia.org/T166553)
[07:41:28] !log stopping db2038 mysql and preparing for reimage
[07:41:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:43:39] !log Stop labsdb1011 to take a backup - T153743
[07:43:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:43:51] T153743: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743
[07:47:45] PROBLEM - haproxy failover on dbproxy1011 is CRITICAL: CRITICAL check_failover servers up 1 down 1
[07:48:05] ^ that is me with labsdb1011
[08:10:11] !log swift eqiad-prod decom ms-be1009 / 10 / 11 - T166489
[08:10:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:10:20] T166489: Decommission ms-be1001 - ms-be1012 - https://phabricator.wikimedia.org/T166489
[08:33:04] RECOVERY - Check systemd state on conf1002 is OK: OK - running: The system is fully operational
[08:37:14] (PS1) Joal: Add webrequest dataset to pivot configuration [puppet] - https://gerrit.wikimedia.org/r/357191 (https://phabricator.wikimedia.org/T166967)
[08:37:39] elukey: for when you have time in between two reviews --^
[08:42:45] (CR) Elukey: [C: 2] Add webrequest dataset to pivot configuration [puppet] - https://gerrit.wikimedia.org/r/357191 (https://phabricator.wikimedia.org/T166967) (owner: Joal)
[08:48:13] https://phabricator.wikimedia.org/T166806 what's happening here?
[08:48:48] (An unknown error occurred in storage backend "local-swift-eqiad".)
[08:49:11] that's the same message as for the failed transcodes
[08:49:41] how big are those files?
[08:50:23] quite big, but I don't know exactly
[08:50:46] because IIRC now switch caps to ~4G
[08:51:00] err swift
[08:51:43] trying to download https://tools.wmflabs.org/video2commons/static/ssu/Ram_Rajya_(1943)_720p.webm, I get 2 GB
[08:56:24] trying to check logs for those files
[09:04:02] Operations, Commons, Wikimedia-Site-requests, media-storage: Server side upload for Yann - https://phabricator.wikimedia.org/T166806#3314257 (Joe)
[09:06:40] !log Stop replication on db1070 for maintenance - T153743
[09:06:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:06:50] T153743: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743
[09:12:25] yannf: didn't find anything good atm, maybe godog (aka the swift master) will find more
[09:13:23] <_joe_> I strongly suspect it has to do with something other than swift, btw
[09:15:03] the last problem related to upload file size was nginx not accepting a super big payload, this is why I asked..
[09:15:15] I checked FileOperations.log on mwlog but nothing came up
[09:22:32] Operations, Commons, Wikimedia-Site-requests, media-storage: Server side upload for Yann - https://phabricator.wikimedia.org/T166806#3314304 (Yann) That's the same error message as in T166482
[09:23:34] Operations, TimedMediaHandler, media-storage: Persistent failure of TMH to transcode videos at specific resolutions - https://phabricator.wikimedia.org/T166482#3297326 (Yann) This error message has also appeared while doing server-side uploads: T166806.
[09:24:26] elukey, I don't know if it helps, but it is the same error message as in https://phabricator.wikimedia.org/T166482
[09:25:06] Operations, Commons, Wikimedia-Site-requests, media-storage: Server side upload for Yann - https://phabricator.wikimedia.org/T166806#3314309 (zhuyifei1999) >>! In T166806#3309121, @Dereckson wrote: > @zhuyifei1999 Do you wish these tasks in the video2commons project or not? I don't really mind e...
[09:34:08] yannf: I didn't find the same signature on FileOperations.log, might be something different
[10:02:02] I'm taking a look at the task too btw
[10:08:20] Operations, Commons, Wikimedia-Site-requests, media-storage: Server side upload for Yann - https://phabricator.wikimedia.org/T166806#3308339 (fgiunchedi) @Dereckson I saw your importImages run has finished on terbium (?) how'd it go this time?
[10:17:27] Operations, Commons, Wikimedia-Site-requests, media-storage: Server side upload for Yann - https://phabricator.wikimedia.org/T166806#3314353 (Dereckson) Failure too.
[10:42:12] Operations, Commons, Wikimedia-Site-requests, media-storage: Server side upload for Yann - https://phabricator.wikimedia.org/T166806#3314381 (fgiunchedi) Focusing only on one file for now, found a 500 from swift in `FileOperation`, now looking on the swift side ``` archive/FileOperation.log-2017...
[10:50:16] godog: just making sure, will such very frequent swift operations on a single file (like https://commons.wikimedia.org/w/index.php?title=Special%3ALog&type=&user=&page=File%3ANEfgdgdfgW.XCF&year=&month=-1&tagfilter=&hide_thanks_log=1&hide_patrol_log=1&hide_tag_log=1) break swift?
[10:53:33] (PS1) Giuseppe Lavagetto: role:jobqueue_redis: daily restart of slaves [puppet] - https://gerrit.wikimedia.org/r/357193
[10:53:55] <_joe_> elukey: ^^
[10:55:07] zhuyifei1999_: no it should be ok, things tend to be slow rather than break
[10:55:17] k
[11:05:15] PROBLEM - salt-minion processes on puppetmaster1001 is CRITICAL: PROCS CRITICAL: 5 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[11:23:15] RECOVERY - salt-minion processes on puppetmaster1001 is OK: PROCS OK: 4 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[11:25:41] Operations, Commons, Wikimedia-Site-requests, media-storage: Server side upload for Yann - https://phabricator.wikimedia.org/T166806#3314495 (fgiunchedi) @Dereckson I've enabled debug on nginx for connections coming from terbium, can you try the uploads again? thanks!
[11:38:44] Operations, Commons, Wikimedia-Site-requests, media-storage: Server side upload for Yann - https://phabricator.wikimedia.org/T166806#3314518 (Dereckson) Sûre.
[11:39:46] (PS1) Volans: Tox: find and check Python files without extension [puppet] - https://gerrit.wikimedia.org/r/357197 (https://phabricator.wikimedia.org/T144169)
[11:47:46] jouncebot: refresh
[11:47:48] I refreshed my knowledge about deployments.
[11:47:51] godog: done for upload script re-run (and still failing)
[11:52:17] jouncebot: next
[11:52:17] In 1 hour(s) and 7 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170605T1300)
[12:03:37] (CR) Elukey: "Thanks for the review! I am going to apply the changes and then I'll re-submit the patch.. (the ones not commented will be fixed as well, " (5 comments) [puppet] - https://gerrit.wikimedia.org/r/356383 (https://phabricator.wikimedia.org/T108850) (owner: Elukey)
[12:22:31] (PS1) Giuseppe Lavagetto: etcd: big old roles/auth cleanup [puppet] - https://gerrit.wikimedia.org/r/357199
[12:30:35] (CR) Giuseppe Lavagetto: [C: 2] "https://puppet-compiler.wmflabs.org/6666/" [puppet] - https://gerrit.wikimedia.org/r/357199 (owner: Giuseppe Lavagetto)
[12:40:27] volans: http://pymysql.readthedocs.io/en/latest/modules/cursors.html's execute looks really nice, it takes tuples/dicts/lists as args
[12:40:49] I thought it was only tuples
[12:41:14] :)
[12:46:52] pretty easy to mock too in tests
[12:50:46] PROBLEM - puppet last run on db1047 is CRITICAL: Return code of 255 is out of bounds
[12:51:06] PROBLEM - configured eth on db1047 is CRITICAL: Return code of 255 is out of bounds
[12:51:06] PROBLEM - Check size of conntrack table on db1047 is CRITICAL: Return code of 255 is out of bounds
[12:51:16] PROBLEM - Disk space on db1047 is CRITICAL: Return code of 255 is out of bounds
[12:51:16] PROBLEM - salt-minion processes on db1047 is CRITICAL: Return code of 255 is out of bounds
[12:51:16] PROBLEM - dhclient process on db1047 is CRITICAL: Return code of 255 is out of bounds
[12:51:20] uh
[12:51:21] PROBLEM - mysqld processes on db1047 is CRITICAL: Return code of 255 is out of bounds
[12:51:24] icinga going nuts again?
[12:51:27] PROBLEM - MariaDB disk space on db1047 is CRITICAL: Return code of 255 is out of bounds
[12:51:27] PROBLEM - eventlogging_sync processes on db1047 is CRITICAL: Return code of 255 is out of bounds
[12:51:47] RECOVERY - puppet last run on db1047 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[12:52:02] <_joe_> marostegui: that would mean nrpe not running
[12:52:06] <_joe_> or something
[12:52:08] RECOVERY - configured eth on db1047 is OK: OK - interfaces up
[12:52:08] RECOVERY - Check size of conntrack table on db1047 is OK: OK: nf_conntrack is 2 % full
[12:52:17] RECOVERY - Disk space on db1047 is OK: DISK OK
[12:52:17] RECOVERY - salt-minion processes on db1047 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[12:52:17] RECOVERY - dhclient process on db1047 is OK: PROCS OK: 0 processes with command name dhclient
[12:52:23] RECOVERY - mysqld processes on db1047 is OK: PROCS OK: 1 process with command name mysqld
[12:52:28] RECOVERY - MariaDB disk space on db1047 is OK: DISK OK
[12:52:28] RECOVERY - eventlogging_sync processes on db1047 is OK: PROCS OK: 1 process with UID = 0 (root), args /bin/bash /usr/local/bin/eventlogging_sync.sh
[12:52:31] just db1047, so it's clearly not icinga losing the downtimes again
[12:52:43] Operations, Traffic, netops, Patch-For-Review: Re-setup lvs1007-lvs1012, replace lvs1001-lvs1006 - https://phabricator.wikimedia.org/T150256#3314674 (ops-monitoring-bot) Script wmf_auto_reimage was launched by bblack on neodymium.eqiad.wmnet for hosts: ``` ['lvs1007.eqiad.wmnet'] ``` The log can...
[12:52:50] I ran puppet and it started nrpe indeed
[12:52:51] so yes
[12:52:59] <_joe_> yeah that might be the reason
[12:53:10] That server is a bit screwed, mysql wise
[12:53:14] I will take care of it
[12:53:21] sigh
[12:53:50] marostegui: what happens to db1047?
[12:54:01] elukey: it is a bit dead - mysql wise
[12:54:19] elukey: T166452
[12:54:20] T166452: db1047 has been restarted - needs another restart - https://phabricator.wikimedia.org/T166452
[12:54:26] !log Stop MySQL db1047 - T166452
[12:54:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:55:38] PROBLEM - salt-minion processes on puppetmaster1001 is CRITICAL: PROCS CRITICAL: 5 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[12:56:48] PROBLEM - haproxy failover on dbproxy1009 is CRITICAL: CRITICAL check_failover servers up 1 down 1
[12:56:48] PROBLEM - haproxy failover on dbproxy1004 is CRITICAL: CRITICAL check_failover servers up 1 down 1
[12:56:55] ^ me shutting down db1047
[12:59:48] RECOVERY - haproxy failover on dbproxy1009 is OK: OK check_failover servers up 2 down 0
[12:59:48] RECOVERY - haproxy failover on dbproxy1004 is OK: OK check_failover servers up 2 down 0
[13:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170605T1300).
[13:00:04] thedj and DatGuy: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[13:00:14] yep
[13:00:59] o/
[13:01:03] o/
[13:02:34] !log rebooting lsv1010 (post-reinstall)
[13:02:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:09] andrewbogott: https://gerrit.wikimedia.org/r/#/c/350494/
[13:04:28] PROBLEM - Host lvs1010 is DOWN: PING CRITICAL - Packet loss = 100%
[13:04:34] DatGuy: looks like it's just the two of us for eu swat
[13:04:45] I can deploy your commit
[13:04:50] o/
[13:04:53] I am around if needed
[13:05:03] hashar: want to do the swat, or should I?
[13:05:13] (I am fine with doing it, just asking)
[13:05:16] please do. I am busy with jenkins :)
[13:05:21] hashar: will do
[13:05:32] alright. commit is https://gerrit.wikimedia.org/r/#/c/357128/ if lazy :D
[13:05:46] DatGuy: already reviewing it :)
[13:06:18] RECOVERY - Check systemd state on lvs1010 is OK: OK - running: The system is fully operational
[13:06:28] RECOVERY - Host lvs1010 is UP: PING OK - Packet loss = 0%, RTA = 36.09 ms
[13:06:41] (PS15) Krinkle: dynamicproxy: Centralise error page template and use it [puppet] - https://gerrit.wikimedia.org/r/350494 (https://phabricator.wikimedia.org/T113114)
[13:06:48] (CR) Krinkle: "Minor doc improvements." [puppet] - https://gerrit.wikimedia.org/r/350494 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[13:09:11] (PS16) Krinkle: dynamicproxy: Centralise error page template and use it [puppet] - https://gerrit.wikimedia.org/r/350494 (https://phabricator.wikimedia.org/T113114)
[13:09:34] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/357128 (https://phabricator.wikimedia.org/T166788) (owner: DatGuy)
[13:09:45] (CR) Krinkle: "Haven't been able to test. Which node can I use in the puppet-compiler?" [puppet] - https://gerrit.wikimedia.org/r/350494 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[13:11:52] (Merged) jenkins-bot: Lift IP throttle for Wikimedia Chile editathon [mediawiki-config] - https://gerrit.wikimedia.org/r/357128 (https://phabricator.wikimedia.org/T166788) (owner: DatGuy)
[13:12:05] (CR) jenkins-bot: Lift IP throttle for Wikimedia Chile editathon [mediawiki-config] - https://gerrit.wikimedia.org/r/357128 (https://phabricator.wikimedia.org/T166788) (owner: DatGuy)
[13:16:45] !log zfilipin@tin Synchronized wmf-config/throttle.php: SWAT: [[gerrit:357128|Lift IP throttle for Wikimedia Chile editathon (T166788)]] (duration: 00m 39s)
[13:16:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:54] T166788: Lift IP rate limit - Editathon (WMCL) - 2017-06-06 - https://phabricator.wikimedia.org/T166788
[13:17:06] DatGuy: deployed
[13:17:12] Dereckson: thanks! I'll check
[13:17:13] I guess there is nothing to check, right?
[13:17:44] I could create an account?
[13:17:54] I have ACC access and could process a request?
[13:18:15] DatGuy: well, you should create several accounts from that range of IPs, right? (to test it)
[13:18:42] It's pretty urgent because it is for tomorrow. I couldn't get a hang of the event
[13:19:01] but no errors on basic stuff, so I guess it's good :)
[13:19:15] we are around tomorrow, in case of emergency
[13:19:30] well, somebody will be around, and there are several swat deploy windows...
[13:19:47] One day, we'll have a request without IPv6
[13:19:49] I'm not available tomorrow
[13:20:09] thedj: around for eu swat?
[13:20:12] and surprise, the browser will pick IPv6 instead of whitelisted IPv4
[13:20:35] zeljkof, I believe it's good for deployment to master
[13:20:44] so be ready if the ip given by the contact is the ip whitelisted to ask "would you have some ipv6"
[13:22:49] DatGuy: your commit is already deployed
[13:22:57] oh
[13:22:59] :P
[13:23:30] did you skip the eqiad server?
[13:23:47] cluster*
[13:24:29] well, thanks
[13:26:40] DatGuy: deployment.eqiad.wmnet?
[13:26:53] I have deployed from there
[13:27:02] mwdebug1002.eqiad.wmnet
[13:27:10] I guess thedj is not around, so finishing eu swat
[13:27:33] DatGuy: yes, skipped that since as far as I know one can not test throttle rules there
[13:27:42] ah ok
[13:27:43] was I wrong? could you test it there?
[13:27:46] not sure
[13:27:59] !log eu swat finished
[13:28:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:30:36] just a note, because it's tomorrow don't forget to remove the memcache key zeljkof
[13:32:13] DatGuy: sorry, did not understand that
[13:32:23] is there something I need to do tomorrow?
[13:32:24] https://wikitech.wikimedia.org/wiki/Increasing_account_creation_threshold
[13:33:02] (CR) Thcipriani: [C: 1] "> Alexandros and I will take care of the migration on Tuesday 6th" [puppet] - https://gerrit.wikimedia.org/r/354186 (https://phabricator.wikimedia.org/T129148) (owner: Thcipriani)
[13:33:36] DatGuy: oops, I was not aware of that
[13:33:42] should I do it right now? or tomorrow?
[13:34:33] probably now
[13:36:10] ok, doing it right now
[13:40:40] DatGuy: ok, just to make it clear, this is what I need to do?
[13:40:44] mwscript mcc.php --wiki=enwiki
[13:40:50] and the same for eswiki?
[13:40:59] hashar: ^
[13:41:27] seems like it.... I never had deployment access. hash.ar which you already pinged might be of help more
[13:42:17] hashar: also, TheDJ is not around, my plan was to skip his commit
[13:42:46] what is happening?
[13:43:03] hashar: I have deployed https://gerrit.wikimedia.org/r/#/c/357128
[13:43:21] DatGuy says I should run this script https://wikitech.wikimedia.org/wiki/Increasing_account_creation_threshold
[13:43:36] but I have never done it, so I am not sure how to do it, like this:
[13:43:43] zfilipin@terbium:~$ mwscript mcc.php --wiki=enwiki
[13:43:50] hashar: is that correct?
[13:43:56] (and the same for eswiki)
[13:44:08] I never ever bothered to clear out the cached key
[13:44:20] hashar: so I can just skip that step?
[13:44:27] though yeah apparently mediawiki core seems to have a TTL of 86400 seconds
[13:44:52] fyi it's tomorrow, so that's why I thought it might be necessary (not sure if your other deploys have been with more time notice)
[13:46:39] hashar: so, should I run the script or skip that step? :)
[13:46:47] Operations, Commons, Wikimedia-Site-requests, media-storage: Server side upload for Yann - https://phabricator.wikimedia.org/T166806#3314791 (fgiunchedi) So the problem is the `tmpfs` on `/var/lib/nginx` being 1G, IOW the maximum client body that nginx will spool there. For swift frontends I thi...
[13:47:06] !log rebooting lvs1010 again
[13:47:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:47:26] zeljkof: I have no idea
[13:48:47] the event is 2017-06-06T15:00 UTC, so 17:00 our time, just a couple hours less than 24 hours since the deployment (about 22)
[13:49:11] hashar: I am not even sure how to run the script
[13:49:35] does `mwscript mcc.php --wiki=$wiki` mean `mwscript mcc.php --wiki=enwiki`?
[13:49:47] ($wiki stands for enwiki, in this case?)
[13:50:35] (CR) Ottomata: "Just a thought: Does belong in mariadb role? Or eventlogging role?" [puppet] - https://gerrit.wikimedia.org/r/356383 (https://phabricator.wikimedia.org/T108850) (owner: Elukey)
[13:51:59] I don't even know how long those keys stick in memcached
[13:53:06] hashar: is there anybody we should ask? or just do nothing? :)
[13:53:13] (PS1) Filippo Giunchedi: hieradata: set nginx client_max_body_size 0 for swift [puppet] - https://gerrit.wikimedia.org/r/357207 (https://phabricator.wikimedia.org/T166806)
[13:54:22] ottomata: eventlogging_sync.sh is in the mariadb role, this is why I added the script in there
[13:54:27] ahhh ok
[13:54:30] ya guess so
[13:54:31] zeljkof: maybe we should. Then that documentation is most probably obsolete
[13:54:31] :/
[13:54:49] I am ok to put it wherever we want :)
[13:55:13] Operations, ops-eqiad, DBA: db1089: update RAID controller firwmare - https://phabricator.wikimedia.org/T166935#3314805 (Marostegui) ``` root@db1089:~# hpssacli controller all show detail | grep Firmware Firmware Version: 3.56 ```
[13:55:21] hashar: last update in 2012
[13:55:48] elukey: ya sounds fine, leave it in mariadb if the other one is there
[13:56:11] hashar: ok, doing nothing for now
[14:02:07] (CR) Filippo Giunchedi: [C: 2] hieradata: set nginx client_max_body_size 0 for swift [puppet] - https://gerrit.wikimedia.org/r/357207 (https://phabricator.wikimedia.org/T166806) (owner: Filippo Giunchedi)
[14:04:32] Operations, Puppet, Horizon, Labs, Patch-For-Review: Puppet tab in Horizon unusably slow - https://phabricator.wikimedia.org/T149589#3314826 (Volans) To add some data here, I'm getting very slow responses when opening an instance page, like `https://horizon.wikimedia.org/project/instances/edb...
[14:08:42] Operations, Commons, Wikimedia-Site-requests, media-storage, Patch-For-Review: Server side upload for Yann - https://phabricator.wikimedia.org/T166806#3314833 (fgiunchedi) @Dereckson could you try the uploads one more time? I've disabled spooling of files to disk in nginx
[14:11:22] godog: still an issue
[14:14:07] (PS5) Elukey: role::mariadb::analytics::custom_repl_slave: add eventlogging_cleaner.py [puppet] - https://gerrit.wikimedia.org/r/356383 (https://phabricator.wikimedia.org/T108850)
[14:21:25] Dereckson: sigh, sorry about the false starts, I'll poke at it some more
[14:23:48] Operations, Traffic, netops, Patch-For-Review: Re-setup lvs1007-lvs1012, replace lvs1001-lvs1006 - https://phabricator.wikimedia.org/T150256#3314892 (ops-monitoring-bot) Script wmf_auto_reimage was launched by bblack on neodymium.eqiad.wmnet for hosts: ``` ['lvs1007.eqiad.wmnet'] ``` The log can...
[14:25:56] (PS6) Elukey: role::mariadb::analytics::custom_repl_slave: add eventlogging_cleaner.py [puppet] - https://gerrit.wikimedia.org/r/356383 (https://phabricator.wikimedia.org/T108850)
[14:27:28] PROBLEM - Nginx local proxy to apache on mw1206 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.153 second response time
[14:28:28] RECOVERY - Nginx local proxy to apache on mw1206 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.195 second response time
[14:41:37] (PS1) Marostegui: db-eqiad.php: Repool db1089 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/357216
[14:43:10] (CR) Marostegui: [C: 2] db-eqiad.php: Repool db1089 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/357216 (owner: Marostegui)
[14:43:49] (PS1) Filippo Giunchedi: tlsproxy: selectively disable request buffering [puppet] - https://gerrit.wikimedia.org/r/357218 (https://phabricator.wikimedia.org/T166806)
[14:44:11] (Merged) jenkins-bot: db-eqiad.php: Repool db1089 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/357216 (owner: Marostegui)
[14:45:12] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1089 with low weight - T166935 (duration: 00m 39s)
[14:45:15] Operations, ops-eqiad, DBA: db1089: update RAID controller firwmare - https://phabricator.wikimedia.org/T166935#3315004 (Marostegui) I have started to slowly repool this server as I don't want to leave it out much longer. @Cmjohnson once you have time for the firmware upgrade, let us know and we will...
[14:45:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:45:23] T166935: db1089: update RAID controller firwmare - https://phabricator.wikimedia.org/T166935
[14:45:51] (CR) jenkins-bot: db-eqiad.php: Repool db1089 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/357216 (owner: Marostegui)
[14:53:41] (CR) Mforns: "LGTM!" (7 comments) [puppet] - https://gerrit.wikimedia.org/r/356383 (https://phabricator.wikimedia.org/T108850) (owner: Elukey)
[14:57:57] RECOVERY - Host lvs1007 is UP: PING OK - Packet loss = 0%, RTA = 37.44 ms
[14:59:22] (PS1) Marostegui: db-eqiad.php: Increase db1089 weight [mediawiki-config] - https://gerrit.wikimedia.org/r/357220
[14:59:30] (CR) Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/6668/" [puppet] - https://gerrit.wikimedia.org/r/357218 (https://phabricator.wikimedia.org/T166806) (owner: Filippo Giunchedi)
[14:59:34] Hi. I need someone from DB and operations to supervise a global rename of a user with more than 50k edits if possible
[15:00:34] <_joe_> TabbyCat: open a phabricator ticket I guess?
[15:00:40] +1
[15:00:46] <_joe_> adding #operations as a tag
[15:01:09] Add also #DBA please
[15:01:13] (CR) Marostegui: [C: 2] db-eqiad.php: Increase db1089 weight [mediawiki-config] - https://gerrit.wikimedia.org/r/357220 (owner: Marostegui)
[15:02:12] (Merged) jenkins-bot: db-eqiad.php: Increase db1089 weight [mediawiki-config] - https://gerrit.wikimedia.org/r/357220 (owner: Marostegui)
[15:02:30] (CR) jenkins-bot: db-eqiad.php: Increase db1089 weight [mediawiki-config] - https://gerrit.wikimedia.org/r/357220 (owner: Marostegui)
[15:03:04] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase db1089 weight - T166935 (duration: 00m 38s)
[15:03:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:03:13] T166935: db1089: update RAID controller firwmare - https://phabricator.wikimedia.org/T166935
[15:03:42] ktnx :)
[15:03:45] will do
[15:12:42] (PS1) Marostegui: db-eqiad.php: Increase db1089 weight [mediawiki-config] - https://gerrit.wikimedia.org/r/357221
[15:13:42] Operations, ops-eqiad, User-fgiunchedi: Debug HP raid cache disabled errors on ms-be1019/20/21 - https://phabricator.wikimedia.org/T163777#3315093 (fgiunchedi)
[15:13:44] Operations, ops-eqiad: Degraded RAID on ms-be1020 - https://phabricator.wikimedia.org/T166837#3315095 (fgiunchedi)
[15:13:59] (CR) Marostegui: [C: 2] db-eqiad.php: Increase db1089 weight [mediawiki-config] - https://gerrit.wikimedia.org/r/357221 (owner: Marostegui)
[15:15:26] (Merged) jenkins-bot: db-eqiad.php: Increase db1089 weight [mediawiki-config] - https://gerrit.wikimedia.org/r/357221 (owner: Marostegui)
[15:15:32] Operations, ops-esams, Traffic: Degraded RAID on lvs3001 - https://phabricator.wikimedia.org/T166965#3315104 (Volans)
[15:15:50] (CR) jenkins-bot: db-eqiad.php: Increase db1089 weight [mediawiki-config] - https://gerrit.wikimedia.org/r/357221 (owner: Marostegui)
[15:16:20] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase db1089 weight - T166935 (duration: 00m 39s)
[15:16:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:16:30] T166935: db1089: update RAID controller firwmare - https://phabricator.wikimedia.org/T166935
[15:16:42] Operations, ops-esams, Traffic: Degraded RAID on lvs3001 - https://phabricator.wikimedia.org/T166964#3315106 (Volans)
[15:20:42] Operations, DBA, GlobalRename, MediaWiki-extensions-CentralAuth: Global rename of global account with > 50k edits - supervision needed - https://phabricator.wikimedia.org/T167031#3315110 (MarcoAurelio)
[15:23:06] Operations, ops-esams, Traffic: Degraded RAID on lvs3001 - https://phabricator.wikimedia.org/T166964#3313134 (Volans) Relating it to T166965
[15:26:14] Operations, ops-eqiad: Degraded RAID on terbium - https://phabricator.wikimedia.org/T166962#3315157 (Volans) p:Triage>Normal a:Volans False positive, I'll add the error message to the list of ones to be skipped.
[15:28:40] (PS1) Volans: Icinga: skip another NRPE error in raid hanlder [puppet] - https://gerrit.wikimedia.org/r/357223 (https://phabricator.wikimedia.org/T166962)
[15:32:04] godog: it's a 1-line, I'm merging it, just FYI, another way NRPE can fail ;)
[15:32:13] (CR) Volans: [C: 2] Icinga: skip another NRPE error in raid hanlder [puppet] - https://gerrit.wikimedia.org/r/357223 (https://phabricator.wikimedia.org/T166962) (owner: Volans)
[15:33:19] Operations, ops-eqiad: Degraded RAID on terbium - https://phabricator.wikimedia.org/T166962#3315191 (Volans) Open>Resolved Fix merged.
[15:33:45] volans: sigh, thanks for the heads up
[15:34:07] yw :)
[15:35:55] actually, you should thank NRPE for it's fantasy :-P
[15:36:49] Operations, Analytics: Broken /a/refinery-source/guard/run_all_guards.sh script on stat1002 - https://phabricator.wikimedia.org/T166937#3315212 (Nuria)
[15:37:01] Operations, Performance-Team, Thumbor, MW-1.30-release-notes (WMF-deploy-2017-06-06_(1.30.0-wmf.4)), Patch-For-Review: Thumbor should reject thumbnail requests that are the same size as the original or bigger - https://phabricator.wikimedia.org/T150741#3315226 (fgiunchedi)
[15:37:03] Operations, Performance-Team, Thumbor: Limit maximum x-content-dimension size to avoid hitting nginx limits - https://phabricator.wikimedia.org/T167034#3315213 (fgiunchedi)
[15:37:06] Operations, Analytics: Broken /a/refinery-source/guard/run_all_guards.sh script on stat1002 - https://phabricator.wikimedia.org/T166937#3312286 (Nuria) p:Normal>High
[15:38:53] PROBLEM - Check Varnish expiry mailbox lag on cp1099 is CRITICAL: CRITICAL: expiry mailbox lag is 2001206
[15:50:40] Operations, DBA, GlobalRename, MediaWiki-extensions-CentralAuth: Global rename of global account with > 50k edits - supervision needed - https://phabricator.wikimedia.org/T167031#3315249 (jcrespo) I won't be around on Friday or Monday. I would suggest starting it next Tuesday, assuming it is appr...
[15:54:19] Operations, Patch-For-Review: Tracking and Reducing cron-spam from root@ - https://phabricator.wikimedia.org/T132324#3315266 (fgiunchedi)
[15:54:21] Operations: acct monthly cron will spam when /var/log/wtmp.1 doesn't exist - https://phabricator.wikimedia.org/T167035#3315254 (fgiunchedi)
[15:54:58] Operations: stretch acct monthly cron will spam when /var/log/wtmp.1 doesn't exist - https://phabricator.wikimedia.org/T167035#3315254 (fgiunchedi)
[16:10:00] Operations, ops-eqiad, Analytics-Kanban, DBA, User-Elukey: db1046 BBU looks faulty - https://phabricator.wikimedia.org/T166141#3315324 (Nuria) Since this is the master for eventlogging machine. Can we move the refresh for this host to happen sooner? (ping @jcrespo) https://phabricator.wikime...
[16:13:03] PROBLEM - puppet last run on mw1229 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:13:54] (CR) BBlack: [C: 1] tlsproxy: selectively disable request buffering [puppet] - https://gerrit.wikimedia.org/r/357218 (https://phabricator.wikimedia.org/T166806) (owner: Filippo Giunchedi)
[16:14:44] (PS2) Filippo Giunchedi: tlsproxy: selectively disable request buffering [puppet] - https://gerrit.wikimedia.org/r/357218 (https://phabricator.wikimedia.org/T166806)
[16:15:34] Operations, DBA, GlobalRename, MediaWiki-extensions-CentralAuth: Global rename of Idh0854 -> Garam: supervision needed - https://phabricator.wikimedia.org/T167031#3315348 (Framawiki) p:Triage>Normal
[16:16:02] Operations, Wikimedia-Site-requests: Global rename of Idh0854 -> Garam: supervision needed - https://phabricator.wikimedia.org/T167031#3315110 (Framawiki)
[16:16:41] (CR) Filippo Giunchedi: [C: 2] tlsproxy: selectively disable request buffering [puppet] - https://gerrit.wikimedia.org/r/357218 (https://phabricator.wikimedia.org/T166806) (owner: Filippo Giunchedi)
[16:17:04] Operations, ops-eqiad, Analytics-Kanban, DBA, User-Elukey: db1046 BBU looks faulty - https://phabricator.wikimedia.org/T166141#3315357 (jcrespo) Not really, we have almost decided the goals for Q1, and they are all quite urgent and for hardware that has been already bought. What we can do to...
[16:17:17] Operations, Wikimedia-Site-requests: Global rename of Idh0854 -> Garam: supervision needed - https://phabricator.wikimedia.org/T167031#3315110 (Framawiki) (it looks like these tags are for the extension itself, not for our usage on WMF wikis)
[16:19:25] !log stopping db2037 and preparing for reimage
[16:19:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:19:45] (PS1) Marostegui: db-eqiad.php: Restore db1089 original weight [mediawiki-config] - https://gerrit.wikimedia.org/r/357229
[16:20:16] (PS1) Filippo Giunchedi: hieradata: turn off nginx proxy_request_buffering [puppet] - https://gerrit.wikimedia.org/r/357230 (https://phabricator.wikimedia.org/T166806)
[16:20:17] Operations, Analytics, User-Elukey: kafkatee's logrotate/syslog default pkg files needs to be removed - https://phabricator.wikimedia.org/T145490#3315393 (Nuria)
[16:20:19] Operations, Patch-For-Review, User-Elukey: Cron conflict for kafkatee logrotate on oxygen - https://phabricator.wikimedia.org/T151748#3315391 (Nuria)
[16:21:25] Operations, Analytics: Rename stat100x machines to have misc element names - https://phabricator.wikimedia.org/T149228#2745858 (Nuria) We actually renamed 1001 to be thorium. It is no longer a stats box (in name).
[16:21:37] Operations, Analytics: Rename stat100x machines to have misc element names - https://phabricator.wikimedia.org/T149228#3315397 (Nuria) Open>Resolved
[16:22:25] (CR) Filippo Giunchedi: [C: 2] hieradata: turn off nginx proxy_request_buffering [puppet] - https://gerrit.wikimedia.org/r/357230 (https://phabricator.wikimedia.org/T166806) (owner: Filippo Giunchedi)
[16:22:40] (CR) Marostegui: [C: 2] db-eqiad.php: Restore db1089 original weight [mediawiki-config] - https://gerrit.wikimedia.org/r/357229 (owner: Marostegui)
[16:24:02] (Merged) jenkins-bot: db-eqiad.php: Restore db1089 original weight [mediawiki-config] - https://gerrit.wikimedia.org/r/357229 (owner: Marostegui)
[16:25:02] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Restore db1089 original weight - T166935 (duration: 00m 38s)
[16:25:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:25:10] T166935: db1089: update RAID controller firwmare - https://phabricator.wikimedia.org/T166935
[16:25:54] (CR) jenkins-bot: db-eqiad.php: Restore db1089 original weight [mediawiki-config] - https://gerrit.wikimedia.org/r/357229 (owner: Marostegui)
[16:26:38] Operations, Analytics, Performance-Team: Update jq to v1.4.0 or higher - https://phabricator.wikimedia.org/T159392#3066309 (Nuria) We are moving to new stats boxes which we will not have this problem.
[16:26:45] Operations, Analytics, Performance-Team: Update jq to v1.4.0 or higher - https://phabricator.wikimedia.org/T159392#3315423 (Nuria) Open>declined
[16:33:09] Operations, Wikimedia-Site-requests: Global rename of Idh0854 -> Garam: supervision needed - https://phabricator.wikimedia.org/T167031#3315455 (Marostegui) Good point @jcrespo about the alter tables :-)
[16:34:11] Dereckson: still here? mind trying another upload :?
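The db1089 repool in this log (T166935) was staged: repool with low weight at 14:45, raise the weight at 15:03 and again at 15:16, then restore the original weight at 16:25. The Python sketch below illustrates that kind of ramp. It assumes replicas receive reads in proportion to their configured weight (weighted random selection). The helpers `ramp_schedule` and `traffic_share`, the peer host names, and every weight value are hypothetical; the real numbers in wmf-config/db-eqiad.php do not appear in this log.

```python
from __future__ import annotations


def ramp_schedule(target: int, steps: int) -> list[int]:
    """Hypothetical helper: increasing weights that end at `target`.

    Mirrors a "low weight -> increase -> increase -> original weight" repool.
    """
    if steps < 1 or target < 1:
        raise ValueError("steps and target must be positive")
    # Evenly spaced integer weights; never 0, and the last one is the target.
    return [max(1, round(target * i / steps)) for i in range(1, steps + 1)]


def traffic_share(weights: dict[str, int], host: str) -> float:
    """Fraction of reads `host` gets, assuming weighted random selection."""
    total = sum(weights.values())
    return weights[host] / total if total else 0.0


if __name__ == "__main__":
    # Illustrative peers and weights only, not the production configuration.
    peers = {"replica-a": 200, "replica-b": 200}
    for weight in ramp_schedule(target=200, steps=4):
        share = traffic_share({**peers, "db1089": weight}, "db1089")
        print(f"db1089 weight={weight:3d} -> {share:.1%} of reads")
```

With two hypothetical peers at weight 200 and a four-step ramp to 200, the sketch's db1089 share grows from about 11% to about 33% of reads, which is the point of ramping: a cold or freshly repaired replica takes load gradually instead of all at once.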
[16:41:55] (PS1) Framawiki: Lift IP throttle for Wikipedia workshop (14 June 2017) [mediawiki-config] - https://gerrit.wikimedia.org/r/357233 (https://phabricator.wikimedia.org/T167011)
[16:42:03] RECOVERY - puppet last run on mw1229 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[16:43:22] (CR) Hashar: [C: 1] "From a discussion with Tyler. Yes batched deploy sounds like a good idea." [puppet] - https://gerrit.wikimedia.org/r/354186 (https://phabricator.wikimedia.org/T129148) (owner: Thcipriani)
[16:46:55] Operations, Traffic, Interactive-Sprint, Maps (Kartographer), Regression: Map tiles load way slower than before - https://phabricator.wikimedia.org/T167046#3315566 (MaxSem)
[16:47:30] (CR) Filippo Giunchedi: "paladox, 20after4 I've amended the patch to list the redirects first, PTAL" [puppet] - https://gerrit.wikimedia.org/r/355769 (https://phabricator.wikimedia.org/T166120) (owner: Filippo Giunchedi)
[16:48:57] (CR) Paladox: "Thanks." [puppet] - https://gerrit.wikimedia.org/r/355769 (https://phabricator.wikimedia.org/T166120) (owner: Filippo Giunchedi)
[16:52:11] paladox: on what host did you test it btw?
[16:52:25] godog phabricator (actually called phabricator)
[16:53:46] (CR) Dzahn: [C: 1] "i like Indigo though :) hehe http://bikeshed.org/" [puppet] - https://gerrit.wikimedia.org/r/357121 (owner: Paladox)
[16:53:48] paladox: neat, thanks, just out of curiosity
[16:55:03] i just applied the above again and works :). http://phabzilla.wmflabs.org redirects to https://phab-01.wmflabs.org
[16:55:46] (CR) Paladox: [C: 1] "Tested and works https://phabzilla.wmflabs.org/" [puppet] - https://gerrit.wikimedia.org/r/355769 (https://phabricator.wikimedia.org/T166120) (owner: Filippo Giunchedi)
[17:00:04] gehel: Dear anthropoid, the time has come. Please deploy Weekly Wikidata query service deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170605T1700).
[17:01:58] (PS2) Dzahn: planet: add Wikimedia Scoring Platform blog feed [puppet] - https://gerrit.wikimedia.org/r/357110 (owner: BryanDavis)
[17:03:06] (CR) Dzahn: [C: 2] "just adjusted spelling ("Wikikmedia")" [puppet] - https://gerrit.wikimedia.org/r/357110 (owner: BryanDavis)
[17:03:17] (PS3) Dzahn: planet: add Wikimedia Scoring Platform blog feed [puppet] - https://gerrit.wikimedia.org/r/357110 (owner: BryanDavis)
[17:06:07] (PS3) Dzahn: Phabricator: Fix colour for Unbreak Now tasks [puppet] - https://gerrit.wikimedia.org/r/357121 (owner: Paladox)
[17:06:44] (CR) Dzahn: [C: 2] "yep, this isn't changing anything, it's just putting it in config, ACK @ "must have been change through web UI"" [puppet] - https://gerrit.wikimedia.org/r/357121 (owner: Paladox)
[17:07:02] mutante ^^ thanks :)
[17:08:04] Operations, ORES, Scoring-platform-team, Developer-notice, MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)): Re-enable ORES data in action API - https://phabricator.wikimedia.org/T163687#3315642 (Halfak) Open>Resolved a:Halfak
[17:11:57] (CR) Dzahn: [C: -1] "@Hashar so this is still -1 per "Luasandbox crashes with HHVM 3.18 T165043" does that mean it's _not_ cherry-picked anymore?" [puppet] - https://gerrit.wikimedia.org/r/353964 (https://phabricator.wikimedia.org/T165462) (owner: Paladox)
[17:14:33] (CR) Dzahn: [C: 1] "lgtm, before this it would install BOTH, version 7 and 8, if on jessie. That's not desired, right Hashar?" [puppet] - https://gerrit.wikimedia.org/r/356241 (https://phabricator.wikimedia.org/T166611) (owner: Paladox)
[17:48:40] (PS1) Volans: CLI: improve configuration error handling [software/cumin] - https://gerrit.wikimedia.org/r/357234 (https://phabricator.wikimedia.org/T158747)
[17:49:07] (PS2) Volans: CLI: improve configuration error handling [software/cumin] - https://gerrit.wikimedia.org/r/357234 (https://phabricator.wikimedia.org/T158747)
[17:49:21] (PS13) Dzahn: jenkins: Install java 8 on stretch and greater [puppet] - https://gerrit.wikimedia.org/r/356243 (https://phabricator.wikimedia.org/T166611) (owner: Paladox)
[17:51:22] (CR) Ladsgroup: "@aude: I think you should remove the -1 as the change is deployed on testwikidata" [mediawiki-config] - https://gerrit.wikimedia.org/r/353601 (https://phabricator.wikimedia.org/T165197) (owner: Ladsgroup)
[17:57:45] (CR) Dzahn: [C: 2] "only a change if on stretch" [puppet] - https://gerrit.wikimedia.org/r/356243 (https://phabricator.wikimedia.org/T166611) (owner: Paladox)
[18:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170605T1800).
[18:00:04] framawiki and Amir1: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[18:00:13] o/
[18:00:14] Operations, Commons, Wikimedia-Site-requests, media-storage, Patch-For-Review: Server side upload for Yann - https://phabricator.wikimedia.org/T166806#3315819 (Yann) The first file is 2 GB, so I don't even understand why it needs server-side upload. I thought that the limit is 4 GB now. Isn't...
[18:00:16] \o
[18:00:25] guess I can deploy
[18:03:18] yannf: I have to run out now, though can you try again the uploads in T166806 even not server side? I should have fixed more limits to actually match 4GB
[18:03:19] T166806: Server side upload for Yann - https://phabricator.wikimedia.org/T166806
[18:06:10] (CR) Elukey: role::mariadb::analytics::custom_repl_slave: add eventlogging_cleaner.py (5 comments) [puppet] - https://gerrit.wikimedia.org/r/356383 (https://phabricator.wikimedia.org/T108850) (owner: Elukey)
[18:06:14] (PS4) MaxSem: Enable wgExtraSignatureNamespaces at NS:102 for trwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/356437 (https://phabricator.wikimedia.org/T166522) (owner: Framawiki)
[18:06:21] (CR) MaxSem: [C: 2] Enable wgExtraSignatureNamespaces at NS:102 for trwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/356437 (https://phabricator.wikimedia.org/T166522) (owner: Framawiki)
[18:07:16] godog, ok, trying
[18:07:42] (Merged) jenkins-bot: Enable wgExtraSignatureNamespaces at NS:102 for trwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/356437 (https://phabricator.wikimedia.org/T166522) (owner: Framawiki)
[18:07:51] (CR) jenkins-bot: Enable wgExtraSignatureNamespaces at NS:102 for trwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/356437 (https://phabricator.wikimedia.org/T166522) (owner: Framawiki)
[18:08:58] framawiki, pulled on mwdebug1002, please test
[18:09:06] MaxSem: I look at this
[18:09:34] MaxSem: ok !
[18:11:20] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/356437/4 (duration: 00m 40s)
[18:11:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:12:07] framawiki, deployed, please confirm
[18:12:32] (PS7) Elukey: role::mariadb::analytics::custom_repl_slave: add eventlogging_cleaner.py [puppet] - https://gerrit.wikimedia.org/r/356383 (https://phabricator.wikimedia.org/T108850)
[18:12:54] MaxSem: confirmed on prod
[18:13:00] Operations, Commons, Wikimedia-Site-requests, media-storage, Patch-For-Review: Server side upload for Yann - https://phabricator.wikimedia.org/T166806#3315892 (zhuyifei1999) v2c received a stasherror for probably the same reason as why the server-side upload failed. Upon receiving the error i...
[18:14:04] I don't feel confident to deploy https://gerrit.wikimedia.org/r/#/c/357022/ without security approval
[18:14:49] MaxSem: wait, after a ctrl+f5 i've lost the icon
[18:15:07] can you check that https://tr.wikipedia.org/w/index.php?title=Vikiproje:Sinema&action=edit shows a pencil icon in the edit bar ? thanks !
[18:16:19] Operations, Commons, Wikimedia-Site-requests, media-storage, Patch-For-Review: Server side upload for Yann - https://phabricator.wikimedia.org/T166806#3315900 (Yann) I tried uploading the first file using https://tools.wmflabs.org/url2commons/index.html and it failed.
[18:16:32] framawiki, signature button? I see it
[18:16:49] ok, perhaps a cache problem
[18:17:18] godog, failed with url2commons
[18:17:59] ERROR: null
[18:18:21] Krinkle: https://gerrit.wikimedia.org/r/#/c/350494 has dependencies which I don't feel great about reviewing… I'm happy to merge the proxy bits once they're mergeable.
[18:18:45] (CR) MaxSem: [C: -2] "Since this is a security-sensitive feature and this is the first instance of it being enabled on WMF sites, I'd like security approval bef" [mediawiki-config] - https://gerrit.wikimedia.org/r/357022 (https://phabricator.wikimedia.org/T166947) (owner: Framawiki)
[18:19:36] andrewbogott: It no longer has dependencies, although it does add the errorpage function in this commit.
[18:19:43] I unbased it week or so ago
[18:19:47] oh! Ok, I'll re-read it then
[18:20:12] (CR) Eevans: Genericize ca-manager (1 comment) [puppet] - https://gerrit.wikimedia.org/r/355782 (https://phabricator.wikimedia.org/T166167) (owner: Ottomata)
[18:20:23] (PS2) MaxSem: Enable SandboxLink extension on ta.wikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/357026 (https://phabricator.wikimedia.org/T166901) (owner: Framawiki)
[18:20:29] (CR) MaxSem: [C: 2] Enable SandboxLink extension on ta.wikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/357026 (https://phabricator.wikimedia.org/T166901) (owner: Framawiki)
[18:21:26] (CR) Ottomata: [C: -1] "Eric, thanks for looking. Had a conversation with _joe_ today about this. He wants me to incorporate this with puppet ecdsa cert stuff, " [puppet] - https://gerrit.wikimedia.org/r/355782 (https://phabricator.wikimedia.org/T166167) (owner: Ottomata)
[18:21:28] (Merged) jenkins-bot: Enable SandboxLink extension on ta.wikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/357026 (https://phabricator.wikimedia.org/T166901) (owner: Framawiki)
[18:21:38] (CR) jenkins-bot: Enable SandboxLink extension on ta.wikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/357026 (https://phabricator.wikimedia.org/T166901) (owner: Framawiki)
[18:21:56] mutante thanks :)
[18:22:42] framawiki, pulled on mwdebug1002
[18:23:53] PROBLEM - Disk space on elastic2014 is CRITICAL: DISK CRITICAL - free space: / 5711 MB (12% inode=97%)
[18:24:03] MaxSem: it's good
[18:25:31] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/357026/2 (duration: 00m 38s)
[18:25:39] framawiki, ^
[18:25:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:26:10] (PS2) MaxSem: Enable SandboxLink extension on euwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/357186 (https://phabricator.wikimedia.org/T166553) (owner: Framawiki)
[18:26:14] (CR) MaxSem: [C: 2] Enable SandboxLink extension on euwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/357186 (https://phabricator.wikimedia.org/T166553) (owner: Framawiki)
[18:27:16] (Merged) jenkins-bot: Enable SandboxLink extension on euwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/357186 (https://phabricator.wikimedia.org/T166553) (owner: Framawiki)
[18:27:25] (CR) jenkins-bot: Enable SandboxLink extension on euwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/357186 (https://phabricator.wikimedia.org/T166553) (owner: Framawiki)
[18:27:31] MaxSem: ok
[18:29:09] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/357186/2 (duration: 00m 42s)
[18:29:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:29:18] framawiki, ^
[18:29:48] (PS1) Smalyshev: Enable archive indexing on delete for select wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/357236 (https://phabricator.wikimedia.org/T163235)
[18:30:26] (PS2) Smalyshev: Enable archive indexing on delete for select wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/357236 (https://phabricator.wikimedia.org/T162302)
[18:31:05] (PS2) MaxSem: Add www.defenceimagery.mod.uk to CopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/355594 (https://phabricator.wikimedia.org/T166271) (owner: Framawiki)
[18:31:10] (CR) MaxSem: [C: 2] Add www.defenceimagery.mod.uk to CopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/355594 (https://phabricator.wikimedia.org/T166271) (owner: Framawiki)
[18:31:58] MaxSem: ok on prod for euwiki patch
[18:32:28] (Merged) jenkins-bot: Add www.defenceimagery.mod.uk to CopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/355594 (https://phabricator.wikimedia.org/T166271) (owner: Framawiki)
[18:32:40] (CR) jenkins-bot: Add www.defenceimagery.mod.uk to CopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/355594 (https://phabricator.wikimedia.org/T166271) (owner: Framawiki)
[18:32:56] framawiki, pulled on mwdebug1002
[18:33:01] I can not easily test this patch
[18:34:53] okay
[18:36:07] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/355594/2 (duration: 00m 39s)
[18:36:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:36:25] framawiki, ^
[18:36:53] (PS2) MaxSem: Create Mustand namespace for etwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/357025 (https://phabricator.wikimedia.org/T166887) (owner: Framawiki)
[18:36:58] (CR) MaxSem: [C: 2] Create Mustand namespace for etwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/357025 (https://phabricator.wikimedia.org/T166887) (owner: Framawiki)
[18:38:01] (Merged) jenkins-bot: Create Mustand namespace for etwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/357025 (https://phabricator.wikimedia.org/T166887) (owner: Framawiki)
[18:41:29] Operations, Security-Reviews, Surveys: Re-evaluate Limesurvey - https://phabricator.wikimedia.org/T109606#3316026 (Elitre) Please define involvement :) Putting it on my team's radar for the moment. Really glad to hear it's actually getting tabled.
[18:41:38] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/357025/2 (duration: 00m 39s)
[18:41:43] Operations, Community-Liaisons, Security-Reviews, Surveys: Re-evaluate Limesurvey - https://phabricator.wikimedia.org/T109606#3316028 (Elitre)
[18:41:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:42:54] MaxSem: last patch is ok on prod
[18:43:35] I await the answer of the person who have rights to test the previous patch
[18:43:42] framawiki, ^!log Ran mwscript maintenance/namespaceDupes.php --wiki=etwiki --fix
[18:43:46] err
[18:43:50] !log ran mwscript maintenance/namespaceDupes.php --wiki=etwiki --fix
[18:43:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:46:08] (PS2) MaxSem: Change colors of LanguageStats to comply with WikimediaUI color palette [mediawiki-config] - https://gerrit.wikimedia.org/r/357169 (https://phabricator.wikimedia.org/T162058) (owner: Ladsgroup)
[18:46:13] (CR) MaxSem: [C: 2] Change colors of LanguageStats to comply with WikimediaUI color palette [mediawiki-config] - https://gerrit.wikimedia.org/r/357169 (https://phabricator.wikimedia.org/T162058) (owner: Ladsgroup)
[18:47:11] (Merged) jenkins-bot: Change colors of LanguageStats to comply with WikimediaUI color palette [mediawiki-config] - https://gerrit.wikimedia.org/r/357169 (https://phabricator.wikimedia.org/T162058) (owner: Ladsgroup)
[18:48:00] Amir1, pulled on mwdebug1002, please test
[18:48:07] MaxSem: on it
[18:49:34] Operations, Continuous-Integration-Infrastructure: CI for operations/puppet is taking too long - https://phabricator.wikimedia.org/T166888#3310890 (greg) Looking at the data we have it seems that the tests themselves take about [[ https://integration.wikimedia.org/ci/job/operations-puppet-tests-jessie/bu...
[18:51:44] MaxSem: It works just fine
[18:52:55] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/357169/2 (duration: 00m 39s)
[18:53:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:53:05] Amir1, ^
[18:53:12] Thanks
[18:53:19] (CR) jenkins-bot: Create Mustand namespace for etwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/357025 (https://phabricator.wikimedia.org/T166887) (owner: Framawiki)
[18:53:21] (CR) jenkins-bot: Change colors of LanguageStats to comply with WikimediaUI color palette [mediawiki-config] - https://gerrit.wikimedia.org/r/357169 (https://phabricator.wikimedia.org/T162058) (owner: Ladsgroup)
[18:53:34] works fine
[18:53:35] Thanks
[18:54:31] everything look good for me
[18:54:46] did I miss anything? are we done?
[18:55:05] MaxSem: yes
[18:55:14] MaxSem: what do you want to do with https://gerrit.wikimedia.org/r/#/c/357022/
[18:55:33] PROBLEM - Router interfaces on cr2-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 76, down: 1, dormant: 0, excluded: 0, unused: 0; xe-1/3/0: down - Core: cr1-codfw:xe-5/0/2 (Zayo, OGYX/124337//ZYO, 38.8ms) {#?} [10Gbps wave]
[18:55:43] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0; xe-5/0/2: down - Core: cr2-ulsfo:xe-1/3/0 (Zayo, OGYX/124337//ZYO, 38.8ms) {#11541} [10Gbps wave]
[18:55:46] I've already pinged the security team in phab
[18:55:59] Ok, thanks.
[18:56:11] have a nice day
[18:56:22] Operations, Ops-Access-Requests, Patch-For-Review: Grant sudo access for Bryan Davis for labstore* and labsdb* - https://phabricator.wikimedia.org/T166310#3316084 (RobH) The next ops meeting is scheduled for this Wednesday, June 7th. This will be listed for review.
[19:04:44] Operations, Commons, Wikimedia-Site-requests, media-storage, Patch-For-Review: Server side upload for Yann - https://phabricator.wikimedia.org/T166806#3316105 (zhuyifei1999) >>! In T166806#3315900, @Yann wrote: > I tried uploading the first file using https://tools.wmflabs.org/url2commons/ind...
[19:07:44] godog: server side upload works now, thanks for the quick fix
[19:09:15] (PS2) Ottomata: Add exception for events tagged as coming from MW [puppet] - https://gerrit.wikimedia.org/r/356626 (https://phabricator.wikimedia.org/T67508) (owner: Fdans)
[19:10:30] (PS3) Dzahn: planet: remove "ja" and "ca" (empty), add link to new "el" [puppet] - https://gerrit.wikimedia.org/r/356977
[19:13:48] (PS3) Ottomata: Add exception for events tagged as coming from MW [puppet] - https://gerrit.wikimedia.org/r/356626 (https://phabricator.wikimedia.org/T67508) (owner: Fdans)
[19:14:40] (CR) jerkins-bot: [V: -1] Add exception for events tagged as coming from MW [puppet] - https://gerrit.wikimedia.org/r/356626 (https://phabricator.wikimedia.org/T67508) (owner: Fdans)
[19:20:00] (CR) Nuria: Add exception for events tagged as coming from MW (2 comments) [puppet] - https://gerrit.wikimedia.org/r/356626 (https://phabricator.wikimedia.org/T67508) (owner: Fdans)
[19:25:28] (PS1) Thcipriani: Scap: Bump version to 3.5.8-1 [puppet] - https://gerrit.wikimedia.org/r/357239 (https://phabricator.wikimedia.org/T127762)
[19:28:57] Operations, Kubernetes, Prod-Kubernetes (Experiment): Load balancing "external" traffic to the Kubernetes cluster in production - https://phabricator.wikimedia.org/T152078#3316158 (Krinkle)
[19:28:59] Operations, Prod-Kubernetes (Experiment), User-Joe: Build calico - https://phabricator.wikimedia.org/T150434#3316159 (Krinkle)
[19:29:02] Operations, Patch-For-Review, Prod-Kubernetes (Experiment), User-Joe: Set up docker building environment for production - https://phabricator.wikimedia.org/T149812#3316160 (Krinkle)
[19:29:06] Operations, Prod-Kubernetes (Experiment): Build Kubernetes for production use - https://phabricator.wikimedia.org/T148968#3316162 (Krinkle)
[19:31:22] Operations: eqiad: 1 hardware access request for labs on real hardware (mwoffliner) - https://phabricator.wikimedia.org/T117095#3316186 (chasemp)
[19:31:24] Operations, Labs, Labs-Infrastructure, hardware-requests, and 2 others: Labs test cluster in codfw - https://phabricator.wikimedia.org/T114435#3316187 (chasemp)
[19:31:26] Operations, Labs, Labs-Infrastructure, labs-sprint-117, labs-sprint-118: How to handle mgmt lan for labs bare metal? - https://phabricator.wikimedia.org/T116607#3316184 (chasemp) Open>declined For now this is totally off the books
[19:33:48] Operations, Labs, Labs-Infrastructure, labs-sprint-117, labs-sprint-118: How to handle mgmt lan for labs bare metal? - https://phabricator.wikimedia.org/T116607#1753690 (Dzahn) If this is totally off the books, can we remove the existing remnants?
[19:37:45] (CR) Dzahn: [C: 2] planet: remove "ja" and "ca" (empty), add link to new "el" [puppet] - https://gerrit.wikimedia.org/r/356977 (owner: Dzahn)
[19:39:00] (CR) Dzahn: [C: 2] remove "ja" and "ca" planet.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/356978 (owner: Dzahn)
[19:39:59] (CR) BryanDavis: "> Haven't been able to test. Which node can I use in the" [puppet] - https://gerrit.wikimedia.org/r/350494 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[19:41:20] (CR) Krinkle: "I suppose in theory it could, but the problem is that the applying of these classes isn't specified in Puppet - I guess these roles are ap" [puppet] - https://gerrit.wikimedia.org/r/350494 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[19:43:35] (CR) Thcipriani: [C: 1] "> From a discussion with Tyler. Yes batched deploy sounds like a good" [puppet] - https://gerrit.wikimedia.org/r/354186 (https://phabricator.wikimedia.org/T129148) (owner: Thcipriani)
[19:45:31] huh. We own the domain en.wiki , but its A record points to an amazon aws server?
[19:46:14] that seems really odd
[19:46:33] Operations, Traffic, Interactive-Sprint, Maps (Kartographer), Regression: Map tiles load way slower than before - https://phabricator.wikimedia.org/T167046#3315566 (BBlack) Are you comparing cache hits to cache misses? From where? What was the timing like before?
[19:48:31] (PS3) Dzahn: flake8 fixes for E305 [puppet] - https://gerrit.wikimedia.org/r/356234 (owner: BryanDavis)
[19:51:23] Operations, Commons, Wikimedia-Site-requests, media-storage, Patch-For-Review: Server side upload for Yann - https://phabricator.wikimedia.org/T166806#3316250 (Dereckson) Open>Resolved ```name=Terbium $ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Yann --overw...
[19:54:06] (CR) Andrew Bogott: "> the applying of these classes isn't specified in Puppet" [puppet] - https://gerrit.wikimedia.org/r/350494 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[19:56:18] (CR) Krinkle: "Would be nice if puppet-compiler supported specifying one or more roles, which would still be quite useful, even if it won't detect change" [puppet] - https://gerrit.wikimedia.org/r/350494 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[19:56:27] (CR) Krinkle: "Thanks in advance :)" [puppet] - https://gerrit.wikimedia.org/r/350494 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[19:56:29] (PS17) Andrew Bogott: dynamicproxy: Centralise error page template and use it [puppet] - https://gerrit.wikimedia.org/r/350494 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[19:57:50] Operations, Traffic, Interactive-Sprint, Maps (Kartographer), Regression: Map tiles load way slower than before - https://phabricator.wikimedia.org/T167046#3316274 (BBlack) Another thought - could we be maxing out parallel connections to the kartotherian machines? We've always had a `max_con...
[19:59:44] (CR) Dzahn: [C: 2] "newlines only" [puppet] - https://gerrit.wikimedia.org/r/356234 (owner: BryanDavis)
[20:00:04] gwicke, cscott, arlolra, subbu, bearND, halfak, and Amir1: Respected human, time to deploy Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170605T2000). Please do the needful.
[20:01:07] (CR) Andrew Bogott: [C: 2] dynamicproxy: Centralise error page template and use it [puppet] - https://gerrit.wikimedia.org/r/350494 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[20:01:12] deployin gparsoid now
[20:01:14] (PS18) Andrew Bogott: dynamicproxy: Centralise error page template and use it [puppet] - https://gerrit.wikimedia.org/r/350494 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[20:03:45] !log ssastry@tin Started deploy [parsoid/deploy@bb0613c]: Updating Parsoid to 141fc07d
[20:03:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:08:38] jouncebot: next
[20:08:38] In 0 hour(s) and 51 minute(s): Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170605T2100)
[20:09:03] jouncebot: next l10nupdate
[20:09:03] In 0 hour(s) and 50 minute(s): Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170605T2100)
[20:09:59] (CR) Dzahn: [C: -2] contint: role/profile conversion [puppet] - https://gerrit.wikimedia.org/r/355156 (owner: Dzahn)
[20:10:14] PROBLEM - configured eth on labtestvirt2003 is CRITICAL: eth1 reporting no carrier.
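jouncebot's replies count down to the 21:00 Weekly Security deployment window. The sketch below reproduces the replies in this log, assuming the bot truncates to whole minutes (20:08:38 is 51 min 22 s before 21:00 and is shown as 51 minutes). `countdown` is a hypothetical name; the bot's actual code may round differently.

```python
from datetime import datetime


def countdown(now: datetime, window_start: datetime) -> str:
    """Format time remaining before a deployment window, jouncebot-style.

    Assumption: leftover seconds are dropped (floor to whole minutes).
    """
    remaining = int((window_start - now).total_seconds())
    if remaining < 0:
        raise ValueError("window has already started")
    hours, rest = divmod(remaining, 3600)
    return f"In {hours} hour(s) and {rest // 60} minute(s)"


WINDOW = datetime(2017, 6, 5, 21, 0)  # deploycal-item-20170605T2100
```

Flooring explains why two queries 25 seconds apart (20:08:38 and 20:09:03) report 51 and then 50 minutes.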
[20:10:47] !log ssastry@tin Finished deploy [parsoid/deploy@bb0613c]: Updating Parsoid to 141fc07d (duration: 07m 02s)
[20:10:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:13:08] (CR) Dzahn: [C: 1] phabricator: move hiera lookups to parameters [puppet] - https://gerrit.wikimedia.org/r/355871 (owner: Dzahn)
[20:15:39] jouncebot: next
[20:15:39] In 0 hour(s) and 44 minute(s): Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170605T2100)
[20:15:43] Perfect
[20:16:13] Reedy: but when is the next l10nupdate?:)
[20:16:29] Who cares? :P
[20:16:37] andrewbogott: let me know how it works out!
[20:16:44] !log updated parsoid to 141fc07d (T166655)
[20:16:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:16:55] Reedy: the one who wants a change in l10nupdate merged
[20:16:55] T166655: Linter extension stopped updating errors in sida/page-namespace (104) 2017-05-15 - https://phabricator.wikimedia.org/T166655
[20:17:01] (PS6) Paladox: contint: Only install java 7 on trusty and jessie [puppet] - https://gerrit.wikimedia.org/r/356241 (https://phabricator.wikimedia.org/T166611)
[20:17:29] Reedy: that would be you
[20:17:45] (PS5) Paladox: contint: Only install libmysqlclient-dev if on trusty or jessie [puppet] - https://gerrit.wikimedia.org/r/356246 (https://phabricator.wikimedia.org/T166611)
[20:18:00] mutante: What change do I want merging?
[20:18:16] Krinkle: so far,
[20:18:16] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Invalid parameter title on Mediawiki::Errorpage[/var/www/error/errorpage.html] at /etc/puppet/modules/dynamicproxy/manifests/init.pp:120 on node novaproxy-02.project-proxy.eqiad.wmflabs
[20:18:19] The cronjob?
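The Parsoid deploy was logged as started at 20:03:45 and finished at 20:10:47, and scap reports `(duration: 07m 02s)`. A small sketch of that `MMm SSs` formatting, assuming zero-padded minutes and seconds as they appear in the `!log` lines. `scap_duration` is a hypothetical helper, not scap's API; scap times the deploy itself, so IRC timestamps will not always agree with its figure as exactly as they do here.

```python
from datetime import datetime


def scap_duration(start: datetime, end: datetime) -> str:
    """Format elapsed time as 'MMm SSs' (zero-padded), as in the !log lines."""
    minutes, seconds = divmod(int((end - start).total_seconds()), 60)
    return f"{minutes:02d}m {seconds:02d}s"
```

For example, 20:03:45 to 20:10:47 is 422 seconds, which `divmod(422, 60)` splits into 7 minutes and 2 seconds.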
[20:18:28] scap https://gerrit.wikimedia.org/r/#/c/255958/
[20:18:30] which is not obvious to me what the problem is
[20:19:12] heh
[20:19:13] Nov 30, 2015 7:34 AM
[20:19:15] I'm in no rush :P
[20:19:24] lol, ok
[20:19:35] Would be nice to finally get it merged, sure
[20:20:00] it just appeared new to me :P
[20:20:04] because i was added
[20:20:29] heh
[20:22:45] $title is a reserver word in puppet
[20:23:47] so with mediawiki::errorpage { '/var/www/error/errorpage.html': $title is already "/var/www/error/errorpage.html" and you can't do: title => $error_config['title'], afterwards
[20:23:52] mutante: ah! Thanks
[20:24:06] you just have to use a different name for it
[20:27:22] (PS1) Andrew Bogott: dynamic proxy errorpage: s/title/pagetitle/ [puppet] - https://gerrit.wikimedia.org/r/357249
[20:27:33] Krinkle: is this what you meant? https://gerrit.wikimedia.org/r/#/c/357249/
[20:27:41] there's a 'pagetitle' and a 'doctitle' param
[20:28:27] (CR) jerkins-bot: [V: -1] dynamic proxy errorpage: s/title/pagetitle/ [puppet] - https://gerrit.wikimedia.org/r/357249 (owner: Andrew Bogott)
[20:30:28] (PS2) Andrew Bogott: dynamic proxy errorpage: s/title/pagetitle/ [puppet] - https://gerrit.wikimedia.org/r/357249
[20:31:49] (CR) Andrew Bogott: [C: 2] dynamic proxy errorpage: s/title/pagetitle/ [puppet] - https://gerrit.wikimedia.org/r/357249 (owner: Andrew Bogott)
[20:32:26] (PS13) Paladox: Gerrit: Use the mariadb plugin instead of mysql [puppet] - https://gerrit.wikimedia.org/r/336003 (https://phabricator.wikimedia.org/T145885)
[20:38:50] Krinkle: are you gone now?
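The catalog failure ("Invalid parameter title on Mediawiki::Errorpage[...]") was traced to the defined type taking a parameter named `title`, which Puppet already sets to the resource title; the fix was renaming it to `pagetitle`. A rough Python pre-merge check for that mistake is sketched below. The reserved set contains only `title`, the one case discussed here (extend it as needed), and the regex-based parsing is a simplification that ignores commas nested inside array or hash defaults. `reserved_params` is a hypothetical helper, not an existing puppet-lint rule.

```python
import re

# Assumption: only the case from this discussion; extend as needed.
RESERVED_PARAMS = {"title"}

# "define some::name( ... ) {" -- non-greedy up to a ")" followed by "{".
DEFINE_RE = re.compile(r"define\s+([\w:]+)\s*\((.*?)\)\s*\{", re.DOTALL)
PARAM_RE = re.compile(r"\$(\w+)")


def reserved_params(manifest: str) -> list:
    """Return (define name, parameter) pairs that use a reserved name."""
    hits = []
    for define_name, params in DEFINE_RE.findall(manifest):
        # Treat the first $variable in each comma-separated chunk as the
        # declared parameter, so defaults like "$pagetitle = $title" are
        # not flagged.
        for chunk in params.split(","):
            match = PARAM_RE.search(chunk)
            if match and match.group(1) in RESERVED_PARAMS:
                hits.append((define_name, match.group(1)))
    return hits


BROKEN = """
define mediawiki::errorpage(
    $title = undef,
    $doctitle = undef,
) {
}
"""
```

Running the check over `BROKEN` flags `title`; after renaming it to `$pagetitle`, as in the patch above, nothing is reported.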
[20:39:40] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 39 probes of 436 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[20:39:41] (PS1) Andrew Bogott: Revert "dynamic proxy errorpage: s/title/pagetitle/" [puppet] - https://gerrit.wikimedia.org/r/357252
[20:39:56] (PS1) Andrew Bogott: Revert "dynamicproxy: Centralise error page template and use it" [puppet] - https://gerrit.wikimedia.org/r/357253
[20:40:04] (PS1) Nemo bis: [Planet Wikimedia] Add blog.wikimedia.gr to Greek Planet [puppet] - https://gerrit.wikimedia.org/r/357254
[20:40:14] (CR) jerkins-bot: [V: -1] Revert "dynamicproxy: Centralise error page template and use it" [puppet] - https://gerrit.wikimedia.org/r/357253 (owner: Andrew Bogott)
[20:43:41] (CR) Andrew Bogott: [C: 2] Revert "dynamic proxy errorpage: s/title/pagetitle/" [puppet] - https://gerrit.wikimedia.org/r/357252 (owner: Andrew Bogott)
[20:43:54] (PS2) Andrew Bogott: Revert "dynamicproxy: Centralise error page template and use it" [puppet] - https://gerrit.wikimedia.org/r/357253
[20:44:40] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 1 probes of 436 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[20:45:26] (CR) Andrew Bogott: "recheck" [puppet] - https://gerrit.wikimedia.org/r/357253 (owner: Andrew Bogott)
[20:46:25] (CR) Andrew Bogott: [C: 2] Revert "dynamicproxy: Centralise error page template and use it" [puppet] - https://gerrit.wikimedia.org/r/357253 (owner: Andrew Bogott)
[20:50:26] (PS4) Reedy: Re-instate "Run Pdf Processors in firejails" [mediawiki-config] - https://gerrit.wikimedia.org/r/352572 (https://phabricator.wikimedia.org/T164145)
[20:58:29] (CR) Reedy: [C: 2] Re-instate "Run Pdf Processors in firejails" [mediawiki-config] - https://gerrit.wikimedia.org/r/352572 (https://phabricator.wikimedia.org/T164145) (owner: Reedy)
[20:59:55] (Merged) jenkins-bot: Re-instate "Run Pdf Processors in firejails" [mediawiki-config] - https://gerrit.wikimedia.org/r/352572 (https://phabricator.wikimedia.org/T164145) (owner: Reedy)
[21:00:04] dapatrick, bawolff, and Reedy: Respected human, time to deploy Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170605T2100). Please do the needful.
[21:00:09] (CR) jenkins-bot: Re-instate "Run Pdf Processors in firejails" [mediawiki-config] - https://gerrit.wikimedia.org/r/352572 (https://phabricator.wikimedia.org/T164145) (owner: Reedy)
[21:01:55] !log reedy@tin Synchronized wmf-config/CommonSettings.php: Run Pdf Processors in firejails T164145 T164000 (duration: 00m 40s)
[21:02:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:02:06] T164145: Investigate why firejails break PdfHandler - https://phabricator.wikimedia.org/T164145
[21:02:06] T164000: ghostscript dSafer bypass - https://phabricator.wikimedia.org/T164000
[21:03:23] Operations, Wikimedia-General-or-Unknown, Patch-For-Review: Investigate why firejails break PdfHandler - https://phabricator.wikimedia.org/T164145#3316510 (Reedy) Open>Resolved a:Reedy Redeployed, purging PDFs works fine, doesn't break thumbnailing now!
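The RIPE Atlas check above went CRITICAL with 39 of 436 probes failing and recovered at 1 failure, against an "alerts on 19" threshold. A sketch of that classification, assuming CRITICAL once failures reach the threshold; whether the real check uses `>=` or `>` is not visible in the log, and `atlas_status` is a hypothetical name.

```python
def atlas_status(failed: int, total: int, alert_on: int) -> str:
    """Classify a RIPE Atlas ping measurement the way the alerts above read.

    Assumption: CRITICAL when failed >= alert_on (the boundary case is not
    shown in the log).
    """
    state = "CRITICAL" if failed >= alert_on else "OK"
    return f"{state} - failed {failed} probes of {total} (alerts on {alert_on})"
```

Using an absolute probe count instead of a percentage keeps the alert stable as long as the probe population (436 here) does not change much.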
[21:19:37] (PS1) Andrew Bogott: dynamicproxy: Centralise error page template and use it [puppet] - https://gerrit.wikimedia.org/r/357310 (https://phabricator.wikimedia.org/T113114)
[21:23:58] (CR) Andrew Bogott: [C: 2] dynamicproxy: Centralise error page template and use it [puppet] - https://gerrit.wikimedia.org/r/357310 (https://phabricator.wikimedia.org/T113114) (owner: Andrew Bogott)
[21:24:15] (PS1) Nuria: Correct pageview_hourly loading scheme on pivot home [puppet] - https://gerrit.wikimedia.org/r/357315
[21:24:39] (PS2) Nuria: Correct pageview_hourly loading scheme on pivot home [puppet] - https://gerrit.wikimedia.org/r/357315
[21:34:46] !log deployed patch for T165846
[21:34:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:35:46] (CR) Andrew Bogott: "The good news: This version of the patch doesn't seem to break anything. The bad news... either it doesn't change the error page, or I d" [puppet] - https://gerrit.wikimedia.org/r/357310 (https://phabricator.wikimedia.org/T113114) (owner: Andrew Bogott)
[21:48:10] PROBLEM - Check Varnish expiry mailbox lag on cp1074 is CRITICAL: CRITICAL: expiry mailbox lag is 2034334
[21:56:04] (CR) Volans: [C: -1] "It improved a lot, keeping the -1 for minor things, but is getting there, nice job." (5 comments) [puppet] - https://gerrit.wikimedia.org/r/356383 (https://phabricator.wikimedia.org/T108850) (owner: Elukey)
[21:59:53] (CR) Krinkle: "You probably know this module better than me, but from a quick look I think these pages are available via http://tools.wmflabs.org/.error/" [puppet] - https://gerrit.wikimedia.org/r/357310 (https://phabricator.wikimedia.org/T113114) (owner: Andrew Bogott)
[22:00:20] (PS3) Tjones: Enable BM25 for Chinese wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/350312 (https://phabricator.wikimedia.org/T163829)
[22:02:19] (CR) Krinkle: "Also it seems that for banned.html, undef casts to a string?" [puppet] - https://gerrit.wikimedia.org/r/357310 (https://phabricator.wikimedia.org/T113114) (owner: Andrew Bogott)
[22:02:28] !log cp1099 - varnish-backend-restart (mailbox lag)
[22:02:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:03:22] !log cp1074 - varnish-backend-restart (mailbox lag)
[22:03:25] (PS1) MaxSem: Enable LoginNotify on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/357317 (https://phabricator.wikimedia.org/T165007)
[22:03:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:08:10] RECOVERY - Check Varnish expiry mailbox lag on cp1074 is OK: OK: expiry mailbox lag is 0
[22:08:50] RECOVERY - Check Varnish expiry mailbox lag on cp1099 is OK: OK: expiry mailbox lag is 0
[22:33:11] Operations, Labs, Labs-Infrastructure, labs-sprint-117, labs-sprint-118: How to handle mgmt lan for labs bare metal? - https://phabricator.wikimedia.org/T116607#3316744 (Andrew) Subbu is still using Prometheum. We have half a plan to clean that up but in the meantime we'll need to keep some...
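The expiry mailbox lag on cp1074 reached 2034334 and dropped to 0 after a `varnish-backend-restart`. A sketch of how such a figure can be derived, assuming the lag is the difference between the varnishstat counters `MAIN.exp_mailed` (objects handed to the expiry thread) and `MAIN.exp_received` (objects it has picked up), read from Varnish 4/5-style `varnishstat -j` JSON. The warning and critical thresholds are hypothetical, not the production check's values.

```python
import json


def mailbox_lag(varnishstat_json: str) -> int:
    """Objects mailed to the expiry thread but not yet received by it.

    Assumes the flat Varnish 4/5 `varnishstat -j` layout, where each
    counter maps to an object with a "value" field.
    """
    stats = json.loads(varnishstat_json)
    return stats["MAIN.exp_mailed"]["value"] - stats["MAIN.exp_received"]["value"]


def classify(lag: int, warn: int = 25_000, crit: int = 500_000) -> str:
    """Hypothetical thresholds; tune to the real check's configuration."""
    if lag >= crit:
        return "CRITICAL"
    if lag >= warn:
        return "WARNING"
    return "OK"
```

Because both counters reset when varnishd restarts, a backend restart brings the difference back to 0, which matches the recoveries at 22:08.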
[22:40:50] RECOVERY - salt-minion processes on puppetmaster1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[22:43:50] RECOVERY - designate-api http on labtestservices2001 is OK: HTTP OK: HTTP/1.1 200 OK - 571 bytes in 0.006 second response time
[23:00:05] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170605T2300). Please do the needful.
[23:07:55] I have something for this swat
[23:07:59] I make a patch right now
[23:08:07] sorry, Lost track of time
[23:16:43] (PS1) Ladsgroup: Enable ORES review tool in frwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/357328 (https://phabricator.wikimedia.org/T165044)
[23:17:56] I have this for SWAT
[23:17:56] https://gerrit.wikimedia.org/r/357328
[23:18:09] thcipriani: I hope that you're around :)
[23:18:53] Amir1: heh, yes, I am around, I can SWAT.
[23:19:06] Thanks
[23:20:51] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/357328 (https://phabricator.wikimedia.org/T165044) (owner: Ladsgroup)
[23:21:48] (Merged) jenkins-bot: Enable ORES review tool in frwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/357328 (https://phabricator.wikimedia.org/T165044) (owner: Ladsgroup)
[23:21:57] (CR) jenkins-bot: Enable ORES review tool in frwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/357328 (https://phabricator.wikimedia.org/T165044) (owner: Ladsgroup)
[23:23:26] !log frwiki create tables ores_model and ores_classification T165044
[23:23:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:23:37] T165044: Deploy ORES review tool on French Wikipedia - https://phabricator.wikimedia.org/T165044
[23:27:23] Amir1: ok, created tables and pulled changes over on mwdebug1002, check there then I'll sync everywhere then run maintenance on terbium
[23:27:35] okay
[23:27:52] that's the order of operations IIRC, right?
[23:30:02] thcipriani: Yes, it looks okay to me
[23:30:29] ok, syncing live
[23:32:25] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:357328|Enable ORES review tool in frwiki]] T165044 (duration: 00m 40s)
[23:32:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:32:34] T165044: Deploy ORES review tool on French Wikipedia - https://phabricator.wikimedia.org/T165044
[23:32:41] ^ Amir1 live now, running maintenance
[23:32:52] Fantastic
[23:33:06] \o/
[23:33:58] !log running on terbium: mwscript extensions/ORES/maintenance/CheckModelVersions.php frwiki && mwscript extensions/ORES/maintenance/PopulateDatabase.php frwiki
[23:34:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:51:23] Amir1: halfak populateDatabase.php just finished, FYI
[23:51:35] halfak is afk
[23:51:35] (PS2) Krinkle: Capture messages on 'autoloader' debug log channel [mediawiki-config] - https://gerrit.wikimedia.org/r/356841 (https://phabricator.wikimedia.org/T166759) (owner: Ori.livneh)
[23:51:37] thanks
[23:52:21] Strangely it's still slow
[23:52:47] I check it out later on and then change the threshold if it's still slow
[23:52:55] thanks for doing it thcipriani
[23:53:10] yw :)
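The frwiki ORES rollout followed the order thcipriani described: create the `ores_model` and `ores_classification` tables, pull the change onto mwdebug1002 for checking, sync everywhere, then run maintenance on terbium. The last step chained two scripts with `&&`, so `PopulateDatabase.php` only runs if `CheckModelVersions.php` exits 0. A Python sketch of that chaining; `run_chain` and the injectable `runner` are hypothetical, and `mwscript` exists only on the MediaWiki maintenance hosts.

```python
import subprocess

# The two maintenance commands from the !log line above.
ORES_STEPS = [
    ["mwscript", "extensions/ORES/maintenance/CheckModelVersions.php", "frwiki"],
    ["mwscript", "extensions/ORES/maintenance/PopulateDatabase.php", "frwiki"],
]


def run_chain(steps, runner=subprocess.call):
    """Run commands in order with shell `&&` semantics.

    Stops at the first non-zero exit code and returns it; returns 0 if
    every command succeeds. `runner` defaults to subprocess.call and can
    be swapped out to dry-run the chain elsewhere.
    """
    for cmd in steps:
        code = runner(cmd)
        if code != 0:
            return code
    return 0
```

Keeping the model-version check first means a mismatch stops the chain before the database is populated with scores from the wrong model.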