[01:15:36] PROBLEM - puppet last run on mw1210 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:41:53] RECOVERY - puppet last run on mw1210 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[01:51:24] PROBLEM - puppet last run on labcontrol1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:13:05] PROBLEM - puppet last run on ms-be3004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:17:44] RECOVERY - puppet last run on labcontrol1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[02:28:12] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.20) (duration: 13m 16s)
[02:28:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:39:36] RECOVERY - puppet last run on ms-be3004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[03:26:18] Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Goal: Wikipedias with zh-* language codes waiting to be renamed (zh-min-nan -> nan, zh-yue -> yue, zh-classical -> lzh) - https://phabricator.wikimedia.org/T10217#2683465 (Liuxinyu970226) >>! In T10217#2175710, @Kaihsu wrote: > The ea...
[03:32:09] Operations, Labs, wikitech.wikimedia.org: Can't login wikitech - https://phabricator.wikimedia.org/T144805#2610832 (Liuxinyu970226) I also have this problem since March, before that I never set that thing
[03:34:26] PROBLEM - Disk space on dubnium is CRITICAL: DISK CRITICAL - free space: / 657 MB (3% inode=94%)
[04:14:01] PROBLEM - puppet last run on elastic1024 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[04:38:06] RECOVERY - puppet last run on elastic1024 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[05:03:40] PROBLEM - puppet last run on dubnium is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[tshark],Package[tmux]
[05:18:29] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:20:49] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy
[05:26:35] Operations, Labs, wikitech.wikimedia.org: Can't login wikitech - https://phabricator.wikimedia.org/T144805#2683492 (zhuyifei1999) March? sounds like related to T130892. Pinging @csteipp @dpatrick @jcrespo
[06:23:11] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 676 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 3286600 keys - replication_delay is 676
[06:30:21] !log altering S3,S4,S5,S6,S7 user_groups tables in sanitarium to avoid tokudb bug - T146121
[06:30:22] T146121: db1069: convert user_groups table to InnoDB across all the wikis - https://phabricator.wikimedia.org/T146121
[06:30:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:30:33] PROBLEM - MegaRAID on db1055 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[06:30:52] I will check db1055
[06:33:01] PROBLEM - puppet last run on snapshot1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures.
Failed resources (up to 3 shown): Package[debian-goodies]
[06:38:35] Operations, ops-eqiad, DBA: db1055: degraded array - https://phabricator.wikimedia.org/T147172#2683544 (Marostegui)
[06:39:35] ACKNOWLEDGEMENT - MegaRAID on db1055 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) Marostegui T147172
[06:52:23] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 3265111 keys - replication_delay is 41
[06:57:12] RECOVERY - puppet last run on snapshot1001 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[07:24:55] !log emptying /var/log/debug on dubnium because of disk full (the same data is on syslog) T147173
[07:24:56] T147173: dubnium disk full - https://phabricator.wikimedia.org/T147173
[07:24:57] PROBLEM - OCG health on ocg1003 is CRITICAL: CRITICAL: ocg_job_status 974634 msg (=800000 warning): ocg_render_job_queue 3000 msg (=3000 critical)
[07:25:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:26:07] RECOVERY - Disk space on dubnium is OK: DISK OK
[07:31:28] (PS2) Elukey: Reduce cronspam from graphite1001 [puppet] - https://gerrit.wikimedia.org/r/313573 (https://phabricator.wikimedia.org/T144797)
[07:33:56] (CR) Elukey: [C: 2] Reduce cronspam from graphite1001 [puppet] - https://gerrit.wikimedia.org/r/313573 (https://phabricator.wikimedia.org/T144797) (owner: Elukey)
[07:35:17] yuvipanda: hello! Should I merge your change too? 9cd275c
[07:35:47] elukey: uh, the grafana one?
[07:36:13] elukey: or the k8s one?
[07:36:18] PROBLEM - OCG health on ocg1002 is CRITICAL: CRITICAL: ocg_job_status 977458 msg (=800000 warning): ocg_render_job_queue 3029 msg (=3000 critical)
[07:36:19] mmmm weird, I can see on puppetmaster1001 YuviPanda^O: k8s: Make the kubernetes user be a member of ssl-certs group (9cd275c)
[07:36:22] the k8s one it looks like.
I thought I merged it a few days ago
[07:36:44] PROBLEM - OCG health on ocg1003 is CRITICAL: CRITICAL: ocg_job_status 977686 msg (=800000 warning): ocg_render_job_queue 3121 msg (=3000 critical)
[07:37:23] RECOVERY - puppet last run on dubnium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:37:33] PROBLEM - OCG health on ocg1001 is CRITICAL: CRITICAL: ocg_job_status 977909 msg (=800000 warning): ocg_render_job_queue 3160 msg (=3000 critical)
[07:38:43] yuvipanda: so should I merge?
[07:39:35] elukey: getting on puppetmaster to take a look
[07:39:43] ah nice, thanks :)
[07:40:08] elukey: ok, merged both
[07:40:10] looked ok
[07:40:14] not sure why didn't merge earlier
[07:40:35] super thanks!
[07:45:26] * elukey checking v
[07:45:28] https://wikitech.wikimedia.org/wiki/OCG#Monitoring
[07:47:00] seems a burst of requests waiting to be processed by OCG no? I don't see anything weird going on
[07:47:08] <_joe_> elukey: yes
[07:47:25] <_joe_> elukey: we should check what they're requesting, maybe
[07:47:35] <_joe_> but low priority
[07:48:33] (PS3) Giuseppe Lavagetto: scap::source: also define the corresponding dsh group [puppet] - https://gerrit.wikimedia.org/r/306431
[07:48:53] (PS1) Marostegui: db-eqiad.php: Depool db1081 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/313776 (https://phabricator.wikimedia.org/T147113)
[07:51:06] (CR) Jcrespo: [C: 1] db-eqiad.php: Depool db1081 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/313776 (https://phabricator.wikimedia.org/T147113) (owner: Marostegui)
[07:53:40] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1081 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/313776 (https://phabricator.wikimedia.org/T147113) (owner: Marostegui)
[07:54:06] (Merged) jenkins-bot: db-eqiad.php: Depool db1081 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/313776
(https://phabricator.wikimedia.org/T147113) (owner: Marostegui)
[07:56:46] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1081 for maintenance - T147113 (duration: 00m 50s)
[07:56:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:57:20] mw1207.eqiad.wmnet failed to deploy - looks like that server is down
[07:57:27] It is not replying to ping
[07:58:04] Will it get the code once it gets back?
[07:58:14] PROBLEM - puppet last run on ms-be1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:58:52] <_joe_> marostegui: not guaranteed, let me check what's the status of that machine
[07:59:17] cheers
[08:03:36] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 686 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 3266276 keys - replication_delay is 686
[08:04:04] <_joe_> !log powercycling mw1207
[08:04:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:06:37] RECOVERY - Host mw1207 is UP: PING OK - Packet loss = 0%, RTA = 1.93 ms
[08:21:32] RECOVERY - puppet last run on ms-be1017 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[08:23:45] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[08:23:56] (PS1) ArielGlenn: add nschaaf to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/313777 (https://phabricator.wikimedia.org/T146924)
[08:24:50] Operations, Ops-Access-Requests, netops: Access to network devices - https://phabricator.wikimedia.org/T147061#2683656 (ArielGlenn) p:Triage>Normal
[08:25:49] Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to stat1002 and stat1004 for nschaaf - https://phabricator.wikimedia.org/T146924#2683657 (ArielGlenn) p:Triage>Normal
[08:28:55] RECOVERY - Text HTTP 5xx
reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[08:30:17] _joe_: Do I need to do something to get that host sync'ed with the deploy I did?
[08:31:16] Operations: dubnium disk full - https://phabricator.wikimedia.org/T147173#2683661 (Volans)
[08:32:23] marostegui: from https://config-master.wikimedia.org/conftool/eqiad/api it seems pooled, maybe to only a 'scap pull' on the host to make sure that it has the last version (please note, non authoritative answer :P)
[08:33:20] <_joe_> marostegui: I already did that (scap pull) after restarting it
[08:33:36] Ah, thanks guys. So next time ssh to the host and scap pull and that's it?
[08:33:49] <_joe_> elukey: you were 100% correct
[08:33:57] <_joe_> marostegui: whoever restarts it should do it
[08:34:55] Operations, Performance-Team, Thumbor: Figure out a way to live-debug running production thumbor processes - https://phabricator.wikimedia.org/T146143#2683665 (fgiunchedi) @Gilles thanks, looks great!
[08:35:11] Ah, gotcha
[08:35:14] Thanks :)
[08:42:23] Operations, Performance-Team, Thumbor: Separate 404s into their own log - https://phabricator.wikimedia.org/T145632#2683669 (Gilles)
[08:46:13] !log rebooted compiler02.puppet3-diffs.eqiad.wmflabs (not reachable by Jenkins, pingable from bastions but no ssh available)
[08:46:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:49:25] PROBLEM - puppet last run on lvs1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:51:43] Operations, Performance-Team, Thumbor: Figure out a way to live-debug running production thumbor processes - https://phabricator.wikimedia.org/T146143#2683678 (fgiunchedi) >>! In T146143#2670131, @Gilles wrote: > @fgiunchedi can you build it and put it on jessie-wikimedia? yep, uploaded now to `jess...
[08:55:13] Hi, one question. Is the limit about 8 patches per SWAT window non-breakable?
I have 8 patches scheduled for todays EU SWAT and I want to have deployed nine patches. The last one is scheduled for tomorrow EU but having it deployed at once will be better for me. Could I reschedule the last one (313658) for today too? Thanks in advance for any help!
[08:58:02] (PS1) Marostegui: files/dhcpd/linux-host-entries.ttyS1-115200: Remove trusty lines as it will be reinstalled [puppet] - https://gerrit.wikimedia.org/r/313780 (https://phabricator.wikimedia.org/T146261)
[08:58:31] Operations, Performance-Team, Thumbor: Separate 404s into their own log - https://phabricator.wikimedia.org/T145632#2683685 (Gilles)
[09:04:35] (PS2) Marostegui: files/dhcpd/linux-host-entries.ttyS1-115200: Remove trusty installation [puppet] - https://gerrit.wikimedia.org/r/313780 (https://phabricator.wikimedia.org/T146261)
[09:08:24] !log clean exim queues on mx1001 from backscatter spam
[09:08:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:09:09] (PS3) Marostegui: Remove trusty installation [puppet] - https://gerrit.wikimedia.org/r/313780 (https://phabricator.wikimedia.org/T146261)
[09:09:54] (CR) Elukey: [C: 1] Remove trusty installation [puppet] - https://gerrit.wikimedia.org/r/313780 (https://phabricator.wikimedia.org/T146261) (owner: Marostegui)
[09:10:06] (PS1) Gilles: Upgrade to 0.1.22 [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/313781
[09:11:08] (CR) Marostegui: [C: 2] Remove trusty installation [puppet] - https://gerrit.wikimedia.org/r/313780 (https://phabricator.wikimedia.org/T146261) (owner: Marostegui)
[09:13:05] (PS4) Marostegui: Remove trusty installation [puppet] - https://gerrit.wikimedia.org/r/313780 (https://phabricator.wikimedia.org/T146261)
[09:14:03] !log T147173 clean exim queues on mx1001 from backscatter spam
[09:14:04] T147173: dubnium disk full - https://phabricator.wikimedia.org/T147173
[09:14:08] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:14:39] RECOVERY - puppet last run on lvs1007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[09:23:49] (PS1) Gilles: Automatic async cleanup of thumbor temp files [puppet] - https://gerrit.wikimedia.org/r/313782
[09:23:51] (PS1) Gilles: Enable manhole on thumbor [puppet] - https://gerrit.wikimedia.org/r/313783
[09:24:03] (PS1) Hashar: [throttle] Rule for Winona State University [mediawiki-config] - https://gerrit.wikimedia.org/r/313784 (https://phabricator.wikimedia.org/T146600)
[09:24:20] (PS2) Gilles: Automatic async cleanup of thumbor temp files [puppet] - https://gerrit.wikimedia.org/r/313782
[09:24:33] (PS2) Gilles: Enable manhole on thumbor [puppet] - https://gerrit.wikimedia.org/r/313783
[09:24:51] (Abandoned) Hashar: [throttle] Rule for Winona State University [mediawiki-config] - https://gerrit.wikimedia.org/r/312815 (https://phabricator.wikimedia.org/T146600) (owner: Urbanecm)
[09:24:55] (Abandoned) Hashar: [throttle] Rule for Winona State University [mediawiki-config] - https://gerrit.wikimedia.org/r/312816 (https://phabricator.wikimedia.org/T146600) (owner: Urbanecm)
[09:24:58] (Abandoned) Hashar: [throttle] Rule for Winona State University [mediawiki-config] - https://gerrit.wikimedia.org/r/312842 (https://phabricator.wikimedia.org/T146600) (owner: Urbanecm)
[09:25:41] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 3254025 keys - replication_delay is 0
[09:25:55] !log rolling restart of elasticsearch codfw cluster for kernel upgrade - T146123
[09:26:03] (PS4) Hashar: [throttle] Ada Lovelace Day Edit-a-thon [mediawiki-config] - https://gerrit.wikimedia.org/r/312852 (https://phabricator.wikimedia.org/T146654) (owner: Urbanecm)
[09:26:22] gehel: nothing better to start the week than a cluster restart
[09:26:33] elukey: just like every
other week :)
[09:26:42] (CR) jenkins-bot: [V: -1] [throttle] Ada Lovelace Day Edit-a-thon [mediawiki-config] - https://gerrit.wikimedia.org/r/312852 (https://phabricator.wikimedia.org/T146654) (owner: Urbanecm)
[09:26:51] gehel: extra space before the !_log
[09:27:06] p858snake|L2: oops... thanks!
[09:27:08] !log rolling restart of elasticsearch codfw cluster for kernel upgrade - T146123
[09:27:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:28:05] !log dbstore2001 going to be reimaged as jessie
[09:28:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:28:23] (CR) Hashar: [C: 1] [throttle] Rule for Winona State University [mediawiki-config] - https://gerrit.wikimedia.org/r/313784 (https://phabricator.wikimedia.org/T146600) (owner: Hashar)
[09:31:34] (PS5) Hashar: [throttle] Ada Lovelace Day Edit-a-thon [mediawiki-config] - https://gerrit.wikimedia.org/r/312852 (https://phabricator.wikimedia.org/T146654) (owner: Urbanecm)
[09:32:35] !log T147173 clean exim queues on mx1001 from backscatter spam. Seems to be originating from mx.{east,west}.cox.net, blocked them for now
[09:32:37] T147173: dubnium disk full - https://phabricator.wikimedia.org/T147173
[09:32:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:40:18] Operations: dubnium disk full - https://phabricator.wikimedia.org/T147173#2683741 (akosiaris) Open>stalled p:Triage>Low Setting to stalled for a while..
will have a look later in the day whether the backscatter continues
[09:48:52] !log lowered down builds log retention from 90 to 60 days for the puppet compiler (https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/)
[09:48:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:49:41] (PS1) Marostegui: db-eqiad.php: Pool db1081 back [mediawiki-config] - https://gerrit.wikimedia.org/r/313787 (https://phabricator.wikimedia.org/T147113)
[09:52:13] (CR) Marostegui: [C: 2] db-eqiad.php: Pool db1081 back [mediawiki-config] - https://gerrit.wikimedia.org/r/313787 (https://phabricator.wikimedia.org/T147113) (owner: Marostegui)
[09:52:39] (Merged) jenkins-bot: db-eqiad.php: Pool db1081 back [mediawiki-config] - https://gerrit.wikimedia.org/r/313787 (https://phabricator.wikimedia.org/T147113) (owner: Marostegui)
[09:54:38] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1081 after finishing its maintenance - T147113 (duration: 00m 49s)
[09:54:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:58:25] Operations, Ops-Access-Requests: Requesting access to stat1002 / webrequest logs for MelodyKramer - https://phabricator.wikimedia.org/T145387#2683766 (elukey) @ArielGlenn thanks! I was wrong, I didn't check that `researchers` only offers a `/etc/mysql/conf.d/research-client.cnf` on stat1003 not stat1002....
[09:59:46] Operations, Patch-For-Review: Migrate pool counters to trusty/jessie - https://phabricator.wikimedia.org/T123734#2683773 (fgiunchedi)
[09:59:48] Operations, Patch-For-Review: Build poolcounter for jessie - https://phabricator.wikimedia.org/T146277#2683770 (fgiunchedi) Open>Resolved a:fgiunchedi Done, package built and uploaded to `jessie-wikimedia`
[10:00:36] Operations, LDAP: Fix LDAP replication OIT hostname - https://phabricator.wikimedia.org/T82675#2683775 (akosiaris) Open>Invalid No longer valid. We got rid of OpenDJ, we got OpenLDAP these days.
[10:00:53] Operations, Patch-For-Review: Migrate pool counters to trusty/jessie - https://phabricator.wikimedia.org/T123734#1936635 (fgiunchedi) I tried provisioning `deployment-poolcounter03` with jessie to migrate beta too but ATM the instance is not accessible via ssh and console shows ``` Debian GNU/Linux 8 d...
[10:04:51] (PS2) Kerberizer: Fix an invalid empty line in the global robots.txt [mediawiki-config] - https://gerrit.wikimedia.org/r/313763 (https://phabricator.wikimedia.org/T146908)
[10:06:36] (CR) Elukey: [C: 2] "https://puppet-compiler.wmflabs.org/4188/ - looks good!" [puppet] - https://gerrit.wikimedia.org/r/313558 (owner: Andrew Bogott)
[10:06:41] (PS3) Elukey: Puppetize the upstart logrotate script on Trusty. [puppet] - https://gerrit.wikimedia.org/r/313558 (owner: Andrew Bogott)
[10:08:10] (PS2) Alexandros Kosiaris: Use DB_LOG_AUTOREMOVE for openldap database [puppet] - https://gerrit.wikimedia.org/r/305992 (https://phabricator.wikimedia.org/T143302) (owner: Muehlenhoff)
[10:08:22] (CR) Alexandros Kosiaris: [C: 2 V: 2] Use DB_LOG_AUTOREMOVE for openldap database [puppet] - https://gerrit.wikimedia.org/r/305992 (https://phabricator.wikimedia.org/T143302) (owner: Muehlenhoff)
[10:09:54] PROBLEM - puppet last run on kafka1020 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[10:11:09] !log restarting slapd on pollux.wikimedia.org T143302
[10:11:11] T143302: BDB transaction files on OpenLDAP servers - https://phabricator.wikimedia.org/T143302
[10:11:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:13:14] !log restarting slapd on serpens.wikimedia.org T143302
[10:13:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:14:32] volans: /dev/vda1 19G 4.3G 14G 25% /
[10:14:33] ^
[10:14:40] seems to have worked flawlessly
[10:14:45] moving on with the other 2 servers
[10:15:08] great akosiaris! thanks for taking care of it
[10:15:34] RECOVERY - Disk space on serpens is OK: DISK OK
[10:16:49] !log restarting slapd on seaborgium.wikimedia.org T143302
[10:16:50] T143302: BDB transaction files on OpenLDAP servers - https://phabricator.wikimedia.org/T143302
[10:16:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:21:51] !log restarting slapd on dubnium.wikimedia.org T143302
[10:21:52] T143302: BDB transaction files on OpenLDAP servers - https://phabricator.wikimedia.org/T143302
[10:21:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:24:50] Operations, LDAP: BDB transaction files on OpenLDAP servers - https://phabricator.wikimedia.org/T143302#2683841 (akosiaris) Open>Resolved a:akosiaris All 4 servers have been manually migrated and checked. Disk space usage dropped on all 4, but mostly on the labs ones. Everything seems ok. T...
[10:26:56] (PS1) Filippo Giunchedi: deployment-prep: Move poolcounter to deployment-poolcounter04 [mediawiki-config] - https://gerrit.wikimedia.org/r/313789 (https://phabricator.wikimedia.org/T123734)
[10:30:42] (PS1) Marostegui: db-eqiad.php: Increase weight db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/313791 (https://phabricator.wikimedia.org/T147113)
[10:31:25] (CR) Marostegui: [C: 2] db-eqiad.php: Increase weight db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/313791 (https://phabricator.wikimedia.org/T147113) (owner: Marostegui)
[10:31:51] (Merged) jenkins-bot: db-eqiad.php: Increase weight db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/313791 (https://phabricator.wikimedia.org/T147113) (owner: Marostegui)
[10:32:24] RECOVERY - puppet last run on kafka1020 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:33:40] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase db1081 weight after finishing its maintenance - T147113 (duration: 00m 48s)
[10:33:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:40:12] (CR) Alex Monk: "I thought 02 was new? Or did I create it as trusty?
On phone, can't check right now" [mediawiki-config] - https://gerrit.wikimedia.org/r/313789 (https://phabricator.wikimedia.org/T123734) (owner: Filippo Giunchedi)
[10:44:23] (CR) Filippo Giunchedi: "@Alex, yeah 02 is trusty, I've packaged/uploaded poolcounter for jessie last week and 04 is running poolcounterd" [mediawiki-config] - https://gerrit.wikimedia.org/r/313789 (https://phabricator.wikimedia.org/T123734) (owner: Filippo Giunchedi)
[10:46:14] (PS2) Gilles: Upgrade to 0.1.22 [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/313781
[10:47:04] Operations, User-Joe: Docker installation for production kubernetes - https://phabricator.wikimedia.org/T147181#2683877 (Joe)
[10:53:38] PROBLEM - OCG health on ocg1003 is CRITICAL: CRITICAL: ocg_job_status 970235 msg (=800000 warning): ocg_render_job_queue 3010 msg (=3000 critical)
[10:53:49] PROBLEM - OCG health on ocg1002 is CRITICAL: CRITICAL: ocg_job_status 970326 msg (=800000 warning): ocg_render_job_queue 3025 msg (=3000 critical)
[10:56:20] (CR) Filippo Giunchedi: [C: -1] "Generally LGTM, some comments" (5 comments) [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/313781 (owner: Gilles)
[10:56:34] another peak of incoming jobs: https://grafana.wikimedia.org/dashboard/db/ocg?panelId=2&fullscreen (OCG)
[11:02:32] Operations, User-Joe: Docker installation for production kubernetes - https://phabricator.wikimedia.org/T147181#2683894 (yuvipanda) Yeah, I think it should be refactored into multiple classes based on their functionality. Also if we use devicemapper graphdriver in prod, the script *could* be adapted to w...
[11:08:15] PROBLEM - OCG health on ocg1002 is CRITICAL: CRITICAL: ocg_job_status 974812 msg (=800000 warning): ocg_render_job_queue 3105 msg (=3000 critical)
[11:08:54] PROBLEM - OCG health on ocg1003 is CRITICAL: CRITICAL: ocg_job_status 975072 msg (=800000 warning): ocg_render_job_queue 3207 msg (=3000 critical)
[11:09:50] PROBLEM - OCG health on ocg1001 is CRITICAL: CRITICAL: ocg_job_status 975341 msg (=800000 warning): ocg_render_job_queue 3236 msg (=3000 critical)
[11:11:48] Operations, User-Joe: Docker installation for production kubernetes - https://phabricator.wikimedia.org/T147181#2683932 (yuvipanda) Another option would be to use overlayfs2 driver available in Docker 1.12, with a recent (4.5+) kernel.
[11:13:57] PROBLEM - puppet last run on cp3004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:18:35] (PS1) Marostegui: db-eqiad.php: Increase weight db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/313792 (https://phabricator.wikimedia.org/T147113)
[11:19:36] (CR) Marostegui: [C: 2] db-eqiad.php: Increase weight db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/313792 (https://phabricator.wikimedia.org/T147113) (owner: Marostegui)
[11:20:04] (Merged) jenkins-bot: db-eqiad.php: Increase weight db1081 [mediawiki-config] - https://gerrit.wikimedia.org/r/313792 (https://phabricator.wikimedia.org/T147113) (owner: Marostegui)
[11:20:07] PROBLEM - puppet last run on cp3040 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[11:21:28] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase db1081 weight after finishing its maintenance - T147113 (duration: 00m 48s)
[11:32:40] (PS2) Filippo Giunchedi: puppet_compiler: Restrict to labs networks [puppet] - https://gerrit.wikimedia.org/r/307275 (owner: Muehlenhoff)
[11:37:37] Operations, DBA: Investigate db1082 crash - https://phabricator.wikimedia.org/T145533#2684019 (Marostegui) @Cmjohnson let me know if you want to proceed with this upgrade sometime this week? This server needs to be depooled first.
[11:37:49] (CR) Filippo Giunchedi: [C: 2] puppet_compiler: Restrict to labs networks [puppet] - https://gerrit.wikimedia.org/r/307275 (owner: Muehlenhoff)
[11:38:42] RECOVERY - puppet last run on cp3004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:44:43] RECOVERY - puppet last run on cp3040 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[11:46:32] (CR) Gilles: "Concerns addressed here: https://phabricator.wikimedia.org/D400" (5 comments) [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/313781 (owner: Gilles)
[11:48:01] (PS1) Marostegui: db-eqiad.php: Restore original weight [mediawiki-config] - https://gerrit.wikimedia.org/r/313794 (https://phabricator.wikimedia.org/T147113)
[11:49:30] (CR) Marostegui: [C: 2] db-eqiad.php: Restore original weight [mediawiki-config] - https://gerrit.wikimedia.org/r/313794 (https://phabricator.wikimedia.org/T147113) (owner: Marostegui)
[11:49:53] (Merged) jenkins-bot: db-eqiad.php: Restore original weight [mediawiki-config] - https://gerrit.wikimedia.org/r/313794 (https://phabricator.wikimedia.org/T147113) (owner: Marostegui)
[11:51:44] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase db1081 weight to its original value after finishing maintenance - T147113 (duration: 00m 48s)
[11:51:48]
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:03:12] PROBLEM - puppet last run on snapshot1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:03:55] so from https://grafana-admin.wikimedia.org/dashboard/db/ocg?panelId=2&fullscreen OCG's queue seems out of control
[12:04:21] oh no
[12:05:03] elukey: it has its own queue system or is that the regular MW job queue ?
[12:05:34] IIRC it has its own redis jobqueue
[12:05:42] on one of the jobqueue hosts
[12:06:09] I am pretty sure that somebody is requesting tons of pdfs for entire categories :D
[12:07:02] pretty sure that some ebooks market place often scrap the whole wikibooks.org
[12:07:14] ah that too
[12:07:30] * hashar digs in logstash
[12:10:38] (PS1) Marostegui: db-eqiad.php: Depool db1084 [mediawiki-config] - https://gerrit.wikimedia.org/r/313796 (https://phabricator.wikimedia.org/T147113)
[12:11:03] hashar: didn't find ocg in logstash, I am taling the logs.. if you find it, please let me know :)
[12:11:43] elukey: you want type:"mw-ocg-service"
[12:12:12] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1084 [mediawiki-config] - https://gerrit.wikimedia.org/r/313796 (https://phabricator.wikimedia.org/T147113) (owner: Marostegui)
[12:12:40] (Merged) jenkins-bot: db-eqiad.php: Depool db1084 [mediawiki-config] - https://gerrit.wikimedia.org/r/313796 (https://phabricator.wikimedia.org/T147113) (owner: Marostegui)
[12:13:44] ah nice!
[12:14:31] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1084 for maintenance - T147113 (duration: 00m 48s)
[12:14:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:16:48] I can see a lot of "could not fetch from redis"
[12:16:54] not sure if we are hitting a limit
[12:23:10] !log Deploying alter table in S4 - T147113
[12:23:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:28:19] elukey: slightly enhanced the dashboard https://grafana.wikimedia.org/dashboard/db/ocg
[12:28:32] I have reused a graph from the mw job queue
[12:29:21] wow nice!
[12:29:26] ah
[12:29:29] RECOVERY - puppet last run on snapshot1006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[12:29:41] and on restbase there are a bunch of requests failling from a user-agent mw-ocg-bundler
[12:29:53] though that is a low rate
[12:30:04] https://logstash.wikimedia.org/goto/be1ddad14b26ac837780f4de5dfed598
[12:30:17] socket hang up
[12:30:18] bah
[12:30:49] I don't know a lot the service but I guess that RB enqueues stuff to Redis (or maybe another thing does it) and then OCG simply depools jobs from it
[12:30:53] why do softwares need sockets anyway
[12:30:57] hahaha
[12:31:04] probably the queue is reaching a limit?
[12:31:44] so the first thing to do is to understand who is causing these requests
[12:31:48] Error: Error fetching restbase1 result: some huwikinews url
[12:31:48] at /srv/deployment/ocg/ocg/mw-ocg-bundler/lib/parsoid.js:239:20 at /srv/deployment/ocg/ocg/mw-ocg-bundler/lib/retry-request.js:77:3
[12:31:58] so really I dont know
[12:32:08] OCG --> some parsoid lib --> restbase
[12:32:22] then that one has been going for for quite a while
[12:32:47] mobrovac: aloha, you there?
[12:33:39] beside that ... logstash is filled with the usual flow of errors from ocg
[12:35:10] I can see a lot of 504s from restbase, that are the socket hang up mentioned..
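The OCG health alerts earlier in this log pack two counters and their thresholds into a single status string. A minimal sketch of pulling them apart, assuming the `(=N level)` fragment names the value at which that level fires and that a counter at or above it counts as a breach (neither is confirmed in the log; the function name `parse_ocg_status` is illustrative only):

```python
import re

# Matches fragments such as "ocg_job_status 974634 msg (=800000 warning)".
METRIC = re.compile(r"(\w+) (\d+) msg \(=(\d+) (warning|critical)\)")


def parse_ocg_status(status):
    """Return (metric, value, threshold, level, breached) tuples.

    Assumption: a value greater than or equal to the threshold is a breach.
    """
    results = []
    for name, value, threshold, level in METRIC.findall(status):
        value, threshold = int(value), int(threshold)
        results.append((name, value, threshold, level, value >= threshold))
    return results


# Status text copied from the 07:24:57 alert in this log.
sample = (
    "CRITICAL: ocg_job_status 974634 msg (=800000 warning): "
    "ocg_render_job_queue 3000 msg (=3000 critical)"
)

for row in parse_ocg_status(sample):
    print(row)
```

Under that assumption both counters in the sample are over their thresholds, which would fit the later reading in the channel that the render queue is backed up.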
[12:37:33] elukey: yup, what's up?
[12:37:49] elukey: euh, our restbase? or aqs?
[12:38:24] elukey: maybe ocg had a bunch of jobs added and is running at max capacity?
[12:38:58] mobrovac: o/ so OCG's queue len is huge now, I believe due to tons of jobs requested recently
[12:39:13] but it is only a speculation, I don't really know the service well
[12:39:50] elukey: ocg has no conn to rb
[12:40:06] in the sense that ocg is not an rb back-end
[12:40:22] it might be using RB to get some data though
[12:41:30] sure sure, I wanted to get some insight from you if you know who puts jobs in the Redis Job queue that OCG polls (this is my understanding)
[12:41:42] basically I am trying to check what is creating this huge backlog of jobs
[12:41:52] yes, that's the way it works
[12:42:03] they are usually user-triggered
[12:42:14] but it may happen that a collection is being re-rendered
[12:42:22] due to some template updates or something
[12:42:36] i can see there are transclude jobs piling up for change-prop
[12:42:51] so a similar thing might be happening for OCG
[12:42:59] elukey: check the jobqueue runners
[12:45:19] Operations, Ops-Access-Requests: Requesting access to stat1002 / webrequest logs for MelodyKramer - https://phabricator.wikimedia.org/T145387#2684143 (MelodyKramer) Thanks @ArielGlenn and @elukey - I will try this today.
[12:45:40] (03PS1) 10Urbanecm: Enable subpages for main namespace in arbcom_nlwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313797 (https://phabricator.wikimedia.org/T147186) [12:46:43] I am not seeing anything weird from grafana related to the job queues [12:49:29] so I am tailing ocg logs on ocg1002 with tail -f ocg.log | grep "picking up job" | grep metabook --color [12:50:09] PROBLEM - check_mysql on fdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1530 [12:50:32] that should show what the OCG worker are picking up from Redis [12:51:07] yes [12:52:40] !log elasticsearch@eqiad: reducing replica count from 5 to 3 for dewiki_titlesuggest [12:52:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:53:23] most of them have format 'nuwiki' [12:54:06] hashar: what's the plan for eu swat? should I do it? [12:55:12] RECOVERY - check_mysql on fdb2001 is OK: Uptime: 612377 Threads: 1 Questions: 107182730 Slow queries: 3323 Opens: 5331 Flush tables: 2 Open tables: 530 Queries per second avg: 175.027 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 468 [12:57:51] mobrovac: checked the eventbus dashboard, now I get what you were saying about transclude jobs [12:58:35] but if this is the result of a bit template change I am not sure how to handle the whole situation [12:59:27] going to step afk for a bit to eat something [13:00:48] !log elasticsearch@eqiad: reducing replica count from 5 to 3 for enwiki_titlesuggest [13:00:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:01:34] o/ [13:01:41] zeljkof: yeah please do :] [13:01:42] hashar: is jouncebot down? [13:01:45] I am around as needed [13:01:50] no clue forget the bot :D [13:02:00] ok, in that case: I can SWAT today! :D [13:02:12] !log starting EU SWAT [13:02:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:02:49] Urbanecm: ready for SWAT? 
:) [13:02:54] Sure [13:03:14] hashar: you are contact person for 313763? [13:03:31] Urbanecm: great, in that case starting with your patches [13:03:52] can you test all of them at mw1099? [13:05:36] 06Operations, 06Discovery, 06Maps, 03Interactive-Sprint: reimage maps-test* servers - https://phabricator.wikimedia.org/T147194#2684215 (10Gehel) [13:06:07] shutting down services on maps-test* servers prior to reimage -T147194 [13:06:08] T147194: reimage maps-test* servers - https://phabricator.wikimedia.org/T147194 [13:06:27] zeljkof: for the throttling ones, you can just +2 all of them [13:06:40] scap pull on mw1099 then verify that mw1099 still yields content [13:06:57] the robots.txt yeah we can push it easily [13:07:19] hashar: how do I check that mw1099 yields content? [13:07:33] enable the wikimedia debug ext in your browser ? :) [13:07:42] are you saying I should start with throttling commits? [13:07:43] browse the site [13:07:50] hashar: oh, I see, I get it now [13:07:53] well the order does not matter [13:08:13] zeljkof: except throttling patches yes, they are testable at mw1099. [13:08:19] but both throttling changes can be tested/synced together [13:09:05] hashar, Urbanecm: ok, in that case merging the couple of throttling patches first [13:09:17] !log shutting down services on maps-test* servers prior to reimage -T147194 [13:09:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:09:27] (03PS2) 10Zfilipin: [throttle] Rule for Winona State University [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313784 (https://phabricator.wikimedia.org/T146600) (owner: 10Hashar) [13:09:49] zeljkof: Okay. 
[13:10:46] LALALALALALA [13:11:02] Platonides: that was quick :) [13:11:11] :D [13:11:17] (03PS2) 10Filippo Giunchedi: poolcounter: move to modules/role [puppet] - 10https://gerrit.wikimedia.org/r/313564 (https://phabricator.wikimedia.org/T123734) [13:11:29] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313784 (https://phabricator.wikimedia.org/T146600) (owner: 10Hashar) [13:11:47] !log elasticsearch@eqiad: reducing replica count from 5 to 2 for frwiki_titlesuggest [13:11:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:11:59] (03Merged) 10jenkins-bot: [throttle] Rule for Winona State University [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313784 (https://phabricator.wikimedia.org/T146600) (owner: 10Hashar) [13:12:01] i thought Platonides was a bot )) [13:12:30] (03PS6) 10Zfilipin: [throttle] Ada Lovelace Day Edit-a-thon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312852 (https://phabricator.wikimedia.org/T146654) (owner: 10Urbanecm) [13:12:53] (03CR) 10Filippo Giunchedi: [C: 032] poolcounter: move to modules/role [puppet] - 10https://gerrit.wikimedia.org/r/313564 (https://phabricator.wikimedia.org/T123734) (owner: 10Filippo Giunchedi) [13:13:40] !log reimage of maps-test2001 - T147194 [13:13:41] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312852 (https://phabricator.wikimedia.org/T146654) (owner: 10Urbanecm) [13:13:42] T147194: reimage maps-test* servers - https://phabricator.wikimedia.org/T147194 [13:13:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:14:17] (03Merged) 10jenkins-bot: [throttle] Ada Lovelace Day Edit-a-thon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312852 (https://phabricator.wikimedia.org/T146654) (owner: 10Urbanecm) [13:14:46] hashar: not related to swat, but is this job stuck? 
https://integration.wikimedia.org/ci/job/operations-mw-config-typos/3003/console [13:15:09] 313784 and 312852 are merged [13:15:51] !log elasticsearch@eqiad: reducing replica count from 5 to 2 for zhwiki_titlesuggest [13:15:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:16:46] hashar, Urbanecm: as far as I understood, throttling changes can not be tested at mw1099 [13:16:52] so I can just deploy them? [13:17:10] zeljkof: Yes, please deploy them everywhere. [13:17:18] ok, deploying [13:17:45] thx [13:18:01] (03CR) 10jenkins-bot: [V: 04-1] deployment-prep: Move poolcounter to deployment-poolcounter04 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313789 (https://phabricator.wikimedia.org/T123734) (owner: 10Filippo Giunchedi) [13:18:06] zeljkof: yeah the job is stuck somehow [13:18:52] (03CR) 10Hashar: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313789 (https://phabricator.wikimedia.org/T123734) (owner: 10Filippo Giunchedi) [13:19:01] But as I can see it's killed by hashar :) [13:19:05] !log elasticsearch@eqiad: reducing replica count from 5 to 2 for ruwiki_titlesuggest [13:19:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:19:45] (03PS2) 10Filippo Giunchedi: prometheus: optionally print labs targets according to format() [puppet] - 10https://gerrit.wikimedia.org/r/311099 [13:21:12] !log zfilipin@tin Synchronized wmf-config/throttle.php: SWAT: [[gerrit:313784|[throttle] Rule for Winona State University (T146600)]] [[gerrit:312852|[throttle] Ada Lovelace Day Edit-a-thon (T146654)]] (duration: 00m 49s) [13:21:14] T146600: Lift the Wikipedia Account Creation Limit on 2016-10-04 and other dates for Winona State University - https://phabricator.wikimedia.org/T146600 [13:21:15] T146654: Ada Lovelave Day Edit-a-thon - https://phabricator.wikimedia.org/T146654 [13:21:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:21:34] 
(03CR) 10Mobrovac: "LGTM, one minor comment in-lined" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/313619 (https://phabricator.wikimedia.org/T133395) (owner: 10Eevans) [13:21:37] zeljkof: Are they deployed? [13:21:43] Urbanecm: 313784 and 312852 are deployed [13:21:45] (03CR) 10Filippo Giunchedi: [C: 032] prometheus: optionally print labs targets according to format() [puppet] - 10https://gerrit.wikimedia.org/r/311099 (owner: 10Filippo Giunchedi) [13:21:57] thx. Let's move at the others. [13:22:06] now moving to config changes [13:22:14] okay [13:22:25] (03PS2) 10Zfilipin: Change protection level autoreview in arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312734 (https://phabricator.wikimedia.org/T146575) (owner: 10Urbanecm) [13:22:58] starting with 312734 [13:23:06] !log elasticsearch@eqiad: reducing replica count from 5 to 2 for jawiki_titlesuggest and eswiki_titlesuggest [13:23:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:23:45] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312734 (https://phabricator.wikimedia.org/T146575) (owner: 10Urbanecm) [13:23:48] zeljkof: Okay. [13:24:18] (03Merged) 10jenkins-bot: Change protection level autoreview in arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312734 (https://phabricator.wikimedia.org/T146575) (owner: 10Urbanecm) [13:26:23] Urbanecm: 312734 is at mw1099 [13:26:29] can you take a look? [13:26:34] Okay. [13:27:01] (03PS3) 10Zfilipin: Fix hewiki logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/310798 (https://phabricator.wikimedia.org/T145017) (owner: 10Urbanecm) [13:28:00] zeljkof: It works, please deploy it everywhere. [13:28:11] Urbanecm: deploying... 
[13:28:32] 06Operations, 10Traffic, 07HTTPS, 07Wikimedia-Incident: Make OCSP Stapling support more generic and robust - https://phabricator.wikimedia.org/T93927#2684299 (10BBlack) This is still technically an outstanding issue that should be addressed, but it's relatively low priority with relatively low risk, at lea... [13:31:00] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:312734|Change protection level autoreview in arwiki (T146575)]] (duration: 00m 48s) [13:31:01] T146575: change protection level "autoreview" in Arwiki - https://phabricator.wikimedia.org/T146575 [13:31:04] Urbanecm: deployed, please check production [13:31:06] (03CR) 10Faidon Liambotis: [C: 04-2] "These magic percentages are ugly and arbitrary. If we want to be warned at 30/15 days, we should configure our checks for that, not for "9" [puppet] - 10https://gerrit.wikimedia.org/r/309203 (https://phabricator.wikimedia.org/T144293) (owner: 10Alex Monk) [13:31:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:31:09] working on 310798 [13:31:14] 06Operations, 10Traffic, 10media-storage, 13Patch-For-Review: Certain images failing to load in ulsfo - https://phabricator.wikimedia.org/T144257#2593559 (10BBlack) >>! In T144257#2681470, @Aklapper wrote: > Is anybody actively investigating this? / Does this need more investigation? Or did the merged patc... [13:31:21] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/310798 (https://phabricator.wikimedia.org/T145017) (owner: 10Urbanecm) [13:31:37] 06Operations, 10Traffic, 10media-storage, 13Patch-For-Review: Certain images failing to load in ulsfo - https://phabricator.wikimedia.org/T144257#2684312 (10BBlack) [13:31:40] 06Operations, 10Traffic, 13Patch-For-Review: varnish backends start returning 503s after ~6 days uptime - https://phabricator.wikimedia.org/T145661#2684310 (10BBlack) [13:32:04] Production is ok too. 
[13:33:41] (03PS3) 10Filippo Giunchedi: Move scap package version to class parameter [puppet] - 10https://gerrit.wikimedia.org/r/312971 (https://phabricator.wikimedia.org/T146618) (owner: 1020after4) [13:33:48] 06Operations, 10Beta-Cluster-Infrastructure, 10Traffic, 13Patch-For-Review: Upload cache in beta is broken - https://phabricator.wikimedia.org/T147116#2684318 (10BBlack) It would probably be better to upgrade the deployment-prep upload cache to varnish4. [13:34:16] hashar: ok, now related to both jenkins and swat :( the job got stuck https://integration.wikimedia.org/ci/job/operations-mw-config-typos/3021/console [13:34:37] related to https://gerrit.wikimedia.org/r/#/c/310798/ [13:35:21] (03CR) 10Filippo Giunchedi: [C: 032] Move scap package version to class parameter [puppet] - 10https://gerrit.wikimedia.org/r/312971 (https://phabricator.wikimedia.org/T146618) (owner: 1020after4) [13:35:39] (03CR) 10Filippo Giunchedi: "NOOP in PCC https://puppet-compiler.wmflabs.org/4191/" [puppet] - 10https://gerrit.wikimedia.org/r/312971 (https://phabricator.wikimedia.org/T146618) (owner: 1020after4) [13:36:06] Urbanecm: please wait, jenkins problems :( [13:36:11] (see above) [13:36:30] hashar: should I just stop the job in jenkins (for now) and recheck? [13:36:42] zeljkof: I have a lot of time, the bus leaves in two hours :) . [13:36:43] again ? 
[13:36:56] (03CR) 10jenkins-bot: [V: 04-1] Fix hewiki logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/310798 (https://phabricator.wikimedia.org/T145017) (owner: 10Urbanecm) [13:37:03] (03CR) 10Faidon Liambotis: [C: 031] "LGTM (aside from the mixing of tabs and spaces :)" [puppet] - 10https://gerrit.wikimedia.org/r/310815 (owner: 10Alexandros Kosiaris) [13:37:17] zeljkof: I guess that agent has some issue [13:37:19] er slave [13:37:28] (03CR) 10Hashar: [C: 032] Fix hewiki logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/310798 (https://phabricator.wikimedia.org/T145017) (owner: 10Urbanecm) [13:37:32] lets try again [13:37:36] (03PS1) 10DCausse: Reduce number of replicas for titlesuggest indices [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313800 (https://phabricator.wikimedia.org/T147192) [13:37:46] hashar: Is it possible to reboot the slave? Maybe it'll help... [13:37:55] (03Merged) 10jenkins-bot: Fix hewiki logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/310798 (https://phabricator.wikimedia.org/T145017) (owner: 10Urbanecm) [13:38:12] (03PS2) 10BBlack: remove redundant require [puppet] - 10https://gerrit.wikimedia.org/r/313554 [13:38:14] (03PS1) 10BBlack: VCL commentary [puppet] - 10https://gerrit.wikimedia.org/r/313801 [13:38:16] (03PS1) 10BBlack: upload storage: weekly restarts [puppet] - 10https://gerrit.wikimedia.org/r/313802 (https://phabricator.wikimedia.org/T145661) [13:38:28] !log reenable puppet on scb1* ores/celery spamming is over [13:38:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:39:26] Urbanecm: yeah I will reboot it [13:39:33] zeljkof: it merged [13:39:34] Okay, thanks hashar . 
[13:39:45] hashar: great, thanks, continuing with the deploy [13:43:00] Urbanecm: 310798 is at 1099, please check [13:43:18] (03PS2) 10Zfilipin: Upload 1x logo for olowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312977 (https://phabricator.wikimedia.org/T146745) (owner: 10Urbanecm) [13:43:36] zeljkof: Was the logo purged? [13:44:10] Urbanecm: uh, is that something I should do? [13:44:58] I think you must purge cache or I won't see any difference. [13:45:07] * zeljkof is looking at the docs... [13:45:22] you would not see anything even at 1099? [13:46:03] (03PS3) 10Zfilipin: Upload 1x logo for olowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312977 (https://phabricator.wikimedia.org/T146745) (owner: 10Urbanecm) [13:46:06] (03PS1) 10Elukey: Upgrade memcached on mc2009 to 1.4.28 [puppet] - 10https://gerrit.wikimedia.org/r/313803 (https://phabricator.wikimedia.org/T129963) [13:46:17] zeljkof: just sync the logo [13:46:18] :) [13:46:27] hashar: on production, or 1099? [13:46:31] to prod [13:46:36] hashar: ok, will do [13:47:17] that is for a new wikipedia https://olo.wikipedia.org/ [13:47:20] which is being provisioned [13:47:34] hashar: no, for hewiki [13:47:34] https://gerrit.wikimedia.org/r/#/c/310798/ [13:47:40] aah [13:47:46] 06Operations, 10Monitoring, 13Patch-For-Review, 05Prometheus-metrics-monitoring: test prometheus server - https://phabricator.wikimedia.org/T126785#2684332 (10fgiunchedi) 05Open>03Resolved The scope for this task is done, we have prometheus server in production now [13:48:00] same just push it :)]] [13:48:03] I am rebasing the commit for the new wikipedia, so it is ready [13:48:03] (03PS2) 10Elukey: Upgrade memcached on mc2009 to 1.4.28 [puppet] - 10https://gerrit.wikimedia.org/r/313803 (https://phabricator.wikimedia.org/T129963) [13:48:06] ok, pushing [13:48:26] zeljkof and hashar: Should I test the logos for hewiki? Is it ready from your side? 
[13:48:35] for the logos [13:48:43] Urbanecm: in a minute [13:48:46] I would not bother testing them really [13:48:47] just push [13:48:57] report back on the task that the change is deployed [13:49:00] (03PS3) 10BBlack: remove redundant require [puppet] - 10https://gerrit.wikimedia.org/r/313554 [13:49:03] and folks from the wiki will iterate from there [13:49:07] hashar: ok, then I will push all logos without testing [13:49:15] (03CR) 10BBlack: [C: 032 V: 032] remove redundant require [puppet] - 10https://gerrit.wikimedia.org/r/313554 (owner: 10BBlack) [13:49:19] the only thing I check is whether the logo still look legit [13:49:25] (03PS2) 10BBlack: VCL commentary [puppet] - 10https://gerrit.wikimedia.org/r/313801 [13:49:31] (03CR) 10BBlack: [C: 032 V: 032] VCL commentary [puppet] - 10https://gerrit.wikimedia.org/r/313801 (owner: 10BBlack) [13:49:36] (03PS4) 10Andrew Bogott: l10nupdate: Add 'su' to logrotate script [puppet] - 10https://gerrit.wikimedia.org/r/313563 (https://phabricator.wikimedia.org/T132324) [13:49:37] if it is changed to an orange pumpkin, I would think twice [13:50:47] (03PS2) 10BBlack: upload storage: weekly restarts [puppet] - 10https://gerrit.wikimedia.org/r/313802 (https://phabricator.wikimedia.org/T145661) [13:51:04] !log zfilipin@tin Synchronized static/images/project-logos/hewiki.png: SWAT: [[gerrit:310798|Fix hewiki logos (T145017)]] (duration: 00m 47s) [13:51:05] T145017: Pixelized logo - hewiki - https://phabricator.wikimedia.org/T145017 [13:51:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:51:20] Urbanecm: 310798 is deployed to prod [13:51:32] (03PS5) 10Andrew Bogott: l10nupdate: Add 'su' to logrotate script [puppet] - 10https://gerrit.wikimedia.org/r/313563 (https://phabricator.wikimedia.org/T132324) [13:51:40] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312977 (https://phabricator.wikimedia.org/T146745) (owner: 10Urbanecm) [13:51:47] moving to 
312977 [13:51:52] will push to prod and let you know [13:52:13] (03Merged) 10jenkins-bot: Upload 1x logo for olowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312977 (https://phabricator.wikimedia.org/T146745) (owner: 10Urbanecm) [13:52:36] (03CR) 10BBlack: [C: 032] upload storage: weekly restarts [puppet] - 10https://gerrit.wikimedia.org/r/313802 (https://phabricator.wikimedia.org/T145661) (owner: 10BBlack) [13:52:51] (03CR) 10Andrew Bogott: [C: 032] l10nupdate: Add 'su' to logrotate script [puppet] - 10https://gerrit.wikimedia.org/r/313563 (https://phabricator.wikimedia.org/T132324) (owner: 10Andrew Bogott) [13:53:37] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/4193/" [puppet] - 10https://gerrit.wikimedia.org/r/313803 (https://phabricator.wikimedia.org/T129963) (owner: 10Elukey) [13:54:09] zeljkof: 310798 is ok [13:54:47] (03PS6) 10Andrew Bogott: l10nupdate: Add 'su' to logrotate script [puppet] - 10https://gerrit.wikimedia.org/r/313563 (https://phabricator.wikimedia.org/T132324) [13:55:42] !log zfilipin@tin Synchronized static/images/project-logos/olowiki.png: SWAT: [[gerrit:312977|Upload 1x logo for olowiki (T146745)]] (duration: 00m 48s) [13:55:43] T146745: Project logos for olo.wikipedia (1x, 1.5x and 2x) - https://phabricator.wikimedia.org/T146745 [13:55:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:55:51] Urbanecm 312977 is live on prod, please check [13:56:19] (03PS2) 10Zfilipin: Enable subpages in 121 namespace in wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312500 (https://phabricator.wikimedia.org/T146271) (owner: 10Urbanecm) [13:56:25] moving on to 312500 [13:56:53] hola andrewbogott, I merged all the code reviews except the l10update one, if you are ok we can merge now [13:57:05] I'm merging it right now :) [13:57:19] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312500 (https://phabricator.wikimedia.org/T146271) (owner: 
10Urbanecm) [13:57:35] \o/ [13:57:38] + su l10nupdate wikidev [13:57:46] (03Merged) 10jenkins-bot: Enable subpages in 121 namespace in wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312500 (https://phabricator.wikimedia.org/T146271) (owner: 10Urbanecm) [13:57:52] 06Operations, 13Patch-For-Review: Migrate pool counters to trusty/jessie - https://phabricator.wikimedia.org/T123734#2684347 (10AlexMonk-WMF) >>! In T123734#2683773, @fgiunchedi wrote: > I tried provisioning `deployment-poolcounter03` with jessie to migrate beta too but ATM the instance is not accessible via s... [13:58:05] (03CR) 10Alex Monk: [C: 031] deployment-prep: Move poolcounter to deployment-poolcounter04 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313789 (https://phabricator.wikimedia.org/T123734) (owner: 10Filippo Giunchedi) [13:59:09] Urbanecm: 312500 is at 1099, please check [13:59:10] zeljkof: 312977 is ok [13:59:34] (03PS3) 10Zfilipin: Add 1.5 and 2x logos for olowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313658 (https://phabricator.wikimedia.org/T146745) (owner: 10Urbanecm) [13:59:36] 06Operations, 13Patch-For-Review: Migrate pool counters to trusty/jessie - https://phabricator.wikimedia.org/T123734#2684349 (10fgiunchedi) >>! In T123734#2684347, @AlexMonk-WMF wrote: >>>! In T123734#2683773, @fgiunchedi wrote: >> I tried provisioning `deployment-poolcounter03` with jessie to migrate beta too... [14:00:55] 06Operations, 10Beta-Cluster-Infrastructure, 10Traffic, 13Patch-For-Review: Upload cache in beta is broken - https://phabricator.wikimedia.org/T147116#2684352 (10AlexMonk-WMF) >>! In T147116#2684318, @BBlack wrote: > It would probably be better to upgrade the deployment-prep upload cache to varnish4. Okay... 
[14:01:21] (03CR) 10BBlack: [C: 031] Avoid unnecessary varnishkafka restarts [puppet] - 10https://gerrit.wikimedia.org/r/313400 (owner: 10Elukey) [14:01:41] (03PS2) 10Elukey: Avoid unnecessary varnishkafka restarts [puppet] - 10https://gerrit.wikimedia.org/r/313400 [14:02:57] !log extending EU SWAT [14:02:57] (03CR) 10Elukey: [C: 032] Avoid unnecessary varnishkafka restarts [puppet] - 10https://gerrit.wikimedia.org/r/313400 (owner: 10Elukey) [14:03:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:03:36] zeljkof: I doesn't see any change, look at https://www.wikidata.org/wiki/Property_talk:P131/Archive , no buttons typical for subpages appeared... [14:03:59] 06Operations, 10media-storage, 13Patch-For-Review: swift upgrade plans: jessie and swift 2.x - https://phabricator.wikimedia.org/T117972#2684356 (10fgiunchedi) [14:04:25] Urbanecm: looking... what to do now? should I revert? [14:04:51] I am not really familiar with wikidata, so I don't know what to expect... [14:05:31] zeljkof: There should be navigation. [14:05:59] But somewhere in InitialiseSettings.php there is wikidatawiki instead of just wikidata. Maybe this is why nothing happens. [14:06:09] 06Operations, 10Beta-Cluster-Infrastructure, 10Traffic, 13Patch-For-Review: Upload cache in beta is broken - https://phabricator.wikimedia.org/T147116#2684357 (10BBlack) I wish :) The basic flow we're using on prod nodes is here, but some of that's inapplicable to deployment-prep: https://wikitech.wikimed... [14:06:25] Urbanecm: hm, ok, what to do now? revert? [14:06:25] zeljkof: Could you test it (replace wikidata with wikidatawiki in the row which was added by my patch)? [14:06:58] we are already over the time, I am reluctant to poke around, is it urgent? can we leave it for a later deploy or tomorrow? [14:07:26] I am not really familiar with mediawiki config and wikidata... [14:08:25] what is the change ?
[14:08:37] hashar: https://gerrit.wikimedia.org/r/#/c/312500/2 [14:08:38] 06Operations, 10Monitoring, 15User-Joe, 07Wikimedia-Incident: Monitor redis memory/disk usage - https://phabricator.wikimedia.org/T110169#2684363 (10Joe) a:05Joe>03None [14:09:29] 06Operations, 13Patch-For-Review, 07Performance, 15User-Joe, and 2 others: Package and deploy Mcrouter as a replacement for twemproxy - https://phabricator.wikimedia.org/T132317#2684366 (10Joe) a:05Joe>03None [14:09:47] Urbanecm: zeljkof: "wikidata" would be the wikis in the wikidata.dblist [14:09:55] which are wikidatawiki and testwikidatawiki [14:10:31] what is wrong whith this patch? [14:10:53] (03Abandoned) 10Alex Monk: check_ssl: Use a maximum percentage of certificate validity time for determining alert state [puppet] - 10https://gerrit.wikimedia.org/r/309203 (https://phabricator.wikimedia.org/T144293) (owner: 10Alex Monk) [14:11:00] hashar: Urbanecm says he does not see navigation he expects to see [14:11:02] at 1099 [14:11:17] https://www.wikidata.org/wiki/Property_talk:P131/Archive [14:11:28] ohh [14:11:31] subpages [14:11:46] so https://www.wikidata.org/wiki/Property_talk:P131/Archive [14:12:02] is supposed to be a subpage now ? [14:12:11] Yes [14:12:26] All pages in NS 121 should be subpages. [14:12:33] gotta purge them [14:12:36] https://www.wikidata.org/wiki/Property_talk:P131/Archive [14:12:37] Okay. [14:12:50] 06Operations, 06Discovery, 06Maps, 03Interactive-Sprint: reimage maps-test* servers - https://phabricator.wikimedia.org/T147194#2684215 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['maps-test2001.codfw.wmnet'] ``` The log can be found in `/va... [14:12:58] hashar: is that something I have to do? [14:13:22] I dont think we have any easy way to purge all pages of a given namespace :( [14:13:36] hashar: should I deploy the change?
[14:13:45] zeljkof: yes [14:13:50] ok, deploying [14:14:06] the whole semantic is very confusing [14:14:21] but in short, if there is a "wiki" suffix, that is a single wiki [14:14:29] else that is set of wikis listed in /dblists/ [14:14:33] + purge needs to happen [14:14:48] hashar: Thanks for explanation. [14:15:02] 06Operations, 10Traffic, 13Patch-For-Review: TLS stats regression related to Chrome/41 on Windows - https://phabricator.wikimedia.org/T141786#2684373 (10BBlack) There's not much left to do here and we're no longer actively investigating. However, I'd like to try removing the 401 hack at some point, to see i... [14:15:17] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:312500|Enable subpages in 121 namespace in wikidata (T146271)]] (duration: 00m 49s) [14:15:18] T146271: Wikidata site configuration: enable subpage feature for "Property talk" namespace (wgNamespaceNumber: 121) - https://phabricator.wikimedia.org/T146271 [14:15:21] Urbanecm: 312500 is live on prod, please check (if you can) [14:15:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:15:38] hashar: do I need to purge something at prod for the change to be visibile? [14:15:41] * zeljkof is confused [14:15:55] zeljkof: Running touch.py from PWB probably would help. [14:16:00] hashar: also, should I continue with the swat? or stop? [14:16:05] Urbanecm: touch is lame :D [14:16:11] what is left to do ? [14:16:29] Urbanecm: what is PWB? [14:16:35] zeljkof: Pywikibot [14:16:43] hashar: Why? ;) [14:16:53] hashar 313658 (logos) and 313763 (robots) [14:17:07] Urbanecm: well if it is run with -purge I dont mind. But creating null edits is really crazy :D [14:17:37] But running touch.py would help when it'll be run correctly :). [14:17:38] ah found it [14:17:40] Okay, going to it. 
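The rule of thumb hashar gives above (a key with a "wiki" suffix is a single wiki, anything else is a set of wikis from a dblist) can be sketched in a few lines of Python. This is only an illustration of that rule, not how MediaWiki's configuration actually resolves keys: the `DBLISTS` mapping and the `wikis_for_key` helper are hypothetical names, and the only real data is the membership of the `wikidata` list, taken from the discussion (wikidatawiki and testwikidatawiki).

```python
# Hypothetical stand-in for the dblist files. Only the "wikidata" entry
# comes from the discussion above; everything else here is an assumption.
DBLISTS = {
    "wikidata": ["wikidatawiki", "testwikidatawiki"],
}


def wikis_for_key(key, dblists=DBLISTS):
    """Return the wikis a settings key applies to, following the rule of thumb."""
    # A key with a "wiki" suffix names exactly one wiki.
    if key.endswith("wiki"):
        return [key]
    # Anything else is treated as the name of a dblist (a set of wikis).
    if key in dblists:
        return list(dblists[key])
    raise KeyError(f"unknown dblist: {key}")


print(wikis_for_key("wikidata"))
print(wikis_for_key("wikidatawiki"))
```

Under this reading, a setting keyed on `wikidata` already covers wikidatawiki, which would be consistent with the patch being correct and the missing navigation being a caching issue that needed a purge.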
[14:17:43] *going to do it [14:17:47] missing wold [14:17:47] zeljkof: yeah do swat both it is all fine [14:17:53] Urbanecm: I am going to purge them server side [14:18:01] hashar: ok, continuing with swat [14:18:16] hashar: Okay, so nothing is required from my side? [14:18:18] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313658 (https://phabricator.wikimedia.org/T146745) (owner: 10Urbanecm) [14:18:24] continuing with 313658 [14:18:43] (03Merged) 10jenkins-bot: Add 1.5 and 2x logos for olowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313658 (https://phabricator.wikimedia.org/T146745) (owner: 10Urbanecm) [14:19:23] Urbanecm: purged :] [14:19:25] (03PS3) 10Zfilipin: Fix an invalid empty line in the global robots.txt [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313763 (https://phabricator.wikimedia.org/T146908) (owner: 10Kerberizer) [14:19:28] 06Operations, 10MediaWiki-Page-editing, 10Traffic, 07Browser-Support-Internet-Explorer, 07HTTPS: text input history/autocomplete doesn't work with HTTPS under IE8-10 - https://phabricator.wikimedia.org/T55636#2684382 (10BBlack) 05Open>03declined Declining this task because (a) It's been open for 3 ye... [14:19:32] hashar: Thanks. [14:19:51] !log Purged wikidata wiki property talk page, they now allow subpages (T146271). Ran: mwscript purgeList.php --wiki=wikidatawiki --namespace=121 --verbose [14:19:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:20:12] Urbanecm: I havent updated page_touched though. 
I dont htink that is relevant [14:20:46] 06Operations, 10Traffic, 10Wikimedia-Shop, 07HTTPS: store.wikimedia.org HTTPS issues - https://phabricator.wikimedia.org/T128559#2684390 (10BBlack) [14:20:48] !log T146271 mwscript purgeList.php --wiki=testwikidatawiki --namespace=121 --verbose [14:20:49] 06Operations, 10Traffic, 10Wikimedia-Shop, 07HTTPS: Canonical URL in Store points to HTTP address, should be HTTPS - https://phabricator.wikimedia.org/T131131#2684389 (10BBlack) [14:20:49] T146271: Wikidata site configuration: enable subpage feature for "Property talk" namespace (wgNamespaceNumber: 121) - https://phabricator.wikimedia.org/T146271 [14:20:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:22:01] 06Operations, 10Traffic, 07HTTPS, 13Patch-For-Review: Create a secure redirect service for large count of non-canonical / junk domains - https://phabricator.wikimedia.org/T133548#2684397 (10BBlack) [14:22:04] 06Operations, 10Traffic, 07HTTPS: https://wikipedia.com and similar throw certificate warning - https://phabricator.wikimedia.org/T42998#2684399 (10BBlack) [14:22:07] zeljkof: the olowiki logos you can sync them to the cluster (ie skip mw1099) [14:22:58] hashar: sure [14:23:10] same for the robots.txt [14:23:11] :) [14:23:16] hashar: ok [14:23:41] !log zfilipin@tin Synchronized static/images/project-logos/olowiki-1.5x.png: SWAT: [[gerrit:313658|Add 1.5 and 2x logos for olowiki (T146745)]] (duration: 00m 48s) [14:23:42] T146745: Project logos for olo.wikipedia (1x, 1.5x and 2x) - https://phabricator.wikimedia.org/T146745 [14:23:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:24:54] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313763 (https://phabricator.wikimedia.org/T146908) (owner: 10Kerberizer) [14:24:56] !log zfilipin@tin Synchronized static/images/project-logos/olowiki-2x.png: SWAT: [[gerrit:313658|Add 1.5 and 2x logos for olowiki 
(T146745)]] (duration: 00m 48s) [14:25:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:25:02] Urbanecm: 313658 live on prod, please check [14:25:02] (03CR) 10Alexandros Kosiaris: [C: 032] Remove changeprop.svc.eqiad.wmnet [dns] - 10https://gerrit.wikimedia.org/r/313590 (owner: 10Alexandros Kosiaris) [14:25:08] Okay, going to check it... [14:25:19] hashar: deploying 313763 [14:25:20] (03Merged) 10jenkins-bot: Fix an invalid empty line in the global robots.txt [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313763 (https://phabricator.wikimedia.org/T146908) (owner: 10Kerberizer) [14:26:01] !) [14:26:05] zeljkof: Working [14:26:25] (for clarify, the patch works, so everything is ok) [14:26:41] Urbanecm: great, thanks, you are done for now :) [14:26:50] Thanks for the deploys! [14:27:48] Urbanecm: thank you for the patches ;) [14:28:16] 06Operations, 10Traffic: OpenSSL 1.1 deployment for cache clusters - https://phabricator.wikimedia.org/T144523#2684429 (10BBlack) We discussed this at the offsite, and we're reading to go with OpenSSL 1.1.0b. The plan is to patch our build such that the -dev package is version-differentiated in the package ti... [14:28:26] !log zfilipin@tin Synchronized robots.txt: SWAT: [[gerrit:313763|Fix an invalid empty line in the global robots.txt (T146908)]] (duration: 00m 47s) [14:28:27] T146908: Global robots.txt contains invalid empty line - https://phabricator.wikimedia.org/T146908 [14:28:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:28:34] hashar: 313763 is live on prod, is there anything to check? [14:28:43] or should we end the swat? 
[14:30:53] 06Operations, 10Traffic, 13Patch-For-Review: Planning for phasing out non-Forward-Secret TLS ciphers - https://phabricator.wikimedia.org/T118181#2684436 (10BBlack) We've discussed (at our offiste meetings) our strategy for removing the final pair of non-forward-secret ciphers (DES-CBC3-SHA and AES128-SHA).... [14:31:50] zeljkof: na it is all good :) [14:31:57] hashar: ok [14:32:08] !log ending EU SWAT [14:32:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:33:53] hashar: zeljkof: thanks! i don't see the robots.txt change live yet, but I guess it's just a matter of server-side caching [14:34:01] (03CR) 10Alexandros Kosiaris: [C: 032] "https://puppet-compiler.wmflabs.org/4177/ says no diff for many many hosts, merging" [puppet] - 10https://gerrit.wikimedia.org/r/312984 (owner: 10Alexandros Kosiaris) [14:34:04] (03PS1) 10Marostegui: db-eqiad-php: Repool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313804 (https://phabricator.wikimedia.org/T147113) [14:34:06] (03PS2) 10Alexandros Kosiaris: Revert "Stabilize the output of stdlib's keys function" [puppet] - 10https://gerrit.wikimedia.org/r/312984 [14:34:08] (03CR) 10Alexandros Kosiaris: [V: 032] Revert "Stabilize the output of stdlib's keys function" [puppet] - 10https://gerrit.wikimedia.org/r/312984 (owner: 10Alexandros Kosiaris) [14:35:13] (03CR) 10Marostegui: [C: 032] db-eqiad-php: Repool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313804 (https://phabricator.wikimedia.org/T147113) (owner: 10Marostegui) [14:35:41] (03Merged) 10jenkins-bot: db-eqiad-php: Repool db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313804 (https://phabricator.wikimedia.org/T147113) (owner: 10Marostegui) [14:35:42] hashar: do you know when robots.txt change will be visible cc kerberizer [14:37:21] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1084 after its maintenance - T147113 (duration: 00m 48s) [14:37:26] Logged the 
message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:43:54] (03PS1) 10Alex Monk: Make check_ssl warning/critical thresholds explicit and lower them on Let's Encrypt domains [puppet] - 10https://gerrit.wikimedia.org/r/313805 (https://phabricator.wikimedia.org/T144293) [14:44:54] (03Abandoned) 10Alexandros Kosiaris: Disable puppetDB everywhere [puppet] - 10https://gerrit.wikimedia.org/r/312050 (owner: 10Alexandros Kosiaris) [14:45:19] (03CR) 10jenkins-bot: [V: 04-1] Make check_ssl warning/critical thresholds explicit and lower them on Let's Encrypt domains [puppet] - 10https://gerrit.wikimedia.org/r/313805 (https://phabricator.wikimedia.org/T144293) (owner: 10Alex Monk) [14:45:38] (03Abandoned) 10Alexandros Kosiaris: check_puppetrun: Add failed resource warning/critical levels [puppet] - 10https://gerrit.wikimedia.org/r/305505 (owner: 10Alexandros Kosiaris) [14:47:23] (03PS2) 10Alex Monk: Make check_ssl warning/critical thresholds explicit and lower them on Let's Encrypt domains [puppet] - 10https://gerrit.wikimedia.org/r/313805 (https://phabricator.wikimedia.org/T144293) [14:47:29] 06Operations, 06Labs, 10wikitech.wikimedia.org: Can't login wikitech - https://phabricator.wikimedia.org/T144805#2684464 (10jcrespo) Of those, @dpatrick is probably the right person that knows, but I am standing by in case something is needed from my side. 
[14:50:52] 06Operations, 06Discovery, 06Maps, 03Interactive-Sprint: reimage maps-test* servers - https://phabricator.wikimedia.org/T147194#2684467 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['maps-test2001.codfw.wmnet'] ``` Those hosts were successful: ``` ['maps-test2001.codfw.wmnet'] ``` [14:52:13] 06Operations, 10Traffic: Removing support for DES-CBC3-SHA TLS cipher - https://phabricator.wikimedia.org/T147199#2684468 (10BBlack) [14:53:01] !log adding volans (RCoccioli) to phab security, confirmed staff account association and membership in ops acl already, confirmed w/ riccardo he is missing, and there is a long standing agreement all members of ops should be in #security [14:53:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:55:05] (03PS1) 10Marostegui: db-eqiad.php: Increase weight db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313806 (https://phabricator.wikimedia.org/T147113) [14:56:01] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase weight db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313806 (https://phabricator.wikimedia.org/T147113) (owner: 10Marostegui) [14:56:41] (03Merged) 10jenkins-bot: db-eqiad.php: Increase weight db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313806 (https://phabricator.wikimedia.org/T147113) (owner: 10Marostegui) [14:58:11] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase weight for db1084 after its maintenance - T147113 (duration: 00m 48s) [14:58:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:01:33] 06Operations, 10Traffic: Removing support for DES-CBC3-SHA TLS cipher - https://phabricator.wikimedia.org/T147199#2684520 (10BBlack) [15:05:27] 06Operations, 10Traffic: Removing support for AES128-SHA TLS cipher - https://phabricator.wikimedia.org/T147202#2684557 (10BBlack) [15:07:32] 06Operations, 10ops-esams, 06DC-Ops, 10hardware-requests: Decomission amssq31-62 (32 
hosts) - https://phabricator.wikimedia.org/T95742#2684578 (10BBlack) [15:08:44] 06Operations, 10MediaWiki-General-or-Unknown, 06Release-Engineering-Team, 10Traffic, and 5 others: Make sure we're not relying on HTTP_PROXY headers - https://phabricator.wikimedia.org/T140658#2471564 (10BBlack) Is there more to do here on the MW-Core side of things? [15:10:16] PROBLEM - Check size of conntrack table on mendelevium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:10:27] PROBLEM - salt-minion processes on mendelevium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:12:06] (03PS1) 10Marostegui: db-eqiad.php: Increase weight db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313811 (https://phabricator.wikimedia.org/T147113) [15:12:46] RECOVERY - Check size of conntrack table on mendelevium is OK: OK: nf_conntrack is 0 % full [15:12:56] RECOVERY - salt-minion processes on mendelevium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [15:13:11] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase weight db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313811 (https://phabricator.wikimedia.org/T147113) (owner: 10Marostegui) [15:13:44] (03Merged) 10jenkins-bot: db-eqiad.php: Increase weight db1084 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313811 (https://phabricator.wikimedia.org/T147113) (owner: 10Marostegui) [15:15:07] PROBLEM - puppet last run on labstore1005 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:15:11] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase weight for db1084 after its maintenance to its original value: 500 - T147113 (duration: 00m 48s) [15:15:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:15:53] 06Operations, 10Traffic, 13Patch-For-Review: Decom bits.wikimedia.org hostname - https://phabricator.wikimedia.org/T107430#2684611 (10BBlack) The hostname's been gone for ~12 days now, so odds of revert seem low at this point. I'm going to merge up the VCL patch to kill the unused bits code there, and push... [15:18:50] (03PS2) 10BBlack: text VCL: remove bits.wm.o stuff [puppet] - 10https://gerrit.wikimedia.org/r/305535 (https://phabricator.wikimedia.org/T107430) [15:22:26] (03CR) 10BBlack: [C: 032] text VCL: remove bits.wm.o stuff [puppet] - 10https://gerrit.wikimedia.org/r/305535 (https://phabricator.wikimedia.org/T107430) (owner: 10BBlack) [15:23:08] (03PS2) 10BBlack: MW apache: remove bits.wm.o vhost [puppet] - 10https://gerrit.wikimedia.org/r/305536 (https://phabricator.wikimedia.org/T107430) [15:24:31] (03CR) 10BBlack: [C: 031] "This is ready to go and seems fairly trivial: bits.wm.o doesn't exist in DNS anymore, and our Varnishes don't support it anymore either. " [puppet] - 10https://gerrit.wikimedia.org/r/305536 (https://phabricator.wikimedia.org/T107430) (owner: 10BBlack) [15:25:18] 06Operations, 10Beta-Cluster-Infrastructure, 10Traffic, 13Patch-For-Review: Upload cache in beta is broken - https://phabricator.wikimedia.org/T147116#2684616 (10AlexMonk-WMF) 05Open>03Resolved a:03AlexMonk-WMF >>! In T147116#2684357, @BBlack wrote: > I wish :) Yeah I knew you were gonna say that.... [15:26:24] 06Operations, 10Beta-Cluster-Infrastructure, 10Traffic, 13Patch-For-Review: Upload cache in beta is broken - https://phabricator.wikimedia.org/T147116#2684621 (10BBlack) I think we can abandon the patch. 
We're assuming we're past the point of reverting to varnish3 for the upload caches at this point, just... [15:28:03] (03PS4) 10Andrew Bogott: Certcleaner: Add some logging [puppet] - 10https://gerrit.wikimedia.org/r/313578 (https://phabricator.wikimedia.org/T146303) [15:29:08] (03CR) 10Andrew Bogott: [C: 032] Certcleaner: Add some logging [puppet] - 10https://gerrit.wikimedia.org/r/313578 (https://phabricator.wikimedia.org/T146303) (owner: 10Andrew Bogott) [15:30:36] (03Abandoned) 10Alex Monk: varnish: Fix upload backend support for versions other than 4 [puppet] - 10https://gerrit.wikimedia.org/r/313668 (https://phabricator.wikimedia.org/T147116) (owner: 10Alex Monk) [15:32:34] 06Operations, 10Traffic, 13Patch-For-Review: Stop using persistent storage in our backend varnish layers. - https://phabricator.wikimedia.org/T142848#2684632 (10BBlack) 05Open>03Resolved a:03BBlack We're past this decision point now. There are issues with `file` storage in Varnish4 as well, but mitiga... [15:33:04] 06Operations, 10Traffic, 13Patch-For-Review: varnishd: Assert error in smp_oc_getobj(), storage/storage_persistent_silo.c line 417 - https://phabricator.wikimedia.org/T142810#2684636 (10BBlack) 05Open>03Resolved a:03BBlack No longer relevant (see T142848) [15:37:20] (03PS1) 10Filippo Giunchedi: prometheus: add beta-specific instance [puppet] - 10https://gerrit.wikimedia.org/r/313816 (https://phabricator.wikimedia.org/T144502) [15:38:47] (03CR) 10jenkins-bot: [V: 04-1] prometheus: add beta-specific instance [puppet] - 10https://gerrit.wikimedia.org/r/313816 (https://phabricator.wikimedia.org/T144502) (owner: 10Filippo Giunchedi) [15:40:48] RECOVERY - puppet last run on labstore1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:46:42] 06Operations, 10Traffic: Removing support for DES-CBC3-SHA TLS cipher - https://phabricator.wikimedia.org/T147199#2684679 (10BBlack) [15:46:49] (03PS2) 10Filippo Giunchedi: prometheus: add beta-specific instance 
[puppet] - 10https://gerrit.wikimedia.org/r/313816 (https://phabricator.wikimedia.org/T144502) [15:48:45] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 209, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235, 34ms) {#2648} [10Gbps wave]BR [15:49:17] 06Operations, 10Traffic: Removing support for AES128-SHA TLS cipher - https://phabricator.wikimedia.org/T147202#2684685 (10BBlack) [15:51:29] (03PS1) 10Andrew Bogott: Add a role wrapper around base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/313819 [15:52:31] (03CR) 10jenkins-bot: [V: 04-1] Add a role wrapper around base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/313819 (owner: 10Andrew Bogott) [15:55:55] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "You should explain the rationale of such a change a bit better in the commit message; as it is, it's a bit weird." [puppet] - 10https://gerrit.wikimedia.org/r/313819 (owner: 10Andrew Bogott) [15:56:18] (03PS2) 10Andrew Bogott: Add a role wrapper around base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/313819 [15:57:20] (03CR) 10Andrew Bogott: "@giuseppe -- indeed, I'm going to send an email to the ops list with a question, using this patch as an example" [puppet] - 10https://gerrit.wikimedia.org/r/313819 (owner: 10Andrew Bogott) [16:00:21] 06Operations, 07Beta-Cluster-reproducible, 15User-Joe: Update confd package - https://phabricator.wikimedia.org/T147204#2684734 (10AlexMonk-WMF) [16:03:22] 06Operations, 07Beta-Cluster-reproducible, 15User-Joe: Update confd package - https://phabricator.wikimedia.org/T147204#2684753 (10AlexMonk-WMF) Specifically version 0.11.0 or higher: https://github.com/kelseyhightower/confd/commit/27056b9389519e9f1ebf7244f2825a8e008082d6 Current version in our apt repo is 0... 
[16:04:08] (03PS1) 10Gehel: maps-test - don't pin cassandra version [puppet] - 10https://gerrit.wikimedia.org/r/313823 [16:04:09] 06Operations, 10Traffic, 06WMF-Communications, 07HTTPS, 07Security-Other: Server certificate is classified as invalid on government computers - https://phabricator.wikimedia.org/T128182#2684754 (10BBlack) I don't think there's really anything we can do here on our end, and this has been opened with no pr... [16:05:04] (03CR) 10Filippo Giunchedi: [C: 032] prometheus: add beta-specific instance [puppet] - 10https://gerrit.wikimedia.org/r/313816 (https://phabricator.wikimedia.org/T144502) (owner: 10Filippo Giunchedi) [16:06:00] 06Operations, 06Performance-Team, 10Traffic, 10Wikimedia-Stream, and 2 others: HTTPS-only for stream.wikimedia.org - https://phabricator.wikimedia.org/T140128#2684764 (10BBlack) >>! In T140128#2637840, @Dzahn wrote: >>>! In T140128#2625078, @AlexMonk-WMF wrote: >> Can you filter those access logs down to l... [16:06:10] (03PS2) 10Gehel: maps-test - don't pin cassandra version [puppet] - 10https://gerrit.wikimedia.org/r/313823 [16:07:38] (03PS3) 10Gehel: maps-test - don't pin cassandra version [puppet] - 10https://gerrit.wikimedia.org/r/313823 [16:08:20] (03PS4) 10Giuseppe Lavagetto: scap::source: also define the corresponding dsh group [puppet] - 10https://gerrit.wikimedia.org/r/306431 [16:08:22] (03PS1) 10Giuseppe Lavagetto: role::deployment::server: make explicit hiera calls [puppet] - 10https://gerrit.wikimedia.org/r/313824 [16:09:29] (03CR) 10Gehel: [C: 032] maps-test - don't pin cassandra version [puppet] - 10https://gerrit.wikimedia.org/r/313823 (owner: 10Gehel) [16:09:31] (03PS1) 10Eevans: Adding TimeWindowCompactionStrategy-2.2.5.jar via git-fat [software/cassandra-twcs] - 10https://gerrit.wikimedia.org/r/313825 (https://phabricator.wikimedia.org/T133395) [16:14:40] 06Operations, 10Traffic, 06WMF-Communications, 07HTTPS, 07Security-Other: Server certificate is classified as invalid on government computers - 
https://phabricator.wikimedia.org/T128182#2684780 (10Reedy) Canned response to send back to them, and something for them to push to their IT guys to point out th... [16:18:28] (03PS1) 10Filippo Giunchedi: prometheus: use array_concat, not concat [puppet] - 10https://gerrit.wikimedia.org/r/313826 [16:19:15] 06Operations, 10Traffic, 06WMF-Communications, 07HTTPS, 07Security-Other: Server certificate is classified as invalid on government computers - https://phabricator.wikimedia.org/T128182#2684800 (10BBlack) I think probably the best canned response we can send is something along the lines of: ```Probably... [16:19:31] (03PS1) 10BBlack: Text VCL: remove Win+Chrome/41 bug workaround [puppet] - 10https://gerrit.wikimedia.org/r/313827 (https://phabricator.wikimedia.org/T141786) [16:20:51] (03CR) 10Filippo Giunchedi: [C: 032] prometheus: use array_concat, not concat [puppet] - 10https://gerrit.wikimedia.org/r/313826 (owner: 10Filippo Giunchedi) [16:23:49] (03PS2) 10BBlack: Text VCL: remove Win+Chrome/41 bug workaround [puppet] - 10https://gerrit.wikimedia.org/r/313827 (https://phabricator.wikimedia.org/T141786) [16:23:55] (03CR) 10BBlack: [C: 032 V: 032] Text VCL: remove Win+Chrome/41 bug workaround [puppet] - 10https://gerrit.wikimedia.org/r/313827 (https://phabricator.wikimedia.org/T141786) (owner: 10BBlack) [16:25:58] 06Operations, 10ops-codfw: ms-be2009.codfw.wmnet: slot=3 dev=sdd failed - https://phabricator.wikimedia.org/T147060#2684827 (10Papaul) p:05Triage>03Normal a:03Papaul [16:26:47] 06Operations, 10ops-codfw, 06Discovery: rack/setup/deploy wdqs200[12] - https://phabricator.wikimedia.org/T142864#2684835 (10Papaul) [16:26:50] 06Operations, 10ops-codfw, 06Discovery: codfw: rack/setup/deploy wdqs200[12]switch configuration - https://phabricator.wikimedia.org/T143613#2684833 (10Papaul) 05Open>03Resolved This is done. 
[16:26:55] (03PS1) 10BBlack: Text VCL: remove synth side of Win+Chrome/41 workaround [puppet] - 10https://gerrit.wikimedia.org/r/313828 (https://phabricator.wikimedia.org/T141786) [16:33:04] 07Blocked-on-Operations, 06Operations, 10Cassandra: Update Cassandra in Wikimedia APT repository - https://phabricator.wikimedia.org/T140409#2463737 (10Gehel) maps servers are the last to be using cassandra 2.1 and I'm in the process of upgrading them. I think it is time to upload cassandra 2.2.6-wmf1 to our... [16:33:55] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 646 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 3265399 keys - replication_delay is 646 [16:37:04] 07Blocked-on-Operations, 06Operations, 10Cassandra: Update Cassandra in Wikimedia APT repository - https://phabricator.wikimedia.org/T140409#2684850 (10Gehel) Note that https://people.wikimedia.org/~eevans/debian/ already contains a wmf2 version. According to @Eevans, we should stick with wmf1 at the moment. [16:38:17] (03Abandoned) 10BBlack: openssl (1.0.2h-1~wmf5) jessie-wikimedia; urgency=medium [debs/openssl] - 10https://gerrit.wikimedia.org/r/306659 (https://phabricator.wikimedia.org/T131908) (owner: 10BBlack) [16:40:12] 06Operations, 10Traffic, 13Patch-For-Review: TLS stats regression related to Chrome/41 on Windows - https://phabricator.wikimedia.org/T141786#2684860 (10BBlack) At least in the initial few minutes after removing the workaround, there's no apparent return of the bad traffic. Will leave this for a few days to... [16:40:56] PROBLEM - puppet last run on radium is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [16:41:45] (03PS2) 10Giuseppe Lavagetto: role::deployment::server: make explicit hiera calls [puppet] - 10https://gerrit.wikimedia.org/r/313824 [16:41:47] (03PS5) 10Giuseppe Lavagetto: scap::source: also define the corresponding dsh group [puppet] - 10https://gerrit.wikimedia.org/r/306431 [16:45:04] (03PS3) 10Eevans: Extend classpath via Puppet [puppet] - 10https://gerrit.wikimedia.org/r/313619 (https://phabricator.wikimedia.org/T133395) [16:46:03] (03CR) 10Eevans: Extend classpath via Puppet (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/313619 (https://phabricator.wikimedia.org/T133395) (owner: 10Eevans) [16:48:35] _joe_: thank you for your work on the deployment server puppet mess :(( [16:48:43] (03CR) 10Eevans: "Patch 1 struck as being a little too clever, so patch 2 simply allows you to specify the individual entries to be added." [puppet] - 10https://gerrit.wikimedia.org/r/313619 (https://phabricator.wikimedia.org/T133395) (owner: 10Eevans) [16:49:10] I would really like to yank out all the trebuchet stuff that is filling up the deployment::server role and push it into a little corner for the time being. [16:50:29] (03CR) 10Eevans: "Updated PC output: http://puppet-compiler.wmflabs.org/4195" [puppet] - 10https://gerrit.wikimedia.org/r/313619 (https://phabricator.wikimedia.org/T133395) (owner: 10Eevans) [16:51:00] <_joe_> thcipriani: let me finish with this patches and things should be more clear [16:51:36] 06Operations, 05Prometheus-metrics-monitoring: upgrade to prometheus >= 1.1 - https://phabricator.wikimedia.org/T147207#2684893 (10fgiunchedi) [16:52:04] (03PS3) 10Giuseppe Lavagetto: role::deployment::server: make explicit hiera calls [puppet] - 10https://gerrit.wikimedia.org/r/313824 [16:52:04] heh, seemed like there was a plan of attack beginning to unfold. 
[16:53:30] (03CR) 10Giuseppe Lavagetto: [C: 032] role::deployment::server: make explicit hiera calls [puppet] - 10https://gerrit.wikimedia.org/r/313824 (owner: 10Giuseppe Lavagetto) [16:53:52] (03CR) 10Giuseppe Lavagetto: [V: 032] role::deployment::server: make explicit hiera calls [puppet] - 10https://gerrit.wikimedia.org/r/313824 (owner: 10Giuseppe Lavagetto) [16:59:49] (03CR) 10Giuseppe Lavagetto: "@mobrovac: the canaries will still be handled manually; please note that no file gets linked to the conftool with this change alone." [puppet] - 10https://gerrit.wikimedia.org/r/306431 (owner: 10Giuseppe Lavagetto) [17:05:57] 06Operations, 06Labs, 10Wikimedia-IRC-RC-Server, 10wikitech.wikimedia.org, 13Patch-For-Review: Enable irc feed for wikitech.wikimedia.org site - https://phabricator.wikimedia.org/T36685#2685039 (10Meno25) [17:06:16] RECOVERY - puppet last run on radium is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [17:09:18] 06Operations, 10Traffic: etcd cluster has Raft Internal errors sporadically - https://phabricator.wikimedia.org/T147209#2685056 (10BBlack) [17:10:44] 06Operations, 10Wikidata, 10Wikidata.org: Enable CORS headers for accessing wikidata.org - https://phabricator.wikimedia.org/T46994#2685085 (10Meno25) [17:16:49] !log deploying latest wdqs updater and gui [17:16:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:20:01] SMalyshev: test failing on wdqs1001, checking... [17:20:10] gehel: which one? [17:20:20] SMalyshev: all... [17:20:27] oh [17:20:32] SMalyshev: probably something stupid on my side... 
[17:21:11] gehel: blazegraph is up on wdq1 [17:21:24] 07Puppet, 10Beta-Cluster-Infrastructure: deployment-apertium01 puppet failing due to missing packages on trusty - https://phabricator.wikimedia.org/T147210#2685129 (10AlexMonk-WMF) [17:21:46] 06Operations, 10OCG-General: Tons of OCG jobs caused a massive increase in queue length - https://phabricator.wikimedia.org/T147211#2685145 (10elukey) [17:22:06] 06Operations, 10OCG-General: Tons of OCG jobs caused a massive increase in queue length - https://phabricator.wikimedia.org/T147211#2685157 (10elukey) p:05Triage>03High [17:22:36] this is for the current OCG issue --^ [17:22:51] <_joe_> high? [17:22:54] <_joe_> not really [17:23:01] SMalyshev: test script seems to receive a redirect... [17:23:24] _joe_: no? It seems not healthy to me.. [17:23:46] <_joe_> elukey: ocg is expected to work asynchronously [17:24:00] I got it but 30k jobs in the queue? [17:24:03] <_joe_> we should mainly inspect what is causing this batch of renderings [17:24:19] <_joe_> elukey: are jobs being processed? [17:24:31] <_joe_> if so, the only thing we should do is check the origin of such jobs [17:25:05] from what I can see yes, but it is not super clear to me how to find the source of these jobs :) [17:25:05] SMalyshev: manual testing looks good, deploying on other nodes [17:25:13] (03CR) 10Mobrovac: "Will the jars contained in a dir be picked up as well?" [puppet] - 10https://gerrit.wikimedia.org/r/313619 (https://phabricator.wikimedia.org/T133395) (owner: 10Eevans) [17:25:15] but I am following up in #wikimedia-parsoid [17:26:48] <_joe_> I thought the collections extension would make a direct call to ocg [17:27:32] 06Operations, 10OCG-General: Tons of OCG jobs caused a massive increase in queue length - https://phabricator.wikimedia.org/T147211#2685172 (10elukey) Related also to https://phabricator.wikimedia.org/T97524 [17:27:33] SMalyshev: deployment complete, tests look good [17:27:45] gehel: ok, cool!
[17:29:00] _joe_ I have no idea about the extension, just following https://wikitech.wikimedia.org/wiki/OCG#When_something_goes_wrong :) [17:30:32] anyhow, fine to me if we want to drop the task to normal or low, I thought it was important but I don't have a lot of experience with the service.. Will double check tomorrow eu morning [17:39:48] 07Puppet, 10Beta-Cluster-Infrastructure: deployment-apertium01 puppet failing due to missing packages on trusty - https://phabricator.wikimedia.org/T147210#2685207 (10KartikMistry) This is happening due to cherry-pick of https://gerrit.wikimedia.org/r/#/c/308679/ which is for testing before deployment in Produ... [17:53:11] bblack: yt? [17:53:41] PROBLEM - puppet last run on mw1230 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:01:53] who's doing SWAT deploys today? [18:02:17] no jouncebot [18:03:16] oh noes! [18:03:17] nuria_: somewhat, yes [18:04:08] bblack: could you take a look at this change that will return 404 for many of the requests for which we are returning 200 now? [18:04:34] jouncebot: now [18:04:34] For the next 0 hour(s) and 55 minute(s): Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161003T1800) [18:04:45] bblack: https://gerrit.wikimedia.org/r/#/c/312561/ [18:04:47] bblack: it is voted -1 until you look at it, just as an FYI, it corrects a bug on mediawiki [18:05:31] RoanKattouw, kaldari, SMalyshev -- ping for swat [18:05:48] I'm here [18:05:49] bd808: Pong [18:05:50] nuria_: yeah seems fine by me [18:06:10] Looks like RoanKattouw has the most patches. Does he have time to run the swat? 
[18:06:21] Sure I can do it [18:06:31] excellent [18:06:35] Thanks for giving me a good excuse to not go outside [18:06:41] bblack: ok, I *think* (this is a theory) that we are going to see the effect on varnish when we launch it (as an increase on non cached requests) but that those requests will stop happening soon after [18:06:41] Last time I checked it rained [18:06:48] (In October! In SF! That's like a month too early) [18:07:01] bblack: want to +2? [18:07:14] * bd808 helps keep RoanKattouw in a dark room and chained to a laptop [18:07:27] nuria_: we do cache 404s, but only for 10 minutes (which is enough to avoid serious miss-rate problems with specific URLs, but low enough that cached 404s don't unduly impact the timeline for creating new pages or turning on new wiki hostnames, etc) [18:07:49] (03CR) 10Catrope: [C: 032] Deploying PageAssessments to English Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312869 (https://phabricator.wikimedia.org/T146679) (owner: 10Kaldari) [18:07:53] (03PS3) 10Catrope: Deploying PageAssessments to English Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312869 (https://phabricator.wikimedia.org/T146679) (owner: 10Kaldari) [18:07:57] (03CR) 10Catrope: Deploying PageAssessments to English Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312869 (https://phabricator.wikimedia.org/T146679) (owner: 10Kaldari) [18:07:58] bblack: ah, even better then [18:08:01] (03CR) 10Catrope: [C: 032] Deploying PageAssessments to English Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312869 (https://phabricator.wikimedia.org/T146679) (owner: 10Kaldari) [18:08:31] (03Merged) 10jenkins-bot: Deploying PageAssessments to English Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312869 (https://phabricator.wikimedia.org/T146679) (owner: 10Kaldari) [18:08:34] bblack: we will see then a drop in request ratios because iam guessing whoever is using us as some kind of keep live 
with those urls will stop sending requests when we return 4040 [18:08:34] nuria_: I can, with the caveat that I don't speak php and I'm not reviewing the code itself :) [18:08:37] *404 [18:08:55] 4oh4 [18:08:57] bblack: jaja, neither do i, but others looked at the cocde [18:09:01] *code [18:09:24] bblack: we are just trying to anticipate its effect [18:09:31] actually, I might not even have +2 on that repo, I really don't know [18:09:59] bblack: If you +1 that I'll +2 [18:10:03] I did [18:10:11] That patch is in a widely accepted "if Brandon blesses it it will happen" state [18:10:13] OK cool [18:10:26] * Reedy pokes RoanKattouw with a jfdi stick [18:10:41] k [18:10:49] Done [18:11:55] kaldari: Your PageAssessments patch is on mw1099, please test [18:12:09] testing... [18:14:26] RoanKattouw: Seems to be good [18:14:41] OK, pushing out to all servers [18:15:15] SMalyshev: You around for SWAT? [18:15:23] RoanKattouw: yup [18:15:47] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Enable PageAssessments on enwiki (T146679) (duration: 00m 49s) [18:15:48] T146679: Deploy PageAssessments to English Wikipedia - https://phabricator.wikimedia.org/T146679 [18:15:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:15:59] SMalyshev: Did you and bd808 coordinate to ensure your patches don't conflict with each other? 
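The 404 caching behaviour bblack describes at 18:07:27 (404s are cached, but only for 10 minutes) can be sketched as a small negative cache. This is a rough Python illustration of the idea, not the actual Varnish VCL: the class name, the `fetch` callback, and the injectable clock are all assumptions made for the example. Only the 10-minute figure comes from the discussion.

```python
import time
from typing import Callable, Dict, Tuple

# "only for 10 minutes", per the discussion; everything else here is assumed.
NOT_FOUND_TTL_SECONDS = 600


class NotFoundCache:
    """Caches 404 answers briefly; every other status always hits the backend."""

    def __init__(
        self,
        fetch: Callable[[str], int],
        clock: Callable[[], float] = time.monotonic,
        ttl: float = NOT_FOUND_TTL_SECONDS,
    ) -> None:
        self._fetch = fetch          # hypothetical backend call: url -> HTTP status
        self._clock = clock          # injectable so expiry can be tested
        self._ttl = ttl
        self._expires_at: Dict[str, float] = {}

    def get(self, url: str) -> Tuple[int, bool]:
        """Return (status, served_from_cache)."""
        now = self._clock()
        expiry = self._expires_at.get(url)
        if expiry is not None and now < expiry:
            # Still inside the short window: answer 404 without a backend hit.
            return 404, True
        # Expired or never cached: drop any stale entry and ask the backend.
        self._expires_at.pop(url, None)
        status = self._fetch(url)
        if status == 404:
            self._expires_at[url] = now + self._ttl
        return status, False
```

The short TTL is the trade-off raised in the chat: repeated requests for a missing URL stop reaching the backend, while a page created in the meantime becomes visible at most 10 minutes later.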
[18:16:11] One is a follow-up to the change that the other is a partial revert of [18:16:22] RoanKattouw: I don't think they are given both are written by bd808 [18:16:23] I wrote both of them, they don't conflict [18:16:28] I am just cherry-picking :) [18:16:34] lol good point [18:16:36] Did not see that [18:16:37] I think SMalyshev's should go out first [18:16:43] OK will do [18:18:05] (03PS1) 10BBlack: cache_upload: jemalloc chunk size: s/1MB/128KB/ [puppet] - 10https://gerrit.wikimedia.org/r/313847 [18:19:06] RECOVERY - puppet last run on mw1230 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:20:55] (03CR) 10Catrope: [C: 032] Enable Flow beta feature on elwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312537 (https://phabricator.wikimedia.org/T144384) (owner: 10Catrope) [18:21:00] (03CR) 10Catrope: Enable Flow beta feature on elwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312537 (https://phabricator.wikimedia.org/T144384) (owner: 10Catrope) [18:21:04] (03PS2) 10Catrope: Enable Flow beta feature on elwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312537 (https://phabricator.wikimedia.org/T144384) [18:21:08] (03CR) 10Catrope: [C: 032] Enable Flow beta feature on elwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312537 (https://phabricator.wikimedia.org/T144384) (owner: 10Catrope) [18:21:34] (03Merged) 10jenkins-bot: Enable Flow beta feature on elwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312537 (https://phabricator.wikimedia.org/T144384) (owner: 10Catrope) [18:22:17] (03CR) 10Eevans: "> Will the jars contained in a dir be picked up as well?" 
[puppet] - 10https://gerrit.wikimedia.org/r/313619 (https://phabricator.wikimedia.org/T133395) (owner: 10Eevans) [18:24:53] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Enable Flow beta feature on elwiki (T144384) (duration: 00m 49s) [18:24:55] T144384: Enable Flow as a Beta feature in Greek Wikipedia (elwiki) - https://phabricator.wikimedia.org/T144384 [18:24:58] (03PS3) 10Catrope: Use === for $wgDBname comparison [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309226 (owner: 10Dereckson) [18:24:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:25:03] (03CR) 10Catrope: [C: 032] Use === for $wgDBname comparison [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309226 (owner: 10Dereckson) [18:25:30] (03Merged) 10jenkins-bot: Use === for $wgDBname comparison [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309226 (owner: 10Dereckson) [18:30:23] bd808: OK SMalyshev's patch is now on mw1099. Apologies for the delay, someone had put the wmf.20 dir in a weird state because they were confused about the deployment freeze [18:30:55] is it swat now? [18:31:06] * aude eager to see if my talk page works again :) [18:31:06] yes [18:31:12] aude: Yes, has been for half an hour but the going has been slow [18:31:14] RoanKattouw: thanks. SMalyshev can you test that one on mw1099? I'm not sure how to reproduce your failure that it is supposed to fix. [18:31:24] :/ [18:31:36] bd808: Do you also want me to do the other patch or should we do that one after Stas's? [18:31:48] bd808: I still see the error with the x-wikimedia-debug extension :( [18:32:07] not sure whether I do something wrong or fix is not enough [18:32:12] SMalyshev: yuck. but it was fixed when you tested locally? [18:32:15] it worked for me locally.... [18:32:48] server:mw1274.eqiad.wmnet [18:32:50] hmm [18:32:56] looks like the ext is not really working [18:33:01] let me try manually [18:33:19] SMalyshev: is there some page to view? 
[18:33:22] to test [18:33:49] bd808: with curl it works [18:33:54] produces 404 as it should [18:34:12] cool. maybe just some bug in the browser extension for you then [18:34:31] btw, i have https://meta.wikimedia.org/wiki/User:Aude/global.js to display the server and response time [18:34:32] yes [18:34:42] !log catrope@tin Synchronized wmf-config/CommonSettings.php: Use === for $wgDBname comparisons (duration: 01m 53s) [18:34:43] bd808, RoanKattouw: so it works [18:34:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:34:58] OK pushing that out to all boxes now [18:36:06] (03PS4) 10Catrope: Set $wgDefaultExternalStore for wikitech before Flow settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309225 (https://phabricator.wikimedia.org/T127792) (owner: 10Dereckson) [18:36:21] aude: yes, e.g. https://www.wikidata.org/wiki/Special:EntityData/Q19369930.ttl [18:36:23] !log catrope@tin Synchronized php-1.28.0-wmf.20/includes/exception/MWExceptionHandler.php: Restore delegation to MWException::report (T147098) (duration: 00m 48s) [18:36:24] T147098: Exception when getting TTL export of a deleted entity - https://phabricator.wikimedia.org/T147098 [18:36:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:36:42] aude: in general, any export of a deleted entity produced 500, should be 404 instead [18:36:42] looks good [18:36:57] and now it's 404. yay! [18:37:01] (03CR) 10Catrope: [C: 032] Set $wgDefaultExternalStore for wikitech before Flow settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309225 (https://phabricator.wikimedia.org/T127792) (owner: 10Dereckson) [18:37:25] (03Merged) 10jenkins-bot: Set $wgDefaultExternalStore for wikitech before Flow settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309225 (https://phabricator.wikimedia.org/T127792) (owner: 10Dereckson) [18:37:46] now I have to extract deletes from the rc log and update them... 
[18:39:02] (03PS1) 10BryanDavis: Only log on_join for our own nick [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/313857 [18:39:49] !log catrope@tin Synchronized wmf-config/CommonSettings.php: Set $wgDefaultExternalStore for wikitech before Flow settings (T127792) (duration: 01m 04s) [18:39:50] T127792: Enable Flow on wikitech (labswiki and labtestwiki), then turn on for Tool talk namespace - https://phabricator.wikimedia.org/T127792 [18:39:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:44:39] bd808: Your "restore prior render logic" is now on mw1099, please test [18:44:50] (And that's the last patch, yay) [18:45:35] RoanKattouw: Looks good. I'm getting the "Invalid value was provided for loading Flow content." message now instead of just the cryptic fatal. [18:45:51] bd808: Interesting, what's your test URL? [18:45:57] https://www.wikidata.org/wiki/User_talk:BDavis_(WMF) [18:45:59] * RoanKattouw did not realize this was for a Flow thing [18:46:19] aha, I see [18:46:27] it's flow related, not exclusively a flow problem [18:46:29] Cool, pushing that out to all boxes [18:46:44] Yeah, I think I understand, Flow is trying to do custom exception rendering and it wasn't working? [18:47:12] Invalid value was provided for loading Flow content [18:47:24] yeah. 
Flow's exception stack piddles strangely with MWException::report() [18:47:32] i think it can't find a revision [18:47:47] there was a related error in the exception log [18:47:56] aude: T138310 [18:47:56] !log catrope@tin Synchronized php-1.28.0-wmf.20/includes/exception/MWExceptionHandler.php: Restore prior render() logic (T147122) (duration: 00m 48s) [18:47:57] T138310: Flow as a Beta feature: enable, disable and reenable doesn't seem to work - https://phabricator.wikimedia.org/T138310 [18:47:57] T147122: Flow\Exception\InvalidInputException displayed after enable/disable Flow talk page beta feature - https://phabricator.wikimedia.org/T147122 [18:48:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:48:12] aude: Yeah, that URL is known to throw an exception, we were testing a change in the exception renderer :) [18:48:13] https://phabricator.wikimedia.org/T147122#2682014 [18:48:30] yeah :) [18:48:58] OK, and with that deployed, SWAT is now done :) [18:49:06] thanks RoanKattouw [18:49:11] * RoanKattouw checks if it's still raining outside [18:50:06] Looks like it stopped. Laziness FTW [18:53:12] 07Blocked-on-Operations, 06Operations, 10Cassandra: Update Cassandra in Wikimedia APT repository - https://phabricator.wikimedia.org/T140409#2685594 (10Gehel) 05Open>03Resolved a:03Gehel 2.2.6-wmf1 packages have been uploaded to our apt repo, closing this. [18:56:13] !log cleared rapidly-growing OCG queue w/ mw-ocg-service/scripts/clear-queue.js to cope with someone trying to render all of enwiktionary to PDF. [18:56:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:57:34] cscott: did you get an alert about that or? 
[18:57:47] https://phabricator.wikimedia.org/T147211 [18:59:01] i'm going to have to do something else, since whoever is running the enwiktionary spider is still at it [18:59:44] they are up to "unadventurous" [19:00:02] ideas welcome about how to rate limit that [19:01:23] cscott: ty [19:01:31] they're going alphabetically? [19:01:38] 06Operations, 10OCG-General: Tons of OCG jobs caused a massive increase in queue length - https://phabricator.wikimedia.org/T147211#2685634 (10cscott) Someone is trying to render all of enwiktionary to PDF. At the moment they are up to "unadventurous". ``` cscott@ocg1001:/srv/deployment/ocg/ocg$ tail -f /var/... [19:03:36] 06Operations, 10OCG-General: Tons of OCG jobs caused a massive increase in queue length - https://phabricator.wikimedia.org/T147211#2685636 (10cscott) I cleared the queue to make things a little more responsive for others: ``` cscott@ocg1001:/srv/deployment/ocg/ocg$ sudo -u ocg -g ocg nodejs-ocg mw-ocg-service... [19:03:40] Krenair: apparently. [19:05:14] 06Operations, 10OCG-General: ocg alarm ocg_job_status_queue 'flapping' - https://phabricator.wikimedia.org/T97524#2685652 (10cscott) Today there was a huge load spike as someone tried to pull all of enwiktionary through OCG. See T147211. It got up to 40K pending jobs in the load queue before I started cleari... [19:07:06] PROBLEM - puppet last run on elastic1031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:10:34] should i just patch ocg to reject jobs from enwiktionary temporarily? 
[19:11:26] PROBLEM - OCG health on ocg1002 is CRITICAL: CRITICAL: ocg_job_status 1102026 msg (=800000 warning): ocg_render_job_queue 3025 msg (=3000 critical) [19:11:45] PROBLEM - OCG health on ocg1003 is CRITICAL: CRITICAL: ocg_job_status 1102105 msg (=800000 warning): ocg_render_job_queue 3018 msg (=3000 critical) [19:13:08] PROBLEM - OCG health on ocg1001 is CRITICAL: CRITICAL: ocg_job_status 1102687 msg (=800000 warning): ocg_render_job_queue 3189 msg (=3000 critical) [19:13:35] PROBLEM - puppet last run on db1036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:15:09] 06Operations, 10OCG-General: Tons of OCG jobs caused a massive increase in queue length - https://phabricator.wikimedia.org/T147211#2685673 (10cscott) @elukey said: ``` (12:53:51 PM) elukey: cscott: one thing that was noted from https://grafana.wikimedia.org/dashboard/db/eventbus is that EventBus (and ChangePr... [19:24:11] (03CR) 10Alex Monk: [C: 032] Only log on_join for our own nick [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/313857 (owner: 10BryanDavis) [19:25:27] (03PS1) 10Jdlrobson: Enable Wikidata descriptions on Japanese and Spanish Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313869 (https://phabricator.wikimedia.org/T145786) [19:27:07] (03Merged) 10jenkins-bot: Only log on_join for our own nick [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/313857 (owner: 10BryanDavis) [19:28:17] (03PS1) 10Catrope: Disable wmgEchoFooterNotice [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313872 [19:32:35] RECOVERY - puppet last run on elastic1031 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:39:15] RECOVERY - puppet last run on db1036 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [19:46:14] (03PS3) 10Gilles: Upgrade to 0.1.22 [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/313781 [19:47:04] jouncebot: next 
[19:47:04] In 0 hour(s) and 12 minute(s): Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161003T2000) [19:47:23] (03PS4) 10Gilles: Upgrade to 0.1.22 [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/313781 [19:48:23] !log cleared OCG queue again, while I work on a blacklist patch for the OCG frontend [19:48:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:58:09] (03PS1) 10Jcrespo: Depool db1091 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313879 (https://phabricator.wikimedia.org/T147113) [20:00:04] gwicke, cscott, arlolra, subbu, bearND, mdholloway, halfak, Amir1, and yurik: Respected human, time to deploy Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161003T2000). Please do the needful. [20:04:04] I need to deploy to mediawiki config: https://gerrit.wikimedia.org/r/#/c/313879/1 is this an issue? [20:08:29] (03CR) 10Jcrespo: [C: 032] Depool db1091 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313879 (https://phabricator.wikimedia.org/T147113) (owner: 10Jcrespo) [20:11:54] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1091 (duration: 00m 48s) [20:12:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:16:20] (03CR) 10Mobrovac: [C: 031] "Yup, looked it up :) And totally saner than letting Puppet do it by itself." 
[puppet] - 10https://gerrit.wikimedia.org/r/313619 (https://phabricator.wikimedia.org/T133395) (owner: 10Eevans) [20:26:54] as part of the service window, i will deploy graphoid [20:28:25] !log about to deploy graphoid update - https://gerrit.wikimedia.org/r/#/c/313887/ [20:28:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:30:08] (03PS3) 10Jcrespo: phabricator: Enable innodb_buffer_pool_load_at_startup and innodb_buffer_pool_dump_at_shutdown [puppet] - 10https://gerrit.wikimedia.org/r/313240 (owner: 10Paladox) [20:30:23] (03CR) 10Jcrespo: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/313240 (owner: 10Paladox) [20:31:43] (03CR) 10Jcrespo: [C: 032] phabricator: Enable innodb_buffer_pool_load_at_startup and innodb_buffer_pool_dump_at_shutdown [puppet] - 10https://gerrit.wikimedia.org/r/313240 (owner: 10Paladox) [20:32:30] !log deployed graphoid update - https://gerrit.wikimedia.org/r/#/c/313887/ [20:32:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:33:16] guess am I in time for swat? [20:34:26] (03PS2) 10Eevans: [WIP]: Cassandra TWCS deploy repository [software/cassandra-twcs] - 10https://gerrit.wikimedia.org/r/313825 (https://phabricator.wikimedia.org/T133395) [20:36:48] gerrit:313162 says merge conflict, although it's trivial change on noc docroot [20:38:28] (03CR) 10Paladox: "Thanks." 
[puppet] - 10https://gerrit.wikimedia.org/r/313240 (owner: 10Paladox) [20:40:34] PROBLEM - OCG health on ocg1001 is CRITICAL: CRITICAL: ocg_job_status 1105324 msg (=800000 warning): ocg_render_job_queue 3034 msg (=3000 critical) [20:41:23] PROBLEM - OCG health on ocg1002 is CRITICAL: CRITICAL: ocg_job_status 1105603 msg (=800000 warning): ocg_render_job_queue 3072 msg (=3000 critical) [20:41:57] PROBLEM - OCG health on ocg1003 is CRITICAL: CRITICAL: ocg_job_status 1105753 msg (=800000 warning): ocg_render_job_queue 3082 msg (=3000 critical) [20:44:11] (03PS2) 10Jcrespo: labsdb1002: remove from dhcp install server config [puppet] - 10https://gerrit.wikimedia.org/r/312528 (https://phabricator.wikimedia.org/T146455) [20:47:56] (03PS1) 10Eevans: Enable cassandra/twcs deploy repository [puppet] - 10https://gerrit.wikimedia.org/r/313892 (https://phabricator.wikimedia.org/T133395) [20:48:25] (03CR) 10Jcrespo: [C: 032] labsdb1002: remove from dhcp install server config [puppet] - 10https://gerrit.wikimedia.org/r/312528 (https://phabricator.wikimedia.org/T146455) (owner: 10Jcrespo) [20:49:48] (03CR) 10Jforrester: [C: 031] "Product sign-off." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/312735 (https://phabricator.wikimedia.org/T146417) (owner: 10Dereckson) [20:53:13] !log starting mobileapps deploy [20:53:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:55:24] (03PS1) 10Jcrespo: wmnet: remove labsdb1002.eqiad.wmnet [dns] - 10https://gerrit.wikimedia.org/r/313894 (https://phabricator.wikimedia.org/T146455) [20:56:19] (03CR) 10Mobrovac: [C: 04-1] "If you intend to deploy it via scap3, this definition should go into hieradata/common/scap/server.yaml" [puppet] - 10https://gerrit.wikimedia.org/r/313892 (https://phabricator.wikimedia.org/T133395) (owner: 10Eevans) [20:57:40] !log deployed mobileapps 17bc059 [20:57:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:59:55] (03PS1) 10Jcrespo: toolschecker: remove all references to labsdb1002 [puppet] - 10https://gerrit.wikimedia.org/r/313896 (https://phabricator.wikimedia.org/T146455) [21:00:04] dapatrick and bawolff: Dear anthropoid, the time has come. Please deploy Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161003T2100). 
[21:01:45] !log disabling puppet on labsdb1002 and shutting it down for decommission [21:01:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:04:57] (03PS1) 10Jdlrobson: Push footer version 2 to stable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313898 (https://phabricator.wikimedia.org/T145442) [21:05:24] (03CR) 10Jdlrobson: [C: 04-1] "Blocked on T144579" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313898 (https://phabricator.wikimedia.org/T145442) (owner: 10Jdlrobson) [21:05:29] (03CR) 10Eevans: "> If you intend to deploy it via scap3, this definition should go" [puppet] - 10https://gerrit.wikimedia.org/r/313892 (https://phabricator.wikimedia.org/T133395) (owner: 10Eevans) [21:06:49] (03PS1) 10Gilles: Separate Thumbor 404s into their own log [puppet] - 10https://gerrit.wikimedia.org/r/313899 [21:07:15] (03CR) 10Mobrovac: [C: 04-1] [WIP]: Cassandra TWCS deploy repository (032 comments) [software/cassandra-twcs] - 10https://gerrit.wikimedia.org/r/313825 (https://phabricator.wikimedia.org/T133395) (owner: 10Eevans) [21:07:43] (03PS2) 10Gilles: Separate Thumbor 404s into their own log [puppet] - 10https://gerrit.wikimedia.org/r/313899 [21:08:27] 06Operations, 06Performance-Team, 10Thumbor: Separate 404s into their own log - https://phabricator.wikimedia.org/T145632#2686061 (10Gilles) [21:11:56] (03PS1) 10Andrew Bogott: Remove beta::deployaccess as it's no longer needed. [puppet] - 10https://gerrit.wikimedia.org/r/313903 (https://phabricator.wikimedia.org/T121721) [21:11:59] (03PS1) 10Andrew Bogott: Add role::beta::autoupdater [puppet] - 10https://gerrit.wikimedia.org/r/313904 (https://phabricator.wikimedia.org/T147233) [21:12:29] (03CR) 10Andrew Bogott: "Of course references in ldap need to be removed before this is merged." 
[puppet] - 10https://gerrit.wikimedia.org/r/313903 (https://phabricator.wikimedia.org/T121721) (owner: 10Andrew Bogott) [21:14:33] 06Operations, 06Collaboration-Team-Triage, 10Flow, 10MediaWiki-Redirects, and 2 others: Flow notification links on mobile point to desktop - https://phabricator.wikimedia.org/T107108#2686087 (10Catrope) [21:14:57] (03CR) 10jenkins-bot: [V: 04-1] Add role::beta::autoupdater [puppet] - 10https://gerrit.wikimedia.org/r/313904 (https://phabricator.wikimedia.org/T147233) (owner: 10Andrew Bogott) [21:16:15] (03PS1) 10Jdlrobson: Enable RelatedArticles on Minerva skin for all but top 6 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313906 (https://phabricator.wikimedia.org/T144812) [21:16:19] (03CR) 10Mobrovac: "Ups, sorry, 9ee97710b78278b807c82e8c75bb14869befc267 moved it to hieradata/role/common/deployment/server.yaml . Pretty confusing, I know." [puppet] - 10https://gerrit.wikimedia.org/r/313892 (https://phabricator.wikimedia.org/T133395) (owner: 10Eevans) [21:16:42] (03CR) 10Jdlrobson: [C: 04-1] "Blocked on parent patch" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313906 (https://phabricator.wikimedia.org/T144812) (owner: 10Jdlrobson) [21:21:15] 06Operations, 10ops-eqiad, 13Patch-For-Review: Decommission labsdb1002 - https://phabricator.wikimedia.org/T146455#2686106 (10jcrespo) I cannot access the serial interface. Please @cmjohnson make sure labsdb1002 is down next time you go to the datacenter- I no longer can connect to the host but it still resp... 
[21:24:16] (03PS3) 10Eevans: [WIP]: Cassandra TWCS deploy repository [software/cassandra-twcs] - 10https://gerrit.wikimedia.org/r/313825 (https://phabricator.wikimedia.org/T133395) [21:26:23] (03PS2) 10Andrew Bogott: Add role::beta::autoupdater [puppet] - 10https://gerrit.wikimedia.org/r/313904 (https://phabricator.wikimedia.org/T147233) [21:26:42] (03PS4) 10Eevans: [WIP]: Cassandra TWCS deploy repository [software/cassandra-twcs] - 10https://gerrit.wikimedia.org/r/313825 (https://phabricator.wikimedia.org/T133395) [21:31:09] (03CR) 10Eevans: "Any suggestions for the user values in `scap.cfg`?" (032 comments) [software/cassandra-twcs] - 10https://gerrit.wikimedia.org/r/313825 (https://phabricator.wikimedia.org/T133395) (owner: 10Eevans) [21:38:16] (03PS2) 10Eevans: Enable cassandra/twcs deploy repository [puppet] - 10https://gerrit.wikimedia.org/r/313892 (https://phabricator.wikimedia.org/T133395) [21:38:58] (03PS1) 10Jcrespo: mariadb: change phabricator's stopwords and init's owner to root [puppet] - 10https://gerrit.wikimedia.org/r/313918 [21:39:17] (03CR) 1020after4: [C: 031] "Dooo it" [puppet] - 10https://gerrit.wikimedia.org/r/313903 (https://phabricator.wikimedia.org/T121721) (owner: 10Andrew Bogott) [21:40:22] (03PS2) 10Andrew Bogott: Remove beta::deployaccess as it's no longer needed. 
[puppet] - 10https://gerrit.wikimedia.org/r/313903 (https://phabricator.wikimedia.org/T121721) [21:40:22] 06Operations, 10Traffic, 13Patch-For-Review, 05codfw-rollout: Varnish support for active:active backend services - https://phabricator.wikimedia.org/T134404#2686167 (10BBlack) [21:40:26] 06Operations, 06Services, 10Traffic, 13Patch-For-Review: Declarative configuration for varnish services and backends - https://phabricator.wikimedia.org/T110717#2686169 (10BBlack) [21:41:03] (03CR) 10Eevans: "> Ups, sorry, 9ee97710b78278b807c82e8c75bb14869befc267 moved it to" [puppet] - 10https://gerrit.wikimedia.org/r/313892 (https://phabricator.wikimedia.org/T133395) (owner: 10Eevans) [21:41:50] (03PS3) 10Eevans: Enable cassandra/twcs deploy repository [puppet] - 10https://gerrit.wikimedia.org/r/313892 (https://phabricator.wikimedia.org/T133395) [21:42:18] (03CR) 10Andrew Bogott: [C: 032] Remove beta::deployaccess as it's no longer needed. [puppet] - 10https://gerrit.wikimedia.org/r/313903 (https://phabricator.wikimedia.org/T121721) (owner: 10Andrew Bogott) [21:45:23] (03PS1) 10Jcrespo: Repool db1091 with low weight after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313919 (https://phabricator.wikimedia.org/T147113) [21:46:31] (03CR) 10Jcrespo: [C: 032] Repool db1091 with low weight after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313919 (https://phabricator.wikimedia.org/T147113) (owner: 10Jcrespo) [21:47:33] !log starting OCG deploy [21:47:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:50:16] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1091 with low weight after maintenance (duration: 00m 50s) [21:50:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:50:46] !log updated OCG to version 0bf27e3452dfdc770317f15793e93e6e89c7865a (T147211, T144120) [21:50:48] T147211: Tons of OCG jobs caused a massive increase in queue length - 
https://phabricator.wikimedia.org/T147211 [21:50:49] T144120: "Generation of the document file has failed. Status: Rendering process died with non zero code: 1" for https://en.wikipedia.org/wiki/User:DennisDaniels/Books/Medical_Laboratory_Basics - https://phabricator.wikimedia.org/T144120 [21:50:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:52:09] (03PS1) 10Andrew Bogott: Revert "Remove beta::deployaccess as it's no longer needed." [puppet] - 10https://gerrit.wikimedia.org/r/313921 [21:53:45] (03CR) 10Andrew Bogott: [C: 032] Revert "Remove beta::deployaccess as it's no longer needed." [puppet] - 10https://gerrit.wikimedia.org/r/313921 (owner: 10Andrew Bogott) [21:54:05] !log OCG deploy temporarily disabled PDF render on en.wiktionary.org to combat DoS. [21:54:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:54:39] PROBLEM - puppet last run on db1026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [21:57:48] (03PS1) 10Jcrespo: Repool db1091 with regular weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313923 (https://phabricator.wikimedia.org/T147113) [22:00:37] 06Operations, 10Traffic, 13Patch-For-Review, 05codfw-rollout: Varnish support for active:active backend services - https://phabricator.wikimedia.org/T134404#2686270 (10BBlack) I've merged the "declarative config" ticket to here, it's worth perusing the older comments/commits there at T110717. The rational... [22:01:42] (03PS2) 10Jcrespo: mariadb: change phabricator's stopwords and init's owner to root [puppet] - 10https://gerrit.wikimedia.org/r/313918 [22:02:24] (03CR) 10Mobrovac: [C: 031] "> are you suggesting that I *not* have this merged, and instead have someone in Ops clone it on tin? Or...?" 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/313892 (https://phabricator.wikimedia.org/T133395) (owner: 10Eevans) [22:02:33] (03CR) 10Mobrovac: Enable cassandra/twcs deploy repository [puppet] - 10https://gerrit.wikimedia.org/r/313892 (https://phabricator.wikimedia.org/T133395) (owner: 10Eevans) [22:02:36] (03CR) 10Jcrespo: [C: 032] mariadb: change phabricator's stopwords and init's owner to root [puppet] - 10https://gerrit.wikimedia.org/r/313918 (owner: 10Jcrespo) [22:08:55] !log running schema change (innodb conversion) on phabricator db hosts T146673 [22:08:56] T146673: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673 [22:09:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:11:11] 06Operations, 10OCG-General, 13Patch-For-Review: Tons of OCG jobs caused a massive increase in queue length - https://phabricator.wikimedia.org/T147211#2686306 (10cscott) ok, deployed a patch to blacklist en.wiktionary.org for the time being, rejecting jobs in the front end. when the front end error rate on... 
[22:20:39] RECOVERY - puppet last run on db1026 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [22:24:52] (03CR) 10Jcrespo: [C: 032] Repool db1091 with regular weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313923 (https://phabricator.wikimedia.org/T147113) (owner: 10Jcrespo) [22:26:52] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1091 with regular weight (duration: 00m 51s) [22:26:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:30:06] (03PS1) 10Thcipriani: Beta: Move beta::deployaccess to role::beta::deployaccess [puppet] - 10https://gerrit.wikimedia.org/r/313927 [22:33:37] (03CR) 10Andrew Bogott: [C: 032] Beta: Move beta::deployaccess to role::beta::deployaccess [puppet] - 10https://gerrit.wikimedia.org/r/313927 (owner: 10Thcipriani) [22:55:02] AaronSchulz: could I get you to take a look at beta? Looks like https://gerrit.wikimedia.org/r/#/c/310757/ may be causing some issues, causing a bunch of fatals in https://logstash-beta.wmflabs.org [23:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161003T2300). Please do the needful. [23:00:04] Jdlrobson, James_F, dcausse, and matt_flaschen: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [23:00:56] Present [23:01:02] o/ [23:01:52] * James_F waves. [23:02:43] Who's doing it? [23:04:58] uhh, I suppose I'm around if no one else is... [23:05:23] I will SWAT! [23:05:52] Cool. [23:06:06] thcipriani: Note that mine is a combined /vendor and /core update, sorry.\ [23:06:18] oh boy. [23:06:34] All for one line of PHP, but… [23:06:40] I know there are notes about that somewhere so I don't have to think too hard... 
[23:09:50] James_F: ok...now how do I sync this :) [23:10:10] thcipriani: /vendor then resources/libs/oojs-ui I think. [23:10:32] (Push to 1099 first though!) [23:10:37] :D [23:10:52] good looking out :) [23:11:35] (03PS2) 10Thcipriani: Reduce number of replicas for titlesuggest indices [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313800 (https://phabricator.wikimedia.org/T147192) (owner: 10DCausse) [23:11:53] while I wait on zuul, I'll get dcausse 's done. [23:12:05] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313800 (https://phabricator.wikimedia.org/T147192) (owner: 10DCausse) [23:12:19] thcipriani: mine is a noop, cluster settings are already updated [23:12:32] (03Merged) 10jenkins-bot: Reduce number of replicas for titlesuggest indices [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313800 (https://phabricator.wikimedia.org/T147192) (owner: 10DCausse) [23:13:19] dcausse: so nothing to check on mw1099? (it's there now) [23:13:37] thcipriani: no, nothing to test :) [23:13:47] okie doke, syncing everywhere [23:15:31] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:313800|Reduce number of replicas for titlesuggest indices (T147192)]] (duration: 00m 51s) [23:15:32] T147192: Reduce the number of replicas for some titlesuggest indices - https://phabricator.wikimedia.org/T147192 [23:15:37] ^ dcausse live everywhere [23:15:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:15:47] thcipriani: thanks! 
[23:17:03] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313872 (owner: 10Catrope) [23:17:06] (03PS2) 10Thcipriani: Disable wmgEchoFooterNotice [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313872 (owner: 10Catrope) [23:17:15] (03CR) 10Thcipriani: Disable wmgEchoFooterNotice [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313872 (owner: 10Catrope) [23:17:30] (03CR) 10Thcipriani: [C: 032] "SWAT (rebase fail)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313872 (owner: 10Catrope) [23:18:00] (03Merged) 10jenkins-bot: Disable wmgEchoFooterNotice [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313872 (owner: 10Catrope) [23:18:04] ah, vendor things merged, too. [23:18:20] lemme get the config change out the door, and then I'll circleback on that one (FYI James_F ) [23:18:35] Sure. [23:19:34] matt_flaschen: https://gerrit.wikimedia.org/r/#/c/313872/2 is love on mw1099, check please [23:19:43] Also, live. [23:19:59] yeah, that :) [23:22:20] thcipriani, verified on enwiki [23:22:34] matt_flaschen: ok, going live everywhere [23:24:32] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:313872|Disable wmgEchoFooterNotice]] (duration: 00m 49s) [23:24:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:24:45] ^ matt_flaschen live everywhere [23:27:44] James_F: your changes are live on mw1099 [23:28:12] * James_F checks. [23:28:34] thcipriani: hey im here now [23:28:39] Yup, LGTM, fixes the issue without any errors. [23:28:39] sorry a bit later then planned [23:28:59] jdlrobson: no problem, will circle back to your patch shortly :) [23:29:34] Thanks, thcipriani.Verified without 1099. [23:30:53] James_F: okie doke, will sync-dir vendor then resources/lib/oojs-ui [23:31:07] Ta. 
[23:31:15] I wonder how long vendor will take...scap is going to try to lint all those php files :\ [23:31:22] matt_flaschen: thank you for checking [23:33:18] thcipriani: Eww. [23:33:31] eww indeed. [23:33:38] ah, but it's done [23:34:02] not too bad, only 1000 files to lint :P [23:34:23] !log thcipriani@tin Synchronized php-1.28.0-wmf.20/vendor: SWAT: [[gerrit:313871|Update OOjs UI to v0.17.10]] (duration: 01m 33s) [23:34:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:36:42] !log thcipriani@tin Synchronized php-1.28.0-wmf.20/resources/lib/oojs-ui: SWAT: [[gerrit:313873|Update OOjs UI to v0.17.10]] (duration: 00m 48s) [23:36:47] ^ James_F live everywhere [23:36:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:36:55] Ta. Double-checking. [23:37:22] Yup, good in prod. [23:37:31] James_F: cool, thanks for checking [23:38:42] [23:36] https://gerrit.wikimedia.org/r/313162/ says merge conflict, although it's trivial change on noc docroot [23:39:11] (03PS2) 10Thcipriani: Enable Wikidata descriptions on Japanese and Spanish Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313869 (https://phabricator.wikimedia.org/T145786) (owner: 10Jdlrobson) [23:39:23] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313869 (https://phabricator.wikimedia.org/T145786) (owner: 10Jdlrobson) [23:39:49] (03Merged) 10jenkins-bot: Enable Wikidata descriptions on Japanese and Spanish Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313869 (https://phabricator.wikimedia.org/T145786) (owner: 10Jdlrobson) [23:40:04] arseny92: mediawiki-config is ff-only now, so you may just need to rebase that one and it should be fine [23:40:47] jdlrobson: your change is live on mw1099, check please [23:40:52] on it [23:41:21] could someone explain why those conflicts happen anyway as commits affect only changed text no? [23:41:30] Hello. 
[23:41:35] matt_flaschen, I've seen you planned some Flow-on-wikitech changes, all worked fine? [23:41:46] matt_flaschen: next step is to create table and enable it? [23:42:14] thcipriani ff only? [23:42:30] arseny92: the parent of any commit to merge MUST be the currently HEAD of the master branch [23:42:46] what Dereckson said :) [23:42:49] Dereckson, I put them in RoanKattouw's SWAT, I assume there were no complications. [23:42:52] arseny92: when you create a change, some other changes are merged meanwhile, you need to rebase yours against the last merged [23:42:59] that create a linear history [23:43:08] matt_flaschen: good news [23:43:32] thcipriani: looks good to me [23:43:36] please push everywhere [23:43:39] jdlrobson: ok, going live [23:43:41] Dereckson, next step is "Create the tables". Steps after that are given at https://phabricator.wikimedia.org/T127792 . Critical to do the populateContentModel.php step immediately before deploy in the same window. [23:44:17] matt_flaschen: okay, we can do it with the script now, as https://gerrit.wikimedia.org/r/#/c/309498/ is merged [23:45:40] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:313869|Enable Wikidata descriptions on Japanese and Spanish Wikipedias (T145786)]] (duration: 00m 49s) [23:45:41] T145786: Deploy wikidata descriptions to mobile web stable channel - spanish and japanese - https://phabricator.wikimedia.org/T145786 [23:45:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:45:46] ^ jdlrobson live everywhere [23:46:14] Dereckson, also, if someone wants to make the case that wikitech-static is a blocker, they can. I don't think so, though, as long as it doesn't break the import entirely (since Tool talk should not be discussing critical production stuff, which I think is the reason for wikitech-static). [23:46:30] Dereckson, I haven't checked into importing a full wiki (including some Flow), without doing a Flow import. 
[23:46:44] Krenair asked if import would still work fine indeed. [23:47:00] could someone do the rebase or that can only be done by the submitter of the commit (in this case Max)? [23:47:14] arseny92: you can click on the rebase button on Gerrit, yes [23:47:24] generally it's done just before the merge [23:47:42] Gerrit is configured to allow everyone to rebase/submit a new change [23:47:59] and update an existing change. [23:48:02] matt_flaschen: https://gerrit.wikimedia.org/r/#/c/313930 is live on mw1099 check please [23:48:25] technically though I'm the one who requested that change, as I don't have a labs account, see http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-operations/?C=M;O=D for sep 28 [23:48:28] Dereckson, someone ought to assign that task to themselves to shepherd through. [23:48:30] I don't think wikitech-static is an immediate blocker to moving forward but you should probably consider it an expected part to be done before closing the task [23:48:57] Krenair, yeah, my concern is just that wikitech-static import not be entirely broken (or at least if it is, very very briefly) for all pages. [23:49:19] yeah breaking wikitech-static imports of non-flow pages would be Bad [23:49:42] It should be quite easy to fix properly (export all including Flow, import all including Flow) though. [23:49:55] Export: Run one new script. Import: Run the same exact script on a new file. [23:50:26] (CR) Eevans: Enable cassandra/twcs deploy repository (1 comment) [puppet] - https://gerrit.wikimedia.org/r/313892 (https://phabricator.wikimedia.org/T133395) (owner: Eevans) [23:51:55] so since I'm present, I guess the requirement for the requester to be present during swat deploy to verify things, is met eh? 
(as if there would be anything to verify in regards to that trivial change lol) [23:52:19] matt_flaschen, given the way this system works, let me know if you need any help getting the importing working [23:52:28] I'll probably need to install the extension on wikitech-static too [23:53:23] Yeah, it will be. [23:53:30] arseny92: if you want us to deploy it now, please add your noc. change on https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161003T2300 [23:53:43] (Flow will need to be on wikitech-static I mean) [23:54:09] yeah I mean, it's probably not already. so me or ops will have to sort that out [23:54:36] which is fine, just gotta figure what to do, and in what order to do it [23:55:05] before we create a Flow page on wikitech [23:55:20] delayed thanks thcipriani [23:55:22] :) [23:55:49] jdlrobson: np, you're welcome :) [23:57:13] Flow can be installed in parallel on -static and wikitech, what we need to block is creating a Flow page before it's installed on -static. [23:58:08] arseny92: or you can plan it at a later window, at your convenience [23:59:50] Where is the rebase button lol