[00:00:04] twentyafterfour: Respected human, time to deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160707T0000). Please do the needful. [00:02:12] Amir1: can you find a small-ish global rename to do and I'll tell you when to do it? [00:02:28] legoktm: yeah [00:02:36] let me check the queue [00:03:29] !log legoktm@tin Synchronized php-1.28.0-wmf.8/extensions/CentralAuth/: Make LocalRename jobs run sequentially - T137973 (duration: 00m 34s) [00:03:30] T137973: GlobalRename gets stuck sometimes - https://phabricator.wikimedia.org/T137973 [00:03:31] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). [00:03:36] legoktm: This looks good: https://meta.wikimedia.org/wiki/Special:CentralAuth/Tegoutte [00:03:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [00:03:59] legoktm: tested the parser function and APIs and everything is running smoothly. Thanks for your help. I'll ping jamie to make sure he's aware of the new tables on prod. He's already reviewed them and they don't have any private data, so should be good. 
[00:04:58] kaldari: mkay [00:05:17] !log legoktm@tin Synchronized php-1.28.0-wmf.9/extensions/CentralAuth/: Make LocalRename jobs run sequentially - T137973 (duration: 00m 30s) [00:05:18] T137973: GlobalRename gets stuck sometimes - https://phabricator.wikimedia.org/T137973 [00:05:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [00:06:50] !log legoktm@tin Synchronized php-1.28.0-wmf.8/extensions/CentralAuth/: Make LocalRename jobs run sequentially - T137973 (for real this time) (duration: 00m 30s) [00:06:51] T137973: GlobalRename gets stuck sometimes - https://phabricator.wikimedia.org/T137973 [00:06:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [00:07:12] Amir1: ok, go for it [00:07:34] legoktm: started [00:07:45] https://meta.wikimedia.org/wiki/Special:GlobalRenameQueue/request/25311/ [00:08:23] > There are no renames in progress for Correcteur748. They may have already finished. [00:08:28] \o/ [00:08:39] yes [00:08:46] do you want something bigger? [00:09:06] you should probably test one with a lot of attached wikis [00:09:22] legoktm: https://gerrit.wikimedia.org/r/#/c/297715/ [00:09:37] Amir1: yes please [00:10:09] https://meta.wikimedia.org/wiki/Special:CentralAuth/Der-wuppertaler [00:10:15] 13K edits, 90 wikis [00:10:32] https://meta.wikimedia.org/wiki/Special:GlobalRenameQueue/request/25312/ [00:11:03] it's much much slower now, but I think I like it that way [00:11:14] https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/-wuppertaler [00:11:40] I like that it works in alphabetical order [00:11:42] :D [00:11:55] yeah renames don't need to be incredibly fast they just need to be reliable [00:12:40] 06Operations, 10Ops-Access-Requests: access: eventlogging-admins -> hafnium - https://phabricator.wikimedia.org/T139202#2435742 (10Dzahn) a:03MoritzMuehlenhoff could you take a look Moritz? can you see if/how this was existing access that got removed? 
and possibly restore [00:13:21] we have 300 requests in the queue [00:15:05] is there a way to know what 10.68.16.121 is? I mean which machine and what it's doing? [00:17:00] SMalyshev: it's a labs instance, is what it tells us [00:17:11] you can see that if you clone the DNS repo [00:17:18] aha, ok, thanks! [00:17:22] and in templates/10.in-addr.arpa [00:17:31] but from there.. _which_ labs instance.. eh... [00:17:43] gotta go to a labs bastion [00:18:08] legoktm: so I'm starting to approve the requests there. [00:18:18] Amir1: er, don't yet [00:18:24] okay [00:18:33] :D [00:18:52] tell me when you want more tests, etc. [00:19:17] Amir1: can you start two more renames? [00:19:30] big, small? [00:19:37] big [00:20:49] okay, looking [00:21:03] (not like giant ones though :P) [00:21:46] 121.16.68.10.in-addr.arpa domain name pointer ci-jessie-wikimedia-47938.contintcloud.eqiad.wmflabs. [00:21:49] 121.16.68.10.in-addr.arpa domain name pointer petscan1.petscan.eqiad.wmflabs. [00:21:52] SMalyshev: ^ [00:21:58] now if you ask why there are 2 .. eh [00:22:04] one is old [00:22:20] mutante: thanks! 
that helps a lot [00:22:22] i guess it's more likely this is ci-jessie [00:22:29] sure, yw [00:23:03] legoktm: https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/Guillaume_Lussier-Dulude_(LordGui99) [00:23:14] and https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/Alexandre_Clerici [00:23:25] thanks [00:23:26] not super big, tell me if you want bigger [00:23:35] one is 24 wikis, the other one is 44 [00:26:23] legoktm: I have this: https://meta.wikimedia.org/wiki/Special:CentralAuth/StefanoRR [00:26:40] do it [00:27:09] https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/LarsFillmore [00:27:31] PROBLEM - HP RAID on labvirt1011 is CRITICAL: Connection refused by host [00:28:02] PROBLEM - dhclient process on labvirt1011 is CRITICAL: Connection refused by host [00:28:22] PROBLEM - nova-compute process on labvirt1011 is CRITICAL: Connection refused by host [00:28:22] PROBLEM - SSH on labvirt1011 is CRITICAL: Connection refused [00:28:42] PROBLEM - puppet last run on labvirt1011 is CRITICAL: Connection refused by host [00:29:21] PROBLEM - Disk space on labvirt1011 is CRITICAL: Connection refused by host [00:30:26] MaxSem, was https://gerrit.wikimedia.org/r/#/c/297716/1 showing up a lot? [00:30:51] RECOVERY - nova-compute process on labvirt1011 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-compute [00:30:52] RECOVERY - SSH on labvirt1011 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0) [00:31:01] now if you ask why there are 2 .. 
eh [00:31:03] It's a known bug [00:31:21] Almost always contintcloud involvement because they create/delete instances so much, apparently [00:31:22] PROBLEM - kvm ssl cert on labvirt1011 is CRITICAL: Connection refused by host [00:31:31] SMalyshev, ^ [00:31:43] Krenair, not much [00:32:03] Krenair: I see [00:32:05] SMalyshev, mutante: https://phabricator.wikimedia.org/T115194 [00:33:08] One instance I created has 49 other instances showing up with it when you query PTR records [00:33:57] 06Operations: ytterbium, neon and strontium daily cronspam - https://phabricator.wikimedia.org/T132661#2206046 (10Dzahn) it's because one of the NameVirtualHost *:80 is in ports.conf which comes like that per default, and then it's repeated in our puppetized files in sites-enabled. I worked around this before by... [00:33:59] actually I listed it in that task further down the comments [00:34:13] RECOVERY - Disk space on labvirt1011 is OK: DISK OK [00:34:24] huh I guess CI ones just don't get deleted properly. But the other one looks like what I need [00:34:26] 06Operations, 06Labs, 10Labs-Infrastructure: Some labs instances IP have multiple PTR entries in DNS - https://phabricator.wikimedia.org/T115194#2435798 (10AlexMonk-WMF) ``` 121.16.68.10.in-addr.arpa domain name pointer ci-jessie-wikimedia-47938.contintcloud.eqiad.wmflabs. 121.16.68.10.in... [00:35:05] well, maybe, but look at the very first example I gave in the task SMalyshev [00:35:20] Maybe testlabs-createtest2.testlabs.eqiad.wmflabs. came from nodepool? 
I honestly have no idea [00:36:02] RECOVERY - kvm ssl cert on labvirt1011 is OK: Cert /etc/ssl/localcerts/labvirt-star.eqiad.wmnet.crt will not expire for at least 90 days [00:36:13] PROBLEM - salt-minion processes on labvirt1011 is CRITICAL: Connection refused by host [00:36:22] PROBLEM - DPKG on labvirt1011 is CRITICAL: Connection refused by host [00:37:03] RECOVERY - HP RAID on labvirt1011 is OK: OK: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:1:5, 2I:1:6, 2I:1:7, 2I:1:8, Controller, Battery/Capacitor [00:37:21] (03PS1) 10Dzahn: gerrit: remove NameVirtualHost *:80 from Apache template [puppet] - 10https://gerrit.wikimedia.org/r/297723 (https://phabricator.wikimedia.org/T132661) [00:37:52] PROBLEM - nova-compute process on labvirt1011 is CRITICAL: Connection refused by host [00:38:32] RECOVERY - salt-minion processes on labvirt1011 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [00:38:41] RECOVERY - DPKG on labvirt1011 is OK: All packages OK [00:40:12] PROBLEM - SSH on labvirt1011 is CRITICAL: Connection refused [00:41:12] PROBLEM - Disk space on labvirt1011 is CRITICAL: Connection refused by host [00:41:52] PROBLEM - configured eth on labvirt1011 is CRITICAL: Connection refused by host [00:42:32] RECOVERY - SSH on labvirt1011 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0) [00:44:32] RECOVERY - dhclient process on labvirt1011 is OK: PROCS OK: 0 processes with command name dhclient [00:45:22] PROBLEM - kvm ssl cert on labvirt1011 is CRITICAL: Connection refused by host [00:45:43] (03PS1) 10Dzahn: icinga,tendril: remove duplicate NameVirtualHost *:80 [puppet] - 10https://gerrit.wikimedia.org/r/297727 (https://phabricator.wikimedia.org/T132661) [00:45:51] RECOVERY - Disk space on labvirt1011 is OK: DISK OK [00:46:32] RECOVERY - configured eth on labvirt1011 is OK: OK - interfaces up [00:49:32] PROBLEM - SSH on labvirt1011 is CRITICAL: Connection refused [00:50:11] PROBLEM - salt-minion processes on labvirt1011 
is CRITICAL: Connection refused by host [00:51:51] RECOVERY - nova-compute process on labvirt1011 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-compute [00:52:21] RECOVERY - kvm ssl cert on labvirt1011 is OK: Cert /etc/ssl/localcerts/labvirt-star.eqiad.wmnet.crt will not expire for at least 90 days [00:52:31] RECOVERY - salt-minion processes on labvirt1011 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [00:56:12] PROBLEM - dhclient process on labvirt1011 is CRITICAL: Connection refused by host [00:56:31] RECOVERY - SSH on labvirt1011 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0) [01:00:43] 06Operations: Rename 'restricted' group? - https://phabricator.wikimedia.org/T104671#2435825 (10Dzahn) 69 members: [daniel, dartar, ellery, bearloga, 70 ezachte, hoo, jamesur, jdlrobson, khorn, tparscal, ssastry, 71 ironholds, nuria, leila, santhosh, amire80, legoktm, addsho... [01:02:33] legoktm: I have some big ones for when you want to test more [01:02:42] 06Operations: Rename 'restricted' group? - https://phabricator.wikimedia.org/T104671#2435828 (10Dzahn) [01:04:11] 06Operations: Rename 'restricted' group? 
- https://phabricator.wikimedia.org/T104671#2435829 (10Dzahn) [01:04:12] PROBLEM - kvm ssl cert on labvirt1011 is CRITICAL: Connection refused by host [01:04:22] PROBLEM - salt-minion processes on labvirt1011 is CRITICAL: Connection refused by host [01:07:51] PROBLEM - configured eth on labvirt1011 is CRITICAL: Connection refused by host [01:08:02] RECOVERY - dhclient process on labvirt1011 is OK: PROCS OK: 0 processes with command name dhclient [01:08:52] RECOVERY - kvm ssl cert on labvirt1011 is OK: Cert /etc/ssl/localcerts/labvirt-star.eqiad.wmnet.crt will not expire for at least 90 days [01:09:12] PROBLEM - DPKG on labvirt1011 is CRITICAL: Connection refused by host [01:12:55] Amir1: uh, I'm just going to email the global-renamers list in a few minutes [01:13:23] okay [01:13:42] https://gerrit.wikimedia.org/r/#/c/297713 <- also this [01:14:01] PROBLEM - Disk space on labvirt1011 is CRITICAL: Connection refused by host [01:14:22] +2'd [01:14:42] RECOVERY - configured eth on labvirt1011 is OK: OK - interfaces up [01:16:02] RECOVERY - salt-minion processes on labvirt1011 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [01:16:08] thanks :) [01:16:52] PROBLEM - HP RAID on labvirt1011 is CRITICAL: Connection refused by host [01:18:42] RECOVERY - Disk space on labvirt1011 is OK: DISK OK [01:19:42] PROBLEM - dhclient process on labvirt1011 is CRITICAL: Connection refused by host [01:21:33] RECOVERY - HP RAID on labvirt1011 is OK: OK: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:1:5, 2I:1:6, 2I:1:7, 2I:1:8, Controller, Battery/Capacitor [01:22:52] PROBLEM - kvm ssl cert on labvirt1011 is CRITICAL: Connection refused by host [01:23:12] RECOVERY - DPKG on labvirt1011 is OK: All packages OK [01:24:03] PROBLEM - configured eth on labvirt1011 is CRITICAL: Connection refused by host [07:04:09] 06Operations, 10GlobalRename, 10MediaWiki-extensions-CentralAuth, 13Patch-For-Review, and 3 others: GlobalRename gets stuck sometimes - 
https://phabricator.wikimedia.org/T137973#2436259 (10biplabanand) Finally i am able to log in now. Thanks @Tgr and @Legoktm [07:05:14] 06Operations, 10GlobalRename, 10MediaWiki-extensions-CentralAuth, 13Patch-For-Review, and 3 others: GlobalRename gets stuck sometimes - https://phabricator.wikimedia.org/T137973#2436272 (10Cyberpower678) >>! In T137973#2436259, @biplabanand wrote: > Finally i am able to log in now. Thanks @Tgr and @Legoktm... [07:06:55] Jamesofur: hi [07:07:00] \o [07:09:53] 06Operations, 10GlobalRename, 10MediaWiki-extensions-CentralAuth, 13Patch-For-Review, and 3 others: GlobalRename gets stuck sometimes - https://phabricator.wikimedia.org/T137973#2436285 (10biplabanand) >>! In T137973#2436272, @Cyberpower678 wrote: >>>! In T137973#2436259, @biplabanand wrote: >> Finally i a... [07:11:14] Jamesofur: do you know when he was last able to login? [07:11:24] legoktm: today I believe [07:11:44] but let me ask [07:12:23] legoktm: "I don’t recall. I know I was logged in a couple of days ago" [07:15:19] | scnwiki | Philippe | 20160705084207 | login | [07:16:01] legoktm: I tried to get him to login there today following the https://en.wikipedia.org/wiki/Help:Logging_in#Login_issues_and_problems instructions [07:16:11] but he got another exception when he tried there [07:16:54] !log mysql:wikiadmin@db1041 [centralauth]> delete from localuser where lu_name ="Philippe" and lu_wiki ="scnwiki"; [07:16:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [07:17:01] Jamesofur: ask him to try now [07:17:12] legoktm: will do, should he log in anywhere or specifically scn ? 
[07:17:19] (03Abandoned) 10Gehel: WIP - Create necessary folders for Postgresql and Cassandra [puppet] - 10https://gerrit.wikimedia.org/r/288215 (https://phabricator.wikimedia.org/T134901) (owner: 10Gehel) [07:17:21] anywhere [07:17:29] 06Operations, 10Wikimedia-SVG-rendering, 07Upstream: SVG rendering with marker-element is different between librsvg and Inkscape - https://phabricator.wikimedia.org/T97758#2436310 (10MoritzMuehlenhoff) @Menner, @tgr: Fixed. For confirmation I have re-triggered the generation of File:Rsvg marker element bug.s... [07:18:05] legoktm: he's in, thanks [07:18:14] np [07:30:25] PROBLEM - check_payments_wiki on payments1005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:35:23] PROBLEM - check_payments_wiki on payments1005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:39:24] RECOVERY - Apache HTTP on mw1261 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.498 second response time [07:40:13] RECOVERY - check_payments_wiki on payments1005 is OK: HTTP OK: HTTP/1.1 200 OK - 241 bytes in 0.025 second response time [07:56:24] !log rolling restart of elasticsearch cluster codfw completed (T138811) [07:56:25] T138811: CVE-2016-4997 - https://phabricator.wikimedia.org/T138811 [07:56:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:01:10] (03CR) 10Gehel: Correct scoping issues in role::osm::master (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/297703 (owner: 10Gehel) [08:02:27] RECOVERY - puppet last run on mw1261 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [08:06:44] 06Operations, 10GlobalRename, 10MediaWiki-extensions-CentralAuth, 13Patch-For-Review, and 3 others: GlobalRename gets stuck sometimes - https://phabricator.wikimedia.org/T137973#2436343 (10biplabanand) Got Problem once again:) my account is not attached with more than 256 accounts :) https://commons.wikime... 
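Annotation on the earlier reverse-DNS exchange (the question about what 10.68.16.121 is, the pointer to templates/10.in-addr.arpa, and the two PTR answers tracked in T115194): a minimal Python sketch of how the reverse-lookup name is formed and how the two answers pasted in the channel can be pulled apart. The helper names `reverse_name` and `ptr_targets` and the `SAMPLE` constant are made up for illustration; the sample text is copied from the channel paste above, and nothing here queries the real DNS or Designate.

```python
import ipaddress


def reverse_name(ip: str) -> str:
    """Return the reverse-lookup (PTR) owner name for an IP address."""
    return ipaddress.ip_address(ip).reverse_pointer


def ptr_targets(host_output: str) -> list:
    """Extract PTR targets from `host`-style output lines."""
    marker = " domain name pointer "
    targets = []
    for line in host_output.splitlines():
        if marker in line:
            # Keep what follows the marker and drop the trailing root dot.
            targets.append(line.split(marker, 1)[1].strip().rstrip("."))
    return targets


# Sample copied from the channel paste; hypothetical constant name.
SAMPLE = """\
121.16.68.10.in-addr.arpa domain name pointer ci-jessie-wikimedia-47938.contintcloud.eqiad.wmflabs.
121.16.68.10.in-addr.arpa domain name pointer petscan1.petscan.eqiad.wmflabs.
"""
```

More than one target for the same owner name is the symptom discussed in T115194; which of the entries is stale cannot be told from the PTR records alone.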
[08:12:46] (03CR) 10Alexandros Kosiaris: "I 've been thinking about this too. I went for this approach first too, then I realized it would not be consistent since in this file we w" [puppet] - 10https://gerrit.wikimedia.org/r/297727 (https://phabricator.wikimedia.org/T132661) (owner: 10Dzahn) [08:23:26] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [08:23:26] (03CR) 10Elukey: "Thanks Daniel!" [puppet] - 10https://gerrit.wikimedia.org/r/297727 (https://phabricator.wikimedia.org/T132661) (owner: 10Dzahn) [08:24:56] (03CR) 10Elukey: "Same discussion happening in https://gerrit.wikimedia.org/r/#/c/297727/, would it make sense to upgrade to 2.4 and get rid of this?" [puppet] - 10https://gerrit.wikimedia.org/r/297723 (https://phabricator.wikimedia.org/T132661) (owner: 10Dzahn) [08:30:40] 06Operations, 13Patch-For-Review: Randomly failing puppetmaster sync to strontium - https://phabricator.wikimedia.org/T128895#2089539 (10akosiaris) >>! In T128895#2187221, @fgiunchedi wrote: > and again, actually not while synching to strontium but as soon as puppet-merge is ran > > ``` > palladium:~$ sudo pu... [08:33:13] (03CR) 10Alexandros Kosiaris: "@Elukey: ah, you 've touched a sensitive subject. 
Technically neon should not even be around anymore https://phabricator.wikimedia.org/T12" [puppet] - 10https://gerrit.wikimedia.org/r/297727 (https://phabricator.wikimedia.org/T132661) (owner: 10Dzahn) [08:33:55] !log Updated Wikidata's property suggester with data from Monday's json dump and removed the external identifiers as a workaround for T132839 [08:33:56] T132839: Property suggester suggests human properties for non-human items - https://phabricator.wikimedia.org/T132839 [08:33:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:41:58] (03PS1) 10Jcrespo: Use db1056 instead of db1019 for commons recentchanges [mediawiki-config] - 10https://gerrit.wikimedia.org/r/297755 (https://phabricator.wikimedia.org/T139346) [08:45:01] (03CR) 10Jcrespo: [C: 032] Use db1056 instead of db1019 for commons recentchanges [mediawiki-config] - 10https://gerrit.wikimedia.org/r/297755 (https://phabricator.wikimedia.org/T139346) (owner: 10Jcrespo) [08:47:01] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Failover s4 recentchanges to db1056 (duration: 00m 38s) [08:47:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:51:22] <_joe_> !log removing all old servers from the appservers pool but the canaries (T139353) [08:51:23] T139353: Decommission all old mediawiki appservers in eqiad - https://phabricator.wikimedia.org/T139353 [08:51:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:58:58] 06Operations, 06Commons, 10media-storage: Some fonts not anti-aliasing in SVG thumbnails after upgrade of scaling servers - https://phabricator.wikimedia.org/T139543#2435518 (10MoritzMuehlenhoff) @kaldari I think this is fallout of https://phabricator.wikimedia.org/T97758#2436310. I have ran &action=purge on... 
[09:08:12] (03PS2) 10Muehlenhoff: dumps: Restrict to PRODUCTION_NETWORKS [puppet] - 10https://gerrit.wikimedia.org/r/297582 [09:10:03] (03CR) 10Giuseppe Lavagetto: [C: 032] "deb builds correctly and with no lintian errors." [software/service-checker] - 10https://gerrit.wikimedia.org/r/297558 (owner: 10Giuseppe Lavagetto) [09:12:29] 06Operations, 06Services, 13Patch-For-Review, 15User-mobrovac: Updates various services to nodejs 4.4.6 - https://phabricator.wikimedia.org/T138561#2436482 (10elukey) Upgraded aqs100[456], we are going to test and upgrade 100[123] soon (@mobrovac I am ok for the deadline) [09:15:21] PROBLEM - check_payments_wiki on payments1005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:20:11] RECOVERY - check_payments_wiki on payments1005 is OK: HTTP OK: HTTP/1.1 200 OK - 241 bytes in 0.031 second response time [09:23:13] (03CR) 10Muehlenhoff: [C: 032 V: 032] dumps: Restrict to PRODUCTION_NETWORKS [puppet] - 10https://gerrit.wikimedia.org/r/297582 (owner: 10Muehlenhoff) [09:24:10] 06Operations, 06Services, 13Patch-For-Review, 15User-mobrovac: Updates various services to nodejs 4.4.6 - https://phabricator.wikimedia.org/T138561#2436493 (10mobrovac) >>! In T138561#2436053, @KartikMistry wrote: > @mobrovac deployment-sca02.eqiad.wmflabs and deployment-sca01.eqiad.wmflabs should be fine... [09:24:35] 06Operations, 06Services, 13Patch-For-Review, 15User-mobrovac: Updates various services to nodejs 4.4.6 - https://phabricator.wikimedia.org/T138561#2436494 (10mobrovac) >>! In T138561#2436482, @elukey wrote: > Upgraded aqs100[456], we are going to test and upgrade 100[123] soon (@mobrovac I am ok for the d... 
[09:27:05] (03PS1) 10Mklette: server.pp: fix zkCleanup cron [puppet/zookeeper] - 10https://gerrit.wikimedia.org/r/297757 [09:39:21] (03PS1) 10Elukey: Revert "Revert "Raise the Hadoop HDFS datanode heapsize to 2GB."" [puppet] - 10https://gerrit.wikimedia.org/r/297758 [09:39:35] (03PS2) 10Elukey: Revert "Revert "Raise the Hadoop HDFS datanode heapsize to 2GB."" [puppet] - 10https://gerrit.wikimedia.org/r/297758 [09:42:44] (03CR) 10Elukey: [C: 032] Revert "Revert "Raise the Hadoop HDFS datanode heapsize to 2GB."" [puppet] - 10https://gerrit.wikimedia.org/r/297758 (owner: 10Elukey) [09:43:23] 06Operations, 06Labs, 10Labs-Infrastructure: Some labs instances IP have multiple PTR entries in DNS - https://phabricator.wikimedia.org/T115194#2436546 (10hashar) Until the DNS leak is identified entries will keep leaking. It is quite easy to retrieve all of them from the Designate database, so there is no... [09:44:24] (03PS2) 10Gehel: Correct scoping issues in role::osm::master [puppet] - 10https://gerrit.wikimedia.org/r/297703 [09:47:52] (03CR) 10Gehel: "Puppet compiler looks good: https://puppet-compiler.wmflabs.org/3278/" [puppet] - 10https://gerrit.wikimedia.org/r/297703 (owner: 10Gehel) [09:48:11] (03CR) 10Alexandros Kosiaris: [C: 04-1] Correct scoping issues in role::osm::master (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/297703 (owner: 10Gehel) [09:49:32] (03CR) 10Gehel: Correct scoping issues in role::osm::master (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/297703 (owner: 10Gehel) [09:49:57] akosiaris: ^ what are the "appropriate name changes" ? [09:50:52] PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 4 failures [09:51:25] (03PS3) 10Gehel: Correct scoping issues in role::osm::master [puppet] - 10https://gerrit.wikimedia.org/r/297703 [09:51:49] 06Operations, 10MediaWiki-extensions-UniversalLanguageSelector, 10Wikimedia-SVG-rendering, 07I18n, 13Patch-For-Review: MB Lateefi Fonts for Sindhi Wikipedia. 
- https://phabricator.wikimedia.org/T138136#2436568 (10MoritzMuehlenhoff) fonts-sil-lateef has been uploaded to jessie-backports, but it will only... [09:53:21] (03CR) 10Gehel: "Puppet compiler still looks good: https://puppet-compiler.wmflabs.org/3279/" [puppet] - 10https://gerrit.wikimedia.org/r/297703 (owner: 10Gehel) [09:55:26] (03CR) 10Alexandros Kosiaris: Correct scoping issues in role::osm::master (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/297703 (owner: 10Gehel) [09:55:37] (03CR) 10Alexandros Kosiaris: [C: 031] Correct scoping issues in role::osm::master [puppet] - 10https://gerrit.wikimedia.org/r/297703 (owner: 10Gehel) [09:56:08] (03PS4) 10Gehel: Correct scoping issues in role::osm::master [puppet] - 10https://gerrit.wikimedia.org/r/297703 [09:56:22] akosiaris: thanks! [09:56:53] gehel: the name changes would have had to happen if it was not the role hiera backend but the nuyaml one [09:57:02] it looks up hiera keys differently [09:58:15] akosiaris: how differently? Not using the fully qualified name of the parameter? [09:58:21] yup [09:58:34] depending on the file contained in it could have been [09:58:35] (03CR) 10Gehel: [C: 032] Correct scoping issues in role::osm::master [puppet] - 10https://gerrit.wikimedia.org/r/297703 (owner: 10Gehel) [09:58:43] osm_slave or master::osm_slave [09:58:50] I know confusing [09:58:53] we need to fix this a bit [09:59:12] Ok, I'm going to get bitten by that one at some point :P [09:59:44] only for changes that are not host or role related [10:00:10] host specific ones and role ones are not prone to that thing [10:00:21] not sure whether to call it a bug or a feature [10:00:34] a bit confusing at least... 
[10:01:12] RECOVERY - puppet last run on labsdb1006 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [10:04:17] !log rolling restart of elasticsearch cluster eqiad completed (T138811) [10:04:19] T138811: CVE-2016-4997 - https://phabricator.wikimedia.org/T138811 [10:04:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:05:46] (03PS1) 10Alexandros Kosiaris: Introduce network::constants::frack_networks [puppet] - 10https://gerrit.wikimedia.org/r/297760 [10:07:27] !log reboot etherpad1001.eqiad.wmnet, kernel upgrade and qemu upgrade, T134242 [10:07:28] T134242: kvm on ganeti instances getting stuck - https://phabricator.wikimedia.org/T134242 [10:07:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:09:38] (03PS1) 10Giuseppe Lavagetto: service_checker: use external package [puppet] - 10https://gerrit.wikimedia.org/r/297761 [10:09:46] 06Operations, 06Services, 13Patch-For-Review, 15User-mobrovac: Updates various services to nodejs 4.4.6 - https://phabricator.wikimedia.org/T138561#2436620 (10akosiaris) >>>! In T138561#2431559, @akosiaris wrote: >> @mobrovac, I will not be around that week, but all in all I doubt I will be needed. I don't... [10:10:22] _joe_: btw, I 've got a patch for service_checker coming. mostly adding debugging statements. Probably after I return though [10:10:29] it's only 20-30% ready [10:10:40] <_joe_> akosiaris: cool! 
[10:10:53] <_joe_> akosiaris: and yeah, we have no logging atm, which is shameful [10:11:04] shame, shame, shame [10:11:05] !log pooling mw1261 back to service with Apache mod-proxy-fcgi set to trace8 (T73487) [10:11:06] T73487: Fix Apache proxy_fcgi error "Invalid argument: AH01075: Error dispatching request to" (Causing HTTP 503) - https://phabricator.wikimedia.org/T73487 [10:11:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:11:19] https://www.youtube.com/watch?v=uZ7vkmUNTPA [10:11:26] <_joe_> elukey: let's see if it blows up :P [10:11:27] I will pool mw1261 back with incremental weights just to be sure [10:12:18] _joe_ I like when you are positive :P [10:12:43] <_joe_> elukey: I patched it, so optimism would be being delusional [10:13:41] ok so I pooled it without trace8 and weight 5, all good from logstash and access logs.. going to set up trace [10:13:47] !log reboot bohrium T134242 [10:13:48] T134242: kvm on ganeti instances getting stuck - https://phabricator.wikimedia.org/T134242 [10:13:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:14:18] PROBLEM - BGP status on cr1-ulsfo is CRITICAL: BGP CRITICAL - AS1299/IPv6: Active, AS1299/IPv4: Connect [10:14:27] <_joe_> uh ^ [10:14:36] PROBLEM - IPv4 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 22 probes of 417 (alerts on 19) - https://atlas.ripe.net/measurements/1791307/#!map [10:15:04] <_joe_> akosiaris: should we put ulsfo out of rotation? 
[10:15:24] 06Operations, 06Services, 13Patch-For-Review, 15User-mobrovac: Updates various services to nodejs 4.4.6 - https://phabricator.wikimedia.org/T138561#2436635 (10mobrovac) [10:15:33] (03PS2) 10Alexandros Kosiaris: Introduce network::constants::frack_networks [puppet] - 10https://gerrit.wikimedia.org/r/297760 [10:15:44] hmm [10:17:08] not yet, lemme make sure it's not transient [10:17:27] RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:18:12] mw1261 pooled with proxy_fcgi:trace8 [10:19:48] _joe_: looks like it's back up [10:20:18] Description: Telia [10:20:18] Type: External State: Active [10:20:26] PROBLEM - check_payments_wiki on payments1005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:20:56] RECOVERY - IPv4 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 1 probes of 417 (alerts on 19) - https://atlas.ripe.net/measurements/1791307/#!map [10:22:06] RECOVERY - BGP status on cr1-ulsfo is OK: BGP OK - up: 17, down: 0, shutdown: 0 [10:22:08] !log reboot bromine T134242 [10:22:09] T134242: kvm on ganeti instances getting stuck - https://phabricator.wikimedia.org/T134242 [10:22:15] !log reboot dubnium T134242 [10:22:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:22:16] T134242: kvm on ganeti instances getting stuck - https://phabricator.wikimedia.org/T134242 [10:22:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:23:50] (03PS1) 10Muehlenhoff: Add fonts-taml-tscu font to scalers [puppet] - 10https://gerrit.wikimedia.org/r/297763 (https://phabricator.wikimedia.org/T117919) [10:24:18] moritzm: btw, congrats on getting all the scalers on jessie [10:24:38] I am sure a lot of people will appreciate all the librsvg bugs that closed [10:25:16] RECOVERY - check_payments_wiki on payments1005 is OK: HTTP OK: HTTP/1.1 200 OK - 241 bytes in 0.029 second response time [10:25:48] I'm sure they'll 
just find even more of them... [10:25:55] !log reboot mx1001, planet1001, rutherfordium, seaborgium, ununpentium T134242 [10:25:56] T134242: kvm on ganeti instances getting stuck - https://phabricator.wikimedia.org/T134242 [10:26:14] still quite a few bugs unfixed upstream (and w/o much/any activity), but 2.40.16 is certainly a good step forward [10:26:56] (03CR) 10Muehlenhoff: [C: 032 V: 032] Add fonts-taml-tscu font to scalers [puppet] - 10https://gerrit.wikimedia.org/r/297763 (https://phabricator.wikimedia.org/T117919) (owner: 10Muehlenhoff) [10:29:40] !log reboot fermium.wikimedia.org hassium.eqiad.wmnet install1001.wikimedia.org krypton.eqiad.wmnet meitnerium.wikimedia.org mendelevium.eqiad.wmnet T134242 [10:29:41] T134242: kvm on ganeti instances getting stuck - https://phabricator.wikimedia.org/T134242 [10:29:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:31:08] (03CR) 10Muehlenhoff: [C: 04-1] "Don't merge yet, still in the NEW queue of jessie-backports." 
[puppet] - 10https://gerrit.wikimedia.org/r/297236 (https://phabricator.wikimedia.org/T138136) (owner: 10Muehlenhoff) [10:31:36] (03CR) 10Alexandros Kosiaris: "PCC says ok https://puppet-compiler.wmflabs.org/3282/carbon.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/297760 (owner: 10Alexandros Kosiaris) [10:32:07] PROBLEM - Ensure NFS exports are maintained for new instances with NFS on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit nfs-exports is failed [10:32:09] 06Operations, 10Wikimedia-SVG-rendering, 13Patch-For-Review: Librsvg does not consistently render the TSCu font family - https://phabricator.wikimedia.org/T117919#2436709 (10MoritzMuehlenhoff) [10:32:32] (03PS3) 10Alexandros Kosiaris: Introduce network::constants::frack_networks [puppet] - 10https://gerrit.wikimedia.org/r/297760 [10:32:50] (03PS4) 10Alexandros Kosiaris: Introduce network::constants::frack_networks [puppet] - 10https://gerrit.wikimedia.org/r/297760 [10:32:56] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Introduce network::constants::frack_networks [puppet] - 10https://gerrit.wikimedia.org/r/297760 (owner: 10Alexandros Kosiaris) [10:34:53] (03PS2) 10Giuseppe Lavagetto: service_checker: use external package [puppet] - 10https://gerrit.wikimedia.org/r/297761 [10:36:04] 06Operations: kvm on ganeti instances getting stuck - https://phabricator.wikimedia.org/T134242#2436716 (10akosiaris) All `eqiad` VMs have been upgraded to qemu 2.5 as well. I 'll leave this open just in case some bug manifests, but otherwise I consider it resolved [10:38:37] PROBLEM - puppet last run on mw2104 is CRITICAL: CRITICAL: puppet fail [10:40:37] (03PS1) 10Ema: package_builder: add WMF lintian vendor profile [puppet] - 10https://gerrit.wikimedia.org/r/297765 [10:44:10] !log disabling all mysql lag alerts cross-fleet T122457 [10:44:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:46:17] stashbot: !! 
[10:46:36] nope, no sign of life [10:58:53] (03PS2) 10Jcrespo: icinga: move check_mariadb plugin into module [puppet] - 10https://gerrit.wikimedia.org/r/296923 (owner: 10Dzahn) [10:59:37] moritzm: does firejail tracelog work for you? [11:03:49] (03CR) 10Jcrespo: [C: 032] "We agreed several ops that this should be on the actual mariadb module, and should be moved later." [puppet] - 10https://gerrit.wikimedia.org/r/296923 (owner: 10Dzahn) [11:04:32] (03CR) 10Jcrespo: [C: 032] icinga/mariadb: move plugin into module [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/296921 (owner: 10Dzahn) [11:06:34] (03PS1) 10Jcrespo: Update mariadb submodule (change mariadb alert location) [puppet] - 10https://gerrit.wikimedia.org/r/297767 [11:07:13] RECOVERY - puppet last run on mw2104 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:07:15] (03PS2) 10Jcrespo: Update mariadb submodule (change mariadb alert location) [puppet] - 10https://gerrit.wikimedia.org/r/297767 [11:07:16] 06Operations, 10Wikimedia-SVG-rendering: SVG rendering problem with pattern - https://phabricator.wikimedia.org/T118456#2436830 (10MoritzMuehlenhoff) [11:07:28] 06Operations, 10Wikimedia-SVG-rendering: SVG rendering problem with pattern - https://phabricator.wikimedia.org/T118456#1800700 (10MoritzMuehlenhoff) a:03MoritzMuehlenhoff [11:08:35] (03CR) 10Jcrespo: [C: 032] Update mariadb submodule (change mariadb alert location) [puppet] - 10https://gerrit.wikimedia.org/r/297767 (owner: 10Jcrespo) [11:08:59] 06Operations, 10Wikimedia-SVG-rendering: SVG rendering problem with pattern - https://phabricator.wikimedia.org/T118456#1800700 (10MoritzMuehlenhoff) 05stalled>03Resolved This has been fixed by the recent update of librsvg on the image scalers to 2.40.16, The rendered PNG version (e.g. https://upload.wiki... [11:12:10] jzerebecki: it does. 
but it logs to syslog and not to stderr/stdout as the manpage seems to imply [11:12:38] what least it's working for me with 0.9.40 [11:12:55] didn't test with 0.9.38 which we currently have on production servers [11:14:34] PROBLEM - puppet last run on ms-be2019 is CRITICAL: CRITICAL: puppet fail [11:15:29] 06Operations, 10Wikimedia-SVG-rendering: SVG linearGradient element must be defined before fill attribute link to render - https://phabricator.wikimedia.org/T107638#2436853 (10MoritzMuehlenhoff) a:03MoritzMuehlenhoff [11:15:44] PROBLEM - puppet last run on db1083 is CRITICAL: CRITICAL: Puppet has 1 failures [11:16:08] (03PS1) 10Yuvipanda: tools: Fix k8s webservice backend check [puppet] - 10https://gerrit.wikimedia.org/r/297771 (https://phabricator.wikimedia.org/T131929) [11:16:51] (03PS2) 10Yuvipanda: tools: Fix k8s webservice backend check [puppet] - 10https://gerrit.wikimedia.org/r/297771 (https://phabricator.wikimedia.org/T131929) [11:17:36] (03CR) 10Yuvipanda: [C: 032] tools: Fix k8s webservice backend check [puppet] - 10https://gerrit.wikimedia.org/r/297771 (https://phabricator.wikimedia.org/T131929) (owner: 10Yuvipanda) [11:17:58] (03PS1) 10Jcrespo: Sanitize SQL errors printed to icinga and IRC [puppet] - 10https://gerrit.wikimedia.org/r/297773 (https://phabricator.wikimedia.org/T122457) [11:18:07] (03CR) 10Yuvipanda: [V: 032] tools: Fix k8s webservice backend check [puppet] - 10https://gerrit.wikimedia.org/r/297771 (https://phabricator.wikimedia.org/T131929) (owner: 10Yuvipanda) [11:18:33] (03PS2) 10Jcrespo: Sanitize SQL errors printed to icinga and IRC [puppet] - 10https://gerrit.wikimedia.org/r/297773 (https://phabricator.wikimedia.org/T122457) [11:19:46] (03PS1) 10Yuvipanda: tools: Add icinga check for kubernetes webservice [puppet] - 10https://gerrit.wikimedia.org/r/297774 (https://phabricator.wikimedia.org/T131929) [11:20:20] (03CR) 10Jcrespo: [C: 032] Sanitize SQL errors printed to icinga and IRC [puppet] - 10https://gerrit.wikimedia.org/r/297773 
(https://phabricator.wikimedia.org/T122457) (owner: 10Jcrespo) [11:21:04] 06Operations, 10Wikimedia-SVG-rendering: SVG linearGradient element must be defined before fill attribute link to render - https://phabricator.wikimedia.org/T107638#2436861 (10MoritzMuehlenhoff) 05Open>03Resolved This has been fixed by the recent update of the librsvg to 2.40.16. The PNG version is now re... [11:21:16] (03PS2) 10Yuvipanda: tools: Add icinga check for kubernetes webservice [puppet] - 10https://gerrit.wikimedia.org/r/297774 (https://phabricator.wikimedia.org/T131929) [11:21:31] (03CR) 10Yuvipanda: [C: 032 V: 032] tools: Add icinga check for kubernetes webservice [puppet] - 10https://gerrit.wikimedia.org/r/297774 (https://phabricator.wikimedia.org/T131929) (owner: 10Yuvipanda) [11:22:16] yuvipanda, do I merge your change or wait? [11:22:34] jynus yup merge! [11:22:41] I got a ! 1330c39..f21ff1f production -> origin/production (unable to update local ref) [11:23:04] oh? [11:23:14] on palladium? [11:23:19] I got no error [11:24:55] there is a "There are 2 unmerged changes in puppet", but I suppose it is outdated [11:25:11] yes, it is gone now [11:25:14] 06Operations, 10Wikimedia-SVG-rendering: SVG marker-mid with orient auto don't work (stops rendering subsequent elements) - https://phabricator.wikimedia.org/T117530#2436870 (10MoritzMuehlenhoff) [11:25:36] AFAI see everything is synced [11:29:08] 06Operations, 10Wikimedia-SVG-rendering: SVG marker-mid with orient auto don't work (stops rendering subsequent elements) - https://phabricator.wikimedia.org/T117530#1776780 (10MoritzMuehlenhoff) It's not obvious to me what the exact visual problem was/is. Is this about the blue broken line in the middle? I've... [11:33:09] I think my check is working? 
[11:34:04] !log breaking m3 replication on db1048 (depooled) to check icinga alert changes [11:34:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:40:14] RECOVERY - puppet last run on ms-be2019 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [11:41:33] RECOVERY - puppet last run on db1083 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:43:48] (03PS2) 10KartikMistry: apertium-urd-hin: Rebuild for Jessie and cleanup [debs/contenttranslation/apertium-urd-hin] - 10https://gerrit.wikimedia.org/r/296368 (https://phabricator.wikimedia.org/T107306) [11:44:56] 06Operations, 06Labs: labvirt1011 periodically unavailable - https://phabricator.wikimedia.org/T139555#2436959 (10Andrew) [11:51:30] (03PS1) 10Yuvipanda: tools: Make toolschecker return FAIL when it fails [puppet] - 10https://gerrit.wikimedia.org/r/297775 [11:51:55] (03PS2) 10Yuvipanda: tools: Make toolschecker return FAIL when it fails [puppet] - 10https://gerrit.wikimedia.org/r/297775 [11:52:23] (03PS1) 10Jcrespo: Fix special case when the mariadb server is not a slave [puppet] - 10https://gerrit.wikimedia.org/r/297776 [11:52:32] (03PS2) 10Jcrespo: Fix special case when the mariadb server is not a slave [puppet] - 10https://gerrit.wikimedia.org/r/297776 [11:54:14] (03PS3) 10Yuvipanda: tools: Make toolschecker return FAIL when it fails [puppet] - 10https://gerrit.wikimedia.org/r/297775 [11:54:26] (03CR) 10Yuvipanda: [C: 032 V: 032] tools: Make toolschecker return FAIL when it fails [puppet] - 10https://gerrit.wikimedia.org/r/297775 (owner: 10Yuvipanda) [11:55:22] (03PS3) 10Jcrespo: Fix special case when the mariadb server is not a slave [puppet] - 10https://gerrit.wikimedia.org/r/297776 [11:56:00] (03PS3) 10Yuvipanda: tools: Upgrade kubernetes to v1.3.0 [puppet] - 10https://gerrit.wikimedia.org/r/297436 [11:56:23] (03CR) 10Yuvipanda: [C: 032 V: 032] tools: Upgrade kubernetes to v1.3.0 [puppet] - 
10https://gerrit.wikimedia.org/r/297436 (owner: 10Yuvipanda) [11:56:58] (03PS4) 10Jcrespo: Fix special case when the mariadb server is not a slave [puppet] - 10https://gerrit.wikimedia.org/r/297776 [11:58:50] (03CR) 10Jcrespo: [C: 032] Fix special case when the mariadb server is not a slave [puppet] - 10https://gerrit.wikimedia.org/r/297776 (owner: 10Jcrespo) [12:00:10] (03CR) 10Alexandros Kosiaris: [C: 031] package_builder: add WMF lintian vendor profile [puppet] - 10https://gerrit.wikimedia.org/r/297765 (owner: 10Ema) [12:05:29] !log depooling mw1261 from service (T73487) [12:05:30] T73487: Fix Apache proxy_fcgi error "Invalid argument: AH01075: Error dispatching request to" (Causing HTTP 503) - https://phabricator.wikimedia.org/T73487 [12:05:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:21:41] (03PS2) 10Sbisson: Remove EchoBundleEmailInterval [mediawiki-config] - 10https://gerrit.wikimedia.org/r/289395 (https://phabricator.wikimedia.org/T135446) [12:25:25] 06Operations, 10Wikimedia-Apache-configuration, 07HHVM, 07Wikimedia-log-errors: Fix Apache proxy_fcgi error "Invalid argument: AH01075: Error dispatching request to" (Causing HTTP 503) - https://phabricator.wikimedia.org/T73487#2437135 (10elukey) More information thanks to @joe's patch to log FCGI headers.... [12:27:01] moritzm: thx. didn't work for me with 0.9.40 on some form of debian testing with journald. will need to investigate further... [12:28:21] (03PS3) 10Sbisson: Remove EchoBundleEmailInterval [mediawiki-config] - 10https://gerrit.wikimedia.org/r/289395 (https://phabricator.wikimedia.org/T135446) [12:29:42] 06Operations, 06Labs: labvirt1011 periodically unavailable - https://phabricator.wikimedia.org/T139555#2437159 (10Andrew) more background: I did a dist-upgrade on that system right before putting it into service. That was on 2016-06-26. The system behaved well until 2016-06-05 when alarms started firing all... 
[12:30:20] jzerebecki: did you pass an explicit blacklist on the command line or via a config file? with e.g. "firejail --blacklist=/sbin --tracelog", even a simple "cd /sbin" logs a violation to syslog for me [12:31:02] (03CR) 10Mobrovac: "Looking good for the compiler - https://puppet-compiler.wmflabs.org/3284/ - and for me as well. We'll have to coordinate this deploy, thou" [puppet] - 10https://gerrit.wikimedia.org/r/297761 (owner: 10Giuseppe Lavagetto) [12:32:38] moritzm: oh. that works. had a blacklist in a more complicated config file, I probably just missed something, will try again. [12:33:08] (03CR) 10Giuseppe Lavagetto: "What do you mean "most services"? Is it used in scap3?" [puppet] - 10https://gerrit.wikimedia.org/r/297761 (owner: 10Giuseppe Lavagetto) [12:39:35] (03PS1) 10Andrew Bogott: Fix c/p bugs in dhcpd config for new labvirt servers [puppet] - 10https://gerrit.wikimedia.org/r/297783 [12:42:13] (03CR) 10Faidon Liambotis: [C: 032] Fix c/p bugs in dhcpd config for new labvirt servers [puppet] - 10https://gerrit.wikimedia.org/r/297783 (owner: 10Andrew Bogott) [12:48:46] 06Operations: Depleted connection tracking table on labvirt1010 - https://phabricator.wikimedia.org/T139598#2437190 (10MoritzMuehlenhoff) [12:49:21] PROBLEM - Start and verify pages via webservices on kubernetes on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/webservice/kubernetes - 185 bytes in 42.261 second response time [12:51:41] 06Operations, 06Labs: labvirt1011 periodically unavailable - https://phabricator.wikimedia.org/T139555#2437231 (10Andrew) This is almost certainly fixed by https://gerrit.wikimedia.org/r/#/c/297783/ we'll know soon enough. 
[12:56:10] RECOVERY - Start and verify pages via webservices on kubernetes on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 31.410 second response time [13:01:02] 06Operations, 10ops-eqiad, 13Patch-For-Review: rack/setup/install/deploy labvirt nodes - https://phabricator.wikimedia.org/T138509#2437269 (10Andrew) @cmjohnson, I would like these boxes configured with one big hardware raid. I'm pretty sure you did this for me with 1010 and 1011; I can't for the life of me... [13:01:34] (03PS1) 10Giuseppe Lavagetto: Port to precise: - removed python3 and use python_distutils instead of pybuild [software/service-checker] (precise) - 10https://gerrit.wikimedia.org/r/297784 [13:07:33] 06Operations, 06Labs: labvirt1011 periodically unavailable - https://phabricator.wikimedia.org/T139555#2437284 (10Andrew) 05Open>03Resolved a:03Andrew So here's the story: - A typo in dhcpd cofig which resulted in 1012 1013 and 1014 wanting the same IP as 1011 - This shouldn't have mattered since those... [13:14:01] PROBLEM - puppet last run on labvirt1001 is CRITICAL: CRITICAL: Puppet has 1 failures [13:21:19] (03PS1) 10Gehel: Externalize Postgresql user creation from role::osm::master [puppet] - 10https://gerrit.wikimedia.org/r/297786 [13:22:08] (03CR) 10Gehel: [C: 04-1] "Passwords needs to be added to private repo before merging this change." 
[puppet] - 10https://gerrit.wikimedia.org/r/297786 (owner: 10Gehel) [13:24:17] (03PS2) 10Giuseppe Lavagetto: Port to precise: - removed python3 and use python_distutils instead of pybuild [software/service-checker] (precise) - 10https://gerrit.wikimedia.org/r/297784 [13:25:31] (03CR) 10Giuseppe Lavagetto: [C: 032] Port to precise: - removed python3 and use python_distutils instead of pybuild [software/service-checker] (precise) - 10https://gerrit.wikimedia.org/r/297784 (owner: 10Giuseppe Lavagetto) [13:28:01] RECOVERY - puppet last run on labvirt1001 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [13:29:48] there seems to be a lot of "CentralAuthUser::saveSettings" database errors coming from api requests [13:35:30] 06Operations, 06Performance-Team, 10Thumbor: add thumbor to production infrastructure - https://phabricator.wikimedia.org/T139606#2437365 (10fgiunchedi) [13:50:53] (03PS1) 10Giuseppe Lavagetto: etcd::backup: fix scripts when there are no logs to remove [puppet] - 10https://gerrit.wikimedia.org/r/297791 [13:51:47] 06Operations, 10Wikimedia-SVG-rendering, 07Upstream: librsvg misinterpret quoted font family names that contain whitespaces - https://phabricator.wikimedia.org/T64987#2437467 (10MoritzMuehlenhoff) 05stalled>03Resolved The image scalers are now using 2.40.16 containing th upstream fix. [13:54:16] (03CR) 10Waldir: "Ah, that explains it. No issues then." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/297556 (https://phabricator.wikimedia.org/T127435) (owner: 10Thiemo Mättig (WMDE)) [13:59:14] (03PS1) 10Elukey: Add analytics to the AQS monitoring contact group. [puppet] - 10https://gerrit.wikimedia.org/r/297793 [14:00:20] PROBLEM - check_payments_wiki on payments1005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:00:42] (03CR) 10jenkins-bot: [V: 04-1] Add analytics to the AQS monitoring contact group. 
[puppet] - 10https://gerrit.wikimedia.org/r/297793 (owner: 10Elukey) [14:00:59] thanks jenkins [14:01:14] 06Operations, 10Wikimedia-SVG-rendering: SVG fails to render properly due to several issues - https://phabricator.wikimedia.org/T46016#2437515 (10MoritzMuehlenhoff) [14:03:03] 06Operations, 10Wikimedia-SVG-rendering: SVG fails to render properly due to several issues - https://phabricator.wikimedia.org/T46016#488876 (10MoritzMuehlenhoff) That's not fully fixed in 2.40.16: In comparison to https://commons.wikimedia.org/wiki/File:Vector_saturn_%28Correct_render%29.png the re-rendered... [14:03:48] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479 [14:05:08] RECOVERY - check_payments_wiki on payments1005 is OK: HTTP OK: HTTP/1.1 200 OK - 241 bytes in 0.029 second response time [14:05:58] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5126329 keys - replication_delay is 0 [14:11:46] (03CR) 10Giuseppe Lavagetto: [C: 032] etcd::backup: fix scripts when there are no logs to remove [puppet] - 10https://gerrit.wikimedia.org/r/297791 (owner: 10Giuseppe Lavagetto) [14:20:36] (03PS2) 10Elukey: Add analytics to the AQS monitoring contact group. [puppet] - 10https://gerrit.wikimedia.org/r/297793 [14:22:25] (03CR) 10jenkins-bot: [V: 04-1] Add analytics to the AQS monitoring contact group. [puppet] - 10https://gerrit.wikimedia.org/r/297793 (owner: 10Elukey) [14:23:25] 06Operations, 06Performance-Team, 10Thumbor: add thumbor to production infrastructure - https://phabricator.wikimedia.org/T139606#2437566 (10fgiunchedi) yeah the new hardware would work too and easier to compare, I think we can grab 2x machines from appservers /cc @Joe @elukey [14:25:15] (03PS3) 10Elukey: Add analytics to the AQS monitoring contact group. 
[puppet] - 10https://gerrit.wikimedia.org/r/297793 [14:25:43] (03PS2) 10Ema: package_builder: add WMF lintian vendor profile [puppet] - 10https://gerrit.wikimedia.org/r/297765 [14:26:56] (03CR) 10jenkins-bot: [V: 04-1] package_builder: add WMF lintian vendor profile [puppet] - 10https://gerrit.wikimedia.org/r/297765 (owner: 10Ema) [14:27:39] (03CR) 10Mobrovac: [C: 031] "Yes, node services that are deployed using scap3 use service_checker as the post-deploy check. But, as you point out, that file isn't remo" [puppet] - 10https://gerrit.wikimedia.org/r/297761 (owner: 10Giuseppe Lavagetto) [14:31:12] (03PS4) 10Elukey: Add analytics to the AQS monitoring contact group. [puppet] - 10https://gerrit.wikimedia.org/r/297793 [14:31:34] (03CR) 10Alexandros Kosiaris: "nice! minor comment, greatly appreciated otherwise" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/297765 (owner: 10Ema) [14:31:41] (03PS3) 10Giuseppe Lavagetto: service_checker: use external package [puppet] - 10https://gerrit.wikimedia.org/r/297761 [14:32:49] (03PS1) 10Ottomata: Include analytics_cluster::hive::client role on analytics1030 to see if this fixes HiveSpark in cluster mode [puppet] - 10https://gerrit.wikimedia.org/r/297797 [14:32:58] (03CR) 10Elukey: [C: 032] "Puppet compiler looks good!" 
[puppet] - 10https://gerrit.wikimedia.org/r/297793 (owner: 10Elukey) [14:33:10] (03PS2) 10Ottomata: Include analytics_cluster::hive::client role on analytics1030 to see if this fixes HiveSpark in cluster mode [puppet] - 10https://gerrit.wikimedia.org/r/297797 [14:36:16] (03PS3) 10Ema: package_builder: add WMF lintian vendor profile [puppet] - 10https://gerrit.wikimedia.org/r/297765 [14:37:52] !log depool aqs1001 for nodejs upgrade [14:37:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:38:16] !log elukey@palladium conftool action : set/pooled=no; selector: aqs1001.eqiad.wmnet [14:38:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:43:41] !log elukey@palladium conftool action : set/pooled=yes; selector: aqs1001.eqiad.wmnet [14:43:42] \!log T107306 uploaded to apt.wikimedia.org jessie-wikimedia giella-core_0.1.1~r129227+svn121148-1+wmf1 [14:43:43] T107306: Package and test apertium for Jessie - https://phabricator.wikimedia.org/T107306 [14:43:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:44:39] mobrovac: aqs1001 has been upgraded, will do 100[23] tomorrow if nothing comes up [14:45:07] oh nice elukey! [14:45:11] thnx! [14:45:34] (03PS3) 10Ottomata: Include analytics_cluster::hive::client role on analytics1030 to see if this fixes HiveSpark in cluster mode [puppet] - 10https://gerrit.wikimedia.org/r/297797 [14:48:41] TIL \! 
works with logmsgbot [14:49:19] (03PS4) 10Ottomata: Include analytics_cluster::hive::client role on analytics1030 to see if this fixes HiveSpark in cluster mode [puppet] - 10https://gerrit.wikimedia.org/r/297797 [14:49:32] (03CR) 10Ottomata: [C: 032 V: 032] Include analytics_cluster::hive::client role on analytics1030 to see if this fixes HiveSpark in cluster mode [puppet] - 10https://gerrit.wikimedia.org/r/297797 (owner: 10Ottomata) [14:49:42] godog: I wonder if !!log means don't log :p [14:50:43] hahah ostriches or log with some more urgency [14:51:24] (03PS4) 10Ema: package_builder: add WMF lintian vendor profile [puppet] - 10https://gerrit.wikimedia.org/r/297765 [14:51:33] godog: !logwithfeeling :p [14:51:39] (03CR) 10Ema: [C: 032 V: 032] package_builder: add WMF lintian vendor profile [puppet] - 10https://gerrit.wikimedia.org/r/297765 (owner: 10Ema) [14:52:41] File:Sting.ogg [14:52:50] also re: today's upgrade of logstash [14:57:02] 06Operations, 06Services, 13Patch-For-Review, 15User-mobrovac: Updates various services to nodejs 4.4.6 - https://phabricator.wikimedia.org/T138561#2437622 (10KartikMistry) >>! In T138561#2436493, @mobrovac wrote: >>>! In T138561#2436053, @KartikMistry wrote: >> @mobrovac deployment-sca02.eqiad.wmflabs and... [14:57:25] 06Operations, 06Services, 13Patch-For-Review, 15User-mobrovac: Updates various services to nodejs 4.4.6 - https://phabricator.wikimedia.org/T138561#2437623 (10KartikMistry) So, cxserver is OK with nodejs 4.4.6. [14:58:44] 06Operations, 06Services, 13Patch-For-Review, 15User-mobrovac: Updates various services to nodejs 4.4.6 - https://phabricator.wikimedia.org/T138561#2437624 (10mobrovac) [15:00:04] anomie, ostriches, thcipriani, hashar, twentyafterfour, and aude: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160707T1500). 
[15:00:04] kart_ and stephanebisson: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [15:00:13] o_O [15:00:33] I'm here [15:00:42] o/ [15:00:47] I'm too [15:01:13] kart_ is kart__ [15:01:28] I can SWAT today [15:01:46] (03PS5) 10Thcipriani: Deploy Compact Language Links as default (Stage 4) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/297349 (https://phabricator.wikimedia.org/T136677) (owner: 10KartikMistry) [15:01:53] * aude is at the airport and prefers not to swat today [15:02:01] but willing to swat other times [15:02:04] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/297349 (https://phabricator.wikimedia.org/T136677) (owner: 10KartikMistry) [15:02:15] thcipriani: usual test hosts sync first :) [15:02:40] (03Merged) 10jenkins-bot: Deploy Compact Language Links as default (Stage 4) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/297349 (https://phabricator.wikimedia.org/T136677) (owner: 10KartikMistry) [15:02:41] kart__: yup. will do. [15:03:36] hola, just throwing this out there, will announce to the list of SWATers later when I more wake up, but, I add the "test on mw1017/use X-Wikimedia-Debug" step to https://wikitech.wikimedia.org/wiki/SWAT_deploys#Doing_the_deploy [15:03:59] there are situations, I'm sure, where it's not possible/relevant, but, it's a good practice [15:04:28] greg-g: yup, agreed, saw a bit of discussion about that last night. [15:04:46] also, g'morning [15:05:15] morning! :) [15:05:27] kart__: patch is on mw1017, please test [15:05:46] thcipriani: sure [15:06:05] (03PS1) 10Ottomata: Include hive::client role in hadoop::worker role to get hive-site.xml and other deps [puppet] - 10https://gerrit.wikimedia.org/r/297799 [15:08:36] thcipriani: looks good. Go ahead. 
[15:08:43] kart__: ack, doing [15:11:16] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:297349|Deploy Compact Language Links as default (Stage 4) (T136677)]] PART I (duration: 00m 55s) [15:11:17] T136677: Deployment of Compact Language Links - https://phabricator.wikimedia.org/T136677 [15:11:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:11:57] !log thcipriani@tin Synchronized dblists/clldefault.dblist: SWAT: [[gerrit:297349|Deploy Compact Language Links as default (Stage 4) (T136677)]] PART II (duration: 00m 34s) [15:11:58] T136677: Deployment of Compact Language Links - https://phabricator.wikimedia.org/T136677 [15:12:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:12:02] ^ kart__ check please [15:12:05] (03PS1) 10Muehlenhoff: xenon: Use DOMAIN_NETWORKS [puppet] - 10https://gerrit.wikimedia.org/r/297800 [15:12:25] thcipriani: thanks. Testing [15:13:59] <_joe_> !log uploaded new HHVM package for jessie [15:14:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:16:30] stephanebisson: your change is live on mw1017 only, check please [15:16:35] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] giella-core: Initial Debian packaging [debs/contenttranslation/giella-core] - 10https://gerrit.wikimedia.org/r/294426 (https://phabricator.wikimedia.org/T120087) (owner: 10KartikMistry) [15:16:42] thcipriani: testing [15:17:03] thcipriani: all good. Sorry for late message. [15:17:13] kart__: np, thanks for testing :) [15:18:52] (03PS1) 10Chad: Phab: properly disable crons for maintenance [puppet] - 10https://gerrit.wikimedia.org/r/297802 (https://phabricator.wikimedia.org/T138460) [15:20:48] thcipriani: I don't see it, not sure what I'm doing wrong. I have the ff extension to target mw1017.equiad.wmnet AND debug=1... 
[15:21:27] (mw1017.eqiad.wmnet is the url I have) [15:22:02] (03PS2) 10Rush: Enable base::firewall for labtestmetal2001 [puppet] - 10https://gerrit.wikimedia.org/r/293712 (owner: 10Muehlenhoff) [15:22:55] thcipriani: wait, I know, I'm testing against fr.wp, which doesn't have this branch yet [15:23:11] (03PS1) 10Chad: Phab: make sure the mail crons have mysql-client installed [puppet] - 10https://gerrit.wikimedia.org/r/297803 [15:23:13] ahh, yeah, group1 only so far [15:23:27] PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 2 failures [15:24:00] thcipriani: Alright, sorry for the confusion. All good, thanks! [15:24:16] stephanebisson: np, kk, rolling out everywhere now [15:25:55] !log thcipriani@tin Synchronized php-1.28.0-wmf.9/extensions/Echo/modules/controller/mw.echo.Controller.js: SWAT: [[gerrit:297792|Correct section (alert/message/all)]] (duration: 00m 25s) [15:25:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:26:00] ^ stephanebisson check please [15:26:54] thcipriani: confirmed. [15:27:09] stephanebisson: great, thanks for checking! 
[15:29:38] (03CR) 10Alexandros Kosiaris: [C: 031] xenon: Use DOMAIN_NETWORKS [puppet] - 10https://gerrit.wikimedia.org/r/297800 (owner: 10Muehlenhoff) [15:36:17] 06Operations, 07Puppet, 13Patch-For-Review: Reconsider the aligning arrows puppet lint - https://phabricator.wikimedia.org/T137763#2437735 (10yuvipanda) 05Open>03declined [15:38:48] (03CR) 10Ottomata: [C: 032] Include hive::client role in hadoop::worker role to get hive-site.xml and other deps [puppet] - 10https://gerrit.wikimedia.org/r/297799 (owner: 10Ottomata) [15:44:36] !log Dropped logstash indices older than logstash-2016.07.01 in preparation for elasticsearch upgrade [15:44:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:46:58] RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:51:18] !log add mw1261 back into service [15:51:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:00:04] gehel: Dear anthropoid, the time has come. Please deploy logstash / kibana / elasticsearch upgrade (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160707T1600). [16:00:04] ebernhardson and bd808: A patch you scheduled for logstash / kibana / elasticsearch upgrade is about to be deployed. Please be available during the process. [16:00:04] godog, moritzm, and _joe_: Respected human, time to deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160707T1600). Please do the needful. [16:03:21] no puppet SWAT today [16:03:40] good luck gehel bd808 ebernhardson ! [16:04:12] godog: thanks! We'll need some! Not yet started and already some surprises [16:04:26] ow, nasty surprises? 
[16:10:45] godog: basically we now have invalid mapping / fields in all indexes, we'll probably need to drop the whole history [16:10:52] (03PS3) 10Rush: Enable base::firewall for labtestmetal2001 [puppet] - 10https://gerrit.wikimedia.org/r/293712 (owner: 10Muehlenhoff) [16:11:05] (03CR) 10Rush: [C: 031] "yeah makes sense" [puppet] - 10https://gerrit.wikimedia.org/r/293712 (owner: 10Muehlenhoff) [16:11:31] 06Operations, 10DBA, 10Phabricator, 13Patch-For-Review: Upgrade m3 (phabricator) db servers - https://phabricator.wikimedia.org/T138460#2437836 (10jcrespo) Are you sure they are still running?- I commented them on puppet and commented it from the server. See: ``` # HEADER: This file was autogenerated at 2... [16:14:07] 06Operations, 10DBA, 10Phabricator, 13Patch-For-Review: Upgrade m3 (phabricator) db servers - https://phabricator.wikimedia.org/T138460#2437858 (10demon) They were still running on phab2001, which was causing cronspam that @faidon alerted me to this morning. [16:14:26] (03CR) 10Jcrespo: [C: 04-1] "See my comment on: T138460#2437836" [puppet] - 10https://gerrit.wikimedia.org/r/297802 (https://phabricator.wikimedia.org/T138460) (owner: 10Chad) [16:16:33] 06Operations, 10DBA, 10Phabricator, 13Patch-For-Review: Upgrade m3 (phabricator) db servers - https://phabricator.wikimedia.org/T138460#2437887 (10jcrespo) phab2001 connects to m3-slave? That is even a worse problem! Are you using TLS?- the answer is no, because until now it did not work due to 5.5) [16:18:30] (03PS2) 10Chad: Phab: properly disable crons for maintenance [puppet] - 10https://gerrit.wikimedia.org/r/297802 (https://phabricator.wikimedia.org/T138460) [16:18:57] (03CR) 10Chad: "PS2 passes ensure from the role" [puppet] - 10https://gerrit.wikimedia.org/r/297802 (https://phabricator.wikimedia.org/T138460) (owner: 10Chad) [16:20:35] 06Operations, 10DBA, 10Phabricator, 13Patch-For-Review: Upgrade m3 (phabricator) db servers - https://phabricator.wikimedia.org/T138460#2437907 (10demon) >>! 
In T138460#2437887, @jcrespo wrote: > phab2001 connects to m3-slave? That is even a worse problem! Are you using TLS?- the answer is no, because unti... [16:20:43] (03CR) 10jenkins-bot: [V: 04-1] Phab: properly disable crons for maintenance [puppet] - 10https://gerrit.wikimedia.org/r/297802 (https://phabricator.wikimedia.org/T138460) (owner: 10Chad) [16:21:33] (03PS3) 10Chad: Phab: properly disable crons for maintenance [puppet] - 10https://gerrit.wikimedia.org/r/297802 (https://phabricator.wikimedia.org/T138460) [16:21:44] (03CR) 10Chad: "PS3 fixes puppetlint issue." [puppet] - 10https://gerrit.wikimedia.org/r/297802 (https://phabricator.wikimedia.org/T138460) (owner: 10Chad) [16:24:10] (03PS4) 10Jcrespo: Phab: properly disable crons for maintenance [puppet] - 10https://gerrit.wikimedia.org/r/297802 (https://phabricator.wikimedia.org/T138460) (owner: 10Chad) [16:24:51] !log starting elasticsearch and kibana upgrade on logstash cluster (T136001) [16:24:52] T136001: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001 [16:24:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:25:19] PROBLEM - check_payments_wiki on payments1005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:26:04] 06Operations, 10DBA, 10Phabricator, 13Patch-For-Review: Upgrade m3 (phabricator) db servers - https://phabricator.wikimedia.org/T138460#2437926 (10jcrespo) @demon I agree (you will get mails twice). Not a huge issue because db1048 is mostly up, only depooled still because I found some data differences with... [16:26:23] (03PS1) 10Eevans: Upgrade remaining rack 'd' Cassandra nodes to 2.2.6 [puppet] - 10https://gerrit.wikimedia.org/r/297809 (https://phabricator.wikimedia.org/T126629) [16:28:33] urandom: same as yesterday? 
--^ [16:28:38] (03CR) 10Jcrespo: [C: 032] Phab: properly disable crons for maintenance [puppet] - 10https://gerrit.wikimedia.org/r/297802 (https://phabricator.wikimedia.org/T138460) (owner: 10Chad) [16:28:52] elukey: yup! [16:28:56] that may fail, though [16:29:36] !log Disabling Puppet on restbase101[4-5].eqiad.wmnet : T126629 [16:29:37] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [16:29:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:29:48] (03PS1) 10Google: Revert "Add mw129[78] to the MediaWiki scap dsh list." [puppet] - 10https://gerrit.wikimedia.org/r/297810 [16:30:09] RECOVERY - check_payments_wiki on payments1005 is OK: HTTP OK: HTTP/1.1 200 OK - 241 bytes in 0.027 second response time [16:30:42] jynus: did you refer to cassandra or was it a coincindence? [16:30:48] (just double checking) [16:30:50] no [16:30:53] (03PS2) 10Google: Revert "Add mw129[78] to the MediaWiki scap dsh list." [puppet] - 10https://gerrit.wikimedia.org/r/297810 [16:30:53] to my merge [16:30:55] super [16:31:05] urandom: ready to merge? [16:31:12] elukey: yeah, that would be great! [16:31:16] (03PS2) 10Elukey: Upgrade remaining rack 'd' Cassandra nodes to 2.2.6 [puppet] - 10https://gerrit.wikimedia.org/r/297809 (https://phabricator.wikimedia.org/T126629) (owner: 10Eevans) [16:31:21] Google? [16:31:44] 06Operations, 10Wikimedia-SVG-rendering: SVG marker-mid with orient auto don't work (stops rendering subsequent elements) - https://phabricator.wikimedia.org/T117530#2437947 (10Menner) @MoritzMuehlenhoff: Look at the file history both have work arounds applied. You may test old version on [[ https://commons.... [16:32:18] https://gerrit.wikimedia.org/r/#/q/owner:%22Google+%253Cgoogle%2540legacy.ventures%253E%22,n,z ? [16:32:26] greg-g: I was about to ask the same [16:32:47] (03CR) 10Google: "Creative Commons License nope, it worked [16:33:27] well, at least they're CC:BY-SA licensing their spam? 
[16:33:35] win. [16:34:10] (03CR) 10Elukey: [C: 032] "Had a chat with Eric over IRC, puppet already disabled. Change looks good." [puppet] - 10https://gerrit.wikimedia.org/r/297809 (https://phabricator.wikimedia.org/T126629) (owner: 10Eevans) [16:34:12] (03CR) 10Google: [C: 04-1] "Creative Commons License jynus: Thanks for the merge, cronspam-- :D [16:34:22] also -1? [16:34:25] elukey: thanks! [16:35:07] 06Operations, 10DBA, 10Phabricator, 13Patch-For-Review: Upgrade m3 (phabricator) db servers - https://phabricator.wikimedia.org/T138460#2437957 (10jcrespo) Please continue working on phab architecture. There is already a slave on codfw: db2012 [16:35:15] urandom: merged! [16:35:17] ostriches, but that must have been sending 2 emails all the time [16:35:27] please followup [16:35:36] T138460#2437957 [16:35:36] T138460: Upgrade m3 (phabricator) db servers - https://phabricator.wikimedia.org/T138460 [16:35:44] 06Operations, 10ops-codfw: codfw: return one intel ssd to dasher for warranty replacement - https://phabricator.wikimedia.org/T132210#2437958 (10Papaul) 05Open>03Resolved Received disk replacement {F4249881} [16:36:04] db1048 is almost ready (it is up and working) [16:36:47] but I saw 3 rows different from db1043 [16:36:53] when I fix those issues [16:36:55] (03CR) 10Google: "Creative Commons License we will failover db1043 to db1048 [16:37:29] (03CR) 10Google: [C: 04-1] "Creative Commons License (03PS3) 10Google: Revert "Add mw129[78] to the MediaWiki scap dsh list." 
[puppet] - 10https://gerrit.wikimedia.org/r/297810 [16:38:54] (03PS1) 10Chad: Phabricator: Don't run dumps or mail scripts from non-primary host [puppet] - 10https://gerrit.wikimedia.org/r/297813 [16:38:58] !log Upgrading Cassandra to 2.2.6-wmf1 on restbase1014.eqiad.wmnet : T126629 [16:38:59] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [16:39:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:39:14] jynus: Follows up in 297813 ^^^^ [16:39:22] !log Restarting Cassandra instance restbase1014-a.eqiad.wmnet : T126629 [16:39:23] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [16:39:23] 06Operations, 10Wikimedia-Apache-configuration, 07HHVM, 07Wikimedia-log-errors: Fix Apache proxy_fcgi error "Invalid argument: AH01075: Error dispatching request to" (Causing HTTP 503) - https://phabricator.wikimedia.org/T73487#2437994 (10elukey) For anybody that wants to help, there is a complete error lo... [16:39:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:39:28] the hell is that "Google" user doing? [16:39:52] thanks [16:39:59] want me to deploy? [16:40:33] jynus: Should be fine to keep them from coming back later :) [16:41:24] I assume that is a yes? [16:41:32] Sorry, I didn't understand [16:41:39] Yeah, let's do it [16:41:48] Krenair: User only 4 days old (July 3) [16:41:52] Probably spam [16:42:00] shall I disable it? :/ [16:42:02] (03CR) 10Jcrespo: [C: 032] Phabricator: Don't run dumps or mail scripts from non-primary host [puppet] - 10https://gerrit.wikimedia.org/r/297813 (owner: 10Chad) [16:42:26] On wikitech? Sure. I'll disable the gerrit user. [16:42:33] cc MatmaRex ^^ [16:42:42] I can do the gerrit user too [16:42:45] but ok [16:43:17] I asked a few people in releng for review, how could I know there was a "secret" phabricator host? [16:43:58] will labs test have the same problems? [16:44:16] the phab-VMs?
[16:44:30] jynus: Probably. I didn't see the cron disabling. And I wouldn't call it "secret", the role is in use in site.pp [16:44:35] But yeah, easily missed. [16:44:38] those jobs should be included only in the prod role and not the labs role [16:44:49] So labs should be fine? Ok [16:44:51] Hi, someone is creating a google username at https://gerrit.wikimedia.org/r/#/c/297810/ [16:44:54] sounds like a bot [16:44:57] But yeah, phab2001 was easily missed until the crons failed. [16:44:57] of google's [16:44:59] none of the phab VMs use a working puppet role currently. there is a ticket and work on changing that [16:45:01] chasemp, with "should" are you guessing or advising? [16:45:03] paladox: Already blocked. [16:45:04] It's spam [16:45:07] Ok thanks [16:45:20] ostriches: thanks, i was about to paste the same thing.. wth :) [16:45:27] Could the patch be abandoned please [16:45:29] jynus: kind of both, as I understand it anyone affected was doing things incorrectly [16:46:02] (03Abandoned) 10Chad: Revert "Add mw129[78] to the MediaWiki scap dsh list." [puppet] - 10https://gerrit.wikimedia.org/r/297810 (owner: 10Google) [16:46:12] Thanks ostriches [16:46:38] 06Operations, 10GlobalRename, 10MediaWiki-extensions-CentralAuth, 13Patch-For-Review, and 3 others: GlobalRename gets stuck sometimes - https://phabricator.wikimedia.org/T137973#2438024 (10Anomie) Looks like the serializing of the jobs isn't quite working. For example, ```name=runJobs.log 2016-07-07 04:27:... [16:47:11] 06Operations, 10Wikimedia-Apache-configuration, 07HHVM, 07Wikimedia-log-errors: Fix Apache proxy_fcgi error "Invalid argument: AH01075: Error dispatching request to" (Causing HTTP 503) - https://phabricator.wikimedia.org/T73487#2438026 (10Joe) @elukey it would be useful to save the corresponding access log...
[16:47:25] !log stopping pc2006 for hardware maintenance T139283 [16:47:26] T139283: pc2006 down - https://phabricator.wikimedia.org/T139283 [16:47:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:47:37] jynus: fyi: I just finished cleaning up all of the Gerrit puppet. It's on my todo list to clean up Phabricator's some too. [16:47:44] It's pretty convoluted [16:47:49] it is ok [16:47:52] if you told me [16:48:00] better than average [16:48:11] Could be better though :D [16:48:34] !log Restarting Cassandra instance restbase1014-b.eqiad.wmnet : T126629 [16:48:35] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [16:48:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:48:56] the issue is, reviews are for "hey, is this enough, I do not know the service well enough" [16:49:36] maybe iridium-codfw is recent? [16:49:47] 06Operations, 06Discovery, 10Wikimedia-Logstash, 03Discovery-Search-Sprint, 07Epic: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001#2438041 (10Gehel) Invalid fields / mappings have appeared in the latest index as well. The de-dotting did not work... [16:49:49] I didn't know there was an iridium in codfw [16:49:51] it is, a month or so I think [16:50:00] no, just phab2001 (aka, the iridium of codfw) [16:50:02] I thought it was iridium.eqiad.wmnet and phab2001.codfw.wmnet [16:50:08] Oh, names are hard :) [16:50:46] * ebernhardson votes to use project names for all the things. kill the snowflakes :P [16:51:07] T137928 isn't even resolved yet?
[16:51:08] T137928: Deploy phabricator to phab2001.codfw.wmnet - https://phabricator.wikimedia.org/T137928 [16:51:14] greg-g, so that is where the confusion came from [16:51:22] I assumed there was only one server [16:51:27] * greg-g nods [16:51:33] but please [16:51:44] do not do cross-dc requests [16:51:56] it must be linked to its local db [16:52:01] I will comment on the task [16:52:23] papaul, pc2006 should be down already [16:52:25] Yeah, that needs fixing for sure. [16:52:30] sorry for the delay [16:55:03] right now, there are 2 dbs on eqiad, but only 1 on codfw [16:55:40] !log Restarting Cassandra instance restbase1014-c.eqiad.wmnet : T126629 [16:55:41] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [16:55:42] although there is a larger redundancy on phab* dbs taking into account the analytics and delayed slaves [16:55:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:57:05] jynus: We should easily be able to set the variables for which DB to use in hiera. I'll work up a patch on that next.
[16:57:21] Re: pc2006, you may see some errors regarding the parsercache on *codfw* [16:57:25] jynus: ok [16:57:30] mediawiki errors [16:57:37] I prefered not to depool while on maintenance [16:57:51] PROBLEM - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/page/mobile-sections-lead/{title} (retrieve lead section of en.wp San Francisco page via mobile-sections-lead) is CRITICAL: Test retrieve lead section of en.wp San Francisco page via mobile-sections-lead returned the unexpected status 500 (expecting: 200) [16:57:55] because it may contaminate production pc1* hosts on eqiad [16:58:24] we (I) have to fix a better HA model for parsercaches [16:58:31] probably involving a proxy [16:58:59] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: /{domain}/v1/page/mobile-sections-lead/{title} (retrieve lead section of en.wp San Francisco page via mobile-sections-lead) is CRITICAL: Test retrieve lead section of en.wp San Francisco page via mobile-sections-lead returned the unexpected status 500 (expecting: 200) [16:59:30] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:00:04] yurik, gwicke, cscott, arlolra, and subbu: Respected human, time to deploy Services – Graphoid / Parsoid / OCG / Citoid (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160707T1700). Please do the needful. [17:02:55] (03CR) 10Google: "Creative Commons License ostriches Google has just replied ^^ [17:06:34] (03PS1) 10Google: Revert "Add mw129[78] to the MediaWiki scap dsh list." [puppet] - 10https://gerrit.wikimedia.org/r/297819 [17:07:05] (03Abandoned) 10Google: Revert "Add mw129[78] to the MediaWiki scap dsh list." 
[puppet] - 10https://gerrit.wikimedia.org/r/297819 (owner: 10Google) [17:07:22] ^^ google keeps spamming [17:07:32] <_joe_> I think it's learning how to use our gerrit [17:07:37] paladox: thanks we can see that [17:07:45] <_joe_> next step will be it will start contributing code [17:07:47] Oh sorry [17:07:58] should we rename the account skynet? [17:07:59] <_joe_> skynet has finally gained conscience! [17:08:15] _joe_: jinx! [17:08:16] <_joe_> bd808: this clearly shows how unoriginal nerds are [17:08:18] no parsoid deploy today. [17:08:30] <_joe_> subbu: maybe google has different ideas :P [17:08:30] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [17:08:41] lol [17:08:43] _joe_: I only copy what I find on StackOverflow and movies [17:09:04] <_joe_> bd808: that's our next t-shirt, right? [17:09:38] "I broke Wikipedia, but SO told me how to fix it!" [17:09:41] RECOVERY - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is OK: All endpoints are healthy [17:10:56] I broke SO, but Wikipedia told me what SQL server is and I fixed it? [17:11:17] <_joe_> jynus: s/fixed it/started running/ [17:11:20] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy [17:11:59] gerrit> update accounts set full_name = 'Skynet Spammer' where full_name = 'Google'; [17:11:59] UPDATE 1; 3 ms [17:12:44] 06Operations, 10ops-codfw, 10DBA: pc2006 down - https://phabricator.wikimedia.org/T139283#2438116 (10Papaul) Bios update from 1.5.4 to 2.1.7 [17:12:56] jynus: i am done with the BIOS update [17:12:56] !log gerrit: flush all caches to pick up account disable & rename [17:13:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:13:02] It's email is legacy.ventures [17:13:21] papaul, thank you, can you start it? 
[17:13:44] jynus: it is already up [17:13:51] great [17:18:37] (03PS2) 10Dzahn: xenon: Use DOMAIN_NETWORKS [puppet] - 10https://gerrit.wikimedia.org/r/297800 (owner: 10Muehlenhoff) [17:21:43] (03PS4) 10Merlijn van Deen: [DO NOT SUBMIT] test for tool labs puppet compiler [puppet] - 10https://gerrit.wikimedia.org/r/254183 [17:21:51] ori: ^ it means xenon can be reached from prod and labs. good, right [17:24:09] PROBLEM - Start and verify pages via webservices on kubernetes on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/webservice/kubernetes - 185 bytes in 41.991 second response time [17:26:01] Excepts on Special:CentralAuth [17:26:02] er [17:26:05] Exceptions* [17:26:11] RECOVERY - Start and verify pages via webservices on kubernetes on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 31.903 second response time [17:26:30] And other pages such as login ones apparently? [17:27:33] "[V36QjgpAEK0AAD5WDMcAAAAS] 2016-07-07 17:25:34: Fatal exception of type "Exception" " [17:28:02] hopefully that's not a special string related to me, but that's what I got [17:28:28] no, it's just an id for looking in the server side logs [17:28:37] Oh okay, good. [17:28:38] :D [17:29:46] I didn't get one when looking at my CentralAuth page though... 
[17:36:23] 06Operations, 10Traffic, 13Patch-For-Review: Make upload.wikimedia.org cookieless - https://phabricator.wikimedia.org/T137609#2373208 (10Milimetric) [17:46:29] (03PS1) 10Yuvipanda: dynamicproxy: do not override nginx.conf [puppet] - 10https://gerrit.wikimedia.org/r/297829 (https://phabricator.wikimedia.org/T134383) [17:46:49] (03PS2) 10Yuvipanda: dynamicproxy: do not override nginx.conf [puppet] - 10https://gerrit.wikimedia.org/r/297829 (https://phabricator.wikimedia.org/T134383) [17:47:04] (03PS1) 10Chad: Phabricator: Minor nit, move things that aren't templates out of templates [puppet] - 10https://gerrit.wikimedia.org/r/297830 [17:47:39] 06Operations, 10Traffic, 10fundraising-tech-ops: Fix nits in Fundraising HTTPS/HSTS configs in wikimedia.org domain - https://phabricator.wikimedia.org/T137161#2438288 (10Jgreen) [17:48:03] 06Operations, 10Traffic, 10fundraising-tech-ops: Fix nits in Fundraising HTTPS/HSTS configs in wikimedia.org domain - https://phabricator.wikimedia.org/T137161#2359459 (10Jgreen) [17:49:01] (03CR) 10Dzahn: [C: 032] xenon: Use DOMAIN_NETWORKS [puppet] - 10https://gerrit.wikimedia.org/r/297800 (owner: 10Muehlenhoff) [17:49:17] (03CR) 10jenkins-bot: [V: 04-1] Phabricator: Minor nit, move things that aren't templates out of templates [puppet] - 10https://gerrit.wikimedia.org/r/297830 (owner: 10Chad) [17:50:10] PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 1 failures [17:53:36] (03CR) 10Gehel: Update kibana module for kibana 4 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/296279 (https://phabricator.wikimedia.org/T129138) (owner: 10EBernhardson) [17:54:08] 06Operations, 10ops-codfw, 10DBA: pc2006 down - https://phabricator.wikimedia.org/T139283#2438327 (10jcrespo) ``` /admin1-> racadm getsel Record: 1 Date/Time: 12/18/2015 20:35:18 Source: system Severity: Ok Description: Log cleared. -------------------------------------------------------------... 
[17:56:18] (03PS9) 10EBernhardson: Update kibana module for kibana 4 [puppet] - 10https://gerrit.wikimedia.org/r/296279 (https://phabricator.wikimedia.org/T129138) [17:56:39] 06Operations, 10Wikimedia-Apache-configuration, 07HHVM, 07Wikimedia-log-errors: Fix Apache proxy_fcgi error "Invalid argument: AH01075: Error dispatching request to" (Causing HTTP 503) - https://phabricator.wikimedia.org/T73487#2438345 (10elukey) Yes definitely! mw1261.eqiad.wmnet:/home/elukey/error_log_... [17:56:43] (03PS1) 10Yuvipanda: dynamicproxy: Use http2 rather than spdy [puppet] - 10https://gerrit.wikimedia.org/r/297832 (https://phabricator.wikimedia.org/T134383) [17:57:01] (03CR) 10Dzahn: "@Alex well.. i have provisioned ports.conf via puppet in another module before to solve the same issue. so we can also do that." [puppet] - 10https://gerrit.wikimedia.org/r/297727 (https://phabricator.wikimedia.org/T132661) (owner: 10Dzahn) [17:57:29] (03PS2) 10Yuvipanda: dynamicproxy: Use http2 rather than spdy [puppet] - 10https://gerrit.wikimedia.org/r/297832 (https://phabricator.wikimedia.org/T134383) [17:59:26] (03CR) 10EBernhardson: "updated per code review. Pulled to deployment-puppetmaster and ran on deployment-logstash3, looks to work as expected." [puppet] - 10https://gerrit.wikimedia.org/r/296279 (https://phabricator.wikimedia.org/T129138) (owner: 10EBernhardson) [18:02:28] (03PS2) 10Chad: Phab: make sure the mail crons have mysql-client installed [puppet] - 10https://gerrit.wikimedia.org/r/297803 [18:07:02] !log Upgrading Cassandra to 2.2.6-wmf1 on restbase1015.eqiad.wmnet : T126629 [18:07:03] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [18:07:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:07:17] 06Operations, 10Traffic, 10fundraising-tech-ops: Fix nits in Fundraising HTTPS/HSTS configs in wikimedia.org domain - https://phabricator.wikimedia.org/T137161#2438394 (10Jgreen) [18:08:06] Hi. 
Can I get an approval to perform a bigdelete on plwiki for 35,000+ revisions? [18:08:43] !log Restarting Cassandra instance restbase1015-a.eqiad.wmnet : T126629 [18:08:44] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [18:08:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:11:54] !log Restarting Cassandra instance restbase1015-b.eqiad.wmnet : T126629 [18:11:55] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [18:11:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:15:23] !log Cassandra 2.2.6 upgrade of restbase1015.eqiad.wmnet instances complete : T126629 [18:15:24] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [18:15:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:17:31] RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:25:30] 06Operations, 10ops-codfw, 10DBA: pc2006 down - https://phabricator.wikimedia.org/T139283#2438530 (10jcrespo) I see no errors either on the web interface. Should we plan a general upgrade of all affected machines, or should we wait in case it fails again? 
[18:27:28] !log Disabling Puppet on RESTBase codfw nodes : T126629 [18:27:29] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [18:27:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:31:37] 06Operations, 10Traffic, 10fundraising-tech-ops: Fix nits in Fundraising HTTPS/HSTS configs in wikimedia.org domain - https://phabricator.wikimedia.org/T137161#2438550 (10Jgreen) [18:32:43] (03PS1) 10Chad: Gerrit: Puppetize the known_hosts file for replication [puppet] - 10https://gerrit.wikimedia.org/r/297837 [18:32:45] (03PS1) 10Eevans: Upgrade codfw nodes to Cassandra 2.2 [puppet] - 10https://gerrit.wikimedia.org/r/297838 (https://phabricator.wikimedia.org/T126629) [18:33:06] (03CR) 10Dzahn: "true yea.. we are in the middle of that upgrade (ytterbium -> lead)" [puppet] - 10https://gerrit.wikimedia.org/r/297723 (https://phabricator.wikimedia.org/T132661) (owner: 10Dzahn) [18:34:33] (03CR) 10Dzahn: "we can also just put an "if apache version" conditional around it in the .erb template" [puppet] - 10https://gerrit.wikimedia.org/r/297723 (https://phabricator.wikimedia.org/T132661) (owner: 10Dzahn) [18:34:46] (03CR) 10Chad: "Will moving to jessie automatically get us 2.4?" [puppet] - 10https://gerrit.wikimedia.org/r/297723 (https://phabricator.wikimedia.org/T132661) (owner: 10Dzahn) [18:34:55] mutante: ping? [18:36:47] 06Operations, 06Commons, 10media-storage: Some fonts not anti-aliasing in SVG thumbnails after upgrade of scaling servers - https://phabricator.wikimedia.org/T139543#2438554 (10kaldari) @MoritzMuehlenhoff: https://upload.wikimedia.org/wikipedia/commons/thumb/2/2c/MediaWiki_SVG_fonts.svg/860px-MediaWiki_SVG_f... 
[18:39:01] (03CR) 10Dzahn: "yes (and we may have to put some conditionals in the template anyways to make it compatible with both 2.2 and 2.4 (or switch all at once t" [puppet] - 10https://gerrit.wikimedia.org/r/297723 (https://phabricator.wikimedia.org/T132661) (owner: 10Dzahn) [18:39:11] urandom: hello [18:39:24] mutante: greets [18:39:44] mutante: did godog get with you about a potential puppet +2 for Cassandra upgrades? [18:39:57] he mentioned he might, given your favorable timezone [18:40:01] urandom: is it the same thing that we did before? [18:40:10] then yea [18:40:28] mutante: heh, not sure. you merged some changes for adding instances, this is different [18:40:40] i meant "adding instances" yea [18:40:47] ok; this is not that [18:40:49] then..no [18:40:57] ok [18:43:03] urandom: is it about upgrading cassandra to 2.2 ? [18:43:08] yes [18:43:40] so i see a hiera change that basically just sets the version to 2.2 for each hostname [18:43:52] mutante: right [18:44:05] that would be fine .. if somebody tells me it is the right time [18:44:13] but do you really want to change them all at the same time [18:44:21] i have puppet disabled [18:44:51] we did it piecemeal for eqiad out of caution, i'm going to go through the codfw pretty quick though [18:45:15] and i'm optimizing for the effort in finding someone to +2 :) [18:45:18] ah, ok. and you would be doing that like right now ? [18:45:23] yup [18:45:37] it's pretty routine at this point [18:45:41] (03PS1) 10Muehlenhoff: ocg: Restrict to DOMAIN_NETWORKS [puppet] - 10https://gerrit.wikimedia.org/r/297840 [18:45:43] i did all of eqiad [18:45:52] and codfw is not active atm [18:46:25] i.e. even if it went south, there should be no fallout [18:46:33] * urandom knocks on wood [18:46:45] ok, just one more question.. shouldn't there be "cassandra::target_version" in the eqiad ones? 
[18:46:56] the yaml files i mean [18:46:58] there is [18:47:04] there already is [18:47:46] indeed there is, pebcak [18:47:59] (03CR) 10Dzahn: [C: 032] Upgrade codfw nodes to Cassandra 2.2 [puppet] - 10https://gerrit.wikimedia.org/r/297838 (https://phabricator.wikimedia.org/T126629) (owner: 10Eevans) [18:48:11] mutante: thank you sir! [18:48:45] ok, and now it should be active [18:49:10] yup, now i can continue with my tedium! :) [18:49:28] ok, cool [18:51:29] let the !log know how it's going [18:51:35] heh [18:53:55] !log Upgrading Cassandra to 2.2.6-wmf1 on restbase2001.codfw.wmnet : T126629 [18:53:56] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [18:53:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:55:59] !log Restarting Cassandra instance restbase2001-a.codfw.wmnet : T126629 [18:56:00] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [18:56:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:58:03] (03CR) 10: "Creative Commons License 06Operations, 10ops-codfw, 10DBA: pc2006 down - https://phabricator.wikimedia.org/T139283#2438642 (10Papaul) I think we can plan a general upgrade since it takes not more than 5 minutes to do the upgrade on a system. I will check and see how many systems are affected. [19:00:04] twentyafterfour: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160707T1900). Please do the needful. 
[19:02:28] !log Restarting Cassandra instance restbase2001-b.codfw.wmnet : T126629 [19:02:29] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [19:02:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:05:41] !log Restarting Cassandra instance restbase2001-c.codfw.wmnet : T126629 [19:05:42] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [19:05:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:08:18] !log Cassandra upgrade of restbase2001.codfw.wmnet instances complete : T126629 [19:08:19] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [19:08:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:09:17] (03PS3) 10Nuria: Seeting up log retention policy in yarn [puppet/cdh] - 10https://gerrit.wikimedia.org/r/297643 (https://phabricator.wikimedia.org/T139178) [19:09:19] !log Upgrading Cassandra to 2.2.6-wmf1 on restbase2002.codfw.wmnet : T126629 [19:09:20] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [19:09:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:10:20] (03CR) 10Nuria: Seeting up log retention policy in yarn (031 comment) [puppet/cdh] - 10https://gerrit.wikimedia.org/r/297643 (https://phabricator.wikimedia.org/T139178) (owner: 10Nuria) [19:11:35] !log Restarting Cassandra instance restbase2002-a.codfw.wmnet : T126629 [19:11:36] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [19:11:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:12:54] !log gerrit: force all users to log out. 
sorry ❤️ [19:12:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:13:30] * MatmaRex stabs ostriches <3 [19:13:36] sorry 'bout it [19:13:45] !log Restarting Cassandra instance restbase2002-b.codfw.wmnet : T126629 [19:13:45] was that really the only way D: [19:13:46] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [19:13:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:14:00] MatmaRex: Yeah. I dunno how else to kill his session. [19:14:09] It's stored in a gerrit cache. [19:14:14] freaking gerrit. [19:14:27] Could be worse. [19:14:30] Coulda had to restart [19:16:06] heh [19:16:43] !log Restarting Cassandra instance restbase2002-c.codfw.wmnet : T126629 [19:16:44] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [19:16:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:19:00] 06Operations, 10ops-codfw, 10DBA: pc2006 down - https://phabricator.wikimedia.org/T139283#2438784 (10Papaul) please see below for servers that we need to upgrade This affects all PowerEdge R730 and R630 es2011 es2012 es2013 es2014 es2015 es2016 es2017 done es2018 es2019 done pc2004 pc2005 pc2006 done [19:19:34] !log Upgrade of restbase2002.codfw.wmnet instances complete : T126629 [19:19:35] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [19:19:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:22:29] !log Upgrading Cassandra to 2.2.6-wmf1 on restbase2007.codfw.wmnet : T126629 [19:22:29] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [19:22:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:24:38] !log Restarting Cassandra instance restbase2007-a.codfw.wmnet : T126629 [19:24:39] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 
[19:24:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:24:47] * aude waves [19:27:10] !log Restarting Cassandra instance restbase2007-b.codfw.wmnet : T126629 [19:27:11] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [19:27:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:27:22] did we deploy the train yet? [19:27:53] not yet [19:28:03] we're still in a meeting, coming (still within the window) :) [19:28:10] ok [19:28:24] * aude is on an airplane :) [19:29:57] !log Restarting Cassandra instance restbase2007-c.codfw.wmnet : T126629 [19:29:57] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [19:30:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:31:00] greg-g, ostriches, twentyafterfour can we have a special window to deploy some Echo stuff to 1.28.0-wmf.9 ? We want to fix some serious user-noticeable rendering problems. If so, I'll coordinate with twentyafterfour. [19:31:55] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [19:31:55] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [19:32:56] !log Upgrade of restbase2007.codfw.wmnet instances complete : T126629 [19:32:57] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [19:33:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:33:10] matt_flaschen: pre it (wmf.9) goign to wikipedias? [19:33:22] greg-g, yes, preferably. [19:33:30] twentyafterfour: ^ [19:33:36] matt_flaschen: you can probably do it now [19:33:51] Okay, thanks. 
[19:33:54] I dont' think twentyafterfour has started with the patches for today's roll (we just got out of a meeting) [19:34:11] ignore my fingers lack of exactness [19:36:43] (03CR) 10Ottomata: [C: 032] Seeting up log retention policy in yarn [puppet/cdh] - 10https://gerrit.wikimedia.org/r/297643 (https://phabricator.wikimedia.org/T139178) (owner: 10Nuria) [19:36:53] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [19:36:53] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [19:37:12] James_F: can we merge https://gerrit.wikimedia.org/r/#/c/296901/ [19:38:04] the content is fine, more asking because there might be some deployment step [19:38:30] (03PS3) 10Krinkle: Remove unused deprecated $wgStyleSheetPath [mediawiki-config] - 10https://gerrit.wikimedia.org/r/297511 [19:39:50] anyone for integration/raita repo ? https://gerrit.wikimedia.org/r/#/c/296879/ [19:40:18] also just getting those git.wm links fixed [19:44:05] greg-g, they're all gating now. [19:44:48] (03PS2) 10Dzahn: Phabricator: Minor nit, move things that aren't templates out of templates [puppet] - 10https://gerrit.wikimedia.org/r/297830 (owner: 10Chad) [19:45:13] !log Upgrading Cassandra to 2.2.6-wmf1 on restbase2003.codfw.wmnet : T126629 [19:45:14] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [19:45:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:45:36] http://snapshot.debian.org/package/php5/5.3.10-2/ [19:45:43] Woops wrong place [19:46:24] matt_flaschen: let me know when you're all done :) [19:46:38] mutante: Merged. 
:-) [19:46:50] !log Restarting Cassandra instance restbase2003-a.codfw.wmnet : T126629 [19:46:51] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [19:46:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:46:55] twentyafterfour, will do, thank you. [19:47:47] James_F: thanks :) [19:50:36] !log Restarting Cassandra instance restbase2003-b.codfw.wmnet : T126629 [19:50:37] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [19:50:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:53:42] !log Cassandra upgrade of restbase2003.codfw.wmnet instances complete : T126629 [19:53:43] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [19:53:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:55:11] !log Upgrading Cassandra to 2.2.6-wmf1 on restbase2004.codfw.wmnet : T126629 [19:55:12] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [19:55:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:55:50] !log Restarting Cassandra instance restbase2004-a.codfw.wmnet : T126629 [19:55:51] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629 [19:55:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:56:15] urandom: The title of that task should probably be changed :P [19:56:55] Reedy: yeah :/ [19:57:12] * Reedy fixes [19:57:25] !log Restarted logstash on logstash1003; hoping to clear up missing de-dot errors [19:57:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:57:39] it's kind of right [19:57:58] the main problem being that it covers too much [19:58:11] Is everything not being upgraded to 2.2.6? 
[19:58:18] well, it is *now* [19:58:24] but there was an upgrade to 2.1.13 [19:58:34] that's what history is for ;) [19:58:38] ya [19:59:11] !log Restarting Cassandra instance restbase2004-b.codfw.wmnet : T126629 [19:59:12] T126629: Cassandra 2.2.6 - https://phabricator.wikimedia.org/T126629 [19:59:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:59:49] !log mattflaschen@tin Synchronized php-1.28.0-wmf.9/extensions/Flow/includes/Notifications/PostReplyPresentationModel.php: flow-post-reply: show compact header on one line (duration: 00m 32s) [19:59:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:01:00] !log Cassandra upgrade of restbase2004.codfw.wmnet instances complete : T126629 [20:01:02] T126629: Cassandra 2.2.6 - https://phabricator.wikimedia.org/T126629 [20:01:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:01:41] !log Upgrading Cassandra to 2.2.6-wmf1 on restbase2008.codfw.wmnet : T126629 [20:01:44] T126629: Cassandra 2.2.6 - https://phabricator.wikimedia.org/T126629 [20:01:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:02:18] !log mattflaschen@tin Synchronized php-1.28.0-wmf.9/extensions/Echo: Fixes for notification sorting and message parsing (duration: 00m 38s) [20:02:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:02:54] !log Restarting Cassandra instance restbase2008-a.codfw.wmnet : T126629 [20:02:54] T126629: Cassandra 2.2.6 - https://phabricator.wikimedia.org/T126629 [20:02:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:03:14] greg-g, twentyafterfour, done. Thank you. 
[20:03:57] (PS3) Dzahn: Phabricator: Minor nit, move things that aren't templates out of templates [puppet] - https://gerrit.wikimedia.org/r/297830 (owner: Chad) [20:04:21] (CR) Dzahn: [C: 2] "no-op http://puppet-compiler.wmflabs.org/3291/" [puppet] - https://gerrit.wikimedia.org/r/297830 (owner: Chad) [20:04:45] !log Restarting Cassandra instance restbase2008-b.codfw.wmnet : T126629 [20:04:46] T126629: Cassandra 2.2.6 - https://phabricator.wikimedia.org/T126629 [20:04:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:07:00] !log Cassandra upgrade of restbase2008.codfw.wmnet instances complete : T126629 [20:07:00] T126629: Cassandra 2.2.6 - https://phabricator.wikimedia.org/T126629 [20:07:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:07:08] !log Restarted logstash on logstash1002; hoping to clear up missing de-dot errors [20:07:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:09:20] (CR) Dzahn: "recheck" [puppet] - https://gerrit.wikimedia.org/r/297830 (owner: Chad) [20:09:38] leroy [20:10:13] !log Restarted logstash on logstash1001; hoping to clear up missing de-dot errors [20:10:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:13:35] (PS1) 20after4: all wikis to 1.28.0-wmf.9 [mediawiki-config] - https://gerrit.wikimedia.org/r/297860 [20:14:19] (CR) 20after4: [C: 2] all wikis to 1.28.0-wmf.9 [mediawiki-config] - https://gerrit.wikimedia.org/r/297860 (owner: 20after4) [20:15:05] (Merged) jenkins-bot: all wikis to 1.28.0-wmf.9 [mediawiki-config] - https://gerrit.wikimedia.org/r/297860 (owner: 20after4) [20:15:25] !log logstash upgrade aborted, rescheduled to Monday July 11th [20:15:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:15:46] (CR) jenkins-bot: [V: -1] Phabricator: Minor nit, move things that
aren't templates out of templates [puppet] - https://gerrit.wikimedia.org/r/297830 (owner: Chad) [20:17:28] gehel: bd808 ebernhardson :( sorry it aborted. thanks for trying [20:17:51] we'll get there [20:18:08] greg-g: thanks! It did not go that bad, just takes much more time than we have available right now. [20:18:24] "good" :) [20:18:37] but seriously, thank you [20:18:38] and we did not break anything (yet) [20:18:46] * greg-g knocks on wood [20:18:55] pplint-HEAD fail out of nowhere after rebase [20:18:57] hrmm [20:19:14] oh, nevermind [20:20:30] (PS4) Dzahn: Phabricator: Minor nit, move things that aren't templates out of templates [puppet] - https://gerrit.wikimedia.org/r/297830 (owner: Chad) [20:24:44] I'm stepping out very briefly. stephanebisson will be available, and I'm available on Hangout. [20:24:56] Puppet, Labs, Phabricator: Phabricator labs puppet role configures phabricator wrong - https://phabricator.wikimedia.org/T131899#2439038 (mmodell) a:mmodell>demon Since you're working on the phab puppet stuff [20:26:00] !log Upgrading Cassandra to 2.2.6-wmf1 on restbase2005.codfw.wmnet : T126629 [20:26:01] T126629: Cassandra 2.2.6 - https://phabricator.wikimedia.org/T126629 [20:26:01] !log deploying wmf.9 to all wikis refs T138555 [20:26:02] T138555: MW-1.28.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T138555 [20:26:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:26:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:27:10] !log twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.28.0-wmf.9 [20:27:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:27:31] !log Restarting Cassandra instance restbase2005-a.codfw.wmnet : T126629 [20:27:32] T126629: Cassandra 2.2.6 - https://phabricator.wikimedia.org/T126629 [20:27:35] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:30:39] !log Restarting Cassandra instance restbase2005-b.codfw.wmnet : T126629 [20:30:40] T126629: Cassandra 2.2.6 - https://phabricator.wikimedia.org/T126629 [20:30:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:31:56] (PS1) Chad: Gerrit: Block a really bad person [puppet] - https://gerrit.wikimedia.org/r/297863 [20:32:08] mutante: Can I get a quick merge on ^^^^ [20:33:02] (CR) Rush: [C: 2] Gerrit: Block a really bad person [puppet] - https://gerrit.wikimedia.org/r/297863 (owner: Chad) [20:33:18] (CR) Rush: [V: 2] Gerrit: Block a really bad person [puppet] - https://gerrit.wikimedia.org/r/297863 (owner: Chad) [20:33:35] no need to force-merge :) [20:33:41] been following along ostriches I got it [20:33:49] thx [20:33:51] !log Cassandra upgrade of restbase2005.codfw.wmnet instances complete : T126629 [20:33:52] T126629: Cassandra 2.2.6 - https://phabricator.wikimedia.org/T126629 [20:33:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:33:59] i was waiting for the other one to get a jenkins vote [20:34:08] repeating that [20:34:17] (PS5) Dzahn: Phabricator: Minor nit, move things that aren't templates out of templates [puppet] - https://gerrit.wikimedia.org/r/297830 (owner: Chad) [20:34:24] Back [20:35:11] sure I can understand waiting on jenkins I do nearly all the time but I could see it was fine so I made a judgement call [20:36:18] (PS1) Nuria: Upgrading cdh module [puppet] - https://gerrit.wikimedia.org/r/297873 [20:36:31] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/, ref HEAD..readonly/master).
[20:36:32] !log Upgrading Cassandra to 2.2.6-wmf1 on restbase2006.codfw.wmnet : T126629 [20:36:33] T126629: Cassandra 2.2.6 - https://phabricator.wikimedia.org/T126629 [20:36:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:37:25] (CR) Dzahn: [C: 2] Phabricator: Minor nit, move things that aren't templates out of templates [puppet] - https://gerrit.wikimedia.org/r/297830 (owner: Chad) [20:38:16] !log Restarting Cassandra instance restbase2006-a.codfw.wmnet : T126629 [20:38:17] T126629: Cassandra 2.2.6 - https://phabricator.wikimedia.org/T126629 [20:38:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:38:50] James_F: i dont know why jenkins dislikes it https://gerrit.wikimedia.org/r/#/c/296901/ [20:39:04] it liked it before [20:39:11] Operations, Commons, media-storage: Some fonts not anti-aliasing in SVG thumbnails after upgrade of scaling servers - https://phabricator.wikimedia.org/T139543#2439118 (Menner) Please keep in mind that Linux implements font-substitution and has many fonts not installed. There is a [[ https://noc.wiki...
[20:39:18] and there is just one PS [20:39:32] so it seems to make no sense that it first works and then fails later [20:40:22] Operations, Labs, Labs-Infrastructure: Depleted connection tracking table on labvirt1010 - https://phabricator.wikimedia.org/T139598#2439126 (Andrew) Related: https://openstack-in-production.blogspot.com/2015/01/exceeding-tracked-connections.html [20:41:31] !log Restarting Cassandra instance restbase2006-b.codfw.wmnet : T126629 [20:41:32] T126629: Cassandra 2.2.6 - https://phabricator.wikimedia.org/T126629 [20:41:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:43:01] PROBLEM - Restbase LVS codfw on restbase.svc.codfw.wmnet is CRITICAL: /page/mobile-sections/{title} (Get MobileApps Foobar page) is CRITICAL: Test Get MobileApps Foobar page returned the unexpected status 500 (expecting: 200) [20:43:02] (PS1) Andrew Bogott: Increase conntrack limits for nova compute nodes. [puppet] - https://gerrit.wikimedia.org/r/297897 (https://phabricator.wikimedia.org/T139598) [20:45:10] !log Restarting RESTBase on restbase2001.codfw.wmnet [20:45:12] RECOVERY - Restbase LVS codfw on restbase.svc.codfw.wmnet is OK: All endpoints are healthy [20:45:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:45:33] PROBLEM - restbase endpoints health on restbase2001 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.192.16.152, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused))) [20:45:44] Operations, Labs, Labs-Infrastructure, Patch-For-Review: Depleted connection tracking table on labvirt1010 - https://phabricator.wikimedia.org/T139598#2439142 (chasemp) p:Triage>High [20:45:53] (CR) Rush: [C: 1] Increase conntrack limits for nova compute nodes.
[puppet] - https://gerrit.wikimedia.org/r/297897 (https://phabricator.wikimedia.org/T139598) (owner: Andrew Bogott) [20:46:59] (CR) Dzahn: [C: -1] "should be gerrit::crons::ssh_key in hiera" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/296951 (owner: Chad) [20:47:52] RECOVERY - restbase endpoints health on restbase2001 is OK: All endpoints are healthy [20:48:18] !log Cassandra upgrade of restbase2006.codfw.wmnet instances complete : T126629 [20:48:19] T126629: Cassandra 2.2.6 - https://phabricator.wikimedia.org/T126629 [20:48:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:48:59] !log Upgrading Cassandra to 2.2.6-wmf1 on restbase2009.codfw.wmnet : T126629 [20:49:00] T126629: Cassandra 2.2.6 - https://phabricator.wikimedia.org/T126629 [20:49:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:49:38] (CR) Dzahn: "looks like this would fail on iridium." [puppet] - https://gerrit.wikimedia.org/r/297803 (owner: Chad) [20:50:28] !log Restarting Cassandra instance restbase2009-a.codfw.wmnet : T126629 [20:50:29] T126629: Cassandra 2.2.6 - https://phabricator.wikimedia.org/T126629 [20:50:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:50:44] (CR) Dzahn: "on iridium we have:" [puppet] - https://gerrit.wikimedia.org/r/297803 (owner: Chad) [20:52:53] !log Restarting Cassandra instance restbase2009-b.codfw.wmnet : T126629 [20:52:54] T126629: Cassandra 2.2.6 - https://phabricator.wikimedia.org/T126629 [20:52:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:55:00] (PS2) Dzahn: Gerrit: Puppetize the known_hosts file for replication [puppet] - https://gerrit.wikimedia.org/r/297837 (owner: Chad) [20:55:23] (PS1) Merlijn van Deen: logging: fix debug logging [software/puppet-compiler] - https://gerrit.wikimedia.org/r/297900 [20:55:31] PROBLEM - mobileapps endpoints
health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:55:42] (PS1) Merlijn van Deen: add .gitreview [software/puppet-compiler] - https://gerrit.wikimedia.org/r/297901 [20:56:17] (CR) Dzahn: [C: 2] "not changing ytterbium, that's how it is there" [puppet] - https://gerrit.wikimedia.org/r/297837 (owner: Chad) [20:58:58] (PS3) Dzahn: Gerrit: move nasty ssh key to crons class, only user [puppet] - https://gerrit.wikimedia.org/r/296951 (owner: Chad) [20:59:57] !log Fin : T126629 [20:59:58] T126629: Cassandra 2.2.6 - https://phabricator.wikimedia.org/T126629 [21:00:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:00:11] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [21:00:41] (PS1) Merlijn van Deen: Set up labs realm (ldap classifier and hiera) [software/puppet-compiler] - https://gerrit.wikimedia.org/r/297902 (https://phabricator.wikimedia.org/T97081) [21:01:29] quite a lot of logspam in logstash now [21:01:42] tasks submitted [21:01:46] Operations, Commons, media-storage: Some fonts not anti-aliasing in SVG thumbnails after upgrade of scaling servers - https://phabricator.wikimedia.org/T139543#2439232 (kaldari) @Menner: Regardless of whether the font is substituted, it should always be anti-aliased. Otherwise it is very difficult to... [21:04:43] mutante: We only run the expensive fire-up-a-cross-browser-test-for-IE6-etc. CI stuff on merge, not on submission. But in this case it's just a glitch. [21:05:27] James_F: ok, so is there any action needed ?
[21:05:31] (PS1) Eevans: Enable instance 1009-c [puppet] - https://gerrit.wikimedia.org/r/297905 (https://phabricator.wikimedia.org/T139362) [21:05:35] like telling it to try again [21:05:47] or is it merged anyways [21:06:11] Operations, Security-Team, vm-requests, Patch-For-Review: provide ganeti VM for security team sectools - https://phabricator.wikimedia.org/T138650#2439270 (dpatrick) [21:06:19] mutante: I'm trying it again. :-) [21:06:55] mutante: i have one of those instances gerrits again, if i can trouble you: https://gerrit.wikimedia.org/r/#/c/297905/1 [21:08:02] ok and ok :) [21:08:06] (PS2) Dzahn: Enable instance 1009-c [puppet] - https://gerrit.wikimedia.org/r/297905 (https://phabricator.wikimedia.org/T139362) (owner: Eevans) [21:08:56] (CR) Dzahn: [C: 2] "131.48.64.10.in-addr.arpa domain name pointer restbase1009-c.eqiad.wmnet." [puppet] - https://gerrit.wikimedia.org/r/297905 (https://phabricator.wikimedia.org/T139362) (owner: Eevans) [21:09:54] (PS4) Dzahn: Gerrit: move nasty ssh key to crons class, only user [puppet] - https://gerrit.wikimedia.org/r/296951 (owner: Chad) [21:10:09] (CR) Dzahn: "http://puppet-compiler.wmflabs.org/3292/" [puppet] - https://gerrit.wikimedia.org/r/296951 (owner: Chad) [21:10:44] (CR) jenkins-bot: [V: -1] Set up labs realm (ldap classifier and hiera) [software/puppet-compiler] - https://gerrit.wikimedia.org/r/297902 (https://phabricator.wikimedia.org/T97081) (owner: Merlijn van Deen) [21:13:42] urandom: now it has been submitted [21:14:00] mutante: thanks again! [21:14:04] np [21:15:43] mutante: Bah, again no. Oh well. No rush; we've not done a release of that software for > 6 months. [21:17:11] James_F: ok, just wanted it out of the queue.
thanks [21:20:41] PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 1 failures [21:32:27] !log Bootstrapping restbase1009-c : T139362 [21:32:28] T139362: High storage utilization on restbase1014.eqiad.wmnet - https://phabricator.wikimedia.org/T139362 [21:32:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:35:23] RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [21:38:43] PROBLEM - cassandra-c CQL 10.64.48.131:9042 on restbase1009 is CRITICAL: Connection refused [21:38:57] ^^^ normal/expected, i'll get it [21:39:55] looking for giueseppe, idk his irc nick... [21:40:12] ACKNOWLEDGEMENT - cassandra-c CQL 10.64.48.131:9042 on restbase1009 is CRITICAL: Connection refused eevans Bootstrapping - The acknowledgement expires at: 2016-07-08 21:39:55. [21:44:51] Danny_B, _joe_, [21:45:30] thanks [21:50:17] (PS5) Dzahn: Gerrit: move nasty ssh key to crons class, only user [puppet] - https://gerrit.wikimedia.org/r/296951 (owner: Chad) [21:53:51] (CR) Dzahn: [C: 2] "http://puppet-compiler.wmflabs.org/3292/" [puppet] - https://gerrit.wikimedia.org/r/296951 (owner: Chad) [21:54:30] (CR) Dzahn: [C: 2] add .gitreview [software/puppet-compiler] - https://gerrit.wikimedia.org/r/297901 (owner: Merlijn van Deen) [21:55:57] Operations, Commons, Wikimedia-SVG-rendering: SVG files larger than 10 MB cannot be thumbnailed - https://phabricator.wikimedia.org/T111815#1616960 (dpatrick) We discussed this issue in the Security Team meeting. Our consensus is that it is okay to add `--unlimited` given that we have mitigations in...
[21:56:18] (PS2) Dzahn: logging: fix debug logging [software/puppet-compiler] - https://gerrit.wikimedia.org/r/297900 (owner: Merlijn van Deen) [22:01:02] (CR) Dzahn: [C: 2] logging: fix debug logging [software/puppet-compiler] - https://gerrit.wikimedia.org/r/297900 (owner: Merlijn van Deen) [22:05:57] (PS1) Krinkle: Remove unused file 'docroot/foundation/index.html' [mediawiki-config] - https://gerrit.wikimedia.org/r/297909 [22:07:47] (PS2) Krinkle: Remove unused file 'docroot/foundation/index.html' [mediawiki-config] - https://gerrit.wikimedia.org/r/297909 [22:13:36] (PS3) Krinkle: Remove unused file 'docroot/foundation/index.html' [mediawiki-config] - https://gerrit.wikimedia.org/r/297909 [22:27:00] (PS7) Alex Monk: [WIP/POC/POS] Add python version of maintain-replicas script [software] - https://gerrit.wikimedia.org/r/295607 (https://phabricator.wikimedia.org/T138450) [22:28:35] (PS8) Alex Monk: [WIP/POC/POS] Add python version of maintain-replicas script [software] - https://gerrit.wikimedia.org/r/295607 (https://phabricator.wikimedia.org/T138450) [22:30:42] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:31:50] Operations, Performance-Team, Traffic, Wikimedia-Stream, Patch-For-Review: Move stream.wikimedia.org (rcstream) behind cache_misc - https://phabricator.wikimedia.org/T134871#2439913 (Krinkle) Open>Resolved a:Krinkle [22:33:34] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy [22:33:48] !log maxsem@tin Synchronized php-1.28.0-wmf.9/extensions/VisualEditor/: https://gerrit.wikimedia.org/r/#/c/297795/ and https://gerrit.wikimedia.org/r/#/c/297908/ (duration: 00m 45s) [22:33:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:35:13] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[22:35:42] (03PS4) 10Dzahn: Enable base::firewall for labtestmetal2001 [puppet] - 10https://gerrit.wikimedia.org/r/293712 (owner: 10Muehlenhoff) [22:37:10] (03CR) 10Dzahn: [C: 032] Enable base::firewall for labtestmetal2001 [puppet] - 10https://gerrit.wikimedia.org/r/293712 (owner: 10Muehlenhoff) [22:42:03] 06Operations, 06Commons, 10Wikimedia-SVG-rendering: SVG files larger than 10 MB cannot be thumbnailed - https://phabricator.wikimedia.org/T111815#2439951 (10Bawolff) >>! In T111815#2439765, @dpatrick wrote: > We discussed this issue in the Security Team meeting. Our consensus is that it is okay to add `--unl... [22:42:24] 07Blocked-on-Operations, 06Operations, 10Increasing-content-coverage, 06Research-and-Data-Backlog: Backport python3-sklearn and python3-sklearn-lib from sid - https://phabricator.wikimedia.org/T133362#2229360 (10Halfak) Why is packaging necessary? With ORES, we are using sklearn via a wheel rather than a... [22:44:23] (03CR) 10Dzahn: [C: 031] "yes, comment on ticket says "This would still be helpful so that other operators could run the bot."" [puppet] - 10https://gerrit.wikimedia.org/r/270638 (https://phabricator.wikimedia.org/T126933) (owner: 10Merlijn van Deen) [22:45:43] 06Operations, 10Traffic, 06Wikipedia-Android-App-Backlog, 06Wikipedia-iOS-App-Backlog, and 2 others: Zero: Investigate removing the limit on carrier tagging to m-dot and zero-dot requests - https://phabricator.wikimedia.org/T137990#2439988 (10MBinder_WMF) [22:46:48] (03PS2) 10Dzahn: toollabs: install inkscape on exec nodes [puppet] - 10https://gerrit.wikimedia.org/r/270638 (https://phabricator.wikimedia.org/T126933) (owner: 10Merlijn van Deen) [22:47:28] (03PS3) 10Dzahn: toollabs: install inkscape on exec nodes [puppet] - 10https://gerrit.wikimedia.org/r/270638 (https://phabricator.wikimedia.org/T126933) (owner: 10Merlijn van Deen) [22:48:24] (03CR) 10Dzahn: "just manual rebase to fix the path conflict. where is imagemagick coming from .." 
[puppet] - https://gerrit.wikimedia.org/r/270638 (https://phabricator.wikimedia.org/T126933) (owner: Merlijn van Deen) [22:51:06] (PS4) Dzahn: toollabs: install inkscape on exec nodes [puppet] - https://gerrit.wikimedia.org/r/270638 (https://phabricator.wikimedia.org/T126933) (owner: Merlijn van Deen) [22:55:10] (CR) Dzahn: [C: 2] toollabs: install inkscape on exec nodes [puppet] - https://gerrit.wikimedia.org/r/270638 (https://phabricator.wikimedia.org/T126933) (owner: Merlijn van Deen) [23:00:04] RoanKattouw, ostriches, MaxSem, awight, and Dereckson: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160707T2300). Please do the needful. [23:00:04] James_F, Krinkle, AndyRussG, matt_flaschen, and Amir1: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [23:00:14] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [23:00:20] Hi! Whoever is doing tonight's SWAT... I haven't prepared a core patch, just a wmf_deploy branch patch for the CentralNotice update [23:00:21] o/ [23:00:34] I have another patch coming [23:00:42] two SWAT if possible [23:00:44] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:01:02] Oops. [23:01:14] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [23:01:15] Present. I need to merge one, then cherry-pick it, but it's already listed. [23:01:48] Hello. I can SWAT this evening. [23:02:16] I keep forgetting who should prepare and/or +2 those core patches... Anyway I could still do one...
[23:02:22] Hi Dereckson [23:02:23] ack [23:02:41] (03PS1) 10Ladsgroup: Enable ORES review tool as a beta feature in ptwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/297917 (https://phabricator.wikimedia.org/T139692) [23:02:44] Also hoping to deploy to mw1017 first, if possible :) [23:02:48] thx much!!! [23:03:00] Yeah, we decided to always mw1017 first now. [23:03:26] Ah K I remember it was discussed..... [23:04:42] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [23:05:23] matt_flaschen: https://gerrit.wikimedia.org/r/#/c/297914/1/includes/Notifications/PostReplyPresentationModel.php <- there are existing messages or new ones? [23:05:39] (03PS4) 10Andrew Bogott: Mark off a block of public IPs for labtest [dns] - 10https://gerrit.wikimedia.org/r/284491 (https://phabricator.wikimedia.org/T115491) [23:05:52] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [23:07:01] matt_flaschen: I've checked on en.wiki: messages already exist, okay [23:07:23] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [23:08:06] Dereckson, our other one will require scap, though. 
[23:08:22] k [23:09:20] (03CR) 10Dzahn: [C: 031] backup: Use PRODUCTION_NETWORKS [puppet] - 10https://gerrit.wikimedia.org/r/295786 (owner: 10Muehlenhoff) [23:10:47] (03CR) 10Dzahn: [C: 032] "cherry-picked already and labs-only" [puppet] - 10https://gerrit.wikimedia.org/r/295894 (https://phabricator.wikimedia.org/T138506) (owner: 10Hashar) [23:11:34] (03PS2) 10Dzahn: contint: Java 8 on Jessie slaves [puppet] - 10https://gerrit.wikimedia.org/r/295880 (https://phabricator.wikimedia.org/T138506) (owner: 10Hashar) [23:11:55] (03CR) 10Dzahn: [C: 031] contint: Java 8 on Jessie slaves [puppet] - 10https://gerrit.wikimedia.org/r/295880 (https://phabricator.wikimedia.org/T138506) (owner: 10Hashar) [23:12:42] (03PS3) 10Dereckson: VisualEditor: Move cite out of primary toolbar except on WP/WB/WV [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296573 (owner: 10Jforrester) [23:13:00] Dereckson: can I add a patch to SWAT still? if not, I can deploy it myself afterwards [23:13:33] legoktm: sure, go ahead [23:14:06] ok, it's https://gerrit.wikimedia.org/r/#/c/297921/ [23:14:11] adding it to the page now [23:14:39] Dereckson, thanks for cherry-picking. [23:15:08] You're welcome. [23:15:42] Patches are in Zuul gate-and-submit queue. Let's do the config ones pending merge. [23:15:50] James_F: you're first [23:16:10] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296573 (owner: 10Jforrester) [23:16:18] (03CR) 10Dzahn: [C: 031] Mark off a block of public IPs for labtest [dns] - 10https://gerrit.wikimedia.org/r/284491 (https://phabricator.wikimedia.org/T115491) (owner: 10Andrew Bogott) [23:16:21] Sure. 
[23:16:52] (03Merged) 10jenkins-bot: VisualEditor: Move cite out of primary toolbar except on WP/WB/WV [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296573 (owner: 10Jforrester) [23:16:56] (03PS3) 10Dzahn: contint: Android SDK deps on all slaves [puppet] - 10https://gerrit.wikimedia.org/r/295894 (https://phabricator.wikimedia.org/T138506) (owner: 10Hashar) [23:18:47] James_F: live on mw1017 [23:19:14] I can't test that here. [23:19:49] I'm sure it's fine. [23:21:07] Tested. Looks good to me. [23:21:12] And so: yes you can. [23:21:14] (03PS4) 10Dzahn: contint: Android SDK deps on all slaves [puppet] - 10https://gerrit.wikimedia.org/r/295894 (https://phabricator.wikimedia.org/T138506) (owner: 10Hashar) [23:21:27] (03CR) 10Dzahn: "removed dependency on unrelated change" [puppet] - 10https://gerrit.wikimedia.org/r/295894 (https://phabricator.wikimedia.org/T138506) (owner: 10Hashar) [23:21:45] James_F: https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug allows you to get a page on any site you wish [23:21:52] and force the request to be processed by mw1017 [23:22:18] so I asked a page in VE in wikinews, one in Wikipedia, see all looks good and there is no cite bar in WN [23:22:44] I imagine that's qualify as a minimal correct test procedure for your patch. [23:22:52] Dereckson: Yes I know. But that's not helpful if you change deployment practice with no notice. I'm on a phone. :-) [23:23:18] And yes, that sounds good. [23:23:53] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: VisualEditor: Move cite out of primary toolbar except on WP/WB/WV ([[Gerrit:296573]]) (duration: 00m 30s) [23:23:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:24:44] I imagine James_F the notice will come soon: https://wikitech.wikimedia.org/w/index.php?title=SWAT_deploys&type=revision&diff=726131&oldid=701203 has only been updated this afternoon UTC [23:25:52] James_F: looks good in prod? 
[23:27:36] (CR) Dereckson: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/297511 (owner: Krinkle) [23:27:41] o/ [23:27:42] (PS4) Dereckson: Remove unused deprecated $wgStyleSheetPath [mediawiki-config] - https://gerrit.wikimedia.org/r/297511 (owner: Krinkle) [23:27:52] (CR) Dereckson: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/297511 (owner: Krinkle) [23:28:21] Dereckson: Testing (now switched to a machine that can do X-Debug). [23:28:46] James_F: test also without it: it's live on all servers now [23:28:47] (Merged) jenkins-bot: Remove unused deprecated $wgStyleSheetPath [mediawiki-config] - https://gerrit.wikimedia.org/r/297511 (owner: Krinkle) [23:29:09] (CR) Dzahn: [C: 1] "..i guess" [puppet] - https://gerrit.wikimedia.org/r/277904 (owner: Paladox) [23:29:12] Krinkle: live on mw1017 [23:29:34] Dereckson: Yup, looks good in prod on both mw1017 and real prod. [23:29:41] Dereckson: Verified. Thanks [23:30:09] Krinkle: you checked the logs too? [23:30:22] James_F: nice, thanks for checking [23:30:56] Dereckson: checking now. [23:31:04] Amir1: pt. deployment didn't depend on oresc_is_predicted = 1 ?
[23:31:20] Dereckson: nope [23:31:27] (PS2) Dereckson: Enable ORES review tool as a beta feature in ptwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/297917 (https://phabricator.wikimedia.org/T139692) (owner: Ladsgroup) [23:32:00] (PS7) Dzahn: zuul: enhance logging [puppet] - https://gerrit.wikimedia.org/r/291913 (owner: Hashar) [23:32:07] (CR) Dereckson: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/297917 (https://phabricator.wikimedia.org/T139692) (owner: Ladsgroup) [23:32:12] (CR) Dzahn: [C: 2] zuul: enhance logging [puppet] - https://gerrit.wikimedia.org/r/291913 (owner: Hashar) [23:32:58] (Merged) jenkins-bot: Enable ORES review tool as a beta feature in ptwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/297917 (https://phabricator.wikimedia.org/T139692) (owner: Ladsgroup) [23:33:24] Amir1: 'Enable ORES review tool as a beta feature in ptwiki' live on mw1017 [23:33:37] Dereckson: thanks, testing [23:34:06] Dereckson: have you made the tables and maintenance scripts? [23:34:58] tables are not there [23:35:08] thus returns error if you enable ORES [23:35:34] Dereckson: all ok [23:35:54] Krinkle: okay, sending to prod [23:36:37] !log dereckson@tin Synchronized wmf-config/CommonSettings.php: Remove unused deprecated $wgStyleSheetPath ([[Gerrit:297511]]) (duration: 00m 27s) [23:36:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:36:46] Anyone able to check the schema of a table in labswiki? [23:37:36] Dereckson: for CentralNotice, core patches are not autogenerated [23:37:49] yeah with your wmf_deploy branch [23:38:21] Amir1: not yet for the branch, commit message let me think it was a post-merge operation [23:38:58] Dereckson: It's okay [23:39:10] it returns error if you deploy it [23:39:19] (in prod) [23:40:32] mutante: are you there?
[23:41:00] Dereckson: yes [23:41:06] Dereckson: in case it's useful, here's a core patch updating the CentralNotice submodule https://gerrit.wikimedia.org/r/#/c/297926/ [23:41:30] mutante: could you look at https://gerrit.wikimedia.org/r/#/c/297917/ commit message? It offers a procedure to create tables for an extension. [23:41:47] mutante: I don't know how to do the step 1 [23:42:26] Dereckson: you can do it manually, you did it already for another wiki ;) [23:42:31] Dereckson: i dont know that either [23:42:41] https://github.com/wikimedia/mediawiki-extensions-WikimediaMaintenance/commit/ec7e675fe4880005077c8e1312c133ac09b08855 [23:43:14] which database is this [23:43:23] i am not sure we should be creating tables like that [23:43:43] Amir1: we normally have scripts for this kind of stuff, Reedy wrote one for Translate for example [23:44:03] doesnt this qualify as a scheme change? [23:44:14] schema [23:44:45] mutante: no, we did it in SWAT four times before [23:45:01] Amir1: do you remember who deployed that for you? [23:45:16] thcipriani, legoktm, MaxSem [23:45:33] hi [23:45:38] legoktm: how did you deploy tables for ORES? [23:45:43] mutante: https://wikitech.wikimedia.org/wiki/Schema_changes#What_is_not_a_schema_change [23:45:44] sql.php [23:46:04] mwscript sql.php --wiki=xxwiki /srv/mediawiki/..../extensions/ORES/sql/foo.sql [23:46:20] legoktm: WikimediaMaintenance extension is deployed for ORES? [23:46:22] legoktm: on Terbium? [23:46:23] if you want me to do it I can take care of it [23:46:27] this commit [23:46:28] https://github.com/wikimedia/mediawiki-extensions-WikimediaMaintenance/commit/ec7e675fe4880005077c8e1312c133ac09b08855 [23:46:29] Amir1: not yet, that's next week [23:46:35] kk [23:46:39] Dereckson: yeah, from terbium. You could also do it from tin
[23:46:51] Amir1: ok, good [23:47:14] legoktm: thanks / Amir1: okay I'm creating the [23:47:15] m [23:47:28] legoktm: thanks, TIL how to make these tables [23:48:23] Dereckson: thank you [23:48:31] !log Created table ores_model on ptwiki from php-1.28.0-wmf.9/extensions/ORES/sql/ores_model.sql [23:48:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:48:39] Dereckson: once the commit Amir1 linked is deployed, we can use the standard way of mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=xxwiki --ores [23:49:28] nice [23:50:15] !log Created table ores_classification on ptwiki from php-1.28.0-wmf.9/extensions/ORES/sql/ores_classification.sql [23:50:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:50:28] legoktm: instead of the two sql.php I used? [23:50:33] or in addition? [23:50:37] instead of [23:50:39] k [23:51:07] Amir1: so now the table exist you can test on mw1017? [23:51:26] on it [23:52:01] legoktm: createExtensionTables.php works for every extension or there are some tricky exceptions? [23:52:03] Dereckson: okay, works as expected [23:52:13] Amir1: good [23:52:23] but run the maintenance ones too [23:52:29] before deploying to prod [23:54:13] Dereckson: I think it should handle most of them? it's just a hardcoded list of tables, so as long as people keep it up to date, it should work [23:54:40] (CR) Dzahn: [C: 1] ocg: Restrict to DOMAIN_NETWORKS [puppet] - https://gerrit.wikimedia.org/r/297840 (owner: Muehlenhoff) [23:55:19] Amir1: done [23:55:54] weeee [23:55:59] works like a charm [23:56:22] Dereckson: thanks :) [23:56:38] Good.
[23:57:07] !log Ran extensions/ORES/maintenance/CheckModelVersions.php and extensions/ORES/maintenance/PopulateDatabase.php on ptwiki (T139692) [23:57:08] T139692: Deploy ORES review tool in ptwiki - https://phabricator.wikimedia.org/T139692 [23:57:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:57:42] Amir1: could you also check if all looks fine in the logs? [23:57:49] sure [23:59:22] nothing much in logstash, and it shouldn't be ;) [23:59:39] Okay, go for prod now. [23:59:43] Dereckson: check size of ores_classification table if possible