[00:00:04] twentyafterfour: Dear anthropoid, the time has come. Please deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150903T0000). [00:02:29] 6operations, 10Wikimedia-Mailing-lists, 7user-notice: announce scheduled downtime - https://phabricator.wikimedia.org/T110133#1600370 (10Dzahn) mailed ops@lists mailed wikitech-l@lists mailed listadmins@lists talked briefly to Johan about it being on the Tech News too [00:02:37] 6operations, 10Wikimedia-Mailing-lists, 7user-notice: announce scheduled downtime - https://phabricator.wikimedia.org/T110133#1600371 (10Dzahn) 5Open>3Resolved [00:02:38] 6operations, 10Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1600372 (10Dzahn) [00:04:25] (03CR) 10Tim Landscheidt: "I tested this on Toolsbeta by testing that existing instances don't do anything (i. e., the call is indeed silenced) and that two new inst" [puppet] - 10https://gerrit.wikimedia.org/r/235642 (https://phabricator.wikimedia.org/T109933) (owner: 10Tim Landscheidt) [00:04:40] 6operations, 10Wikimedia-Mailing-lists, 5Patch-For-Review: rsync all configs and archives one more time - https://phabricator.wikimedia.org/T110129#1600383 (10Dzahn) yea, this works. just always takes long no matter what. i can run it and then immediately repeat it and still wait 90 mins. real 90m25.457s [00:04:54] 6operations, 10Wikimedia-Mailing-lists, 5Patch-For-Review: rsync all configs and archives one more time - https://phabricator.wikimedia.org/T110129#1600384 (10Dzahn) 5Open>3Resolved [00:06:01] 6operations, 10Wikimedia-Mailing-lists, 5Patch-For-Review: rsync all configs and archives one more time - https://phabricator.wikimedia.org/T110129#1600388 (10JohnLewis) 90 minutes still seems like a concern to me but expected I guess. The length though can increase a bit which may be an issue on the day. So... 
[00:08:03] Krenair: rmoen: RoanKattouw: ostriches: greg-g: CentralNotice deploy is all done!! Everything seems to be in order so far. I'll be around on IRC for at least a few more hours in case anything comes up. Many, many thanks for your help and patience!! :D [00:08:18] 6operations, 10Wikimedia-Mailing-lists, 5Patch-For-Review: rsync all configs and archives one more time - https://phabricator.wikimedia.org/T110129#1600390 (10Dzahn) yea, agreed.this is making our maintenance window shorter. we also need to rsync the mail queue and run the import script. it adds up. will try... [00:08:21] great [00:08:28] I did nothing [00:08:45] But no problem :p [00:12:13] Krenair: answering stuff on IRC... and not troutslapping me for disorderliness... was a huge help, and not nothing :D good to know about the mw1017, can't believe I missed that [00:12:46] That section was written before X-Wikimedia-Debug [00:13:33] when mw1017 was testwiki only unless you were... I don't know, proxying into the cluster and sending requests there directly I guess would've worked? [00:13:45] so it's not entirely clear about that [00:14:27] certainly could be made more obvious [00:16:30] Mmmm.... [00:18:11] 6operations, 10Wikimedia-Mailing-lists: rsync all configs and archives one more time - https://phabricator.wikimedia.org/T110129#1600415 (10Dzahn) [00:18:15] 6operations, 10Traffic, 10fundraising-tech-ops, 7IPv6: Enable IPv6 on donate.wikimedia.org - https://phabricator.wikimedia.org/T73267#1600417 (10faidon) p:5Low>3High [00:20:34] (03CR) 10Dzahn: "what do others think? enable v6 on gerrit, do it? (we will still have it for a while, right)" [puppet] - 10https://gerrit.wikimedia.org/r/214437 (https://phabricator.wikimedia.org/T37540) (owner: 10Dzahn) [00:21:12] (03CR) 10Dzahn: "same here, gerrit IPv6? any reason not to?" 
[puppet] - 10https://gerrit.wikimedia.org/r/214437 (https://phabricator.wikimedia.org/T37540) (owner: 10Dzahn) [00:26:01] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Requesting research DB access for Alex Monk - https://phabricator.wikimedia.org/T110754#1600433 (10Krenair) Works fine, thanks @RobH. I had found https://wikitech.wikimedia.org/wiki/Analytics/Data_access which I used to figure out where things are, and w... [00:27:08] !log Deployed patch for T111029 [00:27:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Mr. Obvious [00:28:25] 6operations, 10Traffic, 10fundraising-tech-ops, 7IPv6: Enable IPv6 on donate.wikimedia.org - https://phabricator.wikimedia.org/T73267#1600444 (10faidon) I'm not sure if there's anything that prevents IPv6 from being enabled on donate; if so, could we have it detailed in this task? Adding IPv6 to donate wo... [00:29:40] 6operations, 10Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1600447 (10Ariconte) "Daniel Zahn 9:56 AM (31 minutes ago) (back to just announcements on this list but this is one) We have scheduled an upgrade of mailman (htt... [00:30:50] 6operations, 6WMF-Legal: Set up new URL policy.wikimedia.org - https://phabricator.wikimedia.org/T97329#1600459 (10Slaporte) [00:30:51] 6operations, 10Traffic: SSL certificate for policy.wikimedia.org - https://phabricator.wikimedia.org/T110197#1600458 (10Slaporte) [00:30:53] 6operations: migrate policy.wikimedia.org from WMF cluster to Wordpress - https://phabricator.wikimedia.org/T110203#1600456 (10Slaporte) 5Open>3Resolved This is all set up: [[https://policy.wikimedia.org|policy.wikimedia.org]] Thanks for the awesome quick work, @RobH! [00:32:54] mutante: https://gerrit.wikimedia.org/r/#/c/235222/ - any ideas? 
[00:33:27] 6operations, 5Patch-For-Review: bond eth interfaces on ms1001 - https://phabricator.wikimedia.org/T89829#1600462 (10faidon) Is this still needed? I [[ http://ganglia.wikimedia.org/latest/graph_all_periods.php?h=ms1001.wikimedia.org&m=cpu_report&r=hour&s=descending&hc=4&mc=2&st=1441240298&g=network_report&z=lar... [00:36:03] !log ori@tin Synchronized php-1.26wmf20/includes/parser/Preprocessor_Hash.php: Idd1acd903: Decline to cache preprocessor items larger than 1 Mb (duration: 00m 13s) [00:36:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [00:36:15] !log ori@tin Synchronized php-1.26wmf21/includes/parser/Preprocessor_Hash.php: Idd1acd903: Decline to cache preprocessor items larger than 1 Mb (duration: 00m 11s) [00:36:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [00:41:27] 6operations: sysctl::parameters don't take effect until next reboot (on Trusty at least) - https://phabricator.wikimedia.org/T109711#1600498 (10faidon) I just tried manually changing `70-core_dumps.conf` and manually applying it, then forced a puppet run. It worked, so I was unable to reproduce your pro... [00:42:46] (03CR) 10Paladox: "Needs rebase. 
And yes gerrit should be IPv6 enabled since we will still be using gerrit for a while till the migration starts which I doin" [puppet] - 10https://gerrit.wikimedia.org/r/214437 (https://phabricator.wikimedia.org/T37540) (owner: 10Dzahn) [00:46:30] PROBLEM - puppet last run on analytics1015 is CRITICAL: CRITICAL: Puppet has 1 failures [00:49:59] (03PS1) 10Papaul: I update the files with dns info for elastic2001-2004 Bug:T111080 [dns] - 10https://gerrit.wikimedia.org/r/235657 [00:50:49] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0] [00:52:19] (03PS2) 10Alex Monk: Add DNS entries for elastic200[1-4] [dns] - 10https://gerrit.wikimedia.org/r/235657 (https://phabricator.wikimedia.org/T111080) (owner: 10Papaul) [00:53:50] (03PS3) 10Alex Monk: Add DNS entries for elastic2001-2024 [dns] - 10https://gerrit.wikimedia.org/r/235657 (https://phabricator.wikimedia.org/T111080) (owner: 10Papaul) [01:00:58] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [01:12:29] RECOVERY - puppet last run on analytics1015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [01:25:07] 6operations, 6Analytics-Backlog, 6Labs, 10Wikimedia-Apache-configuration, and 3 others: https://wikitech.wikimedia.org/beacon/statsv 404 Not Found - https://phabricator.wikimedia.org/T104359#1600598 (10Krenair) 5Open>3Resolved [01:28:07] 6operations, 10Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1600617 (10Dzahn) >>! In T105756#1600447, @Ariconte wrote: > Are you going to tell the readers??? To forestall all the 'The list is not working messages.... While... 
[01:32:51] !log krenair@tin Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 12s) [01:33:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [01:33:24] 6operations, 10Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1600624 (10saper) Can we add checking web/email interface i18n encoding to the "it works" checklist? Caused some issues in the past (not only on our mailman install) [01:40:42] 6operations, 10Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1600632 (10Dzahn) >>! In T105756#1600624, @saper wrote: > Can we add checking web/email interface i18n encoding to the "it works" checklist? See T110131#1599716 for... [01:41:59] PROBLEM - Hadoop NodeManager on analytics1032 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [01:44:26] 6operations, 10Fundraising Tech Backlog: Add emcnaughton@wikimedia.org to fr-tech@ email group - https://phabricator.wikimedia.org/T111257#1600643 (10Dzahn) a:3Dzahn [01:46:18] PROBLEM - puppet last run on analytics1015 is CRITICAL: CRITICAL: Puppet has 1 failures [01:51:49] RECOVERY - Hadoop NodeManager on analytics1032 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [01:54:55] 6operations, 10Fundraising Tech Backlog: Add emcnaughton@wikimedia.org to fr-tech@ email group - https://phabricator.wikimedia.org/T111257#1600663 (10Dzahn) done. added her to fr-tech and fr-tech adds her to fr-online and that adds her to fr-all. yep let me know if you want me to paste all the lists (civicr... 
[01:59:35] 6operations, 10Fundraising Tech Backlog: Add emcnaughton@wikimedia.org to fr-tech@ email group - https://phabricator.wikimedia.org/T111257#1600675 (10Dzahn) 5Open>3Resolved has been applied on the mail server [02:01:07] 10Ops-Access-Requests, 6operations, 3Discovery-Wikidata-Query-Service-Sprint, 7Icinga: Get smalyshev permissions to icinga enough to control monitoring for wdqs_eqiad group - https://phabricator.wikimedia.org/T111243#1600680 (10Dzahn) p:5Triage>3Normal [02:18:27] 6operations, 10Deployment-Systems: Remove lanthanum.eqiad.wmnet from Trebuchet redis - https://phabricator.wikimedia.org/T110677#1600699 (10Dzahn) on `tin `: > `redis-cli` ``` redis 127.0.0.1:6379> keys "deploy:integration*" 1) "deploy:integration/phpcs:minions:lanthanum.eqiad.wmnet" 2) "deploy:integration/m... [02:27:54] 6operations, 10Deployment-Systems: Remove lanthanum.eqiad.wmnet from Trebuchet redis - https://phabricator.wikimedia.org/T110677#1600700 (10Dzahn) ``` redis 127.0.0.1:6379> help srem SREM key member [member ...] summary: Remove one or more members from a set since: 1.0.0 group: set ``` ``` redis 1... [02:28:50] 6operations, 10Deployment-Systems: Remove lanthanum.eqiad.wmnet from Trebuchet redis - https://phabricator.wikimedia.org/T110677#1600701 (10Dzahn) Did that do it? It all seems like it except when i list the keys again i still see them. 
[02:36:01] 6operations, 10Deployment-Systems: pmtpa remnants in trebuchet redis - https://phabricator.wikimedia.org/T111301#1600713 (10Dzahn) [02:36:10] 6operations, 10Deployment-Systems: pmtpa remnants in trebuchet redis - https://phabricator.wikimedia.org/T111301#1600716 (10Dzahn) p:5Triage>3Low [02:39:06] !log l10nupdate@tin Synchronized php-1.26wmf20/cache/l10n: l10nupdate for 1.26wmf20 (duration: 10m 41s) [02:39:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:39:43] 6operations, 6Phabricator, 7Database, 5Patch-For-Review: Phabricator creates MySQL connection spikes - https://phabricator.wikimedia.org/T109279#1600727 (10mmodell) I think the configuration change is a sensible one, at least for now, until we find a better solution. [02:42:09] RECOVERY - puppet last run on analytics1015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [02:45:36] !log l10nupdate@tin LocalisationUpdate completed (1.26wmf20) at 2015-09-03 02:45:36+00:00 [02:45:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:50:21] (03PS1) 10Alex Monk: Make eswiki groupOverrides inherit overrides from relevant tags [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235666 (https://phabricator.wikimedia.org/T109157) [02:50:53] (03CR) 10Dzahn: [C: 031] "yep, and the rsyncd on dumps does not use this module" [puppet] - 10https://gerrit.wikimedia.org/r/235425 (https://phabricator.wikimedia.org/T108987) (owner: 10Muehlenhoff) [02:53:24] 6operations, 7Database: Grant puppet script access to "phabricator_project" DB - https://phabricator.wikimedia.org/T111200#1600754 (10Dzahn) Just for a metrics script that is deployed by puppet. specifically `passwords::mysql::phabricator::metrics_user` needs access to an additional database called `phabricat... [03:00:00] mutante, isn't redis friendly? 
[03:00:05] let's see if I can remember how this works [03:04:36] wait, mutante [03:04:45] what about deploy:integration/slave-scripts:minions:lanthanum.eqiad.wmnet and deploy:integration/php-coveralls:minions:lanthanum.eqiad.wmnet? [03:06:37] !log l10nupdate@tin Synchronized php-1.26wmf21/cache/l10n: l10nupdate for 1.26wmf21 (duration: 05m 32s) [03:06:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [03:08:00] (03CR) 10Ori.livneh: [C: 032] monitor.py: PEP8 compliance [debs/pybal] - 10https://gerrit.wikimedia.org/r/235623 (owner: 10Ori.livneh) [03:08:18] (03Merged) 10jenkins-bot: monitor.py: PEP8 compliance [debs/pybal] - 10https://gerrit.wikimedia.org/r/235623 (owner: 10Ori.livneh) [03:08:23] (03CR) 10Ori.livneh: [C: 032] ipvs.py: PEP8 compliance [debs/pybal] - 10https://gerrit.wikimedia.org/r/235624 (owner: 10Ori.livneh) [03:08:39] (03Merged) 10jenkins-bot: ipvs.py: PEP8 compliance [debs/pybal] - 10https://gerrit.wikimedia.org/r/235624 (owner: 10Ori.livneh) [03:09:20] !log l10nupdate@tin LocalisationUpdate completed (1.26wmf21) at 2015-09-03 03:09:20+00:00 [03:09:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [03:22:28] 6operations, 10Deployment-Systems: Remove lanthanum.eqiad.wmnet from Trebuchet redis - https://phabricator.wikimedia.org/T110677#1600803 (10Krenair) `del` command on each of the results returned by `keys *lanthanum.eqiad.wmnet` got rid of them. Does it work now, @hashar? 
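Krenair's fix above (running `del` on each key returned by `keys *lanthanum.eqiad.wmnet`) can be sketched in Python as follows. This is only an illustration under assumptions: `delete_matching_keys` and `FakeRedis` are hypothetical names, the helper expects a client exposing `scan_iter(match=...)` and `delete(*names)` as redis-py's `Redis` class does, and the `otherhost.example` key is invented for the demo. It covers the per-minion keys only; any set that also lists the host would still need `srem`.

```python
import fnmatch


def delete_matching_keys(client, pattern):
    """Delete every key whose name matches a glob pattern.

    Returns the list of key names that were selected for deletion.
    """
    # Collect first, then delete, so the keyspace is not mutated mid-scan.
    doomed = list(client.scan_iter(match=pattern))
    if doomed:
        client.delete(*doomed)
    return doomed


class FakeRedis:
    """Hypothetical in-memory stand-in so the sketch runs without a server."""

    def __init__(self, keys):
        self.store = dict.fromkeys(keys, "placeholder")

    def scan_iter(self, match=None):
        # Yield key names matching the glob, like SCAN with MATCH.
        for key in list(self.store):
            if match is None or fnmatch.fnmatchcase(key, match):
                yield key

    def delete(self, *names):
        # Return how many of the named keys actually existed.
        return sum(1 for n in names if self.store.pop(n, None) is not None)


if __name__ == "__main__":
    fake = FakeRedis([
        "deploy:integration/phpcs:minions:lanthanum.eqiad.wmnet",
        "deploy:integration/slave-scripts:minions:lanthanum.eqiad.wmnet",
        "deploy:integration/php-coveralls:minions:lanthanum.eqiad.wmnet",
        "deploy:integration/phpcs:minions:otherhost.example",  # invented key
    ])
    for name in delete_matching_keys(fake, "*lanthanum.eqiad.wmnet"):
        print("deleted", name)
```

Against a real server one would pass a `redis.Redis(...)` instance instead of `FakeRedis`; key names then come back as bytes unless the client is created with `decode_responses=True`.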
[03:45:39] PROBLEM - puppet last run on analytics1015 is CRITICAL: CRITICAL: Puppet has 1 failures [04:22:27] (03PS7) 10Alex Monk: Add all groups to non-ops bastions, empty bastiononly group [puppet] - 10https://gerrit.wikimedia.org/r/227327 [04:22:34] (03CR) 10jenkins-bot: [V: 04-1] Add all groups to non-ops bastions, empty bastiononly group [puppet] - 10https://gerrit.wikimedia.org/r/227327 (owner: 10Alex Monk) [04:23:01] (03PS8) 10Alex Monk: Add all groups to non-ops bastions, empty bastiononly group [puppet] - 10https://gerrit.wikimedia.org/r/227327 [04:23:09] (03CR) 10jenkins-bot: [V: 04-1] Add all groups to non-ops bastions, empty bastiononly group [puppet] - 10https://gerrit.wikimedia.org/r/227327 (owner: 10Alex Monk) [04:27:55] (03PS9) 10Alex Monk: Add all groups to non-ops bastions, empty bastiononly group [puppet] - 10https://gerrit.wikimedia.org/r/227327 [04:30:40] PROBLEM - Incoming network saturation on labstore1003 is CRITICAL: CRITICAL: 18.52% of data above the critical threshold [100000000.0] [05:06:30] RECOVERY - Incoming network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0] [05:08:28] 6operations, 10Deployment-Systems: Remove lanthanum.eqiad.wmnet from Trebuchet redis - https://phabricator.wikimedia.org/T110677#1600983 (10Dzahn) thanks @Krenair. I got the `srem` from the [[ https://wikitech.wikimedia.org/wiki/Trebuchet#Removing_minions_from_redis | wiki page ]]. Wanna edit that? It also sa... 
[05:14:16] 6operations, 10Wikimedia-Mailing-lists: announce mailman downtime - https://phabricator.wikimedia.org/T109891#1600993 (10Dzahn) duplicate of T110133 and done [05:15:15] 6operations, 10Wikimedia-Mailing-lists: announce mailman downtime - https://phabricator.wikimedia.org/T109891#1601004 (10Dzahn) [05:29:50] PROBLEM - puppet last run on mw2050 is CRITICAL: CRITICAL: puppet fail [05:36:25] (03PS1) 10Dzahn: policy.wikimedia: remove puppetization [puppet] - 10https://gerrit.wikimedia.org/r/235673 [05:38:12] (03PS2) 10Dzahn: policy.wikimedia: remove puppetization [puppet] - 10https://gerrit.wikimedia.org/r/235673 (https://phabricator.wikimedia.org/T110203) [05:39:03] (03PS3) 10Dzahn: policy.wikimedia: remove puppetization [puppet] - 10https://gerrit.wikimedia.org/r/235673 (https://phabricator.wikimedia.org/T110203) [05:40:37] 6operations, 5Patch-For-Review: migrate policy.wikimedia.org from WMF cluster to Wordpress - https://phabricator.wikimedia.org/T110203#1601049 (10Dzahn) @Robh ^ the decom part is still open [05:42:49] RECOVERY - puppet last run on analytics1015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:57:39] RECOVERY - puppet last run on mw2050 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:24:00] PROBLEM - Incoming network saturation on labstore1003 is CRITICAL: CRITICAL: 10.34% of data above the critical threshold [100000000.0] [06:29:35] !log l10nupdate@tin ResourceLoader cache refresh completed at Thu Sep 3 06:29:35 UTC 2015 (duration 29m 34s) [06:29:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [06:30:58] PROBLEM - puppet last run on db2058 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:09] PROBLEM - puppet last run on sca1001 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:30] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:48] PROBLEM - puppet last run on holmium is CRITICAL: CRITICAL: 
Puppet has 1 failures [06:32:10] PROBLEM - puppet last run on subra is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:38] PROBLEM - puppet last run on db2044 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:39] PROBLEM - puppet last run on mw2145 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:48] PROBLEM - puppet last run on mw1110 is CRITICAL: CRITICAL: Puppet has 2 failures [06:33:09] PROBLEM - puppet last run on db2055 is CRITICAL: CRITICAL: Puppet has 2 failures [06:33:18] PROBLEM - puppet last run on mw2036 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:39] PROBLEM - puppet last run on mw2081 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:39] PROBLEM - puppet last run on eventlog2001 is CRITICAL: CRITICAL: Puppet has 1 failures [06:34:09] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures [06:39:14] 6operations, 10Traffic, 7discovery-system, 5services-tooling: Figure out a security model for etcd - https://phabricator.wikimedia.org/T97972#1601125 (10MoritzMuehlenhoff) >>! In T97972#1298574, @Joe wrote: > As ACLs are coming in etcd 2.1, for now I'd just use SSL connections. The other option we have is... 
[06:46:18] PROBLEM - puppet last run on analytics1015 is CRITICAL: CRITICAL: Puppet has 1 failures [06:48:00] RECOVERY - Incoming network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0] [06:52:11] (03PS2) 10Muehlenhoff: Enable ferm for role::mariadb::core [puppet] - 10https://gerrit.wikimedia.org/r/235436 [06:53:43] (03CR) 10Muehlenhoff: [C: 032 V: 032] Enable ferm for role::mariadb::core [puppet] - 10https://gerrit.wikimedia.org/r/235436 (owner: 10Muehlenhoff) [06:55:15] (03PS2) 10Muehlenhoff: Enable ferm for db2055-2070 [puppet] - 10https://gerrit.wikimedia.org/r/235453 [06:55:39] (03CR) 10Muehlenhoff: [C: 032 V: 032] Enable ferm for db2055-2070 [puppet] - 10https://gerrit.wikimedia.org/r/235453 (owner: 10Muehlenhoff) [06:55:59] RECOVERY - puppet last run on subra is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [06:56:19] RECOVERY - puppet last run on db2044 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [06:56:19] RECOVERY - puppet last run on mw1110 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [06:56:28] RECOVERY - puppet last run on mw2145 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [06:56:39] RECOVERY - puppet last run on db2058 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:56:48] RECOVERY - puppet last run on sca1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:56:49] RECOVERY - puppet last run on db2055 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [06:56:58] RECOVERY - puppet last run on mw2036 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:57:09] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:57:19] RECOVERY - puppet last run on mw2081 is OK: OK: Puppet is currently 
enabled, last run 1 minute ago with 0 failures [06:57:19] RECOVERY - puppet last run on eventlog2001 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [06:57:29] RECOVERY - puppet last run on holmium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:57:49] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [07:03:18] (03PS2) 10Muehlenhoff: Exempt mariadb core port from connection tracking [puppet] - 10https://gerrit.wikimedia.org/r/235443 [07:03:35] (03CR) 10Muehlenhoff: [C: 032 V: 032] Exempt mariadb core port from connection tracking [puppet] - 10https://gerrit.wikimedia.org/r/235443 (owner: 10Muehlenhoff) [07:05:55] (03Abandoned) 10Muehlenhoff: Enable base::firewall for fluorine [puppet] - 10https://gerrit.wikimedia.org/r/227720 (owner: 10Muehlenhoff) [07:12:08] RECOVERY - puppet last run on analytics1015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:23:28] (03PS2) 10Muehlenhoff: Enable ferm for dbstore* in codfw [puppet] - 10https://gerrit.wikimedia.org/r/235435 [07:23:52] (03CR) 10Muehlenhoff: [C: 032 V: 032] Enable ferm for dbstore* in codfw [puppet] - 10https://gerrit.wikimedia.org/r/235435 (owner: 10Muehlenhoff) [07:26:34] !log enabled ferm on dbstore* servers in codfw [07:26:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [07:35:07] (03PS3) 10Muehlenhoff: Enable ferm on initial appservers [puppet] - 10https://gerrit.wikimedia.org/r/235025 (https://phabricator.wikimedia.org/T104968) [07:42:12] (03CR) 10Muehlenhoff: [C: 032 V: 032] Enable ferm on initial appservers [puppet] - 10https://gerrit.wikimedia.org/r/235025 (https://phabricator.wikimedia.org/T104968) (owner: 10Muehlenhoff) [07:54:06] (03PS1) 10Muehlenhoff: Disable ferm on mw200[89], needs an additional rule unveiled by iptables logging [puppet] - 10https://gerrit.wikimedia.org/r/235683 [07:55:47] 
(03CR) 10Muehlenhoff: [C: 032 V: 032] Disable ferm on mw200[89], needs an additional rule unveiled by iptables logging [puppet] - 10https://gerrit.wikimedia.org/r/235683 (owner: 10Muehlenhoff) [07:56:12] 6operations, 10Deployment-Systems: Remove lanthanum.eqiad.wmnet from Trebuchet redis - https://phabricator.wikimedia.org/T110677#1601205 (10hashar) It still shows lanthanum.eqiad.wmnet apparently :-( ``` hashar@tin:/srv/deployment/integration/slave-scripts$ git deploy start Deployment started. ``` ``` hashar@t... [07:58:28] 6operations, 6Phabricator, 7Database, 5Patch-For-Review: Phabricator creates MySQL connection spikes - https://phabricator.wikimedia.org/T109279#1601207 (10jcrespo) Let me do 2 things: solidify the configuration change on puppet, and leave `pt-table-checksum` running over the weekend analyzing slow queries... [08:11:09] (03PS1) 10Jcrespo: Set max idle connection timeout to 60 seconds on Phabricator [puppet] - 10https://gerrit.wikimedia.org/r/235685 (https://phabricator.wikimedia.org/T109279) [08:13:23] 6operations, 10Deployment-Systems: Remove lanthanum.eqiad.wmnet from Trebuchet redis - https://phabricator.wikimedia.org/T110677#1601219 (10hashar) a:3Dzahn @dzahn thank you very much for the detailed redis commands, that is very helpful. The list of minions is also stored in the keys `deploy::minions... [08:14:04] (03PS2) 10Jcrespo: Set max idle connection timeout to 60 seconds on Phabricator [puppet] - 10https://gerrit.wikimedia.org/r/235685 (https://phabricator.wikimedia.org/T109279) [08:17:35] 6operations, 10Deployment-Systems: Remove lanthanum.eqiad.wmnet from Trebuchet redis - https://phabricator.wikimedia.org/T110677#1601228 (10hashar) 5Open>3Resolved All good now, thank you very much. I have updated the Wikitech documentation https://wikitech.wikimedia.org/w/index.php?title=Trebuchet&diff=17... [08:21:39] jynus: good morning. I forgot to ask what is the max # of connections allowed for nodepool user on m5-master ? 
:-} [08:23:34] !log fixup current graphite retention T96662 [08:23:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:24:38] hashar, by default, we do not put per-user limit [08:25:15] 6operations, 10ContentTranslation-Deployments, 10MediaWiki-extensions-ContentTranslation, 5ContentTranslation-Release6, 7Schema-change: Review and create table for Content Translation - https://phabricator.wikimedia.org/T111317#1601252 (10KartikMistry) 3NEW [08:25:26] jynus: sounds good to me so :-} thank you [08:25:26] the server has a maximum of 500 concurrent connections [08:25:37] but you share it with 10 other services [08:25:48] 6operations, 10ContentTranslation-Deployments, 10MediaWiki-extensions-ContentTranslation, 5ContentTranslation-Release6, and 2 others: Review and create table for Content Translation - https://phabricator.wikimedia.org/T111317#1601261 (10KartikMistry) [08:25:48] so, do not use more than, let's say, 50 [08:26:13] jynus: do you have any monitoring for that ? [08:26:18] otherwise, I *will* put a limit :-) [08:26:23] hashar, yes [08:26:33] nodepool spawn instances in the openstack wmflabs, and apparently it keeps a db connection per instance spawned [08:26:46] but I trust on users first :-) [08:26:50] so if we end up with say 100 instances managed, that would be 100 db connections and possibly some more [08:27:04] well, limit that or I will [08:27:37] I am not too worried short term though [08:27:50] akosiaris: https://phabricator.wikimedia.org/T111317 - whom from DB should I add there? [08:28:40] godog: I found a ton of empty metrics in Graphite under reqstats. hierarchy. 
[08:29:10] godog: seems we have a perl script that emitted them at some point ( files/udp2log/sqstat.pl ) and there is an entry per domain with 3 metrics each (pageviews, tp50, p99) [08:29:34] most applications have a configuration to set a maximum number of pooled connections [08:31:41] hashar, I think nova follows the same pattern, and we didn't have any issue with it [08:32:50] 6operations, 10Analytics, 7Graphite: Graphite `reqstats.` hierarchy is filled with apparently unused metrics for each of our wiki domains - https://phabricator.wikimedia.org/T111318#1601266 (10hashar) 3NEW [08:33:05] godog: filed as https://phabricator.wikimedia.org/T111318 and CCed Analytics :} [08:33:29] jynus: maybe I am being paranoid. Will make sure to monitor it :-} [08:33:58] (03CR) 10Jcrespo: [C: 032] Set max idle connection timeout to 60 seconds on Phabricator [puppet] - 10https://gerrit.wikimedia.org/r/235685 (https://phabricator.wikimedia.org/T109279) (owner: 10Jcrespo) [08:35:41] hashar, we both will notice it :-) connections will fail [08:36:39] it is true that phabricator had connection issues, but that is a very special case (creating 3000 connections in a fraction of a second) [08:41:12] 6operations, 10Traffic, 10netops: Requests from a specific network are blocked - https://phabricator.wikimedia.org/T110208#1601283 (10MartinK) //Just as a note:// I am running a small wiki for the students attending my lecture on the same shared hosting servers we are talking about here. This Wiki was also a... [08:42:47] hashar: indeed, reqstats is on its way out so I wouldn't be too concerned T83580 [08:44:54] PROBLEM - puppet last run on analytics1015 is CRITICAL: CRITICAL: Puppet has 1 failures [08:45:22] godog: ah thanks. 
marking it as a dupe so [08:45:39] 6operations, 10Analytics, 7Graphite: Graphite `reqstats.` hierarchy is filled with apparently unused metrics for each of our wiki domains - https://phabricator.wikimedia.org/T111318#1601287 (10hashar) [08:45:41] 6operations, 6Analytics-Kanban, 7Monitoring, 5Patch-For-Review: Overhaul reqstats - https://phabricator.wikimedia.org/T83580#1601288 (10hashar) [08:46:39] 6operations, 10Analytics, 7Graphite: Graphite `reqstats.` hierarchy is filled with apparently unused metrics for each of our wiki domains - https://phabricator.wikimedia.org/T111318#1601266 (10hashar) Per @fgiunchedi , `reqstats.` is being overhauled: {T83580}. [08:47:04] hashar: np, thanks though for looking! [08:48:41] (03PS1) 10Jcrespo: grant phstats mysql user SELECT on phabricator_project on m3 [puppet] - 10https://gerrit.wikimedia.org/r/235687 (https://phabricator.wikimedia.org/T111200) [08:49:44] (03CR) 10Jcrespo: [C: 032] grant phstats mysql user SELECT on phabricator_project on m3 [puppet] - 10https://gerrit.wikimedia.org/r/235687 (https://phabricator.wikimedia.org/T111200) (owner: 10Jcrespo) [08:54:47] 6operations, 7Database, 5Patch-For-Review: Grant puppet script access to "phabricator_project" DB - https://phabricator.wikimedia.org/T111200#1601311 (10jcrespo) Done, I have added the rights to access for that user from iridium, like the other databases. Please check that it works for you, close as resolved... 
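On the connection budget jynus described earlier (a server maximum of 500 concurrent connections shared with 10 other services, so "do not use more than, let's say, 50"), a client can enforce such a cap on its own side. The sketch below is one possible way to do that with a semaphore, not how nodepool or any service in this log actually does it; `ConnectionBudget` and the `connect` callable are hypothetical names, and `connect` is assumed to return an object with a `close()` method.

```python
import threading
from contextlib import contextmanager


class ConnectionBudget:
    """Cap the number of connections a client holds open at the same time."""

    def __init__(self, limit):
        self.limit = limit
        # BoundedSemaphore raises if released more often than acquired,
        # which catches bookkeeping bugs early.
        self._slots = threading.BoundedSemaphore(limit)

    @contextmanager
    def connection(self, connect, timeout=None):
        # Wait (up to `timeout` seconds, or forever if None) for a free slot.
        if not self._slots.acquire(timeout=timeout):
            raise RuntimeError("connection budget exhausted")
        try:
            conn = connect()  # `connect` is a caller-supplied factory
            try:
                yield conn
            finally:
                conn.close()
        finally:
            # The slot is returned even if connect() or the caller's code fails.
            self._slots.release()
```

With `budget = ConnectionBudget(50)`, each worker would wrap its database access in `with budget.connection(make_conn) as conn:`, where `make_conn` stands for whatever factory the application already uses. A pool-size setting in the application, where one exists, achieves the same result with less code.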
[08:55:10] 6operations, 7Database: Grant puppet script access to "phabricator_project" DB - https://phabricator.wikimedia.org/T111200#1601312 (10jcrespo) [08:56:00] (03CR) 10Luke081515: [C: 031] Make eswiki groupOverrides inherit overrides from relevant tags [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235666 (https://phabricator.wikimedia.org/T109157) (owner: 10Alex Monk) [09:07:20] (03CR) 10Filippo Giunchedi: [C: 04-1] "minor nits, LGTM otherwise" (032 comments) [tools/scap] - 10https://gerrit.wikimedia.org/r/224374 (owner: 10Thcipriani) [09:08:07] (03PS2) 10Muehlenhoff: Remove the ferm rules from modules/rsync/manifests/server.pp [puppet] - 10https://gerrit.wikimedia.org/r/235425 (https://phabricator.wikimedia.org/T108987) [09:09:07] (03CR) 10Muehlenhoff: [C: 032 V: 032] Remove the ferm rules from modules/rsync/manifests/server.pp [puppet] - 10https://gerrit.wikimedia.org/r/235425 (https://phabricator.wikimedia.org/T108987) (owner: 10Muehlenhoff) [09:11:23] RECOVERY - puppet last run on analytics1015 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [09:12:22] !log stop puppet on ms-be1* after ferm rsync change [09:12:23] !log updated rsyncd firewall rules (see https://gerrit.wikimedia.org/r/235425 for details) [09:12:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [09:12:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [09:12:59] thanks morebots for not losing both [09:16:06] !log started profiling mysql queries at phabricator. Only a 1% overhead is expected. [09:16:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [09:22:32] 7Blocked-on-Operations, 6operations, 5Continuous-Integration-Scaling: Backport python-os-client-config 1.3.0-1 from Debian Sid to jessie-wikimedia - https://phabricator.wikimedia.org/T104967#1601353 (10hashar) >>! In T104967#1504200, @Andrew wrote: > I'm a bit confused by this task. 
Is it still just that yo... [09:30:09] got a .deb packaging question. Patches under debian/patches only patch the source code aren't they? Or to say otherwise can I provide a patch in debian/patches that patch debian/control as well :-} [09:32:23] hashar: only for source. [09:32:31] You don't need to patch debian/control. [09:33:46] so I guess I need to craft a commit that adds the patch in debian/patches/ and amend the control file [09:33:50] I am still very confused [09:34:00] Yes. [09:34:34] (03PS7) 10Hashar: Stop all threads on SIGUSR1 [debs/nodepool] (patch-queue/debian) - 10https://gerrit.wikimedia.org/r/225410 [09:35:33] (03CR) 10Hashar: "Moved forward patch-queue/debian to catch up with debian branch. Change rebased on the tip of the branch:" [debs/nodepool] (patch-queue/debian) - 10https://gerrit.wikimedia.org/r/225410 (owner: 10Hashar) [09:35:58] the python modules dependencies handling is becoming a nigthmare [09:38:07] hashar: debian/patches is intended for Debian-specific changes to a specific upstream release, and the the rest of debian/ directory is specific Debian anyway [09:40:32] (03PS1) 10Hashar: Debug dying task managers [debs/nodepool] (patch-queue/debian) - 10https://gerrit.wikimedia.org/r/235690 [09:40:34] (03PS1) 10Hashar: Convert to use latest statsd version [debs/nodepool] (patch-queue/debian) - 10https://gerrit.wikimedia.org/r/235691 [09:40:36] (03PS1) 10Hashar: Convert timing metrics to milliseconds [debs/nodepool] (patch-queue/debian) - 10https://gerrit.wikimedia.org/r/235692 [09:40:48] those are upstream patches one of them change requirement [09:41:03] guess when generating the quilt patches I need to change debian/control now :D [09:42:38] (03CR) 10Hashar: [C: 032] "Not meant to be merged, that is applied to the Debian package as a quilt patch." 
[debs/nodepool] (patch-queue/debian) - 10https://gerrit.wikimedia.org/r/235690 (owner: 10Hashar) [09:42:43] (03CR) 10Hashar: [C: 032] "Not meant to be merged, that is applied to the Debian package as a quilt patch." [debs/nodepool] (patch-queue/debian) - 10https://gerrit.wikimedia.org/r/235691 (owner: 10Hashar) [09:42:50] (03CR) 10Hashar: [C: 04-2] "Not meant to be merged, that is applied to the Debian package as a quilt patch." [debs/nodepool] (patch-queue/debian) - 10https://gerrit.wikimedia.org/r/235692 (owner: 10Hashar) [09:42:59] (03CR) 10Hashar: [C: 04-2] "I meant -2" [debs/nodepool] (patch-queue/debian) - 10https://gerrit.wikimedia.org/r/235691 (owner: 10Hashar) [09:43:09] (03CR) 10Hashar: [C: 04-2] "I meant -2" [debs/nodepool] (patch-queue/debian) - 10https://gerrit.wikimedia.org/r/235690 (owner: 10Hashar) [09:51:01] (03PS1) 10Zfilipin: WIP rubocop: do not run for upstream code [puppet] - 10https://gerrit.wikimedia.org/r/235695 (https://phabricator.wikimedia.org/T102020) [09:51:34] (03CR) 10Zfilipin: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/235695 (https://phabricator.wikimedia.org/T102020) (owner: 10Zfilipin) [09:55:52] (03PS2) 10Zfilipin: WIP rubocop: do not run for upstream code [puppet] - 10https://gerrit.wikimedia.org/r/235695 (https://phabricator.wikimedia.org/T102020) [09:56:08] (03CR) 10Zfilipin: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/235695 (https://phabricator.wikimedia.org/T102020) (owner: 10Zfilipin) [09:57:12] (03CR) 10Zfilipin: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/235695 (https://phabricator.wikimedia.org/T102020) (owner: 10Zfilipin) [10:01:08] 7Puppet, 10Continuous-Integration-Config, 6Scrum-of-Scrums, 5Patch-For-Review: Setup rubocop for operations/puppet ruby code lints - https://phabricator.wikimedia.org/T102020#1601463 (10zeljkofilipin) I have disabled RuboCop for upstream code in [[ https://gerrit.wikimedia.org/r/235695 | 235695 ]], but I a... 
[10:02:03] !log reenable puppet on ms-be1* [10:02:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:02:15] 6operations, 10ContentTranslation-Deployments, 10MediaWiki-extensions-ContentTranslation, 5ContentTranslation-Release6, and 2 others: Review and create table for Content Translation - https://phabricator.wikimedia.org/T111317#1601466 (10jcrespo) This SQL script does not involve any schema change -only new... [10:03:24] PROBLEM - Outgoing network saturation on labstore1002 is CRITICAL: CRITICAL: 15.38% of data above the critical threshold [100000000.0] [10:05:49] (03PS1) 10Hashar: 0.1.1-wmf2: support python-statsd >= 3.x [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/235696 (https://phabricator.wikimedia.org/T107268) [10:06:31] 6operations, 5Continuous-Integration-Scaling, 7Nodepool: Backport python-shade from debian/testing to jessie-wikimedia - https://phabricator.wikimedia.org/T107267#1601483 (10hashar) [10:06:35] 7Blocked-on-Operations, 6operations, 5Continuous-Integration-Scaling: Backport python-os-client-config 1.3.0-1 from Debian Sid to jessie-wikimedia - https://phabricator.wikimedia.org/T104967#1601485 (10hashar) [10:07:10] es1002 -> 23 hours and still copying. Only 1 TB left. [10:12:15] (03CR) 10Filippo Giunchedi: [C: 04-1] Add config deployment (033 comments) [tools/scap] - 10https://gerrit.wikimedia.org/r/235385 (owner: 10Thcipriani) [10:12:32] jynus: how fast does it go btw? 
[10:14:18] 7Blocked-on-Operations, 6operations, 5Continuous-Integration-Scaling, 7Nodepool: Upload nodepool_0.1.1-wmf2 package to apt.wikimedia.org to `jessie-wikimedia/thirdparty` - https://phabricator.wikimedia.org/T111203#1601509 (10hashar) [10:14:43] 7Blocked-on-Operations, 6operations, 5Continuous-Integration-Scaling, 7Nodepool: Upload nodepool_0.1.1-wmf2 package to apt.wikimedia.org to `jessie-wikimedia/thirdparty` - https://phabricator.wikimedia.org/T111203#1597647 (10hashar) Bumped the package from wmf1 to wmf2 which includes python-statsd 3.x supp... [10:16:14] PROBLEM - puppet last run on analytics1015 is CRITICAL: CRITICAL: Puppet has 1 failures [10:18:13] 6operations, 10ops-eqiad, 7Database: Disk issue on db1028 - https://phabricator.wikimedia.org/T103230#1601527 (10jcrespo) ``` Firmware state: Online, Spun Up ``` After the rebuild, no extra strange logs: ``` Time: Wed Sep 2 19:14:41 2015 Code: 0x00000051 Class: 0 Locale: 0x01 Event Description: State cha... [10:18:54] godog, 80MB/s [10:19:02] half the time it uses to be [10:19:15] probably because of the snapshots running on the same servers [10:19:31] *half the speed [10:19:52] the eta was 16 hours, it will probably take 25-28 [10:20:45] I send it gz compressed, but for that server I almost do now will anything [10:20:52] *win [10:22:51] (03PS1) 10Jcrespo: Repool db1028 after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235698 (https://phabricator.wikimedia.org/T103230) [10:23:24] RECOVERY - Outgoing network saturation on labstore1002 is OK: OK: Less than 10.00% above the threshold [75000000.0] [10:23:54] jynus: nice! 
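Note on the transfer estimate above: the figures quoted in the channel (about 1 TB left after 23 hours, copying at roughly 80MB/s) can be checked with a little arithmetic. The sketch below is only an illustration. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant rate, neither of which is stated in the log, and the function name is hypothetical.

```python
def remaining_hours(bytes_left: float, bytes_per_second: float) -> float:
    """Hours needed to copy bytes_left at a constant rate."""
    return bytes_left / bytes_per_second / 3600


TB = 10**12  # assumption: decimal terabyte
MB = 10**6   # assumption: decimal megabyte

elapsed_hours = 23                      # "23 hours and still copying"
left = remaining_hours(1 * TB, 80 * MB)  # "Only 1 TB left" at "80MB/s"
total = elapsed_hours + left

print(f"{left:.1f} h remaining, about {total:.1f} h in total")
```

Under those assumptions the total falls inside the 25-28 hour range mentioned in the conversation.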
[10:24:48] 7Blocked-on-Operations, 6operations, 10Continuous-Integration-Infrastructure, 7Jenkins: Please refresh Jenkins package on apt.wikimedia.org to 1.609.3 - https://phabricator.wikimedia.org/T111327#1601571 (10hashar) 3NEW [10:25:08] 7Blocked-on-Operations, 6operations, 10Continuous-Integration-Infrastructure, 7Jenkins: Please refresh Jenkins package on apt.wikimedia.org to 1.609.3 - https://phabricator.wikimedia.org/T111327#1601582 (10hashar) p:5Triage>3Normal [10:27:47] 7Blocked-on-Operations, 6operations, 5Continuous-Integration-Scaling, 7Nodepool: Upload nodepool_0.1.1-wmf2 package to apt.wikimedia.org to `jessie-wikimedia/thirdparty` - https://phabricator.wikimedia.org/T111203#1601604 (10hashar) p:5Triage>3High Setting priority to high because that is preventing us... [10:28:56] (03PS1) 10Jcrespo: Increasing es2 and es3 servers to normal loads [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235700 [10:29:52] (03CR) 10Jcrespo: [C: 032] Repool db1028 after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235698 (https://phabricator.wikimedia.org/T103230) (owner: 10Jcrespo) [10:30:51] (03CR) 10Jcrespo: [C: 032] Increasing es2 and es3 servers to normal loads [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235700 (owner: 10Jcrespo) [10:30:56] jynus: is Jcrespo? [10:31:10] I am Jcrespo [10:31:18] cool. Hi there :) [10:31:21] hello [10:31:53] jynus: So, we need to deploy new tables and you mentioned that it doesn't need DBA review - when we can do that? [10:32:15] Sometime next week is possible? I'll test first in Beta. [10:32:21] at any time [10:32:39] jynus: I should ping you on Monday then. [10:32:44] just get the deployment people in the loop (I do not know how they organize those) [10:33:40] jynus: sure. [10:33:57] I'll look into older ticket on how-we-did earlier too. 
[10:34:12] basically, you have my +1 to do it at any time [10:34:14] PROBLEM - puppet last run on mw2181 is CRITICAL: CRITICAL: puppet fail [10:34:43] My only worry is when it involves ALTER TABLE [10:34:54] CREATE TABLE is not an issue [10:35:08] jynus: first one for contenttranslation was: https://phabricator.wikimedia.org/T84969 [10:35:42] Okay! [10:36:32] if you need help doing those [10:37:00] I will need to understand which server and database to apply those to [10:37:18] Sure. [10:37:54] but only if you need help, as I said, I am not a blocker, and it can be done by anyone :-) [10:39:29] I see the previous change applied on x1-master, "wikishared" database [10:39:43] please confirm whether that would be enough [10:39:48] and I can do it right now [10:40:11] jynus: confirm [10:40:13] wikishared [10:40:33] so, it is a table that is only there, not on every wiki? [10:40:35] jynus: cx* tables are from contenttranslation [10:40:52] jynus: it is shared among all wikis. [10:40:59] 'wikishared' [10:41:27] understood - those are the kind of things that I need to understand (every extension has its own schema :-P) [10:41:38] :) [10:41:47] I just know this much :) [10:42:02] some have tables per wiki, some in 3 different places, some only on some wikis [10:42:15] that is why I need to ask basic questions like these [10:42:29] No problem! [10:43:08] kart_, sorry, can you link me to the ticket again, I have lost the number [10:43:22] the new change, I mean [10:43:32] jynus: new one? or old one? 
[10:43:36] https://phabricator.wikimedia.org/T111317 - new [10:43:57] 6operations, 10ContentTranslation-Deployments, 10MediaWiki-extensions-ContentTranslation, 5ContentTranslation-Release6, and 2 others: Review and create table for Content Translation - https://phabricator.wikimedia.org/T111317#1601715 (10jcrespo) a:3jcrespo [10:44:05] 6operations, 10ContentTranslation-Deployments, 10MediaWiki-extensions-ContentTranslation, 5ContentTranslation-Release6, and 2 others: Review and create table for Content Translation - https://phabricator.wikimedia.org/T111317#1601252 (10jcrespo) p:5Triage>3Normal [10:44:21] 6operations, 10ContentTranslation-Deployments, 10MediaWiki-extensions-ContentTranslation, 5ContentTranslation-Release6, and 3 others: Review and create table for Content Translation - https://phabricator.wikimedia.org/T111317#1601252 (10jcrespo) [10:45:26] !log applying schema change for ContentTranslation on x1-master "wikishared" [10:45:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:47:55] kart_, there was a problem- the database was configured in latin1 encoding [10:48:18] jynus: was it actual run or dry run? [10:48:22] that is probably something you do not want (as it will fail with utf8 characters) [10:48:33] the existing database [10:48:49] so I am going to change it to binary charset to avoid problems [10:49:00] and review the existing tables [10:49:24] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 2 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/). [10:49:30] jynus: it may affect current users. [10:49:41] jynus: can you report it? [10:49:56] it may not if the tables are created properly [10:50:09] I am going to fix the configuration and the new tables [10:50:14] jynus: Please don't create tables now, we need to test in beta. [10:50:28] ok, then [10:50:37] Thanks :) [10:51:00] jynus: report issue, I think we need to fix encoding issue. 
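Note on the encoding concern above: the worry is that a latin1 database cannot represent arbitrary UTF-8 text, while a binary charset stores the bytes untouched. The sketch below shows the general limitation using Python's own codecs. It is not a reproduction of what the database server does (a server may substitute or truncate characters instead of raising an error, and its latin1 is not necessarily identical to Python's `latin-1`), and both helper names are hypothetical.

```python
def fits_latin1(text: str) -> bool:
    """True if every character in text has a Latin-1 representation."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def roundtrip_binary(text: str) -> str:
    """Store text as raw UTF-8 bytes, as a binary column would, and read it back."""
    stored = text.encode("utf-8")  # opaque bytes, no charset conversion applied
    return stored.decode("utf-8")


for sample in ["café", "日本語"]:
    print(sample, fits_latin1(sample), roundtrip_binary(sample) == sample)
```

Western European text fits in Latin-1, but most other scripts do not, which is why a shared table used by all wikis is safer when stored as raw bytes.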
[10:53:05] which project? ContentTranslation-* [10:53:19] cxserver? [10:53:25] jynus: mw-cx - you'll get MediaWiki-ContentTranslation.. [10:53:31] ah, ok [10:54:06] jynus: I've to go, will be back in hour or around. Thanks for quick test :) [10:55:10] kart_, T111333 [10:55:14] bye! [10:56:10] I think I have things on tin, don't I? [10:56:42] 6operations, 10ContentTranslation-Deployments, 10MediaWiki-extensions-ContentTranslation, 5ContentTranslation-Release6, and 3 others: Review and create table for Content Translation - https://phabricator.wikimedia.org/T111317#1601768 (10jcrespo) [10:56:49] 6operations, 10RESTBase, 10RESTBase-Cassandra: Cassandra inter-node encryption (TLS) - https://phabricator.wikimedia.org/T108953#1601771 (10fgiunchedi) thanks @eevans! I think that'll work, basically to generalize it on a per-cluster basis something like: I was thinking something along the lines of (all in... [11:01:35] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [11:01:59] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1028; increase the load of es1010, es1013 and es1017 (duration: 00m 12s) [11:02:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:02:33] RECOVERY - puppet last run on mw2181 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:06:41] (03PS2) 10Muehlenhoff: Enable ferm for mariadb core servers in codfw [puppet] - 10https://gerrit.wikimedia.org/r/235440 [11:06:49] (03CR) 10Muehlenhoff: [C: 032 V: 032] Enable ferm for mariadb core servers in codfw [puppet] - 10https://gerrit.wikimedia.org/r/235440 (owner: 10Muehlenhoff) [11:07:41] 6operations, 7Database: Grant puppet script access to "phabricator_project" DB - https://phabricator.wikimedia.org/T111200#1601796 (10Aklapper) >>! In T111200#1601311, @jcrespo wrote: > Done, I have added the rights to access for that user from iridum, like the other databases. Thanks! 
I'll close this ticket... [11:07:45] 6operations, 7Database: Grant puppet script access to "phabricator_project" DB - https://phabricator.wikimedia.org/T111200#1601797 (10Aklapper) 5Open>3Resolved [11:09:35] PROBLEM - puppet last run on mw2137 is CRITICAL: CRITICAL: puppet fail [11:14:50] ^proxy error [11:17:01] jynus: is there any way to access replicas from extrnal (exept tunneling= [11:17:04] *)? [11:17:43] from extrnal? [11:17:44] RECOVERY - puppet last run on mw2137 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:17:55] (03PS1) 10ArielGlenn: dumps: be able to specify number of chunks for abstracts [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/235704 [11:18:22] jynus: not via the wikimedia, from a external server [11:18:44] PROBLEM - puppet last run on elastic1027 is CRITICAL: CRITICAL: puppet fail [11:19:01] no, mysql port is not exposed publicly, never [11:19:30] but you could setup with dumps you own server on an external server [11:19:31] ok :/ [11:19:37] so i need to tunnel :( [11:19:57] I would recommend against that, too [11:23:03] I mean, Spacemansam you accessing though a tunnel is ok [11:23:13] sorry, I meant Steinsplitter [11:23:33] but do not relay the connection to a 3rd party [11:24:04] my pc --> labs. 
schould be fine :) [11:24:07] on the manual you have how to do it https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database#Configuring_MySQL_Workbench [11:24:16] thanks [11:24:26] it is 1 ssh line of code :-) [11:47:03] RECOVERY - puppet last run on elastic1027 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:47:52] PROBLEM - mysqld processes on es1002 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [11:48:11] ignore icinga [11:48:12] PROBLEM - mysqld processes on es1016 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [11:48:34] those are the depooled hosts, (the clone took more that expected) [11:49:03] I will downtime them when I come back [11:53:24] (03CR) 10Muehlenhoff: "I made a tcpdump of 5 minutes of traffic on db1047 and all mysql traffic originated from internal 10.x addresses." [puppet] - 10https://gerrit.wikimedia.org/r/235444 (owner: 10Muehlenhoff) [12:07:26] 6operations, 10Traffic, 10netops: Requests from a specific network are blocked - https://phabricator.wikimedia.org/T110208#1601875 (10Ironholds) 5stalled>3Resolved Apologies for the disruption; we will try to aim for smaller CIDR blocks with future issues. Agreed with akosiaris that this can be consider... [12:22:55] (03PS2) 10Aklapper: Phabricator project creation/changes log email for Phab admins [puppet] - 10https://gerrit.wikimedia.org/r/233219 (https://phabricator.wikimedia.org/T85183) [12:23:41] (03CR) 10Aklapper: "DB access should be fine now (see T111200) so I'm not aware of anything blocking this. Happy to help testing." 
[puppet] - 10https://gerrit.wikimedia.org/r/233219 (https://phabricator.wikimedia.org/T85183) (owner: 10Aklapper) [12:24:18] !log disable elastic1001 in lvs as we are gonig to try fw apply round #2 [12:24:25] !log move all shards off of elastic1001 [12:24:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:24:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:35:35] (03PS1) 10Rush: elasticsearch: ferm for all nodes [puppet] - 10https://gerrit.wikimedia.org/r/235714 [12:35:45] (03PS2) 10Rush: elasticsearch: ferm for all nodes [puppet] - 10https://gerrit.wikimedia.org/r/235714 [12:37:04] (03CR) 10Rush: [C: 032] elasticsearch: ferm for all nodes [puppet] - 10https://gerrit.wikimedia.org/r/235714 (owner: 10Rush) [12:42:46] !log unban elastic1001 and put back in service [12:42:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:52:13] (03CR) 10Alexandros Kosiaris: [V: 04-1] "The following packages have unmet dependencies:" [debs/contenttranslation/apertium-fr-ca] - 10https://gerrit.wikimedia.org/r/235418 (https://phabricator.wikimedia.org/T99637) (owner: 10KartikMistry) [12:52:57] (03CR) 10Alexandros Kosiaris: [V: 04-1] "The following packages have unmet dependencies:" [debs/contenttranslation/apertium-eo-ca] - 10https://gerrit.wikimedia.org/r/235415 (https://phabricator.wikimedia.org/T102101) (owner: 10KartikMistry) [12:53:17] (03CR) 10Alexandros Kosiaris: [C: 04-1] "The following packages have unmet dependencies:" [debs/contenttranslation/apertium-ca-it] - 10https://gerrit.wikimedia.org/r/235410 (https://phabricator.wikimedia.org/T105582) (owner: 10KartikMistry) [12:53:36] (03CR) 10Alexandros Kosiaris: [V: 04-1] "The following packages have unmet dependencies:" [debs/contenttranslation/apertium-eo-es] - 10https://gerrit.wikimedia.org/r/235408 (https://phabricator.wikimedia.org/T102101) (owner: 10KartikMistry) [12:53:54] (03CR) 10Alexandros Kosiaris: 
[V: 04-1] "The following packages have unmet dependencies:" [debs/contenttranslation/apertium-eo-fr] - 10https://gerrit.wikimedia.org/r/235404 (https://phabricator.wikimedia.org/T102101) (owner: 10KartikMistry) [12:54:08] akosiaris: ah. why I installed new packages in local repo :/ [13:00:15] (03PS2) 10KartikMistry: Add Debian package for apertium-ca-it [debs/contenttranslation/apertium-ca-it] - 10https://gerrit.wikimedia.org/r/235410 (https://phabricator.wikimedia.org/T105582) [13:02:20] (03PS2) 10KartikMistry: Add Debian package for apertium-eo-ca [debs/contenttranslation/apertium-eo-ca] - 10https://gerrit.wikimedia.org/r/235415 (https://phabricator.wikimedia.org/T102101) [13:05:39] (03PS2) 10KartikMistry: Added Debian package for apertium-eo-fr [debs/contenttranslation/apertium-eo-fr] - 10https://gerrit.wikimedia.org/r/235404 (https://phabricator.wikimedia.org/T102101) [13:07:56] (03PS2) 10KartikMistry: Added Debian package for apertium-eo-es [debs/contenttranslation/apertium-eo-es] - 10https://gerrit.wikimedia.org/r/235408 (https://phabricator.wikimedia.org/T102101) [13:09:32] (03PS2) 10KartikMistry: Add Debian package for apertium-fr-ca [debs/contenttranslation/apertium-fr-ca] - 10https://gerrit.wikimedia.org/r/235418 (https://phabricator.wikimedia.org/T99637) [13:13:11] !log bounce carbon daemons on graphite1001 [13:13:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:17:53] !log upgrading mr1-esams and mr1-eqiad to newer junos [13:18:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:34:41] (03CR) 10Filippo Giunchedi: cassandra: WIP support for multiple instances (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/231512 (https://phabricator.wikimedia.org/T95253) (owner: 10Filippo Giunchedi) [13:34:45] (03PS10) 10Filippo Giunchedi: cassandra: WIP support for multiple instances [puppet] - 10https://gerrit.wikimedia.org/r/231512 (https://phabricator.wikimedia.org/T95253) 
[13:38:20] 6operations: dataset1001/dumps rsync setup should use rsync::server from module - https://phabricator.wikimedia.org/T108992#1602112 (10MoritzMuehlenhoff) [13:38:24] PROBLEM - Host mr1-esams is DOWN: CRITICAL - Network Unreachable (91.198.174.247) [13:40:32] (this is me, see above) [13:42:29] 6operations: dataset1001/dumps rsync setup should use rsync::server from module - https://phabricator.wikimedia.org/T108992#1602135 (10MoritzMuehlenhoff) 5Open>3declined Configuring the ferm roles in modules/rsync was a design error, which has been fixed in https://phabricator.wikimedia.org/rOPUPf1a21bfec9ab... [13:42:45] (03PS1) 10Yuvipanda: k8s: Explicitly use the debian backports repo [puppet] - 10https://gerrit.wikimedia.org/r/235722 [13:42:55] (03PS2) 10Yuvipanda: k8s: Explicitly use the debian backports repo [puppet] - 10https://gerrit.wikimedia.org/r/235722 [13:44:00] (03CR) 10Yuvipanda: [C: 032] k8s: Explicitly use the debian backports repo [puppet] - 10https://gerrit.wikimedia.org/r/235722 (owner: 10Yuvipanda) [13:44:35] RECOVERY - Host mr1-esams is UP: PING OK - Packet loss = 0%, RTA = 88.49 ms [13:52:28] (03PS1) 10Muehlenhoff: Assign grains for initial debdeploy clients [puppet] - 10https://gerrit.wikimedia.org/r/235725 [14:11:55] PROBLEM - Host ps1-a8-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:11:55] PROBLEM - Host ps1-b4-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:11:55] PROBLEM - Host ps1-b7-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:12:05] PROBLEM - Host ps1-d5-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:12:06] PROBLEM - Host ps1-d6-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:12:06] PROBLEM - Host ps1-c6-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:12:06] PROBLEM - Host ps1-d2-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:12:06] PROBLEM - Host ps1-d8-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:12:06] PROBLEM - Host ps1-c5-eqiad is DOWN: PING CRITICAL - Packet loss = 100% 
[14:12:06] PROBLEM - Host ps1-d4-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:12:07] PROBLEM - Host ps1-c2-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:12:07] PROBLEM - Host ps1-b8-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:12:08] PROBLEM - Host ps1-c4-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:12:08] PROBLEM - Host ps1-d1-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:12:09] PROBLEM - Host ps1-c1-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:12:16] PROBLEM - Host ps1-c8-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:12:57] paravoid: ^ fallout from reboot I guess? [14:13:03] yes [14:13:34] PROBLEM - Host ps1-a3-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:13:34] PROBLEM - Host ps1-b2-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:13:40] kk, thanks [14:13:46] PROBLEM - Host ps1-b3-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:13:46] PROBLEM - Host ps1-a4-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:13:55] PROBLEM - Host ps1-a1-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:13:55] PROBLEM - Host ps1-a7-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:13:56] PROBLEM - Host ps1-a2-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:13:56] PROBLEM - Host ps1-b6-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:14:06] PROBLEM - Host ps1-b1-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:14:06] PROBLEM - Host ps1-b5-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:14:15] PROBLEM - Host ps1-a6-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:14:25] PROBLEM - Host ps1-c7-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:14:25] PROBLEM - Host ps1-c3-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:14:25] PROBLEM - Host ps1-d7-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:14:25] PROBLEM - Host ps1-d3-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:14:35] PROBLEM - Host ps1-a5-eqiad is DOWN: PING CRITICAL - Packet loss = 100% 
[14:15:46] PROBLEM - Host mr1-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:18:58] (03PS1) 10MarcoAurelio: Change Kannada Wikisource logo and project title [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235728 (https://phabricator.wikimedia.org/T110806) [14:19:04] RECOVERY - Host ps1-b1-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.31 ms [14:19:04] RECOVERY - Host ps1-d1-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.10 ms [14:19:04] RECOVERY - Host ps1-a2-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.17 ms [14:19:04] RECOVERY - Host ps1-d4-eqiad is UP: PING OK - Packet loss = 0%, RTA = 1.58 ms [14:19:05] RECOVERY - Host ps1-b7-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.84 ms [14:19:05] RECOVERY - Host ps1-d3-eqiad is UP: PING OK - Packet loss = 0%, RTA = 4.83 ms [14:19:05] RECOVERY - Host ps1-b6-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.73 ms [14:19:06] RECOVERY - Host ps1-c1-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.59 ms [14:19:06] RECOVERY - Host ps1-a5-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.69 ms [14:19:07] RECOVERY - Host ps1-a6-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.90 ms [14:19:07] RECOVERY - Host ps1-d2-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.72 ms [14:19:08] RECOVERY - Host ps1-c7-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.21 ms [14:19:08] RECOVERY - Host ps1-c3-eqiad is UP: PING OK - Packet loss = 0%, RTA = 4.01 ms [14:19:09] RECOVERY - Host ps1-d6-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.18 ms [14:19:12] o_O [14:19:25] RECOVERY - Host ps1-a4-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.62 ms [14:19:25] RECOVERY - Host ps1-a8-eqiad is UP: PING OK - Packet loss = 0%, RTA = 4.51 ms [14:19:25] RECOVERY - Host ps1-b4-eqiad is UP: PING OK - Packet loss = 0%, RTA = 4.02 ms [14:19:36] RECOVERY - Host ps1-d5-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.65 ms [14:19:36] RECOVERY - Host ps1-c2-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.26 ms [14:19:36] RECOVERY - 
Host ps1-a7-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.99 ms [14:19:36] RECOVERY - Host ps1-b8-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.65 ms [14:20:17] 6operations, 10ops-esams, 10netops: Fix esams management network - https://phabricator.wikimedia.org/T80253#1602280 (10faidon) p:5Normal>3Low [14:20:18] 6operations, 10ops-esams, 10netops: Fix esams management network - https://phabricator.wikimedia.org/T80253#873175 (10faidon) [14:21:08] !log rebooting msw1-codfw [14:21:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:24:59] !log changing IPv6 RA interval/lifetime/virtual-router-only @ eqiad [14:25:03] bblack: ^ [14:25:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:25:26] :) [14:25:54] PROBLEM - Host mr1-codfw is DOWN: PING CRITICAL - Packet loss = 100% [14:26:06] PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 114, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-5/3/3: down - Core: msw1-codfw:xe-0/1/2 {#10712} [10Gbps DF]BR [14:26:24] default via fe80::1 dev eth0 proto ra metric 1024 expires 582sec hoplimit 64 [14:26:47] yup [14:26:52] 10Ops-Access-Requests, 6operations: Request to access apertium-apy service restart - https://phabricator.wikimedia.org/T111360#1602327 (10KartikMistry) 3NEW [14:27:05] akosiaris: ^ [14:28:29] !log restarted phd (phabricator daemon) to pick up new configuration [14:28:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:30:25] RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 116, down: 0, dormant: 0, excluded: 0, unused: 0 [14:31:35] RECOVERY - Host mr1-codfw is UP: PING OK - Packet loss = 0%, RTA = 53.02 ms [14:32:44] RECOVERY - mysqld processes on es1002 is OK: PROCS OK: 1 process with command name mysqld [14:32:58] (03CR) 10Filippo Giunchedi: [C: 031] Assign grains for initial debdeploy clients [puppet] 
- 10https://gerrit.wikimedia.org/r/235725 (owner: 10Muehlenhoff) [14:33:05] !log rebooting msw1-eqiad [14:33:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:34:18] (03PS2) 10Muehlenhoff: Assign grains for initial debdeploy clients [puppet] - 10https://gerrit.wikimedia.org/r/235725 [14:35:16] RECOVERY - mysqld processes on es1016 is OK: PROCS OK: 1 process with command name mysqld [14:36:14] PROBLEM - Host ps1-b6-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:14] PROBLEM - Host ps1-d3-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:15] PROBLEM - Host ps1-c3-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:15] PROBLEM - Host ps1-b8-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:15] PROBLEM - Host ps1-b4-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:15] PROBLEM - Host ps1-b2-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:15] PROBLEM - Host ps1-c2-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:16] PROBLEM - Host ps1-a6-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:16] PROBLEM - Host ps1-d7-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:24] PROBLEM - Host ps1-a1-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:24] PROBLEM - Host ps1-c7-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:24] PROBLEM - Host ps1-a2-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:24] PROBLEM - Host ps1-a8-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:24] PROBLEM - Host ps1-b7-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:25] PROBLEM - Host ps1-b3-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:25] PROBLEM - Host ps1-b1-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:26] PROBLEM - Host ps1-d2-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:26] PROBLEM - Host ps1-c1-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:27] PROBLEM - Host ps1-d1-eqiad is DOWN: PING CRITICAL - Packet loss = 
100% [14:36:28] PROBLEM - Host ps1-a5-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:28] PROBLEM - Host ps1-d4-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:34] PROBLEM - Host ps1-c5-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:34] PROBLEM - Host ps1-c8-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:34] PROBLEM - Host ps1-d6-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:34] PROBLEM - Host ps1-a3-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:34] PROBLEM - Host ps1-c6-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:35] PROBLEM - Host ps1-c4-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:35] PROBLEM - Host ps1-b5-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:36] (03CR) 10Muehlenhoff: [C: 032 V: 032] Assign grains for initial debdeploy clients [puppet] - 10https://gerrit.wikimedia.org/r/235725 (owner: 10Muehlenhoff) [14:36:36] PROBLEM - Host ps1-d8-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:37] still me [14:36:54] PROBLEM - Host ps1-d5-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:54] PROBLEM - Host mr1-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:36:54] PROBLEM - Host ps1-a4-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:37:04] PROBLEM - Host ps1-a7-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [14:37:26] 7Blocked-on-Operations, 6operations, 5Continuous-Integration-Scaling, 7Nodepool: Upload nodepool_0.1.1-wmf2 package to apt.wikimedia.org to `jessie-wikimedia/thirdparty` - https://phabricator.wikimedia.org/T111203#1602397 (10Andrew) 5Open>3Resolved a:3Andrew ok, should be all set. Let me know if it... 
[14:40:53] (03PS2) 10Ottomata: Set Yarn AppMaster possible port range to 55000-55199 [puppet] - 10https://gerrit.wikimedia.org/r/235261 [14:41:06] (03CR) 10Ottomata: [C: 032 V: 032] Set Yarn AppMaster possible port range to 55000-55199 [puppet] - 10https://gerrit.wikimedia.org/r/235261 (owner: 10Ottomata) [14:41:54] RECOVERY - puppet last run on analytics1015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:42:05] RECOVERY - Host ps1-a5-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.74 ms [14:42:05] RECOVERY - Host ps1-c1-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.46 ms [14:42:05] RECOVERY - Host ps1-a8-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.59 ms [14:42:05] RECOVERY - Host ps1-c7-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.86 ms [14:42:05] RECOVERY - Host ps1-b6-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.35 ms [14:42:05] RECOVERY - Host ps1-d1-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.17 ms [14:42:05] RECOVERY - Host ps1-d2-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.01 ms [14:42:06] RECOVERY - Host ps1-d4-eqiad is UP: PING OK - Packet loss = 0%, RTA = 5.18 ms [14:42:06] RECOVERY - Host ps1-b1-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.95 ms [14:42:07] RECOVERY - Host ps1-d3-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.73 ms [14:42:07] RECOVERY - Host ps1-a6-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.70 ms [14:42:08] RECOVERY - Host ps1-b7-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.28 ms [14:42:08] RECOVERY - Host ps1-a2-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.10 ms [14:42:09] RECOVERY - Host ps1-b5-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.60 ms [14:42:25] RECOVERY - Host ps1-a4-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.91 ms [14:42:25] RECOVERY - Host ps1-b4-eqiad is UP: PING OK - Packet loss = 0%, RTA = 4.36 ms [14:42:26] RECOVERY - Host ps1-b8-eqiad is UP: PING OK - Packet loss = 0%, RTA = 4.92 ms [14:42:26] RECOVERY - Host ps1-c2-eqiad is UP: 
PING OK - Packet loss = 0%, RTA = 8.00 ms [14:42:35] RECOVERY - Host mr1-eqiad is UP: PING OK - Packet loss = 0%, RTA = 1.33 ms [14:42:35] RECOVERY - Host ps1-d5-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.87 ms [14:42:35] oh hello icinga [14:42:39] good morning to you too [14:42:44] RECOVERY - Host ps1-a7-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.80 ms [14:43:28] it's all good [14:43:36] just switch/router upgrades [14:43:38] :) [14:43:58] and to be fair, the fact that we have no dependencies whatsoever in our icinga config :P [14:44:50] (03PS1) 10Muehlenhoff: Enable ferm on analytics1028 [puppet] - 10https://gerrit.wikimedia.org/r/235731 [14:51:25] !log performing es2 master switchover from es1006 to es1011 [14:51:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:56:32] (03CR) 10Muehlenhoff: [C: 032 V: 032] Enable ferm on analytics1028 [puppet] - 10https://gerrit.wikimedia.org/r/235731 (owner: 10Muehlenhoff) [14:56:49] 7Blocked-on-Operations, 6operations, 5Continuous-Integration-Scaling, 7Nodepool: Upload nodepool_0.1.1-wmf2 package to apt.wikimedia.org to `jessie-wikimedia/thirdparty` - https://phabricator.wikimedia.org/T111203#1602482 (10hashar) The service started, awesome. Spurts some error but that is a configurati... [15:00:04] anomie ostriches thcipriani marktraceur Krenair: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150903T1500). [15:00:04] kart_: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [15:00:13] here [15:00:20] I can do it [15:00:29] thnx! [15:01:15] kart_: Just wmf21? [15:01:25] yes [15:02:18] ostriches, could I sneak https://gerrit.wikimedia.org/r/#/c/235666/ in? [15:03:17] Krenair: Yeah that's fine. Add it to the page [15:03:28] In fact I'll do yours now while we wait on kart_'s to merge. 
[15:03:39] (03CR) 10Chad: [C: 032] Make eswiki groupOverrides inherit overrides from relevant tags [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235666 (https://phabricator.wikimedia.org/T109157) (owner: 10Alex Monk) [15:04:08] (03Merged) 10jenkins-bot: Make eswiki groupOverrides inherit overrides from relevant tags [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235666 (https://phabricator.wikimedia.org/T109157) (owner: 10Alex Monk) [15:04:18] (03PS1) 10Ottomata: analytics1015 is no longer a hadoop worker [puppet] - 10https://gerrit.wikimedia.org/r/235734 [15:04:25] (03PS2) 10Ottomata: analytics1015 is no longer a hadoop worker [puppet] - 10https://gerrit.wikimedia.org/r/235734 [15:04:44] added [15:04:58] !log demon@tin Synchronized php-1.26wmf21/extensions/Translate/: (no message) (duration: 00m 15s) [15:05:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:05:04] kart_: Plz test ^ [15:05:11] Testing [15:05:34] ostriches: OK. Fixed :) [15:05:38] !log demon@tin Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 13s) [15:05:38] Nikerabbit: ^^ [15:05:41] yay [15:05:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:05:44] Krenair: toi aussi ^ [15:06:31] (03CR) 10Ottomata: [C: 032] analytics1015 is no longer a hadoop worker [puppet] - 10https://gerrit.wikimedia.org/r/235734 (owner: 10Ottomata) [15:06:33] yep, that did the trick [15:06:34] thanks [15:06:54] np. [15:06:57] Anybody else? [15:07:30] I have something, but I want to do it myself, when you finish [15:08:31] I think that's it then [15:08:41] * ostriches hands jynus the Rod of Deployments [15:08:43] Go forth! 
[15:08:46] :-) [15:10:37] (03PS1) 10Jcrespo: Switchover of es2 master from es1006 to es1011 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235735 (https://phabricator.wikimedia.org/T105843) [15:11:16] (03PS1) 10Yuvipanda: k8s: Setup cluster-dns for k8s [puppet] - 10https://gerrit.wikimedia.org/r/235736 [15:11:43] (03CR) 10Jcrespo: [C: 032] Switchover of es2 master from es1006 to es1011 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235735 (https://phabricator.wikimedia.org/T105843) (owner: 10Jcrespo) [15:11:53] (03PS2) 10Yuvipanda: k8s: Setup cluster-dns for k8s [puppet] - 10https://gerrit.wikimedia.org/r/235736 [15:14:19] here we go [15:14:31] PROBLEM - Hadoop DataNode on analytics1015 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode [15:14:43] (03PS2) 10Chad: Elastic: move merge_threads to hiera [puppet] - 10https://gerrit.wikimedia.org/r/207377 [15:14:44] !log jynus@tin Synchronized wmf-config/db-codfw.php: es2 master switchover from es1006 to es1011 (codfw) (duration: 00m 12s) [15:14:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:14:52] icinga beat me to it. [15:14:55] (03CR) 10jenkins-bot: [V: 04-1] Elastic: move merge_threads to hiera [puppet] - 10https://gerrit.wikimedia.org/r/207377 (owner: 10Chad) [15:14:56] cmon puppet! [15:14:58] i'm clearing you [15:15:03] icinga! [15:15:21] !log jynus@tin Synchronized wmf-config/db-eqiad.php: es2 master switchover from es1006 to es1011 (eqiad) (duration: 00m 13s) [15:15:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:15:48] kart_: works [15:16:15] (03PS3) 10Chad: Elastic: move merge_threads to hiera [puppet] - 10https://gerrit.wikimedia.org/r/207377 [15:16:17] (03PS1) 10Chad: Add -s option to my .ackrc [puppet] - 10https://gerrit.wikimedia.org/r/235738 [15:16:25] Wherps. [15:16:29] Forgot about that patch [15:16:48] lol, YuviPanda... 
[15:16:55] puppetswat 235738? [15:17:42] ostriches: heh, put it in https://wikitech.wikimedia.org/wiki/Deployments#Thursday.2C.C2.A0September.C2.A003? [15:17:45] !log stopping nodepool on labnodepool1001.eqiad.wmnet not ready yet [15:17:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:18:04] so, it seems that we can still save pages, so that is cool [15:18:07] It's going to take me longer to edit that page.... [15:18:07] lol [15:18:20] !bash so, it seems that we can still save pages, so that is cool [15:18:54] :) [15:21:08] (03PS1) 10BBlack: multicert + libssl1.0.2 diffs rebased onto 1.9.4-1 [software/nginx] (wmf-1.9.4-1) - 10https://gerrit.wikimedia.org/r/235739 [15:21:10] (03PS1) 10BBlack: Release 1.9.4-1+wmf1 (multicert, libssl1.0.2) [software/nginx] (wmf-1.9.4-1) - 10https://gerrit.wikimedia.org/r/235740 [15:21:35] (03PS1) 10Ottomata: Add analytics1053 and analytic1057 as hadoop worker nodes [puppet] - 10https://gerrit.wikimedia.org/r/235741 [15:21:45] 10Ops-Access-Requests, 6operations, 5Continuous-Integration-Scaling: contint-admins can't start/stop nodepool (lack sudo) - https://phabricator.wikimedia.org/T111374#1602573 (10hashar) 3NEW [15:22:03] (03CR) 10Yuvipanda: [C: 032] k8s: Setup cluster-dns for k8s [puppet] - 10https://gerrit.wikimedia.org/r/235736 (owner: 10Yuvipanda) [15:22:50] (03PS1) 10Hashar: nodepool: sudo rules for contint-admins [puppet] - 10https://gerrit.wikimedia.org/r/235742 (https://phabricator.wikimedia.org/T111374) [15:22:59] actually, with so much redundancy and cache levels, it is not at all clear if I set the wrong server as read-only [15:23:12] *immediately clear [15:23:15] someone in ops should be able to close the task mentioned at https://phabricator.wikimedia.org/T111335#1602583 [15:23:38] (03PS2) 10Ottomata: Add analytics1053 and analytic1057 as hadoop worker nodes [puppet] - 10https://gerrit.wikimedia.org/r/235741 [15:23:44] (03CR) 10Ottomata: [C: 032 V: 032] Add analytics1053 and 
analytic1057 as hadoop worker nodes [puppet] - 10https://gerrit.wikimedia.org/r/235741 (owner: 10Ottomata) [15:23:59] 6operations, 5Patch-For-Review: Ferm rules for elasticsearch - https://phabricator.wikimedia.org/T104962#1602598 (10chasemp) 5Open>3Resolved We had a meeting with the following outcomes: 1. We moved all shards off of the master 2. We picked a relative low traffic time 3. We removed the master from LVS (no... [15:24:27] YuviPanda: Added. [15:24:53] 6operations, 10ops-eqiad, 10Analytics-Cluster, 5Patch-For-Review: rack new hadoop worker nodes - https://phabricator.wikimedia.org/T104463#1602601 (10Ottomata) I just installed and puppetized these nodes. Thanks! [15:25:31] chasemp: Speaking of ferm rules for Elastic ^. I had a dream last night that involved disabling those after they were enabled. Somebody was like "nooooo!" I don't remember the rest of the dream. [15:25:41] I have odd dreams. [15:25:45] Good thing vacation is next week. [15:25:48] :) [15:26:04] you got firewalled out of your dream [15:27:23] 6operations, 10netops: Set up NTT transit @ eqdfw, eqord - https://phabricator.wikimedia.org/T111274#1602632 (10RobH) I've gone ahead and sent the order into EQ via the portal (which attaches the LoA.) Unlike our other carriers, NTT wasn't populated in the carrier drop-down, so I had to manually enter all the... [15:27:55] 6operations, 10netops: Set up NTT transit @ eqdfw, eqord - https://phabricator.wikimedia.org/T111274#1602639 (10RobH) a:3RobH claiming until i get completion notice of the x-connect, then I'll reassign to @faidon. 
[15:29:43] I will wait a bit before doing the switchover of the other shard, just in case [15:32:17] ostriches: :D ok [15:32:26] and this may seem like a small achievement, but just there went our first production master with mariadb10, and that means hundreds of hours of work to make that happen [15:33:24] congrats :) [15:34:43] PROBLEM - Host analytics1053 is DOWN: PING CRITICAL - Packet loss = 100% [15:36:51] (03CR) 10Andrew Bogott: "Was a proper fix for this ever applied?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221825 (owner: 10BryanDavis) [15:38:20] (03PS1) 10Jcrespo: Depool es1006 (old master) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235746 [15:38:33] yay, congrats! [15:39:08] (03CR) 10RobH: [C: 031] "good for puppet swat later today." [puppet] - 10https://gerrit.wikimedia.org/r/235738 (owner: 10Chad) [15:39:10] (03CR) 10Alex Monk: [C: 04-1] "per task, needs image change added" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235728 (https://phabricator.wikimedia.org/T110806) (owner: 10MarcoAurelio) [15:39:20] (03CR) 10Jcrespo: [C: 032] Depool es1006 (old master) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235746 (owner: 10Jcrespo) [15:40:34] Krenair: so I update the file on my local directory and send it to gerrit, right? 
[15:40:41] !log jynus@tin Synchronized wmf-config/db-eqiad.php: depool es1006 (duration: 00m 12s) [15:40:45] if so, I'll do this later [15:40:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:40:46] yep [15:40:49] like any other file [15:41:24] I had a doubt since it was not plain code, knowing that now, I'll amend the patch later today I hope [15:42:01] git handles non-text files as well [15:42:19] it just can't give you useful diffs for them [15:42:52] that's what confused me, I thought, "well, I won't get a diff so gerrit is not the solution" [15:42:55] --> fail [15:42:57] :) [15:43:09] oh and - if you have not already done so when you upload the patch, the deployer should run optipng on the file [15:43:30] in theory [15:43:36] it's been missed a couple of times in the past [15:44:38] will note it in gerrit so it's not missed [15:45:40] (03CR) 10RobH: [C: 031] policy.wikimedia: remove puppetization [puppet] - 10https://gerrit.wikimedia.org/r/235673 (https://phabricator.wikimedia.org/T110203) (owner: 10Dzahn) [15:45:56] (03CR) 10BryanDavis: "> Was a proper fix for this ever applied?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221825 (owner: 10BryanDavis) [15:46:29] (03CR) 10RobH: "This site migrated off our cluster on Tuesday. So we don't need to support its role on our side any longer." 
[puppet] - 10https://gerrit.wikimedia.org/r/235673 (https://phabricator.wikimedia.org/T110203) (owner: 10Dzahn) [15:47:48] 6operations, 10ops-eqiad, 7Database, 5Patch-For-Review: Disk issue on db1028 - https://phabricator.wikimedia.org/T103230#1602678 (10Cmjohnson) 5Open>3Resolved Resolving [15:48:52] ^cmjohnson1, ups [15:52:34] (03PS2) 10BBlack: Release 1.9.4-1+wmf1 (multicert, libssl1.0.2) [software/nginx] (wmf-1.9.4-1) - 10https://gerrit.wikimedia.org/r/235740 [15:53:52] RECOVERY - Host analytics1053 is UP: PING OK - Packet loss = 0%, RTA = 2.35 ms [15:56:59] (03PS10) 10Thcipriani: Add service deploy via scap [tools/scap] - 10https://gerrit.wikimedia.org/r/224374 [15:59:53] PROBLEM - Hadoop DataNode on analytics1053 is CRITICAL: Connection refused by host [16:00:04] YuviPanda robh: Dear anthropoid, the time has come. Please deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150903T1600). [16:00:04] ostriches: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [16:00:33] robh: wanna do it? [16:00:39] super trivial :) [16:00:51] yep [16:00:53] i'll do it [16:01:00] that way we can shame others that 3 opsen have swatted [16:01:01] ;] [16:01:15] (03PS2) 10RobH: Add -s option to my .ackrc [puppet] - 10https://gerrit.wikimedia.org/r/235738 (owner: 10Chad) [16:01:41] !log performing es3 master switchover from es1009 to es1014 [16:01:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:02:29] (03CR) 10RobH: [C: 032] Add -s option to my .ackrc [puppet] - 10https://gerrit.wikimedia.org/r/235738 (owner: 10Chad) [16:03:01] swat complete [16:03:11] ostriches: yer change is now live on cluster [16:03:22] oh my gerdddd [16:03:26] thx [16:03:30] heh [16:03:33] arent you overwhelmed in gratitude of the swat process? [16:03:51] it can be a lot to take in, you can take a moment. 
[16:04:14] Thats why YuviPanda and I do this, to touch lives. [16:04:22] (03PS1) 10BBlack: Enable IPv6 for donate.wm.o [dns] - 10https://gerrit.wikimedia.org/r/235749 (https://phabricator.wikimedia.org/T73267) [16:04:24] PROBLEM - YARN NodeManager Node-State on analytics1053 is CRITICAL: Connection refused by host [16:04:26] I have a couple to add [16:04:33] PROBLEM - configured eth on analytics1053 is CRITICAL: Connection refused by host [16:04:34] PROBLEM - dhclient process on analytics1053 is CRITICAL: Connection refused by host [16:04:37] Krenair: technically its over but link em here =] [16:04:48] https://gerrit.wikimedia.org/r/#/c/234404/ and https://gerrit.wikimedia.org/r/#/c/234429/ [16:04:53] PROBLEM - puppet last run on analytics1053 is CRITICAL: Connection refused by host [16:04:53] PROBLEM - Hadoop NodeManager on analytics1053 is CRITICAL: Connection refused by host [16:05:12] PROBLEM - salt-minion processes on analytics1053 is CRITICAL: Connection refused by host [16:05:16] 6operations, 10Traffic, 10fundraising-tech-ops, 7IPv6, 5Patch-For-Review: Enable IPv6 on donate.wikimedia.org - https://phabricator.wikimedia.org/T73267#1602727 (10BBlack) Actually, it was implemented currently via "text-addrs-v4". Simply removing the trailing "-v4" in a few places fixes this, as shown... 
[16:05:32] PROBLEM - Router interfaces on mr1-eqiad is CRITICAL: CRITICAL: host 10.65.0.1, interfaces up: 34, down: 1, dormant: 0, excluded: 0, unused: 0; fe-0/0/7: down - Layer42 OOB Link - dead - T82323 [16:05:33] PROBLEM - DPKG on analytics1053 is CRITICAL: Connection refused by host [16:05:41] (03CR) 10BBlack: [C: 04-1] "holding on discussion in linked task" [dns] - 10https://gerrit.wikimedia.org/r/235749 (https://phabricator.wikimedia.org/T73267) (owner: 10BBlack) [16:05:43] PROBLEM - RAID on analytics1053 is CRITICAL: Connection refused by host [16:05:52] PROBLEM - Disk space on analytics1053 is CRITICAL: Connection refused by host [16:05:58] Krenair: robh I think it's ok to add more if we have time left - that's how MW swat does it too. Ofc, discretion of the swatters [16:06:03] PROBLEM - Disk space on Hadoop worker on analytics1053 is CRITICAL: Connection refused by host [16:06:28] the apache one seems easy enough (just normal apache updates and all) but the other one I have to look at what polls that file [16:06:31] so i'll do apache one now. [16:07:02] i would do the DNS one [16:07:03] IIRC, you do need to make some form of change to another file in the dns repository to make it take effect [16:07:14] sorry for the pings about an53, in standup, it's a new node, coming up now, will fix. [16:07:40] Krenair: add it to the page? [16:07:46] ok [16:07:48] (03PS2) 10Dzahn: Add be-tarask for renaming from be-x-old [dns] - 10https://gerrit.wikimedia.org/r/234429 (https://phabricator.wikimedia.org/T11823) (owner: 10Alex Monk) [16:08:00] mutante: dns one? [16:08:09] yea, i volunteer for that [16:08:16] no one linked a dns change? 
[16:08:16] adding the new language [16:08:28] https://gerrit.wikimedia.org/r/#/c/234429 [16:08:43] oh, that sorry, i misread it [16:09:13] mutante: im swatting dude [16:09:17] but if you want [16:09:21] you can have them and do the swat ;] [16:09:46] you cannot wander into swat and just take the easiest tickets ;p [16:09:54] steps away again [16:09:59] ;] [16:10:11] 6operations, 10Wikimedia-Git-or-Gerrit: git.wikimedia.org is unstable - https://phabricator.wikimedia.org/T83702#1602730 (10greg) [16:10:40] YuviPanda: wanna +1 https://gerrit.wikimedia.org/r/#/c/234404/2 [16:10:45] it looks good to me but I want two votes [16:11:00] Krenair: should I take care of the size of the logo when uploading the patch, or does deployment take care of that? [16:11:04] (03CR) 10RobH: [C: 032] "will merge in a moment." [puppet] - 10https://gerrit.wikimedia.org/r/234404 (https://phabricator.wikimedia.org/T41482) (owner: 10Alex Monk) [16:11:21] mafk, you need to get the size right, yes [16:11:22] robh: I don't know enough about apache to be able to +1 that, I think. [16:11:27] Okay. [16:11:29] ha [16:11:34] The apache change? [16:11:44] The apache part of it is generated automatically by the script [16:11:54] it's pretty easy; the second review was just to ensure that krenair didn't introduce a typo that i missed [16:11:58] so i'll just merge [16:12:02] (and test) [16:12:05] (03CR) 10RobH: [C: 032] Add be-tarask for renaming from be-x-old [dns] - 10https://gerrit.wikimedia.org/r/234429 (https://phabricator.wikimedia.org/T11823) (owner: 10Alex Monk) [16:12:28] oh, and there's the remnant change as well I guess :/ [16:12:49] but still, it should be pretty trivial if you check what the vhost looked like before the previous change to it [16:13:05] so the apache patchset is missing stuff? 
[16:13:25] no [16:13:25] oh, you mean the remnant change was manual [16:13:29] parsing error on my part ;] [16:13:32] yeah [16:14:08] I thought it was mostly done by the script, then looked back at the change and realised the remnant.conf part was manual [16:14:28] !log disabling puppet on all mw systems for apache config update [16:14:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:15:30] (03PS3) 10RobH: Redirect chapcom.wikimedia.org to affcom.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/234404 (https://phabricator.wikimedia.org/T41482) (owner: 10Alex Monk) [16:16:16] (03CR) 10BBlack: [C: 032 V: 032] multicert + libssl1.0.2 diffs rebased onto 1.9.4-1 [software/nginx] (wmf-1.9.4-1) - 10https://gerrit.wikimedia.org/r/235739 (owner: 10BBlack) [16:16:25] (03CR) 10BBlack: [C: 032 V: 032] Release 1.9.4-1+wmf1 (multicert, libssl1.0.2) [software/nginx] (wmf-1.9.4-1) - 10https://gerrit.wikimedia.org/r/235740 (owner: 10BBlack) [16:16:39] (03CR) 10RobH: [C: 032] Redirect chapcom.wikimedia.org to affcom.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/234404 (https://phabricator.wikimedia.org/T41482) (owner: 10Alex Monk) [16:17:48] 6operations, 10RESTBase-Cassandra, 10hardware-requests: codfw 3x spares for cassandra encryption testing - https://phabricator.wikimedia.org/T111382#1602749 (10fgiunchedi) 3NEW a:3RobH [16:18:39] (03PS2) 10MarcoAurelio: Change Kannada Wikisource logo and project title [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235728 (https://phabricator.wikimedia.org/T110806) [16:19:20] (03PS1) 10Jcrespo: Switchover es3 master from es1009 to es1014 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235755 [16:20:03] RECOVERY - DPKG on analytics1053 is OK: All packages OK [16:20:09] Krenair: before amending the patch, shall I rebase on gerrit? 
[16:20:13] RECOVERY - RAID on analytics1053 is OK: OK: optimal, 13 logical, 14 physical [16:20:18] nah [16:20:21] k [16:20:23] RECOVERY - Disk space on analytics1053 is OK: DISK OK [16:20:33] RECOVERY - Disk space on Hadoop worker on analytics1053 is OK: DISK OK [16:20:43] RECOVERY - Hadoop DataNode on analytics1053 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode [16:20:54] ok, change worked on mw1222 testing, reenabling puppet on all mw and running [16:21:03] RECOVERY - YARN NodeManager Node-State on analytics1053 is OK: OK: YARN NodeManager analytics1053.eqiad.wmnet:8041 Node-State: RUNNING [16:21:12] RECOVERY - configured eth on analytics1053 is OK: OK - interfaces up [16:21:13] RECOVERY - dhclient process on analytics1053 is OK: PROCS OK: 0 processes with command name dhclient [16:21:20] !log re-enabling puppet on all mw systems [16:21:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:21:32] RECOVERY - puppet last run on analytics1053 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [16:21:32] RECOVERY - Hadoop NodeManager on analytics1053 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [16:21:43] RECOVERY - salt-minion processes on analytics1053 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [16:21:55] ok, here we go again [16:22:18] when someone says that i get the 'here i go again on my own' song in my head. [16:22:25] (03CR) 10Jcrespo: [C: 032] Switchover es3 master from es1009 to es1014 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235755 (owner: 10Jcrespo) [16:22:48] https://www.youtube.com/watch?v=i3MXiTeH_Pg [16:23:03] !log fixed content model of Template:Languages@metawiki [16:23:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:23:18] thanks! 
[16:23:36] hrmm, salt re-enabling mw puppet seems to take too long... [16:23:36] robh, we usually test things on mw1017 because of https://wikitech.wikimedia.org/wiki/Debugging_in_production [16:24:14] huh, duly noted [16:24:18] i'll append to the apache sync notes [16:24:39] though the 'test' isn't actual content pulling, just the apache-fast-test script [16:24:47] mmm, I almost made a small mistake [16:25:21] all mw systems doing the 25% at a time batched puppet updating [16:25:42] (03PS3) 10MarcoAurelio: Change Kannada Wikisource logo and project title [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235728 (https://phabricator.wikimedia.org/T110806) [16:25:49] 7Blocked-on-Operations, 6operations, 10Continuous-Integration-Infrastructure, 7Jenkins: Please refresh Jenkins package on apt.wikimedia.org to 1.609.3 - https://phabricator.wikimedia.org/T111327#1602772 (10Dzahn) a:3Dzahn [16:26:16] (03PS1) 10Jcrespo: Decreasing es1014 load, as a master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235758 [16:26:46] !log imported jenkins 1.609.3 into APT repo [16:26:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:26:57] (03CR) 10Jcrespo: [C: 032] Decreasing es1014 load, as a master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235758 (owner: 10Jcrespo) [16:27:48] robh: if you get a chance today mind checking https://phabricator.wikimedia.org/T111382 ? if it looks sane I can start reprovisioning some tomorrow [16:28:39] what hardware are we awaiting shipment to codfw? [16:28:58] !log jynus@tin Synchronized wmf-config/db-codfw.php: es3 master switchover from es1009 to es1014 (codfw) (duration: 00m 13s) [16:29:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:29:31] godog: Sorry, What hardware are we awaiting shipment on to codfw? I have no notes of restbase hardware in codfw. 
[16:29:41] hmm, 123 x 154 for the logo [16:30:19] 7Blocked-on-Operations, 6operations, 10Continuous-Integration-Infrastructure, 7Jenkins: Please refresh Jenkins package on apt.wikimedia.org to 1.609.3 - https://phabricator.wikimedia.org/T111327#1602809 (10Dzahn) I did it. using `reprepro -C thirdparty includedeb precise-wikimedia jenkins_1.609.3_all.deb`.... [16:30:24] 6operations, 10RESTBase-Cassandra, 10hardware-requests: codfw 3x spares for cassandra encryption testing - https://phabricator.wikimedia.org/T111382#1602810 (10RobH) What hardware are we waiting on shipment to codfw? The recent restbase expansion order was entirely for eqiad. [16:30:45] !log jynus@tin Synchronized wmf-config/db-eqiad.php: es3 master switchover from es1009 to es1014 (eqiad) (duration: 00m 13s) [16:30:47] !log updating nginx->1.9.4 on cp1071, cp3033 for prod validation before broader rollout [16:30:47] 7Blocked-on-Operations, 6operations, 10Continuous-Integration-Infrastructure, 7Jenkins: Please refresh Jenkins package on apt.wikimedia.org to 1.609.3 - https://phabricator.wikimedia.org/T111327#1602818 (10Dzahn) 5Open>3Resolved [16:30:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:30:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:31:27] robh: double checking [16:33:18] 6operations, 10Continuous-Integration-Infrastructure, 7Jenkins: Please refresh Jenkins package on apt.wikimedia.org to 1.609.3 - https://phabricator.wikimedia.org/T111327#1602857 (10Dzahn) [16:36:53] 6operations, 10RESTBase-Cassandra, 10hardware-requests: codfw 3x spares for cassandra encryption testing - https://phabricator.wikimedia.org/T111382#1602899 (10GWicke) > The recent restbase expansion order was entirely for eqiad. Umm, that surprises me. Those six nodes are meant to go to codfw, so that we c... 
[16:36:54] PROBLEM - puppet last run on gallium is CRITICAL: CRITICAL: Puppet has 1 failures [16:37:10] (03PS4) 10MarcoAurelio: Change Kannada Wikisource logo and project title [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235728 (https://phabricator.wikimedia.org/T110806) [16:37:35] so, it seems that with that external usb drive that I installed, we will have some extra space to continue saving pages for a couple more days [16:38:27] disaster averted :-) [16:38:46] jynus: hope it is usb3 [16:38:54] robh: did the changes get through, btw? [16:39:16] no, only usb2, but I put 2 of them! [16:39:49] YuviPanda: puppet is still running across all mw [16:39:50] (03CR) 10MarcoAurelio: "Note to deployer: please do not forget to run optipng on the file (from #wikimedia-operations). Thanks." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235728 (https://phabricator.wikimedia.org/T110806) (owner: 10MarcoAurelio) [16:40:12] robh: ok! [16:40:31] jynus: ah, ok. But I thought we were gonna use zip drives? [16:40:35] 25% at a time puppet runs for mw takes as long as letting it run manually. [16:40:45] except it has scrollback for error checking. 
[16:40:59] floppy disks, 5 1/4, but in the end I got budget for usb drives [16:41:15] jynus: waste of donor money, I say [16:41:49] (03PS5) 10Alex Monk: Change Kannada Wikisource logo and project title [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235728 (https://phabricator.wikimedia.org/T110806) (owner: 10MarcoAurelio) [16:42:11] (03CR) 10Alex Monk: "optipng -o7 ran on PS5" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235728 (https://phabricator.wikimedia.org/T110806) (owner: 10MarcoAurelio) [16:45:02] my calculations say that we have now space for 4.2 years of revisions, which means it is now a problem for jynus of the future [16:46:02] jynus: 500 years if we remove vandal edits ;) [16:46:17] :) [16:46:26] actually, no [16:46:36] revisions are stored compressed [16:47:04] so I assume the actual overhead for those is quite small [16:47:44] I think there would be worse offender, like bots going crazy [16:49:11] there's a "block" button for that [16:49:40] oh, yes, I am purely thinking storage-wise, not user-wise [16:51:52] robh, did the dns change not apply? [16:54:05] Krenair: it was supposed to add that to the langsubdomains right? 
[16:54:13] yeah [16:54:23] IIRC, you do need to make some form of change to another file in the dns repository to make it take effect [16:55:28] ahh, i missed that part [16:55:58] robh, mutante: https://gerrit.wikimedia.org/r/#/c/206146/ [16:56:12] yea doing now [16:56:14] 6operations, 6Performance-Team, 7Mobile: Remove docroot:/images/mobile in favour of docroot:/static/images/mobile - https://phabricator.wikimedia.org/T107395#1603115 (10Jdlrobson) [16:59:19] (03CR) 10MarcoAurelio: [C: 04-1] "abusefilter-hide-log and abusefilter-hidden-log are Oversight-level permissions which I don't think should be given to non-oversighters IM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/234910 (https://phabricator.wikimedia.org/T109755) (owner: 10Mjbmr) [16:59:53] RECOVERY - puppet last run on gallium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:00:05] bearND mdholloway: Respected human, time to deploy Mobileapps service (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150903T1700). Please do the needful. [17:00:06] (03PS1) 10RobH: touch wikipedia.org template to generate 'be-tarask' [dns] - 10https://gerrit.wikimedia.org/r/235761 [17:00:43] (03CR) 10RobH: [C: 032] touch wikipedia.org template to generate 'be-tarask' [dns] - 10https://gerrit.wikimedia.org/r/235761 (owner: 10RobH) [17:01:44] Krenair: it works now [17:01:54] YuviPanda: puppet run finished 3 minutes ago [17:02:01] so it technically was done within swat ;] [17:02:12] the apache update, dns was a bit late but meh [17:02:25] yeah, 'tis ok [17:05:39] robh: Krenair so can we call puppetswat complete? [17:05:48] yep [17:05:52] thanks [17:06:34] !log cloning es1006 mysql data into es1015 [ETA:8h] [17:06:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:07:55] 6operations, 10RESTBase, 10RESTBase-Cassandra: Cassandra inter-node encryption (TLS) - https://phabricator.wikimedia.org/T108953#1603185 (10Eevans) >>! 
In T108953#1601771, @fgiunchedi wrote: > thanks @eevans! I think that'll work, basically to generalize it on a per-cluster basis something like: > [ ... ] >... [17:09:31] bd808: please ping if/when you are around [17:18:56] (03PS1) 10Yuvipanda: k8s: Add simple webproxy [puppet] - 10https://gerrit.wikimedia.org/r/235764 [17:19:19] (03PS2) 10Yuvipanda: k8s: Add simple webproxy [puppet] - 10https://gerrit.wikimedia.org/r/235764 [17:20:08] (03CR) 10jenkins-bot: [V: 04-1] k8s: Add simple webproxy [puppet] - 10https://gerrit.wikimedia.org/r/235764 (owner: 10Yuvipanda) [17:21:07] (03PS3) 10Yuvipanda: k8s: Add simple webproxy [puppet] - 10https://gerrit.wikimedia.org/r/235764 [17:23:06] (03CR) 10Yuvipanda: [C: 032 V: 032] k8s: Add simple webproxy [puppet] - 10https://gerrit.wikimedia.org/r/235764 (owner: 10Yuvipanda) [17:25:00] (03PS1) 10Yuvipanda: k8s: We aren't using k9s as we aren't a drug unit [puppet] - 10https://gerrit.wikimedia.org/r/235765 [17:25:12] (03CR) 10Yuvipanda: [C: 032 V: 032] k8s: We aren't using k9s as we aren't a drug unit [puppet] - 10https://gerrit.wikimedia.org/r/235765 (owner: 10Yuvipanda) [17:29:00] (03PS1) 10Yuvipanda: k8s: s/contents/content/ [puppet] - 10https://gerrit.wikimedia.org/r/235766 [17:29:13] (03CR) 10Yuvipanda: [C: 032 V: 032] k8s: s/contents/content/ [puppet] - 10https://gerrit.wikimedia.org/r/235766 (owner: 10Yuvipanda) [17:30:00] (03CR) 10Dzahn: "this commit message was brought to you by dogs.wikia.com" [puppet] - 10https://gerrit.wikimedia.org/r/235765 (owner: 10Yuvipanda) [17:32:03] (03PS4) 10Dzahn: policy.wikimedia: remove puppetization [puppet] - 10https://gerrit.wikimedia.org/r/235673 (https://phabricator.wikimedia.org/T110203) [17:33:08] (03CR) 10Dzahn: [C: 032] policy.wikimedia: remove puppetization [puppet] - 10https://gerrit.wikimedia.org/r/235673 (https://phabricator.wikimedia.org/T110203) (owner: 10Dzahn) [17:34:36] !log bromine - deleting policy docroot [17:34:41] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:39:16] !log krenair@tin Synchronized php-1.26wmf21/extensions/OpenStackManager/nova/OpenStackNovaController.php: https://gerrit.wikimedia.org/r/#/c/235769/ (duration: 00m 12s) [17:39:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:42:50] (03CR) 10Catrope: [C: 032] Enable Flow beta feature on beta lab [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235516 (owner: 10Sbisson) [17:43:15] (03Merged) 10jenkins-bot: Enable Flow beta feature on beta lab [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235516 (owner: 10Sbisson) [17:47:59] (03PS1) 10Rush: diamond: wdqs and blazegraph collectors tweaks [puppet] - 10https://gerrit.wikimedia.org/r/235770 [17:50:09] (03PS1) 10Yuvipanda: k8s: Run kube-proxy on the webproxy hosts [puppet] - 10https://gerrit.wikimedia.org/r/235771 [17:50:18] (03PS2) 10Yuvipanda: k8s: Run kube-proxy on the webproxy hosts [puppet] - 10https://gerrit.wikimedia.org/r/235771 [17:50:25] (03CR) 10Yuvipanda: [C: 032 V: 032] k8s: Run kube-proxy on the webproxy hosts [puppet] - 10https://gerrit.wikimedia.org/r/235771 (owner: 10Yuvipanda) [17:51:49] (03CR) 10Rush: "If andrew is fine with it then ok by me. He has more insight into whether this would be an issue. thanks." 
[puppet] - 10https://gerrit.wikimedia.org/r/235742 (https://phabricator.wikimedia.org/T111374) (owner: 10Hashar) [17:52:19] (03PS1) 10Ottomata: Debian packaging for 1.1.1-1 [debs/python-pykafka] (debian) - 10https://gerrit.wikimedia.org/r/235773 [17:52:50] (03PS2) 10Ottomata: Debian packaging for 1.1.1-1 [debs/python-pykafka] (debian) - 10https://gerrit.wikimedia.org/r/235773 (https://phabricator.wikimedia.org/T111182) [17:53:31] (03CR) 10Ottomata: [C: 032 V: 032] Debian packaging for 1.1.1-1 [debs/python-pykafka] (debian) - 10https://gerrit.wikimedia.org/r/235773 (https://phabricator.wikimedia.org/T111182) (owner: 10Ottomata) [17:53:40] (03PS8) 10Rush: diamond: service stats puppet integration [puppet] - 10https://gerrit.wikimedia.org/r/224094 (https://phabricator.wikimedia.org/T108027) (owner: 10Filippo Giunchedi) [17:53:48] (03CR) 10Smalyshev: [C: 031] diamond: wdqs and blazegraph collectors tweaks [puppet] - 10https://gerrit.wikimedia.org/r/235770 (owner: 10Rush) [17:55:06] andrewbogott: just seeing your ping.What's up? [17:55:33] (03PS2) 10Rush: diamond: wdqs and blazegraph collectors tweaks [puppet] - 10https://gerrit.wikimedia.org/r/235770 [17:55:51] bd808: It’s the same issue that I pinged you on gerrit about. I wish I could read ldap logs on wikitech… it’s broken in at least one serious way and I’m totally blind and helpless without the logs. [17:56:01] Sounds like you might have a quick fix for that? [17:56:23] (03PS1) 10Chad: Phab (labs): Move sshd to 2222, easier to remember than 222 [puppet] - 10https://gerrit.wikimedia.org/r/235777 [17:56:25] (03PS1) 10Chad: Phab: clean up role, remove ::config and ::main abstraction [puppet] - 10https://gerrit.wikimedia.org/r/235778 [17:56:42] gotcha. yeah let me see if I can amend that patch so you can test it. 
[17:57:07] (03PS1) 10Yuvipanda: k8s: Specify portnumber for http proxying [puppet] - 10https://gerrit.wikimedia.org/r/235779 [17:57:26] (03PS2) 10Yuvipanda: k8s: Specify portnumber for http proxying [puppet] - 10https://gerrit.wikimedia.org/r/235779 [17:57:33] (03CR) 10Rush: [C: 032] diamond: wdqs and blazegraph collectors tweaks [puppet] - 10https://gerrit.wikimedia.org/r/235770 (owner: 10Rush) [17:57:35] (03CR) 10Yuvipanda: [C: 032 V: 032] k8s: Specify portnumber for http proxying [puppet] - 10https://gerrit.wikimedia.org/r/235779 (owner: 10Yuvipanda) [17:57:57] (03PS3) 10Yuvipanda: k8s: Specify portnumber for http proxying [puppet] - 10https://gerrit.wikimedia.org/r/235779 [17:58:12] (03CR) 10Yuvipanda: [V: 032] k8s: Specify portnumber for http proxying [puppet] - 10https://gerrit.wikimedia.org/r/235779 (owner: 10Yuvipanda) [18:00:04] twentyafterfour: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150903T1800). Please do the needful. 
[18:00:33] (03PS9) 10Filippo Giunchedi: diamond: service stats puppet integration [puppet] - 10https://gerrit.wikimedia.org/r/224094 (https://phabricator.wikimedia.org/T108027) [18:00:39] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] diamond: service stats puppet integration [puppet] - 10https://gerrit.wikimedia.org/r/224094 (https://phabricator.wikimedia.org/T108027) (owner: 10Filippo Giunchedi) [18:00:44] I kinda like that the Monthly Metrics meeting is at the same time as our normal MW train window :) [18:03:41] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Add plumbing code for Flow beta feature (unused for now) (duration: 00m 12s) [18:03:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:04:27] !log catrope@tin Synchronized wmf-config/CommonSettings.php: Add plumbing code for Flow beta feature (unused for now) (duration: 00m 12s) [18:04:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:13:12] !log rolling restart of hadoop yarn nodemanagers to pick up Yarn AppMaster port range limitation to apply ferm rules. [18:13:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:13:59] 6operations: A link to Developer App Guidelines on API page - https://phabricator.wikimedia.org/T111423#1603490 (10VBaranetsky) 3NEW [18:14:00] (03PS3) 10BryanDavis: wikitech: Local logging config for ldap debugging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221825 [18:16:26] (03CR) 10BryanDavis: "Removing my -2 now that the patch is using the Iba6f115 solution. Needs testing, probably via cherry-pick to silver." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/221825 (owner: 10BryanDavis) [18:17:03] andrewbogott: ^ there's my thought about what should work [18:18:17] 6operations: A link to Developer App Guidelines on API page - https://phabricator.wikimedia.org/T111423#1603510 (10Krenair) 5Open>3Invalid a:3Krenair This is a page for the generic MediaWiki API. It is not specific to Wikimedia sites. I don't see how #operations is relevant here either. [18:24:52] 6operations: A link to Developer App Guidelines on API page - https://phabricator.wikimedia.org/T111423#1603526 (10VBaranetsky) My misunderstanding - operations should not have been tagged. [18:28:22] greg-g: I had some issues with the mobileapps deploy earlier. Do you think I could try again after the MediaWiki train deployment is over? [18:29:12] bearND: yup [18:29:23] greg-g: thanks [18:52:40] (03CR) 10Andrew Bogott: "Thanks! I will test" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221825 (owner: 10BryanDavis) [19:03:33] (03PS3) 10Dzahn: Phabricator project creation/changes log email for Phab admins [puppet] - 10https://gerrit.wikimedia.org/r/233219 (https://phabricator.wikimedia.org/T85183) (owner: 10Aklapper) [19:06:03] (03CR) 10Dzahn: [C: 032] "confirmed puppet part with compiler. cron gets created." [puppet] - 10https://gerrit.wikimedia.org/r/233219 (https://phabricator.wikimedia.org/T85183) (owner: 10Aklapper) [19:06:14] hallo [19:06:21] didn't the train run today? [19:07:17] (03CR) 10Mjbmr: "@MarcoAurelio: For hewiki also should be removed or just this wiki?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/234910 (https://phabricator.wikimedia.org/T109755) (owner: 10Mjbmr) [19:13:44] bd808, twentyafterfour - was there any issue with the train today? [19:14:38] aharoni: it hasn't been deployed yet, I was about to do it [19:14:54] thanks [19:15:34] (03CR) 10Alex Monk: "Like all the other apaches, the git repository data isn't synchronised over, so you'd have to patch silver manually. 
Could cherry-pick on " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/221825 (owner: 10BryanDavis) [19:18:36] (03PS1) 1020after4: wikipedia wikis to 1.26wmf21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235840 [19:19:03] (03CR) 1020after4: [C: 032] wikipedia wikis to 1.26wmf21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235840 (owner: 1020after4) [19:19:11] (03Merged) 10jenkins-bot: wikipedia wikis to 1.26wmf21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235840 (owner: 1020after4) [19:21:41] bearND|lunch: train done [19:24:42] (03PS2) 10Mjbmr: Add new user groups for azbwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/234910 (https://phabricator.wikimedia.org/T109755) [19:24:52] 6operations, 10Wikimedia-Git-or-Gerrit: git.wikimedia.org is unstable - https://phabricator.wikimedia.org/T83702#1603716 (10Dzahn) Why did you remove that project? It seems the right one to me. This is about not using gitblit anymore, that would solve the unstable-ness. [19:30:21] 6operations, 10Wikimedia-Git-or-Gerrit: git.wikimedia.org is unstable - https://phabricator.wikimedia.org/T83702#1603743 (10greg) >>! In T83702#1603716, @Dzahn wrote: > Why did you remove that project? It seems the right one to me. This is about not using gitblit anymore, that would solve the unstable-ness. B... [19:31:00] !log twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedia wikis to 1.26wmf21 [19:31:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:31:24] greg-g: actually, now train done [19:31:26] nah [19:31:28] bah [19:31:29] :) [19:32:00] greg-g: thank you. Will commence shortly. 
/cc:mdholloway [19:32:14] (03PS1) 10Alex Monk: Map be-tarask.wikipedia.org to be_x_oldwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235843 (https://phabricator.wikimedia.org/T11823) [19:35:53] 6operations: Hadoop MapReduce port range cannot be configured to a fixed range - https://phabricator.wikimedia.org/T111433#1603747 (10MoritzMuehlenhoff) 3NEW [19:36:26] 6operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-Requests, 5Patch-For-Review: Rename "be-x-old" to "be-tarask" - https://phabricator.wikimedia.org/T11823#1603754 (10Krenair) I don't believe you need to manually configure each new *.wikipedia.org wiki domain in apache these days. So, once this is d... [19:36:38] 6operations: Hadoop MapReduce port range cannot be configured to a fixed range - https://phabricator.wikimedia.org/T111433#1603762 (10MoritzMuehlenhoff) [19:38:20] (03PS1) 10Faidon Liambotis: Fix $ORIGIN typos in reverse IPv6 codfw space [dns] - 10https://gerrit.wikimedia.org/r/235845 [19:38:41] (03CR) 10Faidon Liambotis: [C: 032] Fix $ORIGIN typos in reverse IPv6 codfw space [dns] - 10https://gerrit.wikimedia.org/r/235845 (owner: 10Faidon Liambotis) [19:52:08] 6operations, 10ops-codfw, 5Patch-For-Review: rack & initial setup of elastic2001-2024 - https://phabricator.wikimedia.org/T111080#1603814 (10Papaul) Servers racking complete Rack table update physical label in place mgmt settings complete test mgmt local IP and password complete Ports information elastic200... [19:54:27] !log MobileApps deployed sha1 553c399 [19:54:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:54:35] \o/ [19:54:47] greg-g: done ^. Thank you! 
[19:54:53] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: /{domain}/v1/page/mobile-html-sections-lead/{title} is CRITICAL: Test retrieve lead section of en.wp San Francisco page via mobile-html-sections-lead responds with malformed body: 'ascii' codec can't encode character u'\xe6' in position 94: ordinal not in range(128) [19:56:53] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: /{domain}/v1/page/mobile-html-sections-lead/{title} is CRITICAL: Test retrieve lead section of en.wp San Francisco page via mobile-html-sections-lead responds with malformed body: 'ascii' codec can't encode character u'\xe6' in position 94: ordinal not in range(128) [19:57:46] 6operations, 10Wikimedia-Git-or-Gerrit: git.wikimedia.org is unstable - https://phabricator.wikimedia.org/T83702#1603833 (10Dzahn) Isn't blocker done by the "blocked-by" feature rather than the project tag? [20:01:49] gwicke: hmm, not sure why we get the critical errors from icinga. When I hit https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/Main%20Page it looks good to me. [20:01:59] Is that normal to get a critical after restart? [20:02:38] (03CR) 10Alex Monk: [C: 031] "LGTM. 
Have scheduled this in a deployment window" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235728 (https://phabricator.wikimedia.org/T110806) (owner: 10MarcoAurelio) [20:03:37] (03PS1) 10Dzahn: phabricator: fix output formatting in metrics script [puppet] - 10https://gerrit.wikimedia.org/r/235848 (https://phabricator.wikimedia.org/T85183) [20:05:09] (03CR) 10Dzahn: [C: 032] phabricator: fix output formatting in metrics script [puppet] - 10https://gerrit.wikimedia.org/r/235848 (https://phabricator.wikimedia.org/T85183) (owner: 10Dzahn) [20:06:24] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [500.0] [20:08:22] PROBLEM - puppet last run on mw1239 is CRITICAL: CRITICAL: Puppet has 1 failures [20:10:02] (03PS1) 10Alex Monk: Update logo for ukwikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235850 (https://phabricator.wikimedia.org/T110370) [20:12:14] RECOVERY - puppet last run on mw1239 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:14:39] (03PS1) 10Dzahn: admin: add a group for apertium admins [puppet] - 10https://gerrit.wikimedia.org/r/235851 (https://phabricator.wikimedia.org/T111360) [20:17:52] (03PS1) 10Alex Monk: Lift account creation throttle for National Library of Wales Rugby Editathon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235853 (https://phabricator.wikimedia.org/T111332) [20:20:09] (03PS2) 10Dzahn: admin: add a group for apertium admins [puppet] - 10https://gerrit.wikimedia.org/r/235851 (https://phabricator.wikimedia.org/T111360) [20:21:34] (03PS1) 10Dzahn: admin: add kartik to apertium-admins [puppet] - 10https://gerrit.wikimedia.org/r/235854 (https://phabricator.wikimedia.org/T111360) [20:22:21] (03CR) 10Dzahn: [C: 04-1] "pending access request requirements" [puppet] - 10https://gerrit.wikimedia.org/r/235854 (https://phabricator.wikimedia.org/T111360) (owner: 10Dzahn) [20:24:23] RECOVERY - HTTP 5xx req/min on 
graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [20:29:39] (03PS14) 10Dzahn: contint: move zuul_merger_hosts to hiera, use in ferm [puppet] - 10https://gerrit.wikimedia.org/r/201882 (https://phabricator.wikimedia.org/T87519) [20:35:59] 6operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 10MediaWiki-extensions-Translate, 3Fundraising Sprint S: Publishing translations for central notice banners fails - https://phabricator.wikimedia.org/T104774#1604135 (10AndyRussG) [20:36:35] (03PS4) 10Thcipriani: Add config deployment [tools/scap] - 10https://gerrit.wikimedia.org/r/235385 [20:38:54] (03PS15) 10Dzahn: contint: move zuul_merger_hosts to hiera, use in ferm [puppet] - 10https://gerrit.wikimedia.org/r/201882 (https://phabricator.wikimedia.org/T87519) [20:38:57] bearND: is that problem gone? [20:39:03] the output looks okay to me as well [20:40:26] gwicke: i don't know where to check icinga [20:40:39] (03PS1) 10Chad: Use Phab urls instead of Gitblit (comments only) [puppet] - 10https://gerrit.wikimedia.org/r/235857 [20:40:41] isn't there some graph that would show problems? [20:41:25] the web interface is at https://icinga.wikimedia.org/icinga/ [20:42:06] /{domain}/v1/page/mobile-html-sections-lead/{title} is CRITICAL: Test retrieve lead section of en.wp San Francisco page via mobile-html-sections-lead responds with malformed body: 'ascii' codec can't encode character u'xe6' in position 94: ordinal not in range(128) [20:42:20] that sounds like a monitoring script bug [20:42:44] Giuseppe would know [20:43:04] (03CR) 10Dzahn: [C: 032] Use Phab urls instead of Gitblit (comments only) [puppet] - 10https://gerrit.wikimedia.org/r/235857 (owner: 10Chad) [20:44:35] https://github.com/wikimedia/operations-puppet/blob/production/modules/service/spec/checker/test_checker.py [20:45:29] gwicke: is it complaining about unicode characters? 
https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/San%20Francisco [20:45:47] yeah, I think so [20:46:06] mutante: Thx [20:46:26] the "æ" character in there maybe [20:46:31] ostriches: np [20:46:55] gwicke: I'll have to leave soon. I'll have to let mdholloway handle this if there are still issues with that [20:46:57] if that's position 94 [20:47:22] bearND: I think that it's not actually an issue with the service [20:47:39] mutante: could be. Thanks [20:48:34] ACKNOWLEDGEMENT - mobileapps endpoints health on scb1001 is CRITICAL: /{domain}/v1/page/mobile-html-sections-lead/{title} is CRITICAL: Test retrieve lead section of en.wp San Francisco page via mobile-html-sections-lead responds with malformed body: ascii codec cant encode character u\xe6 in position 94: ordinal not in range(128) gwicke Looks like an encoding issue in https://github.com/wikimedia/operations-puppet/blob/production/modu [20:48:34] ACKNOWLEDGEMENT - mobileapps endpoints health on scb1002 is CRITICAL: /{domain}/v1/page/mobile-html-sections-lead/{title} is CRITICAL: Test retrieve lead section of en.wp San Francisco page via mobile-html-sections-lead responds with malformed body: ascii codec cant encode character u\xe6 in position 94: ordinal not in range(128) gwicke Looks like an encoding issue in https://github.com/wikimedia/operations-puppet/blob/production/modu [20:48:50] oh wow, I was able to ack that one [20:49:09] nice [20:50:01] we could change the page it checks from "San Francisco" to something else? [20:50:09] until the encoding issue is fixed [20:50:45] 6operations, 6Services, 7Monitoring: Encoding issue in test_checker.py - https://phabricator.wikimedia.org/T111447#1604171 (10GWicke) 3NEW [20:51:18] mutante: we could probably, but I think there are other checks covering this as well [20:51:32] and we verified manually that it's working [20:51:48] task ^^ [20:52:03] yea, agree. 
it's just an issue in the script [20:52:51] added a link to the task in icinga as well [20:53:15] but the output is also a bit odd with that span title ACK and links in icinga = :) [20:54:18] yeah, I'm glad that we are now able to do that [20:54:36] the json looks okay to me [20:54:52] just contains some utf-8 content [20:55:36] 6operations, 6Services, 7Monitoring: Encoding issue in test_checker.py - https://phabricator.wikimedia.org/T111447#1604189 (10GWicke) This was likely triggered by https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/San%20Francisco [20:55:59] 6operations, 6Services, 3Mobile-Content-Service, 7Monitoring: Encoding issue in test_checker.py - https://phabricator.wikimedia.org/T111447#1604190 (10GWicke) [21:03:28] mutante: gwicke: Parsoid produces output with æ as well. The main difference is that our service produces JSON while Parsoid produces HTML. [21:03:46] https://en.wikipedia.org/api/rest_v1/page/html/San%20Francisco [21:07:04] 6operations, 6Labs, 7Database, 7Tracking: (Tracking) Database replication services - https://phabricator.wikimedia.org/T50930#1604221 (10MZMcBride) [21:09:32] (03PS1) 10Ori.livneh: Fix encoding issue in test_checker.py [puppet] - 10https://gerrit.wikimedia.org/r/235866 (https://phabricator.wikimedia.org/T111447) [21:10:27] bearND|afk, gwicke, mutante [21:13:40] 6operations, 10Continuous-Integration-Infrastructure: contint: fix puppet run in labs / puppet compiler - https://phabricator.wikimedia.org/T111450#1604263 (10Dzahn) 3NEW [21:14:00] 6operations, 10Continuous-Integration-Infrastructure: contint: fix puppet run in labs / puppet compiler - https://phabricator.wikimedia.org/T111450#1604270 (10Dzahn) [21:17:18] 6operations, 10Wikimedia-Git-or-Gerrit: git.wikimedia.org is unstable - https://phabricator.wikimedia.org/T83702#1604291 (10greg) This is offtopic (sorry everyone else) but: 1) what is being lost by not having the gitblit deprecate project? Nothing, is my contention. 
This (the wikimedia-git-or-gerrit) is the p... [21:17:45] (03CR) 10Merlijn van Deen: "As far as I can see, there's no str() call here, so you shouldn't get an UnicodeEncodeError other than on output (str gets coerced to unic" [puppet] - 10https://gerrit.wikimedia.org/r/235866 (https://phabricator.wikimedia.org/T111447) (owner: 10Ori.livneh) [21:18:09] !log bouncing Cassandra on restbase1001 to apply temporary GC settings [21:18:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:18:28] 6operations, 6Phabricator, 6Project-Creators, 6Triagers: Broaden the group of users that can create projects in Phabricator - https://phabricator.wikimedia.org/T706#1604303 (10ashley) I'd like to join #Project-Creators to be able to create projects for the various extensions, skins and other tools maintain... [21:19:51] (03CR) 10Dzahn: "let's try Merlijn's solution first?" [puppet] - 10https://gerrit.wikimedia.org/r/235866 (https://phabricator.wikimedia.org/T111447) (owner: 10Ori.livneh) [21:20:12] (03CR) 10Ori.livneh: "@mutante, it's wrong -- the error is in file "./service_checker", line 356, in _check_json_chunk: `if not check(str(data)):`." 
[puppet] - 10https://gerrit.wikimedia.org/r/235866 (https://phabricator.wikimedia.org/T111447) (owner: 10Ori.livneh) [21:21:40] !log rebuilt HHVM with updated diff from facebook/hhvm PR #6071 (T109540), uploaded to apt as 3.6.5+dfsg1-1+wm5 [21:21:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:22:54] mutante: you can experiment like this: ssh to scb1001, cp /usr/local/lib/nagios/plugins/service_checker ~/ , then run ./service_checker -t5 127.0.0.1 http://restbase.svc.eqiad.wmnet:8888 [21:24:22] the traceback is not echoed because the except blocks suppress it, but if you remove them you end up with this: [21:24:33] https://dpaste.de/U9R8/raw [21:25:24] furthermore calling .encode('utf-8') instead of str(data) wouldn't work because data can be an int [21:25:39] (03CR) 10Merlijn van Deen: "Sorry, I missed that one. I'm not sure why it casts to str() there (as opposed to unicode()), but I can imagine you don't feel like debugg" [puppet] - 10https://gerrit.wikimedia.org/r/235866 (https://phabricator.wikimedia.org/T111447) (owner: 10Ori.livneh) [21:25:54] ori: unicode(data) [21:26:57] valhallasw`cloud: sure, but there are other calls to str(), like in EndpointRequest._verify, and TemplateUrl.realize, that would need to be fixed as well [21:27:12] ori: i did what you said. confirmed, your fix works :) [21:28:06] (03CR) 10Dzahn: [C: 032] "confirmed on scb1001. with this fix:" [puppet] - 10https://gerrit.wikimedia.org/r/235866 (https://phabricator.wikimedia.org/T111447) (owner: 10Ori.livneh) [21:28:30] valhallasw`cloud: I obsess over writing neat and tidy Python code, so I totally get your aversion to the default encoding override trick. But after literally years of pain I adopted it, and I have never regretted adding it to a command-line script [21:28:58] ori: heh. 
Yeah, it's fine if the assumption `str` == utf-8 encoded `unicode` holds [21:29:00] the reason not to do it is that ASCII was historically the default and there could in theory exist code that depends on it being the default encoding [21:29:15] I have never encountered that in practice [21:29:48] latin-1 was the default in some places... :( [21:30:15] but anyway, I completely missed the str() and debugging all of those sounds like a pain [21:30:37] (for some reason, chrome doesn't allow me to search in the diffs correctly :/) [21:30:46] 6operations, 10Continuous-Integration-Infrastructure, 5Patch-For-Review: contint: fix puppet run in labs / puppet compiler - https://phabricator.wikimedia.org/T111450#1604376 (10Dzahn) example before: http://puppet-compiler.wmflabs.org/872/gallium.wikimedia.org/ example after: http://puppet-compiler.wmflabs... [21:31:03] 6operations, 10Continuous-Integration-Infrastructure, 5Patch-For-Review: contint: fix puppet run in labs / puppet compiler - https://phabricator.wikimedia.org/T111450#1604378 (10Dzahn) 5Open>3Resolved a:3Dzahn compiler run works again [21:34:12] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [21:34:25] (03PS1) 10Rush: elasticsearch partman and autoinstall [puppet] - 10https://gerrit.wikimedia.org/r/235893 [21:34:43] gwicke: bearND|afk ^ [21:36:13] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy [21:36:31] 6operations, 6Services, 3Mobile-Content-Service, 7Monitoring, 5Patch-For-Review: Encoding issue in test_checker.py - https://phabricator.wikimedia.org/T111447#1604424 (10Dzahn) 5Open>3Resolved a:3Dzahn 14:36 < icinga-wm> RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are hea... 
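The failure discussed above (a `str()` call on text containing "æ", and the objection that `.encode('utf-8')` alone would break on ints) can be sketched as follows. This is a minimal illustration in Python 3, not the merged patch: the checker in the log ran on Python 2, where `str()` on a unicode object implicitly encoded with the ASCII codec. The names `ascii_coerce` and `to_text` and the sample string are hypothetical and do not come from service_checker.

```python
def ascii_coerce(data):
    """Mimic Python 2's implicit behaviour: coerce to text, then encode as ASCII.

    Raises UnicodeEncodeError for any non-ASCII character, which is the
    class of error reported by the health check in the log.
    """
    return str(data).encode("ascii")


def to_text(data):
    """Coerce a JSON leaf value to text without assuming ASCII.

    bytes are decoded as UTF-8 (an assumption about the payload);
    everything else, including ints, goes through str().
    """
    if isinstance(data, bytes):
        return data.decode("utf-8")
    return str(data)


if __name__ == "__main__":
    sample = "Encyclopædia"  # hypothetical stand-in for the page content
    try:
        ascii_coerce(sample)
    except UnicodeEncodeError as err:
        print("ASCII coercion failed at position", err.start)
    print(to_text(sample), to_text(42))
```

The design point is the one raised in the conversation: whatever replaces the `str()` call has to accept both text and non-text values such as ints, so a bare `.encode('utf-8')` is not a drop-in substitute.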
[21:36:57] 6operations, 6Services, 3Mobile-Content-Service, 7Monitoring, 5Patch-For-Review: Encoding issue in test_checker.py - https://phabricator.wikimedia.org/T111447#1604427 (10Dzahn) a:5Dzahn>3ori [21:38:30] (03PS16) 10Dzahn: contint: move zuul_merger_hosts to hiera, use in ferm [puppet] - 10https://gerrit.wikimedia.org/r/201882 (https://phabricator.wikimedia.org/T87519) [21:39:11] (03CR) 10Dzahn: [C: 032] "looking good in compiler, now that compiler could be used on gallium http://puppet-compiler.wmflabs.org/873/gallium.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/201882 (https://phabricator.wikimedia.org/T87519) (owner: 10Dzahn) [21:42:56] (03CR) 10Dzahn: "root@gallium:/etc/ferm/conf.d# cat 10_gearman_from_zuul_mergers" [puppet] - 10https://gerrit.wikimedia.org/r/201882 (https://phabricator.wikimedia.org/T87519) (owner: 10Dzahn) [21:44:32] 7Puppet, 6operations, 5Patch-For-Review: Migrate as much as possible from network::constants from network.pp to hiera - https://phabricator.wikimedia.org/T87519#1604463 (10Dzahn) 10 lines less ... 
[21:45:26] (03PS2) 10Andrew Bogott: Added openstack config files for version Kilo [puppet] - 10https://gerrit.wikimedia.org/r/235399 [21:46:22] (03CR) 10jenkins-bot: [V: 04-1] Added openstack config files for version Kilo [puppet] - 10https://gerrit.wikimedia.org/r/235399 (owner: 10Andrew Bogott) [21:47:40] (03PS3) 10Andrew Bogott: Added openstack config files for version Kilo [puppet] - 10https://gerrit.wikimedia.org/r/235399 [21:48:25] (03CR) 10jenkins-bot: [V: 04-1] Added openstack config files for version Kilo [puppet] - 10https://gerrit.wikimedia.org/r/235399 (owner: 10Andrew Bogott) [21:51:26] (03PS1) 10MarcoAurelio: Flood flag configuration changes for es.wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235900 (https://phabricator.wikimedia.org/T111455) [21:53:01] (03CR) 10Alex Monk: Flood flag configuration changes for es.wikibooks (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235900 (https://phabricator.wikimedia.org/T111455) (owner: 10MarcoAurelio) [21:53:08] (03PS4) 10Andrew Bogott: Added openstack config files for version Kilo [puppet] - 10https://gerrit.wikimedia.org/r/235399 [21:57:18] (03PS2) 10MarcoAurelio: Flood flag configuration changes for es.wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235900 (https://phabricator.wikimedia.org/T111455) [21:59:12] (03CR) 10MarcoAurelio: Flood flag configuration changes for es.wikibooks (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235900 (https://phabricator.wikimedia.org/T111455) (owner: 10MarcoAurelio) [22:15:55] I'm trying to figure out what settings Varnish uses for gzip [22:16:03] e.g. which strength etc. 
[22:17:03] (03PS1) 10Rush: New addition elasticsearch20[0-1][0-9] [dns] - 10https://gerrit.wikimedia.org/r/235906 [22:17:11] (03CR) 10jenkins-bot: [V: 04-1] New addition elasticsearch20[0-1][0-9] [dns] - 10https://gerrit.wikimedia.org/r/235906 (owner: 10Rush) [22:24:16] (03PS1) 10MarcoAurelio: Enabling extension WikiLove for outreachwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235907 (https://phabricator.wikimedia.org/T106264) [22:27:52] (03PS1) 10Jforrester: Revert "Enable VisualEditor for NS_PROJECT on enwiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235908 [22:28:17] (03PS2) 10Rush: New addition elasticsearch20[0-1][0-9] [dns] - 10https://gerrit.wikimedia.org/r/235906 [22:31:49] (03CR) 10Dzahn: "you seem to be duplicating Papaul's work at https://gerrit.wikimedia.org/r/#/c/235657/" [dns] - 10https://gerrit.wikimedia.org/r/235906 (owner: 10Rush) [22:33:45] 6operations, 6Services, 3Mobile-Content-Service, 7Monitoring, 5Patch-For-Review: Encoding issue in test_checker.py - https://phabricator.wikimedia.org/T111447#1604674 (10GWicke) Thanks, @ori! [22:38:23] PROBLEM - HHVM rendering on mw1224 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:39:51] is https://phabricator.wikimedia.org/T73330 ok to do? 
[22:40:03] flow for fr.wiktionary on two pages [22:40:04] PROBLEM - Apache HTTP on mw1224 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50350 bytes in 0.024 second response time [22:40:13] RECOVERY - HHVM rendering on mw1224 is OK: HTTP OK: HTTP/1.1 200 OK - 65911 bytes in 1.376 second response time [22:42:04] RECOVERY - Apache HTTP on mw1224 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.204 second response time [22:46:00] mafk, In #wikimedia-site-requests this'd go under 'Special handling' [22:46:16] To reserve it for the appropriate team rather than standard site-requests processing [22:46:32] Added #Flow [22:46:34] ok, I'll move it to that and abstain from handling it [22:46:42] read my mind :) [22:46:53] Extension-setup is rather disorganised [22:47:29] where's special handling? I can't find on the workboard for ext-setup [22:47:37] mafk, In #wikimedia-site-requests this'd go under 'Special handling' [22:47:46] This is not wikimedia-site-requests, it's wikimedia-extension-setup :/ [22:48:10] No it's now [22:48:11] *not [22:48:15] Flow is already installed there [22:48:19] See https://fr.wiktionary.org/wiki/Sp%C3%A9cial:Version [22:48:24] hah [22:48:29] All that remains to be done is activate it on the requested page [22:48:40] Which I believe you can just ask dannyh to do [22:48:45] Well in that case I'll pull it from extension-setup [22:49:03] Actually [22:49:07] Please migrate the request to https://www.mediawiki.org/wiki/Flow/Request_Flow_on_a_page [22:50:02] mafk, ^ [22:50:19] will say the guys that opened the ticket to do that [22:52:54] done https://phabricator.wikimedia.org/T73330#1604724 [22:53:23] (03PS1) 10saper: Covert Polish listinfo template back to ISO 8859-2 [puppet] - 10https://gerrit.wikimedia.org/r/235911 [22:53:59] (03PS2) 10saper: Covert Polish listinfo template back to ISO 8859-2 [puppet] - 10https://gerrit.wikimedia.org/r/235911 [22:58:32] PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: 
host 198.35.26.192, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0; xe-1/2/0: down - Core: cr1-eqord:xe-0/0/1 Telia (IC-313592) {#1501} [10Gbps DWDM] [23:00:04] RoanKattouw ostriches rmoen Krenair: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150903T2300). Please do the needful. [23:00:04] Krenair James_F ebernhardson: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [23:00:11] Heya. [23:00:20] hey [23:00:46] (PS4) Dzahn: Add DNS entries for elastic2001-2024 mgmt [dns] - https://gerrit.wikimedia.org/r/235657 (https://phabricator.wikimedia.org/T111080) (owner: Papaul) [23:01:02] (CR) Alex Monk: [C: 2] Revert "Enable VisualEditor for NS_PROJECT on enwiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/235908 (owner: Jforrester) [23:01:27] (Merged) jenkins-bot: Revert "Enable VisualEditor for NS_PROJECT on enwiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/235908 (owner: Jforrester) [23:01:34] (PS5) Dzahn: Add DNS entries for elastic2001-2024 mgmt [dns] - https://gerrit.wikimedia.org/r/235657 (https://phabricator.wikimedia.org/T111080) (owner: Papaul) [23:02:31] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/235908/ (duration: 00m 13s) [23:02:33] James_F, ^ [23:02:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:02:39] Ta. [23:03:35] ebernhardson, around? 
[23:04:10] (CR) Dzahn: [C: 2] Add DNS entries for elastic2001-2024 mgmt [dns] - https://gerrit.wikimedia.org/r/235657 (https://phabricator.wikimedia.org/T111080) (owner: Papaul) [23:04:28] Krenair: yea [23:04:38] Your one is not entirely trivial [23:04:44] Krenair: it's just a cherry pick [23:04:56] to code that doesn't run in prod unless you use a custom experimental endpoint [23:06:14] ElasticsearchIntermediary.php and Searcher.php sound pretty productiony to me? [23:07:14] although those parts do look fairly simple [23:07:23] Krenair: Searcher.php - the only changes are in the suggest and postProcessSuggest methods [23:07:33] these are only used for the experimental api [23:07:52] ah yes [23:07:59] alright, I'm happy, let's do this [23:08:25] the intermediary, well yes that code is run a hundred million times a day, but it's a very simple change involving isset and adding a variable to a log [23:08:32] thanks [23:10:37] operations, fundraising-tech-ops, netops: Cleanup layer2 firewall config from pfw-eqiad - https://phabricator.wikimedia.org/T111463#1604749 (faidon) NEW [23:13:26] (PS1) Dzahn: wmnet: indentation fixes [dns] - https://gerrit.wikimedia.org/r/235928 [23:16:42] RECOVERY - Router interfaces on cr1-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 70, down: 0, dormant: 0, excluded: 0, unused: 0 [23:23:05] This is ridiculous. [23:23:16] I approved that commit 15 minutes ago FFS [23:23:25] still waiting? :S [23:24:22] ahh, lego submitted a ton of patches and someone merged before you [23:24:42] would be nice if zuul could prioritize deploy branches...but i know nothing about how any of that works :P [23:25:37] Finally [23:25:44] Took 17 minutes [23:25:53] the good news is that the jshint maintainers don't release new versions every day :P [23:26:20] and not at 4pm always :) [23:26:21] Are you deploying that paladox commit now legoktm? [23:26:27] umm [23:26:30] which commit? 
[23:26:35] bd2eb6cc1919c7dab056d5f8fe5b4a164236d78f [23:26:36] also the mw-ext-selenium browser test jobs are slower than mwcore zend phpunit [23:26:52] no [23:26:53] Krinkle [23:26:56] no... [23:27:03] legoktm: not all of them [23:27:15] Krenair: just sync it? it's a no-op in prod [23:28:05] legoktm: Yeah, that selenium job should not have been enabled in this way. Should be a test and/or postmerge job; or MobileFrontend needs to detach itself from the mediawiki global block queue. [23:28:25] what's wrong with having it on gate? [23:28:40] !log krenair@tin Synchronized php-1.26wmf21/package.json: bd2eb6cc1919c7dab056d5f8fe5b4a164236d78f (duration: 00m 13s) [23:28:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:29:21] slow tests are ok imo, as long as they're actually useful tests [23:29:33] !log krenair@tin Synchronized php-1.26wmf21/extensions/CirrusSearch: https://gerrit.wikimedia.org/r/#/c/235905/ (duration: 00m 13s) [23:29:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:29:38] ebernhardson, ^ [23:29:40] please test [23:30:24] Krenair: funnily, none of that is visible directly in the browser (it has to do with the scoring algorithm which will run when i rebuild the index). 
Watching logs though and nothing seems to have blown up [23:30:50] mw1224 does look unhappy though [23:31:32] but that's been going on for a couple of hours [23:32:33] PROBLEM - HHVM rendering on mw1224 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50350 bytes in 0.003 second response time [23:32:48] !log mw1224 has been sending segfault warnings and "Lost parent, LightProcess exiting" to hhvm.log since about 21:17:34 [23:32:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:33:34] Sep 3 23:30:17 mw1224 kernel: [24434639.821438] init: hhvm main process (43471) killed by SEGV signal [23:33:37] Sep 3 23:30:17 mw1224 kernel: [24434639.821449] init: hhvm main process ended, respawning [23:33:40] Sep 3 23:30:49 mw1224 kernel: [24434671.330879] init: hhvm main process (43663) killed by SEGV signal [23:33:43] Sep 3 23:30:49 mw1224 kernel: [24434671.330890] init: hhvm main process ended, respawning [23:33:46] Sep 3 23:31:55 mw1224 kernel: [24434737.554100] init: hhvm main process (43817) killed by SEGV signal [23:34:50] (CR) Alex Monk: [C: 2] Change Kannada Wikisource logo and project title [mediawiki-config] - https://gerrit.wikimedia.org/r/235728 (https://phabricator.wikimedia.org/T110806) (owner: MarcoAurelio) [23:35:10] :) [23:35:17] (Merged) jenkins-bot: Change Kannada Wikisource logo and project title [mediawiki-config] - https://gerrit.wikimedia.org/r/235728 (https://phabricator.wikimedia.org/T110806) (owner: MarcoAurelio) [23:36:15] !log krenair@tin Synchronized w/static/images/project-logos/knwikisource.png: https://gerrit.wikimedia.org/r/#/c/235728/ (duration: 00m 12s) [23:36:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:36:32] RECOVERY - HHVM rendering on mw1224 is OK: HTTP OK: HTTP/1.1 200 OK - 65897 bytes in 0.920 second response time [23:36:58] the hhvm version is different from say, mw1225 [23:37:46] !log krenair@tin Synchronized 
wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/235728 (duration: 00m 13s) [23:37:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:37:54] !log mw1224 - killed and restarted defunct hhvm, version is different from the one on mw1225 [23:37:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:38:42] Ops-Access-Requests, operations: Requesting access to bastiononly or ops for papaul - https://phabricator.wikimedia.org/T111123#1604816 (RobH) a:mark [23:39:28] (CR) Alex Monk: [C: 2] Update logo for ukwikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/235850 (https://phabricator.wikimedia.org/T110370) (owner: Alex Monk) [23:39:54] (Merged) jenkins-bot: Update logo for ukwikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/235850 (https://phabricator.wikimedia.org/T110370) (owner: Alex Monk) [23:40:49] !log krenair@tin Synchronized w/static/images/project-logos/ukwikivoyage.png: https://gerrit.wikimedia.org/r/#/c/235850/ (duration: 00m 12s) [23:40:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:41:55] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/235850/ (duration: 00m 12s) [23:42:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:46:55] mutante: thanks, i'll fix [23:47:28] operations, Wikimedia-Git-or-Gerrit: git.wikimedia.org is unstable - https://phabricator.wikimedia.org/T83702#1604862 (greg) [23:49:29] (CR) Alex Monk: [C: 2] Map be-tarask.wikipedia.org to be_x_oldwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/235843 (https://phabricator.wikimedia.org/T11823) (owner: Alex Monk) [23:49:53] (Merged) jenkins-bot: Map be-tarask.wikipedia.org to be_x_oldwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/235843 (https://phabricator.wikimedia.org/T11823) (owner: 
Alex Monk) [23:50:54] !log krenair@tin Synchronized multiversion/MWMultiVersion.php: https://gerrit.wikimedia.org/r/#/c/235843/ (duration: 00m 12s) [23:50:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:51:42] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/235843/ (duration: 00m 12s) [23:51:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:52:37] (CR) Alex Monk: [C: 2] Lift account creation throttle for National Library of Wales Rugby Editathon [mediawiki-config] - https://gerrit.wikimedia.org/r/235853 (https://phabricator.wikimedia.org/T111332) (owner: Alex Monk) [23:53:02] (Merged) jenkins-bot: Lift account creation throttle for National Library of Wales Rugby Editathon [mediawiki-config] - https://gerrit.wikimedia.org/r/235853 (https://phabricator.wikimedia.org/T111332) (owner: Alex Monk) [23:53:29] !log krenair@tin Synchronized wmf-config/throttle.php: https://gerrit.wikimedia.org/r/#/c/235853/ (duration: 00m 12s) [23:53:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:57:44] Ops-Access-Requests, operations: Requesting access to elasticsearch-roots - https://phabricator.wikimedia.org/T111473#1604946 (Tfinc) NEW