[00:05:13] operations, Wikimedia-Mailing-lists: send follow-up email, announce changes with new mailman version if any that have user impact - https://phabricator.wikimedia.org/T110140#1595495 (JohnLewis) >>! In T110140#1582771, @Dzahn wrote: > 2.1.18 (03-May-2014) > > Dependencies > > - There is a new depen...
[00:19:21] operations, Wikimedia-Mailing-lists: send follow-up email, announce changes with new mailman version if any that have user impact - https://phabricator.wikimedia.org/T110140#1595524 (JohnLewis) Operations * DMARC improvements (requires python-dnspython) * List names are now included in vette.log entries f...
[00:26:39] (PS1) John F. Lewis: mailman: set new settings to improve security [puppet] - https://gerrit.wikimedia.org/r/235384
[00:27:49] (CR) John F. Lewis: [C: -1] "Do not merge until Mailman version displayed on lists.wikimedia.org is at least 2.1.18! (and using fermium...)" [puppet] - https://gerrit.wikimedia.org/r/235384 (owner: John F.
Lewis)
[00:32:38] (CR) Alex Monk: [C: 2] Search Συγγραφέας namespace by default on elwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/235366 (https://phabricator.wikimedia.org/T110871) (owner: Alex Monk)
[00:33:03] (Merged) jenkins-bot: Search Συγγραφέας namespace by default on elwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/235366 (https://phabricator.wikimedia.org/T110871) (owner: Alex Monk)
[00:33:05] (PS8) Thcipriani: Add service deploy via scap [tools/scap] - https://gerrit.wikimedia.org/r/224374
[00:33:08] (PS1) Thcipriani: Add config deployment [tools/scap] - https://gerrit.wikimedia.org/r/235385
[00:33:21] (CR) jenkins-bot: [V: -1] Add config deployment [tools/scap] - https://gerrit.wikimedia.org/r/235385 (owner: Thcipriani)
[00:33:32] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/235366/ (duration: 00m 13s)
[00:33:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:38:43] (PS1) MaxSem: Allow maps access from localhost [puppet] - https://gerrit.wikimedia.org/r/235386
[00:41:53] (PS2) Thcipriani: Add config deployment [tools/scap] - https://gerrit.wikimedia.org/r/235385
[01:12:02] (CR) BBlack: "What about IPv6? :P But seriously, how are you even running into this?" [puppet] - https://gerrit.wikimedia.org/r/235386 (owner: MaxSem)
[01:16:54] (CR) MaxSem: "By trying to develop a tool that uses these tiles." [puppet] - https://gerrit.wikimedia.org/r/235386 (owner: MaxSem)
[01:20:16] (CR) MaxSem: "Basically, I'm trying to try using https://github.com/MaxSem/maps-demo locally, but no bueno.
And that's what our volunteer devs will try " [puppet] - https://gerrit.wikimedia.org/r/235386 (owner: MaxSem)
[01:20:17] RECOVERY - wikidata.org dispatch lag is higher than 300s on wikidata is OK: HTTP OK: HTTP/1.1 200 OK - 1417 bytes in 0.189 second response time
[01:24:53] (CR) BBlack: [C: 2] Allow maps access from localhost [puppet] - https://gerrit.wikimedia.org/r/235386 (owner: MaxSem)
[01:57:58] PROBLEM - puppet last run on cp3045 is CRITICAL: CRITICAL: puppet fail
[02:05:07] PROBLEM - Disk space on labstore1002 is CRITICAL: DISK CRITICAL - /run/lock/storage-replicate-labstore-tools/snapshot is not accessible: Permission denied
[02:09:48] PROBLEM - High load average on labstore1002 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [24.0]
[02:11:16] PROBLEM - Persistent high iowait on labstore1002 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [60.0]
[02:23:57] RECOVERY - puppet last run on cp3045 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[02:26:43] !log l10nupdate@tin Synchronized php-1.26wmf20/cache/l10n: l10nupdate for 1.26wmf20 (duration: 06m 31s)
[02:26:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:29:56] !log l10nupdate@tin LocalisationUpdate completed (1.26wmf20) at 2015-09-02 02:29:56+00:00
[02:30:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:31:07] PROBLEM - Persistent high iowait on labstore1002 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [60.0]
[02:38:27] operations, Labs, wikitech.wikimedia.org: intermittent nutcracker failures - https://phabricator.wikimedia.org/T105131#1595626 (chasemp) Thinking we could combine a bump in allowed clients with logging the request before denying at https://phabricator.wikimedia.org/diffusion/ODDY/browse/master/src/nc_p...
[02:44:43] (PS1) EBernhardson: Enable experiment with experimental completion suggester [mediawiki-config] - https://gerrit.wikimedia.org/r/235391 (https://phabricator.wikimedia.org/T111078)
[02:45:19] (CR) EBernhardson: [C: -1] "don't deploy until sept 8th afternoon SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/235391 (https://phabricator.wikimedia.org/T111078) (owner: EBernhardson)
[02:45:44] (PS2) EBernhardson: Enable experiment with experimental completion suggester [mediawiki-config] - https://gerrit.wikimedia.org/r/235391 (https://phabricator.wikimedia.org/T111078)
[02:50:18] !log l10nupdate@tin Synchronized php-1.26wmf21/cache/l10n: l10nupdate for 1.26wmf21 (duration: 05m 09s)
[02:50:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:52:52] !log l10nupdate@tin LocalisationUpdate completed (1.26wmf21) at 2015-09-02 02:52:51+00:00
[02:52:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:56:04] (CR) Deskana: "Note: experiment should actually run for two weeks rather than one, per Mikhail's recent email to the search list."
[mediawiki-config] - https://gerrit.wikimedia.org/r/235391 (https://phabricator.wikimedia.org/T111078) (owner: EBernhardson)
[03:30:04] operations: Reintroduce rejection for requests with nul user agents - https://phabricator.wikimedia.org/T111140#1595700 (Ironholds) p:Triage>High
[03:30:58] operations, Traffic, Varnish: Reintroduce rejection for requests with nul user agents - https://phabricator.wikimedia.org/T111140#1595704 (Krenair)
[03:31:23] operations, Traffic, Varnish: Reintroduce rejection for requests with null user agents - https://phabricator.wikimedia.org/T111140#1595706 (Krenair)
[03:31:26] operations, Traffic, Varnish: Reintroduce rejection for requests with null user agents - https://phabricator.wikimedia.org/T111140#1595708 (demon) The discussion on wikitech is going the other direction...to **not** require them anymore. I think the docs need fixing in that case :)
[03:33:34] operations, Traffic, Varnish: Reintroduce rejection for requests with null user agents - https://phabricator.wikimedia.org/T111140#1595709 (Ironholds) Is it? It seems to have devolved into a debate over how exactly to rate-limit.
[03:34:41] operations, Traffic, Varnish: Reintroduce rejection for requests with null user agents - https://phabricator.wikimedia.org/T111140#1595710 (Ironholds) Oh, I see the posts now. Eh, we can see how the thread works out. I'd rather have rate-limiting; what I don't want is rate limiting...in 12 months, when...
[03:36:47] PROBLEM - Persistent high iowait on labstore1002 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [60.0]
[03:37:12] operations, Traffic, Varnish: Reintroduce rejection for requests with null user agents - https://phabricator.wikimedia.org/T111140#1595711 (demon) Yes....but I'm not seeing anything relating to the UA in rate limiting. And any such rate limiter would be its own task...not the reintroduction of blanket...
[03:37:51] operations, Wikimedia-Mailing-lists: wikinews-l: no active listadmin - https://phabricator.wikimedia.org/T110956#1595713 (Koavf) I can try to help.
[03:40:18] operations, Traffic, Varnish: Reintroduce rejection for requests with null user agents - https://phabricator.wikimedia.org/T111140#1595719 (demon) >>! In T111140#1595710, @Ironholds wrote: > Oh, I see the posts now. Eh, we can see how the thread works out. I'd rather have rate-limiting; what I don't wa...
[03:54:27] PROBLEM - Persistent high iowait on labstore1002 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [60.0]
[04:04:28] (CR) Ori.livneh: [C: -1] "This is more cleanly and cheaply done via an Upstart override file" [debs/nutcracker] - https://gerrit.wikimedia.org/r/235368 (owner: Rush)
[04:06:33] (PS1) Ori.livneh: Increase nutcracker ulimit via an Upstart override file [puppet] - https://gerrit.wikimedia.org/r/235398
[04:06:59] (PS2) Ori.livneh: Increase nutcracker ulimit via an Upstart override file [puppet] - https://gerrit.wikimedia.org/r/235398
[04:08:08] (PS1) Andrew Bogott: Added openstack config files for version Kilo [puppet] - https://gerrit.wikimedia.org/r/235399
[04:08:23] (CR) Ori.livneh: [C: 2] Increase nutcracker ulimit via an Upstart override file [puppet] - https://gerrit.wikimedia.org/r/235398 (owner: Ori.livneh)
[04:08:51] (CR) jenkins-bot: [V: -1] Added openstack config files for version Kilo [puppet] - https://gerrit.wikimedia.org/r/235399 (owner: Andrew Bogott)
[04:10:45] (PS3) Andrew Bogott: Add some crappy but handy scripts for managing the grid during reboots. [puppet] - https://gerrit.wikimedia.org/r/232285
[04:14:57] (CR) Ori.livneh: "easier to do with https://gerrit.wikimedia.org/r/#/c/235398/ .
verified:" [puppet] - https://gerrit.wikimedia.org/r/235370 (owner: Rush)
[04:15:12] (Abandoned) Ori.livneh: Nutcracker: set a higher ulimit [puppet] - https://gerrit.wikimedia.org/r/235370 (owner: Rush)
[04:15:40] (Abandoned) Ori.livneh: Allow upstart to set ulimit [debs/nutcracker] - https://gerrit.wikimedia.org/r/235368 (owner: Rush)
[04:18:16] RECOVERY - Persistent high iowait on labstore1002 is OK: OK: Less than 50.00% above the threshold [40.0]
[04:20:13] (Abandoned) Ori.livneh: Get rid of cargo-cult statistics in check_graphite [puppet] - https://gerrit.wikimedia.org/r/234969 (owner: Ori.livneh)
[04:20:37] PROBLEM - torrus.wikimedia.org HTTP on netmon1001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string Torrus Top: Wikimedia not found on http://torrus.wikimedia.org:80/torrus - 838 bytes in 0.280 second response time
[04:22:37] RECOVERY - torrus.wikimedia.org HTTP on netmon1001 is OK: HTTP OK: HTTP/1.1 200 OK - 2166 bytes in 0.363 second response time
[04:36:07] PROBLEM - Persistent high iowait on labstore1002 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [60.0]
[04:53:47] PROBLEM - Persistent high iowait on labstore1002 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [60.0]
[04:54:21] is labs nfs okay?
[04:54:23] https://ganglia.wikimedia.org/latest/graph.php?r=4hr&z=xlarge&c=Labs+NFS+cluster+eqiad&m=cpu_report&s=by+name&mc=2&g=load_report
[04:54:30] and it feels very slow
[04:54:37] !log l10nupdate@tin ResourceLoader cache refresh completed at Wed Sep 2 04:54:37 UTC 2015 (duration 54m 36s)
[04:54:38] PROBLEM - puppet last run on db2037 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:54:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[04:58:32] (CR) Mattflaschen: [C: 1] "No one from the Collaboration team has objected to it."
[mediawiki-config] - https://gerrit.wikimedia.org/r/229197 (https://phabricator.wikimedia.org/T107927) (owner: Aude)
[05:05:47] PROBLEM - puppet last run on db1065 is CRITICAL: CRITICAL: Puppet has 1 failures
[05:20:26] RECOVERY - puppet last run on db2037 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:27:27] PROBLEM - Persistent high iowait on labstore1002 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [60.0]
[05:33:27] RECOVERY - puppet last run on db1065 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:48:02] operations, Monitoring: grafana.wikimedia.org calls out to AWS - https://phabricator.wikimedia.org/T110484#1595839 (greg)
[05:49:23] operations, Monitoring: grafana.wikimedia.org calls out to AWS - https://phabricator.wikimedia.org/T110484#1578639 (greg) >>! In T110484#1595047, @greg wrote: > It simply GETs https://grafanarel.s3.amazonaws.com/latest.json to see what the latest version is, presumably to complain/suggest upgrading. Or as...
[06:01:16] RECOVERY - Persistent high iowait on labstore1002 is OK: OK: Less than 50.00% above the threshold [40.0]
[06:09:38] RECOVERY - High load average on labstore1002 is OK: OK: Less than 50.00% above the threshold [16.0]
[06:11:55] (PS1) KartikMistry: Added Debian package for apertium-eo-fr [debs/contenttranslation/apertium-eo-fr] - https://gerrit.wikimedia.org/r/235404 (https://phabricator.wikimedia.org/T102101)
[06:20:58] operations: Initial ferm setup is disruptive - https://phabricator.wikimedia.org/T110514#1595881 (MoritzMuehlenhoff) We wouldn't need to rebuild ferm in an ongoing manner; this effect only applies to setups where ferm is retroactively applied to running services. All newly rolled-out services would have ferm...
[06:30:26] PROBLEM - puppet last run on mc2007 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:07] PROBLEM - puppet last run on mw1090 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:27] PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:34] (PS1) KartikMistry: Added Debian package for apertium-eo-es [debs/contenttranslation/apertium-eo-es] - https://gerrit.wikimedia.org/r/235408 (https://phabricator.wikimedia.org/T102101)
[06:31:46] PROBLEM - puppet last run on mw2145 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:48] PROBLEM - puppet last run on cp1068 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:17] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:32:37] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:06] PROBLEM - puppet last run on mw1220 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:08] PROBLEM - puppet last run on mw2018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:29] PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:36] PROBLEM - puppet last run on mw2073 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:37:56] operations, Continuous-Integration-Scaling, Database: MySQL database for Nodepool - https://phabricator.wikimedia.org/T110693#1595930 (jcrespo) > I am not sure how much of an issue it can be for our databases. I had to ask for 2 reasons: usually, misc servers are not dedicated servers, which means they...
[06:41:17] PROBLEM - High load average on labstore1002 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [24.0]
[06:45:17] RECOVERY - High load average on labstore1002 is OK: OK: Less than 50.00% above the threshold [16.0]
[06:50:25] (PS1) KartikMistry: Add Debian package for apertium-ca-it [debs/contenttranslation/apertium-ca-it] - https://gerrit.wikimedia.org/r/235410 (https://phabricator.wikimedia.org/T105582)
[06:51:16] PROBLEM - High load average on labstore1002 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [24.0]
[06:53:11] (PS2) Muehlenhoff: Add a custom rsync ferm rule for swift storage [puppet] - https://gerrit.wikimedia.org/r/235221 (https://phabricator.wikimedia.org/T108987)
[06:56:36] RECOVERY - puppet last run on mw1220 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:37] RECOVERY - puppet last run on mw1090 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:06] RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[06:57:07] RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[06:57:08] RECOVERY - puppet last run on mw2073 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[06:57:17] RECOVERY - puppet last run on mw2145 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:26] RECOVERY - puppet last run on cp1068 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:56] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:06] RECOVERY - puppet last run on mc2007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:16] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 1 minute ago with
0 failures
[06:58:47] RECOVERY - puppet last run on mw2018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:06:26] PROBLEM - NTP on lvs2004 is CRITICAL: NTP CRITICAL: No response from NTP server
[07:09:51] operations, Performance-Team: New URL scheme for service-generated thumbnails - https://phabricator.wikimedia.org/T111048#1595976 (Gilles) Come to think of it, it could be a great opportunity to comply with http://iiif.io/, making ourselves compatible with a growing corpus of open source image viewing tool...
[07:13:08] RECOVERY - High load average on labstore1002 is OK: OK: Less than 50.00% above the threshold [16.0]
[07:19:05] (CR) Muehlenhoff: [C: -1] "We should keep the nodepool system user at /bin/false and rather use "sudo -H -u nodepool bash -l"" [puppet] - https://gerrit.wikimedia.org/r/234483 (owner: Hashar)
[07:20:56] (PS1) Jcrespo: Add nodepooldb mysql database to m5 and grants from libnodepool1001 [puppet] - https://gerrit.wikimedia.org/r/235412 (https://phabricator.wikimedia.org/T110693)
[07:21:21] (CR) Merlijn van Deen: [C: 1] Add some crappy but handy scripts for managing the grid during reboots.
[puppet] - https://gerrit.wikimedia.org/r/232285 (owner: Andrew Bogott)
[07:26:17] PROBLEM - Outgoing network saturation on labstore1002 is CRITICAL: CRITICAL: 20.69% of data above the critical threshold [100000000.0]
[07:29:06] (PS2) Jcrespo: Add nodepooldb mysql database to m5 and grants from libnodepool1001 [puppet] - https://gerrit.wikimedia.org/r/235412 (https://phabricator.wikimedia.org/T110693)
[07:31:50] (CR) Jcrespo: [C: 2] Add nodepooldb mysql database to m5 and grants from libnodepool1001 [puppet] - https://gerrit.wikimedia.org/r/235412 (https://phabricator.wikimedia.org/T110693) (owner: Jcrespo)
[07:37:08] PROBLEM - puppet last run on ganeti2006 is CRITICAL: CRITICAL: puppet fail
[07:39:30] (PS1) KartikMistry: Add Debian package for apertium-eo-ca [debs/contenttranslation/apertium-eo-ca] - https://gerrit.wikimedia.org/r/235415 (https://phabricator.wikimedia.org/T102101)
[07:41:46] (CR) Jcrespo: [C: 1] "Ready for this." [puppet] - https://gerrit.wikimedia.org/r/233671 (owner: Muehlenhoff)
[07:44:22] (CR) Jcrespo: [C: 1] "This can be done at any time- they do not have a public facing service- they are only a "mysql event firewall" to labs.
Worst case scenari" [puppet] - https://gerrit.wikimedia.org/r/234489 (https://phabricator.wikimedia.org/T104699) (owner: Muehlenhoff)
[07:46:19] (PS1) Jcrespo: Fix mysql grant issues on m5 (Followup to gerrit:235412) [puppet] - https://gerrit.wikimedia.org/r/235416 (https://phabricator.wikimedia.org/T110693)
[07:50:46] !log enable ferm on remaining phabricator db hosts
[07:50:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:52:27] RECOVERY - Outgoing network saturation on labstore1002 is OK: OK: Less than 10.00% above the threshold [75000000.0]
[07:55:08] PROBLEM - High load average on labstore1002 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [24.0]
[07:58:54] (PS1) Muehlenhoff: Enable ferm for db1069/sanitarium [puppet] - https://gerrit.wikimedia.org/r/235417
[07:59:51] (CR) Jcrespo: [C: 1] Enable ferm for db1069/sanitarium [puppet] - https://gerrit.wikimedia.org/r/235417 (owner: Muehlenhoff)
[08:01:52] !log enable ferm on db1069/sanitarium
[08:01:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:02:28] (CR) Muehlenhoff: [C: 2 V: 2] Enable ferm for db1069/sanitarium [puppet] - https://gerrit.wikimedia.org/r/235417 (owner: Muehlenhoff)
[08:03:10] !log restarting ntp on lvs2004
[08:03:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:03:21] RECOVERY - puppet last run on ganeti2006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:06:14] (PS2) Muehlenhoff: Add ferm rules for mariadb sanitarium [puppet] - https://gerrit.wikimedia.org/r/234489 (https://phabricator.wikimedia.org/T104699)
[08:06:24] (CR) Muehlenhoff: [C: 2 V: 2] Add ferm rules for mariadb sanitarium [puppet] - https://gerrit.wikimedia.org/r/234489 (https://phabricator.wikimedia.org/T104699) (owner: Muehlenhoff)
[08:14:56] (PS1) KartikMistry: Add Debian package for
apertium-fr-ca [debs/contenttranslation/apertium-fr-ca] - https://gerrit.wikimedia.org/r/235418 (https://phabricator.wikimedia.org/T99637)
[08:16:44] (PS2) Jcrespo: Fix mysql grant issues on m5 (Followup to gerrit:235412) [puppet] - https://gerrit.wikimedia.org/r/235416 (https://phabricator.wikimedia.org/T110693)
[08:17:54] (PS1) Muehlenhoff: Also allow access to sanitatium from iron [puppet] - https://gerrit.wikimedia.org/r/235419
[08:18:09] (PS3) Jcrespo: Fix mysql grant issues on m5 (Followup to gerrit:235412) [puppet] - https://gerrit.wikimedia.org/r/235416 (https://phabricator.wikimedia.org/T110693)
[08:20:40] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[08:20:44] (CR) Jcrespo: [C: 1] Also allow access to sanitatium from iron [puppet] - https://gerrit.wikimedia.org/r/235419 (owner: Muehlenhoff)
[08:21:40] (CR) Muehlenhoff: [C: 2 V: 2] Also allow access to sanitatium from iron [puppet] - https://gerrit.wikimedia.org/r/235419 (owner: Muehlenhoff)
[08:25:56] (CR) Filippo Giunchedi: [C: 1] Add a custom rsync ferm rule for swift storage [puppet] - https://gerrit.wikimedia.org/r/235221 (https://phabricator.wikimedia.org/T108987) (owner: Muehlenhoff)
[08:28:42] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[08:30:14] (PS4) Jcrespo: Fix mysql grant issues on m5 (Followup to gerrit:235412) [puppet] - https://gerrit.wikimedia.org/r/235416 (https://phabricator.wikimedia.org/T110693)
[08:31:07] (CR) Jcrespo: [C: 2] Fix mysql grant issues on m5 (Followup to gerrit:235412) [puppet] - https://gerrit.wikimedia.org/r/235416 (https://phabricator.wikimedia.org/T110693) (owner: Jcrespo)
[08:34:27] RECOVERY - High load average on labstore1002 is OK: OK: Less than 50.00% above the threshold [16.0]
[08:38:12] jynus: \O/ :-}
[08:40:28] PROBLEM - High load average on labstore1002 is CRITICAL:
CRITICAL: 88.89% of data above the critical threshold [24.0]
[08:48:18] operations, Continuous-Integration-Scaling, Database, Patch-For-Review: MySQL database for Nodepool - https://phabricator.wikimedia.org/T110693#1596146 (jcrespo) Access has been granted to m5-master only from labnodepool1001: ``` root@labnodepool1001:~$ mysql -h m5-master -u nodepool -p Enter passwo...
[08:48:38] RECOVERY - High load average on labstore1002 is OK: OK: Less than 50.00% above the threshold [16.0]
[08:54:58] (CR) Filippo Giunchedi: [C: -1] "after chatting with Eric this looks like this is due to sysv init script sourcing and exporting JVM_OPTS, given we might switch to unit fi" [puppet] - https://gerrit.wikimedia.org/r/235012 (owner: Eevans)
[08:57:53] zeljkof-meeting: https://phabricator.wikimedia.org/T110693#1596146 :-}}}
[08:59:59] (PS2) Jcrespo: Repool es1010, pool es1017 for the first time [mediawiki-config] - https://gerrit.wikimedia.org/r/235276 (https://phabricator.wikimedia.org/T105843)
[09:01:06] (CR) Jcrespo: [C: 2] Repool es1010, pool es1017 for the first time [mediawiki-config] - https://gerrit.wikimedia.org/r/235276 (https://phabricator.wikimedia.org/T105843) (owner: Jcrespo)
[09:02:35] "# Your branch is ahead of 'origin/master' by 7 commits." on tin
[09:05:48] I need to deploy, but I do not know in which state I should leave the tin repo
[09:06:04] should I just merge my change?
[09:06:25] jynus: could they be security patches?
[09:06:31] or maybe origin/master is not up to date
[09:06:54] no, I fetched from origin/master and now we have 2 branches
[09:07:41] oh seems last ones were merges
[09:08:01] maybe that can just be rebased
[09:08:19] if something screws up you can still rewind to current e83614f (HEAD, master) Merge remote-tracking branch 'origin/master'
[09:08:27] (Abandoned) Muehlenhoff: Enable base::firewall on analytics1021 [puppet] - https://gerrit.wikimedia.org/r/229374 (owner: Muehlenhoff)
[09:08:42] seems krenair did merge commits on Tue Sep 1 21:19:01 2015 +0000
[09:08:52] yes, I saw that
[09:09:12] $ git cherry origin/master HEAD
[09:09:12] + 3af6f390e9073453a5d7cf9550a31d6e7296cb06
[09:09:17] PROBLEM - High load average on labstore1002 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [24.0]
[09:09:31] which is a commit to delete 1.26wmf12 branches
[09:10:08] which isn't in Gerrit unfortunately
[09:10:26] so what happened is twentyafterfour did a "delete 1.26wmf12 branches" as a live hack not in Gerrit / not merged
[09:10:41] then others added more patches which ended up being merged on top of twentyafterfour's patch
[09:10:45] I think if you rebase
[09:10:52] my issue is that I do not know how the branches are used, etc
[09:10:56] you will have the live hack to delete 1.26wmf12 branches on top of it
[09:11:20] !gerrit I1cc466cd6adde53140c5dcb5ae1e7361f8d8368a
[09:11:20] https://gerrit.wikimedia.org/
[09:11:24] grrr
[09:11:41] ah https://gerrit.wikimedia.org/r/#/c/235347/
[09:11:45] unmerged but applied on tin
[09:11:46] operations, Traffic, Varnish: Reintroduce rejection for requests with null user agents - https://phabricator.wikimedia.org/T111140#1596220 (faidon) p:High>Low
[09:11:56] so I do not want to do any change that may delete live patches or hide them etc
[09:12:47] so my question is, why was it not sent through gerrit, if it is not a security issue?
[09:12:53] (CR) Hashar: "That change has been fetched on tin.eqiad.wmnet but left unmerged here in Gerrit. So it is currently a live hack and all deployment made " [mediawiki-config] - https://gerrit.wikimedia.org/r/235347 (owner: 20after4)
[09:13:02] jynus: matrix glitch
[09:13:17] (PS2) Hashar: delete 1.26wmf12 [mediawiki-config] - https://gerrit.wikimedia.org/r/235347 (owner: 20after4)
[09:13:30] I would like to make sure 1.26wmf12 is no longer used
[09:14:22] basically, my fear is to do something that messes with deployment
[09:14:25] switched out of it back in July 09
[09:14:43] (CR) Hashar: [C: 2] "We switched out of it back in July 09" [mediawiki-config] - https://gerrit.wikimedia.org/r/235347 (owner: 20after4)
[09:14:48] (Merged) jenkins-bot: delete 1.26wmf12 [mediawiki-config] - https://gerrit.wikimedia.org/r/235347 (owner: 20after4)
[09:15:03] jynus: I think you can rebase now
[09:15:07] the live hack is now merged
[09:16:18] jynus: though you must not use 'root' :-}
[09:16:23] ok, fetch, stash, rebase, stash pop, and now I am ok
[09:16:33] or files under .git/ might end up belonging to root which would cause some troubles
[09:16:38] though they apparently belong to wikidev ..
[09:16:54] it should use my UID
[09:17:40] looks fine
[09:17:48] now there is a problem with /wikiversions.json :-(
[09:18:11] for me that is ok, I left it as I found it !
[09:18:18] :-)
[09:18:42] so I think you are safe now
[09:18:47] yes
[09:19:15] assuming the version live hacked was the one that was deployed
[09:19:17] !log Merged in "delete 1.26wmf12" https://gerrit.wikimedia.org/r/235347 which was left unmerged in Gerrit but was present on tin /srv/mediawiki-staging confusing people.
[09:19:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:19:26] (PS3) Muehlenhoff: Add a custom rsync ferm rule for swift storage [puppet] - https://gerrit.wikimedia.org/r/235221 (https://phabricator.wikimedia.org/T108987)
[09:19:26] that worries me
[09:19:36] gotta diff the wikiversions.json :/
[09:19:41] but in any case, I only sync 1 file
[09:19:58] not the whole contents, and that file is compatible with all possible versions
[09:20:03] yeah will be fine
[09:20:09] I am going to handle the diff
[09:20:36] operations, Analytics-Kanban, Monitoring, Patch-For-Review: Overhaul reqstats - https://phabricator.wikimedia.org/T83580#1596240 (fgiunchedi) @ottomata let's sync up on this on hangout/irc and report conclusions here, seems like it'll speed things up!
[09:21:06] (CR) Muehlenhoff: [C: 2 V: 2] Add a custom rsync ferm rule for swift storage [puppet] - https://gerrit.wikimedia.org/r/235221 (https://phabricator.wikimedia.org/T108987) (owner: Muehlenhoff)
[09:21:37] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 28.57% of data above the critical threshold [500.0]
[09:21:42] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool es1010, pool es1017 (duration: 00m 13s)
[09:21:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:21:55] I'm done
[09:21:55] diff -u <(git show HEAD:wikiversions.json|python -m json.tool) <(cat wikiversions.json|python -m json.tool)
[09:21:55]
[09:21:56] :D
[09:22:03] - "testwiki": "php-1.26wmf20",
[09:22:03] + "testwiki": "php-1.26wmf21",
[09:23:37] RECOVERY - High load average on labstore1002 is OK: OK: Less than 50.00% above the threshold [16.0]
[09:24:39] !sal
[09:24:39] https://wikitech.wikimedia.org/wiki/Server_Admin_Log https://tools.wmflabs.org/sal/production See it and you will know all you need.
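The wikiversions.json check at 09:21:55 pretty-prints the committed and the working copy and diffs them. A minimal Python sketch of the same idea follows. The function name `changed_versions` and the `examplewiki` entry are assumptions made up for illustration; only the `testwiki` values come from the log, and this is not part of the deployment tooling.

```python
import json


def changed_versions(committed_text, working_text):
    """Return {wiki: (committed, working)} for every wiki whose version differs.

    Both arguments are JSON documents mapping wiki names to version strings,
    the shape suggested by the diff output in the log.
    """
    committed = json.loads(committed_text)
    working = json.loads(working_text)
    changes = {}
    # Walk the union of keys so added or removed wikis are reported too.
    for wiki in sorted(set(committed) | set(working)):
        before = committed.get(wiki)
        after = working.get(wiki)
        if before != after:
            changes[wiki] = (before, after)
    return changes


# Hypothetical sample data: only the testwiki entry is taken from the log.
committed = '{"testwiki": "php-1.26wmf20", "examplewiki": "php-1.26wmf20"}'
working = '{"testwiki": "php-1.26wmf21", "examplewiki": "php-1.26wmf20"}'

for wiki, (before, after) in changed_versions(committed, working).items():
    print(f"{wiki}: {before} -> {after}")
```

Comparing parsed mappings rather than raw text sidesteps whitespace and key-order noise, which is what piping both sides through `python -m json.tool` achieves in the shell one-liner.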
[09:25:04] will look at it later
[09:25:07] need coffee / break etc
[09:25:25] no blocker for me, so that's ok
[09:30:55] (PS1) Faidon Liambotis: phabricator: silence community_metrics.sh cronspam [puppet] - https://gerrit.wikimedia.org/r/235422
[09:31:09] (CR) Faidon Liambotis: [C: 2] phabricator: silence community_metrics.sh cronspam [puppet] - https://gerrit.wikimedia.org/r/235422 (owner: Faidon Liambotis)
[09:31:15] (CR) Faidon Liambotis: [V: 2] phabricator: silence community_metrics.sh cronspam [puppet] - https://gerrit.wikimedia.org/r/235422 (owner: Faidon Liambotis)
[09:34:08] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[09:39:07] (PS1) Jcrespo: Depool es1002 in order to clone it to new server es1016 [mediawiki-config] - https://gerrit.wikimedia.org/r/235423 (https://phabricator.wikimedia.org/T105843)
[09:39:41] (CR) Jcrespo: [C: 2] Depool es1002 in order to clone it to new server es1016 [mediawiki-config] - https://gerrit.wikimedia.org/r/235423 (https://phabricator.wikimedia.org/T105843) (owner: Jcrespo)
[09:41:47] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool es1002 (duration: 00m 12s)
[09:41:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:42:17] PROBLEM - High load average on labstore1002 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [24.0]
[09:50:51] jynus: I filed a task about the dirty wikiversions.json. I am too afraid to screw up something :/
[09:51:40] yes, that is why I also asked before - better to have a second opinion
[09:53:52] I have to wait for cloning the servers- snapshots are running right now
[09:55:41] jynus: I will need the nodepool database credentials for m5-master to be filled in the private repo.
a boilerplate for labs/private is https://gerrit.wikimedia.org/r/#/c/235424/ [09:55:50] I am adjusting the bits in operations/puppet now [09:55:57] I already did that :-) [09:56:09] ohh [09:56:21] in can change it [09:56:46] but it may be nice to do it the other way round [09:56:47] PROBLEM - High load average on labstore1002 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [24.0] [09:56:50] going to adjust the labs/private patch :)} [09:57:13] also, put the user on the public repo, it is easier to change, and it is public in any case [09:57:15] (03PS1) 10Muehlenhoff: Remove the ferm rules from modules/rsync/manifests/server.pp [puppet] - 10https://gerrit.wikimedia.org/r/235425 (https://phabricator.wikimedia.org/T108987) [09:57:19] (that is my suggestion) [09:57:37] yup [09:57:40] are you ok with that, hashar ? [09:57:47] I haven't noticed you already updated ops/puppet [09:58:18] I wanted to mention the authentication, but had many things in mind, sorry [09:58:28] https://gerrit.wikimedia.org/r/#/c/235424/2/modules/passwords/manifests/init.pp,unified [09:58:34] should match what you did in prod [09:59:00] and actually, I didn't change the boilerplate, that is my mistake [09:59:48] my laptop loose wireless whenever I move :( [10:00:38] if you do not like the token I chose, I can change it, but I would like to minimize those commits, as I tend to break everithing [10:00:45] :-) [10:02:54] if anything is missing or it doesn't work, etc, hashar_ hashar, let me know and I will fix it [10:03:22] jynus: it is all fine to me [10:03:29] will just hardcode db_user='nodepool' in the role [10:03:57] yes, after all, it is public on the mysql side of things [10:08:10] (03PS1) 10Hashar: nodepool: adjust database configuration [puppet] - 10https://gerrit.wikimedia.org/r/235427 (https://phabricator.wikimedia.org/T110693) [10:08:57] (03CR) 10jenkins-bot: [V: 04-1] nodepool: adjust database configuration [puppet] - 10https://gerrit.wikimedia.org/r/235427 
(https://phabricator.wikimedia.org/T110693) (owner: 10Hashar) [10:11:08] (03PS2) 10Hashar: nodepool: adjust database configuration [puppet] - 10https://gerrit.wikimedia.org/r/235427 (https://phabricator.wikimedia.org/T110693) [10:11:08] 6operations, 6Phabricator, 7Database, 5Patch-For-Review: Phabricator creates MySQL connection spikes - https://phabricator.wikimedia.org/T109279#1596312 (10Aklapper) >>! In T109279#1595147, @chasemp wrote: > Any luck? T110913 [10:11:36] jynus: so the boiler plate should match prod and only contains the user password https://gerrit.wikimedia.org/r/#/c/235424/2/modules/passwords/manifests/init.pp,unified [10:12:01] then I adjusted the nodepool role / class and yam template to use db_host db_name etc :) [10:12:33] do you need to do a failover process? [10:12:51] as in, me checking that we do the transition ok? [10:13:11] the service is not enabled yet [10:13:15] so there is little risk [10:13:19] oh, then no problem [10:13:22] at worth will need to send a fix up patch hehe [10:13:27] I am running the puppet compiler https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/870/console [10:13:46] (03CR) 10Hashar: "Puppet compilation https://puppet-compiler.wmflabs.org/870/" [puppet] - 10https://gerrit.wikimedia.org/r/235427 (https://phabricator.wikimedia.org/T110693) (owner: 10Hashar) [10:14:32] -dburi: 'mysql+pymysql://nodepool:nodepool@localhost/nodepool' [10:14:33] +dburi: 'mysql+pymysql://nodepool:@m5-master.eqiad.wmnet/nodepooldb' [10:14:34] \O/ [10:14:45] _joe_: thank you to have resurrected the puppet compiler! 
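Note on the dburi change quoted above: the new URI is assembled from separate settings (db_user, db_pass, db_host, db_name, as named in the review comments). A rough Python sketch of that assembly is below. The function name is hypothetical, and the actual change uses a Puppet ERB template rather than Python. Percent-encoding the credentials is an added precaution, not something the log says the template does.

```python
from urllib.parse import quote


def build_dburi(db_user, db_pass, db_host, db_name, driver="mysql+pymysql"):
    """Compose a SQLAlchemy-style database URI from its parts."""
    # Encode credentials so characters such as '@' or '/' cannot break the URI.
    user = quote(db_user, safe="")
    password = quote(db_pass, safe="")
    return "{}://{}:{}@{}/{}".format(driver, user, password, db_host, db_name)


if __name__ == "__main__":
    # The empty password mirrors the compiler output quoted in the log.
    print(build_dburi("nodepool", "", "m5-master.eqiad.wmnet", "nodepooldb"))
```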
[10:16:08] (03PS3) 10Hashar: nodepool: adjust database configuration [puppet] - 10https://gerrit.wikimedia.org/r/235427 (https://phabricator.wikimedia.org/T110693) [10:17:07] (03CR) 10Hashar: [C: 031] "In the erb template: db_password -> db_pass" [puppet] - 10https://gerrit.wikimedia.org/r/235427 (https://phabricator.wikimedia.org/T110693) (owner: 10Hashar) [10:17:32] 6operations, 10RESTBase, 10RESTBase-Cassandra: Cassandra inter-node encryption (TLS) - https://phabricator.wikimedia.org/T108953#1596328 (10fgiunchedi) so I don't think we'll be able to reuse puppet certs with multiple instances, thus we'll have to roll our own. since we are going to use multiple cassandra c... [10:17:49] (03CR) 10Jcrespo: [C: 031] nodepool: adjust database configuration [puppet] - 10https://gerrit.wikimedia.org/r/235427 (https://phabricator.wikimedia.org/T110693) (owner: 10Hashar) [10:18:04] jynus: I think you can land that one :-D [10:18:16] good to know the configuration of a new db is handled via puppet [10:18:24] that streamlines the process imho [10:18:28] well [10:18:57] there are many things that we do manually [10:19:04] on purpose [10:19:19] outside of puppet, like data migration, etc. [10:19:29] I can imagine you don't want puppet to randomly drop db / alter columns / change credentials :D [10:19:36] exactly [10:19:48] actually, we do not want puppet to connect to mysql , period [10:20:04] sounds wise [10:20:04] but the manual process is just applying a script [10:20:27] so, creating this a new host takes 0 seconds [10:20:57] so, should I deploy, or do you want to wait for more +1? 
[10:21:00] (03CR) 10Zfilipin: [C: 031] nodepool: adjust database configuration [puppet] - 10https://gerrit.wikimedia.org/r/235427 (https://phabricator.wikimedia.org/T110693) (owner: 10Hashar) [10:21:53] in any case, (again) if for any reason you have to debug it, just ping me again [10:26:29] jynus: you can deploy it :} merge / palladium / puppet agent -tv on labnodepool [10:26:53] will validate the resulting yaml and check whether the service manage to connect [10:27:29] (03CR) 10Jcrespo: [C: 032] nodepool: adjust database configuration [puppet] - 10https://gerrit.wikimedia.org/r/235427 (https://phabricator.wikimedia.org/T110693) (owner: 10Hashar) [10:29:41] I do not see yet the connection on m5, but it may need some activity [10:30:12] !log installed qemu security updates on labvirt* [10:30:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:30:20] jynus: yeah the service is disabled right now [10:30:52] no connection denied, either [10:33:48] jynus: looks good to me :-} [10:34:28] RECOVERY - High load average on labstore1002 is OK: OK: Less than 50.00% above the threshold [16.0] [10:35:34] 6operations, 6Phabricator, 7Database, 5Patch-For-Review: Phabricator creates MySQL connection spikes - https://phabricator.wikimedia.org/T109279#1596380 (10jcrespo) [10:37:20] 6operations, 5Continuous-Integration-Scaling, 7Database, 5Patch-For-Review: MySQL database for Nodepool - https://phabricator.wikimedia.org/T110693#1596381 (10hashar) 5Open>3Resolved Jaime validated the mysql connection. I got the nodepool config adjusted and the command line utility manages to reach... [10:37:21] jynus: all good. Thank you very very much ! [10:37:31] I have totally missed out the database setup part :-/ [10:37:52] and was really worrying about the amount of work required to pick up a db host / set it up [10:38:34] Re: setup. 
Do not worry, go from time to time to #wikimedia-databases and I will tell to to do some tickets on my behalf :-D [10:39:13] I set it up on m5, because it matched the load and the theme (openstack) [10:39:27] new hardware is more complicated [10:47:12] hashar, the default character set of the database is binary, like in the wikis; that may create some issues with some applications (andrew had some recently) [10:47:45] jynus: ah thanks for pointing out [10:48:02] jynus: if it comes out to be a problem I can change it in the DB URI probably [10:48:45] actually, we may have to change it on the tables, but only if it is a problem [10:50:09] (03PS1) 10Muehlenhoff: Enable ferm rules for role::mariadb::misc::eventlogging [puppet] - 10https://gerrit.wikimedia.org/r/235429 [10:50:11] (03PS1) 10Muehlenhoff: Enable ferm on db1046 [puppet] - 10https://gerrit.wikimedia.org/r/235430 [10:53:37] ^uff, that is more complicated :-) [11:00:04] nikerabbit kart_: Respected human, time to deploy Translation service (TTMServer) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150902T1100). Please do the needful. [11:00:51] !log cloning mysql data from es1002 into es1016 [ETA:16h] [11:00:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:01:16] yay [11:03:28] PROBLEM - Outgoing network saturation on labstore1002 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [100000000.0] [11:17:49] ugh [11:18:12] why in earth is there three active shards when I have requested 1 shard with 1 replica? [11:23:38] RECOVERY - Outgoing network saturation on labstore1002 is OK: OK: Less than 10.00% above the threshold [75000000.0] [11:23:56] Nikerabbit: because of auto_expand_replicas maybe [11:24:36] dcausse: my script refuses to continue because 3 !== 2... wondering how to work around that [11:25:20] Nikerabbit: it's for ttmserver-test? 
[11:25:45] dcausse: well, I was trying with ttmserver-test before running it for ttmserver [11:27:52] dcausse: was thinking of making it >= comparison on tin temporarily unless you have a better idea [11:28:04] (03PS1) 10Muehlenhoff: Enable ferm for dbstore* in codfw [puppet] - 10https://gerrit.wikimedia.org/r/235435 [11:28:24] Nikerabbit: ttmserver-test seems to be created with auto_expand_replicas 0-2, so it will always creates 2 replica in eqiad [11:28:52] dcausse: is ttmserver the same? [11:29:11] Nikerabbit: yes [11:41:38] dcausse: thanks for the explanation, using my workaround for nw [11:42:00] (03PS1) 10Muehlenhoff: Enable ferm for role::mariadb::core [puppet] - 10https://gerrit.wikimedia.org/r/235436 [11:43:45] Nikerabbit: trying to figure out if this auto_expand_replicas is a default value we set somewhere. Is it possible that your script set this param? [11:44:44] dcausse: I am almost certain I am not setting it [11:44:53] 6operations, 6Phabricator: Create an offboarding workflow with HR & Operations - https://phabricator.wikimedia.org/T108131#1596491 (10Aklapper) Afraid of sidetracking, but I'm wondering if/how to cover assigned tasks in ticketing systems. For example [[ https://wikimediafoundation.org/w/index.php?title=Templat... 
[11:45:16] Nikerabbit: then it's certainly a cluster default value (will investigate) [11:46:13] it seems CirrusSearch sets it, but I am using Elastica directly so it shouldn't affect [11:46:47] (03CR) 10Alexandros Kosiaris: [C: 031] Remove the ferm rules from modules/rsync/manifests/server.pp [puppet] - 10https://gerrit.wikimedia.org/r/235425 (https://phabricator.wikimedia.org/T108987) (owner: 10Muehlenhoff) [11:47:16] !log kill STOP'ed rsync on labstore1002 [11:47:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:52:54] (03PS1) 10Muehlenhoff: Enable ferm for mariadb core servers in codfw [puppet] - 10https://gerrit.wikimedia.org/r/235440 [11:53:07] 6operations, 7Monitoring: grafana.wikimedia.org calls out to AWS - https://phabricator.wikimedia.org/T110484#1596523 (10akosiaris) > Nothing, as the call to AWS doesn't provide any functionality, afaict. It simply GETs https://grafanarel.s3.amazonaws.com/latest.json to see what the latest version is, presumabl... 
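Note on the shard count discussion above: with 1 primary shard and auto_expand_replicas set to 0-2, a cluster with at least three data nodes ends up with 3 active copies, which is why a strict "3 !== 2" check fails. The Python sketch below models that under a simplified reading of the setting (replicas grow up to the upper bound, limited by data nodes minus one). The function names are hypothetical, and the relaxed check is only an illustration of the ">=" workaround mentioned in the channel, not the actual script.

```python
def expected_copies(primaries, auto_expand, data_nodes):
    """Estimate active shard copies for a setting such as '0-2' or '0-all'."""
    low, high = auto_expand.split("-")
    low = int(low)
    # 'all' means: as many replicas as there are other data nodes.
    high = data_nodes - 1 if high == "all" else int(high)
    replicas = max(low, min(high, data_nodes - 1))
    return primaries * (1 + replicas)


def active_shards_ok(active, primaries, requested_replicas):
    """Relaxed check: accept extra replicas instead of requiring an exact match."""
    return active >= primaries * (1 + requested_replicas)


if __name__ == "__main__":
    print(expected_copies(1, "0-2", 3))
    print(active_shards_ok(3, 1, 1))
```

The relaxed comparison still rejects an index with too few copies, while tolerating replicas that the cluster added on its own.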
[11:53:19] PROBLEM - Incoming network saturation on labstore1003 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [100000000.0] [11:54:21] (03PS1) 10Muehlenhoff: Exempt mariadb core port from connection tracking [puppet] - 10https://gerrit.wikimedia.org/r/235443 [11:56:40] (03CR) 10Filippo Giunchedi: [C: 031] Remove the ferm rules from modules/rsync/manifests/server.pp [puppet] - 10https://gerrit.wikimedia.org/r/235425 (https://phabricator.wikimedia.org/T108987) (owner: 10Muehlenhoff) [11:59:57] !log removed tools LV snapshots on labstore1002 [12:00:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:00:08] RECOVERY - Disk space on labstore1002 is OK: DISK OK [12:01:57] (03PS1) 10Muehlenhoff: Enable ferm for role::mariadb::analytics [puppet] - 10https://gerrit.wikimedia.org/r/235444 [12:01:59] (03PS1) 10Muehlenhoff: Enable ferm on db1047 [puppet] - 10https://gerrit.wikimedia.org/r/235445 [12:03:30] 6operations, 7Graphite: Upgrade to Grafana v2.x - https://phabricator.wikimedia.org/T104738#1596543 (10fgiunchedi) I did a brief review of the debian package for grafana v2 and it seems usable, we could import it into apt.wikimedia.org, no source package though [12:04:45] 6operations, 7Monitoring: grafana.wikimedia.org calls out to AWS - https://phabricator.wikimedia.org/T110484#1596556 (10fgiunchedi) FWIW see also {T104738} for a related discussion on the upgrade, I don't see a way to disable this via config at least in our version in trebuchet :| [12:15:48] RECOVERY - Incoming network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0] [12:25:52] 6operations, 7Graphite: Upgrade to Grafana v2.x - https://phabricator.wikimedia.org/T104738#1596624 (10akosiaris) Indeed grafana now features a Go server (possibly used as a proxy) to avoid CORS quirkness among other new features. There is a migration guide as already mentioned so it should be possible to upgr... 
[12:29:37] (03PS3) 10Alexandros Kosiaris: Log for Apertium [puppet] - 10https://gerrit.wikimedia.org/r/230992 (https://phabricator.wikimedia.org/T108797) (owner: 10KartikMistry) [12:29:46] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Log for Apertium [puppet] - 10https://gerrit.wikimedia.org/r/230992 (https://phabricator.wikimedia.org/T108797) (owner: 10KartikMistry) [12:32:12] akosiaris: Thanks! [12:33:36] (03PS1) 10Alexandros Kosiaris: apertium: fix typo errors [puppet] - 10https://gerrit.wikimedia.org/r/235450 [12:34:05] (03PS1) 10John F. Lewis: phabricator: replace Needs Volunteer with Lowest [puppet] - 10https://gerrit.wikimedia.org/r/235451 [12:34:24] andre__: ^ that what you wanted re. renaming (in metrics email) [12:35:36] (03CR) 10Alexandros Kosiaris: [C: 032] apertium: fix typo errors [puppet] - 10https://gerrit.wikimedia.org/r/235450 (owner: 10Alexandros Kosiaris) [12:39:27] 10Ops-Access-Requests, 6operations, 10ContentTranslation-Deployments, 3LE-CX6-Sprint 3: Access to /var/log/apertium for Kartik - https://phabricator.wikimedia.org/T108678#1596664 (10akosiaris) Access to /var/log/apertium/ since https://gerrit.wikimedia.org/r/230992 has been merged is now available. [12:42:16] (03PS1) 10Muehlenhoff: Enable ferm for db2055-2070 [puppet] - 10https://gerrit.wikimedia.org/r/235453 [12:42:38] PROBLEM - puppet last run on cp3033 is CRITICAL: CRITICAL: puppet fail [12:43:14] (03CR) 10Steinsplitter: [C: 031] Add *.ggpht.com to Wikimedia Commons upload whitelist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/234980 (https://phabricator.wikimedia.org/T110869) (owner: 10Dereckson) [12:45:12] JohnFLewis, oh thanks [12:46:08] JohnFLewis: though you undermine my lazy attempt to trick random people into becoming code contributors :D [12:47:19] (03CR) 10Aklapper: [C: 031] "lgtm. Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/235451 (owner: 10John F. 
Lewis) [12:47:25] 6operations, 6Phabricator, 6Project-Creators, 6Triagers: Broaden the group of users that can create projects in Phabricator - https://phabricator.wikimedia.org/T706#1596692 (10Ainali) I like to join #Project-Creators to be able to setup projects for Wikimedia Sverige, following [[ https://www.mediawiki.org... [12:47:40] andre__: I'm sort of lazy ;) [12:56:31] (03PS1) 10Yuvipanda: labstore: Make sure that check_http hits correct IP [puppet] - 10https://gerrit.wikimedia.org/r/235454 [12:56:37] (03CR) 10jenkins-bot: [V: 04-1] labstore: Make sure that check_http hits correct IP [puppet] - 10https://gerrit.wikimedia.org/r/235454 (owner: 10Yuvipanda) [12:56:53] (03PS2) 10Yuvipanda: labstore: Make sure that check_http hits correct IP [puppet] - 10https://gerrit.wikimedia.org/r/235454 [12:57:47] JohnFLewis: No. Not really from what I see. Sorry. :P [12:58:35] (03CR) 10Yuvipanda: [C: 032] labstore: Make sure that check_http hits correct IP [puppet] - 10https://gerrit.wikimedia.org/r/235454 (owner: 10Yuvipanda) [13:01:14] 6operations, 6Phabricator, 6Project-Creators, 6Triagers: Broaden the group of users that can create projects in Phabricator - https://phabricator.wikimedia.org/T706#1596714 (10Qgil) @ainali {{done}}. Looking forward to see WMSE trying Wikimedia Phabricator for for your non-tech activities. [13:07:33] RECOVERY - NFS read/writeable on labs instances on labstore1002 is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.473 second response time [13:07:53] 6operations, 10ops-eqiad: Prepare shipping label for mx80 to eqord - https://phabricator.wikimedia.org/T109338#1596722 (10Cmjohnson) a:5Cmjohnson>3Papaul Papaul, Update the ticket and assign to me once shipped. Thanks [13:08:42] ^ that's better! [13:09:05] mark: ^ that check will tell us if NFS on instances fail. Same thing catchpoint uses. 
I'l make that paging [13:10:34] RECOVERY - puppet last run on cp3033 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:11:53] PROBLEM - Incoming network saturation on labstore1003 is CRITICAL: CRITICAL: 13.79% of data above the critical threshold [100000000.0] [13:12:35] 6operations, 6Phabricator, 7Database, 5Patch-For-Review: Phabricator creates MySQL connection spikes - https://phabricator.wikimedia.org/T109279#1596750 (10Nemo_bis) > closed blocking task Restricted Task as "Resolved". If it's resolved, please make it public. Thanks. [13:19:37] (03PS1) 10Alexandros Kosiaris: Provide LE with the right to stop/start apertium-apy [puppet] - 10https://gerrit.wikimedia.org/r/235461 (https://phabricator.wikimedia.org/T108678) [13:20:33] PROBLEM - puppet last run on mw2114 is CRITICAL: CRITICAL: Puppet has 1 failures [13:27:59] 6operations, 10RESTBase, 10RESTBase-Cassandra: Cassandra inter-node encryption (TLS) - https://phabricator.wikimedia.org/T108953#1596768 (10Eevans) >>! In T108953#1596328, @fgiunchedi wrote: > so I don't think we'll be able to reuse puppet certs with multiple instances, thus we'll have to roll our own. since... 
[13:29:05] (03PS3) 10Jcrespo: Save binary log coordinates from the master and the slave on backup [puppet] - 10https://gerrit.wikimedia.org/r/234503 [13:29:48] (03CR) 10jenkins-bot: [V: 04-1] Save binary log coordinates from the master and the slave on backup [puppet] - 10https://gerrit.wikimedia.org/r/234503 (owner: 10Jcrespo) [13:31:12] RECOVERY - NFS read/writeable on labs instances on labstore1001 is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.097 second response time [13:31:21] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 8.33% of data above the critical threshold [500.0] [13:31:41] RECOVERY - NFS read/writeable on labs instances on labstore2001 is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.068 second response time [13:32:22] RECOVERY - NFS read/writeable on labs instances on labstore1003 is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.060 second response time [13:32:46] (03PS4) 10Jcrespo: Save binary log coordinates from the master and the slave on backup [puppet] - 10https://gerrit.wikimedia.org/r/234503 [13:33:08] (03PS1) 10Revi: Change needs_volunteer to Lowest [puppet] - 10https://gerrit.wikimedia.org/r/235463 [13:34:49] 10Ops-Access-Requests, 6operations, 10ContentTranslation-Deployments, 3LE-CX6-Sprint 3, 5Patch-For-Review: Access to /var/log/apertium for Kartik - https://phabricator.wikimedia.org/T108678#1596775 (10akosiaris) @kart et al, mind filling a task under https://phabricator.wikimedia.org/project/profile/956/... [13:34:53] 10Ops-Access-Requests, 6operations, 10ContentTranslation-Deployments, 3LE-CX6-Sprint 3, 5Patch-For-Review: Access to /var/log/apertium for Kartik - https://phabricator.wikimedia.org/T108678#1596776 (10akosiaris) 5Open>3Resolved [13:38:58] (03CR) 10John F. Lewis: "Already patched in https://gerrit.wikimedia.org/r/#/c/235451/ and changes legacy variables as well." 
[puppet] - 10https://gerrit.wikimedia.org/r/235463 (owner: 10Revi) [13:39:47] (03Abandoned) 10Revi: Change needs_volunteer to Lowest [puppet] - 10https://gerrit.wikimedia.org/r/235463 (owner: 10Revi) [13:40:11] (03CR) 10Jcrespo: [C: 031] "There are 3 possibe variables to take into account:" [puppet] - 10https://gerrit.wikimedia.org/r/234503 (owner: 10Jcrespo) [13:41:02] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [13:42:32] (03PS5) 10Jcrespo: Save binary log coordinates from the master and the slave on backup [puppet] - 10https://gerrit.wikimedia.org/r/234503 [13:43:10] 6operations, 6Labs, 10wikitech.wikimedia.org: intermittent nutcracker failures - https://phabricator.wikimedia.org/T105131#1596786 (10akosiaris) Great! Let's hope it's that indeed. [13:44:21] RECOVERY - Incoming network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0] [13:46:33] RECOVERY - puppet last run on mw2114 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [13:47:48] (03Abandoned) 10Eevans: set JVM_OPTS entirely [puppet] - 10https://gerrit.wikimedia.org/r/235012 (owner: 10Eevans) [13:48:26] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] "LGTM. I can merge myself if required. Best time to schedule this for merge though is Tuesday (just so we are aware of the changes on Wedne" [puppet] - 10https://gerrit.wikimedia.org/r/234503 (owner: 10Jcrespo) [13:48:45] akosiaris: you shouoldn't C+2 and V+2 to it and not merge :P [13:48:52] it's confusing [13:49:04] indeed [13:49:15] ok, I 'll fallback to +1 [13:49:18] use +1 instead if it is an endorsement, + a message if you're happy to merge it yourself [13:50:03] well, it's not an endorsement really. 
It's an LGTM and probably nobody else is there better than me to give the LGTM [13:50:21] the V+2 btw is because I wrote LGTM in the text box [13:50:34] and gerrit decided to V+2 on its own and I forgot about that behavior [13:51:13] (03CR) 10Alexandros Kosiaris: [C: 031] "Since I am not merging it right now. I moved down to +1. Damn gerrit..." [puppet] - 10https://gerrit.wikimedia.org/r/234503 (owner: 10Jcrespo) [13:51:18] akosiaris: yeah, but I think in our context V is usually set by jenkins / other automated bots. [13:51:18] there... [13:51:35] akosiaris: I agree it's quite confusing, though, the model and our local 'conventions' [13:55:01] 6operations, 6Phabricator, 7Database, 5Patch-For-Review: Phabricator creates MySQL connection spikes - https://phabricator.wikimedia.org/T109279#1596795 (10JohnLewis) >>! In T109279#1596750, @Nemo_bis wrote: >> closed blocking task Restricted Task as "Resolved". > > If it's resolved, please make it public... [13:58:45] (03CR) 10Jcrespo: "@Alex, let me sleep over the changes (I may do some extra changes), and we can schedule it for Tuesday." [puppet] - 10https://gerrit.wikimedia.org/r/234503 (owner: 10Jcrespo) [14:04:39] hey akosiaris, package versioning q for you [14:04:51] i fixed a but in a package, and I have submitted a pull request for the bug fix [14:05:02] * YuviPanda bugs ottomata about the stat -> labstore sync [14:05:10] but, I am impatient, and I want to use the fixed version [14:05:19] i fixed the bug in upstream's master [14:05:32] so it isn't tagged [14:05:57] should I build the package from master or somehow cherry pick onto a branch (or local tag?) of their latest tag [14:06:03] and either way, what should the version be? [14:06:07] (latest tag is 2.0.0) [14:06:14] YuviPanda: I have the ball??? [14:06:15] looking [14:06:36] ottomata: I... think so? [14:06:44] ungh i have many tickets and am now having a hard time finding things! [14:06:45] ah! [14:07:02] YuviPanda: its closed! 
[14:07:10] akosiaris: opened up the acl in the vlan [14:07:52] ottomata: oh [14:07:54] hahaha [14:07:55] I see [14:07:58] ottomata: so it's all good now? [14:08:00] I must've missed it [14:08:03] yes! [14:08:05] it works [14:08:41] ottomata: awesome! cool :) [14:09:00] ottomata: we already got a package for it ? [14:09:16] get the patch, put it in debian/patches [14:09:33] add and a debian/patches/series file if there isn't one already [14:09:40] and just bump the debian revision [14:09:59] as in ours is X.Y.Z-1, do it X.Y.Z-2 [14:10:22] ottomata: all this, assuming we are talking about debian and the patch applies cleanly on X.Y.Z and not just 2.0.0 [14:12:17] akosiaris: when I cherry picked the patches from master onto the tag, there were conflicts [14:12:33] but, i wrote the patch, so I could just manually make the changes and create a special patch for the 2.0.0 tag [14:12:46] but, ok ja, that makes sense [14:16:24] 6operations, 6Phabricator, 7Database, 5Patch-For-Review: Phabricator creates MySQL connection spikes - https://phabricator.wikimedia.org/T109279#1596872 (10jcrespo) > The content of the ticket is considered private data. The status of the ticket does not impact whether the data is public or private. John... [14:21:49] ottomata: so you patched master and not the version you want to run. 
Yeah, you will have to amend the patch to take into acount the version you want to run [14:22:12] akosiaris: ja that's easy enough, its a small aptch [14:22:26] the conflicts were minimal, but its easier just to make a manual patch rather than diff or cherry pick [14:22:53] (03CR) 10Jcrespo: [C: 031] Enable ferm for role::mariadb::core [puppet] - 10https://gerrit.wikimedia.org/r/235436 (owner: 10Muehlenhoff) [14:23:35] (03CR) 10Jcrespo: [C: 031] Enable ferm for mariadb core servers in codfw [puppet] - 10https://gerrit.wikimedia.org/r/235440 (owner: 10Muehlenhoff) [14:23:56] 6operations, 6Phabricator: Create an offboarding workflow with HR & Operations - https://phabricator.wikimedia.org/T108131#1596877 (10Dzahn) I would suggest that tickets are mass re-assigned to 'nobody/up for grabs" unless there was a specific agreement with somebody else taking them. This is the experience i... [14:24:12] (03CR) 10Jcrespo: [C: 031] Enable ferm for db2055-2070 [puppet] - 10https://gerrit.wikimedia.org/r/235453 (owner: 10Muehlenhoff) [14:25:24] (03CR) 10Jcrespo: [C: 04-1] "This need some extra rules which I need to check, let's postpone it." [puppet] - 10https://gerrit.wikimedia.org/r/235445 (owner: 10Muehlenhoff) [14:26:08] (03CR) 10Jcrespo: [C: 04-1] "This need some extra rules, let's postpone it." [puppet] - 10https://gerrit.wikimedia.org/r/235444 (owner: 10Muehlenhoff) [14:27:20] 6operations, 6Phabricator: Create an offboarding workflow with HR & Operations - https://phabricator.wikimedia.org/T108131#1596893 (10Krenair) >>! In T108131#1596491, @Aklapper wrote: > Afraid of sidetracking, but I'm wondering if/how to cover assigned tasks in ticketing systems. For example [[ https://wikimed... [14:27:34] (03CR) 10Jcrespo: [C: 031] "Ok with the change, most busy servers can have ~20K active simultaneous connections." [puppet] - 10https://gerrit.wikimedia.org/r/235443 (owner: 10Muehlenhoff) [14:28:27] (03CR) 10Jcrespo: [C: 031] "for dbstore2x we can proceed without problems." 
[puppet] - 10https://gerrit.wikimedia.org/r/235435 (owner: 10Muehlenhoff) [14:28:32] 6operations, 10Incident-20150205-SiteOutage, 7Database: sleeper database connection surges during outage - https://phabricator.wikimedia.org/T88770#1596897 (10Krenair) [14:32:08] (03PS3) 10Dzahn: Point sitemap.wikimedia.org to text-lb. [dns] - 10https://gerrit.wikimedia.org/r/234257 (https://phabricator.wikimedia.org/T110511) (owner: 10Chmarkine) [14:32:47] 6operations, 7Database: puppet stopped mysqld using orphan pid file from puppet agent - https://phabricator.wikimedia.org/T86482#1596929 (10Krenair) [14:35:39] (03PS1) 10Addshore: wgRCWatchCategoryMembership flase for commons & wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235467 (https://phabricator.wikimedia.org/T109707) [14:36:05] addshore: flase, eh? [14:36:11] addshore: is that a short form for a flower vase? :) [14:36:12] yes, flase [14:36:32] http://www.nearlynatural.com/common/images/products/large/4740-LG.jpg [14:36:42] a flase is better than a flpot, more fancy..... [14:36:57] (03PS2) 10Dzahn: phabricator: replace Needs Volunteer with Lowest [puppet] - 10https://gerrit.wikimedia.org/r/235451 (owner: 10John F. Lewis) [14:37:09] (03CR) 10Dzahn: [C: 032] phabricator: replace Needs Volunteer with Lowest [puppet] - 10https://gerrit.wikimedia.org/r/235451 (owner: 10John F. Lewis) [14:37:19] (03PS2) 10Addshore: wgRCWatchCategoryMembership false for commons & wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235467 (https://phabricator.wikimedia.org/T109707) [14:40:24] !log TTMServer reindex complete [14:40:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:42:42] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Various inline comments. In general, I think the approach will work, so LGTM. 
It pains me though to see we are running a software that is " (038 comments) [puppet] - 10https://gerrit.wikimedia.org/r/231512 (https://phabricator.wikimedia.org/T95253) (owner: 10Filippo Giunchedi) [14:42:55] (03CR) 10BBlack: [C: 032] Point sitemap.wikimedia.org to text-lb. [dns] - 10https://gerrit.wikimedia.org/r/234257 (https://phabricator.wikimedia.org/T110511) (owner: 10Chmarkine) [14:46:49] 6operations, 7HTTPS, 5Patch-For-Review: sitemap.wikimedia.org uses invalid SSL certificate - https://phabricator.wikimedia.org/T110511#1597011 (10Dzahn) Since Apache and DNS change are merged, this should solve the issue for now since sitemap can use the wildcard cert. [14:48:12] 6operations, 7HTTPS, 5Patch-For-Review: sitemap.wikimedia.org uses invalid SSL certificate - https://phabricator.wikimedia.org/T110511#1597025 (10Dzahn) 5Open>3Resolved [14:48:14] 6operations, 10Traffic: Clean up DNS/redirects for TLS - https://phabricator.wikimedia.org/T102824#1597026 (10Dzahn) [14:49:43] Nikerabbit: yay for ttmserver! [14:50:00] Krenair: It’s a bit much to ask, but I would love to get https://gerrit.wikimedia.org/r/#/c/235469/ into the upcoming swat [14:50:11] (if the patch looks valid to you, of course) [14:51:02] 6operations, 7Monitoring: grafana.wikimedia.org calls out to AWS - https://phabricator.wikimedia.org/T110484#1597044 (10greg) >>! In T110484#1596556, @fgiunchedi wrote: > I don't see a way to disable this via config at least in our version in trebuchet :| Yeah, I spent way too long yesterday poking around con... [14:53:10] 6operations, 6Phabricator, 7Database, 5Patch-For-Review: Phabricator creates MySQL connection spikes - https://phabricator.wikimedia.org/T109279#1597063 (10demon) For future reference, if it's just data (like queries, or logs, etc) and not an actual task that needs to be private, you can [[ /paste/create/... 
[14:53:19] 6operations, 7HTTPS, 5Patch-For-Review: sitemap.wikimedia.org uses invalid SSL certificate - https://phabricator.wikimedia.org/T110511#1597064 (10Dzahn) https://www.ssllabs.com/ssltest/analyze.html?d=sitemap.wikimedia.org&s=208.80.154.224 [14:54:11] 6operations, 10RESTBase, 10RESTBase-Cassandra: Cassandra inter-node encryption (TLS) - https://phabricator.wikimedia.org/T108953#1597071 (10fgiunchedi) >>! In T108953#1596768, @Eevans wrote: > FYI, we have T111113 as well. > > For this, I assume that we'll need another key per host (for the client), and PEM... [14:54:46] jynus: That last comment on the connection spike bug is a #TIL for you :) [14:55:22] Well, for anyone, but relevant because of the discussion there :) [14:59:08] I actually knew the functionality [14:59:46] but preferred a ticket in that case for a) avoid human error b) allow better forth and back [15:00:04] anomie ostriches thcipriani marktraceur Krenair: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150902T1500). [15:00:04] kart_ Glaisher: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [15:00:11] jynus: Fair enough :) [15:00:21] Here [15:00:28] I will be available during the process [15:00:32] andrewbogott, is_string( $projecttoken && !$forcerefresh ) [15:00:33] ? [15:00:36] 6operations, 7Monitoring: grafana.wikimedia.org calls out to AWS - https://phabricator.wikimedia.org/T110484#1597119 (10akosiaris) >>! In T110484#1597044, @greg wrote: >>>! In T110484#1596556, @fgiunchedi wrote: >> I don't see a way to disable this via config at least in our version in trebuchet :| > > Yeah,... 
[15:00:38] 10Ops-Access-Requests, 6operations, 7Icinga: give John Lewis permissions to send commands in icinga - https://phabricator.wikimedia.org/T105229#1597120 (10JohnLewis) Testing (elsewhere) and docs show that if a contact has 'can_submit_commands' set to 1 in their contact definition - they are allowed to submit... [15:00:51] "uncomfortable position" is a bit excessive [15:01:07] Didn't mean it to be, sry. [15:01:17] Krenair: For all those times you want to check if a boolean is a string :) [15:01:45] kart_, your changes are going through jenkins now [15:02:00] we do not reveal ips, session data, passwords... probably as a DBA I am so much accustomed to handle private data and several laws, etc :-) [15:02:09] Thanks Krenair [15:02:47] Glaisher, hey [15:02:55] Hi [15:03:03] jynus: IPs get copy+pasted accidentally all the time :p [15:03:12] Glaisher, gonna do yours while we wait for jenkins on kart_'s [15:03:21] * ostriches chuckles a bit [15:03:24] ok [15:04:12] it is true that I consider ips and session activity a level behind passwords and data deletion [15:04:13] (03PS2) 10Alex Monk: Enable WikidataPageBanner extension on Russian Wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/234942 (https://phabricator.wikimedia.org/T110837) (owner: 10Glaisher) [15:04:18] (03CR) 10Alex Monk: [C: 032] Enable WikidataPageBanner extension on Russian Wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/234942 (https://phabricator.wikimedia.org/T110837) (owner: 10Glaisher) [15:04:24] (03Merged) 10jenkins-bot: Enable WikidataPageBanner extension on Russian Wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/234942 (https://phabricator.wikimedia.org/T110837) (owner: 10Glaisher) [15:04:30] (03PS3) 10Alex Monk: Clean up WikidataPageBanner related config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/234944 (owner: 10Glaisher) [15:04:39] the latter, for example, do not even reach physically labs [15:04:44] 10Ops-Access-Requests, 
6operations, 7Icinga: give John Lewis permissions to send commands in icinga - https://phabricator.wikimedia.org/T105229#1597145 (10JohnLewis) >>! In T105229#1555857, @Dzahn wrote: > just wondering, let's say the subtask was resolved, for which services is John requesting permissions?... [15:04:48] I know :) [15:05:23] GlobalUserPage is still in extension-list-labs? [15:05:27] legoktm, ^ [15:05:31] (03CR) 10Alex Monk: [C: 032] Clean up WikidataPageBanner related config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/234944 (owner: 10Glaisher) [15:05:36] we should also not put them on phabricator at all because of potential security bugs [15:05:38] (03Merged) 10jenkins-bot: Clean up WikidataPageBanner related config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/234944 (owner: 10Glaisher) [15:05:56] actually, we should not copy-paste passwords, ever [15:06:08] Indeed! [15:06:15] That's what sticky notes are for! [15:06:18] ha ha [15:06:49] copy-stick. never copy-paste! [15:07:04] one thing I would like to do with moritzm one day (when we finish our current priority tasks == never) [15:07:17] Krenair: you’re right! One moment... [15:07:27] is audit more deeply our security [15:07:30] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/234942/ and https://gerrit.wikimedia.org/r/#/c/234944/ (duration: 00m 13s) [15:07:33] Glaisher, ^ [15:07:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:07:50] Krenair: looking [15:07:55] 6operations, 6Phabricator: Create an offboarding workflow with HR & Operations - https://phabricator.wikimedia.org/T108131#1597162 (10greg) >>! In T108131#1596491, @Aklapper wrote: > Afraid of sidetracking, but I'm wondering if/how to cover assigned tasks in ticketing systems. >>! In T108131#1596877, @Dzahn w... 
[15:08:26] JohnFLewis: My password store: https://static1.squarespace.com/static/5266e732e4b0eea0b509efa6/t/5318eac1e4b08801985b3e4a/1394141890071/Old+School+Password+Storage.png [15:08:49] ostriches: better than GPG? :) [15:08:55] Krenair: there, better? [15:08:59] Krenair: works [15:09:06] Glaisher, great [15:09:07] JohnFLewis: Completely obfuscated! [15:09:15] yeah, thanks [15:09:15] I hang a blank sticky in front of the one with the password on it [15:09:22] (So you can't see them!) [15:10:16] added security: use UV ink as well [15:10:17] 6operations, 10ops-codfw: Create shipment for eqord (router and gear) - https://phabricator.wikimedia.org/T109109#1597187 (10Papaul) 5Open>3Resolved This task is complete. I have the box in shipping. I am resolving this task. [15:10:19] 6operations, 10ops-codfw: EQDFW/EQORD Deployment Prep Task - https://phabricator.wikimedia.org/T91077#1597189 (10Papaul) [15:12:55] (03PS1) 10Alex Monk: Remove GlobalUserPage and ParsoidBatchAPI from extension-list-labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235475 [15:13:05] kart_, okay, looks like jenkins completed [15:14:39] !log krenair@tin Synchronized php-1.26wmf20/extensions/ContentTranslation/modules/tools/ext.cx.tools.template.js: https://gerrit.wikimedia.org/r/#/c/235441/ (duration: 00m 12s) [15:14:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:14:47] kart_, ^ [15:14:56] Testing.. [15:14:57] oh by the way, I noticed you marked the wmf21 commit as wmf20 on the calendar [15:15:09] oh [15:15:09] 6operations, 10ops-eqiad: Prepare shipping label for mx80 to eqord - https://phabricator.wikimedia.org/T109338#1597213 (10Papaul) The box is in shipping waiting for confirmation once the box gets picked up. it will be sometimes before noon that the UPS truck will pick up out going packages according to the sh... [15:16:12] copy-paste :) [15:16:16] fixing. 
[15:17:31] 6operations, 10RESTBase, 10RESTBase-Cassandra: Cassandra inter-node encryption (TLS) - https://phabricator.wikimedia.org/T108953#1597221 (10fgiunchedi) >>! In T108953#1596328, @fgiunchedi wrote: > so I don't think we'll be able to reuse puppet certs with multiple instances, thus we'll have to roll our own. s... [15:19:33] !log krenair@tin Synchronized php-1.26wmf21/extensions/ContentTranslation/modules/tools/ext.cx.tools.template.js: https://gerrit.wikimedia.org/r/#/c/235442/ (duration: 00m 12s) [15:19:35] kart_, ^ please test [15:19:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:20:58] Krenair: looks okay! [15:21:10] can only test in wmf20, that's fine :) [15:24:26] andrewbogott, okay, doing your change now [15:25:00] Krenair: thanks! Are you going to make the branch & subproject commits too? [15:25:22] I think gerrit is self-aware now and makes the submodule commits automatically [15:25:25] (mysteriously0 [15:25:26] ) [15:25:38] I would both like and not like that to be true [15:26:09] it does! 
i learned of this yesterday [15:26:36] It's been doing submodule updates for almost all extensions, skins and other submodules for a while [15:26:40] https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Updating_the_submodule [15:26:46] Unless your extension is called VisualEditor [15:26:48] (fukken magic) [15:28:06] (03PS1) 10Hashar: Merge tag '0.1.1' into debian [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/235480 [15:28:39] (03CR) 10Hashar: [C: 032 V: 032] Merge tag '0.1.1' into debian [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/235480 (owner: 10Hashar) [15:28:46] (03PS1) 10Alexandros Kosiaris: cassandra: storage_port is for cluster communication [puppet] - 10https://gerrit.wikimedia.org/r/235481 [15:31:36] (03PS6) 10Hashar: Support spaces in Gearman functions names [debs/nodepool] (patch-queue/debian) - 10https://gerrit.wikimedia.org/r/205564 [15:31:48] 6operations, 7Database: Grant puppet script access to "phabricator_project" DB - https://phabricator.wikimedia.org/T111200#1597292 (10Aklapper) 3NEW [15:31:52] (03CR) 10Hashar: "Merged upstream." [debs/nodepool] (patch-queue/debian) - 10https://gerrit.wikimedia.org/r/205564 (owner: 10Hashar) [15:32:00] 6operations, 7Database: Grant puppet script access to "phabricator_project" DB - https://phabricator.wikimedia.org/T111200#1597304 (10Aklapper) [15:32:00] (03PS6) 10Hashar: Stop all threads on SIGUSR1 [debs/nodepool] (patch-queue/debian) - 10https://gerrit.wikimedia.org/r/225410 [15:32:36] (03CR) 10Aklapper: "> yea, i think against "database" is good for grant requests" [puppet] - 10https://gerrit.wikimedia.org/r/233219 (https://phabricator.wikimedia.org/T85183) (owner: 10Aklapper) [15:33:32] andrewbogott, YuviPanda, MatmaRex: This one is a pain for other reasons... [15:33:56] sorry :( How so? [15:33:57] me? 
[15:33:59] !log krenair@tin Synchronized php-1.26wmf20/extensions/OpenStackManager/nova/OpenStackNovaController.php: https://gerrit.wikimedia.org/r/#/c/235479/ (duration: 00m 13s) [15:33:59] Is there time for another patch? Sorry for the late arrival [15:34:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:34:13] andrewbogott, please test ^ [15:34:24] ok, testing :) [15:34:32] MatmaRex, no, deploying that patch was a pain because git. [15:36:26] marktraceur, sure, what is it? [15:36:48] Krenair: We actually might have two, but James_F seems to be taking charge of the other one [15:36:51] I have https://gerrit.wikimedia.org/r/#/q/I2277dda9fdac248e16317ca0f1ec5d1357096cb3,n,z [15:36:58] https://gerrit.wikimedia.org/r/#/q/I56f0859abff6ef9c983351dc2546ad3d647eb000,n,z [15:37:18] mediawiki.ui.button dependency in MMV not expressed. [15:37:21] :o [15:38:17] Krenair: anything that that might have broken is still working, so I think all is well. Time will tell if that fixed the thing I’m trying to fix. Thanks for the last-minute merge! [15:38:43] I'll do the wmf21 one as well [15:39:00] (03PS3) 10EBernhardson: Enable experiment with experimental completion suggester [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235391 (https://phabricator.wikimedia.org/T111078) [15:41:36] 6operations, 7Database: Grant puppet script access to "phabricator_project" DB - https://phabricator.wikimedia.org/T111200#1597366 (10jcrespo) I am sorry, this is not 100% clear to me. :-) * Do you need to grant database access to puppet to do changes, so that puppet agent can change mysql things? or * Do you... 
[15:42:06] !log krenair@tin Synchronized php-1.26wmf21/extensions/OpenStackManager/nova/OpenStackNovaController.php: https://gerrit.wikimedia.org/r/#/c/235482/ (duration: 00m 12s) [15:42:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:42:17] andrewbogott, ^ [15:42:21] (03PS9) 10Filippo Giunchedi: cassandra: WIP support for multiple instances [puppet] - 10https://gerrit.wikimedia.org/r/231512 (https://phabricator.wikimedia.org/T95253) [15:42:32] (03CR) 10Filippo Giunchedi: cassandra: WIP support for multiple instances (036 comments) [puppet] - 10https://gerrit.wikimedia.org/r/231512 (https://phabricator.wikimedia.org/T95253) (owner: 10Filippo Giunchedi) [15:43:34] ottomata: so addshore (from WMDE) wants hadoop / hive access, is there a form telling him what to do? [15:43:54] YuviPanda: I guess https://wikitech.wikimedia.org/wiki/Requesting_shell_access [15:44:16] 6operations, 10RESTBase, 10RESTBase-Cassandra: Set up multi-DC replication for Cassandra - https://phabricator.wikimedia.org/T108613#1597378 (10Eevans) Considering that we have made little progress on this task, and that it's near crunch time, I propose the following: We focus on T108953, for the purposes o... [15:44:23] marktraceur, James_F: All 4 patches are going through jenkins [15:44:32] addshore: ah, hmm, yes, that might be enough [15:44:45] Krenair: splendid [15:45:01] YuviPanda: https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Access_Groups [15:45:09] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Access#CLI_Access [15:45:18] addshore: ^ [15:45:19] (03PS1) 10Hashar: Bump Nodepool 0.1.1 [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/235487 (https://phabricator.wikimedia.org/T107266) [15:45:31] *clicks and reads8 [15:45:33] YuviPanda, addshore, ottomata: reference https://phabricator.wikimedia.org/T106042 as well [15:46:03] hahah, nice timing [15:46:20] Hah! 
[15:47:42] 6operations, 7Database: Grant puppet script access to "phabricator_project" DB - https://phabricator.wikimedia.org/T111200#1597402 (10jcrespo) Reading the gerrit ticket, I think it is the second option (the first is not allowed). Which grants do you need, @Aklapper (read only?). [15:49:23] woah, who's editing wikiversions? [15:49:34] Nikerabbit, twentyafterfour? [15:49:37] 6operations, 7Database: Grant puppet script access to "phabricator_project" DB - https://phabricator.wikimedia.org/T111200#1597406 (10jcrespo) a:3jcrespo [15:49:50] 6operations, 7Database: Grant puppet script access to "phabricator_project" DB - https://phabricator.wikimedia.org/T111200#1597292 (10jcrespo) p:5Triage>3Normal [15:49:58] jdlrobson, around? [15:50:37] Krenair: yup [15:51:01] jdlrobson, shall I change the wmgMFMobileFormatterHeadings value to drop h1 on all wikivoyages? [15:51:02] Krenair: wut? [15:51:10] Krenair: that would be good [15:51:16] https://ru.m.wikivoyage.org/wiki/%D0%9F%D1%8F%D1%80%D0%BD%D1%83 is currently looking very broken [15:51:24] and if it's enabled everywhere there are gonna be cache issues for a while :/ [15:51:35] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: T110837 (duration: 00m 13s) [15:51:38] jdlrobson, try now [15:51:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:51:55] boom! 
perfect [15:52:00] order is restored thanks Krenair :) [15:52:08] will upload the patch to gerrit [15:52:44] Krenair: I looked to that file recently but did not edit, closed all connections to tin as well [15:53:18] (03PS1) 10Alex Monk: Set wmgMFMobileFormatterHeadings value to drop h1 on all wikivoyages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235489 (https://phabricator.wikimedia.org/T110837) [15:53:37] 6operations, 5Continuous-Integration-Scaling, 7Nodepool, 5Patch-For-Review: Bump our Nodepool Debian package to 0.1.1 - https://phabricator.wikimedia.org/T107266#1597415 (10hashar) ``` root@integration-slave-jessie-1001:~/nodepool(debianu+1)# GIT_PBUILDER_AUTOCONF=no DIST=jessie WIKIMEDIA=yes git-buildpack... [15:53:48] 6operations, 5Continuous-Integration-Scaling, 7Nodepool, 5Patch-For-Review: Bump our Nodepool Debian package to 0.1.1 - https://phabricator.wikimedia.org/T107266#1597416 (10hashar) p:5Triage>3High [15:53:51] (03CR) 10Alex Monk: [C: 032] "already in prod to unbreak mobile ruwikivoyage" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235489 (https://phabricator.wikimedia.org/T110837) (owner: 10Alex Monk) [15:53:57] (03Merged) 10jenkins-bot: Set wmgMFMobileFormatterHeadings value to drop h1 on all wikivoyages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235489 (https://phabricator.wikimedia.org/T110837) (owner: 10Alex Monk) [15:54:22] (03CR) 10Hashar: [C: 032 V: 032] "Uploaded to:" [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/235487 (https://phabricator.wikimedia.org/T107266) (owner: 10Hashar) [15:55:54] 7Blocked-on-Operations, 6operations, 5Continuous-Integration-Scaling, 7Nodepool: Upload nodepool_0.1.1-wmf1 package to apt.wikimedia.org to `jessie-wikimedia/thirdparty` - https://phabricator.wikimedia.org/T111203#1597647 (10hashar) 3NEW [15:56:39] !log krenair@tin Synchronized php-1.26wmf20/extensions/UploadWizard/resources/mw.UploadWizardUploadInterface.js: https://gerrit.wikimedia.org/r/#/c/235485/ 
(duration: 00m 12s) [15:56:41] marktraceur, ^ please test [15:56:44] Woot [15:56:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:57:01] 10Ops-Access-Requests, 6operations: Requesting access to hadoop / hive (analytics-privatedata-users) for Addshore - https://phabricator.wikimedia.org/T111204#1597666 (10Addshore) 3NEW [15:57:16] YuviPanda: ^^ ;) think I did everything [15:57:28] heh, thanks for pointing that out to me too! :D [15:58:05] Krenair: Looks fine to me, I'm getting MatmaRex to test if IE11 is fixed [15:58:09] !log krenair@tin Synchronized php-1.26wmf20/extensions/MultimediaViewer/MultimediaViewer.php: https://gerrit.wikimedia.org/r/#/c/235483/ (duration: 00m 13s) [15:58:11] James_F: ^ please test [15:58:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:58:47] addshore: :D [15:59:30] 6operations, 10RESTBase, 10RESTBase-Cassandra: Set up multi-DC replication for Cassandra - https://phabricator.wikimedia.org/T108613#1597681 (10GWicke) > Ensure that eqiad RESTBase clients do not auto-discover codfw nodes Afaik they will discover those nodes. The default DCAwareRoundRobinPolicy behavior is... [15:59:56] Krenair: marktraceur: IE11 fixed, as expected [16:00:13] Sweet! [16:01:49] !log krenair@tin Synchronized php-1.26wmf21/extensions/UploadWizard/resources/mw.UploadWizardUploadInterface.js: https://gerrit.wikimedia.org/r/#/c/235486/ (duration: 00m 12s) [16:01:52] marktraceur, MatmaRex: ^ [16:01:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:02:59] !log krenair@tin Synchronized php-1.26wmf21/extensions/MultimediaViewer/MultimediaViewer.php: https://gerrit.wikimedia.org/r/#/c/235484/ (duration: 00m 12s) [16:03:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:03:37] Krenair: actually, there's no UploadWizard on any wmf21 wikis, afaik. 
so i can't test that :) [16:03:42] MatmaRex: https://test2.wikipedia.org/wiki/Special:UploadWizard [16:03:42] heh, okay [16:04:03] Works, waiting on IE11 confirmation [16:04:12] oh, we have it on test2? okay. looking [16:04:40] sure works. [16:05:19] Coolio, thanks MatmaRex [16:05:23] ytmnd [16:05:27] could you guys also test the MMV change please? [16:06:08] Looks fine to me [16:06:36] Thanks Krenair [16:07:09] ty [16:12:29] 6operations, 7Mail: Please add kharold@wikimedia.org to grants alies - https://phabricator.wikimedia.org/T111125#1597732 (10Dzahn) per https://meta.wikimedia.org/wiki/User:KHarold_%28WMF%29 -> "My background is in grantmaking" .. "working with the Grantmaking, Learning & Evaluation, and Education teams" [16:12:52] !log setting BBU auto-learn mode to warn only (disabled if not possible) on all database hosts [16:12:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:14:48] (03Abandoned) 10Jcrespo: Create new module for managing RAID settings [puppet] - 10https://gerrit.wikimedia.org/r/212027 (https://phabricator.wikimedia.org/T84178) (owner: 10Jcrespo) [16:15:01] Krenair: Sorry, never mentioned that I tested the MediaViewer thing and it worked fine. [16:15:15] (Yay meetings.) [16:16:11] RECOVERY - BGP status on cr2-eqiad is OK: OK: host 208.80.154.197, sessions up: 74, down: 0, shutdown: 0 [16:18:42] 6operations, 7Database, 5Patch-For-Review: investigate RAID BBU auto-learn on db hosts - https://phabricator.wikimedia.org/T84178#1597769 (10jcrespo) 5Open>3Resolved All database hosts that allow "Warn only" have been setup as such. On the few that didn't, it has been disabled. This is the script that ha... 
[16:18:44] 6operations, 7Database, 5Patch-For-Review: investigate RAID BBU auto-learn on db hosts - https://phabricator.wikimedia.org/T84178#1597773 (10jcrespo) [16:22:45] 6operations: labstore monitoring: NRPE: Command 'check_cleanup-snapshots-labstore-state' not defined - https://phabricator.wikimedia.org/T111211#1597811 (10Dzahn) 3NEW [16:23:12] 6operations, 6Labs, 10Labs-Infrastructure, 7Monitoring: labstore monitoring: NRPE: Command 'check_cleanup-snapshots-labstore-state' not defined - https://phabricator.wikimedia.org/T111211#1597821 (10Dzahn) [16:23:43] 6operations, 10RESTBase, 10RESTBase-Cassandra: Set up multi-DC replication for Cassandra - https://phabricator.wikimedia.org/T108613#1597831 (10Eevans) >>! In T108613#1597681, @GWicke wrote: >> Ensure that eqiad RESTBase clients do not auto-discover codfw nodes > > Afaik they will discover those nodes. The... [16:23:43] ACKNOWLEDGEMENT - Last cleanup of snapshots in the labstore vg on labstore1001 is CRITICAL: NRPE: Command check_cleanup-snapshots-labstore-state not defined daniel_zahn https://phabricator.wikimedia.org/T111211 [16:25:14] !log restarting NTP on lvs2004 [16:25:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:25:31] (i see in SAL this happened before and it was CRIT) [16:26:14] ori: should the keyholder on mira be armed or not armed unless it is actually used for deployment [16:30:05] Krenair: that would've been me [16:30:53] Krenair: fixed [16:31:08] twentyafterfour@tin:/srv/mediawiki-staging$ git reset --hard [16:34:56] ACKNOWLEDGEMENT - NTP on lvs2004 is CRITICAL: NTP CRITICAL: No response from NTP server daniel_zahn https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=795315 [16:36:46] 6operations, 7Mail: Please add kharold@wikimedia.org to grants alies - https://phabricator.wikimedia.org/T111125#1597919 (10Dzahn) 5Open>3Resolved a:3Dzahn Hi Emerauld, Alex, this is done. 
The list of people on the grants@ alias is now: awang, jtud, kharold Best regards, Daniel [16:38:20] mutante: um, dunno. question for other ops, i guess. [16:39:36] (03PS1) 10Yuvipanda: labstore: Do cleanups of snapshots only on active labstore [puppet] - 10https://gerrit.wikimedia.org/r/235500 [16:39:47] 6operations, 7Database: Grant puppet script access to "phabricator_project" DB - https://phabricator.wikimedia.org/T111200#1597942 (10Aklapper) The queries are SELECT ones so read-only please. :) [16:42:42] ori: well, it was armed before and now it's not anymore since your work on the keyholder [16:44:40] would you like me to arm it? [16:44:50] i don't think i know the passphrase [16:44:56] even though i did it one time before [16:45:00] so, yes [16:45:05] you created the passphrase! :P [16:45:12] it's on iron, mediawiki-deployment-key or whatever [16:45:26] ok [16:45:34] but i'll do it, no worries [16:45:53] i got it [16:45:58] also https://wikitech.wikimedia.org/wiki/Keyholder [16:46:05] done [16:46:17] icinga-wm: tell us [16:47:01] * ori jabs icinga-wm [16:48:08] reschedules the next check .. hrmm [16:48:10] 7Puppet, 10Continuous-Integration-Config, 6Scrum-of-Scrums, 5Patch-For-Review: Setup rubocop for operations/puppet ruby code lints - https://phabricator.wikimedia.org/T102020#1598011 (10dduvall) >>! In T102020#1353839, @EBernhardson wrote: > We would likely have to setup some reduced rule sets though, rubo... [16:48:11] RECOVERY - Keyholder SSH agent on mira is OK: OK: Keyholder is armed with all configured keys. 
[16:48:20] ok :) [16:54:57] 6operations, 10Wikimedia-General-or-Unknown, 7Database, 7Performance: ishmael shows blank graphs - https://phabricator.wikimedia.org/T66581#1598091 (10Krenair) [16:58:08] 6operations, 7Mail: Remove Alias for sj@wm.o - https://phabricator.wikimedia.org/T108276#1598109 (10Dzahn) a:5MoritzMuehlenhoff>3RobH [17:00:35] (03PS1) 10Alexandros Kosiaris: maps: Improve water_polygons population [puppet] - 10https://gerrit.wikimedia.org/r/235509 [17:01:41] (03CR) 10Alexandros Kosiaris: "Changed the way this works in https://gerrit.wikimedia.org/r/#/c/235509/" [puppet] - 10https://gerrit.wikimedia.org/r/232728 (https://phabricator.wikimedia.org/T109710) (owner: 10Yurik) [17:01:57] (03PS2) 10Alexandros Kosiaris: maps: Improve water_polygons population [puppet] - 10https://gerrit.wikimedia.org/r/235509 (https://phabricator.wikimedia.org/T109710) [17:02:05] (03CR) 10Alexandros Kosiaris: [C: 04-1] Maps: Add geo-index to the water_polygons table [puppet] - 10https://gerrit.wikimedia.org/r/232728 (https://phabricator.wikimedia.org/T109710) (owner: 10Yurik) [17:02:31] (03CR) 10Alexandros Kosiaris: [C: 04-2] "actually -2 is correct here, since I prefer the approach in https://gerrit.wikimedia.org/r/#/c/235509/" [puppet] - 10https://gerrit.wikimedia.org/r/232728 (https://phabricator.wikimedia.org/T109710) (owner: 10Yurik) [17:05:33] 7Puppet, 10Continuous-Integration-Config, 6Scrum-of-Scrums, 5Patch-For-Review: Setup rubocop for operations/puppet ruby code lints - https://phabricator.wikimedia.org/T102020#1598160 (10dduvall) >>! In T102020#1592844, @zeljkofilipin wrote: > The question is: which folders contain upstream code and hence s... [17:06:40] 6operations, 6Labs, 10wikitech.wikimedia.org: Determine whether wikitech should really depend on production search cluster - https://phabricator.wikimedia.org/T110987#1598167 (10Andrew) I know nothing about implementation, but I'd certainly prefer that search be self-contained on silver.
[17:07:04] !log ori@tin Synchronized php-1.26wmf20/extensions/UniversalLanguageSelector: 2154acc529: Updated mediawiki/core Project: mediawiki/extensions/UniversalLanguageSelector (duration: 00m 13s) [17:07:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:07:49] !log ori@tin Synchronized php-1.26wmf21/extensions/UniversalLanguageSelector: 78a5908fd9: Updated mediawiki/core Project: mediawiki/extensions/UniversalLanguageSelector (duration: 00m 16s) [17:07:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:08:53] 6operations, 10ops-ulsfo: troubleshoot ulsfo side of IC-313592 - https://phabricator.wikimedia.org/T111101#1598196 (10RobH) I've swapped out the patch cable, and the link light is again on. However, since that doesn't mean anything, I've left a voicemail with Michelle @ Telia to call me back and please retest... [17:10:57] (03CR) 10MaxSem: maps: Improve water_polygons population (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/235509 (https://phabricator.wikimedia.org/T109710) (owner: 10Alexandros Kosiaris) [17:11:23] 6operations, 10RESTBase, 10hardware-requests: Expand RESTBase cluster capacity - https://phabricator.wikimedia.org/T93790#1598215 (10RobH) This order is estimated to ship out on 9/9 with a delivery date of 9/15. [17:14:41] (03PS3) 10Alexandros Kosiaris: maps: Improve water_polygons population [puppet] - 10https://gerrit.wikimedia.org/r/235509 (https://phabricator.wikimedia.org/T109710) [17:19:52] PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-1/2/0: down - Core: cr1-eqord:xe-0/0/1 Telia (IC-313592) {#1501} [10Gbps DWDM]BR [17:24:15] 6operations, 10Wikimedia-Mailing-lists: wikinews-l: no active listadmin - https://phabricator.wikimedia.org/T110956#1598282 (10Dzahn) done. added revi and koavf to admins. 
Wikinews-l list run by justinkoavf at gmail.com, revi at fastlizard4.org per: https://commons.wikimedia.org/wiki/File:Justin_Anthony_Knap... [17:24:28] duh [17:27:54] mutante: you didn't send the password yet, right? (ping me when you do it so I can manually refresh inbox) [17:28:38] 6operations, 10Wikimedia-Mailing-lists: wikinews-l: no active listadmin - https://phabricator.wikimedia.org/T110956#1598317 (10Dzahn) @revi ``` -----BEGIN PGP MESSAGE----- Version: GnuPG v1 hQIMA2UHtI9tcuAyAQ//eU7Ffp5pz9rtyEmVyXKwpTBJSATdkJtWFg7LIbc7+YXK Zv0w3rcXXkjWb+2D7gABKiHhiHM7sW27wNopx4R3m9yxxWeV2V/ke... [17:29:04] revi: https://phabricator.wikimedia.org/T110956#1598317 [17:32:38] 6operations, 10Wikimedia-Mailing-lists: wikinews-l: no active listadmin - https://phabricator.wikimedia.org/T110956#1598324 (10Dzahn) 5Open>3Resolved I sent it to Justin too. I moved the former admin to the moderator field so the email is still there if they want to be reactivated. [17:32:39] 6operations, 10Wikimedia-Mailing-lists: Evaluate lists with large moderation queues - https://phabricator.wikimedia.org/T110438#1598326 (10Dzahn) [17:39:45] uhm big read "Authorization failed" [17:39:53] s/read/red [17:41:19] 6operations: Change distribution in releases.wikimedia.org to "sid" or "jessie" - https://phabricator.wikimedia.org/T111225#1598395 (10GWicke) [17:42:18] 7Puppet, 10Continuous-Integration-Config, 6Scrum-of-Scrums, 5Patch-For-Review: Setup rubocop for operations/puppet ruby code lints - https://phabricator.wikimedia.org/T102020#1598399 (10JanZerebecki) Shouldn't the puppet modules that are imported be changed to submodules? [17:45:52] mutante: ^^ wikinews-l password fails apparently [17:47:51] revi: try now [17:48:17] (03PS2) 10Yuvipanda: Tools: Replace reference to tools. in class toollabs [puppet] - 10https://gerrit.wikimedia.org/r/234688 (https://phabricator.wikimedia.org/T87387) (owner: 10Tim Landscheidt) [17:48:27] (03CR) 10Yuvipanda: [C: 032 V: 032] Tools: Replace reference to tools. 
in class toollabs [puppet] - 10https://gerrit.wikimedia.org/r/234688 (https://phabricator.wikimedia.org/T87387) (owner: 10Tim Landscheidt) [17:48:50] (03PS2) 10Yuvipanda: Tools: Replace reference to tools. in class toollabs::checker [puppet] - 10https://gerrit.wikimedia.org/r/234689 (https://phabricator.wikimedia.org/T87387) (owner: 10Tim Landscheidt) [17:48:59] (03CR) 10Yuvipanda: [C: 032 V: 032] Tools: Replace reference to tools. in class toollabs::checker [puppet] - 10https://gerrit.wikimedia.org/r/234689 (https://phabricator.wikimedia.org/T87387) (owner: 10Tim Landscheidt) [17:49:06] (03PS2) 10Yuvipanda: Tools: Replace reference to tools. in project-make-access [puppet] - 10https://gerrit.wikimedia.org/r/234690 (https://phabricator.wikimedia.org/T87387) (owner: 10Tim Landscheidt) [17:49:30] {{done}} [17:49:37] works, mutante. Thanks! [17:50:19] !log krenair@tin Synchronized php-1.26wmf21/extensions/VisualEditor/modules/ve-mw/ui/inspectors: https://gerrit.wikimedia.org/r/#/c/235511/ (duration: 00m 12s) [17:50:19] (03CR) 10Yuvipanda: [C: 032 V: 032] Tools: Replace reference to tools. in project-make-access [puppet] - 10https://gerrit.wikimedia.org/r/234690 (https://phabricator.wikimedia.org/T87387) (owner: 10Tim Landscheidt) [17:50:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:50:26] Thanks Krenair. [17:50:41] (03PS2) 10Yuvipanda: Tools: Replace reference to tools. in motd-tips.sh [puppet] - 10https://gerrit.wikimedia.org/r/234692 (https://phabricator.wikimedia.org/T87387) (owner: 10Tim Landscheidt) [17:50:47] (03CR) 10Yuvipanda: [C: 032 V: 032] Tools: Replace reference to tools. in motd-tips.sh [puppet] - 10https://gerrit.wikimedia.org/r/234692 (https://phabricator.wikimedia.org/T87387) (owner: 10Tim Landscheidt) [17:50:58] (03PS2) 10Yuvipanda: Tools: Replace reference to tools. 
in tool-uwsgi-python [puppet] - 10https://gerrit.wikimedia.org/r/234697 (https://phabricator.wikimedia.org/T87387) (owner: 10Tim Landscheidt) [17:51:16] (03CR) 10Yuvipanda: [C: 032 V: 032] Tools: Replace reference to tools. in tool-uwsgi-python [puppet] - 10https://gerrit.wikimedia.org/r/234697 (https://phabricator.wikimedia.org/T87387) (owner: 10Tim Landscheidt) [17:51:42] PROBLEM - torrus.wikimedia.org HTTP on netmon1001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string Torrus Top: Wikimedia not found on http://torrus.wikimedia.org:80/torrus - 838 bytes in 0.277 second response time [17:51:47] 6operations, 10Wikimedia-Mailing-lists: wikinews-l: no active listadmin - https://phabricator.wikimedia.org/T110956#1598433 (10Revi) #verified I can login to admin panel. Well, really big modqueue.... [17:53:13] (03CR) 10Alex Monk: [C: 032] Remove GlobalUserPage and ParsoidBatchAPI from extension-list-labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235475 (owner: 10Alex Monk) [17:53:21] (03Merged) 10jenkins-bot: Remove GlobalUserPage and ParsoidBatchAPI from extension-list-labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235475 (owner: 10Alex Monk) [17:53:42] RECOVERY - torrus.wikimedia.org HTTP on netmon1001 is OK: HTTP OK: HTTP/1.1 200 OK - 2166 bytes in 0.363 second response time [17:53:52] 6operations, 10Wikimedia-Mailing-lists: wikinews-l: no active listadmin - https://phabricator.wikimedia.org/T110956#1598440 (10Dzahn) >>! In T110956#1598433, @Revi wrote: > #verified I can login to admin panel. > > Well, really big modqueue.... yea, that started the whole ticket :) We already deleted everyt... [17:55:11] and.... reject not discard educates the spamassasin right? [17:55:46] and... no I can handle tue queu. [17:55:48] queue* [17:56:57] spamassassin learns? 
:o [17:58:08] it can [17:58:17] 6operations, 10Wikimedia-Mailing-lists: wikinews-l: no active listadmin - https://phabricator.wikimedia.org/T110956#1598453 (10Koavf) I logged in now. If we need another mass delete, we can always request it here. I'll poke through and see how it looks for now. [17:59:29] 6operations, 10Wikimedia-Mailing-lists: wikinews-l: no active listadmin - https://phabricator.wikimedia.org/T110956#1598458 (10Dzahn) :) great, thanks both of you for volunteering [18:00:04] twentyafterfour: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150902T1800). [18:00:10] (03PS1) 10Sbisson: Enable Flow beta feature on beta lab [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235516 [18:00:16] (03PS2) 10Yuvipanda: labstore: Do cleanups of snapshots only on active labstore [puppet] - 10https://gerrit.wikimedia.org/r/235500 [18:00:23] (03CR) 10Yuvipanda: [C: 032 V: 032] labstore: Do cleanups of snapshots only on active labstore [puppet] - 10https://gerrit.wikimedia.org/r/235500 (owner: 10Yuvipanda) [18:02:19] (03PS2) 10Yuvipanda: Tools: Fix quoting in sql script [puppet] - 10https://gerrit.wikimedia.org/r/235378 (https://phabricator.wikimedia.org/T75595) (owner: 10Tim Landscheidt) [18:02:31] (03CR) 10Yuvipanda: [C: 032 V: 032] Tools: Fix quoting in sql script [puppet] - 10https://gerrit.wikimedia.org/r/235378 (https://phabricator.wikimedia.org/T75595) (owner: 10Tim Landscheidt) [18:02:33] (03CR) 10Catrope: [C: 031] Enable Flow beta feature on beta lab [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235516 (owner: 10Sbisson) [18:02:51] Nemo_bis, around? [18:02:54] wondering about https://phabricator.wikimedia.org/T111079 [18:03:05] I seem to recall some local policy needs to be in place for these? [18:03:12] (03CR) 10Jforrester: [C: 031] "Confirming as (former?) custodian of the Beta Features gateway that I'm OK with this." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/235516 (owner: 10Sbisson) [18:03:12] PROBLEM - Host mw2027 is DOWN: PING CRITICAL - Packet loss = 100% [18:04:32] RECOVERY - Host mw2027 is UP: PING WARNING - Packet loss = 61%, RTA = 52.26 ms [18:07:09] Krenair: commented [18:11:35] (03PS2) 10Filippo Giunchedi: cassandra: storage_port is for cluster communication [puppet] - 10https://gerrit.wikimedia.org/r/235481 (owner: 10Alexandros Kosiaris) [18:11:40] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] cassandra: storage_port is for cluster communication [puppet] - 10https://gerrit.wikimedia.org/r/235481 (owner: 10Alexandros Kosiaris) [18:12:38] lesson from being listadmin for 10 mins: fuck new gTLD# [18:12:44] *gTLDs [18:13:11] haha [18:13:17] !log bouncing Cassandra on restbase1001 to apply temporary GC settings [18:13:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:13:36] 15 spams in 10 mins, all from new gTLDs like .win, .review, .faith... [18:15:23] PROBLEM - Host mr1-ulsfo is DOWN: PING CRITICAL - Packet loss = 100% [18:15:31] (03PS1) 10Yuvipanda: labstore: Make NFS unavailability paging [puppet] - 10https://gerrit.wikimedia.org/r/235521 [18:15:40] (03CR) 10jenkins-bot: [V: 04-1] labstore: Make NFS unavailability paging [puppet] - 10https://gerrit.wikimedia.org/r/235521 (owner: 10Yuvipanda) [18:15:42] mutante: ^ paging, wanna give it a quick look? [18:15:56] (03PS2) 10Yuvipanda: labstore: Make NFS unavailability paging [puppet] - 10https://gerrit.wikimedia.org/r/235521 [18:15:57] (just needed a rebase) [18:16:32] robh: who should i poke for a tiny content mistake in policy.wikimedia.org [18:17:51] matanya: someone who remembers why Wikipedia is a wiki? 
[18:18:07] heh [18:18:37] it is hosted by wordpress, and there is no edit button :) [18:19:41] PROBLEM - Cassandra database on restbase1001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 111 (cassandra), command name java, args CassandraDaemon [18:20:52] RECOVERY - Host mr1-ulsfo is UP: PING OK - Packet loss = 0%, RTA = 75.03 ms [18:21:54] YuviPanda: how about first adding it, running Icinga and then when it works, add only the critical => true line [18:22:31] matanya: privacy@wikimedia.org :p really [18:22:38] it says so [18:22:41] mutante: it's already there and works - I just moved it so it's there in one place rather than 3 [18:22:55] mutante: that would be a classic but old school :) [18:23:11] PROBLEM - Host mr1-codfw is DOWN: PING CRITICAL - Packet loss = 100% [18:23:43] matanya: "To discuss or help translate this page visit the public policy discussion group.":p [18:23:54] https://meta.wikimedia.org/wiki/Public_policy [18:24:56] YuviPanda: you moved it out of monitoring.pp though [18:24:56] mutante: so if you see icinga, it's there for labstore1002, 3, and 2001 :) [18:25:04] mutante: yes, because that's generic for all 3 labstores [18:25:21] mutante: while this affects only the active labstore [18:26:21] RECOVERY - Host mr1-codfw is UP: PING OK - Packet loss = 0%, RTA = 53.01 ms [18:26:22] (03CR) 10Dzahn: [C: 031] "this would make it page ops, yes" [puppet] - 10https://gerrit.wikimedia.org/r/235521 (owner: 10Yuvipanda) [18:26:49] YuviPanda: so it would work. 
my only concern would be if we also know what we are supposed to do when getting that [18:26:49] (03CR) 10Yuvipanda: [C: 032] labstore: Make NFS unavailability paging [puppet] - 10https://gerrit.wikimedia.org/r/235521 (owner: 10Yuvipanda) [18:27:20] mutante: yeah, https://phabricator.wikimedia.org/T88723 [18:29:26] ok time to deploy some trains [18:32:05] !log replacing disk 10 on db1028 [18:32:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:34:26] 6operations, 10ops-eqiad, 7Database: Disk issue on db1028 - https://phabricator.wikimedia.org/T103230#1598589 (10Cmjohnson) swapped disk with one from a recently decom'd server....in rebuild state nclosure Device ID: 32 Slot Number: 10 Drive's position: DiskGroup: 0, Span: 5, Arm: 1 Enclosure position: N/A... [18:34:33] (03PS1) 1020after4: group1 wikis to 1.26wmf21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235528 [18:35:54] (03CR) 1020after4: [C: 032] group1 wikis to 1.26wmf21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235528 (owner: 1020after4) [18:36:01] (03Merged) 10jenkins-bot: group1 wikis to 1.26wmf21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235528 (owner: 1020after4) [18:36:33] !log twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf21 [18:36:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:36:42] PROBLEM - RAID on db1028 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) [18:37:33] uh oh [18:38:38] cmjohnson1: ^ [18:41:21] ori: yep it's going to show that way until the rebuild [18:41:22] is over [18:48:23] 6operations, 10ops-eqiad: Change racktables entries for renamed analytics -> kafka names - https://phabricator.wikimedia.org/T109856#1598622 (10Cmjohnson) 5Open>3Resolved Finished [18:52:04] !log deployed patch for T110553 [18:52:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:53:23] 6operations, 
10ops-ulsfo: troubleshoot ulsfo side of IC-313592 - https://phabricator.wikimedia.org/T111101#1598635 (10RobH) We swapped out the patch, but since Michelle didn't call me back in the hour I sat down at ulsfo, we installed a loopback instead. Since then I've opened a ticket with UnitedLayer who is... [19:00:06] 6operations, 7Mail: Upgrade Exim to >=4.73 - https://phabricator.wikimedia.org/T83541#1598659 (10Dzahn) a:3Dzahn [19:01:50] !log bouncing Cassandra on restbase1001 to address bogus icinga process failure alert [19:01:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:02:54] RECOVERY - Cassandra database on restbase1001 is OK: PROCS OK: 1 process with UID = 111 (cassandra), command name java, args CassandraDaemon [19:04:34] PROBLEM - Host mr1-ulsfo is DOWN: PING CRITICAL - Packet loss = 100% [19:04:34] PROBLEM - Host mr1-codfw is DOWN: PING CRITICAL - Packet loss = 100% [19:04:58] 6operations, 10ops-eqiad, 7network: investigate ethernet errors: asw2-a5-eqiad port xe-0/0/36 - https://phabricator.wikimedia.org/T107635#1598670 (10Cmjohnson) swapped both sfp's and the fiber [19:07:17] !log upgrading mr1-codfw, mr1-ulsfo to newer junos [19:07:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:07:55] RECOVERY - Host mr1-codfw is UP: PING OK - Packet loss = 0%, RTA = 52.29 ms [19:08:35] RECOVERY - Host mr1-ulsfo is UP: PING OK - Packet loss = 0%, RTA = 74.12 ms [19:09:02] !log restarted gitblit, stopped counting [19:09:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:17:50] 6operations, 10ops-eqiad, 7network: investigate ethernet errors: asw2-a5-eqiad port xe-0/0/36 - https://phabricator.wikimedia.org/T107635#1598709 (10Cmjohnson) 5Open>3Resolved a:3Cmjohnson new fiber # is 3908....@faidon verified all looks good in IRC [19:22:48] 6operations, 10Salt: on bootup, salt-minion should not start with -d - 
https://phabricator.wikimedia.org/T104867#1598732 (10ArielGlenn) [19:22:54] RECOVERY - RAID on db1028 is OK: OK: optimal, 1 logical, 2 physical [19:23:16] 6operations, 10Deployment-Systems, 10Salt, 5Patch-For-Review: [Trebuchet] Salt times out on parsoid restarts - https://phabricator.wikimedia.org/T63882#1598735 (10ArielGlenn) [19:24:18] 6operations, 10Salt: salt broken after the upgrade - https://phabricator.wikimedia.org/T100502#1598739 (10ArielGlenn) [19:25:15] 6operations, 10Salt: various salt-minions are not replying to test.ping or commands - https://phabricator.wikimedia.org/T102808#1598749 (10ArielGlenn) a:3ArielGlenn [19:26:00] 6operations, 6Labs, 10Salt: salt does not run reliably for toollabs - https://phabricator.wikimedia.org/T99213#1598755 (10ArielGlenn) [19:26:17] 6operations, 10Salt: salt-minion dies if /var is full - https://phabricator.wikimedia.org/T104866#1598758 (10ArielGlenn) [19:27:25] 7Puppet, 6operations, 10Deployment-Systems, 10Salt, 10Staging: provider => trebuchet doesn't work until manual 'git deploy start' on deployment-server - https://phabricator.wikimedia.org/T92978#1598761 (10ArielGlenn) [19:29:05] (03PS1) 10Ori.livneh: GeoIP: specify lat/lon to two decimal places [puppet] - 10https://gerrit.wikimedia.org/r/235543 [19:29:14] bblack: ^ easy-peasy [19:31:55] 6operations, 10Wikimedia-Mailing-lists: import old staff list archives ? - https://phabricator.wikimedia.org/T109395#1598788 (10Dzahn) which current list is the continuation of the old staff list? wmfall@? wmfreqs@? [19:36:55] Hi, I'm having trouble sshing into tin. Haven't changed my setup since last time... https://tools.wmflabs.org/paste/view/fbcd32b6 The file .ssh/cluster and .ssh/cluster.pub does exist, but I have no .ssh/cluster-cert [19:37:43] (03PS9) 10Thcipriani: Add service deploy via scap [tools/scap] - 10https://gerrit.wikimedia.org/r/224374 [19:47:21] greg-g: hi! sorry for the bother, whom to I ping if I can't ssh into tin for some reason (see above ^) ? 
[19:47:48] AndyRussG: can you post your ssh config? [19:48:21] and perhaps ssh with -vvv so its more verbose? :) [19:49:44] -vvv gives the same, nearly: https://tools.wmflabs.org/paste/view/c3fbf9b9 [19:49:50] it's the right identify file [19:50:06] heh I missed that completely actually :) [19:50:19] The file exists, too [19:50:58] 6operations, 10ops-eqiad: Prepare shipping label for mx80 to eqord - https://phabricator.wikimedia.org/T109338#1599038 (10Cmjohnson) a:5Papaul>3Cmjohnson 1ZA19A020291908114 is the tracking number....according to @papaul in IRC this shipped at 245pm CST. Reassigning to myself to receive in at eqord. [19:51:29] Ah hmm sshing to bastion w/ vvv gives more info.... [19:52:28] 10Ops-Access-Requests, 6operations, 3Discovery-Wikidata-Query-Service-Sprint: Get smalyshev permissions to icinga enough to control monitoring for wdqs_eqiad group - https://phabricator.wikimedia.org/T111243#1599052 (10Smalyshev) 3NEW [19:52:50] 6operations, 10Wikimedia-Mailing-lists: import old staff list archives ? - https://phabricator.wikimedia.org/T109395#1599059 (10brion) I'd like to have these accessible; list archives provide records of decisions made, why we made them, etc. Not everybody is as good about keeping custom archives going back the... [19:52:59] K I think I got it, looks like it's using the wrong id file for bastion [19:53:02] 10Ops-Access-Requests, 6operations, 3Discovery-Wikidata-Query-Service-Sprint: Get smalyshev permissions to icinga enough to control monitoring for wdqs_eqiad group - https://phabricator.wikimedia.org/T111243#1599061 (10Smalyshev) [19:53:40] Hm.. mediawiki.org is still on 1.26wmf20? [19:53:51] 18:36 logmsgbot: twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf21 [19:54:40] Krinkle: known [19:54:51] https://phabricator.wikimedia.org/T111237 [19:55:11] AndyRussG: in general, here / opsen [19:55:13] JohnFLewis: (greg-g) solved!! many thanks :) [19:55:17] coolio! 
[19:55:45] :D [19:55:52] greg-g: and? [19:56:15] robh: hi, can you look at https://phabricator.wikimedia.org/T105229#1597120 please? added bonus: will solve https://phabricator.wikimedia.org/T111243#1599061 as well :) [19:56:25] It seems we want ahead with the next group still? [19:56:35] ^ that sounds good indeed, like icinga already has the feature we wanted it tohave [19:56:54] if somebody is a contact they can run commands on "their" service [19:57:25] where is it defined who owns what services? [19:57:33] 10Ops-Access-Requests, 6operations, 3Discovery-Wikidata-Query-Service-Sprint, 7Icinga: Get smalyshev permissions to icinga enough to control monitoring for wdqs_eqiad group - https://phabricator.wikimedia.org/T111243#1599069 (10Dzahn) [19:57:55] I thought the issue was we don't have that kind of service grouping in icinga, and thats what had to be done first [19:58:03] that question is a trigger word for mass explosions robh [19:58:09] 6operations, 10Wikimedia-Mailing-lists: import old staff list archives ? - https://phabricator.wikimedia.org/T109395#1599071 (10Jalexander) >>! In T109395#1598788, @Dzahn wrote: > which current list is the continuation of the old staff list? wmfall@? wmfreqs@? officially neither... but wmfreqs is probably... [19:59:24] I mean, we just gave in and gave all services permissions for all of icinga due to that (with the stern instruction of 'if you use this improperly you'll lose it' [19:59:26] ) [20:00:00] greg-g: Ah, wrong time zone. I was under the impression that task was filed 12 hours ago [20:00:04] gwicke cscott arlolra subbu: Respected human, time to deploy Services – Parsoid / OCG / Citoid / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150902T2000). Please do the needful. [20:00:26] Phab doesn't display the interface timezone anywhere (not even ISO on hover). Fixed. [20:00:30] robh: they can do it for what they're a contact of [20:00:55] e.g. 
if I'm a contact for fermium, I can send commands to the icinga monitoring of fermium with that enabled [20:01:55] (03CR) 10Freephile: [C: 031] Enable Flow beta feature on beta lab [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235516 (owner: 10Sbisson) [20:02:03] huh... tested in labs already? (not saying you have to i can as well) [20:02:03] !log krinkle@tin Synchronized php-1.26wmf21/resources/src/startup.js: Ie65427caee (duration: 00m 12s) [20:02:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:02:12] or we can live hack some testing but just wondering [20:02:27] i see 'tested elsewhere' just wondering where elsewhere is [20:02:46] * robh is also cooking his lunch so won't merge right this second but is reading the task and is willing to give it a shot a bit later [20:03:34] robh: on a separate project of mine [20:03:58] icinga versions are different but history shows the function was added before 1.6.1 and never modified since [20:04:42] it seems easy enough to test out [20:05:06] (03PS1) 1020after4: group0 wikis to 1.26wmf21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235609 [20:05:12] and i happen to know the only reason we didnt approve was the scope, so if this works, its indeed what we need =] [20:05:40] (03CR) 1020after4: "Apparently this change got overwritten." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/235609 (owner: 1020after4) [20:05:43] robh: added plus; you add me as contact for fermium too which kills two birds with one stone ;) [20:05:46] (03CR) 1020after4: [C: 032] group0 wikis to 1.26wmf21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235609 (owner: 1020after4) [20:05:51] (03Merged) 10jenkins-bot: group0 wikis to 1.26wmf21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235609 (owner: 1020after4) [20:08:46] !log twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: group0 wikis to 1.26wmf21 [20:08:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:17:53] (03CR) 10Deskana: [C: 031] "Thanks. No problems here, now." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235274 (https://phabricator.wikimedia.org/T76497) (owner: 10EBernhardson) [20:21:09] JohnFLewis: the summary is great. especially how you split it up already for Ops/Admins/Users. thanks [20:21:51] mutante: too bad all this work is so we can now run list_lists to show only public archives! :o [20:21:57] (03PS1) 10Krinkle: graphite: Add aggregation logic for ".sum" properties [puppet] - 10https://gerrit.wikimedia.org/r/235612 (https://phabricator.wikimedia.org/T111170) [20:22:07] and then a bunch of security fixes patched in 2009 :p [20:22:54] JohnFLewis: :p nah, it's all just for the "no more lucid" log line [20:23:02] that will be the fun part [20:23:13] and merging all the stuff that was waiting just for that [20:23:58] JohnFLewis: the "spam prevention" one should be popular! [20:24:19] possibly! 
[20:24:28] looking at wikinews-l - spam is news [20:25:11] heh, so it's ham [20:27:36] !log updated Parsoid to version 5f2fae6c [20:27:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:34:57] (03PS1) 10Dzahn: mailman: add config options introduced in 2.1.15 [puppet] - 10https://gerrit.wikimedia.org/r/235614 [20:35:41] (03CR) 10jenkins-bot: [V: 04-1] mailman: add config options introduced in 2.1.15 [puppet] - 10https://gerrit.wikimedia.org/r/235614 (owner: 10Dzahn) [20:35:44] JohnFLewis: ^ just not sure yet if we have to wait [20:36:28] and no, that mm_cfg.py is not going to be PEP8 compliant [20:37:09] uh? https://integration.wikimedia.org/ci/job/operations-puppet-pep8/4937/violations/file/modules/mailman/files/mm_cfg.py/ [20:39:13] (03PS2) 10Dzahn: mailman: add config options introduced in 2.1.15 [puppet] - 10https://gerrit.wikimedia.org/r/235614 [20:39:52] (03PS3) 10Dzahn: mailman: add config options introduced in 2.1.15 [puppet] - 10https://gerrit.wikimedia.org/r/235614 [20:40:06] (03CR) 10Ori.livneh: [C: 032] graphite: Add aggregation logic for ".sum" properties [puppet] - 10https://gerrit.wikimedia.org/r/235612 (https://phabricator.wikimedia.org/T111170) (owner: 10Krinkle) [20:41:49] 6operations, 10Wikimedia-Mailing-lists: send follow-up email, announce changes with new mailman version if any that have user impact - https://phabricator.wikimedia.org/T110140#1599242 (10Dzahn) >>! In T110140#1595524, @JohnLewis wrote: > I recommend the follow be set: > AUTHENTICATION_COOKIE_LIFETIME = 3600 >... 
[20:42:27] mutante: https://gerrit.wikimedia.org/r/#/c/235384/ I patched it after I made those comments :) [20:42:41] 6operations, 10Wikimedia-Mailing-lists, 7user-notice: send follow-up email, announce changes with new mailman version if any that have user impact - https://phabricator.wikimedia.org/T110140#1599245 (10Dzahn) [20:43:55] JohnFLewis: ooh, you were faster:) ok [20:44:09] 20 hours faster ;) [20:44:13] if older mailman just ignores these we dont have to wait [20:45:52] if you want to merge it now; feel free to [20:46:41] (03CR) 10John F. Lewis: mailman: set new settings to improve security [puppet] - 10https://gerrit.wikimedia.org/r/235384 (owner: 10John F. Lewis) [20:51:26] JohnFLewis: i actually asked #mailman instead [20:51:48] Sapiro might be around :p [20:52:11] why am I here if #mailman exists? :P [20:52:47] (03Abandoned) 10Dzahn: mailman: add config options introduced in 2.1.15 [puppet] - 10https://gerrit.wikimedia.org/r/235614 (owner: 10Dzahn) [20:57:27] (03PS2) 10Catrope: Enable Flow beta feature on beta lab [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235516 (owner: 10Sbisson) [20:58:54] twentyafterfour: Wondering why mw-core wmf/1.26wmf21 vendor has a bad treeish? [20:59:34] I see you created the branch here, commit 3e27288a5d8b945c22d41d0e66a4a4258d1127db and vendor points to 13f55dafe444e8a9474ac6a93f97ff70377b4635, which doesn't appear in gerrit [20:59:43] AndyRussG: fyi ^ [20:59:47] RFC meeting "Master & slave datacenter strategy" starting in 1 minute in #wikimedia-office [21:00:04] AndyRussG: Respected human, time to deploy CentralNotice (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150902T2100). Please do the needful. 
[21:11:54] PROBLEM - Host mr1-ulsfo is DOWN: PING CRITICAL - Packet loss = 100% [21:12:44] (03PS1) 10Tim Landscheidt: Labs: Allow self-hosted puppetmasters to auto-sign certificates [puppet] - 10https://gerrit.wikimedia.org/r/235621 [21:14:14] (03CR) 10Tim Landscheidt: "Tested and currently deployed on Toolsbeta." [puppet] - 10https://gerrit.wikimedia.org/r/235621 (owner: 10Tim Landscheidt) [21:15:14] (03PS2) 10Krinkle: asset-check: Wait for async modules to finish [puppet] - 10https://gerrit.wikimedia.org/r/235256 [21:15:15] RECOVERY - Host mr1-ulsfo is UP: PING OK - Packet loss = 0%, RTA = 76.61 ms [21:18:24] PROBLEM - Host mr1-ulsfo is DOWN: PING CRITICAL - Packet loss = 100% [21:22:08] (03PS1) 10Ori.livneh: monitor.py: PEP8 compliance [debs/pybal] - 10https://gerrit.wikimedia.org/r/235623 [21:22:10] (03PS1) 10Ori.livneh: ipvs.py: PEP8 compliance [debs/pybal] - 10https://gerrit.wikimedia.org/r/235624 [21:23:00] (03CR) 10Ori.livneh: [C: 032] "I still think an mw.track in core would be better, but OK with this for now." [puppet] - 10https://gerrit.wikimedia.org/r/235256 (owner: 10Krinkle) [21:23:24] RECOVERY - Host mr1-ulsfo is UP: PING OK - Packet loss = 0%, RTA = 74.00 ms [21:23:37] ori: Ah, I forgot about that. I'll do that later this week. [21:23:49] * Krinkle makes note this time. [21:24:03] I started creating a task for it in a tab somewhere, but never created it :D [21:24:08] process.nextTick() [21:25:06] heh [21:25:49] 6operations, 7network: asw2-a5-eqiad.mgmt.eqiad.wmnet xe-0/0/36 reporting errors - https://phabricator.wikimedia.org/T100820#1599422 (10faidon) [21:25:52] 6operations, 10ops-eqiad, 7network: investigate ethernet errors: asw2-a5-eqiad port xe-0/0/36 - https://phabricator.wikimedia.org/T107635#1599424 (10faidon) [21:28:35] 6operations, 10Wikimedia-Mailing-lists, 5Patch-For-Review: import all lists with the script we wrote for that - https://phabricator.wikimedia.org/T110131#1599443 (10Platonides) >>! 
In T110131#1590865, @Dzahn wrote: > We should just blindly rsync it all so it's exactly like before and if in doubt we can deal... [21:31:34] PROBLEM - check google safe browsing for wiktionary.org on google is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:31:55] 6operations, 7network: change default route damping metric to 6000 - https://phabricator.wikimedia.org/T81587#1599473 (10faidon) 5Open>3declined a:3faidon We don't do route damping at all, AFAIK. [21:35:35] RECOVERY - check google safe browsing for wiktionary.org on google is OK: HTTP OK: HTTP/1.1 200 OK - 3942 bytes in 6.311 second response time [21:36:24] was that failing to connect go google...? o.O [21:39:12] 6operations, 7network: Implement RPKI (Resource Public Key Infrastructure) - https://phabricator.wikimedia.org/T61115#1599535 (10faidon) p:5Low>3Lowest [21:42:34] (03PS3) 10Thcipriani: Add config deployment [tools/scap] - 10https://gerrit.wikimedia.org/r/235385 [21:48:14] 6operations, 10Traffic, 7network: Requests from a specific network are blocked - https://phabricator.wikimedia.org/T110208#1599595 (10Platonides) The blocked ip looks like a shared web server. Why does a //web server// need a connction to Wikipedia? [21:53:01] awight: I don't know [21:54:06] ok no problem. bd808 says it could be a security patch [21:54:21] hmm [21:54:21] makes it a bit awkward to test locally, but I assume it'll pass [21:54:49] I'm not aware of a security patch on vendor and the security patch should never be pushed [21:56:04] there are no security patches on vendor afaik [21:56:24] 6operations, 10Fundraising Tech Backlog: Add emcnaughton@wikimedia.org to fr-tech@ email group - https://phabricator.wikimedia.org/T111257#1599634 (10K4-713) 3NEW [21:56:47] robh, hey [21:57:10] ? [21:57:23] are you merging https://gerrit.wikimedia.org/r/#/c/235047/ today? 
[21:58:02] no one objected, so yep, can merge now [21:58:09] (03PS3) 10RobH: admin: add Krenair to researchers [puppet] - 10https://gerrit.wikimedia.org/r/235047 (https://phabricator.wikimedia.org/T110754) (owner: 10John F. Lewis) [21:58:19] * JohnFLewis -1s [21:58:34] (03CR) 10RobH: [C: 032] admin: add Krenair to researchers [puppet] - 10https://gerrit.wikimedia.org/r/235047 (https://phabricator.wikimedia.org/T110754) (owner: 10John F. Lewis) [21:58:58] thanks [22:00:21] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Requesting research DB access for Alex Monk - https://phabricator.wikimedia.org/T110754#1599671 (10RobH) 5Open>3Resolved No objections, merged and is now live. [22:00:40] welcome [22:01:51] 6operations, 10Traffic, 7network: Requests from a specific network are blocked - https://phabricator.wikimedia.org/T110208#1599683 (10MaxSem) >>! In T110208#1599595, @Platonides wrote: > The blocked ip looks like a shared web server. Why does a //web server// need a connction to Wikipedia? To share, remix a... [22:06:19] 6operations, 7network: Diagram eqiad network - https://phabricator.wikimedia.org/T80048#1599710 (10faidon) 5Open>3declined a:3faidon No point at this point really... [22:07:52] Ahm, dear ops [22:07:59] https://etherpad.wikimedia.org/p/ve-future-thoughts-2015 is broken [22:08:01] 6operations, 10Wikimedia-Mailing-lists, 5Patch-For-Review: import all lists with the script we wrote for that - https://phabricator.wikimedia.org/T110131#1599716 (10Dzahn) >>! In T110131#1599443, @Platonides wrote: >>>! In T110131#1590865, @Dzahn wrote: >> We should just blindly rsync it all so it's exactly... [22:08:06] So... we have kind of lost our roadmap work? [22:08:23] Is Etherpad still officially unsupported? 
[22:08:26] ( cc mutante ) [22:09:09] 6operations, 10Traffic, 7network: Requests from a specific network are blocked - https://phabricator.wikimedia.org/T110208#1599729 (10BBlack) There's a dark side to that general openness WRT to heavy/obvious remote proxycaches. People have put up what amounts to complete (logos and legal notices and all) an... [22:09:16] 6operations, 7network: mr1 asymetric routing - https://phabricator.wikimedia.org/T81440#1599733 (10faidon) 5Open>3Resolved a:3faidon I'm not sure if I understand this task (the task summary describes a solution, not a problem), but a) we do have mr1's /32s injected into OSPF, b) mr1s work fine, as far as... [22:12:49] 7Puppet, 6operations, 5Patch-For-Review: Migrate as much as possible from network::constants from network.pp to hiera - https://phabricator.wikimedia.org/T87519#1599761 (10faidon) [22:12:55] 6operations: Enable add_ip6_mapped functionality on all hosts - https://phabricator.wikimedia.org/T100690#1599762 (10faidon) [22:13:07] 6operations, 10ops-codfw, 10netops: cr1-eqdfw PEM 0 failure - https://phabricator.wikimedia.org/T110435#1599768 (10faidon) p:5Normal>3High [22:13:42] !log andyrussg@tin Started scap: Update CentralNotice to 2.6.0 for wmf21 [22:13:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:15:31] 6operations, 10Traffic, 10netops: Requests from a specific network are blocked - https://phabricator.wikimedia.org/T110208#1599772 (10Ireas) As far as I know, this was about customers using the Wikipedia API. [22:15:45] 6operations, 10netops: interfaces that are down on mr1-eqiad - https://phabricator.wikimedia.org/T84502#1599774 (10faidon) 5Open>3Resolved a:3faidon These are now set to administratively down, so the check succeeds. The Layer42 interface is still down and this is tracked by T82323. [22:16:24] Who is doing this evening's SWAT? [22:18:59] RoanKattouw: ostriches: rmoen: Krenair: Hi! 
^ Just asking since the CentralNotice deploy is taking a little time, because scap [22:19:19] It's not usually planned in advance [22:19:22] 6operations, 10RESTBase, 10RESTBase-Cassandra: Cassandra inter-node encryption (TLS) - https://phabricator.wikimedia.org/T108953#1599798 (10Eevans) @fgiunchedi, here is the background work I promised; I do not know if this is the //only// way, but this does demonstrate a working example of using an openssl g... [22:19:25] depends on whoever happens to be around [22:19:29] I'm in a meeting until 10 to [22:21:43] Krenair: hmmm I didn't also see any patches chalked up on the Deployments page [22:21:56] stuff often goes up quite late [22:21:58] Krenair: should we just march onward? [22:22:07] greg-g? [22:22:52] Maybe it was a bad idea, but I wanted to test wmf21 before sending up wmf20. So we're scapping having only pushed the change to wmf21 still [22:23:04] That part should be done by the end of our deploy slot [22:23:24] So we could also just leave things at wmf21. But it'd be best to just get it all out [22:23:29] awight|nomadic: https://phabricator.wikimedia.org/P1974 [22:23:31] because of campaigns [22:24:03] awight: it looks like that commit does exist, as far as I can tell ^ [22:29:48] twentyafterfour: Looks good. Must be a security patch or something? Cos https://gerrit.wikimedia.org/r/#/q/13f55dafe444e8a9474ac6a93f97ff70377b4635,n,z [22:32:45] RECOVERY - Router interfaces on cr1-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 70, down: 0, dormant: 0, excluded: 0, unused: 0 [22:33:49] (03PS1) 10Andrew Bogott: Added puppetmaster test for catchpoint. [puppet] - 10https://gerrit.wikimedia.org/r/235632 (https://phabricator.wikimedia.org/T107456) [22:34:34] (03CR) 10jenkins-bot: [V: 04-1] Added puppetmaster test for catchpoint. 
[puppet] - 10https://gerrit.wikimedia.org/r/235632 (https://phabricator.wikimedia.org/T107456) (owner: 10Andrew Bogott) [22:35:11] 10Ops-Access-Requests, 6operations, 3Discovery-Wikidata-Query-Service-Sprint, 7Icinga: Get smalyshev permissions to icinga enough to control monitoring for wdqs_eqiad group - https://phabricator.wikimedia.org/T111243#1599857 (10Dzahn) @Smalyshev As a first step, can we confirm that a basic login on icinga.... [22:36:16] 10Ops-Access-Requests, 6operations, 3Discovery-Wikidata-Query-Service-Sprint, 7Icinga: Get smalyshev permissions to icinga enough to control monitoring for wdqs_eqiad group - https://phabricator.wikimedia.org/T111243#1599873 (10Smalyshev) @Dzahn, yes, I can log in to icinga and see stuff, but not control n... [22:38:54] PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-1/2/0: down - Core: cr1-eqord:xe-0/0/1 Telia (IC-313592) {#1501} [10Gbps DWDM]BR [22:39:51] andrewbogott: can you try using the python 'requests' library? Instead of curl [22:40:17] andrewbogott: also the certfile might not be world readable [22:40:54] RECOVERY - Router interfaces on cr1-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 70, down: 0, dormant: 0, excluded: 0, unused: 0 [22:41:10] robh: ^^^ [22:41:13] YuviPanda: I tried four different libraries before dropping back to curl [22:41:16] including ‘requests' [22:41:17] robh: carlos plugged the fiber back? 
[22:41:28] if you know of a way to pass in cert, key, cacert then please link me to docs
[22:41:42] paravoid: he hasnt emailed me back that he has, but I did point out to him which one it was
[22:41:46] so he knows what goes back
[22:41:47] And, yeah, the cert isn’t world readable, but we need it to do this test, so we’ll have to find a way to read it
[22:42:08] since he was there when i pulled and replaced the patch, plus to install the loopback
[22:42:31] alright
[22:42:41] also michelle hasnt returned my call still
[22:42:49] so its good we didnt bother to wait on her.
[22:43:03] awight: https://phabricator.wikimedia.org/rMWVD13f55dafe444e8a9474ac6a93f97ff70377b4635
[22:43:09] andrewbogott: http://docs.python-requests.org/en/latest/user/advanced/
[22:43:44] no security patch
[22:43:51] andrewbogott: there is a section on SSL certificates. Both custom ca and local certs supported
[22:44:33] twentyafterfour: OK sorry to waste your time with that. Any idea why git submodule update -i fails on my local?
[22:45:01] I'm pointing at https://gerrit.wikimedia.org/r/p/mediawiki/core/vendor.git
[22:45:13] and git log -1 13f55dafe444e8a9474ac6a93f97ff70377b4635 tells me No
[22:45:18] bad object
[22:51:05] PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0; xe-1/2/0: down - Core: cr1-eqord:xe-0/0/1 Telia (IC-313592) {#1501} [10Gbps DWDM]
[22:52:10] well not everything can last forever :(
[22:53:15] operations, Traffic, netops: Requests from a specific network are blocked - https://phabricator.wikimedia.org/T110208#1599958 (Platonides) @MaxSem, that's not what you generally do from a shared web server. The logical thing would be to use local machine, or at least vps/dedicated server. Yes, you co...
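Editor's note on the cert/key/cacert question above: the linked requests documentation describes a `cert` argument (client certificate and key) and a `verify` argument (CA bundle). The sketch below is only an illustration of that, not the catchpoint test from the patch; the function names and every path are hypothetical, and `requests` is a third-party package that has to be installed separately.

```python
def tls_kwargs(cert_path, key_path, ca_path):
    """Build the keyword arguments requests needs to present a client
    certificate and to verify the server against a custom CA bundle."""
    return {
        "cert": (cert_path, key_path),  # client certificate and its private key
        "verify": ca_path,              # CA bundle used to check the server certificate
    }


def fetch(url, cert_path, key_path, ca_path, timeout=10):
    """Fetch a URL over mutually authenticated TLS and return the body text."""
    # Third-party dependency, imported lazily so tls_kwargs() works without it.
    import requests

    response = requests.get(
        url, timeout=timeout, **tls_kwargs(cert_path, key_path, ca_path)
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of returning an error page
    return response.text
```

The process running this still needs read access to the certificate and key files, which is the permissions problem raised in the conversation; the sketch does nothing to work around that.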
[22:55:05] RECOVERY - Router interfaces on cr1-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 70, down: 0, dormant: 0, excluded: 0, unused: 0
[22:58:31] andrewbogott: going to sleep now. But I think the etcd::client or similar role has a hack for making the cert readable
[22:58:37] (On phone and can't grep sorry)
[22:59:29] operations, netops: Set up NTT transit @ eqdfw, eqord - https://phabricator.wikimedia.org/T111274#1599999 (faidon) NEW
[22:59:42] operations, netops: Set up NTT transit @ eqdfw, eqord - https://phabricator.wikimedia.org/T111274#1600015 (faidon)
[23:00:04] RoanKattouw ostriches rmoen Krenair: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150902T2300). Please do the needful.
[23:00:40] AndyRussG, still scapping?
[23:01:40] Schedule looks empty. Shouldn't the CentralNotice patch be linked?
[23:01:55] Krenair: yes, the initial scap is almost done. We had thought of scapping again for the other branch, that's why I asked about more time
[23:02:00] !log andyrussg@tin Finished scap: Update CentralNotice to 2.6.0 for wmf21 (duration: 48m 18s)
[23:02:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:02:28] oh nvm
[23:02:29] However there's an error on the CentralNotice admin UI so we're not going to continue until that's sorted out
[23:02:49] My hope is it's something funny due to things rolling out bit by bit
[23:02:53] Didn't you test this on mw1017 before the full scap?
[23:03:45] Krenair: this is an admin interface that is only activated on meta
[23:04:04] AndyRussG, yes.... So did you use mw1017 or not?
[23:04:12] Krenair: no
[23:04:16] why not?
[23:04:17] Didn't know about that
[23:04:32] awight: ejegg: ^
[23:06:28] Krenair: looking at the how to deploy page, I'm not sure it's possible to do a test with the admin interface that's on meta that way? Maybe I'm completely off.
We did test a lot on the beta cluster
[23:06:56] Krenair: aah OK now I see...
[23:07:22] You can route any request that goes through production varnish to a wiki domain to mw1017
[23:07:42] Wow, that is a well-hidden feature
[23:07:49] That's basically every single wiki that isn't some crazy thing in frack and isn't wikitech
[23:08:23] Good to know
[23:09:48] Krenair: The advantage over the beta cluster is being slightly more production-y?
[23:10:19] Much more production-y
[23:10:44] I don't know whether you could've caught the admin interface bug on beta or not
[23:10:50] Also, why was this a full scap to update a single extension anyway? Did you make l10n changes?
[23:11:51] yep, new i18n messages
[23:13:13] Krenair: looks like ejegg just fixed the error!
[23:13:20] Krenair: is there nothing in the SWAT? should we just go on?
[23:13:41] Doesn't look like it
[23:13:46] I wonder why scap took so long
[23:14:01] greg-g, ^
[23:16:34] don't ask me
[23:16:38] :)
[23:17:31] Ops-Access-Requests, operations, Discovery-Wikidata-Query-Service-Sprint, Icinga: Get smalyshev permissions to icinga enough to control monitoring for wdqs_eqiad group - https://phabricator.wikimedia.org/T111243#1600185 (Dzahn) @Smalyshev Ok, great. So the next step to be able to run commands (sche...
[23:17:34] greg-g: you should know these things!
[23:17:45] jynus, there's another issue related to certain wikis having their own cluster (not the main ones), this time for Flow: https://phabricator.wikimedia.org/T111254 . This affects officewiki. Thanks again.
[23:17:47] I try to know a lot, but I don't know it all :)
[23:18:01] Heh I clearly know much less though
[23:20:46] greg-g: what is the right way to proceed now with our deploy? It looks like there is nothing scheduled for the SWAT. We could in theory send up our fix, test a bit more, then deploy to wmf20 with another scap...
I know it's a lot of messy unplannedness, really sorry about that
[23:20:58] Ops-Access-Requests, operations, Discovery-Wikidata-Query-Service-Sprint, Icinga: Get smalyshev permissions to icinga enough to control monitoring for wdqs_eqiad group - https://phabricator.wikimedia.org/T111243#1600197 (Dzahn) Now this icinga contact can be used in puppet classes that apply monitor...
[23:23:26] RoanKattouw_away: rmoen: Krenair: ostriches: we're still on the cluster deploying CN stuff, then. I'm assuming it's OK since there was nothing scheduled on the SWAT :)
[23:24:16] Ops-Access-Requests, operations, Discovery-Wikidata-Query-Service-Sprint, Icinga: Get smalyshev permissions to icinga enough to control monitoring for wdqs_eqiad group - https://phabricator.wikimedia.org/T111243#1600217 (Smalyshev) @Dzahn yes that sounds good, thanks!
[23:29:11] awight: I have no idea really, it sounds like your copy of the repo is somehow corrupted
[23:29:15] missing that object
[23:29:30] I rm'd the dir and re-downloaded. You have it locally?
[23:29:41] yes
[23:29:59] Ops-Access-Requests, operations, Icinga: give John Lewis permissions to send commands in icinga - https://phabricator.wikimedia.org/T105229#1600235 (JohnLewis) FYI doc wise: http://docs.icinga.org/latest/en/objectdefinitions.html#contact ``` can_submit_commands: This directive is used to determine wh...
[23:30:06] AndyRussG, probably. I'm not objecting to it myself.
[23:30:41] Krenair: ...ok thanks
[23:30:48] Ops-Access-Requests, operations, Discovery-Wikidata-Query-Service-Sprint, Icinga: Get smalyshev permissions to icinga enough to control monitoring for wdqs_eqiad group - https://phabricator.wikimedia.org/T111243#1600246 (JohnLewis) @dzahn see T105229#1600235 for being able to send the commands as a...
[23:30:48] rm -rf vendor/; git submodule update -i => fail
[23:30:51] AndyRussG: i think its all good :)
[23:30:52] operations, Traffic, netops: Requests from a specific network are blocked - https://phabricator.wikimedia.org/T110208#1600248 (akosiaris) The user in question has contacted both me and @Ironholds privately providing an explanation for the behavior, noting he did already take a mitigation step. They di...
[23:30:59] rmoen: thanks!
[23:31:04] awight: fail in what way?
[23:31:50] The same way, that the commit is a bad object. I'll just leave this alone for now, and will holler if it happens again.
[23:32:19] ah, rm -rf vendor probably doesn't clear my local repo, it's hiding under .git somewhere.
[23:32:48] awight: I'm trying to reproduce the issue, so far I haven't run into any problems with cloning, checking out the branch, and running submodule update -i --recursive
[23:33:28] !log andyrussg@tin Synchronized php-1.26wmf21/extensions/CentralNotice/: Update CentralNotice (duration: 00m 13s)
[23:33:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:34:41] Ops-Access-Requests, operations, Icinga: give John Lewis permissions to send commands in icinga - https://phabricator.wikimedia.org/T105229#1600256 (Dzahn) >>! In T105229#1597145, @JohnLewis wrote: >>>! In T105229#1555857, @Dzahn wrote: >> just wondering, let's say the subtask was resolved, for which s...
[23:43:05] operations, Wikimedia-Mailing-lists, Documentation: Overhaul Mailman documentation - https://phabricator.wikimedia.org/T109534#1600290 (Ariconte) >>! In T109534#1551772, @Dzahn wrote: > It's not even clear to me why it has to be split between meta and wikitech. I think there are three audience...
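Editor's note on the "hiding under .git somewhere" remark above: in a normal checkout, git keeps each submodule's repository under `.git/modules/<name>` in the superproject, so deleting only the working directory leaves any stale or corrupted objects in place. The following is a rough sketch of a fuller reset, not what was actually run in the channel. It assumes the submodule's name equals its path (`vendor` here), that `.git` is a directory rather than a worktree pointer file, and the helper names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def cached_submodule_dir(repo_root, name):
    """Location where the superproject keeps the submodule's own git directory."""
    return Path(repo_root) / ".git" / "modules" / name


def reset_submodule(repo_root, name, run=subprocess.run):
    """Discard the cached clone of a submodule and fetch it again from scratch.

    `run` is injectable so the git calls can be replaced in a dry run or test.
    """
    root = Path(repo_root)
    # Unregister the submodule and empty its working directory.
    run(["git", "submodule", "deinit", "-f", "--", name], cwd=root, check=True)
    # Remove the cached repository that survives an `rm -rf` of the work tree.
    shutil.rmtree(cached_submodule_dir(root, name), ignore_errors=True)
    # Clone again and check out the commit recorded in the superproject.
    run(["git", "submodule", "update", "--init", "--", name], cwd=root, check=True)
```

Something like `reset_submodule(".", "vendor")` from the top of the checkout would be the intended use. Whether this clears the "bad object" error depends on the recorded commit actually being reachable from the configured remote.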
[23:57:57] !log andyrussg@tin Synchronized php-1.26wmf20/extensions/CentralNotice/: CentralNotice update (duration: 00m 13s)
[23:58:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:58:27] (PS1) Tim Landscheidt: labs_lvm: Only run extend-instance-vol when needed [puppet] - https://gerrit.wikimedia.org/r/235642 (https://phabricator.wikimedia.org/T109933)