[00:00:41] PROBLEM - Check whether ferm is active by checking the default input chain on ftp-internal is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[00:03:28] Operations, Community-Tech, DBA, MediaWiki-General-or-Unknown, and 3 others: Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3663102 (EddieGP) >>! In T176754#3659854, @jcrespo wrote: > Let's stop meta-talking about the issue, and start worki...
[00:03:31] RECOVERY - MegaRAID on db1046 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[00:09:34] (PS1) EddieGP: [DNM] Add cron job for expired userrights maintenance script [puppet] - https://gerrit.wikimedia.org/r/382631 (https://phabricator.wikimedia.org/T176754)
[00:13:59] (CR) EddieGP: "I hardly understand puppet logic at all (hopefully: yet), so I'm neither sure this is right nor that it's everything needed. Happy about a" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/382631 (https://phabricator.wikimedia.org/T176754) (owner: EddieGP)
[00:16:30] (CR) EddieGP: "Oh, and also: Should we test this on beta-cluster for a while or run it manually a few times before having a cron-job do it unattended? Do" [puppet] - https://gerrit.wikimedia.org/r/382631 (https://phabricator.wikimedia.org/T176754) (owner: EddieGP)
[00:17:12] (CR) Dzahn: [DNM] Add cron job for expired userrights maintenance script (1 comment) [puppet] - https://gerrit.wikimedia.org/r/382631 (https://phabricator.wikimedia.org/T176754) (owner: EddieGP)
[00:17:28] (CR) Reedy: "Don't think it really matters. We could cherry pick it on the beta cluster, but unless someone starts creating quite a few expiring group " [puppet] - https://gerrit.wikimedia.org/r/382631 (https://phabricator.wikimedia.org/T176754) (owner: EddieGP)
[00:18:59] (CR) Dzahn: "the puppet part itself looks like it would work, yea.
that doesn't mean though i checked "maintenance/purgeExpiredUserrights.php" or i can" [puppet] - https://gerrit.wikimedia.org/r/382631 (https://phabricator.wikimedia.org/T176754) (owner: EddieGP)
[00:21:34] Operations, Community-Tech, DBA, MediaWiki-General-or-Unknown, and 3 others: Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3635590 (Krinkle) >>! In T176754#3658385, @kaldari wrote: >>If purging is undesirable on production, which is someth...
[00:27:08] Operations, DC-Ops: Review and fix PDU settings for syslog/ntp/email servers - https://phabricator.wikimedia.org/T175341#3663152 (ayounsi) On ps1-d2-eqiad.mgmt.eqiad.wmnet: * FQDN set * NTP set * DNS set * Syslog set to librenms and syslog.eqiad.wmnet * Upgraded to most recent firmware * Email disabled...
[00:28:00] (CR) Dzahn: "the puppet code does add the crons, see here: http://puppet-compiler.wmflabs.org/8209/ terbium = maintenance server in eqiad wasat = ma" [puppet] - https://gerrit.wikimedia.org/r/382631 (https://phabricator.wikimedia.org/T176754) (owner: EddieGP)
[00:30:34] (CR) Dzahn: "should be like: "profile::mediawiki::maintenance::purge_expired_userrights::ensure: absent" in Hiera" [puppet] - https://gerrit.wikimedia.org/r/382631 (https://phabricator.wikimedia.org/T176754) (owner: EddieGP)
[00:57:01] (CR) EddieGP: "> Don't think it really matters. We could cherry pick it on the beta" [puppet] - https://gerrit.wikimedia.org/r/382631 (https://phabricator.wikimedia.org/T176754) (owner: EddieGP)
[01:15:04] (CR) Reedy: "www-data has no crontab (well, it's empty) on wasat. On terbium, it has many lines" [puppet] - https://gerrit.wikimedia.org/r/382631 (https://phabricator.wikimedia.org/T176754) (owner: EddieGP)
[01:30:31] PROBLEM - puppet last run on analytics1044 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[01:55:31] RECOVERY - puppet last run on analytics1044 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[03:31:11] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 643.97 seconds
[03:36:28] Operations, Deployments, Stashbot: [[wikitech:Server_admin_log]] should not rely on freenode irc for logmsgbot entries - https://phabricator.wikimedia.org/T46791#3663308 (Liuxinyu970226)
[04:43:03] (PS1) Phedenskog: Fixed typeo for responseEnd property. [puppet] - https://gerrit.wikimedia.org/r/382643 (https://phabricator.wikimedia.org/T104902)
[04:45:22] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 298.94 seconds
[05:31:19] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1053" [mediawiki-config] - https://gerrit.wikimedia.org/r/382644
[05:31:23] (PS2) Marostegui: Revert "db-eqiad.php: Depool db1053" [mediawiki-config] - https://gerrit.wikimedia.org/r/382644
[05:34:14] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1053" [mediawiki-config] - https://gerrit.wikimedia.org/r/382644 (owner: Marostegui)
[05:34:24] (PS2) Marostegui: mariadb: Update socket location for db1076 [puppet] - https://gerrit.wikimedia.org/r/381950 (https://phabricator.wikimedia.org/T174054)
[05:35:20] !log Stop MySQL on db1076 for an upgrade
[05:35:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:35:40] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1053" [mediawiki-config] - https://gerrit.wikimedia.org/r/382644 (owner: Marostegui)
[05:36:08] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1053" [mediawiki-config] - https://gerrit.wikimedia.org/r/382644 (owner: Marostegui)
[05:36:52] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1053 - T174509 (duration: 00m 48s)
[05:36:58] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:36:58] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[05:38:10] (CR) Marostegui: [C: 2] mariadb: Update socket location for db1076 [puppet] - https://gerrit.wikimedia.org/r/381950 (https://phabricator.wikimedia.org/T174054) (owner: Marostegui)
[05:53:58] (PS1) Marostegui: db-eqiad.php: Repool db1076 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/382646
[05:57:01] (CR) Marostegui: [C: 2] db-eqiad.php: Repool db1076 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/382646 (owner: Marostegui)
[05:59:23] (Merged) jenkins-bot: db-eqiad.php: Repool db1076 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/382646 (owner: Marostegui)
[05:59:33] (CR) jenkins-bot: db-eqiad.php: Repool db1076 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/382646 (owner: Marostegui)
[06:00:32] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1076 with low weight - T174054 (duration: 00m 46s)
[06:00:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:00:38] T174054: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054
[06:14:37] (PS1) Marostegui: db-eqiad,db-codfw.php: Add db1106 to config [mediawiki-config] - https://gerrit.wikimedia.org/r/382648 (https://phabricator.wikimedia.org/T172679)
[06:17:47] (CR) Marostegui: [C: 2] db-eqiad,db-codfw.php: Add db1106 to config [mediawiki-config] - https://gerrit.wikimedia.org/r/382648 (https://phabricator.wikimedia.org/T172679) (owner: Marostegui)
[06:20:38] (Merged) jenkins-bot: db-eqiad,db-codfw.php: Add db1106 to config [mediawiki-config] - https://gerrit.wikimedia.org/r/382648 (https://phabricator.wikimedia.org/T172679) (owner: Marostegui)
[06:20:49] (CR) jenkins-bot:
db-eqiad,db-codfw.php: Add db1106 to config [mediawiki-config] - https://gerrit.wikimedia.org/r/382648 (https://phabricator.wikimedia.org/T172679) (owner: Marostegui)
[06:21:59] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Add db1106 to the config - T172679 (duration: 00m 47s)
[06:22:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:22:06] T172679: Productionize 11 new eqiad database servers - https://phabricator.wikimedia.org/T172679
[06:23:32] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Add db1106 to the config - T172679 (duration: 00m 47s)
[06:23:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:23:53] Operations, Goal: Provide dedicated database resources for wikidata - https://phabricator.wikimedia.org/T177208#3663452 (Marostegui)
[06:28:25] (PS1) Giuseppe Lavagetto: Add checks for nodes [puppet-lint/wmf_styleguide-check] - https://gerrit.wikimedia.org/r/382649
[06:29:33] mediawiki.org not loading for me
[06:29:50] works for me
[06:30:04] right.. just as I said it finished after 10+ seconds
[06:30:25] <_joe_> marostegui: https://www.mediawiki.org/ gives me an error indeed
[06:30:37] Not to me :|
[06:30:38] <_joe_> and then it works
[06:31:24] (PS1) Marostegui: db-eqiad.php: Increase weight for db1076 [mediawiki-config] - https://gerrit.wikimedia.org/r/382650
[06:31:28] Oh, now I see it
[06:32:42] <_joe_> marostegui: can you check the x-cache header of the page?
[06:33:34] X-Cache: cp3040 int
[06:33:42] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[06:33:49] <_joe_> ok
[06:33:53] <_joe_> ema: ^^
[06:33:55] looking
[06:33:59] <_joe_> cp3040 maybe?
[06:34:21] 3032
[06:34:30] +1
[06:34:38] let me try to disable max_connections
[06:34:41] (CR) Marostegui: [C: 2] db-eqiad.php: Increase weight for db1076 [mediawiki-config] - https://gerrit.wikimedia.org/r/382650 (owner: Marostegui)
[06:35:31] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [1000.0]
[06:35:49] !log set max_connections=0 on cp3032 varnish-be
[06:35:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:36:03] (Merged) jenkins-bot: db-eqiad.php: Increase weight for db1076 [mediawiki-config] - https://gerrit.wikimedia.org/r/382650 (owner: Marostegui)
[06:36:13] (CR) jenkins-bot: db-eqiad.php: Increase weight for db1076 [mediawiki-config] - https://gerrit.wikimedia.org/r/382650 (owner: Marostegui)
[06:37:12] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase db1076 weight - T174054 (duration: 00m 47s)
[06:37:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:37:18] T174054: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054
[06:37:21] hey it looks like it worked
[06:39:23] btw, is it know that mediawiki-errors dashboard in logstash doesn't load?
[06:40:12] ema: timing checks out, it seems indeed that it worked
[06:40:30] ok let's disable max_connections on all text backends and see
[06:41:30] <_joe_> Nikerabbit: yeah looks strange, open a task maybe?
[06:41:35] ema: so no moar 429 right ?
[06:42:50] _joe_: ok I'll check Phab
[06:44:23] (PS1) Ema: cache_text: disable max_connections [puppet] - https://gerrit.wikimedia.org/r/382651 (https://phabricator.wikimedia.org/T175803)
[06:44:33] elukey: ?
[06:45:07] ema: I was trying to figure out what it means practically "disable max-conns" in varnish :)
[06:46:06] oh! No, that's unrelated to 429s.
It means don't give up if connections from backend to backend are piling up
[06:46:29] ahhh okok
[06:46:32] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[06:46:36] (CR) Ema: [C: 2] cache_text: disable max_connections [puppet] - https://gerrit.wikimedia.org/r/382651 (https://phabricator.wikimedia.org/T175803) (owner: Ema)
[06:47:42] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[06:49:32] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[06:50:03] 3032 ema
[06:50:26] uh? I don't see it on varnish-failed-fetches
[06:50:29] seems already recovered afaics
[06:50:31] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[06:51:10] maybe this is a side effect of max-conns disabled?
[06:52:03] I can see the spike in https://grafana.wikimedia.org/dashboard/db/varnish-failed-fetches?orgId=1&var-datasource=esams%20prometheus%2Fops&var-cache_type=text&from=now-3h&to=now
[06:52:26] but that's the previous one
[06:53:42] oh snap you are right
[06:53:56] in fact I was trying to find the other one and failed miserably
[06:54:20] all right no issue, sorry man, need coffee :)
[06:54:27] the alarms deceived me
[06:54:28] same here!
[06:54:33] coffee!
[06:55:44] Operations, DBA, Gerrit, Patch-For-Review, Release-Engineering-Team (Backlog): Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#3663481 (Marostegui) >>! In T145885#3555784, @Paladox wrote: > I doint think this is worth it now sin...
[06:58:51] (PS1) Marostegui: db-eqiad.php: Increase db1076 weight [mediawiki-config] - https://gerrit.wikimedia.org/r/382652
[07:02:31] (CR) Marostegui: [C: 2] db-eqiad.php: Increase db1076 weight [mediawiki-config] - https://gerrit.wikimedia.org/r/382652 (owner: Marostegui)
[07:03:50] (PS1) Ema: Revert "cache_text: disable max_connections" [puppet] - https://gerrit.wikimedia.org/r/382653
[07:04:03] (Merged) jenkins-bot: db-eqiad.php: Increase db1076 weight [mediawiki-config] - https://gerrit.wikimedia.org/r/382652 (owner: Marostegui)
[07:04:20] (CR) jerkins-bot: [V: -1] Revert "cache_text: disable max_connections" [puppet] - https://gerrit.wikimedia.org/r/382653 (owner: Ema)
[07:05:05] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase db1076 weight - T174054 (duration: 00m 47s)
[07:05:10] (CR) Ema: [V: 2 C: 2] Revert "cache_text: disable max_connections" [puppet] - https://gerrit.wikimedia.org/r/382653 (owner: Ema)
[07:05:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:05:11] T174054: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054
[07:06:05] (CR) jenkins-bot: db-eqiad.php: Increase db1076 weight [mediawiki-config] - https://gerrit.wikimedia.org/r/382652 (owner: Marostegui)
[07:09:21] (PS1) Ema: cache_text: disable varnish_be<->varnish_be max_connections [puppet] - https://gerrit.wikimedia.org/r/382654 (https://phabricator.wikimedia.org/T175803)
[07:09:54] Operations, Traffic, Wikidata, wikiba.se, and 2 others: [Task] move wikiba.se webhosting to wikimedia misc-cluster - https://phabricator.wikimedia.org/T99531#3663502 (Ladsgroup) >>! In T99531#3663063, @Dzahn wrote: > Next we should figure out: > > - who should have Gerrit permissions for +2/mer...
[07:11:25] !log live hacking on mwdebug1002
[07:11:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:11:45] (CR) Ema: [C: 2] cache_text: disable varnish_be<->varnish_be max_connections [puppet] - https://gerrit.wikimedia.org/r/382654 (https://phabricator.wikimedia.org/T175803) (owner: Ema)
[07:16:12] PROBLEM - puppet last run on puppetmaster1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:18:58] !log upgrade remaining API servers in codfw to HHVM 3.18.5
[07:19:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:19:34] (CR) Gilles: [C: 1] Fixed typeo for responseEnd property. [puppet] - https://gerrit.wikimedia.org/r/382643 (https://phabricator.wikimedia.org/T104902) (owner: Phedenskog)
[07:23:01] !log upgrade pybal to 1.14.0 on eqiad secondary LVSs
[07:23:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:23:51] PROBLEM - DPKG on mw2124 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[07:24:31] PROBLEM - Apache HTTP on mw2121 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:24:31] PROBLEM - HHVM rendering on mw2121 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:24:51] RECOVERY - DPKG on mw2124 is OK: All packages OK
[07:24:57] moritzm: is that you? ^
[07:25:21] RECOVERY - Apache HTTP on mw2121 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.122 second response time
[07:25:22] RECOVERY - HHVM rendering on mw2121 is OK: HTTP OK: HTTP/1.1 200 OK - 77214 bytes in 0.376 second response time
[07:25:48] ema: yeah, these are all depooled during the upgrade
[07:25:52] PROBLEM - PyBal backends health check on lvs1005 is CRITICAL: PYBAL CRITICAL - WARNING - Pool git-ssh4_22 is too small to allow depooling. Pool git-ssh6_22 is too small to allow depooling.
[07:26:11] RECOVERY - puppet last run on puppetmaster1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:27:01] mmh interesting alert from lvs1005 (just upgraded to pybal 1.14.0)
[07:27:40] <_joe_> ema: indeed
[07:27:49] this one's unrelated to me, though :-) I'm only upgrading the remaining codfw hosts today
[07:27:56] <_joe_> ema: seems like a bug we didn't hit before?
[07:27:58] moritzm: yep!
[07:28:17] <_joe_> I'm commuting right now but I can help debug later
[07:28:38] _joe_: thanks! It looks like we've added this alert in 1.14, so no surprises we haven't seen it before
[07:29:05] <_joe_> ema: oh right now I remember
[07:29:11] PROBLEM - puppet last run on mw2121 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[hhvm],Package[hhvm-dbg]
[07:29:15] <_joe_> that means pybal is misconfigured
[07:29:26] <_joe_> I can fix it later, I would not worry
[07:29:39] <_joe_> actually, it's pretty cool the alert did fire up
[07:29:53] it is :)
[07:30:05] Operations, Traffic, Wikidata, wikiba.se, and 2 others: [Task] move wikiba.se webhosting to wikimedia misc-cluster - https://phabricator.wikimedia.org/T99531#3663517 (Ladsgroup) wikibase.eqiad.wmflabs is ready for test (hasn't applied the puppet roles yet). Added the hiera var in https://wikitech...
[07:35:48] oh so git-ssh4_22 only has phab1001-vcs.eqiad.wmnet: enabled/up/pooled
[07:36:02] hence it is indeed too small to allow depooling
[07:37:43] same for git-ssh6_22
[07:42:52] (PS1) Marostegui: db-eqiad.php: Restore db1076 original weight [mediawiki-config] - https://gerrit.wikimedia.org/r/382658
[07:44:38] (CR) Marostegui: [C: 2] db-eqiad.php: Restore db1076 original weight [mediawiki-config] - https://gerrit.wikimedia.org/r/382658 (owner: Marostegui)
[07:47:25] (Merged) jenkins-bot: db-eqiad.php: Restore db1076 original weight [mediawiki-config] - https://gerrit.wikimedia.org/r/382658 (owner: Marostegui)
[07:47:35] (CR) jenkins-bot: db-eqiad.php: Restore db1076 original weight [mediawiki-config] - https://gerrit.wikimedia.org/r/382658 (owner: Marostegui)
[07:48:07] _joe_: so yeah I assuming it's ok to have a single host in the pool we could change the depool-threshold for git-ssh?
[07:48:20] s/I//
[07:48:24] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Restore db1076 original weight - T174054 (duration: 00m 48s)
[07:48:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:48:30] T174054: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054
[07:53:37] (CR) Filippo Giunchedi: "You'll also need to set the "cluster" puppet variable for these machines." [puppet] - https://gerrit.wikimedia.org/r/382506 (https://phabricator.wikimedia.org/T177501) (owner: Eevans)
[07:54:11] RECOVERY - puppet last run on mw2121 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:55:22] Operations, Performance-Team, Thumbor, MW-1.31-release-notes (WMF-deploy-2017-09-26 (1.31.0-wmf.1)), User-fgiunchedi: Remove X-Content-Dimensions for multipage originals - https://phabricator.wikimedia.org/T175689#3663621 (Gilles) Cleanup of non-Commons wikis complete, started the cleanup of...
[07:58:18] Operations, Traffic, Wikidata, wikiba.se, and 2 others: [Task] move wikiba.se webhosting to wikimedia misc-cluster - https://phabricator.wikimedia.org/T99531#3663637 (Lydia_Pintscher) >>! In T99531#3663063, @Dzahn wrote: > - decide whether in puppet it should be "ensure => latest" (means merging...
[08:00:31] mmh also unless I'm missing something it looks like we've got a discrepancy between coordinator.canDepool and the instrumentation alert
[08:01:26] ah no scratch that, they're doing two different things
[08:02:54] well but still, canDepool is currently true for git-ssh although with a single host in the pool you can't really depool it :)
[08:03:08] return len(self.servers) - len(downServers) >= len(self.servers) * self.lvsservice.getDepoolThreshold()
[08:04:17] len(self.servers) is 1, 0 downServers, depoolthreshold is 0.5
[08:06:13] anyways, carrying on with pybal upgrades on eqiad primaries
[08:09:58] !log upgrade pybal to 1.14.0 on eqiad primary LVSs
[08:10:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:11:52] Operations, DC-Ops: Review and fix PDU settings for syslog/ntp/email servers - https://phabricator.wikimedia.org/T175341#3663659 (fgiunchedi) Thanks @ayounsi ! Looks good to me, some things I found: * I can't reach its port 443 e.g. from bast3002 (connection refused) though ssh works and I'm seeing `SSL...
[08:13:20] ACKNOWLEDGEMENT - PyBal backends health check on lvs1002 is CRITICAL: PYBAL CRITICAL - WARNING - Pool git-ssh4_22 is too small to allow depooling. Pool git-ssh6_22 is too small to allow depooling. Ema Known, see similar error on lvs1005.
[08:16:44] Operations, DBA, Availability (Multiple-active-datacenters), Performance-Team (Radar): Make apache/maintenance hosts TLS connections to mariadb work - https://phabricator.wikimedia.org/T175672#3663663 (jcrespo) I will setup that and ping you. Independently of this, not supporting TLS 1.2 (if that...
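Editorial note on the canDepool expression quoted at 08:03:08: it can be evaluated by hand with the numbers given at 08:04:17. The following is a minimal Python sketch, assuming a standalone function; the name can_depool and its three arguments are hypothetical stand-ins for illustration and are not PyBal's actual coordinator API.

```python
def can_depool(total_servers, down_servers, depool_threshold):
    """Evaluate the inequality quoted in the log.

    Hypothetical stand-in: it takes plain numbers instead of PyBal's
    server lists and lvsservice object.
    """
    return total_servers - down_servers >= total_servers * depool_threshold


# Numbers from the log: 1 server in the pool, 0 down, threshold 0.5.
# 1 - 0 >= 1 * 0.5
print(can_depool(1, 0, 0.5))

# The same pool if its only server were counted as down.
# 1 - 1 >= 1 * 0.5
print(can_depool(1, 1, 0.5))
```

With one server and none down the inequality holds (1 >= 0.5), which is consistent with the remark that canDepool is currently true for git-ssh. If that single server is counted as down, the same inequality reads 0 >= 0.5 and no longer holds.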
[08:23:03] Operations, Pybal, Traffic, monitoring, Patch-For-Review: pybal: add prometheus metrics - https://phabricator.wikimedia.org/T171710#3663686 (ema) Open>Resolved a:ema PyBal 1.14.0, currently in prod, includes prometheus metrics. Closing.
[08:23:24] (PS1) Elukey: druid: fix cronspam from cronjobs trying to remove directories [puppet] - https://gerrit.wikimedia.org/r/382662
[08:25:25] (CR) Elukey: [C: 2] druid: fix cronspam from cronjobs trying to remove directories [puppet] - https://gerrit.wikimedia.org/r/382662 (owner: Elukey)
[08:34:24] (PS1) Ema: cache::upload: enable nginx-lua-prometheus [puppet] - https://gerrit.wikimedia.org/r/382663
[08:37:11] (PS1) Ema: prometheus: add nginx_cache_upload cluster definition [puppet] - https://gerrit.wikimedia.org/r/382664
[08:40:29] !log upgrade remaining image scalers in codfw to HHVM 3.18.5
[08:40:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:44:00] (CR) Filippo Giunchedi: [C: 1] "Can be merged at any time, metrics are not yet exposed on the nginx side" [puppet] - https://gerrit.wikimedia.org/r/382664 (owner: Ema)
[08:47:31] (CR) Filippo Giunchedi: "For more naming bikeshedding: since we're going to fold all machines into cassandra 3, we could keep cassandra 3 machines in the restbase " [puppet] - https://gerrit.wikimedia.org/r/382506 (https://phabricator.wikimedia.org/T177501) (owner: Eevans)
[08:47:55] PROBLEM - LVS HTTPS IPv4 on rendering.svc.codfw.wmnet is CRITICAL: connect to address 10.2.1.21 and port 443: Connection refused
[08:47:59] PROBLEM - LVS HTTP IPv4 on rendering.svc.codfw.wmnet is CRITICAL: connect to address 10.2.1.21 and port 80: Connection refused
[08:48:11] uh oh
[08:48:33] moritzm: ^
[08:48:37] upgrades?
[08:48:50] timing matches yeah
[08:49:05] RECOVERY - LVS HTTPS IPv4 on rendering.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 16118 bytes in 7.230 second response time
[08:49:10] RECOVERY - LVS HTTP IPv4 on rendering.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 16118 bytes in 4.871 second response time
[08:49:11] ok
[08:49:15] whew
[08:49:28] plus it is codfw, not taking any traffic
[08:49:32] :-)
[08:50:23] hmm, I think I know what went wrong:
[08:51:19] I depooled two image scalers for the upgrade, but it turns out there were two others mw2244 and mw2245 who were already depooled
[08:51:53] these were probably forgotten some time ago, will have a look and repool those as well unless there's a hw issue or so
[08:52:36] <_joe_> so your servers were not actively depooled
[08:53:33] (PS3) Marostegui: db-eqiad.php: Set commonswiki on read only [mediawiki-config] - https://gerrit.wikimedia.org/r/382379 (https://phabricator.wikimedia.org/T176883)
[08:54:07] _joe_: I'm pretty sure confctl switched from pooled=yes to pooled=no, let me re-try, don't have that terminal backscroll anymore
[08:55:40] (PS1) Ema: pybal: make check_pybal distinguish between warning and critical [puppet] - https://gerrit.wikimedia.org/r/382667
[08:55:53] _joe_: yeah, so at least on the level of what confctl prints, these were depooled, but maybe it failed in pybal
[08:56:28] <_joe_> moritzm: what confctl does and what pybal does is different
[08:56:42] <_joe_> pybal won't depool servers if too many are already down
[08:57:25] (PS2) Marostegui: db1068: Update socket path [puppet] - https://gerrit.wikimedia.org/r/382380 (https://phabricator.wikimedia.org/T168661)
[08:58:02] (CR) Giuseppe Lavagetto: [C: 1] pybal: make check_pybal distinguish between warning and critical [puppet] - https://gerrit.wikimedia.org/r/382667 (owner: Ema)
[08:58:23] Operations, Traffic, Wikidata, wikiba.se, and 2 others: [Task] move wikiba.se webhosting to wikimedia misc-cluster
- https://phabricator.wikimedia.org/T99531#3663768 (Addshore) >>! In T99531#3663637, @Lydia_Pintscher wrote: > Let's stick with ensure => latest. Having to find the person who can ac...
[08:58:40] (CR) Ema: [C: 2] pybal: make check_pybal distinguish between warning and critical [puppet] - https://gerrit.wikimedia.org/r/382667 (owner: Ema)
[08:58:49] sure, I know. but there's no good way to notice that pybal failed here
[09:00:04] we should also have an icinga check for depooled servers, if a server is depooled for hardware maintenance, that alert can be acknowledged until it's fixed, in addition to those two image scalers I found two further hosts in codfw which were depooled w/o any apparent logs in SAL or Phabricator
[09:00:14] I'll file a Phab task for that later
[09:01:19] !log upgrade remaining app servers in codfw to HHVM 3.18.5
[09:01:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:06:40] <_joe_> moritzm: we do have a tool that checks that the server is effectively depooled from pybal as well
[09:06:50] <_joe_> I'll add it to the appservers as well
[09:14:17] ok, nice!
[09:15:29] I am going to restart jenkins
[09:17:12] (it is back)
[09:17:24] !log Restarted the CI Jenkins for plugin upgrades
[09:17:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:18:48] (CR) Alexandros Kosiaris: [C: 1] package_builder: add support for WIKIMEDIA_EXPERIMENTAL [puppet] - https://gerrit.wikimedia.org/r/382403 (owner: Ema)
[09:24:36] (PS1) Jcrespo: Remove dbstore2 role, make dbstore default to the new socket location [puppet] - https://gerrit.wikimedia.org/r/382672 (https://phabricator.wikimedia.org/T168303)
[09:26:33] (CR) Alexandros Kosiaris: "Yeah this is OK as a first step. The apache issue is mostly related to how that module is structured.
It will need to be refactored before" [puppet] - https://gerrit.wikimedia.org/r/382336 (owner: Dzahn)
[09:26:42] (PS2) Jcrespo: Remove dbstore2 role, make dbstore default to the new socket location [puppet] - https://gerrit.wikimedia.org/r/382672 (https://phabricator.wikimedia.org/T168303)
[09:36:16] Operations, DBA, Gerrit, Patch-For-Review, Release-Engineering-Team (Backlog): Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#3663819 (Paladox) >>! In T145885#3663481, @Marostegui wrote: >>>! In T145885#3555784, @Paladox wrote:...
[09:37:27] (PS1) Jcrespo: dbstore: Reduce memory pressure to reduece the likelyhood of swapping [puppet] - https://gerrit.wikimedia.org/r/382673 (https://phabricator.wikimedia.org/T168303)
[09:38:14] (CR) Marostegui: [C: 1] dbstore: Reduce memory pressure to reduece the likelyhood of swapping [puppet] - https://gerrit.wikimedia.org/r/382673 (https://phabricator.wikimedia.org/T168303) (owner: Jcrespo)
[09:40:50] (PS1) Giuseppe Lavagetto: role::cache::base: abstract varnish logging to class [puppet] - https://gerrit.wikimedia.org/r/382674
[09:40:52] (PS1) Giuseppe Lavagetto: cache: convert kafka::webrequest to profile [puppet] - https://gerrit.wikimedia.org/r/382675
[09:41:36] (CR) jerkins-bot: [V: -1] cache: convert kafka::webrequest to profile [puppet] - https://gerrit.wikimedia.org/r/382675 (owner: Giuseppe Lavagetto)
[09:41:57] (CR) Marostegui: [C: 1] "This looks good too: https://puppet-compiler.wmflabs.org/compiler02/8213/" [puppet] - https://gerrit.wikimedia.org/r/382672 (https://phabricator.wikimedia.org/T168303) (owner: Jcrespo)
[09:42:11] (CR) Alexandros Kosiaris: [C: 2] scap::dsh: Create the parsoid-canaries group [puppet] - https://gerrit.wikimedia.org/r/382416 (https://phabricator.wikimedia.org/T177374) (owner: Alexandros Kosiaris)
[09:42:20] (PS4) Alexandros Kosiaris: scap::dsh:
Create the parsoid-canaries group [puppet] - https://gerrit.wikimedia.org/r/382416 (https://phabricator.wikimedia.org/T177374)
[09:43:21] (CR) Jcrespo: "https://puppet-compiler.wmflabs.org/compiler02/8214/" [puppet] - https://gerrit.wikimedia.org/r/382672 (https://phabricator.wikimedia.org/T168303) (owner: Jcrespo)
[09:45:11] PROBLEM - puppet last run on mw2180 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 10 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[hhvm-dbg]
[09:45:20] (CR) Elukey: [C: 1] Remove dbstore2 role, make dbstore default to the new socket location [puppet] - https://gerrit.wikimedia.org/r/382672 (https://phabricator.wikimedia.org/T168303) (owner: Jcrespo)
[09:45:30] Operations, DBA, Gerrit, Patch-For-Review, Release-Engineering-Team (Backlog): Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#3663883 (Paladox) I guess we can remove DBA now as there is nothing here now for the DBA to do?
[09:46:01] PROBLEM - puppet last run on mw2177 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 15 minutes ago with 1 failures.
Failed resources (up to 3 shown): Package[hhvm-dbg]
[09:48:41] (CR) Alexandros Kosiaris: [C: -1] "Useless pedantic typo comment about commit message" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/382673 (https://phabricator.wikimedia.org/T168303) (owner: Jcrespo)
[09:51:59] (PS2) Giuseppe Lavagetto: cache: convert kafka::webrequest to profile [puppet] - https://gerrit.wikimedia.org/r/382675
[09:52:01] (PS2) Jcrespo: dbstore: Reduce memory pressure to minimize the likelihood of swapping [puppet] - https://gerrit.wikimedia.org/r/382673 (https://phabricator.wikimedia.org/T168303)
[09:52:30] (CR) Jcrespo: "Thanks alex" [puppet] - https://gerrit.wikimedia.org/r/382673 (https://phabricator.wikimedia.org/T168303) (owner: Jcrespo)
[09:52:32] (CR) Elukey: [C: 1] "After a chat with Jaime/Manuel on IRC these numbers should help a bit avoiding these swap situations, so I am ok to test them on dbstore10" [puppet] - https://gerrit.wikimedia.org/r/382673 (https://phabricator.wikimedia.org/T168303) (owner: Jcrespo)
[09:52:41] (CR) jerkins-bot: [V: -1] cache: convert kafka::webrequest to profile [puppet] - https://gerrit.wikimedia.org/r/382675 (owner: Giuseppe Lavagetto)
[09:53:42] !log stop replication (eventlogging_sync + mysql) on dbstore1002 as prep step for mysql restart
[09:53:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:55:22] (PS1) Ema: pybal: add missing curly bracket to check_pybal [puppet] - https://gerrit.wikimedia.org/r/382676
[09:56:02] (CR) Giuseppe Lavagetto: [C: 1] pybal: add missing curly bracket to check_pybal [puppet] - https://gerrit.wikimedia.org/r/382676 (owner: Ema)
[09:56:34] (PS2) Ema: pybal: add missing curly bracket to check_pybal [puppet] - https://gerrit.wikimedia.org/r/382676
[09:56:57] (CR) Ema: [V: 2 C: 2] pybal: add missing curly bracket to check_pybal [puppet] - https://gerrit.wikimedia.org/r/382676 (owner: Ema)
[10:00:52] RECOVERY - puppet last run on mw2177 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [10:01:07] (03CR) 10Jcrespo: [C: 032] Remove dbstore2 role, make dbstore default to the new socket location [puppet] - 10https://gerrit.wikimedia.org/r/382672 (https://phabricator.wikimedia.org/T168303) (owner: 10Jcrespo) [10:01:09] (03PS3) 10Jcrespo: Remove dbstore2 role, make dbstore default to the new socket location [puppet] - 10https://gerrit.wikimedia.org/r/382672 (https://phabricator.wikimedia.org/T168303) [10:01:11] (03CR) 10Alexandros Kosiaris: [C: 031] dbstore: Reduce memory pressure to minimize the likelihood of swapping [puppet] - 10https://gerrit.wikimedia.org/r/382673 (https://phabricator.wikimedia.org/T168303) (owner: 10Jcrespo) [10:02:39] (03PS3) 10Jcrespo: dbstore: Reduce memory pressure to minimize the likelihood of swapping [puppet] - 10https://gerrit.wikimedia.org/r/382673 (https://phabricator.wikimedia.org/T168303) [10:03:38] (03CR) 10Jcrespo: [C: 032] dbstore: Reduce memory pressure to minimize the likelihood of swapping [puppet] - 10https://gerrit.wikimedia.org/r/382673 (https://phabricator.wikimedia.org/T168303) (owner: 10Jcrespo) [10:10:13] RECOVERY - puppet last run on mw2180 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [10:12:25] (03PS1) 10Jcrespo: dbstore: Move socket location to the right dir /run/mysqld [puppet] - 10https://gerrit.wikimedia.org/r/382678 (https://phabricator.wikimedia.org/T168303) [10:13:46] (03CR) 10Jcrespo: [C: 032] dbstore: Move socket location to the right dir /run/mysqld [puppet] - 10https://gerrit.wikimedia.org/r/382678 (https://phabricator.wikimedia.org/T168303) (owner: 10Jcrespo) [10:17:16] !log upgrade remaining job runners in codfw to HHVM 3.18.5 [10:17:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:21:01] PROBLEM - puppet last run on mw2231 is CRITICAL: CRITICAL: Puppet has 1 failures. 
Last run 23 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[hhvm-dbg] [10:31:01] RECOVERY - puppet last run on mw2231 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [10:43:48] !log restart replication (mysql+ eventlogging_sync) on dbstore1002 after mysql restart [10:43:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:47:59] !log upgrade remaining video scalers in codfw to HHVM 3.18.5 [10:48:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:51:44] !log upgrading HHVM on script runners to HHVM 3.18.5 [10:51:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:55:01] PROBLEM - DPKG on terbium is CRITICAL: DPKG CRITICAL dpkg reports broken packages [10:56:01] RECOVERY - DPKG on terbium is OK: All packages OK [11:00:41] PROBLEM - puppet last run on mw2247 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 20 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[hhvm],Package[hhvm-dbg] [11:03:47] !log installing git security updates on trusty (Debian already fixed) [11:03:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:10:31] !log rolling restart of all the druid daemons on druid100[1-6] to pick up new logging changes [11:10:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:10:41] RECOVERY - puppet last run on mw2247 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [11:17:27] (03PS1) 10Giuseppe Lavagetto: role::cache::base: convert to profile [1] [puppet] - 10https://gerrit.wikimedia.org/r/382683 [11:17:30] (03PS1) 10Giuseppe Lavagetto: role::cache::base: convert to profile [2] [puppet] - 10https://gerrit.wikimedia.org/r/382684 [11:18:23] (03CR) 10jerkins-bot: [V: 04-1] role::cache::base: convert to profile [1] [puppet] - 10https://gerrit.wikimedia.org/r/382683 (owner: 10Giuseppe Lavagetto) [11:18:36] 
<_joe_> oh yeah, yeah [11:20:42] hah [11:21:07] <_joe_> bblack: ema lured me into trying to take a stab at it [11:21:08] !log installing gdk-pixbuf security updates [11:21:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:21:27] <_joe_> it is *very* preliminary and I guess the patch series will be 20-30 patches long [11:21:51] honestly, it would probably be better to do some cleanup refactoring before touching profile:: [11:22:25] right now, even if you construct some sort of mental model of the coding style / module layout that the cache stuff appears to be following, things are wrongly-scoped/divided by any model [11:22:36] <_joe_> well l I am doing both [11:23:09] <_joe_> let me try to prepare the first few patches [11:24:10] I really think at the end of the day, we need more layers somewhere for this case. maybe structured something like: [11:24:15] <_joe_> I'm thinking of just refactor role::cache::base [11:24:22] <_joe_> *refactoring [11:24:36] <_joe_> and see how that goes [11:24:45] (03PS1) 10Muehlenhoff: Add library hint for gdk-pixbuf [puppet] - 10https://gerrit.wikimedia.org/r/382686 [11:24:47] <_joe_> role::cache::instances is way trickier :) [11:25:19] modules/varnish - generic varnish stuff; modules/tlsproxy - generic tlsproxy stuff like it already is (consuming modules/nginx...); modules/cacheproxy - the very-wmf-specific combination of 2x modules/varnish instances + 1x modules/tlsproxy instance with appropriate cluster parameterization [11:25:30] and then we can put the role/profile stuff on top of modules/cacheproxy per-cluster [11:25:50] role::cache::instances really doesn't belong in roles/profiles, it belongs in that hypothetical cacheproxy module [11:26:05] (03PS2) 10Muehlenhoff: Add library hint for gdk-pixbuf [puppet] - 10https://gerrit.wikimedia.org/r/382686 [11:27:02] (as do the "wikimedia" -level shared VCL templates) [11:27:55] <_joe_> bblack: ok I wanted to use the name "modules/cache" [11:28:00] <_joe_> 
cacheproxy is better though [11:28:10] <_joe_> but we came to the same conclusion, that looks promising [11:28:12] <_joe_> :) [11:28:46] (03CR) 10Muehlenhoff: [C: 032] Add library hint for gdk-pixbuf [puppet] - 10https://gerrit.wikimedia.org/r/382686 (owner: 10Muehlenhoff) [11:34:32] !log repooling mw2244/mw2245 (image scalers). they were previosly depooled, but there's no tickets or SAL entries indicating hardware maintenance, so they were probably simply forgotten to repool [11:34:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:41:18] _joe_: also I won't do it now because it will complicate refactoring, but it would've been far less confusing to name role::cache::instances as role::cache::instancepair . Note to past self: don't use plurals in class names :P [11:46:44] !log cp1* (eqiad caches) - upgrade to nginx-1.13.5-1+wmf1~jessie1 to match all the other sites [11:46:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:03:28] 10Operations, 10LDAP-Access-Requests, 10WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T177599#3664202 (10Pablo-WMDE) [12:22:41] PROBLEM - Varnish HTTP text-backend - port 3128 on cp4028 is CRITICAL: connect to address 10.128.0.128 and port 3128: Connection refused [12:23:41] RECOVERY - Varnish HTTP text-backend - port 3128 on cp4028 is OK: HTTP OK: HTTP/1.1 200 OK - 178 bytes in 0.158 second response time [12:26:02] (03PS5) 10Ema: package_builder: add support for WIKIMEDIA_EXPERIMENTAL [puppet] - 10https://gerrit.wikimedia.org/r/382403 [12:26:07] (03CR) 10Ema: [V: 032 C: 032] package_builder: add support for WIKIMEDIA_EXPERIMENTAL [puppet] - 10https://gerrit.wikimedia.org/r/382403 (owner: 10Ema) [12:26:30] moritzm: ok to merge your changes too? 
[12:26:42] (Add library hint for gdk-pixbuf) [12:31:57] ah,yes, please [12:32:12] (03PS1) 10Muehlenhoff: Use readline in generate-debdeploy-spec [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/382689 [12:32:30] moritzm: done! [12:41:02] PROBLEM - graphite.wikimedia.org on graphite1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:41:36] gah, I'll take a look [12:42:01] RECOVERY - graphite.wikimedia.org on graphite1001 is OK: HTTP OK: HTTP/1.1 200 OK - 1547 bytes in 2.117 second response time [12:47:59] <_joe_> bblack: I'd call it cacheproxy::instance_pair probably [12:48:12] <_joe_> that class is basically not a role [12:51:30] 10Operations, 10Traffic, 10Goal, 10User-fgiunchedi: Add Prometheus client support for varnish/statsd metrics daemons - https://phabricator.wikimedia.org/T177199#3664279 (10ema) p:05Triage>03Normal [12:53:17] (03PS6) 10Muehlenhoff: Add initial profile for ferm rules shared by all labstore hosts [puppet] - 10https://gerrit.wikimedia.org/r/353508 (https://phabricator.wikimedia.org/T165136) [12:54:43] !log repooling mw2248-mw2250 (job runners). they were previosly depooled, but there's no tickets or SAL entries indicating hardware maintenance, so they were probably simply forgotten to repool [12:54:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:01:57] 10Operations, 10Traffic, 10Goal, 10User-fgiunchedi: Add Prometheus client support for varnish/statsd metrics daemons - https://phabricator.wikimedia.org/T177199#3664281 (10fgiunchedi) IMO we could approach the problem of getting the stats above to Prometheus in at least two ways: 1. Import the Prometheus... [13:09:59] (03CR) 10Ladsgroup: "This needs to be merged before deployment of wmf-3, otherwise we will have lots of logspam for Ops. 
Merging it in Monday or so should be f" [puppet] - 10https://gerrit.wikimedia.org/r/352574 (https://phabricator.wikimedia.org/T140890) (owner: 10Ladsgroup) [13:12:54] (03CR) 10Ladsgroup: "This is good to merge, please do it soon before dropping of the table (in that case the dumping script will fail probably)" [puppet] - 10https://gerrit.wikimedia.org/r/352797 (https://phabricator.wikimedia.org/T140890) (owner: 10Ladsgroup) [13:17:57] (03PS1) 10Ladsgroup: labs: do not replicate wb_entity_per_page table [puppet] - 10https://gerrit.wikimedia.org/r/382694 (https://phabricator.wikimedia.org/T95685) [13:20:24] (03CR) 10Ladsgroup: "It's good to deploy it after end of the next week (Tuesday) and after opening the term_full_entity_id (T167114) but before the dropping th" [puppet] - 10https://gerrit.wikimedia.org/r/382694 (https://phabricator.wikimedia.org/T95685) (owner: 10Ladsgroup) [13:20:58] (03PS1) 10Filippo Giunchedi: prometheus: add conntrack and edac default collectors [puppet] - 10https://gerrit.wikimedia.org/r/382695 (https://phabricator.wikimedia.org/T177196) [13:22:03] 10Operations, 10Analytics, 10netops, 10User-Elukey: Review ACLs for the Analytics VLAN - https://phabricator.wikimedia.org/T157435#3664332 (10elukey) 05Open>03stalled [13:22:34] 10Operations, 10monitoring: Monitor hardware thermal issues - https://phabricator.wikimedia.org/T125205#3664335 (10elukey) [13:22:36] 10Operations, 10ops-eqiad, 10Analytics-Cluster, 10Analytics-Kanban, 10User-Elukey: Analytics hosts showed high temperature alarms - https://phabricator.wikimedia.org/T132256#3664334 (10elukey) 05Open>03stalled [13:23:08] 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi: Port non-deprecated Diamond collectors to Prometheus - https://phabricator.wikimedia.org/T177196#3664336 (10fgiunchedi) [13:32:17] 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi: Port non-deprecated Diamond collectors to Prometheus - https://phabricator.wikimedia.org/T177196#3664350 (10fgiunchedi) 
[13:33:25] 10Operations, 10Ops-Access-Requests, 10Analytics: analytics-privatedata-users access for Jeff Green - https://phabricator.wikimedia.org/T177602#3664354 (10Jgreen) [13:35:58] 10Operations, 10Ops-Access-Requests, 10Analytics: analytics-privatedata-users access for Jeff Green - https://phabricator.wikimedia.org/T177602#3664354 (10elukey) This guy seems trustable, approved :) [13:38:37] (03PS1) 10Ottomata: Install analytics_cluster hadoop client on flerovium and furud [puppet] - 10https://gerrit.wikimedia.org/r/382701 (https://phabricator.wikimedia.org/T176505) [13:43:38] (03PS2) 10Giuseppe Lavagetto: role::cache::base: abstract varnish logging to class [puppet] - 10https://gerrit.wikimedia.org/r/382674 [13:43:40] (03PS3) 10Giuseppe Lavagetto: cache: convert kafka::webrequest to profile [puppet] - 10https://gerrit.wikimedia.org/r/382675 [13:43:42] (03PS2) 10Giuseppe Lavagetto: role::cache::base: convert to profile [1] [puppet] - 10https://gerrit.wikimedia.org/r/382683 [13:43:44] (03PS2) 10Giuseppe Lavagetto: role::cache::base: convert to profile [2] [puppet] - 10https://gerrit.wikimedia.org/r/382684 [13:44:10] (03CR) 10Faidon Liambotis: [C: 031] Install analytics_cluster hadoop client on flerovium and furud [puppet] - 10https://gerrit.wikimedia.org/r/382701 (https://phabricator.wikimedia.org/T176505) (owner: 10Ottomata) [13:44:36] (03CR) 10Ottomata: [C: 032] Install analytics_cluster hadoop client on flerovium and furud [puppet] - 10https://gerrit.wikimedia.org/r/382701 (https://phabricator.wikimedia.org/T176505) (owner: 10Ottomata) [13:45:24] 10Operations, 10Ops-Access-Requests: IRC operator request for Freenode #wikimedia-operations for @Dereckson - https://phabricator.wikimedia.org/T177493#3660733 (10MarcoAurelio) And we need a Task for this? 
`/me approves` 👍🏻 [13:52:12] (03PS1) 10Ottomata: Use analytics_cluster::client role instead of new analytics_cluster::hadoop::client [puppet] - 10https://gerrit.wikimedia.org/r/382704 [13:52:44] (03CR) 10jerkins-bot: [V: 04-1] Use analytics_cluster::client role instead of new analytics_cluster::hadoop::client [puppet] - 10https://gerrit.wikimedia.org/r/382704 (owner: 10Ottomata) [13:53:38] (03PS2) 10Ottomata: analytics_cluster::client role instead of analytics_cluster::hadoop::client [puppet] - 10https://gerrit.wikimedia.org/r/382704 [13:54:52] (03CR) 10Ottomata: [C: 032] analytics_cluster::client role instead of analytics_cluster::hadoop::client [puppet] - 10https://gerrit.wikimedia.org/r/382704 (owner: 10Ottomata) [13:56:18] (03PS1) 10Ottomata: Revert "analytics_cluster::client role instead of analytics_cluster::hadoop::client" [puppet] - 10https://gerrit.wikimedia.org/r/382705 [13:56:52] (03CR) 10jerkins-bot: [V: 04-1] Revert "analytics_cluster::client role instead of analytics_cluster::hadoop::client" [puppet] - 10https://gerrit.wikimedia.org/r/382705 (owner: 10Ottomata) [14:02:50] (03PS1) 10Muehlenhoff: Install python-dateutils via puppet [puppet] - 10https://gerrit.wikimedia.org/r/382707 [14:04:21] RECOVERY - Check whether ferm is active by checking the default input chain on ftp-internal is OK: OK ferm input default policy is set [14:05:03] (03PS2) 10Ottomata: Bring back hadoop::client role, move cdh module hiera to common [puppet] - 10https://gerrit.wikimedia.org/r/382705 [14:05:34] (03CR) 10jerkins-bot: [V: 04-1] Bring back hadoop::client role, move cdh module hiera to common [puppet] - 10https://gerrit.wikimedia.org/r/382705 (owner: 10Ottomata) [14:06:24] (03PS3) 10Ottomata: Bring back hadoop::client role, move cdh module hiera to common [puppet] - 10https://gerrit.wikimedia.org/r/382705 [14:08:10] (03CR) 10Ottomata: [C: 032] Bring back hadoop::client role, move cdh module hiera to common [puppet] - 10https://gerrit.wikimedia.org/r/382705 (owner: 
10Ottomata) [14:09:57] 10Operations, 10LDAP-Access-Requests, 10WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T177599#3664427 (10Aklapper) (Could you explain why this is tagged with #wmf-nda-requests?) [14:11:24] PROBLEM - puppet last run on flerovium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 10 seconds ago with 1 failures. Failed resources (up to 3 shown): Package[hadoop-client] [14:11:49] (03PS1) 10Ottomata: Move hadoop/logstash.yaml to common/ [puppet] - 10https://gerrit.wikimedia.org/r/382708 [14:12:30] (03CR) 10Ottomata: [C: 032] Move hadoop/logstash.yaml to common/ [puppet] - 10https://gerrit.wikimedia.org/r/382708 (owner: 10Ottomata) [14:14:26] (03PS1) 10Ottomata: Put hadoop/logstash.yaml in common/profile where it belongs [puppet] - 10https://gerrit.wikimedia.org/r/382709 [14:14:28] (03PS7) 10Rush: Add role::labs::libraryupgrader puppet configuration [puppet] - 10https://gerrit.wikimedia.org/r/372213 (https://phabricator.wikimedia.org/T173478) (owner: 10Legoktm) [14:15:02] (03CR) 10jerkins-bot: [V: 04-1] Add role::labs::libraryupgrader puppet configuration [puppet] - 10https://gerrit.wikimedia.org/r/372213 (https://phabricator.wikimedia.org/T173478) (owner: 10Legoktm) [14:15:09] 10Operations, 10LDAP-Access-Requests, 10WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T177599#3664451 (10Addshore) >>! In T177599#3664427, @Aklapper wrote: > (Could you explain why this is tagged with #wmf-nda-requests?) This requires an NDA to be signed, b... [14:15:13] 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi: Port non-deprecated Diamond collectors to Prometheus - https://phabricator.wikimedia.org/T177196#3664452 (10fgiunchedi) [14:16:15] (03PS1) 10BBlack: purging: put BE before FE in varnishes list [puppet] - 10https://gerrit.wikimedia.org/r/382710 [14:17:04] PROBLEM - puppet last run on analytics1052 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [14:17:42] ottomata: Could not find data item profile::hadoop::logstash::enabled in any Hiera data file --^ [14:17:52] (03CR) 10Ottomata: [C: 032] Put hadoop/logstash.yaml in common/profile where it belongs [puppet] - 10https://gerrit.wikimedia.org/r/382709 (owner: 10Ottomata) [14:17:54] PROBLEM - puppet last run on analytics1064 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:18:09] yaya [14:18:13] (03PS2) 10BBlack: purging: put BE before FE in varnishes list [puppet] - 10https://gerrit.wikimedia.org/r/382710 (https://phabricator.wikimedia.org/T133821) [14:18:37] ah it wasn't merged, didn't see the other patch [14:18:39] gooood [14:21:24] PROBLEM - puppet last run on analytics1029 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:21:24] PROBLEM - puppet last run on analytics1054 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:21:24] PROBLEM - puppet last run on analytics1036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:21:24] PROBLEM - puppet last run on analytics1063 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:21:24] PROBLEM - puppet last run on analytics1066 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:21:24] RECOVERY - puppet last run on flerovium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:22:24] PROBLEM - puppet last run on analytics1037 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [14:23:25] (03CR) 10Filippo Giunchedi: [C: 031] Install python-dateutils via puppet [puppet] - 10https://gerrit.wikimedia.org/r/382707 (owner: 10Muehlenhoff) [14:27:11] 10Operations, 10Continuous-Integration-Infrastructure, 10Jenkins, 10Release-Engineering-Team (Kanban): Upgrade jenkins to 2.73.1 (new lts release) - https://phabricator.wikimedia.org/T168644#3664532 (10MoritzMuehlenhoff) apt.wikimedia.org has been updated to 2.73.1. Let me know if I can help with anything... [14:27:33] 10Operations, 10AbuseFilter, 10ApiFeatureUsage, 10Domains, and 12 others: Create a fibragratis.org with RequestFTTH extension - https://phabricator.wikimedia.org/T177606#3664533 (1012345) a:0312345 [14:28:15] 10Operations, 10AbuseFilter, 10ApiFeatureUsage, 10Collaboration-Team-Triage, and 14 others: Create a fibragratis.org with RequestFTTH extension - https://phabricator.wikimedia.org/T177606#3664537 (10Joaquinito2018) [14:28:36] (03CR) 10Rush: Add initial profile for ferm rules shared by all labstore hosts (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/353508 (https://phabricator.wikimedia.org/T165136) (owner: 10Muehlenhoff) [14:28:38] is that spam ^^ ? 
"Create a fibragratis.org with RequestFTTH extension" [14:28:44] (03PS7) 10Rush: Add initial profile for ferm rules shared by all labstore hosts [puppet] - 10https://gerrit.wikimedia.org/r/353508 (https://phabricator.wikimedia.org/T165136) (owner: 10Muehlenhoff) [14:34:27] (03PS2) 10Eevans: cassandra: move machines from restbase to restbase_ng cluster [puppet] - 10https://gerrit.wikimedia.org/r/382506 (https://phabricator.wikimedia.org/T177501) [14:36:50] (03CR) 10Eevans: "> For more naming bikeshedding: since we're going to fold all" [puppet] - 10https://gerrit.wikimedia.org/r/382506 (https://phabricator.wikimedia.org/T177501) (owner: 10Eevans) [14:38:11] 10Operations, 10ApiFeatureUsage, 10Collaboration-Team-Triage, 10MediaWiki-API, and 8 others: Create a fibragratis.org with RequestFTTH extension - https://phabricator.wikimedia.org/T177606#3664586 (10Joaquinito2018) 05Invalid>03Open Register to byet.host. [14:38:37] 10Operations, 10ApiFeatureUsage, 10Collaboration-Team-Triage, 10MediaWiki-API, and 8 others: Create a fibragratis.org with RequestFTTH extension - https://phabricator.wikimedia.org/T177606#3664590 (10Joaquinito2018) With free hosting, and free subdomain. 
[14:38:48] (03PS1) 10Hashar: rake task to continuously run puppet-lint [puppet] - 10https://gerrit.wikimedia.org/r/382716 [14:41:41] 10Operations, 10ApiFeatureUsage, 10Collaboration-Team-Triage, 10MediaWiki-API, and 7 others: Create a fibragratis.org with RequestFTTH extension - https://phabricator.wikimedia.org/T177606#3664611 (10Joaquinito2018) [14:43:42] (03PS3) 10BBlack: purging: put BE before FE in varnishes list [puppet] - 10https://gerrit.wikimedia.org/r/382710 (https://phabricator.wikimedia.org/T133821) [14:44:16] (03CR) 10BBlack: [C: 032] purging: put BE before FE in varnishes list [puppet] - 10https://gerrit.wikimedia.org/r/382710 (https://phabricator.wikimedia.org/T133821) (owner: 10BBlack) [14:46:18] (03PS2) 10Hashar: rake task to continuously run puppet-lint [puppet] - 10https://gerrit.wikimedia.org/r/382716 [14:46:21] RECOVERY - puppet last run on analytics1036 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:46:21] RECOVERY - puppet last run on analytics1063 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [14:46:22] RECOVERY - puppet last run on analytics1066 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:47:01] RECOVERY - puppet last run on analytics1052 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:47:52] RECOVERY - puppet last run on analytics1064 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:48:33] 10Operations, 10Cloud-VPS, 10Patch-For-Review: Ferm rules for labstore NFS hosts - https://phabricator.wikimedia.org/T165136#3664623 (10chasemp) p:05Triage>03Normal [14:48:40] 10Operations, 10Wikimedia-log-errors: mw1209 /usr/bin/timeout: the monitored command dumped core - https://phabricator.wikimedia.org/T171903#3664625 (10MoritzMuehlenhoff) @herron : Thanks for addressing it on mw1262. 
I have some WIP Icinga check for this at https://gerrit.wikimedia.org/r/#/c/359120, I'll pick... [14:49:57] (03PS1) 10Rush: labstore: set base::firewall comment with notice [puppet] - 10https://gerrit.wikimedia.org/r/382718 [14:50:22] 10Operations, 10Gerrit, 10Patch-For-Review, 10Release-Engineering-Team (Backlog): Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#3664629 (10Dzahn) [14:50:33] (03CR) 10jerkins-bot: [V: 04-1] labstore: set base::firewall comment with notice [puppet] - 10https://gerrit.wikimedia.org/r/382718 (owner: 10Rush) [14:51:22] RECOVERY - puppet last run on analytics1029 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:51:22] RECOVERY - puppet last run on analytics1054 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:52:21] RECOVERY - puppet last run on analytics1037 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:54:40] (03PS3) 10Giuseppe Lavagetto: role::cache::base: convert to profile [2] [puppet] - 10https://gerrit.wikimedia.org/r/382684 [14:54:42] (03PS1) 10Giuseppe Lavagetto: prometheus::varnish::exporter: convert to role/profile [puppet] - 10https://gerrit.wikimedia.org/r/382720 [14:54:44] (03PS1) 10Giuseppe Lavagetto: cacheproxy: move some content to new module [puppet] - 10https://gerrit.wikimedia.org/r/382721 [14:54:45] 10Operations, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Upgrade to Varnish 5 - https://phabricator.wikimedia.org/T168529#3664636 (10elukey) One thing that I noticed now from https://github.com/varnishcache/varnish-cache/blob/master/doc/changes.rst is the following: ``` Varnish Cache 5.2-R... 
[14:54:54] (03PS3) 10Hashar: rake task to continuously run puppet-lint [puppet] - 10https://gerrit.wikimedia.org/r/382716 [14:55:59] (03PS2) 10Rush: labstore: set base::firewall comment with notice [puppet] - 10https://gerrit.wikimedia.org/r/382718 (https://phabricator.wikimedia.org/T165136) [14:56:06] (03CR) 10jerkins-bot: [V: 04-1] cacheproxy: move some content to new module [puppet] - 10https://gerrit.wikimedia.org/r/382721 (owner: 10Giuseppe Lavagetto) [14:56:16] (03PS3) 10Rush: labstore: set base::firewall comment with notice [puppet] - 10https://gerrit.wikimedia.org/r/382718 (https://phabricator.wikimedia.org/T165136) [14:56:42] (03CR) 10Zoranzoki21: [C: 031] "I agree with MarcoAurelio.. But, I will put +1 because is no problem with patch. But, no deploy this" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382357 (https://phabricator.wikimedia.org/T177448) (owner: 10Jayprakash12345) [14:57:30] (03PS4) 10Rush: labstore: set commented base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/382718 (https://phabricator.wikimedia.org/T165136) [14:58:04] (03PS5) 10Rush: labstore: set commented base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/382718 (https://phabricator.wikimedia.org/T165136) [14:58:07] (03PS4) 10Muehlenhoff: Add Icinga check for depletion of HHVM CLI cache [puppet] - 10https://gerrit.wikimedia.org/r/359120 (https://phabricator.wikimedia.org/T161598) [14:58:11] (03CR) 10Muehlenhoff: Add Icinga check for depletion of HHVM CLI cache (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/359120 (https://phabricator.wikimedia.org/T161598) (owner: 10Muehlenhoff) [15:01:41] 10Operations, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Upgrade to Varnish 5 - https://phabricator.wikimedia.org/T168529#3664657 (10BBlack) We're moving to 5.1.3 with this upgrade. 
5.2.0 is a little too bleeding-edge for now :) [15:01:44] (03CR) 10Rush: [C: 032] labstore: set commented base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/382718 (https://phabricator.wikimedia.org/T165136) (owner: 10Rush) [15:05:11] (03CR) 10Hashar: "The aim is to run the wmf_styleguide continuously so we can get some kind of progress report per day and per module." [puppet] - 10https://gerrit.wikimedia.org/r/382716 (owner: 10Hashar) [15:06:26] (03CR) 10Muehlenhoff: Add initial profile for ferm rules shared by all labstore hosts (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/353508 (https://phabricator.wikimedia.org/T165136) (owner: 10Muehlenhoff) [15:10:13] (03CR) 10Gehel: Docstrings: use Google Style (036 comments) [software/cumin] - 10https://gerrit.wikimedia.org/r/382479 (https://phabricator.wikimedia.org/T159308) (owner: 10Volans) [15:10:41] PROBLEM - MariaDB Slave Lag: s5 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 605.99 seconds [15:19:41] RECOVERY - MariaDB Slave Lag: s5 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 256.78 seconds [15:20:04] 10Operations, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Upgrade to Varnish 5 - https://phabricator.wikimedia.org/T168529#3664709 (10ema) >>! In T168529#3664636, @elukey wrote: > all the other python daemons will need to get reviewed (already seeing commits for 5.2 in https://github.com/xci... [15:25:15] !log repooling mw2203 (API server). 
it was previously depooled, but there's no tickets or SAL entries indicating hardware maintenance, so it was probably simply forgotten to repool it [15:25:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:27:52] 10Operations, 10Analytics-Kanban, 10User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3664734 (10elukey) [15:27:53] (03PS1) 10Filippo Giunchedi: Collect APC info [software/hhvm_exporter] - 10https://gerrit.wikimedia.org/r/382728 (https://phabricator.wikimedia.org/T177196) [15:28:48] 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi: Port non-deprecated Diamond collectors to Prometheus - https://phabricator.wikimedia.org/T177196#3664741 (10fgiunchedi) [15:30:30] !log repooling mw2145 (API server). it was previously depooled, but there's no tickets or SAL entries indicating hardware maintenance, so it was probably simply forgotten to repool it [15:30:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:30:54] 10Operations, 10netops: Implement RPKI (Resource Public Key Infrastructure) - https://phabricator.wikimedia.org/T61115#3664762 (10ayounsi) 05Open>03Resolved We're all done here! 
[15:37:32] RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.401 second response time [15:43:29] (03PS3) 10Giuseppe Lavagetto: role::cache::base: abstract varnish logging to class [puppet] - 10https://gerrit.wikimedia.org/r/382674 [15:43:31] (03PS4) 10Giuseppe Lavagetto: cache: convert kafka::webrequest to profile [puppet] - 10https://gerrit.wikimedia.org/r/382675 [15:43:33] (03PS3) 10Giuseppe Lavagetto: role::cache::base: convert to profile [1] [puppet] - 10https://gerrit.wikimedia.org/r/382683 [15:43:35] (03PS4) 10Giuseppe Lavagetto: role::cache::base: convert to profile [2] [puppet] - 10https://gerrit.wikimedia.org/r/382684 [15:43:37] (03PS2) 10Giuseppe Lavagetto: prometheus::varnish::exporter: convert to role/profile [puppet] - 10https://gerrit.wikimedia.org/r/382720 [15:43:39] (03PS2) 10Giuseppe Lavagetto: cacheproxy: move some content to new module [puppet] - 10https://gerrit.wikimedia.org/r/382721 [15:45:23] (03CR) 10jerkins-bot: [V: 04-1] cacheproxy: move some content to new module [puppet] - 10https://gerrit.wikimedia.org/r/382721 (owner: 10Giuseppe Lavagetto) [15:58:09] @seen hashar [15:58:09] mutante: Last time I saw hashar they were changing the nickname to hasharAway, but hasharAway is no longer in channel because he quit the network 00:48:01.2094870 ago. 
The nick change was done in #wikimedia-databases at 10/6/2017 3:05:37 PM (52m31s ago) [16:04:27] @seen wm-bot [16:04:27] paladox: I am right here [16:04:28] mutante ^^ [16:17:22] ACKNOWLEDGEMENT - IPMI Sensor Status on analytics1036 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical, Status = Critical] Herron https://phabricator.wikimedia.org/T177227 [16:17:22] ACKNOWLEDGEMENT - IPMI Sensor Status on analytics1037 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical, Status = Critical] Herron https://phabricator.wikimedia.org/T177227 [16:27:57] (03CR) 10Jdlrobson: "> skin: I would advice to purge this field after 90 days (remove it from this white-list). I guess it could be potentially identifying if " [puppet] - 10https://gerrit.wikimedia.org/r/379829 (https://phabricator.wikimedia.org/T175395) (owner: 10Bmansurov) [16:29:22] 10Operations, 10ops-ulsfo: Multiple systems in ulsfo 1.22 showing PSU failures - https://phabricator.wikimedia.org/T177622#3664980 (10herron) [16:31:05] 10Operations, 10ops-ulsfo: check lvs4002 power supply redundancy - https://phabricator.wikimedia.org/T177623#3664995 (10herron) [16:31:38] 10Operations, 10ops-ulsfo: check cp4007 power supply redundancy - https://phabricator.wikimedia.org/T177624#3665008 (10herron) [16:31:50] 10Operations, 10ops-ulsfo: check cp4007 power supply redundancy - https://phabricator.wikimedia.org/T177624#3665008 (10herron) [16:32:51] 10Operations, 10ops-ulsfo: check cp4008 power supply redundancy - https://phabricator.wikimedia.org/T177625#3665024 (10herron) [16:33:42] ACKNOWLEDGEMENT - IPMI Sensor Status on lvs4002 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] Herron https://phabricator.wikimedia.org/T177622 [16:40:05] (03CR) 10Gehel: [C: 031] "LGTM (and yes, more readable)" [software/cumin] - 
10https://gerrit.wikimedia.org/r/382481 (owner: 10Volans) [16:59:17] 10Operations, 10ops-eqiad, 10DBA: check db1052 power supply redundancy - https://phabricator.wikimedia.org/T177627#3665092 (10herron) [16:59:32] ACKNOWLEDGEMENT - IPMI Sensor Status on db1052 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] Herron https://phabricator.wikimedia.org/T177627 [17:01:49] 10Operations, 10ops-eqiad, 10DBA: check db1054 power supply redundancy - https://phabricator.wikimedia.org/T177628#3665107 (10herron) [17:02:06] ACKNOWLEDGEMENT - IPMI Sensor Status on db1054 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] Herron https://phabricator.wikimedia.org/T177628 [17:04:03] 10Operations, 10ops-eqiad, 10DBA: check db1080 power supply redundancy - https://phabricator.wikimedia.org/T177630#3665141 (10herron) [17:07:05] @seen Ladsgroup [17:07:05] mutante: I have never seen Ladsgroup [17:08:45] 10Operations, 10ops-eqiad, 10Discovery, 10Discovery-Search, 10Elasticsearch: check elastic1022 power supply redundancy - https://phabricator.wikimedia.org/T177631#3665166 (10herron) [17:11:46] 10Operations, 10ops-eqiad: check kafka1022 power supply status - https://phabricator.wikimedia.org/T177633#3665200 (10herron) [17:13:03] mutante it's amir [17:13:11] Amir1 [17:13:12] i mean [17:13:23] ah, right:) thanks [17:13:44] (03PS10) 10Dzahn: start profile for wikiba.se web hosting [puppet] - 10https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531) [17:13:45] your welcome :) [17:14:18] (03CR) 10jerkins-bot: [V: 04-1] start profile for wikiba.se web hosting [puppet] - 10https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531) (owner: 10Dzahn) [17:14:47] 10Operations, 10ops-eqiad: check mc1016 power supply redundancy - https://phabricator.wikimedia.org/T177634#3665219 (10herron) [17:17:42] 10Operations, 10ops-eqiad: 
check mw1200 power supply redundancy - https://phabricator.wikimedia.org/T177635#3665237 (10herron) [17:19:34] (03CR) 10Dzahn: "hmm, right now i think i have these options: write it so that the style check likes it but it will not work when mixed with other roles in" [puppet] - 10https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531) (owner: 10Dzahn) [17:20:29] 10Operations, 10ops-eqiad: check mw1203 power supply redundancy - https://phabricator.wikimedia.org/T177637#3665268 (10herron) [17:20:38] ^ i could write it without adding new violations and the -1 but then it would not work as soon as another role with apache is on the same node [17:21:16] or i could ignore jerkins -1.. or i could stop using Apache module at all and "manually" install Apache in puppet.. or i could refactor the entire Apache module to become "defines" [17:21:21] and i dont like any of these :) [17:23:21] 10Operations, 10ops-codfw: check mw2160 power supply redundancy - https://phabricator.wikimedia.org/T177638#3665280 (10herron) [17:25:02] 10Operations, 10ops-codfw: check mw2176 power supply redundancy - https://phabricator.wikimedia.org/T177639#3665292 (10herron) [17:33:47] (03CR) 10Krinkle: [C: 031] Fixed typeo for responseEnd property. 
[puppet] - 10https://gerrit.wikimedia.org/r/382643 (https://phabricator.wikimedia.org/T104902) (owner: 10Phedenskog) [17:34:43] (03PS2) 10Krinkle: webperf: Fixed typeo for navtiming2 responseEnd property [puppet] - 10https://gerrit.wikimedia.org/r/382643 (https://phabricator.wikimedia.org/T104902) (owner: 10Phedenskog) [17:43:06] (03CR) 10Herron: [C: 032] webperf: Fixed typeo for navtiming2 responseEnd property [puppet] - 10https://gerrit.wikimedia.org/r/382643 (https://phabricator.wikimedia.org/T104902) (owner: 10Phedenskog) [17:43:14] (03PS3) 10Herron: webperf: Fixed typeo for navtiming2 responseEnd property [puppet] - 10https://gerrit.wikimedia.org/r/382643 (https://phabricator.wikimedia.org/T104902) (owner: 10Phedenskog) [17:44:02] PROBLEM - MariaDB Slave Lag: s5 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 614.34 seconds [17:44:36] 10Operations, 10OCG-General, 10Readers-Community-Engagement, 10Epic, and 3 others: [EPIC] (Proposal) Replicate core OCG features and sunset OCG service - https://phabricator.wikimedia.org/T150871#2799667 (10Nemo_bis) I'm not attached to OCG, but https://www.mediawiki.org/wiki/Reading/Web/PDF_Functionality#... [17:45:55] 10Operations, 10ops-eqiad, 10DBA: check db1054 power supply redundancy - https://phabricator.wikimedia.org/T177628#3665356 (10Marostegui) p:05Triage>03High This is s2 master so raising it to high. @Cmjohnson does this host have a led indicator on the PSU so we can see whether this is true or a false posi... [17:46:14] 10Operations, 10ops-eqiad, 10DBA: check db1052 power supply redundancy - https://phabricator.wikimedia.org/T177627#3665359 (10Marostegui) p:05Triage>03High This is s1 master so raising it to high. @Cmjohnson does this host have a led indicator on the PSU so we can see whether this is true or a false posi... 
[17:47:28] 10Operations, 10ops-eqiad, 10DBA: check db1080 power supply redundancy - https://phabricator.wikimedia.org/T177630#3665364 (10Marostegui) p:05Triage>03Normal @Cmjohnson this host is easy to depool so we can test if this is true or a false positive. We can reboot it, change its PDU socket, cable whatever... [17:53:56] (03PS1) 10Bartosz Dziewoński: Disallow most file types from upload to enwikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382749 (https://phabricator.wikimedia.org/T176647) [17:54:20] (03CR) 10BryanDavis: [C: 031] "Description is through and comments from previous reviews have been addressed." [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/380568 (https://phabricator.wikimedia.org/T176624) (owner: 10Sowjanyavemuri) [17:58:33] (03CR) 10Bartosz Dziewoński: "Note for the deployer: InitialiseSettings.php must be synced before CommonSettings.php, otherwise we'll get tons of warnings in the logs d" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382749 (https://phabricator.wikimedia.org/T176647) (owner: 10Bartosz Dziewoński) [17:59:08] 10Operations, 10ops-eqiad, 10DBA: check db1052 power supply redundancy - https://phabricator.wikimedia.org/T177627#3665403 (10Marostegui) According to the logs it is the PSU number 2. ``` /admin1/system1/logs1/log1-> show record5 properties CreationTimestamp = 20170124114652.000000-360 ElementName = Sy... [18:01:29] 10Operations, 10ops-eqiad, 10DBA: check db1054 power supply redundancy - https://phabricator.wikimedia.org/T177628#3665413 (10Marostegui) Looks like PSU #2 as per the logs: ``` /admin1/system1/logs1/log1-> show record1 properties CreationTimestamp = 20170610221238.000000-300 ElementName = System Event... 
[18:03:28] 10Operations, 10ops-eqiad, 10DBA: check db1080 power supply redundancy - https://phabricator.wikimedia.org/T177630#3665424 (10Marostegui) Looks like PSU #2 as per the logs: ``` status=0 status_tag=COMMAND COMPLETED Fri Oct 6 17:45:40 2017 /system1/log1/record8 Targets Properties number=8 sev... [18:08:22] mutante: hey, I just got back [18:08:35] what's up, I was looking for you for the wikiba.se stuff [18:10:21] !log arlolra@tin Started deploy [parsoid/deploy@e437c93]: Updating Parsoid to 772e11bf [18:10:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:23:03] !log arlolra@tin Finished deploy [parsoid/deploy@e437c93]: Updating Parsoid to 772e11bf (duration: 12m 41s) [18:23:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:31:22] (03CR) 10Ladsgroup: "Regarding the first option, does it help if we get it a dedicated VM so it doesn't have any other roles?" [puppet] - 10https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531) (owner: 10Dzahn) [18:38:58] !log catrope@tin Synchronized php-1.31.0-wmf.2/extensions/TwoColConflict/: Unbreak preference changes (T177524) (duration: 00m 47s) [18:39:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:39:05] T177524: "User::loadFromSession called before the end of Setup.php" due to a combination of TwoColConflict and BetaFeatures - https://phabricator.wikimedia.org/T177524 [18:48:43] 10Operations, 10DC-Ops: Review and fix PDU settings for syslog/ntp/email servers - https://phabricator.wikimedia.org/T175341#3665672 (10ayounsi) [18:48:52] Amir1: hey, same here. just got back, was looking for you for the wikiba.se stuff :) [18:49:19] :D [18:49:26] Amir1: my current issue is with the puppet code that i wrote myself, heh [18:49:27] okay, what should we do? [18:50:30] so.. there are some options.. and i like none of them [18:50:35] see my comment on gerrit [18:51:11] but .. 
i think it is "write it so that jenkins likes it now but it cant run together with other roles"... merge it.. use that in labs on an instance with nothing else so it doesnt matter [18:51:33] while using it there.. hope that the apache module gets refactored [18:51:40] or do it ..:p [18:52:07] before it hits production, adjust the code so that it won't conflict with other roles on the prod VM [18:52:23] while jenkins also doesnt hate it.. one way or another [18:52:39] shrug [18:53:00] mutante: I wrote it there, I hope I understood you correctly, but if we get it a dedicated vm, it won't have other roles, right? [18:53:03] (03CR) 10Zoranzoki21: [C: 031] Disallow most file types from upload to enwikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382749 (https://phabricator.wikimedia.org/T176647) (owner: 10Bartosz Dziewoński) [18:53:18] so the first option would be okay [18:53:26] Amir1: yea, true. so that is the right question. _should_ it need a dedicated VM? i dont know [18:53:27] well, okay-ish [18:53:42] I think we have a reason for it now :D [18:54:00] that reason should go away _somehow_ anyways [18:54:05] because it will hit all other roles too [18:54:19] anyone who would try to add a new role/profile using apache, would hit this issue right now [18:54:42] it just doesnt happen that often [18:54:50] and the check is newish [18:55:09] Amir1: so you created a project and instance already, yes? [18:55:25] mutante: not a new project [18:55:30] new instance [18:55:35] wikibase in wikidata-dev [18:55:59] ok, cool [18:56:16] and then the Hiera: settings for this project? [18:56:25] we can use wiki _or_ repo [18:57:23] and you should add the things used in Apache template, server admin and server name [18:57:39] do we want wikibase.wmflabs.org? 
yes, right [18:57:57] wikiba.se.wmflabs.org won't work :) [18:58:42] mutante: yes, wikibase.wmflabs.org for now [18:58:47] I added it to the hiera page [18:59:07] https://wikitech.wikimedia.org/w/index.php?title=Hiera:Wikidata-dev&diff=prev&oldid=1772179 [18:59:35] Amir1: looks good! now another line for the server admin email address [19:00:18] okay, what email address? I'd love an invalid one :D [19:00:48] eh.. not mine either :) i wonder if noreply@wmflabs.org will bounce [19:01:00] but it should really be the project admin , heh [19:01:42] Amir1: root@localhost ? [19:01:51] so we could read it on shell if needed? [19:02:03] will it deliver? [19:02:59] I doubt [19:03:12] I mean labs people are the experts here [19:03:23] as long as it doesnt end up at root@wm :p [19:04:33] :)))) That would be fun [19:05:10] well i remember we got a bunch of root mail from labs.. but then things changed [19:05:35] not entirely sure now, but i think root mail should be delivered to the project admin somehow [19:06:10] maybe it should be like with mailing lists, where you have an "-owner" special address that forwards to multiple admins [19:06:28] wikidata-dev-owner@cloudvps.wmflabs.org shrug [19:06:44] or maybe it already is somehow setup but i just dont know it [19:08:36] Amir1: http://mytrashmail.com/myTrashMail_inbox.aspx?email=wikidata-dev fixed ?:) [19:08:54] we are so public, you can even read the mail [19:10:09] that is the inbox of wikidata-dev@mytrashmail.com [19:10:21] mutante: I was looking for the root [19:10:26] awesome [19:11:06] I just sent one, let's see if it gets it [19:11:54] i think i can only merge the puppet change if i override jenkins -1 ... 
grmbl [19:12:22] let's try something else [19:13:15] yep [19:13:15] (03PS11) 10Dzahn: start profile for wikiba.se web hosting [puppet] - 10https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531) [19:13:21] you can only merge if you remove the -1 [19:13:40] though if it -1, you have to judge if it's a ci error and report it. :) [19:14:26] paladox: to avoid this issue we have to change the entire Apache module [19:14:35] which is used by a bunch of things..ugh [19:14:39] oh [19:15:00] so the new jenkins vote is based on the "violations delta", right [19:15:12] if i add more new issues than fixing old issues, i get -1 and vice versa [19:15:41] yep [19:15:49] that means i can move around all the existing roles like i did and not get any downtvotes.. because even though they do have violations.. i am not adding any NEW ones right now [19:15:53] but in our case here [19:16:01] i am writing an entirely new role/profile [19:16:05] yep [19:16:06] the first time since the check exists [19:16:19] so since i cant avoid adding the new violation when using apache module [19:16:26] i always get a delta of at least -1 [19:16:31] (03CR) 10jerkins-bot: [V: 04-1] start profile for wikiba.se web hosting [puppet] - 10https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531) (owner: 10Dzahn) [19:16:38] because in a brandnew module.. of course i am never fixing any EXISTING issues either [19:17:33] and when using the Apache module i can either do "include ::apache..." and get "includes from another module, bad" [19:18:04] or i can use "class { '::apache..." which either gets "uses class from other module, if it's not in profile" [19:18:40] or it is in profile and then the check likes it .. 
but now you have "class already declared" puppet error as soon as you put this on a node that has ANOTHER role also using Apache [19:19:01] the solution to all this is: [19:19:11] turn the entire Apache module into defined types [19:19:20] then you can use them multiple times and not violate the rules [19:19:42] but that of courses touches $manythings [19:20:07] and it wasnt supposed to have anything to do with wikiba.se [19:20:56] i'm getting picked up for lunch, Amir1, i will make that role usable in labs as soon as i get back today [19:21:31] if you add me to the project with permissions i can just put it on the instance myself and run puppet to check [19:22:01] then i'll add the "proxy" in Horizon and update the ticket with progress [19:22:42] paladox: eh, interesting , the latest jenkins output is not OK or FAILED, it is this: [19:22:45] operations-puppet-tests-docker ABORTED in 3m 00s [19:22:49] sorry, have to go, be back later [19:22:54] ok [19:22:58] ran into timeout [19:23:03] me and jenkins, both :) [19:23:54] lol [19:46:33] 10Operations, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current), and 2 others: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3665899 (10awight) [19:46:35] 10Operations, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current): Give ores admins read access to /srv/log/ores/main.log* - https://phabricator.wikimedia.org/T175736#3601826 (10awight) 05Open>03Resolved [19:51:52] hey folks, I have a problem ssh'ing into staging environment, is there anyone who could help me ? [19:54:57] anyone? [19:58:09] mutante i think it may be related to integration-slave-docker-1001 going down. 
[19:59:23] (03CR) 10Paladox: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531) (owner: 10Dzahn) [19:59:31] PROBLEM - MariaDB Slave Lag: s5 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 608.23 seconds [19:59:58] mutante works now [20:07:56] raynor: which staging environment? Something in Cloud Services? [20:08:43] reading-web-staging.reading-web-staging.equiad.wmflabs [20:09:01] I can access primary.bastion.wmflabs.org [20:09:30] raynor: ok. that's a Cloud VPS project. Lets go over to the #wikimedia-cloud channel and see if we can figure out where things are going wrong for you. [20:09:40] tbh I'm trying to access chromium-pdf.reading-web-staging.eqiad.wmflabs - a new instance created by bmansurov [20:09:47] ok, thx [20:14:31] RECOVERY - MariaDB Slave Lag: s5 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 290.39 seconds [20:36:48] (03CR) 10Hashar: [V: 031 C: 031] "Cherry picked on the CI puppet master." [puppet] - 10https://gerrit.wikimedia.org/r/382429 (owner: 10Hashar) [21:31:41] PROBLEM - MariaDB Slave Lag: s5 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 609.43 seconds [21:38:45] (03CR) 10Dzahn: [C: 032] "labs-only so far and not adding new violations - will need adjustments later for prod though" [puppet] - 10https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531) (owner: 10Dzahn) [21:39:06] (03PS12) 10Dzahn: start profile for wikiba.se web hosting [puppet] - 10https://gerrit.wikimedia.org/r/382355 (https://phabricator.wikimedia.org/T99531) [21:45:39] (03CR) 10Krinkle: [C: 04-1] Stop forcing php5 in `mwscript` [puppet] - 10https://gerrit.wikimedia.org/r/358896 (https://phabricator.wikimedia.org/T146285) (owner: 10Chad) [21:45:44] (03CR) 10Krinkle: [C: 04-1] "Per T145819" [puppet] - 10https://gerrit.wikimedia.org/r/358896 (https://phabricator.wikimedia.org/T146285) (owner: 10Chad) [22:32:12] (03CR) 10BryanDavis: "Applied on beta 
cluster puppetmaster for >1 year" [puppet] - 10https://gerrit.wikimedia.org/r/312523 (https://phabricator.wikimedia.org/T146381) (owner: 10Hashar) [22:36:29] (03PS1) 10Dzahn: wikibase: fix document root path [puppet] - 10https://gerrit.wikimedia.org/r/382860 [22:37:01] (03CR) 10Paladox: [C: 031] wikibase: fix document root path [puppet] - 10https://gerrit.wikimedia.org/r/382860 (owner: 10Dzahn) [22:37:36] (03CR) 10Dzahn: [C: 032] wikibase: fix document root path [puppet] - 10https://gerrit.wikimedia.org/r/382860 (owner: 10Dzahn) [22:37:49] (03PS1) 10BBlack: assertion/warning nits [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/382862 [22:37:51] (03PS1) 10BBlack: raise PERSIST_REQS to 100K [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/382863 [22:37:55] (03PS1) 10BBlack: Add inpkts_sent metric [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/382864 [22:37:56] (03PS1) 10BBlack: Remove multi-head support from strq [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/382865 [22:37:58] (03PS1) 10BBlack: Move the strq object completely inside the purger [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/382866 [22:38:00] (03PS1) 10BBlack: Move all URL parsing and HTTP req generation to receiver [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/382867 [22:38:02] (03PS1) 10BBlack: Chain the purgers together. 
[software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/382868 [22:38:04] (03PS1) 10BBlack: split per-purger stats [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/382869 [22:38:06] (03PS1) 10BBlack: Bump http-parser upstream src to 2.7.1 [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/382870 [22:38:08] (03PS1) 10BBlack: http-parser usage updates [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/382871 [22:38:10] (03PS1) 10BBlack: bump copyright years [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/382872 [22:38:12] (03PS1) 10BBlack: Release 0.0.12 [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/382873 [22:38:40] 10Operations, 10DBA, 10Patch-For-Review, 10Wiki-Setup (Create): Create elections committee private wiki - https://phabricator.wikimedia.org/T174370#3666404 (10jrbs) Hey - just wondering what the status is here. Is there anything we can do to push this along? [23:01:41] (03PS1) 10Dzahn: wikibase: fix git clone destination [puppet] - 10https://gerrit.wikimedia.org/r/382884 [23:02:21] (03CR) 10Paladox: [C: 031] wikibase: fix git clone destination [puppet] - 10https://gerrit.wikimedia.org/r/382884 (owner: 10Dzahn) [23:02:26] (03CR) 10Dzahn: [C: 032] wikibase: fix git clone destination [puppet] - 10https://gerrit.wikimedia.org/r/382884 (owner: 10Dzahn) [23:03:17] this "joke" with the domain name "wikiba.se" just makes things confusing:) this isn't Sweden, wish it was wikibase.org , heh [23:05:36] imagines "What is a Wikiba? And why isn't this in Swedish?" 
[23:12:19] wikibase.org is for sale [23:13:04] yea, i noticed [23:13:41] so with an .org domain the doc root would be typically like /srv/org/wikimedia/ [23:13:57] now with this this becomes /srv/se/wikiba/ [23:14:13] accurately ugly :) [23:14:13] (y) [23:32:44] (03PS1) 10Dzahn: wikibase: set Labs Hiera values in repo [puppet] - 10https://gerrit.wikimedia.org/r/382892 [23:33:59] (03Draft1) 10Paladox: wikiba.se: Fix class [puppet] - 10https://gerrit.wikimedia.org/r/382891 [23:34:01] (03Draft2) 10Paladox: wikiba.se: Fix class [puppet] - 10https://gerrit.wikimedia.org/r/382891 [23:34:27] (03CR) 10Paladox: [C: 031] wikibase: set Labs Hiera values in repo [puppet] - 10https://gerrit.wikimedia.org/r/382892 (owner: 10Dzahn) [23:35:44] (03CR) 10Paladox: wikibase: set Labs Hiera values in repo [puppet] - 10https://gerrit.wikimedia.org/r/382892 (owner: 10Dzahn) [23:36:09] (03PS2) 10Dzahn: wikibase: set Labs Hiera values in repo [puppet] - 10https://gerrit.wikimedia.org/r/382892 [23:37:00] (03PS3) 10Paladox: wikiba.se: Add missing variables to wikibase.pp profile [puppet] - 10https://gerrit.wikimedia.org/r/382891 [23:37:11] (03CR) 10Dzahn: [C: 032] wikibase: set Labs Hiera values in repo [puppet] - 10https://gerrit.wikimedia.org/r/382892 (owner: 10Dzahn) [23:38:01] (03PS4) 10Dzahn: wikiba.se: Add missing variables to wikibase.pp profile [puppet] - 10https://gerrit.wikimedia.org/r/382891 (owner: 10Paladox) [23:38:44] (03CR) 10Dzahn: [C: 032] "thx" [puppet] - 10https://gerrit.wikimedia.org/r/382891 (owner: 10Paladox) [23:38:51] your welcome :) [23:42:38] 10Operations, 10DBA, 10Patch-For-Review, 10Wiki-Setup (Create): Create elections committee private wiki - https://phabricator.wikimedia.org/T174370#3666568 (10Reedy) Just needs scheduling, patches rebasing and someone to deploy it [23:45:09] (03Draft1) 10Paladox: wikiba.se: Remove colon from VirtualHost [puppet] - 10https://gerrit.wikimedia.org/r/382893 [23:45:12] (03PS2) 10Paladox: wikiba.se: Remove colon from VirtualHost 
[puppet] - 10https://gerrit.wikimedia.org/r/382893 [23:48:57] (03PS3) 10Paladox: wikiba.se: Remove colon from VirtualHost [puppet] - 10https://gerrit.wikimedia.org/r/382893 [23:49:37] (03PS4) 10Paladox: wikiba.se: Remove colon from VirtualHost [puppet] - 10https://gerrit.wikimedia.org/r/382893 [23:51:09] (03PS5) 10Paladox: wikiba.se: Remove colon from VirtualHost [puppet] - 10https://gerrit.wikimedia.org/r/382893 [23:51:24] (03CR) 10Dzahn: [C: 04-1] "the other way around. it's correct in Apache template but wrong in puppet class :)" [puppet] - 10https://gerrit.wikimedia.org/r/382893 (owner: 10Paladox) [23:52:21] (03CR) 10Dzahn: [C: 032] wikiba.se: Remove colon from VirtualHost [puppet] - 10https://gerrit.wikimedia.org/r/382893 (owner: 10Paladox) [23:55:31] (03CR) 10Dzahn: "manually rm -rf'ed the wrong path, works now :)" [puppet] - 10https://gerrit.wikimedia.org/r/382893 (owner: 10Paladox) [23:56:06] (03Draft1) 10Paladox: wikiba.se: Fix docroot to /srv/se/wikiba/output [puppet] - 10https://gerrit.wikimedia.org/r/382894 [23:56:08] (03Draft2) 10Paladox: wikiba.se: Fix docroot to /srv/se/wikiba/output [puppet] - 10https://gerrit.wikimedia.org/r/382894