[00:00:16] wfm?
[00:00:27] no_justification wfm?
[00:00:39] https://gerrit.wikimedia.org/r/#/c/407861/
[00:00:41] Son of a bitch
[00:00:46] The indexer got smart ideas again
[00:00:49] 500 Internal Server Error
[00:00:53] yeh
[00:01:02] At least the new gerrit interface handles 500s much better.
[00:01:02] Also it seems https://gerrit.wikimedia.org/r/#/projects/performance,dashboards/custom:custom says "Group performance not found" instead.
[00:01:46] Krinkle that's the index
[00:01:49] most likely
[00:01:53] groups is now in the index
[00:02:14] mutante: I'm going to live-hack the indexing threads to 1
[00:02:22] So we can keep it up and merge it permanently
[00:02:22] no_justification: i can merge it
[00:02:26] ok
[00:02:26] Or do that yeah
[00:02:34] I was thinking it might just be flakey until then
[00:02:36] Why 500 Internal Server error again?
[00:02:50] Zoranzoki21: If you'd follow along and read what's being said, you'd know.
[00:03:07] no_justification: I am following you and reading it all
[00:03:10] no_justification: please live hack :)
[00:03:16] In which case: you know why
[00:03:19] because in that moment.. i got them too
[00:04:01] If you make the 408k change, add me as reviewer
[00:04:05] :) Ok?
[00:04:37] What on earth are you going on about?
[00:04:37] PROBLEM - puppet last run on labsdb1010 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_operations/mediawiki-config]
[00:04:39] Operations, Analytics, Research, Traffic, and 6 others: Referrer policy for browsers which only support the old spec - https://phabricator.wikimedia.org/T180921#3942447 (Nuria) >What I would like is to extend the current origin-when-cross-origin policy with the origin-when-crossorigin, origin fal...
[00:06:13] 503
[00:06:15] mutante: Actually, it's index.batchThreads
[00:06:27] PROBLEM - puppet last run on releases2001 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/tools/release],Exec[git_pull_jenkins CI Composer]
[00:06:34] Hauskatze: Problem with gerrit
[00:06:46] Zoranzoki21: I know
[00:06:47] no_justification: it's in the [index] section, isn't that prefixed?
[00:06:57] Yes.
[00:07:37] https://gerrit.wikimedia.org/r/c/407857/1/modules/gerrit/templates/gerrit.config.erb
[00:08:17] PROBLEM - puppet last run on labsdb1011 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_operations/mediawiki-config]
[00:08:19] PROBLEM - puppet last run on kafka2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas]
[00:08:32] Ah, and that's what you're doing
[00:08:35] Correct, merge that
[00:08:37] My fix is wrong
[00:08:40] 'k :)
[00:08:42] (local hack fix)
[00:08:42] :)
[00:09:17] PROBLEM - puppet last run on graphite2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_performance/docroot]
[00:09:45] i tried but i wasn't quick enough
[00:10:09] it worked momentarily
[00:10:34] I fixed it locally again because it was gonna keep trying to reindex
[00:10:54] yep, good
[00:11:47] Ok, one thread much nicer :)
[00:12:07] PROBLEM - puppet last run on kafka1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas]
[00:12:07] PROBLEM - puppet last run on analytics1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas]
[00:12:09] :)
[00:12:37] PROBLEM - puppet last run on bromine is CRITICAL: CRITICAL: Puppet has 4 failures. Last run 6 minutes ago with 4 failures. Failed resources (up to 3 shown): Exec[git_pull_wikimedia/annualreport],Exec[git_pull_wikimedia/TransparencyReport],Exec[git_pull_wikimedia/TransparencyReport-private],Exec[git_pull_wikibase/wikiba.se-deploy]
[00:12:40] wikibugs may need restarting.
[00:12:47] Really, what we need is to let the indexing finish
[00:12:54] Just at a rate that won't make the DB explode :)
[00:12:57] yep
[00:12:59] i have to do polygerrit=0 . not used to it yet :)
[00:13:06] heh :)
[00:13:14] If the indexing doesn't finish, we'll keep having a stale one
[00:13:27] submitting patch. now that i can
[00:13:56] merged on master
[00:13:58] :)
[00:14:38] PROBLEM - puppet last run on kafka1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas]
[00:14:51] no_justification it has auto reindex if stale
[00:15:06] Yeah, which is correct
[00:15:10] yep
[00:15:42] mutante: Pulled to cobalt, no change locally
[00:17:25] error log seems quiet now
[00:17:28] Indexer going slowly
[00:17:30] (good)
[00:17:30] no_justification i've noticed big performance improvements in the ui. loads almost instantly for me.
[00:17:39] for polygerrit
[00:17:45] gwtui is around the same as before
[00:17:57] same :)
[00:18:06] mutante: The downside of 1 thread? It just started a big repo ;-)
[00:18:16] a big repo?
[00:18:26] ops/puppet
[00:18:29] lol
[00:18:30] reindexing all those changes
[00:20:08] I'm going afk for a bit, need a break. Things should be going along quietly now
[00:21:19] no_justification: ! ACK :)
[00:22:27] legoktm could you restart wikibugs please? :)
[00:22:31] Next bandaid would be to disable the reindexIfStale, but let's cross that bridge if we come to it
[00:23:49] heh
[00:24:44] puppet done, moved to mw/core
[00:24:49] Yeah, those are two of the biggest
[00:24:52] Er, with most history
[00:24:58] (repo size isn't the issue here, it's history)
[00:25:16] heh
[00:25:25] not so bad :)
[00:27:22] yay @ big performance
[00:28:10] :)
[00:28:40] Hm.. looks like project dashboards and project admin don't work yet in PolyGerrit
[00:28:53] admin says nicely "This page is not yet implemented in PolyGerrit. View it in the Old UI"
[00:28:56] Krinkle that's fixed in 2.15
[00:29:00] and 2.16 / 3.0
[00:29:23] The dashboard one fails with a server error, ouch
[00:29:32] Krinkle as I'm the one who implemented the project list in 2.15 (and kicked off adding full support for all of admin which becky @ google helped with)
[00:30:03] https://gerrit.wikimedia.org/r/projects/performance/WebPageTest,dashboards/default "Site unreachable - ERR_INVALID_RESPONSE"
[00:30:16] Krinkle you can test with http://gerrit-new.wmflabs.org/r/?polygerrit=1
[00:30:27] Krinkle yeh, that's due to you using the poly ui
[00:30:32] switching to gwtui will fix that
[00:30:40] that's fixed in gerrit 2.16 / 3.0
[00:30:51] (switching to gwtui = ?polygerrit=0)
[00:31:04] Krinkle and it is no longer called projects in polygerrit from 2.16 / 3.0
[00:31:10] repositories is its new name
[00:31:13] https://gerrit-new.wmflabs.org/r/admin/repos
[00:31:27] RECOVERY - puppet last run on releases2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:31:42] gwtui is still default, polygerrit is experimental in this version
[00:31:54] but as paladox said that will change later
[00:31:57] so it's a preview
[00:32:01] yep :)
[00:32:07] polygerrit is fully stable in 2.15
[00:32:27] just not the default in that release due to inline editing not being in it. inline edit has been added to 2.16 / 3.0
[00:32:47] Krinkle https://docs.google.com/presentation/d/17q-ygGioZi_5DITLyELa8oaOr22e15AHy8cq6XTZ0nY/edit#slide=id.g27f16618ec_0_139
[00:33:17] RECOVERY - puppet last run on labsdb1011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:34:17] There's a "Delete Change" button now too
[00:34:30] Yeh
[00:34:37] RECOVERY - puppet last run on labsdb1010 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[00:35:37] Krinkle and there's naming patches feature in poly (supported in 2.14)
[00:35:42] polygerrit feature not gwtui
[00:35:54] and there's also a status feature found in settings again only poly feature
[00:37:07] RECOVERY - puppet last run on kafka1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:37:07] RECOVERY - puppet last run on analytics1003 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[00:37:22] Another restart?
[00:37:25] (500s)
[00:37:31] Yeah, was testing something
[00:37:37] RECOVERY - puppet last run on bromine is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:37:48] (This is why we do gerrit upgrades on fridays at COB-ish)
[00:38:17] RECOVERY - puppet last run on kafka2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[00:38:35] no_justification: That depends on Geo and mental timezone I suppose.
[00:38:51] * greg-g looks around
[00:39:17] RECOVERY - puppet last run on graphite2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[00:39:29] Krinkle: Short of me doing it on a saturday, end of friday is best I can do :\
[00:40:10] Krinkle: If it makes you feel better, I'm going afk now so I won't press restart again :)
[00:42:42] no_justification "Chad (taking a break)" lol :)
[00:42:50] gerrit is as my previous provider :lol:
[00:43:38] RECOVERY - puppet last run on stat1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:44:31] I cannot wait for phabricator to come back to normal
[00:44:53] What do you mean?
[00:44:55] phab is normal
[00:48:18] I apologize to everyone for my bad behaviour on gerrit
[00:49:42] I promise I will not make any more trouble there
[00:54:07] wondering, could someone open up https://phabricator.wikimedia.org/T140366 please
[00:54:10] it's now resolved
[00:58:13] Good night.
[01:05:55] Operations, Gerrit, Release-Engineering-Team (Someday): Make sure replying to emails in gerrit 2.14 works - https://phabricator.wikimedia.org/T158915#3051483 (Paladox) we can try this now that we have upgraded to 2.14.
[01:06:07] thanks no_justification :)
[01:07:46] Operations, Gerrit, Release-Engineering-Team: Add prometheus exporter to Gerrit - https://phabricator.wikimedia.org/T184086#3942613 (Paladox) We can do this now :)
[01:09:37] RECOVERY - puppet last run on kafka1002 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[01:34:58] the jenkins/grrrit irc bot seems to have died with all the gerrit/zuul excitement
[01:45:34] Operations, Prod-Kubernetes, Kubernetes: Utilize the deployment pipeline (stretch) - https://phabricator.wikimedia.org/T184924#3942632 (Krinkle)
[01:45:40] Operations, Prod-Kubernetes, Kubernetes: Validate whether the (implemented) standardized application environment works as expected - https://phabricator.wikimedia.org/T184923#3942633 (Krinkle)
[01:46:19] Operations, Prod-Kubernetes, Kubernetes: Serve at least 50% of Mathoid via kubernetes - https://phabricator.wikimedia.org/T184919#3942634 (Krinkle)
[01:46:32] Operations, Prod-Kubernetes, Kubernetes: Serve one production service via Kubernetes - https://phabricator.wikimedia.org/T184462#3942635 (Krinkle)
[01:54:41] Operations, Cloud-VPS, cloud-services-team (Kanban): rack/setup/install labnodepool1002.eqiad.wmnet - https://phabricator.wikimedia.org/T168407#3363615 (bd808) +1 for using this box to replace the labservices1002 box which has been flagged as out of warranty in a recent round of eqiad hardware audits.
[02:00:16] cscott: Yeah. I have commands to run but idk what I'm doing wrong
[02:00:23] I'm on tools-login and stuck
[02:00:47] sorry, i can't actually help. i can just point at the broken thing and jump up and down.
[02:06:51] no_justification: /me finds the doc
[02:07:22] no_justification: https://www.mediawiki.org/wiki/Wikibugs
[02:10:42] Should be running?
[02:10:52] got a wikibugs notification just now in #mediawiki-parsoid so it looks like you're doing something right
[02:13:14] (CR) Andrew Bogott: ldap: move things from ldap::role to profile::ldap (3 comments) [puppet] - https://gerrit.wikimedia.org/r/407039 (owner: Andrew Bogott)
[02:33:56] (CR) Andrew Bogott: [C: 2] ldap: move things from ldap::role to profile::ldap [puppet] - https://gerrit.wikimedia.org/r/407039 (owner: Andrew Bogott)
[02:44:54] (CR) Andrew Bogott: [V: 2 C: 2] Added dummy for 'profile::openstack::main::rabbit_cleanup_pass' [labs/private] - https://gerrit.wikimedia.org/r/407870 (owner: Andrew Bogott)
[02:49:27] PROBLEM - puppet last run on wasat is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:03:27] PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:25:57] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 771.30 seconds
[03:55:14] !log restarting zuul to drop 407165,3 from the queue
[03:55:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:57:08] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 206.13 seconds
[04:19:11] Operations, Traffic, HTTPS, Security: $wgServer with initial https:// does not force HTTPS (wgSecureLogin) - https://phabricator.wikimedia.org/T156320#3942806 (Krinkle)
[09:17:50] (Abandoned) Hashar: Jenkins job validation (DO NOT SUBMIT) [puppet/jmxtrans] - https://gerrit.wikimedia.org/r/403372 (owner: Hashar)
[09:42:02] (PS3) Zoranzoki21: New throttle rule, clean obsolete rules [mediawiki-config] - https://gerrit.wikimedia.org/r/406820 (https://phabricator.wikimedia.org/T186002) (owner: Urbanecm)
[09:42:17] (CR) Zoranzoki21: "Oops. Sorry. I thought it was mine" [mediawiki-config] - https://gerrit.wikimedia.org/r/406820 (https://phabricator.wikimedia.org/T186002) (owner: Urbanecm)
[09:44:44] (CR) Zoranzoki21: [C: -1] "You can abandon this because rule expired." [mediawiki-config] - https://gerrit.wikimedia.org/r/406820 (https://phabricator.wikimedia.org/T186002) (owner: Urbanecm)
[10:12:02] Operations, ORES, Patch-For-Review, Scoring-platform-team (Current): Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851#3943066 (akosiaris) Yes I am, but currently in fosdem so it 'll take a bit more. ETA is probably Tuesday
[11:17:49] PROBLEM - Host cp3045.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[11:22:59] RECOVERY - Host cp3045.mgmt is UP: PING OK - Packet loss = 0%, RTA = 84.47 ms
[11:25:42] (PS1) Zoranzoki21: Change namespaces on urwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/407901 (https://phabricator.wikimedia.org/T186393)
[11:27:00] (PS2) Zoranzoki21: Change namespaces on urwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/407901 (https://phabricator.wikimedia.org/T186393)
[12:09:25] Hi, I have a question
[12:09:43] How to set topic for gerrit to be available when I upload patch?
[12:15:00] Zoranzoki21: https://gerrit-review.googlesource.com/Documentation/intro-user.html#topics
[12:15:13] Wiki13: thank you
[12:55:07] (CR) MarcoAurelio: "You can abandon this patch. Bureaucrats were recently given the power to add and remove this group by default on WMF wikis." [mediawiki-config] - https://gerrit.wikimedia.org/r/405771 (https://phabricator.wikimedia.org/T185531) (owner: Framawiki)
[14:22:33] (Draft1) Paladox: Gerrit: Cache groups, groups_byinclude and groups_members [puppet] - https://gerrit.wikimedia.org/r/407927
[14:22:36] (PS2) Paladox: Gerrit: Cache groups [puppet] - https://gerrit.wikimedia.org/r/407927
[14:28:22] PROBLEM - HHVM rendering on mw2202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:29:12] RECOVERY - HHVM rendering on mw2202 is OK: HTTP OK: HTTP/1.1 200 OK - 75842 bytes in 0.302 second response time
[14:32:01] (Draft1) Paladox: Gerrit: Remove certificate params [puppet] - https://gerrit.wikimedia.org/r/407932
[14:32:03] (PS2) Paladox: Gerrit: Remove certificate params [puppet] - https://gerrit.wikimedia.org/r/407932
[15:05:13] Operations, docker-pkg: Allow selecting which images to build - https://phabricator.wikimedia.org/T186416#3943472 (Joe)
[15:51:12] PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: CRITICAL - kubelet_operational_latencies is 424186 https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:51:53] PROBLEM - kubelet operational latencies on kubernetes1004 is CRITICAL: CRITICAL - kubelet_operational_latencies is 415702 https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:52:12] PROBLEM - kubelet operational latencies on kubernetes1003 is CRITICAL: CRITICAL - kubelet_operational_latencies is 412825 https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:52:43] PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: CRITICAL - kubelet_operational_latencies is 72298 https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:52:53] RECOVERY - kubelet operational latencies on kubernetes1004 is OK: OK - kubelet_operational_latencies is 3999 https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:53:12] RECOVERY - kubelet operational latencies on kubernetes1003 is OK: OK - kubelet_operational_latencies is 4272 https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:53:14] RECOVERY - kubelet operational latencies on kubernetes1001 is OK: OK - kubelet_operational_latencies is 4645 https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[15:53:43] RECOVERY - kubelet operational latencies on kubernetes1002 is OK: OK - kubelet_operational_latencies is 4340 https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[16:20:44] (Draft1) Paladox: apache: support php7 in mpm.pp [puppet] - https://gerrit.wikimedia.org/r/407962
[16:20:47] (PS2) Paladox: apache: support php7 in mpm.pp [puppet] - https://gerrit.wikimedia.org/r/407962
[16:21:14] (CR) jerkins-bot: [V: -1] apache: support php7 in mpm.pp [puppet] - https://gerrit.wikimedia.org/r/407962 (owner: Paladox)
[16:21:32] (PS3) Paladox: apache: support php7 in mpm.pp [puppet] - https://gerrit.wikimedia.org/r/407962
[16:32:14] Yesterday I got an email from greg-g and I only want to say that I am sorry for everything
[16:32:48] I will not make problems anymore
[17:22:11] (Draft1) Paladox: WIP: phabricator: Replace mod_php with php-fpm [puppet] - https://gerrit.wikimedia.org/r/407958
[17:22:14] (Draft2) Paladox: WIP: phabricator: Replace mod_php with php-fpm [puppet] - https://gerrit.wikimedia.org/r/407958
[17:57:06] (PS3) Paladox: WIP: phabricator: Replace mod_php with php-fpm [puppet] - https://gerrit.wikimedia.org/r/407958
[18:05:24] (PS4) Paladox: phabricator: Replace mod_php with php-fpm [puppet] - https://gerrit.wikimedia.org/r/407958 (https://phabricator.wikimedia.org/T182832)
[18:06:43] (CR) Paladox: "I've noticed performance improvements as soon as i switched to this on https://phab.wmflabs.org (i will put this back to stay current with" [puppet] - https://gerrit.wikimedia.org/r/407958 (https://phabricator.wikimedia.org/T182832) (owner: Paladox)
[18:09:55] (PS5) Paladox: phabricator: Replace mod_php with php-fpm [puppet] - https://gerrit.wikimedia.org/r/407958 (https://phabricator.wikimedia.org/T182832)
[23:42:49] (CR) Jcrespo: [C: 1] "I do not know enough about gerrit to know if this is correct, but definitely was too much load the other day, so much that the database was" [puppet] - https://gerrit.wikimedia.org/r/407857 (owner: Dzahn)
[23:44:58] (CR) Jcrespo: [C: 1] "As an addendum, it may not have been the only issue- alter tables may have created pileups, that may have overloaded the database after they " [puppet] - https://gerrit.wikimedia.org/r/407857 (owner: Dzahn)