[00:04:53] (PS2) Yuvipanda: dynamicproxy: Migrate to python3 [puppet] - https://gerrit.wikimedia.org/r/291565 (owner: Ladsgroup)
[00:05:09] (CR) Yuvipanda: [C: 2 V: 2] "All of these packages seem available, so let me give this a shot." [puppet] - https://gerrit.wikimedia.org/r/291565 (owner: Ladsgroup)
[00:14:54] (CR) Yuvipanda: "and fix all the service {} stanzas as well." [puppet] - https://gerrit.wikimedia.org/r/291751 (owner: Alexandros Kosiaris)
[00:15:24] (PS2) Yuvipanda: Attempt to fix dynamicproxy-api service [puppet] - https://gerrit.wikimedia.org/r/289870 (owner: Alex Monk)
[00:19:42] (CR) Yuvipanda: [C: 2] Attempt to fix dynamicproxy-api service [puppet] - https://gerrit.wikimedia.org/r/289870 (owner: Alex Monk)
[00:29:51] (PS1) Yuvipanda: dynamicproxy: Followup to I7c13506b1a38f03815481651fd13411f7cf7c0c9 [puppet] - https://gerrit.wikimedia.org/r/291853
[00:31:01] (CR) Yuvipanda: [C: 2] dynamicproxy: Followup to I7c13506b1a38f03815481651fd13411f7cf7c0c9 [puppet] - https://gerrit.wikimedia.org/r/291853 (owner: Yuvipanda)
[02:25:43] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.3) (duration: 09m 30s)
[02:25:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:31:37] !log l10nupdate@tin ResourceLoader cache refresh completed at Tue May 31 02:31:36 UTC 2016 (duration 5m 54s)
[02:31:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[05:05:09] RECOVERY - cassandra-c CQL 10.64.32.196:9042 on restbase1008 is OK: TCP OK - 0.042 second response time on port 9042
[05:36:42] (PS4) Giuseppe Lavagetto: base::grub: fix the ioscheduler setting [puppet] - https://gerrit.wikimedia.org/r/291706
[05:37:59] <_joe_> how the hell can jenkins be slow at this time of the day?
[05:40:51] (CR) Giuseppe Lavagetto: [C: 2] base::grub: fix the ioscheduler setting [puppet] - https://gerrit.wikimedia.org/r/291706 (owner: Giuseppe Lavagetto)
[05:55:06] (PS1) Mobrovac: service::node: Provide MW API and RESTBase request templates [puppet] - https://gerrit.wikimedia.org/r/291857
[06:02:25] (PS4) Giuseppe Lavagetto: base::grub: actually use augeas on jessie [puppet] - https://gerrit.wikimedia.org/r/291707
[06:04:58] (CR) Mobrovac: "OKed by PCC - https://puppet-compiler.wmflabs.org/2994/" [puppet] - https://gerrit.wikimedia.org/r/291857 (owner: Mobrovac)
[06:05:08] (CR) Mobrovac: [C: 1] Change-Prop: White-list user-agent header header in http filter [puppet] - https://gerrit.wikimedia.org/r/291784 (owner: Ppchelko)
[06:06:46] (CR) Giuseppe Lavagetto: [C: 2] "Let's go." [puppet] - https://gerrit.wikimedia.org/r/291707 (owner: Giuseppe Lavagetto)
[06:13:56] (PS2) Giuseppe Lavagetto: base::grub: allow enabling the memory cgroup controller [puppet] - https://gerrit.wikimedia.org/r/291772
[06:15:23] _joe_: I think your aug changes might be causing some puppet failure in labs
[06:15:36] > Error: /Stage[main]/Base::Grub/Augeas[grub2]: Could not evaluate: Save failed with return code false, see debug
[06:15:38] Notice: /Stage[main]/Base::Grub/Exec[update-grub]: Dependency Augeas[grub2] has failures: true
[06:15:40] Warning: /Stage[main]/Base::Grub/Exec[update-grub]: Skipping because of failed dependencies
[06:16:15] <_joe_> YuviPanda: wat?
[06:16:21] <_joe_> in prod it's working well
[06:16:29] <_joe_> you mean we don't install augeas in labs?
[06:16:45] _joe_: not sure. am running puppet again to see
[06:16:46] <_joe_> YuviPanda: can you name one machine where I can see that?
[06:16:56] <_joe_> ot
[06:17:02] _joe_: I'm looking at tools-exec-1409
[06:17:04] <_joe_> *is it a jessie machine btw?
[06:17:08] but waiting for another run
[06:17:10] to see if it works fine
[06:17:12] _joe_: no trusty
[06:17:17] nope it's still failing
[06:17:23] <_joe_> trusty doesn't have augeas 1.2.0 AFAIR
[06:17:31] let me check a jessie / precise machine
[06:17:49] <_joe_> oh no, it has augeas
[06:17:50] <_joe_> cool
[06:18:10] <_joe_> dpkg -l libaugeas0
[06:18:35] is fine on jessie
[06:19:12] <_joe_> YuviPanda: uhm ok
[06:19:43] <_joe_> YuviPanda: that's strange, in prod on a trusty machine it works like a charm
[06:19:45] (is still running in precise)
[06:19:57] _joe_: I wonder if it is a problem with us having trusty-backports?
[06:20:12] _joe_: ok, fine on precise too.
[06:20:15] !log upgrading hhvm in eqiad (also picking up updated versions of icu and lcms)
[06:20:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:20:36] <_joe_> YuviPanda: I think it's an issue with that specific machine
[06:20:51] _joe_: yeah, am trying a different one right now
[06:21:02] <_joe_> I am running puppet with --debug there
[06:21:18] (am testing on tools-bastion-03 now)
[06:22:44] <_joe_> oblivian@tools-exec-1409:~$ sudo cat /etc/default/grub
[06:22:44] <_joe_> cat: /etc/default/grub: No such file or directory
[06:22:56] <_joe_> YuviPanda: no shit it fails :P
[06:23:13] hmm
[06:23:21] it works fine on -bastion-03 :D
[06:23:26] _joe_: I've no idea what that implies tho :D
[06:23:47] _joe_: my suspicion now is that these were all from a particular base image that did something to grub maybe
[06:23:54] <_joe_> YuviPanda: it implies that if you upgrade the kernel there, you won't be able to rebuild the bootloader
[06:30:10] PROBLEM - puppet last run on mw1226 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:19] PROBLEM - puppet last run on mc2007 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:30:48] PROBLEM - puppet last run on db1015 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:49] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: puppet fail
[06:31:09] PROBLEM - puppet last run on mw1110 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:29] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:09] PROBLEM - puppet last run on restbase2006 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:10] PROBLEM - puppet last run on cp3017 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:40] PROBLEM - puppet last run on ms-be2021 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:37:37] (PS3) Giuseppe Lavagetto: base::grub: allow enabling the memory cgroup controller [puppet] - https://gerrit.wikimedia.org/r/291772
[06:39:11] (PS4) Giuseppe Lavagetto: base::grub: allow enabling the memory cgroup controller [puppet] - https://gerrit.wikimedia.org/r/291772
[06:43:08] PROBLEM - puppet last run on snapshot1003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:46:18] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [1000.0]
[06:47:01] (PS5) Giuseppe Lavagetto: base::grub: allow enabling the memory cgroup controller [puppet] - https://gerrit.wikimedia.org/r/291772
[06:47:07] <_joe_> can someone look int the 5xxs?
[06:47:08] PROBLEM - puppet last run on mw1161 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:47:58] <_joe_> moritzm: these failures ^^ seem related to hhvm
[06:48:23] let me see oxygen
[06:49:18] I think current status is ok, normally an indicative of a passed spike
[06:49:25] let me confirm that
[06:49:35] <_joe_> it could be moritzm upgrading hhvm
[06:50:14] having a look
[06:50:19] if that is true, maybe it should be done slower, but let me confirm first the status
[06:50:46] yes, it was a spike, a relatively large one, however
[06:50:47] haven't started the restarts yet, though, upgrading all the mw* systems in eqiad and pulling their 0.5 GB dbg package took a little
[06:50:49] RECOVERY - puppet last run on mw1161 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[06:50:50] <_joe_> jynus: even when done very slow some more 503s are expected, not enough to create a spike though
[06:51:06] <_joe_> moritzm: well upgrade does the hhvm restart already
[06:51:14] large is a bad word
[06:51:20] wide, but not tall
[06:51:42] let me now see why
[06:51:46] _joe_: ah, ofc. silly me
[06:52:57] the puppet failure on mw1161 resolved itself with a puppet run which executed /usr/local/sbin/install-pkg-src from hhvm::debug
[06:53:40] sorry about the noise, need coffee
[06:53:57] <_joe_> eheh
[06:54:17] it seems mostly api calls, was the one(s) affected an api node?
[06:55:06] let's go to mediawiki to find out
[06:55:42] mmm, high db errors
[06:55:59] RECOVERY - puppet last run on db1015 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[06:56:18] RECOVERY - puppet last run on mw1110 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[06:56:31] of course, mediawiki will not have that if HHVM was the culprit
[06:57:09] RECOVERY - puppet last run on mw1226 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:20] RECOVERY - puppet last run on cp3017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:29] RECOVERY - puppet last run on mc2007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:29] db1049 having issues, though
[06:57:49] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[06:58:00] RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[06:58:38] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:48] RECOVERY - puppet last run on ms-be2021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:59:06] (CR) Giuseppe Lavagetto: [C: 2] base::grub: allow enabling the memory cgroup controller [puppet] - https://gerrit.wikimedia.org/r/291772 (owner: Giuseppe Lavagetto)
[06:59:09] RECOVERY - puppet last run on restbase2006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:59:30] most of the issues are rpc-related
[07:00:58] the isses started happening all of a sudden since 14h yesterday
[07:01:55] (PS2) Muehlenhoff: Stop using package->latest in gerrit module [puppet] - https://gerrit.wikimedia.org/r/291762 (https://phabricator.wikimedia.org/T115348)
[07:02:24] there is no lag, but there are wikidata jobs locking for >10 seconds
[07:02:52] some wikiadmin jobs >500 seconds
[07:04:13] nothing interesting happening at that time, no deploys
[07:04:58] RECOVERY - puppet last run on snapshot1003 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[07:08:26] I do not see pure OS issues
[07:09:25] one critical disk, though
[07:10:42] but it is is only 2 Media Error
[07:14:02] I am going to put offline that drive to discard disk issues
[07:15:54] !log db1049> megacli -PDOffline -PhysDrv '[32:4]' -a0
[07:16:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:16:17] let's see if that helps
[07:19:19] if it does, we may want to tune raid heuristics for doing that automatically, if that is a thing
[07:20:38] (CR) Alexandros Kosiaris: "@halfak: I am not following. You mean that somehow "ores" is not clearly distinguishable from "celery-ores-worker" ?" [puppet] - https://gerrit.wikimedia.org/r/291751 (owner: Alexandros Kosiaris)
[07:21:18] PROBLEM - MegaRAID on db1049 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[07:25:00] ^that is expected and self-created
[07:28:19] (CR) Aklapper: "The DBs phabricator_maniphest.edge and phabricator_maniphest.maniphest_task were not used beforehand in this script, so this might require" [puppet] - https://gerrit.wikimedia.org/r/291781 (https://phabricator.wikimedia.org/T133649) (owner: Aklapper)
[07:28:22] (CR) Jcrespo: "https://puppet-compiler.wmflabs.org/2995/" [puppet] - https://gerrit.wikimedia.org/r/291784 (owner: Ppchelko)
[07:31:09] Operations, ContentTranslation-cxserver, MediaWiki-extensions-ContentTranslation, Services, and 2 others: Package and test apertium for Jessie - https://phabricator.wikimedia.org/T107306#2340394 (KartikMistry) There has been slow update, but I have built most of packages locally, seems OK on Jess...
[07:32:50] (CR) Alexandros Kosiaris: [C: 1] "Apart from the confusing "forward_headers" part which makes me think changeprop will be receiving requests from something and is requested" [puppet] - https://gerrit.wikimedia.org/r/291784 (owner: Ppchelko)
[07:32:57] Operations, ContentTranslation-cxserver, MediaWiki-extensions-ContentTranslation, Services, and 2 others: Package and test apertium for Jessie - https://phabricator.wikimedia.org/T107306#2340398 (KartikMistry)
[07:33:17] Operations, ContentTranslation-cxserver, MediaWiki-extensions-ContentTranslation, Services, and 2 others: Package and test apertium for Jessie - https://phabricator.wikimedia.org/T107306#1491973 (KartikMistry)
[07:34:18] (CR) Alexandros Kosiaris: [C: 1] service::node: Provide MW API and RESTBase request templates [puppet] - https://gerrit.wikimedia.org/r/291857 (owner: Mobrovac)
[07:34:45] as far as I can see, offlining the disk did not improve the issues
[07:35:19] (PS2) Jcrespo: Change-Prop: White-list user-agent header header in http filter [puppet] - https://gerrit.wikimedia.org/r/291784 (owner: Ppchelko)
[07:36:36] (CR) Jcrespo: [C: 2] Change-Prop: White-list user-agent header header in http filter [puppet] - https://gerrit.wikimedia.org/r/291784 (owner: Ppchelko)
[07:46:10] !log change-prop deploying 980f65c
[07:46:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:56:03] (PS1) Elukey: Clarify the use of TAG_F_NOVARMATCH within the context of %{}t. [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/291870 (https://phabricator.wikimedia.org/T136314)
[08:01:14] (PS2) Elukey: Clarify the use of TAG_F_NOVARMATCH within the context of %{}t. [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/291870 (https://phabricator.wikimedia.org/T136314)
[08:03:17] let's revive that disk, that didn't work
[08:04:49] !log db1049> megacli -PDOnline -PhysDrv '[32:4]' -a0
[08:04:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:06:47] RECOVERY - MegaRAID on db1049 is OK: OK: optimal, 1 logical, 2 physical
[08:07:36] Operations, Performance-Team, Thumbor: Package and backport Thumbor dependencies in Debian - https://phabricator.wikimedia.org/T134485#2340433 (Gilles) I've started working on https://wiki.debian.org/Python/Thumbor as Marcelo requested and I'm almost done filing ITPs. I'll add a column for jessie and...
[08:18:08] PROBLEM - MegaRAID on db1049 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[08:24:03] Operations: Linking a bn.wikipedia.org button to G+ page. - https://phabricator.wikimedia.org/T109810#2340449 (Aklapper) >>! In T109810#2197918, @Jalexander wrote: > will look into it in the morning. @Jalexander: Any news to share here? Thanks!
[08:30:32] ACKNOWLEDGEMENT - MegaRAID on db1049 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) Jcrespo rebuilding completely just in case
[08:31:37] Operations, Analytics-Kanban, Traffic, Patch-For-Review: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2340456 (elukey) I discussed with @ema the inconsistency that we are seeing and we came to the conclusion that this change could be...
[08:31:40] given than #1 cause is not the reason, let's go to #2: bots
[08:32:53] BotNinja is there, but it is behaving
[08:34:46] !log restbase deploy start of fcd62e1
[08:34:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:50:04] !log restbase deploy end of fcd62e1
[08:50:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:52:42] (CR) Mobrovac: [C: 1] "Cherry-picked in Beta as well, works." [puppet] - https://gerrit.wikimedia.org/r/291857 (owner: Mobrovac)
[09:08:23] Operations, Performance-Team, Thumbor: Package and backport Thumbor dependencies in Debian - https://phabricator.wikimedia.org/T134485#2340504 (Gilles)
[09:19:46] (PS4) Mobrovac: Partially port RESTBaseUpdateJobs to change propagation. [puppet] - https://gerrit.wikimedia.org/r/291201 (owner: Ppchelko)
[09:39:27] (PS1) Gehel: Change expired file zoom level from 16 to 15. [puppet] - https://gerrit.wikimedia.org/r/291885 (https://phabricator.wikimedia.org/T136483)
[09:43:13] (CR) Gehel: "Puppet compiler output: https://puppet-compiler.wmflabs.org/2997/" [puppet] - https://gerrit.wikimedia.org/r/291885 (https://phabricator.wikimedia.org/T136483) (owner: Gehel)
[09:49:28] Operations, MediaWiki-General-or-Unknown, Performance-Team: Update limit.sh to support systemd-based cgroup management - https://phabricator.wikimedia.org/T136603#2340617 (Joe)
[09:59:50] (CR) Jcrespo: "I can confirm UID is correct and key is not shared with labs." [puppet] - https://gerrit.wikimedia.org/r/291255 (https://phabricator.wikimedia.org/T136417) (owner: Ladsgroup)
[10:06:15] Operations, Traffic, Browser-Support-Firefox, HTTPS: Secure connection failed when attempting to send POST request (if connection has been idle for a while; disabling HTTP/2 helps) - https://phabricator.wikimedia.org/T134869#2340659 (Aklapper)
[10:11:25] Operations, Traffic, Browser-Support-Firefox, HTTPS: Secure connection failed when attempting to send POST request using HTTP/2 (if connection has been idle for a certain time) - https://phabricator.wikimedia.org/T134869#2340670 (Danny_B)
[10:19:13] (PS54) Alexandros Kosiaris: ores: Scap3 deployment configurations [puppet] - https://gerrit.wikimedia.org/r/280403 (owner: Ladsgroup)
[10:19:41] (PS3) Muehlenhoff: Stop using package->latest in gerrit module [puppet] - https://gerrit.wikimedia.org/r/291762 (https://phabricator.wikimedia.org/T115348)
[10:19:52] (CR) Muehlenhoff: [C: 2 V: 2] Stop using package->latest in gerrit module [puppet] - https://gerrit.wikimedia.org/r/291762 (https://phabricator.wikimedia.org/T115348) (owner: Muehlenhoff)
[10:22:09] (CR) Alexandros Kosiaris: "amended per 20after4's recommendation" [puppet] - https://gerrit.wikimedia.org/r/280403 (owner: Ladsgroup)
[10:23:01] (PS2) Jcrespo: Add ladsgroup user key and data and production cluster access [puppet] - https://gerrit.wikimedia.org/r/291255 (https://phabricator.wikimedia.org/T136417) (owner: Ladsgroup)
[10:23:06] (PS2) Alexandros Kosiaris: service::node: Provide MW API and RESTBase request templates [puppet] - https://gerrit.wikimedia.org/r/291857 (owner: Mobrovac)
[10:23:11] (CR) Alexandros Kosiaris: [C: 2 V: 2] service::node: Provide MW API and RESTBase request templates [puppet] - https://gerrit.wikimedia.org/r/291857 (owner: Mobrovac)
[10:23:56] (PS3) Jcrespo: Add ladsgroup user key and data and production cluster access [puppet] - https://gerrit.wikimedia.org/r/291255 (https://phabricator.wikimedia.org/T136417) (owner: Ladsgroup)
[10:25:44] (CR) Jcrespo: [C: 2] Add ladsgroup user key and data and production cluster access [puppet] - https://gerrit.wikimedia.org/r/291255 (https://phabricator.wikimedia.org/T136417) (owner: Ladsgroup)
[10:26:36] Operations, MediaWiki-General-or-Unknown, Performance-Team: Update limit.sh to support systemd-based cgroup management - https://phabricator.wikimedia.org/T136603#2340617 (akosiaris) Installing libpam-systemd seems to solve the above for logged in users, need to check if it solves it as well for serv...
[10:33:46] Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to stats for Ladsgroup - https://phabricator.wikimedia.org/T136417#2340736 (jcrespo) With your description, I see you want statistics-privatedata-users which will give you access to the web request logs. However, while I see you h...
[10:39:15] Operations, Ops-Access-Requests, Patch-For-Review: Requesting deployment access (for deploying to scb) for Ladsgroup - https://phabricator.wikimedia.org/T136406#2340741 (jcrespo) @Ladsgroup I have granted you access to the cluster (although not yet to scb or other hosts). While I work on the rest of...
[10:51:18] PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 2 failures
[11:01:29] PROBLEM - puppet last run on mc2015 is CRITICAL: CRITICAL: puppet fail
[11:05:15] RECOVERY - MegaRAID on db1049 is OK: OK: optimal, 1 logical, 2 physical
[11:09:24] (CR) Elukey: "LGTM, maybe we could discuss Filippo's suggestion and then decide if merge or add the change?" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/291752 (https://phabricator.wikimedia.org/T132835) (owner: Ema)
[11:14:01] Operations, Analytics-Kanban: Jmxtrans failures on Kafka hosts caused metric holes in grafana - https://phabricator.wikimedia.org/T136405#2340779 (elukey) @mforns: Sure! I think that we should follow up on the items in the task's description, namely: 1) follow up with upstream and package/test/deploy th...
[11:20:49] (CR) Hashar: "recheck" [puppet] - https://gerrit.wikimedia.org/r/291116 (https://phabricator.wikimedia.org/T136188) (owner: Hashar)
[11:24:39] (PS2) Hashar: (DO NOT SUBMIT) chromium on hold, drop ensure => latest [puppet] - https://gerrit.wikimedia.org/r/291116 (https://phabricator.wikimedia.org/T136188)
[11:26:45] RECOVERY - puppet last run on mc2015 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[11:34:16] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 645 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6160116 keys - replication_delay is 645
[11:47:15] RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:48:08] (PS1) KartikMistry: Beta: Enable Compact Language Links for new users [mediawiki-config] - https://gerrit.wikimedia.org/r/291908 (https://phabricator.wikimedia.org/T136161)
[11:51:18] (PS1) Muehlenhoff: Stop installing PHP on jessie app servers [puppet] - https://gerrit.wikimedia.org/r/291909
[11:59:16] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6068894 keys - replication_delay is 0
[12:07:56] Operations, MediaWiki-General-or-Unknown, Performance-Team: Update limit.sh to support systemd-based cgroup management - https://phabricator.wikimedia.org/T136603#2340856 (Joe) @akosiaris libpam-systemd solves the problem for users with a login session, as it registers a user slice etc. It doesn't...
[12:09:58] (PS6) Muehlenhoff: Add a new backup set to backup openldap databases and enable on serpens [puppet] - https://gerrit.wikimedia.org/r/289824 (https://phabricator.wikimedia.org/T120919)
[12:10:17] (CR) Muehlenhoff: [C: 2 V: 2] Add a new backup set to backup openldap databases and enable on serpens [puppet] - https://gerrit.wikimedia.org/r/289824 (https://phabricator.wikimedia.org/T120919) (owner: Muehlenhoff)
[12:20:17] jynus: hey, around?
[12:21:50] !log continue rolling reboot of mc2* systems for Linux 4.4 upgrade
[12:21:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:23:08] Operations, ops-eqiad: HP Warning on boot [Firmware Bug]: the BIOS has corrupted hw-PMU resources - https://phabricator.wikimedia.org/T136345#2340874 (jcrespo)
[12:24:11] jynus: I was able to access for the first time but I saw some warnings that I want to share with you.
[12:24:22] maybe there is MIM
[12:24:23] Amir1, please do
[12:41:48] Operations, Ops-Access-Requests, Patch-For-Review: Requesting deployment access (for deploying to scb) for Ladsgroup - https://phabricator.wikimedia.org/T136406#2340892 (Ladsgroup) @jcrespo [[https://people.wikimedia.org/~ladsgroup/|This]] might answer your question \o/
[12:51:43] Operations, ops-eqiad, fundraising-tech-ops: investigate RAID failure on beryllium.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T135178#2340902 (Aklapper) >>! In T135178#2292645, @Cmjohnson wrote: > @jgreen, We will need to scheduled down time to replace the disk. @Cmjohnson / @jgreen: Has...
[12:58:47] Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to stats for Ladsgroup - https://phabricator.wikimedia.org/T136417#2340906 (Ladsgroup) I only need access to stats1002 without having access to private data
[13:00:33] (PS1) Hashar: zuul: log 'connection' bucket [puppet] - https://gerrit.wikimedia.org/r/291913
[13:06:08] Operations, Analytics-Kanban: Jmxtrans failures on Kafka hosts caused metric holes in grafana - https://phabricator.wikimedia.org/T136405#2340918 (mforns) Thanks @elukey!
[13:07:38] (PS2) BBlack: raise fe mem size to 37% on text and upload [puppet] - https://gerrit.wikimedia.org/r/291593 (https://phabricator.wikimedia.org/T135384)
[13:07:40] (PS2) BBlack: varnish: jemalloc tuning for frontend caches [puppet] - https://gerrit.wikimedia.org/r/291592 (https://phabricator.wikimedia.org/T135384)
[13:07:53] (PS1) BBlack: tlsproxy: double ssl session cache size [puppet] - https://gerrit.wikimedia.org/r/291914
[13:09:10] (CR) BBlack: varnish: jemalloc tuning for frontend caches (1 comment) [puppet] - https://gerrit.wikimedia.org/r/291592 (https://phabricator.wikimedia.org/T135384) (owner: BBlack)
[13:16:34] (CR) Lokal Profil: "My apologies.I didn't know I was the one who could add it to a deploy window. I'll figure out which works for me and add it." [mediawiki-config] - https://gerrit.wikimedia.org/r/288582 (https://phabricator.wikimedia.org/T135212) (owner: Lokal Profil)
[13:20:11] Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to stats for Ladsgroup - https://phabricator.wikimedia.org/T136417#2340940 (jcrespo) From what I undestand after talking on IRC to Ladsgroup, then the right groups is analytics-users: access to stats1002 and stats1004 with no direct...
[13:27:41] (PS1) Elukey: Deploy memcached 1.4.25 to mc1010 as part of a performance experiment. [puppet] - https://gerrit.wikimedia.org/r/291916 (https://phabricator.wikimedia.org/T129963)
[13:28:13] PROBLEM - Disk space on ms-be2012 is CRITICAL: DISK CRITICAL - free space: / 2124 MB (3% inode=96%)
[13:32:26] (PS2) Elukey: Deploy memcached 1.4.25 to mc1010 as part of a performance experiment. [puppet] - https://gerrit.wikimedia.org/r/291916 (https://phabricator.wikimedia.org/T129963)
[13:33:32] (CR) jenkins-bot: [V: -1] Deploy memcached 1.4.25 to mc1010 as part of a performance experiment. [puppet] - https://gerrit.wikimedia.org/r/291916 (https://phabricator.wikimedia.org/T129963) (owner: Elukey)
[13:33:50] Puppet, ORES, Revision-Scoring-As-A-Service: ORES-staging is broken due to service::uwsgi mandatory scap::target invoke - https://phabricator.wikimedia.org/T136488#2340955 (Ladsgroup) Got a workaround with adding deployment-tin.deployment-prep.eqiad.wmflabs' IP as tin.eqiad.wmnet in /etc/hosts
[13:34:05] (PS1) Jcrespo: Add Ladsgroup to analytics-users [puppet] - https://gerrit.wikimedia.org/r/291918 (https://phabricator.wikimedia.org/T136417)
[13:34:13] (Abandoned) Ladsgroup: service: Let other methods of deployment work in uwsgi [puppet] - https://gerrit.wikimedia.org/r/291527 (owner: Ladsgroup)
[13:34:35] (CR) Jcrespo: [C: -2] "Not yet sponsored." [puppet] - https://gerrit.wikimedia.org/r/291918 (https://phabricator.wikimedia.org/T136417) (owner: Jcrespo)
[13:36:35] (PS3) Elukey: Deploy memcached 1.4.25 to mc1010 as part of a performance experiment. [puppet] - https://gerrit.wikimedia.org/r/291916 (https://phabricator.wikimedia.org/T129963)
[13:41:55] (PS4) Elukey: Deploy memcached 1.4.25 to mc1010 as part of a performance experiment. [puppet] - https://gerrit.wikimedia.org/r/291916 (https://phabricator.wikimedia.org/T129963)
[13:56:09] (CR) Elukey: "Puppet compiler looks good: https://puppet-compiler.wmflabs.org/3003/" [puppet] - https://gerrit.wikimedia.org/r/291916 (https://phabricator.wikimedia.org/T129963) (owner: Elukey)
[14:00:55] RECOVERY - Disk space on ms-be2012 is OK: DISK OK
[14:14:38] Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to stats for Ladsgroup - https://phabricator.wikimedia.org/T136417#2340987 (Halfak) +1 I'm working with @Ladsgroup and @TJones on some modeling work that will require access to private data on the stats machines.
[14:15:57] (PS1) Muehlenhoff: Add firejail profile and wrapper for ghostscript [puppet] - https://gerrit.wikimedia.org/r/291924 (https://phabricator.wikimedia.org/T135111)
[14:16:08] jynus: ^
[14:17:40] Operations, Ops-Access-Requests: Requesting deployment access (for deploying to scb) for Halfak - https://phabricator.wikimedia.org/T136612#2340989 (Halfak)
[14:18:15] (CR) Jcrespo: [C: 1] "confirmed by User:Halfak_(WMF) on https://phabricator.wikimedia.org/T136417#2340987" [puppet] - https://gerrit.wikimedia.org/r/291918 (https://phabricator.wikimedia.org/T136417) (owner: Jcrespo)
[14:18:15] Operations, Ops-Access-Requests: Requesting deployment access (for deploying to scb) for Halfak - https://phabricator.wikimedia.org/T136612#2341003 (Halfak) See @Ladsgroup's access request here: T136406 We'll need similar rights to be able to update and deploy #ORES.
[14:19:19] Operations, Ops-Access-Requests: Requesting deployment access (for deploying to scb) for Halfak - https://phabricator.wikimedia.org/T136612#2341005 (jcrespo) a:jcrespo
[14:19:23] Operations, Ops-Access-Requests: Requesting deployment access (for deploying to scb) for Halfak - https://phabricator.wikimedia.org/T136612#2341006 (Halfak) Notes that @Ladsgroup's permissions were added in this patch: https://gerrit.wikimedia.org/r/291255
[14:20:00] Operations, Ops-Access-Requests: Requesting deployment access (for deploying to scb) for Halfak - https://phabricator.wikimedia.org/T136612#2341007 (Halfak) Actually, it looks like https://gerrit.wikimedia.org/r/#/c/291716/ might be more relevant.
[14:20:36] subbu: I'd like to migrate wikitextexp to a different virt host, is it ok if there's an hour or so of downtime with that instance?
[14:20:54] (CR) Faidon Liambotis: Increase time before alter for elasticsearch disk space issues (5 comments) [puppet] - https://gerrit.wikimedia.org/r/290487 (owner: Gehel)
[14:21:42] Operations, Ops-Access-Requests: Requesting deployment access (for deploying to scb) for Halfak - https://phabricator.wikimedia.org/T136612#2341008 (Ladsgroup) https://gerrit.wikimedia.org/r/291716 is actually adding @halfak's permission as well
[14:23:07] (CR) Jcrespo: [C: 2] Add Ladsgroup to analytics-users [puppet] - https://gerrit.wikimedia.org/r/291918 (https://phabricator.wikimedia.org/T136417) (owner: Jcrespo)
[14:24:01] thanks :)
[14:24:26] I probably need to wait until puppetmasters catch up
[14:27:55] mmm, I do not see puppet creating your user
[14:34:11] (CR) Gehel: Increase time before alter for elasticsearch disk space issues (5 comments) [puppet] - https://gerrit.wikimedia.org/r/290487 (owner: Gehel)
[14:36:39] PROBLEM - Host beryllium is DOWN: PING CRITICAL - Packet loss = 100%
[14:37:23] (PS3) BBlack: varnish: jemalloc tuning for frontend caches [puppet] - https://gerrit.wikimedia.org/r/291592 (https://phabricator.wikimedia.org/T135384)
[14:37:36] (CR) BBlack: [C: 2 V: 2] varnish: jemalloc tuning for frontend caches [puppet] - https://gerrit.wikimedia.org/r/291592 (https://phabricator.wikimedia.org/T135384) (owner: BBlack)
[14:37:50] (PS3) BBlack: raise fe mem size to 37% on text and upload [puppet] - https://gerrit.wikimedia.org/r/291593 (https://phabricator.wikimedia.org/T135384)
[14:38:07] (CR) BBlack: [C: 2 V: 2] raise fe mem size to 37% on text and upload [puppet] - https://gerrit.wikimedia.org/r/291593 (https://phabricator.wikimedia.org/T135384) (owner: BBlack)
[14:38:17] icinga says 1/2
[14:38:43] oh, a wild downtime appears!
[14:39:14] Operations, Ops-Access-Requests: Requesting deployment access (for deploying to scb) for Halfak - https://phabricator.wikimedia.org/T136612#2340989 (DarTar) Supporting this request as @halfak's manager. This will substantially improve ORES release workflow.
[14:39:45] author marvin-bot ?
[14:41:27] (PS6) Gehel: Increase time before alter for elasticsearch disk space issues [puppet] - https://gerrit.wikimedia.org/r/290487
[14:41:43] (CR) Gehel: Increase time before alter for elasticsearch disk space issues (2 comments) [puppet] - https://gerrit.wikimedia.org/r/290487 (owner: Gehel)
[14:41:52] (CR) jenkins-bot: [V: -1] Increase time before alter for elasticsearch disk space issues [puppet] - https://gerrit.wikimedia.org/r/290487 (owner: Gehel)
[14:46:50] (PS2) Thcipriani: Scap3 config for tilerator [puppet] - https://gerrit.wikimedia.org/r/291268 (https://phabricator.wikimedia.org/T129146)
[14:49:06] (PS7) Gehel: Increase time before alter for elasticsearch disk space issues [puppet] - https://gerrit.wikimedia.org/r/290487
[14:52:57] Operations, ops-codfw: Faulty RAM on mc2001 - https://phabricator.wikimedia.org/T136558#2341072 (RobH) mc2001 is out of warranty for about a year now. I'll check with @mark and see if we're planning on replacing these in Q1 of next fiscal, or later. (Later and we'll want to replace the faulty memory.)...
[14:53:55] (PS1) Thcipriani: Scap3 config for Kartotherian [puppet] - https://gerrit.wikimedia.org/r/291930 (https://phabricator.wikimedia.org/T129150)
[14:54:15] !log rolling reboot of scb systems in codfw for Linux 4.4 upgrade
[14:54:16] Operations, ops-eqiad: HP Warning on boot [Firmware Bug]: the BIOS has corrupted hw-PMU resources - https://phabricator.wikimedia.org/T136345#2341077 (RobH) a:RobH This isn't really ops-eqiad, but an issue for all HP setups onsite. I'll keep this assigned to me until the documentation is updated on...
[14:54:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:55:02] <_joe_> /win 23
[14:55:39] andrewbogott, sounds good. i have 2 vms .. mw-expt and mw-base .. do you mean both of those or just one? but, either is okay.
[14:56:22] subbu: oh sorry, I was reading the wrong line :) Instance name 'mw-base '
[14:56:29] I'll start moving it right now
[14:56:38] k
[15:00:00] (PS1) Jcrespo: Add analytics-users accounts to stats1002 [puppet] - https://gerrit.wikimedia.org/r/291932
[15:00:04] anomie ostriches thcipriani marktraceur Krenair: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160531T1500). Please do the needful.
[15:00:04] mobrovac Pchelolo: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[15:00:42] I can SWAT today: mobrovac Pchelolo ping me when you're around
[15:00:51] here Th
[15:00:54] here thcipriani
[15:00:59] let's go :)
[15:01:04] (CR) Jcrespo: "I do not know if this is right or documentation should be updated." [puppet] - https://gerrit.wikimedia.org/r/291932 (owner: Jcrespo)
[15:01:27] (PS2) Thcipriani: Math: Enable MathML everywhere but private wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/291766 (https://phabricator.wikimedia.org/T131177) (owner: Mobrovac)
[15:01:29] thcipriani: I'm here too
[15:02:15] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/291766 (https://phabricator.wikimedia.org/T131177) (owner: Mobrovac)
[15:03:02] (Merged) jenkins-bot: Math: Enable MathML everywhere but private wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/291766 (https://phabricator.wikimedia.org/T131177) (owner: Mobrovac)
[15:03:42] !log restarting all frontend caches for new memory params (randomized order, ~1-min spacing, ~2h to completion) - T135384
[15:03:43] T135384: Raise cache frontend memory sizes significantly - https://phabricator.wikimedia.org/T135384
[15:03:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:05:43] Operations, Labs, Wikimedia-Video, Need-volunteer: Upload the Wikimania 2014 videos to Commons - https://phabricator.wikimedia.org/T106038#2341123 (chasemp) p:Triage>Low
[15:05:59] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:291766|Math: Enable MathML everywhere but private wikis]] (duration: 00m 34s)
[15:06:02] ^ mobrovac check please
[15:06:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:06:38] checking
[15:07:36] !log rebooting mendelevium (ticket.wikimedia.org) for update to Linux 4.4
[15:07:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:08:19] RECOVERY - Host beryllium is UP: PING OK - Packet loss = 0%, RTA = 2.31 ms
[15:08:21] thcipriani: works!
[15:08:22] thnx!
[15:08:30] mobrovac: cool, thanks for checking!
[15:08:47] (PS4) Andrew Bogott: Allow horizon to query the labs puppetmaster for a list of classes [puppet] - https://gerrit.wikimedia.org/r/284103
[15:08:59] PROBLEM - HHVM rendering on mw1180 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:09:20] PROBLEM - Apache HTTP on mw1180 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:10:19] Operations, ops-codfw, DBA: es2017 and es2019 crashed with no logs - https://phabricator.wikimedia.org/T130702#2341138 (Papaul) @Volans it is not a problem to update the RAID controller firmware .
[15:10:27] !log thcipriani@tin Synchronized php-1.28.0-wmf.3/extensions/EventBus/EventBus.hooks.php: SWAT: [[gerrit:291904|Replace wfUrlEncode with rawurlencode]] (duration: 00m 27s)
[15:10:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:10:35] ^ Pchelolo check please
[15:12:28] (CR) Alexandros Kosiaris: [C: 2] network: Move into module [puppet] - https://gerrit.wikimedia.org/r/291234 (owner: Alexandros Kosiaris)
[15:12:33] (PS7) Alexandros Kosiaris: network: Move into module [puppet] - https://gerrit.wikimedia.org/r/291234
[15:12:41] (CR) Alexandros Kosiaris: [C: 2 V: 2] network: Move into module [puppet] - https://gerrit.wikimedia.org/r/291234 (owner: Alexandros Kosiaris)
[15:13:29] thcipriani: it's not easy to check, but from what I can tell it's ok
[15:13:31] thank you
[15:13:39] (PS1) Jcrespo: Move ladsgroup from analytics-users to statistics-users [puppet] - https://gerrit.wikimedia.org/r/291933 (https://phabricator.wikimedia.org/T136417)
[15:13:39] Pchelolo: ack. Thanks.
[15:14:56] (PS2) Jcrespo: Move ladsgroup from analytics-users to statistics-users [puppet] - https://gerrit.wikimedia.org/r/291933 (https://phabricator.wikimedia.org/T136417)
[15:15:09] PROBLEM - check_apache2 on payments2001 is CRITICAL: PROCS CRITICAL: 0 processes with command name apache2
[15:15:31] Operations, ops-codfw: lvs2006 degraded RAID - https://phabricator.wikimedia.org/T136584#2341163 (Papaul) p:Triage>High
[15:15:39] (CR) Lokal Profil: "Added to the Wednesday, June 01 Morning SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/288582 (https://phabricator.wikimedia.org/T135212) (owner: Lokal Profil)
[15:16:39] (PS2) Alexandros Kosiaris: sca: remove cxserver-admin [puppet] - https://gerrit.wikimedia.org/r/291785
[15:16:49] (CR) Alexandros Kosiaris: [C: 2 V: 2] sca: remove cxserver-admin [puppet] - https://gerrit.wikimedia.org/r/291785 (owner: Alexandros Kosiaris)
[15:16:52] Operations, DBA, Labs, Tool-Labs: Replicate wikimania2017wiki to labs - https://phabricator.wikimedia.org/T126096#2341169 (chasemp) p:Triage>Normal
[15:17:19] (PS5) Andrew Bogott: Allow horizon to query the labs puppetmaster for a list of classes [puppet] - https://gerrit.wikimedia.org/r/284103
[15:17:35] (CR) Alexandros Kosiaris: [C: 2] Introduce ores.svc.eqiad.wmnet [dns] - https://gerrit.wikimedia.org/r/277725 (https://phabricator.wikimedia.org/T124202) (owner: Alexandros Kosiaris)
[15:18:19] (CR) jenkins-bot: [V: -1] Allow horizon to query the labs puppetmaster for a list of classes [puppet] - https://gerrit.wikimedia.org/r/284103 (owner: Andrew Bogott)
[15:19:16] Operations, Discovery, Discovery-Search-Backlog, Labs, hardware-requests: eqiad: (2) Relevance forge servers - https://phabricator.wikimedia.org/T131184#2341198 (chasemp) p:Triage>Normal
[15:19:19] PROBLEM - puppet last run on db1022 is CRITICAL: CRITICAL: puppet fail
[15:19:20] (PS6) Andrew Bogott: Allow horizon to query the labs puppetmaster for a list of classes [puppet] - https://gerrit.wikimedia.org/r/284103
[15:19:26] Puppet, Labs, Phabricator: Phabricator labs puppet role configures phabricator wrong - https://phabricator.wikimedia.org/T131899#2341199 (chasemp) p:Triage>Normal
[15:20:09] PROBLEM - check_apache2 on payments2002 is CRITICAL: PROCS CRITICAL: 0 processes with command name apache2
[15:20:10] PROBLEM - check_apache2 on payments2001 is CRITICAL: PROCS CRITICAL: 0 processes with command name apache2
[15:20:10] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:21:35] (PS1) Eevans: Upgrade eqiad rack 'a' to Cassandra 2.2 [puppet] - https://gerrit.wikimedia.org/r/291935 (https://phabricator.wikimedia.org/T126629)
[15:24:03] (PS3) Jcrespo: Move ladsgroup from analytics-users to statistics-users [puppet] - https://gerrit.wikimedia.org/r/291933 (https://phabricator.wikimedia.org/T136417)
[15:25:05] Operations, ops-codfw, DBA: es2017 and es2019 crashed with no logs - https://phabricator.wikimedia.org/T130702#2341216 (jcrespo) Do you need downtime for that? If yes, let's program a time.
[15:25:09] RECOVERY - check_apache2 on payments2001 is OK: PROCS OK: 6 processes with command name apache2
[15:25:10] PROBLEM - check_apache2 on payments2002 is CRITICAL: PROCS CRITICAL: 0 processes with command name apache2
[15:25:11] PROBLEM - check_apache2 on payments2003 is CRITICAL: PROCS CRITICAL: 0 processes with command name apache2
[15:25:11] PROBLEM - check_puppetrun on payments2002 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:25:39] (CR) Jcrespo: [C: 2] Move ladsgroup from analytics-users to statistics-users [puppet] - https://gerrit.wikimedia.org/r/291933 (https://phabricator.wikimedia.org/T136417) (owner: Jcrespo)
[15:28:17] subbu: all done
[15:28:18] (PS55) Alexandros Kosiaris: ores: Scap3 deployment configurations [puppet] - https://gerrit.wikimedia.org/r/280403 (owner: Ladsgroup)
[15:29:01] andrewbogott, ok. thanks.
[15:29:40] Operations, Labs, Labs-Infrastructure, Traffic: Move californium to an internal host? - https://phabricator.wikimedia.org/T133149#2341254 (chasemp) p:Triage>Normal
[15:30:10] RECOVERY - check_apache2 on payments2002 is OK: PROCS OK: 6 processes with command name apache2
[15:30:10] RECOVERY - check_apache2 on payments2003 is OK: PROCS OK: 6 processes with command name apache2
[15:30:10] RECOVERY - check_puppetrun on payments2002 is OK: OK: Puppet is currently enabled, last run 227 seconds ago with 0 failures
[15:31:38] Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to stats for Ladsgroup - https://phabricator.wikimedia.org/T136417#2341265 (jcrespo) Open>Resolved User confirmed on IRC the access.
[15:32:10] Operations, Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure, Monitoring: Have a paging check for Nova API accessible - https://phabricator.wikimedia.org/T133656#2341273 (chasemp) p:Triage>High I believe this is still happening on infrequently
[15:32:39] (Abandoned) Jcrespo: Add analytics-users accounts to stats1002 [puppet] - https://gerrit.wikimedia.org/r/291932 (owner: Jcrespo)
[15:36:49] (CR) Ottomata: [C: 1] Set Kafka default cleanup policy to 'delete' to avoid any compaction with 0.9 [puppet] - https://gerrit.wikimedia.org/r/291697 (owner: Elukey)
[15:37:58] (CR) Alexandros Kosiaris: [C: 2] "merging" [puppet] - https://gerrit.wikimedia.org/r/280403 (owner: Ladsgroup)
[15:38:04] (PS56) Alexandros Kosiaris: ores: Scap3 deployment configurations [puppet] - https://gerrit.wikimedia.org/r/280403 (owner: Ladsgroup)
[15:38:08] (CR) Alexandros Kosiaris: [V: 2] ores: Scap3 deployment configurations [puppet] - https://gerrit.wikimedia.org/r/280403 (owner: Ladsgroup)
[15:42:50] PROBLEM - Restbase root url on restbase1012 is CRITICAL: Connection refused
[15:43:00] PROBLEM - Host cp3016 is DOWN: PING CRITICAL - Packet loss = 100%
[15:43:03] ^^ looking
[15:43:10] PROBLEM - Host cp3006 is DOWN: PING CRITICAL - Packet loss = 100%
[15:43:10] PROBLEM - Host cp3037 is DOWN: PING CRITICAL - Packet loss = 100%
[15:43:10] PROBLEM - Host cp3045 is DOWN: PING CRITICAL - Packet loss = 100%
[15:43:11] PROBLEM - restbase endpoints health on restbase1012 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.32.79, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[15:43:20] RECOVERY - Host cp3006 is UP: PING OK - Packet loss = 0%, RTA = 84.00 ms
[15:43:20] RECOVERY - Host cp3045 is UP: PING OK - Packet loss = 0%, RTA = 83.60 ms
[15:43:20] RECOVERY - Host cp3037 is UP: PING OK - Packet loss = 0%, RTA = 83.51 ms
[15:43:29] RECOVERY - Host cp3016 is UP: PING OK - Packet loss = 0%, RTA = 83.26 ms
[15:44:14] thcipriani, hi, want to hold my hand for scap3 in 1.25 hours?
[15:44:16] mobrovac: restbase on 1012 is down again, it logged a shutdown
[15:44:45] yurik: if we can find an opsen that's willing to hold both our hands :)
[15:44:54] gehel, ^?
[15:45:00] RECOVERY - puppet last run on db1022 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:45:38] uf
[15:45:41] kk thnx urandom
[15:45:55] mobrovac: gremlins?
[15:51:14] !log Disabling puppet in preparation for upgrade on restbase1007, 1010, and 1011 : T126629
[15:51:15] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629
[15:51:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:53:21] * gehel reading back...
[15:54:00] PROBLEM - Host beryllium is DOWN: PING CRITICAL - Packet loss = 100%
[15:54:25] yurik, thcipriani: I know mostly nothing about scap3, so I can hold your virtual hands if you want, but that's probably the extent of my contribution
[15:54:40] RECOVERY - Restbase root url on restbase1012 is OK: HTTP OK: HTTP/1.1 200 - 15273 bytes in 0.009 second response time
[15:54:45] thcipriani, what would we need from opsen?
[15:55:10] RECOVERY - restbase endpoints health on restbase1012 is OK: All endpoints are healthy
[15:55:23] yurik: we just need to merge the two puppet patches + run puppet in a couple places.
[15:55:47] thcipriani, who would be the most ideal opsen for the task?
[15:55:53] (and the 2nd idea)
[15:55:55] _joe_: and godog have been helping out with the scap3 migrations
[15:55:56] ideal)
[15:56:34] yurik: maybe we can schedule the move for your services with them soonish. They may be gone already :\
[15:56:51] thcipriani, i will be traveling in the next few days, will not be very stable for me
[15:57:00] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production).
[15:57:19] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production).
[15:57:28] <_joe_> thcipriani: both me and godog are on vacation for the rest of the week btw
[15:57:51] _joe_: oh, was unaware, sorry for the ping, thanks for the update.
[15:58:26] yurik: maybe it'd be best to schedule for next week then, from the sounds of it.
[15:58:29] <_joe_> thcipriani: well I am here tomorrow, but it seems yurik won't :P
[15:58:52] sounds like next week it is :)
[15:59:00] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[15:59:07] (PS1) Alexandros Kosiaris: Enable ores.svc.eqiad.wmnet IP on scb cluster [puppet] - https://gerrit.wikimedia.org/r/291942
[15:59:09] (PS1) Alexandros Kosiaris: conftool: Add ores.svc.eqiad.wmnet [puppet] - https://gerrit.wikimedia.org/r/291943
[15:59:11] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge.
[15:59:11] (PS1) Alexandros Kosiaris: Add ores-admin to scb cluster [puppet] - https://gerrit.wikimedia.org/r/291944
[15:59:13] (PS1) Alexandros Kosiaris: lvs: add ores [puppet] - https://gerrit.wikimedia.org/r/291945
[15:59:18] i could even do it during the day if needed (i'm on UTC+3 time)
[16:00:04] godog moritzm coreyfloyd: Dear anthropoid, the time has come. Please deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160531T1600).
[16:00:04] urandom: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[16:00:12] * urandom is present
[16:00:13] o/
[16:00:23] <_joe_> I can puppetswat if no one's around
[16:00:58] (PS2) Alexandros Kosiaris: Enable ores.svc.eqiad.wmnet IP on scb cluster [puppet] - https://gerrit.wikimedia.org/r/291942
[16:01:04] (CR) Alexandros Kosiaris: [C: 2 V: 2] Enable ores.svc.eqiad.wmnet IP on scb cluster [puppet] - https://gerrit.wikimedia.org/r/291942 (owner: Alexandros Kosiaris)
[16:01:07] <_joe_> urandom: how should we proceed?
[16:01:20] <_joe_> disable puppet on the hosts before upgrading?
[16:01:25] _joe_: already done
[16:01:27] <_joe_> and then run sequentially?
[16:01:39] who restarted rb on restbase1012?
[16:01:40] (PS2) Giuseppe Lavagetto: Upgrade eqiad rack 'a' to Cassandra 2.2 [puppet] - https://gerrit.wikimedia.org/r/291935 (https://phabricator.wikimedia.org/T126629) (owner: Eevans)
[16:01:41] yurik: kk, next week wfm. I want to make scap migrations a more regular calendar event just to make scheduling less ad-hoc.
[16:01:57] _joe_: thank you sir!
[16:01:58] <_joe_> mobrovac: need me to take a look?
[16:02:07] thcipriani, scap migrations or scap depls? :)
[16:02:11] _joe_: also, you have a strange idea of how to vacation!
[16:02:15] <_joe_> urandom: so it's safe to merge if I understood correctly
[16:02:21] <_joe_> urandom: actually I am working tomorrow
[16:02:23] _joe_: yup
[16:02:28] no _joe_, i'm there, but it seems rb went down and came back up
[16:02:30] _joe_: safe to merge
[16:02:34] and i'm not sure what's up with that
[16:02:47] yurik: just the initial migrations. the deploys should already mostly be on the calendar :)
[16:02:57] <_joe_> mobrovac: check /var/log/auth.log if someone did it by hand
[16:03:06] kk thnx
[16:03:22] <_joe_> jenkins makes me sad
[16:03:31] (CR) Giuseppe Lavagetto: [C: 2 V: 2] Upgrade eqiad rack 'a' to Cassandra 2.2 [puppet] - https://gerrit.wikimedia.org/r/291935 (https://phabricator.wikimedia.org/T126629) (owner: Eevans)
[16:03:36] \o/
[16:04:02] <_joe_> urandom: merged
[16:04:12] <_joe_> should I run puppet or are you going to do that?
[16:04:24] _joe_: no, i will do that
[16:04:30] incrementally :)
[16:04:41] _joe_: thank you!
[16:04:47] <_joe_> cool
[16:04:51] <_joe_> that was easy :P
[16:04:56] yeah :)
[16:05:46] <_joe_> yurik: and btw, I guess gehel will need to learn about scap3 sooner or later ;)
[16:05:57] <_joe_> he's your guy, right?
[16:06:14] _joe_: yep, I'm happy to follow that one to learn a bit about it...
[16:06:14] _joe_, agreed, but i don't want him to get overloaded and run screaming for the mountains
[16:06:16] !log Stopping restbase1007-{a,b,c} in preparation for upgrade : T126629
[16:06:17] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629
[16:06:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:07:11] <_joe_> yurik: and me/filippo would be the alternatives not overloaded? ;)
[16:07:45] !log depooling cp3032 to investigate T126062
[16:07:46] T126062: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062
[16:07:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:07:51] <_joe_> my point is, it's a good thing for a few project your team has if gehel gets his ears wet with scap3
[16:08:00] <_joe_> *projects
[16:08:14] hehe, totally, its in the pipeline :)
[16:08:17] * gehel fully agree with _joe_
[16:08:19] <_joe_> I'm sure me/alex/filippo will help
[16:08:52] but lets launch production maps first :)
[16:09:25] so that you can forget about it?
[16:09:27] tsc tsc
[16:09:40] Concretely, what do I need to do for this scap3 deployment?
[16:10:09] !log Upgrading Cassandra to 2.2.6 on restbase1007.eqiad.wmnet : T126629
[16:10:10] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629
[16:10:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:10:59] Operations, ops-codfw, DBA: db2034 degraded RAID - https://phabricator.wikimedia.org/T136583#2341475 (Papaul) p:Triage>Normal
[16:11:17] Operations, ops-codfw, DBA: db2034 degraded RAID - https://phabricator.wikimedia.org/T136583#2339694 (Papaul) Will have the disk on site tomorrow Dear Mr Papaul Tshibamba, Thank you for contacting Hewlett Packard Enterprise for your service request. This email confirms your request for service and...
[16:11:51] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures
[16:12:02] PROBLEM - cassandra-c CQL 10.64.0.232:9042 on restbase1007 is CRITICAL: Connection refused
[16:12:22] ^^^ expected
[16:12:22] PROBLEM - cassandra-a CQL 10.64.0.230:9042 on restbase1007 is CRITICAL: Connection refused
[16:12:27] gehel: for tilerator, you'd merge: https://gerrit.wikimedia.org/r/#/c/291268/ then run puppet on tin, yurik would run: `scap deploy` for tilerator, it will fail, then you'll run puppet on the tilerator service nodes, it should succeed, then yurik can run `scap deploy` for tilerator again from tin and this time it should succeed and new code should be deployed.
[16:12:42] Operations, ops-codfw: lvs2006 degraded RAID - https://phabricator.wikimedia.org/T136584#2341482 (Papaul) Will receive the disk tomorrow. Dear Mr Papaul Tshibamba, Thank you for contacting Hewlett Packard Enterprise for your service request. This email confirms your request for service and the details...
[16:13:00] gehel: same process for kartotherian with https://gerrit.wikimedia.org/r/#/c/291930/
[16:13:01] PROBLEM - cassandra-b CQL 10.64.0.231:9042 on restbase1007 is CRITICAL: Connection refused
[16:13:27] thcipriani: ok, so if everything goes as planned, it's easy, and if not I'll scream for help. I can do that.
[16:13:34] :D
[16:13:52] RECOVERY - cassandra-c CQL 10.64.0.232:9042 on restbase1007 is OK: TCP OK - 0.000 second response time on port 9042
[16:14:06] sounds like the tent-posts of every deployment :)
[16:14:22] RECOVERY - cassandra-a CQL 10.64.0.230:9042 on restbase1007 is OK: TCP OK - 0.001 second response time on port 9042
[16:14:52] RECOVERY - cassandra-b CQL 10.64.0.231:9042 on restbase1007 is OK: TCP OK - 0.003 second response time on port 9042
[16:15:29] yurik, thcipriani: when does this need to happen?
[16:15:41] !log Upgrade of restbase1007-{a,b,c} complete : T126629
[16:15:42] no rush i guess
[16:15:42] T126629: Cassandra 2.1.13 and/or 2.2.6 - https://phabricator.wikimedia.org/T126629
[16:15:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:17:58] yurik: your call. I'm around during your deployment window today. These two will have to merge pre-scap-deploy: https://gerrit.wikimedia.org/r/#/c/285979/ https://gerrit.wikimedia.org/r/#/c/285980/
[16:18:36] Operations, ops-esams, DC-Ops, Traffic: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062#2341497 (BBlack) Supporting the theory that these need firmware updates.... cp2001 racadm getversion: ``` Bios Version = 1.2.10 iDRAC Version...
[16:18:41] thcipriani, sure, i have exactly one hour during the depl :)
[16:18:50] afterwards is a meeting, and than i'm running for the train )
[16:19:06] (PS3) Jcrespo: Add ores-admins group and provide permissions for scb [puppet] - https://gerrit.wikimedia.org/r/291716 (https://phabricator.wikimedia.org/T136406) (owner: Ladsgroup)
[16:19:28] gehel, btw, unrelated, seems like something was missed in the maps-test today - https://ganglia.wikimedia.org/latest/?r=week&cs=&ce=&m=cpu_report&c=Maps+Cluster+codfw&h=maps-test2001.codfw.wmnet&tab=m&vn=&hide-hf=false&mc=2&z=medium&metric_group=NOGROUPS
[16:19:37] could it be that db update broke?
[16:20:07] (PS3) Elukey: Set Kafka default cleanup policy to 'delete' to avoid any compaction with 0.9 [puppet] - https://gerrit.wikimedia.org/r/291697
[16:20:27] gehel, don't worry about it if it takes too long to check, but just in case its something obvious
[16:20:29] yurik: gehel let's schedule something for next week if that works for you both. I don't anticipate the migration taking an hour but an absolute timebox is probably not a good thing.
[16:20:48] thcipriani, agree, lets do it next week
[16:20:49] thx!
[16:20:51] sounds good to me.
[16:20:53] thanks!
[16:21:06] yurik: gehel awesome. Thanks both :)
[16:25:02] PROBLEM - puppet last run on mira is CRITICAL: CRITICAL: Puppet has 1 failures
[16:27:55] Operations, Analytics-Kanban, Traffic, Patch-For-Review: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2341533 (Ottomata) > This is a problem in the way we check data integrity rather than in vk itself, so we should fix our calculation...
[16:28:42] (CR) Jcrespo: "This is ready, could be optionally blocked on 278990" [puppet] - https://gerrit.wikimedia.org/r/291716 (https://phabricator.wikimedia.org/T136406) (owner: Ladsgroup)
[16:29:29] Operations, Analytics-Kanban, Traffic, Patch-For-Review: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2341538 (elukey) >>! In T136314#2340456, @elukey wrote: > 1) vk is correctly adding the start timestamp to our logs but this trigger...
[16:31:06] (PS4) Jcrespo: Add ores-admins group and provide permissions for scb [puppet] - https://gerrit.wikimedia.org/r/291716 (https://phabricator.wikimedia.org/T136406) (owner: Ladsgroup)
[16:31:59] (CR) Elukey: [C: 2] Set Kafka default cleanup policy to 'delete' to avoid any compaction with 0.9 [puppet] - https://gerrit.wikimedia.org/r/291697 (owner: Elukey)
[16:32:05] Operations, Ops-Access-Requests, Patch-For-Review: Requesting deployment access (for deploying to scb) for Halfak - https://phabricator.wikimedia.org/T136612#2341545 (jcrespo) Already ready to go: https://gerrit.wikimedia.org/r/291716 This could technically go at any time, but there is nothing to de...
[16:33:38] Operations, ops-esams, DC-Ops, Traffic: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062#2341551 (BBlack) Latest on Dell's site seems to be http://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=5GCHC - going to reconfirm we still have issues, then t...
[16:39:51] (CR) Jcrespo: [C: 1] Add ores-admins group and provide permissions for scb [puppet] - https://gerrit.wikimedia.org/r/291716 (https://phabricator.wikimedia.org/T136406) (owner: Ladsgroup)
[16:40:51] Operations, ops-esams, DC-Ops, Traffic: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062#2341578 (BBlack) So... cp3032 rebooted fine via software, after I had done a preemptive `racadm racreset`. Will move on to a few others that were known-problems in the past and se...
[16:44:12] (PS1) Jcrespo: Update autoinstall to include db1079-94 [puppet] - https://gerrit.wikimedia.org/r/291948 (https://phabricator.wikimedia.org/T135253)
[16:45:15] (PS1) Giuseppe Lavagetto: base: add service_sidekick [puppet] - https://gerrit.wikimedia.org/r/291949
[16:46:22] (CR) jenkins-bot: [V: -1] base: add service_sidekick [puppet] - https://gerrit.wikimedia.org/r/291949 (owner: Giuseppe Lavagetto)
[16:48:18] (CR) Jcrespo: [C: 2] Update autoinstall to include db1079-94 [puppet] - https://gerrit.wikimedia.org/r/291948 (https://phabricator.wikimedia.org/T135253) (owner: Jcrespo)
[16:48:26] (CR) Ottomata: [C: 2 V: 2] "Comments yay!" [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/291870 (https://phabricator.wikimedia.org/T136314) (owner: Elukey)
[16:50:01] (CR) Ottomata: [C: 1] access request for joe sutherland [puppet] - https://gerrit.wikimedia.org/r/290599 (https://phabricator.wikimedia.org/T136137) (owner: RobH)
[16:51:13] (PS5) Elukey: Deploy memcached 1.4.25 to mc1010 as part of a performance experiment. [puppet] - https://gerrit.wikimedia.org/r/291916 (https://phabricator.wikimedia.org/T129963)
[16:51:33] Operations, ops-eqiad, Patch-For-Review: Rack and set up 16 db's db1079-1094 - https://phabricator.wikimedia.org/T135253#2341606 (jcrespo)
[16:51:57] Operations, ops-eqiad, Patch-For-Review: Rack and set up 16 db's db1079-1094 - https://phabricator.wikimedia.org/T135253#2293311 (jcrespo) a:Cmjohnson>jcrespo
[16:52:52] !log disabling puppet on mc10* hosts as prep step for https://gerrit.wikimedia.org/r/#/c/291916. Memcached 1.4.25 will be deployed to mc1010 as part of a perf. test (T129963)
[16:52:53] T129963: Update memcached package and configuration options - https://phabricator.wikimedia.org/T129963
[16:52:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:54:08] (CR) Elukey: [C: 2] Deploy memcached 1.4.25 to mc1010 as part of a performance experiment. [puppet] - https://gerrit.wikimedia.org/r/291916 (https://phabricator.wikimedia.org/T129963) (owner: Elukey)
[16:54:28] (PS1) Madhuvishy: ifttt: Specify the right uwsgi plugin for python2 [puppet] - https://gerrit.wikimedia.org/r/291952
[16:54:32] oh, one host is actually installed already
[17:00:04] yurik gwicke cscott arlolra subbu: Respected human, time to deploy Services – Graphoid / Parsoid / OCG / Citoid (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160531T1700). Please do the needful.
[17:03:34] !log starting branch-cut for mediawiki and extensions for version 1.28.0-wmf.4
[17:03:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:03:48] nope, skipping this one
[17:04:23] (PS1) Jcrespo: Add bot_password as a private table [puppet] - https://gerrit.wikimedia.org/r/291954 (https://phabricator.wikimedia.org/T135074)
[17:06:45] could I get an opsent to remove /tmp/make-wmf-branch ?
[17:06:52] er opsen rather :P
[17:07:03] on tin
[17:07:06] where, tin?
[17:07:07] ok
[17:07:09] doing
[17:07:13] thanks
[17:08:03] thcipriani, done
[17:08:18] jynus: thank you!
[17:08:19] I will delete it when you confirm tin is not horribly broken
[17:08:30] (it is not there anymore)
[17:08:52] (rm redirects to move for ops)
[17:12:23] (CR) Chad: [C: 1] Add bot_password as a private table [puppet] - https://gerrit.wikimedia.org/r/291954 (https://phabricator.wikimedia.org/T135074) (owner: Jcrespo)
[17:17:55] !log depooled reboot of cp3040 - T126062
[17:17:56] T126062: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062
[17:18:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:19:31] (PS2) Jcrespo: Add bot_password as a private table [puppet] - https://gerrit.wikimedia.org/r/291954 (https://phabricator.wikimedia.org/T135074)
[17:21:08] (CR) Jcrespo: [C: 2] Add bot_password as a private table [puppet] - https://gerrit.wikimedia.org/r/291954 (https://phabricator.wikimedia.org/T135074) (owner: Jcrespo)
[17:26:56] !log restarting mysqls at sanitarium, some transitional lag on labs
[17:27:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:31:23] I saw a kernel update, opted for a full reboot
[17:32:20] PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[17:41:16] (PS7) Andrew Bogott: Allow horizon to query the labs puppetmaster for a list of classes [puppet] - https://gerrit.wikimedia.org/r/284103
[17:41:29] RECOVERY - Host beryllium is UP: PING OK - Packet loss = 0%, RTA = 2.90 ms
[17:45:10] (PS1) Jcrespo: Change filter to the actual real name: bot_passwords, plural [puppet] - https://gerrit.wikimedia.org/r/291956 (https://phabricator.wikimedia.org/T135074)
[17:45:12] (PS1) Urbanecm: Enable VE in NS_PROJECT in cswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/291957 (https://phabricator.wikimedia.org/T136628)
[17:46:14] (CR) Jcrespo: [C: 2] Change filter to the actual real name: bot_passwords, plural [puppet] - https://gerrit.wikimedia.org/r/291956 (https://phabricator.wikimedia.org/T135074) (owner: Jcrespo)
[17:50:36] (CR) Ladsgroup: "I think this can be merged now :)" [puppet] - https://gerrit.wikimedia.org/r/278989 (https://phabricator.wikimedia.org/T124201) (owner: Alexandros Kosiaris)
[17:52:26] (CR) Dereckson: "@Nemo: please schedule this during a SWAT window at https://wikitech.wikimedia.org/wiki/Deployments." [mediawiki-config] - https://gerrit.wikimedia.org/r/250384 (https://phabricator.wikimedia.org/T130442) (owner: Nemo bis)
[17:54:00] PROBLEM - Router interfaces on pfw-eqiad is CRITICAL: CRITICAL: host 208.80.154.218, interfaces up: 106, down: 1, dormant: 0, excluded: 1, unused: 0; ge-11/0/10: down - beryllium
[17:56:37] Operations, ops-codfw, media-storage: rack ms-be202[2-7] - https://phabricator.wikimedia.org/T136630#2341867 (RobH)
[17:57:20] Operations, ops-codfw, media-storage: rack ms-be202[2-7] - https://phabricator.wikimedia.org/T136630#2341884 (RobH) a:RobH>fgiunchedi I'm going to assign this to @fgiunchedi for his recommendation on how we need to space out the 6 new systems in the 4 rows. Once we have his feedback, @Papaul...
[17:57:56] Operations, ops-codfw, media-storage: rack ms-be202[2-7] - https://phabricator.wikimedia.org/T136630#2341888 (RobH)
[17:57:58] Operations, media-storage, Tracking: refresh swift hardware in codfw/eqiad (tracking) - https://phabricator.wikimedia.org/T130012#2341887 (RobH)
[17:58:11] PROBLEM - aqs endpoints health on aqs1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:00:29] PROBLEM - aqs endpoints health on aqs1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:01:05] (CR) Nemo bis: "I don't use SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/250384 (https://phabricator.wikimedia.org/T130442) (owner: Nemo bis)
[18:01:13] (CR) Dereckson: "You can use Depends-On: I66b437795a376223b02a0c8a87bddc197470a3b9 to park the dependency." [mediawiki-config] - https://gerrit.wikimedia.org/r/287936 (https://phabricator.wikimedia.org/T134770) (owner: Addshore)
[18:04:12] Operations, ops-codfw, media-storage: rack ms-be202[2-7] - https://phabricator.wikimedia.org/T136630#2341901 (RobH) It seems @fgiunchedi is out from now until the 10th. These servers may arrive on site before he returns.
[18:04:13] RECOVERY - aqs endpoints health on aqs1002 is OK: All endpoints are healthy
[18:05:03] Operations, DBA: Install, configure and provision recently arrived db core machines - https://phabricator.wikimedia.org/T133398#2230646 (jcrespo)
[18:05:05] Operations, ops-eqiad, Patch-For-Review: Rack and set up 16 db's db1079-1094 - https://phabricator.wikimedia.org/T135253#2341902 (jcrespo) Open>stalled Stalling until T133398 is completed as I've been warned there could be some network issues.
[18:05:21] Nemo_bis: the change isn't going to deploy itself by magic, I've asked on the task if a wight or n uria is willing to deploy it, but if they aren't interested either, we don't have magic processes to do it. What you could do to help is note on https://phabricator.wikimedia.org/T130442 the procedure to test your change, so the one willing to deploy it for you will know what to do.
[18:05:34] Operations, DBA: Install, configure and provision recently arrived db core machines - https://phabricator.wikimedia.org/T133398#2230646 (jcrespo) a:jcrespo
[18:06:42] RECOVERY - aqs endpoints health on aqs1003 is OK: All endpoints are healthy
[18:07:09] Operations, ops-eqiad, media-storage: rack/setup/deploy ms-be102[2-7] - https://phabricator.wikimedia.org/T136631#2341913 (RobH)
[18:07:24] Operations, ops-codfw, media-storage: rack/setup/deploy ms-be202[2-7] - https://phabricator.wikimedia.org/T136630#2341929 (RobH)
[18:12:43] (CR) Dereckson: [C: -1] "Blocked by https://phabricator.wikimedia.org/T124841" [mediawiki-config] - https://gerrit.wikimedia.org/r/285009 (https://phabricator.wikimedia.org/T104163) (owner: Urbanecm)
[18:13:23] !log mwscript deleteEqualMessages.php --wiki hrwikibooks (T45917)
[18:13:23] T45917: Delete all redundant "MediaWiki" pages for system messages - https://phabricator.wikimedia.org/T45917
[18:13:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:15:02] (Abandoned) Urbanecm: Enable DynamicPageList extension on te.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/285009 (https://phabricator.wikimedia.org/T104163) (owner: Urbanecm)
[18:15:42] (CR) Dereckson: [C: 1] "This patch is ready to deploy." [mediawiki-config] - https://gerrit.wikimedia.org/r/284087 (https://phabricator.wikimedia.org/T132972) (owner: Eranroz)
[18:17:09] (CR) Dereckson: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/291218 (https://phabricator.wikimedia.org/T58037) (owner: Matěj Suchánek)
[18:20:13] RECOVERY - Router interfaces on pfw-eqiad is OK: OK: host 208.80.154.218, interfaces up: 108, down: 0, dormant: 0, excluded: 1, unused: 0
[18:20:15] (CR) Dereckson: [C: 1] "This looks good to me, and depending change has been merged. So this is ready for deployment." [mediawiki-config] - https://gerrit.wikimedia.org/r/291218 (https://phabricator.wikimedia.org/T58037) (owner: Matěj Suchánek)
[18:20:20] thcipriani, yurik: did we decide on a date (more precise than "next week") for this scap3 thing?
[18:21:12] gehel, nope, up to thcipriani, i'm very flex :)
[18:21:18] gehel: no we didn't. Would be nice to get it on the deployemnt calendar as well. Monday and Wednesday are usually the most open days there.
[18:21:52] Monday pre-service deployment window?
[18:22:15] thcipriani, what time is it?
[18:22:18] oh wait, meeting time at that time.
[18:22:33] PROBLEM - Host cp3043 is DOWN: PING CRITICAL - Packet loss = 100%
[18:22:35] hehe, i think the depl calendar should be managed in gcal :)
[18:22:52] PROBLEM - Host cp3012 is DOWN: PING CRITICAL - Packet loss = 100%
[18:23:02] should we create a bot that copies all the events from wiki to gcal?
[18:23:03] PROBLEM - Host lvs3003 is DOWN: PING CRITICAL - Packet loss = 100%
[18:23:23] PROBLEM - Host cp3008 is DOWN: PING CRITICAL - Packet loss = 100%
[18:23:29] yurik: that bot would be great!
[18:23:42] i will let greg-g do that :)
[18:23:50] or who manages rel eng nowadays?
[18:24:02] RECOVERY - Host cp3008 is UP: PING WARNING - Packet loss = 54%, RTA = 83.61 ms
[18:24:02] yurik: too easy, you give the idea, you take care of the implementation!
[18:24:02] RECOVERY - Host cp3043 is UP: PING WARNING - Packet loss = 54%, RTA = 86.26 ms
[18:24:02] `?
[18:24:03] RECOVERY - Host cp3012 is UP: PING OK - Packet loss = 0%, RTA = 83.30 ms
[18:24:11] Wednesday 2016-06-08 18:00 UTC would work for me. Right after SoS.
[18:24:13] RECOVERY - Host lvs3003 is UP: PING OK - Packet loss = 0%, RTA = 87.73 ms
[18:25:25] gehel: I have been known to be around an hour before morning SWAT.
[18:25:36] so 7am SF time.
[18:25:51] thcipriani: damn! I'm not a morning person ... that's impressive!
[18:26:05] I'm not in SF's timezone :P
[18:26:12] that's 8am for me.
[18:26:42] Wednesday 18UTC sounds good to me. yurik ?
[18:27:28] gehel, thcipriani, 18UTC is ogod
[18:27:54] PROBLEM - puppet last run on ms-fe3001 is CRITICAL: CRITICAL: Puppet has 2 failures
[18:28:33] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: Puppet has 1 failures
[18:29:12] PROBLEM - puppet last run on cp3012 is CRITICAL: CRITICAL: Puppet has 1 failures
[18:30:35] PROBLEM - puppet last run on mc2007 is CRITICAL: CRITICAL: puppet fail
[18:35:23] yurik: gehel greg-g https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160608T1800
[18:36:32] (PS2) Ottomata: Make hive-metastore service depend on libmysql-jar in classpath [puppet/cdh] - https://gerrit.wikimedia.org/r/284506 (https://phabricator.wikimedia.org/T133198)
[18:42:52] (CR) Matěj Suchánek: "Ok, thanks. Sheduled for tomorrow's Morning SWAT." [mediawiki-config] - https://gerrit.wikimedia.org/r/291218 (https://phabricator.wikimedia.org/T58037) (owner: Matěj Suchánek)
[18:45:35] thcipriani: weee
[18:47:10] :)
[18:47:15] (PS3) Ottomata: Make hive-metastore service depend on libmysql-jar in classpath [puppet/cdh] - https://gerrit.wikimedia.org/r/284506 (https://phabricator.wikimedia.org/T133198)
[18:48:14] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 649 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6081860 keys - replication_delay is 649
[18:50:13] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:50:20] (CR) Ottomata: [C: 2] Make hive-metastore service depend on libmysql-jar in classpath [puppet/cdh] - https://gerrit.wikimedia.org/r/284506 (https://phabricator.wikimedia.org/T133198) (owner: Ottomata)
[18:52:22] Puppet, Labs, Phabricator: Phabricator labs puppet role configures phabricator wrong - https://phabricator.wikimedia.org/T131899#2342059 (mmodell) @luke081515 I'll work on it a bit and see if I can get it to be more automated.
[18:52:29] Puppet, Labs, Phabricator: Phabricator labs puppet role configures phabricator wrong - https://phabricator.wikimedia.org/T131899#2342060 (mmodell) a:mmodell
[18:53:50] (PS1) Ottomata: Update cdh submodule wth libmysql-java dependency fix [puppet] - https://gerrit.wikimedia.org/r/291964 (https://phabricator.wikimedia.org/T133198)
[18:54:03] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 2.008 second response time
[18:54:42] RECOVERY - puppet last run on cp3012 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[18:55:23] RECOVERY - puppet last run on ms-fe3001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[18:55:53] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[18:56:54] (CR) Ottomata: [C: 2] Update cdh submodule wth libmysql-java dependency fix [puppet] - https://gerrit.wikimedia.org/r/291964 (https://phabricator.wikimedia.org/T133198) (owner: Ottomata)
[18:57:52] RECOVERY - puppet last run on mc2007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[19:00:04] thcipriani: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160531T1900).
[19:02:44] * thcipriani does
[19:02:49] PROBLEM - HHVM rendering on mw1020 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:02:51] Operations, ops-eqiad, Labs, Labs-Infrastructure: connect usb external disk to labmon1001 - https://phabricator.wikimedia.org/T136242#2342154 (Cmjohnson) Connected a 3TB disk with the usb drive toaster. Did not mount.
[19:03:38] Puppet, ORES, Revision-Scoring-As-A-Service: ORES-staging is broken due to service::uwsgi mandatory scap::target invoke - https://phabricator.wikimedia.org/T136488#2336803 (yuvipanda) List of reasons why this is a problem: 1. Setting up a standalone scap3 master in a project that is not deployment-p...
[19:04:10] RECOVERY - HHVM rendering on mw1020 is OK: HTTP OK: HTTP/1.1 200 OK - 68586 bytes in 0.157 second response time
[19:04:30] Operations, ops-eqiad, Labs, Labs-Infrastructure: connect usb external disk to labmon1001 - https://phabricator.wikimedia.org/T136242#2342163 (Cmjohnson) a:Cmjohnson>RobH
[19:06:54] Puppet, ORES, Revision-Scoring-As-A-Service: ORES-staging is broken due to service::uwsgi mandatory scap::target invoke - https://phabricator.wikimedia.org/T136488#2342181 (Ladsgroup) {meme, src="southparkfan-approves", below=Great!}
[19:07:04] (PS2) Dereckson: Enable Visual Editor in NS_PROJECT in cs.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/291957 (https://phabricator.wikimedia.org/T136628) (owner: Urbanecm)
[19:08:19] (CR) Dereckson: [C: -1] Enable Visual Editor in NS_PROJECT in cs.wikipedia (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/291957 (https://phabricator.wikimedia.org/T136628) (owner: Urbanecm)
[19:10:40] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6031337 keys - replication_delay is 0
[19:11:59] (PS1) Yuvipanda: service: Allow not requiring scap3 for service::uwsgi [puppet] - https://gerrit.wikimedia.org/r/291967 (https://phabricator.wikimedia.org/T136488)
[19:12:17] Puppet, ORES, Revision-Scoring-As-A-Service, Patch-For-Review: ORES-staging is broken due to service::uwsgi mandatory scap::target invoke - https://phabricator.wikimedia.org/T136488#2342223 (yuvipanda) Patch takes slightly different approach, but same thing.
[19:14:23] !log Restart restbase1007-c.eqiad.wmnet because reasons
[19:14:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:15:45] (CR) Awight: Use full URL in $wgNoticeHideUrls (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/250384 (https://phabricator.wikimedia.org/T130442) (owner: Nemo bis)
[19:16:54] (PS8) Andrew Bogott: Allow horizon to query the labs puppetmaster for a list of classes [puppet] - https://gerrit.wikimedia.org/r/284103
[19:16:56] (PS1) Andrew Bogott: Allow horizon hosts to contact the labs puppetmaster. [puppet] - https://gerrit.wikimedia.org/r/291969 (https://phabricator.wikimedia.org/T91990)
[19:17:45] (PS1) Thcipriani: Group0 to 1.28.0-wmf.4 [mediawiki-config] - https://gerrit.wikimedia.org/r/291970
[19:18:41] (CR) Andrew Bogott: [C: 2] Allow horizon hosts to contact the labs puppetmaster. [puppet] - https://gerrit.wikimedia.org/r/291969 (https://phabricator.wikimedia.org/T91990) (owner: Andrew Bogott)
[19:22:09] Operations, ops-eqiad: mw1070-89 and mw1121-30 are shut down and should be physically decommissioned - https://phabricator.wikimedia.org/T133770#2342276 (Cmjohnson)
[19:22:16] Operations, ops-eqiad: mw1070-89 and mw1121-30 are shut down and should be physically decommissioned - https://phabricator.wikimedia.org/T133770#2242410 (Cmjohnson) Open>Resolved a:Cmjohnson
[19:22:18] Operations, Patch-For-Review, codfw-rollout, codfw-rollout-Jan-Mar-2016: Reduce the number of appservers we're using in eqiad - https://phabricator.wikimedia.org/T126242#2342280 (Cmjohnson)
[19:23:26] !log thcipriani@tin Started scap: testwiki to php-1.28.0-wmf.4 and rebuild l10n cache
[19:23:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:28:02] (PS1) Mobrovac: MobileApps: use the provided request templates for API calls [puppet] - https://gerrit.wikimedia.org/r/291972
[19:29:35] Operations, ops-eqiad, Patch-For-Review: rack/setup/deploy 3 eqiad druid nodes - https://phabricator.wikimedia.org/T134275#2342320 (Ottomata)
[19:29:46] Operations, ops-eqiad, Patch-For-Review: rack/setup/deploy 3 eqiad druid nodes - https://phabricator.wikimedia.org/T134275#2260510 (Ottomata) Open>Resolved
[19:34:11] Operations, ops-eqiad, media-storage: rack/setup/deploy ms-be102[2-7] - https://phabricator.wikimedia.org/T136631#2342330 (Southparkfan)
[19:35:02] Operations, ops-ulsfo: cp4016: bad power supply - https://phabricator.wikimedia.org/T134526#2342331 (RobH) Open>Resolved Dropping off the return to USPS today. {F4095382} {F4095386}
[19:35:05] (PS2) BBlack: tlsproxy: double ssl session cache size [puppet] - https://gerrit.wikimedia.org/r/291914
[19:35:34] (CR) BBlack: [C: 2 V: 2] tlsproxy: double ssl session cache size [puppet] - https://gerrit.wikimedia.org/r/291914 (owner: BBlack)
[19:35:53] (PS2) BryanDavis: Make the builder script less simple [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291525
[19:37:42] (CR) BryanDavis: "check experimental" [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291525 (owner: BryanDavis)
[19:37:53] (CR) Mobrovac: "OKed by PCC - https://puppet-compiler.wmflabs.org/3008/" [puppet] - https://gerrit.wikimedia.org/r/291972 (owner: Mobrovac)
[19:39:37] (CR) Mobrovac: [C: -1] "This needs to be deployed in sync with I945d21e341b5d6d7d3a9848ea166bc68f281878d, so -1'ing until the time comes." [puppet] - https://gerrit.wikimedia.org/r/291972 (owner: Mobrovac)
[19:45:55] !log restarting cp* nginxes for config update
[19:46:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:49:26] (CR) BryanDavis: [C: 2] Add base PHP container & php web container [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/290607 (owner: Yuvipanda)
[19:49:38] (CR) BryanDavis: [C: 2] Switch to using wikimedia-jessie as base container [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/290795 (owner: Yuvipanda)
[19:50:04] (CR) BryanDavis: [C: 2] Add a simple builder script [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/290793 (owner: Yuvipanda)
[19:50:21] (CR) BryanDavis: [C: 2] Add a java base + web container [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291222 (https://phabricator.wikimedia.org/T124903) (owner: Yuvipanda)
[19:50:31] (CR) Yuvipanda: [C: 2] Make the builder script less simple [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291525 (owner: BryanDavis)
[19:50:45] !log depooled reboot of cp3030 - T126062
[19:50:46] T126062: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062
[19:50:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:51:10] YuviPanda: is that all set up for zuul to do the actual merges?
[19:51:25] bd808: oh, good point, I've no idea
[19:51:42] Dereckson: Hi i left a comment at https://gerrit.wikimedia.org/r/#/c/291671/
[19:52:02] bd808: shall I just merge by hand? or is there a thing I need to do for zuul?
[19:53:22] PROBLEM - Host cp3030 is DOWN: PING CRITICAL - Packet loss = 100%
[19:53:42] YuviPanda: probably just merge yourself for now. I think https://gerrit.wikimedia.org/r/#/c/291685/ will get zuul and jenkins setup properly
[19:54:11] RECOVERY - Host cp3030 is UP: PING OK - Packet loss = 0%, RTA = 83.33 ms
[19:54:19] Or I can go hit v+2 on the start of the chain if you want
[19:54:39] bd808: nah, I feel ok doing the V+2
[19:54:50] cool beans
[19:54:50] (CR) Yuvipanda: [V: 2] Add base PHP container & php web container [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/290607 (owner: Yuvipanda)
[19:55:01] (CR) Yuvipanda: [V: 2] Switch to using wikimedia-jessie as base container [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/290795 (owner: Yuvipanda)
[19:55:29] (CR) Yuvipanda: [V: 2] Add a simple builder script (3 comments) [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/290793 (owner: Yuvipanda)
[19:55:43] (CR) Yuvipanda: [V: 2] Add a java base + web container [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291222 (https://phabricator.wikimedia.org/T124903) (owner: Yuvipanda)
[19:55:54] (CR) Yuvipanda: [V: 2] Make the builder script less simple [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291525 (owner: BryanDavis)
[19:57:04] !log depooled reboot of cp3031 - T126062
[19:57:04] T126062: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062
[19:57:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:59:12] PROBLEM - Host cp3031 is DOWN: PING CRITICAL - Packet loss = 100%
[20:00:33] RECOVERY - Host cp3031 is UP: PING OK - Packet loss = 0%, RTA = 94.20 ms
[20:02:02] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 660 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6036004 keys - replication_delay is 660
[20:02:34] !log depooled reboot of cp3032 - T126062
[20:02:35] T126062: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062
[20:02:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:02:54] !log depooled reboot of cp3033 (not 3032) - T126062
[20:02:55] T126062: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062
[20:02:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:08:40] (PS1) Yuvipanda: build: Allow disregarding cache when building docker images [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291980
[20:09:00] Per Dereckson suggestion I added a request for deploying a config change for hewiki (gerrit:284087) in the next SWAT slot. I added the request Deployments page.
[20:09:20] (CR) Yuvipanda: "Tested" [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291980 (owner: Yuvipanda)
[20:09:55] !log depooled reboot of cp3041 - T126062
[20:09:56] T126062: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062
[20:10:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:10:03] !log disabling cql binary transport on restbase1007-c
[20:10:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:11:42] eranroz: nice, thanks
[20:12:21] PROBLEM - Host cp3041 is DOWN: PING CRITICAL - Packet loss = 100%
[20:12:46] (PS2) BryanDavis: build: Allow disregarding cache when building docker images [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291980 (owner: Yuvipanda)
[20:13:17] (CR) BryanDavis: [C: 2] build: Allow disregarding cache when building docker images [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291980 (owner: Yuvipanda)
[20:13:22] RECOVERY - Host cp3041 is UP: PING OK - Packet loss = 0%, RTA = 91.72 ms
[20:13:35] !log thcipriani@tin Finished scap: testwiki to php-1.28.0-wmf.4 and rebuild l10n cache (duration: 50m 09s)
[20:13:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:13:42] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6023645 keys - replication_delay is 0
[20:14:13] thcipriani: can I sneak in a backport?
[20:14:44] jzerebecki: sure, hopefully not one I need a full scap on?
[20:14:55] (CR) BryanDavis: [V: 2] build: Allow disregarding cache when building docker images [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291980 (owner: Yuvipanda)
[20:15:21] PROBLEM - cassandra-c CQL 10.64.0.232:9042 on restbase1007 is CRITICAL: Connection refused
[20:15:35] bd808: thanks :D I'm working on small no-op refactors to toollabs-webservice too, so should be up soon
[20:16:08] thcipriani: no only one file https://gerrit.wikimedia.org/r/#/c/291974/
[20:18:57] jzerebecki: sure. I'll get it out (if zuul/wikibugs notices)
[20:23:01] RECOVERY - cassandra-c CQL 10.64.0.232:9042 on restbase1007 is OK: TCP OK - 0.000 second response time on port 9042
[20:23:55] !log depooled reboot of cp3042 - T126062
[20:23:56] T126062: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062
[20:24:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:26:11] PROBLEM - Host cp3042 is DOWN: PING CRITICAL - Packet loss = 100%
[20:27:02] RECOVERY - Host cp3042 is UP: PING OK - Packet loss = 0%, RTA = 83.92 ms
[20:28:34] !log depooled reboot of cp3043 - T126062
[20:28:34] T126062: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062
[20:28:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:28:59] (PS1) BryanDavis: [DO NOT MERGE] Verify that tox job notices flake8 failure [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291988
[20:29:41] (CR) jenkins-bot: [V: -1] [DO NOT MERGE] Verify that tox job notices flake8 failure [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291988 (owner: BryanDavis)
[20:31:13] PROBLEM - Host cp3043 is DOWN: PING CRITICAL - Packet loss = 100%
[20:31:42] RECOVERY - Host cp3043 is UP: PING OK - Packet loss = 0%, RTA = 85.89 ms
[20:32:18] (Abandoned) BryanDavis: [DO NOT MERGE] Verify that tox job notices flake8 failure [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291988 (owner: BryanDavis)
[20:32:33] PROBLEM - traffic-pool service on cp3043 is CRITICAL: CRITICAL - Expecting active but unit traffic-pool is activating
[20:33:50] !log thcipriani@tin Synchronized php-1.28.0-wmf.4/extensions/Wikidata/extensions/Wikibase/view/resources/jquery/ui/jquery.ui.tagadata.js: [[gerit:291974|Update Wikibase]] (duration: 00m 30s)
[20:33:53] RECOVERY - traffic-pool service on cp3043 is OK: OK - traffic-pool is active
[20:33:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:34:08] ^ jzerebecki wmf.3 coming
[20:34:13] PROBLEM - NTP on cp3043 is CRITICAL: NTP CRITICAL: Offset unknown
[20:34:30] Operations, ops-esams, DC-Ops, Traffic: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062#2342590 (BBlack) Open>Resolved a:BBlack All of cache_text in esams (8/12 of the nodes considered affected) have rebooted into 4.4.2-3+wmf1 today without issue. It coul...
[20:35:04] !log thcipriani@tin Synchronized php-1.28.0-wmf.3/extensions/Wikidata/extensions/Wikibase/view/resources/jquery/ui/jquery.ui.tagadata.js: [[gerit:291974|Update Wikibase]] (duration: 00m 23s)
[20:35:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:35:09] ^ jzerebecki check please
[20:36:04] RECOVERY - NTP on cp3043 is OK: NTP OK: Offset -0.05275964737 secs
[20:36:49] (CR) Thcipriani: [C: 2] Group0 to 1.28.0-wmf.4 [mediawiki-config] - https://gerrit.wikimedia.org/r/291970 (owner: Thcipriani)
[20:37:01] thcipriani: it seems i still get old js
[20:37:29] thcipriani: ok works now
[20:37:29] thx
[20:37:41] jzerebecki: np :)
[20:37:55] (Merged) jenkins-bot: Group0 to 1.28.0-wmf.4 [mediawiki-config] - https://gerrit.wikimedia.org/r/291970 (owner: Thcipriani)
[20:38:04] (PS1) Yuvipanda: java: Inherit from correct base image name [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291990
[20:38:13] bd808: ^ we can test on this
[20:38:21] (CR) jenkins-bot: [V: -1] java: Inherit from correct base image name [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291990 (owner: Yuvipanda)
[20:38:33] hah
[20:38:45] (PS2) Yuvipanda: java: Inherit from correct base image name [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291990
[20:39:46] !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.28.0-wmf.4
[20:39:50] YuviPanda: I think the php template has the same problem
[20:39:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:41:14] (PS3) Yuvipanda: Make web templates inherit from correct base templates [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291990
[20:41:25] bd808: yeah, I think it worked for php because there might've been a hand built toollabs-php before
[20:41:33] (PS4) Yuvipanda: Make web templates inherit from correct base templates [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291990
[20:42:17] well that's weird. Seeing lots of notices for Notice: Undefined variable: wgAbuseFilterAvailableActions in /srv/mediawiki/wmf-config/abusefilter.php on line 23
[20:42:23] (CR) BryanDavis: [C: 2] Make web templates inherit from correct base templates [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291990 (owner: Yuvipanda)
[20:42:29] (CR) jenkins-bot: [V: -1] Make web templates inherit from correct base templates [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291990 (owner: Yuvipanda)
[20:42:31] (PS3) Dereckson: Add namespace translation 'Portal' for diq [mediawiki-config] - https://gerrit.wikimedia.org/r/284866 (https://phabricator.wikimedia.org/T133702) (owner: Raimond Spekking)
[20:42:55] hrrm
[20:43:51] (CR) Dereckson: [C: 1] "So, finally, we've got feedback from the community: they agree with this change." [mediawiki-config] - https://gerrit.wikimedia.org/r/284866 (https://phabricator.wikimedia.org/T133702) (owner: Raimond Spekking)
[20:44:05] (Merged) jenkins-bot: Make web templates inherit from correct base templates [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/291990 (owner: Yuvipanda)
[20:44:07] ah, there it goes
[20:55:24] !log rolling back group0 wmf.4 for T136644 too much log spam
[20:55:25] T136644: Notice: Undefined variable: wgAbuseFilterAvailableActions in /srv/mediawiki/wmf-config/abusefilter.php on line 23 - https://phabricator.wikimedia.org/T136644
[20:55:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:55:53] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/, ref HEAD..readonly/master).
[20:57:08] (PS1) Thcipriani: Revert "Group0 to 1.28.0-wmf.4" [mediawiki-config] - https://gerrit.wikimedia.org/r/292027
[20:58:10] !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 back to 1.28.0-wmf.3
[20:58:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:58:58] (CR) Thcipriani: [C: 2] Revert "Group0 to 1.28.0-wmf.4" [mediawiki-config] - https://gerrit.wikimedia.org/r/292027 (owner: Thcipriani)
[20:59:35] (Merged) jenkins-bot: Revert "Group0 to 1.28.0-wmf.4" [mediawiki-config] - https://gerrit.wikimedia.org/r/292027 (owner: Thcipriani)
[21:06:05] (PS1) Yuvipanda: [WIP] Introduce 'Backends' [software/tools-webservice] - https://gerrit.wikimedia.org/r/292028
[21:10:15] (PS2) Yuvipanda: [WIP] Introduce 'Backends' [software/tools-webservice] - https://gerrit.wikimedia.org/r/292028
[21:12:51] Operations, DBA, Labs: disk failure on labsdb1002 - https://phabricator.wikimedia.org/T126946#2342722 (russblau) Is there any update on the status of this? On 23 May, the revision table was in progress and was expected to take ~12 hours. The pagelinks table is about 3X larger and so might be expected...
[21:12:52] thcipriani: I think I think this https://github.com/wikimedia/mediawiki-extensions-AbuseFilter/commit/e71808f4c4deca416ecd39160d12f2584bfb9d65 caused the problem
[21:13:23] thcipriani: What is on line 23 of /srv/mediawiki/wmf-config/abusefilter.php
[21:14:17] paladox: I saw that on the ticket. Added tgr to the ticket. https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/abusefilter.php#L21-L23
[21:14:25] (PS1) Madhuvishy: uwsgi: Allow specifying plugins optionally as a uwsgi command line option [puppet] - https://gerrit.wikimedia.org/r/292030
[21:14:30] Ok thanks
[21:14:39] YuviPanda: ^^
[21:14:55] not tested yet but about to
[21:16:17] (CR) Yuvipanda: [C: -1] uwsgi: Allow specifying plugins optionally as a uwsgi command line option (1 comment) [puppet] - https://gerrit.wikimedia.org/r/292030 (owner: Madhuvishy)
[21:16:23] madhuvishy: kk, did a nit
[21:17:02] YuviPanda: yup cool. can you merge https://gerrit.wikimedia.org/r/#/c/291952/
[21:19:43] thcipriani: it's best to use @username when adding someone to a ticket
[21:19:43] PROBLEM - cassandra-c CQL 10.64.0.232:9042 on restbase1007 is CRITICAL: Connection refused
[21:19:47] (PS2) Yuvipanda: ifttt: Specify the right uwsgi plugin for python2 [puppet] - https://gerrit.wikimedia.org/r/291952 (owner: Madhuvishy)
[21:19:54] (CR) Yuvipanda: [C: 2 V: 2] ifttt: Specify the right uwsgi plugin for python2 [puppet] - https://gerrit.wikimedia.org/r/291952 (owner: Madhuvishy)
[21:20:10] if you just update the subscribers, chances are they won't see it until someone adds a new comment
[21:20:54] ACKNOWLEDGEMENT - cassandra-c CQL 10.64.0.232:9042 on restbase1007 is CRITICAL: Connection refused eevans t-shooting - The acknowledgement expires at: 2016-06-01 21:20:33.
[21:21:02] tgr: hmm, hadn't noticed that, will keep it in mind in future.
[21:22:03] tgr: ticket I was talking about (if you missed backscroll): https://phabricator.wikimedia.org/T136644
[21:22:28] (PS3) Yuvipanda: [WIP] Introduce 'Backends' [software/tools-webservice] - https://gerrit.wikimedia.org/r/292028
[21:22:51] thcipriani: yeah, saw it, extension registration must cause that variable to be defined too late
[21:23:00] (CR) jenkins-bot: [V: -1] [WIP] Introduce 'Backends' [software/tools-webservice] - https://gerrit.wikimedia.org/r/292028 (owner: Yuvipanda)
[21:23:12] that's seemingly the case in this instance.
[21:23:16] maybe legoktm has an idea how to solve that nicely
[21:24:16] Operations, ops-codfw, Patch-For-Review: codfw old mw app server decomission - https://phabricator.wikimedia.org/T135468#2342748 (Papaul) disk wipe complete on mw2001-mw2016 and mw2018-mw2040. Those servers are unracked and stored in the storage area. Disk wipe in progress on mw2014-mw2060
[21:24:31] I commented
[21:24:31] (PS4) Yuvipanda: Introduce 'Backends' [software/tools-webservice] - https://gerrit.wikimedia.org/r/292028
[21:24:38] tgr would reverting so we can go on with wmf 4 and try next week with wmf 5 if it is fixed.
[21:26:21] thcipriani legoktm ^^
[21:26:23] PROBLEM - cxserver endpoints health on scb1002 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) is CRITICAL: Test Fetch enwiki Oxygen page returned the unexpected status 404 (expecting: 200)
[21:26:33] PROBLEM - cassandra-a CQL 10.64.0.230:9042 on restbase1007 is CRITICAL: Connection refused
[21:26:34] PROBLEM - restbase endpoints health on restbase1014 is CRITICAL: /page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage returned the unexpected status 500 (expecting: 200): /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200): /page/html/{title} (Get html by title from storage) is CRITICAL: Test Ge
[21:26:43] PROBLEM - restbase endpoints health on restbase1012 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200): /page/html/{title} (Get html by title from storage) is CRITICAL: Test Get html by title from storage returned the unexpected status 500 (expecting: 200): /page/title/{title} (Get rev by title from storage) is CRITICAL
[21:26:44] Operations, ops-codfw: rack/setup/deploy new codfw mw app servers - https://phabricator.wikimedia.org/T135466#2342763 (Papaul)
[21:26:52] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: /{domain}/v1/page/mobile-sections-lead/{title} (retrieve lead section of en.wp San Francisco page via mobile-sections-lead) is CRITICAL: Test retrieve lead section of en.wp San Francisco page via mobile-sections-lead returned the unexpected status 500 (expecting: 200): /{domain}/v1/page/mobile-sections-lead/{title} (retrieve lead section of en.wp Barack Obama page via mobil
[21:26:53] PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: /page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage returned the unexpected status 500 (expecting: 200): /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200): /page/html/{title} (Get html by title from storage) is CRITICAL: Test Ge
[21:26:53] PROBLEM - restbase endpoints health on restbase1008 is CRITICAL: /page/html/{title} (Get html by title from storage) is CRITICAL: Test Get html by title from storage returned the unexpected status 500 (expecting: 200): /page/mobile-sections/{title} (Get MobileApps Foobar page) is CRITICAL: Test Get MobileApps Foobar page returned the unexpected status 500 (expecting: 200)
[21:26:54] PROBLEM - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/page/mobile-summary/{title} (retrieve page preview of Dog page) is CRITICAL: Test retrieve page preview of Dog page returned the unexpected status 500 (expecting: 200): /{domain}/v1/page/mobile-sections-lead/{title} (retrieve lead section of en.wp San Francisco page via mobile-sections-lead) is CRITICAL: Test retrieve lead section of en.wp San Franc
[21:27:00] Operations, Commons, MediaWiki-File-management, Multimedia, and 2 others: Image cache issue when 'over-writing' an image on commons - https://phabricator.wikimedia.org/T119038#2342765 (BBlack)
[21:27:02] PROBLEM - cassandra-b CQL 10.64.0.231:9042 on restbase1007 is CRITICAL: Connection refused
[21:27:04] (CR) Yuvipanda: "I've tested this and it seems to work." [software/tools-webservice] - https://gerrit.wikimedia.org/r/292028 (owner: Yuvipanda)
[21:27:13] PROBLEM - cxserver endpoints health on scb1001 is CRITICAL: /v1/page/{language}/{title}{/revision} (Fetch enwiki Oxygen page) is CRITICAL: Test Fetch enwiki Oxygen page returned the unexpected status 404 (expecting: 200)
[21:27:13] PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage returned the unexpected status 500 (expecting: 200): /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200): /page/html/{title} (Get html by title from storage) is CRITICAL: Te
[21:27:33] PROBLEM - restbase endpoints health on restbase1015 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200): /page/html/{title} (Get html by title from storage) is CRITICAL: Test Get html by title from storage returned the unexpected status 500 (expecting: 200): /page/title/{title} (Get rev by title from storage) is CRITICAL
[21:27:33] PROBLEM - restbase endpoints health on restbase1011 is CRITICAL: /page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage returned the unexpected status 500 (expecting: 200): /page/title/{title} (Get rev by title from storage) is CRITICAL: Test Get rev by title from storage returned the unexpected status 500 (expecting: 200): /page/revision/{revision} (Get rev by ID) is CRITICAL: Test Get rev by ID r
[21:27:43] PROBLEM - restbase endpoints health on restbase1013 is CRITICAL: /page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage returned the unexpected status 500 (expecting: 200): /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200): /page/html/{title} (Get html by title from storage) is CRITICAL: Test Ge
[21:27:43] PROBLEM - restbase endpoints health on restbase1010 is CRITICAL: /media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200)
[21:27:51] ummm
[21:27:53] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: /{domain}/v1/page/media/{title} (retrieve images and videos of en.wp Cat page via media route) is CRITICAL: Test retrieve images and videos of en.wp Cat page via media route returned the unexpected status 500 (expecting: 200): /{domain}/v1/page/mobile-summary/{title} (retrieve page preview of Dog page) is CRITICAL: Test retrieve page preview of Dog page returned the unexpec
[21:28:13] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[21:28:14] :|
[21:28:38] !log Bouncing restbase on restbase1010.eqiad
[21:28:42] RECOVERY - restbase endpoints health on restbase1014 is OK: All endpoints are healthy
[21:28:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:29:06] thcipriani: I think I would rather just duplicate variable initialization in the config file for now and take a little more time writing the proper patch
[21:29:13] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [1000.0]
[21:29:13] RECOVERY - cxserver endpoints health on scb1001 is OK: All endpoints are healthy
[21:29:43] RECOVERY - restbase endpoints health on restbase1010 is OK: All endpoints are healthy
[21:29:53] !log Bouncing restbase on restbase1008.eqiad.wmnet
[21:29:53] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [1000.0]
[21:29:53] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [1000.0]
[21:29:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:30:04] Operations, DNS, Traffic, Wikimedia-Apache-configuration: Create moon.wikimedia.org and redirect it to https://meta.wikimedia.org/wiki/Wikipedia_to_the_Moon - https://phabricator.wikimedia.org/T136557#2342785 (BBlack) Unclear from the description: Is it intended that moon always redirects to this...
[21:30:23] Operations, Traffic: Upgrade all cache clusters to Varnish 4 - https://phabricator.wikimedia.org/T131499#2342795 (BBlack)
[21:30:25] Operations, Traffic, Patch-For-Review: Sort out vcl_deliver vs vcl_synth mess with v4 VCL - https://phabricator.wikimedia.org/T135696#2342793 (BBlack) Open>Resolved a:BBlack
[21:30:45] !log Bouncing restbase on restbase1009.eqiad.wmnet
[21:30:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:30:52] tgr: that works for me.
[21:30:53] RECOVERY - restbase endpoints health on restbase1008 is OK: All endpoints are healthy
[21:31:59] !log Bouncing restbase on restbase1012.eqiad.wmnet
[21:32:04] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy
[21:32:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:32:25] Operations, Commons, Traffic, media-storage, and 2 others: upload-lb.ulsfo.wikimedia.org still allow access to some deleted files - https://phabricator.wikimedia.org/T133819#2342800 (BBlack)
[21:32:26] RECOVERY - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is OK: All endpoints are healthy
[21:32:29] Operations, Traffic: Content purges are unreliable - https://phabricator.wikimedia.org/T133821#2342802 (BBlack)
[21:32:44] RECOVERY - restbase endpoints health on restbase1009 is OK: All endpoints are healthy
[21:32:56] !log Bouncing restbase on restbase1013.eqiad.wmnet
[21:33:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:33:26] Operations, Commons, Traffic, media-storage, and 2 others: Deleted files sometimes remain visible to non-privileged users if permanently linked - https://phabricator.wikimedia.org/T109331#2342807 (BBlack)
[21:33:31] Operations, Traffic: Content purges are unreliable - https://phabricator.wikimedia.org/T133821#2245593 (BBlack)
[21:33:50] !log Bouncing restbase on restbase1015.eqiad.wmnet
[21:33:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:34:25] RECOVERY - restbase endpoints health on restbase1013 is OK: All endpoints are healthy
[21:34:35] RECOVERY - restbase endpoints health on restbase1012 is OK: All endpoints are healthy
[21:34:56] RECOVERY - cxserver endpoints health on scb1002 is OK: All endpoints are healthy
[21:34:56] RECOVERY - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is OK: All endpoints are healthy
[21:34:57] ACKNOWLEDGEMENT - cassandra-b CQL 10.64.0.231:9042 on restbase1007 is CRITICAL: Connection refused eevans t-shooting - The acknowledgement expires at: 2016-06-01 21:34:45.
[21:35:54] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy
[21:36:05] RECOVERY - restbase endpoints health on restbase1015 is OK: All endpoints are healthy
[21:36:46] (PS5) Yuvipanda: Introduce 'Backends' [software/tools-webservice] - https://gerrit.wikimedia.org/r/292028
[21:37:01] Operations, Traffic: Varnish: the lower the Age value, the slower the request - https://phabricator.wikimedia.org/T84980#2342830 (BBlack) Open>Resolved No movement in over a year, and is more an observation than a question.
[21:37:55] RECOVERY - restbase endpoints health on restbase1011 is OK: All endpoints are healthy
[21:38:33] (PS1) Gergő Tisza: Workaround for T136644 [mediawiki-config] - https://gerrit.wikimedia.org/r/292036 (https://phabricator.wikimedia.org/T136644)
[21:38:54] (CR) Yuvipanda: "There's also the question of if 'Backend' is the right terminology to use here." [software/tools-webservice] - https://gerrit.wikimedia.org/r/292028 (owner: Yuvipanda)
[21:39:21] thcipriani: legoktm: https://gerrit.wikimedia.org/r/#/c/292036/
[21:39:36] ACKNOWLEDGEMENT - cassandra-a CQL 10.64.0.230:9042 on restbase1007 is CRITICAL: Connection refused eevans administratively shutdown while t-shooting - The acknowledgement expires at: 2016-06-01 21:39:13.
[21:40:24] (CR) Legoktm: [C: 1] Workaround for T136644 [mediawiki-config] - https://gerrit.wikimedia.org/r/292036 (https://phabricator.wikimedia.org/T136644) (owner: Gergő Tisza)
[21:40:26] (CR) Paladox: [C: 1] Workaround for T136644 [mediawiki-config] - https://gerrit.wikimedia.org/r/292036 (https://phabricator.wikimedia.org/T136644) (owner: Gergő Tisza)
[21:40:53] Operations, Traffic, Wikimedia-Apache-configuration, HTTPS: HTTP->HTTPS redirects need to unconditional send Vary header - https://phabricator.wikimedia.org/T98990#2342859 (BBlack) Open>declined Varnish is now doing all the redirects directly rather than the applayer.
[21:40:53] (CR) Thcipriani: [C: 2] Workaround for T136644 [mediawiki-config] - https://gerrit.wikimedia.org/r/292036 (https://phabricator.wikimedia.org/T136644) (owner: Gergő Tisza)
[21:41:06] tgr: thank you :)
[21:41:50] (Merged) jenkins-bot: Workaround for T136644 [mediawiki-config] - https://gerrit.wikimedia.org/r/292036 (https://phabricator.wikimedia.org/T136644) (owner: Gergő Tisza)
[21:43:51] !log thcipriani@tin Synchronized wmf-config/abusefilter.php: [[gerrit:292036|Workaround for T136644]] (duration: 00m 30s)
[21:43:52] T136644: Notice: Undefined variable: wgAbuseFilterAvailableActions in /srv/mediawiki/wmf-config/abusefilter.php on line 23 - https://phabricator.wikimedia.org/T136644
[21:43:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:44:04] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[21:45:34] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[21:45:55] (PS1) Thcipriani: Revert "Revert "Group0 to 1.28.0-wmf.4"" [mediawiki-config] - https://gerrit.wikimedia.org/r/292037
[21:46:17] (CR) Thcipriani: [C: 2] Revert "Revert "Group0 to 1.28.0-wmf.4"" [mediawiki-config] - https://gerrit.wikimedia.org/r/292037 (owner: Thcipriani)
[21:46:44] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[21:46:44] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[21:46:46] Operations, Traffic, HTTPS: Getting ssl_error_inappropriate_fallback_alert very rarely - https://phabricator.wikimedia.org/T108579#2342885 (BBlack) Open>Resolved a:BBlack Assuming not, re-open if so.
[21:46:53] (Merged) jenkins-bot: Revert "Revert "Group0 to 1.28.0-wmf.4"" [mediawiki-config] - https://gerrit.wikimedia.org/r/292037 (owner: Thcipriani)
[21:47:29] !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.28.0-wmf.4
[21:48:20] no log explosion
[21:48:45] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[21:50:05] Operations, Traffic, HTTPS: When user is logging out via HTTPS, insecure HTTP cookies keeping logged in state should be cleared as well - https://phabricator.wikimedia.org/T34144#2342891 (BBlack) Open>Resolved a:BBlack Assuming this is no longer an issue, since login via HTTP is impossible.
[21:50:50] (PS2) Madhuvishy: uwsgi: Allow specifying plugins optionally as a uwsgi command line option [puppet] - https://gerrit.wikimedia.org/r/292030
[21:51:24] Operations, Traffic, Beta-Cluster-reproducible: PHP fatal errors causing Varnish to return 503 - "Junk after gzip data" - https://phabricator.wikimedia.org/T125938#2342896 (BBlack) Is this still reproducible? Did we decide whether varnish or hhvm was at fault?
[21:51:57] Operations, Traffic: 3 Varnish cache_upload servers crashed in a short time window - https://phabricator.wikimedia.org/T125401#2342897 (BBlack) Open>Resolved a:BBlack Haven't seen much of this since, and 4.4.x upgrades are in-progress this week.
[21:53:25] Operations, Traffic: Varnish leaks memory - https://phabricator.wikimedia.org/T122455#2342900 (BBlack) Open>Resolved a:BBlack We've kept TBF reverted ever since. At this point the VCL wouldn't un-revert easily anyways, so we'll look again at TBF or similar post-Varnish4, and we don't have an...
[21:54:23] Operations, Puppet, Traffic: Clean up nginx / nginx::ssl classes and usage - https://phabricator.wikimedia.org/T118078#2342904 (BBlack) Open>Resolved a:BBlack eh, this is a "refactor things better" ticket. We're always doing that and we're never done.
[21:55:11] Operations, Traffic: Reintroduce rejection for requests with null user agents - https://phabricator.wikimedia.org/T111140#2342912 (BBlack) Open>declined
[21:56:58] Operations, MediaWiki-extensions-CentralNotice, Traffic, Wikimedia-Fundraising: Provide location, logged-in status and device information in ResourceLoaderContext - https://phabricator.wikimedia.org/T103695#2342913 (BBlack) This ticket is getting stale, is it still relevant and up-to-date with cu...
[21:58:03] Operations, Traffic: Varnish Assert error in VGZ_Ibuf() - https://phabricator.wikimedia.org/T122462#2342915 (BBlack) Open>Resolved a:BBlack It hasn't been a huge issue over the past several months, and everything about this will change with Varnish4 which is in the process of being deployed.
[22:03:24] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/, ref HEAD..readonly/master).
[22:07:26] I'm getting "VM417:400 Uncaught TypeError: Cannot read property 'toLowerCase' of null" on Commons
[22:07:35] some gadgets aren't loading...
[22:09:04] http://dpaste.com/10FTJY3.txt
[22:09:39] something new happened to mediawiki? new apis or something?
[22:12:04] thcipriani: did you just rool out to commons? ^
[22:12:17] legoktm: ^
[22:12:48] uh, commons shouldn't have gotten it today?
[22:12:49] p858snake: no, commons did not just get an update. mediawiki.org did get a new version as well as testwiki and test2wiki
[22:14:55] The gadgets (Google and TineEye) worked just minutes before. Now I'm getting this instead
[22:15:07] (plus a few more gadgets)
[22:15:24] (I'll file a ticket)
[22:17:34] nvm...seems to be https://phabricator.wikimedia.org/T134860
[22:23:02] anyone know why this https://graphite.wikimedia.org/render?target=servers.restbase1007.iostat.md2.read_byte_per_second&from=-12h&width=1024 ... would disagree with the output of iostat on the machine?
[22:23:18] disagree by a lot
[22:25:27] (PS2) Dzahn: Stop using package->latest in ganglia monitor [puppet] - https://gerrit.wikimedia.org/r/291764 (https://phabricator.wikimedia.org/T115384) (owner: Muehlenhoff)
[22:30:31] (CR) Yuvipanda: [C: -1] uwsgi: Allow specifying plugins optionally as a uwsgi command line option (1 comment) [puppet] - https://gerrit.wikimedia.org/r/292030 (owner: Madhuvishy)
[22:35:52] PROBLEM - puppet last run on cp2009 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:36:10] (PS1) Dereckson: Set collation to uca-it for it.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/292050
[22:38:03] !log running "mwscript sql.php --wiki=zerowiki /srv/mediawiki/php-1.28.0-wmf.4/maintenance/archives/patch-bot_passwords.sql" for T135074
[22:38:04] T135074: Update JsonConfig for AuthManager - https://phabricator.wikimedia.org/T135074
[22:38:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:42:02] PROBLEM - HP RAID on ms-be1016 is CRITICAL: CHECK_NRPE: Socket timeout after 20 seconds.
[22:43:53] (CR) Dzahn: [C: 2] Stop using package->latest in ganglia monitor [puppet] - https://gerrit.wikimedia.org/r/291764 (https://phabricator.wikimedia.org/T115384) (owner: Muehlenhoff)
[22:44:01] RECOVERY - HP RAID on ms-be1016 is OK: OK: Slot 1: OK: 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:1, 2I:4:2, Controller, Battery/Capacitor
[22:49:43] (PS1) Gergő Tisza: Enable bot passwords on zerowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/292053 (https://phabricator.wikimedia.org/T135074)
[23:00:05] RoanKattouw ostriches Krenair MaxSem Dereckson: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160531T2300). Please do the needful.
[23:00:05] RoanKattouw eranroz ebernhardson James_F: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[23:00:13] Hi.
[23:00:17] * RoanKattouw waves
[23:00:20] \o
[23:00:22] * James_F waves.
[23:01:00] \m/
[23:02:16] RECOVERY - puppet last run on cp2009 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[23:02:27] I would like to add https://gerrit.wikimedia.org/r/292050 to the SWAT.
[23:03:27] Eranroz isn't here?
[23:05:09] Okay I can SWAT. We'll see later for Eranroz.
[23:06:15] PROBLEM - Codfw HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[23:06:57] (PS1) Yuvipanda: Add LICENSE [software/tools-webservice] - https://gerrit.wikimedia.org/r/292056
[23:07:04] Operations, Traffic: Content purges are unreliable - https://phabricator.wikimedia.org/T133821#2343168 (MZMcBride) Related: * {T56902} * {T130901} * {T135964}
[23:07:25] (PS1) Yuvipanda: Add LICENSE [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/292057
[23:07:26] bd808: ^^
[23:07:35] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[23:08:05] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[23:08:08] (PS2) BryanDavis: Add LICENSE [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/292057 (owner: Yuvipanda)
[23:08:13] (CR) jenkins-bot: [V: -1] Add LICENSE [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/292057 (owner: Yuvipanda)
[23:08:39] bd808: I think for the base images you need to agree
[23:08:48] (CR) BryanDavis: [C: 2] Add LICENSE [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/292057 (owner: Yuvipanda)
[23:08:55] * bd808 agrees
[23:09:03] bd808: for the other one, only other committer is valhallasw`cloud and he also only added a short comment. I'm ok with us merging it or waiting for his +1
[23:09:13] Dereckson: Note that you'll need to make the pull-through commits for the MW-VE production branches into MW manually, as always.
[23:12:11] (CR) BryanDavis: [C: 1] Add LICENSE (1 comment) [software/tools-webservice] - https://gerrit.wikimedia.org/r/292056 (owner: Yuvipanda)
[23:13:43] James_F: I generally rebase the wmf branch against origin/wmf
[23:14:04] RECOVERY - Codfw HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[23:14:22] Dereckson: OK? But my point remains, gerrit won't auto-make the commits for you for the VE-MW repo.
[23:14:37] oh okay, yes yes for VE I rebase the extension branch.
[23:15:15] * James_F nods.
[23:15:19] It's irritating.
[23:15:25] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[23:15:29] So we're waiting Zuul now, https://integration.wikimedia.org/zuul/
[23:15:50] (PS3) Madhuvishy: uwsgi: Allow specifying plugins optionally as a uwsgi command line option [puppet] - https://gerrit.wikimedia.org/r/292030
[23:15:55] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[23:17:08] (PS2) Dereckson: Set collation to uca-it for it.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/292050 (https://phabricator.wikimedia.org/T136647)
[23:18:30] Dereckson: I'm going afk for just a bit and you should be done with swat before I'm back, but plz ping me or something when swat is done, I'm going to sync something afterwords.
[23:18:38] (Merged) jenkins-bot: Add LICENSE [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/292057 (owner: Yuvipanda)
[23:18:43] ostriches: k
[23:18:47] Thx
[23:19:48] (CR) Luke081515: [C: 1] Set collation to uca-it for it.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/292050 (https://phabricator.wikimedia.org/T136647) (owner: Dereckson)
[23:22:54] Ah, Zuul merged stuff.
[23:24:47] cool
[23:28:32] !log dereckson@tin Synchronized php-1.28.0-wmf.4/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: Turn off textcat subtest of search satisfaction (T134319) (duration: 00m 30s)
[23:28:32] T134319: Turn off TextCat A/B test on the English Wikipedia on or after May 23 - https://phabricator.wikimedia.org/T134319
[23:28:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:28:39] ebernhardson: please test ^
[23:29:15] Dereckson: it will be a couple minutes before the cache clears, but will do
[23:29:34] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[23:33:12] !log dereckson@tin Synchronized php-1.28.0-wmf.4/extensions/Echo/modules/styles/: Adjust styling for Special:Notification items (T136572, 1/2) (duration: 00m 30s)
[23:33:12] T136572: Make notification styling on the Notifications Page closer to the ones in the panel - https://phabricator.wikimedia.org/T136572
[23:33:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:33:49] !log dereckson@tin Synchronized php-1.28.0-wmf.4/extensions/Echo/modules/ui/mw.echo.ui.NotificationItemWidget.js: Adjust styling for Special:Notification items (T136572, 2/2) (duration: 00m 24s)
[23:33:50] T136572: Make notification styling on the Notifications Page closer to the ones in the panel - https://phabricator.wikimedia.org/T136572
[23:33:52] RoanKattouw: please test ^
[23:33:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:36:27] Dereckson: testwiki looks reasonable for my patch
[23:36:33] ok let's go for wmf3
[23:36:53] Dereckson: Looks good
[23:37:02] ack
[23:37:44] !log dereckson@tin Synchronized php-1.28.0-wmf.3/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: Turn off textcat subtest of search satisfaction (T134319) (duration: 00m 23s)
[23:37:44] T134319: Turn off TextCat A/B test on the English Wikipedia on or after May 23 - https://phabricator.wikimedia.org/T134319
[23:37:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:38:09] James_F: you're next
[23:39:07] Kk.
[23:41:28] (PS2) Dzahn: remove pardus table and orain remnants [debs/wikistats] - https://gerrit.wikimedia.org/r/291481 (https://phabricator.wikimedia.org/T136460)
[23:43:53] !log dereckson@tin Synchronized php-1.28.0-wmf.4/extensions/VisualEditor/modules/ve-mw/init/ve.init.MWWelcomeDialog.js: ve.init.MWWelcomeDialog: Fix keyboard focus on dialog actions (T135808) (duration: 00m 23s)
[23:43:54] T135808: "Start editing" popup can't be dismissed without clicking "Start editing" - https://phabricator.wikimedia.org/T135808
[23:43:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:44:08] James_F: please test on wmf4 ^
[23:44:46] Dereckson: Yup, works well.
[23:45:26] !log dereckson@tin Synchronized php-1.28.0-wmf.3/extensions/VisualEditor/modules/ve-mw/init/ve.init.MWWelcomeDialog.js: ve.init.MWWelcomeDialog: Fix keyboard focus on dialog actions (T135808) (duration: 00m 22s)
[23:45:27] T135808: "Start editing" popup can't be dismissed without clicking "Start editing" - https://phabricator.wikimedia.org/T135808
[23:45:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:47:19] !log dereckson@tin Synchronized php-1.28.0-wmf.4/extensions/GeoData/includes/Hooks.php: Don't index non-Earth coordinates (T136559) (duration: 00m 24s)
[23:47:20] T136559: Elasticsearch: illegal longitude value [219.38] for coordinates.coord - https://phabricator.wikimedia.org/T136559
[23:47:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:47:56] (CR) Dzahn: [C: 2 V: 2] remove pardus table and orain remnants [debs/wikistats] - https://gerrit.wikimedia.org/r/291481 (https://phabricator.wikimedia.org/T136460) (owner: Dzahn)
[23:47:56] MaxSem: please test ^
[23:48:00] (wmf4)
[23:50:02] Dereckson: i have to run and catch a train, maxsem will be double checking that my patch works (he sits next to me)
[23:52:22] okay, good train
[23:53:05] MaxSem: ebernhardson has already tested it for wmf4, only wmf3 still to test for Turn off textcat search test
[23:53:47] (CR) Dereckson: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/292050 (https://phabricator.wikimedia.org/T136647) (owner: Dereckson)
[23:53:59] Dereckson, "testing" requires monitoring over a prolonged period, so just go ahead
[23:54:05] k
[23:54:26] (Merged) jenkins-bot: Set collation to uca-it for it.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/292050 (https://phabricator.wikimedia.org/T136647) (owner: Dereckson)
[23:54:54] !log dereckson@tin Synchronized php-1.28.0-wmf.3/extensions/GeoData/includes/Hooks.php: Don't index non-Earth coordinates (T136559) (duration: 00m 23s)
[23:54:55] T136559: Elasticsearch: illegal longitude value [219.38] for coordinates.coord - https://phabricator.wikimedia.org/T136559
[23:54:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:54:59] MaxSem: here you are ^ all is synced for you and ebernhardson
[23:55:17] thanks
[23:57:27] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Set collation to uca-it for it.wikipedia (T136647) (duration: 00m 25s)
[23:57:28] T136647: Set UCA-IT as it.wiki's collation - https://phabricator.wikimedia.org/T136647
[23:57:29] Will test that later, after running collation update script on Terbium.
[23:57:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:57:55] Let's see if we can find Eranroz. If no, the SWAT is done.
[23:58:53] 22:36:22 -!- eranroz [~Thunderbi@37.46.39.199] has quit [Quit: eranroz]