[00:15:54] PROBLEM - puppet last run on elastic1047 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [00:41:50] RECOVERY - puppet last run on elastic1047 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [01:44:14] PROBLEM - puppet last run on dbproxy1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [02:10:38] RECOVERY - puppet last run on dbproxy1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [03:32:31] PROBLEM - puppet last run on puppetmaster1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [03:58:12] RECOVERY - puppet last run on puppetmaster1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [04:00:25] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [50.0] [04:05:41] gerrit dead? [04:07:24] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [04:35:19] RECOVERY - MegaRAID on dataset1001 is OK: OK: optimal, 3 logical, 36 physical [06:26:24] RECOVERY - Disk space on cp4006 is OK: DISK OK [06:34:04] (03CR) 10Giuseppe Lavagetto: [C: 031] hhvm: Restrict to domain networks [puppet] - 10https://gerrit.wikimedia.org/r/316550 (owner: 10Muehlenhoff) [06:48:07] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [06:50:33] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [06:53:33] PROBLEM - puppet last run on labvirt1012 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[pv] [07:00:20] 06Operations, 06Multimedia, 10Traffic, 15User-Josve05a, 15User-Urbanecm: Thumbnails failing to render sporadically (ERR_CONNECTION_CLOSED or ERR_SSL_BAD_RECORD_MAC_ALERT) - https://phabricator.wikimedia.org/T148917#2736599 (10Arseny1992) T148830 describes a related caching issue T145811 and T93041 descri... [07:07:41] (03PS2) 10Giuseppe Lavagetto: apache: Standardise ShortURL config per Giuseppe on If258a076 [puppet] - 10https://gerrit.wikimedia.org/r/315522 (owner: 10Alex Monk) [07:09:10] !log Deploying alter table s1.enwiki on codfw - T147166 [07:09:11] (03PS2) 10Elukey: Add user pmiazga and its related ssh key [puppet] - 10https://gerrit.wikimedia.org/r/317142 (https://phabricator.wikimedia.org/T148477) [07:09:11] T147166: Apply change_tag and tag_summary primary key schema change to Wikimedia wikis - https://phabricator.wikimedia.org/T147166 [07:09:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [07:10:44] marosteg1i: o/ it is not a good monday morning without an alter table right? 
:D [07:10:55] (03CR) 10Giuseppe Lavagetto: [C: 032] apache: Standardise ShortURL config per Giuseppe on If258a076 [puppet] - 10https://gerrit.wikimedia.org/r/315522 (owner: 10Alex Monk) [07:11:12] elukey: o/ indeed, ready to break stuff on Monday so I have the whole week to fix it :p [07:12:22] (03CR) 10Elukey: [C: 032] Add user pmiazga and its related ssh key [puppet] - 10https://gerrit.wikimedia.org/r/317142 (https://phabricator.wikimedia.org/T148477) (owner: 10Elukey) [07:12:38] (03PS3) 10Elukey: Add user pmiazga and its related ssh key [puppet] - 10https://gerrit.wikimedia.org/r/317142 (https://phabricator.wikimedia.org/T148477) [07:16:49] (03PS1) 10Giuseppe Lavagetto: Revert "apache: Standardise ShortURL config per Giuseppe on If258a076" [puppet] - 10https://gerrit.wikimedia.org/r/317466 [07:17:50] (03CR) 10Giuseppe Lavagetto: "Apparently Proxy-passing the original URL doesn't work, which means short urls will only work if the original url is rewritten to /w/index" [puppet] - 10https://gerrit.wikimedia.org/r/317466 (owner: 10Giuseppe Lavagetto) [07:17:58] (03PS2) 10Giuseppe Lavagetto: Revert "apache: Standardise ShortURL config per Giuseppe on If258a076" [puppet] - 10https://gerrit.wikimedia.org/r/317466 [07:18:44] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] Revert "apache: Standardise ShortURL config per Giuseppe on If258a076" [puppet] - 10https://gerrit.wikimedia.org/r/317466 (owner: 10Giuseppe Lavagetto) [07:19:55] RECOVERY - puppet last run on labvirt1012 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:22:38] 06Operations, 10Gerrit, 06Release-Engineering-Team, 13Patch-For-Review: Investigate why gerrit slowed down on 17/10/2016 / 18/10/2016 / 21/10/2016 - https://phabricator.wikimedia.org/T148478#2724169 (10ArielGlenn) Regarding the gc patch, can we get some logging around how often full gcs are happening first... 
[07:30:07] (03PS1) 10Giuseppe Lavagetto: mediawiki::web: use Rewrites for ShortUrl extension [puppet] - 10https://gerrit.wikimedia.org/r/317467 (https://phabricator.wikimedia.org/T146014) [07:30:40] (03PS1) 10Elukey: Add user pmiazga to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/317468 (https://phabricator.wikimedia.org/T148477) [07:33:14] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki::web: use Rewrites for ShortUrl extension [puppet] - 10https://gerrit.wikimedia.org/r/317467 (https://phabricator.wikimedia.org/T146014) (owner: 10Giuseppe Lavagetto) [07:33:34] (03CR) 10Elukey: [C: 032] Add user pmiazga to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/317468 (https://phabricator.wikimedia.org/T148477) (owner: 10Elukey) [07:33:39] (03PS2) 10Elukey: Add user pmiazga to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/317468 (https://phabricator.wikimedia.org/T148477) [07:43:36] 06Operations, 10Ops-Access-Requests, 13Patch-For-Review: Requesting access to "Production shell" for pmiazga - https://phabricator.wikimedia.org/T148477#2737375 (10elukey) @pmiazga you should be now able to connect to stat1002.eqiad.wmnet and stat1004.eqiad.wmnet. Please follow up with one of your colleagues... [07:46:33] !log reimaging mc1022.eqiad.wmnet (T137345) [07:46:34] T137345: Rack/Setup new memcache servers mc1019-36 - https://phabricator.wikimedia.org/T137345 [07:46:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [07:59:32] (03CR) 10Mobrovac: "With the planned move to scap config deploys, is this still relevant?" 
[puppet] - 10https://gerrit.wikimedia.org/r/302309 (https://phabricator.wikimedia.org/T139674) (owner: 10Ppchelko) [08:02:26] (03PS1) 10Giuseppe Lavagetto: mediawiki::web: Add ShortUrl support on wikimedia.org docroot sites [puppet] - 10https://gerrit.wikimedia.org/r/317469 (https://phabricator.wikimedia.org/T146014) [08:03:13] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] mediawiki::web: Add ShortUrl support on wikimedia.org docroot sites [puppet] - 10https://gerrit.wikimedia.org/r/317469 (https://phabricator.wikimedia.org/T146014) (owner: 10Giuseppe Lavagetto) [08:12:01] (03Abandoned) 10Giuseppe Lavagetto: Follow-up Ifa2cc187: Add ShortUrl support on wikimedia.org docroot sites [puppet] - 10https://gerrit.wikimedia.org/r/311647 (https://phabricator.wikimedia.org/T146014) (owner: 10Alex Monk) [08:20:03] !log reimaging mc1023.eqiad.wmnet [08:20:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:28:16] (03CR) 10Alexandros Kosiaris: "I do, I 'll merge and deploy" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/316356 (https://phabricator.wikimedia.org/T123734) (owner: 10Filippo Giunchedi) [08:33:05] (03CR) 10Alexandros Kosiaris: [C: 032] Replace helium with poolcounter1001 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/316356 (https://phabricator.wikimedia.org/T123734) (owner: 10Filippo Giunchedi) [08:33:11] (03PS2) 10Alexandros Kosiaris: Replace helium with poolcounter1001 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/316356 (https://phabricator.wikimedia.org/T123734) (owner: 10Filippo Giunchedi) [08:33:13] (03CR) 10Alexandros Kosiaris: [V: 032] Replace helium with poolcounter1001 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/316356 (https://phabricator.wikimedia.org/T123734) (owner: 10Filippo Giunchedi) [08:33:19] !log rebooting contint1001 [08:33:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:34:56] PROBLEM - DPKG on scandium is CRITICAL: DPKG CRITICAL 
dpkg reports broken packages [08:35:06] thats ^ me [08:35:31] cp: cannot create regular file '/usr/share/python/zuul/bin/python2.7': Text file busy [08:35:37] sigh [08:36:06] !log rebooting scandium for kernel upgrades [08:36:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:36:34] !log rebooting labnodepool1001 for kernel upgrades [08:36:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:37:22] Dereckson , https://wikitech.wikimedia.org/w/index.php?title=Deployments&type=revision&diff=921391&oldid=921364 [08:37:24] 06Operations, 13Patch-For-Review: Migrate pool counters to trusty/jessie - https://phabricator.wikimedia.org/T123734#2737430 (10akosiaris) helium replaced with poolcounter1001 [08:37:50] and Urbanecm [08:37:53] ^ [08:38:49] !log Restarting gallium (Jenkins/Zuul) for kernel upgrades [08:38:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:38:55] PROBLEM - Host labnodepool1001 is DOWN: PING CRITICAL - Packet loss = 100% [08:38:55] !log continue rolling restart of elasticsearch eqiad cluster [08:39:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:39:06] arseny92: Sorry, I've made a mistake and didn't add a task. [08:39:32] All my changes are correct from my side. [08:39:36] RECOVERY - Host labnodepool1001 is UP: PING OK - Packet loss = 0%, RTA = 0.56 ms [08:39:36] RECOVERY - DPKG on scandium is OK: All packages OK [08:39:40] Should I do anything else? 
[08:39:50] arseny92 and Dereckson ^^ [08:40:27] more meant just the styling of that page [08:42:27] {{plab}} doesn't differ from the interwiki , {{phabricator| prepends "Task: " , etc, need some consistency :p [08:45:06] PROBLEM - Host gallium is DOWN: PING CRITICAL - Packet loss = 100% [08:45:49] RECOVERY - Host gallium is UP: PING OK - Packet loss = 0%, RTA = 0.43 ms [08:46:25] !log restbase deploy start of f9017ad [08:46:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:48:17] (03CR) 10Alexandros Kosiaris: [C: 031] add mapped IPv6 address for eventlog1001 [puppet] - 10https://gerrit.wikimedia.org/r/317192 (owner: 10Dzahn) [08:50:16] nice akosiaris --^ [08:51:07] I think we can leave it like it is. I looked at another patches from me, there was no task :). [08:51:37] PROBLEM - puppet last run on mw1253 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [08:53:26] !log reimaging mc1024 [08:53:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:55:03] !log rebooting cobalt (gerrit) for kernel upgrades [08:55:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:55:10] PROBLEM - Improperly owned -0:0- files in /srv/mediawiki-staging on mira is CRITICAL: Improperly owned (0:0) files in /srv/mediawiki-staging [08:55:26] hmm that is probably me, doing stupid things ^ [08:55:28] fixing [08:57:06] PROBLEM - Improperly owned -0:0- files in /srv/mediawiki-staging on tin is CRITICAL: Improperly owned (0:0) files in /srv/mediawiki-staging [08:57:31] RECOVERY - Improperly owned -0:0- files in /srv/mediawiki-staging on mira is OK: Files ownership is ok. [09:01:45] PROBLEM - puppet last run on kafka1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas] [09:04:26] mmm [09:06:20] 06Operations, 06Analytics-Kanban, 10Traffic, 13Patch-For-Review: Varnishlog with Start timestamp but no Resp one causing data consistency check alarms - https://phabricator.wikimedia.org/T148412#2737447 (10elukey) One thing that is worth mentioning: the `ReqAcct` tag in the logged requests without `Timesta... [09:09:23] !log restbase deploy end of f9017ad [09:09:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [09:16:24] RECOVERY - puppet last run on mw1253 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [09:20:36] !log change-prop deploying c7feda2 [09:20:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [09:26:26] <_joe_> uhm where is grrit-wm? [09:27:05] RECOVERY - puppet last run on kafka1002 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [09:45:56] 06Operations, 10Continuous-Integration-Config, 06Operations-Software-Development: Add shell scripts CI validations - https://phabricator.wikimedia.org/T148494#2737489 (10hashar) [09:46:23] PROBLEM - dhclient process on kubernetes1002 is CRITICAL: Connection refused by host [09:46:54] PROBLEM - Check size of conntrack table on kubernetes1002 is CRITICAL: Connection refused by host [09:47:04] 06Operations, 06Discovery, 06Maps, 10Maps-data, and 2 others: Configure new maps servers in eqiad - https://phabricator.wikimedia.org/T138092#2737495 (10Gehel) [09:47:06] 06Operations, 06Discovery, 06Maps, 10Traffic, 03Interactive-Sprint: Maps - move traffic to eqiad instead of codfw - https://phabricator.wikimedia.org/T145758#2737493 (10Gehel) 05Open>03Resolved Resolved during maps incident as pointed out by @BBlack. 
[09:47:14] PROBLEM - DPKG on kubernetes1002 is CRITICAL: Connection refused by host
[09:47:32] PROBLEM - salt-minion processes on kubernetes1002 is CRITICAL: Connection refused by host
[09:47:38] 06Operations, 06Discovery, 06Maps, 10Maps-data, 07Epic: Epic: cultivating the Maps garden - https://phabricator.wikimedia.org/T137616#2737498 (10Gehel)
[09:47:41] 06Operations, 06Discovery, 06Maps, 10Maps-data, and 2 others: Configure new maps servers in eqiad - https://phabricator.wikimedia.org/T138092#2388948 (10Gehel) 05Open>03Resolved Traffic is moved to eqiad maps cluster, which is a good proof that this task is completed.
[09:47:42] PROBLEM - Disk space on kubernetes1002 is CRITICAL: Connection refused by host
[09:48:02] PROBLEM - MD RAID on kubernetes1002 is CRITICAL: Connection refused by host
[09:48:22] PROBLEM - configured eth on kubernetes1002 is CRITICAL: Connection refused by host
[09:50:40] I rebooted the wrong server
[09:51:31] PROBLEM - Host dbproxy1001 is DOWN: PING CRITICAL - Packet loss = 100%
[09:52:46] jynus: oops..
[09:52:46] isn't there supposed to be molly-guard to prevent that?
[09:52:52] there is
[09:52:58] <_joe_> jynus: should we do something?
[09:53:06] I think it is not the active proxy
[09:53:09] <_joe_> not via salt, there isn't
[09:53:10] so it should be ok?
[09:53:14] ok then
[09:53:23] <_joe_> jynus: I guess so, we'll see
[09:53:30] I am just trying to not reinstall it
[09:53:36] jynus: since we are not in an emergency, where should I puppetize the tendril_web mysql user ?
[09:53:56] akosiaris, wait one second so I can confirm
[09:53:58] git grep isn't helping me
[09:54:00] it is indeed not an issue
[09:55:22] RECOVERY - Host dbproxy1001 is UP: PING OK - Packet loss = 0%, RTA = 4.00 ms
[09:55:32] o_O
[09:56:40] m1-master.eqiad.wmnet. 25 IN CNAME dbproxy1006.eqiad.wmnet.
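The last line above is the DNS answer record jynus pasted to show that the m1 master alias points at dbproxy1006, so rebooting dbproxy1001 was harmless. A minimal sketch of that check (the hostname and record value are taken from the log; the `dig` invocation is an assumption and would need to run against the internal resolver):

```shell
# On a production host, the live check would be:
#   dig +noall +answer m1-master.eqiad.wmnet
# which yields e.g. "m1-master.eqiad.wmnet. 25 IN CNAME dbproxy1006.eqiad.wmnet."
# Extracting the active proxy from that record (offline demo with the logged value):
record="dbproxy1006.eqiad.wmnet."
active_proxy="${record%%.*}"   # strip everything from the first dot onward
echo "$active_proxy"           # -> dbproxy1006
```

If the answer had been `dbproxy1001`, the reboot would have taken the m1 master path down instead of being a non-event.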
[09:56:44] yes, not an issue
[09:57:28] akosiaris, you are not going to like it
[09:57:35] uh oh
[09:57:42] please do continue
[09:57:42] but for now it is puppet:templates/mariadb
[09:57:51] I know I have to move it
[09:58:29] ah, production-grants.sql.erb!
[09:58:36] actually
[09:58:38] ah ok. my git grep voodoo failed then
[09:58:40] that is not a production grant
[09:58:56] it should be a tendril-grants.sql.erb
[09:59:09] but apparently, nothing was puppetized before :-/
[09:59:11] I 'll leave that technicality to you for when you move it :-)
[09:59:22] :P
[09:59:27] just create a new file
[09:59:33] put the users there without the pass
[09:59:40] or open a ticket
[09:59:49] whatever is easier for you
[10:00:27] but we need to track those
[10:00:54] PROBLEM - haproxy failover on dbproxy1001 is CRITICAL: CRITICAL check_failover servers up 2 down 1
[10:01:11] interesting
[10:03:16] PROBLEM - Host kubernetes1002 is DOWN: PING CRITICAL - Packet loss = 100%
[10:03:31] ^that is not me
[10:06:09] for some reason, when the proxy started, it declared one of the mysql servers down
[10:06:12] that is a problem
[10:06:23] not an ongoing issue, but bad logic
[10:06:37] RECOVERY - haproxy failover on dbproxy1001 is OK: OK check_failover servers up 2 down 0
[10:07:08] maybe it started before network was up?
[10:07:51] 06Operations, 10Continuous-Integration-Config, 06Operations-Software-Development: Add shell scripts CI validations - https://phabricator.wikimedia.org/T148494#2724601 (10hashar) My editor uses both and they are quite useful to avoid mistakes and preventing corner cases issues. The main challenge though is n...
[10:08:38] volans: thoughts for shellcheck/checkbashisms ^^ :]
[10:08:50] hashar: ack, reading
[10:12:24] <_joe_> uhm how come grrrit-wm can't talk here?
[10:12:53] maybe the account is not registered in freenode and this channel is restricted to registered users?
[10:13:25] <_joe_> no it's not
[10:13:48] <_joe_> let me try to restart it, although according to grrrit-wm itself, it's running fine
[10:16:20] (03CR) 10Gilles: "Ah, I see now that one of them is in a separate changeset." [puppet] - 10https://gerrit.wikimedia.org/r/316543 (https://phabricator.wikimedia.org/T147923) (owner: 10Filippo Giunchedi)
[10:16:22] (03CR) 10Giuseppe Lavagetto: [C: 032] docker::engine: add dependency of the service on the storage [puppet] - 10https://gerrit.wikimedia.org/r/317478 (https://phabricator.wikimedia.org/T147181) (owner: 10Giuseppe Lavagetto)
[10:16:28] <_joe_> hey, welcome back
[10:19:52] RECOVERY - puppet last run on kubernetes1003 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[10:21:01] <_joe_> !log rebooting kubernetes1002
[10:21:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:21:50] grrrit-wm: welcome
[10:22:06] PROBLEM - puppet last run on kubernetes1004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. Failed resources (up to 3 shown): Service[docker]
[10:25:52] <_joe_> the kubernetes failures are pretty much expected.
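The shellcheck/checkbashisms discussion above (T148494, "Add shell scripts CI validations") is about linting puppet-repo shell scripts for portability. A small example of the class of bug those tools catch; the `shellcheck -s sh` invocation is a standard usage, not something quoted from the log:

```shell
# '==' inside [ ] is a bashism that checkbashisms/shellcheck would flag;
# POSIX sh (dash) only guarantees '='. The portable form:
arg="start"
if [ "$arg" = "start" ]; then
  echo "ok"
fi
# In CI, the scripts would be linted with something like:
#   shellcheck -s sh modules/*/files/*.sh
```

Catching these before merge matters because production scripts run under `/bin/sh`, which is dash on Debian/Ubuntu hosts, not bash.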
[10:26:52] RECOVERY - puppet last run on kubernetes1004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:27:37] PROBLEM - configured eth on kubernetes1003 is CRITICAL: Connection refused by host [10:27:51] PROBLEM - DPKG on kubernetes1003 is CRITICAL: Connection refused by host [10:28:17] PROBLEM - Disk space on kubernetes1003 is CRITICAL: Connection refused by host [10:28:37] PROBLEM - dhclient process on kubernetes1003 is CRITICAL: Connection refused by host [10:28:47] PROBLEM - MD RAID on kubernetes1003 is CRITICAL: Connection refused by host [10:28:59] <_joe_> grr [10:29:34] (03PS1) 10Giuseppe Lavagetto: monitoring::host: default to $main_ipaddress [puppet] - 10https://gerrit.wikimedia.org/r/317480 [10:29:50] PROBLEM - puppet last run on kubernetes1003 is CRITICAL: Connection refused by host [10:30:07] PROBLEM - Check size of conntrack table on kubernetes1003 is CRITICAL: Connection refused by host [10:30:07] PROBLEM - salt-minion processes on kubernetes1003 is CRITICAL: Connection refused by host [10:40:00] <_joe_> sigh [10:42:07] (03CR) 10Giuseppe Lavagetto: [C: 04-2] "I actually read that wrong, it is "labnet", and in fact this change would do the wrong thing there." 
[puppet] - 10https://gerrit.wikimedia.org/r/317130 (https://phabricator.wikimedia.org/T147181) (owner: 10Giuseppe Lavagetto) [10:47:32] !log reimaged mc102[56], currently doing mc1027 [10:47:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:24:22] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [50.0] [11:29:20] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [11:31:31] 06Operations, 06Multimedia, 10Traffic, 15User-Josve05a, 15User-Urbanecm: Thumbnails failing to render sporadically (ERR_CONNECTION_CLOSED or ERR_SSL_BAD_RECORD_MAC_ALERT) - https://phabricator.wikimedia.org/T148917#2737624 (10ema) As @BBlack mentioned on IRC, the reports tend to happen in esams at peak r... [11:44:46] (03PS1) 10Giuseppe Lavagetto: docker: fix monitoring for hosts with docker installed [puppet] - 10https://gerrit.wikimedia.org/r/317483 [11:46:08] (03PS10) 10Alexandros Kosiaris: icinga: Remove event_profiling_enabled [puppet] - 10https://gerrit.wikimedia.org/r/315085 [11:46:11] (03PS6) 10Alexandros Kosiaris: Replace neon with einsteinium where applicable [puppet] - 10https://gerrit.wikimedia.org/r/315257 [11:46:13] (03PS7) 10Alexandros Kosiaris: naggen2: Kill hostextinfo support [puppet] - 10https://gerrit.wikimedia.org/r/315243 [11:46:15] (03PS7) 10Alexandros Kosiaris: Remove absented /etc/icinga/puppet_hostextinfo.cfg entry [puppet] - 10https://gerrit.wikimedia.org/r/315244 [11:46:17] (03PS1) 10Alexandros Kosiaris: tendril: Install php5-mysql package [puppet] - 10https://gerrit.wikimedia.org/r/317484 [11:46:19] (03PS1) 10Alexandros Kosiaris: tendril: Apache 2.4 syntax in If clause guards [puppet] - 10https://gerrit.wikimedia.org/r/317485 [11:46:21] (03PS1) 10Alexandros Kosiaris: tendril: Set short_open_tag to true in virtual host [puppet] - 10https://gerrit.wikimedia.org/r/317486 
[11:46:23] (03PS1) 10Alexandros Kosiaris: icinga::ircbot: Add an ensure parameter [puppet] - 10https://gerrit.wikimedia.org/r/317487 [11:47:31] (03PS2) 10Giuseppe Lavagetto: docker: fix monitoring for hosts with docker installed [puppet] - 10https://gerrit.wikimedia.org/r/317483 [11:47:58] <_joe_> akosiaris: you mean we're turning off neon today? :) [11:48:10] hehehe [11:48:13] you 'd wish [11:48:18] but I am almost done [11:48:19] <_joe_> oh I don't care [11:48:20] <_joe_> :P [11:48:40] so $::main_address did not work after all ? [11:49:28] <_joe_> it works everywhere but on labnet1001 [11:49:40] lol [11:49:42] <_joe_> so yeah, I prefer playing it safe like this [11:50:00] why not labnet1001 ? [11:50:12] <_joe_> akosiaris: why not what? [11:50:20] why it does not work on labnet1001 [11:50:38] * akosiaris afraid to even ssh into the ox [11:50:45] ox/box.. tomato/tomato [11:51:01] omg, the NAT box [11:51:04] yeah I get why [11:51:19] <_joe_> akosiaris: ipaddress_br1102 wins over ipaddress_eth0 [11:51:26] <_joe_> and, that's apparently a good thing [11:51:35] <_joe_> so what the hell do I know, heh [11:52:39] it is ? [11:52:44] ipaddress => 10.68.16.1 [11:52:44] ipaddress_br1102 => 10.68.16.1 [11:52:44] ipaddress_eth0 => 10.64.20.13 [11:53:05] so main_ipaddress should be 10.64.20.13 [11:53:08] <_joe_> akosiaris: from prod I can ping 10.68.16.1 but not 10.64.20.13 IIRC [11:53:12] <_joe_> yes [11:53:15] wat ? [11:53:53] akosiaris@iron:~$ ping -c 1 10.64.20.13 [11:53:53] PING 10.64.20.13 (10.64.20.13) 56(84) bytes of data. [11:53:53] 64 bytes from 10.64.20.13: icmp_seq=1 ttl=63 time=0.250 ms [11:53:55] <_joe_> hah, no, firewalling I guess [11:54:06] <_joe_> nevermind [11:54:14] <_joe_> so that would be good there too? [11:54:22] ok for some reason I have this recurring deja vu moment right now [11:54:24] <_joe_> I'd prefer to use that change instead [11:54:44] _joe_: I think so.. 
doesn't look that bad [11:54:54] and main_address is probably used in other places too [11:54:58] <_joe_> akosiaris: oh sorry, actually [11:54:58] and those are not broken [11:55:05] <_joe_> that's the opposite [11:55:36] * akosiaris confused [11:55:56] <_joe_> from prod I can ping 10.64.20.13 but not 10.68.16.1 [11:56:05] <_joe_> but from neon I can ping both [11:56:17] as well as iron [11:56:25] which has no special privileges IIRC [11:56:49] and bast1001 as well.. and tbh, it's unimportant [11:56:53] <_joe_> well try from puppetmaster1001 :) [11:57:08] <_joe_> or mw1017 [11:57:18] <_joe_> anyways, ok I can remove my -2 and merge it [11:57:45] ok [11:57:51] (03CR) 10Giuseppe Lavagetto: [C: 032] "I just got this backwards when doing tests, this change should be safe to merge, doing it" [puppet] - 10https://gerrit.wikimedia.org/r/317130 (https://phabricator.wikimedia.org/T147181) (owner: 10Giuseppe Lavagetto) [11:57:58] (03PS2) 10Giuseppe Lavagetto: nrpe: use main_ipaddress, not the ipaddress fact [puppet] - 10https://gerrit.wikimedia.org/r/317130 (https://phabricator.wikimedia.org/T147181) [11:58:05] <_joe_> I have one more coming up :) [11:58:07] whatever labnet1001 does, it's an exception and we should fix it there [11:58:08] :-) [11:58:08] (03CR) 10Giuseppe Lavagetto: [V: 032] nrpe: use main_ipaddress, not the ipaddress fact [puppet] - 10https://gerrit.wikimedia.org/r/317130 (https://phabricator.wikimedia.org/T147181) (owner: 10Giuseppe Lavagetto) [12:02:07] (03CR) 10Giuseppe Lavagetto: [C: 032] "https://puppet-compiler.wmflabs.org/4463/ shows it should be all good, including pinging the actual ip address of labnet*" [puppet] - 10https://gerrit.wikimedia.org/r/317480 (owner: 10Giuseppe Lavagetto) [12:02:18] (03PS2) 10Giuseppe Lavagetto: monitoring::host: default to $main_ipaddress [puppet] - 10https://gerrit.wikimedia.org/r/317480 [12:02:28] (03CR) 10Giuseppe Lavagetto: [V: 032] monitoring::host: default to $main_ipaddress [puppet] - 
10https://gerrit.wikimedia.org/r/317480 (owner: 10Giuseppe Lavagetto) [12:06:11] PROBLEM - Disk space on labnet1001 is CRITICAL: Connection refused by host [12:06:51] PROBLEM - salt-minion processes on labnet1001 is CRITICAL: Connection refused by host [12:07:11] PROBLEM - configured eth on labnet1001 is CRITICAL: Connection refused by host [12:07:27] PROBLEM - Check size of conntrack table on labnet1001 is CRITICAL: Connection refused by host [12:07:42] PROBLEM - dhclient process on labnet1001 is CRITICAL: Connection refused by host [12:08:12] PROBLEM - nova-api process on labnet1001 is CRITICAL: Connection refused by host [12:09:11] RECOVERY - Host kubernetes1002 is UP: PING OK - Packet loss = 0%, RTA = 0.65 ms [12:09:21] RECOVERY - DPKG on kubernetes1002 is OK: All packages OK [12:09:21] RECOVERY - Check size of conntrack table on kubernetes1002 is OK: OK: nf_conntrack is 0 % full [12:09:21] RECOVERY - salt-minion processes on kubernetes1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [12:09:22] RECOVERY - nova-api process on labnet1001 is OK: PROCS OK: 37 processes with regex args ^/usr/bin/python /usr/bin/nova-api [12:09:33] RECOVERY - Disk space on kubernetes1002 is OK: DISK OK [12:09:33] RECOVERY - configured eth on labnet1001 is OK: OK - interfaces up [12:09:52] RECOVERY - MD RAID on kubernetes1002 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [12:09:52] RECOVERY - dhclient process on labnet1001 is OK: PROCS OK: 0 processes with command name dhclient [12:10:13] RECOVERY - salt-minion processes on labnet1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [12:10:31] RECOVERY - dhclient process on kubernetes1002 is OK: PROCS OK: 0 processes with command name dhclient [12:12:00] RECOVERY - Check size of conntrack table on labnet1001 is OK: OK: nf_conntrack is 16 % full [12:12:30] RECOVERY - Disk space on labnet1001 is OK: DISK OK [12:20:53] 06Operations, 10ops-eqiad, 
10Prod-Kubernetes, 05Kubernetes-production-experiment, and 2 others: Rack/Setup Kubernetes Servers - https://phabricator.wikimedia.org/T147933#2737707 (10Joe) @Cmjohnson thanks, Installing it now. [12:23:35] 06Operations, 10ops-eqiad, 10Prod-Kubernetes, 05Kubernetes-production-experiment, and 2 others: Rack/Setup Kubernetes Servers - https://phabricator.wikimedia.org/T147933#2737712 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by oblivian on neodymium.eqiad.wmnet for hosts: ``` ['kubernetes1001.... [12:27:07] 06Operations, 10DBA: Puppetize tendril web user creation - https://phabricator.wikimedia.org/T148955#2737720 (10akosiaris) [12:29:29] (03CR) 10Alexandros Kosiaris: [C: 032] naggen2: Kill hostextinfo support [puppet] - 10https://gerrit.wikimedia.org/r/315243 (owner: 10Alexandros Kosiaris) [12:29:33] (03PS8) 10Alexandros Kosiaris: naggen2: Kill hostextinfo support [puppet] - 10https://gerrit.wikimedia.org/r/315243 [12:29:36] (03CR) 10Alexandros Kosiaris: [V: 032] naggen2: Kill hostextinfo support [puppet] - 10https://gerrit.wikimedia.org/r/315243 (owner: 10Alexandros Kosiaris) [12:30:58] RECOVERY - dhclient process on kubernetes1003 is OK: PROCS OK: 0 processes with command name dhclient [12:31:07] RECOVERY - Host notebook1001 is UP: PING OK - Packet loss = 0%, RTA = 1.22 ms [12:31:07] RECOVERY - Check size of conntrack table on kubernetes1003 is OK: OK: nf_conntrack is 0 % full [12:31:22] RECOVERY - puppet last run on kubernetes1003 is OK: OK: Puppet is currently enabled, last run 22 minutes ago with 0 failures [12:31:28] RECOVERY - salt-minion processes on kubernetes1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [12:31:48] RECOVERY - Check size of conntrack table on notebook1001 is OK: OK: nf_conntrack is 0 % full [12:31:49] RECOVERY - DPKG on kubernetes1003 is OK: All packages OK [12:31:49] RECOVERY - dhclient process on notebook1001 is OK: PROCS OK: 0 processes with command name dhclient [12:31:57] 
RECOVERY - DPKG on notebook1001 is OK: All packages OK [12:32:14] RECOVERY - Disk space on notebook1001 is OK: DISK OK [12:32:51] RECOVERY - Disk space on kubernetes1003 is OK: DISK OK [12:33:07] RECOVERY - MD RAID on kubernetes1003 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [12:33:27] RECOVERY - salt-minion processes on notebook1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [12:36:38] (03PS1) 10Gehel: postgis - upgrade to postgis 2.3 [puppet] - 10https://gerrit.wikimedia.org/r/317494 (https://phabricator.wikimedia.org/T144763) [12:37:58] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 608 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 3012379 keys - replication_delay is 608 [12:38:49] 06Operations, 06Performance-Team, 10Thumbor, 13Patch-For-Review: thumbor memory limits for main process and subprocesses - https://phabricator.wikimedia.org/T145623#2737775 (10Gilles) It seems like the only viable solution is to get cgrulesengd to work, with rules for the expected processes. [12:39:59] RECOVERY - MegaRAID on notebook1001 is OK: OK: optimal, 1 logical, 2 physical [12:40:19] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 3012944 keys - replication_delay is 0 [12:58:16] PROBLEM - High load average on labstore1001 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [24.0] [13:00:05] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161024T1300). Please do the needful. [13:00:05] Dereckson, mafk, and Urbanecm: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [13:02:20] Hello. 
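Tying back to the earlier labnet1001 conversation: _joe_ and akosiaris compared facter's plain `ipaddress` fact (which followed the bridge, 10.68.16.1) with the interface-qualified `ipaddress_eth0` (10.64.20.13), and which one is reachable depends on where you ping from. A sketch of the "prefer the primary NIC fact when present" logic behind `$main_ipaddress` (values copied from the log; the fallback expression is an illustration, not the actual puppet implementation):

```shell
# Facts as quoted in the discussion:
#   facter ipaddress ipaddress_br1102 ipaddress_eth0
ipaddress="10.68.16.1"        # plain fact, resolved via the br1102 bridge
ipaddress_eth0="10.64.20.13"  # interface-qualified fact for the primary NIC
# Prefer the NIC fact, fall back to the plain fact on hosts without eth0:
main_ipaddress="${ipaddress_eth0:-$ipaddress}"
echo "$main_ipaddress"   # -> 10.64.20.13
```

On an ordinary host both facts agree; labnet1001's NAT bridge is exactly the kind of exception that made monitoring target the wrong address.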
[13:02:53] (03PS1) 10Giuseppe Lavagetto: check_eth: blacklist docker0 as well [puppet] - 10https://gerrit.wikimedia.org/r/317498
[13:03:56] (03PS2) 10Giuseppe Lavagetto: check_eth: blacklist docker0 as well [puppet] - 10https://gerrit.wikimedia.org/r/317498
[13:04:00] RECOVERY - High load average on labstore1001 is OK: OK: Less than 50.00% above the threshold [16.0]
[13:05:33] (03PS3) 10BBlack: tlsproxy: use 8x FE ports to balance [puppet] - 10https://gerrit.wikimedia.org/r/317405 (https://phabricator.wikimedia.org/T107749)
[13:05:35] (03PS2) 10BBlack: cache frontends: 8x local ports 3120-3127 [puppet] - 10https://gerrit.wikimedia.org/r/317404 (https://phabricator.wikimedia.org/T107749)
[13:05:37] (03PS3) 10BBlack: tlsproxy: raise worker connection limits, too [puppet] - 10https://gerrit.wikimedia.org/r/317414 (https://phabricator.wikimedia.org/T107749)
[13:05:39] (03PS1) 10BBlack: reduce cache local ports slightly [puppet] - 10https://gerrit.wikimedia.org/r/317499 (https://phabricator.wikimedia.org/T107749)
[13:06:09] (03PS2) 10Dereckson: Edit-a-thon BDA (Poitiers) throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317174 (https://phabricator.wikimedia.org/T148852)
[13:06:10] I'm adding this change to SWAT ^
[13:06:28] And I can SWAT.
[13:06:31] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317174 (https://phabricator.wikimedia.org/T148852) (owner: 10Dereckson)
[13:06:48] urbanecm and mafk seem to be away.
[13:07:02] (03Merged) 10jenkins-bot: Edit-a-thon BDA (Poitiers) throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317174 (https://phabricator.wikimedia.org/T148852) (owner: 10Dereckson) [13:07:04] (03CR) 10BBlack: [C: 032 V: 032] cache frontends: 8x local ports 3120-3127 [puppet] - 10https://gerrit.wikimedia.org/r/317404 (https://phabricator.wikimedia.org/T107749) (owner: 10BBlack) [13:08:13] (03PS3) 10Giuseppe Lavagetto: check_eth: blacklist docker0 as well [puppet] - 10https://gerrit.wikimedia.org/r/317498 [13:08:22] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] check_eth: blacklist docker0 as well [puppet] - 10https://gerrit.wikimedia.org/r/317498 (owner: 10Giuseppe Lavagetto) [13:08:46] _joe__: we need to re-enable l10nupdate for tonight's run: https://gerrit.wikimedia.org/r/#/c/317390/ [13:09:31] 06Operations, 10ops-eqiad, 10Prod-Kubernetes, 05Kubernetes-production-experiment, and 2 others: Rack/Setup Kubernetes Servers - https://phabricator.wikimedia.org/T147933#2737818 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['kubernetes1001.eqiad.wmnet'] ``` Of which those **FAILED**: ``` se... [13:09:37] I'll run l10nupdate manually in a few moments, it's been one week without running.
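The check_eth change merged above adds docker0 to the interfaces the link-status check ignores. A minimal sketch of that kind of interface blacklist; the pattern and the other entries are illustrative, not the real check's list:

```python
import re

# Interfaces the link-status check should skip; docker0 is the newly
# blacklisted entry per the change above, the rest are illustrative guesses.
BLACKLIST = re.compile(r"^(lo|docker0|veth\w+|tap\w+)$")

def monitored_interfaces(interfaces):
    """Return only the interfaces the check should actually look at."""
    return [iface for iface in interfaces if not BLACKLIST.match(iface)]

print(monitored_interfaces(["lo", "eth0", "docker0", "veth1a2b", "eth1"]))
# → ['eth0', 'eth1']
```

Filtering by name pattern rather than by a hardcoded list keeps the check quiet as container runtimes create and destroy virtual interfaces.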
[13:12:13] 317174 live on mw1099 and working fine [13:12:22] (03PS1) 10Paladox: Phabricator: Add x-javascript to files.viewable-mime-types [puppet] - 10https://gerrit.wikimedia.org/r/317500 [13:12:40] 06Operations, 10ops-eqiad, 10Prod-Kubernetes, 05Kubernetes-production-experiment, and 2 others: Rack/Setup Kubernetes Servers - https://phabricator.wikimedia.org/T147933#2737823 (10Joe) 05Open>03Resolved [13:12:47] (03PS3) 10Gilles: Add mtail program to track thumbor OOM kills [puppet] - 10https://gerrit.wikimedia.org/r/315272 [13:13:05] 06Operations, 10Prod-Kubernetes, 05Kubernetes-production-experiment, 13Patch-For-Review, 15User-Joe: Docker installation for production kubernetes - https://phabricator.wikimedia.org/T147181#2737826 (10Joe) 05Open>03Resolved [13:13:53] 06Operations, 10Traffic, 15User-Joe, 07discovery-system: Upgrade conftool to 0.3.1 - https://phabricator.wikimedia.org/T147480#2737841 (10Joe) 05Open>03Resolved a:05ema>03Joe [13:13:54] !log dereckson@mira Synchronized wmf-config/throttle.php: Edit-a-thon BDA (Poitiers) throttle rule (T148852) (duration: 01m 13s) [13:13:56] T148852: Throttle rule for Futuroscope edit-a-thon - https://phabricator.wikimedia.org/T148852 [13:14:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:14:22] RECOVERY - Improperly owned -0:0- files in /srv/mediawiki-staging on tin is OK: Files ownership is ok. [13:15:08] (03PS2) 10Paladox: Phabricator: Add javascript to files.viewable-mime-types [puppet] - 10https://gerrit.wikimedia.org/r/317500 [13:15:21] (03PS3) 10Paladox: Phabricator: Add javascript to files.viewable-mime-types [puppet] - 10https://gerrit.wikimedia.org/r/317500 [13:15:44] (03CR) 10Paladox: "Mobile browsers should already be able to view raw js, but desktop browsers can't because they can download files."
[puppet] - 10https://gerrit.wikimedia.org/r/317500 (owner: 10Paladox) [13:16:05] (03CR) 10Paladox: "So this fixes it for desktop browsers to be able to view it raw in the browser without needing to download." [puppet] - 10https://gerrit.wikimedia.org/r/317500 (owner: 10Paladox) [13:18:06] !log Started l10nupdate manually, as it hadn't run for 6 days, and especially to fix the user-facing issue T148921. [13:18:07] T148921: Fancycaptcha-accountcreater translation isn't exported for some languages - https://phabricator.wikimedia.org/T148921 [13:18:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:19:39] 06Operations, 10Prod-Kubernetes, 05Kubernetes-production-experiment, 15User-Joe: Install a docker registry for production - https://phabricator.wikimedia.org/T148960#2737849 (10Joe) [13:20:09] Dereckson: Was my patch ever reverted? [13:20:11] I presume now [13:20:12] *not [13:20:37] (03PS1) 10Reedy: Revert "Let's disable l10nupdate completely until we have /srv/mediawiki-staging back" [puppet] - 10https://gerrit.wikimedia.org/r/317501 [13:20:41] (03PS2) 10Reedy: Revert "Let's disable l10nupdate completely until we have /srv/mediawiki-staging back" [puppet] - 10https://gerrit.wikimedia.org/r/317501 [13:20:44] !log reimaging mc120[89] and mc1030 [13:20:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:20:55] Dereckson: ^ We should get someone to merge that [13:22:09] Sorry, I didn't look at the time... [13:22:27] (^ was written because of SWAT) [13:23:04] BTW why am I getting a 403 error when looking at the channel logs?
[13:23:41] WFM [13:24:41] 06Operations, 10Prod-Kubernetes, 10vm-requests, 05Kubernetes-production-experiment, 15User-Joe: Ganeti VM for docker registry - https://phabricator.wikimedia.org/T148961#2737861 (10Joe) [13:24:48] <_joe__> akosiaris: ^^ [13:24:59] I'm still getting HTTP ERROR 403 [13:25:03] (03PS4) 10Gilles: Add mtail program to track thumbor OOM kills [puppet] - 10https://gerrit.wikimedia.org/r/315272 [13:25:08] <_joe__> Urbanecm: which url? [13:25:10] (03CR) 10Alexandros Kosiaris: [C: 032] Revert "Let's disable l10nupdate completely until we have /srv/mediawiki-staging back" [puppet] - 10https://gerrit.wikimedia.org/r/317501 (owner: 10Reedy) [13:25:27] <_joe__> heh wasn't referring to that :P [13:25:34] _joe__: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-operations/?C=M;O=D [13:25:39] <_joe__> but thanks, I'd have addressed that when free [13:26:13] http://ur1.ca/edq22 from the header [13:26:16] <_joe__> Urbanecm: can't reproduce [13:26:35] But http://ur1.ca/edq22 expands to http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-operations/?C=M;O=D and the second URL returns 403 [13:26:45] <_joe__> not for me [13:26:53] (03Abandoned) 10Gilles: Add memory limit to Thumbor subprocesses [puppet] - 10https://gerrit.wikimedia.org/r/315248 (owner: 10Gilles) [13:27:37] 06Operations, 06Performance-Team, 10Thumbor, 13Patch-For-Review: add thumbor to production infrastructure - https://phabricator.wikimedia.org/T139606#2737876 (10Gilles) [13:27:40] 06Operations, 06Performance-Team, 10Thumbor, 13Patch-For-Review: thumbor memory limits for main process and subprocesses - https://phabricator.wikimedia.org/T145623#2737874 (10Gilles) 05Open>03declined Actually I've just realized that cgrulesengd cannot work either, because it would assign all instance... [13:27:51] But when I use my private proxy server I can access them without error.
[13:28:02] 06Operations, 10Datasets-General-or-Unknown, 10hardware-requests: reallocate snapshto1001 for use as canary/testbed for dumps - https://phabricator.wikimedia.org/T144728#2608426 (10mark) Yes, this is ok to use as a non-prod, test only box for another 1-2 years. [13:28:31] And when I disconnect from the proxy server, I get the 403 again. [13:28:32] (03Abandoned) 10Dereckson: Revert "Let's disable l10nupdate … until … /srv/mediawiki-staging back" [puppet] - 10https://gerrit.wikimedia.org/r/317390 (https://phabricator.wikimedia.org/T148571) (owner: 10Dereckson) [13:28:45] My IP address is 213.226.216.2 if it helps... [13:28:48] 06Operations, 06Performance-Team, 10Thumbor: Record OOM kills as a metric with mtail - https://phabricator.wikimedia.org/T148962#2737882 (10Gilles) [13:28:51] (213.226.216.2 without using proxy) [13:29:17] (03PS5) 10Gilles: Add mtail program to track thumbor OOM kills [puppet] - 10https://gerrit.wikimedia.org/r/315272 (https://phabricator.wikimedia.org/T148962) [13:29:36] (with proxy I use 94.143.232.92) [13:29:44] _joe__ ^^ [13:29:58] can't reproduce either [13:30:17] <_joe__> Urbanecm: that first IP definitely is NOT a wmf ip [13:30:27] <_joe__> oh sorry read that wrong [13:30:34] <_joe__> Urbanecm: shouldn't make any difference [13:30:35] I know. The first IP is my IP. [13:30:44] <_joe__> yeah sorry, I read that backwards [13:30:50] Okay. [13:31:08] (03PS2) 10Gehel: postgis - upgrade to postgis 2.3 [puppet] - 10https://gerrit.wikimedia.org/r/317494 (https://phabricator.wikimedia.org/T144763) [13:31:10] But it happens for unknown reasons... [13:31:54] <_joe__> Urbanecm: try asking in -labs later [13:32:21] Okay. I also get 403 from another computer which is connected to the same network.
[13:33:47] (03CR) 10Alexandros Kosiaris: [C: 031] postgis - upgrade to postgis 2.3 [puppet] - 10https://gerrit.wikimedia.org/r/317494 (https://phabricator.wikimedia.org/T144763) (owner: 10Gehel) [13:35:04] (03CR) 10Gehel: [C: 032] postgis - upgrade to postgis 2.3 [puppet] - 10https://gerrit.wikimedia.org/r/317494 (https://phabricator.wikimedia.org/T144763) (owner: 10Gehel) [13:35:09] PROBLEM - Varnish HTTP upload-frontend - port 3121 on cp1049 is CRITICAL: Connection refused [13:35:09] PROBLEM - Varnish HTTP maps-frontend - port 3120 on cp1047 is CRITICAL: Connection refused [13:35:09] PROBLEM - Varnish HTTP upload-frontend - port 3121 on cp1048 is CRITICAL: Connection refused [13:35:09] PROBLEM - Varnish HTTP upload-frontend - port 3122 on cp1050 is CRITICAL: Connection refused [13:35:09] PROBLEM - Varnish HTTP maps-frontend - port 3120 on cp1059 is CRITICAL: Connection refused [13:35:09] PROBLEM - Varnish HTTP misc-frontend - port 3121 on cp1061 is CRITICAL: Connection refused [13:35:09] PROBLEM - Varnish HTTP maps-frontend - port 3120 on cp1060 is CRITICAL: Connection refused [13:35:45] network issue? 
[13:36:35] PROBLEM - Varnish HTTP misc-frontend - port 3120 on cp1045 is CRITICAL: Connection refused [13:36:35] PROBLEM - Varnish HTTP maps-frontend - port 3122 on cp1046 is CRITICAL: Connection refused [13:36:35] PROBLEM - Varnish HTTP upload-frontend - port 3124 on cp1049 is CRITICAL: Connection refused [13:36:35] PROBLEM - Varnish HTTP upload-frontend - port 3125 on cp1050 is CRITICAL: Connection refused [13:36:35] PROBLEM - Varnish HTTP text-frontend - port 3120 on cp1054 is CRITICAL: Connection refused [13:36:36] PROBLEM - Varnish HTTP upload-frontend - port 3124 on cp1048 is CRITICAL: Connection refused [13:36:36] PROBLEM - Varnish HTTP maps-frontend - port 3123 on cp1047 is CRITICAL: Connection refused [13:36:37] PROBLEM - Varnish HTTP misc-frontend - port 3121 on cp1058 is CRITICAL: Connection refused [13:36:37] PROBLEM - Varnish HTTP misc-frontend - port 3124 on cp1051 is CRITICAL: Connection refused [13:36:54] ema, bblack --^ [13:36:56] bblack, ema lots of varnish complaining [13:37:36] PROBLEM - Varnish HTTP maps-frontend - port 3126 on cp1047 is CRITICAL: Connection refused [13:37:37] PROBLEM - Varnish HTTP misc-frontend - port 3123 on cp1045 is CRITICAL: Connection refused [13:37:37] PROBLEM - Varnish HTTP maps-frontend - port 3125 on cp1046 is CRITICAL: Connection refused [13:37:37] PROBLEM - Varnish HTTP text-frontend - port 3123 on cp1054 is CRITICAL: Connection refused [13:37:37] PROBLEM - Varnish HTTP text-frontend - port 3124 on cp1055 is CRITICAL: Connection refused [13:37:38] no issue [13:37:42] ok [13:37:52] sorry for the spam, I didn't realize my change would cause icinga to start monitoring those immediately :P [13:37:53] it was terrifying seeing so many alerts [13:37:55] PROBLEM - Varnish HTTP text-frontend - port 3125 on cp1055 is CRITICAL: Connection refused [13:37:55] PROBLEM - Varnish HTTP text-frontend - port 3122 on cp1053 is CRITICAL: Connection refused [13:37:55] PROBLEM - Varnish HTTP misc-frontend - port 3125 on cp1058 is
CRITICAL: Connection refused [13:37:55] PROBLEM - Varnish HTTP text-frontend - port 3124 on cp1054 is CRITICAL: Connection refused [13:37:56] PROBLEM - Varnish HTTP misc-frontend - port 3124 on cp1045 is CRITICAL: Connection refused [13:37:56] PROBLEM - Varnish HTTP upload-frontend - port 3120 on cp2008 is CRITICAL: Connection refused [13:37:56] PROBLEM - Varnish HTTP upload-frontend - port 3126 on cp2005 is CRITICAL: Connection refused [13:37:57] it is ok [13:38:15] there's going to be something like 700 lines of that before it's done :/ [13:38:27] because it was not like the usual "1 server is down" [13:38:38] well, except we'll miss several of them from the excess-flood quits heh [13:38:52] PROBLEM - Varnish HTTP upload-frontend - port 3121 on cp1071 is CRITICAL: Connection refused [13:38:52] PROBLEM - Varnish HTTP text-frontend - port 3121 on cp1067 is CRITICAL: Connection refused [13:38:52] We got an issue report on #wikipedia-fr from a user on Free. [13:38:52] PROBLEM - Varnish HTTP text-frontend - port 3120 on cp1068 is CRITICAL: Connection refused [13:38:53] PROBLEM - Varnish HTTP upload-frontend - port 3122 on cp1099 is CRITICAL: Connection refused [13:38:53] PROBLEM - Varnish HTTP upload-frontend - port 3121 on cp1073 is CRITICAL: Connection refused [13:38:54] PROBLEM - Varnish HTTP upload-frontend - port 3121 on cp1074 is CRITICAL: Connection refused [13:38:54] PROBLEM - Varnish HTTP text-frontend - port 3123 on cp2001 is CRITICAL: Connection refused [13:38:55] PROBLEM - Varnish HTTP upload-frontend - port 3125 on cp2002 is CRITICAL: Connection refused [13:38:55] PROBLEM - Varnish HTTP upload-frontend - port 3123 on cp2008 is CRITICAL: Connection refused [13:38:56] PROBLEM - Varnish HTTP maps-frontend - port 3125 on cp2009 is CRITICAL: Connection refused [13:38:56] PROBLEM - Varnish HTTP upload-frontend - port 3124 on cp2017 is CRITICAL: Connection refused [13:38:57] PROBLEM - Varnish HTTP misc-frontend - port 3125 on cp2018 is CRITICAL: Connection
refused [13:39:02] I added 7 new listening ports to the varnish *configuration*, but the restarts have to happen slower afterwards to enable the ports, and icinga is already monitoring them :P [13:39:05] bye icinga-wm [13:39:27] it make sense [13:39:31] *makes [13:39:36] PROBLEM - Varnish HTTP upload-frontend - port 3120 on cp1050 is CRITICAL: Connection refused [13:39:37] PROBLEM - Varnish HTTP text-frontend - port 3123 on cp1067 is CRITICAL: Connection refused [13:39:37] PROBLEM - Varnish HTTP upload-frontend - port 3120 on cp1063 is CRITICAL: Connection refused [13:39:37] PROBLEM - Varnish HTTP text-frontend - port 3122 on cp1065 is CRITICAL: Connection refused [13:39:37] PROBLEM - Varnish HTTP upload-frontend - port 3121 on cp1064 is CRITICAL: Connection refused [13:39:37] PROBLEM - Varnish HTTP text-frontend - port 3121 on cp1066 is CRITICAL: Connection refused [13:39:37] PROBLEM - Varnish HTTP text-frontend - port 3122 on cp1068 is CRITICAL: Connection refused [13:39:54] and there's ~100 affected varnishd instances heh [13:40:35] RECOVERY - configured eth on kubernetes1003 is OK: OK - interfaces up [13:40:51] (03PS4) 10Niharika29: Set $wgCategoryCollation to 'uca-hr' for Croatian wikipedia Add numeric sorting for bs, hr and uk wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317139 (https://phabricator.wikimedia.org/T148749) [13:42:35] RECOVERY - cassandra service on maps2004 is OK: OK - cassandra is active [13:42:45] !log restarting all varnish frontends (serially per-cluster with proper depooling, etc) [13:42:48] RECOVERY - configured eth on kubernetes1002 is OK: OK - interfaces up [13:43:20] (03CR) 10Ema: [C: 031] reduce cache local ports slightly [puppet] - 10https://gerrit.wikimedia.org/r/317499 (https://phabricator.wikimedia.org/T107749) (owner: 10BBlack) [13:44:11] RECOVERY - Varnish HTTP text-frontend - port 3120 on cp3030 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.175 second response time [13:44:11] RECOVERY - Varnish HTTP 
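As bblack explains above, the alert storm is icinga probing the eight new frontend ports (3120-3127) before the slow rolling restarts bring them up; each service check boils down to a TCP connect. A quick sketch of that probe (ports from the log; run anywhere other than a restarted cache frontend this will mostly report refused connections):

```python
import socket

def probe(host, port, timeout=2.0):
    """True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# The frontends above gained eight local ports, 3120-3127; until each
# varnishd is restarted with the new config, connects to the new ports
# are refused and the icinga checks report CRITICAL.
for port in range(3120, 3128):
    state = "OK" if probe("127.0.0.1", port) else "Connection refused"
    print(f"port {port}: {state}")
```

This also shows why the PROBLEMs flip to RECOVERYs as the serial, properly-depooled restarts proceed: the same connect starts succeeding per instance, port by port.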
upload-frontend - port 3126 on cp4006 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.159 second response time [13:44:14] RECOVERY - Varnish HTTP upload-frontend - port 3120 on cp4006 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.159 second response time [13:44:26] RECOVERY - Varnish HTTP misc-frontend - port 3125 on cp2006 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.077 second response time [13:44:31] (03PS5) 10Niharika29: Set $wgCategoryCollation to 'uca-hr' for Croatian wikipedia Add numeric sorting for bs, hr and uk wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317139 (https://phabricator.wikimedia.org/T148749) [13:44:33] (03PS1) 10Niharika29: Switch bs, hr and uk wikis to numeric collation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317502 (https://phabricator.wikimedia.org/T148682) [13:44:35] RECOVERY - Varnish HTTP text-frontend - port 3121 on cp3030 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.168 second response time [13:44:35] RECOVERY - Varnish HTTP maps-frontend - port 3123 on cp4011 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.161 second response time [13:44:35] and now we start getting the 700x recoveries, yay [13:44:38] RECOVERY - Varnish HTTP upload-frontend - port 3121 on cp4006 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.221 second response time [13:44:39] RECOVERY - Varnish HTTP misc-frontend - port 3126 on cp2006 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.077 second response time [13:44:45] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp3030 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.171 second response time [13:44:45] RECOVERY - Varnish HTTP maps-frontend - port 3124 on cp4011 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.160 second response time [13:44:55] RECOVERY - Varnish HTTP misc-frontend - port 3120 on cp2006 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.079 second response time [13:44:56] RECOVERY - Varnish HTTP upload-frontend - port 3122 on cp4006 is OK: HTTP 
OK: HTTP/1.1 200 OK - 325 bytes in 0.159 second response time [13:45:06] RECOVERY - Varnish HTTP misc-frontend - port 3123 on cp1045 is OK: HTTP OK: HTTP/1.1 200 OK - 319 bytes in 0.004 second response time [13:45:06] RECOVERY - Varnish HTTP text-frontend - port 3123 on cp1054 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.005 second response time [13:45:07] RECOVERY - Varnish HTTP text-frontend - port 3123 on cp3030 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.169 second response time [13:45:07] RECOVERY - Varnish HTTP maps-frontend - port 3125 on cp4011 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.160 second response time [13:45:07] RECOVERY - Varnish HTTP misc-frontend - port 3121 on cp2006 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.073 second response time [13:45:07] RECOVERY - Varnish HTTP maps-frontend - port 3122 on cp3006 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.171 second response time [13:45:07] RECOVERY - Varnish HTTP upload-frontend - port 3123 on cp4006 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.162 second response time [13:45:14] PROBLEM - check_puppetrun on payments1002 is CRITICAL: CRITICAL: Puppet has 18 failures [13:45:25] RECOVERY - Varnish HTTP text-frontend - port 3124 on cp1054 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.151 second response time [13:45:25] RECOVERY - Varnish HTTP misc-frontend - port 3124 on cp1045 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.139 second response time [13:45:25] RECOVERY - Varnish HTTP text-frontend - port 3124 on cp3030 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.171 second response time [13:45:26] RECOVERY - Varnish HTTP upload-frontend - port 3122 on cp3047 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.176 second response time [13:45:26] RECOVERY - Varnish HTTP maps-frontend - port 3126 on cp4011 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.159 second response time [13:45:36] RECOVERY - Varnish HTTP maps-frontend - port 3121 on cp1060 is OK: HTTP OK: 
HTTP/1.1 200 OK - 319 bytes in 0.008 second response time [13:45:45] RECOVERY - Varnish HTTP misc-frontend - port 3122 on cp2006 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.075 second response time [13:45:46] RECOVERY - Varnish HTTP maps-frontend - port 3123 on cp3006 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.168 second response time [13:45:47] RECOVERY - Varnish HTTP misc-frontend - port 3125 on cp1045 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.012 second response time [13:45:47] RECOVERY - Varnish HTTP text-frontend - port 3125 on cp1054 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.001 second response time [13:45:48] RECOVERY - Varnish HTTP text-frontend - port 3125 on cp3030 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.169 second response time [13:45:48] RECOVERY - Varnish HTTP upload-frontend - port 3123 on cp3047 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.173 second response time [13:45:48] RECOVERY - Varnish HTTP maps-frontend - port 3120 on cp4011 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.162 second response time [13:45:48] RECOVERY - Varnish HTTP upload-frontend - port 3124 on cp4006 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.163 second response time [13:46:05] RECOVERY - Varnish HTTP misc-frontend - port 3126 on cp1045 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.004 second response time [13:46:05] RECOVERY - Varnish HTTP text-frontend - port 3126 on cp1054 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.012 second response time [13:46:05] RECOVERY - Varnish HTTP upload-frontend - port 3122 on cp2008 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.074 second response time [13:46:05] payments1002 puppetrun alert ^^^ is because I rebooted the frack puppetmaster [13:46:06] RECOVERY - Varnish HTTP text-frontend - port 3126 on cp3030 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.174 second response time [13:46:06] RECOVERY - Varnish HTTP upload-frontend - port 3124 on cp3047 is OK: HTTP OK: HTTP/1.1 200 OK - 
325 bytes in 0.169 second response time [13:46:07] RECOVERY - Varnish HTTP maps-frontend - port 3122 on cp1060 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.007 second response time [13:46:07] RECOVERY - Varnish HTTP misc-frontend - port 3123 on cp2006 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.080 second response time [13:46:15] RECOVERY - Varnish HTTP maps-frontend - port 3124 on cp3006 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.173 second response time [13:46:15] RECOVERY - Varnish HTTP maps-frontend - port 3121 on cp4011 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.162 second response time [13:46:15] RECOVERY - Varnish HTTP upload-frontend - port 3125 on cp4006 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.158 second response time [13:46:25] RECOVERY - Varnish HTTP upload-frontend - port 3123 on cp2008 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.075 second response time [13:46:25] RECOVERY - Varnish HTTP misc-frontend - port 3122 on cp4001 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.158 second response time [13:46:25] RECOVERY - Varnish HTTP upload-frontend - port 3125 on cp3047 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.168 second response time [13:46:36] RECOVERY - Varnish HTTP misc-frontend - port 3120 on cp1045 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.002 second response time [13:46:36] RECOVERY - Varnish HTTP text-frontend - port 3120 on cp1054 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.005 second response time [13:46:36] RECOVERY - Varnish HTTP maps-frontend - port 3123 on cp1060 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.001 second response time [13:46:38] RECOVERY - Varnish HTTP misc-frontend - port 3124 on cp2006 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.077 second response time [13:46:39] RECOVERY - Varnish HTTP maps-frontend - port 3125 on cp3006 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.168 second response time [13:46:39] RECOVERY - Varnish HTTP maps-frontend - port 3122 on 
cp4011 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.161 second response time [13:46:39] RECOVERY - Varnish HTTP text-frontend - port 3124 on cp4009 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.162 second response time [13:46:40] PROBLEM - puppet last run on labsdb1006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[postgresql-9.1-postgis-scripts] [13:46:45] RECOVERY - Varnish HTTP upload-frontend - port 3124 on cp2008 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.075 second response time [13:46:45] RECOVERY - Varnish HTTP upload-frontend - port 3126 on cp3047 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.168 second response time [13:46:45] RECOVERY - Varnish HTTP misc-frontend - port 3123 on cp4001 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.160 second response time [13:46:52] Urbanecm: you had changes scheduled for this SWAT by the way [13:46:55] RECOVERY - Varnish HTTP misc-frontend - port 3121 on cp1045 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.002 second response time [13:46:55] RECOVERY - Varnish HTTP maps-frontend - port 3124 on cp1047 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.014 second response time [13:46:55] RECOVERY - Varnish HTTP text-frontend - port 3121 on cp1054 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.010 second response time [13:46:55] RECOVERY - Varnish HTTP maps-frontend - port 3124 on cp1060 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.027 second response time [13:46:55] RECOVERY - Varnish HTTP maps-frontend - port 3126 on cp3006 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.168 second response time [13:46:56] RECOVERY - Varnish HTTP text-frontend - port 3125 on cp4009 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.168 second response time [13:47:07] RECOVERY - Varnish HTTP upload-frontend - port 3125 on cp2008 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.076 second response time [13:47:09] RECOVERY - Varnish HTTP 
misc-frontend - port 3124 on cp4001 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.158 second response time [13:47:09] RECOVERY - Varnish HTTP maps-frontend - port 3120 on cp3006 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.169 second response time [13:47:09] RECOVERY - Varnish HTTP misc-frontend - port 3122 on cp1045 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.007 second response time [13:47:09] RECOVERY - Varnish HTTP maps-frontend - port 3125 on cp1047 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.045 second response time [13:47:10] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp1054 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.016 second response time [13:47:10] RECOVERY - Varnish HTTP misc-frontend - port 3126 on cp1051 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.086 second response time [13:47:10] RECOVERY - Varnish HTTP maps-frontend - port 3125 on cp1060 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.043 second response time [13:47:10] RECOVERY - Varnish HTTP upload-frontend - port 3120 on cp3047 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.168 second response time [13:47:10] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp3041 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.185 second response time [13:47:11] RECOVERY - Varnish HTTP text-frontend - port 3126 on cp4009 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.159 second response time [13:47:16] RECOVERY - Varnish HTTP misc-frontend - port 3120 on cp1051 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.014 second response time [13:47:26] RECOVERY - Varnish HTTP upload-frontend - port 3126 on cp2008 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.073 second response time [13:47:26] RECOVERY - Varnish HTTP maps-frontend - port 3121 on cp3006 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.168 second response time [13:47:26] RECOVERY - Varnish HTTP misc-frontend - port 3125 on cp4001 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.158 second response time 
[13:47:27] RECOVERY - Varnish HTTP text-frontend - port 3120 on cp4009 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.161 second response time [13:47:36] RECOVERY - Varnish HTTP maps-frontend - port 3126 on cp1047 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.008 second response time [13:47:36] RECOVERY - Varnish HTTP maps-frontend - port 3126 on cp1060 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.005 second response time [13:47:36] RECOVERY - Varnish HTTP maps-frontend - port 3120 on cp1047 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.003 second response time [13:47:37] RECOVERY - Varnish HTTP misc-frontend - port 3121 on cp1051 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.002 second response time [13:47:37] RECOVERY - Varnish HTTP maps-frontend - port 3120 on cp1060 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.011 second response time [13:47:37] RECOVERY - Varnish HTTP upload-frontend - port 3121 on cp3047 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.168 second response time [13:47:37] RECOVERY - Varnish HTTP text-frontend - port 3123 on cp3041 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.169 second response time [13:47:37] RECOVERY - Varnish HTTP maps-frontend - port 3124 on cp2015 is OK: HTTP OK: HTTP/1.1 200 OK - 319 bytes in 0.073 second response time [13:47:38] RECOVERY - Varnish HTTP upload-frontend - port 3123 on cp2014 is OK: HTTP OK: HTTP/1.1 200 OK - 319 bytes in 0.078 second response time [13:47:38] RECOVERY - Varnish HTTP misc-frontend - port 3126 on cp4001 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.158 second response time [13:47:39] RECOVERY - Varnish HTTP text-frontend - port 3121 on cp4009 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.160 second response time [13:47:55] RECOVERY - Varnish HTTP upload-frontend - port 3120 on cp2008 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.073 second response time [13:47:55] RECOVERY - Varnish HTTP text-frontend - port 3124 on cp3041 is OK: HTTP OK: HTTP/1.1 200 OK - 428 
bytes in 0.176 second response time [13:47:55] RECOVERY - Varnish HTTP text-frontend - port 3123 on cp3042 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.168 second response time [13:48:05] inb4 another kill excess flood [13:48:05] RECOVERY - Varnish HTTP maps-frontend - port 3121 on cp1047 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.001 second response time [13:48:15] RECOVERY - Varnish HTTP misc-frontend - port 3122 on cp1051 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.001 second response time [13:48:16] RECOVERY - Varnish HTTP upload-frontend - port 3124 on cp2014 is OK: HTTP OK: HTTP/1.1 200 OK - 321 bytes in 0.074 second response time [13:48:16] RECOVERY - Varnish HTTP maps-frontend - port 3125 on cp2015 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.072 second response time [13:48:17] RECOVERY - Varnish HTTP misc-frontend - port 3126 on cp3009 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.168 second response time [13:48:17] RECOVERY - Varnish HTTP upload-frontend - port 3121 on cp2008 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.073 second response time [13:48:18] RECOVERY - Varnish HTTP text-frontend - port 3125 on cp3041 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.170 second response time [13:48:18] RECOVERY - Varnish HTTP text-frontend - port 3124 on cp3042 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.175 second response time [13:48:18] RECOVERY - Varnish HTTP misc-frontend - port 3120 on cp4001 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.164 second response time [13:48:18] RECOVERY - Varnish HTTP upload-frontend - port 3120 on cp3049 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.168 second response time [13:48:18] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp4009 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.163 second response time [13:48:35] RECOVERY - Varnish HTTP text-frontend - port 3126 on cp2004 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.074 second response time [13:48:35] RECOVERY - Varnish 
HTTP misc-frontend - port 3120 on cp3009 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.172 second response time [13:48:35] RECOVERY - Varnish HTTP text-frontend - port 3125 on cp3042 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.169 second response time [13:48:35] RECOVERY - Varnish HTTP text-frontend - port 3126 on cp3041 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.172 second response time [13:48:36] RECOVERY - Varnish HTTP upload-frontend - port 3121 on cp3049 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.168 second response time [13:48:36] RECOVERY - Varnish HTTP misc-frontend - port 3121 on cp4001 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.157 second response time [13:48:37] RECOVERY - Varnish HTTP maps-frontend - port 3122 on cp1047 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.020 second response time [13:48:37] RECOVERY - Varnish HTTP misc-frontend - port 3123 on cp1051 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.012 second response time [13:48:37] RECOVERY - Varnish HTTP misc-frontend - port 3121 on cp2012 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.076 second response time [13:48:37] RECOVERY - Varnish HTTP upload-frontend - port 3125 on cp2014 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.076 second response time [13:48:38] RECOVERY - Varnish HTTP maps-frontend - port 3126 on cp2015 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.079 second response time [13:48:45] RECOVERY - Varnish HTTP maps-frontend - port 3121 on cp3004 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.170 second response time [13:48:45] RECOVERY - Varnish HTTP text-frontend - port 3123 on cp4009 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.158 second response time [13:48:55] RECOVERY - Varnish HTTP maps-frontend - port 3120 on cp2015 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.073 second response time [13:48:55] RECOVERY - Varnish HTTP misc-frontend - port 3121 on cp3009 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.169 second response 
time [13:48:56] RECOVERY - Varnish HTTP text-frontend - port 3126 on cp3042 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.169 second response time [13:48:56] RECOVERY - Varnish HTTP upload-frontend - port 3122 on cp3049 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.168 second response time [13:49:05] (03PS6) 10Niharika29: Set $wgCategoryCollation to 'uca-hr' for Croatian wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317139 (https://phabricator.wikimedia.org/T148749) [13:49:07] RECOVERY - Varnish HTTP maps-frontend - port 3123 on cp1047 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.001 second response time [13:49:07] RECOVERY - Varnish HTTP misc-frontend - port 3124 on cp1051 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.003 second response time [13:49:09] RECOVERY - Varnish HTTP text-frontend - port 3120 on cp2004 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.073 second response time [13:49:09] RECOVERY - Varnish HTTP misc-frontend - port 3122 on cp2012 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.074 second response time [13:49:10] RECOVERY - Varnish HTTP upload-frontend - port 3126 on cp2014 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.073 second response time [13:49:11] RECOVERY - Varnish HTTP maps-frontend - port 3122 on cp3004 is OK: HTTP OK: HTTP/1.1 200 OK - 321 bytes in 0.173 second response time [13:49:11] RECOVERY - Varnish HTTP text-frontend - port 3120 on cp3041 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.169 second response time [13:49:13] RECOVERY - Varnish HTTP maps-frontend - port 3121 on cp4020 is OK: HTTP OK: HTTP/1.1 200 OK - 319 bytes in 0.167 second response time [13:49:24] RECOVERY - Varnish HTTP upload-frontend - port 3122 on cp1071 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.002 second response time [13:49:25] RECOVERY - Varnish HTTP maps-frontend - port 3121 on cp2015 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.078 second response time [13:49:25] RECOVERY - Varnish HTTP upload-frontend 
- port 3120 on cp2014 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.075 second response time [13:49:25] RECOVERY - Varnish HTTP misc-frontend - port 3122 on cp3009 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.176 second response time [13:49:25] RECOVERY - Varnish HTTP upload-frontend - port 3123 on cp3049 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.174 second response time [13:49:25] RECOVERY - Varnish HTTP upload-frontend - port 3126 on cp3044 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.175 second response time [13:49:27] RECOVERY - Varnish HTTP misc-frontend - port 3122 on cp1058 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.008 second response time [13:49:27] RECOVERY - Varnish HTTP misc-frontend - port 3125 on cp1051 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.012 second response time [13:49:27] RECOVERY - Varnish HTTP text-frontend - port 3121 on cp2004 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.078 second response time [13:49:34] RECOVERY - Varnish HTTP misc-frontend - port 3123 on cp2012 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.076 second response time [13:49:35] RECOVERY - Varnish HTTP text-frontend - port 3121 on cp3041 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.169 second response time [13:49:35] RECOVERY - Varnish HTTP text-frontend - port 3120 on cp3042 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.168 second response time [13:49:35] RECOVERY - Varnish HTTP maps-frontend - port 3123 on cp3004 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.172 second response time [13:49:35] RECOVERY - Varnish HTTP maps-frontend - port 3122 on cp4020 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.159 second response time [13:49:36] RECOVERY - Varnish HTTP upload-frontend - port 3123 on cp1071 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.001 second response time [13:49:36] RECOVERY - Varnish HTTP upload-frontend - port 3121 on cp2014 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.073 second response time [13:49:36] 
RECOVERY - Varnish HTTP maps-frontend - port 3122 on cp2015 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.077 second response time [13:49:37] RECOVERY - Varnish HTTP upload-frontend - port 3124 on cp3049 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.173 second response time [13:49:37] RECOVERY - Varnish HTTP misc-frontend - port 3123 on cp3009 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.169 second response time [13:49:38] RECOVERY - Varnish HTTP misc-frontend - port 3123 on cp1058 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.007 second response time [13:49:44] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp2004 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.073 second response time [13:49:45] RECOVERY - Varnish HTTP misc-frontend - port 3124 on cp2012 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.073 second response time [13:49:45] RECOVERY - Varnish HTTP maps-frontend - port 3124 on cp3004 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.169 second response time [13:49:46] RECOVERY - Varnish HTTP upload-frontend - port 3120 on cp3044 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.176 second response time [13:49:46] RECOVERY - Varnish HTTP text-frontend - port 3121 on cp3042 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.174 second response time [13:49:46] RECOVERY - Varnish HTTP maps-frontend - port 3123 on cp4020 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.159 second response time [13:49:54] RECOVERY - Varnish HTTP upload-frontend - port 3124 on cp1071 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.002 second response time [13:49:56] RECOVERY - Varnish HTTP text-frontend - port 3126 on cp2001 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.074 second response time [13:49:56] RECOVERY - Varnish HTTP upload-frontend - port 3122 on cp2014 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.073 second response time [13:49:56] RECOVERY - Varnish HTTP maps-frontend - port 3123 on cp2015 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 
0.077 second response time [13:49:56] RECOVERY - Varnish HTTP misc-frontend - port 3124 on cp3009 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.174 second response time [13:49:57] RECOVERY - Varnish HTTP upload-frontend - port 3125 on cp3049 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.170 second response time [13:50:05] RECOVERY - Varnish HTTP maps-frontend - port 3125 on cp1046 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.008 second response time [13:50:14] RECOVERY - check_puppetrun on payments1002 is OK: OK: Puppet is currently enabled, last run 211 seconds ago with 0 failures [13:50:15] RECOVERY - Varnish HTTP misc-frontend - port 3124 on cp1058 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.001 second response time [13:50:15] RECOVERY - Varnish HTTP text-frontend - port 3123 on cp2004 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.076 second response time [13:50:16] RECOVERY - Varnish HTTP misc-frontend - port 3125 on cp2012 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.073 second response time [13:50:17] RECOVERY - Varnish HTTP misc-frontend - port 3122 on cp2025 is OK: HTTP OK: HTTP/1.1 200 OK - 319 bytes in 0.076 second response time [13:50:17] RECOVERY - Varnish HTTP upload-frontend - port 3125 on cp1071 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.004 second response time [13:50:17] Urbanecm: please tell me if you're interested by a deployment in 20 minutes (ETA for l10nupdate scap, then for cache rebuild) or if you prefer reschedule them for a next window [13:50:17] RECOVERY - Varnish HTTP maps-frontend - port 3125 on cp3004 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.173 second response time [13:50:17] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp3042 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.168 second response time [13:50:18] RECOVERY - Varnish HTTP upload-frontend - port 3121 on cp3044 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.171 second response time [13:50:18] RECOVERY - Varnish HTTP misc-frontend 
- port 3125 on cp3009 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.169 second response time [13:50:18] RECOVERY - Varnish HTTP upload-frontend - port 3126 on cp3049 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.168 second response time [13:50:19] RECOVERY - Varnish HTTP maps-frontend - port 3124 on cp4020 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.162 second response time [13:50:25] RECOVERY - Varnish HTTP maps-frontend - port 3126 on cp1046 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.006 second response time [13:50:34] RECOVERY - Varnish HTTP misc-frontend - port 3125 on cp1058 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.005 second response time [13:50:34] RECOVERY - Varnish HTTP text-frontend - port 3124 on cp2004 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.079 second response time [13:50:35] RECOVERY - Varnish HTTP text-frontend - port 3120 on cp2001 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.076 second response time [13:50:35] RECOVERY - Varnish HTTP misc-frontend - port 3126 on cp2012 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.075 second response time [13:50:35] RECOVERY - Varnish HTTP misc-frontend - port 3123 on cp2025 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.076 second response time [13:50:35] RECOVERY - Varnish HTTP upload-frontend - port 3124 on cp3035 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.169 second response time [13:50:35] RECOVERY - Varnish HTTP maps-frontend - port 3126 on cp3004 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.169 second response time [13:50:36] RECOVERY - Varnish HTTP upload-frontend - port 3122 on cp3044 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.170 second response time [13:50:36] RECOVERY - Varnish HTTP maps-frontend - port 3125 on cp4020 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.161 second response time [13:50:44] RECOVERY - Varnish HTTP maps-frontend - port 3120 on cp1046 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.019 second response time [13:50:45] 
RECOVERY - Varnish HTTP upload-frontend - port 3126 on cp1071 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.004 second response time [13:50:45] RECOVERY - Varnish HTTP text-frontend - port 3126 on cp1067 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.016 second response time [13:50:45] RECOVERY - Varnish HTTP misc-frontend - port 3120 on cp2012 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.076 second response time [13:50:45] RECOVERY - Varnish HTTP maps-frontend - port 3120 on cp3004 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.182 second response time [13:50:45] RECOVERY - Varnish HTTP misc-frontend - port 3126 on cp1058 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.001 second response time [13:50:55] RECOVERY - Varnish HTTP text-frontend - port 3125 on cp2004 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.083 second response time [13:50:55] RECOVERY - Varnish HTTP text-frontend - port 3121 on cp2001 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.072 second response time [13:50:55] RECOVERY - Varnish HTTP misc-frontend - port 3124 on cp2025 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.074 second response time [13:50:56] RECOVERY - Varnish HTTP upload-frontend - port 3125 on cp3035 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.168 second response time [13:50:56] RECOVERY - Varnish HTTP maps-frontend - port 3126 on cp4020 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.160 second response time [13:50:56] RECOVERY - Varnish HTTP upload-frontend - port 3123 on cp3044 is OK: HTTP OK: HTTP/1.1 200 OK - 326 bytes in 0.168 second response time [13:50:56] RECOVERY - Varnish HTTP misc-frontend - port 3126 on cp4002 is OK: HTTP OK: HTTP/1.1 200 OK - 319 bytes in 0.160 second response time [13:50:56] RECOVERY - Varnish HTTP misc-frontend - port 3125 on cp4003 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.158 second response time [13:51:06] RECOVERY - Varnish HTTP text-frontend - port 3120 on cp1067 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 
0.006 second response time [13:51:06] RECOVERY - Varnish HTTP upload-frontend - port 3120 on cp1071 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.002 second response time [13:51:06] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp2001 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.073 second response time [13:51:06] RECOVERY - Varnish HTTP misc-frontend - port 3125 on cp2025 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.075 second response time [13:51:12] Dereckson: I have about 1 hour left of my time. [13:51:14] RECOVERY - Varnish HTTP upload-frontend - port 3126 on cp3035 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.168 second response time [13:51:14] RECOVERY - Varnish HTTP misc-frontend - port 3120 on cp4002 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.158 second response time [13:51:14] RECOVERY - Varnish HTTP upload-frontend - port 3124 on cp3044 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.171 second response time [13:51:14] RECOVERY - Varnish HTTP upload-frontend - port 3126 on cp4013 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.159 second response time [13:51:15] RECOVERY - Varnish HTTP maps-frontend - port 3121 on cp1046 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.030 second response time [13:51:15] RECOVERY - Varnish HTTP misc-frontend - port 3120 on cp1058 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.004 second response time [13:51:16] RECOVERY - Varnish HTTP maps-frontend - port 3120 on cp4012 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.162 second response time [13:51:16] RECOVERY - Varnish HTTP misc-frontend - port 3126 on cp4003 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.161 second response time [13:51:16] RECOVERY - Varnish HTTP maps-frontend - port 3120 on cp4020 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.159 second response time [13:51:18] 06Operations, 10Prod-Kubernetes, 05Kubernetes-production-experiment, 15User-Joe: Puppet implementation of the production docker-registry installation - 
https://phabricator.wikimedia.org/T148966#2737951 (10Joe) [13:51:25] RECOVERY - Varnish HTTP text-frontend - port 3121 on cp1067 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.008 second response time [13:51:25] RECOVERY - Varnish HTTP upload-frontend - port 3121 on cp1071 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.005 second response time [13:51:26] RECOVERY - Varnish HTTP text-frontend - port 3123 on cp2001 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.073 second response time [13:51:26] RECOVERY - Varnish HTTP misc-frontend - port 3126 on cp2025 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.073 second response time [13:51:26] RECOVERY - Varnish HTTP upload-frontend - port 3125 on cp3044 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.169 second response time [13:51:34] RECOVERY - Varnish HTTP misc-frontend - port 3120 on cp4003 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.157 second response time [13:51:34] RECOVERY - Varnish HTTP misc-frontend - port 3121 on cp4002 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.162 second response time [13:51:36] 06Operations, 10Prod-Kubernetes, 05Kubernetes-production-experiment, 15User-Joe: Puppet implementation of the production docker-registry installation - https://phabricator.wikimedia.org/T148966#2737951 (10Joe) p:05Triage>03High [13:51:36] RECOVERY - Varnish HTTP misc-frontend - port 3121 on cp1058 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.019 second response time [13:51:36] RECOVERY - Varnish HTTP maps-frontend - port 3122 on cp1046 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.002 second response time [13:51:36] RECOVERY - Varnish HTTP text-frontend - port 3121 on cp1055 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.007 second response time [13:51:44] Dereckson: So if deploying in 60 minutes will be possible, it is fine for me. 
[13:51:49] RECOVERY - Varnish HTTP maps-frontend - port 3121 on cp4012 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.172 second response time [13:51:49] RECOVERY - Varnish HTTP upload-frontend - port 3120 on cp3035 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.170 second response time [13:51:49] RECOVERY - Varnish HTTP upload-frontend - port 3120 on cp4013 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.158 second response time [13:51:53] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp1067 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.081 second response time [13:51:53] RECOVERY - Varnish HTTP text-frontend - port 3124 on cp2001 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.122 second response time [13:51:53] RECOVERY - Varnish HTTP misc-frontend - port 3121 on cp4003 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.158 second response time [13:51:54] RECOVERY - Varnish HTTP misc-frontend - port 3122 on cp4002 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.177 second response time [13:52:06] RECOVERY - Varnish HTTP maps-frontend - port 3123 on cp1046 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.021 second response time [13:52:06] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp1055 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.002 second response time [13:52:06] RECOVERY - Varnish HTTP misc-frontend - port 3120 on cp2025 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.073 second response time [13:52:07] RECOVERY - Varnish HTTP misc-frontend - port 3126 on cp3007 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.170 second response time [13:52:07] RECOVERY - Varnish HTTP upload-frontend - port 3121 on cp3035 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.168 second response time [13:52:07] RECOVERY - Varnish HTTP maps-frontend - port 3122 on cp3003 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.170 second response time [13:52:07] RECOVERY - Varnish HTTP upload-frontend - port 3121 on cp4013 is OK: HTTP OK: HTTP/1.1 200 OK - 
325 bytes in 0.160 second response time [13:52:07] RECOVERY - Varnish HTTP maps-frontend - port 3122 on cp4012 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.163 second response time [13:52:15] RECOVERY - Varnish HTTP text-frontend - port 3123 on cp1067 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.002 second response time [13:52:15] RECOVERY - Varnish HTTP text-frontend - port 3125 on cp2001 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.074 second response time [13:52:16] RECOVERY - Varnish HTTP misc-frontend - port 3123 on cp4002 is OK: HTTP OK: HTTP/1.1 200 OK - 321 bytes in 0.160 second response time [13:52:16] RECOVERY - Varnish HTTP misc-frontend - port 3120 on cp3007 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.169 second response time [13:52:16] RECOVERY - Varnish HTTP misc-frontend - port 3122 on cp4003 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.162 second response time [13:52:17] RECOVERY - Varnish HTTP upload-frontend - port 3126 on cp1049 is OK: HTTP OK: HTTP/1.1 200 OK - 319 bytes in 0.010 second response time [13:52:17] RECOVERY - Varnish HTTP maps-frontend - port 3124 on cp1046 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.014 second response time [13:52:17] RECOVERY - Varnish HTTP text-frontend - port 3123 on cp1055 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.010 second response time [13:52:17] RECOVERY - Varnish HTTP misc-frontend - port 3121 on cp2025 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.074 second response time [13:52:18] RECOVERY - Varnish HTTP maps-frontend - port 3123 on cp3003 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.174 second response time [13:52:18] RECOVERY - Varnish HTTP upload-frontend - port 3122 on cp3035 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.169 second response time [13:52:18] RECOVERY - Varnish HTTP maps-frontend - port 3123 on cp4012 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.167 second response time [13:52:19] RECOVERY - Varnish HTTP upload-frontend - port 3122 on 
cp4013 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.159 second response time [13:52:26] RECOVERY - Varnish HTTP upload-frontend - port 3120 on cp1049 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.004 second response time [13:52:36] RECOVERY - Varnish HTTP misc-frontend - port 3121 on cp3007 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.169 second response time [13:52:36] RECOVERY - Varnish HTTP misc-frontend - port 3124 on cp3010 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.168 second response time [13:52:36] RECOVERY - Varnish HTTP misc-frontend - port 3123 on cp4003 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.160 second response time [13:52:36] RECOVERY - Varnish HTTP misc-frontend - port 3124 on cp4002 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.161 second response time [13:52:46] RECOVERY - Varnish HTTP text-frontend - port 3124 on cp1055 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.007 second response time [13:52:47] RECOVERY - Varnish HTTP upload-frontend - port 3121 on cp1049 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.010 second response time [13:52:47] RECOVERY - Varnish HTTP text-frontend - port 3125 on cp1067 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.008 second response time [13:52:47] RECOVERY - Varnish HTTP maps-frontend - port 3124 on cp3003 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.169 second response time [13:52:47] RECOVERY - Varnish HTTP maps-frontend - port 3126 on cp3005 is OK: HTTP OK: HTTP/1.1 200 OK - 319 bytes in 0.170 second response time [13:52:47] RECOVERY - Varnish HTTP upload-frontend - port 3123 on cp3035 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.169 second response time [13:52:47] RECOVERY - Varnish HTTP upload-frontend - port 3123 on cp4013 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.159 second response time [13:52:48] RECOVERY - Varnish HTTP maps-frontend - port 3124 on cp4012 is OK: HTTP OK: HTTP/1.1 200 OK - 321 bytes in 0.159 second response time [13:52:48] RECOVERY - 
Varnish HTTP maps-frontend - port 3120 on cp3005 is OK: HTTP OK: HTTP/1.1 200 OK - 319 bytes in 0.170 second response time [13:52:48] RECOVERY - Varnish HTTP misc-frontend - port 3122 on cp3007 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.173 second response time [13:52:49] RECOVERY - Varnish HTTP misc-frontend - port 3125 on cp4002 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.164 second response time [13:52:49] RECOVERY - Varnish HTTP misc-frontend - port 3124 on cp4003 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.158 second response time [13:52:50] RECOVERY - Varnish HTTP misc-frontend - port 3125 on cp3010 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.172 second response time [13:52:55] RECOVERY - Varnish HTTP text-frontend - port 3126 on cp4016 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.158 second response time [13:52:57] RECOVERY - configured eth on kubernetes1001 is OK: OK - interfaces up [13:53:05] RECOVERY - Varnish HTTP text-frontend - port 3125 on cp1055 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.002 second response time [13:53:05] RECOVERY - Varnish HTTP upload-frontend - port 3126 on cp2022 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.074 second response time [13:53:05] RECOVERY - Varnish HTTP maps-frontend - port 3125 on cp3003 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.172 second response time [13:53:05] RECOVERY - Varnish HTTP maps-frontend - port 3125 on cp4012 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.158 second response time [13:53:05] RECOVERY - Varnish HTTP upload-frontend - port 3124 on cp4013 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.161 second response time [13:53:14] Urbanecm not sure ops/deployteam would read this due to icinga flood [13:53:24] RECOVERY - Varnish HTTP upload-frontend - port 3122 on cp1049 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.046 second response time [13:53:26] RECOVERY - Varnish HTTP upload-frontend - port 3120 on cp2022 is OK: HTTP OK: HTTP/1.1 200 OK - 324 
bytes in 0.074 second response time [13:53:27] RECOVERY - Varnish HTTP misc-frontend - port 3123 on cp3007 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.170 second response time [13:53:27] RECOVERY - Varnish HTTP misc-frontend - port 3124 on cp3008 is OK: HTTP OK: HTTP/1.1 200 OK - 319 bytes in 0.170 second response time [13:53:27] RECOVERY - Varnish HTTP maps-frontend - port 3121 on cp3005 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.171 second response time [13:53:27] RECOVERY - Varnish HTTP misc-frontend - port 3126 on cp3010 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.168 second response time [13:53:28] RECOVERY - Varnish HTTP text-frontend - port 3126 on cp1055 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.008 second response time [13:53:29] RECOVERY - Varnish HTTP maps-frontend - port 3125 on cp2003 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.074 second response time [13:53:29] RECOVERY - Varnish HTTP maps-frontend - port 3126 on cp3003 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.169 second response time [13:53:29] RECOVERY - Varnish HTTP maps-frontend - port 3126 on cp4012 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.163 second response time [13:53:29] RECOVERY - Varnish HTTP upload-frontend - port 3125 on cp4013 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.164 second response time [13:53:29] RECOVERY - Varnish HTTP text-frontend - port 3120 on cp4016 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.158 second response time [13:53:45] RECOVERY - Varnish HTTP maps-frontend - port 3126 on cp2003 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.074 second response time [13:53:46] RECOVERY - Varnish HTTP misc-frontend - port 3120 on cp3010 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.168 second response time [13:53:46] RECOVERY - Varnish HTTP text-frontend - port 3121 on cp4016 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.164 second response time [13:53:54] RECOVERY - Varnish HTTP upload-frontend - port 3123 on cp1049 is 
OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.001 second response time [13:53:54] RECOVERY - Varnish HTTP text-frontend - port 3120 on cp1055 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.001 second response time [13:53:55] RECOVERY - Varnish HTTP text-frontend - port 3125 on cp1066 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.001 second response time [13:53:55] RECOVERY - Varnish HTTP text-frontend - port 3126 on cp1065 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.001 second response time [13:53:55] RECOVERY - Varnish HTTP upload-frontend - port 3121 on cp2022 is OK: HTTP OK: HTTP/1.1 200 OK - 321 bytes in 0.076 second response time [13:53:55] RECOVERY - Varnish HTTP maps-frontend - port 3120 on cp3003 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.169 second response time [13:53:55] RECOVERY - Varnish HTTP maps-frontend - port 3122 on cp3005 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.169 second response time [13:53:56] RECOVERY - Varnish HTTP misc-frontend - port 3124 on cp3007 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.171 second response time [13:53:56] RECOVERY - Varnish HTTP misc-frontend - port 3125 on cp3008 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.168 second response time [13:54:06] RECOVERY - Varnish HTTP text-frontend - port 3120 on cp1065 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.005 second response time [13:54:07] RECOVERY - Varnish HTTP misc-frontend - port 3121 on cp3010 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.174 second response time [13:54:07] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp4016 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.159 second response time [13:54:16] RECOVERY - Varnish HTTP upload-frontend - port 3124 on cp1049 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.005 second response time [13:54:17] RECOVERY - Varnish HTTP text-frontend - port 3126 on cp1066 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.003 second response time [13:54:17] RECOVERY - Varnish HTTP 
upload-frontend - port 3126 on cp1072 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.007 second response time [13:54:18] RECOVERY - Varnish HTTP upload-frontend - port 3122 on cp2022 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.072 second response time [13:54:19] RECOVERY - Varnish HTTP maps-frontend - port 3120 on cp2003 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.076 second response time [13:54:19] RECOVERY - Varnish HTTP maps-frontend - port 3121 on cp3003 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.172 second response time [13:54:19] RECOVERY - Varnish HTTP misc-frontend - port 3125 on cp3007 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.171 second response time [13:54:20] RECOVERY - Varnish HTTP maps-frontend - port 3123 on cp3005 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.172 second response time [13:54:20] RECOVERY - Varnish HTTP misc-frontend - port 3126 on cp3008 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.170 second response time [13:54:20] RECOVERY - Varnish HTTP maps-frontend - port 3124 on cp4019 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.159 second response time [13:54:24] RECOVERY - Varnish HTTP text-frontend - port 3120 on cp1066 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.001 second response time [13:54:24] RECOVERY - Varnish HTTP upload-frontend - port 3120 on cp1072 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.001 second response time [13:54:24] RECOVERY - Varnish HTTP text-frontend - port 3121 on cp1065 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.002 second response time [13:54:24] RECOVERY - Varnish HTTP misc-frontend - port 3123 on cp4004 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.159 second response time [13:54:24] RECOVERY - Varnish HTTP misc-frontend - port 3120 on cp3008 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.169 second response time [13:54:24] RECOVERY - Varnish HTTP misc-frontend - port 3122 on cp3010 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.168 second response 
time [13:54:25] RECOVERY - Varnish HTTP text-frontend - port 3123 on cp4016 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.159 second response time [13:54:36] RECOVERY - Varnish HTTP upload-frontend - port 3125 on cp1049 is OK: HTTP OK: HTTP/1.1 200 OK - 322 bytes in 0.012 second response time [13:54:36] RECOVERY - Varnish HTTP maps-frontend - port 3121 on cp2003 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.072 second response time [13:54:36] RECOVERY - Varnish HTTP upload-frontend - port 3123 on cp2022 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.074 second response time [13:54:36] RECOVERY - Varnish HTTP maps-frontend - port 3124 on cp3005 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.173 second response time [13:54:36] RECOVERY - Varnish HTTP maps-frontend - port 3125 on cp4019 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.164 second response time [13:54:36] RECOVERY - Varnish HTTP text-frontend - port 3121 on cp1066 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.007 second response time [13:54:44] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp1065 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.005 second response time [13:54:44] RECOVERY - Varnish HTTP upload-frontend - port 3121 on cp1072 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.001 second response time [13:54:45] RECOVERY - Varnish HTTP misc-frontend - port 3124 on cp4004 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.159 second response time [13:54:46] RECOVERY - Varnish HTTP misc-frontend - port 3121 on cp3008 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.171 second response time [13:54:46] RECOVERY - Varnish HTTP text-frontend - port 3124 on cp4016 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.159 second response time [13:54:46] RECOVERY - Varnish HTTP misc-frontend - port 3123 on cp3010 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.168 second response time [13:54:46] RECOVERY - Varnish HTTP maps-frontend - port 3122 on cp2003 is OK: HTTP OK: HTTP/1.1 200 OK 
- 323 bytes in 0.073 second response time [13:54:47] RECOVERY - Varnish HTTP upload-frontend - port 3124 on cp2022 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.080 second response time [13:54:47] RECOVERY - Varnish HTTP maps-frontend - port 3125 on cp3005 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.171 second response time [13:54:47] RECOVERY - Varnish HTTP maps-frontend - port 3126 on cp4019 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.163 second response time [13:54:47] RECOVERY - Varnish HTTP text-frontend - port 3126 on cp4010 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.159 second response time [13:54:58] RECOVERY - Varnish HTTP misc-frontend - port 3120 on cp1061 is OK: HTTP OK: HTTP/1.1 200 OK - 319 bytes in 0.032 second response time [13:54:58] RECOVERY - Varnish HTTP text-frontend - port 3123 on cp1065 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.001 second response time [13:54:58] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp1066 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.013 second response time [13:54:58] RECOVERY - Varnish HTTP upload-frontend - port 3122 on cp1072 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.005 second response time [13:54:58] (03CR) 10Ori.livneh: "IMO this should be reverted and discussed." 
[puppet] - 10https://gerrit.wikimedia.org/r/317501 (owner: 10Reedy) [13:55:06] RECOVERY - Varnish HTTP misc-frontend - port 3122 on cp3008 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.168 second response time [13:55:06] RECOVERY - Varnish HTTP misc-frontend - port 3125 on cp4004 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.161 second response time [13:55:06] RECOVERY - Varnish HTTP text-frontend - port 3120 on cp4010 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.160 second response time [13:55:08] RECOVERY - Varnish HTTP maps-frontend - port 3120 on cp4019 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.158 second response time [13:55:08] RECOVERY - Varnish HTTP text-frontend - port 3125 on cp4016 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.157 second response time [13:55:15] RECOVERY - Varnish HTTP misc-frontend - port 3121 on cp1061 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.008 second response time [13:55:16] RECOVERY - Varnish HTTP maps-frontend - port 3123 on cp2003 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.073 second response time [13:55:16] RECOVERY - Varnish HTTP upload-frontend - port 3122 on cp1063 is OK: HTTP OK: HTTP/1.1 200 OK - 319 bytes in 0.011 second response time [13:55:16] RECOVERY - Varnish HTTP maps-frontend - port 3121 on cp2009 is OK: HTTP OK: HTTP/1.1 200 OK - 323 bytes in 0.074 second response time [13:55:16] RECOVERY - Varnish HTTP text-frontend - port 3123 on cp1066 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.010 second response time [13:55:16] RECOVERY - Varnish HTTP text-frontend - port 3124 on cp1065 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.011 second response time [13:55:16] RECOVERY - Varnish HTTP upload-frontend - port 3123 on cp1072 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.008 second response time [13:55:17] RECOVERY - Varnish HTTP upload-frontend - port 3125 on cp2022 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.073 second response time [13:55:24] arseny92: I've written to 
Dereckson by private chat, it will be better I think... [13:55:25] RECOVERY - Varnish HTTP misc-frontend - port 3123 on cp3008 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.171 second response time [13:55:26] RECOVERY - Varnish HTTP misc-frontend - port 3126 on cp4004 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.161 second response time [13:55:29] RECOVERY - Varnish HTTP maps-frontend - port 3121 on cp4019 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.161 second response time [13:55:29] RECOVERY - Varnish HTTP text-frontend - port 3121 on cp4010 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.159 second response time [13:55:33] can we kill the bot? [13:55:35] RECOVERY - Varnish HTTP maps-frontend - port 3124 on cp2003 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.072 second response time [13:55:35] RECOVERY - Varnish HTTP maps-frontend - port 3122 on cp2009 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.077 second response time [13:55:35] RECOVERY - Varnish HTTP misc-frontend - port 3122 on cp2018 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.073 second response time [13:55:53] paravoid: I prefer fixing of this problem... [13:55:53] done [13:56:02] not really a problem I think [13:56:07] Really? [13:56:16] it's bblack's change I think [13:56:17] <_joe__> nope, just monitoring coming up before the service was initialized [13:56:29] bblack: still, if we are to keep that, we should probably have one check for all ports or something [13:56:39] Okay, it's fine then. 7 [13:56:51] !log dereckson@mira scap sync-l10n completed (1.28.0-wmf.22) (duration: 10m 46s) [13:56:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:57:02] bblack: otherwise a) it's too much of an overhead for the icinga server (which shouldn't be a problem, but...) 
and b) every time a varnish dies we'll get a storm of alerts [13:57:35] sure [13:57:44] one check that checks a list of ports and reports back if one or more of them are not in the desired state should suffice, right? [13:59:50] I think so [14:00:33] it's poorly-factored in general (e.g. the port list copied manually between tlsproxy and varnish parameterization) [14:00:45] RECOVERY - Varnish HTTP upload-frontend - port 3121 on cp1048 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.003 second response time [14:00:46] RECOVERY - Varnish HTTP upload-frontend - port 3124 on cp2024 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.073 second response time [14:00:46] RECOVERY - Varnish HTTP text-frontend - port 3124 on cp1068 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.005 second response time [14:00:46] RECOVERY - Varnish HTTP text-frontend - port 3123 on cp2007 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.074 second response time [14:00:46] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp3031 is OK: HTTP OK: HTTP/1.1 200 OK - 427 bytes in 0.168 second response time [14:00:52] we can fixup after, mostly I just want to get the functional thing out of the way in case it's causative [14:01:11] (03CR) 10Giuseppe Lavagetto: [C: 032] docker: fix monitoring for hosts with docker installed [puppet] - 10https://gerrit.wikimedia.org/r/317483 (owner: 10Giuseppe Lavagetto) [14:01:16] (03PS3) 10Giuseppe Lavagetto: docker: fix monitoring for hosts with docker installed [puppet] - 10https://gerrit.wikimedia.org/r/317483 [14:01:19] (03CR) 10Giuseppe Lavagetto: [V: 032] docker: fix monitoring for hosts with docker installed [puppet] - 10https://gerrit.wikimedia.org/r/317483 (owner: 10Giuseppe Lavagetto) [14:01:30] RECOVERY - Varnish HTTP text-frontend - port 3124 on cp3040 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.168 second response time [14:01:41] <_joe_> ouch, wrong patch [14:01:45] (03Abandoned) 10Giuseppe Lavagetto: docker: fix monitoring for hosts with docker
installed [puppet] - 10https://gerrit.wikimedia.org/r/317483 (owner: 10Giuseppe Lavagetto) [14:01:47] (03CR) 10Alexandros Kosiaris: "To my understanding, this was due to https://phabricator.wikimedia.org/T148571. So reverting the change to get back to the proper state is" [puppet] - 10https://gerrit.wikimedia.org/r/317501 (owner: 10Reedy) [14:01:49] RECOVERY - Varnish HTTP upload-frontend - port 3125 on cp2024 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.085 second response time [14:01:50] RECOVERY - Varnish HTTP upload-frontend - port 3120 on cp3039 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.171 second response time [14:01:50] RECOVERY - Varnish HTTP text-frontend - port 3125 on cp3040 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.170 second response time [14:01:50] RECOVERY - Varnish HTTP text-frontend - port 3121 on cp4017 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.158 second response time [14:02:05] RECOVERY - Varnish HTTP upload-frontend - port 3121 on cp3039 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.168 second response time [14:02:06] RECOVERY - Varnish HTTP text-frontend - port 3126 on cp3040 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.171 second response time [14:02:06] RECOVERY - Varnish HTTP text-frontend - port 3121 on cp1053 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.005 second response time [14:02:19] RECOVERY - Varnish HTTP upload-frontend - port 3123 on cp3039 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.168 second response time [14:02:19] RECOVERY - Varnish HTTP text-frontend - port 3123 on cp3040 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.168 second response time [14:02:22] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp4017 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.157 second response time [14:02:22] RECOVERY - Varnish HTTP upload-frontend - port 3122 on cp1048 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.008 second response time [14:02:23] RECOVERY - Varnish HTTP upload-frontend - 
port 3124 on cp3039 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.168 second response time [14:02:23] RECOVERY - Varnish HTTP upload-frontend - port 3121 on cp3046 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.172 second response time [14:02:36] RECOVERY - Varnish HTTP upload-frontend - port 3123 on cp1048 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.001 second response time [14:02:37] RECOVERY - Varnish HTTP upload-frontend - port 3125 on cp3039 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.182 second response time [14:02:37] RECOVERY - Varnish HTTP upload-frontend - port 3122 on cp3046 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.174 second response time [14:02:39] RECOVERY - Varnish HTTP upload-frontend - port 3126 on cp3039 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.171 second response time [14:02:39] RECOVERY - Varnish HTTP upload-frontend - port 3123 on cp3046 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.168 second response time [14:02:40] RECOVERY - Varnish HTTP text-frontend - port 3124 on cp2023 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.074 second response time [14:02:47] RECOVERY - Varnish HTTP upload-frontend - port 3122 on cp3039 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.169 second response time [14:02:47] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp3040 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.168 second response time [14:02:56] RECOVERY - Varnish HTTP text-frontend - port 3123 on cp3031 is OK: HTTP OK: HTTP/1.1 200 OK - 427 bytes in 0.168 second response time [14:02:56] RECOVERY - Varnish HTTP upload-frontend - port 3124 on cp3046 is OK: HTTP OK: HTTP/1.1 200 OK - 325 bytes in 0.168 second response time [14:03:07] RECOVERY - Varnish HTTP text-frontend - port 3125 on cp1053 is OK: HTTP OK: HTTP/1.1 200 OK - 429 bytes in 0.003 second response time [14:03:07] !log l10nupdate@mira ResourceLoader cache refresh completed at Mon Oct 24 14:03:07 UTC 2016 (duration 6m 17s) [14:03:07] RECOVERY - 
Varnish HTTP text-frontend - port 3120 on cp2023 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.076 second response time [14:03:09] RECOVERY - Varnish HTTP text-frontend - port 3124 on cp3031 is OK: HTTP OK: HTTP/1.1 200 OK - 427 bytes in 0.177 second response time [14:03:09] RECOVERY - Varnish HTTP text-frontend - port 3120 on cp4017 is OK: HTTP OK: HTTP/1.1 200 OK - 428 bytes in 0.160 second response time [14:03:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:04:56] no pages during any of that, should I have gotten some? [14:05:18] no [14:05:22] ok [14:05:33] there was no effect on public services for paging [14:05:38] (03PS11) 10Alexandros Kosiaris: icinga: Remove event_profiling_enabled [puppet] - 10https://gerrit.wikimedia.org/r/315085 [14:05:40] swet [14:05:40] (03PS2) 10Alexandros Kosiaris: icinga::ircbot: Add an ensure parameter [puppet] - 10https://gerrit.wikimedia.org/r/317487 [14:05:41] e [14:05:42] (03PS2) 10Alexandros Kosiaris: tendril: Set short_open_tag to true in virtual host [puppet] - 10https://gerrit.wikimedia.org/r/317486 [14:05:44] (03PS2) 10Alexandros Kosiaris: tendril: Apache 2.4 syntax in If clause guards [puppet] - 10https://gerrit.wikimedia.org/r/317485 [14:05:46] (03PS2) 10Alexandros Kosiaris: tendril: Install php5-mysql package [puppet] - 10https://gerrit.wikimedia.org/r/317484 [14:05:48] (03PS7) 10Alexandros Kosiaris: Replace neon with einsteinium where applicable [puppet] - 10https://gerrit.wikimedia.org/r/315257 [14:05:50] (03PS8) 10Alexandros Kosiaris: Remove absented /etc/icinga/puppet_hostextinfo.cfg entry [puppet] - 10https://gerrit.wikimedia.org/r/315244 [14:05:51] just internal service defintions, a race between defining new ones, monitoring them, and turning them on for real [14:05:52] (03PS1) 10Alexandros Kosiaris: Enable role::tcpircbot on tegmen [puppet] - 10https://gerrit.wikimedia.org/r/317503 [14:06:11] (03CR) 10Jcrespo: [C: 031] tendril: Set short_open_tag to true in virtual 
host [puppet] - 10https://gerrit.wikimedia.org/r/317486 (owner: 10Alexandros Kosiaris) [14:06:56] (03CR) 10Jcrespo: tendril: Apache 2.4 syntax in If clause guards (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/317485 (owner: 10Alexandros Kosiaris) [14:07:25] (03CR) 10Jcrespo: [C: 031] tendril: Install php5-mysql package [puppet] - 10https://gerrit.wikimedia.org/r/317484 (owner: 10Alexandros Kosiaris) [14:07:34] We've in the logs some "Failed to get object num from hint tables ..." which is probably the PDF renderer software [14:07:53] 06Operations, 06Discovery, 06Discovery-Analysis (Current work), 13Patch-For-Review, 07Tracking: Can't install R package Boom (& bsts) on stat1002 (but can on stat1003) - https://phabricator.wikimedia.org/T147682#2737999 (10Ottomata) > stat1002 and 1003 are supposed to have nearly identical configuration,... [14:08:42] Urbanecm: let me check if l10nupdate fixed the login message issue we had, and I check the private [14:08:45] scrolled up to get the beginning of that, now I see [14:09:02] 06Operations, 10Prod-Kubernetes, 10vm-requests, 05Kubernetes-production-experiment, 15User-Joe: Ganeti VM for docker registry - https://phabricator.wikimedia.org/T148961#2737861 (10akosiaris) LGTM. any preference in naming this? something kubernetes specific ? [14:09:04] Yes looks good. [14:11:26] 06Operations, 10ops-eqiad, 06DC-Ops: Broken disk on kafka1018 - https://phabricator.wikimedia.org/T147707#2738011 (10Ottomata) 05Open>03Resolved Ah yes, we are good! 
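The aggregated monitoring idea discussed above (one check over a list of ports, instead of one Icinga service per Varnish frontend port) could look roughly like the sketch below. This is an illustrative stand-in, not the actual WMF check configuration; the `check_ports` function and its `probe` hook are hypothetical, and the default probe only tests TCP connectivity rather than doing a full HTTP request.

```python
#!/usr/bin/env python
"""Sketch: a single Nagios/Icinga-style check over a list of ports."""
import socket


def check_ports(host, ports, timeout=2.0, probe=None):
    """Return (exit_code, message) in Nagios plugin convention:
    0 = OK, 2 = CRITICAL, listing every port not in the desired state.
    `probe` is an optional callable (host, port) -> bool for custom checks."""
    failed = []
    for port in ports:
        try:
            if probe is not None:
                ok = probe(host, port)
            else:
                # Default probe: can we open a TCP connection at all?
                with socket.create_connection((host, port), timeout=timeout):
                    ok = True
        except OSError:
            ok = False
        if not ok:
            failed.append(port)
    if failed:
        return 2, "CRITICAL: ports not OK: %s" % ",".join(map(str, failed))
    return 0, "OK: all %d ports healthy" % len(ports)
```

One check per host keeps the Icinga service count constant as the port list grows, and a dead varnish produces a single CRITICAL instead of a storm of per-port alerts.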
[14:11:32] (03CR) 10Alexandros Kosiaris: [C: 032] Enable role::tcpircbot on tegmen [puppet] - 10https://gerrit.wikimedia.org/r/317503 (owner: 10Alexandros Kosiaris) [14:11:35] !log Deploy schema change s5 dewiki.revision - only codfw T148967 [14:11:36] T148967: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967 [14:11:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:11:42] (03PS2) 10Alexandros Kosiaris: Enable role::tcpircbot on tegmen [puppet] - 10https://gerrit.wikimedia.org/r/317503 [14:11:44] (03CR) 10Alexandros Kosiaris: [V: 032] Enable role::tcpircbot on tegmen [puppet] - 10https://gerrit.wikimedia.org/r/317503 (owner: 10Alexandros Kosiaris) [14:12:53] arseny92: Urbanecm: I prefer #wikimedia-operations than private by the way for SWAT: https://wikitech.wikimedia.org/wiki/SWAT_deploys states "All communication MUST happen in #wikimedia-operations connect on Freenode (not in separate team or area-specific channels)" [14:13:37] That allows a better transparency. [14:13:55] 06Operations, 06Analytics-Kanban, 10EventBus: setup/install/deploy kafka1003 (WMF4723) - https://phabricator.wikimedia.org/T148849#2738019 (10Ottomata) Nice! Thanks! 
[14:14:01] Urbanecm: so yes, l10nupdate has finished, so I can deploy them now [14:14:26] (03PS2) 10Dereckson: Add all edited pages by myself to watchlist by default (for new users) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317387 (https://phabricator.wikimedia.org/T148328) (owner: 10Urbanecm) [14:14:48] 06Operations, 10Prod-Kubernetes, 05Kubernetes-production-experiment: Build Kubernetes for production use - https://phabricator.wikimedia.org/T148968#2738021 (10Joe) [14:15:27] fun you got an agreement for watchlist by default, nemo tried for all Wikimedia projects without success [14:16:38] We just had to bear patience for a couple days more and the whiners would have stopped [14:17:27] The main objection as far as I remember is people want to avoid spamming new contributors with mails each time an article in their watchlist has been edited [14:17:49] 06Operations, 06Discovery, 06Maps, 03Interactive-Sprint, 13Patch-For-Review: Unmet dependencies around postgis apt packages on maps* servers - https://phabricator.wikimedia.org/T147780#2738034 (10Gehel) 05Open>03Resolved `postgresql-9.4-postgis` is now updated to 2.3 on all maps servers, puppet run i... [14:18:04] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317387 (https://phabricator.wikimedia.org/T148328) (owner: 10Urbanecm) [14:18:07] Not really, the main objection was from botowners [14:18:19] As if bot owners were not able to adjust their preferences [14:18:25] well the main *reasonable* objection [14:18:32] (03Merged) 10jenkins-bot: Add all edited pages by myself to watchlist by default (for new users) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317387 (https://phabricator.wikimedia.org/T148328) (owner: 10Urbanecm) [14:18:34] Dereckson hi, where can I add new wmf extensions to get branching for wmf [14:18:46] Reason is SemanticForms is being renamed to PageForms [14:19:16] But it isn't ready to be changed from SemanticForms to PageForms yet.
[14:19:32] Dereckson: I know but because of flood I rewrote my message to your private chat. [14:20:06] ok [14:20:08] Urbanecm: 317387 live on mw1099 [14:21:33] you can probably test with a new test account logging on cs. on mw1099 (without visiting cs. without sending the request to mw1099 before) [14:23:35] checked with mwrepl, $wgDefaultUserOptions['watchdefault'] is well at 1 on cs.wiki, 0 on en.wiki [14:23:55] So I should create a new account from mw1099 at cs.wiki, log in to it and test if all works as expected? [14:24:18] i'm about to sync kartotherian - refresher only, to resolve some stale servers [14:24:36] All on mw1099? [14:24:45] I don't understand your advice about testing... [14:24:47] Dereckson: ^ [14:27:36] yurik: I'm currently deploying config stuff [14:27:43] yurik: I'll ping you when I'm done [14:28:09] 06Operations, 06Discovery, 06Discovery-Analysis (Current work), 13Patch-For-Review, 07Tracking: Can't install R package Boom (& bsts) on stat1002 (but can on stat1003) - https://phabricator.wikimedia.org/T147682#2738052 (10Ottomata) Oh, just re-read earlier parts of this ticket. Hm. There are .deb pack... [14:28:19] Dereckson, sorry, already synced. It was a scap3 service depl, so shouldn't have caused anything [14:28:22] Urbanecm: (1) set the extension to go to mw1099 (2) create a new test account or use one already existing test account not yet registered on cs.wiki locally (3) check preferences [14:28:27] yurik: ok [14:29:04] Okay, going to do it. [14:29:14] !log re-deployed current kartotherian to all servers (maps1002 & maps-test* were stale) [14:29:16] icinga spam is gone, if whoever wants to let it connect again [14:29:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:33:25] Dereckson: Seems to work.
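The per-wiki override verified above with mwrepl ($wgDefaultUserOptions['watchdefault'] at 1 on cs.wiki, 0 elsewhere) follows the usual wmf-config pattern: a 'default' value plus per-wiki exceptions in InitialiseSettings.php. A minimal model of that resolution, written in Python as an illustration rather than MediaWiki's actual SiteConfiguration PHP (the variable name uses the corrected wmgWatchlistDefault capitalization; the resolver function is hypothetical):

```python
# Illustrative model of per-wiki configuration overrides, as used for
# the wgDefaultUserOptions['watchdefault'] toggle: the values mirror
# the deployed state (on for cswiki, off elsewhere), but the resolver
# is a simplified stand-in for MediaWiki's SiteConfiguration logic.

SETTINGS = {
    'wmgWatchlistDefault': {
        'default': False,   # all wikis, unless overridden below
        'cswiki': True,     # cs.wikipedia: watch edited pages by default
    },
}


def get_setting(name, wiki):
    """Resolve a setting for one wiki: an exact wiki key wins,
    otherwise fall back to the 'default' entry."""
    per_wiki = SETTINGS[name]
    return per_wiki.get(wiki, per_wiki['default'])
```

Because only the 'default' key and explicit exceptions are stored, toggling a wiki on or off is a one-line config change, which is what made this SWAT deploy so small.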
[14:34:27] ok [14:34:45] 06Operations, 10Prod-Kubernetes, 05Kubernetes-production-experiment, 15User-Joe: Puppet implementation of the production docker-registry installation - https://phabricator.wikimedia.org/T148966#2737951 (10Joe) a:03Joe [14:35:16] BTW is it possible to delete the testing account which I created? It is not needed by me because I have one testing account already. [14:35:58] 06Operations, 06ELiSo, 10RESTBase, 10VisualEditor, 07Esperanto-Sites: RESTBase thinks beta.wikiversity pages don't exist - https://phabricator.wikimedia.org/T148861#2738099 (10AlexMonk-WMF) [14:36:32] !log disabling puppet on all caches ahead of port# work, to test - T107749 / https://gerrit.wikimedia.org/r/#/c/317405 [14:36:33] T107749: HTTP/1.1 keepalive for local nginx->varnish conns - https://phabricator.wikimedia.org/T107749 [14:36:36] (03PS1) 10Alex Monk: Add missing wiki to RB [puppet] - 10https://gerrit.wikimedia.org/r/317511 (https://phabricator.wikimedia.org/T148861) [14:36:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:36:50] !log dereckson@mira Synchronized wmf-config/InitialiseSettings.php: Toggle wgDefaultUserOptions['watchdefault'] on for cs.wikipedia, off elsewhere (T148328, 1/2) (duration: 00m 54s) [14:36:51] T148328: Add all edited pages by myself to watchlist by default (for new users) - https://phabricator.wikimedia.org/T148328 [14:36:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:37:11] Urbanecm: note the account could be useful again in the future if you need to visit as a new user a project already visited with your main test account [14:37:27] Yeah, that's true. [14:37:38] 06Operations, 06ELiSo, 10RESTBase, 10VisualEditor, and 2 others: RESTBase thinks beta.wikiversity pages don't exist - https://phabricator.wikimedia.org/T148861#2738115 (10AlexMonk-WMF) >>! In T148861#2737349, @Psychoslave wrote: > How long may the deployment train delay be? 
I don't think this has anything... [14:37:45] (03PS4) 10BBlack: tlsproxy: use 8x FE ports to balance [puppet] - 10https://gerrit.wikimedia.org/r/317405 (https://phabricator.wikimedia.org/T107749) [14:37:47] gwicke, mobrovac: around? [14:38:00] (03CR) 10BBlack: [C: 032 V: 032] tlsproxy: use 8x FE ports to balance [puppet] - 10https://gerrit.wikimedia.org/r/317405 (https://phabricator.wikimedia.org/T107749) (owner: 10BBlack) [14:38:06] (03PS4) 10BBlack: tlsproxy: raise worker connection limits, too [puppet] - 10https://gerrit.wikimedia.org/r/317414 (https://phabricator.wikimedia.org/T107749) [14:38:12] (03CR) 10BBlack: [C: 032 V: 032] tlsproxy: raise worker connection limits, too [puppet] - 10https://gerrit.wikimedia.org/r/317414 (https://phabricator.wikimedia.org/T107749) (owner: 10BBlack) [14:38:16] By the way, it's a little late to notice that, but wmgWatchlistdefault → wmgWatchlistDefault would be perhaps more natural in camel case. [14:38:19] !log dereckson@mira Synchronized wmf-config/CommonSettings.php: Toggle wgDefaultUserOptions['watchdefault'] on for cs.wikipedia, off elsewhere (T148328, 2/2) (duration: 00m 50s) [14:38:21] (03PS2) 10BBlack: reduce cache local ports slightly [puppet] - 10https://gerrit.wikimedia.org/r/317499 (https://phabricator.wikimedia.org/T107749) [14:38:25] (03CR) 10BBlack: [C: 032 V: 032] reduce cache local ports slightly [puppet] - 10https://gerrit.wikimedia.org/r/317499 (https://phabricator.wikimedia.org/T107749) (owner: 10BBlack) [14:38:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:39:13] Okay, I'll try to remember it. 
[14:40:14] Dereckson it's better to fix the capitalization then as things tend to get permanent if not dealt with [14:41:01] !log gehel@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1002.eqiad.wmnet [14:41:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:41:14] yurik: ^ maps1002 is re-pooled [14:42:08] 06Operations, 10MediaWiki-General-or-Unknown, 06Release-Engineering-Team, 10Traffic, and 5 others: Make sure we're not relying on HTTP_PROXY headers - https://phabricator.wikimedia.org/T140658#2738137 (10demon) p:05High>03Low >>! In T140658#2684595, @BBlack wrote: > Is there more to do here on the MW-C... [14:42:20] (03PS3) 10Alexandros Kosiaris: tendril: Install php5-mysql package [puppet] - 10https://gerrit.wikimedia.org/r/317484 [14:42:24] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] tendril: Install php5-mysql package [puppet] - 10https://gerrit.wikimedia.org/r/317484 (owner: 10Alexandros Kosiaris) [14:42:42] arseny92: Urbanecm: yes, "little late" meant "to fix it before deployment", but we need a follow-up change to fix capitalization soon, indeed [14:42:59] Should I prepare it? [14:43:03] Urbanecm: yes [14:43:11] a small miracle: no CS/IS desync issue!
[14:43:12] (03PS3) 10Alexandros Kosiaris: tendril: Set short_open_tag to true in virtual host [puppet] - 10https://gerrit.wikimedia.org/r/317486 [14:43:17] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] tendril: Set short_open_tag to true in virtual host [puppet] - 10https://gerrit.wikimedia.org/r/317486 (owner: 10Alexandros Kosiaris) [14:44:08] (03PS1) 10BBlack: bugfix for e23f699fe2 [puppet] - 10https://gerrit.wikimedia.org/r/317514 (https://phabricator.wikimedia.org/T107749) [14:44:29] (03PS2) 10BBlack: bugfix for e23f699fe2 [puppet] - 10https://gerrit.wikimedia.org/r/317514 (https://phabricator.wikimedia.org/T107749) [14:44:32] (03CR) 10Dereckson: [C: 04-1] Show changes from last 14 days in watchlist in cswiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/316295 (https://phabricator.wikimedia.org/T148327) (owner: 10Urbanecm) [14:44:34] (03CR) 10BBlack: [C: 032 V: 032] bugfix for e23f699fe2 [puppet] - 10https://gerrit.wikimedia.org/r/317514 (https://phabricator.wikimedia.org/T107749) (owner: 10BBlack) [14:47:00] (03PS12) 10Alexandros Kosiaris: icinga: Remove event_profiling_enabled [puppet] - 10https://gerrit.wikimedia.org/r/315085 [14:47:02] (03PS3) 10Alexandros Kosiaris: icinga::ircbot: Add an ensure parameter [puppet] - 10https://gerrit.wikimedia.org/r/317487 [14:47:04] (03PS3) 10Alexandros Kosiaris: tendril: Apache 2.4 syntax in If clause guards [puppet] - 10https://gerrit.wikimedia.org/r/317485 [14:47:06] (03PS8) 10Alexandros Kosiaris: Replace neon with einsteinium where applicable [puppet] - 10https://gerrit.wikimedia.org/r/315257 [14:47:08] (03PS9) 10Alexandros Kosiaris: Remove absented /etc/icinga/puppet_hostextinfo.cfg entry [puppet] - 10https://gerrit.wikimedia.org/r/315244 [14:47:10] (03PS1) 10Alexandros Kosiaris: tcpircbot: Use base::service_unit [puppet] - 10https://gerrit.wikimedia.org/r/317515 [14:48:55] (03PS4) 10Alexandros Kosiaris: tendril: Apache 2.4 syntax in If clause guards [puppet] - 
10https://gerrit.wikimedia.org/r/317485 [14:49:12] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] "Fixed style, thanks! Merging." [puppet] - 10https://gerrit.wikimedia.org/r/317485 (owner: 10Alexandros Kosiaris) [14:49:21] (03PS4) 10Alexandros Kosiaris: icinga::ircbot: Add an ensure parameter [puppet] - 10https://gerrit.wikimedia.org/r/317487 [14:49:25] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] icinga::ircbot: Add an ensure parameter [puppet] - 10https://gerrit.wikimedia.org/r/317487 (owner: 10Alexandros Kosiaris) [14:53:11] Dereckson: I must go home or they will lock me in the building. I can be available in about 30-40 minutes if you want me. [14:53:57] Urbanecm: okay, you can schedule follow-up patch for morning SWAT window at worst [14:54:13] see you later [14:54:43] !log starting ferm server on eeden, radon [14:54:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:57:01] !log restarting ferm on es2015 [14:57:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:57:26] (03CR) 10Hashar: [C: 04-1] "I no close to nothing with Phabricator, would need more context as to why javascript has to be added there :]" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/317500 (owner: 10Paladox) [14:59:03] 06Operations, 06Release-Engineering-Team, 07HHVM, 13Patch-For-Review, 06Services (doing): Migrate deployment servers (tin/mira) to jessie - https://phabricator.wikimedia.org/T144578#2738249 (10hashar) [14:59:05] 06Operations, 06Release-Engineering-Team, 07Beta-Cluster-reproducible, 13Patch-For-Review: mwscript on jessie mediawiki fails - https://phabricator.wikimedia.org/T146286#2738245 (10hashar) 05Open>03Resolved a:03hashar https://gerrit.wikimedia.org/r/#/c/315260/ adds mediawiki::packages::php5 on the ro... 
[14:59:38] 06Operations, 10Traffic: Strongswan Icinga check: do not report issues about depooled hosts - https://phabricator.wikimedia.org/T148976#2738250 (10ema) [14:59:52] 06Operations, 10Traffic: Strongswan Icinga check: do not report issues about depooled hosts - https://phabricator.wikimedia.org/T148976#2738265 (10ema) p:05Triage>03Normal [15:00:21] (03PS1) 10Jcrespo: proxysql: Fix typos on role [puppet] - 10https://gerrit.wikimedia.org/r/317517 (https://phabricator.wikimedia.org/T148500) [15:00:27] (03PS4) 10Paladox: Phabricator: Add javascript to files.viewable-mime-types [puppet] - 10https://gerrit.wikimedia.org/r/317500 [15:00:34] (03CR) 10Paladox: Phabricator: Add javascript to files.viewable-mime-types (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/317500 (owner: 10Paladox) [15:00:43] (03PS5) 10Paladox: Phabricator: Add javascript to files.viewable-mime-types [puppet] - 10https://gerrit.wikimedia.org/r/317500 [15:02:55] Dereckson , prepared the change but git-review -R says that the remote ref is closed, so no amends once merged? [15:03:35] arseny92: right [15:03:49] duh [15:03:53] arseny92: Gerrit purpose is to review changes before they are integrated to the master branch [15:04:16] !log enabling/running puppet on caches for 8x varnish ports changes - T107749 [15:04:17] T107749: HTTP/1.1 keepalive for local nginx->varnish conns - https://phabricator.wikimedia.org/T107749 [15:04:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:04:38] arseny92: once integrated to master, you prepare a new change, generally called "follow-up" [15:06:17] 06Operations, 06Analytics-Kanban, 06Discovery, 06Discovery-Analysis (Current work), and 2 others: Can't install R package Boom (& bsts) on stat1002 (but can on stat1003) - https://phabricator.wikimedia.org/T147682#2738322 (10Nuria) [15:07:04] Krenair: pong [15:07:31] gwicke, are we missing any more of these? 
https://gerrit.wikimedia.org/r/317511 [15:09:21] Krenair: hard to tell off-hand (don't follow project creations closely); we have a script to re-generate the file from the siteinfo api, which might be the easiest way to check: https://github.com/wikimedia/restbase/blob/master/maintenance/generate-wikimedia-domain-config.js [15:09:42] this is not a new domain [15:11:03] running the script places it in the "wikiversity" section [15:11:19] what other domains does running the script add? [15:11:44] (03CR) 10Hashar: [C: 031] "Looks potentially fine?" [puppet] - 10https://gerrit.wikimedia.org/r/317500 (owner: 10Paladox) [15:11:50] Krenair: https://gist.github.com/gwicke/37461b0204797f2d09f83607adc4b920 [15:12:15] diff? [15:13:03] (03CR) 1020after4: [C: 031] Also display column name when hiding/showing workboard columns [puppet] - 10https://gerrit.wikimedia.org/r/317323 (owner: 10Aklapper) [15:13:58] 06Operations, 06ELiSo, 10RESTBase, 10VisualEditor, and 2 others: RESTBase thinks beta.wikiversity pages don't exist - https://phabricator.wikimedia.org/T148861#2735081 (10Mvolz) >>! In T148861#2738115, @AlexMonk-WMF wrote: >>>! In T148861#2737349, @Psychoslave wrote: >> How long may the deployment train de...
[15:15:41] (03CR) 10Jcrespo: "This breaks labsdb1005 and labsdb1007: "E: Unable to locate package postgresql-9.1-postgis-scripts"" [puppet] - 10https://gerrit.wikimedia.org/r/317494 (https://phabricator.wikimedia.org/T144763) (owner: 10Gehel) [15:16:27] (03PS1) 10Arseny1992: Fix capitalization for change 317387 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317523 (https://phabricator.wikimedia.org/T148328) [15:16:52] Dereckson ^ [15:16:58] krenair: only one potentially missing I see is labtestwikitech.wikimedia.org [15:17:04] arseny92: thanks [15:17:12] gwicke, that's like wikitech [15:17:25] doesn't sit behind varnish [15:17:49] it's listed separately in siteinfo [15:18:42] so if that is mapping to actual wikitech, then nothing would be missing [15:18:55] labtestwikitech doesn't map to actual wikitech [15:18:56] there are a couple in the config that are (no longer) in siteinfo [15:19:01] Dereckson , sync this then ;) [15:19:02] it is a separate wiki in its own right [15:19:07] ACKNOWLEDGEMENT - puppet last run on labsdb1006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[postgresql-9.1-postgis-scripts] Gehel puppet error related to https://gerrit.wikimedia.org/r/#/c/317494/ - patch coming up... [15:19:07] ACKNOWLEDGEMENT - puppet last run on labsdb1007 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 19 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[postgresql-9.1-postgis-scripts] Gehel puppet error related to https://gerrit.wikimedia.org/r/#/c/317494/ - patch coming up... [15:19:07] like older wikimania wikis [15:19:22] it's just used for testing stuff [15:19:30] when stuff closes we don't remove it from the list [15:19:38] hm [15:19:41] yeah, I figured [15:19:53] why is closed stuff not listed there? [15:20:03] (in general) [15:20:20] the script might filter those out [15:20:22] arseny92: schedule it for the 21:00 UTC+3 SWAT perhaps?
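The generate-wikimedia-domain-config.js script mentioned above rebuilds the RESTBase domain list from the siteinfo API and, as confirmed in this discussion, filters out closed wikis, which is why old wikimania wikis are absent from the generated config. A rough Python model of that filtering; the sample entries and the exact field names (`url`, `closed`, `private`) are illustrative assumptions, not a real API response or the script's actual code:

```python
# Sketch of the domain-list generation discussed here: walk
# sitematrix-style site entries and emit hostnames, skipping wikis
# flagged as closed or private, so only active public wikis end up
# in the generated config.

def active_domains(sitematrix_sites):
    """Return sorted hostnames of sites not marked closed or private."""
    domains = []
    for site in sitematrix_sites:
        if site.get('closed') or site.get('private'):
            continue
        # 'url' is e.g. 'https://beta.wikiversity.org'; keep the host part.
        domains.append(site['url'].split('//', 1)[1])
    return sorted(domains)


# Illustrative sample data (not a real API response):
sample = [
    {'url': 'https://beta.wikiversity.org'},
    {'url': 'https://wikimania2013.wikimedia.org', 'closed': True},
    {'url': 'https://cs.wikipedia.org'},
]
print(active_domains(sample))  # → ['beta.wikiversity.org', 'cs.wikipedia.org']
```

Including closed wikis would be the "easy change" mentioned below: drop the `closed` condition from the filter.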
[15:20:29] it does [15:20:30] but why [15:20:55] don't some privileged users retain edit access on closed wikis? [15:21:18] we wrote this script when we expanded RB to cover all active wikis [15:22:06] I think we didn't see a reason to include closed ones, but if there is, then it would be an easy change [15:22:35] (03PS2) 10Jcrespo: proxysql: Fix typos on role [puppet] - 10https://gerrit.wikimedia.org/r/317517 (https://phabricator.wikimedia.org/T148500) [15:23:47] (03PS3) 10Jcrespo: proxysql: Fix typos on role [puppet] - 10https://gerrit.wikimedia.org/r/317517 (https://phabricator.wikimedia.org/T148500) [15:24:00] I can edit https://wikimania2014.wikimedia.org/wiki/Wikimania [15:24:36] that one is listed [15:24:39] https://wikimania2013.wikimedia.org/wiki/Main_page?veaction=edit is broken, I can edit it in source mode [15:26:31] Reedy, around? [15:26:44] it probably doesn't matter much [15:26:52] let's focus on beta.wikiversity for now then [15:27:03] (03PS1) 10Gehel: postgis - postgis scripts package does not exist for precise [puppet] - 10https://gerrit.wikimedia.org/r/317527 (https://phabricator.wikimedia.org/T144763) [15:27:05] Krenair: yeah, you are the first to notice & speak up [15:27:13] I think most of those projects are really rather dead [15:27:43] most people with the level of access that allows me to edit those closed wikis probably don't use VE :) [15:28:23] yeah, the overlap in venn diagrams leaves about one person ;) [15:28:34] heh [15:28:45] I think they remove all local admins too [15:28:50] after wiki closure [15:29:50] Dereckson , we're both around now, and this change is rather cosmetic, so why would it need scheduling? Computers and PHP don't distinguish that capitalization (humans do), so it wouldn't break anything [15:29:56] (03CR) 10Gehel: [C: 032] postgis - postgis scripts package does not exist for precise [puppet] - 10https://gerrit.wikimedia.org/r/317527 (https://phabricator.wikimedia.org/T144763) (owner: 10Gehel) [15:30:25]
06Operations, 10Ops-Access-Requests, 13Patch-For-Review: Requesting access to "Production shell" for pmiazga - https://phabricator.wikimedia.org/T148477#2738408 (10elukey) >>! In T148477#2738302, @pmiazga wrote: > @elukey - I just checked access and it works \o/. The only small issue is that I have to put ke... [15:30:26] Krenair , #wikimedia-stewards for appropriate closure of projects [15:30:42] arseny92, hi? [15:31:05] PROBLEM - puppet last run on mc1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:57] RECOVERY - puppet last run on labsdb1006 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [15:32:59] arseny92: configuration changes are typically deployed during SWAT windows, from Monday to Thursday at 16:00, 21:00 and 2:00 in your timezone [15:33:33] RECOVERY - puppet last run on labsdb1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:33:48] arseny92: and it's a complicated change to merge, the deployer needs to livehack it [15:34:09] livehack?
[15:34:13] (03CR) 10Jcrespo: [C: 032] proxysql: Fix typos on role [puppet] - 10https://gerrit.wikimedia.org/r/317517 (https://phabricator.wikimedia.org/T148500) (owner: 10Jcrespo) [15:34:18] (03PS4) 10Jcrespo: proxysql: Fix typos on role [puppet] - 10https://gerrit.wikimedia.org/r/317517 (https://phabricator.wikimedia.org/T148500) [15:34:54] (03PS13) 10Alexandros Kosiaris: icinga: Remove event_profiling_enabled [puppet] - 10https://gerrit.wikimedia.org/r/315085 [15:34:56] (03PS2) 10Alexandros Kosiaris: tcpircbot: Use base::service_unit [puppet] - 10https://gerrit.wikimedia.org/r/317515 [15:34:58] (03PS9) 10Alexandros Kosiaris: Replace neon with einsteinium where applicable [puppet] - 10https://gerrit.wikimedia.org/r/315257 [15:35:00] (03PS10) 10Alexandros Kosiaris: Remove absented /etc/icinga/puppet_hostextinfo.cfg entry [puppet] - 10https://gerrit.wikimedia.org/r/315244 [15:35:07] arseny92: InitialiseSettings.php and CommonSettings.php won't be deployed at the same time. If CS reaches the server first, the variable doesn't exist as IS hasn't been updated. If IS reaches first, CS doesn't get the old variable. That will flood the logs with variable-missing errors. [15:36:18] arseny92: there are two solutions: either do that in two patches, one to add the new setting, the other to remove the old (th.cipriani likes that) or to modify slightly one of the two files when syncing to avoid the error (ree.dy likes that) [15:36:32] arseny92, Dereckson: and occasionally, if the deployer wasn't expecting it or doesn't know what they're doing, a panic revert after seeing the log flood :) [15:36:54] so it's a "cosmetic" change but something needing care to deploy [15:37:58] Dereckson: I'm back. [15:39:05] uh, after checking out http://php.net/manual/en/language.variables.basics.php your above comment on landing times makes sense [15:39:27] "The variable name is case-sensitive." [15:39:33] uh :/ [15:40:07] arseny92: Does it belong to me?
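The two-patch approach Dereckson describes above can be sketched in Python. This is a simplified analogue, not the actual wmf-config PHP; the helper `read_setting` is invented for illustration. The point is that while the two files land at different times, both spellings of the key must exist so neither ordering produces a missing-variable error:

```python
# Toy model of renaming a config key across two files that sync separately.
# Assumption: readers are taught to accept either spelling during phase 1.

def read_setting(config, new_key, old_key):
    """Reader that tolerates either spelling during the transition."""
    if new_key in config:
        return config[new_key]
    return config[old_key]  # fall back to the old spelling

# Patch 1: add the new spelling while keeping the old one.
initialise_settings = {"wmgWatchlistdefault": 3, "wmgWatchlistDefault": 3}
# Whichever file reaches the servers first, lookups of either key succeed:
assert read_setting(initialise_settings, "wmgWatchlistDefault", "wmgWatchlistdefault") == 3

# Patch 2, deployed later and independently: drop the old spelling.
del initialise_settings["wmgWatchlistdefault"]
assert read_setting(initialise_settings, "wmgWatchlistDefault", "wmgWatchlistdefault") == 3
```

Removing the old key in the same sync as the rename would reintroduce the race: a reader seeing only one of the two updated files would hit a missing key.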
[15:41:28] (03CR) 10GWicke: [C: 031] Add missing wiki to RB [puppet] - 10https://gerrit.wikimedia.org/r/317511 (https://phabricator.wikimedia.org/T148861) (owner: 10Alex Monk) [15:41:30] (03PS1) 10BBlack: tlsproxy: http2: limit concurrency to 64 (was 128) [puppet] - 10https://gerrit.wikimedia.org/r/317528 (https://phabricator.wikimedia.org/T107749) [15:41:48] (03CR) 10BBlack: [C: 032 V: 032] tlsproxy: http2: limit concurrency to 64 (was 128) [puppet] - 10https://gerrit.wikimedia.org/r/317528 (https://phabricator.wikimedia.org/T107749) (owner: 10BBlack) [15:44:18] Urbanecm that belongs to [18:35] arseny92: InitialiseSettings.php and CommonSettings.php won't be deployed at the same time. If CS reaches the server first, the variable doesn't exist as IS hasn't been updated. If IS reaches first, CS doesn't get the old variable. That will flood the logs with variable-missing errors [15:45:47] (03PS10) 10Alexandros Kosiaris: Replace neon with einsteinium where applicable [puppet] - 10https://gerrit.wikimedia.org/r/315257 [15:46:18] Okay, sorry. [15:48:56] (03PS5) 10Jcrespo: proxysql: Fix typos on role [puppet] - 10https://gerrit.wikimedia.org/r/317517 (https://phabricator.wikimedia.org/T148500) [15:49:53] Krenair: i count 287 wikipedias in the RB config.yaml but i have 295 wikipedias in wikistats, it seems [15:50:05] RECOVERY - cassandra CQL 10.192.48.57:9042 on maps2004 is OK: TCP OK - 0.036 second response time on port 9042 [15:51:48] (03CR) 10Jcrespo: [V: 032] proxysql: Fix typos on role [puppet] - 10https://gerrit.wikimedia.org/r/317517 (https://phabricator.wikimedia.org/T148500) (owner: 10Jcrespo) [15:53:09] Krenair: aa, cho, ho, hz, ii, kj, kr, mh, mo, mus, ng (nostalgia, test,test2) [15:53:40] iirc aawiki is closed [15:54:04] don't recognise the others off the top of my head but chances are those are as well?
[15:54:13] test/test2 should be on there [15:54:55] RECOVERY - puppet last run on mc1017 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [15:55:03] (03PS14) 10Alexandros Kosiaris: icinga: Remove event_profiling_enabled [puppet] - 10https://gerrit.wikimedia.org/r/315085 [15:55:05] (03PS3) 10Alexandros Kosiaris: tcpircbot: Use base::service_unit [puppet] - 10https://gerrit.wikimedia.org/r/317515 [15:55:07] (03PS11) 10Alexandros Kosiaris: Replace neon with einsteinium where applicable [puppet] - 10https://gerrit.wikimedia.org/r/315257 [15:55:09] (03PS11) 10Alexandros Kosiaris: Remove absented /etc/icinga/puppet_hostextinfo.cfg entry [puppet] - 10https://gerrit.wikimedia.org/r/315244 [15:55:18] Krenair: yes, but depends on the definition of closed. if the main page says "has been closed" but it's still reachable and the API returns statistics, then it's considered working for stats [15:55:26] so yes [15:55:44] checked "mus" f.e. and that was the case [15:56:29] Dereckson , and what is the typical time between deploying CS and IS? [15:58:45] PROBLEM - High load average on labstore1001 is CRITICAL: CRITICAL: 90.00% of data above the critical threshold [24.0] [16:01:02] if the build sync is off by just some secs, that won't produce a big logflood (or is it?) [16:05:44] 06Operations, 10ops-eqiad, 10Traffic: cp3021 failed disk sdb - https://phabricator.wikimedia.org/T148983#2738539 (10faidon) [16:06:26] 06Operations, 10ops-eqiad, 10Traffic: cp3021 failed disk sdb - https://phabricator.wikimedia.org/T148983#2738539 (10BBlack) cp3021 is not in service. it's meant to be decom/spare, but waiting on the decom part... [16:06:53] 06Operations, 10ops-esams, 10Traffic: cp3021 failed disk sdb - https://phabricator.wikimedia.org/T148983#2738554 (10faidon) [16:08:26] PROBLEM - puppet last run on dbproxy1011 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 9 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/var/run/proxysql] [16:08:45] PROBLEM - salt-minion processes on dbproxy1011 is CRITICAL: NRPE: Command check_check_salt_minion not defined [16:10:04] arseny92: 1-2 seconds if synced together and not done manually one by one but that's enough for 2500-4000 errors served [16:11:09] Maybe we can point the error-reporting bot to the logging file and then look whether there are errors other than expected. [16:11:24] RECOVERY - High load average on labstore1001 is OK: OK: Less than 50.00% above the threshold [16.0] [16:14:04] 06Operations, 10ops-esams, 10Traffic: cp3021 failed disk sdb - https://phabricator.wikimedia.org/T148983#2738592 (10BBlack) Found decom ticket: T130883 [16:15:59] 06Operations, 10hardware-requests: codfw/eqiad: 2x systems for prometheus - https://phabricator.wikimedia.org/T148513#2725201 (10RobH) We have some spare systems in eqiad that may meet this (4*4tb 32GB systems), but none in codfw. Task T145112 has info for a spare pool refresh in codfw.
[16:16:09] 06Operations, 10hardware-requests: codfw/eqiad: 2x systems for prometheus - https://phabricator.wikimedia.org/T148513#2738600 (10RobH) a:03RobH [16:17:23] 06Operations: Firewall sets not being loaded post-reboot due to a @resolve race - https://phabricator.wikimedia.org/T148986#2738609 (10faidon) [16:19:13] PROBLEM - Host mw2098 is DOWN: PING CRITICAL - Packet loss = 100% [16:19:14] 06Operations, 10ops-codfw, 06DC-Ops: mw2098 unreachable by mgmt - https://phabricator.wikimedia.org/T148719#2738629 (10faidon) p:05Triage>03Normal a:03Papaul [16:19:58] ACKNOWLEDGEMENT - Host mw2098 is DOWN: PING CRITICAL - Packet loss = 100% Faidon Liambotis T148719 [16:24:04] (03PS15) 10Alexandros Kosiaris: icinga: Remove event_profiling_enabled [puppet] - 10https://gerrit.wikimedia.org/r/315085 [16:24:06] (03PS4) 10Alexandros Kosiaris: tcpircbot: Use base::service_unit [puppet] - 10https://gerrit.wikimedia.org/r/317515 [16:24:08] (03PS12) 10Alexandros Kosiaris: Replace neon with einsteinium where applicable [puppet] - 10https://gerrit.wikimedia.org/r/315257 [16:24:10] (03PS12) 10Alexandros Kosiaris: Remove absented /etc/icinga/puppet_hostextinfo.cfg entry [puppet] - 10https://gerrit.wikimedia.org/r/315244 [16:28:13] Hi, is it technically possible to run a SQL query which needs access to the text table? [16:28:24] Urbanecm: aye [16:28:49] labs doesn't replicate text table by the way? [16:28:51] Ehm, I posted it to differrent channel than I wanted to. [16:28:55] Nope. I can't see it. [16:29:05] See https://cs.wikipedia.org/wiki/Wikipedista:Martin_Urbanec/Random_notes/show_tables [16:29:17] This is result of show tables; query at toollabs. [16:29:45] Dereckson scheduled then as requested , since i see you don't veto that change outright [16:29:58] I thought this is planned behaviour but it seems it isn't... [16:30:44] arseny92: Because I don't see any. My two changes are still in Mid-day SWAT. 
ACKNOWLEDGEMENT - puppet last run on dbproxy1011 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 13 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/var/run/proxysql] Jcrespo host is being installed for the first time T148500 [16:30:48] ACKNOWLEDGEMENT - salt-minion processes on dbproxy1011 is CRITICAL: NRPE: Command check_check_salt_minion not defined Jcrespo host is being installed for the first time T148500 [16:30:58] Urbanecm: if it's for a one shot, you can write the query somewhere and someone can run it for you ; if it's for a tool, a more durable solution must be found [16:32:10] I don't know if this should be run by a tool, I'm only asking because another user asked if he can find a list of articles which he created by adding some text to an existing redirect. [16:33:27] About your comment Dereckson, yeah, watchlistdays is set ATM but cswiki wants to have 14 days displayed instead of 3 (which is the default value ATM). [16:33:48] And this is what I tried to set up at https://gerrit.wikimedia.org/r/#/c/316295/3/wmf-config/InitialiseSettings.php . [17:00:04] gehel: Respected human, time to deploy Weekly Wikidata query service deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161024T1700). Please do the needful. [17:00:26] SMalyshev: deploy done on https://wdqs-test.wmflabs.org/ [17:00:49] SMalyshev: looks good...
[17:00:51] dbproxy1011 has some issues- it is a new service, so you can ignore them for now [17:01:00] I have acked the alerts on icinga [17:01:09] gehel: yes [17:01:30] (03PS2) 10Elukey: Add ppchelko to the deployment group [puppet] - 10https://gerrit.wikimedia.org/r/317121 (https://phabricator.wikimedia.org/T148475) [17:02:39] !log deplyoing latest GUI for WDQS [17:02:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:03:15] (03CR) 10Elukey: [C: 032] Add ppchelko to the deployment group [puppet] - 10https://gerrit.wikimedia.org/r/317121 (https://phabricator.wikimedia.org/T148475) (owner: 10Elukey) [17:03:21] (03PS1) 10BBlack: tlsproxy: http2: limit concurrency to 512 (was 64) [puppet] - 10https://gerrit.wikimedia.org/r/317536 (https://phabricator.wikimedia.org/T107749) [17:03:28] (03CR) 10BBlack: [C: 032 V: 032] tlsproxy: http2: limit concurrency to 512 (was 64) [puppet] - 10https://gerrit.wikimedia.org/r/317536 (https://phabricator.wikimedia.org/T107749) (owner: 10BBlack) [17:03:40] (03PS2) 10BBlack: tlsproxy: http2: limit concurrency to 512 (was 64) [puppet] - 10https://gerrit.wikimedia.org/r/317536 (https://phabricator.wikimedia.org/T107749) [17:03:42] (03CR) 10BBlack: [V: 032] tlsproxy: http2: limit concurrency to 512 (was 64) [puppet] - 10https://gerrit.wikimedia.org/r/317536 (https://phabricator.wikimedia.org/T107749) (owner: 10BBlack) [17:03:44] (03CR) 10Elukey: "Forgot to mention: the change has been approved by the Ops meeting" [puppet] - 10https://gerrit.wikimedia.org/r/317121 (https://phabricator.wikimedia.org/T148475) (owner: 10Elukey) [17:04:08] elukey: merged yours [17:04:35] thanks! [17:04:54] Pchelolo: you are now in the 'deployment' group :) [17:05:03] (give puppet a bit of time to propagate) [17:05:26] ^ how does one get deploy group? 
[17:05:38] 06Operations, 10Ops-Access-Requests, 13Patch-For-Review, 06Services (blocked): Access to fluorine for Petr - https://phabricator.wikimedia.org/T148475#2738849 (10elukey) 05Open>03Resolved Approved by the Ops meeting and change merged! [17:05:49] SMalyshev: deployment completed, looks good, feel free to test [17:08:19] gehel: excellent, thanks! [17:09:10] Zppix: usually with an Access request task, but it must be mentioned a valid reason since it also grands sudo permissions on some hosts (needs ops meeting approval) [17:09:21] SMalyshev: at your service! [17:09:23] *grants [17:11:17] Oh k [17:11:33] (03CR) 10Alexandros Kosiaris: [C: 032] tcpircbot: Use base::service_unit [puppet] - 10https://gerrit.wikimedia.org/r/317515 (owner: 10Alexandros Kosiaris) [17:11:36] (03PS5) 10Alexandros Kosiaris: tcpircbot: Use base::service_unit [puppet] - 10https://gerrit.wikimedia.org/r/317515 [17:11:38] (03CR) 10Alexandros Kosiaris: [V: 032] tcpircbot: Use base::service_unit [puppet] - 10https://gerrit.wikimedia.org/r/317515 (owner: 10Alexandros Kosiaris) [17:11:40] (03PS1) 10Elukey: Add user pmiazga to the 'deployment' group [puppet] - 10https://gerrit.wikimedia.org/r/317539 (https://phabricator.wikimedia.org/T148477) [17:14:06] (03PS2) 10Elukey: Add user pmiazga to the 'deployment' group [puppet] - 10https://gerrit.wikimedia.org/r/317539 (https://phabricator.wikimedia.org/T148477) [17:15:14] (03CR) 10Elukey: [C: 032] Add user pmiazga to the 'deployment' group [puppet] - 10https://gerrit.wikimedia.org/r/317539 (https://phabricator.wikimedia.org/T148477) (owner: 10Elukey) [17:16:48] (03PS1) 10Jcrespo: proxysql: Fix process check and user permissions; Fix .my.cnf [puppet] - 10https://gerrit.wikimedia.org/r/317542 (https://phabricator.wikimedia.org/T148500) [17:17:15] (03PS2) 10Jcrespo: proxysql: Fix process check and user permissions; Fix .my.cnf [puppet] - 10https://gerrit.wikimedia.org/r/317542 (https://phabricator.wikimedia.org/T148500) [17:18:33] 06Operations, 
10Ops-Access-Requests, 13Patch-For-Review: Requesting access to "Production shell" for pmiazga - https://phabricator.wikimedia.org/T148477#2738907 (10elukey) 05Open>03Resolved a:03elukey Approved by the ops meeting, @pmiazga you are now also in the `deployment` group. As said before pleas... [17:20:44] mutante: closed the tasks opened, there is only one that should be actionable tomorrow [17:21:02] and there is some garbage in RT, I tried to clean it as much as possible [17:21:16] let me know if there are other things to handover :) [17:21:47] (03PS3) 10Jcrespo: proxysql: Fix process check and user permissions; Fix .my.cnf [puppet] - 10https://gerrit.wikimedia.org/r/317542 (https://phabricator.wikimedia.org/T148500) [17:23:33] (03CR) 10Jcrespo: [C: 032] proxysql: Fix process check and user permissions; Fix .my.cnf [puppet] - 10https://gerrit.wikimedia.org/r/317542 (https://phabricator.wikimedia.org/T148500) (owner: 10Jcrespo) [17:23:36] (03PS1) 10BBlack: http2 concurrency: 32 [puppet] - 10https://gerrit.wikimedia.org/r/317543 (https://phabricator.wikimedia.org/T107749) [17:23:37] elukey: thank you, sounds good [17:23:44] (03CR) 10BBlack: [C: 032 V: 032] http2 concurrency: 32 [puppet] - 10https://gerrit.wikimedia.org/r/317543 (https://phabricator.wikimedia.org/T107749) (owner: 10BBlack) [17:23:45] I am attempting to get a replica or subset replica of wikimedia's database for a data science lab. 
[17:23:50] (!@*#& [17:23:54] (03PS2) 10BBlack: http2 concurrency: 32 [puppet] - 10https://gerrit.wikimedia.org/r/317543 (https://phabricator.wikimedia.org/T107749) [17:23:57] (03CR) 10BBlack: [V: 032] http2 concurrency: 32 [puppet] - 10https://gerrit.wikimedia.org/r/317543 (https://phabricator.wikimedia.org/T107749) (owner: 10BBlack) [17:27:37] RECOVERY - puppet last run on dbproxy1011 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:31:10] HIH_Henri_III: see https://dumps.wikimedia.org/ [17:31:14] PROBLEM - puppet last run on cp4002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:33:52] (03PS1) 10BBlack: http2 concurrency: back to defaults [puppet] - 10https://gerrit.wikimedia.org/r/317544 (https://phabricator.wikimedia.org/T107749) [17:34:03] (03CR) 10BBlack: [C: 032 V: 032] http2 concurrency: back to defaults [puppet] - 10https://gerrit.wikimedia.org/r/317544 (https://phabricator.wikimedia.org/T107749) (owner: 10BBlack) [17:34:15] RECOVERY - Router interfaces on pfw-eqiad is OK: OK: host 208.80.154.218, interfaces up: 107, down: 0, dormant: 0, excluded: 2, unused: 0 [17:36:27] RECOVERY - puppet last run on cp4002 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [17:43:32] 06Operations, 10Ops-Access-Requests, 06Release-Engineering-Team, 13Patch-For-Review: Add Tyler Cipriani to releasers-mediawiki - https://phabricator.wikimedia.org/T148681#2738982 (10Dzahn) a:03Dzahn [17:43:53] (03CR) 10Mobrovac: [C: 031] Add missing wiki to RB [puppet] - 10https://gerrit.wikimedia.org/r/317511 (https://phabricator.wikimedia.org/T148861) (owner: 10Alex Monk) [17:44:09] (03PS2) 10Chad: Adding Tyler to releasers-mediawiki group [puppet] - 10https://gerrit.wikimedia.org/r/316843 (https://phabricator.wikimedia.org/T148681) [17:44:40] (03PS3) 10Dzahn: Adding Tyler to releasers-mediawiki group [puppet] - 10https://gerrit.wikimedia.org/r/316843 
(https://phabricator.wikimedia.org/T148681) (owner: 10Chad) [17:45:13] (03PS2) 10Dzahn: Add missing wiki to RB [puppet] - 10https://gerrit.wikimedia.org/r/317511 (https://phabricator.wikimedia.org/T148861) (owner: 10Alex Monk) [17:45:19] (03CR) 10Dzahn: [C: 032] Add missing wiki to RB [puppet] - 10https://gerrit.wikimedia.org/r/317511 (https://phabricator.wikimedia.org/T148861) (owner: 10Alex Monk) [17:51:56] bd808: Database back-up dumps for English Wikipedia are housed as wikidatawiki or commonswiki? I need to produce a CSV containing the information displayed in the infobox tables, so I can have students find information under different paradigms. [17:53:14] HIH_Henri_III: english wikipedia is "enwiki" -- https://dumps.wikimedia.org/enwiki/20161020/ [17:54:11] HIH_Henri_III: If you're interested in getting structured data, you might want to read up about Wikidata: https://www.wikidata.org/wiki/Wikidata:Introduction [17:54:21] (but note, that data doesn't map 1:1 to Wikipedia) [17:56:30] hoo: I was going to finish with SparQL. It will be a lesson well learned, because no one wants to download a 14GB of data, and figure out how to load that into excel [17:58:34] (03PS1) 10Jcrespo: proxysql: Add firewall to labs role [puppet] - 10https://gerrit.wikimedia.org/r/317548 (https://phabricator.wikimedia.org/T148500) [18:00:01] (03PS2) 10Jcrespo: proxysql: Add firewall to labs role [puppet] - 10https://gerrit.wikimedia.org/r/317548 (https://phabricator.wikimedia.org/T148500) [18:00:07] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161024T1800). [18:00:07] Krinkle, Niharika, arseny92, and dcausse: A patch you scheduled for Morning SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [18:00:19] .. 
[18:00:20] jenkins is very slow [18:00:26] o/ [18:00:32] o/ [18:01:06] (03PS3) 10Jcrespo: proxysql: Add firewall to labs role [puppet] - 10https://gerrit.wikimedia.org/r/317548 (https://phabricator.wikimedia.org/T148500) [18:01:44] it may take anywhere from 15 min to 1 hr for patches to merge, judging from https://integration.wikimedia.org/zuul/ [18:02:18] I can SWAT today, just have to get set up (and as long as zuul cooperates). [18:02:23] Krinkle: ping for SWAT [18:02:25] Thanks for that update, paladox. [18:02:34] You're welcome [18:03:50] hey mutante, has puppet applied on the restbase boxes? [18:04:11] Niharika: ok, let's see if we can get some collation changes out. [18:04:28] thcipriani: Sure! Hope they get merged. [18:05:03] !log downgrading nginx(+linked openssl implicitly) on cp* [18:05:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:05:21] Good morning thcipriani. For arseny92's change, there is an IS/CS sync dance to do. [18:05:40] Krenair: no, it has not been merged [18:06:09] Dereckson: ah, so there is. [18:07:41] arseny92: could you split that into 2 patches please? One to add wmgWatchlistDefault in InitialiseSettings.php (while leaving the old spelling) and remove wmgWatchlistdefault from CommonSettings.php, the next to remove wmgWatchlistdefault from InitialiseSettings.php?
[18:08:04] sorry for the hassle, our current deployment isn't too atomic so we need to be manually atomic :) [18:08:34] (03PS1) 10Filippo Giunchedi: base: send syslog only to codfw to reimage lithium [puppet] - 10https://gerrit.wikimedia.org/r/317550 (https://phabricator.wikimedia.org/T143307) [18:09:19] (03PS7) 10Thcipriani: Set $wgCategoryCollation to 'uca-hr' for Croatian wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317139 (https://phabricator.wikimedia.org/T148749) (owner: 10Niharika29) [18:09:24] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317139 (https://phabricator.wikimedia.org/T148749) (owner: 10Niharika29) [18:09:57] (03Merged) 10jenkins-bot: Set $wgCategoryCollation to 'uca-hr' for Croatian wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317139 (https://phabricator.wikimedia.org/T148749) (owner: 10Niharika29) [18:10:26] ok, what should I do on the current change, remove CS first or IS? [18:11:40] 06Operations, 10Gerrit, 06Release-Engineering-Team, 13Patch-For-Review: Investigate why gerrit slowed down on 17/10/2016 / 18/10/2016 / 21/10/2016 - https://phabricator.wikimedia.org/T148478#2739102 (10Paladox) We could do this -Xloggc:/var/lib/gerrit2/review_site/logs/`date +%F_%H-%M-%S`-gc.log -XX:+Prin... [18:13:35] arseny92: I'd say in the current change add wmgWatchlistDefault in InitialiseSettings.php (and leave wmfWatchlistdefault) and remove wmgWatchlistdefault from CommonSettings.php; that way I can sync InitialiseSettings.php then CommonSettings.php. [18:14:00] arseny92: then create a change where you just remove wmfWatchlistdefault from InitialiseSettings.php and I'll sync that one independently [18:15:27] Niharika: your change is on mw1099, but probably not much to check there until the maintenance script is run. Anything to check before syncing everywhere?
[18:15:33] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] base: send syslog only to codfw to reimage lithium [puppet] - 10https://gerrit.wikimedia.org/r/317550 (https://phabricator.wikimedia.org/T143307) (owner: 10Filippo Giunchedi) [18:16:01] Dereckson: how do you handle collation updates? Sync everywhere then run the script? Or pull on wasat and run the script there, then sync? [18:16:43] thcipriani: No, that's good. Thanks. [18:17:14] Niharika: ok, syncing everywhere. [18:18:14] thcipriani: check on mw1099 a category page if the code is new (an invalid code = a fatal error there) [18:18:24] but uca-hr is already used, so that's fine [18:18:44] and then sync everywhere then run the script [18:19:22] (03PS2) 10Filippo Giunchedi: prometheus-node-exporter: allow access from ipv6 too [puppet] - 10https://gerrit.wikimedia.org/r/317125 [18:19:23] What I'd pull on wasat/terbium would be a backport to the collation script [18:19:43] (but I guess it could be run from mw1099 too) [18:19:57] !log thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:317139|Set $wgCategoryCollation to uca-hr for Croatian wikipedia (T148749)]] (duration: 00m 57s) [18:19:59] T148749: Set $wgCategoryCollation to 'uca-hr' on Croatian Wikipedia and rebuild category sort keys - https://phabricator.wikimedia.org/T148749 [18:20:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:20:26] ^ Niharika live everywhere, should I run the updateCollation script or are you setup to do that? [18:20:57] thcipriani: I can do that. Thanks a lot! 
[18:21:05] Niharika: okie doke, thanks :) [18:22:05] (03PS4) 10Dzahn: Adding Tyler to releasers-mediawiki group [puppet] - 10https://gerrit.wikimedia.org/r/316843 (https://phabricator.wikimedia.org/T148681) (owner: 10Chad) [18:23:10] (03CR) 10Filippo Giunchedi: [C: 031] add mapped IPv6 address for eventlog1001 [puppet] - 10https://gerrit.wikimedia.org/r/317192 (owner: 10Dzahn) [18:23:20] (03PS2) 10Thcipriani: Switch bs, hr and uk wikis to numeric collation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317502 (https://phabricator.wikimedia.org/T148682) (owner: 10Niharika29) [18:23:38] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317502 (https://phabricator.wikimedia.org/T148682) (owner: 10Niharika29) [18:24:07] (03Merged) 10jenkins-bot: Switch bs, hr and uk wikis to numeric collation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317502 (https://phabricator.wikimedia.org/T148682) (owner: 10Niharika29) [18:25:26] thcipriani: pong [18:25:59] Krinkle: ack. I'll get your change moving along :) [18:26:44] Niharika: your numeric collation change for bs, hr, and uk is live on mw1099, check please [18:27:04] thcipriani: On it. [18:27:23] (?)Recombine all pages, current versions only(?), enwiki-20161001-pages-meta-current.xml.bz2; Does this file contain the complete current information (header, body, infobox) displayed on the wiki page for the article? [18:28:03] (03CR) 10Filippo Giunchedi: [C: 032] prometheus-node-exporter: allow access from ipv6 too [puppet] - 10https://gerrit.wikimedia.org/r/317125 (owner: 10Filippo Giunchedi) [18:28:13] (03PS2) 10Arseny1992: Fix capitalization for change 317387 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317523 (https://phabricator.wikimedia.org/T148328) [18:28:15] thcipriani , like this? 
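For context on the "numeric collation" changes being SWATted here: a numeric (natural-order) collation compares runs of digits as numbers instead of character by character, so "Route 10" sorts after "Route 2". MediaWiki delegates the real comparison to ICU collators; the sketch below is only a toy Python illustration of the idea, with made-up page titles:

```python
import re

def numeric_sort_key(title):
    # Split the title into digit and non-digit runs; digit runs are
    # converted to integers so they compare numerically. This is a toy
    # analogue of a numeric collation, not how MediaWiki builds sort keys.
    return [int(run) if run.isdigit() else run
            for run in re.split(r"(\d+)", title) if run]

pages = ["Route 10", "Route 2", "Route 1"]
# Plain code-point sorting puts "10" before "2":
assert sorted(pages) == ["Route 1", "Route 10", "Route 2"]
# A numeric collation orders the digit runs as numbers:
assert sorted(pages, key=numeric_sort_key) == ["Route 1", "Route 2", "Route 10"]
```

This is also why the category sort keys have to be rebuilt by a maintenance script after the config change: the stored keys were generated under the old collation.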
* thcipriani checks [18:28:45] so it has both for the moment [18:29:24] HIH_Henri_III: it contains the wikitext of all articles, templates, talk pages, and every other page [18:29:27] thcipriani: Looks all okay. [18:29:52] infoboxes on an article page are included as the template invocations with parameters [18:29:57] arseny92: yup, and in that same patch remove wmgWatchlistdefault from being referenced in CommonSettings.php. Then remove wmgWatchlistdefault in InitialiseSettings.php in a different patch. Thank you for your help :) [18:30:07] Niharika: ack, going live everywhere [18:30:10] the infobox templates themselves are included as code on the Template:Whatever pages [18:31:32] apergos:(?) enwiki-20161001-pages-articles.xml.bz2 (?), article data only [18:31:41] 06Operations, 06Multimedia, 10Traffic, 15User-Josve05a, 15User-Urbanecm: Thumbnails failing to render sporadically (ERR_CONNECTION_CLOSED or ERR_SSL_BAD_RECORD_MAC_ALERT) - https://phabricator.wikimedia.org/T148917#2739171 (10BBlack) Can anyone still repro this issue? We've done a bunch of debugging tod... [18:32:20] !log thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:317502|Switch bs, hr and uk wikis to numeric collation (T148682)]] (duration: 00m 50s) [18:32:21] T148682: Convert wikis to numerical sorting batch #3 - https://phabricator.wikimedia.org/T148682 [18:32:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:32:27] ^ Niharika live everywhere [18:32:46] thcipriani: Awesome, thanks! [18:32:57] that one has articles and templates but no talk pages [18:33:07] so it would also be sufficient for your needs [18:33:31] 06Operations, 06Multimedia, 10Traffic, 15User-Josve05a, 15User-Urbanecm: Thumbnails failing to render sporadically (ERR_CONNECTION_CLOSED or ERR_SSL_BAD_RECORD_MAC_ALERT) - https://phabricator.wikimedia.org/T148917#2739176 (10Paladox) Nope, can't reproduce it yet.
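apergos's point above — that infoboxes appear in the dump as template invocations with parameters, while the template code itself lives on Template: pages — can be seen with a toy extractor. This is a hedged sketch: the page text is invented, and the regex only handles a flat, un-nested invocation; real dump processing should use a proper wikitext parser such as mwparserfromhell.

```python
import re

def infobox_params(wikitext, name="Infobox"):
    """Extract |key=value pairs from a flat (non-nested) template
    invocation. Toy sketch only: breaks on nested templates or pipes
    inside values; use a real wikitext parser for dump processing."""
    m = re.search(r"\{\{\s*%s[^}]*\}\}" % re.escape(name), wikitext)
    if not m:
        return {}
    params = {}
    # Drop the leading "{{Name" segment, then split the |key=value parts.
    for part in m.group(0).strip("{}").split("|")[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    return params

page = "{{Infobox settlement\n| name = Example\n| population = 1234\n}}\nBody text."
assert infobox_params(page, "Infobox settlement") == {"name": "Example", "population": "1234"}
```

Run over the pages-articles dump, something like this yields one row of CSV per article, which is roughly what the data-science exercise described here needs.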
(03CR) 10Dzahn: [C: 032] Adding Tyler to releasers-mediawiki group [puppet] - 10https://gerrit.wikimedia.org/r/316843 (https://phabricator.wikimedia.org/T148681) (owner: 10Chad) [18:34:01] (03PS3) 10Arseny1992: Fix capitalization for change 317387 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317523 (https://phabricator.wikimedia.org/T148328) [18:34:07] (03PS5) 10Dzahn: Adding Tyler to releasers-mediawiki group [puppet] - 10https://gerrit.wikimedia.org/r/316843 (https://phabricator.wikimedia.org/T148681) (owner: 10Chad) [18:34:40] this effectively reverts the change then [18:35:06] torrent or DCC available? [18:35:17] (03PS1) 10Filippo Giunchedi: standard: deploy prometheus-node-exporter to eqiad [puppet] - 10https://gerrit.wikimedia.org/r/317552 (https://phabricator.wikimedia.org/T140646) [18:35:19] (03CR) 10Dzahn: [V: 032] Adding Tyler to releasers-mediawiki group [puppet] - 10https://gerrit.wikimedia.org/r/316843 (https://phabricator.wikimedia.org/T148681) (owner: 10Chad) [18:35:29] thcipriani [18:36:20] check the dumps torrents page, there will be torrent info there: https://meta.wikimedia.org/wiki/Data_dump_torrents [18:36:57] dcausse: is your config change dependant on the CirrusSearch change or are they independent? [18:37:02] thcipriani: yes [18:37:09] I mean dependent [18:37:29] PROBLEM - puppet last run on mw1248 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:37:35] the wmf22 fix must be deployed before the config change [18:38:12] gotcha :) [18:38:20] thcipriani [18:38:22] arseny92: I just saw your update. [18:38:43] 06Operations, 10Gerrit, 06Release-Engineering-Team, 13Patch-For-Review: Investigate why gerrit slowed down on 17/10/2016 / 18/10/2016 / 21/10/2016 - https://phabricator.wikimedia.org/T148478#2739193 (10demon) Close. I was thinking more like... -Xloggc:/var/log/gerrit/jvm_gc.%p.log -XX:+PrintGCDetails...
[18:38:54] 06Operations, 10Ops-Access-Requests, 06Release-Engineering-Team, 13Patch-For-Review: Add Tyler Cipriani to releasers-mediawiki - https://phabricator.wikimedia.org/T148681#2739195 (10Dzahn) on bromine.eqiad.wmnet Notice: /Stage[main]/Admin/Admin::Hashuser[thcipriani]/Admin::User[thcipriani]/User[thcipriani... [18:39:13] 06Operations, 10Ops-Access-Requests, 06Release-Engineering-Team, 13Patch-For-Review: Add Tyler Cipriani to releasers-mediawiki - https://phabricator.wikimedia.org/T148681#2739196 (10Dzahn) 05Open>03Resolved [18:39:22] arseny92: I think I miscommunicated, in 317523 by "remove" I meant make CommonSettings.php look like this: https://gerrit.wikimedia.org/r/#/c/317523/1/wmf-config/CommonSettings.php [18:39:29] 06Operations, 10Ops-Access-Requests, 06Release-Engineering-Team: Add Tyler Cipriani to releasers-mediawiki - https://phabricator.wikimedia.org/T148681#2729869 (10Dzahn) [18:39:30] so remove the *old* variable [18:40:10] (03PS3) 10Dzahn: Add missing wiki to RB [puppet] - 10https://gerrit.wikimedia.org/r/317511 (https://phabricator.wikimedia.org/T148861) (owner: 10Alex Monk) [18:41:59] You said to temove it from there as well [18:42:09] urandom: Whee, https://phabricator.wikimedia.org/T148655#2739144 :D [18:42:28] https://meta.wikimedia.org/wiki/Data_dump_torrents#enwiki , link to http://burnbit.com/torrent/491258/enwiki_20160820_pages_articles_xml_bz2, page loading issue. [18:42:41] 06Operations, 10Gerrit, 06Release-Engineering-Team, 13Patch-For-Review: Investigate why gerrit slowed down on 17/10/2016 / 18/10/2016 / 21/10/2016 - https://phabricator.wikimedia.org/T148478#2739204 (10Paladox) Oh yes please :), would you like me to do the change or are you? 
[18:43:32] (03PS4) 10Arseny1992: Fix capitalization for change 317387 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317523 (https://phabricator.wikimedia.org/T148328) [18:43:41] changed again [18:43:55] ostriches: you know, `%t` and `%p` macros sound familiar, i want to say we tried that and it didn't work [18:43:57] 06Operations, 10Gerrit, 06Release-Engineering-Team, 13Patch-For-Review: Investigate why gerrit slowed down on 17/10/2016 / 18/10/2016 / 21/10/2016 - https://phabricator.wikimedia.org/T148478#2739218 (10ArielGlenn) I'd like to get this going during a normal usage period with babysitting for the first little... [18:43:59] arseny92: perfect, thanks! :) [18:44:02] i want to say godog tried it [18:44:06] godog: ^^^ [18:44:14] but that would have been a while ago [18:44:25] aa timestamp would help tho [18:44:35] Krenair: now the restbase change is merged, puppet ran on the first few [18:44:36] Wasn't backported to all releases, so might not work everywhere. [18:44:41] But worth a shot imho [18:44:48] mobrovac, gwicke: need to do any restarts? [18:44:54] (03PS5) 10Thcipriani: Fix capitalization for change 317387 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317523 (https://phabricator.wikimedia.org/T148328) (owner: 10Arseny1992) [18:44:58] ostriches: yeah [18:45:14] 06Operations, 10Gerrit, 06Release-Engineering-Team, 13Patch-For-Review: Investigate why gerrit slowed down on 17/10/2016 / 18/10/2016 / 21/10/2016 - https://phabricator.wikimedia.org/T148478#2739221 (10Paladox) Normal usage should be around now, highest peak time. [18:45:25] aye, iirc it was documented in oracle jvm docs but then not implemented in openjvm [18:45:36] the ordering would still be even more confusing, but at least it wouldn't trample recent logs [18:45:45] arseny92: can I get you to create a 2nd patch that removes wmgWatchlistdefault from InitialiseSettings.php? 
[18:45:50] s/still // [18:46:07] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317523 (https://phabricator.wikimedia.org/T148328) (owner: 10Arseny1992) [18:46:11] godog: Looks like it made it into 8 and basically anything 7u80 or newer [18:46:24] we definitely tried it since then [18:46:33] (03Merged) 10jenkins-bot: Fix capitalization for change 317387 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317523 (https://phabricator.wikimedia.org/T148328) (owner: 10Arseny1992) [18:46:46] i mean, it definitely would have been within that time frame [18:46:50] doing [18:47:04] but still worth trying again [18:47:21] Krinkle: you change is live on mw1099, check if possible please [18:47:23] *your [18:47:28] yeah easy enough to try [18:47:37] If they don't, using `date` or something in the init could work [18:47:55] icinga-wm isnt talking about the IPsec CRITs [18:47:56] Like suggested https://phabricator.wikimedia.org/T148478#2739102 [18:48:03] but they are also not ACKed or disabled [18:49:01] there's also https://gerrit.wikimedia.org/r/316983 [18:49:01] that may help and https://gerrit.wikimedia.org/r/317322 [18:49:17] ostriches ^^ [18:50:07] Yep, testing.. [18:50:36] paladox: I don't want to change GC strategy without logging and seeing what's up first [18:50:37] oh! [18:50:43] Ok [18:50:52] * apergos finally makes the connection between udrandom and eevans [18:50:52] *urandom [18:50:52] ha! 
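The fallback floated above — if the JVM's `%p`/`%t` log-filename macros turn out to be unsupported, bake a timestamp into the path at service start instead — can be sketched like this. The directory, flag set, and helper name are illustrative assumptions, not the actual Gerrit configuration:

```python
from datetime import datetime

def gc_log_flags(log_dir="/var/log/gerrit", now=None):
    """Build JVM GC-logging flags with a timestamped log path.

    Fallback for JVMs where the %p/%t filename macros are not
    implemented: the timestamp is fixed when the service starts,
    so each restart rolls to a fresh file instead of trampling
    recent logs. (Directory and flag choices are assumptions.)
    """
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return [
        f"-Xloggc:{log_dir}/jvm_gc.{stamp}.log",
        "-XX:+PrintGCDetails",
        "-XX:+PrintGCDateStamps",  # wall-clock timestamps on each GC entry
    ]

print(" ".join(gc_log_flags()))
```

An init script would compute this once and splice it into the Java command line, which sidesteps the macro-support question entirely.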
[18:50:59] Step 1 is analysis, then we can make appropriate adjustments :) [18:51:03] Ok [18:51:05] yep [18:51:13] ACKNOWLEDGEMENT - IPsec on cp2001 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6 daniel_zahn https://phabricator.wikimedia.org/T148891 [18:51:13] ACKNOWLEDGEMENT - IPsec on cp2004 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6 daniel_zahn https://phabricator.wikimedia.org/T148891 [18:51:13] ACKNOWLEDGEMENT - IPsec on cp2007 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6 daniel_zahn https://phabricator.wikimedia.org/T148891 [18:51:14] I have your name down here to ask you some jvm brainpicking questions (not now, later) [18:51:14] ACKNOWLEDGEMENT - IPsec on cp2010 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6 daniel_zahn https://phabricator.wikimedia.org/T148891 [18:51:14] ACKNOWLEDGEMENT - IPsec on cp2013 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6 daniel_zahn https://phabricator.wikimedia.org/T148891 [18:51:14] ACKNOWLEDGEMENT - IPsec on cp2016 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6 daniel_zahn https://phabricator.wikimedia.org/T148891 [18:51:14] ACKNOWLEDGEMENT - IPsec on cp2019 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6 daniel_zahn https://phabricator.wikimedia.org/T148891 [18:51:15] ACKNOWLEDGEMENT - IPsec on cp2023 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6 daniel_zahn https://phabricator.wikimedia.org/T148891 [18:51:15] ACKNOWLEDGEMENT - IPsec on cp3030 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6 daniel_zahn https://phabricator.wikimedia.org/T148891 [18:51:15] ACKNOWLEDGEMENT - IPsec on cp3031 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6 daniel_zahn https://phabricator.wikimedia.org/T148891 [18:51:16] ACKNOWLEDGEMENT - IPsec on cp3032 is CRITICAL: 
Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6 daniel_zahn https://phabricator.wikimedia.org/T148891 [18:51:16] ACKNOWLEDGEMENT - IPsec on cp3033 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6 daniel_zahn https://phabricator.wikimedia.org/T148891 [18:51:52] uhm.. ok, that was a bit odd [18:52:35] sorry for spam but it looked like not working and the ticket is right [18:52:41] (03PS1) 10Arseny1992: Fix capitalization for change 317387 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317555 (https://phabricator.wikimedia.org/T148328) [18:53:09] thcipriani ^ [18:53:20] arseny92: yup, looks good :) [18:53:30] will merge after I sync the other change [18:53:54] (03PS1) 10Andrew Bogott: Revert "labs firstboot.sh: Add instance hostname to /etc/hosts" [puppet] - 10https://gerrit.wikimedia.org/r/317556 [18:54:17] arseny92: thanks again for bearing with me. 2 line changes that span those 2 files get weird :\ [18:55:05] jouncebot update [18:55:43] SWAT is going to run a little long [18:56:06] (whatever the command is, I updated the page referencing the other change) [18:56:30] (03CR) 10Andrew Bogott: [C: 032] Labs images: Only add fqdn of instance to /etc/hosts [puppet] - 10https://gerrit.wikimedia.org/r/317554 (https://phabricator.wikimedia.org/T120830) (owner: 10Andrew Bogott) [18:57:16] 06Operations, 10Traffic: Strongswan Icinga check: do not report issues about depooled hosts - https://phabricator.wikimedia.org/T148976#2738250 (10Dzahn) fwiw, i saw these in Icinga web-ui but icinga-wm was apparently not talking about them on IRC, but i did not see "disabled notifications" or ACKs in the web... [18:59:53] thcipriani hows going? [19:00:25] thcipriani: All fine. I'll keep running more tests in the background on the exact perf impact, but functionally all okay and no obvious perf regression [19:00:51] Krinkle: ok, thank you. Will sync everywhere. 
[19:01:05] arseny92: will get your patches out the door after I sync out this next change. [19:01:29] RECOVERY - puppet last run on mw1248 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:01:31] burnbit.com is down. God gave us T3 for us to be entertained and beer to show he loves us. Thanks everyone. [19:01:31] T3: Upgrade logstash elasticsearch instance to latest version - https://phabricator.wikimedia.org/T3 [19:03:05] 06Operations, 10Continuous-Integration-Config, 06Operations-Software-Development: Add shell scripts CI validations - https://phabricator.wikimedia.org/T148494#2724601 (10fgiunchedi) >>! In T148494#2737503, @hashar wrote: > Finally we will need an entry point in puppet.git to easily run the command (maybe reu... [19:03:33] !log thcipriani@mira Synchronized php-1.28.0-wmf.22/resources/src/mediawiki/mediawiki.js: SWAT: [[gerrit:317283|resourceloader: Make cache-eval in mw.loader.work asynchronous (T142129)]] (duration: 00m 52s) [19:03:34] T142129: Source code eval should be async in mw.loader.work - https://phabricator.wikimedia.org/T142129 [19:03:39] ^ Krinkle live everywhere [19:03:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:04:54] arseny92: first change live on mw1099, check please [19:06:31] PROBLEM - puppet last run on ms-be1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:07:19] seem working [19:08:23] arseny92: ok, going out everywhere [19:08:25] 06Operations, 10Continuous-Integration-Config, 06Operations-Software-Development: Add shell scripts CI validations - https://phabricator.wikimedia.org/T148494#2724601 (10Dzahn) Looks like we have about 206 scripts with "bin/bash" in ops/puppet. About 56 are called .sh or .sh.erb. Would we want to rename the... 
[19:10:08] !log thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: [[gerrit:317523|Fix capitalization for change 317387 (T148328)]] PART I (duration: 00m 50s) [19:10:09] T148328: Add all edited pages by myself to watchlist by default (for new users) - https://phabricator.wikimedia.org/T148328 [19:10:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:11:21] !log thcipriani@mira Synchronized wmf-config/CommonSettings.php: [[gerrit:317523|Fix capitalization for change 317387 (T148328)]] PART II (duration: 00m 50s) [19:11:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:12:15] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317555 (https://phabricator.wikimedia.org/T148328) (owner: 10Arseny1992) [19:12:27] now to scap the second change and sync ;) [19:12:35] indeed :) [19:12:50] (03Merged) 10jenkins-bot: Fix capitalization for change 317387 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317555 (https://phabricator.wikimedia.org/T148328) (owner: 10Arseny1992) [19:13:04] need atomic deployment lol [19:13:24] :D [19:13:24] +1 [19:15:20] (03CR) 10Filippo Giunchedi: Enable simple-json-datasource on prod Grafana (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/314029 (https://phabricator.wikimedia.org/T147329) (owner: 10Addshore) [19:15:51] in the old days it was atomic deployment though :/ [19:16:29] !log thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:317555|Fix capitalization for change 317387 (T148328)]] (duration: 00m 51s) [19:16:42] ^ arseny92 live everywhere :) [19:16:54] T148328: Add all edited pages by myself to watchlist by default (for new users) - https://phabricator.wikimedia.org/T148328 [19:17:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:18:42] dcausse: sorry for the delay, your CirrusSearch update is live on mw1099 if there's 
anything to check there. [19:18:54] is tin still broken as i see you use mira [19:19:01] thcipriani: np :) looking [19:19:42] (03PS4) 10Thcipriani: [cirrus] Activate BM25 on top 10 wikis: Step 2 (take 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317159 (https://phabricator.wikimedia.org/T147508) (owner: 10DCausse) [19:19:57] thcipriani: looks good [19:20:05] dcausse: ok, going live everywhere [19:21:31] (03PS2) 10Dzahn: decom palladium from puppet, install_server, network constants [puppet] - 10https://gerrit.wikimedia.org/r/315891 (https://phabricator.wikimedia.org/T147320) [19:21:43] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317159 (https://phabricator.wikimedia.org/T147508) (owner: 10DCausse) [19:22:11] (03Merged) 10jenkins-bot: [cirrus] Activate BM25 on top 10 wikis: Step 2 (take 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317159 (https://phabricator.wikimedia.org/T147508) (owner: 10DCausse) [19:22:23] !log thcipriani@mira Synchronized php-1.28.0-wmf.22/extensions/CirrusSearch/includes/SearchConfig.php: SWAT: [[gerrit:317512|Add wgContentNamespaces to the list of vars loaded by SearchConfig (T148840)]] (duration: 00m 58s) [19:22:25] T148840: Fix in_array() expects parameter 2 to be an array or collection in /srv/mediawiki/php-1.28.0-wmf.22/extensions/CirrusSearch/includes/Search/RescoreBuilders.php on line 160 - https://phabricator.wikimedia.org/T148840 [19:22:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:22:34] ^ dcausse live everywhere [19:22:50] thcipriani: thanks [19:23:23] dcausse: mw-config change live on mw1099, check please [19:23:28] thcipriani: testing [19:26:38] thcipriani: looks good so far [19:27:13] dcausse: that's good. let me know when you're ready for it to go live. 
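The "need atomic deployment" wish a few lines up is about exactly this gap: the two syncs land one file at a time, so a request can briefly see the new InitialiseSettings.php alongside the old CommonSettings.php. For a single file the classic remedy is write-then-rename; a minimal sketch of that pattern (this is not scap, which syncs across a whole fleet, and the helper name is made up):

```python
import os
import tempfile

def atomic_write(path, data):
    """Replace `path` so readers see either the old or the new
    content, never a half-written mix. Note this only makes one
    file atomic; the cross-file window between two separate syncs
    still needs a genuinely atomic deployment, which this is not.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # data durable before the rename
        os.replace(tmp, path)     # rename(2): atomic within a filesystem
    except BaseException:
        os.unlink(tmp)
        raise
```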
[19:27:48] thcipriani: ready, expect some poolcounter errors (possibly a short spike) [19:28:04] dcausse: ok, thank you for the heads up :) [19:29:16] 06Operations, 06Analytics-Kanban, 06Discovery, 06Discovery-Analysis (Current work), and 2 others: Can't install R package Boom (& bsts) on stat1002 (but can on stat1003) - https://phabricator.wikimedia.org/T147682#2739351 (10mpopov) >>! In T147682#2738052, @Ottomata wrote: > Oh, just re-read earlier parts... [19:30:09] !log thcipriani@mira Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:317159|[cirrus] Activate BM25 on top 10 wikis: Step 2 (take 2) (T147508)]] (duration: 00m 50s) [19:30:11] T147508: BM25: initial limited release into production - https://phabricator.wikimedia.org/T147508 [19:30:15] ^ dcausse live everywhere [19:30:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:30:32] thcipriani: thanks, logs are clean? [19:30:56] dcausse: yeah, if there was a log spike, it wasn't enough to break through the other errors there [19:31:06] ok [19:31:39] thcipriani: can we babysit the change for 2/3 more minutes please? [19:31:53] dcausse: yup, I'm around. [19:31:58] ok thanks [19:32:12] s 5 Invalid argument supplied for foreach() in /srv/mediawiki/php-1.28.0-wmf.22/extensions/CirrusSearch/includes/ExplainPrinter.php on line 50 [19:32:42] hm... it's a debug option so most probably me [19:33:31] RECOVERY - puppet last run on ms-be1013 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [19:33:44] yeah, that's the only one that looks vaguely related in the top 50 messages as far as I can see. [19:34:35] and, yeah, the 5 is the count, all happened in a 30 second window about 8 minutes ago. 
[19:35:34] thcipriani: yes this ExplainPrinter.php error is certainly my fault mixing wrong cirrus debug options [19:36:04] I'll double check but it's not a feature exposed to users [19:36:17] (03PS1) 10Filippo Giunchedi: Revert "base: send syslog only to codfw to reimage lithium" [puppet] - 10https://gerrit.wikimedia.org/r/317566 (https://phabricator.wikimedia.org/T143307) [19:36:21] thcipriani: for me it looks good, thanks! [19:36:45] dcausse: by the way, this morning I've filled this based on a log error: https://phabricator.wikimedia.org/T148978 [19:37:06] (ah you've seen it and submitted a change) [19:37:21] Dereckson: yes thanks for the report! [19:37:37] dcausse: awesome! Thanks for being cautious, appreciated :) [19:38:16] thus concludes morning swat [19:38:42] (03CR) 10Filippo Giunchedi: [C: 032] Revert "base: send syslog only to codfw to reimage lithium" [puppet] - 10https://gerrit.wikimedia.org/r/317566 (https://phabricator.wikimedia.org/T143307) (owner: 10Filippo Giunchedi) [19:39:25] dcausse: a cherry-pick could be useful, as the issue is currently in prod [19:39:53] (03CR) 10Dzahn: "Paladox is right that T148478 is likely not another hardware failure, unlike the slowdown before that. The symptons look different on the " [puppet] - 10https://gerrit.wikimedia.org/r/316983 (https://phabricator.wikimedia.org/T148478) (owner: 10Paladox) [19:40:13] Dereckson: sure, I'll schedule it for tomorrow, (or do you think we should swat it today?) [19:40:47] (03Abandoned) 10Andrew Bogott: Revert "labs firstboot.sh: Add instance hostname to /etc/hosts" [puppet] - 10https://gerrit.wikimedia.org/r/317556 (owner: 10Andrew Bogott) [19:41:27] It seems the error appears sometimes, but with few occurences. 
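The `Invalid argument supplied for foreach()` notice above is the classic shape of this bug: a backend hiccup returns `null`/`false` where a list was expected, and the iteration itself becomes the error, masking the real failure. A guard in the same spirit, with Python as a stand-in for the PHP and a hypothetical payload shape:

```python
def explain_lines(explanation):
    """Render a search 'explain' payload defensively.

    When a backend node restarts mid-query, the field may come
    back as None instead of a list; looping over it directly
    raises and hides the underlying failure, so type-check first.
    """
    if not isinstance(explanation, list):
        return ["<no explanation returned by backend>"]
    return [str(item) for item in explanation]

print(explain_lines(None))
print(explain_lines(["weight(title:foo)", 1.25]))
```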
[19:42:26] yes I think it happens when one elastic restarts and this PHP mistake hides an elastic failure [19:43:04] but at a very low rate, this code runs ~1K qps [19:44:09] 06Operations, 10ops-eqiad, 13Patch-For-Review: Add new disks to syslog server in eqiad (lithium) - https://phabricator.wikimedia.org/T143307#2739410 (10fgiunchedi) 05Open>03Resolved as pointed out by @faidon the problem with pxe failing is that lithium was hammered with udp packets from the fleet. After... [19:44:36] (03CR) 10Hashar: [C: 031] "This one looks fine to me as far as I can tell. Specially:" [puppet] - 10https://gerrit.wikimedia.org/r/264303 (https://phabricator.wikimedia.org/T133183) (owner: 10Niedzielski) [19:51:44] PROBLEM - Host elastic2020 is DOWN: PING CRITICAL - Packet loss = 100% [19:51:53] sigh [19:52:03] PROBLEM - swift-account-replicator on ms-be1015 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [19:59:26] elastic2020 seems to be powered off (iLO) [20:00:08] gwicke, cscott, arlolra, subbu, bearND, mdholloway, halfak, Amir1, and yurik: Dear anthropoid, the time has come. Please deploy Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161024T2000). [20:01:15] ORES might have something to deploy, let me talk to halfak [20:03:57] (03Abandoned) 10Thcipriani: Fix failing keyholder arming check [puppet] - 10https://gerrit.wikimedia.org/r/312947 (owner: 10Thcipriani) [20:04:22] (03Abandoned) 10Thcipriani: Revert "Only set remote config if wiki isn't Zero" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/315871 (owner: 10Thcipriani) [20:05:34] (03CR) 10Thcipriani: [C: 04-1] "This patch will need coordination with the release of the new scap version. -1 to indicate that it cannot merge any time." 
[puppet] - 10https://gerrit.wikimedia.org/r/315139 (https://phabricator.wikimedia.org/T146602) (owner: 10Thcipriani) [20:06:24] !log powering on elastic2020 (no idea why it is powered off) [20:06:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:07:33] elastic2020 does not want to power on [20:08:08] it broke 10 mins after the deploy... [20:09:15] gehel: maybe "power off hard" / "power reset"? [20:09:37] mutante: thanks! I'll try that as well... [20:09:54] also it can take a REALLY long to boot [20:09:58] before the first messages show up [20:11:25] mutante: when trying to open the serial console, I get a message that "The server is not powered on". It looks very much like it is really off (to my untrained eyes at least) [20:11:47] It's not really an emergency, we have enough nodes in that cluster... [20:14:34] gehel: hmm, yea, confirmed: Server Power: Off .. i guess it actually broke then.. i wonder if it has 2 power supplies [20:15:22] mutante: not much more we can do in remote. I'm opening a phab task and asking papaul if he can have a closer look. [20:16:12] gehel: *nod*, sounds good [20:16:21] 06Operations, 10ops-codfw, 06DC-Ops, 06Discovery, and 2 others: elastic2020 is powered off and does not want to restart - https://phabricator.wikimedia.org/T149006#2739503 (10Gehel) [20:16:48] also: The Virtual Serial Port is not available. 
[20:17:08] mutante: that seems to be related to the server being powered off [20:17:26] mutante: at least that's what the error message said [20:17:39] right after saying that isnt available it then claims it's "Starting virtual serial port" [20:17:46] but to no avail [20:18:15] (03PS1) 10DCausse: Remove duplicated wmgCirrusSearchClusterOverrides entry [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317574 [20:18:18] !log starting Parsoid deploy [20:18:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:18:27] (03CR) 10Hashar: "dzahn hooked in the discussion on IRC. To get Xorg to spawn:" [puppet] - 10https://gerrit.wikimedia.org/r/264303 (https://phabricator.wikimedia.org/T133183) (owner: 10Niedzielski) [20:18:29] Okay, ORES is going to deploy after parsiod deploy [20:18:45] Yay [20:18:53] So no more ores beta? [20:19:27] Zppix, the ORES service is in production. The ORES Review Tool is a beta feature. [20:19:32] :) [20:20:24] Ores is the highlight one right? [20:20:29] Yup [20:20:37] Flags probably damaging edits for review [20:21:02] mutante: re: tcpircbot access from the restbase cluster, it sounded like you had some thoughts on this [20:21:20] (03PS3) 10Chad: Gerrit: Clean up cron job definitions [puppet] - 10https://gerrit.wikimedia.org/r/315960 [20:21:43] hmm, shouldn't saying no to continuing a deploy roll back the canary? [20:21:52] mutante: it looks like they need to be added to the instance cidr list, in addition to ferm [20:22:08] mutante: but i'm guessing we don't want to add the hosts individually [20:22:41] 06Operations, 10ops-codfw, 06DC-Ops, 06Discovery, and 2 others: elastic2020 is powered off and does not want to restart - https://phabricator.wikimedia.org/T149006#2739524 (10Gehel) Note that the load on the codfw cluster was fairly high at the time of the failure, due to the fresh deployment on BM25 (T147... [20:25:25] !log stopping elasticsearch eqiad cluster restart for the night. 
[20:25:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:25:40] hi all. does anyone know if WMF prefers RSA or DSA ssh keys? [20:26:26] !log starting mobileapps deploy [20:26:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:28:15] (03CR) 10ArielGlenn: [C: 032] Gerrit: Clean up cron job definitions [puppet] - 10https://gerrit.wikimedia.org/r/315960 (owner: 10Chad) [20:29:25] (03PS1) 10Paladox: Gerrit: Enable loggin for jvm gc [puppet] - 10https://gerrit.wikimedia.org/r/317582 (https://phabricator.wikimedia.org/T148478) [20:29:36] ostriches ^^ [20:29:43] !log deployed mobileapps f872894 [20:29:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:29:52] (03PS2) 10Paladox: Gerrit: Enable logging for jvm gc [puppet] - 10https://gerrit.wikimedia.org/r/317582 (https://phabricator.wikimedia.org/T148478) [20:29:56] (03PS3) 10Paladox: Gerrit: Enable logging for jvm gc [puppet] - 10https://gerrit.wikimedia.org/r/317582 (https://phabricator.wikimedia.org/T148478) [20:30:41] paladox: I'm going to steal your change and amend it. I wanna make that a lil more configurable. 
[20:30:45] Thx for starting it tho [20:30:55] Your welcome and sure :) [20:31:07] ostriches note that you carn't specify javaOptions twice [20:31:17] even though the docs says you can on gerrit [20:31:23] Theres a bug on gerrit about this [20:36:11] Found the bug https://bugs.chromium.org/p/gerrit/issues/detail?id=3209 [20:38:45] (03PS4) 10Chad: Gerrit: Enable logging for jvm gc [puppet] - 10https://gerrit.wikimedia.org/r/317582 (https://phabricator.wikimedia.org/T148478) (owner: 10Paladox) [20:39:10] ostriches i found a fix https://github.com/gerrit-review/gerrit/commit/84f83b9a8647412248750805e843d198a3be4101 [20:39:46] (03CR) 10jenkins-bot: [V: 04-1] Gerrit: Enable logging for jvm gc [puppet] - 10https://gerrit.wikimedia.org/r/317582 (https://phabricator.wikimedia.org/T148478) (owner: 10Paladox) [20:39:46] Eh, we don't need multiple options :) [20:39:51] I like it my way better [20:39:55] Ok [20:39:56] :) [20:40:36] (03PS5) 10Chad: Gerrit: Enable logging for jvm gc [puppet] - 10https://gerrit.wikimedia.org/r/317582 (https://phabricator.wikimedia.org/T148478) (owner: 10Paladox) [20:40:44] 06Operations, 10ops-codfw, 06DC-Ops, 06Discovery, and 2 others: elastic2020 is powered off and does not want to restart - https://phabricator.wikimedia.org/T149006#2739503 (10RobH) I'd suggest fully removing all power from the system (pulling the power cords) and allowing a full power loss event. This wil... [20:41:41] your way is better :) [20:42:23] Anyway, I don't have time to babysit that today or analyze any logs. 
[20:42:45] !log updated Parsoid to version 63f1e151 (T139032, T146612, T141905) [20:42:48] T141905: Parsoid crashes from logs - https://phabricator.wikimedia.org/T141905 [20:42:48] T146612: Create Livvi-Karelian Wikipedia at olo.wikipedia.org - https://phabricator.wikimedia.org/T146612 [20:42:48] T139032: Close wikimania2015wiki - https://phabricator.wikimedia.org/T139032 [20:42:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:44:28] !log deploying 0caa589 on ores canary node [20:44:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:49:30] !log deploying 0caa589 to all ores nodes [20:49:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:52:44] PROBLEM - ores on scb1002 is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error - 136 bytes in 0.015 second response time [20:53:59] ores is up, icinga acted crazy [20:56:44] Agreed and confirmed [20:59:38] I get too much of internal errors [20:59:43] rollbacking [21:00:04] dapatrick and bawolff: Dear anthropoid, the time has come. Please deploy Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161024T2100). [21:00:55] PROBLEM - ores on scb1001 is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error - 136 bytes in 0.001 second response time [21:01:49] !log rollbacking ores to 8bbd3ab [21:01:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:03:13] RECOVERY - ores on scb1002 is OK: HTTP OK: HTTP/1.0 200 OK - 2822 bytes in 0.011 second response time [21:03:28] RECOVERY - ores on scb1001 is OK: HTTP OK: HTTP/1.0 200 OK - 2822 bytes in 0.030 second response time [21:04:57] the rollback is done [21:05:09] I need to check why the logging part failed [21:05:12] tomorrow [21:05:15] o/ [21:13:04] logging failed?? 
[21:13:45] 06Operations, 10DNS, 10Domains, 10Traffic, and 2 others: Point wikipedia.in to 180.179.52.130 instead of URL forward - https://phabricator.wikimedia.org/T144508#2739847 (10CRoslof) If @Naveenpf or #operations is still interested in pursuing this task's specific request, I'd appreciate an answer to my clari... [21:17:01] (03PS1) 10Kaldari: Switch Norwegian Wikipedia to uca-no-u-kn category collation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317652 (https://phabricator.wikimedia.org/T146675) [21:25:02] (03PS1) 10Chad: gerrit (2.12.5-wmf.1) jessie-wikimedia; urgency=low [debs/gerrit] - 10https://gerrit.wikimedia.org/r/317654 (https://phabricator.wikimedia.org/T143089) [21:27:13] RECOVERY - puppet last run on notebook1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:28:14] (03CR) 10Paladox: [C: 031] gerrit (2.12.5-wmf.1) jessie-wikimedia; urgency=low [debs/gerrit] - 10https://gerrit.wikimedia.org/r/317654 (https://phabricator.wikimedia.org/T143089) (owner: 10Chad) [21:28:22] (03CR) 10Chad: "Mukunda: Why are we doing this? Without better profiling into our GC situation I don't want to potentially exhaust the heap. All this does" [puppet] - 10https://gerrit.wikimedia.org/r/317322 (https://phabricator.wikimedia.org/T148478) (owner: 10Paladox) [21:29:34] RECOVERY - puppet last run on notebook1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:33:44] PROBLEM - puppet last run on elastic1039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [21:38:29] (03PS1) 10Chad: Remove bits.wm.o from Apache vhosts [puppet] - 10https://gerrit.wikimedia.org/r/317656 [21:38:35] Heh ^ [21:40:41] didn't we already have one of those? 
[21:40:53] (03PS1) 10Chad: Remove bits docroot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317657 [21:41:33] I dunno I didn't look [21:42:06] i know this is not ops (more then likely) related but does anyone know that if wm-bot has +o on it in a chan will it give itself op if it needs it for something? [21:43:18] ostriches, https://gerrit.wikimedia.org/r/#/c/305536/ [21:43:38] !log rolling-restart of ms-fe in codfw/eqiad for kernel update [21:43:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:43:46] Zppix: wm-bot has nothing to do with this channel [21:43:47] (03Abandoned) 10Chad: Remove bits.wm.o from Apache vhosts [puppet] - 10https://gerrit.wikimedia.org/r/317656 (owner: 10Chad) [21:43:54] you will need to ask it maintainers [21:44:06] p858snake|L2 no-one at wm-bots chan is replying [21:44:11] but that being said, it can't op itself if its not on the access list [21:44:19] (03PS2) 10Chad: Remove bits docroot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317657 [21:44:20] but if it is? [21:44:29] Zppix: many timezones, people busy etc etc [21:44:50] 06Operations, 06Analytics-Kanban, 06Discovery, 06Discovery-Analysis (Current work), and 2 others: Can't install R package Boom (& bsts) on stat1002 (but can on stat1003) - https://phabricator.wikimedia.org/T147682#2740149 (10Ottomata) Crap crackers. I just added bts and boom to our apt, but alas, there ar... [21:55:44] RECOVERY - swift-account-replicator on ms-be1015 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [21:58:55] RECOVERY - puppet last run on elastic1039 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [22:07:34] (03CR) 10ArielGlenn: "Do you mind using PrintGCDateStamps instead? 
This gives regular date/time values whereas PrintGCTimeStamps prints time elapsed from start " [puppet] - 10https://gerrit.wikimedia.org/r/317582 (https://phabricator.wikimedia.org/T148478) (owner: 10Paladox) [22:09:17] ostriches ^^ [22:10:43] (03PS6) 10Paladox: Gerrit: Enable logging for jvm gc [puppet] - 10https://gerrit.wikimedia.org/r/317582 (https://phabricator.wikimedia.org/T148478) [22:25:03] (03CR) 10Chad: [C: 031] Strip out branch HEAD in git.wikimedia.org tree link [puppet] - 10https://gerrit.wikimedia.org/r/302747 (https://phabricator.wikimedia.org/T141965) (owner: 10Paladox) [22:25:28] (03PS7) 10Paladox: Strip out branch HEAD in git.wikimedia.org tree link [puppet] - 10https://gerrit.wikimedia.org/r/302747 (https://phabricator.wikimedia.org/T141965) [22:26:34] (03PS16) 10Alexandros Kosiaris: icinga: Remove event_profiling_enabled [puppet] - 10https://gerrit.wikimedia.org/r/315085 [22:26:36] (03PS13) 10Alexandros Kosiaris: Replace neon with einsteinium where applicable [puppet] - 10https://gerrit.wikimedia.org/r/315257 [22:26:38] (03PS13) 10Alexandros Kosiaris: Remove absented /etc/icinga/puppet_hostextinfo.cfg entry [puppet] - 10https://gerrit.wikimedia.org/r/315244 [22:26:40] (03PS1) 10Alexandros Kosiaris: role::tcpircbot: Add an ensure parameter [puppet] - 10https://gerrit.wikimedia.org/r/317665 [22:28:23] PROBLEM - puppet last run on ms-fe3001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:30:19] 06Operations, 06Labs, 10Labs-Infrastructure: Designate seems very slow to delete records? - https://phabricator.wikimedia.org/T149057#2740397 (10AlexMonk-WMF) [22:35:15] 06Operations, 10ops-eqiad, 10media-storage: diagnose failed disks on ms-be1027 - https://phabricator.wikimedia.org/T140374#2740411 (10fgiunchedi) Tried a reinstall, though only some spinning disks are seen and no SSDs (all 3TB sizes reported) ``` ~ # cat /proc/partitions major minor #blocks name 8... [22:44:49] gwicke, around?
[22:46:14] RECOVERY - puppet last run on ms-fe3001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:50:18] (03CR) 10Dereckson: "uca-no-u-kn is a new collation" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317652 (https://phabricator.wikimedia.org/T146675) (owner: 10Kaldari) [22:53:23] (03CR) 10Dereckson: "Ready for deployment or need https://gerrit.wikimedia.org/r/317004 before?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317652 (https://phabricator.wikimedia.org/T146675) (owner: 10Kaldari) [22:54:21] (03CR) 10Dereckson: [C: 04-1] "Still in use:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317574 (owner: 10DCausse) [23:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161024T2300). [23:00:05] ebernhardson and kaldari: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [23:00:56] \o [23:00:57] available! [23:01:04] Hello, I can SWAT this evening. [23:01:17] ebernhardson: CirrusSearch-common.php still uses wmgCirrusSearchClusterOverrides [23:01:23] this night *=) [23:01:47] kaldari: there is a leading zero issue, with a fix available in master at https://gerrit.wikimedia.org/r/317004 [23:02:03] Dereckson: it should, there are accidentally two entries in InitialiseSettings.php [23:02:06] kaldari: you don't need to cherry pick that before the new deployment? [23:02:14] ebernhardson: ack'ed [23:03:11] Dereckson: No, that only affect the "numeric" collation, not the "uca-xx" collations. [23:03:15] affects [23:03:27] or "uca-xx-u-kn" collations [23:04:05] Dereckson: but thanks for checking! [23:04:11] (03CR) 10Dereckson: [C: 031] "The duplicate was in IS, not wg/wmg." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/317574 (owner: 10DCausse) [23:04:33] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317574 (owner: 10DCausse) [23:05:11] (03Merged) 10jenkins-bot: Remove duplicated wmgCirrusSearchClusterOverrides entry [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317574 (owner: 10DCausse) [23:07:05] ebernhardson: live on mw1099 [23:07:40] !log Deployed patch for T148600 to wmf22 [23:07:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:07:49] Dereckson: all looks pretty sane [23:13:13] logs look good too [23:14:00] !log dereckson@mira Synchronized wmf-config/InitialiseSettings.php: Remove duplicated wmgCirrusSearchClusterOverrides entry (duration: 00m 50s) [23:14:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:14:59] (03CR) 10Dereckson: "On #wikimedia-operations, Kaldari noted change 317004 is only for numeric collation, but not uca-xx-u-kn collations." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317652 (https://phabricator.wikimedia.org/T146675) (owner: 10Kaldari) [23:15:03] (03PS2) 10Dereckson: Switch Norwegian Wikipedia to uca-no-u-kn category collation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317652 (https://phabricator.wikimedia.org/T146675) (owner: 10Kaldari) [23:15:18] Dereckson: looks to be doing as expected, completion traffic is shifting to the eqiad cluster [23:16:26] good [23:17:23] is rcstream down? 
[23:17:30] im not able to connect to it right now [23:17:56] jdlrobson: it is not [23:18:00] hmm weird [23:19:52] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317652 (https://phabricator.wikimedia.org/T146675) (owner: 10Kaldari) [23:20:20] (03Merged) 10jenkins-bot: Switch Norwegian Wikipedia to uca-no-u-kn category collation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317652 (https://phabricator.wikimedia.org/T146675) (owner: 10Kaldari) [23:21:05] kaldari: live on mw1099, check category page doesn't throw a fatal exception as the cat hasn't been used in config yet [23:21:16] checking ... [23:21:17] (03PS1) 10Filippo Giunchedi: icinga: default duration for icinga-downtime [puppet] - 10https://gerrit.wikimedia.org/r/317720 [23:21:19] (03PS1) 10Filippo Giunchedi: icinga: also schedule host services downtime [puppet] - 10https://gerrit.wikimedia.org/r/317721 [23:22:08] (03CR) 10Filippo Giunchedi: "I'm not 100% sure this is needed, but I've seen cases where host downtime alone wasn't sufficient" [puppet] - 10https://gerrit.wikimedia.org/r/317721 (owner: 10Filippo Giunchedi) [23:22:27] Dereckson: Looks good at https://no.wikipedia.org/wiki/Kategori:Actionkomedier_fra_USA with the header set to mw1099, feel free to sync. [23:23:48] Logs look good too. 
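[Editor's note] A hedged sketch of how the new collation could be spot-checked from the command line. The API parameters are standard MediaWiki query options; this exact check is an assumption on my part, not the procedure Dereckson and kaldari actually used for the mw1099 verification above.

```shell
# List members of the category checked above together with their sortkey
# prefixes, so the uca-no-u-kn ordering can be eyeballed in the JSON output.
curl -s 'https://no.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Kategori:Actionkomedier_fra_USA&cmprop=sortkeyprefix%7Ctitle&format=json'
```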
[23:24:32] !log ^&& JOIN #wikimedia-ayuda &&^ [23:24:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:24:40] !log dereckson@mira Synchronized wmf-config/InitialiseSettings.php: Set collation to uca-no-u-kn on no.wikipedia (146675, T148488) (duration: 00m 50s) [23:24:41] !log ^&& JOIN #wikimedia-ayuda &&^ [23:24:42] T148488: Figure out if no.wikipedia.org wants UCA - https://phabricator.wikimedia.org/T148488 [23:24:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:24:48] I got the SAL [23:25:04] Someone get twitter [23:26:00] someone kill twitter* [23:28:01] Can we not make !log available to people with cloaks only? [23:30:21] last Friday: [23:30:21] [06:02:54] _joe_: morning, is there a ticket to whitelist the bot users ? [23:30:21] [06:05:32] <_joe_> matanya: nope [23:30:21] [06:05:52] _joe_: you think we need it ? [23:30:21] [06:06:53] <_joe_> matanya: absolutely and surely not. I just want to unlink the damn twitter feed [23:33:00] last Thursday: [23:33:02] [06:06:11] <_joe_> AlexZ: we don't want this channel to be +r even temporarily [23:33:02] [06:06:28] <_joe_> you should just stop giving this troll so much attention, that's what they crave [23:33:02] [06:06:46] I'm not, they're spamming over 20 channels. 
[23:33:02] [06:07:13] <_joe_> well, this one, we can manage [23:33:02] [06:07:27] and letting logmsgbot continue working without an ACL is encouraging him [23:33:03] [06:07:34] <_joe_> I'm asking myself why the hell logmsgbot is still linked to twitter [23:33:03] [06:07:41] and twitter [23:33:03] [06:07:48] <_joe_> AlexZ: twitter I agree [23:33:04] [06:07:52] <_joe_> I said it repeatedly [23:33:04] [06:08:19] For some users, like AlvaroMolina this harassment [23:33:04] [06:08:21] <_joe_> but, as a part of the ops team, I have two requirements for this channel: 1) that unregistered users can freely join [23:33:06] [06:08:53] <_joe_> 2) that anyone and not just people on a whitelist can register things to the SAL [23:42:27] Zppix|Mobile , Zppix ^ [23:42:28] I didn't even know logmsgbot had a twitter output heh [23:42:41] (03PS1) 10Filippo Giunchedi: role: add prometheus ipv6 ferm rules to varnish_exporter [puppet] - 10https://gerrit.wikimedia.org/r/317723 [23:42:42] I'll clean the twitter feed [23:42:45] bblack: no, it's morebots [23:42:47] fascinating [23:43:18] Looks like only one made it and I deleted it [23:43:18] https://twitter.com/wikimediatech [23:43:21] (03CR) 10Filippo Giunchedi: [C: 032] role: add prometheus ipv6 ferm rules to varnish_exporter [puppet] - 10https://gerrit.wikimedia.org/r/317723 (owner: 10Filippo Giunchedi) [23:44:19] Dereckson [23:44:19] arseny92: ? [23:44:19] Vai2fc_ , describe what you need changed [23:44:22] uh netsplit, but we're all here still luckily [23:45:14] hi all, I have been asked by a colleague to see if it is possible to get the daily limit on new accounts lifted for the wikipedia edit-a-thon we're holding tomorrow at Vanderbilt libraries. I know it's a last-minute request, but am currently in the process of making a ticket in phabricator. [23:46:05] can anybody help me in getting this approved? I realize it's really late notice. :/ [23:47:29] Dereckson ^ [23:48:15] Hi Vai2fc_, sure, it is possible. 
[23:49:05] Dereckson - currently filling out the phabricator ticket and trying to get the IP address. [23:49:19] ok [23:49:42] Thanks for the help arseny92 and Dereckson! [23:49:46] Ask for IPv4 + IPv6 [23:53:00] also lucky is the fact that there are no pending deployment windows waiting to be processed, so I guess technically Evening SWAT can be prolonged? [23:54:43] arseny92: the window closes at 24:00/0:00 UTC (in 6 minutes) [23:54:46] arseny92: yes we're still in the evening SWAT window currently [23:54:59] so if the IP is available right now, there is still time to merge that [23:56:52] greg-g , evening swats don't usually have other windows following them, like on paper all is done for the day [23:57:04] Vai2fc_: at what time tomorrow is the event by the way? [23:57:45] currently trying to get the IP address, Dereckson. The event is from 11 AM to 4 PM CST. [23:57:52] arseny92: meet our director of release engineering: greg-g :) [23:58:25] but in practice I guess a window that is last for the day can take its time in case that's needed [23:58:38] Vai2fc_: arseny92: okay, we're not in a hurry and we don't need to merge that right now, there is an EU mid SWAT window tomorrow before 11 AM CST [23:58:58] Ah, thanks for the clarification, Dereckson. PHEW. [23:59:09] as it's unlikely we get the IP in 1 minute [23:59:25] and then also someone needs to prepare a change [23:59:31] Yeah - I live across from campus and sent my spouse across the street... [23:59:33] arseny92: not exactly. Many people sign off for the day and thus when the window is over we stop doing any SWAT deployments. The reason being that there could be an issue that needs more access/experience than the SWAT deployer has. 5pm in SF is quitting time. Period. :) [23:59:33] and merge it
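[Editor's note] For readers wondering what "lifting the daily limit on new accounts" involves: a hedged sketch of the usual shape of such a change, with the file name and sync tool assumed rather than confirmed by this log. A temporary exception covering the event's IP range and time window is added to the throttle config, and only that file is synced.

```shell
# Assumed workflow on the deployment host (mira at the time); names illustrative.
cd /srv/mediawiki-staging
$EDITOR wmf-config/throttle.php   # add an entry: from/to window, IP or range, raised limit
sync-file wmf-config/throttle.php 'Raise account creation throttle for edit-a-thon'
```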