[00:00:04] <jouncebot>	 addshore, hashar, anomie, no_justification, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: My dear minions, it's time we take the moon! Just kidding. Time for Evening SWAT (Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180124T0000).
[00:00:04] <jouncebot>	 MatmaRex: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[00:03:14] <MatmaRex>	 so, any deployers around?
[00:04:08] <wikibugs>	 (CR) Dzahn: "compiles now after https://gerrit.wikimedia.org/r/#/c/406001/" [puppet] - https://gerrit.wikimedia.org/r/405990 (owner: Dzahn)
[00:04:42] <wikibugs>	 (CR) Dzahn: [C: 2] wmcs::puppetmaster: move standard/firewall include to roles [puppet] - https://gerrit.wikimedia.org/r/405990 (owner: Dzahn)
[00:04:57] <wikibugs>	 (PS2) Dzahn: wmcs::puppetmaster: move standard/firewall include to roles [puppet] - https://gerrit.wikimedia.org/r/405990
[00:08:26] <urandom>	 !log bootstrapping restbase2007-a - T184100
[00:08:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:08:41] <stashbot>	 T184100: Reprovision legacy Cassandra nodes into new cluster - https://phabricator.wikimedia.org/T184100
[00:11:21] <MatmaRex>	 :(
[00:11:37] <MatmaRex>	 James_F: find me a deployer?
[00:13:03] <James_F>	 MatmaRex: Probably not. Bit chaötic here. :-(
[00:14:40] <James_F>	 MatmaRex: matt_flaschen says he'll do it.
[00:14:44] <James_F>	 (He's awesome.)
[00:16:03] <MatmaRex>	 whoo
[00:16:36] <matt_flaschen>	 Okay, reviewing patches now.
[00:17:27] <matt_flaschen>	 Just one.  Everyone must be busy somewhere (yeah, I'm also at Dev Summit).
[00:20:20] <wikibugs>	 (PS1) Dzahn: openstack::main: move standard/firewall includes to roles [puppet] - https://gerrit.wikimedia.org/r/406003
[00:21:13] <bd808>	 MatmaRex: are you only not a deployer yourself by choice?
[00:21:47] * bd808 feels like he has asked this before
[00:21:56] <MatmaRex>	 bd808: mostly yes
[00:27:19] <wikibugs>	 (PS2) Dzahn: openstack::main: move standard/firewall includes to roles [puppet] - https://gerrit.wikimedia.org/r/406003
[00:28:22] <wikibugs>	 (CR) Dzahn: [C: 2] "http://puppet-compiler.wmflabs.org/9811/" [puppet] - https://gerrit.wikimedia.org/r/406003 (owner: Dzahn)
[00:28:58] <icinga-wm>	 RECOVERY - Check systemd state on restbase1011 is OK: OK - running: The system is fully operational
[00:29:57] <wikibugs>	 (CR) Mobrovac: [C: -1] "One minor thing, otherwise LGTM" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/404705 (https://phabricator.wikimedia.org/T175284) (owner: Eevans)
[00:31:18] * matt_flaschen sighs
[00:31:25] <matt_flaschen>	 "800 | WARNING | Line exceeds 100 characters; contains 102 characters"
[00:31:28] <matt_flaschen>	 https://integration.wikimedia.org/ci/job/mediawiki-extensions-hhvm-jessie/33584/console
[00:31:50] <matt_flaschen>	 ^ James_F
[00:32:07] <icinga-wm>	 PROBLEM - Check systemd state on restbase1011 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[00:32:31] <matt_flaschen>	 I guess the rule is different on master, or recently changed.
[00:32:48] <James_F>	 That's… pretty unhelpful.
[00:32:53] <MatmaRex>	 matt_flaschen: uhh, no
[00:32:54] <James_F>	 MatmaRex: ^^^
[00:33:00] <_joe_>	 anyone doing things on rb1011?
[00:33:03] <MatmaRex>	 matt_flaschen: that looks like the error output for a different patch
[00:33:39] <MatmaRex>	 we're not even touching that file
[00:34:56] <James_F>	 Yeah, that's the failure for 405993.
[00:35:07] <MatmaRex>	 matt_flaschen: https://gerrit.wikimedia.org/r/#/c/405933/ just merged
[00:35:07] <James_F>	 The SWAT is for 405933.
[00:35:09] <matt_flaschen>	 MatmaRex, oops, sorry.
[00:35:15] <James_F>	 Double-3 not double-9. :-)
[00:39:58] <logmsgbot>	 !log mobrovac@tin Started deploy [zotero/translators@8f53531]: Update translators to 528296d
[00:40:06] <logmsgbot>	 !log mobrovac@tin Finished deploy [zotero/translators@8f53531]: Update translators to 528296d (duration: 00m 08s)
[00:40:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:40:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:44:00] <matt_flaschen>	 James_F, please test: https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug#Staging_changes
[00:44:04] <matt_flaschen>	 On mwdebug1002
[00:44:21] <James_F>	 MatmaRex: ^^
[00:44:55] <MatmaRex>	 ooking
[00:44:58] <MatmaRex>	 looking*
[00:51:42] <MatmaRex>	 please hold on. firefox's debugger is making me cry
[00:54:21] <MatmaRex>	 matt_flaschen: are you sure the code is live?
[00:55:21] <matt_flaschen>	 MatmaRex, only on mwdebug1002. ^
[00:55:44] <MatmaRex>	 matt_flaschen: i'm looking at https://en.wikipedia.org/w/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.DesktopArticleTarget.init.js on mwdebug1002 and it's showing the old code
[00:56:23] <MatmaRex>	 it doesn't have the "Support: Firefox =< 52" line
[00:56:29] <matt_flaschen>	 MatmaRex, ahg, I forgot to update the submodule.  Really sorry.
[00:56:31] <MatmaRex>	 matt_flaschen: are you seeing the new version there?
[00:56:51] <MatmaRex>	 aha. no problem :)
[00:57:09] <James_F>	 MatmaRex: matt_flaschen was just making sure you were actually testing. ;-)
[00:58:57] <matt_flaschen>	 MatmaRex, it's really on mwdebug1002 now.  Sorry again.
[00:59:24] <MatmaRex>	 matt_flaschen: yeah. works now :)
[01:02:41] <logmsgbot>	 !log mattflaschen@tin Synchronized php-1.31.0-wmf.17/extensions/VisualEditor/: (no justification provided) (duration: 00m 58s)
[01:02:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:05:54] <MatmaRex>	 thanks matt_flaschen
[01:08:53] <matt_flaschen>	 MatmaRex, no problem.  Please test without mwdebug1002.
[01:09:38] <MatmaRex>	 i did and it's working
[01:09:47] <matt_flaschen>	 MatmaRex, thanks.
[01:10:00] <matt_flaschen>	 !log Deployed 'T185304: NWE: Don't attempt to set selection on unattached textarea' in extensions/VisualEditor
[01:10:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:10:12] <stashbot>	 T185304: NWE doesn't load in Firefox ESR - https://phabricator.wikimedia.org/T185304
[01:14:53] <matt_flaschen>	 !log SWAT complete
[01:15:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:16:31] <James_F>	 matt_flaschen: You're awesome, thank you.
[01:16:59] <matt_flaschen>	 James_F, you're welcome.
[01:25:18] <wikibugs>	 (PS1) Ayounsi: Assigning eqsin PCCW peering IPs [dns] - https://gerrit.wikimedia.org/r/406011
[01:27:29] <wikibugs>	 (CR) Ayounsi: [C: 2] Assigning eqsin PCCW peering IPs [dns] - https://gerrit.wikimedia.org/r/406011 (owner: Ayounsi)
[02:22:45] <logmsgbot>	 !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.17) (duration: 05m 33s)
[02:22:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:25:37] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 769.42 seconds
[03:56:47] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 207.84 seconds
[04:11:49] <wikibugs>	 (Draft2) Jayprakash12345: Allow bureaucrats to add/remove 'accountcreator' permission on Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/406012 (https://phabricator.wikimedia.org/T185597)
[04:12:56] <wikibugs>	 (CR) Jayprakash12345: "Please review deeply." [mediawiki-config] - https://gerrit.wikimedia.org/r/406012 (https://phabricator.wikimedia.org/T185597) (owner: Jayprakash12345)
[04:57:27] <icinga-wm>	 PROBLEM - Disk space on labtestnet2001 is CRITICAL: DISK CRITICAL - free space: / 350 MB (3% inode=61%)
[06:18:57] <icinga-wm>	 RECOVERY - cassandra-a CQL 10.192.16.176:9042 on restbase2007 is OK: TCP OK - 0.036 second response time on 10.192.16.176 port 9042
[06:24:54] <wikibugs>	 (CR) Brian Wolff: "Wouldn't url-downloader.wikimedia.org block tool labs, or is my knowledge on how this all works outdated?" [mediawiki-config] - https://gerrit.wikimedia.org/r/404653 (https://phabricator.wikimedia.org/T185087) (owner: Aklapper)
[06:26:20] <urandom>	 !log bootstrapping restbase2007-b - T184100
[06:26:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:26:34] <stashbot>	 T184100: Reprovision legacy Cassandra nodes into new cluster - https://phabricator.wikimedia.org/T184100
[06:27:27] <icinga-wm>	 RECOVERY - cassandra-b SSL 10.192.16.177:7001 on restbase2007 is OK: SSL OK - Certificate restbase2007-b valid until 2018-08-17 16:12:09 +0000 (expires in 205 days)
[06:36:08] <icinga-wm>	 PROBLEM - puppet last run on restbase-dev1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:37:07] <icinga-wm>	 PROBLEM - Disk space on labtestnet2001 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=61%)
[06:46:17] <icinga-wm>	 PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=upload&var-status_type=5
[06:47:17] <icinga-wm>	 PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5
[07:11:37] <icinga-wm>	 PROBLEM - HHVM rendering on mw1288 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.002 second response time
[07:11:57] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1288 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.006 second response time
[07:12:08] <icinga-wm>	 PROBLEM - Apache HTTP on mw1288 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time
[07:12:37] <icinga-wm>	 RECOVERY - HHVM rendering on mw1288 is OK: HTTP OK: HTTP/1.1 200 OK - 74539 bytes in 0.188 second response time
[07:12:57] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1288 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.044 second response time
[07:13:08] <icinga-wm>	 RECOVERY - Apache HTTP on mw1288 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.215 second response time
[07:19:37] <icinga-wm>	 PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve selected the events for Jan 01) timed out before a response was received
[07:21:27] <icinga-wm>	 RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy
[07:27:38] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 26 probes of 290 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[07:32:37] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 13 probes of 290 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[07:49:58] <icinga-wm>	 PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve selected the events for Jan 01) timed out before a response was received
[07:50:57] <icinga-wm>	 RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy
[08:03:02] <elukey>	 cp4024 seems to be throwing 503s.. it happens daily now in ulsfo
[08:14:37] <icinga-wm>	 PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=upload&var-status_type=5
[08:14:37] <icinga-wm>	 PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5
[08:16:32] <ema>	 !log cp4024: restart varnish-be due to 503s
[08:16:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:17] <icinga-wm>	 RECOVERY - Check Varnish expiry mailbox lag on cp4024 is OK: OK: expiry mailbox lag is 0
[08:28:38] <icinga-wm>	 RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=upload&var-status_type=5
[08:29:38] <icinga-wm>	 RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5
[08:34:14] <wikibugs>	 (CR) TerraCodes: [C: 1] Allow bureaucrats to add/remove 'accountcreator' permission on Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/406012 (https://phabricator.wikimedia.org/T185597) (owner: Jayprakash12345)
[08:37:43] <wikibugs>	 (PS49) TerraCodes: $wmf* -> $wmg* [mediawiki-config] - https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956)
[10:17:33] <wikibugs>	 (PS1) Jon Harald Søby: Set category collation for nowikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/406022 (https://phabricator.wikimedia.org/T185630)
[10:19:23] <wikibugs>	 (CR) Jon Harald Søby: "When this is merged, the deployer needs to run the following command:" [mediawiki-config] - https://gerrit.wikimedia.org/r/406022 (https://phabricator.wikimedia.org/T185630) (owner: Jon Harald Søby)
[11:27:14] <Hauskatze>	 marostegui: jynus -- I'm about to perform a bigdelete on dewiki, page has 5 088 revids; be advised
[11:28:12] <Hauskatze>	 I'll use API, it's faster
[11:49:54] <wikibugs>	 (PS1) MarcoAurelio: Bureaucrats on WMF wikis to add and remove 'accountcreator' by default [mediawiki-config] - https://gerrit.wikimedia.org/r/406025 (https://phabricator.wikimedia.org/T185417)
[12:33:27] <icinga-wm>	 PROBLEM - puppet last run on labtestnet2001 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago
[12:57:49] <wikibugs>	 (PS4) Paladox: ircecho: Support ssl when connecting to irc [puppet] - https://gerrit.wikimedia.org/r/405591
[13:07:45] <wikibugs>	 (PS5) Paladox: ircecho: Support ssl when connecting to irc [puppet] - https://gerrit.wikimedia.org/r/405591
[13:07:59] <wikibugs>	 (CR) Paladox: ircecho: Support ssl when connecting to irc (5 comments) [puppet] - https://gerrit.wikimedia.org/r/405591 (owner: Paladox)
[13:08:16] <wikibugs>	 (CR) jerkins-bot: [V: -1] ircecho: Support ssl when connecting to irc [puppet] - https://gerrit.wikimedia.org/r/405591 (owner: Paladox)
[13:09:42] <wikibugs>	 (CR) Paladox: ircecho: Support ssl when connecting to irc (1 comment) [puppet] - https://gerrit.wikimedia.org/r/405591 (owner: Paladox)
[13:12:32] <wikibugs>	 (PS6) Paladox: ircecho: Support ssl when connecting to irc [puppet] - https://gerrit.wikimedia.org/r/405591
[13:12:59] <wikibugs>	 (CR) jerkins-bot: [V: -1] ircecho: Support ssl when connecting to irc [puppet] - https://gerrit.wikimedia.org/r/405591 (owner: Paladox)
[13:17:17] <wikibugs>	 (PS7) Paladox: ircecho: Support ssl when connecting to irc [puppet] - https://gerrit.wikimedia.org/r/405591
[13:42:04] <wikibugs>	 Operations, Performance-Team, Traffic: load.php response taking 160s (of which only 0.031s in Apache) - https://phabricator.wikimedia.org/T181315#3921750 (Gilles) Revisiting[[ https://logstash.wikimedia.org/goto/788ca720a38ccbed8dab29adab7ac2ca |  the link I posted previously ]], graph for the past 2...
[13:42:13] <wikibugs>	 Operations, Performance-Team, Traffic: load.php response taking 160s (of which only 0.031s in Apache) - https://phabricator.wikimedia.org/T181315#3921751 (Gilles) a:Gilles>None
[13:42:57] <icinga-wm>	 RECOVERY - cassandra-b CQL 10.192.16.177:9042 on restbase2007 is OK: TCP OK - 0.036 second response time on 10.192.16.177 port 9042
[14:00:04] <jouncebot>	 addshore, hashar, anomie, no_justification, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: I, the Bot under the Fountain, allow thee, The Deployer, to do European Mid-day SWAT(Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180124T1400).
[14:00:04] <jouncebot>	 Jhs and Jayprakash12345: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:03:26] <Jayprakash12345>	 zeljkof: Hello
[14:09:30] <Jayprakash12345>	 Who will swat?
[14:15:59] <Jhs>	 ahoy, i'm here for swat if i'm not too late
[14:16:07] <Jhs>	 my IRC client was acting up and i didn't notice until now
[14:17:05] <Jayprakash12345>	 me too
[14:17:42] <Jhs>	 let's ping zeljkof for good measure ;)
[14:19:13] <Jayprakash12345>	 zeljkof: Hello
[14:19:42] <Jhs>	 hmm, i think he might be in San Francisco for the all-hands thing, so probably not awake
[14:20:46] <Jhs>	 aude, perhaps?
[14:25:32] <Jayprakash12345>	 hashar: are you here?
[14:31:26] <Jayprakash12345>	 MaxSem: ping?
[14:32:09] <Jayprakash12345>	 aude: ping?
[14:32:50] <Jayprakash12345>	 no_justification: ping?
[14:39:29] <apergos>	 6:40 am, it's going to be a hard sell to find sf folks on line now
[15:29:42] <_joe_>	 apergos: ping
[15:29:45] <_joe_>	 :P
[15:30:25] <apergos>	 _joe_: ponnngggg
[15:32:28] <wikibugs>	 Operations, media-storage: upload.wikimedia.org reports wrong mimetype for svg - https://phabricator.wikimedia.org/T179787#3921821 (zhuyifei1999)
[15:34:43] <no_justification>	 I need to get off the 6am swat, I never ever do it (if I'm even awake)
[15:35:25] <no_justification>	 Good thing I sleep with my phone on mute
[15:50:57] <wikibugs>	 (CR) Aklapper: "@Brian: I don't get the question and how it is related, sorry. Could you elaborate?" [mediawiki-config] - https://gerrit.wikimedia.org/r/404653 (https://phabricator.wikimedia.org/T185087) (owner: Aklapper)
[16:10:48] <greg-g>	 no_justification: to simplify tracking we just have it ping every swatter each slot :)
[16:11:23] <no_justification>	 Maybe I should just resign as a swatter 😉
[16:11:46] * greg-g glares
[16:11:48] <greg-g>	 :)
[16:14:58] <no_justification>	 greg-g: well I'm usually last to volunteer anyway 😂
[16:48:58] <wikibugs>	 Operations, ops-eqiad, Analytics-Kanban: BBU alarms flapping for analytics1038 - https://phabricator.wikimedia.org/T185409#3921916 (RobH) This is an older R720xd, and uses an older H710 controller.  While @Cmjohnson can check for a spare when back onsite, there is a good chance we don't have any.  If...
[16:59:51] <wikibugs>	 Operations, Phabricator: Switch phabricator from using apache to nginx - https://phabricator.wikimedia.org/T185644#3921927 (Paladox)
[17:01:30] <wikibugs>	 Operations, Gerrit: Switch gerrit from using apache to nginx - https://phabricator.wikimedia.org/T185645#3921946 (Paladox)
[17:08:26] <wikibugs>	 Operations, Phabricator: Switch phabricator from using apache to nginx - https://phabricator.wikimedia.org/T185644#3921962 (Paladox)
[17:08:35] <wikibugs>	 Operations, Gerrit: Switch gerrit from using apache to nginx - https://phabricator.wikimedia.org/T185645#3921963 (Paladox)
[17:15:34] <icinga-wm>	 ACKNOWLEDGEMENT - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage returned the unexpected status 500 (expecting: 200): /en.wikipedia.org/v1/media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200): /en.wikipedia
[17:15:34] <icinga-wm>	 title} (Get html by title from storage) is CRITICAL: Test Get html by title from storage returned the unexpected status 500 (expecting: 200): /en.wikipedia.org/v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve selected the events for Jan 01) is CRITICAL: Test Retrieve selected the events for Jan 01 returned the unexpected status 500 (expecting: 200): /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) is CRITICAL
[17:15:34] <icinga-wm>	 itle from storage returned the unexpected status 500 (expecting: 200): /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retri eevans Not production
[17:15:34] <icinga-wm>	 ACKNOWLEDGEMENT - restbase endpoints health on restbase-dev1005 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage returned the unexpected status 500 (expecting: 200): /en.wikipedia.org/v1/media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200): /en.wikipedia
[17:15:34] <icinga-wm>	 title} (Get html by title from storage) is CRITICAL: Test Get html by title from storage returned the unexpected status 500 (expecting: 200): /en.wikipedia.org/v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve selected the events for Jan 01) is CRITICAL: Test Retrieve selected the events for Jan 01 returned the unexpected status 500 (expecting: 200): /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) is CRITICAL
[17:15:34] <icinga-wm>	 itle from storage returned the unexpected status 500 (expecting: 200): /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retri eevans Not production
[17:15:34] <icinga-wm>	 ACKNOWLEDGEMENT - MD RAID on restbase-dev1006 is CRITICAL: CRITICAL: State: degraded, Active: 11, Working: 11, Failed: 1, Spare: 0 eevans Not production
[17:15:35] <icinga-wm>	 ACKNOWLEDGEMENT - cassandra-a CQL 10.64.48.168:9042 on restbase-dev1006 is CRITICAL: connect to address 10.64.48.168 and port 9042: Connection refused eevans Not production
[17:15:35] <icinga-wm>	 ACKNOWLEDGEMENT - puppet last run on restbase-dev1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues eevans Not production
[17:15:36] <icinga-wm>	 ACKNOWLEDGEMENT - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage returned the unexpected status 500 (expecting: 200): /en.wikipedia.org/v1/media/math/check/{type} (Mathoid - check test formula) is CRITICAL: Test Mathoid - check test formula returned the unexpected status 500 (expecting: 200): /en.wikipedia
[17:15:36] <icinga-wm>	 title} (Get html by title from storage) is CRITICAL: Test Get html by title from storage returned the unexpected status 500 (expecting: 200): /en.wikipedia.org/v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve selected the events for Jan 01) is CRITICAL: Test Retrieve selected the events for Jan 01 returned the unexpected status 500 (expecting: 200): /en.wikipedia.org/v1/page/title/{title} (Get rev by title from storage) is CRITICAL
[17:15:37] <icinga-wm>	 itle from storage returned the unexpected status 500 (expecting: 200): /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retri eevans Not production
[17:16:00] <icinga-wm>	 ACKNOWLEDGEMENT - IPsec on mc1036 is CRITICAL: Strongswan CRITICAL - ok: 0 not-conn: mc2036_v4 Giuseppe Lavagetto mc2036 is down for hardware repair, the IPSEC is going to be broken until that comes up again.
[17:19:32] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on restbase1011 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. eevans Investigating error (but cassandra-metrics-collector is not in use, and its failure isn't degrading)
[17:22:46] <wikibugs>	 (Abandoned) MarcoAurelio: Add gwtoolset to GlobalGroupPermissions [mediawiki-config] - https://gerrit.wikimedia.org/r/395806 (owner: MarcoAurelio)
[17:43:40] <wikibugs>	 (PS13) Zoranzoki21: Enable Extension:Newsletter on hewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/381537 (https://phabricator.wikimedia.org/T177151)
[17:51:18] <wikibugs>	 (PS5) MarcoAurelio: Remove upload rights on wikis where local uploads are disabled [mediawiki-config] - https://gerrit.wikimedia.org/r/405421 (https://phabricator.wikimedia.org/T143789)
[17:54:00] <Amir1>	 gerrit seems to be broken
[17:54:06] <Amir1>	 can't do "git review"
[17:54:19] <Amir1>	 same for other people in the office
[17:54:49] <paladox>	 Amir1 what's the error please?
[17:55:57] <bd808>	 Amir1: I was wondering if it was just me and the weird wifi I'm on.
[17:55:59] <paladox>	 Amir1 are you using git-review version 1.26.0?
[17:56:25] <paladox>	 You need to be on 1.26.0 to prevent a bug from coming up where it does /changes/ instead of /r/changes/
[17:57:05] <bd808>	 It took a *very* long time just to do a clone of a very small new repo
[17:57:25] <paladox>	 That would be your internet connection :)
[17:57:49] <bd808>	 paladox: I'm not sure. direct ssh and traceroute are looking good
[17:57:58] <bd808>	 just git operations are slow
[17:58:05] <paladox>	 bd808 ah
[17:58:10] <paladox>	 operations is a very large repo
[17:58:21] <paladox>	 it is the biggest out of all repos, nearly tied with mw.
[17:59:02] <paladox>	 bd808 maybe we should gc it as it is slow for me too.
[17:59:04] <paladox>	 no_justification ^^
[17:59:26] <robh>	 huh, git pull still pending...
[17:59:38] <robh>	 (usually it answers within 30 seconds, now it's taking longer)
[17:59:40] <bd808>	 paladox: not the operations repo, the act of using git via ssh to clone and update. The exact repo I was hitting is https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/libs/ObjectFactory
[17:59:41] <paladox>	 Hmm it's not pulling for me.
[17:59:58] <robh>	 of course everyone went to git pull when amir reported the error so now it's really borked ;D
[18:00:02] <paladox>	 bd808 ah, well the operations repo isn't cloning for me. (updating i mean)
[18:00:04] <robh>	 no_justification: ^
[18:00:04] <jouncebot>	 addshore, hashar, anomie, no_justification, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: How many deployers does it take to do Morning SWAT (Max 8 patches) deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180124T1800).
[18:00:04] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[18:00:18] * paladox creates a task
[18:00:22] <no_justification>	 Don't bother
[18:00:23] <no_justification>	 I'll repack it
[18:00:26] <no_justification>	 Probably won't help
[18:00:30] <no_justification>	 It's a /big/ repo
[18:00:36] <robh>	 git pull is still just sitting for me heh
[18:00:42] <Jayprakash12345>	 I am here
[18:01:49] <paladox>	 no_justification it seems git cloning a small repo is not working.
[18:02:03] <robh>	 yeah git pull on operations/dns is also not working
[18:02:05] <robh>	 and it's a very tiny repo
[18:02:08] <no_justification>	 Well I'm already here, so still don't file a task yet
[18:02:08] <wikibugs>	 Operations, Gerrit: git cloning is not working with gerrit repos - https://phabricator.wikimedia.org/T185649#3922041 (Paladox)
[18:02:23] <robh>	 also just not git pull, not clone...
[18:02:24] <paladox>	 no_justification oh i just saw that i will close it then.
[18:02:35] <apergos>	 so should I not be attempting to do a regular pull right now? 
[18:02:43] <robh>	 apergos: i tried and it's not working
[18:02:47] <wikibugs>	 Operations, Gerrit: git cloning is not working with gerrit repos - https://phabricator.wikimedia.org/T185649#3922051 (Paladox) Open>Invalid Chad's dealing with it on irc.
[18:02:47] <apergos>	 right
[18:02:57] <Jayprakash12345>	 Who can SWAT today?
[18:03:34] <robh>	 git's not working so swat is likely going to get held up
[18:03:51] <robh>	 rephrase: our git/gerrit instance is having issues.  
[18:03:58] <no_justification>	 Cloned dns instantly
[18:04:04] <robh>	 huh
[18:04:24] <no_justification>	 Y'all using SSH?
[18:04:28] * no_justification doesn't
[18:04:35] <robh>	 yes
[18:04:43] <paladox>	 no_justification ssh 
[18:04:46] <no_justification>	 SSH pool might be exhausted.
[18:04:58] <no_justification>	 Would explain why my maintenance commands won't complete
[18:05:02] <robh>	 how do we go about fixing?
[18:05:04] <no_justification>	 And why puppet's not freaking out everywhere
[18:05:24] <no_justification>	 Well everyone should ctrl+c out of their clones they're doing for "testing"
[18:05:29] <no_justification>	 For starters ;-)
[18:06:59] <no_justification>	 Aaron has had a push open since Jan 18th.
[18:07:00] <no_justification>	 Wtf.
[18:07:04] <no_justification>	 Something got stuck
[18:08:18] <no_justification>	 Amir1: You've got 3 fetches open to Wikibase...?
[18:08:47] <no_justification>	 Er, or pushes. I always get upload-pack and receive-pack backwards
[18:12:21] <Jayprakash12345>	 Let me know when a SWAT member is ready
[18:14:17] <mutante>	 yea, also using ssh. i'm not going to try though, since you said stop testing :)
[18:14:25] <wikibugs>	 Operations, Gerrit: git cloning is not working with gerrit repos - https://phabricator.wikimedia.org/T185649#3922041 (Zoranzoki21) All is ok. For me in Serbia it works without problems.
[18:14:29] <mutante>	 didn't notice an issue earlier
[18:14:45] <no_justification>	 People should use https instead ;-)
[18:15:48] <mutante>	 i think i even changed that _from_ https to ssh because of some reason
[18:15:57] <mutante>	 ok
[18:16:00] <jynus>	 are the current issues fixed, should we kick it once ?
[18:16:17] <paladox>	 because of git-review 
[18:16:24] <mutante>	 yea, that
[18:16:26] <paladox>	 now that the issue is fixed in 1.26.0
[18:16:37] <mutante>	 i can change it back? yes, i am doing that
[18:16:47] <paladox>	 Ok.
[18:17:01] <wikibugs>	 Operations, Gerrit: git cloning is not working with gerrit repos - https://phabricator.wikimedia.org/T185649#3922073 (Paladox) Using ssh?
[18:17:30] <mutante>	  /r/ or /r/p/   heh
[18:17:44] <paladox>	 no_justification i wonder, do we want to increase sshd.threads ?
[18:17:57] <no_justification>	 Yes, but not blindly.
[18:18:02] <no_justification>	 I want to revisit all of the thread pool settings
[18:18:07] <no_justification>	 And base them in Science!
[18:18:39] <wikibugs>	 Operations, Gerrit: git cloning is not working with gerrit repos - https://phabricator.wikimedia.org/T185649#3922077 (Zoranzoki21) >>! In T185649#3922073, @Paladox wrote: > Using ssh?  With all available options for cloning (anon https, noanon https and ssh)
[18:19:09] <wikibugs>	 Operations, Gerrit: git cloning is not working with gerrit repos - https://phabricator.wikimedia.org/T185649#3922078 (Paladox) Hmm strange as it's affecting us and uk users over ssh.
[18:19:29] <paladox>	 Ok :)
[18:19:47] <wikibugs>	 10Operations, 10Gerrit: git cloning is not working with gerrit repos - https://phabricator.wikimedia.org/T185649#3922080 (10demon)
[18:20:47] <wikibugs>	 10Operations, 10Gerrit: git cloning is not working with gerrit repos - https://phabricator.wikimedia.org/T185649#3922085 (10Zoranzoki21) >>! In T185649#3922078, @Paladox wrote: > Hmm strange as it's affecting us and uk users over ssh.  Note: I using internet from provider telenor.
[18:21:20] <Jayprakash12345_>	 Git clone was not working in India some time ago. But now it is working
[18:23:31] <mutante>	 so to those who use ssh. in your .git/config  in a repo, a line with "url = ssh://" what you do is change the protocol to https://, remove the port number at the end and change the URL to add "/r/p/" before the repo name
[18:23:35] <mutante>	 example:     url = https://gerrit.wikimedia.org/r/p/operations/dns.git
[18:24:05] <wikibugs>	 10Operations, 10Gerrit: git cloning is not working with gerrit repos - https://phabricator.wikimedia.org/T185649#3922041 (10Lucas_Werkmeister_WMDE) We were experiencing the same problem in the Wikimedia Germany office, but it seems to be working again now (since 18:17 UTC).
[18:24:06] <mutante>	 and then it works fine
[18:24:45] <mutante>	 just dont forget to also remove that port :29418
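The remote URL rewrite mutante describes above (switch the protocol to https://, drop the :29418 port, add "/r/p/" before the repo name) can be sketched as a small Python helper. This is only an illustration and not something posted in the channel: the function name `ssh_to_https`, the `someuser@` login in the sample URL, and the choice to also drop that login are assumptions.

```python
from urllib.parse import urlsplit


def ssh_to_https(url: str, prefix: str = "/r/p") -> str:
    """Turn an ssh:// Gerrit remote into the anonymous https:// form.

    Hypothetical helper: the name and the default prefix are illustrative.
    """
    parts = urlsplit(url)
    if parts.scheme != "ssh":
        raise ValueError(f"not an ssh:// URL: {url!r}")
    # parts.hostname excludes both the "user@" login and the ":29418" port,
    # so rebuilding from it removes the port as described in the chat.
    return f"https://{parts.hostname}{prefix}{parts.path}"


if __name__ == "__main__":
    # "someuser" is a placeholder login, not taken from the log.
    print(ssh_to_https("ssh://someuser@gerrit.wikimedia.org:29418/operations/dns.git"))
```

The resulting string is what would go on the "url =" line in .git/config; the sample input maps to the example URL mutante gave.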
[18:25:18] <wikibugs>	 10Operations, 10Gerrit: git cloning is not working with gerrit repos - https://phabricator.wikimedia.org/T185649#3922089 (10Zoranzoki21) I have not had a problem for a long time with this.
[18:28:33] <wikibugs>	 (03PS1) 10Chad: Gerrit: Remove arbitrary SSH thread settings [puppet] - 10https://gerrit.wikimedia.org/r/406052 (https://phabricator.wikimedia.org/T182756)
[18:28:52] <no_justification>	 robh, _joe_ ^^
[18:29:01] <robh>	 yep, reviewing now
[18:29:20] <robh>	 so removing it will default it back to the math you place in the commit msg?
[18:29:51] <robh>	 im not sure of that behavior but it seems sensible to me... also if it doesnt work we can just revert and unbreak gerrit 
[18:29:58] <robh>	 and this only affects gerrit so seems ok to me to merge
[18:30:22] <robh>	 no_justification: shall i just go ahead and merge, then we can kick puppet on cobalt and test?
[18:31:08] <wikibugs>	 (03CR) 10RobH: [C: 032] Gerrit: Remove arbitrary SSH thread settings [puppet] - 10https://gerrit.wikimedia.org/r/406052 (https://phabricator.wikimedia.org/T182756) (owner: 10Chad)
[18:31:24] <robh>	 i meant to +1 that not +2...
[18:31:36] <robh>	 no_justification: i assume this should wait for _joe_ as well?
[18:31:38] <no_justification>	 Yeah, needs a merge, puppet run on cobalt & gerrit2001
[18:31:42] <no_justification>	 Then I'll restart gerrit
[18:31:45] <wikibugs>	 (03CR) 10RobH: [C: 031] Gerrit: Remove arbitrary SSH thread settings [puppet] - 10https://gerrit.wikimedia.org/r/406052 (https://phabricator.wikimedia.org/T182756) (owner: 10Chad)
[18:31:54] <robh>	 ok, ill go ahead and merge then
[18:31:56] <no_justification>	 robh: If you're ok with it. Just pinged _joe_ too cuz he asked in another channel :)
[18:32:07] <robh>	 im cool with it since itll only break gerrit anyhow
[18:32:09] <robh>	 and we're actively working on it
[18:32:12] <wikibugs>	 (03CR) 10RobH: [C: 032] Gerrit: Remove arbitrary SSH thread settings [puppet] - 10https://gerrit.wikimedia.org/r/406052 (https://phabricator.wikimedia.org/T182756) (owner: 10Chad)
[18:32:30] <mutante>	 do it :)
[18:32:47] <urandom>	 !log bootstrapping restbase2007-c - T184100
[18:32:48] <no_justification>	 iirc, those settings came because the original defaults way back when were batshit
[18:32:52] <wikibugs>	 (03CR) 10Faidon Liambotis: [C: 031] "WFM, although I would have picked (52 + 52/4) * 7 * 24 = 10920h instead :)" [puppet] - 10https://gerrit.wikimedia.org/r/404434 (https://phabricator.wikimedia.org/T160677) (owner: 10Filippo Giunchedi)
[18:32:55] <mutante>	 defaults sounds sane
[18:32:57] <mutante>	 for now
[18:32:59] <no_justification>	 But I've honestly not touched them in agesssssss
[18:33:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:33:01] <stashbot>	 T184100: Reprovision legacy Cassandra nodes into new cluster - https://phabricator.wikimedia.org/T184100
[18:33:07] <no_justification>	 Gerrit mostly chooses sane defaults for thread pools and such
[18:33:07] <mutante>	 but better than ancient settings
[18:33:13] <no_justification>	 (httpd being the one exception)
[18:33:17] <mutante>	 *nod*
[18:33:47] <robh>	 ok, merged and puppet is running on cobalt and gerrit2001
[18:33:48] <icinga-wm>	 RECOVERY - cassandra-c service on restbase2007 is OK: OK - cassandra-c is active
[18:33:48] <icinga-wm>	 RECOVERY - cassandra-c SSL 10.192.16.178:7001 on restbase2007 is OK: SSL OK - Certificate restbase2007-c valid until 2018-08-17 16:12:10 +0000 (expires in 204 days)
[18:33:57] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2007 is OK: All endpoints are healthy
[18:33:58] <robh>	 no_justification: cobalt done and saw the update
[18:34:08] <icinga-wm>	 RECOVERY - cassandra-b service on restbase2007 is OK: OK - cassandra-b is active
[18:34:11] <robh>	 same with gerrit2001
[18:34:17] <no_justification>	 Ok, I'll restart services on both (2001 then cobalt)
[18:34:17] <icinga-wm>	 RECOVERY - puppet last run on restbase2007 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[18:34:56] <no_justification>	 Ok, gerrit2001 succeeded (in failing, known issue)
[18:34:57] <no_justification>	 :P
[18:35:03] <mutante>	 ;)
[18:35:16] <no_justification>	 !log gerrit: restarting services, will be back momentarily
[18:35:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:35:37] <Krinkle>	 Timing :)
[18:35:59] <no_justification>	 Back!
[18:36:50] <robh>	 Jayprakash12345_: im kind of surprised that swat was scheduled this week at all
[18:37:03] <robh>	 this week is the dev summit and all hands
[18:38:18] <apergos>	 so are git pulls back in business?
[18:38:50] <no_justification>	 They were before the fix. I cleared the pool and had robh test while it was empty
[18:39:00] <no_justification>	 This will keep people from clogging it up again
[18:39:04] <no_justification>	 (it's usually an accident)
[18:41:08] <icinga-wm>	 PROBLEM - puppet last run on kafka1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas]
[18:42:11] <jynus>	 no_justification: thanks, I was suggesting doing that, at least until a more permanent decision is taken
[18:44:56] <no_justification>	 jynus: More permanent...?
[18:45:52] <jynus>	 (ticket)
[18:46:07] <no_justification>	 Ahhhh, ok. No worries
[18:46:14] <no_justification>	 Turns out it was a dupe of a private task we already had
[18:46:25] <jynus>	 cool
[18:46:30] <no_justification>	 tldr: gerrit's thread pool too tiny. So we upped it
[18:46:31] <no_justification>	 :)
[18:46:44] <jynus>	 I added a question for more long-term
[18:50:09] <no_justification>	 jynus: Responding. I have a long answer :)
[18:56:27] <jynus>	 I answered too regarding the one we block
[18:57:06] <jynus>	 proxy purchases for codfw is going slow because there are many other blockers and we are not 100% sure of the final infrastructure
[18:57:20] <jynus>	 *infrastructure's architecture
[18:59:16] <no_justification>	 Good to know!
[19:01:29] <mutante>	 nice!
[19:03:41] <jynus>	 but we should unblock it yes or yes by EoQ
[19:04:01] <jynus>	 it may need app changes, though, like hiera-fication of port numbers
[19:04:28] <jynus>	 (not only for gerrit, for all misc services using databases)
[19:06:49] <wikibugs>	 10Operations, 10Gerrit: Switch gerrit from using apache to nginx - https://phabricator.wikimedia.org/T185645#3922220 (10demon) 05Open>03declined Apache uses less than 100MB of memory on average, plus Gerrit's heap has plenty of space right now--we don't need the memory headroom.  Also, I'd rather move it b...
[19:09:17] <no_justification>	 jynus: Like I said on the task: it'd be *awesome* to offload Puppet's git::clone{} operations to a slave. It's usually only a few seconds behind the master, and those operations are all read-only anyway :)
[19:10:36] <_joe_>	 no_justification: uhm, that would merit its own task
[19:11:07] <icinga-wm>	 RECOVERY - puppet last run on kafka1001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[19:11:12] <_joe_>	 also, I might take the time to rewrite that bowl of puppet tech debt as a proper resource
[19:11:25] <_joe_>	 no_justification: but git::clone goes via https IIRC
[19:11:29] <paladox>	 jynus no_justification also account data is partially stored in a git repo in the next update. So that will be important to get that synced too.
[19:11:55] <no_justification>	 All repos are sync'd
[19:12:25] <no_justification>	 _joe_: Yes, it does :)
[19:12:49] <_joe_>	 so the threadpool issue is not just about ssh
[19:13:24] <apergos>	 but?
[19:14:04] <no_justification>	 The thread pool for HTTP is wayyyyyy larger
[19:14:29] <no_justification>	 maxThreads = 60
[19:14:34] <no_justification>	 (minThreads = 10)
[19:15:37] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1304 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[19:16:37] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1304 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.001 second response time
[19:18:58] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Depool es2011 for reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/406060
[19:23:35] <wikibugs>	 (03PS1) 10Dzahn: rename piwik::server to just piwik [puppet] - 10https://gerrit.wikimedia.org/r/406061
[19:24:38] <mutante>	 really using the "http password" setting to upload via https then
[19:24:41] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Reimage es2011 into stretch/MariaDB 10.1 [puppet] - 10https://gerrit.wikimedia.org/r/406062
[19:25:25] <jynus>	 I will be doing some minimal mediawiki deployments today
[19:25:28] <icinga-wm>	 RECOVERY - Check systemd state on restbase1011 is OK: OK - running: The system is fully operational
[19:28:27] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] mariadb: Depool es2011 for reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/406060 (owner: 10Jcrespo)
[19:29:59] <wikibugs>	 (03Merged) 10jenkins-bot: mariadb: Depool es2011 for reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/406060 (owner: 10Jcrespo)
[19:30:16] <wikibugs>	 (03CR) 10jenkins-bot: mariadb: Depool es2011 for reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/406060 (owner: 10Jcrespo)
[19:38:58] <icinga-wm>	 PROBLEM - Check Varnish expiry mailbox lag on cp4025 is CRITICAL: CRITICAL: expiry mailbox lag is 2016246
[19:41:52] <logmsgbot>	 !log jynus@tin Synchronized wmf-config/db-codfw.php: Depool es2011 (duration: 00m 57s)
[19:42:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:46:38] <greg-g>	 h/go rele
[19:57:10] <jynus>	 !log starting es2011 reimage
[19:57:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:08:46] <wikibugs>	 10Operations, 10ops-esams, 10Traffic, 10hardware-requests: Procure and install LVS and miscellaneous servers - https://phabricator.wikimedia.org/T184068#3922310 (10RobH) a:03BBlack @BBlack,  Can I get some feedback on how many lvs systems we'll need in esams going forward for this replacement?  I've assu...
[20:46:28] <wikibugs>	 (03CR) 10Imarlier: [C: 031] prometheus: bump global retention to 15 months [puppet] - 10https://gerrit.wikimedia.org/r/404434 (https://phabricator.wikimedia.org/T160677) (owner: 10Filippo Giunchedi)
[20:49:39] <wikibugs>	 10Operations, 10ops-esams, 10Traffic, 10hardware-requests: Procure and install LVS and miscellaneous servers - https://phabricator.wikimedia.org/T184068#3922373 (10BBlack) a:05BBlack>03RobH Basically, for the cache-only sites the setup we've recently purchased for ulsfo + eqsin applies for esams refres...
[21:02:28] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] mariadb: Reimage es2011 into stretch/MariaDB 10.1 [puppet] - 10https://gerrit.wikimedia.org/r/406062 (owner: 10Jcrespo)
[21:11:43] <wikibugs>	 10Operations, 10Analytics-Data-Quality, 10Traffic: Vet reliability of the response_size field for data analysis purposes - https://phabricator.wikimedia.org/T185350#3922415 (10BBlack) Regarding the accuracy/interpretation of `response_size`:  it is based on varnishncsa's `%b`, which is the amount of HTTP bod...
[21:24:43] <wikibugs>	 10Operations, 10TechCom, 10Services (attic), 10User-mobrovac: Service Ownership and Maintenance - https://phabricator.wikimedia.org/T122825#3922428 (10Krinkle) 05stalled>03Open
[21:26:28] <icinga-wm>	 PROBLEM - HHVM rendering on mw2124 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:27:27] <icinga-wm>	 RECOVERY - HHVM rendering on mw2124 is OK: HTTP OK: HTTP/1.1 200 OK - 75335 bytes in 0.315 second response time
[21:29:35] <wikibugs>	 10Operations: Something is wrong with installer root disk stuff - https://phabricator.wikimedia.org/T149845#2766226 (10jcrespo) I think I am suffering this, but not on first boot, but on installer (both jessie and stretch). This doesn't happen with regular dbs, but these have 20TB, which may affect it being extr...
[21:52:11] <wikibugs>	 (03PS1) 10Jcrespo: mariadb-partman: Modify recipe, to test with es2011 [puppet] - 10https://gerrit.wikimedia.org/r/406114
[21:53:28] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] mariadb-partman: Modify recipe, to test with es2011 [puppet] - 10https://gerrit.wikimedia.org/r/406114 (owner: 10Jcrespo)
[21:56:38] <addshore>	 If anyone is around that can reset my 2fa on phab that would be great! :) (not urgent) I'll just keep posting here every now and again fishing for someone!
[21:57:08] <icinga-wm>	 PROBLEM - HHVM rendering on mw1281 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time
[21:57:17] <icinga-wm>	 PROBLEM - Apache HTTP on mw1281 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time
[21:58:08] <icinga-wm>	 RECOVERY - HHVM rendering on mw1281 is OK: HTTP OK: HTTP/1.1 200 OK - 75319 bytes in 0.127 second response time
[21:58:17] <icinga-wm>	 RECOVERY - Apache HTTP on mw1281 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.217 second response time
[22:08:27] <wikibugs>	 (03PS1) 10Jcrespo: Revert "mariadb: Reimage es2011 into stretch/MariaDB 10.1" [puppet] - 10https://gerrit.wikimedia.org/r/406117
[22:08:33] <wikibugs>	 (03PS2) 10Jcrespo: Revert "mariadb: Reimage es2011 into stretch/MariaDB 10.1" [puppet] - 10https://gerrit.wikimedia.org/r/406117
[22:15:02] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] Revert "mariadb: Reimage es2011 into stretch/MariaDB 10.1" [puppet] - 10https://gerrit.wikimedia.org/r/406117 (owner: 10Jcrespo)
[22:17:06] <greg-g>	 addshore: how can I know it's you? (I don't think i have access anyway...)
[22:17:31] <addshore>	 hehe, im in the office next to thcipriani and Reedy if that helps ;)
[22:17:43] <Reedy>	 greg-g: addshore is the worst
[22:18:02] <greg-g>	 nice! hope to see you tonight!
[22:18:40] <greg-g>	 https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Access_list chad and mukunda, or a root root :)
[22:19:20] <apergos>	 addshore: still around? and if so, are you in the office?
[22:19:50] <addshore>	 apergos: indeed and indeed!
[22:20:00] <addshore>	 I am in the kitchen / by the kitchen / whatever this place is called
[22:20:10] <greg-g>	 duebner lounge
[22:20:55] <addshore>	 I found the docs :D https://wikitech.wikimedia.org/wiki/Phabricator#Removing_Two_Factor_Authentication
[22:21:12] <wikibugs>	 (03PS1) 10Jcrespo: partman: Add temporary hack to test es2011 partitioning [puppet] - 10https://gerrit.wikimedia.org/r/406121
[22:22:04] <wikibugs>	 10Operations: Something is wrong with installer root disk stuff - https://phabricator.wikimedia.org/T149845#3922480 (10jcrespo) My issue could be an installer one, so ignore my latest comment.
[22:23:04] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] partman: Add temporary hack to test es2011 partitioning [puppet] - 10https://gerrit.wikimedia.org/r/406121 (owner: 10Jcrespo)
[22:30:43] <wikibugs>	 10Operations, 10Analytics: setup/install evenlog1002.eqiad.wmnet - https://phabricator.wikimedia.org/T185667#3922494 (10RobH) p:05Triage>03Normal
[22:35:48] <wikibugs>	 10Operations, 10ops-eqiad: apply hostname labels to eventlog1001/WMF4751 - https://phabricator.wikimedia.org/T185668#3922507 (10RobH) p:05Triage>03Normal
[22:43:35] <wikibugs>	 (03PS1) 10RobH: setting dns entries for eventlog1002 [dns] - 10https://gerrit.wikimedia.org/r/406127 (https://phabricator.wikimedia.org/T185667)
[22:46:09] <wikibugs>	 (03PS2) 10RobH: setting dns entries for eventlog1002 [dns] - 10https://gerrit.wikimedia.org/r/406127 (https://phabricator.wikimedia.org/T185667)
[22:47:12] <wikibugs>	 (03CR) 10RobH: [C: 032] setting dns entries for eventlog1002 [dns] - 10https://gerrit.wikimedia.org/r/406127 (https://phabricator.wikimedia.org/T185667) (owner: 10RobH)
[22:47:48] <wikibugs>	 10Operations, 10Analytics, 10Patch-For-Review: setup/install evenlog1002.eqiad.wmnet - https://phabricator.wikimedia.org/T185667#3922545 (10RobH)
[22:48:25] <wikibugs>	 (03PS1) 10Addshore: Add 'RevisionStore' to wmgMonologChannels [mediawiki-config] - 10https://gerrit.wikimedia.org/r/406128
[22:48:49] <wikibugs>	 10Operations, 10Analytics, 10hardware-requests: EQIAD: (1) hardware request for eventlog1001 replacement - eventlog1002. - https://phabricator.wikimedia.org/T184551#3922546 (10RobH) 05Open>03Resolved T185667 has been created to track the setup of eventlog1002.  Resolving this #hw-request.
[22:56:46] <wikibugs>	 10Operations, 10Analytics, 10Patch-For-Review: setup/install evenlog1002.eqiad.wmnet - https://phabricator.wikimedia.org/T185667#3922558 (10RobH)
[23:00:59] <wikibugs>	 10Operations, 10Analytics, 10Patch-For-Review: setup/install evenlog1002.eqiad.wmnet - https://phabricator.wikimedia.org/T185667#3922561 (10RobH) a:05RobH>03Ottomata @Ottomata: eventlog1001 is trusty.  Can eventlog1002 be stretch or does it need to be an older distro?  Please advise and assign back to me...
[23:08:36] <wikibugs>	 (03PS2) 10RobH: adding new shell user Ramsey Isler [puppet] - 10https://gerrit.wikimedia.org/r/405981 (https://phabricator.wikimedia.org/T185356)
[23:08:38] <wikibugs>	 (03PS2) 10RobH: adding Ramsey Isler to statistics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/405982 (https://phabricator.wikimedia.org/T185356)
[23:08:40] <wikibugs>	 (03PS1) 10RobH: eventlog1002 install params [puppet] - 10https://gerrit.wikimedia.org/r/406129 (https://phabricator.wikimedia.org/T185667)
[23:08:46] <robh>	 .... wtffff
[23:09:00] <wikibugs>	 (03PS2) 10RobH: eventlog1002 install params [puppet] - 10https://gerrit.wikimedia.org/r/406129 (https://phabricator.wikimedia.org/T185667)
[23:09:05] <robh>	 bah, bad local state push
[23:09:11] <robh>	 easy enough to fix, just annoying.
[23:10:58] <wikibugs>	 (03CR) 10RobH: [C: 032] eventlog1002 install params [puppet] - 10https://gerrit.wikimedia.org/r/406129 (https://phabricator.wikimedia.org/T185667) (owner: 10RobH)
[23:11:57] <wikibugs>	 10Operations, 10Analytics: setup/install evenlog1002.eqiad.wmnet - https://phabricator.wikimedia.org/T185667#3922578 (10RobH)
[23:15:47] <ema>	 !log cp4025: restart varnish backend due to mbox lag
[23:15:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:19:07] <icinga-wm>	 RECOVERY - Check Varnish expiry mailbox lag on cp4025 is OK: OK: expiry mailbox lag is 0
[23:22:11] <wikibugs>	 (03PS3) 10RobH: adding new shell user Ramsey Isler [puppet] - 10https://gerrit.wikimedia.org/r/405981 (https://phabricator.wikimedia.org/T185356)
[23:24:27] <icinga-wm>	 PROBLEM - HHVM rendering on mw2110 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:25:17] <icinga-wm>	 RECOVERY - HHVM rendering on mw2110 is OK: HTTP OK: HTTP/1.1 200 OK - 75335 bytes in 0.312 second response time
[23:37:05] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Move socket location, disable notifications of es2011 [puppet] - 10https://gerrit.wikimedia.org/r/406130
[23:39:43] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] mariadb: Move socket location, disable notifications of es2011 [puppet] - 10https://gerrit.wikimedia.org/r/406130 (owner: 10Jcrespo)
[23:41:56] <wikibugs>	 (03PS1) 10Jcrespo: Revert "mariadb: Depool es2011 for reimage" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/406132
[23:45:47] <wikibugs>	 10Operations, 10DC-Ops, 10Patch-For-Review: document all scs connections - https://phabricator.wikimedia.org/T175876#3922597 (10RobH) a:05RobH>03ayounsi scs-c1-eqiad.mgmt.eqiad.wmnet is online
[23:46:22] <wikibugs>	 10Operations, 10DC-Ops: document all scs connections - https://phabricator.wikimedia.org/T175876#3922599 (10RobH)
[23:51:24] <wikibugs>	 (03PS1) 10Jcrespo: Revert "Revert "mariadb: Reimage es2011 into stretch/MariaDB 10.1"" [puppet] - 10https://gerrit.wikimedia.org/r/406133
[23:51:31] <wikibugs>	 (03PS2) 10Jcrespo: Revert "Revert "mariadb: Reimage es2011 into stretch/MariaDB 10.1"" [puppet] - 10https://gerrit.wikimedia.org/r/406133
[23:52:34] <wikibugs>	 (03PS1) 10Jcrespo: Revert "mariadb-partman: Modify recipe, to test with es2011" [puppet] - 10https://gerrit.wikimedia.org/r/406134
[23:52:40] <wikibugs>	 (03PS2) 10Jcrespo: Revert "mariadb-partman: Modify recipe, to test with es2011" [puppet] - 10https://gerrit.wikimedia.org/r/406134
[23:53:53] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] Revert "mariadb-partman: Modify recipe, to test with es2011" [puppet] - 10https://gerrit.wikimedia.org/r/406134 (owner: 10Jcrespo)
[23:54:19] <wikibugs>	 (03PS3) 10Jcrespo: Revert "Revert "mariadb: Reimage es2011 into stretch/MariaDB 10.1"" [puppet] - 10https://gerrit.wikimedia.org/r/406133
[23:55:10] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] Revert "Revert "mariadb: Reimage es2011 into stretch/MariaDB 10.1"" [puppet] - 10https://gerrit.wikimedia.org/r/406133 (owner: 10Jcrespo)
[23:55:40] <wikibugs>	 (03CR) 10Jcrespo: [C: 04-2] "Not until es2011 is back in a production state" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/406132 (owner: 10Jcrespo)