[00:35:52] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: x1 on db2033 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 301.40 seconds
[03:01:42] <icinga-wm>	 PROBLEM - mcrouter process on mwmaint1002 is CRITICAL: Return code of 255 is out of bounds
[03:01:43] <icinga-wm>	 PROBLEM - Check size of conntrack table on mwmaint1002 is CRITICAL: Return code of 255 is out of bounds
[03:01:53] <icinga-wm>	 PROBLEM - nutcracker process on mwmaint1002 is CRITICAL: Return code of 255 is out of bounds
[03:02:02] <icinga-wm>	 PROBLEM - MD RAID on mwmaint1002 is CRITICAL: Return code of 255 is out of bounds
[03:02:12] <icinga-wm>	 PROBLEM - nutcracker port on mwmaint1002 is CRITICAL: Return code of 255 is out of bounds
[03:02:12] <icinga-wm>	 PROBLEM - dhclient process on mwmaint1002 is CRITICAL: Return code of 255 is out of bounds
[03:02:32] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on mwmaint1002 is CRITICAL: Return code of 255 is out of bounds
[03:02:33] <icinga-wm>	 PROBLEM - DPKG on mwmaint1002 is CRITICAL: Return code of 255 is out of bounds
[03:02:36] <paladox>	 Hmm
[03:02:42] <icinga-wm>	 PROBLEM - Check systemd state on mwmaint1002 is CRITICAL: Return code of 255 is out of bounds
[03:02:42] <icinga-wm>	 PROBLEM - Disk space on mwmaint1002 is CRITICAL: Return code of 255 is out of bounds
[03:02:43] <icinga-wm>	 PROBLEM - configured eth on mwmaint1002 is CRITICAL: Return code of 255 is out of bounds
[03:04:16] <paladox>	 mutante: herron ^^
[03:06:42] <icinga-wm>	 PROBLEM - puppet last run on mwmaint1002 is CRITICAL: Return code of 255 is out of bounds
[03:08:53] <icinga-wm>	 PROBLEM - Packet loss ratio for UDP on logstash1009 is CRITICAL: 0.6993 ge 0.1 https://grafana.wikimedia.org/dashboard/db/logstash
[03:09:13] <icinga-wm>	 PROBLEM - Packet loss ratio for UDP on logstash1007 is CRITICAL: 0.4703 ge 0.1 https://grafana.wikimedia.org/dashboard/db/logstash
[03:09:22] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on mwmaint1002 is CRITICAL: Return code of 255 is out of bounds
[03:10:02] <icinga-wm>	 RECOVERY - Packet loss ratio for UDP on logstash1009 is OK: (C)0.1 ge (W)0.05 ge 0 https://grafana.wikimedia.org/dashboard/db/logstash
[03:10:23] <icinga-wm>	 RECOVERY - Packet loss ratio for UDP on logstash1007 is OK: (C)0.1 ge (W)0.05 ge 0 https://grafana.wikimedia.org/dashboard/db/logstash
[03:16:23] <icinga-wm>	 RECOVERY - MD RAID on mwmaint1002 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0
[03:16:33] <icinga-wm>	 RECOVERY - nutcracker port on mwmaint1002 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11212
[03:16:42] <icinga-wm>	 RECOVERY - dhclient process on mwmaint1002 is OK: PROCS OK: 0 processes with command name dhclient
[03:16:52] <icinga-wm>	 RECOVERY - puppet last run on mwmaint1002 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[03:17:02] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on mwmaint1002 is OK: OK ferm input default policy is set
[03:17:02] <icinga-wm>	 RECOVERY - DPKG on mwmaint1002 is OK: All packages OK
[03:17:12] <icinga-wm>	 RECOVERY - Check systemd state on mwmaint1002 is OK: OK - running: The system is fully operational
[03:17:13] <icinga-wm>	 RECOVERY - Disk space on mwmaint1002 is OK: DISK OK
[03:17:13] <icinga-wm>	 RECOVERY - configured eth on mwmaint1002 is OK: OK - interfaces up
[03:17:13] <icinga-wm>	 RECOVERY - mcrouter process on mwmaint1002 is OK: PROCS OK: 1 process with UID = 114 (mcrouter), command name mcrouter
[03:17:22] <icinga-wm>	 RECOVERY - Check size of conntrack table on mwmaint1002 is OK: OK: nf_conntrack is 0 % full
[03:17:32] <icinga-wm>	 RECOVERY - nutcracker process on mwmaint1002 is OK: PROCS OK: 1 process with UID = 113 (nutcracker), command name nutcracker
[03:34:42] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 858.27 seconds
[03:38:42] <icinga-wm>	 PROBLEM - Memory correctable errors -EDAC- on thumbor1004 is CRITICAL: 4 ge 4 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=thumbor1004&var-datasource=eqiad%2520prometheus%252Fops
[03:39:23] <icinga-wm>	 RECOVERY - Check the NTP synchronisation status of timesyncd on mwmaint1002 is OK: OK: synced at Sat 2018-10-20 03:39:21 UTC.
[03:42:12] <icinga-wm>	 PROBLEM - Packet loss ratio for UDP on logstash1009 is CRITICAL: 0.604 ge 0.1 https://grafana.wikimedia.org/dashboard/db/logstash
[03:42:33] <icinga-wm>	 PROBLEM - Packet loss ratio for UDP on logstash1007 is CRITICAL: 0.419 ge 0.1 https://grafana.wikimedia.org/dashboard/db/logstash
[03:42:43] <icinga-wm>	 PROBLEM - Packet loss ratio for UDP on logstash1008 is CRITICAL: 0.3383 ge 0.1 https://grafana.wikimedia.org/dashboard/db/logstash
[03:43:22] <icinga-wm>	 RECOVERY - Packet loss ratio for UDP on logstash1009 is OK: (C)0.1 ge (W)0.05 ge 0 https://grafana.wikimedia.org/dashboard/db/logstash
[03:43:42] <icinga-wm>	 RECOVERY - Packet loss ratio for UDP on logstash1007 is OK: (C)0.1 ge (W)0.05 ge 0 https://grafana.wikimedia.org/dashboard/db/logstash
[03:43:52] <icinga-wm>	 RECOVERY - Packet loss ratio for UDP on logstash1008 is OK: (C)0.1 ge (W)0.05 ge 0 https://grafana.wikimedia.org/dashboard/db/logstash
[03:47:02] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 239.95 seconds
[05:38:20] <marostegui>	 !log Force writeback on db2033 - T184888
[05:38:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:38:25] <stashbot>	 T184888: Replace codfw x1 master (db2033) (WAS: Failed BBU on db2033 (x1 master)) - https://phabricator.wikimedia.org/T184888
[06:44:37] <wikibugs>	 Operations, Wikidata, Wikidata-Query-Service, cloud-services-team: Provide a way to have test servers on real hardware, isolated from production for Wikidata Query Service - https://phabricator.wikimedia.org/T206636 (Smalyshev) I did a short evaluation on provided VM and it looks like it behaves...
[06:45:43] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: x1 on db2033 is OK: OK slave_sql_lag Replication lag: 47.06 seconds
[07:31:11] <wikibugs>	 Operations, Wikidata, Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint: wdqs1009 - cannot create /var/log/wdqs/wdqs_autodeployment.log - https://phabricator.wikimedia.org/T206318 (Smalyshev)
[07:31:17] <wikibugs>	 Operations, Wikidata, Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint: wdqs1009 - cannot create /var/log/wdqs/wdqs_autodeployment.log - https://phabricator.wikimedia.org/T206318 (Smalyshev) p:Triage>High
[07:34:02] <wikibugs>	 (PS1) Rxy: Add CentralAuth related permissions to stewards at metawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/468691 (https://phabricator.wikimedia.org/T207531)
[07:34:43] <wikibugs>	 (PS2) Rxy: Add CentralAuth related permissions to stewards at metawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/468691 (https://phabricator.wikimedia.org/T207531)
[08:08:43] <icinga-wm>	 PROBLEM - Packet loss ratio for UDP on logstash1008 is CRITICAL: 0.2391 ge 0.1 https://grafana.wikimedia.org/dashboard/db/logstash
[08:11:02] <icinga-wm>	 PROBLEM - Packet loss ratio for UDP on logstash1008 is CRITICAL: 0.6198 ge 0.1 https://grafana.wikimedia.org/dashboard/db/logstash
[08:15:23] <icinga-wm>	 PROBLEM - Packet loss ratio for UDP on logstash1007 is CRITICAL: 0.2901 ge 0.1 https://grafana.wikimedia.org/dashboard/db/logstash
[08:16:32] <icinga-wm>	 RECOVERY - Packet loss ratio for UDP on logstash1007 is OK: (C)0.1 ge (W)0.05 ge 0 https://grafana.wikimedia.org/dashboard/db/logstash
[08:17:43] <icinga-wm>	 RECOVERY - Packet loss ratio for UDP on logstash1008 is OK: (C)0.1 ge (W)0.05 ge 0 https://grafana.wikimedia.org/dashboard/db/logstash
[08:57:39] <wikibugs>	 (PS1) GTirloni: Initial import of shinken-2.0.3 [debs/shinken] - https://gerrit.wikimedia.org/r/468692
[08:58:31] <wikibugs>	 (PS1) Matěj Suchánek: Update several Wikidata-related configs [mediawiki-config] - https://gerrit.wikimedia.org/r/468693
[09:07:22] <wikibugs>	 (PS2) GTirloni: Initial import of shinken-2.0.3 [debs/shinken] - https://gerrit.wikimedia.org/r/468692 (https://phabricator.wikimedia.org/T204562)
[09:11:56] <wikibugs>	 (PS3) GTirloni: Initial import of shinken-2.0.3 [debs/shinken] - https://gerrit.wikimedia.org/r/468692 (https://phabricator.wikimedia.org/T204562)
[09:12:28] <wikibugs>	 (PS4) GTirloni: Initial import of shinken-2.0.3 [debs/shinken] - https://gerrit.wikimedia.org/r/468692 (https://phabricator.wikimedia.org/T204562)
[09:21:08] <wikibugs>	 (CR) GTirloni: [V: 2 C: 2] Initial import of shinken-2.0.3 [debs/shinken] - https://gerrit.wikimedia.org/r/468692 (https://phabricator.wikimedia.org/T204562) (owner: GTirloni)
[09:40:59] <wikibugs>	 Operations, Puppet, Cloud-VPS: Move the main WMCS puppetmaster into the Labs realm - https://phabricator.wikimedia.org/T171188 (faidon) Ping? Could we setup a couple of puppetmasters in the new "cloudinfra" project and see where that leads us? I was previously told that this is probably a 1-2 weeks p...
[09:41:41] <wikibugs>	 Operations, MediaWiki-Page-deletion, Performance: Deleting pages on the English Wikipedia is very slow - https://phabricator.wikimedia.org/T207530 (TTO)
[09:42:07] <wikibugs>	 Operations, Puppet, Cloud-Services: Move the main WMCS puppetmaster into the Labs realm - https://phabricator.wikimedia.org/T171188 (faidon)
[09:53:35] <wikibugs>	 Operations, Cloud-VPS: Move labs-recursors in WMCS - https://phabricator.wikimedia.org/T207533 (faidon) p:Triage>Normal
[09:56:07] <wikibugs>	 Operations, Cloud-VPS: Move labs-recursors in WMCS - https://phabricator.wikimedia.org/T207533 (faidon)
[09:56:08] <wikibugs>	 (CR) Framawiki: [C: -1] Enable suppressredirect and markbotedit rights to rollbackers on it.wikiversity (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/468075 (https://phabricator.wikimedia.org/T207300) (owner: Zoranzoki21)
[10:57:15] <wikibugs>	 Operations, Cloud-VPS: Move labs-recursors in WMCS - https://phabricator.wikimedia.org/T207533 (Krenair) > The only gotcha seems to be that the recursor runs some custom Lua code, that uses data generated by a Python script, that in turn seems to gather those from Nova's API. I'm not sure if that's acces...
[11:06:33] <wikibugs>	 Operations, Puppet, Cloud-Services: Move the main WMCS puppetmaster into the Labs realm - https://phabricator.wikimedia.org/T171188 (Krenair) I imagine we'd need to issue every instance being moved a new puppet cert, as we presumably wouldn't want to hand the current labs puppetmaster CA over to the...
[11:09:02] <wikibugs>	 Operations, Cloud-VPS: Move labs-recursors in WMCS - https://phabricator.wikimedia.org/T207533 (Krenair)
[11:09:38] <wikibugs>	 Operations, Cloud-Services, Mail, Patch-For-Review, User-herron: Create a Cloud VPS SMTP smarthost - https://phabricator.wikimedia.org/T41785 (Krenair)
[11:09:52] <wikibugs>	 Operations, Puppet, Cloud-Services: Move the main WMCS puppetmaster into the Labs realm - https://phabricator.wikimedia.org/T171188 (Krenair)
[11:10:34] <paravoid>	 Krenair: oh thanks!
[11:10:44] <paravoid>	 I wanted to do that too, so +1 and really appreciated
[11:10:56] <Krenair>	 paravoid, yeah I noticed a pattern of tasks emerging and thought I'd try to track them
[11:11:00] <Krenair>	 are there any others floating around?
[11:12:02] <Krenair>	 I searched for Cloud VPS/Cloud-Services tasks open and authored by you, the other ones that came up don't appear relevant
[11:14:51] <paravoid>	 I don't think so
[11:58:07] <wikibugs>	 Operations, Cloud-VPS: Move various support services for Cloud VPS currently in prod into their own instances - https://phabricator.wikimedia.org/T207536 (faidon)
[12:32:00] <wikibugs>	 (PS1) Faidon Liambotis: designate/mitaka: remove typo'ed extension [puppet] - https://gerrit.wikimedia.org/r/468697
[12:38:36] <wikibugs>	 (CR) Alex Monk: "Looks like this goes back to the original designate puppetisation in Ic06414d1a942ad0ef9f1fd4be5f5bd002cd07cda so has probably always been" [puppet] - https://gerrit.wikimedia.org/r/468697 (owner: Faidon Liambotis)
[12:43:19] <wikibugs>	 (CR) Hoo man: [C: 2] Add CentralAuth related permissions to stewards at metawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/468691 (https://phabricator.wikimedia.org/T207531) (owner: Rxy)
[12:44:32] <wikibugs>	 (Merged) jenkins-bot: Add CentralAuth related permissions to stewards at metawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/468691 (https://phabricator.wikimedia.org/T207531) (owner: Rxy)
[12:46:41] <logmsgbot>	 !log hoo@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Add CentralAuth related permissions to stewards at metawiki (T207531) (duration: 01m 09s)
[12:46:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:46:45] <stashbot>	 T207531: Migrate global permissions "globalgroupmembership" and "globalgrouppermissions" to meta local definition - https://phabricator.wikimedia.org/T207531
[12:57:36] <wikibugs>	 (CR) jenkins-bot: Add CentralAuth related permissions to stewards at metawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/468691 (https://phabricator.wikimedia.org/T207531) (owner: Rxy)
[13:03:00] <wikibugs>	 Operations, Cloud-VPS: Move various support services for Cloud VPS currently in prod into their own instances - https://phabricator.wikimedia.org/T207536 (Krenair)
[13:03:53] <wikibugs>	 Operations, Cloud-VPS: Move labmon (Graphite, StatsD) into a Cloud VPS - https://phabricator.wikimedia.org/T207543 (Krenair)
[13:04:24] <wikibugs>	 Operations, Cloud-VPS: Move various support services for Cloud VPS currently in prod into their own instances - https://phabricator.wikimedia.org/T207536 (Krenair)
[13:13:45] <wikibugs>	 Operations, Cloud-VPS: Move various support services for Cloud VPS currently in prod into their own instances - https://phabricator.wikimedia.org/T207536 (Krenair)
[13:14:10] <wikibugs>	 Operations, Cloud-VPS: Move various support services for Cloud VPS currently in prod into their own instances - https://phabricator.wikimedia.org/T207536 (Krenair)
[13:24:25] <wikibugs>	 Operations, Cloud-VPS: Move various support services for Cloud VPS currently in prod into their own instances - https://phabricator.wikimedia.org/T207536 (Krenair)
[13:30:33] <icinga-wm>	 PROBLEM - Disk space on eventlog1002 is CRITICAL: DISK CRITICAL - free space: / 1766 MB (3% inode=98%)
[13:37:37] <wikibugs>	 Operations, Horizon, Traffic, Upstream: Horizon Designate dashboard not allowing creation of NS records - https://phabricator.wikimedia.org/T204013 (Krenair) I created an upstream patch, it got merged, now we just need to wait for OpenStack Stein to be released and upgrade to it. Also my original...
[13:53:45] <logmsgbot>	 !log reedy@deploy1001 Synchronized php-1.32.0-wmf.26/includes/auth/AuthManager.php: (no justification provided) (duration: 00m 55s)
[13:53:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:13:57] <wikibugs>	 (PS1) Alex Monk: labs recursor: require interface alias before trying to start pdns-recursor [puppet] - https://gerrit.wikimedia.org/r/468708
[14:14:54] <wikibugs>	 (CR) Alex Monk: "e.g." [puppet] - https://gerrit.wikimedia.org/r/468708 (owner: Alex Monk)
[14:34:31] <wikibugs>	 (PS1) Alex Monk: labs recursor: Tell labsaliaser to use keystone public port instead of admin port [puppet] - https://gerrit.wikimedia.org/r/468709
[14:35:22] <wikibugs>	 (CR) jerkins-bot: [V: -1] labs recursor: Tell labsaliaser to use keystone public port instead of admin port [puppet] - https://gerrit.wikimedia.org/r/468709 (owner: Alex Monk)
[14:36:39] <wikibugs>	 (PS2) Alex Monk: labsaliaser: use keystone public port instead of admin port [puppet] - https://gerrit.wikimedia.org/r/468709 (https://phabricator.wikimedia.org/T207533)
[14:39:13] <icinga-wm>	 RECOVERY - Disk space on eventlog1002 is OK: DISK OK
[14:40:07] <elukey>	 working on it --^
[14:41:44] <wikibugs>	 (PS1) Alex Monk: labs dnsrecursor: require clientlib before labsaliaser [puppet] - https://gerrit.wikimedia.org/r/468714
[14:44:04] <wikibugs>	 (CR) Alex Monk: "I think this works in prod at the moment because the hosts include this through other profiles." [puppet] - https://gerrit.wikimedia.org/r/468714 (owner: Alex Monk)
[14:54:51] <wikibugs>	 (CR) Alex Monk: "(See the Ferm::Rule resources at the bottom of modules/profile/manifests/openstack/base/keystone/service.pp)" [puppet] - https://gerrit.wikimedia.org/r/468709 (https://phabricator.wikimedia.org/T207533) (owner: Alex Monk)
[15:01:13] <wikibugs>	 (CR) Alex Monk: [C: 1] designate/mitaka: remove typo'ed extension [puppet] - https://gerrit.wikimedia.org/r/468697 (owner: Faidon Liambotis)
[15:27:24] <wikibugs>	 (PS1) Elukey: eventlogging::server: rotate logs on size (not only on time) [puppet] - https://gerrit.wikimedia.org/r/468718
[15:28:10] <wikibugs>	 (CR) jerkins-bot: [V: -1] eventlogging::server: rotate logs on size (not only on time) [puppet] - https://gerrit.wikimedia.org/r/468718 (owner: Elukey)
[15:28:36] <wikibugs>	 (PS2) Elukey: eventlogging::server: rotate logs on size (not only on time) [puppet] - https://gerrit.wikimedia.org/r/468718
[15:28:47] <elukey>	 oh noes I didn't see jenkins
[15:28:51] <elukey>	 another -1 coming
[15:29:14] <wikibugs>	 (CR) jerkins-bot: [V: -1] eventlogging::server: rotate logs on size (not only on time) [puppet] - https://gerrit.wikimedia.org/r/468718 (owner: Elukey)
[15:29:42] <wikibugs>	 (PS3) Elukey: eventlogging::server: rotate logs on size (not only on time) [puppet] - https://gerrit.wikimedia.org/r/468718
[15:30:32] <wikibugs>	 (PS2) Alex Monk: labs dnsrecursor: require clientlib before labsaliaser [puppet] - https://gerrit.wikimedia.org/r/468714 (https://phabricator.wikimedia.org/T207533)
[15:30:57] <wikibugs>	 Operations, Cloud-VPS, Patch-For-Review: Move labs-recursors in WMCS - https://phabricator.wikimedia.org/T207533 (Krenair) Created `labs-dnsrecursor-alex-test.openstack.eqiad.wmflabs` and applied `profile::openstack::base::pdns::recursor::service` as well as this hieradata to make it as similar to a...
[15:31:57] <wikibugs>	 (CR) Elukey: [C: 2] "https://puppet-compiler.wmflabs.org/compiler1002/13119/" [puppet] - https://gerrit.wikimedia.org/r/468718 (owner: Elukey)
[15:37:57] <elukey>	 cc: mobrovac ---^ I applied a max size for the /var/log/eventlogging dir because of an issue with eventlog1002, but it applies also to kafka[1,2]*. I don't see any issue with it but lemme know otherwise
[16:09:27] <wikibugs>	 Operations, Puppet, Cloud-Services: Move the main WMCS puppetmaster into the Labs realm - https://phabricator.wikimedia.org/T171188 (Krenair) >>! In T171188#4682291, @faidon wrote: > - Security model: I suppose that's cloudinfra, right? We need to address that regardless, as we move more services wit...
[17:07:04] <Krenair>	 Reedy, can you look at the stack trace in https://phabricator.wikimedia.org/T207553 ?
[17:07:53] <Reedy>	 Error: 1048 Column 'afa_parameters' cannot be null (10.64.32.64)
[17:08:24] <Krenair>	 ty
[18:53:22] <wikibugs>	 (PS1) Niharika29: Deploy TemplateWizard everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/468730
[19:51:02] <icinga-wm>	 PROBLEM - High lag on wdqs1003 is CRITICAL: 3601 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[19:53:12] <icinga-wm>	 PROBLEM - High lag on wdqs1003 is CRITICAL: 3611 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[19:53:48] * gehel is looking at wdqs1003
[19:54:50] <gehel>	 !log depooling wdqs1003 to catch up on lag
[19:54:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:57:32] <icinga-wm>	 PROBLEM - High lag on wdqs1003 is CRITICAL: 3632 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[20:39:06] <wikibugs>	 Operations, MediaWiki-Page-deletion, Performance: Deleting pages on the English Wikipedia is very slow - https://phabricator.wikimedia.org/T207530 (Izno) This might possibly be caused by the work for {T198176}.
[21:29:31] <gehel>	 !log repooling wdqs1003 (still some lag, but 100[45] start to be impacted)
[21:29:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:05:27] <logmsgbot>	 !log reedy@deploy1001 Synchronized php-1.32.0-wmf.26/extensions/CentralAuth/: Update setEmail (duration: 00m 55s)
[23:05:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:27:53] <wikibugs>	 (PS1) GTirloni: shinken: Adjustments necessary to upgrade 1.4->2.0 and Trusty->Jessie [puppet] - https://gerrit.wikimedia.org/r/468792 (https://phabricator.wikimedia.org/T204562)
[23:32:32] <wikibugs>	 Operations, Cloud-VPS: Move various support services for Cloud VPS currently in prod into their own instances - https://phabricator.wikimedia.org/T207536 (Krenair)
[23:32:50] <Krenair>	 gtirloni, nice, did you get it working?
[23:35:18] <gtirloni>	 Krenair: yep, seems like it! I'll just push a small modification to that change though, forgot about cherrypy
[23:35:36] <Krenair>	 gtirloni, is that based on what I did or did you start afresh?
[23:36:14] <Krenair>	 note that it seemed at first like I got it working, didn't survive the first few restarts though :(
[23:36:49] <gtirloni>	 Krenair: yeah, the restarts caused a few problems with invalid directory permissions and whatnot.. it's weird. you're right, I need to double check that
[23:37:57] <gtirloni>	 at some point I just removed all packages and configs, ran Puppet and started over.. I was too deep in the rabbit hole
[23:38:24] <gtirloni>	 adding a comment to the phab task
[23:38:53] <Krenair>	 was this all based on my attempt?
[23:39:01] <Krenair>	 ah
[23:39:04] <Krenair>	 ok
[23:42:02] <gtirloni>	 I started from your attempt yeah, trying things out.. but I had a few fresh starts that I took as a learning opportunity and it took me a while to get to the point where you left off (I cursed myself more than enough times for that) :-)
[23:43:48] <gtirloni>	 Krenair: my wife is looking at me with the angry eyes.. gotta shut down the computer now. Thanks for all your work, much appreciated! (it was great that you narrowed it down to a single error). If you could give your feedback about the change, that'd be awesome.. if you want to add anything, feel free to do so too :)
[23:44:12] <Krenair>	 heh
[23:44:15] <Krenair>	 no worries
[23:44:20] <Krenair>	 thanks