[01:07:23] RECOVERY - Check systemd state on kafkamon1001 is OK: OK - running: The system is fully operational
[01:10:42] PROBLEM - Check systemd state on kafkamon1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[02:02:16] (PS2) GTirloni: shinken: Adjustments necessary to upgrade 1.4->2.0 and Trusty->Jessie [puppet] - https://gerrit.wikimedia.org/r/468792 (https://phabricator.wikimedia.org/T204562)
[02:03:12] RECOVERY - High lag on wdqs1003 is OK: (C)3600 ge (W)1200 ge 117 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[03:30:13] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 717.32 seconds
[03:34:23] PROBLEM - puppet last run on cp4028 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-ISP.mmdb.gz]
[04:01:03] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 193.09 seconds
[04:05:02] RECOVERY - puppet last run on cp4028 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[06:28:12] PROBLEM - puppet last run on dbproxy1010 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/sbin/smart-data-dump]
[06:28:52] PROBLEM - puppet last run on authdns2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/rsyslog.d/20-confd.conf]
[06:31:03] PROBLEM - puppet last run on cloudservices1004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/puppet-enabled]
[06:31:33] PROBLEM - puppet last run on mw1305 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/ferm/functions.conf]
[06:56:32] RECOVERY - puppet last run on cloudservices1004 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[06:57:02] RECOVERY - puppet last run on mw1305 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:58:33] RECOVERY - puppet last run on dbproxy1010 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[06:59:22] RECOVERY - puppet last run on authdns2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[09:08:53] PROBLEM - Packet loss ratio for UDP on logstash1008 is CRITICAL: 0.4769 ge 0.1 https://grafana.wikimedia.org/dashboard/db/logstash
[09:09:53] RECOVERY - Packet loss ratio for UDP on logstash1008 is OK: (C)0.1 ge (W)0.05 ge 0 https://grafana.wikimedia.org/dashboard/db/logstash
[09:35:02] PROBLEM - Logstash rate of ingestion percent change compared to yesterday on einsteinium is CRITICAL: 137.7 ge 130 https://grafana.wikimedia.org/dashboard/db/logstash?orgId=1&panelId=2&fullscreen
[11:10:48] *.archive.org is whitelisted on Commons for URL to Upload https://commons.wikimedia.org/wiki/Commons:Upload_tools/wgCopyUploadsDomains
[11:11:09] but https://ia800406.us.archive.org/21/items/BharatvarshiyMadhyayuginCharitrakoshCropped/lila%20charitra_cropped.pdf can't be uploaded
[11:11:15] any idea?
[11:14:08] yannf, yeah that won't match
[11:14:25] *.archive.org is whitelisted, not *.*.archive.org
[11:15:23] https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/refs/heads/master/includes/upload/UploadFromUrl.php#83
[11:18:35] oh :/
[11:22:42] PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on einsteinium is CRITICAL: cluster=cache_text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[11:25:26] Operations, MediaWiki-Page-deletion, Performance: Deleting pages on the English Wikipedia is very slow - https://phabricator.wikimedia.org/T207530 (MarcoAurelio) AIUI the method we requested in the task above was to be applied when a page had a high number of revisions, not for all page deletions. If...
[11:27:25] https://phabricator.wikimedia.org/T207581
[11:27:46] Krenair, parent task is ok?
[11:28:55] yeah
[11:29:22] RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[11:29:35] you will need to add site-requests though yannf
[11:29:42] Operations, DNS, Traffic: Add punjabi.wikimedia.org to DNS - https://phabricator.wikimedia.org/T207583 (Urbanecm)
[11:36:48] (PS5) MarcoAurelio: Close chairwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/443585 (https://phabricator.wikimedia.org/T184961)
[11:37:16] (PS1) Urbanecm: Add punjabiwikimedia [dns] - https://gerrit.wikimedia.org/r/468812 (https://phabricator.wikimedia.org/T207583)
[11:46:46] Operations, DNS, Traffic, Patch-For-Review: Add punjabi.wikimedia.org to DNS and Apache - https://phabricator.wikimedia.org/T207583 (Urbanecm)
[11:46:56] Operations, DNS, Traffic, Wikimedia-Apache-configuration, Patch-For-Review: Add punjabi.wikimedia.org to DNS and Apache - https://phabricator.wikimedia.org/T207583 (Urbanecm)
[11:48:03] (PS1) Urbanecm: Add punjabi.wikimedia.org to Apache [puppet] - https://gerrit.wikimedia.org/r/468814
[11:48:27] (PS2) Urbanecm: Add punjabi.wikimedia.org to Apache [puppet] - https://gerrit.wikimedia.org/r/468814 (https://phabricator.wikimedia.org/T207583)
[11:58:37] (PS1) Urbanecm: Initial configuration for punjabiwikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/468815 (https://phabricator.wikimedia.org/T204477)
[12:03:12] (PS2) Urbanecm: Initial configuration for punjabiwikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/468815 (https://phabricator.wikimedia.org/T204477)
[12:15:20] (PS3) Urbanecm: Add punjabi.wikimedia.org to Apache [puppet] - https://gerrit.wikimedia.org/r/468814 (https://phabricator.wikimedia.org/T207583)
[12:26:24] (PS1) Urbanecm: Close internalwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/468823 (https://phabricator.wikimedia.org/T205584)
[12:30:53] PROBLEM - High lag on wdqs1003 is CRITICAL: 1.247e+04 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[12:34:39] * onimisionipe is looking into WDQS
[12:39:12] !log depooling wdqs1003 to catchup on lag time
[12:39:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:41:31] onimisionipe: thanks for looking into it!
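A note on the upload-whitelist question from 11:10–11:15 above: the wildcard entries in $wgCopyUploadsDomains span a single DNS label, so *.archive.org matches web.archive.org but not ia800406.us.archive.org. The Python sketch below only illustrates that single-label matching behaviour; it is not the actual check in the UploadFromUrl.php code linked at 11:15, and the extra *.*.archive.org entry mirrors the whitelist change proposed later in this log.

```python
import re

def host_allowed(host, whitelist):
    """Return True if `host` matches any whitelist entry.

    Illustration only: each '*' is treated as exactly one DNS label
    (no dots), so '*.archive.org' covers 'web.archive.org' but not
    'ia800406.us.archive.org'.
    """
    for entry in whitelist:
        # Escape literal dots, then let each '*' match a single label.
        pattern = re.escape(entry).replace(r'\*', r'[^.]+')
        if re.fullmatch(pattern, host):
            return True
    return False

whitelist = ['*.archive.org']
print(host_allowed('web.archive.org', whitelist))             # True
print(host_allowed('ia800406.us.archive.org', whitelist))     # False: two labels before archive.org
print(host_allowed('ia800406.us.archive.org',
                   whitelist + ['*.*.archive.org']))          # True once the extra entry is whitelisted
```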
[12:41:35] (PS1) Urbanecm: Anniversary logo for cswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/468824 (https://phabricator.wikimedia.org/T207589)
[12:42:01] We should have a patch for the kafka poller tomorrow
[12:43:01] gehel: You're welcome!
[12:47:10] (PS2) Zoranzoki21: Enable suppressredirect and markbotedit rights to rollbackers on it.wikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/468075 (https://phabricator.wikimedia.org/T207300)
[12:47:14] (CR) Zoranzoki21: Enable suppressredirect and markbotedit rights to rollbackers on it.wikiversity (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/468075 (https://phabricator.wikimedia.org/T207300) (owner: Zoranzoki21)
[12:47:20] (PS3) Zoranzoki21: Enable suppressredirect and markbotedit rights to rollbackers on it.wikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/468075 (https://phabricator.wikimedia.org/T207300)
[12:55:43] (PS2) Urbanecm: Enable rollbacker right on srwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/468080 (https://phabricator.wikimedia.org/T206935) (owner: Zoranzoki21)
[12:56:04] (PS2) Zoranzoki21: Enable autopatroller, patroller and rollbacker rights on srwikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/468079 (https://phabricator.wikimedia.org/T206936)
[12:56:52] (PS3) Urbanecm: Enable autopatroller, patroller and rollbacker rights on srwikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/468079 (https://phabricator.wikimedia.org/T206936) (owner: Zoranzoki21)
[12:59:16] (CR) Urbanecm: [C: 1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/443585 (https://phabricator.wikimedia.org/T184961) (owner: MarcoAurelio)
[13:00:24] (CR) Urbanecm: [C: 1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/468075 (https://phabricator.wikimedia.org/T207300) (owner: Zoranzoki21)
[13:21:13] (PS3) Zoranzoki21: Enable rollbacker right on srwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/468080 (https://phabricator.wikimedia.org/T206935)
[13:22:55] (CR) Urbanecm: [C: 1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/468080 (https://phabricator.wikimedia.org/T206935) (owner: Zoranzoki21)
[13:33:16] (PS4) Zoranzoki21: Enable rollbacker right on srwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/468080 (https://phabricator.wikimedia.org/T206935)
[13:37:40] (PS4) Zoranzoki21: Enable autopatroller, patroller and rollbacker rights on srwikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/468079 (https://phabricator.wikimedia.org/T206936)
[13:38:53] (PS1) Framawiki: Whitelist *.*.archive.org in wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/468831 (https://phabricator.wikimedia.org/T207581)
[13:39:32] (CR) Urbanecm: [C: 1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/468079 (https://phabricator.wikimedia.org/T206936) (owner: Zoranzoki21)
[13:47:07] Operations, Wikidata, Wikidata-Query-Service, cloud-services-team: Provide a way to have test servers on real hardware, isolated from production for Wikidata Query Service - https://phabricator.wikimedia.org/T206636 (Andrew) Thanks, Stas. There are two ways I think we can go forward with this:...
[13:47:22] (PS6) Andrew Bogott: nova: update scheduling pools for main and eqiad1 [puppet] - https://gerrit.wikimedia.org/r/468377
[13:48:32] (CR) Andrew Bogott: [C: 2] nova: update scheduling pools for main and eqiad1 [puppet] - https://gerrit.wikimedia.org/r/468377 (owner: Andrew Bogott)
[13:49:48] (CR) Andrew Bogott: [C: 2] "I don't know what this was supposed to do, but clearly we're getting along without it :)" [puppet] - https://gerrit.wikimedia.org/r/468697 (owner: Faidon Liambotis)
[13:49:58] (PS2) Andrew Bogott: designate/mitaka: remove typo'ed extension [puppet] - https://gerrit.wikimedia.org/r/468697 (owner: Faidon Liambotis)
[13:52:05] (PS3) Andrew Bogott: labsaliaser: use keystone public port instead of admin port [puppet] - https://gerrit.wikimedia.org/r/468709 (https://phabricator.wikimedia.org/T207533) (owner: Alex Monk)
[13:53:31] (CR) Andrew Bogott: [C: 2] labsaliaser: use keystone public port instead of admin port [puppet] - https://gerrit.wikimedia.org/r/468709 (https://phabricator.wikimedia.org/T207533) (owner: Alex Monk)
[13:57:09] (CR) Andrew Bogott: [C: -1] "This seems right but we probably need the include in an upstream role. As it is this fails on cloudservices1003:" [puppet] - https://gerrit.wikimedia.org/r/468714 (https://phabricator.wikimedia.org/T207533) (owner: Alex Monk)
[13:57:37] (PS7) Zoranzoki21: Edited syntax of the code where is the content for user rights for mlwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/464485
[13:57:54] (CR) Andrew Bogott: [C: 2] "Tested post-merge and it looks good. Thanks!" [puppet] - https://gerrit.wikimedia.org/r/468709 (https://phabricator.wikimedia.org/T207533) (owner: Alex Monk)
[13:58:59] (PS2) Andrew Bogott: labs recursor: require interface alias before trying to start pdns-recursor [puppet] - https://gerrit.wikimedia.org/r/468708 (owner: Alex Monk)
[13:59:33] PROBLEM - configured eth on cumin1001 is CRITICAL: Return code of 255 is out of bounds
[13:59:52] PROBLEM - MD RAID on cumin1001 is CRITICAL: Return code of 255 is out of bounds
[13:59:54] andrewbogott, I'm not sure about that clientlib thing. Shouldn't I be able to apply the pdns recursor profile and have it work without requiring other profiles?
[14:00:02] PROBLEM - dhclient process on cumin1001 is CRITICAL: Return code of 255 is out of bounds
[14:00:12] PROBLEM - Check systemd state on cumin1001 is CRITICAL: Return code of 255 is out of bounds
[14:00:12] PROBLEM - DPKG on cumin1001 is CRITICAL: Return code of 255 is out of bounds
[14:00:13] PROBLEM - Check whether ferm is active by checking the default input chain on cumin1001 is CRITICAL: Return code of 255 is out of bounds
[14:00:22] PROBLEM - Keyholder SSH agent on cumin1001 is CRITICAL: Return code of 255 is out of bounds
[14:00:23] PROBLEM - Disk space on cumin1001 is CRITICAL: Return code of 255 is out of bounds
[14:00:23] PROBLEM - Check size of conntrack table on cumin1001 is CRITICAL: Return code of 255 is out of bounds
[14:01:40] Krenair: yeah, ideally the profile should be able to live on its own. I haven't actually looked, do you know where the conflicting include is?
[14:02:02] PROBLEM - puppet last run on cumin1001 is CRITICAL: Return code of 255 is out of bounds
[14:02:19] hm
[14:02:30] (CR) Andrew Bogott: [C: 2] labs recursor: require interface alias before trying to start pdns-recursor [puppet] - https://gerrit.wikimedia.org/r/468708 (owner: Alex Monk)
[14:02:39] I just noticed that I'm trying to include a main profile inside a base profile too
[14:02:53] I imagine it conflicts with stuff like modules/profile/manifests/openstack/main/designate/service.pp: require ::profile::openstack::main::clientlib
[14:04:49] btw, do you know what Faidon means about 'cloud->prod flow'? Are there things being communicated from the VMs to the recursor other than the names and IPs of instances?
[14:04:53] RECOVERY - Check size of conntrack table on cumin1001 is OK: OK: nf_conntrack is 0 % full
[14:05:12] RECOVERY - configured eth on cumin1001 is OK: OK - interfaces up
[14:05:23] RECOVERY - MD RAID on cumin1001 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0
[14:05:33] RECOVERY - dhclient process on cumin1001 is OK: PROCS OK: 0 processes with command name dhclient
[14:05:43] RECOVERY - Check systemd state on cumin1001 is OK: OK - running: The system is fully operational
[14:05:43] RECOVERY - DPKG on cumin1001 is OK: All packages OK
[14:05:43] RECOVERY - Check whether ferm is active by checking the default input chain on cumin1001 is OK: OK ferm input default policy is set
[14:05:52] RECOVERY - Keyholder SSH agent on cumin1001 is OK: OK: Keyholder is armed with all configured keys.
[14:05:53] RECOVERY - Disk space on cumin1001 is OK: DISK OK
[14:06:33] andrewbogott, I assume he means cloud stuff able to talk to prod stuff in ways that random hosts on the internet would not be able to
[14:07:08] ah, so just because that's a non-public IP? Hm…
[14:07:12] RECOVERY - puppet last run on cumin1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[14:07:20] andrewbogott, well
[14:07:26] it's a public IP
[14:07:39] sorry, I meant to type, a non-public port
[14:07:47] yeah
[14:07:54] I don't generally think of DNS servers as an attack vector
[14:08:01] this is kind of a tangent anyway
[14:08:30] it seems the basic idea is to move as much cloud infrastructure as possible into cloud VMs
[14:08:50] see PM
[14:16:33] PROBLEM - MD RAID on cumin1001 is CRITICAL: Return code of 255 is out of bounds
[14:16:42] PROBLEM - dhclient process on cumin1001 is CRITICAL: Return code of 255 is out of bounds
[14:16:53] PROBLEM - Check systemd state on cumin1001 is CRITICAL: Return code of 255 is out of bounds
[14:16:53] PROBLEM - DPKG on cumin1001 is CRITICAL: Return code of 255 is out of bounds
[14:16:53] PROBLEM - Check whether ferm is active by checking the default input chain on cumin1001 is CRITICAL: Return code of 255 is out of bounds
[14:16:57] Operations, Cloud-VPS, Patch-For-Review: Move labs-recursors in WMCS - https://phabricator.wikimedia.org/T207533 (Andrew) My only concern about this is that those recursors are used about every second on every VM, so they're a huge, vital point of failure and I'm a bit reluctant to rock the boat. In...
[14:17:02] PROBLEM - Keyholder SSH agent on cumin1001 is CRITICAL: Return code of 255 is out of bounds
[14:17:03] PROBLEM - Disk space on cumin1001 is CRITICAL: Return code of 255 is out of bounds
[14:17:12] PROBLEM - Check size of conntrack table on cumin1001 is CRITICAL: Return code of 255 is out of bounds
[14:17:32] PROBLEM - configured eth on cumin1001 is CRITICAL: Return code of 255 is out of bounds
[14:19:42] PROBLEM - puppet last run on cumin1001 is CRITICAL: Return code of 255 is out of bounds
[14:23:43] PROBLEM - Check the NTP synchronisation status of timesyncd on cumin1001 is CRITICAL: Return code of 255 is out of bounds
[14:34:43] RECOVERY - Check systemd state on cumin1001 is OK: OK - running: The system is fully operational
[14:34:43] RECOVERY - DPKG on cumin1001 is OK: All packages OK
[14:34:52] RECOVERY - Check whether ferm is active by checking the default input chain on cumin1001 is OK: OK ferm input default policy is set
[14:34:53] RECOVERY - Keyholder SSH agent on cumin1001 is OK: OK: Keyholder is armed with all configured keys.
[14:34:53] RECOVERY - Disk space on cumin1001 is OK: DISK OK
[14:35:02] RECOVERY - puppet last run on cumin1001 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[14:35:02] RECOVERY - Check size of conntrack table on cumin1001 is OK: OK: nf_conntrack is 0 % full
[14:35:22] RECOVERY - configured eth on cumin1001 is OK: OK - interfaces up
[14:35:33] RECOVERY - MD RAID on cumin1001 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0
[14:35:42] RECOVERY - dhclient process on cumin1001 is OK: PROCS OK: 0 processes with command name dhclient
[14:53:52] RECOVERY - Check the NTP synchronisation status of timesyncd on cumin1001 is OK: OK: synced at Sun 2018-10-21 14:53:45 UTC.
[15:35:22] PROBLEM - Logstash rate of ingestion percent change compared to yesterday on einsteinium is CRITICAL: 136.3 ge 130 https://grafana.wikimedia.org/dashboard/db/logstash?orgId=1&panelId=2&fullscreen
[15:57:28] !log adjust patch for T194204
[15:57:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:11:08] (PS1) Reedy: Updating interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/468845
[16:11:10] (CR) Reedy: [C: 2] Updating interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/468845 (owner: Reedy)
[16:14:52] (Merged) jenkins-bot: Updating interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/468845 (owner: Reedy)
[16:15:57] !log reedy@deploy1001 Synchronized wmf-config/interwiki.php: Updating interwiki cache (duration: 04m 52s)
[16:15:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:25:50] (CR) jenkins-bot: Updating interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/468845 (owner: Reedy)
[16:33:08] What's happening with jobs?
[16:33:37] gate-and-submit stopped working and started everything over from the start
[16:33:56] https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/467119/ is not merged
[16:47:43] I looked ^^ things look fine to me. The patch they referenced was odd (they Verified+2'd instead of just CodeReview+2'ing it); I re-+2'd, we'll see
[16:50:45] ok, maybe not?
zuul isn't picking up my +2
[16:53:11] i see it in the queue greg-g
[16:53:19] 467119,6
[16:53:25] and 453660,14
[17:03:23] (PS3) GTirloni: shinken: Adjustments necessary to upgrade 1.4->2.0 and Trusty->Jessie [puppet] - https://gerrit.wikimedia.org/r/468792 (https://phabricator.wikimedia.org/T204562)
[17:04:05] (CR) jerkins-bot: [V: -1] shinken: Adjustments necessary to upgrade 1.4->2.0 and Trusty->Jessie [puppet] - https://gerrit.wikimedia.org/r/468792 (https://phabricator.wikimedia.org/T204562) (owner: GTirloni)
[17:05:54] paladox: there's no update from zuul: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/467119/
[17:07:53] paladox: but yeah, I see that 467119,6 there being worked up, we'll see
[17:12:37] (PS1) GTirloni: shinken: Remove unused 'Keyholder status' check [puppet] - https://gerrit.wikimedia.org/r/468848 (https://phabricator.wikimedia.org/T183454)
[17:22:41] (CR) Alex Monk: [C: -1] "This is not for toolsbeta." [puppet] - https://gerrit.wikimedia.org/r/468848 (https://phabricator.wikimedia.org/T183454) (owner: GTirloni)
[17:22:58] heh, and now it failed
[17:26:16] (CR) GTirloni: "> Patch Set 1: Code-Review-1" [puppet] - https://gerrit.wikimedia.org/r/468848 (https://phabricator.wikimedia.org/T183454) (owner: GTirloni)
[17:26:19] (CR) Alex Monk: [C: -1] "That said looks like it has been broken since Ieac6487d." [puppet] - https://gerrit.wikimedia.org/r/468848 (https://phabricator.wikimedia.org/T183454) (owner: GTirloni)
[17:27:46] (PS2) Alex Monk: shinken: Remove broken 'Keyholder status' check [puppet] - https://gerrit.wikimedia.org/r/468848 (https://phabricator.wikimedia.org/T183454) (owner: GTirloni)
[17:27:57] (CR) Alex Monk: [C: 1] shinken: Remove broken 'Keyholder status' check [puppet] - https://gerrit.wikimedia.org/r/468848 (https://phabricator.wikimedia.org/T183454) (owner: GTirloni)
[17:37:08] LGTM now https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/467119/
[19:11:11] Hi, why doesn't phab.wmflabs.org send me email?
[19:15:05] Zoranzoki21: it's better to ask in -cloud, or the relevant tool maintainers directly
[19:15:11] because it's in neutron
[19:17:32] Operations, Wikidata, Wikidata-Query-Service, cloud-services-team: Provide a way to have test servers on real hardware, isolated from production for Wikidata Query Service - https://phabricator.wikimedia.org/T206636 (Smalyshev) > Run a second set of tests on a similar VM that shares a host with o...
[19:38:21] (PS1) GTirloni: git-sync-upstream: Send cron mail in case of failures [puppet] - https://gerrit.wikimedia.org/r/468865 (https://phabricator.wikimedia.org/T184261)
[20:42:26] !log resuming replication on s4@dbstore2002 (T204930)
[20:42:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:42:29] T204930: dbstore2002 tables compression status check - https://phabricator.wikimedia.org/T204930
[20:56:18] I ack'd a predictive disk failure on db2061 (https://phabricator.wikimedia.org/T207212#4679442)
[20:56:40] Operations, Continuous-Integration-Infrastructure, Release-Engineering-Team, monitoring, Patch-For-Review: Releases Jenkins Icinga check failing after restricting access - https://phabricator.wikimedia.org/T206579 (hashar) Sorry for the trailing `/` and thank you for the quick fix.
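On the Zuul question from 16:33–17:07 above: the exchange suggests the gate-and-submit pipeline is enqueued off a Code-Review +2 approval, which is why a bare Verified +2 on the referenced patch did nothing until it was re-+2'd. The snippet below is a hypothetical, heavily simplified sketch of that kind of approval filtering, for illustration only; real Zuul reads its trigger rules from its layout configuration, and the names and structures here are assumptions, not Zuul's actual API.

```python
# Hypothetical sketch: decide whether a Gerrit "comment-added" event
# should enqueue a change into a gate pipeline that triggers on a
# Code-Review +2 vote.  Field names are illustrative assumptions.

GATE_TRIGGER = {"category": "Code-Review", "value": 2}

def should_enqueue(event_approvals):
    """event_approvals: list of dicts like {'category': ..., 'value': ...}."""
    return any(
        a["category"] == GATE_TRIGGER["category"] and a["value"] >= GATE_TRIGGER["value"]
        for a in event_approvals
    )

# A Verified+2 vote alone does not satisfy a Code-Review-based trigger:
print(should_enqueue([{"category": "Verified", "value": 2}]))      # False
# A fresh Code-Review+2 (like the re-+2 mentioned at 16:47) does:
print(should_enqueue([{"category": "Code-Review", "value": 2}]))   # True
```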
[21:45:44] Operations, MediaWiki-Page-deletion, Performance: Deleting pages on the English Wikipedia is very slow - https://phabricator.wikimedia.org/T207530 (BPirkle) It is unexpected that the website would appear to hang, whether or not the deletion is batched. The batch deletion threshold is controlled by $...
[22:11:09] (PS4) GTirloni: shinken: Adjustments necessary to upgrade 1.4->2.0 and Trusty->Jessie [puppet] - https://gerrit.wikimedia.org/r/468792 (https://phabricator.wikimedia.org/T204562)
[22:11:19] (CR) Mathew.onipe: "I have some concern with the way I implemented this. I have another proposal tomorrow that I think is better. Will discuss in standup. Tha" (2 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T202885) (owner: Mathew.onipe)
[22:14:23] RECOVERY - High lag on wdqs1003 is OK: (C)3600 ge (W)1200 ge 1131 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[22:15:39] !log repooling wdqs1003 as it has caught up on lag
[22:15:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:20:20] onimisionipe: thanks again!
[22:20:23] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[22:21:22] gehel: uwc!
[22:22:42] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
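For readers puzzled by the "(C)3600 ge (W)1200 ge 1131" style output in the wdqs and logstash alerts throughout this log: one plausible reading, consistent with the PROBLEM and RECOVERY values shown above, is that the check compares the current value against the warning and critical thresholds using the stated operator ("ge" meaning greater-or-equal). The sketch below only illustrates that reading with the wdqs1003 numbers from 12:30:53 and 22:14:23; it is an assumption about the output format, not the actual check plugin.

```python
# Sketch of one plausible reading of the "(C)<crit> ge (W)<warn> ge <value>"
# alert output: the value is compared against the critical threshold first,
# then the warning threshold, using the stated operator.

def classify(value, warn, crit, op="ge"):
    cmp = (lambda a, b: a >= b) if op == "ge" else (lambda a, b: a <= b)
    if cmp(value, crit):
        return "CRITICAL"   # e.g. 1.247e+04 ge 3600 at 12:30:53
    if cmp(value, warn):
        return "WARNING"
    return "OK"             # e.g. (C)3600 ge (W)1200 ge 1131 at 22:14:23

print(classify(12470, warn=1200, crit=3600))  # CRITICAL
print(classify(1131, warn=1200, crit=3600))   # OK
```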