[00:18:58] <icinga-wm>	 PROBLEM - MegaRAID on db1016 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough
[02:30:03] <logmsgbot>	 !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.11) (duration: 10m 16s)
[02:30:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:51:00] <logmsgbot>	 !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.12) (duration: 07m 56s)
[02:51:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:57:42] <logmsgbot>	 !log l10nupdate@tin ResourceLoader cache refresh completed at Mon Aug  7 02:57:42 UTC 2017 (duration 6m 42s)
[02:57:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:26:07] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 817.30 seconds
[03:52:08] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 297.06 seconds
[04:38:57] <icinga-wm>	 PROBLEM - puppet last run on labtestweb2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:08:17] <icinga-wm>	 RECOVERY - puppet last run on labtestweb2001 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[06:20:17] <wikibugs>	 Operations, ops-eqiad, DBA: db1016 m1 master: Possibly faulty BBU - https://phabricator.wikimedia.org/T166344#3505078 (Marostegui) The BBU is failing again, so we should try to give m1 master failover some priority amongst the other misc services.
[06:20:46] <marostegui>	 !log Force BBU re-learn on db1016 - T166344
[06:20:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:21:00] <stashbot>	 T166344: db1016 m1 master: Possibly faulty BBU - https://phabricator.wikimedia.org/T166344
[06:22:27] <wikibugs>	 (PS1) Marostegui: Revert "db-codfw.php: Depool db2073" [mediawiki-config] - https://gerrit.wikimedia.org/r/370438
[06:22:30] <wikibugs>	 (PS2) Marostegui: Revert "db-codfw.php: Depool db2073" [mediawiki-config] - https://gerrit.wikimedia.org/r/370438
[06:23:57] <wikibugs>	 (PS1) Giuseppe Lavagetto: puppetmaster::puppetdb::client: fix dependencies. [puppet] - https://gerrit.wikimedia.org/r/370439 (https://phabricator.wikimedia.org/T172547)
[06:24:30] <wikibugs>	 (CR) Marostegui: [C: 2] Revert "db-codfw.php: Depool db2073" [mediawiki-config] - https://gerrit.wikimedia.org/r/370438 (owner: Marostegui)
[06:25:57] <wikibugs>	 (Merged) jenkins-bot: Revert "db-codfw.php: Depool db2073" [mediawiki-config] - https://gerrit.wikimedia.org/r/370438 (owner: Marostegui)
[06:26:07] <wikibugs>	 (CR) jenkins-bot: Revert "db-codfw.php: Depool db2073" [mediawiki-config] - https://gerrit.wikimedia.org/r/370438 (owner: Marostegui)
[06:27:09] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2073 - T171321 (duration: 00m 47s)
[06:27:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:27:22] <stashbot>	 T171321: Finish dbstore2002 migration to multi-instance - https://phabricator.wikimedia.org/T171321
[06:28:10] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: 2] puppetmaster::puppetdb::client: fix dependencies. [puppet] - https://gerrit.wikimedia.org/r/370439 (https://phabricator.wikimedia.org/T172547) (owner: Giuseppe Lavagetto)
[06:29:08] <icinga-wm>	 RECOVERY - MegaRAID on db1016 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[06:29:50] <wikibugs>	 Operations, ops-eqiad, DBA: db1016 m1 master: Possibly faulty BBU - https://phabricator.wikimedia.org/T166344#3505090 (Marostegui) After forcing the relearn, this recovered: ``` ˜/icinga-wm 8:29> RECOVERY - MegaRAID on db1016 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy ```
[06:30:57] <wikibugs>	 (PS1) Marostegui: db-codfw.php: Depool db2074 [mediawiki-config] - https://gerrit.wikimedia.org/r/370440 (https://phabricator.wikimedia.org/T171321)
[06:32:13] <wikibugs>	 Puppet, Cloud-VPS, Patch-For-Review: ::profile::puppetmaster::common missing dependencies when $storeconfigs=puppetdb - https://phabricator.wikimedia.org/T172547#3505092 (Joe) Open>Resolved
[06:33:00] <marostegui>	 !log Stop replication on db2075 - T170662
[06:33:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:33:12] <stashbot>	 T170662: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662
[06:33:56] <wikibugs>	 (CR) Marostegui: [C: 2] db-codfw.php: Depool db2074 [mediawiki-config] - https://gerrit.wikimedia.org/r/370440 (https://phabricator.wikimedia.org/T171321) (owner: Marostegui)
[06:35:21] <wikibugs>	 (Merged) jenkins-bot: db-codfw.php: Depool db2074 [mediawiki-config] - https://gerrit.wikimedia.org/r/370440 (https://phabricator.wikimedia.org/T171321) (owner: Marostegui)
[06:36:10] <wikibugs>	 (CR) jenkins-bot: db-codfw.php: Depool db2074 [mediawiki-config] - https://gerrit.wikimedia.org/r/370440 (https://phabricator.wikimedia.org/T171321) (owner: Marostegui)
[06:37:39] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2074 - T171321 (duration: 00m 46s)
[06:37:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:37:51] <stashbot>	 T171321: Finish dbstore2002 migration to multi-instance - https://phabricator.wikimedia.org/T171321
[06:38:26] <marostegui>	 !log Stop MySQL on db2074 - T171321
[06:38:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:42:44] <wikibugs>	 (PS1) Marostegui: mariadb: Add s3 to dbstore2002 [puppet] - https://gerrit.wikimedia.org/r/370441 (https://phabricator.wikimedia.org/T171321)
[06:45:36] <wikibugs>	 (CR) Marostegui: "Puppet looks good: https://puppet-compiler.wmflabs.org/compiler02/7314/" [puppet] - https://gerrit.wikimedia.org/r/370441 (https://phabricator.wikimedia.org/T171321) (owner: Marostegui)
[06:45:41] <wikibugs>	 (CR) Marostegui: [C: 2] mariadb: Add s3 to dbstore2002 [puppet] - https://gerrit.wikimedia.org/r/370441 (https://phabricator.wikimedia.org/T171321) (owner: Marostegui)
[06:52:33] <wikibugs>	 (PS1) Marostegui: db-codfw.php: Depool db2065 [mediawiki-config] - https://gerrit.wikimedia.org/r/370442
[06:59:35] <wikibugs>	 (CR) Marostegui: [C: 2] db-codfw.php: Depool db2065 [mediawiki-config] - https://gerrit.wikimedia.org/r/370442 (owner: Marostegui)
[07:01:05] <wikibugs>	 (Merged) jenkins-bot: db-codfw.php: Depool db2065 [mediawiki-config] - https://gerrit.wikimedia.org/r/370442 (owner: Marostegui)
[07:01:15] <wikibugs>	 (CR) jenkins-bot: db-codfw.php: Depool db2065 [mediawiki-config] - https://gerrit.wikimedia.org/r/370442 (owner: Marostegui)
[07:02:16] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2065 to reimport: page, linter and watchlist tables (duration: 00m 47s)
[07:02:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:02:37] <marostegui>	 !log Stop replication on db2065 to reimport: page, linter and watchlist tables
[07:02:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:29:30] <wikibugs>	 Operations, ops-eqiad, Analytics, User-Elukey: Analytics1034 eth0 negotiated speed to 100Mb/s instead of 1000Mb/s - https://phabricator.wikimedia.org/T172633#3505104 (elukey) Tried this:  * ifdown eth0 * modprobe -r tg3 * modprobe tg3 * ifup eth0  ``` [Mon Aug  7 07:28:13 2017] pps_core: LinuxPPS...
[08:09:08] <icinga-wm>	 PROBLEM - MegaRAID on db1016 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough
[08:11:22] <marostegui>	 Again..
[08:11:54] <wikibugs>	 Operations, ops-eqiad, DBA: db1016 m1 master: Possibly faulty BBU - https://phabricator.wikimedia.org/T166344#3505175 (Marostegui) And again: `˜/icinga-wm 10:09> PROBLEM - MegaRAID on db1016 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough`
[08:12:01] <marostegui>	 !log Force BBU re-learn on db1016 - T166344
[08:12:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:12:15] <stashbot>	 T166344: db1016 m1 master: Possibly faulty BBU - https://phabricator.wikimedia.org/T166344
[08:13:20] <wikibugs>	 Operations, ops-eqiad, DBA: db1016 m1 master: Possibly faulty BBU - https://phabricator.wikimedia.org/T166344#3293244 (jcrespo) Maybe we can setup m1 on db1069?
[08:18:27] <wikibugs>	 Operations, ops-eqiad, DBA: db1016 m1 master: Possibly faulty BBU - https://phabricator.wikimedia.org/T166344#3505184 (Marostegui) >>! In T166344#3505178, @jcrespo wrote: > Maybe we can setup m1 on db1069?  I like that idea, I'll try to work on: T166546 soon as I am about to finish with: T153743
[08:28:59] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Discovery-Search (Current work): some elasticsearch servers in eqiad have CPU overheating - https://phabricator.wikimedia.org/T168816#3505200 (Gehel) elastic1017-1031 have had thermal paste applied. Looking at [[ https://grafana.wikimedia.org/dashboard/db/prometheus...
[08:30:47] <Deskana>	 Is there anyone available here that can help me log in to Phabricator?
[08:31:06] <Deskana>	 I'm locked out due to no longer having the device that it wants an auth code from.
[08:37:57] <Bsadowski1>	 Ouch
[08:38:00] <Bsadowski1>	 Hey Deskana :D
[08:38:11] <Bsadowski1>	 How are ya?
[08:45:40] <wikibugs>	 (PS1) Marostegui: Revert "db-codfw.php: Depool db2065" [mediawiki-config] - https://gerrit.wikimedia.org/r/370444
[09:00:41] <wikibugs>	 (CR) Marostegui: [C: 2] Revert "db-codfw.php: Depool db2065" [mediawiki-config] - https://gerrit.wikimedia.org/r/370444 (owner: Marostegui)
[09:02:05] <wikibugs>	 (Merged) jenkins-bot: Revert "db-codfw.php: Depool db2065" [mediawiki-config] - https://gerrit.wikimedia.org/r/370444 (owner: Marostegui)
[09:02:19] <wikibugs>	 (CR) jenkins-bot: Revert "db-codfw.php: Depool db2065" [mediawiki-config] - https://gerrit.wikimedia.org/r/370444 (owner: Marostegui)
[09:03:53] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2065 after fixing: linter, page and watchlist tables (duration: 00m 47s)
[09:04:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:06:33] <elukey>	 !log set net.netfilter.nf_conntrack_tcp_timeout_time_wait=65 (was 120) on all the analytics kafka brokers  - T136094
[09:06:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:06:44] <stashbot>	 T136094: Race condition in setting net.netfilter.nf_conntrack_tcp_timeout_time_wait - https://phabricator.wikimedia.org/T136094
[09:07:06] <wikibugs>	 Operations, Patch-For-Review, User-Elukey: Race condition in setting net.netfilter.nf_conntrack_tcp_timeout_time_wait - https://phabricator.wikimedia.org/T136094#3505255 (elukey)
[09:17:21] <wikibugs>	 (PS1) Giuseppe Lavagetto: role::puppet_compiler: bind ssl to 0.0.0.0 [puppet] - https://gerrit.wikimedia.org/r/370445
[09:37:35] <wikibugs>	 (PS1) Jcrespo: mariadb: Pool db1098 as new s8 recentchanges/watchlist host [puppet] - https://gerrit.wikimedia.org/r/370447 (https://phabricator.wikimedia.org/T172679)
[09:42:34] <wikibugs>	 (PS2) Jcrespo: mariadb: Pool db1098 as new s6 recentchanges/watchlist host [puppet] - https://gerrit.wikimedia.org/r/370447 (https://phabricator.wikimedia.org/T172679)
[09:47:45] <jynus>	 !log stopping db1050's mysql and cloning it to db1089
[09:47:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:50:08] <wikibugs>	 (PS1) Marostegui: s3.hosts: dbstore2002 is now replicating s3 [software] - https://gerrit.wikimedia.org/r/370448 (https://phabricator.wikimedia.org/T171321)
[09:52:17] <wikibugs>	 (PS1) Marostegui: mariadb: dbstore2002 has now 5 shards replicating [puppet] - https://gerrit.wikimedia.org/r/370449 (https://phabricator.wikimedia.org/T171321)
[09:52:24] <marostegui>	 jynus: ^
[09:52:53] <wikibugs>	 (CR) Jcrespo: [C: 1] mariadb: dbstore2002 has now 5 shards replicating [puppet] - https://gerrit.wikimedia.org/r/370449 (https://phabricator.wikimedia.org/T171321) (owner: Marostegui)
[09:53:12] <wikibugs>	 (CR) Marostegui: [C: 2] mariadb: dbstore2002 has now 5 shards replicating [puppet] - https://gerrit.wikimedia.org/r/370449 (https://phabricator.wikimedia.org/T171321) (owner: Marostegui)
[09:56:05] <wikibugs>	 (CR) Marostegui: [C: 2] s3.hosts: dbstore2002 is now replicating s3 [software] - https://gerrit.wikimedia.org/r/370448 (https://phabricator.wikimedia.org/T171321) (owner: Marostegui)
[09:56:51] <wikibugs>	 (Merged) jenkins-bot: s3.hosts: dbstore2002 is now replicating s3 [software] - https://gerrit.wikimedia.org/r/370448 (https://phabricator.wikimedia.org/T171321) (owner: Marostegui)
[09:59:24] <wikibugs>	 (PS2) Giuseppe Lavagetto: role::puppet_compiler: bind ssl to 0.0.0.0 [puppet] - https://gerrit.wikimedia.org/r/370445
[10:02:14] <marostegui>	 !log Add dbstore2002:3313 to tendril - T171321
[10:02:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:02:26] <stashbot>	 T171321: Finish dbstore2002 migration to multi-instance - https://phabricator.wikimedia.org/T171321
[10:07:02] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: 2] "https://puppet-compiler.wmflabs.org/compiler02/7321/" [puppet] - https://gerrit.wikimedia.org/r/370445 (owner: Giuseppe Lavagetto)
[10:14:34] <Volker_E>	 qchris: around?
[10:14:45] <wikibugs>	 Operations, Analytics-Kanban, User-Elukey: Analytics Kafka cluster causing timeouts to Varnishkafka since July 28th - https://phabricator.wikimedia.org/T172681#3505470 (elukey)
[10:18:59] <wikibugs>	 Operations, Analytics-Kanban, User-Elukey: Analytics Kafka cluster causing timeouts to Varnishkafka since July 28th - https://phabricator.wikimedia.org/T172681#3505486 (elukey)
[10:19:08] <icinga-wm>	 RECOVERY - MegaRAID on db1016 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[10:23:03] <wikibugs>	 Operations, Analytics-Kanban, User-Elukey: Analytics Kafka cluster causing timeouts to Varnishkafka since July 28th - https://phabricator.wikimedia.org/T172681#3505490 (elukey)
[10:23:34] <wikibugs>	 (PS1) Giuseppe Lavagetto: puppet-compiler: fix typo [puppet] - https://gerrit.wikimedia.org/r/370451
[10:24:28] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: 2] puppet-compiler: fix typo [puppet] - https://gerrit.wikimedia.org/r/370451 (owner: Giuseppe Lavagetto)
[10:28:02] <wikibugs>	 (CR) MarcoAurelio: Enable wgMinervaEnableSiteNotice for kowiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/370363 (https://phabricator.wikimedia.org/T172630) (owner: Revi)
[10:30:12] <TabbyCat>	 Hi Reedy: do you think https://gerrit.wikimedia.org/r/#/c/370310/ is ready?
[10:40:18] <wikibugs>	 (PS3) Revi: Enable wgMinervaEnableSiteNotice for kowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/370363 (https://phabricator.wikimedia.org/T172630)
[10:41:52] <wikibugs>	 (CR) Revi: Enable wgMinervaEnableSiteNotice for kowiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/370363 (https://phabricator.wikimedia.org/T172630) (owner: Revi)
[10:42:10] <revi>	 err apple waych
[10:42:12] <revi>	 watch*
[10:42:47] <wikibugs>	 (CR) MarcoAurelio: [C: 1] Enable wgMinervaEnableSiteNotice for kowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/370363 (https://phabricator.wikimedia.org/T172630) (owner: Revi)
[10:42:54] <TabbyCat>	 :D
[10:43:23] <revi>	 :DD
[10:48:47] <qchris_>	 Volker_E: Yup. What's up?
[10:53:44] <wikibugs>	 (CR) Thiemo Mättig (WMDE): [C: 1] mediawiki: Another increase of batch size in dispatchChanges cronjob [puppet] - https://gerrit.wikimedia.org/r/370315 (https://phabricator.wikimedia.org/T171263) (owner: Ladsgroup)
[11:17:58] <wikibugs>	 (PS1) Ladsgroup: beta: Add copyright info for Wikidata API [mediawiki-config] - https://gerrit.wikimedia.org/r/370455 (https://phabricator.wikimedia.org/T112606)
[11:21:37] <wikibugs>	 Operations, ops-eqiad, DBA: db1016 m1 master: Possibly faulty BBU - https://phabricator.wikimedia.org/T166344#3505584 (Marostegui) ``` ˜/icinga-wm 12:19> RECOVERY - MegaRAID on db1016 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy ```
[11:29:07] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0]
[11:36:07] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[11:38:07] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[12:07:26] <wikibugs>	 (PS12) Gehel: logstash: Parse nginx access logs for wdqs [puppet] - https://gerrit.wikimedia.org/r/299825 (owner: BryanDavis)
[12:08:04] <gehel>	 !log deploying https://gerrit.wikimedia.org/r/#/c/299825/ - some logs will be lost during logstash restart
[12:08:08] <wikibugs>	 (CR) Gehel: [C: 2] logstash: Parse nginx access logs for wdqs [puppet] - https://gerrit.wikimedia.org/r/299825 (owner: BryanDavis)
[12:08:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:09:13] <gehel>	 _joe_: there is a puppet compiler fix not yet merged on puppetmaster1001. Ok if I merge it with my change?
[12:10:09] <gehel>	 _joe_: it looks trivial enough, I'm merging it...
[12:15:37] <wikibugs>	 (PS1) Marostegui: Revert "db-codfw.php: Depool db2074" [mediawiki-config] - https://gerrit.wikimedia.org/r/370457
[12:15:41] <wikibugs>	 (PS2) Marostegui: Revert "db-codfw.php: Depool db2074" [mediawiki-config] - https://gerrit.wikimedia.org/r/370457
[12:22:16] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1009 is CRITICAL: PYBAL CRITICAL - logstash-syslog-tcp_10514 - Could not depool server logstash1001.eqiad.wmnet because of too many down!: logstash-json-tcp_11514 - Could not depool server logstash1001.eqiad.wmnet because of too many down!: logstash-log4j_4560 - Could not depool server logstash1001.eqiad.wmnet because of too many down!: logstash-json-udp_11514_udp - Could not depool server logstash1001.eqiad.wmnet because of too many down!: logstash-syslog-udp_10514_udp - Could not depool server logstash1001.eqiad.wmnet because of too many down!
[12:22:36] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1003 is CRITICAL: PYBAL CRITICAL - logstash-syslog-udp_10514_udp - Could not depool server logstash1002.eqiad.wmnet because of too many down!: logstash-json-tcp_11514 - Could not depool server logstash1002.eqiad.wmnet because of too many down!: logstash-log4j_4560 - Could not depool server logstash1002.eqiad.wmnet because of too many down!: logstash-json-udp_11514_udp - Could not depool server logstash1002.eqiad.wmnet because of too many down!: logstash-syslog-tcp_10514 - Could not depool server logstash1002.eqiad.wmnet because of too many down!
[12:22:46] <gehel>	 ^that's me, rolling back ...
[12:22:56] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1006 is CRITICAL: PYBAL CRITICAL - logstash-syslog-tcp_10514 - Could not depool server logstash1002.eqiad.wmnet because of too many down!: logstash-json-tcp_11514 - Could not depool server logstash1002.eqiad.wmnet because of too many down!: logstash-log4j_4560 - Could not depool server logstash1002.eqiad.wmnet because of too many down!: logstash-json-udp_11514_udp - Could not depool server logstash1002.eqiad.wmnet because of too many down!: logstash-syslog-udp_10514_udp - Could not depool server logstash1002.eqiad.wmnet because of too many down!
[12:23:41] <wikibugs>	 (PS1) Gehel: Revert "logstash: Parse nginx access logs for wdqs" [puppet] - https://gerrit.wikimedia.org/r/370460
[12:23:47] <wikibugs>	 (CR) Marostegui: [C: 2] Revert "db-codfw.php: Depool db2074" [mediawiki-config] - https://gerrit.wikimedia.org/r/370457 (owner: Marostegui)
[12:23:55] <wikibugs>	 (CR) Gehel: [V: 2 C: 2] Revert "logstash: Parse nginx access logs for wdqs" [puppet] - https://gerrit.wikimedia.org/r/370460 (owner: Gehel)
[12:25:01] <icinga-wm>	 PROBLEM - LVS HTTP IPv4 on logstash.svc.eqiad.wmnet is CRITICAL: connect to address 10.2.2.36 and port 10514: Connection refused
[12:25:19] <wikibugs>	 (Merged) jenkins-bot: Revert "db-codfw.php: Depool db2074" [mediawiki-config] - https://gerrit.wikimedia.org/r/370457 (owner: Marostegui)
[12:25:30] <gehel>	 now I need to understand why this is working fine on labs...
[12:26:05] <_joe_>	 gehel: yeah sorry 
[12:26:20] <gehel>	 _joe_: no problem :)
[12:26:23] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2074 - T171321 (duration: 00m 45s)
[12:26:23] <wikibugs>	 (CR) jenkins-bot: Revert "db-codfw.php: Depool db2074" [mediawiki-config] - https://gerrit.wikimedia.org/r/370457 (owner: Marostegui)
[12:26:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:26:35] <stashbot>	 T171321: Finish dbstore2002 migration to multi-instance - https://phabricator.wikimedia.org/T171321
[12:28:58] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1006 is OK: PYBAL OK - All pools are healthy
[12:31:01] <icinga-wm>	 RECOVERY - LVS HTTP IPv4 on logstash.svc.eqiad.wmnet is OK: TCP OK - 0.000 second response time on 10.2.2.36 port 10514
[12:31:06] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1009 is OK: PYBAL OK - All pools are healthy
[12:31:06] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1003 is OK: PYBAL OK - All pools are healthy
[12:33:36] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[12:34:20] * gehel is also having a look at elasticsearch slowing down...
[12:36:16] <icinga-wm>	 PROBLEM - pdfrender on scb1001 is CRITICAL: connect to address 10.64.0.16 and port 5252: Connection refused
[12:37:36] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[12:37:47] <wikibugs>	 (PS1) Gehel: logstash: Parse nginx access logs for wdqs [puppet] - https://gerrit.wikimedia.org/r/370463
[12:38:37] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[12:39:11] <wikibugs>	 (CR) Gehel: [C: 2] logstash: Parse nginx access logs for wdqs [puppet] - https://gerrit.wikimedia.org/r/370463 (owner: Gehel)
[12:39:18] <elukey>	 !log restart kafka on kafka1018 to force it out of the kafka topic leaders - T172681
[12:39:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:39:29] <_joe_>	 !log restarting pdfrender on scb1001, T159922
[12:39:29] <stashbot>	 T172681: Analytics Kafka cluster causing timeouts to Varnishkafka since July 28th - https://phabricator.wikimedia.org/T172681
[12:39:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:39:40] <stashbot>	 T159922: pdfrender fails to serve requests since Mar 8 00:30:32 UTC on scb1003 - https://phabricator.wikimedia.org/T159922
[12:43:17] <icinga-wm>	 RECOVERY - pdfrender on scb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.002 second response time
[13:00:06] <jouncebot>	 addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear anthropoid, the time has come. Please deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170807T1300).
[13:00:06] <jouncebot>	 stephanebisson, TabbyCat, revi, and Amir1: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[13:00:14] <TabbyCat>	 o/
[13:00:16] <Amir1>	 o/
[13:00:16] <revi>	 available
[13:00:19] <stephanebisson>	 hello
[13:02:10] <wikibugs>	 (CR) Jcrespo: [C: 2] mariadb: Pool db1098 as new s6 recentchanges/watchlist host [puppet] - https://gerrit.wikimedia.org/r/370447 (https://phabricator.wikimedia.org/T172679) (owner: Jcrespo)
[13:02:17] <wikibugs>	 (PS3) Jcrespo: mariadb: Pool db1098 as new s6 recentchanges/watchlist host [puppet] - https://gerrit.wikimedia.org/r/370447 (https://phabricator.wikimedia.org/T172679)
[13:04:48] * revi drinks his cola
[13:07:44] <TabbyCat>	 who's swatting today?
[13:08:03] * TabbyCat eyes aude
[13:09:12] <revi>	 I think wikimania season...
[13:09:16] <revi>	 seems errbody is there?
[13:10:53] <TabbyCat>	 Amir1: can you deploy?
[13:11:14] <Amir1>	 I can but I'm not an official SWATer
[13:11:32] <TabbyCat>	 is that a hard blocker?
[13:11:41] <TabbyCat>	 (not forcing you, just asking)
[13:11:45] <Amir1>	 The problem is, I don't know :D
[13:11:50] <TabbyCat>	 lol k
[13:12:19] <TabbyCat>	 revi: yep, WM is probably the reason; I think I saw something on Wikitech about that?
[13:12:34] <revi>	 Wikimania is Aug 9th through 13th
[13:12:38] <revi>	 MW Train will progress as normal
[13:12:38] <revi>	 Service and SWAT deploys will be on a best-effort basis (if anything to deploy)
[13:12:39] <revi>	 probably you saw this
[13:12:43] <TabbyCat>	 ^^
[13:12:49] <revi>	 Right after ==Week of August 7th==
[13:13:24] <revi>	 I think deployers would be available for murrican morning deploy and evening but that would be too late for me
[13:13:31] <revi>	 03:00–04:00 UTC+9	
[13:13:36] <revi>	 that's... no...no....
[13:13:44] * Amir1 facepalms
[13:13:54] <Amir1>	 I'm pinging releng
[13:14:07] <revi>	 I would be awake at 8AM (evening swat) but I have to fly for Wikimania at that time
[13:14:09] <revi>	 :P
[13:14:14] <revi>	 so now or Wed
[13:14:21] <TabbyCat>	 oh revi I'll see you there then
[13:14:28] <revi>	 :D
[13:14:33] <TabbyCat>	 (joke)
[13:14:38] <revi>	 telepathy
[13:15:15] <jynus>	 !log reboot db1098
[13:15:22] <_joe_>	 yes, SWAT windows should be done by official swatters only, unless there's explicit permission or a UBN! ticket
[13:15:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:15:56] <Amir1>	 Hello everyone, welcome to my first SWAT
[13:16:05] <Amir1>	 I hope we don't crash to anywhere :D
[13:16:08] <revi>	 so is he authorized?
[13:16:09] <TabbyCat>	 *clap* *clap*
[13:16:16] <_joe_>	 I have no idea :P
[13:16:23] <revi>	 lol
[13:16:29] <Amir1>	 I just asked in releng
[13:16:33] <_joe_>	 cool!
[13:16:42] * revi applauds
[13:17:14] <Amir1>	 revi: I prefer to start with you for timezone reasons 
[13:17:23] <revi>	 thanks :D
[13:17:37] <wikibugs>	 (CR) Ladsgroup: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/370363 (https://phabricator.wikimedia.org/T172630) (owner: Revi)
[13:17:54] <wikibugs>	 (CR) Ladsgroup: Enable wgMinervaEnableSiteNotice for kowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/370363 (https://phabricator.wikimedia.org/T172630) (owner: Revi)
[13:18:19] <Amir1>	 Wait a sec, I forgot my yubikey
[13:18:24] <revi>	 sure
[13:20:09] <Amir1>	 revi: TabbyCat, sorry, my yubikey is at the hotel, I can't log in to prod at all
[13:20:16] * revi :O
[13:20:34] <revi>	 that's fine, the task itself isn't urgent, I can do that @ montreal
[13:20:43] <TabbyCat>	 wonder if _joe_ can deploy
[13:20:47] <revi>	 preferably @hackathon
[13:21:03] <Amir1>	 I use a hardware key to login, I should keep it with myself all the time
[13:21:10] <Amir1>	 sorry
[13:21:45] <revi>	 no need to be sorry :D
[13:22:03] <revi>	 so I think today's european deploy is gone, rescheduling
[13:22:10] <TabbyCat>	 jynus: do you have deploy permissions?
[13:22:16] <icinga-wm>	 PROBLEM - Check systemd state on stat1005 is CRITICAL: Return code of 255 is out of bounds
[13:22:35] <icinga-wm>	 PROBLEM - configured eth on stat1005 is CRITICAL: Return code of 255 is out of bounds
[13:22:45] <icinga-wm>	 PROBLEM - Disk space on stat1005 is CRITICAL: Return code of 255 is out of bounds
[13:22:45] <icinga-wm>	 PROBLEM - salt-minion processes on stat1005 is CRITICAL: Return code of 255 is out of bounds
[13:22:46] <icinga-wm>	 PROBLEM - dhclient process on stat1005 is CRITICAL: Return code of 255 is out of bounds
[13:22:51] <Amir1>	 revi: do you want me to schedule it for later SWAT today? I have something to deploy too, can babysit yours (If it doesn't require knowing Korean)
[13:22:52] <stephanebisson>	 my patch was not urgent, I'll let it ride the train this week
[13:22:55] <icinga-wm>	 PROBLEM - MD RAID on stat1005 is CRITICAL: Return code of 255 is out of bounds
[13:23:05] <icinga-wm>	 PROBLEM - puppet last run on stat1005 is CRITICAL: Return code of 255 is out of bounds
[13:23:05] <icinga-wm>	 PROBLEM - DPKG on stat1005 is CRITICAL: Return code of 255 is out of bounds
[13:23:17] <revi>	 Amir1: you just need to check if the same text in web sitenotice is visible on mobile too
[13:23:38] <revi>	 (mobile web)
[13:24:00] <Amir1>	 "The Wikimedia 2030 strategy discussion about the future of Wikimedia over the next 15 years is under way." ?
[13:24:03] <revi>	 current kowiki sitenotice has English in it (Altostratus) so it should be possible to compare it
[13:24:06] <revi>	 two lines
[13:24:13] <revi>	 The Wikimedia 2030 strategy discussion about the future of Wikimedia over the next 15 years is under way.
[13:24:13] <revi>	 The administrator election for User:Altostratus runs until 8 August 2017 (Tue) 14:09 (KST).
[13:24:43] <Amir1>	 okay
[13:24:50] <Amir1>	 That doesn't seem bad
[13:25:03] <revi>	 Maybe just check Altostratus, numbers, KST :P
[13:25:20] <TabbyCat>	 Is James_F also at Wikimania?
[13:25:42] <revi>	 https://wikimania2017.wikimedia.org/wiki/Template:Attendees/100 probably yes
[13:26:03] <Amir1>	 :D
[13:26:09] <TabbyCat>	 I guess this week it'll be complicated to do anything
[13:26:19] <revi>	 likely
[13:26:35] <TabbyCat>	 and tomorrow I have the phone company migrating my ADSL to fiber and it'll likely take the whole day
[13:30:41] <TabbyCat>	 Amir1: well, I guess we need to revert revi's patch now that it's merged?
[13:30:47] <revi>	 it wasn't
[13:30:54] <Amir1>	 TabbyCat: I didn't let it merge
[13:30:57] <TabbyCat>	 it's cr+2
[13:30:59] <TabbyCat>	 ah
[13:31:09] <revi>	 he removed +2 when he was looking for his key
[13:31:46] <TabbyCat>	 otoh Amir1 https://phabricator.wikimedia.org/T172641#3505730 <-- how should I check that?
[13:32:13] <TabbyCat>	 I mean, I use labels.wmflabs interface
[13:32:33] <Amir1>	 TabbyCat: https://translatewiki.net/wiki/Special:Translate?action=translate&group=wiki-ai-wikilabels-form-dagf&language=es&filter=%21translated
[13:33:04] <TabbyCat>	 okay, I'll do a quick review
[13:33:05] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[13:33:47] <TabbyCat>	 Amir1: looks good to me
[13:33:52] <wikibugs>	 (PS1) Jcrespo: mariadb: Add db1098 to the list of available s6 hosts [software] - https://gerrit.wikimedia.org/r/370465 (https://phabricator.wikimedia.org/T172679)
[13:34:05] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on stat1005 is CRITICAL: Return code of 255 is out of bounds
[13:34:45] <icinga-wm>	 RECOVERY - configured eth on stat1005 is OK: OK - interfaces up
[13:34:49] <Amir1>	 TabbyCat: So we need to wait that this gets to wikilabels
[13:34:55] <icinga-wm>	 RECOVERY - dhclient process on stat1005 is OK: PROCS OK: 0 processes with command name dhclient
[13:34:55] <icinga-wm>	 RECOVERY - salt-minion processes on stat1005 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[13:35:00] <Amir1>	 probably by tomorrow 
[13:35:05] <icinga-wm>	 RECOVERY - MD RAID on stat1005 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[13:35:05] <icinga-wm>	 RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[13:35:06] <icinga-wm>	 RECOVERY - DPKG on stat1005 is OK: All packages OK
[13:35:15] <icinga-wm>	 RECOVERY - Check systemd state on stat1005 is OK: OK - running: The system is fully operational
[13:35:21] <wikibugs>	 (CR) Jcrespo: [C: 2] mariadb: Add db1098 to the list of available s6 hosts [software] - https://gerrit.wikimedia.org/r/370465 (https://phabricator.wikimedia.org/T172679) (owner: Jcrespo)
[13:35:24] <TabbyCat>	 Amir1: alright, cool
[13:35:41] <TabbyCat>	 probably when a translation is missing it should fallback to English
[13:35:46] <wikibugs>	 (PS2) Andrew Bogott: toolschecker: use the new puppetmaster for manifest checks [puppet] - https://gerrit.wikimedia.org/r/370251 (https://phabricator.wikimedia.org/T171786)
[13:36:05] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [1000.0]
[13:37:55] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[13:38:55] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy
[13:39:24] <wikibugs>	 (CR) Andrew Bogott: [C: 2] toolschecker: use the new puppetmaster for manifest checks [puppet] - https://gerrit.wikimedia.org/r/370251 (https://phabricator.wikimedia.org/T171786) (owner: Andrew Bogott)
[13:40:36] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[13:40:55] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[13:41:55] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2002 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[13:41:56] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy
[13:41:56] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[13:42:56] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2002 is OK: All endpoints are healthy
[13:43:46] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy
[13:44:56] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[13:45:26] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[13:45:26] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1001 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[13:45:56] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2002 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[13:46:05] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy
[13:46:35] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy
[13:46:35] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1001 is OK: All endpoints are healthy
[13:47:22] <jynus>	 what was the issue with recommendation_api?
[13:48:15] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[13:48:16] <wikibugs>	 (PS1) Giuseppe Lavagetto: puppet-compiler: further work for puppetdb support [puppet] - https://gerrit.wikimedia.org/r/370466
[13:48:29] <TabbyCat>	 Amir1: maybe... you could babysit my patches as well?
[13:48:33] <elukey>	 might be that the testing endpoints have been removed?
[13:48:40] <elukey>	 (200 --> 404)
[13:48:44] <TabbyCat>	 I'll try to be around
[13:48:55] <Amir1>	 TabbyCat: first, can you check the wikilabels for eswiki?
[13:49:02] <TabbyCat>	 sure
[13:49:05] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[13:49:06] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy
[13:49:06] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[13:49:06] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy
[13:49:09] <jynus>	 elukey: like data not existing due to edits?
[13:49:35] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1002 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[13:49:35] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[13:49:35] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1001 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[13:49:38] <elukey>	 jynus: something like that, but it is only a speculation. Checking on the host
[13:50:05] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2002 is OK: All endpoints are healthy
[13:50:05] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy
[13:50:12] <Amir1>	 TabbyCat: Yeah sure
[13:50:35] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy
[13:50:36] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1001 is OK: All endpoints are healthy
[13:50:36] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy
[13:51:34] <TabbyCat>	 https://phabricator.wikimedia.org/F8980254 <-- Amir1 now I see this
[13:51:56] <Amir1>	 TabbyCat: yeah, that was the plan
[13:52:04] <wikibugs>	 Operations, ops-codfw: mw2256 - hardware issue - https://phabricator.wikimedia.org/T163346#3505831 (Papaul) @elukey do you have any log for me?
[13:52:06] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy
[13:52:12] <TabbyCat>	 oh argh
[13:52:16] <TabbyCat>	 vandals on wikitech
[13:52:30] * TabbyCat heads for the broom
[13:52:56] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) timed out before a response was received
[13:53:55] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy
[13:54:24] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: 2] puppet-compiler: further work for puppetdb support [puppet] - https://gerrit.wikimedia.org/r/370466 (owner: Giuseppe Lavagetto)
[13:54:33] <wikibugs>	 (PS2) Giuseppe Lavagetto: puppet-compiler: further work for puppetdb support [puppet] - https://gerrit.wikimedia.org/r/370466
[13:56:16] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[13:56:25] <icinga-wm>	 PROBLEM - puppet last run on stat1005 is CRITICAL: Return code of 255 is out of bounds
[13:56:26] <icinga-wm>	 PROBLEM - Check systemd state on stat1005 is CRITICAL: Return code of 255 is out of bounds
[13:56:43] <elukey>	 I am on it --^
[13:56:55] <icinga-wm>	 PROBLEM - configured eth on stat1005 is CRITICAL: Return code of 255 is out of bounds
[13:57:05] <icinga-wm>	 PROBLEM - salt-minion processes on stat1005 is CRITICAL: Return code of 255 is out of bounds
[13:57:05] <icinga-wm>	 PROBLEM - dhclient process on stat1005 is CRITICAL: Return code of 255 is out of bounds
[13:57:15] <icinga-wm>	 PROBLEM - MD RAID on stat1005 is CRITICAL: Return code of 255 is out of bounds
[13:57:16] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy
[13:57:25] <icinga-wm>	 PROBLEM - DPKG on stat1005 is CRITICAL: Return code of 255 is out of bounds
[14:01:35] <halfak>	 o/ _joe_ 
[14:01:51] <halfak>	 Do you have some time today to look at ORES stress tests with me :)
[14:03:09] <halfak>	 Related, if anyone could give me a review of https://gerrit.wikimedia.org/r/#/c/369915/, I could probably continue on my own. 
[14:03:20] <halfak>	 Not quite sure I've done that the right way. 
[14:03:35] <icinga-wm>	 RECOVERY - Check systemd state on stat1005 is OK: OK - running: The system is fully operational
[14:03:55] <icinga-wm>	 RECOVERY - configured eth on stat1005 is OK: OK - interfaces up
[14:04:05] <icinga-wm>	 RECOVERY - Check the NTP synchronisation status of timesyncd on stat1005 is OK: OK: synced at Mon 2017-08-07 14:03:56 UTC.
[14:04:09] <icinga-wm>	 RECOVERY - dhclient process on stat1005 is OK: PROCS OK: 0 processes with command name dhclient
[14:04:09] <icinga-wm>	 RECOVERY - salt-minion processes on stat1005 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[14:04:16] <icinga-wm>	 RECOVERY - MD RAID on stat1005 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[14:04:16] <wikibugs>	 Operations, ORES, Scoring-platform-team, Patch-For-Review, User-Joe: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3505857 (Halfak) Looks like we have missed the scheduled time.  I'm just waiting on review of the above patchset so that I can continue testi...
[14:04:25] <icinga-wm>	 RECOVERY - DPKG on stat1005 is OK: All packages OK
[14:04:25] <icinga-wm>	 RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[14:13:29] <wikibugs>	 (PS1) Giuseppe Lavagetto: puppet-compiler: add environment to exec [puppet] - https://gerrit.wikimedia.org/r/370468
[14:13:32] <wikibugs>	 Operations, Mail: Install missing Spamassassin DKIM dependencies on lists and mx - https://phabricator.wikimedia.org/T172689#3505877 (herron)
[14:13:59] <wikibugs>	 (CR) Giuseppe Lavagetto: [V: 2 C: 2] puppet-compiler: add environment to exec [puppet] - https://gerrit.wikimedia.org/r/370468 (owner: Giuseppe Lavagetto)
[14:21:09] <wikibugs>	 Operations, Mail: Install missing Spamassassin DKIM dependencies on lists and mx - https://phabricator.wikimedia.org/T172689#3505954 (herron) Installed libmail-dkim-perl and restarted spamassassin service    fermium:~# spamassassin -D --lint   dbg: diag: [...] module installed: Mail::DKIM, version 0.4...
[14:21:18] <wikibugs>	 (PS2) Andrew Bogott: shinken: test the new labs puppetmaster [puppet] - https://gerrit.wikimedia.org/r/370252 (https://phabricator.wikimedia.org/T171786)
[14:22:09] <herron>	 !log mx[1,2]001, fermium: Installed libmail-dkim-perl and restarted spamassassin service - T172689
[14:22:17] <wikibugs>	 Operations, Cloud-VPS, Patch-For-Review, cloud-services-team (Kanban): Switch to new labs puppetmasters - https://phabricator.wikimedia.org/T171786#3505957 (Andrew)
[14:22:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:22:19] <stashbot>	 T172689: Install missing Spamassassin DKIM dependencies on lists and mx - https://phabricator.wikimedia.org/T172689
[14:22:23] <wikibugs>	 (CR) Andrew Bogott: [C: 2] shinken: test the new labs puppetmaster [puppet] - https://gerrit.wikimedia.org/r/370252 (https://phabricator.wikimedia.org/T171786) (owner: Andrew Bogott)
[14:22:53] <wikibugs>	 Operations, Mail: Install missing Spamassassin DKIM dependencies on lists and mx - https://phabricator.wikimedia.org/T172689#3505961 (herron) Open>Resolved
[14:25:14] <wikibugs>	 Operations, Discovery, Discovery-Analysis, Maps, and 2 others: What is a reasonable per-IP ratelimit for maps - https://phabricator.wikimedia.org/T169175#3505968 (Gehel) Over the last 30 days, backend requests [[ https://grafana-admin.wikimedia.org/dashboard/db/maps-performances?panelId=4&fullscr...
[14:26:14] <wikibugs>	 Operations, Cloud-VPS, Patch-For-Review: rack/setup/install labtestpuppetmaster2001 - https://phabricator.wikimedia.org/T167157#3505969 (Andrew) Open>Resolved This is up and working.
[14:28:25] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[14:32:11] <mutante>	 !log phab2001 - stopping Apache,schedule downtime for http and puppet
[14:32:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:34:25] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[14:34:35] <icinga-wm>	 RECOVERY - Disk space on stat1005 is OK: DISK OK
[14:35:41] <wikibugs>	 Operations, Domains, Traffic, Wikimedia Resource Center, Patch-For-Review: Create resources.wikimedia.org as a redirect - https://phabricator.wikimedia.org/T172417#3505985 (mcruzWMF) >>! In T172417#3504034, @Reedy wrote: > I don't disagree with Timo above, and I'm guessing #operations will ag...
[14:36:25] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[14:38:21] <wikibugs>	 (PS1) Andrew Bogott: wikitech-static monitoring: check much less frequently [puppet] - https://gerrit.wikimedia.org/r/370472 (https://phabricator.wikimedia.org/T168962)
[14:38:35] <elukey>	 !log updated librdkafka1 and ++1 to 0.9.4.1 on hafnium
[14:38:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:39:02] <wikibugs>	 (CR) Andrew Bogott: [C: 2] wikitech-static monitoring: check much less frequently [puppet] - https://gerrit.wikimedia.org/r/370472 (https://phabricator.wikimedia.org/T168962) (owner: Andrew Bogott)
[14:39:26] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0]
[14:41:34] <wikibugs>	 Operations, Cloud-Services, Patch-For-Review: wikitech-static sync check shouldn't happen so often - https://phabricator.wikimedia.org/T168962#3506019 (Andrew) Open>Resolved
[14:45:35] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[14:45:57] <wikibugs>	 Operations, Wikimania-Hackathon-2017-Organization, Release-Engineering-Team (Watching / External): Wikimania needs hosting on a server for onsite conference guide - https://phabricator.wikimedia.org/T172217#3506028 (Dzahn) @Antoine2711 Is this working for you? Any issues? I will be in travel to Wikim...
[14:51:26] <gehel>	 !log reducing elasticsearch eqiad concurrent rebalance to 4 (from 8)
[14:51:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:52:47] <wikibugs>	 (PS1) Andrew Bogott: increase retries for check_nova_compute_process [puppet] - https://gerrit.wikimedia.org/r/370474 (https://phabricator.wikimedia.org/T171606)
[14:55:04] <wikibugs>	 (CR) Andrew Bogott: [C: 2] increase retries for check_nova_compute_process [puppet] - https://gerrit.wikimedia.org/r/370474 (https://phabricator.wikimedia.org/T171606) (owner: Andrew Bogott)
[14:58:06] <wikibugs>	 Operations, Domains, Traffic, Wikimedia Resource Center, Patch-For-Review: Create resources.wikimedia.org as a redirect - https://phabricator.wikimedia.org/T172417#3506077 (mcruzWMF) @Reedy @Krinkle Would it be possible to implement this by tomorrow (Tuesday August 8), because if so we would...
[14:59:31] <wikibugs>	 (PS1) Ottomata: Allow rsync to dataset1001 for pagecounts-ez [puppet] - https://gerrit.wikimedia.org/r/370478 (https://phabricator.wikimedia.org/T152712)
[15:00:41] <wikibugs>	 (CR) Ottomata: [C: 2] Allow rsync to dataset1001 for pagecounts-ez [puppet] - https://gerrit.wikimedia.org/r/370478 (https://phabricator.wikimedia.org/T152712) (owner: Ottomata)
[15:07:06] <wikibugs>	 (CR) Jcrespo: "These are truly awful grants- We will get rid of most of these, but we need time." [puppet] - https://gerrit.wikimedia.org/r/369832 (https://phabricator.wikimedia.org/T163938) (owner: Dzahn)
[15:08:14] <wikibugs>	 (PS1) Jcrespo: mariadb: Add db1098 as new s6 recentchanges/watchlist/... replica [mediawiki-config] - https://gerrit.wikimedia.org/r/370480 (https://phabricator.wikimedia.org/T171027)
[15:10:59] <thcipriani>	 !log restarting jenkins for plugin upgrade
[15:11:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:13:02] <wikibugs>	 (PS2) Jcrespo: mariadb: Add db1098 as new s6 recentchanges/watchlist/... replica [mediawiki-config] - https://gerrit.wikimedia.org/r/370480 (https://phabricator.wikimedia.org/T171027)
[15:14:39] <wikibugs>	 (PS3) Jcrespo: mariadb: Add db1098 as new s6 recentchanges/watchlist/... replica [mediawiki-config] - https://gerrit.wikimedia.org/r/370480 (https://phabricator.wikimedia.org/T171027)
[15:15:52] <wikibugs>	 (PS4) Jcrespo: mariadb: Add db1098 as new s6 recentchanges/watchlist/... replica [mediawiki-config] - https://gerrit.wikimedia.org/r/370480 (https://phabricator.wikimedia.org/T171027)
[15:30:33] <wikibugs>	 (PS1) Jgreen: unsubscribe awight from fr-tech icinga alerts [puppet] - https://gerrit.wikimedia.org/r/370483
[15:31:14] <wikibugs>	 (PS2) Jgreen: unsubscribe awight from fr-tech icinga alerts [puppet] - https://gerrit.wikimedia.org/r/370483 (https://phabricator.wikimedia.org/T170437)
[15:31:49] <wikibugs>	 (CR) jerkins-bot: [V: -1] unsubscribe awight from fr-tech icinga alerts [puppet] - https://gerrit.wikimedia.org/r/370483 (https://phabricator.wikimedia.org/T170437) (owner: Jgreen)
[15:35:27] <wikibugs>	 (PS3) Jgreen: unsubscribe awight from fr-tech icinga alerts [puppet] - https://gerrit.wikimedia.org/r/370483 (https://phabricator.wikimedia.org/T170437)
[15:36:13] <wikibugs>	 (CR) Jgreen: [C: 2] unsubscribe awight from fr-tech icinga alerts [puppet] - https://gerrit.wikimedia.org/r/370483 (https://phabricator.wikimedia.org/T170437) (owner: Jgreen)
[15:49:08] <wikibugs>	 Operations, Analytics-Kanban, User-Elukey: Analytics Kafka cluster causing timeouts to Varnishkafka since July 28th - https://phabricator.wikimedia.org/T172681#3506357 (elukey) From https://apache.googlesource.com/kafka/+/refs/heads/trunk/clients/src/main/java/org/apache/kafka/common/protocol/ApiKeys...
[15:49:53] <wikibugs>	 (PS1) Herron: Add SPF and DKIM perl package requires to spamassassin class [puppet] - https://gerrit.wikimedia.org/r/370487 (https://phabricator.wikimedia.org/T172689)
[15:51:19] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[15:51:24] <wikibugs>	 (PS5) Gehel: wdqs - remove upstart configuration files [puppet] - https://gerrit.wikimedia.org/r/369688
[15:52:08] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[15:52:28] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy
[15:55:15] <wikibugs>	 (PS1) Marostegui: realm.pp: Add to oauth tables to the private list [puppet] - https://gerrit.wikimedia.org/r/370489 (https://phabricator.wikimedia.org/T172693)
[15:56:55] <wikibugs>	 Operations, ops-eqiad, Analytics-Kanban, User-Elukey: Analytics1034 eth0 negotiated speed to 100Mb/s instead of 1000Mb/s - https://phabricator.wikimedia.org/T172633#3506395 (Nuria)
[15:56:59] <wikibugs>	 (CR) Jcrespo: [C: 1] realm.pp: Add to oauth tables to the private list [puppet] - https://gerrit.wikimedia.org/r/370489 (https://phabricator.wikimedia.org/T172693) (owner: Marostegui)
[15:57:36] <wikibugs>	 (CR) Jcrespo: [C: 1] realm.pp: Add to oauth tables to the private list (1 comment) [puppet] - https://gerrit.wikimedia.org/r/370489 (https://phabricator.wikimedia.org/T172693) (owner: Marostegui)
[15:58:20] <wikibugs>	 (PS2) Marostegui: realm.pp: Add two oauth tables to the private list [puppet] - https://gerrit.wikimedia.org/r/370489 (https://phabricator.wikimedia.org/T172693)
[16:00:06] <wikibugs>	 (CR) Marostegui: [C: 2] realm.pp: Add two oauth tables to the private list [puppet] - https://gerrit.wikimedia.org/r/370489 (https://phabricator.wikimedia.org/T172693) (owner: Marostegui)
[16:06:18] <icinga-wm>	 PROBLEM - Check Varnish expiry mailbox lag on cp1099 is CRITICAL: CRITICAL: expiry mailbox lag is 2033659
[16:06:18] <icinga-wm>	 ACKNOWLEDGEMENT - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [1000.0] Ayounsi https://phabricator.wikimedia.org/T169498
[16:08:24] <wikibugs>	 Operations, Analytics, Research: Phase out and replace analytics-store (multisource) - https://phabricator.wikimedia.org/T172410#3497641 (Nuria) Not sure what do we need to do here. What is on analytics store apart from eventlogging and mediawiki databases?
[16:08:51] <wikibugs>	 (CR) Gehel: [C: -1] wdqs - remove upstart configuration files [puppet] - https://gerrit.wikimedia.org/r/369688 (owner: Gehel)
[16:12:02] <_joe_>	 gehel: tell me when should I take a look at those changes, btw
[16:12:18] <gehel>	 _joe_: yep, I'll ping you when ready!
[16:12:35] <_joe_>	 thanks for working on that :)
[16:12:48] <gehel>	 my pleasure (well, to some extent...)
[16:12:53] <_joe_>	 eheh
[16:13:24] <_joe_>	 I hope that by thursday we'll also have full puppetdb support in the compiler
[16:14:58] <wikibugs>	 (CR) Dzahn: "our only blocker here is currently that phab fails to create the weekly stats mail with "ERROR 1698 (28000): Access denied for user 'phsta" [puppet] - https://gerrit.wikimedia.org/r/369832 (https://phabricator.wikimedia.org/T163938) (owner: Dzahn)
[16:20:57] <wikibugs>	 (CR) Paladox: [C: 1] mariadb/phabricator: update GRANTS from iridium to phab1001 [puppet] - https://gerrit.wikimedia.org/r/369832 (https://phabricator.wikimedia.org/T163938) (owner: Dzahn)
[16:24:18] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[16:31:19] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[16:33:23] <wikibugs>	 Operations, Discovery, Discovery-Analysis, Maps, and 2 others: What is a reasonable per-IP ratelimit for maps - https://phabricator.wikimedia.org/T169175#3506554 (BBlack) It's just per-IP.  So yes that sounds fine: if you're peaking at 80/s total, then lets put an upper sanity bound at 100/s miss...
[16:34:42] <wikibugs>	 (PS1) EBernhardson: Enable max token count for phrase rescore on zh lang wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/370497 (https://phabricator.wikimedia.org/T169498)
[16:36:23] <icinga-wm>	 RECOVERY - Check Varnish expiry mailbox lag on cp1099 is OK: OK: expiry mailbox lag is 0
[16:36:50] <wikibugs>	 (CR) Awight: [C: -1] Adds hieradata for ores::celery::workers with default. (1 comment) [puppet] - https://gerrit.wikimedia.org/r/369915 (https://phabricator.wikimedia.org/T169246) (owner: Halfak)
[16:37:03] <XioNoX>	 !log manually restarted varnish on cp1099
[16:37:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:38:39] <icinga-wm>	 ACKNOWLEDGEMENT - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [1000.0] Ayounsi T169498
[16:39:03] <wikibugs>	 Operations, Domains, Traffic, Wikimedia Resource Center, Patch-For-Review: Create resources.wikimedia.org as a redirect - https://phabricator.wikimedia.org/T172417#3506569 (Reedy) It's not on either of us, at this point, it's on #operations to do the review/merging/deployment  Though, as you...
[16:39:28] <wikibugs>	 (PS1) Dzahn: phabricator: open firewall holes only on active_server [puppet] - https://gerrit.wikimedia.org/r/370498
[16:39:52] <wikibugs>	 (CR) jerkins-bot: [V: -1] phabricator: open firewall holes only on active_server [puppet] - https://gerrit.wikimedia.org/r/370498 (owner: Dzahn)
[16:40:13] <wikibugs>	 (Abandoned) Reedy: Add resources.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/369971 (https://phabricator.wikimedia.org/T172417) (owner: Reedy)
[16:41:25] <gehel>	 XioNoX: thanks for the ack! we should have a temporary fix deployed in a few hours...
[16:44:50] <marostegui>	 !log Restart s7 instance on db1069 to pick up new replication filters - T172693
[16:45:00] <wikibugs>	 (PS2) Reedy: Redirect wikimedia.org/resources to meta [puppet] - https://gerrit.wikimedia.org/r/369970 (https://phabricator.wikimedia.org/T172417)
[16:45:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:45:43] <wikibugs>	 (CR) Reedy: "PS2 swaps it to something specific after the / and rebases it" [puppet] - https://gerrit.wikimedia.org/r/369970 (https://phabricator.wikimedia.org/T172417) (owner: Reedy)
[16:45:45] <wikibugs>	 (PS2) Gilles: Serve a synth error page when error body is empty in Varnish [puppet] - https://gerrit.wikimedia.org/r/365589 (https://phabricator.wikimedia.org/T169683)
[16:46:10] <wikibugs>	 Operations, Phabricator, Traffic, Release-Engineering-Team (Kanban): Verify that the codfw lvs is configured correctly for Phabricator - https://phabricator.wikimedia.org/T168699#3506599 (mmodell) phab2001 web works, git-ssh still unknown.
[16:46:20] <wikibugs>	 (PS2) Dzahn: phabricator: open firewall holes only on active_server [puppet] - https://gerrit.wikimedia.org/r/370498
[16:46:40] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: -1] "Please use require_package instead" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/370487 (https://phabricator.wikimedia.org/T172689) (owner: Herron)
[16:47:59] <wikibugs>	 (CR) Jcrespo: [C: -1] "I will leave db1098 partitioning overnight, and maybe it can be pooled tomorrow." [mediawiki-config] - https://gerrit.wikimedia.org/r/370480 (https://phabricator.wikimedia.org/T171027) (owner: Jcrespo)
[16:48:09] <wikibugs>	 (PS3) Dzahn: phabricator: open firewall holes only on active_server [puppet] - https://gerrit.wikimedia.org/r/370498
[16:48:26] <wikibugs>	 Operations, DBA, MediaWiki-extensions-WikibaseClient, Performance-Team, and 5 others: Cache invalidations coming from the JobQueue are causing lag on several wikis - https://phabricator.wikimedia.org/T164173#3506604 (mmodell)
[16:50:07] <_joe_>	 Reedy: that would only work for wikimedia.org/resources
[16:50:14] <_joe_>	 not for www.wikimedia.org/resources
[16:50:20] <_joe_>	 is that expected?
[16:50:59] <_joe_>	 also, I have a meeting at 11 pm, I wanna get off the clock now
[16:51:59] <Reedy>	 Pass
[16:52:04] <Reedy>	 None of the others do
[16:52:28] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[16:52:42] <Reedy>	 https://github.com/wikimedia/puppet/blob/production/modules/mediawiki/files/apache/sites/redirects/redirects.dat#L442-L443
[16:53:08] <Reedy>	 I don't see anything that redirects from www.wikimedia.org, only t
[16:53:09] <Reedy>	 to
[16:53:41] <_joe_>	 Reedy: ok, I'm still logging off for now
[16:53:44] <Reedy>	 heh
[16:54:02] <_joe_>	 but there's plenty of ops still online I bet
[16:54:21] <Reedy>	 Who might just go "apache? lolno"
[16:54:23] * Reedy grins
[16:54:45] <Reedy>	 _joe_: Fancy sticking a CR +1 on it regardless so they know someone else has looked at it please?
[16:55:51] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: 1] "seems to do what it's designed for" [puppet] - https://gerrit.wikimedia.org/r/369970 (https://phabricator.wikimedia.org/T172417) (owner: Reedy)
[16:55:58] <Reedy>	 cheers!
[16:57:46] <bblack>	 Reedy: yeah www.wikimedia.org vs wikimedia.org as HTTP hostnames is actually a separate thorny issue...
[16:57:58] <Reedy>	 heh
[16:58:06] <Reedy>	 I'll comment on ticket to be explicit that www. won't work
[16:58:07] <bblack>	 there's one or two open tickets about it
[16:59:08] <wikibugs>	 Operations, Domains, Traffic, Wikimedia Resource Center, Patch-For-Review: Create resources.wikimedia.org as a redirect - https://phabricator.wikimedia.org/T172417#3506650 (Reedy) Just a heads up, `www.wikimedia.org/resources` will not work, but `wikimedia.org/resources` will  So please put `...
[16:59:11] <bblack>	 https://phabricator.wikimedia.org/T133178
[16:59:13] <bblack>	 ^ is one
[16:59:28] <Reedy>	 I meant on the ticket for this, be explicit about what they should print
[17:00:04] <bblack>	 yeah I just don't know which is actually more appropriate
[17:00:04] <jouncebot>	 gehel: Dear anthropoid, the time has come. Please deploy Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170807T1700).
[17:00:50] <bblack>	 I think currently "wikimedia.org/" redirects to "www.wikimedia.org/" (if no URL path), which has our "here's our projects" landing page
[17:00:53] <gehel>	 jouncebot: o/
[17:01:36] <bblack>	 www.wikimedia.org has some global API stuff (as in global to projects/languages)?
[17:01:49] <bblack>	 wikimedia.org has basically-nothing at present I think, except the base URL redirect to www
[17:02:16] <wikibugs>	 (PS4) Dzahn: phabricator: open firewall holes only on active_server [puppet] - https://gerrit.wikimedia.org/r/370498 (https://phabricator.wikimedia.org/T137928)
[17:02:17] <bblack>	 but then RB is currently backwards from that (subject of the ticket above)
[17:02:37] <logmsgbot>	 !log gehel@tin Started deploy [wdqs/wdqs@da33919]: (no justification provided)
[17:02:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:02:52] <bblack>	 but I think the consensus on that ticket is to move RB to www as well
[17:03:18] <wikibugs>	 (CR) Paladox: [C: 1] phabricator: open firewall holes only on active_server [puppet] - https://gerrit.wikimedia.org/r/370498 (https://phabricator.wikimedia.org/T137928) (owner: Dzahn)
[17:03:50] <bblack>	 to me it seems like "wikimedia.org/resources" is more logical than "www.wikimedia.org/resources" for this, but I could see someone involved arguing the opposite maybe.  I don't know.
[17:05:05] <logmsgbot>	 !log gehel@tin Finished deploy [wdqs/wdqs@da33919]: (no justification provided) (duration: 02m 28s)
[17:05:05] <bblack>	 www.wikimedia.org is more like some kind of meta-wiki (in the real sense of technical-meta to projects/langs, rather than the more abstract content/community-meta of meta.wikimedia.org)
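The host-matching behaviour discussed above (a rule for `wikimedia.org/resources` that does not fire for `www.wikimedia.org/resources`, with the bare-domain root redirecting to the www landing page) can be sketched as below. This is a hypothetical Python model, not the syntax or logic of the Apache `redirects.dat` file; the `REDIRECTS` table, the `resolve_redirect` function, and the `https://example.org/resource-center` target are placeholders.

```python
from typing import Optional

# Hypothetical redirect table keyed on (exact host, path).
# The target URL is a placeholder, not the real redirect destination.
REDIRECTS = {
    ("wikimedia.org", "/resources"): "https://example.org/resource-center",
}


def resolve_redirect(host: str, path: str) -> Optional[str]:
    """Return a redirect target for host + path, or None if nothing matches."""
    host = host.lower()
    # Treat "/resources" and "/resources/" as the same path.
    normalized = path.rstrip("/")
    target = REDIRECTS.get((host, normalized))
    if target is not None:
        # Exact host match only: "www.wikimedia.org" is a different key.
        return target
    # Bare domain with no URL path goes to the www landing page.
    if host == "wikimedia.org" and normalized == "":
        return "https://www.wikimedia.org/"
    return None


print(resolve_redirect("wikimedia.org", "/resources"))
print(resolve_redirect("www.wikimedia.org", "/resources"))  # None
```

Keying on the exact host is what makes the `www.` variant fall through; supporting both would need a second rule or host normalization, which runs into the separate question of which hostname should be canonical.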
[17:05:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:05:53] <gehel>	 SMalyshev: deployment completed, tests are green
[17:06:10] * gehel is keeping a look at error rate, see if we don't throttle too many...
[17:06:14] <mutante>	 detects "labs" string on www.wikimedia.org
[17:06:19] <mutante>	 bd808: :)
[17:08:19] <bblack>	 maybe we should have some sort of official Public URI Namespace Bikeshedding Committee that makes consistent policies and decisions about all related things :)
[17:09:36] <mutante>	 yea. requesting wiki at bikeshedding.committee.wikimedia.org
[17:09:53] <SMalyshev>	 gehel: thanks! let's monitor it for a while
[17:10:33] <SMalyshev>	 gehel: I see logging patch is also merged? 
[17:10:37] <marostegui>	 !log Restart s7 instance on db1102 to pick up new replication filters - T172693
[17:10:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:11:18] <wikibugs>	 Operations, Discovery, Discovery-Analysis, Maps, and 2 others: What is a reasonable per-IP ratelimit for maps - https://phabricator.wikimedia.org/T169175#3506699 (Gehel) @BBlack I'm probably the one who should be around. I can be available any time from 10am to 11pm CEST (1am to 2pm PT). Just let...
[17:12:00] <wikibugs>	 (CR) Jcrespo: "@Dzhan: change the dns of m3-slave to point to the same server than m3-master, that will fix the issue, and I will fix the existing mess w" [puppet] - https://gerrit.wikimedia.org/r/369832 (https://phabricator.wikimedia.org/T163938) (owner: Dzahn)
[17:15:57] <gehel>	 SMalyshev: logstash patch is merged, and it looks like the IP/UA also made it
[17:16:34] <wikibugs>	 (CR) Dzahn: [C: 2] "tested with apache-fast-test on mwdebug1001" [puppet] - https://gerrit.wikimedia.org/r/369970 (https://phabricator.wikimedia.org/T172417) (owner: Reedy)
[17:18:44] <wikibugs>	 (PS1) Ayounsi: Bumping HP RAID Icinga check timeout from 60 to 90s [puppet] - https://gerrit.wikimedia.org/r/370505 (https://phabricator.wikimedia.org/T172708)
[17:21:58] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[17:21:58] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[17:21:59] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[17:21:59] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2002 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[17:21:59] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[17:22:04] <wikibugs>	 (PS1) Dzahn: point m3-slave to same server as m3-master [dns] - https://gerrit.wikimedia.org/r/370506
[17:22:08] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[17:22:18] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1002 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[17:22:20] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[17:22:20] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1001 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[17:22:30] <robh>	 uh oh
[17:22:38] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[17:22:39] <robh>	 seems a config issue?
[17:23:50] <wikibugs>	 (CR) Dzahn: "@jcrespo to dbproxy like this?  https://gerrit.wikimedia.org/r/#/c/370506/1/templates/wmnet" [puppet] - https://gerrit.wikimedia.org/r/369832 (https://phabricator.wikimedia.org/T163938) (owner: Dzahn)
[17:24:08] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy
[17:24:08] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy
[17:24:18] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy
[17:24:19] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1001 is OK: All endpoints are healthy
[17:24:28] <ebernhardson>	 robh: recommendation_api? no that one just alerts with too much sensitivity (and repetition) on cirrussearch issues
[17:24:28] <wikibugs>	 (CR) Jcrespo: [C: 1] "We have done this in the past, when we have done maintenance on the passive host, with no issues." [dns] - https://gerrit.wikimedia.org/r/370506 (owner: Dzahn)
[17:24:38] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy
[17:24:51] <robh>	 thx for info =]
[17:25:30] <wikibugs>	 (CR) Jcrespo: [C: 1] "In fact, we need to do maintenance and upgrade hardware here, so now it is a good time to do both." [dns] - https://gerrit.wikimedia.org/r/370506 (owner: Dzahn)
[17:25:47] <ebernhardson>	 usually a minute or two later the cirrus ones will alert 
[17:25:53] <ebernhardson>	 i have a partial fix going out in swat...might help
[17:26:05] <wikibugs>	 (PS2) Dzahn: point m3-slave to same server as m3-master [dns] - https://gerrit.wikimedia.org/r/370506
[17:27:08] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[17:27:08] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[17:27:18] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[17:27:18] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1001 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[17:27:29] <icinga-wm>	 PROBLEM - Number of backend failures per minute from CirrusSearch on graphite1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [600.0]
[17:27:39] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[17:27:39] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [1000.0]
[17:28:27] <bblack>	 how is 404 an intermittent issue anyways? I could understand being oversensitive to something like 5xx which might actually be intermittent in a failure, but surely 404-ing a URL is just broken?
[17:29:12] <wikibugs>	 (CR) Dzahn: [C: 2] point m3-slave to same server as m3-master [dns] - https://gerrit.wikimedia.org/r/370506 (owner: Dzahn)
[17:29:16] <bblack>	 or maybe this is a case of "inappropriate 404", where some service is returning a 404 when a 5xx would be more-appropriate
[17:29:28] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [50.0]
[17:30:00] <ebernhardson>	 hmm, indeed restbase should be returning a 5xx there
[17:30:26] <bblack>	 (404 shouldn't be used as "I temporarily can't contact whatever I'm proxying/querying to, so let's call it 'not found'".  It should only have the public and consistent (over time) meaning "This URL is not valid and nothing lives here".
[17:30:39] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy
[17:31:19] <icinga-wm>	 ACKNOWLEDGEMENT - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [1000.0] Gehel known issue, fix coming up soon - T169498
[17:31:28] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy
[17:32:01] <bblack>	 good thing I'm not a LISP programmer, I always lose track of my parens :P
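bblack's point about status codes can be illustrated with a small sketch: a proxy-style handler should return 404 only when the upstream lookup succeeds and the resource genuinely does not exist, and a 5xx when the upstream cannot be reached. The `BackendUnavailable` exception, the `fetch` callable, and `status_for_lookup` are hypothetical names; 502 (Bad Gateway) is one reasonable choice for an unreachable upstream, and 503 or 504 may fit better depending on the failure mode.

```python
from http import HTTPStatus
from typing import Callable, Optional


class BackendUnavailable(Exception):
    """Hypothetical error raised when the upstream service cannot be contacted."""


def status_for_lookup(fetch: Callable[[str], Optional[dict]], key: str) -> HTTPStatus:
    """Map an upstream lookup outcome to an HTTP status code."""
    try:
        result = fetch(key)
    except BackendUnavailable:
        # Transient failure: the URL may be valid, we just could not check.
        # A 5xx tells clients and monitoring that a retry may succeed.
        return HTTPStatus.BAD_GATEWAY
    if result is None:
        # The upstream answered and the resource does not exist: a stable 404.
        return HTTPStatus.NOT_FOUND
    return HTTPStatus.OK
```

With this split, a health check that sees a 404 is reporting a real routing or content error, while an intermittent upstream outage surfaces as a 5xx, in line with ebernhardson's remark that restbase should have returned a 5xx for the failing recommendation_api checks.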
[17:33:18] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy
[17:33:18] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy
[17:33:18] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy
[17:33:18] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy
[17:33:18] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2002 is OK: All endpoints are healthy
[17:33:28] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy
[17:33:31] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1001 is OK: All endpoints are healthy
[17:34:18] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy
[17:35:08] <mutante>	 jynus: i can confirm the stats script does not get the 'access denied' anymore now, on phab1001
[17:35:11] <mutante>	 thanks
[17:35:27] <mutante>	 and phab isn't broken :)
[17:35:52] <jynus>	 I just want to do things well, or else I would never fix that mess
[17:35:54] <mutante>	 now i just have to fix a totally unrelated issue with that script that comes from trusty->jessie
[17:36:08] <jynus>	 that template is really bad
[17:36:38] <icinga-wm>	 RECOVERY - Number of backend failures per minute from CirrusSearch on graphite1001 is OK: OK: Less than 20.00% above the threshold [300.0]
[17:36:46] <mutante>	 jynus: yea, that sounds good. glad there was a quick fix :)
[17:37:01] <jynus>	 let's call it temporary workaround :-)
[17:37:06] <mutante>	 ok :)
[17:40:29] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0]
[17:43:34] <wikibugs>	 (CR) Ayounsi: [C: 2] Bumping HP RAID Icinga check timeout from 60 to 90s [puppet] - https://gerrit.wikimedia.org/r/370505 (https://phabricator.wikimedia.org/T172708) (owner: Ayounsi)
[17:43:41] <wikibugs>	 (PS2) Ayounsi: Bumping HP RAID Icinga check timeout from 60 to 90s [puppet] - https://gerrit.wikimedia.org/r/370505 (https://phabricator.wikimedia.org/T172708)
[17:44:24] <TabbyCat>	 jouncebot: next
[17:44:25] <jouncebot>	 In 0 hour(s) and 15 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170807T1800)
[17:45:10] <TabbyCat>	 would anyone do that window this time? :)
[17:45:48] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[17:48:31] <greg-g>	 TabbyCat: regarding this morning's session: SWATs are normally on a best effort basis, especially the 13:00 UTC one as there is less SWAT deployer coverage.
[17:48:52] <TabbyCat>	 greg-g: I know + it's Wikimania time :)
[17:49:03] <greg-g>	 yup :)
[17:52:04] <wikibugs>	 (CR) Dzahn: [C: 2] "no-op on phab1001. closes firewall on phab2001  (except ssh between phab servers)  http://puppet-compiler.wmflabs.org/7326/" [puppet] - https://gerrit.wikimedia.org/r/370498 (https://phabricator.wikimedia.org/T137928) (owner: Dzahn)
[17:52:20] <wikibugs>	 (PS5) Dzahn: phabricator: open firewall holes only on active_server [puppet] - https://gerrit.wikimedia.org/r/370498 (https://phabricator.wikimedia.org/T137928)
[17:53:08] <ebernhardson>	 i'll be deploying swat if no one else shows up, i have important things that have to go out :P
[17:54:58] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[17:56:01] <jynus>	 !log stopping slave and repartitioning db1098
[17:56:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:57:39] <mutante>	 !log phab2001 - re-enabling puppet, but closing firewall for 80/443
[17:57:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:00:04] <jouncebot>	 addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170807T1800).
[18:00:04] <jouncebot>	 Amir1, TabbyCat, and Jdlrobson: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[18:00:13] <TabbyCat>	 o/
[18:00:16] <TabbyCat>	 meow
[18:01:06] <TabbyCat>	 and ebernhardson (used irGnick instead or irC)
[18:01:20] <ebernhardson>	 yea i just fixed that :P i'll ship this today
[18:01:40] <wikibugs>	 (CR) EBernhardson: [C: 2] Grant 'autopatrol' to 'editor' in en.wikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/370311 (https://phabricator.wikimedia.org/T172561) (owner: MarcoAurelio)
[18:01:56] <TabbyCat>	 :D
[18:02:07] <wikibugs>	 (CR) EBernhardson: [C: 2] Translate sitename for nl.wikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/370313 (https://phabricator.wikimedia.org/T172594) (owner: MarcoAurelio)
[18:03:07] <wikibugs>	 (Merged) jenkins-bot: Grant 'autopatrol' to 'editor' in en.wikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/370311 (https://phabricator.wikimedia.org/T172561) (owner: MarcoAurelio)
[18:03:20] <wikibugs>	 (CR) jenkins-bot: Grant 'autopatrol' to 'editor' in en.wikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/370311 (https://phabricator.wikimedia.org/T172561) (owner: MarcoAurelio)
[18:05:13] <logmsgbot>	 !log ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: Grant autopatrol to editor in en.wikibooks - T172561 (duration: 00m 47s)
[18:05:15] <ebernhardson>	 Amir1: around?
[18:05:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:05:26] <stashbot>	 T172561: Addition to the "autopatrol" right to the user group "reviewers" on the English Wikibooks - https://phabricator.wikimedia.org/T172561
[18:05:33] <ebernhardson>	 TabbyCat: auto patrol is out, 
[18:05:48] <TabbyCat>	 ebernhardson: live or on mwdebug?
[18:05:57] <wikibugs>	 (PS2) EBernhardson: Translate sitename for nl.wikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/370313 (https://phabricator.wikimedia.org/T172594) (owner: MarcoAurelio)
[18:05:59] <TabbyCat>	 oh, live I see
[18:06:05] * TabbyCat checks
[18:06:05] <wikibugs>	 (CR) EBernhardson: [C: 2] Translate sitename for nl.wikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/370313 (https://phabricator.wikimedia.org/T172594) (owner: MarcoAurelio)
[18:06:31] <TabbyCat>	 https://en.wikibooks.org/wiki/Special:ListGroupRights#editor <-- looks good to me
[18:07:10] <wikibugs>	 (CR) Dzahn: "there is a redirect to www. when accessing it from external now. it worked when testing from tin on mwdebug1001, but not now..." [puppet] - https://gerrit.wikimedia.org/r/369970 (https://phabricator.wikimedia.org/T172417) (owner: Reedy)
[18:07:29] <wikibugs>	 (Merged) jenkins-bot: Translate sitename for nl.wikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/370313 (https://phabricator.wikimedia.org/T172594) (owner: MarcoAurelio)
[18:07:38] <wikibugs>	 (CR) jenkins-bot: Translate sitename for nl.wikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/370313 (https://phabricator.wikimedia.org/T172594) (owner: MarcoAurelio)
[18:07:58] <ebernhardson>	 TabbyCat: TabbyCat nl.wikinews on mwdebug1001
[18:08:18] <TabbyCat>	 checking
[18:08:49] <wikibugs>	 (CR) EBernhardson: [C: 2] Enable max token count for phrase rescore on zh lang wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/370497 (https://phabricator.wikimedia.org/T169498) (owner: EBernhardson)
[18:09:09] <TabbyCat>	 ebernhardson: sitename change looks good at mwdebug1001
[18:10:07] <ebernhardson>	 syncing
[18:10:28] <logmsgbot>	 !log ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: T172594 - Translate sitename for nl.wikinews (duration: 00m 47s)
[18:10:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:10:38] <stashbot>	 T172594: Change $wgSitename for Dutch Wikinews to Wikinieuws - https://phabricator.wikimedia.org/T172594
[18:10:48] <TabbyCat>	 rechecking
[18:12:08] <TabbyCat>	 perfect
[18:12:21] <TabbyCat>	 thanks ebernhardson
[18:12:23] <ebernhardson>	 np
[18:12:32] <jdlrobson>	 (here btw)
[18:12:41] <jdlrobson>	 (just at end of list)
[18:12:48] <ebernhardson>	 jdlrobson: you snuck in late to an overfull swat :P but it's my fault it's overfull so i guess we can try... :)
[18:12:52] <Urbanecm>	 ebernhardson, can I add one change to Morning SWAT?
[18:12:53] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[18:12:59] <jdlrobson>	 i was patch number 8 i believe :)
[18:13:07] <ebernhardson>	 Urbanecm: swat is already over full, can check once everything else is out if there is still time
[18:13:28] <ebernhardson>	 jdlrobson: well, somehow there are 10 patches :P
[18:13:38] <jdlrobson>	 ebernhardson: mine was before yours :P https://wikitech.wikimedia.org/w/index.php?title=Deployments&action=history
[18:13:43] <Urbanecm>	 ebernhardson, oh, I see. Okay, please ping me after you deploy all patches. I'll use another window if it won't be possible
[18:13:50] <jdlrobson>	 hehe
[18:14:02] <TabbyCat>	 abuse :O :P
[18:14:08] <TabbyCat>	 (joking)
[18:14:10] <ebernhardson>	 jdlrobson: :P i must not have seen yours below the marker 
[18:14:16] <jdlrobson>	 yeh i put it underneath, that was my fail
[18:14:24] <jdlrobson>	 hate editing that wiki page... :)
[18:14:33] <jdlrobson>	 is there a bot for it btw
[18:14:37] <jdlrobson>	 that would be so cool..
[18:14:44] <TabbyCat>	 ++++1 that
[18:15:16] <TabbyCat>	 jdlrobson: you might find T171940 useful I think
[18:15:16] <stashbot>	 T171940: Create a Gadget to easily add/remove/modify patches for SWAT at wikitech:Deployments - https://phabricator.wikimedia.org/T171940
[18:15:20] <ebernhardson>	 jdlrobson: waiting for some things to merge now, then you'll be up
[18:16:02] <wikibugs>	 (PS1) Smalyshev: Some requests may have no client IP defined. [puppet] - https://gerrit.wikimedia.org/r/370511 (https://phabricator.wikimedia.org/T172713)
[18:17:22] <Amir1>	 ebernhardson: I'm around now
[18:17:43] <Amir1>	 sorry, completely forgot about swat 
[18:18:04] <ebernhardson>	 no worries, i do that too, or i get distracted while other things are deploying in swat and miss when my patch comes up...
[18:18:45] <revi>	 errrrrrr being an owl at 3am
[18:18:52] <revi>	 (phone now tho)
[18:20:27] <wikibugs>	 Operations, Analytics, Research: Phase out and replace analytics-store (multisource) - https://phabricator.wikimedia.org/T172410#3507261 (Halfak) Looks like @jcrespo wants to phase out an analytics/dba maintained resource.  I guess I'd expect analytics to lead the process of phasing that out.
[18:20:57] <wikibugs>	 (CR) EBernhardson: [C: 2] Enable max token count for phrase rescore on zh lang wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/370497 (https://phabricator.wikimedia.org/T169498) (owner: EBernhardson)
[18:21:17] <wikibugs>	 Operations, monitoring: fix librenms LE check for netmon2001 - https://phabricator.wikimedia.org/T172712#3507265 (Dzahn) a:Dzahn
[18:23:27] <logmsgbot>	 !log ebernhardson@tin Synchronized php-1.30.0-wmf.12/extensions/CirrusSearch/: T169498 limit phrase token count,  T172464 constant boost ltr queries (duration: 00m 58s)
[18:23:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:23:41] <stashbot>	 T169498: Investigate load spikes on the elasticsearch cluster in eqiad - https://phabricator.wikimedia.org/T169498
[18:23:41] <stashbot>	 T172464: Problems with MLR and small rescore windows - https://phabricator.wikimedia.org/T172464
[18:26:06] <wikibugs>	 (PS2) EBernhardson: Exclude files from Special:ShortPages on commons [mediawiki-config] - https://gerrit.wikimedia.org/r/369503 (https://phabricator.wikimedia.org/T170687) (owner: Jdlrobson)
[18:28:12] <jdlrobson>	 \o/
[18:28:21] <wikibugs>	 (CR) EBernhardson: [C: 2] Exclude files from Special:ShortPages on commons [mediawiki-config] - https://gerrit.wikimedia.org/r/369503 (https://phabricator.wikimedia.org/T170687) (owner: Jdlrobson)
[18:28:36] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0
[18:29:54] <wikibugs>	 (Merged) jenkins-bot: Exclude files from Special:ShortPages on commons [mediawiki-config] - https://gerrit.wikimedia.org/r/369503 (https://phabricator.wikimedia.org/T170687) (owner: Jdlrobson)
[18:30:09] <wikibugs>	 (CR) jenkins-bot: Exclude files from Special:ShortPages on commons [mediawiki-config] - https://gerrit.wikimedia.org/r/369503 (https://phabricator.wikimedia.org/T170687) (owner: Jdlrobson)
[18:30:33] <ebernhardson>	 jdlrobson: you're up on mwdebug1001
[18:30:35] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[18:30:39] <wikibugs>	 (CR) EBernhardson: [C: 2] Enable wgMinervaEnableSiteNotice for kowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/370363 (https://phabricator.wikimedia.org/T172630) (owner: Revi)
[18:30:45] <icinga-wm>	 PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[18:30:58] <jdlrobson>	 ebernhardson: testing
[18:31:00] <revi>	 (please know that I can't test for being mobile)
[18:31:05] <wikibugs>	 (PS4) EBernhardson: Enable wgMinervaEnableSiteNotice for kowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/370363 (https://phabricator.wikimedia.org/T172630) (owner: Revi)
[18:31:15] <icinga-wm>	 PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[18:31:29] <ebernhardson>	 revi: you can get minerva without being on a mobile device
[18:32:12] <revi>	 I mean, X-wikimedia-debug
[18:32:17] <icinga-wm>	 PROBLEM - Codfw HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[18:32:36] <ebernhardson>	 Whats up with these 5xx's ?
[18:33:04] <ebernhardson>	 looks like luasandbox in fatalmonitor?
[18:33:05] <revi>	 and what I meant... https://usercontent.irccloud-cdn.com/file/kDeWIfnz/IMG_2279.PNG
[18:33:07] <icinga-wm>	 PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 27 probes of 275 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[18:33:18] <ebernhardson>	 oh, no not luasandbox. They all have the message %{message}. Very useful
[18:33:27] <jdlrobson>	 ebernhardson: my patch LGTM
[18:33:54] <ebernhardson>	 jdlrobson: sec to see if these 5xx alerts clear
[18:34:09] <ebernhardson>	 they look in fatalmonitor to have only been for ~10s
[18:36:41] <wikibugs>	 (CR) EBernhardson: [C: 2] Enable wgMinervaEnableSiteNotice for kowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/370363 (https://phabricator.wikimedia.org/T172630) (owner: Revi)
[18:37:26] <icinga-wm>	 RECOVERY - Codfw HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[18:38:07] <wikibugs>	 (Merged) jenkins-bot: Enable wgMinervaEnableSiteNotice for kowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/370363 (https://phabricator.wikimedia.org/T172630) (owner: Revi)
[18:38:13] <ebernhardson>	 jdlrobson: syncing out
[18:38:20] <wikibugs>	 (CR) jenkins-bot: Enable wgMinervaEnableSiteNotice for kowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/370363 (https://phabricator.wikimedia.org/T172630) (owner: Revi)
[18:38:26] <icinga-wm>	 RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[18:38:41] <wikibugs>	 (PS2) EBernhardson: beta: Add copyright info for Wikidata API [mediawiki-config] - https://gerrit.wikimedia.org/r/370455 (https://phabricator.wikimedia.org/T112606) (owner: Ladsgroup)
[18:38:45] <logmsgbot>	 !log ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: T170687 - Exclude files from Special:ShortPages on commons (duration: 00m 46s)
[18:38:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:38:56] <stashbot>	 T170687: [[special:ShortPages]] includes file pages on Commons - https://phabricator.wikimedia.org/T170687
[18:39:43] <ebernhardson>	 revi: Amir1 : kowiki config is up on mwdebug1001
[18:39:46] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[18:40:15] <revi>	 kk... (needs a min)
[18:40:30] <wikibugs>	 (CR) EBernhardson: [C: 2] beta: Add copyright info for Wikidata API [mediawiki-config] - https://gerrit.wikimedia.org/r/370455 (https://phabricator.wikimedia.org/T112606) (owner: Ladsgroup)
[18:40:56] <icinga-wm>	 RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[18:41:26] <revi>	 ebernhardson: worksforme
[18:41:59] <wikibugs>	 (Merged) jenkins-bot: beta: Add copyright info for Wikidata API [mediawiki-config] - https://gerrit.wikimedia.org/r/370455 (https://phabricator.wikimedia.org/T112606) (owner: Ladsgroup)
[18:42:07] <wikibugs>	 (CR) jenkins-bot: beta: Add copyright info for Wikidata API [mediawiki-config] - https://gerrit.wikimedia.org/r/370455 (https://phabricator.wikimedia.org/T112606) (owner: Ladsgroup)
[18:42:30] <ebernhardson>	 revi: kk, syncing out
[18:42:50] <ebernhardson>	 Amir1: i'm just going to sync out the other, since it's a labs-only change
[18:42:58] <Amir1>	 back
[18:43:05] <logmsgbot>	 !log ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: T172630 - Enable wgMinervaEnableSiteNotice for kowiki (duration: 00m 46s)
[18:43:06] <Amir1>	 sorry
[18:43:07] <icinga-wm>	 RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 0 probes of 275 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[18:43:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:43:14] <stashbot>	 T172630: Enable wgMinervaEnableSiteNotice for kowiki - https://phabricator.wikimedia.org/T172630
[18:44:01] <revi>	 works on prod too
[18:44:04] <revi>	 it seems
[18:44:27] <ebernhardson>	 Amir1: your other change should show up on beta within 5 minutes, iiuc
[18:44:48] <wikibugs>	 (PS2) EBernhardson: Enable max token count for phrase rescore on zh lang wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/370497 (https://phabricator.wikimedia.org/T169498)
[18:44:49] <Amir1>	 revi: so you're up :) Thanks
[18:44:53] <Amir1>	 ebernhardson: Thanks
[18:44:54] <revi>	 yeah :P
[18:44:57] <revi>	 I couldn't sleep
[18:44:58] <logmsgbot>	 !log ebernhardson@tin Synchronized wmf-config/Wikibase-labs.php: T112606 - beta only - Add copyright info for Wikidata API (duration: 00m 46s)
[18:45:07] * ebernhardson wonders why https://gerrit.wikimedia.org/r/#/c/370497/ keeps not merging but without errors ...
[18:45:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:45:09] <stashbot>	 T112606: [Bug] The API query for rightsinfo on www.wikidata.org reports CC-SA 3.0 , while its page footer says CC0 as well - https://phabricator.wikimedia.org/T112606
[18:45:57] <wikibugs>	 (PS2) EBernhardson: Update CirrusSearch AB test rescore profiles [mediawiki-config] - https://gerrit.wikimedia.org/r/370127 (https://phabricator.wikimedia.org/T171212)
[18:48:11] <wikibugs>	 (CR) EBernhardson: [C: +2] Update CirrusSearch AB test rescore profiles [mediawiki-config] - https://gerrit.wikimedia.org/r/370127 (https://phabricator.wikimedia.org/T171212) (owner: EBernhardson)
[18:49:40] <wikibugs>	 (Merged) jenkins-bot: Update CirrusSearch AB test rescore profiles [mediawiki-config] - https://gerrit.wikimedia.org/r/370127 (https://phabricator.wikimedia.org/T171212) (owner: EBernhardson)
[18:49:50] <wikibugs>	 (CR) jenkins-bot: Update CirrusSearch AB test rescore profiles [mediawiki-config] - https://gerrit.wikimedia.org/r/370127 (https://phabricator.wikimedia.org/T171212) (owner: EBernhardson)
[18:53:22] <wikibugs>	 (CR) EBernhardson: [C: +2] Enable max token count for phrase rescore on zh lang wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/370497 (https://phabricator.wikimedia.org/T169498) (owner: EBernhardson)
[18:53:56] <logmsgbot>	 !log ebernhardson@tin Synchronized wmf-config/CirrusSearch-production.php: T171212 - Update CirrusSearch AB test rescore profiles (duration: 00m 46s)
[18:54:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:54:07] <stashbot>	 T171212: Interleaved results A/B test: turn on - https://phabricator.wikimedia.org/T171212
[18:55:10] <jdlrobson>	 thanks ebernhardson 
[18:55:11] <wikibugs>	 (Merged) jenkins-bot: Enable max token count for phrase rescore on zh lang wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/370497 (https://phabricator.wikimedia.org/T169498) (owner: EBernhardson)
[18:56:21] <wikibugs>	 (CR) jenkins-bot: Enable max token count for phrase rescore on zh lang wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/370497 (https://phabricator.wikimedia.org/T169498) (owner: EBernhardson)
[18:59:17] <logmsgbot>	 !log ebernhardson@tin Synchronized wmf-config/InitialiseSettings.php: T169498 - Enable max token count for phrase rescore on zh lang wikis (step 1) (duration: 00m 46s)
[18:59:28] <wikibugs>	 Operations, Discovery, Wikidata, Wikidata-Query-Service: send wdqs logs to logstash - https://phabricator.wikimedia.org/T172710#3507415 (Smalyshev) p:Triage>Normal
[18:59:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:59:29] <stashbot>	 T169498: Investigate load spikes on the elasticsearch cluster in eqiad - https://phabricator.wikimedia.org/T169498
[19:00:06] <ebernhardson>	 swat is running a smidgen over, but just one more patch after this syncs
[19:00:24] <logmsgbot>	 !log ebernhardson@tin Synchronized wmf-config/CirrusSearch-common.php: T169498 - Enable max token count for phrase rescore on zh lang wikis (step 2) (duration: 00m 46s)
[19:00:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:03:12] <paladox>	 mutante twentyafterfour package heirloom-mailx should fix mail -r command on debian :)
[19:05:58] <logmsgbot>	 !log ebernhardson@tin Synchronized php-1.30.0-wmf.12/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: T171212 - Turn on CirrusSearch MLR AB test (duration: 00m 46s)
[19:06:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:06:09] <stashbot>	 T171212: Interleaved results A/B test: turn on - https://phabricator.wikimedia.org/T171212
[19:06:44] <ebernhardson>	 SWAT complete
[19:07:16] <wikibugs>	 (Draft1) Paladox: Phabricator: Install package heirloom-mailx for mail command [puppet] - https://gerrit.wikimedia.org/r/370518
[19:07:20] <wikibugs>	 (PS2) Paladox: Phabricator: Install package heirloom-mailx for mail command [puppet] - https://gerrit.wikimedia.org/r/370518
[19:09:00] <wikibugs>	 (CR) Paladox: Phabricator: Install package heirloom-mailx for mail command (1 comment) [puppet] - https://gerrit.wikimedia.org/r/370518 (owner: Paladox)
[19:10:26] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0]
[19:16:20] <wikibugs>	 (PS2) Herron: Add SPF and DKIM perl package requires to spamassassin class [puppet] - https://gerrit.wikimedia.org/r/370487 (https://phabricator.wikimedia.org/T172689)
[19:16:26] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[19:17:34] <wikibugs>	 Operations, DC-Ops: audit spare disk levels for codfw & eqiad utlized storage in servers - https://phabricator.wikimedia.org/T160097#3507570 (RobH) Open>Resolved dc tracking sheet has 1tb, 2tb, 4tb sata as well as ssd spares now in the 800 and 1.6tb sizes
[19:19:39] <wikibugs>	 (CR) Herron: "Sounds good.  require_package is much cleaner!" [puppet] - https://gerrit.wikimedia.org/r/370487 (https://phabricator.wikimedia.org/T172689) (owner: Herron)
[19:23:40] <wikibugs>	 (PS1) Andrew Bogott: tools-clush-*: move to python-2 [puppet] - https://gerrit.wikimedia.org/r/370522
[19:47:47] <wikibugs>	 Operations, Android-app-feature-Compilations, Reading-Infrastructure-Team-Backlog, Traffic, Wikipedia-Android-App-Backlog: Determine how to upload Zim files to Swift infrastructure - https://phabricator.wikimedia.org/T172123#3507727 (Mholloway)
[19:49:24] <wikibugs>	 Operations, Android-app-feature-Compilations, Reading-Infrastructure-Team-Backlog, Wikipedia-Android-App-Backlog: Create 'pagecompilation' Swift account(s) (beta + prod) for Readers offline article compilations project - https://phabricator.wikimedia.org/T172735#3507730 (Mholloway)
[19:59:19] <wikibugs>	 (CR) Rush: [C: +1] "py3 we love you but no" [puppet] - https://gerrit.wikimedia.org/r/370522 (owner: Andrew Bogott)
[19:59:21] <wikibugs>	 (CR) Andrew Bogott: [C: +2] tools-clush-*: move to python-2 [puppet] - https://gerrit.wikimedia.org/r/370522 (owner: Andrew Bogott)
[20:00:04] <jouncebot>	 gwicke, cscott, arlolra, subbu, bearND, halfak, and Amir1: Dear anthropoid, the time has come. Please deploy Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170807T2000).
[20:00:26] <wikibugs>	 Operations, LDAP-Access-Requests: Assistance with LDAP Access for Transparency Report - https://phabricator.wikimedia.org/T140380#3507816 (JbuattiWMF) Resolved>Open Hey @Dzahn, would it be possible to add AShahrestani to the WMF group? This is again so that one of our legal fellows can work on th...
[20:00:32] <halfak>	 Nothing for ORES today
[20:07:49] <wikibugs>	 (PS1) Gehel: discovery-stats user should be a member of wikidev group [puppet] - https://gerrit.wikimedia.org/r/370530 (https://phabricator.wikimedia.org/T172740)
[20:10:14] <NotASpy>	 so what's broken that Upload isn't working on Commons?
[20:10:44] <NotASpy>	 "Our servers are currently under maintenance or experiencing a technical problem." which isn't helping you or me, I suspect. 
[20:19:13] <wikibugs>	 (CR) Bearloga: [C: +1] discovery-stats user should be a member of wikidev group [puppet] - https://gerrit.wikimedia.org/r/370530 (https://phabricator.wikimedia.org/T172740) (owner: Gehel)
[20:19:36] <icinga-wm>	 PROBLEM - Disk space on stat1005 is CRITICAL: DISK CRITICAL - free space: /srv 283092 MB (3% inode=94%)
[20:19:40] <MaxSem>	 NotASpy, still doesn't work?
[20:20:09] <NotASpy>	 yeah, it uploaded without issue. 
[20:23:36] <icinga-wm>	 PROBLEM - Disk space on stat1005 is CRITICAL: DISK CRITICAL - free space: /srv 279839 MB (3% inode=94%)
[20:26:30] <wikibugs>	 Operations, Patch-For-Review: Standardizing our partman recipes - https://phabricator.wikimedia.org/T156955#3507975 (fgiunchedi) >>! In T156955#3035850, @fgiunchedi wrote: > RAID/disk layer: > * either software or hardware raid > * in any case one block device is exposed (including the single-disk case,...
[20:30:33] <wikibugs>	 (CR) Smalyshev: [C: +1] wdqs - remove upstart configuration files [puppet] - https://gerrit.wikimedia.org/r/369688 (owner: Gehel)
[20:31:10] <wikibugs>	 (PS1) Mobrovac: Cassandra: Do not include the main DNS in the list of seeds [puppet] - https://gerrit.wikimedia.org/r/370554 (https://phabricator.wikimedia.org/T172610)
[20:33:18] <wikibugs>	 Operations, LDAP-Access-Requests: Assistance with LDAP Access for Transparency Report - https://phabricator.wikimedia.org/T140380#3508045 (Dzahn) a:Dzahn>RobH @JbuattiWMF i'm in the middle of travel now   @Robh could you help out by any chance? would be great, thank you!
[20:40:41] <wikibugs>	 (PS1) RobH: adding AShahrestani to ldap per request [puppet] - https://gerrit.wikimedia.org/r/370555 (https://phabricator.wikimedia.org/T140380)
[20:40:57] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[20:42:28] <wikibugs>	 Operations, AbuseFilter, Traffic, Zero: user_wpzero doesn't always work - https://phabricator.wikimedia.org/T169907#3412425 (zhuyifei1999) The fact: uploaders are not always in WP0 ranges, but downloaders are nearly always in WP0 ranges (Z591)
[20:42:37] <wikibugs>	 (CR) RobH: [C: +2] adding AShahrestani to ldap per request [puppet] - https://gerrit.wikimedia.org/r/370555 (https://phabricator.wikimedia.org/T140380) (owner: RobH)
[20:43:34] <wikibugs>	 Operations, LDAP-Access-Requests, Patch-For-Review: Assistance with LDAP Access for Transparency Report - https://phabricator.wikimedia.org/T140380#3508104 (JbuattiWMF) Thanks @RobH !
[20:44:25] <wikibugs>	 (PS1) RobH: Revert "adding AShahrestani to ldap per request" [puppet] - https://gerrit.wikimedia.org/r/370556
[20:45:03] <wikibugs>	 (CR) RobH: [C: +2] Revert "adding AShahrestani to ldap per request" [puppet] - https://gerrit.wikimedia.org/r/370556 (owner: RobH)
[20:45:56] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[20:46:07] <wikibugs>	 Operations, LDAP-Access-Requests, Patch-For-Review: Assistance with LDAP Access for Transparency Report - https://phabricator.wikimedia.org/T140380#3508106 (RobH) Yeah ignore those patchsets, they weren't required.
[20:47:56] <wikibugs>	 (CR) Mobrovac: "PCC OK - https://puppet-compiler.wmflabs.org/compiler02/7327/" [puppet] - https://gerrit.wikimedia.org/r/370554 (https://phabricator.wikimedia.org/T172610) (owner: Mobrovac)
[20:55:36] <icinga-wm>	 PROBLEM - Unmerged changes on repository puppet on puppetmaster1001 is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production).
[21:00:00] <wikibugs>	 (PS3) Andrew Bogott: openstack: libvirtd.conf from Jessie package [1/2] [puppet] - https://gerrit.wikimedia.org/r/369615 (owner: Hashar)
[21:00:04] <jouncebot>	 dapatrick, bawolff, and Reedy: Respected human, time to deploy Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170807T2100). Please do the needful.
[21:00:29] <wikibugs>	 (PS2) Andrew Bogott: openstack: libvirtd.conf from Jessie package [2/2] [puppet] - https://gerrit.wikimedia.org/r/369617 (owner: Hashar)
[21:00:46] * harej wonders why jouncebot is programmed to speak in such a mystifying tone
[21:01:00] <wikibugs>	 (CR) Andrew Bogott: [C: +2] openstack: libvirtd.conf from Jessie package [1/2] [puppet] - https://gerrit.wikimedia.org/r/369615 (owner: Hashar)
[21:01:22] <wikibugs>	 (CR) Andrew Bogott: [C: +2] openstack: libvirtd.conf from Jessie package [2/2] [puppet] - https://gerrit.wikimedia.org/r/369617 (owner: Hashar)
[21:01:27] <ebernhardson>	 herron: mystifying? I thought it's more like the 'totally not a robot' meme
[21:02:37] <icinga-wm>	 RECOVERY - Unmerged changes on repository puppet on puppetmaster1001 is OK: No changes to merge.
[21:02:47] <herron>	 mystifying indeed :D
[21:05:01] <wikibugs>	 (PS1) RobH: add AShahrestani to admin module for inclusion in wmf group [puppet] - https://gerrit.wikimedia.org/r/370579 (https://phabricator.wikimedia.org/T140380)
[21:05:33] <wikibugs>	 (PS2) RobH: add AShahrestani to admin module for inclusion in wmf group [puppet] - https://gerrit.wikimedia.org/r/370579 (https://phabricator.wikimedia.org/T140380)
[21:05:43] <wikibugs>	 (CR) RobH: [C: +2] add AShahrestani to admin module for inclusion in wmf group [puppet] - https://gerrit.wikimedia.org/r/370579 (https://phabricator.wikimedia.org/T140380) (owner: RobH)
[21:16:56] <wikibugs>	 Operations, LDAP-Access-Requests, Patch-For-Review: Assistance with LDAP Access for Transparency Report - https://phabricator.wikimedia.org/T140380#3508188 (RobH) Open>Resolved Ok, chatted with @MoritzMuehlenhoff who was able to clarify.  We include in the file (even though they have an ldap...
[21:27:55] <wikibugs>	 Operations, Performance-Team, monitoring: Consolidate performance website and related software - https://phabricator.wikimedia.org/T158837#3508201 (Krinkle) a:Krinkle
[21:28:59] <urandom>	 !log T172384: Upgrading Cassandra to 3.11.0-wmf1 in dev environment (build patched to disable in-built heap dumping)
[21:29:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:29:12] <stashbot>	 T172384: OOM exceptions in dev environment - https://phabricator.wikimedia.org/T172384
[21:35:48] <Krinkle>	 Reedy: Did you deploy anything else on Aug 5 around 14:00 UTC besides the two sole entries at https://wikitech.wikimedia.org/wiki/Server_Admin_Log#2017-08-05 ?
[21:36:19] <Krinkle>	 It seems the major regression in backend-save-timing dropped straight back down about 10 minutes before that. https://grafana.wikimedia.org/dashboard/db/save-timing?refresh=5m&orgId=1&from=1501903577562&to=1502035738902
[21:36:33] <Krinkle>	 from > 1.5s to <0.5s, major drop
[21:37:26] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [1000.0]
[21:43:44] <cscott>	 herron: it looks like a native speaker of English programmed it to be "cute", and didn't realize the impact on intelligibility
[21:47:17] <cscott>	 (sorry, just read https://medium.com/@mollyclare/taming-the-steamroller-how-to-communicate-compassionately-with-non-native-english-speakers-d95d8d1845a0 )
[21:48:26] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[21:59:31] <wikibugs>	 Operations, Traffic, Patch-For-Review, User-notice: Removing support for DES-CBC3-SHA TLS cipher (drops IE8-on-XP support) - https://phabricator.wikimedia.org/T147199#2684468 (Pigsonthewing) > Users which cannot move off of the underlying Windows XP operating system can install the latest Firefox...
[22:35:02] <wikibugs>	 Operations, Traffic, Patch-For-Review, User-notice: Removing support for DES-CBC3-SHA TLS cipher (drops IE8-on-XP support) - https://phabricator.wikimedia.org/T147199#3508328 (MaxSem) If a corporation is insane enough to still run XP and force their users to run IE, we can only hope that yet anot...
[23:00:04] <jouncebot>	 addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170807T2300).
[23:04:00] <wikibugs>	 Operations, Cloud-Services, Cloud-VPS: silver has trouble rebooting - https://phabricator.wikimedia.org/T168559#3508388 (RobH) a:Andrew @andrew: Since this is slated for decom once the new system is in place, I'm assigning this to you for feedback.  Please let me know when this system can be pull...
[23:04:22] <wikibugs>	 Operations, Cloud-Services: decom silver (was silver has trouble rebooting) - https://phabricator.wikimedia.org/T168559#3508390 (RobH)
[23:04:34] <wikibugs>	 Operations, Cloud-Services, hardware-requests: decom silver (was silver has trouble rebooting) - https://phabricator.wikimedia.org/T168559#3368299 (RobH)
[23:08:05] <wikibugs>	 Operations, Cloud-Services, hardware-requests: decom silver (was silver has trouble rebooting) - https://phabricator.wikimedia.org/T168559#3368299 (Luke081515) Is there already a task for the replacement of silver?
[23:11:43] <wikibugs>	 Operations, Cloud-Services, Cloud-VPS: logrotate/disk space on silver for nutcracker log - https://phabricator.wikimedia.org/T120683#3508410 (RobH) a:chasemp I'm working on Ops Clinic Duty this week, and as part of that work, I've done a search for unowned, high priority tasks in #Operations.  Th...
[23:17:30] <wikibugs>	 Operations: Something is wrong with installer root disk stuff - https://phabricator.wikimedia.org/T149845#2766226 (RobH) I'm working on Ops Clinic Duty this week, and as part of that work, I've done a search for unowned, high priority tasks in #Operations.  This task came up as a high priority #operations ta...
[23:55:00] <wikibugs>	 Operations, Analytics, Analytics-Wikistats, Wikidata, and 6 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3508516 (Jayprakash12345)
[23:58:20] <wikibugs>	 Operations, Domains, Traffic, Wikimedia Resource Center, Patch-For-Review: Create wikimedia.org/resources redirect for Wikimedia Resource Center - https://phabricator.wikimedia.org/T172417#3508519 (Krinkle)