[00:00:04] <icinga-wm>	 PROBLEM - puppet last run on labservices1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:19:34] <icinga-wm>	 PROBLEM - puppet last run on labtestvirt2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:20:53] <matt_flaschen>	 RoanKattouw, https://gerrit.wikimedia.org/r/368331
[00:20:54] <icinga-wm>	 PROBLEM - dhclient process on thumbor1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:21:04] <icinga-wm>	 PROBLEM - salt-minion processes on thumbor1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:21:34] <icinga-wm>	 PROBLEM - nutcracker process on thumbor1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:22:45] <icinga-wm>	 RECOVERY - dhclient process on thumbor1003 is OK: PROCS OK: 0 processes with command name dhclient
[00:22:54] <icinga-wm>	 RECOVERY - salt-minion processes on thumbor1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[00:23:34] <icinga-wm>	 RECOVERY - nutcracker process on thumbor1003 is OK: PROCS OK: 1 process with UID = 111 (nutcracker), command name nutcracker
[00:27:12] <mutante>	 !log bromine sudo -E reprepro clearvanished to delete unused precise-mediawiki causing reprepro errors
[00:27:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:31:14] <icinga-wm>	 PROBLEM - puppet last run on silver is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:35:07] <wikibugs>	 Puppet, Cloud-VPS: role::puppetmaster::standalone: Unable to locate package geoipupdate - https://phabricator.wikimedia.org/T171916#3480251 (MaxSem)
[00:35:35] <wikibugs>	 Puppet, Cloud-VPS: role::puppetmaster::standalone: Unable to locate package geoipupdate - https://phabricator.wikimedia.org/T171916#3479983 (MaxSem) Still fails, with even more errors (I tried on a fresh VM).
[00:48:54] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on bromine is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[00:51:22] <mutante>	 ^ me
[00:51:38] <mutante>	 !log releases1001 - rsynced reprepro db data from bromine
[00:51:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:51:54] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on bromine is OK: OK ferm input default policy is set
[00:55:25] <wikibugs>	 (PS1) Rush: Revert "openstack: move openstack::repo to new model" [puppet] - https://gerrit.wikimedia.org/r/368332
[00:55:56] <wikibugs>	 (CR) Rush: [V: 2 C: 2] Revert "openstack: move openstack::repo to new model" [puppet] - https://gerrit.wikimedia.org/r/368332 (owner: Rush)
[00:56:22] <wikibugs>	 Operations, Patch-For-Review, Release-Engineering-Team (Kanban), Security-General: setup releases1001.eqiad.wmnet (was: setup mwreleases1001) - https://phabricator.wikimedia.org/T164030#3480257 (Dzahn) 17:27 < mutante> !log bromine sudo -E reprepro clearvanished to deleted unused precise-mediawik...
[00:57:16] <wikibugs>	 Puppet, Cloud-VPS: role::puppetmaster::standalone: Unable to locate package geoipupdate - https://phabricator.wikimedia.org/T171916#3480258 (bd808) >>! In T171916#3480251, @MaxSem wrote: > Still fails, with even more errors (I tried on a fresh VM).  Is this a jessie|stretch base image? For //"reasons"//...
[00:58:48] <wikibugs>	 Puppet, Cloud-VPS: role::puppetmaster::standalone: Unable to locate package geoipupdate - https://phabricator.wikimedia.org/T171916#3480261 (MaxSem) Stretch.
[00:59:15] <icinga-wm>	 RECOVERY - puppet last run on labservices1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:00:24] <icinga-wm>	 RECOVERY - puppet last run on silver is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[01:10:23] <wikibugs>	 (PS1) Dzahn: releases: rsync reprepro data, set active server in Hiera [puppet] - https://gerrit.wikimedia.org/r/368333 (https://phabricator.wikimedia.org/T164030)
[01:11:26] <wikibugs>	 (CR) jerkins-bot: [V: -1] releases: rsync reprepro data, set active server in Hiera [puppet] - https://gerrit.wikimedia.org/r/368333 (https://phabricator.wikimedia.org/T164030) (owner: Dzahn)
[01:11:28] <wikibugs>	 (PS2) Dzahn: releases: rsync reprepro data, set active server in Hiera [puppet] - https://gerrit.wikimedia.org/r/368333 (https://phabricator.wikimedia.org/T164030)
[01:12:28] <wikibugs>	 (CR) jerkins-bot: [V: -1] releases: rsync reprepro data, set active server in Hiera [puppet] - https://gerrit.wikimedia.org/r/368333 (https://phabricator.wikimedia.org/T164030) (owner: Dzahn)
[01:12:58] <wikibugs>	 (PS3) Dzahn: releases: rsync reprepro data, set active server in Hiera [puppet] - https://gerrit.wikimedia.org/r/368333 (https://phabricator.wikimedia.org/T164030)
[01:13:19] <wikibugs>	 Puppet, Cloud-VPS: role::puppetmaster::standalone on stretch: Unable to locate package geoipupdate - https://phabricator.wikimedia.org/T171916#3480267 (bd808)
[01:16:15] <icinga-wm>	 RECOVERY - puppet last run on labtestcontrol2001 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[01:18:20] <wikibugs>	 (PS2) Ayounsi: Assign internal IPs to pfw3-codfw<->pfw3-eqiad ipsec link [dns] - https://gerrit.wikimedia.org/r/367933 (https://phabricator.wikimedia.org/T169643)
[01:18:55] <icinga-wm>	 RECOVERY - puppet last run on labtestvirt2002 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[01:19:30] <wikibugs>	 (PS1) Dzahn: install_server: add install2001 to DHCP, partman [puppet] - https://gerrit.wikimedia.org/r/368334 (https://phabricator.wikimedia.org/T171917)
[01:20:57] <wikibugs>	 (PS2) Dzahn: install_server: add install2001 to DHCP, partman [puppet] - https://gerrit.wikimedia.org/r/368334 (https://phabricator.wikimedia.org/T171917)
[01:21:27] <wikibugs>	 (CR) Ayounsi: [C: 2] Assign internal IPs to pfw3-codfw<->pfw3-eqiad ipsec link [dns] - https://gerrit.wikimedia.org/r/367933 (https://phabricator.wikimedia.org/T169643) (owner: Ayounsi)
[01:23:36] <wikibugs>	 (CR) Dzahn: [C: 2] install_server: add install2001 to DHCP, partman [puppet] - https://gerrit.wikimedia.org/r/368334 (https://phabricator.wikimedia.org/T171917) (owner: Dzahn)
[01:32:57] <wikibugs>	 (PS2) Dzahn: Revert "Set debug_level on icinga" [puppet] - https://gerrit.wikimedia.org/r/366876 (owner: Jcrespo)
[01:34:35] <wikibugs>	 Operations, vm-requests, Patch-For-Review, Release-Engineering-Team (Kanban): setup releases2001.codfw.wmnet - https://phabricator.wikimedia.org/T171917#3480300 (Dzahn) ```  [!!] Install the GRUB boot loader on a hard disk ├┐                           │...
[01:36:53] <wikibugs>	 (CR) Dzahn: [C: 2] Revert "Set debug_level on icinga" [puppet] - https://gerrit.wikimedia.org/r/366876 (owner: Jcrespo)
[02:02:58] <wikibugs>	 Operations, Patch-For-Review: create endowment.wm.org microsite - https://phabricator.wikimedia.org/T136735#3480311 (Dzahn) is this site still being planned? it's over a year later
[02:08:15] <ottomata>	 !log stat1002: disabled puppet, umounted /tmp, /home and /a, poweroff
[02:08:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:14:45] <wikibugs>	 (CR) Dzahn: "Current Status:	  CRITICAL" [puppet] - https://gerrit.wikimedia.org/r/365640 (https://phabricator.wikimedia.org/T152712) (owner: Ottomata)
[02:18:00] <wikibugs>	 Operations, Analytics, Analytics-Cluster: thorium - failed git clone of geowiki-data-private - https://phabricator.wikimedia.org/T171923#3480324 (Dzahn)
[02:18:09] <icinga-wm>	 ACKNOWLEDGEMENT - puppet last run on thorium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_geowiki-data-private] daniel_zahn https://phabricator.wikimedia.org/T171923
[02:19:03] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on mw1260 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn scheduled host downtime
[02:19:03] <icinga-wm>	 ACKNOWLEDGEMENT - puppet last run on mw1260 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): Service[jobchron],Service[jobrunner] daniel_zahn scheduled host downtime
[02:21:41] <wikibugs>	 Operations, cloud-services-team: notebook100[12] - Invalid relationship: Apt::Pin[r-base] - https://phabricator.wikimedia.org/T171924#3480338 (Dzahn)
[02:22:01] <icinga-wm>	 ACKNOWLEDGEMENT - puppet last run on notebook1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues daniel_zahn https://phabricator.wikimedia.org/T171924
[02:22:14] <icinga-wm>	 ACKNOWLEDGEMENT - puppet last run on notebook1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues daniel_zahn https://phabricator.wikimedia.org/T171924
[02:24:49] <wikibugs>	 Operations, Electron-PDFs, Patch-For-Review, Reading-Web-Backlog (Tracking), Services (blocked): pdfrender fails to serve requests since Mar 8 00:30:32 UTC on scb1003 - https://phabricator.wikimedia.org/T159922#3083419 (Dzahn) on scb100**2**   ``` Current Status:   CRITICAL    (for 0d 5h 51m...
[02:26:48] <icinga-wm>	 RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.002 second response time
[02:26:56] <mutante>	 !log scb1002 - systemctl restart pdfrender - was "connect to address 10.64.16.21 and port 5252: Connection refused" in Icinga since a couple hours (T159922) - recovered
[02:27:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:27:06] <stashbot>	 T159922: pdfrender fails to serve requests since Mar 8 00:30:32 UTC on scb1003 - https://phabricator.wikimedia.org/T159922
[02:27:33] <wikibugs>	 Operations, Electron-PDFs, Patch-For-Review, Reading-Web-Backlog (Tracking), Services (blocked): pdfrender fails to serve requests since Mar 8 00:30:32 UTC on scb1003 - https://phabricator.wikimedia.org/T159922#3480357 (Dzahn) >>! In T159922#3480356, @Stashbot wrote: > {nav icon=file, name=Me...
[02:34:54] <wikibugs>	 (PS2) Andrew Bogott: m5-master: allow labspuppet@labpuppetmaster1001 and 1002 to labspuppet [puppet] - https://gerrit.wikimedia.org/r/368251
[02:47:33] <wikibugs>	 (PS1) Andrew Bogott: labs puppetmaster: rebase from gerrit once per minute [puppet] - https://gerrit.wikimedia.org/r/368339
[02:47:35] <wikibugs>	 (CR) Andrew Bogott: [C: 2] m5-master: allow labspuppet@labpuppetmaster1001 and 1002 to labspuppet [puppet] - https://gerrit.wikimedia.org/r/368251 (owner: Andrew Bogott)
[02:48:12] <wikibugs>	 (PS2) Andrew Bogott: labs puppetmaster: rebase from gerrit once per minute [puppet] - https://gerrit.wikimedia.org/r/368339
[02:50:05] <wikibugs>	 (CR) Andrew Bogott: [C: 2] labs puppetmaster: rebase from gerrit once per minute [puppet] - https://gerrit.wikimedia.org/r/368339 (owner: Andrew Bogott)
[03:01:51] <wikibugs>	 (PS1) Andrew Bogott: puppetmaster profiles: add prevent_cherrypicks param [puppet] - https://gerrit.wikimedia.org/r/368340
[03:02:53] <wikibugs>	 (CR) jerkins-bot: [V: -1] puppetmaster profiles: add prevent_cherrypicks param [puppet] - https://gerrit.wikimedia.org/r/368340 (owner: Andrew Bogott)
[03:04:38] <wikibugs>	 (PS2) Andrew Bogott: puppetmaster profiles: add prevent_cherrypicks param [puppet] - https://gerrit.wikimedia.org/r/368340
[03:06:52] <wikibugs>	 (PS3) Andrew Bogott: puppetmaster profiles: add prevent_cherrypicks param [puppet] - https://gerrit.wikimedia.org/r/368340
[03:10:18] <wikibugs>	 (CR) Andrew Bogott: [C: 2] puppetmaster profiles: add prevent_cherrypicks param [puppet] - https://gerrit.wikimedia.org/r/368340 (owner: Andrew Bogott)
[03:26:29] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 721.24 seconds
[03:30:45] <wikibugs>	 (PS1) Andrew Bogott: labs puppetmaster backend: open firewall on 8141 [puppet] - https://gerrit.wikimedia.org/r/368342
[03:32:45] <wikibugs>	 (PS2) Andrew Bogott: labs puppetmaster backend: open firewall on 8141 [puppet] - https://gerrit.wikimedia.org/r/368342
[03:35:26] <wikibugs>	 (CR) Andrew Bogott: [C: 2] labs puppetmaster backend: open firewall on 8141 [puppet] - https://gerrit.wikimedia.org/r/368342 (owner: Andrew Bogott)
[03:43:38] <wikibugs>	 (PS1) Andrew Bogott: labs puppetmasters: Let the puppetmasters talk to each other on 8141 [puppet] - https://gerrit.wikimedia.org/r/368343
[03:44:39] <wikibugs>	 (CR) jerkins-bot: [V: -1] labs puppetmasters: Let the puppetmasters talk to each other on 8141 [puppet] - https://gerrit.wikimedia.org/r/368343 (owner: Andrew Bogott)
[03:47:36] <wikibugs>	 (PS2) Andrew Bogott: labs puppetmasters: Let the puppetmasters talk to each other on 8141 [puppet] - https://gerrit.wikimedia.org/r/368343
[03:48:51] <wikibugs>	 (CR) Andrew Bogott: [C: 2] labs puppetmasters: Let the puppetmasters talk to each other on 8141 [puppet] - https://gerrit.wikimedia.org/r/368343 (owner: Andrew Bogott)
[04:05:39] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 210.35 seconds
[04:09:28] <wikibugs>	 (PS1) Andrew Bogott: define puppetmaster::servers for labpuppetmaster1002 [puppet] - https://gerrit.wikimedia.org/r/368345
[04:11:03] <wikibugs>	 (CR) Andrew Bogott: [C: 2] define puppetmaster::servers for labpuppetmaster1002 [puppet] - https://gerrit.wikimedia.org/r/368345 (owner: Andrew Bogott)
[04:25:35] <wikibugs>	 (PS1) Andrew Bogott: labs puppetmaster: simplify allow_from rules [puppet] - https://gerrit.wikimedia.org/r/368347
[04:26:38] <wikibugs>	 (CR) jerkins-bot: [V: -1] labs puppetmaster: simplify allow_from rules [puppet] - https://gerrit.wikimedia.org/r/368347 (owner: Andrew Bogott)
[04:28:20] <wikibugs>	 (PS2) Andrew Bogott: labs puppetmaster: simplify allow_from rules [puppet] - https://gerrit.wikimedia.org/r/368347
[04:29:44] <wikibugs>	 (CR) Andrew Bogott: [C: 2] labs puppetmaster: simplify allow_from rules [puppet] - https://gerrit.wikimedia.org/r/368347 (owner: Andrew Bogott)
[04:42:58] <icinga-wm>	 PROBLEM - HP RAID on ms-be1017 is CRITICAL: CRITICAL: Slot 1: OK: 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:1, 2I:4:2 - Controller: OK - Cache: Permanently Disabled - Cable Error - Battery/Capacitor: Recharging
[04:43:02] <icinga-wm>	 ACKNOWLEDGEMENT - HP RAID on ms-be1017 is CRITICAL: CRITICAL: Slot 1: OK: 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:1, 2I:4:2 - Controller: OK - Cache: Permanently Disabled - Cable Error - Battery/Capacitor: Recharging nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T171926
[04:43:07] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on ms-be1017 - https://phabricator.wikimedia.org/T171926#3480411 (ops-monitoring-bot)
[05:19:43] <wikibugs>	 (Abandoned) Krinkle: mediawiki: update apache rules for 2.4 [puppet] - https://gerrit.wikimedia.org/r/225552 (owner: Gergő Tisza)
[05:21:29] <icinga-wm>	 PROBLEM - puppet last run on db1063 is CRITICAL: CRITICAL: Puppet has 4 failures. Last run 2 minutes ago with 4 failures. Failed resources (up to 3 shown): Package[tzdata],Exec[wikidev_ensure_members],Exec[ops_ensure_members],Exec[absent_ensure_members]
[05:48:58] <icinga-wm>	 RECOVERY - puppet last run on db1063 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[06:09:39] <icinga-wm>	 RECOVERY - WDQS SPARQL on wdqs1001 is OK: HTTP OK: HTTP/1.1 200 OK - 13348 bytes in 0.001 second response time
[06:09:59] <icinga-wm>	 RECOVERY - WDQS HTTP on wdqs1001 is OK: HTTP OK: HTTP/1.1 200 OK - 13348 bytes in 0.001 second response time
[06:12:09] <SMalyshev>	 wikidata changes stopped about 20 mins ago - does anybody know the reason?
[06:27:29] <icinga-wm>	 PROBLEM - High lag on wdqs2002 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1800.0]
[06:27:39] <icinga-wm>	 PROBLEM - High lag on wdqs2001 is CRITICAL: CRITICAL: 31.03% of data above the critical threshold [1800.0]
[06:28:09] <icinga-wm>	 PROBLEM - High lag on wdqs1003 is CRITICAL: CRITICAL: 34.48% of data above the critical threshold [1800.0]
[06:28:18] <icinga-wm>	 PROBLEM - High lag on wdqs2003 is CRITICAL: CRITICAL: 31.03% of data above the critical threshold [1800.0]
[06:30:18] <icinga-wm>	 PROBLEM - High lag on wdqs1002 is CRITICAL: CRITICAL: 31.03% of data above the critical threshold [1800.0]
[06:30:19] <icinga-wm>	 PROBLEM - High lag on wdqs2003 is CRITICAL: CRITICAL: 31.03% of data above the critical threshold [1800.0]
[06:33:39] <icinga-wm>	 PROBLEM - High lag on wdqs2001 is CRITICAL: CRITICAL: 44.83% of data above the critical threshold [1800.0]
[06:35:20] <wikibugs>	 Operations, Wikidata: Wikidata database locked - https://phabricator.wikimedia.org/T171928#3480493 (Esc3300)
[06:35:54] <moritzm>	 !log installing apache security updates on trusty systems
[06:36:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:41:58] <icinga-wm>	 PROBLEM - puppet last run on mw1259 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[apache2]
[06:45:36] <wikibugs>	 Operations, Android-app-feature-Compilations, Traffic, Wikipedia-Android-App-Backlog, Reading-Infrastructure-Team-Backlog (Kanban): Determine where to host zim files for the Android app - https://phabricator.wikimedia.org/T170843#3480500 (Nemo_bis)
[06:51:46] <kart_>	 "read-only wiki" while adding wikidata link. Known outage?
[06:52:08] <icinga-wm>	 RECOVERY - puppet last run on mw1259 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[06:56:53] <wikibugs>	 Operations, Puppet, Traffic, Mobile, and 2 others: URLs with title query string parameter and additional query string parameters do not redirect to mobile site - https://phabricator.wikimedia.org/T154227#2904582 (Nemo_bis) Can you guarantee to support all the URLs with parameters which would get...
[06:57:24] <elukey>	 kart_: the only thing that we know afaics is the alerts for Wikidata Query Service lag
[06:57:51] <elukey>	 ah ok unbreak now - https://phabricator.wikimedia.org/T171928
[06:58:28] <wikibugs>	 Operations, Wikidata: Wikidata database locked - https://phabricator.wikimedia.org/T171928#3480510 (Esc3300) Wikivoyage seems to work.
[06:59:11] <wikibugs>	 Operations, Wikidata: Wikidata database locked - https://phabricator.wikimedia.org/T171928#3480468 (Mbch331) Dutch Wikipedia also works. So it's not all projects.
[07:04:43] <wikibugs>	 Operations, Wikidata: Wikidata database locked - https://phabricator.wikimedia.org/T171928#3480468 (jcrespo) Database crashed, it should be ok to edit now.
[07:04:46] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on ms-be1017 - https://phabricator.wikimedia.org/T171926#3480518 (MoritzMuehlenhoff) p:Triage>Normal a:Cmjohnson
[07:05:26] <wikibugs>	 Operations, Wikidata: Wikidata database locked - https://phabricator.wikimedia.org/T171928#3480468 (Joe) I just did two test edits, I can confirm it works.
[07:12:14] <wikibugs>	 Operations, Wikidata: Wikidata database locked - https://phabricator.wikimedia.org/T171928#3480528 (Esc3300) Yes, it's back! Thanks for your help.  ``` (diff | hist) . . 99minutos.com (Q33542455); 07:03 . . (-95) . . Tarawa1943 (talk | contribs) (Page on [eswiki] deleted: 99minutos.com) [rollback] (...
[07:12:22] <elukey>	 kart_: ---^ 
[07:12:34] <elukey>	 thanks for the ping
[07:12:58] <wikibugs>	 Operations, Wikidata: Wikidata database locked - https://phabricator.wikimedia.org/T171928#3480529 (Esc3300) p:Unbreak!>Triage
[07:19:32] <kart_>	 elukey: thanks. I was about to report it :)
[07:26:28] <icinga-wm>	 RECOVERY - High lag on wdqs1003 is OK: OK: Less than 30.00% above the threshold [600.0]
[07:26:38] <icinga-wm>	 RECOVERY - High lag on wdqs2003 is OK: OK: Less than 30.00% above the threshold [600.0]
[07:26:58] <icinga-wm>	 RECOVERY - High lag on wdqs2001 is OK: OK: Less than 30.00% above the threshold [600.0]
[07:27:29] <icinga-wm>	 RECOVERY - High lag on wdqs1002 is OK: OK: Less than 30.00% above the threshold [600.0]
[07:27:48] <icinga-wm>	 RECOVERY - High lag on wdqs2002 is OK: OK: Less than 30.00% above the threshold [600.0]
[07:48:58] <wikibugs>	 Operations, cloud-services-team: notebook100[12] - Invalid relationship: Apt::Pin[r-base] - https://phabricator.wikimedia.org/T171924#3480579 (MoritzMuehlenhoff) p:Triage>High Seems like a side effect of 7dfe90c0d494999e2cfc05b12169401d40d54c99 ?
[07:49:25] <wikibugs>	 Operations, ORES, Scoring-platform-team-Backlog: Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851#3480582 (MoritzMuehlenhoff) p:Triage>Normal
[07:52:27] <logmsgbot>	 !log gehel@puppetmaster1001 conftool action : set/pooled=yes; selector: name=wdqs1001.eqiad.wmnet
[07:52:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:52:38] <elukey>	 !log forced mii-tool -r eth0 on analytics1034 to get 1G negotiated speed
[07:52:40] <gehel>	 !log repooling wdqs1001 (data import completed)
[07:52:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:52:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:56:43] <elukey>	 !log update nodejs to 6.11 on aqs1004 (testing prod node after beta qa) - T170790
[07:56:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:56:53] <stashbot>	 T170790: Upgrade AQS to node 6.11 - https://phabricator.wikimedia.org/T170790
[08:01:53] <wikibugs>	 Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Hindi-Sites, and 2 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3376126 (Crochet.david) >>! In T168765#3477698, @Jayprakash12345 wrote: >  > https://commons.wikimedia.org/wiki/File:Wikiversity-logo-hi.s...
[08:04:58] <icinga-wm>	 RECOVERY - puppet last run on thorium is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[08:05:28] <wikibugs>	 Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Hindi-Sites, and 2 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3480635 (Crochet.david) >>! In T168765#3468785, @Jayprakash12345 wrote: > Will Quiz Extension be install automatically at the time of wiki...
[08:09:06] <wikibugs>	 Operations, Commons, Traffic, media-storage: 503 error for certain JPG thumbnail: "Backend fetch failed" - https://phabricator.wikimedia.org/T171421#3480638 (ema) Open>Resolved a:ema We do have occasional backend fetch failures. Closing, as this looks like a transient error.
[08:11:40] <wikibugs>	 Operations, Analytics, Analytics-Cluster: thorium - failed git clone of geowiki-data-private - https://phabricator.wikimedia.org/T171923#3480643 (elukey) This issue has already happened in the past, this brutal sequence of commands fixed it:  ``` root@thorium:/srv/geowiki# rm -rf data-private root@th...
[08:12:06] <wikibugs>	 Operations, Analytics, Analytics-Cluster, User-Elukey: thorium - failed git clone of geowiki-data-private - https://phabricator.wikimedia.org/T171923#3480644 (elukey) p:Triage>Normal
[08:33:45] <wikibugs>	 (PS4) Filippo Giunchedi: Don't show diffs for files with secret content [puppet] - https://gerrit.wikimedia.org/r/366806 (https://phabricator.wikimedia.org/T79881)
[08:33:56] <wikibugs>	 Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Hindi-Sites, and 2 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3480703 (Jayprakash12345) >>! In T168765#3480603, @Crochet.david wrote: >>>! In T168765#3477698, @Jayprakash12345 wrote: >>  >> https://co...
[08:34:13] <wikibugs>	 (PS6) Giuseppe Lavagetto: Add filters to the future parser [software/puppet-compiler] - https://gerrit.wikimedia.org/r/367678
[08:35:02] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add filters to the future parser [software/puppet-compiler] - https://gerrit.wikimedia.org/r/367678 (owner: Giuseppe Lavagetto)
[08:36:37] <wikibugs>	 (CR) Filippo Giunchedi: [C: 2] Don't show diffs for files with secret content [puppet] - https://gerrit.wikimedia.org/r/366806 (https://phabricator.wikimedia.org/T79881) (owner: Filippo Giunchedi)
[08:38:46] <godog>	 ugh, incoming puppet shower, sorry about that
[08:39:08] <elukey>	 shall we stop ircecho 
[08:39:10] <elukey>	 ?
[08:39:28] <icinga-wm>	 PROBLEM - puppet last run on xenon is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:39:38] <icinga-wm>	 PROBLEM - puppet last run on maps1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:39:38] <icinga-wm>	 PROBLEM - puppet last run on wtp1025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:39:46] <godog>	 elukey: fixing as we speak, but yeah if you could stop it!
[08:39:48] <icinga-wm>	 PROBLEM - puppet last run on elastic1021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:39:48] <icinga-wm>	 PROBLEM - puppet last run on puppetmaster1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:39:48] <icinga-wm>	 PROBLEM - puppet last run on restbase-test2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:39:49] <icinga-wm>	 PROBLEM - puppet last run on mw2226 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:39:49] <icinga-wm>	 PROBLEM - puppet last run on ms-be2021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:39:58] <icinga-wm>	 PROBLEM - puppet last run on dbstore1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:39:58] <icinga-wm>	 PROBLEM - puppet last run on mw2171 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:39:58] <icinga-wm>	 PROBLEM - puppet last run on ms-be1028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:39:59] <icinga-wm>	 PROBLEM - puppet last run on wasat is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:07] <wikibugs>	 (PS1) Filippo Giunchedi: profile: fix bogus show_diff for ssh::userkey [puppet] - https://gerrit.wikimedia.org/r/368361
[08:40:08] <icinga-wm>	 PROBLEM - puppet last run on analytics1065 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:08] <icinga-wm>	 PROBLEM - puppet last run on cp1074 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:08] <icinga-wm>	 PROBLEM - puppet last run on mw2131 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:10] <icinga-wm>	 PROBLEM - puppet last run on wdqs1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:10] <icinga-wm>	 PROBLEM - puppet last run on db1023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:10] <icinga-wm>	 PROBLEM - puppet last run on db1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:18] <icinga-wm>	 PROBLEM - puppet last run on db2073 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:18] <icinga-wm>	 PROBLEM - puppet last run on ms-be1030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:18] <icinga-wm>	 PROBLEM - puppet last run on mw1200 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:18] <icinga-wm>	 PROBLEM - puppet last run on kubestagetcd1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:18] <icinga-wm>	 PROBLEM - puppet last run on labsdb1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:19] <icinga-wm>	 PROBLEM - puppet last run on kafka1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:26] * elukey stops
[08:40:28] <icinga-wm>	 PROBLEM - puppet last run on radon is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:28] <icinga-wm>	 PROBLEM - puppet last run on mw1299 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:28] <icinga-wm>	 PROBLEM - puppet last run on logstash1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:28] <icinga-wm>	 PROBLEM - puppet last run on lvs2005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:34] <wikibugs>	 (CR) Filippo Giunchedi: [V: 2 C: 2] profile: fix bogus show_diff for ssh::userkey [puppet] - https://gerrit.wikimedia.org/r/368361 (owner: Filippo Giunchedi)
[08:40:38] <icinga-wm>	 PROBLEM - puppet last run on rhenium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:38] <icinga-wm>	 PROBLEM - puppet last run on rdb1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:38] <icinga-wm>	 PROBLEM - puppet last run on mw1282 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:38] <icinga-wm>	 PROBLEM - puppet last run on es2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:38] <icinga-wm>	 PROBLEM - puppet last run on chlorine is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:38] <icinga-wm>	 PROBLEM - puppet last run on wtp1038 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:38] <icinga-wm>	 PROBLEM - puppet last run on elastic1051 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:39] <icinga-wm>	 PROBLEM - puppet last run on mw2257 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:39] <icinga-wm>	 PROBLEM - puppet last run on cp1099 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:40] <icinga-wm>	 PROBLEM - puppet last run on cp1050 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:40] <icinga-wm>	 PROBLEM - puppet last run on mw1244 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:48] <icinga-wm>	 PROBLEM - puppet last run on mw1206 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:48] <icinga-wm>	 PROBLEM - puppet last run on analytics1062 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:48] <icinga-wm>	 PROBLEM - puppet last run on ms-fe3002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:48] <icinga-wm>	 PROBLEM - puppet last run on mw1296 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:48] <icinga-wm>	 PROBLEM - puppet last run on db2089 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:48] <icinga-wm>	 PROBLEM - puppet last run on oresrdb1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:40:48] <icinga-wm>	 PROBLEM - puppet last run on mw2210 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:41:29] <elukey>	 !log stop ircecho on einsteinium as puppet-error-shower countermeasure
[08:41:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:03:28] <wikibugs>	 Operations, Wikidata: Wikidata database locked - https://phabricator.wikimedia.org/T171928#3480745 (jcrespo) a:jcrespo Investigation is not over, here is what we have found out for now of the causes:  https://wikitech.wikimedia.org/wiki/Incident_documentation/20170728-s5_(WikiData_and_dewiki)_read-only
[09:07:29] <moritzm>	 !log installing apache security updates on puppet masters
[09:07:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:11:24] <wikibugs>	 (PS7) Giuseppe Lavagetto: Add filters to the future parser [software/puppet-compiler] - https://gerrit.wikimedia.org/r/367678
[09:12:05] <wikibugs>	 Operations, Wikidata: Wikidata and dewiki databases locked - https://phabricator.wikimedia.org/T171928#3480757 (jcrespo)
[09:14:30] <wikibugs>	 Operations, Wikidata, Wikimedia-Incident: Wikidata and dewiki databases locked - https://phabricator.wikimedia.org/T171928#3480761 (Peachey88)
[09:25:19] <wikibugs>	 (PS8) Giuseppe Lavagetto: Add filters to the future parser [software/puppet-compiler] - https://gerrit.wikimedia.org/r/367678
[09:26:10] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add filters to the future parser [software/puppet-compiler] - https://gerrit.wikimedia.org/r/367678 (owner: Giuseppe Lavagetto)
[09:28:45] <wikibugs>	 Operations, monitoring, User-fgiunchedi: Update diamond to latest upstream version - https://phabricator.wikimedia.org/T97635#3480780 (fgiunchedi) Open>stalled The `--log-stdout` issue has been filed as https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=869970  As for the slow shutdown I've re...
[09:32:01] <wikibugs>	 (PS9) Giuseppe Lavagetto: Add filters to the future parser [software/puppet-compiler] - https://gerrit.wikimedia.org/r/367678
[09:36:50] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on ms-be1017 - https://phabricator.wikimedia.org/T171926#3480786 (fgiunchedi) p:Normal>High @Cmjohnson I suspect this is again the battery dying and needs replacement, same as {T171183}
[09:37:05] <wikibugs>	 Operations, ops-eqiad, User-fgiunchedi: Degraded RAID on ms-be1016 - https://phabricator.wikimedia.org/T171183#3456811 (fgiunchedi) p:Normal>High
[09:39:47] <wikibugs>	 Operations, ops-eqiad, Analytics, Analytics-Cluster, Patch-For-Review: rack/setup/install new kafka nodes kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T167992#3480794 (elukey) Since the kafka1012->kafka1022 are going to be decommed and kafka-jumbo is a complete new cluster from our...
[09:41:41] <elukey>	 !log re-enable irc-echo on einsteinium
[09:41:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:42:52] <wikibugs>	 (PS10) Giuseppe Lavagetto: Add filters to the future parser [software/puppet-compiler] - https://gerrit.wikimedia.org/r/367678
[09:42:58] <icinga-wm>	 RECOVERY - puppet last run on einsteinium is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[09:43:48] <icinga-wm>	 RECOVERY - puppet last run on mw2150 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[09:52:32] <wikibugs>	 Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Hindi-Sites, and 2 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3480813 (gh87) @Jayprakash12345 You can ask at [[https://commons.wikimedia.org/wiki/Commons:Graphic_Lab/Illustration_workshop|Commons:Grap...
[09:53:25] <wikibugs>	 Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Hindi-Sites, and 2 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3480814 (Urbanecm)
[10:15:26] <wikibugs>	 (PS1) ArielGlenn: do rsyncs of pageviews and other items from stat1005 now instead of stat1002 [puppet] - https://gerrit.wikimedia.org/r/368383
[10:15:38] <elukey>	 thanksss apergos !
[10:15:42] <elukey>	 I was about to do it :)
[10:15:53] <elukey>	 I just saw the task
[10:16:31] <elukey>	 can we put the value in hiera though?
[10:17:34] <apergos>	 we can, but that should be part of setting up the new labstore hosts, which will be taking over the dataset roles.
[10:17:39] <apergos>	 or rather, some of the dataset roles.
[10:18:01] <elukey>	 okok :)
[10:19:24] <wikibugs>	 (CR) ArielGlenn: [C: 2] do rsyncs of pageviews and other items from stat1005 now instead of stat1002 [puppet] - https://gerrit.wikimedia.org/r/368383 (owner: ArielGlenn)
[10:20:02] <wikibugs>	 (PS5) Ema: pybal::monitoring: add check_pybal_ipvs_diff [puppet] - https://gerrit.wikimedia.org/r/367662 (https://phabricator.wikimedia.org/T134893)
[10:20:26] <wikibugs>	 (CR) Ema: [V: 2 C: 2] pybal::monitoring: add check_pybal_ipvs_diff [puppet] - https://gerrit.wikimedia.org/r/367662 (https://phabricator.wikimedia.org/T134893) (owner: Ema)
[10:29:54] <wikibugs>	 (PS11) Giuseppe Lavagetto: Add filters to the future parser [software/puppet-compiler] - https://gerrit.wikimedia.org/r/367678
[10:31:11] <jynus>	 !log upgrading and restarting labsdb1009 and labsdb1011
[10:31:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:31:23] <wikibugs>	 Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Hindi-Sites, and 2 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3480849 (Urbanecm) @Dereckson Can you reserve a window for this?
[10:40:34] <wikibugs>	 (PS1) Jcrespo: labsdb-replicas: Update new labsdb hosts to stretch/systemd [puppet] - https://gerrit.wikimedia.org/r/368391 (https://phabricator.wikimedia.org/T153743)
[10:42:37] <wikibugs>	 (PS2) Jcrespo: labsdb-replicas: Update new labsdb hosts to stretch/systemd [puppet] - https://gerrit.wikimedia.org/r/368391 (https://phabricator.wikimedia.org/T153743)
[10:47:53] <wikibugs>	 (CR) Jcrespo: [C: 2] labsdb-replicas: Update new labsdb hosts to stretch/systemd [puppet] - https://gerrit.wikimedia.org/r/368391 (https://phabricator.wikimedia.org/T153743) (owner: Jcrespo)
[10:58:15] <wikibugs>	 (PS1) Muehlenhoff: Add scons to package list [puppet] - https://gerrit.wikimedia.org/r/368399
[11:01:05] <wikibugs>	 (CR) Urbanecm: [C: 1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/367892 (https://phabricator.wikimedia.org/T171501) (owner: MarcoAurelio)
[11:08:28] <icinga-wm>	 PROBLEM - graphite.wikimedia.org on graphite1001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 398 bytes in 0.001 second response time
[11:09:29] <elukey>	 The proxy server received an invalid response from an upstream server.
[11:09:46] <godog>	 indeed, doesn't seem very happy, taking a look
[11:10:28] <icinga-wm>	 RECOVERY - graphite.wikimedia.org on graphite1001 is OK: HTTP OK: HTTP/1.1 200 OK - 1547 bytes in 1.737 second response time
[11:11:28] <godog>	 mhh recovered by itself
[11:11:32] <elukey>	 yeah
[11:13:24] <elukey>	 from the apache logs it seems that uwsgi was off for a bit
[11:16:36] <godog>	 yeah using a lot of cpu too
[11:17:14] <wikibugs>	 (CR) Muehlenhoff: [C: 2] Add scons to package list [puppet] - https://gerrit.wikimedia.org/r/368399 (owner: Muehlenhoff)
[11:17:19] <wikibugs>	 (PS2) Muehlenhoff: Add scons to package list [puppet] - https://gerrit.wikimedia.org/r/368399
[11:19:08] <elukey>	 from ~10:58
[11:20:21] <godog>	 yeah I'm looking at the graphite-web logs to see if a query stands out
[11:21:18] <elukey>	     294 2017-07-28T10:56
[11:21:18] <elukey>	     282 2017-07-28T10:57
[11:21:18] <elukey>	       4 2017-07-28T10:58
[11:21:18] <elukey>	       2 2017-07-28T10:57
[11:21:19] <elukey>	       3 2017-07-28T10:58
[11:21:26] <elukey>	 reqs from the apache logs
[11:43:03] <wikibugs>	 (CR) Daniel Kinzler: "@hoo that sounds like a good suggestion. Can you make a ticket or patch for doing this?" [puppet] - https://gerrit.wikimedia.org/r/366887 (https://phabricator.wikimedia.org/T171263) (owner: Ladsgroup)
[11:46:41] <wikibugs>	 Operations, Multimedia, TimedMediaHandler, HHVM, Patch-For-Review: Migrate video scalers to jessie - https://phabricator.wikimedia.org/T145742#3480961 (MoritzMuehlenhoff) Status update: The new jessie scaler has been exposed to production traffic and a few files have been identified which cra...
[11:47:32] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on mw1260 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. Muehlenhoff T145742
[11:47:32] <icinga-wm>	 ACKNOWLEDGEMENT - puppet last run on mw1260 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Service[jobchron],Service[jobrunner] Muehlenhoff T145742
[11:58:36] <wikibugs>	 (CR) Ladsgroup: "I can amend this patch if you want to" [puppet] - https://gerrit.wikimedia.org/r/366887 (https://phabricator.wikimedia.org/T171263) (owner: Ladsgroup)
[12:03:00] <wikibugs>	 (PS1) Jcrespo: labsdb: Rename sanitarium2 to sanitarium multisource [puppet] - https://gerrit.wikimedia.org/r/368408 (https://phabricator.wikimedia.org/T153743)
[12:05:21] <wikibugs>	 Operations, Multimedia, TimedMediaHandler, HHVM, Patch-For-Review: Migrate video scalers to jessie - https://phabricator.wikimedia.org/T145742#3481017 (MoritzMuehlenhoff) The test case (https://commons.wikimedia.org/wiki/File:National_Archaeological_Museum_Kabile_-_near_Yambol.webm) also cras...
[12:06:55] <wikibugs>	 (PS2) Jcrespo: labsdb: Rename sanitarium2 to sanitarium multisource [puppet] - https://gerrit.wikimedia.org/r/368408 (https://phabricator.wikimedia.org/T153743)
[12:09:14] <wikibugs>	 Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Hindi-Sites, and 2 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3481022 (Jayprakash12345) @Urbanecm Sir, Our native member made new logo  Logo:-https://commons.wikimedia.org/wiki/File:Wikividhyalay_logo...
[12:18:59] <wikibugs>	 (PS3) Jcrespo: labsdb: Rename sanitarium2 to sanitarium multisource [puppet] - https://gerrit.wikimedia.org/r/368408 (https://phabricator.wikimedia.org/T153743)
[12:20:24] <wikibugs>	 (CR) Elukey: Kafka broker profile and roles for new 'aggregate' (TBD) cluster and 'simple' cluster (2 comments) [puppet] - https://gerrit.wikimedia.org/r/356232 (https://phabricator.wikimedia.org/T166162) (owner: Ottomata)
[12:20:49] <wikibugs>	 (PS4) Jcrespo: labsdb: Rename sanitarium2 to sanitarium multisource [puppet] - https://gerrit.wikimedia.org/r/368408 (https://phabricator.wikimedia.org/T153743)
[12:22:24] <wikibugs>	 (CR) Daniel Kinzler: "@Ladsgroup sure, agree on a good config with Hoo and roll it out :)" [puppet] - https://gerrit.wikimedia.org/r/366887 (https://phabricator.wikimedia.org/T171263) (owner: Ladsgroup)
[12:25:44] <wikibugs>	 (PS5) Jcrespo: labsdb: Rename sanitarium2 to sanitarium multisource [puppet] - https://gerrit.wikimedia.org/r/368408 (https://phabricator.wikimedia.org/T153743)
[12:27:54] <wikibugs>	 (PS3) Ladsgroup: mediawiki: increase the batch size of dispatchChanges cronjob [puppet] - https://gerrit.wikimedia.org/r/366887 (https://phabricator.wikimedia.org/T171263)
[12:28:43] <wikibugs>	 (CR) Ladsgroup: "Done, @hoo: What do you think?" [puppet] - https://gerrit.wikimedia.org/r/366887 (https://phabricator.wikimedia.org/T171263) (owner: Ladsgroup)
[12:32:02] <ShakespeareFan00>	 Hi
[12:32:13] <ShakespeareFan00>	 I'm getting very poor performance out of English Wikisource
[12:32:27] <ShakespeareFan00>	 Sometimes it's taking over 2 mins to load pages 
[12:32:28] <wikibugs>	 (PS6) Jcrespo: labsdb: Rename sanitarium2 to sanitarium multisource [puppet] - https://gerrit.wikimedia.org/r/368408 (https://phabricator.wikimedia.org/T153743)
[12:32:52] <wikibugs>	 (PS1) Muehlenhoff: Restore access for ladsgroup [puppet] - https://gerrit.wikimedia.org/r/368411 (https://phabricator.wikimedia.org/T170801)
[12:35:18] <wikibugs>	 (CR) Muehlenhoff: [C: 2] Restore access for ladsgroup [puppet] - https://gerrit.wikimedia.org/r/368411 (https://phabricator.wikimedia.org/T170801) (owner: Muehlenhoff)
[12:37:24] <wikibugs>	 Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Hindi-Sites, and 2 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3481082 (Urbanecm) Okay, ack'ed. Will replace the logos.
[12:39:22] <wikibugs>	 (PS7) Urbanecm: Initial configuration for hiwikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/368165 (https://phabricator.wikimedia.org/T168765)
[12:40:12] <wikibugs>	 (PS8) Urbanecm: Initial configuration for hiwikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/368165 (https://phabricator.wikimedia.org/T168765)
[12:40:55] <wikibugs>	 Operations, Commons, Traffic, media-storage: 503 error for certain JPG thumbnail: "Backend fetch failed" - https://phabricator.wikimedia.org/T171421#3481084 (Jeff_G) >>! In T171421#3469152, @fgiunchedi wrote: > @Aklapper _usually_ traffic since this indicates varnish failure to fetch and most lik...
[12:41:53] <wikibugs>	 (PS9) Urbanecm: Initial configuration for hiwikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/368165 (https://phabricator.wikimedia.org/T168765)
[12:43:30] <jynus>	 ShakespeareFan00: are you logged in, which continent are you on?
[12:43:41] <ShakespeareFan00>	 Logged in - Europe
[12:44:31] <jynus>	 cannot reproduce, maybe ips issues?
[12:44:54] <ShakespeareFan00>	 Narrowing it down UK ( Vodafone)
[12:44:54] <jynus>	 *isp
[12:45:28] <jynus>	 can you ping wikipedia.org and see if you have packet loss?
[12:47:22] <paladox>	 ShakespeareFan00 works for me on bt.
[12:47:36] <wikibugs>	 (PS3) Urbanecm: Initial configuration for wikimania2018wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/368168 (https://phabricator.wikimedia.org/T155038)
[12:47:59] <jynus>	 I am checking network vendor maintenance notices and network alerts and see nothing, but I will keep looking
[12:48:34] <wikibugs>	 (PS4) Urbanecm: Initial configuration for wikimania2018wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/368168 (https://phabricator.wikimedia.org/T155038)
[12:49:30] <wikibugs>	 Operations, Multimedia, TimedMediaHandler, HHVM, Patch-For-Review: Migrate video scalers to jessie - https://phabricator.wikimedia.org/T145742#3481105 (MoritzMuehlenhoff) Another observation: Using ffmpeg to convert to ogv, the conversion works just fine (tested on stretch, will also repeat o...
[12:50:01] <jynus>	 no performance issues on the metrics https://grafana.wikimedia.org/dashboard/db/navigation-timing-by-continent
[12:50:36] <jynus>	 but please provide more info, if you have it, on page load performance and network issues
[12:53:33] <wikibugs>	 (CR) Hoo man: [C: 1] "This should solve (or at least ease) the enwiki dispatch backlog issue for now." [puppet] - https://gerrit.wikimedia.org/r/366887 (https://phabricator.wikimedia.org/T171263) (owner: Ladsgroup)
[12:55:18] <wikibugs>	 Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Hindi-Sites, and 2 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3481107 (Jayprakash12345) >>! In T168765#3481082, @Urbanecm wrote: > Okay, ack'ed. Will replace the logos.   Yes sir, please change the lo...
[12:55:41] <wikibugs>	 (CR) Jcrespo: "I am ok with this, but please let's deploy on Monday- technically, no deployments should happen on Fridays." [puppet] - https://gerrit.wikimedia.org/r/366887 (https://phabricator.wikimedia.org/T171263) (owner: Ladsgroup)
[12:55:54] <wikibugs>	 Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Hindi-Sites, and 2 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3481109 (Urbanecm) Just was done before a moment
[12:56:20] <wikibugs>	 Operations, Commons, Traffic, media-storage: 503 error for certain JPG thumbnail: "Backend fetch failed" - https://phabricator.wikimedia.org/T171421#3481110 (ema) >>! In T171421#3481084, @Jeff_G wrote: >>>! In T171421#3469152, @fgiunchedi wrote: >> @Aklapper _usually_ traffic since this indicates...
[12:59:22] <wikibugs>	 (PS1) Urbanecm: Revert "Revert "Set initial configuration for techconduct.wikimedia.org"" [mediawiki-config] - https://gerrit.wikimedia.org/r/368415
[12:59:31] <wikibugs>	 (CR) jerkins-bot: [V: -1] Revert "Revert "Set initial configuration for techconduct.wikimedia.org"" [mediawiki-config] - https://gerrit.wikimedia.org/r/368415 (owner: Urbanecm)
[13:03:02] <wikibugs>	 (PS2) Urbanecm: Revert "Revert "Set initial configuration for techconduct.wikimedia.org"" [mediawiki-config] - https://gerrit.wikimedia.org/r/368415 (https://phabricator.wikimedia.org/T165977)
[13:04:06] <wikibugs>	 (PS3) Urbanecm: Revert "Revert "Set initial configuration for techconduct.wikimedia.org"" [mediawiki-config] - https://gerrit.wikimedia.org/r/368415 (https://phabricator.wikimedia.org/T165977)
[13:04:08] <jynus>	 !log upgrading and restarting db1095
[13:04:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:04:27] <wikibugs>	 (CR) Jcrespo: [C: 2] labsdb: Rename sanitarium2 to sanitarium multisource [puppet] - https://gerrit.wikimedia.org/r/368408 (https://phabricator.wikimedia.org/T153743) (owner: Jcrespo)
[13:04:35] <wikibugs>	 (PS7) Jcrespo: labsdb: Rename sanitarium2 to sanitarium multisource [puppet] - https://gerrit.wikimedia.org/r/368408 (https://phabricator.wikimedia.org/T153743)
[13:06:36] <wikibugs>	 (PS1) Ema: pybal::monitoring: add OK message to check_pybal_ipvs_diff [puppet] - https://gerrit.wikimedia.org/r/368416 (https://phabricator.wikimedia.org/T134893)
[13:07:52] <wikibugs>	 (CR) Gehel: [C: 1] "LGTM" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/367930 (https://phabricator.wikimedia.org/T170494) (owner: Bearloga)
[13:25:37] <wikibugs>	 (CR) Daniel Kinzler: "Let's hope the backlog doesn't grow huge over the weekend, then..." [puppet] - https://gerrit.wikimedia.org/r/366887 (https://phabricator.wikimedia.org/T171263) (owner: Ladsgroup)
[13:28:36] <wikibugs>	 (PS3) Reception123: Added wordmark for Wikipedia Atikamekw [mediawiki-config] - https://gerrit.wikimedia.org/r/368198
[13:28:44] <wikibugs>	 (CR) jerkins-bot: [V: -1] Added wordmark for Wikipedia Atikamekw [mediawiki-config] - https://gerrit.wikimedia.org/r/368198 (owner: Reception123)
[13:29:54] <wikibugs>	 (CR) Reception123: "Conflicts. Rebase failed (still merge conflict). Not sure what I can do" [mediawiki-config] - https://gerrit.wikimedia.org/r/368198 (owner: Reception123)
[13:31:40] <ShakespeareFan00>	 I'll assume it's localised then, thanks
[13:52:35] <wikibugs>	 (PS1) Urbanecm: Optimalize all PNGs in this repo [mediawiki-config] - https://gerrit.wikimedia.org/r/368423 (https://phabricator.wikimedia.org/T170569)
[13:54:24] <wikibugs>	 (PS1) Andrew Bogott: labs puppetmaster: allow all puppetmasters access to the enc api [puppet] - https://gerrit.wikimedia.org/r/368424
[14:00:58] <wikibugs>	 (CR) Andrew Bogott: [C: 2] labs puppetmaster: allow all puppetmasters access to the enc api [puppet] - https://gerrit.wikimedia.org/r/368424 (owner: Andrew Bogott)
[14:01:51] <wikibugs>	 (PS12) Giuseppe Lavagetto: Add filters to the future parser [software/puppet-compiler] - https://gerrit.wikimedia.org/r/367678
[14:02:55] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add filters to the future parser [software/puppet-compiler] - https://gerrit.wikimedia.org/r/367678 (owner: Giuseppe Lavagetto)
[14:03:41] <wikibugs>	 (PS1) Reedy: Make babel use Database and SUL wikis use metawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/368429 (https://phabricator.wikimedia.org/T145366)
[14:05:15] * halfak looks for akosiaris
[14:05:44] <wikibugs>	 Operations, Multimedia, TimedMediaHandler, HHVM, Patch-For-Review: Migrate video scalers to jessie - https://phabricator.wikimedia.org/T145742#3481301 (MoritzMuehlenhoff) Also tested to work fine with jessie-wikimedia: ffmpeg -i National_Archaeological_Museum_Kabile_-_near_Yambol.webm -codec:...
[14:06:15] <_joe_>	 halfak: he's on PTO
[14:06:42] <_joe_>	 whatever you needed akosiaris for, you'd have to ask someone else in ops :)
[14:07:03] <_joe_>	 and since it's 4 PM on friday, I hope you come bearing gifts :)
[14:08:09] <wikibugs>	 (PS13) Giuseppe Lavagetto: Add filters to the future parser [software/puppet-compiler] - https://gerrit.wikimedia.org/r/367678
[14:08:50] <halfak>	 Hey _joe_!  I've been trying to run a stress test with akosiaris for a while on the new ORES cluster.  I was hoping to have some opsen work with me in sync to make sure we learned what we needed to. 
[14:09:03] <wikibugs>	 (PS1) Andrew Bogott: labs puppetmaster: allow access to the enc api via ipv6 [puppet] - https://gerrit.wikimedia.org/r/368430
[14:09:44] <halfak>	 No gifts I'm afraid.  The best I can muster is scheduling a time that's not patently absurd because I live in North America :) 
[14:09:47] <_joe_>	 uhm, I don't know much about what alex was doing there
[14:10:06] <_joe_>	 but I guess I can catch-up if needed
[14:10:26] <wikibugs>	 (CR) DCausse: [C: 1] "looks like it failed to merge?" [puppet] - https://gerrit.wikimedia.org/r/367709 (https://phabricator.wikimedia.org/T169498) (owner: EBernhardson)
[14:10:32] <wikibugs>	 (CR) Andrew Bogott: [C: 2] labs puppetmaster: allow access to the enc api via ipv6 [puppet] - https://gerrit.wikimedia.org/r/368430 (owner: Andrew Bogott)
[14:11:11] <paravoid>	 !log upgrading rhenium to stretch via dist-upgrade
[14:11:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:11:47] <halfak>	 Gist is, we have 9x2 new servers across eqiad and codfw.  I think they should be fully puppetized.  The plan is to hit them with a little stress testing utility that I made until they keel over.  I've updated our dashboard so we can check that. 
[14:11:55] <halfak>	 ores100*.eqiad.wmnet
[14:12:21] <halfak>	 _joe_, ^ 
[14:12:46] <wikibugs>	 (Abandoned) Cmjohnson: Adding dns entries for kafka-jumbo100[1-6] T167992 [dns] - https://gerrit.wikimedia.org/r/368186 (owner: Cmjohnson)
[14:13:02] <Reedy>	 halfak: Stop forkbombing your own servers
[14:13:09] <_joe_>	 rotfl
[14:13:41] <icinga-wm>	 PROBLEM - DPKG on rhenium is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[14:14:12] <halfak>	 :)  Reedy 
[14:14:35] * halfak gets a task
[14:14:36] <_joe_>	 halfak: so, those servers are still not part of any pool, so you'd need to point to them directly
[14:14:43] <halfak>	 https://phabricator.wikimedia.org/T169246
[14:14:52] <_joe_>	 so you can test a single server for now
[14:15:00] <halfak>	 Right.  The stress tester will auto-roundrobin a set of hosts. 
[14:15:04] <_joe_>	 ok
[14:15:49] <halfak>	 So, the stress tester requires minimal resources.  Should I run it from bast1001? 
[14:16:19] <_joe_>	 I would run it from one of the ores hosts themselves, tbh
[14:16:35] <halfak>	 When that host starts to die, it might affect the stress tester. 
[14:17:02] <_joe_>	 uhm, if the death of ores there affects the stress tester, that's a misconfiguration
[14:17:10] <halfak>	 fair point. 
[14:17:19] <halfak>	 OK!  
[14:17:27] <_joe_>	 system should have resources set up so that any small utility can run while ores is under full load :)
[14:17:32] <halfak>	 I'll choose ores1001 then and get started. 
[14:17:36] <_joe_>	 ok
[14:17:55] <halfak>	 I'll need to run a quick minor test to make sure grafana is picking it up right. 
[14:17:56] <_joe_>	 let me know if you want me to take a look at things once you managed to send the service belly up :P
[14:18:21] <_joe_>	 let's start with a couple hosts and then ramp up?
[14:18:21] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on rhenium is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[14:18:34] <halfak>	 _joe_, I'm sure there's means of monitoring what's going on that you know about and I don't.  If you can look at them after the fact, I'll just let you know when I'm done. 
[14:18:53] <wikibugs>	 Operations, Ops-Access-Requests: Requesting access to mwlog1001.eqiad.wmnet for goransm - https://phabricator.wikimedia.org/T171958#3481349 (GoranSMilovanovic)
[14:19:01] <_joe_>	 so that we don't risk flooding the mw api if we underestimated your benchmarking tool :P
[14:19:35] <halfak>	 _joe_, sure.  Not sure how this would affect mw api, but OK!
[14:20:44] <wikibugs>	 Operations, Cloud-VPS, cloud-services-team (Kanban): instance root passwords vs. multiple puppetmasters - https://phabricator.wikimedia.org/T171959#3481362 (Andrew)
[14:20:52] <halfak>	 OK.  So problem pulling my utility to this machine because it can't talk to the outside.  Sec. 
[14:21:10] <_joe_>	 halfak: isn't ores calling the mw api?
[14:21:27] <_joe_>	 how is it fetching revisions/edits if not that way? :)
[14:21:40] <halfak>	 Oh!  derp.  Good point :) 
[14:21:51] <halfak>	 I figured the api is way higher capacity :D 
[14:21:52] <icinga-wm>	 PROBLEM - salt-minion processes on rhenium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[14:22:08] <_joe_>	 uhm
[14:22:37] * halfak figures out how to move the necessary files around. 
[14:22:42] <_joe_>	 I'm not that sure everything is configured like I'd like it to be in these systems
[14:23:09] <_joe_>	 all of them use a local redis instance, AFAICT
[14:23:34] <halfak>	 Oh...  They should share a redis instance
[14:23:36] * halfak digs
[14:23:39] <_joe_>	 wait
[14:23:45] <_joe_>	 I'm looking into it
[14:24:06] <_joe_>	 they're definitely using a local redis instance
[14:24:27] <halfak>	 yup
[14:24:29] <halfak>	 abort :( 
[14:24:34] <halfak>	 This isn't prod-like
[14:24:35] <_joe_>	 no wait
[14:24:38] <halfak>	 So it won't work
[14:24:46] <_joe_>	 they're all using a redis instance on ores1001
[14:24:51] <_joe_>	 at least the ones in eqiad
[14:25:14] <_joe_>	 yup
[14:25:17] <_joe_>	 sorry, brb
[14:25:18] <halfak>	 Ohhhh...  That's not terrible
[14:25:20] <halfak>	 kk
[14:25:35] <halfak>	 This is prod-ish
[14:25:39] <halfak>	 unabort
[14:25:50] <halfak>	 but I'm not sure we're going to get a good and accurate understanding
[14:27:40] <_joe_>	 the issue being
[14:27:51] <icinga-wm>	 PROBLEM - puppet last run on rhenium is CRITICAL: Return code of 255 is out of bounds
[14:27:52] <_joe_>	 I didn't inspect the config of that redis instance
[14:27:57] <_joe_>	 I'm going to look now
[14:29:02] <icinga-wm>	 PROBLEM - MD RAID on rhenium is CRITICAL: Return code of 255 is out of bounds
[14:29:06] <_joe_>	 halfak: it should be ok, I'd suggest to leave ores1001 out of your tests for now
[14:29:11] <icinga-wm>	 PROBLEM - configured eth on rhenium is CRITICAL: Return code of 255 is out of bounds
[14:29:11] <icinga-wm>	 PROBLEM - Check systemd state on rhenium is CRITICAL: Return code of 255 is out of bounds
[14:29:12] <icinga-wm>	 PROBLEM - Check size of conntrack table on rhenium is CRITICAL: Return code of 255 is out of bounds
[14:29:15] <_joe_>	 the list of hosts I mean
[14:29:21] <icinga-wm>	 PROBLEM - Disk space on rhenium is CRITICAL: Return code of 255 is out of bounds
[14:29:22] <icinga-wm>	 PROBLEM - dhclient process on rhenium is CRITICAL: Return code of 255 is out of bounds
[14:29:24] <_joe_>	 can someone check on rhenium please?
[14:29:27] <halfak>	 _joe_, OK will do
[14:30:30] <ema>	 checking rhenium
[14:30:32] <herron>	 _joe_ checking rhenium
[14:30:41] <ema>	 herron: go ahead :)
[14:30:58] <halfak>	 Confirmed the stress tester can run
[14:31:15] <herron>	 ok :D
[14:31:24] <_joe_>	 eheh
[14:31:36] <_joe_>	 halfak: ok, let me know which machines you're targeting
[14:31:42] <halfak>	 confirmed that grafana reports activity in the cluster
[14:31:47] <halfak>	 just ores1002/3 right now
[14:31:55] <halfak>	 On super duper light mode
[14:32:40] <_joe_>	 cool
[14:34:00] <halfak>	 BTW, the celery queue will make sure that all of the machines get hit for CPU work. 
[14:34:03] <moritzm>	 rhenium is fine, Faidon upgraded it to stretch
[14:34:09] <halfak>	 It'll just distribute however it can
[14:34:12] <icinga-wm>	 RECOVERY - configured eth on rhenium is OK: OK - interfaces up
[14:34:19] <paravoid>	 s/upgrading/is upgrading/
[14:34:21] <icinga-wm>	 RECOVERY - Check size of conntrack table on rhenium is OK: OK: nf_conntrack is 0 % full
[14:34:21] <icinga-wm>	 RECOVERY - Disk space on rhenium is OK: DISK OK
[14:34:22] <icinga-wm>	 RECOVERY - dhclient process on rhenium is OK: PROCS OK: 0 processes with command name dhclient
[14:34:26] <halfak>	 But by hitting specific machines through http, we'll make sure all of the IO (mostly mw api) happens there
[14:34:27] <paravoid>	 I !logged it too, see above
[14:34:30] <halfak>	 _joe_, ^ 
[14:34:31] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on rhenium is OK: OK ferm input default policy is set
[14:34:52] <paravoid>	 ema: ^
[14:35:02] <paravoid>	 and _joe_ 
[14:35:07] <_joe_>	 paravoid: heh I lost that in the shower of messages here
[14:35:11] <icinga-wm>	 RECOVERY - MD RAID on rhenium is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0
[14:35:33] <ema>	 herron: rhenium is OK ^
[14:36:09] <herron>	 \o/
[14:36:19] * halfak updates some more grafana based on missing stats. 
[14:37:55] <jynus>	 load on s2 seems higher than usual
[14:39:24] <jynus>	 main traffic, not api
[14:40:39] <halfak>	 waiting on getting a dataset of random revision IDs for the test...
[14:40:41] <icinga-wm>	 RECOVERY - DPKG on rhenium is OK: All packages OK
[14:40:48] <halfak>	 Should just be 1-2 more minutes. 
[14:41:09] <_joe_>	 halfak: the sistems are unimpressed for now :P
[14:41:15] <_joe_>	 *systems
[14:41:52] <icinga-wm>	 RECOVERY - puppet last run on rhenium is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[14:42:12] <icinga-wm>	 RECOVERY - Check systemd state on rhenium is OK: OK - running: The system is fully operational
[14:42:51] <icinga-wm>	 RECOVERY - salt-minion processes on rhenium is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[14:44:28] <wikibugs>	 (03PS1) 10Andrew Bogott: Only store instance root passwords on the frontend puppetmaster. [puppet] - 10https://gerrit.wikimedia.org/r/368433 (https://phabricator.wikimedia.org/T171959)
[14:45:23] <halfak>	 OK here we go.  Just hitting ores1002/3
[14:46:55] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 032] Only store instance root passwords on the frontend puppetmaster. [puppet] - 10https://gerrit.wikimedia.org/r/368433 (https://phabricator.wikimedia.org/T171959) (owner: 10Andrew Bogott)
[14:48:13] <_joe_>	 halfak: your tool is submitting requests without a revid
[14:48:16] <_joe_>	 AFAICT
[14:48:22] <halfak>	 Gotcha. 
[14:48:22] <_joe_>	  /v3/scores/enwiki/?features=&revids=
[14:48:23] <halfak>	 Checking
[14:48:39] <halfak>	 Ahh yes.  I can see that in the logging now.
[14:49:12] <halfak>	 Oh strange
[14:49:21] <_joe_>	 andrewbogott: is storing those passwords a security issue?
[14:49:21] <icinga-wm>	 PROBLEM - Check systemd state on rhenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[14:49:22] <wikibugs>	 (03PS1) 10Jcrespo: sanitarium3: Convert db1102 into a proper multi-instance host [puppet] - 10https://gerrit.wikimedia.org/r/368434 (https://phabricator.wikimedia.org/T169514)
[14:49:36] <_joe_>	 I know nothing about that generator script
[14:49:57] <halfak>	 I know what happened.  I'll need to file a quarry bug
[14:49:58] <_joe_>	 but neither the old nor the new guard works
[14:50:01] <halfak>	 but there was also human error ;) 
[14:50:15] <_joe_>	 the new one is particularly easy to spoof
[14:50:17] <andrewbogott>	 _joe_:  tell me about how they don't work?
[14:50:30] <andrewbogott>	 (It's only a mild security issue since getting the console in the first place also requires prod access)
[14:50:38] <wikibugs>	 (03Abandoned) 10Jcrespo: mariadb: Switch db1102 role from sanitarium3->dbstore_multiinstance [puppet] - 10https://gerrit.wikimedia.org/r/363204 (https://phabricator.wikimedia.org/T169514) (owner: 10Jcrespo)
[14:50:52] <_joe_>	 andrewbogott: if I get this correctly, you want to prevent someone from running that script from a self-hosted puppetmaster?
[14:51:04] <wikibugs>	 10Operations, 10Analytics, 10netops, 10User-Elukey: Review ACLs for the Analytics VLAN - https://phabricator.wikimedia.org/T157435#3481411 (10elukey)
[14:51:07] <_joe_>	 I'm not sure what's the context, what you're trying to guard against
[14:51:21] <icinga-wm>	 RECOVERY - Check systemd state on rhenium is OK: OK - running: The system is fully operational
[14:51:38] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 032] Add filters to the future parser [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/367678 (owner: 10Giuseppe Lavagetto)
[14:51:51] <andrewbogott>	 _joe_: I want one and only one puppetmaster (the one running in prod) to have the store of all the passwords.  So once an instance switches to a self-hosted puppetmaster it should stop generating and storing them
[14:51:58] <andrewbogott>	 oh, hm...
[14:52:02] <halfak>	 OK attempting again
[14:52:23] <wikibugs>	 (03Merged) 10jenkins-bot: Add filters to the future parser [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/367678 (owner: 10Giuseppe Lavagetto)
[14:52:28] <andrewbogott>	 _joe_: you're right, my new patch is dumb since it means that if the puppetmaster name is changed in hiera the test still passes
[14:52:34] <andrewbogott>	 so I guess I have to hard-code it
[14:52:34] <_joe_>	 yes
[14:52:39] <_joe_>	 or if you spoof dns
[14:53:00] <_joe_>	 but then people can change the code
[14:53:01] <_joe_>	 :)
[14:54:37] <_joe_>	 so my suggestion was to do something simpler even, without using ipresolve
[14:55:00] <wikibugs>	 (03PS1) 10Andrew Bogott: Another change to generation of root passwords [puppet] - 10https://gerrit.wikimedia.org/r/368435 (https://phabricator.wikimedia.org/T171959)
[14:56:07] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Another change to generation of root passwords [puppet] - 10https://gerrit.wikimedia.org/r/368435 (https://phabricator.wikimedia.org/T171959) (owner: 10Andrew Bogott)
[14:56:14] <halfak>	 _joe_, I feel like this test has been running pretty well.  It's at 1/10th of what I think capacity should be and only hitting two nodes directly.  What do you think?
[14:56:51] <halfak>	 ~600 requests per minute.
[14:57:10] <halfak>	 I want to add the other nodes and try ~2000 requests per minute
[14:57:13] <_joe_>	 servers are unimpressed generally
[14:57:17] <halfak>	 :)  
[14:57:23] <_joe_>	 I'd say go on
[14:57:25] <halfak>	 OK time for a real test!
[14:57:36] * halfak starts taking notes and noting timestamps
[14:58:09] <halfak>	 wow.  3365*5 scores generated and no timeout errors :)
[14:59:39] <_joe_>	 go on and try harder :P
[15:00:00] <halfak>	 Here we go!
[15:00:18] <wikibugs>	 (03PS2) 10Jcrespo: sanitarium3: Convert db1102 into a proper multi-instance host [puppet] - 10https://gerrit.wikimedia.org/r/368434 (https://phabricator.wikimedia.org/T169514)
[15:02:33] <halfak>	 Woops.  Looks like we had a brief overload event
[15:02:39] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: puppet_compiler: upgrade to 0.2.2 [puppet] - 10https://gerrit.wikimedia.org/r/368437
[15:02:41] <wikibugs>	 (03PS2) 10Andrew Bogott: Disable generation of root passwords for now [puppet] - 10https://gerrit.wikimedia.org/r/368435 (https://phabricator.wikimedia.org/T171289)
[15:02:55] <_joe_>	 halfak: can you define "overload"?
[15:02:59] <halfak>	 Wait... wasn't an overload. 
[15:03:08] <halfak>	 We set backpressures on our celery queue. 
[15:03:12] <_joe_>	 yeah not from the servers' perspective
[15:03:16] <halfak>	 When it gets too big, we start to return 503s
[15:03:21] <_joe_>	 uhm
[15:03:27] <_joe_>	 so we need to have more celery workers
[15:03:33] <halfak>	 right. 
[15:03:47] <halfak>	 How's memory usage?
[15:03:51] <halfak>	 That's our ceiling for workers.
[15:04:04] <_joe_>	 well, if you have a task open for this, add date/times (in UTC)
[15:04:06] <_joe_>	 65881080 total, 14889904 used, 50991176 free
[15:04:11] <_joe_>	 tons of free memory
[15:04:30] <_joe_>	 that's why I said we will need to tune it
[15:04:45] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 032] Disable generation of root passwords for now [puppet] - 10https://gerrit.wikimedia.org/r/368435 (https://phabricator.wikimedia.org/T171289) (owner: 10Andrew Bogott)
[15:04:53] <halfak>	 Right.  We should definitely up that worker count.  How do you feel about making changes to these servers directly?
[15:05:39] <halfak>	 Confirmed that for a brief moment, 1003 returned a 503 because it thought the queue was too big. 
[15:05:55] <_joe_>	 halfak: at 5 pm on friday?
[15:06:07] <_joe_>	 uhm lemme find the appropriate gif for that :P
[15:06:08] <halfak>	 _joe_, good point.  probably don't want to do that. 
[15:06:13] <halfak>	 lol
[15:06:22] <jynus>	 I also asked to wait for a wikidata deployment
[15:06:26] <jynus>	 in similar terms
[15:06:39] <_joe_>	 jynus: this is out of production atm
[15:06:43] <jynus>	 ok
[15:06:43] <_joe_>	 so it's less critical
[15:06:50] <halfak>	 _joe_, I'm OK with calling it right here so you can enjoy your evening.  I feel like this was very useful already and I'll be more prepared to try again. 
[15:06:50] <halfak>	 Would you be willing to work with me around the same time on Monday?
[15:07:10] <halfak>	 I could show up an hour earlier too without much pain. 
[15:07:17] <_joe_>	 still, if something goes wrong with whatever me and halfak are doing we're not burning down production
[15:07:33] <halfak>	 right.  These servers aren't pooled
[15:07:38] <halfak>	 No external requests
[15:07:42] <_joe_>	 halfak: heh monday is meetings day, but if you show up at about 14:00Z I have a couple hours
[15:07:47] <wikibugs>	 10Operations, 10Cloud-VPS, 10cloud-services-team (Kanban): Switch to new labs puppetmasters - https://phabricator.wikimedia.org/T171786#3481437 (10Andrew)
[15:07:52] <halfak>	 How about 1300 UTC?
[15:07:58] <_joe_>	 that's even better
[15:07:59] <_joe_>	 :)
[15:08:14] <_joe_>	 if you want to do more tests, please do
[15:08:27] <wikibugs>	 10Operations, 10Cloud-VPS, 10cloud-services-team (Kanban): Switch to new labs puppetmasters - https://phabricator.wikimedia.org/T171786#3476152 (10Andrew)
[15:08:32] <halfak>	 {{done}}!
[15:08:34] <_joe_>	 I have one thing to finish, then I might be able to look at the data
[15:08:50] <halfak>	 _joe_, OK!  I'll hit it hard before I give up for the day. 
[15:08:52] <_joe_>	 what is the task for this load-testing?
[15:09:01] <icinga-wm>	 PROBLEM - carbon-cache too many creates on graphite1001 is CRITICAL: CRITICAL: 1.67% of data above the critical threshold [1000.0]
[15:09:58] <halfak>	 https://phabricator.wikimedia.org/T169246
[15:10:19] <_joe_>	 ok, I'll subscribe :)
[15:10:53] <wikibugs>	 10Operations, 10ORES, 10Scoring-platform-team, 10Patch-For-Review, 10User-Joe: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3481458 (10Joe)
[15:11:34] <halfak>	 Triple the rate of request!
[15:11:59] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: puppet_compiler: upgrade to 0.2.2 [puppet] - 10https://gerrit.wikimedia.org/r/368437
[15:12:37] <halfak>	 OK Definitely over capacity!
[15:12:45] <halfak>	 Cool!  
[15:13:16] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] puppet_compiler: upgrade to 0.2.2 [puppet] - 10https://gerrit.wikimedia.org/r/368437 (owner: 10Giuseppe Lavagetto)
[15:16:01] <icinga-wm>	 PROBLEM - salt-minion processes on rhenium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[15:17:12] <halfak>	 Trying out batches of 50 
[15:17:43] <halfak>	 Still getting an overload.  I'm surprised. 
[15:17:57] <halfak>	 I suppose the bottleneck continues to be celery. 
[15:18:05] <halfak>	 And batching only affects IO (uwsgi workers)
[15:18:25] * halfak talks to himself.  
[15:18:27] <halfak>	 But it helps
[15:22:48] <wikibugs>	 (03CR) 10Gehel: "yep, waiting Monday to merge and deploy..." [puppet] - 10https://gerrit.wikimedia.org/r/367709 (https://phabricator.wikimedia.org/T169498) (owner: 10EBernhardson)
[15:23:30] <jynus>	 !log upgrading and restarting db1102
[15:23:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:23:49] <_joe_>	 halfak: yes, we should raise the number of celery workers
[15:23:50] <wikibugs>	 (03PS3) 10Jcrespo: sanitarium3: Convert db1102 into a proper multi-instance host [puppet] - 10https://gerrit.wikimedia.org/r/368434 (https://phabricator.wikimedia.org/T169514)
[15:24:03] <_joe_>	 I'll check our metrics
[15:24:40] <halfak>	 +1 
[15:25:39] <wikibugs>	 10Operations, 10ORES, 10Scoring-platform-team, 10Patch-For-Review, 10User-Joe: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3481494 (10Halfak) I've completed a few tests.  TL;DR: we need to up our celery worker count before we'll get an accurate reflection of the cap...
[15:25:43] <halfak>	 Just about to leave a note with my tests on the phab card. 
[15:25:43] <wikibugs>	 10Operations, 10fundraising-tech-ops, 10netops: bonded/redundant network connections for fundraising hosts - https://phabricator.wikimedia.org/T171962#3481495 (10Jgreen)
[15:25:45] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] sanitarium3: Convert db1102 into a proper multi-instance host [puppet] - 10https://gerrit.wikimedia.org/r/368434 (https://phabricator.wikimedia.org/T169514) (owner: 10Jcrespo)
[15:27:10] <wikibugs>	 10Operations, 10fundraising-tech-ops, 10netops: bonded/redundant network connections for fundraising hosts - https://phabricator.wikimedia.org/T171962#3481515 (10Jgreen)
[15:27:13] <wikibugs>	 10Operations, 10ops-eqiad, 10netops: eqiad: rack frack refresh equipment - https://phabricator.wikimedia.org/T169644#3481514 (10Jgreen)
[15:31:08] <wikibugs>	 10Operations, 10fundraising-tech-ops, 10netops: bonded/redundant network connections for fundraising hosts - https://phabricator.wikimedia.org/T171962#3481522 (10ayounsi) We have plenty of ports on the new switches to accommodate that. My suggestion is that we do it after the migration to the new infra (and...
[15:37:06] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: Do not filter catalogs if they have not compiled. [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/368438
[15:37:11] <icinga-wm>	 RECOVERY - salt-minion processes on rhenium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[15:38:48] <ShakespeareFan00>	 Sorry, still having performance issue
[15:38:51] <ShakespeareFan00>	 In Europe
[15:38:52] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: Do not filter catalogs if they have not compiled. [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/368438
[15:39:01] <ShakespeareFan00>	 It's taking over 2 mins to load some pages
[15:39:10] <_joe_>	 ShakespeareFan00: care to make an example?
[15:39:10] <ShakespeareFan00>	 This is NOT acceptable
[15:39:25] <ShakespeareFan00>	 https://en.wikisource.org/wiki/Page:The_Cutter%27s_Practical_Guide_Part_13.djvu/65
[15:39:29] <ShakespeareFan00>	 Took over 2 mins to loaf
[15:39:30] <ShakespeareFan00>	 *load
[15:39:35] <_joe_>	 this is a djvu single page
[15:39:44] <ShakespeareFan00>	 or didn't even finish loading
[15:39:49] <ShakespeareFan00>	 _joe_ : that's correct
[15:39:56] <_joe_>	 that loaded in 108 ms for me
[15:40:04] <ShakespeareFan00>	 Not for me
[15:40:09] <_joe_>	 can you try now?
[15:40:42] <ShakespeareFan00>	 I can but it's been consistently poor since this morning
[15:40:55] <_joe_>	 so can I have another example?
[15:41:00] <_joe_>	 one that is currently slow
[15:41:10] <_joe_>	 or I cannot really help trying to investigate
[15:41:21] <icinga-wm>	 PROBLEM - puppet last run on rhenium is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[debdeploy-minion],Package[quickstack]
[15:41:25] <_joe_>	 Is this only happening with wikisource and djvu?
[15:41:25] <ShakespeareFan00>	 https://en.wikisource.org/wiki/Page:Dictionary_of_National_Biography_volume_26.djvu/16
[15:41:32] <ShakespeareFan00>	 #https://en.wikisource.org/wiki/Page:The_Cutter%27s_Practical_Guide_Part_13.djvu/65
[15:41:43] <ShakespeareFan00>	 _joe_ : I haven't looked at other wikis yet
[15:41:47] <_joe_>	 loaded both in ~ 110 ms
[15:42:00] <_joe_>	 ShakespeareFan00: that seems like a problem with multi-page djvu files
[15:42:01] <ShakespeareFan00>	 But yeah
[15:42:06] <_joe_>	 but I'm not sure
[15:42:15] <ShakespeareFan00>	 having problems with the main page of en.wikipedia.org just now
[15:42:26] <ShakespeareFan00>	 Images aren't loading - https://en.wikipedia.org/wiki/Main_Page
[15:42:33] <_joe_>	 ok then it's definitely your connection
[15:42:35] <paladox>	 https://en.wikipedia.org/wiki/Main_Page loads fine for me using bt.
[15:43:17] <ShakespeareFan00>	 Puzzling
[15:43:30] <ShakespeareFan00>	 Because I am not having issues with other websites
[15:43:33] <_joe_>	 ShakespeareFan00: can you tell me your IP in private?
[15:43:46] <ShakespeareFan00>	 IP or iSP?
[15:43:50] <wikibugs>	 10Operations, 10fundraising-tech-ops, 10netops: bonded/redundant network connections for fundraising hosts - https://phabricator.wikimedia.org/T171962#3481550 (10RobH) >>! In T171962#3481522, @ayounsi wrote: > We have plenty of ports on the new switches to accommodate that. My suggestion is that we do it aft...
[15:43:51] <_joe_>	 both :P
[15:43:56] <_joe_>	 ideally
[15:44:03] <ShakespeareFan00>	 ISP is Vodafone
[15:44:08] <ShakespeareFan00>	 (ex Demon/Thus)
[15:44:19] <_joe_>	 and a traceroute to en.wikipedia.org
[15:44:22] <ShakespeareFan00>	 As to the IP, I'm not sure, as I think it's a pool address which is dynamic
[15:44:59] <_joe_>	 ShakespeareFan00: literally write "whats my ip" in google :)
[15:45:12] <_joe_>	 and share it in private
[15:45:15] <ShakespeareFan00>	 Tracert isn't giving sensible results
[15:46:18] <paladox>	 ShakespeareFan00 restart the router :).
[15:46:34] <paladox>	 Have you checked your local exchange for any maint?
[15:46:45] <ShakespeareFan00>	 paladox: I haven't
[15:46:53] <ShakespeareFan00>	 So it could be that
[15:46:58] <XioNoX>	 hi
[15:47:28] <XioNoX>	 if you can also give the output of a speedtest.net 
[15:47:46] <paladox>	 Try https://www.homeandwork.openreach.co.uk/help-and-support/local-network-status-checker.aspx
[15:47:53] <ShakespeareFan00>	 No local maintenance that I can obviously find
[15:49:13] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 032] Do not filter catalogs if they have not compiled. [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/368438 (owner: 10Giuseppe Lavagetto)
[15:51:19] <paladox>	 ShakespeareFan00: https://support.vodafone.co.uk/Vodafone-products-and-services/Broadband/Vodafone-broadband-router/57355761/My-broadband-is-slow-How-can-I-make-it-faster.htm
[15:52:10] <ShakespeareFan00>	 paladox: usual helpdesk advice I already know and follow
[15:52:12] <ShakespeareFan00>	 ;)
[15:52:19] <ShakespeareFan00>	 (And completely useless.)
[15:52:20] <paladox>	 oh :)
[15:53:12] <paladox>	 ShakespeareFan00 check your phone line by picking the phone up to see if there's any noise. The lines are known to be slow if you have a phone line fault.
[15:53:28] <ShakespeareFan00>	 paladox: It's a dedicated line 
[15:53:32] <ShakespeareFan00>	 XD
[15:53:38] <ShakespeareFan00>	 It shouldn't be "noisy"
[15:53:41] <paladox>	 what do you mean dedicated?
[15:53:54] <paladox>	 if there's a line fault there will be noise.
[15:54:06] <XioNoX>	 ShakespeareFan00: also https://wikitech.wikimedia.org/wiki/Reporting_a_connectivity_issue
[15:58:12] <paladox>	 ShakespeareFan00 here are some BT tools. I'm not sure the BT line check will work for you, but here it is: https://www.bt.com/help/home/broadband/speedtest/ (I hope some of these will work for you, as Vodafone uses BT's lines)
[15:58:16] <paladox>	 https://www.bt.com/consumerFaultTracking/public/faults/tracking.do?pageId=31 
[15:58:52] <paladox>	 ShakespeareFan00 are any of the areas above your area?
[15:59:22] <ShakespeareFan00>	 Nope
[15:59:38] <ShakespeareFan00>	 Although problems in London Thamesmead might affect UK connectivity with the rest of the world
[15:59:44] <ShakespeareFan00>	 because of where LINX is
[16:00:18] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: puppet-compiler: bump it up again [puppet] - 10https://gerrit.wikimedia.org/r/368443
[16:01:00] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] puppet-compiler: bump it up again [puppet] - 10https://gerrit.wikimedia.org/r/368443 (owner: 10Giuseppe Lavagetto)
[16:01:04] <paladox>	 ShakespeareFan00 but then it would affect me too
[16:01:07] <paladox>	 which it doesn't
[16:01:23] <paladox>	 try restarting the router. It could be the big green box outside your house.
[16:02:35] <paladox>	 ShakespeareFan00 do you have FTTC, FTTP or ADSL broadband?
[16:02:46] <ShakespeareFan00>	 ADSL broadband
[16:02:50] <ShakespeareFan00>	 As far as I know
[16:05:02] <paladox>	 that would explain some things.
[16:06:35] <wikibugs>	 10Operations, 10Puppet, 10Traffic, 10Mobile, and 2 others: URLs with title query string parameter and additional query string parameters do not redirect to mobile site - https://phabricator.wikimedia.org/T154227#3481621 (10Jdlrobson) >>! In T154227#3480505, @Nemo_bis wrote: > Can you guarantee to support a...
[16:08:07] <paladox>	 ShakespeareFan00 could you run what XioNoX has asked please :)
[16:08:31] <ShakespeareFan00>	 I don't have curl 
[16:08:35] <ShakespeareFan00>	 (Not a linux user)
[16:08:43] <paladox>	 ShakespeareFan00 install git
[16:08:49] <paladox>	 php has curl
[16:08:56] <ShakespeareFan00>	 Well ....
[16:09:03] <ShakespeareFan00>	 I am reticent to install anything 
[16:09:04] <XioNoX>	 speedtest and traceroutes would be a good start
[16:09:12] <ShakespeareFan00>	 I did a tracert
[16:09:19] <ShakespeareFan00>	 It didn't show anything obvious
[16:09:21] <icinga-wm>	 PROBLEM - carbon-cache too many creates on graphite1001 is CRITICAL: CRITICAL: 1.67% of data above the critical threshold [1000.0]
[16:11:17] <paladox>	 ShakespeareFan00 could it be possible Vodafone is throttling Wikipedia?  BT and other big providers have signed a document saying they won't do that, but I am not aware that Vodafone did.
[16:12:18] <paladox>	 ShakespeareFan00 http://speedtest.net  could you run that please?
[16:13:18] <ShakespeareFan00>	 paladox: If they are throttling it wouldn't surprise me
[16:13:50] <paladox>	 that would then be the only provider that does
[16:14:14] <wikibugs>	 (03PS4) 10Jcrespo: sanitarium3: Convert db1102 into a proper multi-instance host [puppet] - 10https://gerrit.wikimedia.org/r/368434 (https://phabricator.wikimedia.org/T169514)
[16:14:45] <paladox>	 ShakespeareFan00 does Vodafone do support through Twitter? If so they may be able to get an engineer to look into that for you :).
[16:15:17] <ShakespeareFan00>	 last time I dealt with Vodafone support it was hard to get them to acknowledge there was an issue
[16:16:05] <paladox>	 I have a theory on why you have slow broadband, but I have not used ADSL in a long while (too slow). It's possible that the thing in the green box is detecting a fault on the line, so it increases latency and lowers speeds.
[16:16:54] <paladox>	 oh wait, you're not with the green box (my mistake), you're connected directly to the exchange because you use ADSL
[16:18:12] <wikibugs>	 (03CR) 10GWicke: [C: 031] JobQueueEventBus: Enable job events in group0 wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368258 (https://phabricator.wikimedia.org/T163380) (owner: 10Ppchelko)
[16:18:51] <andrewbogott>	 !log apt-get install apache2 on californium for security updates
[16:19:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:19:26] <ShakespeareFan00>	 paladox:http://www.speedtest.net/my-result/6493425327
[16:19:37] <andrewbogott>	 !log apt-get install apache2 on silver for security updates
[16:19:40] <paladox>	 PING 390 ms
[16:19:43] <wikibugs>	 10Operations, 10Pybal, 10Traffic, 10Patch-For-Review: Add support for setting weight=0 when depooling - https://phabricator.wikimedia.org/T86650#3481655 (10ema) p:05Low>03High
[16:19:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:19:50] <paladox>	 that's very high
[16:20:04] <paladox>	 mine was in the 00 when i had adsl
[16:20:09] <ShakespeareFan00>	 But the download speed is 9.mbps which is excellent for the UK
[16:20:10] <paladox>	 00 -> tens
[16:20:31] <paladox>	 that's excellent for adsl but i get 60+ on fibre
[16:21:22] <paladox>	 Your internet is provided by http://demon.net
[16:21:37] <andrewbogott>	 !log apt-get install apache2 on labcontrol1001 and labcontrol1002 for security updates
[16:21:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:23:06] <wikibugs>	 10Operations, 10ops-ulsfo, 10Traffic: setup/install cp402[34] - https://phabricator.wikimedia.org/T171966#3481656 (10RobH)
[16:23:34] <wikibugs>	 (03PS1) 10Reception123: Add new mobile watermark for Urdu Wikipedia. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368444 (https://phabricator.wikimedia.org/T171769)
[16:24:21] <icinga-wm>	 RECOVERY - carbon-cache too many creates on graphite1001 is OK: OK: Less than 1.00% above the threshold [500.0]
[16:25:57] <wikibugs>	 10Operations, 10ops-ulsfo, 10Traffic: setup/install cp4022 - https://phabricator.wikimedia.org/T171967#3481679 (10RobH)
[16:30:17] <wikibugs>	 (03PS1) 10Cmjohnson: Adding dns entries (mgmt and production) for labstore1006/7 public vlan T167984 [dns] - 10https://gerrit.wikimedia.org/r/368445
[16:30:27] <wikibugs>	 (03CR) 10Reception123: [C: 031] Make wikiquote.png equivalent to enwikiquote.png [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368244 (https://phabricator.wikimedia.org/T171887) (owner: 10Urbanecm)
[16:32:45] <wikibugs>	 (03PS1) 10RobH: install params for cp402[34].ulsfo.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/368447 (https://phabricator.wikimedia.org/T171966)
[16:33:16] <wikibugs>	 10Operations, 10ops-ulsfo, 10Traffic, 10Patch-For-Review: setup/install cp402[34] - https://phabricator.wikimedia.org/T171966#3481733 (10RobH)
[16:34:40] <wikibugs>	 (03CR) 10RobH: [C: 032] install params for cp402[34].ulsfo.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/368447 (https://phabricator.wikimedia.org/T171966) (owner: 10RobH)
[16:38:01] <icinga-wm>	 RECOVERY - puppet last run on rhenium is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[16:41:05] <wikibugs>	 (03CR) 10محمد شعیب: [C: 031] Add new mobile watermark for Urdu Wikipedia. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368444 (https://phabricator.wikimedia.org/T171769) (owner: 10Reception123)
[16:41:38] <wikibugs>	 (03PS5) 10Jcrespo: sanitarium3: Convert db1102 into a proper multi-instance host [puppet] - 10https://gerrit.wikimedia.org/r/368434 (https://phabricator.wikimedia.org/T169514)
[16:46:54] <wikibugs>	 (03PS1) 10Andrew Bogott: labs puppetmaster: validate cert name before autosigning [puppet] - 10https://gerrit.wikimedia.org/r/368449 (https://phabricator.wikimedia.org/T171289)
[16:51:05] <wikibugs>	 10Operations, 10fundraising-tech-ops, 10netops: Move codfw frack to new infra - https://phabricator.wikimedia.org/T171970#3481776 (10ayounsi)
[16:52:36] <wikibugs>	 (03PS1) 10Jcrespo: mariadb-multiinstance: Add missing basedir to config [puppet] - 10https://gerrit.wikimedia.org/r/368450 (https://phabricator.wikimedia.org/T169514)
[16:54:29] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] mariadb-multiinstance: Add missing basedir to config [puppet] - 10https://gerrit.wikimedia.org/r/368450 (https://phabricator.wikimedia.org/T169514) (owner: 10Jcrespo)
[16:56:55] <wikibugs>	 (03PS2) 10Andrew Bogott: labs puppetmaster: validate cert name before autosigning [puppet] - 10https://gerrit.wikimedia.org/r/368449 (https://phabricator.wikimedia.org/T171961)
[16:57:20] <wikibugs>	 (03PS1) 10RobH: fixing cp402[34] mac addresses [puppet] - 10https://gerrit.wikimedia.org/r/368453 (https://phabricator.wikimedia.org/T171966)
[16:57:48] <wikibugs>	 (03PS2) 10RobH: fixing cp402[34] mac addresses [puppet] - 10https://gerrit.wikimedia.org/r/368453 (https://phabricator.wikimedia.org/T171966)
[16:59:29] <wikibugs>	 (03CR) 10RobH: [C: 032] fixing cp402[34] mac addresses [puppet] - 10https://gerrit.wikimedia.org/r/368453 (https://phabricator.wikimedia.org/T171966) (owner: 10RobH)
[17:03:41] <icinga-wm>	 PROBLEM - Check systemd state on db1102 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[17:07:42] <icinga-wm>	 RECOVERY - Check systemd state on db1102 is OK: OK - running: The system is fully operational
[17:08:58] <wikibugs>	 (03PS1) 10Andrew Bogott: fullstack:  Switch back to the normal schedule pool [puppet] - 10https://gerrit.wikimedia.org/r/368454
[17:09:00] <wikibugs>	 (03PS1) 10Andrew Bogott: nova: add labvirt1016 to the scheduling pool [puppet] - 10https://gerrit.wikimedia.org/r/368455
[17:10:07] <icinga-wm>	 RECOVERY - mysqld processes on db1102 is OK: PROCS OK: 3 processes with command name mysqld
[17:10:07] <wikibugs>	 10Operations, 10Wikimedia-log-errors: mw1209 /usr/bin/timeout: the monitored command dumped core - https://phabricator.wikimedia.org/T171903#3481878 (10thcipriani) >>! In T171903#3481698, @hashar wrote: > Without knowing the command passed to it, I am not sure how to track the root cause of that. >  > ulimit `...
[17:11:44] <apergos>	 recovery page w/o the down page
[17:11:48] <apergos>	 huh
[17:11:55] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 032] fullstack:  Switch back to the normal schedule pool [puppet] - 10https://gerrit.wikimedia.org/r/368454 (owner: 10Andrew Bogott)
[17:12:59] <jynus>	 apergos: see my comment on the other channel- that was broken for a long time
[17:13:12] <jynus>	 it took me 3 days to fix it
[17:15:42] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s2 on db1102 is CRITICAL: CRITICAL slave_io_state could not connect
[17:16:22] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 032] nova: add labvirt1016 to the scheduling pool [puppet] - 10https://gerrit.wikimedia.org/r/368455 (owner: 10Andrew Bogott)
[17:16:41] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s6 on db1102 is CRITICAL: CRITICAL slave_io_state could not connect
[17:17:14] <apergos>	 ok!
[17:17:31] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s7 on db1102 is CRITICAL: CRITICAL slave_io_state could not connect
[17:18:59] <herron>	 !log cleaned up core files in mw1209:/var/tmp/core to clear disk alert
[17:19:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:21:01] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s2 on db1102 is CRITICAL: CRITICAL slave_sql_state could not connect
[17:22:01] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s6 on db1102 is CRITICAL: CRITICAL slave_sql_state could not connect
[17:22:51] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s7 on db1102 is CRITICAL: CRITICAL slave_sql_state could not connect
[17:25:06] <wikibugs>	 10Operations, 10Wikimedia-log-errors: mw1209 /usr/bin/timeout: the monitored command dumped core - https://phabricator.wikimedia.org/T171903#3481957 (10herron) @Joe and I were just looking at this because icinga had fired a disk alert.    The 512M /var/cache/hhvm/cli.hhbc.sq3 file has been removed, and /var/tm...
[17:25:22] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on db1102 is CRITICAL: CRITICAL slave_sql_lag could not connect
[17:26:21] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s6 on db1102 is CRITICAL: CRITICAL slave_sql_lag could not connect
[17:27:11] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s7 on db1102 is CRITICAL: CRITICAL slave_sql_lag could not connect
[17:30:07] <wikibugs>	 10Operations, 10Wikimedia-log-errors: mw1209 /usr/bin/timeout: the monitored command dumped core - https://phabricator.wikimedia.org/T171903#3479715 (10MoritzMuehlenhoff) The  /var/cache/hhvm/cli.hhbc.sq3 caches were cleared when I upgraded to 3.18, I doubt any of those grew to 512 again. I also created https:...
[17:31:54] <wikibugs>	 (03PS1) 10Jcrespo: sanitarium_multiinstance: Enable binlog [puppet] - 10https://gerrit.wikimedia.org/r/368458 (https://phabricator.wikimedia.org/T169514)
[17:32:33] <jynus>	 that is me, downtime expired- things took more than I expected
[17:35:38] <wikibugs>	 (03CR) 10Jcrespo: [C: 032] sanitarium_multiinstance: Enable binlog [puppet] - 10https://gerrit.wikimedia.org/r/368458 (https://phabricator.wikimedia.org/T169514) (owner: 10Jcrespo)
[17:36:07] <wikibugs>	 (03PS2) 10Jdlrobson: Add new mobile watermark for Urdu Wikipedia. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368444 (https://phabricator.wikimedia.org/T171769) (owner: 10Reception123)
[17:36:09] <wikibugs>	 (03CR) 10Jdlrobson: "PS2 compresses the SVG:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368444 (https://phabricator.wikimedia.org/T171769) (owner: 10Reception123)
[17:37:29] <wikibugs>	 (03CR) 10Reception123: [C: 031] Add new mobile watermark for Urdu Wikipedia. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368444 (https://phabricator.wikimedia.org/T171769) (owner: 10Reception123)
[17:37:32] <wikibugs>	 (03CR) 10Jdlrobson: Add new mobile watermark for Urdu Wikipedia. (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368444 (https://phabricator.wikimedia.org/T171769) (owner: 10Reception123)
[17:37:55] <wikibugs>	 (03PS1) 10Jforrester: [DNM] ContInt: Upgrade npm from 2.15.2 to 3.8.3 in CI [puppet] - 10https://gerrit.wikimedia.org/r/368459 (https://phabricator.wikimedia.org/T161861)
[17:38:15] <wikibugs>	 (03CR) 10Jdlrobson: [C: 04-1] "Logo is incorrect. @nirzar will provide a more suitable one." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368198 (owner: 10Reception123)
[17:39:01] <icinga-wm>	 PROBLEM - Host rhenium is DOWN: PING CRITICAL - Packet loss = 100%
[17:39:16] <wikibugs>	 (03CR) 10Jforrester: "Not sure if we should go to v3.10.10 (latest release in 3.x); this is the node 6.0 version." [puppet] - 10https://gerrit.wikimedia.org/r/368459 (https://phabricator.wikimedia.org/T161861) (owner: 10Jforrester)
[17:40:31] <icinga-wm>	 RECOVERY - Host rhenium is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms
[17:42:36] <wikibugs>	 (03CR) 10Reception123: "Ok, this is the one that I was given. If you can please also rebase this as I'm not sure why I can't." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368198 (owner: 10Reception123)
[17:51:38] <wikibugs>	 (03CR) 10Ladsgroup: [C: 031] "The rebuild is complete now: https://phabricator.wikimedia.org/T171461#3469736 I will get this deployed in SWAT on Monday." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367393 (https://phabricator.wikimedia.org/T165197) (owner: 10Ladsgroup)
[17:54:52] <wikibugs>	 (03PS1) 10Ottomata: Install virtualenv bin on stat boxes [puppet] - 10https://gerrit.wikimedia.org/r/368461 (https://phabricator.wikimedia.org/T152712)
[17:55:12] <wikibugs>	 (03CR) 10Ottomata: [V: 032 C: 032] Install virtualenv bin on stat boxes [puppet] - 10https://gerrit.wikimedia.org/r/368461 (https://phabricator.wikimedia.org/T152712) (owner: 10Ottomata)
[18:02:01] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s2 on db1102 is OK: OK slave_io_state Slave_IO_Running: Yes
[18:02:08] <wikibugs>	 (03PS1) 10Rush: openstack: move openstack::repo to new model [puppet] - 10https://gerrit.wikimedia.org/r/368464
[18:02:11] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s6 on db1102 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[18:02:12] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s2 on db1102 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[18:02:21] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s7 on db1102 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[18:02:31] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s6 on db1102 is OK: OK slave_sql_lag Replication lag: 0.23 seconds
[18:02:41] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on db1102 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[18:02:42] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s7 on db1102 is OK: OK slave_io_state Slave_IO_Running: Yes
[18:02:44] <jynus>	 finally fixed
[18:02:52] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s6 on db1102 is OK: OK slave_io_state Slave_IO_Running: Yes
[18:03:01] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s7 on db1102 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[18:03:03] <wikibugs>	 (03PS2) 10Rush: openstack: move openstack::repo to new model [puppet] - 10https://gerrit.wikimedia.org/r/368464
[18:05:53] <wikibugs>	 (03CR) 10Bearloga: "@Otto I realize you're super busy with stat1002 stuff but also we need this patch because we're coming up to being a week behind on our me" [puppet] - 10https://gerrit.wikimedia.org/r/367930 (https://phabricator.wikimedia.org/T170494) (owner: 10Bearloga)
[18:08:00] <wikibugs>	 (03CR) 10Ottomata: [C: 031] "Seems totally fine to me!  Thanks so much!  If you don't mind, I'll let Gehel merge; (I'm not officially working today :) ), otherwise I c" [puppet] - 10https://gerrit.wikimedia.org/r/367930 (https://phabricator.wikimedia.org/T170494) (owner: 10Bearloga)
[18:10:44] <wikibugs>	 (03PS1) 10Rush: openstack: move openstack::repo to new model [puppet] - 10https://gerrit.wikimedia.org/r/368466
[18:11:00] <wikibugs>	 (03PS2) 10Rush: openstack: move openstack::repo to new model [puppet] - 10https://gerrit.wikimedia.org/r/368466
[18:11:24] <wikibugs>	 (03Abandoned) 10Rush: openstack: move openstack::repo to new model [puppet] - 10https://gerrit.wikimedia.org/r/368464 (owner: 10Rush)
[18:14:08] <wikibugs>	 (03PS3) 10Rush: openstack: move openstack::repo to new model [puppet] - 10https://gerrit.wikimedia.org/r/368466 (https://phabricator.wikimedia.org/T171494)
[18:14:16] <wikibugs>	 (03PS4) 10Rush: openstack: move openstack::repo to new model [puppet] - 10https://gerrit.wikimedia.org/r/368466 (https://phabricator.wikimedia.org/T171494)
[18:15:06] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] openstack: move openstack::repo to new model [puppet] - 10https://gerrit.wikimedia.org/r/368466 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush)
[18:16:21] <icinga-wm>	 PROBLEM - Router interfaces on pfw-eqiad is CRITICAL: CRITICAL: host 208.80.154.218, interfaces up: 104, down: 1, dormant: 0, excluded: 3, unused: 0
[18:16:21] <icinga-wm>	 PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 122, down: 1, dormant: 0, excluded: 0, unused: 0
[18:16:31] <icinga-wm>	 PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 122, down: 1, dormant: 0, excluded: 0, unused: 0
[18:17:32] <icinga-wm>	 RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 124, down: 0, dormant: 0, excluded: 0, unused: 0
[18:17:40] <XioNoX>	 please ignore the above ^
[18:18:21] <icinga-wm>	 RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 124, down: 0, dormant: 0, excluded: 0, unused: 0
[18:19:15] <wikibugs>	 (03CR) 10Andrew Bogott: "If the puppet compiler is happy then I'm happy." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/368466 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush)
[18:19:49] <wikibugs>	 (03PS5) 10Rush: openstack: move openstack::repo to new model [puppet] - 10https://gerrit.wikimedia.org/r/368466 (https://phabricator.wikimedia.org/T171494)
[18:20:22] <icinga-wm>	 RECOVERY - Router interfaces on pfw-eqiad is OK: OK: host 208.80.154.218, interfaces up: 105, down: 0, dormant: 0, excluded: 3, unused: 0
[18:20:24] <wikibugs>	 (03PS6) 10Rush: openstack: move openstack::repo to new model [puppet] - 10https://gerrit.wikimedia.org/r/368466 (https://phabricator.wikimedia.org/T171494)
[18:20:53] <icinga-wm>	 PROBLEM - puppet last run on stat1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[virtualenv]
[18:23:24] <wikibugs>	 10Operations, 10ORES, 10Scoring-platform-team, 10Patch-For-Review, 10User-Joe: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3482118 (10Halfak) I scheduled some time with @joe for running another test with more workers on Monday at 1300 UTC.
[18:33:02] <chasemp>	 !log disabling puppet for labs things for trying out refactor rollout
[18:33:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:36:20] <wikibugs>	 (03CR) 10Rush: [C: 032] openstack: move openstack::repo to new model [puppet] - 10https://gerrit.wikimedia.org/r/368466 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush)
[18:37:53] <wikibugs>	 10Operations, 10Wikimedia-log-errors: mw1209 /usr/bin/timeout: the monitored command dumped core - https://phabricator.wikimedia.org/T171903#3482149 (10thcipriani) 05Open>03Resolved a:03herron >>! In T171903#3481957, @herron wrote: > @Joe and I were just looking at this because icinga had fired a disk al...
[18:38:55] <wikibugs>	 10Operations, 10Epic, 10Goal, 10Services (later): End of August milestone: Cassandra 3 cluster in production - https://phabricator.wikimedia.org/T169939#3482164 (10Eevans) Regarding space in the cluster:  [[ https://grafana.wikimedia.org/dashboard/db/restbase-cassandra-storage?orgId=1 | The dashboard ]] wo...
[18:39:30] <wikibugs>	 10Operations, 10Epic, 10Goal, 10Services (later): End of August milestone: Cassandra 3 cluster in production - https://phabricator.wikimedia.org/T169939#3482165 (10Eevans)
[18:48:10] <chasemp>	 !log enable and force puppet on labtestservices2001,labtestvirt2001,labtestcontrol2001,labservices1002,labcontrol1002,labnet1002,labvirt1014 and labtestneutron2001 to see a newly installed host get the change instead of a noop
[18:48:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:06:04] <wikibugs>	 10Operations, 10Traffic: setup/install cp402[34] - https://phabricator.wikimedia.org/T171966#3482223 (10RobH) a:05RobH>03None
[19:06:35] <wikibugs>	 10Operations, 10Traffic: setup/install cp402[34] - https://phabricator.wikimedia.org/T171966#3481656 (10RobH) These two systems are installed and calling into puppet, ready for service implementation.  Assigning to @ayounsi but not sure if this should be him or @bblack.
[19:09:08] <wikibugs>	 10Operations, 10ops-ulsfo, 10Traffic: setup/install cp4022 - https://phabricator.wikimedia.org/T171967#3482230 (10RobH) 05Open>03stalled
[19:09:10] <wikibugs>	 10Operations, 10ops-ulsfo, 10Traffic, 10Patch-For-Review: replace ulsfo aging servers - https://phabricator.wikimedia.org/T164327#3482231 (10RobH)
[19:14:31] <icinga-wm>	 PROBLEM - puppet last run on labtestweb2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:16:53] <chasemp>	 ^ looking
[19:17:31] <icinga-wm>	 RECOVERY - puppet last run on labtestweb2001 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[19:19:22] <wikibugs>	 10Operations, 10ArchCom-RfC, 10Traffic, 10Services (designing): Make API usage limits easier to understand, implement, and more adaptive to varying request costs / concurrency limiting - https://phabricator.wikimedia.org/T167906#3482263 (10GWicke)
[19:20:32] <wikibugs>	 10Operations, 10ArchCom-RfC, 10Traffic, 10Services (designing): Make API usage limits easier to understand, implement, and more adaptive to varying request costs / concurrency limiting - https://phabricator.wikimedia.org/T167906#3349120 (10GWicke)
[19:26:06] <wikibugs>	 10Operations, 10ArchCom-RfC, 10Traffic, 10Services (designing): Make API usage limits easier to understand, implement, and more adaptive to varying request costs / concurrency limiting - https://phabricator.wikimedia.org/T167906#3482294 (10GWicke)
[19:26:10] <wikibugs>	 10Operations, 10Traffic: setup/install cp402[34] - https://phabricator.wikimedia.org/T171966#3482295 (10ayounsi) a:03BBlack
[19:32:43] <wikibugs>	 (03PS1) 10Thcipriani: Jobrunner: create dsh groups per datacenter [puppet] - 10https://gerrit.wikimedia.org/r/368476 (https://phabricator.wikimedia.org/T129148)
[19:37:27] <wikibugs>	 10Operations, 10Epic, 10Goal, 10Services (doing), and 2 others: Services Q1 2017/18 goal: Begin migrating job queue processing to multi-DC enabled eventbus infrastructure. - https://phabricator.wikimedia.org/T169937#3482343 (10Pchelolo)
[19:40:15] <mutante>	 !log releases2001 - OS install worked this time, could not reproduce grub error, signing puppet cert, initial puppet run (T171917)
[19:40:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:40:27] <stashbot>	 T171917: setup releases2001.codfw.wmnet - https://phabricator.wikimedia.org/T171917
[19:47:52] <wikibugs>	 (03PS1) 10Dzahn: releases: add releases2001 to site, change rsync direction [puppet] - 10https://gerrit.wikimedia.org/r/368477 (https://phabricator.wikimedia.org/T171917)
[19:50:29] <wikibugs>	 (03CR) 10Dzahn: [C: 032] releases: add releases2001 to site, change rsync direction [puppet] - 10https://gerrit.wikimedia.org/r/368477 (https://phabricator.wikimedia.org/T171917) (owner: 10Dzahn)
[19:50:31] <wikibugs>	 (03CR) 10Paladox: [C: 031] releases: add releases2001 to site, change rsync direction [puppet] - 10https://gerrit.wikimedia.org/r/368477 (https://phabricator.wikimedia.org/T171917) (owner: 10Dzahn)
[19:58:55] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 232, down: 1, dormant: 0, excluded: 0, unused: 0
[19:58:55] <icinga-wm>	 PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 122, down: 1, dormant: 0, excluded: 0, unused: 0
[20:01:58] <wikibugs>	 (03CR) 10MarcoAurelio: Initial configuration for hiwikiversity (033 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368165 (https://phabricator.wikimedia.org/T168765) (owner: 10Urbanecm)
[20:02:55] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 234, down: 0, dormant: 0, excluded: 0, unused: 0
[20:02:56] <icinga-wm>	 RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 124, down: 0, dormant: 0, excluded: 0, unused: 0
[20:06:18] <wikibugs>	 (03CR) 10MarcoAurelio: Initial configuration for hiwikiversity (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368165 (https://phabricator.wikimedia.org/T168765) (owner: 10Urbanecm)
[20:08:47] <wikibugs>	 (03CR) 10Thcipriani: "I described the use-case for this patch in https://phabricator.wikimedia.org/T129148#3482379 but I'm not sure if there's an easier way to " [puppet] - 10https://gerrit.wikimedia.org/r/368476 (https://phabricator.wikimedia.org/T129148) (owner: 10Thcipriani)
[20:14:36] <icinga-wm>	 PROBLEM - Check systemd state on releases2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:15:26] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on releases2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:16:27] <icinga-wm>	 PROBLEM - DPKG on releases2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:17:16] <icinga-wm>	 PROBLEM - Disk space on releases2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:18:56] <icinga-wm>	 PROBLEM - configured eth on releases2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:19:13] <mutante>	 that's new but still no reason to do that...
[20:19:56] <icinga-wm>	 PROBLEM - dhclient process on releases2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:20:28] <foks>	 how do I log a msg again?
[20:20:42] <foks>	 ~dumb questions~
[20:20:42] <mutante>	 you start the line with !log
[20:20:46] <icinga-wm>	 PROBLEM - puppet last run on releases2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:20:59] <foks>	 !log removing 2FA from User:SPoore (WMF)
[20:21:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:21:11] <foks>	 thanks mutante
[20:21:14] <mutante>	 yw
[20:21:38] <icinga-wm>	 PROBLEM - salt-minion processes on releases2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:22:16] <icinga-wm>	 RECOVERY - Disk space on releases2001 is OK: DISK OK
[20:22:17] <icinga-wm>	 RECOVERY - DPKG on releases2001 is OK: All packages OK
[20:22:19] <wikibugs>	 (03CR) 10MarcoAurelio: "Looks good so far. Thank you." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368168 (https://phabricator.wikimedia.org/T155038) (owner: 10Urbanecm)
[20:22:27] <icinga-wm>	 RECOVERY - salt-minion processes on releases2001 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[20:22:30] <mutante>	 ah, because IPv6 address still had to be added by puppet
[20:22:46] <icinga-wm>	 RECOVERY - dhclient process on releases2001 is OK: PROCS OK: 0 processes with command name dhclient
[20:22:56] <icinga-wm>	 RECOVERY - configured eth on releases2001 is OK: OK - interfaces up
[20:23:36] <icinga-wm>	 RECOVERY - puppet last run on releases2001 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[20:24:26] <icinga-wm>	 RECOVERY - Check systemd state on releases2001 is OK: OK - running: The system is fully operational
[20:32:38] <Krinkle>	 MatmaRex: https://gerrit.wikimedia.org/r/#/c/368487/
[20:34:06] <wikibugs>	 (03PS4) 10Dzahn: releases: rsync reprepro data, set active server in Hiera [puppet] - 10https://gerrit.wikimedia.org/r/368333 (https://phabricator.wikimedia.org/T164030)
[20:35:22] <wikibugs>	 (03CR) 10Dzahn: "modified so that we only have hiera lookup in parameter of profile classes, nothing like that in role class" [puppet] - 10https://gerrit.wikimedia.org/r/368333 (https://phabricator.wikimedia.org/T164030) (owner: 10Dzahn)
[20:38:03] <wikibugs>	 (03PS2) 10Chad: WIP: moving update wikiversions to scap [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367828
[20:45:18] <icinga-wm>	 RECOVERY - Check the NTP synchronisation status of timesyncd on releases2001 is OK: OK: synced at Fri 2017-07-28 20:45:13 UTC.
[20:51:53] <MatmaRex>	 Krinkle: i honestly know nothing about that stuff but i can +2 if you want me to
[20:52:00] <wikibugs>	 (03PS3) 10Chad: WIP: moving update wikiversions to scap [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367828
[20:54:36] <wikibugs>	 (03CR) 10Urbanecm: Initial configuration for hiwikiversity (033 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/368165 (https://phabricator.wikimedia.org/T168765) (owner: 10Urbanecm)
[20:59:11] <wikibugs>	 (03PS4) 10Chad: WIP: moving update wikiversions to scap [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367828
[21:03:37] <wikibugs>	 (03PS5) 10Chad: WIP: moving update wikiversions to scap [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367828
[21:06:47] <Krinkle>	 MatmaRex: Aye, that'd be nice
[21:08:56] <Krinkle>	 Thanks
[21:10:30] <wikibugs>	 (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/7209/releases1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/368333 (https://phabricator.wikimedia.org/T164030) (owner: 10Dzahn)
[21:14:40] <wikibugs>	 10Operations: librenms - syslog stopped working after migration - https://phabricator.wikimedia.org/T172008#3482759 (10Dzahn)
[21:14:48] <wikibugs>	 10Operations: librenms - syslog stopped working after migration - https://phabricator.wikimedia.org/T172008#3482774 (10Dzahn) p:05Triage>03High
[21:40:54] <wikibugs>	 (03PS6) 10Chad: WIP: moving update wikiversions to scap [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367828
[21:40:56] <Amir1_>	 I'm checking logstash for some errors and found lots of errors like this: https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2017.07.28/apache2?id=AV2KcTDcCtkHCY6a_HUI&_g=()
[21:41:05] <Amir1_>	 "AH01067: Failed to read FastCGI header"
[21:41:19] <Amir1_>	 Is it normal? Just wanted to give the heads up
[21:42:27] <wikibugs>	 10Operations, 10RESTBase, 10RESTBase-API, 10Traffic, 10Services (next): RESTBase support for www.wikimedia.org missing - https://phabricator.wikimedia.org/T133178#3482880 (10GWicke) >>! In T133178#3428811, @Krinkle wrote: > I'd recommend the latter, but not indefinitely. We'd deprecate REST on `wikimedia...
[21:48:20] <Amir1_>	 Another fantastic error: https://su.wikipedia.org/w/index.php?title=Propinsi_Gifu&action=info
[21:54:02] <MaxSem>	 Amir1_, fatal error: Argument 1 passed to MediaWiki\Linker\LinkRenderer::makeLink() must implement interface MediaWiki\Linker\LinkTarget, null given in /srv/mediawiki/php-1.30.0-wmf.11/includes/actions/InfoAction.php on line 240
[21:54:14] <Amir1_>	 MaxSem: https://phabricator.wikimedia.org/T172016
[21:54:19] <Amir1_>	 just made the bug
[21:54:39] <Amir1_>	 The page is not a redirect, but InfoAction thinks it is and tries to load the redirect target
[21:55:58] <MaxSem>	 it checks for $title->isRedirect()
[21:56:44] <MaxSem>	 so we have a discrepancy somewhere
[22:01:41] <wikibugs>	 (03PS7) 10Chad: WIP: moving update wikiversions to scap [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367828
[22:02:39] <wikibugs>	 (03PS8) 10Chad: WIP: moving update wikiversions to scap [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367828
[22:04:29] <wikibugs>	 (03CR) 10Krinkle: [C: 04-1] Fix exceptionmonitor [puppet] - 10https://gerrit.wikimedia.org/r/249905 (owner: 10MaxSem)
[22:14:07] <wikibugs>	 (03PS9) 10Chad: WIP: moving update wikiversions to scap [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367828
[22:18:42] <wikibugs>	 (03PS10) 10Chad: WIP: moving update wikiversions to scap [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367828
[22:19:33] <wikibugs>	 (03PS11) 10Chad: WIP: moving update wikiversions to scap [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367828
[22:22:18] <icinga-wm>	 PROBLEM - puppet last run on mw1295 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:23:39] <wikibugs>	 (03Abandoned) 10MaxSem: Fix exceptionmonitor [puppet] - 10https://gerrit.wikimedia.org/r/249905 (owner: 10MaxSem)
[22:26:50] <wikibugs>	 (03PS12) 10Chad: Moving update wikiversions to scap [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367828
[22:30:32] <wikibugs>	 (03PS13) 10Chad: Moving update wikiversions to scap [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367828
[22:32:42] <wikibugs>	 (03PS1) 10MaxSem: logging: Remove exceptionmonitor [puppet] - 10https://gerrit.wikimedia.org/r/368522
[22:42:56] <wikibugs>	 (03PS1) 10Rush: openstack: move rabbitmq to module/profile/role [puppet] - 10https://gerrit.wikimedia.org/r/368523 (https://phabricator.wikimedia.org/T171494)
[22:44:43] <wikibugs>	 (03PS2) 10Rush: openstack: move rabbitmq to module/profile/role [puppet] - 10https://gerrit.wikimedia.org/r/368523 (https://phabricator.wikimedia.org/T171494)
[22:47:01] <wikibugs>	 (03PS3) 10Rush: wip openstack: move rabbitmq to module/profile/role [puppet] - 10https://gerrit.wikimedia.org/r/368523 (https://phabricator.wikimedia.org/T171494)
[22:53:28] <icinga-wm>	 RECOVERY - puppet last run on mw1295 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[22:56:54] <wikibugs>	 (03PS1) 10Dzahn: admins::dzahn: export reprepro base dir based on hostname [puppet] - 10https://gerrit.wikimedia.org/r/368524
[22:58:10] <wikibugs>	 (03PS2) 10Dzahn: admins::dzahn: export reprepro base dir based on hostname [puppet] - 10https://gerrit.wikimedia.org/r/368524
[22:59:07] <wikibugs>	 (03CR) 10Dzahn: [C: 032] admins::dzahn: export reprepro base dir based on hostname [puppet] - 10https://gerrit.wikimedia.org/r/368524 (owner: 10Dzahn)
[23:01:32] <wikibugs>	 10Operations, 10Release-Engineering-Team, 10vm-requests, 10Security-General: New ganeti VM for MW release pipeline work - https://phabricator.wikimedia.org/T163743#3483047 (10Dzahn)
[23:01:35] <wikibugs>	 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Kanban), 10Security-General: setup releases1001.eqiad.wmnet (was: setup mwreleases1001) - https://phabricator.wikimedia.org/T164030#3483046 (10Dzahn) 05Open>03Resolved
[23:03:00] <wikibugs>	 10Operations, 10Release-Engineering-Team (Watching / External): Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937#3483092 (10Dzahn)
[23:03:03] <wikibugs>	 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Kanban), 10Security-General: setup releases1001.eqiad.wmnet (was: setup mwreleases1001) - https://phabricator.wikimedia.org/T164030#3218909 (10Dzahn)
[23:03:06] <wikibugs>	 10Operations, 10vm-requests, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): setup releases2001.codfw.wmnet - https://phabricator.wikimedia.org/T171917#3483090 (10Dzahn) 05Open>03Resolved
[23:06:01] <wikibugs>	 (03PS1) 10Dzahn: cache::misc: release: add codfw backend, make active-active [puppet] - 10https://gerrit.wikimedia.org/r/368527 (https://phabricator.wikimedia.org/T171917)
[23:07:00] <wikibugs>	 (03PS2) 10Dzahn: cache::misc: releases: add codfw backend, make active-active [puppet] - 10https://gerrit.wikimedia.org/r/368527 (https://phabricator.wikimedia.org/T171917)
[23:10:38] <wikibugs>	 (03PS3) 10Dzahn: cache::misc: releases: add codfw backend, make active-active [puppet] - 10https://gerrit.wikimedia.org/r/368527 (https://phabricator.wikimedia.org/T171917)
[23:10:49] <wikibugs>	 10Puppet, 10Cloud-VPS: role::puppetmaster::standalone on stretch: Unable to locate package geoipupdate - https://phabricator.wikimedia.org/T171916#3483101 (10bd808) Discussed a bit on irc with @faidon. The recommended short term fix is to use jessie instead of stretch.  The next tier of fix is for us to fix op...
[23:11:16] <wikibugs>	 10Puppet, 10Cloud-VPS: role::puppetmaster::standalone on stretch: Unable to locate package geoipupdate - https://phabricator.wikimedia.org/T171916#3483104 (10bd808) p:05Triage>03Normal
[23:17:49] <wikibugs>	 (03PS4) 10Rush: openstack: move rabbitmq to module/profile/role [puppet] - 10https://gerrit.wikimedia.org/r/368523 (https://phabricator.wikimedia.org/T171494)
[23:17:54] <wikibugs>	 (03Abandoned) 10Rush: labtest: labcontrol2001 use rabbitmq role [puppet] - 10https://gerrit.wikimedia.org/r/366166 (https://phabricator.wikimedia.org/T167559) (owner: 10Rush)
[23:18:39] <wikibugs>	 (03PS5) 10Rush: openstack: move rabbitmq to module/profile/role [puppet] - 10https://gerrit.wikimedia.org/r/368523 (https://phabricator.wikimedia.org/T171494)
[23:25:46] <wikibugs>	 (03CR) 10Dzahn: [C: 032] cache::misc: releases: add codfw backend, make active-active [puppet] - 10https://gerrit.wikimedia.org/r/368527 (https://phabricator.wikimedia.org/T171917) (owner: 10Dzahn)
[23:32:41] <mutante>	 !log puppetmaster2001 - git pulled in /var/lib/git/operations/puppet to sync with puppetmaster1001 - accidentally interrupted puppet-merge
[23:32:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:44:45] <wikibugs>	 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937#3483199 (10Dzahn)
[23:45:17] <wikibugs>	 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937#2990470 (10Dzahn)