[00:07:57] <icinga-wm>	 PROBLEM - Druid historical on druid1006 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args io.druid.cli.Main server historical
[00:07:58] <icinga-wm>	 PROBLEM - Check systemd state on druid1006 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[00:36:27] <icinga-wm>	 RECOVERY - Druid historical on druid1006 is OK: PROCS OK: 1 process with command name java, args io.druid.cli.Main server historical
[00:36:27] <icinga-wm>	 RECOVERY - Check systemd state on druid1006 is OK: OK - running: The system is fully operational
[02:55:58] <wikibugs_>	 (PS1) Andrew Bogott: dnsleaks.py: ignore things under .svc.eqiad.wmflabs [puppet] - https://gerrit.wikimedia.org/r/379699
[02:56:50] <wikibugs_>	 (CR) Andrew Bogott: [C: 2] dnsleaks.py: ignore things under .svc.eqiad.wmflabs [puppet] - https://gerrit.wikimedia.org/r/379699 (owner: Andrew Bogott)
[03:00:47] <icinga-wm>	 PROBLEM - puppet last run on analytics1047 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:17:27] <wikibugs_>	 (PS1) Andrew Bogott: dnsleaks.py: use case-insensitive comparisons [puppet] - https://gerrit.wikimedia.org/r/379700
[03:18:17] <wikibugs_>	 (CR) Andrew Bogott: [C: 2] dnsleaks.py: use case-insensitive comparisons [puppet] - https://gerrit.wikimedia.org/r/379700 (owner: Andrew Bogott)
[03:30:17] <icinga-wm>	 RECOVERY - puppet last run on analytics1047 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[03:35:21] <wikibugs_>	 (CR) BryanDavis: [C: -1] "The existing array syntax is fine, the problem is that the IP address given is incorrect." [mediawiki-config] - https://gerrit.wikimedia.org/r/379661 (https://phabricator.wikimedia.org/T176287) (owner: Zoranzoki21)
[05:48:21] <kart_>	 Can anyone unbreak jenkins? https://integration.wikimedia.org/zuul/
[06:03:45] <legoktm>	 hmm, it looks like gearman is stuck
[06:31:50] <wikibugs_>	 Operations, cloud-services-team (Kanban): puppet ca_server confusion - https://phabricator.wikimedia.org/T176437#3626230 (Joe) If you want to better understand what puppet_ca does on an agent, and why removing it afterwards "doesn't break anything" there are good reads in the puppet docs:  - https://docs...
[06:32:53] <moritzm>	 !log installing emacs security updates on trusty (Debian already fixed)
[06:33:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:35:35] <wikibugs_>	 Operations, Discovery, Maps-Sprint, Maps (Kartographer), and 2 others: nodejs 6.11 - https://phabricator.wikimedia.org/T170548#3626231 (MoritzMuehlenhoff)
[06:36:10] <wikibugs_>	 Operations, Discovery, Maps-Sprint, Maps (Kartographer), and 2 others: nodejs 6.11 - https://phabricator.wikimedia.org/T170548#3434983 (MoritzMuehlenhoff) Open>Resolved >>! In T170548#3625373, @Gehel wrote: > maps is finally upgraded to nodejs 6.11. >  > @MoritzMuehlenhoff: according to t...
[06:53:13] <wikibugs_>	 (CR) Muehlenhoff: [C: 1] "Looks fine, but is there a reason why this only adds ipv4 addresses?" [puppet] - https://gerrit.wikimedia.org/r/379559 (https://phabricator.wikimedia.org/T176223) (owner: Elukey)
[07:08:53] <wikibugs_>	 (PS2) Muehlenhoff: Remove salt minion Icinga check [puppet] - https://gerrit.wikimedia.org/r/379500
[07:12:58] <wikibugs_>	 (CR) jerkins-bot: [V: -1] Remove salt minion Icinga check [puppet] - https://gerrit.wikimedia.org/r/379500 (owner: Muehlenhoff)
[07:15:11] <wikibugs_>	 (CR) Mobrovac: [C: 1] Configure agent to export Cassandra histogram metrics [puppet] - https://gerrit.wikimedia.org/r/379610 (https://phabricator.wikimedia.org/T171772) (owner: Eevans)
[07:15:13] <wikibugs_>	 (Abandoned) Giuseppe Lavagetto: puppet: switch all production hosts to the future parser [puppet] - https://gerrit.wikimedia.org/r/379492 (https://phabricator.wikimedia.org/T171704) (owner: Giuseppe Lavagetto)
[07:20:17] <icinga-wm>	 PROBLEM - SSH on copper is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:42] <wikibugs_>	 (CR) Giuseppe Lavagetto: [C: -2] "I don't understand why my previous comments haven't been  taken into account at all." [puppet] - https://gerrit.wikimedia.org/r/379560 (https://phabricator.wikimedia.org/T176392) (owner: Paladox)
[07:23:08] <icinga-wm>	 RECOVERY - SSH on copper is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0)
[07:27:57] <wikibugs_>	 (CR) Muehlenhoff: [V: 2 C: 2] Remove salt minion Icinga check [puppet] - https://gerrit.wikimedia.org/r/379500 (owner: Muehlenhoff)
[07:32:04] <wikibugs_>	 (CR) Legoktm: "recheck" [puppet] - https://gerrit.wikimedia.org/r/379500 (owner: Muehlenhoff)
[07:37:17] <wikibugs_>	 Operations, monitoring, Patch-For-Review: add pdu redundancy checking to server/router/switch checks in icinga - https://phabricator.wikimedia.org/T109903#3626264 (faidon) >>! In T109903#3625304, @herron wrote: > Check_ipmi_sensor is showing failures on 3 out of 4 of the Dell PowerEdge R620 class sys...
[07:37:45] <wikibugs_>	 (CR) Paladox: "> I don't understand why my previous comments haven't been  taken" [puppet] - https://gerrit.wikimedia.org/r/379560 (https://phabricator.wikimedia.org/T176392) (owner: Paladox)
[07:42:20] <wikibugs_>	 (CR) Paladox: "I’m not sure how to get this moving along. Maybe we should change the priority of the task to normal unless we can get this change moving " [puppet] - https://gerrit.wikimedia.org/r/379560 (https://phabricator.wikimedia.org/T176392) (owner: Paladox)
[07:47:41] <wikibugs_>	 Operations, Phabricator, Release-Engineering-Team, Patch-For-Review: The aphlict systemd unit needs to be rewritten from scratch - https://phabricator.wikimedia.org/T176392#3626266 (Paladox) The patch has stalled and doesn't look like it will move along, I guess we should change the priority to no...
[07:54:37] <wikibugs_>	 (PS2) Elukey: network::constants: add aqs hosts [puppet] - https://gerrit.wikimedia.org/r/379559 (https://phabricator.wikimedia.org/T176223)
[07:55:50] <_joe_>	 win 17
[07:55:53] <wikibugs_>	 (CR) Elukey: "> Looks fine, but is there a reason why this only adds ipv4" [puppet] - https://gerrit.wikimedia.org/r/379559 (https://phabricator.wikimedia.org/T176223) (owner: Elukey)
[07:57:27] <wikibugs_>	 (PS6) Paladox: Phabricator: Fix aphlict systemd script [puppet] - https://gerrit.wikimedia.org/r/379560 (https://phabricator.wikimedia.org/T176392)
[07:58:07] <wikibugs_>	 (CR) Paladox: "@Giuseppe Lavagetto I’ve made it forking now." [puppet] - https://gerrit.wikimedia.org/r/379560 (https://phabricator.wikimedia.org/T176392) (owner: Paladox)
[08:26:35] <wikibugs_>	 Operations, ops-eqiad, Patch-For-Review, User-Elukey, User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3626293 (ops-monitoring-bot) Script wmf_auto_reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` mw1322.eqiad.wmnet ``` The log can be foun...
[08:26:37] <wikibugs_>	 Operations, ops-eqiad, Patch-For-Review, User-Elukey, User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3626294 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['mw1322.eqiad.wmnet'] ```  Of which those **FAILED**: ``` ['mw1322.eqiad.wmnet'] ```
[08:28:34] <wikibugs_>	 Operations, ops-eqiad, Patch-For-Review, User-Elukey, User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3626295 (ops-monitoring-bot) Script wmf_auto_reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` mw1322.eqiad.wmnet ``` The log can be foun...
[08:29:00] <icinga-wm>	 RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 13 probes of 286 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[08:31:20] <icinga-wm>	 RECOVERY - IPv4 ping to eqiad on ripe-atlas-eqiad is OK: OK - failed 4 probes of 289 (alerts on 19) - https://atlas.ripe.net/measurements/1790945/#!map
[08:42:41] <wikibugs_>	 (CR) Hashar: [C: -1] "I gave it a try on integration-slave-docker-1001 and it fails :(" [puppet] - https://gerrit.wikimedia.org/r/379556 (https://phabricator.wikimedia.org/T176267) (owner: Hashar)
[08:43:04] <wikibugs_>	 (CR) Muehlenhoff: [C: 1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/379559 (https://phabricator.wikimedia.org/T176223) (owner: Elukey)
[08:46:34] <wikibugs_>	 (CR) Hashar: [C: 1] "That worked just fine on the labs instances :]  The Docker hosts on labs are now all on Docker 17.06 \o/" [puppet] - https://gerrit.wikimedia.org/r/379556 (https://phabricator.wikimedia.org/T176267) (owner: Hashar)
[08:47:09] <wikibugs_>	 (CR) Elukey: [C: 2] network::constants: add aqs hosts [puppet] - https://gerrit.wikimedia.org/r/379559 (https://phabricator.wikimedia.org/T176223) (owner: Elukey)
[08:52:05] <wikibugs_>	 (CR) Hashar: [C: 1] "recheck" [puppet] - https://gerrit.wikimedia.org/r/379556 (https://phabricator.wikimedia.org/T176267) (owner: Hashar)
[08:56:02] <wikibugs_>	 (CR) Giuseppe Lavagetto: [C: -1] "A couple more small issues that I'd like to see fixed, apart from that LGTM." (2 comments) [puppet] - https://gerrit.wikimedia.org/r/379560 (https://phabricator.wikimedia.org/T176392) (owner: Paladox)
[09:02:51] <wikibugs_>	 (PS2) Muehlenhoff: Remove salt minion packages in production [puppet] - https://gerrit.wikimedia.org/r/379525
[09:03:15] <wikibugs_>	 (CR) jerkins-bot: [V: -1] Remove salt minion packages in production [puppet] - https://gerrit.wikimedia.org/r/379525 (owner: Muehlenhoff)
[09:04:47] <wikibugs_>	 (PS3) Muehlenhoff: Remove salt minion packages in production [puppet] - https://gerrit.wikimedia.org/r/379525
[09:10:05] <wikibugs_>	 (CR) Alexandros Kosiaris: "I don't think this is the right approach. aqs hosts are in no way "special hosts", nor should they be treated that way. A better approach " [puppet] - https://gerrit.wikimedia.org/r/379559 (https://phabricator.wikimedia.org/T176223) (owner: Elukey)
[09:14:10] <jynus>	 !log stop mariadb at db1055 for upgrade and maintenance
[09:14:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:11] <wikibugs_>	 Operations, ops-eqiad, Patch-For-Review, User-Elukey, User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3626360 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['mw1322.eqiad.wmnet'] ```  Of which those **FAILED**: ``` ['mw1322.eqiad.wmnet'] ```
[09:22:41] <wikibugs_>	 (CR) Alexandros Kosiaris: [C: -1] contint: docker-ce on labs docker slaves (1 comment) [puppet] - https://gerrit.wikimedia.org/r/379556 (https://phabricator.wikimedia.org/T176267) (owner: Hashar)
[09:25:44] <wikibugs_>	 (PS2) Giuseppe Lavagetto: Convert to use of the future parser by default [software/puppet-compiler] - https://gerrit.wikimedia.org/r/379569 (https://phabricator.wikimedia.org/T171704)
[09:34:34] <wikibugs_>	 Operations, ops-eqiad, Patch-For-Review, User-Elukey, User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3626374 (ops-monitoring-bot) Script wmf_auto_reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` mw1321.eqiad.wmnet ``` The log can be foun...
[09:35:57] <wikibugs_>	 (PS4) Muehlenhoff: Stop using salt minion in production [puppet] - https://gerrit.wikimedia.org/r/379525
[09:41:42] <wikibugs_>	 Operations, Edit-Review-Improvements, Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017), Performance: Systematically test load speeds of Watchlist and Recent Changes - https://phabricator.wikimedia.org/T176445#3626375 (mark)
[09:42:27] <elukey>	 !log mw1319 (new appserver) serving traffic (going to increase its weight up to 20)
[09:42:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:56:58] <wikibugs_>	 (PS7) Paladox: Phabricator: Fix aphlict systemd script [puppet] - https://gerrit.wikimedia.org/r/379560 (https://phabricator.wikimedia.org/T176392)
[09:57:01] <wikibugs_>	 (CR) Paladox: Phabricator: Fix aphlict systemd script (2 comments) [puppet] - https://gerrit.wikimedia.org/r/379560 (https://phabricator.wikimedia.org/T176392) (owner: Paladox)
[09:57:44] <wikibugs_>	 (PS9) Paladox: Phabricator: Fix aphlict systemd script [puppet] - https://gerrit.wikimedia.org/r/379560 (https://phabricator.wikimedia.org/T176392)
[09:59:35] <wikibugs_>	 (PS3) Giuseppe Lavagetto: Convert to use of the future parser by default [software/puppet-compiler] - https://gerrit.wikimedia.org/r/379569 (https://phabricator.wikimedia.org/T171704)
[10:03:18] <wikibugs_>	 (PS1) Muehlenhoff: Stop including a salt master in the cluster management role [puppet] - https://gerrit.wikimedia.org/r/379712
[10:03:20] <wikibugs_>	 (PS1) Muehlenhoff: Remove obsolete role::salt::masters::production class [puppet] - https://gerrit.wikimedia.org/r/379713
[10:03:47] <wikibugs_>	 (PS1) Addshore: Add AdvancedSearch to extension-list-labs [mediawiki-config] - https://gerrit.wikimedia.org/r/379714
[10:03:49] <wikibugs_>	 (PS1) Addshore: Enable AdvancedSearch on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/379715
[10:04:11] <wikibugs_>	 (CR) Addshore: [C: -1] "not yet" [mediawiki-config] - https://gerrit.wikimedia.org/r/379714 (owner: Addshore)
[10:04:15] <wikibugs_>	 (CR) Addshore: [C: -1] "not yet" [mediawiki-config] - https://gerrit.wikimedia.org/r/379715 (owner: Addshore)
[10:05:11] <wikibugs_>	 (CR) Giuseppe Lavagetto: [C: 2] Convert to use of the future parser by default [software/puppet-compiler] - https://gerrit.wikimedia.org/r/379569 (https://phabricator.wikimedia.org/T171704) (owner: Giuseppe Lavagetto)
[10:11:05] <wikibugs_>	 (CR) Ema: [C: 2] bgp: FSM can be in states != ST_IDLE when the connection is closed [debs/pybal] (1.14) - https://gerrit.wikimedia.org/r/379570 (https://phabricator.wikimedia.org/T173028) (owner: Ema)
[10:13:40] <icinga-wm>	 PROBLEM - SSH on copper is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:13:47] <_joe_>	 what's up with copper?
[10:14:12] <_joe_>	 moritzm: are you logged in? I can't login in fact
[10:14:33] <elukey>	 https://grafana.wikimedia.org/dashboard/file/server-board.json?var-server=copper&refresh=1m&orgId=1
[10:14:36] <elukey>	 it looks a bit overloaded
[10:14:44] <ema>	 _joe_: there's an icinga critical for copper SSH
[10:14:54] <_joe_>	 ema: that's what I was responding to indeed
[10:15:00] <_joe_>	 and yes, it seems overloaded
[10:15:05] <_joe_>	 I was asking myself by what
[10:15:13] <ema>	 oh yeah I see :) I can't ssh currently
[10:15:25] <ema>	 now I'm in
[10:15:26] <_joe_>	 it already happened at 7 AM utc
[10:15:31] <icinga-wm>	 RECOVERY - SSH on copper is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0)
[10:15:36] <_joe_>	 I'm wondering what was going on
[10:17:13] <moritzm>	 don't think it's related to the hhvm build, that was idling due to a build error
[10:32:39] <wikibugs_>	 (PS6) ArielGlenn: Move dataset rsync config manifests to dumps module [puppet] - https://gerrit.wikimedia.org/r/379668 (https://phabricator.wikimedia.org/T175528)
[10:33:24] <wikibugs_>	 (CR) ArielGlenn: [C: 2] Move dataset rsync config manifests to dumps module [puppet] - https://gerrit.wikimedia.org/r/379668 (https://phabricator.wikimedia.org/T175528) (owner: ArielGlenn)
[10:38:48] <wikibugs_>	 (PS1) Giuseppe Lavagetto: puppet-compiler: bump to version 0.3.4 [puppet] - https://gerrit.wikimedia.org/r/379717 (https://phabricator.wikimedia.org/T171704)
[10:39:09] <wikibugs_>	 (PS2) Giuseppe Lavagetto: puppet-compiler: bump to version 0.3.4 [puppet] - https://gerrit.wikimedia.org/r/379717 (https://phabricator.wikimedia.org/T171704)
[10:39:21] <wikibugs_>	 (CR) Giuseppe Lavagetto: [V: 2 C: 2] puppet-compiler: bump to version 0.3.4 [puppet] - https://gerrit.wikimedia.org/r/379717 (https://phabricator.wikimedia.org/T171704) (owner: Giuseppe Lavagetto)
[10:41:26] <wikibugs_>	 (CR) Giuseppe Lavagetto: [C: 2] "PCC looks good https://puppet-compiler.wmflabs.org/compiler03/7987/phab1001.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/379560 (https://phabricator.wikimedia.org/T176392) (owner: Paladox)
[10:41:48] <wikibugs_>	 (PS10) Giuseppe Lavagetto: Phabricator: Fix aphlict systemd script [puppet] - https://gerrit.wikimedia.org/r/379560 (https://phabricator.wikimedia.org/T176392) (owner: Paladox)
[10:42:17] <wikibugs_>	 (PS1) Jcrespo: Repool db1055 with low weight after maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/379718
[10:43:13] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s5 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 621785.68 seconds
[10:43:50] <_joe_>	 paladox: applying in a minute to phab1001
[10:44:14] <_joe_>	 thanks for working on this
[10:44:17] <wikibugs_>	 (CR) Jcrespo: [C: -1] "Not until the buffer pool warms up: https://grafana.wikimedia.org/dashboard/db/mysql?panelId=1&fullscreen&orgId=1&var-dc=eqiad%20prometheu" [mediawiki-config] - https://gerrit.wikimedia.org/r/379718 (owner: Jcrespo)
[10:45:25] <_joe_>	 lol @ puppet
[10:45:55] <_joe_>	 it's telling me aphlict failed, but it hasn't
[10:46:04] <_joe_>	 actually, the fix worked very well
[10:46:21] <_joe_>	 paladox: good job
[10:47:13] <icinga-wm>	 PROBLEM - puppet last run on phab1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[aphlict]
[10:47:13] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s5 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 621923.97 seconds
[10:48:10] <wikibugs_>	 (CR) Hashar: [C: 1] contint: docker-ce on labs docker slaves (1 comment) [puppet] - https://gerrit.wikimedia.org/r/379556 (https://phabricator.wikimedia.org/T176267) (owner: Hashar)
[10:48:17] <_joe_>	 that failure on phab1001 is bogus
[10:48:54] <wikibugs_>	 Operations, Phabricator, Release-Engineering-Team, Patch-For-Review: The aphlict systemd unit needs to be rewritten from scratch - https://phabricator.wikimedia.org/T176392#3626648 (Joe) Thanks to @Paladox work on this, the aphlict service unit now handles correctly the software.  I am going to m...
[10:49:13] <icinga-wm>	 RECOVERY - puppet last run on phab1001 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[10:49:14] <wikibugs_>	 Operations, Phabricator, Release-Engineering-Team, Patch-For-Review: The aphlict systemd unit needs to be rewritten from scratch - https://phabricator.wikimedia.org/T176392#3626649 (Joe) Open>Resolved a:Paladox
[10:49:52] <icinga-wm>	 PROBLEM - SSH on copper is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:50:39] <ema>	 mmh, again
[10:50:52] <icinga-wm>	 RECOVERY - SSH on copper is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0)
[10:52:00] <wikibugs_>	 (PS1) Sbisson: RCFilters: cleanup unused variable [mediawiki-config] - https://gerrit.wikimedia.org/r/379719
[10:54:12] <wikibugs_>	 (PS1) Elukey: profile::kafka::broker: add the cluster label to the prometheus metrics [puppet] - https://gerrit.wikimedia.org/r/379720 (https://phabricator.wikimedia.org/T175922)
[10:58:15] <wikibugs_>	 (CR) Elukey: [C: 2] "https://puppet-compiler.wmflabs.org/compiler02/7989/ looks good!" [puppet] - https://gerrit.wikimedia.org/r/379720 (https://phabricator.wikimedia.org/T175922) (owner: Elukey)
[11:05:39] <wikibugs_>	 (CR) Elukey: "Example of kafka metrics in here:" [puppet] - https://gerrit.wikimedia.org/r/377753 (https://phabricator.wikimedia.org/T175922) (owner: Elukey)
[11:05:47] <wikibugs_>	 Operations, hardware-requests: New package builder host - https://phabricator.wikimedia.org/T176472#3626672 (MoritzMuehlenhoff)
[11:15:23] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s5 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 622276.83 seconds
[11:33:45] <ema>	 `.
[11:33:47] <ema>	 `.
[11:34:08] <ema>	 heh
[11:40:22] <wikibugs_>	 Operations, Traffic, Wikimedia-Shop, HTTPS: store.wikimedia.org HTTPS issues - https://phabricator.wikimedia.org/T128559#3626730 (Jseddon) Hey @BBlack,  Been working on this over the last week.  The short: We have HSTS but it's set to 90 days. Shopify have confirmed that this can be extended in le...
[11:45:36] <wikibugs_>	 Puppet, Trebuchet: Trebuchet master should be separate from scap - https://phabricator.wikimedia.org/T96042#3626742 (MoritzMuehlenhoff) Open>declined Trebuchet has been removed.
[11:50:13] <icinga-wm>	 PROBLEM - puppet last run on ms-be1023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:50:39] <wikibugs_>	 Operations, Phabricator, Release-Engineering-Team, Patch-For-Review: The aphlict systemd unit needs to be rewritten from scratch - https://phabricator.wikimedia.org/T176392#3626762 (Paladox) @Joe thanks :)  Yeh we can remove Ubuntu / upstart support.
[12:00:27] <wikibugs_>	 (PS2) Hashar: contint: docker-ce on labs docker slaves [puppet] - https://gerrit.wikimedia.org/r/379556 (https://phabricator.wikimedia.org/T176267)
[12:00:29] <wikibugs_>	 (PS1) Hashar: Decouple profile::ci::docker and arcanist install [puppet] - https://gerrit.wikimedia.org/r/379726 (https://phabricator.wikimedia.org/T176267)
[12:00:33] <wikibugs_>	 (PS1) Hashar: Decouple profile::ci::docker and zuul-cloner install [puppet] - https://gerrit.wikimedia.org/r/379727 (https://phabricator.wikimedia.org/T176267)
[12:00:35] <wikibugs_>	 (PS1) Hashar: Decouple profile::ci::docker and worker_localhost [puppet] - https://gerrit.wikimedia.org/r/379728 (https://phabricator.wikimedia.org/T176267)
[12:00:37] <wikibugs_>	 (PS1) Hashar: Move jenkins agent username to hiera [puppet] - https://gerrit.wikimedia.org/r/379729
[12:02:37] <hashar>	 !log apt-get upgrade on contint1001 / contint2001 
[12:02:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:11:26] <wikibugs_>	 (PS1) Milimetric: [WIP] Add druid options to AQS config [puppet] - https://gerrit.wikimedia.org/r/379730
[12:17:23] <icinga-wm>	 RECOVERY - puppet last run on ms-be1023 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[12:19:47] <wikibugs_>	 (CR) Jcrespo: [C: 2] Repool db1055 with low weight after maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/379718 (owner: Jcrespo)
[12:20:12] <wikibugs_>	 (PS1) Jcrespo: Revert "mariadb: Depool db1055" [mediawiki-config] - https://gerrit.wikimedia.org/r/379731
[12:23:36] <wikibugs_>	 (Merged) jenkins-bot: Repool db1055 with low weight after maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/379718 (owner: Jcrespo)
[12:24:01] <wikibugs_>	 (CR) jenkins-bot: Repool db1055 with low weight after maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/379718 (owner: Jcrespo)
[12:26:48] <wikibugs_>	 Operations, ops-eqiad, Patch-For-Review, User-Elukey, User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3626861 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['mw1321.eqiad.wmnet'] ```  Of which those **FAILED**: ``` ['mw1321.eqiad.wmnet'] ```
[12:30:23] <jynus>	 did I miss the log with the deployment log or did it actually show?
[12:31:32] <jynus>	 I think some of the bots stopped working, not sure which ones: https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:32:37] <wikibugs_>	 (PS1) Elukey: profile::kafka::broker: remove graphite metrics config [puppet] - https://gerrit.wikimedia.org/r/379734 (https://phabricator.wikimedia.org/T175922)
[12:34:50] <wikibugs_>	 Operations, ops-eqiad, Patch-For-Review, User-Elukey, User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3626867 (ops-monitoring-bot) Script wmf_auto_reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` mw1320.eqiad.wmnet ``` The log can be foun...
[12:36:22] <wikibugs_>	 (CR) Elukey: [C: 2] "https://puppet-compiler.wmflabs.org/compiler02/7990/" [puppet] - https://gerrit.wikimedia.org/r/379734 (https://phabricator.wikimedia.org/T175922) (owner: Elukey)
[12:42:55] <wikibugs_>	 (PS1) Volans: wmf-auto-reimage: bugfix variable reference [puppet] - https://gerrit.wikimedia.org/r/379742
[12:44:42] <icinga-wm>	 PROBLEM - puppet last run on mw1162 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:45:42] <wikibugs_>	 (PS2) Volans: wmf-auto-reimage: bugfix variable reference [puppet] - https://gerrit.wikimedia.org/r/379742
[12:46:58] <wikibugs_>	 (CR) Volans: [C: 2] wmf-auto-reimage: bugfix variable reference [puppet] - https://gerrit.wikimedia.org/r/379742 (owner: Volans)
[13:01:51] <wikibugs_>	 (PS2) Hashar: Move jenkins agent username to hiera [puppet] - https://gerrit.wikimedia.org/r/379729
[13:06:21] <wikibugs_>	 (CR) Hashar: "For production hosts, the puppet compiler seems all happy about it https://puppet-compiler.wmflabs.org/compiler02/7991/ :]" [puppet] - https://gerrit.wikimedia.org/r/379729 (owner: Hashar)
[13:07:51] <wikibugs_>	 Operations, Traffic, Wikimedia-Shop, HTTPS: store.wikimedia.org HTTPS issues - https://phabricator.wikimedia.org/T128559#3626970 (BBlack) Thanks for the updates!  Even a 90d HSTS without the preload/includeSub flags is better than nothing.  If we can get the time extended out to 1y that's even be...
[13:09:40] <wikibugs_>	 Operations, Traffic, Wikimedia-Shop, HTTPS: store.wikimedia.org HTTPS issues - https://phabricator.wikimedia.org/T128559#3626971 (BBlack)
[13:13:03] <icinga-wm>	 RECOVERY - puppet last run on mw1162 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[13:16:38] <wikibugs_>	 Operations, Operations-Software-Development, Goal, Patch-For-Review, Technical-Debt: Sunset our use of Salt - https://phabricator.wikimedia.org/T164780#3626988 (MoritzMuehlenhoff)
[13:16:46] <wikibugs_>	 (PS1) Jcrespo: Revert "mariadb: Depool db1055" [mediawiki-config] - https://gerrit.wikimedia.org/r/379750
[13:16:55] <wikibugs_>	 (CR) jerkins-bot: [V: -1] Revert "mariadb: Depool db1055" [mediawiki-config] - https://gerrit.wikimedia.org/r/379750 (owner: Jcrespo)
[13:17:16] <wikibugs_>	 (Abandoned) Jcrespo: Revert "mariadb: Depool db1055" [mediawiki-config] - https://gerrit.wikimedia.org/r/379731 (owner: Jcrespo)
[13:17:42] <wikibugs_>	 (Abandoned) Jcrespo: Revert "mariadb: Depool db1055" [mediawiki-config] - https://gerrit.wikimedia.org/r/379750 (owner: Jcrespo)
[13:21:46] <wikibugs_>	 Operations, ops-eqiad, DBA, Phabricator: Decommission db1048 (was Move m3 slave to db1059) - https://phabricator.wikimedia.org/T175679#3626995 (jcrespo) @mmodell This is still needed, but this and the next week are going to be problematic. As a heads up, we may need to merge some puppet changes s...
[13:22:52] <wikibugs_>	 Operations, ops-eqiad, DBA, Phabricator: Decommission db1048 (was Move m3 slave to db1059) - https://phabricator.wikimedia.org/T175679#3626998 (mmodell) @jcrespo: Thanks, I'll keep an eye out for it.
[13:25:52] <wikibugs_>	 (PS4) Zoranzoki21: Fix problem with throttle rule for John Michael Kohler Art Center. [mediawiki-config] - https://gerrit.wikimedia.org/r/379661 (https://phabricator.wikimedia.org/T176287)
[13:27:05] <wikibugs_>	 (PS5) Zoranzoki21: Fix problem with throttle rule for John Michael Kohler Art Center. [mediawiki-config] - https://gerrit.wikimedia.org/r/379661 (https://phabricator.wikimedia.org/T176287)
[13:27:11] <wikibugs_>	 Operations, Mail: Upgrade mx1001/mx2001 to stretch - https://phabricator.wikimedia.org/T175361#3627013 (MoritzMuehlenhoff) >>! In T175361#3621879, @herron wrote: > # Provision a mx2001 replacement, say mx2002, test it and then cut the public IPs of mx2001 over to mx2002.  Potentially rename it back to mx...
[13:30:12] <wikibugs_>	 (PS1) Jcrespo: Pool db1101 as recentchanges replica for s2 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/379756 (https://phabricator.wikimedia.org/T176311)
[13:32:01] <wikibugs_>	 (PS1) Jcrespo: Pool db1055 with full weight, remove main traffic from rc replicas [mediawiki-config] - https://gerrit.wikimedia.org/r/379757
[13:45:20] <wikibugs_>	 Operations, Discovery, Maps-Sprint, Maps (Kartographer), and 2 others: nodejs 6.11 - https://phabricator.wikimedia.org/T170548#3627066 (debt)
[13:45:25] <wikibugs_>	 Operations, Maps-Sprint, Maps (Kartotherian): Upgrade kartotherian and tilerator to nodejs 6.11 - https://phabricator.wikimedia.org/T171707#3627064 (debt) Open>Resolved Woohoo!  🎉
[13:49:16] <wikibugs_>	 Operations, Discovery, Maps, Maps-Sprint, and 2 others: Make maps active / active - https://phabricator.wikimedia.org/T162362#3627067 (debt) Open>Resolved Thanks @BBlack and @Gehel !
[13:49:45] <elukey>	 !log mw1321 (new appserver) serving traffic (going to increase its weight up to 20)
[13:49:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:13:46] <wikibugs_>	 Operations, cloud-services-team (Kanban): puppet ca_server confusion - https://phabricator.wikimedia.org/T176437#3627211 (Andrew) As far as I can see, the docs only describe setting ca_server once, for agents, in the [main] block.  I am missing an explanation of why we would set it twice, and what settin...
[14:15:31] <wikibugs_>	 (PS1) Muehlenhoff: Remove role::salt::masters::labs::project_master [puppet] - https://gerrit.wikimedia.org/r/379763
[14:32:08] <wikibugs_>	 (CR) Jdlrobson: "VolkerE Yeh this just needs a SWAT. Ping me if you need help learning about how to do that." [mediawiki-config] - https://gerrit.wikimedia.org/r/377406 (https://phabricator.wikimedia.org/T175670) (owner: VolkerE)
[14:42:48] <moritzm>	 !log updated tor packages to 0.3.1.7 (new stable series)
[14:43:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:43:29] <moritzm>	 !log uploaded php-luasandbox build for src:php5.5 (required for CI tests on jessie)
[14:43:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:46] <moritzm>	 ^ hashar: that should've been the last one. when you have a patch to switch the tests to apt.wikimedia.org, add me to reviewers and I'll look into merging it
[14:45:16] <wikibugs_>	 (CR) Jcrespo: [C: 2] Pool db1101 as recentchanges replica for s2 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/379756 (https://phabricator.wikimedia.org/T176311) (owner: Jcrespo)
[14:46:51] <wikibugs_>	 (Merged) jenkins-bot: Pool db1101 as recentchanges replica for s2 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/379756 (https://phabricator.wikimedia.org/T176311) (owner: Jcrespo)
[14:47:02] <wikibugs_>	 (CR) jenkins-bot: Pool db1101 as recentchanges replica for s2 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/379756 (https://phabricator.wikimedia.org/T176311) (owner: Jcrespo)
[14:49:56] <logmsgbot>	 !log jynus@tin Synchronized wmf-config/db-eqiad.php: repool db1101 with low weight (duration: 00m 47s)
[14:50:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:50:15] <wikibugs_>	 10Operations, 10fundraising-tech-ops, 10netops: move frav1001's to the frack-fundraising VLAN so we can use it for database testing - https://phabricator.wikimedia.org/T176492#3627380 (10Jgreen)
[14:50:33] <wikibugs_>	 10Operations, 10fundraising-tech-ops, 10netops: move frav1001's to the frack-fundraising VLAN so we can use it for database testing - https://phabricator.wikimedia.org/T176492#3627397 (10Jgreen)
[14:50:35] <wikibugs_>	 10Operations, 10fundraising-tech-ops, 10netops: move frav1001's to the frack-fundraising VLAN so we can use it for database testing - https://phabricator.wikimedia.org/T176492#3627399 (10Jgreen) a:05Jgreen>03None
[14:50:49] <wikibugs_>	 10Operations, 10fundraising-tech-ops, 10netops: move frav1001's to the frack-fundraising VLAN so we can use it for database testing - https://phabricator.wikimedia.org/T176492#3627380 (10Jgreen) p:05High>03Triage
[14:51:59] <wikibugs_>	 (03PS1) 10Muehlenhoff: Remove role::salt::masters::labs from labcontrol* hosts [puppet] - 10https://gerrit.wikimedia.org/r/379770
[15:05:22] <wikibugs_>	 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3627421 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['mw1320.eqiad.wmnet'] ```  Of which those **FAILED**: ``` ['mw1320.eqiad.wmnet'] ```
[15:14:42] <wikibugs_>	 10Operations, 10hardware-requests: New package builder host - https://phabricator.wikimedia.org/T176472#3626672 (10RobH) This seems like it doesn't need much space on the disks, the smallest spare eqiad system I have that meets the other requirements (32GB RAM), we have a few options.  We have an older spare W...
[15:17:21] <wikibugs_>	 10Operations, 10hardware-requests: New package builder host - https://phabricator.wikimedia.org/T176472#3627478 (10MoritzMuehlenhoff) WMF4727 sounds like a pretty good fit (if we can swap copper's SSD drives in there (since they currently have SATA)?)
[15:21:42] <wikibugs_>	 10Operations, 10Mail: Upgrade mx1001/mx2001 to stretch - https://phabricator.wikimedia.org/T175361#3627483 (10akosiaris) >>! In T175361#3621879, @herron wrote: > Looking more closely at how to pull mx2001 out of service for an OS reload it is more complicated than I originally thought.  We have ~100 dns zones...
[15:21:53] <hashar>	 !log Restarted Jenkins (out of memory)
[15:22:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:23:46] <wikibugs_>	 10Operations, 10hardware-requests: New package builder host - https://phabricator.wikimedia.org/T176472#3627486 (10RobH) Copper is a very old R310, which has cabled HDD with LFF bays.  The SFF SSDs fit in, since it is a non-hot-swap chassis.  If we want to move the old SSDs from copper into the new host, it wi...
[15:27:47] <wikibugs_>	 10Operations, 10hardware-requests: New package builder host - https://phabricator.wikimedia.org/T176472#3627493 (10MoritzMuehlenhoff) Or maybe let's go ahead with the SATAs as currently used in WMF4727 (which still is a much faster system than copper). Package building isn't the most I/O bound task we're runni...
[15:31:07] <icinga-wm>	 PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 30 probes of 286 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[15:33:50] <bblack>	 oh look ripe-atlas-codfw again
[15:36:07] <icinga-wm>	 RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 1 probes of 286 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[15:42:00] <wikibugs_>	 10Operations, 10hardware-requests: New package builder host - https://phabricator.wikimedia.org/T176472#3627551 (10akosiaris) IIRC we opened T130759 because slow IO had indeed caused some minor suffering on our part. If we can avoid migrating back to SATA disks easily I think we should. There's one more option...
[15:42:17] <icinga-wm>	 PROBLEM - puppet last run on ganeti1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:53:23] <wikibugs_>	 10Operations, 10hardware-requests: New package builder host - https://phabricator.wikimedia.org/T176472#3627554 (10MoritzMuehlenhoff) My (mild) concern against a Ganeti VM is that some packages might build differently if they detect virtualisation (via systemd-detect-virt or whatever). Not sure if that's an is...
[15:56:29] <wikibugs_>	 10Operations, 10HHVM, 10User-Elukey: Migration of mw* servers to stretch - https://phabricator.wikimedia.org/T174431#3627558 (10elukey)
[15:58:07] <paladox>	 _joe_ you're welcome :). Just got home and saw your IRC messages.
[16:04:50] <wikibugs_>	 10Operations, 10fundraising-tech-ops, 10netops: move frav1001's to the frack-fundraising VLAN so we can use it for database testing - https://phabricator.wikimedia.org/T176492#3627580 (10Jgreen) This also requires an update to the firewall policy; I added the new database and generated the new policy.  com...
[16:10:27] <icinga-wm>	 RECOVERY - puppet last run on ganeti1001 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[16:17:58] <wikibugs_>	 (03PS1) 10Jgreen: switch fundraising test box hostname from frav1001 to frdb1003, adjust IP for new subnet [dns] - 10https://gerrit.wikimedia.org/r/379782 (https://phabricator.wikimedia.org/T176492)
[16:31:17] <wikibugs_>	 (03CR) 10Jgreen: [C: 032] switch fundraising test box hostname from frav1001 to frdb1003, adjust IP for new subnet [dns] - 10https://gerrit.wikimedia.org/r/379782 (https://phabricator.wikimedia.org/T176492) (owner: 10Jgreen)
[16:36:42] <wikibugs_>	 10Operations, 10fundraising-tech-ops, 10netops, 10Patch-For-Review: move frav1001's to the frack-fundraising VLAN so we can use it for database testing - https://phabricator.wikimedia.org/T176492#3627694 (10ayounsi) a:03ayounsi Vlan changed on pfw-eqiad (old) Vlan changed on fasw-c-eqiad (new) Security p...
[16:39:22] <wikibugs_>	 10Operations, 10Goal, 10Kubernetes: Operations Q1 goal: Streamlined Service Delivery - https://phabricator.wikimedia.org/T170108#3627729 (10akosiaris)
[16:39:24] <wikibugs_>	 10Operations, 10Goal, 10Kubernetes: Experiment with ingress solutions (stretch) - https://phabricator.wikimedia.org/T170121#3627726 (10akosiaris) 05Open>03Resolved a:03akosiaris Here's my experimentation results.  = Intro = Ingress resources are just a resource declaration in the kubernetes API, they n...
[16:41:26] <wikibugs_>	 (03CR) 10Alexandros Kosiaris: [C: 031] "I 'll merge on Monday" [puppet] - 10https://gerrit.wikimedia.org/r/379556 (https://phabricator.wikimedia.org/T176267) (owner: 10Hashar)
[16:45:20] <wikibugs_>	 (03PS1) 10Andrew Bogott: labtest: don't override the labtest puppetmaster ca_server [puppet] - 10https://gerrit.wikimedia.org/r/379788
[16:46:08] <wikibugs_>	 (03PS2) 10Andrew Bogott: labtest: don't override the labtest puppetmaster ca_server [puppet] - 10https://gerrit.wikimedia.org/r/379788
[16:46:39] <wikibugs_>	 (03CR) 10Andrew Bogott: [C: 032] labtest: don't override the labtest puppetmaster ca_server [puppet] - 10https://gerrit.wikimedia.org/r/379788 (owner: 10Andrew Bogott)
[16:49:12] <wikibugs_>	 (03PS1) 10ArielGlenn: move fetches of various datasets to dump module from datasets module [puppet] - 10https://gerrit.wikimedia.org/r/379790 (https://phabricator.wikimedia.org/T175528)
[16:49:36] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] move fetches of various datasets to dump module from datasets module [puppet] - 10https://gerrit.wikimedia.org/r/379790 (https://phabricator.wikimedia.org/T175528) (owner: 10ArielGlenn)
[16:50:59] <wikibugs_>	 (03CR) 10ArielGlenn: "I was thinking to collect all these from different parts of the dumps module (where these manifests now are) and pass them in at the profi" [puppet] - 10https://gerrit.wikimedia.org/r/379517 (owner: 10Reedy)
[16:51:57] <wikibugs_>	 10Operations, 10fundraising-tech-ops, 10netops, 10Patch-For-Review: move frav1001's to the frack-fundraising VLAN so we can use it for database testing - https://phabricator.wikimedia.org/T176492#3627748 (10ayounsi) 05Open>03Resolved new policy file worked fine, committed. Don't forget to update rackta...
[17:03:05] <wikibugs_>	 (03PS1) 10Giuseppe Lavagetto: Add support for opinionated build containers [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/379792
[17:03:09] <wikibugs_>	 (03PS1) 10Giuseppe Lavagetto: [WiP] Add runy base image and a fluentd image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/379793
[17:04:56] <wikibugs_>	 (03PS2) 10ArielGlenn: move fetches of various datasets to dump module from datasets module [puppet] - 10https://gerrit.wikimedia.org/r/379790 (https://phabricator.wikimedia.org/T175528)
[17:05:20] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] move fetches of various datasets to dump module from datasets module [puppet] - 10https://gerrit.wikimedia.org/r/379790 (https://phabricator.wikimedia.org/T175528) (owner: 10ArielGlenn)
[17:06:27] <wikibugs_>	 (03PS1) 10Andrew Bogott: Revert "labtest: don't override the labtest puppetmaster ca_server" [puppet] - 10https://gerrit.wikimedia.org/r/379795
[17:07:03] <wikibugs_>	 (03CR) 10Andrew Bogott: [C: 032] Revert "labtest: don't override the labtest puppetmaster ca_server" [puppet] - 10https://gerrit.wikimedia.org/r/379795 (owner: 10Andrew Bogott)
[17:09:14] <wikibugs_>	 (03PS3) 10ArielGlenn: move fetches of various datasets to dump module from datasets module [puppet] - 10https://gerrit.wikimedia.org/r/379790 (https://phabricator.wikimedia.org/T175528)
[17:12:00] <wikibugs_>	 (03CR) 10Dzahn: "aha, so the instance info tells us the creator, Created by" [puppet] - 10https://gerrit.wikimedia.org/r/379499 (owner: 10Elukey)
[17:12:48] <wikibugs_>	 (03CR) 10Dzahn: [C: 031] "oh, somebody already did :), i would if we get a +1 from leszek then merge it" [puppet] - 10https://gerrit.wikimedia.org/r/379499 (owner: 10Elukey)
[17:25:07] <wikibugs_>	 10Operations, 10ops-eqiad: rack/setup/install flerovium.eqiad.wmnet - https://phabricator.wikimedia.org/T176505#3627849 (10RobH)
[17:25:09] <wikibugs_>	 10Operations, 10ops-codfw: rack/setup/install furud.codfw.wmnet - https://phabricator.wikimedia.org/T176506#3627866 (10RobH)
[17:25:30] <wikibugs_>	 10Operations, 10ops-eqiad: relabel WMF3083 as frdb1003 - https://phabricator.wikimedia.org/T176507#3627884 (10Jgreen)
[17:32:32] <wikibugs_>	 (03PS1) 10BBlack: LVS: turn off ip_early_demux [puppet] - 10https://gerrit.wikimedia.org/r/379798
[17:32:34] <wikibugs_>	 (03PS1) 10BBlack: Global: Turn off ethernet flow for all interfaces [puppet] - 10https://gerrit.wikimedia.org/r/379799
[17:32:36] <wikibugs_>	 (03PS1) 10BBlack: LVS: Disable LRO [puppet] - 10https://gerrit.wikimedia.org/r/379800
[17:32:38] <wikibugs_>	 (03PS1) 10BBlack: Caches: Disable LRO [puppet] - 10https://gerrit.wikimedia.org/r/379801
[17:33:07] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Global: Turn off ethernet flow for all interfaces [puppet] - 10https://gerrit.wikimedia.org/r/379799 (owner: 10BBlack)
[17:33:18] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] LVS: Disable LRO [puppet] - 10https://gerrit.wikimedia.org/r/379800 (owner: 10BBlack)
[17:33:36] <wikibugs_>	 (03PS1) 10Madhuvishy: public_dumps: Create initial role for public dumps servers [puppet] - 10https://gerrit.wikimedia.org/r/379802
[17:33:41] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Caches: Disable LRO [puppet] - 10https://gerrit.wikimedia.org/r/379801 (owner: 10BBlack)
[17:35:25] <wikibugs_>	 (03PS2) 10BBlack: Global: Turn off ethernet flow for all interfaces [puppet] - 10https://gerrit.wikimedia.org/r/379799
[17:35:27] <wikibugs_>	 (03PS2) 10BBlack: LVS: Disable LRO [puppet] - 10https://gerrit.wikimedia.org/r/379800
[17:35:29] <wikibugs_>	 (03PS2) 10BBlack: Caches: Disable LRO [puppet] - 10https://gerrit.wikimedia.org/r/379801
[17:36:07] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] LVS: Disable LRO [puppet] - 10https://gerrit.wikimedia.org/r/379800 (owner: 10BBlack)
[17:36:21] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Caches: Disable LRO [puppet] - 10https://gerrit.wikimedia.org/r/379801 (owner: 10BBlack)
[17:36:49] <wikibugs_>	 (03CR) 10Madhuvishy: [C: 032] public_dumps: Create initial role for public dumps servers [puppet] - 10https://gerrit.wikimedia.org/r/379802 (owner: 10Madhuvishy)
[17:37:56] <wikibugs_>	 (03PS3) 10BBlack: LVS: Disable LRO [puppet] - 10https://gerrit.wikimedia.org/r/379800
[17:37:58] <wikibugs_>	 (03PS3) 10BBlack: Caches: Disable LRO [puppet] - 10https://gerrit.wikimedia.org/r/379801
[17:38:41] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Caches: Disable LRO [puppet] - 10https://gerrit.wikimedia.org/r/379801 (owner: 10BBlack)
[17:42:59] <bblack>	 I hate you too jenkins :P
[17:43:09] <wikibugs_>	 (03PS4) 10BBlack: Caches: Disable LRO [puppet] - 10https://gerrit.wikimedia.org/r/379801
[17:43:44] <mutante>	 hrhr
[17:43:50] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Caches: Disable LRO [puppet] - 10https://gerrit.wikimedia.org/r/379801 (owner: 10BBlack)
[17:47:18] <wikibugs_>	 (03PS4) 10BryanDavis: wmcs: Add wikireplica_dns management script [puppet] - 10https://gerrit.wikimedia.org/r/378739 (https://phabricator.wikimedia.org/T174860)
[17:47:55] <wikibugs_>	 (03PS1) 10Andrew Bogott: labs puppetmaster: install observerenv [puppet] - 10https://gerrit.wikimedia.org/r/379804
[17:48:20] <wikibugs_>	 (03CR) 10BryanDavis: "PS4 adds the tools.db.svc.eqiad.wmflabs service name. This is ready to merge." [puppet] - 10https://gerrit.wikimedia.org/r/378739 (https://phabricator.wikimedia.org/T174860) (owner: 10BryanDavis)
[17:48:55] <wikibugs_>	 (03PS5) 10BBlack: Caches: Disable LRO [puppet] - 10https://gerrit.wikimedia.org/r/379801
[17:50:36] <wikibugs_>	 (03CR) 10Andrew Bogott: [C: 032] labs puppetmaster: install observerenv [puppet] - 10https://gerrit.wikimedia.org/r/379804 (owner: 10Andrew Bogott)
[17:54:13] <wikibugs_>	 10Operations, 10Fundraising-Backlog, 10fundraising-tech-ops: Port fundraising stats off Ganglia - https://phabricator.wikimedia.org/T152562#3627931 (10cwdent)
[17:54:15] <wikibugs_>	 10Operations, 10fundraising-tech-ops: Long term storage for frack prometheus data - https://phabricator.wikimedia.org/T175738#3627929 (10cwdent) 05Open>03Resolved We will look into aggregated stats again later but there were spare 1TB disks on the lvs servers so I moved the prometheus backend there and set...
[18:06:06] <wikibugs_>	 (03PS4) 10ArielGlenn: move fetches of various datasets to dump module from datasets module [puppet] - 10https://gerrit.wikimedia.org/r/379790 (https://phabricator.wikimedia.org/T175528)
[18:11:13] <wikibugs_>	 (03Draft1) 10Paladox: Phabricator: Remove ubuntu / upstart support [puppet] - 10https://gerrit.wikimedia.org/r/379794
[18:11:16] <wikibugs_>	 (03PS2) 10Paladox: Phabricator: Remove ubuntu / upstart support [puppet] - 10https://gerrit.wikimedia.org/r/379794
[18:11:24] <wikibugs_>	 (03PS3) 10Paladox: Phabricator: Remove ubuntu / upstart support [puppet] - 10https://gerrit.wikimedia.org/r/379794
[18:18:01] <wikibugs_>	 (03PS1) 10Madhuvishy: public_dumps: Set up initial module and profile, add to role [puppet] - 10https://gerrit.wikimedia.org/r/379810 (https://phabricator.wikimedia.org/T171539)
[18:20:48] <wikibugs_>	 (03CR) 10Madhuvishy: [C: 032] public_dumps: Set up initial module and profile, add to role [puppet] - 10https://gerrit.wikimedia.org/r/379810 (https://phabricator.wikimedia.org/T171539) (owner: 10Madhuvishy)
[18:26:33] <logmsgbot>	 !log demon@tin Pruned MediaWiki: 1.30.0-wmf.18 [keeping static files] (duration: 01m 29s)
[18:26:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:32:04] <wikibugs_>	 (03CR) 10Dzahn: [C: 031] Phabricator: Remove ubuntu / upstart support [puppet] - 10https://gerrit.wikimedia.org/r/379794 (owner: 10Paladox)
[18:32:36] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2008 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:32:46] <icinga-wm>	 PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:32:46] <icinga-wm>	 PROBLEM - Restbase edge codfw on text-lb.codfw.wikimedia.org is CRITICAL: /api/rest_v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:32:47] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1007 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:32:58] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1017 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:32:58] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2011 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:32:58] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2010 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:32:58] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2012 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:32:58] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1013 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:33:06] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1015 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:33:06] <icinga-wm>	 PROBLEM - Restbase LVS codfw on restbase.svc.codfw.wmnet is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:33:06] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2009 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:33:06] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2004 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:33:07] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1016 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:33:07] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:33:07] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2006 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:33:08] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2007 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:33:16] <icinga-wm>	 PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:33:16] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1018 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:33:16] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1011 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:33:16] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1012 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:33:17] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1014 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:33:26] <icinga-wm>	 PROBLEM - Restbase edge ulsfo on text-lb.ulsfo.wikimedia.org is CRITICAL: /api/rest_v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:33:27] <icinga-wm>	 PROBLEM - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is CRITICAL: /api/rest_v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get)
[18:33:39] <greg-g>	 uh
[18:35:37] <mutante>	 urandom: 
[18:35:39] <mutante>	 eh
[18:37:36] <icinga-wm>	 PROBLEM - puppet last run on cp3031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:41:21] <wikibugs_>	 10Operations, 10Edit-Review-Improvements, 10Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017), 10Performance: Systematically test load speeds of Watchlist and Recent Changes - https://phabricator.wikimedia.org/T176445#3628015 (10jmatazzoni)
[18:47:36] <icinga-wm>	 PROBLEM - Host mr1-eqiad.oob IPv6 is DOWN: CRITICAL - Destination Unreachable (2607:f6f0:205::153)
[18:47:46] <icinga-wm>	 PROBLEM - Host mr1-eqiad.oob is DOWN: PING CRITICAL - Packet loss = 100%
[18:51:27] <wikibugs_>	 10Operations, 10Analytics, 10monitoring, 10Patch-For-Review: Eventstreams graphite disk usage - https://phabricator.wikimedia.org/T160644#3628043 (10Nuria)
[18:52:46] <icinga-wm>	 RECOVERY - Host mr1-eqiad.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 7.07 ms
[18:52:56] <icinga-wm>	 RECOVERY - Host mr1-eqiad.oob is UP: PING OK - Packet loss = 0%, RTA = 37.82 ms
[19:05:56] <icinga-wm>	 RECOVERY - puppet last run on cp3031 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[19:13:52] <wikibugs_>	 (03PS1) 10Krinkle: webperf: Fix crash when event contains browser_major:null [puppet] - 10https://gerrit.wikimedia.org/r/379820 (https://phabricator.wikimedia.org/T176149)
[19:15:39] <wikibugs_>	 (03CR) 10Krinkle: "Test fails as expected:" [puppet] - 10https://gerrit.wikimedia.org/r/379820 (https://phabricator.wikimedia.org/T176149) (owner: 10Krinkle)
[19:36:53] <wikibugs_>	 (03CR) 1020after4: [C: 031] phragile: disallow .htaccess usage [puppet] - 10https://gerrit.wikimedia.org/r/379499 (owner: 10Elukey)
[19:39:40] <wikibugs_>	 (03CR) 1020after4: [C: 031] Phabricator: Remove ubuntu / upstart support [puppet] - 10https://gerrit.wikimedia.org/r/379794 (owner: 10Paladox)
[19:41:16] <icinga-wm>	 PROBLEM - puppet last run on mw1296 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:43:25] <wikibugs_>	 (03CR) 10Dzahn: [C: 031] Gerrit: Enable ui for slaves [puppet] - 10https://gerrit.wikimedia.org/r/379420 (owner: 10Paladox)
[19:46:45] <wikibugs_>	 (03PS4) 1020after4: Phabricator: Remove ubuntu / upstart support [puppet] - 10https://gerrit.wikimedia.org/r/379794 (owner: 10Paladox)
[19:47:09] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Phabricator: Remove ubuntu / upstart support [puppet] - 10https://gerrit.wikimedia.org/r/379794 (owner: 10Paladox)
[19:48:10] <wikibugs_>	 (03PS5) 1020after4: Phabricator: Remove ubuntu / upstart support [puppet] - 10https://gerrit.wikimedia.org/r/379794 (owner: 10Paladox)
[19:49:33] <wikibugs_>	 (03CR) 1020after4: [C: 031] Phabricator: Remove ubuntu / upstart support [puppet] - 10https://gerrit.wikimedia.org/r/379794 (owner: 10Paladox)
[19:54:50] <wikibugs_>	 (03PS1) 10Bmansurov: Implement Schema:Print purging strategy [puppet] - 10https://gerrit.wikimedia.org/r/379829 (https://phabricator.wikimedia.org/T175395)
[19:55:13] <wikibugs_>	 (03PS5) 10Krinkle: webperf: Limit by-country navtiming breakdown to those with 5+ hits/min [puppet] - 10https://gerrit.wikimedia.org/r/377806 (https://phabricator.wikimedia.org/T166390)
[19:55:15] <wikibugs_>	 (03PS2) 10Krinkle: webperf: Fix crash when event contains browser_major:null [puppet] - 10https://gerrit.wikimedia.org/r/379820 (https://phabricator.wikimedia.org/T176149)
[19:55:17] <wikibugs_>	 (03PS1) 10Krinkle: [WIP] webperf: Add navtiming tests to puppet.git:/tox.ini [puppet] - 10https://gerrit.wikimedia.org/r/379830
[19:57:39] <wikibugs_>	 (03PS17) 10MarcoAurelio: Initial configuration for hi.wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371109 (https://phabricator.wikimedia.org/T173013)
[19:57:41] <icinga-wm>	 ACKNOWLEDGEMENT - Restbase LVS codfw on restbase.svc.codfw.wmnet is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke There is a bug here that dropped optional fields. However, severity is low (missing optional thumb), so no need to stress over the weekend.
[19:57:41] <icinga-wm>	 ACKNOWLEDGEMENT - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke There is a bug here that dropped optional fields. However, severity is low (missing optional thumb), so no need to stress over the weekend.
[19:57:41] <icinga-wm>	 ACKNOWLEDGEMENT - restbase endpoints health on restbase1007 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke There is a bug here that dropped optional fields. However, severity is low (missing optional thumb), so no need to stress over the weekend.
[19:57:41] <icinga-wm>	 ACKNOWLEDGEMENT - restbase endpoints health on restbase1011 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke There is a bug here that dropped optional fields. However, severity is low (missing optional thumb), so no need to stress over the weekend.
[19:57:41] <icinga-wm>	 ACKNOWLEDGEMENT - restbase endpoints health on restbase1012 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke There is a bug here that dropped optional fields. However, severity is low (missing optional thumb), so no need to stress over the weekend.
[19:57:41] <icinga-wm>	 ACKNOWLEDGEMENT - restbase endpoints health on restbase1013 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke There is a bug here that dropped optional fields. However, severity is low (missing optional thumb), so no need to stress over the weekend.
[19:57:42] <icinga-wm>	 ACKNOWLEDGEMENT - restbase endpoints health on restbase1014 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke There is a bug here that dropped optional fields. However, severity is low (missing optional thumb), so no need to stress over the weekend.
[19:57:42] <icinga-wm>	 ACKNOWLEDGEMENT - restbase endpoints health on restbase1015 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke There is a bug here that dropped optional fields. However, severity is low (missing optional thumb), so no need to stress over the weekend.
[19:57:43] <icinga-wm>	 ACKNOWLEDGEMENT - restbase endpoints health on restbase1016 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke There is a bug here that dropped optional fields. However, severity is low (missing optional thumb), so no need to stress over the weekend.
[19:57:43] <icinga-wm>	 ACKNOWLEDGEMENT - restbase endpoints health on restbase1017 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke There is a bug here that dropped optional fields. However, severity is low (missing optional thumb), so no need to stress over the weekend.
[19:57:44] <icinga-wm>	 ACKNOWLEDGEMENT - restbase endpoints health on restbase1018 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke There is a bug here that dropped optional fields. However, severity is low (missing optional thumb), so no need to stress over the weekend.
[19:57:44] <icinga-wm>	 ACKNOWLEDGEMENT - restbase endpoints health on restbase2002 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke There is a bug here that dropped optional fields. However, severity is low (missing optional thumb), so no need to stress over the weekend.
[19:57:45] <icinga-wm>	 ACKNOWLEDGEMENT - restbase endpoints health on restbase2004 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke There is a bug here that dropped optional fields. However, severity is low (missing optional thumb), so no need to stress over the weekend.
[19:57:45] <icinga-wm>	 ACKNOWLEDGEMENT - restbase endpoints health on restbase2006 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke There is a bug here that dropped optional fields. However, severity is low (missing optional thumb), so no need to stress over the weekend.
[19:57:46] <icinga-wm>	 ACKNOWLEDGEMENT - restbase endpoints health on restbase2007 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke There is a bug here that dropped optional fields. However, severity is low (missing optional thumb), so no need to stress over the weekend.
[19:57:46] <icinga-wm>	 ACKNOWLEDGEMENT - restbase endpoints health on restbase2008 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke There is a bug here that dropped optional fields. However, severity is low (missing optional thumb), so no need to stress over the weekend.
[19:57:47] <icinga-wm>	 ACKNOWLEDGEMENT - restbase endpoints health on restbase2009 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke There is a bug here that dropped optional fields. However, severity is low (missing optional thumb), so no need to stress over the weekend.
[19:57:47] <icinga-wm>	 ACKNOWLEDGEMENT - restbase endpoints health on restbase2010 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke There is a bug here that dropped optional fields. However, severity is low (missing optional thumb), so no need to stress over the weekend.
[19:57:48] <icinga-wm>	 ACKNOWLEDGEMENT - restbase endpoints health on restbase2011 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke There is a bug here that dropped optional fields. However, severity is low (missing optional thumb), so no need to stress over the weekend.
[19:57:48] <icinga-wm>	 ACKNOWLEDGEMENT - restbase endpoints health on restbase2012 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke There is a bug here that dropped optional fields. However, severity is low (missing optional thumb), so no need to stress over the weekend.
[19:58:35] <wikibugs_>	 (03PS5) 10Andrew Bogott: WIP: nova: turn off hourly instance usage audits [puppet] - 10https://gerrit.wikimedia.org/r/377187
[19:58:55] <icinga-wm>	 ACKNOWLEDGEMENT - Restbase edge codfw on text-lb.codfw.wikimedia.org is CRITICAL: /api/rest_v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke Non-critical bug that dropped optional response properties.
[19:58:55] <icinga-wm>	 ACKNOWLEDGEMENT - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is CRITICAL: /api/rest_v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke Non-critical bug that dropped optional response properties.
[19:58:55] <icinga-wm>	 ACKNOWLEDGEMENT - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke Non-critical bug that dropped optional response properties.
[19:58:55] <icinga-wm>	 ACKNOWLEDGEMENT - Restbase edge ulsfo on text-lb.ulsfo.wikimedia.org is CRITICAL: /api/rest_v1/page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage responds with malformed body (AttributeError: NoneType object has no attribute get): gwicke Non-critical bug that dropped optional response properties.
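Note on the acknowledged alerts above: the AttributeError they report is the usual symptom of chaining `.get()` on an optional field that is missing from the response. Below is a minimal Python sketch, assuming a hypothetical summary body in which `thumbnail` is optional. The field names and the helper `thumbnail_source` are illustrative assumptions, not taken from the actual check script or the RESTBase code.

```python
def thumbnail_source(summary):
    """Return the thumbnail URL, or None when the optional field is absent."""
    # summary.get("thumbnail") yields None when the key is missing (or null);
    # calling .get() on that None is what raises
    # "'NoneType' object has no attribute 'get'".
    thumbnail = summary.get("thumbnail") or {}
    return thumbnail.get("source")


# Hypothetical response bodies, for illustration only.
with_thumb = {"title": "Foo", "thumbnail": {"source": "https://example.org/foo.jpg"}}
without_thumb = {"title": "Bar"}

print(thumbnail_source(with_thumb))
print(thumbnail_source(without_thumb))
```

Falling back to an empty dict keeps the lookup chain safe whether the optional property is absent or explicitly null.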
[20:02:48] <wikibugs_>	 (03PS2) 10Bmansurov: Implement Schema:Print purging strategy [puppet] - 10https://gerrit.wikimedia.org/r/379829 (https://phabricator.wikimedia.org/T175395)
[20:05:06] <wikibugs_>	 (03PS2) 10Krinkle: [WIP] webperf: Add navtiming tests to puppet.git:/tox.ini [puppet] - 10https://gerrit.wikimedia.org/r/379830
[20:05:38] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] [WIP] webperf: Add navtiming tests to puppet.git:/tox.ini [puppet] - 10https://gerrit.wikimedia.org/r/379830 (owner: 10Krinkle)
[20:05:42] <wikibugs_>	 10Operations, 10Traffic, 10Wikidata, 10wikiba.se, 10Wikidata-Sprint-2016-11-08: [Task] move wikiba.se webhosting to wikimedia misc-cluster - https://phabricator.wikimedia.org/T99531#3628244 (10Dzahn) Hey @Lydia_Pintscher Happy to work on this and talk to you maybe on IRC as well.  I would say one of the...
[20:06:11] <wikibugs_>	 (03PS3) 10Krinkle: [WIP] webperf: Add navtiming tests to puppet.git:/tox.ini [puppet] - 10https://gerrit.wikimedia.org/r/379830
[20:08:33] <wikibugs_>	 (03PS4) 10Krinkle: [WIP] webperf: Add navtiming tests to puppet.git:/tox.ini [puppet] - 10https://gerrit.wikimedia.org/r/379830
[20:10:52] <wikibugs_>	 (03CR) 10MarcoAurelio: "Wouldn't it be simpler if all changes to operations/mediawiki-config be in a single patch? Not wishing to put myself as an example but it " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378401 (https://phabricator.wikimedia.org/T176042) (owner: 10Ladsgroup)
[20:11:30] <wikibugs_>	 (03PS5) 10Krinkle: [WIP] webperf: Add navtiming tests to puppet.git:/tox.ini [puppet] - 10https://gerrit.wikimedia.org/r/379830
[20:11:32] <wikibugs_>	 (03PS3) 10Krinkle: webperf: Fix crash when event contains browser_major:null [puppet] - 10https://gerrit.wikimedia.org/r/379820 (https://phabricator.wikimedia.org/T176149)
[20:12:13] <wikibugs_>	 (03CR) 10Chad: "What Marco said: let's please deploy new wikis with as much initial configuration as possible." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378401 (https://phabricator.wikimedia.org/T176042) (owner: 10Ladsgroup)
[20:12:15] <wikibugs_>	 (03CR) 10Ladsgroup: "meh, I love small patches." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378401 (https://phabricator.wikimedia.org/T176042) (owner: 10Ladsgroup)
[20:12:26] <icinga-wm>	 RECOVERY - puppet last run on mw1296 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[20:13:54] <wikibugs_>	 (03CR) 10Ladsgroup: "okay, I fix it." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378401 (https://phabricator.wikimedia.org/T176042) (owner: 10Ladsgroup)
[20:18:53] <wikibugs_>	 (03CR) 10MarcoAurelio: Add config for amwikimedia (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378400 (https://phabricator.wikimedia.org/T176042) (owner: 10Ladsgroup)
[20:21:36] <wikibugs_>	 (03PS6) 10Krinkle: [WIP] webperf: Add navtiming tests to puppet.git:/tox.ini [puppet] - 10https://gerrit.wikimedia.org/r/379830
[20:21:38] <wikibugs_>	 (03PS4) 10Krinkle: webperf: Fix crash when event contains browser_major:null [puppet] - 10https://gerrit.wikimedia.org/r/379820 (https://phabricator.wikimedia.org/T176149)
[20:22:24] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] webperf: Fix crash when event contains browser_major:null [puppet] - 10https://gerrit.wikimedia.org/r/379820 (https://phabricator.wikimedia.org/T176149) (owner: 10Krinkle)
[20:25:11] <wikibugs_>	 (03PS7) 10Krinkle: [WIP] webperf: Add navtiming tests to puppet.git:/tox.ini [puppet] - 10https://gerrit.wikimedia.org/r/379830
[20:25:13] <wikibugs_>	 (03PS5) 10Krinkle: webperf: Fix crash when event contains browser_major:null [puppet] - 10https://gerrit.wikimedia.org/r/379820 (https://phabricator.wikimedia.org/T176149)
[20:26:05] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] webperf: Fix crash when event contains browser_major:null [puppet] - 10https://gerrit.wikimedia.org/r/379820 (https://phabricator.wikimedia.org/T176149) (owner: 10Krinkle)
[20:28:49] <wikibugs_>	 10Operations, 10Edit-Review-Improvements, 10Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017), 10Performance: Systematically test load speeds of Watchlist and Recent Changes - https://phabricator.wikimedia.org/T176445#3628281 (10jmatazzoni)
[20:30:07] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2002 is OK: All endpoints are healthy
[20:30:07] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2007 is OK: All endpoints are healthy
[20:30:16] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1007 is OK: All endpoints are healthy
[20:30:16] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1017 is OK: All endpoints are healthy
[20:30:17] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2008 is OK: All endpoints are healthy
[20:30:26] <icinga-wm>	 RECOVERY - Restbase edge ulsfo on text-lb.ulsfo.wikimedia.org is OK: All endpoints are healthy
[20:30:27] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2011 is OK: All endpoints are healthy
[20:30:27] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1016 is OK: All endpoints are healthy
[20:30:36] <icinga-wm>	 RECOVERY - Restbase edge codfw on text-lb.codfw.wikimedia.org is OK: All endpoints are healthy
[20:30:36] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1013 is OK: All endpoints are healthy
[20:30:36] <icinga-wm>	 RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy
[20:30:37] <icinga-wm>	 RECOVERY - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is OK: All endpoints are healthy
[20:30:37] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1018 is OK: All endpoints are healthy
[20:30:37] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2010 is OK: All endpoints are healthy
[20:30:37] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2012 is OK: All endpoints are healthy
[20:30:46] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1011 is OK: All endpoints are healthy
[20:30:47] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1012 is OK: All endpoints are healthy
[20:30:47] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1015 is OK: All endpoints are healthy
[20:30:47] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1014 is OK: All endpoints are healthy
[20:30:56] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2009 is OK: All endpoints are healthy
[20:30:57] <icinga-wm>	 RECOVERY - Restbase LVS codfw on restbase.svc.codfw.wmnet is OK: All endpoints are healthy
[20:31:06] <icinga-wm>	 RECOVERY - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is OK: All endpoints are healthy
[20:31:06] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2004 is OK: All endpoints are healthy
[20:31:06] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2006 is OK: All endpoints are healthy
[20:32:17] <wikibugs_>	 10Operations, 10Edit-Review-Improvements, 10Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017), 10Performance: Systematically test load speeds of Watchlist and Recent Changes - https://phabricator.wikimedia.org/T176445#3628285 (10jmatazzoni)
[20:39:39] <wikibugs_>	 10Operations, 10Ops-Access-Requests: Requesting access to stat1005 for Slaporte - https://phabricator.wikimedia.org/T176518#3628320 (10Slaporte)
[20:40:02] <wikibugs_>	 10Operations, 10Ops-Access-Requests: Requesting access to stat1005 for Slaporte - https://phabricator.wikimedia.org/T176518#3628337 (10Slaporte)
[20:42:12] <wikibugs_>	 10Operations, 10Ops-Access-Requests: Requesting access to stat1005 for Slaporte - https://phabricator.wikimedia.org/T176518#3628343 (10ZhouZ) I can confirm Stephen is taking over this.
[20:43:17] <wikibugs_>	 (03PS8) 10Krinkle: [WIP] webperf: Add navtiming tests to puppet.git:/tox.ini [puppet] - 10https://gerrit.wikimedia.org/r/379830
[20:43:19] <wikibugs_>	 (03PS6) 10Krinkle: webperf: Fix crash when event contains browser_major:null [puppet] - 10https://gerrit.wikimedia.org/r/379820 (https://phabricator.wikimedia.org/T176149)
[20:44:10] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] webperf: Fix crash when event contains browser_major:null [puppet] - 10https://gerrit.wikimedia.org/r/379820 (https://phabricator.wikimedia.org/T176149) (owner: 10Krinkle)
[20:47:35] <wikibugs_>	 10Operations, 10Ops-Access-Requests: Requesting access to stat1005 for Slaporte - https://phabricator.wikimedia.org/T176518#3628347 (10Zoranzoki21) a:03Zoranzoki21
[20:50:08] <tabbycat>	 hmm... don't we need a C-level stuff to take over on that? ^
[20:50:17] <tabbycat>	 s/stuff/staff
[20:53:46] <wikibugs_>	 (03PS4) 10Hashar: contint: php5.5 on permanent slaves [puppet] - 10https://gerrit.wikimedia.org/r/377529
[20:55:00] <wikibugs_>	 (03PS5) 10Hashar: contint: php5.5 on permanent slaves [puppet] - 10https://gerrit.wikimedia.org/r/377529 (https://phabricator.wikimedia.org/T174972)
[20:58:12] <wikibugs_>	 (03Draft2) 10Zoranzoki21: Access for Slaporte (Stephen LaPorte) to stat1005 [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518)
[21:00:11] <wikibugs_>	 (03CR) 10Zoranzoki21: "I do not know what to add in uid.. I added uid: 176518" [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: 10Zoranzoki21)
[21:01:01] <wikibugs_>	 (03CR) 10Hashar: [C: 031] "Prior patchsets were using an aptly repo on labs. Now that the package are uploaded on apt.wikimedia.org in jessie-wikimedia/component/ci " [puppet] - 10https://gerrit.wikimedia.org/r/377529 (https://phabricator.wikimedia.org/T174972) (owner: 10Hashar)
[21:04:15] <p858snake>	 tabbycat: can you -2 the patchset? ops haven't reviewed it, for a start
[21:04:32] <p858snake>	 UID looks wrong
[21:04:51] <tabbycat>	 p858snake: I can't -2
[21:05:20] <tabbycat>	 I'm a simple user :)
[21:07:36] <icinga-wm>	 PROBLEM - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is CRITICAL: CRITICAL: 57.14% of data above the critical threshold [140.0]
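Note on the alert above: a figure such as 57.14% is what you get when 4 of 7 sampled datapoints exceed the threshold. The sketch below shows that arithmetic in Python. The sample values, the strict "greater than" comparison, and the helper name `percent_above` are assumptions made for illustration, not the actual check's implementation.

```python
def percent_above(samples, threshold):
    """Percentage of samples strictly above the threshold, rounded to 2 places."""
    if not samples:
        return 0.0
    above = sum(1 for value in samples if value > threshold)
    return round(100.0 * above / len(samples), 2)


# Hypothetical queue depths: 4 of the 7 values exceed 140.0.
queue_depths = [90, 120, 135, 150, 180, 210, 260]
print(percent_above(queue_depths, 140.0))  # prints 57.14
```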
[21:12:29] <wikibugs_>	 (03CR) 10Hashar: [C: 04-1] Access for Slaporte (Stephen LaPorte) to stat1005 (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: 10Zoranzoki21)
[21:12:56] <hashar>	 p858snake: tabbycat: yeah the uid is random :)
[21:13:06] <hashar>	 I added some basic comments on the patchset
[21:14:23] <wikibugs_>	 10Operations, 10Ops-Access-Requests: Requesting access to stat1005 for Slaporte - https://phabricator.wikimedia.org/T176518#3628320 (10hashar) Note the public key is used on labs and IIRC access to production requires a different ssh key.  @Zoranzoki21 provided a patch in Gerrit at https://gerrit.wikimedia.org...
[21:15:04] <wikibugs_>	 (03PS4) 10Halfak: Adds myspell-lv package to ores::base Switches myspell-uk to aspell-uk (better package) Bug: T175628 Bug: T175627 Change-Id: I6d917712a44f404a9a2737c4c58df12f4ee15547 [puppet] - 10https://gerrit.wikimedia.org/r/377327 (https://phabricator.wikimedia.org/T175628)
[21:15:33] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Adds myspell-lv package to ores::base Switches myspell-uk to aspell-uk (better package) Bug: T175628 Bug: T175627 Change-Id: I6d917712a44f404a9a2737c4c58df12f4ee15547 [puppet] - 10https://gerrit.wikimedia.org/r/377327 (https://phabricator.wikimedia.org/T175628) (owner: 10Halfak)
[21:15:53] <wikibugs_>	 (03PS5) 10Halfak: Adds myspell-lv package to ores::base Switches myspell-uk to aspell-uk (better package) Bug: T175628 Bug: T175627 Change-Id: I6d917712a44f404a9a2737c4c58df12f4ee15547 [puppet] - 10https://gerrit.wikimedia.org/r/377327 (https://phabricator.wikimedia.org/T175628)
[21:16:20] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Adds myspell-lv package to ores::base Switches myspell-uk to aspell-uk (better package) Bug: T175628 Bug: T175627 Change-Id: I6d917712a44f404a9a2737c4c58df12f4ee15547 [puppet] - 10https://gerrit.wikimedia.org/r/377327 (https://phabricator.wikimedia.org/T175628) (owner: 10Halfak)
[21:17:20] <icinga-wm>	 ACKNOWLEDGEMENT - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [140.0] amusso Transient mass changes on mediawiki/core
[21:18:20] <wikibugs_>	 (03CR) 10Hashar: Adds myspell-lv package to ores::base Switches myspell-uk to aspell-uk (better package) Bug: T175628 Bug: T175627 Change-Id: I6d917712a44f40 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/377327 (https://phabricator.wikimedia.org/T175628) (owner: 10Halfak)
[21:18:38] <hashar>	 halfak: hi you are missing a newline in the commit message :]
[21:19:17] <halfak>	 hashar, where?  between the message and "Bug: "?
[21:19:32] <wikibugs_>	 (03PS3) 10Zoranzoki21: Access for Slaporte (Stephen LaPorte) to stat1005 [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518)
[21:20:03] <wikibugs_>	 (03PS6) 10Halfak: Adds myspell-lv package to ores::base Switches myspell-uk to aspell-uk (better package) [puppet] - 10https://gerrit.wikimedia.org/r/377327 (https://phabricator.wikimedia.org/T175628)
[21:20:06] <halfak>	 Arg! 
[21:20:09] <halfak>	 That should do it
[21:20:24] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Adds myspell-lv package to ores::base Switches myspell-uk to aspell-uk (better package) [puppet] - 10https://gerrit.wikimedia.org/r/377327 (https://phabricator.wikimedia.org/T175628) (owner: 10Halfak)
[21:20:52] <wikibugs_>	 (03PS4) 10Zoranzoki21: Access for Slaporte (Stephen LaPorte) to stat1005 [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518)
[21:21:08] <wikibugs_>	 (03CR) 10Zoranzoki21: Access for Slaporte (Stephen LaPorte) to stat1005 (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: 10Zoranzoki21)
[21:22:29] <hashar>	 halfak: you should be able to reproduce locally though by just running "tox" :)
[21:25:51] <wikibugs_>	 (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: 10Zoranzoki21)
[21:26:40] <wikibugs_>	 (03CR) 10Zoranzoki21: "@Hashar How to I work recheck?" [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: 10Zoranzoki21)
[21:26:55] <wikibugs_>	 (03CR) 10Hashar: "Thanks Zoranzoki21 :)" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: 10Zoranzoki21)
[21:27:24] * halfak runs tox
[21:27:42] <hashar>	 halfak: especially tox -e commit-message
[21:27:50] <hashar>	 that is a Python tool that validates... the commit message! :]
[21:28:01] <legoktm>	 hashar: https://www.mediawiki.org/wiki/Commit-message-validator has instructions on how to set it up as a git hook
[21:28:07] <legoktm>	 halfak: ^
[21:28:15] <hashar>	 magic
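Note on the exchange above: the problem hashar points out (no blank line between the subject and the rest of the message, with the footers running on) can be illustrated with a much-simplified check. This is only a sketch under stated assumptions. It does not reproduce the rules of the commit-message-validator that legoktm links, and the function name, the set of footer prefixes, and the error strings are all hypothetical.

```python
# Footer prefixes assumed for this sketch; the real tool knows more of them.
FOOTER_PREFIXES = ("Bug: ", "Change-Id: ")


def check_commit_message(message):
    """Return a list of problems found; an empty list means the message passed."""
    errors = []
    lines = message.strip().split("\n")
    subject = lines[0]
    # Footers squashed into the subject line.
    if any(prefix in subject for prefix in FOOTER_PREFIXES):
        errors.append("subject line contains a footer")
    # The subject must be followed by a blank line.
    if len(lines) > 1 and lines[1].strip():
        errors.append("second line is not blank")
    # The footer block must be preceded by a blank line.
    for index, line in enumerate(lines[1:], start=1):
        previous = lines[index - 1]
        if (line.startswith(FOOTER_PREFIXES) and previous.strip()
                and not previous.startswith(FOOTER_PREFIXES)):
            errors.append("no blank line before footers")
            break
    return errors


bad = ("Adds myspell-lv package to ores::base\n"
       "Switches myspell-uk to aspell-uk (better package)\n"
       "Bug: T175628\n"
       "Bug: T175627\n")
good = ("ores::base: add myspell-lv, use aspell-uk\n"
        "\n"
        "aspell-uk is the better package.\n"
        "\n"
        "Bug: T175628\n"
        "Bug: T175627\n")

print(check_commit_message(bad))
print(check_commit_message(good))
```

The `good` message here is an invented rewording for the example, not the text of the patchset that was eventually uploaded.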
[21:28:43] <hashar>	 on those good words, I am off for the weekend! Happy hacking everyone :]
[21:29:01] <wikibugs_>	 (03PS7) 10Halfak: Adds myspell-lv, myspell-uk to aspell-uk to ores::base [puppet] - 10https://gerrit.wikimedia.org/r/377327 (https://phabricator.wikimedia.org/T175628)
[21:29:09] <halfak>	 o/ hashar 
[21:29:12] <halfak>	 thanks for the help
[21:29:15] <halfak>	 also legoktm :D
[21:30:45] <wikibugs_>	 (03CR) 10Zoranzoki21: "Ok @Hashar" [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: 10Zoranzoki21)
[21:31:49] * hashar waves
[21:32:05] <wikibugs_>	 10Operations, 10Ops-Access-Requests: Requesting access to stat1005 for Slaporte - https://phabricator.wikimedia.org/T176518#3628562 (10Zoranzoki21) >>! In T176518#3628460, @hashar wrote: > Note the public key is used on labs and IIRC access to production requires a different ssh key. >  > @Zoranzoki21 provided...
[21:35:46] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s7 on dbstore1001 is CRITICAL: CRITICAL slave_io_state could not connect
[21:35:46] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: m3 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state could not connect
[21:35:46] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s5 on dbstore1001 is CRITICAL: CRITICAL slave_io_state could not connect
[21:35:46] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag could not connect
[21:35:46] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s5 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state could not connect
[21:35:47] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s2 on dbstore1001 is CRITICAL: CRITICAL slave_io_state could not connect
[21:35:47] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s1 on dbstore1001 is CRITICAL: CRITICAL slave_io_state could not connect
[21:35:48] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s6 on dbstore1001 is CRITICAL: CRITICAL slave_io_state could not connect
[21:35:48] <icinga-wm>	 PROBLEM - MariaDB Slave IO: m3 on dbstore1001 is CRITICAL: CRITICAL slave_io_state could not connect
[21:35:49] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s1 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state could not connect
[21:35:49] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s2 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state could not connect
[21:35:56] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s3 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state could not connect
[21:35:56] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s3 on dbstore1001 is CRITICAL: CRITICAL slave_io_state could not connect
[21:35:56] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s4 on dbstore1001 is CRITICAL: CRITICAL slave_io_state could not connect
[21:35:56] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: m2 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state could not connect
[21:35:56] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: x1 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state could not connect
[21:35:57] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s7 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state could not connect
[21:35:57] <icinga-wm>	 PROBLEM - MariaDB Slave IO: m2 on dbstore1001 is CRITICAL: CRITICAL slave_io_state could not connect
[21:35:58] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag could not connect
[21:36:17] <icinga-wm>	 PROBLEM - MariaDB Slave IO: x1 on dbstore1001 is CRITICAL: CRITICAL slave_io_state could not connect
[21:36:27] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s4 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state could not connect
[21:36:27] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s6 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state could not connect
[21:36:30] <wikibugs_>	 (03PS1) 10Greg Grossmeier: admin: Add gjg to contint-admin [puppet] - 10https://gerrit.wikimedia.org/r/379932
[21:38:51] <wikibugs_>	 10Operations, 10Ops-Access-Requests: Requesting access to stat1005 for Slaporte - https://phabricator.wikimedia.org/T176518#3628614 (10Slaporte) >>! In T176518#3628460, @hashar wrote: > Note the public key is used on labs and IIRC access to production requires a different ssh key.  Here is a different key:  ``...
[21:39:46] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s7 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[21:39:46] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s5 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[21:39:46] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: m3 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[21:39:46] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s2 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[21:39:47] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s1 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[21:39:47] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s6 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[21:39:47] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s5 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[21:39:48] <icinga-wm>	 RECOVERY - MariaDB Slave IO: m3 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[21:39:48] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s1 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[21:39:49] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s2 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[21:39:56] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s3 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[21:39:57] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s3 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[21:39:57] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: m2 on dbstore1001 is OK: OK slave_sql_state not a slave
[21:39:57] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s4 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[21:39:57] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: x1 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[21:39:57] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s7 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[21:39:57] <icinga-wm>	 RECOVERY - MariaDB Slave IO: m2 on dbstore1001 is OK: OK slave_io_state not a slave
[21:40:17] <icinga-wm>	 RECOVERY - MariaDB Slave IO: x1 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[21:40:26] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s4 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[21:40:26] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s6 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[21:51:56] <wikibugs_>	 (03PS5) 10Zoranzoki21: Access for Slaporte (Stephen LaPorte) to stat1005 [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518)
[21:52:41] <wikibugs_>	 10Operations, 10Ops-Access-Requests: Requesting access to stat1005 for Slaporte - https://phabricator.wikimedia.org/T176518#3628639 (10Zoranzoki21) >>! In T176518#3628614, @Slaporte wrote: >>>! In T176518#3628460, @hashar wrote: >> Note the public key is used on labs and IIRC access to production requires a di...
[21:53:13] <wikibugs_>	 (03CR) 10Zoranzoki21: Access for Slaporte (Stephen LaPorte) to stat1005 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: 10Zoranzoki21)
[22:08:48] <wikibugs_>	 10Operations, 10Ops-Access-Requests: Request access to logstash (nda group) for @framawiki - https://phabricator.wikimedia.org/T176364#3628679 (10Zoranzoki21) a:03Zoranzoki21 I support this. I will made a patch when https://gerrit.wikimedia.org/r/#/c/379851/ and https://gerrit.wikimedia.org/r/#/c/379932/1 be...
[22:10:04] <wikibugs_>	 (03CR) 10Zoranzoki21: [C: 031] admin: Add gjg to contint-admin [puppet] - 10https://gerrit.wikimedia.org/r/379932 (owner: 10Greg Grossmeier)
[22:13:50] <wikibugs_>	 10Operations, 10Ops-Access-Requests, 10WMF-NDA-Requests: Request access to logstash (nda group) for @framawiki - https://phabricator.wikimedia.org/T176364#3628686 (10zhuyifei1999)
[22:14:18] <wikibugs_>	 10Operations, 10Ops-Access-Requests, 10WMF-NDA-Requests: Request access to logstash (nda group) for @framawiki - https://phabricator.wikimedia.org/T176364#3622856 (10MarcoAurelio) @Zoranzoki21 This needs to be supported/approved by some people, NDAS, etc. I suggest you un-claim the task so the relevant peopl...
[22:14:22] <wikibugs_>	 (03PS1) 10Andrew Bogott: labtest: include salt profile on labtestcontrol [puppet] - 10https://gerrit.wikimedia.org/r/379940
[22:14:47] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] labtest: include salt profile on labtestcontrol [puppet] - 10https://gerrit.wikimedia.org/r/379940 (owner: 10Andrew Bogott)
[22:15:22] <wikibugs_>	 10Operations, 10Ops-Access-Requests, 10WMF-NDA-Requests: Request access to logstash (nda group) for @framawiki - https://phabricator.wikimedia.org/T176364#3628690 (10Zoranzoki21) a:05Zoranzoki21>03None >>! In T176364#3628688, @MarcoAurelio wrote: > @Zoranzoki21 This needs to be supported/approved by some...
[22:15:53] <wikibugs_>	 10Operations, 10Ops-Access-Requests, 10WMF-NDA-Requests: Request access to logstash (nda group) for @framawiki - https://phabricator.wikimedia.org/T176364#3628693 (10Dereckson) To better understand your request, could you give a sample of tasks you would like to create with LogStash access?  At what frequenc...
[22:17:23] <wikibugs_>	 (03PS2) 10Andrew Bogott: labtest: include salt profile on labtestcontrol [puppet] - 10https://gerrit.wikimedia.org/r/379940
[22:17:59] <wikibugs_>	 10Operations, 10LDAP-Access-Requests, 10WMF-NDA-Requests: Request access to logstash (nda group) for @framawiki - https://phabricator.wikimedia.org/T176364#3628696 (10zhuyifei1999)
[22:18:23] <wikibugs_>	 (03CR) 10Andrew Bogott: [C: 032] labtest: include salt profile on labtestcontrol [puppet] - 10https://gerrit.wikimedia.org/r/379940 (owner: 10Andrew Bogott)
[22:19:10] <wikibugs_>	 (03CR) 10Dzahn: Access for Slaporte (Stephen LaPorte) to stat1005 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: 10Zoranzoki21)
[22:20:38] <wikibugs_>	 (03CR) 10Zoranzoki21: Access for Slaporte (Stephen LaPorte) to stat1005 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: 10Zoranzoki21)
[22:21:23] <tabbycat>	 Dereckson: so does https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170925T1100 look right to you?
[22:21:35] <wikibugs_>	 (03CR) 10Dzahn: Fix problem with throttle rule for John Michael Kohler Art Center. (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379661 (https://phabricator.wikimedia.org/T176287) (owner: 10Zoranzoki21)
[22:21:47] <wikibugs_>	 (03PS6) 10Zoranzoki21: Access for Slaporte (Stephen LaPorte) to stat1005 [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518)
[22:22:39] <wikibugs_>	 (03CR) 10Dzahn: [C: 031] admin: Add gjg to contint-admin [puppet] - 10https://gerrit.wikimedia.org/r/379932 (owner: 10Greg Grossmeier)
[22:23:01] <Dereckson>	 tabbycat: looking
[22:23:23] <Niharika>	 tabbycat: Why are you backporting https://gerrit.wikimedia.org/r/#/c/356362/?
[22:23:43] <wikibugs_>	 (CR) Zoranzoki21: "@Dzahn Email address changed on @wikimedia.org domain, per email from new generated ssh key" [puppet] - https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: Zoranzoki21)
[22:23:52] <tabbycat>	 Niharika: it was requested in a task... wasn't it needed?
[22:24:12] <Niharika>	 tabbycat: It's been long deployed with the train. 
[22:24:32] <Niharika>	 tabbycat: See https://github.com/wikimedia/mediawiki-extensions-BlockAndNuke/blob/master/BanPests.php
[22:24:47] <tabbycat>	 Niharika: sorry, I'm so noob... Does that mean that no backport to other REL* are needed?
[22:25:50] <Niharika>	 tabbycat: Yep, no backports are needed. Anything that gets +2ed and merged goes out with the train unless somebody wants it out sooner. In that case cherry-picking and SWAT are required.
[22:26:01] <Niharika>	 The train runs every week.
[22:26:09] * tabbycat is confused
[22:26:21] <tabbycat>	 so... what are RELs for?
[22:26:30] <greg-g>	 backports to REL_ branches don't get pushed to production, we only use those for tarballs
[22:26:34] <greg-g>	 ^
[22:26:34] <zhuyifei1999_>	 releases
[22:26:44] <tabbycat>	 block and nuke ain't deployed to the wikimedia cluster though
[22:27:18] <greg-g>	 if they are needed for a new tarball release of those REL_ branches, then sure
[22:27:36] <tabbycat>	 okay, let me get this... so once the change in master was merged... it also gets added to the other REL* ?
[22:27:47] <tabbycat>	 see T173687
[22:27:48] <stashbot>	 T173687: Block and Nuke broken in REL1_27 branch due to whitelist truncation - please backport patch from master - https://phabricator.wikimedia.org/T173687
[22:28:09] <greg-g>	 tabbycat: no, I think there was confusion from others at the beginning
[22:28:28] <Niharika>	 Ah, I didn't know about the tarball being different from production.
[22:28:30] <greg-g>	 tl;dr: you did it right, if this is something that needs to be added to those past releases so that a new release can be created for them
[22:28:48] <Niharika>	 My bad. Sorry for the confusion tabbycat.
[22:28:55] <greg-g>	 :)
[22:29:37] <tabbycat>	 greg-g: I think that's what they requested, because the extension is 'broken' from REL1_27 onwards
[22:29:41] <mutante>	 !log gerrit2001 - systemd says gerrit.service is failed. gerrit.sh start says "Already Running!!" :p - cobalt is fine
[22:29:43] <tabbycat>	 Niharika: no problem! :)
[22:29:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:30:06] <tabbycat>	 I still have a lot to learn in this new world of the development
[22:31:27] * tabbycat still remembers the first time he read 'cherry-pick' and was like... wtf have cherries to do with software o_O
[22:32:03] <Niharika>	 :D
[22:32:24] <tabbycat>	 let's hope they don't create a 'potato-pick' or something else
[22:32:38] <tabbycat>	 or a 'goat-pick'... goats are in fashion on Wikimedia lately
[22:33:50] <Niharika>	 Pray no German hears that...
[22:34:18] <mutante>	 !log gerrit2001 - stopping gerrit with gerrit.sh stop, letting puppet start it again
[22:34:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:35:57] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag could not connect
[22:36:16] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag could not connect
[22:37:57] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s5 on dbstore1001 is CRITICAL: CRITICAL slave_io_state could not connect
[22:37:57] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s7 on dbstore1001 is CRITICAL: CRITICAL slave_io_state could not connect
[22:37:57] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s6 on dbstore1001 is CRITICAL: CRITICAL slave_io_state could not connect
[22:37:57] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s1 on dbstore1001 is CRITICAL: CRITICAL slave_io_state could not connect
[22:37:57] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s5 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state could not connect
[22:37:57] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: m3 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state could not connect
[22:37:57] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s2 on dbstore1001 is CRITICAL: CRITICAL slave_io_state could not connect
[22:38:06] <icinga-wm>	 PROBLEM - MariaDB Slave IO: m3 on dbstore1001 is CRITICAL: CRITICAL slave_io_state could not connect
[22:38:07] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s2 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state could not connect
[22:38:07] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s1 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state could not connect
[22:38:16] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s3 on dbstore1001 is CRITICAL: CRITICAL slave_io_state could not connect
[22:38:16] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: x1 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state could not connect
[22:38:16] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: m2 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state could not connect
[22:38:16] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s7 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state could not connect
[22:38:16] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s3 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state could not connect
[22:38:17] <icinga-wm>	 PROBLEM - MariaDB Slave IO: m2 on dbstore1001 is CRITICAL: CRITICAL slave_io_state could not connect
[22:38:17] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s4 on dbstore1001 is CRITICAL: CRITICAL slave_io_state could not connect
[22:38:36] <icinga-wm>	 PROBLEM - MariaDB Slave IO: x1 on dbstore1001 is CRITICAL: CRITICAL slave_io_state could not connect
[22:38:37] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s4 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state could not connect
[22:38:37] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s6 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state could not connect
[22:43:06] <icinga-wm>	 RECOVERY - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0]
[22:43:11] <wikibugs_>	 (PS6) Andrew Bogott: nova: turn off hourly instance usage audits [puppet] - https://gerrit.wikimedia.org/r/377187
[22:43:35] <wikibugs_>	 (CR) Andrew Bogott: [C: 1] "I don't see this causing any harm on labtest.  I'll merge in prod when I'm going to be around for a while to watch." [puppet] - https://gerrit.wikimedia.org/r/377187 (owner: Andrew Bogott)
[22:45:06] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s6 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag could not connect
[22:45:06] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag could not connect
[22:45:36] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: x1 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag could not connect
[22:45:37] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s7 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag could not connect
[22:45:37] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: m2 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag could not connect
[22:45:37] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: m3 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag could not connect
[22:45:46] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s4 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag could not connect
[22:48:05] <wikibugs_>	 (PS7) Krinkle: webperf: Fix crash when event contains browser_major:null [puppet] - https://gerrit.wikimedia.org/r/379820 (https://phabricator.wikimedia.org/T176149)
[22:52:07] <paladox>	 no_justification hi, do you know why we comment this
[22:52:08] <paladox>	 # '-Dlog4j.configuration=file:///var/lib/gerrit2/review_site/etc/log4j.properties',
[22:52:09] <paladox>	 out?
[22:52:19] <paladox>	 please
[22:52:50] <no_justification>	 Uh cuz it wasn't working. Did it not get uncommented when we fixed the settings?
[22:53:19] <paladox>	 nope
[22:53:20] <paladox>	 was wondering why it was commented out.
[22:53:21] <paladox>	 thanks for explaining
[22:53:29] <no_justification>	 That'd explain why fixed settings didn't do anything
[22:53:33] <wikibugs_>	 (Draft1) Paladox: Gerrit: Remove gc logging [puppet] - https://gerrit.wikimedia.org/r/379946
[22:53:36] <wikibugs_>	 (PS2) Paladox: Gerrit: Remove gc logging [puppet] - https://gerrit.wikimedia.org/r/379946
[22:54:08] <wikibugs_>	 (CR) Chad: "Let's comment them out instead so we have them for reference again if need be" [puppet] - https://gerrit.wikimedia.org/r/379946 (owner: Paladox)
[22:54:17] <wikibugs_>	 (CR) Chad: "Maybe with a comment what they're for" [puppet] - https://gerrit.wikimedia.org/r/379946 (owner: Paladox)
[22:55:13] <wikibugs_>	 (PS3) Paladox: Gerrit: Remove gc logging [puppet] - https://gerrit.wikimedia.org/r/379946
[22:56:31] <mutante>	 !log gerrit2001 - trying to manually start gerrit again, as opposed to puppet doing it.. debugging gerrit-ssh issue there, cobalt still untouched
[22:56:33] <wikibugs_>	 (CR) Chad: [C: 1] Gerrit: Remove gc logging [puppet] - https://gerrit.wikimedia.org/r/379946 (owner: Paladox)
[22:56:42] <paladox>	 thanks :)
[22:56:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:57:10] <mutante>	 it also fails to start when i just manually use gerrit.sh start 
[22:57:15] <paladox>	 hmm
[22:57:17] <paladox>	 try 
[22:57:19] <mutante>	 so should not even be systemd related
[22:57:20] <paladox>	 bin/gerrit.sh run
[22:57:58] <paladox>	 that should at least give us something since it doesn't seem to be writing to the logs
[22:58:00] <mutante>	 whats the difference between running and starting
[22:58:09] <paladox>	 running shows everything
[22:58:16] <mutante>	 Starting Gerrit Code Review: FAILED
[22:58:22] <mutante>	 Running Gerrit Code Review:
[22:58:27] <mutante>	 Already Running!!
[22:58:37] <paladox>	 it shows you what the log should show
[22:58:42] <paladox>	 hmm
[22:58:46] <paladox>	 try bin/gerrit.sh stop
[22:58:49] <paladox>	 bin/gerrit.sh run
[23:00:11] <wikibugs_>	 (PS1) Krinkle: Enable jQuery 3 on Wiktionary sites [mediawiki-config] - https://gerrit.wikimedia.org/r/379947 (https://phabricator.wikimedia.org/T124742)
[23:00:13] <wikibugs_>	 (PS1) Krinkle: Enable jQuery 3 on most group1 wikis (non-Wikipedia) [mediawiki-config] - https://gerrit.wikimedia.org/r/379948 (https://phabricator.wikimedia.org/T124742)
[23:00:15] <wikibugs_>	 (PS1) Krinkle: Enable jQuery 3 on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/379949
[23:02:12] <paladox>	 mutante did bin/gerrit.sh run show anything?
[23:02:53] <mutante>	 paladox: yes, what i pasted, it shouts at me "Already Running!!"
[23:02:59] <paladox>	 ok
[23:03:06] <mutante>	 ok, one sec
[23:03:07] <paladox>	 mutante try stopping and then run run
[23:03:09] <paladox>	 ok
[23:03:10] <paladox>	 thanks
[23:04:15] <mutante>	 doing , stop/run
[23:04:24] <paladox>	 thanks
[23:04:29] <mutante>	 yes, it calls it "Running" as opposed to "starting"
[23:04:37] <wikibugs_>	 (PS2) Krinkle: Enable jQuery 3 on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/379949 (https://phabricator.wikimedia.org/T124742)
[23:05:23] <wikibugs_>	 Operations, Ops-Access-Requests: Requesting access to production bastions for cwdent - https://phabricator.wikimedia.org/T176529#3628771 (cwdent)
[23:05:33] <paladox>	 yep
[23:05:36] <mutante>	 waits
[23:07:03] <mutante>	 paladox: it's like it runs but still doesnt start the ssh daemon
[23:07:16] <mutante>	 no output where i typed "run" but also no failure and doesnt quit
[23:07:25] <paladox>	 does it show anything in log format
[23:07:28] <paladox>	 in the screen
[23:07:29] <paladox>	 ?
[23:07:48] <mutante>	 no, just "Running Gerrit Code Review: " and sits there
[23:08:06] <mutante>	 tries something else
[23:08:23] <paladox>	 hmm
[23:08:49] <mutante>	 unlinks plugin dir
[23:08:52] <mutante>	 gerrit.sh start
[23:09:38] <paladox>	 i guess we found a problem. Somehow it's failing to allow gerrit to start on gerrit2001
[23:10:00] <mutante>	 well yea :)
[23:10:05] <paladox>	 lets do a reboot to see if that will clear anything that is blocking it.
[23:10:12] <mutante>	 and it all started since i restarted the service once
[23:10:27] <paladox>	 ah
[23:10:30] <paladox>	 lets try a reboot
[23:10:44] <paladox>	 it could be a proc
[23:11:08] <mutante>	 i see the gerrit process running and not running .. when i do start/stop
[23:11:23] <mutante>	 there are 2 types of behaviour:
[23:11:30] <mutante>	 a) gerrit service itself fails to start
[23:11:38] <mutante>	 b) gerrit service runs 
[23:11:44] <mutante>	 but in both cases.. gerrit-ssh never runs anymore
[23:11:49] <paladox>	 hmm
[23:11:49] <mutante>	 and it used to before
[23:11:54] <paladox>	 lets try a reboot.
[23:12:18] <mutante>	 doesnt want to go the windows route just yet...
[23:12:30] <paladox>	 ok
[23:13:04] <paladox>	 aha
[23:13:10] <paladox>	 bin/gerrit.sh start
[23:13:15] <paladox>	 then bin/gerrit.sh supervise
[23:13:31] <paladox>	 gerrit2@gerrit-test3:~/review_site$ bin/gerrit.sh supervise
[23:13:31] <paladox>	 [2017-09-22 23:13:05,211] [main] INFO  com.google.gerrit.server.cache.h2.H2CacheFactory : Enabling disk cache /var/lib/gerrit2/review_site/cache
[23:13:32] <paladox>	 [2017-09-22 23:13:05,746] [main] WARN  com.google.gerrit.server.config.AdministrateServerGroupsProvider : Group "ldap/ops" not available, skipping.
[23:13:32] <paladox>	 [2017-09-22 23:13:06,783] [main] INFO  com.google.gerrit.server.config.ScheduleConfig : gc schedule parameter "gc.interval" is not configured
[23:13:32] <paladox>	 [2017-09-22 23:13:06,783] [main] INFO  com.google.gerrit.server.config.ScheduleConfig : changeCleanup schedule parameter "changeCleanup.startTime" is not configured
[23:13:36] <mutante>	 but "start" said FAILED and exited again.. after some time 
[23:13:47] <mutante>	 tries
[23:14:44] <mutante>	 yea, always that nice combo where it first says FAILED but then "Already Running" too
[23:15:07] <paladox>	 hmm
[23:15:09] <mutante>	 it's confused and that is when i'm not even using puppet or unit file, just the gerrit.sh 
[23:15:14] <mutante>	 wth
[23:15:53] <mutante>	   GERRIT_STARTUP_TIMEOUT =  90
[23:16:02] <paladox>	 90 seconds
[23:16:04] <mutante>	 i think that's when i get the FAILED
[23:16:08] <paladox>	 yeh
[23:16:28] <paladox>	 if it goes past that (even though it's configurable) then it likely indicates a problem
[23:16:37] <paladox>	 it depends on the system though
[23:16:47] <mutante>	 i just repeated "start" and this time it doesnt think it's already running, it tries it again
[23:17:19] <paladox>	 ok
[23:17:45] <mutante>	 "supervise" shows nothing so far
[23:17:51] <paladox>	 hmm
[23:18:26] <mutante>	 during all this.. not a single line in error_log
[23:18:34] <mutante>	 still stopped on 19th
[23:18:52] <mutante>	 am i really doing this as gerrit2 user?
[23:19:03] <mutante>	 it can definitely write into that log
[23:19:13] <paladox>	 hmm
[23:19:16] <paladox>	 if it can write
[23:19:29] <paladox>	 then something is blocking gerrit from doing it
[23:19:34] <mutante>	 i tested writing to it as gerrit2, can
[23:19:58] <paladox>	 ok
[23:20:03] <paladox>	 so gerrit is having the problem
[23:20:17] <paladox>	 but if it works on cobalt, it has to be something other than a gerrit config
[23:20:41] <mutante>	 will not touch cobalt, it's possible it would break there too if we do
[23:21:18] <mutante>	 and it just wasnt restarted since X
[23:21:40] <mutante>	 much better to know first what is the issue on the non-active one
[23:21:51] <mutante>	 but could also be that it's all just because this one uses --slave
[23:22:00] <mutante>	 and without it it wouldnt have the problem
[23:22:01] <paladox>	 i used --slave
[23:22:03] <paladox>	 and works for me
[23:22:06] <mutante>	 ok..
[23:22:07] <paladox>	 writes to logs too
[23:24:12] <mutante>	 oh
[23:24:21] <mutante>	 you know what.. apt-get upgrade would upgrade gerrit here
[23:25:14] <paladox>	 yep
[23:25:17] <paladox>	 ah
[23:25:25] <paladox>	 is there an update?
[23:25:34] <mutante>	 wait, i dont get it. it has 2.13.8+git1-wmf.6 on both of them
[23:26:26] <paladox>	 apt-get update
[23:26:26] <mutante>	 it would install 2.13.8+git1-wmf.7
[23:26:29] <paladox>	 apt-get upgrade
[23:26:38] <mutante>	 but still.. it's not different.. both just on .6 so far
[23:27:06] <paladox>	 hmm
[23:29:26] <mutante>	 yea, i built that package, it's just not installed yet, on either
[23:29:39] <paladox>	 hmm
[23:30:29] <mutante>	 .7 drops the plugins 
[23:30:34] <paladox>	 yep
[23:36:15] <mutante>	 !log gerrit2001 - rebooting .. wave a dead chicken http://www.catb.org/jargon/html/W/wave-a-dead-chicken.html
[23:36:25] <paladox>	 lol
[23:36:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:40:47] <icinga-wm>	 PROBLEM - puppet last run on labtestweb2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/lib/nagios/plugins/check_sysctl]
[23:43:20] <paladox>	 mutante did a reboot work? :)
[23:43:24] <mutante>	 paladox: ..no 
[23:43:29] <paladox>	 oh
[23:46:43] <paladox>	 mutante does this /etc/default/gerritcodereview file exist?
[23:47:09] <mutante>	 yes, it does
[23:47:16] <mutante>	   1 GERRIT_SITE="/var/lib/gerrit2/review_site"
[23:47:21] <mutante>	   2 GERRIT_WAR="/var/lib/gerrit2/review_site/bin/gerrit.war"
[23:47:30] <paladox>	 ok
[23:48:03] <paladox>	 lets try bin/gerrit.sh stop && chmod -R gerrit2:gerrit2 /var/lib/gerrit2/review_site/ && bin/gerrit.sh start
[23:48:16] <paladox>	 https://www.google.co.uk/search?client=safari&channel=ipad_bm&dcr=0&source=hp&q=gerrit+not+starting&oq=gerrit+not+starting&gs_l=psy-ab.3..0.426.3519.0.3716.21.19.0.0.0.0.91.1026.19.19.0....0...1.1.64.psy-ab..2.19.1024.0..35i39k1j0i131k1j0i22i30k1j0i22i10i30k1.0.unB21B3UVgs
[23:51:34] <wikibugs_>	 (PS1) Krinkle: mediawiki/hhvm: Move fatal-error.php to Puppet [puppet] - https://gerrit.wikimedia.org/r/379953 (https://phabricator.wikimedia.org/T113114)
[23:51:54] <mutante>	 chown, not chmod.. it doesnt fix it
[23:52:04] <paladox>	 ok
[23:52:33] <wikibugs_>	 (CR) Krinkle: "Looking for feedback on puppet logic (correct Require? Okay to require that from here? Correct Before?)." [puppet] - https://gerrit.wikimedia.org/r/379953 (https://phabricator.wikimedia.org/T113114) (owner: Krinkle)
[23:53:08] <mutante>	 suspects plugins dir and tries unlinking that again
[23:53:49] <paladox>	 ok
[23:54:18] <mutante>	 none of that makes a difference..
[23:55:10] <mutante>	 does "bash -x" on the old init.d script (all methods fail)
[23:55:13] <paladox>	 ok
[23:55:17] <paladox>	 ah yep
[23:55:19] <paladox>	 bash -x
[23:55:29] <wikibugs_>	 (PS2) Krinkle: mediawiki/hhvm: Move fatal-error.php to Puppet [puppet] - https://gerrit.wikimedia.org/r/379953 (https://phabricator.wikimedia.org/T113114)
[23:55:37] <mutante>	 so it just sits there and keeps checking what is inside the gerrit.run file
[23:56:13] <paladox>	 ok
[23:56:19] <mutante>	 and sleeps and tries again.. until the timeout is reached
[23:56:29] <paladox>	 yeh
[23:57:23] <mutante>	 gerrit does get a pid 
[23:57:43] <paladox>	 yep
[23:57:45] <paladox>	 gerrit.pid
[23:58:45] <paladox>	 i am guessing it's time for running the java command manually
[23:59:18] <paladox>	 java -jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site
[23:59:19] <paladox>	 mutante ^^
[23:59:56] <paladox>	 java -jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site --show-stack-trace