[00:25:57] RECOVERY - MariaDB Slave Lag: s7 on db2061 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[00:27:58] PROBLEM - Check systemd state on restbase-dev1006 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[00:34:18] RECOVERY - MariaDB Slave Lag: s7 on dbstore2001 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[03:26:48] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 847.89 seconds
[03:57:03] Operations, Beta-Cluster-Infrastructure: Mails through deployment-mx SPF & DKIM fails - https://phabricator.wikimedia.org/T87338#4269749 (Krenair) Probably, I should probably add an SPF record allowing this host to send mail
[04:08:35] Operations, Beta-Cluster-Infrastructure: Mails through deployment-mx SPF & DKIM fails - https://phabricator.wikimedia.org/T87338#4269753 (Krenair) I've added SPF and DMARC (p=none) records. Haven't done DKIM yet.
[04:11:18] PROBLEM - MariaDB Slave Lag: s3 on db2094 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 404.09 seconds
[04:13:28] PROBLEM - MariaDB Slave Lag: s3 on db1124 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 434.86 seconds
[04:13:58] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 102.83 seconds
[04:19:57] PROBLEM - MariaDB Slave Lag: s3 on db2036 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 300.79 seconds
[04:19:57] PROBLEM - MariaDB Slave Lag: s3 on db2050 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 300.42 seconds
[04:20:07] RECOVERY - MariaDB Slave Lag: s3 on db1124 is OK: OK slave_sql_lag Replication lag: 48.94 seconds
[04:20:08] PROBLEM - MariaDB Slave Lag: s3 on db2094 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 304.80 seconds
[04:20:28] PROBLEM - MariaDB Slave Lag: s3 on db2074 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 309.82 seconds
[04:20:57] PROBLEM - MariaDB Slave Lag: s3 on db2057 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 313.77 seconds
[04:20:58] PROBLEM - MariaDB Slave Lag: s3 on db2043 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 315.15 seconds
[04:48:47] RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[04:52:07] PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[05:12:58] PROBLEM - MariaDB Slave Lag: s3 on db1124 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 443.26 seconds
[05:18:37] RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[05:19:37] RECOVERY - MariaDB Slave Lag: s3 on db1124 is OK: OK slave_sql_lag Replication lag: 35.17 seconds
[05:21:57] PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[05:44:18] PROBLEM - MariaDB Slave Lag: s7 on dbstore2001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 382.05 seconds
[05:44:47] PROBLEM - MariaDB Slave Lag: s7 on db2061 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 393.64 seconds
[06:22:28] RECOVERY - MariaDB Slave Lag: s3 on db2057 is OK: OK slave_sql_lag Replication lag: 0.46 seconds
[06:22:37] RECOVERY - MariaDB Slave Lag: s3 on db2043 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[06:22:37] RECOVERY - MariaDB Slave Lag: s3 on db2036 is OK: OK slave_sql_lag Replication lag: 0.33 seconds
[06:22:37] RECOVERY - MariaDB Slave Lag: s3 on db2050 is OK: OK slave_sql_lag Replication lag: 0.36 seconds
[06:22:47] RECOVERY - MariaDB Slave Lag: s3 on db2094 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[06:23:08] RECOVERY - MariaDB Slave Lag: s3 on db2074 is OK: OK slave_sql_lag Replication lag: 0.41 seconds
[06:28:19] (PS1) Alex Monk: Followup If545182a: Actually use cert_name now [puppet] - https://gerrit.wikimedia.org/r/439451
[06:30:17] PROBLEM - puppet last run on mw1308 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/cgroup-mediawiki-clean]
[06:30:47] PROBLEM - puppet last run on labstore1003 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/usr/local/sbin/add-ldap-group],File[/etc/update-motd.d/97-last-puppet-run]
[06:45:17] PROBLEM - kubelet operational latencies on kubernetes1003 is CRITICAL: instance=kubernetes1003.eqiad.wmnet operation_type={create_container,image_status,podsandbox_status,remove_container,start_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[06:46:18] RECOVERY - kubelet operational latencies on kubernetes1003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[06:55:38] RECOVERY - puppet last run on mw1308 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[06:56:08] RECOVERY - puppet last run on labstore1003 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[06:57:35] (PS1) Dzahn: wikistats: update name of miraheze import script [puppet] - https://gerrit.wikimedia.org/r/439455 (https://phabricator.wikimedia.org/T191245)
[07:00:39] (PS2) Dzahn: wikistats: update name of miraheze import script [puppet] - https://gerrit.wikimedia.org/r/439455 (https://phabricator.wikimedia.org/T191245)
[07:01:54] (CR) Dzahn: [C: 2] wikistats: update name of miraheze import script [puppet] - https://gerrit.wikimedia.org/r/439455 (https://phabricator.wikimedia.org/T191245) (owner: Dzahn)
[07:03:17] PROBLEM - MariaDB Slave Lag: s3 on db2094 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 302.23 seconds
[07:08:27] PROBLEM - MariaDB Slave Lag: s3 on db2057 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 300.30 seconds
[07:08:28] PROBLEM - MariaDB Slave Lag: s3 on db2043 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 300.56 seconds
[07:08:37] PROBLEM - MariaDB Slave Lag: s3 on db2036 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 301.67 seconds
[07:08:37] PROBLEM - MariaDB Slave Lag: s3 on db2050 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 301.90 seconds
[07:09:07] PROBLEM - MariaDB Slave Lag: s3 on db2074 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 310.27 seconds
[07:12:38] PROBLEM - MariaDB Slave Lag: s3 on db1124 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 406.74 seconds
[07:18:37] RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[07:20:18] RECOVERY - MariaDB Slave Lag: s3 on db1124 is OK: OK slave_sql_lag Replication lag: 0.33 seconds
[07:21:57] PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[07:29:47] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 22 probes of 301 (alerts on 19) - https://atlas.ripe.net/measurements/11645088/#!map
[07:32:46] (PS1) Urbanecm: Fix wgMetaNamespace for pswikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/439457 (https://phabricator.wikimedia.org/T196837)
[07:34:57] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 8 probes of 301 (alerts on 19) - https://atlas.ripe.net/measurements/11645088/#!map
[07:40:45] (PS4) Urbanecm: id_internalwikimedia: register in DNS [dns] - https://gerrit.wikimedia.org/r/438275 (https://phabricator.wikimedia.org/T196747)
[07:43:41] (PS6) Urbanecm: id_internalwikimedia: Initial configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/438279
[07:45:43] (PS4) Urbanecm: id_privatewikimedia: add Apache configuration [puppet] - https://gerrit.wikimedia.org/r/438276 (https://phabricator.wikimedia.org/T196747)
[07:46:06] (PS5) Urbanecm: id_internalwikimedia: add Apache configuration [puppet] - https://gerrit.wikimedia.org/r/438276 (https://phabricator.wikimedia.org/T196747)
[07:48:29] (PS7) Urbanecm: id_internalwikimedia: Initial configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/438279
[08:01:08] RECOVERY - MariaDB Slave Lag: s3 on db2074 is OK: OK slave_sql_lag Replication lag: 23.28 seconds
[08:01:38] RECOVERY - MariaDB Slave Lag: s3 on db2057 is OK: OK slave_sql_lag Replication lag: 0.06 seconds
[08:01:47] RECOVERY - MariaDB Slave Lag: s3 on db2043 is OK: OK slave_sql_lag Replication lag: 0.04 seconds
[08:01:48] RECOVERY - MariaDB Slave Lag: s3 on db2036 is OK: OK slave_sql_lag Replication lag: 0.45 seconds
[08:01:48] RECOVERY - MariaDB Slave Lag: s3 on db2050 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[08:01:57] RECOVERY - MariaDB Slave Lag: s3 on db2094 is OK: OK slave_sql_lag Replication lag: 0.56 seconds
[08:02:38] RECOVERY - MariaDB Slave Lag: s7 on db2061 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[08:03:18] RECOVERY - MariaDB Slave Lag: s7 on dbstore2001 is OK: OK slave_sql_lag Replication lag: 37.95 seconds
[08:27:58] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1969 bytes in 0.092 second response time
[08:48:18] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1971 bytes in 0.074 second response time
[08:54:47] (PS4) EddieGP: mediawiki: Move www.wikimedia.org to wwwportals.conf [puppet] - https://gerrit.wikimedia.org/r/424707 (https://phabricator.wikimedia.org/T173887)
[08:55:20] (CR) EddieGP: "bump - joe, when will you have time to review this?" [puppet] - https://gerrit.wikimedia.org/r/424707 (https://phabricator.wikimedia.org/T173887) (owner: EddieGP)
[09:01:37] (CR) EddieGP: [C: -1] "We will eventually want this, but currently deployment-tin.eqiad.wmflabs is still alive (I just logged in there), definitely not "has been" [puppet] - https://gerrit.wikimedia.org/r/438001 (https://phabricator.wikimedia.org/T192071) (owner: Dzahn)
[09:36:18] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1976 bytes in 0.070 second response time
[09:41:27] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1966 bytes in 0.106 second response time
[09:48:47] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1963 bytes in 0.079 second response time
[09:49:07] RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[09:52:18] PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[09:53:48] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1966 bytes in 0.068 second response time
[10:12:38] PROBLEM - puppet last run on kafka2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas]
[10:26:57] RECOVERY - MariaDB Slave Lag: s6 on db2053 is OK: OK slave_sql_lag Replication lag: 20.66 seconds
[10:26:57] RECOVERY - MariaDB Slave Lag: s6 on db2060 is OK: OK slave_sql_lag Replication lag: 4.46 seconds
[10:27:18] RECOVERY - MariaDB Slave Lag: s6 on db2046 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[10:27:28] RECOVERY - MariaDB Slave Lag: s6 on db2039 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[10:27:37] RECOVERY - MariaDB Slave Lag: s6 on db2067 is OK: OK slave_sql_lag Replication lag: 0.51 seconds
[10:27:37] RECOVERY - MariaDB Slave Lag: s6 on db2087 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[10:27:47] RECOVERY - MariaDB Slave Lag: s6 on db2076 is OK: OK slave_sql_lag Replication lag: 0.46 seconds
[10:27:48] RECOVERY - MariaDB Slave Lag: s6 on db2089 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[10:39:18] RECOVERY - MariaDB Slave Lag: s6 on dbstore2001 is OK: OK slave_sql_lag Replication lag: 0.24 seconds
[10:42:10] Operations, DBA, Gerrit, Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4269936 (Marostegui)
[10:43:17] RECOVERY - puppet last run on kafka2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[10:45:29] Operations, DBA, Gerrit, Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4269948 (Marostegui) p:Triage>High
[10:45:36] Operations, DBA, Gerrit, Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4269949 (greg) Related: The Gerrit upgrade included a migration that created many new git refs. Those are replicated to Phabricator and thus it also had to ingest/index them.
[10:46:23] Operations, DBA, Gerrit, Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4269950 (Marostegui) Ah right! I'm from my phone and cannot check what the writes are. Any ETA for that to be finished?
[10:47:07] Operations, DBA, Gerrit, Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4269951 (greg) Not sure, @mmodell ? @demon ?
[10:53:05] Operations, DBA, Gerrit, Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4269954 (Marostegui) Codfw is lagging behind as it cannot cope with the amount of writes. Not a big deal as it is not used, but it is an indicative of how massive it is. It w...
[11:03:12] Gerrit down?
[11:04:38] Operations, DBA, Gerrit, Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4269955 (Paladox) It will be a long while as phab has to parse all the new commits (notedb) we should probaly try to ignore refs/changes/**/meta in phabricator.
[11:05:17] marostegui: it’s not loading for me
[11:07:41] https://downforeveryoneorjustme.com/gerrit.wikimedia.org
[11:08:19] <_joe_> only the web interface
[11:08:21] <_joe_> I'm on it
[11:08:48] <_joe_> !log restarting gerrit on cobalt as the web interface is unresponsive
[11:08:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:09:31] <_joe_> marostegui_: do you need gerrit right now?
[11:10:29] Nope
[11:10:48] As I created a task related to it, I saw there was an alert about it
[11:11:09] But as I am on 4G I wasn't sure if it was me only or the alert was indeed correct
[11:11:37] <_joe_> gerrit now loads
[11:11:49] <_joe_> but it's spitting scary errors
[11:11:57] Works for me indeed
[11:12:04] <_joe_> marostegui_: log into your nick
[11:12:15] Thanks
[11:12:23] <_joe_> paladox: I don't think we're ok
[11:12:24] _joe_: what kind of errors?
[11:12:31] <_joe_> [2018-06-10 11:11:52,155] [OnlineNoteDbMigrator] ERROR com.google.gerrit.server.notedb.rebuild.NoteDbMigrator : Error migrating primary storage for 35785
[11:12:38] That’s ok
[11:12:43] <_joe_> that's ok?
[11:12:46] Those changes were probaly deleted
[11:13:04] <_joe_> ok then, see you tomorrow, hopefully
[11:13:07] https://phabricator.wikimedia.org/T196840
[11:13:12] * _joe_ back to lunch
[11:13:12] For what it worth
[11:13:27] PROBLEM - puppet last run on bromine is CRITICAL: CRITICAL: Puppet has 7 failures. Last run 3 minutes ago with 7 failures. Failed resources (up to 3 shown): Exec[git_pull_wikimedia/annualreport],Exec[git_pull_wikimedia/TransparencyReport],Exec[git_pull_wikimedia/TransparencyReport-private],Exec[git_pull_wikibase/wikiba.se-deploy]
[11:13:37] PROBLEM - puppet last run on vega is CRITICAL: CRITICAL: Puppet has 5 failures. Last run 3 minutes ago with 5 failures. Failed resources (up to 3 shown): Exec[git_pull_wikimedia/TransparencyReport-private],Exec[git_pull_wikibase/wikiba.se-deploy],Exec[git_pull_research/landing-page],Exec[git_pull_design/landing-page]
[11:13:38] PROBLEM - puppet last run on releases2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_jenkins CI Composer]
[11:13:41] <_joe_> this is related to the restart ^^
[11:13:43] <_joe_> I think
[11:13:48] PROBLEM - puppet last run on releases1001 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): Exec[git_pull_operations/deployment-charts],Exec[git_pull_jenkins CI Composer]
[11:14:00] hi, just got online. can i still help
[11:14:08] got a text from greg
[11:14:19] <_joe_> mutante: I should've fixed the immediate issue
[11:14:28] PROBLEM - puppet last run on stat1005 is CRITICAL: CRITICAL: Puppet has 4 failures. Last run 4 minutes ago with 4 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas],Exec[git_pull_statistics_mediawiki],Exec[git_pull_analytics/reportupdater],Exec[git_pull_wikimedia/discovery/golden]
[11:14:32] <_joe_> see the possible cause elsewhere, where I pasted it
[11:14:37] ok, thank you
[11:14:38] PROBLEM - puppet last run on thorium is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): Exec[git_pull_geowiki-scripts],Exec[git_pull_analytics.wikimedia.org]
[11:14:38] PROBLEM - puppet last run on analytics1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas]
[11:14:40] <_joe_> the puppet errors will go away soon
[11:14:47] PROBLEM - puppet last run on kafka1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas]
[11:14:51] <_joe_> I just re-ran puppet on vega, and it's ok
[11:15:08] _joe_: it is being hard to log with my nick, my connection is quite bad now :(
[11:15:25] <_joe_> marostegui_: ok, go away :D
[11:15:30] <_joe_> I'm going off too
[11:15:39] Thanks for getting it fixed!
[11:15:57] Not sure if it could have been related to the task i created
[11:16:07] PROBLEM - puppet last run on kafka2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas]
[11:16:35] Or marostegui_ not related to your task :)
[11:17:28] PROBLEM - puppet last run on kafka1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas]
[11:17:41] OK :)
[11:17:48] Operations, DBA, Gerrit, Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4269958 (Marostegui) >>! In T196840#4269955, @Paladox wrote: > It will be a long while as phab has to parse all the new commits (notedb) we should probaly try to ignore refs/...
[11:18:38] RECOVERY - puppet last run on vega is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[11:18:58] Operations, DBA, Gerrit, Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4269961 (Paladox) Yep days.
[11:29:48] RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[11:33:58] RECOVERY - puppet last run on releases2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[11:34:08] RECOVERY - puppet last run on releases1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:36:37] RECOVERY - puppet last run on kafka2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:37:48] RECOVERY - puppet last run on kafka1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[11:38:47] RECOVERY - puppet last run on bromine is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[11:40:08] RECOVERY - puppet last run on analytics1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[11:40:08] RECOVERY - puppet last run on thorium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[11:40:08] RECOVERY - puppet last run on kafka1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:59:31] (PS1) MarcoAurelio: Archive the operations/software/tessera repository [software/tessera] - https://gerrit.wikimedia.org/r/439467 (https://phabricator.wikimedia.org/T186096)
[12:01:28] !log disable some botpasswords (T194204)
[12:01:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:03:42] (CR) MarcoAurelio: "No .gitreview and `git push origin HEAD:master` gives me permission error. I'll try to see what I can do." [software/tessera] - https://gerrit.wikimedia.org/r/439467 (https://phabricator.wikimedia.org/T186096) (owner: MarcoAurelio)
[12:06:07] Operations, DBA, Gerrit, Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4270016 (demon) We haven't needed to replicate any refs other than heads and tags since we brought gitiles online... Disable them. Now. And prune them from Phab while we're...
[12:06:32] (PS1) MarcoAurelio: Temporary allow Gerrit Managers to own this repository for archiving purposes [software/tessera] (refs/meta/config) - https://gerrit.wikimedia.org/r/439468 (https://phabricator.wikimedia.org/T186096)
[12:13:56] (PS2) MarcoAurelio: Archive the operations/software/tessera repository [software/tessera] - https://gerrit.wikimedia.org/r/439467 (https://phabricator.wikimedia.org/T186096)
[12:14:32] (Abandoned) MarcoAurelio: Temporary allow Gerrit Managers to own this repository for archiving purposes [software/tessera] (refs/meta/config) - https://gerrit.wikimedia.org/r/439468 (https://phabricator.wikimedia.org/T186096) (owner: MarcoAurelio)
[12:15:24] (PS1) MarcoAurelio: Mark repository as read only [software/tessera] (refs/meta/config) - https://gerrit.wikimedia.org/r/439469
[12:16:01] (PS2) MarcoAurelio: Mark repository as read only [software/tessera] (refs/meta/config) - https://gerrit.wikimedia.org/r/439469
[12:16:33] (PS3) MarcoAurelio: Mark repository as read only [software/tessera] (refs/meta/config) - https://gerrit.wikimedia.org/r/439469 (https://phabricator.wikimedia.org/T186096)
[12:20:52] (PS1) Reedy: Enable wgCSPReportOnlyHeader for group0 [mediawiki-config] - https://gerrit.wikimedia.org/r/439470
[12:26:42] (CR) Reedy: [C: 2] Enable wgCSPReportOnlyHeader for group0 [mediawiki-config] - https://gerrit.wikimedia.org/r/439470 (owner: Reedy)
[12:28:21] (Merged) jenkins-bot: Enable wgCSPReportOnlyHeader for group0 [mediawiki-config] - https://gerrit.wikimedia.org/r/439470 (owner: Reedy)
[12:28:35] (CR) jenkins-bot: Enable wgCSPReportOnlyHeader for group0 [mediawiki-config] - https://gerrit.wikimedia.org/r/439470 (owner: Reedy)
[12:30:04] !log reedy@deploy1001 Synchronized wmf-config/InitialiseSettings.php: CSP in report mode for group0 (duration: 00m 55s)
[12:30:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:34:46] (PS1) Reedy: Enable CSP in Report Only Mode everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/439471
[12:47:15] Operations, DBA, Gerrit, Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4270059 (Paladox) But phabricator changed it's behaviour and now clones refs/**. So to fix this we need regex to not clone refs/changes/**/meta.
[13:09:07] PROBLEM - Disk space on elastic1019 is CRITICAL: DISK CRITICAL - free space: /srv 59430 MB (12% inode=99%)
[13:18:57] RECOVERY - Disk space on elastic1019 is OK: DISK OK
[13:24:43] (PS1) Urbanecm: Revert "Change bewikiquote logo" [mediawiki-config] - https://gerrit.wikimedia.org/r/439477 (https://phabricator.wikimedia.org/T196134)
[13:26:02] (PS2) Urbanecm: Revert "Change bewikiquote logo" [mediawiki-config] - https://gerrit.wikimedia.org/r/439477 (https://phabricator.wikimedia.org/T196134)
[13:29:31] (PS1) Urbanecm: Change logo files for bewikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/439478 (https://phabricator.wikimedia.org/T196134)
[13:29:33] (PS1) Urbanecm: Use uploaded HD logo for bewikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/439479
[13:29:52] (CR) jerkins-bot: [V: -1] Use uploaded HD logo for bewikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/439479 (owner: Urbanecm)
[13:30:04] (PS2) Urbanecm: Use uploaded HD logo for bewikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/439479 (https://phabricator.wikimedia.org/T196134)
[13:30:18] (CR) jerkins-bot: [V: -1] Use uploaded HD logo for bewikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/439479 (https://phabricator.wikimedia.org/T196134) (owner: Urbanecm)
[13:50:07] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1965 bytes in 0.072 second response time
[13:53:37] (PS1) Aklapper: Make Phabricator footer links use Special:MyLanguage [puppet] - https://gerrit.wikimedia.org/r/439482 (https://phabricator.wikimedia.org/T196836)
[14:03:20] (PS1) Paladox: Gerrit: Add CoC and privacy policy to footer [puppet] - https://gerrit.wikimedia.org/r/439483
[14:06:20] (PS2) Paladox: Gerrit: Add CoC and privacy policy to footer [puppet] - https://gerrit.wikimedia.org/r/439483 (https://phabricator.wikimedia.org/T196835)
[14:07:49] (PS3) Paladox: Gerrit: Add CoC and privacy policy to footer [puppet] - https://gerrit.wikimedia.org/r/439483 (https://phabricator.wikimedia.org/T196835)
[14:10:27] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1965 bytes in 0.070 second response time
[14:14:35] (CR) Paladox: "This change is ready for review." [puppet] - https://gerrit.wikimedia.org/r/439483 (https://phabricator.wikimedia.org/T196835) (owner: Paladox)
[14:17:38] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1972 bytes in 0.074 second response time
[14:37:58] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1953 bytes in 0.066 second response time
[14:45:18] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1976 bytes in 0.095 second response time
[15:00:38] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1942 bytes in 0.087 second response time
[15:57:34] Are the weird centralauth using the wrong db errors at https://logstash.wikimedia.org/goto/90399bf0d838acda35862bc488c28a05 a known issue?
[17:00:18] PROBLEM - kubelet operational latencies on kubernetes1003 is CRITICAL: instance=kubernetes1003.eqiad.wmnet operation_type={container_status,create_container,podsandbox_status,remove_container,start_container,stop_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[17:01:28] RECOVERY - kubelet operational latencies on kubernetes1003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[17:11:43] (CR) Framawiki: [C: 1] Make Phabricator footer links use Special:MyLanguage [puppet] - https://gerrit.wikimedia.org/r/439482 (https://phabricator.wikimedia.org/T196836) (owner: Aklapper)
[18:48:35] (CR) MarcoAurelio: [C: 1] Make Phabricator footer links use Special:MyLanguage [puppet] - https://gerrit.wikimedia.org/r/439482 (https://phabricator.wikimedia.org/T196836) (owner: Aklapper)
[19:32:37] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1968 bytes in 0.065 second response time
[19:37:38] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1953 bytes in 0.076 second response time
[19:43:54] (PS1) Paladox: Rename wikimedia-polygerrit-style.html to gerrit-theme.html and also add footer links [software/gerrit] (deploy/wmf/stable-2.15) - https://gerrit.wikimedia.org/r/439503
[19:55:08] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1966 bytes in 0.063 second response time
[19:59:17] (PS2) Paladox: Rename wikimedia-polygerrit-style.html to gerrit-theme.html and also add footer links [software/gerrit] (deploy/wmf/stable-2.15) - https://gerrit.wikimedia.org/r/439503
[19:59:41] (PS3) Paladox: Rename wikimedia-polygerrit-style.html to gerrit-theme.html and also add footer links [software/gerrit] (deploy/wmf/stable-2.15) - https://gerrit.wikimedia.org/r/439503 (https://phabricator.wikimedia.org/T196835)
[20:00:08] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1945 bytes in 0.070 second response time
[20:00:36] (PS1) Paladox: Rename wikimedia-polygerrit-style.html to gerrit-theme.html [puppet] - https://gerrit.wikimedia.org/r/439504 (https://phabricator.wikimedia.org/T196835)
[20:02:32] (PS2) Paladox: Rename wikimedia-polygerrit-style.html to gerrit-theme.html [puppet] - https://gerrit.wikimedia.org/r/439504 (https://phabricator.wikimedia.org/T196835)
[20:03:05] (CR) Paladox: "this should be merged at the same time as the puppet change https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/439504/" [software/gerrit] (deploy/wmf/stable-2.15) - https://gerrit.wikimedia.org/r/439503 (https://phabricator.wikimedia.org/T196835) (owner: Paladox)
[20:03:26] (CR) Paladox: "This change is ready for review." [software/gerrit] (deploy/wmf/stable-2.15) - https://gerrit.wikimedia.org/r/439503 (https://phabricator.wikimedia.org/T196835) (owner: Paladox)
[20:03:51] (CR) Paladox: "This change is ready for review." [puppet] - https://gerrit.wikimedia.org/r/439504 (https://phabricator.wikimedia.org/T196835) (owner: Paladox)
[20:07:27] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1965 bytes in 0.084 second response time
[20:17:37] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1971 bytes in 0.061 second response time
[21:58:26] Operations, DBA, Gerrit, Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4270673 (Paladox)