[00:01:46] 10Operations, 10ops-eqiad, 10RESTBase, 10RESTBase-Cassandra, and 3 others: Memory error on restbase1016 - https://phabricator.wikimedia.org/T212418 (10mobrovac) All of the instances have joined the ring (thnx @fgiunchedi!) and the latest version of RESTBase is in place, so we are good. There is one problem... [00:13:58] (03PS1) 10Bstorm: toolforge: update the version of php-cgi to 7.2 as well [puppet] - 10https://gerrit.wikimedia.org/r/485343 (https://phabricator.wikimedia.org/T213666) [00:15:21] (03CR) 10Bstorm: [C: 03+2] toolforge: update the version of php-cgi to 7.2 as well [puppet] - 10https://gerrit.wikimedia.org/r/485343 (https://phabricator.wikimedia.org/T213666) (owner: 10Bstorm) [01:25:07] (03CR) 10Krinkle: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/485106 (owner: 10Dzahn) [05:18:11] !log legoktm@deploy1001 Synchronized php-1.33.0-wmf.13/extensions/JsonConfig/includes/JCCache.php: Revert "JCCache: Explicit load the main slot to avoid API warnings" - T214179 (duration: 00m 58s) [05:18:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:18:15] T214179: mw.ext.data.get Lua call returns false - https://phabricator.wikimedia.org/T214179 [05:25:18] !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@af21320]: bump discovery analytics to latest [05:25:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:25:35] !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@af21320]: bump discovery analytics to latest (duration: 00m 17s) [05:25:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:46:03] !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@af21320]: test swapping venv build to scap fetch/script step [05:46:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:46:17] !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@af21320]: test swapping venv build to scap fetch/script step (duration: 00m 13s) [05:46:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:55:02] !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@af21320]: test swapping venv build to scap fetch/script step [05:55:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:55:18] !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@af21320]: test swapping venv build to scap fetch/script step (duration: 00m 15s) [05:55:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:55:32] !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@af21320]: test swapping venv build to scap fetch/script step [05:55:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:55:46] !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@af21320]: test swapping venv build to scap fetch/script step (duration: 00m 14s) [05:55:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:46:44] PROBLEM - MariaDB Slave Lag: s3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 2972.64 seconds [06:46:58] PROBLEM - MariaDB Slave Lag: s8 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 3506.21 seconds [06:47:08] PROBLEM - MariaDB Slave Lag: x1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 64129.49 seconds [06:47:16] PROBLEM - MariaDB Slave 
Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 178588.03 seconds [06:47:18] PROBLEM - MariaDB Slave SQL: x1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1032, Errmsg: Could not execute Update_rows_v1 event on table frwiki.echo_notification: Cant find record in echo_notification, Error_code: 1032: handler error HA_ERR_KEY_NOT_FOUND: the events master log db1069-bin.000314, end_log_pos 800908454 [06:47:26] PROBLEM - MariaDB Slave Lag: s4 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 5744.24 seconds [06:47:30] PROBLEM - MariaDB Slave Lag: s7 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 7771.10 seconds [07:13:50] PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:14:54] RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.002 second response time [07:24:54] PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:28:55] (03Abandoned) 10MGChecker: Reduce Codesniffer exclusions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/467104 (owner: 10MGChecker) [07:33:24] RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.002 second response time [07:36:44] !log restart pdfrender on scb1004 [07:36:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:42:12] !log Fixing dbstore1002 x1 replication T213670 [08:42:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:42:15] T213670: dbstore1002 Mysql errors - https://phabricator.wikimedia.org/T213670 [08:45:14] RECOVERY - MariaDB Slave SQL: x1 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes [08:58:08] PROBLEM - MariaDB Slave Lag: s2 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 963.16 seconds [08:58:14] PROBLEM - MariaDB Slave Lag: s5 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 997.00 seconds [08:58:20] PROBLEM - MariaDB Slave Lag: s6 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 961.38 seconds [09:02:24] PROBLEM - MariaDB Slave IO: m2 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect [09:02:24] PROBLEM - MariaDB Slave SQL: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect [09:02:24] PROBLEM - MariaDB Slave SQL: m3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect [09:02:24] PROBLEM - MariaDB Slave SQL: x1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect [09:02:28] PROBLEM - MariaDB Slave IO: s8 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect [09:02:30] PROBLEM - MariaDB Slave SQL: s4 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect [09:02:38] PROBLEM - MariaDB Slave SQL: m2 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect [09:02:42] PROBLEM - MariaDB Slave SQL: s3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect [09:02:42] PROBLEM - MariaDB Slave IO: m3 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect [09:02:44] PROBLEM - MariaDB Slave IO: s2 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect [09:02:44] PROBLEM - MariaDB Slave SQL: s7 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect [09:02:50] PROBLEM - MariaDB Slave IO: s3 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect [09:02:52] PROBLEM - MariaDB Slave IO: s1 on 
dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect [09:02:54] PROBLEM - MariaDB Slave SQL: s8 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect [09:03:00] PROBLEM - MariaDB Slave SQL: s6 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect [09:03:02] PROBLEM - MariaDB Slave SQL: s5 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect [09:03:08] PROBLEM - MariaDB Slave IO: s5 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect [09:03:12] PROBLEM - MariaDB Slave IO: s6 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect [09:03:14] PROBLEM - MariaDB Slave SQL: s2 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect [09:03:18] PROBLEM - MariaDB Slave IO: x1 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect [09:03:34] PROBLEM - MariaDB Slave IO: s7 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect [09:03:34] PROBLEM - MariaDB Slave IO: s4 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect [09:06:20] elukey: I think it has now crashed because of the alter? [09:11:30] PROBLEM - MariaDB Slave Lag: m3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag could not connect [09:11:36] PROBLEM - MariaDB Slave Lag: m2 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag could not connect [09:31:10] PROBLEM - MariaDB Slave SQL: s8 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect [09:31:10] PROBLEM - MariaDB Slave IO: s1 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect [09:31:10] PROBLEM - MariaDB Slave IO: s3 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect [09:31:10] PROBLEM - MariaDB Slave Lag: m3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag could not connect [09:31:10] PROBLEM - MariaDB Slave SQL: s6 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect [09:31:14] PROBLEM - MariaDB Slave Lag: m2 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag could not connect [09:31:14] PROBLEM - MariaDB Slave Lag: s3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag could not connect [09:31:14] PROBLEM - MariaDB Slave SQL: s5 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect [09:31:16] PROBLEM - MariaDB Slave Lag: s2 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag could not connect [09:31:18] PROBLEM - MariaDB Slave IO: s5 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect [09:31:22] PROBLEM - MariaDB Slave Lag: s5 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag could not connect [09:31:22] PROBLEM - MariaDB Slave IO: s6 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect [09:31:24] PROBLEM - MariaDB Slave SQL: s2 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect [09:31:26] PROBLEM - MariaDB Slave IO: x1 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect [09:31:26] PROBLEM - MariaDB Slave Lag: s6 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag could not connect [09:31:26] PROBLEM - MariaDB Slave Lag: s8 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag could not connect [09:31:38] PROBLEM - MariaDB Slave Lag: x1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag could not connect [09:31:46] PROBLEM - MariaDB Slave IO: s7 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect [09:31:46] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag could not connect [09:31:46] PROBLEM - MariaDB Slave IO: s4 
on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect [09:31:50] PROBLEM - MariaDB Slave SQL: m3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect [09:31:50] PROBLEM - MariaDB Slave IO: m2 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect [09:31:50] PROBLEM - MariaDB Slave SQL: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect [09:31:50] PROBLEM - MariaDB Slave SQL: x1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect [09:31:54] PROBLEM - MariaDB Slave IO: s8 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect [09:31:56] PROBLEM - MariaDB Slave Lag: s4 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag could not connect [09:31:56] PROBLEM - MariaDB Slave SQL: s4 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect [09:32:04] PROBLEM - MariaDB Slave SQL: m2 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect [09:32:04] PROBLEM - MariaDB Slave Lag: s7 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag could not connect [09:32:08] PROBLEM - MariaDB Slave IO: m3 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect [09:32:08] PROBLEM - MariaDB Slave SQL: s3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect [09:32:10] PROBLEM - MariaDB Slave IO: s2 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect [09:32:10] PROBLEM - MariaDB Slave SQL: s7 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect [09:42:05] 10Operations, 10Analytics, 10Analytics-Kanban, 10Product-Analytics: dbstore1002 Mysql errors - https://phabricator.wikimedia.org/T213670 (10Marostegui) [09:42:26] marostegui: :( [09:43:37] I am tailing /srv/sqldata/dbstore1002.err, I am seeing the recovery steps [09:43:40] sigh [09:44:09] yeah, let's not alter it anymore [09:44:17] check the update on the task (the alters task) [09:45:16] yep I saw it, makes sense [09:45:27] what a nightmare :( [09:45:32] the main problem is if people will keep writing to dbstore1002's staging [09:46:08] yeah, what I suggested for Monday is just a PoC [09:46:13] To see if it works fine [09:46:17] is mysql still bootstrapping? [09:46:20] yes [09:46:24] it will take a while [09:46:53] Once we are ready to fully migrate staging users from dbstore1002, we can do the final mysqldump+alter on the final host [09:47:21] Which still reminds me that we need to decide where you want to place the current staging db [09:47:29] we as in analytics :) [09:48:13] in any of the dbstores [09:48:16] no preference [09:48:42] yeah, but on which section? [09:49:01] https://phabricator.wikimedia.org/T210478 [09:49:25] I thought it was on a separate db not belonging to any section [09:49:27] no? [09:49:33] maybe I am still missing some stuff [09:51:20] anyway, Manuel and I have to run an errand, and mysql is bootstrapping [09:51:45] I should be back in a couple of hours at most to see if everything is ok and slaves can be restarted [09:52:13] there's not much that we can do now :( [09:52:27] will update this chan later on! (unless anybody else beats me :P) [10:03:46] PROBLEM - puppet last run on mw1275 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [10:18:12] RECOVERY - MariaDB Slave Lag: m2 on dbstore1002 is OK: OK slave_sql_lag not a slave [10:18:44] RECOVERY - MariaDB Slave SQL: m3 on dbstore1002 is OK: OK slave_sql_state not a slave [10:18:46] RECOVERY - MariaDB Slave IO: m2 on dbstore1002 is OK: OK slave_io_state not a slave [10:18:58] RECOVERY - MariaDB Slave SQL: m2 on dbstore1002 is OK: OK slave_sql_state not a slave [10:19:02] RECOVERY - MariaDB Slave IO: m3 on dbstore1002 is OK: OK slave_io_state not a slave [10:19:16] RECOVERY - MariaDB Slave Lag: m3 on dbstore1002 is OK: OK slave_sql_lag not a slave [10:29:52] RECOVERY - puppet last run on mw1275 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [12:06:50] back [12:07:37] ok so mysql on dbstore1002 seems running fine [12:08:04] RECOVERY - MariaDB Slave SQL: s3 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes [12:08:06] RECOVERY - MariaDB Slave IO: s2 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes [12:08:06] RECOVERY - MariaDB Slave SQL: s7 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes [12:08:14] RECOVERY - MariaDB Slave IO: s3 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes [12:08:16] RECOVERY - MariaDB Slave IO: s1 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes [12:08:18] RECOVERY - MariaDB Slave SQL: s8 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes [12:08:19] !log run 'start all slaves' on dbstore1002 after crash [12:08:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:08:24] RECOVERY - MariaDB Slave SQL: s6 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes [12:08:26] RECOVERY - MariaDB Slave SQL: s5 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes [12:08:32] RECOVERY - MariaDB Slave IO: s5 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes [12:08:34] RECOVERY - MariaDB Slave SQL: s2 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes [12:08:34] RECOVERY - MariaDB Slave IO: s6 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes [12:08:36] RECOVERY - MariaDB Slave IO: x1 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes [12:08:56] RECOVERY - MariaDB Slave IO: s7 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes [12:08:56] RECOVERY - MariaDB Slave IO: s4 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes [12:09:02] RECOVERY - MariaDB Slave SQL: s1 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes [12:09:04] RECOVERY - MariaDB Slave IO: s8 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes [12:09:06] RECOVERY - MariaDB Slave SQL: s4 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes [12:11:44] will recheck later :) [12:34:06] elukey: :) [12:34:43] !log pool maps1003 - stretch migration is complete T198622 [12:34:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:34:46] T198622: migrate maps servers to stretch with the current style - https://phabricator.wikimedia.org/T198622 [12:36:01] elukey: x1 replication is broken ,I will check how many rows are missing and if I can fix it quickly or we should just reimport it [12:39:48] RECOVERY - MariaDB Slave SQL: x1 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes [13:16:40] PROBLEM - HHVM rendering on mw1231 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:17:44] RECOVERY - HHVM rendering on mw1231 is OK: HTTP OK: HTTP/1.1 200 OK - 
81018 bytes in 0.149 second response time [13:18:00] PROBLEM - MariaDB Slave Lag: s7 on db2077 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 303.04 seconds [13:18:04] PROBLEM - MariaDB Slave Lag: s7 on db2068 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 304.41 seconds [13:18:10] PROBLEM - MariaDB Slave Lag: s7 on db2040 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 305.49 seconds [13:18:12] PROBLEM - MariaDB Slave Lag: s7 on db2054 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 305.63 seconds [13:18:14] PROBLEM - MariaDB Slave Lag: s7 on db2047 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 306.82 seconds [13:18:24] PROBLEM - MariaDB Slave Lag: s7 on db2086 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 308.65 seconds [13:18:42] PROBLEM - MariaDB Slave Lag: s7 on db2061 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 312.62 seconds [13:18:42] PROBLEM - MariaDB Slave Lag: s7 on db2095 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 312.27 seconds [13:18:58] PROBLEM - MariaDB Slave Lag: s7 on db2087 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 314.36 seconds [13:21:07] marostegui: sounds like you fixed it right?? [13:22:09] elukey: I have had to fix lots and lots of rows [13:22:23] It is still catching up, down from 60k seconds to 8k seconds [13:24:17] another one failed [13:25:45] :( [13:28:22] elukey: x1 caught up [13:28:46] RECOVERY - MariaDB Slave Lag: x1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 0.35 seconds [13:31:02] RECOVERY - MariaDB Slave Lag: s5 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 268.50 seconds [13:34:59] niceee [13:37:43] 10Operations, 10Analytics, 10Analytics-Kanban, 10Product-Analytics: dbstore1002 Mysql errors - https://phabricator.wikimedia.org/T213670 (10Marostegui) After all the crashes, MySQL was able to start at around 10:18:11 (UTC). @elukey start replication on all slaves at around 12:07:56 (UTC). x1 replication w... 
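The recovery summarized in the Phabricator comment above (mysqld on dbstore1002 coming back around 10:18, "start all slaves" at 12:08, and the repair of the x1 channel that had stopped on error 1032, HA_ERR_KEY_NOT_FOUND, on frwiki.echo_notification) is driven by MariaDB's multi-source replication commands. The exact statements used are not recorded in this log; a minimal sketch of that kind of session, assuming a standard MariaDB 10.x multi-source setup with named connections (s1-s8, x1, m2, m3), would be:

    -- Illustrative only: check every replication connection after the crash.
    SHOW ALL SLAVES STATUS\G

    -- Restart everything, as logged at 12:08 ("start all slaves").
    START ALL SLAVES;

    -- Inspect and restart a single broken connection such as x1 by name.
    SHOW SLAVE 'x1' STATUS\G
    STOP SLAVE 'x1';
    -- Error 1032 (HA_ERR_KEY_NOT_FOUND) means the row being updated is missing on the
    -- replica, so the durable fix is to repair the data first. Only as a last resort,
    -- and only on a non-authoritative copy, skip the offending event:
    SET @@default_master_connection = 'x1';
    SET GLOBAL sql_slave_skip_counter = 1;
    START SLAVE 'x1';

Per the chat above, the actual x1 fix was repairing the missing rows by hand ("I have had to fix lots and lots of rows"); the skip-counter step is shown only as the last-resort alternative, since it silently drops the replicated event.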
[13:57:22] RECOVERY - MariaDB Slave Lag: s7 on db2077 is OK: OK slave_sql_lag Replication lag: 1.16 seconds [13:57:26] RECOVERY - MariaDB Slave Lag: s7 on db2068 is OK: OK slave_sql_lag Replication lag: 0.00 seconds [13:57:32] RECOVERY - MariaDB Slave Lag: s7 on db2040 is OK: OK slave_sql_lag Replication lag: 0.00 seconds [13:57:36] RECOVERY - MariaDB Slave Lag: s7 on db2054 is OK: OK slave_sql_lag Replication lag: 0.00 seconds [13:57:38] RECOVERY - MariaDB Slave Lag: s7 on db2047 is OK: OK slave_sql_lag Replication lag: 0.00 seconds [13:57:46] RECOVERY - MariaDB Slave Lag: s7 on db2086 is OK: OK slave_sql_lag Replication lag: 0.00 seconds [13:58:04] RECOVERY - MariaDB Slave Lag: s7 on db2095 is OK: OK slave_sql_lag Replication lag: 0.00 seconds [13:58:04] RECOVERY - MariaDB Slave Lag: s7 on db2061 is OK: OK slave_sql_lag Replication lag: 0.00 seconds [13:58:20] RECOVERY - MariaDB Slave Lag: s7 on db2087 is OK: OK slave_sql_lag Replication lag: 0.01 seconds [14:01:42] RECOVERY - MariaDB Slave Lag: s6 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 291.00 seconds [14:36:48] (03PS5) 10Giuseppe Lavagetto: Log docker build output [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/475779 (owner: 10Hashar) [14:39:29] (03CR) 10Giuseppe Lavagetto: [C: 03+2] Fix the logic of the FSM to account for the fact we allow pulling [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/485219 (owner: 10Giuseppe Lavagetto) [14:40:00] (03CR) 10jenkins-bot: Fix the logic of the FSM to account for the fact we allow pulling [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/485219 (owner: 10Giuseppe Lavagetto) [14:40:27] (03CR) 10Giuseppe Lavagetto: [C: 03+2] Log docker build output [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/475779 (owner: 10Hashar) [14:40:58] (03CR) 10jenkins-bot: Log docker build output [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/475779 (owner: 10Hashar) [14:55:27] (03PS1) 10Marostegui: [WIP] dbstore_multiinstance: Add stanging db [puppet] - 10https://gerrit.wikimedia.org/r/485367 [15:01:20] (03PS2) 10Marostegui: [WIP] dbstore_multiinstance: Add stanging db [puppet] - 10https://gerrit.wikimedia.org/r/485367 [15:06:46] (03PS3) 10Marostegui: [WIP] dbstore_multiinstance: Add stanging db [puppet] - 10https://gerrit.wikimedia.org/r/485367 [15:09:03] (03PS4) 10Marostegui: [WIP] dbstore_multiinstance: Add stanging db [puppet] - 10https://gerrit.wikimedia.org/r/485367 [15:10:14] (03CR) 10Marostegui: "https://puppet-compiler.wmflabs.org/compiler1002/14398/" [puppet] - 10https://gerrit.wikimedia.org/r/485367 (owner: 10Marostegui) [15:14:40] PROBLEM - MariaDB Slave Lag: s4 on db2090 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 341.18 seconds [15:14:42] PROBLEM - MariaDB Slave Lag: s4 on db2095 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 341.94 seconds [15:14:46] PROBLEM - MariaDB Slave Lag: s4 on db2051 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 343.48 seconds [15:14:48] PROBLEM - MariaDB Slave Lag: s4 on db2084 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 345.12 seconds [15:15:04] PROBLEM - MariaDB Slave Lag: s4 on db2058 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 350.16 seconds [15:15:30] PROBLEM - MariaDB Slave Lag: s4 on db2065 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 359.60 seconds [15:15:32] PROBLEM - MariaDB Slave Lag: s4 on db2091 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 360.36 seconds [15:15:46] PROBLEM - MariaDB Slave Lag: s4 on db2073 is CRITICAL: 
CRITICAL slave_sql_lag Replication lag: 365.52 seconds [15:34:12] PROBLEM - MariaDB Slave Lag: s4 on db2073 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 307.66 seconds [15:34:20] PROBLEM - MariaDB Slave Lag: s4 on db2090 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 309.20 seconds [15:34:20] PROBLEM - MariaDB Slave Lag: s4 on db2095 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 309.28 seconds [15:34:24] PROBLEM - MariaDB Slave Lag: s4 on db2051 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 310.12 seconds [15:34:28] PROBLEM - MariaDB Slave Lag: s4 on db2084 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 310.65 seconds [15:34:44] PROBLEM - MariaDB Slave Lag: s4 on db2058 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 310.10 seconds [15:53:14] RECOVERY - MariaDB Slave Lag: s2 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 248.04 seconds [16:03:14] RECOVERY - MariaDB Slave Lag: s8 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 297.01 seconds [16:13:54] RECOVERY - MariaDB Slave Lag: s4 on db2051 is OK: OK slave_sql_lag Replication lag: 56.29 seconds [16:13:56] RECOVERY - MariaDB Slave Lag: s4 on db2084 is OK: OK slave_sql_lag Replication lag: 56.18 seconds [16:14:14] RECOVERY - MariaDB Slave Lag: s4 on db2058 is OK: OK slave_sql_lag Replication lag: 59.90 seconds [16:15:45] actor migration? ^^ [16:15:52] RECOVERY - MariaDB Slave Lag: s4 on db2065 is OK: OK slave_sql_lag Replication lag: 24.03 seconds [16:15:54] RECOVERY - MariaDB Slave Lag: s4 on db2091 is OK: OK slave_sql_lag Replication lag: 24.26 seconds [16:16:08] RECOVERY - MariaDB Slave Lag: s4 on db2073 is OK: OK slave_sql_lag Replication lag: 28.30 seconds [16:16:16] RECOVERY - MariaDB Slave Lag: s4 on db2095 is OK: OK slave_sql_lag Replication lag: 21.12 seconds [16:16:16] RECOVERY - MariaDB Slave Lag: s4 on db2090 is OK: OK slave_sql_lag Replication lag: 18.37 seconds [16:19:46] PROBLEM - puppet last run on kafka-jumbo1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:49:32] (03PS5) 10ArielGlenn: do multistream dumps in parallel and recombine for big wikis [dumps] - 10https://gerrit.wikimedia.org/r/484754 (https://phabricator.wikimedia.org/T213912) [16:51:06] RECOVERY - puppet last run on kafka-jumbo1004 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [17:05:51] Hauskatze: yes [17:06:08] marostegui: do you have a minute?
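The morning discussion about where the analytics staging schema should live once it leaves dbstore1002 (T210478), together with the dbstore_multiinstance "Add staging db" change (gerrit 485367) iterated through patch sets above and below, amounts to provisioning a writable staging database on one of the multi-instance dbstore hosts. The real provisioning is done via that Puppet change; purely as an illustration of the intended end state on whichever instance is picked, and with an invented account name, host pattern and grant list, the SQL equivalent would be roughly:

    -- Hypothetical sketch; the account, host pattern and grants below are invented.
    CREATE DATABASE IF NOT EXISTS staging;

    -- A writable account for analysts, limited to the staging schema.
    CREATE USER IF NOT EXISTS 'staging_user'@'10.%' IDENTIFIED BY '<password>';
    GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, ALTER, INDEX
        ON staging.* TO 'staging_user'@'10.%';

Which section's instance ends up hosting it is exactly the open question in the chat ("yeah, but on which section?") and in T210478.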
[17:08:41] (03PS5) 10Marostegui: dbstore_multiinstance: Add stanging db [puppet] - 10https://gerrit.wikimedia.org/r/485367 (https://phabricator.wikimedia.org/T210478) [17:11:57] Hauskatze: Yes :) [17:12:04] (03PS6) 10Marostegui: dbstore_multiinstance: Add staging db [puppet] - 10https://gerrit.wikimedia.org/r/485367 (https://phabricator.wikimedia.org/T210478) [18:02:00] PROBLEM - MariaDB Slave Lag: s4 on db2065 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 303.55 seconds [18:02:02] PROBLEM - MariaDB Slave Lag: s4 on db2091 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 304.29 seconds [18:02:12] PROBLEM - MariaDB Slave Lag: s4 on db2073 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 306.70 seconds [18:02:20] PROBLEM - MariaDB Slave Lag: s4 on db2095 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 309.11 seconds [18:02:22] PROBLEM - MariaDB Slave Lag: s4 on db2090 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 309.69 seconds [18:02:32] PROBLEM - MariaDB Slave Lag: s4 on db2051 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 312.40 seconds [18:02:32] PROBLEM - MariaDB Slave Lag: s4 on db2084 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 312.51 seconds [18:02:50] PROBLEM - MariaDB Slave Lag: s4 on db2058 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 318.37 seconds [18:51:12] RECOVERY - MariaDB Slave Lag: s7 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 266.61 seconds [19:31:22] (03PS3) 10GTirloni: labstore - Allow multiple bdsync jobs per host [puppet] - 10https://gerrit.wikimedia.org/r/485200 (https://phabricator.wikimedia.org/T209527) [19:33:55] (03CR) 10GTirloni: [C: 03+2] labstore - Allow multiple bdsync jobs per host [puppet] - 10https://gerrit.wikimedia.org/r/485200 (https://phabricator.wikimedia.org/T209527) (owner: 10GTirloni) [19:39:06] (03PS1) 10BryanDavis: toolforge: Prometheus replacement for sge.py diamond collector [puppet] - 10https://gerrit.wikimedia.org/r/485372 (https://phabricator.wikimedia.org/T211684) [19:39:55] (03CR) 10jerkins-bot: [V: 04-1] toolforge: Prometheus replacement for sge.py diamond collector [puppet] - 10https://gerrit.wikimedia.org/r/485372 (https://phabricator.wikimedia.org/T211684) (owner: 10BryanDavis) [19:42:09] (03PS2) 10BryanDavis: toolforge: Prometheus replacement for sge.py diamond collector [puppet] - 10https://gerrit.wikimedia.org/r/485372 (https://phabricator.wikimedia.org/T211684) [20:34:26] !log upgraded and rebooted labstore200{3,4} [20:34:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:39:16] (03PS1) 10GTirloni: wmcs::nfs::misc - Backup for misc server (cloudstore1008) [puppet] - 10https://gerrit.wikimedia.org/r/485375 (https://phabricator.wikimedia.org/T209527) [20:39:46] (03CR) 10jerkins-bot: [V: 04-1] wmcs::nfs::misc - Backup for misc server (cloudstore1008) [puppet] - 10https://gerrit.wikimedia.org/r/485375 (https://phabricator.wikimedia.org/T209527) (owner: 10GTirloni) [20:42:38] (03PS4) 10ArielGlenn: option to skip siteinfo header, mw footer for recompressing files [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/442774 (https://phabricator.wikimedia.org/T213200) [20:42:40] (03PS4) 10ArielGlenn: options for writeuptopageid to skip writing header or footer [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/442775 (https://phabricator.wikimedia.org/T213200) [20:42:42] (03PS2) 10ArielGlenn: version 0.0.9 [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/482861 (https://phabricator.wikimedia.org/T213200) [20:44:45] 10Operations,
10Wikimedia-Mailing-lists: lost administrator login password for Wikies-l mail list - https://phabricator.wikimedia.org/T214249 (10JorgeGG) [20:45:22] (03Abandoned) 10ArielGlenn: fix up iohandlers to write separate streams for header and footer again [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/485240 (owner: 10ArielGlenn) [20:47:39] (03PS2) 10GTirloni: wmcs::nfs::misc - Backup for misc server (cloudstore1008) [puppet] - 10https://gerrit.wikimedia.org/r/485375 (https://phabricator.wikimedia.org/T209527) [20:47:48] (03CR) 10ArielGlenn: [V: 03+2 C: 03+2] move iohandler code for compression/decompression out to a separate file [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/441484 (https://phabricator.wikimedia.org/T213200) (owner: 10ArielGlenn) [20:49:26] (03CR) 10ArielGlenn: [V: 03+2 C: 03+2] use iohandlers for recompressxml input and output [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/441485 (https://phabricator.wikimedia.org/T213200) (owner: 10ArielGlenn) [20:50:25] (03CR) 10ArielGlenn: [V: 03+2 C: 03+2] option to skip siteinfo header, mw footer for recompressing files [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/442774 (https://phabricator.wikimedia.org/T213200) (owner: 10ArielGlenn) [20:51:30] (03CR) 10ArielGlenn: [V: 03+2 C: 03+2] options for writeuptopageid to skip writing header or footer [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/442775 (https://phabricator.wikimedia.org/T213200) (owner: 10ArielGlenn) [20:52:22] (03CR) 10ArielGlenn: [V: 03+2 C: 03+2] version 0.0.9 [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/482861 (https://phabricator.wikimedia.org/T213200) (owner: 10ArielGlenn) [20:55:00] (03PS3) 10GTirloni: wmcs::nfs::misc - Backup for misc server (cloudstore1008) [puppet] - 10https://gerrit.wikimedia.org/r/485375 (https://phabricator.wikimedia.org/T209527) [21:09:12] (03CR) 10ArielGlenn: [V: 03+2 C: 03+2] version 0.0.9 [debs/mwbzutils] - 10https://gerrit.wikimedia.org/r/483077 (https://phabricator.wikimedia.org/T213200) (owner: 10ArielGlenn) [21:17:34] PROBLEM - MariaDB Slave Lag: s7 on db2068 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 329.68 seconds [21:17:42] PROBLEM - MariaDB Slave Lag: s7 on db2040 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 334.79 seconds [21:17:48] PROBLEM - MariaDB Slave Lag: s7 on db2047 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 335.65 seconds [21:17:48] PROBLEM - MariaDB Slave Lag: s7 on db2054 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 335.71 seconds [21:17:54] PROBLEM - MariaDB Slave Lag: s7 on db2086 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 339.83 seconds [21:17:56] PROBLEM - MariaDB Slave Lag: s7 on db2095 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 340.39 seconds [21:18:10] PROBLEM - MariaDB Slave Lag: s7 on db2061 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 347.08 seconds [21:18:26] PROBLEM - MariaDB Slave Lag: s7 on db2087 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 352.09 seconds [21:18:46] PROBLEM - MariaDB Slave Lag: s7 on db2077 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 359.73 seconds [21:26:13] (03PS2) 10ArielGlenn: specify output file explicitly for recompress dump jobs [dumps] - 10https://gerrit.wikimedia.org/r/482870 (https://phabricator.wikimedia.org/T213200) [21:26:15] (03PS10) 10ArielGlenn: write header/body/footer of xml gz files as separate streams [dumps] - 10https://gerrit.wikimedia.org/r/484505 (https://phabricator.wikimedia.org/T182572) [21:26:17] (03PS6) 10ArielGlenn: do multistream dumps in 
parallel and recombine for big wikis [dumps] - 10https://gerrit.wikimedia.org/r/484754 (https://phabricator.wikimedia.org/T213912) [21:57:49] (03CR) 10ArielGlenn: [C: 03+2] specify output file explicitly for recompress dump jobs [dumps] - 10https://gerrit.wikimedia.org/r/482870 (https://phabricator.wikimedia.org/T213200) (owner: 10ArielGlenn) [22:05:39] (03CR) 10ArielGlenn: [C: 03+2] write header/body/footer of xml gz files as separate streams [dumps] - 10https://gerrit.wikimedia.org/r/484505 (https://phabricator.wikimedia.org/T182572) (owner: 10ArielGlenn) [22:10:47] (03CR) 10ArielGlenn: [C: 03+2] do multistream dumps in parallel and recombine for big wikis [dumps] - 10https://gerrit.wikimedia.org/r/484754 (https://phabricator.wikimedia.org/T213912) (owner: 10ArielGlenn) [22:12:22] !log ariel@deploy1001 Started deploy [dumps/dumps@ab79bbb]: multistream dumps in parallel, recombine gz and multistream without decompression [22:12:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:12:26] !log ariel@deploy1001 Finished deploy [dumps/dumps@ab79bbb]: multistream dumps in parallel, recombine gz and multistream without decompression (duration: 00m 03s) [22:12:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:15:24] PROBLEM - Host labstore2004 is DOWN: PING CRITICAL - Packet loss = 100% [22:30:42] (03PS1) 10ArielGlenn: dumps: recombine multiple page content multistream files, if produced [puppet] - 10https://gerrit.wikimedia.org/r/485477 (https://phabricator.wikimedia.org/T213912) [22:34:27] tfw you just pushed a bunch of stuff late on a Saturday and suddenly your screen is full of PROBLEM... and then you realize a) it's known b) it's unrelated :-) :-) [22:41:26] (03CR) 10ArielGlenn: [C: 03+2] dumps: recombine multiple page content multistream files, if produced [puppet] - 10https://gerrit.wikimedia.org/r/485477 (https://phabricator.wikimedia.org/T213912) (owner: 10ArielGlenn) [22:42:58] 10Operations, 10Wikimedia-Mailing-lists: Reset list admin password for Wikies-l mailing list - https://phabricator.wikimedia.org/T214249 (10Peachey88) [22:50:25] all set for tomorrow's xml/sql dump run now. Which, by my clock, is actually later today! [23:09:49] PROBLEM - LVS HTTP IPv4 on wdqs.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:10:54] RECOVERY - LVS HTTP IPv4 on wdqs.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.015 second response time [23:25:26] PROBLEM - WDQS HTTP Port on wdqs1004 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - 387 bytes in 0.002 second response time [23:35:13] ACKNOWLEDGEMENT - Host labstore2004 is DOWN: PING CRITICAL - Packet loss = 100% GTirloni Stuck after reboot