[00:00:04] <jouncebot>	 twentyafterfour: #bothumor I � Unicode. All rise for Phabricator update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180628T0000).
[00:05:23] <twentyafterfour>	 !log taking apache offline momentarily on phab1001
[00:05:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:09:47] <icinga-wm>	 PROBLEM - https://phabricator.wikimedia.org on phab1002 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 2428 bytes in 0.010 second response time
[00:11:58] <icinga-wm>	 RECOVERY - https://phabricator.wikimedia.org on phab1002 is OK: HTTP OK: HTTP/1.1 200 OK - 32278 bytes in 0.270 second response time
[00:12:57] <twentyafterfour>	 !log phabricator update failed. unable to apply database migrations: mysql access denied
[00:12:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:13:20] <twentyafterfour>	 !log rolled back and restored service to previous state
[00:13:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:28:35] <RoanKattouw>	 Uhm, wtf
[00:28:40] <RoanKattouw>	 A full scap breaks on wmf.10
[00:28:53] <RoanKattouw>	 1) when lint fails, it should tell you why, not just say "exit status 123"
[00:29:00] <RoanKattouw>	 2) why on earth do we have syntax errors in vendor
[00:29:05] <RoanKattouw>	 Fatal error: syntax error, unexpected T_CONST, expecting T_VARIABLE in /srv/mediawiki-staging/php-1.32.0-wmf.10/vendor/psy/psysh/test/ClassWithSecrets.php on line 16
[00:29:43] <RoanKattouw>	 Oh wait I wasn't trying a full scap, rather sync-dir php-1.32.0-wmf.10
[00:31:54] <RoanKattouw>	 OK then I will have to violate policy and sync includes/ and resources/ separately with two syncs
[00:32:49] <logmsgbot>	 !log catrope@deploy1001 Synchronized php-1.32.0-wmf.10/includes: Watchlist perf patches for SWAT, part 1 (T197168, T198140, T198142) (duration: 01m 13s)
[00:32:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:32:54] <stashbot>	 T197168: Fix slow Watchlist load and interaction times - https://phabricator.wikimedia.org/T197168
[00:32:54] <stashbot>	 T198140: Prevent updateInputSize() in mw.rcfilters.ui.FilterTagMultiselectWidget - https://phabricator.wikimedia.org/T198140
[00:32:54] <stashbot>	 T198142: Speed up lazy-building of menu - https://phabricator.wikimedia.org/T198142
[00:33:57] <logmsgbot>	 !log catrope@deploy1001 Synchronized php-1.32.0-wmf.10/resources: Watchlist perf patches for SWAT, part 2 (T197168, T198140, T198142) (duration: 00m 57s)
[00:33:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
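Note on the "exit status 123" complaint above: 123 is the status GNU xargs returns when any command it ran exited with a status between 1 and 125, so the underlying lint message is lost. Whether scap's lint step is actually built on xargs is an assumption here; the log does not say. The following is a minimal Python sketch of a lint wrapper that keeps each failing command's own exit code and output. `run_lint` and `report` are hypothetical names for illustration and are not part of scap.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple

Failure = Tuple[str, int, str]


def run_lint(commands: Sequence[Sequence[str]]) -> List[Failure]:
    """Run each lint command; return (command, exit code, output) per failure."""
    failures: List[Failure] = []
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            # Keep the tool's own message instead of collapsing it
            # into a single aggregate exit status.
            output = (proc.stdout + proc.stderr).strip()
            failures.append((" ".join(cmd), proc.returncode, output))
    return failures


def report(failures: Sequence[Failure]) -> int:
    """Print every failure with its reason; return a process exit status."""
    for cmd, code, output in failures:
        print(f"lint failed (exit {code}): {cmd}", file=sys.stderr)
        if output:
            print(output, file=sys.stderr)
    return 1 if failures else 0
```

A caller would build one command per file (for example a `php -l <file>` invocation, assuming that is the linter in use) and pass the list to `run_lint`, then exit with the value of `report`.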
[00:57:29] <wikibugs>	 (PS7) Smalyshev: Generate daily diffs for categories RDF [puppet] - https://gerrit.wikimedia.org/r/378355 (https://phabricator.wikimedia.org/T198356)
[00:58:00] <wikibugs>	 (CR) jerkins-bot: [V: -1] Generate daily diffs for categories RDF [puppet] - https://gerrit.wikimedia.org/r/378355 (https://phabricator.wikimedia.org/T198356) (owner: Smalyshev)
[01:30:48] <Krinkle>	 RoanKattouw: Be sure to file a task if there isn't one already.
[01:31:07] <Krinkle>	 psysh was updated very recently, train block worthy imho
[01:34:55] <wikibugs>	 (CR) Krinkle: [C: -1] "In a later patch is fine, and per Aaron, might even be obsolete if it ends up removed indeed. If the key name is the only difference then " [mediawiki-config] - https://gerrit.wikimedia.org/r/440469 (https://phabricator.wikimedia.org/T198239) (owner: Aaron Schulz)
[02:00:38] <icinga-wm>	 PROBLEM - Memory correctable errors -EDAC- on mw1239 is CRITICAL: 4 ge 4 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=mw1239&var-datasource=eqiad%2520prometheus%252Fops
[02:06:28] <wikibugs>	 Operations, DNS, Traffic, Wikimedia-Apache-configuration: m.{project}.org portal/redirect consistency - https://phabricator.wikimedia.org/T78421#4320893 (MZMcBride)
[02:21:29] <logmsgbot>	 !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.8) (duration: 07m 54s)
[02:21:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:53:09] <logmsgbot>	 !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.10) (duration: 13m 47s)
[02:53:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:03:34] <logmsgbot>	 !log l10nupdate@deploy1001 ResourceLoader cache refresh completed at Thu Jun 28 03:03:34 UTC 2018 (duration 10m 25s)
[03:03:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:06:22] <wikibugs>	 (PS8) Smalyshev: Generate daily diffs for categories RDF [puppet] - https://gerrit.wikimedia.org/r/378355 (https://phabricator.wikimedia.org/T198356)
[03:13:48] <wikibugs>	 (CR) Smalyshev: [C: 1] Add cirrussearch settings for wikibase (1.5/3) [mediawiki-config] - https://gerrit.wikimedia.org/r/442317 (https://phabricator.wikimedia.org/T182717) (owner: DCausse)
[03:14:13] <wikibugs>	 (CR) Smalyshev: [C: 1] Add cirrussearch settings for wikibase (2/3) (take 2) [mediawiki-config] - https://gerrit.wikimedia.org/r/442318 (https://phabricator.wikimedia.org/T182717) (owner: DCausse)
[03:14:30] <wikibugs>	 (CR) Smalyshev: [C: 1] Add cirrussearch settings for wikibase (3/3) [mediawiki-config] - https://gerrit.wikimedia.org/r/441057 (https://phabricator.wikimedia.org/T182717) (owner: DCausse)
[04:35:21] <marostegui>	 twentyafterfour: Can you try again for T198367
[04:35:21] <stashbot>	 T198367: Mysql Access denied to 'phadmin'@'10.64.0.198' - https://phabricator.wikimedia.org/T198367
[04:35:21] <marostegui>	 ?
[04:42:17] <wikibugs>	 (PS1) Marostegui: db-eqiad.php: Depool db1099:3318 [mediawiki-config] - https://gerrit.wikimedia.org/r/442758 (https://phabricator.wikimedia.org/T191316)
[04:43:34] <wikibugs>	 (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1099:3318 [mediawiki-config] - https://gerrit.wikimedia.org/r/442758 (https://phabricator.wikimedia.org/T191316) (owner: Marostegui)
[04:44:43] <wikibugs>	 (Merged) jenkins-bot: db-eqiad.php: Depool db1099:3318 [mediawiki-config] - https://gerrit.wikimedia.org/r/442758 (https://phabricator.wikimedia.org/T191316) (owner: Marostegui)
[04:45:58] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1099:3318 for alter table (duration: 00m 59s)
[04:46:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:46:19] <marostegui>	 !log  Deploy schema change on db1099:3318 T191316 T192926 T89737 T195193
[04:46:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:46:23] <stashbot>	 T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737
[04:46:23] <stashbot>	 T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[04:46:23] <stashbot>	 T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[04:46:24] <stashbot>	 T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[04:48:24] <wikibugs>	 (CR) jenkins-bot: db-eqiad.php: Depool db1099:3318 [mediawiki-config] - https://gerrit.wikimedia.org/r/442758 (https://phabricator.wikimedia.org/T191316) (owner: Marostegui)
[05:30:06] <wikibugs>	 Operations, Discovery, Wikidata, Wikidata-Query-Service, and 2 others: Investigate HTTP 500 on POST request to WDQS - https://phabricator.wikimedia.org/T198055#4320949 (Smalyshev) p:Triage>Normal
[05:30:16] <wikibugs>	 Operations, Discovery, Wikidata, Wikidata-Query-Service, and 2 others: Enable async logging on Wikidata Query Service - https://phabricator.wikimedia.org/T198051#4320950 (Smalyshev) p:Triage>Normal
[05:49:11] <wikibugs>	 (PS2) ArielGlenn: use iohandlers for recompressxml input and output [dumps/mwbzutils] - https://gerrit.wikimedia.org/r/441485
[05:49:13] <wikibugs>	 (PS1) ArielGlenn: option to skip siteinfo header, mw footer for recompresing files [dumps/mwbzutils] - https://gerrit.wikimedia.org/r/442774
[05:49:15] <wikibugs>	 (PS1) ArielGlenn: options for writeuptopageid to skip writing header or footer [dumps/mwbzutils] - https://gerrit.wikimedia.org/r/442775
[06:27:57] <icinga-wm>	 PROBLEM - puppet last run on ms-be1027 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/share/ca-certificates/wmf_ca_2017_2020.crt]
[06:29:18] <icinga-wm>	 PROBLEM - puppet last run on oresrdb1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/vim/vimrc.local]
[06:30:07] <icinga-wm>	 PROBLEM - puppet last run on labstore1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/ssh/userkeys/root.d/labstore]
[06:30:28] <icinga-wm>	 PROBLEM - puppet last run on mw1305 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/gen_fingerprints]
[06:55:28] <icinga-wm>	 RECOVERY - puppet last run on labstore1003 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[06:55:48] <icinga-wm>	 RECOVERY - puppet last run on mw1305 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[06:56:09] <wikibugs>	 (PS2) Muehlenhoff: Add trusty-wikimedia to known-dists [puppet] - https://gerrit.wikimedia.org/r/442325
[06:58:27] <icinga-wm>	 RECOVERY - puppet last run on ms-be1027 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[06:59:20] <wikibugs>	 (CR) Muehlenhoff: [C: 2] Add trusty-wikimedia to known-dists [puppet] - https://gerrit.wikimedia.org/r/442325 (owner: Muehlenhoff)
[06:59:48] <icinga-wm>	 RECOVERY - puppet last run on oresrdb1001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:12:12] <elukey>	 !log upload piwik 3.2.1 to jessie-wikimedia
[07:12:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:26:42] <wikibugs>	 Operations, ops-eqiad: Heating alerts for mw servers in eqiad - https://phabricator.wikimedia.org/T149287#4321016 (MoritzMuehlenhoff) @Cmjohnson Thanks, sounds good.
[07:30:05] <wikibugs>	 (PS1) Joal: Remove ORM jar from sqoop cron command [puppet] - https://gerrit.wikimedia.org/r/442780 (https://phabricator.wikimedia.org/T196912)
[07:30:15] <joal>	 elukey: --^ please :)
[07:30:56] <wikibugs>	 (CR) Elukey: [C: 2] Remove ORM jar from sqoop cron command [puppet] - https://gerrit.wikimedia.org/r/442780 (https://phabricator.wikimedia.org/T196912) (owner: Joal)
[07:33:24] <wikibugs>	 (PS1) Muehlenhoff: Add tarrow to LDAP users list [puppet] - https://gerrit.wikimedia.org/r/442781 (https://phabricator.wikimedia.org/T196434)
[07:34:17] <wikibugs>	 (PS2) Muehlenhoff: Add tarrow to LDAP users list [puppet] - https://gerrit.wikimedia.org/r/442781 (https://phabricator.wikimedia.org/T196434)
[07:34:57] <wikibugs>	 (CR) Muehlenhoff: [C: 2] Add tarrow to LDAP users list [puppet] - https://gerrit.wikimedia.org/r/442781 (https://phabricator.wikimedia.org/T196434) (owner: Muehlenhoff)
[07:38:38] <wikibugs>	 (CR) Tarrow: Add tarrow to LDAP users list (1 comment) [puppet] - https://gerrit.wikimedia.org/r/442781 (https://phabricator.wikimedia.org/T196434) (owner: Muehlenhoff)
[07:40:23] <wikibugs>	 (CR) Muehlenhoff: [C: 2] Add tarrow to LDAP users list (1 comment) [puppet] - https://gerrit.wikimedia.org/r/442781 (https://phabricator.wikimedia.org/T196434) (owner: Muehlenhoff)
[07:42:26] <wikibugs>	 (PS1) Muehlenhoff: Fix email address for tarrow [puppet] - https://gerrit.wikimedia.org/r/442785
[07:43:14] <wikibugs>	 (CR) Muehlenhoff: [C: 2] Fix email address for tarrow [puppet] - https://gerrit.wikimedia.org/r/442785 (owner: Muehlenhoff)
[08:09:34] <vgutierrez>	 !log updating librdkafka1 && restart varnishkafka instances in cache::text nodes - T182993
[08:09:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:09:37] <stashbot>	 T182993: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993
[08:11:54] <wikibugs>	 (PS8) Muehlenhoff: debmonitor: install debmonitor-client [puppet] - https://gerrit.wikimedia.org/r/439641 (https://phabricator.wikimedia.org/T191300) (owner: Volans)
[08:15:03] <elukey>	 !log restart-hhvm on mw1227 (some threads stuck in jit-related operations, causing high load)
[08:15:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:16:24] <wikibugs>	 (CR) Volans: [C: 2] debmonitor: install debmonitor-client [puppet] - https://gerrit.wikimedia.org/r/439641 (https://phabricator.wikimedia.org/T191300) (owner: Volans)
[08:16:42] <wikibugs>	 (PS5) ArielGlenn: generate temp stubs for page ranges serially from same input stub file [dumps] - https://gerrit.wikimedia.org/r/436956 (https://phabricator.wikimedia.org/T196063)
[08:18:13] <wikibugs>	 (PS3) Gehel: maps: isolate maps-test2003 and reimage it to stretch [puppet] - https://gerrit.wikimedia.org/r/442258 (https://phabricator.wikimedia.org/T198290)
[08:18:17] <wikibugs>	 (PS4) Gehel: maps: isolate maps-test2003 and reimage it to stretch [puppet] - https://gerrit.wikimedia.org/r/442258 (https://phabricator.wikimedia.org/T198290)
[08:19:29] <wikibugs>	 (CR) Gehel: [C: 2] maps: isolate maps-test2003 and reimage it to stretch [puppet] - https://gerrit.wikimedia.org/r/442258 (https://phabricator.wikimedia.org/T198290) (owner: Gehel)
[08:32:14] <wikibugs>	 (PS1) Jcrespo: mariadb: Depool es1016 for reimage to stretch [mediawiki-config] - https://gerrit.wikimedia.org/r/442792
[08:35:34] <wikibugs>	 (CR) Jcrespo: [C: 2] mariadb: Depool es1016 for reimage to stretch [mediawiki-config] - https://gerrit.wikimedia.org/r/442792 (owner: Jcrespo)
[08:36:48] <wikibugs>	 (Merged) jenkins-bot: mariadb: Depool es1016 for reimage to stretch [mediawiki-config] - https://gerrit.wikimedia.org/r/442792 (owner: Jcrespo)
[08:37:44] <wikibugs>	 (PS1) Volans: debmonitor: fix trusty crontab redirection [puppet] - https://gerrit.wikimedia.org/r/442793 (https://phabricator.wikimedia.org/T191300)
[08:38:15] <wikibugs>	 (CR) jenkins-bot: mariadb: Depool es1016 for reimage to stretch [mediawiki-config] - https://gerrit.wikimedia.org/r/442792 (owner: Jcrespo)
[08:39:42] <wikibugs>	 (CR) Volans: [C: 2] debmonitor: fix trusty crontab redirection [puppet] - https://gerrit.wikimedia.org/r/442793 (https://phabricator.wikimedia.org/T191300) (owner: Volans)
[08:40:32] <wikibugs>	 (PS2) Vgutierrez: varnishkafka: Enable TLS signature algorithms and curves lists config [puppet] - https://gerrit.wikimedia.org/r/440544 (https://phabricator.wikimedia.org/T182993)
[08:40:34] <wikibugs>	 (PS1) Vgutierrez: varnishkafka: Set TLS curves list and sigalgs list for cache::misc [puppet] - https://gerrit.wikimedia.org/r/442794 (https://phabricator.wikimedia.org/T182993)
[08:41:44] <wikibugs>	 (PS1) Jcrespo: mariadb: Prepare for reimage of es1016 to stretch [puppet] - https://gerrit.wikimedia.org/r/442795
[08:46:04] <arturo>	 !log aborrero@labtestnet2001:~ 7s 130 $ sudo service nova-spiceproxy stop # daemon in infinite respawning loop
[08:46:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:46:41] <wikibugs>	 (CR) Vgutierrez: "pcc shows (mostly) no changes: https://puppet-compiler.wmflabs.org/compiler02/11600/" [puppet] - https://gerrit.wikimedia.org/r/440544 (https://phabricator.wikimedia.org/T182993) (owner: Vgutierrez)
[08:46:44] <logmsgbot>	 !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool es1016 (duration: 01m 04s)
[08:46:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:47:40] <wikibugs>	 (PS1) KartikMistry: lttoolbox: New upstream release [debs/contenttranslation/lttoolbox] - https://gerrit.wikimedia.org/r/442798 (https://phabricator.wikimedia.org/T197559)
[08:47:49] <wikibugs>	 (CR) jerkins-bot: [V: -1] lttoolbox: New upstream release [debs/contenttranslation/lttoolbox] - https://gerrit.wikimedia.org/r/442798 (https://phabricator.wikimedia.org/T197559) (owner: KartikMistry)
[08:48:44] <wikibugs>	 (CR) KartikMistry: "recheck" [debs/contenttranslation/lttoolbox] - https://gerrit.wikimedia.org/r/442798 (https://phabricator.wikimedia.org/T197559) (owner: KartikMistry)
[08:50:15] <wikibugs>	 (PS2) DCausse: Add cirrussearch settings for wikibase (1.5/3) [mediawiki-config] - https://gerrit.wikimedia.org/r/442317 (https://phabricator.wikimedia.org/T182717)
[08:50:17] <wikibugs>	 (PS2) DCausse: Add cirrussearch settings for wikibase (2/3) (take 2) [mediawiki-config] - https://gerrit.wikimedia.org/r/442318 (https://phabricator.wikimedia.org/T182717)
[08:50:19] <wikibugs>	 (PS9) DCausse: Add cirrussearch settings for wikibase (3/3) [mediawiki-config] - https://gerrit.wikimedia.org/r/441057 (https://phabricator.wikimedia.org/T182717)
[08:50:30] <wikibugs>	 (CR) Vgutierrez: "pcc shows no changes in upload and text nodes, and the expected changes in misc: https://puppet-compiler.wmflabs.org/compiler02/11601/" [puppet] - https://gerrit.wikimedia.org/r/442794 (https://phabricator.wikimedia.org/T182993) (owner: Vgutierrez)
[08:51:43] <wikibugs>	 (CR) Elukey: [C: 1] varnishkafka: Enable TLS signature algorithms and curves lists config [puppet] - https://gerrit.wikimedia.org/r/440544 (https://phabricator.wikimedia.org/T182993) (owner: Vgutierrez)
[08:52:26] <wikibugs>	 (CR) Elukey: [C: 1] varnishkafka: Set TLS curves list and sigalgs list for cache::misc [puppet] - https://gerrit.wikimedia.org/r/442794 (https://phabricator.wikimedia.org/T182993) (owner: Vgutierrez)
[08:54:35] <wikibugs>	 (PS3) Vgutierrez: varnishkafka: Enable TLS signature algorithms and curves lists config [puppet] - https://gerrit.wikimedia.org/r/440544 (https://phabricator.wikimedia.org/T182993)
[08:55:02] <wikibugs>	 (CR) Vgutierrez: [C: 2] varnishkafka: Enable TLS signature algorithms and curves lists config [puppet] - https://gerrit.wikimedia.org/r/440544 (https://phabricator.wikimedia.org/T182993) (owner: Vgutierrez)
[08:58:06] <wikibugs>	 (PS2) Vgutierrez: varnishkafka: Set TLS curves list and sigalgs list for cache::misc [puppet] - https://gerrit.wikimedia.org/r/442794 (https://phabricator.wikimedia.org/T182993)
[08:58:22] <wikibugs>	 (CR) Vgutierrez: [C: 2] varnishkafka: Set TLS curves list and sigalgs list for cache::misc [puppet] - https://gerrit.wikimedia.org/r/442794 (https://phabricator.wikimedia.org/T182993) (owner: Vgutierrez)
[09:00:22] <wikibugs>	 (CR) Gehel: [C: 1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/442317 (https://phabricator.wikimedia.org/T182717) (owner: DCausse)
[09:07:22] <wikibugs>	 Operations, netops: Allow labnet/labnodepool/labvirt to connect to debmonitor hosts/443 - https://phabricator.wikimedia.org/T198375#4321131 (MoritzMuehlenhoff)
[09:10:56] <vgutierrez>	 !log Apply new TLS varnishkafka settings in cache::misc nodes - T182993
[09:10:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:58] <stashbot>	 T182993: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993
[09:13:37] <logmsgbot>	 !log mobrovac@deploy1001 Started deploy [proton/deploy@8a887b5]: Update to dceaf80 - T186748
[09:13:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:13:40] <stashbot>	 T186748: New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748
[09:14:06] <logmsgbot>	 !log mobrovac@deploy1001 Finished deploy [proton/deploy@8a887b5]: Update to dceaf80 - T186748 (duration: 00m 28s)
[09:14:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:16:09] <wikibugs>	 (PS1) Elukey: Upgrade the piwik module to matomo [puppet] - https://gerrit.wikimedia.org/r/442806 (https://phabricator.wikimedia.org/T192298)
[09:16:45] <elukey>	 yep s/piwik/matomo
[09:16:58] <elukey>	 https://matomo.org/
[09:20:16] <arturo>	 !log T198377 stop nova-spiceproxy daemon in labcontrol1002.wikimedia.org
[09:20:17] <wikibugs>	 (PS2) Elukey: Upgrade the piwik module to matomo [puppet] - https://gerrit.wikimedia.org/r/442806 (https://phabricator.wikimedia.org/T192298)
[09:20:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:20:18] <stashbot>	 T198377: nova-spiceproxy is in an infinite respawning loop - https://phabricator.wikimedia.org/T198377
[09:24:51] <joal>	 Hi ops-team - Little ping about analytics deploying AQS (elukey knows)
[09:25:01] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: openstack: eqiad1: keystone bootstrap done [puppet] - https://gerrit.wikimedia.org/r/442807 (https://phabricator.wikimedia.org/T196633)
[09:25:44] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: 2] openstack: eqiad1: keystone bootstrap done [puppet] - https://gerrit.wikimedia.org/r/442807 (https://phabricator.wikimedia.org/T196633) (owner: Arturo Borrero Gonzalez)
[09:26:05] <wikibugs>	 (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler02/11603/bohrium.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/442806 (https://phabricator.wikimedia.org/T192298) (owner: Elukey)
[09:28:01] <logmsgbot>	 !log joal@deploy1001 Started deploy [analytics/aqs/deploy@194ca96]: Deploying AQS pageviews-per-country ceiling-value glue code
[09:28:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:29:04] <logmsgbot>	 !log joal@deploy1001 Finished deploy [analytics/aqs/deploy@194ca96]: Deploying AQS pageviews-per-country ceiling-value glue code (duration: 01m 03s)
[09:29:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:43:25] <wikibugs>	 (CR) Alexandros Kosiaris: [C: 1] debmonitor: fine-tune client user creation [puppet] - https://gerrit.wikimedia.org/r/442246 (https://phabricator.wikimedia.org/T191300) (owner: Volans)
[09:45:52] <logmsgbot>	 !log joal@deploy1001 Started deploy [analytics/aqs/deploy@8eef2a9]: Deploying AQS pageviews-per-country ceiling-value glue code - Corrected
[09:45:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:15] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: hieradata: add profile::openstack::eqiad1::neutron::db_pass [labs/private] - https://gerrit.wikimedia.org/r/442814 (https://phabricator.wikimedia.org/T196633)
[09:46:39] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: 2] hieradata: add profile::openstack::eqiad1::neutron::db_pass [labs/private] - https://gerrit.wikimedia.org/r/442814 (https://phabricator.wikimedia.org/T196633) (owner: Arturo Borrero Gonzalez)
[09:46:42] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [V: 2 C: 2] hieradata: add profile::openstack::eqiad1::neutron::db_pass [labs/private] - https://gerrit.wikimedia.org/r/442814 (https://phabricator.wikimedia.org/T196633) (owner: Arturo Borrero Gonzalez)
[09:48:40] <logmsgbot>	 !log joal@deploy1001 Finished deploy [analytics/aqs/deploy@8eef2a9]: Deploying AQS pageviews-per-country ceiling-value glue code - Corrected (duration: 02m 48s)
[09:48:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:49:44] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: openstack: eqiad1: enable neutron in control boxes [puppet] - https://gerrit.wikimedia.org/r/442815 (https://phabricator.wikimedia.org/T196633)
[09:52:50] <wikibugs>	 (CR) Filippo Giunchedi: [C: 1] cassandra: add another package version to the 2.2 list [puppet] - https://gerrit.wikimedia.org/r/442251 (https://phabricator.wikimedia.org/T197062) (owner: Elukey)
[09:53:21] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: 2] openstack: eqiad1: enable neutron in control boxes [puppet] - https://gerrit.wikimedia.org/r/442815 (https://phabricator.wikimedia.org/T196633) (owner: Arturo Borrero Gonzalez)
[09:53:40] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [V: 2 C: 2] "Compiler says this is OK:" [puppet] - https://gerrit.wikimedia.org/r/442815 (https://phabricator.wikimedia.org/T196633) (owner: Arturo Borrero Gonzalez)
[09:58:13] <twentyafterfour>	 !log Resuming deployment of phabricator upgrade tagged release/2018-06-27/1 - details: https://phabricator.wikimedia.org/project/profile/3439/
[09:58:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:00:57] <moritzm>	 !log installing reportbug update from jessie 8.11 point release
[10:00:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:07:06] <twentyafterfour>	 !log running phabricator database migration
[10:07:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:09:45] <joal>	 Hi again ops-team - Analytics is deploying Hadoop-related scripts - No impact expected on wiki side
[10:10:19] <logmsgbot>	 !log joal@deploy1001 Started deploy [analytics/refinery@4fc20a5]: Regular weekly deploy
[10:10:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:11:55] <twentyafterfour>	 !log phabricator database migration complete, service restored and appears stable.
[10:11:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:13:17] <addshore>	 twentyafterfour: is phab meant to be complete?
[10:13:22] <addshore>	 https://phabricator.wikimedia.org/T198341 PhabricatorDataNotAttachedException
[10:13:47] <wikibugs>	 (CR) 20after4: [C: 1] "I think this should be ready to merge." [puppet] - https://gerrit.wikimedia.org/r/441525 (https://phabricator.wikimedia.org/T197922) (owner: 20after4)
[10:14:16] <twentyafterfour>	 addshore: hmm that's not right
[10:14:21] <addshore>	 :D
[10:14:44] <twentyafterfour>	 addshore: strangely, every other task I've tried so far was fine
[10:14:59] <addshore>	 twentyafterfour: also https://phabricator.wikimedia.org/T198360
[10:15:24] <addshore>	 and https://phabricator.wikimedia.org/T136528 D:, in fact, the only 3 tasks I have tried to load have failed :D
[10:15:42] <jakob_WMDE>	 I'm also getting PhabricatorDataNotAttachedException. the person across from me claims everything's working for him :|
[10:15:49] <MatmaRex>	 hi, i've just come to report the same thing, presumably
[10:16:02] <Hauskatze>	 it looks like the favicon also changed?
[10:16:05] <MatmaRex>	 i can't view some tasks when logged in, but they work fine in an incognito window
[10:16:22] <addshore>	 ooh, yes, incog works for me too
[10:16:24] <legoktm>	 phab is down
[10:16:26] <legoktm>	 ?
[10:16:29] <legoktm>	 ok
[10:16:32] <Hauskatze>	 Request from xxxxx via cp1061 cp1061, Varnish XID 24768343
[10:16:32] <Hauskatze>	 Error: 503, Backend fetch failed at Thu, 28 Jun 2018 10:16:18 GMT
[10:16:36] <Hauskatze>	 on Phab
[10:16:59] <Hauskatze>	 and now
[10:17:00] <Hauskatze>	 PhabricatorDataNotAttachedException
[10:17:01] <icinga-wm>	 PROBLEM - Request latencies on argon is CRITICAL: instance=10.64.32.133:6443 verb=PATCH https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[10:17:02] <icinga-wm>	 PROBLEM - etcd request latencies on argon is CRITICAL: instance=10.64.32.133:6443 operation=compareAndSwap https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[10:17:06] <twentyafterfour>	 hmm I'm not sure what's up with it, I'm working on it
[10:17:20] <Hauskatze>	 it also says: "Attempting to access attached data on PhabricatorProject, but the data is not actually attached. Before accessing attachable data on an object, you must load and attach it.
[10:17:20] <Hauskatze>	 Data is normally attached by calling the corresponding needX() method on the Query class when the object is loaded. You can also call the corresponding attachX() method explicitly."
[10:17:40] <addshore>	 cleared cookies, logged out and back in and still get the exceptions
[10:18:41] <logmsgbot>	 !log joal@deploy1001 Finished deploy [analytics/refinery@4fc20a5]: Regular weekly deploy (duration: 08m 21s)
[10:18:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:19:09] <twentyafterfour>	 ok the data not attached error is fixed
[10:19:15] <addshore>	 twentyafterfour: looks fixed to me
[10:19:18] <addshore>	 thanks!
[10:19:22] <jakob_WMDE>	 here too. cool!
[10:19:34] <_joe_>	 twentyafterfour: what was the problem?
[10:19:47] <twentyafterfour>	 !log hotfixing phabricator DataNotAttached bug
[10:19:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:19:55] <twentyafterfour>	 _joe_: a bug in the code I just deployed
[10:20:10] <Hauskatze>	 wfm atm
[10:21:21] <_joe_>	 heh
[10:21:26] <twentyafterfour>	 I'm not sure why it only happens on some tasks and not others. Or why it doesn't happen on my test instance 
[10:22:41] <icinga-wm>	 RECOVERY - etcd request latencies on argon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[10:23:41] <icinga-wm>	 RECOVERY - Request latencies on argon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[10:48:38] <twentyafterfour>	 !log deployed fix for PhabricatorDataNotAttachedException - https://phabricator.wikimedia.org/rPHEX03971ea8965d3613df69833a766d1502b6d8dabb
[10:48:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
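Note on the exception text Hauskatze quoted above: it describes a pattern in which optional related data must be requested on the query (a needX() call) or attached explicitly (an attachX() call) before it is read. Phabricator itself is written in PHP, so the following is only a minimal Python sketch of that pattern. Every name in it (`Project`, `ProjectQuery`, `need_members`, `attach_members`, `DataNotAttachedError`) is hypothetical and chosen for illustration; it is not Phabricator's code and says nothing about the actual bug that was fixed.

```python
class DataNotAttachedError(Exception):
    """Raised when attachable data is read before it has been attached."""


# Sentinel distinguishing "never attached" from "attached but empty".
_NOT_ATTACHED = object()


class Project:
    def __init__(self, name):
        self.name = name
        self._members = _NOT_ATTACHED

    def attach_members(self, members):
        # Explicit attachment, the analogue of an attachX() method.
        self._members = list(members)
        return self

    def get_members(self):
        if self._members is _NOT_ATTACHED:
            raise DataNotAttachedError(
                "members accessed on Project before being attached"
            )
        return self._members


class ProjectQuery:
    def __init__(self, store):
        # store: hypothetical mapping of project name -> member list
        self._store = store
        self._need_members = False

    def need_members(self, need=True):
        # The analogue of a needX() method: ask the query to attach the data.
        self._need_members = need
        return self

    def execute(self):
        projects = []
        for name, members in self._store.items():
            project = Project(name)
            if self._need_members:
                project.attach_members(members)
            projects.append(project)
        return projects
```

In this sketch, a code path that loads projects without calling `need_members()` and later calls `get_members()` fails only for the objects that reach that path, which is one way an error like this could show up on some pages and not others.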
[11:34:34] <wikibugs>	 (PS1) ArielGlenn: generate multiple temp stub files at once for larger wikis [dumps] - https://gerrit.wikimedia.org/r/442828 (https://phabricator.wikimedia.org/T196063)
[11:34:54] <wikibugs>	 (CR) jerkins-bot: [V: -1] generate multiple temp stub files at once for larger wikis [dumps] - https://gerrit.wikimedia.org/r/442828 (https://phabricator.wikimedia.org/T196063) (owner: ArielGlenn)
[12:00:11] <wikibugs>	 (PS1) 20after4: Phabricator: Use mysqlnd [puppet] - https://gerrit.wikimedia.org/r/442829
[12:02:03] <wikibugs>	 (PS2) ArielGlenn: generate multiple temp stub files at once for larger wikis [dumps] - https://gerrit.wikimedia.org/r/442828 (https://phabricator.wikimedia.org/T196063)
[12:04:42] <moritzm>	 !log installing patch security updates
[12:04:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:05:29] <wikibugs>	 Operations, ops-eqiad, Analytics, User-Elukey: Degraded RAID on dbstore1002 - https://phabricator.wikimedia.org/T197707 (Cmjohnson) The disk has been swapped with a 2TB disk.
[12:07:59] <wikibugs>	 Operations, Traffic, Goal: Establish timeline and methodology for upcoming deprecation of non-forward-secret ciphers and TLSv1.0 - https://phabricator.wikimedia.org/T192559 (Vgutierrez) Our [[ https://grafana.wikimedia.org/dashboard/db/tls-ciphersuite-explorer?panelId=2&fullscreen&orgId=1&from=now-30...
[12:08:38] <logmsgbot>	 !log akosiaris@puppetmaster1001 conftool action : set/pooled=inactive; selector: dc=eqiad,service=.*,cluster=scb,name=scb1002
[12:08:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:11:21] <icinga-wm>	 PROBLEM - Host scb1002 is DOWN: PING CRITICAL - Packet loss = 100%
[12:13:11] <icinga-wm>	 ACKNOWLEDGEMENT - Host scb1002 is DOWN: PING CRITICAL - Packet loss = 100% alexandros kosiaris memory dimm issue. https://phabricator.wikimedia.org/T196901
[12:13:53] <wikibugs>	 10Operations, 10Discovery, 10Discovery-Search: migrate elasticsearch cirrus cluster to RAID0 - https://phabricator.wikimedia.org/T198391 (10Gehel)
[12:14:47] <paladox>	 twentyafterfour:  I am seeing:
[12:14:49] <paladox>	 17 notifications about objects which no longer exist or which you can no longer see were discarded.
[12:14:59] <paladox>	 That is new and I have never seen that
[12:16:21] <paladox>	 Hmm seems to have gone but it’s now showing only three recent notifications (otherwise I have to click to view all notifications)
[12:23:51] <icinga-wm>	 RECOVERY - Host scb1002 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms
[12:24:30] <wikibugs>	 10Operations, 10Puppet, 10puppet-compiler, 10User-herron: Upgrade Puppet compilers to Stretch - https://phabricator.wikimedia.org/T191438 (10aborrero)
[12:30:40] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: grafana: Name the grafana-admin key correctly [puppet] - 10https://gerrit.wikimedia.org/r/442835
[12:34:33] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: openstack: eqiad1: enable glance in control boxes [puppet] - 10https://gerrit.wikimedia.org/r/442836 (https://phabricator.wikimedia.org/T196633)
[12:36:37] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: hieradata: add profile::openstack::eqiad1::glance::db_pass [labs/private] - 10https://gerrit.wikimedia.org/r/442837 (https://phabricator.wikimedia.org/T196633)
[12:37:25] <logmsgbot>	 !log akosiaris@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=eqiad,service=.*,cluster=scb,name=scb1002
[12:37:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:37:42] <akosiaris>	 !log repool scb1002 T196901
[12:37:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:37:45] <stashbot>	 T196901: Replace memory bank on scb1002 - https://phabricator.wikimedia.org/T196901
[12:38:11] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [V: 032 C: 032] hieradata: add profile::openstack::eqiad1::glance::db_pass [labs/private] - 10https://gerrit.wikimedia.org/r/442837 (https://phabricator.wikimedia.org/T196633) (owner: 10Arturo Borrero Gonzalez)
[12:39:53] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 032] grafana: Name the grafana-admin key correctly [puppet] - 10https://gerrit.wikimedia.org/r/442835 (owner: 10Alexandros Kosiaris)
[12:40:30] <wikibugs>	 (03CR) 10Rush: [C: 032] openstack: eqiad1: enable glance in control boxes [puppet] - 10https://gerrit.wikimedia.org/r/442836 (https://phabricator.wikimedia.org/T196633) (owner: 10Arturo Borrero Gonzalez)
[12:41:34] <wikibugs>	 (03CR) 10Rush: [C: 032] "fyi https://gerrit.wikimedia.org/r/c/operations/puppet/+/440147" [puppet] - 10https://gerrit.wikimedia.org/r/442836 (https://phabricator.wikimedia.org/T196633) (owner: 10Arturo Borrero Gonzalez)
[12:41:48] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [V: 032] "Compiler is happy:" [puppet] - 10https://gerrit.wikimedia.org/r/442836 (https://phabricator.wikimedia.org/T196633) (owner: 10Arturo Borrero Gonzalez)
[12:46:26] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: openstack: eqiad1: enable neutron in control boxes [puppet] - 10https://gerrit.wikimedia.org/r/442815 (https://phabricator.wikimedia.org/T196633)
[12:49:10] <elukey>	 !log stop hadoop daemons on analytics1032 + shutdown to swap BBU -T194234
[12:49:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:49:12] <stashbot>	 T194234: anaytics1032's BBU is not working correctly - https://phabricator.wikimedia.org/T194234
[12:49:15] <wikibugs>	 (03PS3) 10Filippo Giunchedi: WIP grafana: host overview dashboard as code [puppet] - 10https://gerrit.wikimedia.org/r/442301 (https://phabricator.wikimedia.org/T178690)
[12:49:42] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] WIP grafana: host overview dashboard as code [puppet] - 10https://gerrit.wikimedia.org/r/442301 (https://phabricator.wikimedia.org/T178690) (owner: 10Filippo Giunchedi)
[12:51:27] <icinga-wm>	 PROBLEM - puppet last run on labcontrol1004 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 40 seconds ago with 2 failures. Failed resources (up to 3 shown): Package[neutron-common],File[/etc/neutron/original]
[12:53:18] <moritzm>	 !log installing blktrace update from jessie 8.11 point release
[12:53:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:53:30] <wikibugs>	 (03PS1) 10Rush: labstore: notes in nfs-manage for failover [puppet] - 10https://gerrit.wikimedia.org/r/442838 (https://phabricator.wikimedia.org/T157478)
[12:53:37] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: openstack: neutron-common: install from jessie-backports if running mitaka [puppet] - 10https://gerrit.wikimedia.org/r/442839 (https://phabricator.wikimedia.org/T196633)
[12:53:58] <marostegui>	 jouncebot: next
[12:53:58] <jouncebot>	 In 0 hour(s) and 6 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180628T1300)
[12:54:25] <wikibugs>	 (03CR) 10Rush: [C: 032] labstore: notes in nfs-manage for failover [puppet] - 10https://gerrit.wikimedia.org/r/442838 (https://phabricator.wikimedia.org/T157478) (owner: 10Rush)
[12:55:07] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 032] openstack: neutron-common: install from jessie-backports if running mitaka [puppet] - 10https://gerrit.wikimedia.org/r/442839 (https://phabricator.wikimedia.org/T196633) (owner: 10Arturo Borrero Gonzalez)
[12:55:20] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: openstack: neutron-common: install from jessie-backports if running mitaka [puppet] - 10https://gerrit.wikimedia.org/r/442839 (https://phabricator.wikimedia.org/T196633)
[13:00:04] <jouncebot>	 addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) European Mid-day SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180628T1300).
[13:00:04] <jouncebot>	 raynor, MatmaRex, and dcausse: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:10] <moritzm>	 !log installing bwm-ng update from jessie 8.11 point release
[13:00:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:00:17] <MatmaRex>	 hi
[13:00:50] <raynor>	 present
[13:01:19] <zeljkof>	 o/
[13:01:31] <dcausse>	 o/
[13:01:43] <zeljkof>	 raynor and dcausse: you are deployers, right? want to deploy your own commits?
[13:01:47] <icinga-wm>	 RECOVERY - puppet last run on labcontrol1004 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[13:01:49] <elukey>	 !log upload matomo (new Piwik) 3.5.1-1 to jessie-wikimedia
[13:01:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:01:58] <icinga-wm>	 PROBLEM - Check systemd state on labcontrol1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[13:02:01] <dcausse>	 zeljkof: sure I can
[13:02:04] <raynor>	 yup, I can do that
[13:02:14] <raynor>	 I can go last as I don't have too much experience yet
[13:02:22] <raynor>	 and definitely it will take me the longest ;)
[13:02:27] <icinga-wm>	 PROBLEM - Host analytics1032.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[13:02:36] <zeljkof>	 raynor and dcausse: does anybody want to deploy MatmaRex's patch too? :)
[13:02:47] <zeljkof>	 (I can do it, just asking)
[13:02:57] <dcausse>	 mine can take some time but I can go first if no one objects
[13:02:59] <raynor>	 I'll watch ;)
[13:03:02] <dcausse>	 I can deploy MatmaRex one
[13:03:47] <zeljkof>	 ok, then we are all set, dcausse you are the main swatter today, let MatmaRex know when you are deploying his patch, and let raynor know when it's his turn :)
[13:04:00] <zeljkof>	 I am around if anybody needs me
[13:04:09] <dcausse>	 MatmaRex: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/442832 got -1 from jenkins
[13:04:35] <MatmaRex>	 dcausse: it should be harmless, CI has been having some issues since yesterday and occasionally jobs time out
[13:04:44] <wikibugs>	 10Operations, 10Traffic, 10Goal: Establish timeline and methodology for upcoming deprecation of non-forward-secret ciphers and TLSv1.0 - https://phabricator.wikimedia.org/T192559 (10BBlack) Going a bit beyond the explicit scope of this ticket, there are really a few different legacy-support risks we'd like t...
[13:04:49] <MatmaRex>	 note how it took exactly 30 minutes to fail
[13:05:05] <MatmaRex>	 (bad news is, we might be waiting a long time for changes to merge)
[13:05:12] <dcausse>	 :/
[13:05:33] <MatmaRex>	 (see https://phabricator.wikimedia.org/T198348)
[13:06:03] <dcausse>	 so what's the plan C+2/V+2 ?
[13:06:26] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: openstack: eqiad1: enable glance in control boxes [puppet] - 10https://gerrit.wikimedia.org/r/442836 (https://phabricator.wikimedia.org/T196633)
[13:06:28] <wikibugs>	 (03PS1) 10Vgutierrez: varnishkafka: Set TLS curves list and sigalgs list for cache::upload [puppet] - 10https://gerrit.wikimedia.org/r/442840 (https://phabricator.wikimedia.org/T182993)
[13:06:43] <MatmaRex>	 just C+2, it will run the tests again, and hopefully they'll pass
[13:06:58] <dcausse>	 ok will C+2 and deploy my patches in the meantime
[13:07:16] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 032] openstack: eqiad1: enable glance in control boxes [puppet] - 10https://gerrit.wikimedia.org/r/442836 (https://phabricator.wikimedia.org/T196633) (owner: 10Arturo Borrero Gonzalez)
[13:07:18] <wikibugs>	 (03CR) 10Elukey: [C: 031] varnishkafka: Set TLS curves list and sigalgs list for cache::upload [puppet] - 10https://gerrit.wikimedia.org/r/442840 (https://phabricator.wikimedia.org/T182993) (owner: 10Vgutierrez)
[13:07:20] <MatmaRex>	 and if it takes more than 10 minutes or so, then yeah, you'll have to V+2 and merge, i guess
[13:07:34] <wikibugs>	 (03CR) 10DCausse: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442317 (https://phabricator.wikimedia.org/T182717) (owner: 10DCausse)
[13:07:38] <icinga-wm>	 RECOVERY - Host analytics1032.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.52 ms
[13:07:59] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10User-Elukey: Degraded RAID on dbstore1002 - https://phabricator.wikimedia.org/T197707 (10Marostegui) ``` root@dbstore1002:~# megacli -PDRbld -ShowProg -PhysDrv [32:5] -aALL  Rebuild Progress on Device at Enclosure 32, Slot 5 Completed 1% in 63 Minutes. ```
[13:08:31] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10User-Elukey: Degraded RAID on dbstore1002 - https://phabricator.wikimedia.org/T197707 (10elukey) Thanks @Marostegui !
[13:08:37] <icinga-wm>	 RECOVERY - Check systemd state on labcontrol1004 is OK: OK - running: The system is fully operational
[13:08:48] <wikibugs>	 (03Merged) 10jenkins-bot: Add cirrussearch settings for wikibase (1.5/3) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442317 (https://phabricator.wikimedia.org/T182717) (owner: 10DCausse)
[13:09:04] <wikibugs>	 (03CR) 10jenkins-bot: Add cirrussearch settings for wikibase (1.5/3) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442317 (https://phabricator.wikimedia.org/T182717) (owner: 10DCausse)
[13:10:14] <wikibugs>	 10Operations, 10ops-eqiad: anaytics1032's BBU is not working correctly - https://phabricator.wikimedia.org/T194234 (10elukey) Looks good!  ``` elukey@analytics1032:~$ sudo megacli -AdpBbuCmd -GetBbuStatus -aALL  BBU status for Adapter: 0  BatteryType: BBU Voltage: 3966 mV Current: 161 mA Temperature: 40 C Batt...
[13:10:52] <wikibugs>	 (03CR) 10Vgutierrez: [C: 032] varnishkafka: Set TLS curves list and sigalgs list for cache::upload [puppet] - 10https://gerrit.wikimedia.org/r/442840 (https://phabricator.wikimedia.org/T182993) (owner: 10Vgutierrez)
[13:11:10] <wikibugs>	 (03PS2) 10Vgutierrez: varnishkafka: Set TLS curves list and sigalgs list for cache::upload [puppet] - 10https://gerrit.wikimedia.org/r/442840 (https://phabricator.wikimedia.org/T182993)
[13:11:11] <wikibugs>	 10Operations, 10ops-eqiad: anaytics1032's BBU is not working correctly - https://phabricator.wikimedia.org/T194234 (10elukey) 05Open>03Resolved
[13:11:19] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: openstack: glance: install from jessie-backports if running mitaka [puppet] - 10https://gerrit.wikimedia.org/r/442841 (https://phabricator.wikimedia.org/T196633)
[13:11:48] <icinga-wm>	 PROBLEM - Check systemd state on labcontrol1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[13:12:08] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 032] openstack: glance: install from jessie-backports if running mitaka [puppet] - 10https://gerrit.wikimedia.org/r/442841 (https://phabricator.wikimedia.org/T196633) (owner: 10Arturo Borrero Gonzalez)
[13:12:19] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: openstack: glance: install from jessie-backports if running mitaka [puppet] - 10https://gerrit.wikimedia.org/r/442841 (https://phabricator.wikimedia.org/T196633)
[13:13:00] <logmsgbot>	 !log dcausse@deploy1001 Synchronized ./wmf-config/WikibaseSearchSettings.php: Add cirrussearch settings for wikibase (1.5/3) (duration: 00m 56s)
[13:13:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:13:12] <dcausse>	 deploying my second patch: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/442318/
[13:13:29] <wikibugs>	 (03CR) 10DCausse: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442318 (https://phabricator.wikimedia.org/T182717) (owner: 10DCausse)
[13:14:17] <icinga-wm>	 PROBLEM - puppet last run on labcontrol1004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[glance]
[13:14:31] <vgutierrez>	 !log Apply new TLS varnishkafka settings in cache::upload nodes - T182993
[13:14:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:14:33] <stashbot>	 T182993: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993
[13:14:39] <wikibugs>	 (03Merged) 10jenkins-bot: Add cirrussearch settings for wikibase (2/3) (take 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442318 (https://phabricator.wikimedia.org/T182717) (owner: 10DCausse)
[13:16:18] <icinga-wm>	 RECOVERY - Check systemd state on labcontrol1004 is OK: OK - running: The system is fully operational
[13:18:56] <logmsgbot>	 !log dcausse@deploy1001 Synchronized ./wmf-config/: Add cirrussearch settings for wikibase (2/3) (take 2) (duration: 00m 58s)
[13:18:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:19:18] <icinga-wm>	 RECOVERY - puppet last run on labcontrol1004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[13:19:22] <wikibugs>	 (03CR) 10jenkins-bot: Add cirrussearch settings for wikibase (2/3) (take 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442318 (https://phabricator.wikimedia.org/T182717) (owner: 10DCausse)
[13:19:34] <dcausse>	 deploying my third patch: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/441057/
[13:19:38] <icinga-wm>	 PROBLEM - Check systemd state on labcontrol1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[13:19:53] <wikibugs>	 (03CR) 10DCausse: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/441057 (https://phabricator.wikimedia.org/T182717) (owner: 10DCausse)
[13:20:49] <wikibugs>	 (03PS4) 10Elukey: cassandra: add another package version to the 2.2 list [puppet] - 10https://gerrit.wikimedia.org/r/442251 (https://phabricator.wikimedia.org/T197062)
[13:21:05] <wikibugs>	 (03Merged) 10jenkins-bot: Add cirrussearch settings for wikibase (3/3) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/441057 (https://phabricator.wikimedia.org/T182717) (owner: 10DCausse)
[13:21:19] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] cassandra: add another package version to the 2.2 list [puppet] - 10https://gerrit.wikimedia.org/r/442251 (https://phabricator.wikimedia.org/T197062) (owner: 10Elukey)
[13:22:52] <wikibugs>	 (03CR) 10Ottomata: ":D" [puppet] - 10https://gerrit.wikimedia.org/r/440544 (https://phabricator.wikimedia.org/T182993) (owner: 10Vgutierrez)
[13:24:11] <wikibugs>	 10Operations, 10Analytics-Cluster, 10Analytics-Kanban, 10Traffic, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993 (10Ottomata) Woo hoo!  Annnnd soon we disable IPSec?! :D
[13:25:02] <wikibugs>	 (03CR) 10jenkins-bot: Add cirrussearch settings for wikibase (3/3) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/441057 (https://phabricator.wikimedia.org/T182717) (owner: 10DCausse)
[13:25:06] <logmsgbot>	 !log dcausse@deploy1001 Synchronized ./wmf-config/Wikibase-production.php: Add cirrussearch settings for wikibase (3/3) (duration: 00m 56s)
[13:25:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:25:25] <dcausse>	 ok I'm done with my patches
[13:25:46] <dcausse>	 still waiting for CI on https://gerrit.wikimedia.org/r/c/mediawiki/core/+/442832 :(
[13:29:34] <wikibugs>	 (03CR) 10Elukey: [V: 032 C: 032] "https://puppet-compiler.wmflabs.org/compiler02/11608/" [puppet] - 10https://gerrit.wikimedia.org/r/442251 (https://phabricator.wikimedia.org/T197062) (owner: 10Elukey)
[13:31:35] <MatmaRex>	 it took forever to get going, but it's actually running tests now
[13:32:18] <dcausse>	 ok
[13:36:15] <wikibugs>	 10Operations, 10ops-eqiad, 10User-Elukey, 10User-Joe: rack/setup/install rdb10[09|10].eqiad.wmnet - https://phabricator.wikimedia.org/T196685 (10Cmjohnson)
[13:38:06] <MatmaRex>	 dcausse: ugh, well, it failed due to timing out
[13:38:12] <dcausse>	 :(
[13:38:36] <wikibugs>	 10Operations, 10SRE-Access-Requests: WMF-NDA-Request for User:Braveheart - https://phabricator.wikimedia.org/T198190 (10Braveheart) Hi Nuria!  I'd love to look at the geoeditor reports - when do you expect them to be published? I assume I still need LDAP access for these datasets?  Best, Philip
[13:38:56] <dcausse>	 zeljkof: is it OK to C+2/V+2 when jenkins is timing out and the patch looks harmless?
[13:39:16] <zeljkof>	 dcausse: it's up to the deployer to decide :D
[13:39:28] <dcausse>	 meh :)
[13:39:42] <zeljkof>	 dcausse: if you are reasonably sure it will not break stuff, and please monitor the logs for at least a few minutes after the deploy
[13:39:43] <MatmaRex>	 dcausse: haha, actually, it looks like different jobs timed out in the "Main test build" and "Gate pipeline build". so it does actually pass them all, at least sometimes ;)
[13:43:29] <dcausse>	 MatmaRex: it's live on mwdebug1002
[13:43:56] <MatmaRex>	 ok, testing
[13:44:37] <icinga-wm>	 PROBLEM - puppet last run on maps-test2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[cassandra]
[13:46:04] <MatmaRex>	 dcausse: looks fine!
[13:46:11] <dcausse>	 MatmaRex: ok deploying
[13:46:11] <wikibugs>	 (03PS5) 10Muehlenhoff: Enable microcode for all database roles [puppet] - 10https://gerrit.wikimedia.org/r/442269 (https://phabricator.wikimedia.org/T127825)
[13:46:27] <MatmaRex>	 (the page took like a minute to load the first time)
[13:48:01] <wikibugs>	 (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1099:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442852
[13:48:07] <wikibugs>	 (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1099:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442852
[13:48:08] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 032] Enable microcode for all database roles [puppet] - 10https://gerrit.wikimedia.org/r/442269 (https://phabricator.wikimedia.org/T127825) (owner: 10Muehlenhoff)
[13:49:14] <elukey>	 !log downgrade cassandra and cassandra-tools from 2.2.6-wmf5 to 2.2.6-wmf3 in jessie-wikimedia component/cassandra22 - T197062
[13:49:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:49:17] <stashbot>	 T197062: Upgrade Cassandra on AQS to 2.2.6-wmf5 - https://phabricator.wikimedia.org/T197062
[13:49:23] <logmsgbot>	 !log dcausse@deploy1001 Synchronized ./php-1.32.0-wmf.10/includes/htmlform/: Allow overloading of getLabel() with return '&#160;' (duration: 00m 59s)
[13:49:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:49:32] <dcausse>	 MatmaRex: done
[13:49:48] <dcausse>	 raynor: I'm done
[13:49:54] <dcausse>	 sorry for the delay :(
[13:50:18] <MatmaRex>	 thanks. looks good in production
[13:50:57] <raynor>	 no worries
[13:51:34] <marostegui>	 dcausse: Is swat done?
[13:51:44] <raynor>	 nope, I need to swat one more thing
[13:51:44] <dcausse>	 marostegui: raynor has one more patch to submit
[13:51:50] <wikibugs>	 (03PS1) 10Rush: labstore: nfs-mount-manager add list, all, and refine help [puppet] - 10https://gerrit.wikimedia.org/r/442853
[13:51:56] <marostegui>	 Ah cool :)
[13:52:02] <raynor>	 swatting https://gerrit.wikimedia.org/r/#/c/442170/
[13:53:57] <wikibugs>	 (03PS2) 10Rush: labstore: nfs-mount-manager add list, all, and refine help [puppet] - 10https://gerrit.wikimedia.org/r/442853
[13:58:59] <wikibugs>	 (03PS3) 10Rush: labstore: nfs-mount-manager add list, all, and refine help [puppet] - 10https://gerrit.wikimedia.org/r/442853
[13:59:37] <raynor>	 it's soo slow ;/
[14:00:26] <dcausse>	 raynor: yes CI is struggling :(, I had to force merge the last patch
[14:01:35] <wikibugs>	 (03CR) 10Rush: [C: 032] labstore: nfs-mount-manager add list, all, and refine help [puppet] - 10https://gerrit.wikimedia.org/r/442853 (owner: 10Rush)
[14:03:01] <raynor>	 zeljkof: https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-hhvm-docker/6238/console >> 13:55:14 npm ERR! registry error parsing json
[14:03:08] <raynor>	 is it something we should worry about?
[14:04:07] <zeljkof>	 raynor: probably! I think hashar is working on it https://phabricator.wikimedia.org/T198348  
[14:05:05] <raynor>	 ok, zeljkof I think I need some help ;/
[14:05:41] <raynor>	 I merged the patch and I don't see it on deploy1001 - the patch is to merge to wmf/1.32.0-wmf.10
[14:05:56] <raynor>	 not master, maybe because of that I don't see it if I do `git fetch`?
[14:06:10] <gehel>	 !log downgrading cassandra to 2.2.6-wmf3 on maps-test2001 (it should never have been upgraded)
[14:06:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:06:29] <wikibugs>	 10Operations, 10Analytics-Cluster, 10Analytics-Kanban, 10Traffic, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993 (10Vgutierrez) >>! In T182993#4321845, @Ottomata wrote: > Woo hoo! >  > Annnnd soon we disable IPSec?! :D  As soon as we rollout this on cache::...
[14:06:33] <zeljkof>	 raynor: yes, there are slightly different steps for backports
[14:06:46] <zeljkof>	 raynor: looking up docs
[14:06:46] <raynor>	 also, the change is for Vector skin
[14:06:47] <raynor>	 not core
[14:07:07] <wikibugs>	 10Operations, 10ops-eqiad, 10Cloud-VPS: rack upgraded storage capacity in labstore100[67].eqiad.wmnet - https://phabricator.wikimedia.org/T196651 (10Cmjohnson)
[14:07:10] <zeljkof>	 hm, not sure I've ever done a skin, but it should be similar to an extension...
[14:07:38] <raynor>	 yup, I also think it's the same
[14:08:23] <wikibugs>	 10Operations, 10ops-eqiad, 10Cloud-VPS: rack upgraded storage capacity in labstore100[67].eqiad.wmnet - https://phabricator.wikimedia.org/T196651 (10Cmjohnson) i was able to relocate a few servers in d2 to make room for the new disk shelf (LS1007). For LS1006, I just removed 2 decom'd servers from u24 and 25...
[14:09:43] <icinga-wm>	 RECOVERY - puppet last run on maps-test2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[14:10:19] <zeljkof>	 raynor: so these are the steps
[14:10:20] <zeljkof>	 https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Step_2:_get_the_code_on_the_deployment_host
[14:10:32] <zeljkof>	 I do have a simplified version, I think, will create a phab paste
[14:11:29] <raynor>	 ok, got it, thanks
[14:11:34] <zeljkof>	 raynor: so this is what I do https://phabricator.wikimedia.org/P7315
[14:12:02] <zeljkof>	 let me know if you have questions
[14:19:42] <raynor>	 ok, I have code in vector, code is up to date
[14:19:50] <raynor>	 now if I do scap pull on mwdebug1002 nothing changes
[14:19:57] <wikibugs>	 10Operations, 10ops-eqiad: mw1239 correctable memory errors - https://phabricator.wikimedia.org/T198398 (10fgiunchedi)
[14:20:33] <icinga-wm>	 ACKNOWLEDGEMENT - Memory correctable errors -EDAC- on mw1239 is CRITICAL: 12 ge 4 Filippo Giunchedi https://phabricator.wikimedia.org/T198398 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=mw1239&var-datasource=eqiad%2520prometheus%252Fops
[14:21:12] <icinga-wm>	 RECOVERY - Check systemd state on labcontrol1004 is OK: OK - running: The system is fully operational
[14:24:32] <icinga-wm>	 PROBLEM - Check systemd state on labcontrol1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[14:26:22] <hashar>	 !log CI jobs running npm might suffer from a 10 minutes delay since June 27th | T198348
[14:26:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:26:25] <stashbot>	 T198348: CI jobs takes too long / instances overloaded - https://phabricator.wikimedia.org/T198348
[14:29:04] <moritzm>	 !log installing ghostscript security updates
[14:29:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:30:39] <wikibugs>	 (03CR) 10Filippo Giunchedi: "I couldn't find a production host that runs striker, according to puppet's manifest comment that ought to be labtestweb2001 but PCC disagr" [puppet] - 10https://gerrit.wikimedia.org/r/431595 (https://phabricator.wikimedia.org/T147326) (owner: 10Filippo Giunchedi)
[14:31:28] <icinga-wm>	 PROBLEM - Host ms-be1036.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[14:32:03] <godog>	 that's known, downtime expired perhaps
[14:35:23] <wikibugs>	 10Operations, 10ops-eqiad, 10Cloud-VPS, 10cloud-services-team: rack/setup/install cloudelastic100[1-4].eqiad.wmnet systems - https://phabricator.wikimedia.org/T194186 (10Cmjohnson)
[14:36:40] <marostegui>	 zeljkof: can I deploy? https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/442852/
[14:37:07] <zeljkof>	 marostegui: just a sec, raynor is finishing up something
[14:37:14] <marostegui>	 Cool!
[14:39:43] <wikibugs>	 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install backup1001 - https://phabricator.wikimedia.org/T196478 (10Cmjohnson)
[14:39:53] <wikibugs>	 (03PS3) 10Elukey: Upgrade the piwik module to matomo [puppet] - 10https://gerrit.wikimedia.org/r/442806 (https://phabricator.wikimedia.org/T192298)
[14:40:03] <wikibugs>	 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install backup1001 - https://phabricator.wikimedia.org/T196478 (10Cmjohnson) disk arrays are racked in D2.
[14:40:57] <icinga-wm>	 PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received
[14:41:44] <wikibugs>	 (03CR) 10Elukey: [C: 032] Upgrade the piwik module to matomo [puppet] - 10https://gerrit.wikimedia.org/r/442806 (https://phabricator.wikimedia.org/T192298) (owner: 10Elukey)
[14:41:57] <icinga-wm>	 RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy
[14:44:00] <raynor>	 marostegui, almost done
[14:44:47] <marostegui>	 great, I will wait for it :)
[14:46:07] <wikibugs>	 (03PS1) 10Filippo Giunchedi: statsite: deprecate Diamond udp collector [puppet] - 10https://gerrit.wikimedia.org/r/442865 (https://phabricator.wikimedia.org/T183454)
[14:46:40] <elukey>	 !log upgrade piwik 3.2.1 to matomo (new name/package) 3.5.1 - T192298
[14:46:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:46:42] <stashbot>	 T192298: Update piwik to latest stable - https://phabricator.wikimedia.org/T192298
[14:46:59] <wikibugs>	 (03PS1) 10Urbanecm: Add sat to langs.tmpl [dns] - 10https://gerrit.wikimedia.org/r/442867 (https://phabricator.wikimedia.org/T198400)
[14:49:57] <logmsgbot>	 !log pmiazga@deploy1001 Synchronized php-1.32.0-wmf.10/skins/Vector/components/watchstar.less: SWAT: [[gerrit:442170|Use exactly calculated value to work around a Chrome bug (T196610)]] (duration: 01m 00s)
[14:49:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:49:59] <stashbot>	 T196610: Star in tab bar disappears after adding page to watchlist in Chrome - https://phabricator.wikimedia.org/T196610
[14:50:52] <raynor>	 !log EU SWAT finished
[14:50:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:07] <icinga-wm>	 RECOVERY - Check systemd state on labcontrol1004 is OK: OK - running: The system is fully operational
[14:51:16] <raynor>	 SWAT is done, sorry for taking so long, I had problems with testing the patch ;/
[14:51:29] <raynor>	 marostegui: you can proceed
[14:51:34] <marostegui>	 Thanks
[14:52:41] <wikibugs>	 (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1099:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442852 (owner: 10Marostegui)
[14:52:43] <wikibugs>	 10Operations, 10ops-eqdfw, 10netops: eqdfw: Patch GTT cross-connect - https://phabricator.wikimedia.org/T194515 (10ayounsi) 05Open>03Resolved The LOA was incorrect, Equinix moved it to the proper one and link is up.
[14:53:58] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1099:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442852 (owner: 10Marostegui)
[14:54:27] <icinga-wm>	 PROBLEM - Check systemd state on labcontrol1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[14:55:14] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1099:3318 after alter table (duration: 00m 58s)
[14:55:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:57:12] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1101:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442869 (https://phabricator.wikimedia.org/T191316)
[14:58:03] <wikibugs>	 (03PS1) 10Rush: WIP labstore: switch labstore1005 to primary in pair [puppet] - 10https://gerrit.wikimedia.org/r/442870 (https://phabricator.wikimedia.org/T187962)
[14:58:42] <wikibugs>	 (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1099:3318" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442852 (owner: 10Marostegui)
[14:59:13] <wikibugs>	 (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1101:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442869 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui)
[15:00:06] <wikibugs>	 (03PS1) 10Urbanecm: Initial configuration for satwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442871 (https://phabricator.wikimedia.org/T198400)
[15:00:23] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1101:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442869 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui)
[15:01:15] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Initial configuration for satwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442871 (https://phabricator.wikimedia.org/T198400) (owner: 10Urbanecm)
[15:01:39] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1101:3318 for alter table (duration: 00m 57s)
[15:01:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:53] <marostegui>	 !log  Deploy schema change on db1101:3318 T191316 T192926 T89737 T195193
[15:01:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:57] <stashbot>	 T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737
[15:01:57] <stashbot>	 T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[15:01:58] <stashbot>	 T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[15:01:58] <stashbot>	 T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[15:03:47] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1101:3318 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442869 (https://phabricator.wikimedia.org/T191316) (owner: 10Marostegui)
[15:07:17] <icinga-wm>	 PROBLEM - puppet last run on labstore1007 is CRITICAL: CRITICAL: Puppet has 4 failures. Last run 4 minutes ago with 4 failures. Failed resources (up to 3 shown): File[/srv/dumps/xmldatadumps/public/other/mediacounts/readme.html],File[/srv/dumps/xmldatadumps/public/other/pageviews/readme.html],File[/srv/dumps/xmldatadumps/public/other/misc]
[15:08:10] <wikibugs>	 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install backup2001 - https://phabricator.wikimedia.org/T196477 (10Papaul) @MoritzMuehlenhoff  please see below for the output you requested   {F23057592}  {F23057594}
[15:10:17] <icinga-wm>	 PROBLEM - HP RAID on labstore1007 is CRITICAL: CRITICAL: Slot 1: OK: 2I:4:1, 2I:4:2, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4 - Controller: OK - Battery/Capacitor: OK --- Slot 3: bad transfer speed: 1E:2:1(12.0Gbps, Unknown), 1E:2:2(12.0Gbps, Unknown), 1E:2:3(12.0Gbps, Unknown), 1E:2:4(12.0Gbps, Unknown), 1E:2:5(12.0Gbps, Unknown), 1E:2:6(12.0Gbps, Unknown), 1E:2:7(12.0Gbps,
[15:10:17] <icinga-wm>	 2.0Gbps, Unknown), 1E:2:9(12.0Gbps, Unknown), 1E:2:10(12.0Gbps, Unknown), 1E:2:11(12.0Gbps, Unknown), 1E:2:12(12.0Gbps, Unknown) - OK: 1E:1:1, 1E:1:3, 1E:1:5, 1E:1:7, 1E:1:9, 1E:1:11, 1E:2:1, 1E:2:2, 1E:2:3, 1E:2:4, 1E:2:5, 1E:2:6, 1E:2:7, 1E:2:8, 1E:2:9, 1E:2:10, 1E:2:11, 1E:2:12 - Failed: 1E:1:2, 1E:1:4, 1E:1:6, 1E:1:8, 1E:1:10, 1E:1:12 - Controller: OK - Battery/Capacitor: OK
[15:10:19] <icinga-wm>	 ACKNOWLEDGEMENT - HP RAID on labstore1007 is CRITICAL: CRITICAL: Slot 1: OK: 2I:4:1, 2I:4:2, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4 - Controller: OK - Battery/Capacitor: OK --- Slot 3: bad transfer speed: 1E:2:1(12.0Gbps, Unknown), 1E:2:2(12.0Gbps, Unknown), 1E:2:3(12.0Gbps, Unknown), 1E:2:4(12.0Gbps, Unknown), 1E:2:5(12.0Gbps, Unknown), 1E:2:6(12.0Gbps, Unknown), 1E:2:7(1
[15:10:20] <icinga-wm>	 1E:2:8(12.0Gbps, Unknown), 1E:2:9(12.0Gbps, Unknown), 1E:2:10(12.0Gbps, Unknown), 1E:2:11(12.0Gbps, Unknown), 1E:2:12(12.0Gbps, Unknown) - OK: 1E:1:1, 1E:1:3, 1E:1:5, 1E:1:7, 1E:1:9, 1E:1:11, 1E:2:1, 1E:2:2, 1E:2:3, 1E:2:4, 1E:2:5, 1E:2:6, 1E:2:7, 1E:2:8, 1E:2:9, 1E:2:10, 1E:2:11, 1E:2:12 - Failed: 1E:1:2, 1E:1:4, 1E:1:6, 1E:1:8, 1E:1:10, 1E:1:12 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https:
[15:10:20] <icinga-wm>	 edia.org/T198407
[15:10:23] <wikibugs>	 10Operations, 10SRE-Access-Requests: WMF-NDA-Request for User:Braveheart - https://phabricator.wikimedia.org/T198190 (10Nuria) @Braveheart: yes, you will need LDAP access and a term for which is granted (we do not grant unbounded access) . We have also given out edited versions of those reports w/o LDAP to thi...
[15:10:25] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on labstore1007 - https://phabricator.wikimedia.org/T198407 (10ops-monitoring-bot)
[15:11:57] <icinga-wm>	 PROBLEM - Device not healthy -SMART- on labstore1006 is CRITICAL: cluster=misc device={cciss,14,cciss,15,cciss,16,cciss,17,cciss,18,cciss,19,cciss,20,cciss,21,cciss,22,cciss,23} instance=labstore1006:9100 job=node site=eqiad https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=labstore1006&var-datasource=eqiad%2520prometheus%252Fops
[15:12:47] <wikibugs>	 10Operations, 10ops-eqiad, 10DNS, 10Traffic: rack/setup/install authdns1001.wikimedia.org - https://phabricator.wikimedia.org/T196693 (10Cmjohnson)
[15:13:04] <wikibugs>	 (03PS2) 10Urbanecm: Initial configuration for satwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442871 (https://phabricator.wikimedia.org/T198400)
[15:14:10] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Initial configuration for satwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442871 (https://phabricator.wikimedia.org/T198400) (owner: 10Urbanecm)
[15:15:01] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on labstore1007 - https://phabricator.wikimedia.org/T198407 (10chasemp) a:03Cmjohnson I don't quite understand this.  Is this trying to say 6 failed drives?
[15:15:08] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on labstore1007 - https://phabricator.wikimedia.org/T198407 (10chasemp) p:05Triage>03High
[15:15:46] <wikibugs>	 (03PS3) 10Urbanecm: Initial configuration for satwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442871 (https://phabricator.wikimedia.org/T198400)
[15:16:14] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on labstore1007 - https://phabricator.wikimedia.org/T198407 (10chasemp) and labstore1006 as well?  [from irc]  ```PROBLEM - Device not healthy -SMART- on labstore1006 is CRITICAL: cluster=misc device={cciss,14,cciss,15,cciss,16,cciss,17,cciss,18,cciss,19,cciss,20,cciss,...
[15:16:28] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on labstore1007 - https://phabricator.wikimedia.org/T198407 (10chasemp) @Volans can you help make sense of this?
[15:17:11] <volans>	 chasemp: wow :)
[15:17:39] <chasemp>	 volans: I have a guess that cmjohnson1 is adding new shelves here and it's causing the raid monitoring to freak out, but I'm really not sure
[15:17:47] <chasemp>	 I have a meeting in 3 fyi
[15:18:24] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on labstore1007 - https://phabricator.wikimedia.org/T198407 (10chasemp) possibly related to {T196651}?
[15:18:36] <volans>	 might be, has so many things wrong, if you're mangling with the host the best suggestion is to disable event handler in Icinga
[15:18:45] <volans>	 for the HP RAID
[15:18:48] <volans>	 check for that host
[15:19:09] <volans>	 and re-enable it once done
[15:19:19] <chasemp>	 I'm not doing anything w/ it today but ack I wonder if cmjohnson1 is
[15:19:51] <volans>	 for that I've no more info than you ;)
[15:20:52] <chasemp>	 heard
[15:21:03] <chasemp>	 apergos: fyi T198407 and T196651, I'm not sure what's going on
[15:21:03] <stashbot>	 T198407: Degraded RAID on labstore1007 - https://phabricator.wikimedia.org/T198407
[15:21:03] <stashbot>	 T196651: rack upgraded storage capacity in labstore100[67].eqiad.wmnet - https://phabricator.wikimedia.org/T196651
[15:21:07] <icinga-wm>	 RECOVERY - Check systemd state on labcontrol1004 is OK: OK - running: The system is fully operational
[15:21:20] <apergos>	 chasemp: ?
[15:21:31] <marostegui>	 chasemp: Bunch of IO errors on dmesg
[15:21:48] <apergos>	 crap
[15:22:07] <marostegui>	 Maybe cmjohnson1 pulling out disks
[15:22:09] <chasemp>	 it seems crazy it would hit both servers at the same time unless it was related to connecting new shelves
[15:22:19] <apergos>	 indeed
[15:22:48] <apergos>	 that explains the cron email about a read-only filesystem I just got (labstore1007)
[15:23:00] <wikibugs>	 (03PS1) 10Volans: Improve validation on host package updates [software/debmonitor] - 10https://gerrit.wikimedia.org/r/442876 (https://phabricator.wikimedia.org/T191299)
[15:23:42] <cmjohnson1>	 Chasemp sorry I connected them. I thought the new shelves were powered off
[15:23:51] <apergos>	 ah there is the mystery
[15:23:54] <chasemp>	 ohhhh
[15:23:55] <chasemp>	 ok
[15:23:56] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Improve validation on host package updates [software/debmonitor] - 10https://gerrit.wikimedia.org/r/442876 (https://phabricator.wikimedia.org/T191299) (owner: 10Volans)
[15:24:27] <icinga-wm>	 PROBLEM - Check systemd state on labcontrol1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[15:24:53] <chasemp>	 I have to hop into a meeting apergos and cmjohnson1, thanks (we have a bit maint in 40 minutes)
[15:25:01] <apergos>	 so do I
[15:31:47] <wikibugs>	 (03CR) 10Alexandros Kosiaris: "Not bad at all as a first step. Pretty good in fact!" [puppet] - 10https://gerrit.wikimedia.org/r/442301 (https://phabricator.wikimedia.org/T178690) (owner: 10Filippo Giunchedi)
[15:31:59] <cmjohnson1>	 apergos I disconnected the new disk shelves
[15:32:07] <cmjohnson1>	 per chasemp request 
[15:32:07] <icinga-wm>	 PROBLEM - HP RAID on labstore1006 is CRITICAL: CRITICAL: Slot 1: OK: 2I:4:1, 2I:4:2, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4 - Controller: OK - Battery/Capacitor: OK --- Slot 3: Failed: 1E:1:2, 1E:1:4, 1E:1:6, 1E:1:8, 1E:1:10, 1E:1:12 - OK: 1E:1:1, 1E:1:3, 1E:1:5, 1E:1:7, 1E:1:9, 1E:1:11 - Controller: OK - Battery/Capacitor: OK
[15:32:10] <icinga-wm>	 ACKNOWLEDGEMENT - HP RAID on labstore1006 is CRITICAL: CRITICAL: Slot 1: OK: 2I:4:1, 2I:4:2, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4 - Controller: OK - Battery/Capacitor: OK --- Slot 3: Failed: 1E:1:2, 1E:1:4, 1E:1:6, 1E:1:8, 1E:1:10, 1E:1:12 - OK: 1E:1:1, 1E:1:3, 1E:1:5, 1E:1:7, 1E:1:9, 1E:1:11 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: htt
[15:32:10] <icinga-wm>	 kimedia.org/T198408
[15:32:14] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on labstore1006 - https://phabricator.wikimedia.org/T198408 (10ops-monitoring-bot)
[15:32:29] <apergos>	 I see
[15:32:31] <apergos>	 /dev/mapper/data-dumps on /srv/dumps type ext4 (ro,noatime,stripe=384,data=ordered)
[15:32:34] <apergos>	 still on labstore1007
[15:32:45] <apergos>	 I can't really look at it right now, meeting
[15:32:53] <chasemp>	 cmjohnson1: ^ I think chris is shutting down the new shelves for now apergos
[15:41:33] <apergos>	 can someone fsck or remount or whatever needs to happen over there please?
[15:41:37] <apergos>	 they are still ro 
[15:41:50] <apergos>	 I'm only on labstore1007, I have no idea about the other hosts
[15:43:24] <wikibugs>	 10Operations, 10Wikidata, 10monitoring, 10Patch-For-Review, 10User-Addshore: Add Addshore & possibly other WMDE devs/deployers to the wikidata icinga contact list - https://phabricator.wikimedia.org/T195289 (10Ladsgroup) Is this done?
[15:45:07] <icinga-wm>	 RECOVERY - Host ms-be1036 is UP: PING OK - Packet loss = 0%, RTA = 1.07 ms
[15:45:33] <cmjohnson1>	 godog ^ it's back 
[15:47:00] <wikibugs>	 10Operations, 10ops-eqiad: ms-be1036 in power off status, not responsive to power on commands - https://phabricator.wikimedia.org/T196873 (10Cmjohnson) I pushed the schedule and the HP tech came today.  The server is back online.  @godog please resolve if satisfied.
[15:49:17] <icinga-wm>	 PROBLEM - puppet last run on labstore1006 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 4 minutes ago with 3 failures. Failed resources (up to 3 shown): File[/srv/dumps/xmldatadumps/public/other/unique_devices/readme.html],File[/srv/dumps/xmldatadumps/public/other/misc]
[15:50:58] <icinga-wm>	 RECOVERY - Check systemd state on labcontrol1004 is OK: OK - running: The system is fully operational
[15:52:51] <andrewbogott>	 apergos: we're about to do some network maintenance on other labstores and I kind of want to ignore the 1006/1007 issues until after our window.  Can you live with that?  (It shouldn't be long)
[15:53:09] <apergos>	 it means rsyncs will fail for awhile
[15:53:20] <apergos>	 when is your window?
[15:53:53] <andrewbogott>	 in 7 minutes
[15:53:55] <apergos>	 fine
[15:54:12] <apergos>	 I was steeling myself for "oh in 6 hours"
[15:54:18] <icinga-wm>	 PROBLEM - Check systemd state on labcontrol1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[15:54:34] <andrewbogott>	 apergos: thanks
[15:54:59] <apergos>	 thanks for letting me know/looking at it later
[15:56:48] <icinga-wm>	 RECOVERY - Memory correctable errors -EDAC- on cp1053 is OK: (C)4 ge (W)2 ge 0 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=cp1053&var-datasource=eqiad%2520prometheus%252Fops
[16:00:04] <jouncebot>	 godog, moritzm, and _joe_: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Puppet SWAT(Max 6 patches) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180628T1600).
[16:00:04] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[16:00:20] <wikibugs>	 (03PS2) 10Vgutierrez: vcl: Bump AES128-SHA pageview replacement to 10% [puppet] - 10https://gerrit.wikimedia.org/r/441804 (https://phabricator.wikimedia.org/T192555)
[16:00:34] <wikibugs>	 (03CR) 10Vgutierrez: [C: 032] vcl: Bump AES128-SHA pageview replacement to 10% [puppet] - 10https://gerrit.wikimedia.org/r/441804 (https://phabricator.wikimedia.org/T192555) (owner: 10Vgutierrez)
[16:00:37] <icinga-wm>	 PROBLEM - Device not healthy -SMART- on labstore1007 is CRITICAL: cluster=misc device={cciss,14,cciss,15,cciss,16,cciss,17,cciss,18,cciss,19,cciss,20,cciss,21,cciss,22,cciss,23} instance=labstore1007:9100 job=node site=eqiad https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=labstore1007&var-datasource=eqiad%2520prometheus%252Fops
[16:03:29] <arturo>	 apergos: labstore1007 dmesg
[16:03:32] <arturo>	 https://www.irccloud.com/pastebin/Spm5ruwv/
[16:03:46] <apergos>	 I'm in a meeting :-(
[16:05:32] <wikibugs>	 (03PS3) 10Paladox: Gerrit: Clone avatars repo into /var/www/avatars [puppet] - 10https://gerrit.wikimedia.org/r/440104
[16:06:09] <wikibugs>	 (03PS4) 10Paladox: Gerrit: Clone avatars repo into /var/www/avatars [puppet] - 10https://gerrit.wikimedia.org/r/440104
[16:09:15] <moritzm>	 !log restarting Cassandra instances on restbase2005 to pick up Java security update
[16:09:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:09:19] <wikibugs>	 10Operations, 10Operations-Software-Development, 10Goal, 10Patch-For-Review: Release and deploy Debmonitor (patch management software) [Technology Goal 2017-18_Q4] - https://phabricator.wikimedia.org/T191298 (10Volans)
[16:09:23] <wikibugs>	 10Operations, 10Operations-Software-Development, 10Patch-For-Review: Debmonitor: deploy the service in production - https://phabricator.wikimedia.org/T191299 (10Volans) 05Open>03Resolved The service is in production and working fine. Some fine-tuning will follow in separate tasks. Goal wise this is comple...
[16:09:53] <wikibugs>	 10Operations, 10Operations-Software-Development, 10Goal, 10Patch-For-Review: Release and deploy Debmonitor (patch management software) [Technology Goal 2017-18_Q4] - https://phabricator.wikimedia.org/T191298 (10Volans)
[16:09:58] <wikibugs>	 10Operations, 10Operations-Software-Development, 10Patch-For-Review: Debmonitor: deploy the agent across the fleet - https://phabricator.wikimedia.org/T191300 (10Volans) 05Open>03Resolved The client is in production across the whole fleet and working fine. Some fine-tuning might follow in separate tasks....
[16:10:17] <wikibugs>	 10Operations, 10Operations-Software-Development, 10Goal, 10Patch-For-Review: Release and deploy Debmonitor (patch management software) [Technology Goal 2017-18_Q4] - https://phabricator.wikimedia.org/T191298 (10Volans)
[16:10:44] <wikibugs>	 10Operations, 10Operations-Software-Development, 10Goal, 10Patch-For-Review: Release and deploy Debmonitor (patch management software) [Technology Goal 2017-18_Q4] - https://phabricator.wikimedia.org/T191298 (10Volans) 05Open>03Resolved The service and client are in production and working fine. Some fi...
[16:10:57] <wikibugs>	 (03PS1) 10Muehlenhoff: Extend access for jsamra [puppet] - 10https://gerrit.wikimedia.org/r/442881
[16:12:07] <icinga-wm>	 RECOVERY - Device not healthy -SMART- on labstore1006 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=labstore1006&var-datasource=eqiad%2520prometheus%252Fops
[16:12:50] <wikibugs>	 10Operations, 10Operations-Software-Development, 10Patch-For-Review: New tool to track package updates/status for hosts and images (debmonitor) - https://phabricator.wikimedia.org/T167504 (10Volans) The service and client are in production and working fine. Leaving the task open for the Docker images part.
[16:13:42] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 032] Extend access for jsamra [puppet] - 10https://gerrit.wikimedia.org/r/442881 (owner: 10Muehlenhoff)
[16:17:47] <icinga-wm>	 PROBLEM - Host labstore1004 is DOWN: PING CRITICAL - Packet loss = 100%
[16:18:17] <icinga-wm>	 PROBLEM - toolschecker: tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 20 seconds
[16:18:55] <icinga-wm>	 PROBLEM - toolschecker: NFS read/writeable on labs instances on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 504 Gateway Time-out - string OK not found on http://checker.tools.wmflabs.org:80/nfs/home - 356 bytes in 60.008 second response time
[16:19:07] <icinga-wm>	 RECOVERY - Host labstore1004 is UP: PING OK - Packet loss = 0%, RTA = 0.18 ms
[16:19:18] <chasemp>	 ok well those are legit except it should be returning
[16:19:28] <chasemp>	 apologies this maintenance has been a bit of chaos
[16:20:27] <apergos>	 ok I am now here (meeting out)
[16:20:35] <icinga-wm>	 RECOVERY - toolschecker: NFS read/writeable on labs instances on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.012 second response time
[16:21:53] <icinga-wm>	 PROBLEM - drbd service on labstore1004 is CRITICAL: CRITICAL - Expecting active but unit drbd is inactive
[16:23:47] <herron>	 ^ just got the page on that.  am around if you need any help
[16:24:12] <icinga-wm>	 RECOVERY - drbd service on labstore1004 is OK: OK - drbd is active
[16:27:24] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on labstore1006 - https://phabricator.wikimedia.org/T198408 (10herron) p:05Triage>03High
[16:29:47] <icinga-wm>	 PROBLEM - Host es1015 is DOWN: PING CRITICAL - Packet loss = 100%
[16:30:01] <akosiaris>	 jynus: marostegui ^ ?
[16:30:32] <marostegui>	 Checking
[16:30:40] <marostegui>	 I think it was going to be reimaged
[16:30:43] <marostegui>	 Maybe downtime expired?
[16:30:45] <marostegui>	 checking anyways
[16:30:47] <icinga-wm>	 RECOVERY - Device not healthy -SMART- on labstore1007 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=labstore1007&var-datasource=eqiad%2520prometheus%252Fops
[16:30:49] <jynus>	 not 15
[16:30:52] <marostegui>	 ah, not 15
[16:30:53] <marostegui>	 ok
[16:30:55] <marostegui>	 so depooling it
[16:30:57] <jynus>	 it could mean a site-wide outage
[16:31:56] <akosiaris>	 it was probably depooled automatically by mediawiki by the lb, which per that task by jaime is not great
[16:32:06] <akosiaris>	 could indeed cause issues
[16:32:10] <jynus>	 akosiaris: load balancer doesn't work
[16:32:20] <akosiaris>	 yeah I've read that task
[16:32:20] <jynus>	 and less with network or hw issues
[16:32:20] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Depool es1015 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442888
[16:32:32] <akosiaris>	 I'm gonna try the mgmt
[16:32:33] <marostegui>	 jynus: ^
[16:32:58] <wikibugs>	 (03CR) 10Jcrespo: [C: 031] db-eqiad.php: Depool es1015 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442888 (owner: 10Marostegui)
[16:33:09] <wikibugs>	 (03CR) 10Marostegui: [V: 032 C: 032] db-eqiad.php: Depool es1015 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442888 (owner: 10Marostegui)
[16:33:19] <wikibugs>	 10Operations, 10ops-eqiad: rack/setup/install torrelay1001.wikimedia.org - https://phabricator.wikimedia.org/T196701 (10Cmjohnson)
[16:33:28] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool es1015 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442888 (owner: 10Marostegui)
[16:33:30] <akosiaris>	 eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
[16:33:31] <marostegui>	 Deploying
[16:33:38] <akosiaris>	 something networky is going on
[16:34:19] <akosiaris>	 cmjohnson1: any chance something happened to C2 ?
[16:34:24] <marostegui>	 XioNoX: ^
[16:34:26] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool es1015 - crashed (duration: 00m 57s)
[16:34:26] <jynus>	 mediawiki connection error https://logstash.wikimedia.org/goto/5b54c3ce596239a5908c43866b151449
[16:34:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:34:37] <akosiaris>	 es1015 (U15) got disconnected from the network
[16:34:42] <jynus>	 only 3000
[16:34:57] <icinga-wm>	 RECOVERY - Host es1015 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms
[16:34:57] <jynus>	 so load balancer could have worked this time, too early to say
[16:35:06] <marostegui>	 akosiaris: so the server is actually up then?
[16:35:13] <marostegui>	 at least that is good news (so no mysql crash)
[16:35:13] <akosiaris>	 yup
[16:35:14] <XioNoX>	 looking
[16:35:17] <akosiaris>	 working just fine
[16:35:28] <akosiaris>	 Jun 28 16:27:39 es1015 kernel: [9013291.474181] tg3 0000:01:00.0 eth0: Link is down
[16:35:28] <akosiaris>	 Jun 28 16:34:43 es1015 kernel: [9013715.430079] tg3 0000:01:00.0 eth0: Link is up at 1000 Mbps, full duplex
[16:35:35] <akosiaris>	 hmm
[16:35:35] <XioNoX>	 uh
[16:35:44] <cmjohnson1>	  i think we have a loose connection
[16:37:24] <marostegui>	 It is up again
[16:38:07] <akosiaris>	 yeah a ping -f does not spot any missed packets
[16:38:30] <marostegui>	 cmjohnson1: what was it then?
[16:38:38] <marostegui>	 cable misbehaving?
[16:38:55] <cmjohnson1>	 ironically I was just on that switch moving labstore1004 
[16:39:13] <akosiaris>	 probably a loose rj14 jack ?
[16:39:45] <jynus>	 maybe a loose rj11?
[16:39:58] <cmjohnson1>	 haha
[16:40:35] <marostegui>	 Maybe a loose coax?
[16:40:40] <akosiaris>	 :D
[16:40:58] <marostegui>	 I will leave it depooled a bit more to make sure it is all fine
[16:41:03] <jynus>	 +1
[16:41:07] <icinga-wm>	 PROBLEM - puppet last run on es1015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:41:10] <akosiaris>	 ah it is probably
[16:41:22] <jynus>	 it would be nice to check how mediawiki behaved
[16:41:22] <akosiaris>	 I can tell you I am doing ping -f for a long time now on it
[16:41:30] <akosiaris>	 and it's nice to see no packets lost
[16:41:37] <jynus>	 but for es* servers maybe the logic is cleaner/simpler
[16:41:49] <jynus>	 plus they have way less connections
[16:42:11] <jynus>	 (the problem is not immediate when it happens, it takes some time to buildup)
[16:42:25] <akosiaris>	 maybe we were just fast enough
[16:43:35] <marostegui>	 Could be yeah
[16:43:56] <jynus>	 it is not a 100% sure problem when it happens
[16:44:18] <jynus>	 e.g. I think it happens with DROP but not REJECT due to network timeouts
[16:44:58] <wikibugs>	 (03CR) 10Bstorm: [C: 032] WIP labstore: switch labstore1005 to primary in pair [puppet] - 10https://gerrit.wikimedia.org/r/442870 (https://phabricator.wikimedia.org/T187962) (owner: 10Rush)
[16:45:16] <wikibugs>	 (03PS2) 10Bstorm: WIP labstore: switch labstore1005 to primary in pair [puppet] - 10https://gerrit.wikimedia.org/r/442870 (https://phabricator.wikimedia.org/T187962) (owner: 10Rush)
[16:45:39] <_joe_>	 yes exactly that jynus 
[16:45:56] <jynus>	 it is not only that
[16:46:13] <_joe_>	 well in this case it's DROP-like behaviour
[16:46:21] <jynus>	 it requires some cache expiring and other interactions
[16:46:21] <_joe_>	 so the problem *should* be there
[16:46:29] <_joe_>	 heh ok
[16:46:46] <jynus>	 maybe it cannot happen on es* servers because there is not gtid wait, for example
[16:48:12] <bstorm_>	 moritzm: Ok to merge the commit for access for jsamra?
[16:48:44] <_joe_>	 bstorm_: assume it is 
[16:48:49] <bstorm_>	 ok :)
[16:49:13] <_joe_>	 if it was something less simple I would've advised to wait 
[16:49:30] <_joe_>	 for moritzm to respond, but in this case it's safe to assume it's ok
[16:49:33] <bstorm_>	 Fair enough
[16:49:37] <bstorm_>	 That makes sense
[16:49:39] <_joe_>	 and I get you're in the middle of a migration
[16:50:36] <bstorm_>	 👍🏻
[16:51:17] <icinga-wm>	 RECOVERY - Check systemd state on labcontrol1004 is OK: OK - running: The system is fully operational
[16:54:28] <icinga-wm>	 PROBLEM - Check systemd state on labcontrol1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[16:54:54] <chasemp>	 ^ arturo :)
[16:55:11] <arturo>	 how :S
[16:55:30] <arturo>	 according to icinga, it's downtimed
[16:56:21] <arturo>	 perhaps because the alert was before the downtime
[16:57:17] <icinga-wm>	 RECOVERY - toolschecker: tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 1015 bytes in 0.065 second response time
[16:57:33] <arturo>	 I just disabled notifications for all the services on the host and the host itself
[16:57:55] <wikibugs>	 (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool es1015" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442891
[16:58:01] <marostegui>	 jynus: ^ going to repool
[16:58:25] <jynus>	 should we wait until tomorrow?
[16:58:39] <marostegui>	 That wouldn't hurt
[16:58:44] <jynus>	 sorry, I don't know if the reason was caught
[16:58:46] <marostegui>	 Let's do it
[16:58:49] <jynus>	 like a mistake or something
[16:58:53] <jynus>	 if not, it won't hurt
[16:59:04] <jynus>	 the other server depooled is on a different shard
[16:59:29] <marostegui>	 Yeah, let's leave it till tomorrow
[16:59:51] <wikibugs>	 10Operations, 10Discovery, 10Discovery-Search: migrate elasticsearch cirrus cluster to RAID0 - https://phabricator.wikimedia.org/T198391 (10herron) p:05Triage>03Normal What are your thoughts about RAID10, RAID5(0) or even exposing each individual disk to ES an option for expansion?  I am leery of RAID0 s...
[17:00:04] <jouncebot>	 cscott, arlolra, subbu, halfak, and Amir1: Dear deployers, time to do the Services – Graphoid / Parsoid / Citoid / ORES deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180628T1700).
[17:01:47] <moritzm>	 ah, sorry, forgot to press ENTER in puppet-merge...
[17:03:00] <arturo>	 moritzm: :-)
[17:04:28] <icinga-wm>	 PROBLEM - toolschecker: All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 185 bytes in 0.235 second response time
[17:04:38] <chasemp>	 ^ andrewbogott :)
[17:05:17] <andrewbogott>	 does that mean I made it worse?
[17:06:17] <icinga-wm>	 RECOVERY - puppet last run on es1015 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[17:06:40] <arturo>	 should I downtime toolschecker?
[17:07:22] <chasemp>	 andrewbogott: I thought that was recovery...and it's not, my bad
[17:07:25] <chasemp>	 arturo: sure please
[17:07:32] <chasemp>	 we are making too much noise I think unnecessarily
[17:07:41] <chasemp>	 but andrewbogott I'm not sure what is still broken will look
[17:09:00] <arturo>	 downtimed
[17:09:27] <icinga-wm>	 RECOVERY - toolschecker: All k8s worker nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.234 second response time
[17:09:51] <chasemp>	 real recovery, apologies for the chaos.  things went a little sideways^
[17:16:03] <wikibugs>	 10Operations, 10ops-eqiad: mw1239 correctable memory errors - https://phabricator.wikimedia.org/T198398 (10herron) p:05Triage>03High Is a DIMM swap on channel:1 slot:0 the action to take on this?
[17:20:56] <icinga-wm>	 RECOVERY - Check systemd state on labcontrol1004 is OK: OK - running: The system is fully operational
[17:24:16] <icinga-wm>	 PROBLEM - Check systemd state on labcontrol1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[17:31:13] <wikibugs>	 10Operations, 10Discovery, 10Discovery-Search: migrate elasticsearch cirrus cluster to RAID0 - https://phabricator.wikimedia.org/T198391 (10EBernhardson) for the elasticsearch cluster, we could probably lose 3 or 4 machines before there was any thought of potential urgency. Elasticsearch can handle being pro...
[17:35:56] <wikibugs>	 10Operations, 10Discovery, 10Discovery-Search: migrate elasticsearch cirrus cluster to RAID0 - https://phabricator.wikimedia.org/T198391 (10Gehel) As an example, during cluster restarts, my standard procedure is to restart 3 nodes at a time. So we have strong evidence that losing 3 nodes is a non issue.
[17:44:28] <icinga-wm>	 PROBLEM - Host labstore1007 is DOWN: CRITICAL - Host Unreachable (208.80.155.106)
[17:48:34] <wikibugs>	 (03PS5) 10Giuseppe Lavagetto: [WIP] Add a WMF-specific tool for managing db config in MediaWiki [software/conftool] - 10https://gerrit.wikimedia.org/r/441396 (https://phabricator.wikimedia.org/T197126)
[17:48:36] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: Sanitize class names for entities [software/conftool] - 10https://gerrit.wikimedia.org/r/442899
[17:49:18] <icinga-wm>	 RECOVERY - Host labstore1007 is UP: PING OK - Packet loss = 0%, RTA = 0.14 ms
[17:49:56] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] [WIP] Add a WMF-specific tool for managing db config in MediaWiki [software/conftool] - 10https://gerrit.wikimedia.org/r/441396 (https://phabricator.wikimedia.org/T197126) (owner: 10Giuseppe Lavagetto)
[17:49:58] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Sanitize class names for entities [software/conftool] - 10https://gerrit.wikimedia.org/r/442899 (owner: 10Giuseppe Lavagetto)
[18:06:08] <icinga-wm>	 PROBLEM - Device not healthy -SMART- on labvirt1009 is CRITICAL: cluster=labvirt device=cciss,8 instance=labvirt1009:9100 job=node site=eqiad https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=labvirt1009&var-datasource=eqiad%2520prometheus%252Fops
[18:14:10] <wikibugs>	 (03PS1) 10Krinkle: webperf: Get graphite_host for coal::processor from Hiera [puppet] - 10https://gerrit.wikimedia.org/r/442900 (https://phabricator.wikimedia.org/T195314)
[18:20:49] <icinga-wm>	 RECOVERY - Check systemd state on labcontrol1004 is OK: OK - running: The system is fully operational
[18:23:18] <wikibugs>	 (03PS1) 10Volans: manage.py: add custom command for GC [software/debmonitor] - 10https://gerrit.wikimedia.org/r/442901
[18:24:18] <icinga-wm>	 PROBLEM - Check systemd state on labcontrol1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[18:24:25] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] manage.py: add custom command for GC [software/debmonitor] - 10https://gerrit.wikimedia.org/r/442901 (owner: 10Volans)
[18:26:54] <wikibugs>	 10Operations, 10ops-eqiad, 10Cloud-VPS: rack upgraded storage capacity in labstore100[67].eqiad.wmnet - https://phabricator.wikimedia.org/T196651 (10chasemp) We ran into trouble here:  * RAID issues reported and errors, and the /srv/dumps path was changed to ro * Chris set shelves back to before * labstore10...
[18:33:12] <wikibugs>	 (03CR) 10Krinkle: "Compiler failed (as expected) given I didn't add the Hiera field for this role yet, just wanted to confirm that in case it was being set i" [puppet] - 10https://gerrit.wikimedia.org/r/442900 (https://phabricator.wikimedia.org/T195314) (owner: 10Krinkle)
[18:36:09] <icinga-wm>	 RECOVERY - Device not healthy -SMART- on labvirt1009 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=labvirt1009&var-datasource=eqiad%2520prometheus%252Fops
[18:37:51] <wikibugs>	 (03PS2) 10Krinkle: webperf: Get graphite_host for coal::processor from Hiera [puppet] - 10https://gerrit.wikimedia.org/r/442900 (https://phabricator.wikimedia.org/T195314)
[18:38:13] <wikibugs>	 (03PS1) 10QChris: Add .gitreview [software/certcentral] - 10https://gerrit.wikimedia.org/r/442904
[18:38:16] <wikibugs>	 (03CR) 10QChris: [V: 032 C: 032] Add .gitreview [software/certcentral] - 10https://gerrit.wikimedia.org/r/442904 (owner: 10QChris)
[18:43:33] <wikibugs>	 (03CR) 10Krinkle: "No on-disk difference for prod:" [puppet] - 10https://gerrit.wikimedia.org/r/442900 (https://phabricator.wikimedia.org/T195314) (owner: 10Krinkle)
[18:44:02] <wikibugs>	 (03CR) 10Krinkle: "Diff from beta/webperf11:" [puppet] - 10https://gerrit.wikimedia.org/r/442900 (https://phabricator.wikimedia.org/T195314) (owner: 10Krinkle)
[18:48:09] <icinga-wm>	 RECOVERY - puppet last run on labstore1007 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[18:52:49] <marxarelli>	 tgr: rolling the train soon. it would be great to have you around to help in case
[18:53:38] <marxarelli>	 i'll be watching the logs like a hawk but more eyeballs are always helpful :)
[18:53:53] <marxarelli>	 oh and thcipriani is helping too
[18:54:25] * thcipriani raring
[18:54:41] <marxarelli>	 again, the plan is: group1 - commons, commons, vet vet vet, group2
[18:57:54] <tgr>	 marxarelli: when are you starting?
[18:58:23] <marxarelli>	 tgr: in 2 minutes, but i can wait a bit if that means you'll be more ready
[18:59:22] <tgr>	 FWIW I'm pretty sure the patch fixes the issue we have seen. I'm not sure at all there are no other issues - the MCR patches together were 3000 lines or so.
[18:59:48] <tgr>	 I don't think we have any better way of finding out than deploying though :(
[19:00:00] <tgr>	 I can be around for an hour, maybe two
[19:00:04] <jouncebot>	 marxarelli: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for MediaWiki train. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180628T1900).
[19:01:17] <wikibugs>	 (03PS1) 10Dduvall: Group1 (less commons) to 1.32.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442909
[19:01:34] <wikibugs>	 10Operations, 10Mail, 10monitoring, 10User-herron, 10Wikimedia-Incident: Improve outbound mail service alerting - https://phabricator.wikimedia.org/T197172 (10herron) p:05High>03Normal
[19:01:43] <wikibugs>	 10Operations, 10Icinga, 10monitoring, 10Patch-For-Review, 10User-herron: Icinga check for sysctl settings - https://phabricator.wikimedia.org/T160060 (10herron) p:05High>03Normal
[19:02:27] <marxarelli>	 tgr: right on. thanks
[19:02:35] <thcipriani>	 hrm did it not move the symlink?
[19:03:03] <marxarelli>	 thcipriani: it's already pointing to wmf.10 apparently
[19:03:20] <thcipriani>	 marxarelli: looks modified on deploy1001
[19:03:26] <thcipriani>	 check out git status
[19:03:38] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 20 probes of 302 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[19:03:39] <marxarelli>	 oh whoops. thanks for catching that!
[19:03:40] <marxarelli>	 :)
[19:03:44] <thcipriani>	 :)
[19:04:04] <wikibugs>	 (03PS2) 10Dduvall: Group1 (less commons) to 1.32.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442909
[19:04:05] <wikibugs>	 10Operations, 10Mail, 10monitoring, 10User-herron, 10Wikimedia-Incident: Improve outbound mail service alerting - https://phabricator.wikimedia.org/T197172 (10herron) p:05Normal>03High
[19:04:34] <wikibugs>	 10Operations, 10Icinga, 10monitoring, 10Patch-For-Review, 10User-herron: Icinga check for sysctl settings - https://phabricator.wikimedia.org/T160060 (10herron) p:05Normal>03High
[19:04:58] <icinga-wm>	 PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 47 probes of 323 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[19:05:07] <wikibugs>	 (03CR) 10Thcipriani: [C: 031] Group1 (less commons) to 1.32.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442909 (owner: 10Dduvall)
[19:05:11] <thcipriani>	 lgtm
[19:05:31] <wikibugs>	 (03CR) 10Dduvall: [C: 032] Group1 (less commons) to 1.32.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442909 (owner: 10Dduvall)
[19:07:04] <wikibugs>	 (03Merged) 10jenkins-bot: Group1 (less commons) to 1.32.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442909 (owner: 10Dduvall)
[19:08:48] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 8 probes of 302 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[19:09:51] <wikibugs>	 (03CR) 10Imarlier: [C: 031] webperf: Get graphite_host for coal::processor from Hiera [puppet] - 10https://gerrit.wikimedia.org/r/442900 (https://phabricator.wikimedia.org/T195314) (owner: 10Krinkle)
[19:09:59] <icinga-wm>	 RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 0 probes of 323 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[19:16:06] <logmsgbot>	 !log dduvall@deploy1001 rebuilt and synchronized wikiversions files: Group1 (less commons) to 1.32.0-wmf.10
[19:16:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:16:48] <wikibugs>	 (03CR) 10BryanDavis: "This role may be unused now outside of my test environment in the striker Cloud VPS project. The production striker deploys are now using " [puppet] - 10https://gerrit.wikimedia.org/r/431595 (https://phabricator.wikimedia.org/T147326) (owner: 10Filippo Giunchedi)
[19:17:21] <logmsgbot>	 !log dduvall@deploy1001 Synchronized php: Group1 (less commons) to 1.32.0-wmf.10 (duration: 00m 57s)
[19:17:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:18:54] <marxarelli>	 tgr, thcipriani: ^ (nothing terrible so far)
[19:19:20] <thcipriani>	 so far so good afaict
[19:21:11] <marxarelli>	 i'll give it another 5 minutes or so, and then roll to commonswiki
[19:21:22] <marxarelli>	 but logs look really clean
[19:22:05] <thcipriani>	 almost...suspiciously clean
[19:22:07] <thcipriani>	 :)
[19:22:15] * thcipriani adds drama
[19:22:16] <greg-g>	 knock on wood you jerk
[19:22:57] * bd808 tosses salt over shoulder and spits 3 times to help greg-g out
[19:25:02] <wikibugs>	 (03PS1) 10Rush: toolforge: remove labstore1006 from dumps config [puppet] - 10https://gerrit.wikimedia.org/r/442913
[19:27:18] <marxarelli>	 thcipriani: oh good, there's at least 1 "exceeded memory limit" error now, for wmf.10 :)
[19:27:36] <thcipriani>	 :)
[19:28:07] <marxarelli>	 alright. rolling out to commonswiki
[19:28:19] <thcipriani>	 +1
[19:29:03] <wikibugs>	 (03CR) 10Rush: [C: 032] toolforge: remove labstore1006 from dumps config [puppet] - 10https://gerrit.wikimedia.org/r/442913 (owner: 10Rush)
[19:29:54] <wikibugs>	 (03PS1) 10Dduvall: commonswiki to 1.32.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442914
[19:31:26] <wikibugs>	 (03CR) 10Dduvall: [C: 032] commonswiki to 1.32.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442914 (owner: 10Dduvall)
[19:32:41] <wikibugs>	 (03Merged) 10jenkins-bot: commonswiki to 1.32.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442914 (owner: 10Dduvall)
[19:33:00] <wikibugs>	 10Operations, 10netops: Allow labnet/labnodepool/labvirt to connect to debmonitor hosts/443 - https://phabricator.wikimedia.org/T198375 (10ayounsi) 05Open>03Resolved a:03ayounsi Policy added: ```lang=diff [edit firewall family inet filter labs-in4] +      term debmonitor { +          from { +...
[19:34:23] <logmsgbot>	 !log dduvall@deploy1001 rebuilt and synchronized wikiversions files: commonswiki to 1.32.0-wmf.10
[19:34:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:35:57] <marxarelli>	 good so far...
[19:37:10] <thcipriani>	 yep
[19:41:49] * James_F crosses fingers and toes.
[19:45:17] * marxarelli is seeing something
[19:45:37] <marxarelli>	 lock wait timeouts again
[19:45:48] <marxarelli>	 tgr, thcipriani: ^
[19:46:08] <marxarelli>	 from commons
[19:46:11] <marxarelli>	 rolling back
[19:47:09] <thcipriani>	 +1
[19:47:18] <wikibugs>	 (03PS1) 10Dduvall: Rollback commonswiki to 1.32.0-wmf.8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442915
[19:47:29] <tgr>	 uh, that query is still the same wrong one
[19:47:40] <tgr>	 could the patch have gotten lost somehow?
[19:48:03] <marxarelli>	 i verified it was there in git log
[19:48:04] <marxarelli>	 sec
[19:48:29] <tgr>	 let me test it on wmf.10
[19:49:11] <wikibugs>	 (03CR) 10jenkins-bot: Group1 (less commons) to 1.32.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442909 (owner: 10Dduvall)
[19:49:13] <wikibugs>	 (03CR) 10jenkins-bot: commonswiki to 1.32.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442914 (owner: 10Dduvall)
[19:51:11] <marxarelli>	 tgr: i screwed up. didn't actually get the patch synced :(
[19:51:19] <icinga-wm>	 PROBLEM - Host labstore1006 is DOWN: PING CRITICAL - Packet loss = 100%
[19:51:24] <marxarelli>	 rolling back, then syncing, then forward
[19:51:27] <marxarelli>	 sheesh
[19:51:50] <DanielK_WMDE>	 did wmf10 go live on commons after all?
[19:52:34] <thcipriani>	 DanielK_WMDE: didn't go live initially, all looked calm, then commons rolled forward, rolling back commons now
[19:52:52] <DanielK_WMDE>	 ic
[19:53:22] <DanielK_WMDE>	 and the issue was that tgr's fix wasn't deployed? sorry, i joined late
[19:53:39] <thcipriani>	 yes
[19:54:05] <tgr>	 marxarelli: ping me if it's synced, I'll do some testing on mwdebug1001/group0
[19:54:41] <marxarelli>	 tgr: will do
[19:55:08] <icinga-wm>	 RECOVERY - Host labstore1006 is UP: PING WARNING - Packet loss = 86%, RTA = 0.15 ms
[19:55:39] <icinga-wm>	 RECOVERY - HP RAID on labstore1006 is OK: OK: Slot 1: OK: 2I:4:1, 2I:4:2, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4 - Controller: OK - Battery/Capacitor: OK --- Slot 3: OK: 1E:1:1, 1E:1:2, 1E:1:3, 1E:1:4, 1E:1:5, 1E:1:6, 1E:1:7, 1E:1:8, 1E:1:9, 1E:1:10, 1E:1:11, 1E:1:12 - Controller: OK - Battery/Capacitor: OK
[19:56:12] <marxarelli>	 waiting on sync-wikiversions to re-sync. it seems stalled
[19:57:07] <wikibugs>	 (03CR) 10C. Scott Ananian: Replace Tidy with RemexHtml everywhere (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442142 (https://phabricator.wikimedia.org/T175706) (owner: 10Subramanya Sastry)
[19:57:28] <icinga-wm>	 PROBLEM - NFS on labstore1006 is CRITICAL: connect to address 208.80.154.7 and port 2049: Connection refused
[19:57:29] <wikibugs>	 (03PS1) 10Daniel Kinzler: MCR DNM Enable MCR write-both mode on commons beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442918 (https://phabricator.wikimedia.org/T197818)
[19:57:55] <Yann_>	 what's going on for Commons? https://phabricator.wikimedia.org/T198350
[19:58:06] <Yann_>	 same error messages as yesterday
[19:58:58] <icinga-wm>	 RECOVERY - puppet last run on labstore1006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[19:59:03] <tgr>	 yannf: yeah, known
[19:59:51] <DanielK_WMDE>	 tgr: but no errors showed up on wikidata? that's surprising
[20:01:16] <tgr>	 DanielK_WMDE: yeah, not sure what to make of that
[20:01:25] <wikibugs>	 10Operations, 10ops-eqiad, 10Cloud-VPS: rack upgraded storage capacity in labstore100[67].eqiad.wmnet - https://phabricator.wikimedia.org/T196651 (10chasemp) labstore1007 has been restored to service and NFS clients and web users are pointed at it (https://gerrit.wikimedia.org/r/c/operations/puppet/+/442913)...
[20:02:19] <DanielK_WMDE>	 tgr: maybe not enough things were being deleted...
[20:02:55] <tgr>	 why is it different from last time though? some kind of time-of-day pattern?
[20:02:58] <logmsgbot>	 !log dduvall@deploy1001 sync-wikiversions aborted: Rollback commonswiki to 1.32.0-wmf.8 (duration: 15m 24s)
[20:03:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:04:13] <logmsgbot>	 !log dduvall@deploy1001 rebuilt and synchronized wikiversions files: Rollback commonswiki to 1.32.0-wmf.8 (resync following ssh hang)
[20:04:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:05:30] <addshore>	 are there errors again?
[20:06:15] <DanielK_WMDE>	 tgr: possibly...
[20:06:52] <DanielK_WMDE>	 addshore: tgr found the issue, but the re-deploy accidentally went out without the fix...
[20:06:57] <marxarelli>	 tgr: syncing the fix now
[20:07:04] <addshore>	 aaaaaah, not great ;)
[20:07:09] <logmsgbot>	 !log dduvall@deploy1001 Synchronized php-1.32.0-wmf.10/includes/page/WikiPage.php: Syncing table locking fix (T198350) (duration: 00m 57s)
[20:07:10] * addshore goes back to eating
[20:07:11] <marxarelli>	 ^ nope :(
[20:07:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:07:12] <stashbot>	 T198350: Rising lock wait timeout SQL errors upon 1.32.0-wmf.10 group1 deployment - https://phabricator.wikimedia.org/T198350
[20:07:14] <marxarelli>	 not great
[20:07:23] <DanielK_WMDE>	 addshore: remember this? https://gerrit.wikimedia.org/r/c/mediawiki/core/+/442889/1/includes/page/WikiPage.php
[20:07:27] <DanielK_WMDE>	 we got it wrong :P
[20:07:49] <tgr>	 sorry, I should have tested on group0 in the first place
[20:07:50] <addshore>	 DanielK_WMDE: yes, that's actually what I was looking at this morning, but got distracted by other tasks after lunch
[20:07:59] <wikibugs>	 10Operations, 10ops-eqiad, 10Cloud-VPS: rack upgraded storage capacity in labstore100[67].eqiad.wmnet - https://phabricator.wikimedia.org/T196651 (10Bstorm) Cabling information grabbed from these two documents: D3600 manual: http://h20628.www2.hp.com/km-ext/kmcsdirect/emr_na-c04219600-1.pdf D3000 series wiri...
[20:08:39] <DanielK_WMDE>	 addshore: i stared at it too, but didn't see the issue. found it hard to believe that deletions could be the problem, seemed too low volume.
[20:08:44] <wikibugs>	 (03CR) 10Dduvall: [C: 032] Rollback commonswiki to 1.32.0-wmf.8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442915 (owner: 10Dduvall)
[20:09:11] <marxarelli>	 ^ fyi, synced before pushing for review
[20:09:59] <wikibugs>	 (03Merged) 10jenkins-bot: Rollback commonswiki to 1.32.0-wmf.8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442915 (owner: 10Dduvall)
[20:10:15] <wikibugs>	 (03CR) 10jenkins-bot: Rollback commonswiki to 1.32.0-wmf.8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442915 (owner: 10Dduvall)
[20:10:22] <marxarelli>	 tgr: let me know when you've tested the fix
[20:11:24] <tgr>	 I'll mess up some files on mwdebug1001
[20:15:10] <marxarelli>	 well i really mucked that up. strange that the error didn't surface on wikidata though
[20:19:30] <tgr>	 marxarelli: tested, works
[20:19:40] <tgr>	 did I mention that PsySH is awesome?
[20:19:45] <marxarelli>	 tgr: excellent!
[20:21:49] <marxarelli>	 let's try this again, the right way
[20:24:10] <wikibugs>	 (03PS1) 10Dduvall: commonswiki to 1.32.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442976
[20:24:50] <marxarelli>	 thcipriani: ^
[20:26:09] <wikibugs>	 (03CR) 10Thcipriani: [C: 031] commonswiki to 1.32.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442976 (owner: 10Dduvall)
[20:26:12] <wikibugs>	 (03CR) 10Dduvall: [C: 032] commonswiki to 1.32.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442976 (owner: 10Dduvall)
[20:26:13] <thcipriani>	 well I +1'd but wikibugs is...there it is
[20:27:05] <wikibugs>	 (03Merged) 10jenkins-bot: commonswiki to 1.32.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442976 (owner: 10Dduvall)
[20:29:05] <logmsgbot>	 !log dduvall@deploy1001 rebuilt and synchronized wikiversions files: commonswiki to 1.32.0-wmf.10 otra vez
[20:29:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:39:05] <marxarelli>	 looking good this time
[20:46:30] <marxarelli>	 !log Rolling 1.32.0-wmf.10 to group2 following fix and successful re-deploy to group1
[20:46:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:53:03] <logmsgbot>	 !log dduvall@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.32.0-wmf.10
[20:53:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:54:36] <wikibugs>	 (03PS1) 10Dduvall: all wikis to 1.32.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442981
[20:54:38] <wikibugs>	 (03CR) 10Dduvall: [C: 032] all wikis to 1.32.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442981 (owner: 10Dduvall)
[20:54:48] <wikibugs>	 (03Merged) 10jenkins-bot: all wikis to 1.32.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442981 (owner: 10Dduvall)
[21:00:05] <jouncebot>	 Niharika and mooeypoo: Dear deployers, time to do the PageTriage deploy deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180628T2100).
[21:00:47] <greg-g>	 marxarelli: how's things looking?
[21:01:23] <marxarelli>	 greg-g: things look ok
[21:02:54] <greg-g>	 give it another 10 minutes and we'll call it done?
[21:03:09] <marxarelli>	 sounds good
[21:06:37] <RoanKattouw>	 18:30:47 <Krinkle> RoanKattouw: Be sure to file a task if there isn't one already.
[21:07:02] <RoanKattouw>	 Filed T198422 and T198423
[21:07:03] <stashbot>	 T198423: Linting phase in scap doesn't surface errors - https://phabricator.wikimedia.org/T198423
[21:07:03] <stashbot>	 T198422: Running scap sync-dir php-1.32.0-wmf.10 fails due to syntax error - https://phabricator.wikimedia.org/T198422
[21:07:22] <wikibugs>	 (03CR) 10jenkins-bot: commonswiki to 1.32.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442976 (owner: 10Dduvall)
[21:07:24] <wikibugs>	 (03CR) 10jenkins-bot: all wikis to 1.32.0-wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442981 (owner: 10Dduvall)
[21:12:54] <greg-g>	 marxarelli: still good?
[21:13:17] <marxarelli>	 greg-g: still good
[21:15:18] <greg-g>	 mooeypoo: all yours
[21:15:25] <Niharika>	 greg-g: Thanks. 
[21:15:40] <greg-g>	 :)
[21:20:48] <icinga-wm>	 RECOVERY - Check systemd state on labcontrol1004 is OK: OK - running: The system is fully operational
[21:24:08] <icinga-wm>	 PROBLEM - Check systemd state on labcontrol1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[21:34:05] <wikibugs>	 10Operations, 10Availability (MediaWiki-MultiDC), 10Patch-For-Review, 10Performance-Team (Radar): Deploy mcrouter to production as a wancache backend - https://phabricator.wikimedia.org/T192370 (10Krinkle)
[21:34:26] <wikibugs>	 10Operations, 10Availability (MediaWiki-MultiDC), 10Patch-For-Review, 10Performance-Team (Radar): Deploy mcrouter to production as a wancache backend - https://phabricator.wikimedia.org/T192370 (10Krinkle)
[21:34:35] <wikibugs>	 10Operations, 10Availability (MediaWiki-MultiDC), 10Patch-For-Review, 10Performance-Team (Radar): Deploy mcrouter to production as a wancache backend - https://phabricator.wikimedia.org/T192370 (10Krinkle)
[21:36:46] <Krinkle>	 moritzm: apergos: available for two quick webperf puppet patches?
[21:36:56] <apergos>	 I am so not here.
[21:37:03] <apergos>	 it is midnight 30 after a very long day
[21:37:29] <Krinkle>	 No problem - don't stay up for this, it can wait. 
[21:38:54] <apergos>	 I could stay up. I just don't have any working brain cells left
[21:40:05] <moritzm>	 same here, add me to reviewers and I'll have a look tomorrow
[21:42:05] <Krinkle>	 Thx, done.
[21:43:42] <wikibugs>	 (03PS4) 10Niharika29: Enable Draft namespace and AfC mode for PageTriage on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/441944 (https://phabricator.wikimedia.org/T198143) (owner: 10MusikAnimal)
[21:43:58] <wikibugs>	 (03CR) 10Niharika29: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/441944 (https://phabricator.wikimedia.org/T198143) (owner: 10MusikAnimal)
[21:45:27] <wikibugs>	 (03Merged) 10jenkins-bot: Enable Draft namespace and AfC mode for PageTriage on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/441944 (https://phabricator.wikimedia.org/T198143) (owner: 10MusikAnimal)
[21:52:52] <logmsgbot>	 !log niharika29@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable Draft namespace and AfC mode for PageTriage on testwiki T198143 (duration: 00m 53s)
[21:52:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:52:56] <stashbot>	 T198143: Enable Draft namespace on testwiki - https://phabricator.wikimedia.org/T198143
[21:55:18] <logmsgbot>	 !log niharika29@deploy1001 Synchronized php-1.32.0-wmf.10/extensions/PageTriage/: Update extension directory (duration: 00m 51s)
[21:55:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:01:51] <wikibugs>	 (03PS2) 10Andrew Bogott: labtestn: use proper labtestn db password from hiera [puppet] - 10https://gerrit.wikimedia.org/r/440366
[22:05:50] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 032] labtestn: use proper labtestn db password from hiera [puppet] - 10https://gerrit.wikimedia.org/r/440366 (owner: 10Andrew Bogott)
[22:13:52] <wikibugs>	 (03PS8) 10Krinkle: profiler-labs: Use FlameGraph-compatible format for xhprof sampler [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434522 (https://phabricator.wikimedia.org/T176916)
[22:15:55] <wikibugs>	 (03PS9) 10Krinkle: profiler-labs: Remove 'sampleprofiler' experiment. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434522 (https://phabricator.wikimedia.org/T176916)
[22:16:00] <wikibugs>	 (03PS10) 10Krinkle: profiler-labs: Remove 'sampleprofiler' experiment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434522 (https://phabricator.wikimedia.org/T176916)
[22:24:35] <wikibugs>	 (03CR) 10jenkins-bot: Enable Draft namespace and AfC mode for PageTriage on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/441944 (https://phabricator.wikimedia.org/T198143) (owner: 10MusikAnimal)
[22:26:44] <wikibugs>	 (03CR) 10Krinkle: [C: 032] "beta-only" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434522 (https://phabricator.wikimedia.org/T176916) (owner: 10Krinkle)
[22:27:59] <wikibugs>	 (03Merged) 10jenkins-bot: profiler-labs: Remove 'sampleprofiler' experiment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434522 (https://phabricator.wikimedia.org/T176916) (owner: 10Krinkle)
[22:28:59] <icinga-wm>	 PROBLEM - tilerator on maps-test2003 is CRITICAL: connect to address 10.192.16.34 and port 6534: Connection refused
[22:29:19] <icinga-wm>	 PROBLEM - Check systemd state on maps-test2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[22:29:59] <icinga-wm>	 PROBLEM - tileratorui on maps-test2003 is CRITICAL: connect to address 10.192.16.34 and port 6535: Connection refused
[22:39:24] <wikibugs>	 (03CR) 10jenkins-bot: profiler-labs: Remove 'sampleprofiler' experiment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434522 (https://phabricator.wikimedia.org/T176916) (owner: 10Krinkle)
[22:44:14] <wikibugs>	 10Operations, 10ops-eqiad, 10Cloud-VPS, 10Datasets-General-or-Unknown: rack upgraded storage capacity in labstore100[67].eqiad.wmnet - https://phabricator.wikimedia.org/T196651 (10Nemo_bis)
[22:49:19] <icinga-wm>	 RECOVERY - Check systemd state on maps-test2003 is OK: OK - running: The system is fully operational
[22:49:58] <icinga-wm>	 RECOVERY - tileratorui on maps-test2003 is OK: HTTP OK: HTTP/1.1 200 OK - 305 bytes in 0.095 second response time
[22:50:08] <icinga-wm>	 RECOVERY - tilerator on maps-test2003 is OK: HTTP OK: HTTP/1.1 200 OK - 305 bytes in 0.099 second response time
[22:51:48] <icinga-wm>	 RECOVERY - Check systemd state on labcontrol1004 is OK: OK - running: The system is fully operational
[22:54:59] <icinga-wm>	 PROBLEM - Check systemd state on labcontrol1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[22:56:52] <wikibugs>	 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install graphite2003 - https://phabricator.wikimedia.org/T196483 (10Papaul)
[22:56:55] <wikibugs>	 10Operations, 10ops-codfw, 10netops: switch port configuration for graphite2003 - https://phabricator.wikimedia.org/T198119 (10Papaul) 05Open>03Resolved a:03Papaul switch configuration done   Interface       Admin Link Description ge-5/0/17       up    down graphite2003
[22:59:02] <wikibugs>	 (03PS2) 10Krinkle: Use perftools/xhgui-collector instead of perftools/xhgui [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432016
[23:00:04] <jouncebot>	 addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Evening SWAT (Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180628T2300).
[23:00:04] <jouncebot>	 bmansurov and RoanKattouw: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[23:00:09] <bmansurov>	 here
[23:00:25] <RoanKattouw>	 Here
[23:00:56] <RoanKattouw>	 I can do the deploy myself if needed, but in ~5 mins
[23:01:07] <Platonides>	 seems easy to get a sticker, then
[23:05:29] <James_F>	 RoanKattouw: If possible, mooeypoo/Niharika would like you to do a full scap at the end for i18n sync of a previous deploy.
[23:05:57] <RoanKattouw>	 OK
[23:06:01] <Niharika>	 James_F: RoanKattouw: It's not very urgent and can wait. 
[23:06:05] <Niharika>	 Until next week.
[23:06:17] <James_F>	 Sure, but let's not leave prod broken if we don't have to.
[23:06:33] <Niharika>	 I don't consider testwiki as prod. :P
[23:06:51] <James_F>	 However, until we finally delete the stupid thing, it is.
[23:07:26] <wikibugs>	 (03CR) 10Catrope: [C: 032] Increase Schema:CitationUsage sampling rate to 100% [mediawiki-config] - 10https://gerrit.wikimedia.org/r/441567 (https://phabricator.wikimedia.org/T191086) (owner: 10Bmansurov)
[23:08:16] <RoanKattouw>	 OK, well the first step is waiting for Jenkins
[23:08:42] <wikibugs>	 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install authdns2001.wikimedia.org - https://phabricator.wikimedia.org/T196664 (10Papaul)
[23:08:44] <wikibugs>	 10Operations, 10ops-codfw, 10netops: Swith port information for authdns2001 - https://phabricator.wikimedia.org/T198126 (10Papaul) 05Open>03Resolved a:03Papaul switch port configuration done   Interface       Admin Link Description ge-5/0/5        up    down authdns2001  [edit interfaces interface-rang...
[23:09:11] <RoanKattouw>	 bmansurov: Uhh are you sure you put the right change on the Deployments page? It depends on https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/440867/3 which is not merged
[23:09:23] <bmansurov>	 let me see, 1 sec
[23:09:28] <wikibugs>	 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install graphite2003 - https://phabricator.wikimedia.org/T196483 (10Papaul)
[23:09:29] <RoanKattouw>	 So right now it's 1:100, the change that's unmerged and not listed for deployment is 1:6.67, and the one you asked to deploy is 1:1
[23:09:41] <RoanKattouw>	 But the 1:1 change won't merge unless I also +2 the 1:6.67 change
[23:09:45] <bmansurov>	 RoanKattouw: let me rebase, it should not depend on that patch
[23:10:17] <wikibugs>	 (03PS1) 10Smalyshev: Enable smater logging for wdqs [puppet] - 10https://gerrit.wikimedia.org/r/443001 (https://phabricator.wikimedia.org/T197645)
[23:10:43] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Enable smater logging for wdqs [puppet] - 10https://gerrit.wikimedia.org/r/443001 (https://phabricator.wikimedia.org/T197645) (owner: 10Smalyshev)
[23:11:44] <wikibugs>	 (03PS2) 10Smalyshev: Enable smater logging for wdqs [puppet] - 10https://gerrit.wikimedia.org/r/443001 (https://phabricator.wikimedia.org/T197645)
[23:11:58] <wikibugs>	 (03PS3) 10Bmansurov: Increase Schema:CitationUsage sampling rate to 100% [mediawiki-config] - 10https://gerrit.wikimedia.org/r/441567 (https://phabricator.wikimedia.org/T191086)
[23:12:08] <bmansurov>	 RoanKattouw: done
[23:12:35] <wikibugs>	 (03CR) 10Catrope: [C: 032] Increase Schema:CitationUsage sampling rate to 100% [mediawiki-config] - 10https://gerrit.wikimedia.org/r/441567 (https://phabricator.wikimedia.org/T191086) (owner: 10Bmansurov)
[23:12:38] <wikibugs>	 (03Abandoned) 10Bmansurov: Increase Schema:CitationUsage sampling rate to 15% [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440867 (https://phabricator.wikimedia.org/T191086) (owner: 10Bmansurov)
[23:13:14] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Increase Schema:CitationUsage sampling rate to 100% [mediawiki-config] - 10https://gerrit.wikimedia.org/r/441567 (https://phabricator.wikimedia.org/T191086) (owner: 10Bmansurov)
[23:13:42] <bmansurov>	 hmm
[23:13:51] <wikibugs>	 (03Merged) 10jenkins-bot: Increase Schema:CitationUsage sampling rate to 100% [mediawiki-config] - 10https://gerrit.wikimedia.org/r/441567 (https://phabricator.wikimedia.org/T191086) (owner: 10Bmansurov)
[23:14:08] <wikibugs>	 (03CR) 10jenkins-bot: Increase Schema:CitationUsage sampling rate to 100% [mediawiki-config] - 10https://gerrit.wikimedia.org/r/441567 (https://phabricator.wikimedia.org/T191086) (owner: 10Bmansurov)
[23:15:38] <RoanKattouw>	 bmansurov: On mwdebug1002, please test
[23:15:44] <bmansurov>	 RoanKattouw: testing
[23:16:55] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10netops: Get Papaul access to network equipment - https://phabricator.wikimedia.org/T198344 (10ayounsi) 05Open>03Resolved Talked to Papaul on IRC, key push to asw-a/b/c/d-codfw and will be pushed progressively to more devices.  I gave him a Juniper configuration and...
[23:17:01] <bmansurov>	 RoanKattouw: works
[23:18:20] <logmsgbot>	 !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Increase Schema:CitationUsage sampling rate to 100% (T191086) (duration: 00m 51s)
[23:18:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:18:25] <stashbot>	 T191086: Instrument, collect data, and perform the first round of analysis on click-through data on citations/footnotes - https://phabricator.wikimedia.org/T191086
[23:18:49] <wikibugs>	 10Operations, 10ops-codfw, 10netops: Swith port information for authdns2001 - https://phabricator.wikimedia.org/T198126 (10Papaul)    [edit interfaces interface-range vlan-public1-a-codfw]        member ge-5/0/23 { ... }   +    member ge-5/0/5;   [edit interfaces interface-range disabled]   -    member ge-5/...
[23:19:31] <RoanKattouw>	 bmansurov: Deployed
[23:19:41] <bmansurov>	 RoanKattouw: thank you!
[23:20:51] <wikibugs>	 (03PS3) 10Jforrester: Stop loading the MwEmbedSupport extension, part I [mediawiki-config] - 10https://gerrit.wikimedia.org/r/441518
[23:20:53] <wikibugs>	 (03PS3) 10Jforrester: Stop loading the MwEmbedSupport extension, part II [mediawiki-config] - 10https://gerrit.wikimedia.org/r/441519
[23:20:55] <wikibugs>	 (03PS3) 10Jforrester: Stop loading the MwEmbedSupport extension, part III [mediawiki-config] - 10https://gerrit.wikimedia.org/r/441520
[23:20:57] <wikibugs>	 (03PS3) 10Jforrester: Stop loading the MwEmbedSupport extension, part IV [mediawiki-config] - 10https://gerrit.wikimedia.org/r/441521
[23:21:48] <icinga-wm>	 RECOVERY - Check systemd state on labcontrol1004 is OK: OK - running: The system is fully operational
[23:25:09] <icinga-wm>	 PROBLEM - Check systemd state on labcontrol1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[23:47:15] <RoanKattouw>	 OK so now Jenkins has finally merged my cherry-picks
[23:54:06] <logmsgbot>	 !log catrope@deploy1001 Synchronized php-1.32.0-wmf.10/resources/src/mediawiki.rcfilters/: Watchlist perf fixes (T198359, T198399) (duration: 00m 52s)
[23:54:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:54:11] <stashbot>	 T198359: Reduce number of times we apply highlights - https://phabricator.wikimedia.org/T198359
[23:54:11] <stashbot>	 T198399: Avoid unnecessary calls to updateIfHeightChanged on page load when highlighting is in query params - https://phabricator.wikimedia.org/T198399
[23:56:39] <Lith>	 is being able to see other people's dashboards intended?
[23:56:49] <Lith>	 *dashboards on gerrit
[23:57:28] <James_F>	 Yes.
[23:59:57] <paladox>	 yes