[00:00:04] <jouncebot>	 twentyafterfour: Respected human, time to deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160623T0000). Please do the needful.
[00:02:23] <grrrit-wm>	 (PS3) Jdlrobson: Complete list of legacy main pages, switch default to false [mediawiki-config] - https://gerrit.wikimedia.org/r/295600 (https://phabricator.wikimedia.org/T138425)
[00:15:50] <grrrit-wm>	 (CR) Alex Monk: "(Note: The Jenkins failure appears to be bogus)" [software] - https://gerrit.wikimedia.org/r/295598 (owner: Alex Monk)
[00:39:05] <grrrit-wm>	 (PS1) Yurik: Configure Kartotherian geoshapes support [puppet] - https://gerrit.wikimedia.org/r/295602 (https://phabricator.wikimedia.org/T134084)
[00:41:13] <icinga-wm>	 PROBLEM - puppet last run on elastic2011 is CRITICAL: CRITICAL: puppet fail
[01:08:36] <icinga-wm>	 RECOVERY - puppet last run on elastic2011 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[01:53:55] <wikibugs>	 Blocked-on-Operations, DBA, Labs, Labs-Infrastructure: maintain-replicas.pl unmaintained, unmaintainable - https://phabricator.wikimedia.org/T138450#2400928 (AlexMonk-WMF) a:AlexMonk-WMF I'm having a go at this.  > It blows up and rebuilds all wikis on every run.  It truncates the meta_p.wiki...
[02:16:51] <grrrit-wm>	 (PS1) Alex Monk: [WIP/POC/POS] Add python version of maintain-replicas script [software] - https://gerrit.wikimedia.org/r/295607
[02:17:20] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] [WIP/POC/POS] Add python version of maintain-replicas script [software] - https://gerrit.wikimedia.org/r/295607 (owner: Alex Monk)
[02:17:22] <grrrit-wm>	 (PS2) Alex Monk: [WIP/POC/POS] Add python version of maintain-replicas script [software] - https://gerrit.wikimedia.org/r/295607
[02:17:48] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] [WIP/POC/POS] Add python version of maintain-replicas script [software] - https://gerrit.wikimedia.org/r/295607 (owner: Alex Monk)
[02:26:16] <logmsgbot>	 !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.6) (duration: 11m 19s)
[02:26:26] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:41:15] <logmsgbot>	 !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.7) (duration: 07m 05s)
[02:41:22] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:47:59] <logmsgbot>	 !log l10nupdate@tin ResourceLoader cache refresh completed at Thu Jun 23 02:47:59 UTC 2016 (duration 6m 44s)
[02:48:07] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:56:15] <grrrit-wm>	 (PS4) Smalyshev: Prepare scap3 deployment for WDQS [puppet] - https://gerrit.wikimedia.org/r/295437 (https://phabricator.wikimedia.org/T129144)
[03:11:49] <grrrit-wm>	 (PS3) Alex Monk: [WIP/POC/POS] Add python version of maintain-replicas script [software] - https://gerrit.wikimedia.org/r/295607
[03:12:05] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] [WIP/POC/POS] Add python version of maintain-replicas script [software] - https://gerrit.wikimedia.org/r/295607 (owner: Alex Monk)
[03:13:15] <grrrit-wm>	 (PS4) Alex Monk: [WIP/POC/POS] Add python version of maintain-replicas script [software] - https://gerrit.wikimedia.org/r/295607
[03:13:32] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] [WIP/POC/POS] Add python version of maintain-replicas script [software] - https://gerrit.wikimedia.org/r/295607 (owner: Alex Monk)
[03:14:11] <grrrit-wm>	 (PS5) Alex Monk: [WIP/POC/POS] Add python version of maintain-replicas script [software] - https://gerrit.wikimedia.org/r/295607 (https://phabricator.wikimedia.org/T138450)
[03:14:28] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] [WIP/POC/POS] Add python version of maintain-replicas script [software] - https://gerrit.wikimedia.org/r/295607 (https://phabricator.wikimedia.org/T138450) (owner: Alex Monk)
[03:17:55] <grrrit-wm>	 (PS1) Alex Monk: Couple of tiny maintain-meta_p.py improvements [software] - https://gerrit.wikimedia.org/r/295608
[03:18:13] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Couple of tiny maintain-meta_p.py improvements [software] - https://gerrit.wikimedia.org/r/295608 (owner: Alex Monk)
[03:52:04] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 730 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5196960 keys - replication_delay is 730
[03:56:34] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5165141 keys - replication_delay is 0
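The rdb2006 alert prints `replication_delay is 730 600`. The log does not label the second number; reading it as a 600-second critical threshold is an assumption. Under that reading, the check's decision reduces to a comparison like this minimal sketch, in which the function name and the 300-second warning threshold are hypothetical:

```python
def replication_delay_state(delay_s: int, crit_s: int = 600, warn_s: int = 300) -> str:
    """Classify a replica's replication delay (in seconds) Nagios-style.

    crit_s=600 assumes the second number in the alert is the critical
    threshold; warn_s=300 is a hypothetical value not shown in the log.
    """
    if delay_s >= crit_s:
        return "CRITICAL"
    if delay_s >= warn_s:
        return "WARNING"
    return "OK"


print(replication_delay_state(730))  # CRITICAL, as at 03:52:04
print(replication_delay_state(0))    # OK, as at 03:56:34
```

Whether the real check uses `>=` or `>` at the boundary is not visible from the log.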
[04:06:12] <wikibugs>	 Operations, MediaWiki-extensions-UniversalLanguageSelector, Wikimedia-SVG-rendering, I18n: MB Lateefi Fonts for Sindhi Wikipedia. - https://phabricator.wikimedia.org/T138136#2400964 (mehtab.ahmed) Author still needs a couple days.
[06:11:48] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[06:12:28] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[06:13:13] <grrrit-wm>	 (PS1) KartikMistry: apertium-eo-es: Rebuild for Jessie, cleanup [debs/contenttranslation/apertium-eo-es] - https://gerrit.wikimedia.org/r/295611 (https://phabricator.wikimedia.org/T107306)
[06:15:24] <wikibugs>	 Operations, ContentTranslation-Deployments, ContentTranslation-cxserver, MediaWiki-extensions-ContentTranslation, and 4 others: Package and test apertium for Jessie - https://phabricator.wikimedia.org/T107306#2400994 (KartikMistry)
[06:16:17] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[06:19:08] <icinga-wm>	 RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
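The graphite1001 alerts judge a window of 5xx-per-minute datapoints by the share that crosses a threshold: CRITICAL when some percentage lies above 1000, recovery once less than 1% lies above 250. A minimal sketch of that logic, assuming a plain list of datapoints and a 1% trigger inferred from the recovery text (not from the check's source); the function names are hypothetical:

```python
from typing import Sequence


def percent_above(datapoints: Sequence[float], threshold: float) -> float:
    """Percentage of datapoints strictly greater than threshold."""
    if not datapoints:
        return 0.0
    above = sum(1 for value in datapoints if value > threshold)
    return 100.0 * above / len(datapoints)


def check_5xx(datapoints: Sequence[float], warn: float = 250.0,
              crit: float = 1000.0, trigger_pct: float = 1.0) -> str:
    """Return an Icinga-style status line for a window of 5xx reqs/min values."""
    crit_pct = percent_above(datapoints, crit)
    if crit_pct >= trigger_pct:
        return f"CRITICAL: {crit_pct:.2f}% of data above the critical threshold [{crit}]"
    warn_pct = percent_above(datapoints, warn)
    if warn_pct >= trigger_pct:
        return f"WARNING: {warn_pct:.2f}% of data above the warning threshold [{warn}]"
    return f"OK: Less than {trigger_pct:.2f}% above the threshold [{warn}]"


# One spike in a 10-point window is enough to go CRITICAL under this model.
print(check_5xx([120.0] * 9 + [1500.0]))
# CRITICAL: 10.00% of data above the critical threshold [1000.0]
```

Under this model a single short burst is enough to trip the alert and then clear it a few minutes later, which is consistent with the PROBLEM/RECOVERY pairs at 06:11 and 06:25.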
[06:25:07] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[06:25:58] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[06:29:37] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[06:30:27] <icinga-wm>	 RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[06:31:07] <icinga-wm>	 PROBLEM - puppet last run on mw1276 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:18] <icinga-wm>	 PROBLEM - puppet last run on cp2013 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:18] <icinga-wm>	 PROBLEM - puppet last run on ms-fe1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:16] <icinga-wm>	 PROBLEM - puppet last run on db2044 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:05] <icinga-wm>	 PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:33:06] <icinga-wm>	 PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:36] <icinga-wm>	 PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:16] <icinga-wm>	 PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:34:25] <icinga-wm>	 PROBLEM - puppet last run on db1028 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:25] <icinga-wm>	 PROBLEM - puppet last run on mw2073 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:56:07] <icinga-wm>	 RECOVERY - puppet last run on ms-fe1004 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[06:56:36] <icinga-wm>	 RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[06:56:46] <icinga-wm>	 RECOVERY - puppet last run on db1028 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[06:57:16] <icinga-wm>	 RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[06:57:16] <icinga-wm>	 RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[06:57:46] <icinga-wm>	 RECOVERY - puppet last run on mw2073 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:47] <icinga-wm>	 RECOVERY - puppet last run on mw1276 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:58:15] <icinga-wm>	 RECOVERY - puppet last run on db2044 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:15] <icinga-wm>	 RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:16] <icinga-wm>	 RECOVERY - puppet last run on cp2013 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:59:07] <moritzm>	 !log installing spice security updates
[06:59:11] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:09:16] <wikibugs>	 Operations, DBA, Patch-For-Review: reimage or decom db servers on precise - https://phabricator.wikimedia.org/T125028#2401018 (jcrespo) It is a bit more complex than that- we need to failover the slave actions to the master (and use only the master). Then (for example, the following week) we need to...
[07:14:04] <wikibugs>	 Operations, DBA, Phabricator: Upgrade m3 (phabricator) db servers - https://phabricator.wikimedia.org/T138460#2401019 (jcrespo)
[07:15:54] <wikibugs>	 Operations, DBA, Patch-For-Review: reimage or decom db servers on precise - https://phabricator.wikimedia.org/T125028#1972522 (jcrespo) I have created T138460 specifically for Phabricator. Related to T137928#2389155, too.
[07:18:55] <wikibugs>	 Blocked-on-Operations, Operations, DBA, Labs, and 2 others: adywiki and jamwiki are missing the associated *_p databases with appropriate views - https://phabricator.wikimedia.org/T135029#2401040 (jcrespo)  labsdb1002 will never get fixed.
[07:23:08] <grrrit-wm>	 (PS1) KartikMistry: apertium-es-ast: Rebuild for Jessie and cleanup [debs/contenttranslation/apertium-es-ast] - https://gerrit.wikimedia.org/r/295624 (https://phabricator.wikimedia.org/T107306)
[07:24:31] <wikibugs>	 Operations, ContentTranslation-Deployments, ContentTranslation-cxserver, MediaWiki-extensions-ContentTranslation, and 4 others: Package and test apertium for Jessie - https://phabricator.wikimedia.org/T107306#2401055 (KartikMistry)
[07:25:28] <legoktm>	 Krenair: o.O what timezone are you in right now?
[07:25:50] <legoktm>	 Krenair: probably zuul-merger has a corrupt copy of the operations/software repo
[07:35:08] <grrrit-wm>	 (PS1) KartikMistry: apertium-es-gl: Rebuild for Jessie and cleanup [debs/contenttranslation/apertium-es-gl] - https://gerrit.wikimedia.org/r/295625 (https://phabricator.wikimedia.org/T107306)
[07:35:18] <MatmaRex>	 Krenair eschews timezones
[07:35:50] <wikibugs>	 Operations, ContentTranslation-Deployments, ContentTranslation-cxserver, MediaWiki-extensions-ContentTranslation, and 4 others: Package and test apertium for Jessie - https://phabricator.wikimedia.org/T107306#2401071 (KartikMistry)
[07:46:32] <grrrit-wm>	 (CR) Hashar: "recheck" [software] - https://gerrit.wikimedia.org/r/295608 (owner: Alex Monk)
[07:49:53] <grrrit-wm>	 (CR) Hashar: "recheck" [software] - https://gerrit.wikimedia.org/r/295608 (owner: Alex Monk)
[07:51:03] <grrrit-wm>	 (CR) Hashar: "recheck" [software] - https://gerrit.wikimedia.org/r/295598 (owner: Alex Monk)
[07:51:21] <grrrit-wm>	 (CR) Hashar: "recheck" [software] - https://gerrit.wikimedia.org/r/295607 (https://phabricator.wikimedia.org/T138450) (owner: Alex Monk)
[07:51:24] <grrrit-wm>	 (CR) Hashar: "recheck" [software] - https://gerrit.wikimedia.org/r/295564 (https://phabricator.wikimedia.org/T135029) (owner: Ori.livneh)
[07:51:46] <grrrit-wm>	 (PS3) ArielGlenn: add job that dumps history of flow pages [dumps] - https://gerrit.wikimedia.org/r/295587 (https://phabricator.wikimedia.org/T89398)
[07:55:14] <grrrit-wm>	 (CR) Legoktm: "@Ariel: this is ready to be merged now! Everything else is in place, so I'll be turning on the extension for usage next week after Wikiman" [puppet] - https://gerrit.wikimedia.org/r/278400 (https://phabricator.wikimedia.org/T116986) (owner: ArielGlenn)
[07:56:41] <icinga-wm>	 ACKNOWLEDGEMENT - puppet last run on ms-be2023 is CRITICAL: CRITICAL: Puppet has 12 failures Filippo Giunchedi https://gerrit.wikimedia.org/r/295492
[07:56:41] <icinga-wm>	 ACKNOWLEDGEMENT - puppet last run on ms-be2024 is CRITICAL: CRITICAL: Puppet has 12 failures Filippo Giunchedi https://gerrit.wikimedia.org/r/295492
[07:56:41] <icinga-wm>	 ACKNOWLEDGEMENT - puppet last run on ms-be2025 is CRITICAL: CRITICAL: Puppet has 13 failures Filippo Giunchedi https://gerrit.wikimedia.org/r/295492
[07:56:41] <icinga-wm>	 ACKNOWLEDGEMENT - puppet last run on ms-be2026 is CRITICAL: CRITICAL: Puppet has 12 failures Filippo Giunchedi https://gerrit.wikimedia.org/r/295492
[07:56:41] <icinga-wm>	 ACKNOWLEDGEMENT - puppet last run on ms-be2027 is CRITICAL: CRITICAL: Puppet has 12 failures Filippo Giunchedi https://gerrit.wikimedia.org/r/295492
[07:57:32] <hashar>	 good morning
[08:00:30] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[08:03:01] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5159071 keys - replication_delay is 0
[08:06:50] <mobrovac>	 !log change-prop deploying 45db4f84827
[08:06:55] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:09:51] <wikibugs>	 Blocked-on-Operations, Operations, DBA, Labs, and 2 others: adywiki and jamwiki are missing the associated *_p databases with appropriate views - https://phabricator.wikimedia.org/T135029#2401109 (Jdforrester-WMF) stalled>Resolved a:Jdforrester-WMF In that case, I'm declaring this fixed.
[08:11:29] <grrrit-wm>	 (PS1) Jcrespo: [WIP] Delete deprecated modules coredb_mysql and mysql_wmf [puppet] - https://gerrit.wikimedia.org/r/295628
[08:15:24] <grrrit-wm>	 (CR) Gehel: [C: +1] "restbase1001 still failing for the same unrelated reason. Otherwise lgtm." [puppet] - https://gerrit.wikimedia.org/r/295123 (https://phabricator.wikimedia.org/T137422) (owner: Nicko)
[08:18:39] <wikibugs>	 Operations, Wikimedia-SVG-rendering, Patch-For-Review: Install Amiri font (arabic) for svg - https://phabricator.wikimedia.org/T135347#2401125 (MoritzMuehlenhoff) @Uwe_a : When testing this I noticed that the Amiri font is in fact already installed on the image scalers: It was installed indirectly as...
[08:20:11] <grrrit-wm>	 (PS10) Filippo Giunchedi: prometheus: add nginx reverse proxy [puppet] - https://gerrit.wikimedia.org/r/290479 (https://phabricator.wikimedia.org/T126785)
[08:22:11] <wikibugs>	 Blocked-on-Operations, DBA, Labs, Labs-Infrastructure, Patch-For-Review: maintain-replicas.pl unmaintained, unmaintainable - https://phabricator.wikimedia.org/T138450#2400728 (jcrespo) I have to add a view to a newly created labs-only table, so it is created for new wikis, too:  ``` MariaDB L...
[08:23:25] <wikibugs>	 Blocked-on-Operations, DBA, Labs, Labs-Infrastructure, Patch-For-Review: maintain-replicas.pl unmaintained, unmaintainable - https://phabricator.wikimedia.org/T138450#2401137 (jcrespo)
[08:25:23] <grrrit-wm>	 (CR) Filippo Giunchedi: prometheus: add nginx reverse proxy (1 comment) [puppet] - https://gerrit.wikimedia.org/r/290479 (https://phabricator.wikimedia.org/T126785) (owner: Filippo Giunchedi)
[08:26:28] <grrrit-wm>	 (PS2) Gehel: Moving elasticsearch masters to new servers [puppet] - https://gerrit.wikimedia.org/r/295585 (https://phabricator.wikimedia.org/T138329)
[08:27:07] <wikibugs>	 Blocked-on-Operations, Operations, Wikidata, Wikimedia-Language-setup, and 2 others: Create Wikipedia Jamaican - https://phabricator.wikimedia.org/T134017#2258408 (Jdforrester-WMF) This is now done, right?
[08:27:54] <yuvipanda>	 mobrovac I'm happy to merge https://gerrit.wikimedia.org/r/#/c/295576/5 now if you or someone from services is around
[08:28:06] <mobrovac>	 yuvipanda: cool, thnx
[08:28:23] <mobrovac>	 yuvipanda: yes, we're all on the bench in the park waiting for the other rooms to open :)
[08:28:42] <mobrovac>	 yuvipanda: so you can go ahead and merge it
[08:29:06] <grrrit-wm>	 (PS2) Muehlenhoff: Add Amiri font to the scalers [puppet] - https://gerrit.wikimedia.org/r/295498 (https://phabricator.wikimedia.org/T135347)
[08:30:12] <grrrit-wm>	 (PS6) Yuvipanda: Change-Prop: Added rules for ORES cache updates [puppet] - https://gerrit.wikimedia.org/r/295576 (owner: Ppchelko)
[08:30:34] <grrrit-wm>	 (CR) Yuvipanda: [C: +2 V: +2] Change-Prop: Added rules for ORES cache updates [puppet] - https://gerrit.wikimedia.org/r/295576 (owner: Ppchelko)
[08:31:05] <yuvipanda>	 mobrovac done
[08:31:13] <mobrovac>	 thnx yuvipanda!
[08:33:48] <mobrovac>	 yuvipanda: did you merge it on the puppetmaster?
[08:34:24] <yuvipanda>	 mobrovac yup
[08:34:32] <yuvipanda>	 mobrovac failed on strontium, fixing
[08:34:45] <yuvipanda>	 mobrovac done
[08:34:54] <mobrovac>	 cheers!
[08:39:09] <mobrovac>	 !log change-prop restarting on scb to pick up ores rules https://gerrit.wikimedia.org/r/295576
[08:39:13] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:39:18] <mobrovac>	 Amir1, Pchelolo, akosiaris: ^^
[08:39:45] <grrrit-wm>	 (PS2) Filippo Giunchedi: swift: align partition to 1M boundary [puppet] - https://gerrit.wikimedia.org/r/295492
[08:39:53] <grrrit-wm>	 (CR) Filippo Giunchedi: [C: +2 V: +2] swift: align partition to 1M boundary [puppet] - https://gerrit.wikimedia.org/r/295492 (owner: Filippo Giunchedi)
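The swift change aligns a partition to a 1M boundary, i.e. a start offset that is a multiple of 1 MiB (1,048,576 bytes). The patch contents are not in this log, so this is only a minimal sketch of the arithmetic, assuming 512-byte logical sectors by default; the function name is hypothetical:

```python
MIB = 1024 * 1024


def aligned_start_sector(requested_sector: int, sector_size: int = 512,
                         alignment: int = MIB) -> int:
    """Round a partition start sector up to the next alignment boundary."""
    if alignment % sector_size:
        raise ValueError("alignment must be a multiple of the sector size")
    sectors_per_boundary = alignment // sector_size  # 2048 for 512-byte sectors
    # Ceiling division, then scale back up to a sector number.
    return -(-requested_sector // sectors_per_boundary) * sectors_per_boundary


print(aligned_start_sector(63))                     # 2048 (legacy DOS-style start)
print(aligned_start_sector(2048))                   # 2048 (already aligned)
print(aligned_start_sector(34, sector_size=4096))   # 256 (4 KiB sectors)
```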
[08:40:16] <grrrit-wm>	 (Abandoned) Legoktm: Apache redirects for w.wiki [puppet] - https://gerrit.wikimedia.org/r/285932 (https://phabricator.wikimedia.org/T108557) (owner: Dereckson)
[08:40:17] <Amir1>	 nice, thank 
[08:40:42] <Amir1>	 *thanks
[08:55:19] <icinga-wm>	 RECOVERY - puppet last run on ms-be2023 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:58:05] <hoo>	 _joe_: Small: https://nn.wikipedia.org/wiki/Spesial:AboutTopic/Q1955993 Medium: https://nn.wikipedia.org/wiki/Spesial:AboutTopic/Q105598 (Very) large: https://nn.wikipedia.org/wiki/Spesial:AboutTopic/Q2150573
[08:58:54] <hoo>	 Given the item sizes on Wikidata, I guess the average will be between small and medium
[08:59:10] <icinga-wm>	 PROBLEM - HHVM rendering on mw1160 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:01:29] <icinga-wm>	 RECOVERY - HHVM rendering on mw1160 is OK: HTTP OK: HTTP/1.1 200 OK - 66150 bytes in 0.250 second response time
[09:02:01] <grrrit-wm>	 (PS1) Legoktm: Have "https://w.wiki/" do a 301 to Meta-Wiki [puppet] - https://gerrit.wikimedia.org/r/295632
[09:02:43] <grrrit-wm>	 (PS2) Legoktm: Have "https://w.wiki/" do a 301 to Meta-Wiki [puppet] - https://gerrit.wikimedia.org/r/295632 (https://phabricator.wikimedia.org/T133485)
[09:04:10] <icinga-wm>	 RECOVERY - puppet last run on ms-be2024 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[09:07:29] <icinga-wm>	 RECOVERY - swift-object-server on ms-be2022 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[09:08:08] <icinga-wm>	 RECOVERY - swift-container-server on ms-be2022 is OK: PROCS OK: 41 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[09:08:09] <icinga-wm>	 RECOVERY - swift-account-server on ms-be2022 is OK: PROCS OK: 41 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[09:08:29] <icinga-wm>	 RECOVERY - swift-container-updater on ms-be2022 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[09:09:19] <icinga-wm>	 RECOVERY - swift-object-updater on ms-be2022 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[09:11:09] <jynus>	 !log syncing etherpadlite.store (m1) on db2010, which had 2 bad chunks
[09:11:13] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
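The !log mentions etherpadlite.store on db2010 having 2 bad chunks. The log does not say which tool found them; one common approach is to split a table into primary-key ranges and compare one checksum per range between master and replica, so only the differing ranges need resyncing. A minimal sketch of that idea, assuming rows are (integer key, value) pairs held in memory (a simplification of a real table); the function names are hypothetical:

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, str]


def chunk_checksums(rows: Iterable[Row], chunk_size: int) -> Dict[int, str]:
    """Hash rows per primary-key range [n*chunk_size, (n+1)*chunk_size)."""
    hashers: dict = {}
    for key, value in sorted(rows):
        hasher = hashers.setdefault(key // chunk_size, hashlib.sha1())
        # Separate key and value so (1, "23") and (12, "3") hash differently.
        hasher.update(f"{key}\x00{value}\n".encode("utf-8"))
    return {chunk: h.hexdigest() for chunk, h in hashers.items()}


def bad_chunks(master: Iterable[Row], replica: Iterable[Row],
               chunk_size: int = 1000) -> List[int]:
    """Return the indices of key ranges whose checksums differ."""
    m = chunk_checksums(master, chunk_size)
    r = chunk_checksums(replica, chunk_size)
    # A range present on only one side counts as bad too.
    return sorted(c for c in m.keys() | r.keys() if m.get(c) != r.get(c))


master = [(k, f"value-{k}") for k in range(5000)]
replica = list(master)
replica[10] = (10, "corrupted")   # bad row in range 0
del replica[3500]                 # missing row in range 3
print(bad_chunks(master, replica))  # [0, 3]
```

Ranging by key rather than by row position keeps one missing row from shifting every later chunk.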
[09:16:28] <wikibugs>	 Operations, Wikimedia-SVG-rendering: PNG thumbnail preview of SVG misses some text - https://phabricator.wikimedia.org/T123106#2401262 (MoritzMuehlenhoff)
[09:18:12] <wikibugs>	 Operations, Wikimedia-SVG-rendering: PNG thumbnail preview of SVG misses some text - https://phabricator.wikimedia.org/T123106#1921435 (MoritzMuehlenhoff) @Efa Thanks for the detailed bug report. This is a bug in the librsvg library we use to generate the PNG thumbnails. I have reproduced that this still...
[09:23:27] <grrrit-wm>	 (PS3) Ema: Have "https://w.wiki/" do a 301 to Meta-Wiki [puppet] - https://gerrit.wikimedia.org/r/295632 (https://phabricator.wikimedia.org/T133485) (owner: Legoktm)
[09:24:27] <grrrit-wm>	 (CR) Ema: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/295632 (https://phabricator.wikimedia.org/T133485) (owner: Legoktm)
[09:35:23] <grrrit-wm>	 (CR) Dereckson: "Superseded by e031db9e, which handles URL shortener requests at Varnish level." [puppet] - https://gerrit.wikimedia.org/r/285932 (https://phabricator.wikimedia.org/T108557) (owner: Dereckson)
[09:43:02] <grrrit-wm>	 (PS1) Gehel: Remove old maps-test servers from LVS config [puppet] - https://gerrit.wikimedia.org/r/295640
[09:44:43] <grrrit-wm>	 (PS1) Hashar: contint: tidy Nodepool slaves config history [puppet] - https://gerrit.wikimedia.org/r/295641 (https://phabricator.wikimedia.org/T126552)
[09:45:22] <grrrit-wm>	 (CR) Daniel Kinzler: [C: +1] Log PHP/HHVM errors in CLI mode to stderr, not stdout [mediawiki-config] - https://gerrit.wikimedia.org/r/295554 (https://phabricator.wikimedia.org/T138291) (owner: Hoo man)
[09:45:31] <grrrit-wm>	 (CR) Gehel: "If I understand correctly, once this is merged, there is nothing more to do (no restart of LVS / pybal / ...)." [puppet] - https://gerrit.wikimedia.org/r/295640 (owner: Gehel)
[09:47:09] <grrrit-wm>	 (CR) Hashar: "I have no idea how to properly test the puppet tidy type. Though based on puppet 3.4.3 source code that looks fine." [puppet] - https://gerrit.wikimedia.org/r/295641 (https://phabricator.wikimedia.org/T126552) (owner: Hashar)
[09:48:34] <wikibugs>	 Operations, media-storage: 'swift' user/group IDs should be consistent across the fleet - https://phabricator.wikimedia.org/T123918#2401342 (fgiunchedi) doable also post-puppet but before machines are in services (i.e. many files owned by swift)  ``` swift-init all stop userdel swift groupdel swift group...
[09:49:16] <wikibugs>	 Operations, MediaWiki-extensions-UniversalLanguageSelector, Wikimedia-SVG-rendering, I18n: MB Lateefi Fonts for Sindhi Wikipedia. - https://phabricator.wikimedia.org/T138136#2401356 (Dereckson) Thanks for the update.  If the author has some questions about licensing, I'll be happy to answer them.
[09:49:23] <godog>	 !log reimage ms-be202[567] with incorrect raid settings
[09:49:28] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:50:02] <grrrit-wm>	 (CR) DCausse: [C: +1] Moving elasticsearch masters to new servers [puppet] - https://gerrit.wikimedia.org/r/295585 (https://phabricator.wikimedia.org/T138329) (owner: Gehel)
[09:50:05] <wikibugs>	 Operations, Wikimedia-SVG-rendering, Upstream: librsvg misinterpret quoted font family names that contain whitespaces - https://phabricator.wikimedia.org/T64987#2401359 (MoritzMuehlenhoff)
[09:52:12] <grrrit-wm>	 (PS4) Legoktm: Have "https://w.wiki/" do a 301 to Meta-Wiki [puppet] - https://gerrit.wikimedia.org/r/295632 (https://phabricator.wikimedia.org/T133485)
[09:54:17] <wikibugs>	 Operations: eqiad: 1 hardware access request for labs on real hardware (mwoffliner) - https://phabricator.wikimedia.org/T117095#1766457 (Andrew) I just spoke to Kelson about this, and I'm willing to set this up if we can provide him with the hardware.  Adding a second bare-metal server will be an 'interestin...
[09:55:25] <grrrit-wm>	 (PS5) Legoktm: Have "https://w.wiki/" do a 301 to Meta-Wiki [puppet] - https://gerrit.wikimedia.org/r/295632 (https://phabricator.wikimedia.org/T133485)
[09:58:06] <grrrit-wm>	 (CR) Giuseppe Lavagetto: [C: +2] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/295632 (https://phabricator.wikimedia.org/T133485) (owner: Legoktm)
[10:02:08] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: +1] "+1. Looks fine. I'd be depooling first the servers on palladium using confctl but it is not strictly required." [puppet] - https://gerrit.wikimedia.org/r/295640 (owner: Gehel)
[10:04:53] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1301 is CRITICAL: Connection timed out
[10:04:53] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1300 is CRITICAL: Connection timed out
[10:06:42] <icinga-wm>	 PROBLEM - Disk space on mw1301 is CRITICAL: Timeout while attempting connection
[10:06:43] <icinga-wm>	 PROBLEM - Disk space on mw1300 is CRITICAL: Timeout while attempting connection
[10:07:00] <elukey>	 this is me
[10:07:32] <elukey>	 (new jobrunners)
[10:07:49] <elukey>	 should be silenced now
[10:08:25] <wikibugs>	 Operations, Wikimedia-SVG-rendering, Upstream: librsvg misinterpret quoted font family names that contain whitespaces - https://phabricator.wikimedia.org/T64987#657868 (MoritzMuehlenhoff) That bug is fixed on the new jessie image scaler using 2.4.16 (tested locally, it's not yet pooled into the set o...
[10:08:33] <wikibugs>	 Operations, Wikimedia-SVG-rendering, Upstream: librsvg misinterpret quoted font family names that contain whitespaces - https://phabricator.wikimedia.org/T64987#2401422 (MoritzMuehlenhoff) a:MoritzMuehlenhoff
[10:09:23] <grrrit-wm>	 (PS1) Gehel: Decommission old maps servers [puppet] - https://gerrit.wikimedia.org/r/295649 (https://phabricator.wikimedia.org/T138329)
[10:09:50] <grrrit-wm>	 (CR) Gehel: [C: -1] "Not to merge before traffic is moved off those servers" [puppet] - https://gerrit.wikimedia.org/r/295649 (https://phabricator.wikimedia.org/T138329) (owner: Gehel)
[10:13:36] <andrewbogott>	 !log restarting rabbitmq-server on labcontrol1001 (random debugging attempt for T138106)
[10:13:40] <stashbot>	 T138106: Nodepool has trouble taking snapshots on OpenStack labs - https://phabricator.wikimedia.org/T138106
[10:13:40] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:17:27] <grrrit-wm>	 (CR) Gehel: "maps-test* servers have already been depooled:" [puppet] - https://gerrit.wikimedia.org/r/295640 (owner: Gehel)
[10:23:03] <icinga-wm>	 PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 4 failures
[10:33:28] <Amir1>	 jynus: hey, if you are around I have a DBA performance question
[10:33:39] <Amir1>	 tell me when you have some minutes
[10:35:15] <jynus>	 ask- but I always prefer that you create a ticket
[10:35:53] <icinga-wm>	 PROBLEM - swift-account-auditor on ms-be2025 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[10:36:23] <icinga-wm>	 PROBLEM - swift-account-reaper on ms-be2025 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[10:36:43] <icinga-wm>	 PROBLEM - swift-account-replicator on ms-be2025 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[10:39:00] <Amir1>	 jynus: https://phabricator.wikimedia.org/T138444
[10:39:03] <icinga-wm>	 PROBLEM - swift-object-replicator on ms-be2025 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[10:39:04] <Amir1>	 made it 
[10:39:10] <Amir1>	 https://gerrit.wikimedia.org/r/#/c/295528/3/includes/Hooks.php
[10:39:22] <icinga-wm>	 PROBLEM - swift-object-server on ms-be2025 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[10:39:38] <Amir1>	 we want to know if joining with revision table can make this query faster
[10:41:13] <hoo>	 We're already using that "hack" in a few places in MediaWiki itself, because there's no index on rc_this_oldid, but one on rc_timestamp
[10:41:23] <Amir1>	 yup
[10:42:02] <wikibugs>	 Operations, Wikimedia-SVG-rendering, Upstream: Incorrect text positioning in SVG rasterization (any extreme down scale) (fixed in upstream 2.40.13) - https://phabricator.wikimedia.org/T65703#2401512 (MoritzMuehlenhoff)
[10:42:26] <grrrit-wm>	 (PS1) Elukey: Add the -T VSL API timeout parameter plus the related formatter. [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/295652
[10:43:23] <icinga-wm>	 RECOVERY - swift-account-reaper on ms-be2025 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[10:43:32] <Amir1>	 jynus: ^
[10:43:43] <icinga-wm>	 RECOVERY - swift-object-replicator on ms-be2025 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[10:43:43] <icinga-wm>	 RECOVERY - swift-account-replicator on ms-be2025 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[10:44:02] <icinga-wm>	 RECOVERY - swift-object-server on ms-be2025 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[10:45:13] <icinga-wm>	 RECOVERY - swift-account-auditor on ms-be2025 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[10:46:22] <grrrit-wm>	 (PS1) Jcrespo: [WIP] Move all misc db scripts to db_maintenance module [puppet] - https://gerrit.wikimedia.org/r/295654
[10:46:42] <icinga-wm>	 RECOVERY - Disk space on mw1301 is OK: DISK OK
[10:48:24] <jynus>	 joins are not a problem in general
[10:48:49] <jynus>	 joining with revision is, given that it is the largest table of all our infrastructure
[10:49:09] <jynus>	 and recentchanges was created to avoid using it
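The "hack" hoo mentions and jynus's objection come down to index use: per hoo, recentchanges had an index on rc_timestamp but none on the revision-id column (rc_this_oldid). A minimal sketch of the pattern in SQLite, with heavily simplified, hypothetical schemas and index name: fetch the revision's timestamp by primary key, then let the rc_timestamp index narrow the recentchanges lookup, with no join against revision in that query. It relies on the recentchanges row carrying the same timestamp as its revision, and it illustrates the idea only; it is not MediaWiki's actual query or a MariaDB plan.

```python
import sqlite3
from typing import Optional, Tuple

SCHEMA = """
CREATE TABLE revision (
    rev_id        INTEGER PRIMARY KEY,
    rev_timestamp TEXT NOT NULL           -- MediaWiki-style YYYYMMDDHHMMSS
);
CREATE TABLE recentchanges (
    rc_id         INTEGER PRIMARY KEY,
    rc_timestamp  TEXT NOT NULL,
    rc_this_oldid INTEGER NOT NULL        -- deliberately left unindexed
);
CREATE INDEX rc_timestamp_idx ON recentchanges (rc_timestamp);
"""

RC_BY_TS_AND_REV = (
    "SELECT rc_id FROM recentchanges "
    "WHERE rc_timestamp = ? AND rc_this_oldid = ?"
)


def find_rc_for_revision(conn: sqlite3.Connection, rev_id: int) -> Optional[Tuple[int]]:
    """Find the recentchanges row for rev_id via the indexed timestamp."""
    row = conn.execute(
        "SELECT rev_timestamp FROM revision WHERE rev_id = ?", (rev_id,)
    ).fetchone()
    if row is None:
        return None
    # The index narrows by rc_timestamp; rc_this_oldid filters the few matches.
    return conn.execute(RC_BY_TS_AND_REV, (row[0], rev_id)).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO revision VALUES (1, '20160623104113')")
conn.executemany(
    "INSERT INTO recentchanges VALUES (?, ?, ?)",
    [(10, "20160623104113", 1), (11, "20160623104113", 2)],
)
print(find_rc_for_revision(conn, 1))  # (10,)
plan = conn.execute("EXPLAIN QUERY PLAN " + RC_BY_TS_AND_REV,
                    ("20160623104113", 1)).fetchall()
print([step[-1] for step in plan])  # detail text names rc_timestamp_idx
```

On production-sized tables the real answer depends on the MariaDB optimizer and data distribution, which is why testing the actual query on several wikis, as jynus asks, matters more than this toy plan.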
[10:50:11] <grrrit-wm>	 (CR) Nikerabbit: [C: -1] Deploy Compact Language Links as default (Stage 2) (3 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/295454 (https://phabricator.wikimedia.org/T136677) (owner: KartikMistry)
[10:50:23] <icinga-wm>	 RECOVERY - Disk space on mw1300 is OK: DISK OK
[10:51:04] <jynus>	 this is not a matter of opinion, please generate a query from labs -or anywhere you have a test env-, paste them and I can check them on several wikis
[10:51:09] <jynus>	 Amir1^
[10:51:33] <Amir1>	 we did that before, let me find and paste it here
[10:51:53] <jynus>	 do not paste it here
[10:51:57] <jynus>	 paste it on the task
[10:52:18] <jynus>	 try to centralize things there- this is ok for a heads up, but the rest is better there
[10:54:12] <Amir1>	 jynus: https://phabricator.wikimedia.org/T138444#2401522
[10:54:21] <Amir1>	 did that already
[10:55:33] <Amir1>	 Test it in wikidata
[10:55:36] <Amir1>	 jynus: ^
[10:56:47] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] [WIP] Move all misc db scripts to db_maintenance module [puppet] - https://gerrit.wikimedia.org/r/295654 (owner: Jcrespo)
[10:58:24] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1301 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.429 second response time
[10:59:16] <Amir1>	 I need to go for lunch, I'll be back soon
[11:00:03] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1300 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.015 second response time
[11:03:23] <icinga-wm>	 PROBLEM - puppet last run on kafka2002 is CRITICAL: CRITICAL: puppet fail
[11:11:31] <Jamesofur>	 Some issues on etherpad atm it seems (I assume because of hackathon attention?)  https://usercontent.irccloud-cdn.com/file/yD9IQ6Jo/etherpaderror
[11:15:41] <grrrit-wm>	 (PS34) Alexandros Kosiaris: network: add $production_networks [puppet] - https://gerrit.wikimedia.org/r/260926 (https://phabricator.wikimedia.org/T122396) (owner: Faidon Liambotis)
[11:15:49] <grrrit-wm>	 (PS3) KartikMistry: Deploy Compact Language Links as default (Stage 2) [mediawiki-config] - https://gerrit.wikimedia.org/r/295454 (https://phabricator.wikimedia.org/T136677)
[11:17:10] <wikibugs>	 Operations, Patch-For-Review: Contain imagemagick on the image scalers with firejail - https://phabricator.wikimedia.org/T135111#2401558 (MoritzMuehlenhoff) Open>Resolved This is enabled on the image scalers (and app servers for the Score extensions) since last week
[11:17:42] <icinga-wm>	 RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:19:56] <grrrit-wm>	 (PS1) Gehel: Add new elasticsearch servers to LVS [puppet] - https://gerrit.wikimedia.org/r/295657 (https://phabricator.wikimedia.org/T138329)
[11:30:33] <icinga-wm>	 RECOVERY - puppet last run on kafka2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:32:39] <grrrit-wm>	 (PS1) Muehlenhoff: Add firejail wrapper for rsvg-convert [puppet] - https://gerrit.wikimedia.org/r/295659
[11:32:58] <gehel>	 !log rolling restart of elasticsearch10(01|30|08|36|13|40) to activate new masters
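The `!log` line names its six targets with a regex-style alternation. The short illustration below shows how such a pattern expands into the concrete hosts a rolling restart would walk through, one node at a time. It is not a WMF tool, and `expand_hosts` is a hypothetical helper that only understands a single parenthesised group.

```python
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand one 'prefix(a|b|c)suffix' group into explicit host names.

    Hypothetical helper: it understands a single parenthesised group only;
    a pattern without parentheses comes back as a one-item list.
    """
    match = re.fullmatch(r"(?P<pre>[^()]*)\((?P<alts>[^()]*)\)(?P<post>[^()]*)", pattern)
    if match is None:
        return [pattern]
    return [match["pre"] + alt + match["post"] for alt in match["alts"].split("|")]


# Restart one node at a time, in the order the log line lists them.
for host in expand_hosts("elasticsearch10(01|30|08|36|13|40)"):
    print(host)
```

A real rolling restart would also wait for the cluster to recover before moving on to the next node. That step is deliberately left out of this sketch.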
[11:33:02] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:33:05] <grrrit-wm>	 (PS3) Gehel: Moving elasticsearch masters to new servers [puppet] - https://gerrit.wikimedia.org/r/295585 (https://phabricator.wikimedia.org/T138329)
[11:35:18] <grrrit-wm>	 (CR) Gehel: [C: 2] Moving elasticsearch masters to new servers [puppet] - https://gerrit.wikimedia.org/r/295585 (https://phabricator.wikimedia.org/T138329) (owner: Gehel)
[11:37:32] <Amir1>	 back
[11:37:55] <Amir1>	 jynus: it would be great if you check it
[11:38:16] <nuria_>	 Would anyone be able to point me to the puppet code that describes our storage setup for graphite? Maybe _joe_? Reading the graphite module, it looks like it does not have storage
[11:38:28] <grrrit-wm>	 (PS1) Jcrespo: Depool db1059; Repool db1061 & db1062; increase weight of db1068 [mediawiki-config] - https://gerrit.wikimedia.org/r/295661
[11:38:31] <jynus>	 Amir1, I will
[11:38:38] <jynus>	 are you in a rush?
[11:39:05] <grrrit-wm>	 (PS4) KartikMistry: Deploy Compact Language Links as default (Stage 2) [mediawiki-config] - https://gerrit.wikimedia.org/r/295454 (https://phabricator.wikimedia.org/T136677)
[11:39:14] <Amir1>	 jynus: we have a showcase in one hour, I thought it would be great if we can show it to people
[11:39:47] <jynus>	 you want to deploy to production?
[11:44:13] <jynus>	 nuria_, I think ~/puppet/modules/graphite/manifests/init.pp has everything it needs on the storage side
[11:44:29] <nuria_>	 jynus: thank you, looking
[11:45:17] <jynus>	 there is of course the cluster on top of that
[11:46:04] <jynus>	 plus, if you are interested in a future setup, the more promising solution (for us) is prometheus
[11:47:05] <Amir1>	 jynus: not deploying, just telling people that we merged it and it'll be there soon
[11:47:36] <jynus>	 well, then no rush- if someone complains, tell them it will be
[11:47:55] <Amir1>	 okay :)
[11:48:02] <jynus>	 or if someone complains, tell them it is all my fault
[11:49:49] <grrrit-wm>	 (CR) Jcrespo: [C: 2] Depool db1059; Repool db1061 & db1062; increase weight of db1068 [mediawiki-config] - https://gerrit.wikimedia.org/r/295661 (owner: Jcrespo)
[11:51:42] <icinga-wm>	 PROBLEM - puppet last run on sca2002 is CRITICAL: CRITICAL: puppet fail
[11:52:29] <grrrit-wm>	 (PS2) Muehlenhoff: Add firejail wrapper for rsvg-convert [puppet] - https://gerrit.wikimedia.org/r/295659
[11:54:30] <wikibugs>	 Blocked-on-Operations, Operations, Wikidata, Wikimedia-Language-setup, and 2 others: Create Wikipedia Jamaican - https://phabricator.wikimedia.org/T134017#2401650 (Dzahn) No, i don't think it is done. Still what Rob described above.
[11:58:30] <grrrit-wm>	 (CR) Legoktm: [C: 1] Log PHP/HHVM errors in CLI mode to stderr, not stdout [mediawiki-config] - https://gerrit.wikimedia.org/r/295554 (https://phabricator.wikimedia.org/T138291) (owner: Hoo man)
[12:00:22] <grrrit-wm>	 (CR) Muehlenhoff: [C: 2 V: 2] Add firejail wrapper for rsvg-convert [puppet] - https://gerrit.wikimedia.org/r/295659 (owner: Muehlenhoff)
[12:01:32] <wikibugs>	 Blocked-on-Operations, Operations, Wikidata, Wikimedia-Language-setup, and 2 others: Create Wikipedia Jamaican - https://phabricator.wikimedia.org/T134017#2401684 (jcrespo) @Dzahn check your mail.
[12:07:05] <logmsgbot>	 !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1059; Repool db1061 & db1062; increase weight of db1068 (duration: 00m 39s)
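The sync above pushes new per-replica load values in `wmf-config/db-eqiad.php`. The sketch below illustrates what those values mean. It rests on simplifying assumptions: the values act as relative selection probabilities, and depooling is modelled as weight 0. The numbers and the `pick_replica` helper are hypothetical; this is not MediaWiki's load balancer code.

```python
import random
from collections import Counter

# Hypothetical weights after the change: db1059 depooled (weight 0 here),
# db1061/db1062 repooled, db1068 given more weight. Numbers are illustrative.
REPLICA_WEIGHTS = {"db1059": 0, "db1061": 100, "db1062": 100, "db1068": 300}


def pick_replica(weights: dict[str, int], rng: random.Random) -> str:
    """Choose a replica with probability proportional to its weight.

    Hosts with weight 0 are never chosen, which is how this sketch models
    a depooled server.
    """
    hosts = [h for h, w in weights.items() if w > 0]
    if not hosts:
        raise RuntimeError("no pooled replicas")
    return rng.choices(hosts, weights=[weights[h] for h in hosts], k=1)[0]


rng = random.Random(42)  # fixed seed so the demo is reproducible
counts = Counter(pick_replica(REPLICA_WEIGHTS, rng) for _ in range(10_000))
print(counts.most_common())
```

Each app server evaluates this from its own copy of the config. That is why a server that misses a sync keeps sending queries to the old set of hosts.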
[12:07:09] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:12:02] <wikibugs>	 Operations, Traffic, Wiki-Loves-Monuments, HTTPS: configure https for www.wikilovesmonuments.org - https://phabricator.wikimedia.org/T118388#2401704 (Dzahn) Yay! Thank you!
[12:14:58] <_joe_>	 moritzm: wow nice!
[12:15:05] <_joe_>	 (firejail for rsvg-convert)
[12:17:33] <icinga-wm>	 RECOVERY - puppet last run on sca2002 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[12:20:07] <grrrit-wm>	 (CR) Alexandros Kosiaris: "PCC is happy at https://puppet-compiler.wmflabs.org/3168/carbon.wikimedia.org/" [puppet] - https://gerrit.wikimedia.org/r/260926 (https://phabricator.wikimedia.org/T122396) (owner: Faidon Liambotis)
[12:31:32] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:31:51] <icinga-wm>	 PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (bad PMCID) is CRITICAL: Could not fetch url http://citoid.svc.codfw.wmnet:1970/api: Timeout on connection while downloading http://citoid.svc.codfw.wmnet:1970/api
[12:32:13] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:32:23] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (bad PMCID) is CRITICAL: Could not fetch url http://citoid.svc.eqiad.wmnet:1970/api: Timeout on connection while downloading http://citoid.svc.eqiad.wmnet:1970/api
[12:32:31] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:33:19] <grrrit-wm>	 (PS2) Elukey: Add the -T VSL API timeout parameter plus the related formatter. [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/295652
[12:33:52] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[12:34:01] <icinga-wm>	 RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy
[12:34:41] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy
[12:34:51] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[12:34:52] <moritzm>	 _joe_: not yet enabled so far. will first pool mw1291 for some 15 mins of smoketesting in a bit
[12:39:37] <wikibugs>	 Blocked-on-Operations, DBA, Labs, Labs-Infrastructure, Patch-For-Review: maintain-replicas.pl unmaintained, unmaintainable - https://phabricator.wikimedia.org/T138450#2401809 (jcrespo) >> It blows up and rebuilds all wikis on every run. >It truncates the meta_p.wiki table but it doesn't drop...
[12:40:11] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:41:02] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:41:21] <icinga-wm>	 PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (bad PMCID) is CRITICAL: Could not fetch url http://citoid.svc.codfw.wmnet:1970/api: Timeout on connection while downloading http://citoid.svc.codfw.wmnet:1970/api
[12:41:31] <jynus>	 same issue as yesterday?
[12:41:51] <icinga-wm>	 PROBLEM - citoid endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:42:01] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (bad PMCID) is CRITICAL: Could not fetch url http://citoid.svc.eqiad.wmnet:1970/api: Timeout on connection while downloading http://citoid.svc.eqiad.wmnet:1970/api
[12:43:26] <paravoid>	 again this shit?
[12:43:36] <paravoid>	 akosiaris: weren't you fixing that?
[12:43:45] <jynus>	 I am just asking, how can I check
[12:43:54] <jynus>	 no, he wasn't the one
[12:44:19] <jynus>	 joe implemented it, he said, but it is not his fault
[12:44:48] <_joe_>	 I implemented the checker, not the specs
[12:44:52] <jynus>	 let's calm down :-)
[12:44:54] <_joe_>	 that's what should be removed
[12:45:06] <jynus>	 yes, I understood it like that
[12:45:13] <akosiaris>	 paravoid: fixing ? 
[12:45:14] <akosiaris>	 how ?
[12:45:16] <_joe_>	 (that is basically monitoring an external resource)
[12:45:25] <jynus>	 yes
[12:45:38] <akosiaris>	 yeah the gov database about PMCID
[12:45:49] <_joe_>	 who is responsible for citoid?
[12:45:54] <akosiaris>	 _joe_: good thing you haven't set those to paging
[12:45:58] <_joe_>	 I CAN PESTER PEOPLE IRL FOR ONCE
[12:46:00] <akosiaris>	 _joe_: mobrovac, how else ?
[12:46:04] <akosiaris>	 who*
[12:46:27] <_joe_>	 actually, people are pestering me more than I like :P
[12:46:28] <akosiaris>	 somehow mobrovac is responsible for 70% of the services 
[12:46:31] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[12:46:31] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy
[12:46:41] <icinga-wm>	 RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy
[12:47:02] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[12:47:03] <wikibugs>	 Operations, ops-eqiad, DC-Ops: mw1302.eqiad.wmnet issues while booting - https://phabricator.wikimedia.org/T138485#2401862 (elukey)
[12:47:21] <akosiaris>	 _joe_: anyway, the only way to actually fix that is have the spec inform the checker that this endpoint either should not be monitored or is ok to return an error
[12:47:44] <akosiaris>	 effectively both mean "not monitored"
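akosiaris's fix has the spec tell the checker which endpoints to leave alone. The sketch below assumes an OpenAPI-style spec. Each operation lists example requests under `x-amples` and can opt out of monitoring with `x-monitor: false`. Those extension names are assumed here, and the paths and example titles are hypothetical.

```python
from typing import Any, Iterator

# Hypothetical spec fragment. "x-amples" (example requests) and "x-monitor"
# (opt-out flag) are assumed extension names; the paths are made up.
SPEC: dict[str, Any] = {
    "paths": {
        "/api": {
            "get": {"x-amples": [{"title": "good request"}]},
        },
        "/external": {
            # Depends on a third-party database, so it is opted out.
            "get": {"x-monitor": False, "x-amples": [{"title": "bad PMCID"}]},
        },
    }
}


def checks_to_run(spec: dict[str, Any]) -> Iterator[tuple[str, str, str]]:
    """Yield (path, method, example title) for every check the spec requests.

    Operations flagged "x-monitor": False are skipped, so an endpoint whose
    health depends on an external resource can never raise an alert.
    """
    for path, operations in spec.get("paths", {}).items():
        for method, op in operations.items():
            if not op.get("x-monitor", True):
                continue
            for example in op.get("x-amples", []):
                yield path, method, example.get("title", "untitled")


for check in checks_to_run(SPEC):
    print(check)
```

Deleting the example from the spec has the same effect as the flag, since the checker never sees it. That is what mobrovac offers to do next.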
[12:47:46] <_joe_>	 akosiaris: yes, I am supposed to spin off service_checker from the puppet repo today
[12:47:52] <mobrovac>	 akosiaris: i can just remove these checks from the spec for the time being, i guess
[12:47:59] <_joe_>	 mobrovac: yes
[12:48:03] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[12:48:04] <_joe_>	 mobrovac: WHERE ARE U
[12:48:12] <icinga-wm>	 RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy
[12:48:14] <akosiaris>	 that would be a solution. I 'd appreciate it :-)
[12:48:20] <_joe_>	 I want to storm to a dev's desk shouting "you fix this shit"
[12:48:22] <_joe_>	 :D
[12:48:28] <mobrovac>	 _joe_: IN ROOM 30, but i'm gonna get out soon
[12:48:42] <akosiaris>	 _joe_: you have a few secs.. RRRRRRRRRRRRRRUUUUUUUUUUUUN!!!!!!
[12:48:42] <_joe_>	 mobrovac: you can't hide for long!!
[12:48:51] <mobrovac>	 lol
[12:48:55] <_joe_>	 akosiaris: no I am busy making fun of yurik 
[12:48:59] <_joe_>	 err yuvipanda 
[12:49:04] <_joe_>	 sorry yurik 
[12:49:05] <_joe_>	 :)
[12:49:35] <moritzm>	 !log pooling new jessie image scaler mw1291 for short production smoke testing
[12:49:38] <p858snake|_>	 <_joe_> mobrovac: WHERE ARE U <just build a vb.net gui to track his IP >.>
[12:49:39] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:49:56] <akosiaris>	 vb.net ? 
[12:50:03] <mobrovac>	 p858snake|_: that'll never get old :)
[12:50:08] <akosiaris>	 lol
[12:50:22] <mobrovac>	 akosiaris: https://www.youtube.com/watch?v=hkDD03yeLnU
[12:50:26] <_joe_>	 p858snake|_: we're in the same physical place, at wikimania
[12:51:13] <_joe_>	 mobrovac: you will appreciate https://www.youtube.com/watch?v=s5ocXFgowZA 
[12:51:14] <grrrit-wm>	 (CR) Filippo Giunchedi: [C: 1] "LGTM, +1 on merging to move forward" [puppet] - https://gerrit.wikimedia.org/r/260926 (https://phabricator.wikimedia.org/T122396) (owner: Faidon Liambotis)
[12:51:14] <elukey>	 mobrovac: thanks for the link I somehow missed this pearl before today
[12:51:18] <_joe_>	 (it's italian, sorry)
[12:51:37] <elukey>	 "un debian!"
[12:51:47] <_joe_>	 elukey: never gets old, right?
[12:51:50] <elukey>	 nope
[12:54:06] <grrrit-wm>	 (CR) DCausse: [C: 1] Add new elasticsearch servers to LVS [puppet] - https://gerrit.wikimedia.org/r/295657 (https://phabricator.wikimedia.org/T138329) (owner: Gehel)
[12:54:15] <jynus>	 _joe_, what does the expert say "debian, similar to linux code?" something like that?
[12:55:13] <_joe_>	 jynus: he says, let me try to translate
[12:55:23] <grrrit-wm>	 (CR) DCausse: Decommission old maps servers (1 comment) [puppet] - https://gerrit.wikimedia.org/r/295649 (https://phabricator.wikimedia.org/T138329) (owner: Gehel)
[12:55:27] <_joe_>	 "holy shit! it's an debian! similar to linux"
[12:56:01] <godog>	 /r/itsaunixsystem
[12:56:27] <jynus>	 I think the worst offender is this one: https://www.youtube.com/watch?v=u8qgehH3kEQ
[12:56:56] <mobrovac>	 "molto simile a linux"
[12:56:57] <mobrovac>	 hahaha
[12:56:59] <godog>	 heheh NCIS delivers
[12:57:16] <grrrit-wm>	 (CR) Gehel: Decommission old maps servers (1 comment) [puppet] - https://gerrit.wikimedia.org/r/295649 (https://phabricator.wikimedia.org/T138329) (owner: Gehel)
[12:58:38] <grrrit-wm>	 (PS2) Gehel: Decommission old elasticsearch servers [puppet] - https://gerrit.wikimedia.org/r/295649 (https://phabricator.wikimedia.org/T138329)
[12:58:38] <wikibugs>	 Operations, Traffic: Support TLS chacha20-poly1305 AEAD ciphers - https://phabricator.wikimedia.org/T131908#2401926 (BBlack) [[  https://tools.ietf.org/html/rfc7905 | RFC 7905 ]] is published! Now we just need a released version of openssl 1.1.x :)  We could test a build of openssl's master branch on cp1...
[12:58:52] <grrrit-wm>	 (CR) Gehel: Decommission old elasticsearch servers (1 comment) [puppet] - https://gerrit.wikimedia.org/r/295649 (https://phabricator.wikimedia.org/T138329) (owner: Gehel)
[13:00:25] * yurik throws a banana at _joe_ 
[13:03:17] <grrrit-wm>	 (PS1) Ppchelko: Change-Prop: Ignore certain errors on page_delete and null_edit. [puppet] - https://gerrit.wikimedia.org/r/295680
[13:09:05] <moritzm>	 !log depooled jessie image scaler (mw1291) again, works fine, to be permanently pooled on Monday
[13:09:10] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:11:25] <elukey>	 !log purged some puppet output logs on compiler02.puppet3-diffs.eqiad.wmflabs to free space (disk full)
[13:11:30] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:11:47] <jynus>	 I synced a change but it did not get to some mediawikis, and now they are querying wrong db hosts
[13:11:48] <mobrovac>	 !log citoid deploying 0129ab0b
[13:11:53] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:12:24] <mobrovac>	 akosiaris: ^^ done, now the spec checks only citoid and zotero
[13:13:06] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw1300 is CRITICAL: Host mw1300 is not in mediawiki-installation dsh group
[13:13:06] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw1301 is CRITICAL: Host mw1301 is not in mediawiki-installation dsh group
[13:13:11] <jynus>	 !log running scap pool on mw1300
[13:13:15] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:13:17] <jynus>	 ah!
[13:13:20] <jynus>	 there it is
[13:13:43] <jynus>	 those hosts are pooled but not being updated
[13:13:50] <mobrovac>	 !log restarting zotero on sca, 6g mem
[13:13:51] <jynus>	 which is really dangerous
[13:13:54] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:13:57] <akosiaris>	 mobrovac: thanks!
[13:14:19] <elukey>	 jynus: those are new jobrunners :)
[13:14:45] <jynus>	 elukey, it is ok if they are not updated, but please repool it
[13:14:52] <jynus>	 production queries are running on them
[13:15:05] <elukey>	 how is that possible?
[13:15:25] <elukey>	 I thought I needed to add them in puppet
[13:15:43] <jynus>	 allow me to update them so they do not fail, you can continue investigating
[13:15:57] <elukey>	 oh sure go ahead, sorry for the trouble
[13:16:04] <elukey>	 I thought that I needed to explicitly pool them first
[13:16:07] <wikibugs>	 Operations, Traffic: Support TLS chacha20-poly1305 AEAD ciphers - https://phabricator.wikimedia.org/T131908#2401975 (MoritzMuehlenhoff) 1.1.0~pre5 is in Debian experimental.  It has quite some API changes, though. https://wiki.openssl.org/index.php/1.1_API_Changes  In a rebuild of the Debian archive over...
[13:16:09] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: 2] "Merging then" [puppet] - https://gerrit.wikimedia.org/r/260926 (https://phabricator.wikimedia.org/T122396) (owner: Faidon Liambotis)
[13:16:16] <grrrit-wm>	 (PS35) Alexandros Kosiaris: network: add $production_networks [puppet] - https://gerrit.wikimedia.org/r/260926 (https://phabricator.wikimedia.org/T122396) (owner: Faidon Liambotis)
[13:16:21] <grrrit-wm>	 (CR) Alexandros Kosiaris: [V: 2] network: add $production_networks [puppet] - https://gerrit.wikimedia.org/r/260926 (https://phabricator.wikimedia.org/T122396) (owner: Faidon Liambotis)
[13:16:25] <jynus>	 elukey, see that I am not lying: https://phabricator.wikimedia.org/P3304
[13:16:56] <jynus>	 !log running scap pool on mw1301
[13:17:01] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:17:26] <elukey>	 jynus: didn't think that you were lying, I was just surprised :)
[13:17:38] <jynus>	 well, I could be wrong
[13:17:56] <jynus>	 that is why I am sharing the same thing I saw, so you have more info
[13:18:02] <elukey>	 thanks!
[13:18:26] <elukey>	 afaik the jobrunners need to be in hiera before starting to pull jobs from the queues
[13:18:35] <elukey>	 mmm
[13:18:38] <jynus>	 mmm, not sure about that
[13:18:56] <jynus>	 they were controlled with salt in the past
[13:19:13] <jynus>	 but if this is confirmed, please report a bug
[13:20:35] <elukey>	 I have two more coming up to speed so I am going to check that now :)
[13:21:56] <icinga-wm>	 PROBLEM - puppet last run on mw2168 is CRITICAL: CRITICAL: puppet fail
[13:21:56] <icinga-wm>	 PROBLEM - puppet last run on mw2249 is CRITICAL: CRITICAL: puppet fail
[13:22:15] <elukey>	 maybe I confused them with the hiera config for the job queues itself
[13:27:29] <wikibugs>	 Operations, ops-eqiad, media-storage: rack/setup/deploy ms-be102[2-7] - https://phabricator.wikimedia.org/T136631#2402012 (fgiunchedi)
[13:27:31] <wikibugs>	 Operations, media-storage, Tracking: refresh swift hardware in codfw/eqiad (tracking) - https://phabricator.wikimedia.org/T130012#2402011 (fgiunchedi)
[13:29:05] <wikibugs>	 Operations, Traffic: Support TLS chacha20-poly1305 AEAD ciphers - https://phabricator.wikimedia.org/T131908#2402015 (BBlack) Yeah it's going to be a big transition.  I've seen openssl-1.1.x-related patches in nginx master though (which is basically what we're running), so I'm crossing fingers that nginx...
[13:29:29] <grrrit-wm>	 (PS5) KartikMistry: Deploy Compact Language Links as default (Stage 2) [mediawiki-config] - https://gerrit.wikimedia.org/r/295454 (https://phabricator.wikimedia.org/T136677)
[13:30:16] <jynus>	 !log db1059 backup and reimage
[13:30:21] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:33:15] <elukey>	 I kept only puppet compiler outputs up to 40 days ago on compiler02.puppet3-diffs.eqiad.wmflabs to free space
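Keeping only the last 40 days of compiler output comes down to an age-based prune. The sketch below rests on two assumptions: each compiler run lives in its own subdirectory, and directory mtime is a good enough age signal. That layout and the `prune_old_outputs` helper are hypothetical; this is not how the compiler host is actually organised. The function defaults to a dry run, which is the safer habit when you are not sure whether something important is in there.

```python
import os
import shutil
import tempfile
import time
from pathlib import Path
from typing import Optional

MAX_AGE_DAYS = 40  # retention mentioned in the log line above


def prune_old_outputs(root: Path, max_age_days: int = MAX_AGE_DAYS,
                      dry_run: bool = True, now: Optional[float] = None) -> list[Path]:
    """Remove (or, with dry_run=True, only list) subdirectories of root
    whose modification time is older than max_age_days."""
    cutoff = (time.time() if now is None else now) - max_age_days * 86400
    stale = sorted(p for p in root.iterdir()
                   if p.is_dir() and p.stat().st_mtime < cutoff)
    if not dry_run:
        for path in stale:
            shutil.rmtree(path)
    return stale


# Demo on a throwaway directory: one fresh subdirectory, one backdated 50 days.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "fresh").mkdir()
    (root / "old").mkdir()
    fifty_days_ago = time.time() - 50 * 86400
    os.utime(root / "old", (fifty_days_ago, fifty_days_ago))
    removed = [p.name for p in prune_old_outputs(root, dry_run=False)]
    remaining = sorted(p.name for p in root.iterdir())

print(removed, remaining)
```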
[13:33:28] <elukey>	 FYI to everybody
[13:33:41] <elukey>	 hope that I didn't delete something important
[13:34:49] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1304 is CRITICAL: Connection timed out
[13:34:59] <elukey>	 just silenced it
[13:35:41] <elukey>	 ah and also the jobrunners don't need to be in mediawiki-installation
[13:36:22] <hashar>	 !log CI is slowed down due to surge of jobs and lack of instances to build them on ( T133911 ). Queue is 50 for Jessie and 25 for Trusty.
[13:36:23] <stashbot>	 T133911: Bump quota of Nodepool instances (contintcloud tenant) - https://phabricator.wikimedia.org/T133911
[13:36:27] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:36:56] <jynus>	 no, only the service/cron/whatever has to be active
[13:37:24] <grrrit-wm>	 (CR) Elukey: "The change looks really good, thanks again!" [puppet] - https://gerrit.wikimedia.org/r/295123 (https://phabricator.wikimedia.org/T137422) (owner: Nicko)
[13:38:16] <elukey>	 jynus: /me learning, thanks!
[13:39:00] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: 2] ferm: Kill INTERNAL_V4/INTERNAL_V6 definitions [puppet] - https://gerrit.wikimedia.org/r/295332 (owner: Alexandros Kosiaris)
[13:39:04] <grrrit-wm>	 (PS2) Alexandros Kosiaris: ferm: Kill INTERNAL_V4/INTERNAL_V6 definitions [puppet] - https://gerrit.wikimedia.org/r/295332
[13:39:15] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: 2 V: 2] ferm: Kill INTERNAL_V4/INTERNAL_V6 definitions [puppet] - https://gerrit.wikimedia.org/r/295332 (owner: Alexandros Kosiaris)
[13:39:17] <jynus>	 but they have to be on the "updatable group" (dsh), which I cannot find now
[13:40:34] <elukey>	 jynus: mw1001.eqiad.wmnet is a jobrunner and it is in mediawiki-installation (DSH), and icinga is complaining about 130[01] not being in there 
[13:41:14] <jynus>	 if mediawiki-installation is dsh, then yes, it must be there
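This exchange explains the "mediawiki-installation DSH group" alerts on mw1300/mw1301. A host that serves production traffic must also be listed in the group that deploys go to, or it silently misses syncs. The sketch below assumes the group file is plain text with one host per line and `#` comments. The file contents and function names are hypothetical; this is not the actual Icinga plugin.

```python
# Nagios/Icinga plugin exit codes.
OK, CRITICAL = 0, 2

# Hypothetical excerpt of a dsh group file: one host per line, '#' comments.
GROUP_FILE = """\
# mediawiki-installation (illustrative excerpt)
mw1001.eqiad.wmnet
mw1300.eqiad.wmnet
"""


def short_name(host: str) -> str:
    """'mw1300.eqiad.wmnet' -> 'mw1300', so FQDNs and short names compare equal."""
    return host.split(".", 1)[0]


def parse_dsh_group(text: str) -> set[str]:
    """Return the short host names listed in a dsh group file."""
    hosts = set()
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if entry:
            hosts.add(short_name(entry))
    return hosts


def check_dsh_membership(host: str, group: str, group_text: str) -> tuple[int, str]:
    """Return (exit code, message), mirroring the wording of the Icinga alert."""
    if short_name(host) in parse_dsh_group(group_text):
        return OK, "OK"
    return CRITICAL, f"Host {host} is not in {group} dsh group"


print(check_dsh_membership("mw1301", "mediawiki-installation", GROUP_FILE))
```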
[13:41:28] <elukey>	 super
[13:41:33] <jynus>	 sorry, I got confused with the etcd pooling config
[13:41:41] <elukey>	 going to reboot mw1304 and then I'll add the last 3
[13:41:59] <jynus>	 I do not work with that very often, I am learning too
[13:43:40] <grrrit-wm>	 (PS1) Elukey: Add mw130[01] to the mediawiki DSH scap list (new jobrunners) [puppet] - https://gerrit.wikimedia.org/r/295690
[13:43:45] <jynus>	 to be fair, part of the dsh config is in hiera and part is in modules/scap, not exactly straightforward
[13:46:04] <grrrit-wm>	 (CR) Elukey: [C: 2 V: 2] Add mw130[01] to the mediawiki DSH scap list (new jobrunners) [puppet] - https://gerrit.wikimedia.org/r/295690 (owner: Elukey)
[13:46:34] <grrrit-wm>	 (PS5) Alexandros Kosiaris: networks::constants: use slice_network_constants [puppet] - https://gerrit.wikimedia.org/r/291819
[13:49:25] <wikibugs>	 Operations, Traffic, Patch-For-Review: Investigate TCP Fast Open for tlsproxy - https://phabricator.wikimedia.org/T108827#2402049 (ema) The initial portion of the 3WHS can be used to check whether a remote TCP server supports TFO. For example, with [[https://github.com/secdev/scapy/ | scapy]]:  ``` f...
[13:50:39] <icinga-wm>	 RECOVERY - puppet last run on mw2168 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[13:50:40] <icinga-wm>	 RECOVERY - puppet last run on mw2249 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[13:53:09] <hashar>	 !log Zuul/CI are slowly catching up. I had to drop a few changes that got force merged on the SmashPig repo.
[13:53:09] <hashar>	 should be all fine
[13:53:13] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:53:20] <hashar>	 have to head to dentist  be back later in the evening
[13:58:06] <wikibugs>	 Blocked-on-Operations, Operations, DBA, Labs, and 2 others: adywiki and jamwiki are missing the associated *_p databases with appropriate views - https://phabricator.wikimedia.org/T135029#2402068 (Krenair) a:Jdforrester-WMF>ori
[14:05:12] <ragesock>	 _joe_: here's the ORES query that redirects to http: https://ores.wmflabs.org/v2/scores/enwiki/wp10/642215410?features
[14:05:22] <wikibugs>	 Blocked-on-Operations, Operations, Wikidata, Wikimedia-Language-setup, and 2 others: Create Wikipedia Jamaican - https://phabricator.wikimedia.org/T134017#2402093 (Krenair) stalled>Resolved a:Krenair Yes, this appears to be complete now.
[14:05:46] <ragesock>	 _joe_: the lack of a slash before the param is what triggers that behavior.
[14:06:45] <halfak>	 ragesock, I'm not getting the HTTP redirection. 
[14:07:20] <halfak>	 Oh wait... it appears I am
[14:07:21] <ragesock>	 halfak: do "curl "https://ores.wmflabs.org/v2/scores/enwiki/wp10/642215410?features" -v"
[14:07:22] <halfak>	 Woah
[14:08:34] <halfak>	 So, it looks like you make a request to https, get a 301 for http and then get a 301 for https.
[14:09:09] <halfak>	 Then the https 200 OK's
[14:09:41] <icinga-wm>	 PROBLEM - puppet last run on ms-be2027 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:10:43] <ragesock>	 halfak: yeah. weird, huh?
[14:10:57] <halfak>	 So, I think I know what this is. 
[14:11:33] <halfak>	 The web nodes know to redirect .../<rev_id> to .../<rev_id>/, but they get the request forwarded via http
[14:11:43] <halfak>	 So they provide an http redirect. 
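halfak's diagnosis describes a common reverse-proxy pitfall. TLS ends at the proxy, so the web node sees a plain-HTTP request. Any absolute URL it builds for the `/<rev_id>` to `/<rev_id>/` redirect therefore gets the `http` scheme. The sketch below assumes the proxy sets `X-Forwarded-Proto` and that the node only honours it when told to trust the proxy. `trailing_slash_redirect` is a hypothetical helper, not ORES code.

```python
from typing import Mapping
from urllib.parse import urlunsplit


def trailing_slash_redirect(scheme: str, host: str, path: str, query: str,
                            headers: Mapping[str, str],
                            trust_forwarded_proto: bool) -> str:
    """Build the absolute Location for a '/<rev_id>' -> '/<rev_id>/' redirect.

    `scheme` is what the web node itself saw (plain "http" behind the proxy).
    Only when the proxy is trusted does X-Forwarded-Proto override it.
    """
    if trust_forwarded_proto:
        scheme = headers.get("X-Forwarded-Proto", scheme)
    return urlunsplit((scheme, host, path + "/", query, ""))


# Request as seen by the web node after TLS was terminated upstream.
seen = dict(scheme="http", host="ores.wmflabs.org",
            path="/v2/scores/enwiki/wp10/642215410", query="features",
            headers={"X-Forwarded-Proto": "https"})

print(trailing_slash_redirect(**seen, trust_forwarded_proto=False))  # the bug
print(trailing_slash_redirect(**seen, trust_forwarded_proto=True))   # the fix
```

For a Werkzeug-based WSGI app, wrapping it in `werkzeug.middleware.proxy_fix.ProxyFix(app, x_proto=1)` is one existing way to get the second behaviour. Emitting a relative `Location` header would avoid baking a scheme into the redirect at all.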
[14:12:36] <wikibugs>	 Operations, ops-eqiad, media-storage: rack/setup/deploy ms-be102[2-7] - https://phabricator.wikimedia.org/T136631#2402101 (fgiunchedi) thanks @Cmjohnson !  I was checking again the allocation and there's a correction: row A isn't needed. Please go with 2x machines in each of B/C/D. wrt 10G vs 1G let'...
[14:12:41] <icinga-wm>	 PROBLEM - puppet last run on cp4011 is CRITICAL: CRITICAL: puppet fail
[14:13:53] <wikibugs>	 Operations, ops-eqiad, media-storage: rack/setup/deploy ms-be102[2-7] - https://phabricator.wikimedia.org/T136631#2402105 (Cmjohnson) @fgiunchedi   That will be  2 each in rows A/C/D for 10G.
[14:14:22] <icinga-wm>	 RECOVERY - puppet last run on ms-be2027 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[14:14:23] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw1301 is OK: OK
[14:14:23] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw1300 is OK: OK
[14:17:55] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: 1] "PCC is happy at https://puppet-compiler.wmflabs.org/3174/carbon.wikimedia.org/" [puppet] - https://gerrit.wikimedia.org/r/291819 (owner: Alexandros Kosiaris)
[14:18:27] <wikibugs>	 Operations, Analytics-Kanban, Traffic, Patch-For-Review: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2402115 (elukey) Finally got the root cause of the VSL timeouts after a chat with Varnish devs.  The Varnish workers use a buffer to...
[14:18:40] <wikibugs>	 Operations, ops-eqiad, media-storage: rack/setup/deploy ms-be102[2-7] - https://phabricator.wikimedia.org/T136631#2402117 (fgiunchedi) @Cmjohnson ok! let's stick with B/C/D for rows and 10G for C/D and 1G for B
[14:26:37] <wikibugs>	 Operations, media-storage, Tracking: expand swift hardware in codfw/eqiad (tracking) - https://phabricator.wikimedia.org/T130012#2402130 (fgiunchedi)
[14:27:07] <thedj>	 etherpad.wm.org seems down ?
[14:27:12] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[14:28:06] <jynus>	 it seems sometimes up, sometimes down
[14:28:28] <jynus>	 I was doing some maintenance on its passive slave
[14:28:34] <grrrit-wm>	 (PS1) KartikMistry: apertium-eu-en: Rebuild for Jessie and cleanup [debs/contenttranslation/apertium-eu-en] - https://gerrit.wikimedia.org/r/295696 (https://phabricator.wikimedia.org/T107306)
[14:29:06] <jynus>	 I have now stopped it, but the error keeps happening
[14:29:17] <jynus>	 I will restart the service
[14:29:19] <wikibugs>	 Operations, ContentTranslation-Deployments, ContentTranslation-cxserver, MediaWiki-extensions-ContentTranslation, and 4 others: Package and test apertium for Jessie - https://phabricator.wikimedia.org/T107306#2402140 (KartikMistry)
[14:29:26] <jynus>	 well, check the errors first
[14:29:51] <icinga-wm>	 PROBLEM - Misc HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
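The two 5xx alerts are phrased as the share of recent Graphite datapoints above a fixed 1000 reqs/min threshold. The 11.11% and 20.00% figures are consistent with, for example, one spike among nine points and one among five. The sketch below assumes that null datapoints are ignored and that the alert fires above some percentage. The 10% cut-off, the sample series, and `check_5xx` are hypothetical; this is not the production Icinga check.

```python
from typing import Optional, Sequence


def percent_above(datapoints: Sequence[Optional[float]], threshold: float) -> float:
    """Percentage of non-null datapoints strictly above threshold."""
    values = [v for v in datapoints if v is not None]
    if not values:
        return 0.0
    above = sum(1 for v in values if v > threshold)
    return 100.0 * above / len(values)


def check_5xx(datapoints: Sequence[Optional[float]], threshold: float,
              max_percent: float) -> tuple[str, str]:
    """CRITICAL when more than max_percent of the points exceed threshold."""
    pct = percent_above(datapoints, threshold)
    state = "CRITICAL" if pct > max_percent else "OK"
    return state, f"{pct:.2f}% of data above the critical threshold [{threshold}]"


# Hypothetical 5xx reqs/min series: one spike among nine points, then one among five.
print(check_5xx([120.0] * 8 + [1450.0], threshold=1000.0, max_percent=10.0))
print(check_5xx([120.0] * 4 + [1450.0], threshold=1000.0, max_percent=10.0))
```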
[14:30:17] <jynus>	 "console - TypeError: Cannot set property 'timestamp' of null"
[14:30:53] <jynus>	 it is flopping
[14:31:18] <jynus>	 I am going to restart it, it is better than the current state
[14:32:06] <jynus>	 !log restarting etherpad-lite.service
[14:32:11] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:32:35] <jynus>	 is it better now?
[14:33:11] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1304 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.002 second response time
[14:34:12] <jynus>	 akosiaris, see log
[14:34:17] <akosiaris>	 looking
[14:34:19] <jynus>	 it is flopping up and down
[14:34:25] <akosiaris>	 yeah, it's crashing
[14:34:26] <jynus>	 I tried restarting already
[14:34:39] <jynus>	 is it usually a single pad or is this new?
[14:34:59] <akosiaris>	 I think it's a single pad
[14:35:03] <akosiaris>	 lemme delete and see
[14:35:30] <jynus>	 something seems slightly better
[14:35:47] <jynus>	 maybe it is a couple?
[14:35:58] <akosiaris>	 no, seems like more than one pad
[14:36:05] <akosiaris>	 there is no pattern
[14:36:10] <jynus>	 ah
[14:36:20] <grrrit-wm>	 (PS1) KartikMistry: apertium-eu-en: Rebuild for Jessie and cleanup [debs/contenttranslation/apertium-eu-es] - https://gerrit.wikimedia.org/r/295697 (https://phabricator.wikimedia.org/T107306)
[14:36:22] <icinga-wm>	 RECOVERY - puppet last run on cp4011 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[14:37:10] <wikibugs>	 Operations, ContentTranslation-Deployments, ContentTranslation-cxserver, MediaWiki-extensions-ContentTranslation, and 4 others: Package and test apertium for Jessie - https://phabricator.wikimedia.org/T107306#2402181 (KartikMistry)
[14:37:34] <jynus>	 should I prepare the backup?
[14:38:18] <akosiaris>	 !log stopping etherpad-lite on etherpad1001, disabling puppet
[14:38:22] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:38:32] <akosiaris>	 jynus: hmmm....
[14:38:35] <jynus>	 it could also be a connection overload due to wikimania
[14:38:41] <akosiaris>	 how much effort is that ?
[14:38:46] <jynus>	 I am preparing the backup just in case
[14:39:00] <jynus>	 effort, not much, but it will take some time
[14:39:12] <jynus>	 I will be doing it anyway
[14:39:18] <akosiaris>	 jynus: https://grafana.wikimedia.org/dashboard/db/etherpad
[14:39:19] <jynus>	 on a separate db
[14:39:24] <akosiaris>	 users are not so many however
[14:40:09] <wikibugs>	 Operations, media-storage: bring swift eqiad to one zone per row - https://phabricator.wikimedia.org/T138496#2402183 (fgiunchedi)
[14:40:20] <wikibugs>	 Operations, media-storage: bring swift eqiad to one zone per row - https://phabricator.wikimedia.org/T138496#2402198 (fgiunchedi) p:Triage>Normal
[14:41:27] <jynus>	 let's do something
[14:41:30] <akosiaris>	 so, blocking all access and accessing a known pad does not crash it
[14:41:32] <jynus>	 lets recover the service
[14:41:44] <jynus>	 to test the theroy
[14:41:52] <jynus>	 by creating a blank DB
[14:41:59] <akosiaris>	 consider it tested
[14:42:01] <jynus>	 *to test my theory
[14:42:15] <akosiaris>	 I 've just blocked all access via ferm to etherpad
[14:42:22] <akosiaris>	 and access the SoS pad via SSH tunnel
[14:42:27] <akosiaris>	 it does not crash the service
[14:42:34] <jynus>	 ok, I will rename the table and create a new one
[14:42:40] <akosiaris>	 so, it's either something in the DB or something else
[14:42:58] <jynus>	 do you know if first install need some things already on the db?
[14:43:27] <akosiaris>	 no you don't
[14:43:57] <akosiaris>	 actually it will create everything on its own
[14:44:07] <jynus>	 ok, start the service now
[14:44:15] <jynus>	 it has a black "db"
[14:44:18] <jynus>	 *blank
[14:44:32] <akosiaris>	 seems like it's working
[14:44:36] <akosiaris>	 but of course no data
[14:44:41] <akosiaris>	 but that's expected
[14:44:42] <jynus>	 no prob
[14:44:47] <akosiaris>	 so, it's the DB that's problematic
[14:44:48] <jynus>	 I will recover now the data
[14:45:10] <thedj>	 i guess someone pasted something that it didnt like :)
[14:45:14] <jynus>	 can you add like a message?
[14:45:36] <jynus>	 modify maybe the default message?
[14:45:37] <akosiaris>	 !log debugging etherpad. Started the service with a blank db, looks like it's working
[14:45:43] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:45:57] <akosiaris>	 jynus: er, yes
[14:46:03] <jynus>	 because even if I recover, the one in the current one will be lost
[14:46:10] <jynus>	 I can recover, but probably not merge
[14:46:12] <icinga-wm>	 RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[14:47:11] <mobrovac>	 !log change-prop deploying 05c72ed24ca
[14:47:16] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:47:42] <akosiaris>	 !log change the default message in etherpad to indicate problems
[14:47:47] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:48:17] <akosiaris>	 indeed I see quite a few people from wikimania
[14:48:19] <akosiaris>	 at least I think
[14:48:22] <jynus>	 sorry I am such an ass
[14:48:25] <jynus>	 can you add
[14:48:37] <jynus>	 "backup anything you add here as it will be deleted"
[14:48:43] <icinga-wm>	 RECOVERY - Misc HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[14:49:13] <jynus>	 sorry for that O:-(
[14:49:36] <jynus>	 or if you just tell me where that is, I can do it :-)
[14:49:37] <akosiaris>	 done
[14:50:08] <jynus>	 thank you, will take it from here
[14:51:11] <akosiaris>	 seems like people are temporarily backing off already
[14:51:20] <jynus>	 :-/
[14:51:56] <jynus>	 to be fair, it is not like we have a proper HA setup, or that that is needed
[14:52:18] <akosiaris>	 if it's db corruption indeed, that would not have helped much
[14:52:26] <jynus>	 true
[14:52:45] <jynus>	 also, we have preciselly old m1 down
[14:52:48] <jynus>	 *have
[14:54:21] <wikibugs>	 Blocked-on-Operations, DBA, Labs, Labs-Infrastructure, Patch-For-Review: maintain-replicas.pl unmaintained, unmaintainable - https://phabricator.wikimedia.org/T138450#2402245 (AlexMonk-WMF) >>! In T138450#2401809, @jcrespo wrote: >>> It blows up and rebuilds all wikis on every run. >>It trunc...
[14:54:27] <akosiaris>	 jynus: have an ETA by any chance ?
[14:54:34] <akosiaris>	 or is there something I should be doing ?
[14:54:40] * akosiaris feels itchy
[14:55:03] * _joe_ scratches akosiaris 
[14:55:32] <akosiaris>	 so, people have actually stopped trying to access etherpad
[14:55:50] <akosiaris>	 a few here and there but not the usual rate obviously
[14:55:52] <wikibugs>	 Blocked-on-Operations, DBA, Labs, Labs-Infrastructure, Patch-For-Review: maintain-replicas.pl unmaintained, unmaintainable - https://phabricator.wikimedia.org/T138450#2402249 (jcrespo) > Who in the ops group could be its 'sole owner'? No one else has any access to these systems.  Maybe labs a...
[15:00:04] <jouncebot>	 anomie, ostriches, thcipriani, marktraceur, and aude: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160623T1500). Please do the needful.
[15:00:04] <jouncebot>	 kart_: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[15:01:28] <thcipriani>	 I can SWAT, kart_ ping me when you're around for SWAT
[15:01:46] <kart_>	 thcipriani: around.
[15:01:48] <jynus>	 akosiaris, I said it was not going to be fast :-(, I am on it
[15:02:03] <kart_>	 thcipriani: usual deploy on test host first as dblist is new this time.
[15:02:05] <akosiaris>	 jynus: yeah understood
[15:02:08] <thcipriani>	 kart_: ack
[15:02:22] <akosiaris>	 jynus: are you restoring a different table/db ?
[15:02:47] <jynus>	 I am not restoring yet, I am still searthing for the table
[15:02:51] <jynus>	 but I will
[15:03:09] <akosiaris>	 I am thinking I should rename the table back and try to make some more sense from the issue
[15:03:19] <akosiaris>	 should I ?
[15:03:24] <jynus>	 no
[15:03:24] <grrrit-wm>	 (PS6) Thcipriani: Deploy Compact Language Links as default (Stage 2) [mediawiki-config] - https://gerrit.wikimedia.org/r/295454 (https://phabricator.wikimedia.org/T136677) (owner: KartikMistry)
[15:03:26] <akosiaris>	 or will this cause problems for you ?
[15:03:34] <jynus>	 do that on a separate instance
[15:04:05] <grrrit-wm>	 (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/295454 (https://phabricator.wikimedia.org/T136677) (owner: KartikMistry)
[15:04:14] <akosiaris>	 https://github.com/ether/etherpad-lite/issues/2946
[15:04:55] <grrrit-wm>	 (Merged) jenkins-bot: Deploy Compact Language Links as default (Stage 2) [mediawiki-config] - https://gerrit.wikimedia.org/r/295454 (https://phabricator.wikimedia.org/T136677) (owner: KartikMistry)
[15:05:44] <robh>	 !log starting data backup of labmon1001, halting statsite/graphite/carbon-relay on system
[15:05:48] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:06:20] <thcipriani>	 kart_: patch has been pulled to mw1017
[15:06:42] <kart_>	 thcipriani: testing..
[15:09:56] <grrrit-wm>	 (PS1) Elukey: Restore mc1007 memcached growth factor to 1.05 as the rest of the cluster. [puppet] - https://gerrit.wikimedia.org/r/295702 (https://phabricator.wikimedia.org/T129963)
[15:10:40] <wikibugs>	 Operations, Discovery, Maps: Ensure Maps servers can be installed easily (automation + documentation) - https://phabricator.wikimedia.org/T138501#2402296 (Gehel)
[15:11:00] <robh>	 !log puppet disabled on labmon1001 along with all icinga alerting.  data migration to usb in progress via root screen session
[15:11:05] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:11:15] <kart_>	 thcipriani: still some moar testing, 2-3 minutes please.
[15:11:26] <thcipriani>	 kart_: kk, np
[15:11:40] <chasemp>	 robh: labmon goes well?
[15:12:38] <kart_>	 thcipriani: looks good, go ahead.
[15:12:42] <robh>	 chasemp: so far so good, data copy in progress with --ignore-existing to try to cut down on cruft
[15:12:49] <thcipriani>	 kart_: ack
[15:13:18] <robh>	 and the various services (statsite/graphite/carbon-relay) are stopped
[15:13:35] <robh>	 but only at 3% of copy so it may still take a long time =[
[15:15:29] <logmsgbot>	 !log thcipriani@tin Synchronized dblists/clldefault.dblist: SWAT: [[gerrit:295454|Deploy Compact Language Links as default (Stage 2)]] PART I (duration: 00m 41s)
[15:15:34] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:16:20] <logmsgbot>	 !log thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:295454|Deploy Compact Language Links as default (Stage 2)]] PART II (duration: 00m 28s)
[15:16:25] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:16:43] <grrrit-wm>	 (PS1) Yurik: Prevent geoshape service use by production [puppet] - https://gerrit.wikimedia.org/r/295703
[15:16:48] <yurik>	 gehel, ^
[15:16:51] <logmsgbot>	 !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:295454|Deploy Compact Language Links as default (Stage 2)]] PART III (duration: 00m 24s)
[15:16:55] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:16:58] <thcipriani>	 ^ kart_ check please
[15:18:07] <kart_>	 thcipriani: testing.
[15:18:41] <wikibugs>	 06Operations, 06Discovery, 06Maps, 03Maps-Sprint: Ensure Maps servers can be installed easily (automation + documentation) - https://phabricator.wikimedia.org/T138501#2402333 (10Yurik)
[15:18:52] <icinga-wm>	 PROBLEM - puppet last run on ms-be2013 is CRITICAL: CRITICAL: puppet fail
[15:20:48] <grrrit-wm>	 (CR) Thcipriani: [C: 1] Prepare scap3 deployment for WDQS [puppet] - https://gerrit.wikimedia.org/r/295437 (https://phabricator.wikimedia.org/T129144) (owner: Smalyshev)
[15:20:56] <wikibugs>	 Operations, Discovery, Maps, Epic: Epic: cultivating the Maps garden - https://phabricator.wikimedia.org/T137616#2402334 (Yurik)
[15:22:23] <kart_>	 thcipriani: nice. All well.
[15:22:38] <thcipriani>	 kart_: glad to hear it :)
[15:22:40] <kart_>	 and thanks!
[15:23:04] <grrrit-wm>	 (PS2) Yurik: Prevent geoshape service use by production [puppet] - https://gerrit.wikimedia.org/r/295703
[15:23:32] <icinga-wm>	 RECOVERY - puppet last run on ms-be2013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:24:19] <grrrit-wm>	 (CR) Elukey: "Puppet compiler looks good:" [puppet] - https://gerrit.wikimedia.org/r/295702 (https://phabricator.wikimedia.org/T129963) (owner: Elukey)
[15:26:10] <akosiaris>	 !log stop etherpad-lite, etherpad is down
[15:26:15] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:26:58] <yurik>	 thcipriani, swating?
[15:27:10] <thcipriani>	 yurik: finished
[15:27:26] <yurik>	 thcipriani, bummer, i forgot to add a small labs-only patch
[15:27:37] <yurik>	 if you don't mind, i will sync it now
[15:27:40] <thcipriani>	 yurik: oh, if you need a patch merged, there's still time
[15:27:46] <grrrit-wm>	 (PS1) Elukey: Add mw1303 to the scap MW DSH list. [puppet] - https://gerrit.wikimedia.org/r/295706
[15:27:50] <yurik>	 or you can do it :)
[15:27:56] <thcipriani>	 which patch?
[15:28:13] <yurik>	 https://gerrit.wikimedia.org/r/#/c/295580/1/wmf-config/CommonSettings-labs.php
[15:28:15] <yurik>	 thcipriani, ^
[15:28:36] <grrrit-wm>	 (CR) Elukey: [C: 2 V: 2] Add mw1303 to the scap MW DSH list. [puppet] - https://gerrit.wikimedia.org/r/295706 (owner: Elukey)
[15:28:46] <grrrit-wm>	 (PS2) Thcipriani: LABS: Enable geoshapes graph protocol [mediawiki-config] - https://gerrit.wikimedia.org/r/295580 (https://phabricator.wikimedia.org/T138192) (owner: Yurik)
[15:28:53] <grrrit-wm>	 (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/295580 (https://phabricator.wikimedia.org/T138192) (owner: Yurik)
[15:29:10] <yurik>	 thx!
[15:29:18] <thcipriani>	 :D
[15:29:34] <grrrit-wm>	 (Merged) jenkins-bot: LABS: Enable geoshapes graph protocol [mediawiki-config] - https://gerrit.wikimedia.org/r/295580 (https://phabricator.wikimedia.org/T138192) (owner: Yurik)
[15:30:58] <logmsgbot>	 !log thcipriani@tin Synchronized wmf-config/CommonSettings-labs.php: SWAT: [[gerrit:295580|LABS: Enable geoshapes graph protocol]] (duration: 00m 29s)
[15:31:03] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:32:33] <marktraceur>	 akosiaris: What's up with EPL?
[15:32:59] <akosiaris>	 marktraceur: EPL ? 
[15:33:13] <marktraceur>	 akosiaris: Etherpad Lite
[15:33:19] <akosiaris>	 a etherpad lite ? it's not a great piece of software, what else ? 
[15:33:34] <marktraceur>	 akosiaris: Just wondering, I saw you logged it went down
[15:33:47] <akosiaris>	 yeah, it crashes constantly
[15:33:56] <akosiaris>	 marktraceur: https://github.com/ether/etherpad-lite/issues/2946
[15:33:59] <akosiaris>	 is the upstream issue
[15:34:07] <akosiaris>	 still trying to figure out what is going on
[15:34:18] <HaeB>	 marktraceur: see also https://lists.wikimedia.org/pipermail/wikimania-l/2016-June/007570.html
[15:34:19] <marktraceur>	 Ah.
[15:34:19] <akosiaris>	 nothing conclusive yet, aside from what you see in that ticket
[15:34:39] <marktraceur>	 K, yeah, I was worried it would affect the hackathon at WM
[15:37:06] <wikibugs>	 06Operations, 06Performance-Team, 13Patch-For-Review: Update memcached package and configuration options - https://phabricator.wikimedia.org/T129963#2402418 (10elukey) I have been very slow to follow up on this task due to other priorities, I'll add a summary very soon for all my findings. gerrit/295702 is a...
[15:41:18] <wikibugs>	 06Operations, 10ops-eqiad, 06DC-Ops: mw1302.eqiad.wmnet issues while booting - https://phabricator.wikimedia.org/T138485#2402421 (10elukey) Also installing mw1304 leads to:  Loading Linux 4.4.0-1-amd64 ... Loading initial ramdisk ...  Tried to hard reboot, nothing. Not sure where it gets stuck into..
[15:41:42] <grrrit-wm>	 (03PS1) 10RobH: setting up temp spare host for labmon1001 data migrations [dns] - 10https://gerrit.wikimedia.org/r/295711 
[15:41:53] <icinga-wm>	 PROBLEM - puppet last run on mw2206 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:44:53] <grrrit-wm>	 (PS1) RobH: setting WMF4724 install params [puppet] - https://gerrit.wikimedia.org/r/295718
[15:45:20] <grrrit-wm>	 (CR) RobH: [C: 2] setting up temp spare host for labmon1001 data migrations [dns] - https://gerrit.wikimedia.org/r/295711 (owner: RobH)
[15:49:14] <grrrit-wm>	 (CR) RobH: [C: 2 V: 2] setting WMF4724 install params [puppet] - https://gerrit.wikimedia.org/r/295718 (owner: RobH)
[15:50:42] <icinga-wm>	 PROBLEM - puppet last run on etherpad1001 is CRITICAL: Timeout while attempting connection
[15:51:55] <grrrit-wm>	 (CR) Gehel: Prevent geoshape service use by production (1 comment) [puppet] - https://gerrit.wikimedia.org/r/295703 (owner: Yurik)
[15:52:48] <grrrit-wm>	 (CR) Gehel: Prevent geoshape service use by production (1 comment) [puppet] - https://gerrit.wikimedia.org/r/295703 (owner: Yurik)
[16:00:04] <jouncebot>	 godog, moritzm, and _joe_: Respected human, time to deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160623T1600). Please do the needful.
[16:00:52] <icinga-wm>	 PROBLEM - etherpad.wikimedia.org HTTP on etherpad1001 is CRITICAL: Connection refused
[16:01:37] <godog>	 no puppet swat patches afaics
[16:01:52] <icinga-wm>	 PROBLEM - etherpad_lite_process_running on etherpad1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/node /usr/share/etherpad-lite/node_modules/ep_etherpad-lite/node/server.js
[16:03:42] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[16:06:02] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5153362 keys - replication_delay is 0
[16:07:22] <icinga-wm>	 RECOVERY - puppet last run on mw2206 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[16:07:54] <wikibugs>	 Blocked-on-Operations, DBA, Labs, Labs-Infrastructure, Patch-For-Review: maintain-replicas.pl unmaintained, unmaintainable - https://phabricator.wikimedia.org/T138450#2402468 (chasemp) I'm 100% on board for being on the hook for this process, or at least being a partner.  We can coparent :)...
[16:09:49] <wikibugs>	 Blocked-on-Operations, DBA, Labs, Labs-Infrastructure, Patch-For-Review: maintain-replicas.pl unmaintained, unmaintainable - https://phabricator.wikimedia.org/T138450#2402474 (chasemp) p:Triage>High
[16:10:33] <grrrit-wm>	 (PS1) BBlack: r::c::perf: move all commentary inline [puppet] - https://gerrit.wikimedia.org/r/295722
[16:10:35] <grrrit-wm>	 (PS1) BBlack: r::c::perf: enable tcp metrics saving [puppet] - https://gerrit.wikimedia.org/r/295723
[16:10:38] <grrrit-wm>	 (PS1) BBlack: cache roles: add tcpmhash_entries=64K to kernel cmdline [puppet] - https://gerrit.wikimedia.org/r/295724
[16:11:12] <icinga-wm>	 RECOVERY - etherpad_lite_process_running on etherpad1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/node /usr/share/etherpad-lite/node_modules/ep_etherpad-lite/node/server.js
[16:12:05] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] r::c::perf: move all commentary inline [puppet] - https://gerrit.wikimedia.org/r/295722 (owner: BBlack)
[16:12:13] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] r::c::perf: enable tcp metrics saving [puppet] - https://gerrit.wikimedia.org/r/295723 (owner: BBlack)
[16:12:31] <icinga-wm>	 RECOVERY - etherpad.wikimedia.org HTTP on etherpad1001 is OK: HTTP OK: HTTP/1.1 200 OK - 7928 bytes in 0.017 second response time
[16:12:53] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] cache roles: add tcpmhash_entries=64K to kernel cmdline [puppet] - https://gerrit.wikimedia.org/r/295724 (owner: BBlack)
[16:14:00] <bblack>	 16:11:22 Looking potential typos from '/typos' file
[16:14:00] <bblack>	 16:11:30 ./modules/install_server/files/dhcpd/linux-host-entries.ttyS1-115200:    fixed-address WMF4724.eqiad.wmnet;
[16:14:02] <bblack>	 16:11:30 Typos found!
[16:14:13] <bblack>	 ^ jenkins is -1 on new puppet commits for unrelated things again...
[16:15:11] <Krenair>	 yeah, https://gerrit.wikimedia.org/r/#/c/295718/
[16:15:36] <bblack>	 yeah, the typos file is wrong in this case
[16:16:00] <bblack>	 it doesn't like WMF4724.eqiad.wmnet because it expects anything[0-9]{4}.eqiad to start the number with 1
[16:16:28] <Krenair>	 robh
[16:16:31] <Krenair>	 it was merged before jenkins could vote
[16:16:43] <bblack>	 either way, jenkins' vote is faulty
[16:16:47] <robh>	 ahh
[16:16:52] <robh>	 did i break something?
[16:17:14] <bblack>	 robh: yeah your V+2 overrode what would've been a jenkins -1, which now applies to all future commits until it's fixed :P
[16:17:21] <grrrit-wm>	 (CR) Gehel: "Yep, that error looks weird, but also appears on the production catalogue, a clear indication that it is not related to this change. Still" [puppet] - https://gerrit.wikimedia.org/r/295123 (https://phabricator.wikimedia.org/T137422) (owner: Nicko)
[16:17:30] <robh>	 bblack: wait, all future commits of other folks?
[16:17:43] <bblack>	 yes, because the merged state of the repo fails validation checks
[16:17:49] <robh>	 fuck me sorry =[
[16:18:23] <robh>	 so the ideal way for me to fix is just make a single fix patch independently of the original?
[16:18:26] <bblack>	 but the fix really isn't in your commit, the "typos" validation check is in error (it's wrongly not liking your change)
[16:18:33] <godog>	 !log swift: add ms-be202[234] weight 1000 - T136630
[16:18:34] <stashbot>	 T136630: rack/setup/deploy ms-be202[2-7] - https://phabricator.wikimedia.org/T136630
[16:18:38] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:19:27] <robh>	 bblack: so my change was good, but the check was bad.  my not waiting for the check to fail has put the repo into a bad state of always validating 
[16:19:28] <robh>	 ?
[16:19:41] <robh>	 (due to my forcing it through rather than wait)
[16:19:50] <robh>	 argh, i never force it through and just did today ;_;
[16:20:42] <icinga-wm>	 RECOVERY - puppet last run on etherpad1001 is OK: OK: Puppet is currently enabled, last run 1 hour ago with 0 failures
[16:21:03] <robh>	 So what is the fix (i imagine someone is already doing it now, and i dont intend to skip validation again anytime soon cuz this.)
[16:21:10] <robh>	 but wanna know the fix anyhow =]
[16:21:15] <bblack>	 I'm trying to figure out a fix, regexes are hards
[16:21:27] <robh>	 sorry to break stuff =[
[16:22:00] <robh>	 in particular sorry to break stuff and then force you to deal with (they arent really very) regular expressions.
[16:24:13] <wikibugs>	 06Operations, 10Gerrit, 06Release-Engineering-Team, 06WMF-Legal, and 2 others: Gerrit seemingly violates data retention guidelines - https://phabricator.wikimedia.org/T114395#2402519 (10chasemp) 05Open>03Resolved there were still some files older than 90 there   `-rw-r-----  1 root adm   39441687 Feb 2...
[16:26:29] <grrrit-wm>	 (03PS2) 10Gehel: Remove old maps-test servers from LVS config [puppet] - 10https://gerrit.wikimedia.org/r/295640 
[16:26:59] <bblack>	 where does the operations-puppet-typos check get defined?
[16:27:18] <bblack>	 I'm trying to figure out if whatever regex engine is going to support negative lookbehind or not
[16:27:36] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Remove old maps-test servers from LVS config [puppet] - https://gerrit.wikimedia.org/r/295640 (owner: Gehel)
[16:27:54] <chasemp>	 !log remove old log files on ytterbium for T114395
[16:27:55] <stashbot>	 T114395: Gerrit seemingly violates data retention guidelines - https://phabricator.wikimedia.org/T114395
[16:27:58] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:28:12] <icinga-wm>	 PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 2 failures
[16:29:55] <bblack>	 well, worst case I break the typo check and either it still -1's everything, or it fails to check some typos and someone else has to fix it later
[16:31:24] <grrrit-wm>	 (PS1) BBlack: exclude WMFNNNN.$dcname.wmnet from hostname typos [puppet] - https://gerrit.wikimedia.org/r/295727
[16:32:01] <bblack>	 I wonder if updates to the typos file applied to jenkins check of the same change
[16:32:47] <grrrit-wm>	 (CR) BBlack: [C: 2] exclude WMFNNNN.$dcname.wmnet from hostname typos [puppet] - https://gerrit.wikimedia.org/r/295727 (owner: BBlack)
[16:32:58] <bblack>	 apparently they do!
[16:33:13] <bblack>	 so either I broke the NNNN.$dcname typo checks completely, or I fixed them to exclude WMF, one of the two :)
[16:33:30] <grrrit-wm>	 (PS2) BBlack: r::c::perf: move all commentary inline [puppet] - https://gerrit.wikimedia.org/r/295722
[16:33:32] <grrrit-wm>	 (PS2) BBlack: r::c::perf: enable tcp metrics saving [puppet] - https://gerrit.wikimedia.org/r/295723
[16:33:34] <grrrit-wm>	 (PS2) BBlack: cache roles: add tcpmhash_entries=64K to kernel cmdline [puppet] - https://gerrit.wikimedia.org/r/295724
[16:35:26] <grrrit-wm>	 (CR) BBlack: [C: 2] r::c::perf: move all commentary inline [puppet] - https://gerrit.wikimedia.org/r/295722 (owner: BBlack)
[16:35:41] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] cache roles: add tcpmhash_entries=64K to kernel cmdline [puppet] - https://gerrit.wikimedia.org/r/295724 (owner: BBlack)
[16:36:00] <bblack>	 I think that -1 is probably legitimate :)
[16:36:36] <grrrit-wm>	 (PS3) BBlack: cache roles: add tcpmhash_entries=64K to kernel cmdline [puppet] - https://gerrit.wikimedia.org/r/295724
[16:44:35] <wikibugs>	 Operations, Discovery-Search-Sprint: Followup on elastic1026 blowing up May 9, 21:43-22:14 UTC - https://phabricator.wikimedia.org/T134829#2402581 (Dzahn) Open>Resolved a:Dzahn If it's on Done on a board, the status should also be resolved,   right?
[16:45:15] <grrrit-wm>	 (CR) Filippo Giunchedi: [C: 1] "LGTM, thanks Nicko for taking care of this!" [puppet] - https://gerrit.wikimedia.org/r/295123 (https://phabricator.wikimedia.org/T137422) (owner: Nicko)
[16:46:38] <wikibugs>	 Operations, Discovery-Search-Sprint: Followup on elastic1026 blowing up May 9, 21:43-22:14 UTC - https://phabricator.wikimedia.org/T134829#2402597 (Gehel) The usage in Discovery is to move tasks to Done on board and let our product owner have a final review and closing them.
[16:46:53] <icinga-wm>	 RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[16:47:45] <wikibugs>	 06Operations, 03Discovery-Search-Sprint: Followup on elastic1026 blowing up May 9, 21:43-22:14 UTC - https://phabricator.wikimedia.org/T134829#2402598 (10Dzahn) 05Resolved>03Open
[16:47:45] <elukey>	 jynus: I am currently fighting a bit with mw1304 (another jobrunner) that had problems with boot, now puppet is running.. Let me know if you see issues like before
[16:48:06] <wikibugs>	 06Operations, 03Discovery-Search-Sprint: Followup on elastic1026 blowing up May 9, 21:43-22:14 UTC - https://phabricator.wikimedia.org/T134829#2278919 (10Dzahn) a:05Dzahn>03None
[16:48:09] <grrrit-wm>	 (03PS4) 10EBernhardson: logstash: Update filters for sending to es 2.x [puppet] - 10https://gerrit.wikimedia.org/r/295578 (https://phabricator.wikimedia.org/T138335) 
[16:48:40] <jynus>	 elukey, busy with something else, maybe someone else can help you, if not ,please wait some time
[16:49:08] <elukey>	 yes sure! I meant to tell you that I am working on mw1304, that's it :)
[16:51:12] <wikibugs>	 06Operations, 10DBA, 10Wikimedia-Etherpad: etherpad database issues - https://phabricator.wikimedia.org/T138516#2402605 (10jcrespo)
[16:53:06] <wikibugs>	 06Operations, 10DBA, 10Wikimedia-Etherpad, 07User-notice: etherpad database issues - https://phabricator.wikimedia.org/T138516#2402631 (10jcrespo)
[16:54:45] <grrrit-wm>	 (03PS5) 10EBernhardson: logstash: Update filters for sending to es 2.x [puppet] - 10https://gerrit.wikimedia.org/r/295578 (https://phabricator.wikimedia.org/T138335) 
[16:55:29] <wikibugs>	 06Operations, 10ops-eqiad, 06DC-Ops: mw1302.eqiad.wmnet issues while booting - https://phabricator.wikimedia.org/T138485#2402641 (10elukey) Actually now, mw1304 looks weird only from the console, I managed to run puppet using install-console on palladium..
[16:59:52] <grrrit-wm>	 (03CR) 10BBlack: [C: 032] "Manual testing on one node shows very small loadavg increase, so this seems un-dangerous to turn on and watch perf graphs for the next few" [puppet] - 10https://gerrit.wikimedia.org/r/295723 (owner: 10BBlack)
[17:00:04] <jouncebot>	 yurik, gwicke, cscott, arlolra, and subbu: Dear anthropoid, the time has come. Please deploy Services – Graphoid / Parsoid / OCG / Citoid (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160623T1700).
[17:00:25] <grrrit-wm>	 (CR) BBlack: [C: 2] "Compiler output looks ok, but can't see deeply through augeaus results.." [puppet] - https://gerrit.wikimedia.org/r/295724 (owner: BBlack)
[17:03:00] <wikibugs>	 Operations, DBA, Wikimedia-Etherpad, User-notice: etherpad database issues - https://phabricator.wikimedia.org/T138516#2402704 (jcrespo) p:Triage>High
[17:03:41] <bblack>	 !log cache perf tuning marker: start rollout of tcp_no_metrics_save:0
[17:03:46] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:11:51] <grrrit-wm>	 (CR) JanZerebecki: [C: 1] Log PHP/HHVM errors in CLI mode to stderr, not stdout [mediawiki-config] - https://gerrit.wikimedia.org/r/295554 (https://phabricator.wikimedia.org/T138291) (owner: Hoo man)
[17:26:45] <elukey>	 cmjohnson: hi! do you have a minute?
[17:26:57] <cmjohnson>	 Hi elukey
[17:26:58] <cmjohnson>	 sure
[17:27:06] <cmjohnson>	 I see your tasks about the apaches
[17:27:19] <cmjohnson>	 app servers
[17:27:39] <elukey>	 yeah, mw1304 is a bit weird.. I am following the puppet run from palladium since I've used wmf-reimage, but I can't access the server console
[17:27:43] <elukey>	 seems stuck somewhere
[17:28:05] <elukey>	 and before that I tried to powercycle thinking that it was a boot problem
[17:28:38] <elukey>	 but same issue (stuck somewhere while booting, not output/errors)
[17:28:45] <elukey>	 can you double check?
[17:28:58] <elukey>	 maybe it is me missing something really trivial
[17:29:21] <cmjohnson>	 when i plug in...it gives me the os prompt
[17:29:30] <cmjohnson>	 mw1304 login:
[17:29:56] <robh>	 !log labmon1001 cpy changed back to local usb, errors on network transfer for ownership.  resumed rsync with append flag to local usb disk.
[17:30:00] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:30:12] <icinga-wm>	 PROBLEM - HP RAID on ms-be2022 is CRITICAL: CHECK_NRPE: Socket timeout after 20 seconds.
[17:30:43] <cmjohnson>	 i think the serial console is not set correctly
[17:32:30] <subbu>	 will deploy new version of parsoid shortly ... 
[17:35:42] <elukey>	 cmjohnson: all right, that kinda makes sense, after your "mw1304 login:" I felt a bit frustrated :P
[17:36:41] <elukey>	 (afk for ~30 mins)
[17:36:51] <cmjohnson>	 elukey: fixed
[17:37:01] <subbu>	 !log starting parsoid deploy
[17:37:04] <cmjohnson>	 Debian GNU/Linux 8 mw1304 ttyS1
[17:37:04] <cmjohnson>	 mw1304 login:
[17:37:05] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:37:15] <elukey>	 cmjohnson: thanks!!!!
[17:37:22] <elukey>	 what was the issue?
[17:37:32] <elukey>	 I mean, can I recognize it in the future and fix it by myself?
[17:38:21] <cmjohnson>	 no, it's a setting that I got wrong when I initially racked them
[17:40:23] <grrrit-wm>	 (PS2) Gehel: LABS: Enable graphoid geoshapes [puppet] - https://gerrit.wikimedia.org/r/295581 (https://phabricator.wikimedia.org/T138192) (owner: Yurik)
[17:40:51] <subbu>	 !log synced new code; restarted parsoid on wtp1001 as a canary
[17:40:56] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:43:22] <subbu>	 lgtm. restarting on all nodes
[17:44:17] <wikibugs>	 Blocked-on-Operations, Operations, DBA, Labs, and 2 others: adywiki and jamwiki are missing the associated *_p databases with appropriate views - https://phabricator.wikimedia.org/T135029#2286195 (ksmith) Thanks to everyone who helped get this unstuck and fixed!
[17:44:23] <grrrit-wm>	 (CR) Gehel: "This is only deployment-prep configuration. Can be merge as-is. Conversation continues for the prod part." [puppet] - https://gerrit.wikimedia.org/r/295581 (https://phabricator.wikimedia.org/T138192) (owner: Yurik)
[17:44:45] <grrrit-wm>	 (CR) Gehel: [C: 2] LABS: Enable graphoid geoshapes [puppet] - https://gerrit.wikimedia.org/r/295581 (https://phabricator.wikimedia.org/T138192) (owner: Yurik)
[17:45:24] <subbu>	 !log finished deploying parsoid sha 18022c96
[17:45:28] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:50:43] <icinga-wm>	 PROBLEM - puppet last run on analytics1003 is CRITICAL: CRITICAL: Puppet has 1 failures
[17:50:43] <wikibugs>	 06Operations, 06Community-Liaisons, 10Wikimedia-Mailing-lists: mailman maint window 2016-06-xx 16:00 - 18:00 UTC - https://phabricator.wikimedia.org/T138228#2402815 (10Aklapper)
[17:53:24] <wikibugs>	 06Operations, 13Patch-For-Review: Staging area for the next version of the transparency report - https://phabricator.wikimedia.org/T138197#2402822 (10Aklapper) In reply to T138197#2395473: See task summary: "semi-private staging area"
[17:55:12] <wikibugs>	 06Operations, 10DBA, 10Wikimedia-Etherpad, 07User-notice: etherpad database issues - https://phabricator.wikimedia.org/T138516#2402828 (10jcrespo)
[17:55:55] <elukey>	 cmjohnson: got it thanks!
[17:57:44] <grrrit-wm>	 (03PS4) 10Jdlrobson: Complete list of legacy main pages, switch default to false [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295600 (https://phabricator.wikimedia.org/T138425) 
[18:02:37] <grrrit-wm>	 (03PS1) 10Elukey: Add mw1304 to the MW scap DSH list [puppet] - 10https://gerrit.wikimedia.org/r/295740 
[18:04:19] <grrrit-wm>	 (03CR) 10Elukey: [C: 032] Add mw1304 to the MW scap DSH list [puppet] - 10https://gerrit.wikimedia.org/r/295740 (owner: 10Elukey)
[18:06:48] <grrrit-wm>	 (03PS6) 10EBernhardson: logstash: Update filters for sending to es 2.x [puppet] - 10https://gerrit.wikimedia.org/r/295578 (https://phabricator.wikimedia.org/T138335) 
[18:09:47] <robh>	 !log labmon1001 powering down for reimage
[18:09:51] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:10:07] <robh>	 cmjohnson: Ok, it's all on you once labmon1001 powers down.  Please set aside the old disks in order, in case I messed up and we have to fall back to them.
[18:10:29] <robh>	 then lemme know when the new ones are in and ready to go, and i can reimage it and restore data.
[18:11:29] <grrrit-wm>	 (03CR) 10Ori.livneh: [C: 031] Restore mc1007 memcached growth factor to 1.05 as the rest of the cluster. [puppet] - 10https://gerrit.wikimedia.org/r/295702 (https://phabricator.wikimedia.org/T129963) (owner: 10Elukey)
[18:12:00] <icinga-wm>	 PROBLEM - HP RAID on ms-be2023 is CRITICAL: CHECK_NRPE: Socket timeout after 20 seconds.
[18:12:10] <icinga-wm>	 PROBLEM - HP RAID on ms-be2024 is CRITICAL: CHECK_NRPE: Socket timeout after 20 seconds.
[18:14:06] <grrrit-wm>	 (03PS7) 10EBernhardson: logstash: Update filters for sending to es 2.x [puppet] - 10https://gerrit.wikimedia.org/r/295578 (https://phabricator.wikimedia.org/T138335) 
[18:14:08] <grrrit-wm>	 (03PS6) 10EBernhardson: Duplicate logstash output to alternate elasticsearch cluster [puppet] - 10https://gerrit.wikimedia.org/r/295442 
[18:15:21] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] logstash: Update filters for sending to es 2.x [puppet] - 10https://gerrit.wikimedia.org/r/295578 (https://phabricator.wikimedia.org/T138335) (owner: 10EBernhardson)
[18:15:21] <icinga-wm>	 RECOVERY - puppet last run on analytics1003 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[18:15:25] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Duplicate logstash output to alternate elasticsearch cluster [puppet] - 10https://gerrit.wikimedia.org/r/295442 (owner: 10EBernhardson)
[18:17:21] <grrrit-wm>	 (03PS8) 10EBernhardson: logstash: Update filters for sending to es 2.x [puppet] - 10https://gerrit.wikimedia.org/r/295578 (https://phabricator.wikimedia.org/T138335) 
[18:17:23] <grrrit-wm>	 (03PS7) 10EBernhardson: Duplicate logstash output to alternate elasticsearch cluster [puppet] - 10https://gerrit.wikimedia.org/r/295442 
[18:18:15] <grrrit-wm>	 (03PS2) 10Muehlenhoff: Update debdeploy config for maps caches [puppet] - 10https://gerrit.wikimedia.org/r/295211 
[18:20:53] <grrrit-wm>	 (03CR) 10Muehlenhoff: [C: 032 V: 032] Update debdeploy config for maps caches [puppet] - 10https://gerrit.wikimedia.org/r/295211 (owner: 10Muehlenhoff)
[18:26:25] <grrrit-wm>	 (03PS9) 10EBernhardson: logstash: Update logstash for sending to es 2.x [puppet] - 10https://gerrit.wikimedia.org/r/295578 (https://phabricator.wikimedia.org/T138335) 
[18:27:30] <grrrit-wm>	 (03PS1) 10Urbanecm: [cleanup] Delete old throttle rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295744 
[18:31:44] <elukey>	 !log mw130[0134] - new jobrunners installed and pooled (happened automatically after the first puppet run)
[18:31:49] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:33:53] <elukey>	 ok just finished mw1304, seems to work fine
[18:35:12] <elukey>	 going afk, just logged --^ a summary of the new jobrunners
[18:35:33] <elukey>	 I hoped to have some explicit pool action, but it seems embedded in puppet
[18:35:41] <elukey>	 anyhow, logs look good
[18:35:57] <elukey>	 let me know if anything weird comes up during the next hours!
[18:49:37] <wikibugs>	 06Operations, 06Discovery, 10Kartotherian, 06Maps: Maps - enable Geoshapes on production - https://phabricator.wikimedia.org/T138525#2402939 (10Gehel)
[18:49:59] <wikibugs>	 06Operations, 06Discovery, 06Maps, 07Epic: Epic: switch Maps to production status - https://phabricator.wikimedia.org/T133744#2402952 (10Gehel)
[18:50:01] <wikibugs>	 06Operations, 06Discovery, 10Kartotherian, 06Maps: Maps - enable Geoshapes on production - https://phabricator.wikimedia.org/T138525#2402951 (10Gehel)
[19:00:04] <jouncebot>	 thcipriani: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160623T1900).
[19:01:23] <thcipriani>	 hold your horses. Holding train for the moment while some patches are deployed.
[19:01:54] <wikibugs>	 07Blocked-on-Operations, 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: maintain-replicas.pl unmaintained, unmaintainable - https://phabricator.wikimedia.org/T138450#2400728 (10scfc) I once thought of a tool that does something like `diff -u <(mysqldump --no-data) <(what-views-and-triggers-and...
[19:05:20] <wikibugs>	 07Blocked-on-Operations, 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: maintain-replicas.pl unmaintained, unmaintainable - https://phabricator.wikimedia.org/T138450#2402968 (10jcrespo) @scfc redactatron is a horrible piece of software and we do not want to expand it, but kill it. It has its f...
[19:21:47] <RoanKattouw>	  !log Synced patches for T137288 and T137593
[19:23:53] <greg-g>	 delete the leading space :)
[19:24:20] <greg-g>	 !log 19:21 < RoanKatto>  !log Synced patches for T137288 and T137593
[19:24:24] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:25:59] <grrrit-wm>	 (03PS1) 10Thcipriani: all wikis to 1.28.0-wmf.7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295747 
[19:27:40] <grrrit-wm>	 (03CR) 10Thcipriani: [C: 032] all wikis to 1.28.0-wmf.7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295747 (owner: 10Thcipriani)
[19:28:18] <grrrit-wm>	 (03Merged) 10jenkins-bot: all wikis to 1.28.0-wmf.7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295747 (owner: 10Thcipriani)
[19:29:01] <logmsgbot>	 !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.28.0-wmf.7
[19:29:06] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:49:20] <icinga-wm>	 PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/, ref HEAD..readonly/master).
[19:55:47] <thcipriani>	 Abuse filter does not seem happy after rolling forward :\ https://phabricator.wikimedia.org/T138529 + https://phabricator.wikimedia.org/T138528
[19:56:14] <hashar>	 I guess it is some rule on enwiki which ends up triggering the flow of notices
[19:56:19] <hashar>	 they are probably easy to fix
[19:59:42] <jzerebecki>	 thcipriani: mind if I deploy a config change?
[19:59:59] <thcipriani>	 jzerebecki: go ahead
[20:00:21] <grrrit-wm>	 (03CR) 10JanZerebecki: [C: 032] Log PHP/HHVM errors in CLI mode to stderr, not stdout [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295554 (https://phabricator.wikimedia.org/T138291) (owner: 10Hoo man)
[20:01:40] <grrrit-wm>	 (03PS2) 10JanZerebecki: Log PHP/HHVM errors in CLI mode to stderr, not stdout [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295554 (https://phabricator.wikimedia.org/T138291) (owner: 10Hoo man)
[20:01:48] <grrrit-wm>	 (03CR) 10JanZerebecki: [C: 032] Log PHP/HHVM errors in CLI mode to stderr, not stdout [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295554 (https://phabricator.wikimedia.org/T138291) (owner: 10Hoo man)
[20:02:28] <grrrit-wm>	 (03Merged) 10jenkins-bot: Log PHP/HHVM errors in CLI mode to stderr, not stdout [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295554 (https://phabricator.wikimedia.org/T138291) (owner: 10Hoo man)
[20:03:03] <robh>	 !log labmon1001 data restore at 100gb 50minutes in, 298gb total for restoration
[20:03:08] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:04:29] <logmsgbot>	 !log jzerebecki@tin Synchronized wmf-config/CommonSettings.php: Log PHP/HHVM errors in CLI mode to stderr, not stdout T138291 (duration: 00m 28s)
[20:04:30] <stashbot>	 T138291: Latest wikidata JSON dump contains unexpected sql warning - https://phabricator.wikimedia.org/T138291
[20:04:34] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:05:06] <wikibugs>	 07Blocked-on-Operations, 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: maintain-replicas.pl unmaintained, unmaintainable - https://phabricator.wikimedia.org/T138450#2403090 (10AlexMonk-WMF) >>! In T138450#2401133, @jcrespo wrote: > I have to add a view to a newly created labs-only table, so i...
[20:05:11] <icinga-wm>	 RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[20:09:36] <grrrit-wm>	 (03CR) 10Dzahn: [C: 031] Add Amiri font to the scalers [puppet] - 10https://gerrit.wikimedia.org/r/295498 (https://phabricator.wikimedia.org/T135347) (owner: 10Muehlenhoff)
[20:12:02] <jzerebecki>	 done
[20:17:02] <Dereckson>	 !log Run initSiteStats.php on cebwiki (T138533)
[20:17:03] <stashbot>	 T138533: Update statistics count on cebwiki - https://phabricator.wikimedia.org/T138533
[20:17:07] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:17:43] <grrrit-wm>	 (03PS1) 10Alex Monk: Replace impossible watchlist_counts custom view with full view of already-filtered watchlist_count [software] - 10https://gerrit.wikimedia.org/r/295751 
[20:19:30] <icinga-wm>	 PROBLEM - YARN NodeManager Node-State on analytics1039 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:19:48] <grrrit-wm>	 (03CR) 10EBernhardson: "test deployed to beta cluster, looks to be working with no warnings/errors." [software/logstash/plugins] - 10https://gerrit.wikimedia.org/r/295575 (https://phabricator.wikimedia.org/T138335) (owner: 10EBernhardson)
[20:20:13] <wikibugs>	 07Blocked-on-Operations, 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: maintain-replicas.pl unmaintained, unmaintainable - https://phabricator.wikimedia.org/T138450#2403157 (10AlexMonk-WMF) @jcrespo: I was wrong in my last comment and have uploaded https://gerrit.wikimedia.org/r/295751 which,...
[20:21:25] <grrrit-wm>	 (03CR) 10BryanDavis: [C: 032] Add de_dot filter and rename to logstash-filters-wikimedia [software/logstash/plugins] - 10https://gerrit.wikimedia.org/r/295575 (https://phabricator.wikimedia.org/T138335) (owner: 10EBernhardson)
[20:21:29] <icinga-wm>	 PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[20:21:49] <icinga-wm>	 RECOVERY - YARN NodeManager Node-State on analytics1039 is OK: OK: YARN NodeManager analytics1039.eqiad.wmnet:8041 Node-State: RUNNING
[20:21:55] <bd808>	 ebernhardson: Do you want to deploy that now or should we wait for the filters that use it?
[20:24:37] <ebernhardson>	 bd808: hmm, might as well wait i suppose. I think i'll cherry pick the patch back to master though, the first patch for making deployment-logstash3 work might not ever even need proper merging, just shutdown the host and remove the patch from deployment-puppetmaster
[20:24:45] <ebernhardson>	 s/master/production/
[20:24:58] <ebernhardson>	 the puppet also seems to be working, but still testing things
[20:26:26] <bd808>	 works for me. we should try not to forget that a trebuchet deploy is needed before the de_dot filter can be used in prod
[20:26:43] <ebernhardson>	 ahh that's right, i guess i'll make it easy and just sync it out now without restarting logstash
[20:27:11] <bd808>	 *nod* that should be safe
[20:28:06] <ebernhardson>	 bd808: no jenkins on that repo btw, needs v+2 and merge
[20:28:23] <harej>	 jynus: re T137058 – is it simply a matter of getting around the production/labs split by storing the data in analytics instead and having that be the data store?
[20:28:24] <stashbot>	 T137058: Investigation: MediaWiki extension for database reports - https://phabricator.wikimedia.org/T137058
[20:28:27] <bd808>	 doh. I can do that
[20:28:52] <grrrit-wm>	 (03CR) 10BryanDavis: [V: 032] Add de_dot filter and rename to logstash-filters-wikimedia [software/logstash/plugins] - 10https://gerrit.wikimedia.org/r/295575 (https://phabricator.wikimedia.org/T138335) (owner: 10EBernhardson)
[20:29:27] <wikibugs>	 06Operations, 06Discovery, 10Kartotherian, 06Maps: Maps - enable Geoshapes on production - https://phabricator.wikimedia.org/T138525#2403184 (10Yurik)
[20:31:04] <ebernhardson>	 !log synced out latest logstash-plugins via trebuchet
[20:31:09] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:32:00] <wikibugs>	 06Operations, 10DBA, 10Wikimedia-Etherpad, 07User-notice: etherpad database issues - https://phabricator.wikimedia.org/T138516#2403185 (10jcrespo) So good news: we have been able to recover until just a few minutes before crashing (which means virtually no data loss).  The problem is we have yet to reimpor...
[20:32:47] <jynus>	 harej, what do you mean with analytics?
[20:33:19] <harej>	 "Another option is to send labs data to a specialized analytics store, where creating reports on the fly would be much easier and faster."
[20:34:06] <harej>	 While having a MediaWiki extension that pulls directly from the production DB is unacceptable, and likewise pulling from the Labs replicas is unacceptable, it sounds like putting the data in analytics is acceptable?
[20:35:18] <wikibugs>	 07Blocked-on-Operations, 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: maintain-replicas.pl unmaintained, unmaintainable - https://phabricator.wikimedia.org/T138450#2403187 (10AlexMonk-WMF) >>! In T138450#2402468, @chasemp wrote: > * For [[ https://phabricator.wikimedia.org/T135029#2400629 |...
[20:36:28] <wikibugs>	 06Operations, 10DBA, 10Wikimedia-Etherpad, 07User-notice: etherpad database issues - https://phabricator.wikimedia.org/T138516#2403189 (10Effeietsanders) Can you please make sure to not overwrite the things added later? I re-did a bunch of the work I did this afternoon in preperation of the discussions tom...
[20:40:31] <wikibugs>	 06Operations, 10DBA, 10Wikimedia-Etherpad, 07User-notice: etherpad database issues - https://phabricator.wikimedia.org/T138516#2403193 (10jcrespo) @Effeietsanders as I sent on my email- no data will be added, deleted or overwritten on the current etherpad. **I promised that and I will maintain that.** We t...
[20:42:21] <wikibugs>	 06Operations, 10DBA, 10Wikimedia-Etherpad, 07User-notice: etherpad database issues - https://phabricator.wikimedia.org/T138516#2403195 (10jcrespo) Clarification: no data will be added, deleted or overwritten on the current etherpad **by us (operators)**, you are expected to do that as usual (use the curren...
[20:42:26] <grrrit-wm>	 (03PS10) 10EBernhardson: logstash: Update logstash for sending to es 2.x [puppet] - 10https://gerrit.wikimedia.org/r/295578 (https://phabricator.wikimedia.org/T138335) 
[20:47:30] <icinga-wm>	 RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[20:47:57] <wikibugs>	 07Blocked-on-Operations, 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: maintain-replicas.pl unmaintained, unmaintainable - https://phabricator.wikimedia.org/T138450#2403204 (10jcrespo) @scfc BTW, I actually documented [[ https://wikitech.wikimedia.org/wiki/MariaDB/Sanitarium_and_Labsdbs | red...
[20:53:00] <wikibugs>	 06Operations, 10DBA, 10Wikimedia-Etherpad, 07User-notice: etherpad database issues - https://phabricator.wikimedia.org/T138516#2403209 (10jcrespo) @akosiaris I managed to reimport the tables, with two different timestamps. They are on the same host (m1-master), and I have granted permission to the same use...
[21:08:53] <icinga-wm>	 PROBLEM - Disk space on labstore1004 is CRITICAL: DISK CRITICAL - free space: /srv/project/maps 4161047 MB (0% inode=-)]
[21:09:21] <akosiaris>	 jynus: ok after some mucking around I have a second instance using the restore_2 DB and using a different port
[21:09:36] <akosiaris>	 lemme make it a bit more permanent and fix the rest
[21:10:56] <jynus>	 maybe try it first with ssh?
[21:11:27] <jynus>	 I have like a 90% confidence on 1 and a 50% on 2
[21:11:56] <chasemp>	 !log silence alerts for labstore1004 for setup
[21:12:01] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:15:02] <jynus>	 akosiaris, but if you make it work, with all the beers I owe joe and chris, and now you, I will go broke!
[21:16:42] <bblack>	 there's a pretty serious save-timing regression that kicks off around 19:20-ish
[21:16:54] <icinga-wm>	 PROBLEM - etherpad_lite_process_running on etherpad1001 is CRITICAL: PROCS CRITICAL: 2 processes with regex args ^/usr/bin/node /usr/share/etherpad-lite/node_modules/ep_etherpad-lite/node/server.js
[21:18:11] <bblack>	 it's either sync-wikiversions at circa 19:20, or maybe the sync-dir at 19:15, which I guess might be 19:21 < RoanKattouw>  !log Synced patches for T137288 and T137593
[21:18:29] <bblack>	 I thought we used to get these echo'd to -ops? (the sync traffic)
[21:18:29] <ebernhardson>	 restbase math server === mathoid ?
[21:19:58] <bblack>	 I guess they are, but the stamps in grafana don't match the stamps in -ops.  perhaps one's at the start and the other at the end
[21:19:59] <grrrit-wm>	 (03PS1) 10Alexandros Kosiaris: cache::misc: Set up a temporary etherpad host [puppet] - 10https://gerrit.wikimedia.org/r/295757 (https://phabricator.wikimedia.org/T138516) 
[21:20:27] <akosiaris>	 bblack: I'd appreciate a review of ^
[21:20:36] <thcipriani>	 bblack: from the look of it, it's probably related to the wikiversion sync
[21:20:40] <bblack>	 anyways, still, it's probably RK's "synced patches" or 19:29 < logmsgbot> !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.28.0-wmf.7
[21:21:37] <grrrit-wm>	 (03CR) 10BBlack: [C: 031] cache::misc: Set up a temporary etherpad host [puppet] - 10https://gerrit.wikimedia.org/r/295757 (https://phabricator.wikimedia.org/T138516) (owner: 10Alexandros Kosiaris)
[21:21:51] <thcipriani>	 I'm going to guess that the move to wmf.7 had a pretty big impact. I'll rollback.
[21:22:01] <akosiaris>	 bblack: thanks!
[21:22:04] <icinga-wm>	 PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 2 failures
[21:22:04] <icinga-wm>	 ACKNOWLEDGEMENT - etherpad_lite_process_running on etherpad1001 is CRITICAL: PROCS CRITICAL: 2 processes with regex args ^/usr/bin/node /usr/share/etherpad-lite/node_modules/ep_etherpad-lite/node/server.js alexandros kosiaris See T138516 as to why there are currently 2 instances
[21:22:15] <bblack>	 the save timing regression is ~ +30%, it's pretty bad
[21:22:47] <akosiaris>	 jynus: I owe you beers over this as well so we are going to get even ;-)
[21:22:59] <MaxSem>	 ori, ^
[21:23:00] <grrrit-wm>	 (03PS2) 10BBlack: stream.wm.o: move to cache_misc in DNS [dns] - 10https://gerrit.wikimedia.org/r/295385 (https://phabricator.wikimedia.org/T134871) 
[21:23:09] <wikibugs>	 07Blocked-on-Operations, 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: maintain-replicas.pl unmaintained, unmaintainable - https://phabricator.wikimedia.org/T138450#2403326 (10ori) >>! In T138450#2403187, @AlexMonk-WMF wrote: >>>! In T138450#2402468, @chasemp wrote: >> * For [[ https://phabri...
[21:23:10] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [C: 032] cache::misc: Set up a temporary etherpad host [puppet] - 10https://gerrit.wikimedia.org/r/295757 (https://phabricator.wikimedia.org/T138516) (owner: 10Alexandros Kosiaris)
[21:23:14] <grrrit-wm>	 (03PS2) 10Alexandros Kosiaris: cache::misc: Set up a temporary etherpad host [puppet] - 10https://gerrit.wikimedia.org/r/295757 (https://phabricator.wikimedia.org/T138516) 
[21:23:25] <ori>	 MaxSem: thanks for the ping, catching up with backlog now.
[21:23:29] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [V: 032] cache::misc: Set up a temporary etherpad host [puppet] - 10https://gerrit.wikimedia.org/r/295757 (https://phabricator.wikimedia.org/T138516) (owner: 10Alexandros Kosiaris)
[21:23:41] <ori>	 what's the tl;dr? bad regression, coincides with wmf7 release?
[21:23:48] <bblack>	 ori: yes
[21:23:49] <Krenair>	 ty ori
[21:23:57] <bblack>	 but there's some other minor changes around that time, too
[21:24:16] <bblack>	 I only see the big hit in savetiming, not other metrics that I looked at so far
[21:24:27] <logmsgbot>	 !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group2 wikis to wmf.6
[21:24:31] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:24:48] <ori>	 the other ones aren't typically sensitive to backend response time, since the majority of requests are served from varnish
[21:24:53] <thcipriani>	 seems that way, I just rolled back group2 wikis that went out today
[21:25:04] <ori>	 thanks
[21:25:11] <ori>	 the hourly flame graphs are usually useful (https://performance.wikimedia.org/xenon/svgs/hourly/)
[21:26:12] <grrrit-wm>	 (03PS1) 10Thcipriani: Revert "all wikis to 1.28.0-wmf.7" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295758 
[21:26:56] <RoanKattouw>	 bblack, ori: Sorry, my laptop died, only seeing this now
[21:26:57] <grrrit-wm>	 (03CR) 10Thcipriani: [C: 032] Revert "all wikis to 1.28.0-wmf.7" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295758 (owner: 10Thcipriani)
[21:27:07] <RoanKattouw>	 The "patches" I synced were security patches, see the bugs I tagged
[21:27:24] <RoanKattouw>	 I don't think offhand that they should be able to cause save time regressions but let me skim them
[21:27:34] <grrrit-wm>	 (03Merged) 10jenkins-bot: Revert "all wikis to 1.28.0-wmf.7" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295758 (owner: 10Thcipriani)
[21:28:01] <thcipriani>	 yeah, judging from https://grafana.wikimedia.org/dashboard/db/save-timing there is a pretty strong correlation with sync-wikiversions i.e. wmf.7
[21:28:07] <ori>	 other thing I'm doing is looking at xenon-grep
[21:28:08] <thcipriani>	 doubt it was the security patches
[21:28:19] <ori>	 https://dpaste.de/PR23/raw
[21:28:23] <RoanKattouw>	 Nope, they are not at all related to saving
[21:29:08] <ori>	 actually -2: is better
[21:29:56] <ori>	 https://dpaste.de/aHpY/raw
[21:29:59] <ori>	 ApiStashEdit::checkCache looks suspect
[21:30:14] <ori>	  1.19% -> 8.33
[21:30:36] <ori>	 Aaron's made some changes to that recently
[21:31:50] <ori>	 no diff b/w php-1.28.0-wmf.[67]/includes/api/ApiStashEdit.php , but calling code could have changed
[21:32:05] <bd808>	 that doesn't show very prominently in https://performance.wikimedia.org/xenon/svgs/hourly/2016-06-23_20.index.reversed.svgz (0.4%)
[21:32:23] <grrrit-wm>	 (03PS2) 10Gehel: (WIP) Notify TileratorUI on new expiry files [puppet] - 10https://gerrit.wikimedia.org/r/295450 (https://phabricator.wikimedia.org/T108459) (owner: 10Yurik)
[21:33:18] <grrrit-wm>	 (03PS1) 10Alexandros Kosiaris: Set up etherpad-restore.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/295760 (https://phabricator.wikimedia.org/T138516) 
[21:34:20] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [C: 032] Set up etherpad-restore.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/295760 (https://phabricator.wikimedia.org/T138516) (owner: 10Alexandros Kosiaris)
[21:36:22] <ori>	 bd808: that's for all index.php reqs, whereas the xenon-grep invocation filters for traces that include EditPage
[21:36:35] <bd808>	 *nod*
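The gap ori describes (a small share in the all-requests flame graph versus a much larger share from xenon-grep) comes from the denominator: the same frame looks far more prominent once only EditPage traces are counted. A minimal Python sketch of that effect, assuming each sample is a list of frame names; `frame_share` and the sample traces are hypothetical and not xenon-grep's actual implementation or data:

```python
from typing import Optional, Sequence

Trace = Sequence[str]  # one sampled stack: a list of frame (function) names


def _has(trace: Trace, name: str) -> bool:
    """True if any frame in the trace contains `name` as a substring."""
    return any(name in frame for frame in trace)


def frame_share(traces: Sequence[Trace], target: str,
                must_include: Optional[str] = None) -> float:
    """Percentage of traces that contain `target`.

    If `must_include` is given, only traces that also contain that frame
    (e.g. "EditPage") count toward the denominator.
    """
    pool = [t for t in traces if must_include is None or _has(t, must_include)]
    if not pool:
        return 0.0
    hits = sum(1 for t in pool if _has(t, target))
    return 100.0 * hits / len(pool)


# Hypothetical samples, not real xenon data.
traces = [
    ["index.php", "EditPage::internalAttemptSave", "ApiStashEdit::checkCache"],
    ["index.php", "EditPage::internalAttemptSave"],
    ["index.php", "Article::view"],
    ["index.php", "Article::view"],
]

print(frame_share(traces, "ApiStashEdit::checkCache"))              # 25.0
print(frame_share(traces, "ApiStashEdit::checkCache", "EditPage"))  # 50.0
```

The frame is identical in both calls; only the population it is measured against changes.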
[21:37:44] <grrrit-wm>	 (03CR) 10Gehel: "Already putting this out there for review, but it's too late, I probably missed something obvious." [puppet] - 10https://gerrit.wikimedia.org/r/295450 (https://phabricator.wikimedia.org/T108459) (owner: 10Yurik)
[21:39:48] <wikibugs>	 06Operations, 10DBA, 10Wikimedia-Etherpad, 13Patch-For-Review, 07User-notice: etherpad database issues - https://phabricator.wikimedia.org/T138516#2403399 (10akosiaris) Thanks to @jcrespo 's efforts and using the `etherpadlite_restore2` database, we now have http://etherpad-restore.wikimedia.org. This is...
[21:41:05] <grrrit-wm>	 (03CR) 10Yurik: [C: 04-1] "wow, lots of nice improvements :) Made a few minor comments." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/295450 (https://phabricator.wikimedia.org/T108459) (owner: 10Yurik)
[21:42:19] <ori>	 page save time is dropping back down
[21:43:18] <grrrit-wm>	 (03PS3) 10BBlack: Remove old maps-test servers from LVS config [puppet] - 10https://gerrit.wikimedia.org/r/295640 (owner: 10Gehel)
[21:44:10] <grrrit-wm>	 (03CR) 10BBlack: [C: 031] "Correct, "puppet-merge" will invoke "conftool-merge" to remove the servers from the lists pybal uses, no explicit action on LVSes is requi" [puppet] - 10https://gerrit.wikimedia.org/r/295640 (owner: 10Gehel)
[21:44:55] <grrrit-wm>	 (03CR) 10Gehel: "BBlack: thanks! Will merge tomorrow." [puppet] - 10https://gerrit.wikimedia.org/r/295640 (owner: 10Gehel)
[21:46:53] <icinga-wm>	 RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[21:47:27] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] "nice!. minor inline comment" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/295450 (https://phabricator.wikimedia.org/T108459) (owner: 10Yurik)
[21:48:07] <ori>	 yeah, the edit stash hit ratio tracked yesterday's until the deployment and then dipped: https://graphite.wikimedia.org/render/?width=586&height=308&_salt=1466718428.219&from=-4hours&target=alias(asPercent(sumSeries(MediaWiki.editstash.cache_hits.*.rate)%2C%20sumSeries(MediaWiki.editstash.cache_%7Bhits%2Cmisses%7D.*.rate))%2C%22edit%20stash%20hit%20%25%22)&target=alias(timeShift(asPercent(sumSeries(MediaWiki.editstash.cache_hits.*.rate)%2C%20sumSeries(MediaWiki.editstash.cache_%7Bhits%2Cmisses%7D.*.rate))%2C%20%221d%22)%2C%20%22edit%20stash%20hit%20%25%2C%20-1d%22)
[21:48:45] <icinga-wm>	 PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/, ref HEAD..readonly/master).
[21:48:51] <ori>	 I'd paste the short URL but I can't because Graphite is broken ("Graphite encountered an unexpected error while handling your request." in the top frame, and short urls point to 127.0.0.1)
[21:48:54] <ori>	 anyone know what that's about?
[21:49:34] * ori guesses yuvi / I22f45a80e834ab1a686fe09c8ce64da19380dbaa
[21:50:44] <icinga-wm>	 RECOVERY - etherpad_lite_process_running on etherpad1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/node /usr/share/etherpad-lite/node_modules/ep_etherpad-lite/node/server.js
[21:58:26] <wikibugs>	 07Blocked-on-Operations, 06Operations, 07Graphite: "unexpected error" on graphite-web - https://phabricator.wikimedia.org/T138541#2403434 (10ori)
[21:58:51] <thcipriani>	 hmm, so the suspicion is https://gerrit.wikimedia.org/r/#/c/295023/1 ?
[21:59:10] <ori>	 thcipriani: no, that one is on both 7 and 6
[21:59:34] <icinga-wm>	 PROBLEM - etherpad_lite_process_running on etherpad1001 is CRITICAL: PROCS CRITICAL: 2 processes with regex args ^/usr/bin/node /usr/share/etherpad-lite/node_modules/ep_etherpad-lite/node/server.js
[22:01:13] <ori>	 it might not be edit stash related at all, since page save time is back down, but the stash hit rate is not
[22:01:36] <grrrit-wm>	 (03PS5) 10Gehel: Move es-tool to a proper python package [puppet] - 10https://gerrit.wikimedia.org/r/290765 
[22:02:50] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Move es-tool to a proper python package [puppet] - 10https://gerrit.wikimedia.org/r/290765 (owner: 10Gehel)
[22:03:27] <ori>	 thcipriani: I doubt I'll get to the bottom of it in the few minutes I have before I need to go. Is it all right to leave wikis on wmf.6 for now?
[22:04:06] <thcipriani>	 ori: yes. probably for the best if the solution is unknown and the save time is returning to normal since the rollback.
[22:05:29] <thcipriani>	 abusefilter logspam was almost to a rollback tipping point anyway.
[22:09:51] <grrrit-wm>	 (03PS1) 10Alexandros Kosiaris: Introduce etherpad100b.eqiad.wmnet [dns] - 10https://gerrit.wikimedia.org/r/295763 (https://phabricator.wikimedia.org/T138516) 
[22:10:15] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Introduce etherpad100b.eqiad.wmnet [dns] - 10https://gerrit.wikimedia.org/r/295763 (https://phabricator.wikimedia.org/T138516) (owner: 10Alexandros Kosiaris)
[22:10:54] <jynus>	 ignore that
[22:11:50] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] "jenkins seems to have croaked. Removing and self merging" [dns] - 10https://gerrit.wikimedia.org/r/295763 (https://phabricator.wikimedia.org/T138516) (owner: 10Alexandros Kosiaris)
[22:12:03] <grrrit-wm>	 (03PS1) 10Alexandros Kosiaris: etherpad-restore: Use etherpad1001b [puppet] - 10https://gerrit.wikimedia.org/r/295764 (https://phabricator.wikimedia.org/T138516) 
[22:12:32] <chasemp>	 !log powercycle labstore1005
[22:12:32] <grrrit-wm>	 (03PS2) 10Alexandros Kosiaris: etherpad-restore: Use etherpad1001b [puppet] - 10https://gerrit.wikimedia.org/r/295764 (https://phabricator.wikimedia.org/T138516) 
[22:12:36] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:12:50] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] etherpad-restore: Use etherpad1001b [puppet] - 10https://gerrit.wikimedia.org/r/295764 (https://phabricator.wikimedia.org/T138516) (owner: 10Alexandros Kosiaris)
[22:12:53] <ori>	 OK -- I have to go. If no one else files a task, I will file one when I get back in a couple of hours. Thanks for spotting that and rolling back, and thanks for the ping.
[22:13:02] <ori>	 ^ thcipriani
[22:13:24] <thcipriani>	 ack
[22:13:37] <thcipriani>	 thanks for looking into it
[22:15:01] <wikibugs>	 06Operations, 10Traffic: Backend naming in VCL needs to use fqdn+port - https://phabricator.wikimedia.org/T138546#2403529 (10BBlack)
[22:15:54] <icinga-wm>	 PROBLEM - carbon-cache@b service on labmon1001 is CRITICAL: CRITICAL - Expecting active but unit carbon-cache@b is inactive
[22:16:14] <icinga-wm>	 PROBLEM - carbon-cache@c service on labmon1001 is CRITICAL: CRITICAL - Expecting active but unit carbon-cache@c is inactive
[22:16:25] <icinga-wm>	 PROBLEM - carbon-cache@d service on labmon1001 is CRITICAL: CRITICAL - Expecting active but unit carbon-cache@d is inactive
[22:16:44] <icinga-wm>	 PROBLEM - carbon-cache@e service on labmon1001 is CRITICAL: CRITICAL - Expecting active but unit carbon-cache@e is inactive
[22:17:03] <icinga-wm>	 PROBLEM - carbon-cache@f service on labmon1001 is CRITICAL: CRITICAL - Expecting active but unit carbon-cache@f is inactive
[22:17:13] <icinga-wm>	 PROBLEM - graphite.wmflabs.org on labmon1001 is CRITICAL: Connection refused
[22:17:15] <icinga-wm>	 PROBLEM - carbon-cache@g service on labmon1001 is CRITICAL: CRITICAL - Expecting active but unit carbon-cache@g is inactive
[22:17:16] <chasemp>	 robh: I guess not silenced^?
[22:17:29] <robh>	 ahhh, for the old window, then removed from puppet
[22:17:31] <robh>	 lemme fix
[22:17:43] <icinga-wm>	 PROBLEM - carbon-cache@h service on labmon1001 is CRITICAL: CRITICAL - Expecting active but unit carbon-cache@h is inactive
[22:17:44] <icinga-wm>	 PROBLEM - carbon-cache@a service on labmon1001 is CRITICAL: CRITICAL - Expecting active but unit carbon-cache@a is inactive
[22:18:53] <robh>	 ok, they are in maint mode until tomorrow 1700 GMT
[22:19:17] <robh>	 so no more irc spam.  the issue is the maint isnt sticky when you reinstall a host and remove it from icinga
[22:19:18] <robh>	 heh
[22:22:37] <grrrit-wm>	 (PS6) Alex Monk: [WIP/POC/POS] Add python version of maintain-replicas script [software] - https://gerrit.wikimedia.org/r/295607 (https://phabricator.wikimedia.org/T138450)
[22:22:44] <icinga-wm>	 PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:33:07] <chasemp>	 !log reimage labstore1005 post io testing
[22:33:11] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:33:34] <icinga-wm>	 RECOVERY - etherpad_lite_process_running on etherpad1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/node /usr/share/etherpad-lite/node_modules/ep_etherpad-lite/node/server.js
[22:34:14] <icinga-wm>	 RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[22:40:24] <icinga-wm>	 PROBLEM - etherpad_lite_process_running on etherpad1001 is CRITICAL: PROCS CRITICAL: 2 processes with regex args ^/usr/bin/node /usr/share/etherpad-lite/node_modules/ep_etherpad-lite/node/server.js
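The two etherpad alerts differ only in the process count: 1 matching process at 22:33:34 reports OK, while 2 matching processes at 22:40:24 report CRITICAL. This suggests that the check accepts exactly one server process. Below is a minimal Python sketch of that threshold logic. It assumes an acceptable range of 1..1 and takes a list of command lines as input rather than reading the real process table. `check_procs` here is a hypothetical helper, not the actual Icinga/Nagios plugin.

```python
import re

# Pattern copied from the etherpad alert text in the log above.
ETHERPAD_ARGS = re.compile(
    r"^/usr/bin/node /usr/share/etherpad-lite/node_modules/"
    r"ep_etherpad-lite/node/server.js"
)


def check_procs(cmdlines, pattern=ETHERPAD_ARGS, low=1, high=1):
    """Count command lines matching `pattern`.

    Returns (state, message). The state is OK only when the count falls
    in [low, high]; the 1..1 default is an assumption inferred from the log.
    """
    count = sum(1 for cmd in cmdlines if pattern.search(cmd))
    state = "OK" if low <= count <= high else "CRITICAL"
    noun = "process" if count == 1 else "processes"
    return state, f"PROCS {state}: {count} {noun} with regex args {pattern.pattern}"


if __name__ == "__main__":
    server = "/usr/bin/node /usr/share/etherpad-lite/node_modules/ep_etherpad-lite/node/server.js"
    print(check_procs([server]))          # one matching process: OK
    print(check_procs([server, server]))  # two matching processes: CRITICAL
```

Under this assumption, both a missing server and a duplicate one raise the same CRITICAL state. A stray second node process after a restart would therefore alert just as loudly as a dead service.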
[22:44:49] <wikibugs>	 Operations, DBA, Wikimedia-Etherpad, Patch-For-Review, User-notice: etherpad database issues - https://phabricator.wikimedia.org/T138516#2403642 (jcrespo) It is finally working: https://etherpad-restore.wikimedia.org (if it does not, wait for your DNS cache to update).  Please recover anythin...
[22:46:16] <wikibugs>	 Operations, DBA, Wikimedia-Etherpad, Patch-For-Review, User-notice: etherpad database issues - https://phabricator.wikimedia.org/T138516#2403643 (jcrespo)
[22:52:14] <icinga-wm>	 PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:00:04] <jouncebot>	 RoanKattouw, ostriches, Krenair, MaxSem, awight, and Dereckson: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160623T2300). Please do the needful.
[23:00:04] <jouncebot>	 Jdlrobson: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[23:00:16] <jdlrobson>	 here /o
[23:02:52] <grrrit-wm>	 (CR) MaxSem: [C: -1] Complete list of legacy main pages, switch default to false (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/295600 (https://phabricator.wikimedia.org/T138425) (owner: Jdlrobson)
[23:03:39] <jdlrobson>	 MaxSem: why sorted? It's actually more useful to group by project
[23:04:05] <MaxSem>	 it's not sorted even within one project. also, other dblists are sorted
[23:05:13] <jdlrobson>	 okay.. but is there any reason other than readability (just so i understand motivation)?
[23:06:26] <MaxSem>	 readability is a pretty important one
[23:06:46] <MaxSem>	 I don't think it's technically required
[23:07:19] <jdlrobson>	 i'm just thinking how best to sort it
[23:07:39] <MaxSem>	 cat | sort > file
[23:07:41] <jdlrobson>	 it would be preferable to have projects sorted by languages but not sure how easy that would be to achieve...
[23:07:51] <Dereckson>	 coherence with other dblist is a pretty good idea too
[23:07:57] <jdlrobson>	 (i know i can sort it that way... but that leads to all the zh's together)
[23:08:08] <jdlrobson>	 which makes it harder to divide and conquer this list one community at a time
[23:08:36] <MaxSem>	 that's how all the other lists are sorted
[23:08:58] <Dereckson>	 jdlrobson: the dblist is not a good management todo list tool
[23:09:23] <grrrit-wm>	 (PS5) Jdlrobson: Complete list of legacy main pages, switch default to false [mediawiki-config] - https://gerrit.wikimedia.org/r/295600 (https://phabricator.wikimedia.org/T138425)
[23:09:26] <Dereckson>	 jdlrobson: grep wikt myaweesome.dblist
[23:09:28] <jdlrobson>	 i dont care enough so i just sorted :)
[23:09:38] <jdlrobson>	 new patch is up
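The sorting question above comes down to two views of the same file. MaxSem's plain sort puts all the zh* wikis together, as jdlrobson notes. Dereckson's `grep` pulls out one project at a time. Below is a minimal Python sketch of both, plus a per-project grouping that leaves the file itself sorted. It assumes a dblist is plain text with one database name per line, each ending in a project suffix. The function names and the suffix list are hypothetical illustrations, not part of mediawiki-config.

```python
from collections import defaultdict

# Hypothetical, incomplete list of project suffixes used in database names.
PROJECT_SUFFIXES = (
    "wiki", "wiktionary", "wikivoyage", "wikiquote",
    "wikibooks", "wikinews", "wikisource", "wikiversity",
)


def read_dblist(text):
    """Return the non-empty, stripped lines of a dblist's contents."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def sort_dblist(names):
    """Plain code-point sort, like piping the file through `sort`."""
    return sorted(names)


def grep_dblist(names, needle):
    """Keep names containing `needle`, like `grep wikt file.dblist`."""
    return [name for name in names if needle in name]


def project_of(name):
    """Return the project suffix a database name ends with, or None."""
    for suffix in PROJECT_SUFFIXES:
        if name.endswith(suffix):
            return suffix
    return None


def group_by_project(names):
    """Group sorted names by project, for working one community at a time."""
    groups = defaultdict(list)
    for name in sort_dblist(names):
        groups[project_of(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Made-up sample contents, not the real mobilemainpagelegacy.dblist.
    sample = "zhwikivoyage\nenwiki\nzhwiki\ndewiktionary\nenwikivoyage\nzhwiktionary\n"
    names = read_dblist(sample)
    print(sort_dblist(names))
    # ['dewiktionary', 'enwiki', 'enwikivoyage', 'zhwiki', 'zhwikivoyage', 'zhwiktionary']
    print(grep_dblist(sort_dblist(names), "wikt"))
    # ['dewiktionary', 'zhwiktionary']
    print(group_by_project(names))
```

Keeping the file itself in one canonical sorted order and deriving per-project views at read time gives both readability and the "divide and conquer" workflow. Python's `sorted` compares code points, which for lowercase ASCII names matches `LC_ALL=C sort`. Plain `sort` under a UTF-8 locale may order names containing `_` differently.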
[23:09:44] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Complete list of legacy main pages, switch default to false [mediawiki-config] - https://gerrit.wikimedia.org/r/295600 (https://phabricator.wikimedia.org/T138425) (owner: Jdlrobson)
[23:09:57] <grrrit-wm>	 (PS6) Jdlrobson: Complete list of legacy main pages, switch default to false [mediawiki-config] - https://gerrit.wikimedia.org/r/295600 (https://phabricator.wikimedia.org/T138425)
[23:10:30] <jdlrobson>	 wait something went wrong
[23:10:42] <grrrit-wm>	 (CR) Jdlrobson: [C: -1] Complete list of legacy main pages, switch default to false [mediawiki-config] - https://gerrit.wikimedia.org/r/295600 (https://phabricator.wikimedia.org/T138425) (owner: Jdlrobson)
[23:11:17] <MaxSem>	 why -1?
[23:11:37] <jdlrobson>	 i think it's okay
[23:11:45] <jdlrobson>	 i just read the diff wrong - i thought it had removed some items in the sort
[23:11:50] <jdlrobson>	 but they seem to be there
[23:11:52] <MaxSem>	 then remove it:)
[23:11:57] <jdlrobson>	 i have
[23:12:03] <jdlrobson>	 it's okay to merge again :)
[23:12:34] <grrrit-wm>	 (PS7) MaxSem: Complete list of legacy main pages, switch default to false [mediawiki-config] - https://gerrit.wikimedia.org/r/295600 (https://phabricator.wikimedia.org/T138425) (owner: Jdlrobson)
[23:12:43] <grrrit-wm>	 (CR) MaxSem: [C: 2] Complete list of legacy main pages, switch default to false [mediawiki-config] - https://gerrit.wikimedia.org/r/295600 (https://phabricator.wikimedia.org/T138425) (owner: Jdlrobson)
[23:13:29] <grrrit-wm>	 (Merged) jenkins-bot: Complete list of legacy main pages, switch default to false [mediawiki-config] - https://gerrit.wikimedia.org/r/295600 (https://phabricator.wikimedia.org/T138425) (owner: Jdlrobson)
[23:14:55] <icinga-wm>	 RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[23:15:05] <logmsgbot>	 !log maxsem@tin Synchronized dblists/mobilemainpagelegacy.dblist: https://gerrit.wikimedia.org/r/#/c/295600/ (duration: 00m 28s)
[23:15:09] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:16:59] <logmsgbot>	 !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/295600/ (duration: 00m 29s)
[23:17:04] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:17:04] <icinga-wm>	 RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[23:17:15] <MaxSem>	 jdlrobson, ^
[23:17:21] <jdlrobson>	 on it
[23:25:00] <jdlrobson>	 MaxSem: looks good to me (As best as i can test - can't find any examples where it broke things)
[23:25:05] <jdlrobson>	 thank you!
[23:28:12] <MaxSem>	 :)
[23:45:05] <icinga-wm>	 PROBLEM - puppet last run on mw2238 is CRITICAL: CRITICAL: puppet fail